GSoC 2023 Week 8 - InstCombine And SmallVector Experiments

06 Jul 2023

This blog post is related to my GSoC 2023 Project.

This week, I mostly worked on experiments regarding InstCombine that my mentor recommended me to check out as possible places for improvements. One of these experiments does have some potential, though it needs further investigation.

I also worked on creating an ADT which works like SmallVector but only does allocations on the stack. This ADT was called StackVector and while it did not turn out to have any measurable improvements, it was a good experiment to learn how to implement a container whose interface is compatible with SmallVector.

The experiments were:

Remove the call to threadCmpOverPHI in simplifyICmpInst - This is a change to measure the speedup that can be gained by disabling threadCmpOverPHI as it seemed to be a fairly hot path in the code. This gave a 0.24% speedup.
Limit the recursion depth in threadCmpOverPHI to 1. - Other places in the compiler (such as ValueTracking) only allow recursing one level into phi nodes. By limiting the recursion depth to 1 in InstSimplify as well, it is possible to get a 0.1% speedup. This is the patch that is worth pursuing.
Query !range metadata only on Load and Call instructions - !range metadata can occur only in Load and Call instructions, so it makes sense to only query it in those instructions. Unfortunately, this does not give a measurable improvement.
Change cmpExcludesZero to return std::optional<bool> - This change took me 3 days to make, because I spent all that time trying to figure out how the function actually works. cmpExcludesZero is a function that returns true whenever the predicate Pred being queried having the second operand RHS will always return false when the first operand (i.e. the LHS) is zero, hence the name.

If it is called with the inverse predicate of Pred and it returns true, it means that the predicate will always return true when the LHS is zero. An example of this would be the expression LHS == 0. To merge these two cases into one, the following logic applies:
- If the range of values where the comparison returns true includes zero, return false.
- If the range does not include zero, return true.
- As the range can only be generated when the RHS is a known-constant value, return failure (i.e. std::nullopt) to indicate that there is no way to predict the result.
It is quite unfortunate then, that this does not give a measurable speedup .
Track the value of NumUsesExplored in isKnownNonNullFromDominatingCondition inside inner loops as well - This was an idea to reduce the number of iterations that the loops would perform, however they are not usually high enough for this to make a change. No measurable improvement, unfortunately .
Replace getDereferenceableBytes() with hasParamAttribute() in Attribute::hasNonNullAttr() - Same thing as the above, no measurable speedup .
Delete the code for Instruction::GetElementPtr from computeKnownBitsFromOperator - Simple change to measure the possible gains, gives a 0.7% speedup .

StackVector was an idea that I had when I saw SmallSet and noticed that it only uses the stack storage of the SmallVector, wasting code space and compile times on the part of the code used to manipulate the heap storage. So, I spent 3-4 days trying to come up with a container that follows the exact interface of SmallVector, just with storage limited to the stack. As it turns out, this is also a container that may make it into the C++26 standard: P0843, std::inplace_vector.

Designing this container was… interesting. SmallVector is relied on very heavily throughout LLVM, and it is a super-optimized container with excellent performance. It does some strange things, however:

Whenever the contained type satisfies some conditions, it switches from calling std::uninitialized_move to std::uninitialized_copy. This is good for performance for certain types, however it means that there are types where the move constructor has to behave like a copy constructor because even though the move constructor is called, the type is not move-constructible… not fun. I ended up giving up on this problem and just copied what SmallVector does.
The iterators are pointers. While this is not a huge issue in actuality, it means that there is code that implicitly relies on them being pointers, for example the constructor for ArrayRef, which is another type that is designed to work well with SmallVector, for example having a constructor that accepts a SmallVectorImpl. This effectively means that there is no drop-in iterator type that works with the existing code, the iterators have to be pointers.
- As an example, I tried to have an iterator type that could implicitly decay to the pointer type so that it would work fine with ArrayRef. Unfortunately, that did not work because it caused Iterator::operator+ to have an ambiguity where it could resolve to either the overloaded Iterator::operator+ or the operator+(T*, ...) that could be called by decaying the iterator to a pointer. Not fun.
- Another problem with the explicit iterator type route was that SmallVector and its users rely on iterator being implicitly convertible to const_iterator (T* to const T*). This is also not fun to deal with, it made me have to create four distinct iterator types, where the Const* iterator types had implicit constructors from the corresponding iterator types. I just gave up and switched the iterators to pointers at this point, with the suggestion from my mentor.
No explicit constructors. There are places in the code that rely on SmallVector being implicitly constructible from whatever, so its not possible to have explicit constructors without updating quite a few users. There is nothing wrong with this, it is just not very good practice. Always have explicit constructors when they accept only one argument.

This is what the final implementation ended up looking like: link. I spent quite some time ironing out the bugs in this implementation, there were a lot of broken functions. What helped me the most was adapting this container to the SmallVector unit tests, at least the ones that would make sense to run on this. I had one really bad bug that took me way too long to fix though: in my excitement, I ended up replacing the SmallSetVector type as well, but I forgot that SetVector actually uses the heap storage as well. Unfortunately, this was not easy to debug as I ended up making push_back() and emplace_back() fallible where it returned end() on the container being full, which meant that it would end up silently corrupting stored data when the vector was full.

So, this led to one whole day where I was trying to build some tool, and it would keep failing on the TableGen step where it would show a random failure somewhere. Even with assertions, this obviously did not fail anywhere because the container had no assertions to trip :facepalm:. No more fallible operations for me, everything must crash and burn when something goes wrong.

Even more unfortunately, though expected, no meaningful improvements, which is slowly becoming the tagline of my GSoC project :(. This also involved a change where SmallSet would store a union of the set and the vector instead of both together, to save memory. I was not able to measure if this made a difference in memory, though I expect it would have done something, at the least. Oh well, it gave a tiny regression anyways, just as a last insult. I suspect I won’t go on any container making journeys any time soon, it is probably much easier to just make the ones that LLVM already has run faster.

dc03

GSoC 2023 Week 8 - InstCombine And SmallVector Experiments

Related Posts

GSoC 2023 Week 19 and 20 - That's All, Folks! 29 Sep 2023

GSoC 2023 Week 17 and 18 - Same Old, Same Old 10 Sep 2023

GSoC 2023 Week 15 and 16 29 Aug 2023

GSoC 2023 Week 19 and 20 - That's All, Folks!
29 Sep 2023

GSoC 2023 Week 17 and 18 - Same Old, Same Old
10 Sep 2023

GSoC 2023 Week 15 and 16
29 Aug 2023