GSoC 2023 Week 8 - InstCombine And SmallVector Experiments
06 Jul 2023
This blog post is related to my GSoC 2023 Project.
This week, I mostly worked on experiments regarding InstCombine
that my mentor recommended me to check
out as possible places for improvements. One of these experiments does have some potential, though it needs further
investigation.
I also worked on creating an ADT which works like SmallVector
but only does allocations on the stack.
This ADT was called StackVector
and while it did not turn out to have any measurable improvements, it was
a good experiment to learn how to implement a container whose interface is compatible with SmallVector
.
The experiments were:
-
Remove the call to
threadCmpOverPHI
insimplifyICmpInst
- This is a change to measure the speedup that can be gained by disablingthreadCmpOverPHI
as it seemed to be a fairly hot path in the code. This gave a 0.24% speedup. -
Limit the recursion depth in
threadCmpOverPHI
to 1. - Other places in the compiler (such asValueTracking
) only allow recursing one level into phi nodes. By limiting the recursion depth to 1 inInstSimplify
as well, it is possible to get a 0.1% speedup. This is the patch that is worth pursuing. -
Query
!range
metadata only onLoad
andCall
instructions -!range
metadata can occur only inLoad
andCall
instructions, so it makes sense to only query it in those instructions. Unfortunately, this does not give a measurable improvement. -
Change
cmpExcludesZero
to returnstd::optional<bool>
- This change took me 3 days to make, because I spent all that time trying to figure out how the function actually works.cmpExcludesZero
is a function that returns true whenever the predicatePred
being queried having the second operandRHS
will always return false when the first operand (i.e. the LHS) is zero, hence the name.If it is called with the inverse predicate of
Pred
and it returns true, it means that the predicate will always return true when the LHS is zero. An example of this would be the expressionLHS == 0
. To merge these two cases into one, the following logic applies:-
If the range of values where the comparison returns true includes zero, return false.
-
If the range does not include zero, return true.
-
As the range can only be generated when the RHS is a known-constant value, return failure (i.e.
std::nullopt
) to indicate that there is no way to predict the result.
It is quite unfortunate then, that this does not give a measurable speedup .
-
-
Track the value of
NumUsesExplored
inisKnownNonNullFromDominatingCondition
inside inner loops as well - This was an idea to reduce the number of iterations that the loops would perform, however they are not usually high enough for this to make a change. No measurable improvement, unfortunately . -
Replace
getDereferenceableBytes()
withhasParamAttribute()
inAttribute::hasNonNullAttr()
- Same thing as the above, no measurable speedup . -
Delete the code for
Instruction::GetElementPtr
fromcomputeKnownBitsFromOperator
- Simple change to measure the possible gains, gives a 0.7% speedup .
StackVector
was an idea that I had when I saw SmallSet
and noticed that it only uses the
stack storage of the SmallVector
, wasting code space and compile times on the part of the code used to
manipulate the heap storage. So, I spent 3-4 days trying to come up with a container that follows the exact interface
of SmallVector
, just with storage limited to the stack. As it turns out, this is also a container that
may make it into the C++26 standard:
P0843,
std::inplace_vector
.
Designing this container was… interesting. SmallVector
is relied on very heavily throughout LLVM, and
it is a super-optimized container with excellent performance. It does some strange things, however:
-
Whenever the contained type satisfies some conditions, it switches from calling
std::uninitialized_move
tostd::uninitialized_copy
. This is good for performance for certain types, however it means that there are types where the move constructor has to behave like a copy constructor because even though the move constructor is called, the type is not move-constructible… not fun. I ended up giving up on this problem and just copied whatSmallVector
does. -
The iterators are pointers. While this is not a huge issue in actuality, it means that there is code that implicitly relies on them being pointers, for example the constructor for
ArrayRef
, which is another type that is designed to work well withSmallVector
, for example having a constructor that accepts aSmallVectorImpl
. This effectively means that there is no drop-in iterator type that works with the existing code, the iterators have to be pointers.-
As an example, I tried to have an iterator type that could implicitly decay to the pointer type so that it would work fine with
ArrayRef
. Unfortunately, that did not work because it causedIterator::operator+
to have an ambiguity where it could resolve to either the overloadedIterator::operator+
or theoperator+(T*, ...)
that could be called by decaying the iterator to a pointer. Not fun. -
Another problem with the explicit iterator type route was that
SmallVector
and its users rely oniterator
being implicitly convertible toconst_iterator
(T*
toconst T*
). This is also not fun to deal with, it made me have to create four distinct iterator types, where theConst*
iterator types had implicit constructors from the corresponding iterator types. I just gave up and switched the iterators to pointers at this point, with the suggestion from my mentor.
-
-
No explicit constructors. There are places in the code that rely on
SmallVector
being implicitly constructible from whatever, so its not possible to have explicit constructors without updating quite a few users. There is nothing wrong with this, it is just not very good practice. Always haveexplicit
constructors when they accept only one argument.
This is what the final implementation ended up looking like:
link. I spent quite some time ironing out the bugs in this implementation, there were a lot of broken functions.
What helped me the most was adapting this container to the SmallVector
unit tests, at least the ones that
would make sense to run on this. I had one really bad bug that took me way too long to fix though: in my excitement,
I ended up replacing the SmallSetVector
type as well, but I forgot that SetVector
actually
uses the heap storage as well. Unfortunately, this was not easy to debug as I ended up making push_back()
and emplace_back()
fallible where it returned end()
on the container being full, which meant
that it would end up silently corrupting stored data when the vector was full.
So, this led to one whole day where I was trying to build some tool, and it would keep failing on the TableGen step where it would show a random failure somewhere. Even with assertions, this obviously did not fail anywhere because the container had no assertions to trip :facepalm:. No more fallible operations for me, everything must crash and burn when something goes wrong.
Even more unfortunately, though expected, no
meaningful improvements, which is slowly becoming the tagline of my GSoC project :(. This also involved a change
where SmallSet
would store a union of the set and the vector instead of both together, to save memory.
I was not able to measure if this made a difference in memory, though I expect it would have done something, at
the least. Oh well, it gave a tiny regression anyways, just as a last insult. I suspect I won’t go on any container
making journeys any time soon, it is probably much easier to just make the ones that LLVM already has run faster.