GSoC 2023 Week 8 - InstCombine And SmallVector Experiments

This blog post is related to my GSoC 2023 Project.

This week, I mostly worked on experiments regarding InstCombine that my mentor recommended me to check out as possible places for improvements. One of these experiments does have some potential, though it needs further investigation.

I also worked on creating an ADT which works like SmallVector but only does allocations on the stack. This ADT was called StackVector and while it did not turn out to have any measurable improvements, it was a good experiment to learn how to implement a container whose interface is compatible with SmallVector.

The experiments were:

StackVector was an idea that I had when I saw SmallSet and noticed that it only uses the stack storage of the SmallVector, wasting code space and compile times on the part of the code used to manipulate the heap storage. So, I spent 3-4 days trying to come up with a container that follows the exact interface of SmallVector, just with storage limited to the stack. As it turns out, this is also a container that may make it into the C++26 standard: P0843, std::inplace_vector.

Designing this container was… interesting. SmallVector is relied on very heavily throughout LLVM, and it is a super-optimized container with excellent performance. It does some strange things, however:

This is what the final implementation ended up looking like: link. I spent quite some time ironing out the bugs in this implementation, there were a lot of broken functions. What helped me the most was adapting this container to the SmallVector unit tests, at least the ones that would make sense to run on this. I had one really bad bug that took me way too long to fix though: in my excitement, I ended up replacing the SmallSetVector type as well, but I forgot that SetVector actually uses the heap storage as well. Unfortunately, this was not easy to debug as I ended up making push_back() and emplace_back() fallible where it returned end() on the container being full, which meant that it would end up silently corrupting stored data when the vector was full.

So, this led to one whole day where I was trying to build some tool, and it would keep failing on the TableGen step where it would show a random failure somewhere. Even with assertions, this obviously did not fail anywhere because the container had no assertions to trip :facepalm:. No more fallible operations for me, everything must crash and burn when something goes wrong.

Even more unfortunately, though expected, no meaningful improvements, which is slowly becoming the tagline of my GSoC project :(. This also involved a change where SmallSet would store a union of the set and the vector instead of both together, to save memory. I was not able to measure if this made a difference in memory, though I expect it would have done something, at the least. Oh well, it gave a tiny regression anyways, just as a last insult. I suspect I won’t go on any container making journeys any time soon, it is probably much easier to just make the ones that LLVM already has run faster.