GSoC 2023 Week 6 - RegAlloc Experiments
23 Jun 2023
This blog post is related to my GSoC 2023 Project.
This week, I spent most of my time just analyzing code. I was digging mainly through profiles of
RegAllocGreedy
, and I was trying to find areas to improve. While I wasn’t able to do much in general, I
was able to get one useful patch out of it:
https://reviews.llvm.org/D152781.
While going through the profiles, I noticed that a fair bit of time was being spent in the function
BlockFrequency::operator+=
. I initially tried to see if I could optimize it, but there was not much to be
done there. Then I noticed that it wasn’t being inlined, so I asked around and got to know that LLVM in general is not
built using LTO. As this function is called quite a lot, such a simple thing not being inlined led to an
unnecessary overhead. By moving this method into the header so that it could be inlined everywhere, a
0.1% improvement
was gained.
I also tried a few other experiments, which led to no useful gains:
-
Trying to speed up
BitVector::reference::operator=
- I noticed that this function was using a branch where it wasn’t necessary, however its probably not used often enough to matter. -
Replacing
array_pod_sort
withstd::sort
-array_pod_sort
is used in the LLVM codebase as it is not a template and instead usesqsort
as its sorting algorithm. This gives a good binary size reduction as it reduces bloat caused by template instantiations. However,std::sort
is a fair bit faster, and it gave a meaningful improvement in this case but not enough to warrant a patch. -
Changing a couple of container sizes - This is my favourite kind of experiment, cause it is super easy and usually gives results. In this case, making the containers’ small sizes just a bit bigger gives a slightly measurable result though it is actually not really distinguishable from noise.
-
Adding
LLVM_LIKELY
andLLVM_UNLIKELY
- I didn’t expect this to have any real results, and it didn’t. I noticed that a fair number of jumps were being taken when the condition was false on these conditions, which could have led to an issue with jumping to instructions which are out of cache. However, there is no such issue in reality because the jumps are likely too small to matter, so there was no measurable difference.
I also looked for a long time into RegAllocFast
, however it is a very small file and very well optimized,
so I couldn’t find anything to optimize in there. Similarly, FastISel
also didn’t seem to have too much
to optimize in it either, though I did certainly try :).