GSoC 2023 Week 7 - FastISel Is Really Fast
24 Jun 2023
This blog post is related to my GSoC 2023 Project.
This week, I spent most of my time on FastISel and VarLocBasedLDV. In the process, I learnt that FastISel is, in fact, as the name implies, quite “fast”. I also tried to optimize VarLocBasedLDV; however, I fell into a derp-induced “optimization” that gave me false confidence that I had actually found a good improvement :D.
So, with FastISel I tried a few optimizations I thought had good potential:
- Remove toIndex_ as a member of IndexedMap - This is a commit whose validity I feel I should still check: even though it didn't give any useful performance improvement, it removes an unnecessary member from a container, which may have benefits in terms of memory usage. I found it a bit odd that the functor that maps keys to indices was instantiated as a non-static member, since such a functor is usually constructed as a temporary object in std:: functions.
- Give MapVector better performance for small sizes - I definitely expected this one to give really good results. Maybe I didn't replace enough uses, but I expected this change to have an impact, as it is basically the same thing I did for SetVector, just applied to MapVector. Maybe storing key-value pairs instead of bare keys hurts memory performance, because the values leave gaps between the keys, but I don't know. I feel I should come back to this optimization, because I really do think there is performance to be gained here.
- Replace usages of MapVector with SmallMapVector - A follow-up to the previous change, unfortunately with no measurable improvements :(.
- Replace DenseMap with SmallDenseMap - I expected this to improve performance, but for some reason it didn't. I have noticed that, in general, replacing the small versions of DenseMap and DenseSet with the generic large versions gives better performance, but I have no idea why.
- Replace usages of MapVector with SmallMapVector in clang - Same thing as before, just in clang. Unfortunately, no measurable difference.
- Reduce the number of branches in ScheduleDAGSDNodes - A bit of a detour from the FastISel work; for some reason I committed it to the same branch. Not many measurable results here either, sadly.
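To make the toIndex_ point concrete, here is a minimal standalone sketch. It is not LLVM's IndexedMap: the class names MapWithFunctorMember and MapWithTemporaryFunctor and the IdentityIndex functor are hypothetical. It shows that even a stateless functor costs space when it is stored as a non-static data member, while constructing it as a temporary at each use does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stateless functor that maps a key to a vector index.
struct IdentityIndex {
  std::size_t operator()(unsigned Key) const { return Key; }
};

// Variant 1: the functor is kept as a non-static data member (like toIndex_).
// Even an empty class member occupies at least one byte, plus padding.
template <typename T, typename ToIndexT = IdentityIndex>
class MapWithFunctorMember {
  std::vector<T> Storage;
  ToIndexT ToIndex;

public:
  explicit MapWithFunctorMember(std::size_t N) : Storage(N) {}
  T &operator[](unsigned Key) {
    std::size_t Idx = ToIndex(Key);
    assert(Idx < Storage.size() && "index out of range");
    return Storage[Idx];
  }
};

// Variant 2: the functor is constructed as a temporary at each use, so it
// adds nothing to the size of the container.
template <typename T, typename ToIndexT = IdentityIndex>
class MapWithTemporaryFunctor {
  std::vector<T> Storage;

public:
  explicit MapWithTemporaryFunctor(std::size_t N) : Storage(N) {}
  T &operator[](unsigned Key) {
    std::size_t Idx = ToIndexT()(Key);
    assert(Idx < Storage.size() && "index out of range");
    return Storage[Idx];
  }
};
```

On typical 64-bit targets the first class ends up a whole word larger than the second, since the empty member needs at least one byte and padding rounds that up. If the functor ever needed state, the empty base optimization or C++20's [[no_unique_address]] would be ways to keep it without paying that size cost.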
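As a rough illustration of what "better performance for small sizes" can mean for a map-vector, here is a minimal sketch assuming a linear-scan approach: while the container holds at most N entries it keeps no hash index at all and finds keys by scanning the vector, and it only builds the index once it grows past N. SmallMapVectorSketch is a hypothetical, simplified class (insertion only, no erase), not LLVM's MapVector API.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical insertion-ordered map: while it holds at most N entries it
// keeps no hash index and finds keys by scanning the vector.
template <typename KeyT, typename ValueT, std::size_t N = 8>
class SmallMapVectorSketch {
  std::vector<std::pair<KeyT, ValueT>> Vector; // entries in insertion order
  std::unordered_map<KeyT, std::size_t> Index; // key -> position, only when large

  bool isSmall() const { return Vector.size() <= N; }

  // Returns the position of Key in Vector, or Vector.size() if it is absent.
  std::size_t findPos(const KeyT &Key) const {
    if (isSmall()) {
      for (std::size_t I = 0; I < Vector.size(); ++I)
        if (Vector[I].first == Key)
          return I;
      return Vector.size();
    }
    auto It = Index.find(Key);
    return It == Index.end() ? Vector.size() : It->second;
  }

public:
  // Inserts {Key, Value} if Key is absent; returns true if it was inserted.
  bool insert(const KeyT &Key, const ValueT &Value) {
    if (findPos(Key) != Vector.size())
      return false;
    Vector.emplace_back(Key, Value);
    if (Vector.size() == N + 1) {
      // Just outgrew the small size: index every entry seen so far.
      for (std::size_t I = 0; I < Vector.size(); ++I)
        Index.emplace(Vector[I].first, I);
    } else if (!isSmall()) {
      Index.emplace(Key, Vector.size() - 1);
    }
    return true;
  }

  std::size_t count(const KeyT &Key) const {
    return findPos(Key) != Vector.size() ? 1 : 0;
  }

  const ValueT &at(const KeyT &Key) const {
    std::size_t Pos = findPos(Key);
    assert(Pos != Vector.size() && "key not present");
    return Vector[Pos].second;
  }

  std::size_t size() const { return Vector.size(); }
  bool hasIndex() const { return !Index.empty(); }
};
```

The scan is cheap because the entries are contiguous, but each key is followed by its value in memory, so scanning pairs touches more bytes per key than scanning a set's bare keys. That would fit the gaps-between-keys theory in the list, though this sketch doesn't prove it.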
I also got a good few experiments out of VarLocBasedLDV, although I haven't hammered them into anything useful yet:
- Make CoalescingBitVector movable again - This change reverts a change made by a previous patch: https://reviews.llvm.org/D76465. That patch removed the move constructors, and I'm not really sure why. So, I added them back and removed the unnecessary std::unique_ptr in the type alias VarLocInMBB. This gave a speedup within the noise, but I think it's still worth pursuing for memory reasons.
- Messing around with data structures - As I said before, this is one of the things I like doing most. However, in this case it actually led to a major slowdown of ~0.3%. That is not good.
- “Optimize” collectIDsForRegs - “Optimize” is in quotes because I didn't realize at the time that my change removed the for-loop's iteration over the sorted collection. I did that, forgot about it, and then removed the sort from the function entirely, because the only use I could see was taking the front() element. Not my brightest moment.
- Replace std::vector with SmallVector - This gave a gain locally, but unfortunately the gain was not visible on the compile-time tracker. In general, SmallVector is supposed to be better than std::vector because it does less work regarding exception handling and, for small sizes, keeps its elements in inline storage instead of allocating them on the heap.
- Remove a redundant two-time lookup - My mentor pointed this out to me, and it does have a small impact, but not enough to really warrant a patch of its own. I will probably commit it as an NFC (no functional change) commit later on.
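Here is a simplified illustration of why a move constructor lets a type alias like VarLocInMBB drop the std::unique_ptr. Containers that store values inline and relocate them when they grow (std::vector here, standing in for a DenseMap-style table) need the value type to be movable; otherwise each value has to be boxed behind a pointer. SharedAllocator, BitSetLike and both aliases are hypothetical; this is not the real CoalescingBitVector.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical allocator shared by many sets.
struct SharedAllocator {};

// Hypothetical bit-vector-like class holding a pointer to a shared allocator.
// Copying is disabled to keep the sketch simple; moving is allowed.
class BitSetLike {
  SharedAllocator *Alloc;
  std::vector<bool> Bits;

public:
  explicit BitSetLike(SharedAllocator &A) : Alloc(&A) {}

  BitSetLike(const BitSetLike &) = delete;
  BitSetLike &operator=(const BitSetLike &) = delete;

  // Moving transfers the bits and keeps pointing at the same allocator.
  BitSetLike(BitSetLike &&Other) noexcept
      : Alloc(Other.Alloc), Bits(std::move(Other.Bits)) {}
  BitSetLike &operator=(BitSetLike &&Other) noexcept {
    Alloc = Other.Alloc;
    Bits = std::move(Other.Bits);
    return *this;
  }

  void set(std::size_t Idx) {
    if (Idx >= Bits.size())
      Bits.resize(Idx + 1);
    Bits[Idx] = true;
  }
  bool test(std::size_t Idx) const { return Idx < Bits.size() && Bits[Idx]; }
  SharedAllocator *getAllocator() const { return Alloc; }
};

// Without a move constructor, each set must be boxed: one extra heap
// allocation and one extra pointer indirection per element.
using PerBlockSetsBoxed = std::vector<std::unique_ptr<BitSetLike>>;

// With a move constructor, the sets can be stored inline in the container.
using PerBlockSets = std::vector<BitSetLike>;
```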
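The collectIDsForRegs mistake is an instance of a general trap: when a function sorts a collection and later takes front(), front() means "smallest" only because of the sort, so a sort that looks removable can be load-bearing. The helpers below are hypothetical and are not the actual collectIDsForRegs code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sorts a copy first, so front() is the smallest ID.
unsigned firstIDSorted(std::vector<unsigned> IDs) {
  assert(!IDs.empty() && "need at least one ID");
  std::sort(IDs.begin(), IDs.end());
  return IDs.front();
}

// The "optimized" version: without the sort, front() is just whichever ID
// happened to be collected first, so the result silently changes.
unsigned firstIDUnsorted(const std::vector<unsigned> &IDs) {
  assert(!IDs.empty() && "need at least one ID");
  return IDs.front();
}

// If the smallest element really were the only thing needed, a single
// linear pass with std::min_element gives the same answer as sorting.
unsigned firstIDMinElement(const std::vector<unsigned> &IDs) {
  assert(!IDs.empty() && "need at least one ID");
  return *std::min_element(IDs.begin(), IDs.end());
}
```

The min_element shortcut is only valid when nothing else, such as a later loop, depends on the sorted order.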
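To show the inline-storage advantage of SmallVector over std::vector, here is a deliberately tiny sketch. TinySmallVector is hypothetical and much simpler than LLVM's SmallVector (trivially copyable element types only, push_back and indexing only). The first N elements live in a buffer inside the object, and the heap is only touched once that buffer overflows.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical minimal small vector: the first N elements are stored inline,
// and only when that buffer overflows are they copied into a heap vector.
template <typename T, std::size_t N>
class TinySmallVector {
  static_assert(N > 0, "inline capacity must be non-zero");
  static_assert(std::is_trivially_copyable<T>::value,
                "this sketch only handles trivially copyable types");

  T Inline[N] = {};    // inline buffer, no heap allocation
  std::vector<T> Heap; // used only after the inline buffer overflows
  std::size_t Size = 0;
  bool OnHeap = false;

public:
  void push_back(const T &V) {
    if (!OnHeap && Size < N) {
      Inline[Size++] = V;
      return;
    }
    if (!OnHeap) {
      // First overflow: copy the inline elements to the heap once.
      Heap.assign(Inline, Inline + Size);
      OnHeap = true;
    }
    Heap.push_back(V);
    ++Size;
  }

  T &operator[](std::size_t I) {
    assert(I < Size && "index out of range");
    return OnHeap ? Heap[I] : Inline[I];
  }

  std::size_t size() const { return Size; }
  bool isSmall() const { return !OnHeap; }
};
```

LLVM's real SmallVector avoids the OnHeap branch on every access by keeping a single begin pointer that points either at the inline buffer or at the heap allocation.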
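A redundant two-time lookup usually looks like the generic pattern below, sketched with std::unordered_map (the function names are hypothetical, and this is not the actual VarLocBasedLDV code). Checking with count() and then reading with operator[] hashes and probes the key twice, while find() does it once and reuses the iterator.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Two lookups: count() hashes and probes once, operator[] does it again.
int getOrZeroTwoLookups(std::unordered_map<std::string, int> &M,
                        const std::string &K) {
  if (M.count(K))
    return M[K];
  return 0;
}

// One lookup: find() returns an iterator that is reused for the access.
int getOrZeroOneLookup(const std::unordered_map<std::string, int> &M,
                       const std::string &K) {
  auto It = M.find(K);
  return It == M.end() ? 0 : It->second;
}

// The same idea for insertion: try_emplace (C++17) inserts only if the key is
// absent and reports whether it did, instead of find() followed by insert().
bool insertIfAbsent(std::unordered_map<std::string, int> &M,
                    const std::string &K, int V) {
  return M.try_emplace(K, V).second;
}
```

LLVM's DenseMap supports the same pattern through find() and try_emplace(), and its lookup() returns a default-constructed value when the key is missing.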
So, those were all the experiments I did this week. There were quite a few of them, but unfortunately none panned out the way I wanted. However, I do feel a few of them deserve a closer look, and I will probably come back to them.