GSoC 2023 Week 7 - FastISel Is Really Fast
24 Jun 2023
This blog post is related to my GSoC 2023 Project.
This week, I spent most of my time on FastISel and VarLocBasedLDV. In the process, I learnt
that FastISel is, as the name implies, already quite “fast”. I also tried to optimize
VarLocBasedLDV; however, I fell into a derp-induced “optimization” that gave me the false confidence that I
had actually found a good improvement :D.
So, with FastISel I tried a few optimizations I thought would have good potential:
- Remove toIndex_ as a member of IndexedMap - I feel I should still check the validity of this commit. Even though it didn't give any useful performance improvement, it removes an unnecessary member from a container, which may have benefits in terms of memory usage. I found it a bit odd that the functor used to map the index was instantiated as a non-static member, as such functors are usually constructed as temporary objects in std:: functions.
- Give MapVector better performance for small sizes - I definitely expected this one to have really good results, since it is basically the same thing I did for SetVector, just applied to MapVector. Maybe I didn't replace enough uses. Maybe being a map instead of a set causes worse memory behaviour, since the values sit between the keys and create gaps, but I don't know. I feel I should come back to this optimization, because I really do think there is performance to be gained here.
- Replace usages of MapVector with SmallMapVector - A follow-up to the previous change; unfortunately, no measurable improvements :(.
- Replace DenseMap with SmallDenseMap - I expected this to improve performance, but for some reason it didn't. I have noticed that replacing the small versions of DenseMap and DenseSet with the generic large versions generally gives better performance, but I have no idea why.
- Replace usages of MapVector with SmallMapVector in clang - Same as before, just in clang. No measurable difference, unfortunately.
- Reduce the number of branches in ScheduleDAGSDNodes - A bit of a detour from the FastISel work; for some reason, I committed it under the same branch. Not many measurable results here either, sadly.
I also got a good number of experiments out of VarLocBasedLDV; however, I haven't hammered them into anything
useful yet:
- Make CoalescingBitVector movable again - This change reverts a change made by a previous patch, https://reviews.llvm.org/D76465, which removed the move constructors; I'm not really sure why. So, I added them back and removed the unnecessary std::unique_ptr in the type alias VarLocInMBB. This gave a noise-equivalent speedup, but I think it's still worth pursuing for memory reasons.
- Messing around with data structures - As I said before, this is one of the things I like doing most. However, in this case it actually led to a major slowdown of ~0.3%. That is not good.
- “Optimize” collectIDsForRegs - “Optimize” is in quotes because I didn't realize at the time that I was removing an iteration over the sorted collection from the for-loop. I did that, then forgot about it, then removed the sort from the function entirely, because the only use I saw for it was taking the front() element. Not my brightest moment.
- Replace std::vector with SmallVector - This gave a gain locally; unfortunately, the gain was not visible on the compile time tracker. In general, SmallVector is supposed to be better than std::vector because it does less work related to exception safety, among other things.
- Remove a redundant two-time lookup - My mentor pointed this out to me. It does have a bit of an impact, but not enough to really warrant a patch. I will probably commit it as an NFC later on.
So, those were all the experiments I did this week. There were quite a few of them; unfortunately, none panned out the way I wanted. However, I do feel a few of them should still be looked into further, and I will probably do so.