GSoC 2023 Week 13 and 14 - Small Experiments

14 Aug 2023

This blog post is related to my GSoC 2023 Project.

These two weeks, I worked on some random experiments, of which 3 had useful results. These were mostly done on the branches perf/instcombine4 and perf/instcombine5. While the commits on both branches are functionally the same, the perf/instcombine5 branch (naturally) contains the better / more updated code which is far more likely to be made into patches.

On the perf/instcombine4 branch, I made the following commits:

Replacing a slow O(n) lookup with a DenseMap: Within the function TargetLibraryInfo::getLibFunc, there is a lookup to the static, compile-time constant array StandardNames to find the index of the queried library function. These can be functions like getc, putc, printf etc. The problem with the old approach was that it was calling the function std::lower_bound to find the element, which would be much slower than just using a map lookup.

So, I updated the code to use a DenseMap and that gave a decent 0.24% speedup.
Remove calls to computeKnownBits for non-intrinsic CallInst instructions: This was a minor thing, a continuation of the work done on moving handling for LoadInst into isKnownNonZeroFromOperator to directly check the presence of range metadata. Similarly, this commit moves the handling for “getReturnedArgOperand” from computeKnownBits into isKnownNonZeroFromOperator, which gives a slight 0.06% speedup.
Reduce calls to computeKnownBits in visitAdd for InstCombine: This commit aimed to reduce the computeKnownBits calls in computeConstantRangeIncludingKnownBits by passing the KnownBits information computed in haveNoCommonBitsSet through to willNotOverflowSignedAdd. I implemented this by simply adding overloads for the required functions that take KnownBits parameters wherever required.

This gave me a better speedup than I expected, 0.15%.

On the perf/instcombine5 branch, I made a few changes to the last commit on feedback from my mentor, that being the creation of a new class which would lazily compute the KnownBits information instead of piping it through the various functions so that no new overloads would be needed. This class ended up taking on the name CachedBitsValue, which is still a work in progress. It is a fairly simple implementation, which stores a Value * and the KnownBits information associated with it. This information is computed as-required by calling getKnownBits, and the computed value is cached within the class.

Unfortunately, to allow implicit creation from a Value *, any parameter that accepted either a Value * or a const Value * needs to be updated to use a const CachedBitsConstValue & or a const CachedBitsNonConstValue &, to allow implicit creation of a temporary. What this means is that both the KnownBits member and the boolean member that signifies its presence need to be made mutable so that they can be modified from const objects. This is quite an ugly solution, but it comes from the fact that C++ doesn’t allow you to implicitly create an object that binds to a non-const l-value reference.

With this in place however, the code for passing through KnownBits information becomes quite clean. This is shown by the first commit below, where both of these commits are on the perf/instcombine5 branch:

Reduce computeKnownBits calls: This is a rebase of the previous commit, with the overloads replaced by usage of CachedBitsValue and some extra work done to isKnownNonZero. This gives a slower speedup of 0.1%. This is most likely because of the isKnownNonZero code causing a regression, however I will have to investigate further to know for sure.
Reduce getLibFunc calls: This commit aimed to reduce duplicate calls to getLibFunc that I found within FortifiedLibCallSimplifier. This builds upon the first commit from the previous branch, and I think further work can be done here to reduce the number of calls even more. However, with the optimization introduced by using DenseMap, this doesn’t really do much and gives a negligible difference.

I also tried reducing the number of computeKnownBits calls in simplifyICmpInst. I noticed that visitICmpInst was calling foldICmpUsingKnownBits, which computes known bits information. Before that however, simplifyICmpInst calls simplifyICmpWithZero which ends up calling computeKnownBits a few times, a redundant call which can be handled by reusing the information computed before in foldICmpUsingKnownBits. This is incidentally where the change to simplifyICmpWithZero comes from in the above commit using CachedBitsValue, however it unfortunately ended up not giving a measurable improvement at all. It also led to three test regressions, which led to me abandoning the patch.

dc03

GSoC 2023 Week 13 and 14 - Small Experiments

Related Posts

GSoC 2023 Week 19 and 20 - That's All, Folks! 29 Sep 2023

GSoC 2023 Week 17 and 18 - Same Old, Same Old 10 Sep 2023

GSoC 2023 Week 15 and 16 29 Aug 2023

GSoC 2023 Week 19 and 20 - That's All, Folks!
29 Sep 2023

GSoC 2023 Week 17 and 18 - Same Old, Same Old
10 Sep 2023

GSoC 2023 Week 15 and 16
29 Aug 2023