GSoC 2023 Week 1 - Getting Started

This blog post is related to my GSoC 2023 Project.

Before anything else, I would like to thank my mentor Nikita Popov for accepting my proposal and agreeing to mentor me. Working on LLVM has been a dream of mine ever since I discovered both it and GSoC in 2021, and being able to realise that dream within the short span of two years means a lot to me.

The project I have decided to work on is Improving Compile Times, which is about finding and exploiting optimization opportunities within LLVM to decrease geomean compile times by (realistically) 1% to (optimistically) 4-5%. The main targets for optimization are SelectionDAG and, within it, DAGCombiner, as they account for a major portion of compile time.

To start with, following an idea from my mentor, I removed a redundant call to the expensive SelectionDAG::computeKnownBits function from TargetLowering::SimplifyDemandedBits for the ISD::MUL, ISD::ADD and ISD::SUB cases. Some optimization opportunities remain, though: the NSW (No Signed Wrap) flag is currently not taken into account, because KnownBits::computeForAddCarry (which computeKnownBits calls) does not use it either. Ignoring the flag keeps the change an NFC, and further work can be done in computeForAddCarry once it is clear whether or not the add-with-carry paths can soundly honour the NSW flag.
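To give a rough idea of what this kind of change looks like, here is a simplified sketch (my own illustration, not the actual patch, and the helper signatures are as I remember them from current KnownBits): once the operands' known bits are already available from the recursive SimplifyDemandedBits calls, the node's known bits can be derived from them directly instead of re-running computeKnownBits over the whole subtree.

```cpp
#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// Illustrative helper only: derive the known bits of an add/sub/mul node
// from its operands' already-computed KnownBits, rather than calling the
// expensive SelectionDAG::computeKnownBits again.
static KnownBits knownBitsForBinOp(unsigned Opcode, const KnownBits &LHS,
                                   const KnownBits &RHS) {
  switch (Opcode) {
  case ISD::ADD:
  case ISD::SUB:
    // NSW is passed as false so the result matches what computeKnownBits
    // (via computeForAddCarry) produces today, keeping the change an NFC.
    return KnownBits::computeForAddSub(Opcode == ISD::ADD, /*NSW=*/false,
                                       LHS, RHS);
  case ISD::MUL:
    return KnownBits::mul(LHS, RHS);
  default:
    return KnownBits(LHS.getBitWidth()); // nothing known
  }
}
```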

This change nets a speedup of 0.26% on geomean compile times, which is quite surprising given how simple it is. I suspect that, because ADD, SUB and MUL instructions appear so frequently, any work saved on their code paths adds up to a large gain.

The next bit of work involves moving a fold of an ISD::ADD to an ISD::OR when the two operands have no shared known bits, i.e. folding a + b into a | b, which is valid because without any possibly-set bit in common the addition can never carry. This code currently resides in DAGCombiner::visitADD and calls SelectionDAG::haveNoCommonBitsSet, which is quite expensive as, once again, it involves calls to computeKnownBits. All the information required to carry out this transformation is already present when SimplifyDemandedBits is called through visitADDLike.
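A rough sketch of what the moved fold could look like (illustrative only; KnownOp0/KnownOp1 and the surrounding variables are assumptions about the SimplifyDemandedBits context, not the final patch): with the operands' known bits already at hand, the "no common bits" check reduces to a cheap bitwise test.

```cpp
// Illustrative fragment, not the final patch. KnownOp0/KnownOp1 are assumed
// to be the operands' KnownBits already computed by SimplifyDemandedBits.
// If every bit position is known zero in at least one operand, the addition
// can never carry, so a + b is equivalent to a | b.
if ((KnownOp0.Zero | KnownOp1.Zero).isAllOnes()) {
  SDValue NewOr =
      TLO.DAG.getNode(ISD::OR, SDLoc(Op), Op.getValueType(), Op0, Op1);
  return TLO.CombineTo(Op, NewOr);
}
```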

This transformation is therefore fairly easy to move into SimplifyDemandedBits. The issue I have been stuck on for the past two days, however, is that the move causes transformation failures: ADDs are no longer folded to ORs in some cases (for example, vpaddd -> vpor under AVX in llvm/test/CodeGen/X86/combine-shl.ll). Hopefully I can figure this out by the weekend and submit a patch, as it seems strange that the same logic in a different place on the same code path produces different results.

The next bit of work will likely involve either adding support for the NSW flag in KnownBits::computeForAddCarry or starting on the part of my proposal that involves fixing issue #33910. The idea there is that a large amount of time is spent in llvm::MDNodeKeyImpl<llvm::DIDerivedType>::isKeyOf, a fairly expensive comparison that can hopefully be sped up by rearranging its checks so that the members most likely to be unequal are compared first. It will be interesting to see what gains can actually be had from such a change.
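To illustrate the idea (the struct and field names below are made up for the example and do not match the real MDNodeKeyImpl<DIDerivedType> fields): in a short-circuiting chain of == comparisons, ordering the fields from most to least likely to differ lets most lookups against non-matching keys bail out after the first check.

```cpp
#include <string>

// Made-up example key; the fields and their "likelihood of differing" are
// assumptions for illustration only, not the real DIDerivedType key.
struct ExampleKey {
  unsigned Line;      // assumed to differ often between distinct nodes
  std::string Name;   // assumed to differ often as well
  const void *Scope;  // assumed to be widely shared, so compared later
  unsigned Tag;       // assumed to be almost always equal, so compared last

  // The && chain short-circuits, so putting the most discriminating
  // comparisons first means unequal keys are rejected cheaply.
  bool isKeyOf(const ExampleKey &RHS) const {
    return Line == RHS.Line && Name == RHS.Name && Scope == RHS.Scope &&
           Tag == RHS.Tag;
  }
};
```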

For now, until I figure out the SimplifyDemandedBits issue, I am binge-watching LLVM developer meeting talks on YouTube to learn more about what actually goes on behind the scenes in LLVM, mainly focusing on the backend, i.e. things such as SelectionDAG, MIR and the MC layer. These talks have already been immensely helpful in understanding the vast and complex systems that make up LLVM, and I highly recommend them to anyone interested in getting into working on LLVM.