GSoC 2023 Week 1 - Getting Started
11 May 2023
This blog post is related to my GSoC 2023 project.
Before anything, I would like to thank my mentor Nikita Popov for deciding to mentor me and accepting my proposal. Working on LLVM has been a dream of mine ever since I discovered both it and GSoC in 2021, and being able to realise that dream within the short span of two years means a lot to me.
The project that I have decided to work on is Improving Compile Times, which is basically about finding and exploiting optimization opportunities within LLVM to decrease geomean compile times by (realistically) 1% to (optimistically) 4-5%. For this, the main targets for optimization are SelectionDAG and, within it, DAGCombiner, as they use up a major portion of compile time.
To start with, following an idea from my mentor, I removed a redundant call to the expensive SelectionDAG::computeKnownBits function from TargetLowering::SimplifyDemandedBits for the ISD::MUL, ISD::ADD and ISD::SUB cases. More optimization opportunities remain though, as the NSW (No Signed Wrap) flag is currently not used; it is also not used by the KnownBits::computeForAddCarry function (which is called by computeKnownBits). This ensures that the change remains an NFC, and further work can be done in computeForAddCarry with more research into whether or not the add-with-carry instructions support the NSW flag.

This change nets a speedup of 0.26% on the geomean compile times, which is quite surprising given just how simple it is. I suspect that because ADD, SUB and MUL instructions are so frequent, any work done on their code paths nets a large gain.
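As an aside for anyone unfamiliar with known-bits analysis, here is a toy, standalone sketch of the kind of fact such an analysis derives for an addition. This is not LLVM's actual implementation (the real KnownBits::computeForAddCarry tracks far more than this); it only illustrates the idea that knowing bits of the operands lets you know bits of the result.

```cpp
#include <cassert>
#include <cstdint>

// Toy version of LLVM's KnownBits: a set bit in Zero means that bit is
// definitely 0, a set bit in One means it is definitely 1.
// The two masks never overlap.
struct ToyKnownBits {
  uint64_t Zero = 0;
  uint64_t One = 0;
};

// One simple fact an add-specific known-bits computation can derive:
// if the low k bits of both operands are known to be zero, no carry can
// originate in those positions, so the low k bits of the sum are zero too.
ToyKnownBits knownBitsForAdd(ToyKnownBits LHS, ToyKnownBits RHS) {
  ToyKnownBits Result;
  uint64_t BothZero = LHS.Zero & RHS.Zero;
  // Keep only the contiguous run of known-zero bits starting at bit 0.
  for (int I = 0; I < 64 && ((BothZero >> I) & 1); ++I)
    Result.Zero |= uint64_t(1) << I;
  return Result;
}

int main() {
  // Both operands are multiples of 8, i.e. their low three bits are zero.
  ToyKnownBits A{/*Zero=*/0x7, /*One=*/0x0};
  ToyKnownBits B{/*Zero=*/0x7, /*One=*/0x0};
  // The sum is then also a multiple of 8.
  assert(knownBitsForAdd(A, B).Zero == 0x7);
  return 0;
}
```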
The next bit of work to be done involves moving a fold of an ISD::ADD to an ISD::OR when the two operands do not have any shared known bits, i.e. folding a + b into a | b. This code currently resides in the DAGCombiner::visitADD function and calls the SelectionDAG::haveNoCommonBitsSet function, which is quite expensive as once again it involves calls to computeKnownBits. All the information required to carry out this transformation is already present when SimplifyDemandedBits is called through visitADDLike.
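To see why the fold is valid, here is a minimal standalone example, independent of any SelectionDAG machinery: when two values share no set bits, the addition can never produce a carry, so it behaves exactly like a bitwise OR.

```cpp
#include <cassert>
#include <cstdint>

// The fold being moved: a + b  ==>  a | b  whenever (a & b) == 0,
// because with no common bits set the addition can never carry.
int main() {
  uint32_t A = 0b1010'0000; // bits only in the high nibble
  uint32_t B = 0b0000'0101; // bits only in the low nibble
  assert((A & B) == 0);     // no common bits set
  assert(A + B == (A | B)); // so the add and the or agree
  return 0;
}
```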
This transformation should therefore be fairly easy to move into SimplifyDemandedBits. However, the issue that I am currently stuck on, and have been stuck on for the past two days, is that the move causes transformation failures where ADDs are not correctly folded to ORs in some cases (for example AVX, vpaddd -> vpor in llvm/test/CodeGen/X86/combine-shl.ll). Hopefully I can figure out this issue by this weekend and have a patch submitted for it, as it seems strange that the same logic in a different place within the same code path is producing differing results.
The next bit of work that I can do likely involves either adding support for the NSW flag in the KnownBits::computeForAddCarry function or starting on the part of my proposal that involves fixing issue #33910. The idea I have for this issue is that a large amount of time is spent in llvm::MDNodeKeyImpl<llvm::DIDerivedType>::isKeyOf, a very expensive comparison which can hopefully be sped up by rearranging the checks so that the members most likely to be unequal are compared first. It will be interesting to see what gains can actually be had from such a change.
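To sketch what I mean by rearranging the comparisons, here is a standalone example. The key type and its field names below are made up purely for illustration and are not the real MDNodeKeyImpl<DIDerivedType> fields; which members are actually most likely to differ is something I will still have to measure.

```cpp
#include <string>

// Hypothetical key type, invented for this sketch.
struct NodeKey {
  unsigned Tag;
  unsigned Line;
  unsigned SizeInBits;
  std::string Name;
  std::string File;
};

// Since the && chain short-circuits, putting cheap members that are likely
// to differ first lets the common "not equal" case bail out before the
// expensive string comparisons are ever reached.
bool isKeyOf(const NodeKey &Key, const NodeKey &Node) {
  return Key.Line == Node.Line &&             // cheap, often different
         Key.SizeInBits == Node.SizeInBits && // cheap
         Key.Tag == Node.Tag &&               // cheap, rarely differs
         Key.Name == Node.Name &&             // expensive string compare
         Key.File == Node.File;               // expensive string compare
}

int main() {
  NodeKey A{/*Tag=*/22, /*Line=*/10, /*SizeInBits=*/32, "foo", "a.c"};
  NodeKey B{/*Tag=*/22, /*Line=*/42, /*SizeInBits=*/32, "foo", "a.c"};
  // A and B differ on Line, so the comparison exits before touching strings.
  return isKeyOf(A, B) ? 1 : 0;
}
```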
For now, until I figure out the issue with SimplifyDemandedBits, I am binge-watching the LLVM developer meeting talks on YouTube to learn more about what actually goes on behind the scenes in LLVM, mainly focusing on the backend, i.e. things such as SelectionDAG, MIR and the MC layer. These talks have already been immensely helpful in understanding the extremely vast and complex systems that make up LLVM, and I highly recommend watching them to anyone who is interested in getting into working on LLVM.