GSoC 2023 Week 10 and 11 - A Break And A Pass
24 Jul 2023

This blog post is related to my GSoC 2023 Project.
I decided to take a break during week 10, mainly because I was starting to get tired but also because it was the last
week of my summer vacations and I wanted to clear my mind before heading back to college. Thus, I did not really do much
work during this week. I committed a patch to generalize the function `foldAndOrOfICmpEqZeroAndICmp`, which originally
performed the following transforms:

    (icmp eq X, 0) | (icmp ult Other, X) -> (icmp ule Other, X-1)
    (icmp ne X, 0) & (icmp uge Other, X) -> (icmp ugt Other, X-1)
While looking at this bug, my mentor pointed out that this function was a good candidate to base the patch off of.
Looking further into it, I realized that the fold could be generalized to the following
transforms, and renamed the function to `foldAndOrOfICmpEqConstantAndICmp`:

    (icmp eq X, C) | (icmp ult Other, (X - C)) -> (icmp ule Other, (X - (C + 1)))
    (icmp ne X, C) & (icmp uge Other, (X - C)) -> (icmp ugt Other, (X - (C + 1)))
And that was pretty much all I did. The past week, I have mainly been working on three things:
- Fixing bug #33910

  As outlined in my proposal, this is a bug where O(N^2) behaviour was being caused in the metadata parser. This was mainly because of two things:

  - The files provided along with the bug had roughly 20,000 almost-identical metadata lines which differed only in the `OffsetInBits` field. I have covered this before here.
  - The second part, which I have not made a patch for yet, involves changing the `replaceAllUsesWith` function so that it uniques nodes only once. This is still a work in progress, though it is mostly complete. It is a much bigger change, as it involves adding a way to call `MDNode::handleChangedOperand` without having it unique the node each time. To do this, I've introduced two functions, `resolveChangedOperandWithoutUniquing` and `handleChangedOperandWithoutUniquing`, which add the node onto a map instead. The map is needed because `MDNode` calls the function `resolveAfterOperandChange` after uniquing, so we need to store the pointer to the `MDNode` and then the two pointers to the modified operands. This leads to the type `SmallDenseMap<Metadata *, SmallVector<std::pair<void *, Metadata *>>>`. The vector is required because there may be multiple operands that need to be resolved after the node is uniqued. While the change itself is quite simple, the problem is mainly that `Metadata.h` is included absolutely everywhere, so every change to that header ends up recompiling approximately 1600-2000 files.
- Implementing the `InferAlignment` pass

  This pass was initially implemented by my mentor back in 2020; however, problems arose because of the constant expression support in LLVM IR, and the patch was essentially abandoned. Now that he has removed almost all of the constant expression support, the pass can be added again. The implementation carries over almost exactly from the original commit; it is just updated to use the new pass manager.

  This pass was mainly motivated by the observation that `InstCombine` was spending a lot of its time inferring alignment on instructions, which is information that generally profits the back-end rather than the middle-end. So this pass slots in somewhere before the back-end passes run, the best place likely being just before loop unrolling, as unrolling seems to disrupt the known-bits information that this pass relies on. It gives a really decent result:

  - On removing alignment inference from `InstCombine`: -1.3% instrs(u)
  - On adding the pass into the pipeline: +0.11% instrs(u)

  Overall, this means a ~1.1% speedup.
- Patch D155958

  This patch fixes an inefficiency that my mentor noticed: the handling for `LoadInst` in `ValueTracking::isKnownNonZero` ends up calling `computeKnownBits` more than for any other instruction, even though the call is not necessary at all, because this information is already checked by `rangeMetadataExcludesValue` here:

      if (auto *I = dyn_cast<Instruction>(V)) {
        if (MDNode *Ranges = Q.IIQ.getMetadata(I, LLVMContext::MD_range)) {
          // If the possible ranges don't contain zero, then the value is
          // definitely non-zero.
          if (auto *Ty = dyn_cast<IntegerType>(V->getType())) {
            const APInt ZeroValue(Ty->getBitWidth(), 0);
            if (rangeMetadataExcludesValue(Ranges, ZeroValue))
              return true;
          }
        }
      }

  This is because `computeKnownBits` for `LoadInst` only checks the range metadata as well, so calling it is completely redundant. It also means the check ends up being done twice each time it returns false. Removing the call gives a 0.12% speedup.
In the coming week, I should be able to finalize the patches for #33910 (even though they are not likely to be merged
because it is a really fringe case, but hey, I fixed it!) and the `InferAlignment` pass. After that, I will also try to
fix the other bugs that I mentioned in my proposal.