GSoC 2023 Week 10 and 11 - A Break And A Pass
24 Jul 2023

This blog post is related to my GSoC 2023 Project.
I decided to take a break during week 10, mainly because I was starting to get tired but also because it was the last
week of my summer vacations and I wanted to clear my mind before heading back to college. Thus, I did not really do much
work during this week. I committed a patch to generalize the function `foldAndOrOfICmpEqZeroAndICmp`, which originally
performed the following transforms:

    (icmp eq X, 0) | (icmp ult Other, X) -> (icmp ule Other, X-1)
    (icmp ne X, 0) & (icmp uge Other, X) -> (icmp ugt Other, X-1)
While looking at this bug, my mentor pointed out that this function was a good candidate to base the patch off of.
Looking further into it, I realized that the fold could be generalized to the following
transforms, and renamed the function to `foldAndOrOfICmpEqConstantAndICmp`:

    (icmp eq X, C) | (icmp ult Other, (X - C)) -> (icmp ule Other, (X - (C + 1)))
    (icmp ne X, C) & (icmp uge Other, (X - C)) -> (icmp ugt Other, (X - (C + 1)))
And that was pretty much all I did. The past week, I have mainly been working on three things:
- Fixing bug #33910

  As outlined in my proposal, this is a bug where O(N^2) behaviour was being caused in the metadata parser. This was mainly because of two things:

  - The files provided along with the bug had roughly 20,000 almost-identical metadata lines which differed only in the `OffsetInBits` field. I have covered this before here.
  - The second part, which I have not made a patch for yet, involves changing the `replaceAllUsesWith` function so that it uniques nodes only once. This is still a work in progress, though it is mostly complete. It is a much bigger change, as it involves adding a way to call `MDNode::handleChangedOperand` without having it unique the node each time. To do this, I've introduced two functions, `resolveChangedOperandWithoutUniquing` and `handleChangedOperandWithoutUniquing`, which add the node onto a map instead. The map is needed because `MDNode` calls the function `resolveAfterOperandChange` after uniquing, so we need to store the pointer to the `MDNode` and then the two pointers to the modified operands. This leads to the type `SmallDenseMap<Metadata *, SmallVector<std::pair<void *, Metadata *>>>`. The vector is required because there may be multiple operands that need to be resolved after the node is uniqued. While the change itself is quite simple, the problem is mainly that `Metadata.h` is included absolutely everywhere, so every change to that header ends up recompiling approximately 1600-2000 files.
- Implementing the `InferAlignment` pass

  This pass was initially implemented by my mentor back in 2020; however, problems arose because of the constant expression support in LLVM IR, and the patch was essentially abandoned. Now that he has removed almost all of the constant expression support, the pass can be added again. The implementation carries over almost exactly from the original commit; it is just updated to use the new pass manager.

  This pass was mainly motivated by the observation that `InstCombine` was spending a lot of its time inferring alignment on instructions, which is information that generally profits the back-end rather than the middle-end. So this pass slots in somewhere before the back-end passes run, the best place likely being just before loop unrolling, as unrolling seems to disrupt the known-bits information that this pass relies on. It gives a really decent result:

  - On removing alignment inference from `InstCombine`: -1.3% instrs(u)
  - On adding the pass into the pipeline: +0.11% instrs(u)

  Overall, this means a ~1.1% speedup.
- Patch D155958

  This patch fixes an inefficiency that my mentor noticed: the handling for `LoadInst` in `ValueTracking::isKnownNonZero` ends up calling `computeKnownBits` more than for any other instruction, even though the call is not necessary at all, because this information is already checked by `rangeMetadataExcludesValue` here:

      if (auto *I = dyn_cast<Instruction>(V)) {
        if (MDNode *Ranges = Q.IIQ.getMetadata(I, LLVMContext::MD_range)) {
          // If the possible ranges don't contain zero, then the value is
          // definitely non-zero.
          if (auto *Ty = dyn_cast<IntegerType>(V->getType())) {
            const APInt ZeroValue(Ty->getBitWidth(), 0);
            if (rangeMetadataExcludesValue(Ranges, ZeroValue))
              return true;
          }
        }
      }

  This is because `computeKnownBits` for `LoadInst` only checks the range metadata as well, so calling it is completely redundant. It also means the check ends up being done twice each time it returns false. Removing the call gives a 0.12% speedup.
In the coming week, I should be able to finalize the patches for #33910 (even though they are not likely to be merged
because it is a really fringe case, but hey, I fixed it!) and the `InferAlignment` pass. After that, I will also try to
fix the other bugs that I mentioned in my proposal.