GSoC 2023 Week 12 - Limbo

This blog post is related to my GSoC 2023 Project.

This week, not much happened. I was mainly stuck on three different issues:

The threadCmpOverPHI change is currently blocked on an optimization regression: specifically in the file CMakeFiles/consumer-typeset.dir/z10.c.o, there is a file size regression of +0.28%, which on inspecting the generated IR shows that there is a fold of an icmp into a phi that is missed even after applying D155718.

The likely reason for this occuring is the lack of recursion in threadCmpOverPHI now. However, this is something that should be handled later on in InstCombine, but is not. I have spent quite a while trying unsuccessfully to come up with reproductions that show this issue (at least 3 times by now!), where one of the following two have occured:


I’ve been experimenting with TargetLowering::SimplifyDemandedBits and SelectionDAG::computeKnownBits again. This is finally the time I feel like throwing the towel in, because I have reached the point where I feel SimplifyDemandedBits cannot be salvaged at all, at least not without a major effort.

Let me explain: this all began when my mentor had asked me to do the simple job of moving a fold from DAGCombiner::visitADD (which took A+B and converted it into A|B when A and B share no common set bits) into TargetLowering::SimplifyDemandedBits, which is where it makes a lot more sense.

Now, in something like InstCombine where the SimplifyDemandedBits and computeKnownBits implementations match rather well, this would be straightforward - just copy the code and remove the calls to computeKnownBits. However, when I tried this in DAGCombiner it started to cause all sorts of test failures because the known bits information computed by SimplifyDemandedBits and computeKnownBits was different.

Initially, I stopped working on this because it was simply not worth the time sink. But, later on I picked it up again because I felt that the only difference between the two functions was that SimplifyDemandedBits was lacking some code that was present in computeKnownBits. So, I set forth comparing the implementations of each instruction between the two functions and adding code anywhere I saw they were different. After this, nothing really changed - very few tests stopped failing, and the situation was basically the same as before.

Recently, I came back to this function because I want to reduce the number of haveNoCommonBitsSet calls that are there in DAGCombiner, so I tried to make the two functions achieve parity again. After more digging I decided to basically go the nuclear route and fell back to calling computeKnownBits for cases where I was not sure about the known bits information having a chance of being equal. But, this was still causing failures between the two functions. How?! Even after making sure pretty much all the code between the two functions was equal, I was still getting failures.

This was especially more confusing in the case of ISD::BUILD_VECTOR, whose code is just a call to computeKnownBits:

case ISD::BUILD_VECTOR:
  // Collect the known bits that are shared by every demanded element.
  // TODO: Call SimplifyDemandedBits for non-constant demanded elements.
  Known = TLO.DAG.computeKnownBits(Op, DemandedElts, Depth);

The only parameter which can be different here versus a call going through computeKnownBits is DemandedElts, and sure enough that happens to be the culprit:

if (!AssumeSingleUse && !Op.getNode()->hasOneUse()) {
  // ...
  // Allow multiple uses, just set the DemandedBits/Elts to all bits.
  DemandedBits = APInt::getAllOnes(BitWidth);
  DemandedElts = APInt::getAllOnes(NumElts);
  // ...

This code is present near the start of SimplifyDemandedBits, here. After doing some testing, the value of DemandedElts was always different for the innermost node where the known bits information computed was different between the two functions.

Because of this code, the known bits information computed for multi-use nodes is fundamentally different from that computed using computeKnownBits because the demanded bits and demanded elements information is reset to a conservative value. InstCombine works around this by calling SimplifyMultipleUseDemandedBits when a multi-use instruction is encountered, whereas TargetLowering chooses instead to try and handle it inside SimplifyDemandedBits, with calls to the function SimplifyMultipleUseDemandedBits sprinkled throughout.

What this means is that there is no real way to have SimplifyDemandedBits and computeKnownBits achieve even near-parity in the known bits information computed, because of the conservative nature of the former. To remedy this, I did try to add a call to SimplifyMultipleUseDemandedBits near the start of SimplifyDemandedBits where multi-use nodes are handled, however this led to so many test failures that I did not even try to bother working through them.

I was able to achieve near parity between the results when falling back to computeKnownBits, however this was only after disabling the above assignments to DemandedBits and DemandedElts, which made the transform much more unsound. So, that is where I decided to formally just give up on the two functions. As far as I am aware, there is no real easy way to change the functions to work similarly, and the test fallout may not be worth going through the churn with.


I finally finished wrapping up this bug with an initial implementation. There are currently a few test failures that need to be resolved when running ninja check-llvm-{assembler,bitcode,debuginfo}, however I will not be working on them because it takes a huge amount of time to make any forward progress on this patch.