GSoC 2023 Week 12 - Limbo

05 Aug 2023

This blog post is related to my GSoC 2023 Project.

This week, not much happened. I was mainly stuck on three different issues:

threadCmpOverPHI

The threadCmpOverPHI change is currently blocked on an optimization regression: specifically in the file CMakeFiles/consumer-typeset.dir/z10.c.o, there is a file size regression of +0.28%, which on inspecting the generated IR shows that there is a fold of an icmp into a phi that is missed even after applying D155718.

The likely reason for this occuring is the lack of recursion in threadCmpOverPHI now. However, this is something that should be handled later on in InstCombine, but is not. I have spent quite a while trying unsuccessfully to come up with reproductions that show this issue (at least 3 times by now!), where one of the following two have occured:

The optimization fails when running only InstCombine (via opt -p instcombine) and succeeds when running the O3 pipeline (via opt -O3).
The issue reproduces on both my branch and on trunk. This one is strange, because it is only happening with the C source file. If I take the generated IR for the file (run through clang -O3) which contains the failed fold, the IR folds just fine when running through opt or clang again. This is the more confusing one because I genuinely have no idea what is going on here.

The endless tale of SimplifyDemandedBits and computeKnownBits

I’ve been experimenting with TargetLowering::SimplifyDemandedBits and SelectionDAG::computeKnownBits again. This is finally the time I feel like throwing the towel in, because I have reached the point where I feel SimplifyDemandedBits cannot be salvaged at all, at least not without a major effort.

Let me explain: this all began when my mentor had asked me to do the simple job of moving a fold from DAGCombiner::visitADD (which took A+B and converted it into A|B when A and B share no common set bits) into TargetLowering::SimplifyDemandedBits, which is where it makes a lot more sense.

Now, in something like InstCombine where the SimplifyDemandedBits and computeKnownBits implementations match rather well, this would be straightforward - just copy the code and remove the calls to computeKnownBits. However, when I tried this in DAGCombiner it started to cause all sorts of test failures because the known bits information computed by SimplifyDemandedBits and computeKnownBits was different.

Initially, I stopped working on this because it was simply not worth the time sink. But, later on I picked it up again because I felt that the only difference between the two functions was that SimplifyDemandedBits was lacking some code that was present in computeKnownBits. So, I set forth comparing the implementations of each instruction between the two functions and adding code anywhere I saw they were different. After this, nothing really changed - very few tests stopped failing, and the situation was basically the same as before.

Recently, I came back to this function because I want to reduce the number of haveNoCommonBitsSet calls that are there in DAGCombiner, so I tried to make the two functions achieve parity again. After more digging I decided to basically go the nuclear route and fell back to calling computeKnownBits for cases where I was not sure about the known bits information having a chance of being equal. But, this was still causing failures between the two functions. How?! Even after making sure pretty much all the code between the two functions was equal, I was still getting failures.

This was especially more confusing in the case of ISD::BUILD_VECTOR, whose code is just a call to computeKnownBits:

case ISD::BUILD_VECTOR:
  // Collect the known bits that are shared by every demanded element.
  // TODO: Call SimplifyDemandedBits for non-constant demanded elements.
  Known = TLO.DAG.computeKnownBits(Op, DemandedElts, Depth);

The only parameter which can be different here versus a call going through computeKnownBits is DemandedElts, and sure enough that happens to be the culprit:

if (!AssumeSingleUse && !Op.getNode()->hasOneUse()) {
  // ...
  // Allow multiple uses, just set the DemandedBits/Elts to all bits.
  DemandedBits = APInt::getAllOnes(BitWidth);
  DemandedElts = APInt::getAllOnes(NumElts);
  // ...

This code is present near the start of SimplifyDemandedBits, here. After doing some testing, the value of DemandedElts was always different for the innermost node where the known bits information computed was different between the two functions.

Because of this code, the known bits information computed for multi-use nodes is fundamentally different from that computed using computeKnownBits because the demanded bits and demanded elements information is reset to a conservative value. InstCombine works around this by calling SimplifyMultipleUseDemandedBits when a multi-use instruction is encountered, whereas TargetLowering chooses instead to try and handle it inside SimplifyDemandedBits, with calls to the function SimplifyMultipleUseDemandedBits sprinkled throughout.

What this means is that there is no real way to have SimplifyDemandedBits and computeKnownBits achieve even near-parity in the known bits information computed, because of the conservative nature of the former. To remedy this, I did try to add a call to SimplifyMultipleUseDemandedBits near the start of SimplifyDemandedBits where multi-use nodes are handled, however this led to so many test failures that I did not even try to bother working through them.

I was able to achieve near parity between the results when falling back to computeKnownBits, however this was only after disabling the above assignments to DemandedBits and DemandedElts, which made the transform much more unsound. So, that is where I decided to formally just give up on the two functions. As far as I am aware, there is no real easy way to change the functions to work similarly, and the test fallout may not be worth going through the churn with.

Bug #33910

I finally finished wrapping up this bug with an initial implementation. There are currently a few test failures that need to be resolved when running ninja check-llvm-{assembler,bitcode,debuginfo}, however I will not be working on them because it takes a huge amount of time to make any forward progress on this patch.

dc03

GSoC 2023 Week 12 - Limbo

`threadCmpOverPHI`

The endless tale of `SimplifyDemandedBits` and `computeKnownBits`

Bug #33910

dc03

GSoC 2023 Week 12 - Limbo

threadCmpOverPHI

The endless tale of SimplifyDemandedBits and computeKnownBits

Bug #33910

Related Posts

GSoC 2023 Week 19 and 20 - That's All, Folks! 29 Sep 2023

GSoC 2023 Week 17 and 18 - Same Old, Same Old 10 Sep 2023

GSoC 2023 Week 15 and 16 29 Aug 2023

`threadCmpOverPHI`

The endless tale of `SimplifyDemandedBits` and `computeKnownBits`

GSoC 2023 Week 19 and 20 - That's All, Folks!
29 Sep 2023

GSoC 2023 Week 17 and 18 - Same Old, Same Old
10 Sep 2023

GSoC 2023 Week 15 and 16
29 Aug 2023