GSoC 2023 Week 12 - Limbo
05 Aug 2023This blog post is related to my GSoC 2023 Project.
This week, not much happened. I was mainly stuck on three different issues:
The threadCmpOverPHI
change is currently blocked on an optimization regression: specifically in
the file CMakeFiles/consumer-typeset.dir/z10.c.o
, there is a file size regression of +0.28%, which on inspecting the
generated IR shows that there is a fold of an icmp into a phi that is missed even after applying D155718
.
The likely reason for this occuring is the lack of recursion in threadCmpOverPHI
now. However, this is something
that should be handled later on in InstCombine
, but is not. I have spent quite a while trying unsuccessfully to come
up with reproductions that show this issue (at least 3 times by now!), where one of the following two have occured:
- The optimization fails when running only
InstCombine
(viaopt -p instcombine
) and succeeds when running theO3
pipeline (viaopt -O3
). - The issue reproduces on both my branch and on trunk. This one is strange, because it is only happening with the C
source file. If I take the generated IR for the file (run through
clang -O3
) which contains the failed fold, the IR folds just fine when running throughopt
orclang
again. This is the more confusing one because I genuinely have no idea what is going on here.
I’ve been experimenting with TargetLowering::SimplifyDemandedBits
and SelectionDAG::computeKnownBits
again. This
is finally the time I feel like throwing the towel in, because I have reached the point where I feel
SimplifyDemandedBits
cannot be salvaged at all, at least not without a major effort.
Let me explain: this all began when my mentor had asked me to do the simple job of moving a fold from
DAGCombiner::visitADD
(which took A+B
and converted it into A|B
when A
and B
share no common set bits) into
TargetLowering::SimplifyDemandedBits
, which is where it makes a lot more sense.
Now, in something like InstCombine
where the SimplifyDemandedBits
and computeKnownBits
implementations match
rather well, this would be straightforward - just copy the code and remove the calls to computeKnownBits
. However,
when I tried this in DAGCombiner
it started to cause all sorts of test failures because the known bits information
computed by SimplifyDemandedBits
and computeKnownBits
was different.
Initially, I stopped working on this because it was simply not worth the time sink. But, later on I picked it up again
because I felt that the only difference between the two functions was that SimplifyDemandedBits
was lacking some code
that was present in computeKnownBits
. So, I set forth comparing the implementations of each instruction between the
two functions and adding code anywhere I saw they were different. After this, nothing really changed - very few tests
stopped failing, and the situation was basically the same as before.
Recently, I came back to this function because I want to reduce the number of haveNoCommonBitsSet
calls that are there
in DAGCombiner
, so I tried to make the two functions achieve parity again. After more digging I decided to basically
go the nuclear route and fell back to calling computeKnownBits
for cases where I was not sure about the known bits
information having a chance of being equal. But, this was still causing failures between the two functions. How?! Even
after making sure pretty much all the code between the two functions was equal, I was still getting failures.
This was especially more confusing in the case of ISD::BUILD_VECTOR
, whose code is just a call to computeKnownBits
:
case ISD::BUILD_VECTOR:
// Collect the known bits that are shared by every demanded element.
// TODO: Call SimplifyDemandedBits for non-constant demanded elements.
Known = TLO.DAG.computeKnownBits(Op, DemandedElts, Depth);
The only parameter which can be different here versus a call going through computeKnownBits
is DemandedElts
, and
sure enough that happens to be the culprit:
if (!AssumeSingleUse && !Op.getNode()->hasOneUse()) {
// ...
// Allow multiple uses, just set the DemandedBits/Elts to all bits.
DemandedBits = APInt::getAllOnes(BitWidth);
DemandedElts = APInt::getAllOnes(NumElts);
// ...
This code is present near the start of SimplifyDemandedBits
, here. After doing some
testing, the value of DemandedElts
was always different for the innermost node where the known bits information
computed was different between the two functions.
Because of this code, the known bits information computed for multi-use nodes is fundamentally different from that
computed using computeKnownBits
because the demanded bits and demanded elements information is reset to a conservative
value. InstCombine
works around this by calling SimplifyMultipleUseDemandedBits
when a multi-use instruction is
encountered, whereas TargetLowering
chooses instead to try and handle it inside SimplifyDemandedBits
, with calls to
the function SimplifyMultipleUseDemandedBits
sprinkled throughout.
What this means is that there is no real way to have SimplifyDemandedBits
and computeKnownBits
achieve even
near-parity in the known bits information computed, because of the conservative nature of the former. To remedy this, I
did try to add a call to SimplifyMultipleUseDemandedBits
near the start of SimplifyDemandedBits
where multi-use
nodes are handled, however this led to so many test failures that I did not even try to bother working through them.
I was able to achieve near parity between the results when falling back to computeKnownBits
, however this was only
after disabling the above assignments to DemandedBits
and DemandedElts
, which made the transform much more unsound.
So, that is where I decided to formally just give up on the two functions. As far as I am aware, there is no real easy
way to change the functions to work similarly, and the test fallout may not be worth going through the churn with.
I finally finished wrapping up this bug with an initial implementation. There are currently a few test
failures that need to be resolved when running ninja check-llvm-{assembler,bitcode,debuginfo}
, however I will not be
working on them because it takes a huge amount of time to make any forward progress on this patch.