GSoC 2023 Week 9 - A Saturating Arithmetic And ICmp Fold
08 Jul 2023
This blog post is related to my GSoC 2023 Project.
This week, I worked on a couple of optimizations in InstCombine
: folds of the form
u[add|sub]_sat(X, C) pred C2
, where both C
and C2
are constants. This fold
takes on the general formula:
u[add|sub]_sat(X, C) Pred C2
=> (OpWillWrap ? SaturatingValue : (X Binop C)) Pred C2
=> OpWillWrap ? (SaturatingValue Pred C2) : ((X Binop C) Pred C2)
When the value of SaturatingValue Pred C2
is true, we get the following fold:
OpWillWrap ? true : ((X Binop C) Pred C2)
=> OpWillWrap || ((X Binop C) Pred C2
When the value is false, we get the following fold:
OpWillWrap ? false : ((X Binop C) Pred C2)
=> !OpWillWrap ? ((X Binop C) Pred C2 : false
=> !OpWillWrap && ((X Binop C) Pred C2
It is usually possible to combine these two checks into just one comparison. For the two intrinsics currently
implemented, the OpWillWrap
checks look like the following:
-
usub_sat
:X < C
-
uadd_sat
:X >= ~C
While the usub_sat
check is quite obvious, I found the uadd_sat
one is not, so here is a
little derivation:
(X + C >= UINT_MAX)
=> (X >= UINT_MAX - C)
=> (X >= (UINT_MAX + ~C + 1)) [as -C == (~C + 1)]
=> (X >= (~C + 0)) [as (UINT_MAX + 1) == 0]
=> (X >= ~C)
This is a really neat way to check the condition without causing overflow through the addition.
It is possible to combine the value ranges for the two operands above, to yield just one icmp
for the
overall expression. This can be done by noting the following:
-
For operator
||
, either operand can be true for the whole expression to evaluate to true. This means the overall value range for the expression will be the union of the LHS and the RHS, where the LHS is the value range of theOpWillWrap
expression. For the RHS, the value range remains the same across the two types of expressions. -
For operator
&&
, both operands have to be true for the whole expression to evaluate to true. This means the overall value range for the expression will be the intersection of the LHS and the RHS, where the LHS is the value range of the!OpWillWrap
expression, which happens to be the exact inverse of the above so it maps quite cleanly to code.
Thus, I created the following two patches for the fold:
The reason that these patches took so long to make is because it took me quite a while to understand the semantics of
the fold itself. What helped me a lot was working out the simplifications on paper, then transferring it into code.
I got really stuck on the ||
vs. &&
simplification, and then generalizing it to have the
same general formula for both also took some time. Overall, the patch for usub_sat
took 5 days, whereas
the one for uadd_sat
took 3 days. However, I learnt a fair bit about writing proofs on Alive2 from this,
and also learnt that its probably a good idea to mathematically verify that a transform actually works before
transferring it into code and just chucking it on phabricator for review :).