GSoC 2023 Week 9 - A Saturating Arithmetic And ICmp Fold

08 Jul 2023

This blog post is related to my GSoC 2023 Project.

This week, I worked on a couple of optimizations in InstCombine: folds of the form u[add|sub]_sat(X, C) pred C2, where both C and C2 are constants. This fold takes on the general formula:

   u[add|sub]_sat(X, C) Pred C2
=> (OpWillWrap ? SaturatingValue : (X Binop C)) Pred C2
=>  OpWillWrap ? (SaturatingValue Pred C2) : ((X Binop C) Pred C2)

When the value of SaturatingValue Pred C2 is true, we get the following fold:

   OpWillWrap ?  true : ((X Binop C) Pred C2)
=> OpWillWrap || ((X Binop C) Pred C2

When the value is false, we get the following fold:

    OpWillWrap ?  false : ((X Binop C) Pred C2)
=> !OpWillWrap ?  ((X Binop C) Pred C2 : false
=> !OpWillWrap && ((X Binop C) Pred C2

It is usually possible to combine these two checks into just one comparison. For the two intrinsics currently implemented, the OpWillWrap checks look like the following:

usub_sat: X < C
uadd_sat: X >= ~C

While the usub_sat check is quite obvious, I found the uadd_sat one is not, so here is a little derivation:

   (X + C >= UINT_MAX)
=> (X     >= UINT_MAX - C)
=> (X     >= (UINT_MAX + ~C + 1)) [as -C == (~C + 1)]
=> (X     >= (~C + 0))            [as (UINT_MAX + 1) == 0]
=> (X     >= ~C)

This is a really neat way to check the condition without causing overflow through the addition.

It is possible to combine the value ranges for the two operands above, to yield just one icmp for the overall expression. This can be done by noting the following:

For operator ||, either operand can be true for the whole expression to evaluate to true. This means the overall value range for the expression will be the union of the LHS and the RHS, where the LHS is the value range of the OpWillWrap expression. For the RHS, the value range remains the same across the two types of expressions.
For operator &&, both operands have to be true for the whole expression to evaluate to true. This means the overall value range for the expression will be the intersection of the LHS and the RHS, where the LHS is the value range of the !OpWillWrap expression, which happens to be the exact inverse of the above so it maps quite cleanly to code.

Thus, I created the following two patches for the fold:

The reason that these patches took so long to make is because it took me quite a while to understand the semantics of the fold itself. What helped me a lot was working out the simplifications on paper, then transferring it into code. I got really stuck on the || vs. && simplification, and then generalizing it to have the same general formula for both also took some time. Overall, the patch for usub_sat took 5 days, whereas the one for uadd_sat took 3 days. However, I learnt a fair bit about writing proofs on Alive2 from this, and also learnt that its probably a good idea to mathematically verify that a transform actually works before transferring it into code and just chucking it on phabricator for review :).

dc03

GSoC 2023 Week 9 - A Saturating Arithmetic And ICmp Fold

Related Posts

GSoC 2023 Week 19 and 20 - That's All, Folks! 29 Sep 2023

GSoC 2023 Week 17 and 18 - Same Old, Same Old 10 Sep 2023

GSoC 2023 Week 15 and 16 29 Aug 2023

GSoC 2023 Week 19 and 20 - That's All, Folks!
29 Sep 2023

GSoC 2023 Week 17 and 18 - Same Old, Same Old
10 Sep 2023

GSoC 2023 Week 15 and 16
29 Aug 2023