Talk from IWST 2025: Control flow-sensitive optimizations In the Druid Meta-Compiler
PDF: https://archive.esug.org/ESUG2025/iwst-day1/iwst-104-dema-pinodes-druid.pdf
1. Control flow-sensitive optimizations In the Druid Meta-Compiler
Matías Demare - Guillermo Polito
Javier Pimás - Nahuel Palumbo
matias-nicolas.demare@inria.fr
github.com/m-demare
15. The issue with branches
r2 := Add(r5, r8)
jumpIf (r1 > 5) to X
[other instructions…]
X: …
16. And it gets worse…
● The deeper the pipeline is, the longer the stall
○ AMD Zen uses 19 stages
○ Intel Lion Cove uses 10 stages
● Other CPU features can make this even more costly (e.g. indirect branches, superscalar microarchitectures)
17. CPUs try to solve this
● Branch prediction:
○ Guess which branch will run, and start executing it
○ Discard results if the guess was wrong
● Eager speculative execution:
○ Execute both branches simultaneously
○ Discard the results from the one that ends up being “wrong”
Still, the best branch is no branch at all
18. “But… I don’t write redundant conditionals!!”
Sadly, compilers write them for you
● Function inlining
● Lowering of high-level features. E.g.,
○ Array bounds checks
○ Runtime type checks
○ Polymorphism + message sends
19. “But… I don’t write redundant conditionals!!”
[ i < array size ] whileTrue: [
var := array at: i.
i := i + 1.
]
Start:
jumpIf (i >= array size) to End
jumpIf (i >= array size) to Error
var := MemLoad(array + i)
i := Add(i, 1)
jumpTo Start
End: …
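The duplicated test in the lowered loop can be removed with a very small pass. The following is a minimal Python sketch, not Druid's implementation: the tuple-based IR and the name `drop_redundant_jumps` are assumptions made for illustration. It only reasons about straight-line fall-through code and forgets everything at labels.

```python
# Hypothetical IR: ("label", name), ("jump_if", cond, target),
# ("assign", dest, expr), ("jump", target). A cond is (lhs, op, rhs).
def drop_redundant_jumps(instrs):
    """Drop a conditional jump whose condition is already known to be false."""
    known_false = set()  # conditions that failed on the current fall-through path
    out = []
    for ins in instrs:
        kind = ins[0]
        if kind == "label":
            known_false.clear()        # other blocks may jump here: forget facts
        elif kind == "jump_if":
            cond = ins[1]
            if cond in known_false:
                continue               # can never be taken: drop the jump
            known_false.add(cond)      # falling through means cond was false
        elif kind == "assign":
            dest = ins[1]              # facts about a reassigned variable are stale
            known_false = {c for c in known_false if dest not in (c[0], c[2])}
        elif kind == "jump":
            known_false.clear()        # what follows is only reachable via a label
        out.append(ins)
    return out

loop = [
    ("label", "Start"),
    ("jump_if", ("i", ">=", "array_size"), "End"),
    ("jump_if", ("i", ">=", "array_size"), "Error"),
    ("assign", "var", "MemLoad(array + i)"),
    ("assign", "i", "Add(i, 1)"),
    ("jump", "Start"),
    ("label", "End"),
]
optimized = drop_redundant_jumps(loop)
```

In `optimized`, the jump to Error is gone and the jump to End is kept. A purely syntactic check like this misses conditions that are implied rather than repeated, which is what the constraint-based methods in the later slides address.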
20. Goal: Detect and eliminate dead branches
Dead branches cannot be reached in any execution of the program.
Detecting them requires determining whether a given condition is satisfiable in its context:
x < 5 ifTrue: [
x > 10 ifTrue: [
"Unreachable code"
]]
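For comparisons against constants, satisfiability can be checked with integer intervals. This is a minimal Python sketch under that assumption; the `Interval` class and the `constraint` helper are hypothetical names, not part of Druid.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def intersect(self, other):
        return Interval(max(self.lo, other.lo), min(self.hi, other.hi))

    def is_empty(self):
        return self.lo > self.hi

def constraint(op, k):
    """Integer interval described by `x op k`."""
    return {
        "<": Interval(-math.inf, k - 1),
        "<=": Interval(-math.inf, k),
        ">": Interval(k + 1, math.inf),
        ">=": Interval(k, math.inf),
    }[op]

outer = constraint("<", 5)                    # inside the first ifTrue: x < 5
inner = outer.intersect(constraint(">", 10))  # then additionally x > 10
```

`inner.is_empty()` holds, so the inner block is a dead branch: no integer is both below 5 and above 10.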
33. Constant dead branch elimination
x1 < 5 ifTrue: [
x2 := 𝜋(x1, <5).
y1 := x2.
] ifFalse: [
x3 := 𝜋(x1, >=5).
y2 := 8.
].
y3 := Φ(y1, y2)
What are the possible values of y3?
The union of the possible values of y1 and y2: (-∞; 5) ∪ {8}
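The computation on this slide can be sketched in Python, assuming each SSA value carries a list of integer intervals. The helpers `narrow` (for 𝜋) and `union` (for Φ) are illustrative names, not Druid's API.

```python
import math

def narrow(ranges, lo, hi):
    """π node: intersect every interval with [lo, hi], dropping empty ones."""
    out = []
    for a, b in ranges:
        a, b = max(a, lo), min(b, hi)
        if a <= b:
            out.append((a, b))
    return out

def union(*range_lists):
    """Φ node: merge the intervals of all incoming values."""
    merged = []
    for a, b in sorted(r for ranges in range_lists for r in ranges):
        if merged and a <= merged[-1][1] + 1:  # overlapping or adjacent integers
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

x1 = [(-math.inf, math.inf)]   # nothing known about x1
x2 = narrow(x1, -math.inf, 4)  # x2 := π(x1, <5)
y1 = x2
y2 = [(8, 8)]                  # y2 := 8
y3 = union(y1, y2)             # y3 := Φ(y1, y2)
```

`y3` ends up as two intervals, up to 4 and exactly 8, which is (-∞; 5) ∪ {8} over the integers. A later test such as y3 > 10 narrows to an empty list, so its branch would be dead.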
34. ABCD method
More powerful:
● Models the relationship between variables
● Models the effect of basic arithmetic operations
(addition and subtraction)
35. ABCD method
x1 <= y1 ifTrue: [
x2 := 𝜋(x1, <=y1).
y2 := 𝜋(y1, >=x1).
x3 := x2 - 10.
x3 < y2 ifTrue: [ "tautology"
x4 := 𝜋(x3, <y2).
y3 := 𝜋(y2, >x3).
] ifFalse: [ "unreachable"
x5 := 𝜋(x3, >=y2).
y4 := 𝜋(y2, <=x3).
] ].
Nodes represent SSA values.
An edge from a to b with weight w means that b - a ≤ w.
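With that edge encoding, proving a comparison becomes a shortest-path query. The sketch below is a simplification: it uses plain Bellman-Ford rather than the demand-driven traversal of the actual ABCD algorithm, and it models each 𝜋 node as an equality (two zero-weight edges). The function name and the edge list are assumptions built from the code on this slide.

```python
import math

def tightest_bound(edges, src, dst):
    """Smallest w such that dst - src <= w follows from the edges.

    Bellman-Ford; assumes the constraint graph has no negative cycle.
    """
    nodes = {n for a, b, _ in edges for n in (a, b)} | {src, dst}
    dist = {n: math.inf for n in nodes}
    dist[src] = 0
    for _ in range(len(nodes) - 1):
        for a, b, w in edges:
            if dist[a] + w < dist[b]:
                dist[b] = dist[a] + w
    return dist[dst]

# Edge (a, b, w) encodes b - a <= w.
edges = [
    ("x1", "x2", 0), ("x2", "x1", 0),     # x2 is a π copy of x1
    ("y1", "y2", 0), ("y2", "y1", 0),     # y2 is a π copy of y1
    ("y2", "x2", 0),                      # branch condition: x2 <= y2
    ("x2", "x3", -10), ("x3", "x2", 10),  # x3 := x2 - 10
]

bound = tightest_bound(edges, "y2", "x3")  # x3 - y2 <= bound
always_true = bound <= -1                  # on integers, x3 < y2 iff x3 - y2 <= -1
```

Here `bound` is -10, so x3 < y2 is a tautology. The ifFalse: branch would add the constraint y2 - x3 ≤ 0, which contradicts x3 - y2 ≤ -10, so that branch is unreachable.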
38. Old DBE vs PiNodes
● Druid already had a DBE pass
● It worked by computing all the paths on which a variable was live
● Questions:
○ Is our new constant DBE method faster?
○ Is ABCD, the more powerful method, slower?
39. Measuring Compile Time Improvement
● Used two benchmarks to compare time spent optimizing:
○ Compiled all methods of a test class
○ Compiled one hand-crafted method with an intentionally complex control flow
44. Future Work
● Stronger constraint solving - Z3
● Measuring run time improvements
● Looking for more optimization opportunities
○ Using the Druid optimizer for high-level Pharo code
○ Message splitting
45. More Opportunities for Complex Control Flows
x < 3 ifTrue: [
y doSomething.
].
x < 5 ifTrue: [
z doSomethingElse.
].
50. Conclusions
● Branches make code slow
● It’s common to have some dead branches in your code
● PiNodes represent constraints on SSA variables, and can be used for DBE
52. Critical Edges
● Edges whose successor has multiple predecessors, and whose predecessor has multiple successors
● They are annoying for PiNode insertion, because the successor is not dominated by the block containing the condition
53. Breaking Critical Edges
● Remove that edge, and insert in its place a new basic block with just an unconditional jump to the critical edge’s target
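Detecting and breaking critical edges can be sketched in a few lines of Python. The CFG representation (a dict from block name to successor list) and the naming of the inserted blocks are assumptions for illustration only.

```python
def split_critical_edges(succs):
    """Return a new CFG (block -> successor list) with every critical edge split."""
    preds = {}
    for block, targets in succs.items():
        for t in targets:
            preds.setdefault(t, []).append(block)

    new_succs = {block: list(targets) for block, targets in succs.items()}
    for block, targets in succs.items():
        for i, t in enumerate(targets):
            # Critical: the source has several successors and
            # the target has several predecessors.
            if len(targets) > 1 and len(preds[t]) > 1:
                middle = f"{block}_{t}"       # hypothetical naming scheme
                new_succs[block][i] = middle  # redirect the edge
                new_succs[middle] = [t]       # new block: just a jump to t
    return new_succs

cfg = {"A": ["B", "C"], "B": ["C"], "C": []}
split = split_critical_edges(cfg)
```

In this example only A → C is critical. After splitting, the new block `A_C` is dominated by A and has A as its only predecessor, so a PiNode for A's condition can safely live there.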
54. Domination
● B1 dominates B2 if every path from the entry node to B2 must go through B1
In the slide's example CFG:
B1 dominates B1, B2, B3, B4, B5
B2 dominates B2, B3
B3, B4, B5 only dominate themselves
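Dominators can be computed with the classic iterative data-flow algorithm. The Python sketch below assumes every block is reachable from the entry. The slide's CFG figure did not survive extraction, so the graph used here is a hypothetical one chosen to reproduce the dominance relations listed above.

```python
def dominators(succs, entry):
    """dom[b] is the set of blocks that dominate b (all blocks must be reachable)."""
    blocks = set(succs)
    preds = {b: set() for b in blocks}
    for b, targets in succs.items():
        for t in targets:
            preds[t].add(b)

    dom = {b: set(blocks) for b in blocks}  # start from "everything dominates b"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            # b is dominated by itself plus whatever dominates all its predecessors.
            new = {b} | set.intersection(*(dom[p] for p in preds[b]))
            if new != dom[b]:
                dom[b], changed = new, True
    return dom

# Hypothetical CFG consistent with the relations on the slide.
cfg = {"B1": ["B2", "B4"], "B2": ["B3", "B4"], "B3": ["B5"], "B4": ["B5"], "B5": []}
dom = dominators(cfg, "B1")
dominates = {a: {b for b in dom if a in dom[b]} for a in dom}
```

`dominates["B2"]` contains only B2 and B3, because B4 and B5 can also be reached from B1 without passing through B2.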
55. The PiNode Framework - insertion
● Break critical edges
● Insert PiNodes in each successor of a condition (one for each variable involved)
● Replace usages in dominated blocks
56. The PiNode Framework - deletion
Simple copy propagation algorithm: replace each usage of the PiNode with a usage of the copied variable
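A minimal Python sketch of that copy propagation, assuming a flat instruction list in which every definition precedes its uses. The `(dest, op, operands)` tuple format is a hypothetical IR, not Druid's.

```python
def remove_pi_nodes(instrs):
    """Delete π nodes and rewrite their uses to the variable they copy.

    instrs: list of (dest, op, operands); a π node has op == "pi" and
    operands == (source, constraint).
    """
    copy_of = {}
    kept = []
    for dest, op, operands in instrs:
        if op == "pi":
            source = operands[0]
            copy_of[dest] = copy_of.get(source, source)  # resolve chains of π nodes
        else:
            kept.append((dest, op, tuple(copy_of.get(v, v) for v in operands)))
    return kept

with_pi = [
    ("x2", "pi", ("x1", "<=y1")),
    ("y2", "pi", ("y1", ">=x1")),
    ("x3", "sub", ("x2", "10")),
    ("c", "lt", ("x3", "y2")),
]
without_pi = remove_pi_nodes(with_pi)
```

The result keeps only the subtraction and the comparison, now reading `x1` and `y1` directly.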
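57. Dead branch elimination
Basic pseudocode of the algorithm:
unreachableBlocks := Set new.
"Collect every block guarded by a constraint that can never hold"
cfg piNodesDo: [ :piNode |
piNode ifNotSatisfiable: [
unreachableBlocks add: piNode basicBlock.
].
].
"Drop the jumps into those blocks, then the blocks themselves"
cfg removeJmpsTo: unreachableBlocks.
cfg removeBlocks: unreachableBlocks.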
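The same steps in a self-contained Python sketch, assuming each PiNode has already been reduced to an integer range [lo, hi] for its variable. The data layout is hypothetical, and blocks that become unreachable only as a consequence of the removal are not cleaned up here.

```python
import math

def eliminate_dead_branches(succs, pi_nodes):
    """succs: block -> successor list. pi_nodes: list of (block, lo, hi)."""
    # A π node whose range is empty can never be satisfied.
    unreachable = {block for block, lo, hi in pi_nodes if lo > hi}
    return {
        block: [t for t in targets if t not in unreachable]  # remove jumps
        for block, targets in succs.items()
        if block not in unreachable                           # remove blocks
    }

# x < 5 ifTrue: [ x > 10 ifTrue: [ ... ] ]
cfg = {
    "entry": ["then", "exit"],
    "then": ["inner", "exit"],
    "inner": ["exit"],
    "exit": [],
}
pi_nodes = [
    ("then", -math.inf, 4),  # x < 5
    ("inner", 11, 4),        # x < 5 and x > 10: empty range
]
pruned = eliminate_dead_branches(cfg, pi_nodes)
```

After pruning, `then` has a single successor, so its conditional jump can become an unconditional one.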
58. Message splitting (with code)
Before:
x < 3 ifTrue: [
y doSomething.
] ifFalse: [].
x < 5 ifTrue: [
z doSomethingElse.
].
After splitting:
x < 3 ifTrue: [
y doSomething.
x < 5 ifTrue: [
z doSomethingElse.
].
] ifFalse: [
x < 5 ifTrue: [
z doSomethingElse.
].
].
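The benefit of the split form is presumably that each copy of the second test now sits in a context where more is known about x. The Python sketch below mirrors the two versions and adds a third, hand-simplified one. That third version is an assumption about what dead branch elimination could do next and is not shown on the slide.

```python
def original(x, log):
    if x < 3:
        log.append("y doSomething")
    if x < 5:
        log.append("z doSomethingElse")

def split(x, log):
    if x < 3:
        log.append("y doSomething")
        if x < 5:
            log.append("z doSomethingElse")
    else:
        if x < 5:
            log.append("z doSomethingElse")

def split_simplified(x, log):
    if x < 3:
        log.append("y doSomething")
        log.append("z doSomethingElse")  # x < 3 implies x < 5: test removed
    elif x < 5:
        log.append("z doSomethingElse")

def run(f, x):
    """Return the list of messages f sends for a given x."""
    log = []
    f(x, log)
    return log
```

All three versions send the same messages for every x, but the simplified one executes a single comparison when x < 3.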