Control flow-sensitive optimizations In the Druid Meta-Compiler

Matías Demare - Guillermo Polito
Javier Pimás - Nahuel Palumbo
matias-nicolas.demare@inria.fr
github.com/m-demare

Conditional branches are slow

Conditional branches are slow…
But why?
● Complexification of control flow
● Increase in code size
● CPU pipeline stalling

Complexification¹ of control flow
● Prevents optimizations
● Makes some compiler tasks harder
○ Block placement
○ Register allocation
¹ https://english.stackexchange.com/a/607869

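Increase in code size
Larger code puts more pressure on the instruction cache. The impact is considerable, especially with suboptimal code placement.
[Figure: memory latency comparison; labels: "(10s of bytes)", "~1ns latency", "~50ns latency"]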

Pipeline stalling
● Caused by the pipelined way modern processors execute instructions
● Can have a huge impact

What’s a CPU pipeline?
r1 := Add(r5, r8)
r2 := Mul(r5, r3)
r3 := Sub(r2, 5)
Assuming 3 cycles per instruction: 9 cycles in total

…or does it?
r1 := Add(r5, r8)
r2 := Mul(r5, r3)
r3 := Sub(r2, 5)
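The 9-cycle figure assumes that each instruction finishes before the next one starts. The following is a minimal sketch of an idealized model, assuming a 3-stage pipeline where each stage takes one cycle and no hazards occur, comparing that with an ideal pipeline in which instructions overlap. It is not a reconstruction of the slide animation: in the example, `r3 := Sub(r2, 5)` depends on the result of `Mul`, so a real pipeline may have to stall or forward that value.

```python
def sequential_cycles(n_instructions: int, stages: int) -> int:
    """Each instruction goes through every stage before the next one starts."""
    return n_instructions * stages


def pipelined_cycles(n_instructions: int, stages: int) -> int:
    """Ideal pipeline: the first instruction needs `stages` cycles to fill the
    pipeline, then one instruction completes per cycle (no hazards, no stalls)."""
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1)


# The three-instruction example, assuming 3 one-cycle stages.
print(sequential_cycles(3, 3))  # 9
print(pipelined_cycles(3, 3))   # 5
```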

The issue with branches
r2 := Add(r5, r8)
jumpIf (r1 > 5) to X
[other instructions…]
X: …

And it gets worse…
● The deeper the pipeline, the longer the stall
○ AMD Zen uses 19 stages
○ Intel Lion Cove uses 10 stages
● Other CPU features can make this even more costly (e.g., indirect branches, superscalar microarchitectures)
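A back-of-the-envelope model shows why depth matters. This sketch assumes, hypothetically, a processor with no branch prediction that stops fetching at every conditional branch until the branch resolves in its last stage; real CPUs resolve branches earlier and predict them, so the numbers only illustrate the trend. The 10 and 19 stage counts come from the slide; the instruction and branch counts are made up.

```python
def cycles_with_branch_stalls(n_instructions: int, stages: int,
                              n_branches: int, resolve_stage: int) -> int:
    """Idealized cycle count for a pipeline that waits at every conditional
    branch until it resolves at `resolve_stage`.

    While waiting, the stages before `resolve_stage` hold no useful work
    (bubbles), which costs resolve_stage - 1 cycles per branch."""
    ideal = stages + (n_instructions - 1)
    return ideal + n_branches * (resolve_stage - 1)


# 100 instructions, 10 of them branches, each resolved at the last stage.
for depth in (10, 19):
    print(depth, cycles_with_branch_stalls(100, depth, 10, resolve_stage=depth))
# 10 199
# 19 298
```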

CPUs try to solve this
● Branch prediction:
○ Guess which branch will run, and start executing it
○ Discard results if the guess was wrong
● Eager speculative execution:
○ Execute both branches simultaneously
○ Discard the results from the one that ends up being “wrong”
Still, the best branch is no branch at all
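Branch prediction needs a policy for guessing. One classic textbook scheme is a 2-bit saturating counter per branch; it is shown here only as an illustration, not as what any particular CPU implements. Each misprediction corresponds to work that is thrown away.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict "not taken",
    states 2-3 predict "taken". A single wrong guess does not flip a
    strongly biased state."""

    def __init__(self, state: int = 1):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def count_mispredictions(predictor: TwoBitPredictor, outcomes) -> int:
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1  # speculatively executed work is discarded
        predictor.update(taken)
    return misses


# A loop back-edge: taken 9 times, then not taken when the loop exits.
loop = [True] * 9 + [False]
p = TwoBitPredictor()
print(count_mispredictions(p, loop))  # 2 (warm-up miss + exit miss)
print(count_mispredictions(p, loop))  # 1 (only the exit is mispredicted)
```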

“But… I don’t write redundant conditionals!!”
Sadly, compilers write them for you
● Function inlining
● Lowering of high-level features. E.g.,
○ Array bounds checks
○ Runtime type checks
○ Polymorphism + message sends
[ i < array size ] whileTrue: [
    var := array at: i.
    i := i + 1.
]
Lowered version (the second jumpIf is the array bounds check):
Start:
    jumpIf (i >= array size) to End
    jumpIf (i >= array size) to Error
    var := MemLoad(array + i)
    i := Add(i, 1)
    jumpTo Start
End: …
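To make the redundancy concrete, here is a deliberately simple local pass, not the PiNode-based approach developed later in the deck, over a hypothetical tuple encoding of the lowered loop. On the fall-through path of `jumpIf (i >= array size) to End` the condition is known to be false, so the identical bounds check right after it can never jump to `Error`.

```python
def eliminate_redundant_checks(program):
    """Drop a `jumpIf` whose condition was already tested (and not taken)
    earlier in the same straight-line run of code.

    Conditions are tuples such as ("i", ">=", "size"). A condition stops
    being known once one of its variables is reassigned, or at a label,
    since another path may jump there."""
    known_false = set()
    result = []
    for instr in program:
        op = instr[0]
        if op == "label":
            known_false.clear()
        elif op == "jumpIf":
            cond = instr[1]
            if cond in known_false:
                continue  # can never be taken here: remove the branch
            known_false.add(cond)  # on the fall-through path, cond is false
        elif op == "assign":
            dest = instr[1]
            known_false = {c for c in known_false if dest not in c}
        result.append(instr)
    return result


loop = [
    ("label", "Start"),
    ("jumpIf", ("i", ">=", "size"), "End"),
    ("jumpIf", ("i", ">=", "size"), "Error"),  # bounds check of `array at: i`
    ("assign", "var", "MemLoad(array + i)"),
    ("assign", "i", "Add(i, 1)"),
    ("jumpTo", "Start"),
    ("label", "End"),
]
for instr in eliminate_redundant_checks(loop):
    print(instr)
```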

Goal: Detect and eliminate dead branches
Dead branches are branches that cannot be reached in any execution of the program. Detecting them requires determining whether a given condition is satisfiable in its context:
x < 5 ifTrue: [
    x > 10 ifTrue: [
        "Unreachable code"
    ]
]
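For simple comparisons against constants, the satisfiability question reduces to intersecting ranges. A minimal sketch, assuming integer-valued variables (so `x < 5` means `x <= 4`); `Range` and `satisfying` are hypothetical names, not Druid's API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    lo: float  # inclusive lower bound, -inf if unbounded
    hi: float  # inclusive upper bound, +inf if unbounded

    def intersect(self, other: "Range") -> "Range":
        return Range(max(self.lo, other.lo), min(self.hi, other.hi))

    def is_empty(self) -> bool:
        return self.lo > self.hi


def satisfying(op: str, k: int) -> Range:
    """Integers x such that `x op k` holds."""
    return {
        "<": Range(-math.inf, k - 1),
        "<=": Range(-math.inf, k),
        ">": Range(k + 1, math.inf),
        ">=": Range(k, math.inf),
    }[op]


# The inner block runs only if both x < 5 and x > 10 hold.
inner = satisfying("<", 5).intersect(satisfying(">", 10))
print(inner.is_empty())  # True: the inner branch is dead
```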

PiNodes: Representing Constraints on Variables
x < 5 ifTrue: [
    x > 10 ifTrue: [
    ]
]

x1 < 5 ifTrue: [
    x2 := 𝜋(x1, <5).
    x2 > 10 ifTrue: [
        x3 := 𝜋(x2, >10).
    ]
]

Control Flow Graph (CFG)
● Graph-based representation of a program
● Nodes = Basic blocks
● Edges = Jumps
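As a sketch of "nodes = basic blocks, edges = jumps", the lowered loop from the bounds-check example can be split into basic blocks and its edges derived from the jumps. The block names `Check` and `Body`, the contents of `Error` and `End`, and the `return` terminator are hypothetical, and each block is assumed to contain at most one jump.

```python
def build_cfg(blocks):
    """blocks: ordered dict name -> instructions. Returns name -> successors.
    A block falls through to the next one unless it ends with an
    unconditional jump or a return."""
    names = list(blocks)
    succs = {}
    for i, name in enumerate(names):
        targets = []
        falls_through = True
        for instr in blocks[name]:
            if instr[0] == "jumpIf":      # conditional: target + fall-through
                targets.append(instr[2])
            elif instr[0] == "jumpTo":    # unconditional: target only
                targets.append(instr[1])
                falls_through = False
            elif instr[0] == "return":
                falls_through = False
        if falls_through and i + 1 < len(names):
            targets.append(names[i + 1])
        succs[name] = targets
    return succs


loop = {
    "Start": [("jumpIf", "i >= array size", "End")],
    "Check": [("jumpIf", "i >= array size", "Error")],
    "Body": [("load", "var"), ("add", "i"), ("jumpTo", "Start")],
    "Error": [("return",)],
    "End": [("return",)],
}
print(build_cfg(loop))
```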

SSA
● Variables are assigned exactly once
x := 5.
x := 27.
x < 10 ifTrue: [
    y := x.
] ifFalse: [
    y := 13.
].
z := y + x

x1 := 5.
x2 := 27.
x2 < 10 ifTrue: [
    y1 := x2.
] ifFalse: [
    y2 := 13.
].
z1 := ?? + x2

● Φ-functions represent variables at merge points
x1 := 5.
x2 := 27.
x2 < 10 ifTrue: [ "B1"
    y1 := x2.
] ifFalse: [ "B2"
    y2 := 13.
].
y3 := Φ(B1 → y1, B2 → y2).
z1 := y3 + x2
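A Φ-function selects one of its operands depending on which predecessor block control came from. The sketch below hand-translates the SSA example into Python to show that behaviour; the `x2_value` parameter is a hypothetical addition used to exercise both paths.

```python
def run_ssa_example(x2_value: int = 27) -> dict:
    """Executes the SSA example. The Phi-function picks the operand that
    belongs to the predecessor block control actually came from."""
    env = {"x1": 5, "x2": x2_value}
    if env["x2"] < 10:
        env["y1"] = env["x2"]
        came_from = "B1"
    else:
        env["y2"] = 13
        came_from = "B2"
    phi_operands = {"B1": "y1", "B2": "y2"}  # y3 := Phi(B1 -> y1, B2 -> y2)
    env["y3"] = env[phi_operands[came_from]]
    env["z1"] = env["y3"] + env["x2"]
    return env


print(run_ssa_example()["z1"])   # 40: came from B2, so y3 = y2 = 13
print(run_ssa_example(3)["z1"])  # 6: came from B1, so y3 = y1 = 3
```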

SSA - use-def chains
[Figure: use-def chains drawn over the SSA example with the Φ-function]
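Because every SSA value has exactly one definition, use-def chains are cheap to build: one pass records where each value is defined, another records where it is used. A minimal sketch over a hypothetical flat encoding of the SSA example, where each entry is a defined value and its operands; `c1` is a made-up name for the result of `x2 < 10`.

```python
def build_use_def(instructions):
    """instructions: list of (defined_value, operands). Returns (defs, uses):
    the index defining each value, and the indices that use it."""
    defs, uses = {}, {}
    for i, (dest, _) in enumerate(instructions):
        assert dest not in defs, f"{dest} assigned twice: not SSA"
        defs[dest] = i
    for i, (_, operands) in enumerate(instructions):
        for operand in operands:
            if operand in defs:  # literals have no definition
                uses.setdefault(operand, []).append(i)
    return defs, uses


ssa = [
    ("x1", [5]),
    ("x2", [27]),
    ("c1", ["x2", 10]),    # c1 := x2 < 10 (hypothetical name)
    ("y1", ["x2"]),
    ("y2", [13]),
    ("y3", ["y1", "y2"]),  # y3 := Phi(B1 -> y1, B2 -> y2)
    ("z1", ["y3", "x2"]),
]
defs, uses = build_use_def(ssa)
print(uses["x2"])  # [2, 3, 6]
print(defs["y3"])  # 5
```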

PiNodes: representing constraints on variables
x < 5 ifTrue: [
    x > 10 ifTrue: [
        x doSomething.
    ]
]

x1 < 5 ifTrue: [
    x2 := 𝜋(x1, <5).
    x2 > 10 ifTrue: [
        x3 := 𝜋(x2, >10).
        x3 doSomething.
    ]
]

Optimizations - Dead branch elimination (DBE)

Constant dead branch elimination
x1 < 5 ifTrue: [
    x2 := 𝜋(x1, <5).
    x2 < 10 ifTrue: [
        x3 := 𝜋(x2, <10).
    ] ifFalse: [
        x4 := 𝜋(x2, >=10).
        " unreachable "
    ].
]
x3:=𝜋(x2, <10) and x2:=𝜋(x1, <5)
Is (-∞; 5) ∩ (-∞; 10) empty?
NO ⇒ Reachable

x4:=𝜋(x2, >=10) and x2:=𝜋(x1, <5)
Is (-∞; 5) ∩ [10; ∞) empty?
YES ⇒ Unreachable
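The two questions above can be answered mechanically by walking up the chain of PiNodes and intersecting their constraints. A minimal sketch, assuming integer variables and PiNodes that compare against constants; the dictionary encoding and function names are hypothetical.

```python
import math


def satisfying(op, k):
    """Inclusive (lo, hi) range of integers x such that `x op k` holds."""
    return {"<": (-math.inf, k - 1), "<=": (-math.inf, k),
            ">": (k + 1, math.inf), ">=": (k, math.inf)}[op]


def possible_range(var, pi_nodes):
    """Intersect the constraints along the chain of PiNodes leading to var."""
    lo, hi = -math.inf, math.inf
    while var in pi_nodes:
        source, op, k = pi_nodes[var]
        c_lo, c_hi = satisfying(op, k)
        lo, hi = max(lo, c_lo), min(hi, c_hi)
        var = source
    return lo, hi


pi_nodes = {  # name -> (copied variable, comparison, constant)
    "x2": ("x1", "<", 5),
    "x3": ("x2", "<", 10),
    "x4": ("x2", ">=", 10),
}
for name in ("x3", "x4"):
    lo, hi = possible_range(name, pi_nodes)
    print(name, "unreachable" if lo > hi else "reachable")
# x3 reachable
# x4 unreachable
```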

x1 < 5 ifTrue: [
    x2 := 𝜋(x1, <5).
    y1 := x2.
] ifFalse: [
    x3 := 𝜋(x1, >=5).
    y2 := 8.
].
y3 := Φ(y1, y2)
What are the possible values of y3?
The union of the possible values of y1 and y2: (-∞; 5) ∪ {8}
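Keeping the union as a set of ranges preserves precision. A minimal sketch, assuming integer values, contrasts it with a single enclosing interval (the hull), a common cheaper approximation that would wrongly allow values such as 6. The slides do not say which representation Druid uses.

```python
import math


def may_take(ranges, value):
    """ranges: inclusive (lo, hi) integer ranges whose union is the value set."""
    return any(lo <= value <= hi for lo, hi in ranges)


y1 = (-math.inf, 4)  # y1 := x2, with x2 := pi(x1, <5) on integers
y2 = (8, 8)          # y2 := 8
y3 = [y1, y2]        # y3 := Phi(y1, y2): union of the operands' ranges

hull = (min(y1[0], y2[0]), max(y1[1], y2[1]))  # single-interval approximation
print(may_take(y3, 6), may_take([hull], 6))  # False True
```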

ABCD method (Array Bounds Checks on Demand)
More powerful than constant DBE:
● Models the relationships between variables
● Models the effect of basic arithmetic operations (addition and subtraction)

ABCD method
x1 <= y1 ifTrue: [
    x2 := 𝜋(x1, <=y1).
    y2 := 𝜋(y1, >=x1).
    x3 := x2 - 10.
    x3 < y2 ifTrue: [ "tautology"
        x4 := 𝜋(x3, <y2).
        y3 := 𝜋(y2, >x3).
    ] ifFalse: [ "unreachable"
        x5 := 𝜋(x3, >=y2).
        y4 := 𝜋(y2, <=x3).
    ]
].
Nodes represent SSA values.
An edge from a to b with weight w means that b - a ≤ w.
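With that edge convention, a path from a to b with total weight d proves b - a ≤ d, so the shortest path gives the tightest bound. The sketch below encodes the constraints of the example and uses plain Bellman-Ford as a simplified stand-in; the actual ABCD algorithm is demand-driven and also handles Φ-functions, which this sketch does not. Integer values are assumed, so `x3 < y2` is proven once x3 - y2 ≤ -1.

```python
import math


def shortest_distance(edges, source, target):
    """Bellman-Ford over an inequality graph. An edge (a, b, w) means
    b - a <= w, so a path from a to b of weight d proves b - a <= d."""
    nodes = {n for a, b, _ in edges for n in (a, b)}
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    for _ in range(len(nodes) - 1):
        for a, b, w in edges:
            if dist[a] + w < dist[b]:
                dist[b] = dist[a] + w
    for a, b, w in edges:
        if dist[a] + w < dist[b]:
            raise ValueError("negative cycle: the constraints contradict each other")
    return dist[target]


# Constraints that hold inside `x1 <= y1 ifTrue: [ ... ]`
edges = [
    ("x1", "x2", 0), ("x2", "x1", 0),     # x2 is a copy of x1
    ("y1", "y2", 0), ("y2", "y1", 0),     # y2 is a copy of y1
    ("y2", "x2", 0),                      # x2 - y2 <= 0, from x1 <= y1
    ("x2", "x3", -10), ("x3", "x2", 10),  # x3 := x2 - 10
]
d = shortest_distance(edges, "y2", "x3")
print(d)  # -10: x3 - y2 <= -10, so `x3 < y2` always holds
```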

Experimental Context: Druid
● Source-to-source meta-compiler
● Uses many optimization passes
○ Analysis and code transformation

Old DBE vs PiNodes
● Druid already had a DBE pass
● It worked by computing all the paths on which a variable was live
● Questions:
○ Is our new constant DBE method faster?
○ Is ABCD, the more powerful method, slower?

Measuring Compile Time Improvement
● Used two benchmarks to compare time spent optimizing:
○ Compiled all methods of a test class
○ Compiled one hand-crafted method with an intentionally complex control flow

Future Work
● Stronger constraint solving - Z3
● Measuring run-time improvements
● Looking for more optimization opportunities
○ Using the Druid optimizer for high-level Pharo code
○ Message splitting

More Opportunities for Complex Control Flows
x < 3 ifTrue: [
y doSomething.
].
x < 5 ifTrue: [
z doSomethingElse.
].

Conclusions
● Branches make code slow
● It’s common to have some dead branches in your code
● PiNodes represent constraints on SSA variables, and can be used for DBE

Critical Edges
● Edges whose predecessor has multiple successors and whose successor has multiple predecessors
● They are annoying for PiNode insertion: the successor can also be reached without taking that edge, so the condition's outcome does not hold on every path into it
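Detecting critical edges only needs successor lists and predecessor counts. A minimal sketch on a hypothetical three-block CFG: the edge from the branch to `B2` is critical, so a PiNode for `x >= 5` placed at the top of `B2` would be wrong, since `B2` is also entered from `B1`, where `x < 5`.

```python
def critical_edges(succs):
    """succs: block -> list of successor blocks.
    An edge a -> b is critical if a has several successors
    and b has several predecessors."""
    n_preds = {}
    for targets in succs.values():
        for b in targets:
            n_preds[b] = n_preds.get(b, 0) + 1
    return [(a, b) for a, targets in succs.items() for b in targets
            if len(targets) > 1 and n_preds[b] > 1]


# B0 ends with `x < 5`: true -> B1, false -> B2; B1 then falls into B2.
cfg = {"B0": ["B1", "B2"], "B1": ["B2"], "B2": []}
print(critical_edges(cfg))  # [('B0', 'B2')]
```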

Breaking Critical Edges
● Remove that edge and, in its place, insert a new basic block containing just an unconditional jump to the critical edge’s target
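A minimal sketch of the transformation on a successor-list encoding; the `splitN` block names are a hypothetical naming scheme. After splitting, every edge out of a multi-successor block leads to a block with a single predecessor, which is where a PiNode can safely go.

```python
def split_critical_edges(succs):
    """Returns a new CFG where every critical edge a -> b is replaced by
    a -> new_block -> b, new_block holding only an unconditional jump to b."""
    n_preds = {}
    for targets in succs.values():
        for b in targets:
            n_preds[b] = n_preds.get(b, 0) + 1
    result = {a: list(targets) for a, targets in succs.items()}
    counter = 0
    for a, targets in succs.items():
        if len(targets) < 2:
            continue  # an edge out of a single-successor block is never critical
        for i, b in enumerate(targets):
            if n_preds[b] > 1:  # critical edge a -> b
                new_block = f"split{counter}"  # hypothetical naming scheme
                counter += 1
                result[a][i] = new_block  # a now jumps to the new block...
                result[new_block] = [b]   # ...which only does `jumpTo b`
    return result


cfg = {"B0": ["B1", "B2"], "B1": ["B2"], "B2": []}
print(split_critical_edges(cfg))
# {'B0': ['B1', 'split0'], 'B1': ['B2'], 'B2': [], 'split0': ['B2']}
```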

Domination
● B1 dominates B2 if every path from the entry node to B2 must go through B1
[Figure: example CFG with blocks B1 to B5]
B1 dominates B1, B2, B3, B4, B5
B2 dominates B2, B3
B3, B4, B5 only dominate themselves
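Dominators can be computed with the classic iterative data-flow formulation: dom(n) = {n} ∪ ⋂ dom(p) over the predecessors p of n. The figure's CFG did not survive extraction, so the graph below is an assumed one (B1 branches to B2 and B4, B2 flows to B3, and B3 and B4 merge into B5) that reproduces the dominance relations listed on the slide.

```python
def dominators(succs, entry):
    """Iterative data-flow: dom(n) = {n} union the intersection of dom(p)
    over all predecessors p. Assumes every block is reachable from entry."""
    nodes = list(succs)
    preds = {n: [] for n in nodes}
    for a in nodes:
        for b in succs[a]:
            preds[b].append(a)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = set(nodes)
            for p in preds[n]:
                new &= dom[p]
            new.add(n)
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


cfg = {"B1": ["B2", "B4"], "B2": ["B3"], "B3": ["B5"], "B4": ["B5"], "B5": []}
dom = dominators(cfg, "B1")
dominates = {d: sorted(n for n in cfg if d in dom[n]) for d in cfg}
print(dominates)
# {'B1': ['B1', 'B2', 'B3', 'B4', 'B5'], 'B2': ['B2', 'B3'], 'B3': ['B3'], 'B4': ['B4'], 'B5': ['B5']}
```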

The PiNode Framework - insertion
● Break critical edges
● Insert PiNodes in each successor of a condition (one for each variable involved)
● Replace uses of the original variable in dominated blocks with the PiNode
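The insertion steps can be sketched as follows, under simplifying assumptions: the branch compares a single variable with a constant, critical edges are already broken, and the set of blocks each successor dominates is given. The instruction encoding, `NEGATE`, and `fresh` are hypothetical and do not reflect Druid's implementation.

```python
import itertools

NEGATE = {"<": ">=", "<=": ">", ">": "<=", ">=": "<"}


def insert_pi_nodes(blocks, dominated_by, var, op, k, true_succ, false_succ, fresh):
    """For a branch `var op k`, put `v := pi(var, constraint)` at the top of
    each successor and rename uses of var in every block the successor
    dominates. Assumes critical edges are already broken."""
    for succ, constraint in ((true_succ, op), (false_succ, NEGATE[op])):
        pi_name = fresh()
        for b in dominated_by[succ]:  # rename first, so the pi keeps using var
            blocks[b] = [(dest, opcode, [pi_name if a == var else a for a in args])
                         for dest, opcode, args in blocks[b]]
        blocks[succ].insert(0, (pi_name, "pi", [var, constraint, k]))
    return blocks


counter = itertools.count(2)


def fresh():
    return f"x{next(counter)}"


blocks = {
    "B0": [("c1", "lessThan", ["x1", 5])],  # x1 < 5 ifTrue: [B1] ifFalse: [B2]
    "B1": [("r1", "send", ["x1", "doSomething"])],
    "B2": [("r2", "send", ["x1", "doSomethingElse"])],
}
insert_pi_nodes(blocks, {"B1": {"B1"}, "B2": {"B2"}}, "x1", "<", 5, "B1", "B2", fresh)
for name, instrs in blocks.items():
    print(name, instrs)
```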

The PiNode Framework - deletion
Simple copy propagation algorithm: replace each use of a PiNode with a use of the variable it copies
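Since a PiNode is just a copy of its source variable annotated with a constraint, removing PiNodes is ordinary copy propagation. A minimal sketch over the same hypothetical `(dest, opcode, args)` encoding, following chains of PiNodes back to the original variable.

```python
def remove_pi_nodes(blocks):
    """Copy propagation: each PiNode is a copy of its first argument, so
    replace its uses with the copied variable, then drop the PiNode."""
    copy_of = {dest: args[0]
               for instrs in blocks.values()
               for dest, opcode, args in instrs if opcode == "pi"}

    def resolve(value):
        while value in copy_of:  # follow chains such as x3 -> x2 -> x1
            value = copy_of[value]
        return value

    return {name: [(dest, opcode, [resolve(a) for a in args])
                   for dest, opcode, args in instrs if opcode != "pi"]
            for name, instrs in blocks.items()}


blocks = {
    "B1": [("x2", "pi", ["x1", "<", 5]),
           ("x3", "pi", ["x2", ">", 10]),
           ("r1", "send", ["x3", "doSomething"])],
}
print(remove_pi_nodes(blocks))
# {'B1': [('r1', 'send', ['x1', 'doSomething'])]}
```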

Dead branch elimination
cfg piNodesDo: [ :piNode |
    piNode ifNotSatisfiable: [
        unreachableBlocks add: piNode basicBlock.
    ].
].
cfg removeJmpsTo: unreachableBlocks.
cfg removeBlocks: unreachableBlocks.
Basic pseudocode of the algorithm
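A runnable Python rendering of that pseudocode, under stated assumptions: variables are integers, PiNodes compare against constants, and satisfiability is checked by intersecting ranges along the PiNode chain. The CFG encoding is hypothetical, and "removing jumps" is modelled as dropping the block from its predecessors' successor lists; a real compiler would also turn the remaining conditional jump into an unconditional one and clean up blocks that become unreachable as a result.

```python
import math


def satisfying(op, k):
    """Inclusive (lo, hi) range of integers x such that `x op k` holds."""
    return {"<": (-math.inf, k - 1), "<=": (-math.inf, k),
            ">": (k + 1, math.inf), ">=": (k, math.inf)}[op]


def eliminate_dead_branches(cfg):
    """cfg: block -> {"instrs": [(dest, opcode, args)], "succs": [blocks]}.
    PiNodes look like (name, "pi", [source, op, constant])."""
    pi_nodes, home = {}, {}
    for name, block in cfg.items():
        for dest, opcode, args in block["instrs"]:
            if opcode == "pi":
                pi_nodes[dest] = tuple(args)
                home[dest] = name
    unreachable = set()
    for pi in pi_nodes:  # cfg piNodesDo: [ :piNode | ... ]
        lo, hi, var = -math.inf, math.inf, pi
        while var in pi_nodes:  # intersect constraints along the PiNode chain
            source, op, k = pi_nodes[var]
            c_lo, c_hi = satisfying(op, k)
            lo, hi, var = max(lo, c_lo), min(hi, c_hi), source
        if lo > hi:  # ifNotSatisfiable:
            unreachable.add(home[pi])
    for block in cfg.values():  # removeJmpsTo:
        block["succs"] = [s for s in block["succs"] if s not in unreachable]
    for name in unreachable:  # removeBlocks:
        del cfg[name]
    return unreachable


cfg = {
    "B0": {"instrs": [], "succs": ["B1", "B4"]},  # x1 < 5
    "B1": {"instrs": [("x2", "pi", ["x1", "<", 5])], "succs": ["B2", "B3"]},  # x2 < 10
    "B2": {"instrs": [("x3", "pi", ["x2", "<", 10])], "succs": ["B5"]},
    "B3": {"instrs": [("x4", "pi", ["x2", ">=", 10])], "succs": ["B5"]},
    "B4": {"instrs": [("x5", "pi", ["x1", ">=", 5])], "succs": ["B5"]},
    "B5": {"instrs": [], "succs": []},
}
print(eliminate_dead_branches(cfg))  # {'B3'}
print(cfg["B1"]["succs"])            # ['B2']
```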

Message splitting (with code)
Before:
x < 3 ifTrue: [
    y doSomething.
] ifFalse: [].
x < 5 ifTrue: [
    z doSomethingElse.
].
After:
x < 3 ifTrue: [
    y doSomething.
    x < 5 ifTrue: [
        z doSomethingElse.
    ].
] ifFalse: [
    x < 5 ifTrue: [
        z doSomethingElse.
    ].
].
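The slides do not spell out the payoff of splitting. One plausible reading, assumed here, is that duplicating the second conditional into each branch of the first gives it a more precise context: inside `x < 3` the PiNode constrains x to at most 2, so `x < 5` becomes a tautology that dead branch elimination can remove. The Python transliteration below checks that the split and simplified version behaves like the original; the log strings stand in for the message sends.

```python
def original(x):
    log = []
    if x < 3:
        log.append("y doSomething")
    if x < 5:
        log.append("z doSomethingElse")
    return log


def split_and_simplified(x):
    log = []
    if x < 3:
        log.append("y doSomething")
        # Here x <= 2, so `x < 5` always holds: the check was removed.
        log.append("z doSomethingElse")
    elif x < 5:  # here x >= 3, so `x < 5` must still be tested
        log.append("z doSomethingElse")
    return log


assert all(original(x) == split_and_simplified(x) for x in range(-10, 10))
print(split_and_simplified(1), split_and_simplified(4), split_and_simplified(7))
```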


More from ESUG (20)
