Control flow-sensitive
optimizations
In the Druid Meta-Compiler
Matías Demare - Guillermo Polito
Javier Pimás - Nahuel Palumbo
matias-nicolas.demare@inria.fr
github.com/m-demare
Conditional branches are slow…
But why?
● Complexification of control flow
● Increase in code size
● CPU pipeline stalling
Complexification¹ of control flow
● Prevents optimizations
● Makes some compilers’ tasks harder
○ Block placement
○ Register allocation
¹ https://english.stackexchange.com/a/607869
Increase in code size
Has a considerable impact, especially with suboptimal code placement
[Figure: memory latency comparison; surviving labels: "(10s of bytes)", "~1ns latency", "~50ns latency"]
Pipeline stalling
● Caused by the way modern processors work
● Can have a huge impact
What’s a CPU pipeline?
r1 := Add(r5, r8)
r2 := Mul(r5, r3)
r3 := Sub(r2, 5)
Assuming 3 cycles
per instruction
9 cycles
in total
…or does it?
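The arithmetic behind this question can be sketched in a few lines. This is a minimal illustration, assuming an idealized pipeline in which every stage takes one cycle and nothing ever stalls; the function names are hypothetical and not part of Druid. Real hardware is less forgiving: here r3 depends on r2, which can already delay the third instruction.

```python
def sequential_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed when each instruction must finish before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed in an idealized pipeline (assumption: no stalls at all).

    The first instruction needs n_stages cycles to fill the pipeline;
    after that, one instruction completes per cycle.
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    # The three instructions from the slide, 3 cycles each.
    print("sequential:", sequential_cycles(3, 3))
    print("pipelined: ", pipelined_cycles(3, 3))
```

Under these assumptions the gap widens with longer instruction streams, which is why anything that empties the pipeline is expensive.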
The issue with branches
r2 := Add(r5, r8)
jumpIf (r1 > 5) to X
[other instructions…]
X: …
And it gets worse…
● The deeper the pipeline is, the longer the stall
○ AMD Zen uses 19 stages
○ Intel Lion Cove uses 10 stages
● Other CPU features can make this even more costly (see indirect branches, superscalar microarchitectures, etc.)
CPUs try to solve this
● Branch prediction:
○ Guess which branch will run, and start executing it
○ Discard results if the guess was wrong
● Eager speculative execution
○ Execute both branches simultaneously
○ Discard the results from the one that ends up being “wrong”
Still, the best branch is no branch at all
“But… I don’t write redundant conditionals!!”
Sadly, compilers write them for you
● Function inlining
● Lowering of high-level features. E.g.,
○ Array bounds checks
○ Runtime type checks
○ Polymorphism + message sends
[ i < array size ] whileTrue: [
    var := array at: i.
    i := i + 1.
]
Start:
    jumpIf (i >= array size) to End
    jumpIf (i >= array size) to Error
    var := MemLoad(array + i)
    i := Add(i, 1)
    jumpTo Start
End: …
Goal: Detect and eliminate dead branches
Dead branches: branches that cannot be reached in any execution of the program.
Detecting them requires determining whether a given condition is satisfiable in its context.
x < 5 ifTrue: [
    x > 10 ifTrue: [
        "Unreachable code"
    ]
]
PiNodes: Representing Constraints on Variables
x1 < 5 ifTrue: [
    x2 := 𝜋(x1, <5)
    x2 > 10 ifTrue: [
        x3 := 𝜋(x2, >10)
        "Unreachable code"
    ]
]
Optimizing with PiNodes
Control Flow Graph (CFG)
● Graph-based representation of a program
● Nodes = Basic blocks
● Edges = Jumps
SSA
● Variables are assigned exactly once
x := 5.
x := 27.
x < 10 ifTrue: [
    y := x.
] ifFalse: [
    y := 13.
].
z := y + x
In SSA form:
x1 := 5.
x2 := 27.
x2 < 10 ifTrue: [
    y1 := x2.
] ifFalse: [
    y2 := 13.
].
z1 := ?? + x2
● Φ-functions represent variables at merge points
x1 := 5.
x2 := 27.
x2 < 10 ifTrue: [ (B1)
    y1 := x2.
] ifFalse: [ (B2)
    y2 := 13.
].
y3 := Φ(B1 → y1, B2 → y2)
z1 := y3 + x2
SSA - use-def chains
Because each variable has a single definition, every use (for example y3 in z1 := y3 + x2) points to exactly one defining assignment.
PiNodes: representing constraints on variables
x < 5 ifTrue: [
    x > 10 ifTrue: [
        x doSomething.
        "Unreachable code"
    ]
]
x1 < 5 ifTrue: [
    x2 := 𝜋(x1, <5)
    x2 > 10 ifTrue: [
        x3 := 𝜋(x2, >10)
        x3 doSomething.
        "Unreachable code"
    ]
]
Optimizations - Dead branch elimination
Constant dead branch elimination
x1 < 5 ifTrue: [
    x2 := 𝜋(x1, <5).
    x2 < 10 ifTrue: [
        x3 := 𝜋(x2, <10).
    ] ifFalse: [
        x4 := 𝜋(x2, >=10).
        " unreachable "
    ].
]
x3 := 𝜋(x2, <10) and x2 := 𝜋(x1, <5)
Is (-∞; 5) ∩ (-∞; 10) empty?
NO ⇒ Reachable
x4:=𝜋(x2, >=10) and x2:=𝜋(x1, <5)
Is (-∞; 5) ∩ [10; ∞) empty?
YES ⇒ Unreachable
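The emptiness test on the intervals above can be sketched as follows. This is a minimal illustration rather than Druid's implementation: it assumes integer values, encodes each constraint as a closed interval (so `< 5` becomes [-∞, 4]), and all function names are hypothetical.

```python
import math


def interval_for(op, constant):
    """Closed integer interval [lo, hi] of the values x that satisfy `x op constant`."""
    if op == "<":
        return (-math.inf, constant - 1)
    if op == "<=":
        return (-math.inf, constant)
    if op == ">":
        return (constant + 1, math.inf)
    if op == ">=":
        return (constant, math.inf)
    raise ValueError(f"unsupported operator: {op}")


def intersect(a, b):
    """Intersection of two closed intervals, or None when it is empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def is_reachable(constraints):
    """Follow a chain of PiNode constraints (innermost last) on the same value.

    Returns False as soon as the accumulated interval becomes empty.
    """
    current = (-math.inf, math.inf)
    for op, constant in constraints:
        current = intersect(current, interval_for(op, constant))
        if current is None:
            return False
    return True


if __name__ == "__main__":
    print(is_reachable([("<", 5), ("<", 10)]))   # chain for x3
    print(is_reachable([("<", 5), (">=", 10)]))  # chain for x4
```

The first chain corresponds to x3 and stays non-empty; the second corresponds to x4 and becomes empty, so its block can be marked unreachable.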
Constant dead branch elimination
x1 < 5 ifTrue: [
    x2 := 𝜋(x1, <5).
    y1 := x2.
] ifFalse: [
    x3 := 𝜋(x1, >=5).
    y2 := 8.
].
y3 := Φ(y1, y2)
What are the possible values of y3?
The union of the possible values of y1 and y2:
(-∞; 5) ∪ {8}
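One possible way to model this union is to keep, for each SSA value, a sorted list of disjoint closed intervals. The sketch below assumes integer values and uses hypothetical helper names; the final queries are illustrative conditions that do not appear in the slides.

```python
import math


def union(*value_sets):
    """Union of value sets, each a list of closed (lo, hi) intervals.

    The result is sorted, and overlapping or adjacent integer intervals are merged.
    """
    intervals = sorted(iv for value_set in value_sets for iv in value_set)
    merged = []
    for lo, hi in intervals:
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def may_satisfy(value_set, lo, hi):
    """True if some value in value_set lies inside the closed interval [lo, hi]."""
    return any(max(lo, a) <= min(hi, b) for a, b in value_set)


# Values flowing into the Φ from each branch of the slide's example.
y1 = [(-math.inf, 4)]  # y1 := x2, with x2 constrained to < 5
y2 = [(8, 8)]          # y2 := 8
y3 = union(y1, y2)

if __name__ == "__main__":
    print(y3)
    print(may_satisfy(y3, 11, math.inf))  # could a later `y3 > 10` ever hold?
```

With this representation, a later branch on y3 is dead whenever `may_satisfy` returns False for the interval of its condition.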
ABCD method
More powerful:
● Models the relationship between variables
● Models the effect of basic arithmetic operations (addition and subtraction)
ABCD method
x1 <= y1 ifTrue: [
    x2 := 𝜋(x1, <=y1).
    y2 := 𝜋(y1, >=x1).
    x3 := x2 - 10.
    x3 < y2 ifTrue: [ "tautology"
        x4 := 𝜋(x3, <y2).
        y3 := 𝜋(y2, >x3).
    ] ifFalse: [ "unreachable"
        x5 := 𝜋(x3, >=y2).
        y4 := 𝜋(y2, <=x3).
    ]
].
Nodes represent SSA values
An edge from a to b with
weight w means that b - a ≤ w.
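With that edge encoding, proving a relation such as x3 < y2 amounts to finding a short enough path in the constraint graph. The sketch below is a simplified illustration, not the full ABCD algorithm: it ignores Φ nodes and integer overflow, uses plain Bellman-Ford, and the edge list is a hand-written encoding of the example above.

```python
import math


def shortest_distances(edges, source):
    """Bellman-Ford over (a, b, w) edges, where each edge means b - a <= w."""
    nodes = {n for a, b, _ in edges for n in (a, b)} | {source}
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    for _ in range(len(nodes) - 1):
        for a, b, w in edges:
            if dist[a] + w < dist[b]:
                dist[b] = dist[a] + w
    return dist


def proves(edges, a, b, bound):
    """True if the constraints imply b - a <= bound (a path a -> b of weight <= bound)."""
    return shortest_distances(edges, a)[b] <= bound


# Hand-written constraints for the slide's example (an assumption of this sketch).
edges = [
    ("y2", "x2", 0),    # x2 <= y2, from the outer condition x1 <= y1
    ("x2", "x3", -10),  # x3 := x2 - 10, so x3 - x2 <= -10
    ("x3", "x2", 10),   # and x2 - x3 <= 10
]

if __name__ == "__main__":
    # x3 < y2 over integers is x3 - y2 <= -1.
    print(proves(edges, "y2", "x3", -1))
    # The false branch would need y2 - x3 <= 0, which the graph does not support.
    print(proves(edges, "x3", "y2", 0))
```

The path y2 → x2 → x3 has weight -10, which is enough to show x3 - y2 ≤ -1, so the condition always holds and the `ifFalse:` branch is dead.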
Experiments and results
Experimental Context
Druid
● Source-to-source meta-compiler
● Uses many optimization passes
○ Analysis and code transformation
Old DBE vs PiNodes
● Druid already had a DBE pass
● Worked by computing all paths a variable was alive in
● Questions:
○ Is our new constant DBE method faster?
○ Is ABCD, the more powerful method, slower?
Measuring Compile Time Improvement
● Used two benchmarks to compare time spent optimizing:
○ Compiled all methods of a test class
○ Compiled one hand-crafted method with an intentionally complex control flow
Results
Future Work
● Stronger constraint solving - Z3
● Measuring run time improvements
● Looking for more optimization opportunities
○ Using the Druid optimizer for high-level Pharo code
○ Message splitting
More Opportunities for Complex Control Flows
x < 3 ifTrue: [
    y doSomething.
].
x < 5 ifTrue: [
    z doSomethingElse.
].
Future: Message splitting
Conclusions
● Branches make code slow
● It’s common to have some dead branches in your code
● PiNodes represent constraints on SSA variables, and can be used for DBE
Addendum
Critical Edges
● Edges whose successor has multiple predecessors, and whose predecessor has multiple successors
● They are annoying for PiNode insertion, because the successor is not dominated by the block containing the condition
Breaking Critical Edges
● Remove that edge, and insert in its place a new basic block containing just an unconditional jump to the critical edge’s target
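The definition and the fix can be sketched on a CFG stored as a dictionary from block name to successor list. This is an illustrative model only: the dictionary representation, the function names and the naming scheme for new blocks are assumptions, not Druid's API.

```python
def critical_edges(successors):
    """Edges (src, dst) where src has several successors and dst has several predecessors."""
    predecessors = {}
    for src, dsts in successors.items():
        for dst in dsts:
            predecessors.setdefault(dst, []).append(src)
    return [
        (src, dst)
        for src, dsts in successors.items()
        for dst in dsts
        if len(dsts) > 1 and len(predecessors[dst]) > 1
    ]


def break_critical_edges(successors):
    """Return a new CFG in which every critical edge goes through a fresh jump-only block."""
    result = {block: list(dsts) for block, dsts in successors.items()}
    for src, dst in critical_edges(successors):
        new_block = f"{src}_to_{dst}"  # hypothetical naming scheme
        result[src][result[src].index(dst)] = new_block
        result[new_block] = [dst]      # the new block only jumps to the old target
    return result


# B1 branches to B2 or B3, and B2 falls through to B3: the edge B1 -> B3 is critical.
cfg = {"B1": ["B2", "B3"], "B2": ["B3"], "B3": []}

if __name__ == "__main__":
    print(critical_edges(cfg))
    print(break_critical_edges(cfg))
```

After splitting, the new block has B1 as its only predecessor, so a PiNode for the condition in B1 can be placed there safely.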
Domination
● B1 dominates B2 if every path from the entry node to B2 must go through B1
B1 dominates B1, B2, B3, B4, B5
B2 dominates B2, B3
B3, B4, B5 only dominate themselves
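Dominator sets can be computed with a simple iterative data-flow pass. The sketch below uses a hypothetical five-block CFG chosen to be consistent with the relations listed above (the original figure is not available in the text), and the function names are illustrative.

```python
def dominators(successors, entry):
    """Map each block to the set of blocks that dominate it (iterative algorithm).

    Assumes every block appears as a key of `successors`.
    """
    blocks = set(successors)
    predecessors = {b: set() for b in blocks}
    for src, dsts in successors.items():
        for dst in dsts:
            predecessors[dst].add(src)

    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            preds = predecessors[b]
            common = set.intersection(*(dom[p] for p in preds)) if preds else set()
            new = {b} | common
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


def dominated_by(dom, block):
    """Blocks that `block` dominates."""
    return {b for b, ds in dom.items() if block in ds}


# Hypothetical CFG: B1 branches to B2 and B4; B2 -> B3; B3 and B4 merge into B5.
cfg = {"B1": ["B2", "B4"], "B2": ["B3"], "B3": ["B5"], "B4": ["B5"], "B5": []}
dom = dominators(cfg, "B1")

if __name__ == "__main__":
    for block in sorted(cfg):
        print(block, "dominates", sorted(dominated_by(dom, block)))
```

B5 is not dominated by B3 or B4 because it can be reached through either of them, which is the situation where a constraint learned in one branch cannot be assumed at the merge point.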
The PiNode Framework - insertion
● Break critical edges
● Insert PiNodes in each successor of a condition (one for each variable involved)
● Replace usages in dominated blocks
The PiNode Framework - deletion
Simple copy propagation algorithm: replace each usage of the PiNode with a usage of the copied variable
Dead branch elimination
Basic pseudocode of the algorithm:
unreachableBlocks := Set new.
cfg piNodesDo: [ :piNode |
    piNode ifNotSatisfiable: [
        unreachableBlocks add: piNode basicBlock.
    ].
].
cfg removeJmpsTo: unreachableBlocks.
cfg removeBlocks: unreachableBlocks.
Message splitting (with code)
Before:
x < 3 ifTrue: [
    y doSomething.
] ifFalse: [].
x < 5 ifTrue: [
    z doSomethingElse.
].
After:
x < 3 ifTrue: [
    y doSomething.
    x < 5 ifTrue: [
        z doSomethingElse.
    ].
] ifFalse: [
    x < 5 ifTrue: [
        z doSomethingElse.
    ].
].
