Value Numbering in GCC
Dr. Richard Biener
SUSE Labs, Sep 15th, 2022
Value Numbering
- Assign value numbers to expressions
- Expressions that produce the same value should have the same value number
- Usually achieved by hashing simplified and canonicalized expressions, with operands replaced by their value numbers
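The hashing idea above can be illustrated with a small, self-contained sketch. It is not GCC's implementation: the `ValueTable` type and its member names are hypothetical, `std::map` stands in for a hash table to keep the code short, and the only canonicalization shown is ordering the operands of commutative operators.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Toy value-numbering table (illustrative only, not GCC's data structures).
// An expression is keyed by its operator and the value numbers of its
// operands, so different names that carry the same value share one key.
struct ValueTable {
  using Key = std::tuple<char, int, int>;  // (operator, vn(lhs), vn(rhs))
  std::map<std::string, int> name_vn;      // variable name -> value number
  std::map<Key, int> expr_vn;              // canonical expression -> value number
  int next_vn = 0;

  // Value number of a variable; names seen for the first time get a fresh one.
  int of_name(const std::string &name) {
    auto it = name_vn.find(name);
    if (it != name_vn.end()) return it->second;
    int vn = next_vn++;
    name_vn[name] = vn;
    return vn;
  }

  // Value number of `lhs op rhs`, also recorded as the value of `dest`.
  int of_binary(const std::string &dest, char op,
                const std::string &lhs, const std::string &rhs) {
    int a = of_name(lhs);
    int b = of_name(rhs);
    // Canonicalize commutative operators by ordering the operand numbers.
    if ((op == '+' || op == '*') && a > b) std::swap(a, b);
    Key key{op, a, b};
    auto it = expr_vn.find(key);
    int vn;
    if (it != expr_vn.end()) {
      vn = it->second;        // same canonical expression seen before
    } else {
      vn = next_vn++;         // new value
      expr_vn[key] = vn;
    }
    assert(vn < next_vn);
    name_vn[dest] = vn;
    return vn;
  }
};
```

With this table, `a + b` and `b + a` receive the same value number, while `a - b` receives a different one.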
Value Numbering in GCC
Multiple value numbering implementations and their main users
- RTL CSE (cselib)
- RTL PRE
- GIMPLE SSA DOM (scoped tables)
- GIMPLE SSA FRE/PRE (RPO VN)
- simpler forms of VN in CCP and copy propagation
Common Subexpression Elimination
For each statement
- try to simplify the computed expression using the value numbers of the operands
- look up the value number of the simplified expression
- if found and a register with that value is available, replace the expression with the register or constant
- if not found, record a new value number for it and make it available in the destination receiving the value of the expression
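The per-statement steps can be sketched for a single straight-line block as follows. This is a simplified illustration under stated assumptions: `Stmt` and `cse` are hypothetical names, every destination is assigned once (SSA-like), there is no simplification step, and within one block every recorded value trivially has an available leader.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// One statement of the form `dest = lhs op rhs` (hypothetical toy IR).
struct Stmt { std::string dest; char op; std::string lhs, rhs; };

// Returns the block after CSE, one printed statement per input statement.
std::vector<std::string> cse(const std::vector<Stmt> &stmts) {
  std::map<std::string, int> vn_of;                 // name -> value number
  std::map<std::tuple<char, int, int>, int> table;  // expression -> value number
  std::map<int, std::string> leader;                // value number -> register holding it
  int next_vn = 0;

  // Value number of an operand; unseen names become their own leader.
  auto number = [&](const std::string &name) -> int {
    auto it = vn_of.find(name);
    if (it != vn_of.end()) return it->second;
    leader[next_vn] = name;
    vn_of[name] = next_vn;
    return next_vn++;
  };

  std::vector<std::string> out;
  for (const Stmt &s : stmts) {
    int a = number(s.lhs);
    int b = number(s.rhs);
    auto key = std::make_tuple(s.op, a, b);
    auto hit = table.find(key);
    if (hit != table.end()) {
      // Found: replace the expression with the register that holds the value.
      vn_of[s.dest] = hit->second;
      out.push_back(s.dest + " = " + leader[hit->second]);
    } else {
      // Not found: record a new value and make it available in the destination.
      table[key] = next_vn;
      vn_of[s.dest] = next_vn;
      leader[next_vn] = s.dest;
      ++next_vn;
      // Operands are printed via their leaders.
      out.push_back(s.dest + " = " + leader[a] + " " + s.op + " " + leader[b]);
    }
  }
  assert(out.size() == stmts.size());
  return out;
}
```

For the input `x = a + b; y = a + b; z = y * c; w = x * c`, the second statement becomes a copy of `x`, and the fourth becomes a copy of `z` because `y` and `x` share a value number.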
Availability
Different ways to track, update and query the availability of a so-called leader for a value number
- with a DOM walk, a value-to-leader map can be kept up to date with an unwind stack
- the RPO VN walk records a list of leaders for each value; the list can be unwound when iterating and is otherwise queried with dominator checks
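The first scheme, a value-to-leader map kept current with an unwind stack during a dominator-tree walk, might look roughly like the sketch below. The `ScopedLeaders` type and its methods are hypothetical; the sketch only assumes that `enter_block` and `leave_block` are called when the walk enters and leaves a dominator-tree node.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy scoped leader table for a dominator-tree walk (illustrative only).
struct ScopedLeaders {
  std::map<int, std::string> leader;  // value number -> available leader
  std::vector<int> undo;              // value numbers recorded, in order
  std::vector<std::size_t> marks;     // undo-stack height at each block entry

  void enter_block() { marks.push_back(undo.size()); }

  // Record `name` as leader for `vn` unless a dominating leader exists.
  void record(int vn, const std::string &name) {
    if (leader.count(vn)) return;
    leader[vn] = name;
    undo.push_back(vn);
  }

  // Leader available at the current point of the walk, or nullptr.
  const std::string *lookup(int vn) const {
    auto it = leader.find(vn);
    return it == leader.end() ? nullptr : &it->second;
  }

  // Unwind everything recorded since the matching enter_block().
  void leave_block() {
    assert(!marks.empty());
    while (undo.size() > marks.back()) {
      leader.erase(undo.back());
      undo.pop_back();
    }
    marks.pop_back();
  }
};
```

Because leaders recorded in a block are removed when the walk leaves it, whatever `lookup` returns was recorded in the current block or in a block that dominates it.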
Availability and expression simplification
- uses match.pd-based simplification
- value expression operands get substituted with their leaders
- this allows keeping flow-sensitive info like ranges
Memory Expressions
ENTRY
<bb 2>:
# .MEM_3 = VDEF <.MEM_1(D)>
p_2(D)->a = 0;
# .MEM_4 = VDEF <.MEM_3>
p_2(D)->b = 1;
# .MEM_5 = VDEF <.MEM_4>
x = *p_2(D);
# VUSE <.MEM_5>
_6 = x.a;
# .MEM_7 = VDEF <.MEM_5>
x ={v} {CLOBBER(eol)};
# VUSE <.MEM_7>
return _6;
Memory Expressions
- memory state is part of hashing, the current .MEM_n virtual definition is used
- at lookup time, walk the virtual SSA use->def chains, skip clobbers that do not alias and perform lookups with the previous memory state
- fancy tricks during walking
  - memory to memory copies
  - pieces from larger entities
  - larger objects formed from smaller entities
- memory handling consumes the majority of compile time
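The walk over the virtual use->def chain can be sketched with a toy memory model. Everything here is an assumption made for illustration: `MemoryChain` and `Store` are hypothetical, locations are plain strings, two different strings are taken not to alias, and the special location `"*"` stands for a store that may alias anything. None of the "fancy tricks" (copies, pieces of larger entities) are modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One virtual definition: a store that turns memory state `prev` into a new one.
struct Store { int prev; std::string loc; int value; };

struct MemoryChain {
  // defs[i] creates memory state i + 1; state 0 is the state at function entry.
  std::vector<Store> defs;

  // Append a store on top of state `prev` and return the new memory state.
  int store(int prev, const std::string &loc, int value) {
    assert(prev >= 0 && prev <= static_cast<int>(defs.size()));
    defs.push_back(Store{prev, loc, value});
    return static_cast<int>(defs.size());
  }

  // Look up the value of `loc` as seen from memory state `state`.
  // Returns false when the walk cannot determine the value.
  bool load(int state, const std::string &loc, int &value) const {
    while (state != 0) {
      const Store &def = defs[state - 1];
      if (def.loc == loc) {          // store to the same location: forward it
        value = def.value;
        return true;
      }
      if (def.loc == "*") return false;  // may-alias clobber: stop walking
      state = def.prev;              // does not alias: retry with previous state
    }
    return false;                    // reached function entry, nothing known
  }
};
```

In the spirit of the dump above, a load of `p->a` after the stores to `p->a` and `p->b` skips the non-aliasing store to `p->b` and finds the value 0.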
Why RPO VN
- SSA SCC VN
  - reduces what to iterate
  - difficult to mate with the CFG: non-executable parts, predication, equivalences, regions
- RPO VN
  - iteration is more costly
  - maps to the CFG, allows for flow-sensitive optimizations easily
  - allows region-based operation
RPO VN Operation Modes
- can operate with different effort for memory handling
- can do optimistic, iterating VN with elimination done after the fact
- can do non-iterating VN with immediate elimination
- can operate on the whole function or on a single-entry, multiple-exit region
Iterating vs non-Iterating
ENTRY
<bb 2>:
n_3 = 1;
i_4 = 0;
val_5 = 0;
loop 1
<bb 3>:
# i_1 = PHI <i_4(2), i_7(3)>
# val_2 = PHI <val_5(2), val_6(3)>
val_6 = val_2 + 1;
i_7 = i_1 + 1;
if (i_7 < n_3)
goto <bb 3>; [INV]
else
goto <bb 4>; [INV]
<bb 4>:
_8 = val_6;
return _8;
Iteration scheme
- SSA SCC based VN iterates SSA SCCs until nothing changes
- RPO VN iterates CFG cycles
  - rev_post_order_and_mark_dfs_back_seme can compute an RPO with CFG cycles adjacent and their extent in the RPO array recorded
  - handles irreducible regions, loop info would not
  - optimal regions for iteration
- avoid iteration when possible, do not iterate until nothing changes
- unwind cost to the iteration point is linear in the amount of things to undo (expression hashes, availability)
- iteration itself is O(n * loop-depth), inner cycles are iterated fully before iterating outer cycles
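For readers unfamiliar with the ordering itself, here is a textbook reverse post-order computation that also marks DFS back edges. It is a plain sketch, not `rev_post_order_and_mark_dfs_back_seme`: it does not keep CFG cycles adjacent, does not record cycle extents, and has no notion of exit blocks. The `Rpo` type and `reverse_post_order` function are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Result of the walk: blocks in reverse post-order plus the DFS back edges.
struct Rpo {
  std::vector<int> order;
  std::vector<std::pair<int, int>> back_edges;  // (source block, destination block)
};

// Depth-first search; state is 0 = unvisited, 1 = on the DFS stack, 2 = done.
static void dfs(const std::vector<std::vector<int>> &succs, int bb,
                std::vector<int> &state, Rpo &out) {
  state[bb] = 1;
  for (int s : succs[bb]) {
    if (state[s] == 0)
      dfs(succs, s, state, out);
    else if (state[s] == 1)
      out.back_edges.push_back({bb, s});  // target is still on the stack
  }
  state[bb] = 2;
  out.order.push_back(bb);  // post-order position
}

// Reverse post-order of the blocks reachable from `entry`.
Rpo reverse_post_order(const std::vector<std::vector<int>> &succs, int entry) {
  assert(entry >= 0 && entry < static_cast<int>(succs.size()));
  std::vector<int> state(succs.size(), 0);
  Rpo out;
  dfs(succs, entry, state, out);
  std::reverse(out.order.begin(), out.order.end());
  return out;
}
```

For a three-block CFG shaped like the loop example (entry block 0, self-looping block 1, exit block 2), the order is 0, 1, 2 and the only back edge is 1 -> 1.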
Non-iterative mode
- Greedy walk along edges discovered as executable, but enforcing RPO visiting of reachable blocks.
- Predecessors not visited and reachable from blocks later in RPO have to be conservatively assumed reachable.
- Handles PHIs with unreachable incoming non-back edges optimally
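The difference between the two modes can be illustrated on the loop from the "Iterating vs non-Iterating" slide. The sketch below is one reading of that example, hard-coded to its CFG and using constants in place of value numbers; `Val`, `meet`, and `analyze_loop` are hypothetical names. Unlike RPO VN, which avoids iterating until nothing changes, this toy simply repeats the loop body until it is stable.

```cpp
#include <cassert>

// Two-level lattice: either a known constant or unknown (varying).
struct Val { bool known; int c; };

static Val meet(Val a, Val b) {
  if (a.known && b.known && a.c == b.c) return a;
  return Val{false, 0};
}
static Val plus1(Val v) { return v.known ? Val{true, v.c + 1} : v; }
static bool same(Val a, Val b) {
  return a.known == b.known && (!a.known || a.c == b.c);
}

// Value of val_6 at the loop exit, where n is the constant assigned to n_3.
// iterate == true: optimistic, the back edge starts out as not executable.
// iterate == false: single pass, the unvisited back edge is assumed
// executable and its PHI arguments are unknown.
Val analyze_loop(bool iterate, int n) {
  const Val zero = {true, 0};          // i_4 and val_5 from <bb 2>
  const Val unknown = {false, 0};
  bool back_exec = !iterate;
  Val i_7 = unknown, val_6 = unknown;  // back-edge PHI arguments
  for (int pass = 0;; ++pass) {
    assert(pass < 3 && "the two-level lattice stabilizes within three passes");
    Val i_1 = back_exec ? meet(zero, i_7) : zero;
    Val val_2 = back_exec ? meet(zero, val_6) : zero;
    Val next_val_6 = plus1(val_2);
    Val next_i_7 = plus1(i_1);
    // The back edge is taken unless i_7 < n_3 is known to be false.
    bool takes_back = !next_i_7.known || next_i_7.c < n;
    bool changed = !same(next_val_6, val_6) || !same(next_i_7, i_7) ||
                   (takes_back && !back_exec);
    val_6 = next_val_6;
    i_7 = next_i_7;
    back_exec = back_exec || takes_back;
    if (!iterate || !changed) return val_6;
  }
}
```

With `n_3 = 1` the optimistic mode finds that the back edge is never executable and that `val_6` is 1, whereas the single-pass mode has to treat the PHIs as unknown.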
RPO VN as Utility
RPO VN was designed to be usable on small regions of a function without much overhead, even when invoked very often, and to be much cheaper than a pass over the whole function.
- loop unrolling applies CSE on unrolled bodies before trying to unroll the containing loop
- loop if-conversion applies CSE to optimize predicates
- unroll-and-jam applies CSE to leverage cross-loop redundancies
- uninit analysis uses RPO VN to compute basic block reachability without performing actual CSE
RPO VN Utility API
enum vn_lookup_kind { VN_NOWALK, VN_WALK, VN_WALKREWRITE };
unsigned do_rpo_vn
  (function *fun, edge entry, bitmap exits,
   /* iterate */ bool = false, /* eliminate */ bool = true,
   vn_lookup_kind = VN_WALKREWRITE);
int rev_post_order_and_mark_dfs_back_seme
  (function *fn, edge entry, bitmap exit_bbs,
   bool for_iteration, int *rev_post_order,
   vec<std::pair<int, int> > *scc_ext);
auto_bb_flag, auto_edge_flag
RPO VN Utility Efficiency
Non-iterating region-based VN, with or without elimination, was designed to be efficient
- startup cost is linear in the size of the region
- performing RPO VN with VN_NOWALK, without iteration and elimination, on each basic block individually vs. performing a single RPO VN on the whole function is only around 15% slower for cc1 files, with insn-attrtab.i being the outlier at 280%
- more elaborate memory handling or doing elimination does not allow for an apples-to-apples comparison
- while doing CSE on the whole function might perform more optimizations, doing that should never be faster than only doing CSE on the regions a pass performed a transformation on
TODO
- experiment with using ranger instead of the ad-hoc predication we have
- review equivalence tracking changes
- think of a cheaper way to do “iteration”
- we have simple DCE with an SSA worklist, need region DCE/DSE
Questions?
