OptView2
View and improve
compiler optimizations
Ofek Shilon
Istra Research
Jan 2022
@OfekShilon
1
Things I’d like the compiler to tell me:
• I couldn’t inline that function
…because I had only its declaration.
• I’m re-evaluating these expressions on every loop iteration
…because I couldn’t prove they stay fixed.
• I’m reloading this variable from memory many times
…because I don’t know whether unrelated code lines change it.
2
Some raw optimization data is exposed
• Clang: -Rpass (GCC's counterpart: -fopt-info)
code.cc:4:25: remark: foo inlined into bar [-Rpass=inline]
• Intel: -qopt-report=[0..5]
• Microsoft (very partial):
/Qpar-report, /Qvec-report
3
4
Presentation Matters.
• opt-viewer, 2016 work by Adam Nemet (Apple) and others
• Focus on Clang, over Linux
5
Opt-Viewer Usage
• Build with an extra clang switch:
-fsave-optimization-record
*.opt.yaml files are generated, by
default in the obj folder.
• Generate htmls:
$ opt-viewer.py \
    --output-dir <htmls folder> \
    --source-dir <repo> \
    <yamls folder>
6
opt-viewer.py Sample Output
Inlining
context
Hotness
(PGO)
7
Opt-Viewer Usage
• Speed up YAML parsing ~x50 by installing libyaml:
$ sudo apt install libyaml-dev
$ pip --no-cache-dir install --verbose --force-reinstall -I pyyaml
• The script would detect missing libyaml and suggest installing it:
“For faster parsing, you may want to install libyaml for PyYAML”
8
9
Great work, still not much traction
https://www.youtube.com/watch?v=qq0q1hfzidg
10
Great work, still not much traction
11
Why?
• Heavy
• High I/O
• High memory
• >1G htmls
• Designed (and presented) for compiler authors
• Mostly non-actionable for developers
12
Introducing optview2
• https://github.com/OfekShilon/optview2
13
Target Developers, Not Compiler Authors
• Denoise:
• Collect only optimization failures
• Ignore system headers
• Display a single entry per type/source loc (in index)
• …
• ~1.5M lines ==> 22K lines
• Don’t mention ‘passes’
• Make the index sortable, resizable & pageable
• Abridged function names
• ...
14
Example - OpenCV
• file://wsl.localhost/Ubuntu-20.04/home/ofek/html/opencv-clean/modules/photo/index.html
15
“Definition is unavailable”
16
!
17
“Clobbered by store”
• https://godbolt.org/z/K9a79MhcE
• Aliasing
• Forces reloading redistBatch on every loop iteration
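A toy sketch of the kind of code that can produce a "clobbered by store" remark. This is not the code behind the godbolt link above; the `Params` struct and the `fill` function are hypothetical, and only the name `redistBatch` is borrowed from the slide. The store goes through an `int*`, and the loop reads an `int` member, so the compiler may be unable to prove that the store leaves the member untouched.

```cpp
#include <cstddef>

// Hypothetical stand-in for the slide's example; only the member name
// redistBatch comes from the slide.
struct Params {
    int redistBatch;  // read on every loop iteration
};

// `out` is an int*, so as far as the compiler can tell, out[i] might be the
// very same int as p.redistBatch. Unless it can prove otherwise, it has to
// reload p.redistBatch from memory after each store to out[i].
void fill(int* out, std::size_t n, const Params& p) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = p.redistBatch * static_cast<int>(i);
}
```

Whether the reload actually happens depends on the compiler version and on what it can see at the call site after inlining.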
18
Aliasing – the silent killer
• Distribution of 22K remarks in a C++ project:
19
20
“Clobbered by call”
• https://godbolt.org/z/bb9fTKj9d
• https://github.com/llvm/llvm-project/issues/53102
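A minimal sketch of a "clobbered by call" situation, under stated assumptions: `g_scale`, `g_calls`, `log_progress` and `sum_scaled` are hypothetical names, not taken from the linked examples. In a real project `log_progress` would be defined in another translation unit; it is defined here only so that the snippet links and runs.

```cpp
int g_scale = 2;  // global state read inside the loop
int g_calls = 0;

// Assumption: picture this function defined in a different .cpp file, so that
// the caller sees only its declaration. The optimizer then has to assume the
// call may write to any global, g_scale included.
void log_progress(int /*step*/) { ++g_calls; }

int sum_scaled(const int* data, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        log_progress(i);           // may modify g_scale, as far as the caller knows
        sum += data[i] * g_scale;  // hence g_scale is reloaded on every iteration
    }
    return sum;
}
```

Copying `g_scale` into a local before the loop would avoid the reload, but it is only correct if the callee really never changes it.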
21
22
Aliasing: Counter-measures
• LTO ?
• __restrict__
• Non standard, tells the compiler an argument doesn’t alias with anything else
• __attribute__((pure)), __attribute__((const))
• Non standard; tell the compiler a function doesn’t write global state (pure), or neither reads nor writes it (const).
• Potentially resolve ‘clobbered by call’ opt failures.
• Strict-Aliasing
• The only C++ standard compliant way.
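A short sketch of the non-standard annotations listed above, as accepted by GCC and Clang. The function names and the `g_table` array are hypothetical and only illustrate where each annotation goes; whether they change the generated code in a given case is something to check in the optimization remarks.

```cpp
#include <cstddef>

// __restrict__: the caller promises that dst, src and scale never overlap,
// so a store to dst[i] cannot change *scale and it can be loaded once.
void scale_into(float* __restrict__ dst, const float* __restrict__ src,
                const float* __restrict__ scale, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * *scale;
}

// const: the result depends only on the arguments; no global memory is read
// or written.
__attribute__((const)) int square(int x) { return x * x; }

int g_table[4] = {1, 2, 3, 4};

// pure: the function may read global state but has no side effects.
__attribute__((pure)) int table_sum() {
    int sum = 0;
    for (int v : g_table) sum += v;
    return sum;
}
```

These are promises, not checks: breaking one (for example, passing overlapping buffers to `scale_into`) is undefined behavior.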
23
Strict Aliasing
• “Strict aliasing is an assumption made by the compiler, that
objects of different types will never refer to the same
memory location (i.e. alias each other.)”
Mike Acton https://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
• An allowed optimization, on by default at -O2 and above.
• char* is an exception – always allowed to alias with any type.
• Can be disabled with -fno-strict-aliasing (pessimization!)
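A minimal illustration of the rule and of the character-type exception. Both functions are hypothetical toy cases; the comments describe what the compiler is allowed to assume, not what a particular compiler version is guaranteed to emit.

```cpp
// int and float are different types, so under strict aliasing the compiler
// may assume `count` and `weight` point to different objects.
int scaled_count(int* count, float* weight) {
    *count = 10;
    *weight = 0.5f;     // assumed not to modify *count
    return *count * 2;  // may be computed without reloading *count
}

// A character pointer is allowed to alias any object, so the store below
// might overwrite part of *count.
int count_after_byte_store(int* count, unsigned char* bytes) {
    *count = 10;
    bytes[0] = 0;       // may touch the bytes of *count
    return *count * 2;  // *count has to be reloaded here
}
```

With -fno-strict-aliasing the first function is treated like the second, which is why disabling the rule is a pessimization.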
24
Strict Aliasing: Forcing type difference
• typedef / using ?
• Inherit int?
• “I considered …to allow derivation from built-in classes … However, I
restrained myself. … the C conversion rules are so chaotic that pretending that
int, short, etc., are well-behaved ordinary classes is not going to work. They
are either C compatible, or they obey the relatively well-behaved C++ rules
for classes, but not both.”
Bjarne Stroustrup, The Design and Evolution of C++, §15.11.3.
• Strong-Typedefs
• Wrappers
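A sketch of the wrapper approach, assuming two hypothetical types `Counter` and `Limit`. A `typedef` or `using` alias names the same type and therefore gives strict aliasing nothing to work with, while two distinct wrapper structs are different types as far as type-based alias analysis is concerned.

```cpp
#include <cstddef>
#include <type_traits>

// An alias is still int: it does not create a new type.
using Alias = int;
static_assert(std::is_same<Alias, int>::value, "an alias is the same type");

// Hypothetical wrappers: distinct class types that each hold an int.
struct Counter { int value; };
struct Limit   { int value; };

// The stores go to Counter objects and the load reads a Limit object.
// Since the types differ, the compiler may assume the stores do not change
// limit->value and keep it in a register for the whole loop.
void bump_all(Counter* counters, std::size_t n, const Limit* limit) {
    for (std::size_t i = 0; i < n; ++i)
        counters[i].value += limit->value;
}
```

Whether a given compiler takes advantage of this is best confirmed by checking that the corresponding remark disappears from the report.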
25
Strict Aliasing demos
• https://godbolt.org/z/qGdj4qoG1
• https://godbolt.org/z/zWfYdsorE
• https://godbolt.org/z/oxq1K3n8W
26
Demo 3: loop hoisting
27
• Check foreach!!
• Try to form a toy example
28
Demo 3: loop hoisting
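The demo's code is not reproduced in the slide text, so here is a toy sketch of manual loop hoisting with a hypothetical `Image` struct. Stores through an `unsigned char*` may alias anything, including the struct's own fields, so the naive loop may re-read `img.data`, `img.rows` and `img.cols` on every iteration.

```cpp
// Hypothetical image type, loosely modeled on a byte buffer with dimensions.
struct Image {
    unsigned char* data;
    int rows;
    int cols;
};

// Each store through img.data might overwrite img.rows, img.cols or img.data
// itself, so the loop bound and the base pointer may be reloaded every time.
void clear_naive(Image& img) {
    for (int i = 0; i < img.rows * img.cols; ++i)
        img.data[i] = 0;
}

// Hoisting by hand: copy the loop-invariant values into locals. No store
// through `data` can change a local whose address is never taken.
void clear_hoisted(Image& img) {
    unsigned char* const data = img.data;
    const int count = img.rows * img.cols;
    for (int i = 0; i < count; ++i)
        data[i] = 0;
}
```

The two versions behave the same only when the buffer does not overlap the `Image` object itself, which is the assumption the programmer makes explicit by hoisting.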
29
30
GCC work
• https://gcc.gnu.org/legacy-ml/gcc-patches/2018-05/msg01675.html
• https://github.com/davidmalcolm/gcc-opt-viewer
• YAML -> JSON.gz
• Actually a better choice
https://stackoverflow.com/questions/27743711/can-i-speedup-yaml
• Active only during 2018
• Currently seems unusable
https://github.com/davidmalcolm/gcc-opt-viewer/issues/3
31
Conclusions applicable to other compilers?
• In many cases – yes.
• Shows where to look
32
OptView2 Future work
• Improve filtering,
• Reduce run time & memory,
• Run on windows,
• Consume binary optimization remarks,
• Revive (possibly integrate) gcc-opt-viewer,
• Support code annotations,
• Report LLVM bugs.
33
Lots of help needed
• https://github.com/OfekShilon/optview2
• ofekshilon@gmail.com
34
Bkp
35
llvm-opt-report
$ clang -O3 -o /tmp/v.o -c /tmp/v.c -fsave-optimization-record
$ llvm-opt-report /tmp/v.opt.yaml > /tmp/v.lst
$ cat /tmp/v.lst
• https://reviews.llvm.org/D25262
• https://github.com/llvm/llvm-project/tree/main/llvm/tools/llvm-opt-report
36
Impact
https://github.com/opencv/opencv/wiki/HowToUsePerfTests
$ cd ~/src/opencv/modules/ts/misc/
$ python3 ./run.py ~/src/opencv-clean/build/ -w ~/logs-opencv-clean
$ python3 ./run.py ~/src/opencv-opt/build/ -w ~/logs-opencv-opt
$ python3 ./summary.py ~/logs-opencv-clean/core* ~/logs-opencv-opt/core* \
    -u mks -m median
• Seems substantial.
• Will publish more detailed steps in the optview2 github page (and
possibly fork opencv)
Detour
37
38

OptView2 MUC meetup slides


Editor's Notes

  • #2: A very little-known LLVM feature that I really think can add value to your day-to-day work, certainly if you’re working with Clang. I’ll first present LLVM’s existing opt-viewer tool, then optview2, which is derived work by me. About half the time will be dedicated to demonstrations of this tool in use, with a specific focus on aliasing issues.
  • #3: I want to gain visibility into compiler optimizations. More specifically, I want to see where the compiler tried optimizations and failed – there might be something I could do about it.
  • #7: LLVM was already instrumented to show optimization choices (for -Rpass). Hal Finkel wrapped the data in YAML format for machine consumption, and Adam Nemet wrote the opt-viewer Python scripts.
  • #8: A few small python scripts that process the compiler-emitted remarks into annotated source htmls, and an index page – which is an optimization dashboard of sorts.
  • #13: Don’t know; I’m guessing. I stopped it at 60G of memory. I ran it on separate subsets of a real project. An extra step is needed to make this usable to us. Let’s take that extra step.
  • #16: The demos show results from the optview2 version, but let it be said that all the heavy lifting is done by the original code. Credit to Ilan Ben Hagai for JavaScript wizardry.
  • #18: Full opt-viewer, not just failures
  • #19: file:///C:/optview2/html/opencv4/modules_imgproc_src_clahe.cpp.html#L201 One of the darker corners of C++, which is already full of dark corners. Some guesswork is involved in interpreting the optview2 reports.
  • #24: __restrict__ also compiles on non-arguments, but I didn’t see any impact.
  • #25: Turning on strict aliasing means allowing the compiler to assume exactly that. Turning it off is a pessimization. What can be done? Force type difference.
  • #28: file:///C:/optview2/html/opencv2/_home_ofek_src_opencv_modules_imgproc_src_canny.cpp.html#L419
  • #30: This is not good C++ code. Don’t do this for fun. Do this in code where you *really* care about cycle-level performance. Hopefully compilers will get good enough that we won’t have to do this.
  • #33: Probably. If your project *can* be built by clang, I suggest you give it a go.
  • #34: Report LLVM bugs: alias analysis bugs had no observable symptoms until now. No diagnostics are emitted for them and they don’t result in bad codegen, so I suspect there are plenty of them.
  • #38: Tests were done on WSL2, where sys calls have overhead, but that applies to both repos.