Stencil Computation
Research Project
Jishnu P | Reshmi Mitra
Presentation #1 | Date: 11-Jul-2017
The Agenda
Discuss two or three optimization techniques from "An Auto-Tuning Framework for Parallel Multicore Stencil Computations"
Optimizations
Techniques used in the auto-tuning framework.
Several common optimizations have been implemented in the framework as AST transformations, including:
● Loop Unrolling
● Cache Blocking
● Arithmetic Simplification
Loop Unrolling
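Loop unrolling replicates the loop body several times per iteration. This means fewer loop-control instructions (increment, compare, branch) run per grid point, and the compiler has more independent operations to schedule. The C sketch below assumes a 1D three-point stencil with coefficients `a`, `b` and `c`. The function names are hypothetical, and this is not the framework's generated code. An auto-tuner would normally try several unroll factors rather than fixing one at 4.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: out[i] = a*in[i-1] + b*in[i] + c*in[i+1] for interior points 1..n-2. */
void stencil_1d(const double *in, double *out, size_t n,
                double a, double b, double c)
{
    assert(in != out); /* out-of-place update */
    for (size_t i = 1; i + 1 < n; i++)
        out[i] = a * in[i - 1] + b * in[i] + c * in[i + 1];
}

/* Same computation with the body unrolled by a factor of 4.
   Each output point uses the same expression as the baseline. */
void stencil_1d_unroll4(const double *in, double *out, size_t n,
                        double a, double b, double c)
{
    assert(in != out);
    size_t i = 1;
    /* Main loop: 4 interior points per trip while i + 3 <= n - 2. */
    for (; i + 4 < n; i += 4) {
        out[i]     = a * in[i - 1] + b * in[i]     + c * in[i + 1];
        out[i + 1] = a * in[i]     + b * in[i + 1] + c * in[i + 2];
        out[i + 2] = a * in[i + 1] + b * in[i + 2] + c * in[i + 3];
        out[i + 3] = a * in[i + 2] + b * in[i + 3] + c * in[i + 4];
    }
    /* Remainder loop: the last 0-3 interior points. */
    for (; i + 1 < n; i++)
        out[i] = a * in[i - 1] + b * in[i] + c * in[i + 1];
}
```

The remainder loop is what lets the unrolled version handle any grid length, not just lengths that divide evenly by the unroll factor.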
Cache Blocking
To expose temporal locality and increase cache reuse
Cache Blocking
● An important class of algorithmic changes involves blocking data structures to fit in cache.
● By organizing memory accesses, one can load the cache with a small subset of a much larger data set.
● The idea is then to work on this block of data while it is in cache.
● By using and reusing this data in cache, we reduce the need to go to main memory (reducing memory bandwidth pressure).
An Example
Example (continued)
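The following C sketch applies the blocking idea to a 2D five-point stencil. Instead of sweeping each full row, it visits the interior in tiles of `bi` by `bj` points. The rows that a tile touches are then more likely to stay in cache while neighboring points reuse them. The grid layout (row-major, `ny` rows of `nx` points), the function names and the tile sizes are assumptions for illustration. In an auto-tuning setting, the tile sizes are the parameters one would search over. Whether blocking helps at all depends on how the grid size compares to the cache size.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major index into an ny x nx grid. */
#define IDX(i, j, nx) ((i) * (nx) + (j))

/* Five-point average at interior point (i, j). */
static double five_point(const double *in, size_t i, size_t j, size_t nx)
{
    return 0.2 * (in[IDX(i, j, nx)] + in[IDX(i - 1, j, nx)] +
                  in[IDX(i + 1, j, nx)] + in[IDX(i, j - 1, nx)] +
                  in[IDX(i, j + 1, nx)]);
}

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Unblocked sweep: full rows, one after another. */
void sweep_naive(const double *in, double *out, size_t ny, size_t nx)
{
    assert(in != out);
    for (size_t i = 1; i + 1 < ny; i++)
        for (size_t j = 1; j + 1 < nx; j++)
            out[IDX(i, j, nx)] = five_point(in, i, j, nx);
}

/* Blocked sweep: the outer two loops walk over bi x bj tiles; the inner two
   loops cover one tile, clipped at the last interior row/column. */
void sweep_blocked(const double *in, double *out, size_t ny, size_t nx,
                   size_t bi, size_t bj)
{
    assert(in != out && bi > 0 && bj > 0);
    for (size_t ii = 1; ii + 1 < ny; ii += bi)
        for (size_t jj = 1; jj + 1 < nx; jj += bj)
            for (size_t i = ii; i < min_size(ii + bi, ny - 1); i++)
                for (size_t j = jj; j < min_size(jj + bj, nx - 1); j++)
                    out[IDX(i, j, nx)] = five_point(in, i, j, nx);
}
```

Both sweeps update every interior point exactly once. Only the visiting order changes, so the blocked version produces the same output as the naive one.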
Arithmetic simplification
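The sketch below shows one possible kind of arithmetic simplification. It takes a five-point update in which all four neighbors share the coefficient `w` and applies two rewrites:
● it factors `w` out of the neighbor sum, so four multiplications become one;
● it hoists the loop-invariant center coefficient `1 - 4w` out of the loops.
The coefficient names and functions are assumptions, and the framework's actual simplification rules may differ. Factoring changes the order of floating-point operations, so results can differ in the last bits. A C compiler that follows strict IEEE semantics will generally not make this rewrite on its own, so the code generator has to make that choice explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* As written: every neighbor multiplied by w, and the center
   coefficient (1 - 4w) recomputed at every grid point. */
void update_naive(const double *in, double *out, size_t ny, size_t nx, double w)
{
    assert(in != out);
    for (size_t i = 1; i + 1 < ny; i++)
        for (size_t j = 1; j + 1 < nx; j++) {
            size_t p = i * nx + j;
            out[p] = (1.0 - 4.0 * w) * in[p]
                   + w * in[p - 1] + w * in[p + 1]
                   + w * in[p - nx] + w * in[p + nx];
        }
}

/* Simplified: center coefficient hoisted out of the loops and the shared
   factor w applied once to the neighbor sum (6 multiplies -> 2 per point). */
void update_simplified(const double *in, double *out, size_t ny, size_t nx, double w)
{
    assert(in != out);
    const double center = 1.0 - 4.0 * w;
    for (size_t i = 1; i + 1 < ny; i++)
        for (size_t j = 1; j + 1 < nx; j++) {
            size_t p = i * nx + j;
            out[p] = center * in[p]
                   + w * (in[p - 1] + in[p + 1] + in[p - nx] + in[p + nx]);
        }
}
```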
AST - Abstract Syntax Tree
● Abstract syntax trees (ASTs) are data structures widely used in compilers because they represent the structure of program code.
● An AST is usually the result of the syntax analysis (parsing) phase of a compiler.
● It often serves as the intermediate representation of the program through several later compiler stages, and it strongly affects the compiler's final output.
AST example
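The C sketch below shows what an AST for a small arithmetic expression can look like. It also includes one transformation written as a tree rewrite: constant folding, which replaces an operator whose operands are both constants with a single constant node. This mirrors the idea that the framework's optimizations are AST transformations. However, the node layout and every function name (`ast_num`, `ast_fold`, and so on) are hypothetical and are not the framework's own data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node kinds for a tiny expression language. */
typedef enum { AST_NUM, AST_VAR, AST_ADD, AST_MUL } AstKind;

typedef struct AstNode {
    AstKind kind;
    double value;              /* AST_NUM: literal value          */
    int var;                   /* AST_VAR: index into environment */
    struct AstNode *lhs, *rhs; /* AST_ADD / AST_MUL: operands     */
} AstNode;

static AstNode *ast_new(AstKind kind, double value, int var,
                        AstNode *lhs, AstNode *rhs)
{
    AstNode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind; n->value = value; n->var = var;
    n->lhs = lhs; n->rhs = rhs;
    return n;
}

AstNode *ast_num(double v)               { return ast_new(AST_NUM, v, 0, NULL, NULL); }
AstNode *ast_var(int i)                  { return ast_new(AST_VAR, 0.0, i, NULL, NULL); }
AstNode *ast_add(AstNode *l, AstNode *r) { return ast_new(AST_ADD, 0.0, 0, l, r); }
AstNode *ast_mul(AstNode *l, AstNode *r) { return ast_new(AST_MUL, 0.0, 0, l, r); }

/* Interpret the tree, taking variable values from env. */
double ast_eval(const AstNode *n, const double *env)
{
    switch (n->kind) {
    case AST_NUM: return n->value;
    case AST_VAR: return env[n->var];
    case AST_ADD: return ast_eval(n->lhs, env) + ast_eval(n->rhs, env);
    case AST_MUL: return ast_eval(n->lhs, env) * ast_eval(n->rhs, env);
    }
    return 0.0; /* unreachable for valid kinds */
}

/* Constant folding: post-order rewrite that collapses operator nodes whose
   children are both literals into one literal node, in place. */
AstNode *ast_fold(AstNode *n)
{
    if (n->kind == AST_ADD || n->kind == AST_MUL) {
        n->lhs = ast_fold(n->lhs);
        n->rhs = ast_fold(n->rhs);
        if (n->lhs->kind == AST_NUM && n->rhs->kind == AST_NUM) {
            double v = (n->kind == AST_ADD) ? n->lhs->value + n->rhs->value
                                            : n->lhs->value * n->rhs->value;
            free(n->lhs);
            free(n->rhs);
            n->kind = AST_NUM;
            n->value = v;
            n->lhs = n->rhs = NULL;
        }
    }
    return n;
}

/* Number of nodes in the tree. */
size_t ast_count(const AstNode *n)
{
    if (n == NULL) return 0;
    return 1 + ast_count(n->lhs) + ast_count(n->rhs);
}

void ast_free(AstNode *n)
{
    if (n == NULL) return;
    ast_free(n->lhs);
    ast_free(n->rhs);
    free(n);
}
```

For example, `(1 + 1) * u0 + 3 * u1` is built as a nine-node tree. Folding turns the `1 + 1` subtree into a single literal `2`, which leaves seven nodes and gives the same value for any `u0` and `u1`.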
These were some of the serial optimizations.
● Although the current set of optimizations may seem identical to existing compiler optimizations, future strategies such as memory structure transformations will be beyond the scope of compilers, since such optimizations are specific to stencil-based computations.
● Additionally, the fact that the framework’s transformations yield code that outperforms compiler-only optimized versions shows that compiler algorithms cannot always prove that these (safe) optimizations are allowed.
● Thus, a domain-specific code generator run by the user has the freedom to implement transformations that a compiler may not be able to apply.
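One common reason a compiler cannot prove such a transformation safe is pointer aliasing. This is offered as an illustration, not as the paper's own analysis. If the input and output arrays might overlap, reordering loads and stores changes the result. The hypothetical C sketch below computes a three-point average in two ways:
● in plain scalar order;
● in a reordered form that loads the inputs for four points before storing any result, roughly what a vectorizing compiler would want to emit.
The two agree when `in` and `out` are distinct arrays, as they are in an out-of-place stencil sweep. They disagree when the same array is passed for both. A generator that knows the arrays are distinct can emit the reordered form directly. In hand-written C, the C99 `restrict` qualifier is the usual way to make that promise to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar order: each store to out[i] happens before the next iteration
   reads in[i], so if out aliases in, later points see updated values. */
void smooth_scalar(const double *in, double *out, size_t n)
{
    for (size_t i = 1; i + 1 < n; i++)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
}

/* Reordered: inputs for four points are loaded and combined before any of
   the four results is stored. Equivalent to smooth_scalar only when out
   does not overlap in. */
void smooth_reordered(const double *in, double *out, size_t n)
{
    size_t i = 1;
    for (; i + 4 < n; i += 4) {
        double r0 = (in[i - 1] + in[i]     + in[i + 1]) / 3.0;
        double r1 = (in[i]     + in[i + 1] + in[i + 2]) / 3.0;
        double r2 = (in[i + 1] + in[i + 2] + in[i + 3]) / 3.0;
        double r3 = (in[i + 2] + in[i + 3] + in[i + 4]) / 3.0;
        out[i] = r0;
        out[i + 1] = r1;
        out[i + 2] = r2;
        out[i + 3] = r3;
    }
    for (; i + 1 < n; i++) /* remaining interior points */
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
}
```

Called with separate arrays, the two functions agree. Called in place (`smooth_scalar(b, b, n)`), the scalar version feeds its own updated values forward, so the two results diverge from the second interior point onward.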
Parallelization optimization
Parallel Optimization
● The shared-memory parallel code generators leverage the serial code generation routines to produce the version run by each individual thread.
● Since the parallelization strategy influences code structure, the AST, which represents the code run on each individual thread, must be modified to reflect the chosen parallelization strategy.
● The parallel code generators make the necessary modifications to the AST before passing it to the serial code generator.
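Below is a minimal sketch of this division of labor. It assumes a simple strategy in which each thread owns a contiguous slab of grid rows. The serial kernel stays the same apart from its row bounds, and the parallel layer only computes those bounds for each thread. The names `sweep_rows`, `thread_bounds` and `sweep_thread` are hypothetical, and the framework's actual parallelization strategies are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Serial kernel: five-point average over interior rows [i0, i1) of an
   ny x nx row-major grid. This plays the role of the per-thread code. */
void sweep_rows(const double *in, double *out, size_t nx,
                size_t i0, size_t i1)
{
    assert(in != out);
    for (size_t i = i0; i < i1; i++)
        for (size_t j = 1; j + 1 < nx; j++) {
            size_t p = i * nx + j;
            out[p] = 0.2 * (in[p] + in[p - 1] + in[p + 1] +
                            in[p - nx] + in[p + nx]);
        }
}

/* What the parallel layer adds: the slab of interior rows (1 .. ny-2)
   owned by thread tid out of nthreads, as a half-open range [*i0, *i1). */
void thread_bounds(size_t ny, int tid, int nthreads, size_t *i0, size_t *i1)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    size_t rows = ny >= 2 ? ny - 2 : 0;
    size_t t = (size_t)tid, p = (size_t)nthreads;
    *i0 = 1 + rows * t / p;
    *i1 = 1 + rows * (t + 1) / p;
}

/* Per-thread entry point: the same serial kernel, restricted to this
   thread's rows. */
void sweep_thread(const double *in, double *out, size_t ny, size_t nx,
                  int tid, int nthreads)
{
    size_t i0, i1;
    thread_bounds(ny, tid, nthreads, &i0, &i1);
    sweep_rows(in, out, nx, i0, i1);
}
```

In a real shared-memory code, each `sweep_thread` call would run on its own thread. With OpenMP, for example, it would sit inside a `parallel` region with `tid = omp_get_thread_num()` and `nthreads = omp_get_num_threads()`. The slabs are disjoint and `in` is only read, so no synchronization is needed within one sweep.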
Stencil auto-tuning framework flow
References
● http://people.csail.mit.edu/cychan/papers/ipdps10.pdf
● https://en.wikipedia.org/wiki/Abstract_syntax_tree
● https://www.youtube.com/watch?v=SfV8aRX0YY0
● https://software.intel.com/en-us/articles/cache-blocking-techniques
Sometimes it is good to revisit what we have learned. It helps us stay competitive and be prepared to seize opportunities.
Thank you
