Overview of ppOpen-AT/Static
for ppOpen-APPL/FDM ver. 0.2.0
Takahiro Katagiri
(Information Technology Center,
The University of Tokyo)
As of July 30th, 2014
ppOpen-AT System
[Figure: workflow of the ppOpen-AT system. Recoverable labels:
Before release-time:
① The library developer, using user knowledge, adds ppOpen-AT directives to ppOpen-APPL/*.
② Automatic code generation by the ppOpen-AT auto-tuner produces ppOpen-APPL/* with Candidate 1, Candidate 2, Candidate 3, ..., Candidate n.
Run-time:
③ Library call by the library user.
④ Execution time on the target computers.
⑤ Selection.
⑥ Auto-tuned kernel execution.]
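The run-time part of the workflow (execution time, then selection) can be sketched in C as follows. This is only an illustration under stated assumptions: the type `kernel_fn`, the two stand-in candidates, and the functions `time_candidate`, `select_fastest`, and `tune` are hypothetical names, not part of ppOpen-AT, and CPU time from `clock()` stands in for whatever timer the real auto-tuner uses.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* One generated variant of the same kernel (hypothetical type). */
typedef void (*kernel_fn)(float *data, size_t n);

/* Two stand-in candidates that compute the same result in different orders. */
static void candidate_forward(float *data, size_t n) {
    for (size_t i = 0; i < n; i++) data[i] = data[i] * 2.0f + 1.0f;
}

static void candidate_backward(float *data, size_t n) {
    for (size_t i = n; i-- > 0;) data[i] = data[i] * 2.0f + 1.0f;
}

/* Measure the CPU time of one candidate run, in seconds. */
static double time_candidate(kernel_fn f, float *data, size_t n) {
    clock_t start = clock();
    f(data, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Return the index of the smallest measured time (first one on ties). */
static size_t select_fastest(const double *times, size_t count) {
    assert(count > 0);
    size_t best = 0;
    for (size_t c = 1; c < count; c++) {
        if (times[c] < times[best]) best = c;
    }
    return best;
}

/* Time every candidate once on the given data and pick the fastest. */
static size_t tune(const kernel_fn *cands, size_t count, float *data, size_t n) {
    double times[16];
    assert(count > 0 && count <= 16);
    for (size_t c = 0; c < count; c++) {
        times[c] = time_candidate(cands[c], data, n);
    }
    return select_fastest(times, count);
}
```

After `tune` returns, a caller would invoke `cands[best]` for all later kernel executions. Note that each timed run modifies `data`, so a real tuner would time on scratch data or restore the input.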
A Scenario to Software Developers for
ppOpen-AT
[Figure: development flow. Software developer → description of AT by using ppOpen-AT → invoke dedicated preprocessor → program with AT functions → executable code with optimization candidates and AT function. The directives describe optimizations that cannot be established by compilers.]
■ Automatically Generated Functions
◦ Optimization candidates
◦ Performance monitor
◦ Parameter search
◦ Performance modeling
#pragma oat install unroll (i,j,k) region start
#pragma oat varied (i,j,k) from 1 to 8
for (i = 0; i < n; i++) {
  for (j = 0; j < n; j++) {
    for (k = 0; k < n; k++) {
      A[i][j] = A[i][j] + B[i][k] * C[k][j];
    }
  }
}
#pragma oat install unroll (i,j,k) region end
(Description by the software developer)
Optimizations for source code, computer resources, and power consumption
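To make the idea of an unrolling candidate concrete, here is a minimal hand-written sketch of one variant of the matrix loop above, with the i-loop unrolled by a factor of 2. It is an illustration only: the names `matmul_base`, `matmul_unroll_i2`, and `check_unroll`, and the fixed size `N`, are assumptions, and the code that the ppOpen-AT preprocessor actually generates may look different.

```c
#include <assert.h>

enum { N = 5 };  /* odd on purpose, so the remainder loop is exercised */

/* Baseline: the loop nest as written in the slide. */
void matmul_base(double A[N][N], double B[N][N], double C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                A[i][j] = A[i][j] + B[i][k] * C[k][j];
}

/* Hypothetical candidate: i-loop unrolled by 2. */
void matmul_unroll_i2(double A[N][N], double B[N][N], double C[N][N]) {
    int i;
    for (i = 0; i + 1 < N; i += 2) {
        for (int j = 0; j < N; j++) {
            for (int k = 0; k < N; k++) {
                A[i][j]     = A[i][j]     + B[i][k]     * C[k][j];
                A[i + 1][j] = A[i + 1][j] + B[i + 1][k] * C[k][j];
            }
        }
    }
    /* Remainder rows when N is not a multiple of 2. */
    for (; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                A[i][j] = A[i][j] + B[i][k] * C[k][j];
}

/* Run both versions on the same small integer-valued input and compare.
   Returns 1 when every element matches. */
int check_unroll(void) {
    double A1[N][N] = {{0}}, A2[N][N] = {{0}}, B[N][N], C[N][N];
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            B[i][j] = (double)(i + j);
            C[i][j] = (double)(i - j);
        }
    }
    matmul_base(A1, B, C);
    matmul_unroll_i2(A2, B, C);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (A1[i][j] != A2[i][j]) return 0;
    return 1;
}
```

Both versions accumulate over k in the same order for each element, so the results agree exactly; only the loop overhead and the reuse of `C[k][j]` differ, which is what an auto-tuner would measure.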
Target Application
• Seism_3D: simulation for seismic wave analysis.
• Developed by Professor Furumura at the University of Tokyo.
◦ The code is reconstructed as ppOpen-APPL/FDM.
◦ Version 0.2.0
• OpenMP parallelization is implemented.
• Finite Difference Method (FDM)
• 3D simulation
◦ 3D arrays are allocated.
• Data type: single precision (real*4)
Target Computation Kernels
in Current Implementation
1. Kernel update_stress
◦ 8 Kinds of Candidates with Loop fusion and Loop Split.
2. Kernel update_vel
◦ 5 Kinds of Candidates with Loop Fusion and Loop Split, and
reordering of statements.
For each of kernels 3 to 10 below: 3 Kinds of Candidates with Loop Fusion and Loop Split.
3. Kernel update_stress_sponge
4. Kernel update_vel_sponge
5. Kernel ppohFDM_pdiffx3_p4
6. Kernel ppohFDM_pdiffx3_m4
7. Kernel ppohFDM_pdiffy3_p4
8. Kernel ppohFDM_pdiffy3_m4
9. Kernel ppohFDM_pdiffz3_p4
10. Kernel ppohFDM_pdiffz3_m4
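The candidate kinds named above can be illustrated on a small stand-in loop nest. The sketch assumes that "loop split" means distributing the loop body over separate loop nests and that "loop fusion" means merging the nested loops into a single loop (loop collapse); the slides do not define the terms. The arrays, the update formula, and all function names are hypothetical and are not taken from ppOpen-APPL/FDM, which is not written in C.

```c
#include <assert.h>
#include <stddef.h>

enum { NX = 4, NY = 3, NZ = 5, NTOTAL = NX * NY * NZ };

/* Flat index of grid point (i, j, k); i varies fastest. */
static size_t idx(int i, int j, int k) {
    return ((size_t)k * NY + (size_t)j) * NX + (size_t)i;
}

/* Baseline: one triple loop with two independent statements. */
void update_base(float *p, float *q, const float *a, const float *b, float dt) {
    for (int k = 0; k < NZ; k++)
        for (int j = 0; j < NY; j++)
            for (int i = 0; i < NX; i++) {
                p[idx(i, j, k)] += dt * a[idx(i, j, k)];
                q[idx(i, j, k)] += dt * b[idx(i, j, k)];
            }
}

/* Loop split: each statement gets its own triple loop. */
void update_split(float *p, float *q, const float *a, const float *b, float dt) {
    for (int k = 0; k < NZ; k++)
        for (int j = 0; j < NY; j++)
            for (int i = 0; i < NX; i++)
                p[idx(i, j, k)] += dt * a[idx(i, j, k)];
    for (int k = 0; k < NZ; k++)
        for (int j = 0; j < NY; j++)
            for (int i = 0; i < NX; i++)
                q[idx(i, j, k)] += dt * b[idx(i, j, k)];
}

/* Loop fusion (collapse): the three loops become one loop of length NTOTAL. */
void update_fused(float *p, float *q, const float *a, const float *b, float dt) {
    for (size_t n = 0; n < (size_t)NTOTAL; n++) {
        p[n] += dt * a[n];
        q[n] += dt * b[n];
    }
}

/* Run all three variants on identical input; return 1 if results agree. */
int check_variants(void) {
    float a[NTOTAL], b[NTOTAL];
    float p[3][NTOTAL] = {{0}}, q[3][NTOTAL] = {{0}};
    for (size_t n = 0; n < (size_t)NTOTAL; n++) {
        a[n] = (float)n;
        b[n] = (float)(2 * n);
    }
    update_base(p[0], q[0], a, b, 0.5f);
    update_split(p[1], q[1], a, b, 0.5f);
    update_fused(p[2], q[2], a, b, 0.5f);
    for (size_t n = 0; n < (size_t)NTOTAL; n++) {
        if (p[0][n] != p[1][n] || p[0][n] != p[2][n]) return 0;
        if (q[0][n] != q[1][n] || q[0][n] != q[2][n]) return 0;
    }
    return 1;
}
```

The split form lowers register pressure per loop at the cost of extra loop overhead, while the collapsed form exposes one long loop, which can help when many threads share the iterations. Which one wins depends on the machine, which is why both are kept as candidates.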
Additional Kernels in Next Release
(Planned in Nov. 2014)
• 3 Kinds of Candidates with Loop Fusion and Loop Split for data packing and unpacking.
11. Kernel ppohFDM_ps_pack
12. Kernel ppohFDM_ps_unpack
13. Kernel ppohFDM_pv_pack
14. Kernel ppohFDM_pv_unpack
An Example of Seism_3D Simulation
• Earthquake in the western part of Tottori Prefecture, Japan, in 2000. ([1], p. 14)
• The region of 820 km x 410 km x 128 km is discretized with a grid spacing of 0.4 km.
• NX x NY x NZ = 2050 x 1025 x 320 ≒ 6.4 : 3.2 : 1.
[1] T. Furumura, "Large-scale Parallel FDM Simulation for Seismic Waves and Strong Shaking", Supercomputing News, Information Technology Center, The University of Tokyo, Vol. 11, Special Edition 1, 2009. In Japanese.
Figure: Seismic wave translations in the earthquake in the western part of Tottori Prefecture, Japan. (a) Measured waves; (b) simulation results. (Reference: [1], p. 13)
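The grid dimensions quoted for this example follow directly from the region size and the 0.4 km spacing, and the single-precision data type fixes the memory needed per 3D array. The short C sketch below reproduces that arithmetic; the helper names `grid_points` and `array_bytes` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of grid points along one axis: extent / spacing, rounded to nearest. */
long grid_points(double extent_km, double spacing_km) {
    assert(spacing_km > 0.0);
    return (long)(extent_km / spacing_km + 0.5);
}

/* Bytes needed for one nx x ny x nz array with elem_size bytes per element. */
unsigned long long array_bytes(long nx, long ny, long nz, size_t elem_size) {
    return (unsigned long long)nx * (unsigned long long)ny *
           (unsigned long long)nz * (unsigned long long)elem_size;
}
```

With the values from the slide, `grid_points(820.0, 0.4)`, `grid_points(410.0, 0.4)`, and `grid_points(128.0, 0.4)` give 2050, 1025, and 320, and `array_bytes(2050, 1025, 320, 4)` gives 2,689,600,000 bytes, or roughly 2.7 GB for a single real*4 array covering the whole region.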
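[Figure: Maximum AT Effect (Xeon Phi, 8 Nodes). Axis: Speedup [%]. Bar labels as extracted: 558, 200, 171, 30, 20, 51.]
[Figure: Maximum AT Effect (Ivy Bridge, 8 Nodes). Axis: Speedup [%]. Bar labels as extracted: 48, 42, 67, 29, 47, 23.]
[Figure: Maximum AT Effect (FX10, 8 Nodes). Axis: Speedup [%]. Bar labels as extracted: 10, 9, 7, 37, 46, 2.]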
