Auto‐Tuning of Hierarchical 
Computations with ppOpen‐AT 
Takahiro Katagiri i),ii),iii), Masaharu Matsumoto i),ii), Satoshi Ohshima i),ii)
17th SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP16)
Universite Pierre et Marie Curie, Cordeliers Campus, Paris, France
Thursday, April 14, 2016, MS55:Auto‐Tuning for the Post Moore’s Era – Part I of II, 2:40‐3:00 
i) Information Technology Center, The University of Tokyo
ii) JST, CREST
iii) Currently, Information Technology Center, Nagoya University
Outline
1. Background
2. ppOpen‐HPC and ppOpen‐AT
3. Code Selection in Seism3D and 
Its Implementation by ppOpen‐AT 
4. Performance Evaluation
5. Conclusion
1. Background
AT Technologies in Post Moore’s Era
• Moore’s Law is expected to end around the end of 2020.
– End of “one‐time speedup” from many cores and wiring miniaturization to reduce power.
→ FLOPS inside a node can no longer be increased.
• However, memory bandwidth can still be increased by using “3D stacking memory” technologies.
• 3D stacking memory:
– Bandwidth can be increased in the Z direction (stacking distance) while access latency is kept (→ high performance).
– Access latency can be low in the X‐Y directions.
• Access latency between nodes goes down, but bandwidth can be increased by optical interconnection technology.
• We need new algorithms that take the capability of data movement into account.
Reconstruction of algorithms w.r.t.:
• Increased memory bandwidth
• Increased local memory amounts (caches)
• Non‐uniform memory access latency
Issues in AT technologies:
• Hierarchical AT methodology
• Communication‐reducing algorithms
• New algorithms and code (algorithm) selection utilizing high bandwidth
• Rethinking classical algorithms
• Non‐blocking algorithms
• From explicit methods to implicit methods
• Out‐of‐core algorithms (out of main memory)
Figure: HMC. Source: http://www.engadget.com/2011/12/06/the-big-memory-cube-gamble-ibm-and-micron-stack-their-chips/
Figure: Intel 3D XPoint. Source: http://www.theregister.co.uk/2015/07/28/intel_micron_3d_xpoint/
2. ppOpen‐HPC and ppOpen‐AT
ppOpen‐HPC Project
• Middleware for HPC and Its AT Technology
– Supported by JST, CREST, from FY2011 to FY2015.
– PI: Professor Kengo Nakajima (U. Tokyo)
• ppOpen‐HPC 
– An open source infrastructure for reliable simulation codes on 
post‐peta (pp) scale parallel computers.
– Consists of various types of libraries, which cover 
five kinds of discretization methods for scientific computations. 
• ppOpen‐AT 
– An auto‐tuning language for ppOpen‐HPC codes 
– Builds on knowledge from a previous project: the ABCLibScript Project 4). 
– Auto‐tuning language based on directives for AT.
Software Architecture of ppOpen‐HPC
[Figure: layered architecture. The user program sits on top of ppOpen‐APPL (FEM, FDM, FVM, BEM, DEM), ppOpen‐MATH (MG, GRAPH, VIS, MP), ppOpen‐AT (STATIC, DYNAMIC), and ppOpen‐SYS (COMM, FT). The auto‐tuning facility performs code generation for optimization candidates, search for the best candidate, and automatic execution for the optimization, in order to optimize memory accesses on multi‐core CPUs, many‐core CPUs, and GPUs.]
ppOpen‐AT System (Based on FIBER 2),3),4),5))
Before release time (library developer):
① The library developer adds ppOpen‐AT directives, which carry user knowledge, to ppOpen‐APPL/*.
② Automatic code generation produces ppOpen‐APPL/* with the ppOpen‐AT auto‐tuner and candidates 1, 2, 3, ..., n.
Run time (library user, on the target computers):
③ Library call
④ Execution time measurement
⑤ Selection
⑥ Auto‐tuned kernel execution
This user benefited from AT.
Scenario of AT for ppOpen‐APPL/FDM
Library user:
1. Specify problem size, number of MPI processes, and number of OpenMP threads.
2. Set AT parameters and execute the library (OAT_AT_EXEC=1).
■ Execute auto‐tuner: with fixed loop lengths (by specifying problem size and number of MPI processes), measure time for the target kernels and store the best variant information (the fastest kernel information).
3. Set AT parameters and execute the library (OAT_AT_EXEC=0).
Execution with optimized kernels without the AT process: the fastest kernel is used without AT (except when the problem size, number of MPI processes, or number of OpenMP threads varies).
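The auto‐tuner step above (fixed loop lengths, time measurement for each candidate, store the best variant) can be sketched as a small brute‐force search. This is a minimal illustration, not code generated by ppOpen‐AT: the toy kernel, its two loop‐order candidates, and all names (`at_brute_force_sketch`, `kernel`, `ncand`, `niter`) are assumptions made for the example.

```fortran
program at_brute_force_sketch
  ! Hypothetical illustration: time each candidate of a toy kernel with a
  ! fixed problem size and remember the fastest one.
  use iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: n = 64, ncand = 2, niter = 100
  real, allocatable :: a(:,:,:), b(:,:,:)
  integer :: cand, best, it
  integer(int64) :: t0, t1, rate
  real(real64) :: elapsed, best_time

  allocate(a(n,n,n), b(n,n,n))
  b = 1.0
  best = 0
  best_time = huge(1.0_real64)

  do cand = 1, ncand
     a = 0.0
     call system_clock(t0, rate)
     do it = 1, niter
        call kernel(cand, a, b, n)
     end do
     call system_clock(t1)
     elapsed = real(t1 - t0, real64) / real(rate, real64)
     print '(a,i0,a,es10.3,a)', 'candidate ', cand, ': ', elapsed, ' s'
     ! Every candidate must compute the same result.
     if (abs(a(n,n,n) - real(niter)) > 1.0e-3) error stop 'wrong result'
     if (elapsed < best_time) then
        best_time = elapsed
        best = cand          ! "store the best variant information"
     end if
  end do
  print '(a,i0)', 'best candidate: ', best

contains

  subroutine kernel(variant, x, y, m)
    integer, intent(in) :: variant, m
    real, intent(inout) :: x(m,m,m)
    real, intent(in) :: y(m,m,m)
    integer :: i, j, k
    select case (variant)
    case (1)   ! i innermost: stride-1 access in Fortran
       do k = 1, m
          do j = 1, m
             do i = 1, m
                x(i,j,k) = x(i,j,k) + y(i,j,k)
             end do
          end do
       end do
    case (2)   ! k innermost: strided access
       do i = 1, m
          do j = 1, m
             do k = 1, m
                x(i,j,k) = x(i,j,k) + y(i,j,k)
             end do
          end do
       end do
    end select
  end subroutine kernel
end program at_brute_force_sketch
```

Which candidate wins depends on the machine and compiler, which is the reason for measuring rather than fixing the choice in the source.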
AT Timings of ppOpen‐AT (FIBER Framework)
OAT_ATexec()         ! AT for before execute‐time: one‐time execution (except for varying problem size and number of MPI processes)
                     !   Execute Target_kernel_k() with varying parameters, then store the best parameter
                     !   Execute Target_kernel_m() with varying parameters, then store the best parameter
…
do i=1, MAX_ITER
  Target_kernel_k()  ! Is this first call? Yes: read the best parameter
  …
  Target_kernel_m()  ! Is this first call? Yes: read the best parameter
  …
enddo
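The "Is this first call? Yes: read the best parameter" step can be sketched with saved module state, so that the tuned parameter is read once and then reused in every later iteration. This is a minimal sketch under assumptions: the module `at_state_sketch`, the argument `stored_best` (standing in for whatever the auto‐tuner stored), and the two toy variants are hypothetical and do not reflect ppOpen‐AT internals.

```fortran
module at_state_sketch
  ! Hypothetical state kept by a tuned kernel; not ppOpen-AT internals.
  implicit none
  logical, save :: first_call = .true.
  integer, save :: best_variant = 1   ! default when nothing was tuned
  integer, save :: reads = 0          ! how often the parameter was read
contains
  subroutine target_kernel(stored_best, x)
    integer, intent(in) :: stored_best   ! stands in for the stored tuning result
    real, intent(inout) :: x
    if (first_call) then
       best_variant = stored_best        ! "read the best parameter"
       reads = reads + 1
       first_call = .false.
    end if
    select case (best_variant)
    case (1)
       x = x + 1.0
    case default
       x = 1.0 + x                       ! same result, different code
    end select
  end subroutine target_kernel
end module at_state_sketch

program at_first_call_sketch
  use at_state_sketch
  implicit none
  integer, parameter :: max_iter = 10
  integer :: i
  real :: x
  x = 0.0
  do i = 1, max_iter
     call target_kernel(2, x)
  end do
  ! The parameter is read on the first call only.
  if (reads /= 1) error stop 'parameter should be read once'
  print '(a,i0,a,i0)', 'variant ', best_variant, ', parameter reads: ', reads
end program at_first_call_sketch
```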
3. Code Selection in Seism3D and Its Implementation by ppOpen‐AT
Target Application
• Seism3D: 
Simulation for seismic wave analysis.
• Developed by Professor T.Furumura
at the University of Tokyo.
–The code is re‐constructed as 
ppOpen‐APPL/FDM.
• Finite Difference Method (FDM) 
• 3D simulation
–3D arrays are allocated.
• Data type: Single Precision (real*4)
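Flow Diagram of ppOpen‐APPL/FDM
Space difference by FDM:
$$\frac{d}{dx}\sigma_{pq}(x,y,z) = \frac{1}{\Delta x}\sum_{m=1}^{M/2} c_m \left[\sigma_{pq}\left\{x+\left(m-\tfrac{1}{2}\right)\Delta x,\,y,\,z\right\} - \sigma_{pq}\left\{x-\left(m-\tfrac{1}{2}\right)\Delta x,\,y,\,z\right\}\right], \quad (p,q = x,y,z)$$
Explicit time expansion by central difference:
$$u_p^{n+\frac{1}{2}} = u_p^{n-\frac{1}{2}} + \frac{1}{\rho}\left(\frac{\partial\sigma_{xp}^{n}}{\partial x} + \frac{\partial\sigma_{yp}^{n}}{\partial y} + \frac{\partial\sigma_{zp}^{n}}{\partial z} + f_p^{n}\right)\Delta t, \quad (p = x,y,z)$$
Flow:
1. Initialization
2. Velocity Derivative (def_vel)
3. Velocity Update (update_vel)
4. Velocity PML condition (update_vel_sponge)
5. Velocity Passing (MPI) (passing_vel)
6. Stress Derivative (def_stress)
7. Stress Update (update_stress)
8. Stress PML condition (update_stress_sponge)
9. Stress Passing (MPI) (passing_stress)
10. Stop Iteration? NO: return to Velocity Derivative (def_vel). YES: End.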
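The space difference can be illustrated in one dimension for the fourth‐order case (M = 4). The sketch below assumes the standard fourth‐order staggered‐grid coefficients c1 = 9/8 and c2 = -1/24, which the slides do not list, and uses sin(x) as a stand‐in field; it checks the approximation against the exact derivative cos(x).

```fortran
program staggered_diff_sketch
  ! Fourth-order staggered-grid central difference (M = 4) in 1D.
  ! c1 = 9/8 and c2 = -1/24 are the standard coefficients; they are an
  ! assumption here, since the slides do not list them.
  use iso_fortran_env, only: real64
  implicit none
  real(real64), parameter :: c1 = 9.0_real64/8.0_real64
  real(real64), parameter :: c2 = -1.0_real64/24.0_real64
  real(real64), parameter :: dx = 0.1_real64, x = 0.7_real64
  real(real64) :: approx, err

  ! (1/dx) * sum over m of c_m * [f(x + (m-1/2)dx) - f(x - (m-1/2)dx)]
  approx = ( c1*(sin(x + 0.5_real64*dx) - sin(x - 0.5_real64*dx))  &
           + c2*(sin(x + 1.5_real64*dx) - sin(x - 1.5_real64*dx)) ) / dx
  err = abs(approx - cos(x))
  print '(a,es12.5)', 'abs error: ', err
  ! The leading error term scales with dx**4, so it stays far below 1e-5 here.
  if (err > 1.0e-5_real64) error stop 'unexpectedly large error'
end program staggered_diff_sketch
```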
Original Implementation (For Vector Machines)
! Fourth‐order accurate central‐difference scheme for velocity. (def_stress)
call ppohFDM_pdiffx3_m4_OAT( VX,DXVX, NXP,NYP,NZP,NXP0,NXP1,NYP0,…)
call ppohFDM_pdiffy3_p4_OAT( VX,DYVX, NXP,NYP,NZP,NXP0,NXP1,NYP0,…)
call ppohFDM_pdiffz3_p4_OAT( VX,DZVX, NXP,NYP,NZP,NXP0,NXP1,NYP0,…)
call ppohFDM_pdiffy3_m4_OAT( VY,DYVY, NXP,NYP,NZP,NXP0,NXP1,NYP0,… )
call ppohFDM_pdiffx3_p4_OAT( VY,DXVY, NXP,NYP,NZP,NXP0,NXP1,NYP0,… )
call ppohFDM_pdiffz3_p4_OAT( VY,DZVY, NXP,NYP,NZP,NXP0,NXP1,NYP0,…)
call ppohFDM_pdiffx3_p4_OAT( VZ,DXVZ, NXP,NYP,NZP,NXP0,NXP1,NYP0,…)
call ppohFDM_pdiffy3_p4_OAT( VZ,DYVZ, NXP,NYP,NZP,NXP0,NXP1,NYP0,… )
call ppohFDM_pdiffz3_m4_OAT( VZ,DZVZ, NXP,NYP,NZP,NXP0,NXP1,NYP0,…)
! Process of model boundary.
if( is_fs .or. is_nearfs ) then
  call ppohFDM_bc_vel_deriv( KFSZ,NIFS,NJFS,IFSX,IFSY,IFSZ,JFSX,JFSY,JFSZ )
end if
! Explicit time expansion by leap‐frog scheme. (update_stress)
call ppohFDM_update_stress(1, NXP, 1, NYP, 1, NZP)
Original Implementation (For Vector Machines)
! Explicit time expansion by leap-frog scheme. (update_stress)
subroutine OAT_InstallppohFDMupdate_stress(..)
!$omp parallel do private(i,j,k,RL1,RM1,RM2,RLRM2,DXVX1,DYVY1,DZVZ1,…)
do k = NZ00, NZ01
  do j = NY00, NY01
    do i = NX00, NX01
      RL1   = LAM (I,J,K); RM1   = RIG (I,J,K);  RM2   = RM1 + RM1; RLRM2 = RL1+RM2
      DXVX1 = DXVX(I,J,K);  DYVY1 = DYVY(I,J,K);  DZVZ1 = DZVZ(I,J,K)
      D3V3  = DXVX1 + DYVY1 + DZVZ1
      SXX (I,J,K) = SXX (I,J,K) + (RLRM2*(D3V3)-RM2*(DZVZ1+DYVY1) ) * DT
      SYY (I,J,K) = SYY (I,J,K) + (RLRM2*(D3V3)-RM2*(DXVX1+DZVZ1) ) * DT
      SZZ (I,J,K) = SZZ (I,J,K) + (RLRM2*(D3V3)-RM2*(DXVX1+DYVY1) ) * DT
      DXVYDYVX1 = DXVY(I,J,K)+DYVX(I,J,K);  DXVZDZVX1 = DXVZ(I,J,K)+DZVX(I,J,K)
      DYVZDZVY1 = DYVZ(I,J,K)+DZVY(I,J,K)
      SXY (I,J,K) = SXY (I,J,K) + RM1 * DXVYDYVX1 * DT
      SXZ (I,J,K) = SXZ (I,J,K) + RM1 * DXVZDZVX1 * DT
      SYZ (I,J,K) = SYZ (I,J,K) + RM1 * DYVZDZVY1 * DT
    end do
  end do
end do
return
end
Input and output for arrays in each call -> increase of B/F ratio: ~1.7
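The array traffic noted above comes from storing derivatives in separate arrays and reading them back in the update. A toy 1D comparison of the split form and a fused form is sketched below. It is only an illustration of the idea of merging the difference and the time update: the second‐order difference, the array names, and the sizes are assumptions, and this is not the authors' kernel.

```fortran
program fused_update_sketch
  ! Hypothetical 1D toy: split (derivative stored) versus fused update.
  implicit none
  integer, parameter :: n = 1000
  real, parameter :: dt = 0.01, dx = 0.1
  real :: v(0:n), dv(n), s_split(n), s_fused(n)
  integer :: i

  do i = 0, n
     v(i) = sin(0.01*real(i))
  end do
  s_split = 0.0
  s_fused = 0.0

  ! Split form: the derivative is written to dv, then read back in the update.
  do i = 1, n
     dv(i) = (v(i) - v(i-1)) / dx
  end do
  do i = 1, n
     s_split(i) = s_split(i) + dv(i) * dt
  end do

  ! Fused form: same arithmetic, no intermediate array.
  do i = 1, n
     s_fused(i) = s_fused(i) + ((v(i) - v(i-1)) / dx) * dt
  end do

  if (maxval(abs(s_split - s_fused)) > 1.0e-6) error stop 'results differ'
  print '(a,es10.3)', 'max difference: ', maxval(abs(s_split - s_fused))
end program fused_update_sketch
```

The fused loop avoids one store and one load of `dv` per grid point, at the cost of recomputing the difference wherever it is needed more than once.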
Code selection by ppOpen‐AT and hierarchical AT
With the select clause, code selection can be specified.
Upper Code:
Program main
….
!OAT$ install select region start
!OAT$ name ppohFDMupdate_vel_select
!OAT$ select sub region start
call ppohFDM_pdiffx3_p4( SXX,DXSXX,NXP,NYP,NZP,….)
call ppohFDM_pdiffy3_p4( SYY,DYSYY, NXP,NYP,NZP,…..)
…
if( is_fs .or. is_nearfs ) then
  call ppohFDM_bc_stress_deriv( KFSZ,NIFS,NJFS,IFSX,….)
end if
call ppohFDM_update_vel    ( 1, NXP, 1, NYP, 1, NZP )
!OAT$ select sub region end
!OAT$ select sub region start
call ppohFDM_update_vel_Intel  ( 1, NXP, 1, NYP, 1, NZP )
!OAT$ select sub region end
!OAT$ install select region end
Lower Code:
subroutine ppohFDM_update_vel(….)
….
!OAT$ install LoopFusion region start
!OAT$ name ppohFDMupdate_vel
!OAT$ debug (pp)
!$omp parallel do private(i,j,k,ROX,ROY,ROZ)
do k = NZ00, NZ01
  do j = NY00, NY01
    do i = NX00, NX01
…..
….
subroutine ppohFDM_pdiffx3_p4(….)
….
!OAT$ install LoopFusion region start
….
Call tree graph by the AT
[Figure: call tree from the main program through Start, the iteration loop (Stop iteration? NO/YES), and End. The selection region update_vel_select chooses among Velocity Derivative (def_vel) with Velocity Update (update_vel), Velocity Update (update_vel_Scalar), and Velocity Update (update_vel_IF_free). The selection region update_stress_select chooses among Stress Derivative (def_stress) with Stress Update (update_stress), Stress Update (update_stress_Scalar), and Stress Update (update_stress_IF_free). These kernels, together with Velocity PML condition (update_vel_sponge), Velocity Passing (MPI) (passing_vel), Stress PML condition (update_stress_sponge), and Stress Passing (MPI) (passing_stress), have auto‐generated candidate codes.]
Execution Order of the AT
We can specify the order via a directive of ppOpen‐AT (an extended function).
[Figure: execution order of the AT, marked ① to ④ on the slide, over the AT candidate groups Def_*, Update_vel, Update_stress, Update_vel_sponge, Update_stress_sponge, Passing_vel, and Passing_stress, and over the selection regions update_vel_select and update_stress_select with their alternatives (update_vel, update_vel_Scalar, update_vel_IF_free, update_stress, update_stress_Scalar, update_stress_IF_free).]
4. Performance Evaluation
The Number of AT Candidates (ppOpen‐APPL/FDM)
Kernel Names             | AT Objects                                                                               | The Number of Candidates
1. update_stress         | Loop collapses and splits: 8 kinds; code selections: 2 kinds                             | 10
2. update_vel            | Loop collapses, splits, and re‐ordering of statements: 6 kinds; code selections: 2 kinds | 8
3. update_stress_sponge  | Loop collapses: 3 kinds                                                                  | 3
4. update_vel_sponge     | Loop collapses: 3 kinds                                                                  | 3
5. ppohFDM_pdiffx3_p4    | Kernel names: def_update, def_vel; loop collapses: 3 kinds                               | 3
6. ppohFDM_pdiffx3_m4    | (same as kernel 5)                                                                       | 3
7. ppohFDM_pdiffy3_p4    | (same as kernel 5)                                                                       | 3
8. ppohFDM_pdiffy3_m4    | (same as kernel 5)                                                                       | 3
9. ppohFDM_pdiffz3_p4    | (same as kernel 5)                                                                       | 3
10. ppohFDM_pdiffz3_m4   | (same as kernel 5)                                                                       | 3
11. ppohFDM_ps_pack      | Data packing and unpacking; loop collapses: 3 kinds                                      | 3
12. ppohFDM_ps_unpack    | (same as kernel 11)                                                                      | 3
13. ppohFDM_pv_pack      | (same as kernel 11)                                                                      | 3
14. ppohFDM_pv_unpack    | (same as kernel 11)                                                                      | 3
• Total: 54 kinds
• Hybrid MPI/OpenMP: 7 kinds
• 54 × 7 = 378 kinds
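The totals can be checked by summing the per‐kernel counts from the table. The short program below does only that arithmetic; the array name `ncand` and the program name are illustrative. That the total is a sum rather than a product suggests that each kernel's candidates are counted independently, which is an inference from the numbers rather than a statement from the slides.

```fortran
program count_candidates_sketch
  implicit none
  ! Candidates per kernel, in the order of the table (kernels 1 to 14).
  integer, parameter :: ncand(14) = [10, 8, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
  integer, parameter :: nhybrid = 7   ! hybrid MPI/OpenMP configurations
  integer :: total

  total = sum(ncand)
  print '(a,i0)', 'kernel candidates: ', total
  print '(a,i0)', 'with hybrid configurations: ', total * nhybrid
  if (total /= 54 .or. total*nhybrid /= 378) error stop 'count mismatch'
end program count_candidates_sketch
```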
Machine Environment (8 nodes of the Xeon Phi)
• The Intel Xeon Phi
– Xeon Phi 5110P (1.053 GHz), 60 cores
– Memory amount: 8 GB (GDDR5)
– Theoretical peak performance: 1.01 TFLOPS
– One board per node of the Xeon Phi cluster
• InfiniBand FDR x 2 ports
– Mellanox Connect‐IB
– PCI‐E Gen3 x16
– 56 Gbps x 2
– Theoretical peak bandwidth: 13.6 GB/s
– Full bisection
• Intel MPI
– Based on MPICH2, MVAPICH2
– Version 5.0 Update 3 Build 20150128
• Compiler: Intel Fortran version 15.0.3 20150407
• Compiler options: -ipo20 -O3 -warn all -openmp -mcmodel=medium -shared-intel -mmic -align array64byte
• KMP_AFFINITY=granularity=fine, balanced (uniform distribution of threads between sockets)
Execution Details
• ppOpen‐APPL/FDM ver.0.2
• ppOpen‐AT ver.0.2
• Number of time steps: 2000
• Number of nodes: 8
• Native mode execution
• Target problem size (almost the maximum size with 8 GB/node)
– NX * NY * NZ = 512 x 512 x 512 on 8 nodes
– NX * NY * NZ = 256 x 256 x 256 per node (not per MPI process)
• Number of kernel iterations used for auto‐tuning: 100
Execution Details of Hybrid 
MPI/OpenMP
• Target MPI Processes and OMP Threads on the Xeon Phi
– The Xeon Phi with 4 HT (Hyper Threading) 
– PX TY: X MPI Processes and Y Threads per process
– P8T240: minimum hybrid MPI/OpenMP execution for ppOpen‐APPL/FDM, since it needs a minimum of 8 MPI processes.
– P16T120
– P32T60
– P64T30
– P128T15  
– P240T8
– P480T4
– P960T2 and configurations with fewer threads per process cause an MPI error in this environment.
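Each PXTY configuration in the list uses all hardware threads of the 8 nodes: 8 nodes × 60 cores × 4 hyper‐threads = 1920, and X × Y = 1920 in every case. The sketch below regenerates the seven configurations from the process counts; the variable names are illustrative.

```fortran
program hybrid_configs_sketch
  implicit none
  integer, parameter :: nodes = 8, cores = 60, ht = 4
  ! MPI process counts of the seven configurations listed above.
  integer, parameter :: procs(7) = [8, 16, 32, 64, 128, 240, 480]
  integer :: total, i, threads

  total = nodes * cores * ht          ! 1920 hardware threads in total
  do i = 1, size(procs)
     if (mod(total, procs(i)) /= 0) error stop 'uneven split'
     threads = total / procs(i)       ! OpenMP threads per MPI process
     print '(a,i0,a,i0)', 'P', procs(i), 'T', threads
  end do
end program hybrid_configs_sketch
```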
[Figure: allocation of 16 cores (#0 to #15) to MPI processes for P2T8 and P4T4, highlighting the target cores for one MPI process.]
BREAK DOWN OF TIMINGS 
THE BEST IMPLEMENTATION
5. Conclusion
RELATED WORK
Originality (AT Languages)
AT Language             | Items                                          | #1 to #7    | #8
ppOpen‐AT               | OAT Directives                                 | ✔ ✔ ✔ ✔ ✔   | None
Vendor Compilers        | Out of Target                                  | Limited     | ‐
Transformation Recipes  | Recipe Descriptions                            | ✔ ✔         | ChiLL
POET                    | Xform Description                              | ✔ ✔         | POET translator, ROSE
X language              | Xlang Pragmas                                  | ✔ ✔         | X Translation, ‘C and tcc
SPL                     | SPL Expressions                                | ✔ ✔ ✔       | A Script Language
ADAPT                   | ADAPT Language                                 | ✔ ✔         | Polaris Compiler Infrastructure, Remote Procedure Call (RPC)
Atune‐IL                | Atune Pragmas                                  | ✔           | A Monitoring Daemon
PEPPHER                 | PEPPHER Pragmas (interface)                    | ✔ ✔ ✔       | PEPPHER task graph and run-time
Xevolver                | Directive Extension (Recipe Descriptions)      | (✔) (✔) (✔) | ROSE, XSLT Translator
(✔): Users need to define rules.
#1: Method for supporting multi-computer environments.
#2: Obtaining loop length in run-time.
#3: Loop split with increase of computations 6), and loop collapses to the split loops 6),7),8).
#4: Re-ordering of inner-loop sentences 8).
#5: Code selection with loop transformations (hierarchical AT descriptions*). *This is originality in current research on AT as of 2015.
#6: Algorithm selection.
#7: Code generation with execution feedback.
#8: Software requirement.
Concluding Remarks
• We propose an IF‐free kernel: an effective kernel implementation for an FDM application that merges the computations of the central‐difference and explicit time expansion schemes.
• We use an AT language to adapt code selection to the new kernels: the effectiveness of the new implementation depends on the CPU architecture and the execution situation, such as problem size and the numbers of MPI processes and OpenMP threads.
To obtain the free code (MIT license):
http://ppopenhpc.cc.u-tokyo.ac.jp/
Future Work
• Improving the search algorithm
– We use a brute‐force search in the current implementation.
• This is feasible by applying knowledge of the application.
– We have implemented a new search algorithm based on black‐box performance models.
• d‐Spline model (interpolation and incremental addition based), in collaboration with Prof. Tanaka (Kogakuin U.)
• Surrogate model (interpolation and probability based), in collaboration with Prof. Wang (National Taiwan U.)
• Off‐loading implementation selection (for the Xeon Phi)
– If the problem size is too small for off‐loading, the target is executed on the CPU automatically.
• Adaptation of OpenACC for GPU computing
– Selection of OpenACC directives with ppOpen‐AT, by Dr. Ohshima.
• gang, vector, parallel, etc.
http://ppopenhpc.cc.u-tokyo.ac.jp/
Thank you for your attention!
Questions?
