www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
Introduction to HPC
Programming Models
Stefano Markidis
KTH, Sweden
Supercomputing - I
Use of computer simulation as a tool
for greater understanding of the real
world
Complements experimentation
and theory
Problems are increasingly
computationally challenging
Large parallel machines needed
to perform calculations
Critical to leverage parallelism in
all phases
Data access is a huge challenge
Using parallelism to obtain
performance
Finding usable, efficient, portable
I/O interfaces
[Figures: the Millennium simulation project; thermal hydraulics with Nek5000]
Parallel Machines
[Diagram: a few cores (c) sharing a cache (💲) and DRAM]
Your First Parallel Machine = Your Laptop
Parallel Machines
[Diagram: two multi-core CPUs, each with its own cache, sharing DRAM]
Office Workstation
Parallel Machines
Computing Node
[Diagram: a computing node with several multi-core CPUs, each with its own cache, sharing DRAM and a network interface card (NIC)]
[Diagram: a supercomputer built from many such computing nodes, each with its own DRAM and NIC, connected together]
HPC I/O System is also rather complex…
An HPC I/O system is attached to the supercomputer
The HPC I/O system is a supercomputer itself
[Architectural diagram of the 557 TF Argonne Leadership Computing Facility Blue Gene/P I/O system:]
Gateway nodes run parallel file system client software and forward I/O operations
Storage nodes run parallel file system software
A commodity network primarily carries storage traffic
Enterprise storage controllers and large racks of disks
Link bandwidths: BG/P tree 6.8 Gbit/sec, Ethernet 10 Gbit/sec, InfiniBand 16 Gbit/sec, Serial ATA 3.0 Gbit/sec
The hardware bottleneck is at the storage controllers, which can manage only 4.6 Gbyte/sec; peak I/O system bandwidth is 78.2 Gbyte/sec
Supercomputing - II
Most modern supercomputer hardware is built following two principles:
use of commodity hardware: Intel CPUs, AMD CPUs, DDR4, NVIDIA GPUs …
use of parallelism to achieve very high performance
The file systems connected to supercomputers are built in the same way:
gather large numbers of storage devices: HDDs, SSDs
connect them together in parallel to create a high-bandwidth, high-capacity storage device.
Largest Supercomputers
https://www.top500.org/
Largest HPC IO Systems
https://www.vi4io.org/hpsl/2017/start
This is where
Big Data starts for HPC
Supercomputing - III
Supercomputing, n. [ sˌuːpəkəmpjˈuːtə]
A special branch of scientific computing
that turns a computation-bound problem
into an I/O-bound problem.
Why is that? I/O vs Compute Performance
[Figure: disk access rates over time]
HPC Programming Models
Programming models are an abstraction of parallel
computer architectures
To express algorithms conveniently without
focusing on the details of the underlying hardware
To remove the complexity of the architecture when
designing algorithms
To allow for high-performance
implementations
Two HPC Programming Models for Supercomputers
Problem: move the value of a from p0 to p1 and then to p2
Message-Passing: explicit send and receive operations (explicit communication)
[Diagram: p0, p1 and p2 initially hold a=12, a=77 and a=32; p0 sends the value 12 to p1, which sets a=12 and then sends 12 on to p2, which also sets a=12]
PGAS: access to global memory that is physically distributed (implicit communication); Get/Put are load/store operations on global memory
[Diagram: a global array with a(1)=12 on p0, a(2)=77 on p1 and a(3)=32 on p2; the value 12 is written into a(2) and a(3) through the global address space]
How do you program a supercomputer?
99% of the codes for supercomputers are written in
Fortran (including Fortran77) and C/C++
Other languages supporting multithreading for
on-node parallelism (Python, Java, …)
99% of the large HPC codes use MPI libraries (the
message-passing programming model)
Used to move data from one computing node to
another, but also used for on-node parallelism
Data-analytics frameworks for supercomputers
often use MPI as a transport layer
MPI
MPI = standardized specification document for a
Message Passing library to support parallel computing in
C/C++ and Fortran.
Portability
High-Performance
Two main implementations:
MPICH and OpenMPI (you can install on your laptop)
Supercomputer vendors provide highly-tuned
implementations of these two.
Only four fundamental functions: MPI_Init,
MPI_Finalize, MPI_Send, MPI_Recv
Other collective functions involve all the
communicating processes, e.g. broadcast, scatter, …
Includes RDMA (one-sided) operations; streaming
models have also been built on top
Simple MPI code
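The code listing from the original slide is not reproduced in this extraction. As a stand-in, the following is a minimal sketch (an assumption, not the author's listing) that solves the example from the previous slide: moving the value of a from p0 to p1 and then to p2 with explicit MPI_Send and MPI_Recv calls; run it with at least three processes.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int a = 0, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {           /* p0 owns the value and sends it to p1 */
        a = 12;
        MPI_Send(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {    /* p1 receives from p0 and forwards to p2 */
        MPI_Recv(&a, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&a, 1, MPI_INT, 2, 0, MPI_COMM_WORLD);
    } else if (rank == 2) {    /* p2 receives from p1 */
        MPI_Recv(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 2 received a = %d\n", a);
    }

    MPI_Finalize();
    return 0;
}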
What is Parallel I/O?
At the program level:
Concurrent reads or writes from multiple
processes to a common file
At the system level:
A parallel file system and hardware that support
such concurrent access
Three strategies of I/O in HPC:
Spokesperson
Multiple writers multiple files
Cooperative
Spokesperson
One process performs all the I/O to a single shared file
Easy to program
It doesn’t scale
Multiple writers, multiple files
All the processes write to individual files
Might be limited by the file system
Easy to program
It doesn’t scale:
the number of files creates a bottleneck with metadata operations
the number of simultaneous disk accesses creates contention for
file system resources
Cooperative Parallel I/O (Real Parallel I/O)
Multiple processes write to a shared file, potentially in
a non-contiguous way
Truly I/O-parallel
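Not from the original slides: a minimal sketch of cooperative parallel I/O using MPI-IO (the calls are introduced in the MPI I/O section below). Each process writes its own block of a single shared file at a rank-dependent offset with one collective call; the file name and block size are arbitrary.

#include "mpi.h"

#define N 1000   /* number of integers written by each process */

int main(int argc, char *argv[])
{
    MPI_File fh;
    MPI_Offset offset;
    int buf[N], rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (i = 0; i < N; i++) buf[i] = rank;   /* fill the buffer with some data */

    MPI_File_open(MPI_COMM_WORLD, "shared.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* collective write: every rank writes N ints at its own offset in the shared file */
    offset = (MPI_Offset)rank * N * sizeof(int);
    MPI_File_write_at_all(fh, offset, buf, N, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}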
HPC I/O Software Stack
Applications (Weather Forecast, CFD, Astrophysics …)
High-Level I/O Libraries: HDF5, NetCDF, SIONlib, ADIOS
I/O Middleware: MPI I/O
I/O Forwarding: CIOD, DVS
Parallel File System: Lustre, GPFS, …
I/O Hardware
Knowing about this stack allows you to optimize the higher levels of the software stack
MPI I/O
Why Parallel I/O in MPI?
Writing is like sending and reading is like receiving.
Any parallel I/O system will need:
collective operations, communicators, …
Why do I/O in MPI?
Why not just POSIX?
Parallel performance
Single file (instead of one file / process)
MPI has replacement functions for POSIX I/O
Provides migration path
Multiple styles of I/O can all be expressed in MPI
Including some that cannot be expressed without
MPI
MPI I/O: the basics
MPI I/O provides operations on unformatted binary files,
similar to read and write; there is no equivalent of fwrite or fread.
Just like POSIX I/O, you need to
Open the file
Read or Write data to the file
Close the file
In MPI, these steps are almost the same:
Open the file: MPI_File_open
Write to the file: MPI_File_write
Close the file: MPI_File_close
An example of MPI I/O
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    MPI_File fh;
    int buf[1000], rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* open (and create) a shared file collectively over all processes */
    MPI_File_open(MPI_COMM_WORLD, "test.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    if (rank == 0)   /* in this example only rank 0 writes */
        MPI_File_write(fh, buf, 1000, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
Example code to write to a shared file
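Assuming an MPICH or OpenMPI installation, such a program is typically compiled with the mpicc wrapper and launched with mpirun (or mpiexec), for example with four processes: mpirun -np 4 ./a.out. Note that in this listing only rank 0 issues the write, so it illustrates the shared-file mechanics rather than truly cooperative I/O; a cooperative version would use collective calls such as the MPI_File_write_at_all sketch shown earlier.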
High-Level Parallel Libraries
Provide structure to files
Well-defined, portable formats
Self-describing
APIs more appropriate for computational science
Typed data
Noncontiguous regions in memory and file
Interfaces are implemented on top of MPI-IO
HDF5
The most used high-level I/O library in scientific codes
HDF5 = Hierarchical Data Format
HDF5 is three things:
Data model: container, dataset, group and link
Library: support for parallel I/O operations
File format: hierarchical data organization in a
single file; typed, multidimensional array storage;
attributes on datasets and data
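As an illustration (not part of the original slides), a minimal serial sketch of the HDF5 C API writing one typed, two-dimensional dataset to a file; the file and dataset names are arbitrary. Parallel HDF5 follows the same pattern but opens the file with an MPI-IO file access property list.

#include "hdf5.h"

int main(void)
{
    int     data[4][6];            /* the multidimensional array to store */
    hsize_t dims[2] = {4, 6};      /* dataset dimensions */
    hid_t   file, space, dset;
    int     i, j;

    for (i = 0; i < 4; i++)
        for (j = 0; j < 6; j++)
            data[i][j] = i * 6 + j;

    /* create the file, a dataspace and a typed dataset inside it */
    file  = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    space = H5Screate_simple(2, dims, NULL);
    dset  = H5Dcreate2(file, "/dset", H5T_NATIVE_INT, space,
                       H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* write the array and close all handles */
    H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);
    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}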
What about PGAS I/O ?
There is an effort in designing PGAS-like
programming systems for I/O operations
Different parts of a shared file are virtually mapped to
a global memory space that is accessible by all
processes (think of mmap, for instance)
To write to disk, make a store to global memory
To read from disk, make a load from global memory
The I/O system is becoming very heterogeneous, so it is
useful to have a single flat global “memory” space to
hide this architectural complexity
[Diagram: a global “memory” array a(0)…a(6) mapped onto File 1 and File 2; a user write of a(5) = 8.7 becomes a store to the global memory]
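A loose single-node analogy (not from the original slides) is POSIX mmap: a file region is mapped into the address space, and an ordinary store updates the file. Error checking is omitted for brevity.

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t size = 7 * sizeof(double);                 /* room for a(0)..a(6) */
    int fd = open("data.bin", O_RDWR | O_CREAT, 0644);
    ftruncate(fd, size);                              /* make the file large enough */

    /* map the file into memory: loads/stores now read/write the file */
    double *a = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    a[5] = 8.7;                                       /* a plain store becomes file I/O */

    msync(a, size, MS_SYNC);                          /* flush the change to disk */
    munmap(a, size);
    close(fd);
    return 0;
}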
Conclusions
Supercomputers consist of several computing
nodes connected by a high-performance network
Programming models abstract supercomputer
hardware to allow for efficient implementation of
algorithms
MPI, C/C++ and Fortran are dominant
MPI I/O provides the means for real parallel I/O
HDF5 is the most widespread data format, library and
data model in HPC
PGAS I/O might be a viable option
Thank you!
Acknowledgements
These slides are largely based on and adapted from:
- “Parallel I/O in Practice” by Rob Ross
https://www.nersc.gov/assets/Training/pio-in-practice-sc12.pdf
- “Short Introduction on Optimizing I/O” by Cray
https://www.pdc.kth.se/education/course-resources/introduction-to-cray-xc30-xc40/feb-2015/05_Short_Intro_Optimizing-IO.pdf
- “Lecture 32: Introduction to MPI I/O” by Bill Gropp
http://wgropp.cs.illinois.edu/courses/cs598-s16/lectures/lecture32.pdf