HPC I/O for Computational Scientists:
Understanding I/O
Presented to
ATPESC 2017 Participants
Rob Latham and Phil Carns
Mathematics and Computer Science Division
Argonne National Laboratory
Q Center, St. Charles, IL (USA)
8/4/2017
ATPESC 2017, July 30 – August 11, 2017
Motivation for characterizing parallel I/O
• Most scientific domains are
increasingly data intensive:
climate, physics, biology and
much more
• Upcoming platforms include
complex hierarchical
storage systems
How can we
maximize productivity
in this environment?
Times are changing in HPC storage!
Example visualizations from
the Human Connectome
Project, CERN/LHC, and the
Parallel Ocean Program
The NERSC burst buffer roadmap and architecture, including solid
state burst buffers that can be used in a variety of ways
Key challenges
• Instrumentation:
– What do we measure?
– How much overhead is acceptable and when?
• Analysis:
– How do we correlate data and extract actionable information?
– Can we identify the root cause of performance problems?
• Impact:
– Develop best practices and tune applications
– Improve system software
– Design and procure better systems
CHARACTERIZING APPLICATION I/O
WITH DARSHAN
What is Darshan?
Darshan is a scalable HPC I/O characterization tool. It captures an
accurate but concise picture of application I/O behavior with
minimum overhead.
• No code changes, easy to use
– Negligible performance impact: just “leave it on”
– Enabled by default at ALCF, NERSC, NCSA, and KAUST
– Installed and available for case by case use at many other sites
• Produces a summary of I/O activity for each job, including:
– Counters for file access operations
– Time stamps and cumulative timers for key operations
– Histograms of access, stride, datatype, and extent sizes
Project began in 2008, first public software
release and deployment in 2009
Darshan design principles
• The Darshan run time library is inserted at link time (for static
executables) or at run time (for dynamic executables)
• Transparent wrappers for I/O functions collect per-file statistics
• Statistics are stored in bounded memory at each rank
• At shutdown time:
– Collective reduction to merge shared file records
– Parallel compression
– Collective write to a single log file
• No communication or storage operations until shutdown
• Command-line tools are used to post-process log files
JOB analysis example
Example: darshan-job-summary.pl
produces a 3-page PDF file
summarizing various aspects of I/O
performance
Summary highlights: estimated performance, percentage of runtime in I/O, access size histogram, access type histograms, and file usage
SYSTEM analysis example
• With a sufficient archive of
performance statistics, we can
develop heuristics to detect
anomalous behavior
This example highlights large jobs that spent a
disproportionate amount of time managing file
metadata rather than performing raw data transfer
Worst offender spent 99% of I/O time in
open/close/stat/seek
This identification process is not yet automated;
alerts/triggers are needed in future work for greater
impact
Example of heuristics applied to a population of
production jobs on the Hopper system in 2013:
Carns et al., “Production I/O Characterization on the Cray XE6,” In
Proceedings of the Cray User Group meeting 2013 (CUG 2013).
Performance:
function wrapping overhead
What is the cost of interposing Darshan I/O instrumentation wrappers?
• To test, we compare observed I/O time of an IOR configuration
linked against different Darshan versions on Edison
• File-per-process workload, 6,000 processes, over 12 million
instrumented calls
Type of Darshan builds now
deployed on Theta and Cori
Why the box plots? Recall
observation from this morning that
variability is a constant theme in
HPC I/O today.
(note that the Y axis labels start at 40)
Snyder et al., "Modular HPC I/O Characterization with Darshan," in Proceedings of the 5th Workshop on Extreme-Scale Programming Tools (ESPT 2016), 2016.
Performance: shutdown overhead
• Involves aggregating, compressing, and collectively writing I/O
data records
• To test, synthetic workloads are injected into Darshan and resulting
shutdown time is measured on Edison
Figure callouts: with a single shared file, near-constant shutdown time of ~100 ms in all cases; with file-per-process, shutdown time scales linearly with job size (5-6 s extra shutdown time with 12,000 files)
USING DARSHAN IN PRACTICE
Typical deployment and usage
• Darshan usage on Mira, Cetus, Vesta, Theta,
Cori, or Edison, abridged:
– Run your job
– If the job calls MPI_Finalize(), log will be stored in
DARSHAN_LOG_DIR/month/day/
– Theta: /lus/theta-fs0/logs/darshan/theta
– Use tools (next slides) to interpret log
• On Titan: “module load darshan” first
• Links to documentation with details will be
given at the end of this presentation
Generating job summaries
• Run job and find its log file:
• Copy log files to save, generate PDF summaries:
(screenshot callouts: job id; corresponding log file in today's directory; copy out logs; list logs; load "latex" module if needed; generate PDF)
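The workflow the screenshots illustrate can be sketched as shell commands. The job id (1234567), user name, and dated path below are placeholders for your own values; the log directory layout follows the DARSHAN_LOG_DIR/month/day convention described earlier.

```shell
# Find the log for your job by its id (path is the Theta example from above):
ls /lus/theta-fs0/logs/darshan/theta/2017/8/4/ | grep 1234567

# Copy the matching log somewhere safe, then generate the PDF summary:
cp /lus/theta-fs0/logs/darshan/theta/2017/8/4/username_app_id1234567*.darshan ~/
module load latex   # only if pdflatex is not already available
darshan-job-summary.pl ~/username_app_id1234567*.darshan
```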
First page of summary
Common questions:
• Did I spend much time performing I/O?
• What were the access sizes?
• How many files were opened, and how big were they?
Second page of summary (excerpt)
Common questions:
• Where in the timeline of the execution did each rank do I/O?
There are additional graphs in the PDF file with increasingly detailed information.
You can also dump all data from the log in text format using “darshan-parser”.
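A quick sketch of the text dump (the log file name is a placeholder, and the grep pattern is just one example of picking out counters of interest):

```shell
darshan-parser mylog.darshan > mylog.txt   # full text dump of all recorded counters
grep -i bytes_written mylog.txt            # e.g. pull out the write-volume counters
```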
TIPS AND TRICKS: ENABLING ADDITIONAL DATA
CAPTURE
What if you are doing shared-file IO?
Your timeline might look like this
No per-process information available
because the data was aggregated by
Darshan to save space/overhead
Is that important? It depends on what
you need to learn about your
application.
– It may be interesting for applications
that access the same file in distinct
phases over time
What if you are doing shared-file IO?
Set environment variable to disable shared file
reductions
Increases overhead and log file size, but provides
per-rank info even on shared files
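Per the Darshan runtime documentation, the variable in question is DARSHAN_DISABLE_SHARED_REDUCTION; a minimal sketch, with the launcher and application name as placeholders:

```shell
export DARSHAN_DISABLE_SHARED_REDUCTION=1   # keep per-rank records even for shared files
aprun -n 512 ./my_app                       # run as usual; the log is larger but per-rank
```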
Detailed trace data
Set environment variable to enable “DXT” tracing
This causes additional overhead and larger files, but
captures precise access data
Parse trace with “darshan-dxt-parser”
Feature contributed by
Cong Xu and Intel’s High
Performance Data Division
Cong Xu et al., "DXT: Darshan eXtended Tracing," Cray User Group Conference 2017.
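In Darshan versions with DXT support, tracing is toggled with the DXT_ENABLE_IO_TRACE environment variable; a sketch (launcher, application, and log name are placeholders):

```shell
export DXT_ENABLE_IO_TRACE=1        # enable detailed DXT tracing for this job
aprun -n 512 ./my_app
darshan-dxt-parser mylog.darshan    # dump per-operation trace records as text
```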
DARSHAN FUTURE WORK
What’s new?
Modularized instrumentation
• Frequently asked question:
Can I add instrumentation for X?
• Darshan has been re-architected as a
modular framework to help facilitate this,
starting in v3.0
Snyder et al., "Modular HPC I/O Characterization with Darshan," in Proceedings of the 5th Workshop on Extreme-Scale Programming Tools (ESPT 2016), 2016.
Self-describing log format
Darshan Module example
• We are using the modular
framework to integrate more data
sources and simplify the
connections between various
components in the stack
• This is a good way for
collaborators to get involved in
Darshan development
The need for HOLISTIC characterization
• We’ve used Darshan to improving application productivity with case
studies, application tuning, and user education
• ... But challenges remain:
– What other factors influence performance?
– What if the problem is beyond a user’s control?
– The user population evolves over time; how do we stay engaged?
“I observed performance XYZ. Now what?”
• A climate vs. weather analogy: It is snowing in Atlanta, Georgia.
Is that normal?
• You need context to know:
– Does it ever snow there?
– What time of year is it?
– What was the temperature yesterday?
– Do your neighbors see snow too?
– Should you look at it first hand?
• It is similarly difficult to understand a single application performance
measurement without broader context. How do we differentiate
typical I/O climate from extreme I/O weather events?
Characterizing the I/O system
• We need a big picture view
• No lack of instrumentation
methods for system
components…
– but with wildly divergent data
formats, resolutions, and scope
• This is the motivation for the
TOKIO (TOtal Knowledge of
I/O) project:
– Integrate, correlate, and analyze
I/O behavior from the system as a
whole for holistic understanding
Holistic I/O characterization
https://www.nersc.gov/research-and-development/tokio/
TOKIO Strategy
• Integrate existing best-in-class instrumentation tools with help from
vendors
• Index and query data sources in their native format
– Infrastructure to align and link data sets
– Adapters/parsers to produce coherent views on demand
• Develop integration and analysis methods
• Produce tools that share a common interface and data format
– Correlation, data mining, dashboards, etc.
The TOKIO project is a collaboration between LBL and ANL
PI: Nick Wright (LBL), Collaborators: Suren Byna, Glenn Lockwood,
William Yoo, Prabhat, Jialin Liu (LBL) Phil Carns, Shane Snyder, Kevin
Harms, Zach Nault, Matthieu Dorier, Rob Ross (ANL)
UMAMI example
TOKIO Unified Measurements And Metrics Interface
UMAMI is a pluggable dashboard that displays the
I/O performance of an application in context with
system telemetry and historical records
Each metric is shown
in a separate row
Historical samples (for a
given application) are
plotted over time
Box plots relate current
values to overall
variance
(figures courtesy of Glenn Lockwood, NERSC)
UMAMI example
TOKIO Unified Measurements And Metrics Interface
System background
load is typical
Performance for this job
is higher than usual
Server CPU load is low
after a long-term steady
climb
Corresponds to data
purge that freed up disk
blocks
Broader contextual clues simplify interpretation of
unusual performance measurements
Hands on exercises
https://xgitlab.cels.anl.gov/ATPESC-IO/hands-on-2017
• There are hands-on exercises available for you to try out during the
day or in tonight’s session
– Demonstrates running applications and analyzing I/O on Theta
– Try some examples and see if you can find the I/O problem!
• We can also answer questions about your own applications
– Try it on Theta, Mira, Cetus, Vesta, Cori, Edison, or Titan
– (note: the Mira, Vesta, and Cetus Darshan versions are a little
older and will differ slightly in details from this presentation)
Next up!
• This presentation covered how to evaluate I/O and tune your
application.
• The next presentation will walk through the HDF5 data management
library.
