©2018 DataDirect Networks, Inc. DDN Confidential
Death to the Shared POSIX File System!?
Long Live the Shared POSIX File System!!
HPC Storage and the HPC Data Center
Long Live Posix - HPC Storage and the HPC Datacenter
This is a vendor presentation!
DDN ConfidentialDDN Storage | ©2018 DataDirect Networks, Inc.
Classical HPC Data Centers
• Everything is bare metal
• Most data is stored in shared POSIX file systems that largely remain on HDDs
• SSD-based systems are slowly coming, but capacities remain small
• People have experimented with Hadoop and object stores, but very few such systems are actually in production
• Special performance and capacity tiers are used mainly by the large centers, not by the broader market
Cloud Data Centers
• Everything is virtualized, containerized, dynamic, automated, etc.
• Most data lives on local file systems in virtual machines (or containers), on either SSDs or HDDs
• Monolithic web applications use large, distributed object stores
• Enterprise storage is still used for various specific requirements, both on premises and hosted in the cloud
• Tiers exist for redundancy, data backup, data retention, etc.
Example: 2015 DDN User Analysis
[Chart: DDN user base by storage use: Work 31%, Data 33%, Mixed Use 22%, Cloud 2%, Archive 12%. Market segments shown: HPC, Weather, Climate, CAE, Chemical, General, Academic, Genomics, Big Data, Science, Security, Finance, Energy, Tier 2 HPC, Cloud.]
Example: 2018 Lustre End User Survey
[Chart: Primary Usage of Lustre, horizontal bars on a 0% to 40% scale. Categories: Weather/Climate, Other, Media, Manufacturing, Life Sciences, Government, Financial Services, Energy, Education, Defense, AI/Machine Learning.]
HPC Data Centers: The Past Decade
►“Research Big Data” is a very big market for storage (much larger than classical tightly-coupled simulation), and it is quite different from classical HPC
►Many data centers dedicated to analytics and deep learning look like large cloud environments, but their I/O requirements are much closer to those of Research Big Data
►Some “Research Big Data” end users have moved to on-premises cloud environments using OpenStack, etc.
►More HPC and analytics customers are running applications in the cloud or in cloud-like environments, with shared file systems running in the cloud
Predictions From LANL in 2016:
Serving Data to the Lunatic Fringe
This architecture was imperiled by SSD economics
This architecture becomes imperiled by tape economics
This architecture becomes imperiled by Lang’s Law!
“DOE doesn’t want tiers. Tiers are an unfortunate accident of economics. DOE wants infinite memory and a system without unplanned interrupts. Just remember this: The fewer tiers, the fewer tears.”
Predictions from NERSC: Storage 2020-2025
[Figure panels: Four Tiers, Three Tiers, Two Tiers]
The Foreseeable Future Remains Tiered
DDN ConfidentialDDN Storage | ©2019 DataDirect Networks
Tiering – Data Schmiering
With all due respect to Lang’s Law (‘fewer tiers, fewer tears’), tiering is a (mostly) solved problem.
Buffer caching is a (mostly) solved problem! Russell Kirsch developed it for the SEAC in 1952.
Flash Acceleration Layer Usage and Tiering Workflows in Traditional HPC
Lee Ward, “Use Cases or BB Roles,” informal burst buffer presentation, Sandia National Laboratories, 2015.
Melissa Romanus, Robert Ross, Manish Parashar, “Challenges and Considerations for Utilizing Burst Buffers in High-Performance Computing,” 2018.
Teng Wang, Sarp Oral, Michael Pritchard, Kevin Vasko, Weikuan Yu, “Development of a Burst Buffer System for Data-Intensive Applications,” 2015.
Sadaf Alam, Nicola Bianchi, Nicholas Cardo, Matteo Chesi, Miguel Gila, Stefano Gorini, Mark Klein, Colin McMurtrie, Marco Passerini, Carmelo Ponti, Fabio Verzelloni, “An Operational Perspective on a Hybrid and Heterogeneous Cray XC50 System,” 2017.
1. Checkpoint-Restart
2. In-situ/transit viz/analysis
3. Out-of-core
4. Accelerated reads (pre-stage)
5. Random-read-centric applications (lots of them!)
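For the first use case in the list above, checkpoint-restart, a burst buffer absorbs the checkpoint at flash speed and drains it to capacity storage while the application keeps computing. The sketch below illustrates that flow under stated assumptions: both tiers are plain POSIX directories, and the `BurstBufferCheckpointer` class, its `ckpt-NNNNNNNN` naming scheme, and the background-thread drain are hypothetical, not the interface of any particular burst buffer product.

```python
import os
import shutil
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional, Tuple


class BurstBufferCheckpointer:
    """Hypothetical helper: checkpoint to a fast tier, drain to a capacity
    tier in the background, and restart from the newest complete copy."""

    def __init__(self, fast_dir: str, capacity_dir: str) -> None:
        self.fast_dir = fast_dir
        self.capacity_dir = capacity_dir
        os.makedirs(fast_dir, exist_ok=True)
        os.makedirs(capacity_dir, exist_ok=True)
        # A single worker drains checkpoints in order, off the critical path.
        self._drainer = ThreadPoolExecutor(max_workers=1)

    def checkpoint(self, step: int, data: bytes) -> Future:
        name = f"ckpt-{step:08d}"
        tmp = os.path.join(self.fast_dir, name + ".partial")
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, os.path.join(self.fast_dir, name))  # publish atomically
        # The application can resume computing; the copy runs asynchronously.
        return self._drainer.submit(self._drain, name)

    def _drain(self, name: str) -> None:
        # Copy under a temporary name, then rename, so the capacity tier
        # never exposes a half-written checkpoint.
        tmp = os.path.join(self.capacity_dir, name + ".partial")
        shutil.copyfile(os.path.join(self.fast_dir, name), tmp)
        os.replace(tmp, os.path.join(self.capacity_dir, name))

    def latest(self) -> Optional[Tuple[int, bytes]]:
        """Newest complete checkpoint; on a tie the fast-tier copy wins."""
        best: Optional[Tuple[int, str]] = None
        for tier in (self.fast_dir, self.capacity_dir):
            for entry in os.listdir(tier):
                if entry.startswith("ckpt-") and not entry.endswith(".partial"):
                    step = int(entry[len("ckpt-"):])
                    if best is None or step > best[0]:
                        best = (step, os.path.join(tier, entry))
        if best is None:
            return None
        with open(best[1], "rb") as f:
            return best[0], f.read()

    def close(self) -> None:
        self._drainer.shutdown(wait=True)  # wait for outstanding drains
```

On restart, `latest()` prefers the fast-tier copy but still finds a checkpoint after the burst buffer has been purged, because each tier only ever exposes complete, renamed checkpoint files.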
Data Tiering: A Subtle Shift in Perception
[Figure: two diagrams, “Unnecessarily Strict Tiering” and “Relaxed Tiering”, each relating Applications, Performance Storage, and Capacity Storage]
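The two tiering models named above can be contrasted in code. Under strict tiering every access is staged through performance storage; under relaxed tiering an application may read capacity storage directly and use the performance tier only where it helps. The following is a minimal sketch of a relaxed read path, assuming both tiers are mounted as POSIX directories; the class name `RelaxedTieredStore` and the promote-after-N-misses heuristic are hypothetical choices made for illustration, not a mechanism described in the presentation.

```python
import os
import shutil
from collections import Counter
from typing import Tuple


class RelaxedTieredStore:
    """Hypothetical two-tier read path. The performance tier is used when it
    holds a copy, but it is never a mandatory stop: misses are served directly
    from capacity storage, and only repeatedly read files are promoted."""

    def __init__(self, perf_dir: str, cap_dir: str, promote_after: int = 2) -> None:
        self.perf_dir = perf_dir
        self.cap_dir = cap_dir
        self.promote_after = promote_after
        self._misses: Counter = Counter()
        os.makedirs(perf_dir, exist_ok=True)
        os.makedirs(cap_dir, exist_ok=True)

    def read(self, name: str) -> Tuple[str, bytes]:
        """Return (tier_served_from, data)."""
        perf_path = os.path.join(self.perf_dir, name)
        if os.path.exists(perf_path):
            with open(perf_path, "rb") as f:
                return "performance", f.read()
        cap_path = os.path.join(self.cap_dir, name)
        with open(cap_path, "rb") as f:  # direct read: no forced stage-in
            data = f.read()
        self._misses[name] += 1
        if self._misses[name] >= self.promote_after:
            # Hot file: copy it up so later reads hit the fast tier.
            tmp = perf_path + ".partial"
            shutil.copyfile(cap_path, tmp)
            os.replace(tmp, perf_path)
        return "capacity", data
```

A strict-tiering variant would copy every file into the performance tier before its first read; here a file that is read only once is never staged at all.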
“What?”
“Do you have any water?”
“Build an object store!”
Typical Object Requirements
(“Object Schmobject; long live POSIX”)
►Immutable, transactional get/put, trillions of objects
►Named objects
►Group objects into logical collections
►Nest logical collections within each other
►Have the same object appear within multiple collections
►Multi-threaded writes
►Tag objects
►Object is a subset of file
• There is no application that uniquely requires object semantics
• O_TMPFILE and rename are useful primitives
►Object requirements grow as humans use them
• Eventually they become file requirements
►We do not live on a deserted island
• We have two decades of experience building parallel file systems
►RELEVANT LESSON FROM OBJECT STORES?
• POSIX relaxation is useful
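The claim that an object is a subset of a file can be illustrated with plain POSIX calls. The sketch below is a hypothetical illustration, not code from the presentation: a directory is a collection, nested directories are nested collections, hard links let one object appear in several collections, and a put is made transactional by writing a temporary file and then linking it into place. It uses `mkstemp` plus `link()` rather than `O_TMPFILE`, because the `O_TMPFILE`/`linkat()` variant is Linux-specific and depends on file-system support; `link()` also fails if the name already exists, which keeps published objects immutable. It assumes a POSIX file system that supports hard links, and the helper names `put_object`, `get_object`, and `add_to_collection` are invented for the example.

```python
import os
import tempfile


def put_object(collection: str, name: str, data: bytes) -> str:
    """Hypothetical transactional, immutable put: readers see no file or the
    whole file, and an existing object is never overwritten."""
    os.makedirs(collection, exist_ok=True)  # collections (and nesting) are directories
    final_path = os.path.join(collection, name)
    # Temp file in the same directory, so link() stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=collection, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fchmod(f.fileno(), 0o444)  # read-only mode as an immutability hint
            os.fsync(f.fileno())  # data is durable before it becomes visible
        os.link(tmp_path, final_path)  # atomic publish; FileExistsError if taken
    finally:
        os.unlink(tmp_path)  # drop the temporary name whether or not link() succeeded
    return final_path


def get_object(collection: str, name: str) -> bytes:
    with open(os.path.join(collection, name), "rb") as f:
        return f.read()


def add_to_collection(object_path: str, collection: str) -> str:
    """Make an existing object appear in another collection via a hard link
    (same inode, no data copy)."""
    os.makedirs(collection, exist_ok=True)
    dst = os.path.join(collection, os.path.basename(object_path))
    os.link(object_path, dst)
    return dst
```

Tags could be layered on top with extended attributes, and "list a collection" is simply a directory listing; the point is that each object requirement maps onto an existing file primitive.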
Bent’s Law for HPC Storage:
The future is bright and mostly as we predicted it.
Don’t be scared of POSIX, but embrace relaxations.
Don’t be scared of tiering, but embrace relaxations.
ML/DL Storage Scenarios: Small-to-Large
[Figure: storage scenarios ordered from small to large; labels: Local NVMe, Node-Local “Shared” NVMe, Global/Capacity, Global NVMe, OnDemand Namespaces, Remote NVMe (NVMeoF)]
Dynamically-Provisioned File Systems
►Job-Specific File Systems
• LLNL proposal (2014): one file system per job, with data staging
• Isolation from other jobs
• Increased metadata performance
►Loop-back devices for increased metadata performance
• K computer/Fujitsu: very large-scale implementation
• NERSC library (e.g. for use of Spark on Lustre)
• Lustre on Amazon/Azure/Google (since 2013)
• Amazon Lustre service (November 2018)
►Future
• Client Container Image (CCI) feature in Lustre (integrates loop-back devices into Lustre)
• Lustre-on-Demand feature (first deployments in early 2019!)
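The job-specific file system idea (a namespace per job, with data staging) follows a simple lifecycle: provision, stage in, run, stage out, tear down. The sketch below models that lifecycle with a Python context manager over ordinary directories. The `job_namespace` function, its arguments, and the use of a temporary directory in place of a real dynamically-provisioned file system (such as a Lustre-on-Demand instance) are all hypothetical stand-ins chosen for illustration.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from typing import Iterable, Iterator


@contextmanager
def job_namespace(job_id: str, capacity_dir: str,
                  inputs: Iterable[str], outputs: Iterable[str]) -> Iterator[str]:
    """Hypothetical per-job namespace: stage inputs in, yield the private
    scratch path to the job, stage outputs back, then tear everything down."""
    # Provision: a private directory stands in for a job-specific file system,
    # isolated from other jobs' metadata traffic.
    scratch = tempfile.mkdtemp(prefix=f"job-{job_id}-")
    try:
        for name in inputs:  # stage in
            shutil.copy2(os.path.join(capacity_dir, name), os.path.join(scratch, name))
        yield scratch  # the job runs here
        for name in outputs:  # stage out (reached only if the job body succeeded)
            shutil.copy2(os.path.join(scratch, name), os.path.join(capacity_dir, name))
    finally:
        shutil.rmtree(scratch, ignore_errors=True)  # tear down
```

If the job body raises, the exception skips the stage-out step and the `finally` clause still removes the scratch namespace, mirroring the isolation and cleanup a per-job file system is meant to provide.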
Amazon FSx for Lustre
►Dynamically-provisioned Lustre file systems
►Application-specific
►Focused on analytics and DL applications
►Connector to Amazon S3
►But: the usual limitations still apply ;-)
Next Decade of HPC Storage: Back to the Future?
►Still PFS…
• … but with increasing portions that are dynamically allocated and integrated into the compute platform
• e.g. a file server becomes a containerized process that can run anywhere
►Still POSIX…
• … but relaxed where needed
►Still tiers…
• … but relaxed to reflect actual application workflows
[Figure: Relaxed Tiering with Dynamically-Managed, System-Integrated Storage; labels: Applications, Capacity Storage, Performance, ENS, CapS]
