DDN Confidential
DDN Storage | ©2018 DataDirect Networks, Inc.
IME: Unlocking the Potential of NVMe
DDN Direction | Building on 20 Years of Innovation
Timeline (1998, 2000, 2003, 2013, 2016, 2017, 2020): S2A Systems, then SFA – Scale-Out Systems, then IME – Software Defined Elastic Data Services
Milestones: First S2A shipped to NASA; first large Lustre system; ORNL Spider 2, 1 TB/sec HDD; SFA12Kxi, 4.5M IOPS; JCAHPC, >1 TB/sec NVMe
Technology and demand:
• Data Integrity, Declustering, Erasure Coding – innovative data protection schemas, geo distribution, new orders of scaling
• Storage Orchestration – Open APIs, Kubernetes, Docker, OpenStack, Ansible, Puppet…
• New Hierarchies – automated data placement, IOPS and Bulk IO Engines
• Flash and NVRAM – performance, lifetime, memory-class
• HW Fabrics and Interconnects – NVMe (oF), IB, OPA, Datacentre Ethernet, Gen-Z
• Virtualisation | Containers | Tenancy – secure tenants at scale, SCSI VM stack
• Disaggregated, Shared Nothing – KV stores, node-local, edge server, SW/cloud deployment
• Distributed Storage: GPFS (Spectrum Scale) and Lustre, NFS/SMB, POSIX and POSIX-lite, Object
Elastic Data Services for Performance and Scale
DDN the First to Realize the Research Dream
WHAT IS IME?
IME’s Active I/O Tier is inserted right between compute and the parallel file system.
IME software intelligently virtualizes disparate NVMe SSDs into a single pool of shared memory that accelerates I/O, the PFS and applications.
► Scale-out flash cache layer using NVMe SSDs, inserted between the compute cluster and the Parallel File System (PFS)
• IME is configured as a cluster of multiple NVMe servers
• All compute nodes can access cached data on IME
► Accelerates difficult IO patterns (small, random, shared-file, high-concurrency) thanks to a thin software IO management layer
► Configured as a scale-out, massive cache layer with huge IO bandwidth and IOPS
IME INTERNALS
[Diagram: files (File1 DFCD3455, File4 52ED789E, File3 46042D43, File6 DC355CE) are mapped by a hash function to data keys distributed across network peers; on each device, data is written to a log over time, with new data added at the log head and space reclaimed at the log tail, wrapping around]
A distributed hash table (DHT) provides the foundation for:
• Network parallelism
• Node-level fault tolerance
• Distributed metadata
• Self-optimising for noisy fabrics
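The DHT idea can be sketched as hash-based placement: any client can compute which server owns a piece of a file without asking a central metadata server. This is a minimal sketch, not IME's algorithm; the key format (path plus 128 KiB chunk index), the SHA-1 hash, simple modulo placement and the consecutive-server replica rule are all assumptions for illustration.

```python
import hashlib

CHUNK = 128 * 1024  # hypothetical fragment size used to build keys


def fragment_key(path: str, offset: int, chunk: int = CHUNK) -> str:
    """Hypothetical DHT key: file path plus the index of the chunk holding `offset`."""
    return f"{path}:{offset // chunk}"


def owner_server(key: str, num_servers: int) -> int:
    """Map a key to a server index with hash-mod placement (an assumption)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def replica_servers(key: str, num_servers: int, copies: int) -> list:
    """Primary plus the next `copies - 1` servers; with copies >= 2 a node loss leaves a copy."""
    if copies > num_servers:
        raise ValueError("cannot place more copies than there are servers")
    primary = owner_server(key, num_servers)
    return [(primary + i) % num_servers for i in range(copies)]


if __name__ == "__main__":
    for offset in range(0, 4 * CHUNK, CHUNK):
        key = fragment_key("/scratch/file1", offset)
        print(key, "->", replica_servers(key, num_servers=4, copies=3))
```

Hash-mod placement reshuffles most keys whenever the server count changes; consistent hashing is the usual alternative when membership changes often.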
Log-structured filesystem at the storage-device level:
• High device throughput (NAND flash)
• Maximises device lifetime
FABRIC-AWARE | FLASH-NATIVE
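The log structure described above (append at the head, reclaim at the tail, wrap around) can be modelled as a ring buffer. This is a toy sketch assuming byte-granular space accounting; the class and its names are hypothetical, and a real device layout must also handle records that straddle the wrap point.

```python
class LogFullError(Exception):
    """Raised when an append would overwrite data that has not been reclaimed."""


class CircularLog:
    """Toy circular log: new data at the head, space reclaimed at the tail, wrapping."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.head = 0  # next write position
        self.tail = 0  # oldest live byte
        self.used = 0  # live bytes between tail and head

    def free(self) -> int:
        return self.capacity - self.used

    def append(self, length: int) -> int:
        """Reserve `length` bytes at the head and return where they start."""
        if length > self.free():
            raise LogFullError("reclaim space at the tail first")
        start = self.head
        self.head = (self.head + length) % self.capacity  # wrap around
        self.used += length
        return start

    def reclaim(self, length: int) -> None:
        """Release the oldest `length` bytes, e.g. once they are flushed downstream."""
        if length > self.used:
            raise ValueError("cannot reclaim more than is in use")
        self.tail = (self.tail + length) % self.capacity
        self.used -= length


if __name__ == "__main__":
    log = CircularLog(1000)
    print(log.append(400), log.append(400))  # 0 400
    log.reclaim(400)                        # oldest 400 bytes flushed
    print(log.append(300), log.head)        # 800 100 (head wrapped)
```

Because writes only ever land at the head, the device sees sequential writes regardless of the application's access pattern.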
IME and the Plateau of Death
IME140 Front and Rear View
► The IME140 is a 1U Intel-based server with up to 10 NVMe drives.
► 9 of the NVMe drives serve application data; the 10th is for IME software internal use (commit log).
► 135 TB per 1U, 18 GB/s read and write bandwidth with 2 OPA/EDR links
Hardware callouts:
• Dual Intel Xeon Silver 4108 CPUs (8 cores, 1.8 GHz)
• Ten NVMe drives
• 8 system fans
• One 128 GB SATA DoM (disk on module), which delivers much higher boot performance at lower power
• Six 16 GB DIMMs
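The headline figures can be cross-checked with simple arithmetic. This sketch assumes decimal units, that the 135 TB is raw capacity on the 9 data drives only, and 100 Gb/s per EDR InfiniBand or Omni-Path link with protocol overhead ignored; the helper names are illustrative.

```python
def per_drive_tb(total_tb: float, data_drives: int) -> float:
    """Capacity per data drive, assuming capacity is split evenly."""
    return total_tb / data_drives


def link_utilisation(measured_gb_s: float, links: int, link_gbit_s: float = 100.0) -> float:
    """Measured bandwidth as a fraction of raw line rate (8 bits per byte)."""
    raw_gb_s = links * link_gbit_s / 8
    return measured_gb_s / raw_gb_s


if __name__ == "__main__":
    print(per_drive_tb(135, 9))     # about 15 TB per NVMe data drive
    print(link_utilisation(18, 2))  # 18 GB/s out of 25 GB/s raw
```

On these assumptions each data drive holds about 15 TB, and 18 GB/s uses roughly 72% of the 25 GB/s raw rate of two 100 Gb/s links.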
IME140 PERFORMANCE
► IME140 performance demonstrates around 17 GB/s and 300K IOPS per node
IME240 Performance Scalability & R/W Parity
IME | IME240 Performance
>600K IOPs | 20GB/s | 2 Rack Units
[Chart: IME240 sequential throughput (MB/s) vs IO size from 4k to 4M; read (dark red) and write (light red)]
[Chart: IOPS (1000s) and throughput (MB/s) vs IO size from 4k to 128k for random write IO, single IME240]
► File performance of over 20 GB/s in 2RU, with 1M write IOPS and 600K read IOPS
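The IME240 charts plot IOPS and throughput against IO size; the two are tied by throughput = IOPS × IO size. A small sketch, assuming MB means 10^6 bytes. The slide does not state the IO size at which the 1M write IOPS figure was measured, so the 4 KiB case below is only an illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def throughput_mb_s(iops: float, io_size: int) -> float:
    """Throughput in MB/s (10**6 bytes) for a given IOPS rate and IO size in bytes."""
    return iops * io_size / 1e6


def iops_needed(mb_s: float, io_size: int) -> float:
    """IOPS required to sustain `mb_s` MB/s at a given IO size in bytes."""
    return mb_s * 1e6 / io_size


if __name__ == "__main__":
    print(throughput_mb_s(1_000_000, 4 * KIB))  # 4096.0 MB/s
    print(round(iops_needed(20_000, MIB)))      # 19073 IOPS for 20 GB/s at 1 MiB
```

This is why small-IO IOPS and large-IO bandwidth are quoted separately: at 4 KiB even a million IOPS amounts to about 4 GB/s, while 20 GB/s at 1 MiB needs only about 19,000 IOPS.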
DDN | IME140 SCALE-OUT NVME
IME140 SPECIFICATIONS
Enclosure: 1RU
Disk slots: Up to 9 front-accessible 2.5” NVMe drives
PSU/Cooling: Redundant power/cooling
Network connectivity: EDR InfiniBand, OPA, Ethernet
Performance: 17 GB/s per 1U server, 300K IOPS*
* Cached
EXTRACT MORE FROM YOUR APPLICATIONS
APPLICATION EFFICIENCY FOR THE REAL WORLD
► IME’s datapath is designed to deliver the potential of flash to the application
► Other burst buffers use a conventional filesystem, which severely limits their ability to deliver flash performance
► The IO500 uses “easy” and “hard” IOR benchmarks:
• IOR easy. You can set the parameters to be whatever you would like. You can use any of the modules such as HDF5 or MPI-IO. Typically people maximize performance by doing file-per-process and large aligned IO.
• IOR hard. We enforce a particular set of parameters. Specifically, the IOs are 47008 bytes each interspersed in a single shared file. Your only control is to specify how many writes each thread does.
► Anyone can get good performance on the easy benchmark with enough equipment. Good performance on the hard benchmark requires a new approach
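The IOR-hard pattern quoted above can be reproduced offline to see why it hurts conventional file systems. This sketch assumes IOR's segmented shared-file layout with block size equal to transfer size (47008 bytes), so segment s of rank r lands at offset (s × nranks + r) × 47008; the 4 KiB page size is also an assumption.

```python
IO_SIZE = 47008  # bytes per IOR-hard write, from the IO500 rules above
PAGE = 4096      # assumed page size


def ior_hard_offsets(rank: int, nranks: int, segments: int, io_size: int = IO_SIZE) -> list:
    """Offsets written by `rank` in one shared file, block size == transfer size."""
    return [(seg * nranks + rank) * io_size for seg in range(segments)]


def misaligned_fraction(offsets: list, io_size: int = IO_SIZE, page: int = PAGE) -> float:
    """Fraction of IOs whose start or end falls inside a page rather than on a boundary."""
    bad = sum(1 for off in offsets if off % page or (off + io_size) % page)
    return bad / len(offsets)


if __name__ == "__main__":
    print(IO_SIZE % PAGE)  # 1952: 47008 is not a multiple of 4096
    print(ior_hard_offsets(rank=1, nranks=4, segments=3))
    print(misaligned_fraction(ior_hard_offsets(0, 4, 1000)))
```

Every write has at least one edge inside a page, and neighbouring ranks' writes share those pages, so a file system that locks pages or larger units must serialise them.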
[Chart: IO500 IOR results (GiB/s) for easy write, hard write, easy read and hard read on Oakforest-PACS at JCAHPC (IME) and Shaheen at KAUST (DataWarp); hard vs easy: DataWarp -98% and -98%, IME -20% and -40%]
Source: https://www.vi4io.org/std/io500/start
APPLICATION EFFICIENCY FOR THE REAL WORLD
► Results extracted from the IO500 for systems with 100 or more client nodes
► Filesystem options show huge degradation when the IO pattern is tough
► Only IME is able to present flash to applications efficiently
[Chart: IO500 results, ratio of easy:hard write and read performance (0% to 90%) for systems with 100 clients or more: Oakforest-PACS at JCAHPC (IME), Shaheen at KAUST (DataWarp), Mistral at DKRZ (Lustre), EMSL Cascade at PNNL (Lustre)]
APPLICATION EFFICIENCY FOR THE REAL WORLD
[Charts: throughput for IME single server, Lustre on NVMe and GPFS on NVMe across large, medium and small IO; file-per-process (FPP) and shared file; sequential and random access]
► IME I/O characteristics demonstrate clear benefits compared with traditional parallel filesystems
► Particularly strong performance for small IO, writes and shared-file operations
#1 on the IO500
More than 78% higher than the #2 score!
IME1.2 FAULT TOLERANCE
► 4x IME240 with parity=2+1, dhtcopy=3
► Device/server failures are transparent to the application
► Automatic data rebuild with no service interruption
► Native declustered distributed erasure coding ensures fast rebuild
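With parity=2+1, each stripe holds two data chunks and one parity chunk on different servers, so any one of the three can be lost. The simplest code with that property is XOR parity, used below to show encode and rebuild. This is an illustrative sketch assuming byte-wise XOR and zero padding, not IME's actual coding; wider schemes such as 8+3 need a Reed-Solomon-style code instead.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_2_plus_1(data: bytes) -> List[bytes]:
    """Split data into 2 equal (zero-padded) chunks and append 1 XOR parity chunk."""
    half = (len(data) + 1) // 2
    d0 = data[:half]
    d1 = data[half:].ljust(half, b"\0")
    return [d0, d1, xor_bytes(d0, d1)]


def rebuild(chunks: List[Optional[bytes]]) -> List[bytes]:
    """Recover at most one missing chunk (None) by XOR-ing the two survivors."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("2+1 parity tolerates only one lost chunk")
    restored = list(chunks)
    if missing:
        a, b = (c for c in chunks if c is not None)
        restored[missing[0]] = xor_bytes(a, b)
    return restored


def decode(chunks: List[bytes], length: int) -> bytes:
    """Reassemble the original bytes from the two data chunks, dropping padding."""
    return (chunks[0] + chunks[1])[:length]


if __name__ == "__main__":
    stripe = encode_2_plus_1(b"hello IME")
    stripe[1] = None  # e.g. the server holding chunk 1 fails
    print(decode(rebuild(stripe), 9))  # b'hello IME'
```

In a declustered layout the stripes of many files are spread over all servers, so after a failure every surviving server contributes to the rebuild rather than a single spare.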
IME1.2 FAULT TOLERANCE
[Timeline: (1) I/O write-intensive job startup; (2) Server 3 fails with 1TB data; (3) Data rebuild zone, ~3 mins; (4) Normal service resumed; continued production]
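The timeline implies an approximate rebuild rate. Assuming decimal units, the same 4x IME240 setup (three surviving servers), and that the whole 1 TB is rebuilt within the ~3 minute window:

```latex
\frac{10^{12}\,\text{B}}{180\,\text{s}} \approx 5.6\,\text{GB/s},
\qquad
\frac{5.6\,\text{GB/s}}{3\ \text{surviving servers}} \approx 1.9\,\text{GB/s per server}
```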
IME1.2 FAULT TOLERANCE
Continued Production
► Even after a single node failure, the rebuilt data are still protected against further failures:
• 3 failing devices on surviving servers
• A 2nd node failing
IME1.2 MONITORING WITH DDN INSIGHT
► IME monitoring integrated in DDN Insight
► Aggregated views for clients, servers and devices
► Performance and status data collection
► Event monitoring and alerts
► Live and historical data analysis
ROADMAP ITEM: NFS AS A BFS (BACKING FILE SYSTEM)
[Diagram: COMPUTE → IME → NFS]
► Brings scale-out, flash-native performance to NFS access
► Shields the NFS server from "tough" IO
► Increases IO throughput from NFS hardware
► Zero application changes: replace the NFS mount with an IME mount
IME1.2 | TRUE DIAL-IN ERASURE CODING
[Diagram: IME Server0, Server1, Server2 through ServerN share a file cache in which different files use different schemes, e.g. 8+3, 8+1, 6+0, 8+2, 6+1, 4+1, 8+0, 1+1]
▶ IME1.1 supports multiple resilience levels through flexible, adaptive erasure coding
▶ System-wide default up to 15+3
▶ Applications can override defaults and select a specific erasure coding scheme
DEFAULT: 8+3
Erasure coding options:
1+1 1+2 1+3
2+1 2+2 2+3
3+1 3+2 3+3
... ... ...
15+1 15+2 15+3
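A k+m scheme stores k data chunks and m parity chunks per stripe: it survives m lost chunks, and k/(k+m) of raw capacity holds user data. A sketch computing these trade-offs for the options listed above; the function names are illustrative, and the ranges (k from 1 to 15, m from 1 to 3) are taken from the option table.

```python
def ec_profile(k: int, m: int) -> dict:
    """Trade-offs of a k+m erasure-coding scheme (k data chunks, m parity chunks)."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 and m >= 0")
    return {
        "scheme": f"{k}+{m}",
        "tolerated_failures": m,    # chunks per stripe that can be lost
        "efficiency": k / (k + m),  # usable fraction of raw capacity
        "overhead": m / k,          # extra bytes stored per byte of user data
    }


def listed_options(max_k: int = 15, max_m: int = 3) -> list:
    """All k+m combinations in the option table (k from 1 to 15, m from 1 to 3)."""
    return [f"{k}+{m}" for k in range(1, max_k + 1) for m in range(1, max_m + 1)]


if __name__ == "__main__":
    for k, m in [(1, 1), (8, 3), (15, 3)]:
        p = ec_profile(k, m)
        print(p["scheme"], p["tolerated_failures"], round(p["efficiency"], 3))
```

The default 8+3 keeps about 73% of raw capacity usable while tolerating three lost chunks; 1+1 is equivalent to mirroring at 50%, and 15+3 raises efficiency to about 83% at the cost of wider stripes.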
ddn.com | ©2018 DataDirect Networks, Inc. *Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change.
Thank You!
Keep in touch with us.
9351 Deering Avenue
Chatsworth, CA 91311
1.800.837.2298
1.818.700.4000
company/datadirect-networks
@ddn_limitless
sales@ddn.com
IME ENABLES NEW LEVELS OF FILESYSTEM PERFORMANCE
[Diagram: shared-file IO to the filesystem directly versus through IME; performance barrier]
► Parallel file systems can exhibit extremely poor performance for shared-file IO because of internal lock management, a consequence of managing files in large lock units
► IME eliminates contention by managing IO fragments directly and coalescing IOs prior to flushing to the parallel file system
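The lock-contention argument can be made concrete by counting which lock units are touched by more than one process. A simplified sketch: it assumes the file system locks fixed-size, aligned units (the unit size is a parameter), which stands in for the real lock or token schemes of Lustre and GPFS; the helper names are hypothetical.

```python
from collections import defaultdict


def lock_units_touched(offset: int, length: int, unit: int) -> range:
    """Indices of the fixed-size lock units covered by [offset, offset + length)."""
    return range(offset // unit, (offset + length - 1) // unit + 1)


def contended_units(writes, unit: int) -> dict:
    """Lock units written by more than one process.

    writes: iterable of (process_id, offset, length) tuples.
    """
    owners = defaultdict(set)
    for pid, off, length in writes:
        for u in lock_units_touched(off, length, unit):
            owners[u].add(pid)
    return {u: pids for u, pids in owners.items() if len(pids) > 1}


if __name__ == "__main__":
    # Three processes writing adjacent 47008-byte pieces of one shared file.
    writes = [(0, 0, 47008), (1, 47008, 47008), (2, 94016, 47008)]
    print(contended_units(writes, unit=1 << 20))  # one 1 MiB unit shared by all three
    print(contended_units(writes, unit=4096))     # shared pages at the seams
```

With a 1 MiB unit all three writes fall into unit 0 and must be serialised; even with 4 KiB units, neighbours still collide on the pages where their IOs meet.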
IME ENABLES NEW LEVELS OF FILESYSTEM PERFORMANCE
[Diagram: filesystem layers]
► Thick file system software layers and traditional data layout severely restrict performance for tough workloads
► IME’s lean, write-anywhere, fully parallel IO completely removes the barriers that prevent your application from seeing full performance
SHARED FILE I/O OPTIMIZATION
[Diagram: Processes 0, 1 and 2 write to one file; the filesystem needs lock management when IOs cross page boundaries]
► Parallel file systems can exhibit extremely poor performance for shared-file IO because of internal lock management, a consequence of managing files in large lock units
► IME eliminates contention by managing IO fragments directly and coalescing IOs prior to flushing to the parallel file system
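The coalescing step can be pictured as merging the (offset, length) fragments that different processes wrote into as few contiguous extents as possible before flushing. A minimal sketch under that assumption: it tracks extents only and ignores data contents and overwrite ordering, which a real implementation must preserve; the function name is hypothetical.

```python
def coalesce(fragments):
    """Merge (offset, length) fragments into contiguous extents sorted by offset.

    Overlapping or touching fragments are merged into one extent.
    """
    extents = []  # list of [start, end) pairs
    for off, length in sorted(fragments):
        end = off + length
        if extents and off <= extents[-1][1]:
            extents[-1][1] = max(extents[-1][1], end)  # extend the previous extent
        else:
            extents.append([off, end])  # gap: start a new extent
    return [(start, end - start) for start, end in extents]


if __name__ == "__main__":
    # Fragments arriving out of order from three processes, plus one stray write.
    frags = [(94016, 47008), (0, 47008), (47008, 47008), (200000, 1000)]
    print(coalesce(frags))  # [(0, 141024), (200000, 1000)]
```

Three interleaved 47008-byte fragments become a single 141024-byte extent, so the parallel file system receives one large sequential write instead of three small, lock-contended ones.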
FLASH-NATIVE - MAXIMISE FLASH PERFORMANCE AND LIFETIME
[Diagram, PFS: random writes arrive with LBA offsets corresponding to existing data; new writes invalidate the corresponding sectors, and blocks with a large invalid-sector count undergo garbage collection]
[Diagram, IME: log-structured filesystem, so SSDs see writes to new block ranges (valid sectors with user data, followed by free, unused blocks); if IME reaches a threshold, data is flushed or purged in large chunks]
• Standard storage systems cannot manage flash without incurring costly garbage collection, reducing performance and SSD lifetime
• IME's flash-native approach maximises flash lifetime by issuing IO in 128K chunks and unmapping in large chunks
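The flash-native write path can be sketched as packing incoming random writes back-to-back into 128 KiB device writes while an index records where each logical fragment landed. This is an illustrative model assuming a single device and byte-granular packing; flushing to the PFS and unmapping reclaimed ranges are left out, and the class name is hypothetical.

```python
CHUNK = 128 * 1024  # device IO size mentioned on the slide


class WriteAggregator:
    """Pack small random writes into sequential 128 KiB device writes."""

    def __init__(self) -> None:
        self.buffer = bytearray()  # data not yet written to the device
        self.device_head = 0       # next sequential device offset
        self.device_writes = []    # device offsets of full chunks issued
        self.index = []            # (logical_offset, length, device_offset)

    def write(self, logical_offset: int, data: bytes) -> None:
        # The fragment lands right after whatever is already buffered.
        device_offset = self.device_head + len(self.buffer)
        self.index.append((logical_offset, len(data), device_offset))
        self.buffer += data
        # Issue every full chunk as one sequential device write.
        while len(self.buffer) >= CHUNK:
            self.device_writes.append(self.device_head)
            self.device_head += CHUNK
            del self.buffer[:CHUNK]


if __name__ == "__main__":
    agg = WriteAggregator()
    for i in range(40):  # 40 scattered 4 KiB writes (160 KiB in total)
        agg.write((i * 7919 % 1000) * 4096, b"x" * 4096)
    print(agg.device_writes, len(agg.buffer))  # [0] 32768
```

Whatever the logical offsets, the SSD only sees sequential 128 KiB writes to new ranges, which is the property the slide credits with avoiding costly device-level garbage collection.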
DDN | IME IS UNIQUE!
• Designed for Scalability: Patented DDN Algorithms
• Scale-Out Data Protection: Distributed Erasure Coding
• Integrated With File Systems: Designed to Accelerate Lustre*, GPFS
• No Code Modification Needed: Fully POSIX & HPC Compatible, No Application Modifications
• Writes Fast; Reads Fast Too: No other system offers both at scale.
• Intelligent, Adaptive System: On-the-Fly Data Placement
