High-Performance Computing
Dell Intel EE Lustre Storage
Melbourne Big Data User Group
January 2016
Andrew Underwood
Lead HPC Technologist
The future of data is BIG, and HPC is needed to power it
The digital universe is growing 40%* a year into the next decade…
2016 --------------------------------------------------- 2020
~90%
of the world's data
has been created
in the last 2 years
* Source: EMC Digital Universe with Research & Analysis by IDC
44ZB
of data will exist
in the digital universe
by the year 2020
~37%
of the data generated
in 2020 will be used for
analysis and processing
The convergence of HPC and Big Data is driving change
1) Scalable performance and massive capacity
2) Stable, predictable and reliable
3) Balanced configuration, designed for parallel input-output
4) Support compute intensive, and data intensive ‘Big Data’ workloads with Hadoop
5) Enterprise grade technology with 24/7 access to data
Introducing our latest generation Dell EE Lustre Storage
Limitless
Endless Scalability
11GB/s
Peak Read
per building block
Up to 4PB
raw capacity
per rack
Up to 44GB/s
throughput
per rack
The Ultimate HPC File System
Parallel
For ultimate scale-out
Hadoop
Converged Platform
7GB/s
Peak Write
per building block
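The per-rack and per-building-block figures above can be related with some simple arithmetic. The sketch below assumes that throughput adds up linearly across building blocks, which the slide implies but does not state; the function names are illustrative only.

```python
import math

# Figures quoted on the slide.
PEAK_READ_PER_BLOCK_GBPS = 11
PEAK_WRITE_PER_BLOCK_GBPS = 7
PEAK_THROUGHPUT_PER_RACK_GBPS = 44

# Assumption: rack throughput is the sum of its building blocks' peak read rates,
# so 44 GB/s per rack would correspond to 4 building blocks.
blocks_per_rack = PEAK_THROUGHPUT_PER_RACK_GBPS // PEAK_READ_PER_BLOCK_GBPS


def blocks_needed(target_read_gbps: float) -> int:
    """Building blocks needed to reach a target peak read rate (linear scaling assumed)."""
    return math.ceil(target_read_gbps / PEAK_READ_PER_BLOCK_GBPS)


def peak_write_gbps(blocks: int) -> int:
    """Aggregate peak write rate for a number of building blocks (linear scaling assumed)."""
    return blocks * PEAK_WRITE_PER_BLOCK_GBPS


if __name__ == "__main__":
    print("building blocks per rack:", blocks_per_rack)
    print("blocks for 100 GB/s read:", blocks_needed(100))
    print("peak write for one rack (GB/s):", peak_write_gbps(blocks_per_rack))
```

Real deployments rarely scale perfectly linearly, so treat these numbers as an upper bound for sizing discussions.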
What is Lustre?
Designed for maximum performance and scalability…
• Open-Source parallel file system built on open standards hardware
• Global, shared name space – accessible by 25,000+ clients
• Object based file system with file stripes distributed across storage targets
• Highly available design, with no single point of failure
• Scalable beyond an Exabyte in capacity, and over a Petabyte of sustained throughput
• Accessible by clients over network (Ethernet, InfiniBand, Omni-Path Architecture)
The Intel Enterprise Edition of Lustre
Inside the Lustre File System
Ethernet Management
Network
High Performance Data Network
(InfiniBand, Ethernet, Omni-Path)
Metadata
Servers
Object Storage
Servers
Intel Manager for Lustre
Lustre Clients (1 – 100,000+)
Object Storage
Targets (OSTs)
Metadata
Target (MDT)
Management
Target (MGT)
Storage servers grouped
into failover pairs
A scalable building block design
Designed to scale to the Exabyte, with a Petabyte of throughput
Solution benefits & Dell differentiation
• File system based on Intel Enterprise Edition for Lustre
• Single file system namespace scalable to high capacities
and performance
• Engineered by Dell HPC Engineering to provide optimal
performance on Dell hardware platform
• Design providing maximum throughput per building
block with on-the-fly storage expansion
• Solution design for Big Data workloads using Intel
Hadoop Adapter for Lustre (HAL)
• Share data with other file systems utilizing optional
NFS/CIFS gateway
• Dell Networking 10/40GbE, InfiniBand, or Omni-Path
How Lustre Works
Basic File Storage Principles of the Lustre File System
The file system consists of many object storage targets (OSTs), which are presented as a single unified file system.
IO can be increased in most cases by using file striping. File striping divides data into chunks that are distributed
across OSTs within the file system.
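To make file striping concrete, here is a minimal sketch of how a byte offset in a file could map to a stripe and to a position inside that stripe's object. It assumes simple round-robin placement of fixed-size chunks; the function names and parameters are illustrative and not taken from the Lustre code base.

```python
def locate(offset: int, stripe_size: int, stripe_count: int) -> tuple[int, int]:
    """Map a file byte offset to (stripe index, offset inside that stripe's object).

    Assumes chunks of stripe_size bytes are placed round-robin over
    stripe_count targets.
    """
    chunk = offset // stripe_size              # which chunk of the file
    stripe_index = chunk % stripe_count        # which target holds that chunk
    # Chunks already stored on this target, plus the position inside the chunk.
    object_offset = (chunk // stripe_count) * stripe_size + offset % stripe_size
    return stripe_index, object_offset


def chunks_per_target(file_size: int, stripe_size: int, stripe_count: int) -> list[int]:
    """Count how many chunks of a file land on each target."""
    counts = [0] * stripe_count
    n_chunks = -(-file_size // stripe_size)    # ceiling division
    for chunk in range(n_chunks):
        counts[chunk % stripe_count] += 1
    return counts


if __name__ == "__main__":
    MiB = 1 << 20
    print(locate(5 * MiB + 10, MiB, 4))
    print(chunks_per_target(10 * MiB, MiB, 4))
```

Because consecutive chunks sit on different targets, a large sequential read or write can keep several OSTs busy at once, which is where the IO gain from striping comes from.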
Lustre Write File Data Flow
1) Client requests to write a file to the file system
2) Client contacts the MDS with a write request
3) MDS checks the user authentication and intended
location of the file
4) MDS responds to the Client with a list of OSTs that the
Client can write the file to
5) Client receives the response and writes to the assigned
OSTs without further communication with the MDS
Lustre Read File Data Flow
1) Client requests to read a file from the file system
2) Client contacts the MDS with a read request
3) MDS checks the user authentication and file location
4) MDS responds to the Client with a list of the OSTs on which
the stripes of the file are located
5) Client receives the response and reads the data from
the OSTs without further communication with the MDS
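The write and read flows above can be sketched as a toy in-memory model. This is not Lustre code: the `MDS` and `OST` classes, the `lookup` method, and the tiny stripe size are all hypothetical, chosen only to show that the metadata server is contacted once per operation and the data then moves directly between client and OSTs.

```python
from dataclasses import dataclass, field


@dataclass
class OST:
    """Toy object storage target: stores chunks keyed by (file name, chunk number)."""
    objects: dict = field(default_factory=dict)


@dataclass
class MDS:
    """Toy metadata server: checks the user and hands out the file's OST list."""
    ost_count: int
    allowed_users: set
    layouts: dict = field(default_factory=dict)   # file name -> list of OST indices
    requests: int = 0                              # how often clients contacted the MDS

    def lookup(self, user, name, create=False, stripe_count=2):
        self.requests += 1
        if user not in self.allowed_users:         # step 3: authentication check
            raise PermissionError(user)
        if name not in self.layouts:
            if not create:
                raise FileNotFoundError(name)
            self.layouts[name] = list(range(min(stripe_count, self.ost_count)))
        return self.layouts[name]                  # step 4: list of OSTs


def write_file(mds, osts, user, name, data, stripe_size=4):
    layout = mds.lookup(user, name, create=True)   # steps 2-4: one MDS round trip
    for i in range(0, len(data), stripe_size):     # step 5: talk to the OSTs only
        chunk_no = i // stripe_size
        ost = osts[layout[chunk_no % len(layout)]]
        ost.objects[(name, chunk_no)] = data[i:i + stripe_size]


def read_file(mds, osts, user, name):
    layout = mds.lookup(user, name)                # steps 2-4: one MDS round trip
    parts, chunk_no = [], 0
    while True:                                    # step 5: talk to the OSTs only
        chunk = osts[layout[chunk_no % len(layout)]].objects.get((name, chunk_no))
        if chunk is None:
            break
        parts.append(chunk)
        chunk_no += 1
    return b"".join(parts)


if __name__ == "__main__":
    osts = [OST() for _ in range(3)]
    mds = MDS(ost_count=3, allowed_users={"alice"})
    write_file(mds, osts, "alice", "a.dat", b"0123456789")
    print(read_file(mds, osts, "alice", "a.dat"), "MDS requests:", mds.requests)
```

Keeping the MDS out of the data path is the design choice that lets aggregate bandwidth grow with the number of OSTs.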
Connectivity and HSM
Lustre design is performance centric
CIFS Access
As over 99% of the world's HPC and Supercomputing environments are
built on Linux, Lustre is designed to be accessible by clients running
RHEL/CentOS/SUSE Linux operating systems.
Access can still be provided via CIFS using Dell PowerEdge R630 clients
running Samba gateways.
Hierarchical Storage Management (HSM)
HSM can be configured and managed via the Intel Manager for Lustre
(IML), and provides a reliable mechanism for archiving data onto tiers of
secondary, high-capacity, affordable storage.
The Intel Lustre HSM framework uses the Linux copytool and Robinhood.
Where performance meets scalability
Sequential IO Performance – 1 MB Stripe
• Peak Write = 7GB/s
• Peak Read = 11GB/s
Random IO Performance – 4 MB Stripe
• Peak Write = 12.6K IOPS
• Peak Read = 96K IOPS
Increase metadata performance with the Lustre Distributed
Namespace feature (Lustre DNE)
• Lustre DNE Phase 1 is the designation for Lustre DNE Remote Directories.
• Lustre sub-directories can now be distributed across multiple MDTs to increase
metadata capacity and performance.
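As a rough illustration of remote directories, the sketch below models a namespace in which individual sub-directories are pinned to different MDTs and every path is served by the MDT of its nearest pinned ancestor. The `Namespace` class and its methods are hypothetical, and placing the root on MDT 0 is an assumption made for the example.

```python
class Namespace:
    """Toy model of a namespace whose sub-directories can live on different MDTs."""

    def __init__(self, mdt_count: int):
        self.mdt_count = mdt_count
        self.remote_dirs = {"/": 0}   # directory path -> MDT index (root on MDT 0, assumed)

    def mkdir_remote(self, path: str, mdt_index: int) -> None:
        """Pin a directory, and everything created below it, to a given MDT."""
        if not path.startswith("/"):
            raise ValueError("path must be absolute")
        if not 0 <= mdt_index < self.mdt_count:
            raise ValueError("no such MDT")
        self.remote_dirs[path.rstrip("/") or "/"] = mdt_index

    def mdt_for(self, path: str) -> int:
        """Return the MDT index of the deepest pinned directory containing path."""
        if not path.startswith("/"):
            raise ValueError("path must be absolute")
        current = path.rstrip("/") or "/"
        while current not in self.remote_dirs:
            # Walk up one level; the root entry guarantees termination.
            current = current.rsplit("/", 1)[0] or "/"
        return self.remote_dirs[current]


if __name__ == "__main__":
    ns = Namespace(mdt_count=2)
    ns.mkdir_remote("/projects/genomics", 1)
    print(ns.mdt_for("/projects/genomics/run1/a.bam"))
    print(ns.mdt_for("/projects/other"))
```

Metadata operations under `/projects/genomics` would then be served by a different MDT than the rest of the tree, which is how spreading directories spreads the metadata load.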
Configure, optimize and manage using Intel Manager for Lustre
IML simplifies your management workflow:
• UI driven configuration, monitoring, and
overall management lowers complexity
and cost
• Advanced charting options illustrate
storage performance in near-real time
• Automated configuration of storage
server pairs for increased availability
• Configure and manage power distribution
units for automated fail-over
• Smart, intuitive alerts and logs help storage
administrators monitor storage
performance
Intel Manager for Lustre (IML) software dashboard
The ‘dashboard’ canvas displays a variety of dynamic charts
illustrating performance levels and resource utilization.
Administrators can easily view file systems, check resource
consumption for jobs, and monitor performance.
In depth storage hardware reporting is possible when combined
with optional hardware vendor provided plug-ins.
System status indicator
provides the status for all
managed file systems. Click
to go to detailed
information.
Easily configure servers,
volumes and power
controls. Optionally, enable
HSM per file system
Intelligent, intuitive log files
– quickly understand how
your storage is performing
Where Big Data meets high-performance computing
As data sets expand, the infrastructure that supports them needs to be faster, bigger, more scalable
Hadoop over Lustre (HAL)
Intel EE Lustre has plugins for Apache Hadoop and the Cloudera Distribution of Hadoop. They require no changes to the
Lustre architecture and let big data workloads take advantage of high-performance infrastructure by replacing
the HDFS portion of Hadoop with the Lustre file system.
This is driving a new market for High-Performance Data Analytics, where HPC boosts performance of Big Data workloads.
Measuring the performance of Hadoop running on Lustre – Check out this video of Dell deploying Hadoop on Lustre in production
* Source: Tests are from Intel and Tata Consultancy Services Lustre Big Data White Paper
Start by making your data future ready!
Contact your local Dell Account Executive for a free HPC workshop, and we can help identify
bottlenecks, opportunities for optimization, and provide a plan for how Lustre can boost your
performance, and build a future ready architecture for the convergence of HPC and Big Data
Dell - Internal Use - Confidential

Dell Lustre Storage Architecture Presentation - MBUG 2016
