LLNL-PRES-738369
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore
National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
Sierra: The LLNL IBM CORAL System
Bronis R. de Supinski
Chief Technology Officer
Livermore Computing
October 6, 2017
Sierra will be the next ASC ATS platform

[Roadmap figure: Advanced Technology Systems (ATS) and Commodity Technology Systems (CTS) across fiscal years '13–'23, with phases System Delivery, Procure & Deploy, Use, and Retire per system. ATS line: Sequoia (LLNL), ATS 1 – Trinity (LANL/SNL), ATS 2 – Sierra (LLNL), ATS 3 – Crossroads (LANL/SNL), ATS 4 – (LLNL), ATS 5 – (LANL/SNL). CTS line: Tri-lab Linux Capacity Cluster II (TLCC II), CTS 1, CTS 2.]

Sequoia and Sierra are the current and next-generation Advanced Technology Systems at LLNL
Sierra is part of CORAL, the Collaboration of Oak Ridge, Argonne and Livermore

CORAL is the next major phase in the U.S. Department of Energy's scientific computing roadmap and path to exascale computing

 Modeled on the successful LLNL/ANL/IBM Blue Gene partnership (Sequoia/Mira)
 Long-term contractual partnership with 2 vendors
— 2 awardees for 3 platform acquisition contracts
— 2 nonrecurring engineering (NRE) contracts
 One RFP produced three platform contracts:
— ANL Aurora contract (2018 delivery), with an NRE contract
— ORNL Summit contract (2017 delivery) and LLNL Sierra contract (2017 delivery), with an NRE contract

[Figure: LLNL's IBM Blue Gene systems — BG/L, BG/P (Dawn), BG/Q (Sequoia)]
The Sierra system that will replace Sequoia features a GPU-accelerated architecture
Mellanox Interconnect
Single Plane EDR InfiniBand
2 to 1 Tapered Fat Tree
IBM POWER9
• Gen2 NVLink
NVIDIA Volta
• 7 TFlop/s
• HBM2
• Gen2 NVLink
Components
Compute Node
2 IBM POWER9 CPUs
4 NVIDIA Volta GPUs
NVMe-compatible PCIe 1.6 TB SSD
256 GiB DDR4
16 GiB Globally addressable HBM2
associated with each GPU
Coherent Shared Memory
Compute Rack
Standard 19”
Warm water cooling
Compute System
4320 nodes
1.29 PB Memory
240 Compute Racks
125 PFLOPS
~12 MW
GPFS File System
154 PB usable storage
1.54 TB/s R/W bandwidth
Outstanding benchmark analysis by IBM and NVIDIA demonstrates the system's usability

Projections included code changes that showed a tractable annotation-based approach (i.e., OpenMP) will be competitive
[Figure: CORAL application performance projections from the IBM/NVIDIA analysis. Two bar charts compare relative performance (0x–14x) of CPU-only vs. CPU + GPU configurations. Scalable Science Benchmarks: QBOX, LSMS, HACC, NEKbone. Throughput Benchmarks: CAM-SE, UMT2013, AMG2013, MCB. Caption: CORAL benchmark projections show the GPU-accelerated system is expected to deliver substantially higher performance at the system level than a CPU-only configuration.]
Sierra system architecture details were recently finalized with the Go decision
                                      Sierra      uSierra
Nodes                                  4,320          684
POWER9 processors per node                 2            2
GV100 (Volta) GPUs per node                4            4
Node Peak (TFLOP/s)                     29.1         29.1
System Peak (PFLOP/s)                    125         19.9
Node Memory (GiB)                        320          320
System Memory (PiB)                     1.29        0.209
Interconnect                       2x IB EDR    2x IB EDR
Off-Node Aggregate b/w (GB/s)           45.5         45.5
Compute racks                            240           38
Network and Infrastructure racks          13            4
Storage racks                             24            4
Total racks                              277           46
Peak Power (MW)                          ~12         ~1.8
These are working numbers; the final configuration
will only be set once the system is fully installed
LLNL and the ASC program chose a tapered fat tree for Sierra's network

This decision, counter to prevailing wisdom for system design, benefits Sierra's planned UQ workload: ~5% more UQ simulations at a performance loss of < 1%

 Full bandwidth from dual-ported Mellanox EDR HCAs to TOR switches
 Half bandwidth from TOR switches to director switches
 An economic trade-off that provides approximately 5% more nodes
NVIDIA Volta GPUs (GV100) provide the bulk of Sierra's compute capability

To realize Sierra's full potential, we must exploit the tensor operations. The commoditization of machine learning will make this an enduring challenge.
[Figure: Volta GV100 SM block diagram]

GV100
SMs                                      80
FP32 units (per SM)                      64
FP64 units (per SM)                      32
INT32 units (per SM)                     64
Tensor Cores (per SM)                     8
Register File (per SM)              256 KiB
Unified L1/Shared Memory (per SM)   128 KiB
Active Threads (per SM)               2,048
Double Precision Peak (TFlop/s)     7 (7.5)
Single Precision Peak (TFlop/s)     14 (15)
Tensor Op Peak (TOp/s)                  120
HBM2 Bandwidth (GB/s)                   898
NVLINK BW to CPU/Other GPU (GB/s)   75 (60)
Sierra and its EA systems are beginning an accelerator-based computing era at LLNL

 The advantages that led us to select Sierra generalize
—Power efficiency
—Network advantages of “fat nodes”
—Balancing capabilities/costs implies complex memory hierarchies
 Planning a similar, unclassified M&IC resource
—Same architecture as Sierra
—Up to 25% of Sierra’s capability
 Exploring possibilities for other GPU-based resources
—Not necessarily NVIDIA-based
—May support higher single-precision performance

We have multiple projects planned to foster a healthy ecosystem