1
2
THE EXPANDING UNIVERSE OF HPC
JENSEN HUANG | SC19
3
AT THE INTERSECTION OF GRAPHICS, SIMULATION, AI
4
5
COMPUTING FOR THE DA VINCIS OF OUR TIME
FIRST AI SUPERCOMPUTERS | FIRST EXASCALE SCIENCE | 42 NEW TOP 500 SYSTEMS
ABCI
SUMMIT
CLIMATE
LBNL | NVIDIA
GENOMICS
ORNL
NUCLEAR WASTE REMEDIATION
LBNL | PNNL | Brown U. | NVIDIA
CANCER DETECTION
ORNL | Stony Brook U.
6
FULL STACK
SPEED-UP
CUDA-X
CUDA
AI | DRIVE | METRO | ISAAC | CLARA | RAPIDS | AERIAL | CG
CUDA 10.2
cuTENSOR 1.0
cuSOLVER 10.3
cuBLAS 10.2
cuDNN 7.6
TensorRT 6.0
DALI 0.15
NCCL 2.5
IndeX 2.1
OptiX 7.0
RAPIDS 0.10
Spark XGBoost
3x in 2 Years
[Chart: Time to Solution by year. 2017: 27 Hours | 2018: 20 Hours | 2019: 10 Hours]
Amber
Chroma
GROMACS
GTC
LAMMPS
MILC
NAMD
QE
SPECFEM3D
TensorFlow
VASP
Benchmark Application: Amber [PME-Cellulose_NVE], Chroma [szscl21_24_128], GROMACS [ADH Dodec: Dev Prototype], GTC
[moi#proc.in], LAMMPS [LJ 2.5], MILC [Apex Medium], NAMD [stmv_nve_cuda], Quantum Espresso [AUSURF112-jR], SPECFEM3D
[four_material_simple_model]; TensorFlow [ResNet-50] , VASP [Si-Huge]; GPU node: with dual-socket CPUs with 4x V100 GPU.
7
THE EXPANDING UNIVERSE OF HPC
NETWORK
EDGE ANALYTICS
SIMULATION
AI
Edge
Cloud
Arm
Data Analytics
Extreme IO
EXTREME IO
8
INCREDIBLE ADVANCES IN AI
WRITING
DIALOG
TRANSLATION
SUMMARIZATION
Q&A
CLASSIFICATION
2012 2019
BERT
TRANSFORMER
ALEXNET
CNN
3D POSE
DENOISING
SEGMENTATION
OBJECT RECOGNITION
CLASSIFICATION
IMAGE GENERATION
9
GPU COMPUTING POWERS AI ADVANCES
#1 MLPERF — AI TRAINING + AI INFERENCE HPC COMPUTING CHALLENGE
Two Distinct Eras of AI Training
Doubling: 2 Years | Doubling: 3.4 Months
Super Moore’s Law: From 600 to 2 Hours in 5 Years
Time to Train (ResNet-50): K80 SERVER 600 Hours | DGX 2 Hours
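The "From 600 to 2 Hours in 5 Years" figure can be turned into an implied doubling period with a little arithmetic. The sketch below is an illustration only, not something shown in the talk: the helper name `implied_doubling_months` is hypothetical, and reading the 2-year doubling as a Moore's-Law-style baseline is an assumption.

```python
import math

def implied_doubling_months(speedup, years):
    """Months per doubling if `speedup` accumulated steadily over `years`."""
    return years * 12 / math.log2(speedup)

# Figures taken from the slide: 600 hours on a K80 server, 2 hours on DGX, 5 years apart.
speedup = 600 / 2
months = implied_doubling_months(speedup, 5)

# Assumed baseline: one doubling every 2 years over the same 5-year span.
baseline_gain = 2 ** (5 / 2)

print(f"Speed-up: {speedup:.0f}x")
print(f"Implied doubling period: {months:.2f} months")
print(f"Doubling every 2 years would give about {baseline_gain:.2f}x")
```

Under these assumptions the time-to-train improvement corresponds to a doubling roughly every seven months, well ahead of the two-year baseline.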
10
NVIDIA AI END-TO-END PLATFORM
TRAINING AUTONOMOUS MACHINES
DGX HGX EGX AGX
EDGE AI | CLOUD
11
AI FOR SCIENCE
EXPERIMENTATION
DATA
SIMULATION
DATA
NEURAL ESTIMATION
Real-time Steering Fast Approximation
Design Space Exploration
ICF + MERLIN — Fusion
Inverse Problems
LIGO — Gravitational Waves
Faster Prediction
ANI + MD – Chemistry
Real-time Steering
ITER – Fusion Energy
12
13
1x Data Transfer | 100x Data Collected
STREAMING AI
SOFTWARE-DEFINED SENSORS
BUILD MODELS | STREAMING AI PROCESSING
ECMWF: 287 TB/day | LSST: 20 TB/day | SKA: 16 TB/sec
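The three data rates on this slide are quoted in different units, which makes them hard to compare at a glance. As a rough sketch, the conversion below puts them on a common footing; it assumes decimal units (1 TB = 1000 GB), and the helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
GB_PER_TB = 1000  # assumption: decimal (SI) units

def tb_per_day_to_gb_per_sec(tb_per_day):
    """Convert a daily volume in TB to a sustained rate in GB/s."""
    return tb_per_day * GB_PER_TB / SECONDS_PER_DAY

def tb_per_sec_to_tb_per_day(tb_per_sec):
    """Convert a sustained rate in TB/s to a daily volume in TB."""
    return tb_per_sec * SECONDS_PER_DAY

# Figures taken from the slide.
ecmwf_gb_s = tb_per_day_to_gb_per_sec(287)
lsst_gb_s = tb_per_day_to_gb_per_sec(20)
ska_tb_day = tb_per_sec_to_tb_per_day(16)

print(f"ECMWF: {ecmwf_gb_s:.2f} GB/s sustained")
print(f"LSST:  {lsst_gb_s:.2f} GB/s sustained")
print(f"SKA:   {ska_tb_day:,} TB/day")
```

Expressed per day, the SKA figure is several thousand times the ECMWF one, which is the gap between data collected and data that can be transferred that the slide points to.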
14
NVIDIA EGX STACK
NGC
Kubernetes Networking Storage Security
CUDA-X
Third-Party ISVs
METROPOLIS
IMAGE PROCESSING
DECODE DNN GRAPHICS ENCODE
DEEPSTREAM
Powered by NVIDIA CUDA Tensor Core GPU | Secured Boot Root of Trust
Cryptographic Acceleration for IPsec and TLS | NVMe-oF over TCP and RDMA
Industrial-strength Cloud Native and AI Stack
NVIDIA EGX EDGE SUPERCOMPUTING PLATFORM
15
VERTICAL INDUSTRY FRAMEWORKS
Clara Metropolis
Isaac Omniverse Aerial
DRIVE
WORLD’S LARGEST DELIVERY SERVICE
ADOPTS NVIDIA AI
PUTTING AI TO WORK
16
NVIDIA EGX
Edge Supercomputing Platform
17
SUPERCOMPUTING CLOUD
CPU Instance 48 Hours, $152
Amber, Chroma, GROMACS, GTC, LAMMPS, MILC, NAMD, QE, SPECFEM3D, TensorFlow, VASP
SUPER COMPUTING IS HARD — CLOUD HPC IS EXPENSIVE
18
SUPERCOMPUTING CLOUD
8x GPU Instance
1x GPU Instance
CPU Instance 48 Hours, $152
Amber, Chroma, GROMACS, GTC, LAMMPS, MILC, NAMD, QE, SPECFEM3D, TensorFlow, VASP
SUPER COMPUTING IS HARD —
GPU CLOUD 1/7TH COST OF CPU CLOUD
48x Faster, 1/7th the Cost
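The "48x Faster, 1/7th the Cost" headline implies a particular run time and price for the GPU run. The following sketch works that out from the CPU figures on the slide (48 hours, $152); it assumes both ratios describe the same GPU instance and that cost scales linearly with run time, neither of which the slide states explicitly.

```python
# Figures taken from the slide.
cpu_hours = 48
cpu_cost_usd = 152
speedup = 48        # "48x Faster"
cost_ratio = 1 / 7  # "1/7th the Cost"

# Implied GPU run, assuming the two ratios apply to the same instance.
gpu_hours = cpu_hours / speedup
gpu_cost_usd = cpu_cost_usd * cost_ratio

# Implied hourly prices, assuming cost scales linearly with run time.
cpu_rate = cpu_cost_usd / cpu_hours
gpu_rate = gpu_cost_usd / gpu_hours

print(f"GPU run: {gpu_hours:.0f} hour(s), about ${gpu_cost_usd:.2f}")
print(f"CPU instance: about ${cpu_rate:.2f}/hour")
print(f"GPU instance: about ${gpu_rate:.2f}/hour")
```

In other words, the GPU instance can cost several times more per hour and still come out cheaper overall, because the job finishes so much sooner.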
19
ICECUBE OBSERVATORY
DETECTING NEUTRINOS
50K NVIDIA GPUs IN THE CLOUD
350 PF OF SIMULATION FOR 2 HOURS
PRODUCED 5% OF ANNUAL SIMULATION DATA
AWS, MICROSOFT AZURE, GOOGLE CLOUD PLATFORM
DISTRIBUTED ACROSS U.S., EUROPE, APAC
Frank Würthwein, Ph.D.
Executive Director, Open Science Grid
Igor Sfiligoi
Lead Developer and Researcher
MULTIPLE GENERATIONS, ONE APPLICATION
[Chart: Events Processed Per GPU Type. V100, M60, K80, T4, P40, P100]
THE LARGEST CLOUD SIMULATION IN HISTORY
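The IceCube figures above (50K GPUs, 350 PF, 2 hours, 5% of annual simulation data) support a couple of quick derived numbers. This is a back-of-the-envelope sketch only: it assumes the 350 PF is an aggregate peak spread evenly over exactly 50,000 GPUs, which the mixed GPU generations on the slide suggest is a simplification.

```python
# Figures taken from the slide.
total_pflops = 350
gpu_count = 50_000
run_hours = 2
share_of_annual_data = 0.05

# Average per-GPU throughput, assuming an even split (a simplification).
avg_tflops_per_gpu = total_pflops * 1000 / gpu_count

# Hours at this scale needed to match a full year of simulation output.
hours_for_full_year = run_hours / share_of_annual_data

print(f"Average per GPU: {avg_tflops_per_gpu:.1f} TFLOPS")
print(f"A full year of simulation data: about {hours_for_full_year:.0f} hours at this scale")
```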
20
Up to 800 V100 GPUs Connected via Mellanox InfiniBand
ANNOUNCING
WORLD’S LARGEST ON-DEMAND SUPERCOMPUTER
21
DIVERSE ARM ARCHITECTURES
AMPERE COMPUTING eMAG
Hyperscale and Storage
AMAZON GRAVITON
Hyperscale and SmartNIC
MARVELL THUNDERX2
Hyperscale, Storage and HPC
FUJITSU A64FX
Supercomputing
HUAWEI KUNPENG 920
Big Data and Edge
22
NVIDIA CUDA ON ARM AT OAK RIDGE NATIONAL LAB
Benchmark Application [Dataset]: GROMACS [ADH Dodec- Dev prototype], LAMMPS [LJ 2.5], MILC [Apex Small],
NAMD [apoa1_npt_cuda], Quantum Espresso [AUSURF112-jR], Relion [Plasmodium Ribosome], SPECFEM3D
[four_material_simple_model], TensorFlow [ResNet50: Batch:256]; CPU node: 2x ThunderX2 9975; GPU node:
Same CPU node + 2x V100 32GB PCIe ; *1xV100 for GROMACS, MILC, and TensorFlow
23
ANNOUNCING
NVIDIA HPC FOR ARM
HPC Server Reference Platform | 8 V100 Tensor Core GPUs with NVLink
4x 100 Gbps Mellanox InfiniBand | Systems Ranging from Supercomputer, Hyperscale, to Edge
CUDA on Arm Beta Available Now
[Diagram: NIC, PCIe Switch, PCIe Switch, NIC; CPU, CPU; GPUs]
25
ANNOUNCING
NVIDIA HPC FOR ARM
APPLICATIONS
PROGRAMMING MODELS
C++
CUDA
FORTRAN
COMET
DCA++
GROMACS
INDEX
LAMMPS
LSMS
MATLAB
MILC
NAMD
OPTIX
RELION
TENSORFLOW
PARAVIEW
OPENACC
PYTHON
ARM ALLINEA STUDIO
BRIGHT COMPUTING
CMAKE
CUDA-GDB
CUPTI
GCC
LLVM
NVCC
PAPI
SINGULARITY
SLURM
TAU
GAMERA
SDKS
QUANTUM ESPRESSO
PERFORCE TOTALVIEW
PGI
SCORE-P
VMD
26
27
EXTREME COMPUTE NEEDS EXTREME IO
TRADITIONAL RDMA: 50 GB/s | 50 GB/s
[Diagram: NODE A and NODE B, each with PCIe Switch, CPU, System Memory, GPU, NIC]
28
EXTREME COMPUTE NEEDS EXTREME IO
GPUDIRECT RDMA: 100 GB/s
[Diagram: NODE A and NODE B, each with PCIe Switch, CPU, System Memory, GPU, NIC]
29
EXTREME COMPUTE NEEDS EXTREME IO
TRADITIONAL STORAGE: 50 GB/s
[Diagram: PCIe Switch, CPU, System Memory, GPU, NIC]
GPUDIRECT RDMA: 100 GB/s
[Diagram: NODE A and NODE B, each with PCIe Switch, CPU, System Memory, GPU, NIC]
30
EXTREME COMPUTE NEEDS EXTREME IO
GPUDIRECT STORAGE: 100 GB/s
[Diagram: PCIe Switch, CPU, System Memory, GPU, Storage, NIC]
GPUDIRECT RDMA: 100 GB/s
[Diagram: NODE A and NODE B, each with PCIe Switch, CPU, System Memory, GPU, NIC]
31
ANNOUNCING NVIDIA MAGNUM IO
Acceleration Libraries for Large-scale HPC and IO
High-bandwidth, Low-latency, Massive Storage Access with Lower CPU Utilization
GPUDIRECT STORAGE: 100 GB/s
[Diagram: PCIe Switch, CPU, System Memory, GPU, Storage, NIC]
GPUDIRECT RDMA: 100 GB/s
[Diagram: NODE A and NODE B, each with PCIe Switch, CPU, System Memory, GPU, NIC]
32
PYTHON
CUDA
APACHE ARROW
CUDF CUGRAPH
RAPIDS
CUML
PANDAS SCI-KL / XGBOOST
CUDNN
DEEP LEARNING
FRAMEWORKS
DASK
NVIDIA RAPIDS DATA SCIENCE
Open Source | Multi-GPU and Multi-Node | Up to 100x Speed-Up | 150K Downloads in 1 Year
Data Load and Processing Times from Hours to Minutes | Used by NERSC, ORNL, NASA, SDSC
33
NVIDIA MAGNUM IO BOOSTS RAPIDS DATA ANALYTICS
20x ON TPC-H | STRUCTURAL BIOLOGY: 3x VMD | NEW PANGEO XARRAY ZARR READER FOR CLIMATE
[Chart: Q4 TPC-H Benchmark Work Breakdown, With Repeated Query. Bars: WITHOUT GDS, WITH GDS. Axis: Latency (msec), 0 to 1,200,000. Segments: CUDA Startup, GPU and CPU Allocation, Data Preload, Warmup Query, Repeat Query, Clean Up, Driver Close]
34
ANNOUNCING
WORLD’S LARGEST INTERACTIVE VOLUME VISUALIZATION
Simulating Mars Lander with FUN3D | Interactively Visualizing 150 TB; Unstructured Mesh
4 NVIDIA DGX-2 Streaming 400 GB/s | NVIDIA Magnum IO | NVIDIA IndeX
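To put the visualization figures in perspective, the sketch below estimates how long one full pass over the 150 TB dataset would take at the quoted 400 GB/s, and what that rate means per DGX-2. It assumes decimal units (1 TB = 1000 GB), a sustained rate, and an even split across the four systems; none of these is stated on the slide.

```python
# Figures taken from the slide.
dataset_tb = 150
aggregate_gb_per_sec = 400
dgx2_count = 4

GB_PER_TB = 1000  # assumption: decimal (SI) units

# Time for one full pass over the dataset at the sustained aggregate rate.
seconds_per_pass = dataset_tb * GB_PER_TB / aggregate_gb_per_sec

# Share of the streaming rate per system, assuming an even split.
gb_per_sec_per_system = aggregate_gb_per_sec / dgx2_count

print(f"One pass over {dataset_tb} TB: {seconds_per_pass:.0f} s "
      f"({seconds_per_pass / 60:.2f} min)")
print(f"Per DGX-2: {gb_per_sec_per_system:.0f} GB/s")
```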
35
ANNOUNCING
NVIDIA DGX-2 AS SUPERCOMPUTING ANALYTICS INSTRUMENT
16 V100 GPUs - 2 PF Tensor Core | 512 GB HBM2 - 16 TB/s | 8 MLNX CX5 - 800 Gbps
30 TB NVMe - 53 GB/s with Magnum IO | Fabric Storage - 100 GB/s with Magnum IO
2.3x Faster Than Current IO500 10-node Leader
Powered by NVIDIA Magnum IO
EXTREME WEATHER AI INFERENCE: NVIDIA TENSORRT
3D VOLUME ANALYTICS: PANGEO XARRAY
VMD COMPUTATIONAL MICROSCOPE: NVIDIA OPTIX
3D INTERACTIVE VOLUME RENDERING: NVIDIA INDEX
TPC-H RECORD 10 TB JOIN: NVIDIA RAPIDS
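The aggregate DGX-2 figures on this slide can be broken down per component with simple division. The sketch below does only that; it assumes each total is spread evenly across the 16 GPUs or 8 network adapters, and it converts network bandwidth at 8 bits per byte with no allowance for protocol overhead.

```python
# Aggregate figures taken from the slide.
gpu_count = 16
tensor_pflops = 2
hbm2_gb = 512
hbm2_tb_per_sec = 16
nic_count = 8
network_gbps = 800

# Per-GPU shares, assuming an even split.
tflops_per_gpu = tensor_pflops * 1000 / gpu_count
hbm2_gb_per_gpu = hbm2_gb / gpu_count
hbm2_tb_per_sec_per_gpu = hbm2_tb_per_sec / gpu_count

# Network: per-adapter rate, and the aggregate in bytes (8 bits per byte).
gbps_per_nic = network_gbps / nic_count
network_gb_per_sec = network_gbps / 8

print(f"Per GPU: {tflops_per_gpu:.0f} Tensor TFLOPS, {hbm2_gb_per_gpu:.0f} GB HBM2, "
      f"{hbm2_tb_per_sec_per_gpu:.0f} TB/s")
print(f"Per adapter: {gbps_per_nic:.0f} Gbps; aggregate: {network_gb_per_sec:.0f} GB/s")
```

The 800 Gbps aggregate works out to 100 GB/s, which lines up with the "Fabric Storage - 100 GB/s" figure on the same slide.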
36
THE EXPANDING UNIVERSE OF HPC
NETWORK
EDGE ANALYTICS
SIMULATION
EXTREME IO
NVIDIA HPC for ARM
NVIDIA EGX Edge Supercomputing Platform
NVIDIA DGX-2 Supercomputing Analytics Instrument
NVIDIA Magnum IO
NGC
Azure
37

NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019
