NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019

2
THE EXPANDING UNIVERSE OF HPC
JENSEN HUANG | SC19

3
AT THE INTERSECTION OF GRAPHICS, SIMULATION, AI

5
COMPUTING FOR THE DA VINCIS OF OUR TIME
FIRST AI SUPERCOMPUTERS | FIRST EXASCALE SCIENCE | 42 NEW TOP 500 SYSTEMS
ABCI
SUMMIT
CLIMATE
LBNL | NVIDIA
GENOMICS
ORNL
NUCLEAR WASTE REMEDIATION
LBNL | PNNL | Brown U. | NVIDIA
CANCER DETECTION
ORNL | Stony Brook U.

6
FULL STACK
SPEED-UP
CUDA-X
CUDA
AI DRIVE METRO ISAAC CLARA RAPIDS AERIAL CG
CUDA 10.2
cuTENSOR 1.0
cuSOLVER 10.3
cuBLAS 10.2
cuDNN 7.6
TensorRT 6.0
DALI 0.15
NCCL 2.5
IndeX 2.1
OptiX 7.0
RAPIDS 0.10
Spark XGBoost
3x in 2 Years
Time to Solution: 27 Hours (2017), 20 Hours (2018), 10 Hours (2019)
Amber
Chroma
GROMACS
GTC
LAMMPS
MILC
NAMD
QE
SPECFEM3D
TensorFlow
VASP
Benchmark Application: Amber [PME-Cellulose_NVE], Chroma [szscl21_24_128], GROMACS [ADH Dodec: Dev Prototype], GTC
[moi#proc.in], LAMMPS [LJ 2.5], MILC [Apex Medium], NAMD [stmv_nve_cuda], Quantum Espresso [AUSURF112-jR], SPECFEM3D
[four_material_simple_model]; TensorFlow [ResNet-50], VASP [Si-Huge]; GPU node: with dual-socket CPUs with 4x V100 GPU.

7
NETWORK
EDGE ANALYTICS
SIMULATION
AI
Edge
Cloud
Arm
Data Analytics
Extreme IO
EXTREME IO

8
INCREDIBLE ADVANCES IN AI
WRITING
DIALOG
TRANSLATION
SUMMARIZATION
Q&A
CLASSIFICATION
2012 2019
BERT
TRANSFORMER
ALEXNET
CNN
3D POSE
DENOISING
SEGMENTATION
OBJECT RECOGNITION
CLASSIFICATION
IMAGE GENERATION

9
GPU COMPUTING POWERS AI ADVANCES
#1 MLPERF — AI TRAINING + AI INFERENCE HPC COMPUTING CHALLENGE
Doubling
2 Years
Doubling
3.4 Months
Two Distinct Eras of AI Training
Super Moore’s Law: From 600 to 2 Hours in 5 Years
K80 SERVER: 600 Hours
DGX: 2 Hours
Time to Train (ResNet-50)
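As a rough check on the "Super Moore's Law" figure, the two endpoints on this slide (600 hours on a K80 server, 2 hours on DGX, over 5 years) imply a doubling period that can be worked out directly. The sketch below assumes a steady exponential improvement between the two endpoints, which the slide does not state; the function name is illustrative and not from the talk.

```python
import math

def implied_doubling_months(start_hours, end_hours, years):
    """Months per 2x speed-up, assuming steady exponential improvement."""
    speedup = start_hours / end_hours
    doublings = math.log2(speedup)   # number of 2x steps in the total speed-up
    return years * 12 / doublings

# Figures taken from the slide: 600 hours -> 2 hours in 5 years.
speedup = 600 / 2
months = implied_doubling_months(600, 2, 5)
print(f"{speedup:.0f}x speed-up, doubling about every {months:.1f} months")
```

Under that assumption the result falls between the two doubling periods quoted on the slide (2 years and 3.4 months).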

10
NVIDIA AI END-TO-END PLATFORM
TRAINING | CLOUD | EDGE AI | AUTONOMOUS MACHINES
DGX | HGX | EGX | AGX

11
AI FOR SCIENCE
EXPERIMENTATION
DATA
SIMULATION
DATA
NEURAL ESTIMATION
Real-time Steering | Fast Approximation
Design Space Exploration: ICF + MERLIN (Fusion)
Inverse Problems: LIGO (Gravitational Waves)
Faster Prediction: ANI + MD (Chemistry)
Real-time Steering: ITER (Fusion Energy)

13
1x Data Transfer
100x Data Collected
STREAMING AI
SOFTWARE-DEFINED SENSORS
BUILD MODELS
STREAMING AI PROCESSING
ECMWF: 287 TB/day
LSST: 20 TB/day
SKA: 16 TB/sec

14
NVIDIA EGX STACK
NGC
Kubernetes Networking Storage Security
CUDA-X
Third-Party ISVs
METROPOLIS
IMAGE
PROCESSING
DECODE DNN GRAPHICS ENCODE
DEEPSTREAM
Powered by NVIDIA CUDA Tensor Core GPU | Secured Boot Root of Trust
Cryptographic Acceleration for IPsec and TLS | NVMe-oF over TCP and RDMA
Industrial-strength Cloud Native and AI Stack
NVIDIA EGX EDGE SUPERCOMPUTING PLATFORM

15
VERTICAL INDUSTRY FRAMEWORKS
Clara Metropolis
Isaac Omniverse Aerial
DRIVE
WORLD’S LARGEST DELIVERY SERVICE
ADOPTS NVIDIA AI
PUTTING AI TO WORK

16
NVIDIA EGX
Edge Supercomputing Platform

17
SUPERCOMPUTING CLOUD
CPU Instance 48 Hours, $152
Amber, Chroma, GROMACS, GTC, LAMMPS, MILC, NAMD, QE, SPECFEM3D, TensorFlow, VASP
SUPER COMPUTING IS HARD — CLOUD HPC IS EXPENSIVE

18
SUPERCOMPUTING CLOUD
8x GPU Instance
1x GPU Instance
CPU Instance 48 Hours, $152
Amber, Chroma, GROMACS, GTC, LAMMPS, MILC, NAMD, QE, SPECFEM3D, TensorFlow, VASP
SUPER COMPUTING IS HARD —
GPU CLOUD 1/7TH COST OF CPU CLOUD
48x Faster, 1/7th the Cost
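To make the "48x Faster, 1/7th the Cost" claim concrete, the ratios can be applied to the CPU instance figures on this slide (48 hours, $152). This is only a sketch: it assumes both ratios are measured against that CPU instance, and the per-run GPU time and cost it prints are derived values, not numbers shown on the slide. Variable names are illustrative.

```python
# Figures from the slide.
cpu_hours = 48.0       # CPU instance run time
cpu_cost_usd = 152.0   # CPU instance cost

speedup = 48           # "48x Faster"
cost_ratio = 1 / 7     # "1/7th the Cost"

# Derived values (assumption: both ratios are relative to the CPU instance above).
gpu_hours = cpu_hours / speedup
gpu_cost_usd = cpu_cost_usd * cost_ratio
print(f"GPU run: about {gpu_hours:.1f} h and ${gpu_cost_usd:.2f}")
```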

19
ICECUBE OBSERVATORY
DETECTING NEUTRINOS
50K NVIDIA GPUs IN THE CLOUD
350 PF OF SIMULATION FOR 2 HOURS
PRODUCED 5% OF ANNUAL SIMULATION DATA
AWS, MICROSOFT AZURE, GOOGLE CLOUD PLATFORM
DISTRIBUTED ACROSS U.S., EUROPE, APAC
Frank Würthwein, Ph.D.
Executive Director, Open Science Grid
Igor Sfiligoi
Lead Developer and Researcher
MULTIPLE GENERATIONS, ONE APPLICATION
[Chart: Events Processed Per GPU Type: V100, M60, K80, T4, P40, P100]
THE LARGEST CLOUD SIMULATION IN HISTORY
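The IceCube figures above imply some simple scale arithmetic. This is a rough sketch, assuming the 350 PF rate held for the full 2 hours and that simulation output scales linearly with compute; neither assumption is stated on the slide, and the variable names are illustrative.

```python
# Figures from the slide.
burst_pf = 350.0         # 350 PF of simulation
run_hours = 2.0          # for 2 hours
share_of_annual = 0.05   # produced 5% of annual simulation data

# Derived values under the linear-scaling assumption.
hours_for_full_year = run_hours / share_of_annual   # time at burst scale for 100%
burst_pf_hours = burst_pf * run_hours               # total compute in the burst
annual_pf_hours = burst_pf_hours / share_of_annual  # compute for a year's data

print(f"Burst: {burst_pf_hours:.0f} PF-hours")
print(f"A full year's data: about {hours_for_full_year:.0f} hours at burst scale "
      f"({annual_pf_hours:.0f} PF-hours)")
```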

20
Up to 800 V100 GPUs Connected via Mellanox InfiniBand
ANNOUNCING
WORLD’S LARGEST ON-DEMAND SUPERCOMPUTER

21
DIVERSE ARM ARCHITECTURES
AMPERE COMPUTING eMAG
Hyperscale and Storage
AMAZON GRAVITON
Hyperscale and SmartNIC
MARVELL THUNDERX2
Hyperscale, Storage and HPC
FUJITSU A64FX
Supercomputing
HUAWEI KUNPENG 920
Big Data and Edge

22
NVIDIA CUDA ON ARM AT OAK RIDGE NATIONAL LAB
Benchmark Application [Dataset]: GROMACS [ADH Dodec: Dev Prototype], LAMMPS [LJ 2.5], MILC [Apex Small],
NAMD [apoa1_npt_cuda], Quantum Espresso [AUSURF112-jR], Relion [Plasmodium Ribosome], SPECFEM3D
[four_material_simple_model], TensorFlow [ResNet50: Batch:256]; CPU node: 2x ThunderX2 9975; GPU node:
Same CPU node + 2x V100 32GB PCIe; *1x V100 for GROMACS, MILC, and TensorFlow

23
ANNOUNCING
NVIDIA HPC FOR ARM
HPC Server Reference Platform | 8 V100 Tensor Core GPUs with NVLink
4x 100 Gbps Mellanox InfiniBand | Systems Ranging from Supercomputer, Hyperscale, to Edge
CUDA on Arm Beta Available Now
[Diagram: 2 CPUs, 2 PCIe Switches, 2 NICs, GPUs]

25
ANNOUNCING
NVIDIA HPC FOR ARM
APPLICATIONS
PROGRAMMING MODELS
C++
CUDA
FORTRAN
COMET
DCA++
GROMACS
INDEX
LAMMPS
LSMS
MATLAB
MILC
NAMD
OPTIX
RELION
TENSORFLOW
PARAVIEW
OPENACC
PYTHON
ARM ALLINEA STUDIO
BRIGHT COMPUTING
CMAKE
CUDA-GDB
CUPTI
GCC
LLVM
NVCC
PAPI
SINGULARITY
SLURM
TAU
GAMERA
SDKS
QUANTUM ESPRESSO
PERFORCE TOTALVIEW
PGI
SCORE-P
VMD

27
EXTREME COMPUTE NEEDS EXTREME IO
TRADITIONAL RDMA: 50 GB/s
[Diagram: NODE A and NODE B, each with PCIe Switch, CPU, System Memory, GPU, and NIC]

28
GPUDIRECT RDMA: 100 GB/s
[Diagram: NODE A and NODE B, each with PCIe Switch, CPU, System Memory, GPU, and NIC]

29
TRADITIONAL STORAGE: 50 GB/s
[Diagram: PCIe Switch, CPU, System Memory, GPU, and NIC]
GPUDIRECT RDMA: 100 GB/s
[Diagram: NODE A and NODE B, each with PCIe Switch, CPU, System Memory, GPU, and NIC]

30
GPUDIRECT STORAGE: 100 GB/s
[Diagram: Storage, PCIe Switch, CPU, System Memory, GPU, and NIC]
GPUDIRECT RDMA: 100 GB/s
[Diagram: NODE A and NODE B, each with PCIe Switch, CPU, System Memory, GPU, and NIC]

31
ANNOUNCING NVIDIA MAGNUM IO
Acceleration Libraries for Large-scale HPC and IO
High-bandwidth, Low-latency, Massive Storage Access with Lower CPU Utilization
GPUDIRECT STORAGE: 100 GB/s
[Diagram: Storage, PCIe Switch, CPU, System Memory, GPU, and NIC]
GPUDIRECT RDMA: 100 GB/s
[Diagram: NODE A and NODE B, each with PCIe Switch, CPU, System Memory, GPU, and NIC]

32
PYTHON
CUDA
APACHE ARROW
CUDF CUGRAPH
RAPIDS
CUML
PANDAS SCI-KL / XGBOOST
CUDNN
DEEP LEARNING
FRAMEWORKS
DASK
NVIDIA RAPIDS DATA SCIENCE
Open Source | Multi-GPU and Multi-Node | Up to 100x Speed-Up | 150K Downloads in 1 Year
Data Load and Processing Times from Hours to Minutes | Used by NERSC, ORNL, NASA, SDSC

33
NVIDIA MAGNUM IO BOOSTS RAPIDS DATA ANALYTICS
20x ON TPC-H | STRUCTURAL BIOLOGY: 3x VMD | NEW PANGEO XARRAY ZARR READER FOR CLIMATE
Q4 TPC-H Benchmark Work Breakdown: With Repeated Query
[Chart: Latency (msec), WITHOUT GDS vs. WITH GDS; stages: CUDA Startup, GPU and CPU Allocation, Data Preload, Warmup Query, Repeat Query, Clean Up, Driver Close]

34
ANNOUNCING
WORLD’S LARGEST INTERACTIVE VOLUME VISUALIZATION
Simulating Mars Lander with FUN3D | Interactively Visualizing 150 TB; Unstructured Mesh
4 NVIDIA DGX-2 Streaming 400 GB/s | NVIDIA Magnum IO | NVIDIA IndeX

35
ANNOUNCING
NVIDIA DGX-2 AS SUPERCOMPUTING ANALYTICS INSTRUMENT
16 V100 GPUs - 2 PF Tensor Core | 512 GB HBM2 - 16 TB/s | 8 MLNX CX5 - 800 Gbps
30 TB NVMe - 53 GB/s with Magnum IO | Fabric Storage - 100 GB/s with Magnum IO
2.3x Faster Than Current IO500 10-node Leader
Powered by NVIDIA Magnum IO
EXTREME WEATHER AI INFERENCE: NVIDIA TENSORRT
3D VOLUME ANALYTICS: PANGEO XARRAY
VMD COMPUTATIONAL MICROSCOPE: NVIDIA OPTIX
3D INTERACTIVE VOLUME RENDERING: NVIDIA INDEX
TPC-H RECORD 10 TB JOIN: NVIDIA RAPIDS
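The DGX-2 figures on this slide mix gigabits and gigabytes, so a quick unit check may help. This is a minimal sketch using only the numbers on the slide (16 V100 GPUs, 512 GB HBM2, 8 MLNX CX5 at 800 Gbps in total); the variable names are illustrative, and the even split across GPUs and NICs is an assumption.

```python
# Figures from the slide.
gpus = 16
hbm2_total_gb = 512        # gigabytes
nics = 8
network_total_gbps = 800   # gigabits per second

# Derived values (assuming an even split across devices).
hbm2_per_gpu_gb = hbm2_total_gb / gpus            # memory per GPU
gbps_per_nic = network_total_gbps / nics          # line rate per NIC
network_total_gbytes_s = network_total_gbps / 8   # 8 bits per byte

print(f"{hbm2_per_gpu_gb:.0f} GB HBM2 per GPU")
print(f"{gbps_per_nic:.0f} Gbps per NIC, {network_total_gbytes_s:.0f} GB/s aggregate")
```

The aggregate network figure works out to the same 100 GB/s that the slide quotes for fabric storage with Magnum IO.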

36
NETWORK
EDGE ANALYTICS
SIMULATION
EXTREME IO
NVIDIA HPC for ARM
NVIDIA EGX Edge Supercomputing Platform
NVIDIA DGX-2 Supercomputing Analytics Instrument
NVIDIA Magnum IO
NGC
Azure
