Helge Gose, NVIDIA Solution Architect, June 7, 2018
HPE AND NVIDIA
EMPOWERING AI AND IOT
AGENDA
What is Deep Learning?
Volta and NVLINK
Training to Inference – HPE Solutions
IoT Use Case
AI AND DEEP LEARNING
NIPS (2012)
ImageNet Classification with Deep
Convolutional Neural Networks
Alex Krizhevsky
University of Toronto
Ilya Sutskever
University of Toronto
Geoffrey E. Hinton
University of Toronto
Deep Learning
DEEP LEARNING
IS SWEEPING ACROSS INDUSTRIES
MEDICINE: cancer cell detection, diabetic grading, drug discovery
AUTONOMOUS MACHINES: pedestrian detection, lane tracking, traffic sign recognition
SECURITY & DEFENSE: face recognition, video surveillance, cyber security
MEDIA & ENTERTAINMENT: video captioning, content-based search, real-time translation
INTERNET SERVICES: image/video classification, speech recognition, natural language processing
DEEP LEARNING APPLICATION DEVELOPMENT
Untrained Neural Network Model + Deep Learning Framework
TRAINING: learning a new capability from existing data → Trained Model (New Capability)
Trained Model Optimized for Performance
INFERENCE: applying this capability to new data → App or Service Featuring Capability
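The development flow on this slide (train, optimize, then infer) can be sketched as a pipeline of stages. A minimal, framework-agnostic sketch; the `Model`, `train`, `optimize`, and `infer` names are illustrative, not any specific framework's API:

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    weights: list = field(default_factory=list)
    trained: bool = False
    optimized: bool = False

def train(model, dataset):
    """TRAINING: learn a new capability from existing labeled data."""
    for example, label in dataset:
        pass  # a real framework would update model.weights here
    model.trained = True
    return model

def optimize(model):
    """Optimize the trained model for runtime performance
    (layer fusion, pruning; TensorRT plays this role in NVIDIA's stack)."""
    model.optimized = True
    return model

def infer(model, new_data):
    """INFERENCE: apply the learned capability to data not seen in training."""
    assert model.trained and model.optimized
    return [f"prediction for {x}" for x in new_data]

predictions = infer(optimize(train(Model(), dataset=[])),
                    new_data=["new_image.jpg"])
print(predictions)
```

The point of the sketch is the ordering: optimization happens once, after training and before deployment, so the inference stage only ever sees the optimized model.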
VOLTA AND NVLINK
TESLA V100
WORLD’S MOST ADVANCED DATA CENTER GPU
5,120 CUDA cores
640 NEW Tensor cores
7.8 FP64 TFLOPS | 15.7 FP32 TFLOPS | 125 Tensor TFLOPS
20MB SM RF | 16MB Cache
16GB/32GB HBM2 @ 900GB/s | 300GB/s NVLink
NVLINK FABRIC
V100: 6 NVLinks @ 50 GB/s bidirectional
Total bandwidth of 300 GB/sec
Maximizing system throughput
Advancing Multi-GPU Processing
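The 300 GB/s total on this slide follows directly from the per-link figure; Python used here only for the arithmetic:

```python
# V100 NVLink fabric: 6 links, 50 GB/s bidirectional each.
num_links = 6
gb_per_s_per_link = 50
total_bandwidth = num_links * gb_per_s_per_link
print(total_bandwidth)  # 300, matching the slide's aggregate figure
```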
REVOLUTIONARY AI PERFORMANCE
3X faster DL training performance: a 3X reduction in time to train over P100.
Relative time to train (LSTM) – neural machine translation, 13 epochs, German -> English, WMT15 subset; CPU = 2x Xeon E5-2699 v4:
− CPU: 15 days
− P100: 18 hours
− V100: 6 hours
Over 80X DL training performance in 3 years – GoogLeNet training performance on versions of cuDNN, speedup vs. 1x K80 with cuDNN2:
− Q1 2015: 1x K80, cuDNN2 (baseline)
− Q3 2015: 4x M40, cuDNN3
− Q2 2016: 8x P100, cuDNN6
− Q2 2017: 8x V100, cuDNN7
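The "3X over P100" headline follows from the training times quoted for the NMT benchmark; quick arithmetic on the slide's own numbers:

```python
# Time to train, 13 epochs NMT (German -> English, WMT15 subset), per the slide.
cpu_hours = 15 * 24   # 2x Xeon E5-2699 v4 baseline: 15 days
p100_hours = 18
v100_hours = 6

print(p100_hours / v100_hours)  # 3.0  -> "3X reduction in time to train over P100"
print(cpu_hours / v100_hours)   # 60.0 -> V100 vs. the dual-CPU baseline
```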
END-TO-END PRODUCT FAMILY
TRAINING
− Data center: HPE Apollo 6500 | Tesla V100
− Desktop: DGX Station, TITAN V
INFERENCE
− Data center: Tesla V100, Tesla P4
− Embedded: Jetson (JetPack SDK)
− Automotive: Drive PX (DriveWorks SDK)
POWERING THE DEEP LEARNING ECOSYSTEM
NVIDIA SDK Accelerates Every Major Framework
COMPUTER VISION: object detection, image classification
SPEECH & AUDIO: voice recognition, language translation
NATURAL LANGUAGE PROCESSING: recommendation engines, sentiment analysis
DEEP LEARNING FRAMEWORKS (including Mocha.jl)
NVIDIA DEEP LEARNING SDK
developer.nvidia.com/deep-learning-software
TRAINING TO INFERENCE – HPE SOLUTIONS
HPE COMPREHENSIVE, PURPOSE-BUILT PORTFOLIO FOR DEEP LEARNING
− HPE Apollo 6500: enterprise platform for accelerated computing; compute ideal for training models in the data center
− HPE SGI 8600: petaflop scale for deep learning and HPC
− HPE Apollo 2000: the bridge to enterprise scale-out architecture; compute for both training models and inference at the edge
− HPE Edgeline EL4000: edge analytics and inference engine; unprecedented deep edge compute and high-capacity storage; open standards
− HPE Apollo sx40: maximize GPU capacity and performance with lower TCO
AI / HPC storage: HPE Data Management Framework Software for storage elasticity through tiered data management
Choice of fabrics: Intel® Omni-Path Architecture, Mellanox InfiniBand, HPE FlexFabric Network
AI software frameworks
Target industries: government, academia and industries; financial services; life sciences and health; autonomous vehicles and manufacturing
Services: Advisory and Transformation Services | GreenLake Flexible Capacity | Datacenter Care Services
ACCELERATED PERFORMANCE FOR MACHINE LEARNING
Enterprise AI deep learning platform with industry-leading GPUs and interconnects
Dependable system performance
− Power and cooling headroom for
top bin accelerators
− Robust signal integrity provides
higher bandwidth, lower latency
HPE’s highest GPUs per
server
− Up to 125 Tflops1 single precision
performance
− Eight GPUs per server
Powerful host
− 2x the high speed fabric adapters2
− NVMe drives
− High speed DDR4 SmartMemory
Leading accelerator
technology
− NVLink 2.0 enables dedicated
GPU-to-GPU communication
1 NVIDIA Tesla V100 provides 15.7 TFLOPS single precision performance with NVLink 2.0 per GPU x 8 GPUs ≈ 125 TFLOPS http://www.nvidia.com/content/PDF/Volta-Datasheet.pdf
2 Apollo 6500 Gen10 with 4 OPA or IB connectors vs Apollo 6500 Gen9 with 2 connectors
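Footnote 1's per-server figure is just the per-GPU FP32 rate times eight GPUs, with the slide rounding down:

```python
fp32_tflops_per_gpu = 15.7   # Tesla V100 FP32 peak, per the footnote
gpus_per_server = 8          # Apollo 6500
print(fp32_tflops_per_gpu * gpus_per_server)  # 125.6 -> quoted as "up to 125 TFLOPS"
```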
IOT USE CASE
High Performing Edge Compute Brings Action to the Edge
Shift to the Edge for Control!
HPE enables both Edge AI “Inference” & Core “Training”
FOR MORE INFO
NVIDIA:
https://www.nvidia.de/data-center/tesla/
HPE:
https://www.hpe.com/us/en/solutions/hpc-high-performance-computing.html
Editor's Notes
  • #7: Let’s take a look at how deep learning actually works in practice… Say you want to automatically identify various types of animals, which is a classification task. The first step is to assemble a collection of representative examples to be used as a training dataset which will serve as the experience from which a deep neural network will learn. If you only need to classify cats vs. dogs, then you only need to have cat images and dog images in the training dataset. In this case you’ll need several thousand images, each with a label indicating whether it is a cat image or a dog image. To ensure the training dataset is representative of all the pictures of cats and dogs that exist in the world, it must include a wide range of species, poses, and environments in which dogs and cats may be observed. [next]
  • #8: The next thing you need is a deep neural network model. Typically, this will be an untrained neural network designed to perform a general task like detection, classification, or segmentation on a specific type of input data like images, text, audio, or video. Here you can see a very simple model of an untrained neural network. At the top of the model, there is a row or “layer” that has 4 input nodes, and at the bottom there is a layer that has 2 output nodes. Between the input layer and the output layer are a few so-called “hidden” layers with several nodes each. The white lines show which nodes in the input layer share their results with nodes in the first hidden layer, and so on all the way down to the output layer. You may hear the nodes referred to as “artificial neurons” or “perceptrons” since their simple behavior is inspired by the neurons in the human brain. A real deep neural network model would have many hidden layers between the input layer and the output layer - which is why it’s called “deep” - but a simplified representation works fine for this example. Just keep in mind that the design of the neural network model is what makes it suitable for a particular task. For example, the best models for image classification are very different from the best models for speech recognition. And the differences can include the number layers, the number of nodes in each layer, the algorithms performed in each node, and the connections between nodes. It’s worth noting that there are readily available deep neural network models for image classification, object recognition, image segmentation and several other tasks – but it is often necessary to modify these models to achieve high levels of accuracy for a particular dataset. So, the computation required to train one version of the model is multiplied by the number of model variations that need to be evaluated. 
All of this processing requires a tremendous amount of computation, much of which can be performed in parallel, which makes deep learning an ideal workload for GPUs to accelerate. ---------- simple version ---------- So, if the goal is to train a deep neural network to distinguish cats vs. dogs, you would use a neural network model that is designed to be good at image classification. ---------- detailed version -------- For the image classification task of distinguishing images of cats vs. dogs, we would probably use a convolutional neural network, such as AlexNet, which is comprised of nodes that implement simple generalized algorithms. 1. Convolution layers (linear, element-wise matrix multiplication and addition) 2. Non-linearity function layers (ReLU - rectified linear unit, which replaces negative values with zero) 3. Pooling layers (subsampling/downsampling, reduces dimensionality while retaining the most important information, max/avg/sum) Using these simple, generalized algorithms is a key difference and advantage for deep learning vs. earlier approaches to machine learning, which required many custom data-specific feature extraction algorithms to be developed by specialists for each dataset and task. ------------------------------------ [next]
  • #9: Once you have assembled a training dataset and selected a neural network model, a deep learning framework (such as Caffe2, MXNET, or TensorFlow) is used to feed the training dataset through the neural network. For each image that is processed through the neural network, each node in the output layer reports a number that indicates how confident it is that the image is a dog or a cat. In this case there are only two options, so the model needs just two nodes in the output layer – one for dogs and one for cats. When these final outputs are sorted most-confident to least-confident, the result is called a confidence vector. The deep learning framework then looks at the label for the image to determine whether the neural network guessed (or “inferred”) the correct answer. If it inferred correctly, the framework strengthens the weights of the connections that contributed to getting the correct answer. And vice-versa, if the neural network inferred the incorrect result, the framework reduces the weights of the connections that contributed to getting the wrong answer. After processing the entire training dataset once, the neural network will generally have enough experience to infer the correct answer a little more than half of the time (slightly better than a random coin toss) but it will require several additional “epochs” (meaning rounds of processing the entire training dataset) to achieve higher levels of accuracy. There are many deep learning frameworks to choose from, each with its own set of strengths, programming language interfaces, etc. but they’re all GPU-accelerated using the CUDA Deep Neural Networks library (cuDNN) and other libraries in the NVIDIA Deep Learning SDK. You can learn more about all the deep learning frameworks and the NVIDIA Deep Learning SDK on our developer web site at developer.nvidia.com. [next]
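The feedback loop this note describes (compute a confidence vector, compare against the label, strengthen or weaken the contributing connections, repeat for several epochs) can be sketched numerically. A minimal NumPy stand-in for what a framework does; the toy data and two-output "cat vs. dog" setup are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a labeled training set: 4 features per "image",
# label 0 = cat, 1 = dog (illustrative data, not real images).
X = rng.normal(size=(200, 4))
true_w = np.array([1.5, -2.0, 0.5, 1.0])
y = (X @ true_w > 0).astype(int)

W = np.zeros((4, 2))  # connection weights into the two output nodes

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for epoch in range(50):              # one epoch = one pass over the dataset
    conf = softmax(X @ W)            # "confidence vector" per image
    onehot = np.eye(2)[y]            # labels the framework compares against
    grad = X.T @ (conf - onehot) / len(X)
    W -= 0.5 * grad                  # strengthen/weaken contributing connections

accuracy = (softmax(X @ W).argmax(axis=1) == y).mean()
print(accuracy)  # well above the 0.5 "coin toss" the note describes
```

A real deep network has many hidden layers and cuDNN-accelerated kernels between input and output, but the strengthen-on-correct / weaken-on-incorrect update shown here is the same idea the note walks through.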
  • #10: So, now that the model has been trained on a large representative dataset, it’s very good at distinguishing between cats vs. dogs. But if you showed it a picture of a racoon it would be very confused since it has no previous experience with racoons, and would probably report a low confidence number for both dog and cat. If you needed the ability to classify racoons as well as dogs and cats, you would simply modify the design topology of the model to add a third node to the output layer, expand the training dataset to include thousands of representative images of racoons, and use the deep learning framework to re-train the model. There’s no need to manually write a racoon feature extraction algorithm and figure out how to integrate it into your deep neural network model, just modify the network topology a bit and re-train the model with your new dataset. [next]
  • #11: Once the model has been trained, much of the generalized flexibility that was necessary during the training process is no longer needed, so it’s possible to optimize the model for significantly faster runtime performance. [Story: How Cooper learned to classify cars, “tucks”, busses, motorcycles, etc.] Common optimizations include fusing layers to reduce memory and communication overhead, pruning nodes that do not contribute significantly to the results, and other techniques supported in the NVIDIA TensorRT runtime. [next]
  • #12: Your fully trained and optimized model is then ready to be integrated into an application that will feed it new data, in this case images of cats and dogs that it hasn’t seen before, and it will be able to quickly and accurately infer the correct answer based on its training. And your application can be deployed on a GPU-accelerated platform in your datacenter, in the cloud, on a local workstation, in a robot, a smart camera, or even a self-driving car.
  • #17: There is a wide range of GPU-accelerated platforms you can use to accelerate deep learning training and inference workloads. If you want a fully integrated solution, we recommend the DGX-1 supercomputer-in-a-box, which delivers the performance equivalent of hundreds of CPU-only servers using 8 world-class Tesla GPUs, or its little brother the DGX Station, which is powered by 4 Tesla GPUs and runs whisper-quiet next to your desk. If you just want to get started on a prototype using your existing workstation, the TITAN V includes new Tensor Cores designed specifically for deep learning that deliver up to 12x higher peak TFLOPS for training. In the datacenter, the Tesla P100 and V100 with NVLink technology deliver strong scaling for mixed workloads across both HPC and deep learning training and inference applications, while for scale-out inference workloads the Tesla P4 (GP104) delivers high-efficiency (perf/watt), low-latency performance. And if you need to deploy deep learning applications in automotive or embedded settings, NVIDIA offers the Drive PX and Jetson platforms. What happens after development? NGC is tuned for all of these platforms, so once you productize your research or solution there is a seamless deployment path: to cloud-based microservices in the form of the NGC TensorRT container, or to embedded devices via JetPack and DriveWorks for robotics, drones, autonomous vehicles, etc.