10/20/2017 Women in Big Data Event Hashtags: #IamAI, #WiBD
Oct 18th AI Connect Speakers
WiBD Introduction & DL Use Cases
Renee Yao
Product Marketing Manager,
Deep Learning and Analytics
NVIDIA
Deep Learning Workflows (w/ a demo)
Kari Briski
Director of Deep Learning
Software Product
NVIDIA
Deep Learning in Enterprise
Nazanin Zaker
Data Scientist
SAP Innovation Center Network
Renee Yao
Product Marketing Manager, NVIDIA
AI CONNECT
Agenda
AI Connect
• 6:00-7:00pm – Registration and Networking
• 7:00-7:15pm – “WiBD Introduction & DL Use
Cases”, Renee Yao, Product Marketing
Manager, Deep Learning and Analytics, NVIDIA
• 7:15-7:45pm – “Deep Learning Workflows
(with a live demo)”, Kari Briski, Director of
Deep Learning Software Product, NVIDIA
• 7:45-8:15pm – “Deep Learning in Enterprise”
by Nazanin Zaker, Data Scientist, SAP
Innovation Center Network
• 8:15-8:30pm - Wrap-up & Giveaways
• February: Apache Hadoop Training @ Cloudera
• March @ Strata+Hadoop World SJ
• May: Apache Drill and Apache Spark @ MapR
• June: Career Empowerment @ Andreessen Horowitz
• June @ Spark Summit
• June @ Hadoop Summit
10/20/2017 Women in Big Data Forum
Be Part of The Solution
Become a member or a sponsor
• Website: womeninbigdata.org
• LinkedIn: “Women in Big Data Forum”
• Meetup: meetup.com/Women-in-Big-Data-Meetup/
• Twitter: @DataWomen
• Video: https://www.youtube.com/channel/UCOaMT7A9SVkeBdvYNxiITVA
Join us
Event Hashtags: #IamAI, #WiBD
Deep Learning Workflows: Training and Inference
Kari Briski, 10-18-17
DEEP LEARNING WORKFLOWS:
DEEP LEARNING TRAINING AND INFERENCE
7
AI APPLICATIONS
COMPUTER VISION, SPEECH & AUDIO, NATURAL LANGUAGE PROCESSING
Object Detection, Image Classification, Voice Recognition, Language Translation, Recommendation Engines, Sentiment Analysis
8
AI APPLICATIONS
COMPUTER VISION: Object Detection, Classification, Segmentation, Visual Q&A
NATURAL LANGUAGE PROCESSING: Neural Machine Translation, Question & Answer, Sentiment Analysis, Search and recommendation engines
SPEECH & AUDIO: ASR (automatic speech recognition), Generation, Processing, Audio classification, Denoising
9
ACCELERATED DEEP LEARNING TRAINING STACK
AI Applications are Built on NVIDIA Hardware and Software End-to-End
COMPUTER VISION, SPEECH AND AUDIO, NATURAL LANGUAGE PROCESSING
Object Detection, Image Classification, Voice Recognition, Language Translation, Recommendation Engines, Sentiment Analysis
10
NVIDIA TOOLS FOR DEEP LEARNING WORKFLOW
NVIDIA DEEP LEARNING SDK
DATA: GATHER AND LABEL
• Gather data
• Curate data sets
• Rapidly label data, guide training, get insights
TRAINING
• Accelerated Deep Learning Training Software Stack
• Training data, data management, training, model assessment
• Trained network
DEPLOY WITH TENSORRT
• Embedded: Jetson TX
• Automotive: Drive PX (Xavier)
• Data center: Tesla (Pascal, Volta)
11
DL FLOW
Source Dataset, Curated Dataset
IMPORT: format…
PREPROCESS: clean, clip, label, normalize, ..
VISUALIZATION
TRAIN
SCORE + OPTIMIZE, VISUALIZATION
DEPLOY: tune, compile + runtime
INFERENCE & MICROSERVICES: REST API
RESULT *: inference, prediction
MODEL ZOO
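The stages on this slide can be read as a chain of functions. What follows is only a minimal sketch in plain Python: the stage functions (`import_data`, `preprocess`, `train`, `score`, `deploy`), the toy threshold "model", and the sample rows are all hypothetical and do not come from the slides or from any NVIDIA tool.

```python
from statistics import mean

def import_data(raw_rows):
    """IMPORT: parse raw "value,label" text rows into (float, int) pairs."""
    return [(float(v), int(y)) for v, y in (row.split(",") for row in raw_rows)]

def preprocess(samples, lo=0.0, hi=10.0):
    """PREPROCESS: clip values to [lo, hi], then normalize them to [0, 1]."""
    return [((min(max(v, lo), hi) - lo) / (hi - lo), y) for v, y in samples]

def train(samples):
    """TRAIN: toy 'model' whose threshold sits halfway between the class means."""
    neg = mean(v for v, y in samples if y == 0)
    pos = mean(v for v, y in samples if y == 1)
    return {"threshold": (neg + pos) / 2}

def score(model, samples):
    """SCORE: fraction of samples the threshold model classifies correctly."""
    hits = sum((v > model["threshold"]) == bool(y) for v, y in samples)
    return hits / len(samples)

def deploy(model):
    """DEPLOY: return a callable that stands in for a REST endpoint."""
    def predict(normalized_value):
        return {"prediction": int(normalized_value > model["threshold"])}
    return predict

# Hypothetical source dataset: the last row is out of range and gets clipped.
raw = ["1.0,0", "2.0,0", "8.0,1", "12.0,1"]
curated = preprocess(import_data(raw))
model = train(curated)
accuracy = score(model, curated)
endpoint = deploy(model)
```

Calling `endpoint(0.9)` returns a small dictionary with the prediction, which is the role the RESULT box plays in the diagram. A real pipeline would replace each stage with a framework or service; the point here is only the order of the hand-offs.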
12
INFRASTRUCTURE FOR AI
13
GATHER DATA, CURATE LABEL
14
Crowd Source Tools
VATIC
Free Labeled Data
ViPER
Computer Vision
Translation
Speech & Audio
Home-grown
15
STEP 1: Project Setup (Project Manager)
• Project named
• Classifier types defined
• Labeling task settings defined
• Sequences added
STEP 2: Curation (Curator)
• Which pieces of data make the most sense to us
STEP 3: Labeling (Data Labeler)
• Labels created
• Attributes of labels selected
• Frames committed for QA
STEP 4: QA (Data Labeler)
• Frames accepted or rejected
• Rejection reason specified
STEP 5: Export (Data Labeler)
• Data set sent to training
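The labeling, QA, and export steps on this slide amount to a small state machine per frame. The sketch below is one possible way to model it, assuming a hypothetical `Frame` record and hypothetical status names ("unlabeled", "in_qa", "accepted", "rejected"); the actual labeling tool shown in the talk is not described in enough detail to reproduce.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Frame:
    frame_id: int
    labels: List[str] = field(default_factory=list)
    status: str = "unlabeled"  # unlabeled -> in_qa -> accepted | rejected
    rejection_reason: Optional[str] = None

def commit_for_qa(frame, labels):
    """Labeling step: attach labels and hand the frame over to QA."""
    frame.labels = list(labels)
    frame.status = "in_qa"

def review(frame, accept, reason=None):
    """QA step: accept the frame, or reject it with a stated reason."""
    if frame.status != "in_qa":
        raise ValueError("frame was not committed for QA")
    if not accept and not reason:
        raise ValueError("a rejection needs a reason")
    frame.status = "accepted" if accept else "rejected"
    frame.rejection_reason = None if accept else reason

def export(frames):
    """Export step: only accepted frames are sent on to training."""
    return [(f.frame_id, f.labels) for f in frames if f.status == "accepted"]

# Hypothetical run: one frame passes QA, one is rejected.
frames = [Frame(1), Frame(2)]
commit_for_qa(frames[0], ["car"])
commit_for_qa(frames[1], ["pedestrian"])
review(frames[0], accept=True)
review(frames[1], accept=False, reason="box too loose")
training_set = export(frames)
```

Requiring a reason on every rejection mirrors the "Rejection reason specified" bullet, and filtering on status at export keeps rejected frames out of the training set.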
16
TRAINING
17
NVIDIA DEEP LEARNING SOFTWARE TRAINING STACK
UI / JOB MANAGEMENT / DATASET VERSIONING / VISUALIZATION
DIGITS, NVIDIA GPU Cloud, HumanLoop, MagLev, Keras
COMPUTER VISION, SPEECH AND AUDIO, NATURAL LANGUAGE PROCESSING
Object Detection, Image Classification, Voice Recognition, Language Translation, Recommendation Engines, Sentiment Analysis
At Your Desk, On-Prem, In-the-Cloud
18
ACCELERATED DEEP LEARNING TRAINING STACK
Productivity: Workflow, Data and Job Management, Experiments
UI / JOB MANAGEMENT / DATASET VERSIONING / VISUALIZATION
DIGITS, NVIDIA GPU Cloud, HumanLoop, MagLev, Keras
DEEP LEARNING FRAMEWORKS
Deep Learning Software Libraries (AKA Frameworks)
Architecture Specific Libraries
DEEP LEARNING: cuDNN
MATH LIBRARIES: cuBLAS, cuSPARSE, cuFFT
COMMUNICATION: NCCL
COMPUTER VISION, SPEECH AND AUDIO, NATURAL LANGUAGE PROCESSING
Object Detection, Image Classification, Voice Recognition, Language Translation, Recommendation Engines, Sentiment Analysis
At Your Desk, On-Prem, In-the-Cloud
19
ACCELERATED DEEP LEARNING TRAINING STACK
UI / JOB MANAGEMENT / DATASET VERSIONING / VISUALIZATION
DIGITS, NVIDIA GPU Cloud, NVDocker, Keras, Kubernetes
NV OPTIMIZED, NV ACCELERATED
Paddle
DEEP LEARNING: cuDNN
MATH LIBRARIES: cuBLAS, cuSPARSE, cuFFT
COMMUNICATION: NCCL
COMPUTER VISION, SPEECH AND AUDIO, NATURAL LANGUAGE PROCESSING
Object Detection, Image Classification, Voice Recognition, Language Translation, Recommendation Engines, Sentiment Analysis
At Your Desk, On-Prem, In-the-Cloud
20
GENERATIONAL GPU PERFORMANCE & TENSOR CORES
[Chart: bars for k80, p100, v100, v100 TC; vertical axis 0 to 8]
Single GPU Generational Training Scaling ResNet-50; 1,4,8 GPU training on DGX-1 Volta
21
GENERATIONAL GPU PERFORMANCE & TENSOR CORES
[Chart: bars for k80, p100, v100, v100 TC; vertical axis 0 to 8]
Single GPU Generational Training Scaling ResNet-50; 1,4,8 GPU training on DGX-1 Volta with Volta Tensor Core math
3-3.5X CNN training over Pascal
22
TIME TO SOLUTION (HOURS)
Convolutional Neural Networks: Training ImageNet to accuracy (90 epochs) with ResNet-50
[Chart: 8x K80, 8x P100, 8x V100; horizontal axis 0 to 50 hours]
Recurrent Neural Networks: Training OpenNMT to accuracy (13 epochs)
[Chart: K80, P100, V100; horizontal axis 0 to 40 hours]
Callouts: 1 weekend, 1 day, 1 afternoon
23
WHERE TO TRAIN
At Your Desk On-Prem In-the-Cloud
24
INFERENCE
DEPLOY YOUR TRAINED NETWORK
TO INFER IN APPLICATIONS
25
TRAINED NETWORK MODEL
NOW WHAT?
[Chart: Throughput (images/sec), axis 0 to 2500, for CPU, K80 TF, P100 TF, P100 TRT]
26
TRAINED NETWORK MODEL
OPTIMIZE
[Chart: Throughput (images/sec), axis 0 to 2500, for CPU, K80 TF, P100 TF, P100 TRT]
27
NVIDIA TENSORRT
Maximize inference throughput for latency-critical services
High performance neural network inference optimizer and runtime engine for production deployment
TRAINED NETWORK MODEL → TensorRT Optimizer → OPTIMIZED NETWORK → TensorRT Runtime Engine
• Embedded: Jetson TX
• Automotive: Drive PX (Xavier)
• Data center: Tesla (Pascal, Volta)
28
NVIDIA TENSORRT PROGRAMMABLE INFERENCING PLATFORM
TensorRT
TESLA V100, DRIVE PX 2, TESLA P4, JETSON TX2, NVIDIA DLA
29
NVIDIA TensorRT
Programmable Inference Accelerator
• Maximize throughput and minimize latency
• Deploy reduced precision without retraining and without accuracy loss
• Train in any framework, deploy in TensorRT without overhead
Embedded: Jetson; Automotive: Drive PX; Data center: Tesla
developer.nvidia.com/tensorrt
30
VOLTA ON A BUDGET
LATENCY BENCHMARKS
Throughput on a 200 ms latency budget
[Chart: Throughput (image/s) vs Latency (ms) for CPU-Only, V100 + TensorFlow, V100 + TensorRT; throughput axis 0 to 6000]
Workloads: ResNet-50 (ImageNet), OpenNMT (English to German)
Callouts: 3X, 6X; data labels: 19, 6, 7
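"Throughput on a latency budget" means choosing the fastest configuration whose response time still fits the budget. A minimal sketch of that selection follows, assuming a hypothetical helper `best_batch_under_budget` and made-up latency measurements per batch size; none of these numbers are the benchmark results from the slide.

```python
def best_batch_under_budget(latency_ms_by_batch, budget_ms):
    """Return (batch_size, images_per_second) with the highest throughput
    among batch sizes whose latency fits the budget, or None if none fit."""
    best = None
    for batch, latency_ms in latency_ms_by_batch.items():
        if latency_ms > budget_ms:
            continue  # this batch size would break the latency budget
        throughput = batch * 1000.0 / latency_ms  # images per second
        if best is None or throughput > best[1]:
            best = (batch, throughput)
    return best

# Made-up measurements: batch size -> latency of one batch in milliseconds.
measured = {1: 10.0, 8: 40.0, 32: 100.0, 64: 250.0}
choice = best_batch_under_budget(measured, budget_ms=200.0)
```

With these illustrative numbers the batch of 64 is excluded because 250 ms exceeds the 200 ms budget, so the largest throughput comes from the batch of 32. Larger batches usually raise throughput but also raise latency, which is why the two are reported together.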
31
ENABLE INT8 INFERENCE
TensorRT is ENABLER for entropy quantization
Training Framework (fp32) → TensorRT: Calibrate & Quantize, using 100s of samples of training data → int8 Inference

Network       FP32 TOP 1    INT8 TOP 1    DIFFERENCE
Alexnet       57.22%        56.96%        0.26%
Googlenet     68.87%        68.49%        0.38%
VGG           68.56%        68.45%        0.11%
Resnet-50     73.11%        72.54%        0.57%
Resnet-101    74.58%        74.14%        0.44%
Resnet-152    75.18%        74.56%        0.61%

Maintain accuracy without retraining
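The calibrate-then-quantize step can be illustrated with a much simpler scheme than the entropy calibration named on the slide. The sketch below uses plain max-abs symmetric quantization instead, and it is not TensorRT's API: the helper names (`calibrate_scale`, `quantize`, `dequantize`) and the sample values are assumptions made for illustration.

```python
def calibrate_scale(calibration_values):
    """Pick a symmetric scale so the largest calibration magnitude maps to 127."""
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / 127.0 if max_abs > 0 else 1.0

def quantize(values, scale):
    """FP32 -> INT8: divide by the scale, round, and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(q_values, scale):
    """INT8 -> approximate FP32."""
    return [q * scale for q in q_values]

# Stand-in for activations collected from a few calibration samples.
calibration = [-2.0, -0.5, 0.0, 0.25, 1.0, 2.54]
scale = calibrate_scale(calibration)        # 2.54 / 127
q = quantize([0.5, -1.0, 3.0], scale)       # 3.0 lies outside the calibrated range
restored = dequantize(q, scale)
```

Values inside the calibrated range survive the round trip with a small rounding error, while 3.0 saturates at the largest representable value. Entropy calibration differs in how the range is chosen: rather than taking the maximum, it picks a clipping threshold that loses the least information, which is one reason the accuracy differences in the table stay small.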
32
NVIDIA TENSORRT
Maximize inference throughput for latency-critical services
EMBEDDED: Jetson TX
AUTOMOTIVE: Drive PX (Xavier)
DATA CENTER: Tesla (Pascal, Volta)
Large batch, low latency, production-ready
Real-time execution, high resolution, high throughput, small footprint
Low power, small footprint, multi-inference
33
“On average TensorRT has doubled the speed of our inference which is pretty amazing!”
Source: Paul Kruszewski, CEO, WRNCH
“On average we see around 10x speedup, with between 3-70x speedups depending on the scenarios.”
Source: Matthew Zeiler, CEO, Clarifai
“Self-driving cars having real-time execution is obviously very important. With our ResNet101 network, TensorRT brought our inference time down from 250ms to 89ms.”
34
35
FAST IMPLEMENTATION OF TENSORFLOW
36
EXAMPLE WORKFLOWS
37
DL DATACENTER WORKFLOW
TensorRT increases productivity and shortens time to results
TRAIN
SCORE + OPTIMIZE, VISUALIZATION
DEPLOY: tune, compile + runtime
Automated with TensorRT
INFERENCE & MICROSERVICES: REST API
RESULT: inference, prediction
A/B Testing, Use data
MODEL ZOO
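The A/B testing box in this workflow implies routing a share of inference requests to a candidate model while the rest go to the current one. One common way to do that is deterministic hashing of a request id, sketched below. The function names (`pick_variant`, `handle`), the 10% share, and the two toy "models" are hypothetical; the talk does not say how the routing was implemented.

```python
import hashlib

def pick_variant(request_id, share_b=0.1):
    """Deterministically route a request to model 'A' or 'B'.

    The request id is hashed to a roughly uniform number between 0 and 1;
    ids that land below `share_b` go to candidate B, the rest to current A.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "B" if bucket < share_b else "A"

def handle(request_id, payload, models, share_b=0.1):
    """Stand-in for a REST handler: choose a model, run it, tag the result."""
    variant = pick_variant(request_id, share_b)
    return {"variant": variant, "result": models[variant](payload)}

# Two toy "models" standing in for deployed inference engines.
models = {"A": lambda x: x > 0.5, "B": lambda x: x > 0.4}
responses = [handle(f"req-{i}", 0.45, models) for i in range(1000)]
share_seen = sum(r["variant"] == "B" for r in responses) / len(responses)
```

Hashing keeps the assignment stable, so the same request id always reaches the same model, and tagging each response with its variant is what lets the logged results feed back into scoring ("Use data").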
38
NVIDIA DIGITS
>10k pulls
>2.5k stars
DL EDGE/ IVA WORKFLOW
Transfer Learning: Train and deploy to edge in less than a minute
39
DEMO DEEP LEARNING WORKFLOW
Transfer Learning: Train and deploy to edge in less than a minute
A special THANK YOU!
Zheng Liu &
Varun Praveen
40
IN SUMMARY
41
WHO, WHAT, WHERE
RESEARCHERS
Explore the “next big thing”
opportunity to fuel business
APPLIED DL/ DATA SCIENTISTS
Retrain w/ data, productize models
for consistency, focus on quality
APPLICATION DEVELOPER
Scale and deploy successful
applications w/ great user ex.
42
WHO, WHAT, WHERE
RESEARCHERS
Explore the “next big thing”
opportunity to fuel business, and find
ways to productize it
APPLIED DL/ DATA SCIENTISTS
Retrain, productize models for
consistency, quality, tuning with
right data
APPLICATION DEVELOPER
Scale and deploy successful
applications w/ great user ex.
Object Detection, Image Classification, Voice Recognition, Language Translation, Recommendation Engines, Sentiment Analysis
Paddle
43
WHO, WHAT, WHERE
RESEARCHERS
Explore the “next big thing”
opportunity to fuel business, and find
ways to productize it
DATA SCIENTISTS
Retrain, productize models for
consistency, quality, tuning with
right data
APPLICATION DEVELOPER
Scale and deploy successful
applications w/ great user ex.
TensorRT
Object Detection, Image Classification, Voice Recognition, Language Translation, Recommendation Engines, Sentiment Analysis
Training or Deploying
Paddle
44

