Discovering Your AI Super Powers
Tips And Tricks To Jumpstart Your AI Project
Wee Hyong Tok, PhD
Principal Data Science Manager
Microsoft
@weehyong
Global Artificial Intelligence Conference 2018, Seattle
How long does it take to train a deep learning model? (ResNet-50)
• Before 2017: 1 NVIDIA M40 GPU, about 14 days (~10^18 single-precision operations)
• April 2017 (Facebook): 32 CPUs, 256 NVIDIA P100 GPUs, 1 hour
• Sept 2017 (UC Berkeley, TACC, UC Davis): 1,600 CPUs, 31 minutes
• Nov 2017 (Preferred Networks, ChainerMN): 1,024 P100 GPUs, 15 minutes
5 Super Powers
1. Understand the problem!
Different Types of Deep Learning Problems
Example: image similarity, deciding whether a candidate image is similar to a query image (Yes/No).
Deep Learning Tasks
Example text for sentiment analysis: "I had a wonderful experience! The rooms were wonderful and the staff was helpful."
2. Build your AI Toolbox!
AI: concepts + techniques + infrastructure + quality of the models
Common Deep Learning Models
CNN
Convolutional
Neural Network
RNN
Recurrent Neural
Network
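As a concrete example of the RNN family, here is a minimal sketch of an LSTM-based sentiment classifier in Keras (the library choice, token ids, and labels below are illustrative placeholders, not from the talk):

    # Minimal RNN (LSTM) sentiment classifier sketch; data is synthetic.
    import numpy as np
    from tensorflow.keras import layers, models

    vocab_size = 10000
    x_train = np.random.randint(1, vocab_size, size=(32, 100))   # placeholder token ids for 32 reviews
    y_train = np.random.randint(0, 2, size=(32,))                # placeholder sentiment labels

    model = models.Sequential([
        layers.Embedding(vocab_size, 64),        # map token ids to dense vectors
        layers.LSTM(64),                         # recurrent layer reads the whole sequence
        layers.Dense(1, activation="sigmoid"),   # positive vs. negative
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=1, verbose=0)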
Deep Learning Libraries, and more…
Toolbox of a Data Scientist
Deep Learning Virtual Machine (DLVM)
• Requirements of agile data science
• Elasticity.
• Efficiency.
• Cost-effectiveness.
• Features of DLVM
• Languages.
• Data platforms.
• ML and AI tools.
• Data Exploration and Visualization.
• Data Ingestion tools.
• Development tools.
3. Jumpstart with Transfer Learning
Credits: Olah, et al., "Feature Visualization", Distill, 2017
https://distill.pub/2017/feature-visualization/
Types of Transfer Learning
• Standard DNN: featurization layers initialized randomly; output layer initialized randomly; no transfer learning used; train featurization and output jointly.
• Headless DNN: featurization layers learned using another task; output layer replaced by a separate ML algorithm; transfer learning reuses the features learned on a related task; use the features to train a separate classifier.
• Fine-Tune DNN: featurization layers learned using another task; output layer initialized randomly; transfer learning uses and fine-tunes features learned on a related task; train featurization and output jointly with a small learning rate.
• Multi-Task DNN: featurization layers initialized randomly; output layer initialized randomly; learned features need to solve many related tasks; share a featurization network across tasks and train all networks jointly with a loss function that sums the individual task losses.
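The Fine-Tune DNN row corresponds to the common recipe of loading a pretrained backbone, adding a randomly initialized output layer, and training everything with a small learning rate. A minimal Keras sketch with an ImageNet-pretrained ResNet-50 (the binary task, data, and learning rate are illustrative assumptions):

    # Fine-tune transfer learning sketch: pretrained featurization layers + new random output layer.
    import numpy as np
    from tensorflow.keras import layers, models, optimizers
    from tensorflow.keras.applications import ResNet50

    base = ResNet50(weights="imagenet", include_top=False, pooling="avg")  # featurization layers learned on ImageNet
    base.trainable = True                        # fine-tune them (set to False for a frozen, "headless" setup)

    model = models.Sequential([
        base,
        layers.Dense(1, activation="sigmoid"),   # new output layer, randomly initialized
    ])
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-5),  # small learning rate, as in the table
                  loss="binary_crossentropy", metrics=["accuracy"])

    x = np.random.rand(8, 224, 224, 3)           # placeholder images
    y = np.random.randint(0, 2, size=(8,))       # placeholder labels
    model.fit(x, y, epochs=1, verbose=0)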
Deep Neural Network for Computer Vision
An image passes through convolutional layers and then fully connected layers to produce predictions such as cat? YES, dog? NO, car? NO. Along the way the network builds up a feature hierarchy:
• Low-level features (lines, edges, color fields, etc.)
• High-level features (corners, contours, simple shapes)
• Object parts (wheels, faces, windows, etc.)
• Complex objects & scenes (people, animals, cars, beach scene, etc.)
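A minimal Keras sketch of this structure, convolutional layers followed by fully connected layers; the layer sizes and the three output classes are illustrative assumptions, not from the talk:

    # Minimal CNN sketch: convolutional feature extraction, then fully connected classification.
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(224, 224, 3)),                                # input image
        layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),   # low-level features
        layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),   # higher-level features / shapes
        layers.Conv2D(128, 3, activation="relu"), layers.MaxPooling2D(),  # object parts
        layers.Flatten(),
        layers.Dense(256, activation="relu"),                             # fully connected layers
        layers.Dense(3, activation="softmax"),                            # cat / dog / car scores
    ])
    model.summary()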
ILSVRC (ImageNet Large Scale Visual Recognition Challenge): AlexNet, VGG, GoogleNet, ResNet and more
• AlexNet, 8 layers (ILSVRC 2012): five convolutional layers (11x11 conv 96 /4 pool/2; 5x5 conv 256 pool/2; 3x3 conv 384; 3x3 conv 384; 3x3 conv 256 pool/2) followed by three fully connected layers (fc 4096, fc 4096, fc 1000).
• VGG, 19 layers (ILSVRC 2014): sixteen 3x3 convolutional layers in blocks of 64, 128, 256, and 512 filters with pooling between blocks, followed by fc 4096, fc 4096, fc 1000.
• GoogleNet, 22 layers (ILSVRC 2014): stacked inception modules (parallel 1x1, 3x3, and 5x5 convolutions plus 3x3 max pooling, depth-concatenated), with auxiliary softmax classifiers and a final average pool and fully connected softmax output.
ResNet, 152 layers (ResNet-152): a 7x7 conv 64 /2 stem with pooling, followed by stacked residual bottleneck blocks of 1x1, 3x3, and 1x1 convolutions (64/64/256, then 128/128/512, and so on) connected by shortcut connections.
ImageNet dataset
• Research dataset with >10 million images
• Images annotated with labels from an ontology (>22K labels)
• Generic images covering an extremely wide range of labels
Pre-Built CNN from General Task on Millions of Images
Strip the output layer of an ImageNet-trained CNN and feed its learned feature hierarchy (low-level features such as lines, edges, and color fields; high-level features such as corners, contours, and simple shapes; object parts such as wheels, faces, and windows; complex objects & scenes such as people, animals, cars, and beach scenes) into a separate classifier, e.g. an SVM, for a new question such as "dotted?".
Outputs of the penultimate layer of an ImageNet-trained CNN provide excellent general-purpose image features.
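A minimal sketch of this featurization approach, pairing a stripped ImageNet-pretrained CNN with a scikit-learn SVM; the images, labels, and model choices below are placeholders:

    # Headless featurization sketch: penultimate-layer CNN outputs as features for a separate SVM.
    import numpy as np
    from tensorflow.keras.applications import ResNet50
    from sklearn.svm import SVC

    featurizer = ResNet50(weights="imagenet", include_top=False, pooling="avg")  # output layer stripped

    images = np.random.rand(20, 224, 224, 3)          # placeholder images
    labels = np.random.randint(0, 2, size=(20,))      # placeholder "dotted?" labels

    features = featurizer.predict(images)             # penultimate-layer features, one 2048-d vector per image
    clf = SVC(kernel="linear").fit(features, labels)  # separate classifier trained on those features
    print(clf.predict(features[:3]))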
Pre-Built CNN from General Task on Millions of Images
With the output layer stripped, train one or more layers in the new network for the new question (e.g. "dotted?"). Using a pre-trained DNN, an accurate model can be achieved with thousands (or fewer) of labeled examples instead of millions.
Pre-trained Models
http://bit.ly/2jf97NE  http://bit.ly/2zNiN8B  http://bit.ly/1KlVMf0
"Transferring knowledge" is also possible in other domains, such as the use of word embeddings for text.
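For text, the analogous step is to seed an embedding layer with vectors pre-trained on a large corpus. A minimal Keras sketch, where the vocabulary and "pre-trained" vectors are stand-ins for real word2vec/GloVe embeddings:

    # Transfer learning for text: initialize an Embedding layer with pre-trained word vectors.
    import numpy as np
    from tensorflow.keras import layers, models
    from tensorflow.keras.initializers import Constant

    vocab = ["the", "room", "was", "wonderful"]                  # placeholder vocabulary
    dim = 50
    pretrained = {w: np.random.rand(dim) for w in vocab}         # stand-in for word2vec/GloVe vectors

    matrix = np.zeros((len(vocab) + 1, dim))                     # row 0 reserved for padding
    for i, word in enumerate(vocab, start=1):
        matrix[i] = pretrained[word]

    model = models.Sequential([
        layers.Embedding(len(vocab) + 1, dim,
                         embeddings_initializer=Constant(matrix),  # transferred knowledge
                         trainable=False),                         # keep the embeddings frozen
        layers.LSTM(32),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")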
4. Select the Right Infrastructure for AI
AI @ Scale
Read Data → Clean and Pre-process → Train Models → Inferencing @ Scale
Distributed Training Architecture
Data Parallelism
1. Parallel training on different machines.
2. Update the parameter server synchronously/asynchronously.
3. Refresh the local model with the new parameters, go to 1 and repeat.
Model Parallelism
1. The global model is partitioned into K sub-models without overlap.
2. The sub-models are distributed over K local workers and serve as their local models.
3. In each mini-batch, the local workers compute the gradients of the local weights by back propagation.
Credits: Taifeng Wang, DMTK team
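A minimal data-parallel sketch using PyTorch's DistributedDataParallel. Note that DDP synchronizes gradients with all-reduce rather than through a parameter server, so it is a stand-in for the pattern above, not the DMTK design itself; launch details (e.g. torchrun setting the rank and world size for each worker process) are assumed:

    # Data-parallel training sketch: each worker trains on its own mini-batches,
    # and the backward pass averages gradients so all workers end the step with the same model.
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        dist.init_process_group(backend="gloo")            # one process per worker
        model = torch.nn.Linear(10, 1)                     # placeholder local model
        ddp_model = DDP(model)                             # wraps the model for gradient synchronization
        opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

        for _ in range(3):                                 # each worker reads different data
            x, y = torch.randn(16, 10), torch.randn(16, 1)
            loss = torch.nn.functional.mse_loss(ddp_model(x), y)
            opt.zero_grad()
            loss.backward()                                # triggers all-reduce of gradients
            opt.step()                                     # every worker refreshes to the same parameters

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()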
Selecting Big Data and AI Infrastructure: Desktop / Laptop, Spark Clusters, Kubernetes Clusters, Cloud
Example: AI @ Scale on Kubernetes Clusters (deployment to Azure)
• VMs: k8s-master-27473156-0, k8s-agentpool1-27473156-1, k8s-agentpool2-27473156-0
• VM sizes: NC6 (GPU enabled) and DS_v2 (non-GPU)
• Virtual network: k8s-vnet-27473156
What can we run on the deep learning infrastructure?
Using Spark for Machine Learning
• Users need to write lots of "glue" code to prepare features for ML algorithms:
• Coerce types and data layout to what's expected by the learner
• Use different conventions for different learners
• Lack of domain-specific libraries for computer vision or text analytics…
• Latest: Image Data Support in Apache Spark 2.3 (see the sketch after this list)
https://blogs.technet.microsoft.com/machinelearning/2018/03/05/image-data-support-in-apache-spark/
• Limited capabilities for model evaluation & model management
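A minimal sketch of the Spark 2.3 image reader mentioned in the list above; the directory path is a placeholder:

    # Reading images into a Spark DataFrame with the image schema added in Apache Spark 2.3.
    from pyspark.sql import SparkSession
    from pyspark.ml.image import ImageSchema

    spark = SparkSession.builder.appName("image-demo").getOrCreate()

    # "/data/images" is a placeholder directory of image files.
    images_df = ImageSchema.readImages("/data/images", recursive=True)
    images_df.printSchema()   # struct with origin, height, width, nChannels, mode, data
    images_df.select("image.origin", "image.height", "image.width").show(5, truncate=False)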
Microsoft Machine Learning Library for Apache Spark (MMLSpark)
GitHub Repo: https://github.com/Azure/mmlspark
Get started now using Docker image:
docker run -it -p 8888:8888 -e ACCEPT_EULA=yes microsoft/mmlspark
Navigate to http://localhost:8888 to view example Jupyter notebooks
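Inside those notebooks, training with MMLSpark typically follows the pattern below. This is a sketch based on the examples in the repo's README; the DataFrame, column names, and exact API details are assumptions and may vary across MMLSpark versions:

    # MMLSpark sketch: train a classifier on a Spark DataFrame with minimal glue code.
    # Hypothetical data; API follows the MMLSpark README examples and may differ by version.
    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from mmlspark import TrainClassifier, ComputeModelStatistics

    spark = SparkSession.builder.appName("mmlspark-demo").getOrCreate()
    df = spark.createDataFrame(
        [(1.0, 2.0, 0), (2.0, 1.0, 1), (3.0, 0.5, 1), (0.5, 3.0, 0)],
        ["feature1", "feature2", "label"])
    train, test = df.randomSplit([0.75, 0.25], seed=42)

    model = TrainClassifier(model=LogisticRegression(), labelCol="label").fit(train)
    metrics = ComputeModelStatistics().transform(model.transform(test))
    metrics.show()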
5. Believe we can all do AI ☺
Custom Vision Video
CustomVision.ai
Discovering Your AI Super Powers - Tips And Tricks To Jumpstart Your AI Project
Wee Hyong Tok, PhD
Principal Data Science Manager
Microsoft
@weehyong
Global Artificial Intelligence Conference 2018, Seattle
Thank You!