Discovering Your AI Super Powers
Tips And Tricks To Jumpstart Your AI Project
Wee Hyong Tok, PhD
Principal Data Science Manager
Microsoft
@weehyong
Global Artificial Intelligence Conference 2018, Seattle
How long does it take to train a deep learning model? (ResNet-50)
• Before 2017: 1 NVIDIA M40 GPU, about 14 days (~10^18 single-precision operations)
• April 2017 (Facebook): 32 CPUs, 256 NVIDIA P100 GPUs, 1 hour
• Sept 2017 (UC Berkeley, TACC, UC Davis): 1,600 CPUs, 31 minutes
• Nov 2017 (Preferred Networks, ChainerMN): 1,024 P100 GPUs, 15 minutes
5 Super Powers
1. Understand the problem!
Different Types of Deep Learning Problems
Example: image similarity, deciding whether a candidate image is similar to a query image (Yes/No).
Deep Learning Tasks
Example text for sentiment analysis: "I had a wonderful experience! The rooms were wonderful and the staff was helpful."
2. Build your AI Toolbox!
AI: concepts + techniques + infrastructure + quality of the models
Common Deep Learning Models
CNN
Convolutional
Neural Network
RNN
Recurrent Neural
Network
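As a concrete example of the RNN family, here is a minimal sketch of an LSTM-based sentiment classifier in Keras (the library choice, token ids, and labels below are illustrative placeholders, not from the talk):

    # Minimal RNN (LSTM) sentiment classifier sketch; data is synthetic.
    import numpy as np
    from tensorflow.keras import layers, models

    vocab_size = 10000
    x_train = np.random.randint(1, vocab_size, size=(32, 100))   # placeholder token ids for 32 reviews
    y_train = np.random.randint(0, 2, size=(32,))                # placeholder sentiment labels

    model = models.Sequential([
        layers.Embedding(vocab_size, 64),        # map token ids to dense vectors
        layers.LSTM(64),                         # recurrent layer reads the whole sequence
        layers.Dense(1, activation="sigmoid"),   # positive vs. negative
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=1, verbose=0)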
Deep Learning Libraries, and more…
Toolbox of a Data Scientist
Deep Learning Virtual Machine (DLVM)
• Requirements of agile data science
• Elasticity.
• Efficiency.
• Cost-effectiveness.
• Features of DLVM
• Languages.
• Data platforms.
• ML and AI tools.
• Data Exploration and Visualization.
• Data Ingestion tools.
• Development tools.
3. Jumpstart with Transfer Learning
Credits: Olah, et al., "Feature Visualization", Distill, 2017
https://distill.pub/2017/feature-visualization/
Types of Transfer Learning
• Standard DNN: featurization layers initialized randomly; output layer initialized randomly; no transfer learning used; train featurization and output jointly.
• Headless DNN: featurization layers learned using another task; output layer replaced by a separate ML algorithm; transfer learning reuses the features learned on a related task; use the features to train a separate classifier.
• Fine-Tune DNN: featurization layers learned using another task; output layer initialized randomly; transfer learning uses and fine-tunes features learned on a related task; train featurization and output jointly with a small learning rate.
• Multi-Task DNN: featurization layers initialized randomly; output layer initialized randomly; learned features need to solve many related tasks; share a featurization network across tasks and train all networks jointly with a loss function that sums the individual task losses.
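The Fine-Tune DNN row corresponds to the common recipe of loading a pretrained backbone, adding a randomly initialized output layer, and training everything with a small learning rate. A minimal Keras sketch with an ImageNet-pretrained ResNet-50 (the binary task, data, and learning rate are illustrative assumptions):

    # Fine-tune transfer learning sketch: pretrained featurization layers + new random output layer.
    import numpy as np
    from tensorflow.keras import layers, models, optimizers
    from tensorflow.keras.applications import ResNet50

    base = ResNet50(weights="imagenet", include_top=False, pooling="avg")  # featurization layers learned on ImageNet
    base.trainable = True                        # fine-tune them (set to False for a frozen, "headless" setup)

    model = models.Sequential([
        base,
        layers.Dense(1, activation="sigmoid"),   # new output layer, randomly initialized
    ])
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-5),  # small learning rate, as in the table
                  loss="binary_crossentropy", metrics=["accuracy"])

    x = np.random.rand(8, 224, 224, 3)           # placeholder images
    y = np.random.randint(0, 2, size=(8,))       # placeholder labels
    model.fit(x, y, epochs=1, verbose=0)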
Deep Neural Network for Computer Vision
An image passes through convolutional layers and then fully connected layers to produce predictions such as cat? YES, dog? NO, car? NO. Along the way the network builds up a feature hierarchy:
• Low-level features (lines, edges, color fields, etc.)
• High-level features (corners, contours, simple shapes)
• Object parts (wheels, faces, windows, etc.)
• Complex objects & scenes (people, animals, cars, beach scene, etc.)
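A minimal Keras sketch of this structure, convolutional layers followed by fully connected layers; the layer sizes and the three output classes are illustrative assumptions, not from the talk:

    # Minimal CNN sketch: convolutional feature extraction, then fully connected classification.
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(224, 224, 3)),                                # input image
        layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),   # low-level features
        layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),   # higher-level features / shapes
        layers.Conv2D(128, 3, activation="relu"), layers.MaxPooling2D(),  # object parts
        layers.Flatten(),
        layers.Dense(256, activation="relu"),                             # fully connected layers
        layers.Dense(3, activation="softmax"),                            # cat / dog / car scores
    ])
    model.summary()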
ILSVRC (ImageNet Large Scale Visual Recognition Challenge): AlexNet, VGG, GoogleNet, ResNet and more
• AlexNet, 8 layers (ILSVRC 2012): five convolutional layers (11x11 conv 96 /4 pool/2; 5x5 conv 256 pool/2; 3x3 conv 384; 3x3 conv 384; 3x3 conv 256 pool/2) followed by three fully connected layers (fc 4096, fc 4096, fc 1000).
• VGG, 19 layers (ILSVRC 2014): sixteen 3x3 convolutional layers in blocks of 64, 128, 256, and 512 filters with pooling between blocks, followed by fc 4096, fc 4096, fc 1000.
• GoogleNet, 22 layers (ILSVRC 2014): stacked inception modules (parallel 1x1, 3x3, and 5x5 convolutions plus 3x3 max pooling, depth-concatenated), with auxiliary softmax classifiers and a final average pool and fully connected softmax output.
ResNet, 152 layers (ResNet-152): a 7x7 conv 64 /2 stem with pooling, followed by stacked residual bottleneck blocks of 1x1, 3x3, and 1x1 convolutions (64/64/256, then 128/128/512, and so on) connected by shortcut connections.
ImageNet dataset
• Research dataset with >10 million images
• Images annotated with labels from an ontology (>22K labels)
• Generic images covering an extremely wide range of labels
Pre-Built CNN from General Task on Millions of Images
Strip the output layer of an ImageNet-trained CNN and feed its learned feature hierarchy (low-level features such as lines, edges, and color fields; high-level features such as corners, contours, and simple shapes; object parts such as wheels, faces, and windows; complex objects & scenes such as people, animals, cars, and beach scenes) into a separate classifier, e.g. an SVM, for a new question such as "dotted?".
Outputs of the penultimate layer of an ImageNet-trained CNN provide excellent general-purpose image features.
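A minimal sketch of this featurization approach, pairing a stripped ImageNet-pretrained CNN with a scikit-learn SVM; the images, labels, and model choices below are placeholders:

    # Headless featurization sketch: penultimate-layer CNN outputs as features for a separate SVM.
    import numpy as np
    from tensorflow.keras.applications import ResNet50
    from sklearn.svm import SVC

    featurizer = ResNet50(weights="imagenet", include_top=False, pooling="avg")  # output layer stripped

    images = np.random.rand(20, 224, 224, 3)          # placeholder images
    labels = np.random.randint(0, 2, size=(20,))      # placeholder "dotted?" labels

    features = featurizer.predict(images)             # penultimate-layer features, one 2048-d vector per image
    clf = SVC(kernel="linear").fit(features, labels)  # separate classifier trained on those features
    print(clf.predict(features[:3]))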
Pre-Built CNN from General Task on Millions of Images
With the output layer stripped, train one or more layers in the new network for the new question (e.g. "dotted?"). Using a pre-trained DNN, an accurate model can be achieved with thousands (or fewer) of labeled examples instead of millions.
Pre-trained Models
http://bit.ly/2jf97NE  http://bit.ly/2zNiN8B  http://bit.ly/1KlVMf0
"Transferring knowledge" is also possible in other domains, such as the use of word embeddings for text.
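For text, the analogous step is to seed an embedding layer with vectors pre-trained on a large corpus. A minimal Keras sketch, where the vocabulary and "pre-trained" vectors are stand-ins for real word2vec/GloVe embeddings:

    # Transfer learning for text: initialize an Embedding layer with pre-trained word vectors.
    import numpy as np
    from tensorflow.keras import layers, models
    from tensorflow.keras.initializers import Constant

    vocab = ["the", "room", "was", "wonderful"]                  # placeholder vocabulary
    dim = 50
    pretrained = {w: np.random.rand(dim) for w in vocab}         # stand-in for word2vec/GloVe vectors

    matrix = np.zeros((len(vocab) + 1, dim))                     # row 0 reserved for padding
    for i, word in enumerate(vocab, start=1):
        matrix[i] = pretrained[word]

    model = models.Sequential([
        layers.Embedding(len(vocab) + 1, dim,
                         embeddings_initializer=Constant(matrix),  # transferred knowledge
                         trainable=False),                         # keep the embeddings frozen
        layers.LSTM(32),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")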
4. Select the Right Infrastructure for AI
AI @ Scale
Read Data → Clean and Pre-process → Train Models → Inferencing @ Scale
Distributed Training Architecture
Data Parallelism
1. Parallel training on different machines.
2. Update the parameter server synchronously/asynchronously.
3. Refresh the local model with the new parameters, go to 1 and repeat.
Model Parallelism
1. The global model is partitioned into K sub-models without overlap.
2. The sub-models are distributed over K local workers and serve as their local models.
3. In each mini-batch, the local workers compute the gradients of the local weights by back propagation.
Credits: Taifeng Wang, DMTK team
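A minimal data-parallel sketch using PyTorch's DistributedDataParallel. Note that DDP synchronizes gradients with all-reduce rather than through a parameter server, so it is a stand-in for the pattern above, not the DMTK design itself; launch details (e.g. torchrun setting the rank and world size for each worker process) are assumed:

    # Data-parallel training sketch: each worker trains on its own mini-batches,
    # and the backward pass averages gradients so all workers end the step with the same model.
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        dist.init_process_group(backend="gloo")            # one process per worker
        model = torch.nn.Linear(10, 1)                     # placeholder local model
        ddp_model = DDP(model)                             # wraps the model for gradient synchronization
        opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

        for _ in range(3):                                 # each worker reads different data
            x, y = torch.randn(16, 10), torch.randn(16, 1)
            loss = torch.nn.functional.mse_loss(ddp_model(x), y)
            opt.zero_grad()
            loss.backward()                                # triggers all-reduce of gradients
            opt.step()                                     # every worker refreshes to the same parameters

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()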
Selecting Big Data and AI Infrastructure: Desktop / Laptop, Spark Clusters, Kubernetes Clusters, Cloud
Example: AI @ Scale on Kubernetes Clusters (deployment to Azure)
• VMs: k8s-master-27473156-0, k8s-agentpool1-27473156-1, k8s-agentpool2-27473156-0
• VM sizes: NC6 (GPU enabled) and DS_v2 (non-GPU)
• Virtual network: k8s-vnet-27473156
What can we run on the deep learning infrastructure?
Using Spark for Machine Learning
• Users need to write lots of "glue" code to prepare features for ML algorithms:
• Coerce types and data layout to what's expected by the learner
• Use different conventions for different learners
• Lack of domain-specific libraries for computer vision or text analytics…
• Latest: Image Data Support in Apache Spark 2.3 (see the sketch after this list)
https://blogs.technet.microsoft.com/machinelearning/2018/03/05/image-data-support-in-apache-spark/
• Limited capabilities for model evaluation & model management
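A minimal sketch of the Spark 2.3 image reader mentioned in the list above; the directory path is a placeholder:

    # Reading images into a Spark DataFrame with the image schema added in Apache Spark 2.3.
    from pyspark.sql import SparkSession
    from pyspark.ml.image import ImageSchema

    spark = SparkSession.builder.appName("image-demo").getOrCreate()

    # "/data/images" is a placeholder directory of image files.
    images_df = ImageSchema.readImages("/data/images", recursive=True)
    images_df.printSchema()   # struct with origin, height, width, nChannels, mode, data
    images_df.select("image.origin", "image.height", "image.width").show(5, truncate=False)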
Microsoft Machine Learning Library for Apache Spark (MMLSpark)
GitHub Repo: https://github.com/Azure/mmlspark
Get started now using Docker image:
docker run -it -p 8888:8888 -e ACCEPT_EULA=yes microsoft/mmlspark
Navigate to http://localhost:8888 to view example Jupyter notebooks
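Inside those notebooks, training with MMLSpark typically follows the pattern below. This is a sketch based on the examples in the repo's README; the DataFrame, column names, and exact API details are assumptions and may vary across MMLSpark versions:

    # MMLSpark sketch: train a classifier on a Spark DataFrame with minimal glue code.
    # Hypothetical data; API follows the MMLSpark README examples and may differ by version.
    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from mmlspark import TrainClassifier, ComputeModelStatistics

    spark = SparkSession.builder.appName("mmlspark-demo").getOrCreate()
    df = spark.createDataFrame(
        [(1.0, 2.0, 0), (2.0, 1.0, 1), (3.0, 0.5, 1), (0.5, 3.0, 0)],
        ["feature1", "feature2", "label"])
    train, test = df.randomSplit([0.75, 0.25], seed=42)

    model = TrainClassifier(model=LogisticRegression(), labelCol="label").fit(train)
    metrics = ComputeModelStatistics().transform(model.transform(test))
    metrics.show()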
5. Believe we can all do AI ☺
Custom Vision Video
CustomVision.ai
Discovering Your AI Super Powers - Tips And Tricks To Jumpstart Your AI Project
Wee Hyong Tok, PhD
Principal Data Science Manager
Microsoft
@weehyong
Global Artificial Intelligence Conference 2018, Seattle
Thank You!