O'Reilly Artificial Intelligence Conference San Francisco 2018
How to use transfer learning to bootstrap image classification and question answering (QA)
Danielle Dean PhD, Wee Hyong Tok PhD
Principal Data Scientist Lead
Microsoft
@danielleodean | @weehyong
Inspired by “Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud Defense”, Mark Russinovich, RSA Conference 2018
Textbook ML development
1. Choosing the Learning Task
2. Defining Data Input
3. Applying Data Transforms
4. Choosing the Learner
5. Choosing Output
6. Choosing Run Options
7. View Results
8. Debug and Visualize Errors
9. Analyze Model Predictions
Fact | Industry-grade ML solutions are highly exploratory
[Figure: the same nine-step pipeline repeated once per attempt: Attempt 1, Attempt 2, Attempt 3, Attempt 4, through Attempt n]
Traditional versus Transfer learning
• Traditional Machine Learning: different tasks, each with its own learning system
• Transfer Learning: source tasks feed into the learning system for the target task
Source: "A survey on transfer learning." , Pan, Sinno Jialin, and Qiang Yang. IEEE Transactions on knowledge and data engineering
Why are we talking about transfer learning?
[Chart: Drivers of ML success in industry. Commercial success plotted against time, with 2016 marked, for supervised learning, transfer learning, unsupervised learning, and reinforcement learning]
Source: “Transfer Learning - Machine Learning's Next Frontier” , Ruder, Sebastian,
Transfer Learning in Computer Vision
Can we leverage knowledge of processing images to help with new tasks?
• What’s in the picture?
• Where is the bike located?
• Can you find a similar bike?
• How many bikes are there?
Before Deep Learning
• Researchers took a traditional machine learning approach
• Manual creation of a variety of different visual feature extractors
• Followed by traditional ML classifiers
• Features not very generalizable to other vision tasks – not easy to transfer
• Example: HoG Detectors
  - Histogram of oriented gradients (HoG) features
  - Sliding window detector
  - SVM Classifier
  - Very fast OpenCV implementation (<100ms)
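To make the hand-crafted feature idea concrete, here is a minimal sketch of the core HoG step: binning gradient orientations for a single cell, weighted by gradient magnitude. It is a simplification and not the OpenCV implementation: it uses hard binning, skips block normalization, and the function name `hog_cell_histogram` and the 9-bin default are assumptions chosen for illustration.

```python
import math

def hog_cell_histogram(cell, num_bins=9):
    """Histogram of unsigned gradient orientations (0-180 degrees) for one grayscale cell."""
    bin_width = 180.0 / num_bins
    histogram = [0.0] * num_bins
    rows, cols = len(cell), len(cell[0])
    # Central differences on interior pixels only (border pixels are skipped).
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Hard-assign the whole magnitude to one orientation bin.
            histogram[int(angle // bin_width) % num_bins] += magnitude
    return histogram

# Hypothetical 4x4 cell containing a vertical edge (dark left, bright right).
vertical_edge = [[0, 0, 10, 10]] * 4
histogram = hog_cell_histogram(vertical_edge)
```

For this vertical edge every gradient points horizontally, so all of the magnitude lands in the first orientation bin. A detector would concatenate such histograms over a sliding window and pass them to a classifier such as an SVM.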
Deep Neural Networks
ImageNet: 14,197,122 images, 21,841 synsets
Diverse images, lots of labels!
Transfer Learning for Computer Vision
1. Train a deep learning model for computer vision using data from ImageNet
2. Apply the model to other domains (Retail, Manufacturing)
Example – Visualizing the different layers
Source: Olah, et al., "Feature Visualization", Distill, 2017
https://distill.pub/2017/feature-visualization/
Other fun sites:
https://deepart.io/nips/submissions/random/
http://cs231n.stanford.edu/
Clothing texture dataset:
• 1716 images from Bing which were manually annotated
• Classes: Striped, Argyle, Dotted
Transfer Learning - How to get started?

Standard DNN
• How to initialize featurization layers: Random
• Output layer initialization: Random
• How is transfer learning used? None
• How to train? Train featurization and output jointly

Headless DNN
• How to initialize featurization layers: Learn using another task
• Output layer initialization: Separate ML algorithm
• How is transfer learning used? Use the features learned on a related task
• How to train? Use the features to train a separate classifier

Fine Tune DNN
• How to initialize featurization layers: Learn using another task
• Output layer initialization: Random
• How is transfer learning used? Use and fine-tune features learned on a related task
• How to train? Train featurization and output jointly with a small learning rate

Multi-Task DNN
• How to initialize featurization layers: Random
• Output layer initialization: Random
• How is transfer learning used? Learned features need to solve many related tasks
• How to train? Share a featurization network across both tasks. Train all networks jointly with a loss function (sum of the individual task loss functions)
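The difference between the headless and fine-tune strategies can be sketched with a toy two-weight model. Everything here is an assumption for illustration: the model `y_hat = w_out * (w_feat * x)`, the function name `train_step`, the starting weights, and the learning rates. The headless case also trains its output weight by gradient descent here, whereas the slide suggests a separate ML algorithm such as an SVM.

```python
def train_step(w_feat, w_out, x, y, lr, train_features):
    """One gradient step on squared error for the toy model y_hat = w_out * (w_feat * x)."""
    feature = w_feat * x
    error = w_out * feature - y
    grad_out = 2 * error * feature
    grad_feat = 2 * error * w_out * x
    new_w_out = w_out - lr * grad_out
    # Frozen featurization keeps its pre-trained weight untouched.
    new_w_feat = w_feat - lr * grad_feat if train_features else w_feat
    return new_w_feat, new_w_out

pretrained_w_feat = 2.0  # hypothetical weight "learned using another task"

# Headless DNN: features stay frozen, only the new output weight is trained.
headless = train_step(pretrained_w_feat, 0.5, x=1.0, y=3.0, lr=0.1, train_features=False)

# Fine Tune DNN: featurization and output are trained jointly with a small learning rate.
fine_tuned = train_step(pretrained_w_feat, 0.5, x=1.0, y=3.0, lr=0.01, train_features=True)
```

In the headless run the featurization weight is returned unchanged, while in the fine-tune run it moves slightly away from its pre-trained value. The small learning rate is what keeps that movement gentle.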
Pre-Built CNN from General Task on Millions of Images
Layers, from input to output:
• Low-Level Features (lines, edges, color fields, etc.)
• High-Level Features (corners, contours, simple shapes)
• Object Parts (wheels, faces, windows, etc.)
• Complex Objects & Scenes (people, animals, cars, beach scene, etc.)
• Output Layer (cat? YES, dog? NO, car? NO): stripped
The stripped output layer is replaced by a separate classifier (e.g. SVM) that answers the new question: dotted?
Outputs of the penultimate layer of an ImageNet-trained CNN provide excellent general-purpose image features
Pre-Built CNN from General Task on Millions of Images
The original output layer (cat? YES, dog? NO, car? NO) is stripped, and one or more layers are trained in the new network to answer the new question: dotted?
Using a pre-trained DNN, an accurate model can be achieved with thousands (or fewer) of labeled examples instead of millions
Transfer Learning Results - Texture Dataset
• DNN featurization, input image size 224x224 pixels: area under curve 0.59, classification accuracy 69.0%
• Fine-tuning (full CNN), input image size 224x224 pixels: area under curve 0.76, classification accuracy 77.4%
• Fine-tuning (full CNN), input image size 896x886 pixels: area under curve 0.83, classification accuracy 88.2%
Transfer Learning for Similarity
Full code:
https://github.com/miguelgfierro/sciblog_support/blob/master/A_Gentle_Introduction_to_Transfer_Learning/Intro_Transfer_Learning.ipynb
• Hymenoptera, 2 classes and 397 images.
• Simpsons, 20 classes (subset of total) and 19548 images.
• Dogs vs Cats, 2 classes and 25000 images.
• Caltech 256, 257 classes and 30607 images.
[Chart: validation accuracy with fine-tuning and with freezing (scale 0.00 to 1.00) and number of images (scale 0 to 35000) for each dataset: Hymenoptera, Simpsons, Dogs vs Cats, and Caltech256, each in original and gray versions]
Example Applications in Computer Vision
• Aerial Use Classification
• ESmart - Connected Drone
• Jabil - Defect Inspection
• Lung Cancer Detection
• Distributed deep domain adaptation for automated poacher detection
https://github.com/MattKleinsmith/void-detector
Label Noise
Traditional Method: Manual Verification
Applying Transfer Learning
Read more details: https://www.microsoft.com/en-us/research/blog/using-transfer-learning-to-address-label-noise-for-large-scale-image-classification/
Computer Vision is not a “solved problem”
The knowledge being “transferred” can be very useful, but it is not the same as how humans learn to see
Recap: Transfer Learning for Image Classification
1. Define the Learning Task
2. Identify a pre-trained model
3. Decide whether to further fine-tune or use it as a headless DNN
4. Freeze top layers, re-train the classifier
5. Validate the model
6. Deploy the model
Deep Learning on Different Types of Data
• Audio (spectrograms): rich, high-dimensional datasets
• Images: rich, high-dimensional datasets
• Text: sparse data (depends on the encoding), e.g. "I see a big cat"
How do we apply Transfer Learning to NLP?
Different Types of NLP Tasks
And many more...
Transfer Learning for Text
1. Define the Learning Task
2. Identify a pre-trained model
3. Decide whether to further fine-tune
4. Freeze top layers, re-train the classifier
5. Validate the model
6. Deploy the model
Questions to ask along the way:
• What kind of pre-trained model?
• What does the top layer encode?
Word Embeddings
Relationships captured by the embedding space: Male - Female, Verb Tense, Country - Capital
Source: Tensorflow Tutorial - https://www.tensorflow.org/tutorials/representation/word2vec
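The Male - Female relationship is usually illustrated with vector arithmetic on word embeddings. The sketch below uses hand-made three-dimensional toy vectors, not real word2vec or GloVe values, and the helper names `cosine` and `analogy` are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors invented for this example; real embeddings have 100+ dimensions.
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.5, 0.5, 0.5],
}

def analogy(a, b, c, vectors):
    """Return the word closest to vector(b) - vector(a) + vector(c), excluding the inputs."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(vectors[word], target))

# "man is to king as woman is to ?"
result = analogy("man", "king", "woman", toy_vectors)
```

With these toy vectors the nearest remaining word is "queen". Pre-trained embeddings show the same kind of regularity only approximately, which is why the nearest-neighbor search is needed.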
Word Embeddings
[Timeline: 2013, 2014-2015, 2017]
Using Pre-trained Embeddings
Text Classification using 20 Newsgroup dataset
Source: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html

Compute an index mapping words to known embeddings:

    import os
    import numpy as np

    # Each line of the GloVe file is a word followed by its vector components.
    embeddings_index = {}
    with open(os.path.join(GLOVE_DIR, 'glove.6B.100d.txt'), encoding='utf-8') as f:
        for line in f:
            values = line.split()
            word = values[0]
            coefs = np.asarray(values[1:], dtype='float32')
            embeddings_index[word] = coefs

Compute the embedding matrix:

    # Row i holds the pre-trained vector for the word whose index is i.
    embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
    for word, i in word_index.items():
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            # words not found in embedding index will be all-zeros.
            embedding_matrix[i] = embedding_vector
Load the embedding matrix into an Embedding layer. Setting trainable=False prevents the weights from being updated during training:

    from keras.layers import Embedding

    embedding_layer = Embedding(len(word_index) + 1,
                                EMBEDDING_DIM,
                                weights=[embedding_matrix],
                                input_length=MAX_SEQUENCE_LENGTH,
                                trainable=False)
Build a small 1D convnet to solve the classification problem:

    from keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense
    from keras.models import Model

    sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
    embedded_sequences = embedding_layer(sequence_input)
    x = Conv1D(128, 5, activation='relu')(embedded_sequences)
    x = MaxPooling1D(5)(x)
    x = Conv1D(128, 5, activation='relu')(x)
    x = MaxPooling1D(5)(x)
    x = Conv1D(128, 5, activation='relu')(x)
    x = MaxPooling1D(35)(x)  # global max pooling
    x = Flatten()(x)
    x = Dense(128, activation='relu')(x)
    preds = Dense(len(labels_index), activation='softmax')(x)

    model = Model(sequence_input, preds)
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['acc'])

    model.fit(x_train, y_train, validation_data=(x_val, y_val),
              epochs=2, batch_size=128)
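The pool size of 35 in the last pooling layer only acts as global max pooling if exactly 35 time steps reach it. The sketch below traces the sequence length through the network, assuming MAX_SEQUENCE_LENGTH is 1000 (the slide does not state the value) and assuming the default 'valid' padding with stride 1 for the convolutions and a stride equal to the pool size for pooling. The helper names are hypothetical.

```python
def conv1d_output_length(length, kernel_size):
    """Output length of a 1D convolution with 'valid' padding and stride 1."""
    return length - kernel_size + 1

def max_pool1d_output_length(length, pool_size):
    """Output length of 1D max pooling with 'valid' padding and stride equal to pool_size."""
    return length // pool_size

MAX_SEQUENCE_LENGTH = 1000  # assumed value, not given on the slide

length = MAX_SEQUENCE_LENGTH
trace = [length]
# (kernel size, pool size) for each Conv1D + MaxPooling1D pair in the model.
for kernel_size, pool_size in [(5, 5), (5, 5), (5, 35)]:
    length = conv1d_output_length(length, kernel_size)
    trace.append(length)
    length = max_pool1d_output_length(length, pool_size)
    trace.append(length)
```

Under these assumptions the lengths go 1000, 996, 199, 195, 39, 35, 1, so the final pooling collapses the sequence to a single step. A different MAX_SEQUENCE_LENGTH would need a different final pool size to get the same effect.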
From initializing the first layers to pre-training the entire model (and learning higher level semantic concepts)
Transfer Learning for NLP - ULMFiT
Source: Universal Language Model Fine-tuning for Text Classification, Jeremy Howard, Sebastian Ruder, ACL 2018
1. Train a Language Model using Large General Domain Corpus
2. Fine-tune the Language Model
3. Fine-tune Classifier
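One detail of the fine-tuning stages, described in the ULMFiT paper rather than on this slide, is discriminative fine-tuning: each layer gets its own learning rate, with lower layers using the rate of the layer above divided by 2.6. The sketch below only computes those rates; the function name, the three-layer setup, and the top-layer rate of 0.01 are assumptions for illustration.

```python
def discriminative_learning_rates(top_layer_lr, num_layers, decay=2.6):
    """Return one learning rate per layer, ordered from the lowest layer to the top layer."""
    rates = [top_layer_lr / (decay ** depth_from_top) for depth_from_top in range(num_layers)]
    # rates[0] belongs to the top layer, so reverse to list the lowest layer first.
    return list(reversed(rates))

# Hypothetical 3-layer language model with a top-layer learning rate of 0.01.
rates = discriminative_learning_rates(top_layer_lr=0.01, num_layers=3)
```

The lowest layers, which hold the most general features, therefore change the least during fine-tuning.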
Transfer Learning for NLP - ELMo
Source: Deep contextualized word representations, Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer, NAACL 2018
1. Train biLMs on a corpus
2. Enhance the usual inputs with ELMo representations, one per input token (e.g. "have a nice")
ELMo Pre-trained Models
Source: https://allennlp.org/elmo
Using ELMo with TensorFlow Hub
Source: https://www.tensorflow.org/hub/modules/google/elmo/2

Untokenized sentences as input:

    import tensorflow_hub as hub  # hub.Module is the TensorFlow 1.x Hub API

    elmo = hub.Module("https://tfhub.dev/google/elmo/2",
                      trainable=True)
    embeddings = elmo(
        ["the cat is on the mat", "dogs are in the fog"],
        signature="default",
        as_dict=True)["elmo"]

Tokens as input:

    elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)
    # Shorter sentences are padded with "" and their true lengths passed separately.
    tokens_input = [["the", "cat", "is", "on", "the", "mat"],
                    ["dogs", "are", "in", "the", "fog", ""]]
    tokens_length = [6, 5]
    embeddings = elmo(
        inputs={
            "tokens": tokens_input,
            "sequence_len": tokens_length
        },
        signature="tokens",
        as_dict=True)["elmo"]

The output dictionary contains:
• Character-based word representation
• First LSTM Hidden State
• Second LSTM Hidden State
• elmo (weighted sum of 3 layers)
• Fixed mean-pooling of contextualized word representation
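The "weighted sum of 3 layers" can be sketched for a single token as a softmax-normalized mix of the three layer vectors, scaled by a scalar gamma, which is how the ELMo paper describes the combination. The two-dimensional layer vectors and the zero logits below are made up for illustration, and `elmo_vector` is a hypothetical helper, not part of the TensorFlow Hub module.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def elmo_vector(layer_vectors, layer_logits, gamma=1.0):
    """Mix per-layer vectors for one token: gamma * sum_j softmax(logits)_j * layer_j."""
    weights = softmax(layer_logits)
    dim = len(layer_vectors[0])
    return [
        gamma * sum(w * layer[d] for w, layer in zip(weights, layer_vectors))
        for d in range(dim)
    ]

# Made-up vectors standing in for the character-based layer and the two LSTM layers.
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = elmo_vector(layers, layer_logits=[0.0, 0.0, 0.0])
```

Equal logits give each layer a weight of one third. In a downstream task the logits and gamma are trained, so the task can decide how much to draw from each layer.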
Transfer Learning for MRC tasks
Source: Transfer Learning for Machine Reading Comprehension - https://bit.ly/2Cmiffy
Transfer Learning for MRC
1. Train an MRC model using data from Wikipedia
2. Apply the model to other domains (News Articles, Customer Support Data)
SQuAD
Stanford Question Answering Dataset (SQuAD)
• Reading comprehension dataset
• Based on Wikipedia articles
• Crowdsourced questions
• The answer is a text segment, or span, from the corresponding reading passage, or no answer is found
• Question-answer pairs
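Because every answer is a span of the passage, a question-answer pair can be stored as the answer text plus its character offset. The sketch below mirrors the field names used in the public SQuAD JSON files (`context`, `question`, `answers`, `text`, `answer_start`, `is_impossible`), but the passage and questions are invented for illustration and `extract_answer` is a hypothetical helper.

```python
def extract_answer(context, answer):
    """Return the answer span if it really sits at answer_start in the context, else None."""
    start = answer["answer_start"]
    span = context[start:start + len(answer["text"])]
    return span if span == answer["text"] else None

context = "Transfer learning reuses knowledge from a source task on a target task."
answer_text = "knowledge from a source task"

answerable = {
    "question": "What does transfer learning reuse?",
    "is_impossible": False,
    # The offset is computed here so the example stays consistent with the passage.
    "answers": [{"text": answer_text, "answer_start": context.index(answer_text)}],
}
unanswerable = {
    "question": "Who invented transfer learning?",
    "is_impossible": True,  # no span in the passage answers this question
    "answers": [],
}

found = extract_answer(context, answerable["answers"][0])
```

An MRC model trained on such data predicts the start and end of the span, or signals that no answer is present.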
MRC Datasets
Transfer Learning for MRC using SynNet
1. Train using a large MRC Dataset (e.g. SQuAD)
2. Apply the pre-trained model to a new domain (e.g. NewsQA)
3. Validate the model
4. Deploy the model
Transfer Learning for MRC - Survey - https://bit.ly/2JAt1h0
More comparisons between different MRC Approaches
SynNet
Stage 1 - Answer Synthesis module: uses a bi-directional LSTM to predict IOB tags on the input paragraph, marking out the semantic concepts that are likely answers
Stage 2 - Question Synthesis module: uses a uni-directional LSTM to generate the questions
Source: ACL 2017, https://www.microsoft.com/en-us/research/publication/two-stage-synthesis-networks-transfer-learning-machine-comprehension/
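The IOB tags from Stage 1 mark each token as Beginning, Inside, or Outside a candidate answer. A minimal sketch of turning a tag sequence back into answer spans follows. The sentence and tags are invented, and `iob_to_spans` is a hypothetical helper, not code from SynNet.

```python
def iob_to_spans(tokens, tags):
    """Group tokens tagged B (begin) and I (inside) into candidate answer strings."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new span starts; close any span that was still open.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # "O", or a stray "I" with no open span, closes the current span.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = ["The", "cat", "sat", "on", "the", "red", "mat"]
tags = ["O", "B", "O", "O", "B", "I", "I"]
spans = iob_to_spans(tokens, tags)
```

Here the tagger output yields two candidate answers, "cat" and "the red mat". Stage 2 would then generate a question for each candidate.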
SynNet – Question/Answer Generation Example
Summary
1. Transfer Learning and Applications
2. How to use Transfer Learning for Image Classification
3. How to use Transfer Learning for NLP tasks
Thank You!
