Large Scale Deep Learning
Quoc V. Le	

Google & CMU
Deep Learning	


•  Google is using Machine Learning 	

•  Machine Learning is difficult	

•  Requires domain knowledge from human experts	

Deep Learning: 	


•  Great performance on many problems	

•  Works well with a large amount of data	

•  Requires less domain knowledge	

Focus: 	


•  Scale deep learning to bigger models and bigger problems	


What is Deep Learning?	


Diagram (bottom to top):

Input x (images, audio, text, etc.)

u = g(A x), with weights A

v = g(B u), with weights B

…
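
The two layer equations u = g(A x) and v = g(B u) can be sketched in a few lines of Python. This is only an illustration under assumptions not stated on the slide: g is taken to be the logistic sigmoid applied elementwise, and the layer sizes and random weights are arbitrary placeholders.

```python
import math
import random

def g(z):
    """Assumed nonlinearity: logistic sigmoid, applied elementwise."""
    return [1.0 / (1.0 + math.exp(-zi)) for zi in z]

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def forward(x, A, B):
    """Compute the two layers from the slide: u = g(A x), v = g(B u)."""
    u = g(matvec(A, x))
    v = g(matvec(B, u))
    return u, v

random.seed(0)
# Hypothetical sizes: 8 inputs, 5 units in the first layer, 3 in the second.
x = [random.gauss(0, 1) for _ in range(8)]
A = [[random.gauss(0, 1) for _ in range(8)] for _ in range(5)]
B = [[random.gauss(0, 1) for _ in range(5)] for _ in range(3)]

u, v = forward(x, A, B)
print(len(u), len(v))  # 5 3
```

Stacking more layers repeats the same pattern: each layer applies a weight matrix to the previous layer's output and then the nonlinearity.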


High-level features by Deep Learning	

Face detector, Cat detector	

…	


Edge detectors	

Pixels	


Google’s DistBelief	

Goal: Train deep learning models on many machines

Model: A multi-layer architecture

Forward pass to compute the features

Backward pass to compute the gradient

Diagram labels: Model, Training Data


Model partition with DistBelief 	

DistBelief distributes a model across multiple machines and cores.

Diagram labels: Model, Machine (Model Partition), Core, Training Data


Model partition with DistBelief 	

Model	


Stochastic Gradient Descent (SGD)	

	

Model parameters are partitioned	

	

Can use up to 1000 cores	
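
As a rough illustration of what "model parameters are partitioned" can mean, the sketch below splits the rows of one weight matrix into blocks, computes each block's share of the output separately, and checks that the pieces combine into the full result. The function names, the row-wise split, and the tiny matrix are assumptions made for this example; the slides do not describe how DistBelief actually lays out its partitions.

```python
def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def partition_rows(M, num_parts):
    """Split the rows of M into num_parts contiguous blocks (hypothetical scheme)."""
    size = (len(M) + num_parts - 1) // num_parts
    return [M[i:i + size] for i in range(0, len(M), size)]

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
x = [1.0, -1.0]

# Each block would live on its own machine; here they are computed one after another.
parts = partition_rows(A, 2)
partial = [matvec(block, x) for block in parts]

# Concatenating the partial outputs reproduces the unpartitioned computation.
combined = [y for chunk in partial for y in chunk]
assert combined == matvec(A, x)
print(combined)  # [-1.0, -1.0, -1.0, -1.0]
```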


Training Data	


Model partition with DistBelief 	

Model	

But training is still slow on large data sets	

Can we add more parallelism?	

	

Idea: Train multiple models on different partitions of the data, and merge them	

Training Data	


Data partition with DistBelief 	

Parameter Server: p’ = p + ∆p

Model Workers: send ∆p to the parameter server and receive p’

Data Shards: feed the model workers
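
The update rule p’ = p + ∆p can be illustrated with a small single-process simulation. Everything below is a sketch under assumptions: the class and function names are hypothetical, the model is a one-parameter linear fit, the workers take turns instead of running asynchronously, and ∆p is taken to be the negative gradient scaled by a learning rate.

```python
class ParameterServer:
    """Holds the shared parameters p and applies updates from workers."""

    def __init__(self, p):
        self.p = list(p)

    def fetch(self):
        # A worker receives the current parameters p'.
        return list(self.p)

    def push(self, delta):
        # Update rule from the slide: p' = p + delta_p.
        self.p = [pi + di for pi, di in zip(self.p, delta)]

def worker_step(server, shard, lr=0.01):
    """One SGD step on a worker's data shard for the model y = w * x."""
    w = server.fetch()[0]
    # Gradient of the mean of 0.5 * (w * x - y)**2 over the shard.
    grad = sum((w * x - y) * x for x, y in shard) / len(shard)
    server.push([-lr * grad])

# Toy data with true slope 3, split into two shards (one per worker).
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[0:4], data[4:8]]

server = ParameterServer([0.0])
for _ in range(200):
    for shard in shards:  # workers take turns; real workers would not wait for each other
        worker_step(server, shard)

print(round(server.p[0], 3))  # 3.0
```

In an asynchronous setting a worker may compute ∆p from parameters that are already out of date by the time the update arrives; this sequential sketch does not model that effect.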


Parallelism in DistBelief 	

Model parallelism via model partitioning	

	

Data parallelism via data partitioning and asynchronous communications	

	

DistBelief can scale to a billion examples and use 100,000 cores or more	

	

Thanks to its speed, DistBelief dramatically improves many applications	

	


Applications	


Voice Search	


Photo Search	

 Text Understanding	


Voice Search	

Diagram (bottom to top): Speech frame, hidden layers with 1000s of nodes, classifier, label

Photo Search
Cat detector	

Front page of New York Times	


Google+ PhotoSearch

Example image labels: Seat-belt, Archery, Boston rocker, Shredder, Face, Amusement park, Hammock

Text understanding	


Very useful but also difficult	

	

We should try to understand the meaning of words	

	

Deep Learning can learn the meaning of words	

	


Text understanding	

~100-D vector space	

Example words: Clinton, Paris, Obama, whale, dolphin
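
One common way to use such a vector space is to look up a word's nearest neighbor by cosine similarity. The sketch below uses 3-D vectors with made-up values chosen only so that related words point in similar directions; real word vectors would be learned from data and have around 100 dimensions, and the choice of cosine similarity is an assumption here.

```python
import math

# Hypothetical toy vectors; the numbers are invented for illustration.
vectors = {
    "Clinton": [0.9, 0.1, 0.0],
    "Obama":   [0.8, 0.2, 0.1],
    "Paris":   [0.1, 0.9, 0.1],
    "whale":   [0.0, 0.1, 0.9],
    "dolphin": [0.1, 0.0, 0.8],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word):
    """Return the other word whose vector is most similar to `word`."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))

print(nearest("whale"))    # dolphin
print(nearest("Clinton"))  # Obama
```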

Predicting the next word in a sentence	

Diagram (bottom to top): input words "the cat sat on the", word matrix E applied to each word, hidden layers, classifier

Word Matrix E is a matrix of dimension ||Vocab|| x d	
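
The pipeline in this diagram (look up each context word in E, pass the result through hidden layers, then classify over the vocabulary) can be sketched as a single forward pass. The details below are assumptions for illustration only: a five-word toy vocabulary, one tanh hidden layer, a softmax classifier, and random untrained weights named W and C, so the output probabilities are not meaningful predictions.

```python
import math
import random

vocab = ["the", "cat", "sat", "on", "mat"]  # hypothetical toy vocabulary
word_to_id = {w: i for i, w in enumerate(vocab)}
d, context, hidden = 4, 5, 6                # assumed sizes

random.seed(0)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

E = rand_matrix(len(vocab), d)        # word matrix: |Vocab| x d
W = rand_matrix(hidden, context * d)  # hidden-layer weights (assumed single layer)
C = rand_matrix(len(vocab), hidden)   # classifier weights

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict_next(words):
    """Return a probability for every vocabulary word being the next word."""
    # Look up each context word's row in E and concatenate the rows.
    x = [value for w in words for value in E[word_to_id[w]]]
    h = [math.tanh(a) for a in matvec(W, x)]
    return softmax(matvec(C, h))

probs = predict_next(["the", "cat", "sat", "on", "the"])
print(len(probs), round(sum(probs), 6))  # 5 1.0
```

Training would adjust E, W, and C together, which is how the rows of E come to serve as word vectors.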


Visualizing the word vectors	


Example nearest neighbors trained on Google News	

apple, Apple, iPhone

Relation Extraction	


Mikolov, Sutskever, Le. Learning the Meaning behind Words. Google OpenSource Blog, 2013	


Machine Translation	


Summary	

Model partition	

Data partition	


Voice Search	


Photo Search	

 Text Understanding	


Joint work with	


Kai Chen	


Greg Corrado	


Rajat Monga	

 Andrew Ng	


Jeff Dean	


Matthieu Devin	


Paul Tucker	


Ke Yang	


Additional Thanks: Samy Bengio, Tom Dean, Josh Levenberg, Geoff Hinton, Tomas Mikolov, Mark Mao, Patrick Nguyen, Marc’Aurelio Ranzato, Mark Segal, Jon Shlens, Ilya Sutskever, Vincent Vanhoucke

Quoc le, slides MLconf 11/15/13