Sequence Learning for Language Understanding
Presenter: Quoc V. Le
Google
Thanks: Andrew Dai, Jeff Dean, Matthieu Devin, Geoff Hinton, Thang Luong, Rajat Monga, Ilya Sutskever, Oriol Vinyals
Sequence Learning 
Typical successes of machine learning map a fixed-length input to a single value:
- Image recognition (pixels -> “cat”)
- Speech recognition (waveforms -> the utterance of “cat”)
Many language understanding problems instead require mapping sequences to sequences:
- Machine translation (“I love music” -> “J’aime la musique”)
Quoc V. Le
How does Machine Translation work?
Use a dictionary to translate one word at a time.
Use a model to reorder the words so that the sentence reads naturally.
Lots of rules:
- Phrases instead of words (“New York” should not be translated as “New” + “York”)
- The meaning of a word depends on its context
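The difference between translating word by word and translating phrases can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any production system works: WORD_TABLE and PHRASE_TABLE are hypothetical toy tables (real phrase-based systems learn scored phrase tables from parallel text), and the sketch does no reordering and ignores context.

```python
# Hypothetical toy tables, invented for illustration only.
WORD_TABLE = {"i": "je", "visit": "visite", "new": "nouveau", "york": "york"}
PHRASE_TABLE = {("new", "york"): "new york"}  # multi-word unit translated as a whole


def word_by_word(tokens):
    """Translate each token independently using the word dictionary."""
    return [WORD_TABLE.get(t, t) for t in tokens]


def phrase_based(tokens, max_len=3):
    """Greedy longest match: prefer the longest source phrase found in PHRASE_TABLE."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if n == 1:
                # No multi-word match starting here: fall back to the word dictionary.
                out.append(WORD_TABLE.get(tokens[i], tokens[i]))
                i += 1
                break
            span = tuple(tokens[i:i + n])
            if span in PHRASE_TABLE:
                out.append(PHRASE_TABLE[span])
                i += n
                break
    return out


print(word_by_word("i visit new york".split()))
print(phrase_based("i visit new york".split()))
```

Word-by-word lookup turns “new york” into “nouveau york”, while the longest-match lookup keeps it as one unit. Handling context-dependent meanings would need yet more rules, which is the burden the slide points to.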
Sequence Learning
Ideas:
- Use a recurrent neural net (RNN) encoder to map an input sequence to a vector
- Use an RNN decoder to map that vector to another sequence
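A minimal sketch of the encoder-decoder idea, under loud assumptions: it uses a plain tanh RNN cell chosen for brevity, the weights are random and untrained (so the decoded tokens are arbitrary), and the names Seq2Seq, encode and decode_greedy are hypothetical. What it does show is the data flow: a variable-length input becomes one fixed-size vector, and the decoder generates from that vector alone.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rnn_step(W_x, W_h, x, h):
    """One plain tanh RNN step: h_new = tanh(W_x @ x + W_h @ h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(W_x, x), matvec(W_h, h))]


def one_hot(i, n):
    v = [0.0] * n
    v[i] = 1.0
    return v


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class Seq2Seq:
    """Hypothetical toy encoder-decoder with random, untrained weights."""

    def __init__(self, vocab_size, hidden_size, eos_id, seed=0):
        rng = random.Random(seed)
        self.V, self.H, self.eos = vocab_size, hidden_size, eos_id
        self.enc_Wx = rand_matrix(hidden_size, vocab_size, rng)
        self.enc_Wh = rand_matrix(hidden_size, hidden_size, rng)
        self.dec_Wx = rand_matrix(hidden_size, vocab_size, rng)
        self.dec_Wh = rand_matrix(hidden_size, hidden_size, rng)
        self.W_out = rand_matrix(vocab_size, hidden_size, rng)

    def encode(self, src_ids):
        """Read the input one token at a time; the final hidden state is the vector."""
        h = [0.0] * self.H
        for t in src_ids:
            h = rnn_step(self.enc_Wx, self.enc_Wh, one_hot(t, self.V), h)
        return h

    def decode_greedy(self, h, max_len=10):
        """Start from the encoder's vector and feed each predicted token back in."""
        out, prev = [], self.eos  # <EOS> also serves as the first decoder input
        for _ in range(max_len):
            h = rnn_step(self.dec_Wx, self.dec_Wh, one_hot(prev, self.V), h)
            logits = matvec(self.W_out, h)
            prev = max(range(self.V), key=lambda k: logits[k])
            if prev == self.eos:
                break
            out.append(prev)
        return out


model = Seq2Seq(vocab_size=6, hidden_size=4, eos_id=0)
vector = model.encode([1, 2, 3])  # always hidden_size numbers, whatever the input length
print(len(vector), model.decode_greedy(vector, max_len=5))
```

Whatever the input length, encode returns hidden_size numbers; that fixed-size vector is everything the decoder sees about the input.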
Sequence Learning 
[Figure: Example network that maps ABC -> WXYZ. Inputs along the bottom: A B C <EOS> W X Y Z; outputs along the top: W X Y Z <EOS>]
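The input/output layout in the ABC -> WXYZ example can be made concrete. This is a minimal sketch, assuming the usual training setup in which the decoder is fed the correct previous target token (often called teacher forcing); make_training_example is a hypothetical helper, and None marks positions where nothing is predicted.

```python
EOS = "<EOS>"


def make_training_example(source, target):
    """Lay out one (source, target) pair as in the ABC -> WXYZ example.

    Inputs:  the source tokens, then <EOS>, then the target tokens.
    Targets: nothing is predicted while reading the source; from the first
             <EOS> onward the model must emit the target followed by <EOS>.
    """
    inputs = source + [EOS] + target
    targets = [None] * len(source) + target + [EOS]
    assert len(inputs) == len(targets)  # one prediction slot per input position
    return inputs, targets


inputs, targets = make_training_example(list("ABC"), list("WXYZ"))
print(inputs)
print(targets)
```

Each target token appears one position after it is fed in as an input, which is exactly the one-step shift drawn in the figure.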
At test time, feed each output token back into the decoder as its next input.
For a better output sequence, keep many candidates: feed each candidate to the decoder to extend it, giving a beam of possible sequences.
Use “beam search” to find the top sequences.
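The decoding procedure above can be sketched as a small beam search. Assumptions: next_log_probs is a hypothetical stand-in for one decoder step that returns log-probabilities for the next token given a prefix, and the toy probabilities in TOY_PROBS are invented purely to show a case where greedy decoding (beam_size=1) and a wider beam disagree.

```python
import math

EOS = "<EOS>"


def beam_search(next_log_probs, eos, beam_size=3, max_len=10):
    """Return (sequence, total log-prob) pairs, best first.

    At every step each kept prefix is extended by every possible next token;
    only the `beam_size` best-scoring extensions survive. A hypothesis that
    emits `eos` moves to the finished list and stops growing.
    """
    beams = [([], 0.0)]  # (prefix, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = [
            (prefix + [tok], score + lp)
            for prefix, score in beams
            for tok, lp in next_log_probs(prefix).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was reached
    return sorted(finished, key=lambda c: c[1], reverse=True)


# Hypothetical toy "decoder": next-token probabilities keyed by prefix.
TOY_PROBS = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {EOS: 0.4, "c": 0.3, "d": 0.3},
    ("b",): {EOS: 0.9, "c": 0.1},
}


def toy_next_log_probs(prefix):
    probs = TOY_PROBS.get(tuple(prefix), {EOS: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


greedy = beam_search(toy_next_log_probs, EOS, beam_size=1)[0]
beam = beam_search(toy_next_log_probs, EOS, beam_size=2)[0]
print(greedy[0], round(math.exp(greedy[1]), 2))  # ['a', '<EOS>'] 0.24
print(beam[0], round(math.exp(beam[1]), 2))      # ['b', '<EOS>'] 0.36
```

With beam_size=1 the search reduces to greedy decoding: it commits to “a” (probability 0.6) and ends with total probability 0.6 × 0.4 = 0.24. With beam_size=2 it also keeps “b”, whose continuation gives 0.4 × 0.9 = 0.36, so the wider beam finds the more probable sequence.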
A machine translation experiment
WMT’2014 English-to-French (small in comparison to Google’s data), BLEU scores:
- State of the art (a combination of many methods that took 20 years to develop): 37
- Our method (took 3 person-years): 37
This is an important achievement because it is a new way to represent input and output text, and a potential breakthrough in many other areas of language understanding.
Sequence Learning 
[Figure: the ABC -> WXYZ example network. Inputs: A B C <EOS> W X Y Z; outputs: W X Y Z <EOS>]
Quoc Le, Software Engineer, Google at MLconf SF
Contact:
- Quoc V. Le (qvl@google.com)
- Ilya Sutskever (ilyasu@google.com)
- Oriol Vinyals (vinyals@google.com)
- Minh-Thang Luong (lmthang@cs.stanford.edu)
Papers:
- Sequence to Sequence Learning with Neural Networks
- Addressing the Rare Word Problem in Neural Machine Translation
Upcoming NIPS paper

