Deep Learning for NLP
Yves Peirsman
About me
● 2010: PhD Computational Linguistics, KULeuven
● 2011: Post-doctoral researcher, Stanford University
● 2012: NLP engineer
● 2014: NLP Town
Deep Learning in NLP
1950s: Rule-based NLP
● Hand-written linguistic rules
● Knowledge models
1990: Statistical NLP
● Machine learning from data
● Many different models
2012: Deep Learning
● Comeback of neural networks
● Unified framework for many problems
20??: ??
Deep Learning in NLP
● Basic models: the basic NLPer's toolkit
● Main advantages: why has DL become so popular in NLP?
● Beyond the hype: deeper dive & recent trends
Word embeddings
Words as atomic units vs. words as dense embeddings
(figure: example sentences "The movie has an excellent cast." (M), "I like the cover of the book." (B) and "There were too many pages in the novel." (?); with atomic units the unseen word "novel" shares nothing with "movie" or "book", while dense embeddings place book, novel and movie at nearby points in vector space)
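The contrast above can be made concrete with a small sketch. The vectors below are made up for illustration (real embeddings come from models such as word2vec or GloVe and have hundreds of dimensions), but they show what atomic units cannot: "novel" measurably resembles "book" more than it resembles "cast".

```python
import numpy as np

# Toy 4-dimensional embeddings, invented for illustration only.
embeddings = {
    "movie": np.array([0.9, 0.1, 0.8, 0.0]),
    "book":  np.array([0.1, 0.9, 0.7, 0.1]),
    "novel": np.array([0.2, 0.8, 0.8, 0.1]),
    "cast":  np.array([0.8, 0.0, 0.1, 0.9]),
}

def cosine(u, v):
    """Cosine similarity: 1.0 for parallel vectors, 0.0 for orthogonal ones."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

book_sim = cosine(embeddings["novel"], embeddings["book"])
cast_sim = cosine(embeddings["novel"], embeddings["cast"])
```

With one-hot (atomic) representations, the cosine between any two distinct words is exactly zero, so no such gradation exists.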
Feature engineering
● Traditional NLP: major feature engineering on top of the input strings (POS, syntax, capitalization, prefixes, suffixes, bigrams, lemmas, ...)
● Deep learning: little feature engineering (if any); the model works on the input strings directly
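As a sketch of what "major feature engineering" means in practice, here is the kind of hand-written feature function a statistical tagger might use. The feature names and the particular feature set are made up for illustration, not taken from any specific system.

```python
def token_features(tokens, i):
    """Hand-crafted features for token i, of the kind a statistical
    model (e.g. a CRF tagger) would rely on. Illustrative, not exhaustive."""
    word = tokens[i]
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return {
        "word.lower": word.lower(),
        "prefix3": word[:3],                 # e.g. derivational prefixes
        "suffix3": word[-3:],                # e.g. inflectional endings
        "is_capitalized": word[0].isupper(), # a strong NER cue
        "prev_bigram": (prev, word.lower()), # local context
    }

feats = token_features(["John", "lives", "in", "London"], 0)
```

A deep learning model is instead handed the raw token sequence and left to discover such cues itself.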
Models
● Traditional NLP: distinct models for different problems, each with its own preprocessing: an SVM for text classification, a CRF for NER, a language model, translation model and decoder for MT, ...
● Deep learning: a unified toolkit: an LSTM (or similar) for text classification, NER, MT, ...
NLP Toolkit: LSTM for classification
Applications: text classification, language modelling
(figure: the tokens "The movie was boring ." are mapped to embeddings and fed through an LSTM; a dense layer (weights and biases) on top of the final state predicts positive, neutral or negative)
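The data flow of this slide can be sketched in a few lines of numpy. The weights are random and untrained, and a plain tanh recurrence stands in for the LSTM cell, so the prediction itself is meaningless; the point is the pipeline: embedding lookup, recurrent updates over the sentence, a dense layer on the final state, and a softmax over the three classes.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "movie": 1, "was": 2, "boring": 3, ".": 4}
EMB, HID, CLASSES = 8, 16, 3                # positive / neutral / negative

E  = rng.normal(size=(len(vocab), EMB))     # embedding table
Wx = rng.normal(size=(HID, EMB)) * 0.1      # input -> hidden
Wh = rng.normal(size=(HID, HID)) * 0.1      # hidden -> hidden
Wo = rng.normal(size=(CLASSES, HID)) * 0.1  # dense layer weights
bo = np.zeros(CLASSES)                      # dense layer biases

def classify(sentence):
    h = np.zeros(HID)
    for token in sentence:
        x = E[vocab[token]]                 # 1. look up the embedding
        h = np.tanh(Wx @ x + Wh @ h)        # 2. recurrent update (simplified)
    logits = Wo @ h + bo                    # 3. dense layer on the final state
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                  # 4. softmax over the classes

probs = classify(["the", "movie", "was", "boring", "."])
```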
NLP Toolkit: inside the LSTM
(figure: an LSTM cell with a forget gate, an input gate, a tanh candidate layer and an output gate, unrolled over the tokens "The movie was boring ...")
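The gates can be written out directly. Below is a standard LSTM step in numpy (the parameter shapes are chosen for the sketch): the forget gate decides what to erase from the cell state, the input gate and the tanh block decide what to write, and the output gate decides how much of the cell state to expose as the hidden state.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the parameters of all four blocks:
    forget gate, input gate, candidate tanh, output gate."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    f = sigmoid(z[0*H:1*H])        # forget gate: what to erase from the cell
    i = sigmoid(z[1*H:2*H])        # input gate: what to write
    g = np.tanh(z[2*H:3*H])        # candidate values (the "tanh" block)
    o = sigmoid(z[3*H:4*H])        # output gate: what to expose
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c

rng = np.random.default_rng(0)
E_DIM, H = 4, 5
W = rng.normal(size=(4 * H, E_DIM)) * 0.1
U = rng.normal(size=(4 * H, H)) * 0.1
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(3, E_DIM)):      # a 3-token "sentence"
    h, c = lstm_step(x, h, c, W, U, b)
```

Because h is an output gate (in (0, 1)) times a tanh (in (-1, 1)), every component of the hidden state stays strictly inside (-1, 1).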
NLP Toolkit: LSTM for sequence labelling
Applications: named entity recognition
(figure: the tokens "John lives in London ." pass through embeddings and an LSTM; a dense layer (weights and biases) produces logits at every step, yielding the tags B-PER O O B-LOC O)
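A sketch of how this differs from classification: the dense layer is now applied at every time step, so each token receives its own tag. As before, the weights are random and a tanh recurrence stands in for the LSTM, so the predicted tags are arbitrary; only the shape of the computation is the point.

```python
import numpy as np

TAGS = ["O", "B-PER", "B-LOC"]             # a minimal BIO tag set
rng = np.random.default_rng(0)
vocab = {"john": 0, "lives": 1, "in": 2, "london": 3, ".": 4}
EMB, HID = 8, 16

E  = rng.normal(size=(len(vocab), EMB))
Wx = rng.normal(size=(HID, EMB)) * 0.1
Wh = rng.normal(size=(HID, HID)) * 0.1
Wo = rng.normal(size=(len(TAGS), HID)) * 0.1   # dense layer -> logits
bo = np.zeros(len(TAGS))

def tag(sentence):
    """One tag per token: the dense layer runs at EVERY step, not just
    on the final state as in classification."""
    h, out = np.zeros(HID), []
    for token in sentence:
        h = np.tanh(Wx @ E[vocab[token]] + Wh @ h)  # simplified recurrence
        logits = Wo @ h + bo
        out.append(TAGS[int(np.argmax(logits))])
    return out

tags = tag(["john", "lives", "in", "london", "."])
```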
NLP Toolkit: Encoder-Decoder Architectures
Applications: machine translation, text summarization, dialogue modelling, etc.
(figure: an encoder LSTM reads the source embeddings for "I love you ."; a decoder LSTM then generates the target "Je t' aime . <END>" from the target embeddings)
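A minimal sketch of the idea, with random untrained weights and a tanh recurrence standing in for the LSTMs: the encoder compresses the whole source sentence into a single hidden state, from which the decoder generates target tokens greedily until it emits <END> or hits a length limit. A trained system would learn all the parameters below; here they are random, so the "translation" is gibberish by design.

```python
import numpy as np

rng = np.random.default_rng(0)
SRC = {"i": 0, "love": 1, "you": 2, ".": 3}
TGT = ["je", "t'", "aime", ".", "<END>"]
EMB, HID = 8, 16

Es = rng.normal(size=(len(SRC), EMB))          # source embeddings
Et = rng.normal(size=(len(TGT), EMB))          # target embeddings
Wx = rng.normal(size=(HID, EMB)) * 0.1
Wh = rng.normal(size=(HID, HID)) * 0.1
Wo = rng.normal(size=(len(TGT), HID)) * 0.1    # output projection

def step(x, h):
    return np.tanh(Wx @ x + Wh @ h)            # simplified recurrence

def translate(source, max_len=10):
    h = np.zeros(HID)
    for tok in source:                         # encoder: compress the source
        h = step(Es[SRC[tok]], h)              # into a single hidden state
    out, prev = [], np.zeros(EMB)
    for _ in range(max_len):                   # decoder: generate greedily
        h = step(prev, h)
        word = TGT[int(np.argmax(Wo @ h))]
        out.append(word)
        if word == "<END>":
            break
        prev = Et[TGT.index(word)]             # feed the prediction back in
    return out

target = translate(["i", "love", "you", "."])
```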
NLP Toolkit: Attention
Applications: machine translation, question answering, etc.
(figure: the same encoder-decoder as before, translating "I love you ." into "Je t' aime . <END>", but with an attention mechanism linking each decoder step back to the encoder states)
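The attention mechanism itself is small. A sketch of dot-product attention: at each decoder step, every encoder state is scored against the current decoder state, the scores are softmax-normalized into weights, and the weighted sum of encoder states becomes the context for that step, so the decoder is no longer limited to one fixed-size summary of the source.

```python
import numpy as np

def attention(query, encoder_states):
    """Dot-product attention: score each encoder state against the
    decoder query, softmax the scores, return the weighted sum."""
    scores = encoder_states @ query            # one score per source token
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()          # softmax: weights sum to 1
    context = weights @ encoder_states         # weighted sum of states
    return context, weights

rng = np.random.default_rng(0)
H = 6
enc = rng.normal(size=(4, H))   # one encoder state per source token
q = rng.normal(size=H)          # the current decoder state
context, weights = attention(q, enc)
```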
NLP under threat?
Deep learning models have taken NLP by storm, achieving superior
results across many applications.
Many DL approaches do not model any linguistic knowledge: they view language as a sequence of strings.
Is this the end of NLP as a separate discipline?
Language models
Rajeswar et al. 2017, https://arxiv.org/pdf/1705.10929.pdf
Language models
● Great performance when explicitly trained for the task: 99% correct
○ > 120,000 sentence starts, labelled with singular or plural.
○ 50-dimensional LSTM followed by logistic regression.
○ In > 95% of the cases, the last noun determines the number.
● Performance drop for generic language models: 93% correct
○ Worse than chance on cases where a noun of the “incorrect” number occurs between the
subject and the verb
Linzen, Dupoux & Goldberg 2016, https://arxiv.org/pdf/1611.01368.pdf
Machine Translation
● NMT can behave strangely
● Problems for languages with a very different syntax, such as English and
Chinese:
○ 25% of Chinese noun phrases are translated into discontinuous phrases in English
○ Chinese noun phrases are often translated twice
Li et al. 2017, https://arxiv.org/abs/1705.01020
Question Answering
Jia & Liang 2017, https://arxiv.org/pdf/1707.07328.pdf
Textual entailment
Deep Learning in NLP
● Deep Learning produces great results on many tasks.
● But:
○ Race to the bottom on standard data sets:
■ Language models: Penn Treebank, WikiText-2
■ Machine Translation: WMT datasets
■ Question Answering: SQuAD
○ Its ignorance of linguistic structure is problematic in the evolution towards NLU
● So:
○ What do neural networks model?
○ How can we make them better?
Linguistic knowledge in MT
● What linguistic knowledge does MT model?
● Simple syntactic labels
○ Encoder output + logistic regression
■ Word-level output: part-of-speech
■ Sentence-level output: voice (active or passive), tense (past or present)
● Deep syntactic structure
○ Encoder output + decoder to predict parse trees
● Two benchmarks:
○ Upper bound: neural parser
○ Lower bound: English-to-English “MT” auto-encoder
Shi et al. 2016, https://www.isi.edu/natural-language/mt/emnlp16-nmt-grammar.pdf
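The probing setup can be sketched with synthetic data: the matrix X below stands in for frozen encoder outputs (in Shi et al. these are real NMT encoder states) and y for a property such as voice or tense. Only the logistic-regression probe is trained, the "encoder" stays untouched; if the probe recovers the label, the representation encodes the property.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 10
# Synthetic stand-in for frozen encoder outputs: random vectors in which
# one direction happens to encode a binary property.
X = rng.normal(size=(N, D))
true_direction = rng.normal(size=D)
y = (X @ true_direction > 0).astype(float)     # the probed property

# Train the probe (plain logistic regression via gradient descent).
w, b = np.zeros(D), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * X.T @ (p - y) / N
    b -= 0.5 * float((p - y).mean())

preds = 1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5
probe_acc = float((preds == y).mean())
```

Because the property here is linearly encoded by construction, the probe reaches high accuracy; on real encoder states, the gap between this accuracy and the benchmarks (neural parser above, auto-encoder below) is the finding.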
Linguistic knowledge in MT
(figures: results from Shi et al. 2016)
Linguistic knowledge in MT
Solution 1: present the encoder with both syntactic and lexical information
Li et al. 2017, https://arxiv.org/abs/1705.01020
Linguistic knowledge in MT
Li et al. 2017, https://arxiv.org/abs/1705.01020
Linguistic knowledge in MT
Solution 2: combine MT with
parsing in multi-task learning
Eriguchi et al. 2017
http://www.aclweb.org/anthology/P/P17/P17-2012.pdf
Linguistic knowledge in MT
Eriguchi et al. 2017
http://www.aclweb.org/anthology/P/P17/P17-2012.pdf
Linguistic knowledge in QA
● Most answers to questions are
constituents in the sentence.
● Restricting our candidate answers to constituents reduces the search space.
● Instead of feeding the network flat
sequences, we need to feed it syntax trees.
Xie and Xing 2017, http://www.aclweb.org/anthology/P/P17/P17-1129.pdf
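A sketch of the first two bullets with a hand-built toy tree (a real system would take parser output, e.g. as nltk trees): enumerating constituents yields far fewer candidate answers than enumerating all token spans, and excludes non-constituent substrings such as "lives in".

```python
# A toy constituency tree as nested tuples: (label, children...).
tree = ("S",
        ("NP", ("NNP", "John")),
        ("VP", ("VBZ", "lives"),
               ("PP", ("IN", "in"),
                      ("NP", ("NNP", "London")))))

def spans(node):
    """Return (own_span, all_constituent_spans) for a parse-tree node."""
    if isinstance(node[1], str):               # pre-terminal: (POS, word)
        return (node[1],), [(node[1],)]
    own, all_spans = (), []
    for child in node[1:]:
        child_own, child_all = spans(child)
        own += child_own                       # this node spans its children
        all_spans.extend(child_all)
    all_spans.append(own)
    return own, all_spans

_, all_spans = spans(tree)
candidates = sorted({" ".join(s) for s in all_spans})
```

For this five-token sentence there are 15 possible token spans but only 7 distinct constituents, and the excluded spans ("lives in", "John lives", ...) are rarely valid answers.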
Linguistic knowledge in QA
Xie and Xing 2017, http://www.aclweb.org/anthology/P/P17/P17-1129.pdf
Conclusions
● Deep Learning works great for NLP, but it is not a silver bullet.
● For simple tasks, simple string input may suffice, but for deeper natural
language understanding likely not.
● To tackle this challenge, we need to:
○ Better understand what neural networks model,
○ Help them model more linguistic knowledge,
○ Combine language with other modalities.
yves@nlp.town