Deep Learning for NLP
Yves Peirsman
About me
● 2010: PhD Computational Linguistics, KULeuven
● 2011: Post-doctoral researcher, Stanford University
● 2012: NLP engineer
● 2014: NLP Town
Deep Learning in NLP
1950s: Rule-based NLP
● Hand-written linguistic rules
● Knowledge models
1990: Statistical NLP
● Machine learning from data
● Many different models
2012: Deep Learning
● Comeback of neural networks
● Unified framework for many problems
20??: ??
Deep Learning in NLP
● Basic models: the basic NLPer's toolkit
● Main advantages: why has DL become so popular in NLP?
● Beyond the hype: deeper dive & recent trends
Word embeddings
Words as atomic units vs. words as dense embeddings
(figure: example sentences "The movie has an excellent cast." (M), "I like the cover of the book." (B) and "There were too many pages in the novel." (?); with atomic units the unseen word "novel" shares nothing with "movie" or "book", while dense embeddings place book, novel and movie at nearby points in vector space)
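The contrast above can be made concrete with a small sketch. The vectors below are made up for illustration (real embeddings come from models such as word2vec or GloVe and have hundreds of dimensions), but they show what atomic units cannot: "novel" measurably resembles "book" more than it resembles "cast".

```python
import numpy as np

# Toy 4-dimensional embeddings, invented for illustration only.
embeddings = {
    "movie": np.array([0.9, 0.1, 0.8, 0.0]),
    "book":  np.array([0.1, 0.9, 0.7, 0.1]),
    "novel": np.array([0.2, 0.8, 0.8, 0.1]),
    "cast":  np.array([0.8, 0.0, 0.1, 0.9]),
}

def cosine(u, v):
    """Cosine similarity: 1.0 for parallel vectors, 0.0 for orthogonal ones."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

book_sim = cosine(embeddings["novel"], embeddings["book"])
cast_sim = cosine(embeddings["novel"], embeddings["cast"])
```

With one-hot (atomic) representations, the cosine between any two distinct words is exactly zero, so no such gradation exists.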
Feature engineering
● Traditional NLP: major feature engineering on top of the input strings (POS, syntax, capitalization, prefixes, suffixes, bigrams, lemmas, ...)
● Deep learning: little feature engineering (if any); the model works on the input strings directly
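As a sketch of what "major feature engineering" means in practice, here is the kind of hand-written feature function a statistical tagger might use. The feature names and the particular feature set are made up for illustration, not taken from any specific system.

```python
def token_features(tokens, i):
    """Hand-crafted features for token i, of the kind a statistical
    model (e.g. a CRF tagger) would rely on. Illustrative, not exhaustive."""
    word = tokens[i]
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return {
        "word.lower": word.lower(),
        "prefix3": word[:3],                 # e.g. derivational prefixes
        "suffix3": word[-3:],                # e.g. inflectional endings
        "is_capitalized": word[0].isupper(), # a strong NER cue
        "prev_bigram": (prev, word.lower()), # local context
    }

feats = token_features(["John", "lives", "in", "London"], 0)
```

A deep learning model is instead handed the raw token sequence and left to discover such cues itself.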
Models
● Traditional NLP: distinct models for different problems, each with its own preprocessing: an SVM for text classification, a CRF for NER, a language model, translation model and decoder for MT, ...
● Deep learning: a unified toolkit: an LSTM (or similar) for text classification, NER, MT, ...
NLP Toolkit: LSTM for classification
Applications: text classification, language modelling
(figure: the tokens "The movie was boring ." are mapped to embeddings and fed through an LSTM; a dense layer (weights and biases) on top of the final state predicts positive, neutral or negative)
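The data flow of this slide can be sketched in a few lines of numpy. The weights are random and untrained, and a plain tanh recurrence stands in for the LSTM cell, so the prediction itself is meaningless; the point is the pipeline: embedding lookup, recurrent updates over the sentence, a dense layer on the final state, and a softmax over the three classes.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "movie": 1, "was": 2, "boring": 3, ".": 4}
EMB, HID, CLASSES = 8, 16, 3                # positive / neutral / negative

E  = rng.normal(size=(len(vocab), EMB))     # embedding table
Wx = rng.normal(size=(HID, EMB)) * 0.1      # input -> hidden
Wh = rng.normal(size=(HID, HID)) * 0.1      # hidden -> hidden
Wo = rng.normal(size=(CLASSES, HID)) * 0.1  # dense layer weights
bo = np.zeros(CLASSES)                      # dense layer biases

def classify(sentence):
    h = np.zeros(HID)
    for token in sentence:
        x = E[vocab[token]]                 # 1. look up the embedding
        h = np.tanh(Wx @ x + Wh @ h)        # 2. recurrent update (simplified)
    logits = Wo @ h + bo                    # 3. dense layer on the final state
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                  # 4. softmax over the classes

probs = classify(["the", "movie", "was", "boring", "."])
```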
NLP Toolkit: inside the LSTM
(figure: an LSTM cell with a forget gate, an input gate, a tanh candidate layer and an output gate, unrolled over the tokens "The movie was boring ...")
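The gates can be written out directly. Below is a standard LSTM step in numpy (the parameter shapes are chosen for the sketch): the forget gate decides what to erase from the cell state, the input gate and the tanh block decide what to write, and the output gate decides how much of the cell state to expose as the hidden state.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the parameters of all four blocks:
    forget gate, input gate, candidate tanh, output gate."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    f = sigmoid(z[0*H:1*H])        # forget gate: what to erase from the cell
    i = sigmoid(z[1*H:2*H])        # input gate: what to write
    g = np.tanh(z[2*H:3*H])        # candidate values (the "tanh" block)
    o = sigmoid(z[3*H:4*H])        # output gate: what to expose
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c

rng = np.random.default_rng(0)
E_DIM, H = 4, 5
W = rng.normal(size=(4 * H, E_DIM)) * 0.1
U = rng.normal(size=(4 * H, H)) * 0.1
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(3, E_DIM)):      # a 3-token "sentence"
    h, c = lstm_step(x, h, c, W, U, b)
```

Because h is an output gate (in (0, 1)) times a tanh (in (-1, 1)), every component of the hidden state stays strictly inside (-1, 1).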
NLP Toolkit: LSTM for sequence labelling
Applications: named entity recognition
(figure: the tokens "John lives in London ." pass through embeddings and an LSTM; a dense layer (weights and biases) produces logits at every step, yielding the tags B-PER O O B-LOC O)
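A sketch of how this differs from classification: the dense layer is now applied at every time step, so each token receives its own tag. As before, the weights are random and a tanh recurrence stands in for the LSTM, so the predicted tags are arbitrary; only the shape of the computation is the point.

```python
import numpy as np

TAGS = ["O", "B-PER", "B-LOC"]             # a minimal BIO tag set
rng = np.random.default_rng(0)
vocab = {"john": 0, "lives": 1, "in": 2, "london": 3, ".": 4}
EMB, HID = 8, 16

E  = rng.normal(size=(len(vocab), EMB))
Wx = rng.normal(size=(HID, EMB)) * 0.1
Wh = rng.normal(size=(HID, HID)) * 0.1
Wo = rng.normal(size=(len(TAGS), HID)) * 0.1   # dense layer -> logits
bo = np.zeros(len(TAGS))

def tag(sentence):
    """One tag per token: the dense layer runs at EVERY step, not just
    on the final state as in classification."""
    h, out = np.zeros(HID), []
    for token in sentence:
        h = np.tanh(Wx @ E[vocab[token]] + Wh @ h)  # simplified recurrence
        logits = Wo @ h + bo
        out.append(TAGS[int(np.argmax(logits))])
    return out

tags = tag(["john", "lives", "in", "london", "."])
```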
NLP Toolkit: Encoder-Decoder Architectures
Applications: machine translation, text summarization, dialogue modelling, etc.
(figure: an encoder LSTM reads the source embeddings for "I love you ."; a decoder LSTM then generates the target "Je t' aime . <END>" from the target embeddings)
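A minimal sketch of the idea, with random untrained weights and a tanh recurrence standing in for the LSTMs: the encoder compresses the whole source sentence into a single hidden state, from which the decoder generates target tokens greedily until it emits <END> or hits a length limit. A trained system would learn all the parameters below; here they are random, so the "translation" is gibberish by design.

```python
import numpy as np

rng = np.random.default_rng(0)
SRC = {"i": 0, "love": 1, "you": 2, ".": 3}
TGT = ["je", "t'", "aime", ".", "<END>"]
EMB, HID = 8, 16

Es = rng.normal(size=(len(SRC), EMB))          # source embeddings
Et = rng.normal(size=(len(TGT), EMB))          # target embeddings
Wx = rng.normal(size=(HID, EMB)) * 0.1
Wh = rng.normal(size=(HID, HID)) * 0.1
Wo = rng.normal(size=(len(TGT), HID)) * 0.1    # output projection

def step(x, h):
    return np.tanh(Wx @ x + Wh @ h)            # simplified recurrence

def translate(source, max_len=10):
    h = np.zeros(HID)
    for tok in source:                         # encoder: compress the source
        h = step(Es[SRC[tok]], h)              # into a single hidden state
    out, prev = [], np.zeros(EMB)
    for _ in range(max_len):                   # decoder: generate greedily
        h = step(prev, h)
        word = TGT[int(np.argmax(Wo @ h))]
        out.append(word)
        if word == "<END>":
            break
        prev = Et[TGT.index(word)]             # feed the prediction back in
    return out

target = translate(["i", "love", "you", "."])
```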
NLP Toolkit: Attention
Applications: machine translation, question answering, etc.
(figure: the same encoder-decoder as before, translating "I love you ." into "Je t' aime . <END>", but with an attention mechanism linking each decoder step back to the encoder states)
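The attention mechanism itself is small. A sketch of dot-product attention: at each decoder step, every encoder state is scored against the current decoder state, the scores are softmax-normalized into weights, and the weighted sum of encoder states becomes the context for that step, so the decoder is no longer limited to one fixed-size summary of the source.

```python
import numpy as np

def attention(query, encoder_states):
    """Dot-product attention: score each encoder state against the
    decoder query, softmax the scores, return the weighted sum."""
    scores = encoder_states @ query            # one score per source token
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()          # softmax: weights sum to 1
    context = weights @ encoder_states         # weighted sum of states
    return context, weights

rng = np.random.default_rng(0)
H = 6
enc = rng.normal(size=(4, H))   # one encoder state per source token
q = rng.normal(size=H)          # the current decoder state
context, weights = attention(q, enc)
```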
NLP under threat?
Deep learning models have taken NLP by storm, achieving superior
results across many applications.
Many DL approaches do not model any linguistic knowledge: they view language as a sequence of strings.
Is this the end of NLP as a separate discipline?
Language models
Rajeswar et al. 2017, https://arxiv.org/pdf/1705.10929.pdf
Language models
● Great performance when explicitly trained for the task: 99% correct
○ > 120,000 sentence starts, labelled with singular or plural.
○ 50-dimensional LSTM followed by logistic regression.
○ In > 95% of the cases, the last noun determines the number.
● Performance drop for generic language models: 93% correct
○ Worse than chance on cases where a noun of the “incorrect” number occurs between the
subject and the verb
Linzen, Dupoux & Goldberg 2016, https://arxiv.org/pdf/1611.01368.pdf
Machine Translation
● NMT can behave strangely
● Problems for languages with a very different syntax, such as English and
Chinese:
○ 25% of Chinese noun phrases are translated into discontinuous phrases in English
○ Chinese noun phrases are often translated twice
Li et al. 2017, https://arxiv.org/abs/1705.01020
Question Answering
Jia & Liang 2017, https://arxiv.org/pdf/1707.07328.pdf
Textual entailment
Deep Learning in NLP
● Deep Learning produces great results on many tasks.
● But:
○ Race to the bottom on standard data sets:
■ Language models: Penn Treebank, WikiText-2
■ Machine Translation: WMT datasets
■ Question Answering: SQuAD
○ Its ignorance of linguistic structure is problematic in the evolution towards NLU
● So:
○ What do neural networks model?
○ How can we make them better?
Linguistic knowledge in MT
● What linguistic knowledge does MT model?
● Simple syntactic labels
○ Encoder output + logistic regression
■ Word-level output: part-of-speech
■ Sentence-level output: voice (active or passive), tense (past or present)
● Deep syntactic structure
○ Encoder output + decoder to predict parse trees
● Two benchmarks:
○ Upper bound: neural parser
○ Lower bound: English-to-English “MT” auto-encoder
Shi et al. 2016, https://www.isi.edu/natural-language/mt/emnlp16-nmt-grammar.pdf
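The probing setup can be sketched with synthetic data: the matrix X below stands in for frozen encoder outputs (in Shi et al. these are real NMT encoder states) and y for a property such as voice or tense. Only the logistic-regression probe is trained, the "encoder" stays untouched; if the probe recovers the label, the representation encodes the property.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 10
# Synthetic stand-in for frozen encoder outputs: random vectors in which
# one direction happens to encode a binary property.
X = rng.normal(size=(N, D))
true_direction = rng.normal(size=D)
y = (X @ true_direction > 0).astype(float)     # the probed property

# Train the probe (plain logistic regression via gradient descent).
w, b = np.zeros(D), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * X.T @ (p - y) / N
    b -= 0.5 * float((p - y).mean())

preds = 1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5
probe_acc = float((preds == y).mean())
```

Because the property here is linearly encoded by construction, the probe reaches high accuracy; on real encoder states, the gap between this accuracy and the benchmarks (neural parser above, auto-encoder below) is the finding.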
Linguistic knowledge in MT
(figures: results from Shi et al. 2016)
Linguistic knowledge in MT
Solution 1: present the encoder with both syntactic and lexical information
Li et al. 2017, https://arxiv.org/abs/1705.01020
Linguistic knowledge in MT
Li et al. 2017, https://arxiv.org/abs/1705.01020
Linguistic knowledge in MT
Solution 2: combine MT with
parsing in multi-task learning
Eriguchi et al. 2017
http://www.aclweb.org/anthology/P/P17/P17-2012.pdf
Linguistic knowledge in MT
Eriguchi et al. 2017
http://www.aclweb.org/anthology/P/P17/P17-2012.pdf
Linguistic knowledge in QA
● Most answers to questions are
constituents in the sentence.
● Restricting our candidate answers to constituents reduces the search space.
● Instead of feeding the network flat
sequences, we need to feed it syntax trees.
Xie and Xing 2017, http://www.aclweb.org/anthology/P/P17/P17-1129.pdf
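A sketch of the first two bullets with a hand-built toy tree (a real system would take parser output, e.g. as nltk trees): enumerating constituents yields far fewer candidate answers than enumerating all token spans, and excludes non-constituent substrings such as "lives in".

```python
# A toy constituency tree as nested tuples: (label, children...).
tree = ("S",
        ("NP", ("NNP", "John")),
        ("VP", ("VBZ", "lives"),
               ("PP", ("IN", "in"),
                      ("NP", ("NNP", "London")))))

def spans(node):
    """Return (own_span, all_constituent_spans) for a parse-tree node."""
    if isinstance(node[1], str):               # pre-terminal: (POS, word)
        return (node[1],), [(node[1],)]
    own, all_spans = (), []
    for child in node[1:]:
        child_own, child_all = spans(child)
        own += child_own                       # this node spans its children
        all_spans.extend(child_all)
    all_spans.append(own)
    return own, all_spans

_, all_spans = spans(tree)
candidates = sorted({" ".join(s) for s in all_spans})
```

For this five-token sentence there are 15 possible token spans but only 7 distinct constituents, and the excluded spans ("lives in", "John lives", ...) are rarely valid answers.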
Linguistic knowledge in QA
Xie and Xing 2017, http://www.aclweb.org/anthology/P/P17/P17-1129.pdf
Conclusions
● Deep Learning works great for NLP, but it is not a silver bullet.
● For simple tasks, simple string input may suffice, but for deeper natural
language understanding likely not.
● To tackle this challenge, we need to:
○ Better understand what neural networks model,
○ Help them model more linguistic knowledge,
○ Combine language with other modalities.
yves@nlp.town