Supervised Learning of Universal Sentence Representations
from Natural Language Inference Data
2017/11/7 B4 Hiroki Shimanaka
Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, Antoine Bordes
EMNLP 2017
Abstract & Introduction (1)
Many modern NLP systems rely on word embeddings, previously trained in an unsupervised manner on large corpora, as base features.
Word embeddings
 Word2Vec (Mikolov et al., 2013)
 GloVe (Pennington et al., 2014)
Efforts to obtain embeddings for larger chunks of text, such as
sentences, have however not been so successful.
Unsupervised Sentence embeddings
 SkipThought (Kiros et al., 2015)
 FastSent (Hill et al., 2016)
Abstract & Introduction (2)
In this paper, they show how universal sentence representations can be trained using the supervised data of the Stanford Natural Language Inference (SNLI) dataset.
These representations consistently outperform unsupervised methods such as SkipThought vectors (Kiros et al., 2015) on a wide range of transfer tasks.
Approach
This work combines two research directions, which they describe in what follows.
First, they explain how the NLI task can be used to train universal sentence encoding models using the SNLI dataset.
Second, they describe the architectures they investigated for the sentence encoder, which, in their opinion, cover a suitable range of sentence encoders currently in use.
SNLI (Stanford Natural Language Inference) dataset
The SNLI dataset consists of 570k human-generated English sentence pairs, manually labeled with one of three categories: entailment, contradiction, and neutral.
https://nlp.stanford.edu/projects/snli/
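As an illustration, each SNLI example pairs a premise with a hypothesis and a gold label. A minimal Python sketch for reading the distributed JSONL files might look like the following; the file path is an assumption based on the official download, and the example triple at the end is purely illustrative.

```python
import json

# Hypothetical path; the official snli_1.0.zip unpacks to files like this one.
SNLI_TRAIN = "snli_1.0/snli_1.0_train.jsonl"

def load_snli(path):
    """Yield (premise, hypothesis, label) triples, skipping pairs without a gold label."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            ex = json.loads(line)
            if ex["gold_label"] == "-":   # annotators did not agree on a label
                continue
            yield ex["sentence1"], ex["sentence2"], ex["gold_label"]

# e.g. ("A man plays a guitar on stage.", "A man is performing music.", "entailment")
```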
The Natural Language Inference task
Once the sentence vectors u and v are generated, three matching methods are applied to extract relations between them:
(i) concatenation of the two representations (u, v)
(ii) element-wise product u ∗ v
(iii) absolute element-wise difference |u − v|
The resulting vector, which captures
information from both the premise and the
hypothesis, is fed into a 3-class classifier
consisting of multiple fully-connected layers
culminating in a softmax layer.
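A minimal PyTorch sketch of this matching scheme, assuming the encoder has already produced batched sentence vectors u and v; the layer sizes are illustrative rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class NLIClassifier(nn.Module):
    def __init__(self, sent_dim, hidden_dim=512, n_classes=3):
        super().__init__()
        # Input is [u; v; u*v; |u-v|], i.e. four times the sentence embedding size.
        self.mlp = nn.Sequential(
            nn.Linear(4 * sent_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, n_classes),  # logits for entailment/contradiction/neutral
        )

    def forward(self, u, v):
        features = torch.cat([u, v, u * v, torch.abs(u - v)], dim=1)
        return self.mlp(features)  # softmax/cross-entropy is applied by the loss function
```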
Sentence encoder architectures (1)
LSTM (Hochreiter and Schmidhuber, 1997) and GRU (Cho et al., 2014)
A sentence is represented by the last hidden vector.
BiGRU-last
It concatenates the last hidden state of a forward GRU and the last hidden state of a backward GRU.
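A rough PyTorch sketch of the BiGRU-last idea; dimensions are illustrative and padding handling is simplified.

```python
import torch
import torch.nn as nn

class BiGRULastEncoder(nn.Module):
    def __init__(self, emb_dim=300, hidden_dim=2048):
        super().__init__()
        self.gru = nn.GRU(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, embedded):          # (batch, seq_len, emb_dim) word embeddings
        _, h_n = self.gru(embedded)       # h_n: (2, batch, hidden_dim)
        # Concatenate the last forward and last backward hidden states.
        return torch.cat([h_n[0], h_n[1]], dim=1)   # (batch, 2 * hidden_dim)
```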
Sentence encoder architectures (2)
BiLSTM with mean/max pooling
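The slide shows this encoder as a figure; below is a minimal PyTorch sketch of a BiLSTM encoder with max pooling over time. The hidden size and the absence of padding masks are simplifying assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMMaxEncoder(nn.Module):
    def __init__(self, emb_dim=300, hidden_dim=2048):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, embedded):                 # (batch, seq_len, emb_dim)
        hidden_states, _ = self.lstm(embedded)   # (batch, seq_len, 2 * hidden_dim)
        # Max pooling: keep, for each dimension, the maximum value over all time steps.
        # (Mean pooling would instead average: hidden_states.mean(dim=1).)
        u, _ = hidden_states.max(dim=1)          # (batch, 2 * hidden_dim)
        return u
```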
Sentence encoder architectures (3)
 Self-attentive sentence encoder (Liu et al., 2016; Lin et al., 2017)
It uses an attention mechanism over the hidden states of a BiLSTM to generate a representation u of an input sentence.
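A simplified sketch of such an attention pooling layer on top of BiLSTM hidden states; a single attention head is shown, and the multi-view variant and exact dimensions from the cited papers are not reproduced.

```python
import torch
import torch.nn as nn

class SelfAttentivePooling(nn.Module):
    """Weighted sum of BiLSTM hidden states, with weights produced by attention."""
    def __init__(self, state_dim, attn_dim=350):
        super().__init__()
        self.proj = nn.Linear(state_dim, attn_dim)
        self.context = nn.Linear(attn_dim, 1, bias=False)  # learned context vector

    def forward(self, hidden_states):             # (batch, seq_len, state_dim)
        scores = self.context(torch.tanh(self.proj(hidden_states)))  # (batch, seq_len, 1)
        weights = torch.softmax(scores, dim=1)    # attention over time steps
        return (weights * hidden_states).sum(dim=1)   # (batch, state_dim)
```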
Sentence encoder architectures (4)
Hierarchical ConvNet
It is similar in spirit to AdaSent (Zhao et al., 2015).
The final representation u = [u1, u2, u3, u4] concatenates representations taken at different levels (convolutional layers) of the input sentence.
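A rough sketch of the hierarchical ConvNet idea: stack convolutional layers, max-pool each layer's feature maps over time, and concatenate the pooled vectors. The number of layers and channel sizes here are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class HierarchicalConvNetEncoder(nn.Module):
    def __init__(self, emb_dim=300, channels=512, n_layers=4):
        super().__init__()
        layers, in_ch = [], emb_dim
        for _ in range(n_layers):
            layers.append(nn.Sequential(
                nn.Conv1d(in_ch, channels, kernel_size=3, padding=1),
                nn.ReLU(),
            ))
            in_ch = channels
        self.convs = nn.ModuleList(layers)

    def forward(self, embedded):                  # (batch, seq_len, emb_dim)
        x = embedded.transpose(1, 2)              # Conv1d expects (batch, channels, seq_len)
        pooled = []
        for conv in self.convs:
            x = conv(x)
            pooled.append(x.max(dim=2).values)    # u_i: max over time at this depth
        return torch.cat(pooled, dim=1)           # u = [u1, u2, u3, u4]
```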
Training details
For all their models trained on SNLI, they use SGD with a learning rate
of 0.1 and a weight decay of 0.99.
At each epoch, they divide the learning rate by 5 if the dev accuracy
decreases.
They use mini-batches of size 64, and training is stopped when the learning rate goes under the threshold of 10⁻⁵.
For the classifier, they use a multi-layer perceptron with one hidden layer of 512 hidden units.
They use open-source GloVe vectors trained on Common Crawl 840B
with 300 dimensions as fixed word embeddings.
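A schematic training loop following these settings; the model, data loader, and dev-accuracy function are placeholders, and interpreting the slide's "decay of 0.99" as a per-epoch learning-rate decay is an assumption.

```python
import torch

def train(model, train_loader, dev_accuracy):
    """Schematic loop; model, train_loader and dev_accuracy() are placeholders."""
    lr, lr_shrink, lr_decay, min_lr = 0.1, 5.0, 0.99, 1e-5
    criterion = torch.nn.CrossEntropyLoss()
    best_dev = 0.0
    while lr > min_lr:                       # stop once the learning rate falls below 1e-5
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        for premise, hypothesis, label in train_loader:   # mini-batches of size 64
            optimizer.zero_grad()
            loss = criterion(model(premise, hypothesis), label)
            loss.backward()
            optimizer.step()
        acc = dev_accuracy(model)
        if acc < best_dev:
            lr /= lr_shrink                  # divide the learning rate by 5 on a dev drop
        best_dev = max(best_dev, acc)
        lr *= lr_decay                       # the slide's "decay of 0.99", applied per epoch
```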
Evaluation of sentence representations (1)
Binary and multi-class classification
sentiment analysis (MR, SST)
question-type (TREC)
product reviews (CR)
subjectivity/objectivity (SUBJ)
opinion polarity (MPQA)
Entailment and semantic relatedness
They also evaluate on the SICK dataset for both entailment (SICK-E) and
semantic relatedness (SICK-R).
Semantic Textual Similarity (STS14 (Agirre et al., 2014)).
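For STS14 the standard protocol scores each sentence pair by the cosine similarity of its two embeddings and reports the correlation with the human ratings. A small sketch, where the embedding matrices and gold scores are placeholder arrays:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def sts_scores(emb_a, emb_b, gold):
    """emb_a, emb_b: (n_pairs, dim) sentence embeddings; gold: (n_pairs,) human ratings."""
    cos = np.sum(emb_a * emb_b, axis=1) / (
        np.linalg.norm(emb_a, axis=1) * np.linalg.norm(emb_b, axis=1))
    return pearsonr(cos, gold)[0], spearmanr(cos, gold)[0]  # Pearson, Spearman
```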
Evaluation of sentence representations (2)
Paraphrase detection
Sentence pairs have been human-annotated according to whether they capture a paraphrase/semantic equivalence relationship.
Caption-Image retrieval
The caption-image retrieval task evaluates joint image and language feature
models (Hodosh et al., 2013; Lin et al., 2014).
The goal is either to rank a large collection of images by their relevance with respect to a given query caption (Image Retrieval), or to rank captions by their relevance for a given query image (Caption Retrieval).
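Retrieval quality is typically reported as Recall@K: the fraction of queries whose correct match appears among the top K ranked candidates. A small sketch under the simplifying assumption that caption i matches image i and that both sides are already embedded in a joint space:

```python
import numpy as np

def recall_at_k(caption_emb, image_emb, k=10):
    """caption_emb, image_emb: (n, dim); row i of each side describes the same item."""
    # Cosine similarity matrix between every caption and every image.
    c = caption_emb / np.linalg.norm(caption_emb, axis=1, keepdims=True)
    i = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    sims = c @ i.T                                   # (n_captions, n_images)
    ranks = np.argsort(-sims, axis=1)                # best-matching images first
    hits = [idx in ranks[idx, :k] for idx in range(len(ranks))]
    return float(np.mean(hits))                      # Recall@K for image retrieval
```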
Result (1)
Result (2)
Result (3)
Conclusion
This paper studies the effects of training sentence embeddings with
supervised data by testing on 12 different transfer tasks.
They showed that models learned on NLI can perform better than models trained in unsupervised conditions or on other supervised tasks.
They also showed that a BiLSTM network with max pooling gives the best current universal sentence encoding method, outperforming existing approaches like SkipThought vectors.
Reference
Supervised Learning of Universal Sentence Representations from
Natural Language Inference Data, Conneau et al., EMNLP 2017


Editor's Notes

  • #3: Word embeddings have proven useful in many NLP tasks, but it is hard to build sentence vectors from word embeddings alone. Other tools for producing sentence vectors exist, but they have not achieved particularly good results either.
  • #4: This paper therefore shows how to learn sentence vectors from the supervised SNLI dataset, and that this approach beats unsupervised methods such as SkipThought across a wide range of tasks.
  • #5: The approach combines two research directions: first, how the NLI task on the SNLI dataset can be used for universal sentence encoding; second, which sentence encoder is best suited to this task.
  • #6: Sentences that entail, contradict, or are neutral with respect to Flickr30k captions, collected via crowdsourcing.
  • #7: A 512-dimensional hidden layer and a softmax layer perform the 3-way classification.
  • #13: Classification tasks; semantic relatedness.
  • #14: Paraphrase recognition; caption and image retrieval.
  • #16: Skip-Thought-LN, previously the best sentence encoder, required 64M sentences and about one month of training; the proposed method needs 57k sentences and about one day.
  • #18: This paper studies the effect of training sentence embeddings with supervised data by testing on 12 different transfer tasks. Models learned on NLI can perform better than models trained in unsupervised conditions or on other supervised tasks, and a BiLSTM network with max pooling provides the best current universal sentence encoding method, outperforming existing approaches such as SkipThought vectors.
  • #19: 本稿では、12の異なる転送タスクをテストすることにより、教師付きデータを含む訓練センテンス埋め込みの効果を研究する。 彼らは、NLIで学んだモデルは、監督されていない状態や他の管理対象タスクで訓練されたモデルよりも優れたパフォーマンスを発揮できることを示しました。 彼らは、最大プールを持つBiLSTMネットワークがSkipThoughtベクトルのような既存のアプローチより優れた現在の普遍的なセンシングエンコーディングメソッドを作成することを示しました。