Understanding
GloVe
(Global Vectors for Word Representation)
-
Slides written by Park JeeHyun
27 DEC 2017
Contents
1. Introduction
2. GloVe model
3. GloVe cost function
4. Experiments & Results
5. References
1. Introduction
• The statistics of word occurrences in a corpus is the primary
source of information available to all unsupervised methods
for learning word representations.
• Although many such methods now exist, the question still
remains as to how meaning is generated from these
statistics, and how the resulting word vectors might
represent that meaning.
1. Introduction
• Recent methods for learning vector space representations
of words have succeeded in capturing fine-grained
semantic and syntactic regularities using vector arithmetic,
• But the origin of these regularities has remained opaque.
1. Introduction : pros & cons
2. GloVe model
• Combines the advantages of the two major model families
in the literature:
• global matrix factorization, and
• local context window methods
• Our model efficiently leverages statistical information by
training only on the nonzero elements in a word-word co-
occurrence matrix, rather than on the entire sparse matrix or
on individual context windows in a large corpus.
3. GloVe cost function
• Deriving the Cost Function
• The above argument suggests that the appropriate starting point for word vector
learning should be with ratios of co-occurrence probabilities
rather than the probabilities themselves.
3. GloVe cost function
• Establish some notations
• let the matrix of word-word co-occurrence counts be denoted by 𝑋,
• whose entries 𝑋𝑖𝑗 tabulate the number of times word 𝑗
occurs in the context of word 𝑖
• let 𝑋𝑖 = Σ𝑘 𝑋𝑖𝑘 be the number of times any word
appears in the context of word 𝑖
• let 𝑃𝑖𝑗 = 𝑃(𝑗 | 𝑖) = 𝑋𝑖𝑗 / 𝑋𝑖 be the probability that word 𝑗
appears in the context of word 𝑖
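Given the notation just defined, a minimal construction sketch (illustrative only, not the authors' released code; the toy corpus, the window size, and the 1/d distance weighting are assumptions) shows how 𝑋, 𝑋𝑖 and 𝑃𝑖𝑗 can be computed:

```python
from collections import defaultdict

def build_cooccurrence(corpus, window_size=5):
    """Count X[(i, j)]: how often word j appears within window_size
    tokens of word i, accumulated over all sentences in the corpus."""
    X = defaultdict(float)
    for sentence in corpus:
        for center, word_i in enumerate(sentence):
            lo = max(0, center - window_size)
            hi = min(len(sentence), center + window_size + 1)
            for ctx in range(lo, hi):
                if ctx == center:
                    continue
                # weight distant co-occurrences less (1/d); drop the
                # division for plain integer counts
                X[(word_i, sentence[ctx])] += 1.0 / abs(ctx - center)
    return X

corpus = [["the", "cat", "sat", "on", "the", "mat"]]
X = build_cooccurrence(corpus, window_size=2)
X_i = sum(v for (i, j), v in X.items() if i == "the")  # X_i = sum_k X_ik
P_ij = X[("the", "cat")] / X_i                         # P_ij = X_ij / X_i
```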
3. GloVe cost function
• Deriving the Cost Function
• set a function F that represents ratios of co-occurrence probabilities rather than the probabilities themselves
• we would like F to encode the information present in the ratio 𝑃𝑖𝑘 / 𝑃𝑗𝑘 in the
word vector space. Since vector spaces are inherently linear structures,
the most natural way to do this is with vector differences.
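The equation images on this slide did not survive the text export. For reference, the corresponding formulas from the GloVe paper are the general form and its vector-difference restriction:

F(w_i, w_j, \tilde{w}_k) = \frac{P_{ik}}{P_{jk}} \qquad (1)

F(w_i - w_j, \tilde{w}_k) = \frac{P_{ik}}{P_{jk}} \qquad (2)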
3. GloVe cost function
Note: 𝑤𝑖 and 𝑤̃𝑘 are vectors from different vector spaces (word vectors vs. context word vectors).
• Deriving the Cost Function
• We note that the arguments of F in Eqn. (2) are vectors while the right-hand side is a scalar.
• While F could be taken to be a complicated function parameterized by,
e.g., a neural network, doing so would obfuscate the linear structure we are trying to capture.
• To avoid this issue, we can first take the dot product of the arguments,
which prevents F from mixing the vector dimensions in undesirable ways.
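For reference (the slide's equation image is missing from this export), the dot-product form is Eqn. (3) of the paper:

F\big((w_i - w_j)^{\top} \tilde{w}_k\big) = \frac{P_{ik}}{P_{jk}} \qquad (3)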
3. GloVe cost function
• Deriving the Cost Function
• for word-word co-occurrence matrices,
the distinction between a word and a context word is arbitrary
and that we are free to exchange the two roles.
• the symmetry can be restored in two steps.
• First, we require that 𝐹 be a homomorphism between the groups
(ℝ, +) and (ℝ>0, ×), i.e.,
• which, by Eqn. (3), is solved by,
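For reference, the homomorphism requirement and its solution (equation images in the original deck, reproduced here from the GloVe paper) are:

F\big((w_i - w_j)^{\top} \tilde{w}_k\big) = \frac{F(w_i^{\top} \tilde{w}_k)}{F(w_j^{\top} \tilde{w}_k)} \qquad (4)

F(w_i^{\top} \tilde{w}_k) = P_{ik} = \frac{X_{ik}}{X_i} \qquad (5)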
3. GloVe cost function
3. GloVe cost function : homomorphism
Homo (same)
+
Morph (shape)
3. GloVe cost function : homomorphism
• Deriving the Cost Function
• The solution to Eqn. (4) is 𝐹 = exp, or,
• Next, we note that Eqn. (6) would exhibit the exchange symmetry
if not for the log(𝑋𝑖) on the right-hand side.
• However, this term is independent of 𝑘 so it can be absorbed into
a bias 𝑏𝑖 for 𝑤𝑖.
• Finally, adding an additional bias 𝑏̃𝑘 for 𝑤̃𝑘 restores the symmetry.
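For reference, the two equations this slide refers to (reproduced from the GloVe paper, since the slide images are not in the text export) are:

w_i^{\top} \tilde{w}_k = \log P_{ik} = \log X_{ik} - \log X_i \qquad (6)

w_i^{\top} \tilde{w}_k + b_i + \tilde{b}_k = \log X_{ik} \qquad (7)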
3. GloVe cost function
• Deriving the Cost Function
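The cost function shown on this slide is not in the text export; from the paper, it is the weighted least-squares objective over all word pairs in the vocabulary of size V:

J = \sum_{i,j=1}^{V} f(X_{ij}) \big( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \big)^2 \qquad (8)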
3. GloVe cost function : weighting function
• Deriving the Cost Function
3. GloVe cost function : weighting function
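The weighting-function formula and plot on this slide are likewise missing from the export. From the paper, f(x) = (x / x_max)^α for x < x_max and 1 otherwise, with x_max = 100 and α = 3/4 used in the reported experiments. A minimal NumPy sketch of f and of the cost in Eqn. (8) (illustrative only, not the released GloVe implementation):

```python
import numpy as np

def weight(x, x_max=100.0, alpha=0.75):
    """f(x): down-weights rare co-occurrences and caps the
    influence of very frequent ones at 1."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_cost(W, W_ctx, b, b_ctx, X):
    """Weighted least-squares cost over the nonzero entries of X.
    W, W_ctx: (V, d) word / context-word vectors; b, b_ctx: (V,) biases."""
    i, j = np.nonzero(X)                    # train only on nonzero X_ij
    diff = (W[i] * W_ctx[j]).sum(axis=1) + b[i] + b_ctx[j] - np.log(X[i, j])
    return np.sum(weight(X[i, j]) * diff ** 2)
```

Note how "training only on the nonzero elements" of the co-occurrence matrix, as described in the GloVe model section, falls out naturally from the np.nonzero indexing.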
4. Experiments & Results
• Experiments
• Word analogy
• Word similarity
• Named entity recognition (NER)
• Result
4. Experiments : analogy task
• The word analogy task consists of questions like,
“a is to b as c is to ___?”
• Semantic questions, like
“Athens is to Greece as Berlin is to ___?”
• Syntactic questions, like
“dance is to dancing as fly is to ___?”
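A common way to answer such questions with word vectors, and the scheme used in the GloVe evaluation, is to return the word whose vector is closest, by cosine similarity, to w_b − w_a + w_c. A hypothetical sketch, assuming a dict vectors that maps words to NumPy arrays:

```python
import numpy as np

def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the nearest cosine
    neighbour of (b - a + c), excluding the three query words."""
    target = vectors[b] - vectors[a] + vectors[c]
    target /= np.linalg.norm(target)
    best_word, best_sim = None, -np.inf
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue
        sim = vec @ target / np.linalg.norm(vec)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

# e.g. analogy("athens", "greece", "berlin", vectors) should ideally return "germany"
```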
4. Experiments : analogy task
4. Experiments : analogy task
4. Experiments : similarity task
• A similarity score is obtained from the word vectors by first
normalizing each feature across the vocabulary and then
calculating the cosine similarity.
• We compute Spearman’s rank correlation coefficient
between this score and the human judgments.
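A minimal sketch of this procedure, assuming W is the (V × d) matrix of word vectors, vocab maps words to row indices, and pairs is a list of (word1, word2, human_score) triples (all names are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr

def similarity_eval(W, vocab, pairs):
    # Normalize each feature (column) across the vocabulary, as on the slide,
    # then score each word pair by cosine similarity.
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    model, human = [], []
    for w1, w2, score in pairs:
        v1, v2 = Wn[vocab[w1]], Wn[vocab[w2]]
        model.append(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
        human.append(score)
    rho, _ = spearmanr(model, human)  # Spearman rank correlation vs. human judgments
    return rho
```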
4. Experiments : similarity task
4. Experiments : similarity task
4. Experiments : NER task
• The CoNLL-2003 English benchmark dataset for NER is a
collection of documents from Reuters newswire articles,
annotated with four entity types:
• Person
• Location
• Organization
• Miscellaneous
• We train models on the CoNLL-03 training data and test on three
datasets:
1) CoNLL-03 testing data
2) ACE Phase 2 (2001-02) and ACE-2003 data
3) MUC7 Formal Run test set
4. Experiments : NER task
4. Result
• GloVe is a new global log-bilinear regression model for the
unsupervised learning of word representations that
outperforms other models on word analogy, word similarity,
and named entity recognition tasks.
5. References
• GloVe
• https://nlp.stanford.edu/projects/glove/
• Stanford NLP lecture
• http://web.stanford.edu/class/cs224n/
• https://www.youtube.com/watch?v=ASn7ExxLZws&t=43s
• Socratica’s video about homomorphism
• https://www.youtube.com/watch?v=cYzp5IWqCsg