Evaluating the Utility of Vector Differences for Lexical Relation
Learning
Ekaterina Vylomova, Laura Rimell, Trevor Cohn, and Tim Baldwin
August 9, 2016
Ekaterina Vylomova, Laura Rimell, Trevor Cohn, and Tim Baldwin Evaluating the Utility of Vector Differences for Lexical Relation Learning 1 / 23
The utility of difference vectors
DIFFVEC = word2 − word1
Vector Difference, or Offset
Mikolov et al, 2013: king − man + woman ≈ queen
CAPITAL-CITY: Paris − France + Poland ≈ Warsaw
or PLURALISATION: cars − car + apple ≈ apples
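The offset arithmetic above can be sketched with plain NumPy over toy vectors; the vocabulary and values here are hypothetical stand-ins, with real offsets coming from trained embeddings such as word2vec:

```python
import numpy as np

# Hypothetical toy embeddings; real vectors come from a trained model.
vocab = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "queen": np.array([0.8, 0.1, 0.1]),
    "man":   np.array([0.3, 0.9, 0.0]),
    "woman": np.array([0.3, 0.1, 0.0]),
}

def nearest(vec, exclude):
    """Vocabulary word with the highest cosine similarity to vec."""
    best, best_sim = None, -1.0
    for word, v in vocab.items():
        if word in exclude:
            continue
        sim = vec @ v / (np.linalg.norm(vec) * np.linalg.norm(v))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# king - man + woman lands nearest to queen
offset = vocab["king"] - vocab["man"] + vocab["woman"]
print(nearest(offset, exclude={"king", "man", "woman"}))  # queen
```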
Can DIFFVEC(w1, w2) be clustered or classified into a broad-coverage set of lexical relations?
Types of relations
Lexical semantic relations
LEXSEMHyper: Hypernymy (animal, dog)
LEXSEMMero: Meronymy (bird, wing)
LEXSEMEvent: Object’s Action (zip, coat)
Morphosyntactic relations
VERBPast: Present, 1st → Past (know, knew)
VERB3: Present, 1st → Present, 3rd (know, knows)
VERB3Past: Present, 3rd → Past (knows, knew)
NOUNSP: Singular → Plural (year, years)
Morphosemantic relations
VERBNOUN: Nominalisation of a verb (drive, drift)
PREFIX: Prefixing with the re- morpheme (vote, revote)
Word Embeddings
Name        Dimensions  Training data (tokens)
w2v         300         100 × 10⁹
GloVe       200         6 × 10⁹
SENNA       100         37 × 10⁶
HLBL        200         37 × 10⁶
w2v-wiki    300         50 × 10⁶
GloVe-wiki  300         50 × 10⁶
SVD-wiki    300         50 × 10⁶
The Models Used
w2v (Mikolov et al., 2013)
GloVe (Pennington et al., 2014)
SENNA (Collobert et al., 2011)
HLBL (Mnih and Hinton, 2009)
PPMI+SVD (Levy and Goldberg, 2015)
Closed-World Experiments
Closed-World setting: Multi-class classifier
Let {(wi, wj)} be a set of word pairs and R = {rk} a set of binary lexical relations.
(wi, wj) → rk ∈ R,
i.e. all word pairs can be uniquely classified according to a relation in R.
Spectral clustering: t-SNE projection for 10 samples per class
[Figure: t-SNE projection of DIFFVECs, 10 samples per class; classes: LEXSEMAttr, LEXSEMCause, NOUNColl, LEXSEMEvent, LEXSEMHyper, LVC, LEXSEMMero, NOUNSP, PREFIX, LEXSEMRef, LEXSEMSpace, VERB3, VERB3Past, VERBPast, VERBNOUN.]
Methods
Clustering algorithm
Spectral clustering (von Luxburg, 2007).
Two hyperparameters: (1) the number of clusters; and (2) the pairwise similarity measure for
comparing DIFFVECs.
Clustering evaluation
V-Measure (Rosenberg, Hirschberg, 2007)
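As a concrete sketch, spectral clustering plus V-Measure can be run with scikit-learn; the DIFFVECs below are synthetic stand-ins, not the paper's data:

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import v_measure_score

rng = np.random.default_rng(0)

# Synthetic stand-in: two relation types whose DIFFVECs concentrate
# around different mean offsets.
diffvecs = np.vstack([
    rng.normal(loc=[2.0, 0.0], scale=0.1, size=(20, 2)),  # relation A
    rng.normal(loc=[0.0, 2.0], scale=0.1, size=(20, 2)),  # relation B
])
gold = np.array([0] * 20 + [1] * 20)

# The two hyperparameters from the slide: the number of clusters and the
# pairwise similarity measure (here an RBF kernel over the DIFFVECs).
clusterer = SpectralClustering(n_clusters=2, affinity="rbf", random_state=0)
predicted = clusterer.fit_predict(diffvecs)

print(f"V-Measure: {v_measure_score(gold, predicted):.2f}")
```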
Clustering results
[Figure: V-Measure (roughly 0.15–0.40) against the number of clusters (10–80) for w2v, w2v-wiki, GloVe, GloVe-wiki, SVD-wiki, HLBL, and SENNA.]
Clustering results
Pairs incorrectly grouped due to ambiguity, or one word overwhelming the other:
studies − study ⇒ VERBNOUN
saw − utensil ⇒ VERBPast
tigers − ambush ⇒ NOUNColl
Single "hypernym"-specific clusters
necklace − unit, wristband − unit, hairpin − unit
Semantic sub-clusters
movement verb − animal noun
food verb − food noun
action verb − profession noun
From Clustering to Classification
Encouraged by the results of the clustering experiments, we next move to classification: we train a multi-class linear classifier to differentiate between the relation types.
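A minimal closed-world sketch with scikit-learn's LinearSVC; the 2-d embeddings and word pairs below are hypothetical stand-ins for real DIFFVECs:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical 2-d embeddings; real DIFFVECs are differences of
# pretrained vectors (w2v, GloVe, ...).
emb = {
    "know": np.array([1.0, 0.0]), "knew":  np.array([1.0, 1.0]),
    "grow": np.array([2.0, 0.0]), "grew":  np.array([2.0, 1.0]),
    "year": np.array([0.0, 3.0]), "years": np.array([1.5, 3.0]),
    "car":  np.array([0.0, 4.0]), "cars":  np.array([1.5, 4.0]),
}
pairs = [("know", "knew", "VERBPast"), ("grow", "grew", "VERBPast"),
         ("year", "years", "NOUNSP"), ("car", "cars", "NOUNSP")]

# DIFFVEC(w1, w2) = emb[w2] - emb[w1]
X = np.array([emb[w2] - emb[w1] for w1, w2, _ in pairs])
y = [label for _, _, label in pairs]

clf = LinearSVC().fit(X, y)
print(clf.predict([[0.0, 1.0]]))  # a past-tense-like offset
```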
Classification: Multi-class linear SVM, F-scores
[Figure: per-relation F-scores (0.0–1.0) for Hyper, Event, Mero, NounSP, Verb3, VerbPast, Verb3Past, PrefixRe, NounColl, and the micro-average, comparing Baseline, W2V, W2V:Wiki, and SVD:Wiki.]
Open-World Experiments
Open-World setting: Binary classifier
Let {(wi, wj)} be a set of word pairs and R = {rk} a set of binary lexical relations.
(wi, wj) → rk ∈ R ∪ {φ},
where φ signifies that none of the relations in R apply to the word pair.
Binary classification
We add random pairs, i.e. randomly linked word pairs.
Generating random pairs
(1) sample seed words proportionally to their frequency in Wikipedia ⇒
(2) take the Cartesian product over pairs of words from the seed lexicon ⇒
(3) sample word pairs uniformly from this set
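The three steps above can be sketched as follows; the seed lexicon and counts are hypothetical, with the real frequencies coming from Wikipedia:

```python
import random

random.seed(1)

# Hypothetical frequency-weighted seed lexicon.
freqs = {"the": 50, "year": 30, "city": 20, "zebra": 5}

# (1) sample seed words proportional to their frequency
words, weights = zip(*freqs.items())
seed = set(random.choices(words, weights=weights, k=10))

# (2) Cartesian product over the seed lexicon, minus identity pairs
candidates = [(a, b) for a in seed for b in seed if a != b]

# (3) sample word pairs uniformly from the candidate set
random_pairs = random.sample(candidates, k=min(3, len(candidates)))
print(random_pairs)
```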
Training the classifiers
Train 9 binary SVM classifiers with an RBF kernel and evaluate on a test set augmented with random samples.
Open-World Results
[Figure: open-world precision (Pr: No NS) and recall (Re: No NS) per relation: Hyper, Event, Mero, NounSP, Verb3, VerbPast, Verb3Past, PrefixRe, NounColl.]
Binary classification
Results
The classifiers correctly captured many of the true relation instances (high recall), but also labelled many of the random samples as related (low precision):
(have, works), (turn, took), (works, started) ⇒ VERB3, VERBPast and VERB3Past
NOUNColl: everything related to animals
LEXSEMMero: mainly relations consisting of nouns
Relational similarity ≈ a combination of attributional similarities.
The classifier captures some of these similarities (e.g., syntactic), while others (e.g., semantic) may be missing.
Binary classification
The classifier over-generalizes → Add extra negative samples to the training data.
Negative Samples
opposite pairs: switch the order of the word pair, Oppos(w1, w2) = word1 − word2
shuffled pairs: replace word2 with a random word word2′ from the same relation, Shuff(w1, w2′) = word2′ − word1.
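Both negative-sample types can be sketched over a toy relation (embeddings hypothetical):

```python
import random

import numpy as np

random.seed(0)

# Hypothetical embeddings for a toy VERBPast relation.
emb = {"know": np.array([1.0, 0.0]), "knew": np.array([1.0, 1.0]),
       "grow": np.array([2.0, 0.0]), "grew": np.array([2.0, 1.0])}
pairs = [("know", "knew"), ("grow", "grew")]

# opposite pairs: swap the order, Oppos(w1, w2) = word1 - word2
oppos = [emb[w1] - emb[w2] for w1, w2 in pairs]

# shuffled pairs: keep w1, replace w2 with the second element of a
# *different* pair from the same relation, then take the usual difference
shuff = []
for i, (w1, _) in enumerate(pairs):
    j = random.choice([k for k in range(len(pairs)) if k != i])
    shuff.append(emb[pairs[j][1]] - emb[w1])

print(oppos[0], shuff[0])
```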
Binary classification with negative samples
Usage of Negative Samples
[Figure: per-relation precision and recall with and without negative samples (Pr: No NS, Pr: With NS, Re: No NS, Re: With NS) for Hyper, Event, Mero, NounSP, Verb3, VerbPast, Verb3Past, PrefixRe, NounColl.]
Lexical memorization
A Short Example
Train:
LEXSEMHyper: Hypernymy (animal, dog)
LEXSEMHyper: Hypernymy (animal, cat)
LEXSEMHyper: Hypernymy (animal, monkey)
Then at test time:
LEXSEMHyper: Hypernymy (animal, banana), i.e. the classifier has memorized "animal" as a hypernym and over-predicts the relation.
Lexical memorization: No lexical overlap between test and train
[Figure: precision, recall and F-score (P, R, F; P+neg, R+neg, F+neg with negative samples) as the volume of random word pairs grows from 0 to 5.]
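One way to rule out lexical memorization is a vocabulary-disjoint split: no word in a training pair may appear in a test pair. A minimal sketch (function and data hypothetical):

```python
def lexical_split(pairs, train_words):
    """Split word pairs so train and test share no vocabulary.
    Pairs mixing train and non-train words are discarded."""
    train, test = [], []
    for w1, w2 in pairs:
        if w1 in train_words and w2 in train_words:
            train.append((w1, w2))
        elif w1 not in train_words and w2 not in train_words:
            test.append((w1, w2))
    return train, test

pairs = [("animal", "dog"), ("animal", "cat"),
         ("plant", "tree"), ("plant", "rose")]
train, test = lexical_split(pairs, train_words={"animal", "dog", "cat"})
print(train)  # [('animal', 'dog'), ('animal', 'cat')]
print(test)   # [('plant', 'tree'), ('plant', 'rose')]
```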
Conclusion
Many types of morphosyntactic differences are captured by DIFFVECs; morphosemantic relations are somewhat harder, and lexical semantic relations are captured less well.
Classification over DIFFVECs works extremely well in a closed-world setup, but less well over open data.
With the introduction of automatically generated negative samples, however, the results improve substantially.
Open Questions
Could some examples be more representative of the relation type?
How much data do we need for the best generalization?
Some morphosemantic relations (derivations such as re- prefixing) and lexical semantic relations (meronymy) are still hard to capture.
Thanks
Thank you for your time and attention! Questions?
See more details here: http://arxiv.org/abs/1509.01692
