Module 4 – Semantic Analysis
Semantics – Lexical Semantics – Word Senses – Relations between Senses – Word Sense Disambiguation (WSD) – Word Similarity Analysis using Thesaurus and Distributional Methods – Word2vec – fastText Word Embedding – Lesk Algorithm – Thematic Roles, Semantic Role Labelling – Pragmatics Analysis – Anaphora Resolution.
Semantic Analysis
• Assigning meanings to the structures created by syntactic analysis.
• Mapping words and structures to particular domain objects in a way consistent with our knowledge of the world.
• Semantics can play an important role in selecting among competing syntactic analyses and discarding illogical analyses.
• Example: I robbed the bank – is bank a river bank or a financial institution?
• We have to decide on the formalism that will be used for the meaning representation.
Aspects of Semantic Analysis
Semantic Analysis
WordNet Synset Relationships
• Antonym: front → back
• Attribute: benevolence → good (noun to adjective)
• Pertainym: alphabetical → alphabet (adjective to noun)
• Similar: unquestioning → absolute
• Cause: kill → die
• Entailment: breathe → inhale
• Holonym: chapter → text (part to whole)
• Meronym: computer → cpu (whole to part)
• Hyponym: plant → tree (specialization)
• Hypernym: apple → fruit (generalization)
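These relations can be queried directly from WordNet. A minimal sketch with NLTK's WordNet interface (assumes NLTK is installed and the wordnet data has been downloaded with nltk.download('wordnet'); the printed relations are examples and may vary by sense):

from nltk.corpus import wordnet as wn

tree = wn.synset('tree.n.01')
print(tree.hypernyms())        # generalization, e.g. woody_plant
print(tree.hyponyms()[:3])     # specializations
print(tree.part_meronyms())    # parts of a tree, e.g. trunk, crown
print(tree.member_holonyms())  # wholes a tree belongs to, e.g. forest

print(wn.synset('kill.v.01').causes())        # cause relation, e.g. die
print(wn.synset('breathe.v.01').entailments())  # entailment, e.g. inhale, exhale

# Antonyms are defined on lemmas rather than synsets (may be empty for some senses)
front = wn.synset('front.n.01').lemmas()[0]
print(front.antonyms())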
Word Sense
Ways that dictionaries and thesauruses offer for defining senses:
• Glosses: textual definitions for each sense. Glosses are not a formal meaning representation; they are just written for people.
  Example: two senses of bank
  1. financial institution that accepts deposits and channels the money into lending activities
  2. sloping land (especially the slope beside a body of water)
• Sense relations: defining a sense through its relationships with other senses, which acts more like a formal meaning representation.
WordNet: A Database of Lexical Relations
• WordNet lexical database: English WordNet consists of three separate
databases, one each for nouns and verbs and a third for adjectives
and adverbs.
• Each database contains a set of lemmas, each one annotated with a
set of senses.
• The WordNet 3.0 release has 117,798 nouns, 11,529 verbs, 22,479
adjectives, and 4,481 adverbs.
• The average noun has 1.23 senses, and the average verb has 2.16
senses.
WordNet: A Database of Lexical Relations
• The set of near-synonyms for a WordNet sense is called a synset (for synonym set); synsets are an important primitive in WordNet. The entry for bass includes synsets like {bass⁶, bass voice¹, basso²}, where the superscripts give the sense numbers.
WordNet: A Database of Lexical Relations
• WordNet also labels each synset with a lexicographic category drawn from a semantic field; for nouns, for example, there are 26 such categories.
• These categories are often called supersenses, because they act as coarse semantic categories or groupings of senses, which can be useful when word senses are too fine-grained.
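A minimal sketch of inspecting senses and supersenses with NLTK's WordNet interface (assumes the wordnet data has been downloaded; output shown in comments is indicative):

from nltk.corpus import wordnet as wn

for syn in wn.synsets('bass'):
    # name() is the synset id, lexname() is the supersense / lexicographic
    # category (e.g. noun.food, noun.person), definition() is the gloss
    print(syn.name(), syn.lexname(), '-', syn.definition())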
WordNet: Sense Relations in WordNet
Word sense disambiguation
• The task of selecting the correct sense for a word is called word sense
disambiguation, or WSD.
• WSD algorithms take as input a word in context and a fixed inventory of potential word senses, and output the correct word sense for that context.
Word sense disambiguation – Task and dataset
• All-words task: the system is given an entire text and a lexicon with an inventory of senses for each entry, and we have to disambiguate every word in the text (or sometimes just every content word).
Word sense disambiguation – Task and dataset
• Supervised all-words disambiguation systems are generally trained from a semantic concordance: a corpus in which each open-class word in each sentence is labeled with its word sense from a specific dictionary or thesaurus, most often WordNet (e.g., SemCor).
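A minimal sketch of reading sense-labeled sentences from SemCor through NLTK (assumes nltk.download('semcor') and nltk.download('wordnet') have been run; the exact chunk structure may vary across NLTK versions):

from nltk.corpus import semcor

sent = semcor.tagged_sents(tag='sem')[0]
for chunk in sent:
    # sense-annotated chunks are Trees whose label is a WordNet lemma;
    # unannotated tokens (punctuation, function words) are plain lists
    if hasattr(chunk, 'label'):
        print(chunk.leaves(), '->', chunk.label())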
WSD BASELINE
• Most Frequent Sense:
• choose, for each word, the most frequent of its senses in a sense-labeled corpus.
• For WordNet, this corresponds to taking the first sense, since senses in WordNet are generally ordered from most frequent to least frequent based on their counts in the SemCor sense-tagged corpus.
• The most frequent sense baseline can be quite accurate, and is therefore often used as a default, to supply a word sense when a supervised algorithm has insufficient training data.
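A minimal sketch of the most-frequent-sense baseline using the WordNet sense ordering (assumes NLTK and the wordnet data; the first-listed synset is taken as the MFS guess):

from nltk.corpus import wordnet as wn

def mfs_baseline(word, pos=None):
    synsets = wn.synsets(word, pos=pos)
    return synsets[0] if synsets else None

print(mfs_baseline('bank', pos=wn.NOUN))  # first-listed noun sense of "bank"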
WSD BASELINE
• One sense per discourse:
• a word appearing multiple times in a text or discourse often appears
with the same sense.
• This heuristic holds better for homonymy than for fine-grained polysemy.
WSD ALGORITHM – CONTEXTUAL EMBEDDING
• At training time we pass each sentence in the SemCor labeled dataset through a contextual embedding model, producing a contextual embedding for each sense-labeled token in SemCor.
• For each sense s we then compute a sense embedding v_s by averaging the contextual embeddings c_i of the n tokens labeled with that sense:
  v_s = (1/n) · Σ_{i=1..n} c_i
WSD ALGORITHM – CONTEXTUAL
EMBEDDING
• At test time we similarly compute a contextual embedding t for the
target word, and choose its nearest neighbor sense (the sense with
the highest cosine similarity) from the training set.
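A minimal sketch of this nearest-neighbor procedure. Here contextual_embedding() is a hypothetical helper standing in for any contextual encoder (e.g. a BERT-style model), and labeled_data is assumed to hold (tokens, target index, sense id) triples from SemCor:

import numpy as np
from collections import defaultdict

def contextual_embedding(tokens, target_index):
    """Hypothetical helper: return the contextual vector of the target token."""
    raise NotImplementedError  # plug in a BERT-style encoder here

def build_sense_embeddings(labeled_data):
    # average the contextual embeddings of all tokens labeled with each sense
    buckets = defaultdict(list)
    for tokens, idx, sense in labeled_data:
        buckets[sense].append(contextual_embedding(tokens, idx))
    return {sense: np.mean(vecs, axis=0) for sense, vecs in buckets.items()}

def disambiguate(tokens, idx, sense_embeddings):
    # pick the sense whose embedding has the highest cosine with the target token
    t = contextual_embedding(tokens, idx)
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(sense_embeddings, key=lambda s: cos(t, sense_embeddings[s]))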
WSD Methods
• Supervised Machine Learning
• Thesaurus/Dictionary Methods
• Semi-Supervised Learning
Word Sense Disambiguation as Classification (slides by Dan Jurafsky)
Classification: definition
• Input:
  • a word w and some features f
  • a fixed set of classes C = {c1, c2, …, cJ}
• Output: a predicted class c ∈ C
Classification Methods: Supervised Machine Learning
• Input:
  • a word w in a text window d (which we’ll call a “document”)
  • a fixed set of classes C = {c1, c2, …, cJ}
  • a training set of m hand-labeled text windows, again called “documents”: (d1, c1), …, (dm, cm)
• Output:
  • a learned classifier γ: d → c
Classification Methods: Supervised Machine Learning
• Any kind of classifier:
  • Naive Bayes
  • Logistic regression
  • Neural networks
  • Support-vector machines
  • k-Nearest Neighbors
  • …
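A minimal sketch of such a supervised classifier for one target word with scikit-learn; the context windows and sense labels below are toy, hand-made examples, and any classifier from the list above could replace LogisticRegression:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy hand-labeled "documents": context windows around the target word "bass"
windows = ["he played the bass guitar on stage",
           "caught a huge bass in the lake",
           "the bass line drives the song",
           "fishing for bass near the river bank"]
senses = ["music", "fish", "music", "fish"]

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(windows, senses)
print(clf.predict(["he played a funky bass line"]))  # expected: ['music']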
Word Sense Disambiguation: Dictionary and Thesaurus Methods
The Simplified Lesk algorithm
• Labeled corpora are expensive and difficult to obtain.
• An alternative class of WSD algorithms, knowledge-based algorithms, rely on WordNet or other such resources and don’t require labeled data.
• While supervised algorithms generally work better, knowledge-based methods can be used in languages or domains where thesauruses or dictionaries, but not sense-labeled corpora, are available.
• The Lesk algorithm is the oldest and most powerful knowledge-based WSD method, and is a useful baseline.
• Lesk is really a family of algorithms that choose the sense whose dictionary gloss or definition shares the most words with the target word’s neighborhood (see the sketch below).
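A minimal sketch of the Simplified Lesk idea with NLTK's WordNet (assumes the wordnet data is downloaded; a real implementation would also remove stop words, and the test sentence is an illustrative example):

from nltk.corpus import wordnet as wn

def simplified_lesk(word, context_sentence):
    context = set(context_sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word):
        # signature = gloss words + example-sentence words for this sense
        signature = set(sense.definition().lower().split())
        for ex in sense.examples():
            signature |= set(ex.lower().split())
        overlap = len(signature & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("bank",
      "the bank can guarantee deposits will eventually cover future tuition costs"))
# expected: the financial-institution sense, via the overlap on "deposits"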
WSD: Word-in-Context Evaluation
• We can think of WSD as a kind of contextualized similarity task, since
our goal is to be able to distinguish the meaning of a word like bass in
one context (playing music) from another context (fishing).
• Here the system is given two sentences, each with the same target
word but in a different sentential context
• The system must decide whether the target words are used in the
same sense in the two sentences or in a different sense.
• In the Word-in-Context task, the word senses are first clustered into coarser clusters, so that the two sentential contexts for the target word are marked as T if the two senses fall in the same cluster, and F otherwise.
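A minimal sketch of a Word-in-Context style decision with contextual embeddings, reusing the hypothetical contextual_embedding() helper from the earlier WSD sketch; the threshold is purely illustrative:

import numpy as np

def same_sense(tokens1, idx1, tokens2, idx2, threshold=0.75):
    # compare the contextual vectors of the two target-word occurrences
    v1 = contextual_embedding(tokens1, idx1)
    v2 = contextual_embedding(tokens2, idx2)
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return 'T' if cos >= threshold else 'F'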
WSD- Wikipedia as a source of training data
• One important direction is to use Wikipedia as a source of sense-
labeled data.
• When a concept is mentioned in a Wikipedia article, the article text
may contain an explicit link to the concept’s Wikipedia page, which is
named by a unique identifier.
• This link can be used as a sense annotation.
• These sentences can then be added to the training data for a
supervised system
Extended Lesk Algorithm
WORD SIMILARITY
• Vector semantics is the standard way to represent word meaning in NLP.
• The idea of vector semantics is to represent a word as a point in a multidimensional semantic space that is derived (in ways we’ll see) from the distributions of the word’s neighbors; such representations are called embeddings.
WORD SIMILARITY
• For example, suppose you didn’t know the meaning of the word ongchoi (a recent borrowing from Cantonese) but you see it in the following contexts:
  (6.1) Ongchoi is delicious sauteed with garlic.
  (6.2) Ongchoi is superb over rice.
  (6.3) Ongchoi leaves with salty sauces...
• And suppose that you had seen many of these context words in other contexts:
  (6.4) ...spinach sauteed with garlic over rice...
  (6.5) ...chard stems and leaves are delicious...
  (6.6) ...collard greens and other salty leafy greens
• Conclusion: ongchoi is a leafy green similar to these other leafy greens.
Vectors and documents
• A term-document matrix: each row represents a word in the vocabulary and each column represents a document from some collection of documents.
• A vector space is a collection of vectors, characterized by their dimensionality.
• The ordering of the numbers in a vector indicates different meaningful dimensions on which documents vary.
• Term-document matrices were originally defined as a means of finding similar
documents for the task of document information retrieval.
• Two documents that are similar will tend to have similar words, and if two
documents have similar words their column vectors will tend to be similar.
• The vectors for the comedies As You Like It [1,114,36,20] and Twelfth Night
[0,80,58,15] look a lot more like each other (more fools and wit than battles)
than they look like Julius Caesar [7,62,1,2] or Henry V [13,89,4,3].
Words as vectors: document dimensions
• wit = [20,15,2,3]; battle = [1,0,7,13]; and good = ?, fool = ?
• Answer: good = [114,80,62,89]; fool = [36,58,1,4]
Words as vectors: word dimensions
Word-word matrix or the term-context matrix: The columns are labeled by words
rather than documents
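A minimal sketch of building such a word-word (term-context) co-occurrence matrix from a toy corpus with a ±2-word window (the sentences are illustrative, echoing the ongchoi example):

from collections import Counter, defaultdict

sentences = ["ongchoi is delicious sauteed with garlic",
             "spinach sauteed with garlic over rice",
             "chard stems and leaves are delicious"]

window = 2
cooc = defaultdict(Counter)
for sent in sentences:
    tokens = sent.split()
    for i, w in enumerate(tokens):
        # count every word within +/- window positions of w
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                cooc[w][tokens[j]] += 1

print(cooc["sauteed"])  # e.g. {'with': 2, 'garlic': 2, 'is': 1, 'delicious': 1, 'spinach': 1}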
Retrieval in vector space model
• Query q is represented in the same way (or slightly differently).
• Relevance of di to q: compare the similarity of query q and document di.
• Cosine similarity: the cosine of the angle between the two vectors.
• Cosine is also commonly used in text clustering.
Cosine for measuring similarity
• To measure the similarity between two target words v and w, we need a metric that takes two vectors of the same dimensionality (either both with words as dimensions or both with documents as dimensions).
• By far the most common similarity metric is the cosine of the angle between the vectors.
Cosine for measuring similarity: vector length and the cosine formula
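For reference, the standard definitions behind these slides: the vector length is |v| = √(Σᵢ vᵢ²), and
cosine(v, w) = (v · w) / (|v| |w|) = ( Σᵢ vᵢ wᵢ ) / ( √(Σᵢ vᵢ²) √(Σᵢ wᵢ²) ).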
An Example
• A document space is defined by three terms (the vocabulary): hardware, software, users.
• A set of documents is defined as:
  A1=(1, 0, 0), A2=(0, 1, 0), A3=(0, 0, 1)
  A4=(1, 1, 0), A5=(1, 0, 1), A6=(0, 1, 1)
  A7=(1, 1, 1), A8=(1, 0, 1), A9=(0, 1, 1)
• If the query is “hardware and software”, what documents should be retrieved?
An Example (cont.)
• In Boolean query matching:
  • documents A4 and A7 will be retrieved (“AND”)
  • documents A1, A2, A4, A5, A6, A7, A8, A9 will be retrieved (“OR”)
• In similarity matching (cosine), with q=(1, 1, 0):
  • S(q, A1)=0.71, S(q, A2)=0.71, S(q, A3)=0
  • S(q, A4)=1, S(q, A5)=0.5, S(q, A6)=0.5
  • S(q, A7)=0.82, S(q, A8)=0.5, S(q, A9)=0.5
• Documents retrieved (ranked): {A4, A7, A1, A2, A5, A6, A8, A9}
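A minimal sketch reproducing the cosine scores above with NumPy, using the document vectors and query exactly as given in the slide:

import numpy as np

docs = {'A1': [1,0,0], 'A2': [0,1,0], 'A3': [0,0,1],
        'A4': [1,1,0], 'A5': [1,0,1], 'A6': [0,1,1],
        'A7': [1,1,1], 'A8': [1,0,1], 'A9': [0,1,1]}
q = np.array([1, 1, 0])  # query "hardware and software"

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = {name: round(cosine(q, np.array(v)), 2) for name, v in docs.items()}
print(sorted(scores.items(), key=lambda kv: -kv[1]))
# A4: 1.0, A7: 0.82, A1/A2: 0.71, A5/A6/A8/A9: 0.5, A3: 0.0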
Word Similarity Analysis using Thesaurus and Distributional Methods
• Example: the similarity of deer and elk in the WordNet hierarchy.
Editor's Notes
• On holonymy and meronymy: a holonym is a word that refers to a complete thing, while another word (its meronym) denotes a part of that thing. In NLP, holonymy can help improve the understanding and interpretation of language: for example, if you cannot remember the term "air_bag", searching for "car" instead may still return results that include "air_bag". Conversely, a part-denoting term can be used to refer to the whole, e.g. faces used to mean people in "I see several familiar faces present."
• On pertainyms: pertainyms are usually defined by phrases like "of or pertaining to" and do not have antonyms.