DLO8012: Natural Language
Processing
Subject Teacher:
Prof. Vikas Dubey
RIZVI COLLEGE OF ENGINEERING
BANDRA (W), MUMBAI
Module-3
Syntax Analysis
CO-3 [10hrs]
CO-3: Be able to model linguistic phenomena with formal grammars.
Conditional Probability and Tags
• P(Verb) is the probability of a randomly selected word being a verb.
• P(Verb|race) is “what’s the probability of a word being a verb, given that it’s the word ‘race’?”
• Race can be a noun or a verb.
• It’s more likely to be a noun.
• P(Verb|race) can be estimated by looking at some corpus and asking “out of all the times we saw ‘race’, how many were verbs?”
• In the Brown corpus, 96 of the 98 occurrences of ‘race’ are nouns, so P(Noun|race) = 96/98 = .98 and P(Verb|race) ≈ .02.
• How to calculate for a tag sequence, say P(NN|DT)?

P(V | race) = Count(race is verb) / total Count(race)
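As a concrete illustration of this estimate, the sketch below tallies the tags observed for “race”, here using NLTK's copy of the Brown corpus (an assumption of this example: NLTK is installed and the 'brown' and 'universal_tagset' data have been downloaded; the exact counts depend on the tagset used).

from collections import Counter
from nltk.corpus import brown

# Tally the part-of-speech tags observed for the word "race" in the Brown corpus.
tag_counts = Counter(
    tag for word, tag in brown.tagged_words(tagset="universal")
    if word.lower() == "race"
)

total = sum(tag_counts.values())
for tag, count in tag_counts.most_common():
    print(f"P({tag} | race) = {count}/{total} = {count / total:.2f}")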
Stochastic Tagging
• Stochastic taggers generally resolve tagging ambiguities by using a
training corpus to compute the probability of a given word having a
given tag in a given context.
• A stochastic tagger is also called an HMM tagger, a Maximum Likelihood tagger, or a Markov model tagger, because it is based on the Hidden Markov Model.
• For a given word sequence, Hidden Markov Model (HMM) taggers choose the tag sequence that maximizes
P(word | tag) * P(tag | previous-n-tags)
Stochastic Tagging
• A bigram HMM tagger chooses the tag ti for word wi that is most
probable given the previous tag, ti-1
ti = argmaxj P(tj | ti-1, wi)
• The joint probability of tags and words is first factorized using the chain rule (a standard reconstruction is shown below).
• Some approximations are then introduced to simplify the model, as spelled out on the next slide.
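The slide's equation is not reproduced in this text; the standard chain-rule factorization it refers to is

P(t1 … tn, w1 … wn) = Πi P(wi | w1 … wi-1, t1 … ti) * P(ti | w1 … wi-1, t1 … ti-1)

i.e. each word is conditioned on all preceding words and tags (including its own tag), and each tag on all preceding words and tags.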
Stochastic Tagging
• The word probability depends only on the tag.
• The dependence of a tag on the preceding tag history is limited in time, i.e. a tag depends only on the two preceding ones (written out below).
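Written out (a standard reconstruction of the approximations the slide refers to):

P(wi | w1 … wi-1, t1 … ti) ≈ P(wi | ti)
P(ti | w1 … wi-1, t1 … ti-1) ≈ P(ti | ti-1, ti-2)

which together give the trigram HMM model P(T, W) ≈ Πi P(ti | ti-1, ti-2) * P(wi | ti).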
Statistical POS Tagging (Allen95)
• Let’s step back a minute and remember some probability theory and its use in
POS tagging.
• Suppose that, with no context, we just want to know whether the word “flies” should be tagged as a noun or as a verb.
• We use conditional probability for this: we want to know which is greater
PROB(N | flies) or PROB(V | flies)
• Note definition of conditional probability
PROB(a | b) = PROB(a & b) / PROB(b)
– Where PROB(a & b) is the probability of the two events a & b occurring
simultaneously
Calculating POS for “flies”
We need to know which is greater:
• PROB(N | flies) = PROB(flies & N) / PROB(flies)
• PROB(V | flies) = PROB(flies & V) / PROB(flies)
• Count on a Corpus
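Since PROB(flies) appears in both denominators, it cancels when the two values are compared, so the decision reduces to comparing the corpus counts of flies tagged as N versus as V. With purely illustrative counts (invented for this example, not taken from a real corpus), say flies occurred 40 times, 15 as a noun and 25 as a verb:

PROB(N | flies) = 15/40 = .375 and PROB(V | flies) = 25/40 = .625

so the verb tag would be chosen.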
Stochastic Tagging
• The simplest stochastic taggers apply one of the following approaches to POS tagging:
Approach 1: Word Frequency Approach
• In this approach, the stochastic taggers disambiguate the words based on
the probability that a word occurs with a particular tag.
• We can also say that the tag encountered most frequently with the word in
the training set is the one assigned to an ambiguous instance of that word.
• The main issue with this approach is that it may yield inadmissible sequences of tags, since each word is tagged in isolation.
Stochastic Tagging
• Assign each word its most likely POS tag
– If w has tags t1, …, tk, then we can use
– P(ti | w) = c(w, ti) / (c(w, t1) + … + c(w, tk)), where
– c(w, ti) = number of times w/ti appears in the corpus
– Success: 91% for English
Example: heat :: noun/89, verb/5, so heat is always tagged as a noun (P(noun | heat) = 89/94 ≈ .95). A small sketch of this baseline follows.
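A minimal sketch of this word-frequency baseline in Python, assuming training data given as lists of (word, tag) pairs; the function names and the fallback tag for unseen words are choices made for this illustration, not part of the lecture.

from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sentences):
    """Learn, for every word, the tag it appears with most often in the training data
    (tagged_sentences: an iterable of [(word, tag), ...] sentences)."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    # Overall most frequent tag, used as a fallback for words never seen in training.
    overall = Counter()
    for c in counts.values():
        overall.update(c)
    default_tag = overall.most_common(1)[0][0]
    model = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return model, default_tag

def tag_sentence(words, model, default_tag):
    """Tag each word with its most frequent training tag, ignoring context."""
    return [(w, model.get(w.lower(), default_tag)) for w in words]

Falling back to the globally most frequent tag for unknown words is one simple choice; defaulting to the noun tag is another common option.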
Stochastic Tagging
Approach 2: Tag Sequence Probabilities
• It is another approach to stochastic tagging, in which the tagger calculates the probability of a given sequence of tags occurring.
• It is also called the n-gram approach.
• It is so called because the best tag for a given word is determined by the probability of it occurring with the n previous tags (written out below).
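In symbols (standard form; the slide itself gives no equation), the tag-sequence probability is approximated as

P(t1, t2, …, tn) ≈ Πi P(ti | ti-n+1 … ti-1)

so a bigram model uses Πi P(ti | ti-1) and a trigram model uses Πi P(ti | ti-1, ti-2).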
Stochastic Tagging
• Given: sequence of words W
– W = w1,w2,…,wn (a sentence)
– e.g., W = heat water in a large vessel
• Assign sequence of tags T:
• T = t1, t2, … , tn
• Find T that maximizes P(T | W)
Stochastic Tagging
• But P(T | W) is difficult to compute directly, so the Bayesian classification rule (Bayes' rule) is used:
P(x|y) = P(x) P(y|x) / P(y)
• Applied to the sequence of words, the most probable tag sequence is found from
P(T | W) = P(T) P(W | T) / P(W)
• where P(W) does not change across candidate tag sequences and thus does not need to be calculated.
• Thus, the most probable tag sequence is the one that maximizes the product of two probabilities:
– the prior probability of the tag sequence (the context), P(T)
– the likelihood of the sequence of words given the sequence of (hidden) tags, P(W | T)
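Written out (a standard derivation; the slide's own equations are not reproduced in this text):

T* = argmaxT P(T | W) = argmaxT P(W | T) * P(T) / P(W) = argmaxT P(W | T) * P(T)

since P(W) is the same for every candidate tag sequence T.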
Stochastic Tagging
• Two simplifications are used for computing the most probable sequence of tags:
– The prior probability of a word's part-of-speech tag depends only on the tag of the previous word (bigrams: the context is reduced to the previous tag). This facilitates the computation of the tag-sequence prior P(T).
– Ex.: the probability of a noun after a determiner.
– The probability of a word depends only on its part-of-speech tag (independent of the other words in the context). This facilitates the computation of the likelihood terms P(wi | ti).
• Ex.: given the tag noun, the probability of the word dog.
A sketch of a tagger built on these two simplifications is given below.
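Under these two simplifications the quantity to maximize becomes

P(T) * P(W | T) ≈ Πi P(ti | ti-1) * P(wi | ti)

and the maximizing tag sequence can be found with the Viterbi algorithm. The sketch below is a minimal, illustrative Python implementation under exactly these assumptions; the function names, the add-alpha smoothing, and the "<s>" start symbol are choices made for this example, not part of the lecture.

import math
from collections import Counter, defaultdict

def train_bigram_hmm(tagged_sentences, alpha=0.001):
    """Estimate P(tag | previous tag) and P(word | tag) from [(word, tag), ...] sentences,
    with add-alpha smoothing so unseen events get a small nonzero probability."""
    transitions = defaultdict(Counter)   # previous tag -> Counter of next tags
    emissions = defaultdict(Counter)     # tag -> Counter of words
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"                      # sentence-start pseudo-tag
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word.lower()] += 1
            vocab.add(word.lower())
            prev = tag
    tags = list(emissions)

    def p_trans(prev, tag):
        c = transitions[prev]
        return (c[tag] + alpha) / (sum(c.values()) + alpha * len(tags))

    def p_emit(tag, word):
        c = emissions[tag]
        return (c[word] + alpha) / (sum(c.values()) + alpha * (len(vocab) + 1))

    return tags, p_trans, p_emit

def viterbi(words, tags, p_trans, p_emit):
    """Return the tag sequence maximizing the product of P(ti | ti-1) * P(wi | ti)."""
    if not words:
        return []
    words = [w.lower() for w in words]
    # best[i][t] = (log-probability of the best path ending in tag t at position i, back-pointer)
    best = [{t: (math.log(p_trans("<s>", t)) + math.log(p_emit(t, words[0])), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, back = max(
                (best[i - 1][p][0] + math.log(p_trans(p, t)) + math.log(p_emit(t, words[i])), p)
                for p in tags
            )
            column[t] = (score, back)
        best.append(column)
    # Trace back from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))

For instance, one could train on tagged_sentences = nltk.corpus.brown.tagged_sents(tagset='universal') (if NLTK and its Brown corpus are available) and then call viterbi("heat water in a large vessel".split(), tags, p_trans, p_emit).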
Stochastic Tagging
• Tags are chosen based on the probability of each candidate tag occurring among the various possibilities, estimated from data.
• This necessitates a training corpus.
• There are no probabilities for words not in the corpus.
• The training corpus may be too different from the test corpus.
Stochastic Tagging (cont.)
Simple Method: Choose most frequent tag in training text
for each word!
– Result: 90% accuracy
– Why mention it?
– It is a baseline: other methods will need to do better.
– The HMM tagger is one such method.
Thank You…