Word Level Analysis
POS (Continued)
L3 NLP (Elective)
TJS
Words and Word Classes
• Words are classified into categories called parts of speech (also known as word classes or lexical categories)
Part of Speech
NN  noun         student, chair, proof, mechanism
VB  verb         study, increase, produce
JJ  adjective    large, high, tall, few
RB  adverb       carefully, slowly, uniformly
IN  preposition  in, on, to, of
PRP pronoun      I, me, they
DT  determiner   the, a, an, this, those
• open vs. closed word classes
Part of Speech tagging
• process of assigning a part of speech (noun, verb, pronoun, preposition, adverb, adjective, etc.) to each word in a sentence
Words + tag set → POS tagger → POS tags
Speech/NN sounds/NNS were/VBD sampled/VBN by/IN a/DT microphone/NN.
Another possible tagging for the sentence is:
Speech/NN sounds/VBZ were/VBD sampled/VBN by/IN a/DT microphone/NN.
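As a quick illustration of the words + tag set → POS tagger pipeline, the sketch below runs an off-the-shelf tagger over the example sentence. It assumes the NLTK library and its tokenizer/tagger data are installed; the tags it produces may differ slightly from the hand-tagged example above.

```python
# Minimal sketch: tagging a sentence with NLTK's off-the-shelf tagger.
# Assumes: pip install nltk, plus the 'punkt' and 'averaged_perceptron_tagger' data.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "Speech sounds were sampled by a microphone."
tokens = nltk.word_tokenize(sentence)   # ['Speech', 'sounds', 'were', 'sampled', ...]
print(nltk.pos_tag(tokens))             # e.g. [('Speech', 'NN'), ('sounds', 'NNS'), ('were', 'VBD'), ...]
```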
Part of speech tagging methods
• Rule-based (linguistic)
• Stochastic (data-driven)
• TBL (Transformation-Based Learning)
Rule-based (linguistic)
Steps:
1. Dictionary lookup → potential tags
2. Hand-coded rules
Example: The show must go on.
Step 1 → show: NN, VB
Step 2 → discard the incorrect tag
Rule: IF the preceding word is a determiner THEN eliminate the VB tag.
• Morphological information
IF the word ends in -ing and the preceding word is a verb THEN label it a verb (VB).
• Capitalization information
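A toy sketch of these two steps is shown below; the mini-dictionary and the two rules are illustrative (taken from the examples above), not a real rule-based tagger.

```python
# Toy rule-based tagger sketch: dictionary lookup for candidate tags,
# then hand-coded rules to prune them. The lexicon is hypothetical.
AMBIGUOUS_LEXICON = {          # word -> candidate tags
    "the": {"DT"},
    "show": {"NN", "VB"},
    "must": {"MD"},
    "go": {"VB"},
    "on": {"IN", "RB"},
}

def tag(tokens):
    tags = []
    for i, word in enumerate(tokens):
        candidates = set(AMBIGUOUS_LEXICON.get(word.lower(), {"NN"}))
        # Rule: IF the preceding word is a determiner THEN eliminate the VB tag.
        if i > 0 and tags[i - 1] == "DT" and len(candidates) > 1:
            candidates.discard("VB")
        # Rule: IF the word ends in -ing and the preceding word is a verb,
        # label it a verb.
        if word.endswith("ing") and i > 0 and tags[i - 1].startswith("VB"):
            candidates = {"VB"}
        tags.append(sorted(candidates)[0])   # pick arbitrarily among what remains
    return list(zip(tokens, tags))

print(tag("The show must go on".split()))
# [('The', 'DT'), ('show', 'NN'), ('must', 'MD'), ('go', 'VB'), ('on', 'IN')]
```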
Advantages (+):
• Speed
• Deterministic
Disadvantages (-):
• Requires manual work
• Usable for only one language
Stochastic Tagger
• The standard stochastic tagger algorithm is the Hidden Markov Model (HMM) tagger.
• A Markov model applies the simplifying assumption that the probability of a chain of symbols can be approximated in terms of its parts, or n-grams.
• The simplest n-gram model is the unigram model, which assigns the most likely tag (part of speech) to each token.
• The unigram model requires tagged data to gather the most-likely-tag statistics. The context used by the unigram tagger is the text of the word itself. For example, it will assign the tag JJ to each occurrence of fast if fast is used as an adjective more frequently than it is used as a noun, verb, or adverb.
(1) She had a fast.
(2) Muslim fast during Ramadan.
(3) Those who are injured need medical help fast.
• We would expect more accurate predictions if we took more context into account when making a tagging decision.
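A minimal unigram tagger along these lines can be built by counting word/tag pairs in a tagged corpus and always emitting the most frequent tag for each word. The sketch below assumes NLTK's sample of the Penn Treebank is available.

```python
# Unigram tagger sketch: most frequent tag per word, counted from tagged data.
from collections import Counter, defaultdict
import nltk

nltk.download("treebank", quiet=True)
train_sents = nltk.corpus.treebank.tagged_sents()

tag_counts = defaultdict(Counter)            # word -> Counter of tags seen for it
for sent in train_sents:
    for word, tag in sent:
        tag_counts[word.lower()][tag] += 1

def unigram_tag(tokens, default="NN"):
    # Most frequent tag per word; unseen words fall back to a default tag.
    return [(w, tag_counts[w.lower()].most_common(1)[0][0] if tag_counts[w.lower()] else default)
            for w in tokens]

print(unigram_tag("She had a fast".split()))
# e.g. [('She', 'PRP'), ('had', 'VBD'), ('a', 'DT'), ('fast', <its most frequent tag>)]
```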
• A bi-gram tagger uses the current word and the tag of the previous word in the tagging process. As the tag sequence “DT NN” is more likely than the tag sequence “DT JJ”, a bi-gram model will assign a correct tag to the word fast in sentence (1).
• Similarly, it is more likely that an adverb (rather than a noun or an adjective) follows a verb. Hence, in sentence (3), the tag assigned to fast will be RB (adverb).
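NLTK ships unigram and bigram taggers that follow this scheme. A sketch, assuming the treebank sample corpus is available, with the bigram tagger backing off to the unigram tagger for unseen contexts:

```python
# Bigram tagger sketch with NLTK: condition on the previous tag,
# back off to a unigram tagger when the bigram context is unseen.
import nltk
from nltk.tag import UnigramTagger, BigramTagger

nltk.download("treebank", quiet=True)
sents = nltk.corpus.treebank.tagged_sents()
train, test = sents[:3000], sents[3000:]

uni = UnigramTagger(train)
bi = BigramTagger(train, backoff=uni)    # fall back to the unigram model for unseen contexts

print(bi.tag("She had a fast".split()))
print("accuracy:", round(bi.accuracy(test), 3))   # .accuracy() on recent NLTK; older versions use .evaluate()
```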
N-gram Model
• An n-gram model considers the current word and the tags of the previous n-1 words in assigning a tag to a word.
Fig. Context used by a tri-gram model
HMM Tagger
• Given a sequence of words (a sentence), the objective is to find the most probable tag sequence for the sentence.
• Let W be the sequence of words:
W = w1, w2, …, wn
• The task is to find the tag sequence
T = t1, t2, …, tn
which maximizes P(T|W), i.e.,
T' = argmaxT P(T|W)
• Applying Bayes' rule, P(T|W) can be estimated using the expression:
P(T|W) = P(W|T) * P(T) / P(W)
• As the probability of the word sequence, P(W), remains the same for each tag sequence, we can drop it. The expression for the most likely tag sequence becomes:
T' = argmaxT P(W|T) * P(T)
• Using the Markov assumption, the probability of a tag sequence can be estimated as the product of the probabilities of its constituent n-grams, i.e.,
P(T) = P(t1) * P(t2|t1) * P(t3|t1,t2) * … * P(tn|t1 … tn-1)
• P(W|T) is the probability of seeing a word sequence, given a tag sequence.
• For example, what is the probability of seeing ‘The egg is rotten’ given ‘DT NNP VB JJ’?
• We make the following two assumptions:
1. The words are independent of each other, and
2. The probability of a word is dependent only on its tag.
Using these assumptions, P(W|T) can be expressed as:
P(W|T) = P(w1|t1) * P(w2|t2) * … * P(wi|ti) * … * P(wn|tn)
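Putting the two factors together, the tagger scores each candidate tag sequence by P(T) * P(W|T) and picks the best one. The brute-force sketch below uses made-up transition and emission probabilities purely to show the computation; a real HMM tagger estimates them from a tagged corpus and uses the Viterbi algorithm instead of enumerating all sequences.

```python
# Score candidate tag sequences under the HMM factorization
#   P(T) * P(W|T) = P(t1) * prod P(t_i|t_{i-1}) * prod P(w_i|t_i)
# All probabilities below are illustrative toy values, not corpus estimates.
import math
from itertools import product

words = ["the", "egg", "is", "rotten"]
candidates = [["DT"], ["NN", "VB"], ["VBZ"], ["JJ", "VBN"]]   # hypothetical tag options per word

trans = {("<s>", "DT"): 0.6, ("DT", "NN"): 0.5, ("DT", "VB"): 0.01,
         ("NN", "VBZ"): 0.3, ("VB", "VBZ"): 0.05,
         ("VBZ", "JJ"): 0.2, ("VBZ", "VBN"): 0.1}
emit = {("the", "DT"): 0.7, ("egg", "NN"): 0.002, ("egg", "VB"): 0.0001,
        ("is", "VBZ"): 0.5, ("rotten", "JJ"): 0.001, ("rotten", "VBN"): 0.0002}

def score(tags):
    """log[ P(T) * P(W|T) ] under the bigram tag model and the independence assumptions."""
    logp, prev = 0.0, "<s>"
    for w, t in zip(words, tags):
        logp += math.log(trans.get((prev, t), 1e-8))   # P(t_i | t_{i-1})
        logp += math.log(emit.get((w, t), 1e-8))       # P(w_i | t_i)
        prev = t
    return logp

best = max(product(*candidates), key=score)
print(best)   # highest-scoring sequence under these toy numbers: ('DT', 'NN', 'VBZ', 'JJ')
```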
• Some of the possible tag sequences:
DT NNP NNP NNP
DT NNP MD VB or DT NNP MD NNP (output → most likely)
Brill Tagger
• Initial state: most likely tag
• Transformation: the text is then passed through an ordered list of transformations. Each transformation is a pair of a rewrite rule and a contextual condition.
Learning Rules
Rules are learned in the following manner:
1. Each rule, i.e. each possible transformation, is applied to each matching word-tag pair.
2. The number of tagging errors is measured against the correct sequences of the training corpus ("truth").
3. The transformation which yields the greatest error reduction is chosen.
4. Learning stops when no transformation can be found that, if applied, reduces errors beyond some given threshold.
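This training loop is available in NLTK's transformation-based learning module. A sketch, assuming NLTK 3.x and its treebank sample, that starts from a unigram initial tagger and learns up to 20 rules:

```python
# Brill/TBL sketch with NLTK: unigram initial tagger, then learn an ordered
# list of transformation rules against the training corpus ("truth").
import nltk
from nltk.tag import UnigramTagger
from nltk.tag.brill import fntbl37                    # a standard set of rule templates
from nltk.tag.brill_trainer import BrillTaggerTrainer

nltk.download("treebank", quiet=True)
sents = nltk.corpus.treebank.tagged_sents()
train, test = sents[:3000], sents[3000:]

initial = UnigramTagger(train)                        # initial state: most likely tag per word
trainer = BrillTaggerTrainer(initial, templates=fntbl37())
brill = trainer.train(train, max_rules=20)            # stop after 20 transformations are learned

for rule in brill.rules()[:5]:                        # inspect the first few learned transformations
    print(rule)
```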
• The set of possible ‘transforms’ is infinite, e.g., “transform NN to VB if the previous word was MicrosoftWindoze & the word braindead occurs between 17 and 158 words before that”
• To limit this, start with a small set of abstracted transforms, or templates
Templates used: Change a to b when…
Rules learned by TBL tagger
Lexicalized transformations
Brill complements the rule schemes with so-called lexicalized rules, which refer to particular words in the condition part of the transformation:
• Change a to b if
1. the preceding (following, current) word is C
2. the preceding (following, current) word is C and the preceding (following) word is tagged d
etc.
Unknown words
• In handling unknown words, a POS tagger can adopt the following strategies:
• assign all possible tags to the unknown word
• assign the most probable tag to the unknown word
• assume the same distribution as ‘things seen once’ (an estimator of ‘things never seen’)
• use word features, i.e. how words are spelled (prefixes, suffixes, word length, capitalization), to guess a (set of) word class(es) -- most powerful
Most powerful unknown word detectors
• 32 derivational endings (-ion, etc.)
• capitalization; hyphenation
• More generally: should use morphological analysis! (and some kind of machine learning approach)
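A spelling-feature guesser of this kind can be sketched as below; the suffix lists and rules are illustrative choices, not the 32 derivational endings referred to above.

```python
# Sketch: guess a tag set for an unknown word from its spelling
# (suffixes, capitalization, hyphenation). Illustrative rules only.
def guess_tags(word, sentence_initial=False):
    guesses = set()
    if word[:1].isupper() and not sentence_initial:
        guesses.add("NNP")                       # capitalized mid-sentence: likely proper noun
    if "-" in word:
        guesses.add("JJ")                        # hyphenated words are often adjectives
    if word.endswith(("tion", "ment", "ness", "ity", "ance", "ence")):
        guesses.add("NN")                        # derivational noun endings
    if word.endswith(("ize", "ise", "ate", "ify")):
        guesses.add("VB")                        # derivational verb endings
    if word.endswith("ly"):
        guesses.add("RB")
    if word.endswith(("able", "ible", "ous", "ful", "ive", "al")):
        guesses.add("JJ")
    if word.endswith("ing"):
        guesses.update({"VBG", "NN"})
    if word.endswith("ed"):
        guesses.update({"VBD", "VBN"})
    return guesses or {"NN"}                     # default: common noun

print(guess_tags("microphonization"))   # {'NN'}
print(guess_tags("braindead-ish"))      # hyphenated -> {'JJ'}
```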