Part-of-Speech Tagging
Natural Language Processing
Emory University
Jinho D. Choi
Part-of-Speech Tagging
Classify the part-of-speech tag of each token.

Token      Part of speech
Jinho      noun (proper)
is         verb (3rd person, present)
a          det.
professor  noun (common)
Supervised NLP
1. Collect
2. Train
3. Evaluate
a. Design a processing algorithm.
b. Extract (label, features) pairs.
c. Vectorize labels and features.
d. Build statistical models.
https://github.com/emory-courses/cs571/wiki/Part-of-Speech-Tags
Feature Extraction
Feature template: {w_i, w_{i-1}, w_{i+1}, p_{i-1}}

John/NNP is/VBZ a/DT teacher/NN

Label  F0 (w_i)  F1 (w_{i-1})  F2 (w_{i+1})  F3 (p_{i-1})
NNP    John      ∅             is            ∅
VBZ    is        John          a             NNP
DT     a         is            teacher       VBZ
NN     teacher   a             ∅             DT

John/NNP was/VBD a/DT student/NN

Label  F0 (w_i)  F1 (w_{i-1})  F2 (w_{i+1})  F3 (p_{i-1})
NNP    John      ∅             was           ∅
VBD    was       John          a             NNP
DT     a         was           student       VBD
NN     student   a             ∅             DT

Extract the label and the features given the current state.
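As a minimal sketch of this step, the Python below builds the (label, features) pairs for one tagged sentence. The names `parse` and `extract` are hypothetical, and using the gold tag for p_{i-1} is an assumption about training time; at tagging time the previous tag would presumably come from the model's own prediction.

```python
from typing import List, Tuple

NONE = "∅"  # placeholder for a missing context, as in the tables above


def parse(sentence: str) -> List[Tuple[str, str]]:
    """Split 'John/NNP is/VBZ' into [('John', 'NNP'), ('is', 'VBZ')]."""
    return [tuple(tok.rsplit("/", 1)) for tok in sentence.split()]


def extract(tokens: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Return one (label, [F0, F1, F2, F3]) pair per token.

    F0 = w_i, F1 = w_{i-1}, F2 = w_{i+1}, F3 = p_{i-1}.
    """
    pairs = []
    for i, (word, tag) in enumerate(tokens):
        prev_word = tokens[i - 1][0] if i > 0 else NONE
        next_word = tokens[i + 1][0] if i + 1 < len(tokens) else NONE
        prev_tag = tokens[i - 1][1] if i > 0 else NONE  # gold tag (assumption)
        pairs.append((tag, [word, prev_word, next_word, prev_tag]))
    return pairs


pairs = extract(parse("John/NNP is/VBZ a/DT teacher/NN"))
for label, feats in pairs:
    print(label, *feats)
```

Each printed line corresponds to one row of the first table: the label followed by F0 through F3.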
Feature Extraction
Label  F0       F1    F2       F3
NNP    John     ∅     is       ∅
VBZ    is       John  a        NNP
DT     a        is    teacher  VBZ
NN     teacher  a     ∅        DT
NNP    John     ∅     was      ∅
VBD    was      John  a        NNP
DT     a        was   student  VBD
NN     student  a     ∅        DT

Count:

Label  {NNP:2, VBZ:1, DT:2, NN:2, VBD:1}
F0     {John:2, is:1, a:2, teacher:1, was:1, student:1}
F1     {John:2, is:1, a:2, was:1}
F2     {is:1, a:2, teacher:1, was:1, student:1}
F3     {NNP:2, VBZ:1, DT:2, VBD:1}

Filter out ones whose frequencies ≤ cutoff (cutoff = 1).
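The counting and filtering can be sketched with `collections.Counter`. The instance list is copied from the table above, with `None` standing in for ∅; the helper name `keep` is hypothetical.

```python
from collections import Counter

# (label, [F0, F1, F2, F3]) pairs from the table; None marks the empty context.
pairs = [
    ("NNP", ["John", None, "is", None]),
    ("VBZ", ["is", "John", "a", "NNP"]),
    ("DT", ["a", "is", "teacher", "VBZ"]),
    ("NN", ["teacher", "a", None, "DT"]),
    ("NNP", ["John", None, "was", None]),
    ("VBD", ["was", "John", "a", "NNP"]),
    ("DT", ["a", "was", "student", "VBD"]),
    ("NN", ["student", "a", None, "DT"]),
]
CUTOFF = 1

# One counter for the labels and one per feature template F0..F3.
label_counts = Counter(label for label, _ in pairs)
feature_counts = [Counter() for _ in range(4)]
for _, feats in pairs:
    for slot, value in enumerate(feats):
        if value is not None:
            feature_counts[slot][value] += 1


def keep(counts):
    """Drop every entry whose frequency is <= CUTOFF."""
    return {key: n for key, n in counts.items() if n > CUTOFF}


kept_labels = keep(label_counts)
kept_features = [keep(c) for c in feature_counts]
print(kept_labels)
print(kept_features)
```

With a cutoff of 1, only the labels NNP, DT, and NN and the features that occur twice survive.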
Feature Extraction
Assign a unique ID to each label and feature.

Label  {NNP:0, DT:1, NN:2}
F0     {John:1, a:2}
F1     {John:3, a:4}
F2     {a:5}
F3     {NNP:6, DT:7}

The VBZ and VBD instances drop out because their labels do not pass the cutoff. Each remaining instance becomes one entry of Y and one row of X, with column 0 always set to 1.

Y  X: 0 1 2 3 4 5 6 7   Instance (Label F0 F1 F2 F3)
0     1 1 0 0 0 0 0 0   NNP John ∅ is ∅
1     1 0 1 0 0 0 0 0   DT a is teacher VBZ
2     1 0 0 0 1 0 0 1   NN teacher a ∅ DT
0     1 1 0 0 0 0 0 0   NNP John ∅ was ∅
1     1 0 1 0 0 0 0 0   DT a was student VBD
2     1 0 0 0 1 0 0 1   NN student a ∅ DT
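A rough sketch of the vectorization, assuming the ID maps shown on this slide and treating column 0 as a constant 1 (it is 1 in every row of the matrix). Instances whose label has no ID are skipped, which is one reading of why only six rows remain.

```python
LABEL_IDS = {"NNP": 0, "DT": 1, "NN": 2}
FEATURE_IDS = [
    {"John": 1, "a": 2},  # F0
    {"John": 3, "a": 4},  # F1
    {"a": 5},             # F2
    {"NNP": 6, "DT": 7},  # F3
]
NUM_COLUMNS = 8  # column 0 is constant, columns 1-7 are the features

# (label, [F0, F1, F2, F3]) pairs; None marks the empty context ∅.
pairs = [
    ("NNP", ["John", None, "is", None]),
    ("VBZ", ["is", "John", "a", "NNP"]),
    ("DT", ["a", "is", "teacher", "VBZ"]),
    ("NN", ["teacher", "a", None, "DT"]),
    ("NNP", ["John", None, "was", None]),
    ("VBD", ["was", "John", "a", "NNP"]),
    ("DT", ["a", "was", "student", "VBD"]),
    ("NN", ["student", "a", None, "DT"]),
]

X, Y = [], []
for label, feats in pairs:
    if label not in LABEL_IDS:  # label was filtered out by the cutoff
        continue
    row = [0] * NUM_COLUMNS
    row[0] = 1
    for slot, value in enumerate(feats):
        column = FEATURE_IDS[slot].get(value)  # None if the feature was filtered
        if column is not None:
            row[column] = 1
    X.append(row)
    Y.append(LABEL_IDS[label])

for y, row in zip(Y, X):
    print(y, row)
```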
Softmax Regression
x = (0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0)

Each label has its own weight vector of the same length as x: λ_NN, λ_IN, λ_VB, λ_RB.

\[ p(y \mid x) = \frac{1}{Z(x)} \exp\Big( b_y + \sum_{\forall k} \lambda_{y,k} \cdot x_k \Big) \]

Prepending a constant 1 to x and the bias b to each weight vector folds the bias into the sum:

\[ p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{\forall k} \lambda_{y,k} \cdot x_k \Big) \]
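To make the bias-folded form concrete, here is a small pure-Python sketch. It assumes Z(x) is the usual softmax normalizer, the sum of the exponentiated scores over all labels. The weights and the three-dimensional x are made up for illustration, and `softmax_probs` is a hypothetical name.

```python
import math
from typing import Dict, List


def softmax_probs(weights: Dict[str, List[float]], x: List[int]) -> Dict[str, float]:
    """Return p(y|x) for every label y, with the bias stored at index 0."""
    scores = {
        y: sum(w_k * x_k for w_k, x_k in zip(w, x)) for y, w in weights.items()
    }
    # Subtracting the maximum score leaves the probabilities unchanged
    # but avoids overflow in exp.
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())  # plays the role of Z(x)
    return {y: e / z for y, e in exps.items()}


# Hypothetical weights: index 0 is the bias, indices 1-2 are two features.
weights = {
    "NN": [0.5, 1.0, -1.0],
    "VB": [0.0, -1.0, 1.0],
    "IN": [-0.5, 0.0, 0.0],
}
x = [1, 1, 0]  # x[0] = 1 switches the bias on

probs = softmax_probs(weights, x)
print(max(probs, key=probs.get), probs)
```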
Column vs. Row
Weights grouped by label:

λ_NN:  b  f1  f2  …  fd
λ_IN:  b  f1  f2  …  fd
λ_VB:  b  f1  f2  …  fd
λ_RB:  b  f1  f2  …  fd

vs.

Weights grouped by feature:

f1:  NN  VB  IN  RB
f2:  NN  VB  IN  RB
…
fd:  NN  VB  IN  RB

x = (1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0)

Only f0, f9, f11, f23, and f32 are active.

Which is faster?
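The two layouts can be compared with a small sketch that stores the same weights in a flat list both ways and scores the sparse x through its active indices only. The integer weights are invented so the result is easy to check, and the function names are hypothetical. Both layouts give the same scores; which one runs faster depends on memory access patterns, which this sketch does not measure.

```python
LABELS = ["NN", "IN", "VB", "RB"]
NUM_LABELS = len(LABELS)
NUM_FEATURES = 39            # length of the example vector x, f0 included
ACTIVE = [0, 9, 11, 23, 32]  # indices where x is 1

# Hypothetical weights: dense[y][k] is the weight of feature k for label y.
dense = [[(y + 1) * k for k in range(NUM_FEATURES)] for y in range(NUM_LABELS)]

# Grouped by label: all weights of one label are contiguous.
by_label = [dense[y][k] for y in range(NUM_LABELS) for k in range(NUM_FEATURES)]
# Grouped by feature: all label weights of one feature are contiguous.
by_feature = [dense[y][k] for k in range(NUM_FEATURES) for y in range(NUM_LABELS)]


def scores_by_label(weights, active):
    """One jump through memory per (label, active feature) pair."""
    return [
        sum(weights[y * NUM_FEATURES + k] for k in active)
        for y in range(NUM_LABELS)
    ]


def scores_by_feature(weights, active):
    """One contiguous block of NUM_LABELS weights per active feature."""
    scores = [0] * NUM_LABELS
    for k in active:
        start = k * NUM_LABELS
        for y in range(NUM_LABELS):
            scores[y] += weights[start + y]
    return scores


print(scores_by_label(by_label, ACTIVE))
print(scores_by_feature(by_feature, ACTIVE))
```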
Column vs. Row
Y  X: 0 1 2 3 4 5 6 7
0     1 1 0 0 0 0 0 0
1     1 0 1 0 0 0 0 0
2     1 0 0 0 1 0 0 1
0     1 1 0 0 0 0 0 0
1     1 0 1 0 0 0 0 0
2     1 0 0 0 1 0 0 1

Y and X are passed to the machine learning algorithm, which produces a weight vector indexed by (feature, label):

00 01 02 10 11 12 20 21 22 30 31 32 40 41 42 50 51 52 60 61 62 70 71 72

Why group labels by features?
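The index sequence on this slide is what a feature-major flat layout produces for 8 features and 3 labels. A minimal sketch, with `flat_index` as a hypothetical helper name:

```python
NUM_FEATURES = 8  # columns 0-7 of X
NUM_LABELS = 3    # label IDs 0-2


def flat_index(feature: int, label: int) -> int:
    """Position of the (feature, label) weight in the flat weight vector."""
    return feature * NUM_LABELS + label


# Enumerate the positions in storage order, written as "<feature><label>".
layout = [f"{k}{y}" for k in range(NUM_FEATURES) for y in range(NUM_LABELS)]
print(" ".join(layout))

# All label weights of feature 4 sit next to each other.
start = flat_index(4, 0)
print(layout[start:start + NUM_LABELS])
```

Under this layout, an active feature selects one contiguous run of label weights, which may be the motivation behind the question on the slide.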
Ambiguity Classes
The likely part-of-speech tag.

Word      Tag counts     Ambiguity class
John      NNP:100        NNP
study     VB:50, NN:50   VB_NN
interest  JJ:70, NN:30   JJ or JJ_NN

Collect the ambiguity classes before training.
Use them as extra features.
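One way to derive the ambiguity classes is sketched below. It assumes the numbers are tag counts per word and that a tag joins the class when its share of the word's occurrences reaches a threshold; the threshold values and the name `ambiguity_class` are assumptions, chosen to show how "interest" can come out as either JJ or JJ_NN.

```python
from typing import Dict


def ambiguity_class(tag_counts: Dict[str, int], threshold: float) -> str:
    """Join the tags whose relative frequency is at least `threshold`."""
    total = sum(tag_counts.values())
    # sorted() is stable, so tags with equal counts keep their given order.
    ranked = sorted(tag_counts.items(), key=lambda item: -item[1])
    return "_".join(tag for tag, n in ranked if n / total >= threshold)


lexicon = {
    "John": {"NNP": 100},
    "study": {"VB": 50, "NN": 50},
    "interest": {"JJ": 70, "NN": 30},
}

strict = {word: ambiguity_class(counts, 0.4) for word, counts in lexicon.items()}
loose = {word: ambiguity_class(counts, 0.25) for word, counts in lexicon.items()}
print(strict)
print(loose)
```

The resulting string can then be added to each instance as one more feature column alongside F0 through F3.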
