CS571: Part-of-Speech Tagging

Part-of-Speech Tagging
Natural Language Processing
Emory University
Jinho D. Choi

Part-of-Speech Tagging
Classify the part-of-speech tag of each token.
Token      POS    Subtype
Jinho      noun   proper
is         verb   3rd person, present
a          det.
professor  noun   common
Supervised NLP
1. Collect
2. Train
3. Evaluate
a. Design a processing algorithm.
b. Extract (label, features) pairs.
c. Vectorize labels and features.
d. Build statistical models.
https://github.com/emory-courses/cs571/wiki/Part-of-Speech-Tags

Feature Extraction
Feature template: {w_i, w_{i-1}, w_{i+1}, p_{i-1}}
(F0 = w_i, F1 = w_{i-1}, F2 = w_{i+1}, F3 = p_{i-1})

John/NNP is/VBZ a/DT teacher/NN

Label  F0       F1    F2       F3
NNP    John     ∅     is       ∅
VBZ    is       John  a        NNP
DT     a        is    teacher  VBZ
NN     teacher  a     ∅        DT

John/NNP was/VBD a/DT student/NN

Label  F0       F1    F2       F3
NNP    John     ∅     was      ∅
VBD    was      John  a        NNP
DT     a        was   student  VBD
NN     student  a     ∅        DT
Extract the label and the features given the current state.
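The extraction step can be sketched in Python as follows. This is a minimal sketch, not the course implementation: the function names `parse` and `extract_instances` are hypothetical, `None` stands for ∅, and the previous tag is assumed to be the gold tag from the training data.

```python
def parse(text):
    """Split 'John/NNP is/VBZ' into [('John', 'NNP'), ('is', 'VBZ')]."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]


def extract_instances(tagged_sentence):
    """Return one (label, [F0, F1, F2, F3]) pair per token.

    F0 = w_i, F1 = w_{i-1}, F2 = w_{i+1}, F3 = p_{i-1}; None stands for ∅.
    """
    instances = []
    last = len(tagged_sentence) - 1
    for i, (word, tag) in enumerate(tagged_sentence):
        prev_word = tagged_sentence[i - 1][0] if i > 0 else None
        next_word = tagged_sentence[i + 1][0] if i < last else None
        prev_tag = tagged_sentence[i - 1][1] if i > 0 else None  # gold tag (assumption)
        instances.append((tag, [word, prev_word, next_word, prev_tag]))
    return instances


sentence = parse("John/NNP is/VBZ a/DT teacher/NN")
for label, features in extract_instances(sentence):
    print(label, features)
```

Each printed row corresponds to one row of the Label/F0..F3 table for the first sentence.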

Feature Extraction
Label  F0       F1    F2       F3
NNP    John     ∅     is       ∅
VBZ    is       John  a        NNP
DT     a        is    teacher  VBZ
NN     teacher  a     ∅        DT
VBD    was      John  a        NNP
NN     student  a     ∅        DT

Count each label and feature value:

Label  {NNP:2, VBZ:1, DT:2, NN:2, VBD:1}
F0     {John:2, is:1, a:2, teacher:1, was:1, student:1}
F1     {John:2, is:1, a:2, was:1}
F2     {is:1, a:2, teacher:1, was:1, student:1}
F3     {NNP:2, VBZ:1, DT:2, VBD:1}

Filter out ones whose frequencies ≤ cutoff (cutoff = 1).
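Counting and filtering can be sketched with `collections.Counter`. The instance list below is the eight (label, features) pairs from the two example sentences; the helper name `prune` is hypothetical, and `None` stands for ∅, which is assumed not to be counted.

```python
from collections import Counter

# (label, [F0, F1, F2, F3]) pairs from the two example sentences; None is ∅.
instances = [
    ("NNP", ["John", None, "is", None]),
    ("VBZ", ["is", "John", "a", "NNP"]),
    ("DT", ["a", "is", "teacher", "VBZ"]),
    ("NN", ["teacher", "a", None, "DT"]),
    ("NNP", ["John", None, "was", None]),
    ("VBD", ["was", "John", "a", "NNP"]),
    ("DT", ["a", "was", "student", "VBD"]),
    ("NN", ["student", "a", None, "DT"]),
]


def prune(counts, cutoff):
    """Keep only the entries whose frequency is strictly greater than cutoff."""
    return {key: n for key, n in counts.items() if n > cutoff}


# One counter for the labels and one counter per feature slot F0..F3.
label_counts = Counter(label for label, _ in instances)
feature_counts = [Counter() for _ in range(4)]
for _, features in instances:
    for slot, value in enumerate(features):
        if value is not None:
            feature_counts[slot][value] += 1

cutoff = 1
kept_labels = prune(label_counts, cutoff)
kept_features = [prune(counts, cutoff) for counts in feature_counts]
print(kept_labels)
print(kept_features)
```

With cutoff = 1, only NNP, DT, and NN survive as labels, and only John, a, NNP, and DT survive as feature values.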

Feature Extraction
Assign a unique ID to each label and feature.

Label  {NNP:0, DT:1, NN:2}
F0     {John:1, a:2}
F1     {John:3, a:4}
F2     {a:5}
F3     {NNP:6, DT:7}

Label  F0       F1    F2       F3
NNP    John     ∅     is       ∅
VBZ    is       John  a        NNP
DT     a        is    teacher  VBZ
NN     teacher  a     ∅        DT
VBD    was      John  a        NNP
NN     student  a     ∅        DT

Y    X
     0 1 2 3 4 5 6 7
0    1 1 0 0 0 0 0 0
1    1 0 1 0 0 0 0 0
2    1 0 0 0 1 0 0 1
0    1 1 0 0 0 0 0 0
1    1 0 1 0 0 0 0 0
2    1 0 0 0 1 0 0 1
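The vectorization can be sketched as below. This is a sketch under two assumptions read off the slide rather than stated on it: column 0 is a constant 1 (a bias feature), and instances whose label was filtered out (VBZ, VBD) are dropped. The function name `vectorize` is hypothetical.

```python
label_ids = {"NNP": 0, "DT": 1, "NN": 2}
feature_ids = [
    {"John": 1, "a": 2},   # F0
    {"John": 3, "a": 4},   # F1
    {"a": 5},              # F2
    {"NNP": 6, "DT": 7},   # F3
]
NUM_COLUMNS = 8  # column 0 is assumed to be a constant bias feature

# (label, [F0, F1, F2, F3]) pairs from the two example sentences; None is ∅.
instances = [
    ("NNP", ["John", None, "is", None]),
    ("VBZ", ["is", "John", "a", "NNP"]),
    ("DT", ["a", "is", "teacher", "VBZ"]),
    ("NN", ["teacher", "a", None, "DT"]),
    ("NNP", ["John", None, "was", None]),
    ("VBD", ["was", "John", "a", "NNP"]),
    ("DT", ["a", "was", "student", "VBD"]),
    ("NN", ["student", "a", None, "DT"]),
]


def vectorize(instances):
    """Map instances to a label vector Y and a binary feature matrix X."""
    Y, X = [], []
    for label, features in instances:
        if label not in label_ids:
            continue  # assumption: instances with a filtered label are dropped
        row = [0] * NUM_COLUMNS
        row[0] = 1  # bias column
        for slot, value in enumerate(features):
            column = feature_ids[slot].get(value)
            if column is not None:  # filtered or ∅ values activate nothing
                row[column] = 1
        Y.append(label_ids[label])
        X.append(row)
    return Y, X


Y, X = vectorize(instances)
for y, row in zip(Y, X):
    print(y, row)
```

Under these assumptions the output reproduces the six rows of Y and X shown on the slide.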

Softmax Regression
x = 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0

$$p(y \mid x) = \frac{1}{Z(x)} \exp\Big(\lambda_y + \sum_{\forall k} \lambda_{y,k} \cdot x_k\Big)$$

Weight matrix, one row per label:

λ_NN  ⋅ ⋅ ⋅ … ⋅
λ_IN  ⋅ ⋅ ⋅ … ⋅
λ_VB  ⋅ ⋅ ⋅ … ⋅
λ_RB  ⋅ ⋅ ⋅ … ⋅

Prepend 1 to x and the bias b (= λ_y) to each row:

x     1 0 0 0 … 0
λ_NN  b ⋅ ⋅ ⋅ … ⋅
λ_IN  b ⋅ ⋅ ⋅ … ⋅
λ_VB  b ⋅ ⋅ ⋅ … ⋅
λ_RB  b ⋅ ⋅ ⋅ … ⋅

$$p(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{\forall k} \lambda_{y,k} \cdot x_k\Big)$$
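The second formula, with the bias folded into the weights, can be sketched in plain Python. The weights below are hypothetical values chosen for illustration, and the function name `softmax_probabilities` is an assumption; Z(x) is computed as the sum of the exponentiated scores over all labels.

```python
import math


def softmax_probabilities(weights, x):
    """Return p(y | x) for every label y.

    weights[y] is the row λ_y with the bias stored first;
    x starts with the constant 1 that pairs with the bias.
    """
    scores = {y: sum(w * v for w, v in zip(row, x)) for y, row in weights.items()}
    highest = max(scores.values())  # subtracting the max avoids overflow in exp
    exps = {y: math.exp(score - highest) for y, score in scores.items()}
    z = sum(exps.values())  # Z(x), up to the constant factor exp(-highest)
    return {y: value / z for y, value in exps.items()}


# Hypothetical weights: one bias plus three feature weights per label.
weights = {
    "NN": [0.1, 1.0, 0.0, -0.5],
    "IN": [-0.2, 0.3, 0.3, 0.0],
    "VB": [0.0, -1.0, 0.5, 2.0],
    "RB": [0.0, 0.0, 1.0, 0.0],
}
x = [1, 1, 0, 1]  # bias, then three binary features

probs = softmax_probabilities(weights, x)
print(max(probs, key=probs.get), probs)
```

Subtracting the highest score before exponentiating does not change the probabilities, because the same factor cancels in the numerator and in Z(x).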

Column vs. Row
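Row layout, one weight vector per label:

λ_NN  b f1 f2 … fd
λ_IN  b f1 f2 … fd
λ_VB  b f1 f2 … fd
λ_RB  b f1 f2 … fd

vs.

Column layout, the weights of all labels grouped by feature:

f1  NN VB IN RB
f2  NN VB IN RB
…
fd  NN VB IN RB

x = 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
Active features: f0 f9 f11 f23 f32

Which is faster?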
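To make the comparison concrete, the sketch below scores the sparse x from the slide under both layouts. The weights are hypothetical integers chosen only so the two results can be compared, and the function names are assumptions. Because x is binary and sparse, each score is just a sum of the weights at the active feature indices.

```python
LABELS = ["NN", "VB", "IN", "RB"]
NUM_FEATURES = 39  # length of x on the slide, bias included

# Hypothetical integer weights in the row layout: rows[y][k].
rows = [[(y + 1) * (k % 7) - 3 for k in range(NUM_FEATURES)]
        for y in range(len(LABELS))]
# The same weights in the column layout: columns[k][y].
columns = [list(column) for column in zip(*rows)]

active = [0, 9, 11, 23, 32]  # indices where x is 1


def scores_row_layout(rows, active):
    """For each label, jump around its row to pick up the active weights."""
    return [sum(row[k] for k in active) for row in rows]


def scores_column_layout(columns, active):
    """For each active feature, read the weights of all labels in one place."""
    scores = [0] * len(LABELS)
    for k in active:
        for y, weight in enumerate(columns[k]):
            scores[y] += weight
    return scores


print(scores_row_layout(rows, active))
print(scores_column_layout(columns, active))
```

Both layouts give the same scores; they differ only in memory access pattern. In the column layout the weights of one active feature sit next to each other for all labels, which may be friendlier to the cache when x is sparse, though the actual speed depends on the implementation and hardware.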

Column vs. Row
Machine learning algorithm

Y    X
     0 1 2 3 4 5 6 7
0    1 1 0 0 0 0 0 0
1    1 0 1 0 0 0 0 0
2    1 0 0 0 1 0 0 1
0    1 1 0 0 0 0 0 0
1    1 0 1 0 0 0 0 0
2    1 0 0 0 1 0 0 1

Weight vector, indexed by (feature, label):

00 01 02 10 11 12 20 21 22 30 31 32 40 41 42 50 51 52 60 61 62 70 71 72

Why group labels by features?
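The flat weight vector on this slide can be read as feature-major storage: the weight for feature k and label y sits at position k * |Y| + y. The sketch below assumes that reading; `weight_index` and `scores` are hypothetical names, and the weights are placeholder values equal to their own positions.

```python
NUM_LABELS = 3    # NNP, DT, NN
NUM_FEATURES = 8  # columns 0..7 of X


def weight_index(feature, label):
    """Position of the (feature, label) weight in the flat vector."""
    return feature * NUM_LABELS + label


def scores(weights, active):
    """Sum, per label, the weights of the active features."""
    result = [0.0] * NUM_LABELS
    for k in active:
        start = weight_index(k, 0)  # the labels of feature k are contiguous
        for y in range(NUM_LABELS):
            result[y] += weights[start + y]
    return result


# The slide's naming of each position: feature digit followed by label digit.
layout = [f"{k}{y}" for k in range(NUM_FEATURES) for y in range(NUM_LABELS)]

# Placeholder weights: each weight equals its own position in the vector.
weights = [float(i) for i in range(NUM_FEATURES * NUM_LABELS)]

# Third row of X is 1 0 0 0 1 0 0 1, so features 0, 4, and 7 are active.
print(layout[weight_index(4, 1)])
print(scores(weights, [0, 4, 7]))
```

With this layout, scoring one active feature touches a single contiguous slice of the weight vector, which is one plausible motivation for grouping labels by features.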

Ambiguity Classes
Word      Tag counts     Ambiguity class
John      NNP:100        NNP
study     VB:50, NN:50   VB_NN
interest  JJ:70, NN:30   JJ or JJ_NN

The ambiguity class gives the likely part-of-speech tag(s) of a word.
Collect the ambiguity classes before training.
Use them as extra features.
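One way to derive ambiguity classes from tag counts is sketched below. The threshold rule is an assumption, not something the slides specify: a tag is kept when its relative frequency reaches the threshold, which would explain why interest can come out as either JJ or JJ_NN. The function name `ambiguity_class` and the threshold values are hypothetical.

```python
def ambiguity_class(tag_counts, threshold):
    """Join the tags whose relative frequency is at least threshold.

    Tags are ordered by descending count; Python's sort is stable, so
    ties keep the order in which the tags were listed.
    """
    total = sum(tag_counts.values())
    ranked = sorted(tag_counts.items(), key=lambda item: -item[1])
    return "_".join(tag for tag, count in ranked if count / total >= threshold)


# Tag counts from the slide.
lexicon = {
    "John": {"NNP": 100},
    "study": {"VB": 50, "NN": 50},
    "interest": {"JJ": 70, "NN": 30},
}

for word, counts in lexicon.items():
    print(word, ambiguity_class(counts, 0.4), ambiguity_class(counts, 0.25))
```

The resulting string can then be added to the feature template as one more feature of the current word.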
