Slide 1: Automatic Key Term Extraction from Spoken Course Lectures
Yun-Nung (Vivian) Chen, Yu Huang, Sheng-Yi Kong, Lin-Shan Lee
National Taiwan University, Taiwan

Slide 2
Slide 3: Definition
• Key Term
  • Higher term frequency
  • Core content
• Two types
  • Keyword
  • Key phrase
• Advantages
  • Indexing and retrieval
  • Relating key terms to segments of documents
Slide 4: Introduction
Slide 5: Introduction

[Diagram: key terms related to "acoustic model" — hmm, phone, hidden Markov model, language model, n gram]
Slide 6: Introduction

[Diagram: key-term graph linking hmm, acoustic model, language model, n gram, bigram, phone, hidden Markov model]

Target: extract key terms from course lectures
Slide 7
Slide 8: Automatic Key Term Extraction

[System flow chart — original spoken documents: an archive of spoken documents; the speech signal goes through ASR to produce ASR transcriptions, which feed Branching Entropy, then Feature Extraction, then Learning Methods: 1) K-means Exemplar, 2) AdaBoost, 3) Neural Network]
Slide 9: Automatic Key Term Extraction

[Same flow chart as slide 8]
Slide 10: Automatic Key Term Extraction

[Same flow chart as slide 8]
Slide 11: Automatic Key Term Extraction

[Flow chart with the Phrase Identification stage highlighted]

First, branching entropy is used to identify phrases.
Slide 12: Automatic Key Term Extraction

[Flow chart with the Key Term Extraction stage highlighted; output: key terms such as "entropy", "acoustic model", ...]

Then, learning methods extract key terms from a set of features.
Slide 13: Automatic Key Term Extraction

[Complete flow chart: Phrase Identification followed by Key Term Extraction, producing the key-term list]
Slide 14: Branching Entropy

How to decide the boundary of a phrase?
• "hidden" is almost always followed by the same word

[Illustration: words around "hidden Markov model" — preceded by words such as "represent", "is", "can"; followed by words such as "is", "of", "in"]
Slide 15: Branching Entropy

How to decide the boundary of a phrase?
• "hidden" is almost always followed by the same word
• "hidden Markov" is almost always followed by the same word

[Same illustration as slide 14]
Slide 16: Branching Entropy

How to decide the boundary of a phrase?
• "hidden" is almost always followed by the same word
• "hidden Markov" is almost always followed by the same word
• "hidden Markov model" is followed by many different words
→ a boundary is likely after "hidden Markov model"; branching entropy is defined to detect such possible boundaries

[Same illustration as slide 14]
Slide 17: Branching Entropy

How to decide the boundary of a phrase?
• Definition of Right Branching Entropy
  • Probability of child x_i of pattern X (e.g., X = "hidden Markov", x_i = "hidden Markov model"):
    P(x_i) = f(x_i) / f(X)
  • Right branching entropy for X:
    H_r(X) = −Σ_i P(x_i) log P(x_i)
Slide 18: Branching Entropy

How to decide the boundary of a phrase?
• Decision of Right Boundary
  • Find the right boundary located between X and x_i where the right branching entropy rises: entropy is high at a phrase boundary (sketched below)

[Same illustration as slide 14, with the boundary marked after "hidden Markov model"]
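For concreteness, right branching entropy can be computed directly from n-gram continuation counts. A minimal Python sketch, with toy counts standing in for the lecture corpus statistics (all names and numbers here are illustrative, not from the paper):

```python
import math

def right_branching_entropy(continuations):
    """H_r(X) = -sum_i P(x_i) log P(x_i), where P(x_i) = f(x_i) / f(X)
    and `continuations` maps each next word to its frequency f(x_i)."""
    total = sum(continuations.values())  # f(X): frequency of the pattern X
    h = 0.0
    for freq in continuations.values():
        p = freq / total                 # P(x_i)
        h -= p * math.log(p)
    return h

# Toy counts: "hidden" and "hidden Markov" are nearly deterministic,
# while "hidden Markov model" branches into many different words.
next_word_counts = {
    ("hidden",): {"Markov": 99, "state": 1},
    ("hidden", "Markov"): {"model": 98, "chain": 2},
    ("hidden", "Markov", "model"): {"is": 30, "can": 25, "represents": 20, "of": 25},
}

for pattern, conts in next_word_counts.items():
    print(" ".join(pattern), "->", round(right_branching_entropy(conts), 3))
# Entropy jumps after "hidden Markov model", suggesting a right boundary there.
```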
Slide 19: Branching Entropy

[Animation step of the slide 14 illustration]
Slide 20: Branching Entropy

[Animation step of the slide 14 illustration]
Slide 21: Branching Entropy

[Animation step of the slide 14 illustration]
Slide 22: Branching Entropy

How to decide the boundary of a phrase?
• Decision of Left Boundary
  • Find the left boundary located between X̄ and x_i where the left branching entropy H_l(X̄) rises; X̄ is the reversed pattern (e.g., X̄: "model Markov hidden")
• A PAT tree is used for the implementation

[Same illustration, with the boundary marked]
Slide 23: Branching Entropy

• Implementation in the PAT tree (sketched below)
  • Probability of child x_i of X: P(x_i) = f(x_i) / f(X)
  • Right branching entropy for X: H_r(X) = −Σ_i P(x_i) log P(x_i)

[PAT tree illustration rooted at "hidden", branching through "Markov" to "model", "chain", "state", "distribution", "variable";
 X: hidden Markov
 x_1: hidden Markov model
 x_2: hidden Markov chain]
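The deck implements these counts with a PAT tree; for illustration, a plain dictionary trie yields the same continuation statistics, and the left branching entropy H_l(X̄) can be obtained by running the identical code on reversed word sequences. A sketch under those assumptions:

```python
from collections import defaultdict

def continuation_counts(corpus_sents, max_len=4):
    """Map each word pattern X (up to max_len words) to the frequencies
    of the words that follow it, emulating the counts a PAT tree stores."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in corpus_sents:
        for i in range(len(sent)):
            for n in range(1, max_len + 1):
                if i + n < len(sent):
                    pattern = tuple(sent[i:i + n])
                    counts[pattern][sent[i + n]] += 1
    return counts

sents = [["the", "hidden", "Markov", "model", "is", "used"],
         ["a", "hidden", "Markov", "model", "of", "speech"]]
right = continuation_counts(sents)
# For H_l(X̄): reverse every sentence and reuse the same function.
left = continuation_counts([list(reversed(s)) for s in sents])
print(dict(right[("hidden", "Markov")]))   # {'model': 2}
```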
Slide 24: Automatic Key Term Extraction

[Flow chart with the Feature Extraction stage highlighted]

Prosodic, lexical, and semantic features are extracted for each candidate term.
Slide 25: Feature Extraction

• Prosodic features
  • Computed for the first appearance of each candidate term
  • A speaker tends to use longer duration to emphasize key terms

Feature Name      Feature Description
Duration (I–IV)   normalized duration (max, min, mean, range)

Four values summarize the durations over the phones of the term; the duration of each phone (e.g., phone "a") is normalized by the average duration of that phone.
Slide 26: Feature Extraction

• Prosodic features (cont.)
  • Higher pitch may represent significant information

[Table as on slide 25]
Slide 27: Feature Extraction

• Prosodic features (cont.)
  • Higher pitch may represent significant information

Feature Name      Feature Description
Duration (I–IV)   normalized duration (max, min, mean, range)
Pitch (I–IV)      F0 (max, min, mean, range)
Slide 28: Feature Extraction

• Prosodic features (cont.)
  • Higher energy emphasizes important information

[Table as on slide 27]
Slide 29: Feature Extraction

• Prosodic features (cont.)
  • Higher energy emphasizes important information

Feature Name      Feature Description
Duration (I–IV)   normalized duration (max, min, mean, range)
Pitch (I–IV)      F0 (max, min, mean, range)
Energy (I–IV)     energy (max, min, mean, range)
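As a concrete reading of the Duration (I–IV) row, here is a sketch of the normalization described on slide 25. It assumes hypothetical phone-level alignments (phone, duration) from the recognizer, which the slides do not spell out:

```python
import statistics

def duration_features(term_phones, avg_phone_duration):
    """term_phones: list of (phone, duration_sec) for the term's first occurrence.
    Each duration is normalized by that phone's corpus-wide average duration;
    the four features are max, min, mean, and range over the term's phones."""
    normalized = [dur / avg_phone_duration[ph] for ph, dur in term_phones]
    return (max(normalized), min(normalized),
            statistics.mean(normalized), max(normalized) - min(normalized))

avg = {"a": 0.09, "k": 0.05, "m": 0.07}
print(duration_features([("a", 0.12), ("k", 0.05), ("m", 0.08)], avg))
```

The pitch and energy rows follow the same pattern, with the per-phone duration replaced by per-frame F0 and energy values.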
Slide 30: Feature Extraction

• Lexical features — well-known features, computed for each candidate term (sketched below)

Feature Name   Feature Description
TF             term frequency
IDF            inverse document frequency
TFIDF          tf * idf
PoS            the PoS tag
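These lexical features follow standard definitions; a minimal sketch using the usual TF-IDF formulation (the paper may use a different variant):

```python
import math
from collections import Counter

def tfidf(term, doc_tokens, all_docs):
    tf = Counter(doc_tokens)[term]                      # term frequency in this document
    df = sum(term in d for d in all_docs)               # number of documents containing it
    idf = math.log(len(all_docs) / df) if df else 0.0   # inverse document frequency
    return tf * idf

docs = [["entropy", "is", "defined"], ["acoustic", "model", "entropy"], ["language", "model"]]
print(tfidf("entropy", docs[0], docs))
```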
Slide 31: Feature Extraction

• Semantic features
  • Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Probability
  • Key terms tend to focus on limited topics

[PLSA diagram: documents D_1 … D_i … D_N connect to latent topics T_1 … T_k … T_K with probability P(T_k | D_i); topics connect to terms t_1 … t_j … t_n with probability P(t_j | T_k)]
Slide 32: Feature Extraction

• Semantic features
  • PLSA: Latent Topic Probability — how to use it? Describe the probability distribution
  • Key terms tend to focus on limited topics

Feature Name   Feature Description
LTP (I–III)    Latent Topic Probability (mean, variance, standard deviation)

[Illustration: a non-key term has a nearly flat topic distribution; a key term concentrates on a few topics]
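The LTP (I–III) features are then simple statistics of the term's topic distribution. A sketch, assuming a hypothetical vector of P(T_k | t) obtained from PLSA (toy values below):

```python
import statistics

def ltp_features(p_topic_given_term):
    """Mean, variance, and standard deviation of the latent topic distribution.
    A key term concentrates its mass on few topics, so its variance is larger."""
    mean = statistics.mean(p_topic_given_term)
    var = statistics.pvariance(p_topic_given_term)
    return mean, var, var ** 0.5

flat = [0.125] * 8                      # function-word-like term: uniform over 8 topics
peaked = [0.72, 0.2] + [0.01333] * 6    # key-term-like: mass on two topics
print(ltp_features(flat), ltp_features(peaked))
```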
Slide 33: Feature Extraction

• Semantic features
  • PLSA: Latent Topic Significance — within-topic to out-of-topic ratio:
    LTS_{t_i}(T_k) = [Σ_j n(t_i, d_j) P(T_k | d_j)] / [Σ_j n(t_i, d_j) (1 − P(T_k | d_j))]
    where n(t_i, d_j) is the count of t_i in document d_j; the numerator is the within-topic frequency and the denominator the out-of-topic frequency
  • Key terms tend to focus on limited topics
Slide 34: Feature Extraction

• Semantic features
  • PLSA: Latent Topic Significance — within-topic to out-of-topic ratio (sketched below)
  • Key terms tend to focus on limited topics

Feature Name   Feature Description
LTP (I–III)    Latent Topic Probability (mean, variance, standard deviation)
LTS (I–III)    Latent Topic Significance (mean, variance, standard deviation)
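A sketch of the within-topic to out-of-topic ratio as described in note #34 below; weighting the term counts by P(T_k | d_j) is an assumption consistent with that description:

```python
def latent_topic_significance(term_counts, p_topic_given_doc):
    """LTS of one topic for a term.
    term_counts[j]      : n(t_i, d_j), count of the term in document j
    p_topic_given_doc[j]: P(T_k | d_j) from PLSA for the chosen topic."""
    within = sum(n * p_topic_given_doc[j] for j, n in enumerate(term_counts))
    out = sum(n * (1.0 - p_topic_given_doc[j]) for j, n in enumerate(term_counts))
    return within / out if out else float("inf")

counts = [3, 0, 5, 1]          # term occurrences in four documents
p_k = [0.9, 0.1, 0.8, 0.2]     # P(T_k | d_j) for some topic k
print(latent_topic_significance(counts, p_k))
```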
Slide 35: Feature Extraction

• Semantic features
  • PLSA: Latent Topic Entropy
  • Key terms tend to focus on limited topics

Feature Name   Feature Description
LTP (I–III)    Latent Topic Probability (mean, variance, standard deviation)
LTS (I–III)    Latent Topic Significance (mean, variance, standard deviation)
Slide 36: Feature Extraction

• Semantic features
  • PLSA: Latent Topic Entropy
    LTE(t_i) = −Σ_k P(T_k | t_i) log P(T_k | t_i)
  • A non-key term (flat distribution) has a higher LTE; a key term (focused distribution) has a lower LTE

Feature Name   Feature Description
LTP (I–III)    Latent Topic Probability (mean, variance, standard deviation)
LTS (I–III)    Latent Topic Significance (mean, variance, standard deviation)
LTE            term entropy over latent topics
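Latent topic entropy is the entropy of the same topic distribution; lower values suggest a key term. A standard entropy sketch:

```python
import math

def latent_topic_entropy(p_topic_given_term):
    """LTE(t) = -sum_k P(T_k | t) log P(T_k | t); key terms tend to score lower."""
    return -sum(p * math.log(p) for p in p_topic_given_term if p > 0)

print(latent_topic_entropy([0.125] * 8))        # high LTE: likely non-key term
print(latent_topic_entropy([0.8, 0.15, 0.05]))  # low LTE: likely key term
```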
Slide 37: Automatic Key Term Extraction

[Flow chart with the Learning Methods stage highlighted; output: key terms such as "entropy", "acoustic model", ...]

Unsupervised and supervised approaches are used to extract key terms.
Slide 38: Learning Methods

• Unsupervised learning
  • K-means Exemplar (sketched below)
    - Transform each term into a vector in LTS (Latent Topic Significance) space
    - Run K-means
    - Take the centroid of each cluster as a key term
  • The terms in the same cluster focus on a single topic: the candidate terms in a group are related to its key term, and the key term can represent the topic
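A sketch of the K-means Exemplar step with scikit-learn, assuming candidate terms are already embedded in LTS space (terms and vectors below are toy values):

```python
import numpy as np
from sklearn.cluster import KMeans

terms = ["entropy", "uncertainty", "acoustic model", "phone"]
lts_vectors = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(lts_vectors)

# The exemplar of each cluster: the term closest to the cluster centroid
# is taken as a key term; the rest of the cluster relates to it.
for c, centroid in enumerate(kmeans.cluster_centers_):
    members = [i for i, lab in enumerate(kmeans.labels_) if lab == c]
    exemplar = min(members, key=lambda i: np.linalg.norm(lts_vectors[i] - centroid))
    print(f"cluster {c}: key term = {terms[exemplar]}")
```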
Slide 39: Learning Methods

• Supervised learning
  • Adaptive Boosting (AdaBoost)
  • Neural Network
  • The weights of the features are adjusted automatically to produce a classifier
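A sketch of the two supervised alternatives with scikit-learn; the feature matrix stands in for the prosodic, lexical, and semantic features above, and the labels are toy values:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))           # e.g., 14 prosodic/lexical/semantic features
y = (X[:, 0] + X[:, 3] > 0).astype(int)  # toy key-term labels

ada = AdaBoostClassifier(n_estimators=100).fit(X, y)
nn = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500).fit(X, y)
print(ada.score(X, y), nn.score(X, y))
```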
Slide 40

Slide 41: Experiments
• Corpus
  • NTU lecture corpus
    - Mandarin Chinese with embedded English words
    - Single speaker
    - 45.2 hours
  • Example: 我們的solution是viterbi algorithm ("Our solution is the Viterbi algorithm")
Slide 42: Experiments

• ASR Accuracy

Language       Mandarin   English   Overall
Char Acc (%)   78.15      53.44     76.26

[Diagram — acoustic model: bilingual (CH + EN) speaker-independent models adapted with some data from the target speaker; language model: adaptive trigram interpolation of background out-of-domain corpora with an in-domain corpus]
Slide 43: Experiments

• Reference Key Terms
  • Annotations from 61 students who have taken the course
    - If the k-th annotator labeled N_k key terms, each of them received a score of 1/N_k, and all other terms received 0 (sketched below)
    - Terms are ranked by the sum of the scores given by all annotators
    - The top N terms are chosen from the list (N is the average of N_k)
  • N = 154 key terms
    - 59 key phrases and 95 keywords
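The scoring scheme can be written down directly; the 1/N_k score follows the reconstruction above. A sketch with hypothetical annotations:

```python
from collections import defaultdict

def reference_key_terms(annotations):
    """annotations: list of term sets, one per annotator.
    Each of annotator k's N_k terms receives 1/N_k; others receive 0."""
    scores = defaultdict(float)
    for labeled in annotations:
        for term in labeled:
            scores[term] += 1.0 / len(labeled)
    n = round(sum(len(a) for a in annotations) / len(annotations))  # average N_k
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]

print(reference_key_terms([{"hmm", "entropy", "viterbi"}, {"hmm", "entropy"}]))
```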
Slide 44: Experiments

• Evaluation
  • Unsupervised learning: the number of extracted key terms is set to N
  • Supervised learning: 3-fold cross-validation
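The evaluation compares the extracted terms against the N reference key terms; a set-level F-measure sketch:

```python
def f1(extracted, reference):
    """Set-level precision/recall/F1 between extracted and reference key terms."""
    hits = len(set(extracted) & set(reference))
    if not hits:
        return 0.0
    precision = hits / len(set(extracted))
    recall = hits / len(set(reference))
    return 2 * precision * recall / (precision + recall)

print(f1(["hmm", "entropy", "bigram"], ["hmm", "entropy", "viterbi"]))
```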
Slide 45: Experiments

• Feature Effectiveness
  • Neural network for keywords from ASR transcriptions

Feature set      F-measure
Pr (Prosodic)    20.78
Lx (Lexical)     42.86
Sm (Semantic)    35.63
Pr + Lx          48.15
Pr + Lx + Sm     56.55

  • Each feature set alone gives an F1 between 20% and 42%
  • Prosodic and lexical features are additive
  • All three feature sets are useful
Slide 46: Experiments

• Overall Performance (manual transcriptions)
  • Baseline: conventional TFIDF scores without branching-entropy phrase extraction, with stop-word removal and PoS filtering

Method                   F-measure
Baseline                 23.38
U: TFIDF                 51.95
U: K-means Exemplar      55.84
S: AdaBoost (AB)         62.39
S: Neural Network (NN)   67.31

  • Branching entropy performs well
  • K-means Exemplar outperforms TFIDF
  • Supervised approaches are better than unsupervised approaches
Slide 47: Experiments

• Overall Performance (F-measure)

Method                   manual   ASR
Baseline                 23.38    20.78
U: TFIDF                 51.95    43.51
U: K-means Exemplar      55.84    52.60
S: AdaBoost (AB)         62.39    57.68
S: Neural Network (NN)   67.31    62.70

  • Performance on ASR transcriptions is slightly worse than on manual transcriptions, but reasonable
  • Supervised learning with a neural network gives the best results
Slide 48

Slide 49: Conclusion
• We propose a new approach to extract key terms
• The performance is improved by
  • Identifying phrases with branching entropy
  • Combining prosodic, lexical, and semantic features
• The results are encouraging
Slide 50

Thanks to the reviewers for their valuable comments.
NTU Virtual Instructor: http://speech.ee.ntu.edu.tw/~RA/lecture


Editor's Notes

  • #2: Hello, everybody. I am Vivian Chen from National Taiwan University. Today I'm going to present my work on automatic key term extraction from spoken course lectures using branching entropy and prosodic/semantic features.
  • #4: First I will define what a key term is. A key term is a term that has a higher term frequency and carries core content. There are two types of key terms. One is a phrase, which we call a key phrase; for example, "language model" is a key phrase. The other is a single word, which we call a keyword, like "entropy". Key term extraction has two advantages: key terms help with indexing and retrieval, and they let us relate key terms to segments of documents. Here's an example.
  • #5: We can show some key terms related to "acoustic model". If a key term and "acoustic model" co-occur in the same document, they are relevant, so we can show them to users.
  • #6: Then we can construct the key term graph to represent the relationships between these key terms like this.
  • #7: Similarly, we can construct the relations between "language model" and other terms. We can then show the whole graph to see how the key terms are organized.
  • #9: Here’s the flow chart. Now there are a lot of spoken documents.
  • #10: First, with speech recognition system, we can get ASR transcriptions.
  • #11: The input of our work includes transcriptions and speech signals.
  • #12: The first part identifies phrases. We propose branching entropy for this part.
  • #13: Then we enter the key term extraction part. We first extract features to represent each candidate term, and then use machine learning methods to decide the key terms.
  • #14: First we explain how to do phrase identification.
  • #15: The target of this work is to decide the boundary of a phrase, but where is the boundary? We can observe some characteristics first: "hidden" is almost always followed by "Markov".
  • #16: Similarly, "hidden Markov" is almost always followed by "model".
  • #17: But at this position, we find that "hidden Markov model" is followed by several different words, so this position is more likely to be a boundary. We then define branching entropy based on this observation.
  • #18: First, I will define right branching entropy. Assume X is "hidden" and x_i is a child of X, say "hidden Markov". We define P(x_i) as the frequency of x_i over the frequency of X, the probability of child x_i given X. We then define the right branching entropy H_r(X) by this equation; it is the entropy over X's children.
  • #19: The idea is that branching entropy is higher at a boundary, so we can place the right boundary where branching entropy rises, as after "hidden Markov model".
  • #20: Similarly, left branching entropy has the same attribute.
  • #21: The branching entropy approach finds the boundaries of key phrases; the idea is that the word entropy is higher at a boundary. There are four parts in this approach, and we present each part in detail.
  • #22: (Same note as #21.)
  • #23: We decide the left boundary similarly, by H_l of X̄, where X̄ is the reversed pattern, like "model Markov hidden". Branching entropy can be implemented with a PAT tree.
  • #24: Then, in the PAT tree, we can compute the right branching entropy for each node. Take p(x_i) as an example: X is the node for the phrase "hidden Markov", x_1 is X's child "hidden Markov model", and x_2 is X's other child "hidden Markov chain". The previously described p(x_i) is computed as shown, and then the right branching entropy of this node. We compute H_r(X) for every X in the PAT tree and H_l(X̄) for every X̄ in the reverse PAT tree.
  • #25: With all candidate terms, including words and phrases, we can extract prosodic, lexical, and semantic features for each term.
  • #26: For each word, we also compute prosodic features. First, we believe a lecturer uses longer duration to emphasize key terms. For the first appearance of a candidate term, we compute the duration of each phone in the term, then normalize the duration of a specific phone by the average duration of that phone. We use only four values to represent the term: the maximum, minimum, mean, and range over all phones in the term.
  • #27: We believe that higher pitch may represent important information. So we can extract the pitch contour like this.
  • #28: The method is like duration, but the segment unit becomes a single frame. We again use these four values as the features.
  • #29: Similarly, we think higher energy may represent important information, so we can also extract the energy of each frame in a candidate term.
  • #30: These features are like the pitch features. The first set of features is shown here.
  • #31: The second set is lexical features. These are well-known features that may indicate the importance of a term; we simply use them to represent each term.
  • #32: The third set is semantic features. The assumption is that key terms tend to focus on limited topics. We use PLSA to compute semantic features: we can get the probability of each topic given a candidate term.
  • #33: Here are the distributions of two terms. Notice that the first term has similar probability for most topics; it may be a function word, so it is likely a non-key term. The other term focuses on fewer topics, so it is more likely to be a key term. To describe each distribution, we compute its mean, variance, and standard deviation as the features of the term.
  • #34: Similarly, we can compute latent topic significance, the within-topic to out-of-topic ratio in this equation: the significance of topic T_k for term t_i, where the numerator is the within-topic frequency, n(t_i, d_j) is the count of t_i in document d_j, and the denominator is the out-of-topic frequency.
  • #35: Because the significance of key terms concentrates on limited topics, we also compute these three values to represent the distribution.
  • #36: We can also compute latent topic entropy, because this feature likewise describes the difference between the two kinds of distributions: a non-key term has a higher LTE, and a key term has a lower one.
  • #38: Finally, we use some machine learning methods to extract key terms.
  • #39: The first method is unsupervised learning, K-means Exemplar. First, we transform each word into a vector in latent topic significance space, as in this equation. With these vectors, we run K-means. The terms in the same cluster focus on a single topic, so we extract the centroid of each cluster as the key term, because the terms in the group are related to this key term.
  • #40: We also use two supervised learning methods, adaptive boosting and a neural network, which automatically adjust the weights of the features to produce a classifier.
  • #42: Then we run experiments to evaluate our approach. The corpus is the NTU lecture corpus, which is Mandarin Chinese with embedded English words, as in this example: "我們的solution是viterbi algorithm", which means "our solution is the Viterbi algorithm". The lectures are from a single speaker, and the corpus totals about 45 hours.
  • #43: In the ASR system, we train two acoustic models, for Chinese and English, and use some data to adapt them, finally obtaining a bilingual acoustic model. The language model is a trigram interpolation of out-of-domain corpora and an in-domain corpus. Here is the ASR accuracy.
  • #44: To evaluate our results, we need a reference key term list. The reference key terms come from annotations by students who have taken the course. We sort all terms and take the top N as key terms, where N is the average number of key terms annotated per student; N is 154. The reference list includes 59 key phrases and 95 keywords.
  • #45: Finally, we evaluate the results. For unsupervised learning, we set the number of key terms to N. Supervised learning is evaluated with 3-fold cross-validation.
  • #46: In this experiment, we examine feature effectiveness. The results cover keywords only, using a neural network on ASR transcriptions. Rows a, b, and c give F1 measures from 20% to 42%. Row d shows that prosodic and lexical features are additive. Row e shows that adding semantic features further improves the performance, so all three feature sets are useful.
  • #47: Finally, we show the overall performance for manual and ASR transcriptions. The baseline is conventional TFIDF without extracting phrases using branching entropy. From the better performance of the other methods, we can see that branching entropy is very useful; this supports our assumption that a term with higher branching entropy is more likely to be a key term. The best results come from supervised learning with a neural network, achieving F1 measures of 67 and 62.
  • #48: (Same note as #47.)
  • #50: From the above experiments, we conclude that the proposed approach extracts key terms efficiently. The performance is improved by two ideas: using branching entropy to extract key phrases, and using the three sets of features.