Slide 1: Automatic Key Term Extraction from Spoken Course Lectures
Yun-Nung (Vivian) Chen, Yu Huang, Sheng-Yi Kong, Lin-Shan Lee
National Taiwan University, Taiwan

Slide 2
Slide 3: Definition
• Key Term
  • Higher term frequency
  • Core content
• Two types
  • Keyword
  • Key phrase
• Advantages
  • Indexing and retrieval
  • Relating key terms to segments of documents
Slide 4: Introduction
Slide 5: Introduction

[Diagram: key terms related to "acoustic model" — hmm, phone, hidden Markov model, language model, n gram]
Slide 6: Introduction

[Diagram: key-term graph linking hmm, acoustic model, language model, n gram, bigram, phone, hidden Markov model]

Target: extract key terms from course lectures
Slide 7
Slide 8: Automatic Key Term Extraction

[System flow chart — original spoken documents: an archive of spoken documents; the speech signal goes through ASR to produce ASR transcriptions, which feed Branching Entropy, then Feature Extraction, then Learning Methods: 1) K-means Exemplar, 2) AdaBoost, 3) Neural Network]
Slide 9: Automatic Key Term Extraction

[Same flow chart as slide 8]
Slide 10: Automatic Key Term Extraction

[Same flow chart as slide 8]
Slide 11: Automatic Key Term Extraction

[Flow chart with the Phrase Identification stage highlighted]

First, branching entropy is used to identify phrases.
Slide 12: Automatic Key Term Extraction

[Flow chart with the Key Term Extraction stage highlighted; output: key terms such as "entropy", "acoustic model", ...]

Then, learning methods extract key terms from a set of features.
Slide 13: Automatic Key Term Extraction

[Complete flow chart: Phrase Identification followed by Key Term Extraction, producing the key-term list]
Slide 14: Branching Entropy

How to decide the boundary of a phrase?
• "hidden" is almost always followed by the same word

[Illustration: words around "hidden Markov model" — preceded by words such as "represent", "is", "can"; followed by words such as "is", "of", "in"]
Slide 15: Branching Entropy

How to decide the boundary of a phrase?
• "hidden" is almost always followed by the same word
• "hidden Markov" is almost always followed by the same word

[Same illustration as slide 14]
Slide 16: Branching Entropy

How to decide the boundary of a phrase?
• "hidden" is almost always followed by the same word
• "hidden Markov" is almost always followed by the same word
• "hidden Markov model" is followed by many different words
→ a boundary is likely after "hidden Markov model"; branching entropy is defined to detect such possible boundaries

[Same illustration as slide 14]
Slide 17: Branching Entropy

How to decide the boundary of a phrase?
• Definition of Right Branching Entropy
  • Probability of child x_i of pattern X (e.g., X = "hidden Markov", x_i = "hidden Markov model"):
    P(x_i) = f(x_i) / f(X)
  • Right branching entropy for X:
    H_r(X) = −Σ_i P(x_i) log P(x_i)
Slide 18: Branching Entropy

How to decide the boundary of a phrase?
• Decision of Right Boundary
  • Find the right boundary located between X and x_i where the right branching entropy rises: entropy is high at a phrase boundary (sketched below)

[Same illustration as slide 14, with the boundary marked after "hidden Markov model"]
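For concreteness, right branching entropy can be computed directly from n-gram continuation counts. A minimal Python sketch, with toy counts standing in for the lecture corpus statistics (all names and numbers here are illustrative, not from the paper):

```python
import math

def right_branching_entropy(continuations):
    """H_r(X) = -sum_i P(x_i) log P(x_i), where P(x_i) = f(x_i) / f(X)
    and `continuations` maps each next word to its frequency f(x_i)."""
    total = sum(continuations.values())  # f(X): frequency of the pattern X
    h = 0.0
    for freq in continuations.values():
        p = freq / total                 # P(x_i)
        h -= p * math.log(p)
    return h

# Toy counts: "hidden" and "hidden Markov" are nearly deterministic,
# while "hidden Markov model" branches into many different words.
next_word_counts = {
    ("hidden",): {"Markov": 99, "state": 1},
    ("hidden", "Markov"): {"model": 98, "chain": 2},
    ("hidden", "Markov", "model"): {"is": 30, "can": 25, "represents": 20, "of": 25},
}

for pattern, conts in next_word_counts.items():
    print(" ".join(pattern), "->", round(right_branching_entropy(conts), 3))
# Entropy jumps after "hidden Markov model", suggesting a right boundary there.
```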
Slide 19: Branching Entropy

[Animation step of the slide 14 illustration]
Slide 20: Branching Entropy

[Animation step of the slide 14 illustration]
Slide 21: Branching Entropy

[Animation step of the slide 14 illustration]
Slide 22: Branching Entropy

How to decide the boundary of a phrase?
• Decision of Left Boundary
  • Find the left boundary located between X̄ and x_i where the left branching entropy H_l(X̄) rises; X̄ is the reversed pattern (e.g., X̄: "model Markov hidden")
• A PAT tree is used for the implementation

[Same illustration, with the boundary marked]
Slide 23: Branching Entropy

• Implementation in the PAT tree (sketched below)
  • Probability of child x_i of X: P(x_i) = f(x_i) / f(X)
  • Right branching entropy for X: H_r(X) = −Σ_i P(x_i) log P(x_i)

[PAT tree illustration rooted at "hidden", branching through "Markov" to "model", "chain", "state", "distribution", "variable";
 X: hidden Markov
 x_1: hidden Markov model
 x_2: hidden Markov chain]
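The deck implements these counts with a PAT tree; for illustration, a plain dictionary trie yields the same continuation statistics, and the left branching entropy H_l(X̄) can be obtained by running the identical code on reversed word sequences. A sketch under those assumptions:

```python
from collections import defaultdict

def continuation_counts(corpus_sents, max_len=4):
    """Map each word pattern X (up to max_len words) to the frequencies
    of the words that follow it, emulating the counts a PAT tree stores."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in corpus_sents:
        for i in range(len(sent)):
            for n in range(1, max_len + 1):
                if i + n < len(sent):
                    pattern = tuple(sent[i:i + n])
                    counts[pattern][sent[i + n]] += 1
    return counts

sents = [["the", "hidden", "Markov", "model", "is", "used"],
         ["a", "hidden", "Markov", "model", "of", "speech"]]
right = continuation_counts(sents)
# For H_l(X̄): reverse every sentence and reuse the same function.
left = continuation_counts([list(reversed(s)) for s in sents])
print(dict(right[("hidden", "Markov")]))   # {'model': 2}
```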
Slide 24: Automatic Key Term Extraction

[Flow chart with the Feature Extraction stage highlighted]

Prosodic, lexical, and semantic features are extracted for each candidate term.
Slide 25: Feature Extraction

• Prosodic features
  • Computed for the first appearance of each candidate term
  • A speaker tends to use longer duration to emphasize key terms

Feature Name      Feature Description
Duration (I–IV)   normalized duration (max, min, mean, range)

Four values summarize the durations over the phones of the term; the duration of each phone (e.g., phone "a") is normalized by the average duration of that phone.
Slide 26: Feature Extraction

• Prosodic features (cont.)
  • Higher pitch may represent significant information

[Table as on slide 25]
Slide 27: Feature Extraction

• Prosodic features (cont.)
  • Higher pitch may represent significant information

Feature Name      Feature Description
Duration (I–IV)   normalized duration (max, min, mean, range)
Pitch (I–IV)      F0 (max, min, mean, range)
Slide 28: Feature Extraction

• Prosodic features (cont.)
  • Higher energy emphasizes important information

[Table as on slide 27]
Slide 29: Feature Extraction

• Prosodic features (cont.)
  • Higher energy emphasizes important information

Feature Name      Feature Description
Duration (I–IV)   normalized duration (max, min, mean, range)
Pitch (I–IV)      F0 (max, min, mean, range)
Energy (I–IV)     energy (max, min, mean, range)
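As a concrete reading of the Duration (I–IV) row, here is a sketch of the normalization described on slide 25. It assumes hypothetical phone-level alignments (phone, duration) from the recognizer, which the slides do not spell out:

```python
import statistics

def duration_features(term_phones, avg_phone_duration):
    """term_phones: list of (phone, duration_sec) for the term's first occurrence.
    Each duration is normalized by that phone's corpus-wide average duration;
    the four features are max, min, mean, and range over the term's phones."""
    normalized = [dur / avg_phone_duration[ph] for ph, dur in term_phones]
    return (max(normalized), min(normalized),
            statistics.mean(normalized), max(normalized) - min(normalized))

avg = {"a": 0.09, "k": 0.05, "m": 0.07}
print(duration_features([("a", 0.12), ("k", 0.05), ("m", 0.08)], avg))
```

The pitch and energy rows follow the same pattern, with the per-phone duration replaced by per-frame F0 and energy values.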
Slide 30: Feature Extraction

• Lexical features — well-known features, computed for each candidate term (sketched below)

Feature Name   Feature Description
TF             term frequency
IDF            inverse document frequency
TFIDF          tf * idf
PoS            the PoS tag
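These lexical features follow standard definitions; a minimal sketch using the usual TF-IDF formulation (the paper may use a different variant):

```python
import math
from collections import Counter

def tfidf(term, doc_tokens, all_docs):
    tf = Counter(doc_tokens)[term]                      # term frequency in this document
    df = sum(term in d for d in all_docs)               # number of documents containing it
    idf = math.log(len(all_docs) / df) if df else 0.0   # inverse document frequency
    return tf * idf

docs = [["entropy", "is", "defined"], ["acoustic", "model", "entropy"], ["language", "model"]]
print(tfidf("entropy", docs[0], docs))
```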
Slide 31: Feature Extraction

• Semantic features
  • Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Probability
  • Key terms tend to focus on limited topics

[PLSA diagram: documents D_1 … D_i … D_N connect to latent topics T_1 … T_k … T_K with probability P(T_k | D_i); topics connect to terms t_1 … t_j … t_n with probability P(t_j | T_k)]
Slide 32: Feature Extraction

• Semantic features
  • PLSA: Latent Topic Probability — how to use it? Describe the probability distribution
  • Key terms tend to focus on limited topics

Feature Name   Feature Description
LTP (I–III)    Latent Topic Probability (mean, variance, standard deviation)

[Illustration: a non-key term has a nearly flat topic distribution; a key term concentrates on a few topics]
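The LTP (I–III) features are then simple statistics of the term's topic distribution. A sketch, assuming a hypothetical vector of P(T_k | t) obtained from PLSA (toy values below):

```python
import statistics

def ltp_features(p_topic_given_term):
    """Mean, variance, and standard deviation of the latent topic distribution.
    A key term concentrates its mass on few topics, so its variance is larger."""
    mean = statistics.mean(p_topic_given_term)
    var = statistics.pvariance(p_topic_given_term)
    return mean, var, var ** 0.5

flat = [0.125] * 8                      # function-word-like term: uniform over 8 topics
peaked = [0.72, 0.2] + [0.01333] * 6    # key-term-like: mass on two topics
print(ltp_features(flat), ltp_features(peaked))
```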
Slide 33: Feature Extraction

• Semantic features
  • PLSA: Latent Topic Significance — within-topic to out-of-topic ratio:
    LTS_{t_i}(T_k) = [Σ_j n(t_i, d_j) P(T_k | d_j)] / [Σ_j n(t_i, d_j) (1 − P(T_k | d_j))]
    where n(t_i, d_j) is the count of t_i in document d_j; the numerator is the within-topic frequency and the denominator the out-of-topic frequency
  • Key terms tend to focus on limited topics
Slide 34: Feature Extraction

• Semantic features
  • PLSA: Latent Topic Significance — within-topic to out-of-topic ratio (sketched below)
  • Key terms tend to focus on limited topics

Feature Name   Feature Description
LTP (I–III)    Latent Topic Probability (mean, variance, standard deviation)
LTS (I–III)    Latent Topic Significance (mean, variance, standard deviation)
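A sketch of the within-topic to out-of-topic ratio as described in note #34 below; weighting the term counts by P(T_k | d_j) is an assumption consistent with that description:

```python
def latent_topic_significance(term_counts, p_topic_given_doc):
    """LTS of one topic for a term.
    term_counts[j]      : n(t_i, d_j), count of the term in document j
    p_topic_given_doc[j]: P(T_k | d_j) from PLSA for the chosen topic."""
    within = sum(n * p_topic_given_doc[j] for j, n in enumerate(term_counts))
    out = sum(n * (1.0 - p_topic_given_doc[j]) for j, n in enumerate(term_counts))
    return within / out if out else float("inf")

counts = [3, 0, 5, 1]          # term occurrences in four documents
p_k = [0.9, 0.1, 0.8, 0.2]     # P(T_k | d_j) for some topic k
print(latent_topic_significance(counts, p_k))
```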
Slide 35: Feature Extraction

• Semantic features
  • PLSA: Latent Topic Entropy
  • Key terms tend to focus on limited topics

Feature Name   Feature Description
LTP (I–III)    Latent Topic Probability (mean, variance, standard deviation)
LTS (I–III)    Latent Topic Significance (mean, variance, standard deviation)
Slide 36: Feature Extraction

• Semantic features
  • PLSA: Latent Topic Entropy
    LTE(t_i) = −Σ_k P(T_k | t_i) log P(T_k | t_i)
  • A non-key term (flat distribution) has a higher LTE; a key term (focused distribution) has a lower LTE

Feature Name   Feature Description
LTP (I–III)    Latent Topic Probability (mean, variance, standard deviation)
LTS (I–III)    Latent Topic Significance (mean, variance, standard deviation)
LTE            term entropy over latent topics
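Latent topic entropy is the entropy of the same topic distribution; lower values suggest a key term. A standard entropy sketch:

```python
import math

def latent_topic_entropy(p_topic_given_term):
    """LTE(t) = -sum_k P(T_k | t) log P(T_k | t); key terms tend to score lower."""
    return -sum(p * math.log(p) for p in p_topic_given_term if p > 0)

print(latent_topic_entropy([0.125] * 8))        # high LTE: likely non-key term
print(latent_topic_entropy([0.8, 0.15, 0.05]))  # low LTE: likely key term
```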
Slide 37: Automatic Key Term Extraction

[Flow chart with the Learning Methods stage highlighted; output: key terms such as "entropy", "acoustic model", ...]

Unsupervised and supervised approaches are used to extract key terms.
Slide 38: Learning Methods

• Unsupervised learning
  • K-means Exemplar (sketched below)
    - Transform each term into a vector in LTS (Latent Topic Significance) space
    - Run K-means
    - Take the centroid of each cluster as a key term
  • The terms in the same cluster focus on a single topic: the candidate terms in a group are related to its key term, and the key term can represent the topic
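A sketch of the K-means Exemplar step with scikit-learn, assuming candidate terms are already embedded in LTS space (terms and vectors below are toy values):

```python
import numpy as np
from sklearn.cluster import KMeans

terms = ["entropy", "uncertainty", "acoustic model", "phone"]
lts_vectors = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(lts_vectors)

# The exemplar of each cluster: the term closest to the cluster centroid
# is taken as a key term; the rest of the cluster relates to it.
for c, centroid in enumerate(kmeans.cluster_centers_):
    members = [i for i, lab in enumerate(kmeans.labels_) if lab == c]
    exemplar = min(members, key=lambda i: np.linalg.norm(lts_vectors[i] - centroid))
    print(f"cluster {c}: key term = {terms[exemplar]}")
```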
Slide 39: Learning Methods

• Supervised learning
  • Adaptive Boosting (AdaBoost)
  • Neural Network
  • The weights of the features are adjusted automatically to produce a classifier
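A sketch of the two supervised alternatives with scikit-learn; the feature matrix stands in for the prosodic, lexical, and semantic features above, and the labels are toy values:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))           # e.g., 14 prosodic/lexical/semantic features
y = (X[:, 0] + X[:, 3] > 0).astype(int)  # toy key-term labels

ada = AdaBoostClassifier(n_estimators=100).fit(X, y)
nn = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500).fit(X, y)
print(ada.score(X, y), nn.score(X, y))
```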
Slide 40

Slide 41: Experiments
• Corpus
  • NTU lecture corpus
    - Mandarin Chinese with embedded English words
    - Single speaker
    - 45.2 hours
  • Example: 我們的solution是viterbi algorithm ("Our solution is the Viterbi algorithm")
Slide 42: Experiments

• ASR Accuracy

Language       Mandarin   English   Overall
Char Acc (%)   78.15      53.44     76.26

[Diagram — acoustic model: bilingual (CH + EN) speaker-independent models adapted with some data from the target speaker; language model: adaptive trigram interpolation of background out-of-domain corpora with an in-domain corpus]
Slide 43: Experiments

• Reference Key Terms
  • Annotations from 61 students who have taken the course
    - If the k-th annotator labeled N_k key terms, each of them received a score of 1/N_k, and all other terms received 0 (sketched below)
    - Terms are ranked by the sum of the scores given by all annotators
    - The top N terms are chosen from the list (N is the average of N_k)
  • N = 154 key terms
    - 59 key phrases and 95 keywords
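The scoring scheme can be written down directly; the 1/N_k score follows the reconstruction above. A sketch with hypothetical annotations:

```python
from collections import defaultdict

def reference_key_terms(annotations):
    """annotations: list of term sets, one per annotator.
    Each of annotator k's N_k terms receives 1/N_k; others receive 0."""
    scores = defaultdict(float)
    for labeled in annotations:
        for term in labeled:
            scores[term] += 1.0 / len(labeled)
    n = round(sum(len(a) for a in annotations) / len(annotations))  # average N_k
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]

print(reference_key_terms([{"hmm", "entropy", "viterbi"}, {"hmm", "entropy"}]))
```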
Slide 44: Experiments

• Evaluation
  • Unsupervised learning: the number of extracted key terms is set to N
  • Supervised learning: 3-fold cross-validation
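The evaluation compares the extracted terms against the N reference key terms; a set-level F-measure sketch:

```python
def f1(extracted, reference):
    """Set-level precision/recall/F1 between extracted and reference key terms."""
    hits = len(set(extracted) & set(reference))
    if not hits:
        return 0.0
    precision = hits / len(set(extracted))
    recall = hits / len(set(reference))
    return 2 * precision * recall / (precision + recall)

print(f1(["hmm", "entropy", "bigram"], ["hmm", "entropy", "viterbi"]))
```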
Slide 45: Experiments

• Feature Effectiveness
  • Neural network for keywords from ASR transcriptions

Feature set      F-measure
Pr (Prosodic)    20.78
Lx (Lexical)     42.86
Sm (Semantic)    35.63
Pr + Lx          48.15
Pr + Lx + Sm     56.55

  • Each feature set alone gives an F1 between 20% and 42%
  • Prosodic and lexical features are additive
  • All three feature sets are useful
Slide 46: Experiments

• Overall Performance (manual transcriptions)
  • Baseline: conventional TFIDF scores without branching-entropy phrase extraction, with stop-word removal and PoS filtering

Method                   F-measure
Baseline                 23.38
U: TFIDF                 51.95
U: K-means Exemplar      55.84
S: AdaBoost (AB)         62.39
S: Neural Network (NN)   67.31

  • Branching entropy performs well
  • K-means Exemplar outperforms TFIDF
  • Supervised approaches are better than unsupervised approaches
Slide 47: Experiments

• Overall Performance (F-measure)

Method                   manual   ASR
Baseline                 23.38    20.78
U: TFIDF                 51.95    43.51
U: K-means Exemplar      55.84    52.60
S: AdaBoost (AB)         62.39    57.68
S: Neural Network (NN)   67.31    62.70

  • Performance on ASR transcriptions is slightly worse than on manual transcriptions, but reasonable
  • Supervised learning with a neural network gives the best results
Slide 48

Slide 49: Conclusion
• We propose a new approach to extract key terms
• The performance is improved by
  • Identifying phrases with branching entropy
  • Combining prosodic, lexical, and semantic features
• The results are encouraging
Slide 50

Thanks to the reviewers for their valuable comments.
NTU Virtual Instructor: http://speech.ee.ntu.edu.tw/~RA/lecture


Editor's Notes

  • #2: Hello, everybody. I am Vivian Chen from National Taiwan University. Today I'm going to present my work on automatic key term extraction from spoken course lectures using branching entropy and prosodic/semantic features.
  • #4: First I will define what a key term is. A key term is a term that has a higher term frequency and carries core content. There are two types of key terms. One is a phrase, which we call a key phrase; for example, "language model" is a key phrase. The other is a single word, which we call a keyword, like "entropy". Key term extraction has two advantages: key terms help with indexing and retrieval, and they let us relate key terms to segments of documents. Here's an example.
  • #5: We can show some key terms related to "acoustic model". If a key term and "acoustic model" co-occur in the same document, they are relevant, so we can show them to users.
  • #6: Then we can construct the key term graph to represent the relationships between these key terms like this.
  • #7: Similarly, we can construct the relations between "language model" and other terms. We can then show the whole graph to see how the key terms are organized.
  • #9: Here’s the flow chart. Now there are a lot of spoken documents.
  • #10: First, with speech recognition system, we can get ASR transcriptions.
  • #11: The input of our work includes transcriptions and speech signals.
  • #12: The first part identifies phrases. We propose branching entropy for this part.
  • #13: Then we enter the key term extraction part. We first extract features to represent each candidate term, and then use machine learning methods to decide the key terms.
  • #14: First we explain how to do phrase identification.
  • #15: The target of this work is to decide the boundary of a phrase, but where is the boundary? We can observe some characteristics first: "hidden" is almost always followed by "Markov".
  • #16: Similarly, "hidden Markov" is almost always followed by "model".
  • #17: But at this position, we find that "hidden Markov model" is followed by several different words, so this position is more likely to be a boundary. We then define branching entropy based on this observation.
  • #18: First, I will define right branching entropy. Assume X is "hidden" and x_i is a child of X, say "hidden Markov". We define P(x_i) as the frequency of x_i over the frequency of X, the probability of child x_i given X. We then define the right branching entropy H_r(X) by this equation; it is the entropy over X's children.
  • #19: The idea is that branching entropy is higher at a boundary, so we can place the right boundary where branching entropy rises, as after "hidden Markov model".
  • #20: Similarly, left branching entropy has the same attribute.
  • #21: The branching entropy approach finds the boundaries of key phrases; the idea is that the word entropy is higher at a boundary. There are four parts in this approach, and we present each part in detail.
  • #22: (Same note as #21.)
  • #23: We decide the left boundary similarly, by H_l of X̄, where X̄ is the reversed pattern, like "model Markov hidden". Branching entropy can be implemented with a PAT tree.
  • #24: Then, in the PAT tree, we can compute the right branching entropy for each node. Take p(x_i) as an example: X is the node for the phrase "hidden Markov", x_1 is X's child "hidden Markov model", and x_2 is X's other child "hidden Markov chain". The previously described p(x_i) is computed as shown, and then the right branching entropy of this node. We compute H_r(X) for every X in the PAT tree and H_l(X̄) for every X̄ in the reverse PAT tree.
  • #25: With all candidate terms, including words and phrases, we can extract prosodic, lexical, and semantic features for each term.
  • #26: For each word, we also compute prosodic features. First, we believe a lecturer uses longer duration to emphasize key terms. For the first appearance of a candidate term, we compute the duration of each phone in the term, then normalize the duration of a specific phone by the average duration of that phone. We use only four values to represent the term: the maximum, minimum, mean, and range over all phones in the term.
  • #27: We believe that higher pitch may represent important information. So we can extract the pitch contour like this.
  • #28: The method is like duration, but the segment unit becomes a single frame. We again use these four values as the features.
  • #29: Similarly, we think higher energy may represent important information, so we can also extract the energy of each frame in a candidate term.
  • #30: These features are like the pitch features. The first set of features is shown here.
  • #31: The second set is lexical features. These are well-known features that may indicate the importance of a term; we simply use them to represent each term.
  • #32: The third set is semantic features. The assumption is that key terms tend to focus on limited topics. We use PLSA to compute semantic features: we can get the probability of each topic given a candidate term.
  • #33: Here are the distributions of two terms. Notice that the first term has similar probability for most topics; it may be a function word, so it is likely a non-key term. The other term focuses on fewer topics, so it is more likely to be a key term. To describe each distribution, we compute its mean, variance, and standard deviation as the features of the term.
  • #34: Similarly, we can compute latent topic significance, the within-topic to out-of-topic ratio in this equation: the significance of topic T_k for term t_i, where the numerator is the within-topic frequency, n(t_i, d_j) is the count of t_i in document d_j, and the denominator is the out-of-topic frequency.
  • #35: Because the significance of key terms concentrates on limited topics, we also compute these three values to represent the distribution.
  • #36: We can also compute latent topic entropy, because this feature likewise describes the difference between the two kinds of distributions: a non-key term has a higher LTE, and a key term has a lower one.
  • #38: Finally, we use some machine learning methods to extract key terms.
  • #39: The first method is unsupervised learning, K-means Exemplar. First, we transform each word into a vector in latent topic significance space, as in this equation. With these vectors, we run K-means. The terms in the same cluster focus on a single topic, so we extract the centroid of each cluster as the key term, because the terms in the group are related to this key term.
  • #40: We also use two supervised learning methods, adaptive boosting and a neural network, which automatically adjust the weights of the features to produce a classifier.
  • #42: Then we run experiments to evaluate our approach. The corpus is the NTU lecture corpus, which is Mandarin Chinese with embedded English words, as in this example: "我們的solution是viterbi algorithm", which means "our solution is the Viterbi algorithm". The lectures are from a single speaker, and the corpus totals about 45 hours.
  • #43: In the ASR system, we train two acoustic models, for Chinese and English, and use some data to adapt them, finally obtaining a bilingual acoustic model. The language model is a trigram interpolation of out-of-domain corpora and an in-domain corpus. Here is the ASR accuracy.
  • #44: To evaluate our results, we need a reference key term list. The reference key terms come from annotations by students who have taken the course. We sort all terms and take the top N as key terms, where N is the average number of key terms annotated per student; N is 154. The reference list includes 59 key phrases and 95 keywords.
  • #45: Finally, we evaluate the results. For unsupervised learning, we set the number of key terms to N. Supervised learning is evaluated with 3-fold cross-validation.
  • #46: In this experiment, we examine feature effectiveness. The results cover keywords only, using a neural network on ASR transcriptions. Rows a, b, and c give F1 measures from 20% to 42%. Row d shows that prosodic and lexical features are additive. Row e shows that adding semantic features further improves the performance, so all three feature sets are useful.
  • #47: Finally, we show the overall performance for manual and ASR transcriptions. The baseline is conventional TFIDF without extracting phrases using branching entropy. From the better performance of the other methods, we can see that branching entropy is very useful; this supports our assumption that a term with higher branching entropy is more likely to be a key term. The best results come from supervised learning with a neural network, achieving F1 measures of 67 and 62.
  • #48: (Same note as #47.)
  • #50: From the above experiments, we conclude that the proposed approach extracts key terms efficiently. The performance is improved by two ideas: using branching entropy to extract key phrases, and using the three sets of features.