Wikification of Concept Mentions within Spoken Dialogues
Using Domain Constraints from Wikipedia
Seokhwan Kim, Rafael E. Banchs, Haizhou Li
Human Language Technology Department, Institute for Infocomm Research (I2R), Singapore
Wikification on Spoken Dialogues
Linking mentions to the relevant concepts in Wikipedia
Differences between spoken dialogues and written texts
Number of speakers
Dependence on background knowledge
Degree of informal and noisy expressions
Examples of Wikification on Singapore tour guide dialogues
Guide: How can I help you?
Tourist: Can you recommend some good places to visit in Singapore?
Guide: Well if you like to visit an icon of Singapore, Merlion park will be a nice place to visit.
Tourist: That is a symbol for your country, right?
Guide: Yes, we use that to symbolise Singapore.
Tourist: Okay.
Guide: The lion head symbolised the founding of the island and the fish body just symbolised the humble fishing village.
Tourist: How can I get there from Orchard Road?
Guide: You can take the red line train from Orchard and stop at Raffles Place.
Tourist: Is this walking distance from the station to the destination?
Guide: Yes, it’ll take only ten minutes on foot.
Tourist: Alright.
Guide: Well, you can also enjoy some seafoods at the riverside near the place.
Tourist: What food do you have any recommendations to try there?
Guide: If you like spicy foods, you must try chilli crab which is one of our favourite dishes here.
Tourist: Great! I’ll try that.
Singapore, Merlion Park, Orchard Road, North South MRT Line, Raffles Place MRT Station, Singapore River, Chilli crab
Three-step Approach for Wikification on Dialogues
[Figure: pipeline. Input mention m_i → Step 1: mention analysis (linking validity, in-dialogue reference, domain relevance, speaker relatedness) → Step 2: candidate generation from Wikipedia concepts → Step 3: candidate ranking → output concept f(m_i). A history <m_j, f(m_j)>, j = 0..(i-1), of earlier mentions and their linked concepts is kept alongside.]
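The three steps can be read as a loop over mentions that carries the history of earlier links. The following is a minimal sketch under assumptions, not the authors' implementation: the function names, the property dictionary, and the toy stand-ins for each step are hypothetical, and passing the history to every step is an assumption, since the slide does not make clear which steps consume it.

```python
from typing import Callable, Dict, List, Optional, Tuple

# History of pairs <m_j, f(m_j)> for the mentions seen so far.
History = List[Tuple[str, Optional[str]]]


def wikify_dialogue(
    mentions: List[str],
    analyze: Callable[[str, History], Dict[str, bool]],
    generate: Callable[[str, Dict[str, bool], History], List[str]],
    rank: Callable[[str, List[str], History], Optional[str]],
) -> History:
    """Run the three steps over the mentions in dialogue order."""
    history: History = []
    for mention in mentions:
        props = analyze(mention, history)               # Step 1: mention analysis
        candidates = generate(mention, props, history)  # Step 2: candidate generation
        concept = rank(mention, candidates, history)    # Step 3: candidate ranking
        history.append((mention, concept))
    return history


# Toy stand-ins (hypothetical), only to make the sketch executable.
TOY_INDEX = {"merlion park": ["Merlion Park"]}


def toy_analyze(mention: str, history: History) -> Dict[str, bool]:
    return {"in_dialogue_ref": mention.lower() in {"that", "there"}}


def toy_generate(mention: str, props: Dict[str, bool], history: History) -> List[str]:
    if props["in_dialogue_ref"] and history:
        return [history[-1][1]]  # reuse the most recently linked concept
    return TOY_INDEX.get(mention.lower(), [])


def toy_rank(mention: str, candidates: List[str], history: History) -> Optional[str]:
    return candidates[0] if candidates else None


result = wikify_dialogue(["Merlion park", "That"], toy_analyze, toy_generate, toy_rank)
print(result)
```

In this toy run the second mention, "That", resolves to the concept linked for the first mention, which is the kind of dependency the history is meant to capture.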
Step 1: Mention Analysis
Analyzing four binary properties of a given mention
Linking validity, In-dialogue reference, Domain relevance, Speaker relatedness
Guide: In the morning I suggest to you to go to Botanical Garden.
LV ID DR SRG SRT
- - - - -
LV ID DR SRG SRT
+ - + + -
Tourist: Oh, we also have Botanical Garden.
LV ID DR SRG SRT
+ - - - +
Tourist: That is actually one of my favourite places here.
LV ID DR SRG SRT
+ + - - +
LV ID DR SRG SRT
+ - - - +
Guide: If so, you might like this place also.
LV ID DR SRG SRT
+ + + + -
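The label rows above can be held in a small record type. This is an illustrative sketch only: the class and field names are hypothetical, and reading SRG and SRT as speaker relatedness to the guide and to the tourist is an assumption based on the column names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MentionProperties:
    linking_valid: bool       # LV
    in_dialogue_ref: bool     # ID
    domain_relevant: bool     # DR
    related_to_guide: bool    # SRG (expansion assumed)
    related_to_tourist: bool  # SRT (expansion assumed)


def parse_label_row(row: str) -> MentionProperties:
    """Parse a row such as '+ - + + -' in the order LV ID DR SRG SRT."""
    flags = row.split()
    if len(flags) != 5 or any(f not in ("+", "-") for f in flags):
        raise ValueError(f"expected five '+'/'-' flags, got: {row!r}")
    return MentionProperties(*(f == "+" for f in flags))


props = parse_label_row("+ - + + -")
print(props)
```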
Step 2: Candidate Generation
Candidate retrieval from a Lucene index on the Wikipedia collection
With filtering constraints based on the properties analyzed in Step 1
Combination of multiple constraints: intersection or union
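The difference between the two combination modes can be sketched as follows. This is a hedged illustration, not the Lucene-side filtering used in the work: the candidate fields `in_domain` and `in_history`, the example titles, and the function name are all hypothetical.

```python
from typing import Callable, Dict, List

Candidate = Dict[str, object]
Constraint = Callable[[Candidate], bool]


def filter_candidates(
    candidates: List[Candidate],
    constraints: List[Constraint],
    mode: str = "intersection",
) -> List[Candidate]:
    """Keep candidates that satisfy all (intersection) or any (union) constraints."""
    if mode not in ("intersection", "union"):
        raise ValueError(f"unknown mode: {mode!r}")
    if not constraints:
        return list(candidates)  # no active constraint: nothing is filtered out
    combine = all if mode == "intersection" else any
    return [c for c in candidates if combine(check(c) for check in constraints)]


# Hypothetical retrieved candidates with precomputed constraint flags.
candidates = [
    {"title": "Singapore Botanic Gardens", "in_domain": True, "in_history": True},
    {"title": "Botanical garden", "in_domain": False, "in_history": True},
    {"title": "Royal Botanic Gardens, Kew", "in_domain": False, "in_history": False},
    {"title": "Gardens by the Bay", "in_domain": True, "in_history": False},
]
constraints = [lambda c: bool(c["in_domain"]), lambda c: bool(c["in_history"])]

strict = [c["title"] for c in filter_candidates(candidates, constraints, "intersection")]
loose = [c["title"] for c in filter_candidates(candidates, constraints, "union")]
print(strict)
print(loose)
```

Intersection yields a smaller, more precise candidate set, while union keeps more candidates at the cost of more noise, which matches the precision and recall trade-off one would expect between the two settings.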
Step 3: Candidate Ranking
Ranking SVM: Supervised learning to rank algorithm
\[
s(m, c) =
\begin{cases}
4 & \text{if } c \text{ is exactly the same as } g(m), \\
3 & \text{if } c \text{ is the parent article of } g(m), \\
2 & \text{if } c \text{ belongs to the same article but a different section than } g(m), \\
1 & \text{otherwise.}
\end{cases}
\]
m: a mention
c: a candidate concept
g(m): the manual annotation for the most relevant concept of m
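The graded relevance s(m, c) can be sketched in code as a function of a candidate and the gold concept. This is an assumption-laden illustration: the `Concept` type is hypothetical, and treating any non-identical concept within the gold article as the score-2 case (including a section candidate when the gold concept is the whole article) is one reading of the definition. The second helper writes a training line in the `<target> qid:<id> <index>:<value>` layout used by SVMrank input files, with the mention index as the query id, which is an assumption about how the training data might be organised.

```python
from typing import List, NamedTuple, Optional


class Concept(NamedTuple):
    article: str
    section: Optional[str] = None  # None means the article as a whole


def relevance_score(candidate: Concept, gold: Concept) -> int:
    """Graded relevance of a candidate c against the annotated concept g(m)."""
    if candidate == gold:
        return 4
    if candidate.article == gold.article:
        # Same article: either the parent article of a gold section (3)
        # or some other section of that article (2).
        return 3 if candidate.section is None else 2
    return 1


def to_svmrank_line(score: int, qid: int, features: List[float]) -> str:
    """Format one candidate as an SVMrank-style line (feature indices are 1-based)."""
    feats = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
    return f"{score} qid:{qid} {feats}"


gold = Concept("Singapore", "Transport")
scores = [
    relevance_score(Concept("Singapore", "Transport"), gold),
    relevance_score(Concept("Singapore"), gold),
    relevance_score(Concept("Singapore", "Cuisine"), gold),
    relevance_score(Concept("Malaysia"), gold),
]
print(scores)
print(to_svmrank_line(scores[0], 1, [0.5, 1.0]))
```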
Datasets
Singapore tour guide dialogues
Human-human mixed-initiative dialogues
35 sessions, 21 hours, 31,034 utterances
Manually annotated with relevant Wikipedia concepts
Preprocessed with the Stanford CoreNLP toolkit
Wikipedia collection
4,797,927 articles and 25,577,464 sections in total
Collected from Wikipedia database dump as of January 2015
Indexed into a Lucene index
Evaluation: Mention Analysis
SVMlight was used for training four mention analyzers
With four sets of features: mention (M), utterance (U), dialogue (D), and Wikipedia-based (W) features
Five-fold cross validation with F-measure
Features   LV     ID     SRG    SRT
M          86.29  69.15  71.10  72.94
M+U        86.90  70.43  70.43  68.85
M+D        86.17  71.09  70.56  71.52
M+W        86.21  68.96  70.66  71.86
M+U+D      86.82  72.37  70.12  68.30
M+U+W      86.84  70.13  70.19  68.78
M+U+D+W    86.77  72.20  69.94  68.10
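A five-fold cross validation of the kind reported above could be set up as in the sketch below. The slides do not say whether folds were formed over sessions, utterances, or mentions, nor whether the data was shuffled, so splitting the 35 sessions deterministically into five folds is purely an assumption, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def k_fold_splits(items: Sequence[int], k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) pairs; each item is in exactly one test fold."""
    if k < 2 or k > len(items):
        raise ValueError("k must be between 2 and the number of items")
    folds = [list(items[i::k]) for i in range(k)]  # round-robin assignment
    splits = []
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, test))
    return splits


sessions = list(range(35))  # stand-ins for the 35 dialogue sessions
splits = k_fold_splits(sessions, k=5)
for train, test in splits:
    print(len(train), len(test))
```

Splitting by session rather than by individual mention keeps all mentions of one dialogue on the same side of the split, which avoids leaking dialogue context between training and test data.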
Evaluation: Candidate Generation
Four sets of candidates were prepared for each mention
Baseline: Retrieved with no filtering
Intersection: Filtered with intersection of analyzed properties
Union: Filtered with union of analyzed properties
Oracle: Filtered with manually annotated properties
Top 100 candidates were retrieved from a Lucene index for each set
Evaluation: Candidate Ranking
SVMrank was used for training ranking functions
The top-ranked item in the list is taken as the result of Wikification
Five-fold cross validation with Precision/Recall/F-measure
Method            P      R      F
Baseline          26.85  22.52  21.24
Intersection      44.37  27.35  33.84
Union             38.04  31.97  34.74
Manual Filtering  39.90  34.72  37.13
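Precision, recall, and F-measure over top-ranked outputs could be computed as in the sketch below. The exact counting scheme is not given on the slides, so this assumes precision is taken over mentions for which a concept was output and recall over mentions that have a gold concept; the function names and the toy data are hypothetical. The Intersection, Union, and Manual Filtering rows are consistent with F being the harmonic mean of the listed P and R. The Baseline row does not reproduce this way from its listed values, which may reflect averaging over folds or a transcription difference, so the table is left as given.

```python
from typing import Dict, Optional, Tuple


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def evaluate_top1(
    predicted: Dict[str, Optional[str]],
    gold: Dict[str, Optional[str]],
) -> Tuple[float, float, float]:
    """Score top-ranked concepts per mention id; None means 'no link'."""
    n_pred = sum(1 for c in predicted.values() if c is not None)
    n_gold = sum(1 for c in gold.values() if c is not None)
    n_correct = sum(
        1 for m, c in predicted.items() if c is not None and gold.get(m) == c
    )
    p = n_correct / n_pred if n_pred else 0.0
    r = n_correct / n_gold if n_gold else 0.0
    return p, r, f_measure(p, r)


# Toy data (hypothetical): one correct link, one wrong link, one spurious link,
# and one missed mention.
gold = {"m1": "Merlion Park", "m2": "Orchard Road", "m3": None, "m4": "Chilli crab"}
predicted = {"m1": "Merlion Park", "m2": "Orchard MRT Station", "m3": "Singapore", "m4": None}

p, r, f = evaluate_top1(predicted, gold)
print(round(p, 4), round(r, 4), round(f, 4))
print(round(f_measure(44.37, 27.35), 2))  # Intersection row
```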
1 Fusionopolis Way, #21-01 Connexis (South Tower), Singapore 138632 Email: kims@i2r.a-star.edu.sg
