Facilitating Human Intervention in
Coreference Resolution with
Comparative Entity Summaries
Danyun Xu, Gong Cheng, Yuzhong Qu
Nanjing University, China
Presented at ESWC 2014, Crete, Greece
Coreference resolution
TimBL
givenName: “Tim”
surname: “Berners-Lee”
altName: “Tim BL”
type: Scientist
gender: “male”
isDirectorOf: W3C
TBL
name: “Tim Berners-Lee”
type: ComputerScientist
type: RoyalSocietyFellow
sex: “Male”
invented: WWW
founded: WSRI
Wendy
fullName: “Wendy Hall”
type: ComputerScientist
type: RoyalSocietyFellow
sex: “Female”
birthplace: London
founded: WSRI
Methods with humans in the loop
(or, coordinating “ings”)
• Active learning
• Crowdsourcing
• Pay-as-you-go
Candidate coreferent entities
…
TimBL ------ Wendy
TimBL ------ TBL
ChrisB ------ Bizer
…
Select & Present
Verify
Existing focus
Our focus
Present entire entity descriptions?
Present a compact comparative summary!
givenName: “Tim”
surname: “Berners-Lee”
isDirectorOf: W3C
name: “Tim Berners-Lee”
invented: WWW
Which property-value (PV) pairs are more helpful?
Four aspects of a good comparative summary
1. Reflecting commonality
2. Reflecting difference
3. Providing information on identity
4. Providing diverse information
1. Commonality
• Common PV pairs = comparable properties + similar values
• More helpful properties = more like an Inverse Functional Property (IFP)
TimBL
givenName: “Tim”
surname: “Berners-Lee”
altName: “Tim BL”
type: Scientist
gender: “male”
isDirectorOf: W3C
TBL
name: “Tim Berners-Lee”
type: ComputerScientist
type: RoyalSocietyFellow
sex: “Male”
invented: WWW
founded: WSRI
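The definition of common PV pairs can be sketched in a few lines. This is only an illustration under stated assumptions: `difflib.SequenceMatcher` stands in for the unspecified string similarity measure, the `comparable` set is written by hand rather than learned, and the 0.8 threshold is hypothetical.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; a stand-in for the slides' measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def common_pv_pairs(desc1, desc2, comparable, threshold=0.8):
    """Pair up PV pairs whose properties are comparable and whose values are similar."""
    common = []
    for p1, v1 in desc1:
        for p2, v2 in desc2:
            if (p1, p2) in comparable and string_similarity(v1, v2) >= threshold:
                common.append(((p1, v1), (p2, v2)))
    return common


# Toy descriptions taken from the TimBL / TBL example.
timbl = [("givenName", "Tim"), ("surname", "Berners-Lee"),
         ("gender", "male"), ("type", "Scientist")]
tbl = [("name", "Tim Berners-Lee"), ("sex", "Male"),
       ("type", "ComputerScientist")]

# Hypothetical comparability relation (the slides learn it from known coreferent entities).
comparable = {("givenName", "name"), ("surname", "name"),
              ("gender", "sex"), ("type", "type")}

pairs = common_pv_pairs(timbl, tbl, comparable)
```

With this stand-in measure and threshold, only the surname/name and gender/sex pairs qualify; a different similarity function or threshold would change the outcome.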
1. Commonality (details)
• Comparability between properties
  • Learned from known coreferent entities
  • String similarity
• Similarity between values
  • String similarity
• Likeness to an IFP
  • Estimated based on the data set
$$\text{Likeness} = \frac{\text{Number of distinct values}}{\text{Number of all values}}$$
Comparable properties = Properties having similar values
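As a rough illustration of the IFP-likeness estimate, the sketch below counts distinct values against all values per property. The triple layout `(subject, property, value)` and the toy data are assumptions made for this example, not the authors' implementation.

```python
from collections import defaultdict


def ifp_likeness(triples):
    """Per property: number of distinct values / number of all values."""
    values = defaultdict(list)
    for _subject, prop, value in triples:
        values[prop].append(value)
    return {prop: len(set(vals)) / len(vals) for prop, vals in values.items()}


# Toy data set built from the TBL / Wendy example.
triples = [
    ("TBL", "type", "ComputerScientist"),
    ("TBL", "type", "RoyalSocietyFellow"),
    ("Wendy", "type", "ComputerScientist"),
    ("Wendy", "type", "RoyalSocietyFellow"),
    ("TBL", "founded", "WSRI"),
    ("Wendy", "founded", "WSRI"),
    ("TBL", "invented", "WWW"),
]

likeness = ifp_likeness(triples)
```

A property whose values are never shared between subjects scores 1, which is the behaviour expected of an inverse functional property; on a data set this small the estimate is of course not meaningful.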
1. Commonality (weakness)
• Only reflecting commonality can be misleading.
TBL
name: “Tim Berners-Lee”
type: ComputerScientist
type: RoyalSocietyFellow
sex: “Male”
invented: WWW
founded: WSRI
Wendy
fullName: “Wendy Hall”
type: ComputerScientist
type: RoyalSocietyFellow
sex: “Female”
birthplace: London
founded: WSRI
2. Difference
• Different PV pairs = comparable properties + dissimilar values
• More helpful properties = more like a Functional Property (FP)
TBL
name: “Tim Berners-Lee”
type: ComputerScientist
type: RoyalSocietyFellow
sex: “Male”
invented: WWW
founded: WSRI
Wendy
fullName: “Wendy Hall”
type: ComputerScientist
type: RoyalSocietyFellow
sex: “Female”
birthplace: London
founded: WSRI
2. Difference (details)
• Comparability between properties
• Learned from known coreferent entities
• String similarity
• Dissimilarity between values
• String similarity
• Likeness to an FP
• Estimated based on the data set
$$\text{Likeness} = \frac{\text{Number of distinct subjects}}{\text{Number of all subjects}}$$
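The FP-likeness estimate can be sketched the same way, this time counting subjects. The sketch assumes that "number of all subjects" means one subject occurrence per triple of the property; the slides do not spell this out, and the triple layout and data are illustrative.

```python
from collections import defaultdict


def fp_likeness(triples):
    """Per property: number of distinct subjects / number of subject occurrences."""
    subjects = defaultdict(list)
    for subject, prop, _value in triples:
        subjects[prop].append(subject)
    return {prop: len(set(subs)) / len(subs) for prop, subs in subjects.items()}


# Toy data: each entity has one value for "sex" but two values for "type".
triples = [
    ("TBL", "sex", "Male"),
    ("Wendy", "sex", "Female"),
    ("TBL", "type", "ComputerScientist"),
    ("TBL", "type", "RoyalSocietyFellow"),
    ("Wendy", "type", "ComputerScientist"),
    ("Wendy", "type", "RoyalSocietyFellow"),
]

fp_scores = fp_likeness(triples)
```

Under this reading, a property that takes a single value per subject scores 1, so a disagreement on it is stronger evidence that two entities differ.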
3. Information on identity
TimBL
givenName: “Tim”
surname: “Berners-Lee”
altName: “Tim BL”
type: Scientist
gender: “male”
isDirectorOf: W3C
TBL
name: “Tim Berners-Lee”
type: ComputerScientist
type: RoyalSocietyFellow
sex: “Male”
invented: WWW
founded: WSRI
3. Information on identity (details)
• Information on identity
• Estimated based on the data set
$$\text{Information} = 1 - \frac{\log(\text{Number of entities having this PV pair})}{\log(\text{Number of all entities})}$$
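A small worked sketch of this formula follows. The function name and the example counts are made up for illustration.

```python
import math


def identity_information(entities_with_pair: int, total_entities: int) -> float:
    """1 - log(#entities having the PV pair) / log(#all entities)."""
    if total_entities < 2 or not 1 <= entities_with_pair <= total_entities:
        raise ValueError("expected 1 <= entities_with_pair <= total_entities, total >= 2")
    return 1 - math.log(entities_with_pair) / math.log(total_entities)


# Hypothetical counts in a data set of 1000 entities.
unique_pair = identity_information(1, 1000)        # held by a single entity
shared_pair = identity_information(10, 1000)       # held by ten entities
universal_pair = identity_information(1000, 1000)  # held by every entity
```

A PV pair held by a single entity scores 1, one held by every entity scores 0, and the logarithm makes the score fall off slowly as a pair becomes more common.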
4. Diversity of information
• Overlapping PV pairs = similar properties or similar values
TimBL
givenName: “Tim”
surname: “Berners-Lee”
altName: “Tim BL”
type: Scientist
gender: “male”
isDirectorOf: W3C
Overlapping
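One way to read "similar properties or similar values" is to score the overlap of two PV pairs by the larger of the two similarities. The sketch below does that; `difflib.SequenceMatcher` is again only a stand-in for the unspecified string similarity, and taking the maximum is an assumption about how the "or" is combined.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; a stand-in measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def overlap(pv1, pv2) -> float:
    """Overlap of two PV pairs: the larger of property and value similarity."""
    (p1, v1), (p2, v2) = pv1, pv2
    return max(string_similarity(p1, p2), string_similarity(v1, v2))


given = ("givenName", "Tim")
alt = ("altName", "Tim BL")
director = ("isDirectorOf", "W3C")

name_overlap = overlap(given, alt)        # both pairs carry name information
other_overlap = overlap(given, director)  # unrelated information
```

A summary that aims for diversity would avoid selecting two pairs with high overlap, such as the two name-like pairs here.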
To find an optimal summary
(or, to find the most helpful PV pairs)
• Maximize
• Commonality
• Difference
• Information on identity
• Diversity of information
• Subject to
• A length limit
• Formulated as a binary quadratic knapsack problem
• Solved by GRASP-based local search
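The optimization can be illustrated with a generic GRASP loop (greedy randomized construction followed by local search) on a quadratic knapsack instance. This is a minimal sketch, not the authors' solver: the objective shape (per-item profits plus pairwise terms, with negative terms standing for overlap), the neighbourhood moves, and all numbers are assumptions.

```python
import random


def objective(selected, profit, pair_profit):
    """Item profits plus pairwise profits among the selected items."""
    items = sorted(selected)
    total = sum(profit[i] for i in items)
    for pos, i in enumerate(items):
        total += sum(pair_profit[i][j] for j in items[pos + 1:])
    return total


def fits(selected, weight, capacity):
    return sum(weight[i] for i in selected) <= capacity


def construct(profit, pair_profit, weight, capacity, alpha, rng):
    """Greedy randomized construction using a restricted candidate list."""
    selected = set()
    while True:
        gains = {}
        for i in range(len(profit)):
            if i in selected or not fits(selected | {i}, weight, capacity):
                continue
            gain = profit[i] + sum(pair_profit[i][j] for j in selected)
            if gain > 0:
                gains[i] = gain
        if not gains:
            return selected
        best, worst = max(gains.values()), min(gains.values())
        cutoff = best - alpha * (best - worst)
        rcl = sorted(i for i, g in gains.items() if g >= cutoff)
        selected.add(rng.choice(rcl))


def local_search(selected, profit, pair_profit, weight, capacity):
    """First-improvement search over add, drop and swap moves."""
    current = set(selected)
    improved = True
    while improved:
        improved = False
        base = objective(current, profit, pair_profit)
        inside = sorted(current)
        outside = [j for j in range(len(profit)) if j not in current]
        neighbours = [current | {j} for j in outside]
        neighbours += [current - {i} for i in inside]
        neighbours += [(current - {i}) | {j} for i in inside for j in outside]
        for cand in neighbours:
            if fits(cand, weight, capacity) and \
                    objective(cand, profit, pair_profit) > base + 1e-12:
                current, improved = cand, True
                break
    return current


def grasp(profit, pair_profit, weight, capacity, iterations=50, alpha=0.3, seed=0):
    rng = random.Random(seed)
    best, best_value = set(), 0.0
    for _ in range(iterations):
        start = construct(profit, pair_profit, weight, capacity, alpha, rng)
        cand = local_search(start, profit, pair_profit, weight, capacity)
        value = objective(cand, profit, pair_profit)
        if value > best_value:
            best, best_value = cand, value
    return best, best_value


# Toy instance: four PV pairs; pairs 0 and 1 overlap, so choosing both is penalized.
profit = [0.9, 0.8, 0.7, 0.4]
pair_profit = [[0.0] * 4 for _ in range(4)]
pair_profit[0][1] = pair_profit[1][0] = -0.6
weight = [1, 1, 1, 1]   # each PV pair takes one line of the summary
capacity = 2            # length limit

best, best_value = grasp(profit, pair_profit, weight, capacity)
```

On this toy instance the two highest-scoring pairs overlap, so the search settles on pairs 0 and 2 instead. The restricted candidate list is what distinguishes GRASP from a plain greedy heuristic: each restart can begin from a different good-but-not-best construction.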
Evaluation method
• 4 approaches to be blindly tested
• 20 subjects (university students)
• 24 random tasks for each subject
• 4 approaches * (3 positive cases + 3 negative cases)
• Sorted in random order
[Figure: an entity summary is presented to a subject, who verifies the pair as Coreferent, Non-coreferent, or Not sure.]
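The task arithmetic (4 approaches × (3 positive + 3 negative) = 24 tasks per subject, shown in random order) can be sketched as below. How cases were actually drawn and assigned is not stated on the slide, so the case indices here are placeholders.

```python
import random
from collections import Counter

APPROACHES = ["NOSUMM", "GENERIC", "COMPSUMM", "COMPSUMM-C"]


def build_task_list(seed):
    """24 tasks for one subject: 3 positive and 3 negative cases per approach, shuffled."""
    rng = random.Random(seed)
    tasks = [(approach, label, k)
             for approach in APPROACHES
             for label in ("positive", "negative")
             for k in range(3)]  # k is a placeholder case index
    rng.shuffle(tasks)
    return tasks


tasks = build_task_list(seed=1)
per_approach = Counter(approach for approach, _label, _k in tasks)
```

Shuffling the list keeps a subject from inferring the approach or the expected answer from the task position.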
Data sets and tasks
• Data sets: Places, Films
• Tasks (example: the "Paris" disambiguation)
  • http://dbpedia.org/resource/Paris,_Texas sameAs http://sws.geonames.org/4717560/ (positive case)
  • http://dbpedia.org/resource/Paris sameAs http://sws.geonames.org/2988507/ (positive case)
  • Entities that "Paris" disambiguates but that are not linked by sameAs (negative cases)
Approaches
Approach | Description
NOSUMM | Present entire entity descriptions
GENERIC | Information on identity [3]; diversity of information
COMPSUMM | Commonality; difference; information on identity; diversity of information
COMPSUMM-C | Commonality; difference; information on identity; diversity of information
[3] Gong Cheng et al. RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization (ISWC 2011)
Results (1)
• Accuracy of verification
• COMPSUMM ≈ NOSUMM > COMPSUMM-C > GENERIC
Results (2)
• Efficiency of verification
• COMPSUMM > NOSUMM (2.7 to 2.9 times faster)
Take-home messages
• Provide entity summaries for verifying coreference.
• Improves efficiency (2.7 to 2.9 times faster)
• Without notably affecting accuracy
• Provide comparative (not just generic) summaries.
• Show both commonality and difference.
Future work
• Present = Summarize + Visualize
Candidate coreferent entities
…
TimBL ------ Wendy
TimBL ------ TBL
ChrisB ------ Bizer
…
Select & Present
Verify
Our focus
Thanks for your attention
Results (3)
• Erroneous decisions
• COMPSUMM-C > COMPSUMM (mostly in negative cases)
Performance testing
• Offline computation
• Comparability between properties (the learning part)
• Likeness to an IFP/FP
• Information on identity
• Online computation
• Similarity between properties/values
• Optimization
• Results
• Places (DBpedia and GeoNames): 24 ms per case
• Films (DBpedia and LinkedMDB): 35 ms per case


  • 37. Performance testing • Offline computation • Comparability between properties (the learning part) • Likeness to an IFP/FP • Information on identity • Online computation • Similarity between properties/values • Optimization • Results • Places (DBpedia and GeoNames): 24ms per case • Films (DBpedia and LinkedMDB): 35ms per case