Summarizing Entity Descriptions for
Effective and Efficient
Human-centered Entity Linking
Gong Cheng, Danyun Xu, Yuzhong Qu
Websoft Research Group
State Key Laboratory for Novel Software Technology
Nanjing University, China
Entity Linking (EL)
Text:
But with the release of the iPhone 6 and the 6 Plus phablet, Apple has finally gone into big-screen territory, giving Samsung a challenge in the category that the company has been dominating for some time now.
Knowledge Base:
iPhone 6
- type: Smartphone
- ...
Samsung Electronics
- type: IT Company
- ...
Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...
Apple (Fruit)
- type: Fruit
- genus: Malus
- ...
Candidate entities for "Apple": Apple (Inc.) or Apple (Fruit)?
Human-centered EL is needed
• for defining a gold standard,
• for crowdsourced EL.
entity description:
set of property-value pairs (called features)
Entity descriptions are long.
Short, extractive summaries are
adequate for human-centered EL.
Apple (Inc.)
- type: Company
- product: iPhone 5
Apple (Corps)
- type: Company
- product: Let It Be
Apple (Fruit)
- type: Fruit
summary of k candidate entity descriptions: k subsets of features (subject to a length limit)
?… Apple
summarizing entity descriptions → combinatorial optimization
Optimization goal (1)
+characterizing power, -information overlap
• Characterizing power of a feature (ch)
ch(type: IT company) < ch(product: iPhone 5)
Entities having type: IT company: Apple (Inc.), Samsung Electronics
$ch(f) = -\log \frac{\text{number of entities having } f}{\text{number of all entities}}$
Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...
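The characterizing-power formula above can be illustrated with a minimal sketch. The toy knowledge base, the function name `characterizing_power`, and the normalized spelling of "IT company" are assumptions made for illustration; the real system works over DBpedia.

```python
import math

def characterizing_power(feature, descriptions):
    """ch(f) = -log(number of entities having f / number of all entities)."""
    having = sum(1 for feats in descriptions.values() if feature in feats)
    if having == 0:
        raise ValueError("feature does not occur in the knowledge base")
    return -math.log(having / len(descriptions))

# Hypothetical toy knowledge base: entity -> set of (property, value) features.
kb = {
    "Apple (Inc.)": {("type", "Company"), ("type", "IT company"),
                     ("product", "iPhone 5")},
    "Samsung Electronics": {("type", "IT company")},
    "Apple (Fruit)": {("type", "Fruit"), ("genus", "Malus")},
    "iPhone 6": {("type", "Smartphone")},
}

ch_it = characterizing_power(("type", "IT company"), kb)      # shared by 2 of 4
ch_prod = characterizing_power(("product", "iPhone 5"), kb)   # held by 1 of 4
```

In this toy data the rarer feature (product: iPhone 5) gets the higher score, which matches the inequality on the slide.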
Optimization goal (1)
+characterizing power, -information overlap
• Information overlap between features (ov)
a) logical inference
entailment ⇒ maximized ov
ov(type: IT company, type: Company) = MAX
b) string/numerical similarity
ov = max{similarity between properties, similarity between values}
ov(type: IT company, product: iPhone 5) = SMALL
Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...
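A rough sketch of the overlap measure described above. The slides do not say which string similarity is used, so `difflib.SequenceMatcher` is an assumed stand-in; the `SUPERCLASSES` table is a hypothetical substitute for real ontology reasoning, and numerical similarity is omitted.

```python
from difflib import SequenceMatcher

# Hypothetical subclass axioms: class -> set of its superclasses.
SUPERCLASSES = {"IT company": {"Company"}}

def string_sim(a, b):
    """Assumed string similarity in [0, 1] (not specified on the slides)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def entails(f, g):
    """True if feature f logically entails feature g (type/subclass case only)."""
    (p1, v1), (p2, v2) = f, g
    return p1 == p2 == "type" and v2 in SUPERCLASSES.get(v1, set())

def overlap(f, g):
    """ov(f, g): maximal under entailment, else max of property/value similarity."""
    if f == g or entails(f, g) or entails(g, f):
        return 1.0  # MAX
    return max(string_sim(f[0], g[0]), string_sim(f[1], g[1]))

ov_entailed = overlap(("type", "IT company"), ("type", "Company"))
ov_unrelated = overlap(("type", "IT company"), ("product", "iPhone 5"))
```

Here `ov_entailed` is maximal because one type entails the other, while `ov_unrelated` stays small because neither the properties nor the values resemble each other.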
Optimization goal (1)
+characterizing power, -information overlap
• Formulated as k Quadratic Knapsack Problems (QKP)
weight of a feature: length
profit of a pair of features:
to maximize characterizing power
to minimize information overlap
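The slides do not give the exact profit function or the solver, so the following is only a sketch of one way a quadratic knapsack instance of this shape could be attacked: a greedy heuristic in which the diagonal of a hypothetical profit matrix holds characterizing power and the off-diagonal entries hold negated information overlap. All numbers are invented for illustration.

```python
def greedy_qkp(weights, profit, capacity):
    """Greedy heuristic for a quadratic knapsack problem (not an exact solver).

    weights[i]   : weight of item i (here: character length of a feature)
    profit[i][j] : symmetric matrix; profit[i][i] is the item's own profit,
                   profit[i][j] (i != j) is the pairwise profit
    capacity     : weight limit (here: the summary length limit)
    """
    selected, used = [], 0
    remaining = set(range(len(weights)))
    while True:
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            if used + weights[i] > capacity:
                continue  # item no longer fits
            # Marginal gain: own profit plus pairwise profit with chosen items.
            gain = profit[i][i] + sum(profit[i][j] for j in selected)
            ratio = gain / weights[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:  # nothing fits or nothing adds positive profit
            return selected
        selected.append(best)
        used += weights[best]
        remaining.remove(best)

features = ["type: Company", "type: IT company", "product: iPhone 5"]
weights = [len(f) for f in features]
# Hypothetical profits: diagonal = characterizing power, off-diagonal = -overlap.
profit = [
    [0.5, -1.0, -0.2],
    [-1.0, 0.7, -0.2],
    [-0.2, -0.2, 1.4],
]
chosen = greedy_qkp(weights, profit, capacity=100)
summary = [features[i] for i in chosen]
```

With these invented numbers, "type: Company" is left out even though it fits, because its overlap with "type: IT company" makes its marginal gain negative.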
Optimization goal (2): +differentiating power
• Differentiating power of a pair of features (di)
a) string/numerical dissimilarity
di = property’s value uniqueness * dissimilarity between values
di(type: IT company, type: Fruit) = SMALL*LARGE = MEDIUM
(Single-valued properties are more useful.)
b) logical inference
entailment = minimized di
di(type: IT company, type: Company) = MIN
Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...
Apple (Fruit)
- type: Fruit
- genus: Malus
- ...
Samsung Electronics
- type: IT Company
- ...
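The differentiating-power definition above might be sketched as follows. The slides do not define "value uniqueness" or the dissimilarity measure precisely, so both are assumptions here: uniqueness is taken as entities-per-value for a property (1.0 when the property is single-valued), dissimilarity is one minus a `difflib` ratio, and pairs with different properties are scored 0.0.

```python
from difflib import SequenceMatcher

# Hypothetical subclass axioms: class -> set of its superclasses.
SUPERCLASSES = {"IT company": {"Company"}}

def value_uniqueness(prop, kb):
    """Assumed measure: entities using prop / values of prop (1.0 if single-valued)."""
    counts = [sum(1 for p, _ in feats if p == prop) for feats in kb.values()]
    counts = [c for c in counts if c > 0]
    return len(counts) / sum(counts) if counts else 0.0

def differentiating_power(f, g, kb):
    """di(f, g) for features f and g taken from two different candidate entities."""
    (p1, v1), (p2, v2) = f, g
    if p1 != p2:
        return 0.0  # assumption: only same-property pairs are compared
    if v1 == v2 or v2 in SUPERCLASSES.get(v1, set()) or v1 in SUPERCLASSES.get(v2, set()):
        return 0.0  # entailment (or identity) => minimized di
    dissim = 1.0 - SequenceMatcher(None, v1.lower(), v2.lower()).ratio()
    return value_uniqueness(p1, kb) * dissim

# Hypothetical toy knowledge base mirroring the slide.
kb = {
    "Apple (Inc.)": {("type", "Company"), ("type", "IT company"),
                     ("product", "iPhone 5")},
    "Apple (Fruit)": {("type", "Fruit"), ("genus", "Malus")},
    "Samsung Electronics": {("type", "IT company")},
}

di_fruit = differentiating_power(("type", "IT company"), ("type", "Fruit"), kb)
di_entailed = differentiating_power(("type", "IT company"), ("type", "Company"), kb)
```

Because `type` is multi-valued in this toy data, its uniqueness is below 1.0, which damps `di_fruit` even though the two values are very dissimilar.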
Optimization goal (2): +differentiating power
• Formulated as a Quadratic Multidimensional
Knapsack Problem (QMKP)
weight of a feature: length
profit of a pair of features: differentiating power
Optimization goal (3): +relevance to context
• Relevance of a feature to the context of entity mention
• cosine similarity in the class vector model (cs)
Vector(context) = {Smartphone, IT company}
Vector(type: Fruit) = {Fruit}
Vector(product: iPhone 5) = {Smartphone}
cs(context, product: iPhone 5) = HIGH
• class weighting: class frequency – inverse instance frequency (CF-IIF)
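The cosine similarity in the class vector model can be sketched as below. Class vectors are represented as dictionaries from class name to weight; unit weights are an assumption standing in for CF-IIF weights, which the slides name but do not define.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse class vectors (dict: class -> weight)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Unit weights are placeholders for CF-IIF weights (assumption).
context = {"Smartphone": 1.0, "IT company": 1.0}
type_fruit = {"Fruit": 1.0}
product_iphone5 = {"Smartphone": 1.0}

cs_iphone = cosine(context, product_iphone5)
cs_fruit = cosine(context, type_fruit)
```

With these vectors, product: iPhone 5 shares a class with the context and scores high, while type: Fruit shares none and scores zero.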
Optimization goal (3): +relevance to context
• Solved by k Maximal Marginal Relevance (MMR) frameworks
• Features are iteratively selected.
• In each iteration, candidate features are re-ranked by
• relevance to context
• dissimilarity to selected features
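The iterative selection described above can be sketched as a generic MMR loop for a single entity. The trade-off parameter `lam`, the relevance scores, and the toy `same_property` similarity are all hypothetical; the slides do not specify them.

```python
def mmr_select(candidates, relevance, similarity, length_limit, lam=0.5):
    """Select features one at a time, re-ranking by relevance minus redundancy."""
    selected, used = [], 0
    pool = list(candidates)
    while pool:
        fitting = [f for f in pool if used + len(f) <= length_limit]
        if not fitting:
            break

        def score(f):
            # Redundancy: highest similarity to any already selected feature.
            redundancy = max((similarity(f, s) for s in selected), default=0.0)
            return lam * relevance[f] - (1 - lam) * redundancy

        best = max(fitting, key=score)
        selected.append(best)
        used += len(best)
        pool.remove(best)
    return selected

def same_property(a, b):
    """Toy similarity (assumption): 1.0 if both features share a property."""
    return 1.0 if a.split(":")[0] == b.split(":")[0] else 0.0

candidates = ["type: Company", "type: IT company", "product: iPhone 5"]
# Hypothetical relevance-to-context scores.
relevance = {"type: Company": 0.3, "type: IT company": 0.6,
             "product: iPhone 5": 0.7}

summary = mmr_select(candidates, relevance, same_property, length_limit=35)
```

Running this once per candidate entity would give the k summaries; the length limit of 35 characters is chosen only to keep the example small.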
Optimization goal (1+2+3)
• Formulated as a Quadratic Multidimensional
Knapsack Problem (QMKP)
Experiments: data sets
• Text corpora (with entity mentions linked to Wikipedia)
• AQUAINT
• IITB
• Knowledge base
• DBpedia
• Gold-standard links
• entity mentions → Wikipedia articles → DBpedia entities
Experiments: EL tasks
Apple (Inc.)
- type: Company
- product: iPhone 5
Apple (Corps)
- type: Company
- product: Let It Be
Apple (Fruit)
- type: Fruit
?
..., Apple has finally gone
into big-screen territory, …
1 target entity
• gold-standard
2 (very challenging) noise entities
• sharing a common name with the target entity,
obtained from Wikipedia’s disambiguation pages
Experiments: approaches
• Proposed approaches
• CHR: +characterizing power, -information overlap
• DFF: +differentiating power
• CNT: +relevance to context
• COMB: CHR+DFF+CNT
• Baseline approaches
• DESC: returns entire entity descriptions
• RELIN: a state-of-the-art entity summarization approach for
generic purposes
• average length of entity descriptions: 680 characters
• length limit for summaries: 100 characters (14.7%)
Experiments: extrinsic evaluation
• COMB is the only approach that achieved the following
statistically significant results on both data sets:
• accuracy (% of correct answers): COMB = DESC
• time: COMB < DESC (22-23% faster)
Experiments: intrinsic evaluation
• Statistically significant results on both data sets:
• human ratings: COMB > CHR > other approaches
Future work
• More extensive experiments
• to test with not-in-the-list cases (target entity absent from the candidates)
• Summaries for automatic EL
Questions?
