On Type-Aware Entity Retrieval
Darío Garigliotti and Krisztian Balog
University of Stavanger
3rd ACM International Conference
on the Theory of Information Retrieval
Amsterdam, The Netherlands - October 2, 2017
We thank SIGIR for the Students Travel Grant
Type-Aware Entity Retrieval
Dimensions of Type Information
Results and Analysis
Outline:
1 Type-Aware Entity Retrieval
2 Dimensions of Type Information
3 Results and Analysis
4 Conclusions and Future Work
Darío Garigliotti and Krisztian Balog, On Type-Aware Entity Retrieval
Entity Types
Target Types
Entity Types
A characteristic property of entities is that they are typed
Types are organized in hierarchies (or taxonomies)
[Figure: example type hierarchy. The entity Enrico Fermi is typed as Scientist, a subtype of Person, which is a subtype of Agent.]
Query Target Types
Target types: types of entities sought by the query
[Figure: the query "italian nobel prize winners" is mapped to the target types Scientist, Artist, and Writer, which sit below Person and Agent in the type hierarchy.]
Target Types
Target types occur in many queries
countries where one can pay with the euro
art museums in Amsterdam
italian nobel prize winners
Types help to narrow the search space
E.g. Buying a book
Type Taxonomies
Type Representations
Retrieval Models
Dimensions of Type Information
Type information has been shown to improve entity retrieval
INEX Entity Ranking track
We systematically identify and compare all combinations of
three dimensions of type information
Type taxonomies
Type representations
Retrieval models
Dimension: Type Taxonomies
Which type taxonomy to use?
DBpedia Ontology (7 levels, 600 types)
Freebase Types (2 levels, 2K types)
Wikipedia Categories (34 levels, 600K types)
YAGO Taxonomy (19 levels, 500K types)
These vary a lot in terms of hierarchical structure and in how
entity-type assignments are recorded
Dimension: Type Representations
How to represent the hierarchical information?
[Figure: an example type hierarchy over the types t0 to t12 with an entity e, shown three times with a different subset of types highlighted for each option below.]
Type(s) along path to top
Top-level type(s)
Most specific type(s)
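The three representation options can be sketched in code. This is only an illustration under stated assumptions: the toy taxonomy `PARENT`, the type names, and the helper functions are hypothetical and not taken from the slides, and the root node is assumed not to count as a type.

```python
# Hypothetical toy taxonomy: child type -> parent type (None marks the root).
PARENT = {"t0": None, "t1": "t0", "t2": "t0", "t3": "t1", "t4": "t3", "t5": "t2"}


def path_to_root(t):
    """Return type t followed by its ancestors, excluding the root itself."""
    path = []
    while t is not None and PARENT[t] is not None:
        path.append(t)
        t = PARENT[t]
    return path


def types_along_path(assigned):
    """Option 1: the assigned types plus all their ancestors up to the top level."""
    return {a for t in assigned for a in path_to_root(t)}


def top_level_types(assigned):
    """Option 2: only the ancestors that sit directly below the root."""
    return {path_to_root(t)[-1] for t in assigned if path_to_root(t)}


def most_specific_types(assigned):
    """Option 3: assigned types that are not an ancestor of another assigned type."""
    ancestors = {a for t in assigned for a in path_to_root(t)[1:]}
    return set(assigned) - ancestors


assigned = {"t3", "t4", "t5"}  # hypothetical type assignments of one entity
print(types_along_path(assigned))
print(top_level_types(assigned))
print(most_specific_types(assigned))
```

With these toy assignments, `t3` is dropped from the most specific representation because it is an ancestor of `t4`.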
Dimension: Retrieval Models
How to add type information into entity retrieval?
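The retrieval task is defined in a generative probabilistic framework: P(q | e)
[Figure: the query and the entity are compared in two ways: term-based similarity between query and entity, and type-based similarity between the query's target types and the entity's types. Example labels in the figure: Olympic games, Rio de Janeiro.]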
Both query and entity considered in the term space as well as
in the type space
Dimension: Retrieval Models
(Strict) Filtering model
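$$P(q \mid e) = P(\theta^{T}_{q} \mid \theta^{T}_{e}) \cdot \chi\big[\mathrm{types}(q) \cap \mathrm{types}(e) \neq \emptyset\big]$$
where $\theta^{T}$ denotes the term-based models of the query and the entity, and $\chi$ is an indicator that is 1 when the query's target types and the entity's types overlap and 0 otherwise.
[Figure: Venn diagram of types(q) and types(e).]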
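A minimal sketch of strict filtering, assuming the term-based score has already been computed elsewhere and that types are plain sets of labels. The function name and arguments are hypothetical.

```python
def strict_filter_score(term_score, query_types, entity_types):
    """Keep the term-based score only if the two type sets overlap, else return 0."""
    overlap = bool(set(query_types) & set(entity_types))
    return term_score if overlap else 0.0


# The entity shares the target type "Scientist" with the query, so it is kept.
print(strict_filter_score(0.2, {"Scientist", "Writer"}, {"Scientist"}))
# No shared type: the entity is filtered out regardless of its term-based score.
print(strict_filter_score(0.2, {"Scientist", "Writer"}, {"City"}))
```

An entity without any type assignment always gets a score of 0 under this model, which matters in settings where type assignments are missing.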
Dimension: Retrieval Models
(Soft) Filtering model
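$$P(q \mid e) = P(\theta^{T}_{q} \mid \theta^{T}_{e}) \cdot P(\theta^{\mathcal{T}}_{q} \mid \theta^{\mathcal{T}}_{e})$$
where $\theta^{T}$ denotes the term-based and $\theta^{\mathcal{T}}$ the type-based models of the query and the entity.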
Dimension: Retrieval Models
Interpolation model
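$$P(q \mid e) = (1 - \lambda) \cdot P(\theta^{T}_{q} \mid \theta^{T}_{e}) + \lambda \cdot P(\theta^{\mathcal{T}}_{q} \mid \theta^{\mathcal{T}}_{e})$$
where $\theta^{T}$ denotes the term-based and $\theta^{\mathcal{T}}$ the type-based models, and $\lambda$ weights the type-based component.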
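The difference between multiplying the two components (soft filtering) and interpolating them can be sketched as follows. The slides do not say how the type-based similarity is estimated, so both component scores are passed in as given numbers. Treating an entity without type assignments as having a type-based score of 0 is an assumption made for illustration.

```python
def soft_filter_score(term_score, type_score):
    """Soft filtering: product of the term-based and type-based components."""
    return term_score * type_score


def interpolation_score(term_score, type_score, lam=0.5):
    """Interpolation: weighted sum, with lam the weight of the type-based component."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * term_score + lam * type_score


# Hypothetical entity with a good term-based match but no type information.
term_score, type_score = 0.2, 0.0
print(soft_filter_score(term_score, type_score))            # the product is 0
print(interpolation_score(term_score, type_score, lam=0.5))  # term evidence survives
```

Under this assumption the product discards the entity entirely, while the weighted sum still ranks it by its term-based evidence.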
Experimental Setup
Research Questions
Results and Analysis
Experimental Setup
Test collection of DBpedia entities [1]
Baseline: Mixture of Language Models (title and content)
Idealized assumption of a target types oracle
Settings for type assignments
[1] Krisztian Balog and Robert Neumayer. 2013. A Test Collection for Entity Search in DBpedia. In Proc. of SIGIR. 737–740.
Experimental Setup: Target Types Oracle
An oracle provides us with the (distribution of) correct
target types for a given query
Construction: given a query, take the union of all types of all its relevant entities
The probability of each type is proportional to the number of relevant entities having that type
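The oracle construction described above can be sketched as follows. The entity identifiers, the type assignments, and the function name are hypothetical. Normalising the counts so that they sum to 1 is one way to obtain a distribution that is proportional to the counts.

```python
from collections import Counter


def oracle_target_types(relevant_entities, entity_types):
    """Distribution over the union of types of the relevant entities.

    Each type's probability is proportional to the number of relevant
    entities that have the type.
    """
    counts = Counter()
    for entity in relevant_entities:
        counts.update(set(entity_types.get(entity, ())))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


# Hypothetical relevance judgments and type assignments for one query.
relevant = ["e1", "e2", "e3"]
entity_types = {"e1": {"Scientist"}, "e2": {"Scientist"}, "e3": {"Writer", "Artist"}}
print(oracle_target_types(relevant, entity_types))
```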
Experimental Setup: Type Assignments
Two settings to deal with missing type assignments
4TT: Only entities with types in all four type taxonomies
E.g. types for the entity Enrico Fermi
In DBpedia: Scientist
In Freebase: award.award_winner, people.deceased_person, education.academic, ...
In Wikipedia: Nobel laureates in Physics, University of Pisa alumni, ...
In YAGO: ItalianPhysicists, NobelLaureatesInPhysics, AmericanPeopleOfItalianDescent, ...
ALL: All available entities are allowed
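The two settings amount to different entity pools. A minimal sketch, assuming type assignments are stored per entity and per taxonomy. The dictionary layout, the taxonomy keys, and the second entity `e2` are hypothetical, while the Enrico Fermi types are taken from the example above.

```python
TAXONOMIES = ("dbpedia", "freebase", "wikipedia", "yago")


def entities_4tt(types_by_taxonomy):
    """4TT: entities with at least one type in every one of the four taxonomies."""
    return {
        entity
        for entity, per_taxonomy in types_by_taxonomy.items()
        if all(per_taxonomy.get(tax) for tax in TAXONOMIES)
    }


def entities_all(types_by_taxonomy):
    """ALL: every available entity, whether or not its types are complete."""
    return set(types_by_taxonomy)


types_by_taxonomy = {
    "Enrico Fermi": {
        "dbpedia": {"Scientist"},
        "freebase": {"education.academic"},
        "wikipedia": {"Nobel laureates in Physics"},
        "yago": {"ItalianPhysicists"},
    },
    # Hypothetical entity with no Freebase or YAGO types.
    "e2": {"dbpedia": {"Scientist"}, "wikipedia": {"University of Pisa alumni"}},
}
print(entities_4tt(types_by_taxonomy))
print(entities_all(types_by_taxonomy))
```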
Research Questions
RQ1 What is the impact of the particular choice of type
taxonomy on entity retrieval performance?
RQ2 How to represent hierarchical entity type information
for entity retrieval?
RQ3 How to combine term-based and type-based
information?
Results
RQ1 What is the impact of the particular choice of type
taxonomy on entity retrieval performance?
Wikipedia, in combination with the most specific type
representation, performs best (for both 4TT and ALL)
Highly significant improvements for all models in 4TT
Results
RQ2 How to represent hierarchical entity type information for
entity retrieval?
Using the most specific types in the hierarchy provides the
best performance
No evidence that hierarchical relationships from ancestor
types would benefit retrieval effectiveness
Results
RQ3 How to combine term-based and type-based information?
In the 4TT setting, strict filtering is the best retrieval model
Only the interpolation model can deal in a robust manner
with the loss of type assignments in the ALL setting
Summary of Findings
Using the most specific types is the most effective way to
represent hierarchical entity type information
Among the type taxonomies, Wikipedia performs best in most cases
All models for combining term- and type-based information
suffer from missing type information, but interpolation
appears to be the most robust
An Instance of Query-level Analysis
Query: italian nobel prize winners
Baseline: MAP 0.1607
DBpedia, most specific types, soft filtering: MAP 0.1829
Target types: Artist, Scientist, Writer
Wikipedia, most specific types, interpolation (0.95): MAP 0.8518
Target types: Italian Nobel laureates, Nobel laureates in Literature, ...
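For reference, the per-query figure behind MAP is average precision. The sketch below uses the standard definition with a hypothetical ranking. It is not the evaluation script used for the reported numbers.

```python
def average_precision(ranking, relevant):
    """Mean of the precision values at the ranks where relevant entities occur."""
    relevant = set(relevant)
    hits = 0
    precision_sum = 0.0
    for rank, entity in enumerate(ranking, start=1):
        if entity in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranking, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical ranking: relevant entities at ranks 1 and 3.
print(average_precision(["a", "b", "c", "d"], {"a", "c"}))
```

Moving relevant entities towards the top of the ranking raises this value, which is the effect type information has in the Wikipedia configuration for this query.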
What is in a Target Type?
What portion of relevant entities can target types capture?
Top-K types   Type taxonomy   P        R        F1
K = 1         DBpedia         0.0027   0.5863   0.0046
              Freebase        0.0060   0.7254   0.0076
              Wikipedia       0.1147   0.4798   0.1287
              YAGO            0.0418   0.6303   0.0488
K = 3         DBpedia         0.0006   0.7199   0.0012
              Freebase        0.0004   0.7805   0.0008
              Wikipedia       0.0402   0.5847   0.0614
              YAGO            0.0036   0.7025   0.0062
Fine-grained types in the Wikipedia category graph capture a subset
of the relevant entities with the highest P and F1
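One way to read the table is as a set-based comparison between the entities that carry at least one of the top-K target types and the relevant entities. That reading, the function name, and the toy data below are assumptions. The slides do not spell out how P, R, and F1 were computed.

```python
def type_coverage_prf(top_k_types, relevant, entity_types):
    """Precision, recall, and F1 of the entities captured by the given types."""
    wanted = set(top_k_types)
    captured = {e for e, types in entity_types.items() if set(types) & wanted}
    hits = len(captured & set(relevant))
    precision = hits / len(captured) if captured else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical catalogue: the coarse type "A" is shared by many entities.
entity_types = {"e1": {"A"}, "e2": {"A"}, "e3": {"B"}, "e4": {"A"}}
print(type_coverage_prf({"A"}, {"e1", "e3"}, entity_types))
```

A coarse type captures many non-relevant entities, which lowers precision. This is consistent with the coarser taxonomies having the lowest P in the table.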
Conclusions and Future Work
In this work:
We identify and systematically compare distinct
dimensions of type information in type-aware entity retrieval
We observe that type information proves most useful when
larger, deeper type taxonomies provide very specific types.
In future work:
We plan to report further query-level analyses
We wish to re-assess the experiments using automatically
identified target types [2]
[2] Darío Garigliotti, Faegheh Hasibi, and Krisztian Balog. 2017. Target Type Identification for Entity-Bearing Queries. In Proc. of SIGIR. 845–848.
Appendices
Appendix: Retrieval Model
Interpolation model
For DBpedia and Freebase, a larger contribution of type-based
information is always more harmful
For Wikipedia and YAGO, performance increases with a higher
contribution of type information when using the most specific types
Figure 1: Interpolation performance (MAP) for different type weights λt (4TT), for DBpedia, Freebase, Wikipedia, and YAGO. Panels: (a) path-to-top types, (b) top-level types, (c) most specific types.
Appendix: Revisited Target Types Oracle
The target types distribution of the default oracle includes
all types associated with known relevant entities
Alternatively, we assess the configurations using a filtered
oracle of target types that satisfy a threshold of coverage of
relevant entities
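A filtered oracle of this kind can be sketched as below. The threshold value, the renormalisation over the surviving types, and all names and data are assumptions for illustration. The slides only state that target types must satisfy a coverage threshold.

```python
def filtered_oracle(relevant_entities, entity_types, min_coverage):
    """Keep types held by at least min_coverage of the relevant entities.

    The surviving counts are renormalised into a distribution (an assumption).
    """
    n = len(relevant_entities)
    counts = {}
    for entity in relevant_entities:
        for t in set(entity_types.get(entity, ())):
            counts[t] = counts.get(t, 0) + 1
    kept = {t: c for t, c in counts.items() if n and c / n >= min_coverage}
    total = sum(kept.values())
    return {t: c / total for t, c in kept.items()} if total else {}


# Hypothetical judgments: "Scientist" covers 2 of 3 relevant entities.
relevant = ["e1", "e2", "e3"]
entity_types = {"e1": {"Scientist"}, "e2": {"Scientist"}, "e3": {"Writer", "Artist"}}
print(filtered_oracle(relevant, entity_types, min_coverage=0.5))
```

With a threshold of 0.5, the types held by a single relevant entity are dropped, so the only route to that entity through type information is lost.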
[Figure: MAP per configuration for the default and the filtered target types oracle, under strict filtering, soft filtering, and interpolation, for DBpedia, Freebase, Wikipedia, and YAGO. Panels: (a) path-to-top types, (b) top-level types, (c) most specific types.]
The filtered oracle leads to considerable drops in performance
for settings using the most specific types
It is important to consider all possible target types
Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval

More Related Content

PDF
Type-Aware Entity Retrieval
PPTX
Entity Linking in Queries: Tasks and Evaluation
PDF
Task-Based Information Retrieval
PDF
Lect6-An introduction to ontologies and ontology development
PDF
Ontology Engineering: Introduction
PDF
Type-Aware Entity Retrieval
PDF
Type-Aware Entity Retrieval
PDF
Type-Aware Entity Retrieval
Type-Aware Entity Retrieval
Entity Linking in Queries: Tasks and Evaluation
Task-Based Information Retrieval
Lect6-An introduction to ontologies and ontology development
Ontology Engineering: Introduction
Type-Aware Entity Retrieval
Type-Aware Entity Retrieval
Type-Aware Entity Retrieval

Similar to On Type-Aware Entity Retrieval (20)

PDF
Type Information in Entity Retrieval
PDF
Entity Retrieval (WWW 2013 tutorial)
PDF
Task-Based Support in Search Engines
PDF
A Semantic Search Approach to Task-Completion Engines
PDF
On Entities and Evaluation
PDF
Entity Retrieval (WSDM 2014 tutorial)
PDF
Entity Retrieval (SIGIR 2013 tutorial)
PDF
Evaluation Initiatives for Entity-oriented Search
PDF
Line,,NATIONAL SEMINAR ORGANIZED BY KULISAA 15.01.2015
PDF
Entities for Augmented Intelligence
PDF
Improving Entity Retrieval on Structured Data
PDF
Entity Search: The Last Decade and the Next
PPTX
Gleaning Types for Literals in RDF with Application to Entity Summarization
PDF
Knowledge Graph Maintenance
PPTX
TRank ISWC2013
PDF
Entity Retrieval (tutorial organized by Radialpoint in Montreal)
PPTX
Exploring the Application Potential of Relational Web Tables
PPT
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
PDF
Parameterized Fielded Term Dependence Models for Ad-hoc Entity Retrieval from...
Type Information in Entity Retrieval
Entity Retrieval (WWW 2013 tutorial)
Task-Based Support in Search Engines
A Semantic Search Approach to Task-Completion Engines
On Entities and Evaluation
Entity Retrieval (WSDM 2014 tutorial)
Entity Retrieval (SIGIR 2013 tutorial)
Evaluation Initiatives for Entity-oriented Search
Line,,NATIONAL SEMINAR ORGANIZED BY KULISAA 15.01.2015
Entities for Augmented Intelligence
Improving Entity Retrieval on Structured Data
Entity Search: The Last Decade and the Next
Gleaning Types for Literals in RDF with Application to Entity Summarization
Knowledge Graph Maintenance
TRank ISWC2013
Entity Retrieval (tutorial organized by Radialpoint in Montreal)
Exploring the Application Potential of Relational Web Tables
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Parameterized Fielded Term Dependence Models for Ad-hoc Entity Retrieval from...
Ad

More from Darío Garigliotti (20)

PDF
Task Recommendation
PDF
About "Towards Better Text Understanding and Retrieval through Kernel Entity ...
PDF
A Summary of ECIR'18
PDF
A Semantic Search Approach to Task-Completion Engines
PDF
A Knowledge Base of Entity-Oriented Search Intents
PDF
Learning-to-Rank Target Types for Entity-Bearing Queries
PDF
Dive into Deep Learning
PDF
If this is the answer, what was the question?
PDF
Semi-supervised Learning for Word Sense Disambiguation
PDF
Semi-supervised Learning for Word Sense Disambiguation
PDF
Semi-supervised Learning for Word Sense Disambiguation
PDF
FACT-IR. Fairness, Accountability, Confidentiality and Transparency in Inform...
PDF
Machine Learning - Clustering
PDF
Machine Learning - Classification (ctd.)
PDF
Machine Learning - Classification
PDF
Data Mining - Exploring Data
PDF
Data Mining - Introduction and Data
PDF
Predicate Logic
PDF
Patterns, Automata and Regular Expressions
PDF
The List Data Model
Task Recommendation
About "Towards Better Text Understanding and Retrieval through Kernel Entity ...
A Summary of ECIR'18
A Semantic Search Approach to Task-Completion Engines
A Knowledge Base of Entity-Oriented Search Intents
Learning-to-Rank Target Types for Entity-Bearing Queries
Dive into Deep Learning
If this is the answer, what was the question?
Semi-supervised Learning for Word Sense Disambiguation
Semi-supervised Learning for Word Sense Disambiguation
Semi-supervised Learning for Word Sense Disambiguation
FACT-IR. Fairness, Accountability, Confidentiality and Transparency in Inform...
Machine Learning - Clustering
Machine Learning - Classification (ctd.)
Machine Learning - Classification
Data Mining - Exploring Data
Data Mining - Introduction and Data
Predicate Logic
Patterns, Automata and Regular Expressions
The List Data Model
Ad

Recently uploaded (20)

PDF
Sciences of Europe No 170 (2025)
PPTX
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
PPTX
ANEMIA WITH LEUKOPENIA MDS 07_25.pptx htggtftgt fredrctvg
PPTX
Introduction to Cardiovascular system_structure and functions-1
PPTX
Comparative Structure of Integument in Vertebrates.pptx
PPTX
Taita Taveta Laboratory Technician Workshop Presentation.pptx
PDF
lecture 2026 of Sjogren's syndrome l .pdf
PPTX
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
PDF
AlphaEarth Foundations and the Satellite Embedding dataset
PDF
Placing the Near-Earth Object Impact Probability in Context
PDF
An interstellar mission to test astrophysical black holes
PDF
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
PPTX
TOTAL hIP ARTHROPLASTY Presentation.pptx
PPTX
neck nodes and dissection types and lymph nodes levels
PPTX
2. Earth - The Living Planet Module 2ELS
PDF
The scientific heritage No 166 (166) (2025)
PDF
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
PPTX
The KM-GBF monitoring framework – status & key messages.pptx
PPTX
Vitamins & Minerals: Complete Guide to Functions, Food Sources, Deficiency Si...
PDF
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
Sciences of Europe No 170 (2025)
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
ANEMIA WITH LEUKOPENIA MDS 07_25.pptx htggtftgt fredrctvg
Introduction to Cardiovascular system_structure and functions-1
Comparative Structure of Integument in Vertebrates.pptx
Taita Taveta Laboratory Technician Workshop Presentation.pptx
lecture 2026 of Sjogren's syndrome l .pdf
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
AlphaEarth Foundations and the Satellite Embedding dataset
Placing the Near-Earth Object Impact Probability in Context
An interstellar mission to test astrophysical black holes
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
TOTAL hIP ARTHROPLASTY Presentation.pptx
neck nodes and dissection types and lymph nodes levels
2. Earth - The Living Planet Module 2ELS
The scientific heritage No 166 (166) (2025)
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
The KM-GBF monitoring framework – status & key messages.pptx
Vitamins & Minerals: Complete Guide to Functions, Food Sources, Deficiency Si...
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud

On Type-Aware Entity Retrieval

  • 1. On Type-Aware Entity Retrieval Dar´ıo Garigliotti and Krisztian Balog University of Stavanger 3rd ACM International Conference on the Theory of Information Retrieval Amsterdam, The Netherlands - October 2, 2017
  • 2. We thank SIGIR for the Students Travel Grant
  • 3. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Outline: 1 Type-Aware Entity Retrieval 2 Dimensions of Type Information 3 Results and Analysis 4 Conclusions and Future Work Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 4. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Entity Types Target Types Entity Types A characteristic property of entities is that they are typed Types are organized in hierarchies (or taxonomies) … Scientist … …… Person Agent … Enrico Fermi Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 5. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Entity Types Target Types Query Target Types Target types: types of entities sought by the query … ScientistArtist Writer … …… Person Agent … italian nobel prize winners Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 6. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Entity Types Target Types Target Types Target types occur in many queries countries where one can pay with the euro art museums in Amsterdam italian nobel prize winners Types help to reduce the space of search Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 7. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Entity Types Target Types E.g. Buying a book Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 8. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Type Taxonomies Type Representations Retrieval Models Dimensions of Type Information Type information have been shown to improve Entity Retrieval INEX Entity Ranking track We systematically identify and compare all combinations of three dimensions of type information Type taxonomies Type representations Retrieval models Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 9. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Type Taxonomies Type Representations Retrieval Models Dimension: Type Taxonomies Which type taxonomy to use? DBpedia Ontology (7 levels, 600 types) Freebase Types (2 levels, 2K types) Wikipedia Categories (34 levels, 600K types) YAGO Taxonomy (19 levels, 500K types) These vary a lot in terms of hierarchical structure and in how entity-type assignments are recorded Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 10. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Type Taxonomies Type Representations Retrieval Models Dimension: Type Representations How to represent the hierarchical information? t3t3 t2t2 t5t5t4t4 t9t9t8t8 e t6t6 t12t12 t7t7 … t10t10 t11t11 t0t0 t1t1 … Type(s) along path to top Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 11. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Type Taxonomies Type Representations Retrieval Models Dimension: Type Representations How to represent the hierarchical information? t3t3 t2t2 t5t5t4t4 t9t9t8t8 e t6t6 t12t12 t7t7 … t10t10 t11t11 t0t0 t1t1 … Type(s) along path to top t3t3 t2t2 t5t5t4t4 t9t9t8t8 e t6t6 t12t12 t7t7 … t10t10 t11t11 t0t0 t1t1 … Top-level type(s) Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 12. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Type Taxonomies Type Representations Retrieval Models Dimension: Type Representations How to represent the hierarchical information? t3t3 t2t2 t5t5t4t4 t9t9t8t8 e t6t6 t12t12 t7t7 … t10t10 t11t11 t0t0 t1t1 … Type(s) along path to top t3t3 t2t2 t5t5t4t4 t9t9t8t8 e t6t6 t12t12 t7t7 … t10t10 t11t11 t0t0 t1t1 … Top-level type(s) t3t3 t2t2 t5t5t4t4 t9t9t8t8 e t6t6 t12t12 t7t7 … t10t10 t11t11 t0t0 t1t1 … Most specific type(s) Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 13. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Type Taxonomies Type Representations Retrieval Models Dimension: Retrieval Models How to add type information into entity retrieval? Retrieval task defined in a generative probabilistic framework P(q | e) query entity Olympic games target types Rio de Janeiro term-based similarity type-based similarity … … entity types Both query and entity considered in the term space as well as in the type space Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 14. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Type Taxonomies Type Representations Retrieval Models Dimension: Retrieval Models (Strict) Filtering model P(q | e) = P(θT q | θT e ) · χ[types(q) ∩ types(e) = ∅] Types(q)Types(q) Types(e)Types(e) Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 15. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Type Taxonomies Type Representations Retrieval Models Dimension: Retrieval Models (Soft) Filtering model P(q | e) = P(θT q | θT e ) · P(θT q | θT e ) Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 16. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Type Taxonomies Type Representations Retrieval Models Dimension: Retrieval Models Interpolation model P(q | e) = (1 − λ) · P(θT q | θT e ) + λ · P(θT q | θT e ) Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 17. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Experimental Setup Research Questions Results and Analysis Experimental Setup Test collection of DBpedia entities 1 Baseline: Mixture of Language Models (title and content) Idealized assumption of a target types oracle Settings for type assignments 1 Krisztian Balog and Robert Neumayer. 2013. A Test Collection for Entity Search in DBpedia. In Proc. of SIGIR. 737–740. Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 18. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Experimental Setup Research Questions Results and Analysis Experimental Setup: Target Types Oracle An oracle provides us with the (distribution of) correct target types for a given query Construction: given a query, take union of all types of all its relevant entities Probability proportional to the number of relevant entities having the type Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 19. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Experimental Setup Research Questions Results and Analysis Experimental Setup: Type Assignments Two settings to deal with missing type assignments 4TT: Only entities with types in all types taxonomies E.g. types for the entity Enrico Fermi In DBpedia: Scientist In Freebase: award.award winner, people.deceased person, education.academic, ... In Wikipedia: Nobel laureates in Physics, University of Pisa alumni, ... In YAGO: ItalianPhysicists, NobelLaureatesInPhysics, AmericanPeopleOfItalianDescent, ... ALL: All available entities are allowed Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 20. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Experimental Setup Research Questions Results and Analysis Research Questions RQ1 What is the impact of the particular choice of type taxonomy on entity retrieval performance? RQ2 How to represent hierarchical entity type information for entity retrieval? RQ3 How to combine term-based and type-based information? Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 21. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Experimental Setup Research Questions Results and Analysis Results Wikipedia, in combination with the most specific type representation, performs best (for both 4TT and ALL) Highly significant improvements for all models in 4TT Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 22. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Experimental Setup Research Questions Results and Analysis Results RQ1 What is the impact of the particular choice of type taxonomy on entity retrieval performance? Wikipedia, in combination with the most specific type representation, performs best (for both 4TT and ALL) Highly significant improvements for all models in 4TT Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 23. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Experimental Setup Research Questions Results and Analysis Results RQ2 How to represent hierarchical entity type information for entity retrieval? Using the most specific types in the hierarchy provides the best performance No evidence that hierarchical relationships from ancestor types would benefit retrieval effectiveness Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 24. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Experimental Setup Research Questions Results and Analysis Results RQ3 How to combine term-based and type-based information? In the 4TT setting, strict filtering is the best retrieval model Only the interpolation model can deal in a robust manner with the loss of type assignments in the ALL setting Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 25. Type-Aware Entity Retrieval Dimensions of Type Information Results and Analysis Experimental Setup Research Questions Results and Analysis Results Dar´ıo Garigliotti and Krisztian Balog On Type-Aware Entity Retrieval
  • 26. Summary of Findings. Using the most specific types is the most effective way to represent hierarchical entity type information. Wikipedia performs best among the type taxonomies in most cases. All models for combining term- and type-based information suffer from missing type information, but interpolation appears to be the most robust.
  • 29. An Instance of Query-level Analysis. Query: italian nobel prize winners. Baseline: MAP 0.1607. DBpedia, most specific, soft filter: MAP 0.1829; target types: Artist, Scientist, Writer. Wikipedia, most specific, interpolation (0.95): MAP 0.8518; target types: Italian Nobel laureates, Nobel laureates in Literature, ...
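For a single query, the MAP figures above reduce to the average precision of one ranking. A minimal sketch of the standard definition follows; the entity identifiers in the usage lines are hypothetical and unrelated to the reported numbers.

```python
def average_precision(ranking, relevant):
    """Average of precision@k over the ranks k where a relevant entity appears."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, entity in enumerate(ranking, start=1):
        if entity in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant entities that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of per-query average precision; runs is a list of (ranking, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical ranking with relevant entities at ranks 1 and 3.
ap = average_precision(["e1", "e2", "e3", "e4"], {"e1", "e3"})
print(round(ap, 4))
```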
  • 30. What is in a Target Type? What portion of relevant entities can target types capture?

      Top-K types   Type taxonomy   P        R        F1
      K = 1         DBpedia         0.0027   0.5863   0.0046
      K = 1         Freebase        0.0060   0.7254   0.0076
      K = 1         Wikipedia       0.1147   0.4798   0.1287
      K = 1         YAGO            0.0418   0.6303   0.0488
      K = 3         DBpedia         0.0006   0.7199   0.0012
      K = 3         Freebase        0.0004   0.7805   0.0008
      K = 3         Wikipedia       0.0402   0.5847   0.0614
      K = 3         YAGO            0.0036   0.7025   0.0062

    Fine-grained types in the Wikipedia category graph capture a subset of relevant entities with the highest P and F1.
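One way to read the P, R and F1 columns is as set overlap between the entities covered by the top-K target types and the relevant entities of a query. The sketch below computes these for a single query under that assumption; the `type_to_entities` index and the entity identifiers are hypothetical, and how the slide aggregates over queries is not stated in the transcript.

```python
def entities_covered(target_types, type_to_entities):
    """Union of all entities assigned to any of the given target types."""
    covered = set()
    for t in target_types:
        covered.update(type_to_entities.get(t, ()))
    return covered


def precision_recall_f1(covered, relevant):
    """Set-based precision, recall and F1 of covered entities against relevant ones."""
    covered, relevant = set(covered), set(relevant)
    true_positives = len(covered & relevant)
    precision = true_positives / len(covered) if covered else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical index: a broad type covers many entities, a fine-grained one few.
type_to_entities = {
    "Person": {"e1", "e2", "e3", "e4", "e5", "e6", "e7", "e8"},
    "Italian Nobel laureates": {"e1", "e2"},
}
relevant = {"e1", "e2", "e9"}

for t in type_to_entities:
    print(t, precision_recall_f1(entities_covered([t], type_to_entities), relevant))
```

In this toy example the broad type and the fine-grained type reach the same recall, but the fine-grained one has much higher precision, which matches the qualitative pattern in the table.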
  • 31. Conclusions and Future Work. In this work: We identify and systematically compare distinct dimensions of type-aware entity retrieval. We observe that type information proves most useful when larger, deeper type taxonomies provide very specific types. In future work: We plan to report further query-level analyses. We wish to re-assess the experiments using automatically identified target types [2]. [2] Darío Garigliotti, Faegheh Hasibi, and Krisztian Balog. 2017. Target Type Identification for Entity-Bearing Queries. In Proc. of SIGIR. 845–848.
  • 34. Appendices
  • 35. Appendix: Retrieval Model. Interpolation model: For DBpedia and Freebase, giving more weight to type-based information is always increasingly harmful. For Wikipedia and YAGO, performance increases with a higher contribution of type information when using the most specific types. Figure 1: Interpolation performance (MAP) for different type weights λt (4TT), with one curve per taxonomy (DBpedia, Freebase, Wikipedia, YAGO); panels: (a) Path-to-top types, (b) Top-level types, (c) Most specific types.
  • 36. Appendix: Revisited Target Types Oracle. The target type distribution of the default oracle includes all types associated with known relevant entities. Alternatively, we assess the configurations using a filtered oracle, which keeps only the target types that satisfy a threshold of coverage of relevant entities.
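The difference between the default and the filtered oracle can be sketched as follows. This is an illustration under stated assumptions: coverage is taken to be the fraction of relevant entities carrying a type, and the `min_coverage` value, the function names and the entity identifiers are hypothetical, since the transcript does not give the threshold used.

```python
from collections import Counter


def default_oracle(relevant_entities, entity_types):
    """Count, for every type, how many relevant entities are assigned to it."""
    counts = Counter()
    for entity in relevant_entities:
        counts.update(set(entity_types.get(entity, ())))
    return counts


def filtered_oracle(relevant_entities, entity_types, min_coverage):
    """Keep only types covering at least min_coverage of the relevant entities."""
    counts = default_oracle(relevant_entities, entity_types)
    total = len(relevant_entities)
    if total == 0:
        return {}
    return {t: c for t, c in counts.items() if c / total >= min_coverage}


# Hypothetical type assignments for the relevant entities of one query.
entity_types = {
    "e1": {"Scientist", "Person"},
    "e2": {"Writer", "Person"},
    "e3": {"Writer", "Person"},
    "e4": {"Person"},
}
relevant = ["e1", "e2", "e3", "e4"]

print(default_oracle(relevant, entity_types))
print(filtered_oracle(relevant, entity_types, min_coverage=0.5))
```

With a threshold of 0.5 the rarely shared type Scientist is dropped, which illustrates how filtering can remove exactly the specific types that the most specific representation relies on.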
  • 37. Appendix: Revisited Target Types Oracle. Figure: MAP of each configuration per taxonomy (DBpedia, Freebase, Wikipedia, YAGO), comparing target type oracles (default, filtered) and models (strict filtering, soft filtering, interpolation); panels: (a) Path-to-top types, (b) Top-level types, (c) Most specific types. The filtered oracle leads to considerable drops in performance for settings using the most specific types. It is important to consider all possible target types.