Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data
Nikita Zhiltsov (1, 2), Alexander Kotov (3), Fedor Nikolaev (3)
(1) Kazan Federal University
(2) Textocat
(3) Textual Data Analytics Lab, Department of Computer Science, Wayne State University
Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion
Overview
Entities
Entity Representation
Fielded Sequential Dependence Model
Parameter Estimation
Results
Conclusion
Knowledge Graphs
Linked Open Data (LOD) Cloud
Entities
Material objects or concepts in the real world or fiction (e.g. people, movies, conferences)
Connected with other entities by relations (e.g. hasGenre, actedIn, isPCmemberOf)
Subject-Predicate-Object (SPO) triple: subject = entity; object = entity (or primitive data value); predicate = relationship between subject and object
Many SPO triples → knowledge graph
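As a minimal sketch of how SPO triples add up to a graph, the snippet below indexes a few toy triples by subject. The triples, the `build_graph` helper and the adjacency-list layout are illustrative assumptions, not part of the original slides.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object), for illustration only.
triples = [
    ("Inception", "hasGenre", "ScienceFiction"),
    ("LeonardoDiCaprio", "actedIn", "Inception"),
    ("Inception", "releaseYear", 2010),  # the object may be a primitive data value
]


def build_graph(spo_triples):
    """Index triples by subject: subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in spo_triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)
print(graph["Inception"])
```

Each subject becomes a node, and every (predicate, object) pair becomes a labeled outgoing edge.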
DBpedia entity page example
Entity Retrieval from Knowledge Graph(s)
Graph knowledge bases (KBs) are well suited to information needs that target specific objects (entities) rather than documents
Task: given the user's information need expressed as a keyword query, retrieve a relevant set of objects from the knowledge graph(s)
Typical ERWD (Entity Retrieval in the Web of Data) tasks
Entity Search
Queries refer to a particular entity.
“Ben Franklin”
“England football player highest paid”
“Einstein Relativity theory”
List Search
Complex queries with several relevant entities.
“US presidents since 1960”
“animals lay eggs mammals”
Question Answering
Queries are questions in natural language.
“Who is the mayor of Santiago?”
“For which label did Elvis record his first album?”
Fundamental problems in ERWD
Designing effective and concise entity representations
• Pound, Mika et al. Ad-hoc Object Retrieval in the Web of Data, WWW’10
• Blanco, Mika et al. Effective and Efficient Entity Search in RDF Data, ISWC’11
• Neumayer, Balog et al. On the Modeling of Entities for Ad-hoc Entity Search in the Web of Data, ECIR’12
Developing accurate retrieval models
• Mostly adaptations of standard unigram bag-of-words retrieval models, such as BM25F, MLM
Entity document
An entity is represented as a structured (multi-fielded) document:
names: conventional names of the entity, such as the name of a person or the name of an organization
attributes: all entity properties other than names
categories: classes or groups to which the entity has been assigned
similar entity names: names of entities that are very similar or identical to the given entity
related entity names: names of entities that are part of the same RDF triple
Entity document example
Multi-fielded entity document for the entity Barack Obama.
Field                  Content
names                  barack obama barack hussein obama ii
attributes             44th current president united states birth place honolulu hawaii
categories             democratic party united states senator nobel peace prize laureate christian
similar entity names   barack obama jr barak hussein obama barack h obama ii
related entity names   spouse michelle obama illinois state predecessor george walker bush
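One way to hold such a multi-fielded entity document in code is a mapping from field name to text, from which per-field term frequencies can be counted. This is only a sketch: the dict layout and the `field_term_counts` helper are assumptions, while the field names and contents are copied from the example above.

```python
from collections import Counter

# Field names and contents follow the Barack Obama example; the dict layout is an assumption.
entity_doc = {
    "names": "barack obama barack hussein obama ii",
    "attributes": "44th current president united states birth place honolulu hawaii",
    "categories": "democratic party united states senator nobel peace prize laureate christian",
    "similar entity names": "barack obama jr barak hussein obama barack h obama ii",
    "related entity names": "spouse michelle obama illinois state predecessor george walker bush",
}


def field_term_counts(doc):
    """Return {field: Counter(term -> frequency)} using whitespace tokenization."""
    return {field: Counter(text.split()) for field, text in doc.items()}


counts = field_term_counts(entity_doc)
print(counts["names"]["obama"])       # term frequency inside the names field
print(counts["categories"]["obama"])  # the same term is absent from categories
```

Keeping the counts separate per field is what lets a fielded retrieval model weight a match in names differently from a match in related entity names.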
Motivation
Previous research in ad-hoc IR has focused on two major directions:
Unigram bag-of-words retrieval models for multi-fielded documents
• Ogilvie and Callan. Combining Document Representations for Known-item Search, SIGIR’03
• Robertson et al. Simple BM25 Extension to Multiple Weighted Fields, CIKM’04
Retrieval models incorporating term dependencies
• Metzler and Croft. A Markov Random Field Model for Term Dependencies, SIGIR’05
• Huston and Croft. A Comparison of Retrieval Models using Term Dependencies, CIKM’14
Goal: to develop a retrieval model that captures both document structure and term dependencies
MLM (Mixture of Language Models)
$$P(Q|D) \stackrel{rank}{=} \prod_{q_i \in Q} P(q_i|\theta_D)^{tf(q_i)},$$
where
$$P(q_i|\theta_D) = \sum_j w_j P(q_i|\theta_D^j)$$
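The MLM score can be sketched in log space as follows. The per-field probabilities are taken as given inputs here (how they are smoothed is left open), and the function name, the toy probabilities and the field weights are all illustrative assumptions.

```python
import math
from collections import Counter


def mlm_log_score(query_terms, field_probs, field_weights):
    """log P(Q|D) = sum over distinct q of tf(q) * log(sum_j w_j * P(q | theta_D^j))."""
    score = 0.0
    for term, tf in Counter(query_terms).items():
        mixed = sum(w * field_probs[f].get(term, 0.0) for f, w in field_weights.items())
        if mixed == 0.0:
            return float("-inf")  # an unsmoothed zero probability zeroes the product
        score += tf * math.log(mixed)
    return score


# Hypothetical per-field term probabilities and field weights.
field_probs = {
    "names": {"obama": 0.5},
    "attributes": {"president": 0.25},
}
field_weights = {"names": 0.6, "attributes": 0.4}

mlm_score = mlm_log_score(["obama", "president"], field_probs, field_weights)
print(mlm_score)
```

Working with logarithms turns the product over query terms into a sum, which avoids numerical underflow for longer queries.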
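SDM (Sequential Dependence Model)
Ranks w.r.t.
$$P_\Lambda(D|Q) = \sum_{i \in \{T,U,O\}} \lambda_i f_i(Q, D)$$
Potential function for unigrams is QL (query likelihood):
$$f_T(q_i, D) = \log P(q_i|\theta_D) = \log \frac{tf_{q_i,D} + \mu \frac{cf_{q_i}}{|C|}}{|D| + \mu}$$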
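The unigram potential above is a Dirichlet-smoothed query likelihood, which can be written directly from the formula. In this sketch the function name, the default µ = 2500 and the toy counts are assumptions chosen only for illustration.

```python
import math


def ql_unigram_potential(tf, doc_len, cf, coll_len, mu=2500.0):
    """f_T(q, D) = log((tf + mu * cf / |C|) / (|D| + mu)); the default mu is an assumption."""
    return math.log((tf + mu * cf / coll_len) / (doc_len + mu))


# Hypothetical counts: the term occurs twice in a 100-term document and
# 1,000 times in a collection of 1,000,000 terms.
ql_value = ql_unigram_potential(tf=2, doc_len=100, cf=1000, coll_len=1_000_000)
print(ql_value)
```

The collection statistic cf / |C| keeps the potential finite even when the term is missing from the document (tf = 0).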
FSDM ranking function
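FSDM incorporates document structure and term dependencies with the following ranking function:
$$P_\Lambda(D|Q) \stackrel{rank}{=} \lambda_T \sum_{q_i \in Q} \tilde{f}_T(q_i, D) + \lambda_O \sum_{q_i, q_{i+1} \in Q} \tilde{f}_O(q_i, q_{i+1}, D) + \lambda_U \sum_{q_i, q_{i+1} \in Q} \tilde{f}_U(q_i, q_{i+1}, D)$$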
Separate MLMs for bigrams and unigrams give FSDM the flexibility
to adjust the document scoring depending on the query type
MLM is a special case of FSDM, when λT = 1, λO = 0, λU = 0
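The structure of the ranking function can be sketched as a λ-weighted sum of three potential sums, one over query unigrams and two over adjacent query bigrams. The potentials are passed in as plain callables; the function name and the constant toy potentials are assumptions for illustration.

```python
def fsdm_score(query_terms, lambdas, f_t, f_o, f_u):
    """Combine unigram, ordered-bigram and unordered-bigram potentials.

    f_t(q), f_o(q1, q2) and f_u(q1, q2) are assumed to return log-scale
    potentials for a fixed document D.
    """
    lam_t, lam_o, lam_u = lambdas
    bigrams = list(zip(query_terms, query_terms[1:]))
    unigram_part = sum(f_t(q) for q in query_terms)
    ordered_part = sum(f_o(a, b) for a, b in bigrams)
    unordered_part = sum(f_u(a, b) for a, b in bigrams)
    return lam_t * unigram_part + lam_o * ordered_part + lam_u * unordered_part


# Hypothetical constant potentials, just to exercise the combination.
query = ["apollo", "astronauts", "moon"]
full = fsdm_score(query, (0.5, 0.25, 0.25),
                  lambda q: -1.0, lambda a, b: -2.0, lambda a, b: -3.0)
unigram_only = fsdm_score(query, (1.0, 0.0, 0.0),
                          lambda q: -1.0, lambda a, b: -2.0, lambda a, b: -3.0)
print(full, unigram_only)
```

Setting λ = (1, 0, 0) leaves only the unigram sum, which mirrors the special case noted above.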
FSDM ranking function
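Potential function for unigrams in case of FSDM:
$$\tilde{f}_T(q_i, D) = \log \sum_j w_j^T P(q_i|\theta_D^j) = \log \sum_j w_j^T \frac{tf_{q_i,D_j} + \mu_j \frac{cf_{q_i}^j}{|C_j|}}{|D_j| + \mu_j}$$
Example (query segments labeled with fields):
apollo astronauts [category] who walked on the moon [attribute]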
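The fielded unigram potential mixes one Dirichlet-smoothed probability per field before taking the logarithm. A minimal sketch, assuming the per-field statistics are already available; the function name, the tuple layout of `field_stats` and all numbers are hypothetical.

```python
import math


def fsdm_unigram_potential(field_stats, weights, mus):
    """log sum_j w_j * (tf_j + mu_j * cf_j / |C_j|) / (|D_j| + mu_j).

    field_stats maps field -> (tf, field_len, cf, field_collection_len).
    """
    mixture = 0.0
    for field, (tf, field_len, cf, coll_len) in field_stats.items():
        mu = mus[field]
        prob = (tf + mu * cf / coll_len) / (field_len + mu)
        mixture += weights[field] * prob
    return math.log(mixture)


# Hypothetical statistics for one query term in two fields of one entity document.
field_stats = {
    "categories": (1, 9, 10, 1000),
    "attributes": (0, 19, 20, 1000),
}
weights = {"categories": 0.75, "attributes": 0.25}
mus = {"categories": 1.0, "attributes": 1.0}

potential = fsdm_unigram_potential(field_stats, weights, mus)
print(potential)
```

With a larger weight on categories, a term such as "astronauts" contributes mostly through its match in that field, which is the intuition behind the labeled example query.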
Parameters of FSDM
Overall, FSDM has 3F + 3 free parameters, where F is the number of fields: the field weight vectors $w^T$, $w^O$, $w^U$ (F weights each) and $\lambda = (\lambda_T, \lambda_O, \lambda_U)$.
Properties of the ranking function
1. Linearity with respect to λ.
We can apply any linear learning-to-rank algorithm to optimize the ranking function with respect to λ.
2. Linearity with respect to w of the arguments of the monotonic $\tilde{f}(\cdot)$ functions.
Because each $\tilde{f}(\cdot)$ is monotonic, optimizing its argument, which is a linear function of w, also optimizes $\tilde{f}(\cdot)$ itself.
Optimization algorithm
Q ← training queries
for s ∈ {T, O, U} do    // optimize field weights of LMs independently
    λ = e_s
    ŵ^s ← CoordAsc(Q, λ)
end for
λ̂ ← CoordAsc(Q, ŵ^T, ŵ^O, ŵ^U)    // optimize λ
The unit vectors e_T = (1, 0, 0), e_O = (0, 1, 0), e_U = (0, 0, 1) are the corresponding settings of the parameters λ in the FSDM ranking function.
⇒ direct optimization w.r.t. the target metric, e.g. MAP
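The two-stage procedure can be sketched with a simple grid-based coordinate ascent. This is not the authors' implementation: `coord_ascent`, `two_stage`, the fixed grid, the lack of weight normalization and the toy metric standing in for MAP on training queries are all assumptions made for illustration.

```python
def coord_ascent(objective, init, grid, sweeps=5):
    """Greedy coordinate ascent: tune one coordinate at a time over a fixed grid."""
    params = list(init)
    best = objective(params)
    for _ in range(sweeps):
        improved = False
        for i in range(len(params)):
            for value in grid:
                trial = params[:i] + [value] + params[i + 1:]
                score = objective(trial)
                if score > best:
                    params, best, improved = trial, score, True
        if not improved:
            break
    return params, best


def two_stage(metric, n_fields, grid):
    """metric(weights, lambdas) -> float is an assumed callable, e.g. MAP on training queries."""
    units = {"T": [1.0, 0.0, 0.0], "O": [0.0, 1.0, 0.0], "U": [0.0, 0.0, 1.0]}
    weights = {s: [1.0 / n_fields] * n_fields for s in units}
    # Stage 1: for each s, fix lambda = e_s and tune only the field weights w^s.
    for s, e_s in units.items():
        def objective(w, s=s, e_s=e_s):
            return metric({**weights, s: w}, e_s)
        weights[s], _ = coord_ascent(objective, weights[s], grid)
    # Stage 2: keep the learned field weights fixed and tune lambda.
    lambdas, best = coord_ascent(lambda lam: metric(weights, lam), [1 / 3] * 3, grid)
    return weights, lambdas, best


# Toy metric with a known optimum (hypothetical; a real run would evaluate rankings).
targets = {"T": [0.5, 0.5], "O": [0.75, 0.25], "U": [0.25, 0.75]}
target_lambdas = [0.5, 0.25, 0.25]


def toy_metric(weights, lambdas):
    w_err = sum(
        lam * sum((w - t) ** 2 for w, t in zip(weights[s], targets[s]))
        for s, lam in zip(("T", "O", "U"), lambdas)
    )
    lam_err = sum((lam - t) ** 2 for lam, t in zip(lambdas, target_lambdas))
    return -(w_err + lam_err)


learned_w, learned_lam, best_value = two_stage(toy_metric, 2, [0.0, 0.25, 0.5, 0.75, 1.0])
print(learned_w, learned_lam, best_value)
```

Fixing λ to a unit vector in stage 1 isolates one potential function at a time, so each field-weight vector is tuned against the metric without interference from the other two.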
Collection
DBpedia 3.7 was used as the collection in all experiments
A structured version of the online encyclopedia Wikipedia
Provides descriptions of over 3.5 million entities belonging to 320 classes
Query Sets
Balog and Neumayer. A Test Collection for Entity Search in DBpedia, SIGIR’13.
Query set      Queries   Query types [Pound et al., 2010]
SemSearch ES   130       Entity
ListSearch     115       Type
INEX-LD        100       Entity, Type, Attribute, Relation
QALD-2         140       Entity, Type, Attribute, Relation
Tuning field weights
The attributes field is consistently found to be highly valuable for both unigrams and bigrams.
The names and similar entity names fields are highly important for queries aimed at finding named entities.
Distinguishing categories from related entity names is particularly important for type queries.
Tuning λ
[Figure: optimized values of λT, λO and λU (vertical axis, 0.0 to 0.8) for the query sets SemSearch_ES, ListSearch, INEX_LD and QALD2; panel (a) SDM, panel (b) FSDM]
Bigram matches are important for named entity queries.
Transformation of SDM into FSDM increases the importance of bigram matches, which ultimately improves the retrieval performance, as we will demonstrate next.
Experimental results
Query set      Method   MAP       P@10      P@20      b-pref
SemSearch ES   MLM-CA   0.320     0.250     0.179     0.674
               SDM-CA   0.254∗    0.202∗    0.149∗    0.671
               FSDM     0.386∗†   0.286∗†   0.204∗†   0.750∗†
ListSearch     MLM-CA   0.190     0.252     0.192     0.428
               SDM-CA   0.197     0.252     0.202     0.471∗
               FSDM     0.203     0.256     0.203     0.466∗
INEX-LD        MLM-CA   0.102     0.238     0.190     0.318
               SDM-CA   0.117∗    0.258     0.199     0.335
               FSDM     0.111∗    0.263∗    0.215∗†   0.341∗
QALD-2         MLM-CA   0.152     0.103     0.084     0.373
               SDM-CA   0.184     0.106     0.090     0.465∗
               FSDM     0.195∗    0.136∗†   0.111∗    0.466∗
All queries    MLM-CA   0.196     0.206     0.157     0.455
               SDM-CA   0.192     0.198     0.155     0.495∗
               FSDM     0.231∗†   0.231∗†   0.179∗†   0.517∗†
Topic-Level differences between SDM and FSDM
[Figure: per-topic difference plots, vertical axis from -1.0 to 1.0, topics on the horizontal axis; panels (a) All queries, (b) SemSearch ES, (c) ListSearch, (d) INEX-LD, (e) QALD-2]
Topic-level differences in average precision between FSDM and SDM.
Positive values indicate FSDM is better.
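A per-topic difference of this kind can be computed from two rankings of the same topic as sketched below. The `average_precision` helper uses the standard definition; the document identifiers and the two toy rankings are hypothetical.

```python
def average_precision(ranking, relevant):
    """Mean of precision values at the ranks of relevant items, over all relevant items."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical rankings returned by two models for one topic.
relevant = {"a", "b"}
fsdm_ranking = ["a", "b", "c"]
sdm_ranking = ["c", "a", "b"]

ap_difference = average_precision(fsdm_ranking, relevant) - average_precision(sdm_ranking, relevant)
print(ap_difference)  # positive: the first ranking is better on this topic
```

Repeating this for every topic and sorting the differences gives the kind of curve summarized in the panels.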
Conclusion
We proposed the Fielded Sequential Dependence Model (FSDM), a novel retrieval model that incorporates term dependencies into structured document retrieval
We proposed a two-stage algorithm to directly optimize the parameters of FSDM with respect to the target retrieval metric
We experimentally demonstrated that having different field weighting schemes for unigrams and bigrams is effective for different types of ERWD queries
Experimental evaluation of FSDM on a standard publicly available benchmark showed that it consistently and, in most cases, statistically significantly outperforms MLM and SDM for the task of ERWD
Code and runs are available at github.com/teanalab/FieldedSDM
Questions?
Robustness
[Figure: histogram for SDM and FSDM; vertical axis from 0 to 200; horizontal-axis bins <= -100, -[75, 100), -[50, 75), -[25, 50), -(0, 25), [0, 25), [25, 50), [50, 75), [75, 100), >= 100]
FSDM is more robust compared to SDM
FSDM improves the performance of 50% of the queries with respect to MLM-CA, compared to 45% of the queries improved by SDM
FSDM decreases the performance of only 26% of the queries, while SDM degrades the performance of 40% of the queries
Various Levels of Difficulty
Level               Model   MAP      P@10     P@20     b-pref
Difficult queries   SDM     0.213    0.067    0.042    0.599
                    FSDM    0.239    0.065    0.043    0.621
Medium queries      SDM     0.209    0.224    0.165    0.532
                    FSDM    0.264†   0.272†   0.191†   0.559†
Easy queries        SDM     0.139    0.298    0.262    0.316
                    FSDM    0.166†   0.345†   0.309†   0.330
Creating sophisticated entity descriptions is not sufficient for answering difficult queries in the entity retrieval scenario. Better capture of the semantics of query terms is required to further improve the precision of FSDM on difficult queries.
Failure Analysis
SDM errors
Overestimation of importance of matches in the fields other than
names
“city of charlotte”
“give me all soccer clubs in the premier league”
“us presidents since 1960”
FSDM errors
Neglecting the important query terms
“members of the beaux arts trio”
“who created goofy”
“where is the residence of the prime minister of spain?”
Lack of semantic knowledge.
“did nicole kidman have any siblings”
34/34

More Related Content

PDF
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
PDF
Entity Retrieval (SIGIR 2013 tutorial)
PPTX
Relational algebra
PDF
Entity Linking
PDF
Entity Linking in Queries: Efficiency vs. Effectiveness
PDF
Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...
PPT
Vsm 벡터공간모델
PDF
Entity Retrieval (WSDM 2014 tutorial)
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
Entity Retrieval (SIGIR 2013 tutorial)
Relational algebra
Entity Linking
Entity Linking in Queries: Efficiency vs. Effectiveness
Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...
Vsm 벡터공간모델
Entity Retrieval (WSDM 2014 tutorial)

What's hot (20)

PPTX
Object Oriented Programming Concepts
PDF
Entity Retrieval (tutorial organized by Radialpoint in Montreal)
PDF
A Graph-based Model for Multimodal Information Retrieval
PDF
Evaluation Initiatives for Entity-oriented Search
PDF
Exploiting Entity Linking in Queries For Entity Retrieval
PPTX
Probabilistic models (part 1)
PPT
Text classification using Text kernels
PDF
Java - Class Structure
PPTX
Text categorization
PPTX
Tdm probabilistic models (part 2)
PPTX
Object Oriented Technologies
PPTX
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
PDF
The DE-9IM Matrix in Details using ST_Relate: In Picture and SQL
PDF
What's next in Julia
PDF
Text Categorization Using Improved K Nearest Neighbor Algorithm
PDF
Object-Oriented Programming (OOP)
PPT
C plusplus
PPTX
Document ranking using qprp with concept of multi dimensional subspace
PDF
Entity Search: The Last Decade and the Next
Object Oriented Programming Concepts
Entity Retrieval (tutorial organized by Radialpoint in Montreal)
A Graph-based Model for Multimodal Information Retrieval
Evaluation Initiatives for Entity-oriented Search
Exploiting Entity Linking in Queries For Entity Retrieval
Probabilistic models (part 1)
Text classification using Text kernels
Java - Class Structure
Text categorization
Tdm probabilistic models (part 2)
Object Oriented Technologies
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
The DE-9IM Matrix in Details using ST_Relate: In Picture and SQL
What's next in Julia
Text Categorization Using Improved K Nearest Neighbor Algorithm
Object-Oriented Programming (OOP)
C plusplus
Document ranking using qprp with concept of multi dimensional subspace
Entity Search: The Last Decade and the Next
Ad

Similar to Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data (20)

PDF
Parameterized Fielded Term Dependence Models for Ad-hoc Entity Retrieval from...
PDF
PDF
Entity Retrieval (WWW 2013 tutorial)
PDF
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
ODP
Machine Learning & Embeddings for Large Knowledge Graphs
PDF
Improving Entity Retrieval on Structured Data
PDF
Unsupervised Main Entity Extraction from News Articles using Latent Variables
PDF
A Study of the Similarities of Entity Embeddings Learned from Different Aspec...
PDF
A scalable gibbs sampler for probabilistic entity linking
PDF
Ranking Objects by Following Paths in Entity-Relationship Graphs (PhD Worksho...
PDF
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITION
ODP
PDF
inteSearch: An Intelligent Linked Data Information Access Framework
PDF
Neural Network in Knowledge Bases
PPT
ppt
PDF
An Empirical Comparison of Knowledge Graph Embeddings for Item Recommendation
PDF
Review on Automation Tool for ERD Normalization
PDF
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
PDF
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
PDF
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
Parameterized Fielded Term Dependence Models for Ad-hoc Entity Retrieval from...
Entity Retrieval (WWW 2013 tutorial)
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
Machine Learning & Embeddings for Large Knowledge Graphs
Improving Entity Retrieval on Structured Data
Unsupervised Main Entity Extraction from News Articles using Latent Variables
A Study of the Similarities of Entity Embeddings Learned from Different Aspec...
A scalable gibbs sampler for probabilistic entity linking
Ranking Objects by Following Paths in Entity-Relationship Graphs (PhD Worksho...
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITION
inteSearch: An Intelligent Linked Data Information Access Framework
Neural Network in Knowledge Bases
ppt
An Empirical Comparison of Knowledge Graph Embeddings for Item Recommendation
Review on Automation Tool for ERD Normalization
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
Ad

Recently uploaded (20)

PPTX
2. Earth - The Living Planet earth and life
PPTX
Microbiology with diagram medical studies .pptx
PDF
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
PPT
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
PDF
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
PDF
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
PDF
Sciences of Europe No 170 (2025)
PPTX
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
PDF
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
PPTX
TOTAL hIP ARTHROPLASTY Presentation.pptx
PDF
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
PPTX
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
PPTX
Classification Systems_TAXONOMY_SCIENCE8.pptx
PPTX
ECG_Course_Presentation د.محمد صقران ppt
DOCX
Viruses (History, structure and composition, classification, Bacteriophage Re...
PPTX
2. Earth - The Living Planet Module 2ELS
PPTX
famous lake in india and its disturibution and importance
PDF
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
PPTX
Derivatives of integument scales, beaks, horns,.pptx
PPTX
GEN. BIO 1 - CELL TYPES & CELL MODIFICATIONS
2. Earth - The Living Planet earth and life
Microbiology with diagram medical studies .pptx
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
Sciences of Europe No 170 (2025)
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
TOTAL hIP ARTHROPLASTY Presentation.pptx
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
Classification Systems_TAXONOMY_SCIENCE8.pptx
ECG_Course_Presentation د.محمد صقران ppt
Viruses (History, structure and composition, classification, Bacteriophage Re...
2. Earth - The Living Planet Module 2ELS
famous lake in india and its disturibution and importance
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
Derivatives of integument scales, beaks, horns,.pptx
GEN. BIO 1 - CELL TYPES & CELL MODIFICATIONS

Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data

  • 1. Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data Nikita Zhiltsov 1,2 Alexander Kotov 3 Fedor Nikolaev 3 1 Kazan Federal University 2 Textocat 3 Textual Data Analytics Lab, Department of Computer Science, Wayne State University
  • 2. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion Overview Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion 2/34
  • 3. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion Knowledge Graphs 3/34
  • 4. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion Linked Open Data (LOD) Cloud 4/34
  • 5. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion Entities Material objects or concepts in the real world or fiction (e.g. people, movies, conferences etc.) Are connected with other entities by relations (e.g. hasGenre, actedIn, isPCmemberOf etc.) Subject-Predicate-Object (SPO) triple: subject=entity; object=entity (or primitive data value); predicate=relationship between subject and object Many SPO triples → knowledge graph 5/34
  • 6. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion DBPedia entity page example 6/34
  • 7. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion Entity Retrieval from Knowledge Graph(s) Graph KBs are perfectly suited for addressing the information needs that aim at finding specific objects (entities) rather than documents Given the user’s information need expressed as a keyword query, retrieve a relevant set of objects from the knowledge graph(s) 7/34
  • 8. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion Typical ERWD tasks Entity Search Queries refer to a particular entity. “Ben Franklin” “England football player highest paid” “Einstein Relativity theory” List Search Complex queries with several relevant entities. “US presidents since 1960” “animals lay eggs mammals” Question Answering Queries are questions in natural language. “Who is the mayor of Santiago?” “For which label did Elvis record his first album?” 8/34
  • 9. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion Fundamental problems in ERWD Designing effective and concise entity representations • Pound, Mika et al. Ad-hoc Object Retrieval in the Web of Data, WWW’10 • Blanco, Mika et al. Effective and Efficient Entity Search in RDF Data, ISWC’11 • Neumayer, Balog et al. On the Modeling of Entities for Ad-hoc Entity Search in the Web of Data, ECIR’12 Developing accurate retrieval models • Mostly adaptations of standard unigram bag-of-words retrieval models, such as BM25F, MLM 9/34
  • 10. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion Overview Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion 10/34
  • 11. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion Entity document An entity is represented as a structured (multi-fielded) document: names Conventional names of the entities, such as the name of a person or the name of an organization attributes All entity properties, other than names categories Classes or groups, to which the entity has been assigned similar entity names Names of the entities that are very similar or identical to a given entity related entity names Names of the entities that are part of the same RDF triple 11/34
  • 12. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion Entity document example Multi-fielded entity document for the entity Barack Obama. Field Content names barack obama barack hussein obama ii attributes 44th current president united states birth place honolulu hawaii categories democratic party united states senator nobel peace prize laureate christian similar entity names barack obama jr barak hussein obama barack h obama ii related entity names spouse michelle obama illinois state predecessor george walker bush 12/34
  • 13. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion Overview Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion 13/34
  • 14. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion Motivation Previous research in ad-hoc IR has focused on two major directions: unigram bag-of-words retrieval models for multi-fielded documents • Ogilvie and Callan. Combining Document Representations for Known-item Search, SIGIR’03 • Robertson et al. Simple BM25 Extension to Multiple Weighted Fields, CIKM’04 retrieval models incorporating term dependencies • Metzler and Croft. A Markov Random Field Model for Term Dependencies, SIGIR’05 • Huston and Croft. A Comparison of Retrieval Models using Term Dependencies, CIKM’14 Goal: to develop a retrieval model that captures both document structure and term dependencies 14/34
  • 15. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion MLM P(Q|D) rank = qi∈Q P(qi|θD)tf (qi) , where P(qi|θD) = j wjP(qi|θ j D) 15/34
  • 16. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion SDM Ranks w.r.t. PΛ(D|Q) = i∈{T,U,O} λi fi(Q, D) Potential function for unigrams is QL: fT(qi, D) = log P(qi|θD) = log tfqi,D + µ cfqi |C| |D| + µ 16/34
  • 17. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion FSDM ranking function FSDM incorporates document structure and term dependencies with the following ranking function: PΛ(D|Q) rank = λT q∈Q ˜fT(qi, D) + λO q∈Q ˜fO(qi, qi+1, D) + λU q∈Q ˜fU(qi, qi+1, D) Separate MLMs for bigrams and unigrams give FSDM the flexibility to adjust the document scoring depending on the query type MLM is a special case of FSDM, when λT = 1, λO = 0, λU = 0 17/34
  • 18. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion FSDM ranking function FSDM incorporates document structure and term dependencies with the following ranking function: PΛ(D|Q) rank = λT q∈Q ˜fT(qi, D) + λO q∈Q ˜fO(qi, qi+1, D) + λU q∈Q ˜fU(qi, qi+1, D) Separate MLMs for bigrams and unigrams give FSDM the flexibility to adjust the document scoring depending on the query type MLM is a special case of FSDM, when λT = 1, λO = 0, λU = 0 17/34
  • 19. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion FSDM ranking function FSDM incorporates document structure and term dependencies with the following ranking function: PΛ(D|Q) rank = λT q∈Q ˜fT(qi, D) + λO q∈Q ˜fO(qi, qi+1, D) + λU q∈Q ˜fU(qi, qi+1, D) Separate MLMs for bigrams and unigrams give FSDM the flexibility to adjust the document scoring depending on the query type MLM is a special case of FSDM, when λT = 1, λO = 0, λU = 0 17/34
  • 20. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion FSDM ranking function FSDM incorporates document structure and term dependencies with the following ranking function: PΛ(D|Q) rank = λT q∈Q ˜fT(qi, D) + λO q∈Q ˜fO(qi, qi+1, D) + λU q∈Q ˜fU(qi, qi+1, D) Separate MLMs for bigrams and unigrams give FSDM the flexibility to adjust the document scoring depending on the query type MLM is a special case of FSDM, when λT = 1, λO = 0, λU = 0 17/34
  • 21. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion FSDM ranking function Potential function for unigrams in case of FSDM: ˜fT(qi, D) = log j wT j P(qi|θ j D) = log j wT j tfqi,Dj + µj cf j qi |Cj| |Dj| + µj Example apollo astronauts who walked on the moon 18/34
  • 22. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion FSDM ranking function Potential function for unigrams in case of FSDM: ˜fT(qi, D) = log j wT j P(qi|θ j D) = log j wT j tfqi,Dj + µj cf j qi |Cj| |Dj| + µj Example apollo astronauts category who walked on the moon 18/34
  • 23. Entities Entity Representation Fielded Sequential Dependence Model Parameter Estimation Results Conclusion FSDM ranking function Potential function for unigrams in case of FSDM: ˜fT(qi, D) = log j wT j P(qi|θ j D) = log j wT j tfqi,Dj + µj cf j qi |Cj| |Dj| + µj Example apollo astronauts category who walked on the moon attribute 18/34
  • 24. Parameters of FSDM
    Overall, FSDM has $3F + 3$ free parameters, where $F$ is the number of fields: $w^T$, $w^O$, $w^U$, $\lambda$.
    Properties of the ranking function:
    1. Linearity with respect to $\lambda$. We can apply any linear learning-to-rank algorithm to optimize the ranking function with respect to $\lambda$.
    2. Linearity with respect to $w$ of the arguments of the monotonic $\tilde{f}(\cdot)$ functions. Optimizing the arguments as linear functions of $w$ optimizes each function $\tilde{f}(\cdot)$.
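The parameter count and the linearity in λ can be illustrated with a small sketch. The function names and the numeric potentials below are made up for the example; the only facts taken from the slides are the $3F + 3$ count, the linear combination with $\lambda$, and the MLM special case.

```python
def n_free_parameters(n_fields):
    """Three weight vectors (w^T, w^O, w^U) of length F plus three lambdas."""
    return 3 * n_fields + 3

def fsdm_score(lam, potential_sums):
    """Linear combination of the summed potentials.

    lam:            (lambda_T, lambda_O, lambda_U)
    potential_sums: (sum of f_T, sum of f_O, sum of f_U) for one entity and query
    """
    return sum(l * p for l, p in zip(lam, potential_sums))

# Hypothetical summed log-potentials for one entity
sums = (-3.2, -7.1, -6.4)

mlm_like = fsdm_score((1.0, 0.0, 0.0), sums)   # only the unigram component remains
mixed = fsdm_score((0.6, 0.2, 0.2), sums)      # all three components contribute
params_for_five_fields = n_free_parameters(5)
```

Since the score is a dot product between $\lambda$ and three per-entity features, the summed potentials can be precomputed once and reused while $\lambda$ is tuned.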
  • 26. Optimization algorithm
      $Q \leftarrow$ training queries
      for $s \in \{T, O, U\}$ do    // optimize the field weights of the LMs independently
          $\lambda = e_s$
          $\hat{w}^s \leftarrow \mathrm{CoordAsc}(Q, \lambda)$
      end for
      $\hat{\lambda} \leftarrow \mathrm{CoordAsc}(Q, \hat{w}^T, \hat{w}^O, \hat{w}^U)$    // optimize $\lambda$
    The unit vectors $e_T = (1, 0, 0)$, $e_O = (0, 1, 0)$, $e_U = (0, 0, 1)$ are the corresponding settings of the parameters $\lambda$ in the formula of the FSDM ranking function.
    ⇒ Direct optimization w.r.t. the target metric, e.g. MAP.
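The two-stage procedure can be sketched as follows. This is a simplified illustration under stated assumptions: `coord_ascent` is a plain grid-based coordinate ascent written for this example (not the CoordAsc implementation used by the authors), `metric` is a hypothetical callback that returns the target metric (for example MAP over the training queries) for given `w` and `lam`, and any constraints on the parameters, such as non-negativity or summing to one, are left out.

```python
def coord_ascent(objective, init, grid, sweeps=3):
    """Greedy coordinate ascent: tune one coordinate at a time over a fixed grid."""
    best = list(init)
    best_val = objective(best)
    for _ in range(sweeps):
        for i in range(len(best)):
            for v in grid:
                cand = best[:i] + [v] + best[i + 1:]
                val = objective(cand)
                if val > best_val:
                    best, best_val = cand, val
    return best

# Unit vectors e_T, e_O, e_U: each switches on a single component of the model
UNIT = {"T": (1.0, 0.0, 0.0), "O": (0.0, 1.0, 0.0), "U": (0.0, 0.0, 1.0)}

def train_fsdm(metric, n_fields, grid):
    """Stage 1: field weights per component; stage 2: lambda with weights fixed.

    metric(w, lam) -> float is a hypothetical evaluation callback, where
    w maps "T"/"O"/"U" to a list of field weights and lam is a 3-tuple.
    """
    uniform = [1.0 / n_fields] * n_fields
    w = {s: list(uniform) for s in "TOU"}
    for s in "TOU":
        lam = UNIT[s]  # only component s influences the score in this stage
        w[s] = coord_ascent(lambda ws: metric({**w, s: ws}, lam), w[s], grid)
    lam_hat = coord_ascent(lambda l: metric(w, tuple(l)), [1 / 3] * 3, grid)
    return w, lam_hat
```

Setting λ to a unit vector in the first stage means each weight vector is tuned against the metric in isolation, so the three searches do not interfere with each other.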
  • 28. Collection
    - DBpedia 3.7 was used as the collection in all experiments.
    - It is a structured version of the online encyclopedia Wikipedia.
    - It provides descriptions of over 3.5 million entities belonging to 320 classes.
  • 29. Query Sets
    Balog and Neumayer. A Test Collection for Entity Search in DBpedia, SIGIR’13.

    Query set      Number of queries   Query types [Pound et al., 2010]
    SemSearch ES   130                 Entity
    ListSearch     115                 Type
    INEX-LD        100                 Entity, Type, Attribute, Relation
    QALD-2         140                 Entity, Type, Attribute, Relation
  • 30. Tuning field weights
    - The attributes field is consistently found to be very valuable for both unigrams and bigrams.
    - The names field and the similar entity names field are highly important for queries aiming at finding named entities.
    - Distinguishing categories from related entity names is particularly important for type queries.
  • 31. Tuning λ
    [Figure: tuned values of $\lambda_T$, $\lambda_O$, $\lambda_U$ (axis from 0.0 to 0.8) for the query sets SemSearch ES, ListSearch, INEX-LD and QALD-2; panels (a) SDM and (b) FSDM.]
    - Bigram matches are important for named entity queries.
    - Transformation of SDM into FSDM increases the importance of bigram matches, which ultimately improves the retrieval performance, as we will demonstrate next.
  • 32. Experimental results

    Query set      Method   MAP       P@10      P@20      b-pref
    SemSearch ES   MLM-CA   0.320     0.250     0.179     0.674
                   SDM-CA   0.254∗    0.202∗    0.149∗    0.671
                   FSDM     0.386∗†   0.286∗†   0.204∗†   0.750∗†
    ListSearch     MLM-CA   0.190     0.252     0.192     0.428
                   SDM-CA   0.197     0.252     0.202     0.471∗
                   FSDM     0.203     0.256     0.203     0.466∗
    INEX-LD        MLM-CA   0.102     0.238     0.190     0.318
                   SDM-CA   0.117∗    0.258     0.199     0.335
                   FSDM     0.111∗    0.263∗    0.215∗†   0.341∗
    QALD-2         MLM-CA   0.152     0.103     0.084     0.373
                   SDM-CA   0.184     0.106     0.090     0.465∗
                   FSDM     0.195∗    0.136∗†   0.111∗    0.466∗
    All queries    MLM-CA   0.196     0.206     0.157     0.455
                   SDM-CA   0.192     0.198     0.155     0.495∗
                   FSDM     0.231∗†   0.231∗†   0.179∗†   0.517∗†
  • 33. Topic-level differences between SDM and FSDM
    [Figure: topic-level differences in average precision between FSDM and SDM (axis from -1.0 to 1.0); panels (a) All queries, (b) SemSearch ES, (c) ListSearch, (d) INEX-LD, (e) QALD-2.]
    Positive values indicate FSDM is better.
  • 35. Conclusion
    - We proposed the Fielded Sequential Dependence Model, a novel retrieval model, which incorporates term dependencies into structured document retrieval.
    - We proposed a two-stage algorithm to directly optimize the parameters of FSDM with respect to the target retrieval metric.
    - We experimentally demonstrated that having different field weighting schemes for unigrams and bigrams is effective for different types of ERWD queries.
    - Experimental evaluation of FSDM on a standard publicly available benchmark showed that it consistently and, in most cases, statistically significantly outperforms MLM and SDM for the task of ERWD.
  • 36. Code and runs are available at github.com/teanalab/FieldedSDM. Questions?
  • 37. Robustness
    [Figure: histogram of query counts (axis from 0 to 200) for SDM and FSDM, binned by change in performance from <= -100 to >= 100.]
    - FSDM is more robust compared to SDM.
    - FSDM improves the performance of 50% of the queries with respect to MLM-CA, compared to 45% of the queries improved by SDM.
    - FSDM decreases the performance of only 26% of the queries, while SDM degrades the performance of 40% of the queries.
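Shares of improved and degraded queries like those above could be computed from per-query scores roughly as follows. This is a sketch under assumptions: the function name, the tolerance `eps` used to treat near-equal scores as ties, and the toy average-precision values are all illustrative and not taken from the experiments.

```python
def robustness(baseline_ap, system_ap, eps=1e-9):
    """Fractions of queries that a system improves or degrades vs. a baseline.

    baseline_ap, system_ap: per-query average precision, aligned by query.
    eps: tolerance below which a difference counts as a tie (an assumption).
    """
    pairs = list(zip(baseline_ap, system_ap))
    improved = sum(1 for b, s in pairs if s - b > eps)
    degraded = sum(1 for b, s in pairs if b - s > eps)
    return improved / len(pairs), degraded / len(pairs)

# Toy per-query AP values for four queries (not real results)
baseline = [0.2, 0.5, 0.1, 0.4]
system = [0.3, 0.5, 0.05, 0.6]
improved_share, degraded_share = robustness(baseline, system)
```

Queries whose score is unchanged count toward neither share, which is why the two percentages need not sum to 100%.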
  • 38. Various Levels of Difficulty

    Level               Model   MAP      P@10     P@20     b-pref
    Difficult queries   SDM     0.213    0.067    0.042    0.599
                        FSDM    0.239    0.065    0.043    0.621
    Medium queries      SDM     0.209    0.224    0.165    0.532
                        FSDM    0.264†   0.272†   0.191†   0.559†
    Easy queries        SDM     0.139    0.298    0.262    0.316
                        FSDM    0.166†   0.345†   0.309†   0.330

    Creating sophisticated entity descriptions is not sufficient for answering difficult queries in the entity retrieval scenario. Better capturing the semantics of query terms is required to further improve the precision of FSDM for difficult queries.
  • 39. Failure Analysis
    SDM errors:
    - Overestimation of the importance of matches in fields other than names:
        “city of charlotte”
        “give me all soccer clubs in the premier league”
        “us presidents since 1960”
    FSDM errors:
    - Neglecting important query terms:
        “members of the beaux arts trio”
        “who created goofy”
        “where is the residence of the prime minister of spain?”
    - Lack of semantic knowledge:
        “did nicole kidman have any siblings”