Entity Linking in Queries: Efficiency vs. Effectiveness

Faegheh Hasibi, Krisztian Balog, Svein E. Bratsberg
ECIR 2017
ENTITY LINKING IN QUERIES:
Efficiency vs. Effectiveness
IAI_group

ENTITY LINKING
TOTAL RECALL (1990 FILM) ARNOLD SCHWARZENEGGER PHILIP K. DICK
‣ Identifying entities in the text and linking them to the
corresponding entry in the knowledge base
Total Recall is a 1990 American science fiction action film
directed by Paul Verhoeven, starring Rachel Ticotin, Sharon
Stone, Arnold Schwarzenegger, Ronny Cox and Michael
Ironside. The film is loosely based on the Philip K. Dick
story…

TOTAL RECALL (1990 FILM) ARNOLD SCHWARZENEGGER
ENTITY LINKING IN QUERIES (ELQ)
total recall arnold schwarzenegger

TOTAL RECALL (1990 FILM) ARNOLD SCHWARZENEGGER
total recall arnold schwarzenegger total recall
TOTAL RECALL (2012 FILM)TOTAL RECALL (1990 FILM)

Query Entity Linking Interpretation
‣ Identifying sets of entity linking interpretations
[Carmel et al. 2014], [Hasibi et al. 2015], [Cornolti et al. 2016]

CONVENTIONAL APPROACH
Mention detection
Candidate Entity
Ranking
Disambiguation
new york
new york
pizza
0.8
NEW YORK
NEW YORK CITY
..
NEW YORK-STYLE PIZZA
SICILIAN_PIZZA
…
0.7
0.1
m e
0.9
(new york, NEW YORK),
(manhattan, MANHATTAN)
(new york pizza, NEW YORK-STYLE PIZZA),
s(m,e) interpretation sets
1 2
…

Incorporating different signals:
CONVENTIONAL APPROACH
‣ Contextual similarity between a candidate entity and text (or entity mention)
‣ Interdependence between all entity linking decisions (extracted from the
underlying KB)
e1 e2
e3
e4e5

What is special about entity linking in queries?
What is really different when it
comes to the queries?

RESEARCH QUESTIONS
ELQ is an online process and should be done under strict
time constraints
If we are to allocate the
available processing time
between the two steps,
which one would yield the
highest gain?
RQ1?

Which group of features is
needed the most for effective
entity linking: contextual
similarity, interdependence
between entities, or both?
RQ2?
RESEARCH QUESTIONS
The context provided by the queries is limited

SYSTEMATIC INVESTIGATION
METHODS:
(1) Unsupervised (Greedy)
(2) Supervised (LTR)
METHODS:
(1) Unsupervised (MLMcg)
Candidate Entity
Ranking
Disambiguation
1 2

Estimate:
score(m, e, q)
Given:
• mention m
• entity e
• query q
CANDIDATE ENTITY RANKING
(I) Identify all possible entities that can
be linked in the query
(II) Rank them based on how likely
they are link targets
Goal
‣ lexical matching of query n-grams against
a rich dictionary of entity name variants

UNSUPERVISED [MLMcg]
rectly in the subsequent disambiguation step. Using lexical matching of query n-g
against a rich dictionary of entity name variants allows for the identification of ca
date entities with close to perfect recall [16]. We follow this approach to obtain a l
candidate entities together with their corresponding mentions in the query. Our f
of attention below is on ranking these candidate (m, e) pairs with respect to the q
i.e., estimating score(m, e, q).
Unsupervised For the unsupervised ranking approach, we take a state-of-the-art g
ative model, specifically, the MLMcg model proposed by Hasibi et al. [16]. This m
considers both the likelihood of the given mention and the similarity between the q
and the entity: score(m, e, q) = P(e|m)P(q|e), where P(e|m) is the probability
mention being linked to an entity (a.k.a. commonness [22]), computed from the F
collection [12]. The query likelihood P(q|e) is estimated using the query length
malized language model similarity [20]:
P(q|e) =
Q
t2q P(t|✓e)P (t|q)
Q
t2q P(t|C)P (t|q)
,
where P(t|q) is the term’s relative frequency in the query (i.e., n(t, q)/|q|). The e
and collection language models, P(t|✓e) and P(t|C), are computed using the Mix
of Language Models (MLM) approach [27].
against a rich dictionary of entity name variants allows for the identificat
date entities with close to perfect recall [16]. We follow this approach to o
candidate entities together with their corresponding mentions in the quer
of attention below is on ranking these candidate (m, e) pairs with respect
i.e., estimating score(m, e, q).
Unsupervised For the unsupervised ranking approach, we take a state-of-t
ative model, specifically, the MLMcg model proposed by Hasibi et al. [16]
considers both the likelihood of the given mention and the similarity betwe
and the entity: score(m, e, q) = P(e|m)P(q|e), where P(e|m) is the pro
mention being linked to an entity (a.k.a. commonness [22]), computed fro
collection [12]. The query likelihood P(q|e) is estimated using the query
malized language model similarity [20]:
P(q|e) =
Q
t2q P(t|✓e)P (t|q)
Q
t2q P(t|C)P (t|q)
,
where P(t|q) is the term’s relative frequency in the query (i.e., n(t, q)/|q
and collection language models, P(t|✓e) and P(t|C), are computed using
Generative model for ranking entities based on the query
and the specific mention
Prob. mention being
linked to the entity
query likelihood based on 
Mixture of Language Models (MLM)
[Hasibi et al. 2015]

SUPERVISED [LTR]
28 features mainly from literature
Entity Linking in Queries: Efficiency vs. Effectiveness 5
Table 2. Feature set used for ranking entities, categorized to mention (M), entity (E), mention-
entity (ME), and query (Q) features.
Feature Description Type
Len(m) Number of terms in the entity mention M
NTEM(m)‡
Number of entities whose title equals the mention M
SMIL(m)‡
Number of entities whose title equals part of the mention M
Matches(m) Number of entities whose surface form matches the mention M
Redirects(e) Number of redirect pages linking to the entity E
Links(e) Number of entity out-links in DBpedia E
Commonness(e, m) Likelihood of entity e being the target link of mention m ME
MCT(e, m)‡
True if the mention contains the title of the entity ME
TCM(e, m)‡
True if title of the entity contains the mention ME
TEM(e, m)‡
True if title of the entity equals the mention ME
Pos1(e, m) Position of the 1st
occurrence of the mention in entity abstract ME
SimMf (e, m)†
Similarity between mention and field f of entity; Eq. (1) ME
LenRatio(m, q) Mention to query length ratio: |m|
|q|
Q
QCT(e, q) True if the query contains the title of the entity Q
TCQ(e, q) True if the title of entity contains the query Q
TEQ(e, q) True if the title of entity is equal query Q
Sim(e, q) Similarity between query and entity; Eq. (1) Q
SimQf (e, q)†
LM similarity between query and field f of entity; Eq. (1) Q
‡
Entity title refers to the rdfs:label predicate of the entity in DBpedia
†
Computed for all individual DBpedia fields f 2 F and also for field content (cf. Sec 4.1) .
3.2 Disambiguation
The disambiguation step is concerned with the formation of entity linking interpreta-
tions {E1, ..., Em}. Similar to the previous step, we examine both unsupervised and
supervised alternatives, by adapting existing methods from the literature. We further
NTEM(m)‡
SMIL(m)‡
MCT(e, m)‡
TCM(e, m)‡
TEM(e, m)‡
SimMf (e, m)†
|q|
Q
SimQf (e, q)†
‡
†
3.2 Disambiguation
NTEM(m)‡
SMIL(m)‡
MCT(e, m)‡
TCM(e, m)‡
TEM(e, m)‡
SimMf (e, m)†
|q|
Q
SimQf (e, q)†
‡
†
3.2 Disambiguation
NTEM(m)‡
SMIL(m)‡
MCT(e, m)‡
TCM(e, m)‡
TEM(e, m)‡
SimMf (e, m)†
|q|
Q
SimQf (e, q)†
‡
†
3.2 Disambiguation
[Meij et al. 2013], [Medelyan et al. 2008]

SUPERVISED [LTR]
28 features mainly from literature
NTEM(m)‡
SMIL(m)‡
MCT(e, m)‡
TCM(e, m)‡
TEM(e, m)‡
SimMf (e, m)†
|q|
Q
SimQf (e, q)†
‡
†
3.2 Disambiguation
NTEM(m)‡
SMIL(m)‡
MCT(e, m)‡
TCM(e, m)‡
TEM(e, m)‡
SimMf (e, m)†
|q|
Q
SimQf (e, q)†
‡
†
3.2 Disambiguation
NTEM(m)‡
SMIL(m)‡
MCT(e, m)‡
TCM(e, m)‡
TEM(e, m)‡
SimMf (e, m)†
|q|
Q
SimQf (e, q)†
‡
†
3.2 Disambiguation
NTEM(m)‡
SMIL(m)‡
MCT(e, m)‡
TCM(e, m)‡
TEM(e, m)‡
SimMf (e, m)†
|q|
Q
SimQf (e, q)†
‡
†
3.2 Disambiguation
[Meij et al. 2012], [Medelyan et al. 2008]
Extracted based on: 
Mention, Entity,
Mention-entity, Query

TEST COLLECTIONS
‣ Y-ERD
• Based on the Yahoo Search Query Log to Entities dataset
• 2398 queries
‣ ERD-dev
• Released as part of the ERD challenge 2014
• 91 queries
[Carmel et al. 2014], [Hasibi et al. 2015]

Table 4. Candidate entity ranking results on the Y-ERD and ERD-dev datasets. Best scores for
each metric are in boldface. Signiﬁcance for line i > 1 is tested against lines 1..i 1.
Method
Y-ERD ERD-dev
MAP R@5 P@1 MAP R@5 P@1
MLM 0.7507 0.8556 0.6839 0.7675 0.8622 0.7333
CMNS 0.7831N
0.8230N
0.7779N
0.7037 0.7222O
0.7556
MLMcg 0.8536NN
0.8997NN
0.8280NN
0.8543MN
0.9015 N
0.8444
LTR 0.8667NNN
0.9022NN
0.8479NNN
0.8606MN
0.9289MN
0.8222
Table 5. End-to-end performance of ELQ systems on the Y-ERD and ERD-dev query sets. Sig-
niﬁcance for line i > 1 is tested against lines 1..i 1.
Method
Y-ERD ERD-dev
Prec Recall F1 Time Prec Recall F1 Time
MLMcg-Greedy 0.709 0.709 0.709 0.058 0.724 0.712 0.713 0.085
MLMcg-LTR 0.725 0.724 0.724 0.893 0.725 0.731 0.728 1.185
LTR-LTR 0.731M
0.732M
0.731M
0.881 0.758 0.748 0.753 1.185
LTR-Greedy 0.786NNN
0.787NNN
0.787NNN
0.382 0.852NNM
0.828NM
0.840NNM
0.423
RESULTS
Both methods (MLMcg and LTR) are able to
find the vast majority of the relevant entities
and return them at the top ranks.

RESULTS
Both methods (MLMcg and LTR) are able to
find the vast majority of the relevant entities
and return them at the top ranks.
Table 4. Candidate entity ranking results on the Y-ERD and ERD-dev datasets. Best scores for
each metric are in boldface. Signiﬁcance for line i > 1 is tested against lines 1..i 1.
Method
Y-ERD ERD-dev
MLM 0.7507 0.8556 0.6839 0.7675 0.8622 0.7333
CMNS 0.7831N
0.8230N
0.7779N
0.7037 0.7222O
0.7556
MLMcg 0.8536NN
0.8997NN
0.8280NN
0.8543MN
0.9015 N
0.8444
LTR 0.8667NNN
0.9022NN
0.8479NNN
0.8606MN
0.9289MN
0.8222
Method
Y-ERD ERD-dev
MLMcg-Greedy 0.709 0.709 0.709 0.058 0.724 0.712 0.713 0.085
MLMcg-LTR 0.725 0.724 0.724 0.893 0.725 0.731 0.728 1.185
LTR-LTR 0.731M
0.732M
0.731M
0.881 0.758 0.748 0.753 1.185
LTR-Greedy 0.786NNN
0.787NNN
0.787NNN
0.382 0.852NNM
0.828NM
0.840NNM
0.423

SYSTEMATIC INVESTIGATION
METHODS:
(1) Unsupervised (Greedy)
METHODS:
(1) Unsupervised (MLMcg)
Candidate Entity
Ranking
Disambiguation
21

Identify interpretation set(s):
Ei ={(m1,e1),...,(mk,ek)}
where: 
mentions of each set are 
non-overlapping
DISAMBIGUATION
Goal
interpretation sets

UNSUPERVISED - [Greedy]
Algorithm 1 Greedy Interpretation Finding (GIF)
Input: Ranked list of mention-entity pairs M; score threshold s
Output: Interpretations I = {E1, ..., Em}
begin
1: M0
Prune(M, s)
2: M0
PruneContainmentMentions(M0
)
3: I CreateInterpretations(M0
)
4: return I
end
1: function CREATEINTERPRETATIONS(M)
2: I {;}
3: for (m, e) in M do
4: h 0
5: for E in I do
6: if ¬ hasOverlap(E, (m, e)) then
7: E.add((m, e))
8: h 1
9: end if
10: end for
11: if h == 0 then
12: I.add({(m, e)})
13: end if
14: end for
15: return I
16: end function
(I) Pruning of mention-entity pairs
Discards the ones with score < threshold τs 
For containment mentions, keep only the highest
scoring one
(II) Set generation
Adding mention-entity pairs in decreasing order
of score to an interpretation set

SUPERVISED
(I) Generate all interpretations (from the top-K entity-mention pairs)
(II) Collectively select the interpretations with a binary classifier
Collective disambiguation
new york
manhattan
0.8
NEW YORK CITY
NEW YORK
MANHATTAN
MANHATTAN (FILM)
0.7
0.2
m e
0.8
(new york, NEW YORK CITY),
interpretation sets
…
SYRACUSE, NEW YORK
ALBANY,_NEW_YORK
new york pizza NEW YORK-STYLE PIZZA 0.9
K 0.5
0.2
(new york, SYRACUSE, NEW YORK),
(new york pizza, NEW YORK-STYLE PIZZA)
…
✓
✗
✗
✓
✗
new york
manhattan
new york
new york
…

SUPERVISED
6 Faegheh Hasibi, Krisztian Balog, and Svein Erik Bratsberg
Table 3. Feature set used in the supervised disambiguation approach. Type is either query depen-
dent (QD) or query independent (QI).
Set-based Features Type
CommonLinks(E) Number of common links in DBpedia: T
e2E out(e). QI
TotalLinks(E) Number of distinct links in DBpedia: S
e2E out(e) QI
JKB(E) Jaccard similarity based on DBpedia: CommonLinks(E)
T otalLink(E)
QI
Jcorpora(E)‡
Jaccard similarity based on FACC:
|
T
e2E doc(e)|
|
S
e2E doc(e)|
QI
RelMW (E)‡
Relatedness similarity [25] according to FACC QI
P(E) Co-occurrence probability based on FACC:
|
T
e2E doc(e)|
T otalDocs
QI
H(E) Entropy of E: P (E)log(P (E)) (1 P (E))log(1 P (E)) QI
Completeness(E)†
Completeness of set E as a graph: |edges(GE )|
|edges(K|E|)|
QI
LenRatioSet(E, q)§
Ratio of mentions length to the query length:
P
e2E |me|
|q|
QD
SetSim(E, q) Similarity between query and the entities in the set; Eq (2) QD
Entity-based Features
Links(e) Number of entity out-links in DBpedia QI
Commonness(e, m) Likelihood of entity e being the target link of mention m QD
Score(e, q) Entity ranking score, obtained from the CER step QD
iRank(e, q) Inverse of rank, obtained from the CER step: 1
rank(e,q)
QD
Sim(e, q) Similarity between query and the entity; Eq. (1) QD
ContextSim(e, q) Contextual similarity between query and entity; Eq (3) QD
‡
doc(e) represents all documents that have a link to entity e
†
GE is a DBpedia subgraph containing only entities from E; and K|E| is a complete graph of |E| vertices
§
me denotes the mention that corresponds to entity e
mention-entity pairs (obtained from the CER step) and generate all possible interpreta-
tions out of those. We further require that mentions within the same interpretation do
e2E out(e). QI
e2E out(e) QI
T otalLink(E)
QI
Jcorpora(E)‡
|
T
e2E doc(e)|
|
S
e2E doc(e)|
QI
RelMW (E)‡
|
T
e2E doc(e)|
T otalDocs
QI
Completeness(E)†
|edges(K|E|)|
QI
LenRatioSet(E, q)§
P
e2E |me|
|q|
QD
rank(e,q)
QD
‡
†
§
Entity-based features
computed for the
entire interpretation set
individual entity
features, aggregated
for all entities in the set
Set-based features

Method
Y-ERD ERD-dev
MLM 0.7507 0.8556 0.6839 0.7675 0.8622 0.7333
CMNS 0.7831N
0.8230N
0.7779N
0.7037 0.7222O
0.7556
MLMcg 0.8536NN
0.8997NN
0.8280NN
0.8543MN
0.9015 N
0.8444
LTR 0.8667NNN
0.9022NN
0.8479NNN
0.8606MN
0.9289MN
0.8222
Method
Y-ERD ERD-dev
MLMcg-Greedy 0.709 0.709 0.709 0.058 0.724 0.712 0.713 0.085
MLMcg-LTR 0.725 0.724 0.724 0.893 0.725 0.731 0.728 1.185
LTR-LTR 0.731M
0.732M
0.731M
0.881 0.758 0.748 0.753 1.185
LTR-Greedy 0.786NNN
0.787NNN
0.787NNN
0.382 0.852NNM
0.828NM
0.840NNM
0.423
Candidate entity ranking Table 4 presents the results for CER on the Y-ERD and
ERD-dev datasets. We find that commonness is a strong performer (this is in line with
the findings of [1, 16]). Combining commonness with MLM in a generative model
(MLMcg) delivers excellent performance, with MAP above 0.85 and R@5 around 0.9.
The LTR approach can bring in further slight, but for Y-ERD significant, improvements.
This means that both of our CER methods (MLMcg and LTR) are able to find the vast
majority of the relevant entities and return them at the top ranks.
Disambiguation Table 5 reports on the disambiguation results. We use the naming
RESULTS
LTR-Greedy outperforming other approaches
significantly on both test sets, and is the second
most efficient one.

RESULTS
Comparison with the top performers of the ERD challenge
(on official ERD test platform)
LTR-Greedy approach performs on a par
with the state-of-the-art systems.
Remarkable finding considering the
complexity of other solutions.
bi, Krisztian Balog, and Svein Erik Bratsberg
Table 6. ELQ results on the of-
ficial ERD test platform.
Method F1
LTR-Greedy 0.699
SMAPH-2 [6] 0.708
NTUNLP [5] 0.680
Seznam [10] 0.669
ults, LTR-Greedy is our overall rec-
ompare this method against the top
RD challenge (using the official chal-
Table 6. For this comparison, we
spell checking, as this has also been
erforming system (SMAPH-2) [6].
hat our LTR-Greedy approach per-
h the state-of-the-art systems. This
g into account the simplicity of the
ion algorithm vs. the considerably
ons employed by others.
r results reveal that candidate entity ranking is of higher importance
for ELQ. Hence, it is more beneficial to perform the (expensive)
early on in the pipeline for the seemingly easier CER step; dis-
n be tackled successfully with an unsupervised (greedy) algorithm.
he top ranked entity does not yield an immediate solution; as shown
ion is an indispensable step in ELQ.)
top-3
systems

3 0.06 0.09 0.12
SetSim
LenRatioSet
P
ContextSim/avg
H
ContextSim/min
ContextSim/max
Commonness/avg
iRank/max
Commonness/min
iRank/min
iRank/avg
Score/max
Score/min
Score/avg
0.00 0.04 0.08 0.12 0.16 0.20
LTR-LTR
MLM-LTR
SimQ-label
SimQ-subject
SimQ-asbtract
Len
Links
Redirects
LenRatio
TEM
SimQ-wikiLink
Sim
QCT
SimQ-content
MCT
Commonness
Matches
0.00 0.03
t features used in the supervised approaches, sorted by Gini score: (Left)
ng, (Right) Disambiguation.
ore components: candidate entity ranking and disambiguation. For
FEATURE IMPORTANCE
LTR-LTR 
MLMcg-LTR

WE ASKED …
If we are to allocate the
available processing time
between the two steps,
which one would yield the
highest gain?
RQ1?
Candidate entity ranking is of higher
importance than disambiguation.  
It is more beneﬁcial to perform the
(expensive) supervised learning early
on, and tackle disambiguation with an
unsupervised algorithm.
Answer

WE ASKED …
Which group of features is
needed the most for
effective entity
disambiguation: contextual
similarity, interdependence
between entities, or both?
RQ2?
Contextual similarity features are the
most effective for entity disambiguation. 
 
Entity interdependences are helpful
when sufﬁciently many entities are
mentioned in the text; not the case for
queries.
Answer

Code and resources at:
http://bit.ly/ecir2017-elq
Thank you!
Questions?

Entity Linking in Queries: Efficiency vs. Effectiveness

More Related Content

What's hot (20)

Similar to Entity Linking in Queries: Efficiency vs. Effectiveness (20)

Recently uploaded (20)

Entity Linking in Queries: Efficiency vs. Effectiveness