ENTITY LINKING IN QUERIES:
Efficiency vs. Effectiveness
Faegheh Hasibi, Krisztian Balog, Svein E. Bratsberg
ECIR 2017
IAI_group
ENTITY LINKING
‣ Identifying entities in the text and linking them to the corresponding entry in the knowledge base
"Total Recall is a 1990 American science fiction action film directed by Paul Verhoeven, starring Rachel Ticotin, Sharon Stone, Arnold Schwarzenegger, Ronny Cox and Michael Ironside. The film is loosely based on the Philip K. Dick story…"
Linked entities: TOTAL RECALL (1990 FILM), ARNOLD SCHWARZENEGGER, PHILIP K. DICK
ENTITY LINKING IN QUERIES (ELQ)
Query: total recall arnold schwarzenegger → TOTAL RECALL (1990 FILM), ARNOLD SCHWARZENEGGER
Query: total recall → TOTAL RECALL (1990 FILM) or TOTAL RECALL (2012 FILM)
ENTITY LINKING IN QUERIES (ELQ)
‣ Identifying sets of entity linking interpretations (query → entity linking interpretation)
[Carmel et al. 2014], [Hasibi et al. 2015], [Cornolti et al. 2016]
CONVENTIONAL APPROACH
(1) Mention detection → (2) Candidate entity ranking → (3) Disambiguation
[Pipeline example for a query such as "new york pizza manhattan": detected mentions are matched against candidate entities, e.g. new york → NEW YORK, NEW YORK CITY, …; new york pizza → NEW YORK-STYLE PIZZA, SICILIAN_PIZZA, …; each pair receives a score s(m, e) (e.g. 0.9, 0.8, 0.7, 0.1). The scored pairs are then grouped into interpretation sets:
1: (new york, NEW YORK), (manhattan, MANHATTAN)
2: (new york pizza, NEW YORK-STYLE PIZZA), (manhattan, MANHATTAN)
…]
CONVENTIONAL APPROACH
Incorporating different signals:
‣ Contextual similarity between a candidate entity and the text (or the entity mention)
‣ Interdependence between all entity linking decisions (extracted from the underlying KB)
[Diagram: interdependence graph over entities e1…e5]
What is special about entity linking in queries? What is really different when it comes to queries?
RESEARCH QUESTIONS
ELQ is an online process and should be done under strict time constraints.
RQ1: If we are to allocate the available processing time between the two steps, which one would yield the highest gain?
The context provided by queries is limited.
RQ2: Which group of features is needed the most for effective entity disambiguation: contextual similarity, interdependence between entities, or both?
SYSTEMATIC INVESTIGATION
(1) Candidate Entity Ranking — METHODS: (1) Unsupervised (MLMcg), (2) Supervised (LTR)
(2) Disambiguation — METHODS: (1) Unsupervised (Greedy), (2) Supervised (LTR)
CANDIDATE ENTITY RANKING
Goal:
(I) Identify all possible entities that can be linked in the query
(II) Rank them based on how likely they are link targets
Given: mention m, entity e, query q — estimate score(m, e, q)
‣ Lexical matching of query n-grams against a rich dictionary of entity name variants
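To make the n-gram matching concrete, a minimal sketch follows; the surface_forms dictionary (surface form → candidate entities) and the candidate_pairs name are hypothetical stand-ins for the paper's rich dictionary of entity name variants.

```python
def candidate_pairs(query, surface_forms, max_n=5):
    """Match query n-grams against a dictionary of entity name variants.
    surface_forms: dict mapping a surface form to a set of candidate
    entities (hypothetical stand-in for the name-variant dictionary)."""
    terms = query.lower().split()
    pairs = []
    for n in range(1, min(max_n, len(terms)) + 1):   # n-gram length
        for i in range(len(terms) - n + 1):
            mention = " ".join(terms[i:i + n])
            for entity in surface_forms.get(mention, ()):
                pairs.append((mention, entity))      # candidate (m, e) pair
    return pairs
```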
UNSUPERVISED [MLMcg]
Generative model for ranking entities based on the query and the specific mention [Hasibi et al. 2015].
Using lexical matching of query n-grams against a rich dictionary of entity name variants allows for the identification of candidate entities with close to perfect recall [16]. We follow this approach to obtain a list of candidate entities together with their corresponding mentions in the query; our focus of attention is on ranking these candidate (m, e) pairs with respect to the query, i.e., estimating score(m, e, q).
The model considers both the likelihood of the given mention and the similarity between the query and the entity: score(m, e, q) = P(e|m) P(q|e), where P(e|m) is the probability of the mention being linked to the entity (a.k.a. commonness [22]), computed from the FACC collection [12]. The query likelihood P(q|e) is estimated using the query length normalized language model similarity [20]:

P(q|e) = \frac{\prod_{t \in q} P(t|\theta_e)^{P(t|q)}}{\prod_{t \in q} P(t|C)^{P(t|q)}},

where P(t|q) is the term's relative frequency in the query (i.e., n(t, q)/|q|). The entity and collection language models, P(t|\theta_e) and P(t|C), are computed using the Mixture of Language Models (MLM) approach [27].
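A minimal sketch of the MLMcg scorer following the equations above; the commonness table and the two language-model callables are hypothetical stand-ins for the FACC-derived P(e|m) and the MLM-smoothed P(t|θ_e) and P(t|C).

```python
import math
from collections import Counter

def mlmcg_score(mention, entity, query_terms, commonness, p_t_entity, p_t_coll):
    """score(m, e, q) = P(e|m) * P(q|e), with the query-length-normalized
    query likelihood computed in log space for numerical stability.
    commonness: dict (mention, entity) -> P(e|m)        (hypothetical)
    p_t_entity: callable (term, entity) -> P(t|theta_e) (hypothetical)
    p_t_coll:   callable term -> P(t|C)                 (hypothetical)
    Assumes smoothed, nonzero probabilities (as MLM smoothing provides)."""
    tf = Counter(query_terms)
    log_pq_e = 0.0
    for t, n in tf.items():
        p_tq = n / len(query_terms)  # P(t|q): relative term frequency
        log_pq_e += p_tq * (math.log(p_t_entity(t, entity)) - math.log(p_t_coll(t)))
    return commonness.get((mention, entity), 0.0) * math.exp(log_pq_e)
```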
SUPERVISED [LTR]
28 features, mainly from the literature, extracted based on: mention, entity, mention-entity, query
[Meij et al. 2012], [Medelyan et al. 2008]

Table 2. Feature set used for ranking entities, categorized into mention (M), entity (E), mention-entity (ME), and query (Q) features.

Feature          | Description                                                          | Type
Len(m)           | Number of terms in the entity mention                                | M
NTEM(m)‡         | Number of entities whose title equals the mention                    | M
SMIL(m)‡         | Number of entities whose title equals part of the mention            | M
Matches(m)       | Number of entities whose surface form matches the mention            | M
Redirects(e)     | Number of redirect pages linking to the entity                       | E
Links(e)         | Number of entity out-links in DBpedia                                | E
Commonness(e, m) | Likelihood of entity e being the target link of mention m            | ME
MCT(e, m)‡       | True if the mention contains the title of the entity                 | ME
TCM(e, m)‡       | True if the title of the entity contains the mention                 | ME
TEM(e, m)‡       | True if the title of the entity equals the mention                   | ME
Pos1(e, m)       | Position of the 1st occurrence of the mention in the entity abstract | ME
SimM_f(e, m)†    | Similarity between mention and field f of entity; Eq. (1)            | ME
LenRatio(m, q)   | Mention to query length ratio: |m| / |q|                             | Q
QCT(e, q)        | True if the query contains the title of the entity                   | Q
TCQ(e, q)        | True if the title of the entity contains the query                   | Q
TEQ(e, q)        | True if the title of the entity equals the query                     | Q
Sim(e, q)        | Similarity between query and entity; Eq. (1)                         | Q
SimQ_f(e, q)†    | LM similarity between query and field f of entity; Eq. (1)           | Q

‡ Entity title refers to the rdfs:label predicate of the entity in DBpedia.
† Computed for all individual DBpedia fields f ∈ F and also for the field "content" (cf. Sec 4.1).
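A few of the string-based Table 2 features are sketched below; the title argument stands for the entity's rdfs:label, and the remaining features (commonness, redirects, field similarities) would need DBpedia/FACC statistics not shown here. The function name is a hypothetical illustration.

```python
def string_features(mention, query, title):
    """String-based features from Table 2 (sketch). Inputs are lowercased;
    lengths are measured in terms, matching |m| and |q| in the table."""
    m, q, t = mention.lower(), query.lower(), title.lower()
    return {
        "Len(m)": len(m.split()),                          # terms in the mention
        "LenRatio(m,q)": len(m.split()) / len(q.split()),  # |m| / |q|
        "MCT(e,m)": t in m,   # mention contains entity title
        "TCM(e,m)": m in t,   # entity title contains mention
        "TEM(e,m)": m == t,   # entity title equals mention
        "QCT(e,q)": t in q,   # query contains entity title
        "TCQ(e,q)": q in t,   # entity title contains query
        "TEQ(e,q)": q == t,   # entity title equals query
    }
```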
TEST COLLECTIONS
‣ Y-ERD
• Based on the Yahoo Search Query Log to Entities dataset
• 2398 queries
‣ ERD-dev
• Released as part of the ERD challenge 2014
• 91 queries
[Carmel et al. 2014], [Hasibi et al. 2015]
Table 4. Candidate entity ranking results on the Y-ERD and ERD-dev datasets. Best scores for each metric are in boldface in the paper. Significance for line i > 1 is tested against lines 1..i−1 (▲/▽: significant increase/decrease; △: not significant).

Method | Y-ERD MAP | Y-ERD R@5 | Y-ERD P@1 | ERD-dev MAP | ERD-dev R@5 | ERD-dev P@1
MLM    | 0.7507    | 0.8556    | 0.6839    | 0.7675      | 0.8622      | 0.7333
CMNS   | 0.7831▲   | 0.8230▲   | 0.7779▲   | 0.7037      | 0.7222▽     | 0.7556
MLMcg  | 0.8536▲▲  | 0.8997▲▲  | 0.8280▲▲  | 0.8543△▲    | 0.9015▲     | 0.8444
LTR    | 0.8667▲▲▲ | 0.9022▲▲  | 0.8479▲▲▲ | 0.8606△▲    | 0.9289△▲    | 0.8222

Table 5. End-to-end performance of ELQ systems on the Y-ERD and ERD-dev query sets. Significance for line i > 1 is tested against lines 1..i−1.

Method       | Y-ERD Prec | Recall  | F1      | Time  | ERD-dev Prec | Recall | F1      | Time
MLMcg-Greedy | 0.709      | 0.709   | 0.709   | 0.058 | 0.724        | 0.712  | 0.713   | 0.085
MLMcg-LTR    | 0.725      | 0.724   | 0.724   | 0.893 | 0.725        | 0.731  | 0.728   | 1.185
LTR-LTR      | 0.731△     | 0.732△  | 0.731△  | 0.881 | 0.758        | 0.748  | 0.753   | 1.185
LTR-Greedy   | 0.786▲▲▲   | 0.787▲▲▲| 0.787▲▲▲| 0.382 | 0.852▲▲△     | 0.828▲△| 0.840▲▲△| 0.423
RESULTS
Both methods (MLMcg and LTR) are able to
find the vast majority of the relevant entities
and return them at the top ranks.
SYSTEMATIC INVESTIGATION
(1) Candidate Entity Ranking — METHODS: (1) Unsupervised (MLMcg), (2) Supervised (LTR)
(2) Disambiguation — METHODS: (1) Unsupervised (Greedy), (2) Supervised (LTR)
DISAMBIGUATION
Goal: identify interpretation set(s) E_i = {(m1, e1), ..., (mk, ek)}, where the mentions of each set are non-overlapping.
Example interpretation sets:
1: (new york, NEW YORK), (manhattan, MANHATTAN)
2: (new york pizza, NEW YORK-STYLE PIZZA), (manhattan, MANHATTAN)
UNSUPERVISED [Greedy]
Algorithm 1 Greedy Interpretation Finding (GIF)
Input: Ranked list of mention-entity pairs M; score threshold τs
Output: Interpretations I = {E1, ..., Em}
begin
1: M′ ← Prune(M, τs)
2: M′ ← PruneContainmentMentions(M′)
3: I ← CreateInterpretations(M′)
4: return I
end

1: function CREATEINTERPRETATIONS(M)
2:   I ← {∅}
3:   for (m, e) in M do
4:     h ← 0
5:     for E in I do
6:       if ¬hasOverlap(E, (m, e)) then
7:         E.add((m, e))
8:         h ← 1
9:       end if
10:    end for
11:    if h == 0 then
12:      I.add({(m, e)})
13:    end if
14:  end for
15:  return I
16: end function

(I) Pruning of mention-entity pairs
‣ Discard the pairs with score < threshold τs
‣ For containment mentions, keep only the highest scoring one
(II) Set generation (a Python sketch follows the algorithm)
‣ Add mention-entity pairs in decreasing order of score to an interpretation set
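A sketch of the CreateInterpretations function from Algorithm 1; pruning is taken as already done, and the overlap test is an assumption (two mentions overlap if they share a query term).

```python
def create_interpretations(pairs):
    """CreateInterpretations from Algorithm 1 (sketch).
    pairs: pruned (mention, entity) tuples in decreasing order of score.
    Assumption: two mentions overlap if they share a query term."""
    def has_overlap(E, mention):
        return any(set(mention.split()) & set(m.split()) for m, _ in E)

    interpretations = [set()]          # I <- { empty set }
    for m, e in pairs:
        placed = False
        for E in interpretations:      # add the pair to every compatible set
            if not has_overlap(E, m):
                E.add((m, e))
                placed = True
        if not placed:                 # pair fits nowhere: start a new set
            interpretations.append({(m, e)})
    return interpretations

# e.g. [("new york pizza", "NEW YORK-STYLE PIZZA"), ("new york", "NEW YORK"),
#       ("manhattan", "MANHATTAN")] yields the two interpretation sets above.
```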
SUPERVISED
Collective disambiguation:
(I) Generate all interpretations (from the top-K mention-entity pairs)
(II) Collectively select the interpretations with a binary classifier
[Example: top-K ranked mention-entity pairs, e.g. (new york pizza, NEW YORK-STYLE PIZZA), (new york, NEW YORK CITY), (new york, NEW YORK), (new york, SYRACUSE, NEW YORK), (new york, ALBANY,_NEW_YORK), (manhattan, MANHATTAN), (manhattan, MANHATTAN (FILM)), with scores from the CER step, are combined into candidate interpretation sets such as
{(new york, NEW YORK), (manhattan, MANHATTAN)}
{(new york, NEW YORK CITY), (manhattan, MANHATTAN)}
{(new york, SYRACUSE, NEW YORK), (manhattan, MANHATTAN)}
{(new york pizza, NEW YORK-STYLE PIZZA), (manhattan, MANHATTAN)}
{(new york pizza, NEW YORK-STYLE PIZZA)}
each of which is accepted (✓) or rejected (✗) by the classifier.]
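The generation step can be sketched as enumerating every non-empty subset of the top-K pairs whose mentions are pairwise non-overlapping (again assuming token overlap); a binary classifier over the Table 3 features would then accept or reject each candidate set.

```python
from itertools import combinations

def all_interpretations(topk_pairs):
    """Enumerate candidate interpretations: all non-empty subsets of the
    top-K (mention, entity) pairs with pairwise non-overlapping mentions."""
    def valid(subset):
        mentions = [set(m.split()) for m, _ in subset]
        return all(not (a & b) for a, b in combinations(mentions, 2))

    candidates = []
    for r in range(1, len(topk_pairs) + 1):
        candidates.extend(set(c) for c in combinations(topk_pairs, r) if valid(c))
    return candidates
```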
SUPERVISED
Table 3. Feature set used in the supervised disambiguation approach. Type is either query dependent (QD) or query independent (QI).

Set-based features (computed for the entire interpretation set):
CommonLinks(E)     | Number of common links in DBpedia: |∩_{e∈E} out(e)|                     | QI
TotalLinks(E)      | Number of distinct links in DBpedia: |∪_{e∈E} out(e)|                   | QI
J_KB(E)            | Jaccard similarity based on DBpedia: CommonLinks(E) / TotalLinks(E)     | QI
J_corpora(E)‡      | Jaccard similarity based on FACC: |∩_{e∈E} doc(e)| / |∪_{e∈E} doc(e)|   | QI
Rel_MW(E)‡         | Relatedness similarity [25] according to FACC                           | QI
P(E)               | Co-occurrence probability based on FACC: |∩_{e∈E} doc(e)| / TotalDocs   | QI
H(E)               | Entropy of E: −P(E) log(P(E)) − (1 − P(E)) log(1 − P(E))                | QI
Completeness(E)†   | Completeness of set E as a graph: |edges(G_E)| / |edges(K_|E|)|         | QI
LenRatioSet(E, q)§ | Ratio of mention lengths to the query length: (Σ_{e∈E} |m_e|) / |q|     | QD
SetSim(E, q)       | Similarity between query and the entities in the set; Eq. (2)           | QD

Entity-based features (individual entity features, aggregated over all entities in the set):
Links(e)           | Number of entity out-links in DBpedia                                   | QI
Commonness(e, m)   | Likelihood of entity e being the target link of mention m               | QD
Score(e, q)        | Entity ranking score, obtained from the CER step                        | QD
iRank(e, q)        | Inverse of rank, obtained from the CER step: 1 / rank(e, q)             | QD
Sim(e, q)          | Similarity between query and the entity; Eq. (1)                        | QD
ContextSim(e, q)   | Contextual similarity between query and entity; Eq. (3)                 | QD

‡ doc(e) represents all documents that have a link to entity e.
† G_E is a DBpedia subgraph containing only entities from E; K_|E| is a complete graph of |E| vertices.
§ m_e denotes the mention that corresponds to entity e.
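A sketch of the DBpedia link-based set features from Table 3; the outlinks map (entity → set of out-links) is a hypothetical precomputed structure, and the function assumes a non-empty entity set.

```python
def link_set_features(entities, outlinks):
    """CommonLinks, TotalLinks and J_KB from Table 3 (sketch).
    outlinks: dict entity -> set of DBpedia out-links (hypothetical).
    Assumes entities is non-empty."""
    link_sets = [outlinks[e] for e in entities]
    common = set.intersection(*link_sets)
    total = set.union(*link_sets)
    return {
        "CommonLinks(E)": len(common),
        "TotalLinks(E)": len(total),
        "J_KB(E)": len(common) / len(total) if total else 0.0,  # Jaccard
    }
```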
Candidate entity ranking: Table 4 presents the results for CER on the Y-ERD and ERD-dev datasets. We find that commonness is a strong performer (this is in line with the findings of [1, 16]). Combining commonness with MLM in a generative model (MLMcg) delivers excellent performance, with MAP above 0.85 and R@5 around 0.9. The LTR approach brings further slight, but for Y-ERD significant, improvements. This means that both of our CER methods (MLMcg and LTR) are able to find the vast majority of the relevant entities and return them at the top ranks.
Disambiguation: Table 5 reports on the disambiguation results, using the naming convention "CER method–disambiguation method" (e.g., LTR-Greedy).
RESULTS
LTR-Greedy significantly outperforms the other approaches on both test sets and is the second most efficient one.
RESULTS
Comparison with the top performers of the ERD challenge (on the official ERD test platform):
The LTR-Greedy approach performs on a par with the state-of-the-art systems — a remarkable finding considering the complexity of the other solutions.

Table 6. ELQ results on the official ERD test platform (LTR-Greedy vs. the top-3 challenge systems).
Method      | F1
LTR-Greedy  | 0.699
SMAPH-2 [6] | 0.708
NTUNLP [5]  | 0.680
Seznam [10] | 0.669

Based on these results, LTR-Greedy is our overall recommendation. We compare this method against the top performers of the ERD challenge (using the official challenge platform); see Table 6. For this comparison, we also employed spell checking, as this has also been used by the top performing system (SMAPH-2) [6]. We find that our LTR-Greedy approach performs on a par with the state-of-the-art systems, which is remarkable taking into account the simplicity of our disambiguation algorithm vs. the considerably more complex solutions employed by others.
Our results reveal that candidate entity ranking is of higher importance than disambiguation for ELQ. Hence, it is more beneficial to perform the (expensive) supervised learning early on in the pipeline, for the seemingly easier CER step; disambiguation can then be tackled successfully with an unsupervised (greedy) algorithm. (Taking the top ranked entity does not yield an immediate solution; as shown, disambiguation is an indispensable step in ELQ.)
FEATURE IMPORTANCE
[Figure: features used in the supervised approaches, sorted by Gini score. (Left) Candidate entity ranking: SimQ-label, SimQ-subject, SimQ-abstract, Len, Links, Redirects, LenRatio, TEM, SimQ-wikiLink, Sim, QCT, SimQ-content, MCT, Commonness, Matches. (Right) Disambiguation, for LTR-LTR and MLMcg-LTR: SetSim, LenRatioSet, P, ContextSim/avg, H, ContextSim/min, ContextSim/max, Commonness/avg, iRank/max, Commonness/min, iRank/min, iRank/avg, Score/max, Score/min, Score/avg.]
WE ASKED …
RQ1: If we are to allocate the available processing time between the two steps, which one would yield the highest gain?
Answer: Candidate entity ranking is of higher importance than disambiguation. It is more beneficial to perform the (expensive) supervised learning early on, and tackle disambiguation with an unsupervised algorithm.
WE ASKED …
RQ2: Which group of features is needed the most for effective entity disambiguation: contextual similarity, interdependence between entities, or both?
Answer: Contextual similarity features are the most effective for entity disambiguation. Entity interdependencies are helpful when sufficiently many entities are mentioned in the text; this is not the case for queries.
Efficiency vs. Effectiveness
Code and resources at: http://bit.ly/ecir2017-elq
Thank you!
Questions?

More Related Content

PDF
Exploiting Entity Linking in Queries For Entity Retrieval
PPTX
Entity Linking in Queries: Tasks and Evaluation
PDF
Entity Linking
PDF
Entity Search: The Last Decade and the Next
PDF
Evaluation Initiatives for Entity-oriented Search
PDF
Entity Retrieval (SIGIR 2013 tutorial)
PPTX
Gleaning Types for Literals in RDF with Application to Entity Summarization
PDF
Entity Retrieval (WWW 2013 tutorial)
Exploiting Entity Linking in Queries For Entity Retrieval
Entity Linking in Queries: Tasks and Evaluation
Entity Linking
Entity Search: The Last Decade and the Next
Evaluation Initiatives for Entity-oriented Search
Entity Retrieval (SIGIR 2013 tutorial)
Gleaning Types for Literals in RDF with Application to Entity Summarization
Entity Retrieval (WWW 2013 tutorial)

What's hot (20)

PDF
Entity Retrieval (tutorial organized by Radialpoint in Montreal)
PDF
Table Retrieval and Generation
PDF
Representing financial reports on the semantic web a faithful translation f...
PPTX
Expressive Query Answering For Semantic Wikis (20min)
ODP
Information Extraction from the Web - Algorithms and Tools
PDF
What's next in Julia
PDF
SQL For PHP Programmers
PPTX
Expressive Query Answering For Semantic Wikis
PDF
p138-jiang
PDF
Deep Dependency Graph Conversion in English
PPT
A Distributed Tableau Algorithm for Package-based Description Logics
PPT
Everything you wanted to know about Dublin Core metadata
PDF
A Computational Approach to Yoruba Morphology
PPT
Artificial intelligence Prolog Language
PPTX
Normalization 1 nf,2nf,3nf,bcnf
PDF
Database management system session 5
PDF
Neural Nets Deconstructed
PDF
Topic Modeling - NLP
PDF
Ballerina philosophy
Entity Retrieval (tutorial organized by Radialpoint in Montreal)
Table Retrieval and Generation
Representing financial reports on the semantic web a faithful translation f...
Expressive Query Answering For Semantic Wikis (20min)
Information Extraction from the Web - Algorithms and Tools
What's next in Julia
SQL For PHP Programmers
Expressive Query Answering For Semantic Wikis
p138-jiang
Deep Dependency Graph Conversion in English
A Distributed Tableau Algorithm for Package-based Description Logics
Everything you wanted to know about Dublin Core metadata
A Computational Approach to Yoruba Morphology
Artificial intelligence Prolog Language
Normalization 1 nf,2nf,3nf,bcnf
Database management system session 5
Neural Nets Deconstructed
Topic Modeling - NLP
Ballerina philosophy
Ad

Similar to Entity Linking in Queries: Efficiency vs. Effectiveness (20)

PPTX
Understanding Queries through Entities
PDF
Entity Retrieval (WSDM 2014 tutorial)
PDF
Dynamic Factual Summaries for Entity Cards
PDF
PDF
Improving Entity Retrieval on Structured Data
PPTX
Exploiting web search engines to search structured
PDF
Parameterized Fielded Term Dependence Models for Ad-hoc Entity Retrieval from...
PPTX
NLP & DBpedia
ODP
DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...
PDF
Semantic Search and Result Presentation with Entity Cards
PDF
A scalable gibbs sampler for probabilistic entity linking
DOCX
Entity linking with a knowledge base issues,
PDF
FINDING OUT NOISY PATTERNS FOR RELATION EXTRACTION OF BANGLA SENTENCES
PDF
ON THE RELEVANCE OF QUERY EXPANSION USING PARALLEL CORPORA AND WORD EMBEDDING...
PDF
A Semantic Search Approach to Task-Completion Engines
PDF
Perspectives on mining knowledge graphs from text
PDF
Web-scale semantic search
PDF
Line,,NATIONAL SEMINAR ORGANIZED BY KULISAA 15.01.2015
PDF
Combining Similarities and Regression for Entity Linking.
Understanding Queries through Entities
Entity Retrieval (WSDM 2014 tutorial)
Dynamic Factual Summaries for Entity Cards
Improving Entity Retrieval on Structured Data
Exploiting web search engines to search structured
Parameterized Fielded Term Dependence Models for Ad-hoc Entity Retrieval from...
NLP & DBpedia
DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...
Semantic Search and Result Presentation with Entity Cards
A scalable gibbs sampler for probabilistic entity linking
Entity linking with a knowledge base issues,
FINDING OUT NOISY PATTERNS FOR RELATION EXTRACTION OF BANGLA SENTENCES
ON THE RELEVANCE OF QUERY EXPANSION USING PARALLEL CORPORA AND WORD EMBEDDING...
A Semantic Search Approach to Task-Completion Engines
Perspectives on mining knowledge graphs from text
Web-scale semantic search
Line,,NATIONAL SEMINAR ORGANIZED BY KULISAA 15.01.2015
Combining Similarities and Regression for Entity Linking.
Ad

Recently uploaded (20)

PPTX
Introduction to Immunology (Unit-1).pptx
PPTX
congenital heart diseases of burao university.pptx
PDF
7.Physics_8_WBS_Electricity.pdfXFGXFDHFHG
PPTX
SCIENCE 4 Q2W5 PPT.pptx Lesson About Plnts and animals and their habitat
PPTX
Presentation1 INTRODUCTION TO ENZYMES.pptx
PPTX
Platelet disorders - thrombocytopenia.pptx
PPTX
Preformulation.pptx Preformulation studies-Including all parameter
PPTX
Substance Disorders- part different drugs change body
PPTX
Introcution to Microbes Burton's Biology for the Health
PPT
LEC Synthetic Biology and its application.ppt
PDF
CuO Nps photocatalysts 15156456551564161
PPTX
gene cloning powerpoint for general biology 2
PDF
5.Physics 8-WBS_Light.pdfFHDGJDJHFGHJHFTY
PPTX
Cells and Organs of the Immune System (Unit-2) - Majesh Sir.pptx
PPTX
Understanding the Circulatory System……..
PPT
Enhancing Laboratory Quality Through ISO 15189 Compliance
PPTX
2currentelectricity1-201006102815 (1).pptx
PDF
Science Form five needed shit SCIENEce so
PDF
Communicating Health Policies to Diverse Populations (www.kiu.ac.ug)
PDF
Unit 5 Preparations, Reactions, Properties and Isomersim of Organic Compounds...
Introduction to Immunology (Unit-1).pptx
congenital heart diseases of burao university.pptx
7.Physics_8_WBS_Electricity.pdfXFGXFDHFHG
SCIENCE 4 Q2W5 PPT.pptx Lesson About Plnts and animals and their habitat
Presentation1 INTRODUCTION TO ENZYMES.pptx
Platelet disorders - thrombocytopenia.pptx
Preformulation.pptx Preformulation studies-Including all parameter
Substance Disorders- part different drugs change body
Introcution to Microbes Burton's Biology for the Health
LEC Synthetic Biology and its application.ppt
CuO Nps photocatalysts 15156456551564161
gene cloning powerpoint for general biology 2
5.Physics 8-WBS_Light.pdfFHDGJDJHFGHJHFTY
Cells and Organs of the Immune System (Unit-2) - Majesh Sir.pptx
Understanding the Circulatory System……..
Enhancing Laboratory Quality Through ISO 15189 Compliance
2currentelectricity1-201006102815 (1).pptx
Science Form five needed shit SCIENEce so
Communicating Health Policies to Diverse Populations (www.kiu.ac.ug)
Unit 5 Preparations, Reactions, Properties and Isomersim of Organic Compounds...

Entity Linking in Queries: Efficiency vs. Effectiveness

  • 1. Faegheh Hasibi, Krisztian Balog, Svein E. Bratsberg ECIR 2017 ENTITY LINKING IN QUERIES: Efficiency vs. Effectiveness IAI_group
  • 2. ENTITY LINKING TOTAL RECALL (1990 FILM) ARNOLD SCHWARZENEGGER PHILIP K. DICK ‣ Identifying entities in the text and linking them to the corresponding entry in the knowledge base Total Recall is a 1990 American science fiction action film directed by Paul Verhoeven, starring Rachel Ticotin, Sharon Stone, Arnold Schwarzenegger, Ronny Cox and Michael Ironside. The film is loosely based on the Philip K. Dick story…
  • 3. TOTAL RECALL (1990 FILM) ARNOLD SCHWARZENEGGER ENTITY LINKING IN QUERIES (ELQ) total recall arnold schwarzenegger
  • 4. TOTAL RECALL (1990 FILM) ARNOLD SCHWARZENEGGER ENTITY LINKING IN QUERIES (ELQ) total recall arnold schwarzenegger total recall TOTAL RECALL (2012 FILM)TOTAL RECALL (1990 FILM)
  • 5. Query Entity Linking Interpretation ‣ Identifying sets of entity linking interpretations ENTITY LINKING IN QUERIES (ELQ) [Carmel et al. 2014], [Hasibi et al. 2015], [Cornolti et al. 2016]
  • 6. CONVENTIONAL APPROACH Mention detection Candidate Entity Ranking Disambiguation new york new york pizza 0.8 NEW YORK NEW YORK CITY .. NEW YORK-STYLE PIZZA SICILIAN_PIZZA … 0.7 0.1 m e 0.9 (new york, NEW YORK), (manhattan, MANHATTAN) (new york pizza, NEW YORK-STYLE PIZZA), (manhattan, MANHATTAN) s(m,e) interpretation sets 1 2 …
  • 7. Incorporating different signals: CONVENTIONAL APPROACH ‣ Contextual similarity between a candidate entity and text (or entity mention) ‣ Interdependence between all entity linking decisions (extracted from the underlying KB) e1 e2 e3 e4e5
  • 8. What is special about entity linking in queries? What is really different when it comes to the queries?
  • 9. RESEARCH QUESTIONS ELQ is an online process and should be done under strict time constraints If we are to allocate the available processing time between the two steps, which one would yield the highest gain? RQ1?
  • 10. Which group of features is needed the most for effective entity linking: contextual similarity, interdependence between entities, or both? RQ2? RESEARCH QUESTIONS The context provided by the queries is limited

  • 11. SYSTEMATIC INVESTIGATION METHODS: (1) Unsupervised (Greedy) (2) Supervised (LTR) METHODS: (1) Unsupervised (MLMcg) (2) Supervised (LTR) Candidate Entity Ranking Disambiguation 1 2
  • 12. SYSTEMATIC INVESTIGATION METHODS: (1) Unsupervised (Greedy) (2) Supervised (LTR) METHODS: (1) Unsupervised (MLMcg) (2) Supervised (LTR) Candidate Entity Ranking Disambiguation 1 2
  • 13. Estimate: score(m, e, q) Given: • mention m • entity e • query q CANDIDATE ENTITY RANKING (I) Identify all possible entities that can be linked in the query (II) Rank them based on how likely they are link targets Goal ‣ lexical matching of query n-grams against a rich dictionary of entity name variants
  • 14. UNSUPERVISED [MLMcg] rectly in the subsequent disambiguation step. Using lexical matching of query n-g against a rich dictionary of entity name variants allows for the identification of ca date entities with close to perfect recall [16]. We follow this approach to obtain a l candidate entities together with their corresponding mentions in the query. Our f of attention below is on ranking these candidate (m, e) pairs with respect to the q i.e., estimating score(m, e, q). Unsupervised For the unsupervised ranking approach, we take a state-of-the-art g ative model, specifically, the MLMcg model proposed by Hasibi et al. [16]. This m considers both the likelihood of the given mention and the similarity between the q and the entity: score(m, e, q) = P(e|m)P(q|e), where P(e|m) is the probability mention being linked to an entity (a.k.a. commonness [22]), computed from the F collection [12]. The query likelihood P(q|e) is estimated using the query length malized language model similarity [20]: P(q|e) = Q t2q P(t|✓e)P (t|q) Q t2q P(t|C)P (t|q) , where P(t|q) is the term’s relative frequency in the query (i.e., n(t, q)/|q|). The e and collection language models, P(t|✓e) and P(t|C), are computed using the Mix of Language Models (MLM) approach [27]. against a rich dictionary of entity name variants allows for the identificat date entities with close to perfect recall [16]. We follow this approach to o candidate entities together with their corresponding mentions in the quer of attention below is on ranking these candidate (m, e) pairs with respect i.e., estimating score(m, e, q). Unsupervised For the unsupervised ranking approach, we take a state-of-t ative model, specifically, the MLMcg model proposed by Hasibi et al. [16] considers both the likelihood of the given mention and the similarity betwe and the entity: score(m, e, q) = P(e|m)P(q|e), where P(e|m) is the pro mention being linked to an entity (a.k.a. commonness [22]), computed fro collection [12]. The query likelihood P(q|e) is estimated using the query malized language model similarity [20]: P(q|e) = Q t2q P(t|✓e)P (t|q) Q t2q P(t|C)P (t|q) , where P(t|q) is the term’s relative frequency in the query (i.e., n(t, q)/|q and collection language models, P(t|✓e) and P(t|C), are computed using Generative model for ranking entities based on the query and the specific mention Prob. mention being linked to the entity query likelihood based on
 Mixture of Language Models (MLM) [Hasibi et al. 2015]
  • 15. SUPERVISED [LTR] 28 features mainly from literature Entity Linking in Queries: Efficiency vs. Effectiveness 5 Table 2. Feature set used for ranking entities, categorized to mention (M), entity (E), mention- entity (ME), and query (Q) features. Feature Description Type Len(m) Number of terms in the entity mention M NTEM(m)‡ Number of entities whose title equals the mention M SMIL(m)‡ Number of entities whose title equals part of the mention M Matches(m) Number of entities whose surface form matches the mention M Redirects(e) Number of redirect pages linking to the entity E Links(e) Number of entity out-links in DBpedia E Commonness(e, m) Likelihood of entity e being the target link of mention m ME MCT(e, m)‡ True if the mention contains the title of the entity ME TCM(e, m)‡ True if title of the entity contains the mention ME TEM(e, m)‡ True if title of the entity equals the mention ME Pos1(e, m) Position of the 1st occurrence of the mention in entity abstract ME SimMf (e, m)† Similarity between mention and field f of entity; Eq. (1) ME LenRatio(m, q) Mention to query length ratio: |m| |q| Q QCT(e, q) True if the query contains the title of the entity Q TCQ(e, q) True if the title of entity contains the query Q TEQ(e, q) True if the title of entity is equal query Q Sim(e, q) Similarity between query and entity; Eq. (1) Q SimQf (e, q)† LM similarity between query and field f of entity; Eq. (1) Q ‡ Entity title refers to the rdfs:label predicate of the entity in DBpedia † Computed for all individual DBpedia fields f 2 F and also for field content (cf. Sec 4.1) . 3.2 Disambiguation The disambiguation step is concerned with the formation of entity linking interpreta- tions {E1, ..., Em}. Similar to the previous step, we examine both unsupervised and supervised alternatives, by adapting existing methods from the literature. We further Entity Linking in Queries: Efficiency vs. Effectiveness 5 Table 2. Feature set used for ranking entities, categorized to mention (M), entity (E), mention- entity (ME), and query (Q) features. Feature Description Type Len(m) Number of terms in the entity mention M NTEM(m)‡ Number of entities whose title equals the mention M SMIL(m)‡ Number of entities whose title equals part of the mention M Matches(m) Number of entities whose surface form matches the mention M Redirects(e) Number of redirect pages linking to the entity E Links(e) Number of entity out-links in DBpedia E Commonness(e, m) Likelihood of entity e being the target link of mention m ME MCT(e, m)‡ True if the mention contains the title of the entity ME TCM(e, m)‡ True if title of the entity contains the mention ME TEM(e, m)‡ True if title of the entity equals the mention ME Pos1(e, m) Position of the 1st occurrence of the mention in entity abstract ME SimMf (e, m)† Similarity between mention and field f of entity; Eq. (1) ME LenRatio(m, q) Mention to query length ratio: |m| |q| Q QCT(e, q) True if the query contains the title of the entity Q TCQ(e, q) True if the title of entity contains the query Q TEQ(e, q) True if the title of entity is equal query Q Sim(e, q) Similarity between query and entity; Eq. (1) Q SimQf (e, q)† LM similarity between query and field f of entity; Eq. (1) Q ‡ Entity title refers to the rdfs:label predicate of the entity in DBpedia † Computed for all individual DBpedia fields f 2 F and also for field content (cf. Sec 4.1) . 3.2 Disambiguation The disambiguation step is concerned with the formation of entity linking interpreta- tions {E1, ..., Em}. 
Similar to the previous step, we examine both unsupervised and supervised alternatives, by adapting existing methods from the literature. We further Entity Linking in Queries: Efficiency vs. Effectiveness 5 Table 2. Feature set used for ranking entities, categorized to mention (M), entity (E), mention- entity (ME), and query (Q) features. Feature Description Type Len(m) Number of terms in the entity mention M NTEM(m)‡ Number of entities whose title equals the mention M SMIL(m)‡ Number of entities whose title equals part of the mention M Matches(m) Number of entities whose surface form matches the mention M Redirects(e) Number of redirect pages linking to the entity E Links(e) Number of entity out-links in DBpedia E Commonness(e, m) Likelihood of entity e being the target link of mention m ME MCT(e, m)‡ True if the mention contains the title of the entity ME TCM(e, m)‡ True if title of the entity contains the mention ME TEM(e, m)‡ True if title of the entity equals the mention ME Pos1(e, m) Position of the 1st occurrence of the mention in entity abstract ME SimMf (e, m)† Similarity between mention and field f of entity; Eq. (1) ME LenRatio(m, q) Mention to query length ratio: |m| |q| Q QCT(e, q) True if the query contains the title of the entity Q TCQ(e, q) True if the title of entity contains the query Q TEQ(e, q) True if the title of entity is equal query Q Sim(e, q) Similarity between query and entity; Eq. (1) Q SimQf (e, q)† LM similarity between query and field f of entity; Eq. (1) Q ‡ Entity title refers to the rdfs:label predicate of the entity in DBpedia † Computed for all individual DBpedia fields f 2 F and also for field content (cf. Sec 4.1) . 3.2 Disambiguation The disambiguation step is concerned with the formation of entity linking interpreta- tions {E1, ..., Em}. Similar to the previous step, we examine both unsupervised and Entity Linking in Queries: Efficiency vs. Effectiveness 5 Table 2. Feature set used for ranking entities, categorized to mention (M), entity (E), mention- entity (ME), and query (Q) features. Feature Description Type Len(m) Number of terms in the entity mention M NTEM(m)‡ Number of entities whose title equals the mention M SMIL(m)‡ Number of entities whose title equals part of the mention M Matches(m) Number of entities whose surface form matches the mention M Redirects(e) Number of redirect pages linking to the entity E Links(e) Number of entity out-links in DBpedia E Commonness(e, m) Likelihood of entity e being the target link of mention m ME MCT(e, m)‡ True if the mention contains the title of the entity ME TCM(e, m)‡ True if title of the entity contains the mention ME TEM(e, m)‡ True if title of the entity equals the mention ME Pos1(e, m) Position of the 1st occurrence of the mention in entity abstract ME SimMf (e, m)† Similarity between mention and field f of entity; Eq. (1) ME LenRatio(m, q) Mention to query length ratio: |m| |q| Q QCT(e, q) True if the query contains the title of the entity Q TCQ(e, q) True if the title of entity contains the query Q TEQ(e, q) True if the title of entity is equal query Q Sim(e, q) Similarity between query and entity; Eq. (1) Q SimQf (e, q)† LM similarity between query and field f of entity; Eq. (1) Q ‡ Entity title refers to the rdfs:label predicate of the entity in DBpedia † Computed for all individual DBpedia fields f 2 F and also for field content (cf. Sec 4.1) . 3.2 Disambiguation The disambiguation step is concerned with the formation of entity linking interpreta- [Meij et al. 2013], [Medelyan et al. 2008]
SUPERVISED [LTR]
‣ 28 features, mainly from the literature [Meij et al. 2012], [Medelyan et al. 2008]
‣ Extracted based on: mention (M), entity (E), mention-entity (ME), and query (Q)

Table 2. Feature set used for ranking entities, categorized into mention (M), entity (E), mention-entity (ME), and query (Q) features.

Feature            Description                                                        Type
Len(m)             Number of terms in the entity mention                              M
NTEM(m)‡           Number of entities whose title equals the mention                  M
SMIL(m)‡           Number of entities whose title equals part of the mention          M
Matches(m)         Number of entities whose surface form matches the mention          M
Redirects(e)       Number of redirect pages linking to the entity                     E
Links(e)           Number of entity out-links in DBpedia                              E
Commonness(e, m)   Likelihood of entity e being the target link of mention m          ME
MCT(e, m)‡         True if the mention contains the title of the entity               ME
TCM(e, m)‡         True if the title of the entity contains the mention               ME
TEM(e, m)‡         True if the title of the entity equals the mention                 ME
Pos1(e, m)         Position of the 1st occurrence of the mention in entity abstract   ME
SimM_f(e, m)†      Similarity between mention and field f of entity; Eq. (1)          ME
LenRatio(m, q)     Mention-to-query length ratio: |m| / |q|                           Q
QCT(e, q)          True if the query contains the title of the entity                 Q
TCQ(e, q)          True if the title of the entity contains the query                 Q
TEQ(e, q)          True if the title of the entity equals the query                   Q
Sim(e, q)          Similarity between query and entity; Eq. (1)                       Q
SimQ_f(e, q)†      LM similarity between query and field f of entity; Eq. (1)         Q

‡ Entity title refers to the rdfs:label predicate of the entity in DBpedia
† Computed for all individual DBpedia fields f ∈ F and also for the field "content" (cf. Sec. 4.1)
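A minimal sketch of a few of the Table 2 features. The function names are illustrative, and link_counts (standing in for anchor-text statistics) is an assumed input, not part of the paper's implementation:

def len_m(mention):
    """Len(m): number of terms in the entity mention."""
    return len(mention.split())

def commonness(entity, mention, link_counts):
    """Commonness(e, m): fraction of the mention's links targeting entity e.

    link_counts: dict mapping mention -> {entity: link count}
    (an assumed structure, e.g. built from anchor-text statistics).
    """
    counts = link_counts.get(mention, {})
    total = sum(counts.values())
    return counts.get(entity, 0) / total if total else 0.0

def tem(entity_title, mention):
    """TEM(e, m): True if the title of the entity equals the mention."""
    return entity_title.lower() == mention.lower()

def len_ratio(mention, query):
    """LenRatio(m, q): mention-to-query length ratio |m| / |q|."""
    return len(mention.split()) / len(query.split())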
TEST COLLECTIONS
‣ Y-ERD
  • Based on the Yahoo Search Query Log to Entities dataset
  • 2398 queries
‣ ERD-dev
  • Released as part of the ERD challenge 2014
  • 91 queries
[Carmel et al. 2014], [Hasibi et al. 2015]
RESULTS

Table 4. Candidate entity ranking results on the Y-ERD and ERD-dev datasets. Best scores for each metric are in boldface. Significance for line i > 1 is tested against lines 1..i-1.

          Y-ERD                            ERD-dev
Method    MAP        R@5        P@1        MAP       R@5       P@1
MLM       0.7507     0.8556     0.6839     0.7675    0.8622    0.7333
CMNS      0.7831▲    0.8230▲    0.7779▲    0.7037    0.7222▽   0.7556
MLMcg     0.8536▲▲   0.8997▲▲   0.8280▲▲   0.8543△▲  0.9015▲   0.8444
LTR       0.8667▲▲▲  0.9022▲▲   0.8479▲▲▲  0.8606△▲  0.9289△▲  0.8222

Table 5. End-to-end performance of ELQ systems on the Y-ERD and ERD-dev query sets. Significance for line i > 1 is tested against lines 1..i-1.

              Y-ERD                                    ERD-dev
Method        Prec      Recall    F1        Time (s)   Prec      Recall   F1        Time (s)
MLMcg-Greedy  0.709     0.709     0.709     0.058      0.724     0.712    0.713     0.085
MLMcg-LTR     0.725     0.724     0.724     0.893      0.725     0.731    0.728     1.185
LTR-LTR       0.731△    0.732△    0.731△    0.881      0.758     0.748    0.753     1.185
LTR-Greedy    0.786▲▲▲  0.787▲▲▲  0.787▲▲▲  0.382      0.852▲▲△  0.828▲△  0.840▲▲△  0.423

‣ Both CER methods (MLMcg and LTR) are able to find the vast majority of the relevant entities and return them at the top ranks.
SYSTEMATIC INVESTIGATION
(1) Candidate Entity Ranking; methods: unsupervised (MLMcg), supervised (LTR)
(2) Disambiguation; methods: unsupervised (Greedy), supervised (LTR)
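To make the two-step setup concrete, here is a minimal sketch of the pipeline being varied in the study; link_query, cer, and disambiguate are hypothetical names standing in for any of the four method combinations (e.g., LTR-Greedy):

def link_query(query, cer, disambiguate):
    """Two-step ELQ pipeline: candidate entity ranking, then disambiguation.

    cer: function mapping a query to a ranked list of (mention, entity, score)
    disambiguate: function mapping that list to interpretation sets {E1, ..., Em}
    """
    ranked_pairs = cer(query)          # step 1: score (mention, entity) pairs
    return disambiguate(ranked_pairs)  # step 2: form interpretation sets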
DISAMBIGUATION
‣ Goal: identify interpretation set(s) E_i = {(m1, e1), ..., (mk, ek)}, where the mentions of each set are non-overlapping.

Example interpretation sets:
  {(new york, NEW YORK), (manhattan, MANHATTAN)}
  {(new york pizza, NEW YORK-STYLE PIZZA), (manhattan, MANHATTAN)}
UNSUPERVISED [Greedy]

Algorithm 1: Greedy Interpretation Finding (GIF)
Input: ranked list of mention-entity pairs M; score threshold τs
Output: interpretations I = {E1, ..., Em}

begin
  M′ ← Prune(M, τs)
  M′ ← PruneContainmentMentions(M′)
  I ← CreateInterpretations(M′)
  return I
end

function CreateInterpretations(M)
  I ← {∅}
  for (m, e) in M do
    h ← 0
    for E in I do
      if ¬hasOverlap(E, (m, e)) then
        E.add((m, e))
        h ← 1
      end if
    end for
    if h == 0 then
      I.add({(m, e)})
    end if
  end for
  return I
end function

(I) Pruning of mention-entity pairs
  • Discard pairs with score below the threshold τs
  • For containment mentions, keep only the highest-scoring one
(II) Set generation
  • Add mention-entity pairs, in decreasing order of score, to the interpretation sets they do not overlap with (see the Python sketch below)
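A minimal Python sketch of GIF under the pseudocode above. The overlap and containment tests are simplified to term sharing and substring containment (the paper works with query term positions), so treat this as an illustration rather than the authors' implementation:

def overlaps(m1, m2):
    """Simplified mention-overlap test: mentions overlap if they share a term."""
    return bool(set(m1.split()) & set(m2.split()))

def gif(pairs, threshold):
    """Greedy Interpretation Finding (GIF), simplified.

    pairs: list of (mention, entity, score), sorted by decreasing score.
    Returns interpretations as lists of (mention, entity) pairs.
    """
    # (I) Prune pairs scoring below the threshold tau_s
    pruned = [(m, e) for m, e, s in pairs if s >= threshold]
    # (I) For containment mentions, keep only the highest-scoring one
    # (input is sorted by score, so earlier pairs win)
    kept = []
    for m, e in pruned:
        if not any(m in m2 or m2 in m for m2, _ in kept):
            kept.append((m, e))
    # (II) Add each pair to every interpretation it does not overlap with;
    # open a new interpretation if it overlaps with all existing ones
    interpretations = []
    for m, e in kept:
        placed = False
        for interp in interpretations:
            if not any(overlaps(m, m2) for m2, _ in interp):
                interp.append((m, e))
                placed = True
        if not placed:
            interpretations.append([(m, e)])
    return interpretations

# Example (scores illustrative):
# gif([("new york", "NEW YORK", 0.8), ("manhattan", "MANHATTAN", 0.7)], 0.5)
# -> [[("new york", "NEW YORK"), ("manhattan", "MANHATTAN")]]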
SUPERVISED
(I) Generate all interpretations (from the top-K mention-entity pairs); sketched below
(II) Collectively select the interpretations with a binary classifier ("collective disambiguation")

Example candidate mention-entity pairs:
  new york → NEW YORK, NEW YORK CITY, SYRACUSE, NEW YORK, ALBANY, NEW YORK, …
  manhattan → MANHATTAN, MANHATTAN (FILM)
  new york pizza → NEW YORK-STYLE PIZZA

Candidate interpretation sets (✓ selected by the classifier, ✗ rejected):
  ✓ {(new york, NEW YORK), (manhattan, MANHATTAN)}
  ✗ {(new york, NEW YORK CITY), (manhattan, MANHATTAN)}
  ✗ {(new york, SYRACUSE, NEW YORK), (manhattan, MANHATTAN)}
  ✓ {(new york pizza, NEW YORK-STYLE PIZZA), (manhattan, MANHATTAN)}
  ✗ {(new york pizza, NEW YORK-STYLE PIZZA)}
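A sketch of step (I): enumerating candidate interpretation sets from the top-K mention-entity pairs under the non-overlap constraint. The classifier of step (II) is left out, and token_overlap is the same simplified overlap test as in the GIF sketch:

from itertools import combinations

def token_overlap(m1, m2):
    # Simplified overlap test: mentions overlap if they share a term
    return bool(set(m1.split()) & set(m2.split()))

def all_interpretations(pairs, k=5):
    """Enumerate all subsets of the top-k (mention, entity) pairs whose
    mentions are mutually non-overlapping (step I); each surviving subset
    is a candidate interpretation set for the classifier in step II."""
    top = [(m, e) for m, e, _ in pairs[:k]]
    candidates = []
    for r in range(1, len(top) + 1):
        for subset in combinations(top, r):
            mentions = [m for m, _ in subset]
            if all(not token_overlap(a, b)
                   for a, b in combinations(mentions, 2)):
                candidates.append(list(subset))
    return candidates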
SUPERVISED

Table 3. Feature set used in the supervised disambiguation approach. Type is either query-dependent (QD) or query-independent (QI).

Set-based features (computed for the entire interpretation set E):

Feature             Description                                                            Type
CommonLinks(E)      Number of common links in DBpedia: ⋂_{e∈E} out(e)                      QI
TotalLinks(E)       Number of distinct links in DBpedia: ⋃_{e∈E} out(e)                    QI
J_KB(E)             Jaccard similarity based on DBpedia: CommonLinks(E) / TotalLinks(E)    QI
J_corpora(E)‡       Jaccard similarity based on FACC: |⋂_{e∈E} doc(e)| / |⋃_{e∈E} doc(e)|  QI
Rel_MW(E)‡          Relatedness similarity [25] according to FACC                          QI
P(E)                Co-occurrence probability based on FACC: |⋂_{e∈E} doc(e)| / TotalDocs  QI
H(E)                Entropy of E: −P(E) log(P(E)) − (1 − P(E)) log(1 − P(E))               QI
Completeness(E)†    Completeness of set E as a graph: |edges(G_E)| / |edges(K_|E|)|        QI
LenRatioSet(E, q)§  Ratio of mention lengths to the query length: Σ_{e∈E} |m_e| / |q|      QD
SetSim(E, q)        Similarity between query and the entities in the set; Eq. (2)          QD

Entity-based features (individual entity features, aggregated over all entities in the set):

Links(e)            Number of entity out-links in DBpedia                                  QI
Commonness(e, m)    Likelihood of entity e being the target link of mention m              QD
Score(e, q)         Entity ranking score, obtained from the CER step                       QD
iRank(e, q)         Inverse of rank, obtained from the CER step: 1 / rank(e, q)            QD
Sim(e, q)           Similarity between query and the entity; Eq. (1)                       QD
ContextSim(e, q)    Contextual similarity between query and entity; Eq. (3)                QD

‡ doc(e) denotes all documents that have a link to entity e
† G_E is a DBpedia subgraph containing only entities from E; K_|E| is a complete graph of |E| vertices
§ m_e denotes the mention that corresponds to entity e
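A sketch of two of the set-based features from Table 3, assuming out_links maps an entity to its set of DBpedia out-links and that the FACC-based co-occurrence probability P(E) is supplied as an input rather than computed here:

import math

def jaccard_kb(entities, out_links):
    """J_KB(E): CommonLinks(E) / TotalLinks(E) over DBpedia out-links."""
    if not entities:
        return 0.0
    link_sets = [out_links[e] for e in entities]
    common = set.intersection(*link_sets)
    total = set.union(*link_sets)
    return len(common) / len(total) if total else 0.0

def entropy(p_E):
    """H(E): binary entropy of the co-occurrence probability P(E)."""
    if p_E <= 0.0 or p_E >= 1.0:
        return 0.0
    return -p_E * math.log(p_E) - (1 - p_E) * math.log(1 - p_E)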
RESULTS
‣ Candidate entity ranking (Table 4): Commonness is a strong performer, in line with the findings of [1, 16]. Combining commonness with MLM in a generative model (MLMcg) delivers excellent performance, with MAP above 0.85 and R@5 around 0.9. The LTR approach brings further slight, but for Y-ERD significant, improvements.
‣ Disambiguation (Table 5): Systems are named "<CER method>-<disambiguation method>". LTR-Greedy significantly outperforms the other approaches on both test sets, and is the second most efficient.
RESULTS
‣ Based on these results, LTR-Greedy is our overall recommendation; we compare it against the top-3 systems of the ERD challenge, using the official ERD test platform.

Table 6. ELQ results on the official ERD test platform.

Method         F1
LTR-Greedy     0.699
SMAPH-2 [6]    0.708
NTUNLP [5]     0.680
Seznam [10]    0.669

‣ For this comparison, LTR-Greedy is complemented with spell checking, as is also done by the top-performing system (SMAPH-2) [6].
‣ LTR-Greedy performs on a par with the state-of-the-art systems: a remarkable finding, considering the simplicity of our greedy disambiguation algorithm against the considerably more complex solutions employed by others.
‣ Overall, candidate entity ranking is of higher importance for ELQ: it is more beneficial to spend the (expensive) supervised learning early in the pipeline, on the CER step, while disambiguation can be tackled successfully with an unsupervised (greedy) algorithm. (Simply taking the top-ranked entity does not yield an immediate solution; disambiguation remains an indispensable step in ELQ.)
FEATURE IMPORTANCE
[Figure: most important features used in the supervised approaches, sorted by Gini score. Left: candidate entity ranking; right: disambiguation (for MLMcg-LTR and LTR-LTR).]
‣ Candidate entity ranking, in decreasing order of importance: Matches, Commonness, MCT, SimQ-content, QCT, Sim, SimQ-wikiLink, TEM, LenRatio, Redirects, Links, Len, SimQ-abstract, SimQ-subject, SimQ-label
‣ Disambiguation, in decreasing order of importance: Score/avg, Score/min, Score/max, iRank/avg, iRank/min, Commonness/min, iRank/max, Commonness/avg, ContextSim/max, ContextSim/min, H, ContextSim/avg, P, LenRatioSet, SetSim
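Gini-based importances like those in the figure can be reproduced in spirit with any random-forest implementation; a sketch assuming scikit-learn (the paper's exact learner and training setup may differ):

from sklearn.ensemble import RandomForestClassifier

def gini_importances(X, y, feature_names):
    """Fit a random forest and rank features by impurity (Gini) importance."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X, y)
    return sorted(zip(feature_names, model.feature_importances_),
                  key=lambda pair: pair[1], reverse=True)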
WE ASKED …
RQ1: If we are to allocate the available processing time between the two steps, which one would yield the highest gain?
Answer: Candidate entity ranking is of higher importance than disambiguation. It is more beneficial to perform the (expensive) supervised learning early on, and tackle disambiguation with an unsupervised algorithm.
WE ASKED …
RQ2: Which group of features is needed the most for effective entity disambiguation: contextual similarity, interdependence between entities, or both?
Answer: Contextual similarity features are the most effective for entity disambiguation. Entity interdependence features are helpful when sufficiently many entities are mentioned in the text, which is not the case for queries.
Code and resources at: http://bit.ly/ecir2017-elq
Thank you! Questions?