Query Expansion with Locally-Trained Word Embeddings (ACL 2016)
Fernando Diaz, Bhaskar Mitra, Nick Craswell
Microsoft
[figures: a distribution p(d) over documents d; given a query q, the query-conditioned distribution p(d|q)]
Most similar terms to "cut":

global      local*
cutting     tax
squeeze     deficit
reduce      vote
slash       budget
reduction   reduction
spend       house
lower       bill
halve       plan
soften      spend
freeze      billion

global: trained using the full corpus
local: trained using a topically-constrained corpus
*gas
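Neighbor lists like the table above come from ranking the vocabulary by cosine similarity to a term under a given embedding. A minimal sketch in Python (the vocab, global_vectors, and local_vectors names are placeholders, not the deck's actual data):

import numpy as np

def nearest_neighbors(term, vocab, vectors, k=10):
    """Rank the vocabulary by cosine similarity to `term`.

    vocab   : list of terms, one per row of `vectors`
    vectors : (num_terms, dim) array of word embeddings (global or local)
    """
    index = {w: i for i, w in enumerate(vocab)}
    # Normalize rows so dot products become cosine similarities.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[index[term]]
    sims[index[term]] = -np.inf          # exclude the term itself
    top = np.argsort(-sims)[:k]
    return [(vocab[i], float(sims[i])) for i in top]

# e.g. compare nearest_neighbors("cut", vocab, global_vectors)
#      with   nearest_neighbors("cut", vocab, local_vectors)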
[figure: global vs. local — t-SNE projection of top words by p̃(d|q) (blue: query; red: top words by p(d|q))]
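A projection like the one in the figure can be reproduced with off-the-shelf t-SNE; a rough sketch, assuming scikit-learn and matplotlib (the inputs are placeholders):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_top_words(words, vectors, query_terms):
    """2-D t-SNE view of the top words' embeddings.

    Query terms are drawn in blue, the remaining top words in red.
    Perplexity must be smaller than the number of words plotted.
    """
    xy = TSNE(n_components=2, perplexity=10, init="pca",
              random_state=0).fit_transform(np.asarray(vectors))
    colors = ["blue" if w in query_terms else "red" for w in words]
    plt.scatter(xy[:, 0], xy[:, 1], c=colors)
    for w, (x, y) in zip(words, xy):
        plt.annotate(w, (x, y))
    plt.show()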
• local term clustering [Lesk, 1968; Attar and Fraenkel, 1977]
• local latent semantic analysis [Hull, 1994; Hull, 1995; Schütze et al., 1995; Singhal et al., 1997]
• local document clustering [Tombros and van Rijsbergen, 2001; Tombros et al., 2002; Willett, 1985]
• one sense per discourse [Gale et al., 1992]
[diagram: query → target corpus → results]
query = gas tax
q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …]
d = [gas:0.0 tax:0.0 petroleum:0.7 tariff:0.5 …]
(q and d share no terms, so a pure term-matching score is zero)

W = [ gas: petroleum:0.9 indigestion:0.6 …
      tax: tariff:0.7 strain:0.4 …
      … ]
(a term-to-term similarity matrix)

q = [gas:1.0 tax:1.0 petroleum:0.8 tariff:0.6 …]
(after expansion with W, the query now overlaps with d)

W = UUᵀ
U: m × k embedding matrix
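A minimal sketch of this scoring step, assuming a dense term-embedding matrix U and sparse term-weight vectors over the same m-term vocabulary; the pruning below is an illustrative choice, not the paper's exact procedure:

import numpy as np

def expand_query(q, U, top_k=10):
    """Expand a sparse query vector using embedding similarities.

    q : (m,) query term-weight vector
    U : (m, k) term embedding matrix, so W = U @ U.T is term-to-term similarity
    """
    # Equivalent to W @ q, but avoids materializing the m x m matrix W.
    q_exp = U @ (U.T @ q)
    # Illustrative pruning: keep only the top_k expansion terms,
    # and never down-weight the original query terms.
    cutoff = np.sort(q_exp)[-top_k] if top_k < len(q_exp) else -np.inf
    q_exp[q_exp < cutoff] = 0.0
    return np.maximum(q_exp, q)

def score(q_exp, d):
    """Score a document term-weight vector d against the expanded query."""
    return float(q_exp @ d)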
[figures: documents d with p(d), a query q, and p(d|q); the same space with the estimated p̃(d|q)]
[diagrams: query → target corpus → results; query → external corpus → results]
U =
• uniform p(d) on the target corpus
• uniform p(d) on an external corpus
• p(d|q) on the target corpus
• p(d|q) on an external corpus
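A minimal sketch of the "p(d|q) on the target corpus" option: retrieve the top-ranked documents for the query and train a small embedding on just those documents. Here search is a hypothetical retrieval call, gensim (≥ 4) is an assumed dependency, and the hard top-k cut is a simplification of weighting documents by p(d|q):

from gensim.models import Word2Vec

def train_local_embedding(query, search, top_k=1000, dim=100):
    """Train a query-specific ("local") word embedding.

    query  : the query string
    search : hypothetical retrieval function returning tokenized documents,
             ranked by p(d|q) on the chosen corpus
    """
    # Topically-constrained training set: top-ranked documents only.
    local_docs = search(query, k=top_k)
    model = Word2Vec(
        sentences=local_docs,  # list of token lists
        vector_size=dim,       # gensim >= 4 argument name
        window=5,
        min_count=2,
        sg=1,                  # skip-gram
        epochs=5,
    )
    return model.wv            # one vector per term in the local vocabulary

Pointing search at an external corpus instead of the target corpus gives the "p(d|q) on an external corpus" variant.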
collection   docs         words        queries
trec12       469,949      438,338      150
robust       528,155      665,128      250
web          50,220,423   90,411,624   200
global               local
target               target
wikipedia+gigaword*  gigaword†
google news*         wikipedia†

*publicly available embedding; †publicly available external corpus
[diagrams: query → target/external corpus → results, one per embedding-training configuration]
[chart: local vs. global — NDCG@10 (0.0–0.5) on trec12, robust, and web; bars for expansion = none, global, local]
[chart: local embedding — NDCG@10 (0.0–0.5) on trec12, robust, and web; bars for training corpus = target, gigaword, wikipedia]
• local embedding provides a stronger representation than
global embedding
• potential impact for other topic-specific natural language
processing tasks
• future work
• effectiveness improvements
• efficiency improvements
