Query Expansion with Locally-Trained Word Embeddings (ACL 2016)
Fernando Diaz, Bhaskar Mitra, Nick Craswell
Microsoft
[figures: a distribution p(d) over documents d; given a query q, the query-conditioned distribution p(d|q)]
Most similar terms to "cut":

global      local*
cutting     tax
squeeze     deficit
reduce      vote
slash       budget
reduction   reduction
spend       house
lower       bill
halve       plan
soften      spend
freeze      billion

global: trained using the full corpus
local: trained using a topically-constrained corpus
*gas
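Neighbor lists like the table above come from ranking the vocabulary by cosine similarity to a term under a given embedding. A minimal sketch in Python (the vocab, global_vectors, and local_vectors names are placeholders, not the deck's actual data):

import numpy as np

def nearest_neighbors(term, vocab, vectors, k=10):
    """Rank the vocabulary by cosine similarity to `term`.

    vocab   : list of terms, one per row of `vectors`
    vectors : (num_terms, dim) array of word embeddings (global or local)
    """
    index = {w: i for i, w in enumerate(vocab)}
    # Normalize rows so dot products become cosine similarities.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[index[term]]
    sims[index[term]] = -np.inf          # exclude the term itself
    top = np.argsort(-sims)[:k]
    return [(vocab[i], float(sims[i])) for i in top]

# e.g. compare nearest_neighbors("cut", vocab, global_vectors)
#      with   nearest_neighbors("cut", vocab, local_vectors)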
[figure: global vs. local — t-SNE projection of top words by p̃(d|q) (blue: query; red: top words by p(d|q))]
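A projection like the one in the figure can be reproduced with off-the-shelf t-SNE; a rough sketch, assuming scikit-learn and matplotlib (the inputs are placeholders):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_top_words(words, vectors, query_terms):
    """2-D t-SNE view of the top words' embeddings.

    Query terms are drawn in blue, the remaining top words in red.
    Perplexity must be smaller than the number of words plotted.
    """
    xy = TSNE(n_components=2, perplexity=10, init="pca",
              random_state=0).fit_transform(np.asarray(vectors))
    colors = ["blue" if w in query_terms else "red" for w in words]
    plt.scatter(xy[:, 0], xy[:, 1], c=colors)
    for w, (x, y) in zip(words, xy):
        plt.annotate(w, (x, y))
    plt.show()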
• local term clustering [Lesk, 1968; Attar and Fraenkel, 1977]
• local latent semantic analysis [Hull, 1994; Hull, 1995; Schütze et al., 1995; Singhal et al., 1997]
• local document clustering [Tombros and van Rijsbergen, 2001; Tombros et al., 2002; Willett, 1985]
• one sense per discourse [Gale et al., 1992]
[diagram: query → target corpus → results]
query = gas tax
q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …]
d = [gas:0.0 tax:0.0 petroleum:0.7 tariff:0.5 …]
(q and d share no terms, so a pure term-matching score is zero)

W = [ gas: petroleum:0.9 indigestion:0.6 …
      tax: tariff:0.7 strain:0.4 …
      … ]
(a term-to-term similarity matrix)

q = [gas:1.0 tax:1.0 petroleum:0.8 tariff:0.6 …]
(after expansion with W, the query now overlaps with d)

W = UUᵀ
U: m × k embedding matrix
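A minimal sketch of this scoring step, assuming a dense term-embedding matrix U and sparse term-weight vectors over the same m-term vocabulary; the pruning below is an illustrative choice, not the paper's exact procedure:

import numpy as np

def expand_query(q, U, top_k=10):
    """Expand a sparse query vector using embedding similarities.

    q : (m,) query term-weight vector
    U : (m, k) term embedding matrix, so W = U @ U.T is term-to-term similarity
    """
    # Equivalent to W @ q, but avoids materializing the m x m matrix W.
    q_exp = U @ (U.T @ q)
    # Illustrative pruning: keep only the top_k expansion terms,
    # and never down-weight the original query terms.
    cutoff = np.sort(q_exp)[-top_k] if top_k < len(q_exp) else -np.inf
    q_exp[q_exp < cutoff] = 0.0
    return np.maximum(q_exp, q)

def score(q_exp, d):
    """Score a document term-weight vector d against the expanded query."""
    return float(q_exp @ d)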
[figures: documents d with p(d), a query q, and p(d|q); the same space with the estimated p̃(d|q)]
[diagrams: query → target corpus → results; query → external corpus → results]
U =
• uniform p(d) on the target corpus
• uniform p(d) on an external corpus
• p(d|q) on the target corpus
• p(d|q) on an external corpus
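A minimal sketch of the "p(d|q) on the target corpus" option: retrieve the top-ranked documents for the query and train a small embedding on just those documents. Here search is a hypothetical retrieval call, gensim (≥ 4) is an assumed dependency, and the hard top-k cut is a simplification of weighting documents by p(d|q):

from gensim.models import Word2Vec

def train_local_embedding(query, search, top_k=1000, dim=100):
    """Train a query-specific ("local") word embedding.

    query  : the query string
    search : hypothetical retrieval function returning tokenized documents,
             ranked by p(d|q) on the chosen corpus
    """
    # Topically-constrained training set: top-ranked documents only.
    local_docs = search(query, k=top_k)
    model = Word2Vec(
        sentences=local_docs,  # list of token lists
        vector_size=dim,       # gensim >= 4 argument name
        window=5,
        min_count=2,
        sg=1,                  # skip-gram
        epochs=5,
    )
    return model.wv            # one vector per term in the local vocabulary

Pointing search at an external corpus instead of the target corpus gives the "p(d|q) on an external corpus" variant.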
collection   docs         words        queries
trec12       469,949      438,338      150
robust       528,155      665,128      250
web          50,220,423   90,411,624   200
global               local
target               target
wikipedia+gigaword*  gigaword†
google news*         wikipedia†

*publicly available embedding; †publicly available external corpus
[diagrams: query → target/external corpus → results, one per embedding-training configuration]
[chart: local vs. global — NDCG@10 (0.0–0.5) on trec12, robust, and web; bars for expansion = none, global, local]
[chart: local embedding — NDCG@10 (0.0–0.5) on trec12, robust, and web; bars for training corpus = target, gigaword, wikipedia]
• local embedding provides a stronger representation than
global embedding
• potential impact for other topic-specific natural language
processing tasks
• future work
• effectiveness improvements
• efficiency improvements
