Query expansion_group42_ire
Query expansion is the process of reformulating a seed query to
improve retrieval performance in information retrieval
operations.
Query expansion mainly consists of adding terms to the query so
that the final query returns the desired results.
For example, I might want information about Rahul Gandhi and
issue the query “Rahul”. After seeing the results, I refine the
query by adding his profession, so the new query becomes
something like “rahul politics”.
Many users refine their queries by analyzing the results of their
initial queries.
Automating this process helps fulfil the information needs of
the user.
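DocClustQE: This approach uses both documents and
clusters that are similar to the query to perform the
expansion.
Specifically, the $k$ elements $x \in D \cup Cl(D)$ that
yield the highest query likelihood $\prod_{q_i \in q} p_x(q_i)$ are used,
where $q$, $d$, and $D$ denote a query, a document, and a
corpus respectively, $Cl(D)$ is the set of clusters of documents in $D$,
and $p_x(q_i)$ is the probability that element $x$ assigns to query term $q_i$.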
Our approach includes the following steps:
Offline process:
1. Create indexes of all the individual files of the dataset.
2. Cluster the documents: build a similarity matrix for the
documents using cosine similarity.
3. Identify cluster tags: the top terms of each cluster are
precomputed and stored; these represent the cluster tags.
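Steps 2 and 3 of the offline process can be illustrated with a small Python sketch. It is not the project's code: it assumes raw term-frequency vectors, picks "top terms" by summed frequency within a cluster (the slides do not say how top terms are chosen), and takes the cluster assignment as given. The function names are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_matrix(docs):
    """Dense n x n matrix of pairwise cosine similarities."""
    vectors = [Counter(tokens) for tokens in docs]
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

def cluster_tags(docs, assignments, top_n=3):
    """assignments[i] is the cluster id of docs[i]; returns cluster id -> top terms."""
    totals = {}
    for tokens, cluster_id in zip(docs, assignments):
        totals.setdefault(cluster_id, Counter()).update(tokens)
    return {cid: [term for term, _ in counts.most_common(top_n)]
            for cid, counts in totals.items()}

# Toy documents, invented for illustration only.
docs = [
    ["election", "vote", "election"],
    ["vote", "election", "party"],
    ["cricket", "match", "cricket"],
]
sim = similarity_matrix(docs)
tags = cluster_tags(docs, [0, 0, 1], top_n=1)
```

In practice a weighting such as tf-idf and stop-word removal would likely give more useful tags than raw counts; that choice is left open here.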
Online process:
1. Seed query: add wikisynonym terms and re-weight.
2. Initial search: take the top N results.
3. Find the C clusters and collect the tags of the C clusters:
{T1, T2, T3, T4, T5, T6, ..., Tm}.
4. Terms for query expansion: {T1, T2, ..., Tm}.
5. Follow-up search (pseudo-relevance feedback): is P@10 improved?
   Yes: keep as final terms for expansion.
   No: exclude.
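The "is P@10 improved?" check can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes each candidate term is tested on its own against the seed query, that a set of relevant document ids is available for measuring precision, and that `search` is any callable returning a ranked list of ids (here a hypothetical `toy_search`; in the project this role is played by the Lucene index).

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k

def select_expansion_terms(seed_terms, candidate_terms, search, relevant_ids, k=10):
    """Keep a candidate term only if adding it improves P@k over the seed query."""
    baseline = precision_at_k(search(seed_terms), relevant_ids, k)
    kept = []
    for term in candidate_terms:
        score = precision_at_k(search(seed_terms + [term]), relevant_ids, k)
        if score > baseline:
            kept.append(term)      # "yes" branch: final term for expansion
        # "no" branch: the term is excluded
    return kept

# Toy index and search function, invented for illustration only.
INDEX = {
    "d1": ["rahul", "dravid", "cricket"],
    "d2": ["rahul", "bose", "actor"],
    "d3": ["rahul", "gandhi", "politics"],
    "d4": ["rahul", "politics", "congress"],
}

def toy_search(terms):
    """Rank documents by number of matching query terms, ties broken by id."""
    overlap = {d: sum(1 for t in terms if t in toks) for d, toks in INDEX.items()}
    hits = [d for d in overlap if overlap[d] > 0]
    return sorted(hits, key=lambda d: (-overlap[d], d))

kept = select_expansion_terms(
    ["rahul"], ["politics", "cricket", "congress"], toy_search,
    relevant_ids={"d3", "d4"}, k=2,
)
```

Testing terms one at a time keeps the sketch simple; a cumulative variant, where each accepted term stays in the query for later tests, would be an equally plausible reading of the flow.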
Clusters capture context information better than individual
documents.
For clustering we used the “scluster” program of CLUTO. It
takes as input the adjacency matrix of a graph that specifies
the similarity between the objects (here, files) to be
clustered.
1. CLUTO 2.1:
CLUTO is a tool for clustering high-dimensional data. We used it for
clustering the documents.
The tool takes a similarity matrix (over documents represented in a
vector space) and the number of clusters as input, and returns the
clusters of documents.
2. Lucene 4.0:
Lucene 4.0 is a tool for creating, reading and searching indexes.
• The dataset we used is “Newspaper stories”.
1. Acronyms and synonyms
2. Spell check
3. Query re-weighting
4. Real-time feedback,
e.g. Rahul → Rahul Gandhi Congress party.
5. Combine morphological forms into one
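As one illustration of the spell-check item, a misspelled query term can be mapped to the closest term in the index vocabulary. The sketch below uses Python's standard `difflib.get_close_matches`; the slides do not say which method the project used, so the function name `correct_query`, the vocabulary, and the `cutoff` value are all assumptions.

```python
import difflib

def correct_query(query_terms, vocabulary, cutoff=0.75):
    """Replace each out-of-vocabulary term with its closest vocabulary term, if any."""
    corrected = []
    for term in query_terms:
        if term in vocabulary:
            corrected.append(term)
            continue
        # get_close_matches returns candidates sorted by similarity, best first.
        matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else term)
    return corrected

# Toy vocabulary, invented for illustration only.
vocabulary = ["congress", "gandhi", "party", "rahul"]
fixed = correct_query(["rahul", "ghandi", "congres"], vocabulary)
```

Terms with no sufficiently close match are left unchanged, so rare but valid words are not forced onto an unrelated vocabulary entry.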
1. Calculate the value of ‘k’ used in k-means dynamically.
2. Use semantic distance or a similarity score
between words.
3. Make use of query logs.
201101142 - YARRAM SUDHIR KUMAR REDDY
201125226 – NELAKUDITI KOVIDA
201206689 – VISLESH KODURUPAKA

