Query expansion_group42_ire

Query expansion is the process of reformulating a seed query to
improve retrieval performance in information retrieval
operations.
Query expansion mainly consists of adding terms to the query so
that the final query returns the desired results.
For example, I might want information about Rahul Gandhi and
issue the query “Rahul”. After seeing the results, I refine the
query by adding his profession; the new query would be
something like “rahul politics”.

Many users refine their queries by analyzing the results of their
initial queries.
Automating this process helps satisfy the user's information
need without manual reformulation.

DocClustQE: This approach uses both documents and
clusters that are similar to the query to perform the
expansion.
Specifically, the k elements $x \in D \cup Cl(D)$ that
yield the highest query likelihood $\prod_{q_i \in q} p_x(q_i)$ are used,
where q, d, and D denote a query, a document, and a
corpus, respectively, Cl(D) is the set of clusters over D, and
$q_i$ ranges over the terms of q.
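
As a rough illustration of the ranking described above, the sketch below scores every document and every cluster by the log of the query likelihood and keeps the top k. The Dirichlet smoothing, the value of `mu`, the representation of a cluster as the concatenation of its documents, and all function names are assumptions made for this example; the slides do not specify how $p_x$ is estimated.

```python
import math
from collections import Counter

def language_model(tokens, corpus_counts, corpus_len, mu=1000.0):
    """Return p_x(w): a Dirichlet-smoothed unigram model (smoothing is an assumption)."""
    counts = Counter(tokens)
    length = len(tokens)

    def p(word):
        background = corpus_counts.get(word, 0) / corpus_len
        return (counts.get(word, 0) + mu * background) / (length + mu)

    return p

def log_query_likelihood(query, p):
    """Log of the product of p_x(q_i) over query terms; -inf if any term has probability 0."""
    total = 0.0
    for term in query:
        prob = p(term)
        if prob == 0.0:
            return float("-inf")
        total += math.log(prob)
    return total

def top_k_items(query, docs, clusters, k, mu=1000.0):
    """docs: {doc_id: tokens}; clusters: {cluster_id: [doc_id, ...]}.

    Returns the ids of the k documents/clusters with the highest query likelihood.
    """
    corpus_counts = Counter(t for tokens in docs.values() for t in tokens)
    corpus_len = sum(corpus_counts.values())
    items = dict(docs)
    for cluster_id, members in clusters.items():
        # Assumption: a cluster is modelled as the concatenation of its documents.
        items[cluster_id] = [t for d in members for t in docs[d]]
    scored = []
    for name, tokens in items.items():
        p = language_model(tokens, corpus_counts, corpus_len, mu)
        scored.append((log_query_likelihood(query, p), name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]
```

Working in log space avoids underflow when the query has many terms; the ordering is the same as for the raw product.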

Our approach includes the following steps:
Offline process:
1. Create indexes of all the individual files of the dataset.
2. Cluster the documents: create a similarity matrix for the
documents using cosine similarity.
3. Identify cluster tags: the top terms from each cluster are
precomputed and stored; these represent the cluster tags.
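
A minimal sketch of offline steps 2 and 3, assuming raw term-frequency vectors and assuming that cluster labels have already been produced by the clustering tool. The term weighting, the `top_n` cut-off, and the function names are illustrative choices, not details taken from the slides.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_matrix(docs):
    """docs: list of token lists -> n x n matrix of pairwise cosine similarities."""
    vectors = [Counter(tokens) for tokens in docs]
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

def cluster_tags(docs, labels, top_n=3):
    """labels[i] is the cluster id of docs[i]; returns {cluster id: top terms}.

    "Top terms" here means the most frequent terms in the cluster (an assumption).
    """
    totals = {}
    for tokens, label in zip(docs, labels):
        totals.setdefault(label, Counter()).update(tokens)
    return {
        label: [term for term, _ in counts.most_common(top_n)]
        for label, counts in totals.items()
    }
```

In practice stop words would be removed before counting, otherwise they dominate the tags.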

Online process:
[Flowchart. Box labels: Seed query; Add wikisynonym; Re-weight;
Initial search; Top N results; Find C clusters; Tags of C clusters;
Terms for query expansion {T1, T2, ..., Tm}; Follow-up search;
Is p10 improved? (yes / no); Final terms for expansion; Exclude;
Pseudo-relevance feedback.]
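
One possible reading of the feedback loop in the online process is: try each candidate term from the cluster tags, run a follow-up search, keep the term if p10 improves, and exclude it otherwise. The sketch below follows that reading. The `search` callable, the `relevant_ids` set used to measure precision, and the greedy one-term-at-a-time order are all assumptions for illustration; the slides do not say how p10 is measured.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k results that are in relevant_ids."""
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k

def select_expansion_terms(seed_terms, candidate_terms, search, relevant_ids, k=10):
    """Greedily keep the candidate terms that raise precision at k.

    search: hypothetical callable, list of terms -> ranked list of document ids.
    Returns (kept terms, final precision at k).
    """
    query = list(seed_terms)
    best = precision_at_k(search(query), relevant_ids, k)
    kept = []
    for term in candidate_terms:
        if term in query:
            continue
        score = precision_at_k(search(query + [term]), relevant_ids, k)
        if score > best:
            # "yes" branch: the term improved precision, so it stays in the query.
            query.append(term)
            kept.append(term)
            best = score
        # "no" branch: the term is excluded and the query is left unchanged.
    return kept, best
```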

Clusters capture context information better than individual
documents.
For clustering we used the “scluster” program of CLUTO.
scluster takes as input the adjacency matrix of a
graph that specifies the similarity between the objects (here, files)
to be clustered.
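
A small sketch of how a similarity matrix might be serialized for scluster. It assumes CLUTO's dense graph layout, in which the first line holds the number of vertices n and each of the following n lines holds one row of the adjacency matrix; this layout is written from memory and should be checked against the CLUTO manual before use. The function name and the six-decimal formatting are illustrative.

```python
def format_dense_graph(sim):
    """Serialize a square similarity matrix as text.

    Assumed layout (verify against the CLUTO manual): first line is the
    number of vertices n, followed by n lines of n space-separated weights.
    """
    n = len(sim)
    lines = [str(n)]
    for row in sim:
        if len(row) != n:
            raise ValueError("similarity matrix must be square")
        lines.append(" ".join(f"{value:.6f}" for value in row))
    return "\n".join(lines) + "\n"
```

The resulting text can be written to a file whose name is then passed to scluster together with the desired number of clusters.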

1. CLUTO 2.1:
CLUTO is a clustering tool for high-dimensional data. We used it for
clustering the documents.
It takes a similarity matrix (documents represented in a vector space)
and the number of clusters as input, and outputs the clusters of
documents.
2. Lucene 4.0:
Lucene 4.0 is a tool for creating, reading, and searching indexes.
• The data set we used is “Newspaper stories”.

1. Acronyms and synonyms
2. Spell check
3. Query re-weighting
4. Real-time feedback
e.g. Rahul → Rahul Gandhi Congress party.
5. Combine morphological forms into one
1. Calculate the value of ‘k’ used in k-means dynamically.
2. Use semantic distance or a similarity score
between words.
3. Query logs can be incorporated.

201101142 – YARRAM SUDHIR KUMAR REDDY
201125226 – NELAKUDITI KOVIDA
201206689 – VISLESH KODURUPAKA
