An efficient approach to query reformulation in web search

IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 06 | June-2015, Available @ http://www.ijret.org 172
AN EFFICIENT APPROACH TO QUERY REFORMULATION IN WEB
SEARCH
M. Kiran Kumar1
, S. Jessica Saritha2
1
M-Tech, Computer Science & Engineering, JNTUA College of Engineering Pulivendula, AP, India
2
Assistant Professor, Computer Science & Engineering, JNTUA College of Engineering, Pulivendula, AP, India.
Abstract
Wide range of problems regarding to natural language processing, mining of data, bioinformatics and information retrieval can
be categorized as string transformation, the following task refers the same. If we give an input string, the system will generates the
top k most equivalent output strings which are related to the same input string. In this paper we proposes a narrative and
probabilistic method for the transformation of string, which is considered as accurate and also efficient. The approach uses a log
linear model, along with the method used for training the model, and also an algorithm that generates the top k outcomes. Log
linear method can be defined as restrictive possibility distribution of a result string and the set of rules for the alteration
conditioned on key string. It is guaranteed that the resultant top k list will be generated using the algorithm for string generation
which is based on pruning. The projected technique is applied to correct the spelling error in query as well as reformulation of
queries in case of web based search. Spelling error correction, query reformulation for the related query is not considered in the
previous work. Efficiency is not considered as an important issue taken into the consideration in earlier methods and was not
focused on improvement of accuracy and efficiency in string transformation. The experimental outcomes on huge scale data show
that the projected method is extremely accurate and also efficient.
Keywords: Log linear method, Query reformulation, Spelling Error correction.
--------------------------------------------------------------------***--------------------------------------------------------------------
1. INTRODUCTION
This paper focuses on string transformation, which is the
most common problem, in various applications. In the
processing of natural language correction of spelling errors
generation of pronunciations, word stemming can be
categorized as string transformation. It can also be used in
query reformulation and query implication in search. In the
domain of data mining, string transformation is used in
synonyms mining and database verification matching. Now
days all the application are based on the network kit is
compulsory that the transformation must be accurate as well
as efficient. If we give an input string, the system will
generates the top k most equivalent output strings which are
related to the input string by the usage of many operations.
Here any type of tokens such as string of characters, words.
Every operator is considered as a rule which defines
replacement of substring with its equivalent substring. The
possibility of transformation can characterize relationship,
importance, and connection between strings in a particular
application. Though assured development has been made,
supplementary examination of the job is still needed, mainly
from the point of view of enhancing accuracy as well as
efficiency, which is accurately the objective of this effort.
Based on the dictionary usage string transformation can be
performed on different settings, on is when the dictionary is
used and the other is when not used.. When we use a
dictionary, the resultant strings must be present in the
dictionary, as the
Volume of the dictionary is very large. Without the
simplification, we particularly study about the correction of
spelling in query along with the reformulation of the queries
in web search. In the initial task, a string contains character
set. In the next task, a string is of words. Correcting spelling
in queries typically consists of two stages: candidate
generation and candidate selection. Candidate generation is
used when it is required to identify the most common
corrections of misspelled string from dictionary. In such
cases, a string consists of characters is considered as input
and operators correspond to insert, delete, and substitution
with surrounding characters, or without surrounding
characters for example, “lly”!“ly”. Clearly candidate
generation can be considered as an example for string
transformation. Note that the candidate generation is
disturbed with a solitary word; after candidate creation, the
words present in the query can be later engaged to compose
the concluding candidate selection.. For example, in case of
the abbreviation, while searching the user given the
abbreviation as an query in the search interface, but the
document in the records contains the full form of that
abbreviation, here raises the problem regarding in not
identifying the search related records. if we consider “NY
Times” as the query and the source document contains
“New York Times”, in such cases the query and the
document does not match fit and then the document cannot
be considered as ranked high. Query reformulation attempt
to convert “NY Times” into “New York Times” and thus
create a better identity between the query and the document.
Earlier effort on string transformation is categorized into
two groups. Few work groups mainly consider efficient

_______________________________________________________________________________________
generation of strings. The other effort tried to find out the
model with dissimilar approaches, such as a generative
method, a logistic regression method, and the discriminative
model. Yet, efficiency is not considered as an important
factor in these methods.
2. LITERATURE SURVEY
It is an important factor for a browser to check the spelling
of the query term entered in the search interface to make the
search efficient and to get the results in the efficient manner.
This paper describes narrative methods for the use of
distributional similarity estimated from query logs in
learning improved query spelling correction models. The
key point in our methods is to measure the distributional
similarities between the two terms. When measured it is
known to be high between the frequently occurring mistakes
in spellings and those correction, and it is found to be low
two immaterial terms only with same spellings.
According to winnow-based approach for context-sensitive
correction of spellings, for a large variety of problems there
is a requirement of characterization of linguistics. It focus
on the concepts of techniques that are used for correcting the
strings errors in of miss-placed so as to avoid these kinds of
problems while searching on the net the earlier techniques
used the winnow based approaches for correcting those
errors. In this effort we are acquiring the properties of those
methods to avoid such mistakes. if we consider the example
string “ to”, suppose if we place the string „too” in place of
“to” it leads to difference in meanings and may the sentence
leads to wrong. This is the task of fixing spelling errors that
happen to result in valid words, such as substituting to for
too, casual for causal, and so on.
A unified and discriminative method for query modification
is a method used for providing the alternative substrings in
place of original string, in the case when the original string
is found to be misspelled.
3. FUNDAMENTALS
3.1 Spelling Error Correction
In this module if a user wants to check the spelling, He/She
can check it and correct it automatically. Efficiency is
critical for this job due to following reasons.
(1) The dictionary is very large and (2) The response time
must be very short.
The initial point indicates , while using the dictionary it is
very difficult to find the required string if in case it is
misspelled when the size of the dictionary is very large. The
second point indicates that the response time is based on the
size of the dictionary present. Large the size of the
dictionary more will the response time and small the size of
the dictionary low will be the response time.
3.1.1 Word Pair Mining
Searching on web by a user is of session based. That session
will be of the frequently made mistakes and the spelling
errors in the query term. In order to avoid those mistakes
pairing of strings must be done in the browser which
indicates that the string which is spelt wrong must be paired
with the correct spelling of the same word. This leads to
replacement of the wrong spelt word with the correct
spelling. The following are some of the examples of the
word pairs with misspelled and correct spelled.
Table-1: Examples of Word Pairs
Misspelled Correct Misspelled Correct
Aacoustic Acoustic Chevorle Chevrolet
Liyerature Literature Tournemen Tournament
Shingle Shingle Newpape Newspaper
Finlad Finland Ccomponet Component
Reteive Retrieve Olimpick Olympic
3.2 String Transformation
Here we are using two techniques for searching the String
1)String Generation 2)String Transformation.
String Generation: Here we have to generate 50,000
Strings in the alphabetical order. Starting from a to z like
a,aa,…..z. Generating the strings manually takes large
amount of time and it is recommended to use the database
with thousands of words, by connecting the database to the
required system.
String Transformation: It means we have given the user
with the advantage of the String Generation along with the
String alias. For example if the end user have typed
“TKDE” which is equal to “Transactions on Knowledge and
Data Engineering”, the search interface may be able to find
the related result.
String Mining: The User can be able to download the string
along with its synonyms and also he can be able to
download its related substrings and its inverse etc. The user
can also check whether the given string is present in the
collection of strings, if it is present in the group the result
will be “String is Found” and if string is not present the
result will be ”String is Not Found”.
4. EXISTING & PROPOSED SYSTEMS
Earlier work on string transformation is of two categories
the initial works mainly focused on the methods that are
employed for the generation of stings in efficient manner.
The later works tried to develop the models with various
types of approaches. Spelling error correction, query
reformulation and synonym mining for the related query are
not taken into consideration in the earlier work. In the
previous methods and techniques they are not considered the
efficiency and accuracy as the important factors. A log-
linear method for string transformation with the use of
efficient method for generating string is used in the present
work. Two specific Applications are associated with our
method namely
1. Spelling error Correction of Queries.
2. Query Reformulation in web search.

_______________________________________________________________________________________
As the proposed system is focused on the log-linear model
for string Transformation Along with the spelling error
check and query reformulation in web search, the efficiency
and accuracy can be enhanced. Query reformulation helps
the user in generating the related substrings in the top k
candidate list.
4.1 Model for String Transformation
Here we are proposing a alignment procedure which is
known as the edit distance based alignment.
Fig-2 Edit distance based alignment
It consists of three steps. In the initial step the query is
represented in edit distance style. In the second step we
derive a rule by which the representation is done in the
previous step. We expand the rules with context in the last
step. The representation is as follows.
4.2 Architecture
Fig 1 Architecture of proposed system
The above figure indicates the proposed architecture in
which the database, admin and the end user involves. In the
string generation phase the strings are generated by the
admin either by manually or the database with huge amount
of data is connected. When the user want to perform the
operation such as retrieving the data from the web,
regarding to the query the transformation process is done at
the browser in order to rectify any mistakes regarding to
spelling etc. the query reformulation is done whine receiving
the query from the user at the search interface.
5. CONCLUSION
In this paper, we have projected new statistical learning
methods for string transformation. Proposed method is
narrative and distinctive in its sculpt, learning algorithm,
and the algorithm of string generation. Two detailed
applications are associated with our technique, they are
spelling error rectification in queries and the query
reformulation during web search. Experimental
consequences on two huge data sets and Microsoft Speller
shows that proposed method improves on the baselines in
case of accurateness and effectiveness. Our proposed
method is mainly useful when the problem occurs on a
outsized scale.
In case of large scale systems our method of query
reformulation is well used because it is very easy to retrieve
the information regarding to the search query even in the
case of substitutes. Our proposed log linear model can be
used in these system for efficient results.
REFERENCES
[1]. M. Li, M. Zhu, Y. Zhang and M. Zhou, “Exploring
distributional likeness based methods for query spelling
error correction”.
[2]. D. Roth, R. A. Golding “A winnow-based method to
context-sensitive correction”.
[3]. J. Gauo, H. Li, and X. Cheng, G. Xu, “A combined and
discriminative model for query alteration,” in Proceedings in
the 31st annual intercontinental ACM SIGIR confernce on
Research and devlopment in information recovery.
[4]. A. Bem, C. Li, and J. Lu, S. Ji, “Space-constrained
gram-based indexing for well-organized rough string
search,” Proceedings in the 2009 IEEE International
Confernce on Data Mining Engineering.
[5]. E. Brill and R. C. Moore, “An improved error model for
noisy channel spelling correction,” in Proceedings of the
38th Annual Meeting on Association for Computational
Linguistics.
[6]. G. Xu and J. Xu, “Learning similarity function for rare
queries,” in Proc. 4th ACM Int. Conf. Web Search and Data
Mining, NewYork.
[7]. C. A. Knoblock, S. Tejada, “Learning province self-
sufficient string transformation in weights for high
accurateness object classification,” in Proc. ACM SIGKDD
Int. Conf. Knowledge and Data Mining.
[8]. A. Arasu, S. Chaudhuri, and R. Kaushik, “Learning
transformations through examples,” Proc. VLDB Endow.,
vol. 2, pp. 514–525, 2009.
[9]. S. Tejada, C. Knoblock, and S. Minton, “Learning
domainindependent string transformation for high accuracy
identification,” Proc. 8th ACM SIGKDD Int. Conf.
Knowledge and Data Mining, New York, USA, 2002.
[10]. C. Li, “Efficient estimated search for string
collections,” VLDB Endow., vol. 3, no. 2,. 1660–1661
[11]. C. Li, B. Wang, and X. Yang, Improving presentation
of estimated queries on thread collections using variable-
length ,” Proc. 33rd Int. Conf. Very Large Data Bases,
Vienna,

_______________________________________________________________________________________
[12]. X. Yang, C. Li, B. Wang, “Cost-based variable-length-
gram collection for string collections to hold up estimated
queries competently,” in Proc. ACM SIGMOD Int. Conf.
Data Mining, New York, USA, pp. 353–364.
BIOGRAPHIES
M . Kiran kumar, M-Tech, Department of
CSE, JNTUA college of engineering,
Pulivendula, Andhra Pradesh, India.
Smt. S.Jessica Saritha is currently
working as an Assistant Professor in
Department of CSE , JNTUA College of
Engineering, Pulivendula Andhra Pradesh
India Her Research interests are Data
mining and diustributed computing.

An efficient approach to query reformulation in web search

More Related Content

What's hot (20)

Similar to An efficient approach to query reformulation in web search (20)

More from eSAT Journals (20)

Recently uploaded (20)

An efficient approach to query reformulation in web search