IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 06 | June-2015, Available @ http://www.ijret.org 172
AN EFFICIENT APPROACH TO QUERY REFORMULATION IN WEB SEARCH
M. Kiran Kumar1, S. Jessica Saritha2
1 M-Tech, Computer Science & Engineering, JNTUA College of Engineering, Pulivendula, AP, India
2 Assistant Professor, Computer Science & Engineering, JNTUA College of Engineering, Pulivendula, AP, India.
Abstract
A wide range of problems in natural language processing, data mining, bioinformatics, and information retrieval can be categorized as string transformation, which is the task considered here: given an input string, the system generates the top k most likely output strings corresponding to that input string. In this paper we propose a novel probabilistic approach to string transformation that is both accurate and efficient. The approach consists of a log-linear model, a method for training the model, and an algorithm for generating the top k candidates. The log-linear model is defined as the conditional probability distribution of an output string and a rule set for the transformation, conditioned on the input string. The pruning-based string generation algorithm is guaranteed to generate the top k list. The proposed method is applied to spelling error correction of queries and to query reformulation in web search. Previous work did not consider spelling error correction and query reformulation of related queries, did not treat efficiency as an important issue, and did not focus on improving both accuracy and efficiency in string transformation. Experimental results on large-scale data show that the proposed method is highly accurate and efficient.
Keywords: Log linear method, Query reformulation, Spelling Error correction.
--------------------------------------------------------------------***--------------------------------------------------------------------
1. INTRODUCTION
This paper focuses on string transformation, a problem common to many applications. In natural language processing, spelling error correction, pronunciation generation, and word stemming can all be categorized as string transformation. String transformation can also be used for query reformulation and query suggestion in search. In data mining, it is used in synonym mining and database record matching. Because many of these applications run online, the transformation must be both accurate and efficient. Given an input string, the system generates the top k most likely output strings corresponding to the input string by applying a set of operators. Here a string can be any sequence of tokens, such as characters or words. Each operator is a rule that replaces a substring with an equivalent substring. The probability of a transformation can represent the relationship, relevance, and association between strings in a particular application. Although some progress has been made, further investigation of the task is still needed, mainly from the perspective of improving both accuracy and efficiency, which is precisely the objective of this work.
Depending on whether a dictionary is used, string transformation can be performed in two different settings. When a dictionary is used, the output strings must exist in the dictionary, even though the dictionary can be very large. Without loss of generality, we specifically study spelling error correction of queries and query reformulation in web search. In the first task, a string consists of characters; in the second task, a string consists of words.
Spelling error correction of queries typically consists of two stages: candidate generation and candidate selection. Candidate generation identifies the most likely corrections of a misspelled word from the dictionary. In this case, a string of characters is the input, and the operators correspond to insertion, deletion, and substitution of characters, with or without surrounding characters, for example, “lly” → “ly”. Clearly, candidate generation is an instance of string transformation. Note that candidate generation is concerned with a single word; after candidate generation, the other words in the query can be leveraged to make the final candidate selection.
Query reformulation addresses mismatches between query terms and document terms. For example, a user may enter an abbreviation as the query while the relevant document contains only its full form, so the document is not identified as relevant. If the query is “NY Times” and the document contains “New York Times”, the query and the document do not match well, and the document will not be ranked high. Query reformulation attempts to transform “NY Times” into “New York Times” and thus achieve a better match between the query and the document.
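As a rough illustration of the candidate-generation step described above, the Python sketch below enumerates every string one character insertion, deletion, or substitution away from a misspelled word and keeps only those found in a dictionary. It is not the authors' method: the lowercase alphabet, the single-edit limit, and the toy dictionary are assumptions for the example, whereas the approach in this paper uses learned operators that may include surrounding characters, such as “lly” → “ly”.

```python
import string
from typing import Set


def edits1(word: str, alphabet: str = string.ascii_lowercase) -> Set[str]:
    """All strings one character insertion, deletion, or substitution away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    substitutes = {
        left + c + right[1:]
        for left, right in splits if right
        for c in alphabet if c != right[0]
    }
    inserts = {left + c + right for left, right in splits for c in alphabet}
    return deletes | substitutes | inserts


def generate_candidates(word: str, dictionary: Set[str]) -> Set[str]:
    """Candidate generation: keep only edits that are real dictionary words."""
    return {w for w in edits1(word) if w in dictionary}


# Hypothetical toy dictionary.
dictionary = {"finland", "final", "find", "retrieve"}
print(sorted(generate_candidates("finlad", dictionary)))
```

Only “finland” is one edit away from “finlad” here; candidate selection would then use the rest of the query to choose among candidates when there are several.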
Earlier work on string transformation falls into two groups. Some work mainly considers efficient generation of strings, while other work tries to learn the model with different approaches, such as a generative model, a logistic regression model, and a discriminative model. However, these modeling methods do not treat efficiency as an important factor.
2. LITERATURE SURVEY
It is important for a browser to check the spelling of query terms entered in the search interface, so that the search is efficient and results are returned effectively. Li et al. [1] describe novel methods that use distributional similarity estimated from query logs to learn improved query spelling correction models. The key idea of these methods is to measure the distributional similarity between two terms: it is high between frequently occurring misspellings and their corrections, and low between two unrelated terms that merely have similar spellings.
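To make the idea concrete, the sketch below represents each term by counts of the other terms it co-occurs with in the same query and compares terms by cosine similarity. This is one simple choice of distributional similarity, assumed for illustration and not necessarily the measure used in the work described above; the five-query log is made up.

```python
import math
from collections import Counter
from typing import Dict, List


def context_vectors(queries: List[str]) -> Dict[str, Counter]:
    """Map each term to counts of the other terms it co-occurs with in a query."""
    vectors: Dict[str, Counter] = {}
    for query in queries:
        terms = query.split()
        for i, term in enumerate(terms):
            context = terms[:i] + terms[i + 1:]
            vectors.setdefault(term, Counter()).update(context)
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical query log.
log = [
    "cheap flights boston",
    "cheap flihgts boston",
    "flights to boston",
    "flihgts to boston",
    "fight club movie",
]
vecs = context_vectors(log)
print(round(cosine(vecs["flights"], vecs["flihgts"]), 2))
print(round(cosine(vecs["flights"], vecs["fight"]), 2))
```

The misspelling “flihgts” appears in the same contexts as “flights” and scores 1.0, while “fight”, which is close in spelling but unrelated in usage, scores 0.0.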
The winnow-based approach to context-sensitive spelling correction [2] starts from the observation that a large class of problems requires characterizing linguistic context. It focuses on techniques for correcting errors in which a word is misplaced by another valid word, so as to avoid such problems when searching the web; earlier techniques used winnow-based approaches to correct these errors, and this work adopts properties of those methods to avoid such mistakes. For example, if the string “too” is placed where “to” is intended, the meaning changes and the sentence becomes wrong. This is the task of fixing spelling errors that happen to result in valid words, such as substituting “to” for “too”, “casual” for “causal”, and so on.
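The winnow idea can be sketched as a mistake-driven linear classifier over context features. The Python sketch below is a simplified Winnow2-style learner that decides whether “too” (True) or “to” (False) fits a context; the promotion factor, the threshold, the feature set, and the four training contexts are all assumptions for illustration and are far simpler than the cited winnow-based system.

```python
from typing import Dict, Iterable, List, Tuple


class Winnow:
    """Winnow2-style linear-threshold learner over binary context features."""

    def __init__(self, alpha: float = 2.0, threshold: float = 4.0) -> None:
        self.alpha = alpha          # promotion/demotion factor (assumed value)
        self.threshold = threshold  # decision threshold (assumed value)
        self.weights: Dict[str, float] = {}

    def score(self, features: Iterable[str]) -> float:
        # Features never seen before start with weight 1.0.
        return sum(self.weights.get(f, 1.0) for f in set(features))

    def predict(self, features: Iterable[str]) -> bool:
        return self.score(features) >= self.threshold

    def update(self, features: Iterable[str], label: bool) -> None:
        active = set(features)
        if self.predict(active) == label:
            return  # mistake-driven: weights change only on errors
        # Promote active weights after a missed positive, demote after a false positive.
        factor = self.alpha if label else 1.0 / self.alpha
        for f in active:
            self.weights[f] = self.weights.get(f, 1.0) * factor


# Hypothetical contexts: True means "too" is correct, False means "to".
train: List[Tuple[List[str], bool]] = [
    (["much", "sugar"], True),   # "... too much sugar"
    (["go", "school"], False),   # "... go to school"
    (["late", "now"], True),     # "... too late now"
    (["went", "store"], False),  # "... went to the store"
]
clf = Winnow()
for _ in range(3):
    for feats, label in train:
        clf.update(feats, label)
print(clf.predict(["much", "sugar"]), clf.predict(["go", "school"]))
```

Multiplicative updates let relevant context words gain weight quickly, which is the main appeal of Winnow when the feature space is large and most features are irrelevant.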
The unified and discriminative method for query modification [3] provides alternative substrings in place of the original string when the original string is found to be misspelled.
3. FUNDAMENTALS
3.1 Spelling Error Correction
In this module, users can check the spelling of a query and have it corrected automatically. Efficiency is critical for this task for two reasons:
(1) The dictionary is very large, and (2) the response time must be very short.
The first point means that finding the intended word for a misspelled string is difficult when the dictionary is very large. The second point means that the response time depends on the size of the dictionary: the larger the dictionary, the longer the response time, and the smaller the dictionary, the shorter the response time.
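One common way to keep lookups fast as the dictionary grows (an assumption here, not necessarily the data structure the authors use) is to store the dictionary in a trie and search it with a row-by-row edit-distance computation, abandoning any branch whose best possible distance already exceeds the bound. A minimal Python sketch with a made-up three-word dictionary:

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set when a dictionary word ends here


def build_trie(words: List[str]) -> TrieNode:
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def search(root: TrieNode, query: str, max_dist: int) -> List[Tuple[str, int]]:
    """Return (word, distance) pairs with Levenshtein distance <= max_dist."""
    results: List[Tuple[str, int]] = []
    first_row = list(range(len(query) + 1))

    def walk(node: TrieNode, ch: str, prev_row: List[int]) -> None:
        # One row of the edit-distance table for the trie prefix ending in ch.
        row = [prev_row[0] + 1]
        for i in range(1, len(query) + 1):
            cost = 0 if query[i - 1] == ch else 1
            row.append(min(row[i - 1] + 1, prev_row[i] + 1, prev_row[i - 1] + cost))
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        # Prune: if every cell exceeds the bound, no extension of this prefix can match.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    for ch, child in root.children.items():
        walk(child, ch, first_row)
    return sorted(results, key=lambda pair: (pair[1], pair[0]))


trie = build_trie(["literature", "literal", "retrieve"])
print(search(trie, "liyerature", 1))
```

Shared prefixes are scored once, and pruning skips whole subtrees, so the work grows with the part of the dictionary that is actually close to the query rather than with the full dictionary size.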
3.1.1 Word Pair Mining
Web search by a user is session-based, and a session often contains frequently made mistakes and spelling errors in the query terms. To handle these mistakes, strings must be paired in the browser: each misspelled string is paired with the correct spelling of the same word, so that the misspelled word can be replaced with the correct spelling. The following are some examples of word pairs consisting of a misspelled word and its correct spelling.
Table-1: Examples of Word Pairs
Misspelled    Correct       Misspelled    Correct
Aacoustic     Acoustic      Chevorle      Chevrolet
Liyerature    Literature    Tournemen     Tournament
Shingle       Shingle       Newpape       Newspaper
Finlad        Finland       Ccomponet     Component
Reteive       Retrieve      Olimpick      Olympic
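The pairing described above can be sketched as mining consecutive queries within a session: if a query is immediately followed by a different query with a small edit distance, the pair is treated as (misspelling, correction). The heuristic, the distance bound of 2, and the sample sessions below are assumptions for illustration, not the authors' mining procedure.

```python
from typing import List, Set, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(cur[j - 1] + 1, prev[j] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def mine_word_pairs(sessions: List[List[str]], max_dist: int = 2) -> Set[Tuple[str, str]]:
    """Pair each query with the next one in its session when they are close in spelling."""
    pairs: Set[Tuple[str, str]] = set()
    for session in sessions:
        for first, second in zip(session, session[1:]):
            if first != second and levenshtein(first, second) <= max_dist:
                pairs.add((first, second))
    return pairs


# Hypothetical sessions.
sessions = [
    ["finlad", "finland", "finland weather"],
    ["olimpick", "olympic"],
    ["chevrolet", "toyota"],
]
print(sorted(mine_word_pairs(sessions)))
```

Real query logs are noisy, so such pairs would normally be filtered further, for example by frequency, before being used as training data.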
3.2 String Transformation
Two techniques are used for searching strings: 1) String Generation and 2) String Transformation.
String Generation: 50,000 strings are generated in alphabetical order, from a to z (a, aa, …, z). Because generating the strings manually takes a large amount of time, it is recommended to connect the system to a database containing thousands of words.
String Transformation: This gives the user the advantages of string generation together with string aliases. For example, if the user types “TKDE”, which stands for “Transactions on Knowledge and Data Engineering”, the search interface can find the related results.
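A minimal sketch of the alias idea: a hypothetical alias table maps a word sequence such as “tkde” or “ny” to its expansion, and the function returns the original query plus every variant obtained by applying one alias at one position. The alias table and the lower-casing are assumptions; a real system would mine such rules and rank the resulting variants.

```python
from typing import Dict, List, Tuple

# Hypothetical word-level alias rules: word sequence -> equivalent word sequence.
ALIASES: Dict[Tuple[str, ...], Tuple[str, ...]] = {
    ("ny",): ("new", "york"),
    ("tkde",): ("transactions", "on", "knowledge", "and", "data", "engineering"),
}


def reformulate(query: str, aliases: Dict[Tuple[str, ...], Tuple[str, ...]]) -> List[str]:
    """Return the query plus each variant produced by one word-level alias rule."""
    words = tuple(query.lower().split())
    variants = [" ".join(words)]
    for lhs, rhs in aliases.items():
        n = len(lhs)
        for i in range(len(words) - n + 1):
            if words[i:i + n] == lhs:
                # Splice the expansion in place of the matched word sequence.
                variants.append(" ".join(words[:i] + rhs + words[i + n:]))
    return variants


print(reformulate("NY Times", ALIASES))
print(reformulate("TKDE 2014", ALIASES))
```

Working on word tuples rather than raw characters keeps a rule such as “ny” from firing inside unrelated words like “sunny”.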
String Mining: The user can download a string along with its synonyms, its related substrings, its inverse, and so on. The user can also check whether a given string is present in the collection of strings: if it is present, the result is “String is Found”; otherwise, the result is “String is Not Found”.
4. EXISTING & PROPOSED SYSTEMS
Earlier work on string transformation falls into two categories: the first focused on methods for generating strings efficiently, and the second tried to develop models using various types of approaches. Spelling error correction, query reformulation, and synonym mining for related queries were not taken into consideration in the earlier work, and the previous methods did not treat efficiency and accuracy as important factors. The present work uses a log-linear model for string transformation together with an efficient string generation algorithm. Two specific applications are associated with our method, namely:
1. Spelling error correction of queries.
2. Query reformulation in web search.
Because the proposed system builds on the log-linear model for string transformation, together with spelling error checking and query reformulation in web search, both efficiency and accuracy can be improved. Query reformulation helps the user by generating related substrings in the top k candidate list.
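The log-linear model can be illustrated with a simplified sketch. Assume each rule r = (α → β) has a learned weight λ_r; a derivation that applies the rule set R receives the unnormalized score exp(Σ_{r∈R} λ_r), and probabilities are obtained by normalizing over all derivations of the generated candidates. The sketch below restricts every derivation to a single rule, sums over derivations that yield the same output string, and returns the k most probable outputs. The rule weights are invented for the example, and the pruning-based generation algorithm is not shown.

```python
import heapq
import math
from typing import Dict, List, Tuple

# A rule (alpha, beta) rewrites substring alpha as beta.
Rule = Tuple[str, str]


def top_k(source: str, weights: Dict[Rule, float], k: int) -> List[Tuple[str, float]]:
    """Return the k most probable outputs under a single-rule log-linear model."""
    unnormalized: Dict[str, float] = {}
    for (alpha, beta), weight in weights.items():
        start = source.find(alpha)
        while start != -1:
            candidate = source[:start] + beta + source[start + len(alpha):]
            # Each derivation uses one rule, so its score is exp(lambda_r);
            # different derivations of the same output are summed.
            unnormalized[candidate] = unnormalized.get(candidate, 0.0) + math.exp(weight)
            start = source.find(alpha, start + 1)
    if not unnormalized:
        return []
    z = sum(unnormalized.values())  # normalizer over the generated candidates
    ranked = heapq.nlargest(k, unnormalized.items(), key=lambda item: item[1])
    return [(candidate, mass / z) for candidate, mass in ranked]


# Hypothetical learned weights lambda_r.
weights = {("ad", "and"): 2.0, ("d", "nd"): 1.5, ("fin", "fun"): -1.0}
for candidate, prob in top_k("finlad", weights, k=2):
    print(candidate, round(prob, 3))
```

Both rules “ad” → “and” and “d” → “nd” produce “finland”, so their probability mass is combined, which is why it ranks well above “funlad”.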
4.1 Model for String Transformation
Here we adopt an alignment procedure known as edit-distance-based alignment.
Fig-2 Edit distance based alignment
It consists of three steps. In the first step, the input string and the output string are aligned in edit-distance style. In the second step, rules are derived from the alignment obtained in the first step. In the last step, the rules are expanded with context. The representation is as follows.
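The three steps can be sketched in Python: align the two strings with a Levenshtein dynamic program, take each non-matching aligned chunk as a rule, and expand it with up to `window` characters of context on each side. The “^” and “$” boundary markers, the one-character window, and the example pair “nicrosoft” → “microsoft” are assumptions for illustration.

```python
from typing import List, Set, Tuple


def align(source: str, target: str) -> List[Tuple[str, str]]:
    """Levenshtein alignment as a list of (source_chunk, target_chunk) operations."""
    n, m = len(source), len(target)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1, dist[i][j - 1] + 1, dist[i - 1][j - 1] + cost)
    ops: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:  # backtrace from the bottom-right cell
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (source[i - 1] != target[j - 1]):
            ops.append((source[i - 1], target[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append((source[i - 1], ""))  # deletion
            i -= 1
        else:
            ops.append(("", target[j - 1]))  # insertion
            j -= 1
    return ops[::-1]


def derive_rules(source: str, target: str, window: int = 1) -> Set[Tuple[str, str]]:
    """Derive rules from the alignment and expand each with surrounding context."""
    padded = "^" + source + "$"
    rules: Set[Tuple[str, str]] = set()
    pos = 0  # index into source of the next aligned character
    for src_chunk, tgt_chunk in align(source, target):
        if src_chunk != tgt_chunk:
            start, end = pos + 1, pos + 1 + len(src_chunk)  # indices into padded
            for left in range(window + 1):
                for right in range(window + 1):
                    if start - left < 0 or end + right > len(padded):
                        continue
                    lctx = padded[start - left:start]
                    rctx = padded[end:end + right]
                    lhs = lctx + src_chunk + rctx
                    if lhs:  # skip context-free insertions (empty left-hand side)
                        rules.add((lhs, lctx + tgt_chunk + rctx))
        pos += len(src_chunk)
    return rules


print(sorted(derive_rules("nicrosoft", "microsoft")))
```

For “nicrosoft” → “microsoft”, the single substitution n → m is expanded into n → m, ^n → ^m, ni → mi, and ^ni → ^mi; larger windows would add longer, more specific rules.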
4.2 Architecture
Fig 1 Architecture of proposed system
The figure above shows the proposed architecture, which involves the database, the admin, and the end user. In the string generation phase, the admin generates strings either manually or by connecting a database containing a large amount of data. When the user performs an operation such as retrieving data from the web, the transformation process is applied to the query at the browser to rectify spelling mistakes and similar errors, and query reformulation is performed while the query is received from the user at the search interface.
5. CONCLUSION
In this paper, we have proposed a new statistical learning approach to string transformation. The proposed method is novel and unique in its model, its learning algorithm, and its string generation algorithm. Two specific applications are associated with our technique: spelling error correction of queries and query reformulation in web search. Experimental results on two large data sets and Microsoft Speller show that the proposed method improves on the baselines in terms of accuracy and efficiency. Our proposed method is particularly useful when the problem occurs on a large scale.
Our query reformulation method is well suited to large-scale systems, because it makes it easy to retrieve information relevant to the search query even when substitutes are used. Our proposed log-linear model can be used in such systems to obtain efficient results.
REFERENCES
[1]. M. Li, M. Zhu, Y. Zhang, and M. Zhou, “Exploring distributional similarity based models for query spelling correction.”
[2]. D. Roth and R. A. Golding, “A winnow-based approach to context-sensitive spelling correction.”
[3]. J. Guo, H. Li, X. Cheng, and G. Xu, “A unified and discriminative model for query refinement,” in Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[4]. A. Behm, C. Li, J. Lu, and S. Ji, “Space-constrained gram-based indexing for efficient approximate string search,” in Proceedings of the 2009 IEEE International Conference on Data Engineering.
[5]. E. Brill and R. C. Moore, “An improved error model for noisy channel spelling correction,” in Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics.
[6]. G. Xu and J. Xu, “Learning similarity function for rare queries,” in Proc. 4th ACM Int. Conf. Web Search and Data Mining, New York.
[7]. C. A. Knoblock and S. Tejada, “Learning domain-independent string transformation weights for high accuracy object identification,” in Proc. ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining.
[8]. A. Arasu, S. Chaudhuri, and R. Kaushik, “Learning string transformations from examples,” Proc. VLDB Endow., vol. 2, pp. 514–525, 2009.
[9]. S. Tejada, C. Knoblock, and S. Minton, “Learning domain-independent string transformation weights for high accuracy object identification,” Proc. 8th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, New York, USA, 2002.
[10]. C. Li, “Efficient approximate search for string collections,” Proc. VLDB Endow., vol. 3, no. 2, pp. 1660–1661.
[11]. C. Li, B. Wang, and X. Yang, “Improving performance of approximate queries on string collections using variable-length grams,” Proc. 33rd Int. Conf. Very Large Data Bases, Vienna.
[12]. X. Yang, C. Li, and B. Wang, “Cost-based variable-length-gram selection for string collections to support approximate queries efficiently,” in Proc. ACM SIGMOD Int. Conf. Management of Data, New York, USA, pp. 353–364.
BIOGRAPHIES
M. Kiran Kumar, M-Tech, Department of CSE, JNTUA College of Engineering, Pulivendula, Andhra Pradesh, India.
Smt. S. Jessica Saritha is currently working as an Assistant Professor in the Department of CSE, JNTUA College of Engineering, Pulivendula, Andhra Pradesh, India. Her research interests are data mining and distributed computing.

More Related Content

PDF
C03504013016
PDF
Hybrid approach for generating non overlapped substring using genetic algorithm
PDF
[IJET-V2I3P19] Authors: Priyanka Sharma
PDF
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
PDF
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
PDF
Proposed Method for String Transformation using Probablistic Approach
PDF
Named Entity Recognition using Tweet Segmentation
PDF
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
C03504013016
Hybrid approach for generating non overlapped substring using genetic algorithm
[IJET-V2I3P19] Authors: Priyanka Sharma
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
Proposed Method for String Transformation using Probablistic Approach
Named Entity Recognition using Tweet Segmentation
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION

What's hot (20)

PDF
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
PDF
Cohesive Software Design
PDF
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
PDF
Lexical Analysis to Effectively Detect User's Opinion
PDF
Tracing Requirements as a Problem of Machine Learning
PDF
An Approach To Automatic Text Summarization Using Simplified Lesk Algorithm A...
PDF
Using Hybrid Approach Analyzing Sentence Pattern by POS Sequence over Twitter
PDF
IRJET- An Analysis of Recent Advancements on the Dependency Parser
PDF
Context Sensitive Relatedness Measure of Word Pairs
PDF
Ijarcet vol-3-issue-1-9-11
PDF
L1803058388
PDF
English to punjabi machine translation system using hybrid approach of word s
PDF
Extraction of Data Using Comparable Entity Mining
PDF
Lexicon Based Emotion Analysis on Twitter Data
PDF
Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...
PDF
IRJET- Public Opinion Analysis on Law Enforcement
PDF
Keywords- Based on Arabic Information Retrieval Using Light Stemmer
PDF
A Novel Text Classification Method Using Comprehensive Feature Weight
PDF
Enhancing Keyword Query Results Over Database for Improving User Satisfaction
PDF
A Novel Approach for Rule Based Translation of English to Marathi
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
Cohesive Software Design
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
Lexical Analysis to Effectively Detect User's Opinion
Tracing Requirements as a Problem of Machine Learning
An Approach To Automatic Text Summarization Using Simplified Lesk Algorithm A...
Using Hybrid Approach Analyzing Sentence Pattern by POS Sequence over Twitter
IRJET- An Analysis of Recent Advancements on the Dependency Parser
Context Sensitive Relatedness Measure of Word Pairs
Ijarcet vol-3-issue-1-9-11
L1803058388
English to punjabi machine translation system using hybrid approach of word s
Extraction of Data Using Comparable Entity Mining
Lexicon Based Emotion Analysis on Twitter Data
Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...
IRJET- Public Opinion Analysis on Law Enforcement
Keywords- Based on Arabic Information Retrieval Using Light Stemmer
A Novel Text Classification Method Using Comprehensive Feature Weight
Enhancing Keyword Query Results Over Database for Improving User Satisfaction
A Novel Approach for Rule Based Translation of English to Marathi
Ad

Similar to An efficient approach to query reformulation in web search (20)

PDF
IRJET- Spelling and Grammar Checker and Template Suggestion
PDF
EasyChair-Preprint-7375.pdf
PDF
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
PDF
Improved method for pattern discovery in text mining
PDF
Improved method for pattern discovery in text mining
PDF
Improving Annotations in Digital Documents using Document Features and Fuzzy ...
PDF
CONCEPTUAL SIMILARITY MEASUREMENT ALGORITHM FOR DOMAIN SPECIFIC ONTOLOGY
PDF
Conceptual Similarity Measurement Algorithm For Domain Specific Ontology
PDF
A lexicon based algorithm for noisy text normalization as pre processing for ...
PDF
Conceptual similarity measurement algorithm for domain specific ontology[
PDF
Semantic Based Document Clustering Using Lexical Chains
PDF
IRJET-Semantic Based Document Clustering Using Lexical Chains
PDF
IRJET- Automatic Recapitulation of Text Document
DOCX
2014 IEEE DOTNET DATA MINING PROJECT A probabilistic approach to string trans...
DOCX
IEEE 2014 DOTNET DATA MINING PROJECTS A probabilistic approach to string tran...
PDF
Aq35241246
PDF
IRJET- Missing Value Evaluation in SQL Queries: A Survey
PDF
Missing Value Evaluation in SQL Queries: A Survey
DOCX
2014 IEEE JAVA DATA MINING PROJECT A probabilistic approach to string transfo...
DOCX
2014 IEEE JAVA DATA MINING PROJECT A probabilistic approach to string transfo...
IRJET- Spelling and Grammar Checker and Template Suggestion
EasyChair-Preprint-7375.pdf
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
Improved method for pattern discovery in text mining
Improved method for pattern discovery in text mining
Improving Annotations in Digital Documents using Document Features and Fuzzy ...
CONCEPTUAL SIMILARITY MEASUREMENT ALGORITHM FOR DOMAIN SPECIFIC ONTOLOGY
Conceptual Similarity Measurement Algorithm For Domain Specific Ontology
A lexicon based algorithm for noisy text normalization as pre processing for ...
Conceptual similarity measurement algorithm for domain specific ontology[
Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET- Automatic Recapitulation of Text Document
2014 IEEE DOTNET DATA MINING PROJECT A probabilistic approach to string trans...
IEEE 2014 DOTNET DATA MINING PROJECTS A probabilistic approach to string tran...
Aq35241246
IRJET- Missing Value Evaluation in SQL Queries: A Survey
Missing Value Evaluation in SQL Queries: A Survey
2014 IEEE JAVA DATA MINING PROJECT A probabilistic approach to string transfo...
2014 IEEE JAVA DATA MINING PROJECT A probabilistic approach to string transfo...
Ad

More from eSAT Journals (20)

PDF
Mechanical properties of hybrid fiber reinforced concrete for pavements
PDF
Material management in construction – a case study
PDF
Managing drought short term strategies in semi arid regions a case study
PDF
Life cycle cost analysis of overlay for an urban road in bangalore
PDF
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
PDF
Laboratory investigation of expansive soil stabilized with natural inorganic ...
PDF
Influence of reinforcement on the behavior of hollow concrete block masonry p...
PDF
Influence of compaction energy on soil stabilized with chemical stabilizer
PDF
Geographical information system (gis) for water resources management
PDF
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
PDF
Factors influencing compressive strength of geopolymer concrete
PDF
Experimental investigation on circular hollow steel columns in filled with li...
PDF
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
PDF
Evaluation of punching shear in flat slabs
PDF
Evaluation of performance of intake tower dam for recent earthquake in india
PDF
Evaluation of operational efficiency of urban road network using travel time ...
PDF
Estimation of surface runoff in nallur amanikere watershed using scs cn method
PDF
Estimation of morphometric parameters and runoff using rs & gis techniques
PDF
Effect of variation of plastic hinge length on the results of non linear anal...
PDF
Effect of use of recycled materials on indirect tensile strength of asphalt c...
Mechanical properties of hybrid fiber reinforced concrete for pavements
Material management in construction – a case study
Managing drought short term strategies in semi arid regions a case study
Life cycle cost analysis of overlay for an urban road in bangalore
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory investigation of expansive soil stabilized with natural inorganic ...
Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of compaction energy on soil stabilized with chemical stabilizer
Geographical information system (gis) for water resources management
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Factors influencing compressive strength of geopolymer concrete
Experimental investigation on circular hollow steel columns in filled with li...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Evaluation of punching shear in flat slabs
Evaluation of performance of intake tower dam for recent earthquake in india
Evaluation of operational efficiency of urban road network using travel time ...
Estimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of morphometric parameters and runoff using rs & gis techniques
Effect of variation of plastic hinge length on the results of non linear anal...
Effect of use of recycled materials on indirect tensile strength of asphalt c...

Recently uploaded (20)

PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PDF
composite construction of structures.pdf
PDF
PPT on Performance Review to get promotions
PPT
Project quality management in manufacturing
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PDF
Digital Logic Computer Design lecture notes
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PPTX
UNIT 4 Total Quality Management .pptx
PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PPT
Mechanical Engineering MATERIALS Selection
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PPTX
Lecture Notes Electrical Wiring System Components
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PPTX
Construction Project Organization Group 2.pptx
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
composite construction of structures.pdf
PPT on Performance Review to get promotions
Project quality management in manufacturing
UNIT-1 - COAL BASED THERMAL POWER PLANTS
Operating System & Kernel Study Guide-1 - converted.pdf
Digital Logic Computer Design lecture notes
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
UNIT 4 Total Quality Management .pptx
Automation-in-Manufacturing-Chapter-Introduction.pdf
Mechanical Engineering MATERIALS Selection
Model Code of Practice - Construction Work - 21102022 .pdf
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
Lecture Notes Electrical Wiring System Components
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
Construction Project Organization Group 2.pptx
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx

An efficient approach to query reformulation in web search

  • 1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 04 Issue: 06 | June-2015, Available @ http://www.ijret.org 172 AN EFFICIENT APPROACH TO QUERY REFORMULATION IN WEB SEARCH M. Kiran Kumar1 , S. Jessica Saritha2 1 M-Tech, Computer Science & Engineering, JNTUA College of Engineering Pulivendula, AP, India 2 Assistant Professor, Computer Science & Engineering, JNTUA College of Engineering, Pulivendula, AP, India. Abstract Wide range of problems regarding to natural language processing, mining of data, bioinformatics and information retrieval can be categorized as string transformation, the following task refers the same. If we give an input string, the system will generates the top k most equivalent output strings which are related to the same input string. In this paper we proposes a narrative and probabilistic method for the transformation of string, which is considered as accurate and also efficient. The approach uses a log linear model, along with the method used for training the model, and also an algorithm that generates the top k outcomes. Log linear method can be defined as restrictive possibility distribution of a result string and the set of rules for the alteration conditioned on key string. It is guaranteed that the resultant top k list will be generated using the algorithm for string generation which is based on pruning. The projected technique is applied to correct the spelling error in query as well as reformulation of queries in case of web based search. Spelling error correction, query reformulation for the related query is not considered in the previous work. Efficiency is not considered as an important issue taken into the consideration in earlier methods and was not focused on improvement of accuracy and efficiency in string transformation. 
The experimental outcomes on huge scale data show that the projected method is extremely accurate and also efficient. Keywords: Log linear method, Query reformulation, Spelling Error correction. --------------------------------------------------------------------***-------------------------------------------------------------------- 1. INTRODUCTION This paper focuses on string transformation, which is the most common problem, in various applications. In the processing of natural language correction of spelling errors generation of pronunciations, word stemming can be categorized as string transformation. It can also be used in query reformulation and query implication in search. In the domain of data mining, string transformation is used in synonyms mining and database verification matching. Now days all the application are based on the network kit is compulsory that the transformation must be accurate as well as efficient. If we give an input string, the system will generates the top k most equivalent output strings which are related to the input string by the usage of many operations. Here any type of tokens such as string of characters, words. Every operator is considered as a rule which defines replacement of substring with its equivalent substring. The possibility of transformation can characterize relationship, importance, and connection between strings in a particular application. Though assured development has been made, supplementary examination of the job is still needed, mainly from the point of view of enhancing accuracy as well as efficiency, which is accurately the objective of this effort. Based on the dictionary usage string transformation can be performed on different settings, on is when the dictionary is used and the other is when not used.. When we use a dictionary, the resultant strings must be present in the dictionary, as the Volume of the dictionary is very large. 
Without loss of generality, we specifically study spelling error correction of queries and query reformulation in web search. In the first task, a string consists of characters; in the second task, a string consists of words.
Query spelling error correction typically consists of two stages: candidate generation and candidate selection. Candidate generation is used to find the most likely corrections of a misspelled word in a dictionary. In this case, the input is a string of characters, and the operators correspond to insertion, deletion and substitution of characters, with or without surrounding characters, for example, “lly” → “ly”. Candidate generation is therefore an instance of string transformation. Note that candidate generation is concerned with a single word; after candidate generation, the other words in the query can be used to make the final candidate selection.
Query reformulation addresses mismatches between the terms of a query and those of relevant documents. For example, a user may enter an abbreviation as the query while the relevant document contains the full form, so the search fails to identify the related records. If the query is “NY Times” and the document contains “New York Times”, the query and the document do not match well, and the document cannot be ranked high. Query reformulation attempts to transform “NY Times” into “New York Times” and thus create a better match between the query and the document.
Earlier work on string transformation falls into two groups. Some work mainly considers the efficient generation of strings. Other work tries to learn the model with different approaches, such as a generative model, a logistic regression model, or a discriminative model. However, efficiency is not considered an important factor in these methods.
2. LITERATURE SURVEY
It is important to check the spelling of the query entered in the search interface, both to make the search effective and to return results efficiently. Li et al. [1] describe novel methods that use distributional similarity estimated from query logs to learn improved query spelling correction models. The key idea is to measure the distributional similarity between two terms: it is high between frequently occurring misspellings and their corrections, and low between unrelated terms that merely have similar spellings.
According to the winnow-based approach to context-sensitive spelling correction [2], a large variety of problems require characterizing linguistic context. That work focuses on techniques for correcting misused words, so that such errors can be avoided when searching the web, and in this work we adopt properties of those methods to avoid such mistakes. For example, if the string “too” is placed where “to” was intended, the meaning changes and the sentence may become wrong.
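The distributional-similarity approach of reference [1] can be illustrated with a minimal sketch. Each term is represented by counts of the other terms it co-occurs with in logged queries, and two terms are compared by the cosine of these context vectors. The query log is invented for illustration, and this simplification is not the exact model of [1].

```python
import math
from collections import Counter

# Invented query log, for illustration only.
QUERY_LOG = [
    "britney spears concert",
    "britny spears concert",
    "britney spears songs",
    "britny spears songs",
    "brittany france travel",
]


def context_vectors(queries):
    """Map each term to a Counter of the terms it co-occurs with in the same query."""
    vectors = {}
    for query in queries:
        terms = query.lower().split()
        for i, term in enumerate(terms):
            ctx = vectors.setdefault(term, Counter())
            for j, other in enumerate(terms):
                if i != j:
                    ctx[other] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    vecs = context_vectors(QUERY_LOG)
    print(cosine(vecs["britny"], vecs["britney"]))   # close to 1: misspelling and its correction
    print(cosine(vecs["britny"], vecs["brittany"]))  # 0.0: similar spelling, unrelated term
```

The misspelling "britny" shares its query contexts with "britney", while "brittany", although close in spelling, appears in unrelated queries, which is the contrast the approach exploits.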
The winnow-based approach targets the task of fixing spelling errors that happen to result in valid words, such as substituting “to” for “too”, “casual” for “causal”, and so on. The unified and discriminative model for query refinement [3] provides alternative substrings in place of the original string when the original string is found to be misspelled.
3. FUNDAMENTALS
3.1 Spelling Error Correction
In this module, a user who wants to check the spelling of a query can have it checked and corrected automatically. Efficiency is critical for this task for two reasons: (1) the dictionary is very large, and (2) the response time must be very short. The first point means that when the dictionary is very large, it is difficult to find the intended string for a misspelled input. The second point follows from the first: the larger the dictionary, the longer a naive search takes, so an efficient search is needed to keep the response time short.
3.1.1 Word Pair Mining
Web search by a user is session based, and sessions contain frequently made mistakes and spelling errors in query terms. To correct those mistakes, pairing of strings must be done in the browser: a misspelled string is paired with the correct spelling of the same word, so that the misspelled word can be replaced with its correct spelling. Table-1 lists some examples of word pairs consisting of a misspelled word and its correct spelling.
Table-1: Examples of Word Pairs
Misspelled    Correct       Misspelled    Correct
Aacoustic     Acoustic      Chevorle      Chevrolet
Liyerature    Literature    Tournemen     Tournament
Shingle       Shingle       Newpape       Newspaper
Finlad        Finland       Ccomponet     Component
Reteive       Retrieve      Olimpick      Olympic
3.2 String Transformation
Here we use two techniques for searching strings: 1) String Generation and 2) String Transformation.
String Generation: Here 50,000 strings are generated in alphabetical order, from a to z, for example a, aa, …, z. Generating the strings manually takes a large amount of time, so it is recommended to connect the system to a database containing thousands of words.
String Transformation: This gives the user the advantage of String Generation together with string aliases. For example, if the user types “TKDE”, which stands for “Transactions on Knowledge and Data Engineering”, the search interface should still be able to find the related results.
String Mining: The user can download a string along with its synonyms, its related substrings, its inverse, and so on. The user can also check whether a given string is present in the collection of strings: if it is present, the result is “String is Found”; otherwise, the result is “String is Not Found”.
4. EXISTING & PROPOSED SYSTEMS
Earlier work on string transformation falls into two categories. The first mainly focused on methods for generating strings efficiently; the second tried to develop models using various approaches. Spelling error correction, query reformulation and synonym mining for related queries were not considered in the earlier work, and the earlier methods did not treat efficiency and accuracy as important factors. The present work uses a log-linear model for string transformation together with an efficient string generation algorithm. Two specific applications are associated with our method:
1. Spelling error correction of queries.
2. Query reformulation in web search.
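The rules that a log-linear string transformation model scores have to be derived from training string pairs, such as the word pairs of Table-1. Section 4.1 describes this as edit-distance-based alignment in three steps: align the pair, derive rules from the mismatched segments, and expand the rules with context. The following Python sketch is one plausible reading of those steps, not the authors' implementation; the boundary markers `^` and `$`, the one-character context window, and the function names are assumptions.

```python
def align(a, b):
    """Edit-distance alignment of a to b as (source char, target char) pairs.

    An empty string marks an insertion ("" -> c) or a deletion (c -> "").
    """
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1,                          # delete a[i-1]
                          d[i][j - 1] + 1,                          # insert b[j-1]
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # match or substitute
    # Step 1: trace back one optimal path, preferring match/substitution.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((a[i - 1], ""))
            i -= 1
        else:
            pairs.append(("", b[j - 1]))
            j -= 1
    return pairs[::-1]


def extract_rules(a, b, window=1):
    """Derive substring-replacement rules from the alignment and expand them with context."""
    pairs = align(a, b)
    padded = "^" + a + "$"  # assumed boundary markers; a[p] is padded[p + 1]
    rules = set()
    pos = idx = 0  # pos indexes a, idx indexes pairs
    while idx < len(pairs):
        if pairs[idx][0] == pairs[idx][1]:  # matching character: no rule
            pos += 1
            idx += 1
            continue
        # Step 2: merge a run of consecutive edits into one rule src -> tgt.
        start, src, tgt = pos, "", ""
        while idx < len(pairs) and pairs[idx][0] != pairs[idx][1]:
            src += pairs[idx][0]
            tgt += pairs[idx][1]
            pos += len(pairs[idx][0])  # insertions consume no source character
            idx += 1
        # Step 3: expand with up to `window` context characters on each side.
        left = padded[max(0, start + 1 - window):start + 1]
        right = padded[pos + 1:pos + 1 + window]
        for l in range(len(left) + 1):
            for r in range(len(right) + 1):
                lc, rc = left[len(left) - l:], right[:r]
                rules.add((lc + src + rc, lc + tgt + rc))
    return rules


if __name__ == "__main__":
    print(sorted(extract_rules("nicrosoft", "microsoft")))
```

For the pair “nicrosoft” → “microsoft”, the sketch yields the basic rule n → m together with its context-expanded variants ^n → ^m, ni → mi and ^ni → ^mi. For “finlad” → “finland” it yields an insertion rule and variants such as ad → and.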
Because the proposed system applies a log-linear model for string transformation to spelling error checking and query reformulation in web search, both efficiency and accuracy can be enhanced. Query reformulation helps the user by generating related substrings in the top k candidate list.
4.1 Model for String Transformation
Here we propose an alignment procedure known as edit-distance-based alignment.
Fig-2: Edit distance based alignment
It consists of three steps. In the first step, the input string is aligned with the output string in edit-distance style. In the second step, rules are derived from the alignment obtained in the previous step. In the last step, the rules are expanded with surrounding context.
4.2 Architecture
Fig 1: Architecture of proposed system
Fig 1 shows the proposed architecture, which involves the database, the admin and the end user. In the string generation phase, the strings are generated by the admin, either manually or by connecting a database containing a large amount of data. When the user performs an operation such as retrieving data from the web, the query is transformed at the browser to rectify any spelling mistakes, and query reformulation is performed when the query is received from the user at the search interface.
5. CONCLUSION
In this paper, we have proposed new statistical learning methods for string transformation. The proposed method is novel and distinctive in its model, its learning algorithm and its string generation algorithm.
Two specific applications are associated with our technique: spelling error correction of queries and query reformulation in web search. Experimental results on two large data sets and on Microsoft Speller show that the proposed method improves upon the baselines in terms of accuracy and efficiency. The proposed method is particularly useful when the problem occurs on a large scale. In large-scale systems, our query reformulation method is well suited because it makes it easy to retrieve information relevant to the search query even when substitute terms are used. The proposed log-linear model can be used in such systems for efficient results.
REFERENCES
[1]. M. Li, M. Zhu, Y. Zhang and M. Zhou, “Exploring distributional similarity based models for query spelling correction.”
[2]. D. Roth and R. A. Golding, “A winnow-based approach to context-sensitive spelling correction.”
[3]. J. Guo, H. Li, X. Cheng and G. Xu, “A unified and discriminative model for query refinement,” in Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[4]. A. Behm, C. Li, J. Lu and S. Ji, “Space-constrained gram-based indexing for efficient approximate string search,” in Proceedings of the 2009 IEEE International Conference on Data Engineering.
[5]. E. Brill and R. C. Moore, “An improved error model for noisy channel spelling correction,” in Proceedings of the 38th Annual Meeting on Association for Computational Linguistics.
[6]. G. Xu and J. Xu, “Learning similarity function for rare queries,” in Proc. 4th ACM Int. Conf. Web Search and Data Mining, New York.
[7]. C. A. Knoblock and S. Tejada, “Learning domain-independent string transformation weights for high accuracy object identification,” in Proc. ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining.
[8]. A. Arasu, S. Chaudhuri and R. Kaushik, “Learning string transformations from examples,” Proc. VLDB Endow., vol. 2, pp. 514–525, 2009.
[9]. S. Tejada, C. Knoblock and S. Minton, “Learning domain-independent string transformation weights for high accuracy object identification,” in Proc. 8th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, New York, USA, 2002.
[10]. C. Li, “Efficient approximate search on string collections,” Proc. VLDB Endow., vol. 3, no. 2, pp. 1660–1661.
[11]. C. Li, B. Wang and X. Yang, “Improving performance of approximate queries on string collections using variable-length grams,” in Proc. 33rd Int. Conf. Very Large Data Bases, Vienna.
[12]. X. Yang, C. Li and B. Wang, “Cost-based variable-length-gram selection for string collections to support approximate queries efficiently,” in Proc. ACM SIGMOD Int. Conf. Management of Data, New York, USA, pp. 353–364.
BIOGRAPHIES
M. Kiran Kumar, M-Tech, Department of CSE, JNTUA College of Engineering, Pulivendula, Andhra Pradesh, India.
Smt. S. Jessica Saritha is currently working as an Assistant Professor in the Department of CSE, JNTUA College of Engineering, Pulivendula, Andhra Pradesh, India. Her research interests are data mining and distributed computing.