Query Recommendation by using Collaborative Filtering Approach

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1117
Miss. Rupali Vasant Ubale1, Prof. Vrushali Desale2
1Student, 2Professor, Department of Computer Engineering, D. Y. Patil College of Engineering
----------------------------------------------------------------------------***-------------------------------------------------------------------------------
Abstract- Query facets offer interesting and useful knowledge
about a query and can therefore be used to improve the search
experience in many ways. This paper addresses the problem of
finding query facets. Query facets are groups of words or
phrases that describe and summarize the content covered by a
query. It is observed that the important aspects of a query are
typically presented and repeated in the top-k retrieved
documents of the query in the form of lists, and that query
facets can be mined by aggregating these lists. This paper
proposes a systematic solution, referred to as QDMiner, which
automatically mines query facets by extracting and grouping
frequent lists from HTML tags, free text and repeated regions
within the top search results. Moreover, the problem of list
duplication is examined, and better query facets are mined by
modeling fine-grained similarities among lists and penalizing
duplicated lists. In addition, collaborative filtering techniques
are used to recommend the top-k results of interest to the user.
In this recommendation process, the ICHM and UCHM techniques
are used to predict results according to user interest through
matrix generation.
Index Terms - Query facet, Faceted search,
Summarization, Collaborative filtering.
I. INTRODUCTION
A. Query Facets
A query facet is a set of items, such as words or phrases, that
describe and summarize one important aspect of a query. A
single query may have several facets that describe the
information about the query from different viewpoints. For
example, the query “visit Mumbai” has a query facet about
popular places in Mumbai (Marine Drive, Hotel Taj, Gateway of
India, ...) and a facet on travel-related topics (attractions,
shopping, dining, ...). Query facets offer interesting and useful
knowledge about a query and can therefore be used to improve
the search experience in several ways. First, users can clarify
their specific intent by selecting facet items; the search results
can then be restricted to the documents that are relevant to
those items. For example, a user searching for watches may drill
down to ladies’ watches if he is looking for a gift for his wife.
Such groups of query facets are particularly useful for vague or
ambiguous queries, such as “apple”. We might display the
products of Apple Inc. in one facet and the varieties of the apple
fruit in another. Second, query facets can deliver direct
information or instant answers that users are looking for. For
example, for the query “bigg boss season 6”, all episode
descriptions are shown in one facet and the leading actors in
another. In this case, showing query facets can save browsing
time. Third, query facets can be used to improve the diversity
of the ten blue links. We may re-rank search results to avoid
showing, near the top, pages that nearly duplicate one another
in terms of query facets. Moreover, query facets contain
structured knowledge covered by the query, and can therefore
be used in fields other than conventional web search, for
example semantic search or entity search.
B. Collaborative Filtering
Collaborative filtering (CF) is a standard technology for
recommender systems. CF approaches are classified into two
types: user-based CF and item-based CF. The main goal of the
user-based CF approach is to find a group of users who have
preference patterns similar to those of a given user (i.e.,
“neighbors” of the user) and to recommend to that user the
items that other users in the same group like, while the
item-based CF approach aims to provide a user with a
recommendation on an item based on other items with high
correlations (i.e., “neighbors” of the item). In all collaborative
filtering approaches, finding a user’s (or item’s) neighbors,
i.e., a set of similar users (or items), is a major step. At
present, almost all CF schemes measure user similarity (or item
similarity) based on the co-rated items of users (or the common
users of items). Although these recommendation strategies are
widely used in e-commerce, a number of shortcomings are known.
Today we are often overwhelmed by the large volume of
information available on the web, and in this environment we
have to make choices about which data to consume. In our
everyday lives, other people do much of our data filtering. For
example, we check ranking lists for blockbusters and pay
attention to movie critics. Collaborative filtering overcomes
some limitations of content-based filtering. The method can
recommend items (such as music or books) to users, and the
recommendations are based on the ratings given to the items
instead of the contents of the items, which can improve the
quality of recommendations. Although collaborative filtering
has been used successfully in both research and practice, some
challenges remain for it as an effective data filtering technique.

From previous studies, it is observed that important pieces of
information about a query are typically presented in list
formats and repeated many times among the top-k retrieved
documents. Therefore, query facets are mined by aggregating
frequent lists within the top-k search results, and a method
called QDMiner is implemented. More precisely, QDMiner
extracts lists from HTML tags and free text contained in the
top-k search results, groups them into clusters based on the
items they contain, and then ranks the clusters and items based
on how the lists and items appear in the top-k results. The
scheme includes two models, the Unique Website Model and the
Context Similarity Model, to rank query facets. Moreover, a
collaborative filtering technique is used to recommend results
of interest to the user. For collaborative recommendation,
there are two ways to estimate similarity for group
recommendation: item-based and user-based.
The remainder of the paper is organized as follows: Section II
gives the relevant literature survey. Section III describes the
proposed system. Section IV presents the processing of the
proposed system in mathematical form. Section V presents the
analysis and expected results. Section VI concludes the paper.
II. REVIEW OF LITERATURE
In this literature review, we discuss recent methods for
collaborative filtering and query facet search.
In [1], L. Bing et al. propose a graphical model to score
queries. The proposed model exploits a latent topic space, which
is automatically derived from the query log, to detect the
semantic dependency of terms in a query as well as the
dependency among topics. The graphical model also captures the
term context in the query history through skip-bigram and
n-gram language models.
W. Kong and J. Allan [2] address the heterogeneous nature of
the web and propose query-dependent automatic facet
generation, which generates facets for a query instead of the
entire corpus. To incorporate user feedback on these query
facets into document ranking, they investigate both Boolean
filtering and soft ranking models.
I. Szpektor et al. [3] propose a technique to extend the reach
of query assistance methods, and of query recommendation in
particular, to long-tail queries by reasoning about rules among
query templates instead of individual query transitions, as
currently done in query-flow graph models.
X. Xue and W. B. Croft [4] propose a framework that
represents reformulation as a distribution of queries, where
each query is a variation of the original query. This approach
treats a query as a basic unit and can capture important
dependencies among words and phrases in the query. Previous
reformulation models are special cases of the proposed
framework under particular assumptions.
L. Li et al. [5] propose a three-phase framework for
personalized query recommendation. The first phase is the
preparation of queries and their corresponding search results
returned by a search engine, which generates a historical
query-URL bipartite collection. The second phase is the
discovery of similar queries by extracting a query affinity
graph from the bipartite graph, rather than operating directly
on the original bipartite graph using biclique-based approaches
or graph clustering. The third phase is to rank the similar
queries. For this phase they devise a ranking measure for
ordering the related queries based on the merging distances of
a hierarchical agglomerative clustering (HAC).
W. Kong and J. Allan [6] develop a supervised method based on
a graphical model to recognize query facets from the noisy
candidates found. The graphical model learns how likely a
candidate term is to be a facet term as well as how likely two
terms are to be grouped together in a query facet, and captures
the dependencies between the two factors. They propose two
algorithms for approximate inference on the graphical model,
since exact inference is intractable.
Qing Li and B. M. Kim [7] apply a clustering method to
integrate the contents of items into the item-based
collaborative filtering framework. The group rating information
obtained from the clustering result provides a way to introduce
content information into collaborative recommendation.
I. Szpektor et al. [8] propose a method to extend the reach of
query assistance methods, and of query recommendation in
particular, to long-tail queries by reasoning about rules among
query templates instead of individual query transitions, as
currently done in query-flow graph models.
J. Pound et al. [9] aim to improve faceted search for users by
combining web query logs with existing structured information.
Since web queries are expressed as free-text queries, a
challenge in this approach is the inherent ambiguity in mapping
keywords to the different possible attributes of a given entity
type. They present a solution that derives user preferences on
attributes and values, employing different disambiguation
techniques ranging from simple keyword matching to more
sophisticated probabilistic models.

M. Diao et al. [10] apply the ideas of faceted search and
browsing to the Spoken Web search problem. They use facets to
index the metadata associated with the audio content, provide a
mechanism to rank the facets based on the search results, and
develop a collaborative query interface that allows browsing of
search results through the top-ranked facets.
K. Balog et al. [11] discuss the task of entity search and
examine to what extent state-of-the-art information retrieval
(IR) and semantic web (SW) technologies are capable of
answering information needs that focus on entities. They also
explore the potential of combining IR with SW technologies to
improve the end-to-end performance on a specific entity search
task.
M. Bron et al. [12] examine the performance of a model that
uses only co-occurrence statistics. Although it identifies a
set of related entities, it fails to rank them effectively. Two
types of error arise: (1) entities of the wrong type pollute
the ranking, and (2) although somehow related to the source
entity, some retrieved entities do not stand in the required
relation to it. To address (1), they add type filtering based
on category information available in Wikipedia. To correct for
(2), they add contextual information, represented as language
models derived from documents in which source and target
entities co-occur. To complete the pipeline, they find
homepages of top-ranked entities by combining a language
modeling approach with heuristics based on Wikipedia’s external
links.
C. Li et al. [13] propose Facetedpedia, a faceted retrieval
system for information discovery and exploration in Wikipedia.
Given the set of Wikipedia articles resulting from a keyword
query, Facetedpedia generates a faceted interface for
navigating the result articles. Compared with other faceted
retrieval systems, Facetedpedia is fully automatic and dynamic
in both facet generation and hierarchy construction, and the
facets are based on the rich semantic information from
Wikipedia.
A. Herdagdelen et al. [14] present an approach to query
reformulation that combines syntactic and semantic information
by means of generalized Levenshtein distance algorithms in
which the substitution operation costs are based on
probabilistic term rewrite functions. They investigate
unsupervised, compact and efficient models, and provide
empirical evidence of their effectiveness. They further explore
a generative model of query reformulation and supervised
combination methods providing improved performance at variable
computational costs.
J. Huang and E. N. Efthimiadis [15] study users’ reformulation
strategies in the context of the AOL query logs. They create a
taxonomy of query refinement strategies and build a
high-precision rule-based classifier to detect each type of
reformulation. The effectiveness of reformulations is measured
using user click behavior.
S. Gholamrezazadeh et al. [16] present a taxonomy of
summarization systems and define the most important criteria
for a summary that can be generated by a system. Moreover,
different methods of text summarization as well as the main
steps of the summarization process are discussed. They also go
over the main criteria for evaluating a text summarization.
H. Zhang et al. [17] study the use of topic models to construct
semantic classes, taking as source data a collection of raw
semantic classes (RASCs) extracted by applying predefined
patterns to web pages. The main requirement and challenge is
dealing with multi-membership: an item may belong to multiple
semantic classes, and as many as possible of the different
semantic classes the item belongs to need to be discovered.
They treat RASCs as “documents”, items as “words” and the final
semantic classes as “topics” in order to adopt topic models.
O. Ben-Yitzhak et al. [18] extend faceted search to support
richer information discovery tasks over more complex data
models. Their first extension adds flexible, dynamic business
intelligence aggregations to the faceted application, enabling
users to gain insight into their data that is far richer than
just knowing the number of documents belonging to each facet.
They see this capability as a step toward bringing OLAP
capabilities, traditionally supported by databases over
relational data, to the domain of free-text queries over
metadata-rich content. Their second extension shows how one can
efficiently extend a faceted search engine to support
correlated facets, a more complex data model in which the
values associated with a document across multiple facets are
not independent.
W. Dakka and P. G. Ipeirotis [19] observe that facet terms
rarely appear in text documents, which shows that external
resources are needed to identify useful facet terms. To this
end, they first identify important phrases in each document.
Then, they expand each phrase with “context” phrases using
external resources such as WordNet and Wikipedia, causing facet
terms to appear in the expanded database. Finally, they compare
the term distributions in the original database and the
expanded database to identify the terms that can be used to
construct browsing facets.

S. Riezler et al. [20] use pairs of user queries and snippets
of clicked results to train a machine translation model that
bridges the “lexical gap” between query and document space.
They show that the combination of a query-to-snippet
translation model with a large n-gram language model trained on
queries achieves improved contextual query expansion compared
to a method based on term correlations.
III. PROPOSED SYSTEM
In QDMiner, for a query q, the top-k results are retrieved from
a search engine and the complete documents are fetched to form
a set R as input. Query facets are then mined in four steps:
• List and context extraction: Lists and their context are
extracted from every document in R. For example, “men’s
watches, kid’s watches, women’s watches, luxury watches”
is a sample list.
• List weighting: All extracted lists are weighted, so that
unimportant or noisy lists that occasionally occur on a
page, for example the price list “290.99, 340.99,
490.99, ...”, are assigned low weights.
• List clustering: Similar lists are clustered together to
compose a facet. For example, different lists about watch
gender types are clustered because they share the same
items “men’s” and “women’s”.
• Facet and item ranking: Facets and their items are
evaluated and ranked. For example, the facet on brands is
ranked higher than the facet on colors based on how
frequently the facets occur and how relevant the
supporting documents are. Within the query facet on
gender types, “men’s” and “women’s” are ranked higher
than “unisex” and “kids” based on how frequently the
items appear and their order in the original lists.
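The list extraction step can be illustrated with a short sketch built on Python's standard html.parser module. It is only an approximation under stated assumptions: lists are taken to be UL, OL or SELECT elements, the names ListExtractor and extract_lists are hypothetical, single-item lists are dropped as a simplification, and the free-text and repeated-region patterns mentioned in this paper are not covered.

```python
from html.parser import HTMLParser


class ListExtractor(HTMLParser):
    """Collects the text items of <ul>/<ol>/<select> lists (hypothetical helper)."""

    LIST_TAGS = {"ul", "ol", "select"}
    ITEM_TAGS = {"li", "option"}

    def __init__(self):
        super().__init__()
        self.lists = []        # finished lists, each a list of item strings
        self._stack = []       # currently open lists (handles nesting)
        self._in_item = False
        self._buf = []

    def _flush_item(self):
        # Normalise whitespace and store the pending item text, if any.
        text = " ".join("".join(self._buf).split())
        if self._in_item and text and self._stack:
            self._stack[-1].append(text)
        self._in_item = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.LIST_TAGS:
            self._flush_item()
            self._stack.append([])
        elif tag in self.ITEM_TAGS and self._stack:
            self._flush_item()   # also closes items without an end tag
            self._in_item = True

    def handle_endtag(self, tag):
        if tag in self.ITEM_TAGS:
            self._flush_item()
        elif tag in self.LIST_TAGS and self._stack:
            self._flush_item()
            items = self._stack.pop()
            if len(items) > 1:   # assumption: ignore single-item lists
                self.lists.append(items)

    def handle_data(self, data):
        if self._in_item:
            self._buf.append(data)


def extract_lists(html):
    """Return every multi-item list found in one HTML document."""
    parser = ListExtractor()
    parser.feed(html)
    parser.close()
    return parser.lists
```

Calling extract_lists on each document in R would yield the candidate lists that the weighting, clustering and ranking steps then operate on.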
This paper also proposes a technique that introduces the
contents of items into item-based collaborative filtering to
improve its prediction quality and address the cold-start
problem. The technique is called ICHM (Item-based Clustering
Hybrid Method), in which the item information and user ratings
are combined to compute the item-item similarity. The
clustering method can be applied not only to item-based
collaborative recommenders but also to user-based collaborative
recommenders. That technique is called UCHM (User-based
Clustering Hybrid Method): clustering is based on the
attributes of user profiles, and the clustering result is
treated as items. In ICHM, by contrast, clustering is based on
the attributes of items, and the clustering result is treated
as users.
A. Proposed Architecture Diagram
Figure 1 shows the architecture of the proposed system.
Fig. 1. Architecture of Proposed System
IV. PROPOSED SYSTEM PROCESSING
A. Preprocessing
Input:
$D$: set of documents, $D = \{d_1, d_2, \ldots, d_n\}$, and $L_d$: set of lists, $L_d = \{l'\}$,
extracted from the HTML content of document $d$.
Process:
1) List Weighting
a) Compute the document matching weight as:
$S_{DOC} = \sum_{d \in R} (s^m_d \cdot s^r_d)$,
where $s^m_d \cdot s^r_d$ is the supporting score contributed by each result $d$;
$s^m_d = N_{l,d} / |l|$, where $N_{l,d}$ is the number of items which
appear both in list $l$ and document $d$;
$s^r_d = 1 / \sqrt{rank_d}$, where $rank_d$ is the rank of document $d$.
b) Compute the average inverse document frequency (IDF)
of items:
$S_{IDF} = \frac{1}{|l|} \sum_{e \in l} idf_e$,
where $idf_e = \log \frac{N - N_e + 0.5}{N_e + 0.5}$, $N_e$ is the total
number of documents that contain item $e$ in the
corpus and $N$ is the total number of documents.
c) Evaluate the importance of a list $l$ as:
$S_l = S_{DOC} \cdot S_{IDF}$
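The list weighting step can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes s^m_d = N_{l,d}/|l|, s^r_d = 1/sqrt(rank_d) and the smoothed IDF log((N - N_e + 0.5)/(N_e + 0.5)), and it represents each document simply as the set of items it contains. The function names and the doc_freq dictionary are hypothetical.

```python
import math


def doc_matching_weight(list_items, ranked_docs):
    """S_DOC: sum over ranked documents of s_m * s_r.

    ranked_docs is a sequence of item sets, one per document,
    ordered by search rank (rank 1 first).
    """
    items = set(list_items)
    total = 0.0
    for rank, doc_items in enumerate(ranked_docs, start=1):
        s_m = len(items & doc_items) / len(items)  # share of list items in d
        s_r = 1.0 / math.sqrt(rank)                # top documents count more
        total += s_m * s_r
    return total


def avg_idf(list_items, doc_freq, n_docs):
    """S_IDF: mean IDF of the list items (doc_freq maps item -> N_e)."""
    items = set(list_items)

    def idf(item):
        n_e = doc_freq.get(item, 0)
        return math.log((n_docs - n_e + 0.5) / (n_e + 0.5))

    return sum(idf(e) for e in items) / len(items)


def list_importance(list_items, ranked_docs, doc_freq, n_docs):
    """S_l = S_DOC * S_IDF."""
    return (doc_matching_weight(list_items, ranked_docs)
            * avg_idf(list_items, doc_freq, n_docs))
```

Under these assumptions, a list whose items occur in few of the top-ranked documents receives a small S_DOC, and a list made of very common items receives a small S_IDF.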
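2) List Clustering
Use the complete linkage distance to compute the
distance between two clusters of lists $c_1$, $c_2$:
$d_c(c_1, c_2) = \max_{l_1 \in c_1, l_2 \in c_2} d_l(l_1, l_2)$,
where $d_l(l_1, l_2) = 1 - \frac{|l_1 \cap l_2|}{\min\{|l_1|, |l_2|\}}$ is the distance between two lists.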
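The complete-linkage distance can be sketched as follows. The list distance used here, d_l(l1, l2) = 1 - |l1 ∩ l2| / min(|l1|, |l2|), is an assumption of this sketch, and both function names are hypothetical.

```python
def list_distance(l1, l2):
    """Assumed overlap distance between two lists of items."""
    s1, s2 = set(l1), set(l2)
    return 1.0 - len(s1 & s2) / min(len(s1), len(s2))


def cluster_distance(c1, c2):
    """Complete linkage: the largest distance over all cross-cluster pairs."""
    return max(list_distance(l1, l2) for l1 in c1 for l2 in c2)
```

With complete linkage, two clusters are close only if every list in one is close to every list in the other, which keeps loosely related lists from being merged into the same facet.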
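3) Item Ranking
a) Calculate the weight of an item $e$ within a facet $c$:
$S_{e|c} = \sum_{G} w(c,e,G) = \sum_{G} \frac{1}{\sqrt{AvgRank_{c,e,G}}}$,
where the sum runs over the groups of lists $G$ in $c$, $w(c,e,G)$ is
the weight contributed by a group of lists $G$, and $AvgRank_{c,e,G}$ is
the average rank of item $e$ within all lists extracted from group $G$.
Suppose $L(c,e,G)$ is the set of all lists in $c$ and $G$ ($G \subseteq c$)
that contain item $e$; then
$AvgRank_{c,e,G} = \frac{1}{|L(c,e,G)|} \sum_{l \in L(c,e,G)} rank_{e|l}$,
where $rank_{e|l}$ is the rank of item $e$ within list $l$.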
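A minimal sketch of the item ranking step is given below. It assumes that the weight of a group is w(c,e,G) = 1/sqrt(AvgRank_{c,e,G}) and that a group G is identified by a label such as a website name; the function name item_scores and the input format are hypothetical.

```python
import math
from collections import defaultdict


def item_scores(facet_lists):
    """Score every item of one facet c.

    facet_lists is an iterable of (group, items) pairs, where group
    labels G (for example a website) and items is an ordered list.
    Returns {item: S_e|c}.
    """
    ranks = defaultdict(lambda: defaultdict(list))  # item -> group -> ranks
    for group, items in facet_lists:
        for position, item in enumerate(items, start=1):
            ranks[item][group].append(position)

    scores = {}
    for item, by_group in ranks.items():
        # Each group contributes 1/sqrt(average rank of the item in it).
        scores[item] = sum(
            1.0 / math.sqrt(sum(r) / len(r)) for r in by_group.values()
        )
    return scores
```

Items that appear near the top of lists from many groups obtain the highest scores, which matches the intuition that “men’s” and “women’s” outrank “unisex” and “kids” in a gender facet.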
B. Collaborative Filtering
For collaborative filtering, the Pearson correlation-based
similarity and adjusted cosine similarity methods are used.
Using a linear combination of these similarities, predicted
results are obtained for the user.
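1) Pearson correlation-based Similarity
$sim(k,l) = \frac{\sum_{u=1}^{n} (R_{u,k} - \bar{R}_k)(R_{u,l} - \bar{R}_l)}{\sqrt{\sum_{u=1}^{n} (R_{u,k} - \bar{R}_k)^2} \sqrt{\sum_{u=1}^{n} (R_{u,l} - \bar{R}_l)^2}}$
where $sim(k,l)$ is the similarity between items $k$ and $l$; $n$ is
the total number of users who rated both items $k$ and $l$;
$\bar{R}_k$, $\bar{R}_l$ are the average ratings of items $k$ and $l$,
respectively; and $R_{u,k}$, $R_{u,l}$ are the ratings of user $u$ on
items $k$ and $l$, respectively.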
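The Pearson correlation-based item similarity can be sketched in Python as follows. The sketch assumes ratings are stored as a nested dictionary {user: {item: rating}} and that the item averages are taken over the users who rated both items; the function name and the choice to return 0.0 when no correlation can be computed are assumptions.

```python
import math


def pearson_item_similarity(ratings, k, l):
    """Pearson correlation between items k and l over their co-raters."""
    common = [u for u, r in ratings.items() if k in r and l in r]
    if not common:
        return 0.0  # assumption: no co-rating users means no evidence

    mean_k = sum(ratings[u][k] for u in common) / len(common)
    mean_l = sum(ratings[u][l] for u in common) / len(common)

    num = sum((ratings[u][k] - mean_k) * (ratings[u][l] - mean_l)
              for u in common)
    den_k = math.sqrt(sum((ratings[u][k] - mean_k) ** 2 for u in common))
    den_l = math.sqrt(sum((ratings[u][l] - mean_l) ** 2 for u in common))
    den = den_k * den_l
    return num / den if den else 0.0
```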
2) Adjusted Cosine Similarity
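A minimal sketch of adjusted cosine similarity, assuming the standard definition in which each rating is centred on the mean rating of the user who gave it rather than on the item mean. The data layout {user: {item: rating}} and the function name are hypothetical.

```python
import math


def adjusted_cosine_similarity(ratings, k, l):
    """Adjusted cosine between items k and l (user-mean centred)."""
    num = den_k = den_l = 0.0
    for user_ratings in ratings.values():
        if k not in user_ratings or l not in user_ratings:
            continue  # only users who rated both items contribute
        user_mean = sum(user_ratings.values()) / len(user_ratings)
        dk = user_ratings[k] - user_mean
        dl = user_ratings[l] - user_mean
        num += dk * dl
        den_k += dk * dk
        den_l += dl * dl
    den = math.sqrt(den_k) * math.sqrt(den_l)
    return num / den if den else 0.0
```

Centring on the user mean compensates for users who rate everything high or everything low, which plain cosine similarity does not.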
3) Linear Combination
$sim(k,l) = sim(k,l)_{item} \times (1 - c) + sim(k,l)_{group} \times c$
where $c$ is the combination coefficient, $sim(k,l)_{item}$ is the
similarity between items $k$ and $l$ computed from the item-rating
matrix, and $sim(k,l)_{group}$ is the similarity between items $k$
and $l$ computed from the group-rating matrix.
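The linear combination is a one-line computation; the sketch below only adds a range check. Restricting c to [0, 1] is an assumption of this sketch, as is the function name.

```python
def combined_similarity(sim_item, sim_group, c):
    """sim = sim_item * (1 - c) + sim_group * c."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c is assumed to lie in [0, 1]")
    return sim_item * (1.0 - c) + sim_group * c
```

For example, with sim_item = 0.8, sim_group = 0.4 and c = 0.25, the combined similarity is 0.8 × 0.75 + 0.4 × 0.25 = 0.7. For an item with no ratings yet, sim_item carries no information, so the group term is what lets the method still produce a non-zero similarity.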
4) Collaborative Prediction
The prediction for an item is then computed from the combined
similarity.
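One common form of item-based prediction is the weighted deviation-from-mean rule, P(u,k) = mean(k) + Σ_i (R(u,i) - mean(i)) · sim(k,i) / Σ_i |sim(k,i)|. The sketch below implements that rule; whether this paper uses exactly this form is an assumption, and the function name and argument layout are hypothetical.

```python
def predict_rating(user_ratings, item_means, sims, k):
    """Predict the target user's rating of item k.

    user_ratings: {item: rating} given by the target user
    item_means:   {item: mean rating of that item}
    sims:         {item: combined similarity sim(k, item)}
    """
    num = den = 0.0
    for item, rating in user_ratings.items():
        if item == k or item not in sims:
            continue
        num += (rating - item_means[item]) * sims[item]
        den += abs(sims[item])
    base = item_means[k]
    # Fall back to the item mean when no similar rated item exists.
    return base if den == 0 else base + num / den
```

For a user who rated item a with 5 (item mean 4) and item b with 2 (item mean 3), with sim(k,a) = 0.8, sim(k,b) = 0.2 and an item mean of 3.5 for k, the prediction is 3.5 + (0.8 - 0.2) / 1.0 = 4.1.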
V. ANALYSIS AND RESULTS
A. Dataset
For evaluation, product data is collected from the web, covering
different categories such as gender, brand and colour.
B. Results
To evaluate performance, the time required to search a query
on databases of various sizes is measured. The expected results
are evaluated according to the time required to process the
user queries Query1, Query2 and Query3 and return the output.
Table I shows the processing time required for each query.
TABLE I: TIME REQUIRED FOR SEARCH RESULT
Database size   Query1   Query2   Query3
50              20       12       21
100             25       44       12
150             35       48       30
200             45       48       35
Fig. 2. Time required for search result

The pure item-based collaborative method makes predictions
based only on the item-rating matrix, as in Table 2, so it
cannot make a prediction for an item that has not been rated.
The ICHM matrix representation of some of our products and
users is shown in Figure 3.
Fig. 3. ICHM Matrix Presentation
The user-based collaborative method makes predictions for
users based on group ratings. The UCHM matrix representation of
some of our products and users is shown in Figure 4.
Fig. 4. UCHM Matrix Presentation
Search time for different numbers of records:
1) Search result time for 25 records
2) Search result time for 50 records
VI. CONCLUSION
This paper introduces a collaborative filtering technique for
recommending products to users. It applies a clustering
technique to the item content information (ICHM) to complement
the user rating information (UCHM), which improves the accuracy
of collaborative similarity. The paper presents the query
facet, a collection of items that describe and summarize an
important aspect of a query, and addresses the problem of
finding query facets, which are multiple groups of words or
phrases that explain and summarize the content covered by a
query. It assumes that the important aspects of a query are
usually presented and repeated in the query’s top-k retrieved
documents in the form of lists, and that query facets can be
mined by aggregating these significant lists. A systematic
solution, referred to as QDMiner, is proposed to automatically
mine query facets by extracting and grouping frequent lists
from HTML tags, free text and repeated regions within the
top-k search results. With the collaborative approach, the
effectiveness of the scheme increases because documents
matching the user’s intent are recommended to the user. The
user’s search time is therefore reduced for the same or
similar data that the user requires.
ACKNOWLEDGMENT
The authors would like to thank the researchers as well as
publishers for making their resources available and teachers
for their guidance.
REFERENCES
[1] L. Bing, W. Lam, T.-L. Wong, and S. Jameel, Web query
reformulation via joint modeling of latent topic
dependency and term context, ACM Trans. Inf. Syst., vol.
33, no. 2, pp. 6:1-6:38, Feb. 2015.
[2] W. Kong and J. Allan, Extending faceted search to the
general web, in Proc. ACM Int. Conf. Inf. Knowl. Manage.,
2014, pp. 839-848.
[3] I. Szpektor, A. Gionis, and Y. Maarek, Improving
recommendation for long-tail queries via templates, in
Proc. 20th Int. Conf. World Wide Web, 2011, pp. 47-56.
[4] X. Xue and W. B. Croft, Modeling reformulation using
query distributions, ACM Trans. Inf. Syst., vol. 31, no. 2, pp.
6:1-6:34, May 2013.
[5] L. Li, L. Zhong, Z. Yang, and M. Kitsuregawa, Qubic: An
adaptive approach to query-based recommendation, J.
Intell. Inf. Syst., vol. 40, no. 3, pp. 555-587, Jun. 2013.
[6] W. Kong and J. Allan, Extracting query facets from search
results, in Proc. 36th Int. ACM SIGIR Conf. Res. Develop.
Inf. Retrieval, 2013, pp. 93-102.
[7] Qing Li and Byeong Man Kim, An Approach for Combining
Content-based and Collaborative Filters, Korea Research
Foundation Grant (KRF2002-041-D00459), 2002.
[8] I. Szpektor, A. Gionis, and Y. Maarek, Improving
recommendation for long-tail queries via templates, in
Proc. 20th Int. Conf. World Wide Web, 2011, pp. 47-56.
[9] J. Pound, S. Paparizos, and P. Tsaparas, Facet discovery for
structured web search: A query-log mining approach, in
Proc. ACM SIGMOD Int. Conf. Manage. Data, 2011, pp.
169-180.
[10] M. Diao, S. Mukherjea, N. Rajput, and K. Srivastava,
Faceted search and browsing of audio content on spoken
web, in Proc. 19th ACM Int. Conf. Inf. Knowl. Manage.,
2010, pp. 1029-1038.
[11] K. Balog, E. Meij, and M. de Rijke, Entity search: Building
bridges between two worlds, in Proc. 3rd Int. Semantic
Search Workshop, 2010, pp. 9:1-9:5.
[12] M. Bron, K. Balog, and M. de Rijke, Ranking related
entities: Components and analyses, in Proc. ACM Int. Conf.
Inf. Knowl. Manage., 2010, pp. 1079-1088.
[13] C. Li, N. Yan, S. B. Roy, L. Lisham, and G. Das,
Facetedpedia: Dynamic generation of query-dependent
faceted interfaces for Wikipedia, in Proc. 19th Int. Conf.
World Wide Web, 2010, pp. 651-660.
[14] A. Herdagdelen, M. Ciaramita, D. Mahler, M. Holmqvist, K.
Hall, S. Riezler, and E. Alfonseca, Generalized syntactic and
semantic models of query reformulation, in Proc. 33rd Int.
ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2010, pp.
283-290.
[15] J. Huang and E. N. Efthimiadis, Analyzing and evaluating
query reformulation strategies in web search logs, in Proc.
18th ACM Conf. Inf. Knowl. Manage., 2009, pp. 77-86.
[16] S. Gholamrezazadeh, M. A. Salehi, and B. Gholamzadeh, A
comprehensive survey on text summarization systems, in
Proc. 2nd Int. Conf. Comput. Sci. Appl., 2009, pp. 1-6.
[17] H. Zhang, M. Zhu, S. Shi, and J.-R. Wen, Employing topic
models for pattern-based semantic class discovery, in
Proc. Joint Conf. 47th Annu. Meet. ACL 4th Int. Joint Conf.
Natural Lang. Process. AFNLP, 2009, pp. 459-467.
[18] O. Ben-Yitzhak, N. Golbandi, N. Har'El, R. Lempel, A.
Neumann, S. Ofek-Koifman, D. Sheinwald, E. Shekita, B.
Sznajder, and S. Yogev, Beyond basic faceted search, in
Proc. Int. Conf. Web Search Data Mining, 2008, pp. 33-44.
[19] W. Dakka and P. G. Ipeirotis, Automatic extraction of
useful facet hierarchies from text databases, in Proc. IEEE
24th Int. Conf. Data Eng., 2008, pp. 466-475.
[20] S. Riezler, Y. Liu, and A. Vasserman, Translating queries
into snippets for improved query expansion, in Proc. 22nd
Int. Conf. Comput. Ling., 2008, pp. 737-744.
