IOSR Journal of Computer Engineering (IOSRJCE)
ISSN: 2278-0661, ISBN: 2278-8727, Volume 6, Issue 6 (Nov. - Dec. 2012), PP 36-43
www.iosrjournals.org
Query- And User-Dependent Approach for Ranking Query
Results in Web Databases
Sruthi Ambati1, Raghava Rao2
1 Department of CSE, DRK Institute of Science & Technology, Ranga Reddy, Andhra Pradesh, India
2 HOD, Department of CSE, DRK Institute of Science & Technology, Ranga Reddy, Andhra Pradesh, India
Abstract: The Internet has paved the way for the emergence of web databases, and querying such databases for
required information has become a common task. Ranking the results of such queries remains an open problem.
Existing solutions based on user profiles, query logs, and database values rank results in a user-independent
and/or query-independent fashion, so they cannot reflect the preferences of a particular user for a particular
query. This paper presents a new approach, Query- and User-Dependent Ranking, for ranking the query results of
the deep web. The proposed ranking framework is based on two fundamental aspects of the problem of ranking
query results: query similarity and user similarity. These similarities are exploited to rank query results
effectively. A prototype application is built to test our model. The empirical results show that our approach
is effective and can be used in real-world applications.
Index Terms: Deep web, ranking query results, user similarity, query similarity
I. Introduction:
With the emergence of the Internet and its related technologies, people from all walks of life have started
storing data on the web, which lets them access the content from anywhere in the world. Web databases emerged
as a result. These web databases are known as the deep web [1], [2]. They come from various domains such as
vehicles, real estate, education, health care and so on, and online users search them through the search
mechanism provided. The queries can have criteria that correspond to the attributes of the database
schema. When the number of results returned is large, the user wastes time browsing for the required information.
To overcome this problem, present web databases simplify the results by sorting them on a particular
attribute. This may not suit the many users who prefer ordering on multiple attributes.
Many existing web databases can be accessed through the WWW. One example is Google’s
Google Base, a web database holding information about vehicles with relevant attributes such as
Make, Mileage, Price etc. In this table each record represents a real-world vehicle that is for sale.
Two common scenarios motivate the new ranking model in this paper. In the first
scenario, different web users have different ranking preferences towards the results of the same query,
because each user is looking for different information within the same query results. In the second
scenario, the same user may have different ranking preferences
for the results of different queries. These two scenarios show that various users issue
queries to web databases and that the queries they issue may be similar or different. It is therefore
desirable for the ranking of query results to be both user dependent and query dependent. Many existing web
databases rely on simple sorting for ranking, while extensions of SQL allow attribute weights to be
specified [3], [4]. For most web
databases this approach is not user friendly, and users still have to spend time browsing the query
results. For this reason automated ranking of query results has been studied, and some techniques are proposed in
[5], [6], and [7]. However, these approaches rank query results in either a query-independent or a
user-independent way. Another approach, used in [8], is to build extensive user profiles, in which case users
are expected to order the records. It provides user-dependent ranking but does not distinguish between
different queries, and gives a single ranking order for any query. Recommender systems have also made use of
either user similarity or query similarity. Some of them are collaborative in nature [9], [10], [11] and some
are content-based filters [12], [13]. The work in this paper is inspired by those works. However, it
differs from them in that it is based on both user similarity and query similarity. In other words, the
proposed approach is a user- and query-dependent ranking for web databases.
In this paper the ranking model is based on two notions: user similarity and query similarity.
User similarity means that different users can have the same preferences. Query similarity means that different
users can issue the same queries. To exploit these notions, the ranking functions of users and queries must be
maintained. We have developed a workload file that contains the user and query ranking functions. When a new
query is issued to the web database, it comes from a particular user. Other users may have issued that query
previously, and the same query may already have been issued earlier. The workload file is in tabular format and
is updated with ranking functions, as per the proposed algorithm, whenever new queries are made. The proposed
model combines two models: a user-dependent ranking model and a query-dependent ranking model.
We prefer applying both of them for better results. The proposed ranking model in this paper is a
linear weighted-sum function. It contains attribute weights and value weights. Attribute weights indicate the
importance of attributes, while value weights indicate the importance of the values of those attributes.
Relevance feedback techniques [14] are used to keep the workload minimal. The main contributions of this paper are:
• A user- and query-dependent approach for ranking the query results of web databases.
• A ranking model based on the notions of user similarity and query similarity.
• Experiments on two synthetic databases, college and hospital. The model can also be
tested with real web databases.
• A workload concept for maintaining the up-to-date ranking functions of users and queries.
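The linear weighted-sum ranking function described above can be illustrated with a minimal sketch. It is written in Python under stated assumptions: the attribute names follow the vehicle example (Make, Price, Mileage), while the weights, the discretized values ("low", "high"), and the function names `score` and `rank` are hypothetical and are not taken from the prototype.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical weights, for illustration only; the paper gives no concrete values.
ATTRIBUTE_WEIGHTS: Dict[str, float] = {"make": 0.5, "price": 0.3, "mileage": 0.2}
VALUE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "make": {"Honda": 0.9, "Ford": 0.4},
    "price": {"low": 1.0, "high": 0.2},
    "mileage": {"low": 0.8, "high": 0.3},
}


def score(record: Record,
          attribute_weights: Dict[str, float],
          value_weights: Dict[str, Dict[str, float]]) -> float:
    """Linear weighted sum: attribute weight times value weight, summed over attributes."""
    total = 0.0
    for attribute, attribute_weight in attribute_weights.items():
        value = record.get(attribute)
        # Values with no known weight contribute nothing to the score.
        total += attribute_weight * value_weights.get(attribute, {}).get(value, 0.0)
    return total


def rank(records: List[Record],
         attribute_weights: Dict[str, float],
         value_weights: Dict[str, Dict[str, float]]) -> List[Record]:
    """Return the records ordered from highest to lowest score."""
    return sorted(records,
                  key=lambda r: score(r, attribute_weights, value_weights),
                  reverse=True)


if __name__ == "__main__":
    cars = [
        {"make": "Ford", "price": "low", "mileage": "high"},
        {"make": "Honda", "price": "high", "mileage": "low"},
    ]
    for car in rank(cars, ATTRIBUTE_WEIGHTS, VALUE_WEIGHTS):
        print(car, round(score(car, ATTRIBUTE_WEIGHTS, VALUE_WEIGHTS), 2))
```

In this sketch a different user or query would simply supply a different pair of weight tables, which is how one ranking function is distinguished from another.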
II. Related Work:
The use of web databases has brought the problem of ranking query results to the fore. Relational
databases have no such requirement, although ranking has long been studied in information retrieval (IR).
Ranking gained popularity with the emergence of the deep web. It has become an essential task because a query
often returns a large number of records, and the user wastes time browsing them for the information actually
required. Recommender systems have been using ranking to make the best recommendations to end users of online
applications. With respect to user and query similarity, this paper resembles the work done in [9], [10] and
[11]. It also has some relevance to the content filtering mechanisms explored in [13] and [15]. However,
ranking a database differs substantially from making recommendations, and in this respect our work differs from
existing work. Existing web databases use simple ordering for ranking, while our proposed framework
is based on user similarity and query similarity. Moreover, the existing ranking techniques do not use
both similarities. They are user independent, query independent, or independent of both. In
recommender systems [16], [17], each attribute holds the presence or absence of the user input tag. In
the user and query similarity matrix, each cell contains an ordered set of functions represented by a ranking
function. In this way recommender systems differ from web databases, although both make use of
ranking. The notion of similarity also distinguishes our work from existing work because, as noted
earlier, existing approaches do not use query and user similarity together. This paper uses query and user
similarity at the same time, and the resulting ranking function is written to a workload file, which is simply
a relational table in this paper. In content filtering, the similarity is determined by a domain expert
or through user profiles [15]. User profiles cannot be used directly in
our approach because we need to find both user and query similarities, and a user profile gives importance to
user information rather than to queries. The workload file concept therefore plays a prominent role in this
paper. We also assume that different users have different query preferences and that the same user may
have different preferences for different queries. The notion of user similarity is the same as the one used in
collaborative filtering; however, the technique used is different. In collaborative filtering users are
compared based on ratings; our work extends user personalization and also considers the notion of query
similarity. Thus our work is different from many existing works. Although web databases and their query results
have received attention from academic circles, ranking that is both user and query dependent has not been
addressed. Chaudhuri et al. [5] address only query-dependent ranking using vector and IR models. In [6]
user-dependent ranking for web databases is explored. In neither case are user- and query-dependent ranking
used together. Thus this paper is the first to offer a
solution for both user- and query-dependent ranking of the query results of web databases. The approach for
user similarity in [18] requires the user to specify an ordering without issuing queries. These approaches do
not recognize that users have different ranking preferences.
The approaches closest to ours are [3], [4] and [19]. However, these techniques are not suitable for
effective ranking of the query results of web databases because they do not consider both user and query
similarity. The cosine similarity metric proposed in [20], the IR method proposed in [21], and the relevance
feedback approaches in [14], [22], [23] and [24] are not suitable for direct use with web databases. For this
reason we implement a ranking model that is based on both user and query similarity; that is, the proposed
model is a query- and user-dependent ranking model.
III. Proposed Ranking Algorithm
Listing 1 – Ranking algorithm [25]
This algorithm is based on the algorithm given in [25] and is implemented in the prototype application, which
shows both user- and query-dependent rankings for the query results of web databases.
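Listing 1 is not reproduced in this text, so the following is only a hypothetical sketch of how a ranking function might be selected from the workload for a new user and query. It assumes that user and query similarities are already available as functions returning values in [0, 1], and that they are combined by multiplication; both are illustrative assumptions rather than the algorithm of the listing. The names `pick_ranking_function` and `table_similarity` are invented for this example.

```python
from typing import Callable, Dict, Optional, Tuple

# Workload: (user id, query id) -> identifier of the stored ranking function.
Workload = Dict[Tuple[str, str], str]
Similarity = Callable[[str, str], float]


def table_similarity(table: Dict[Tuple[str, str], float]) -> Similarity:
    """Build a symmetric similarity function from a lookup table (assumed precomputed)."""
    def similarity(a: str, b: str) -> float:
        if a == b:
            return 1.0
        return table.get((a, b), table.get((b, a), 0.0))
    return similarity


def pick_ranking_function(user: str,
                          query: str,
                          workload: Workload,
                          user_sim: Similarity,
                          query_sim: Similarity) -> Optional[str]:
    """Return the ranking function of the most similar (user, query) pair in the workload."""
    # An exact match needs no similarity computation.
    if (user, query) in workload:
        return workload[(user, query)]

    best_function: Optional[str] = None
    best_score = 0.0
    for (other_user, other_query), function in workload.items():
        # Assumption: combined similarity is the product of the two similarities.
        combined = user_sim(user, other_user) * query_sim(query, other_query)
        if combined > best_score:
            best_function, best_score = function, combined
    # None means no workload entry is similar enough to borrow from.
    return best_function


if __name__ == "__main__":
    workload = {("U1", "Q1"): "F11", ("U2", "Q1"): "F21", ("U2", "Q2"): "F22"}
    user_sim = table_similarity({("U3", "U1"): 0.2, ("U3", "U2"): 0.9})
    query_sim = table_similarity({("Q3", "Q1"): 0.8, ("Q3", "Q2"): 0.4})
    print(pick_ranking_function("U3", "Q3", workload, user_sim, query_sim))
```

Other combination rules (for example, filtering by query similarity first and then choosing the most similar user) would fit the same interface; the product is used here only because it is the simplest to state.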
IV. Sample Workload File
The sample workload file is given in Table 1, which shows queries, users and the ranking functions
calculated as per the algorithm given in Listing 1.
Table 1 – Sample Workload
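Because the workload file is described as a relational table of users, queries and ranking functions, one possible way to hold it is sketched below with Python's standard `sqlite3` module. The table name `workload`, its column names, and the helper functions are assumptions made for illustration; the schema used by the prototype is not given in the paper.

```python
import sqlite3
from typing import Optional


def create_workload(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) workload table: one ranking function per user and query."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS workload (
            user_id          TEXT NOT NULL,
            query_id         TEXT NOT NULL,
            ranking_function TEXT NOT NULL,
            PRIMARY KEY (user_id, query_id)
        )
        """
    )


def record_ranking_function(conn: sqlite3.Connection,
                            user_id: str, query_id: str,
                            ranking_function: str) -> None:
    """Insert a ranking function, or overwrite the one already stored for this pair."""
    conn.execute(
        "INSERT OR REPLACE INTO workload (user_id, query_id, ranking_function) "
        "VALUES (?, ?, ?)",
        (user_id, query_id, ranking_function),
    )
    conn.commit()


def lookup_ranking_function(conn: sqlite3.Connection,
                            user_id: str, query_id: str) -> Optional[str]:
    """Return the stored ranking function for the pair, or None if there is none."""
    row = conn.execute(
        "SELECT ranking_function FROM workload WHERE user_id = ? AND query_id = ?",
        (user_id, query_id),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_workload(connection)
    record_ranking_function(connection, "U1", "Q1", "F11")
    print(lookup_ranking_function(connection, "U1", "Q1"))
```

Keying the table on the (user, query) pair means that a later query by the same user simply replaces the stored function, which matches the idea of a workload that is updated as new queries arrive.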
V. Experimental Evaluation
The experiments are carried out using a prototype application with two synthetic web databases, college and
hospital. The experimental results are evaluated by visualizing them as graphs. Fig. 2 and 3
show the ranking quality of the query similarity models for both databases with a 10% workload.
Fig. 2 – Ranking quality of query similarity (College DB)
[Plot: College DB (10% workload). Y axis: Spearman coefficient; X axis: queries 1 to 14. Series: query result similarity, query condition similarity.]
Fig. 3 – Ranking quality of query similarity (Hospital DB)
As can be seen in Fig. 2 and 3, the average for query condition similarity is computed across all queries. The X
axis shows queries while the Y axis shows the Spearman coefficient. As is evident from the graphs, the query
condition model outperforms the query result model. The loss of quality is due to the restricted workload of 10%.
When the workload increases, the quality also increases.
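The Spearman coefficient on the Y axis measures the agreement between two orderings of the same records, for example the ordering produced by a derived ranking function and the ordering a user actually prefers. A minimal sketch follows, assuming both rankings contain the same items with no ties, in which case the standard formula is 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the difference in position of item i. The function name `spearman` is chosen for illustration; this is not the evaluation code of the prototype.

```python
from typing import Hashable, Sequence


def spearman(ranking_a: Sequence[Hashable], ranking_b: Sequence[Hashable]) -> float:
    """Spearman rank correlation of two orderings of the same distinct items."""
    n = len(ranking_a)
    if n < 2:
        raise ValueError("at least two items are needed")
    if len(ranking_b) != n or len(set(ranking_a)) != n or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must order the same distinct items")

    # Position of every item in the second ranking.
    position_b = {item: index for index, item in enumerate(ranking_b)}
    # Sum of squared position differences between the two rankings.
    squared_differences = sum(
        (index - position_b[item]) ** 2 for index, item in enumerate(ranking_a)
    )
    return 1 - 6 * squared_differences / (n * (n * n - 1))


if __name__ == "__main__":
    preferred = ["r1", "r2", "r3", "r4"]
    derived = ["r2", "r1", "r3", "r4"]
    print(spearman(preferred, derived))
```

A value of 1 means the two orderings are identical and -1 means one is the exact reverse of the other, so higher values in the plots indicate better ranking quality.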
Fig. 4 – Ranking quality of user similarity model (College DB)
Fig. 5 – Ranking quality of user similarity model (Hospital DB)
[Plot: Hospital DB (10% workload). Y axis: Spearman coefficient; X axis: 1 to 17. Series: query result similarity, query condition similarity.]
[Plot: College DB (10% workload). Y axis: Spearman coefficient; X axis: 1 to 16. Series: query independent, clustering, strict top-K, user top-K.]
[Plot: Hospital DB (10% workload). Y axis: Spearman coefficient; X axis: 1 to 18. Series: query independent, clustering, strict top-K.]
Fig. 6 – Ranking quality of user similarity model (College DB)
Fig. 4, 5, and 6 show the average ranking quality achieved for both the college and hospital databases across
all queries for all users. The results reveal that the strict top-K model performs better than the other
models. However, the strict top-K model has no ranking functions for many queries.
Fig. 7 – Ranking functions derived for user similarity (College DB)
Fig. 8 – Ranking functions derived for user similarity (Hospital DB)
Fig. 7 and 8 confirm that the models differ in their ability to determine ranking
functions across the workload. Nevertheless, the strict top-K model is accurate and superior to all other
models from the perspective of the ranking function.
[Plot: College DB (User 27, 10% workload). Y axis: Spearman coefficient; X axis: 1 to 15. Series: query independent, clustering, strict top-K, user top-K.]
[Plot: Y axis: 0 to 1600; X axis: 1 to 5. Series: Series1, Series2, Series3.]
[Plot: Y axis: 0 to 1200; X axis: 1 to 5. Series: Series1, Series2, Series3.]
Fig. 9 – Ranking quality of combined similarity model
Fig. 10 – Ranking quality of combined similarity model
Fig. 11 – Ranking quality of combined similarity model
[Plot: College DB (10% workload). Y axis: Spearman coefficient; X axis: 1 to 15. Series: Combined Sim, Query Sim.]
[Plot: College DB (1% workload). Y axis: Spearman coefficient; X axis: 1 to 16. Series: Combined Sim, Query Sim, User Sim.]
[Plot: Hospital DB (10% workload). Y axis: Spearman coefficient; X axis: 1 to 20. Series: Combined Sim, Query Sim, User Sim.]
Fig. 12 – Ranking quality of combined similarity model
Fig. 9, 10, 11 and 12 show the quality of the combined models for both databases with 1% and 10% workloads. The
important observation is that the combined model performs better than the individual models. The results also
establish that more ranking functions in the workload yield better similarity and better quality of results.
VI. Conclusion
This paper proposed a new model for ranking the query results of web databases. We used two
synthetic web databases, a college database and a hospital database, for experiments. The model is based
on both query similarity and user similarity. We have also built a prototype web-based application that
demonstrates the proposed ranking model. A workload file is maintained that continually
stores updated ranking functions for both user similarity and query similarity. When a new query is made, this
workload file is used to rank the query results. Designing and maintaining a workload is
challenging in the context of web databases. We have implemented an algorithm that computes user and query
similarities and updates the workload consistently. The experimental results show that our new ranking model
works well and can be explored for real-world web databases.
References:
[1] M.K. Bergman, “The Deep Web: Surfacing Hidden Value,” J. Electronic Publishing, vol. 7, no. 1, pp. 41-50, 2001.
[2] K.C.-C. Chang, B. He, C. Li, M. Patil, and Z. Zhang, “Structured Databases on the Web: Observations and Implications,” SIGMOD
Record, vol. 33, no. 3, pp. 61-70, 2004.
[3] C. Li, K.C.-C. Chang, I.F. Ilyas, and S. Song, “Ranksql: Query Algebra and Optimization for Relational Top-k Queries,” Proc.
ACM SIGMOD Int’l Conf. Management of Data, pp. 131-142, 2005.
[4] A. Marian, N. Bruno, and L. Gravano, “Evaluating Top-k Queries over Web-Accessible Databases,” ACM Trans. Database
Systems, vol. 29, no. 2, pp. 319-362, 2004.
[5] S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum, “Probabilistic Ranking of Database Query Results,” Proc. 30th Int’l Conf. Very
Large Data Bases (VLDB), pp. 888-899, 2004.
[6] W. Su, J. Wang, Q. Huang, and F. Lochovsky, “Query Result Ranking over E-Commerce Web Databases,” Proc. Conf. Information
and Knowledge Management (CIKM), pp. 575-584, 2006.
[7] H. Yu, Y. Kim, and S. won Hwang, “Rv-svm: An Efficient Method for Learning Ranking Svm,” Proc. Pacific-Asia Conf.
Knowledge Discovery and Data Mining (PAKDD), pp. 426-438, 2009.
[8] G. Koutrika and Y.E. Ioannidis, “Constrained Optimalities in Query Personalization,” Proc. ACM SIGMOD Int’l Conf.
Management of Data, pp. 73-84, 2005.
[9] J. Basilico and T. Hofmann, “A Joint Framework for Collaborative and Content Filtering,” Proc. 27th Ann. Int’l ACM SIGIR Conf.
Research and Development in Information Retrieval, pp. 550-551, 2004.
[10] T. Hofmann, “Collaborative Filtering via Gaussian Probabilistic Latent Semantic Analysis,” Proc. 26th Ann. Int’l ACM SIGIR
Conf. Research and Development in Information Retrieval, pp. 259-266, 2003.
[11] D. Billsus and M.J. Pazzani, “Learning Collaborative Information Filters,” Proc. Int’l Conf. Machine Learning (ICML), pp. 46-54,
1998.
[12] M. Balabanovic and Y. Shoham, “Content-Based Collaborative Recommendation,” Comm. ACM, vol. 40, no. 3, pp. 66-72, 1997.
[13] C. Basu, H. Hirsh, and W.W. Cohen, “Recommendation as Classification: Using Social and Content-Based Information in
Recommendation,” Proc. 15th Nat’l Conf. Artificial Intelligence (AAAI/IAAI), pp. 714-720, 1998.
[14] B. He, “Relevance Feedback,” Encyclopedia of Database Systems, pp. 2378-2379, Springer, 2009.
[15] S. Gauch and M. Speretta, “User Profiles for Personalized Information Access,” Adaptive Web, pp. 54-89, 2007.
[16] S. Amer-Yahia, A. Galland, J. Stoyanovich, and C. Yu, “From del.icio.us to x.qui.site: Recommendations in Social Tagging Sites,”
Proc. ACM SIGMOD Int’l Conf. Management of Data, pp. 1323-1326, 2008.
[Plot: Hospital DB (1% workload). Y axis: Spearman coefficient; X axis: 1 to 20. Series: Combined Sim, Query Sim, User Sim.]
[17] A. Penev and R.K. Wong, “Finding Similar Pages in a Social Tagging Repository,” Proc. 17th Int’l Conf. World Wide Web
(WWW), pp. 1091-1092, 2008.
[18] S.-W. Hwang, “Supporting Ranking For Data Retrieval,” PhD thesis, Univ. of Illinois, Urbana Champaign, 2005.
[19] K. Werner, “Foundations of Preferences in Database Systems,” Proc. 28th Int’l Conf. Very Large Data Bases (VLDB), pp. 311-322,
2002.
[20] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. ACM Press, 1999.
[21] N. Fuhr, “A Probabilistic Framework for Vague Queries and Imprecise Information in Databases,” Proc. 16th Int’l Conf. Very
Large Data Bases (VLDB), pp. 696-707, 1990.
[22] Y. Rui, T.S. Huang, and S. Mehrotra, “Content-Based Image Retrieval with Relevance Feedback in Mars,” Proc. IEEE Int’l Conf.
Image Processing, pp. 815-818, 1997.
[23] L. Wu et al., “Falcon: Feedback Adaptive Loop for Content-Based Retrieval,” Proc. Int’l Conf. Very Large Data Bases (VLDB), pp.
297-306, 2000.
[24] X. Luo, X. Wei, and J. Zhang, “Guided Game-Based Learning Using Fuzzy Cognitive Maps,” IEEE Trans. Learning Technologies,
vol. 3, no. 4, pp. 344-357, Oct.-Dec. 2010.
[25] A. Telang, C. Li, and S. Chakravarthy, “One Size Does Not Fit All: Toward User- and Query-Dependent Ranking
for Web Databases,” IEEE Trans. Knowledge and Data Eng., vol. 24, no. 9, Sept. 2012.
AUTHORS
Sruthi Ambati is a student of DRK Institute of Science and Technology, Ranga Reddy,
Andhra Pradesh, India. She has received a B.Tech degree in Computer Science and
Engineering and an M.Tech degree in Computer Science and Engineering. Her main research
interests include Data Mining and Image Processing.
Raghava Rao.N is working as HOD and Associate Professor at DRK Institute of Science &
Technology, Ranga Reddy, Andhra Pradesh, India. He has received an M.Tech degree in
Computer Science and Engineering. His main interests include Cloud Computing and Software
Engineering.

More Related Content

PDF
Custom-Made Ranking in Databases Establishing and Utilizing an Appropriate Wo...
PDF
dynamic query forms for non relational database
PDF
Personalized web search using browsing history and domain knowledge
PDF
F0433439
PDF
Recent research in web page classification – a review
PDF
Recent research in web page classification – a review
PDF
Cluster Based Web Search Using Support Vector Machine
PDF
24 24 nov16 13455 27560-1-sm(edit)
Custom-Made Ranking in Databases Establishing and Utilizing an Appropriate Wo...
dynamic query forms for non relational database
Personalized web search using browsing history and domain knowledge
F0433439
Recent research in web page classification – a review
Recent research in web page classification – a review
Cluster Based Web Search Using Support Vector Machine
24 24 nov16 13455 27560-1-sm(edit)

What's hot (14)

PDF
IRJET- A Novel Technique for Inferring User Search using Feedback Sessions
PDF
50120140502013
PDF
AN ARCHITECTURE FOR WEB SERVICE SIMILARITY EVALUATION BASED ON THEIR FUNCTION...
PDF
A LOCATION-BASED RECOMMENDER SYSTEM FRAMEWORK TO IMPROVE ACCURACY IN USERBASE...
PPTX
Determining Relevance Rankings from Search Click Logs
PDF
K1803057782
PDF
Determining Relevance Rankings with Search Click Logs
PDF
TOWARDS UNIVERSAL RATING OF ONLINE MULTIMEDIA CONTENT
PDF
Web service discovery methods and techniques a review
PDF
IRJET- A Literature Review and Classification of Semantic Web Approaches for ...
PDF
Vol 12 No 1 - April 2014
PDF
Implemenation of Enhancing Information Retrieval Using Integration of Invisib...
DOCX
TAG BASED IMAGE SEARCH BY SOCIAL RE-RANKING
PDF
Classification-based Retrieval Methods to Enhance Information Discovery on th...
IRJET- A Novel Technique for Inferring User Search using Feedback Sessions
50120140502013
AN ARCHITECTURE FOR WEB SERVICE SIMILARITY EVALUATION BASED ON THEIR FUNCTION...
A LOCATION-BASED RECOMMENDER SYSTEM FRAMEWORK TO IMPROVE ACCURACY IN USERBASE...
Determining Relevance Rankings from Search Click Logs
K1803057782
Determining Relevance Rankings with Search Click Logs
TOWARDS UNIVERSAL RATING OF ONLINE MULTIMEDIA CONTENT
Web service discovery methods and techniques a review
IRJET- A Literature Review and Classification of Semantic Web Approaches for ...
Vol 12 No 1 - April 2014
Implemenation of Enhancing Information Retrieval Using Integration of Invisib...
TAG BASED IMAGE SEARCH BY SOCIAL RE-RANKING
Classification-based Retrieval Methods to Enhance Information Discovery on th...
Ad

Viewers also liked (20)

PDF
H0954451
PDF
A0520106
PDF
Area-Efficient VLSI Implementation for Parallel Linear-Phase FIR Digital Filt...
PDF
H0524548
PDF
Capacitive voltage and current induction phenomena in GIS substation
PDF
Detection of Malignancy in Digital Mammograms from Segmented Breast Region Us...
PDF
Performance evaluation of Hard and Soft Wimax by using PGP and PKM protocols ...
PDF
B0610611
PPTX
Perplexity of Index Models over Evolving Linked Data
PDF
M0537683
PPTX
Why U.S. Bank Lost Its Case against Ibanez on a Foreclosed Property
PDF
D0532025
PDF
D0151724
PDF
8 hw cells_organisation
PDF
Hiding Text within Image Using LSB Replacement
PPTX
Energy
PDF
Designing Secure Systems Using AORDD Methodologies in UML System Models
PDF
J0427985
PDF
D0562022
PDF
A0710113
H0954451
A0520106
Area-Efficient VLSI Implementation for Parallel Linear-Phase FIR Digital Filt...
H0524548
Capacitive voltage and current induction phenomena in GIS substation
Detection of Malignancy in Digital Mammograms from Segmented Breast Region Us...
Performance evaluation of Hard and Soft Wimax by using PGP and PKM protocols ...
B0610611
Perplexity of Index Models over Evolving Linked Data
M0537683
Why U.S. Bank Lost Its Case against Ibanez on a Foreclosed Property
D0532025
D0151724
8 hw cells_organisation
Hiding Text within Image Using LSB Replacement
Energy
Designing Secure Systems Using AORDD Methodologies in UML System Models
J0427985
D0562022
A0710113
Ad

Similar to Query- And User-Dependent Approach for Ranking Query Results in Web Databases (20)

PDF
IRJET- Text-based Domain and Image Categorization of Google Search Engine usi...
PDF
50120140506005 2
PDF
User search goal inference and feedback session using fast generalized – fuzz...
PDF
10 personalized-web-search-techniques
PDF
`A Survey on approaches of Web Mining in Varied Areas
PDF
Data mining in web search engine optimization
PDF
A Survey on: Utilizing of Different Features in Web Behavior Prediction
PDF
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
PDF
Tourism Based Hybrid Recommendation System
PDF
50120140502013
PDF
Ontological and clustering approach for content based recommendation systems
PDF
A Survey on Recommendation System based on Knowledge Graph and Machine Learning
PDF
An effective search on web log from most popular downloaded content
PDF
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
PDF
Query Recommendation by using Collaborative Filtering Approach
PDF
Enactment of Firefly Algorithm and Fuzzy C-Means Clustering For Consumer Requ...
PDF
AN EFFECTIVE FRAMEWORK FOR GENERATING RECOMMENDATIONS
PDF
Image Based Information Retrieval Using Deep Learning and Clustering Techniques
PDF
Image Based Information Retrieval Using Deep Learning and Clustering Techniques
PDF
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...
IRJET- Text-based Domain and Image Categorization of Google Search Engine usi...
50120140506005 2
User search goal inference and feedback session using fast generalized – fuzz...
10 personalized-web-search-techniques
`A Survey on approaches of Web Mining in Varied Areas
Data mining in web search engine optimization
A Survey on: Utilizing of Different Features in Web Behavior Prediction
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
Tourism Based Hybrid Recommendation System
50120140502013
Ontological and clustering approach for content based recommendation systems
A Survey on Recommendation System based on Knowledge Graph and Machine Learning
An effective search on web log from most popular downloaded content
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Query Recommendation by using Collaborative Filtering Approach
Enactment of Firefly Algorithm and Fuzzy C-Means Clustering For Consumer Requ...
AN EFFECTIVE FRAMEWORK FOR GENERATING RECOMMENDATIONS
Image Based Information Retrieval Using Deep Learning and Clustering Techniques
Image Based Information Retrieval Using Deep Learning and Clustering Techniques
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...

More from IOSR Journals (20)

PDF
A011140104
PDF
M0111397100
PDF
L011138596
PDF
K011138084
PDF
J011137479
PDF
I011136673
PDF
G011134454
PDF
H011135565
PDF
F011134043
PDF
E011133639
PDF
D011132635
PDF
C011131925
PDF
B011130918
PDF
A011130108
PDF
I011125160
PDF
H011124050
PDF
G011123539
PDF
F011123134
PDF
E011122530
PDF
D011121524
A011140104
M0111397100
L011138596
K011138084
J011137479
I011136673
G011134454
H011135565
F011134043
E011133639
D011132635
C011131925
B011130918
A011130108
I011125160
H011124050
G011123539
F011123134
E011122530
D011121524

Recently uploaded (20)

PPTX
Build Your First AI Agent with UiPath.pptx
PDF
Improvisation in detection of pomegranate leaf disease using transfer learni...
PDF
Flame analysis and combustion estimation using large language and vision assi...
PPT
Geologic Time for studying geology for geologist
PDF
How IoT Sensor Integration in 2025 is Transforming Industries Worldwide
PDF
NewMind AI Weekly Chronicles – August ’25 Week III
PPT
What is a Computer? Input Devices /output devices
PPTX
Custom Battery Pack Design Considerations for Performance and Safety
PDF
Architecture types and enterprise applications.pdf
PDF
Credit Without Borders: AI and Financial Inclusion in Bangladesh
PDF
Produktkatalog für HOBO Datenlogger, Wetterstationen, Sensoren, Software und ...
PDF
Consumable AI The What, Why & How for Small Teams.pdf
PPTX
GROUP4NURSINGINFORMATICSREPORT-2 PRESENTATION
PDF
How ambidextrous entrepreneurial leaders react to the artificial intelligence...
PDF
A contest of sentiment analysis: k-nearest neighbor versus neural network
PDF
UiPath Agentic Automation session 1: RPA to Agents
PPTX
AI IN MARKETING- PRESENTED BY ANWAR KABIR 1st June 2025.pptx
PDF
Taming the Chaos: How to Turn Unstructured Data into Decisions
PPTX
TEXTILE technology diploma scope and career opportunities
PDF
Developing a website for English-speaking practice to English as a foreign la...
Build Your First AI Agent with UiPath.pptx
Improvisation in detection of pomegranate leaf disease using transfer learni...
Flame analysis and combustion estimation using large language and vision assi...
Geologic Time for studying geology for geologist
How IoT Sensor Integration in 2025 is Transforming Industries Worldwide
NewMind AI Weekly Chronicles – August ’25 Week III
What is a Computer? Input Devices /output devices
Custom Battery Pack Design Considerations for Performance and Safety
Architecture types and enterprise applications.pdf
Credit Without Borders: AI and Financial Inclusion in Bangladesh
Produktkatalog für HOBO Datenlogger, Wetterstationen, Sensoren, Software und ...
Consumable AI The What, Why & How for Small Teams.pdf
GROUP4NURSINGINFORMATICSREPORT-2 PRESENTATION
How ambidextrous entrepreneurial leaders react to the artificial intelligence...
A contest of sentiment analysis: k-nearest neighbor versus neural network
UiPath Agentic Automation session 1: RPA to Agents
AI IN MARKETING- PRESENTED BY ANWAR KABIR 1st June 2025.pptx
Taming the Chaos: How to Turn Unstructured Data into Decisions
TEXTILE technology diploma scope and career opportunities
Developing a website for English-speaking practice to English as a foreign la...

Query- And User-Dependent Approach for Ranking Query Results in Web Databases

  • 1. IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727Volume 6, Issue 6 (Nov. - Dec. 2012), PP 36-43 www.iosrjournals.org www.iosrjournals.org 36 | Page Query- And User-Dependent Approach for Ranking Query Results in Web Databases Sruthi Ambati1 , Raghava Rao2 1 Department of CSE, DRK Institute of Science & Technology, Ranga Reddy, Andhra Pradesh, India HOD 2 Department of CSE, DRK Institute of Science & Technology, Ranga Reddy, Andhra Pradesh, India Abstract: Internet has paved the way for the emergence of web databases. Querying such databases for required information has become a common task. Ranking such query results is an open problem to be addressed. The existing solutions such as user profiles, query logs, and database values perform ranking in user independent and/or query independent fashion. This can’t provide efficient ranking. This paper presents a new approach known as Query and User Dependent Ranking for giving ranking to query results of deep web. The proposed ranking framework is based on two fundamental aspects to the problem of ranking query results. They are query similarity and user similarity. These similarities are exploited to make efficient ranking of query results. A prototype application is built to test the efficiency of our model. The empirical results revealed that our approach is efficient and can be used in real world applications. Index Terms: Deep web, ranking query results, user similarity, query similarity I. Introduction: Due to emergence of Internet and its related technologies, people of all walks of life started storing data over web. This helps them to access the content from anywhere in the world. Thus web databases also emerged. These web databases are known as deep web [1], [2]. The web databases are from various domains such as vehicles, real estate, education, health care and so on. These web databases are searched by online users through a search mechanism provided. 
The queries can have criteria that correspond to the attributes of the database schema. When a query returns a very large number of results, the user wastes time browsing them for the required information. To overcome this problem, present web databases simplify the results by sorting them on a particular attribute. This may not suit the many users who prefer an ordering over multiple attributes. Many web databases can be accessed through the WWW. One example is Google Base, a web database that stored information about vehicles with relevant attributes such as Make, Mileage, and Price. In such a table, each record represents a real-world vehicle that is for sale.
Two common scenarios motivate the new ranking model in this paper. In the first scenario, different web users have different ranking preferences for the results of the same query, because each user is interested in different information within the same query results. In the second scenario, the same user may have different ranking preferences for the results of different queries. These two scenarios show that various users issue queries to web databases and that the queries may be similar or different. It is therefore desirable to rank query results using both user-dependent and query-dependent similarity.
Many existing web databases rank by simple sorting, while extensions of SQL allow users to provide attribute weights [3], [4]. For most web databases this approach is not user friendly, and users still waste time browsing the query results. For this reason, automated ranking of query results has been studied, and several techniques are proposed in [5], [6], and [7]. However, these approaches rank query results in either a query-independent or a user-independent way.
Another approach, used in [8], is to build extensive user profiles, in which case users are expected to order the records. This supports user-dependent ranking but does not distinguish between different queries, and it provides a single ranking order for any query. Recommender systems also make use of either user similarity or query similarity. Some of them are collaborative in nature [9], [10], [11] and some are content-based filters [12], [13]. The work in this paper is inspired by those works but differs from them: it is based on both user similarity and query similarity. In other words, the proposed approach is a user- and query-dependent ranking for web databases.
In this paper the ranking model is based on two notions: user similarity and query similarity. User similarity indicates that different users can have the same preferences. Query similarity indicates that different users can issue the same queries. To exploit these notions, the ranking functions of users and queries have to be maintained. We have developed a workload file that contains the user and query ranking functions. When a new query is issued to the web database, it is issued by some user. Many users may have issued that query previously, and the same or similar queries may have been issued earlier. The workload file is in tabular format, and it is updated with ranking functions as per the proposed algorithm as and when new queries are made. The proposed model has
  • 2. two component models: a user-dependent ranking model and a query-dependent ranking model. We prefer applying both of them for better results. The ranking model proposed in this paper is a linear weighted-sum function. It contains attribute weights and value weights. Attribute weights indicate the importance of the attributes, while value weights indicate the importance of the values of those attributes. Relevance feedback techniques [14] are utilized to keep the workload minimal. The main contributions of this paper are:
    - A user- and query-dependent approach for ranking the query results of web databases.
    - A ranking model based on the notions of user similarity and query similarity.
    - Experiments on two synthetic databases, college and hospital. The model can also be tested with real web databases.
    - A workload concept for maintaining the updated ranking functions of users and queries.
II. Related Work:
The use of web databases has brought the problem of ranking query results to the fore. Traditional relational databases have no such requirement, although ranking has been studied in information retrieval (IR) for some time. Ranking gained popularity with the emergence of the deep web. It has become an essential task because a query can return a large number of records, and the user wastes time browsing them for the information actually required. Recommender systems have been using ranking to make the best recommendations to end users of online applications. With respect to user and query similarity, this paper resembles the work done in [9], [10], and [11]. It is also related to the content filtering mechanisms explored in [13] and [15]. However, there is a major difference between ranking database query results and making recommendations, and in this respect our work differs from existing work.
The existing web databases use simple ordering for ranking, while our proposed framework is based on user similarity and query similarity. Moreover, the existing ranking techniques do not use both similarities. They are user-independent, query-independent, or independent of both. In the recommender systems of [16] and [17], each attribute holds the presence or absence of a tag supplied by the user. In our user and query similarity matrix, each cell contains an ordered set of functions, represented by a ranking function. In this way recommender systems differ from web databases, although both make use of ranking. The notion of similarity also distinguishes our work from existing work, since the earlier approaches do not use query and user similarity together. This paper uses query and user similarity at the same time, and the resulting ranking function is written to a workload file, which in this paper is simply a relational table. In content filtering, the similarity is established by a domain expert or derived from user profiles [15]. Direct use of user profiles is not possible in our setting, because we need both user and query similarities and a user profile emphasizes information about the user rather than the queries. The workload file therefore plays a central role in this paper. We also assume that different users have different query preferences and that the same user may have different preferences for different queries. The notion of user similarity is the same as the concept used in collaborative filtering, but the technique used is different. In collaborative filtering users are compared on the basis of ratings, whereas our work extends user personalization and also considers query similarity. Thus our work differs from many existing works.
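The workload table described above can be pictured as a mapping from (user, query) pairs to ranking functions. The following is only a minimal sketch under stated assumptions: the class name `Workload`, its methods, the identifiers `U1`, `Q1`, and the attribute names are all hypothetical and do not come from the paper, and a ranking function is simplified here to a dictionary of attribute weights.

```python
from typing import Dict, Optional, Tuple

# Assumption: a ranking function is stored as attribute -> weight.
RankingFunction = Dict[str, float]


class Workload:
    """Toy in-memory stand-in for the workload table.

    Each (user, query) cell holds the ranking function recorded for that pair.
    """

    def __init__(self) -> None:
        self._cells: Dict[Tuple[str, str], RankingFunction] = {}

    def record(self, user: str, query: str, function: RankingFunction) -> None:
        """Insert or overwrite the ranking function of one (user, query) cell."""
        self._cells[(user, query)] = dict(function)

    def lookup(self, user: str, query: str) -> Optional[RankingFunction]:
        """Return the stored function, or None when the cell is empty."""
        return self._cells.get((user, query))

    def users_for_query(self, query: str) -> Dict[str, RankingFunction]:
        """All users that already have a ranking function for this query."""
        return {u: f for (u, q), f in self._cells.items() if q == query}

    def queries_for_user(self, user: str) -> Dict[str, RankingFunction]:
        """All queries for which this user already has a ranking function."""
        return {q: f for (u, q), f in self._cells.items() if u == user}


# Hypothetical entries for illustration only.
workload = Workload()
workload.record("U1", "Q1", {"Department": 0.7, "Fee": 0.3})
workload.record("U2", "Q1", {"Department": 0.2, "Fee": 0.8})
workload.record("U1", "Q2", {"Specialty": 1.0})
```

When a cell is empty, as for user `U2` and query `Q2` here, a user- and query-dependent approach would look at the functions held by similar users (the query's column) and for similar queries (the user's row); how similarity is measured is left out of this sketch.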
Although web databases and the ranking of their query results have received attention from academic circles, ranking that is both user- and query-dependent has not been addressed. Chaudhuri et al. [5] address only query-dependent ranking, using vector and IR models. In [6], user-dependent ranking for web databases is explored. In neither case are user- and query-dependent ranking used together. Thus this paper is a first solution for ranking the query results of web databases in a way that is both user- and query-dependent. In [18], the approach to user similarity requires the user to specify an ordering without making queries. These approaches do not recognize that users have different ranking preferences. The approaches closest to ours are [3], [4], and [19]. However, these techniques are not suitable for effective ranking of the query results of web databases, as they do not consider both user and query similarity. The cosine similarity metric proposed in [20], the IR method proposed in [21], and the relevance feedback approaches in [14], [22], [23], and [24] are not suitable for direct use with web databases. For this reason we implement a ranking model that is based on user and query similarity. In other words, the proposed model is a query- and user-dependent ranking model.
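The Introduction describes the ranking model as a linear weighted-sum function built from attribute weights and value weights. A minimal sketch of such a function is given below, assuming the score of a record is the sum over attributes of (attribute weight × value weight of the record's value). The function names `score` and `rank`, the attribute `PriceBand`, and all weights are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Record = Dict[str, str]


def score(record: Record,
          attribute_weights: Dict[str, float],
          value_weights: Dict[str, Dict[str, float]]) -> float:
    """Linear weighted sum: sum of attribute weight * value weight."""
    total = 0.0
    for attribute, weight in attribute_weights.items():
        value = record.get(attribute)
        # Assumption: values without a recorded weight contribute 0.
        total += weight * value_weights.get(attribute, {}).get(value, 0.0)
    return total


def rank(records: List[Record],
         attribute_weights: Dict[str, float],
         value_weights: Dict[str, Dict[str, float]]) -> List[Record]:
    """Order records by descending score."""
    return sorted(records,
                  key=lambda r: score(r, attribute_weights, value_weights),
                  reverse=True)


# Hypothetical weights for a vehicle query.
attribute_weights = {"Make": 0.6, "PriceBand": 0.4}
value_weights = {
    "Make": {"Honda": 0.9, "Ford": 0.5},
    "PriceBand": {"low": 1.0, "high": 0.2},
}
records = [
    {"Make": "Honda", "PriceBand": "high"},  # 0.6*0.9 + 0.4*0.2 = 0.62
    {"Make": "Ford", "PriceBand": "low"},    # 0.6*0.5 + 0.4*1.0 = 0.70
]
ranked = rank(records, attribute_weights, value_weights)
```

With these weights the Ford record is ranked first, because the preference for a low price outweighs the preference for the make. A different user or a different query would carry different weights and hence a different order.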
  • 3. III. Proposed Ranking Algorithm
Listing 1 – Ranking algorithm [25]
This algorithm is based on the algorithm given in [36] and is implemented in the prototype application, which shows both user- and query-dependent rankings for the query results of web databases.
IV. Sample Workload File
The sample workload file is given in Table 1, which shows queries, users, and the ranking functions calculated as per the algorithm given in Listing 1.
Table 1 – Sample Workload
V. Experimental Evaluation
The experiments are made using a prototype application with two synthetic web databases, college and hospital. The experimental results are evaluated by visualizing them in the form of graphs. Fig. 2 and 3 show the ranking quality of the query similarity models for both databases with a 10% workload.
Fig. 2 – Ranking quality of query similarity (College DB). [Plot: Spearman coefficient, College DB, 10% workload; series: query result similarity, query condition similarity.]
  • 4. Fig. 3 – Ranking quality of query similarity (Hospital DB). [Plot: Spearman coefficient, Hospital DB, 10% workload; series: query result similarity, query condition similarity.]
As can be seen in Fig. 2 and 3, the average ranking quality of the query similarity models is obtained across all queries. The X axis shows queries, while the Y axis shows the Spearman coefficient. As is evident in the graphs, the query condition model outperforms the query result model. The loss of quality is due to the restricted workload of 10%. When the workload increases, the quality also increases.
Fig. 4 – Ranking quality of user similarity model (College DB). [Plot: Spearman coefficient, College DB, 10% workload; series: query independent, clustering, strict top-K, user top-K.]
Fig. 5 – Ranking quality of user similarity model (Hospital DB). [Plot: Spearman coefficient, Hospital DB, 10% workload; series: query independent, clustering, strict top-K.]
  • 5. Fig. 6 – Ranking quality of user similarity model (College DB). [Plot: Spearman coefficient, College DB, User 27, 10% workload; series: query independent, clustering, strict top-K, user top-K.]
Fig. 4, 5, and 6 show the average ranking quality achieved for both the college and hospital databases across all queries for all users. The results reveal that the strict top-K model performs better than the other models. However, strict top-K has no ranking functions for many queries.
Fig. 7 – Ranking functions derived for user similarity (College DB). [Plot: Series1, Series2, Series3; series labels not recoverable.]
Fig. 8 – Ranking functions derived for user similarity (Hospital DB). [Plot: Series1, Series2, Series3; series labels not recoverable.]
Fig. 7 and 8 confirm that different models have different abilities to determine ranking functions across the workload. Nevertheless, strict top-K is accurate and superior to all other models from the perspective of the ranking function.
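The plots above report ranking quality as a Spearman coefficient. As a reminder of what that number measures, the sketch below computes the Spearman rank correlation between two orderings of the same records using the standard tie-free formula ρ = 1 − 6·Σd² / (n(n² − 1)), where d is the difference between a record's positions in the two orderings. The function name `spearman` and the record identifiers are hypothetical; the paper does not say how its prototype computes the coefficient.

```python
from typing import Hashable, Sequence


def spearman(ranking_a: Sequence[Hashable], ranking_b: Sequence[Hashable]) -> float:
    """Spearman rank correlation of two orderings of the same distinct items."""
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two items")
    if len(set(ranking_a)) != n or set(ranking_a) != set(ranking_b) or len(ranking_b) != n:
        raise ValueError("rankings must be permutations of the same distinct items")
    # Position of every item in the second ranking.
    position_b = {item: index for index, item in enumerate(ranking_b)}
    # Sum of squared position differences.
    d_squared = sum((index - position_b[item]) ** 2
                    for index, item in enumerate(ranking_a))
    return 1.0 - (6.0 * d_squared) / (n * (n * n - 1))


# Hypothetical example: the order a user prefers vs. the order a model produced.
preferred = ["r1", "r2", "r3", "r4"]
produced = ["r1", "r2", "r4", "r3"]
quality = spearman(preferred, produced)
```

A value of 1 means the derived order matches the user's order exactly, and a value of -1 means it is fully reversed. The example above, with one adjacent swap among four records, gives 0.8.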
  • 6. Fig. 9 – Ranking quality of combined similarity model. [Plot: Spearman coefficient, College DB, 10% workload; series: Combined Sim, Query Sim.]
Fig. 10 – Ranking quality of combined similarity model. [Plot: Spearman coefficient, College DB, 1% workload; series: Combined Sim, Query Sim, User Sim.]
Fig. 11 – Ranking quality of combined similarity model. [Plot: Spearman coefficient, Hospital DB, 10% workload; series: Combined Sim, Query Sim, User Sim.]
  • 7. Fig. 12 – Ranking quality of combined similarity model
Fig. 9, 10, 11, and 12 show the quality of the combined models for both databases with 1% and 10% workloads. The important observation is that the composite model performs better than the individual models. Another fact established here is that with more ranking functions in the workload, better similarity and quality of results are achieved.
VI. Conclusion
This paper proposed a new model for ranking the query results of web databases. The model is based on both query similarity and user similarity. We used two synthetic web databases for experiments, a college database and a hospital database. We have also built a prototype web-based application that demonstrates the proposed ranking model. A workload file is maintained that continually stores updated ranking functions for both user similarity and query similarity. When a new query is made, this workload file is used to rank the query results. Designing and maintaining a workload is challenging in the context of web databases. We have implemented an algorithm that computes user and query similarities and updates the workload consistently. The experimental results revealed that our ranking model works well and that it can be explored for real-world web databases.
References:
[1] M.K. Bergman, “The Deep Web: Surfacing Hidden Value,” J. Electronic Publishing, vol. 7, no. 1, pp. 41-50, 2001.
[2] K.C.-C. Chang, B. He, C. Li, M. Patil, and Z. Zhang, “Structured Databases on the Web: Observations and Implications,” SIGMOD Record, vol. 33, no. 3, pp. 61-70, 2004.
[3] C. Li, K.C.-C. Chang, I.F. Ilyas, and S. Song, “Ranksql: Query Algebra and Optimization for Relational Top-k Queries,” Proc. ACM SIGMOD Int’l Conf. Management of Data, pp. 131-142, 2005.
[4] A. Marian, N. Bruno, and L.
Gravano, “Evaluating Top-k Queries over Web-Accessible Databases,” ACM Trans. Database Systems, vol. 29, no. 2, pp. 319-362, 2004.
[5] S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum, “Probabilistic Ranking of Database Query Results,” Proc. 30th Int’l Conf. Very Large Data Bases (VLDB), pp. 888-899, 2004.
[6] W. Su, J. Wang, Q. Huang, and F. Lochovsky, “Query Result Ranking over E-Commerce Web Databases,” Proc. Conf. Information and Knowledge Management (CIKM), pp. 575-584, 2006.
[7] H. Yu, Y. Kim, and S.-W. Hwang, “Rv-svm: An Efficient Method for Learning Ranking Svm,” Proc. Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD), pp. 426-438, 2009.
[8] G. Koutrika and Y.E. Ioannidis, “Constrained Optimalities in Query Personalization,” Proc. ACM SIGMOD Int’l Conf. Management of Data, pp. 73-84, 2005.
[9] J. Basilico and T. Hofmann, “A Joint Framework for Collaborative and Content Filtering,” Proc. 27th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 550-551, 2004.
[10] T. Hofmann, “Collaborative Filtering via Gaussian Probabilistic Latent Semantic Analysis,” Proc. 26th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 259-266, 2003.
[11] D. Billsus and M.J. Pazzani, “Learning Collaborative Information Filters,” Proc. Int’l Conf. Machine Learning (ICML), pp. 46-54, 1998.
[12] M. Balabanovic and Y. Shoham, “Content-Based Collaborative Recommendation,” Comm. ACM, vol. 40, no. 3, pp. 66-72, 1997.
[13] C. Basu, H. Hirsh, and W.W. Cohen, “Recommendation as Classification: Using Social and Content-Based Information in Recommendation,” Proc. 15th Nat’l Conf. Artificial Intelligence (AAAI/IAAI), pp. 714-720, 1998.
[14] B. He, “Relevance Feedback,” Encyclopedia of Database Systems, pp. 2378-2379, Springer, 2009.
[15] S. Gauch and M.
Speretta, “User Profiles for Personalized Information Access,” Adaptive Web, pp. 54-89, 2007.
[16] S. Amer-Yahia, A. Galland, J. Stoyanovich, and C. Yu, “From del.icio.us to x.qui.site: Recommendations in Social Tagging Sites,” Proc. ACM SIGMOD Int’l Conf. Management of Data, pp. 1323-1326, 2008.
[Plot for Fig. 12: Spearman coefficient, Hospital DB, 1% workload; series: Combined Sim, Query Sim, User Sim.]
  • 8. [17] A. Penev and R.K. Wong, “Finding Similar Pages in a Social Tagging Repository,” Proc. 17th Int’l Conf. World Wide Web (WWW), pp. 1091-1092, 2008.
[18] S.-W. Hwang, “Supporting Ranking For Data Retrieval,” PhD thesis, Univ. of Illinois, Urbana Champaign, 2005.
[19] W. Kießling, “Foundations of Preferences in Database Systems,” Proc. 28th Int’l Conf. Very Large Data Bases (VLDB), pp. 311-322, 2002.
[20] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. ACM Press, 1999.
[21] N. Fuhr, “A Probabilistic Framework for Vague Queries and Imprecise Information in Databases,” Proc. 16th Int’l Conf. Very Large Data Bases (VLDB), pp. 696-707, 1990.
[22] Y. Rui, T.S. Huang, and S. Mehrotra, “Content-Based Image Retrieval with Relevance Feedback in Mars,” Proc. IEEE Int’l Conf. Image Processing, pp. 815-818, 1997.
[23] L. Wu et al., “Falcon: Feedback Adaptive Loop for Content-Based Retrieval,” Proc. Int’l Conf. Very Large Data Bases (VLDB), pp. 297-306, 2000.
[24] X. Luo, X. Wei, and J. Zhang, “Guided Game-Based Learning Using Fuzzy Cognitive Maps,” IEEE Trans. Learning Technologies, vol. 3, no. 4, pp. 344-357, Oct.-Dec. 2010.
[25] A. Telang, C. Li, and S. Chakravarthy, “One Size Does Not Fit All: Toward User- and Query-Dependent Ranking for Web Databases,” IEEE Trans. Knowledge and Data Engineering, vol. 24, no. 9, Sept. 2012.
AUTHORS
Sruthi Ambati is a student of DRK Institute of Science and Technology, Ranga Reddy, Andhra Pradesh, India. She has received a B.Tech degree in Computer Science and Engineering and an M.Tech degree in Computer Science and Engineering. Her main research interests include Data Mining and Image Processing.
Raghava Rao N. is working as HOD and Associate Professor at DRK Institute of Science & Technology, Ranga Reddy, Andhra Pradesh, India.
He has received an M.Tech degree in Computer Science and Engineering. His main interests include Cloud Computing and Software Engineering.