International Journal on Web Service Computing (IJWSC), Vol.3, No.3, September 2012
DOI : 10.5121/ijwsc.2012.3307
Web Image Retrieval Using Visual Dictionary
Umesh K K¹ and Suresha²

¹ Department of Information Science and Engineering, S J College of Engineering, Mysore, India.
umeshkatte@gmail.com

² Department of Studies in Computer Science, Manasagangothri, Mysore University, Mysore, India.
sureshabm@yahoo.com
ABSTRACT
In this research, we propose a semantic-based image retrieval system that retrieves a set of relevant images from the Web for a given query image. We use a global color space model and the Dense SIFT feature extraction technique to generate a visual dictionary with the proposed quantization algorithm. Images are transformed into sets of features, which are used as input to the proposed quantization algorithm to generate the codewords that form the visual dictionary. These codewords represent images semantically as visual labels using Bag-of-Features (BoF). The histogram intersection method is used to measure the distance between the input image and the images in the database to retrieve similar images. The experimental results are evaluated over a collection of 1000 generic Web images to demonstrate the effectiveness of the proposed system.
KEYWORDS
Content-based image retrieval, Dense SIFT feature, Quantization, Bag-of-Features, Bag-of-Words, Similarity measure.
1 INTRODUCTION
Owing to rapid advances in digital technologies, many people now wish to publish their digital information, such as text, images, video, and audio, on the Web. This demands effective indexing and searching tools for the Web. Many researchers have worked on systems that retrieve a set of Web images similar to a given query image. The contents of an image are used to represent it semantically, and the derived image features are used to retrieve semantically relevant images from the Web.
In the past few years, many researchers have worked in the area of Content-Based Image Retrieval (CBIR) to develop techniques for retrieving unannotated images [1]. Today many people use digital image and video libraries as their main source of visual information. Hence it is an open challenge for the research community to develop cost-effective technologies for retrieving, managing, and browsing images on the Web.
Many CBIR systems have been proposed in recent years. The first well-known CBIR system is Query by Image Content (QBIC) [2], developed at the IBM Almaden Research Center. Other systems include MIT's Photobook [3] and its successor, FourEyes [4], and the search engine family of VisualSEEk, WebSEEk, and MetaSEEk [5, 6, 7], all developed at Columbia University. Virage [8] is a commercial content-based search engine developed at Virage Technologies. In
these models, low-level features are extracted from the collected images to retrieve a set of relevant images, and the results are based on previously indexed detection results. The limitations of existing systems are twofold: they require an endless number of object detectors, each needing ground-truth labels, parameter selection, validation, and so on; and the similarity-based retrieval approach requires the user to present one or more example images of what he or she is searching for, described by low-level features.
Our work is based on the bag-of-features approach. The basic idea is that a set of local image blocks is sampled and a vector of visual descriptors is computed on each block independently using the Dense Scale-Invariant Feature Transform (DSIFT). The resulting distribution of descriptors in descriptor space is then quantized using the proposed vector quantization. As the literature shows, many clustering and vector quantization algorithms have been developed to build the visual vocabulary [9, 10, 11]. In our research work, we propose a novel approach to generate the codebook that reduces the quantization error.
According to the literature, bag-of-features was first devised in the NeTra image retrieval system [18], which uses a codebook to efficiently index color, texture, and shape descriptors. The same idea was revisited and implemented in the RETIN system [19]. These systems were based on global descriptors; a codebook of local descriptors was used in the Video Google approach [20]. Along the same lines, we focus on building a visual dictionary of local descriptors, since building a good visual dictionary is the biggest challenge when employing these techniques. Creating a visual dictionary requires quantizing the description space, which can be done by a clustering algorithm. The most common choice found in the literature is a combination of dimensionality reduction using Principal Component Analysis (PCA) and clustering using the k-means algorithm with Euclidean distance. This choice has been criticized [21], since the solution is far from adequate for high-dimensional spaces, and the optimality criterion of PCA does not match the needs of clustering: it chooses the components that maximize the global variance of the data, not the ones that best preserve the local clusters.
The Harris corner detector is very sensitive to changes in image scale, so it does not provide a good basis for matching images of different sizes [23]. The Scale-Invariant Feature Transform (SIFT) extends local features to achieve scale invariance and robustness to local image distortions [23].
The novelty of our work is as follows:
1. The Dense SIFT (DSIFT) feature description is used to show that local feature extraction outperforms the global color model.
2. The criticisms of the basic clustering technique are addressed, and a novel quantization algorithm is proposed for generating a good visual dictionary.
The organization of this paper is as follows. Section 2 discusses the proposed methodology, Section 3 describes the experimental results, and Section 3.2 concludes our research findings.
2 THE PROPOSED METHODOLOGY
This section presents a brief description of the datasets used in our study, the feature extraction, the quantization and distance measures, the Bag-of-Features image representation, and image retrieval using Bag-of-Words. The block diagram of the proposed model is shown in Figure 1.
2.2 Feature Extraction
The keypoints extracted by the Scale-Invariant Feature Transform (SIFT) are invariant to rotation, scaling, and translation, and are used to detect distinctive edges and textures in an image. Each keypoint has 132 dimensions: 128 spatial orientation bins, plus the coordinates, scale, and rotation of the keypoint. We also extract global image descriptors, representing images as a collection of local properties calculated from a set of small sub-images called blocks. Image features are extracted from the collected images and stored in a database. According to David Lowe [23], SIFT gives good results for recognizing and retrieving images; the approach achieves scale invariance and is less sensitive to local image distortions such as 3D viewpoint changes. Our feature vector consists of SIFT features computed on a regular grid across the image, known as Dense SIFT (DSIFT). These feature vectors are quantized into visual words to build the visual dictionary using the proposed quantization algorithm.
The DSIFT algorithm makes some new assumptions:
1. The location of each keypoint is not derived from the gradient features of the pixels, but from a predefined grid.
2. The scale of each keypoint is the same and is also predefined.
3. The orientation of each keypoint is always zero.
Fig. 1. The block diagram of the proposed model.
With these assumptions, DSIFT can acquire more features in less time than SIFT does.
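To make the three assumptions concrete, the following is a minimal Python sketch of dense descriptor extraction. It assumes OpenCV's SIFT implementation; the grid step and keypoint size are illustrative values, not parameters taken from the paper.

```python
import cv2

def dense_sift_descriptors(gray, step=8, size=8):
    """Compute SIFT descriptors on a fixed grid: predefined locations,
    fixed scale (`size`), zero orientation -- the three DSIFT assumptions."""
    h, w = gray.shape
    keypoints = [cv2.KeyPoint(float(x), float(y), float(size))
                 for y in range(step // 2, h, step)
                 for x in range(step // 2, w, step)]
    sift = cv2.SIFT_create()
    _, descriptors = sift.compute(gray, keypoints)
    return descriptors  # one 128-D descriptor per grid point
```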
2.2.1 Dense SIFT
The Dense SIFT feature extraction algorithm is used to extract Web image features. The important parameters when extracting features from Web images are the size of the blocks (N) and their spacing (M), which controls the degree of overlap. The RGB components of the Web images are converted to the HSV color space, and each image is split into blocks formed by an N×N grid. We then compute the mean and variance of each block's color band. We use a 4×4 grid in our tests, yielding a 16×3×2 = 96-dimensional feature space. The block sizes tested are N = 5, 7, and 11. The blocks are spaced by M pixels on a regular grid; they do not overlap when M = N, and do overlap when M = 3 (for N = 5, 7) and M = 7 (for N = 11).
The proposed image retrieval system is compared with the baseline global color model, discussed below; this comparison is used to gauge the precision and recall of the retrieval tasks.
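As an illustration of the block feature just described, here is a hypothetical Python helper (OpenCV is assumed only for the RGB-to-HSV conversion); with grid = 4 it yields the 96-dimensional vector mentioned above.

```python
import cv2
import numpy as np

def block_color_features(bgr, grid=4):
    """Mean and variance of each HSV band over a grid x grid tiling:
    with grid=4 this gives 16 blocks x 3 bands x 2 stats = 96 dimensions."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    h, w, _ = hsv.shape
    feats = []
    for r in range(grid):
        for c in range(grid):
            block = hsv[r * h // grid:(r + 1) * h // grid,
                        c * w // grid:(c + 1) * w // grid]
            for band in range(3):
                feats.append(block[:, :, band].mean())
                feats.append(block[:, :, band].var())
    return np.asarray(feats)
```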
2.2.2 Global Color Model
The global color model computes a global HSV color histogram for each training Web image. The color values are represented by a histogram with 36 bins for H, 32 bins for S, and 16 bins for V, giving an 84-dimensional vector for each image. A query image is tested using histogram intersection, and precision and recall are measured separately. To form the histogram, we compute the frequency of each visual word, where each image is represented as a collection of visual words drawn from the visual vocabulary. This visual vocabulary is obtained by vector-quantizing descriptors computed from the image database. The DSIFT descriptors and the HSV color space features are used separately as input to the proposed quantization algorithm to generate the vocabulary, as discussed in Section 2.3.
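A sketch of the 84-dimensional global descriptor is given below. It assumes OpenCV, whose 8-bit hue channel spans [0, 180), and L1-normalizes the concatenated histogram; the paper does not state its normalization, so that step is an assumption.

```python
import cv2
import numpy as np

def global_hsv_histogram(bgr):
    """84-D global color descriptor: 36 H bins + 32 S bins + 16 V bins,
    each band histogrammed independently, then L1-normalized."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    h_hist = cv2.calcHist([hsv], [0], None, [36], [0, 180]).flatten()
    s_hist = cv2.calcHist([hsv], [1], None, [32], [0, 256]).flatten()
    v_hist = cv2.calcHist([hsv], [2], None, [16], [0, 256]).flatten()
    hist = np.concatenate([h_hist, s_hist, v_hist])
    return hist / hist.sum()
```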
2.3 Quantization and Distance Measures
In this method, all the training vectors are initially treated as cluster centers. The distance between a training vector $X_i$ and every other training vector $X_j$, where $i \neq j$, is computed as

$$D_{ij} = \sum_{k=1}^{d} (x_{ik} - x_{jk})^2, \quad i \neq j,\ 1 \leq j \leq N \qquad (1)$$

where $d$ is the descriptor dimension and $N$ is the number of training vectors.
The minimum of all these distances is identified, and the cluster density of the corresponding training vector is incremented by 1. For example, if the distance between the 1st and the 10000th vectors is the minimum, then the cluster density of the 10000th vector is incremented by 1. Similarly, the distances between the second training vector and all the other vectors are computed, and so on: whichever distance is minimum, the cluster density of the corresponding vector is incremented by one. These steps are repeated for all the training vectors one by one. Hence, if 250 training vectors are closest to some training vector i, the cluster density of the ith vector is 250. The training vectors are then sorted in descending order of their cluster densities. Finally, the M training vectors with the top density values are selected as the seeds for the initial codebook, which is used as the input for the Web image representation in the vector space model.
Steps to generate the codebook:
Step 1: Input the image.
Step 2: Split the image into small blocks of size 4×4.
Step 3: Generate the training set of size N.
Step 4: Initialize the density array of size N with zeros.
Step 5: for i = 1 to N
Step 6:   for j = 1 to N
Step 7:     if (i ≠ j)
Step 8:       compute the distance between vectors i and j using Eq. (1).
Step 9:   end j
Step 10:  Find the minimum of the N−1 distances computed and increment the corresponding density by 1.
Step 11: end i
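The steps above translate directly into code. The following is a runnable NumPy sketch of the density-seeding procedure; the function and variable names are ours, not the paper's.

```python
import numpy as np

def generate_codebook(training_vectors, codebook_size):
    """Density-based codebook seeding (Section 2.3): every vector votes
    for its nearest neighbour (Eq. (1)); the densest vectors become the
    initial codewords. O(N^2) pairwise distances, so moderate N only."""
    X = np.asarray(training_vectors, dtype=np.float64)
    density = np.zeros(len(X), dtype=int)
    for i in range(len(X)):
        d = np.sum((X - X[i]) ** 2, axis=1)  # squared Euclidean distances
        d[i] = np.inf                        # skip the vector itself
        density[np.argmin(d)] += 1           # credit the nearest neighbour
    top = np.argsort(density)[::-1][:codebook_size]
    return X[top]                            # the M densest vectors
```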
2.4 Image Representations with a Bag of Visual Words
The bag of visual words method is inspired by the Bag-of-Words (BoW) model [22], which has been used in text mining. In the BoW model, each word is treated as an independent feature, and this representation has
achieved outstanding performance in text retrieval systems. The same idea is used in our algorithm to represent Web images semantically: in the bag of visual words model, each image is represented as an orderless set of visual words. Recently, several researchers have demonstrated its effectiveness in semantic-based image retrieval [23]. Local descriptors (features) are much more precise and discriminating than global descriptors. When looking for a specific image in an image database, or for a target object within an image, this discrimination is essential to the retrieval system; but when looking for complex categories in a large image database, it becomes difficult. To minimize this problem, a possible solution is the technique of visual dictionaries, built from bags of visual words constructed using clustering approaches.
In the visual dictionary representation, each region of an image becomes a visual "word" of the dictionary. The idea is that different regions of the description space become associated with different semantic concepts: for example, sky, clouds, land, rocks, and vegetation each have their own semantics. In CBIR, the low-level representation considers the descriptor space and splits it into multiple regions, employing unsupervised learning techniques such as clustering. The limitation of CBIR is that it describes images using low-level features, which cannot by themselves bridge the gap between the low-level and high-level representations. Hence, we adopt the Bag-of-Words model from text retrieval to construct a visual dictionary. Based on the visual vocabulary (codebook), each Web image is represented by its corresponding codewords semantically, in order to retrieve similar images from the Web. Besides this compact representation, a sparse but uniform representation is also utilized, in which an image is a vector with each dimension corresponding to a codeword.
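A minimal sketch of this encoding step follows (assuming a codebook produced by the quantization algorithm of Section 2.3; the names are illustrative):

```python
import numpy as np

def bag_of_words_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and return
    the L1-normalized histogram of visual-word frequencies."""
    dists = np.sum((descriptors[:, None, :] - codebook[None, :, :]) ** 2,
                   axis=2)
    words = np.argmin(dists, axis=1)   # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(np.float64)
    return hist / max(hist.sum(), 1.0)
```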
2.5 Image Retrieval Using Bag-of-Words
We use the histogram intersection method to retrieve similar images from the image database. It computes the similarity between the spatial histograms of two given images A and B. The histogram intersection distance is defined in Equation (2):

$$d(A, B) = 1 - \sum_{i=1}^{m} \sum_{j=1}^{n} \min(a_i, b_j) \qquad (2)$$
where $a_i$ and $b_j$ represent the frequencies of visual words in each block (the whole image is divided into four large blocks) of images A and B, respectively. The frequency of each visual word is recorded in a histogram for the m blocks of a spatial tiling. For a given query image Q, the distance between Q and each image in the Web image database is calculated; the images with the smallest distances are then selected and ranked from most to least similar.
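For illustration, a sketch of the ranking step is given below. It assumes one normalized histogram per image, so it simplifies away the four-block spatial tiling described above.

```python
import numpy as np

def intersection_distance(a, b):
    """Histogram intersection distance: 0 for identical normalized
    histograms, approaching 1 for disjoint ones."""
    return 1.0 - np.minimum(a, b).sum()

def rank_database(query_hist, database_hists):
    """Return database indices ordered from most to least similar."""
    dists = np.array([intersection_distance(query_hist, h)
                      for h in database_hists])
    return np.argsort(dists)
```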
The quantitative measures of the retrieval system are the average precision and recall [12, 13, 23]. We use the same measures in our experiments, defined in Equations (3) and (4):
$$P(i) = \frac{1}{M} \sum_{j=1}^{M} r(i, j) \qquad (3)$$

$$R(i) = \frac{1}{N} \sum_{j=1}^{N} r(i, j) \qquad (4)$$
where

$$r(i, j) = \begin{cases} 1, & \text{if } id(i) = id(j) \\ 0, & \text{otherwise} \end{cases}$$
Here P(i) is the precision of query image i, R(i) is the recall of query image i, and id(i) and id(j) are the category IDs of images i and j, respectively, which range from 1 to 10. M is the size of the category that image i belongs to. The precision is the percentage of images belonging to the category of image i (the relevant images) among the first M retrieved images, and the recall is the percentage of such images among the N relevant images.
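These measures translate into a few lines of code. The sketch below is our reading of Equations (3) and (4); `category_of` is a hypothetical mapping from image index to category ID.

```python
def precision_recall(query, ranked, category_of, M, N):
    """Eqs. (3)-(4): r(i, j) = 1 when retrieved image j shares the
    query's category ID. M is the category size; precision is measured
    over the first M results and recall over the first N."""
    r = [1 if category_of[j] == category_of[query] else 0 for j in ranked]
    return sum(r[:M]) / M, sum(r[:N]) / N
```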
For example, if the query image is a dinosaur (Figure 1) and 89 of the first 100 retrieved images belong to the dinosaur category, the retrieval precision is 0.89. Similarly, if the query image is an elephant (Figure 1) and 59 of the first 100 retrieved images belong to the elephant category, the retrieval precision of the semantic representation is 0.59. In the global color model, the retrieval precision is 0.85 for dinosaurs and 0.58 for elephants. The average precision of the global color model is lower than that of the DSIFT semantic representation: as shown in Table 1, the average precision of the global color model is 0.49, while that of the DSIFT model is 0.62. Based on the results in Table 1, DSIFT gives a better semantic representation than the global color model.
Table 1. Precision and recall of the global color model and the Dense SIFT color model for each of the 10 classes.

Class        Global Color Model        Dense SIFT Color Model
             Precision    Recall       Precision    Recall
Africa       .44          .70          .58          .78
Beaches      .38          .69          .52          .76
Buildings    .39          .58          .57          .69
Bus          .61          .64          .77          .89
Dinosaur     .84          .85          .89          .90
Elephant     .54          .58          .59          .66
Flower       .67          .67          .78          .83
Horses       .78          .73          .89          .89
Mountains    .49          .53          .59          .62
Food         .49          .61          .62          .71
3. EXPERIMENTAL RESULTS
We tested a benchmark Corel image dataset comprising 1000 images from 10 categories [16, 17]. Most of the images are color photographs in JPEG format at 256×384 or 384×256 resolution. We analyzed images belonging to different categories rather than to a specific domain. This dataset has been used in the SIMPLIcity [14] and ALIPR [15] projects to demonstrate their results. We also used the WBIIS image dataset, which comprises 10,000 images from 100 categories [16, 17]; most of these are 24-bit true-color JPEG images at 85×128 or 128×85 resolution. The Corel image datasets are used in our experiments to measure the effectiveness of the retrieval system.
The SIFT features are given as input to the proposed quantization algorithm to visualize and interpret large high-dimensional data sets. First, we investigated the effect of codebook size on the retrieval performance of the system. We conducted many experiments to fix the size of the codebook, testing 50, 100, 150, 200, and 250 codewords. A codebook of size 200 is the best for the Corel image dataset, and a codebook of size 1000 is
the best for the WBIIS low-resolution Web-crawled image dataset. The detailed results of using SIFT features for each of the 10 categories are shown in Table 1. For SIFT, we tested region blocks of 16×16 and 8×8 with different codebook sizes; the best results are obtained with a patch size of 8×8 and N = 200, and are listed in Table 1. The comparative average precision of the retrieval results for the global color and DSIFT models over the 10 categories is shown in Figure 2.
Fig. 2. Comparison of average precision of retrieval results in the global color and Dense SIFT models for the 10 classes.
3.1 Performance
The proposed system has been used for feature extraction, visual dictionary generation, image representation, and similarity measurement. The algorithm was implemented on an Intel® Core™ i3 processor at 2.20 GHz running the Windows operating system. Computing feature vectors for 1000 color images of size 256×384 or 384×256 takes approximately 30 minutes; on average, it takes 1.8 seconds to compute the feature vector of an image. Generating the codewords takes about 5 minutes. Matching is quite fast in this architecture: if the query image is in the database, it takes about 0.1 seconds of CPU time on average to intersect the query and indexed images and sort the distances of all the images for presentation to the user. The new image retrieval system is compared with the baseline global color model, which is included in order to gauge the precision of the retrieval tasks. By using visual dictionary techniques, the applications have been greatly broadened and now encompass category retrieval in image databases.
3.2 Conclusions
We have proposed a semantic-based image representation to retrieve images from the Web image database. The proposed quantization algorithm is an iterative procedure that generates codewords for every image in order to cluster similar features efficiently. The results of our experiments show that the proposed technique is able to retrieve similar kinds of images effectively from the Web image database. To further improve retrieval performance on the Web, an extensive study of different feature representations is needed to reduce the quantization error and the topological error rate, and to evaluate the complexity of the output space. We observed that large numbers of unclassified images are available on the Internet, which is a major challenge when collecting Web images. We conducted experiments on a Corel Web image database to demonstrate the efficiency and effectiveness of the proposed framework.
REFERENCES
[1] Rui, Y., Huang, T.S., Chang, S.-F.: Image Retrieval: Current Techniques, Promising Directions, and Open Issues. Journal of Visual Communication and Image Representation, 10(1), 39-62 (1999).
[2] Flickner, M., Sawhney, H., Niblack, W., et al.: Query by Image and Video Content: the QBIC System. IEEE Computer, September, 23-31 (1995).
[3] Pentland, A., Picard, R.W., Sclaroff, S.: Photobook: Tools for Content-Based Manipulation of Image Databases. In: Storage and Retrieval for Image and Video Databases II, SPIE Proceedings Series, Vol. 2185 (1994).
[4] Minka, T.P.: An Image Database Browser that Learns from User Interaction. Master's thesis, M.I.T., Cambridge, MA (1996).
[5] Swain, M.J., Frankel, C., Athitsos, V.: WebSeer: An Image Search Engine for the World Wide Web. Technical Report 96-14 (1997).
[6] Smith, J.R.: Integrated Spatial and Feature Image Systems: Retrieval, Compression and Analysis. PhD thesis, Graduate School of Arts and Sciences, Columbia University (1997).
[7] Sclaroff, S., Taycher, L., La Cascia, M.: ImageRover: A Content-Based Image Browser for the World Wide Web. In: Proceedings IEEE Workshop on Content-Based Access of Image and Video Libraries (1997).
[8] Bach, J.R., Fuller, C., Gupta, A., et al.: The Virage Image Search Engine: An Open Framework for Image Management. In: Sethi, I.K., Jain, R.J. (Eds.), Storage and Retrieval for Image and Video Databases IV, SPIE Proceedings Series, Vol. 2670, San Jose, CA, USA (1996).
[9] Koikkalainen, P.: Progress with the Tree-Structured Self-Organizing Map. In: Cohn, A.G. (Ed.), 11th European Conference on Artificial Intelligence. European Committee for Artificial Intelligence (ECCAI). Wiley, New York (1994).
[10] Koikkalainen, P., Oja, E.: Self-Organizing Hierarchical Feature Maps. In: Proceedings of the 1990 International Joint Conference on Neural Networks, Vol. II. IEEE, INNS, San Diego, CA (1990).
[11] Kohonen, T.: Self-Organizing Maps. Springer-Verlag, New York (1997).
[12] Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. Computer Science Series. McGraw-Hill, New York (1983).
[13] Li, J., Wang, J.Z., Wiederhold, G.: Integrated Region Matching for Image Retrieval. ACM Multimedia, pp. 147-156 (2000).
[14] Wang, J.Z., Li, J., Wiederhold, G.: SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture Libraries. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(9), 947-963 (2001).
[15] http://alipr.com/
[16] Datta, R., Li, J., Wang, J.Z.: Algorithmic Inferencing of Aesthetics and Emotion in Natural Images: An Exposition. In: Proc. IEEE ICIP, Special Session on Image Aesthetics, Mood and Emotion, San Diego, CA (2008).
[17] http://wang.ist.psu.edu/jwang/test1.jar
[18] Ma, W.-Y., Manjunath, B.: NeTra: A Toolbox for Navigating Large Image Databases. Multimedia Systems, 7(3), 184-198 (1999).
[19] Fournier, J., Cord, M., Philipp-Foliguet, S.: RETIN: A Content-Based Image Indexing and Retrieval System. Pattern Analysis and Applications, 4(2/3), 153-173 (2001).
[20] Sivic, J., Zisserman, A.: Video Google: A Text Retrieval Approach to Object Matching in Videos. In: Proc. 9th IEEE International Conference on Computer Vision, pp. 1470-1477 (2003).
[21] Triggs, B., Jurie, F.: Creating Efficient Codebooks for Visual Recognition. In: IEEE International Conference on Computer Vision, Vol. 1, pp. 604-610 (2005).
[22] Tian, Q., Zhang, S.: Descriptive Visual Words and Visual Phrases for Image Applications. ACM Multimedia, pp. 19-24 (2009).
[23] Lowe, D.G.: Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2), 91-110 (2004).