CBIR by deep learning

© Vigen Sahakyan 2016
Content Based Image Retrieval by
Deep Learning

Agenda
● Goals
● What is CBIR?
● What is Deep Learning?
● AutoEncoder
● Tool description

Goals
● We want to build an image search system, based on machine learning
techniques, that searches by image content. Such a system has many
applications in public safety, the military, medical diagnosis, etc.
● The modern web holds millions or even billions of images without labels and
only a few thousand labeled images. The problem is how to use the power of
this unlabeled data in our system.
● In this presentation we describe our CBIR system, which extracts
meaningful information from unlabeled data using a widely used
Deep Learning technique called the AutoEncoder.

What is CBIR?
● Content Based Image Retrieval (CBIR) is the process by which one searches
for similar images.
● "Content-based" means that the search analyzes the contents of the image
rather than metadata such as keywords, tags, or descriptions associated
with the image.
● It is one of the open problems in Computer Vision.
● It has many applications in fields such as public safety, the military,
medical diagnosis, robotics, etc.

What is Deep Learning?
1. Deep learning is a branch of machine learning based on a set of algorithms that attempt to model
high-level abstractions in data by using multiple processing layers.
2. It is used in Machine Learning to learn high-level features automatically.
3. With Deep Learning we can extract high-level features such as shape, texture, and contrast from image
datasets (the images do not need to be labeled).
4. There are many Deep Learning algorithms,
such as Convolutional and Recursive Neural
Networks, Deep Belief Networks, and Restricted
Boltzmann Machines. In this work we
used an AutoEncoder.
5. It has many applications in fields such
as Computer Vision, Search Engines, Speech
Recognition, Artificial Intelligence, etc.

AutoEncoder
● The aim of an autoencoder is to learn a representation (encoding) for a set of data,
typically for the purpose of dimensionality reduction.
● Recently, the autoencoder concept has become more widely used for learning
generative models of data.
● The AutoEncoder is also a Neural Network.
The difference is that the AutoEncoder uses
unsupervised learning. To achieve this, the
AutoEncoder is trained to reproduce its input
vector at the output. The differences between the
output vector and the input vector are treated as
the errors for backpropagation. In this way it tries
to learn a codec in the hidden layer (the encoded value).
● Input = Decode(Encode(Input))
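
The training idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not the model from this work: the class name `TinyAutoEncoder`, the single hidden layer, the squared-error loss, the learning rate, and the toy data are all assumptions chosen to keep the example small.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

class TinyAutoEncoder:
    """Hypothetical one-hidden-layer autoencoder (illustration only)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        # Hidden layer: the compact code for x.
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def decode(self, code):
        # Output layer: the reconstruction of the input.
        return [sigmoid(sum(w * c for w, c in zip(row, code)) + b)
                for row, b in zip(self.w2, self.b2)]

    def train_step(self, x, lr=0.5):
        """One SGD step; the target is the input itself."""
        code = self.encode(x)
        out = self.decode(code)
        # Error at the output: difference between reconstruction and input,
        # scaled by the sigmoid derivative.
        d_out = [(o - v) * o * (1.0 - o) for o, v in zip(out, x)]
        # Backpropagate the error to the hidden layer (uses w2 before its update).
        d_hid = [sum(d_out[i] * self.w2[i][j] for i in range(len(x)))
                 * code[j] * (1.0 - code[j]) for j in range(len(code))]
        for i, d in enumerate(d_out):
            for j, c in enumerate(code):
                self.w2[i][j] -= lr * d * c
            self.b2[i] -= lr * d
        for j, d in enumerate(d_hid):
            for k, v in enumerate(x):
                self.w1[j][k] -= lr * d * v
            self.b1[j] -= lr * d
        # Squared reconstruction error measured before the update.
        return 0.5 * sum((o - v) ** 2 for o, v in zip(out, x))

def train(model, data, epochs=200, lr=0.5, seed=0):
    """Stochastic gradient descent; returns the mean loss of each epoch."""
    rng = random.Random(seed)
    data = list(data)
    history = []
    for _ in range(epochs):
        rng.shuffle(data)
        history.append(sum(model.train_step(x, lr) for x in data) / len(data))
    return history
```

After training, `model.encode(x)` gives the compact code and `model.decode(model.encode(x))` gives an approximate reconstruction of `x`. No labels are used at any point.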

Tool description
1. First, the web service receives a raw image (.jpg, .png, etc.) and passes it to
the preprocessing step.
2. Preprocess the raw image:
a. Resize the image to the appropriate size (the input size of our model).
b. Generate a grayscale representation of the resized image.
3. Generate a row vector from the pixels of the preprocessed image.
4. Call the Normalization module.
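
A minimal sketch of the preprocessing steps above, assuming the image has already been decoded into rows of (r, g, b) tuples. The function names, the nearest-neighbour resize, the BT.601 grayscale weights, and the 28x28 default size (the MNIST image size) are assumptions for illustration, not details of the actual tool.

```python
def resize_nearest(pixels, new_h, new_w):
    """Nearest-neighbour resize of an image stored as rows of (r, g, b) tuples."""
    old_h, old_w = len(pixels), len(pixels[0])
    return [[pixels[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
            for r in range(new_h)]

def to_grayscale(pixels):
    """Convert RGB pixels to 8-bit luminance using ITU-R BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in pixels]

def to_row_vector(gray):
    """Flatten a 2-D grayscale image into a single row vector."""
    return [v for row in gray for v in row]

def preprocess(pixels, size=(28, 28)):
    """Resize, convert to grayscale, then flatten (steps 2a, 2b and 3)."""
    resized = resize_nearest(pixels, size[0], size[1])
    return to_row_vector(to_grayscale(resized))
```

For a 28x28 model size, `preprocess(image)` returns a list of 784 values in the range 0 to 255, ready for the Normalization module.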

Tool description
We apply the sigmoid function to the value of every neuron,
so it is useful to have normalized inputs: training then
converges faster and the error rate improves.
1. We apply Min-Max normalization to the input values using the following
formula: $z_i = (x_i - \min(x)) / (\max(x) - \min(x))$
2. In our case, pixel values lie between 0 and 255, so $z_i = x_i / 255$
3. Call the Encoding module.
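
The normalization step can be sketched as follows. The function names are hypothetical, and the handling of a constant vector (returning all zeros) is an assumption added to avoid division by zero.

```python
def min_max_normalize(x):
    """General Min-Max normalization: maps the values of x into [0, 1]."""
    lo, hi = min(x), max(x)
    if hi == lo:
        # Assumption: a constant vector is mapped to all zeros.
        return [0.0 for _ in x]
    return [(v - lo) / (hi - lo) for v in x]

def normalize_pixels(x):
    """Special case for 8-bit pixels, where min is 0 and max is 255."""
    return [v / 255 for v in x]
```

The two functions agree whenever the vector actually contains both a 0 and a 255. Dividing by 255 has the advantage that every image is scaled the same way, regardless of its own darkest and brightest pixel.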

Tool description
We have already pretrained our AutoEncoder model via stochastic gradient
descent. As the dataset we used 60,000 unlabeled images of handwritten digits. After
training, the AutoEncoder had learned many high-level features of those images.
1. We feed the normalized row vector of the image to our AutoEncoder and get a more
compact feature vector (each element of this vector represents the probability that
a given high-level feature is present in the image).
2. We pass the new compact vector to the Classifier module. (There is no need to
normalize this vector, as it was already normalized when it passed through the
sigmoid function.)
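
A minimal sketch of the encoding step, assuming the pretrained hidden-layer parameters are available as a weight matrix and a bias vector. The names `encode`, `weights` and `biases` are hypothetical. Because each value comes out of a sigmoid, it lies between 0 and 1, which is why no further normalization is needed.

```python
import math

def encode(x, weights, biases):
    """Hidden-layer activations sigmoid(W x + b): one value per hidden unit."""
    return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
            for row, b in zip(weights, biases)]
```

With one row of `weights` per hidden unit, a 784-element input and a smaller hidden layer give the more compact feature vector described above.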

Tool description
We pretrained our Neural Network classifier with several
thousand labeled examples that had been passed through
the AutoEncoder.
1. We feed in the row vector encoded by the AutoEncoder
and call the Result retrieval module to determine the
result class from the output layer.

Tool description
Each node in the output layer holds the probability that its class is the
correct output.
1. If the probability of one of the output classes is greater than the
threshold (0.5), that class is taken as the result class.
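
The thresholding rule can be sketched as below. The function name is hypothetical. Two behaviours are assumptions, since the slides do not specify them: picking the most probable class when several exceed the threshold, and returning `None` when none does.

```python
def retrieve_result(probabilities, threshold=0.5):
    """Return the index of the most probable class if it exceeds the threshold."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    # Assumption: no class is reported when nothing clears the threshold.
    return best if probabilities[best] > threshold else None
```

For digit recognition, `probabilities` would hold ten values, one per output node, and the returned index would be the recognized digit.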

Result
We tested our algorithm on the MNIST handwritten digit image dataset and
compared it with the results of a couple of well-known papers.
Algorithm                    MNIST
Our algorithm                95%
Yann LeCun algorithm         95.3%
Aurelio Ranzato algorithm    99%
