Content-Based Image Retrieval (CBIR)
Behzad Shomali
What is CBIR?
Content-based image retrieval, also known as query by image content (QBIC) and content-based
visual information retrieval (CBVIR), is the application of computer vision techniques to the image
retrieval problem, that is, the problem of searching for digital images in large databases.
https://en.wikipedia.org/wiki/Content-based_image_retrieval
[Pipeline: the query image and each database image go through feature extraction; a similarity measurement between the feature vectors ranks the retrieved images]
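The pipeline above can be sketched in a few lines. This is a minimal illustration, not any particular system: the toy histogram feature stands in for a learned embedding (e.g. CNN activations), and the choice of cosine similarity and top-k are illustrative assumptions.

```python
import numpy as np

def extract_features(image, bins=8):
    # Toy feature extractor: a normalized gray-level histogram.
    # Real systems use learned embeddings; any fixed-length vector works here.
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def retrieve(query_image, database_images, top_k=3):
    # 1. Feature extraction for the query and every database image.
    q = extract_features(query_image)
    feats = np.array([extract_features(im) for im in database_images])
    # 2. Similarity measurement (cosine similarity here).
    sims = feats @ q / (np.linalg.norm(feats, axis=1) * np.linalg.norm(q) + 1e-9)
    # 3. Retrieved images: indices of the top-k most similar ones.
    return np.argsort(-sims)[:top_k]
```

In practice the database features are precomputed and indexed, so only the query-side feature extraction happens at search time.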
Technologies
● Query by example (QBE)
● Semantic retrieval
● Relevance feedback (human interaction)
● Iterative/machine learning
● Other query methods
https://en.wikipedia.org/wiki/Content-based_image_retrieval
Application in popular search systems
● Google Images
○ Constructing a mathematical model
○ Metadata
● eBay
○ ResNet-50 for category recognition
● SK Planet
○ Inception-v3 as vision encoder
○ RNN multi-class classification
● Alibaba
○ GoogLeNet V1 for category prediction and feature learning
● Pinterest
○ Two-step object detection
https://en.wikipedia.org/wiki/Reverse_image_search
Image Representation and Features
● Extract local and deep features
● Studied AlexNet and VGG
○ Extract feature representations from the fc6 and fc8 layers
○ Binarize the features
○ Compare with Hamming distance
● Extract salient color signatures
○ Detect salient regions
○ Apply K-means clustering
○ Store cluster centroids and weights as the image signature
[Jing, Yushi, et al. "Visual search at Pinterest." Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2015]
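The binarize-then-Hamming idea can be sketched as follows; the random vectors below merely stand in for fc6 activations, and the zero threshold is an illustrative assumption:

```python
import numpy as np

def binarize(features, threshold=0.0):
    # Turn real-valued activations into a binary code:
    # 1 where the activation exceeds the threshold, else 0.
    return (features > threshold).astype(np.uint8)

def hamming_distance(a, b):
    # Number of bit positions where the two binary codes differ.
    return int(np.count_nonzero(a != b))

# Random vectors stand in for 4096-dim fc6 activations.
rng = np.random.default_rng(42)
query = binarize(rng.standard_normal(4096))
codes = [binarize(rng.standard_normal(4096)) for _ in range(2)]
dists = [hamming_distance(query, c) for c in codes]
```

Binary codes are compact and Hamming distance reduces to bit operations, which is what makes this practical at web scale.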
Two-step Object Detection and Localization
1. Category classification
2. Object detection
[Jing, Yushi, et al. "Visual search at Pinterest." Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2015]
[Figure: an input image first goes through category classification (Car, Flower, Person, with per-category confidences c1 … cn, f1 … fm, p1 … pk); object detection then runs only for the predicted categories, which reduces computational cost]
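The two-step scheme can be sketched as below. Both the classifier and the per-category detectors are hypothetical stand-ins; the point is only the control flow, where a cheap classifier decides which expensive detectors to run:

```python
def classify_categories(image, threshold=0.5):
    # Stand-in classifier: returns per-category confidence scores.
    scores = {"car": 0.9, "flower": 0.1, "person": 0.6}  # placeholder output
    return [c for c, s in scores.items() if s >= threshold]

def detect_objects(image, category):
    # Stand-in per-category detector: returns bounding boxes.
    return [(0, 0, 10, 10)]  # placeholder box

def two_step_detection(image):
    # Step 1: category classification narrows the candidate set.
    categories = classify_categories(image)
    # Step 2: object detection runs only for the predicted categories.
    return {c: detect_objects(image, c) for c in categories}
```

Skipping detectors for low-confidence categories is where the computational saving comes from.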
Static Evaluation of Search Relevance
● The dataset used contains 1.6 M unique images
○ Two images are assumed to be relevant to each other if they share a label
● Computed precision@k based on several features:
○ The fc6 layer activations of the generic AlexNet (pre-trained for ILSVRC)
○ The fc6 activations of an AlexNet model fine-tuned to recognize over 3,000 Pinterest product categories
○ The loss3/classifier activations of a generic GoogLeNet
○ The fc6 activations of a generic VGG 16-layer model
[Jing, Yushi, et al. "Visual search at Pinterest." Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2015]
Precision vs. Recall
[Müller, Henning, et al. "Performance evaluation in content-based image retrieval: overview and proposals." Pattern recognition letters 22.5 (2001): 593-601]
Either value alone contains insufficient information:
● Recall can always be made 1 simply by retrieving all images
● Similarly, precision can be kept high by retrieving only a few images

Common summary measures:
● P(10); P(30); P(NR) – precision after the first 10, 30, or NR documents are retrieved
● Mean Average Precision (MAP) – mean of the (non-interpolated) average precision over all queries
● Recall at 0.5 precision – recall at the rank where precision drops below 0.5
● R(1000) – recall after 1000 documents are retrieved
● Rank first relevant – the rank of the highest-ranked relevant document
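Two of these measures, precision@k and (non-interpolated) average precision, are simple enough to compute directly from a ranked list of relevance judgments; the example ranking below is illustrative:

```python
def precision_at_k(ranked_relevance, k):
    # Fraction of the top-k retrieved items that are relevant.
    top = ranked_relevance[:k]
    return sum(top) / k

def average_precision(ranked_relevance):
    # Non-interpolated AP: mean of precision@k taken at each
    # rank k where a relevant item appears.
    hits, total, ap = 0, sum(ranked_relevance), 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            ap += hits / k
    return ap / total if total else 0.0

# Relevance of a ranked result list (1 = relevant, 0 = not).
ranking = [1, 0, 1, 1, 0]
```

MAP is then just the mean of `average_precision` over all queries in the evaluation set.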
Precision and recall should therefore be used together.
Relevance of visual search
Table 1 shows the p@5 and p@10 performance of these models, along with the average CPU-based latency of the visual search service, which includes feature extraction for the query image as well as retrieval.
[Jing, Yushi, et al. "Visual search at Pinterest." Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2015]
Siamese networks
[Das, Arpita, et al. "Together we stand: Siamese networks for similar question retrieval." Proceedings of the 54th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers). 2016]
Siamese networks
● Let F_W(X) be a family of functions parameterized by W, assumed to be differentiable with respect to W. A Siamese network seeks a value of the parameters W such that the symmetric similarity metric between F_W(X1) and F_W(X2) is small if X1 and X2 belong to the same category, and large if they belong to different categories.
[Das, Arpita, et al. "Together we stand: Siamese networks for similar question retrieval." Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016]
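The defining property, both inputs are embedded by the same parameters W, can be sketched as below. The single tanh layer is an illustrative stand-in for F_W, and Euclidean distance is one common choice of symmetric metric:

```python
import numpy as np

def embed(x, W):
    # F_W(X): a one-layer embedding; both branches share the same W.
    return np.tanh(W @ x)

def similarity_metric(x1, x2, W):
    # Symmetric metric: Euclidean distance between the two embeddings
    # produced by the SAME weights W (the defining Siamese property).
    return float(np.linalg.norm(embed(x1, W) - embed(x2, W)))
```

Training adjusts W so that this metric is small for same-category pairs and large for different-category pairs.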
Different loss functions for training a Siamese network
Two commonly used ones are:
● Triplet loss
● Contrastive loss
The main idea of these loss functions is to pull the samples of each class toward one another and push the samples of different classes away from each other.
[Ghojogh, Benyamin, et al. "Fisher discriminant triplet and contrastive losses for training siamese networks." 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020]
Different loss functions - Triplet loss
The triplet loss uses triplets of anchor, neighbor, and distant samples. Let f(x) be the output (i.e., embedding) of the network for the input x. The triplet loss reduces the distance between anchor and neighbor embeddings and increases the distance between anchor and distant embeddings. Once the anchor-distant distances exceed the anchor-neighbor distances by a margin α ≥ 0, the desired embedding is obtained.
[Ghojogh, Benyamin, et al. "Fisher discriminant triplet and contrastive losses for training siamese networks." 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020]
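A minimal sketch of the triplet loss in its common hinge form, operating on precomputed embeddings (squared Euclidean distance and the margin value are illustrative choices):

```python
import numpy as np

def triplet_loss(anchor, neighbor, distant, margin=0.2):
    # Hinge form: the loss is zero once the anchor-distant distance
    # exceeds the anchor-neighbor distance by at least the margin.
    d_pos = np.sum((anchor - neighbor) ** 2)  # anchor-neighbor distance
    d_neg = np.sum((anchor - distant) ** 2)   # anchor-distant distance
    return float(max(d_pos - d_neg + margin, 0.0))
```

When the triplet is already well separated the loss vanishes, so gradients flow only through triplets that still violate the margin.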
Different loss functions - Contrastive loss
The contrastive loss uses pairs of samples, which can be anchor and neighbor or anchor and distant. If the samples are anchor and neighbor, they are pulled toward each other; otherwise, their distance is increased. In other words, the contrastive loss acts like the triplet loss but on one pair at a time rather than on both pairs simultaneously. The desired embedding is obtained when the anchor-distant distances exceed the anchor-neighbor distances by a margin of α.
[Ghojogh, Benyamin, et al. "Fisher discriminant triplet and contrastive losses for training siamese networks." 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020]
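The one-pair-at-a-time behavior can be sketched as follows, again on precomputed embeddings; the squared-distance form and unit margin are illustrative assumptions:

```python
import numpy as np

def contrastive_loss(x1, x2, same_class, margin=1.0):
    # Distance between the two embeddings (assumed precomputed here).
    d = np.linalg.norm(x1 - x2)
    if same_class:
        # Anchor-neighbor pair: pull together, penalize any distance.
        return float(d ** 2)
    # Anchor-distant pair: push apart, but only while the pair is
    # still closer than the margin.
    return float(max(margin - d, 0.0) ** 2)
```

Unlike the triplet loss, each call sees only one pair, so the pull and push terms are applied on separate training examples rather than jointly.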