Kernel Descriptors for Visual Recognition
by L. Bo, X. Ren and D. Fox
A Term Paper Report by Priyatham Bollimpalli (10010148)
Summary of the Paper
Popular computer vision algorithms such as SIFT and HOG compute feature descriptors for an
image. A descriptor is, in simple terms, a concise representation of image properties that
can be used in practical applications such as object recognition, scene detection and image
matching. Inspired by the orientation histograms used in SIFT and HOG, this paper gives a
kernel view of orientation histograms and then designs kernel descriptors for gradient, colour
and local binary pattern (shape) attributes using match kernels. These kernels avoid the coarse
quantization of low-level pixel features and turn pixel attributes into a well-defined measure
of similarity between patches (higher-level features).
To compute the descriptors in a computationally feasible manner, the match kernels are first
approximated in finite dimensions using a finite set of basis vectors, for example sampled
normalized gradient vectors. Then, to reduce redundancy and obtain compact features, Kernel
Principal Component Analysis (KPCA) is applied. It is shown experimentally that the error
introduced by these two stages is very small. The gradient, colour and shape kernel descriptors
can then be computed efficiently and in a simple, straightforward way over images.
Experiments are carried out on four publicly available image classification datasets: Scene-15,
Caltech101, CIFAR10 and CIFAR10-ImageNet. Laplacian kernel SVMs are used as the classifier in
the experiments. It is shown that the gradient kernel descriptor performs best among the
proposed kernel descriptors. All of them are reported to perform better than the SIFT
descriptor and other sophisticated feature learning methods.
The main novelty of the paper is that it is the first work to learn low-level visual features
with kernels, and that it shows better performance than well-known methods that are the
default choice in many applications. The limitations of the proposed scheme are its high
computational cost (even after optimization) compared to other methods, and the difficulty of
learning pixel attributes from large image collections in order to approximate the kernel.
Since this area of research is new, alternative kernel functions, or the existing ones combined
with other kernel methods, may get around these limitations. This could further improve
performance or allow kernel descriptors to be used in other areas where SIFT is used, such as
object tracking and multi-view matching.
Details and Explanation of the paper
The gradient orientation at a pixel plays an important role in describing the features of an
image, and this concept is used extensively in image descriptors. For example, the SIFT
descriptor assigns gradient orientations to 8 bins in each cell of a 4 x 4 grid.
The feature vector of each pixel $z$ is defined as $F(z) = m(z)\,\delta(z)$, where $m(z)$ is
the magnitude of the gradient and the $i$-th component of $\delta(z)$ is 1 if the gradient
orientation falls in the $i$-th bin and 0 otherwise. A soft-binning formulation can also be used:

$$\delta_i(z) = \max\left(\cos(\theta(z) - a_i)^9,\ 0\right)$$

where $\theta(z)$ is the gradient orientation and $a_i$ is the center of the $i$-th bin. Over a
patch $P$, the histogram of gradients is obtained by

$$F_h(P) = \sum_{z \in P} \tilde{m}(z)\,\delta(z), \qquad \tilde{m}(z) = \frac{m(z)}{\sqrt{\sum_{z \in P} m(z)^2 + \epsilon}}$$

where $\tilde{m}(z)$ are the normalized magnitudes.
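The hard and soft binning rules above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names, the evenly spaced bin centers over [0, 2π) and the value of `eps` are assumptions made for the example.

```python
import math

def bin_centers(n_bins=8):
    """Evenly spaced bin centers a_i over [0, 2*pi) (an assumed layout)."""
    return [2.0 * math.pi * i / n_bins for i in range(n_bins)]

def hard_delta(theta, centers):
    """Hard binning: 1 for the bin whose center is nearest to theta, else 0."""
    nearest = max(range(len(centers)), key=lambda i: math.cos(theta - centers[i]))
    return [1.0 if i == nearest else 0.0 for i in range(len(centers))]

def soft_delta(theta, centers, power=9):
    """Soft binning: delta_i = max(cos(theta - a_i)^9, 0)."""
    return [max(math.cos(theta - a) ** power, 0.0) for a in centers]

def patch_histogram(magnitudes, thetas, delta=hard_delta, n_bins=8, eps=1e-8):
    """F_h(P) = sum_z m~(z) * delta(z), with m~(z) = m(z) / sqrt(sum m^2 + eps)."""
    centers = bin_centers(n_bins)
    norm = math.sqrt(sum(m * m for m in magnitudes) + eps)
    hist = [0.0] * n_bins
    for m, theta in zip(magnitudes, thetas):
        for i, d in enumerate(delta(theta, centers)):
            hist[i] += (m / norm) * d
    return hist

# Two pixels of equal magnitude whose gradients point along 0 and pi/2.
hist = patch_histogram([1.0, 1.0], [0.0, math.pi / 2])
soft = soft_delta(0.0, bin_centers(8))
```

With hard binning each pixel contributes to exactly one bin, while the soft rule spreads a pixel's weight over neighbouring bins and gives zero weight to bins more than 90 degrees away.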
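Intuitively, the similarity between two patches $P$ and $Q$ from different images is defined as
the inner product of their histograms:

$$K(P, Q) = F_h(P)^\top F_h(Q) = \sum_{z \in P} \sum_{z' \in Q} \tilde{m}(z)\,\tilde{m}(z')\,\delta(z)^\top \delta(z')$$

Since only inner products appear on the right-hand side, kernel functions can be defined between
two pixels, and hence a kernelized notion of similarity between two patches (as in HOG) is
obtained. But defining the kernel through hard binning introduces quantization errors and leads
to poor performance.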
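So, to capture image variations properly, the gradient match kernel is defined as

$$K_{grad}(P, Q) = \sum_{z \in P} \sum_{z' \in Q} \tilde{m}(z)\,\tilde{m}(z')\,k_o(\tilde{\theta}(z), \tilde{\theta}(z'))\,k_p(z, z')$$

Here $k_p(z, z') = \exp(-\gamma_p \|z - z'\|^2)$ and
$k_o(\tilde{\theta}(z), \tilde{\theta}(z')) = \exp(-\gamma_o \|\tilde{\theta}(z) - \tilde{\theta}(z')\|^2)$
are Gaussian kernels over pixel positions and gradient orientations respectively. For accuracy
and a uniform definition, the pixel positions and orientations are normalized; the orientation
is represented by the normalized gradient vector $\tilde{\theta}(z) = [\sin\theta(z), \cos\theta(z)]$.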
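As a rough illustration of how such a match kernel is evaluated, the sketch below sums a product of three terms (normalized magnitudes, an orientation Gaussian and a position Gaussian) over all pixel pairs of two patches. The patch representation as `(position, magnitude, theta)` tuples, the function names and the values `gamma_o=5.0`, `gamma_p=3.0` are assumptions chosen for the example, and positions are assumed to be scaled to the unit square.

```python
import math

def gaussian_kernel(x, y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def normalized_magnitudes(pixels, eps=1e-8):
    """m~(z) = m(z) / sqrt(sum_z m(z)^2 + eps) over one patch."""
    norm = math.sqrt(sum(m * m for _, m, _ in pixels) + eps)
    return [m / norm for _, m, _ in pixels]

def gradient_match_kernel(patch_p, patch_q, gamma_o=5.0, gamma_p=3.0):
    """Sum over pixel pairs of m~(z) m~(z') k_o(...) k_p(...).

    Each patch is a list of (position, magnitude, theta) tuples, with
    positions assumed to be normalized to [0, 1] x [0, 1].
    """
    mags_p = normalized_magnitudes(patch_p)
    mags_q = normalized_magnitudes(patch_q)
    total = 0.0
    for (pos_z, _, th_z), m_z in zip(patch_p, mags_p):
        o_z = (math.sin(th_z), math.cos(th_z))      # normalized gradient vector
        for (pos_w, _, th_w), m_w in zip(patch_q, mags_q):
            o_w = (math.sin(th_w), math.cos(th_w))
            k_o = gaussian_kernel(o_z, o_w, gamma_o)    # orientation similarity
            k_p = gaussian_kernel(pos_z, pos_w, gamma_p)  # spatial closeness
            total += m_z * m_w * k_o * k_p
    return total

P = [((0.0, 0.0), 1.0, 0.0), ((1.0, 0.0), 2.0, 0.5)]
Q = [((0.0, 1.0), 1.5, 0.1), ((1.0, 1.0), 0.5, 2.0)]
k_pq = gradient_match_kernel(P, Q)
```

The double loop costs one kernel evaluation per pixel pair, which is the cost that the finite-dimensional approximation is meant to avoid.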
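The motivation for defining the gradient match kernel as a product of three kernels is as
follows. First, the contribution of each pixel must be weighted by its gradient magnitude, and
the normalized linear kernel $\tilde{m}(z)\,\tilde{m}(z')$ is used for this. Second, a measure
of similarity between gradient orientations is needed, which the orientation kernel $k_o$
provides. Third, the Gaussian position kernel $k_p$ measures how close two pixels are
spatially. With similar motivation, the colour match kernel is defined as

$$K_{col}(P, Q) = \sum_{z \in P} \sum_{z' \in Q} k_c(c(z), c(z'))\,k_p(z, z')$$

where $c(z)$ is the colour at $z$ and $k_c$ is a Gaussian kernel over colour values.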
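The shape match kernel is defined over local binary patterns:

$$K_{shape}(P, Q) = \sum_{z \in P} \sum_{z' \in Q} \tilde{s}(z)\,\tilde{s}(z')\,k_b(b(z), b(z'))\,k_p(z, z')$$

Here $s(z)$ is the standard deviation of pixel values in the 3 x 3 neighborhood around $z$,
$\tilde{s}(z) = s(z)/\sqrt{\sum_{z \in P} s(z)^2 + \epsilon_s}$ is its normalized value, $b(z)$
is a binary column vector that binarizes the pixel value differences in a local window around
$z$, and $k_b$ is a Gaussian kernel over these binary vectors. Thus, in the shape kernel
descriptor, the contribution of each local binary pattern is weighted by $\tilde{s}(z)$, and
shape similarity is measured through the local binary patterns $b(z)$.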
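The two per-pixel quantities used by the shape kernel can be sketched as follows. This is a minimal illustration under stated assumptions: the image is a nested list of grey values, the population standard deviation is used, and a neighbour is coded as 1 when it is strictly brighter than the center pixel. The report does not specify these details, and the function names are made up for the example.

```python
import math

def neighborhood(img, r, c):
    """The 3 x 3 window around (r, c), row by row; index 4 is the center."""
    return [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def local_std(img, r, c):
    """s(z): standard deviation of pixel values in the 3 x 3 neighborhood."""
    vals = neighborhood(img, r, c)
    mean = sum(vals) / len(vals)
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))

def local_binary_pattern(img, r, c):
    """b(z): 8 bits, one per neighbour, set when the neighbour exceeds the center."""
    center = img[r][c]
    vals = neighborhood(img, r, c)
    return [1 if v > center else 0 for i, v in enumerate(vals) if i != 4]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
s = local_std(img, 1, 1)
b = local_binary_pattern(img, 1, 1)
```

A flat region gives `s` close to zero, so its (uninformative) binary pattern contributes little to the kernel sum.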
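Features over image patches can be expressed as

$$F_{grad}(P) = \sum_{z \in P} \tilde{m}(z)\,\phi_o(\tilde{\theta}(z)) \otimes \phi_p(z)$$

where $\phi_o$ and $\phi_p$ are the feature maps of the orientation kernel $k_o$ and the
position kernel $k_p$, and $\otimes$ is the Kronecker product, so that
$K_{grad}(P, Q) = F_{grad}(P)^\top F_{grad}(Q)$. Since Gaussian kernels are used, $F_{grad}(P)$
is infinite-dimensional. Directly applying KPCA may be computationally infeasible when the
number of patches is very large. So the match kernels are first approximated directly by
learning finite-dimensional features, obtained by projecting $F_{grad}(P)$ onto a set of basis
vectors. For example, the Gaussian kernel over gradient orientations is approximated with $d_o$
basis vectors as

$$k_o(\tilde{\theta}(z), \tilde{\theta}(z')) \approx k_B(\tilde{\theta}(z))^\top K_{BB}^{-1}\, k_B(\tilde{\theta}(z'))$$

where $k_B(\tilde{\theta}) = [k_o(\tilde{\theta}, x_1), \ldots, k_o(\tilde{\theta}, x_{d_o})]^\top$,
$[K_{BB}]_{ij} = k_o(x_i, x_j)$, and the $x_i$ are sampled normalized gradient vectors.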
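To make the projection idea concrete, the sketch below approximates a Gaussian kernel between two orientation vectors using only kernel evaluations against a finite basis. It is an illustration rather than the paper's code: the 24 evenly spaced unit vectors stand in for the sampled gradient vectors, `gamma=5.0` is an assumed bandwidth, and the small Gaussian-elimination solver is included only to keep the example free of external libraries.

```python
import math

def gaussian_kernel(x, y, gamma=5.0):
    """k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def unit_vector(theta):
    """Normalized gradient vector [sin(theta), cos(theta)]."""
    return (math.sin(theta), math.cos(theta))

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * w[k] for k in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w

def approx_kernel(x, y, basis, gamma=5.0):
    """Finite-dimensional approximation k_B(x)^T K_BB^{-1} k_B(y)."""
    k_x = [gaussian_kernel(x, b, gamma) for b in basis]
    k_y = [gaussian_kernel(y, b, gamma) for b in basis]
    K_BB = [[gaussian_kernel(bi, bj, gamma) for bj in basis] for bi in basis]
    weights = solve(K_BB, k_y)          # K_BB^{-1} k_B(y) without an explicit inverse
    return sum(a * w for a, w in zip(k_x, weights))

# Assumed basis: 24 unit vectors evenly spaced on the circle.
basis = [unit_vector(2.0 * math.pi * i / 24) for i in range(24)]
x, y = unit_vector(0.3), unit_vector(1.0)
exact = gaussian_kernel(x, y)
approx = approx_kernel(x, y, basis)
```

When one argument is itself a basis vector the approximation reproduces the kernel value up to rounding, and between basis vectors the error shrinks as the basis gets denser.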
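Note that the features are computed with the Kronecker product $\otimes$, which still results
in a large number of dimensions: $d_o d_p$ for $d_o$ orientation basis vectors $x_i$ and $d_p$
position basis vectors $y_j$. To obtain fewer, more compact features, KPCA is applied to the
joint basis vectors. This makes the computation time of evaluation practical. The $t$-th kernel
principal component is written as

$$PC^t = \sum_{i=1}^{d_o} \sum_{j=1}^{d_p} \alpha^t_{ij}\, \phi_o(x_i) \otimes \phi_p(y_j)$$

where $\phi_o(x_i) \otimes \phi_p(y_j)$ is the joint feature-space image of the basis pair
$(x_i, y_j)$ under the kernels $k_o$ and $k_p$, and the coefficients $\alpha^t_{ij}$ are learned
through kernel PCA on the centered kernel matrix of the joint basis vectors. Finally, the
gradient kernel descriptor is expressed as

$$F^t_{grad}(P) = \sum_{i=1}^{d_o} \sum_{j=1}^{d_p} \alpha^t_{ij} \left\{ \sum_{z \in P} \tilde{m}(z)\, k_o(\tilde{\theta}(z), x_i)\, k_p(z, y_j) \right\}$$

It is shown that the error incurred in approximating the match kernels in this way is very small.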
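The compression step can be sketched as below, under several simplifying assumptions: only the leading component is extracted, by plain power iteration rather than a full eigendecomposition; the basis sizes (4 orientations, 4 positions) and kernel bandwidths are toy values; centering of the patch features at projection time is omitted; and all names are hypothetical. The point is the structure: the joint kernel matrix over basis pairs is the Kronecker product of the two small Gram matrices, and a descriptor entry is a weighted sum of kernel evaluations against the basis.

```python
import math

def gauss(x, y, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def gram(points, gamma):
    """Gram matrix of a Gaussian kernel over a list of points."""
    return [[gauss(p, q, gamma) for q in points] for p in points]

def kron(A, B):
    """Kronecker product; entry [(i, j), (s, t)] equals A[i][s] * B[j][t]."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def center(K):
    """Double-center a symmetric kernel matrix, as kernel PCA requires."""
    n = len(K)
    means = [sum(row) / n for row in K]
    grand = sum(means) / n
    return [[K[i][j] - means[i] - means[j] + grand for j in range(n)]
            for i in range(n)]

def top_component(K, iters=300):
    """Leading eigenpair estimate via power iteration (a stand-in for full KPCA)."""
    n = len(K)
    v = [math.sin(1.0 + i) for i in range(n)]   # arbitrary deterministic start
    for _ in range(iters):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))
    return lam, v

def descriptor_component(patch, alpha, xs, ys, gamma_o=5.0, gamma_p=3.0, eps=1e-8):
    """sum_ij alpha_ij * sum_z m~(z) k_o(theta~(z), x_i) k_p(z, y_j)."""
    norm = math.sqrt(sum(m * m for _, m, _ in patch) + eps)
    value = 0.0
    for pos, m, theta in patch:
        o = (math.sin(theta), math.cos(theta))
        k_o = [gauss(o, x, gamma_o) for x in xs]
        k_p = [gauss(pos, y, gamma_p) for y in ys]
        joint = [a * b for a in k_o for b in k_p]   # same (i, j) order as kron
        value += (m / norm) * sum(c * k for c, k in zip(alpha, joint))
    return value

xs = [(math.sin(t), math.cos(t)) for t in (0.0, 1.5, 3.0, 4.5)]   # toy orientation basis
ys = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]             # toy position basis
K_joint = kron(gram(xs, 5.0), gram(ys, 3.0))
K_c = center(K_joint)
lam, v = top_component(K_c)
alpha = [x / math.sqrt(lam) for x in v]   # unit-norm component in feature space
patch = [((0.2, 0.3), 1.0, 0.4), ((0.8, 0.5), 2.0, 2.0)]
f1 = descriptor_component(patch, alpha, xs, ys)
```

Keeping only the first few components replaces the $d_o d_p$ joint dimensions with a much shorter vector, which is what makes evaluation practical.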
The gradient (KDES-G), colour (KDES-C) and shape (KDES-S) kernel descriptors are compared
to SIFT and several other state-of-the-art object recognition algorithms on four publicly
available datasets: Scene-15, Caltech101, CIFAR10 and CIFAR10-ImageNet. Laplacian kernel SVMs
are used in all experiments except on CIFAR10. A summary of the results is shown below.
Combining the three kernel descriptors is observed to boost performance by 2%. The proposed
kernel descriptors thus outperform all the other methods listed.
Dataset        Method           Accuracy
Scene-15       KDES             86.7%
Scene-15       SIFT             82.2%
Caltech-101    KDES             76.4%
Caltech-101    CDBN [2]         65.5%
Caltech-101    SPM [1]          64.4%
Caltech-101    LCC [4]          73.4%
CIFAR10        KDES             76.0%
CIFAR10        LCC [4]          74.5%
CIFAR10        mcRBM-DBN [3]    71.0%
CIFAR10        TCNN [5]         73.1%
[1] Lazebnik, Schmid, Ponce, CVPR '06
[2] Lee, Grosse, Ranganath, Ng, ICML '09
[3] Ranzato, Hinton, CVPR '10
[4] Yu, Zhang, ICML '10
[5] Le, Ngiam, Chen, Chia, Koh, Ng, NIPS '10
