By
Gajjar Bhavinkumar
(IU1571090002)
30th July, 2022
Synopsis
(Electronics and Communication Engineering)
Sparse based feature parameterization and multi kernel SVM for large scale scene classification
Under the supervision of
Dr. Hiren Mewada (Associate Professor, EE, PMU)
Dr. Ashwin Patani (Assistant Professor, E&C, IITE, IU)
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
Highlights of Synopsis
Introduction of Image Classification
***Methods of Feature Selection
Exhaustive search, Branch and Bound Search, Relaxed Branch and Bound, Selecting Best Individual Features, Sequential Forward Selection (SFS), Sequential Backward Selection (SBS), Sequential Floating Forward Search (SFFS), Sequential Floating Backward Search, the Max-Min approach, etc.
Classification of Image features
 Color Features: Histogram, Moment (CM), Color Coherence Vector (CCV), Color Correlogram
 Texture Features: Grey Level Co-occurrence Matrix, Edge Detection, Laws Texture Energy Measures
 Shape Features: Binary image algorithm, Horizontal and vertical segmentation
What is Image/object Classification?
Classification Techniques
Classification
 Supervised
  o Distribution free: Euclidean classifier, K-nearest neighbour, Minimum distance, Decision Tree
  o Statistical: techniques based on probability distribution models, which may be parametric or nonparametric
 Unsupervised
  o Clustering: no extensive prior knowledge required; unknown, but distinct, spectral classes are generated; limited control over classes and identities; no detailed information
• Large dimensionality of classes reduces the accuracy.
• In practice, most high-dimensional datasets do not follow a normal distribution; hence, a linear kernel fails to classify images.
• Bag-of-words representation cannot capture spatial information.
• Dense feature representation makes learning difficult.
• The linear SVM algorithm is not suitable for large datasets.
Challenges in Image Classification
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
Highlights of Synopsis
Motivation from literature
Over the past few years, vision-based classification and recognition have gained importance.
There are three main components involved:
1. Point of interest detection
2. Description of region of interest (Feature based)
3. Classification (Kernel based)
Feature Based:
To solve multiclass recognition problems, many supervised [1][2][3][4] and unsupervised [5][6][7][8] techniques have been used with sparse dictionaries.
The state-of-the-art is accompanied by results on standard benchmark datasets,
i.e. Caltech-101 [9], Caltech-256 [10], and Scene-15 [11]
As reported in [8], vector quantization is used to generate sparse codes with maximum pooling. By using this approach, the computational complexity of SVM is significantly reduced from O(n²) to O(n).
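To make the pooling step concrete, here is a minimal numpy sketch (not the authors' code) of the max pooling described above: each local descriptor is sparse-coded over the dictionary, and the image-level feature takes the maximum absolute activation of every dictionary atom.

```python
import numpy as np

def max_pool(sparse_codes):
    """Pool per-descriptor sparse codes (M descriptors x K atoms) into one
    K-dimensional image feature: the maximum absolute activation per atom."""
    return np.abs(sparse_codes).max(axis=0)

# toy example: 5 local descriptors encoded over an 8-atom dictionary
rng = np.random.default_rng(0)
codes = rng.normal(size=(5, 8)) * (rng.random((5, 8)) > 0.7)   # mostly zeros
print(max_pool(codes).shape)                                   # (8,)
```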
Motivation from literature
[12] suggested a method for multi-scale spatial latent semantic analysis based on
sparse coding. The spatial pyramid matching of image segmentation is used to
extract the target's spatial position information, and feature soft quantization
based on sparse coding is utilised to produce a co-occurrence matrix, which
increases the accuracy of the original feature representation.
For matching multilevel detail locally in the learning and recognition stages, multi-resolution pyramids were introduced in the SIFT (PSIFT) feature space in [13]. This PSIFT experiment showed positive results for streamlined processing.
The authors of [14] experimented with a classification technique based on SIFT, in which SIFT descriptors are clustered using KNN to build a dictionary, and SPM is then used to generate a feature vector.
Feature Based:
Motivation from literature
Across all these studies, authors did not report the effect of SIFT parameters in
their algorithms.
Table 1 lists the parameters controlling the SIFT features. The majority of
experiments in the literature use default values without tuning them for each task.
As part of the first experiment, we investigated SIFT parameters on a sparse-
based dictionary approach for image classification as suggested by Yang et al. [8].
Feature Based:
Motivation from literature
The combination of various descriptors employing multiple-kernel SVM was introduced in [15] and demonstrated a significant improvement in various scene classification tasks.
The authors of [16] proposed the multilabel least-squares SVM method. For the multi-label scene classification problem, they used a multi-kernel RBF-based SVM. The classifier was validated on four datasets, with a maximum accuracy of 85%.
Kancherla et al. [17] validated the effect of the kernel in SVM. They simulated the algorithm on 3- to 4-class datasets and used different feature sets with SVMs using various kernels. On the MIT dataset, they found that the RBF kernel outperforms the other kernels with a classification rate of 82.06 percent.
Kernel based:
Motivation from literature
[18] presented an SVM-based scene classification method for robotic applications.
The robotic development necessitates quick execution. As a result, from the
captured scene, heuristic metric-based key points were identified and used in the
SVM model. They conclude that combining local binary pattern and SURF features
with SVM yielded higher accuracy than a VGG-based neural network model.
To classify hyperspectral images, [20] proposed a hybrid approach of spatial,
spectral, and semantic features. Gabor-based structural features are combined
with morphological-based spatial features and semantic features based on K-
means and entropy. A composite kernel is then created that corresponds to these
three features, achieving an accuracy of 98%.
Conversely, in a large dataset, SVM outperforms NN when features are interpreted
geometrically. Real-world scene classification was achieved with the combination
of dense SIFT, color SIFT, and structure similarity, as well as localized multikernel
neural networks[23].
Kernel based:
Motivation from literature
Overall, multi-kernel SVMs have proved essential in many recognition and classification applications. Despite the advantage of multikernel over CNN approaches for classifying scenes amongst a large number of categories, further improvement is needed to reduce the misclassification rates on databases containing many classes.
In addition, robust features can be achieved if redundancy is minimized and the
SVM kernel is designed with optimized parameters consistent with these feature
sets.
Kernel based:
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
Highlights of Synopsis
Objective and Scope of the work
Objective of study:
 Check the effectiveness of sparse data in image classification.
 Address which size and type of dictionary is best for a large-scale dataset.
 Select robust features that can address this problem.
 Study how linear vs. non-linear kernels of the traditional SVM classifier behave on a large-scale dataset.
 Find possibilities for reducing computational cost, compared to modern neural networks, while keeping satisfactory accuracy.
 Examine the pros and cons of traditional machine learning over modern deep learning algorithms.
Objective and Scope of the work
Scope of the work:
In machine vision, there is no rigorous study of tuning the well-proven SIFT feature for the classification task. Our study suggests that the SIFT feature can be tuned according to the problem and that the features can be sparsified by matching the appropriate size of dictionary. Any traditional machine learning approach can take advantage of this feature set in order to compete with modern deep learning algorithms, where the requirements for training data, training time, and computational hardware are higher.
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
Highlights of Synopsis
Problem definition
 Image classification problems include intra-class variation, scale variation,
view-point variation, occlusion, lighting, background clutter, etc. Feature
selection, kernels, classifiers, machine learning, and deep-learning algorithms
can be applied.
 To date, it has been difficult to apply any of these methodologies to large-scale data while preserving accuracy.
 Sparse representation has shown significant potential in dealing with these
challenges.
 Traditional classification techniques that use sparse representations lack
image label information. The current deep learning technique's primary flaw
is its excessively expensive training effort. Integrating existing sparse
representation technologies into deep learning is a valuable unresolved topic.
Problem definition
 We presented a methodology for bridging sparse and machine learning
algorithms and showed its performance for large datasets. The research aims
to enhance multi-class large dataset classification accuracy.
 Sparse image features and machine learning will be used for classification.
 Another sub-objective is to optimize machine learning speed and class
detection with appropriate accuracy.
Problem statement in summary
1. Classification accuracy in multiclass settings is still difficult to achieve with existing techniques.
2. Computational time is the second concern, to be optimized along with 1.
3. A sparse and ML-based approach to classification will be explored.
4. The expected outcome is an efficient algorithm that satisfies 1-3.
5. Targeted benchmark datasets: Caltech-101, Caltech-256, Scene-15
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
Highlights of Synopsis
Original contribution by the thesis
The impact of dictionary size and type
 converges quickly
 KSVD
 16x16 image patch size
 Over-complete dictionary of size 256x1024
Parameterizing SIFT (T-SIFT)
 SIFT descriptor size of 128 is insufficient for all data sizes
 SIFT can be customized
 A 256-size descriptor with 16 angles and 4 SIFT bins is sufficient (Table-3); see the descriptor-length sketch after this slide
 T-SIFT is more robust
 T-SIFT outperforms CNN in hardware, training time, and training data requirements.
Multi-kernel SVM with Tuned SIFT
 Gaussian Kernel outperforms the Polynomial and its fusion
 Improvement on Caltech-101: 4% and Scene-15 : 10%
 Caltech-256 is difficult to train with minimal hardware.
 T-SIFT with MKL SVM is a novel method
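To illustrate how the two SIFT parameters set the descriptor length, below is a simplified SIFT-style descriptor sketch (a hypothetical helper, not the thesis implementation): with 4 spatial bins and 8 orientation bins it yields the standard 128-dimensional descriptor, and with 16 orientation bins it yields the 256-dimensional T-SIFT setting listed above.

```python
import numpy as np

def sift_like_descriptor(patch, spatial_bins=4, orientation_bins=8):
    """Simplified SIFT-style descriptor for a square grayscale patch.
    Length = spatial_bins**2 * orientation_bins (4*4*8 = 128, 4*4*16 = 256)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    h, w = patch.shape
    ch, cw = h // spatial_bins, w // spatial_bins
    desc = np.zeros((spatial_bins, spatial_bins, orientation_bins))
    for i in range(spatial_bins):
        for j in range(spatial_bins):
            m = mag[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw]
            a = ang[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw]
            b = (a / (2 * np.pi) * orientation_bins).astype(int) % orientation_bins
            for k in range(orientation_bins):
                desc[i, j, k] = m[b == k].sum()   # orientation histogram per cell
    desc = desc.ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)

patch = np.random.rand(16, 16)                    # 16x16 patch
print(sift_like_descriptor(patch, 4, 16).shape)   # (256,)
```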
Original contribution by the thesis
The impact of dictionary size and type
Parameterizing SIFT (T-SIFT)
Multi-kernel SVM with Tuned SIFT
Summary:
 This thesis presents a distinctive contribution by providing recommendations for modifying the parameter values chosen for the dictionary and SIFT.
 When contrasted with the prior art, the use of tunable SIFT in Sparse coded Spatial Pyramid Matching (ScSPM) with multi-kernel nonlinear Support Vector Machines (SVM) produces significant gains in terms of classification accuracy.
 In addition, the uniqueness of the contribution can be seen in the studies that are
referenced in the bibliography.
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
Highlights of Synopsis
Methodologies of Research and Results
Hardware: Intel Core i3 @ 2.50 GHz, 8 GB RAM, 64-bit Windows 10 machine
First method: SIFT feature analysis and T-SIFT implementations
  Phase-1: The impact of dictionary size and type
  Phase-2: Parameterizing SIFT (T-SIFT)
Second method: Sparse coded SPM with multi-kernel SVM implementation
Methodologies of Research and Results
First Method: SIFT feature analysis and T-SIFT implementations
Figure-1: Proposed tunable SIFT ScSPM
Methodologies of Research and Results
First Method: SIFT feature analysis and T-SIFT implementations
There were two phases of the study for the first method:
1 - Dictionary learning
2 - Training the classifier.
Methodologies of Research and Results
First Method: SIFT feature analysis and T-SIFT implementations
Phase 1 - Dictionary learning
Methodologies of Research and Results
First Method: SIFT feature analysis and T-SIFT implementations
Phase 1 - Dictionary learning
Methodologies of Research and Results
First Method: SIFT feature analysis and T-SIFT implementations
Phase 2 - Training the classifier.
Methodologies of Research and Results
First Method: SIFT feature analysis and T-SIFT implementations
Phase 2 - Training the classifier.
Methodologies of Research and Results
First Method: SIFT feature analysis and T-SIFT implementations
Phase 2 - Training the classifier.
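For orientation, a condensed sketch of the first method's flow (Figure-1) is given below, using synthetic data and scikit-learn stand-ins; the names and sizes are illustrative assumptions, and the spatial pyramid pooling used in the thesis is omitted for brevity.

```python
import numpy as np
from sklearn.decomposition import SparseCoder
from sklearn.svm import LinearSVC

def encode_image(descriptors, dictionary):
    """Sparse-code an image's SIFT descriptors over a learned dictionary
    and max-pool them into a single feature vector."""
    coder = SparseCoder(dictionary=dictionary, transform_algorithm='omp',
                        transform_n_nonzero_coefs=5)
    codes = coder.transform(descriptors)        # (num_descriptors, num_atoms)
    return np.abs(codes).max(axis=0)            # max pooling

rng = np.random.default_rng(0)
D = rng.normal(size=(1024, 128))                # over-complete dictionary (atoms x 128-d SIFT)
D /= np.linalg.norm(D, axis=1, keepdims=True)
images = [rng.normal(size=(rng.integers(40, 80), 128)) for _ in range(20)]
labels = rng.integers(0, 2, size=20)

X = np.vstack([encode_image(d, D) for d in images])
clf = LinearSVC(max_iter=10000).fit(X, labels)  # linear-kernel SVM classifier
print(clf.score(X, labels))
```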
Methodologies of Research and Results
Second Method : Sparse coded SPM with multi kernel SVM implementation
Methodologies of Research and Results
Second Method : Sparse coded SPM with multi kernel SVM implementation
In this experiment, we used kernel weights d_m to solve the convex optimization problem stated in equation-7 using SVM, as proposed in [30]. To obtain the kernel weights d, the fusions of kernels with the weights of the respective coefficients are listed in Tab. 4.
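A minimal sketch of the kernel-fusion idea follows; the weight optimization of [30] is omitted, and the weights d_m here are simply fixed by hand. Gaussian and polynomial Gram matrices are combined as K = Σ_m d_m · K_m and fed to an SVM with a precomputed kernel.

```python
import numpy as np
from sklearn.svm import SVC

def fused_kernel(X, Y, d_gauss, gammas, d_poly, degrees):
    """K = sum_m d_m * K_m over Gaussian and polynomial base kernels."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    K = np.zeros((X.shape[0], Y.shape[0]))
    for d_m, g in zip(d_gauss, gammas):
        K += d_m * np.exp(-g * sq)               # Gaussian base kernels
    for d_m, p in zip(d_poly, degrees):
        K += d_m * (X @ Y.T + 1.0) ** p          # polynomial base kernels
    return K

rng = np.random.default_rng(0)
Xtr, ytr = rng.normal(size=(60, 20)), rng.integers(0, 2, 60)
Ktr = fused_kernel(Xtr, Xtr, d_gauss=[0.4, 0.3], gammas=[0.5, 2.0],
                   d_poly=[0.3], degrees=[2])
clf = SVC(kernel='precomputed', C=10.0).fit(Ktr, ytr)
print(clf.score(Ktr, ytr))
```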
Methodologies of Research and Results
Second Method : Sparse coded SPM with multi kernel SVM implementation
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
Highlights of Synopsis
Conclusion and Future work
 The size and sparsity of the dictionary are determined by the SIFT parameters. Therefore, in the first experiment we present the effect of orientation and orientation bins on the size and sparsity of the feature vectors.
 By reducing the average number of coefficients, the study concludes that 30
iterations are sufficient to achieve maximum sparsity in the dictionary.
 After obtaining the maximum sparsity of the dictionary, the effect of dictionary
sizes on overall classification accuracy is examined.
 In further research, it was found that the classification accuracy would be less
for low values of either orientation or orientation bins in histogram formation.
As a result, the appropriate choice of those two parameters results in a boost in
performance as described in the first method. SVM linear kernels were used in
this empirical study.
Conclusion and Future work
 Secondly, we investigated the fusion of Nonlinear Multi Kernel Learning (MKL).
Although CNNs have achieved high popularity in classification models, they
require a lot of training time and computation power.
 SVM offers greater flexibility in characterization than CNN if a suitable kernel is used for challenging datasets. A single linear kernel, however, is limited to linearly separable datasets.
 Therefore, a multi-kernel SVM has been re-experimented with the aim of
optimizing the kernels and studying the various parameters affecting the kernel
performance in classification.
 The role of various parameters has been investigated to eliminate redundant features when evaluating SimpleMKL over ScSPM features for classification accuracy.
Conclusion and Future work
 The effect of MKL on overall classification accuracy is presented after obtaining
the maximum sparsity of the dictionary. Even with the simplest combination of a
single type kernel, such as Polynomial, as represented in Tab. 4, accuracy will be
greater than the single kernel SVM method.
 For the 101-class dataset, using several combinations of Gaussian kernels improved classification accuracy to 85.72 percent.
 With an increasing number of Gaussian kernels, training time and storage needs
grow, making it impossible to work on huge datasets like Caltech-256 with
minimal hardware requirements.
 As a whole, we conclude that working with strong features and Multi kernels on
object identification is still an open area. We will investigate the impact of this
feature on similar classes in the future.
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
Highlights of Synopsis
List of publications
1. Gajjar, Bhavinkumar, Hiren Mewada and Ashwin Patani. "Parameterizing SIFT and sparse dictionary for SVM based multi-class object classification." International Journal of Artificial Intelligence 19 (2021): 95-108. http://www.ceser.in/ceserp/index.php/ijai/article/view/6647 (SCOPUS)
2. Gajjar, Bhavinkumar, Hiren Mewada, and Ashwin Patani. "Sparse coded spatial pyramid matching and multikernel integrated SVM for non-linear scene classification." Journal of Electrical Engineering 72.6 (2021): 374-380. https://doi.org/10.2478/jee-2021-0053 (SCOPUS)
Results for Matching Pursuit Algorithm
Test images: 1. cameraman.tif  2. rice.png  3. circlesBrightDark.png  4. liftingBody.png
Dict1 - Discrete Wavelet
Dict2 - DCT and Kronecker Delta
Dict3 - Haar Wavelet Packets and DCT
Dict4 - K-SVD
Arpan Patel. "Image Classification with sparse coding and machine learning." Thesis, CSPIT, 2017.
Features (+Sparse)                                     Kernel function of classifier   Classification techniques (ML)
Speeded Up Robust Features (SURF)                      Linear                          K-Means
Features from Accelerated Segment Test (FAST)          RBF                             SVM
Binary Robust Independent Elementary Features (BRIEF)  Polynomial                      K-nearest neighbour (KNN)
Oriented FAST and Rotated BRIEF (ORB)                  Sigmoid                         Artificial Neural Network (ANN)
Histogram of Oriented Gradients (HOG)                                                  Convolutional Neural Network (CNN)
...                                                    ...                             ...

Good features + classification techniques + kernels of the classifier  Accuracy? Computation time?
Introduction of Image Classification
 Challenges: intra-class variation, scale variation, view-point variation, occlusion,
illumination, background clutter
 Approaches: feature selection, kernels, classifiers, machine learning and deep-learning
algorithms
What is Sparse?
A sparse matrix is one in which the majority of the values are zero. The proportion of zero elements to non-zero elements is called the sparsity of the matrix. The opposite of a sparse matrix, in which the majority of its values are non-zero, is called a dense matrix.
5 0 0 0
0 11 0 0
0 0 25 0
0 0 0 7
Sparsity = 3 (12 zeros / 4 non-zeros)
Advantages:
 save a significant amount of memory
 speed up the processing of that data
 reduce computation time by eliminating operations on zero elements
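A small sketch (assuming numpy and scipy are available) that reproduces the sparsity ratio above and shows the storage advantage of keeping only the non-zero entries:

```python
import numpy as np
from scipy import sparse

A = np.array([[5, 0, 0, 0],
              [0, 11, 0, 0],
              [0, 0, 25, 0],
              [0, 0, 0, 7]])

print((A == 0).sum() / (A != 0).sum())   # 12 zeros / 4 non-zeros = 3.0

S = sparse.csr_matrix(A)                 # compressed storage of the 4 non-zeros only
print(S.nnz, S.data)                     # 4 [ 5 11 25  7]
```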
Sparse model
Greedy Algorithms
Matching Pursuit(MP)
Orthogonal Matching Pursuit(OMP)
[Stagewise Orthogonal Matching Pursuit (StOMP),
Subspace Pursuit (SP),
Compressive Sampling Matching Pursuit (CoSaMP),
Regularized Orthogonal Matching Pursuit (ROMP),
Gradient Pursuit (GP),
Iterative Hard Thresholding (IHT),
Hard Thresholding Pursuit (HTP)]
Relaxation Algorithms
 Basis Pursuit (BP)
 Least-Absolute-Shrinkage-and-Selection-Operator (LASSO)
 FOcal Under-determined System Solver (FOCUSS)
Sparse coding
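As a concrete example of one greedy coder from the list above, here is a small Orthogonal Matching Pursuit sketch using scikit-learn; the dictionary and signal are synthetic and chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 256))                  # over-complete dictionary: 64-d signals, 256 atoms
D /= np.linalg.norm(D, axis=0)                  # unit-norm atoms (columns)

x_true = np.zeros(256)
x_true[[3, 40, 200]] = [1.5, -2.0, 0.7]         # 3-sparse code
y = D @ x_true                                  # observed signal

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3, fit_intercept=False).fit(D, y)
print(np.flatnonzero(omp.coef_))                # usually recovers atoms [3, 40, 200]
```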
1. Maximum Likelihood (ML)
2. Method of Optimal Directions (MOD)
3. K-SVD
4. Simultaneous Codeword Optimization (SimCO)
Dictionary Learning Algorithms
The K-SVD Algorithm - General (Aharon, Elad, & Bruckstein '04)
Initialize the dictionary D, then alternate Sparse Coding (greedy/relaxation algorithms) and Dictionary Update (DL algorithms) until the training data Y is well approximated; a sketch with a scikit-learn stand-in follows.
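The loop above can be approximated with scikit-learn's dictionary learner, which alternates sparse coding and a dictionary update (a MOD-style update rather than the K-SVD rank-1 update, so treat it as an illustrative stand-in). The sizes below mirror the 16x16 patches and 256x1024 over-complete dictionary mentioned in the contributions; the data are random placeholders.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
patches = rng.normal(size=(500, 256))            # 16x16 patches, flattened to 256-d

dl = MiniBatchDictionaryLearning(n_components=1024,           # 256x1024 over-complete dictionary
                                 transform_algorithm='omp',
                                 transform_n_nonzero_coefs=5,
                                 batch_size=64, max_iter=10,   # few iterations, sketch only
                                 random_state=0)
D = dl.fit(patches).components_                  # (1024, 256) learned atoms
codes = dl.transform(patches[:5])                # sparse codes for 5 patches
print(D.shape, codes.shape)
```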
 Can be applied to almost everything
 Classifications or numerical predictions
 Widely used in pattern recognition
o Identify cancer or genetic diseases
o Text classification: classify texts based on the language
o Detecting rare events: earthquakes or engine failures
Support vector machine
Linearly separable problem: we have two features (x1, x2) and some data points.
We want to find a hyperplane, in this case a line, that separates the different data points with the maximum margin.
This is the maximum-margin solution.
Support vectors
Support vectors are the points from each class that are closest to the maximum-margin hyperplane; each class has at least one support vector.
With the support vectors alone it is possible to reconstruct the hyperplane.
We can store the classification model even when we have millions of features.
How to find the hyperplane when the problem is linearly separable? With convex hulls.
Convex hull: the smallest convex set that contains all the points.
The hyperplane is the perpendicular bisector of the shortest line between the two hulls.
Mathematical approach
w * x + b = 0 is the equation of a hyperplane in n dimensions (in 2D: y = m*x + b), where w = (w1, w2, ..., wn) are the so-called weights and x = (x1, x2, ..., xn) are the features.
The aim of the SVM algorithm is to find the weights w so that the data points are separated accordingly:
w * x + b > +1 for one class
w * x + b < -1 for the other class
How to find the hyperplane in 2D? With convex hulls.
The two planes H0 and H1 defined by these equations are separated by a distance d.
Mathematical approach
Vector geometry gives the distance between the two planes as 2 / ||w||, where ||w|| is the Euclidean norm (distance from 0).
We want to make the distance as large as possible, so we want to minimize the norm of w.
We usually minimize (1/2) * ||w||².
Quadratic optimization solves this problem!!!
Non-linear spaces
 In many real-world applications, the relationships between variables are non-linear
 A key feature of SVMs is their ability to map the problem into a higher dimensional space using a process known as the “kernel trick”
 A non-linear relationship may suddenly appear to be quite linear
When the problem is non-linearly separable we have to use slack variables a_i.
Mathematical approach
We minimize (1/2) * ||w||² + C * Σᵢ aᵢ
C: cost parameter applied to all points that violate the constraints.
We run our optimization on this cost function.
We can tune the C parameter: we can modify the penalty for the data points that are misclassified.
C is very large  the algorithm tries to find a 100% separation.
C is low  a wider overall margin is allowed, with more misclassified data points.
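A quick scikit-learn sketch on synthetic data of how the C cost parameter trades margin width against misclassification:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

for C in (0.01, 1.0, 100.0):   # small C: wide margin, more slack; large C: tries for full separation
    acc = SVC(kernel='linear', C=C).fit(Xtr, ytr).score(Xte, yte)
    print(f"C={C:g}  test accuracy={acc:.3f}")
```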
Kernels
Example: two weather classes, sunny and snowy, plotted by latitude and longitude.
With the kernel function we can transform the problem into a linearly separable one!!! (the added dimension here: altitude)
Higher dimensional space: points that overlap in (latitude, longitude) become separable once altitude is added.
Higher dimensional space
SVM learns concepts that were not explicitly measured in the original data!!!
Kernel functions
Φ(x), the “phi function”, is the mapping of the data x into another space.
K(xi, xj) is the kernel function:
 Linear kernel (does not transform the data): K(xi, xj) = xi * xj
 Polynomial kernel: K(xi, xj) = (xi * xj + 1)^d
 Gaussian RBF kernel: K(xi, xj) = exp( -||xi - xj||² / (2*σ²) )
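The three kernels written out as plain numpy functions (σ and d are the tunable parameters; the vectors are toy values):

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj

def polynomial_kernel(xi, xj, d=2):
    return (xi @ xj + 1.0) ** d

def gaussian_rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

a, b = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear_kernel(a, b), polynomial_kernel(a, b, 3), gaussian_rbf_kernel(a, b, 2.0))
```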
Advantages
 SVM can be used for regression problems as well as for classification
 Not overly influenced by noisy data
 Easier to use than neural networks
Disadvantages
 Finding the best model requires testing various combinations of kernels and model parameters
 Quite slow  especially when the input dataset has a large number of features
Classification test results (Caltech-101)

Algorithm / No. of classes   2-class (bonsai, car side)   5-class   20       40       80       101
Sparse+SIFT+SVM              100%                         95.38%    79.19%   76.07%   75.26%   73.13%
Sparse+SVM                   47.37%                       43.10%    -        -        -        -
SIFT+SVM                     56.56%                       52.60%    -        -        -        -
Overview: Kernel-based learning
Kernel design maps the data from the lower-dimensional input space into a higher-dimensional feature space.
 The kernel measures the similarity between data points
 The kernel transformation helps in using a linear separation algorithm, like Support Vector Classification (SVC), in higher dimensions
Same data can have elements that show different patterns  the best kernel is a linear combination of different kernels.
Single Kernel SVM to Multikernel SVM [8]
Multikernel approach: Dataset  SIFT features  MKL SVM
Kernels used
 Linear kernel (does not transform the data): K(xi, xj) = xi * xj
 Polynomial kernel: K(xi, xj) = (xi * xj + 1)^d
 Gaussian RBF kernel: K(xi, xj) = exp( -||xi - xj||² / (2*σ²) )
Results we obtained on Caltech-101

MKL performance in our algorithm
No.  Kernel                                         Parameter                               Training images/class   Accuracy
1    Gaussian                                       [0.5 1 2 5 7 10 12 15 17 20]            30                      75.52
2    Gaussian                                       [0.5 1 2 5 7 10 12 15 17 20]            15                      -
3    Polynomial                                     [1 2 3]                                 30                      75.70
4    Polynomial                                     [1 2 3]                                 15                      69.29
5    Gaussian + Gaussian + Polynomial + Polynomial  [0.5 1 2 5 7 10 12 15 17 20], [1 2 3]   30                      74.97
6    Polynomial + Gaussian + Polynomial             [1 2 3], [0.5 1 2 5 7 10 12 15 17 20]   30                      75.58

Single kernel performance in our algorithm
No.  Kernel       Training images/class   Accuracy
1    Linear       30                      69.71
2    Polynomial   30                      64.18
3    Gaussian     30                      61.81
Comparison with other methods for the Caltech-101 dataset

Algorithm                  15 training images/class   30 training images/class
Zhang et al. [1]           59.10±0.60                 66.20±0.50
KSPM [2]                   56.40                      64.40±0.80
NBNN [3]                   65.00±1.14                 70.40
ML+CORR [4]                61.00                      69.60
KC [5]                     -                          64.14±1.18
LSPM [6]                   53.23±0.65                 58.81±1.51
ScSPM [6]                  67.0±0.45                  73.2±0.54
DMKDL [7]                  -                          82.66±0.36*
MKLDPL [7]                 -                          86.81±0.21*
Our method (best result)   69.29±0.98                 75.70±1.30

*30 images for training and 15 images for testing
Future Work
 Try other kernels with different L-norms
 Work on two more datasets: Caltech-256 and Scene-15
 Understand the effect of the cost function on different datasets in SVM
 Divide the training and testing data with a standard split proportion for all classes and check performance
 Publish a paper on the above results
Pipeline: Feature extraction (SIFT, LBP, etc.)  Dictionary Learning (KSVD, SimCO)  Sparse Coding (OMP, MP, BP)  SVM (multikernel, cost function), evaluated on Caltech-101, Caltech-256 and Scene-15; training features & labels build the classifier, testing features yield the classified labels.

% Accuracy obtained so far:
 Using LBP: 63
 Using SIFT: 65
 Fusion of SIFT+LBP: -
 SPM+SIFT: ~77
 Using SimCO: ~68
 Using OMP: ~66
 Using KSVD: ~73
 Multikernel: ~75.70
 Single kernel: ~69.71
Sparse formulation of feature vector
Attractive properties of Sparse Coding.
 First, compared with the VQ coding, SC coding can achieve a much lower reconstruction
error due to the less restrictive constraint;
 Second, sparsity allows the representation to be specialized, and to capture salient properties of images;
 Third, research in image statistics clearly reveals that image patches are sparse signals.
Caltech101 Learned Dictionary patches
256 x 256:
32x8(Patches)
256 x 512:
32x16(Patches)
256 x 1024:
32x32(Patches)
Caltech256 Learned Dictionary patches
256 x 256:
32x8(Patches)
256 x 512:
32x16(Patches)
256 x 1024:
32x32(Patches)
Caltech256 Learned Dictionary patches
256 x 2048:
32x64(Patches)
Scene-15 Learned Dictionary patches
256 x 256:
32x8(Patches)
256 x 512:
32x16(Patches)
256 x 1024:
32x32(Patches)
Linear SPM kernel for SVM
Histogram z; the SVM decision function for the binary class uses K (any kernel function), α (weights) and b (a fixed bias).
Linear SPM kernel for SVM
Pooling function: max pooling; SPM kernel; primal formulation for SVM [6].
Spatial Pyramid Matching
Key Parameter of SIFT feature
[ Lowe, David G. "Distinctive image features from scale-
invariant keypoints." International journal of computer
vision 60.2 (2004): 91-110.]
2 × 2 descriptor array computed from an 8 × 8 set of samples
More Related Content

PDF
research_paper
PDF
I MAGE S UBSET S ELECTION U SING G ABOR F ILTERS A ND N EURAL N ETWORKS
PDF
IMAGE SUBSET SELECTION USING GABOR FILTERS AND NEURAL NETWORKS
PDF
Flickr Image Classification using SIFT Algorism
PDF
Image classification
PDF
C1803011419
PDF
J25043046
PDF
J25043046
research_paper
I MAGE S UBSET S ELECTION U SING G ABOR F ILTERS A ND N EURAL N ETWORKS
IMAGE SUBSET SELECTION USING GABOR FILTERS AND NEURAL NETWORKS
Flickr Image Classification using SIFT Algorism
Image classification
C1803011419
J25043046
J25043046

Similar to SYNOPSIS on Parse representation and Linear SVM. (20)

PDF
Analysis of Classification Approaches
DOCX
Remote Sensing Image Scene Classification
ODP
An Introduction to Computer Vision
PDF
PPT s11-machine vision-s2
PPTX
Large scale object recognition (AMMAI presentation)
PDF
PPT s09-machine vision-s2
PDF
Texture Classification
PPTX
Computer_Vision_ItsHistory_Advantages_and Uses.pptx
PPTX
Digital_Image_Classification.pptx
PDF
IRJET - Vehicle Classification with Time-Frequency Domain Features using ...
PPTX
Conventional Neural Networks and compute
PDF
Cm31588593
PDF
Lecture 02 internet video search
PDF
Combining Generative And Discriminative Classifiers For Semantic Automatic Im...
PDF
IRJET - Symmetric Image Registration based on Intensity and Spatial Informati...
PDF
Influence Analysis of Image Feature Selection TechniquesOver Deep Learning Model
PDF
Linearity of Feature Extraction Techniques for Medical Images by using Scale ...
PDF
A comparison of SIFT, PCA-SIFT and SURF
PDF
thesis
PDF
Intelligent Transportation System Based On Machine Learning For Vehicle Perce...
Analysis of Classification Approaches
Remote Sensing Image Scene Classification
An Introduction to Computer Vision
PPT s11-machine vision-s2
Large scale object recognition (AMMAI presentation)
PPT s09-machine vision-s2
Texture Classification
Computer_Vision_ItsHistory_Advantages_and Uses.pptx
Digital_Image_Classification.pptx
IRJET - Vehicle Classification with Time-Frequency Domain Features using ...
Conventional Neural Networks and compute
Cm31588593
Lecture 02 internet video search
Combining Generative And Discriminative Classifiers For Semantic Automatic Im...
IRJET - Symmetric Image Registration based on Intensity and Spatial Informati...
Influence Analysis of Image Feature Selection TechniquesOver Deep Learning Model
Linearity of Feature Extraction Techniques for Medical Images by using Scale ...
A comparison of SIFT, PCA-SIFT and SURF
thesis
Intelligent Transportation System Based On Machine Learning For Vehicle Perce...
Ad

Recently uploaded (20)

PPTX
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
PPTX
Introduction-to-Cloud-ComputingFinal.pptx
PPTX
STUDY DESIGN details- Lt Col Maksud (21).pptx
PPTX
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
PDF
Lecture1 pattern recognition............
PPTX
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
PPTX
Qualitative Qantitative and Mixed Methods.pptx
PPTX
Database Infoormation System (DBIS).pptx
PPTX
ALIMENTARY AND BILIARY CONDITIONS 3-1.pptx
PPTX
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
PPTX
Introduction to Knowledge Engineering Part 1
PDF
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
PPT
Quality review (1)_presentation of this 21
PPT
ISS -ESG Data flows What is ESG and HowHow
PPTX
IBA_Chapter_11_Slides_Final_Accessible.pptx
PPTX
Computer network topology notes for revision
PPTX
Acceptance and paychological effects of mandatory extra coach I classes.pptx
PDF
Business Analytics and business intelligence.pdf
PDF
annual-report-2024-2025 original latest.
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
Introduction-to-Cloud-ComputingFinal.pptx
STUDY DESIGN details- Lt Col Maksud (21).pptx
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
Lecture1 pattern recognition............
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
Qualitative Qantitative and Mixed Methods.pptx
Database Infoormation System (DBIS).pptx
ALIMENTARY AND BILIARY CONDITIONS 3-1.pptx
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
Introduction to Knowledge Engineering Part 1
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
Quality review (1)_presentation of this 21
ISS -ESG Data flows What is ESG and HowHow
IBA_Chapter_11_Slides_Final_Accessible.pptx
Computer network topology notes for revision
Acceptance and paychological effects of mandatory extra coach I classes.pptx
Business Analytics and business intelligence.pdf
annual-report-2024-2025 original latest.
Ad

SYNOPSIS on Parse representation and Linear SVM.

  • 1. By G a j j a r B h av i n ku m a r (IU1571090002) 30th July, 2022 Synopsis (Electronics and Communication Engineering) Sparse based feature parameterization and multi kernel SVM for large scale scene classification Under the supervision of D r. H i r e n M e w a d a (Associate Professor, EE,PMU) D r. A s h w i n Pa t n i (Assistant Professor, E&C,IITE,IU)
  • 2. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 3. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 4. Introduction of Image Classification ***Methods of Feature Selection Exhaustive search,Branch and Bound Search,Relaxed Branch and Bound,Selecting Best Individual Features,Sequential Forward Selection(SFS),Sequential Backward Selection(SBS),Sequential Floating Forward Search(SFFS),Sequential Floating Backward Search and Max-Min approach etc… Classification of Image feature Color Feature Histogram,momemnt(CM),Col or Coherence Vector(CCV), Color Correlogram Texture Feature The Grey Level Co-occurrence Matrix, Edge Detection, Laws Texture Energy Measures Shape Feature Binary image algorithm, Horizontal and vertical segmentation 4
  • 5. What is Image/object Classification?
  • 6. Classification Techniques Classification Supervised Unsupervised Distribution Free Euclidean classifier K-nearest neighbour Minimum distance Decision Tree Statistical Techniques based on probability Distribution models,which may be parametric or nonparametric Clustering No extensive prior knowledge required Unknown, but distinct, spectral classes are generated Limited control over classes and identities No detailed information
  • 7. • Large dimensionality of classes reduce the accuracy. • In real-time most of the high dimensional datasets do not follow normal distribution. Hence, Linear kernel fails to classify image. • Bag of word representation can not capture the spatial information. • Dense features representation makes it difficult to learn. • Linear SVM algorithm is not suitable for large data sets 7 Challenges in Image Classification
  • 8. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 9. Motivation from literature Over the past few years, the classification and recognition of vision have gained importance. There are three main component involved 1. Point of interest detection 2. Description of region of interest (Feature based) 3. Classification (Kernel based) Feature Based: To solve the multiclass reorganization problems, there are many supervised [1][2][3][4] and unsupervised [5][6][7][8] techniques used with sparse dictionaries. The state-of-the-art is accompanied by results on standard benchmark datasets, i.e. Caltech-101 [9], Caltech-256 [10], and Scene-15 [11] As reported in [8], vector quantization is used to generate sparse code with maximum pooling. By using this approach, the computation complexity of SVM is significantly reduced from O(n2) to O(n).
  • 10. Motivation from literature [12] suggested a method for multi-scale spatial latent semantic analysis based on sparse coding. The spatial pyramid matching of image segmentation is used to extract the target's spatial position information, and feature soft quantization based on sparse coding is utilised to produce a co-occurrence matrix, which increases the accuracy of the original feature representation. For matching multilevel detail locally in the learning and recognition stages, multi resolution pyramids were introduced in SIFT (PSIFT) feature space in [13]. This P- SIFT experiment showed positive results for streamline work. The authors of [14] experimented with a classification technique based on SIFT, in which SIFT are clustered using KNN to build a dictionary and then used the SPM to generate a feature vector. Feature Based:
  • 11. Motivation from literature Across all these studies, authors did not report the effect of SIFT parameters in their algorithms. Table 1 lists the parameters controlling the SIFT features. The majority of experiments in the literature use default values without tuning them for each task. As part of the first experiment, we investigated SIFT parameters on a sparse- based dictionary approach for image classification as suggested by Yang et al. [8]. Feature Based:
  • 12. Motivation from literature The combination of various descriptors employing multiple kernels SVM was introduced in [15] and demonstrated a significant improvement in various scene classifications. In [16] proposed the multilabel least-squares SVM method. For the multi-label scene classification problem, they used multi-kernel RBFbased SVM. The classifier was validated on four datasets, with a maximum accuracy of 85% Kancherla et al [17] validate the effect of kernel in SVM. They simulated the algorithm with a 3 to 4 class dataset and used different feature sets with various linear kernel SVM. On the MIT dataset, they discovered that the RBF kernel outperforms other kernels with a classification rate of 82.06 percent. Kernel based:
  • 13. Motivation from literature [18] presented an SVM-based scene classification method for robotic applications. The robotic development necessitates quick execution. As a result, from the captured scene, heuristic metric-based key points were identified and used in the SVM model. They conclude that combining local binary pattern and SURF features with SVM yielded higher accuracy than a VGG-based neural network model. To classify hyperspectral images, [20] proposed a hybrid approach of spatial, spectral, and semantic features. Gabor-based structural features are combined with morphological-based spatial features and semantic features based on K- means and entropy. A composite kernel is then created that corresponds to these three features, achieving an accuracy of 98%. Conversely, in a large dataset, SVM outperforms NN when features are interpreted geometrically. Real-world scene classification was achieved with the combination of dense SIFT, color SIFT, and structure similarity, as well as localized multikernel neural networks[23]. Kernel based:
  • 14. Motivation from literature Overall, Multi-kernel SVMs have proved essential in many recognition and classification applications. Despite the advantage of multikernel over CNN approaches for classifying scenes amongst a large number of categories, further improvement is needed to reduce the miss-classification rates among databases containing many classes. In addition, robust features can be achieved if redundancy is minimized and the SVM kernel is designed with optimized parameters consistent with these feature sets. Kernel based:
  • 15. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 16. Objective and Scope of the work Objective of study:  Check the effectiveness of the sparse data in image classification.  Addressing the issue of which size and types of dictionary is best for large scale dataset.  Selecting robust features that can address this problem.  How linear vs Non-linear kernels of traditional SVM classifier affect on large scale dataset?  Find the possibilities of reducing computational cost compared to modern Neural Networks for satisfactory accuracy.  Experimenting pros and cons of traditional Machine Learning over Modern deep learning algorithms.
  • 17. Objective and Scope of the work Scope of the work: In machine vision, there is no any rigorous study of tuning most proven SIFT feature in classification task. Our study suggests that SIFT feature can be tuned according to problem and that features can be sparsified by matching the appropriate size of dictionary. Any traditional machine learning approach can take advantage of this feature set in order to deal with modern deep learning algorithms where the requirements of training data, training time, and computational hardware are higher.
  • 18. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 19. Problem definition  Image classification problems include intra-class variation, scale variation, view-point variation, occlusion, lighting, background clutter, etc. Feature selection, kernels, classifiers, machine learning, and deep-learning algorithms can be applied.  Until date, it was difficult to apply any of these methodologies to large-scale data while preserving accuracy.  Sparse representation has shown significant potential in dealing with these challenges.  Traditional classification techniques that use sparse representations lack image label information. The current deep learning technique's primary flaw is its excessively expensive training effort. Integrating existing sparse representation technologies into deep learning is a valuable unresolved topic.
  • 20. Problem definition  We presented a methodology for bridging sparse and machine learning algorithms and showed its performance for large datasets. The research aims to enhance multi-class large dataset classification accuracy.  Sparse picture characteristics and machine learning will be used for categorization.  Another sub-objective is to optimize machine learning speed and class detection with appropriate accuracy.
  • 21. Problem statement in summury 1. Classification accuracy in multiclass is still difficult with existing techniques 2. Computational time is second concern to optimize with 1. 3. Sparse and ML based approach for classification will be overlooked 4. Possible outcome will be an efficient algorithm which satisfy 1-2-3. 5. Targeted benchmark data set : Caltech-101,Caltech-256, Scene-15
  • 22. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 23. Original contribution by the thesis The impact of dictionary size and type  converges quickly  KSVD  16x16 image patch size  Over-complete dictionary of size 256x1024 Parameterizing SIFT (T-SIFT)  SIFT descriptor size of 128 is insufficient for all data sizes  SIFT can be customized  256 size descriptor with 16 angels and 4 SIFT bins is sufficient  Table-3  T-SIFT is more robust  T-SIFT outperforms CNN in hardware, training time, and training data requirements. Multi-kernel SVM with Tuned SIFT  Gaussian Kernel outperforms the Polynomial and its fusion  Improvement on Caltech-101: 4% and Scene-15 : 10%  Caltech-256 is difficult to train with minimal hardware.  T-SIFT with MKL SVM is a novel method
  • 24. Original contribution by the thesis The impact of dictionary size and type Parameterizing SIFT (T-SIFT) Multi-kernel SVM with Tuned SIFT Summary:  This thesis presents a distinctive contribution by providing some recommendations for modifying the parameters value chosen for the Dictionary and SIFT.  When contrasted with the prior art, Tunable-use SIFTs in Sparse coded Spatial Pyramid Matching (ScSPM) and Multi-Kernel nonlinear Support Vector Machines (SVM) produce significant gains in terms of classification accuracy.  In addition, the uniqueness of the contribution can be seen in the studies that are referenced in the bibliography.
  • 25. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 26. Methodologies of Research and Results Intel Core i3 of 2.50 GHz, 8 GB RAM, and Windows-10 of 64 bit machine SIFT feature analysis and T-SIFT implementations Sparse coded SPM with multi kernel SVM implementation First method Phase-1: The impact of dictionary size and type Phase-2: Parameterizing SIFT (T-SIFT) Second method
  • 27. Methodologies of Research and Results First Method: SIFT feature analysis and T-SIFT implementations Figure-1: Proposed tunable SIFT ScSPM
  • 28. Methodologies of Research and Results First Method: SIFT feature analysis and T-SIFT implementations There were two phases of the study for the first method: 1 - Dictionary learning 2 - Training the classifier.
  • 29. Methodologies of Research and Results First Method: SIFT feature analysis and T-SIFT implementations Phase 1 - Dictionary learning
  • 30. Methodologies of Research and Results First Method: SIFT feature analysis and T-SIFT implementations Phase 1 - Dictionary learning
  • 31. Methodologies of Research and Results First Method: SIFT feature analysis and T-SIFT implementations Phase 2 - Training the classifier.
  • 32. Methodologies of Research and Results First Method: SIFT feature analysis and T-SIFT implementations Phase 2 - Training the classifier.
  • 33. Methodologies of Research and Results First Method: SIFT feature analysis and T-SIFT implementations Phase 2 - Training the classifier.
  • 34. Methodologies of Research and Results Second Method : Sparse coded SPM with multi kernel SVM implementation
  • 35. Methodologies of Research and Results Second Method : Sparse coded SPM with multi kernel SVM implementation In this experiment, we used kernel weights dm to solve the convex optimization problem stated in equation-7 using SVM as proposed in [30]. To obtain kernel weights d, the fusions of the kernels with the weights of respective coefficients are listed in Tab. 4.
  • 36. Methodologies of Research and Results Second Method : Sparse coded SPM with multi kernel SVM implementation
  • 37. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 38. Conclusion and Future work  The size and sparsity of the dictionary are determined by the SIFT parameters. Therefore, in the first experiment we are presenting the effect of orientation and orientation bins on the size and sparsity of feature vectors.  By reducing the average number of coefficients, the study concludes that 30 iterations are sufficient to achieve maximum sparsity in the dictionary.  After obtaining the maximum sparsity of the dictionary, the effect of dictionary sizes on overall classification accuracy is examined.  In further research, it was found that the classification accuracy would be less for low values of either orientation or orientation bins in histogram formation. As a result, the appropriate choice of those two parameters results in a boost in performance as described in the first method. SVM linear kernels were used in this empirical study.
  • 39. Conclusion and Future work  Secondly, we investigated the fusion of Nonlinear Multi Kernel Learning (MKL). Although CNNs have achieved high popularity in classification models, they require a lot of training time and computation power.  SVM has a greater flexibility in characterization than CNN, if a suitable kernel is used for challenging datasets. As a single kernel, it is limited to datasets with linear classification.  Therefore, a multi-kernel SVM has been re-experimented with the aim of optimizing the kernels and studying the various parameters affecting the kernel performance in classification.  The function of various parameters has been investigated to eliminate duplicate features in the evaluation of simple MKL over ScSPM features for classification accuracy.
  • 40. Conclusion and Future work  The effect of MKL on overall classification accuracy is presented after obtaining the maximum sparsity of the dictionary. Even with the simplest combination of a single type kernel, such as Polynomial, as represented in Tab. 4, accuracy will be greater than the single kernel SVM method.  For 101 class datasets, using several combination of Gaussian kernels improved classification accuracy to 85.72 percent.  With an increasing number of Gaussian kernels, training time and storage needs grow, making it impossible to work on huge datasets like Caltech-256 with minimal hardware requirements.  As a whole, we conclude that working with strong features and Multi kernels on object identification is still an open area. We will investigate the impact of this feature on similar classes in the future.
  • 41. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 42. List of publications 1. Gajjar, Bhavinkumar, Hiren Mewada and Ashwin Patani. "Parameterizing sift and sparse dictionary for svm based multi-class object classification“ International Journal of Artificial Intelligence 19 (2021): 95-108. http://www.ceser.in/ceserp/index.php/ijai/article/view/6647 (SCOPUS) 2. Gajjar, Bhavinkumar, Hiren Mewada, and Ashwin Patani. "Sparse coded spatial pyramid matching and multikernel integrated SVM for non-linear scene classification" Journal of Electrical Engineering 72.6 (2021): 374-380. https://doi.org/10.2478/jee-2021-0053/(SCOPUS)
  • 43. 1. cameraman.tif 2. rice.png 3. circlesBrightDark.png 4. liftingBody.png Results for Matching Pursuit Algorithm Dict1- Discrete Wavelet Dict2- DCT and Kronecker Delta Dict3- Haar Wavelet Packets and DCT Dict4- K-SVD Arpan Patel. ”Image Classification with sparse coding and machine learning” thesis. CSPIT, 2017. 43 L7
  • 44. Objective and Scope of the work 1. Check the effectiveness of the sparse data in image classification. 2. Addressing the issue of which size and types of dictionary is best for large scale dataset. 3. Selecting robust features that can address this problem. 4. How linear vs Non-linear kernels of traditional SVM classifier affect on large scale dataset? 5. Find the possibilities of reducing computational cost compared to modern Neural Networks for satisfactory accuracy. 6. Experimenting pros and cons of traditional Machine Learning over Modern deep learning algorithms.
  • 45. Objective and Scope of the work In machine vision, there is no rigorous study of tuning the well-proven SIFT feature for the classification task. Our study suggests that the SIFT feature can be tuned to the problem at hand, and that the resulting features can be sparsified by matching an appropriately sized dictionary. Any traditional machine learning approach can exploit this feature set to compete with modern deep learning algorithms, whose requirements for training data, training time, and computational hardware are higher.
  • 46. Features (+Sparse): Speeded Up Robust Features (SURF), Features from Accelerated Segment Test (FAST), Binary Robust Independent Elementary Features (BRIEF), Oriented FAST and Rotated BRIEF (ORB), Histogram of Oriented Gradients (HOG), ... Kernel functions of the classifier: Linear, RBF, Polynomial, Sigmoid, ... Classification techniques (ML): K-Means, SVM, K-Nearest Neighbour (KNN), Artificial Neural Network (ANN), Convolutional Neural Network (CNN), ... Open questions: which combination of good features, classifier kernel, and classification technique gives the best accuracy, and at what computation time?
  • 47. Introduction of Image Classification  Challenges: intra-class variation, scale variation, view-point variation, occlusion, illumination, background clutter  Approaches: feature selection, kernels, classifiers, machine learning and deep-learning algorithms
  • 48. What is Sparse? A sparse matrix is one in which the majority of the values are zero. The ratio of zero elements to non-zero elements is called the sparsity of the matrix. The opposite of a sparse matrix, in which the majority of values are non-zero, is called a dense matrix. Example: the 4×4 diagonal matrix with entries 5, 11, 25, 7 has sparsity = 3 (12 zeros / 4 non-zeros). Advantages:  save a significant amount of memory  speed up the processing of the data  reduce computation time by eliminating operations on zero elements
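To make the memory argument concrete, the sketch below (illustrative only; it uses SciPy's CSR format, and the array sizes are hypothetical, not taken from this work) compares a dense array with its sparse representation:

```python
import numpy as np
from scipy import sparse

# Mostly-zero matrix: 1000 x 1000 with ~1% non-zero entries (hypothetical sizes)
rng = np.random.default_rng(0)
dense = np.zeros((1000, 1000))
idx = rng.integers(0, 1000, size=(10000, 2))
dense[idx[:, 0], idx[:, 1]] = rng.standard_normal(10000)

csr = sparse.csr_matrix(dense)          # store only non-zero values + their indices

print("non-zeros:", csr.nnz)
print("dense bytes:", dense.nbytes)     # 8 MB for float64
print("sparse bytes:", csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes)
```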
  • 50. Sparse coding. Greedy algorithms: Matching Pursuit (MP), Orthogonal Matching Pursuit (OMP) [and variants: Stagewise Orthogonal Matching Pursuit (StOMP), Subspace Pursuit (SP), Compressive Sampling Matching Pursuit (CoSaMP), Regularized Orthogonal Matching Pursuit (ROMP), Gradient Pursuit (GP), Iterative Hard Thresholding (IHT), Hard Thresholding Pursuit (HTP)]. Relaxation algorithms:  Basis Pursuit (BP)  Least Absolute Shrinkage and Selection Operator (LASSO)  FOcal Underdetermined System Solver (FOCUSS)
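As an illustration of the greedy family, here is a minimal Orthogonal Matching Pursuit sketch (a bare-bones version for intuition, not the implementation used in this work; it assumes a column-normalized dictionary D and a target sparsity k):

```python
import numpy as np

def omp(D, y, k):
    """Greedy OMP: pick k atoms of D that best explain y."""
    residual = y.copy()
    support = []
    for _ in range(k):
        # atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # least-squares fit on the selected atoms, then update the residual
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return x

# toy usage: 64-dim signal, 256-atom random dictionary, 5 non-zeros
rng = np.random.default_rng(1)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)
y = rng.standard_normal(64)
x = omp(D, y, k=5)
```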
  • 51. 51 1. Maximum Likelihood (ML) 2. Method of Optimal Directions (MOD) 3. K-SVD 4. Simultaneous Codeword Optimization (SimCO) Dictionary Learning Algorithms
  • 52. The K-SVD Algorithm - General (Aharon, Elad, & Bruckstein, 2004): initialize the dictionary D, then alternate between a sparse coding stage on the training data Y (greedy/relaxation algorithms) and a dictionary update stage (dictionary learning algorithms) for a fixed number of iterations T.
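A compact sketch of this alternation is shown below (a simplified K-SVD-style loop for illustration only, not the thesis code; sparse coding uses scikit-learn's OMP, and each atom is refreshed with a rank-1 SVD of its residual):

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(Y, n_atoms, k, n_iter=30, seed=0):
    """Alternate sparse coding (OMP) and per-atom SVD dictionary updates."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        # sparse coding stage: code all training columns with OMP
        X = orthogonal_mp(D, Y, n_nonzero_coefs=k)
        # dictionary update stage: refresh each atom with a rank-1 SVD
        for j in range(n_atoms):
            users = np.nonzero(X[j, :])[0]        # signals that use atom j
            if users.size == 0:
                continue
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]
            X[j, users] = s[0] * Vt[0, :]
    return D, X

# toy usage: 100 random 64-dim patches, 128 atoms, 5 non-zeros per code
Y = np.random.default_rng(1).standard_normal((64, 100))
D, X = ksvd(Y, n_atoms=128, k=5, n_iter=10)
```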
  • 53.  Can be applied to almost everything  Classifications or numerical predictions  Widely used in pattern recognition o Identify cancer or genetic diseases o Text classification: classify texts based on the language o Detecting rare events: earthquakes or engine failures Support vector machine
  • 54. Linearly separable problem: we have two features (x1, x2) and some data points plotted in the (x1, x2) plane.
  • 55. We want to find a hyperplane, in this case a line, that separates the different data points with the maximum margin.
  • 57. This is the maximum margin solution.
  • 58. Support vectors:  the points from each class that are closest to the maximum margin hyperplane  each class has at least one support vector
  • 59. With the support vectors alone it is possible to reconstruct the hyperplane, which is useful: we can store the classification model compactly even when we have millions of features.
  • 60. How to find the hyperplane when the problem is linearly separable? With convex hulls.
  • 61. How to find the hyperplane when the problem is linearly separable? With convex hulls. Convex hull: the smallest convex set that contains all the points. The hyperplane is the perpendicular bisector of the shortest line segment between the two hulls.
  • 62. Mathematical approach: w · x + b = 0 is the equation of a hyperplane in n dimensions (in 2D: y = m·x + b). The weights are w = (w1, w2, ..., wn) and the data points are x = (x1, x2, ..., xn). The aim of the SVM algorithm is to find the weights w so that the data points are separated accordingly: w · x + b ≥ +1 for one class and w · x + b ≤ -1 for the other.
  • 63. How to find the hyperplane in 2D? With convex hulls. [Figure: the two bounding planes H0 and H1 defined by the equations above, separated by a distance d.]
  • 64. Mathematical approach: vector geometry gives the distance between the two planes as 2 / ||w||, where ||w|| is the Euclidean norm (distance from 0). We want this distance to be as large as possible, so we want to minimize the norm of w. We usually minimize (1/2)||w||², and quadratic optimization solves this problem.
  • 65. Non-linear spaces  In many real-world applications, the relationships between variables are non-linear  A key feature of SVMs is their ability to map the problem into a higher-dimensional space using a process known as the "kernel trick"  A non-linear relationship may then suddenly appear to be quite linear
  • 66. When the problem is not linearly separable, we have to use slack variables aᵢ for the points that fall on the wrong side of the margin.
  • 67. Mathematical approach: we minimize (1/2)||w||² + C Σᵢ aᵢ, where C is a cost parameter applied to all points that violate the constraints. We optimize this cost function. By tuning the C parameter we modify the penalty for misclassified data points: if C is very large, the algorithm tries to find a 100% separation; if C is low, a wider overall margin is allowed with more misclassified data points.
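The effect of C can be reproduced with a few lines of scikit-learn (an illustrative sketch on synthetic data, not an experiment from this work):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# two overlapping classes so that some points must be misclassified
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # small C -> wide margin, many support vectors; large C -> narrow margin
    print(f"C={C:>6}: support vectors={clf.n_support_.sum()}, "
          f"train accuracy={clf.score(X, y):.3f}")
```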
  • 68. Kernels: consider data points plotted by latitude and longitude, belonging to two weather classes, sunny and snowy.
  • 70. With the kernel function we can transform the problem into a linearly separable one by mapping it into a higher-dimensional space (adding a new dimension, altitude, to the latitude/longitude data).
  • 90. Through the kernel mapping into a higher-dimensional space, the SVM learns concepts (such as altitude in the weather example) that were not explicitly measured in the original data.
  • 91. Kernel functions. Φ(x), the "phi function", is the mapping of the data x into another space; K(xᵢ, xⱼ) is the kernel function. Linear kernel (does not transform the data): K(xᵢ, xⱼ) = xᵢ · xⱼ. Polynomial kernel: K(xᵢ, xⱼ) = (xᵢ · xⱼ + 1)^d. Gaussian RBF kernel: K(xᵢ, xⱼ) = exp(-||xᵢ - xⱼ||² / (2σ²)).
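These three kernels can be written directly as kernel-matrix functions (a small NumPy sketch for illustration; the parameter values are arbitrary):

```python
import numpy as np

def linear_kernel(A, B):
    return A @ B.T

def polynomial_kernel(A, B, d=3):
    return (A @ B.T + 1.0) ** d

def gaussian_rbf_kernel(A, B, sigma=1.0):
    # squared Euclidean distances between all rows of A and all rows of B
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2.0 * sigma**2))

X = np.random.default_rng(0).standard_normal((5, 128))   # 5 feature vectors
print(linear_kernel(X, X).shape, gaussian_rbf_kernel(X, X).shape)  # (5, 5) (5, 5)
```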
  • 92. Advantages:  SVM can be used for regression problems as well as for classification  Not overly influenced by noisy data  Easier to use than neural networks. Disadvantages:  Finding the best model requires testing various combinations of kernels and model parameters  Quite slow, especially when the input dataset has a large number of features
  • 93. Classification test results on Caltech-101:
Algorithm / No. of classes | 2 classes (bonsai and car side) | 5 classes | 20 | 40 | 80 | 101
Sparse + SIFT + SVM        | 100%   | 95.38% | 79.19% | 76.07% | 75.26% | 73.13%
Sparse + SVM               | 47.37% | 43.10% | -      | -      | -      | -
SIFT + SVM                 | 56.56% | 52.60% | -      | -      | -      | -
  • 94. Overview: kernel-based learning. Kernel design maps data from the lower-dimensional input space to a higher-dimensional feature space.  The kernel measures the similarity between data points  The kernel transformation lets a linear separation algorithm, such as Support Vector Classification (SVC), operate in the higher-dimensional space.
  • 95. Overview: kernel-based learning. The same data can contain elements that show different patterns, so the best kernel is often a linear combination of different kernels.
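Such a weighted combination of base kernels can be fed to an SVM as a precomputed kernel matrix. The sketch below is illustrative only: the kernel weights are hand-picked rather than learned by an MKL solver such as SimpleMKL, and the data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def combined_kernel(A, B, weights=(0.5, 0.3, 0.2)):
    # weighted sum of two Gaussian kernels and one polynomial kernel
    return (weights[0] * rbf_kernel(A, B, gamma=0.01)
            + weights[1] * rbf_kernel(A, B, gamma=0.1)
            + weights[2] * polynomial_kernel(A, B, degree=2))

clf = SVC(kernel="precomputed").fit(combined_kernel(X_tr, X_tr), y_tr)
print("test accuracy:", clf.score(combined_kernel(X_te, X_tr), y_te))
```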
  • 96. 96 Single Kernel SVM to Multikernel SVM[ 8 ]
  • 98. Multikernel approach pipeline: dataset → SIFT features → multiple kernel learning (MKL) → SVM.
  • 99. Kernels used. Linear kernel (does not transform the data): K(xᵢ, xⱼ) = xᵢ · xⱼ. Polynomial kernel: K(xᵢ, xⱼ) = (xᵢ · xⱼ + 1)^d. Gaussian RBF kernel: K(xᵢ, xⱼ) = exp(-||xᵢ - xⱼ||² / (2σ²)).
  • 100. Results we obtained on Caltech-101.
MKL performance in our algorithm:
No. | Kernel | Kernel parameters | Training images/class | Accuracy
1 | Gaussian | [0.5 1 2 5 7 10 12 15 17 20] | 30 | 75.52
2 | Gaussian | [0.5 1 2 5 7 10 12 15 17 20] | 15 | -
3 | Polynomial | [1 2 3] | 30 | 75.70
4 | Polynomial | [1 2 3] | 15 | 69.29
5 | Gaussian + Gaussian + Polynomial + Polynomial | [0.5 1 2 5 7 10 12 15 17 20], [1 2 3] | 30 | 74.97
6 | Polynomial + Gaussian + Polynomial | [1 2 3], [0.5 1 2 5 7 10 12 15 17 20] | 30 | 75.58
Single-kernel performance in our algorithm:
No. | Kernel | Training images/class | Accuracy
1 | Linear | 30 | 69.71
2 | Polynomial | 30 | 64.18
3 | Gaussian | 30 | 61.81
  • 101. Comparison with other methods for the Caltech-101 dataset:
Algorithm | 15 training images/class | 30 training images/class
Zhang et al. [1] | 59.10±0.60 | 66.20±0.50
KSPM [2] | 56.40 | 64.40±0.80
NBNN [3] | 65.00±1.14 | 70.40
ML+CORR [4] | 61.00 | 69.60
KC [5] | - | 64.14±1.18
LSPM [6] | 53.23±0.65 | 58.81±1.51
ScSPM [6] | 67.0±0.45 | 73.2±0.54
DMKDL [7] | - | 82.66±0.36*
MKLDPL [7] | - | 86.81±0.21*
Our method (best result) | 69.29±0.98 | 75.70±1.30
*30 images for training and 15 images for testing
  • 102. Future work  Try other kernels with different L-norms  Work on two more datasets, Caltech-256 and Scene-15  Understand the effect of the cost function on different datasets in SVM  Divide the training and testing data with a standard split for all classes and check performance  Publish a paper on the above results
  • 103. [Pipeline diagram] Datasets (Caltech-101, Caltech-256, Scene-15) → feature extraction (SIFT, LBP, etc.) → dictionary learning (K-SVD, SimCO) and sparse coding (OMP, MP, BP) → training features & labels / testing features and labels → SVM (multi-kernel, cost function) → classified labels and % accuracy. Indicative accuracies: LBP ~63, SIFT ~65, SIFT+LBP fusion -, SPM+SIFT ~77; SimCO ~68, OMP ~66, K-SVD ~73; multikernel ~75.70, single kernel ~69.71.
  • 104. Sparse formulation of the feature vector. Attractive properties of sparse coding:  First, compared with VQ coding, sparse coding can achieve a much lower reconstruction error due to its less restrictive constraint;  Second, sparsity allows the representation to specialize and to capture the salient properties of images;  Third, research in image statistics clearly reveals that image patches are sparse signals.
  • 105. The K-SVD Algorithm - General (Aharon, Elad, & Bruckstein, 2004): initialize the dictionary D, then alternate between a sparse coding stage that computes the coefficient matrix A for the training data Y (greedy/relaxation algorithms) and a dictionary update stage (dictionary learning algorithms) for a fixed number of iterations T.
  • 106. 106 Caltech101 Learned Dictionary patches 256 x 256: 32x8(Patches) 256 x 512: 32x16(Patches) 256 x 1024: 32x32(Patches)
  • 107. 107 Caltech256 Learned Dictionary patches 256 x 256: 32x8(Patches) 256 x 512: 32x16(Patches) 256 x 1024: 32x32(Patches)
  • 108. 108 Caltech256 Learned Dictionary patches 256 x 2048: 32x64(Patches)
  • 109. 109 Scene-15 Learned Dictionary patches 256 x 256: 32x8(Patches) 256 x 512: 32x16(Patches) 256 x 1024: 32x32(Patches)
  • 110. Linear SPM kernel for SVM. Given a pooled histogram feature z, the SVM decision function for a binary class is f(z) = Σᵢ αᵢ K(z, zᵢ) + b, where K is any kernel function, αᵢ are the weights, and b is a fixed bias.
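This decision function can be checked numerically against a fitted classifier (an illustrative scikit-learn sketch on synthetic data; note that the stored dual coefficients already include the label signs):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
clf = SVC(kernel="rbf", gamma=0.05).fit(X, y)

z = X[:3]                                                # a few query points
K = rbf_kernel(z, clf.support_vectors_, gamma=0.05)
manual = K @ clf.dual_coef_.ravel() + clf.intercept_     # sum_i alpha_i K(z, z_i) + b
print(np.allclose(manual, clf.decision_function(z)))     # True
```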
  • 111. Linear SPM kernel for SVM. The pooling function is max pooling: for each dictionary atom, the pooled value z is the maximum of the absolute sparse code values over the descriptors in the region. The SPM kernel between two pooled vectors is their inner product, which is plugged into the primal formulation of the SVM [6].
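A minimal illustration of max pooling over sparse codes (assuming U holds the sparse codes of all local descriptors in a region, one row per descriptor and one column per dictionary atom; the sizes are hypothetical):

```python
import numpy as np

# hypothetical sparse codes: 500 local descriptors coded over 1024 atoms
rng = np.random.default_rng(0)
U = rng.standard_normal((500, 1024)) * (rng.random((500, 1024)) < 0.02)

# max pooling: one pooled value per atom, taken over the region's descriptors
z = np.max(np.abs(U), axis=0)          # shape (1024,)

# the linear SPM kernel between two pooled vectors is simply their inner product
U2 = rng.standard_normal((500, 1024)) * (rng.random((500, 1024)) < 0.02)
z2 = np.max(np.abs(U2), axis=0)
kappa = float(z @ z2)
```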
  • 113. Key parameters of the SIFT feature [Lowe, David G. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision 60.2 (2004): 91-110]: a 2 × 2 descriptor array computed from an 8 × 8 set of samples.
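For reference, the sketch below shows how the tunable SIFT parameters are typically exposed in OpenCV (an illustrative snippet; the image path and parameter values are placeholders, and OpenCV fixes the descriptor to a 4 × 4 grid of 8-bin orientation histograms, i.e. 128 dimensions):

```python
import cv2

# hypothetical input image; replace with an actual dataset image
img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)

# SIFT detector/descriptor with its tunable parameters (values are illustrative)
sift = cv2.SIFT_create(
    nfeatures=0,              # keep all keypoints
    nOctaveLayers=3,          # scales per octave
    contrastThreshold=0.04,   # reject weak, low-contrast keypoints
    edgeThreshold=10,         # reject edge-like keypoints
    sigma=1.6,                # base Gaussian blur
)
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)   # (n_keypoints, 128)
```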