Support Vector Machine (SVM)
Dr. Prasenjit Dey
Linear Separators
• What could be the optimal line to separate the blue dots from the red dots?
Support Vector Machine (SVM)
• An SVM is a classifier that finds an optimal separating hyperplane in the feature space using the
training data.
Optimal hyperplane
Classification margin
• The margin (ρ) is the perpendicular distance between the closest data points and the hyperplane.
• The closest points, at which the margin distance is measured, are called the support vectors.
• The margin of the hyperplane is the distance between the support vectors.
• The distance from an example xi to the hyperplane is: r = (wTxi + b) / ||w||
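A minimal numerical sketch of this distance formula (the hyperplane parameters w, b and the points below are made up for illustration; numpy assumed):

```python
import numpy as np

# Hypothetical hyperplane w^T x + b = 0
w = np.array([2.0, 1.0])
b = -3.0

# A few example points xi (invented for the illustration)
X = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [0.0, 0.0]])

# Signed distance r = (w^T xi + b) / ||w|| for each point;
# the sign indicates on which side of the hyperplane the point lies
r = (X @ w + b) / np.linalg.norm(w)
print(r)
```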
Maximum classification margin
• The optimal hyperplane with the maximum margin is called the maximum-margin hyperplane.
• An important observation is that for maximizing the margin, only the support vectors
matter; the remaining examples can be ignored.
Mathematical representation of the linear SVM (contd.)
• For every support vector xs, the inequality ys(wTxs + b) ≥ ρ/2 is an equality. After rescaling w and b
by ρ/2 in the equality, the distance between each xs and the hyperplane is r = ys(wTxs + b) / ||w|| = 1 / ||w||
• Then the margin can be expressed through the (rescaled) w and b as: ρ = 2r = 2 / ||w||
• Objective: find w and b such that ρ = 2 / ||w|| is maximized and,
for all (xi, yi), i = 1..n: yi(wTxi + b) ≥ 1
• The objective can be reformulated as: find w and b such that Φ(w) = ||w||² = wTw is minimized
and, for all (xi, yi), i = 1..n: yi(wTxi + b) ≥ 1
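As a hedged illustration of this optimization (not part of the original slides), the sketch below fits a linear SVM with scikit-learn on invented toy data; a very large C approximates the hard-margin problem, so the margin comes out as 2/||w|| and the support vectors satisfy yi(wTxi + b) ≈ 1:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (made up for the example)
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],   # class -1
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin formulation:
# minimize w^T w subject to yi (w^T xi + b) >= 1
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

print("margin 2/||w|| =", 2 / np.linalg.norm(w))
# Support vectors lie exactly on the margin: yi (w^T xi + b) ≈ 1
print("constraints at support vectors:",
      y[clf.support_] * (clf.support_vectors_ @ w + b))
```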
Mathematical representation of the linear SVM
• Let the training set be {(xi, yi)}, i = 1..n, xi ∈ Rd, yi ∈ {-1, 1}, and let the hyperplane with
margin ρ be the separator for the training set.
• Then for each training sample (xi, yi):
wTxi + b ≤ -ρ/2 if yi = -1
wTxi + b ≥ ρ/2 if yi = 1
or equivalently, yi(wTxi + b) ≥ ρ/2
Linear and non-linear data
• For linearly separable data, SVM performs well even when the data contain noise.
• However, if the data are non-linearly separable, it is hard for SVM to draw a separating line.
• Solution: map the data into a higher-dimensional space:
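A minimal sketch of this idea (numpy assumed; the 1-D points are invented): data that cannot be split by a single threshold on the line become linearly separable after the map x → (x, x²):

```python
import numpy as np

# 1-D data that no single threshold can separate:
# class +1 lies between the two groups of class -1
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([-1,   -1,    1,   1,   1,  -1,  -1])

# Map each point into 2-D: phi(x) = (x, x^2)
phi = np.column_stack([x, x ** 2])
print(phi)
# In the mapped space the classes are split by the line x^2 = 1.5,
# i.e. a linear separator now exists (+1 has x^2 <= 0.25, -1 has x^2 >= 4)
```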
Non-linear data: example 1
Hyperplane in the higher dimension
• The original feature space can always be mapped to some higher dimensional feature space
where the training set is separable.
Non-linear data: example 2
• For this hyperplane, three red dots fall into the blue category (misclassification).
• Here, the classification is not perfect.
• This separator removes the misclassification; however, it is difficult to train a model like this.
• For this, a regularization parameter is required.
The Kernel Functions
• The linear classifier relies on the inner product between vectors: K(xi, xj) = xiTxj
• If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the inner
product becomes: K(xi, xj) = φ(xi)Tφ(xj)
• A kernel function is a function that is equivalent to an inner product in some feature space.
• Example: for 2-dimensional vectors x = [x1 x2], let K(xi, xj) = (1 + xiTxj)².
We need to show that K(xi, xj) = φ(xi)Tφ(xj):
K(xi, xj) = (1 + xiTxj)² = 1 + xi1²xj1² + 2xi1xj1xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2
= [1  xi1²  √2xi1xi2  xi2²  √2xi1  √2xi2]T [1  xj1²  √2xj1xj2  xj2²  √2xj1  √2xj2]
= φ(xi)Tφ(xj), where φ(x) = [1  x1²  √2x1x2  x2²  √2x1  √2x2]
• Thus, a kernel function implicitly maps data to a high-dimensional space (without the need to compute
each φ(x) explicitly).
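A quick numerical check of this identity (the two vectors are chosen arbitrarily; numpy assumed):

```python
import numpy as np

def phi(x):
    # Explicit feature map for K(x, z) = (1 + x^T z)^2 in two dimensions
    x1, x2 = x
    return np.array([1.0, x1**2, np.sqrt(2) * x1 * x2, x2**2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2])

xi = np.array([1.0, 2.0])
xj = np.array([3.0, -1.0])

k_implicit = (1 + xi @ xj) ** 2   # kernel evaluated in the original space
k_explicit = phi(xi) @ phi(xj)    # inner product in the mapped space

print(k_implicit, k_explicit)     # both equal 4.0 for these vectors
```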
The various kernel functions
• Linear: K(xi, xj) = xiTxj
Mapping Φ: x → φ(x), where φ(x) is x itself
• Polynomial of power p: K(xi, xj) = (1 + xiTxj)^p
Mapping Φ: x → φ(x), where φ(x) has (d+p choose p) dimensions
• Gaussian (radial-basis function): K(xi, xj) = exp(−||xi − xj||² / 2σ²)
Mapping Φ: x → φ(x), where φ(x) is infinite-dimensional:
- every point is mapped to a function (a Gaussian)
- the combination of functions for the support vectors is the separator
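Each of these kernels can be written directly as a function of the original vectors; a brief sketch (numpy assumed, the values of p and σ are arbitrary):

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj

def polynomial_kernel(xi, xj, p=3):
    return (1 + xi @ xj) ** p

def rbf_kernel(xi, xj, sigma=1.0):
    # Gaussian kernel: exp(-||xi - xj||^2 / (2 * sigma^2))
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

xi, xj = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(linear_kernel(xi, xj), polynomial_kernel(xi, xj), rbf_kernel(xi, xj))
```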
The main idea of SVM is summarized below
• Define an optimal hyperplane: maximize the margin.
• Generalize to non-linearly separable problems: use penalty-based regularization to deal with the
misclassification.
• Map the data into a higher-dimensional space where it is easier to classify with a linear decision
surface: use a kernel function to transform the data from one feature space to another.
Tunable parameters of SVM: Margin, Regularization, Gamma, Kernel
Regularization
• For non-linearly separable problems, slack variables ξi can be added to allow misclassification of
difficult or noisy examples. In this case the margin is called a soft margin.
• For soft-margin classification, the old formulation of the objective
is modified:
Find w and b such that
Φ(w) = wTw + CΣξi is minimized
and for all (xi, yi), i = 1..n: yi(wTxi + b) ≥ 1 − ξi, ξi ≥ 0
• Parameter C can be viewed as a way to control overfitting: it
"trades off" the relative importance of maximizing the margin
and fitting the training data.
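A brief sketch of the trade-off in practice (scikit-learn's SVC exposes the same parameter C; the toy data with one noisy point are invented for the example):

```python
import numpy as np
from sklearn.svm import SVC

# Toy data with one noisy point of class -1 inside the +1 region
X = np.array([[1, 1], [2, 1], [1, 2],
              [4, 4], [5, 4], [4, 5],
              [4.2, 4.2]])           # noisy example
y = np.array([-1, -1, -1, 1, 1, 1, -1])

for C in (0.1, 1000.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    # Small C tolerates the noisy point and keeps a wide margin;
    # large C penalizes its slack heavily, giving a narrower margin
    print(f"C={C}: margin 2/||w|| = {2 / np.linalg.norm(w):.3f}, "
          f"support vectors = {len(clf.support_)}")
```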
The effect of regularization parameter ‘C’:
• For a small value of C → large margin (misclassification possible) → underfitting
• For a large value of C → small margin → overfitting
For small C
For large C
Gamma
• The gamma parameter is associated with the RBF kernel function. It controls the distance of influence of a single
training point.
• Low values of gamma indicate a large similarity radius, which results in more points being grouped
together.
• For high values of gamma, the points need to be very close to each other in order to be considered in the
same group (or class).
Low gamma
High gamma
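A brief sketch of this behaviour (scikit-learn's RBF SVC on synthetic two-moons data; the gamma values match the next slide):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Synthetic non-linear data: two interleaving half-moons
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for gamma in (0.001, 0.01, 0.1, 1):
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)
    # Very small gamma -> very wide Gaussians -> points lumped together;
    # large gamma -> tight, local influence around each training point
    print(f"gamma={gamma}: training accuracy = {clf.score(X, y):.2f}, "
          f"support vectors = {len(clf.support_)}")
```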
The effect of Gamma:
Gamma = 0.001 (considers everything as one class)
Gamma = 0.01
Gamma = 0.1
Gamma = 1 (chances of overfitting)
Thank you