Machine Learning to build Intelligent Systems
Manas Dasgupta
Understanding Support Vector Machine (SVM)
Structure of this Module
TOPICS
Introduction to Support Vector Machine
Linear SVM Classification
Non-linear SVM Classification
Polynomial Kernel Trick
Understanding Support Vector Machine
A Support Vector Machine (SVM) is a very powerful and versatile Machine Learning
model, capable of performing linear or nonlinear classification and regression.
It is one of the most popular models in Machine Learning and a must-learn for
anyone interested in the field.
SVMs are particularly well suited for classification of complex but small or medium-
sized datasets.
Introduction to Support Vector Machine
Large Margin Classification
The SVM method can be explained with the help of the
figure.
The data has two classes as noted by the colors and
they are clearly linearly separable.
The goal of the SVM Classification challenge here will
be to find the widest possible street between the
classes.
This is called “large margin classification”. The bold
line in the middle of the street is the “decision boundary”.
Notice that adding more training instances “off the
street” will not affect the decision boundary at all: it is
fully determined (or “supported”) by the instances
located on the edge of the street. These instances are
called the support vectors.
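As a minimal sketch of this idea (the tiny two-blob dataset below is an assumption made up for illustration, not from the slides), a fitted Scikit-Learn SVC exposes these instances through its support_vectors_ attribute:

```python
# Sketch: fit a linear SVM on a small, linearly separable toy dataset and
# inspect the support vectors that fully determine the decision boundary.
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated 2-D blobs.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

svm_clf = SVC(kernel="linear", C=1000)  # a large C approximates a hard margin
svm_clf.fit(X, y)

print(svm_clf.support_vectors_)  # only the instances on the edge of the street
```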
Sensitivity to Scales
SVMs are sensitive to feature scales, as you can see in the figure below: on the left plot,
the vertical scale is much larger than the horizontal scale, so the widest possible street is
close to horizontal.
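One common remedy (a sketch only; the slides do not prescribe it, and X_train / y_train below are assumed to exist) is to standardise the features before fitting, for example with Scikit-Learn's StandardScaler in a Pipeline:

```python
# Sketch: scale features to zero mean and unit variance before fitting a
# linear SVM, so that no feature dominates the margin because of its units.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

svm_clf = Pipeline([
    ("scaler", StandardScaler()),      # per-feature standardisation
    ("linear_svc", LinearSVC(C=1.0)),  # soft margin linear SVM
])
# svm_clf.fit(X_train, y_train)        # X_train / y_train are assumed
```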
Hard Margin Classification
If we strictly impose that all instances be off the street and on the correct side, this is called hard margin
classification. There are two main issues with hard margin classification.
First, it only works if the data is linearly separable, and second it is quite sensitive to outliers.
The figure below shows the iris dataset with just one additional outlier: on the left, it is impossible to find a hard
margin, and on the right the decision boundary ends up very different from the one we saw in the earlier figure
without the outlier, and it will probably not generalize as well.
Soft Margin Classification
To avoid these issues it is preferable to use a more flexible
model. The objective is to find a good balance between
keeping the street as large as possible and limiting the margin
violations (i.e., instances that end up in the middle of the street
or even on the wrong side).
This is called soft margin classification.
The ‘C’ Hyperparameter
• In Scikit-Learn’s SVM classes, you can control this balance using the ‘C’ hyperparameter: a smaller ‘C’ value
leads to a wider street but more margin violations (see the code sketch after this list).
• The figure shows the decision boundaries and margins of two soft margin SVM classifiers on a nonlinearly
separable dataset.
• On the left, using a low C value, the margin is quite large, but many instances end up on the street.
• On the right, using a high C value, the classifier makes fewer margin violations but ends up with a smaller
margin.
• However, it seems likely that the first classifier will generalize better: in fact even on this training set it makes
fewer prediction errors, since most of the margin violations are actually on the correct side of the decision
boundary.
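A minimal sketch of this comparison (the exact C values and the training data are assumptions, not from the slides):

```python
# Sketch: two soft margin linear SVMs that differ only in C.
# A small C widens the street but tolerates more margin violations;
# a large C narrows the street but violates the margin less often.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

wide_street_clf = make_pipeline(StandardScaler(), LinearSVC(C=0.01))
narrow_street_clf = make_pipeline(StandardScaler(), LinearSVC(C=100))

# wide_street_clf.fit(X_train, y_train)    # X_train / y_train are assumed
# narrow_street_clf.fit(X_train, y_train)
```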
The Hinge Loss
• The x-axis represents the distance from the boundary of any
single instance, and the y-axis represents the amount of loss
or penalty.
• The dotted line on the x-axis marks a distance of 1. When an
instance’s distance from the boundary is greater than or equal
to 1, the loss is 0.
• If the distance from the boundary is 0 (meaning that the
instance is literally on the boundary), then the loss size is 1.
• We see that correctly classified points incur a small (or zero)
loss, while incorrectly classified instances incur a high loss.
• A negative distance from the boundary incurs a high hinge
loss. This essentially means that we are on the wrong side of
the boundary, and that the instance will be classified
incorrectly.
• On the other hand, a positive distance from the boundary
incurs a low hinge loss, or no hinge loss at all, and the further
we are away from the boundary (and on the right side of it),
the lower our hinge loss will be.
The hinge loss is a loss function used for training classifiers
such as the SVM: for a true label y in {-1, +1} and a raw model
output ŷ, it equals max(0, 1 - y·ŷ).
The Hinge Loss
Let's look at an example numerically. Hinge loss treats the two classes as
labelled +1 and -1, with -1 instances lying on the left side of the boundary and
+1 instances on the right; a short code sketch after the list reproduces the
values below.
[0]: the actual value of this instance is +1 and the predicted value is 0.97, so the
hinge loss is very small as the instance is very far away from the boundary.
[1]: the actual value of this instance is +1 and the predicted value is 1.2, which is
greater than 1, thus resulting in no hinge loss
[2]: the actual value of this instance is +1 and the predicted value is 0, which
means that the point is on the boundary, thus incurring a cost of 1.
[3]: the actual value of this instance is +1 and the predicted value is -0.25,
meaning the point is on the wrong side of the boundary, thus incurring a large
hinge loss of 1.25
[4]: the actual value of this instance is -1 and the predicted value is -0.88, which is
a correct classification, but the point is slightly penalised (a hinge loss of 0.12)
because it lies just inside the margin
[5]: the actual value of this instance is -1 and the predicted value is -1.01; this is a
correct classification with the point outside the margin, resulting in a loss of 0
[6]: the actual value of this instance is -1 and the predicted value is 0, which
means that the point is on the boundary, thus incurring a cost of 1.
[7]: the actual value of this instance is -1 and the predicted value is 0.40, meaning
the point is on the wrong side of the boundary, thus incurring a large hinge loss of
1.40
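These eight values follow directly from the hinge formula max(0, 1 - y·ŷ); a small NumPy sketch (the arrays simply restate the examples above) reproduces them:

```python
# Sketch: reproduce the eight hinge-loss values above using
# hinge(y, y_hat) = max(0, 1 - y * y_hat).
import numpy as np

y_true = np.array([+1, +1, +1, +1, -1, -1, -1, -1])
y_pred = np.array([0.97, 1.2, 0.0, -0.25, -0.88, -1.01, 0.0, 0.40])

hinge = np.maximum(0.0, 1.0 - y_true * y_pred)
print(hinge)  # -> approximately [0.03, 0, 1, 1.25, 0.12, 0, 1, 1.4]
```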
Non-Linear SVM
Although linear SVM classifiers are efficient and work surprisingly well in most cases, many datasets are not even
close to being linearly separable.
One approach to handling nonlinear datasets is to add more features, such as polynomial features. In some cases
this can result in a linearly separable dataset.
Consider the left plot in the figure below. It represents a simple dataset with just one feature x1. This dataset is not
linearly separable. However, if we add a second feature x2 = (x1)², the resulting dataset is perfectly linearly
separable.
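A minimal sketch of this idea (the 1-D dataset below is an assumption used only for illustration):

```python
# Sketch: a 1-D dataset that is not linearly separable becomes separable
# once the squared feature x2 = x1**2 is added.
import numpy as np
from sklearn.svm import LinearSVC

x1 = np.array([-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1, 1, 0, 0, 0, 0, 0, 1, 1])   # outer points vs. inner points

X = np.column_stack([x1, x1 ** 2])          # add the second feature x2 = x1^2

clf = LinearSVC(C=10).fit(X, y)
print(clf.score(X, y))                      # separable in 2-D: expect 1.0
```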
Polynomial Kernel Trick
Adding polynomial features is a solution that can be used to solve classification challenges involving
complex data.
However, a low-degree polynomial transformation cannot deal with very complex datasets, while a high
polynomial degree creates a huge number of features, making the model too slow.
Fortunately, when using SVMs we can apply a mathematical technique called the kernel trick. It
makes it possible to get the same result as if you added many polynomial features, even with very
high-degree polynomials, without actually having to add them as features. So there is no
combinatorial explosion of the number of features since you don’t actually add any features.
This trick is implemented by the SVC class in Scikit-Learn.
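A sketch of how this looks in Scikit-Learn (the degree, coef0 and C values are illustrative assumptions, and X_train / y_train are assumed to exist):

```python
# Sketch: an SVM with a 3rd-degree polynomial kernel. The kernel trick
# gives the effect of degree-3 polynomial features without materialising them.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

poly_kernel_svm = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=3, coef0=1, C=5),
)
# poly_kernel_svm.fit(X_train, y_train)
```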
SVM in Practice
Python Demo
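The demo itself is not part of these slides; the sketch below shows the kind of end-to-end workflow it might cover (dataset choice, split and hyperparameters are assumptions), using the iris data mentioned earlier:

```python
# Sketch: load the iris dataset, scale the features, fit a soft margin SVM,
# and evaluate it on a held-out test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```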
Hope you have liked this Video.
Please help us by providing your Ratings and Comments for this
Course!
Thank You!!
Manas Dasgupta
Happy Learning!!