12. Support Vector Machines –
Supervised Learning Algorithm
(like linear/logistic regression and neural networks)
The magenta lines are the logistic regression cost functions modified into piecewise straight lines (cost1(z) and cost0(z)) that closely follow the original cost curves.
The objective can then be written in this form. We can drop the 1/m term, since it does not change the minimizing Θ, which gives the SVM
Cost Function:
So the SVM hypothesis outputs 1 directly if z = Θᵀx >= 0 and 0 if z < 0 (unlike logistic regression, it does not output a probability).
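As a concrete sketch of this cost (the function names cost1/cost0 and the convention of leaving Θ₀ out of the regularization term are assumptions for illustration, not code from the slides), a minimal NumPy version:

```python
import numpy as np

def cost1(z):
    # piecewise-linear cost used when y = 1: zero once z >= 1
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # piecewise-linear cost used when y = 0: zero once z <= -1
    return np.maximum(0.0, 1.0 + z)

def svm_cost(theta, X, y, C):
    """SVM objective: C * sum(y*cost1(z) + (1-y)*cost0(z)) + 0.5 * sum(theta_j^2).

    X is assumed to already contain a leading column of ones, and theta[0]
    (the intercept) is left out of the regularization term.
    """
    z = X @ theta
    data_term = np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)
    return C * data_term + reg_term
```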
Note:
In the regularized cost function we write the objective as A + λB, where λ controls how strongly Θ is shrunk, but it can also be written as C·A + B, which has the same effect if C = 1/λ.
So where we would use a reasonably large value of λ, we can use an equivalently small value of C to get the same effect and obtain the same Θ.
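A quick numerical check of this equivalence, using a toy one-parameter objective A(Θ) = (Θ − 3)² and B(Θ) = Θ² invented for illustration: minimizing A + λB and C·A + B with C = 1/λ picks out the same Θ.

```python
import numpy as np

# Toy terms: A plays the role of the data-fit term, B the regularization term.
A = lambda theta: (theta - 3.0) ** 2
B = lambda theta: theta ** 2

lam = 2.0
C = 1.0 / lam

thetas = np.linspace(-5, 5, 100001)
theta_lambda = thetas[np.argmin(A(thetas) + lam * B(thetas))]
theta_C = thetas[np.argmin(C * A(thetas) + B(thetas))]

print(theta_lambda, theta_C)  # both ~1.0: the same minimizing theta
```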
LARGE MARGIN INTUITION:
Here we want Θᵀx >= 1 (not just >= 0) for positive examples, as a safety margin, and Θᵀx <= -1 (not just < 0) for negative examples.
Now, if we set C to a very large value, say 100,000:
The minimization will drive the term multiplied by C essentially to zero, i.e., it will satisfy Θᵀx⁽ⁱ⁾ >= 1 whenever y⁽ⁱ⁾ = 1 and Θᵀx⁽ⁱ⁾ <= -1 whenever y⁽ⁱ⁾ = 0.
So we are left minimizing the regularization term ½ Σ Θⱼ² subject to those constraints.
This brings out a very interesting decision boundary:
SVM DECISION BOUNDARY
The SVM gives the best decision boundary (the black one), the one that keeps the largest possible distance (margin) from both the positive and the negative examples.
THIS LARGE-MARGIN BEHAVIOUR OCCURS WHEN WE CHOOSE A VERY LARGE VALUE FOR “C”
However, if C is very large and the data contains outliers, the SVM will no longer act like a robust large margin classifier: it will shrink the margin to accommodate them.
With a few outliers, if C is very large we get the magenta line,
while if C is large but not too large we still get the black line.
If the data is not linearly separable, or there are more outliers, then choosing a value of C that is large but not too large lets the SVM still do the right thing, i.e., it still gives the black line.
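This behaviour can be reproduced with an off-the-shelf SVM implementation; below is a small scikit-learn sketch (the toy data, including the single outlier, is invented for illustration) that fits a linear-kernel SVM with a huge C and with a moderate C and prints the resulting boundary parameters.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two well-separated clusters plus one positive outlier
# sitting inside the negative cluster.
X = np.array([[1, 1], [1, 2], [2, 1], [2, 2],      # positives
              [5, 5], [5, 6], [6, 5], [6, 6],      # negatives
              [5.5, 5.5]])                          # positive outlier
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1])

for C in (100000.0, 1.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # For a linear kernel, coef_ and intercept_ define the boundary
    # theta_1*x_1 + theta_2*x_2 + theta_0 = 0; with the huge C the
    # boundary bends to accommodate the outlier, with C=1 it does not.
    print(f"C={C}: coef={clf.coef_[0]}, intercept={clf.intercept_[0]}")
```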
MATH BEHIND LARGE MARGIN CLASSIFICATION:
Vector Inner Product:
We need to express Θᵀx⁽ⁱ⁾ geometrically. For two vectors u and v, let p be the (signed) length of the projection of v onto u.
➔ p is signed: it can be +ve or -ve (it is negative when the angle between u and v is more than 90°).
So we get, in matrix representation:
uᵀv = p · ||u|| = u₁v₁ + u₂v₂
Since p is signed, if p < 0 then the inner product uᵀv is negative as well.
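A tiny NumPy check of that identity (the vectors u and v are arbitrary illustrative values):

```python
import numpy as np

u = np.array([2.0, 1.0])
v = np.array([1.0, 3.0])

# signed length of the projection of v onto u
p = (u @ v) / np.linalg.norm(u)

# the inner product equals p * ||u|| (and also u1*v1 + u2*v2)
print(u @ v)                      # 5.0
print(p * np.linalg.norm(u))      # 5.0
print(u[0]*v[0] + u[1]*v[1])      # 5.0
```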
Optimization objective of the SVM – the math behind it:
Why does the SVM give a large-margin classifier?
For simplicity we take Θ₀ = 0 and n = 2:
Θ₀ = 0 → so that the decision boundary passes through the origin.
n = 2 → only two features in the data.
To find Θᵀx⁽ⁱ⁾, apply the inner-product result above with u = Θ and v = x⁽ⁱ⁾:
Θᵀx⁽ⁱ⁾ = p⁽ⁱ⁾ · ||Θ||, where p⁽ⁱ⁾ is the signed projection of x⁽ⁱ⁾ onto Θ.
This gives us the equivalent problem: minimize ½ ||Θ||² subject to p⁽ⁱ⁾ · ||Θ|| >= 1 when y⁽ⁱ⁾ = 1, and p⁽ⁱ⁾ · ||Θ|| <= -1 when y⁽ⁱ⁾ = 0.
Let's first consider a small-margin decision boundary (not a very good choice, as we will see):
➢ The Θ vector is perpendicular to the decision boundary, because the boundary is the set of points where Θᵀx = 0 (and with Θ₀ = 0 it passes through the origin).
Now let's project the examples onto the Θ direction for this boundary:
Since the projections p⁽¹⁾ and p⁽²⁾ are small, for p⁽¹⁾ · ||Θ|| to be >= 1 (and p⁽²⁾ · ||Θ|| to be <= -1),
||Θ|| will have to be large.
But this contradicts our effort to minimize ½ ||Θ||², so this decision boundary is not the one chosen.
Now suppose the large-margin decision boundary is chosen:
Since p⁽¹⁾ is larger, ||Θ|| can be smaller, which fits our aim of minimizing Θ in the regularization part of the cost function.
➢ The margin equals the projection p of the nearest examples onto the Θ direction, so by favouring small ||Θ|| the optimization favours boundaries with large p, i.e., a large margin.
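A short numeric illustration of the trade-off, with made-up projection values: the constraint p⁽¹⁾ · ||Θ|| >= 1 forces ||Θ|| >= 1/p⁽¹⁾, so a small projection demands a large ||Θ||, while a large projection allows a small one.

```python
# Minimum ||theta|| needed to satisfy p * ||theta|| >= 1 for two cases
# (projection values chosen only for illustration).
for p in (0.1, 1.5):
    print(f"p = {p}: need ||theta|| >= {1.0 / p:.3f}")
# p = 0.1 -> ||theta|| >= 10.000  (small margin: large theta, large cost)
# p = 1.5 -> ||theta|| >= 0.667   (large margin: small theta, small cost)
```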
KERNELS:
Kernels let us build complex non-linear classifiers.
Usually we would define the features of our hypothesis as high-order polynomial terms of x. But that quickly becomes expensive, so instead:
We pick some landmarks l⁽¹⁾, l⁽²⁾, l⁽³⁾, … (points in the feature space) and define each new feature as the similarity between x and a landmark, e.g.
f₁ = similarity(x, l⁽¹⁾) = exp( −||x − l⁽¹⁾||² / (2σ²) )
➢ ||x − l⁽¹⁾|| is the Euclidean distance between the vector x and the landmark l⁽¹⁾ (the norm of their component-wise difference).
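A minimal NumPy sketch of that similarity feature (the name gaussian_kernel is just my label for the formula above):

```python
import numpy as np

def gaussian_kernel(x, l, sigma):
    """Similarity between example x and landmark l: exp(-||x - l||^2 / (2*sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(l, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

# f is ~1 when x is close to the landmark and ~0 when it is far away.
print(gaussian_kernel([1.0, 1.0], [1.0, 1.1], sigma=1.0))  # close to 1
print(gaussian_kernel([1.0, 1.0], [5.0, 5.0], sigma=1.0))  # close to 0
```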
What are kernels:
The similarity function above is called a kernel – here specifically the Gaussian kernel. Let's try different values of σ:
If we decrease σ, the feature f falls off more quickly as x moves away from the landmark, so f is close to 0 for more values of x.
If we increase σ, f falls off more slowly, so f stays noticeably above 0 for more values of x.
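Reusing the gaussian_kernel sketch above, here is the same feature evaluated at a fixed distance from the landmark for a small and a large σ (both values picked arbitrarily):

```python
# Reusing gaussian_kernel from the sketch above: the same point,
# evaluated with a small and a large sigma.
x, l = [2.0, 0.0], [0.0, 0.0]          # x is distance 2 from the landmark
for sigma in (0.5, 3.0):
    print(f"sigma = {sigma}: f = {gaussian_kernel(x, l, sigma):.4f}")
# sigma = 0.5 -> f ~ 0.0003  (small sigma: f drops to 0 quickly)
# sigma = 3.0 -> f ~ 0.80    (large sigma: f stays well above 0)
```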
Example:
The pink point is close to landmark 1, so f₁ ≈ 1 while f₂, f₃ ≈ 0; the hypothesis output Θ₀ + Θ₁f₁ + Θ₂f₂ + Θ₃f₃ is >= 0, hence the prediction is 1.
The blue point is far from all landmarks, so every fᵢ ≈ 0; the hypothesis output is < 0, hence the prediction is 0.
So this gives us a non-linear decision boundary:
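A numeric version of this example; the parameter values Θ = (−0.5, 1, 1, 0) and the feature values below are illustrative assumptions, not read off the slides:

```python
import numpy as np

theta = np.array([-0.5, 1.0, 1.0, 0.0])   # [theta0, theta1, theta2, theta3], assumed values

# Pink point: close to landmark 1, far from landmarks 2 and 3.
f_pink = np.array([1.0, 0.98, 0.02, 0.01])   # [1, f1, f2, f3]
# Blue point: far from all landmarks.
f_blue = np.array([1.0, 0.01, 0.02, 0.01])

for name, f in (("pink", f_pink), ("blue", f_blue)):
    score = theta @ f
    print(name, score, "-> predict", int(score >= 0))
# pink:  -0.5 + 0.98 + 0.02 + 0  =  0.50 -> predict 1
# blue:  -0.5 + 0.01 + 0.02 + 0  = -0.47 -> predict 0
```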
HOW TO CHOOSE LANDMARKS:
We place the landmarks at exactly the same points as our training examples: l⁽ⁱ⁾ = x⁽ⁱ⁾ for i = 1, …, m.
For each example x we then obtain a feature vector f = (f₀, f₁, …, f_m), with f₀ = 1 and fᵢ = similarity(x, l⁽ⁱ⁾).
For implementation purposes, a kernel-dependent scaling matrix M is used in the regularization term (ΘᵀMΘ instead of ΘᵀΘ) to keep the computation manageable for large training sets.
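A minimal sketch of this feature construction in NumPy (the helper name feature_vector and the σ value are my own choices, not from the slides):

```python
import numpy as np

def feature_vector(x, landmarks, sigma):
    """Map x to f = [1, sim(x, l1), ..., sim(x, lm)] using the Gaussian kernel."""
    x = np.asarray(x, dtype=float)
    diffs = np.asarray(landmarks, dtype=float) - x
    sims = np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * sigma ** 2))
    return np.concatenate(([1.0], sims))

# Landmarks are the training examples themselves.
X_train = np.array([[1.0, 2.0], [3.0, 1.0], [0.5, 0.5]])
f = feature_vector([1.0, 1.5], landmarks=X_train, sigma=1.0)
print(f)   # length m + 1 = 4: [1, f1, f2, f3]
```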
HOW TO CHOOSE PARAMETER “C”:
HOW TO CHOOSE σ²:
HOW TO USE SUPPORT VECTOR MACHINES:
Use an off-the-shelf SVM software package and specify the parameter C and the kernel: either no kernel (a linear kernel), OR:
➢ In case of the Gaussian kernel: also choose σ², and perform feature scaling first.
➔ Polynomial kernels are used only rarely, typically when x and the landmarks l are all strictly non-negative (so that xᵀl is never negative).
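In practice this means calling an existing SVM library rather than writing the optimizer yourself. A minimal scikit-learn sketch follows; the toy data is fabricated, and gamma = 1/(2σ²) is how this particular library parameterizes the Gaussian (RBF) kernel.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)   # non-linear toy labels

sigma = 1.0
# Gaussian (RBF) kernel: scale features first, then fit; gamma = 1 / (2*sigma^2).
gaussian_svm = make_pipeline(StandardScaler(),
                             SVC(kernel="rbf", C=1.0, gamma=1.0 / (2 * sigma ** 2)))
gaussian_svm.fit(X, y)

# Linear kernel (i.e. "no kernel") for comparison.
linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)

print(gaussian_svm.predict([[0.1, 0.1], [2.0, 2.0]]))   # typically [0 1] for this toy boundary
print(linear_svm.score(X, y), gaussian_svm.score(X, y))
```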
WHEN TO USE LOGISTIC REGRESSION vs SVM ?