Support Vector Machines
Varol Kayhan, PhD
Support Vector Machines
• Very popular
• Powerful for both classification and regression
• Can uncover both linear and nonlinear relationships!
• Resource intensive!
Linear SVM Classification
• Iris data: separate one class from the other
Linear SVM Classification
• Iris data: separate one class from the other
[Figure: the decision boundary stays as far away as possible from the instances; the goal is to fit the widest possible "street" between the classes, and the support vectors are the instances at the edges of the street]
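A minimal sketch of a linear SVM classifier in scikit-learn (the feature and class choices are illustrative, not from the slides):

from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris.data[:, 2:4]               # petal length, petal width
y = (iris.target == 2).astype(int)  # one class vs. the rest

# Scaling matters: the width of the "street" is sensitive to feature scale
svm_clf = make_pipeline(StandardScaler(), LinearSVC(C=1, random_state=42))
svm_clf.fit(X, y)
print(svm_clf.predict([[5.0, 1.7]]))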
Linear SVM Classification
• Hyperplanes: separate one class from another
• In a 2-D world, the hyperplane is a "line"
• In a 3-D world, the hyperplane is a "plane"
Image source: https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47
Hard Margin Classification
• Data must be linearly separable
• There must be NO outliers
• (i.e., the perfect world!)
Soft Margin Classification
• Allows margin violations
• Instances that end up "in the street" or "misclassified"
• C is referred to as the "regularization parameter"
• Controls the width of the margin
• Determines how much violation is allowed!
• Higher C values perform LESS regularization
• Leads to a SMALLER margin
• Aims for fewer violations
• Risks overfitting
• Lower C values perform MORE regularization
• Leads to a WIDER margin
• Allows more violations
• Favors generalizability (see the sketch below)
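A hedged sketch of the two regimes (the C values are illustrative):

from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X, y = iris.data[:, 2:4], (iris.target == 2).astype(int)

# Low C -> MORE regularization: wider margin, more violations allowed
wide_street_clf = make_pipeline(StandardScaler(), LinearSVC(C=0.01)).fit(X, y)
# High C -> LESS regularization: smaller margin, fewer violations, may overfit
narrow_street_clf = make_pipeline(StandardScaler(), LinearSVC(C=100)).fit(X, y)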
Nonlinear SVM Classification
• Not all data are linearly separable
• Solution 1:
• Create polynomial features and fit a linear SVM!
• Low polynomial degrees may fit the data well
• Problem: high polynomial degrees generate too many features (so more resources are needed!)
• (Example on the next slides; a code sketch follows)
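A sketch of Solution 1, using the make_moons toy data as a stand-in for any nonlinearly separable problem (the degree and C values are illustrative):

from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)
poly_svm_clf = make_pipeline(
    PolynomialFeatures(degree=3),  # explicitly adds polynomial features
    StandardScaler(),
    LinearSVC(C=10, max_iter=10_000, random_state=42),
)
poly_svm_clf.fit(X, y)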
Nonlinear SVM Classification
• Example
Nonlinear SVM Classification
• Example
Nonlinear SVM Classification
• Solution 2: use a "kernel trick"
• Acts as if many polynomial features were added (without actually adding them)
• SVC(kernel="poly", degree=3, coef0=1, C=5)
• coef0 controls how much the model is influenced by high-degree polynomials
• The number of features does not increase!
[Plots: left, d=3, coef0=1, C=5; right, d=10, coef0=100, C=5]
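The same idea as a runnable sketch (the scaler and toy data are assumptions; the hyperparameters mirror the slide):

from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)
# The kernel trick: behaves like a high-degree polynomial model,
# but no new features are ever materialized
poly_kernel_clf = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=3, coef0=1, C=5),
)
poly_kernel_clf.fit(X, y)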
Adding Similarity Features
• Another technique to tackle nonlinear classification
• Adds "features" (i.e., new variables/dimensions) that make the instances separable
• Computationally expensive
[Figure: a 1-D feature x1 mapped into similarity features x2 and x3]
Gaussian RBF Kernel
• The kernel trick to add similarity features
[Figure: the Gaussian Radial Basis Function, a bell-shaped curve centered on the landmarks]
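A minimal sketch of the similarity function itself, assuming the standard Gaussian RBF form phi(x, l) = exp(-gamma * ||x - l||^2):

import numpy as np

def gaussian_rbf(x, landmark, gamma):
    """Similarity of instance x to a landmark; gamma sets the bell width."""
    return np.exp(-gamma * np.sum((x - landmark) ** 2))

x = np.array([-1.0])
print(gaussian_rbf(x, landmark=np.array([1.0]), gamma=0.3))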
Gaussian RBF Kernel
• SVC(kernel="rbf", gamma=5, C=0.001)
• Higher gamma makes the bell shape narrower
• Smaller gamma makes the bell shape wider
• Increase gamma if the model is underfitting!
• Decrease gamma if overfitting!
Gaussian RBF Kernel
[Plots: increasing C minimizes violations; increasing gamma addresses underfitting]
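One practical way to pick gamma and C together is a grid search; a sketch (the parameter ranges and toy data are illustrative):

from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])
param_grid = {"svc__gamma": [0.1, 1, 5], "svc__C": [0.001, 1, 1000]}
search = GridSearchCV(pipe, param_grid, cv=3).fit(X, y)
print(search.best_params_)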
Computational Complexity
• LinearSVC is nearly identical to SVC(kernel="linear")
• LinearSVC:
• More efficient than SVC(kernel="linear")
• Doesn't support kernel tricks
• SVC(kernel="linear"):
• Supports kernel tricks (such as "poly" and "rbf")
• Slow to train on large data sets
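A rough timing sketch of this claim (results depend on your machine and data size; illustrative only):

import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
for clf in (LinearSVC(max_iter=10_000), SVC(kernel="linear")):
    start = time.perf_counter()
    clf.fit(X, y)
    print(type(clf).__name__, f"{time.perf_counter() - start:.2f}s")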
Multi-Class Classification
• SVM cannot perform multi-class classification (in its true sense)
• Instead, it performs one-versus-rest ("ovr"):
• Create one binary classification model per class
• Run each observation through all of these models
• Make the final determination from the combined results (see the sketch below)
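A sketch on the 3-class iris data. (Note: in scikit-learn, LinearSVC uses one-vs-rest by default, while SVC actually uses one-vs-one internally.)

from sklearn import datasets
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
clf = LinearSVC(max_iter=10_000, random_state=42).fit(iris.data, iris.target)

print(clf.decision_function(iris.data[:1]))  # one score per binary model
print(clf.predict(iris.data[:1]))            # class with the highest score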
SVM Regression
• SVM can be used to predict numerical values too
• Instead of fitting the widest possible street between classes, it tries to fit as many instances as possible ON the street
• The width of the street is controlled by epsilon
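A minimal regression sketch (the toy data and epsilon value are illustrative):

import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X.ravel() + rng.normal(scale=0.5, size=100)  # noisy linear target

svm_reg = LinearSVR(epsilon=0.5, max_iter=10_000)  # epsilon = street width
svm_reg.fit(X, y)
print(svm_reg.predict([[1.5]]))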
SVM Regression
• Use a kernelized SVM (e.g., a polynomial kernel) for nonlinear models (see the sketch below)
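A sketch with a polynomial kernel for a quadratic target (the toy data and hyperparameters are illustrative):

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1)) - 1
y = 0.2 + 0.1 * X.ravel() + 0.5 * X.ravel() ** 2 + rng.normal(scale=0.1, size=100)

svm_poly_reg = SVR(kernel="poly", degree=2, C=100, epsilon=0.1)
svm_poly_reg.fit(X, y)
print(svm_poly_reg.predict([[0.5]]))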
Python Cheatsheet
• C: regularization
• Small C: wide margin, allows more violations (i.e., generalizable)
• High C: small margin, allows fewer violations (i.e., risks overfitting)
• coef0: used for the poly kernel
• Controls how much the model is influenced by high-degree polynomials
• gamma: the shape of the bell for the Gaussian RBF
• Higher values make it narrower
• Smaller values make it wider
• tol: tolerance for the stopping criterion
• epsilon: width of the street in regression
