Support Vector Machines
Varol Kayhan, PhD
Support Vector Machines
• Very popular
• Powerful for both classification and regression
• Can uncover both linear and nonlinear relationships!
• Resource intensive!
Linear SVM Classification
• Iris data: separate one class from the other
Linear SVM Classification
• Iris data: separate one class from the other
[Figure: the decision boundary stays as far away as possible from the instances; the goal is to fit the widest possible "street" between the classes, and the support vectors are the instances at the edges of the street]
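A minimal sketch of a linear SVM classifier in scikit-learn (the feature and class choices are illustrative, not from the slides):

from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris.data[:, 2:4]               # petal length, petal width
y = (iris.target == 2).astype(int)  # one class vs. the rest

# Scaling matters: the width of the "street" is sensitive to feature scale
svm_clf = make_pipeline(StandardScaler(), LinearSVC(C=1, random_state=42))
svm_clf.fit(X, y)
print(svm_clf.predict([[5.0, 1.7]]))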
Linear SVM Classification
• Hyperplanes: separate one class from another
• In a 2-D world, the hyperplane is a "line"
• In a 3-D world, the hyperplane is a "plane"
Image source: https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47
Hard Margin Classification
• Data must be linearly separable
• There must be NO outliers
• (i.e., the perfect world!)
Soft Margin Classification
• Allows margin violations
• Instances that end up "in the street" or "misclassified"
• C is referred to as the "regularization parameter"
• Controls the width of the margin
• Determines how much violation is allowed!
• Higher C values perform LESS regularization
• Leads to a SMALLER margin
• Aims for fewer violations
• Risks overfitting
• Lower C values perform MORE regularization
• Leads to a WIDER margin
• Allows more violations
• Favors generalizability (see the sketch below)
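A hedged sketch of the two regimes (the C values are illustrative):

from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X, y = iris.data[:, 2:4], (iris.target == 2).astype(int)

# Low C -> MORE regularization: wider margin, more violations allowed
wide_street_clf = make_pipeline(StandardScaler(), LinearSVC(C=0.01)).fit(X, y)
# High C -> LESS regularization: smaller margin, fewer violations, may overfit
narrow_street_clf = make_pipeline(StandardScaler(), LinearSVC(C=100)).fit(X, y)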
Nonlinear SVM Classification
• Not all data are linearly separable
• Solution 1:
• Create polynomial features and fit a linear SVM!
• Low polynomial degrees may fit the data well
• Problem: high polynomial degrees generate too many features (so more resources are needed!)
• (Example on the next slides; a code sketch follows)
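A sketch of Solution 1, using the make_moons toy data as a stand-in for any nonlinearly separable problem (the degree and C values are illustrative):

from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)
poly_svm_clf = make_pipeline(
    PolynomialFeatures(degree=3),  # explicitly adds polynomial features
    StandardScaler(),
    LinearSVC(C=10, max_iter=10_000, random_state=42),
)
poly_svm_clf.fit(X, y)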
Nonlinear SVM Classification
• Example
Nonlinear SVM Classification
• Example
Nonlinear SVM Classification
• Solution 2: use a "kernel trick"
• Acts as if many polynomial features were added (without actually adding them)
• SVC(kernel="poly", degree=3, coef0=1, C=5)
• coef0 controls how much the model is influenced by high-degree polynomials
• The number of features does not increase!
[Plots: left, d=3, coef0=1, C=5; right, d=10, coef0=100, C=5]
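The same idea as a runnable sketch (the scaler and toy data are assumptions; the hyperparameters mirror the slide):

from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)
# The kernel trick: behaves like a high-degree polynomial model,
# but no new features are ever materialized
poly_kernel_clf = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=3, coef0=1, C=5),
)
poly_kernel_clf.fit(X, y)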
Adding Similarity Features
• Another technique to tackle nonlinear classification
• Adds "features" (i.e., new variables/dimensions) that make the instances separable
• Computationally expensive
[Figure: a 1-D feature x1 mapped into similarity features x2 and x3]
Gaussian RBF Kernel
• The kernel trick to add similarity features
[Figure: the Gaussian Radial Basis Function, a bell-shaped curve centered on the landmarks]
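A minimal sketch of the similarity function itself, assuming the standard Gaussian RBF form phi(x, l) = exp(-gamma * ||x - l||^2):

import numpy as np

def gaussian_rbf(x, landmark, gamma):
    """Similarity of instance x to a landmark; gamma sets the bell width."""
    return np.exp(-gamma * np.sum((x - landmark) ** 2))

x = np.array([-1.0])
print(gaussian_rbf(x, landmark=np.array([1.0]), gamma=0.3))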
Gaussian RBF Kernel
• SVC(kernel="rbf", gamma=5, C=0.001)
• Higher gamma makes the bell shape narrower
• Smaller gamma makes the bell shape wider
• Increase gamma if the model is underfitting!
• Decrease gamma if overfitting!
Gaussian RBF Kernel
[Plots: increasing C minimizes violations; increasing gamma addresses underfitting]
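One practical way to pick gamma and C together is a grid search; a sketch (the parameter ranges and toy data are illustrative):

from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])
param_grid = {"svc__gamma": [0.1, 1, 5], "svc__C": [0.001, 1, 1000]}
search = GridSearchCV(pipe, param_grid, cv=3).fit(X, y)
print(search.best_params_)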
Computational Complexity
• LinearSVC is nearly identical to SVC(kernel="linear")
• LinearSVC:
• More efficient than SVC(kernel="linear")
• Doesn't support kernel tricks
• SVC(kernel="linear"):
• Supports kernel tricks (such as "poly" and "rbf")
• Slow to train on large data sets
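A rough timing sketch of this claim (results depend on your machine and data size; illustrative only):

import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
for clf in (LinearSVC(max_iter=10_000), SVC(kernel="linear")):
    start = time.perf_counter()
    clf.fit(X, y)
    print(type(clf).__name__, f"{time.perf_counter() - start:.2f}s")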
Multi-Class Classification
• SVM cannot perform multi-class classification (in its true sense)
• Instead, it performs one-versus-rest ("ovr"):
• Create one binary classification model per class
• Run each observation through all of these models
• Make the final determination from the combined results (see the sketch below)
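A sketch on the 3-class iris data. (Note: in scikit-learn, LinearSVC uses one-vs-rest by default, while SVC actually uses one-vs-one internally.)

from sklearn import datasets
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
clf = LinearSVC(max_iter=10_000, random_state=42).fit(iris.data, iris.target)

print(clf.decision_function(iris.data[:1]))  # one score per binary model
print(clf.predict(iris.data[:1]))            # class with the highest score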
SVM Regression
• SVM can be used to predict numerical values too
• Instead of fitting the widest possible street between classes, it tries to fit as many instances as possible ON the street
• The width of the street is controlled by epsilon
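A minimal regression sketch (the toy data and epsilon value are illustrative):

import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X.ravel() + rng.normal(scale=0.5, size=100)  # noisy linear target

svm_reg = LinearSVR(epsilon=0.5, max_iter=10_000)  # epsilon = street width
svm_reg.fit(X, y)
print(svm_reg.predict([[1.5]]))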
SVM Regression
• Use a kernelized SVM (e.g., a polynomial kernel) for nonlinear models (see the sketch below)
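A sketch with a polynomial kernel for a quadratic target (the toy data and hyperparameters are illustrative):

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1)) - 1
y = 0.2 + 0.1 * X.ravel() + 0.5 * X.ravel() ** 2 + rng.normal(scale=0.1, size=100)

svm_poly_reg = SVR(kernel="poly", degree=2, C=100, epsilon=0.1)
svm_poly_reg.fit(X, y)
print(svm_poly_reg.predict([[0.5]]))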
Python Cheatsheet
• C: regularization
• Small C: wide margin, allows more violations (i.e., generalizable)
• High C: small margin, allows fewer violations (i.e., risks overfitting)
• coef0: used for the poly kernel
• Controls how much the model is influenced by high-degree polynomials
• gamma: the shape of the bell for the Gaussian RBF
• Higher values make it narrower
• Smaller values make it wider
• tol: tolerance for the stopping criterion
• epsilon: width of the street in regression
