Support Vector Machine

Shao-Chuan Wang

1D Classification Problem: how would you separate these data (H1, H2, or H3)?
[Figure: data points on the x axis around 0, with candidate separators H1, H2, H3]

2D Classification Problem: which H is better?

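Max-Margin Classifier
Functional margin: γ̂(i) = y(i)(wᵀx(i) + b). We feel more confident when the functional margin is larger.
Geometric margin: γ(i) = y(i)((w/‖w‖)ᵀx(i) + b/‖w‖).
Note that rescaling (w, b) by a positive constant does not change the plane, although it does change the functional margin.
Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).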
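
A minimal sketch of the two margins, assuming the definitions in the cited CS229 notes: functional margin y(wᵀx + b) and geometric margin equal to the functional margin divided by ‖w‖. The function names and the sample point are made up for illustration. It shows that rescaling (w, b) changes the functional margin but leaves the geometric margin untouched.

```python
import math

def functional_margin(w, b, x, y):
    """y * (w . x + b); grows if (w, b) is scaled up."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def geometric_margin(w, b, x, y):
    """Functional margin divided by ||w||; invariant to positive rescaling."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return functional_margin(w, b, x, y) / norm

# Illustrative (hypothetical) hyperplane and labeled point.
w, b = [3.0, 4.0], -5.0
x, y = [3.0, 4.0], 1

# Same plane, parameters scaled by 2.
w2, b2 = [2 * wi for wi in w], 2 * b

print(functional_margin(w, b, x, y), functional_margin(w2, b2, x, y))
print(geometric_margin(w, b, x, y), geometric_margin(w2, b2, x, y))
```

The first printed pair differs by the scaling factor, while the second pair is identical, which is why the geometric margin is the quantity worth maximizing.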

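Maximize margins
Optimization problem: maximize the minimal geometric margin over the training set, subject to constraints.
Introduce a scaling constraint such that the functional margin is 1; maximizing 1/‖w‖ is then equivalent to minimizing ½‖w‖².
Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).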
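
For reference, the max-margin problem and its rescaled form can be written as follows. This follows the formulation in the cited CS229 notes, with m training examples (x^(i), y^(i)); the notation is an assumption about what the slide displayed.

```latex
\begin{aligned}
\max_{\gamma, w, b}\quad & \gamma \\
\text{s.t.}\quad & y^{(i)}\left(w^{T}x^{(i)} + b\right) \ge \gamma,\quad i = 1,\dots,m, \\
& \lVert w \rVert = 1
\end{aligned}
\qquad\Longrightarrow\qquad
\begin{aligned}
\min_{w, b}\quad & \tfrac{1}{2}\lVert w \rVert^{2} \\
\text{s.t.}\quad & y^{(i)}\left(w^{T}x^{(i)} + b\right) \ge 1,\quad i = 1,\dots,m
\end{aligned}
```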

Optimization problem subject to constraints
Maximize f(x, y), subject to the constraint g(x, y) = c
-> Lagrange multiplier method
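
A small worked example of the Lagrange multiplier method may help. The functions f(x, y) = xy and g(x, y) = x + y with c = 2 are chosen here purely for illustration and do not come from the slides.

```latex
\begin{aligned}
&\text{maximize } f(x, y) = xy \quad \text{subject to } g(x, y) = x + y = 2 \\
&\Lambda(x, y, \lambda) = xy - \lambda\,(x + y - 2) \\
&\frac{\partial \Lambda}{\partial x} = y - \lambda = 0,\quad
 \frac{\partial \Lambda}{\partial y} = x - \lambda = 0,\quad
 \frac{\partial \Lambda}{\partial \lambda} = -(x + y - 2) = 0 \\
&\Rightarrow\; x = y = \lambda = 1,\qquad f(1, 1) = 1
\end{aligned}
```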

Lagrange duality
Primal optimization problem
Generalized Lagrangian method
Primal optimization problem (equivalent form)
Dual optimization problem
Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
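
The four items on this slide can be written out as below. This is a sketch following the cited CS229 notes, with inequality constraints g_i, equality constraints h_i, and multipliers α_i, β_i; the exact notation on the slide is assumed.

```latex
\begin{aligned}
&\text{Primal:}\quad \min_{w} f(w)\quad \text{s.t. } g_i(w) \le 0,\; i = 1,\dots,k;\quad h_i(w) = 0,\; i = 1,\dots,l \\
&\text{Generalized Lagrangian:}\quad
 \mathcal{L}(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w) \\
&\text{Primal (equivalent form):}\quad
 p^{*} = \min_{w} \theta_{\mathcal{P}}(w) = \min_{w} \max_{\alpha, \beta :\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta) \\
&\text{Dual:}\quad
 d^{*} = \max_{\alpha, \beta :\, \alpha_i \ge 0} \theta_{\mathcal{D}}(\alpha, \beta)
       = \max_{\alpha, \beta :\, \alpha_i \ge 0} \min_{w} \mathcal{L}(w, \alpha, \beta) \\
&\text{Weak duality:}\quad d^{*} \le p^{*}
\end{aligned}
```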

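Dual Problem
Conditions under which equality (d* = p*) holds: f and the gi are convex, the hi are affine, and the constraints gi are strictly feasible.
The optimal solution then satisfies the KKT conditions.
Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).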
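
For reference, the KKT conditions at an optimum (w*, α*, β*) can be listed as below, using the generalized Lagrangian 𝓛(w, α, β) of the cited CS229 notes; the index ranges n, k, l are assumed notation.

```latex
\begin{aligned}
\frac{\partial}{\partial w_i}\mathcal{L}(w^{*}, \alpha^{*}, \beta^{*}) &= 0, & i &= 1,\dots,n \\
\frac{\partial}{\partial \beta_i}\mathcal{L}(w^{*}, \alpha^{*}, \beta^{*}) &= 0, & i &= 1,\dots,l \\
\alpha_i^{*}\, g_i(w^{*}) &= 0, & i &= 1,\dots,k \\
g_i(w^{*}) &\le 0, & i &= 1,\dots,k \\
\alpha_i^{*} &\ge 0, & i &= 1,\dots,k
\end{aligned}
```

The third line (complementary slackness) is what makes only the points on the margin, the support vectors, carry nonzero α_i in the SVM.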

Optimal margin classifiers
Its Lagrangian
Its dual problem
Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
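
Written out, the Lagrangian and dual of the optimal margin classifier take the following form. This is a sketch following the cited CS229 notes, with the primal being min ½‖w‖² subject to y^(i)(wᵀx^(i) + b) ≥ 1.

```latex
\begin{aligned}
&\mathcal{L}(w, b, \alpha) = \tfrac{1}{2}\lVert w \rVert^{2}
  - \sum_{i=1}^{m} \alpha_i \left[ y^{(i)}\left(w^{T}x^{(i)} + b\right) - 1 \right] \\
&\nabla_{w}\mathcal{L} = 0 \;\Rightarrow\; w = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)},
 \qquad
 \frac{\partial \mathcal{L}}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{m} \alpha_i y^{(i)} = 0 \\
&\max_{\alpha}\; W(\alpha) = \sum_{i=1}^{m} \alpha_i
  - \tfrac{1}{2} \sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \left\langle x^{(i)}, x^{(j)} \right\rangle \\
&\text{s.t.}\quad \alpha_i \ge 0,\; i = 1,\dots,m;\qquad \sum_{i=1}^{m} \alpha_i y^{(i)} = 0
\end{aligned}
```

The dual depends on the data only through inner products ⟨x^(i), x^(j)⟩, which is what allows a kernel to be substituted later.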

Support Vector Machine (cont’d)
If the data are not linearly separable, we can find a nonlinear solution.
Technically, it is a linear solution in a higher-dimensional feature space.
Kernel Trick

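Kernel and feature mapping
Kernel: K(x, z) = φ(x)ᵀφ(z), where φ is the feature mapping. The kernel matrix must be symmetric and positive semi-definite.
For example: the polynomial kernel K(x, z) = (xᵀz)² discussed in the cited notes.
Loose intuition: “similarity” between features
Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).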
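
A short sketch of the kernel idea, using the polynomial kernel K(x, z) = (xᵀz)² as an assumed example (it appears in the cited CS229 notes; the slide's own example is not recoverable). The explicit feature map φ(x), which lists all products x_i·x_j, is written out only to check that the kernel gives the same value without ever building the high-dimensional vector.

```python
def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    """K(x, z) = (x . z)^2, computed in the original input space."""
    return dot(x, z) ** 2

def phi(x):
    """Explicit feature map for the kernel above: all products x_i * x_j."""
    return [xi * xj for xi in x for xj in x]

# Illustrative inputs (hypothetical values).
x = [1.0, 2.0, 3.0]
z = [4.0, 5.0, 6.0]

k_direct = poly_kernel(x, z)        # O(n) work
k_mapped = dot(phi(x), phi(z))      # O(n^2) work in feature space
print(k_direct, k_mapped)
```

Both numbers agree, and the kernel is symmetric in its arguments, consistent with the properties listed on the slide.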

Soft Margin (L1 regularization)
C = ∞ leads to hard margin SVM, Rychetsky (2001)
Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
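
For reference, the soft-margin problem with slack variables ξ_i can be sketched as below, following the cited CS229 notes. The penalty C·Σξ_i is the "L1" term in the slide title, and letting C grow without bound forces every ξ_i to 0, which recovers the hard-margin problem.

```latex
\begin{aligned}
\min_{w, b, \xi}\quad & \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{m} \xi_i \\
\text{s.t.}\quad & y^{(i)}\left(w^{T}x^{(i)} + b\right) \ge 1 - \xi_i,\quad i = 1,\dots,m, \\
& \xi_i \ge 0,\quad i = 1,\dots,m
\end{aligned}
\qquad
\text{(dual: same } W(\alpha) \text{ as the hard-margin case, with } 0 \le \alpha_i \le C\text{)}
```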

Why doesn’t my model fit well on test data?

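Bias/variance tradeoff
Underfitting (high bias) vs. overfitting (high variance)
Training error (in-sample error): ε̂(h) = (1/m) Σ 1{h(x(i)) ≠ y(i)}
Generalization error (out-of-sample error): ε(h) = P(h(x) ≠ y) for (x, y) drawn from the data distribution D
Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).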

Bias/variance tradeoff
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer series in statistics. Springer, New York, 2001.

Is training error a good estimator of generalization error?

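Chernoff bound (|H| = finite)
Lemma: assume Z1, Z2, …, Zm are drawn iid from Bernoulli(φ), let φ̂ = (1/m) Σ Zi be their sample mean, and let γ > 0 be fixed. Then P(|φ − φ̂| > γ) ≤ 2 exp(−2γ²m).
Based on this lemma, one can show that, with probability at least 1-δ (k = # of hypotheses), ε(ĥ) ≤ min_h ε(h) + 2 sqrt((1/(2m)) log(2k/δ)).
Andrew Ng. Part VI Learning Theory. CS229 Lecture Notes (2008).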
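
To get a feel for the numbers, here is a rough sketch that evaluates the finite-|H| bound. It assumes the uniform-convergence form from the cited CS229 notes: with probability at least 1−δ, training and generalization error differ by at most γ = sqrt(log(2k/δ) / (2m)) for all k hypotheses. The function names are made up for illustration.

```python
import math

def hoeffding_gamma(m, k, delta):
    """Gap gamma such that |training error - generalization error| <= gamma
    holds for all k hypotheses with probability >= 1 - delta, given m samples."""
    return math.sqrt(math.log(2 * k / delta) / (2 * m))

def samples_needed(gamma, k, delta):
    """Smallest integer m for which the same guarantee holds with gap gamma."""
    return math.ceil(math.log(2 * k / delta) / (2 * gamma ** 2))

# Example values (hypothetical): 100 hypotheses, 95% confidence.
m_required = samples_needed(gamma=0.1, k=100, delta=0.05)
print(m_required)
print(hoeffding_gamma(m=m_required, k=100, delta=0.05))
```

Note that the required sample size grows only logarithmically in the number of hypotheses k.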

Chernoff bound (|H| = infinite)
VC dimension d: the size of the largest set that H can shatter.
e.g. H = linear classifiers in 2-D, VC(H) = 3
With probability at least 1-δ, ε(ĥ) ≤ ε(h*) + O(sqrt((d/m) log(m/d) + (1/m) log(1/δ))), where h* is the best hypothesis in H.
Andrew Ng. Part VI Learning Theory. CS229 Lecture Notes (2008).

Model Selection
Cross Validation: an estimator of generalization error

K-fold: split the training data into k pieces, train on k-1 pieces, and test on the remaining one (this gives one test error estimate). Repeat for each piece and average the k test error estimates, say, 2%. Then 2% is the estimate of the generalization error for this learner.
Leave-one-out cross validation (m-fold, m = training sample size)
[Figure: data split into folds labeled train, train, validate, train, train, train]
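
The k-fold procedure described above can be sketched in a few lines. This version only produces index splits and averages whatever error a caller-supplied function reports; `fit_and_score` is a hypothetical callback standing in for training and testing a real learner, and no shuffling is done.

```python
def k_fold_indices(m, k):
    """Yield (train_idx, val_idx) for k contiguous folds over m samples."""
    sizes = [m // k + (1 if i < m % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, m))
        yield train_idx, val_idx
        start += size

def cross_val_error(m, k, fit_and_score):
    """Average the k validation errors returned by fit_and_score(train, val)."""
    errors = [fit_and_score(train, val) for train, val in k_fold_indices(m, k)]
    return sum(errors) / len(errors)

# Leave-one-out is the special case k = m.
folds = list(k_fold_indices(10, 3))
loo_folds = list(k_fold_indices(5, 5))
print([len(val) for _, val in folds])
```

In practice the data would usually be shuffled before splitting so that each fold is representative.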

Model Selection
Loop over the possible parameters:
Pick one set of parameters, e.g. C = 2.0
Do cross validation to get an error estimate
Pick C_best (the one with the minimal error estimate) as the parameter
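
The selection loop can be sketched as follows. `cv_error` is a hypothetical callback that would run cross validation for one parameter value; here it is replaced by a lookup table of made-up error values so the example runs on its own.

```python
def select_best_parameter(candidates, cv_error):
    """Return (best_param, best_error) with the smallest cross-validation error."""
    best_param, best_err = None, float("inf")
    for param in candidates:
        err = cv_error(param)      # one full cross-validation run per candidate
        if err < best_err:
            best_param, best_err = param, err
    return best_param, best_err

# Placeholder errors, invented purely for illustration (not measured results).
placeholder_errors = {0.5: 0.08, 1.0: 0.05, 2.0: 0.02, 4.0: 0.04}

c_best, err_best = select_best_parameter(placeholder_errors.keys(),
                                         placeholder_errors.get)
print(c_best, err_best)
```

Candidate values for C are often tried on a logarithmic grid, since the useful range can span several orders of magnitude.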

Multiclass SVM
One against one
There are k(k-1)/2 binary SVMs (1v2, 1v3, …).
To predict, each SVM votes between its 2 classes.
One against all
There are k binary SVMs (1 v rest, 2 v rest, …).
To predict, evaluate the decision value of each SVM and pick the largest.
Multiclass SVM by solving ONE optimization problem
[Figure: one-against-one poll over classes K = 1 to 6 with vote counts 1, 3, 5, 3, 2, 1; the poll picks K = 3]
Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.
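
The two prediction rules can be sketched without any SVM library. `pairwise_winner` and `decision_value` are hypothetical callbacks standing in for trained binary SVMs; the toy versions below simply favor the class label closest to a scalar input, which is an assumption made only so the example runs.

```python
from collections import Counter
from itertools import combinations

def num_pairwise_classifiers(k):
    """One-against-one needs one binary SVM per unordered pair of classes."""
    return k * (k - 1) // 2

def predict_one_vs_one(classes, pairwise_winner, x):
    """Each pairwise classifier casts one vote; the most-voted class wins."""
    votes = Counter(pairwise_winner(a, b, x) for a, b in combinations(classes, 2))
    return max(classes, key=lambda c: votes[c]), votes

def predict_one_vs_all(classes, decision_value, x):
    """Evaluate every 'class c vs rest' score and pick the largest."""
    return max(classes, key=lambda c: decision_value(c, x))

# Toy stand-ins for trained classifiers (hypothetical).
def toy_pairwise_winner(a, b, x):
    return a if abs(a - x) <= abs(b - x) else b

def toy_decision_value(c, x):
    return -abs(c - x)

classes = [1, 2, 3, 4, 5, 6]
ovo_label, ovo_votes = predict_one_vs_one(classes, toy_pairwise_winner, 3.2)
ova_label = predict_one_vs_all(classes, toy_decision_value, 3.2)
print(num_pairwise_classifiers(len(classes)), ovo_label, ova_label)
```

With 6 classes there are 15 pairwise votes in total, so a class can collect at most 5 of them.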

Multiclass SVM (2/2)
DAGSVM (Directed Acyclic Graph SVM)

An Example: image classification
Process
[Figure: K = 6 classes; data split into 1/4 and 3/4; feature files with lines such as "1 0:49 1:25 …" and "2 0:49 1:25 …"; test data; accuracy]
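
The data lines in the figure look like a sparse "label index:value" text format of the kind used by LIBSVM-style tools; that reading is an assumption, not something the slides state. Under that assumption, a line can be parsed and an accuracy computed as sketched below. The function names are made up, and only the visible part of a line is parsed since the rest is elided in the source.

```python
def parse_sparse_line(line):
    """Parse 'label index:value index:value ...' into (label, {index: value})."""
    parts = line.split()
    label = int(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index)] = float(value)
    return label, features

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Only the visible prefix of the slide's example line is used here.
label, features = parse_sparse_line("1 0:49 1:25")
print(label, features)
```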
