Support Vector Machine
Shao-Chuan Wang
Support Vector Machine: 1D classification problem. How would you separate these data (H1, H2, or H3)? [Figure: points on the x axis with candidate separators H1, H2, H3.]
Support Vector Machine: 2D classification problem. Which H is better?
Max-Margin Classifier: functional margin and geometric margin. We feel more confident when the functional margin is larger. Note that rescaling w and b does not change the separating hyperplane. Source: Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
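The two margins can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual definitions from the cited notes (functional margin y(w·x + b), geometric margin equal to the functional margin divided by ||w||); the toy values of w, b, x and y are hypothetical.

```python
import math

def functional_margin(w, b, x, y):
    """y * (w . x + b); it grows when (w, b) is scaled up."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def geometric_margin(w, b, x, y):
    """Functional margin divided by ||w||; unchanged when (w, b) is rescaled."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return functional_margin(w, b, x, y) / norm

# Hypothetical toy values, chosen only for illustration.
w, b = [3.0, 4.0], -5.0
x, y = [3.0, 4.0], 1

fm = functional_margin(w, b, x, y)
gm = geometric_margin(w, b, x, y)

# Doubling (w, b) describes the same hyperplane.
w2, b2 = [2.0 * wi for wi in w], 2.0 * b
fm_scaled = functional_margin(w2, b2, x, y)
gm_scaled = geometric_margin(w2, b2, x, y)
```

Doubling (w, b) doubles the functional margin but leaves the geometric margin unchanged, which is why the geometric margin is the quantity worth maximizing.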
Maximize margins: the optimization problem is to maximize the minimal geometric margin under constraints. Introduce a scaling factor such that the functional margin equals 1. Source: Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
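The slide's equations did not survive in this transcript. As a hedged reconstruction following the cited CS229 notes (the notation x^{(i)}, y^{(i)}, m is assumed from those notes), the max-margin problem and its rescaled form can be written as:

```latex
% Maximize the minimal geometric margin \gamma over m training examples.
\begin{aligned}
\max_{\gamma, w, b}\quad & \gamma \\
\text{s.t.}\quad & y^{(i)}\,(w^{T}x^{(i)} + b) \ge \gamma,\quad i = 1,\dots,m, \\
& \lVert w \rVert = 1 .
\end{aligned}

% Fixing the functional margin to 1 gives an equivalent convex problem.
\begin{aligned}
\min_{w, b}\quad & \tfrac{1}{2}\lVert w \rVert^{2} \\
\text{s.t.}\quad & y^{(i)}\,(w^{T}x^{(i)} + b) \ge 1,\quad i = 1,\dots,m .
\end{aligned}
```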
Optimization problem subject to constraints: maximize f(x, y) subject to the constraint g(x, y) = c -> Lagrange multiplier method.
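A small worked example may help here. The objective f(x, y) = xy and constraint x + y = c below are hypothetical choices for illustration, not taken from the slides:

```latex
% Lagrangian for: maximize f(x, y) = xy subject to g(x, y) = x + y = c.
\Lambda(x, y, \lambda) = xy - \lambda\,(x + y - c)

% Stationarity conditions.
\frac{\partial \Lambda}{\partial x} = y - \lambda = 0, \qquad
\frac{\partial \Lambda}{\partial y} = x - \lambda = 0, \qquad
\frac{\partial \Lambda}{\partial \lambda} = -(x + y - c) = 0

% Hence x = y = \lambda = c/2 and the constrained maximum is f = c^{2}/4.
```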
Lagrange duality: the primal optimization problem is handled with the generalized Lagrangian method, which gives an equivalent form of the primal problem and a dual optimization problem. Source: Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
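The formulas for this slide are missing from the transcript. A sketch of what they most likely are, following the notation of the cited CS229 notes (the symbols f, g_i, h_i, alpha, beta, p*, d* are assumed from those notes):

```latex
\begin{aligned}
&\text{Primal:}\quad \min_{w} f(w)\quad \text{s.t. } g_i(w)\le 0,\ i=1,\dots,k;\quad h_i(w)=0,\ i=1,\dots,l \\
&\text{Generalized Lagrangian:}\quad \mathcal{L}(w,\alpha,\beta) = f(w) + \sum_{i=1}^{k}\alpha_i g_i(w) + \sum_{i=1}^{l}\beta_i h_i(w) \\
&\text{Primal (equivalent form):}\quad p^{*} = \min_{w}\ \max_{\alpha,\beta:\ \alpha_i\ge 0} \mathcal{L}(w,\alpha,\beta) \\
&\text{Dual:}\quad d^{*} = \max_{\alpha,\beta:\ \alpha_i\ge 0}\ \min_{w} \mathcal{L}(w,\alpha,\beta) \\
&\text{Weak duality:}\quad d^{*} \le p^{*}
\end{aligned}
```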
Dual Problem: conditions under which the dual optimum equals the primal optimum: f and the g_i are convex, and the h_i are affine. The solution then satisfies the KKT conditions. Source: Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
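For reference, the KKT conditions as stated in the cited CS229 notes can be sketched as follows, where L is the generalized Lagrangian L(w, alpha, beta) = f(w) + sum_i alpha_i g_i(w) + sum_i beta_i h_i(w). The notes additionally assume that the inequality constraints are strictly feasible; the starred symbols denote the optimal primal and dual variables.

```latex
\begin{aligned}
\frac{\partial}{\partial w_i}\,\mathcal{L}(w^{*},\alpha^{*},\beta^{*}) &= 0 \\
\frac{\partial}{\partial \beta_i}\,\mathcal{L}(w^{*},\alpha^{*},\beta^{*}) &= 0 \\
\alpha_i^{*}\, g_i(w^{*}) &= 0 \qquad \text{(complementary slackness)} \\
g_i(w^{*}) &\le 0 \\
\alpha_i^{*} &\ge 0
\end{aligned}
```

Complementary slackness is what makes only a few training points (the support vectors) carry nonzero multipliers.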
Optimal margin classifiers: the Lagrangian of the max-margin problem and its dual problem. Source: Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
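The two formulas on this slide are not in the transcript. Assuming the primal problem is min (1/2)||w||^2 subject to y^{(i)}(w^T x^{(i)} + b) >= 1, as in the cited notes, the Lagrangian and dual would read:

```latex
% Lagrangian (one multiplier \alpha_i \ge 0 per training example).
\mathcal{L}(w, b, \alpha) = \tfrac{1}{2}\lVert w \rVert^{2}
  - \sum_{i=1}^{m} \alpha_i \left[ y^{(i)}\,(w^{T}x^{(i)} + b) - 1 \right]

% Dual problem, obtained by minimizing over w and b.
\begin{aligned}
\max_{\alpha}\quad & W(\alpha) = \sum_{i=1}^{m}\alpha_i
  - \tfrac{1}{2}\sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \,\langle x^{(i)}, x^{(j)} \rangle \\
\text{s.t.}\quad & \alpha_i \ge 0,\quad i = 1,\dots,m, \\
& \sum_{i=1}^{m} \alpha_i\, y^{(i)} = 0 .
\end{aligned}
```

The data enter the dual only through inner products, which is what allows a kernel to be substituted later.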
Support Vector Machine (cont'd): if the data are not linearly separable, we can find a nonlinear solution. Technically, it is a linear solution in a higher-dimensional space (the kernel trick).
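Kernel and feature mapping: a kernel K(x, z) is the inner product of the feature mappings of x and z. A valid kernel matrix is symmetric and positive semi-definite. [Equation: example kernel.] Loose intuition: a kernel measures "similarity" between features. Source: Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).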
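The two kernel properties can be checked numerically on a small example. The sketch below uses the Gaussian (RBF) kernel, which is an assumption here since the slide's own example formula is not in the transcript; the points and the bandwidth sigma are hypothetical.

```python
import math
import random

def gaussian_kernel(x, z, sigma=1.0):
    """exp(-||x - z||^2 / (2 sigma^2)): close to 1 for similar points, near 0 for distant ones."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(points, kernel):
    """Kernel matrix K with K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]

def quadratic_form(K, v):
    """v^T K v, which must be >= 0 for every v if K is positive semi-definite."""
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))

# Hypothetical toy points.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
K = gram_matrix(points, gaussian_kernel)
n = len(points)

is_symmetric = all(abs(K[i][j] - K[j][i]) < 1e-12 for i in range(n) for j in range(n))

# Spot-check positive semi-definiteness on random vectors (a check, not a proof).
rng = random.Random(0)
min_quadratic = min(
    quadratic_form(K, [rng.uniform(-1.0, 1.0) for _ in range(n)])
    for _ in range(200)
)
```

The random spot-check only fails to find a counterexample; a full test would inspect the eigenvalues of K.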
Soft Margin (L1 regularization): C = ∞ leads to the hard-margin SVM, Rychetsky (2001). Source: Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
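The soft-margin formulation itself is missing from the transcript. A sketch following the cited CS229 notes, where the slack variables xi_i are penalized linearly (the "L1" in the slide title) and C is the trade-off parameter:

```latex
% Primal with slack variables \xi_i.
\begin{aligned}
\min_{w, b, \xi}\quad & \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{m} \xi_i \\
\text{s.t.}\quad & y^{(i)}\,(w^{T}x^{(i)} + b) \ge 1 - \xi_i,\quad i = 1,\dots,m, \\
& \xi_i \ge 0,\quad i = 1,\dots,m .
\end{aligned}

% Dual: same objective as the hard-margin dual, with a box constraint on \alpha.
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{m}\alpha_i
  - \tfrac{1}{2}\sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \,\langle x^{(i)}, x^{(j)} \rangle \\
\text{s.t.}\quad & 0 \le \alpha_i \le C,\quad i = 1,\dots,m, \\
& \sum_{i=1}^{m} \alpha_i\, y^{(i)} = 0 .
\end{aligned}
```

As C grows, any nonzero slack becomes arbitrarily expensive and the upper bound on alpha_i disappears, which recovers the hard-margin problem.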
Why doesn't my model fit well on test data?
Bias/variance tradeoff: underfitting (high bias) versus overfitting (high variance). Training error = in-sample error; generalization error = out-of-sample error. Source: Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
Bias/variance tradeoff. Source: T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer, New York, 2001.
Is training error a good estimator of generalization error?
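Chernoff bound (|H| = finite): Lemma: assume Z1, Z2, …, Zm are drawn iid from Bernoulli(φ), let φ̂ be their sample mean, and let γ > 0 be fixed. Then the probability that φ̂ differs from φ by more than γ decays exponentially in m. Based on this lemma, one can bound, with probability 1-δ, the generalization error of the hypothesis with the lowest training error (k = number of hypotheses). Source: Andrew Ng. Part VI Learning Theory. CS229 Lecture Notes (2008).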
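To make the bound concrete, here is a small numeric sketch. It assumes the forms given in the cited CS229 notes: the lemma P(|φ − φ̂| > γ) ≤ 2·exp(−2γ²m), and the resulting uniform gap γ = sqrt(log(2k/δ) / (2m)) between training and generalization error over all k hypotheses. The function names are hypothetical.

```python
import math

def hoeffding_tail(gamma, m):
    """Upper bound on P(|phi - phi_hat| > gamma) for m iid Bernoulli samples."""
    return 2.0 * math.exp(-2.0 * gamma ** 2 * m)

def uniform_gap(m, k, delta):
    """Gap gamma that holds for all k hypotheses at once with probability >= 1 - delta."""
    return math.sqrt(math.log(2.0 * k / delta) / (2.0 * m))

def samples_needed(gamma, k, delta):
    """Smallest m for which uniform_gap(m, k, delta) <= gamma."""
    return math.ceil(math.log(2.0 * k / delta) / (2.0 * gamma ** 2))

# Hypothetical setting: 1000 hypotheses, gap of 5%, confidence 95%.
m_required = samples_needed(gamma=0.05, k=1000, delta=0.05)
gap_at_m = uniform_gap(m_required, k=1000, delta=0.05)
```

The required sample size grows only logarithmically in the number of hypotheses k, but quadratically in 1/γ.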
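Chernoff bound (|H| = infinite): the VC dimension d is the size of the largest set that H can shatter, e.g. for H = linear classifiers in 2-D, VC(H) = 3. With probability at least 1-δ, the gap between training error and generalization error is bounded in terms of d and the sample size m. Source: Andrew Ng. Part VI Learning Theory. CS229 Lecture Notes (2008).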
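The bound on this slide is not in the transcript. As stated in the cited CS229 notes (where ε is the generalization error, ε̂ the training error, ĥ the hypothesis with the lowest training error, and h* the best hypothesis in H), it reads, with probability at least 1 − δ:

```latex
\left|\varepsilon(h) - \hat{\varepsilon}(h)\right|
  \le O\!\left(\sqrt{\frac{d}{m}\log\frac{m}{d} + \frac{1}{m}\log\frac{1}{\delta}}\right)
  \quad \text{for all } h \in H,

\varepsilon(\hat{h}) \le \varepsilon(h^{*})
  + O\!\left(\sqrt{\frac{d}{m}\log\frac{m}{d} + \frac{1}{m}\log\frac{1}{\delta}}\right).
```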
Model Selection: cross validation is an estimator of generalization error.
  • K-fold: split the training data into k pieces, train on k-1 pieces and test on the remaining one, which gives one test error estimate. Repeat for each piece and average the k test error estimates to get, say, 2%. Then 2% is the estimate of the generalization error for this learner.
  • Leave-one-out cross validation: m-fold, where m is the training sample size.
[Figure: folds labelled train, train, validate, train, train, train.]
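The k-fold procedure can be sketched in plain Python as below. The learner is a deliberately trivial majority-class predictor, and the names `fit_majority` and `error_rate` are hypothetical stand-ins for a real SVM training and evaluation routine.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validate_indices), one pair per fold."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validate = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validate
        start += size

def cross_validation_error(data, k, fit, error):
    """Average validation error over k folds."""
    errors = []
    for train_idx, val_idx in k_fold_indices(len(data), k):
        model = fit([data[i] for i in train_idx])
        errors.append(error(model, [data[i] for i in val_idx]))
    return sum(errors) / len(errors)

def fit_majority(samples):
    """Toy learner: always predict the most frequent training label."""
    labels = [y for _, y in samples]
    return max(set(labels), key=labels.count)

def error_rate(model, samples):
    """Fraction of samples whose label differs from the predicted one."""
    return sum(1 for _, y in samples if y != model) / len(samples)

# Hypothetical data: eight positives followed by two negatives.
# Real data should be shuffled before splitting into folds.
data = [(i, 1) for i in range(8)] + [(i, -1) for i in range(8, 10)]

five_fold_error = cross_validation_error(data, 5, fit_majority, error_rate)
leave_one_out_error = cross_validation_error(data, len(data), fit_majority, error_rate)
```

Setting k equal to the number of samples turns the same routine into leave-one-out cross validation.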
Model Selection: loop over the possible parameters. For each candidate, pick one set of parameters (e.g. C = 2.0) and do cross validation to get an error estimate. Finally, pick the best C (the one with the minimal error estimate) as the parameter.
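The selection loop can be sketched as follows. The function `toy_cv_error` is a hypothetical stand-in for "train an SVM with this C and return its cross-validation error"; its shape is invented purely so the example runs, and the grid of powers of two is an assumption.

```python
import math

def select_parameter(candidates, cv_error):
    """Return (best_candidate, best_error) with the minimal cross-validation error."""
    best_c, best_err = None, float("inf")
    for c in candidates:
        err = cv_error(c)  # one full cross-validation run per candidate
        if err < best_err:
            best_c, best_err = c, err
    return best_c, best_err

def toy_cv_error(c):
    """Hypothetical error curve with its minimum (2%) at C = 2.0."""
    return 0.02 + 0.01 * (math.log2(c) - 1.0) ** 2

# Candidate values of C on a logarithmic grid: 2^-3, ..., 2^5.
candidates = [2.0 ** p for p in range(-3, 6)]
best_c, best_err = select_parameter(candidates, toy_cv_error)
```

Because the chosen C was itself tuned on the validation folds, its reported error is slightly optimistic; a held-out test set gives a cleaner final estimate.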
Multiclass SVM:
  • One against one: there are k(k-1)/2 binary SVMs (1 v 2, 1 v 3, …). To predict, each SVM votes between its 2 classes. [Figure: voting (poll) example.]
  • One against all: there are k binary SVMs (1 v rest, 2 v rest, …). To predict, evaluate each SVM's decision function and pick the largest.
  • Multiclass SVM by solving ONE optimization problem: Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.
Multiclass SVM (2/2): DAGSVM (Directed Acyclic Graph SVM).
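The one-against-one and one-against-all prediction rules can be sketched as below. The binary classifiers are hypothetical nearest-center rules on 1-D inputs rather than trained SVMs, so only the voting and arg-max logic is meant to carry over.

```python
from collections import Counter
from itertools import combinations

def make_pairwise_classifier(a, b, centers):
    """Hypothetical binary classifier for classes a and b: pick the nearer center."""
    def classify(x):
        return a if abs(x - centers[a]) <= abs(x - centers[b]) else b
    return classify

def one_vs_one_predict(x, classifiers):
    """Each pairwise classifier casts one vote; the most-voted class wins.
    Ties go to the class that received its first vote earliest."""
    votes = Counter(clf(x) for clf in classifiers.values())
    return votes.most_common(1)[0][0]

def one_vs_all_predict(x, score_functions):
    """Evaluate one decision value per class and pick the largest."""
    return max(score_functions, key=lambda label: score_functions[label](x))

# Hypothetical class centers for three classes.
centers = {1: 0.0, 2: 5.0, 3: 10.0}
pairs = list(combinations(sorted(centers), 2))  # k(k-1)/2 pairs
classifiers = {(a, b): make_pairwise_classifier(a, b, centers) for a, b in pairs}
score_functions = {label: (lambda x, c=c: -abs(x - c)) for label, c in centers.items()}

ovo_label = one_vs_one_predict(4.2, classifiers)
ova_label = one_vs_all_predict(4.2, score_functions)
```

One against one trains more classifiers, each on a smaller subset of the data, while one against all trains only k classifiers, each on the full data set.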
An Example: image classification (process). [Figure: pipeline with K = 6; data split into 1/4 and 3/4; labelled feature rows such as "1 0:49 1:25 …" and "2 0:49 1:25 …"; test data; accuracy.]

An Example: image classification (results). Run the multi-class SVM 100 times for both kernels (linear and Gaussian). [Figure: accuracy histogram.]