Support Vector Machine
  (and Statistical Learning Theory)

           Tutorial
             Jason Weston
          NEC Labs America
  4 Independence Way, Princeton, USA.
         jasonw@nec-labs.com
1 Support Vector Machines: history
 • SVMs were introduced in COLT-92 by Boser, Guyon & Vapnik, and have
   become rather popular since.

 • Theoretically well motivated algorithm: developed from Statistical
   Learning Theory (Vapnik & Chervonenkis) since the 60s.

 • Empirically good performance: successful applications in many
   fields (bioinformatics, text, image recognition, . . . )
2 Support Vector Machines: history II
 • Centralized website: www.kernel-machines.org.

 • Several textbooks, e.g. "An Introduction to Support Vector
   Machines" by Cristianini and Shawe-Taylor.
 • A large and diverse community works on them: from machine
   learning, optimization, statistics, neural networks, functional
   analysis, etc.
3 Support Vector Machines: basics
[Boser, Guyon, Vapnik ’92],[Cortes & Vapnik ’95]


[Figure: a − class and a + class separated by a hyperplane, with the margin
marked on either side.]
Nice properties: convex, theoretically motivated, nonlinear with kernels.
4 Preliminaries:
 • Machine learning is about learning structure from data.

 • Although the class of algorithms called "SVMs" can do more, in this
   talk we focus on pattern recognition.

 • So we want to learn the mapping: X → Y, where x ∈ X is some
   object and y ∈ Y is a class label.
 • Let’s take the simplest case: 2-class classification. So: x ∈ Rn ,
   y ∈ {±1}.
5 Example:

Suppose we have 50 photographs of elephants and 50 photos of tigers.




[Images: elephant photos vs. tiger photos.]

We digitize them into 100 × 100 pixel images, so we have x ∈ Rⁿ where
n = 10,000.
Now, given a new (different) photograph we want to answer the question:
is it an elephant or a tiger? [we assume it is one or the other.]
6 Training sets and prediction models
 • input/output sets X , Y

 • training set (x1 , y1 ), . . . , (xm , ym )
 • "generalization": given a previously unseen x ∈ X, find a suitable
   y ∈ Y.

 • i.e., we want to learn a classifier: y = f (x, α), where α are the
   parameters of the function.
 • For example, if we are choosing our model from the set of
   hyperplanes in Rn , then we have:

                          f (x, {w, b}) = sign(w · x + b).
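As a concrete sketch (an illustration, not from the original slides), the hyperplane decision function can be written directly in NumPy; the parameters w and b below are made up:

```python
import numpy as np

def f(x, w, b):
    # the hyperplane decision function: f(x, {w, b}) = sign(w · x + b)
    return int(np.sign(np.dot(w, x) + b))

# hypothetical parameters, chosen only for illustration
w, b = np.array([1.0, -1.0]), 0.5
print(f(np.array([2.0, 0.0]), w, b))  # → 1
print(f(np.array([0.0, 2.0]), w, b))  # → -1
```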
7 Empirical Risk and the true Risk
 • We can try to learn f (x, α) by choosing a function that performs well
   on training data:
                Remp(α) = (1/m) Σ_{i=1}^m ℓ(f(xi, α), yi) = Training Error

    where ℓ is the zero-one loss function: ℓ(y, ŷ) = 1 if y ≠ ŷ, and 0
    otherwise. Remp is called the empirical risk.
 • By doing this we are trying to minimize the overall risk:

               R(α) = ∫ ℓ(f(x, α), y) dP(x, y) = Test Error

    where P(x, y) is the (unknown) joint distribution function of x and y.
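As an illustration (not from the slides), the empirical risk under the zero-one loss is just a misclassification rate:

```python
import numpy as np

def empirical_risk(predict, X, y):
    # Remp = (1/m) * sum_i l(predict(x_i), y_i) with the zero-one loss,
    # i.e. the fraction of training points the classifier gets wrong
    predictions = np.array([predict(x) for x in X])
    return float(np.mean(predictions != y))

# toy data and a hypothetical classifier thresholding the first feature
X = np.array([[1.0, 0.0], [-1.0, 0.0], [2.0, 3.0]])
y = np.array([1, -1, -1])
risk = empirical_risk(lambda x: 1 if x[0] > 0 else -1, X, y)
print(risk)  # one of three points is misclassified
```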
8 Choosing the set of functions
What if we allow f (x, α) to range over all functions from X to {±1}?
Training set (x1 , y1 ), . . . , (xm , ym ) ∈ X × {±1}
Test set x̄1, . . . , x̄m̄ ∈ X,
such that the two sets do not intersect.
For any f there exists f∗ such that:
 1. f∗(xi) = f(xi) for all i
 2. f∗(x̄j) ≠ f(x̄j) for all j
Based on the training data alone, there is no means of choosing which
function is better. On the test set however they give different results. So
generalization is not guaranteed.
=⇒ a restriction must be placed on the functions that we allow.
9 Empirical Risk and the true Risk
Vapnik & Chervonenkis showed that an upper bound on the true risk can
be given by the empirical risk + an additional term:


            R(α) ≤ Remp(α) + √( [ h(log(2m/h) + 1) − log(η/4) ] / m )

which holds with probability 1 − η, where h is the VC dimension of the set of
functions parameterized by α.
 • The VC dimension of a set of functions is a measure of their capacity
   or complexity.
 • If you can describe a lot of different phenomena with a set of
   functions then the value of h is large.
[VC dim = the maximum number of points that can be separated in all
possible ways by that set of functions.]
10 VC dimension:

The VC dimension of a set of functions is the maximum number of points
that can be separated in all possible ways by that set of functions. For
hyperplanes in Rn , the VC dimension can be shown to be n + 1.
[Figures: all 2³ = 8 labelings of three points in the plane, each separated by a
hyperplane; three points can be shattered by lines in R², matching a VC
dimension of n + 1 = 3.]
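To make the shattering claim concrete, the sketch below (an illustration not taken from the slides; the perceptron and the three chosen points are assumptions) checks that every one of the 2³ labelings of three non-collinear points in R² is linearly separable:

```python
import itertools
import numpy as np

# Three non-collinear points in R^2: hyperplanes (VC dim = n + 1 = 3)
# should realize every one of the 2^3 labelings.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

def separable(X, y, epochs=1000, lr=0.1):
    # simple perceptron with bias; returns True once it finds a
    # hyperplane sign(w·x + b) that reproduces the labeling y exactly
    w, b = np.zeros(2), 0.0
    for _ in range(epochs):
        done = True
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified: perceptron update
                w += lr * yi * xi
                b += lr * yi
                done = False
        if done:
            return True
    return False

shattered = all(separable(pts, np.array(lab))
                for lab in itertools.product([-1, 1], repeat=3))
print(shattered)  # → True
```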
11 VC dimension and capacity of functions

Simplification of bound:
  Test Error ≤ Training Error + Complexity of set of Models

 • Many bounds of this form have been proved (with different measures
   of capacity). The complexity function is often called a regularizer.

 • If you take a high-capacity set of functions (one that can explain a
   lot), you get low training error. But you might "overfit".

 • If you take a very simple set of models, you have low complexity, but
   won’t get low training error.
12 Capacity of a set of functions (classification)




[Figure: classifiers of increasing capacity fitted to the same data. Images
taken from a talk by B. Schoelkopf.]
13 Capacity of a set of functions (regression)
[Figure: a regression problem, showing a sine-curve fit and a hyperplane fit
to samples from the true function.]
14 Controlling the risk: model complexity

[Figure: the bound on the risk as the sum of the empirical risk (training
error) and the confidence interval, plotted against VC dimension; the bound
is minimized at an intermediate h∗ within the nested sequence of model sets
S1 ⊂ · · · ⊂ S∗ ⊂ · · · ⊂ Sn.]
15 Capacity of hyperplanes

Vapnik & Chervonenkis also showed the following:
Consider hyperplanes (w · x) = 0, where w is normalized w.r.t. a set of
points X∗ such that: min_i |w · xi| = 1.
The set of decision functions fw (x) = sign(w · x) defined on X ∗ such
that ||w|| ≤ A has a VC dimension satisfying

                               h ≤ R²A²,

where R is the radius of the smallest sphere around the origin containing
X ∗.
=⇒ minimizing ||w||² gives low capacity
=⇒ minimizing ||w||² is equivalent to obtaining a large margin classifier
[Figure: a separating hyperplane {x | <w, x> + b = 0}, with <w, x> + b > 0
on the + side and <w, x> + b < 0 on the − side.]
[Figure: the margin hyperplanes {x | <w, x> + b = +1} and
{x | <w, x> + b = −1}. Note: for x1 with yi = +1 and x2 with yi = −1 lying
on these hyperplanes,

                 <w, x1> + b = +1
                 <w, x2> + b = −1
            =⇒   <w, (x1 − x2)> = 2
            =⇒   <w/||w||, (x1 − x2)> = 2/||w||

so the margin width is 2/||w||.]
16 Linear Support Vector Machines (at last!)

So, we would like to find the function which minimizes an objective like:
  Training Error + Complexity term
We write that as:
                  (1/m) Σ_{i=1}^m ℓ(f(xi, α), yi) + Complexity term

For now we will choose the set of hyperplanes (we will extend this later),
so f (x) = (w · x) + b:
                  (1/m) Σ_{i=1}^m ℓ(w · xi + b, yi) + ||w||²

subject to min_i |w · xi| = 1.
17 Linear Support Vector Machines II
That function before was a little difficult to minimize because of the step
function in ℓ(y, ŷ) (either 1 or 0).
Let’s assume we can separate the data perfectly. Then we can optimize
the following:
Minimize ||w||², subject to:


                      (w · xi + b) ≥ 1, if yi = 1
                    (w · xi + b) ≤ −1, if yi = −1
These two constraints can be combined into:

                            yi (w · xi + b) ≥ 1

This is a quadratic program.
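As an illustration of the quadratic program, a generic constrained optimizer can solve a tiny instance directly (the data here is a made-up separable toy set, and scipy's SLSQP stands in for the specialized QP solvers real SVM packages use):

```python
import numpy as np
from scipy.optimize import minimize

# hard-margin linear SVM as the QP from the slide:
# minimize ||w||^2 subject to y_i (w · x_i + b) >= 1
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(p):                 # p = (w1, w2, b); b is unpenalized
    return p[0] ** 2 + p[1] ** 2

cons = [{'type': 'ineq',
         'fun': lambda p, xi=xi, yi=yi: yi * (p[:2] @ xi + p[2]) - 1.0}
        for xi, yi in zip(X, y)]

res = minimize(objective, x0=np.zeros(3), constraints=cons)
w, b = res.x[:2], res.x[2]
print(np.sign(X @ w + b))         # reproduces the training labels
```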
18 SVMs : non-separable case
To deal with the non-separable case, one can rewrite the problem as:
Minimize:
                                 ||w||² + C Σ_{i=1}^m ξi
subject to:

                     yi (w · xi + b) ≥ 1 − ξi ,         ξi ≥ 0

This is just the same as the original objective:
                  (1/m) Σ_{i=1}^m ℓ(w · xi + b, yi) + ||w||²

except ℓ is no longer the zero-one loss, but is called the "hinge loss":
ℓ(y, ŷ) = max(0, 1 − yŷ). This is still a quadratic program!
[Figure: a soft-margin separating hyperplane; a + point lying inside the
margin incurs a slack ξi.]
19 Support Vector Machines - Primal
 • Decision function:
                            f (x) = w · x + b

 • Primal formulation:
        min P(w, b) = (1/2)||w||² + C Σ_i H1[ yi f(xi) ]

    where the first term maximizes the margin and the second minimizes
    the training error.
    Ideally H1 would count the number of errors; we approximate it with the
    hinge loss H1(z) = max(0, 1 − z).
    [Figure: plot of H1(z), zero for z ≥ 1 and increasing linearly as z
    decreases below 1.]
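Since this primal is just a regularized hinge-loss objective, it can be minimized with plain stochastic sub-gradient descent. The sketch below is an illustration under that view (the toy data, learning rate, and epoch count are assumptions; real SVM solvers use QP or SMO methods):

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    # stochastic sub-gradient descent on (1/2)||w||^2 + C * sum_i H1(y_i f(x_i));
    # when the hinge is active its sub-gradient contributes -C * y_i * x_i
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:      # margin violated: hinge active
                w -= lr * (w - C * yi * xi)
                b += lr * C * yi
            else:                          # only the regularizer contributes
                w -= lr * w
    return w, b

# toy separable data, made up for illustration
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
print(np.sign(X @ w + b))   # recovers the training labels
```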
20 SVMs : non-linear case
Linear classifiers are sometimes not complex enough. The SVM solution:
map the data into a richer feature space, including nonlinear features, then
construct a hyperplane in that space, so all the other equations stay the same!
Formally, preprocess the data with:

                               x → Φ(x)

and then learn the map from Φ(x) to y:

                          f (x) = w · Φ(x) + b.
21 SVMs : polynomial mapping

                             Φ : R² → R³
           (x1, x2) → (z1, z2, z3) := (x1², √2 x1x2, x2²)

[Figure: data that is not linearly separable in the (x1, x2) plane becomes
linearly separable in the (z1, z2, z3) feature space.]
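The point of this particular map is that inner products in the R³ feature space reduce to a function of inner products in R². A quick numeric check (the input vectors are illustrative values):

```python
import numpy as np

def phi(x):
    # the explicit feature map from the slide: R^2 -> R^3
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
# dot product in feature space equals the squared dot product in input space
print(phi(x) @ phi(z), (x @ z) ** 2)  # both equal 1 (up to rounding)
```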
22 SVMs : non-linear case II
For example, MNIST handwriting recognition:
60,000 training examples, 10,000 test examples, 28 × 28 pixel images.
A linear SVM has around 8.5% test error.
A polynomial SVM has around 1% test error.

[Figure: sample MNIST digit images with their labels.]
23 SVMs : full MNIST results

                         Classifier                    Test Error
                           linear                      8.4%
                     3-nearest-neighbor                2.4%
                        RBF-SVM                        1.4%
                      Tangent distance                 1.1%
                           LeNet                       1.1%
                      Boosted LeNet                    0.7%
                 Translation invariant SVM             0.56%


Choosing a good mapping Φ(·) (encoding prior knowledge + getting the right
complexity of function class) for your problem improves results.
24 SVMs : the kernel trick
Problem: the dimensionality of Φ(x) can be very large, making w hard to
represent explicitly in memory, and hard for the QP to solve.
The Representer theorem (Kimeldorf & Wahba, 1971) shows that (with
SVMs as a special case):

    w = Σ_{i=1}^m α_i Φ(x_i)

for some variables α. Instead of optimizing w directly we can thus
optimize α.
The decision rule is now:

    f(x) = Σ_{i=1}^m α_i Φ(x_i) · Φ(x) + b

We call K(x_i, x) = Φ(x_i) · Φ(x) the kernel function.
25 Support Vector Machines - kernel trick II

We can rewrite all the SVM equations we saw before using the expansion
w = Σ_{i=1}^m α_i Φ(x_i):

 • Decision function:

       f(x) = Σ_i α_i Φ(x_i) · Φ(x) + b

            = Σ_i α_i K(x_i, x) + b

 • Dual formulation:

       min P(w, b) = (1/2) ||Σ_{i=1}^m α_i Φ(x_i)||^2 + C Σ_i H1[ y_i f(x_i) ]

       (first term: maximize margin; second term: minimize training error)
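The objective above can be evaluated directly from the Gram matrix: with w = Σ_i α_i Φ(x_i), the margin term ||w||^2 equals αᵀKα where K_ij = Φ(x_i)·Φ(x_j). A minimal sketch of this evaluation (the function names are ours, not from any library):

```python
import numpy as np

def hinge(z):
    # H1(z) = max(0, 1 - z), the hinge loss
    return np.maximum(0.0, 1.0 - z)

def primal_objective(alpha, b, K, y, C):
    # P = 0.5 * ||w||^2 + C * sum_i H1(y_i f(x_i)), with
    # ||w||^2 = alpha^T K alpha and f(x_j) = sum_i alpha_i K(x_i, x_j) + b.
    f = K @ alpha + b
    return 0.5 * alpha @ K @ alpha + C * hinge(y * f).sum()
```

With an identity Gram matrix, α = (1, −1), b = 0, y = (1, −1) and C = 1, all hinge terms vanish and the objective is just (1/2)||w||^2 = 1.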
26 Support Vector Machines - Dual
But people normally write it like this:
  • Dual formulation:

        min_α D(α) = (1/2) Σ_{i,j} α_i α_j Φ(x_i)·Φ(x_j) − Σ_i y_i α_i

        s.t.   Σ_i α_i = 0,   0 ≤ y_i α_i ≤ C

  • Dual decision function:

        f(x) = Σ_i α_i K(x_i, x) + b

  • The kernel function K(·, ·) is used to make an (implicit) nonlinear feature
    map, e.g.
     – Polynomial kernel: K(x, x') = (x · x' + 1)^d.
     – RBF kernel: K(x, x') = exp(−γ||x − x'||^2).
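Both kernels above are one-liners to implement; a quick sketch (the helper names are ours):

```python
import numpy as np

def poly_kernel(x, xp, d=2):
    # Polynomial kernel K(x, x') = (x . x' + 1)^d
    return (np.dot(x, xp) + 1.0) ** d

def rbf_kernel(x, xp, gamma=0.5):
    # RBF kernel K(x, x') = exp(-gamma ||x - x'||^2)
    return np.exp(-gamma * np.sum((x - xp) ** 2))
```

Note that K(x, x) = 1 for the RBF kernel regardless of γ, since the squared distance is zero.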
27 Polynomial-SVMs

The kernel K(x, x') = (x · x')^d gives the same result as the explicit
mapping + dot product that we described before:

    Φ : R^2 → R^3,   (x1, x2) → (z1, z2, z3) := (x1^2, √2 x1 x2, x2^2)

    Φ(x1, x2) · Φ(x1', x2') = (x1^2, √2 x1 x2, x2^2) · (x1'^2, √2 x1' x2', x2'^2)
                            = x1^2 x1'^2 + 2 x1 x1' x2 x2' + x2^2 x2'^2

is the same as:

    K(x, x') = (x · x')^2 = ((x1, x2) · (x1', x2'))^2
             = (x1 x1' + x2 x2')^2 = x1^2 x1'^2 + x2^2 x2'^2 + 2 x1 x1' x2 x2'

Interestingly, even if d is large the kernel still requires only n
multiplications to compute, whereas the explicit representation may not fit in memory!
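For d = 2 this equivalence is easy to check numerically; a small sketch (the helper name is ours):

```python
import numpy as np

def phi(x):
    # Explicit feature map Phi(x1, x2) = (x1^2, sqrt(2) x1 x2, x2^2)
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

x  = np.array([1.0, 2.0])
xp = np.array([3.0, 0.5])

explicit = phi(x) @ phi(xp)   # dot product computed in the 3-d feature space
kernel   = (x @ xp) ** 2      # K(x, x') = (x . x')^2, computed in the 2-d input space

assert np.isclose(explicit, kernel)   # both give 16.0
```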
28 RBF-SVMs
The RBF kernel K(x, x') = exp(−γ||x − x'||^2) is one of the most
popular kernel functions. It adds a "bump" around each data point:

    f(x) = Σ_{i=1}^m α_i exp(−γ||x_i − x||^2) + b




[Figure: the map Φ sends input points x, x' to Φ(x), Φ(x') in feature space.]


Using this one can get state-of-the-art results.
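The sum-of-bumps decision function is only a few lines of NumPy; a minimal sketch (the function name is ours, and α, b are assumed to come from training):

```python
import numpy as np

def rbf_decision(x, X_train, alpha, b, gamma=1.0):
    # f(x) = sum_i alpha_i exp(-gamma ||x_i - x||^2) + b:
    # each training point x_i contributes a Gaussian "bump" weighted by alpha_i.
    bumps = np.exp(-gamma * np.sum((X_train - x) ** 2, axis=1))
    return alpha @ bumps + b
```

Near an isolated training point x_i (with γ large) the corresponding bump dominates, so f(x) ≈ α_i + b there.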
29 SVMs : more results

There is much more in the field of SVMs / kernel machines than we could
cover here, including:

 • Regression, clustering, semi-supervised learning and other domains.
 • Lots of other kernels, e.g. string kernels to handle text.

 • Lots of research in modifications, e.g. to improve generalization
   ability, or tailoring to a particular task.
 • Lots of research in speeding up training.

Please see textbooks such as the ones by Cristianini & Shawe-Taylor or
by Schoelkopf and Smola.
30 SVMs : software

Lots of SVM software:
 • LibSVM (C++)

 • SVMLight (C)
As well as complete machine learning toolboxes that include SVMs:

 • Torch (C++)
 • Spider (Matlab)

 • Weka (Java)
All available through www.kernel-machines.org.
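As an illustration, training an RBF-SVM takes only a few lines with scikit-learn, whose SVC class wraps the LibSVM library listed above (scikit-learn itself is our assumption here, and the toy data is hypothetical):

```python
from sklearn.svm import SVC

# Toy 2-class problem: points near the origin vs. points far from it.
X = [[0, 0], [1, 1], [2, 2], [3, 3]]
y = [-1, -1, 1, 1]

# RBF kernel K(x, x') = exp(-gamma ||x - x'||^2), soft-margin constant C
clf = SVC(kernel='rbf', C=1.0, gamma=0.5)
clf.fit(X, y)

print(clf.predict([[0.1, 0.1], [2.9, 2.9]]))
```

Points close to the (−1)-labeled training examples should be assigned class −1, and points close to the (+1)-labeled ones class +1.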
