Support Vector Machine

Introduction to Support Vector Machine

Lucas Xu

September 4, 2012

Lucas Xu Introduction to Support Vector Machine September 4, 2012 1 / 20

1 Classifier

2 Hyper-Plane

3 Convex Optimization

4 Kernel

5 Application


Classifier

Attributes and Class Labels
Training Data
$$S = \{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\},\quad x^{(i)} \in \mathbb{R}^d,\ y^{(i)} \in \{-1, 1\}$$


Classifier

Umeng Gender Classification Data
user    app1  app2  ···  appd  gender
user1   1     0     ···  0     male
user2   0     1     ···  1     female
⋮       ⋮     ⋮     ⋱    ⋮     ⋮
usern   1     1     ···  1     female

Each app belongs to exactly one of about 20 categories; the categories are mutually exclusive.


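Classifier

Umeng Gender Classification Data
$$S = \{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\},\quad x^{(i)} \in \mathbb{R}^d,\ y^{(i)} \in \{-1, 1\}$$

$x^{(i)}_k \in \{0, 1\}$: 0 means app $k$ is not installed on the device, 1 means it is installed.
$1 \le k \le d$, with $d \approx 30{,}000$ (about 30,000 apps).
$y^{(i)} \in \{\text{male}, \text{female}\}$, encoded as the two labels $-1$ and $1$.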
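To make the encoding concrete, here is a minimal sketch assuming NumPy, 0-based app indices, and a made-up 6-app universe instead of about 30,000. The helper `encode_installed_apps` and the `LABELS` mapping (male to 1, female to -1) are hypothetical; the slides do not say which gender maps to which label.

```python
import numpy as np

def encode_installed_apps(installed, d):
    """Return the 0/1 vector x with x[k] = 1 iff app k is installed (0-based k)."""
    x = np.zeros(d, dtype=np.int8)
    x[sorted(installed)] = 1
    return x

# Hypothetical label encoding; the direction of the mapping is an assumption.
LABELS = {"male": 1, "female": -1}

# Hypothetical user with apps 0, 2 and 5 installed, out of d = 6 apps.
x = encode_installed_apps({0, 2, 5}, d=6)
print(x.tolist())  # [1, 0, 1, 0, 0, 1]
```

With $d \approx 30{,}000$ and only a few dozen apps per device, a sparse representation would likely be preferable in practice; the dense vector is just the easiest way to show the definition.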


Hyper-Plane

Figure: Hyper-plane

The hyper-plane: $w^T x + b = 0$
Classification function: $h_{w,b}(x) = g(w^T x + b)$, where
$$g(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ -1 & \text{otherwise} \end{cases}$$
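A minimal NumPy sketch of $g$ and $h_{w,b}$; the hyper-plane $x_1 + x_2 - 1 = 0$ and the three test points are made up for illustration.

```python
import numpy as np

def g(z):
    """Threshold function: 1 if z >= 0, otherwise -1 (applied elementwise)."""
    return np.where(z >= 0, 1, -1)

def h(w, b, X):
    """Classify each row of X with the hyper-plane w^T x + b = 0."""
    return g(X @ w + b)

# Hypothetical hyper-plane x1 + x2 - 1 = 0.
w = np.array([1.0, 1.0])
b = -1.0
X = np.array([[2.0, 0.0],   # w^T x + b = 1  -> +1
              [0.0, 0.0],   # w^T x + b = -1 -> -1
              [0.5, 0.5]])  # w^T x + b = 0  -> +1 (on the plane, z >= 0)
print(h(w, b, X).tolist())  # [1, -1, 1]
```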

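Hyper-Plane

Functional Margin:
$$\hat{\gamma}^{(i)} = y^{(i)}(w^T x^{(i)} + b)$$
Scaling: multiplying $(w, b)$ by a positive constant leaves the hyper-plane unchanged but scales the functional margin by the same constant, so we impose the normalization condition $\|w\| = 1$.
Geometric Margin:
$$\gamma^{(i)} = y^{(i)}\left(\left(\frac{w}{\|w\|}\right)^T x^{(i)} + \frac{b}{\|w\|}\right)$$

$\gamma^{(i)}$ should be a large positive number: a positive margin means $x^{(i)}$ is classified correctly, and a larger margin means a more confident prediction.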
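The scaling argument can be checked numerically. This sketch, assuming NumPy and a made-up two-point dataset, computes both margins and shows that multiplying $(w, b)$ by 10 multiplies the functional margins by 10 while the geometric margins stay the same.

```python
import numpy as np

def functional_margins(w, b, X, y):
    """hat-gamma^(i) = y^(i) (w^T x^(i) + b) for every example."""
    return y * (X @ w + b)

def geometric_margins(w, b, X, y):
    """gamma^(i): functional margins divided by ||w|| (signed distances to the plane)."""
    return functional_margins(w, b, X, y) / np.linalg.norm(w)

# Hypothetical toy data and hyper-plane x1 - 1 = 0.
X = np.array([[3.0, 4.0], [0.0, 0.0]])
y = np.array([1, -1])
w = np.array([1.0, 0.0])
b = -1.0

print(functional_margins(w, b, X, y).tolist())            # [2.0, 1.0]
print(functional_margins(10 * w, 10 * b, X, y).tolist())  # [20.0, 10.0]
print(geometric_margins(10 * w, 10 * b, X, y).tolist())   # [2.0, 1.0]
```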


Hyper-Plane

Definition
The geometric margin of $(w, b)$ with respect to the training dataset $S$:
$$\gamma = \min_{i=1,\dots,m} \gamma^{(i)}$$
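As an illustration of the definition, this sketch (NumPy, made-up four-point dataset) computes the dataset margin of two hypothetical hyper-planes that both separate the data. The first one keeps every point farther away, so its margin $\gamma$ is larger.

```python
import numpy as np

def dataset_margin(w, b, X, y):
    """Geometric margin of (w, b) w.r.t. the whole dataset: the smallest gamma^(i)."""
    return np.min(y * (X @ w + b) / np.linalg.norm(w))

X = np.array([[2.0, 2.0], [3.0, 1.0], [0.0, 0.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])

# Two hypothetical hyper-planes; both classify all four points correctly.
m1 = dataset_margin(np.array([1.0, 1.0]), -2.0, X, y)  # x1 + x2 - 2 = 0
m2 = dataset_margin(np.array([1.0, 0.0]), -1.0, X, y)  # x1 - 1 = 0
print(m1, m2)  # about 1.414 (sqrt(2)) and 1.0
```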


Hyper-Plane
The optimal margin classifier (intuition): find the decision boundary that maximizes the margin.

$$\max_{\gamma, w, b}\ \gamma \quad \text{s.t. } y^{(i)}(w^T x^{(i)} + b) \ge \gamma,\ i = 1, \dots, m, \qquad \|w\| = 1.$$


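Hyper-Plane
Normalization constraint: since $(w, b)$ can be rescaled freely, set the functional margin $\hat{\gamma} = 1$, so that the geometric margin is $\gamma = 1/\|w\|$.

⇓

$$\max_{w, b}\ \frac{1}{\|w\|} \quad \text{s.t. } y^{(i)}(w^T x^{(i)} + b) \ge 1,\ i = 1, \dots, m$$

⇓ (maximizing $1/\|w\|$ is equivalent to minimizing $\frac{1}{2}\|w\|^2$)

$$\min_{w, b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t. } y^{(i)}(w^T x^{(i)} + b) \ge 1,\ i = 1, \dots, m$$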
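Because this is a standard constrained problem, one way to see it in action is to hand it to a general-purpose solver. The sketch below assumes NumPy and SciPy and uses `scipy.optimize.minimize` with the SLSQP method rather than a dedicated QP package; the four-point dataset is made up. For this data the optimum can be worked out by hand as $w = (0.5, 0.5)$, $b = 0$.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linearly separable toy data.
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
d = X.shape[1]

def objective(theta):
    """theta = (w_1, ..., w_d, b); the objective is (1/2)||w||^2."""
    w = theta[:d]
    return 0.5 * w @ w

def margin_constraints(theta):
    """One entry per example: y^(i)(w^T x^(i) + b) - 1, which must be >= 0."""
    w, b = theta[:d], theta[d]
    return y * (X @ w + b) - 1.0

result = minimize(objective, x0=np.zeros(d + 1), method="SLSQP",
                  constraints=[{"type": "ineq", "fun": margin_constraints}])
w_opt, b_opt = result.x[:d], result.x[d]
print(w_opt, b_opt)  # approximately [0.5 0.5] and 0
```

A general solver is fine for a toy problem; for tens of thousands of examples a specialized SVM solver is the usual choice.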


Hyper-Plane

The objective $\frac{1}{2}\|w\|^2$ is a convex function.
The constraints $y^{(i)}(w^T x^{(i)} + b) \ge 1$ are linear, so the feasible region is a convex set.
This is a so-called Quadratic Programming (QP) problem, and there are many software packages to solve it.
Basic ideas for Support Vector Machine: DONE!
Is there a more efficient solution?


Convex Optimization

Primal Problem:
$$\min_{w, b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t. } y^{(i)}(w^T x^{(i)} + b) \ge 1,\ i = 1, \dots, m$$


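Convex Optimization
Lagrangian for the original problem:
$$\min_{w, b}\ \max_{\alpha:\ \alpha_i \ge 0}\ L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y^{(i)}(w^T x^{(i)} + b) - 1 \right]$$

⇓
Under the K.K.T. conditions, the problem transforms into its dual problem. Setting $\partial L / \partial w = 0$ and $\partial L / \partial b = 0$ gives $w = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)}$ and $\sum_{i=1}^{m} \alpha_i y^{(i)} = 0$; substituting these back into $L$ gives $W(\alpha)$:
$$\max_{\alpha}\ W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle$$
$$\text{s.t. } \alpha_i \ge 0,\ i = 1, \dots, m, \qquad \sum_{i=1}^{m} \alpha_i y^{(i)} = 0$$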
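The dual can be handed to the same kind of solver. This sketch assumes NumPy and SciPy, reuses a made-up four-point dataset, and maximizes $W(\alpha)$ by minimizing $-W(\alpha)$ with SLSQP, using bounds for $\alpha_i \ge 0$ and an equality constraint for $\sum_i \alpha_i y^{(i)} = 0$. Note that the data enter only through the inner products $\langle x^{(i)}, x^{(j)} \rangle$.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linearly separable toy data.
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = len(y)

# Q[i, j] = y^(i) y^(j) <x^(i), x^(j)>
Q = (y[:, None] * y[None, :]) * (X @ X.T)

def neg_W(alpha):
    """Negative dual objective, because scipy minimizes."""
    return -(alpha.sum() - 0.5 * alpha @ Q @ alpha)

result = minimize(neg_W, x0=np.zeros(m), method="SLSQP",
                  bounds=[(0.0, None)] * m,                               # alpha_i >= 0
                  constraints=[{"type": "eq", "fun": lambda a: a @ y}])  # sum_i alpha_i y^(i) = 0
alpha = result.x
print(np.round(alpha, 4))  # approximately (0.25, 0, 0.25, 0)
```

Only the two points closest to the boundary, $(1, 1)$ and $(-1, -1)$, end up with $\alpha_i > 0$.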


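Convex Optimization
Solutions:
$$w^* = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)}$$
$$b^* = -\frac{\max_{i:\, y^{(i)} = -1} w^{*T} x^{(i)} + \min_{i:\, y^{(i)} = 1} w^{*T} x^{(i)}}{2}$$

Predict:
$$w^{*T} x + b^* = \left(\sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)}\right)^T x + b^* = \sum_{i=1}^{m} \alpha_i y^{(i)} \langle x^{(i)}, x \rangle + b^*$$
and the predicted label is $g(w^{*T} x + b^*)$.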
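A minimal NumPy sketch of recovering $w^*$ and $b^*$ from $\alpha$ and predicting through inner products only. The dataset is made up, and the $\alpha$ values are the dual solution for this particular toy data, worked out by hand.

```python
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.array([0.25, 0.0, 0.25, 0.0])  # dual solution of this toy problem

# w* = sum_i alpha_i y^(i) x^(i)
w_star = (alpha * y) @ X
# b* = -(max over negatives of w*^T x + min over positives of w*^T x) / 2
b_star = -(np.max(X[y == -1] @ w_star) + np.min(X[y == 1] @ w_star)) / 2

def decision_value(x):
    """sum_i alpha_i y^(i) <x^(i), x> + b*: only inner products with training points."""
    return np.sum(alpha * y * (X @ x)) + b_star

def predict(x):
    """Apply g: label 1 if the decision value is >= 0, otherwise -1."""
    return 1 if decision_value(x) >= 0 else -1

x_new = np.array([0.5, -2.0])
print(w_star, b_star)                                   # w* = (0.5, 0.5), b* = 0
print(decision_value(x_new), w_star @ x_new + b_star)   # both -0.75
print(predict(x_new))                                   # -1
```

Since $\alpha_i = 0$ for the non-support vectors, their terms could be skipped in the sum without changing the result.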


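Kernel

For most $i$, $\alpha_i = 0$.
The examples $(x^{(i)}, y^{(i)})$ with $\alpha_i > 0$ are called support vectors.
Prediction only needs the inner products $\langle x^{(i)}, x \rangle$, and only for the support vectors.
If we map the feature space $(x_1, x_2, \dots, x_d)$ to a higher-dimensional space $(z_1, z_2, \dots, z_l)$, $z = \phi(x)$, prediction needs $\langle \phi(x^{(i)}), \phi(x) \rangle$ instead.
If this can be computed directly as $\langle z^{(i)}, z \rangle = K(x^{(i)}, x)$, we never have to compute $\phi$ explicitly.
Use a slightly different notation:
$$K(x, y) = \langle \phi(x), \phi(y) \rangle$$

Intuitive explanation: $K(x, y)$ is a measure of the similarity between $x$ and $y$.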
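A small check of the identity $K(x, y) = \langle \phi(x), \phi(y) \rangle$, assuming NumPy. For 2-dimensional inputs, the kernel $(\langle x, y \rangle + 1)^2$ corresponds to the explicit 6-dimensional map $\phi(x) = (x_1^2, x_2^2, \sqrt{2} x_1 x_2, \sqrt{2} x_1, \sqrt{2} x_2, 1)$; the two input points are made up.

```python
import numpy as np

def phi(x):
    """Explicit feature map for K(x, y) = (<x, y> + 1)^2 when x is 2-dimensional."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([x1 * x1, x2 * x2, s * x1 * x2, s * x1, s * x2, 1.0])

def K(x, y):
    """The same quantity computed in the original 2-D space, without forming phi."""
    return (np.dot(x, y) + 1.0) ** 2

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
print(np.dot(phi(u), phi(v)), K(u, v))  # both about 4.0
```

The kernel side costs one 2-D inner product, while the explicit side builds two 6-D vectors; the gap grows quickly with the input dimension and the polynomial degree.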


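Kernel

Definition
Mercer Kernel: $K$ is positive semi-definite, i.e. for any finite set of points $x^{(1)}, \dots, x^{(m)}$, the kernel matrix with entries $K_{ij} = K(x^{(i)}, x^{(j)})$ is symmetric positive semi-definite.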
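The condition can be checked numerically for a concrete kernel. This sketch, assuming NumPy, builds the RBF kernel matrix for 20 random 3-D points (the value `gamma=0.5` and the random data are arbitrary choices) and confirms it is symmetric with no eigenvalue below a small round-off tolerance.

```python
import numpy as np

def rbf_gram(X, gamma=0.5):
    """Kernel matrix K_ij = exp(-gamma * ||x^(i) - x^(j)||^2)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = rbf_gram(X)

eigenvalues = np.linalg.eigvalsh(K)  # K is symmetric, so eigvalsh applies
print(np.allclose(K, K.T), eigenvalues.min() >= -1e-10)  # True True
```

A numerical check on one sample is evidence, not a proof; positive semi-definiteness has to hold for every finite set of points.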


Kernel

Primitive (linear): $\langle x, y \rangle$
Polynomial: $(\langle x, y \rangle + 1)^p$, where $p$ is the degree
RBF: $\exp(-\gamma \|x - y\|^2)$
Sigmoid: $\tanh(\kappa \langle x, y \rangle + c)$
String kernels
Tree kernels
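The four vector kernels in the list can be written in a few lines of NumPy. The default parameter values (`p=2`, `gamma=0.5`, `kappa=1.0`, `c=0.0`) are arbitrary choices for this sketch, not recommendations; string and tree kernels operate on structured objects and are not shown.

```python
import numpy as np

def linear_kernel(x, y):
    return np.dot(x, y)

def polynomial_kernel(x, y, p=2):
    return (np.dot(x, y) + 1.0) ** p

def rbf_kernel(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sigmoid_kernel(x, y, kappa=1.0, c=0.0):
    return np.tanh(kappa * np.dot(x, y) + c)

# Two hypothetical binary "installed apps" vectors.
x = np.array([1.0, 0.0, 1.0])
y = np.array([1.0, 1.0, 0.0])
for k in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(k.__name__, k(x, y))  # 1.0, 4.0, exp(-1) ~ 0.368, tanh(1) ~ 0.762
```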


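Apply to Umeng Gender Classification
Problem Description
Classify the gender of a user based on the apps (s)he has installed and the categories of those apps.
Kernel Design
$$K(x, y) = \sum_{i,j=1}^{d} \phi(x_i, y_j)$$
$$\phi(x_i, y_j) = \begin{cases} (1 + w)\, x_i y_j & \text{if } i = j \\ x_i y_j & \text{if } i \ne j \text{ but apps } i \text{ and } j \text{ are in the same category} \\ 0 & \text{if apps } i \text{ and } j \text{ are not in the same category} \end{cases}$$

$w \ge 0$ is the extra weight given when two users have installed the same app; it defaults to 1.0.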
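A direct, brute-force reading of the kernel design, assuming NumPy. The function name `umeng_kernel`, the 4-app universe and its category assignment are hypothetical, chosen small enough to check by hand: the pair (app 0, app 0) contributes $1 + w$ and the pair (app 1, app 0) contributes 1, while pairs across categories contribute nothing.

```python
import numpy as np

def umeng_kernel(x, y, category, w=1.0):
    """Brute-force K(x, y) = sum over i, j of phi(x_i, y_j), per the kernel design."""
    d = len(x)
    total = 0.0
    for i in range(d):
        for j in range(d):
            if category[i] != category[j]:
                continue                            # different categories contribute 0
            weight = (1.0 + w) if i == j else 1.0   # the same app gets the extra weight w
            total += weight * x[i] * y[j]
    return total

# Hypothetical setup: apps 0 and 1 in category 0, apps 2 and 3 in category 1.
category = np.array([0, 0, 1, 1])
x = np.array([1, 1, 0, 0])   # user A installed apps 0 and 1
y = np.array([1, 0, 1, 0])   # user B installed apps 0 and 2
print(umeng_kernel(x, y, category))         # 3.0
print(umeng_kernel(x, y, category, w=0.5))  # 2.5
```

The double loop is $O(d^2)$, which is slow for $d \approx 30{,}000$; grouping the sum by category avoids it.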
Experiment Result

Apply to Umeng Gender Classification
$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix}$$
⇓
$$\begin{pmatrix} w \cdot x_1 \\ w \cdot x_2 \\ \vdots \\ w \cdot x_d \\ c_1 \\ c_2 \\ \vdots \\ c_{20} \end{pmatrix}$$
$c_i$ counts the number of installed apps belonging to category $i$.
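Expanding the kernel design, the same-category pairs contribute $\sum_c c_c(x)\, c_c(y)$ and the diagonal adds $w \langle x, y \rangle$, so $K(x, y) = w \langle x, y \rangle + \sum_c c_c(x)\, c_c(y)$. This equals the inner product of explicit features that stack the app vector with the category counts; with the kernel as defined, the app block is scaled by $\sqrt{w}$ so that it contributes exactly $w \langle x, y \rangle$. The sketch below, assuming NumPy and random hypothetical data (12 apps, 3 categories, $w = 0.7$), checks this against the brute-force double sum.

```python
import numpy as np

def brute_force_kernel(x, y, category, w):
    """K(x, y) summed pair by pair, as in the kernel design."""
    total = 0.0
    for i in range(len(x)):
        for j in range(len(y)):
            if category[i] == category[j]:
                total += ((1.0 + w) if i == j else 1.0) * x[i] * y[j]
    return total

def explicit_features(x, category, n_categories, w):
    """Stack sqrt(w) * x with the per-category counts c_1, ..., c_n of installed apps."""
    counts = np.bincount(category, weights=x, minlength=n_categories)
    return np.concatenate([np.sqrt(w) * x, counts])

rng = np.random.default_rng(1)
d, n_categories, w = 12, 3, 0.7
category = rng.integers(0, n_categories, size=d)
x = rng.integers(0, 2, size=d).astype(float)
y = rng.integers(0, 2, size=d).astype(float)

k_pairs = brute_force_kernel(x, y, category, w)
k_features = (explicit_features(x, category, n_categories, w)
              @ explicit_features(y, category, n_categories, w))
print(k_pairs, k_features)  # equal up to floating-point rounding
```

The explicit form costs $O(d)$ per pair of users instead of $O(d^2)$, which matters when $d \approx 30{,}000$.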
