Lecture 6: Minimum encoding ball and Support vector
data description (SVDD)
Stéphane Canu
stephane.canu@litislab.eu
Sao Paulo 2014
May 12, 2014
Plan
1 Support Vector Data Description (SVDD)
SVDD, the smallest enclosing ball problem
The minimum enclosing ball problem with errors
The minimum enclosing ball problem in a RKHS
The two class Support vector data description (SVDD)
The minimum enclosing ball problem [Tax and Duin, 2004]

Given n points {x_i, i = 1, …, n}, find the smallest ball (center c, radius R) containing them all:
$$\min_{R \in \mathbb{R},\, c \in \mathbb{R}^d} \; R^2 \quad \text{with } \|x_i - c\|^2 \le R^2, \quad i = 1, \dots, n$$
What is that in the convex programming hierarchy?
LP, QP, QCQP, SOCP and SDP
Stéphane Canu (INSA Rouen - LITIS) May 12, 2014 3 / 35
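Before turning to the QP machinery, note the problem can also be solved approximately with a very simple iteration: the Bădoiu–Clarkson scheme repeatedly moves the center toward the current farthest point. A minimal numpy sketch (function name and iteration count are our own choices, not from the lecture):

```python
import numpy as np

def minimum_enclosing_ball(X, n_iter=2000):
    """Approximate minimum enclosing ball via the Badoiu-Clarkson
    iteration: at step k, move the center toward the farthest point
    with step size 1/(k+1). Accuracy improves as O(1/n_iter)."""
    c = X.mean(axis=0)                       # any starting center works
    for k in range(1, n_iter + 1):
        far = X[np.argmax(((X - c) ** 2).sum(axis=1))]
        c = c + (far - c) / (k + 1)
    R = np.sqrt(((X - c) ** 2).sum(axis=1).max())
    return c, R
```

On the four points (±1, 0), (0, ±1) this should recover a center near the origin and a radius near 1.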
The convex programming hierarchy (part of)

LP:
$$\min_x \; f^\top x \quad \text{with } Ax \le d \text{ and } 0 \le x$$

QP:
$$\min_x \; \tfrac{1}{2} x^\top G x + f^\top x \quad \text{with } Ax \le d$$

QCQP:
$$\min_x \; \tfrac{1}{2} x^\top G x + f^\top x \quad \text{with } x^\top B_i x + a_i^\top x \le d_i, \quad i = 1, \dots, n$$

SOCP:
$$\min_x \; f^\top x \quad \text{with } \|x - a_i\| \le b_i^\top x + d_i, \quad i = 1, \dots, n$$

The convex programming hierarchy?
Model generality: LP < QP < QCQP < SOCP < SDP
MEB as a QP in the primal

Theorem (MEB as a QP)
The two following problems are equivalent:
$$\min_{R \in \mathbb{R},\, c \in \mathbb{R}^d} \; R^2 \quad \text{with } \|x_i - c\|^2 \le R^2, \quad i = 1, \dots, n$$
$$\min_{w, \rho} \; \tfrac{1}{2}\|w\|^2 - \rho \quad \text{with } w^\top x_i \ge \rho + \tfrac{1}{2}\|x_i\|^2, \quad i = 1, \dots, n$$
with $\rho = \tfrac{1}{2}(\|c\|^2 - R^2)$ and $w = c$.

Proof:
$$\|x_i - c\|^2 \le R^2 \iff \|x_i\|^2 - 2 x_i^\top c + \|c\|^2 \le R^2 \iff 2 x_i^\top c \ge \|x_i\|^2 + \|c\|^2 - R^2$$
that is
$$x_i^\top c \ge \underbrace{\tfrac{1}{2}\big(\|c\|^2 - R^2\big)}_{\rho} + \tfrac{1}{2}\|x_i\|^2$$
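The theorem can be checked numerically: solve the (w, ρ) QP with a generic solver and map back to the ball via c = w and R² = ‖c‖² − 2ρ. A hedged sketch using scipy's SLSQP (the solver and function name are our own choices; any QP solver would do):

```python
import numpy as np
from scipy.optimize import minimize

def meb_via_qp(X):
    """Solve  min_{w,rho} 1/2 ||w||^2 - rho
       s.t.   w'x_i >= rho + 1/2 ||x_i||^2  for all i,
       then recover the ball: c = w, R^2 = ||c||^2 - 2 rho."""
    n, d = X.shape
    obj = lambda z: 0.5 * z[:d] @ z[:d] - z[d]
    cons = [{'type': 'ineq',
             'fun': lambda z, i=i: X[i] @ z[:d] - z[d] - 0.5 * X[i] @ X[i]}
            for i in range(n)]
    # feasible start: center at the mean, rho as small as needed
    c0 = X.mean(axis=0)
    rho0 = (X @ c0 - 0.5 * (X * X).sum(axis=1)).min()
    z = minimize(obj, np.append(c0, rho0), constraints=cons).x
    c, rho = z[:d], z[d]
    return c, np.sqrt(c @ c - 2 * rho)
```

On the four unit-circle points this should agree with the direct geometric answer (c near 0, R near 1).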
MEB and the one class SVM

SVDD:
$$\min_{w, \rho} \; \tfrac{1}{2}\|w\|^2 - \rho \quad \text{with } w^\top x_i \ge \rho + \tfrac{1}{2}\|x_i\|^2$$

SVDD and linear OCSVM (supporting hyperplane): if ‖x_i‖² is constant for all i = 1, …, n, this is the linear one class SVM (OC SVM).

The linear one class SVM [Schölkopf and Smola, 2002]:
$$\min_{w, \rho'} \; \tfrac{1}{2}\|w\|^2 - \rho' \quad \text{with } w^\top x_i \ge \rho'$$
with $\rho' = \rho + \tfrac{1}{2}\|x_i\|^2$. ⇒ OC SVM is a particular case of SVDD.
When ‖x_i‖² = 1 for all i = 1, …, n:
$$\|x_i - c\|^2 \le R^2 \iff w^\top x_i \ge \rho \quad \text{with } \rho = \tfrac{1}{2}\big(\|c\|^2 - R^2 + 1\big)$$

SVDD and OCSVM: "belonging to the ball" is also "being above" a hyperplane.
MEB: KKT

$$L(c, R, \alpha) = R^2 + \sum_{i=1}^n \alpha_i \big(\|x_i - c\|^2 - R^2\big)$$

KKT conditions:
- stationarity: $2c \sum_{i=1}^n \alpha_i - 2 \sum_{i=1}^n \alpha_i x_i = 0$ ← the representer theorem, and $1 - \sum_{i=1}^n \alpha_i = 0$
- primal admissibility: $\|x_i - c\|^2 \le R^2$
- dual admissibility: $\alpha_i \ge 0$, i = 1, …, n
- complementarity: $\alpha_i \big(\|x_i - c\|^2 - R^2\big) = 0$, i = 1, …, n

Complementarity tells us there are two groups of points: the support vectors ($\|x_i - c\|^2 = R^2$) and the insiders ($\alpha_i = 0$).
MEB: Dual

The representer theorem:
$$c = \frac{\sum_{i=1}^n \alpha_i x_i}{\sum_{i=1}^n \alpha_i} = \sum_{i=1}^n \alpha_i x_i$$

$$L(\alpha) = \sum_{i=1}^n \alpha_i \Big\| x_i - \sum_{j=1}^n \alpha_j x_j \Big\|^2$$

with $\sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j x_i^\top x_j = \alpha^\top G \alpha$ and $\sum_{i=1}^n \alpha_i x_i^\top x_i = \alpha^\top \mathrm{diag}(G)$, where $G = XX^\top$ is the Gram matrix, $G_{ij} = x_i^\top x_j$:

$$\min_{\alpha \in \mathbb{R}^n} \; \alpha^\top G \alpha - \alpha^\top \mathrm{diag}(G) \quad \text{with } e^\top \alpha = 1 \text{ and } 0 \le \alpha_i, \; i = 1, \dots, n$$
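The dual above is easy to try out. A hedged sketch (again with scipy's SLSQP, our own choice of solver): solve the dual, recover the center c = X⊤α, and read the radius off a support vector (α_i > 0):

```python
import numpy as np
from scipy.optimize import minimize

def meb_dual(X):
    """Solve the MEB dual  min_a a'Ga - a'diag(G),  e'a = 1, a >= 0,
    with G = XX' the Gram matrix, then recover the center c = X'a
    and the radius from a support vector (a_i > 0)."""
    n = X.shape[0]
    G = X @ X.T
    g = np.diag(G)
    a = minimize(lambda a: a @ G @ a - a @ g,
                 np.full(n, 1.0 / n),
                 bounds=[(0, None)] * n,
                 constraints=({'type': 'eq', 'fun': lambda a: a.sum() - 1},)).x
    c = X.T @ a
    sv = a > 1e-6                            # support vectors
    R = np.sqrt(((X[sv] - c) ** 2).sum(axis=1).max())
    return a, c, R
```

On the four unit-circle points the optimal α is uniform and the recovered ball is the unit ball.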
SVDD primal vs. dual

Primal:
$$\min_{R \in \mathbb{R},\, c \in \mathbb{R}^d} \; R^2 \quad \text{with } \|x_i - c\|^2 \le R^2, \; i = 1, \dots, n$$
- d + 1 unknowns
- n constraints
- can be recast as a QP
- perfect when d << n

Dual:
$$\min_{\alpha} \; \alpha^\top G \alpha - \alpha^\top \mathrm{diag}(G) \quad \text{with } e^\top \alpha = 1 \text{ and } 0 \le \alpha_i, \; i = 1, \dots, n$$
- n unknowns, with G the pairwise influence Gram matrix
- n box constraints
- easy to solve
- to be used when d > n

But where is R²?
Looking for R²

$$\min_{\alpha} \; \alpha^\top G \alpha - \alpha^\top \mathrm{diag}(G) \quad \text{with } e^\top \alpha = 1, \; 0 \le \alpha_i, \; i = 1, \dots, n$$

The Lagrangian: $L(\alpha, \mu, \beta) = \alpha^\top G \alpha - \alpha^\top \mathrm{diag}(G) + \mu (e^\top \alpha - 1) - \beta^\top \alpha$

Stationarity condition: $\nabla_\alpha L(\alpha, \mu, \beta) = 2 G \alpha - \mathrm{diag}(G) + \mu e - \beta = 0$

The bidual:
$$\min_{\alpha} \; \alpha^\top G \alpha + \mu \quad \text{with } -2 G \alpha + \mathrm{diag}(G) \le \mu e$$

By identification, $R^2 = \mu + \alpha^\top G \alpha = \mu + \|c\|^2$, where µ is the Lagrange multiplier associated with the equality constraint $\sum_{i=1}^n \alpha_i = 1$.

Also, because of the complementarity condition, if $x_i$ is a support vector then $\alpha_i > 0$, hence $\beta_i = 0$ and $R^2 = \|x_i - c\|^2$.
Plan
1 Support Vector Data Description (SVDD)
SVDD, the smallest enclosing ball problem
The minimum enclosing ball problem with errors
The minimum enclosing ball problem in a RKHS
The two class Support vector data description (SVDD)
The minimum enclosing ball problem with errors

The same road map:
- initial formulation
- reformulation (as a QP)
- Lagrangian, KKT
- dual formulation
- bidual

Initial formulation, for a given C:
$$\min_{R,\, c,\, \xi} \; R^2 + C \sum_{i=1}^n \xi_i \quad \text{with } \|x_i - c\|^2 \le R^2 + \xi_i \text{ and } \xi_i \ge 0, \; i = 1, \dots, n$$
The MEB with slack: QP, KKT, dual and R²

SVDD as a QP:
$$\min_{w, \rho, \xi} \; \tfrac{1}{2}\|w\|^2 - \rho + \tfrac{C}{2} \sum_{i=1}^n \xi_i \quad \text{with } w^\top x_i \ge \rho + \tfrac{1}{2}\|x_i\|^2 - \tfrac{1}{2}\xi_i \text{ and } \xi_i \ge 0, \; i = 1, \dots, n$$
again with OC SVM as a particular case.

With $G = XX^\top$, the dual SVDD is:
$$\min_{\alpha} \; \alpha^\top G \alpha - \alpha^\top \mathrm{diag}(G) \quad \text{with } e^\top \alpha = 1 \text{ and } 0 \le \alpha_i \le C, \; i = 1, \dots, n$$
for a given C ≤ 1. Since $e^\top \alpha = 1$ already forces $\alpha_i \le 1$, taking C larger than one is useless (it gives back the no-slack case).

$$R^2 = \mu + c^\top c$$
with µ denoting the Lagrange multiplier associated with the equality constraint $\sum_{i=1}^n \alpha_i = 1$.
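The box constraint is what lets outliers be left outside the ball: their α saturates at C. A hedged numeric sketch (scipy's SLSQP again, our own choice; R is read off a free support vector, i.e. one with 0 < α_i < C):

```python
import numpy as np
from scipy.optimize import minimize

def svdd_dual_soft(X, C):
    """Soft-margin SVDD dual:  min_a a'Ga - a'diag(G),
    e'a = 1, 0 <= a_i <= C.  Points with a_i = C sit outside
    the ball; the radius comes from a free SV (0 < a_i < C)."""
    n = X.shape[0]
    G = X @ X.T
    a = minimize(lambda a: a @ G @ a - a @ np.diag(G),
                 np.full(n, 1.0 / n),
                 bounds=[(0, C)] * n,
                 constraints=({'type': 'eq', 'fun': lambda a: a.sum() - 1},)).x
    c = X.T @ a
    free = (a > 1e-4) & (a < C - 1e-4)       # free support vectors
    R = np.sqrt(((X[free] - c) ** 2).sum(axis=1).max())
    return a, c, R
```

For instance, with the four unit-circle points plus an outlier at (3, 0) and C = 0.3, the outlier should saturate at α = C and the recovered center should sit near (0.6, 0) rather than being dragged out to enclose the outlier.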
Variations over SVDD

Adaptive SVDD: the weighted error case, for given weights $w_i$, i = 1, …, n (here the variable R plays the role of R²):
$$\min_{c \in \mathbb{R}^p,\, R \in \mathbb{R},\, \xi \in \mathbb{R}^n} \; R + C \sum_{i=1}^n w_i \xi_i \quad \text{with } \|x_i - c\|^2 \le R + \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, n$$

The dual of this problem is a QP [see for instance Liu et al., 2013]:
$$\min_{\alpha \in \mathbb{R}^n} \; \alpha^\top XX^\top \alpha - \alpha^\top \mathrm{diag}(XX^\top) \quad \text{with } \sum_{i=1}^n \alpha_i = 1, \; 0 \le \alpha_i \le C w_i, \; i = 1, \dots, n$$

Density induced SVDD (D-SVDD):
$$\min_{c \in \mathbb{R}^p,\, R \in \mathbb{R},\, \xi \in \mathbb{R}^n} \; R + C \sum_{i=1}^n \xi_i \quad \text{with } w_i \|x_i - c\|^2 \le R + \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, n$$
Plan
1 Support Vector Data Description (SVDD)
SVDD, the smallest enclosing ball problem
The minimum enclosing ball problem with errors
The minimum enclosing ball problem in a RKHS
The two class Support vector data description (SVDD)
SVDD in a RKHS

The feature map: $\mathbb{R}^p \longrightarrow \mathcal{H}$, with $c \longmapsto f(\bullet)$ and $x_i \longmapsto k(x_i, \bullet)$, so that
$$\|x_i - c\|^2_{\mathbb{R}^p} \le R^2 \quad \longrightarrow \quad \|k(x_i, \bullet) - f(\bullet)\|^2_{\mathcal{H}} \le R^2$$

Kernelized SVDD (in a RKHS) is also a QP:
$$\min_{f \in \mathcal{H},\, R \in \mathbb{R},\, \xi \in \mathbb{R}^n} \; R^2 + C \sum_{i=1}^n \xi_i \quad \text{with } \|k(x_i, \bullet) - f(\bullet)\|^2_{\mathcal{H}} \le R^2 + \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, n$$
SVDD in a RKHS: KKT, dual and R²

$$L = R^2 + C \sum_{i=1}^n \xi_i + \sum_{i=1}^n \alpha_i \big(\|k(x_i, \cdot) - f(\cdot)\|^2_{\mathcal{H}} - R^2 - \xi_i\big) - \sum_{i=1}^n \beta_i \xi_i$$
$$\;\; = R^2 + C \sum_{i=1}^n \xi_i + \sum_{i=1}^n \alpha_i \big(k(x_i, x_i) - 2 f(x_i) + \|f\|^2_{\mathcal{H}} - R^2 - \xi_i\big) - \sum_{i=1}^n \beta_i \xi_i$$

KKT conditions:
- Stationarity: $2 f(\cdot) \sum_{i=1}^n \alpha_i - 2 \sum_{i=1}^n \alpha_i k(\cdot, x_i) = 0$ ← the representer theorem; $1 - \sum_{i=1}^n \alpha_i = 0$; $C - \alpha_i - \beta_i = 0$
- Primal admissibility: $\|k(x_i, \cdot) - f(\cdot)\|^2 \le R^2 + \xi_i$, $\xi_i \ge 0$
- Dual admissibility: $\alpha_i \ge 0$, $\beta_i \ge 0$
- Complementarity: $\alpha_i \big(\|k(x_i, \cdot) - f(\cdot)\|^2 - R^2 - \xi_i\big) = 0$ and $\beta_i \xi_i = 0$
SVDD in a RKHS: dual and R²

$$L(\alpha) = \sum_{i=1}^n \alpha_i k(x_i, x_i) - 2 \sum_{i=1}^n \alpha_i f(x_i) + \|f\|^2_{\mathcal{H}} \quad \text{with } f(\cdot) = \sum_{j=1}^n \alpha_j k(\cdot, x_j)$$
$$= \sum_{i=1}^n \alpha_i k(x_i, x_i) - \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j \underbrace{k(x_i, x_j)}_{G_{ij}}$$

With $G_{ij} = k(x_i, x_j)$:
$$\min_{\alpha} \; \alpha^\top G \alpha - \alpha^\top \mathrm{diag}(G) \quad \text{with } e^\top \alpha = 1 \text{ and } 0 \le \alpha_i \le C, \; i = 1, \dots, n$$

As in the linear case, $R^2 = \mu + \|f\|^2_{\mathcal{H}}$, with µ denoting the Lagrange multiplier associated with the equality constraint $\sum_{i=1}^n \alpha_i = 1$.
SVDD train and val in a RKHS

Train using the dual form (in: G, C; out: α, µ):
$$\min_{\alpha} \; \alpha^\top G \alpha - \alpha^\top \mathrm{diag}(G) \quad \text{with } e^\top \alpha = 1 \text{ and } 0 \le \alpha_i \le C, \; i = 1, \dots, n$$

Val with the center in the RKHS, $f(\cdot) = \sum_{i=1}^n \alpha_i k(\cdot, x_i)$:
$$\begin{aligned}
\varphi(x) &= \|k(x, \cdot) - f(\cdot)\|^2_{\mathcal{H}} - R^2 \\
&= \|k(x, \cdot)\|^2_{\mathcal{H}} - 2 \langle k(x, \cdot), f(\cdot) \rangle_{\mathcal{H}} + \|f(\cdot)\|^2_{\mathcal{H}} - R^2 \\
&= k(x, x) - 2 f(x) + R^2 - \mu - R^2 \\
&= -2 f(x) + k(x, x) - \mu \\
&= -2 \sum_{i=1}^n \alpha_i k(x, x_i) + k(x, x) - \mu
\end{aligned}$$
φ(x) = 0 is the decision border.
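The train/val recipe fits in a few lines. A hedged sketch with a Gaussian kernel (kernel, bandwidth and solver are our own assumptions): training solves the dual, µ comes from the stationarity condition 2(Gα)_i − G_ii + µ = 0 at a free support vector, and φ is evaluated as derived above:

```python
import numpy as np
from scipy.optimize import minimize

def rbf(A, B, s=1.0):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 s^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s * s))

def svdd_train(X, C=1.0, s=1.0):
    """Dual training: returns alpha and mu, the equality-constraint
    multiplier, recovered at a free SV (0 < alpha_i < C)."""
    n = X.shape[0]
    G = rbf(X, X, s)
    a = minimize(lambda a: a @ G @ a - a @ np.diag(G),
                 np.full(n, 1.0 / n),
                 bounds=[(0, C)] * n,
                 constraints=({'type': 'eq', 'fun': lambda a: a.sum() - 1},)).x
    i = int(np.argmax((a > 1e-6) & (a < C - 1e-4)))
    mu = G[i, i] - 2 * (G @ a)[i]            # stationarity at a free SV
    return a, mu

def svdd_phi(x, X, a, mu, s=1.0):
    """phi(x) = k(x,x) - 2 f(x) - mu; phi <= 0 inside the ball."""
    kx = rbf(np.atleast_2d(x), X, s)[0]
    return 1.0 - 2.0 * (kx @ a) - mu         # k(x,x) = 1 for the RBF kernel
```

Training points should land inside or on the ball (φ ≤ 0 up to solver tolerance) while a faraway point gets φ > 0.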
An important theoretical result

For a well-calibrated bandwidth, the SVDD estimates the underlying distribution's level set [Vert and Vert, 2006]. The level sets of a probability density function IP(x) are the sets
$$C_p = \{x \in \mathbb{R}^d \mid \mathbb{P}(x) \ge p\}$$
They are well estimated by the empirical minimum volume set
$$V_p = \{x \in \mathbb{R}^d \mid \|k(x, \cdot) - f(\cdot)\|^2_{\mathcal{H}} - R^2 \le 0\}$$
The frontiers coincide.
SVDD: the generalization error

For a well-calibrated bandwidth, with $(x_1, \dots, x_n)$ i.i.d. from some fixed but unknown IP(x), then [Shawe-Taylor and Cristianini, 2004], with probability at least 1 − δ (for all δ ∈ ]0, 1[) and for any margin m > 0:
$$\mathbb{P}\Big(\|k(x, \cdot) - f(\cdot)\|^2_{\mathcal{H}} \ge R^2 + m\Big) \le \frac{1}{mn} \sum_{i=1}^n \xi_i + \frac{6 R^2}{m \sqrt{n}} + 3 \sqrt{\frac{\ln(2/\delta)}{2n}}$$
Equivalence between SVDD and OCSVM for translation invariant kernels (diagonal constant kernels)

Theorem
Let H be a RKHS on some domain X endowed with kernel k. If there exists some constant c such that ∀x ∈ X, k(x, x) = c, then the two following problems are equivalent:
$$\min_{f, R, \xi} \; R + C \sum_{i=1}^n \xi_i \quad \text{with } \|k(x_i, \cdot) - f(\cdot)\|^2_{\mathcal{H}} \le R + \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, n$$
$$\min_{f, \rho, \varepsilon} \; \tfrac{1}{2}\|f\|^2_{\mathcal{H}} - \rho + C \sum_{i=1}^n \varepsilon_i \quad \text{with } f(x_i) \ge \rho - \varepsilon_i, \; \varepsilon_i \ge 0, \; i = 1, \dots, n$$
with $\rho = \tfrac{1}{2}(c + \|f\|^2_{\mathcal{H}} - R)$ and $\varepsilon_i = \tfrac{1}{2}\xi_i$.
Proof of the equivalence between SVDD and OCSVM

$$\min_{f \in \mathcal{H},\, R \in \mathbb{R},\, \xi \in \mathbb{R}^n} \; R + C \sum_{i=1}^n \xi_i \quad \text{with } \|k(x_i, \cdot) - f(\cdot)\|^2_{\mathcal{H}} \le R + \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, n$$
Since $\|k(x_i, \cdot) - f(\cdot)\|^2_{\mathcal{H}} = k(x_i, x_i) + \|f\|^2_{\mathcal{H}} - 2 f(x_i)$, this is
$$\min_{f \in \mathcal{H},\, R \in \mathbb{R},\, \xi \in \mathbb{R}^n} \; R + C \sum_{i=1}^n \xi_i \quad \text{with } 2 f(x_i) \ge k(x_i, x_i) + \|f\|^2_{\mathcal{H}} - R - \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, n.$$
Introducing $\rho = \tfrac{1}{2}(c + \|f\|^2_{\mathcal{H}} - R)$, that is $R = c + \|f\|^2_{\mathcal{H}} - 2\rho$, and since $k(x_i, x_i)$ is constant and equal to c, the SVDD problem becomes
$$\min_{f \in \mathcal{H},\, \rho \in \mathbb{R},\, \xi \in \mathbb{R}^n} \; \tfrac{1}{2}\|f\|^2_{\mathcal{H}} - \rho + \tfrac{C}{2} \sum_{i=1}^n \xi_i \quad \text{with } f(x_i) \ge \rho - \tfrac{1}{2}\xi_i, \; \xi_i \ge 0, \; i = 1, \dots, n$$
leading to the classical one class SVM formulation (OCSVM)
$$\min_{f \in \mathcal{H},\, \rho \in \mathbb{R},\, \varepsilon \in \mathbb{R}^n} \; \tfrac{1}{2}\|f\|^2_{\mathcal{H}} - \rho + C \sum_{i=1}^n \varepsilon_i \quad \text{with } f(x_i) \ge \rho - \varepsilon_i, \; \varepsilon_i \ge 0, \; i = 1, \dots, n$$
with $\varepsilon_i = \tfrac{1}{2}\xi_i$. Note that by putting $\nu = \tfrac{1}{nC}$ we can get the so-called ν-formulation of the OCSVM
$$\min_{f' \in \mathcal{H},\, \rho' \in \mathbb{R},\, \xi' \in \mathbb{R}^n} \; \tfrac{1}{2}\|f'\|^2_{\mathcal{H}} - n \nu \rho' + \sum_{i=1}^n \xi'_i \quad \text{with } f'(x_i) \ge \rho' - \xi'_i, \; \xi'_i \ge 0, \; i = 1, \dots, n$$
with $f' = C f$, $\rho' = C \rho$, and $\xi' = C \xi$.
Duality

Note that the dual of the SVDD is
$$\min_{\alpha \in \mathbb{R}^n} \; \alpha^\top G \alpha - \alpha^\top g \quad \text{with } \sum_{i=1}^n \alpha_i = 1, \; 0 \le \alpha_i \le C, \; i = 1, \dots, n$$
where G is the kernel matrix of general term $G_{i,j} = k(x_i, x_j)$ and g the diagonal vector such that $g_i = k(x_i, x_i) = c$. The dual of the OCSVM is the following equivalent QP
$$\min_{\alpha \in \mathbb{R}^n} \; \tfrac{1}{2} \alpha^\top G \alpha \quad \text{with } \sum_{i=1}^n \alpha_i = 1, \; 0 \le \alpha_i \le C, \; i = 1, \dots, n$$
Both dual forms provide the same solution α, but not the same Lagrange multipliers. ρ is the Lagrange multiplier of the equality constraint of the dual of the OCSVM, and $R = c + \alpha^\top G \alpha - 2\rho$. Using the SVDD dual, it turns out that $R = \lambda_{eq} + \alpha^\top G \alpha$, where $\lambda_{eq}$ is the Lagrange multiplier of the equality constraint of the SVDD dual form.
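The equivalence of the two duals is easy to check numerically: with a constant-diagonal kernel, α⊤g is constant on the simplex, so the SVDD dual and the OCSVM dual differ only by an additive constant and a factor of two, and share the same minimizer. A quick hedged check (Gaussian kernel, random data, scipy's SLSQP — all our own choices):

```python
import numpy as np
from scipy.optimize import minimize

def solve_qp(obj, n, C):
    """min obj(a)  s.t.  sum(a) = 1, 0 <= a_i <= C."""
    return minimize(obj, np.full(n, 1.0 / n),
                    bounds=[(0, C)] * n,
                    constraints=({'type': 'eq',
                                  'fun': lambda a: a.sum() - 1},)).x

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
G = np.exp(-d2 / 2)                      # RBF kernel: G_ii = 1 for all i
g = np.diag(G)
C = 0.5
a_svdd = solve_qp(lambda a: a @ G @ a - a @ g, 8, C)      # SVDD dual
a_ocsvm = solve_qp(lambda a: 0.5 * a @ G @ a, 8, C)       # OCSVM dual
```

Since the RBF Gram matrix of distinct points is positive definite, each dual has a unique minimizer, and the two should coincide.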
Plan
1 Support Vector Data Description (SVDD)
SVDD, the smallest enclosing ball problem
The minimum enclosing ball problem with errors
The minimum enclosing ball problem in a RKHS
The two class Support vector data description (SVDD)
The two class Support vector data description (SVDD)

[Figure: two-class data, with the SVDD ball enclosing the positive class]

$$\min_{c, R, \xi^+, \xi^-} \; R^2 + C \Big( \sum_{y_i = 1} \xi_i^+ + \sum_{y_i = -1} \xi_i^- \Big)$$
with $\|x_i - c\|^2 \le R^2 + \xi_i^+$, $\xi_i^+ \ge 0$ for i such that $y_i = 1$, and $\|x_i - c\|^2 \ge R^2 - \xi_i^-$, $\xi_i^- \ge 0$ for i such that $y_i = -1$.
The two class SVDD as a QP

$$\min_{c, R, \xi^+, \xi^-} \; R^2 + C \Big( \sum_{y_i = 1} \xi_i^+ + \sum_{y_i = -1} \xi_i^- \Big)$$
with $\|x_i - c\|^2 \le R^2 + \xi_i^+$, $\xi_i^+ \ge 0$ for i such that $y_i = 1$, and $\|x_i - c\|^2 \ge R^2 - \xi_i^-$, $\xi_i^- \ge 0$ for i such that $y_i = -1$.

Expanding the squares, the constraints read:
- $\|x_i\|^2 - 2 x_i^\top c + \|c\|^2 \le R^2 + \xi_i^+$, $\xi_i^+ \ge 0$ for i such that $y_i = 1$
- $\|x_i\|^2 - 2 x_i^\top c + \|c\|^2 \ge R^2 - \xi_i^-$, $\xi_i^- \ge 0$ for i such that $y_i = -1$

that is:
- $2 x_i^\top c \ge \|c\|^2 - R^2 + \|x_i\|^2 - \xi_i^+$, $\xi_i^+ \ge 0$ for i such that $y_i = 1$
- $-2 x_i^\top c \ge -\|c\|^2 + R^2 - \|x_i\|^2 - \xi_i^-$, $\xi_i^- \ge 0$ for i such that $y_i = -1$

or, as a single family of constraints:
$$2 y_i x_i^\top c \ge y_i \big( \|c\|^2 - R^2 + \|x_i\|^2 \big) - \xi_i, \quad \xi_i \ge 0, \; i = 1, \dots, n$$

Change of variable $\rho = \|c\|^2 - R^2$:
$$\min_{c, \rho, \xi} \; \|c\|^2 - \rho + C \sum_{i=1}^n \xi_i \quad \text{with } 2 y_i x_i^\top c \ge y_i \big( \rho + \|x_i\|^2 \big) - \xi_i \text{ and } \xi_i \ge 0, \; i = 1, \dots, n$$
The dual of the two class SVDD

With $G_{ij} = y_i y_j x_i^\top x_j$, the dual formulation is:
$$\min_{\alpha \in \mathbb{R}^n} \; \alpha^\top G \alpha - \sum_{i=1}^n \alpha_i y_i \|x_i\|^2 \quad \text{with } \sum_{i=1}^n y_i \alpha_i = 1 \text{ and } 0 \le \alpha_i \le C, \; i = 1, \dots, n$$
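The two-class dual can be sketched the same way (scipy's SLSQP and the toy data are our own choices): solve for α, recover c = Σ α_i y_i x_i, read ρ off a free support vector via 2x_i⊤c − ‖x_i‖² = ρ, and then R² = ‖c‖² − ρ:

```python
import numpy as np
from scipy.optimize import minimize

def two_class_svdd(X, y, C=1.0):
    """Two-class SVDD dual:  min_a a'Ga - sum_i a_i y_i ||x_i||^2
    with y'a = 1, 0 <= a_i <= C and G_ij = y_i y_j x_i'x_j."""
    n = X.shape[0]
    Yx = y[:, None] * X
    G = Yx @ Yx.T                            # G_ij = y_i y_j x_i'x_j
    f = (X * X).sum(axis=1) * y
    a0 = np.clip(y, 0, 1)
    a0 = a0 / a0.sum()                       # feasible start: y'a0 = 1
    a = minimize(lambda a: a @ G @ a - a @ f, a0,
                 bounds=[(0, C)] * n,
                 constraints=({'type': 'eq', 'fun': lambda a: a @ y - 1},)).x
    c = X.T @ (a * y)
    i = int(np.argmax((a > 1e-4) & (a < C - 1e-4)))   # a free support vector
    rho = 2 * X[i] @ c - X[i] @ X[i]
    return c, np.sqrt(c @ c - rho)
```

On a symmetric toy set (positives on the unit circle, negatives on a circle of radius 3) the ball should shrink onto the positive class: center near 0, radius near 1.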
The two class SVDD vs. one class SVDD
The two class SVDD (left) vs. the one class SVDD (right)
Small Sphere and Large Margin (SSLM) approach

Support vector data description with margin [Wu and Ye, 2009]:
$$\min_{c, R, \xi} \; R^2 + C \Big( \sum_{y_i = 1} \xi_i^+ + \sum_{y_i = -1} \xi_i^- \Big)$$
with $\|x_i - c\|^2 \le R^2 - 1 + \xi_i^+$, $\xi_i^+ \ge 0$ for i such that $y_i = 1$, and $\|x_i - c\|^2 \ge R^2 + 1 - \xi_i^-$, $\xi_i^- \ge 0$ for i such that $y_i = -1$.

Since $\|x_i - c\|^2 \ge R^2 + 1 - \xi_i^-$ with $y_i = -1$ is $y_i \|x_i - c\|^2 \le y_i R^2 - 1 + \xi_i^-$, the Lagrangian is
$$L(c, R, \xi, \alpha, \beta) = R^2 + C \sum_{i=1}^n \xi_i + \sum_{i=1}^n \alpha_i \big( y_i \|x_i - c\|^2 - y_i R^2 + 1 - \xi_i \big) - \sum_{i=1}^n \beta_i \xi_i$$
SVDD with margin – dual formulation

$$L(c, R, \xi, \alpha, \beta) = R^2 + C \sum_{i=1}^n \xi_i + \sum_{i=1}^n \alpha_i \big( y_i \|x_i - c\|^2 - y_i R^2 + 1 - \xi_i \big) - \sum_{i=1}^n \beta_i \xi_i$$

Optimality: $c = \sum_{i=1}^n \alpha_i y_i x_i$; $\sum_{i=1}^n \alpha_i y_i = 1$; $0 \le \alpha_i \le C$.

$$L(\alpha) = \sum_{i=1}^n \alpha_i y_i \Big\| x_i - \sum_{j=1}^n \alpha_j y_j x_j \Big\|^2 + \sum_{i=1}^n \alpha_i = -\sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j x_j^\top x_i + \sum_{i=1}^n \|x_i\|^2 y_i \alpha_i + \sum_{i=1}^n \alpha_i$$

The dual SSLM is also a quadratic program:
$$\min_{\alpha \in \mathbb{R}^n} \; \alpha^\top G \alpha - e^\top \alpha - f^\top \alpha \quad \text{with } y^\top \alpha = 1 \text{ and } 0 \le \alpha_i \le C, \; i = 1, \dots, n$$
with G the symmetric n × n matrix such that $G_{ij} = y_i y_j x_j^\top x_i$ and $f_i = \|x_i\|^2 y_i$.
Conclusion

Applications:
- outlier detection
- change detection
- clustering
- large number of classes
- variable selection, …

A clear path:
- reformulation (to a standard problem)
- KKT
- dual
- bidual

A lot of variations:
- L2 SVDD
- two classes, non symmetric
- two classes in the symmetric case (SVM)
- the multi-class issue
- practical problems with translation invariant kernels
Bibliography

- Bo Liu, Yanshan Xiao, Longbing Cao, Zhifeng Hao, and Feiqi Deng. SVDD-based outlier detection on uncertain data. Knowledge and Information Systems, 34(3):597–618, 2013.
- B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2002.
- John Shawe-Taylor and Nello Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
- David M. J. Tax and Robert P. W. Duin. Support vector data description. Machine Learning, 54(1):45–66, 2004.
- Régis Vert and Jean-Philippe Vert. Consistency and convergence rates of one-class SVMs and related algorithms. Journal of Machine Learning Research, 7:817–854, 2006.
- Mingrui Wu and Jieping Ye. A small sphere and large margin approach for novelty detection using training data with outliers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11):2088–2092, 2009.
More Related Content

PDF
[DL輪読会]Generative Models of Visually Grounded Imagination
PDF
拡がるディープラーニングの活用
PDF
サポートベクターマシン(SVM)の数学をみんなに説明したいだけの会
PDF
Map reduce: beyond word count
PDF
はじパタ8章 svm
PDF
NeurIPS2021から見るメタ学習の研究動向 - 第83回人工知能セミナー (2022.3.7)「AIトレンド・トップカンファレンス報告会(NeurI...
PDF
Chapitre 4-Apprentissage non supervisé (1) (1).pdf
PDF
文献紹介:TSM: Temporal Shift Module for Efficient Video Understanding
[DL輪読会]Generative Models of Visually Grounded Imagination
拡がるディープラーニングの活用
サポートベクターマシン(SVM)の数学をみんなに説明したいだけの会
Map reduce: beyond word count
はじパタ8章 svm
NeurIPS2021から見るメタ学習の研究動向 - 第83回人工知能セミナー (2022.3.7)「AIトレンド・トップカンファレンス報告会(NeurI...
Chapitre 4-Apprentissage non supervisé (1) (1).pdf
文献紹介:TSM: Temporal Shift Module for Efficient Video Understanding

What's hot (20)

PDF
[DL輪読会] Learning from Simulated and Unsupervised Images through Adversarial T...
PDF
(DL hacks輪読) Deep Kalman Filters
PDF
辞書ベースのテキストマイニング ML-Ask, J-LIWC, J-MDFを例として
PPTX
[DL輪読会]相互情報量最大化による表現学習
PPTX
DLLab 異常検知ナイト 資料 20180214
PPTX
[卒論]眼球運動測定と眼球運動計測による問題解決プロセスの分析
PPTX
友人関係と感染症伝搬をネットワークで理解する
PDF
連続変量を含む相互情報量の推定
PDF
統計学勉強会#2
PDF
Word Tour: One-dimensional Word Embeddings via the Traveling Salesman Problem...
PDF
Introduction to statistics
PPTX
[DL輪読会]Xception: Deep Learning with Depthwise Separable Convolutions
PDF
最新の異常検知手法(NIPS 2018)
PDF
はじぱた7章F5up
PDF
Skip Connection まとめ(Neural Network)
PDF
2023/06/01 IoT ALGYAN ChatGPT研究会第9弾 資料
PDF
線形?非線形?
PPTX
[DL輪読会]医用画像解析におけるセグメンテーション
PDF
[第2版] Python機械学習プログラミング 第4章
PDF
XGBoostからNGBoostまで
[DL輪読会] Learning from Simulated and Unsupervised Images through Adversarial T...
(DL hacks輪読) Deep Kalman Filters
辞書ベースのテキストマイニング ML-Ask, J-LIWC, J-MDFを例として
[DL輪読会]相互情報量最大化による表現学習
DLLab 異常検知ナイト 資料 20180214
[卒論]眼球運動測定と眼球運動計測による問題解決プロセスの分析
友人関係と感染症伝搬をネットワークで理解する
連続変量を含む相互情報量の推定
統計学勉強会#2
Word Tour: One-dimensional Word Embeddings via the Traveling Salesman Problem...
Introduction to statistics
[DL輪読会]Xception: Deep Learning with Depthwise Separable Convolutions
最新の異常検知手法(NIPS 2018)
はじぱた7章F5up
Skip Connection まとめ(Neural Network)
2023/06/01 IoT ALGYAN ChatGPT研究会第9弾 資料
線形?非線形?
[DL輪読会]医用画像解析におけるセグメンテーション
[第2版] Python機械学習プログラミング 第4章
XGBoostからNGBoostまで
Ad

Viewers also liked (6)

PDF
Lecture10 outilier l0_svdd
PDF
Support Vector Machines for Classification
PDF
PPTX
Anomaly Detection using SIngle Class SVM with Gaussian Kernel
PPTX
ODSC - Neural Networks on AWS Lambda
PPTX
Human values & professional ethics
Lecture10 outilier l0_svdd
Support Vector Machines for Classification
Anomaly Detection using SIngle Class SVM with Gaussian Kernel
ODSC - Neural Networks on AWS Lambda
Human values & professional ethics
Ad

Similar to Lecture6 svdd (20)

PDF
Solutions for Problems from Applied Optimization by Ross Baldick
PDF
Trilinear embedding for divergence-form operators
PDF
Bayesian inference on mixtures
PDF
Empowering Fourier-based Pricing Methods for Efficient Valuation of High-Dime...
PDF
Positive and negative solutions of a boundary value problem for a fractional ...
PDF
A Szemeredi-type theorem for subsets of the unit cube
PDF
Solutions Manual for Calculus Early Transcendentals 10th Edition by Anton
PDF
Calculus Early Transcendentals 10th Edition Anton Solutions Manual
PDF
Practical and Worst-Case Efficient Apportionment
PDF
Hierarchical matrices for approximating large covariance matries and computin...
PDF
QMC Error SAMSI Tutorial Aug 2017
PDF
A Proof of the Generalized Riemann Hypothesis
PDF
A Proof of the Generalized Riemann Hypothesis
PDF
Classification with mixtures of curved Mahalanobis metrics
PDF
Solving integral equations on boundaries with corners, edges, and nearly sing...
PDF
Nonconvex Compressed Sensing with the Sum-of-Squares Method
PDF
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
PDF
IIT Jam math 2016 solutions BY Trajectoryeducation
PDF
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
PDF
Shell theory
Solutions for Problems from Applied Optimization by Ross Baldick
Trilinear embedding for divergence-form operators
Bayesian inference on mixtures
Empowering Fourier-based Pricing Methods for Efficient Valuation of High-Dime...
Positive and negative solutions of a boundary value problem for a fractional ...
A Szemeredi-type theorem for subsets of the unit cube
Solutions Manual for Calculus Early Transcendentals 10th Edition by Anton
Calculus Early Transcendentals 10th Edition Anton Solutions Manual
Practical and Worst-Case Efficient Apportionment
Hierarchical matrices for approximating large covariance matries and computin...
QMC Error SAMSI Tutorial Aug 2017
A Proof of the Generalized Riemann Hypothesis
A Proof of the Generalized Riemann Hypothesis
Classification with mixtures of curved Mahalanobis metrics
Solving integral equations on boundaries with corners, edges, and nearly sing...
Nonconvex Compressed Sensing with the Sum-of-Squares Method
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
IIT Jam math 2016 solutions BY Trajectoryeducation
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
Shell theory

More from Stéphane Canu (10)

PDF
Lecture 2: linear SVM in the Dual
PDF
Lecture8 multi class_svm
PDF
Lecture7 cross validation
PDF
Lecture5 kernel svm
PDF
Lecture4 kenrels functions_rkhs
PDF
Lecture3 linear svm_with_slack
PDF
Lecture 2: linear SVM in the dual
PDF
Lecture 1: linear SVM in the primal
PDF
Lecture9 multi kernel_svm
PDF
Main recsys factorisation
Lecture 2: linear SVM in the Dual
Lecture8 multi class_svm
Lecture7 cross validation
Lecture5 kernel svm
Lecture4 kenrels functions_rkhs
Lecture3 linear svm_with_slack
Lecture 2: linear SVM in the dual
Lecture 1: linear SVM in the primal
Lecture9 multi kernel_svm
Main recsys factorisation

Recently uploaded (20)

PDF
Encapsulation theory and applications.pdf
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PPTX
Spectroscopy.pptx food analysis technology
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
cuic standard and advanced reporting.pdf
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Approach and Philosophy of On baking technology
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PPTX
Cloud computing and distributed systems.
Encapsulation theory and applications.pdf
Per capita expenditure prediction using model stacking based on satellite ima...
Spectroscopy.pptx food analysis technology
20250228 LYD VKU AI Blended-Learning.pptx
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Building Integrated photovoltaic BIPV_UPV.pdf
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
cuic standard and advanced reporting.pdf
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Spectral efficient network and resource selection model in 5G networks
Approach and Philosophy of On baking technology
Programs and apps: productivity, graphics, security and other tools
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
The AUB Centre for AI in Media Proposal.docx
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Diabetes mellitus diagnosis method based random forest with bat algorithm
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Reach Out and Touch Someone: Haptics and Empathic Computing
MIND Revenue Release Quarter 2 2025 Press Release
Cloud computing and distributed systems.

Lecture6 svdd

  • 1. Lecture 6: Minimum encoding ball and Support vector data description (SVDD) Stéphane Canu stephane.canu@litislab.eu Sao Paulo 2014 May 12, 2014
  • 2. Plan 1 Support Vector Data Description (SVDD) SVDD, the smallest enclosing ball problem The minimum enclosing ball problem with errors The minimum enclosing ball problem in a RKHS The two class Support vector data description (SVDD)
  • 3. The minimum enclosing ball problem [Tax and Duin, 2004] Stéphane Canu (INSA Rouen - LITIS) May 12, 2014 3 / 35
  • 4. The minimum enclosing ball problem [Tax and Duin, 2004] the center Stéphane Canu (INSA Rouen - LITIS) May 12, 2014 3 / 35
  • 5. The minimum enclosing ball problem [Tax and Duin, 2004] the radius Given n points, {xi , i = 1, n} . min R∈IR,c∈IRd R2 with xi − c 2 ≤ R2 , i = 1, . . . , n What is that in the convex programming hierarchy? LP, QP, QCQP, SOCP and SDP Stéphane Canu (INSA Rouen - LITIS) May 12, 2014 3 / 35
  • 6. The convex programming hierarchy (part of) LP    min x f x with Ax ≤ d and 0 ≤ x QP min x 1 2 x Gx + f x with Ax ≤ d QCQP    min x 1 2 x Gx + f x with x Bi x + ai x ≤ di i = 1, n SOCP    min x f x with x − ai ≤ bi x + di i = 1, n The convex programming hierarchy? Model generality: LP < QP < QCQP < SOCP < SDP Stéphane Canu (INSA Rouen - LITIS) May 12, 2014 4 / 35
  • 7. MEB as a QP in the primal Theorem (MEB as a QP) The two following problems are equivalent, min R∈IR,c∈IRd R2 with xi − c 2 ≤ R2 , i = 1, . . . , n min w,ρ 1 2 w 2 − ρ with w xi ≥ ρ + 1 2 xi 2 with ρ = 1 2 ( c 2 − R2 ) and w = c. Proof: xi − c 2 ≤ R2 xi 2 − 2xi c + c 2 ≤ R2 −2xi c ≤ R2 − xi 2 − c 2 2xi c ≥ −R2 + xi 2 + c 2 xi c ≥ 1 2 ( c 2 − R2 ) ρ +1 2 xi 2 Stéphane Canu (INSA Rouen - LITIS) May 12, 2014 5 / 35
  • 8. MEB and the one class SVM SVDD: min w,ρ 1 2 w 2 − ρ with w xi ≥ ρ + 1 2 xi 2 SVDD and linear OCSVM (Supporting Hyperplane) if ∀i = 1, n, xi 2 = constant, it is the the linear one class SVM (OC SVM) The linear one class SVM [Schölkopf and Smola, 2002] min w,ρ 1 2 w 2 − ρ with w xi ≥ ρ with ρ = ρ + 1 2 xi 2 ⇒ OC SVM is a particular case of SVDD Stéphane Canu (INSA Rouen - LITIS) May 12, 2014 6 / 35
  • 9. When ∀i = 1, n, xi 2 = 1 0 c xi − c 2 ≤ R2 ⇔ w xi ≥ ρ with ρ = 1 2 ( c 2 − R + 1) SVDD and OCSVM "Belonging to the ball" is also "being above" an hyperplane Stéphane Canu (INSA Rouen - LITIS) May 12, 2014 7 / 35
  • 10. MEB: KKT L(c, R, α) = R2 + n i=1 αi xi − c 2 − R2 KKT conditionns : stationarty 2c n i=1 αi − 2 n i=1 αi xi = 0 ← The representer theorem 1 − n i=1 αi = 0 primal admiss. xi − c 2 ≤ R2 dual admiss. αi ≥ 0 i = 1, n complementarity αi xi − c 2 − R2 = 0 i = 1, n Stéphane Canu (INSA Rouen - LITIS) May 12, 2014 8 / 35
  • 11. MEB: KKT the radius L(c, R, α) = R2 + n i=1 αi xi − c 2 − R2 KKT conditionns : stationarty 2c n i=1 αi − 2 n i=1 αi xi = 0 ← The representer theorem 1 − n i=1 αi = 0 primal admiss. xi − c 2 ≤ R2 dual admiss. αi ≥ 0 i = 1, n complementarity αi xi − c 2 − R2 = 0 i = 1, n Complementarity tells us: two groups of points the support vectors xi − c 2 = R2 and the insiders αi = 0 Stéphane Canu (INSA Rouen - LITIS) May 12, 2014 8 / 35
  • 12. MEB: Dual The representer theorem: c = n i=1 αi xi n i=1 αi = n i=1 αi xi L(α) = n i=1 αi xi − n j=1 αj xj 2 n i=1 n j=1 αi αj xi xj = α Gα and n i=1 αi xi xi = α diag(G) with G = XX the Gram matrix: Gij = xi xj ,    min α∈IRn α Gα − α diag(G) with e α = 1 and 0 ≤ αi , i = 1 . . . n Stéphane Canu (INSA Rouen - LITIS) May 12, 2014 9 / 35
  • 13. SVDD primal vs. dual Primal    min R∈IR,c∈IRd R2 with xi − c 2 ≤ R2, i = 1, . . . , n d + 1 unknown n constraints can be recast as a QP perfect when d << n Dual    min α α Gα − α diag(G) with e α = 1 and 0 ≤ αi , i = 1 . . . n n unknown with G the pairwise influence Gram matrix n box constraints easy to solve to be used when d > n
  • 14. SVDD primal vs. dual Primal    min R∈IR,c∈IRd R2 with xi − c 2 ≤ R2, i = 1, . . . , n d + 1 unknown n constraints can be recast as a QP perfect when d << n Dual    min α α Gα − α diag(G) with e α = 1 and 0 ≤ αi , i = 1 . . . n n unknown with G the pairwise influence Gram matrix n box constraints easy to solve to be used when d > n But where is R2?
  • 15. Looking for R2 min α α Gα − α diag(G) with e α = 1, 0 ≤ αi , i = 1, n The Lagrangian: L(α, µ, β) = α Gα − α diag(G) + µ(e α − 1) − β α Stationarity cond.: αL(α, µ, β) = 2Gα − diag(G) + µe − β = 0 The bi dual min α α Gα + µ with −2Gα + diag(G) ≤ µe by identification R2 = µ + α Gα = µ + c 2 µ is the Lagrange multiplier associated with the equality constraint n i=1 αi = 1 Also, because of the complementarity condition, if xi is a support vector, then βi = 0 implies αi > 0 and R2 = xi − c 2 .
  • 16. Plan 1 Support Vector Data Description (SVDD) SVDD, the smallest enclosing ball problem The minimum enclosing ball problem with errors The minimum enclosing ball problem in a RKHS The two class Support vector data description (SVDD) Stéphane Canu (INSA Rouen - LITIS) May 12, 2014 12 / 35
  • 17. The minimum enclosing ball problem with errors the slack The same road map: initial formuation reformulation (as a QP) Lagrangian, KKT dual formulation bi dual Initial formulation: for a given C    min R,a,ξ R2 + C n i=1 ξi with xi − c 2 ≤ R2 + ξi , i = 1, . . . , n and ξi ≥ 0, i = 1, . . . , n Stéphane Canu (INSA Rouen - LITIS) May 12, 2014 13 / 35
  • 18. The MEB with slack: QP, KKT, dual and R2 SVDD as a QP:    min w,ρ 1 2 w 2 − ρ + C 2 n i=1 ξi with w xi ≥ ρ + 1 2 xi 2 − 1 2 ξi and ξi ≥ 0, i = 1, n again with OC SVM as a particular case. With G = XX Dual SVDD:    min α α Gα − α diag(G) with e α = 1 and 0 ≤ αi ≤ C, i = 1, n for a given C ≤ 1. If C is larger than one it is useless (it’s the no slack case) R2 = µ + c c with µ denoting the Lagrange multiplier associated with the equality constraint n i=1 αi = 1.
  • 19. Variations over SVDD Adaptive SVDD: the weighted error case for given wi , i = 1, n    min c∈IRp,R∈IR,ξ∈IRn R + C n i=1 wi ξi with xi − c 2 ≤ R+ξi ξi ≥ 0 i = 1, n The dual of this problem is a QP [see for instance Liu et al., 2013] min α∈IRn α XX α − α diag(XX ) with n i=1 αi = 1 0 ≤ αi ≤ Cwi i = 1, n Density induced SVDD (D-SVDD):    min c∈IRp,R∈IR,ξ∈IRn R + C n i=1 ξi with wi xi − c 2 ≤ R+ξi ξi ≥ 0 i = 1, n
  • 20. Plan 1 Support Vector Data Description (SVDD) SVDD, the smallest enclosing ball problem The minimum enclosing ball problem with errors The minimum enclosing ball problem in a RKHS The two class Support vector data description (SVDD) Stéphane Canu (INSA Rouen - LITIS) May 12, 2014 16 / 35
  • 21. SVDD in a RKHS The feature map: IRp −→ H c −→ f (•) xi −→ k(xi , •) xi − c IRp ≤ R2 −→ k(xi , •) − f (•) 2 H ≤ R2 Kernelized SVDD (in a RKHS) is also a QP    min f ∈H,R∈IR,ξ∈IRn R2 + C n i=1 ξi with k(xi , •) − f (•) 2 H ≤ R2 +ξi i = 1, n ξi ≥ 0 i = 1, n
  • 22. SVDD in a RKHS: KKT, Dual and R2 L = R2 + C n i=1 ξi + n i=1 αi k(xi , .) − f (.) 2 H − R2 −ξi − n i=1 βi ξi = R2 + C n i=1 ξi + n i=1 αi k(xi , xi ) − 2f (xi ) + f 2 H − R2 −ξi − n i=1 βi ξi KKT conditions Stationarity 2f (.) n i=1 αi − 2 n i=1 αi k(., xi ) = 0 ← The representer theorem 1 − n i=1 αi = 0 C − αi − βi = 0 Primal admissibility: k(xi , .) − f (.) 2 ≤ R2 + ξi , ξi ≥ 0 Dual admissibility: αi ≥ 0 , βi ≥ 0 Complementarity αi k(xi , .) − f (.) 2 − R2 − ξi = 0 βi ξi = 0
  • 23. SVDD in a RKHS: Dual and R2 L(α) = n i=1 αi k(xi , xi ) − 2 n i=1 f (xi ) + f 2 H with f (.) = n j=1 αj k(., xj ) = n i=1 αi k(xi , xi ) − n i=1 n j=1 αi αj k(xi , xj ) Gij Gij = k(xi , xj )    min α α Gα − α diag(G) with e α = 1 and 0 ≤ αi ≤ C, i = 1 . . . n As it is in the linear case: R2 = µ + f 2 H with µ denoting the Lagrange multiplier associated with the equality constraint n i=1 αi = 1.
  • 24. SVDD train and val in a RKHS Train using the dual form (in: G, C; out: α, µ)    min α α Gα − α diag(G) with e α = 1 and 0 ≤ αi ≤ C, i = 1 . . . n Val with the center in the RKHS: f (.) = n i=1 αi k(., xi ) φ(x) = k(x, .) − f (.) 2 H − R2 = k(x, .) 2 H − 2 k(x, .), f (.) H + f (.) 2 H − R2 = k(x, x) − 2f (x) + R2 − µ − R2 = −2f (x) + k(x, x) − µ = −2 n i=1 αi k(x, xi ) + k(x, x) − µ φ(x) = 0 is the decision border Stéphane Canu (INSA Rouen - LITIS) May 12, 2014 20 / 35
  • 25. An important theoretical result For a well-calibrated bandwidth, The SVDD estimates the underlying distribution level set [Vert and Vert, 2006] The level sets of a probability density function IP(x) are the set Cp = {x ∈ IRd | IP(x) ≥ p} It is well estimated by the empirical minimum volume set Vp = {x ∈ IRd | k(x, .) − f (.) 2 H − R2 ≥ 0} The frontiers coincides
  • 26. SVDD: the generalization error For a well-calibrated bandwidth, (x1, . . . , xn) i.i.d. from some fixed but unknown IP(x) Then [Shawe-Taylor and Cristianini, 2004] with probability at least 1 − δ, (∀δ ∈]0, 1[), for any margin m > 0 IP k(x, .) − f (.) 2 H ≥ R2 + m ≤ 1 mn n i=1 ξi + 6R2 m √ n + 3 ln(2/δ) 2n
  • 27. Equivalence between SVDD and OCSVM for translation invariant kernels (diagonal constant kernels) Theorem Let H be a RKHS on some domain X endowed with kernel k. If there exists some constant c such that ∀x ∈ X, k(x, x) = c, then the two following problems are equivalent,    min f ,R,ξ R + C n i=1 ξi with k(xi , .) − f (.) 2 H ≤ R+ξi ξi ≥ 0 i = 1, n    min f ,ρ,ξ 1 2 f 2 H − ρ + C n i=1 εi with f (xi ) ≥ ρ − εi εi ≥ 0 i = 1, n with ρ = 1 2(c + f 2 H − R) and εi = 1 2ξi .
• 28. Proof of the equivalence between SVDD and OCSVM

    min_{f∈H, R∈IR, ξ∈IRⁿ}  R + C Σi ξi
    with  ‖k(xi, ·) − f(·)‖²_H ≤ R + ξi,  ξi ≥ 0,  i = 1, n.

Since ‖k(xi, ·) − f(·)‖²_H = k(xi, xi) + ‖f‖²_H − 2f(xi), this is

    min_{f,R,ξ}  R + C Σi ξi
    with  2f(xi) ≥ k(xi, xi) + ‖f‖²_H − R − ξi,  ξi ≥ 0,  i = 1, n.

Introducing ρ = ½ (c + ‖f‖²_H − R), that is R = c + ‖f‖²_H − 2ρ, and since k(xi, xi) is constant and equal to c, the SVDD problem becomes

    min_{f,ρ,ξ}  ½ ‖f‖²_H − ρ + (C/2) Σi ξi
    with  f(xi) ≥ ρ − ½ ξi,  ξi ≥ 0,  i = 1, n
• 29. …leading to the classical one-class SVM formulation (OCSVM)

    min_{f∈H, ρ∈IR, ε∈IRⁿ}  ½ ‖f‖²_H − ρ + C Σi εi
    with  f(xi) ≥ ρ − εi,  εi ≥ 0,  i = 1, n

with εi = ½ ξi. Note that by putting ν = 1/(nC) we obtain the so-called ν-formulation of the OCSVM

    min_{f′∈H, ρ′∈IR, ξ′∈IRⁿ}  ½ ‖f′‖²_H − nνρ′ + Σi ξ′i
    with  f′(xi) ≥ ρ′ − ξ′i,  ξ′i ≥ 0,  i = 1, n

with f′ = Cf, ρ′ = Cρ, and ξ′ = Cε.
• 30. Duality

The dual of the SVDD is

    min_{α∈IRⁿ}  αᵀGα − αᵀg
    with  Σi αi = 1,  0 ≤ αi ≤ C,  i = 1, n

where G is the kernel matrix of general term Gi,j = k(xi, xj) and g the diagonal vector such that gi = k(xi, xi) = c. The dual of the OCSVM is the following equivalent QP:

    min_{α∈IRⁿ}  ½ αᵀGα
    with  Σi αi = 1,  0 ≤ αi ≤ C,  i = 1, n

Both dual forms provide the same solution α, but not the same Lagrange multipliers. ρ is the Lagrange multiplier of the equality constraint of the OCSVM dual, and R = c + αᵀGα − 2ρ. Using the SVDD dual, it turns out that R = λeq + αᵀGα, where λeq is the Lagrange multiplier of the equality constraint of the SVDD dual form.
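This equivalence is easy to check numerically: on the feasible set, αᵀg = c·eᵀα = c is constant, so the two dual objectives differ only by a constant and a factor 2 and share their minimizer. A minimal sketch, assuming a Gaussian kernel (constant diagonal) and SLSQP as a stand-in QP solver; the helper `qp` and the toy data are ours.

```python
import numpy as np
from scipy.optimize import minimize

def qp(Q, l, C):
    """Sketch solver for  min_a a'Qa + l'a  s.t.  e'a = 1, 0 <= a_i <= C."""
    n = len(l)
    return minimize(lambda a: a @ Q @ a + l @ a,
                    np.full(n, 1.0 / n),
                    jac=lambda a: 2 * Q @ a + l,
                    method="SLSQP", bounds=[(0.0, C)] * n,
                    constraints={"type": "eq", "fun": lambda a: a.sum() - 1.0}).x

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 2))
G = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)  # RBF: diag(G) = e
a_svdd = qp(G, -np.diag(G), C=0.5)           # SVDD dual:  min a'Ga - a'g
a_ocsvm = qp(0.5 * G, np.zeros(15), C=0.5)   # OCSVM dual: min (1/2) a'Ga
# a_svdd and a_ocsvm agree up to solver tolerance
```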
• 31. Plan
1 Support Vector Data Description (SVDD)
    SVDD, the smallest enclosing ball problem
    The minimum enclosing ball problem with errors
    The minimum enclosing ball problem in a RKHS
    The two class Support vector data description (SVDD)
• 32. The two class Support vector data description (SVDD)

[figure: scatter plots of the two-class data with the enclosing ball]

    min_{c,R,ξ⁺,ξ⁻}  R² + C ( Σ_{yi=1} ξ⁺i + Σ_{yi=−1} ξ⁻i )
    with  ‖xi − c‖² ≤ R² + ξ⁺i,  ξ⁺i ≥ 0,  for i such that yi = 1
    and   ‖xi − c‖² ≥ R² − ξ⁻i,  ξ⁻i ≥ 0,  for i such that yi = −1
• 33. The two class SVDD as a QP

    min_{c,R,ξ⁺,ξ⁻}  R² + C ( Σ_{yi=1} ξ⁺i + Σ_{yi=−1} ξ⁻i )
    with  ‖xi − c‖² ≤ R² + ξ⁺i,  ξ⁺i ≥ 0,  for i such that yi = 1
    and   ‖xi − c‖² ≥ R² − ξ⁻i,  ξ⁻i ≥ 0,  for i such that yi = −1

Expanding the squares,

    ‖xi‖² − 2xiᵀc + ‖c‖² ≤ R² + ξ⁺i  for yi = 1
    ‖xi‖² − 2xiᵀc + ‖c‖² ≥ R² − ξ⁻i  for yi = −1

that is

    2xiᵀc ≥ ‖c‖² − R² + ‖xi‖² − ξ⁺i     for yi = 1
    −2xiᵀc ≥ −‖c‖² + R² − ‖xi‖² − ξ⁻i   for yi = −1

or, with a single slack variable ξi,

    2yi xiᵀc ≥ yi (‖c‖² − R² + ‖xi‖²) − ξi,  ξi ≥ 0,  i = 1, n.

Change of variable ρ = ‖c‖² − R²:

    min_{c,ρ,ξ}  ‖c‖² − ρ + C Σi ξi
    with  2yi xiᵀc ≥ yi (ρ + ‖xi‖²) − ξi  and  ξi ≥ 0,  i = 1, n
• 34. The dual of the two class SVDD

With Gij = yi yj xiᵀxj, the dual formulation is

    min_{α∈IRⁿ}  αᵀGα − Σi αi yi ‖xi‖²
    with  Σi yi αi = 1,  0 ≤ αi ≤ C,  i = 1, n
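A numerical sketch of this dual on a toy problem of our own construction (positives on the unit circle, negatives on a circle of radius 4; SLSQP standing in for a real QP solver). By symmetry the optimal center should sit at the origin, with the positives inside the ball and the negatives outside.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: positive class on the unit circle, negative class on radius 4
ang = np.array([0.0, 0.5, 1.0, 1.5]) * np.pi
Xp = np.c_[np.cos(ang), np.sin(ang)]
X = np.vstack([Xp, 4.0 * Xp])
y = np.r_[np.ones(4), -np.ones(4)]
n, C = len(X), 10.0

G = (y[:, None] * y[None, :]) * (X @ X.T)    # G_ij = y_i y_j x_i' x_j
lin = -y * (X**2).sum(1)                     # linear term: - sum_i a_i y_i ||x_i||^2
a0 = np.where(y > 0, 0.25, 0.0)              # feasible start: y'a = 1
res = minimize(lambda a: a @ G @ a + lin @ a, a0,
               jac=lambda a: 2 * G @ a + lin,
               method="SLSQP", bounds=[(0.0, C)] * n,
               constraints={"type": "eq", "fun": lambda a: a @ y - 1.0})
alpha = res.x
c = (alpha * y) @ X                          # center: c = sum_i alpha_i y_i x_i
```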
• 35. The two class SVDD vs. one class SVDD

[figure: the two class SVDD (left) vs. the one class SVDD (right)]
• 36. Small Sphere and Large Margin (SSLM) approach

Support vector data description with margin [Wu and Ye, 2009]:

    min_{c,R,ξ}  R² + C ( Σ_{yi=1} ξ⁺i + Σ_{yi=−1} ξ⁻i )
    with  ‖xi − c‖² ≤ R² − 1 + ξ⁺i,  ξ⁺i ≥ 0,  for i such that yi = 1
    and   ‖xi − c‖² ≥ R² + 1 − ξ⁻i,  ξ⁻i ≥ 0,  for i such that yi = −1

For yi = −1, ‖xi − c‖² ≥ R² + 1 − ξ⁻i ⇔ yi ‖xi − c‖² ≤ yi R² − 1 + ξ⁻i, so that

    L(c, R, ξ, α, β) = R² + C Σi ξi + Σi αi ( yi ‖xi − c‖² − yi R² + 1 − ξi ) − Σi βi ξi
• 37. SVDD with margin – dual formulation

    L(c, R, ξ, α, β) = R² + C Σi ξi + Σi αi ( yi ‖xi − c‖² − yi R² + 1 − ξi ) − Σi βi ξi

Optimality:  c = Σi αi yi xi ;  Σi αi yi = 1 ;  0 ≤ αi ≤ C.

    L(α) = Σi αi yi ‖xi − Σj αj yj xj‖² + Σi αi
         = − Σi Σj αi αj yi yj xjᵀxi + Σi αi yi ‖xi‖² + Σi αi

The dual SSLM is also a quadratic programming problem:

    min_{α∈IRⁿ}  αᵀGα − eᵀα − fᵀα
    with  yᵀα = 1  and  0 ≤ αi ≤ C,  i = 1, n

with G the symmetric n × n matrix of general term Gij = yi yj xjᵀxi and fi = yi ‖xi‖².
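The SSLM dual differs from the two class SVDD dual only through the extra linear term −eᵀα, which encodes the unit margin. A toy sketch (our own data and solver choice, not the paper's experiments): positives on the unit circle, negatives on a circle of radius 4. When the slacks vanish, the constraints ‖xi − c‖² ≤ R² − 1 and ‖xi − c‖² ≥ R² + 1 force the squared distances of the two classes to be separated by at least 2.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: positive class on the unit circle, negative class on radius 4
ang = np.array([0.0, 0.5, 1.0, 1.5]) * np.pi
Xp = np.c_[np.cos(ang), np.sin(ang)]
X = np.vstack([Xp, 4.0 * Xp])
y = np.r_[np.ones(4), -np.ones(4)]
n, C = len(X), 10.0

G = (y[:, None] * y[None, :]) * (X @ X.T)    # G_ij = y_i y_j x_j' x_i
f = y * (X**2).sum(1)                        # f_i = y_i ||x_i||^2
lin = -(np.ones(n) + f)                      # dual linear term: - e'a - f'a
a0 = np.where(y > 0, 0.25, 0.0)              # feasible start: y'a = 1
res = minimize(lambda a: a @ G @ a + lin @ a, a0,
               jac=lambda a: 2 * G @ a + lin,
               method="SLSQP", bounds=[(0.0, C)] * n,
               constraints={"type": "eq", "fun": lambda a: a @ y - 1.0})
alpha = res.x
c = (alpha * y) @ X                          # center: c = sum_i alpha_i y_i x_i
```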
• 38. Conclusion

Applications:
    outlier detection
    change detection
    clustering
    large number of classes
    variable selection, …

A clear path:
    reformulation (to a standard problem)
    KKT
    dual
    bidual

A lot of variations:
    L2 SVDD
    two non symmetric classes
    two symmetric classes (SVM)
    the multi-class issue
    practical problems with translation invariant kernels
• 39. Bibliography

Bo Liu, Yanshan Xiao, Longbing Cao, Zhifeng Hao, and Feiqi Deng. SVDD-based outlier detection on uncertain data. Knowledge and Information Systems, 34(3):597–618, 2013.
B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2002.
John Shawe-Taylor and Nello Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
David M. J. Tax and Robert P. W. Duin. Support vector data description. Machine Learning, 54(1):45–66, 2004.
Régis Vert and Jean-Philippe Vert. Consistency and convergence rates of one-class SVMs and related algorithms. Journal of Machine Learning Research, 7:817–854, 2006.
Mingrui Wu and Jieping Ye. A small sphere and large margin approach for novelty detection using training data with outliers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11):2088–2092, 2009.