4th International Summer School
Achievements and Applications of Contemporary
Informatics, Mathematics and Physics
National University of Technology of the Ukraine
Kiev, Ukraine, August 5-16, 2009



Classification Theory
Modelling of Kernel Machine by Infinite and Semi-Infinite Programming

Süreyya Özöğür-Akyüz, Gerhard-Wilhelm Weber *

Institute of Applied Mathematics, METU, Ankara, Turkey

* Faculty of Economics, Management Science and Law, University of Siegen, Germany
  Center for Research on Optimization and Control, University of Aveiro, Portugal



Motivation: Prediction of Cleavage Sites




[Figure: a protein sequence split into its signal part and its mature part, with the cleavage site between them to be predicted; a margin γ is indicated.]


Logistic Regression

$$\log\!\left(\frac{P(Y=1 \mid X=x_l)}{P(Y=0 \mid X=x_l)}\right) = \beta_0 + \beta_1 x_{l1} + \beta_2 x_{l2} + \cdots + \beta_p x_{lp} \qquad (l = 1, 2, \dots, N)$$
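As a minimal illustration (not from the original slides), such a model can be fit with scikit-learn; the synthetic data, the feature dimension p = 3, and all parameter values below are assumptions of this sketch. The fitted intercept_ and coef_ play the roles of β₀ and (β₁, …, β_p).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                         # N = 100 samples, p = 3 features
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(int)  # synthetic 0/1 labels

clf = LogisticRegression().fit(X, y)
print(clf.intercept_, clf.coef_)                      # estimates of beta_0 and (beta_1, ..., beta_p)
```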




Linear Classifiers

Maximum margin classifier:

$$\gamma_i := y_i \cdot (\langle w, x_i \rangle + b)$$

Note: γ_i > 0 implies correct classification.

[Figure: separating hyperplane with margin γ; the support vectors x_j and x_k lie on the margin boundaries, where y_j ⋅ (⟨w, x_j⟩ + b) = 1 and y_k ⋅ (⟨w, x_k⟩ + b) = 1.]
Linear Classifiers


•   The geometric margin:  γ = 2 / ‖w‖₂

Maximizing 2 / ‖w‖₂ is equivalent to minimizing ‖w‖₂² / 2, which gives a convex problem:

$$\min_{w,b}\ \frac{\|w\|_2^2}{2} \quad \text{subject to} \quad y_i \cdot (\langle w, x_i \rangle + b) \ge 1 \quad (i = 1, 2, \dots, l)$$
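A minimal sketch of this primal problem, assuming cvxpy is available; the two-blob toy data and all numbers are placeholders, not anything from the slides.

```python
import cvxpy as cp
import numpy as np

# toy linearly separable data: two Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)

w = cp.Variable(2)
b = cp.Variable()
# min ||w||^2 / 2  subject to  y_i (<w, x_i> + b) >= 1
problem = cp.Problem(cp.Minimize(cp.sum_squares(w) / 2),
                     [cp.multiply(y, X @ w + b) >= 1])
problem.solve()
print("w =", w.value, " b =", b.value)
```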



Linear Classifiers



Dual Problem:

$$\max_{\alpha}\ \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle$$

$$\text{subject to} \quad \sum_{i=1}^{l} y_i \alpha_i = 0, \qquad \alpha_i \ge 0 \quad (i = 1, 2, \dots, l).$$



Linear Classifiers



Dual Problem (the inner product replaced by a kernel function):

$$\max_{\alpha}\ \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j\, \kappa(x_i, x_j)$$

$$\text{subject to} \quad \sum_{i=1}^{l} y_i \alpha_i = 0, \qquad \alpha_i \ge 0 \quad (i = 1, 2, \dots, l).$$
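A hedged sketch of solving this kernelized dual with cvxpy; the Gaussian-kernel helper, the toy data, and the small diagonal ridge (added only to keep the quadratic form numerically positive semidefinite) are assumptions of this example.

```python
import cvxpy as cp
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    # K_ij = exp(-||x_i - x_j||^2 / sigma^2)
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / sigma ** 2)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, (15, 2)), rng.normal(1, 1, (15, 2))])
y = np.array([-1.0] * 15 + [1.0] * 15)
l = len(y)

K = gaussian_kernel(X)
Q = np.outer(y, y) * K + 1e-8 * np.eye(l)    # ridge keeps Q numerically PSD

alpha = cp.Variable(l)
problem = cp.Problem(cp.Maximize(cp.sum(alpha) - 0.5 * cp.quad_form(alpha, Q)),
                     [y @ alpha == 0, alpha >= 0])
problem.solve()
print(np.round(alpha.value, 3))              # nonzero alpha_i mark support vectors
```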



Linear Classifiers
Soft Margin Classifier:

•    Introduce slack variables ξ_i to allow the margin constraints to be violated:

$$\min_{\xi,w,b}\ \frac{\|w\|_2^2}{2} + C \sum_{i=1}^{l} \xi_i^2$$

$$\text{subject to} \quad y_i \cdot (\langle w, x_i \rangle + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0 \quad (i = 1, 2, \dots, l)$$

Linear Classifiers

• Projection of the data into a higher dimensional feature space.

• Mapping the input space X into a new space F :

$$x = (x_1, \dots, x_n) \mapsto \phi(x) = (\phi_1(x), \dots, \phi_N(x))$$

[Figure: input points mapped by φ into the feature space F, where the two classes of mapped points become linearly separable.]



Nonlinear Classifiers

set of hypotheses:     $$f(x) = \sum_{i=1}^{N} w_i \phi_i(x) + b,$$

dual representation:   $$f(x) = \sum_{i=1}^{l} \alpha_i y_i \langle \phi(x_i), \phi(x) \rangle + b,$$

where the inner product ⟨φ(x_i), φ(x)⟩ is the kernel function.

Ex.:   polynomial kernels      κ(x, z) = (1 + xᵀz)^k
       sigmoid kernel          κ(x, z) = tanh(a xᵀz + b)
       Gaussian (RBF) kernel   κ(x, z) = exp(−‖x − z‖₂² / σ²)
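A small sketch of the three example kernels in NumPy; the parameter values (k, a, b, σ) are arbitrary placeholders.

```python
import numpy as np

def poly_kernel(x, z, k=3):
    return (1 + x @ z) ** k

def sigmoid_kernel(x, z, a=0.5, b=-1.0):
    return np.tanh(a * (x @ z) + b)

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(poly_kernel(x, z), sigmoid_kernel(x, z), rbf_kernel(x, z))
```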




(In-) Finite Kernel Learning

•       Based on the motivation of multiple kernel learning (MKL):

$$\kappa(x_i, x_j) = \sum_{k=1}^{K} \beta_k\, \kappa_k(x_i, x_j),$$

with kernel functions κ_k(⋅,⋅) and weights β_k ≥ 0 (k = 1, …, K), ∑_{k=1}^{K} β_k = 1.

•       Semi-infinite LP formulation:

(SILP MKL)
$$\max_{\theta,\beta}\ \theta \qquad (\theta \in \mathbb{R},\ \beta \in \mathbb{R}^K)$$
$$\text{such that} \quad 0 \le \beta, \quad \sum_{k=1}^{K} \beta_k = 1,$$
$$\sum_{k=1}^{K} \beta_k S_k(\alpha) \ge \theta \quad \forall \alpha \in \mathbb{R}^l \ \text{with} \ 0 \le \alpha \le C\mathbf{1} \ \text{and} \ \sum_{i=1}^{l} \alpha_i y_i = 0,$$

where
$$S_k(\alpha) := \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j\, \kappa_k(x_i, x_j) - \sum_{i=1}^{l} \alpha_i.$$
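A minimal sketch, assuming precomputed kernel matrices K_k: helpers for the combined kernel ∑_k β_k κ_k and for S_k(α). In practice the infinitely many constraints are typically handled by an exchange (column-generation) loop that alternates between solving the SVM dual for the current β to find a violating α and re-solving the LP in (θ, β); that loop is not shown here.

```python
import numpy as np

def combined_kernel(betas, kernel_mats):
    # kappa = sum_k beta_k kappa_k, with beta_k >= 0 summing to 1
    return sum(b * K for b, K in zip(betas, kernel_mats))

def S_k(alpha, y, K_k):
    # S_k(alpha) = 1/2 sum_ij alpha_i alpha_j y_i y_j kappa_k(x_i, x_j) - sum_i alpha_i
    ay = alpha * y
    return 0.5 * ay @ K_k @ ay - alpha.sum()
```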
Infinite Kernel Learning: Infinite Programming

ex.:
$$\kappa(x_i, x_j, \omega) := \omega \exp\!\left(-\omega^* \|x_i - x_j\|_2^2\right) + (1 - \omega)\,(1 + x_i^T x_j)^d$$

H(ω) := κ(x_i, x_j, ω) is a homotopy between

$$H(0) = (1 + x_i^T x_j)^d \qquad \text{and} \qquad H(1) = \exp\!\left(-\omega^* \|x_i - x_j\|_2^2\right).$$

$$\kappa_\beta(x_i, x_j) := \int_\Omega \kappa(x_i, x_j, \omega)\, d\beta(\omega) \qquad \longrightarrow \qquad \text{Infinite Programming}$$
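A sketch of this kernel family and of κ_β for a discrete measure β (point masses b_k at grid points ω_k); the defaults ω* = 1 and d = 2 are assumptions of this example.

```python
import numpy as np

def homotopy_kernel(xi, xj, omega, omega_star=1.0, d=2):
    # kappa(x_i, x_j, omega): blends a Gaussian and a polynomial kernel
    gauss = np.exp(-omega_star * np.sum((xi - xj) ** 2))
    poly = (1 + xi @ xj) ** d
    return omega * gauss + (1 - omega) * poly

def kappa_beta(xi, xj, omegas, weights):
    # integral over Omega approximated by a discrete measure: sum_k b_k kappa(., ., omega_k)
    return sum(w * homotopy_kernel(xi, xj, om) for om, w in zip(omegas, weights))
```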
Infinite Kernel Learning: Infinite Programming

•   Introducing Riemann-Stieltjes integrals to the problem (SILP-MKL),
    we get the following general problem formulation:

$$\kappa_\beta(x_i, x_j) = \int_\Omega \kappa(x_i, x_j, \omega)\, d\beta(\omega), \qquad \Omega = [0,1]$$




Infinite Kernel Learning: Infinite Programming

•    Introducing Riemann-Stieltjes integrals to the problem (SILP-MKL),
     we get the following general problem formulation:

(IP)
$$\max_{\theta,\beta}\ \theta \qquad (\theta \in \mathbb{R},\ \beta : [0,1] \to \mathbb{R} \ \text{monotonically increasing})$$
$$\text{subject to} \quad \int_0^1 d\beta(\omega) = 1,$$
$$\int_\Omega \left( \frac{1}{2} S(\omega, \alpha) - \sum_{i=1}^{l} \alpha_i \right) d\beta(\omega) \ge \theta \quad \forall \alpha \in \mathbb{R}^l \ \text{with} \ 0 \le \alpha \le C\mathbf{1},\ \sum_{i=1}^{l} \alpha_i y_i = 0,$$

where
$$S(\omega, \alpha) := \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j\, \kappa(x_i, x_j, \omega), \qquad T(\omega, \alpha) := \frac{1}{2} S(\omega, \alpha) - \sum_{i=1}^{l} \alpha_i,$$
$$A := \left\{ \alpha \in \mathbb{R}^l \ \middle|\ 0 \le \alpha \le C\mathbf{1} \ \text{and} \ \sum_{i=1}^{l} \alpha_i y_i = 0 \right\}.$$
Infinite Kernel Learning: Infinite Programming

(IP)
$$\max_{\theta,\beta}\ \theta \qquad (\theta \in \mathbb{R},\ \beta : \text{a positive measure on } \Omega)$$
$$\text{such that} \quad \theta - \int_\Omega T(\omega, \alpha)\, d\beta(\omega) \le 0 \ \ \forall \alpha \in A, \qquad \int_\Omega d\beta(\omega) = 1. \qquad \text{(infinite programming)}$$

dual of (IP):

(DIP)
$$\min_{\sigma,\rho}\ \sigma \qquad (\sigma \in \mathbb{R},\ \rho : \text{a positive measure on } A)$$
$$\text{such that} \quad \sigma - \int_A T(\omega, \alpha)\, d\rho(\alpha) \ge 0 \ \ \forall \omega \in \Omega, \qquad \int_A d\rho(\alpha) = 1.$$

•    Duality Conditions: Let (θ, β) and (σ, ρ) be feasible for their respective problems and complementarily slack, i.e.,
     β has measure only where σ = ∫_A T(ω, α) dρ(α), and
     ρ has measure only where θ = ∫_Ω T(ω, α) dβ(ω).

     Then both solutions are optimal for their respective problems.


Infinite Kernel Learning: Infinite Programming

 •   The interesting theoretical problem here is to find conditions
     which ensure that solutions are point masses
     (i.e., the original monotonic β is a step function).

 •   Because of this, and in view of the compactness of the feasible (index) sets at the
     lower levels, A and Ω, we are interested in the nondegeneracy of the local minima
     of the lower level problem, to get finitely many local minimizers of

$$g((\sigma, \rho), \omega) := \sigma - \int_A T(\omega, \alpha)\, d\rho(\alpha).$$

 •   Lower Level Problem: For a given parameter (σ, ρ), we consider

(LLP)
$$\min_{\omega}\ g((\sigma, \rho), \omega) \quad \text{subject to} \quad \omega \in \Omega.$$
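A rough sketch of tackling (LLP) by grid search over Ω = [0, 1], reusing homotopy_kernel from the earlier sketch; representing ρ by point masses rho_weights at points rho_alphas is an assumption of this example, not part of the original slides.

```python
import numpy as np

def T_val(omega, alpha, X, y):
    # T(omega, alpha) = 1/2 sum_ij alpha_i alpha_j y_i y_j kappa(x_i, x_j, omega) - sum_i alpha_i
    K = np.array([[homotopy_kernel(xi, xj, omega) for xj in X] for xi in X])
    ay = alpha * y
    return 0.5 * ay @ K @ ay - alpha.sum()

def g_val(sigma, rho_alphas, rho_weights, omega, X, y):
    # g((sigma, rho), omega) = sigma - int_A T(omega, alpha) d rho(alpha), rho discrete
    return sigma - sum(w * T_val(omega, a, X, y) for a, w in zip(rho_alphas, rho_weights))

def solve_llp(sigma, rho_alphas, rho_weights, X, y, grid=101):
    # (LLP): minimize g((sigma, rho), .) over a grid on Omega = [0, 1]
    omegas = np.linspace(0.0, 1.0, grid)
    vals = [g_val(sigma, rho_alphas, rho_weights, om, X, y) for om in omegas]
    return omegas[int(np.argmin(vals))]
```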



Infinite Kernel Learning: Infinite Programming

• “reduction ansatz” and
• Implicit Function Theorem
• parametrical measures

• “finite optimization”
Infinite Kernel Learning: Infinite Programming

• “reduction ansatz” and
• Implicit Function Theorem
• parametrical measures, e.g.:

  Gaussian:     $$f(\omega; (\mu, \sigma^2)) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(\frac{-(\omega - \mu)^2}{2\sigma^2}\right)$$

  exponential:  $$f(\omega; \lambda) = \begin{cases} \lambda \exp(-\lambda\omega), & \omega \ge 0 \\ 0, & \omega < 0 \end{cases}$$

  uniform:      $$f(\omega; (a, b)) = \frac{H(\omega - a) - H(\omega - b)}{b - a}$$   (H: the Heaviside step function)

  Beta:         $$f(\omega; (\alpha, \beta)) = \frac{\omega^{\alpha-1} (1 - \omega)^{\beta-1}}{\int_0^1 u^{\alpha-1} (1 - u)^{\beta-1}\, du}$$

• “finite optimization”
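These parametric densities can serve as candidate forms for dβ(ω) = f(ω; ·) dω. A sketch evaluating them with scipy.stats; all parameter values below are placeholders.

```python
import numpy as np
from scipy.stats import norm, expon, uniform, beta

omega = np.linspace(0, 1, 5)
print(norm.pdf(omega, loc=0.5, scale=0.1))      # Gaussian, mu = 0.5, sigma = 0.1
print(expon.pdf(omega, scale=1 / 2.0))          # exponential with rate lambda = 2
print(uniform.pdf(omega, loc=0.2, scale=0.6))   # uniform on [a, b] = [0.2, 0.8]
print(beta.pdf(omega, a=2, b=3))                # Beta(alpha = 2, beta = 3)
```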
Infinite Kernel Learning: Reduction Ansatz

• “reduction ansatz” and
• Implicit Function Theorem
• parametrical measures

[Figure: the lower-level functions g(x, ⋅) and g(x̃, ⋅) over Ω, with local minimizers y_j, ỹ_j, …, y_p tracked as x varies.]

$$g(x, y) \ge 0 \ \ \forall y \in I \quad \Longleftrightarrow \quad \min_{y \in I} g(x, y) \ge 0$$

x ↦ y_j(x):  implicit function
Infinite Kernel Learning: Reduction Ansatz

based on the reduction ansatz:

$$\min\ f(x) \quad \text{subject to} \quad g_j(x) := g(x, y_j(x)) \ge 0 \quad (j \in J := \{1, 2, \dots, p\})$$

[Figure: g((σ, ρ), ⋅) and a perturbed g((σ̃, ρ̃), ⋅) over Ω; the minimizer ω̄ = ω̄(σ, ρ) moves continuously with the parameter (σ, ρ) (→ topology).]
Infinite Kernel Learning: Regularization

regularization:

$$\min_{\theta,\beta}\ -\theta + \mu \sup_{t \in [0,1]} \frac{d}{dt} \int_0^t d\beta(\omega) \qquad \left(\text{alternatively with } \frac{d^2}{dt^2} \int_0^t d\beta(\omega)\right)$$

subject to the constraints; discretization 0 = t_0 < t_1 < ⋯ < t_ι = 1:

$$\frac{d}{dt} \int_0^{t_\nu} d\beta(\omega) \;\approx\; \frac{\int_0^{t_{\nu+1}} d\beta(\omega) - \int_0^{t_\nu} d\beta(\omega)}{t_{\nu+1} - t_\nu} \;=\; \frac{1}{t_{\nu+1} - t_\nu} \int_{t_\nu}^{t_{\nu+1}} d\beta(\omega),$$

$$\frac{d^2}{dt^2} \int_0^{t_\nu} d\beta(\omega) \;\approx\; \frac{\dfrac{1}{t_{\nu+2} - t_{\nu+1}} \int_{t_{\nu+1}}^{t_{\nu+2}} d\beta(\omega) \;-\; \dfrac{1}{t_{\nu+1} - t_\nu} \int_{t_\nu}^{t_{\nu+1}} d\beta(\omega)}{t_{\nu+1} - t_\nu}.$$
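A small sketch of the discretized first-derivative penalty for a β represented by cell masses on the grid; the grid size and the random masses are placeholders of this example.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 11)         # 0 = t_0 < t_1 < ... < t_10 = 1
rng = np.random.default_rng(3)
mass = rng.dirichlet(np.ones(10))     # beta-measure of each cell [t_nu, t_nu+1]; sums to 1

d1 = mass / np.diff(t)                # forward-difference slope of the cumulative measure
penalty = d1.max()                    # approximates sup_t (d/dt) int_0^t d(beta)
print(penalty)
```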

Infinite Kernel Learning: Topology

Radon measure: a measure on the σ-algebra of Borel sets of E that is
locally finite and inner regular.

(E, d):   metric space
Η(E):     set of Radon measures on E

neighbourhood of a measure ρ, for f in the dual space (Η(E))′ of continuous bounded functions:

$$B_\rho^f(\varepsilon) := \left\{ \mu \in Η(E) \;\middle|\; \left| \int_A f\, d\mu - \int_A f\, d\rho \right| < \varepsilon \right\}$$

[Figure: inner regularity: the measure of a Borel set is approximated from within via μ(K_ν) for compact sets K_ν ⊂ E.]

Infinite Kernel Learning: Topology

Def.: Basis of neighbourhoods of a measure ρ (f_1, …, f_n ∈ (Η(E))′; ε > 0):

$$\left\{ \mu \in Η(E) \;\middle|\; \left| \int_E f_i\, d\rho - \int_E f_i\, d\mu \right| < \varepsilon \ \ (i = 1, 2, \dots, n) \right\}.$$

Def.: Prokhorov metric:

$$d_0(\mu, \rho) := \inf_\varepsilon \left\{ \varepsilon \ge 0 \;\middle|\; \mu(A) \le \rho(A^\varepsilon) + \varepsilon \ \text{and} \ \rho(A) \le \mu(A^\varepsilon) + \varepsilon \ \ (A:\ \text{closed}) \right\},$$

where  $$A^\varepsilon := \{ x \in E \mid d(x, A) < \varepsilon \}.$$

Open δ-neighbourhood of a measure ρ:
$$B_\delta(\rho) := \{ \mu \in Η(E) \mid d_0(\rho, \mu) < \delta \}.$$


Infinite Kernel Learning: Numerical Results




References

Özöğür, S., Shawe-Taylor, J., Weber, G.-W., and Ögel, Z.B., Pattern analysis for the prediction of eukaryotic pro-peptide cleavage sites, in the special issue Networks in Computational Biology of Discrete Applied Mathematics 157, 10 (May 2009) 2388-2394.

Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning by infinite and semi-infinite programming, Proceedings of the Second Global Conference on Power Control and Optimization, AIP Conference Proceedings 1159, Bali, Indonesia, 1-3 June 2009, Subseries: Mathematical and Statistical Physics; ISBN 978-0-7354-0696-4 (August 2009) 306-313; Hakim, A.H., Vasant, P., and Barsoum, N., guest eds.

Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning via infinite and semi-infinite programming, to appear in a special issue of Optimization Methods and Software (OMS) on the occasion of the International Conference on Engineering Optimization (EngOpt 2008; Rio de Janeiro, Brazil, June 1-5, 2008), Schittkowski, K. (guest ed.).

Özöğür-Akyüz, S., and Weber, G.-W., On numerical optimization theory of infinite kernel learning, preprint at IAM, METU, submitted to the Journal of Global Optimization (JOGO).




