4th International Summer School
Achievements and Applications of Contemporary
Informatics, Mathematics and Physics
National University of Technology of the Ukraine
Kiev, Ukraine, August 5-16, 2009



Classification Theory
Modelling of Kernel Machine by Infinite and Semi-Infinite Programming

Süreyya Özöğür-Akyüz, Gerhard-Wilhelm Weber *

Institute of Applied Mathematics, METU, Ankara, Turkey

* Faculty of Economics, Management Science and Law, University of Siegen, Germany
  Center for Research on Optimization and Control, University of Aveiro, Portugal



Motivation: Prediction of Cleavage Sites




[Figure: a protein sequence split into its signal part and its mature part, with the cleavage site between them to be predicted; a margin γ is indicated.]


Logistic Regression

$$\log\!\left(\frac{P(Y=1 \mid X=x_l)}{P(Y=0 \mid X=x_l)}\right) = \beta_0 + \beta_1 x_{l1} + \beta_2 x_{l2} + \cdots + \beta_p x_{lp} \qquad (l = 1, 2, \dots, N)$$
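As a minimal illustration (not from the original slides), such a model can be fit with scikit-learn; the synthetic data, the feature dimension p = 3, and all parameter values below are assumptions of this sketch. The fitted intercept_ and coef_ play the roles of β₀ and (β₁, …, β_p).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                         # N = 100 samples, p = 3 features
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(int)  # synthetic 0/1 labels

clf = LogisticRegression().fit(X, y)
print(clf.intercept_, clf.coef_)                      # estimates of beta_0 and (beta_1, ..., beta_p)
```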




Linear Classifiers

Maximum margin classifier:

$$\gamma_i := y_i \cdot (\langle w, x_i \rangle + b)$$

Note: γ_i > 0 implies correct classification.

[Figure: separating hyperplane with margin γ; the support vectors x_j and x_k lie on the margin boundaries, where y_j ⋅ (⟨w, x_j⟩ + b) = 1 and y_k ⋅ (⟨w, x_k⟩ + b) = 1.]
Linear Classifiers


•   The geometric margin:  γ = 2 / ‖w‖₂

Maximizing 2 / ‖w‖₂ is equivalent to minimizing ‖w‖₂² / 2, which gives a convex problem:

$$\min_{w,b}\ \frac{\|w\|_2^2}{2} \quad \text{subject to} \quad y_i \cdot (\langle w, x_i \rangle + b) \ge 1 \quad (i = 1, 2, \dots, l)$$
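A minimal sketch of this primal problem, assuming cvxpy is available; the two-blob toy data and all numbers are placeholders, not anything from the slides.

```python
import cvxpy as cp
import numpy as np

# toy linearly separable data: two Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)

w = cp.Variable(2)
b = cp.Variable()
# min ||w||^2 / 2  subject to  y_i (<w, x_i> + b) >= 1
problem = cp.Problem(cp.Minimize(cp.sum_squares(w) / 2),
                     [cp.multiply(y, X @ w + b) >= 1])
problem.solve()
print("w =", w.value, " b =", b.value)
```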



Linear Classifiers



Dual Problem:

$$\max_{\alpha}\ \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle$$

$$\text{subject to} \quad \sum_{i=1}^{l} y_i \alpha_i = 0, \qquad \alpha_i \ge 0 \quad (i = 1, 2, \dots, l).$$



Linear Classifiers



Dual Problem (the inner product replaced by a kernel function):

$$\max_{\alpha}\ \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j\, \kappa(x_i, x_j)$$

$$\text{subject to} \quad \sum_{i=1}^{l} y_i \alpha_i = 0, \qquad \alpha_i \ge 0 \quad (i = 1, 2, \dots, l).$$
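A hedged sketch of solving this kernelized dual with cvxpy; the Gaussian-kernel helper, the toy data, and the small diagonal ridge (added only to keep the quadratic form numerically positive semidefinite) are assumptions of this example.

```python
import cvxpy as cp
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    # K_ij = exp(-||x_i - x_j||^2 / sigma^2)
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / sigma ** 2)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, (15, 2)), rng.normal(1, 1, (15, 2))])
y = np.array([-1.0] * 15 + [1.0] * 15)
l = len(y)

K = gaussian_kernel(X)
Q = np.outer(y, y) * K + 1e-8 * np.eye(l)    # ridge keeps Q numerically PSD

alpha = cp.Variable(l)
problem = cp.Problem(cp.Maximize(cp.sum(alpha) - 0.5 * cp.quad_form(alpha, Q)),
                     [y @ alpha == 0, alpha >= 0])
problem.solve()
print(np.round(alpha.value, 3))              # nonzero alpha_i mark support vectors
```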



Linear Classifiers
Soft Margin Classifier:

•    Introduce slack variables ξ_i to allow the margin constraints to be violated:

$$\min_{\xi,w,b}\ \frac{\|w\|_2^2}{2} + C \sum_{i=1}^{l} \xi_i^2$$

$$\text{subject to} \quad y_i \cdot (\langle w, x_i \rangle + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0 \quad (i = 1, 2, \dots, l)$$

Linear Classifiers

• Projection of the data into a higher dimensional feature space.

• Mapping the input space X into a new space F :

$$x = (x_1, \dots, x_n) \mapsto \phi(x) = (\phi_1(x), \dots, \phi_N(x))$$

[Figure: input points mapped by φ into the feature space F, where the two classes of mapped points become linearly separable.]



Nonlinear Classifiers

set of hypotheses:     $$f(x) = \sum_{i=1}^{N} w_i \phi_i(x) + b,$$

dual representation:   $$f(x) = \sum_{i=1}^{l} \alpha_i y_i \langle \phi(x_i), \phi(x) \rangle + b,$$

where the inner product ⟨φ(x_i), φ(x)⟩ is the kernel function.

Ex.:   polynomial kernels      κ(x, z) = (1 + xᵀz)^k
       sigmoid kernel          κ(x, z) = tanh(a xᵀz + b)
       Gaussian (RBF) kernel   κ(x, z) = exp(−‖x − z‖₂² / σ²)
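A small sketch of the three example kernels in NumPy; the parameter values (k, a, b, σ) are arbitrary placeholders.

```python
import numpy as np

def poly_kernel(x, z, k=3):
    return (1 + x @ z) ** k

def sigmoid_kernel(x, z, a=0.5, b=-1.0):
    return np.tanh(a * (x @ z) + b)

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(poly_kernel(x, z), sigmoid_kernel(x, z), rbf_kernel(x, z))
```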




(In-) Finite Kernel Learning

•       Based on the motivation of multiple kernel learning (MKL):

$$\kappa(x_i, x_j) = \sum_{k=1}^{K} \beta_k\, \kappa_k(x_i, x_j),$$

with kernel functions κ_k(⋅,⋅) and weights β_k ≥ 0 (k = 1, …, K), ∑_{k=1}^{K} β_k = 1.

•       Semi-infinite LP formulation:

(SILP MKL)
$$\max_{\theta,\beta}\ \theta \qquad (\theta \in \mathbb{R},\ \beta \in \mathbb{R}^K)$$
$$\text{such that} \quad 0 \le \beta, \quad \sum_{k=1}^{K} \beta_k = 1,$$
$$\sum_{k=1}^{K} \beta_k S_k(\alpha) \ge \theta \quad \forall \alpha \in \mathbb{R}^l \ \text{with} \ 0 \le \alpha \le C\mathbf{1} \ \text{and} \ \sum_{i=1}^{l} \alpha_i y_i = 0,$$

where
$$S_k(\alpha) := \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j\, \kappa_k(x_i, x_j) - \sum_{i=1}^{l} \alpha_i.$$
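A minimal sketch, assuming precomputed kernel matrices K_k: helpers for the combined kernel ∑_k β_k κ_k and for S_k(α). In practice the infinitely many constraints are typically handled by an exchange (column-generation) loop that alternates between solving the SVM dual for the current β to find a violating α and re-solving the LP in (θ, β); that loop is not shown here.

```python
import numpy as np

def combined_kernel(betas, kernel_mats):
    # kappa = sum_k beta_k kappa_k, with beta_k >= 0 summing to 1
    return sum(b * K for b, K in zip(betas, kernel_mats))

def S_k(alpha, y, K_k):
    # S_k(alpha) = 1/2 sum_ij alpha_i alpha_j y_i y_j kappa_k(x_i, x_j) - sum_i alpha_i
    ay = alpha * y
    return 0.5 * ay @ K_k @ ay - alpha.sum()
```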
Infinite Kernel Learning: Infinite Programming

ex.:
$$\kappa(x_i, x_j, \omega) := \omega \exp\!\left(-\omega^* \|x_i - x_j\|_2^2\right) + (1 - \omega)\,(1 + x_i^T x_j)^d$$

H(ω) := κ(x_i, x_j, ω) is a homotopy between

$$H(0) = (1 + x_i^T x_j)^d \qquad \text{and} \qquad H(1) = \exp\!\left(-\omega^* \|x_i - x_j\|_2^2\right).$$

$$\kappa_\beta(x_i, x_j) := \int_\Omega \kappa(x_i, x_j, \omega)\, d\beta(\omega) \qquad \longrightarrow \qquad \text{Infinite Programming}$$
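A sketch of this kernel family and of κ_β for a discrete measure β (point masses b_k at grid points ω_k); the defaults ω* = 1 and d = 2 are assumptions of this example.

```python
import numpy as np

def homotopy_kernel(xi, xj, omega, omega_star=1.0, d=2):
    # kappa(x_i, x_j, omega): blends a Gaussian and a polynomial kernel
    gauss = np.exp(-omega_star * np.sum((xi - xj) ** 2))
    poly = (1 + xi @ xj) ** d
    return omega * gauss + (1 - omega) * poly

def kappa_beta(xi, xj, omegas, weights):
    # integral over Omega approximated by a discrete measure: sum_k b_k kappa(., ., omega_k)
    return sum(w * homotopy_kernel(xi, xj, om) for om, w in zip(omegas, weights))
```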
Infinite Kernel Learning: Infinite Programming

•   Introducing Riemann-Stieltjes integrals to the problem (SILP-MKL),
    we get the following general problem formulation:

$$\kappa_\beta(x_i, x_j) = \int_\Omega \kappa(x_i, x_j, \omega)\, d\beta(\omega), \qquad \Omega = [0,1]$$




Infinite Kernel Learning: Infinite Programming

•    Introducing Riemann-Stieltjes integrals to the problem (SILP-MKL),
     we get the following general problem formulation:

(IP)
$$\max_{\theta,\beta}\ \theta \qquad (\theta \in \mathbb{R},\ \beta : [0,1] \to \mathbb{R} \ \text{monotonically increasing})$$
$$\text{subject to} \quad \int_0^1 d\beta(\omega) = 1,$$
$$\int_\Omega \left( \frac{1}{2} S(\omega, \alpha) - \sum_{i=1}^{l} \alpha_i \right) d\beta(\omega) \ge \theta \quad \forall \alpha \in \mathbb{R}^l \ \text{with} \ 0 \le \alpha \le C\mathbf{1},\ \sum_{i=1}^{l} \alpha_i y_i = 0,$$

where
$$S(\omega, \alpha) := \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j\, \kappa(x_i, x_j, \omega), \qquad T(\omega, \alpha) := \frac{1}{2} S(\omega, \alpha) - \sum_{i=1}^{l} \alpha_i,$$
$$A := \left\{ \alpha \in \mathbb{R}^l \ \middle|\ 0 \le \alpha \le C\mathbf{1} \ \text{and} \ \sum_{i=1}^{l} \alpha_i y_i = 0 \right\}.$$
Infinite Kernel Learning: Infinite Programming

(IP)
$$\max_{\theta,\beta}\ \theta \qquad (\theta \in \mathbb{R},\ \beta : \text{a positive measure on } \Omega)$$
$$\text{such that} \quad \theta - \int_\Omega T(\omega, \alpha)\, d\beta(\omega) \le 0 \ \ \forall \alpha \in A, \qquad \int_\Omega d\beta(\omega) = 1. \qquad \text{(infinite programming)}$$

dual of (IP):

(DIP)
$$\min_{\sigma,\rho}\ \sigma \qquad (\sigma \in \mathbb{R},\ \rho : \text{a positive measure on } A)$$
$$\text{such that} \quad \sigma - \int_A T(\omega, \alpha)\, d\rho(\alpha) \ge 0 \ \ \forall \omega \in \Omega, \qquad \int_A d\rho(\alpha) = 1.$$

•    Duality Conditions: Let (θ, β) and (σ, ρ) be feasible for their respective problems and complementarily slack, i.e.,
     β has measure only where σ = ∫_A T(ω, α) dρ(α), and
     ρ has measure only where θ = ∫_Ω T(ω, α) dβ(ω).

     Then both solutions are optimal for their respective problems.


Infinite Kernel Learning: Infinite Programming

 •   The interesting theoretical problem here is to find conditions
     which ensure that solutions are point masses
     (i.e., the original monotonic β is a step function).

 •   Because of this, and in view of the compactness of the feasible (index) sets at the
     lower levels, A and Ω, we are interested in the nondegeneracy of the local minima
     of the lower level problem, to get finitely many local minimizers of

$$g((\sigma, \rho), \omega) := \sigma - \int_A T(\omega, \alpha)\, d\rho(\alpha).$$

 •   Lower Level Problem: For a given parameter (σ, ρ), we consider

(LLP)
$$\min_{\omega}\ g((\sigma, \rho), \omega) \quad \text{subject to} \quad \omega \in \Omega.$$
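A rough sketch of tackling (LLP) by grid search over Ω = [0, 1], reusing homotopy_kernel from the earlier sketch; representing ρ by point masses rho_weights at points rho_alphas is an assumption of this example, not part of the original slides.

```python
import numpy as np

def T_val(omega, alpha, X, y):
    # T(omega, alpha) = 1/2 sum_ij alpha_i alpha_j y_i y_j kappa(x_i, x_j, omega) - sum_i alpha_i
    K = np.array([[homotopy_kernel(xi, xj, omega) for xj in X] for xi in X])
    ay = alpha * y
    return 0.5 * ay @ K @ ay - alpha.sum()

def g_val(sigma, rho_alphas, rho_weights, omega, X, y):
    # g((sigma, rho), omega) = sigma - int_A T(omega, alpha) d rho(alpha), rho discrete
    return sigma - sum(w * T_val(omega, a, X, y) for a, w in zip(rho_alphas, rho_weights))

def solve_llp(sigma, rho_alphas, rho_weights, X, y, grid=101):
    # (LLP): minimize g((sigma, rho), .) over a grid on Omega = [0, 1]
    omegas = np.linspace(0.0, 1.0, grid)
    vals = [g_val(sigma, rho_alphas, rho_weights, om, X, y) for om in omegas]
    return omegas[int(np.argmin(vals))]
```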



Infinite Kernel Learning: Infinite Programming

• “reduction ansatz” and
• Implicit Function Theorem
• parametrical measures

• “finite optimization”
Infinite Kernel Learning: Infinite Programming

• “reduction ansatz” and
• Implicit Function Theorem
• parametrical measures, e.g.:

  Gaussian:     $$f(\omega; (\mu, \sigma^2)) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(\frac{-(\omega - \mu)^2}{2\sigma^2}\right)$$

  exponential:  $$f(\omega; \lambda) = \begin{cases} \lambda \exp(-\lambda\omega), & \omega \ge 0 \\ 0, & \omega < 0 \end{cases}$$

  uniform:      $$f(\omega; (a, b)) = \frac{H(\omega - a) - H(\omega - b)}{b - a}$$   (H: the Heaviside step function)

  Beta:         $$f(\omega; (\alpha, \beta)) = \frac{\omega^{\alpha-1} (1 - \omega)^{\beta-1}}{\int_0^1 u^{\alpha-1} (1 - u)^{\beta-1}\, du}$$

• “finite optimization”
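These parametric densities can serve as candidate forms for dβ(ω) = f(ω; ·) dω. A sketch evaluating them with scipy.stats; all parameter values below are placeholders.

```python
import numpy as np
from scipy.stats import norm, expon, uniform, beta

omega = np.linspace(0, 1, 5)
print(norm.pdf(omega, loc=0.5, scale=0.1))      # Gaussian, mu = 0.5, sigma = 0.1
print(expon.pdf(omega, scale=1 / 2.0))          # exponential with rate lambda = 2
print(uniform.pdf(omega, loc=0.2, scale=0.6))   # uniform on [a, b] = [0.2, 0.8]
print(beta.pdf(omega, a=2, b=3))                # Beta(alpha = 2, beta = 3)
```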
Infinite Kernel Learning: Reduction Ansatz

• “reduction ansatz” and
• Implicit Function Theorem
• parametrical measures

[Figure: the lower-level functions g(x, ⋅) and g(x̃, ⋅) over Ω, with local minimizers y_j, ỹ_j, …, y_p tracked as x varies.]

$$g(x, y) \ge 0 \ \ \forall y \in I \quad \Longleftrightarrow \quad \min_{y \in I} g(x, y) \ge 0$$

x ↦ y_j(x):  implicit function
Infinite Kernel Learning: Reduction Ansatz

based on the reduction ansatz:

$$\min\ f(x) \quad \text{subject to} \quad g_j(x) := g(x, y_j(x)) \ge 0 \quad (j \in J := \{1, 2, \dots, p\})$$

[Figure: g((σ, ρ), ⋅) and a perturbed g((σ̃, ρ̃), ⋅) over Ω; the minimizer ω̄ = ω̄(σ, ρ) moves continuously with the parameter (σ, ρ) (→ topology).]
Infinite Kernel Learning: Regularization

regularization:

$$\min_{\theta,\beta}\ -\theta + \mu \sup_{t \in [0,1]} \frac{d}{dt} \int_0^t d\beta(\omega) \qquad \left(\text{alternatively with } \frac{d^2}{dt^2} \int_0^t d\beta(\omega)\right)$$

subject to the constraints; discretization 0 = t_0 < t_1 < ⋯ < t_ι = 1:

$$\frac{d}{dt} \int_0^{t_\nu} d\beta(\omega) \;\approx\; \frac{\int_0^{t_{\nu+1}} d\beta(\omega) - \int_0^{t_\nu} d\beta(\omega)}{t_{\nu+1} - t_\nu} \;=\; \frac{1}{t_{\nu+1} - t_\nu} \int_{t_\nu}^{t_{\nu+1}} d\beta(\omega),$$

$$\frac{d^2}{dt^2} \int_0^{t_\nu} d\beta(\omega) \;\approx\; \frac{\dfrac{1}{t_{\nu+2} - t_{\nu+1}} \int_{t_{\nu+1}}^{t_{\nu+2}} d\beta(\omega) \;-\; \dfrac{1}{t_{\nu+1} - t_\nu} \int_{t_\nu}^{t_{\nu+1}} d\beta(\omega)}{t_{\nu+1} - t_\nu}.$$
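A small sketch of the discretized first-derivative penalty for a β represented by cell masses on the grid; the grid size and the random masses are placeholders of this example.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 11)         # 0 = t_0 < t_1 < ... < t_10 = 1
rng = np.random.default_rng(3)
mass = rng.dirichlet(np.ones(10))     # beta-measure of each cell [t_nu, t_nu+1]; sums to 1

d1 = mass / np.diff(t)                # forward-difference slope of the cumulative measure
penalty = d1.max()                    # approximates sup_t (d/dt) int_0^t d(beta)
print(penalty)
```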

Infinite Kernel Learning: Topology

Radon measure: a measure on the σ-algebra of Borel sets of E that is
locally finite and inner regular.

(E, d):   metric space
Η(E):     set of Radon measures on E

neighbourhood of a measure ρ, for f in the dual space (Η(E))′ of continuous bounded functions:

$$B_\rho^f(\varepsilon) := \left\{ \mu \in Η(E) \;\middle|\; \left| \int_A f\, d\mu - \int_A f\, d\rho \right| < \varepsilon \right\}$$

[Figure: inner regularity: the measure of a Borel set is approximated from within via μ(K_ν) for compact sets K_ν ⊂ E.]

Infinite Kernel Learning: Topology

Def.: Basis of neighbourhoods of a measure ρ (f_1, …, f_n ∈ (Η(E))′; ε > 0):

$$\left\{ \mu \in Η(E) \;\middle|\; \left| \int_E f_i\, d\rho - \int_E f_i\, d\mu \right| < \varepsilon \ \ (i = 1, 2, \dots, n) \right\}.$$

Def.: Prokhorov metric:

$$d_0(\mu, \rho) := \inf_\varepsilon \left\{ \varepsilon \ge 0 \;\middle|\; \mu(A) \le \rho(A^\varepsilon) + \varepsilon \ \text{and} \ \rho(A) \le \mu(A^\varepsilon) + \varepsilon \ \ (A:\ \text{closed}) \right\},$$

where  $$A^\varepsilon := \{ x \in E \mid d(x, A) < \varepsilon \}.$$

Open δ-neighbourhood of a measure ρ:
$$B_\delta(\rho) := \{ \mu \in Η(E) \mid d_0(\rho, \mu) < \delta \}.$$


Infinite Kernel Learning: Numerical Results




References

Özöğür, S., Shawe-Taylor, J., Weber, G.-W., and Ögel, Z.B., Pattern analysis for the prediction of eukaryotic pro-peptide cleavage sites, in the special issue Networks in Computational Biology of Discrete Applied Mathematics 157, 10 (May 2009) 2388-2394.

Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning by infinite and semi-infinite programming, Proceedings of the Second Global Conference on Power Control and Optimization, AIP Conference Proceedings 1159, Bali, Indonesia, 1-3 June 2009, Subseries: Mathematical and Statistical Physics; ISBN 978-0-7354-0696-4 (August 2009) 306-313; Hakim, A.H., Vasant, P., and Barsoum, N., guest eds.

Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning via infinite and semi-infinite programming, to appear in a special issue of Optimization Methods and Software (OMS) on the occasion of the International Conference on Engineering Optimization (EngOpt 2008; Rio de Janeiro, Brazil, June 1-5, 2008), Schittkowski, K. (guest ed.).

Özöğür-Akyüz, S., and Weber, G.-W., On numerical optimization theory of infinite kernel learning, preprint at IAM, METU, submitted to the Journal of Global Optimization (JOGO).




