A Machine Learning (Theory)
Perspective on Computer Vision

            Peter Auer
      Montanuniversität Leoben
Outline

 What I am doing and how computer
 vision approached me (in 2002).
 Some modern machine learning
 algorithms used in computer vision,
 and their development:
   Boosting
   Support Vector Machines
 Concluding remarks
My background
 COLT 1993
   Conference on Learning Theory
   "On-Line Learning of Rectangles in Noisy
   Environments"

 FOCS 1995
   Symp. Foundations of Computer Science
   "Gambling in a Rigged Casino: The Adversarial
   Multi-Armed Bandit Problem"
   with N. Cesa-Bianchi, Y. Freund, R. Schapire

 ICML, NIPS, STOC, …
A computer vision project

 EU-Project LAVA, 2002
   “Learning for adaptable visual
   assistants”
   XRCE: Ch. Dance, R. Mohr
   INRIA Grenoble: C. Schmid, B. Triggs
   RHUL: J. Shawe-Taylor
   IDIAP: S. Bengio
LAVA Proposal
 Vision (goals)
   Recognition of generic objects and events
   Attention Mechanisms
   Baseline and high-level descriptors
 Learning (means)
   Statistical Analysis
   Kernels and models and features
   Online Learning
Online learning
 Online Information Setting
   An input is received, a prediction is made, and
   then feedback is acquired.
   Goal: To make good predictions with respect to
   a (large) set of fixed predictors.
 Online Computation Setting
   The amount of computation per new example –
   to update the learned information – is constant
   (or small).
   Goal: To be fast computationally.
 (Near) real-time learning?
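
A minimal sketch of this protocol in Python; `predict` and `update` are hypothetical hooks standing in for any concrete online learner:

```python
def run_online(predict, update, stream):
    """Minimal sketch of the online protocol: per example, an input is
    received, a prediction is made, feedback is acquired, and the learner
    is updated with (small) constant work. `predict` and `update` are
    hypothetical hooks for any concrete online learner."""
    mistakes = 0
    for x, y in stream:
        y_hat = predict(x)   # prediction on the new input
        if y_hat != y:       # feedback: the true label y
            mistakes += 1
        update(x, y)         # constant-time update of the learned state
    return mistakes
```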
Learning for vision around 2002
 Viola, Jones, CVPR 2001:
   Rapid object detection using a boosted cascade
   of simple features. (Boosting)
 Agarwal, Roth, ECCV 2002:
   Learning a Sparse Representation for Object
   Detection. (Winnow)
 Fergus, Perona, Zisserman, CVPR 2003:
   Object class recognition by unsupervised
   scale-invariant learning. (EM-type algorithm)
 Wallraven, Caputo, Graf, ICCV 2003:
   Recognition with local features: the kernel
   recipe. (SVM)
Our contribution in LAVA

 Opelt, Fussenegger, Pinz, Auer,
 ECCV 2004:
   Weak hypotheses and boosting for
   generic object detection and
   recognition.
Image classification as a learning problem

       Images are represented as vectors x = (x_1, . . . , x_n) ∈ X ⊂ ℝ^n.

       Given
            training images x^(1), . . . , x^(m) ∈ X
            with their classifications y^(1), . . . , y^(m) ∈ Y = {−1, +1},
       a classifier H : X → Y is learned.

       We consider linear classifiers H_w, w ∈ ℝ^n,

            H_w(x) = +1  if w · x ≥ 0
                     −1  if w · x < 0

       (w · x = Σ_{i=1}^n w_i x_i).
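
As a concrete illustration of the definition above, a minimal sketch in Python (NumPy assumed):

```python
import numpy as np

def H(w, x):
    """The linear classifier H_w: +1 if w · x >= 0, else -1."""
    return 1 if np.dot(w, x) >= 0 else -1
```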



The Perceptron algorithm (Rosenblatt, 1958)
   The Perceptron algorithm maintains a weight vector w^(t) as its
   current classifier.

       Initialization: w^(1) = 0.

       Predict ŷ^(t) = +1  if w^(t) · x^(t) ≥ 0
                       −1  if w^(t) · x^(t) < 0

       If ŷ^(t) = y^(t) then w^(t+1) = w^(t),
       else w^(t+1) = w^(t) + η y^(t) x^(t).
       (η is the learning rate.)
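
A sketch of the algorithm in Python, following the slide (one pass over the data; the update fires only on mistakes):

```python
import numpy as np

def perceptron(X, Y, eta=1.0):
    """Rosenblatt's Perceptron as on the slide: update only on mistakes.

    X: (m, n) array of training vectors, Y: (m,) array of labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])          # w^(1) = 0
    for x, y in zip(X, Y):
        y_hat = 1 if w @ x >= 0 else -1
        if y_hat != y:                # mistake: w^(t+1) = w^(t) + eta*y*x
            w = w + eta * y * x
    return w
```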


       The Perceptron was abandoned in 1969, when Minsky and
       Papert showed that Perceptrons are not able to learn some
       simple functions.
       It was revived only in the 1980s, when neural networks became
       popular.

Perceptron cannot learn XOR




 No single line can separate the green
 from the red boxes.
Non-linear classifiers



       Extending the feature space (or using kernels) circumvents the
       problem:
       Since XOR is a quadratic function, use (1, x_1, x_2, x_1², x_2², x_1 x_2)
       instead of (x_1, x_2).
       For x_1, x_2 ∈ {+1, −1},

                               x_1 XOR x_2 = x_1 x_2.
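
A small sketch verifying this: in the expanded feature space, the weight vector that picks out the x_1 x_2 coordinate classifies all four XOR points correctly:

```python
import numpy as np

def phi(x1, x2):
    """The quadratic feature map (1, x1, x2, x1^2, x2^2, x1*x2) from above."""
    return np.array([1, x1, x2, x1**2, x2**2, x1 * x2])

# The weight vector selecting the x1*x2 coordinate separates XOR:
w = np.array([0, 0, 0, 0, 0, 1])
for x1 in (+1, -1):
    for x2 in (+1, -1):
        assert np.sign(w @ phi(x1, x2)) == x1 * x2  # label = x1 XOR x2 = x1*x2
```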




Winnow (Littlestone 1987)


      Works like the Perceptron algorithm except for the update of
      the weights:

           w_i^(t+1) = w_i^(t) · exp(η y^(t) x_i^(t))

      for some η > 0. (w^(1) = 1.)

      Observe the multiplicative update of the weights, i.e.
           log w_i^(t+1) = log w_i^(t) + η y^(t) x_i^(t).
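
A sketch of Winnow in Python, under the assumption (matching the Perceptron slide) that the multiplicative update is applied only on mistakes; threshold 0 is used as on the Perceptron slide, while the classical Winnow uses a positive threshold:

```python
import numpy as np

def winnow(X, Y, eta=0.5):
    """Winnow sketch: the Perceptron loop with the multiplicative
    weight update from the slide."""
    w = np.ones(X.shape[1])                # w^(1) = 1
    for x, y in zip(X, Y):
        y_hat = 1 if w @ x >= 0 else -1
        if y_hat != y:
            w = w * np.exp(eta * y * x)    # w_i <- w_i * exp(eta * y * x_i)
    return w
```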


      Closely related work:
      The Weighted Majority Algorithm (Littlestone, Warmuth)


Comparison of the Perceptron algorithm and Winnow


      Perceptron and Winnow scale differently with respect to
      relevant, used, and irrelevant attributes:

                         all attributes         n
                         relevant attributes    k
                         used attributes        d

                                     # training ex.
                      Perceptron     √(dk)
                      Winnow         k log n




AdaBoost (Freund, Schapire, 1995)


      AdaBoost maintains weights v_t^(s) on the training examples
      (x^(s), y^(s)) over time t:

      Initialize weights v_0^(s) = 1.
      For t = 1, 2, . . .
           Select the coordinate i_t with maximal correlation with the labels,
           Σ_s v_t^(s) y^(s) x_i^(s), as weak hypothesis.
           Choose α_t which minimizes Σ_s v_t^(s) exp(−α_t y^(s) x_{i_t}^(s)).
           Update v_{t+1}^(s) = v_t^(s) exp(−α_t y^(s) x_{i_t}^(s)).
      For x = (x_1, . . . , x_n) predict sign(Σ_t α_t x_{i_t}).
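
A sketch of this coordinate-wise AdaBoost in Python. The closed-form α_t below is an assumption: it minimizes the weighted exponential loss exactly for ±1-valued features, and is a standard approximation otherwise.

```python
import numpy as np

def adaboost(X, Y, T=10):
    """Coordinate-wise AdaBoost sketch: weak hypotheses are coordinates."""
    m, n = X.shape
    v = np.ones(m)                             # v_0^(s) = 1
    alphas, coords = [], []
    for _ in range(T):
        corr = (v * Y) @ X                     # sum_s v^(s) y^(s) x_i^(s), all i
        i_t = int(np.argmax(np.abs(corr)))     # most (anti)correlated coordinate
        margins = Y * X[:, i_t]
        eps = np.clip(np.sum(v * (margins <= 0)) / np.sum(v), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - eps) / eps)  # minimizes the weighted exp-loss
        v = v * np.exp(-alpha * margins)       # v_{t+1}^(s) update from the slide
        alphas.append(alpha)
        coords.append(i_t)

    def predict(x):                            # sign(sum_t alpha_t x_{i_t})
        return np.sign(sum(a * x[i] for a, i in zip(alphas, coords)))
    return predict
```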



History of Boosting (1)
 Rob Schapire:
 The strength of weak learnability, 1990.
   Showed that classifiers which are only 51%
   correct can be combined into a 99% correct
   classifier.
   Rather a theoretical result, since the algorithm
   was complicated and not practical.
   I know people who thought that this was not
   an interesting result.
History of Boosting (2)

 Yoav Freund:
 Boosting a weak learning algorithm
 by majority, 1995.
   Improved boosting algorithm, but still
   complicated and theoretical.
   Only logarithmically many examples
   are forwarded to the weak learner!
History of Boosting (3)
 Y. Freund and R. Schapire:
 A decision-theoretic generalization of on-line
 learning and an application to boosting, 1995.
   Very simple boosting algorithm, easy to implement.
   Theoretically less interesting.
   Performs very well in practice.

 Won the Gödel Prize in 2003 and the Kanellakis
 Prize in 2004. (Both are prestigious prizes in
 Theoretical Computer Science.)

 Since then, many variants of Boosting have appeared
 (mainly to improve error robustness):
   BrownBoost, soft margin boosting, LPBoost.
Support Vector Machines (SVMs)
 In its vanilla version, the SVM also learns a linear classifier.

 It maximizes the distance between the decision
 boundary and the nearest training points.
    Formulates learning as a well-behaved optimization
    problem.

 Invented by Vladimir Vapnik
 (1979, Russian paper).
    Translated into English in 1982.
    Initially no practical applications,
    since it required linear separability.
Practical SVMs
 Vapnik:
    The Nature of Statistical Learning Theory, 1995.
    Statistical Learning Theory, 1998.

 Cristianini, Shawe-Taylor:
 An Introduction to Support Vector Machines, 2000.

 Soft margin SVMs:
    Tolerate incorrectly labeled training examples (by
    using slack variables).

 Non-linear classification using the “kernel trick”.
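
A minimal soft-margin, kernelized SVM sketch; scikit-learn is an assumption here (it postdates much of this history) and is used only to illustrate the slack parameter C and the kernel:

```python
import numpy as np
from sklearn.svm import SVC

# Four XOR points and labels y = x1*x2, as in the earlier non-linear slide.
X = np.array([[+1, +1], [+1, -1], [-1, +1], [-1, -1]])
y = X[:, 0] * X[:, 1]

# Soft margin via C (smaller C tolerates more slack); non-linearity via a
# degree-2 polynomial kernel, which suffices for the quadratic XOR function.
clf = SVC(C=1.0, kernel="poly", degree=2)
clf.fit(X, y)
print(clf.predict(X))   # recovers the XOR labels [ 1 -1 -1  1]
```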
Support Vector Machines (SVMs)



       [Figure: two classes of training points, '+' and '−', illustrating
       the maximum-margin separating hyperplane between them.]
The kernel trick (1)

       Recall the Perceptron update,

            w^(t+1) = w^(t) + η y^(t) x^(t) = η Σ_{τ=1}^{t} y^(τ) x^(τ),

       and classification,

            ŷ = sign(w^(t+1) · x) = sign(Σ_{τ=1}^{t} y^(τ) x^(τ) · x).

       A kernel function generalizes the inner product,

            ŷ = sign(Σ_{τ=1}^{t} y^(τ) K(x^(τ), x)).
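
A sketch of the resulting dual-form ("kernelized") Perceptron; since w changes only on mistakes, it suffices to store the mistake examples:

```python
def kernel_perceptron(stream, K):
    """Dual-form Perceptron sketch. w is never formed explicitly: prediction
    uses sign(sum_tau y^(tau) K(x^(tau), x)) over the stored mistake rounds
    (eta is dropped, since a positive factor does not change the sign)."""
    support = []                                   # (x^(tau), y^(tau)) pairs
    for x, y in stream:
        s = sum(yt * K(xt, x) for xt, yt in support)
        y_hat = 1 if s >= 0 else -1
        if y_hat != y:                             # mistake: extend the sum
            support.append((x, y))
    return support
```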



The kernel trick (2)


       The inner product x^(τ) · x is a measure of similarity:
       x^(τ) · x is maximal (for normalized vectors) if x^(τ) = x.


       The kernel function is a similarity measure in feature space,
            K(x^(τ), x) = Φ(x^(τ)) · Φ(x).


       Kernel functions can be designed to capture the relevant
       similarities of the domain.
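
For example, a degree-2 polynomial kernel computes the inner product of the quadratic feature map from the XOR slide without ever forming it explicitly (a standard construction, sketched here):

```python
import numpy as np

def poly2_kernel(u, v):
    """(1 + u·v)^2: for 2-d inputs this equals Phi(u)·Phi(v) with
    Phi(x) = (1, √2·x1, √2·x2, x1², x2², √2·x1·x2), i.e. the quadratic
    feature map from the XOR slide up to constant rescalings."""
    return (1.0 + np.dot(u, v)) ** 2
```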


       Aizerman, Braverman, Rozonoer:
       Theoretical foundations of the potential function method in
       pattern recognition learning, 1964.


Where are we going?

 New learning algorithms?
 Better image descriptors!
 Probably they need to be learned.
 Probably they need to be
 hierarchical.
 We need (to use) more data.
Final remark on algorithm evaluation
and benchmarks

 Computer vision is where machine learning
 was 10 years ago (at least for object
 classification).

 Benchmark datasets are starting to
 become available, e.g. PASCAL
 VOC.
