Object Recognition with
     Deformable Models
            Pedro F. Felzenszwalb
        Department of Computer Science
            University of Chicago



Joint work with: Dan Huttenlocher, Joshua Schwartz,
         David McAllester, Deva Ramanan.
Example Problems
  Detecting rigid objects
  Detecting non-rigid objects
  [Figures: PASCAL challenge; medical image analysis; segmenting cells]
Deformable Models
•   Significant challenge:
    - Handling variation in appearance within object classes
    - Non-rigid objects, generic categories, etc.
•   Deformable models approach:
    - Consider each object as a deformed version of a template
    - Compact representation
    - Leads to interesting modeling and algorithmic problems
Overview
•   Part I: Pictorial Structures
    - Deformable part models
    - Highly efficient matching algorithms
•   Part II: Deformable Shapes
    - Triangulated polygons
    - Hierarchical models
•   Part III: The PASCAL Challenge
    - Recognizing 20 object categories in realistic scenes
    - Discriminatively trained, multiscale, deformable part models
Part I: Pictorial Structures

•   Introduced by Fischler and Elschlager in 1973

•   Part-based models:
    - Each part represents local visual properties
    - “Springs” capture spatial relationships
                             Matching model to image involves
                             joint optimization of part locations
                                       “stretch and fit”
Local Evidence + Global Decision

•   Parts have a match quality at each image location

•   Local evidence is noisy
    - Parts are detected in the context of the whole model
    [Figure: a part, a test image, and the part's match quality at each image location]
Matching Problem

•   Model is represented by a graph G = (V, E)
    - V = {v_1, ..., v_n} are the parts
    - (v_i, v_j) ∈ E indicates a connection between parts

•   m_i(l_i) is a cost for placing part i at location l_i

•   d_{ij}(l_i, l_j) is a deformation cost

•   Optimal configuration for the object is L = (l_1, ..., l_n) minimizing

     E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i,v_j) \in E} d_{ij}(l_i, l_j)
Matching Problem
     E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i,v_j) \in E} d_{ij}(l_i, l_j)

•   Assume n parts, k possible locations for each part
    - There are k^n configurations L

•   If graph is a tree we can use dynamic programming
    - O(n k^2) algorithm

•   If d_{ij}(l_i, l_j) = g(l_i - l_j) we can use min-convolutions
    - O(n k) algorithm
    - As fast as matching each part separately!
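
To make the k^n count concrete, here is a minimal sketch that evaluates E(L) and minimizes it by exhaustive search. The toy costs, the chain-shaped model and the absolute-difference deformation cost are assumptions chosen for illustration, not values from the talk.

```python
from itertools import product

def energy(L, m, d, edges):
    """E(L): sum of part costs plus deformation costs over the edges."""
    unary = sum(m[i][L[i]] for i in range(len(L)))
    pairwise = sum(d(L[i], L[j]) for i, j in edges)
    return unary + pairwise

def brute_force(m, d, edges):
    """Try all k^n configurations; only feasible for tiny models."""
    n, k = len(m), len(m[0])
    return min(product(range(k), repeat=n),
               key=lambda L: energy(L, m, d, edges))

# Hypothetical toy model: 3 parts in a chain, 4 locations on a line.
m = [[3, 0, 2, 5],      # m[i][l] = cost of placing part i at location l
     [4, 4, 1, 0],
     [0, 6, 6, 1]]
edges = [(0, 1), (1, 2)]

def d(a, b):
    """Assumed deformation cost: distance between the two locations."""
    return abs(a - b)

best = brute_force(m, d, edges)
print(best, energy(best, m, d, edges))
```

With n = 3 and k = 4 the search visits 4^3 = 64 configurations; the count grows exponentially in n, which is what the tree and min-convolution algorithms avoid.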
Dynamic Programming on Trees
          E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i,v_j) \in E} d_{ij}(l_i, l_j)

          [Figure: leaf part v_2 attached to part v_1]

•   For each l_1 find best l_2:

    - Best_2(l_1) = \min_{l_2} [ m_2(l_2) + d_{12}(l_1, l_2) ]

•   “Delete” v_2 and solve problem with smaller model

•   Keep removing leaves until there is a single part left
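
The leaf-removal procedure can be sketched as follows. This is an illustrative implementation under stated assumptions: parts are numbered so that every parent has a smaller index than its children, part 0 is the root, and the names `match_tree`, `parent` and the toy costs are hypothetical.

```python
def match_tree(m, d, parent):
    """Minimize E(L) for a tree-structured model by dynamic programming.

    m[i][l]   : cost of placing part i at location l
    d(a, b)   : deformation cost for a parent at a and a child at b
    parent[i] : index of the parent of part i (parent[0] is None)
    Assumes parent[i] < i, so high-numbered parts are removed first.
    """
    n, k = len(m), len(m[0])
    cost = [list(row) for row in m]    # cost[i][l]: best cost of subtree i with l_i = l
    arg = [[0] * k for _ in range(n)]  # arg[i][lp]: best l_i when the parent is at lp
    for i in range(n - 1, 0, -1):      # "delete" part i and fold it into its parent
        p = parent[i]
        for lp in range(k):
            best_l = min(range(k), key=lambda l: cost[i][l] + d(lp, l))
            arg[i][lp] = best_l
            cost[p][lp] += cost[i][best_l] + d(lp, best_l)
    L = [0] * n
    L[0] = min(range(k), key=lambda l: cost[0][l])
    for i in range(1, n):              # backtrack from the root
        L[i] = arg[i][L[parent[i]]]
    return cost[0][L[0]], L

# Hypothetical toy model: chain 0 - 1 - 2 with 4 locations on a line.
m = [[3, 0, 2, 5], [4, 4, 1, 0], [0, 6, 6, 1]]
value, L = match_tree(m, lambda a, b: abs(a - b), parent=[None, 0, 1])
print(value, L)
```

Each part is folded into its parent with a k-by-k minimization, which gives the O(n k^2) running time.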
Min-Convolution Speedup
      Best_2(l_1) = \min_{l_2} [ m_2(l_2) + d_{12}(l_1, l_2) ]

      [Figure: leaf part v_2 attached to part v_1]

•   Brute force: O(k^2), where k is the number of locations

•   Suppose d_{12}(l_1, l_2) = g(l_1 - l_2):

    - Best_2(l_1) = \min_{l_2} [ m_2(l_2) + g(l_1 - l_2) ]



•   Min-convolution: O(k) if g is convex
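
As an illustration of the speedup, the sketch below handles one simple convex case, g(x) = c|x| with locations on a 1D grid, using a forward and a backward pass. This special case and the function names are assumptions for the example; other convex costs such as quadratic ones need a different linear-time routine that is not shown here.

```python
def min_convolution_l1(m, c=1):
    """out[p] = min over q of m[q] + c * |p - q|, computed in O(k)."""
    out = list(m)
    for p in range(1, len(out)):            # forward pass: best q <= p
        out[p] = min(out[p], out[p - 1] + c)
    for p in range(len(out) - 2, -1, -1):   # backward pass: best q >= p
        out[p] = min(out[p], out[p + 1] + c)
    return out

def min_convolution_brute(m, g):
    """Reference O(k^2) version for an arbitrary cost g."""
    k = len(m)
    return [min(m[q] + g(p - q) for q in range(k)) for p in range(k)]

m2 = [0, 6, 6, 1]                           # hypothetical part costs
fast = min_convolution_l1(m2, c=1)
slow = min_convolution_brute(m2, lambda x: abs(x))
print(fast, slow)
```

Both calls return the same table Best_2(l_1); only the running time differs.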
Finding Motorbikes

Model with 6 parts:
      2 wheels
    2 headlights
front & back of seat
Human Pose Estimation
Human Tracking




Ramanan, Forsyth, Zisserman, Tracking People by Learning their Appearance,
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Jan 2007
Part II: Deformable Shapes
•   Shape is a fundamental cue for recognizing objects

•   Many objects have no well-defined parts
    - We can capture their outlines using deformable models
Triangulated Polygons




•   Polygonal templates

•   Delaunay triangulation gives natural decomposition of an object

•   Consider deforming each triangle “independently”


                                    Rabbit ear can be bent by
                                    changing shape of a single
                                            triangle
Structure of Triangulated Polygons


                     There are 2 graphs associated with a
                            triangulated polygon



If the polygon is simple (no holes):

  Dual graph is a tree
  Graphical structure of triangulation is a 2-tree
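
The dual-graph claim can be checked on small examples. The sketch below builds the dual graph (one node per triangle, one edge per shared side) and tests whether it is a tree. The two triangulations, a fan over a hexagon and a ring between two squares, are hypothetical inputs; the 2-tree property is not checked here.

```python
from itertools import combinations

def dual_graph(triangles):
    """Edges between triangles that share a side."""
    owner, edges = {}, []
    for t, tri in enumerate(triangles):
        for side in combinations(sorted(tri), 2):
            if side in owner:
                edges.append((owner[side], t))
            else:
                owner[side] = t
    return edges

def is_tree(n, edges):
    """A graph on n nodes is a tree if it is connected with n - 1 edges."""
    if len(edges) != n - 1:
        return False
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, stack = {0}, [0]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == n

# Simple polygon: hexagon 0..5 triangulated as a fan from vertex 0.
fan = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 5)]
# Polygon with a hole: ring between outer square 0..3 and inner square 4..7.
ring = [(0, 1, 4), (1, 5, 4), (1, 2, 5), (2, 6, 5),
        (2, 3, 6), (3, 7, 6), (3, 0, 7), (0, 4, 7)]
print(is_tree(len(fan), dual_graph(fan)), is_tree(len(ring), dual_graph(ring)))
```

The ring example shows why the "no holes" condition matters: its dual graph contains a cycle.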
Deformable Matching
        Consider piecewise affine maps from model
        to image (taking triangles to triangles)

        Find globally optimal deformation using
        dynamic programming over 2-tree

        [Figure: model and its matching to MRI data]
Hierarchical Shape Model
•   Shape-tree of curve from a to b:
    -   Select midpoint c, store relative location c | a,b.
    -   Left child is a shape-tree of sub-curve from a to c.
    -   Right child is a shape-tree of sub-curve from c to b.

    [Figure: curve from a to b through sample points f, e, g, c, h, d, i, with its shape-tree]

                          c | a,b
                e | a,c             d | c,b
           f | a,e   g | e,c   h | c,d   i | d,b
Deformations

•   Independently perturb relative locations stored in a shape-tree
    -   Local and global properties are preserved
    -   Reconstructed curve is perceptually similar to original
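
A minimal sketch of building, reconstructing and perturbing a shape-tree follows. The encoding of the relative location c | a,b as the complex ratio (c - a) / (b - a) is an assumption made for this example (it is unchanged by translation, rotation and scaling), as are the function names, the Gaussian noise and the sample curve. The curve is assumed open, with distinct endpoints for every sub-curve.

```python
import random

def build(points, i, j):
    """Shape-tree of the sub-curve points[i..j]; None if it has no interior sample."""
    if j - i < 2:
        return None
    c = (i + j) // 2                                          # midpoint sample
    rel = (points[c] - points[i]) / (points[j] - points[i])   # assumed encoding of c | a,b
    return (rel, build(points, i, c), build(points, c, j))

def reconstruct(tree, a, b):
    """Interior sample points between endpoints a and b, in curve order."""
    if tree is None:
        return []
    rel, left, right = tree
    c = a + rel * (b - a)
    return reconstruct(left, a, c) + [c] + reconstruct(right, c, b)

def perturb(tree, sigma, rng):
    """Independently add noise to every stored relative location."""
    if tree is None:
        return None
    rel, left, right = tree
    noise = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
    return (rel + noise, perturb(left, sigma, rng), perturb(right, sigma, rng))

# Hypothetical zig-zag curve; points are complex numbers x + yj.
points = [0j, 1 + 1j, 2 + 0j, 3 + 1j, 4 + 0j]
tree = build(points, 0, len(points) - 1)
same = reconstruct(tree, points[0], points[-1])
rotated = reconstruct(tree, 0j, 4j)          # same shape between new endpoints
deformed = reconstruct(perturb(tree, 0.05, random.Random(0)), points[0], points[-1])
print(same, rotated, deformed)
```

Because only relative locations are stored, placing the endpoints elsewhere reproduces the same shape up to a similarity transform, and small perturbations at any level of the tree deform the curve without breaking it apart.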
Matching

    [Figure: model curve and its shape-tree, with node w = (e | a,c) and its
     children v and u, matched to an image curve with points p, q, r]


Match(v, [p,q]) = w1
Match(u, [q,r]) = w2
Match(w, [p,r]) = w1 + w2 + dif((e|a,c), (q|p,r))

         similar to parsing with the CKY algorithm
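
The recurrence can be sketched as a memoized recursion over (node, interval) pairs. Several choices below are assumptions made for illustration and are not stated on the slide: relative locations are complex ratios, dif is the squared distance between two relative locations, an empty subtree matches any interval at zero cost, and the curve endpoints are fixed to its first and last samples.

```python
import math
from functools import lru_cache

def build(points, i, j):
    """Shape-tree of points[i..j] as nested tuples (rel, left, right)."""
    if j - i < 2:
        return None
    c = (i + j) // 2
    rel = (points[c] - points[i]) / (points[j] - points[i])
    return (rel, build(points, i, c), build(points, c, j))

def match(tree, curve):
    """Cost of the best match of a shape-tree to a sampled open curve."""
    def rel(q, p, r):
        return (curve[q] - curve[p]) / (curve[r] - curve[p])

    @lru_cache(maxsize=None)
    def cost(node, p, r):
        if node is None:
            return 0.0                      # assumed: empty subtree costs nothing
        model_rel, left, right = node
        best = math.inf                     # stays infinite if no split point q exists
        for q in range(p + 1, r):           # choose the curve point playing the midpoint
            dif = abs(model_rel - rel(q, p, r)) ** 2
            best = min(best, cost(left, p, q) + cost(right, q, r) + dif)
        return best

    return cost(tree, 0, len(curve) - 1)

# Hypothetical model curve, a rotated copy of it, and a straight line.
model = [0j, 1 + 1j, 2 + 0j, 3 + 1j, 4 + 0j]
tree = build(model, 0, len(model) - 1)
rotated = [z * 1j for z in model]
line = [0j, 1 + 0j, 2 + 0j, 3 + 0j, 4 + 0j]
print(match(tree, model), match(tree, rotated), match(tree, line))
```

As in CKY parsing, each table entry combines the solutions for two adjacent sub-intervals, here the two child sub-curves on either side of the split point q.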
Recognizing Leaves




Nearest neighbor classification
15 species, 75 examples per species (25 training, 50 test)

   Method            Classification rate (%)
   Shape-tree          96.28
   Inner distance      94.13
   Shape context       88.12
Part III: PASCAL Challenge
•   ~10,000 images, with ~25,000 target objects
    - Objects from 20 categories (person, car, bicycle, cow, table...)
    - Objects are annotated with labeled bounding boxes
Model Overview




[Figure panels: detection, root filter, part filters, deformation models]

Model has a root filter plus deformable parts
Histogram of Gradient (HOG) Features




•   Image is partitioned into 8x8 pixel blocks

•   In each block we compute a histogram of gradient orientations
    - Invariant to changes in lighting, small deformations, etc.
•   We compute features at different resolutions (pyramid)
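
A minimal sketch of the per-block histogram computation is given below. The details are assumptions for illustration rather than the exact features used in the talk: central-difference gradients, 9 unsigned orientation bins, votes weighted by gradient magnitude, and no normalization step (which is where robustness to lighting changes would normally come from). The feature pyramid is not shown.

```python
import math

def hog_cells(image, cell=8, bins=9):
    """Histogram of gradient orientations for each cell x cell block.

    image is a 2-D list of grayscale values; the result is indexed as
    hist[block_row][block_col][orientation_bin].
    """
    h, w = len(image), len(image[0])
    rows, cols = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]
    for y in range(rows * cell):
        for x in range(cols * cell):
            # central differences, clamped at the image border
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            theta = math.atan2(gy, gx) % math.pi          # unsigned orientation in [0, pi)
            b = min(int(theta / math.pi * bins), bins - 1)
            hist[y // cell][x // cell][b] += mag          # magnitude-weighted vote
    return hist

# Hypothetical 16x16 test image: a horizontal intensity ramp.
image = [[x for x in range(16)] for _ in range(16)]
hist = hog_cells(image)
print(hist[0][0])
```

For the ramp image every gradient points along the x axis, so all of the weight in each block lands in the first orientation bin.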
Filters

•   Filters are rectangular templates defining weights for features

•   Score is dot product of filter and subwindow of HOG pyramid


    [Figure: HOG pyramid with H and W marked]

    Score of H at this location is H ⋅ W
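
The dot-product score can be evaluated at every placement within one level of the pyramid, as in the sketch below. The nested-list layout (`features[y][x]` and `filt[y][x]` holding feature vectors) and the tiny example values are assumptions for illustration.

```python
def filter_scores(features, filt):
    """Score of a filter at every placement in one pyramid level.

    features[y][x] and filt[y][x] are feature vectors (lists of numbers).
    The score at (x, y) is the dot product of the filter with the
    subwindow whose top-left cell is (x, y).
    """
    fh, fw = len(filt), len(filt[0])
    height, width = len(features), len(features[0])
    scores = []
    for y in range(height - fh + 1):
        row = []
        for x in range(width - fw + 1):
            s = 0.0
            for dy in range(fh):
                for dx in range(fw):
                    s += sum(a * b for a, b in
                             zip(filt[dy][dx], features[y + dy][x + dx]))
            row.append(s)
        scores.append(row)
    return scores

# Hypothetical level with 2x3 cells of 2-D features and a 1x2 filter.
features = [[[1, 0], [0, 1], [1, 1]],
            [[0, 0], [2, 0], [0, 3]]]
filt = [[[1, 0], [0, 1]]]
scores = filter_scores(features, filt)
print(scores)
```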
Object Hypothesis




Score is sum of filter scores plus deformation scores

[Figure: image pyramid and HOG feature pyramid]

Multiscale model captures features at two resolutions
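
The score of a hypothesis can be sketched from precomputed filter score maps. The specifics below are assumptions for illustration: a quadratic deformation score, part score maps at twice the resolution of the root map, one anchor offset per part, and the names `hypothesis_score` and `best_parts`.

```python
def deformation_score(dx, dy, w):
    """Assumed quadratic form: a non-positive score that penalizes displacement."""
    wx, wy = w
    return -(wx * dx * dx + wy * dy * dy)

def hypothesis_score(root_map, part_maps, anchors, weights, root_pos, part_pos):
    """Sum of filter scores plus deformation scores for one full placement."""
    rx, ry = root_pos
    total = root_map[ry][rx]
    for pmap, (ax, ay), w, (px, py) in zip(part_maps, anchors, weights, part_pos):
        ix, iy = 2 * rx + ax, 2 * ry + ay    # ideal part location at twice the resolution
        total += pmap[py][px] + deformation_score(px - ix, py - iy, w)
    return total

def best_parts(root_map, part_maps, anchors, weights, root_pos):
    """Best placement of each part for a fixed root (parts are independent given the root)."""
    rx, ry = root_pos
    total, placement = root_map[ry][rx], []
    for pmap, (ax, ay), w in zip(part_maps, anchors, weights):
        ix, iy = 2 * rx + ax, 2 * ry + ay
        score, pos = max(
            (pmap[y][x] + deformation_score(x - ix, y - iy, w), (x, y))
            for y in range(len(pmap)) for x in range(len(pmap[0])))
        total += score
        placement.append(pos)
    return total, placement

# Hypothetical score maps: one root filter and one part filter.
root_map = [[1.0, 2.0]]
part_maps = [[[0, 0, 0, 5], [0, 3, 0, 0]]]
anchors, weights = [(0, 0)], [(1, 1)]
total, placement = best_parts(root_map, part_maps, anchors, weights, root_pos=(1, 0))
print(total, placement)
```

Because every part connects only to the root, this is a star-shaped instance of the tree-structured matching problem from Part I, and the inner maximization is the step that a min-convolution style routine would accelerate.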
Training
•   Training data consists of images with labeled bounding boxes

•   Need to learn the model structure, filters and deformation costs




Connection With Linear Classifiers
 •   Score of model is sum of filter scores plus deformation scores
     - Bounding box in training data specifies that score should be
       high for some placement in a range


                   w is a model: concatenation of filters and
                     deformation parameters
                   x is a detection window
                   z are filter placements
                   Feature vector: concatenation of features and
                     part displacements
Latent SVMs


Linear in w if z is fixed

Objective has two terms: regularization and hinge loss
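
The formulas on these two slides did not survive as text, so the sketch below assumes the usual latent SVM form: the score of a window is the maximum over placements z of the dot product between w and a feature vector for (x, z), and the objective is a squared-norm regularizer plus C times the sum of hinge losses. The data layout (one feature vector per candidate placement) and the toy numbers are hypothetical.

```python
def dot(w, phi):
    return sum(a * b for a, b in zip(w, phi))

def score(w, features_by_z):
    """Score of a window: max over latent placements z of w . phi(x, z)."""
    return max(dot(w, phi) for phi in features_by_z)

def objective(w, examples, C=1.0):
    """Assumed objective: 0.5 * ||w||^2 + C * sum of hinge losses.

    examples is a list of (features_by_z, y) pairs with y in {+1, -1}.
    """
    regularization = 0.5 * dot(w, w)
    hinge = sum(max(0.0, 1.0 - y * score(w, feats)) for feats, y in examples)
    return regularization + C * hinge

# Hypothetical data: each example lists phi(x, z) for its candidate placements z.
w = [1.0, -1.0]
examples = [
    ([[2.0, 0.0], [0.0, 1.0]], +1),   # positive window, two placements
    ([[0.0, 0.0], [1.0, 0.5]], -1),   # negative window, two placements
]
print(score(w, examples[0][0]), objective(w, examples))
```

With z held fixed for an example, its score is a single dot product and therefore linear in w, which is the property noted on the slide.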
Learned Models
[Figure: learned models for Bicycle, Sofa, Car, Bottle]
Example Results
More Results
Overall Results

•   9 systems competed in the 2007 challenge

•   Out of 20 classes we get:
    - First place in 10 classes
    - Second place in 6 classes
•   Some statistics:
    - It takes ~2 seconds to evaluate a model in one image
    - It takes ~3 hours to train a model
    - MUCH faster than most systems
Component Analysis

[Figure: precision-recall curves for PASCAL2006 Person; x-axis recall, y-axis precision]

   Legend (values as shown on the plot):
   Root                   0.18
   Root+Latent            0.24
   Parts+Latent           0.29
   Root+Parts+Latent      0.34
Summary

•   Deformable models provide an elegant framework for object
    detection and recognition

    - Efficient algorithms for matching models to images
    - Applications: pose estimation, medical image analysis,
      object recognition, etc.

•   We can learn models from partially labeled data

    - Generalized standard ideas from machine learning
    - Leads to state-of-the-art results in PASCAL challenge
•   Future work: hierarchical models, grammars, 3D objects
