Object Detection Using a Max-Margin Hough Transform. Subhransu Maji and Jitendra Malik, University of California at Berkeley, Berkeley, CA 94720. CVPR 2009, Miami, Florida.
Overview: Overview of the probabilistic Hough transform; Learning framework; Experiments; Summary
Our Approach: Hough Transform. Popular for detecting parameterized shapes (Hough '59, Duda & Hart '72, Ballard '81, ...). Local parts vote for the object pose. Complexity: (# parts) × (# votes), which can be significantly lower than a brute-force search over pose (for example, sliding-window detectors).
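The voting scheme above can be sketched in a few lines. This is a toy illustration, not the authors' implementation; the `offsets` table and the exact accumulator keys are assumptions standing in for the ISM occurrence distributions:

```python
from collections import defaultdict

def hough_vote(detected_parts, offsets):
    """Each detected part votes for object centers via its learned offsets.

    detected_parts: list of (part_id, (x, y)) observed in the test image.
    offsets: dict part_id -> list of (dx, dy) center offsets recorded
             during training (a toy stand-in for the learned spatial
             occurrence distributions).
    Cost is O(#parts * #votes), not a full scan over all candidate poses.
    """
    accumulator = defaultdict(float)
    for part_id, (x, y) in detected_parts:
        for dx, dy in offsets.get(part_id, []):
            accumulator[(x + dx, y + dy)] += 1.0
    # Peak of the accumulator = most-supported object center.
    return max(accumulator.items(), key=lambda kv: kv[1])

# Two hypothetical parts that agree on the center (10, 10):
parts = [("wheel", (5, 10)), ("roof", (10, 5))]
offs = {"wheel": [(5, 0)], "roof": [(0, 5)]}
print(hough_vote(parts, offs))  # -> ((10, 10), 2.0)
```

Votes from independent parts reinforce each other only at consistent centers, which is why peaks in the accumulator correspond to detections.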
Generalized to object detection. Learning: learn an appearance codebook by clustering interest points on training images; use Hough-space voting to find objects (Lowe '99, Leibe et al. '04, '08, Opelt & Pinz '08). Implicit Shape Model (Leibe et al. '04, '08): learn spatial distributions by matching the codebook to training images and recording matching positions on the object; the centroid is given, yielding spatial occurrence distributions over (x, y, s).
Detection Pipeline (B. Leibe, A. Leonardis, and B. Schiele, "Combined object categorization and segmentation with an implicit shape model," 2004). Interest points (e.g., SIFT, GB, local patches) → matched codebook entries (KD-tree) → probabilistic voting.
Probabilistic Hough Transform. Notation: C is the codebook, f the features, l the locations. The detection score sums, over features f_j and codewords C_i, the product of the position posterior p(x | O, C_i, l_j), the codeword match p(C_i | f_j), and the codeword likelihood p(O | C_i, l_j).
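In code, this factorization of the score can be sketched as follows. This is a toy illustration only; the `codebook` structure and its callable entries are assumptions made for the sketch, not the authors' data structures:

```python
def detection_score(x, features, codebook):
    """Toy probabilistic Hough score, following the slide's factorization:
    S(O, x) = sum_j sum_i p(x | O, C_i, l_j) * p(C_i | f_j) * p(O | C_i, l_j)

    features: list of (f_j, l_j) pairs.
    codebook: list of per-codeword dicts with (hypothetical) keys:
      'match'    : callable (f, l) -> p(C_i | f_j), the codeword match
      'position' : callable (x, l) -> p(x | O, C_i, l_j), position posterior
      'weight'   : p(O | C_i, l_j), the codeword likelihood/weight
    """
    score = 0.0
    for f, l in features:
        for c in codebook:
            score += c["position"](x, l) * c["match"](f, l) * c["weight"]
    return score

# One codeword that matches every feature and votes only for center (5, 5):
codebook = [{
    "match": lambda f, l: 1.0,
    "position": lambda x, l: 1.0 if x == (5, 5) else 0.0,
    "weight": 0.5,
}]
s = detection_score((5, 5), [("f0", (0, 0))], codebook)  # -> 0.5
```

Note the codeword weight p(O | C_i, l_j) enters as a plain multiplicative factor; this is the quantity the rest of the talk learns.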
Learning Feature Weights. Given: an appearance codebook C, and the posterior distribution of the object center for each codeword, p(x|...). To do: learn codebook weights such that the Hough transform detector works well (i.e., better detection rates). Contributions: we show that these weights can be learned optimally using a max-margin framework, and demonstrate that this leads to improved accuracy on various datasets.
Learning Feature Weights: First Try. Naïve Bayes weights encourage relatively rare parts. However, rare parts may not be good predictors of the object location. We need to jointly consider both the priors and the distribution of predicted centers.
Learning Feature Weights: Second Try. Under a location-invariance assumption, the overall score is linear in the codeword weights given the matched codebook entries: it is the dot product of the feature weights with the activations, which combine the position posterior and the codeword match.
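The linearity can be made concrete by folding everything except the weight into a per-codeword activation, so the score becomes a dot product. A toy sketch (the `codebook` layout and variable names are assumptions, as before):

```python
def activations(x, features, codebook):
    """a_i(x) = sum_j p(x | O, C_i, l_j) * p(C_i | f_j):
    the vote mass each codeword contributes at center x,
    independent of the learned weights."""
    a = [0.0] * len(codebook)
    for f, l in features:
        for i, c in enumerate(codebook):
            a[i] += c["position"](x, l) * c["match"](f, l)
    return a

def score(w, a):
    # S(x) = w . a(x): linear in the weight vector w.
    return sum(wi * ai for wi, ai in zip(w, a))

# Two hypothetical codewords keyed by feature type; position term ignored.
cb = [
    {"match": lambda f, l: 1.0 if f == "edge" else 0.0,
     "position": lambda x, l: 1.0},
    {"match": lambda f, l: 1.0 if f == "corner" else 0.0,
     "position": lambda x, l: 1.0},
]
feats = [("edge", (0, 0)), ("edge", (1, 1)), ("corner", (2, 2))]
a = activations((5, 5), feats, cb)  # -> [2.0, 1.0]
s = score([0.5, 2.0], a)           # -> 3.0
```

Because the activations do not depend on the weights, one can precompute an activation vector per training example and treat weight learning as a linear classification problem.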
Max-Margin Training. Training: construct the dictionary and record codeword distributions on training examples (the standard ISM model, Leibe et al. '04); compute the activation vectors "a" on positive and negative training examples; learn codebook weights using max-margin training (our contribution). Class labels are in {+1, -1}; activations are non-negative, and the weights are constrained to be non-negative.
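A minimal sketch of max-margin weight learning on the activation vectors, assuming the hinge-loss SVM objective with a non-negativity constraint on the weights. The paper poses this as a convex program; the projected-subgradient solver below is only a simple stand-in for a proper QP solver, and all names and hyperparameters here are assumptions:

```python
def train_max_margin(A, y, lam=0.01, lr=0.1, epochs=500):
    """Learn non-negative codeword weights w by minimizing
        lam/2 * ||w||^2 + mean_k max(0, 1 - y_k * (w . a_k + b))
    with projected subgradient descent (w is clipped to >= 0 after
    each step, enforcing the non-negativity constraint).

    A: list of activation vectors a_k; y: labels in {+1, -1}.
    """
    n, d = len(A), len(A[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0
        for a, yk in zip(A, y):
            margin = yk * (sum(wi * ai for wi, ai in zip(w, a)) + b)
            if margin < 1:  # hinge loss is active: accumulate subgradient
                for i in range(d):
                    gw[i] -= yk * a[i] / n
                gb -= yk / n
        w = [max(0.0, wi - lr * gi) for wi, gi in zip(w, gw)]  # project
        b -= lr * gb
    return w, b

# Toy activation vectors: positives have strong votes, negatives weak ones.
A = [[2.0, 2.0], [1.8, 2.2], [0.2, 0.1], [0.1, 0.3]]
y = [1, 1, -1, -1]
w, b = train_max_margin(A, y)
```

After training, w . a + b is positive on the positive examples and negative on the negatives, which is exactly the separation the Hough detection score needs.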
Experimental Datasets. ETHZ Shape Dataset (Ferrari et al., ECCV 2006): 255 images over 5 classes (Apple logo, Bottle, Giraffe, Mug, Swan). UIUC Single-Scale Cars Dataset (Agarwal & Roth, ECCV 2002): 1050 training, 170 test images. INRIA Horse Dataset (Jurie & Ferrari): 170 positive + 170 negative images (50 + 50 for training).
Experimental Results: Hough transform details. Interest points: Geometric Blur descriptors at a sparse sample of edges (Berg & Malik '01). Codebook constructed using k-means. Voting over position and aspect ratio; search over scales. Correct detections per the PASCAL criterion.
Learned Weights (ETHZ shape): Naïve Bayes vs. Max-Margin; blue (low), dark red (high). Naïve Bayes is influenced by clutter (rare structures); Max-Margin picks out important parts.
Learned Weights (UIUC cars): Naïve Bayes vs. Max-Margin; blue (low), dark red (high).
Learned Weights (INRIA horses): Naïve Bayes vs. Max-Margin; blue (low), dark red (high).
Detection Results (ETHZ dataset) Recall @ 1.0 False Positives Per Window
Detection Results (INRIA Horses) Our Work
Detection Results (UIUC Cars) Our Work
Hough Voting + Verification Classifier. Recall @ 0.3 False Positives Per Image, ETHZ Shape Dataset. IKSVM was run on the top 30 windows + local search. KAS: Ferrari et al., PAMI '08; TPS-RPM: Ferrari et al., CVPR '07. Better-fitting bounding boxes via implicit sampling over aspect ratio.
Hough Voting + Verification Classifier IKSVM was run on top 30 windows + local search Our Work
Hough Voting + Verification Classifier UIUC Single Scale Car Dataset IKSVM was run on top 10 windows + local search 1.7% improvement
Summary. Hough-transform-based detectors offer good detection performance and speed. To get better performance one may learn discriminative dictionaries (two talks ago, Gall et al. '09) or weights on codewords (our work). Our approach directly optimizes detection performance using a max-margin formulation. Any weak predictor of the object center can be used in this framework, e.g., regions (one talk ago, Gu et al., CVPR '09).
Work partially supported by: ARO MURI W911NF-06-1-0076 and ONR MURI N00014-06-1-0734 Computer Vision Group @ UC Berkeley Acknowledgements Thank You Questions?
Backup Slide: Toy Example. Rare but poor localization vs. rare and good localization.

Editor's Notes

  • #2: Thank you. Good morning. I am going to present a learning framework for Hough transform based object detection.
  • #3: We address the task of object detection: localizing an instance of an object in an image. We use an approach based on the Hough transform. Before I go into the details, I will present an overview of the Hough transform, followed by our learning framework. I will then present experimental results and conclude.
  • #4: Yet another way of doing this is a Hough-transform-based approach. This is of course an old idea, proposed by Hough for detecting lines more than 50 years ago; since then it has been generalized to detect parametric shapes like ellipses and circles. Local parts cast votes for the object pose, and the complexity scales linearly with (# parts) × (# votes).
  • #5: Recently Leibe and Schiele have extended this framework to object detection. A slide from their Implicit Shape Model framework illustrates the technique. Local parts are based on patches represented using a dictionary learned from training examples. The position of each codeword is recorded on the training examples to form a distribution of each codeword's location with respect to the object center. For example, the patch corresponding to the head of the person is typically at a fixed vertical offset with respect to the torso, as seen in the bottom-left distribution. At test time the interest points are detected and matched to the codebook entries, which vote for the object center. The peaks of the voting space correspond to object locations. Quite simple, but a powerful framework.
  • #7: Introducing the notation for the next few slides. Let C be the learned codebook, f denote the features, and l the locations of the features. The overall detection score is the sum of contributions from each feature f_j observed at a location l_j. Each feature is matched to the codebook as given by p(C_i|f_j); this could simply be 1 for the nearest neighbor and 0 for the other codewords. p(x|O,C_i,l_j) is the distribution of the centroid given the codeword C_i observed at location l_j. The last term, p(O|C_i,l_j), is the confidence (or weight) of the codeword C_i.
  • #8: Learning codeword weights in the context of Hough transform has not been addressed well in the literature. In an earlier talk today we saw a way of learning discriminative dictionaries for Hough transform. However in situations where the codebook is fixed we would like to learn the importance of each codeword. I.e. we have been given a codebook and the posterior distribution of the object center for each codeword and we would like to learn weights so that the Hough transform detector has the best detection rates. What we show is that these weights can be learned optimally using convex optimization and leads to better detection rates when compared to uniform weights and even a simple learning scheme.
  • #9: Assign each codeword a weight proportional to the relative frequency of the object. We call these the naïve Bayes weights. (Read from slides)
  • #10: If you look at the equation of the Hough transform, you realize that the overall score is linear in the codebook weights. This assumes location invariance of the object (i.e., the object can appear anywhere in the image). Thus the score is a dot product of the weight vector and an activation vector. The activations are independent of the weights given the features and their locations. This suggests a learning scheme that learns weights which increase the score on positive locations over negative ones. We formalize this in the next slide.
  • #12: We perform experiments on 3 datasets (ETHZ, UIUC cars and INRIA horses)
  • #13: Our HT detector is based on GB descriptors (read from slide) and correct detections are counted using the PASCAL criterion i.e. an overlap of greater than 0.5.
  • #25: To illustrate the idea, consider a toy example. We are trying to detect squares, where the negative examples are parallel lines as shown. We have four kinds of codewords: tips, vertical edges, horizontal edges, and corners. Both corners and horizontal edges occur on the positive examples only; however, let's assume that corners are easy to localize while a horizontal edge can appear anywhere. The naïve Bayes scheme assigns equal weights to both, whereas our framework distinguishes them correctly, as seen in the table of weights. The final scores on the positive and negative examples for all the schemes are shown, and one can see that the M2HT (max-margin Hough transform) achieves the maximum separation.