The Role of Learning in Vision
  • 3.30pm: Rob Fergus
  • 3.40pm: Andrew Ng
  • 3.50pm: Kai Yu
  • 4.00pm: Yann LeCun
  • 4.10pm: Alan Yuille
  • 4.20pm: Deva Ramanan
  • 4.30pm: Erik Learned-Miller
  • 4.40pm: Erik Sudderth
  • 4.50pm: Spotlights (Qiang Ji, M-H Yang)
  • 4.55pm: Discussion
  • 5.30pm: End
  • Session labels: Feature / Deep Learning; Compositional Models; Learning Representations; Overview; Low-level Representations; Learning on the fly
An Overview of Hierarchical Feature Learning and Relations to Other Models
  • Rob Fergus
  • Dept. of Computer Science, Courant Institute, New York University
Motivation
  • Multitude of hand-designed features currently in use: SIFT, HOG, LBP, MSER, Color-SIFT, …
  • Maybe there is some way of learning the features?
  • Also, these features just capture low-level edge gradients
  • Felzenszwalb, Girshick, McAllester and Ramanan, PAMI 2007
  • Yan & Huang (winner of PASCAL 2010 classification competition)
Beyond Edges?
  • Mid-level cues
  • "Tokens" from Vision by D. Marr: Continuation, Parallelism, Junctions, Corners
  • High-level object parts
  • Difficult to hand-engineer → What about learning them?
Deep/Feature Learning Goal
  • Build hierarchy of feature extractors (≥ 1 layers)
      • All the way from pixels → classifier
      • Homogeneous structure per layer
      • Unsupervised training
  • Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier
  • Numerous approaches:
      • Restricted Boltzmann Machines (Hinton, Ng, Bengio, …)
      • Sparse coding (Yu, Fergus, LeCun)
      • Auto-encoders (LeCun, Bengio)
      • ICA variants (Ng, Cottrell)
      • & many more…
Single Layer Architecture
  • Input: Image Pixels / Features → Filter → Normalize → Pool → Output: Features / Classifier
  • Details in the boxes matter (especially in a hierarchy)
  • Links to neuroscience
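
The filter → normalize → pool pattern can be sketched on a 1-D signal. This is a minimal illustration only: the two filters, the rectification, the choice of divisive normalization across features, and the non-overlapping max pooling are all assumptions made for the example, not choices taken from the slides.

```python
import math

def filter_bank(signal, filters):
    """Valid cross-correlation of a 1-D signal with each filter, then rectification."""
    maps = []
    for f in filters:
        k = len(f)
        maps.append([
            abs(sum(f[j] * signal[i + j] for j in range(k)))
            for i in range(len(signal) - k + 1)
        ])
    return maps

def divisive_normalize(maps, eps=1e-6):
    """At each position, divide every feature by the L2 norm taken across features."""
    n_pos = len(maps[0])
    out = [[0.0] * n_pos for _ in maps]
    for i in range(n_pos):
        norm = math.sqrt(sum(m[i] ** 2 for m in maps)) + eps
        for c, m in enumerate(maps):
            out[c][i] = m[i] / norm
    return out

def max_pool(maps, size):
    """Non-overlapping spatial max pooling within each feature map."""
    return [
        [max(m[i:i + size]) for i in range(0, len(m) - size + 1, size)]
        for m in maps
    ]

def layer(signal, filters, pool_size=2):
    """One layer: filter, then normalize, then pool."""
    return max_pool(divisive_normalize(filter_bank(signal, filters)), pool_size)

signal = [0, 0, 1, 1, 1, 0, 0, 0, 1]
filters = [[-1, 1], [0.5, 0.5]]  # hypothetical edge and blur filters
features = layer(signal, filters)  # two pooled feature maps, four values each
```

Swapping any one box (for example sum pooling instead of max, or subtractive instead of divisive normalization) changes the features, which is the sense in which the details in the boxes matter.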
Example Feature Learning Architectures
  • Pixels / Features
  • Filter with Dictionary (patch/tiled/convolutional) + Non-linearity
  • Normalization between feature responses:
      • Local Contrast Normalization (Subtractive / Divisive)
      • (Group) Sparsity
      • Max / Softmax
  • Spatial/Feature pooling (Sum or Max)
  • Features
SIFT Descriptor
  • Image Pixels → Apply Gabor filters → Spatial pool (Sum) → Normalize to unit length → Feature Vector
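
A rough sketch of this view of the descriptor follows. It is not the SIFT algorithm itself: rectified projections of the image gradient onto a few directions stand in for the Gabor filters, and the patch size, number of orientations, and 2x2 pooling grid are assumptions chosen to keep the example small.

```python
import math

def oriented_responses(patch, n_orient=4):
    """Rectified gradient responses along n_orient directions (a crude stand-in for Gabor filters)."""
    h, w = len(patch), len(patch[0])
    maps = [[[0.0] * (w - 2) for _ in range(h - 2)] for _ in range(n_orient)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            for o in range(n_orient):
                theta = 2 * math.pi * o / n_orient
                r = gx * math.cos(theta) + gy * math.sin(theta)
                maps[o][y - 1][x - 1] = max(r, 0.0)
    return maps

def sum_pool(m, grid):
    """Sum the responses inside each cell of a grid x grid spatial layout."""
    h, w = len(m), len(m[0])
    ch, cw = h // grid, w // grid
    return [
        sum(m[y][x]
            for y in range(gy * ch, (gy + 1) * ch)
            for x in range(gx * cw, (gx + 1) * cw))
        for gy in range(grid) for gx in range(grid)
    ]

def descriptor(patch, n_orient=4, grid=2):
    """Filter, sum-pool over space (separately per orientation), normalize to unit length."""
    vec = []
    for m in oriented_responses(patch, n_orient):
        vec.extend(sum_pool(m, grid))
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# 6x6 patch containing a vertical step edge
patch = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
d = descriptor(patch)  # 4 orientations x 4 spatial cells = 16 values
```

Pooling runs over space within each orientation channel and never across channels, so the descriptor keeps orientation information while tolerating small spatial shifts.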
Spatial Pyramid Matching
  • SIFT Features → Filter with Visual Words → Max → Multi-scale spatial pool (Sum) → Classifier
  • Lazebnik, Schmid, Ponce [CVPR 2006]
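
The pipeline can be illustrated with a small sketch, assuming hard assignment to the nearest visual word (the "Max" step) and unweighted histograms over 1x1 and 2x2 grids. The vocabulary, descriptors, and point locations are made up for the example, and the per-level weighting used in the original paper is omitted.

```python
def nearest_word(desc, words):
    """Index of the closest visual word (hard assignment: one winner per descriptor)."""
    dists = [sum((a - b) ** 2 for a, b in zip(desc, w)) for w in words]
    return dists.index(min(dists))

def spatial_pyramid(points, descs, words, levels=2):
    """Concatenate visual-word histograms over 1x1, 2x2, ... grids.

    points are (x, y) locations normalized to [0, 1).
    """
    assignments = [nearest_word(d, words) for d in descs]
    feature = []
    for level in range(levels):
        cells = 2 ** level
        hist = [[0] * len(words) for _ in range(cells * cells)]
        for (x, y), wi in zip(points, assignments):
            cx = min(int(x * cells), cells - 1)
            cy = min(int(y * cells), cells - 1)
            hist[cy * cells + cx][wi] += 1  # sum pooling inside each cell
        for h in hist:
            feature.extend(h)
    return feature

words = [[0.0, 0.0], [1.0, 1.0]]  # hypothetical 2-word vocabulary
points = [(0.1, 0.1), (0.9, 0.1), (0.8, 0.8)]
descs = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1]]
f = spatial_pyramid(points, descs, words)
# f == [2, 1, 1, 0, 0, 1, 0, 0, 1, 0]: one global histogram, then four cell histograms
```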
Role of Normalization
  • Lots of different mechanisms (max, sparsity, LCN etc.)
  • All induce local competition between features to explain input
      • "Explaining away"
      • Just like top-down models
      • But more local mechanism
  • Example: Convolutional Sparse Coding (diagram: Filters, Convolution, an |.|₁ term on each of four feature maps)
  • Zeiler et al. [CVPR'10/ICCV'11], Kavukcuoglu et al. [NIPS'10], Yang et al. [CVPR'10]
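
"Explaining away" under an L1 penalty can be seen in a tiny sparse coding problem. The sketch below is patch-based rather than convolutional and uses ISTA, a standard solver that is assumed here for illustration and is not necessarily what the cited papers use. The two-atom dictionary, the penalty `lam`, and the step size are hypothetical values.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    return max(abs(v) - t, 0.0) * (1 if v > 0 else -1)

def ista(x, D, lam=0.1, step=0.5, n_iter=200):
    """Minimize 0.5 * ||x - sum_k z[k] * D[k]||^2 + lam * ||z||_1 by ISTA.

    D is a list of atoms, each a list with the same length as x.
    step must not exceed 1 / (largest eigenvalue of the atoms' Gram matrix).
    """
    z = [0.0] * len(D)
    for _ in range(n_iter):
        recon = [sum(z[k] * D[k][i] for k in range(len(D))) for i in range(len(x))]
        resid = [xi - ri for xi, ri in zip(x, recon)]
        # gradient step on the reconstruction term, then shrinkage
        z = [
            soft_threshold(
                z[k] + step * sum(D[k][i] * resid[i] for i in range(len(x))),
                step * lam,
            )
            for k in range(len(D))
        ]
    return z

D = [[1.0, 0.0], [0.8, 0.6]]  # two correlated unit-norm atoms
x = [1.0, 0.0]                # the input is exactly the first atom

feedforward = [sum(a * b for a, b in zip(atom, x)) for atom in D]  # [1.0, 0.8]
z = ista(x, D)  # converges to roughly [0.9, 0.0]
```

Plain filtering makes both atoms respond (1.0 and 0.8), whereas the sparse code keeps only the first: once it has explained the input, the second atom's coefficient is driven to zero.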
Role of Pooling
  • Spatial pooling
      • Invariance to small transformations
      • Larger receptive fields
      • Zeiler, Taylor, Fergus [ICCV 2011]
  • Pooling across feature groups
      • Gives AND/OR type behavior
      • Compositional models of Zhu, Yuille
      • Chen, Zhu, Lin, Yuille, Zhang [NIPS 2007]
  • Pooling with latent variables (& springs)
      • Pictorial structures models
      • Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
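
A toy 1-D example of the invariance that spatial pooling buys, and of its limit. The pool size and the response maps are assumptions made for the illustration.

```python
def max_pool(responses, size):
    """Non-overlapping max pooling over a 1-D feature map."""
    return [max(responses[i:i + size]) for i in range(0, len(responses), size)]

a = [0, 0, 1, 0, 0, 0, 0, 0]  # feature fires at position 2
b = [0, 0, 0, 1, 0, 0, 0, 0]  # same feature shifted by one position
c = [0, 0, 0, 0, 1, 0, 0, 0]  # shifted far enough to cross a pool boundary

pooled_a = max_pool(a, 4)  # [1, 0]
pooled_b = max_pool(b, 4)  # [1, 0]: unchanged by the small shift
pooled_c = max_pool(c, 4)  # [0, 1]: a larger shift is still visible
```

Each pooled unit also summarizes four input positions, so units in the next layer see a larger receptive field.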
 
Object Detection with Discriminatively Trained Part-Based Models
  • HOG Pyramid → Apply object part filters → Pool part responses (latent variables & springs) → Non-max Suppression (Spatial) → Score
  • Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
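
Pooling with latent variables and springs can be sketched in 1-D: each part may move away from its anchor, paying a quadratic deformation cost, and the best placement is kept. This is a toy under stated assumptions, not the published detector, which works on 2-D HOG pyramids and applies non-max suppression to bounding boxes. The response maps, part offsets, and spring constant are invented for the example.

```python
def pool_part(responses, anchor, spring, max_disp=2):
    """Best part placement near its anchor: filter response minus a quadratic spring cost."""
    best = float("-inf")
    for d in range(-max_disp, max_disp + 1):
        p = anchor + d
        if 0 <= p < len(responses):
            best = max(best, responses[p] - spring * d * d)
    return best

def detection_scores(root, parts, offsets, spring=0.5):
    """Score every root location as root response plus pooled part responses."""
    return [
        root[i] + sum(pool_part(r, i + off, spring) for r, off in zip(parts, offsets))
        for i in range(len(root))
    ]

def non_max_suppression(scores):
    """Keep only the locations that are strict local maxima."""
    keep = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i < len(scores) - 1 else float("-inf")
        if s > left and s > right:
            keep.append(i)
    return keep

root = [0, 0, 1, 0, 0]                       # hypothetical root filter responses
parts = [[0, 2, 0, 0, 0], [0, 0, 0, 0, 2]]   # hypothetical part filter responses
offsets = [-1, 1]                            # anchor of each part relative to the root

scores = detection_scores(root, parts, offsets)  # [0.0, 1.5, 4.5, 3.5, 1.5]
detections = non_max_suppression(scores)         # [2]
```

At root location 2 the second part sits one position away from its anchor, so it contributes 2 − 0.5 = 1.5 rather than its full response of 2. The max over displacements is the latent-variable pooling step.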

Fcv learn fergus

Editor's Notes

  • #7: Winder and Brown paper. Slightly smoothed view of things.
  • #9: Note pooling is across space, not across Gabor channel
  • #10: Non-maximal suppression across VW. Like an L-Inf normalization
  • #14: Note pooling is across space, not across Gabor channel