Fcv learn fergus

The Role of Learning in Vision
3.30pm: Rob Fergus
3.40pm: Andrew Ng
3.50pm: Kai Yu
4.00pm: Yann LeCun
4.10pm: Alan Yuille
4.20pm: Deva Ramanan
4.30pm: Erik Learned-Miller
4.40pm: Erik Sudderth
4.50pm: Spotlights - Qiang Ji, M-H Yang
4.55pm: Discussion
5.30pm: End
Topics: Feature / Deep Learning; Compositional Models; Learning Representations; Overview; Low-level Representations; Learning on the fly

An Overview of Hierarchical Feature Learning and Relations to Other Models
Rob Fergus
Dept. of Computer Science, Courant Institute, New York University

Motivation
Multitude of hand-designed features currently in use: SIFT, HOG, LBP, MSER, Color-SIFT, ...
Maybe there is some way of learning the features?
Also, these features capture only low-level edge gradients
Felzenszwalb, Girshick, McAllester and Ramanan, PAMI 2007
Yan & Huang (winner of the PASCAL 2010 classification competition)

Beyond Edges?
Mid-level cues ("Tokens" from Vision by D. Marr):
  Continuation
  Parallelism
  Junctions
  Corners
High-level object parts
Difficult to hand-engineer -> What about learning them?

Deep/Feature Learning Goal
Build hierarchy of feature extractors (≥ 1 layers)
  All the way from pixels -> classifier
  Homogeneous structure per layer
  Unsupervised training
Image/Video Pixels -> Layer 1 -> Layer 2 -> Layer 3 -> Simple Classifier
Numerous approaches:
  Restricted Boltzmann Machines (Hinton, Ng, Bengio, ...)
  Sparse coding (Yu, Fergus, LeCun)
  Auto-encoders (LeCun, Bengio)
  ICA variants (Ng, Cottrell)
  & many more...

Single Layer Architecture
Input: Image Pixels / Features -> Filter -> Normalize -> Pool -> Output: Features / Classifier
Details in the boxes matter (especially in a hierarchy)
Links to neuroscience
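The filter, normalize, pool sequence can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not on the slide: random filters stand in for a learned dictionary, rectification is the non-linearity, normalization is divisive across feature maps, and pooling is a non-overlapping spatial max. The function names are hypothetical.

```python
import numpy as np

def filter_bank(image, filters):
    """Valid-mode 2-D correlation with each k x k filter, then rectification."""
    k = filters.shape[1]
    h, w = image.shape[0] - k + 1, image.shape[1] - k + 1
    out = np.empty((len(filters), h, w))
    for i in range(h):
        for j in range(w):
            patch = image[i:i + k, j:j + k]
            # One response per filter at this location.
            out[:, i, j] = np.tensordot(filters, patch, axes=([1, 2], [0, 1]))
    return np.abs(out)

def normalize(maps, eps=1e-8):
    """Divisive normalization across feature maps at each location (assumed choice)."""
    norm = np.sqrt((maps ** 2).sum(axis=0, keepdims=True))
    return maps / (norm + eps)

def pool(maps, size=2):
    """Non-overlapping spatial max pooling over size x size blocks."""
    f, h, w = maps.shape
    h, w = h - h % size, w - w % size
    blocks = maps[:, :h, :w].reshape(f, h // size, size, w // size, size)
    return blocks.max(axis=(2, 4))

def layer(image, filters):
    """One layer: filter -> normalize -> pool."""
    return pool(normalize(filter_bank(image, filters)))

rng = np.random.default_rng(0)
image = rng.standard_normal((10, 10))      # stand-in for image pixels
filters = rng.standard_normal((4, 3, 3))   # stand-in for a learned dictionary
features = layer(image, filters)           # 4 feature maps of size 4 x 4
```

Stacking several such layers, each taking the previous layer's feature maps as input, gives the hierarchy described on the previous slide; a multi-channel version would need filters that span all input maps.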

Example Feature Learning Architectures
Pixels / Features
-> Filter with Dictionary (patch/tiled/convolutional) + Non-linearity
-> Normalization between feature responses:
     Local Contrast Normalization (Subtractive / Divisive)
     (Group) Sparsity
     Max / Softmax
-> Spatial/Feature Pooling (Sum or Max)
-> Features
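As one concrete case of the normalization options listed, local contrast normalization can be sketched as a subtractive step followed by a divisive step. The box window, the edge padding, and the `eps` floor are assumptions made for this illustration; published variants typically use a Gaussian window and a different floor on the divisor.

```python
import numpy as np

def box_mean(x, radius):
    """Mean over a (2r+1) x (2r+1) window, with edge padding."""
    k = 2 * radius + 1
    padded = np.pad(x, radius, mode="edge")
    out = np.zeros(x.shape, dtype=float)
    for di in range(k):
        for dj in range(k):
            out += padded[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out / (k * k)

def local_contrast_normalize(x, radius=1, eps=1e-6):
    """Subtract the local mean, then divide by the local standard deviation."""
    centered = x - box_mean(x, radius)                      # subtractive step
    local_std = np.sqrt(box_mean(centered ** 2, radius))
    return centered / np.maximum(local_std, eps)            # divisive step

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
y = local_contrast_normalize(x)
```

Because both steps are local, the output is unchanged when the input contrast is rescaled, and a constant region maps to zero.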

SIFT Descriptor
Image Pixels -> Apply Gabor filters -> Spatial pool (Sum) -> Normalize to unit length -> Feature Vector
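Read this way, a SIFT-style descriptor is a single filter, pool, normalize layer. The sketch below is a simplification and not the actual SIFT algorithm: hard binning of gradient orientation stands in for the oriented (Gabor-like) filters, there is no Gaussian weighting or interpolation, and the patch is assumed square. The function name is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch, cells=4, bins=8):
    """Orientation-binned gradients, sum-pooled per cell, normalized to unit length."""
    gy, gx = np.gradient(patch.astype(float))   # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    # Hard assignment of each pixel to one orientation bin (the "filter" step).
    b = np.minimum((ang / (2 * np.pi) * bins).astype(int), bins - 1)
    size = patch.shape[0] // cells
    desc = np.zeros((cells, cells, bins))
    for i in range(cells * size):
        for j in range(cells * size):
            # Spatial sum pooling of gradient magnitude within each cell.
            desc[i // size, j // size, b[i, j]] += mag[i, j]
    v = desc.ravel()
    return v / (np.linalg.norm(v) + 1e-12)      # normalize to unit length

rng = np.random.default_rng(0)
descriptor = sift_like_descriptor(rng.standard_normal((16, 16)))
```

With a 16 x 16 patch, 4 x 4 cells and 8 orientation bins, the result is a 128-dimensional unit vector.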

Spatial Pyramid Matching
SIFT Features -> Filter with Visual Words -> Max -> Multi-scale spatial pool (Sum) -> Classifier
Lazebnik, Schmid, Ponce [CVPR 2006]
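The same layer view applies here: matching each descriptor to its nearest visual word acts as filtering followed by a max, and the pyramid histograms are multi-scale sum pooling. The sketch below assumes a given codebook and descriptor positions scaled to the unit square, and it omits the per-level weights used in the original method. The function names are hypothetical.

```python
import numpy as np

def assign_words(descriptors, codebook):
    """Index of the nearest visual word for each descriptor (hard assignment)."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def spatial_pyramid(words, xy, n_words, levels=2):
    """Concatenated word histograms over 1x1, 2x2, ... grids; xy lies in [0, 1)^2."""
    hists = []
    for level in range(levels):
        g = 2 ** level
        cell = np.minimum((xy * g).astype(int), g - 1)
        idx = (cell[:, 1] * g + cell[:, 0]) * n_words + words
        # Sum pooling: count the words that fall in each grid cell.
        hists.append(np.bincount(idx, minlength=g * g * n_words))
    return np.concatenate(hists)

rng = np.random.default_rng(0)
descriptors = rng.standard_normal((20, 8))   # stand-in for SIFT features
codebook = rng.standard_normal((3, 8))       # stand-in for learned visual words
xy = rng.random((20, 2))                     # descriptor positions in the image
words = assign_words(descriptors, codebook)
pyramid = spatial_pyramid(words, xy, n_words=3)
```

The resulting vector would then be passed to a classifier, as in the last box of the pipeline.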

Role of Normalization
Lots of different mechanisms (max, sparsity, LCN etc.)
All induce local competition between features to explain input
  "Explaining away"
  Just like top-down models
  But more local mechanism
Example: Convolutional Sparse Coding
  Filters -> Convolution -> |.|_1 (L1 norm) on each feature map
Zeiler et al. [CVPR'10/ICCV'11], Kavukcuoglu et al. [NIPS'10], Yang et al. [CVPR'10]
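A small non-convolutional example may make the competition concrete. The sketch below solves an L1-penalized sparse coding problem with ISTA, which is an assumed solver chosen for brevity and not necessarily the one used in the cited papers. The two-atom dictionary and the value of `lam` are invented for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, D, lam, n_iter=500):
    """Minimize 0.5 * ||x - D z||^2 + lam * ||z||_1 over z."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)
        z = soft_threshold(z - grad / L, lam / L)
    return z

# Two correlated unit-norm atoms as columns; the input equals the first atom.
D = np.array([[1.0, 0.8],
              [0.0, 0.6]])
x = np.array([1.0, 0.0])

feedforward = D.T @ x        # plain filtering: both atoms respond
z = ista(x, D, lam=0.1)      # sparse code: the first atom explains the input
```

Plain filtering gives responses of 1.0 and 0.8, whereas the sparse code keeps only the first atom (about 0.9) and sets the second to zero: the first atom has explained away the evidence for the second.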

Role of Pooling
Spatial pooling
  Invariance to small transformations
  Chen, Zhu, Lin, Yuille, Zhang [NIPS 2007]
Pooling across feature groups
  Gives AND/OR type behavior
  Compositional models of Zhu, Yuille
  Larger receptive fields
  Zeiler, Taylor, Fergus [ICCV 2011]
Pooling with latent variables (& springs)
  Pictorial structures models
  Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
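The first two kinds of pooling can be illustrated with a toy example. The sketch assumes non-overlapping max pooling and hand-picked feature groups; both function names are hypothetical.

```python
import numpy as np

def spatial_max_pool(fmap, size):
    """Non-overlapping max pooling; map dimensions must be multiples of size."""
    h, w = fmap.shape
    return fmap.reshape(h // size, size, w // size, size).max(axis=(1, 3))

def group_max_pool(responses, groups):
    """Max over each group of features: an OR over the group's members."""
    return np.array([responses[list(g)].max() for g in groups])

# The same activation at two positions one pixel apart, inside one 2 x 2 cell.
a = np.zeros((4, 4))
a[0, 0] = 1.0
b = np.zeros((4, 4))
b[1, 1] = 1.0
pooled_a = spatial_max_pool(a, 2)
pooled_b = spatial_max_pool(b, 2)

# Four feature responses pooled in two groups of two.
responses = np.array([0.1, 0.9, 0.2, 0.3])
grouped = group_max_pool(responses, [(0, 1), (2, 3)])
```

The two pooled maps are identical, which is the invariance to small shifts; a shift that crosses a cell boundary would still change the output. The group pool fires if any member of a group fires.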

Object Detection with Discriminatively Trained Part-Based Models
HOG Pyramid -> Apply object part filters -> Pool part responses (latent variables & springs) -> Non-max Suppression (Spatial) -> Score
Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
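Pooling with latent variables and springs can be sketched as a max over part placements, where each placement pays a deformation cost for moving away from its anchor. This is a simplification under stated assumptions: an isotropic quadratic cost with a single hand-set `weight`, a brute-force search in a small window, and no non-max suppression. The cited model learns its deformation parameters and uses distance transforms for the search. The function names are hypothetical.

```python
import numpy as np

def pool_part(response, anchor, weight, radius=2):
    """Best part score near its anchor: filter response minus spring cost."""
    ay, ax = anchor
    best = -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = ay + dy, ax + dx
            if 0 <= y < response.shape[0] and 0 <= x < response.shape[1]:
                # The latent variable is the displacement (dy, dx).
                best = max(best, response[y, x] - weight * (dy * dy + dx * dx))
    return best

def object_score(root_score, part_responses, anchors, weight=0.5):
    """Root filter score plus the pooled score of every part."""
    return root_score + sum(
        pool_part(r, a, weight) for r, a in zip(part_responses, anchors)
    )

# One part whose strongest response lies one pixel to the right of its anchor.
response = np.zeros((5, 5))
response[2, 3] = 2.0
part = pool_part(response, anchor=(2, 2), weight=0.5)
score = object_score(1.0, [response], [(2, 2)])
```

Here the part prefers to move one pixel and pay a cost of 0.5 over staying at the anchor with zero response, so the pooled part score is 1.5 and the object score is 2.5.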
