ECCV2010: feature learning for image classification, part 3

Part 3: Image Classification using Sparse Coding: Advanced Topics
Kai Yu, Dept. of Media Analytics, NEC Laboratories America
Andrew Ng, Computer Science Dept., Stanford University

Outline of Part 3 (05/13/11)
Why can sparse coding learn good features? Intuition, topic model view, and geometric view
A theoretical framework: local coordinate coding
Two practical coding methods
Recent advances in sparse coding for image classification

Intuition: why does sparse coding help classification?
The coding is a nonlinear feature mapping.
It represents data in a higher-dimensional space.
Sparsity makes prominent patterns more distinctive.
Figure from http://www.dtreg.com/svm.htm

A “topic model” view of sparse coding
Each basis is a “direction” or a “topic”.
Sparsity: each datum is a linear combination of only a few bases.
Applicable to image denoising, inpainting, and super-resolution.
Figure labels: Basis 1, Basis 2. Both figures adapted from the CVPR10 tutorial by F. Bach, J. Mairal, J. Ponce and G. Sapiro.

A geometric view of sparse coding
Each basis is somewhat like a pseudo data point, an “anchor point”.
Sparsity: each datum is a sparse combination of neighbor anchors.
The coding scheme explores the manifold structure of data.
Figure labels: data manifold, basis, data.

MNIST Experiment: Classification using SC
60K images for training, 10K for test
Let k = 512 (dictionary size)
Linear SVM on sparse codes
Try different values of lambda
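For reference, the lambda varied in this experiment is the sparsity weight of the sparse coding objective. The slides do not preserve the formula, so the block below is the standard L1-penalized formulation, written under assumptions: the symbols x_i (data), B (dictionary with k columns b_j), and a_i (codes) are notation chosen here, and the scaling of the reconstruction term may differ from the authors' setup, which would change what a given lambda value means.

```latex
\min_{B,\,\{a_i\}} \;\sum_{i=1}^{n} \Big( \tfrac{1}{2}\,\lVert x_i - B a_i \rVert_2^2 \;+\; \lambda \,\lVert a_i \rVert_1 \Big)
\qquad \text{s.t.}\quad \lVert b_j \rVert_2 \le 1,\;\; j = 1,\dots,k
```

A larger lambda penalizes nonzero coefficients more heavily, so each datum is explained by fewer bases.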

MNIST Experiment: Lambda = 0.0005
Each basis is like a part or direction.

MNIST Experiment: Lambda = 0.005
Again, each basis is like a part or direction.

MNIST Experiment: Lambda = 0.05
Now, each basis is more like a digit!

MNIST Experiment: Lambda = 0.5
Like clustering now!

Geometric view of sparse coding
Errors shown with the figures: 4.54%, 3.75%, 2.64%.
When SC achieves the best classification accuracy, the learned bases are like digits: each basis has a clear local class association.
Implication: exploring data geometry may be useful for classification.

Distribution of coefficients (MNIST)
Neighbor bases tend to get nonzero coefficients.

Distribution of coefficients (SIFT, Caltech101)
Similar observation here!

Recap: two different views of sparse coding
View 1: discover “topic” components. Each basis is a “direction”. Sparsity: each datum is a linear combination of several bases. Related to topic models.
View 2: geometric structure of the data manifold. Each basis is an “anchor point”. Sparsity: each datum is a linear combination of neighbor anchors. Somewhat like a soft VQ (link to BoW).
Either view can be valid for sparse coding under certain circumstances.
View 2 seems to be helpful for sensory data classification.

Key theoretical question
Why can unsupervised feature learning via sparse coding help classification?

The image classification setting for analysis
Pipeline: dense local features, then sparse coding, then linear pooling, then a linear SVM.
The linear SVM defines a function on images; through linear pooling it corresponds to a function on patches.
Implication: learning an image classifier is a matter of learning nonlinear functions on patches.
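The implication can be spelled out in one line. This is a sketch under assumed notation (not from the slides): an image I contains patches x_1, …, x_n, a(x) is the code of a patch, pooling is a plain average, and the linear classifier has weights w and bias b.

```latex
f(I) \;=\; w^{\top}\Big(\frac{1}{n}\sum_{i=1}^{n} a(x_i)\Big) + b
\;=\; \frac{1}{n}\sum_{i=1}^{n} \underbrace{\big(w^{\top} a(x_i) + b\big)}_{g(x_i)}
```

Because averaging is linear, the image-level function f is the average of a patch-level function g, and g is nonlinear in x only through the coding a(x).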

Illustration: nonlinear learning via local coding
Figure labels: data points, bases, locally linear.

How to learn a nonlinear function?
Step 1: Learn the dictionary from unlabeled data.
Step 2: Use the dictionary to encode data.
Step 3: Estimate the parameters, i.e. the global linear weights to be learned, on the sparse codes of the data.
Nonlinear local learning is achieved by learning a global linear function.
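The three steps can be tried on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: the data are 1-D, the dictionary is learned with a few Lloyd (k-means) iterations, and the encoder is a simple local coding on the two bracketing anchors rather than L1 sparse coding. All function names are hypothetical. It shows that a single global linear model on the codes fits sin(2πx), which a linear model on raw x cannot.

```python
import numpy as np

def learn_dictionary_1d(x, k, n_iter=10):
    """Step 1: 1-D k-means (Lloyd) on unlabeled samples; returns sorted anchors."""
    centers = np.linspace(x.min(), x.max(), k)
    for _ in range(n_iter):
        assign = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            members = x[assign == j]
            if members.size:  # keep the old center if a cluster is empty
                centers[j] = members.mean()
    return np.sort(centers)

def encode_local(x, anchors):
    """Step 2: code each x on its two bracketing anchors (weights sum to 1)."""
    k = anchors.size
    # Index of the left anchor of the bracketing pair, clipped to stay in range.
    left = np.clip(np.searchsorted(anchors, x) - 1, 0, k - 2)
    right = left + 1
    w_right = (x - anchors[left]) / (anchors[right] - anchors[left])
    codes = np.zeros((x.size, k))
    rows = np.arange(x.size)
    codes[rows, left] = 1.0 - w_right
    codes[rows, right] = w_right
    return codes

def fit_linear(features, y, ridge=1e-8):
    """Step 3: global linear weights by (lightly) ridge-regularized least squares."""
    d = features.shape[1]
    gram = features.T @ features + ridge * np.eye(d)
    return np.linalg.solve(gram, features.T @ y)

rng = np.random.default_rng(0)
x_unlabeled = rng.uniform(0.0, 1.0, 2000)
x_train = rng.uniform(0.0, 1.0, 500)
y_train = np.sin(2 * np.pi * x_train)
x_test = np.linspace(0.0, 1.0, 200)
y_test = np.sin(2 * np.pi * x_test)

anchors = learn_dictionary_1d(x_unlabeled, k=16)
codes_train = encode_local(x_train, anchors)
w_codes = fit_linear(codes_train, y_train)
mse_local = np.mean((encode_local(x_test, anchors) @ w_codes - y_test) ** 2)

# Baseline: linear model on the raw input (with a bias column).
raw_train = np.column_stack([x_train, np.ones_like(x_train)])
raw_test = np.column_stack([x_test, np.ones_like(x_test)])
w_raw = fit_linear(raw_train, y_train)
mse_raw = np.mean((raw_test @ w_raw - y_test) ** 2)

print(f"MSE with local codes: {mse_local:.5f}, MSE with raw input: {mse_raw:.5f}")
```

With this encoder, the linear function on codes is a piecewise linear interpolation between anchors, which is the "locally linear" picture from the illustration slide.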

Local Coordinate Coding (LCC): connect coding to nonlinear function learning (Yu et al, NIPS-09)
If f(x) is (alpha, beta)-Lipschitz smooth, the function approximation error is bounded by the sum of a coding error term and a locality term.
The key message: a good coding scheme should (1) have a small coding error, and (2) be sufficiently local.
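The inequality itself is not preserved in this transcript. The block below is a reconstruction from memory of the corresponding lemma in Yu et al, NIPS-09, for the squared-distance case, and should be checked against the paper. The notation is an assumption: C is the set of anchor points v, γ_v(x) are the coding coefficients with Σ_v γ_v(x) = 1, and γ(x) = Σ_v γ_v(x) v is the reconstruction of x.

```latex
\underbrace{\Big|\, f(x) - \sum_{v \in C} \gamma_v(x)\, f(v) \Big|}_{\text{function approximation error}}
\;\le\;
\underbrace{\alpha \,\lVert x - \gamma(x) \rVert_2}_{\text{coding error}}
\;+\;
\underbrace{\beta \sum_{v \in C} \lvert \gamma_v(x) \rvert \,\lVert v - \gamma(x) \rVert_2^{2}}_{\text{locality term}}
```

The left-hand side is linear in the codes γ_v(x), with the unknown values f(v) acting as global linear weights, so a small right-hand side means a linear model on the codes can approximate f well.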

Application of LCC theory
Fast implementation with a large dictionary (Wang et al, CVPR 10)
A simple geometric way to improve BoW (Zhou et al, ECCV 10)

The larger the dictionary, the higher the accuracy, but also the higher the computation cost
The same observation holds for Caltech-256, PASCAL, ImageNet, …
(Yu et al, NIPS-09; Yang et al, CVPR 09)

Locality-constrained linear coding: a fast implementation of LCC (Wang et al, CVPR 10)
Dictionary learning: k-means (or hierarchical k-means)
Coding for X:
Step 1, ensure locality: find the K nearest bases.
Step 2, ensure low coding error: solve a small least-squares reconstruction problem over those K bases.
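The two coding steps can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes the approximated form of the coding step in which the weights on the K nearest bases minimize the reconstruction error subject to summing to one, with a small ridge term added for numerical stability. The function name `llc_encode` and the parameters `n_neighbors` and `reg` are hypothetical.

```python
import numpy as np

def llc_encode(x, bases, n_neighbors=5, reg=1e-4):
    """Code x on its nearest bases with weights that sum to one.

    x:     (d,) descriptor
    bases: (k, d) dictionary, one basis per row (e.g. k-means centers)
    """
    # Step 1 (locality): indices of the n_neighbors nearest bases.
    dists = np.sum((bases - x) ** 2, axis=1)
    idx = np.argsort(dists)[:n_neighbors]

    # Step 2 (low coding error): minimize ||x - sum_j w_j b_j||^2 s.t. sum_j w_j = 1.
    # With the sum-to-one constraint this reduces to a small linear system.
    shifted = bases[idx] - x                      # (n_neighbors, d)
    cov = shifted @ shifted.T                     # local covariance
    cov += (reg * np.trace(cov) + 1e-12) * np.eye(n_neighbors)  # ridge for stability
    w = np.linalg.solve(cov, np.ones(n_neighbors))
    w /= w.sum()

    code = np.zeros(bases.shape[0])
    code[idx] = w
    return code

rng = np.random.default_rng(0)
bases = rng.normal(size=(64, 8))   # stand-in for a learned dictionary
x = rng.normal(size=8)
code = llc_encode(x, bases, n_neighbors=5)

recon_err = np.linalg.norm(x - code @ bases)
nearest_dist = np.sqrt(np.min(np.sum((bases - x) ** 2, axis=1)))
print(f"nonzeros: {np.count_nonzero(code)}, "
      f"reconstruction error: {recon_err:.3f}, nearest-basis distance: {nearest_dist:.3f}")
```

The cost per descriptor is one nearest-neighbor search plus a K-by-K linear solve, which is why this style of coding stays cheap even when the dictionary is large.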

Competitive in accuracy, cheap in computation (Wang et al, CVPR 10)
Figure annotations: “Sparse coding”, “Significantly better than sparse coding”, “Comparable with sparse coding”.
This is one of the two major algorithms applied by the NEC-UIUC team to achieve the No. 1 position in the ImageNet challenge 2010!

Application of LCC theory
Fast implementation with a large dictionary
A simple geometric way to improve BoW

Interpret “BoW + linear classifier”
Piecewise local constant (zero-order)
Figure labels: data points, cluster centers.

Super-vector coding: a simple geometric way to improve BoW (VQ) (Zhou et al, ECCV 10)
Piecewise local linear (first-order)
Figure labels: data points, cluster centers, local tangent.

Super-vector coding: a simple geometric way to improve BoW (VQ)
If f(x) is beta-Lipschitz smooth, the function approximation error of the local tangent model is controlled by the quantization error.
Figure label: local tangent.
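The bound on this slide is not preserved in the transcript. A hedged reconstruction, using the usual meaning of a beta-Lipschitz smooth gradient and notation assumed here (v(x) is the cluster center nearest to x), would read as follows; the exact constant should be checked against Zhou et al, ECCV 10.

```latex
\underbrace{\Big|\, f(x) - f\big(v(x)\big) - \nabla f\big(v(x)\big)^{\top}\big(x - v(x)\big) \Big|}_{\text{function approximation error}}
\;\le\;
\frac{\beta}{2}\,\underbrace{\lVert x - v(x) \rVert_2^{2}}_{\text{quantization error}}
```

The first-order (local tangent) model has an error that is quadratic in the distance to the nearest center, whereas the zero-order BoW model f(x) ≈ f(v(x)) only achieves an error linear in that distance.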

Super-vector coding: learning a nonlinear function via a global linear model
The construction starts from the VQ coding of each data point.
Global linear weights are then learned on the super-vector codes of the data.
This is one of the two major algorithms applied by the NEC-UIUC team to achieve the No. 1 position in PASCAL VOC 2009!
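To make the contrast between the zero-order and first-order codes concrete, here is a small sketch under stated assumptions: hard assignment to the nearest center, and a code with one block of size d+1 per center holding a constant s and the offset x − v. The constant s and the function names are illustrative choices, not taken from the slides.

```python
import numpy as np

def vq_encode(x, centers):
    """BoW / VQ code: one-hot indicator of the nearest cluster center."""
    nearest = int(np.argmin(np.sum((centers - x) ** 2, axis=1)))
    code = np.zeros(centers.shape[0])
    code[nearest] = 1.0
    return code

def super_vector_encode(x, centers, s=1.0):
    """Super-vector code: for the nearest center v, the block [s, x - v]; zeros elsewhere."""
    k, d = centers.shape
    nearest = int(np.argmax(vq_encode(x, centers)))
    code = np.zeros(k * (d + 1))
    start = nearest * (d + 1)
    code[start] = s                                   # zero-order part (as in BoW)
    code[start + 1:start + 1 + d] = x - centers[nearest]  # first-order part
    return code

centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
x = np.array([3.5, 0.5])

bow_code = vq_encode(x, centers)
sv_code = super_vector_encode(x, centers, s=1.0)
print(bow_code)
print(sv_code)
```

A linear function on `sv_code` evaluates to w0 · s + w1 · (x − v) for the active block, i.e. a separate local tangent per cluster, while a linear function on `bow_code` can only return one constant per cluster.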

Summary of Geometric Coding Methods
Methods: Vector Quantization (BoW), (Fast) Local Coordinate Coding, Super-vector Coding
All lead to higher-dimensional, sparse, and localized coding.
All explore the geometric structure of data.
The new coding methods are suitable for linear classifiers.
Their implementations are quite straightforward.

Things not covered here
Improved LCC using Local Tangents (Yu & Zhang, ICML10)
Mixture of Sparse Coding (Yang et al, ECCV 10)
Deep Coding Network (Lin et al, NIPS 10)
Pooling methods:
Max-pooling works well in practice, but appears to be ad hoc.
An interesting analysis of max-pooling: Boureau et al, ICML 2010.
We are working on a linear pooling method that has a similar effect to max-pooling. Some preliminary results are already in the super-vector coding paper (Zhou et al, ECCV 2010).

Fast approximation of sparse coding via neural networks (Gregor & LeCun, ICML-10)
The method aims to speed up sparse coding at coding time, not at training time, which could make sparse coding practical for video.
Idea: given a trained sparse coding model, use its input-output pairs as training data to train a feed-forward model.
They reported a 20x speedup, but the method was not evaluated on real video data.
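For context on where the coding-time cost comes from, below is a minimal sketch of ISTA, a standard iterative solver for the L1 sparse coding problem and the kind of encoder that a trained feed-forward model is meant to replace. It is an illustration under assumptions: the objective is 0.5·||x − Ba||² + λ·||a||₁ with a fixed dictionary B, and the function names are chosen here.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise shrinkage: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_encode(x, bases, lam, n_iter=200):
    """Minimize 0.5 * ||x - bases @ a||^2 + lam * ||a||_1 over a.

    bases: (d, k) dictionary with one basis per column.
    """
    lipschitz = np.linalg.norm(bases, 2) ** 2   # squared spectral norm of the dictionary
    a = np.zeros(bases.shape[1])
    for _ in range(n_iter):
        grad = bases.T @ (bases @ a - x)        # gradient of the quadratic term
        a = soft_threshold(a - grad / lipschitz, lam / lipschitz)
    return a

def objective(x, bases, a, lam):
    return 0.5 * np.sum((x - bases @ a) ** 2) + lam * np.sum(np.abs(a))

# Sanity check: with an orthonormal dictionary the solution is soft-thresholding of x.
x_demo = np.array([3.0, -0.5, 0.2, -2.0])
code_demo = ista_encode(x_demo, np.eye(4), lam=1.0)

# Overcomplete random dictionary with unit-norm columns.
rng = np.random.default_rng(0)
bases_rand = rng.normal(size=(16, 64))
bases_rand /= np.linalg.norm(bases_rand, axis=0)
x_rand = rng.normal(size=16)
code_rand = ista_encode(x_rand, bases_rand, lam=0.5)
print(np.count_nonzero(code_rand), "of", code_rand.size, "coefficients are nonzero")
```

Every input requires many such iterations, each with two matrix-vector products, whereas a feed-forward approximation replaces the loop with a fixed, small number of layers.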

Group sparse coding (Bengio et al, NIPS 09)
Sparse coding operates on patches, so the resulting image representation is unlikely to be sparse.
Idea: enforce joint sparsity via an L1/L2 norm on the sparse codes of a group of patches.
The resulting image representation becomes sparse, which reduces memory cost, but the classification accuracy decreases.
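A small numeric example may help show why the mixed norm encourages joint sparsity. This sketch assumes the usual definition: for the codes of one group of patches, take the L2 norm of each basis's coefficients across the patches, then sum (L1) over bases. The array layout and function name are assumptions.

```python
import numpy as np

def mixed_l1_l2_norm(codes):
    """codes: (n_patches, k). L2 norm across patches per basis, summed over bases."""
    return float(np.sum(np.linalg.norm(codes, axis=0)))

# Two patches, three bases. Same coefficient magnitudes in both cases.
shared = np.array([[3.0, 0.0, 0.0],
                   [4.0, 0.0, 0.0]])    # both patches use basis 0
spread = np.array([[3.0, 0.0, 0.0],
                   [0.0, 4.0, 0.0]])    # each patch uses a different basis

penalty_shared = mixed_l1_l2_norm(shared)   # sqrt(3^2 + 4^2) = 5
penalty_spread = mixed_l1_l2_norm(spread)   # 3 + 4 = 7

# Number of bases active anywhere in the group, i.e. image-level sparsity.
active_shared = int(np.any(shared != 0, axis=0).sum())
active_spread = int(np.any(spread != 0, axis=0).sum())
print(penalty_shared, penalty_spread, active_shared, active_spread)
```

Patches that reuse the same bases pay a smaller penalty than patches that spread over different bases, so the group as a whole, and hence the pooled image representation, ends up using few bases.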

Learning a hierarchical dictionary (Jenatton, Mairal, Obozinski, and Bach, 2010)
A node can be active only if its ancestors are active.

References
Image Classification using Super-Vector Coding of Local Image Descriptors, Xi Zhou, Kai Yu, Tong Zhang, and Thomas Huang. In ECCV 2010.
Efficient Highly Over-Complete Sparse Coding using a Mixture Model, Jianchao Yang, Kai Yu, and Thomas Huang. In ECCV 2010.
Learning Fast Approximations of Sparse Coding, Karol Gregor and Yann LeCun. In ICML 2010.
Improved Local Coordinate Coding using Local Tangents, Kai Yu and Tong Zhang. In ICML 2010.
Sparse Coding and Dictionary Learning for Image Analysis, Francis Bach, Julien Mairal, Jean Ponce, and Guillermo Sapiro. CVPR 2010 Tutorial.
Supervised Translation-Invariant Sparse Coding, Jianchao Yang, Kai Yu, and Thomas Huang. In CVPR 2010.
Locality-constrained Linear Coding for Image Classification, Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang, and Yihong Gong. In CVPR 2010.
Group Sparse Coding, Samy Bengio, Fernando Pereira, Yoram Singer, and Dennis Strelow. In NIPS*2009.
Nonlinear Learning using Local Coordinate Coding, Kai Yu, Tong Zhang, and Yihong Gong. In NIPS*2009.
Linear Spatial Pyramid Matching using Sparse Coding for Image Classification, Jianchao Yang, Kai Yu, Yihong Gong, and Thomas Huang. In CVPR 2009.
Efficient Sparse Coding Algorithms, Honglak Lee, Alexis Battle, Rajat Raina, and Andrew Y. Ng. In NIPS*2007.
