Online Dictionary Learning for
       Sparse Coding

Julien Mairal¹   Francis Bach¹   Jean Ponce²   Guillermo Sapiro³

¹ INRIA–Willow project   ² École Normale Supérieure   ³ University of Minnesota

          ICML, Montréal, June 2009
What this talk is about
    Efficiently learning dictionaries (basis sets) for sparse
    coding.
    Solving a large-scale matrix factorization problem.
    Making some large-scale image processing problems
    tractable.
    Proposing an algorithm which extends to NMF, sparse
    PCA,. . .
1   The Dictionary Learning Problem


2   Online Dictionary Learning


3   Extensions
1   The Dictionary Learning Problem


2   Online Dictionary Learning


3   Extensions
The Dictionary Learning Problem




              y = x_orig + w
              (measurements = original image + noise)
The Dictionary Learning Problem
[Elad & Aharon (’06)]


   Solving the denoising problem
          Extract all overlapping 8 × 8 patches $x_i$.
          Solve a matrix factorization problem:

          $\min_{\alpha_i,\, D \in \mathcal{C}} \sum_{i=1}^{n} \underbrace{\tfrac{1}{2}\|x_i - D\alpha_i\|_2^2}_{\text{reconstruction}} + \underbrace{\lambda\|\alpha_i\|_1}_{\text{sparsity}},$

          with $n > 100{,}000$.
          Average the reconstruction of each patch.
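The patch pipeline above (extract all overlapping patches, process them, average the overlaps back into an image) can be sketched in a few lines of NumPy. This is only an illustration of the extraction and averaging steps; the sparse-coding step that sits in between is omitted, so the round-trip below is the identity:

```python
import numpy as np

def extract_patches(img, p=8):
    """Extract all overlapping p x p patches as rows of a matrix."""
    H, W = img.shape
    patches = [img[i:i + p, j:j + p].ravel()
               for i in range(H - p + 1) for j in range(W - p + 1)]
    return np.array(patches)

def average_patches(patches, shape, p=8):
    """Reassemble an image by averaging the overlapping patch reconstructions."""
    H, W = shape
    out = np.zeros(shape)
    counts = np.zeros(shape)
    k = 0
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            out[i:i + p, j:j + p] += patches[k].reshape(p, p)
            counts[i:i + p, j:j + p] += 1  # how many patches cover each pixel
            k += 1
    return out / counts

img = np.random.rand(16, 16)
P = extract_patches(img, p=8)           # (81, 64) patch matrix
rec = average_patches(P, img.shape, 8)  # exact round-trip when patches are unchanged
```

In the actual denoising pipeline, each row of `P` would be replaced by its sparse reconstruction `D @ alpha_i` before averaging.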
The Dictionary Learning Problem
[Mairal, Bach, Ponce, Sapiro & Zisserman (’09)]




                       Denoising result
The Dictionary Learning Problem
[Mairal, Sapiro & Elad (’08)]




                 Image completion example
The Dictionary Learning Problem
What does D look like?
The Dictionary Learning Problem


          $\min_{\alpha \in \mathbb{R}^{k \times n},\, D \in \mathcal{C}} \sum_{i=1}^{n} \tfrac{1}{2}\|x_i - D\alpha_i\|_2^2 + \lambda\|\alpha_i\|_1$

   $\mathcal{C} = \{D \in \mathbb{R}^{m \times k} \text{ s.t. } \forall j = 1, \dots, k,\; \|d_j\|_2 \leq 1\}.$


       Classical optimization alternates between D
       and α.
       Good results, but very slow!
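The classical alternating scheme can be sketched as follows. This is a toy illustration, not the solvers used in the talk: it uses ISTA for the sparse-coding step and a regularized least-squares update for D, with columns projected onto the unit ball:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code_ista(X, D, lam, n_iter=100):
    """Solve min_A 0.5||X - D A||_F^2 + lam*||A||_1 column-wise via ISTA."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-8      # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        A = soft_threshold(A - (D.T @ (D @ A - X)) / L, lam / L)
    return A

def dict_update(X, A):
    """Least-squares D for fixed A, then project columns onto the unit ball."""
    D = X @ A.T @ np.linalg.pinv(A @ A.T + 1e-8 * np.eye(A.shape[0]))
    norms = np.maximum(np.linalg.norm(D, axis=0), 1.0)
    return D / norms

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 50))              # 50 signals of dimension m = 8
D = rng.standard_normal((8, 12))              # k = 12 atoms
D /= np.linalg.norm(D, axis=0)
for _ in range(10):                           # alternate between A and D
    A = sparse_code_ista(X, D, lam=0.1)
    D = dict_update(X, A)
```

Each outer iteration touches the whole dataset, which is exactly why the batch approach becomes slow at scale.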
1   The Dictionary Learning Problem


2   Online Dictionary Learning


3   Extensions
Online Dictionary Learning



  Classical formulation of dictionary learning
              $\min_{D \in \mathcal{C}} f_n(D) = \min_{D \in \mathcal{C}} \frac{1}{n}\sum_{i=1}^{n} l(x_i, D),$

  where

           $l(x, D) = \min_{\alpha \in \mathbb{R}^k} \tfrac{1}{2}\|x - D\alpha\|_2^2 + \lambda\|\alpha\|_1.$
Online Dictionary Learning




  Which formulation are we interested in?

     $\min_{D \in \mathcal{C}} f(D) = \mathbb{E}_x[l(x, D)] \approx \lim_{n \to +\infty} \frac{1}{n}\sum_{i=1}^{n} l(x_i, D)$
Online Dictionary Learning



  Online learning can
       handle potentially infinite datasets,
       adapt to dynamic training sets,
       be dramatically faster than batch
       algorithms [Bottou & Bousquet (’08)].
Online Dictionary Learning
Proposed approach


     1: for t = 1, . . . , T do
     2:    Draw x_t
     3:    Sparse coding:
             $\alpha_t \leftarrow \arg\min_{\alpha \in \mathbb{R}^k} \tfrac{1}{2}\|x_t - D_{t-1}\alpha\|_2^2 + \lambda\|\alpha\|_1,$
     4:    Dictionary update:
             $D_t \leftarrow \arg\min_{D \in \mathcal{C}} \frac{1}{t}\sum_{i=1}^{t} \tfrac{1}{2}\|x_i - D\alpha_i\|_2^2 + \lambda\|\alpha_i\|_1,$
     5: end for
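A minimal sketch of this online loop is given below. It is a simplified illustration, not the authors' implementation: ISTA stands in for LARS in the Lasso step, and the dictionary update in step 4 is carried out by block-coordinate descent on each atom using the accumulated statistics A = Σ αα^T and B = Σ xα^T, one common way to minimize the past-samples objective without storing the data:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(x, D, lam, n_iter=200):
    """Solve the Lasso min_a 0.5||x - D a||_2^2 + lam*||a||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-8
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - D.T @ (D @ a - x) / L, lam / L)
    return a

def online_dict_learning(stream, m, k, lam=0.1, seed=0):
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((m, k))
    D /= np.linalg.norm(D, axis=0)        # start from unit-norm atoms
    A = np.zeros((k, k))                  # accumulates alpha alpha^T
    B = np.zeros((m, k))                  # accumulates x alpha^T
    for x in stream:
        alpha = lasso_ista(x, D, lam)     # step 3: sparse coding
        A += np.outer(alpha, alpha)
        B += np.outer(x, alpha)
        for j in range(k):                # step 4: block-coordinate update
            if A[j, j] > 1e-10:
                u = D[:, j] + (B[:, j] - D @ A[:, j]) / A[j, j]
                D[:, j] = u / max(np.linalg.norm(u), 1.0)  # project onto C
    return D

rng = np.random.default_rng(1)
stream = rng.standard_normal((100, 10))   # 100 samples of dimension m = 10
D = online_dict_learning(stream, m=10, k=15, lam=0.1)
```

Note that each iteration costs a fixed amount of work regardless of how many samples have been seen, which is what makes the method scale to large or streaming datasets.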
Online Dictionary Learning
Proposed approach




   Implementation details
         Use LARS for the sparse coding step,
         Use a block-coordinate approach for the
         dictionary update, with warm restart,
         Use a mini-batch.
Online Dictionary Learning
Proposed approach


   Which guarantees do we have?
   Under a few reasonable assumptions,
         we build a surrogate function $\hat{f}_t$ of the
         expected cost $f$ verifying
                     $\lim_{t \to +\infty} \hat{f}_t(D_t) - f(D_t) = 0,$

         $D_t$ is asymptotically close to a stationary
         point.
Online Dictionary Learning
Experimental results, batch vs online




                       m = 8 × 8, k = 256
Online Dictionary Learning
Experimental results, batch vs online




                   m = 12 × 12 × 3, k = 512
Online Dictionary Learning
Experimental results, batch vs online




                     m = 16 × 16, k = 1024
Online Dictionary Learning
Experimental results, ODL vs SGD




                     m = 8 × 8, k = 256
Online Dictionary Learning
Experimental results, ODL vs SGD




                 m = 12 × 12 × 3, k = 512
Online Dictionary Learning
Experimental results, ODL vs SGD




                   m = 16 × 16, k = 1024
Online Dictionary Learning
Inpainting a 12-Mpixel photograph
Online Dictionary Learning
Inpainting a 12-Mpixel photograph
Online Dictionary Learning
Inpainting a 12-Mpixel photograph
Online Dictionary Learning
Inpainting a 12-Mpixel photograph
1   The Dictionary Learning Problem


2   Online Dictionary Learning


3   Extensions
Extension to NMF and sparse PCA

  NMF extension

     $\min_{\alpha \in \mathbb{R}^{k \times n},\, D \in \mathcal{C}} \sum_{i=1}^{n} \tfrac{1}{2}\|x_i - D\alpha_i\|_2^2 \quad \text{s.t.} \quad \alpha_i \geq 0,\; D \geq 0.$

  SPCA extension

     $\min_{\alpha \in \mathbb{R}^{k \times n},\, D \in \mathcal{C}} \sum_{i=1}^{n} \tfrac{1}{2}\|x_i - D\alpha_i\|_2^2 + \lambda\|\alpha_i\|_1$

   $\mathcal{C} = \{D \in \mathbb{R}^{m \times k} \text{ s.t. } \forall j,\; \|d_j\|_2^2 + \gamma\|d_j\|_1 \leq 1\}.$
Extension to NMF and sparse PCA
Faces: Extended Yale Database B




         (a) PCA           (b) NNMF   (c) DL
Extension to NMF and sparse PCA
Faces: Extended Yale Database B




   (d) SPCA, τ = 70% (e) SPCA, τ = 30% (f) SPCA, τ = 10%
Extension to NMF and sparse PCA
Natural Patches




         (a) PCA   (b) NNMF       (c) DL
Extension to NMF and sparse PCA
Natural Patches




   (d) SPCA, τ = 70% (e) SPCA, τ = 30% (f) SPCA, τ = 10%
Conclusion


  Take-home message
      Online techniques are well suited to the
      dictionary learning problem.
      Our method makes some large-scale
      image processing tasks tractable. . .
      . . . and extends to various matrix
      factorization problems.
