Learning incoherent dictionaries for sparse approximation using iterative projections and rotations

Daniele Barchiesi and Mark D. Plumbley

Centre for Digital Music
School of Electronic Engineering and Computer Science
Queen Mary University of London

daniele.barchiesi@eecs.qmul.ac.uk
mark.plumbley@eecs.qmul.ac.uk

30th June 2012
Overview


     Background
           Dictionary learning model and algorithms
           Learning incoherent dictionaries
           Previous work
     Learning incoherent dictionaries using iterative projections and
     rotations
           Constructing Grassmannian frames using iterative projections
           The rotation step
           Iterative projections and rotations algorithm
     Numerical experiments
           Incoherence results, comparison with existing methods
           Sparse approximation results
     Conclusions and future research
           Proposed applications
           Summary



Background: Dictionary Learning

  Problem Definition
  Let $\{y_m \in \mathbb{R}^N\}_{m=1}^{M}$ be a set of $M$ observed signals of dimension $N$. The goal of dictionary learning is to express
  $$Y \approx \Phi X$$
  where $Y$ contains the signals along its columns, $\Phi$ is a dictionary containing unit-norm atoms, and every column of $X$ contains at most $S$ non-zero coefficients.

  Optimisation
  $$(\hat{\Phi}, \hat{X}) = \arg\min_{\Phi, X} \|Y - \Phi X\|_2^2 \quad \text{such that} \quad \|x_m\|_0 \le S \quad \forall m$$
  The problem is not convex even if the $\ell_0$ pseudo-norm is relaxed to the $\ell_1$ norm.

Background: Dictionary Learning Algorithms

  Optimisation Strategy
      Start from an initial dictionary $\Phi^{(0)}$.
      Repeat for $t = 1, \dots, T$ iterations:
          Sparse coding: given a fixed dictionary $\Phi^{(t)}$, find a sparse approximation $X^{(t)}$ with any suitable algorithm.
          Dictionary update: given $X^{(t)}$, update the dictionary to $\Phi^{(t+1)}$ to minimise the DL objective (possibly subject to additional constraints).
      A minimal sketch of this alternating loop follows below.
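
  In Python with numpy, assuming generic placeholder routines `sparse_code` and `dictionary_update` (both hypothetical names, standing in for any concrete choice such as OMP and a MOD or K-SVD update):

      import numpy as np

      def dictionary_learning(Y, Phi0, S, T, sparse_code, dictionary_update):
          """Alternate sparse coding and dictionary updates for T iterations."""
          Phi = Phi0.copy()
          for t in range(T):
              X = sparse_code(Y, Phi, S)               # fix Phi, find S-sparse X
              Phi = dictionary_update(Y, X, Phi)       # fix X, update Phi
              Phi = Phi / np.linalg.norm(Phi, axis=0)  # keep atoms unit-norm
          return Phi, X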

  Previous Work
  Methods for dictionary learning include:
      Probabilistic models [Lewicki and Sejnowski]
      Method of optimal directions (MOD) [Engan et al.]
      K-SVD [Aharon et al.]
      Online learning [Mairal et al.]

Learning Incoherent Dictionaries

  Mutual Coherence
  The coherence of a dictionary expresses the similarity between atoms or groups of atoms in the dictionary. The mutual coherence is defined as
  $$\mu(\Phi) \stackrel{\text{def}}{=} \max_{i \neq j} |\langle \phi_i, \phi_j \rangle|$$
  Results on sparse recovery link the performance of sparse approximation algorithms to the coherence of the dictionary. For over-complete approximations, low $\mu$ leads to recovery guarantees.
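
  A quick numpy check of this definition (columns of `Phi` are assumed unit-norm; the function name is illustrative):

      import numpy as np

      def mutual_coherence(Phi):
          """Largest absolute inner product between distinct atoms of Phi."""
          G = np.abs(Phi.T @ Phi)    # Gram matrix of the unit-norm atoms
          np.fill_diagonal(G, 0.0)   # exclude the i = j self-similarities
          return G.max()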

  Goal
  The objective is to learn dictionaries that are both:
      Well adapted to a set of training data Y
      Mutually incoherent


Learning Incoherent Dictionaries

  Advantages
  Advantages of incoherent dictionaries include:
      Sub-dictionaries have low condition number, so the (pseudo)inverses computed by many sparse approximation algorithms are well-posed.
      Convergence of greedy algorithms is faster for incoherent dictionaries (experimental results).
      Application-oriented intuitions (future work).

  Previous Work
      Method of coherence-constrained directions (MOCOD) [Sapiro et al.]
      Incoherent K-SVD (INK-SVD) [Mailhé et al.]
      Parametric dictionary design for sparse coding [Yaghoobi et al.]



Incoherent Dictionary Learning: Previous Work
  MOCOD
  Unconstrained, penalised optimisation:
  $$(\hat{\Phi}, \hat{X}) = \arg\min_{\Phi, X}\; \|Y - \Phi X\|_F^2 + \tau \sum_{k,m} \log(|x_{km}| + \beta) + \zeta \|G - I\|_F^2 + \eta \sum_{k=1}^{K} \left( \|\phi_k\|_2^2 - 1 \right)^2$$
  where the term weighted by $\tau$ promotes sparsity, and the terms weighted by $\zeta$ and $\eta$ promote incoherence and unit-norm atoms respectively.
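
  As a sanity check, this objective can be evaluated directly; a numpy sketch with variable names following the slide:

      import numpy as np

      def mocod_objective(Y, Phi, X, tau, beta, zeta, eta):
          """Evaluate the penalised MOCOD cost for a given (Phi, X) pair."""
          K = Phi.shape[1]
          G = Phi.T @ Phi
          fit = np.linalg.norm(Y - Phi @ X, 'fro')**2
          sparsity = tau * np.sum(np.log(np.abs(X) + beta))
          incoherence = zeta * np.linalg.norm(G - np.eye(K), 'fro')**2
          unit_norm = eta * np.sum((np.sum(Phi**2, axis=0) - 1.0)**2)
          return fit + sparsity + incoherence + unit_norm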

  INK-SVD
  Greedy algorithm that includes a dictionary de-correlation step after a K-SVD dictionary update:
      Find pairs of coherent atoms
      De-correlate atoms two-by-two
      Repeat until a target mutual coherence is reached

IPR Algorithm: constructing Grassmannian frames

  A Grassmannian frame is a dictionary with minimal mutual coherence. For an $N \times K$ dictionary, $\mu \ge \sqrt{\frac{K - N}{N(K - 1)}}$, and this bound can be reached only for some $(N, K)$ pairs.

  Iterative Projections Algorithm
      Start from an initial dictionary $\Phi^{(0)}$.
      Calculate its Gram matrix $G^{(0)} \stackrel{\text{def}}{=} \Phi^{(0)\,T} \Phi^{(0)}$.
      Repeat for $t = 0, \dots, T - 1$ iterations:
          Project the Gram matrix onto the structural constraint set
          $$\mathcal{K}_{\mu_0} \stackrel{\text{def}}{=} \left\{ K : K = K^T,\; \mathrm{diag}(K) = \mathbf{1},\; \max_{i > j} |k_{i,j}| \le \mu_0 \right\}$$
          Project the Gram matrix onto the spectral constraint set
          $$\mathcal{F} \stackrel{\text{def}}{=} \left\{ F : F = F^T,\; \mathrm{eig}(F) \ge 0,\; \mathrm{rank}(F) \le N \right\}$$
      Factorise the Gram matrix as $\Phi^{(T-1)\,T} \Phi^{(T-1)} = G^{(T-1)}$
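
  Both projections have simple closed forms; a numpy sketch assuming an N × K dictionary with K > N (function names are illustrative):

      import numpy as np

      def project_structural(G, mu0):
          """Project onto K_mu0: clip off-diagonal entries, restore unit diagonal."""
          H = np.clip(G, -mu0, mu0)    # bound the off-diagonal magnitudes
          np.fill_diagonal(H, 1.0)     # diag(K) = 1
          return H

      def factorise(G, N):
          """Project onto the spectral set and factorise G as Phi^T Phi."""
          lam, Q = np.linalg.eigh(G)   # eigenvalues in ascending order
          lam = np.maximum(lam, 0.0)   # eig(F) >= 0
          lam[:-N] = 0.0               # rank(F) <= N: keep the N largest
          return np.sqrt(lam[-N:])[:, None] * Q[:, -N:].T   # N x K factor Phi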

IPR Algorithm: the rotation step

  Idea!
  The factorisation at the end of the iterative projections algorithm is not unique, since for any orthonormal matrix $W$
  $$(W\Phi)^T (W\Phi) = \Phi^T W^T W \Phi = \Phi^T \Phi$$
  Therefore, we can optimise an orthonormal matrix for the DL objective! This is an (improper) rotation of the dictionary $\Phi$.

  Dictionary Rotation
  $$\hat{W} = \arg\min_{W : W^T W = I} \|Y - W \Phi X\|_F$$
  A closed-form solution can be found by computing the SVD of the covariance matrix $C \stackrel{\text{def}}{=} \Phi X Y^T = U \Sigma V^T$ and setting
  $$\hat{W} = V U^T$$
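
  This is the classical orthogonal Procrustes solution; a numpy sketch of the step, following the formulas above:

      import numpy as np

      def rotate_dictionary(Y, Phi, X):
          """Apply the orthonormal W minimising ||Y - W Phi X||_F."""
          C = Phi @ X @ Y.T              # covariance matrix C = Phi X Y^T
          U, _, Vt = np.linalg.svd(C)    # C = U Sigma V^T
          W = Vt.T @ U.T                 # W-hat = V U^T
          return W @ Phi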

Iterative Projections and Rotations algorithm

      Start from a dictionary $\Phi^{(0)}$ returned by the dictionary update step of any DL algorithm.
      Repeat for $t = 0, \dots, T - 1$ iterations:
          Calculate the Gram matrix: $G^{(t)} \leftarrow \Phi^{(t)\,T} \Phi^{(t)}$
          Project the Gram matrix onto the structural constraint set:
              $\mathrm{diag}(G) \leftarrow \mathbf{1}$
              $G \leftarrow \mathrm{Limit}(G, \mu_0)$
          Factorise the Gram matrix and project it onto the spectral constraint set:
              $[Q, \Lambda] \leftarrow \mathrm{evd}(G)$
              $\Lambda \leftarrow \mathrm{Thresh}(\Lambda, N)$
              $\Phi \leftarrow \Lambda^{1/2} Q^T$
          Rotate the dictionary:
              $C \leftarrow \Phi X Y^T$
              $[U, \Sigma, V] \leftarrow \mathrm{svd}(C)$
              $W \leftarrow V U^T$
              $\Phi \leftarrow W \Phi$
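
  Putting the pieces together, a compact sketch of one IPR de-correlation pass, reusing `project_structural`, `factorise` and `rotate_dictionary` from the earlier sketches (the atom renormalisation line is an assumption implied by the unit-diagonal constraint, not an explicit step on this slide):

      import numpy as np

      def ipr(Y, Phi, X, mu0, T):
          """Iterative projections and rotations de-correlation pass."""
          N = Phi.shape[0]
          for t in range(T):
              G = Phi.T @ Phi                          # Gram matrix
              G = project_structural(G, mu0)           # bounded mutual coherence
              Phi = factorise(G, N)                    # spectral projection + factor
              Phi = Phi / np.linalg.norm(Phi, axis=0)  # assumed renormalisation
              Phi = rotate_dictionary(Y, Phi, X)       # closed-form rotation
          return Phi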
Numerical Experiments: The SMALLBox framework


  SMALLBox is a Matlab framework for benchmarking and developing dictionary learning algorithms, created by a team at Queen Mary University of London.
      The latest version can be downloaded from http://code.soundsoftware.ac.uk/
      SMALLBox integrates many third-party toolboxes such as Sparco, SparseLab, CVX, SPAMS, etc.
      SMALLBox provides a common interface to different DL algorithms that can be used for benchmarking.
      The new distribution of SMALLBox supports add-ons that extend the functionality of the framework without interfering with the core code.
      IncoherentDL is a SMALLBox add-on that can be used to reproduce some of the results presented in this talk.



Numerical Experiments: Mutual coherence vs residual norm

  Test Conditions
      Tests on a 16 kHz guitar audio signal divided into overlapping blocks of length N = 256.
      The number of active atoms was fixed at S = 12 (around 5% of the dimension N).
      A twice-overcomplete dictionary was initialised with either:
          Randomly selected samples from the training set.
          An over-complete Gabor frame.
      DL algorithms were run for 50 iterations.

  Test Objective
  The mutual coherence achieved by every learned dictionary is paired with the approximation error, defined as
  $$\mathrm{snr}(\Phi, X) = 20 \log_{10} \frac{\|Y\|_F}{\|Y - \Phi X\|_F}.$$
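
  The same quantity in numpy, for reference:

      import numpy as np

      def snr_db(Y, Phi, X):
          """Approximation SNR in dB, as defined above."""
          return 20 * np.log10(np.linalg.norm(Y, 'fro')
                               / np.linalg.norm(Y - Phi @ X, 'fro'))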

Numerical Experiments: MOCOD updates, data init.

  [Figure: surface plots of mutual coherence µ and reconstruction error (SNR, dB) as functions of the penalty weights ζ and η, and a scatter plot of SNR (dB) against mutual coherence µ, spanning µmin to µmax.]
Numerical Experiments: MOCOD updates, Gabor init.

  [Figure: same layout as the previous slide: surface plots of mutual coherence µ and reconstruction error (SNR, dB) over ζ and η, and a µ-SNR scatter plot spanning µmin to µmax.]
Numerical Experiments: INK-SVD and IPR

  [Figure: two scatter plots, "Data Initialisation" and "Gabor Initialisation", of SNR (dB) against mutual coherence µ (0.05 to 1), comparing IPR with INK-SVD between the µmin and µmax markers.]
Numerical Experiments: Sparse Approximation

  Test Conditions
      The matching pursuit algorithm (MP) was run for 1000 iterations on the following signals:
          The training set.
          A different guitar recording taken from the RWC database.
          A piano recording taken from the RWC database.
      Dictionaries with different mutual coherences were selected as returned by the IPR algorithm with data initialisation.

  Test Objective
  The norm of the residual in decibels, defined as
  $$20 \log_{10} \|y - \Phi x\|_2,$$
  is computed and averaged over the number of signals M and over 10 dictionaries resulting from independent trials of the learning algorithm.


Numerical Experiments: Training set approximation

  [Figure: "guitar − training signal": average residual norm (dB) against number of MP iterations (0-1000) for dictionaries with µ = 0.72, 0.37, 0.19, 0.1 and 0.06.]
Numerical Experiments: Guitar approximation

  [Figure: "guitar − test signal": average residual norm (dB) against number of MP iterations (0-1000) for the same five values of µ.]
Numerical Experiments: Piano approximation

  [Figure: "piano": average residual norm (dB) against number of MP iterations (0-1000) for the same five values of µ.]
Conclusions: Possible Applications


  Morphological Component Analysis
  Morphological component analysis is a dictionary learning approach to classification.
      Different dictionaries are learned on morphologically dissimilar training sets (e.g., edges and textures, or percussive and steady-state sounds).
      A test signal is classified according to the support or magnitude of the coefficients of its sparse approximation (i.e., which dictionary represents it best?).
  IPR could be used to enforce incoherence between the atoms belonging to different morphological components and enhance classification and separation performance.




Conclusions: Possible Applications


  Blind Compressed Sensing
  Blind compressed sensing generalises compressed sensing to the case of an unknown dictionary generating the signals to be recovered.
      A set of observations $Z$ is acquired through a known measurement matrix $M$: $Z = MY = M\Phi X$.
      Dictionary learning is used to optimise $\Psi$ and factorise the observed data as $Z \approx \Psi X$.
      The learned dictionary is factorised as the product $\Psi \approx M\hat{\Phi}$ and the signals reconstructed as $\hat{Y} = \hat{\Phi}\hat{X}$.
  The two factorisations are not unique, and strong constraints on $\Phi$ must be assumed to correctly reconstruct the signals. IPR might be used to constrain the factorisations and lead to a less ambiguous solution.




Conclusions: Summary


     The IPR algorithm can be used to learn dictionaries that are both adapted to a training set and mutually incoherent.
     The IPR algorithm can be used as a de-correlation step in any dictionary learning algorithm.
     Experimental data show that IPR generally performed better than benchmark techniques on audio signals.
     Incoherent dictionaries are useful for sparse approximation and could be used in a number of potential applications.


        Thank you for your attention
           and for any questions!

