ELEG 867 - Compressive Sensing and Sparse Signal Representations
Gonzalo R. Arce
Department of Electrical and Computer Engineering
University of Delaware
Fall 2011
Outline
Introduction and Motivation
Vector Spaces and the Nyquist-Shannon Sampling Theorem
Sparsity and the ℓ1 Norm
Sparse Signal Representation
Compressed Sensing encompasses exciting and surprising
developments in signal processing resulting from sparse
representations.
It is about the interplay between sparsity and signal recovery. Roots
trace back to †
Mathematics and harmonic analysis
Physical sciences and geophysics
Vision
Optimization and computational tools
This course describes this fascinating topic and the tools needed in its
applications.
†
D. Donoho, ”Scanning the Technology,” Proceedings of the IEEE. Vol. 98, No. 6,
June 2010
Shannon-Nyquist Sampling Theorem
The Shannon-Nyquist Theorem: sampling frequency of an analog
signal must be greater than twice the highest frequency of the signal in
order to perfectly reconstruct the original signal from the sampled
version.
Theorem
If a function f(t) contains no frequencies higher than W cps, it is completely determined by giving its ordinates at a series of points spaced 1/(2W) seconds apart.†
† C. E. Shannon. ”Communication in the presence of noise.” Proceedings of the IRE, Vol. 37, no.1, pp.10-21, Jan.1949.
H. Nyquist. ”Certain topics in telegraph transmission theory.” Trans. AIEE, vol.47, pp.617-644, Apr.1928.
Traditional signal sampling and signal compression.
Nyquist sampling rate gives exact reconstruction.
Pessimistic for some types of signals!
Sampling and Compression
Transform data and keep important coefficients.
[Figures: original image and its biorthogonal spline wavelet transform]
Sampling and Compression
A lot of work just to throw away the majority of the data!
e.g., JPEG 2000 lossy compression: a digital camera captures millions of pixels, but the picture is encoded in only a few hundred kilobytes.
[Figures: original image and its wavelet transform]
Problem: Recent applications require a very large number of samples:
Higher resolution in medical imaging devices, cameras, etc.
Spectral imaging, confocal microscopy, radar arrays, etc.
[Figures: medical imaging; spectral imaging data cube with spatial axes x, y and spectral axis λ]
Sampling and Compressive Sensing
Donoho†, Candès‡, Romberg, and Tao discovered important results on the minimum amount of data needed to reconstruct a signal.
Compressive Sensing (CS) unifies sensing and compression into a
single task
Minimum number of samples to reconstruct a signal depends on
its sparsity rather than its bandwidth.
† D. Donoho, "Compressed Sensing," IEEE Trans. on Information Theory, Vol. 52, No. 4, pp. 1289-1306, Apr. 2006.
‡ E. Candès, J. Romberg and T. Tao, "Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information," IEEE Trans. on Information Theory, Vol. 52, No. 2, pp. 489-509, Feb. 2006.
Vector Spaces and the Nyquist-Shannon Sampling
Theorem
Vector space: set of vectors H satisfying the following axioms:
Associativity property: v1 + (v2 + v3) = (v1 + v2) + v3.
Commutativity property: v1 + v2 = v2 + v1.
Identity element: ∃0 ∈ H, such that v + 0 = v, ∀v ∈ H.
Inverse element: ∀v ∈ H, then ∃ − v ∈ H, such that v + (−v) = 0.
Distributivity of a scalar over vector addition: s(v1 + v2) = sv1 + sv2.
Distributivity over scalar addition: (s1 + s2)v = s1v + s2v.
Associativity of scalar multiplication: s1(s2v) = (s1s2)v.
Identity element of scalar multiplication: ∃ a scalar 1 such that 1v = v.
Norms: A norm ‖·‖ on the vector space H satisfies:
∀x ∈ H, ‖x‖ ≥ 0, and ‖x‖ = 0 ⇔ x = 0.
∀α ∈ C, ‖αx‖ = |α|‖x‖ (homogeneity).
∀x, y ∈ H, ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality).
Examples of norms:
H is the space R^n, with norm ‖x‖ℓp = (Σ_{k=1}^n |xk|^p)^{1/p}, for p ≥ 1.
In R^2, consider the unit ball Bp = {x : ‖x‖ℓp = 1}, p ≥ 1.
The unit ball is the set of all points (x1, x2) that satisfy:
|x1| + |x2| = 1, for B1.
x1^2 + x2^2 = 1, for B2.
max{|x1|, |x2|} = 1, for B∞.
In R^n, ‖x‖ℓ1 = Σ_{k=1}^n |xk| is a norm since it satisfies:
∀x ∈ R^n, ‖x‖ℓ1 = Σ_{k=1}^n |xk| ≥ 0. Also, Σ_{k=1}^n |xk| = 0 if and only if xk = 0, ∀k.
∀α ∈ C, ‖αx‖ℓ1 = Σ_{k=1}^n |αxk| = |α| Σ_{k=1}^n |xk| = |α|‖x‖ℓ1.
∀x, y ∈ R^n,
‖x + y‖ℓ1 = Σ_{k=1}^n |xk + yk|
≤ Σ_{k=1}^n (|xk| + |yk|)   (|·| is a convex function)
= Σ_{k=1}^n |xk| + Σ_{k=1}^n |yk|
= ‖x‖ℓ1 + ‖y‖ℓ1.
In R^n, ‖x‖ℓp = (Σ_{k=1}^n |xk|^p)^{1/p} with p = 0.5 is not a norm:
∀x ∈ R^n, ‖x‖ℓ0.5 = (Σ_{k=1}^n |xk|^{1/2})^2 ≥ 0. Also, (Σ_{k=1}^n |xk|^{1/2})^2 = 0 if and only if xk = 0, ∀k.
∀α ∈ C, ‖αx‖ℓ0.5 = (Σ_{k=1}^n |αxk|^{1/2})^2 = (|α|^{1/2} Σ_{k=1}^n |xk|^{1/2})^2 = |α|‖x‖ℓ0.5.
However, the triangle inequality fails. Take x and y with disjoint supports, e.g., x = (1, 0, . . . , 0) and y = (0, 1, 0, . . . , 0), so that |xk + yk|^{1/2} = |xk|^{1/2} + |yk|^{1/2} for every k. Then
‖x + y‖ℓ0.5 = (Σ_{k=1}^n |xk + yk|^{1/2})^2 = (Σ_{k=1}^n |xk|^{1/2} + Σ_{k=1}^n |yk|^{1/2})^2
= (Σ_{k=1}^n |xk|^{1/2})^2 + (Σ_{k=1}^n |yk|^{1/2})^2 + 2 (Σ_{k=1}^n |xk|^{1/2}) (Σ_{k=1}^n |yk|^{1/2})
> ‖x‖ℓ0.5 + ‖y‖ℓ0.5   (the triangle inequality is not satisfied).
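A two-line MATLAB check of this counterexample, using the disjoint-support pair above:

% The p = 0.5 quasi-norm violates the triangle inequality: check with x = (1,0), y = (0,1)
p  = 0.5;  x = [1 0];  y = [0 1];
np = @(v) sum(abs(v).^p)^(1/p);     % (sum_k |v_k|^p)^(1/p)
[np(x+y), np(x)+np(y)]              % prints 4 and 2, so the value for x+y exceeds the sum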
Other Examples of Norms:
Operator norm: H is the space of m × n matrices A,
‖A‖ = σmax(A) = maximum singular value of A.
Frobenius norm: H is the space of m × n matrices A,
‖A‖F = (Σ_{i,j} A_{i,j}^2)^{1/2} = (Σ_k σk^2)^{1/2}.
Normed vector spaces: vector spaces H satisfying the norm properties.
Examples of normed vector spaces:
ℓ2(R) (also known as ℓ2
or Euclidean space): the vector space R
satisfying the properties of the ℓ2-norm.
ℓ∞(R): the vector space R satisfying the properties of the
ℓ∞-norm.
Inner Products
An inner product ⟨·, ·⟩ on H satisfies, ∀x, y, z ∈ H and α ∈ C:
⟨x, y⟩ = ⟨y, x⟩*
⟨αx, y⟩ = α⟨x, y⟩
⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩
⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 ⇔ x = 0
An inner product induces a norm on H: ‖x‖ = √⟨x, x⟩.
In ℓ2(R), for instance, the inner product is given by:
⟨x, y⟩ = ∫_{−∞}^{∞} x(t) y*(t) dt.   (1)
⟨x, x⟩ = ∫_{−∞}^{∞} x(t) x*(t) dt = ‖x‖ℓ2^2.   (2)
Hilbert Spaces
A vector space H that satisfies the inner product properties is known as
Hilbert space.
Examples of Hilbert spaces:
The Euclidean space R^n with the dot product as inner product: ⟨x, y⟩ = Σ_{i=1}^n xi yi.
The space of real-valued, finite-variance, zero-mean random variables, with ⟨x, y⟩ = E[xy].
The space of m × n matrices, with ⟨A, B⟩tr = trace(AᵀB).
Definitions
Orthogonality: two signals x, y are orthogonal if ⟨x, y⟩ = 0.
Orthonormal basis: a basis of a vector space is orthonormal if its vectors are mutually orthogonal and have unit norm.
Orthonormal sequence: {βn}n∈Z is an orthonormal sequence if ‖βn‖ = 1, ∀n, and ⟨βn, βm⟩ = 0, ∀n ≠ m.
Example:
Fourier series: {βn}n∈Z = {e^{j2πnt}}n∈Z is an orthobasis for ℓ2([0,1]), since:
‖βn‖ℓ2 = 1
⟨βn, βm⟩ = 0, for n ≠ m
Definitions
Cauchy-Schwarz Inequality: |⟨x, y⟩| ≤ ‖x‖‖y‖.
For the Euclidean space H = R^n:
|⟨x, y⟩| = |Σ_i xi yi| ≤ √(Σ_i xi^2) √(Σ_i yi^2) = ‖x‖ℓ2 ‖y‖ℓ2.
For the space of real-valued, finite-variance, zero-mean random variables:
|⟨x, y⟩| = |E[xy]| ≤ √(E[x^2]) √(E[y^2]) = ‖x‖‖y‖.
Shannon-Nyquist Sampling Theorem
Sampling of a bandlimited signal.
Let f̂(w) be the Fourier transform of f(t). Let the space of bandlimited signals be
B_{π/T} = {f(t) ∈ ℓ2(R) s.t. f̂(w) = 0, ∀|w| > π/T}.
Define
hT(t) = √T sin(πt/T) / (πt)  ↔  ĥT(w) = √T if |w| ≤ π/T, and 0 if |w| > π/T.
By the shift property of the Fourier transform,
hT(t − nT) ↔ √T e^{−jwnT}.
Using the Parseval theorem,
∫_{−∞}^{∞} f(t) g*(t) dt = (1/2π) ∫_{−∞}^{∞} f̂(w) ĝ*(w) dw,
note that {hT(t − nT)}n∈Z is an orthobasis for the bandlimited signals f(t) in B_{π/T}:
∫_{−∞}^{∞} hT(t) hT(t − nT) dt = (1/2π) ∫_{−π/T}^{π/T} T e^{jwnT} dw
= (1/(2jπn)) e^{jwnT} |_{−π/T}^{π/T}
= (1/(2jπn)) (e^{jπn} − e^{−jπn})
= 0, ∀n ∈ Z, n ≠ 0,
while for n = 0 the integral equals 1.
The signals f(t) in B_{π/T} can be expressed in terms of this orthobasis:
f(t) = Σ_{n∈Z} ⟨f(t), hT(t − nT)⟩ hT(t − nT).   (3)
Using the inner product definition in (2) and the Parseval theorem, the coefficients of the signal expansion in this orthobasis are
⟨f(t), hT(t − nT)⟩ = (1/2π) ∫_{−π/T}^{π/T} f̂(w) √T e^{jwnT} dw = √T f(nT).   (4)
Replacing (4) in (3), the signals f(t) in B_{π/T} can then be expressed in terms of a sequence:
f(t) = √T Σ_{n∈Z} f(nT) hT(t − nT),   (5)
where the coefficients f(nT) of the sequence are samples of f(t).
Nyquist-Shannon-Kotelnikov Theorem
If a signal f(t) contains only frequencies satisfying |w| < π/T, the signal is completely determined by a series of samples spaced T seconds apart.
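A small MATLAB sketch of this reconstruction: since hT(t) = sinc(t/T)/√T, equation (5) reduces to f(t) = Σ_{n} f(nT) sinc((t − nT)/T). The test signal and the truncation to 101 samples are arbitrary choices, and sinc is written inline to avoid a toolbox dependence.

% Reconstruction from samples, f(t) = sum_n f(nT) sinc((t - nT)/T)  (equivalent to eq. (5))
T  = 0.05;  n = -50:50;                                  % sampling period and truncated sample indices
f  = @(t) sin(2*pi*3*t) + 0.5*cos(2*pi*7*t);             % bandlimited test signal (7 Hz < 1/(2T) = 10 Hz)
s  = @(u) (u==0) + (u~=0).*sin(pi*u)./(pi*u + (u==0));   % sinc without toolbox dependence
t  = linspace(-1, 1, 2000);
fr = zeros(size(t));
for k = 1:numel(n)
    fr = fr + f(n(k)*T) * s((t - n(k)*T)/T);             % add one shifted interpolation kernel per sample
end
max(abs(fr - f(t)))                                      % small; truncating the sum limits accuracy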
Sparsity
Signal sparsity is critical to CS.
It plays roughly the same role in CS that bandwidth plays in Shannon-Nyquist theory.
A signal x ∈ R^N is S-sparse in the basis Ψ if x can be represented as a linear combination of S vectors of Ψ, x = Ψα, with S ≪ N.
[Figure: x = Ψα, where α has at most S non-zero components]
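As a small illustration of this definition, a minimal MATLAB sketch; the random orthonormal basis stands in for Ψ and is our choice, not from the slides:

% Construct an S-sparse signal x = Psi*alpha (random orthonormal basis as a stand-in for Psi)
N = 256;  S = 10;
[Psi, ~] = qr(randn(N));               % random orthonormal basis
alpha = zeros(N, 1);
alpha(randperm(N, S)) = randn(S, 1);   % only S non-zero coefficients
x = Psi * alpha;                       % x is S-sparse in the basis Psi, with S << N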
The ℓ1 Norm and Sparsity
The ℓ0 norm is defined by ‖x‖0 = #{i : x(i) ≠ 0}.
The sparsity of x is measured by its number of non-zero elements.
The ℓ1 norm is defined by ‖x‖1 = Σ_i |x(i)|.
The ℓ1 norm has two key properties:
Robust data fitting
Sparsity-inducing norm
The ℓ2 norm is defined by ‖x‖2 = (Σ_i |x(i)|^2)^{1/2}.
The ℓ2 norm is not effective in measuring the sparsity of x.
Why ℓ1 Norm Promotes Sparsity?
Given two N-dimensional signals:
x1 = (1, 0, . . . , 0) → "spike" signal
x2 = (1/√N, 1/√N, . . . , 1/√N) → "comb" signal
x1 and x2 have the same ℓ2 norm: ‖x1‖2 = 1 and ‖x2‖2 = 1.
However, ‖x1‖1 = 1 and ‖x2‖1 = √N.
[Figure: the spike signal x1 and the comb signal x2]
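A quick MATLAB check of this comparison (N = 100 is an arbitrary choice):

% Compare the l1 and l2 norms of the spike and comb signals
N  = 100;
x1 = [1; zeros(N-1, 1)];        % spike
x2 = ones(N, 1) / sqrt(N);      % comb
[norm(x1, 2), norm(x2, 2)]      % both 1: the l2 norm cannot tell them apart
[norm(x1, 1), norm(x2, 1)]      % 1 versus sqrt(N) = 10: the l1 norm favors the sparse signal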
ℓ1 Norm in Regression
Linear regression is widely used in science and engineering.
Given A ∈ R^{m×n} and b ∈ R^m, with m > n:
Find x s.t. b = Ax (overdetermined).
ℓ1 Norm Regression
Two approaches:
Minimize the ℓ2 norm of the residuals:
min_{x∈R^n} ‖b − Ax‖2
The ℓ2 norm penalizes large residuals.
Minimize the ℓ1 norm of the residuals:
min_{x∈R^n} ‖b − Ax‖1
The ℓ1 norm puts much more weight on small residuals.
Matlab Code
min_{x∈R^n} ‖Ax − b‖2
A = randn(500,150);
b = randn(500,1);
x = inv(A'*A)*A'*b;   % least squares solution (equivalently, x = A\b)

min_{x∈R^n} ‖Ax − b‖1
A = randn(500,150);
b = randn(500,1);
x = medrec(b,A,max(A'*b),0,100,1e-5);   % l1 solver used in the course (not a MATLAB built-in)
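Since medrec is not a standard MATLAB function, a hedged alternative for the same ℓ1 residual problem is iteratively reweighted least squares; this is a sketch, not the course's solver, and the iteration count and smoothing constant are arbitrary:

% Approximate min_x ||A*x - b||_1 by iteratively reweighted least squares (IRLS sketch)
A = randn(500,150);  b = randn(500,1);
x = A\b;                                     % initialize with the least squares solution
for k = 1:50                                 % fixed number of iterations (arbitrary)
    w = 1 ./ max(abs(b - A*x), 1e-6);        % weights ~ 1/|residual|, floored to avoid division by zero
    W = spdiags(w, 0, numel(b), numel(b));   % diagonal weight matrix
    x = (A'*W*A) \ (A'*W*b);                 % weighted least squares update
end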
ℓ1 Norm Regression
m = 500, n = 150. A = randn(m, n) and b = randn(m, 1)
[Figure: histograms of the ℓ2 residuals and the ℓ1 residuals]
ℓ1 Norm in Regression
Given A ∈ R^{m×n} and b ∈ R^m, with m < n:
Find x s.t. b = Ax (underdetermined).
ℓ1 Norm Regression
Two approaches:
Minimize the ℓ2 norm of x:
min_{x∈R^n} ‖x‖2 subject to Ax = b
Minimize the ℓ1 norm of x:
min_{x∈R^n} ‖x‖1 subject to Ax = b
Matlab Code
min_{x∈R^n} ‖x‖2 subject to Ax = b
A = randn(150,500);
b = randn(150,1);
C = eye(500);                 % penalize all 500 entries of x
d = zeros(500,1);
x = lsqlin(C,d,[],[],A,b);    % minimum l2-norm solution (equivalently, x = pinv(A)*b)
In general, for
min_{x∈R^n} f(x) subject to Ax = b:
x = fmincon(@(x) f(x), zeros(500,1), [],[], A, b, [],[],[], options);
where f(x) is a convex function.
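For the ℓ1 problem above, one standard option (our addition, not from the slides) is to recast it as a linear program and solve it with linprog from the same Optimization Toolbox, splitting x = u − v with u, v ≥ 0:

% min ||x||_1 s.t. Ax = b as a linear program in nonnegative variables u, v (sketch)
A = randn(150,500);  b = randn(150,1);  n = size(A,2);
f   = ones(2*n, 1);               % objective: sum(u) + sum(v) equals ||x||_1 at the optimum
Aeq = [A, -A];                    % equality constraint A*(u - v) = b
lb  = zeros(2*n, 1);              % nonnegativity of u and v
z   = linprog(f, [], [], Aeq, b, lb, []);
x   = z(1:n) - z(n+1:end);        % recover x; this solution is typically sparse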
ℓ1 Norm Regression
[Figure: histograms of the entries of the ℓ2 solution and of the ℓ1 solution]
ℓ1 Norm Regression
Consider N observation pairs (xi, bi) modeled in a linear fashion:
bi = A xi + c + Ui,  i = 1, 2, . . . , N,   (6)
A: unknown slope of the fitting line.
c: intercept.
Ui: unobservable errors.
The Least Absolute Deviation (LAD) regression minimizes
F1(A, c) = Σ_{i=1}^N |bi − A xi − c|.   (7)
[Figure: the LAD cost Σ_{i=1}^N |bi − A xi − c| and the sample lines c = −xi A + bi in the (A, c) parameter space]
ℓ1 Norm in Estimation
Location Estimation in Gaussian Noise
Let x1, x2, · · · , xN be i.i.d. Gaussian with a constant but unknown mean β. The Maximum Likelihood estimate of location is the value β̂ which maximizes the likelihood function
f(x1, x2, · · · , xN; β) = Π_{i=1}^N f(xi − β)
= Π_{i=1}^N (1/(√(2π) σ)) e^{−(xi−β)^2/(2σ^2)}   (8)
= (1/(2πσ^2))^{N/2} e^{−Σ_{i=1}^N (xi−β)^2/(2σ^2)}.
ℓ1 Norm in Estimation
The ML estimate β̂ minimizes the least squares sum
β̂ML = arg min_β Σ_{i=1}^N (xi − β)^2,   (9)
which results in the sample mean
β̂ML = (1/N) Σ_{i=1}^N xi.   (10)
ℓ1 Norm in Estimation
Location Estimation in Generalized Gaussian Noise
If the xi's obey a generalized Gaussian distribution, the likelihood function is
f(x1, x2, · · · , xN; β) = Π_{i=1}^N fγ(xi − β)
= Π_{i=1}^N C e^{−|xi−β|^γ/σ}
= C^N e^{−Σ_{i=1}^N |xi−β|^γ/σ},   (11)
where C is a normalizing constant and γ is the dispersion parameter.
ℓ1 Norm in Estimation
Maximizing the likelihood function is equivalent to
β̂ML = arg min_β Σ_{i=1}^N |xi − β|^γ.
[Figure: the cost function over β for samples x1, . . . , x5, for γ = 0.5, 1, and 2]
ℓ1 Norm in Estimation
For N odd there is an integer k such that the slopes of the cost function over the intervals (x(k−1), x(k)] and (x(k), x(k+1)] are negative and positive, respectively. Thus
β̂ML = arg min_β Σ_{i=1}^N |xi − β|
= x((N+1)/2) for N odd, or any value in [x(N/2), x(N/2+1)] for N even
= MEDIAN(x1, x2, · · · , xN).   (12)
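A small numerical check of the two special cases γ = 2 and γ = 1, on an arbitrary sample; the elementwise x − beta relies on MATLAB implicit expansion (R2016b or newer):

% Grid search over beta for sum_i |x_i - beta|^gamma, with gamma = 2 and gamma = 1
x    = [0.3; 1.2; 2.0; 4.5; 7.1];              % arbitrary sample
beta = linspace(min(x), max(x), 10001);
c2   = sum(abs(x - beta).^2, 1);               % gamma = 2 cost
c1   = sum(abs(x - beta),    1);               % gamma = 1 cost
[~, i2] = min(c2);  [~, i1] = min(c1);
[beta(i2), mean(x); beta(i1), median(x)]       % gamma = 2 gives the mean, gamma = 1 the median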
ℓ1 Norm Regression
ML Estimate of Location for the Generalized Gaussian
Here the samples have a common location parameter β but different scale parameters σi. The ML estimate of location minimizes
Gp(β) = Σ_{i=1}^N (1/σi^p) |xi − β|^p.   (13)
For the Gaussian distribution (p = 2), the ML estimate reduces to the weighted mean
β̂ = arg min_β Σ_{i=1}^N (1/σi^2) (xi − β)^2 = (Σ_{i=1}^N Wi xi) / (Σ_{i=1}^N Wi),   (14)
where Wi = 1/σi^2 > 0.
For the Laplacian distribution (p = 1), the ML estimate minimizes
G1(β) = Σ_{i=1}^N (1/σi) |xi − β|,   (15)
where Wi ≜ 1/σi > 0. G1(β) is piecewise linear and convex. The weighted median output is defined as
β̂ = arg min_β Σ_{i=1}^N Wi |xi − β|
= MEDIAN[W1♦x1, W2♦x2, · · · , WN♦xN],
where Wi > 0 and ♦ is the replication operator, Wi♦xi = xi, xi, · · · , xi (Wi times).
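A minimal MATLAB sketch of such a weighted median for positive, not necessarily integer, weights; the function name wmedian is ours, and the sorting/half-weight rule returns one valid minimizer:

function m = wmedian(x, w)
% Weighted median: a minimizer of sum_i w_i*|x_i - m| over m, for weights w_i > 0
x = x(:);  w = w(:);
[xs, idx] = sort(x);               % sort the samples
cw = cumsum(w(idx));               % cumulative weight in sorted order
k  = find(cw >= sum(w)/2, 1);      % smallest index reaching half of the total weight
m  = xs(k);
end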
ℓ1 Norm Regression
Next, consider N observation pairs (xi, bi):
bi = A xi + c + Ui,  i = 1, 2, . . . , N,   (16)
A: unknown slope of the fitting line.
c: intercept.
Ui: unobservable errors.
The L1 or Least Absolute Deviation (LAD) regression minimizes
F1(A, c) = Σ_{i=1}^N |bi − A xi − c|.   (17)
Sample space: bi = A xi + c
1. Each sample pair (xi, bi) represents a point in the plane.
2. The solution is a line with slope A* and intercept c*.
3. If this line goes through some sample pair (xi, bi), then the equation bi = A* xi + c* is satisfied.
Parameter space: c = −xi A + bi
1. The solution (A*, c*) is a point.
2. The sample pair (xi, bi) defines a line with slope −xi and intercept bi.
3. When c* = −xi A* + bi holds, the point (A*, c*) lies on the line defined by (−xi, bi).
[Figure: parameter space (A, c) with the solution point (A*, c*)]
Set A = A0; the objective function then becomes a one-parameter function of c:
F(c) = Σ_{i=1}^N |(bi − A0 xi) − c|,   (18)
where the terms bi − A0 xi play the role of observations. The parameter c* is the Maximum Likelihood estimator of location for c. It can be obtained by
c* = MED(bi − A0 xi) |_{i=1}^N.   (19)
Set c = c0; the objective function reduces to
F(A) = Σ_{i=1}^N |bi − c0 − A xi|
= Σ_{i=1}^N |xi| · |(bi − c0)/xi − A|.   (20)
The parameter A* can be seen as the ML estimator of location for A, and can be calculated as the weighted median
A* = MED( |xi| ⋄ (bi − c0)/xi ) |_{i=1}^N.   (21)
A simple and intuitive way of solving the LAD regression problem (a MATLAB sketch follows the steps) is:
1. Set k = 0. Find an initial value A0 for A, such as the Least Squares (LS) solution.
2. Set k = k + 1 and obtain a new estimate of c for a fixed Ak−1 using ck = MED(bi − Ak−1 xi) |_{i=1}^N.
3. Obtain a new estimate of A for a fixed ck using Ak = MED( |xi| ⋄ (bi − ck)/xi ) |_{i=1}^N.
4. Once Ak and ck do not deviate from Ak−1 and ck−1 beyond a tolerance, end the iteration. Otherwise, go back to step 2.
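A MATLAB sketch of this iteration on synthetic data, assuming the wmedian helper sketched earlier; the data, the outlier pattern, and the tolerance are arbitrary choices:

% LAD line fit b ~ A*x + c by alternating (weighted) medians, following steps 1-4
N = 200;  x = randn(N,1);
b = 2*x + 1 + randn(N,1);  b(1:20) = b(1:20) + 15;      % synthetic data with gross outliers
th = [x, ones(N,1)] \ b;  A = th(1);                     % step 1: least squares initialization
c = median(b - A*x);
for k = 1:100                                            % steps 2-4
    A_old = A;  c_old = c;
    c = median(b - A*x);                                 % location estimate of c
    A = wmedian((b - c)./x, abs(x));                     % weighted median over the slopes (b_i - c)/x_i
    if max(abs(A - A_old), abs(c - c_old)) < 1e-6, break; end
end
[A, c]                                                   % robust slope and intercept estimates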
[Figure: parameter space (A, c)]
Signal Representation
A sparse signal x ∈ R^N can be represented by a linear combination of the basis vectors of an orthogonal representation matrix Ψ:
x(t) = Σ_i αi ψi(t)
Sparse Signal Representation
Active development of effective signal representations in the 1990s:
Fourier
Wavelet
Curvelet
There is no universal best representation
Best representation = sparsest
Wavelets
A wavelet is a "small wave" with finite energy that allows the analysis of transient or time-varying phenomena.
Figure: Daubechies (D20) Wavelet example
A signal x(t) can be represented in terms of its wavelet coefficients as
x(t) = Σ_{j∈Z} Σ_{n∈Z} ⟨x, Ψj,n⟩ Ψj,n(t),
where:
Ψj,n are the wavelets, which form an orthogonal basis.
⟨x, Ψj,n⟩ are the wavelet coefficients.
Wavelets are vectors of an orthogonal basis formed by shifting and dilating a mother wavelet Ψ(t):
Ψj,n(t) = 2^{−j/2} Ψ(2^{−j} t − n), ∀j, n ∈ Z,
where j is the scale parameter and n is the location parameter.
Examples of wavelet expansion functions:
[Figures: Haar wavelet, Daubechies wavelet, and Symlet wavelet]
Daubechies Wavelet
Daubechies wavelets are continuous and smooth wavelets.
The mother wavelet is defined by means of a scaling function.
A Daubechies wavelet Ψ(t) has p vanishing moments if
∫_{−∞}^{∞} t^k Ψ(t) dt = 0, for 0 ≤ k < p.
The smoothness of the scaling and wavelet functions increases as the number of vanishing moments increases.
Examples of Daubechies wavelets:
(a) Daubechies scaling and wavelet functions with 2 vanishing moments.
(b) Daubechies scaling and wavelet functions with 6 vanishing moments.
(c) Daubechies scaling and wavelet functions with 10 vanishing moments.
Examples of Wavelet decompositions
Examples of Wavelet decompositions
Other examples: original signals
Noisy signals
Denoising using wavelet approximation
Denoising using wavelet approximation
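As a sketch of the idea behind these denoising examples (not the slides' exact processing): soft-threshold the detail coefficients of a wavelet decomposition and reconstruct. This assumes the Wavelet Toolbox; the wavelet 'db4', the 5 levels, and the noise level are arbitrary choices.

% Wavelet denoising by soft-thresholding detail coefficients (requires the Wavelet Toolbox)
t = linspace(0, 1, 1024);
x = sin(2*pi*5*t) + (t > 0.5);            % piecewise-smooth test signal
y = x + 0.2*randn(size(x));               % noisy observation, sigma = 0.2
[c, l] = wavedec(y, 5, 'db4');            % 5-level Daubechies wavelet decomposition
thr = 0.2*sqrt(2*log(numel(y)));          % universal threshold (noise level assumed known)
d = c;  k = l(1)+1:numel(c);              % indices of the detail coefficients
d(k) = sign(c(k)) .* max(abs(c(k)) - thr, 0);   % soft thresholding
xhat = waverec(d, l, 'db4');              % denoised reconstruction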
Sampling and Compression
[Figures: JPEG and JPEG2000 compressed images]
Sparse Signal Representation
Different representations are best for different applications.
Fourier Dictionary → For oscillatory phenomena
Wavelet Dictionary → For images with isolated singularities
Curvelet Dictionary → For images with contours and edges
This motivates overcomplete signal representation ‡
‡ S. Mallat and Z. Zhang, "Matching Pursuits with Time-Frequency Dictionaries," IEEE Trans. on Signal Proc., Vol. 41, pp. 3397-3415, 1993.
Sparse Signal Representation
Overcomplete dictionary representation
Different bases merged into a combined dictionary
Ψ = [Ψ1, Ψ2, ..., ΨN]
Representation of x in an overcomplete dictionary:
x = Σ_i αi ψi, with the sparsest α
Basis Pursuit (BP)
Basis Pursuit → find the sparsest approximation of x:
min_α ‖α‖1 s.t. x = Ψα,
where ‖α‖1 = Σ_i |αi|.
BP decomposes a signal into a superposition of dictionary elements having the smallest ℓ1 norm among all such decompositions.†
†
D. L. Donoho and X. Huo. Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inform. Theory, 47:2845-2862, 2001.
Compressible Signals
In most applications:
Signals are not perfectly sparse, but only a few coefficients concentrate most of the energy.
Most of the transform coefficients are negligible.
Compressible signals can be approximated by an S-sparse signal:
- There is a transform vector αS with only S terms such that ‖αS − α‖2 is small.
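A minimal numerical illustration of this approximation, assuming coefficients whose sorted magnitudes decay like 1/n (our choice, anticipating the next slide):

% Best S-term approximation error ||alpha_S - alpha||_2 for coefficients with 1/n decay
N = 1000;  S = 50;
alpha  = (1 ./ (1:N)') .* sign(randn(N,1));     % compressible coefficients: magnitudes ~ 1/n
[~, idx] = sort(abs(alpha), 'descend');
alphaS = zeros(N, 1);
alphaS(idx(1:S)) = alpha(idx(1:S));             % keep only the S largest-magnitude terms
norm(alphaS - alpha) / norm(alpha)              % small relative error: alpha is well approximated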
Compressible Signals
Wavelet coefficients of natural scenes exhibit a (1/n) decay†.
[Figures: 1-megapixel image, its wavelet coefficients, and the sorted wavelet coefficients]
† E. J. Candès and J. Romberg, "Sparsity and Incoherence in Compressive Sampling," Inverse Problems, Vol. 23, pp. 969-985, 2007.
Examples of Compressible Signals
Bat echolocation: time signal and time-frequency representation.
Confocal microscopy: 3D image and 3D wavelet coefficients.
Ultra-wideband signaling: amplitude [mV] versus time [ps].