Compressive Sensing
Gabriel Peyré
www.numerical-tours.com
Overview


• Compressive Sensing Acquisition

• Theoretical Guarantees

• Fourier Domain Measurements

• Parameter Selection
Single Pixel Camera (Rice)

    y[i] = ⟨f̃, φ_i⟩

    P measurements, N micro-mirrors.

[Figure: reconstructions from P/N = 1, P/N = 0.16 and P/N = 0.02 measurements.]
CS Hardware Model
CS is about designing hardware: input signals f̃ ∈ L²(ℝ²).
Physical hardware resolution limit: target resolution f ∈ ℝ^N.

    f̃ ∈ L²  →  (array resolution)  →  f ∈ ℝ^N  →  (micro-mirrors, operator K)  →  y ∈ ℝ^P
                                   CS hardware

[Figure: the operator K, displayed as the collection of mirror patterns correlated with f.]
Sparse CS Recovery
f0 ∈ ℝ^N sparse in an ortho-basis Ψ:  f0 = Ψ x0,  x0 ∈ ℝ^N.

(Discretized) sampling acquisition:
    y = K f0 + w = K Ψ(x0) + w,    Φ = K Ψ

K drawn from the Gaussian matrix ensemble:
    K_{i,j} ~ N(0, P^{−1/2}) i.i.d.
⇒ Φ drawn from the Gaussian matrix ensemble.

Sparse recovery:
    min_{||Φx − y|| ≤ ||w||} ||x||_1
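A minimal numpy sketch of this pipeline (an illustration, not the Numerical Tours code): a k-sparse x0 is measured through a Gaussian Φ and recovered by ISTA applied to the Lagrangian form (1/2)||Φx − y||² + λ||x||_1 rather than the constrained problem above; the sizes, noise level and λ are arbitrary choices.

```python
import numpy as np

# Compressive sensing sketch: Gaussian measurements of a k-sparse vector,
# recovered by ISTA on (1/2)||Phi x - y||^2 + lam * ||x||_1.
rng = np.random.default_rng(0)
N, P, k = 400, 100, 10

x0 = np.zeros(N)                                  # k-sparse signal
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((P, N)) / np.sqrt(P)    # Gaussian sensing matrix

sigma = 0.01
y = Phi @ x0 + sigma * rng.standard_normal(P)

def soft(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

lam = 0.05                        # regularization parameter (arbitrary here)
L = np.linalg.norm(Phi, 2) ** 2   # Lipschitz constant of the gradient
x = np.zeros(N)
for _ in range(500):              # ISTA iterations
    grad = Phi.T @ (Phi @ x - y)
    x = soft(x - grad / L, lam / L)

print("relative error:", np.linalg.norm(x - x0) / np.linalg.norm(x0))
```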
CS Simulation Example
Original f0;  Ψ = translation-invariant wavelet frame.
Overview


• Compressive Sensing Acquisition

• Theoretical Guarantees

• Fourier Domain Measurements

• Parameter Selection
CS with RIP
ℓ1 recovery:
    x* ∈ argmin_{||Φx − y|| ≤ ε} ||x||_1    where y = Φ x0 + w and ε ≥ ||w||

Restricted Isometry Constants:
    ∀ x with ||x||_0 ≤ k:   (1 − δ_k) ||x||² ≤ ||Φx||² ≤ (1 + δ_k) ||x||²

Theorem: If δ_{2k} ≤ √2 − 1, then        [Candès 2009]
    ||x0 − x*|| ≤ (C0/√k) ||x0 − x_k||_1 + C1 ε
where x_k is the best k-term approximation of x0.
Singular Values Distributions
Eigenvalues of Φ_I* Φ_I with |I| = k are essentially in [a, b]:
    a = (1 − √β)²  and  b = (1 + √β)²,    where β = k/P.

When k = βP → +∞, the eigenvalue distribution tends to
    f(λ) = (1/(2πβλ)) √((b − λ)₊ (λ − a)₊)        [Marchenko–Pastur]

Large deviation inequality [Ledoux].

[Figure: empirical eigenvalue histograms and the limiting density for P = 200 and k = 10, 30, 50.]

Theorem: If k ≤ C P / log(N/P), then δ_{2k} ≤ √2 − 1 with high probability.
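This concentration is easy to check numerically. The sketch below (plain numpy, arbitrary sizes) draws random supports I, collects the eigenvalues of the Gram matrices Φ_I*Φ_I, and compares their range with the predicted interval [a, b]; the last lines evaluate the Marchenko–Pastur density for comparison with a histogram.

```python
import numpy as np

# Empirical check of the Marchenko-Pastur prediction: eigenvalues of
# Phi_I^* Phi_I for random supports I of size k concentrate on
# [(1 - sqrt(beta))^2, (1 + sqrt(beta))^2] with beta = k/P.
rng = np.random.default_rng(0)
N, P, k = 1000, 200, 30
beta = k / P
a, b = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2

Phi = rng.standard_normal((P, N)) / np.sqrt(P)
eigs = []
for _ in range(200):                       # Monte-Carlo over random supports
    I = rng.choice(N, k, replace=False)
    G = Phi[:, I].T @ Phi[:, I]            # Gram matrix Phi_I^* Phi_I
    eigs.append(np.linalg.eigvalsh(G))
eigs = np.concatenate(eigs)

print(f"predicted support  [{a:.3f}, {b:.3f}]")
print(f"empirical range    [{eigs.min():.3f}, {eigs.max():.3f}]")

# Marchenko-Pastur density on [a, b] (to overlay on a histogram of eigs):
lam = np.linspace(a, b, 200)
f = np.sqrt(np.maximum(b - lam, 0) * np.maximum(lam - a, 0)) / (2 * np.pi * beta * lam)
```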
Numerics with RIP
Stability constants of A:
    (1 − δ₁(A)) ||α||² ≤ ||A α||² ≤ (1 + δ₂(A)) ||α||²
    1 − δ₁(A) and 1 + δ₂(A): smallest / largest eigenvalues of A*A.

Upper / lower RIC:
    δ_k^i = max_{|I| = k} δ_i(Φ_I),  i = 1, 2
    δ_k = min(δ_k^1, δ_k^2)

Monte-Carlo estimation:  δ̂_k ≤ δ_k.

[Figure: Monte-Carlo estimates δ̂_k^1, δ̂_k^2 as a function of k, compared with the threshold √2 − 1, for N = 4000, P = 1000.]
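A rough Monte-Carlo version of this estimation in numpy (much smaller N, P than the N = 4000, P = 1000 of the figure): random supports give a lower bound δ̂ on the true constants, exactly as stated above.

```python
import numpy as np

# Monte-Carlo lower bound on the restricted isometry constants: draw random
# supports I of size k and track the extreme eigenvalues of Phi_I^* Phi_I
# (the true RIC would require a maximum over all supports).
rng = np.random.default_rng(0)
N, P = 400, 100
Phi = rng.standard_normal((P, N)) / np.sqrt(P)

def ric_estimate(Phi, k, trials=500):
    d1, d2 = 0.0, 0.0            # lower / upper deviations from 1
    for _ in range(trials):
        I = rng.choice(Phi.shape[1], k, replace=False)
        ev = np.linalg.eigvalsh(Phi[:, I].T @ Phi[:, I])
        d1 = max(d1, 1 - ev[0])  # delta_k^1: how far the spectrum dips below 1
        d2 = max(d2, ev[-1] - 1) # delta_k^2: how far it rises above 1
    return d1, d2

for k in (5, 10, 20):
    d1, d2 = ric_estimate(Phi, k)
    print(f"k={k:3d}  delta1_hat={d1:.3f}  delta2_hat={d2:.3f}")
```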
Polytope Noiseless Recovery
Counting faces of random polytopes:        [Donoho]
    All x0 such that ||x0||_0 ≤ C_all(P/N) · P are identifiable.
    Most x0 such that ||x0||_0 ≤ C_most(P/N) · P are identifiable.

    C_all(1/4) ≈ 0.065
    C_most(1/4) ≈ 0.25

→ Sharp constants.
→ No noise robustness.
→ Computation of "pathological" signals        [Dossal, P, Fadili, 2010]

[Figure: RIP, "All" and "Most" identifiability curves.]
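A crude way to see the "most signals" threshold numerically, as a hedged sketch with arbitrary small sizes (so the transition is far less sharp than Donoho's asymptotic constants): basis pursuit is solved exactly as a linear program with scipy.

```python
import numpy as np
from scipy.optimize import linprog

# Rough empirical check of the "most signals" phase transition at P/N = 1/4:
# random k-sparse vectors are measured by a Gaussian matrix and recovered by
# basis pursuit (min ||x||_1 s.t. Phi x = y), written as an LP with
# x = u - v, u >= 0, v >= 0 (linprog's default bounds are non-negative).
rng = np.random.default_rng(0)
N, P = 120, 30                               # P/N = 1/4, kept small for speed
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
A_eq = np.hstack([Phi, -Phi])                # [Phi, -Phi] [u; v] = y
c = np.ones(2 * N)                           # minimize sum(u) + sum(v) = ||x||_1

def bp_recover(y):
    res = linprog(c, A_eq=A_eq, b_eq=y, method="highs")
    return res.x[:N] - res.x[N:]

for k in (3, 6, 9, 12):
    ok = 0
    for _ in range(20):
        x0 = np.zeros(N)
        x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
        x = bp_recover(Phi @ x0)
        ok += np.linalg.norm(x - x0) <= 1e-6 * np.linalg.norm(x0)
    # Cmost(1/4) ~ 0.25 predicts success roughly up to k ~ 0.25 * P
    print(f"k={k:2d} (k/P={k/P:.2f})  exact recovery {ok}/20")
```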
Overview


• Compressive Sensing Acquisition

• Theoretical Guarantees

• Fourier Domain Measurements

• Parameter Selection
Tomography and Fourier Measures
    f̂ = FFT2(f)

Fourier slice theorem:
    p̂_θ(ω) = f̂(ω cos θ, ω sin θ)        (1D ↔ 2D Fourier)

Partial Fourier measurements:  {p_{θ_k}(t)}_{0 ≤ k < K},  t ∈ ℝ.
Equivalent to:  K f = (f̂[ω])_{ω ∈ Ω}.
Regularized Inversion
Noisy measurements:  ∀ ω ∈ Ω,  y[ω] = f̂0[ω] + w[ω].
Noise: w[ω] ~ N(0, σ²), white noise.

ℓ1 regularization:
    f* = argmin_f  (1/2) Σ_{ω ∈ Ω} |y[ω] − f̂[ω]|²  +  λ Σ_m |⟨f, ψ_m⟩|
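A small numpy sketch of this inversion (illustration only): the operator Kf = (f̂[ω])_{ω∈Ω} and its adjoint are built from a random frequency mask, and a few ISTA steps minimize the ℓ1-regularized objective. To stay self-contained, the sparsity basis Ψ is taken here to be the pixel (Dirac) basis on a synthetic spiky image instead of the wavelet basis of the slide; the mask density, λ and sizes are arbitrary.

```python
import numpy as np

# Partial-Fourier measurements K f = (f_hat[omega])_{omega in Omega}
# and ISTA for the l1-regularized inversion (sparsity in the pixel basis).
rng = np.random.default_rng(0)
n = 64
f0 = np.zeros((n, n))
f0[rng.integers(0, n, 40), rng.integers(0, n, 40)] = 1.0   # sparse toy image

mask = rng.random((n, n)) < 0.3           # random frequency set Omega

def K(f):                                 # K f = f_hat restricted to Omega
    return np.fft.fft2(f, norm="ortho") * mask

def Kt(y):                                # adjoint: zero-fill, inverse FFT, real part
    return np.real(np.fft.ifft2(y * mask, norm="ortho"))

noise = 0.01 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
y = K(f0) + noise * mask

lam, f = 0.02, np.zeros((n, n))
for _ in range(200):                      # ISTA; step 1 is valid since ||K|| = 1
    g = Kt(K(f) - y)
    f = np.sign(f - g) * np.maximum(np.abs(f - g) - lam, 0)

print("relative error:", np.linalg.norm(f - f0) / np.linalg.norm(f0))
```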
MRI Imaging
From [Lustig et al.]

MRI Reconstruction
From [Lustig et al.]
Fourier sub-sampling pattern: randomization.
[Figure: high-resolution and low-resolution sampling patterns, linear and sparsity-based reconstructions.]
Structured Measurements
Gaussian matrices: intractable for large N.

Random partial orthogonal matrix:  {φ_ω}_ω orthogonal basis,
    K f = (⟨φ_ω, f⟩)_{ω ∈ Ω}    where |Ω| = P, drawn uniformly at random.

Fast measurements (e.g. Fourier basis).

Mutual incoherence:
    µ = √N · max_{ω, m} |⟨φ_ω, ψ_m⟩|  ∈  [1, √N]

Theorem: with high probability on Ω, for Φ = K Ψ,
    if  k ≤ C P / (µ² log(N)⁴),  then  δ_{2k} ≤ √2 − 1.
    [Rudelson, Vershynin, 2006]
→ not universal: requires incoherence.
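For intuition, the incoherence is straightforward to evaluate on small explicit bases. The numpy snippet below (sizes chosen arbitrarily) checks the two extreme cases for the DFT: Dirac spikes give µ = 1, while measuring the Fourier basis against itself gives µ = √N.

```python
import numpy as np

# Mutual incoherence mu = sqrt(N) * max_{omega,m} |<phi_omega, psi_m>|
# between two orthonormal bases, given as the columns of U and V.
N = 64
F = np.fft.fft(np.eye(N), norm="ortho")      # orthonormal DFT basis (columns)

def incoherence(U, V):
    return np.sqrt(U.shape[0]) * np.max(np.abs(U.conj().T @ V))

print("mu(Fourier, Dirac)   =", incoherence(F, np.eye(N)))   # = 1 (maximally incoherent)
print("mu(Fourier, Fourier) =", incoherence(F, F))           # = sqrt(N) (maximally coherent)
```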
Overview


• Compressive Sensing Acquisition

• Theoretical Guarantees

• Fourier Domain Measurements

• Parameter Selection
Risk Minimization
Estimator: e.g.  x_λ(y) ∈ argmin_x  (1/2) ||y − Φx||² + λ ||x||_1
λ > 0 is a regularization parameter. How to choose its value?

Risk-based selection of λ:
    Average risk:  R(λ) = E_w(||x_λ(y) − x0||²)  measures the expected quality of x_λ(y) with respect to x0.
    The optimal (theoretical) λ minimizes the risk:  λ*(y) = argmin_λ R(λ).
    Plug-in estimator:  x_{λ*(y)}(y).

But:
    E_w is not accessible → use one observation.
    x0 is not accessible → need risk estimators.
Can we estimate the risk solely from x_λ(y)?
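To make the definitions concrete, here is a Monte-Carlo evaluation of R(λ) on a toy problem where it is actually computable, because x0 is known and the estimator has a closed form (Φ = Id, so x_λ(y) is soft thresholding of y). In a real problem neither E_w nor x0 is available, which is exactly the motivation for the risk estimators below; sizes and the λ grid are arbitrary.

```python
import numpy as np

# Monte-Carlo approximation of R(lambda) = E_w ||x_lambda(y) - x0||^2 for
# pure denoising (Phi = Id), where x_lambda(y) = soft(y, lambda) is the
# exact minimizer of (1/2)||y - x||^2 + lambda * ||x||_1.
rng = np.random.default_rng(0)
N, k, sigma = 400, 20, 0.5
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = 5 * rng.standard_normal(k)

def soft(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

lambdas = np.linspace(0.05, 3.0, 40)
risk = np.zeros_like(lambdas)
for _ in range(200):                      # average over noise realizations
    y = x0 + sigma * rng.standard_normal(N)
    for i, lam in enumerate(lambdas):
        risk[i] += np.sum((soft(y, lam) - x0) ** 2)
risk /= 200

print("lambda minimizing the Monte-Carlo risk:", lambdas[np.argmin(risk)])
```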
Prediction Risk Estimation
Prediction:  µ_λ(y) = Φ x_λ(y)

Sensitivity analysis: if µ_λ is weakly differentiable,
    µ_λ(y + δ) = µ_λ(y) + ∂µ_λ(y) · δ + O(||δ||²)

Stein Unbiased Risk Estimator:
    SURE_λ(y) = ||y − µ_λ(y)||² − σ² P + 2 σ² df_λ(y)
    df_λ(y) = tr(∂µ_λ(y)) = div(µ_λ)(y)

Theorem: [Stein, 1981]
    E_w(SURE_λ(y)) = E_w(||Φ x0 − µ_λ(y)||²)

Other estimators: GCV, BIC, AIC, ...
Generalized SURE: estimate  E_w(||P_{ker(Φ)⊥}(x0 − x_λ(y))||²).
Computation for L1 Regularization
Sparse estimator:  x_λ(y) ∈ argmin_x  (1/2) ||y − Φx||² + λ ||x||_1

Theorem: for all y, there exists a solution x* such that Φ_I is injective, where I = supp(x*), and
    df_λ(y) = div(Φ x_λ)(y) = ||x*||_0        [Dossal et al. 2011]
[Figure: compressed sensing using multi-scale wavelet thresholding. Φ ∈ ℝ^{P×N} is a realization of a random matrix with P = N/4, and Ψ is a translation-invariant wavelet frame. Panels: (a) observations y, (b) x_λ(y) at the optimal λ, (c) the maximum-likelihood estimate x_ML, and the quadratic loss (projection risk, GSURE, true risk) plotted against the regularization parameter λ.]
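Using df_λ(y) = ||x*||_0 from the theorem above, SURE is a one-line correction of the residual. The numpy sketch below (arbitrary sizes, σ and λ grid; the lasso solved with plain ISTA) compares SURE with the true prediction risk, which is computable here only because x0 is known.

```python
import numpy as np

# SURE for the l1 estimator, using df_lambda(y) = ||x*||_0:
#     SURE(y) = ||y - Phi x*||^2 - sigma^2 * P + 2 * sigma^2 * ||x*||_0.
rng = np.random.default_rng(0)
N, P, k, sigma = 200, 80, 8, 0.05
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
y = Phi @ x0 + sigma * rng.standard_normal(P)

def soft(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def lasso_ista(y, Phi, lam, n_iter=2000):
    L = np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        x = soft(x - Phi.T @ (Phi @ x - y) / L, lam / L)
    return x

for lam in (0.005, 0.02, 0.05, 0.1):
    x = lasso_ista(y, Phi, lam)
    df = np.count_nonzero(np.abs(x) > 1e-8)        # df = ||x*||_0
    sure = np.sum((y - Phi @ x) ** 2) - sigma**2 * P + 2 * sigma**2 * df
    pred_risk = np.sum((Phi @ x - Phi @ x0) ** 2)  # true prediction risk (x0 known here)
    print(f"lambda={lam:.3f}  SURE={sure:8.4f}  true={pred_risk:8.4f}")
```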
Anisotropic Total-Variation
Extension to ℓ1 analysis regularization and total variation.        [Vaiter et al. 2012]

For any z ∈ ℝ^P, ν = ν(z) solves the linear system
    [ Φ*Φ    D_J ] [ ν  ]     [ Φ*z ]
    [ D_J*    0  ] [ ν̃ ]  =  [  0  ]
In practice, by the law of large numbers, the expectation is replaced by an empirical mean, and ν(z) is computed by solving the linear system with a conjugate gradient solver.

Numerical example: super-resolution using (anisotropic) total variation.
    Φ: vertical sub-sampling.    Finite-differences gradient: D = [∂1, ∂2].

[Figure: (a) observations y, (b) x_λ(y) at the optimal λ; quadratic loss (projection risk, GSURE, true risk) as a function of λ.]
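The "empirical mean instead of expectation" step corresponds to a randomized trace estimate of the degrees of freedom. The generic numpy sketch below (not the conjugate-gradient computation of Vaiter et al.; sizes, ε and the probe count are arbitrary) estimates df(y) = div(µ)(y) by finite differences on random probes, and checks it on soft thresholding, whose exact divergence is the number of entries above the threshold.

```python
import numpy as np

# Monte-Carlo divergence estimate: df(y) ~ E_delta[ <delta, (mu(y + eps*delta) - mu(y)) / eps> ],
# usable when mu(y) is only available as a black box (e.g. an iterative solver).
rng = np.random.default_rng(0)

def soft(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

P, lam, eps = 500, 1.0, 1e-4
y = 2 * rng.standard_normal(P)
mu = lambda z: soft(z, lam)              # black-box estimator

n_probes, df_mc = 20, 0.0
for _ in range(n_probes):                # empirical mean over random probes
    delta = rng.standard_normal(P)
    df_mc += delta @ (mu(y + eps * delta) - mu(y)) / eps
df_mc /= n_probes

print("Monte-Carlo df :", df_mc)
print("exact df       :", np.count_nonzero(np.abs(y) > lam))
```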
Conclusion
Sparsity: approximate signals with few atoms from a dictionary.

Compressed sensing ideas:
    Randomized sensors + sparse recovery.
    Number of measurements ∼ signal complexity.
    CS is about designing new hardware.

The devil is in the constants:
    Worst-case analysis is problematic.
    Designing good signal models.