Phase transition for statistical estimation:
algorithms and fundamental limits
Marc Lelarge
INRIA-ENS
APS - INFORMS 2023
A bit of history: 70’s
A bit of history: 2010’s
Applications to high-dimensional statistics
Approximate Message Passing
This tutorial: demystifying statistical physics!
- A simple version of the AMP algorithm
- Gap between information-theoretically optimal and computationally feasible estimators
- Running example: the matrix model
  ▶ connection to random matrix theory
  ▶ sparse PCA, community detection, Z₂ synchronization, submatrix localization, hidden clique...
AMP and its state evolution
Given a matrix W ∈ R^{n×n} and scalar functions f_t : R → R, let x^0 ∈ R^n and
x^{t+1} = W f_t(x^t) − b_t f_{t−1}(x^{t−1}) ∈ R^n,
where
b_t = (1/n) ∑_{i=1}^n f_t'(x^t_i) ∈ R.
If W ∼ GOE(n), the f_t are Lipschitz and the components of x^0 are i.i.d. ∼ X₀ with E[X₀²] = 1, then for any nice test function Ψ : R^t → R,
(1/n) ∑_{i=1}^n Ψ(x^1_i, ..., x^t_i) → E[Ψ(Z₁, ..., Z_t)],
where (Z₁, ..., Z_t) =_d (σ₁G₁, ..., σ_tG_t), with G_s ∼ N(0, 1) i.i.d.
(Bayati, Montanari ’11)
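A minimal NumPy sketch of this iteration (illustrative code, not from the tutorial: it fixes a single Lipschitz denoiser f_t = tanh for all t, and assumes the σ_t follow the standard state-evolution recursion σ²_{t+1} = E[f_t(σ_t G)²], consistent with the sanity check below):

```python
import numpy as np

rng = np.random.default_rng(0)

def goe(n, rng):
    """Symmetric W with off-diagonal entries of variance 1/n (GOE scaling)."""
    A = rng.normal(0.0, 1.0, (n, n))
    return (A + A.T) / np.sqrt(2 * n)

def amp(W, x0, f, f_prime, T):
    """AMP: x^{t+1} = W f(x^t) - b_t f(x^{t-1}), with b_t = mean of f'(x^t)."""
    iterates = [x0, W @ f(x0)]            # the t = 0 step has no Onsager term
    for _ in range(1, T):
        x_prev, x = iterates[-2], iterates[-1]
        b = f_prime(x).mean()             # Onsager coefficient b_t
        iterates.append(W @ f(x) - b * f(x_prev))
    return iterates

n, T = 4000, 6
W = goe(n, rng)
x0 = rng.standard_normal(n)               # X0 ~ N(0,1), so E[X0^2] = 1
f, f_prime = np.tanh, lambda u: 1.0 - np.tanh(u) ** 2
iterates = amp(W, x0, f, f_prime, T)

# State evolution: sigma_{t+1}^2 = E[f(sigma_t G)^2], evaluated by Monte Carlo.
G = rng.standard_normal(200_000)
sigma2 = np.mean(f(x0) ** 2)
for t, x in enumerate(iterates[1:], start=1):
    print(t, round(x.var(), 3), round(sigma2, 3))   # empirical vs predicted variance
    sigma2 = np.mean(f(np.sqrt(sigma2) * G) ** 2)
```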
Sanity check
We have x^1 = W f₀(x^0), so that
x^1_i = ∑_j W_{ij} f₀(x^0_j),
where W_{ij} ∼ N(0, 1/n) i.i.d. (ignoring diagonal terms). Hence x^1 is a centred Gaussian vector whose entries have variance
(1/n) ∑_j f₀(x^0_j)² ≈ E[f₀(X₀)²] = σ₁².
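This is quick to confirm numerically (a self-contained sketch; the choice f₀ = tanh is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
A = rng.normal(0.0, 1.0, (n, n))
W = (A + A.T) / np.sqrt(2 * n)        # GOE(n): off-diagonal variance 1/n

x0 = rng.standard_normal(n)           # X0 ~ N(0,1)
f0 = np.tanh
x1 = W @ f0(x0)

print(x1.var())                       # empirical variance of the entries of x^1
print(np.mean(f0(x0) ** 2))           # ≈ E[f0(X0)^2] = sigma_1^2
```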
AMP proof of Wigner’s semicircle law
Consider AMP with linear functions f_t(x) = x, so that
x^1 = W x^0
x^2 = W x^1 − x^0 = (W² − Id) x^0
x^3 = W x^2 − x^1 = (W³ − 2W) x^0,
so x^t = P_t(W) x^0 with
P₀(x) = 1, P₁(x) = x,
P_{t+1}(x) = x P_t(x) − P_{t−1}(x).
{P_t} are Chebyshev polynomials, orthonormal w.r.t. the semicircle density µ_SC(x) = (1/2π) √((4 − x²)₊).
When (1/n) ‖x^0‖² = 1, we have (1/n) ⟨x^s, x^t⟩ ≈ (1/n) tr P_s(W) P_t(W).
AMP proof of Wigner’s semicircle law
x^{t+1} = W x^t − x^{t−1}
In this case, AMP state evolution gives
(1/n) ⟨x^s, x^t⟩ → E[Z_s Z_t] = 1(s = t).
Since (1/n) ⟨x^s, x^t⟩ ≈ (1/n) tr P_s(W) P_t(W), the polynomials P_t are orthonormal w.r.t. the limiting empirical spectral distribution of W, which must therefore be µ_SC.
Credit: Zhou Fan.
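A numerical check of this orthonormality (an illustrative sketch, not from the tutorial):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 4000, 5
A = rng.normal(0.0, 1.0, (n, n))
W = (A + A.T) / np.sqrt(2 * n)

x = rng.standard_normal(n)            # (1/n) ||x^0||^2 ≈ 1
iterates = [x, W @ x]
for _ in range(2, T + 1):
    iterates.append(W @ iterates[-1] - iterates[-2])   # x^{t+1} = W x^t - x^{t-1}

gram = np.array([[u @ v / n for v in iterates] for u in iterates])
print(np.round(gram, 2))              # ≈ identity: (1/n)<x^s, x^t> ≈ 1(s = t)
```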
Wigner’s semicircle law: experiments
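These slides show figures only; as a plausible reconstruction (my assumption: the experiment compares the empirical spectral distribution of a GOE matrix with µ_SC), a sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
n = 2000
A = rng.normal(0.0, 1.0, (n, n))
W = (A + A.T) / np.sqrt(2 * n)

eigenvalues = np.linalg.eigvalsh(W)
grid = np.linspace(-2.2, 2.2, 400)
semicircle = np.sqrt(np.clip(4 - grid ** 2, 0, None)) / (2 * np.pi)

plt.hist(eigenvalues, bins=60, density=True, alpha=0.5, label="eigenvalues of W")
plt.plot(grid, semicircle, label="semicircle density")
plt.legend()
plt.show()
```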
Explaining the Onsager term
x^{t+1} = W x^t − x^{t−1}
The first iteration with an Onsager term appears for t = 2.
Then we have x^2 = W x^1 − x^0 = W² x^0 − x^0, so that
x^2_1 = ∑_i W²_{1i} x^0_1 + ∑_{i, j≠1} W_{1i} W_{ij} x^0_j − x^0_1,
where the double sum is approximately N(0, 1). Since ∑_i W²_{1i} ≈ 1, the first term is ≈ x^0_1 and is cancelled by the correction −x^0_1, leaving only the Gaussian part.
The Onsager term is very similar to the Itô correction in stochastic calculus.
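This cancellation is easy to see numerically (a sketch; the ≈ 0 and ≈ 1 values in the comments are the n → ∞ limits):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4000
A = rng.normal(0.0, 1.0, (n, n))
W = (A + A.T) / np.sqrt(2 * n)

x0 = rng.standard_normal(n)
x1 = W @ x0
x2_amp = W @ x1 - x0          # with the Onsager correction
x2_naive = W @ x1             # plain power iteration, no correction

print(x0 @ x2_amp / n)        # ≈ 0: the feedback along x^0 is removed
print(x0 @ x2_naive / n)      # ≈ 1: sum_i W_{1i}^2 ≈ 1 feeds x^0 back in
```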
This tutorial: demystifying statistical physics!
- A simple version of the AMP algorithm
- Gap between information-theoretically optimal and computationally feasible estimators
- Running example: the matrix model
  ▶ connection to random matrix theory
  ▶ sparse PCA, community detection, Z₂ synchronization, submatrix localization, hidden clique...
Low-rank matrix estimation
“Spiked Wigner” model
Y = √(λ/n) X Xᵀ + Z
(observations = signal + noise)
▶ X: vector of dimension n with entries X_i i.i.d. ∼ P₀, E[X₁] = 0, E[X₁²] = 1.
▶ Z_{i,j} = Z_{j,i} i.i.d. ∼ N(0, 1).
▶ λ: signal-to-noise ratio.
▶ λ and P₀ are known by the statistician.
Goal: recover the low-rank matrix X Xᵀ from Y.
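A sampling sketch of this model (illustrative; the binary prior used in the figures below is one admissible choice of P₀):

```python
import numpy as np

def spiked_wigner(n, lam, rng):
    """Sample Y = sqrt(lam/n) X X^T + Z with binary prior P0 = Uniform{-1,+1}."""
    X = rng.choice([-1.0, 1.0], size=n)       # E[X_i] = 0, E[X_i^2] = 1
    Z = rng.normal(0.0, 1.0, (n, n))
    Z = (Z + Z.T) / np.sqrt(2)                # symmetric noise, off-diagonal N(0,1)
    return np.sqrt(lam / n) * np.outer(X, X) + Z, X

rng = np.random.default_rng(5)
Y, X = spiked_wigner(1000, 2.0, rng)
```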
Principal component analysis (PCA)
Spectral estimator:
Estimate X using the eigenvector x̂_n associated with the largest eigenvalue µ_n of Y/√n.
B.B.P. phase transition
▶ if λ ≤ 1:  µ_n → 2 and X · x̂_n → 0, almost surely as n → ∞;
▶ if λ > 1:  µ_n → √λ + 1/√λ > 2 and |X · x̂_n| → √(1 − 1/λ) > 0, almost surely as n → ∞.
(Baik, Ben Arous, Péché ’05)
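A numerical check of the transition (a sketch; the overlap is reported as (X · x̂_n)²/n so that it is normalized to [0, 1]):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3000
for lam in [0.5, 4.0]:
    X = rng.choice([-1.0, 1.0], size=n)
    Z = rng.normal(0.0, 1.0, (n, n))
    Z = (Z + Z.T) / np.sqrt(2)
    Y = np.sqrt(lam / n) * np.outer(X, X) + Z

    eigvals, eigvecs = np.linalg.eigh(Y / np.sqrt(n))
    mu_n = eigvals[-1]                        # largest eigenvalue
    overlap2 = (X @ eigvecs[:, -1]) ** 2 / n  # squared normalized overlap
    # Expect (2, ~0) for lam <= 1, and (sqrt(lam) + 1/sqrt(lam), 1 - 1/lam) above.
    print(lam, round(mu_n, 3), round(overlap2, 3))
```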
Questions
▶ PCA fails when λ ≤ 1, but is it still possible to recover the signal?
▶ When λ > 1, is PCA optimal?
▶ More generally, what is the best achievable estimation performance in both regimes?
Plot of MMSE
Figure: Spiked Wigner model, centred binary prior (unit variance).
We can certainly improve on the spectral algorithm!
A scalar denoising problem
For Y = √γ X₀ + Z, where X₀ ∼ P₀ and Z ∼ N(0, 1).
Bayes optimal AMP
We define mmse(γ) = E[(X₀ − E[X₀ | √γ X₀ + Z])²] and the recursion:
q₀ = 1 − λ⁻¹,
q_{t+1} = 1 − mmse(λ q_t).
With the optimal denoiser g_{P₀}(y, γ) = E[X₀ | √γ X₀ + Z = y], AMP is defined by:
x^{t+1} = √(λ/n) Y f_t(x^t) − λ b_t f_{t−1}(x^{t−1}),
where f_t(y) = g_{P₀}(y/√(λq_t), λq_t).
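A runnable sketch for the binary prior (my assumptions: for P₀ = Uniform{−1, +1} the denoiser f_t above reduces to f_t(y) = tanh(y), and for simplicity the run is initialized from artificial side information at overlap q₀ = 1 − 1/λ rather than from the spectral estimator):

```python
import numpy as np

rng = np.random.default_rng(7)
n, lam, T = 4000, 2.0, 12

X = rng.choice([-1.0, 1.0], size=n)
Z = rng.normal(0.0, 1.0, (n, n))
Z = (Z + Z.T) / np.sqrt(2)
Y = np.sqrt(lam / n) * np.outer(X, X) + Z
M = np.sqrt(lam / n) * Y

# Artificial side-information initialization at overlap q0 = 1 - 1/lam.
q0 = 1.0 - 1.0 / lam
x = lam * q0 * X + np.sqrt(lam * q0) * rng.standard_normal(n)

f_prev = np.zeros(n)                  # no f_{-1} term at the first step
for t in range(T):
    f = np.tanh(x)                    # optimal denoiser for the binary prior
    b = np.mean(1.0 - f ** 2)         # b_t = (1/n) sum_i f_t'(x^t_i)
    x, f_prev = M @ f - lam * b * f_prev, f
    print(t, round((f_prev @ X) / n, 3))    # empirical overlap with the signal

# State-evolution prediction: q_{t+1} = 1 - mmse(lam * q_t), by Monte Carlo.
G = rng.standard_normal(200_000)
def mmse(gamma):                      # binary prior: 1 - E[tanh(gamma + sqrt(gamma) G)]
    return 1.0 - np.mean(np.tanh(gamma + np.sqrt(gamma) * G))
q = q0
for _ in range(T):
    q = 1.0 - mmse(lam * q)
print("state-evolution fixed point:", round(q, 3))
```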
Bayes optimal AMP: experiment
Plot of MMSE
Figure: Spiked Wigner model, centred binary prior (unit variance).
Limiting formula for the MMSE
Theorem (L., Miolane ’19)
MMSE_n → 1 − q*(λ)²  as n → ∞,
where the leading 1 is the dummy MSE and q*(λ) is the minimizer of
q ≥ 0 ↦ −E_{X₀∼P₀, Z₀∼N} log ∫ dP₀(x₀) exp( √(λq) Z₀x₀ + λq X₀x₀ − (λq/2) x₀² ) + (λ/4) q².
A simplified “free energy landscape”:
Figure: −F(λ, q) as a function of q, in three regimes: (a) “Easy” phase (λ = 1.01); (b) “Hard” phase (λ = 0.625); (c) “Impossible” phase (λ = 0.5).
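The variational objective is easy to evaluate for the binary prior (a Monte Carlo sketch under my assumptions: for P₀ = Uniform{−1, +1} the inner integral equals e^{−λq/2} cosh(√(λq) Z₀ + λq X₀), and by symmetry one may fix X₀ = 1):

```python
import numpy as np

rng = np.random.default_rng(8)
Z0 = rng.standard_normal(100_000)       # Z0 ~ N(0,1); fix X0 = +1 by symmetry

def objective(lam, q):
    """q >= 0 -> -E log(inner integral) + (lam/4) q^2, binary-prior special case."""
    a = lam * q
    u = np.sqrt(a) * Z0 + a
    log_cosh = np.logaddexp(u, -u) - np.log(2.0)   # numerically stable log cosh(u)
    return a / 2.0 - np.mean(log_cosh) + lam * q ** 2 / 4.0

for lam in [1.01, 0.625, 0.5]:
    qs = np.linspace(0.0, 1.0, 201)
    vals = [objective(lam, q) for q in qs]
    q_star = qs[int(np.argmin(vals))]
    print(lam, round(q_star, 3), "limiting MMSE ≈", round(1 - q_star ** 2, 3))
```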
Phase diagram
Figure: Spiked Wigner model, centred binary prior (unit variance).
Proof ideas: a planted spin system
P(X = x | Y) = (1/Z_n) P₀(x) e^{H_n(x)},
where
H_n(x) = ∑_{ij} ( √(λ/n) Y_{i,j} x_i x_j − (λ/2n) x_i² x_j² ).
Two-step proof:
▶ Lower bound: Guerra’s interpolation technique, adapted in (Korada, Macris ’09), (Krzakala, Xu, Zdeborová ’16):
Y = √t √(λ/n) X Xᵀ + Z,
Y′ = √(1 − t) √λ X + Z′.
▶ Upper bound: cavity computations (Mézard, Parisi, Virasoro ’87); Aizenman–Sims–Starr scheme (Aizenman, Sims, Starr ’03), (Talagrand ’10).
Conclusion
AMP is an iterative denoising algorithm which is optimal when the energy landscape is simple.
Main references for this tutorial: (Montanari, Venkataramanan ’21), (L., Miolane ’19).
Many recent research directions: universality, structured matrices, composition... and new applications outside electrical engineering, such as in ecology.
Deep learning, the new kid on the block:
Thank you for your attention!