Minimax Rates for Homology Inference

Don Sheehy

Joint work with Sivaraman Balakrishnan, Alessandro Rinaldo, Aarti Singh, and Larry Wasserman
Something like a joke.

What is topological inference?

It’s when you infer the topology of a space given only a finite subset.
We add geometric and statistical hypotheses to make the problem well-posed.

Geometric Assumption: The underlying space is a smooth manifold M.

Statistical Assumption: The points are drawn i.i.d. from a distribution derived from M.
Minimax Rates for Homology Inference
Input: n points sampled i.i.d. from a distribution supported on a d-manifold M in D dimensions, with noise.

Output: an estimate of the homology of M.

Upper bound: What is the worst-case complexity? (Complexity here is the probability of giving a wrong answer.)

Lower bound: What is the worst-case complexity of the best possible algorithm?

The Goal: Matching bounds (asymptotically).
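To make the input model concrete, here is a minimal sketch (my own illustration, not from the talk) of the noiseless version: n points drawn i.i.d. from the uniform distribution on the unit circle, a d = 1 manifold embedded in D = 2 dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_manifold(n, D=2):
    """Draw n points i.i.d. from the uniform distribution on the
    unit circle, a d = 1 manifold embedded in R^D (here D = 2)."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    X = np.zeros((n, D))
    X[:, 0] = np.cos(theta)
    X[:, 1] = np.sin(theta)
    return X

X = sample_manifold(500)
# A correct estimator, given X, should output the homology of the
# circle: Betti numbers b0 = 1 (one component), b1 = 1 (one loop).
```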
Minimax risk is the error probability of the best estimator on the hardest examples.

Minimax Risk:

    R_n = \inf_{\hat{H}} \sup_{Q \in \mathcal{Q}} Q^n\big(\hat{H} \neq H(M)\big)

The infimum ranges over estimators \hat{H} (the best estimator), the supremum over distributions Q \in \mathcal{Q} (the hardest distribution), Q^n is the product distribution of the n i.i.d. draws, and H(M) is the true homology.

Sample Complexity:

    n(\varepsilon) = \min\{n : R_n \le \varepsilon\}
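Read literally, the sample complexity is just the first n at which the risk curve drops below ε. A toy sketch assuming the risk is available as a function of n; the exponential curve below is an illustrative stand-in, not the actual minimax risk.

```python
import math

def sample_complexity(risk, eps, n_max=10**6):
    """n(eps) = min{n : risk(n) <= eps}, found by scanning n upward."""
    for n in range(1, n_max + 1):
        if risk(n) <= eps:
            return n
    raise ValueError("risk never dropped below eps")

# Illustrative stand-in risk curve R_n = (1/8) * exp(-c * n):
c = 1e-3
print(sample_complexity(lambda n: 0.125 * math.exp(-c * n), eps=0.01))
# -> 2526, since (1/8) e^{-cn} <= eps  iff  n >= (1/c) log(1/(8 eps))
```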
We assume manifolds without boundary of bounded volume and reach.

Let \mathcal{M} be the set of compact d-dimensional Riemannian manifolds without boundary such that

1 M \subset \mathrm{ball}_D(0, 1)
2 \mathrm{vol}(M) \le c_d
3 The reach of M is at least τ.

Let \mathcal{P} be the set of probability distributions supported over M \in \mathcal{M} with densities bounded from below by a constant a.
We consider 4 different noise models.

Noiseless: Q = P.

Clutter: Q = (1 − γ)U + γP, where P ∈ \mathcal{P} and U is uniform on ball(0, 1).

Tubular: Let Q_{M,σ} be uniform on M^σ, the σ-tube around M. \mathcal{Q} = \{Q_{M,\sigma} : M \in \mathcal{M}\}.

Additive: \mathcal{Q} = \{P * \Phi : P \in \mathcal{P}\} (convolution), where Φ is Gaussian with σ ≪ τ, or Φ has a Fourier transform bounded away from 0 and τ is fixed.
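Here is a minimal sketch (my own, not from the talk) of how a sample might be generated under each model for the circle example; γ and σ are illustrative parameters, and the tubular sampler is only approximately uniform on the tube for small σ.

```python
import numpy as np

rng = np.random.default_rng(1)

def on_circle(n):
    t = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.column_stack([np.cos(t), np.sin(t)])

def noiseless(n):                     # Q = P
    return on_circle(n)

def clutter(n, gamma=0.8):            # Q = (1 - gamma) U + gamma P
    X = on_circle(n)
    U = rng.uniform(-1.0, 1.0, (n, 2))        # U uniform on ball(0, 1),
    while (bad := np.linalg.norm(U, axis=1) > 1).any():   # by rejection
        U[bad] = rng.uniform(-1.0, 1.0, (bad.sum(), 2))
    from_P = rng.random(n) < gamma
    X[~from_P] = U[~from_P]
    return X

def tubular(n, sigma=0.05):           # ~uniform on the tube M^sigma
    t = rng.uniform(0.0, 2.0 * np.pi, n)         # (radial step is only
    r = 1.0 + sigma * rng.uniform(-1.0, 1.0, n)  # approximate for small sigma)
    return np.column_stack([r * np.cos(t), r * np.sin(t)])

def additive(n, sigma=0.05):          # Q = P * Phi, Phi Gaussian
    return on_circle(n) + rng.normal(0.0, sigma, (n, 2))
```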
Le Cam’s Lemma is a powerful tool for proving minimax lower bounds.

Lemma. Let \mathcal{Q} be a set of distributions. Let θ(Q) take values in a metric space (X, ρ) for Q ∈ \mathcal{Q}. For any Q_1, Q_2 ∈ \mathcal{Q},

    \inf_{\hat\theta} \sup_{Q \in \mathcal{Q}} \mathbb{E}_{Q^n}\, \rho(\hat\theta, \theta(Q)) \;\ge\; \frac{1}{8}\, \rho(\theta(Q_1), \theta(Q_2))\, \big(1 - \mathrm{TV}(Q_1, Q_2)\big)^{2n}

For homology, use the trivial metric: ρ(x, y) = 0 if x = y and ρ(x, y) = 1 if x ≠ y. Under this metric the expected loss is exactly the probability of error, so whenever H(M_1) ≠ H(M_2),

    R_n = \inf_{\hat{H}} \sup_{Q \in \mathcal{Q}} Q^n\big(\hat{H} \neq H(M)\big) \;\ge\; \frac{1}{8}\, \big(1 - \mathrm{TV}(Q_1, Q_2)\big)^{2n}
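For intuition, the bound is easy to evaluate when the total variation distance can be computed exactly, e.g. for discrete toy distributions. This sketch is generic and not specific to manifolds.

```python
import numpy as np

def tv_discrete(p, q):
    """TV(p, q) = sup_A |p(A) - q(A)| = 0.5 * ||p - q||_1
    for discrete distributions given as probability vectors."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def lecam_bound(p, q, n, rho=1.0):
    """Le Cam lower bound (1/8) * rho * (1 - TV(p, q))^(2n)."""
    return 0.125 * rho * (1.0 - tv_discrete(p, q)) ** (2 * n)

# Two nearby distributions: every test errs with probability >= ~0.045
# even after 10 samples.
print(lecam_bound([0.5, 0.5], [0.55, 0.45], n=10))
```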
The lower bound requires two manifolds that are geometrically close but topologically distinct.

    B = \mathrm{ball}_d(0, 1 - \tau) \qquad A = B \setminus \mathrm{ball}_d(0, 2\tau)

    M_1 = \partial(B^\tau) \qquad M_2 = \partial(A^\tau)

[Figure: M_1 and M_2, highlighting their overlap.]
It suffices to bound the total variation distance.

Total Variation Distance:

    \mathrm{TV}(Q_1, Q_2) = \sup_A |Q_1(A) - Q_2(A)| \le a \max\{\mathrm{vol}(M_1 \setminus M_2), \mathrm{vol}(M_2 \setminus M_1)\} \le C_d\, a\, \tau^d

Minimax Risk:

    R_n \ge \frac{1}{8}\big(1 - \mathrm{TV}(Q_1, Q_2)\big)^{2n} \ge \frac{1}{8}\big(1 - C_d a \tau^d\big)^{2n} \ge \frac{1}{8}\, e^{-2 C_d a \tau^d n}

Sampling Rate:

    n(\varepsilon) \ge \Big(\frac{1}{\tau}\Big)^d \log\frac{1}{\varepsilon}
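Plugging illustrative constants into the chain above shows the rate numerically; C_d, a, τ, and d below are hypothetical values, not ones from the talk.

```python
import math

def risk_lower_bound(n, C_d=1.0, a=1.0, tau=0.1, d=2):
    """R_n >= (1/8) * exp(-2 * C_d * a * tau^d * n)."""
    return 0.125 * math.exp(-2.0 * C_d * a * tau**d * n)

def rate_lower_bound(eps, tau=0.1, d=2):
    """n(eps) >= (1/tau)^d * log(1/eps)."""
    return (1.0 / tau) ** d * math.log(1.0 / eps)

print(risk_lower_bound(n=100))     # ~0.0169: risk still non-negligible
print(rate_lower_bound(eps=0.01))  # ~460.5 samples needed at minimum
```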
The upper bound uses a union of balls to estimate the homology of M.

0 Denoise the data.
1 Take a union of balls.
2 Compute the homology of the resulting Čech complex.

To prove: The density is bounded from below near M and from above far from M.
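A minimal sketch of this estimator, assuming the gudhi library is available. A Vietoris–Rips complex with edge length 2r is used as a computable proxy for the Čech complex of the union of balls, and the k-NN density filter in step 0 is my own illustrative denoising rule, not the talk's.

```python
import numpy as np
import gudhi  # assumed available: pip install gudhi

def estimate_homology(X, r, k=10, density_cutoff=None):
    """Estimate the homology of M from a noisy sample X.

    0. Denoise: drop points whose k-th nearest-neighbor distance is
       large (a crude density filter; low density suggests clutter).
    1. Take a union of balls of radius r around surviving points.
    2. Compute the homology of the resulting complex (a Rips complex
       at scale 2r stands in for the Cech complex).
    """
    # Step 0: k-NN distance as an inverse density estimate.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    knn = np.sort(dists, axis=1)[:, k]
    cutoff = density_cutoff if density_cutoff is not None else 2 * np.median(knn)
    Y = X[knn <= cutoff]

    # Steps 1-2: build the complex and read off Betti numbers.
    rips = gudhi.RipsComplex(points=Y, max_edge_length=2 * r)
    st = rips.create_simplex_tree(max_dimension=2)
    st.compute_persistence()
    return st.betti_numbers()

# Usage: for a dense sample of a circle, expect b0 = 1 and b1 = 1
# when r is between the sampling density and the reach.
```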
Many fundamental problems are still open.

1 Is the reach the right parameter?
2 What about manifolds with boundary?
3 Homotopy equivalence?
4 How to choose parameters?
5 Are there efficient algorithms?
Thank you.
