Minimax Rates for Homology Inference

Don Sheehy

Joint work with Sivaraman Balakrishnan, Alessandro Rinaldo, Aarti Singh, and Larry Wasserman
Something like a joke.

What is topological inference?

It’s when you infer the topology of a space given only a finite subset.
We add geometric and statistical hypotheses to make the problem well-posed.

Geometric Assumption: The underlying space is a smooth manifold M.

Statistical Assumption: The points are drawn i.i.d. from a distribution derived from M.
Minimax Rates for Homology Inference
Input: n points sampled i.i.d. from a distribution supported on a d-manifold M in D dimensions, with noise.

Output: an estimate of the homology of M.

Upper bound: What is the worst-case complexity? (Complexity here is the probability of giving a wrong answer.)

Lower bound: What is the worst-case complexity of the best possible algorithm?

The Goal: Matching bounds (asymptotically).
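To make the input model concrete, here is a minimal sketch (my own illustration, not from the talk) of the noiseless version: n points drawn i.i.d. from the uniform distribution on the unit circle, a d = 1 manifold embedded in D = 2 dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_manifold(n, D=2):
    """Draw n points i.i.d. from the uniform distribution on the
    unit circle, a d = 1 manifold embedded in R^D (here D = 2)."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    X = np.zeros((n, D))
    X[:, 0] = np.cos(theta)
    X[:, 1] = np.sin(theta)
    return X

X = sample_manifold(500)
# A correct estimator, given X, should output the homology of the
# circle: Betti numbers b0 = 1 (one component), b1 = 1 (one loop).
```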
Minimax risk is the error probability of the best estimator on the hardest examples.

Minimax Risk:

    R_n = \inf_{\hat{H}} \sup_{Q \in \mathcal{Q}} Q^n\big(\hat{H} \neq H(M)\big)

The infimum ranges over estimators \hat{H} (the best estimator), the supremum over distributions Q \in \mathcal{Q} (the hardest distribution), Q^n is the product distribution of the n i.i.d. draws, and H(M) is the true homology.

Sample Complexity:

    n(\varepsilon) = \min\{n : R_n \le \varepsilon\}
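Read literally, the sample complexity is just the first n at which the risk curve drops below ε. A toy sketch assuming the risk is available as a function of n; the exponential curve below is an illustrative stand-in, not the actual minimax risk.

```python
import math

def sample_complexity(risk, eps, n_max=10**6):
    """n(eps) = min{n : risk(n) <= eps}, found by scanning n upward."""
    for n in range(1, n_max + 1):
        if risk(n) <= eps:
            return n
    raise ValueError("risk never dropped below eps")

# Illustrative stand-in risk curve R_n = (1/8) * exp(-c * n):
c = 1e-3
print(sample_complexity(lambda n: 0.125 * math.exp(-c * n), eps=0.01))
# -> 2526, since (1/8) e^{-cn} <= eps  iff  n >= (1/c) log(1/(8 eps))
```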
We assume manifolds without boundary of bounded volume and reach.

Let \mathcal{M} be the set of compact d-dimensional Riemannian manifolds without boundary such that

1 M \subset \mathrm{ball}_D(0, 1)
2 \mathrm{vol}(M) \le c_d
3 The reach of M is at least τ.

Let \mathcal{P} be the set of probability distributions supported over M \in \mathcal{M} with densities bounded from below by a constant a.
We consider 4 different noise models.

Noiseless: Q = P.

Clutter: Q = (1 − γ)U + γP, where P ∈ \mathcal{P} and U is uniform on ball(0, 1).

Tubular: Let Q_{M,σ} be uniform on M^σ, the σ-tube around M. \mathcal{Q} = \{Q_{M,\sigma} : M \in \mathcal{M}\}.

Additive: \mathcal{Q} = \{P * \Phi : P \in \mathcal{P}\} (convolution), where Φ is Gaussian with σ ≪ τ, or Φ has a Fourier transform bounded away from 0 and τ is fixed.
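Here is a minimal sketch (my own, not from the talk) of how a sample might be generated under each model for the circle example; γ and σ are illustrative parameters, and the tubular sampler is only approximately uniform on the tube for small σ.

```python
import numpy as np

rng = np.random.default_rng(1)

def on_circle(n):
    t = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.column_stack([np.cos(t), np.sin(t)])

def noiseless(n):                     # Q = P
    return on_circle(n)

def clutter(n, gamma=0.8):            # Q = (1 - gamma) U + gamma P
    X = on_circle(n)
    U = rng.uniform(-1.0, 1.0, (n, 2))        # U uniform on ball(0, 1),
    while (bad := np.linalg.norm(U, axis=1) > 1).any():   # by rejection
        U[bad] = rng.uniform(-1.0, 1.0, (bad.sum(), 2))
    from_P = rng.random(n) < gamma
    X[~from_P] = U[~from_P]
    return X

def tubular(n, sigma=0.05):           # ~uniform on the tube M^sigma
    t = rng.uniform(0.0, 2.0 * np.pi, n)         # (radial step is only
    r = 1.0 + sigma * rng.uniform(-1.0, 1.0, n)  # approximate for small sigma)
    return np.column_stack([r * np.cos(t), r * np.sin(t)])

def additive(n, sigma=0.05):          # Q = P * Phi, Phi Gaussian
    return on_circle(n) + rng.normal(0.0, sigma, (n, 2))
```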
Le Cam’s Lemma is a powerful tool for proving minimax lower bounds.

Lemma. Let \mathcal{Q} be a set of distributions. Let θ(Q) take values in a metric space (X, ρ) for Q ∈ \mathcal{Q}. For any Q_1, Q_2 ∈ \mathcal{Q},

    \inf_{\hat\theta} \sup_{Q \in \mathcal{Q}} \mathbb{E}_{Q^n}\, \rho(\hat\theta, \theta(Q)) \;\ge\; \frac{1}{8}\, \rho(\theta(Q_1), \theta(Q_2))\, \big(1 - \mathrm{TV}(Q_1, Q_2)\big)^{2n}

For homology, use the trivial metric: ρ(x, y) = 0 if x = y and ρ(x, y) = 1 if x ≠ y. Under this metric the expected loss is exactly the probability of error, so whenever H(M_1) ≠ H(M_2),

    R_n = \inf_{\hat{H}} \sup_{Q \in \mathcal{Q}} Q^n\big(\hat{H} \neq H(M)\big) \;\ge\; \frac{1}{8}\, \big(1 - \mathrm{TV}(Q_1, Q_2)\big)^{2n}
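For intuition, the bound is easy to evaluate when the total variation distance can be computed exactly, e.g. for discrete toy distributions. This sketch is generic and not specific to manifolds.

```python
import numpy as np

def tv_discrete(p, q):
    """TV(p, q) = sup_A |p(A) - q(A)| = 0.5 * ||p - q||_1
    for discrete distributions given as probability vectors."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def lecam_bound(p, q, n, rho=1.0):
    """Le Cam lower bound (1/8) * rho * (1 - TV(p, q))^(2n)."""
    return 0.125 * rho * (1.0 - tv_discrete(p, q)) ** (2 * n)

# Two nearby distributions: every test errs with probability >= ~0.045
# even after 10 samples.
print(lecam_bound([0.5, 0.5], [0.55, 0.45], n=10))
```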
The lower bound requires two manifolds that are geometrically close but topologically distinct.

    B = \mathrm{ball}_d(0, 1 - \tau) \qquad A = B \setminus \mathrm{ball}_d(0, 2\tau)

    M_1 = \partial(B^\tau) \qquad M_2 = \partial(A^\tau)

[Figure: M_1 and M_2, highlighting their overlap.]
It suffices to bound the total variation distance.

Total Variation Distance:

    \mathrm{TV}(Q_1, Q_2) = \sup_A |Q_1(A) - Q_2(A)| \le a \max\{\mathrm{vol}(M_1 \setminus M_2), \mathrm{vol}(M_2 \setminus M_1)\} \le C_d\, a\, \tau^d

Minimax Risk:

    R_n \ge \frac{1}{8}\big(1 - \mathrm{TV}(Q_1, Q_2)\big)^{2n} \ge \frac{1}{8}\big(1 - C_d a \tau^d\big)^{2n} \ge \frac{1}{8}\, e^{-2 C_d a \tau^d n}

Sampling Rate:

    n(\varepsilon) \ge \Big(\frac{1}{\tau}\Big)^d \log\frac{1}{\varepsilon}
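Plugging illustrative constants into the chain above shows the rate numerically; C_d, a, τ, and d below are hypothetical values, not ones from the talk.

```python
import math

def risk_lower_bound(n, C_d=1.0, a=1.0, tau=0.1, d=2):
    """R_n >= (1/8) * exp(-2 * C_d * a * tau^d * n)."""
    return 0.125 * math.exp(-2.0 * C_d * a * tau**d * n)

def rate_lower_bound(eps, tau=0.1, d=2):
    """n(eps) >= (1/tau)^d * log(1/eps)."""
    return (1.0 / tau) ** d * math.log(1.0 / eps)

print(risk_lower_bound(n=100))     # ~0.0169: risk still non-negligible
print(rate_lower_bound(eps=0.01))  # ~460.5 samples needed at minimum
```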
The upper bound uses a union of balls to estimate the homology of M.

0 Denoise the data.
1 Take a union of balls.
2 Compute the homology of the resulting Čech complex.

To prove: The density is bounded from below near M and from above far from M.
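A minimal sketch of this estimator, assuming the gudhi library is available. A Vietoris–Rips complex with edge length 2r is used as a computable proxy for the Čech complex of the union of balls, and the k-NN density filter in step 0 is my own illustrative denoising rule, not the talk's.

```python
import numpy as np
import gudhi  # assumed available: pip install gudhi

def estimate_homology(X, r, k=10, density_cutoff=None):
    """Estimate the homology of M from a noisy sample X.

    0. Denoise: drop points whose k-th nearest-neighbor distance is
       large (a crude density filter; low density suggests clutter).
    1. Take a union of balls of radius r around surviving points.
    2. Compute the homology of the resulting complex (a Rips complex
       at scale 2r stands in for the Cech complex).
    """
    # Step 0: k-NN distance as an inverse density estimate.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    knn = np.sort(dists, axis=1)[:, k]
    cutoff = density_cutoff if density_cutoff is not None else 2 * np.median(knn)
    Y = X[knn <= cutoff]

    # Steps 1-2: build the complex and read off Betti numbers.
    rips = gudhi.RipsComplex(points=Y, max_edge_length=2 * r)
    st = rips.create_simplex_tree(max_dimension=2)
    st.compute_persistence()
    return st.betti_numbers()

# Usage: for a dense sample of a circle, expect b0 = 1 and b1 = 1
# when r is between the sampling density and the reach.
```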
Many fundamental problems are still open.

1 Is the reach the right parameter?
2 What about manifolds with boundary?
3 Homotopy equivalence?
4 How to choose parameters?
5 Are there efficient algorithms?
Thank you.
