The Uniform Manifold Approximation Projection Algorithm
Dimensionality reduction from local metric learning via fuzzy simplicial sets
Umberto Lupo
April 26, 2019
Table of contents
1. The old mathematics
2. The fuzzy mathematics
3. Uniformity and local metric structure
4. Implementational details
In one slide!
By L. McInnes, J. Healy and J. Melville (arXiv:1802.03426). Python
library umap-learn: based on scikit-learn, optimized with numba.
An unsupervised algorithm for non-linear dimensionality reduction. A
noteworthy alternative to t-SNE.
1. Input: N × N distance matrix (e.g. from N pts in Euclidean $\mathbb{R}^m$).
2. Parameters: num. neighbours κ, embedding dimension d, etc.
3. Topological simplification steps:
a) ∀ i = 1, …, N, construct an “almost metric” space $M_i$ local to entry i by normalizing distances with respect to the κth nearest entry.
b) Distill the topological and geometric content of each $M_i$ into a fuzzy simplicial set $F_i$.
c) The fuzzy union $\bigcup_i F_i$ is a global topological representation.
4. Dimensionality reduction steps:
a) Initialize a cloud Z of N points in Euclidean $\mathbb{R}^d$.
b) Use fuzzy set cross-entropy to measure the distance between Z’s fuzzy simplicial representation and the input’s.
c) Move the points of Z around until this distance is minimized.
The old mathematics
Abstracting away abstract simplicial complexes
An abstract simplicial complex (ASC) is a family X of non-empty finite sets such that $\alpha \in X,\ \emptyset \neq \beta \subseteq \alpha \Rightarrow \beta \in X$.
If card(α) = n + 1 then α is an n-simplex of X. The set of all n-simplices of X is denoted by $X_n$. $V = X_0$ is the set of vertices.
Can construct a geometric realization |X| of X as a simplicial complex in the vector space $\mathbb{R}^J = \{\text{functions } J \to \mathbb{R}\}$, where J is any sufficiently large index set (J = V works).
No real need for a total ordering on V so far. With one, we could define face maps $d^n_i : X_n \to X_{n-1}$ for each n > 0 and 0 ≤ i ≤ n:
$$\alpha = \{v_0, \dots, v_n\},\ v_0 < \dots < v_n \implies d^n_i(\alpha) = \alpha \setminus \{v_i\}.$$
Idea for a generalization: Do not impose that n-simplices for n ≥ 1 be
sets of vertices. Let them simply be elements of an abstract set Xn.
Trade off this loss for a collection of face maps which should behave as if
they arose from a total ordering.
→ Promote to axioms the key structural properties of the collection of maps $d^n_i : X_n \to X_{n-1}$ which don’t require knowing what the simplices look like. … Not much! Only the simplicial identity
(SI) $d^{n-1}_i \circ d^n_j = d^{n-1}_{j-1} \circ d^n_i : X_n \to X_{n-2}$ ∀ 0 ≤ i < j ≤ n.
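A minimal sketch of where (SI) comes from, assuming a totally ordered vertex set and storing each simplex of an ASC as a sorted tuple, so that d_i just drops the i-th entry. The helper names `face`, `closure`, `is_closed` and `satisfies_si` are hypothetical; the code builds a small ASC, checks the closure axiom and verifies (SI) by brute force.

```python
from itertools import combinations


def face(simplex, i):
    """d_i: drop the i-th vertex of a simplex stored as a sorted tuple."""
    return simplex[:i] + simplex[i + 1:]


def closure(simplices):
    """Smallest ASC containing the given simplices: all their non-empty subsets."""
    asc = set()
    for s in simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            asc.update(combinations(s, k))  # subsets of a sorted tuple stay sorted
    return asc


def is_closed(asc):
    """ASC axiom: every face of a simplex of dimension >= 1 is again in the family."""
    return all(face(a, i) in asc for a in asc if len(a) > 1 for i in range(len(a)))


def satisfies_si(asc):
    """(SI): d_i(d_j(a)) == d_{j-1}(d_i(a)) for every n-simplex a, n >= 2, 0 <= i < j <= n."""
    for a in asc:
        n = len(a) - 1
        if n < 2:
            continue
        for j in range(n + 1):
            for i in range(j):
                if face(face(a, j), i) != face(face(a, i), j - 1):
                    return False
    return True


# A filled triangle {0, 1, 2} with an extra edge {2, 3}
asc = closure([(0, 1, 2), (2, 3)])
print(len(asc), is_closed(asc), satisfies_si(asc))  # 9 True True
```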
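A sequence of sets $(X_n)_{n \in \mathbb{N}_0}$ and maps $\{d^n_i : X_n \to X_{n-1}\}$ satisfying (SI) → data for a Delta set (sometimes: “abstract Delta complex”). More general than ASCs because e.g.:
1. i ≠ j does not imply $d_i(\alpha) \neq d_j(\alpha)$ (e.g. a circle built from one vertex v and one edge e with $d_0 e = d_1 e = v$);
2. $d_i(\alpha) = d_i(\beta)$ ∀ i does not imply α = β.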
Geometric realization: For each simplex α let $|\Delta^\alpha| = |\Delta^{\dim \alpha}|$, where $|\Delta^n| \subseteq \mathbb{R}^{n+1}$ is the standard geometric n-simplex. Identify the faces appropriately to construct the topological space Real(X) as a quotient of the disjoint union $\bigsqcup_\alpha |\Delta^\alpha|$. Hint: $(d^n_i \alpha, x) \sim (\alpha, D^i_n x)$, where $D^i_n : |\Delta^{n-1}| \to |\Delta^n|$ is the inclusion of the i-th face (a coface map).
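The coface map is easy to write down in barycentric coordinates, assuming the usual convention that a point of $|\Delta^n|$ is a tuple $(t_0, \dots, t_n)$ with $t_k \ge 0$ and $\sum_k t_k = 1$. In this sketch (hypothetical helper names `vertex`, `coface`, `in_standard_simplex`), $D^i_n$ inserts a 0 at position i, landing on the face opposite vertex $e_i$; exact fractions avoid rounding in the membership check.

```python
from fractions import Fraction


def vertex(k, n):
    """Barycentric coordinates of the k-th vertex e_k of the standard n-simplex."""
    return tuple(1 if m == k else 0 for m in range(n + 1))


def coface(i, x):
    """D^i_n: |Delta^{n-1}| -> |Delta^n|. Insert a 0 at position i, so the image
    lies on the face where t_i = 0, i.e. the face opposite vertex e_i."""
    return tuple(x[:i]) + (0,) + tuple(x[i:])


def in_standard_simplex(t):
    """t lies in |Delta^n| iff all coordinates are non-negative and they sum to 1."""
    return all(c >= 0 for c in t) and sum(t) == 1


# A point of |Delta^1| sent to the face of |Delta^2| opposite vertex e_1
x = (Fraction(1, 4), Fraction(3, 4))
y = coface(1, x)
print(y, in_standard_simplex(y))
```

On vertices, `coface(i, vertex(k, n - 1))` equals `vertex(k, n)` for k < i and `vertex(k + 1, n)` for k ≥ i, so the geometric inclusion agrees with the combinatorial map that skips i.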
Reorganize: Prototype ordered combinatorial n-simplex: $[n] = \{0, \dots, n\}$.
Since $\{[n]\}_{n \in \mathbb{N}_0} \cong \mathbb{N}_0$, we can think of $(X_n)_{n \in \mathbb{N}_0}$ as $X : [n] \mapsto X([n]) = X_n$.
We know how to extract the i-th faces of all n-simplices at once: $d^n_i : X([n]) \to X([n-1])$, and $d^n_i$ “corresponds to” $[n] \setminus \{i\}$. But
$$\{[n] \setminus \{i\} : 0 \le i \le n\} \cong \{f : [n-1] \to [n] \text{ strictly order-preserving}\}.$$
$d^n_i$ implements in $X_n$ the prototype map $D^i_n : [n-1] \to [n]$ given by $0 \mapsto 0, \dots, i-1 \mapsto i-1,\ i \mapsto i+1, \dots, n-1 \mapsto n$ … Familiar?
⇒ Our Delta set X is an implementation of $\{[n]\}_{n \in \mathbb{N}_0}$ and of the collection of coface maps. Boring until we notice:
$$D^j_n \circ D^i_{n-1} = D^i_n \circ D^{j-1}_{n-1} \quad \forall\, 0 \le i < j \le n$$
… Again familiar?
For $[l] \xrightarrow{f} [m] \xrightarrow{g} [n]$ let $f \circ^{op} g := g \circ f$. Starting from $X(D^i_n) := d^n_i$ we can define $X(D^i_{n-1} \circ^{op} D^j_n) := X(D^i_{n-1}) \circ X(D^j_n)$ consistently thanks to (SI)! And extend to arbitrary compositions s.t. $X(f \circ^{op} g) = X(f) \circ X(g)$.
Abstract nonsense: A Delta set is a functor $X : \Delta_{\mathrm{inj}}^{op} \to \mathrm{Sets}$, where $\Delta_{\mathrm{inj}}$ is the category with objects the [n]s and arrows the strictly o.-p. maps.
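The combinatorial side can be checked mechanically. In this sketch (hypothetical helpers `coface_map`, `compose`, `strictly_increasing_maps`, `check_cosimplicial`), a map $[m] \to [n]$ is stored as the tuple of images of $0, \dots, m$. It confirms that the coface maps are exactly the strictly order-preserving maps $[n-1] \to [n]$ (one per image set $[n] \setminus \{i\}$) and that the identity $D^j_n \circ D^i_{n-1} = D^i_n \circ D^{j-1}_{n-1}$ holds for small n.

```python
from itertools import combinations


def coface_map(i, n):
    """D^i_n : [n-1] -> [n], skipping i; stored as the tuple of images of 0, ..., n-1."""
    return tuple(k if k < i else k + 1 for k in range(n))


def compose(g, f):
    """g ∘ f for maps stored as image tuples (apply f first, then g)."""
    return tuple(g[f[k]] for k in range(len(f)))


def strictly_increasing_maps(n):
    """All strictly order-preserving maps [n-1] -> [n], one per n-element image set."""
    return set(combinations(range(n + 1), n))


def check_cosimplicial(n_max):
    """Check D^j_n ∘ D^i_{n-1} == D^i_n ∘ D^{j-1}_{n-1} for 2 <= n <= n_max, 0 <= i < j <= n."""
    for n in range(2, n_max + 1):
        for j in range(n + 1):
            for i in range(j):
                lhs = compose(coface_map(j, n), coface_map(i, n - 1))
                rhs = compose(coface_map(i, n), coface_map(j - 1, n - 1))
                if lhs != rhs:
                    return False
    return True


print(coface_map(1, 3), check_cosimplicial(6))
```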
Further generalize (yes, really): Easy with categories and functors! Enlarge the collection of arrows to include all (not necessarily strictly) o.-p. maps. Call the new category Δ. A simplicial set is a functor $X : \Delta^{op} \to \mathrm{Sets}$. The collection of simplicial sets has the structure of a category S.
But why? We would like to include “degenerate” simplices. Degeneracy maps $s^n_i : X([n]) \to X([n+1])$ expose any hidden degenerate simplices “by repeating the i-th vertex”. Example: $(v_0, v_1, v_1) = s^1_1((v_0, v_1))$, a degenerate 2-simplex “living inside” $(v_0, v_1)$. $s^n_i$ corresponds to and implements the unique o.-p. surjection $S^i_n : [n+1] \to [n]$ hitting i twice: a codegeneracy map, and the prototype of a “collapse” of an ordered simplex. Additional easy-to-check-but-tedious-to-write identities are satisfied when codegeneracy maps are added to the coface maps. Functoriality yields corresponding identities satisfied by the face and degeneracy maps.
Geometric realization: As for Delta sets, but add the equivalences $(s^n_i \alpha, x) \sim (\alpha, S^i_n x)$. Real: S → Top is a functor.
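A small sketch of faces and degeneracies, assuming (as in the example $(v_0, v_1, v_1)$ above) that simplices are modelled as ordered tuples of vertices; the helpers `d`, `s`, `codegeneracy_map` and `pull_back` are hypothetical. It shows that $s_i$ is precomposition with the codegeneracy $S^i_n$, and checks two of the “tedious” identities, $d_i s_i = d_{i+1} s_i = \mathrm{id}$.

```python
def d(i, simplex):
    """Face map on an ordered simplex (tuple of vertices): delete the i-th entry."""
    return simplex[:i] + simplex[i + 1:]


def s(i, simplex):
    """Degeneracy map: repeat the i-th vertex."""
    return simplex[:i + 1] + simplex[i:]


def codegeneracy_map(i, n):
    """S^i_n : [n+1] -> [n], the order-preserving surjection hitting i twice,
    stored as the tuple of images of 0, ..., n+1."""
    return tuple(k if k <= i else k - 1 for k in range(n + 2))


def pull_back(simplex, m):
    """Precompose an ordered simplex (viewed as a map [n] -> vertices) with m: [l] -> [n]."""
    return tuple(simplex[m[k]] for k in range(len(m)))


print(s(1, ("v0", "v1")))  # ('v0', 'v1', 'v1'), a degenerate 2-simplex
```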
Motivation for us: Variations on the theme of singular homology of a
topological space Y : Sing(Y ) is the simplicial set defined by
$$\mathrm{Sing}(Y) : [n] \mapsto \{\sigma : |\Delta^n| \to Y \text{ continuous}\},$$
with $d_i \sigma$ the restriction of σ to the i-th face and $s_i \sigma$ the composition of σ with a collapse. Sing: Top → S is in fact a functor.
This is just another definition, I want my time back. OK, but first note
down this theorem: for any Y ∈ Top and X ∈ S,
(Adj) {Top-arrows Real(X) → Y} ≅ {S-arrows X → Sing(Y)}.
Interpretation
Sing and Real are not inverses, but Real(Sing(Y)) has topologically a lot in common with Y: the natural map Real(Sing(Y)) → Y is a weak homotopy equivalence.
UMAP employs a cousin of this result where Top is replaced by a
category of finite “almost metric” spaces because these are directly and
naturally defined by the data. What, then, must replace S, Real and Sing
to yield something analogous to (Adj)?
The fuzzy mathematics
Fuzzy sets
In sets, the membership relation ∈ is binary: either x ∈ A or x ∉ A. A fuzzy set is a pair (A, µ) where A is a carrier set and µ: A → [0, 1] is a membership function, i.e. µ(x) is the membership strength of x to A.
Interpreting µ as a “field of Bernoulli probabilities” suggests fuzzy analogues to the standard Boolean operators ∪ and ∩, built from a t-norm ⊤ and a negation ¬:
$$(A, \mu) \cap (B, \nu) = (A \cap B, \top(\mu, \nu)), \quad \text{with e.g. } \top(\mu, \nu) := \mu\nu,$$
$$(A, \mu) \cup (B, \nu) = (A \cup B, \neg\top(\neg\mu, \neg\nu)), \quad \text{with e.g. } \neg(x) := 1 - x$$
$$\implies \neg\top(\neg\mu, \neg\nu) = \mu + \nu - \mu\nu.$$
If A = B = U, the fuzzy set cross entropy between (U, µ) and (U, ν) is
$$C((U, \mu), (U, \nu)) = \sum_{u \in U} \mathrm{KL}\big(\mathrm{Bern}(\mu(u)) \,\|\, \mathrm{Bern}(\nu(u))\big) = \sum_{u \in U} \left[ \mu(u) \log\frac{\mu(u)}{\nu(u)} + (1 - \mu(u)) \log\frac{1 - \mu(u)}{1 - \nu(u)} \right].$$
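The operations above can be sketched with fuzzy sets stored as Python dicts from elements to membership strengths. Assumptions of this sketch: the product t-norm and ¬(x) = 1 − x from the slide, elements missing from a carrier count as membership 0 in the union, and the convention 0 · log 0 = 0 with ν clipped away from 0 and 1 in the cross entropy. All function names are hypothetical.

```python
import math


def fuzzy_intersection(mu, nu):
    """Product t-norm on the common carrier: mu(x) * nu(x)."""
    return {x: mu[x] * nu[x] for x in mu.keys() & nu.keys()}


def fuzzy_union(mu, nu):
    """mu + nu - mu*nu on the union of the carriers (missing elements count as 0)."""
    return {x: mu.get(x, 0.0) + nu.get(x, 0.0) - mu.get(x, 0.0) * nu.get(x, 0.0)
            for x in mu.keys() | nu.keys()}


def bernoulli_kl(p, q, eps=1e-12):
    """KL(Bern(p) || Bern(q)), with 0 log 0 = 0 and q clipped to [eps, 1 - eps]."""
    q = min(max(q, eps), 1 - eps)
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total


def fuzzy_cross_entropy(mu, nu):
    """C((U, mu), (U, nu)) = sum over u in U of KL(Bern(mu(u)) || Bern(nu(u)))."""
    if mu.keys() != nu.keys():
        raise ValueError("both fuzzy sets must share the carrier U")
    return sum(bernoulli_kl(mu[u], nu[u]) for u in mu)


A = {"x": 0.5, "y": 1.0}
B = {"y": 0.5, "z": 0.2}
print(fuzzy_intersection(A, B), fuzzy_union(A, B))
```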
Fuzzy simplicial sets
A simplicial set was a functor $\Delta^{op} \to \mathrm{Sets}$. A fuzzy simplicial set is a functor $X : \Delta^{op} \to \mathrm{Fuzz}$, where Fuzz is the category of fuzzy sets. sFuzz is the category of fuzzy simplicial sets.
“Concretely”: Let I be $(0, 1] \subset \mathbb{R}$;¹ then we can view X ∈ sFuzz as a functor $X : (\Delta \times I)^{op} \to \mathrm{Sets}$. For each n there is a fuzzy set $(X_n, \mu_n)$; define $X([n], a) := \mu_n^{-1}([a, 1])$.
Geometric realization…? For simplicial sets, $\mathrm{Real}(X) = \bigsqcup_\alpha |\Delta^\alpha| / \sim$, where each $|\Delta^\alpha| = |\Delta^{\dim \alpha}|$. This relies on the fact that for each object of $\Delta^{op}$, i.e. for each n, we have a model space $|\Delta^n| \in$ Top. Here objects in the source category $(\Delta \times I)^{op}$ contain the extra piece of information a ∈ (0, 1]. If we had analogous model spaces $|\Delta^n_a|$ and chose a category $C \ni |\Delta^n_a|$ to replace Top, we could define a fuzzy set realization functor fReal: sFuzz → C “analogously” to Real.
¹ As a category…
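The “concrete” view can be sketched for a single n: a fuzzy set of simplices (a dict, with hypothetical membership values) determines the sets $X([n], a)$ as upper level sets, and a ≤ b gives an inclusion $X([n], b) \subseteq X([n], a)$, which is what functoriality in $I^{op}$ amounts to here. The function name `level_set` is hypothetical.

```python
def level_set(mu, a):
    """X([n], a) = mu_n^{-1}([a, 1]): the simplices with membership >= a."""
    if not 0 < a <= 1:
        raise ValueError("a must lie in (0, 1]")
    return {x for x, m in mu.items() if m >= a}


# Hypothetical membership strengths of three 1-simplices (edges)
mu_1 = {("p", "q"): 0.9, ("q", "r"): 0.4, ("p", "r"): 0.1}

print(level_set(mu_1, 0.5))  # {('p', 'q')}
```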
The correct adjunction
Recall (Adj) relating Sing: Top → S and Real: S → Top. $|\Delta^n|$ appears in the definition of Real but also of Sing:
$$\mathrm{Sing}(Y)([n]) = \{\sigma : |\Delta^n| \to Y \text{ cts}\} = \{\text{Top-arrows } |\Delta^n| \to Y\}.$$
With a choice of “geometric” category C and of model space $|\Delta^n_a| \in C$, we can define by analogy
$$\mathrm{fSing}(Y)([n], a) = \{C\text{-arrows } |\Delta^n_a| \to Y\}, \text{ so that } \mathrm{fSing}: C \to \mathrm{sFuzz}.$$
The obvious question
What are “correct” choices of C and $|\Delta^n_a|$?
Our answer
Ones yielding a relation between fSing and fReal analogous to (Adj): e.g. C = EψMet,
$$|\Delta^n_a| = \Big\{(t_0, \dots, t_n) \in \mathbb{R}^{n+1} \;\Big|\; \sum_{i=0}^n t_i = -\log(a),\ t_i \ge 0\Big\}$$
(Spivak 2012). EψMet is the category of extended (dist = ∞ allowed) pseudo- (dist(x, y) = 0 does not imply x = y) metric spaces.
Finite version
Starting from a real-life point cloud, we can at best hope to encode the metric structure in a finite almost-metric space. We need finite analogs Fin-EψMet, Fin-sFuzz and $|\Delta^n_a|_{\mathrm{Fin}}$ ∈ Fin-EψMet, functors Fin-fSing: Fin-EψMet → Fin-sFuzz and Fin-fReal: Fin-sFuzz → Fin-EψMet, and a finite fuzzy analog (Fin-fAdj) of (Adj). Their (straightforward) definitions and a proof of (Fin-fAdj) are the main mathematical contributions of the UMAP paper.
Where are we?
If our data problem naturally yields an object M ∈ Fin-EψMet, we can theoretically distill much of the topological information by computing Fin-fSing(M)([n], a) ∀ n ≥ 1, a ∈ (0, 1]. If we instead have a collection $\{M_i\}_{i=1}^N$, we can first apply Fin-fSing to each $M_i$ individually and then take fuzzy unions! This gives us a global, fuzzy simplicial representation.
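The “take fuzzy unions” step can be sketched by folding the probabilistic sum µ + ν − µν = 1 − (1 − µ)(1 − ν) over a list of local fuzzy sets of edges. Assumptions: each local fuzzy set is a dict keyed by undirected edges (frozensets), elements missing from a local set count as membership 0, and the three local sets below are invented toy data. Helper names are hypothetical.

```python
from functools import reduce


def prob_sum(mu, nu):
    """Fuzzy union with the product t-norm: 1 - (1 - mu)(1 - nu); missing elements count as 0."""
    return {x: 1.0 - (1.0 - mu.get(x, 0.0)) * (1.0 - nu.get(x, 0.0))
            for x in mu.keys() | nu.keys()}


def global_representation(local_sets):
    """Fold the fuzzy union over the local fuzzy sets F_1, ..., F_N."""
    return reduce(prob_sum, local_sets, {})


def edge(i, j):
    """Undirected 1-simplex {i, j}."""
    return frozenset((i, j))


# Hypothetical local fuzzy sets of edges, one per data point
F = [
    {edge(0, 1): 1.0, edge(0, 2): 0.3},
    {edge(1, 0): 1.0, edge(1, 2): 0.6},
    {edge(2, 1): 1.0, edge(2, 0): 0.4},
]
G = global_representation(F)
print(sorted((tuple(sorted(e)), round(w, 3)) for e, w in G.items()))
```

Since only $F_i$ and $F_j$ give edge {i, j} a non-zero strength, each global weight combines exactly two local ones.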
Computer-friendly version
We descend back to planet Earth.
Truncate: Stop the computation of Fin-fSing(M) at some small finite n!
Maximally cheap: n = 1.
Understand the output data structure: Requires a look at the definitions.
$$|\Delta^n_a|_{\mathrm{Fin}} := (\{0, \dots, n\}, d_a), \qquad d_a(i, j) = -(1 - \delta_{ij}) \log a,$$
Fin-fSing(M)([n], a) := {Fin-EψMet-arrows $|\Delta^n_a|_{\mathrm{Fin}} \to M$} = {distance non-increasing maps $|\Delta^n_a|_{\mathrm{Fin}} \to M$}.
So $|\Delta^1_a|_{\mathrm{Fin}} \cong (\{0, -\log a\}, d_{\mathrm{Eucl}})$ and, if M = (M, d):
$$\text{Fin-fSing}(M)([1], a) = \{(p, q) \in M \times M \mid d(p, q) \le -\log a\}.$$
So the fuzzy set of 1-simplices is (M × M, µ), where $\mu(p, q) = \sup\{a \in (0, 1] : d(p, q) \le -\log a\} = e^{-d(p, q)}$.
Just a weighted graph!
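A quick numerical check of the last step on a hypothetical three-point metric space on the real line: for every level a, the pairs with $d(p, q) \le -\log a$ are exactly the pairs with $e^{-d(p, q)} \ge a$. The helper names are hypothetical; note that ordered pairs and the diagonal pairs (p, p) are included, as in M × M.

```python
import math


def one_simplices(points, dist, a):
    """Fin-fSing(M)([1], a): ordered pairs (p, q) with d(p, q) <= -log a."""
    return {(p, q) for p in points for q in points if dist(p, q) <= -math.log(a)}


def edge_membership(points, dist):
    """Fuzzy set of 1-simplices: mu(p, q) = exp(-d(p, q))."""
    return {(p, q): math.exp(-dist(p, q)) for p in points for q in points}


def dist(p, q):
    """Euclidean distance on the real line."""
    return abs(p - q)


# Hypothetical finite metric space: three points on a line
points = [0.0, 0.5, 2.0]
mu = edge_membership(points, dist)
for a in [0.1, 0.3, 0.6, 0.9]:
    same = one_simplices(points, dist, a) == {e for e, m in mu.items() if m >= a}
    print(a, same)
```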
Fuzzy set cross-entropy
Let E be the abstract set of all possible 1-simplices and suppose we have two fuzzy sets $(E, \mu_h)$ and $(E, \mu_l)$; in our view these should correspond to the high- and low-dimensional representations respectively. Then the fuzzy set cross entropy will be
$$\sum_{e \in E} \left[ \mu_h(e) \log\frac{\mu_h(e)}{\mu_l(e)} + (1 - \mu_h(e)) \log\frac{1 - \mu_h(e)}{1 - \mu_l(e)} \right].$$
For fixed $\mu_h$, minimizing this as a function of $\mu_l$ can be viewed as a force-directed graph layout algorithm:
• The first term is minimized when $\mu_l(e)$ is as large as possible, i.e. when the distance between the points is as small as possible ⇒ an “attractive force”, which is larger when $\mu_h(e)$ is large.
• The second term is minimized by making $\mu_l(e)$ as small as possible ⇒ a “repulsive force” between the ends of e whenever $\mu_h(e)$ is small.
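To make the force picture concrete, here is a per-edge sketch assuming $\mu_l(e) = e^{-\text{dist}}$ (the exact membership derived for Fin-fSing, not the smooth curve used in practice) and dist > 0. Differentiating with respect to the distance, the first term has slope $\mu_h \ge 0$ (attraction) and the second has slope $-(1 - \mu_h)\, e^{-\text{dist}} / (1 - e^{-\text{dist}}) \le 0$ (repulsion); they balance where $\mu_l = \mu_h$. Function names are hypothetical.

```python
import math


def edge_loss(mu_h, dist):
    """Cross-entropy contribution of one edge, with mu_l = exp(-dist); requires dist > 0."""
    mu_l = math.exp(-dist)
    attract = mu_h * math.log(mu_h / mu_l) if mu_h > 0 else 0.0
    repel = (1 - mu_h) * math.log((1 - mu_h) / (1 - mu_l)) if mu_h < 1 else 0.0
    return attract + repel


def edge_loss_grad(mu_h, dist):
    """Derivative of edge_loss w.r.t. dist, split into its two terms.
    First term mu_h * (log mu_h + dist) has slope mu_h >= 0: shrinking the distance
    lowers it (attraction, stronger for large mu_h).
    Second term has slope -(1 - mu_h) exp(-dist) / (1 - exp(-dist)) <= 0: growing
    the distance lowers it (repulsion, weighted by 1 - mu_h)."""
    attract = mu_h
    repel = -(1 - mu_h) * math.exp(-dist) / (1 - math.exp(-dist))
    return attract, repel


# The two forces cancel where exp(-dist) = mu_h, i.e. at dist = -log(mu_h)
print(edge_loss_grad(0.3, -math.log(0.3)))
```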
Uniformity and local metric structure
Why uniformity? (Very vaguely)
Some motivation: the Čech complex construction from a finite sample of points is best at topologically reconstructing the underlying manifold when the points are sampled uniformly.
Theorem (Niyogi et al. 2008). Let M be a smooth, compact submanifold of $\mathbb{R}^n$ with injectivity radius τ. Let D be a collection of points on M such that the minimal distance between any point of M and D is less than ε/2, for some $\varepsilon < \tau\sqrt{3/5}$; say that D is ε/2-dense in M. Then the union of the ε-balls centred at the points of D deformation retracts to M, so the Čech complex $\check{C}_\varepsilon(D)$ is homotopy equivalent to M (⇒ same homology).
Other results show that the more points we sample uniformly from M, the higher the probability that the resulting D will be ε/2-dense.
Learning local metric spaces from data
Basic idea: If enough data is sampled uniformly from a Riemannian
manifold, we should be able to estimate the local metric from the local
density of sample points.
Can estimate the local metric structure relative to which the data would be uniformly sampled by enforcing that balls of radius δ centred at different locations in the point cloud should contain the same number K of sample points.
In practice, locally rescale the distances between each reference point and the rest of the cloud so that this is the case.
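A deliberately crude illustration of the idea (not UMAP’s actual normalization, which follows below): divide the distances from each $x_i$ by its distance to its κ-th nearest neighbour, so that the unit ball around every point contains exactly κ other points, ties aside. The toy points and helper names are hypothetical, and duplicate points (zero radius) are not handled.

```python
def knn_radius(D, i, kappa):
    """Distance from point i to its kappa-th nearest neighbour (excluding i itself)."""
    others = sorted(D[i][j] for j in range(len(D)) if j != i)
    return others[kappa - 1]


def local_distances(D, i, kappa):
    """Distances from x_i rescaled by the kappa-NN radius r_i: D[i][j] / r_i."""
    r = knn_radius(D, i, kappa)
    return [D[i][j] / r for j in range(len(D))]


# Hypothetical 1-D point cloud with a dense and a sparse cluster
pts = [0.0, 1.0, 2.0, 10.0, 12.0, 14.0]
D = [[abs(p - q) for q in pts] for p in pts]
for i in range(len(pts)):
    local = local_distances(D, i, kappa=2)
    print(i, sum(1 for j, dj in enumerate(local) if j != i and dj <= 1.0))  # 2 for every i
```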
Implementational details
Local (extended pseudo-)metric spaces
Start from an N × N distance matrix D, fix κ ≥ 1. Naïve idea: define, for i = 1, …, N, $M_i = (M = \{x_j\}_{j=1}^N, d_i)$ where, ∀ j ≠ i,
$$d_i(x_i, x_j) = \frac{D_{ij} - \rho_i}{\sigma_i}, \qquad \rho_i,\ \sigma_i := \text{dist. between } x_i \text{ and its 1st, } \kappa\text{th NN},$$
and all other independent distances are infinite. $d_i(x_i, \text{1st NN}) = 0$ ⇒ the corresponding edge has membership strength 1 ⇒ local connectivity.
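A sketch of the naïve local metric and the resulting membership strengths $e^{-d_i(x_i, x_j)}$ for the edges out of one point, on a hypothetical toy point cloud. Assumptions: D is a list of lists, there are no duplicate points (so σ_i > 0), and `naive_local_memberships` is a hypothetical name. The nearest neighbour always gets strength exactly 1, which is the local connectivity guarantee.

```python
import math


def naive_local_memberships(D, i, kappa):
    """Membership strengths exp(-d_i(x_i, x_j)) for all j != i, where
    d_i(x_i, x_j) = (D_ij - rho_i) / sigma_i and rho_i, sigma_i are the
    distances from x_i to its 1st and kappa-th nearest neighbours."""
    others = sorted((D[i][j], j) for j in range(len(D)) if j != i)
    rho = others[0][0]            # distance to the 1st nearest neighbour
    sigma = others[kappa - 1][0]  # distance to the kappa-th nearest neighbour
    return {j: math.exp(-(dij - rho) / sigma) for dij, j in others}


# Hypothetical 1-D point cloud
pts = [0.0, 1.0, 3.0, 7.0]
D = [[abs(p - q) for q in pts] for p in pts]
print(naive_local_memberships(D, 0, kappa=2))
```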
Current implementational shortcuts
Using the nearest neighbour descent algorithm (Dong et al. 2011) to efficiently yield an approximate κ-nearest neighbour graph data structure.
The actual normalizing factor is a “smoothed” version of $\sigma_i$: $\hat\sigma_i$ s.t.
$$\sum_{x_{j_k} \in \kappa\text{-NN}_i} \exp\left(-\frac{D_{i j_k} - \rho_i}{\hat\sigma_i}\right) = \log_2 \kappa.$$
RHS chosen experimentally! The final Eψ-metric has points outside κ-NN$_i$ infinitely far away from $x_i$. Reduction in complexity from O(N²) to O(Nκ)!
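One way to find $\hat\sigma_i$, sketched under stated assumptions: the slide does not say how the equation is solved, so bisection is our choice here, and `knn_dists` is assumed to hold the distances from $x_i$ to its κ nearest neighbours, excluding $x_i$ itself. The left-hand side increases with $\hat\sigma_i$, from 1 (only the nearest neighbour counts) towards κ, so for κ ≥ 3 the target $\log_2 \kappa$ is bracketed and bisection converges. The function name is hypothetical.

```python
import math


def smooth_sigma(knn_dists, target=None, tol=1e-9, max_iter=200):
    """Return (rho, sigma_hat) with sum_k exp(-(d_k - rho) / sigma_hat) = target,
    found by bisection. Default target is log2(kappa), kappa = len(knn_dists)."""
    kappa = len(knn_dists)
    if target is None:
        target = math.log2(kappa)
    rho = min(knn_dists)

    def total(sigma):
        return sum(math.exp(-(d - rho) / sigma) for d in knn_dists)

    lo, hi = 0.0, 1.0
    while total(hi) < target:  # total grows with sigma: enlarge hi until it brackets the root
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if total(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return rho, 0.5 * (lo + hi)


# Hypothetical distances from one point to its kappa = 5 nearest neighbours
rho, sigma_hat = smooth_sigma([1.0, 1.5, 2.0, 3.0, 4.0])
print(rho, sigma_hat)
```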
Embedding initialization
Fuzzy union of all local fuzzy sets of edges gives an undirected weighted
graph with weighted adjacency matrix B. With D the degree matrix,
L := D−1/2
(D − B)D−1/2
= I − D−1/2
BD−1/2
is the symmetric normalized Laplacian. If the data were generated by
sampling from a Riemannian manifold, L should be closely related to the
Laplace–Beltrami operator. Exploit this to initialize the low dimensional
representation into a good state by spectral embedding techniques.
In practice
Components of the eigenvectors associated with the d smallest non-zero eigenvalues of L (listed in ascending eigenvalue order) are used to initialize the embedding to a point cloud $Z = \{Z_1, \dots, Z_N\} \subset \mathbb{R}^d$.
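A standard-library sketch of the matrix L above for a small hypothetical weighted graph; an actual spectral initialization would hand L to an eigensolver, which we do not reproduce. It checks a standard fact that explains why the bottom of the spectrum is skipped: $D^{1/2}\mathbf{1}$ (entries $\sqrt{\deg_i}$) lies in the kernel of L. Assumptions: B is symmetric with no isolated vertices; `normalized_laplacian` and `mat_vec` are hypothetical helpers.

```python
import math


def normalized_laplacian(B):
    """L = I - D^{-1/2} B D^{-1/2} for a symmetric weighted adjacency matrix B
    (list of lists) in which every vertex has positive degree."""
    n = len(B)
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in B]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * B[i][j] * inv_sqrt[j]
             for j in range(n)] for i in range(n)]


def mat_vec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Hypothetical fuzzy-union graph on three points
B = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.2],
     [0.5, 0.2, 0.0]]
L = normalized_laplacian(B)
v0 = [math.sqrt(sum(row)) for row in B]  # D^{1/2} 1
print(mat_vec(L, v0))  # numerically zero: eigenvalue 0
```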
Embedding optimization (briefly)
Recall the optimization objective: if $(E, \mu_h) = \bigcup_{i=1}^N \text{Fin-fSing}(M_i)([1])$ and $Z := (Z, d_{\mathrm{Eucl}})$, then the loss function is
$$L(Z) = C\big((E, \mu_h), (E, \mu(Z))\big), \quad \text{where } (E, \mu(Z)) := \text{Fin-fSing}(Z)([1]).$$
Several shortcuts:
• Use stochastic gradient descent.
• (S)GD would benefit from the final objective function being differentiable. But Fin-fSing, as a function of N points in $\mathbb{R}^d$, is not! Use a smooth approximation of the actual membership strength function for the low-dimensional representation, selected from a suitably versatile family. In practice UMAP uses the family of curves $\frac{1}{1 + a x^{2b}}$.
• We don’t want to deal with all possible edges, so use the negative sampling trick (as in word2vec and LargeVis) to sample negative examples as needed.
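A toy sketch of the three shortcuts together, under loudly stated assumptions: a and b are set by hand (a = b = 1 gives the Cauchy kernel $1/(1 + x^2)$) rather than chosen the way UMAP does; each edge is used with probability equal to its weight; negatives are drawn uniformly; only the first endpoint is moved; and a small eps keeps the repulsive step finite. This is an illustration of SGD on the smoothed cross entropy, not umap-learn’s optimizer. The gradients are those of $-\log\varphi$ (attraction) and $-\log(1 - \varphi)$ (repulsion) with $\varphi = 1/(1 + a x^{2b})$; all names are hypothetical.

```python
import random


def phi(dist_sq, a, b):
    """Smooth low-dimensional membership 1 / (1 + a * x^(2b)), taking x^2 as input."""
    return 1.0 / (1.0 + a * dist_sq ** b)


def sq_dist(y, z):
    return sum((p - q) ** 2 for p, q in zip(y, z))


def attract(y_i, y_j, a, b, lr):
    """One gradient step on -log phi(|y_i - y_j|) w.r.t. y_i: pulls y_i towards y_j."""
    d2 = sq_dist(y_i, y_j)
    if d2 == 0.0:
        return list(y_i)  # already on top of y_j
    coeff = 2.0 * a * b * d2 ** (b - 1) / (1.0 + a * d2 ** b)
    return [p - lr * coeff * (p - q) for p, q in zip(y_i, y_j)]


def repel(y_i, y_k, a, b, lr, eps=1e-3):
    """One gradient step on -log(1 - phi(|y_i - y_k|)) w.r.t. y_i: pushes y_i away
    from y_k; eps keeps the step finite for nearby points."""
    d2 = sq_dist(y_i, y_k)
    coeff = 2.0 * b / ((eps + d2) * (1.0 + a * d2 ** b))
    return [p + lr * coeff * (p - q) for p, q in zip(y_i, y_k)]


def sgd_epoch(Y, edges, a, b, lr, n_neg, rng):
    """One pass over weighted edges (i, j, w): each edge is used with probability w,
    and each use is followed by n_neg repulsive steps against uniformly sampled points."""
    for i, j, w in edges:
        if rng.random() > w:
            continue
        Y[i] = attract(Y[i], Y[j], a, b, lr)
        for _ in range(n_neg):
            k = rng.randrange(len(Y))
            if k != i:
                Y[i] = repel(Y[i], Y[k], a, b, lr)
    return Y


# Hypothetical toy run on three 2-D points
rng = random.Random(0)
Y = [[1.0, 0.0], [0.0, 0.0], [3.0, 3.0]]
edges = [(0, 1, 1.0), (1, 0, 1.0), (2, 0, 0.05)]
for _ in range(10):
    sgd_epoch(Y, edges, a=1.0, b=1.0, lr=0.1, n_neg=2, rng=rng)
print(Y, phi(sq_dist(Y[0], Y[1]), 1.0, 1.0))
```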
Thank you for your attention!