Topological Data Analysis of
Complex Spatial Systems
Mason A. Porter (@masonporter)
Department of Mathematics, UCLA
FIG. 8. Visualization of the three different spatial partitions of Peru’s provinces on a map. (a) Broad climate partition
into coast (yellow), mountains (brown), and jungle (green); (b) detailed climate partition, in which we start with the
broad partition and then further divide the coast and mountains into northern coast, central coast, southern coast, northern
mountains, central mountains, and southern mountains; and (c) the administrative partition of Peru. We obtained province
boundaries from [82] and plot the maps in MATLAB.
We use the term “spatial partitions” to describe partitions that have high z-Rand scores in
comparison to the manual climate or administrative partitions. For multilayer networks, we
also compare the algorithmic partitions to partitions that contain a planted temporal change in
community structure. For these comparisons, we group the multilayer nodes into ones that occur
before or after a “critical” time tc (i.e., partitions into a “pre-tc” community and a “post-tc”
community). We test the set $t = 1, 1+\Delta, 1+2\Delta, \ldots, 1+\Delta \times (\lfloor T/\Delta \rfloor - 1)$ of times that we
use to create the multilayer network, and we report the time with the highest z-Rand score as the
critical time tc. We also test for pairs of critical times (yielding a partition into three communities)
by examining all possible pairs of critical times tc1 and tc2 in the same manner. We use the term
“temporal partitions” to describe algorithmic partitions of the disease-correlation networks that
yield high z-Rand scores in these comparisons.
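The critical-time search described above can be sketched as follows. This is a minimal sketch under stated assumptions. The multilayer nodes are ordered as (node, layer) pairs, one layer per window start time. A plain Rand index (the fraction of node pairs on which two partitions agree) stands in for the z-Rand score, which instead standardizes the count of node pairs that are grouped together in both partitions. All function names are hypothetical and do not come from the paper.

```python
from itertools import combinations
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

Score = Callable[[Sequence[Hashable], Sequence[Hashable]], float]


def candidate_times(T: int, delta: int) -> List[int]:
    """Window start times 1, 1+delta, ..., 1 + delta*(floor(T/delta) - 1)."""
    return [1 + delta * k for k in range(T // delta)]


def rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of node pairs on which two partitions agree.

    Hypothetical stand-in for the z-Rand score used in the text.
    """
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def best_critical_time(
    layer_starts: Sequence[int],
    n_nodes: int,
    found: Sequence[Hashable],
    score: Score = rand_index,
) -> Tuple[int, float]:
    """Compare `found` with every planted pre-tc / post-tc partition.

    `found` labels the multilayer nodes in the order
    (node 0, layer 0), (node 0, layer 1), ..., (node 1, layer 0), ...
    """
    layers = [layer for _ in range(n_nodes) for layer in range(len(layer_starts))]
    best: Optional[Tuple[int, float]] = None
    for tc in layer_starts:
        # Planted partition: community 0 before tc, community 1 from tc onward.
        planted = [0 if layer_starts[layer] < tc else 1 for layer in layers]
        s = score(planted, found)
        if best is None or s > best[1]:
            best = (tc, s)
    assert best is not None, "layer_starts must be non-empty"
    return best


if __name__ == "__main__":
    print(candidate_times(779, 80))  # [1, 81, 161, 241, 321, 401, 481, 561, 641]
    starts = candidate_times(320, 80)  # [1, 81, 161, 241]
    # Toy algorithmic partition: 2 nodes, community changes after the second layer.
    found = ["A" if layer < 2 else "B" for _ in range(2) for layer in range(4)]
    print(best_critical_time(starts, 2, found))  # (161, 1.0)
```

A search over pairs of critical times (tc1 < tc2, giving three planted communities) would follow the same pattern with a nested loop over `layer_starts`.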
4.3.1 Modularity Maximization Using the NG Null Model. We first study the community
structures of the 700 overlapping static networks formed by taking $t = \{1, 2, \ldots, 700\}$ and using
$\Delta = 80$. (There are 779 time points in total.) The community structures that we obtain from maximizing modularity have a strong spatial organization, as suggested by the high z-Rand scores
when compared to topographical partitions. As one can see in Fig. 9(a), in which we plot the
z-Rand scores versus the centers of the time windows that correspond to the static networks, the
spatial organization is especially evident starting in the year 2000. In our subsequent figures,
time points that we indicate on the axes also correspond to the centers of the associated time
windows.
As one can see from a plot of the number of epidemic cases over time (see Fig. 7), this transition
seems to occur near the time of the largest countrywide epidemic in the data, and the subsequent
period includes recurring yearly epidemics that were linked to climatic patterns in prior […]
Spatial Systems
• Space has a major influence on
the structures of networks and
other complex systems
• Useful reference: Marc
Barthelemy, Morphogenesis of
Spatial Networks, 2018
Slime molds and fungal networks
• See, e.g., work by Mark
Fricker and collaborators
Leaf-Venation
Patterns
(Eleni Katifori and
collaborators)
Spiders: Spinning Webs While Under the Influence
The border between
Belgium and the
Netherlands at Baarle-
Nassau/Baarle-Hertog
Topological Data Analysis
• Algorithmic methods to study high-dimensional data in a
quantitative manner
• Data from point clouds, networks, etc.
• Examine the “shape” of data
• Persistent homology
• Mathematical formalism for studying topological invariants
• Fast algorithms
• Persistent structures: a way to cope with noise in data
• Allows examination of “higher-order” (beyond pairwise) interactions in data
My TDA “Origin Story”
• Available at:
https://twitter.com/masonporter/status/1200127512371556352
• The short short version
• In college (1994–1998), I saw some algebraic topology, but it looked very abstract and it seemed
more on the ‘purer’ (i.e., theoretical) end of mathematics
• As a postdoc (2002–2005) at Georgia Tech, I saw Konstantin Mischaikow using ideas from
computational topology on experimental data (e.g., in fluid mechanics)
• In late 2012, I noticed that Konstantin and I were both working on granular materials (him with
TDA, me with network analysis), so I contacted him and we arranged a visit.
• We did a project applying TDA to spreading processes on networks (Taylor et al., Nature Commun., 2015)
• An Oxford doctoral student saw that paper and wanted to work on TDA with me.
• Another Oxford student saw a couple of previous papers of mine with TDA and wanted to work on that with
me.
• Rinse, wash, and repeat — and unintentionally now I do a lot of TDA stuff.
Topological Data Analysis
(in practice, usually persistent homology)
• Michelle Feng, Abigail Hickok, Yacoub H. Kureh, Mason A. Porter,
Chad M. Topaz, “Connecting the Dots: Discovering the ‘Shape’ of
Data”, Frontiers for Young Minds, 2021
• Chad Topaz’s introductory article (for a general audience) in DSWeb
• https://dsweb.siam.org/The-Magazine/Article/topological-data-analysis
• Nina Otter, MAP, Ulrike Tillmann, Peter Grindrod, and Heather A.
Harrington [2017], “A Roadmap for the Computation of Persistent
Homology”, European Physical Journal — Data Science, Vol. 6: 17
SIAM News
Jan–Feb 2020 issue
Example: Point-Cloud Data
[Figure from: Michelle Feng, Abigail Hickok, Yacoub H. Kureh, MAP, & Chad M. Topaz,
“Connecting the Dots: Discovering the ‘Shape’ of Data”, Frontiers for Young Minds, 2021]
Topological Data Analysis
of 2D Voting Data
Michelle Feng & MAP [2021], “Persistent Homology of Geospatial Data:
A Case Study with Voting”, SIAM Review, Vol. 63, No. 1: 67–99
Quantifying “Political Islands”
How do we detect red voters in a sea of blue?
(Or light blue voters in a sea of dark blue?)
Precinct-Level Voting Data
• How do people vote?
• How can we identify
geographical or temporal
patterns in voting?
• Our paper: geographical
• Can we automatically
characterize 2D
geographical outliers?
• Voting data (compiled by the Los
Angeles Times) for all
California precincts in the 2016
election
TDA and Voting Data
• Topological methods allow us to find and
identify holes, if we have a nice enough
space to search for those holes
• They also allow us to relate the presence
of holes to global structure
• Want to find “political islands”
• Red voters in a sea of blue, etc.
• Consider these as “holes” in a manifold in which
all precincts vote similarly
• Maybe we can also say something about
the structure of a county?
Barcodes
• A method of visualizing the PH of a
point cloud
• Each interval represents a feature in
dimension n
• Left endpoint = “birth” of a feature
• Right endpoint = “death” of a feature
• Visually, long features are “more
persistent”
Persistence Diagrams
• Another way of visualizing PH
• Put the filtration parameter on both the
horizontal and vertical axes
• If a feature is born at b and dies at d,
we place a point at (b,d)
• The height above the diagonal
indicates persistence
• Pink circles: H0
• Blue squares: H1
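For connected components (H0), the birth/death bookkeeping behind barcodes and persistence diagrams needs only a union-find structure. The following is a minimal Python sketch. It assumes Euclidean points and the Vietoris–Rips convention on the next slide, so an edge of length d enters the filtration at scale d. The function name is hypothetical. Higher-dimensional features such as H1 loops are better computed with dedicated libraries such as GUDHI or Ripser.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Sequence[float]


def h0_diagram(points: List[Point]) -> List[Tuple[float, float]]:
    """0-dimensional persistence diagram of the Vietoris-Rips filtration.

    Every point is born at scale 0; an edge of length d joins its endpoints
    at scale d, and each merge of two components kills one of them.
    """
    parent = list(range(len(points)))

    def find(x: int) -> int:
        # Union-find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    diagram = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components
            parent[ri] = rj
            diagram.append((0.0, d))
    if points:
        diagram.append((0.0, math.inf))  # the component that never dies
    return diagram


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
    for b, d in h0_diagram(pts):
        print(f"born {b}, dies {d}, persistence {d - b}")
```

For the two tight pairs of points, the sketch reports two short bars that die at scale 1. It reports one bar that dies at 9, when the clusters merge, and one infinite bar. In the persistence diagram, the feature with death 9 is the point that lies far above the diagonal.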
Distance-Based Constructions:
Vietoris–Rips (VR) and Alpha Complexes
• VR complex [Jigglypuff]
• Surround each point in a point cloud with balls of radius ε
• For a set of n + 1 points, if the pairwise distance between any two points is
less than ε, build an n-simplex. The resulting simplicial complex is X_ε
• Alpha complex
• Compute the Delaunay triangulation of the point cloud
• X_ε is the simplicial complex formed by the set of edges and triangles
whose radii are at most ε
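The VR rule above ("n + 1 points that are pairwise closer than ε span an n-simplex") can be sketched by brute force. This is a minimal illustration under two assumptions: Euclidean points, and a cutoff at dimension 2. The function names are hypothetical. The enumeration grows combinatorially with the number of points, so it is only suitable for toy inputs.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Simplex = Tuple[int, ...]


def vietoris_rips(points: List[Sequence[float]], eps: float, max_dim: int = 2) -> List[Simplex]:
    """All simplices of X_eps up to dimension max_dim (brute force).

    A set of k+1 points spans a k-simplex when every pairwise distance
    is less than eps.
    """
    n = len(points)
    close = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if math.dist(points[i], points[j]) < eps
    }
    simplices: List[Simplex] = [(i,) for i in range(n)]
    for size in range(2, max_dim + 2):  # size = number of vertices
        for s in combinations(range(n), size):
            if all(pair in close for pair in combinations(s, 2)):
                simplices.append(s)
    return simplices


def count_by_dim(simplices: List[Simplex]) -> Dict[int, int]:
    """Number of simplices in each dimension."""
    counts: Dict[int, int] = {}
    for s in simplices:
        counts[len(s) - 1] = counts.get(len(s) - 1, 0) + 1
    return counts


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(count_by_dim(vietoris_rips(square, 1.1)))  # {0: 4, 1: 4}
    print(count_by_dim(vietoris_rips(square, 1.5)))  # {0: 4, 1: 6, 2: 4}
```

At ε = 1.1, the four sides of the unit square form a loop with no triangles (an H1 hole). At ε = 1.5, the diagonals are also shorter than ε, so four triangles appear and fill the hole.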
Summary: Distance-Based Constructions
•Advantages
• Easy to construct
• Fast algorithms, built into many packages
• Easy to interpret
• Embedded in Euclidean space, built-in parameter selection
•Disadvantages
• Which parameter values are appropriate ones?
• Persistence doesn’t always measure what we want it to
• Sensitive to rescaling
• Requires data in point-cloud form
Adjacency Construction
• Use network adjacency to define simplices
• If n + 1 nodes are all pairwise adjacent, define an n–simplex
• Given appropriate node data (or edge data), we construct a
filtration
• Note that the filtration is not determined by distance
• In our data, filtration corresponds to strength of precinct
preference for a specific candidate
• For example, we can find light-blue precincts in a sea of dark blue
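An adjacency-based filtration of this kind can be sketched as a clique (flag) complex with a node-based sublevel filtration. This is a hedged sketch with three assumptions. The network is undirected and is stored as adjacency sets. Each node carries one hypothetical value, for example one minus a precinct's vote share for a candidate. A simplex enters the filtration at the largest value among its nodes, which is a common convention and is not necessarily the paper's exact choice. The function name and the toy data are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Simplex = Tuple[str, ...]


def clique_filtration(
    adj: Dict[str, Set[str]], value: Dict[str, float], max_dim: int = 2
) -> List[Tuple[Simplex, float]]:
    """Clique complex of a network with a node-based sublevel filtration.

    n+1 pairwise-adjacent nodes span an n-simplex; a simplex enters the
    filtration at the largest value among its nodes.
    """
    nodes = sorted(adj)
    filt: Dict[Simplex, float] = {}
    for size in range(1, max_dim + 2):
        for s in combinations(nodes, size):
            if all(v in adj[u] for u, v in combinations(s, 2)):
                filt[s] = max(value[v] for v in s)
    # Sort by value, then dimension, so every face precedes its cofaces.
    return sorted(filt.items(), key=lambda item: (item[1], len(item[0])))


if __name__ == "__main__":
    # Hypothetical toy data: four outer precincts in a ring around a center "x".
    adj = {
        "a": {"b", "d", "x"},
        "b": {"a", "c", "x"},
        "c": {"b", "d", "x"},
        "d": {"a", "c", "x"},
        "x": {"a", "b", "c", "d"},
    }
    value = {"a": 0.2, "b": 0.3, "c": 0.25, "d": 0.35, "x": 0.9}
    for simplex, v in clique_filtration(adj, value):
        print(v, simplex)
```

At value 0.35, the four outer nodes and their edges close a loop. That loop is filled only at 0.9, when the center node, its four edges, and the four triangles enter. The center therefore appears as an H1 feature with birth 0.35 and death 0.9, even though no distances are involved.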
Summary: Adjacency Construction
• Advantages
• Does not depend on distance scaling
• Suitable for networks that aren’t easy or natural to embed in
Euclidean space
• Disadvantages
• Still only works on discrete data
• Sensitive to choices of construction of the underlying network
• It requires the nodes or edges to have associated data to construct
a meaningful filtration
Level-Set VR Construction
• Use data in surface form
• Take a map of all precincts with similar voting patterns, and consider the outer
contour to be the 0 level set of some 3D object
• Evolve the surface outward with forces on a triangular grid according to the
level-set equation
• Take the collection of filled grid cells to be 2-simplices (and take grid lines to
be edges; and take points to be vertices)
• The filtration is given by the time steps of the evolution
Level Sets and PH
• The level-set method is a very fast method for front
propagation
• Persistence corresponds to the size of a feature: larger
holes take longer to fill
• We’re still “thickening” a point cloud (as in VR complexes),
except that we start with a manifold
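The idea that "larger holes take longer to fill" can be illustrated with a heavily simplified stand-in for front propagation. The slides evolve the front with the level-set equation on a triangular grid. This sketch instead makes two assumptions: the grid is square, and the front moves outward at constant speed. Under those assumptions, a cell's fill time is its breadth-first (Manhattan-metric) distance to the initially filled region, and a hole dies roughly when its last cell fills. The function name and the toy grid are hypothetical.

```python
from collections import deque
from typing import List, Optional


def fill_times(grid: List[str]) -> List[List[Optional[int]]]:
    """Time step at which an outward-moving front reaches each cell.

    '#' marks cells that are filled at time 0 (e.g., precincts with similar
    voting patterns); '.' marks empty cells. The front advances one cell per
    step to the 4 neighbours, i.e., constant speed on a square grid.
    """
    rows, cols = len(grid), len(grid[0])
    time: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == "#":
                time[r][c] = 0
                queue.append((r, c))
    while queue:  # multi-source breadth-first search
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and time[nr][nc] is None:
                time[nr][nc] = time[r][c] + 1
                queue.append((nr, nc))
    return time


if __name__ == "__main__":
    grid = [
        "#########",
        "#.###...#",
        "#####...#",
        "#####...#",
        "#########",
    ]
    t = fill_times(grid)
    small_hole = t[1][1]
    big_hole = max(t[r][c] for r in range(1, 4) for c in range(5, 8))
    print(small_hole, big_hole)  # 1 2
```

In this example, the 1×1 hole fills after one step and the 3×3 hole fills after two steps. In the filtration, the larger hole's H1 feature therefore persists longer.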
Summary: Level-Set VR Construction
•Advantages
• We can use the underlying shape of a map
• We maintain some notion of geographic size of holes via the mesh size
• Faster than previous VR method on large data sets
•Disadvantages
• Difficult to associate generators of holes with the original precincts on the map
• Potentially not well-suited to less granular data
• Captures geographic features (e.g., bodies of water) that may not be desirable
A key point from the MF + MAP paper
• Our new constructions allow us to distinguish short-persistence
features that occur only for a narrow range of distance scales (e.g.,
voting behaviors in densely populated cities) from short-persistence
noise by incorporating information about other spatial relationships
between precincts
• Note: “Short persistence” with respect to the usual filtrations that
don’t take the geospatial nature of the problem into account
Persistent Homology on
Other Spatial Data
Michelle Feng & MAP [2020], Physical Review Research, Vol. 2, No. 3: 033426
Spiders Spinning Under the Influence
• The Marshall Space Flight Center studied the webs of spiders that were exposed to
various chemicals. (There is a NASA Tech Brief from 1995.)
• Earlier work, starting in 1948 by Swiss pharmacologist Peter N. Witt
• They concluded that more toxic chemicals resulted in more deformed spiderwebs
PH with Level-Set
Complexes on
Spiderwebs
Pink circles: H0
Blue squares: H1
Street Networks in Cities
Los Angeles
(gridlike)
(a) Aleppo and (b) Barcelona
(interrupted grids)
(a) Nanyang and (b) London
(not gridlike)
Analysis of Spatiotemporal Anomalies Using
Persistent Homology: Case Studies with
COVID-19 Data
Abigail Hickok, Deanna Needell, and MAP, arXiv:2107.09188
COVID-19 Data Sets
• COVID-19 per capita vaccination rates in the different zip
codes of New York City
• Fully vaccinated people on 23 February 2021
• COVID-19 case rates in neighborhoods in the city of Los
Angeles
• Running 14-day mean per capita case rate from 25 April
2020 through 25 April 2021
Constructing a Simplicial Complex
• (1) Construct a 2D simplicial complex
for each region.
• (2) Glue their boundaries together in a
way that respects the geographical
region boundaries.
A More Complicated Situation
Filtration Functions (example: sublevel filtration)
Per capita vaccination
rate (NYC) or running
14-day mean of per
capita case rate (LA)
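A sublevel filtration of per-region values can be illustrated on the simplest structure: a graph whose nodes are regions and whose edges join regions that share a boundary. This is a hedged sketch and not the glued 2D simplicial complex described earlier. It assumes a hypothetical region-adjacency list and one value per region (e.g., a per capita rate). It computes only 0-dimensional features, using union-find and the elder rule. The function name is illustrative.

```python
import math
from typing import Dict, List, Tuple


def sublevel_h0(adj: Dict[str, List[str]], f: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """0-dimensional persistence of the sublevel filtration of f on a region graph.

    Regions enter in increasing order of f; a region adjacent to an earlier
    one joins its component. When two components merge, the younger one
    (larger birth value) dies (the "elder rule"). Keys are the regions
    (local minima of f) at which each component was born.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs: Dict[str, Tuple[float, float]] = {}
    for v in sorted(adj, key=lambda r: f[r]):
        parent[v] = v
        for u in adj[v]:
            if u not in parent:
                continue  # neighbour has not entered the filtration yet
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            # A root is always its component's oldest region, so f[root] is the birth.
            young, old = (ru, rv) if f[ru] > f[rv] else (rv, ru)
            if f[young] < f[v]:  # skip zero-persistence pairs
                pairs[young] = (f[young], f[v])
            parent[young] = old
    for v in adj:
        root = find(v)
        pairs.setdefault(root, (f[root], math.inf))  # components that never die
    return pairs


if __name__ == "__main__":
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(sublevel_h0(path, {"a": 1, "b": 4, "c": 2}))
```

In the toy path, region c is a local minimum. Its component is born at 2 and dies at 4, when it merges with the older component born at a. Applying the same function to −f tracks local maxima instead, meaning regions with higher values than their neighbours. Recomputing the diagram at each time step and following each key gives a crude stand-in for the vines on the following slides. This tracking heuristic is not the vineyard algorithm itself.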
Case Study: Vaccination Rates in NYC
• Each point in the PD
corresponds to a zip code
(which we label by borough)
that has a higher vaccination
rate than its neighboring zip
codes.
• We use “vineyards” to examine
the birth and death of features
over time.
• A continuous “stack” of PDs through
time. Points in the PD trace out curves
(“vines”) through time.
Conclusions
• Topological data analysis (TDA), such as by computing persistent homology
(PH), can give insights into large-scale structures in networks and other
complex systems
• Important: going beyond pairwise interactions in networks
• Persistent homology of spatial and spatiotemporal data
• By looking at 2D data, we can do systematic comparisons between different types of
constructions (topologically, fewer things can happen)
• Incorporate information from applications of interest into PH approaches
• Short-persistence features versus short-persistence noise: Need to think
carefully about how one constructs simplicial complexes
• Serendipity in research: You can end up writing a lot of papers on a topic
without intending to make it a big part of your research program
• Students and postdocs driving you into new research areas is the best thing ever™
