TOPOLOGICAL DATA ANALYSIS
HJ van Veen · Data Science · Nubank Brasil
TOPOLOGY I
• "When a truth is necessary, the reason for it can be
found by analysis, that is, by resolving it into simpler
ideas and truths until the primary ones are reached."
- Leibniz
TOPOLOGY II
• Topology is the mathematical study of topological
spaces.
• Topology is interested in shapes.
• More specifically: in the concept of 'connectedness'.
TOPOLOGY III
• A topologist is someone who does not see the
difference between a coffee mug and a donut.











HISTORY I
• “Nothing at all takes place in the universe in which
some rule of maximum or minimum does not
appear.” - Euler
• Seven Bridges of Königsberg: devise a walk
through the city that crosses each bridge
once and only once.
HISTORY II
HISTORY III
• Euler's big insights:
• It doesn't matter where you start walking, only which bridges
you cross.
• A solution should be found the same way, regardless of where you start your
walk.
• Only the connectedness of the bridges matters.
• A solution should also apply to any other bridges that are connected
in a similar fashion, no matter the distances between them.
HISTORY IV
• We now call these graph walks 'Eulerian walks' in
Euler's honor.
• Euler's first proven graph theory theorem:
• An Eulerian walk is possible if the graph is connected and
exactly zero or two nodes have an odd number of edges.
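The degree rule above can be checked mechanically. The following is a minimal sketch, not Euler's own formulation: it encodes Königsberg as a multigraph and tests the "zero or two odd nodes" condition. The node labels A to D and the function names are assumptions made for illustration.

```python
from collections import Counter

# Königsberg as a multigraph: four land masses, seven bridges.
# Labels are illustrative: A = island, B = north bank,
# C = south bank, D = eastern land mass.
KONIGSBERG = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def degrees(edges):
    """Count how many edge endpoints touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def is_connected(edges):
    """Depth-first search over the nodes that appear in `edges`."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not adj:
        return True
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(adj)


def has_eulerian_walk(edges):
    """True if the graph is connected and has 0 or 2 odd-degree nodes."""
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    return is_connected(edges) and odd in (0, 2)


print(degrees(KONIGSBERG))            # every land mass has odd degree
print(has_eulerian_walk(KONIGSBERG))  # False: four odd nodes
```

Only the edge list enters the computation; distances and positions never appear, which is the point of Euler's argument.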
TDA I
• TDA marries 300-year-old maths with
modern data analysis.
• Captures the shape of data
• Is invariant
• Compresses large datasets
• Functions well in the presence of noise / missing variables
TDA II
• Capturing the shape of data





























• Traditional techniques like clustering or dimensionality reduction have
trouble capturing this shape.

TDA III
• Invariance.









• Euler showed that only connectedness matters. The size, position, or
pose of an object doesn't change that object.
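A small sketch of what invariance means in practice, under stated assumptions: connectedness is approximated by linking points closer than a threshold `eps` and counting connected components. The point cloud, `eps`, and all function names are hypothetical. Rotating and shifting the cloud leaves the count unchanged; a change of size does too, provided `eps` is scaled by the same factor.

```python
import math


def count_components(points, eps):
    """Link points at distance <= eps and count the connected groups."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


def rigid_motion(points, angle, shift):
    """Rotate by `angle` radians, then translate by `shift` (change of pose and position)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]


def rescale(points, factor):
    """Uniformly change the size of the point cloud."""
    return [(factor * x, factor * y) for x, y in points]


# Two well-separated groups of points (illustrative data).
cloud = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
eps = 1.5

moved = rigid_motion(cloud, angle=0.7, shift=(5.0, -3.0))
bigger = rescale(cloud, 3.0)

print(count_components(cloud, eps))
print(count_components(moved, eps))
print(count_components(bigger, 3.0 * eps))
```

All three calls report the same number of components, because only the relative distances between points enter the computation.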
TDA IV
• Compression.
• Compressed representations use the order in data.
• Only order can be compressed.
• Random noise or slight variations are ignored.
• Lossy compression retains the most important features.
• "Now where there are no parts, there neither extension, nor shape, nor divisibility is possible.
And these monads are the true atoms of nature and, in a word, the elements of things." - Leibniz
MAPPER I
• Mapper was created by Ayasdi co-founder
Gurjeet Singh during his PhD under Gunnar
Carlsson.
• Based on the idea of partial clustering of the data
guided by a set of functions defined on the data.
MAPPER II
• Mapper was inspired by the Reeb Graph.













MAPPER III
• Map the data with a function and cover its range with overlapping intervals.
• Cluster the points inside each interval.
• When two clusters share data points, draw an edge between them.
• Color nodes by function.
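The four steps above can be sketched in a few dozen lines. This is a simplified illustration, not the reference Mapper implementation: it assumes a one-dimensional function, and it uses a plain distance-threshold (single-linkage) grouping as a stand-in for a real clustering algorithm. The parameter names `n_intervals`, `overlap`, and `eps` are assumptions.

```python
import math
from itertools import combinations


def overlapping_intervals(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with equal-length intervals overlapping by the fraction `overlap`."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def single_linkage(indices, points, eps):
    """Group point indices: points within eps of each other end up together (transitively)."""
    clusters, unvisited = [], set(indices)
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, lens, n_intervals=4, overlap=0.5, eps=1.0):
    values = [lens(p) for p in points]
    nodes = []
    # Steps 1 and 2: cover the function's range, then cluster inside each interval.
    for a, b in overlapping_intervals(min(values), max(values), n_intervals, overlap):
        inside = [i for i, v in enumerate(values) if a - 1e-9 <= v <= b + 1e-9]
        nodes.extend(single_linkage(inside, points, eps))
    # Step 3: connect clusters that share at least one data point.
    edges = [(s, t) for s, t in combinations(range(len(nodes)), 2) if nodes[s] & nodes[t]]
    # Step 4: color each node by the mean function value of its members.
    colors = [sum(values[i] for i in node) / len(node) for node in nodes]
    return nodes, edges, colors


# Illustrative input: 40 points on a unit circle, with the x-coordinate as the function.
circle = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40)) for k in range(40)]
nodes, edges, colors = mapper(circle, lens=lambda p: p[0], n_intervals=4, overlap=0.5, eps=0.3)
print(len(nodes), "nodes,", len(edges), "edges")
```

With these settings the circle should come out as a single loop of six nodes: one node at each end of the x-range and two nodes (upper and lower arc) for each of the middle intervals.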
MAPPER IV
MAPPER V
Distance_to_median(row)    x       y       z
1.5                        1.5     1.5     1.5
1.5                        -0.5    -0.5    -0.5
0                          1       1       1
0                          1       0.9     1.1
3                          2       2       2
3                          2.1     1.9     2
MAPPER VI
• In conclusion:
FUNCTIONS
• Raw features or point-cloud axis / coordinates
• Statistics: Mean, Max, Skewness, etc.
• Mathematics: L2-norm, Fourier Transform, etc.
• Machine Learning: t-SNE, PCA, out-of-fold preds
• Deep Learning: Layer activations, embeddings
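As a rough illustration of the simpler entries in this list, the sketch below computes three row-wise functions (mean, max, L2-norm) that could each serve as the function in Mapper. The data rows and the names `LENSES` and `apply_lenses` are hypothetical.

```python
import math
from statistics import mean


def l2_norm(row):
    """Euclidean length of a row of features."""
    return math.sqrt(sum(v * v for v in row))


# Each function maps one row of features to a single number.
LENSES = {
    "mean": mean,
    "max": max,
    "l2_norm": l2_norm,
}


def apply_lenses(rows):
    """Return one dict of function values per row."""
    return [{name: fn(row) for name, fn in LENSES.items()} for row in rows]


# Illustrative rows with three features each.
rows = [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0), (-2.0, 0.5, 2.0)]
for row, values in zip(rows, apply_lenses(rows)):
    print(row, values)
```

Any of these outputs can be passed on as the values that get covered with overlapping intervals; model-based functions such as PCA or out-of-fold predictions slot in the same way, as one number (or a few) per row.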
CLUSTER ALGOS
• DBSCAN / HDBSCAN:
• Handles noise well.
• No need to set number of clusters.
• K-Means:
• Creates visually nice simplicial complexes/graphs
SOME GENERAL USE CASES
• Computer Vision
• Model and feature inspection
• Computational Biology / Healthcare
• Persistent Homology
COMPUTER VISION
• Demo













MODEL AND FEATURE
INSPECTION
• Demo













COMPUTATIONAL BIOLOGY
• Example













PERSISTENT HOMOLOGY
• Example













SOME FINANCE USE CASES
• Customer Segmentation
• Transactional Fraud
• Accurate Interpretable Models
• Exploration / Analysis
CUSTOMER SEGMENTATION
• Demo













TRANSACTIONAL FRAUD
• Example of spousal fraud













ACCURATE INTERPRETABLE
MODELS
• Create: global linear model
• Function: L2-norm
• Color: Heatmap by ground truth and animate to out-of-fold model predictions
• Identify: Low-accuracy subgraphs
• Select: Features that are most important for those subgraphs
• Create: Local linear models on the subgraphs
• Stack: DecisionTree
• Compare: Divide-and-Conquer and LIME
• DEMO
EXPLORATION / ANALYSIS
• Demo













QUESTIONS?
FURTHER READING
• Google terms:
• Ayasdi, Topological Data Analysis, Robert Ghrist, Gurjeet Singh, Gunnar Carlsson,
Anthony Bak, Allison Gilmore, Simplicial Complex, Python Mapper.
• Videos:
• https://www.youtube.com/watch?v=4RNpuZydlKY
• https://www.youtube.com/watch?v=x3Hl85OBuc0
• https://www.youtube.com/watch?v=cJ8W0ASsnp0
• https://www.youtube.com/watch?v=kctyag2Xi8o
