Using Topological Analysis to Support
Event-Guided Exploration in Urban Data
IEEE Transactions on Visualization and Computer Graphics
Harish Doraiswamy, Nivan Ferreira, Theodoros Damoulas, Juliana Freire, and Cláudio T. Silva
Overview of the paper
• Motivation
 To examine a prohibitively large number of spatio-temporal slices efficiently
and discover interesting patterns in them
• Contribution
 Propose an efficient and scalable technique that automatically discovers events
 Guide users towards potentially interesting data slices
 Accomplish event detection through the application of topological analysis on a
time-varying scalar function
 Design an indexing scheme that groups similar patterns across time slices
 Design a visual interface to aid in event-guided exploration of urban data
Background
• A scalar function maps each point in a spatial domain to a real value.
 In the illustrated example (a height function), the function value at each point of the graph equals the point’s y-coordinate.
• A super-level set of a real value a is the pre-image of the interval [a,+∞).
• A sub-level set of a is the pre-image of the interval (−∞,a].
• Critical points of a smooth real-valued function are exactly where the gradient becomes zero.
 Topological changes occur at critical points.
 A maximum captures a peak of the function, where the function value is higher than its
neighborhood.
 A minimum captures a valley of the function.
• Regular points are the points that are not critical.
 Topology of the super-level (sub-level) set is preserved across regular points.
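To make these definitions concrete, the sketch below evaluates them on a graph instead of a smooth domain. It is a minimal illustration and not the paper's implementation: the adjacency-list `graph`, the values in `f`, and all function names are hypothetical. It assumes a node counts as a maximum (minimum) when its value is strictly higher (lower) than that of every neighbour.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # adjacency list: node -> neighbours (hypothetical layout)


def super_level_set(f: Dict[str, float], a: float) -> Set[str]:
    """Pre-image of [a, +inf): all nodes whose value is at least a."""
    return {v for v, value in f.items() if value >= a}


def sub_level_set(f: Dict[str, float], a: float) -> Set[str]:
    """Pre-image of (-inf, a]: all nodes whose value is at most a."""
    return {v for v, value in f.items() if value <= a}


def local_maxima(graph: Graph, f: Dict[str, float]) -> Set[str]:
    """Nodes strictly higher than all of their neighbours (peaks)."""
    return {v for v, nbrs in graph.items() if all(f[v] > f[u] for u in nbrs)}


def local_minima(graph: Graph, f: Dict[str, float]) -> Set[str]:
    """Nodes strictly lower than all of their neighbours (valleys)."""
    return {v for v, nbrs in graph.items() if all(f[v] < f[u] for u in nbrs)}


# Toy path graph a-b-c-d-e with made-up function values.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
f = {"a": 1.0, "b": 5.0, "c": 2.0, "d": 4.0, "e": 0.5}

peaks = local_maxima(graph, f)      # nodes b and d
valleys = local_minima(graph, f)    # nodes a, c and e
above_4 = super_level_set(f, 4.0)   # nodes b and d
```

Here the super-level set at a = 4 has two components (b and d); they only become connected once the sweep descends to the value at c.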
Background
• Topological Persistence
 The topology of the super-level sets changes when the sweep in decreasing order
encounters a critical point.
 A critical point is a creator if a new component is created there, and a destroyer otherwise.
 The persistence value of a creator $v_c$ that is paired with destroyer $v_d$ is $\pi_c = f(v_c) - f(v_d)$.
• Join tree and split tree
 Each tree abstracts the topology of a scalar function f and represents features of f.
 The join tree tracks the changes in the connectivity of super-level sets
 The split tree tracks the connectivity of the sub-level sets
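The decreasing sweep described above can be sketched with a union-find structure. This is a simplified illustration and not the authors' algorithm: it assumes a connected graph, tracks only the components of super-level sets (so only maxima act as creators), and follows the common convention of pairing the global maximum with the global minimum. The names `maxima_persistence`, `graph`, and `f` are hypothetical.

```python
from typing import Dict, List


def maxima_persistence(graph: Dict[str, List[str]], f: Dict[str, float]) -> Dict[str, float]:
    """Map each maximum (creator) to its persistence f(v_c) - f(v_d)."""
    parent: Dict[str, str] = {}
    peak: Dict[str, str] = {}  # component root -> highest maximum in that component

    def find(v: str) -> str:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    persistence: Dict[str, float] = {}
    # Sweep from the highest value to the lowest.
    for v in sorted(f, key=lambda n: f[n], reverse=True):
        parent[v] = v
        peak[v] = v
        for u in graph[v]:
            if u not in parent:
                continue  # u lies below the current level
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            # Two components meet at v: the one with the lower peak is destroyed.
            hi, lo = (ru, rv) if f[peak[ru]] >= f[peak[rv]] else (rv, ru)
            if peak[lo] != v:  # v itself is not a creator when it just joins a component
                persistence[peak[lo]] = f[peak[lo]] - f[v]
            parent[lo] = hi
    # Assumed convention: the global maximum is paired with the global minimum.
    root = find(next(iter(f)))
    persistence[peak[root]] = f[peak[root]] - min(f.values())
    return persistence


# Toy path graph a-b-c-d-e with made-up values.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
f = {"a": 1.0, "b": 5.0, "c": 2.0, "d": 4.0, "e": 0.5}
result = maxima_persistence(graph, f)
```

In this toy input the peak at d is created at value 4 and destroyed at c (value 2), so its persistence is 2. Running the same sweep on the negated function would give the persistence of minima.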
Data
• NYC Taxi Data
 Manhattan during 2011 and 2012
 Each trip consists of pickup and drop-off locations and times
 About 500 thousand trips per day on average
 Goal: identify road closures and taxi hot spots
 The scalar function for an hourly interval is defined at each node of the graph as the density of taxis
within a small circular region around the node
 Minima and maxima are used to represent events
• MTA Subway Data
 Time stamps of all the stops for all the trips that happen each day.
 Delays in the schedule of the different trains (scheduled – actual)
 Each train route is treated as a path whose nodes correspond to the stations along the route
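A minimal sketch of the taxi scalar function follows, assuming node and trip locations are already projected to planar coordinates in metres and that density means the number of trips within the radius divided by the circle's area. The function name, the coordinates, and the radius are hypothetical, and the brute-force loop stands in for the spatial index a real data set of this size would need.

```python
import math
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]  # planar (x, y) coordinates in metres (assumed)


def taxi_density(nodes: Dict[str, Point], trips: Iterable[Point], radius: float) -> Dict[str, float]:
    """Trips per square metre inside a circle of the given radius around each node."""
    area = math.pi * radius ** 2
    trip_list = list(trips)
    density = {}
    for name, (nx, ny) in nodes.items():
        count = sum(1 for (x, y) in trip_list if math.hypot(x - nx, y - ny) <= radius)
        density[name] = count / area
    return density


# Made-up nodes and pickup locations for one hourly interval.
nodes = {"n1": (0.0, 0.0), "n2": (100.0, 0.0)}
pickups = [(1.0, 1.0), (2.0, -3.0), (99.0, 0.0), (500.0, 500.0)]
density = taxi_density(nodes, pickups, radius=10.0)
```

Computing one such dictionary per hourly interval yields the time-varying scalar function on which minima and maxima are detected.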
Managing Events
• Computing Events
1. Use the split tree to identify significant events
2. Use persistence to capture the importance of a feature
3. Use the geometric size (hyper-volume) of a feature to account for its spatial extent
4. Retain the top-k minima
• Event Group Index
 Define a notion of similarity between events based on their geometric and topological
properties
 Group similar events within a certain time interval into event groups
 Define a key to index these groups
Event Group Index
• Similarity Between Events
 An event E is formally represented as a pair (R, τ)
• R is a subgraph representing the spatial region of the event
• τ is a real number representing its topological importance
 Graph distance metric δ measures the geometric similarity between R1 and R2:
$\delta(E_1, E_2) = 1 - \frac{|R_1 \cap R_2|}{\max(|R_1|, |R_2|)}$
• $R_1 \cap R_2$ denotes the maximum common subgraph of R1 and R2
• |R| denotes the number of nodes in R
• δ measures the amount of overlap between two regions, ensuring that similar regions overlap significantly
 Topological similarity between two events:
$T(E_1, E_2) = |\tau_1 - \tau_2|$
• Two events E1 and E2 are similar if $\delta(E_1, E_2) \le \varepsilon_\delta$ and $T(E_1, E_2) \le \varepsilon_\tau$
• The second condition ensures that the two events are close with respect to topological importance
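The two similarity measures can be sketched as follows. One simplifying assumption is made: since both regions are subgraphs of the same underlying graph, the maximum common subgraph is approximated by the intersection of their node sets. The `Event` type, the function names, and the threshold values are hypothetical, and regions are assumed to be non-empty.

```python
from typing import FrozenSet, NamedTuple


class Event(NamedTuple):
    region: FrozenSet[str]  # node ids of the subgraph R (assumed representation)
    tau: float              # topological importance


def geometric_distance(e1: Event, e2: Event) -> float:
    """delta(E1, E2) = 1 - |R1 ∩ R2| / max(|R1|, |R2|)."""
    overlap = len(e1.region & e2.region)
    return 1.0 - overlap / max(len(e1.region), len(e2.region))


def topological_distance(e1: Event, e2: Event) -> float:
    """T(E1, E2) = |tau1 - tau2|."""
    return abs(e1.tau - e2.tau)


def similar(e1: Event, e2: Event, eps_delta: float, eps_tau: float) -> bool:
    """Both the geometric and the topological condition must hold."""
    return (geometric_distance(e1, e2) <= eps_delta
            and topological_distance(e1, e2) <= eps_tau)


# Made-up events: e1 and e2 share three of at most four nodes, e3 is disjoint.
e1 = Event(frozenset({"a", "b", "c", "d"}), 3.0)
e2 = Event(frozenset({"b", "c", "d"}), 2.5)
e3 = Event(frozenset({"x", "y"}), 3.0)

d12 = geometric_distance(e1, e2)  # 1 - 3/4 = 0.25
```

With thresholds such as eps_delta = 0.3 and eps_tau = 1.0, e1 and e2 count as similar, while e1 and e3 do not because their regions are disjoint.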
Event Group and Event Group Key
• Use a time period equal to one month
 long enough not to miss periodic events
 short enough not to create a computational bottleneck
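• Given an event group $\Sigma = \{E_1, E_2, \ldots, E_k\}$, define the event group key of Σ as $(R_\Sigma, \tau_\Sigma)$, where
$R_\Sigma = \bigcap_{i \in [1,k]} R_i$ and $\tau_\Sigma = \frac{1}{k}\sum_{i=1}^{k} \tau_i$
• $R_\Sigma$ is the maximum common subgraph of the geometric regions, i.e. the overlap used in the similarity condition
• $\tau_\Sigma$ is the average of the topological importance values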
• The key follows the definitions of the geometric and topological similarity measures
• Defining the event group key this way gives a consistent definition of similarity between event
groups
• When two similar event groups are found, they are merged into a single group
• Given a query, perform a linear search over the set of event groups to find matching events
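A minimal sketch of the event group key and the linear search, under the same assumption as the similarity measures: regions are stored as node sets, so the maximum common subgraph becomes a set intersection. The `Event` type, the function names, and the threshold values are hypothetical, and every group is assumed to contain at least one event.

```python
from typing import FrozenSet, List, NamedTuple


class Event(NamedTuple):
    region: FrozenSet[str]  # node ids of the region (assumed representation)
    tau: float              # topological importance


def group_key(events: List[Event]) -> Event:
    """Key (R_Sigma, tau_Sigma): common region and mean topological importance."""
    region = frozenset.intersection(*(e.region for e in events))
    tau = sum(e.tau for e in events) / len(events)
    return Event(region, tau)


def is_similar(a: Event, b: Event, eps_delta: float, eps_tau: float) -> bool:
    """Same test as for single events, so keys and events are comparable."""
    largest = max(len(a.region), len(b.region))
    if largest == 0:
        return False  # two empty regions carry no geometric information
    delta = 1.0 - len(a.region & b.region) / largest
    return delta <= eps_delta and abs(a.tau - b.tau) <= eps_tau


def find_similar_groups(query: Event, groups: List[List[Event]],
                        eps_delta: float, eps_tau: float) -> List[int]:
    """Linear search: indices of all groups whose key is similar to the query."""
    return [i for i, g in enumerate(groups)
            if is_similar(query, group_key(g), eps_delta, eps_tau)]


# Made-up groups and query.
groups = [
    [Event(frozenset({"a", "b", "c"}), 3.0), Event(frozenset({"b", "c", "d"}), 2.0)],
    [Event(frozenset({"x", "y"}), 1.0)],
]
key0 = group_key(groups[0])  # region {b, c}, tau 2.5
hits = find_similar_groups(Event(frozenset({"b", "c"}), 2.4), groups, 0.3, 0.5)
```

Because a key has the same (R, τ) shape as an event, the same similarity test serves for event-to-group queries and for deciding whether two groups should be merged.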
Visual Exploration Interface
• Map View and Query Interface
• Event Group Distribution View and Timeline View
 Range is the amount of time between the first and the last event
 Density is the number of events in a group that happen per time unit
• Classification of event groups with two attributes
 Region I:
• Low range, but high density
• Rare occurrence (irregular pattern)
 Region II:
• High range and high density
• Occur frequently over a long period, so they can reveal trends
 Region III:
• Small number of events that span a large range
• Potentially represent patterns that are regular over a large time interval
• Irregular with respect to the range of the input data
 Region IV:
• Low range as well as low density
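The four regions can be sketched as a simple quadrant test on range and density. The cut-off values are hypothetical parameters that the slides do not specify. The way density is derived from timestamps here (event count divided by the range, with a minimum range of one time unit) is also an assumption made only to keep the example self-contained.

```python
from typing import List, Tuple


def range_and_density(times: List[float]) -> Tuple[float, float]:
    """Range = time between first and last event; density = events per time unit."""
    span = max(times) - min(times)
    return span, len(times) / max(span, 1.0)  # assumed guard against a zero range


def classify(span: float, density: float, range_cut: float, density_cut: float) -> str:
    """Quadrant of the range/density plane (cut-offs are hypothetical)."""
    high_range = span >= range_cut
    high_density = density >= density_cut
    if high_density:
        return "II" if high_range else "I"
    return "III" if high_range else "IV"


def region_of(times: List[float], range_cut: float = 100.0, density_cut: float = 0.5) -> str:
    span, density = range_and_density(times)
    return classify(span, density, range_cut, density_cut)


# Made-up event times in hours.
burst = region_of([0, 1, 2, 3])                          # short and dense
trend = region_of([float(t) for t in range(0, 1000)])    # long and dense
sparse_long = region_of([0, 500, 1000])                  # long and sparse
sparse_short = region_of([0, 50])                        # short and sparse
```

In an interactive view these cut-offs would more naturally be chosen by the user from the distribution plot than fixed in code.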
• Filtering Interface
 Event group size
 Event size
 Event time
 Spatial region
Case Studies – NYC Taxi data
• Goal: help users identify areas with high concentrations of taxis
• Minima events in NYC
 Regions where there are comparatively fewer taxis
• If such a region usually has a high density of taxis, this may indicate a street blockage
• Hourly events
 Sixth Avenue in Greenwich Village on October 31st
 This corresponds to the annual NYC Halloween Parade
• Daily events
 Fifth Avenue on October 9th and 10th, 2011
 This corresponds to the Hispanic Day Parade on October 9th and the
Columbus Day Parade on October 10th
• Weekly events
 NYC Summer Streets, which takes place on Park Avenue
 Occurred on three consecutive Saturdays: August 6th, 13th, and 20th
Case Studies – NYC Taxi data
• Querying events
 Search for events in other months that are similar to a selected event
 For example, find other parades that occurred in the same location
• Identifying trends
 Maxima events show high concentration of taxis
 If such concentrations are frequent, then it could imply taxi hot spots
 Optimize the amount of receiver place
Case Studies – MTA data
• To identify events related to delays
• The amount of delay, captured through topological persistence, serves as the
importance measure
• Minimum event groups
 Find a station at which the delay is lower than that of its neighbors
 This signals where trains start to get delayed
 Frequently occurring events of this kind fall in Region II
 Wall Street station
• Events occur predominantly during the rush hour period on
weekdays
 14th Street station
• The 3 train sometimes waits for the 1 train
Limitation and Future work
• The nature of an event cannot be explained unless the user looks the event up
• The system must iterate over the groups to find similar events
• There is no overall view of the system other than the graphs
• The scalar function must be chosen carefully
• Speed could be used to compute the scalar function

Editor's Notes

  • #7: Note that using persistence instead of hyper-volume could potentially remove the large shallow valleys during the simplification process.