SnapNETS: Automatic
Segmentation of Network Sequences
with Node Labels
Sorour E. Amiri, Liangzhe Chen, B. Aditya Prakash
Department of Computer Science
Virginia Tech
AAAI, San Francisco, USA, February 9, 2017
Outline
• Motivation
• Alternative Approaches
• Our Proposed Method: SnapNETS
• Experiments
• Conclusion
Amiri, Chen, Prakash 1
Network Sequences
• Epidemiology: diseases spread over contact networks
• Social media: information spreads over friendship networks
[Figure: two example network sequences G1 to G4. Flu: nodes are uninfected or infected. Meme: nodes are inactive or active.]
Making sense of network sequences
Flu: when do the infection patterns change?
[Figure: flu sequence G1 to G4 (uninfected and infected nodes), segmented into Star, Bridge, and Near Clique patterns.]
Reason:
• Virus mutation
• Vaccination
• …
Making sense of network sequences
Meme: when do the activation patterns change?
[Figure: meme sequence G1 to G4 (inactive and active nodes), segmented into Star and Clique patterns.]
Reason:
• Event
• …
Problem 1: Network sequence segmentation
• Given: a sequence of networks with labeled nodes
• Find: the best segmentation, which captures different distributions of node labels
In this work:
• Binary labels {0, 1}
[Figure: sequence G1 to G4 segmented into Star, Bridge, and Near Clique patterns.]
Desirable Properties
• P1. Parameter-free:
  o No threshold, no fixed granularity
• P2. Comprehensive:
  o Use the entire graph
• P3. Scalable
Alternative 1: Feature Ext. & Time-series
Step 1: Feature extraction
• F1: #cliques (of active subgraph): 0 0 0 … 2
• F2: #ladders (of inactive subgraph): 1 1 0 … 0
• F3: #ladders (of active subgraph): 0 0 0 … 1
Step 2: Time-series segmentation
[Figure: time series of features F1, F2, F3 over snapshots G1 to G4.]
[Henderson et al. 2010] [Likas, Vlassis, and Verbeek 2003] [Li et al. 2009]
Alternative 1: Feature Ext. & Time-series
Drawbacks:
• Laborious feature engineering:
  o # Cliques
  o # Ladders
• “Local” change detection:
  o One aggregation time period
  o Threshold
Alternative 2: Plain-graph-based analysis
Step 1: Extract active subgraphs
Step 2: Dynamic graph segmentation
[Figure: active subgraphs extracted from G1 to G4, followed by dynamic graph segmentation.]
[Shah et al. 2015] [Sun et al. 2007] [Lin et al. 2009] [Qu et al. 2014]
Alternative 2: Plain-graph-based analysis
Drawbacks:
• Inactive nodes are important for detecting different patterns
[Figure: G1 to G4 under dynamic graph segmentation compared with the entire graph. Panel labels: Chain; Roles are different.]
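The drawback above can be illustrated with a small sketch. The two toy snapshots and the helper `active_subgraph` are hypothetical and not taken from the talk; the sketch only assumes that an "active subgraph" keeps the edges whose endpoints are both active. Both snapshots contain the same active chain, so a method that sees only active subgraphs cannot tell them apart, even though the inactive neighbours attach to different nodes.

```python
def active_subgraph(edges, labels):
    """Keep only the edges whose two endpoints are both active (label 1)."""
    return {e for e in edges if all(labels[v] == 1 for v in e)}

# Hypothetical snapshot A: active chain 0-1-2, inactive leaves attached to node 1.
edges_a = {frozenset(p) for p in [(0, 1), (1, 2), (1, 3), (1, 4), (1, 5)]}
# Hypothetical snapshot B: same active chain, inactive leaves attached to node 0.
edges_b = {frozenset(p) for p in [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5)]}
labels = {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0}

# The active subgraphs coincide although the full snapshots differ.
same_active = active_subgraph(edges_a, labels) == active_subgraph(edges_b, labels)
same_full = edges_a == edges_b
print(same_active, same_full)
```

In snapshot A the middle of the chain is a hub, in snapshot B an endpoint is. That difference in role is only visible when the inactive nodes are kept.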
Comparison of SnapNETS
[Table: properties P1 (Parameter-free), P2 (Comprehensive), and P3 (Scalable) compared across SnapNETS, feature engineering and time series, and plain-graph-based analysis.]
Outline
• Motivation
• Alternative Approaches
• Our Proposed Method: SnapNETS
  o Main Idea and Overview
  o Goal 1: Summarizing Act-snapshots
  o Goal 2: Constructing the segmentation graph
  o Goal 3: Finding the best segmentation
• Experiments
• Conclusion
Main Idea: Segmentation graph
• Nodes:
  o One node for each segment, plus a source (‘s’) and a target (‘t’)
  o Source (‘s’) = start time; target (‘t’) = end time
• Edges:
  o A directed edge connects each pair of adjacent segments
Best segmentation problem ≡ Path optimization problem
[Figure: input sequence and its segmentation graph.]
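A minimal sketch of the segmentation graph described above, assuming that a segment is a contiguous half-open interval [i, j) of snapshot indices and that two segments are adjacent when one ends where the next begins. The function names `build_segmentation_graph` and `count_paths` are illustrative, not from the authors' code.

```python
from itertools import combinations

def build_segmentation_graph(num_snapshots):
    """Return (segments, edges) for a sequence of `num_snapshots` graphs."""
    T = num_snapshots
    # Assumption: every contiguous interval [i, j) with i < j is a candidate segment.
    segments = list(combinations(range(T + 1), 2))
    edges = []
    for (i, j) in segments:
        if i == 0:
            edges.append(("s", (i, j)))      # segment starts at the start time
        if j == T:
            edges.append(((i, j), "t"))      # segment ends at the end time
        for k in range(j + 1, T + 1):
            edges.append(((i, j), (j, k)))   # adjacent segments share a boundary
    return segments, edges

def count_paths(edges):
    """Count the paths from 's' to 't' in the (acyclic) segmentation graph."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    memo = {}
    def walk(u):
        if u == "t":
            return 1
        if u not in memo:
            memo[u] = sum(walk(v) for v in succ.get(u, []))
        return memo[u]
    return walk("s")

segments, edges = build_segmentation_graph(4)
print(len(segments), len(edges), count_paths(edges))
```

Under these assumptions the number of nodes grows quadratically with the number of snapshots, while the number of paths from ‘s’ to ‘t’ (one per segmentation) grows exponentially, which is why searching over paths in the graph is attractive.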
Overview of SnapNETS
• Goal 1. Summarize each graph:
  o Keep structural and label-dependent properties
• Goal 2. Construct the segmentation graph:
  o Define nodes and edges
  o Define edge weights (extract the features of the summarized graphs)
• Goal 3. Find the best segmentation:
  o Define the best segmentation (path)
  o Compute the best segmentation
Technical Challenges
• Using the entire graph snapshots:
  o Summarize each graph while satisfying P2
• Finding the number of segments:
  o Compute the segmentation while satisfying P1
Reminder:
• P1. Parameter-free
• P2. Comprehensive
• P3. Scalable
Goal 1: Summarizing graph snapshots
• We want to preserve:
  o Structural properties
  o Node labels
• Role of the eigenvalue:
  o The leading eigenvalue of the adjacency matrix determines the epidemic threshold in most diffusion models [Prakash et al. ICDM 2011]
  o Same leading eigenvalue → same diffusive properties
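As a rough illustration of the quantity the summarization tries to preserve, the leading eigenvalue of a small symmetric adjacency matrix can be estimated with power iteration. This is a sketch in plain Python, not the authors' implementation; the function name and the toy graphs are hypothetical. The iteration runs on A + I because plain power iteration can oscillate on bipartite graphs such as a star.

```python
def leading_eigenvalue(adj, iters=200):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        # Multiply by (A + I); the shift keeps the iteration from oscillating.
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    # Rayleigh quotient x^T A x for the unit vector x.
    ax = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(ax, x))

# Star with one hub and four leaves: leading eigenvalue is sqrt(4) = 2.
star = [[0, 1, 1, 1, 1],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0]]
# Clique on four nodes: leading eigenvalue is 3.
clique = [[0 if i == j else 1 for j in range(4)] for i in range(4)]

print(leading_eigenvalue(star), leading_eigenvalue(clique))
```

For large sparse graphs a sparse eigensolver would be the usual choice; the dense loops here are only meant to keep the example self-contained.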
Our summarization approach
• We want to get a smaller graph with similar eigenvalues:
  o Successively merge nodes
Problem 2: Graph summarization
• Given: a graph with labeled nodes and a compression ratio.
• Find: a coarsened graph that keeps the leading eigenvalue and the node labels.
Our Approach
• Keep the leading eigenvalue:
  o Matrix perturbation approach
  o Based on CoarseNet [Purohit et al. KDD 2014]
• Successively merge nodes:
  o Do not merge nodes with different labels
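The label constraint on merging can be sketched as follows. This is a simplified illustration and not CoarseNet itself: it takes the union of the two neighbourhoods on an unweighted graph and leaves out both the edge reweighting and the matrix-perturbation score that decides which pair to merge. The helper `merge_nodes` and the toy graph are hypothetical.

```python
def merge_nodes(adj, labels, a, b):
    """Merge node b into node a; adj maps each node to its set of neighbours."""
    if labels[a] != labels[b]:
        raise ValueError("nodes with different labels are never merged")
    # Copy the graph without b, then reattach b's neighbours to a.
    merged = {u: set(nbrs) for u, nbrs in adj.items() if u != b}
    for u in adj[b]:
        if u != a:
            merged[a].add(u)
            merged[u].add(a)
    for u in merged:
        merged[u].discard(b)
    new_labels = {u: lab for u, lab in labels.items() if u != b}
    return merged, new_labels

# Path 0-1-2-3 where nodes 0 and 1 are active and nodes 2 and 3 are inactive.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
labels = {0: 1, 1: 1, 2: 0, 3: 0}

summary, summary_labels = merge_nodes(adj, labels, 0, 1)
print(summary, summary_labels)
```

Merging nodes 1 and 2 in this example raises an error, because they carry different labels.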
Goal 2: Segmentation graph
• Nodes:
  o One node for each segment, plus a source (‘s’) and a target (‘t’)
  o Source (‘s’) = start time; target (‘t’) = end time
• Edges:
  o A directed edge connects each pair of adjacent segments
Edge Weights
How can we measure the distance between two segments?
[Figure: edge between two adjacent segments with unknown weight w.]
Our Approach
Step 1: Extract features from summary graphs:
• Easier and more efficient than on the original graphs
• No complex features
• Example feature vector: F = [3.9, 13, ..., 2.2]
Edge Weights
Step 2: Distance of adjacent segments
[Figure: the edge weight w is the distance between two adjacent segments.]
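The slides do not spell out the distance, so the following is only one plausible sketch. It assumes that each segment is represented by the mean of the feature vectors of its summary graphs and that the edge weight is the Euclidean distance between those means. Both choices, and the names `segment_feature` and `edge_weight`, are assumptions made for illustration.

```python
import math

def segment_feature(snapshot_features):
    """Average the per-snapshot feature vectors of one segment (assumed representation)."""
    n = len(snapshot_features)
    return [sum(column) / n for column in zip(*snapshot_features)]

def edge_weight(segment_a, segment_b):
    """Assumed edge weight: Euclidean distance between the two segment representations."""
    return math.dist(segment_feature(segment_a), segment_feature(segment_b))

# Two hypothetical adjacent segments with two features per summary graph.
seg_a = [[1.0, 0.0], [3.0, 0.0]]   # mean feature vector is [2.0, 0.0]
seg_b = [[5.0, 4.0]]               # mean feature vector is [5.0, 4.0]

w = edge_weight(seg_a, seg_b)
print(w)
```

A larger weight means the two adjacent segments look more different, which is what makes a cut between them attractive.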
Goal 3: Finding the best segmentation
• Observation:
  o For each segmentation there is a path from ‘s’ to ‘t’
  o For each path from ‘s’ to ‘t’ there is a segmentation
• Therefore:
  o Best segmentation problem ≡ Path optimization problem
Possible approach
• Longest path?
  o Given a segmentation graph
  o Find the longest path from ‘s’ to ‘t’
• Over-segmentation problem:
  o [Figure: a path from ‘s’ to ‘t’ with many edges of weight 0.01 (sum = 3) is longer than a path with three edges of weight 0.9 (sum = 2.7).]
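The over-segmentation effect can be checked with a few lines of arithmetic. The edge weights 0.01 and 0.9 and the sums 3 and 2.7 come from the slide; the count of 300 small edges is an assumption chosen so that the sum matches.

```python
# Path with many tiny changes (hypothetical count of 300 edges, so the sum is about 3).
many_small = [0.01] * 300
# Path with three large changes, as on the slide (sum is 2.7).
few_large = [0.9] * 3

def total(path):
    """Path length as used by the plain longest-path criterion."""
    return sum(path)

def average(path):
    """Path length divided by the number of edges."""
    return sum(path) / len(path)

# Longest path prefers the over-segmented path; the per-edge average does not.
print(total(many_small) > total(few_large))
print(average(few_large) > average(many_small))
```

Dividing by the number of edges removes the incentive to add cuts that each contribute almost nothing.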
Problem 3: Finding the best segmentation
• Our idea: Average longest path (ALP)
  o Given a segmentation graph
  o Find the average longest path from ‘s’ to ‘t’
• Advantages:
  o Parameter-free
  o Naturally balances the weight of the path against the number of segments
Solving ALP
• Finding the ALP in general graphs is NP-hard.
• The segmentation graph is a DAG → ALP can be solved in polynomial time
• State-of-the-art algorithm [Waggoner et al. WACV 2013]
  o Time complexity: cubic. Not scalable!
Our Solution: LAYERED-ALP
• Dynamic programming
• Optimal solution
• lp1 = longest path with 1 segment
• lp2 = longest path with 2 segments
• lp4 = longest path with 4 segments
Our Solution: LAYERED-ALP
• Steps: build layers → find LP in each layer → find ALP
• Time complexity: linear!
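The layered idea can be sketched as a dynamic program over path lengths. This is an illustrative reading of the slide, not the authors' LAYERED-ALP code: layer k stores, for every node, the heaviest path from ‘s’ that uses exactly k edges, and the answer is the layer whose path to ‘t’ has the best weight per edge. Normalizing by the number of edges (rather than segments) and the function name `layered_alp` are assumptions, and this simple version makes no claim to match the running time stated on the slide.

```python
def layered_alp(edges, source="s", target="t"):
    """edges maps (u, v) to a weight on a DAG; returns (best average, path) or None."""
    nodes = {u for u, _ in edges} | {v for _, v in edges}
    # layers[k][v] = (weight, predecessor) of the heaviest source-to-v path with k edges.
    layers = [{source: (0.0, None)}]
    best = None
    for k in range(1, len(nodes)):
        current = {}
        for (u, v), w in edges.items():
            if u in layers[-1]:
                candidate = layers[-1][u][0] + w
                if v not in current or candidate > current[v][0]:
                    current[v] = (candidate, u)
        if not current:
            break
        layers.append(current)
        if target in current:
            avg = current[target][0] / k
            if best is None or avg > best[0]:
                best = (avg, k)
    if best is None:
        return None
    # Walk the predecessor pointers back from the target through the layers.
    avg, k = best
    path, node = [target], target
    for depth in range(k, 0, -1):
        node = layers[depth][node][1]
        path.append(node)
    return avg, path[::-1]

# Hypothetical toy DAG: a short heavy path and a longer path with a larger total.
toy = {
    ("s", "a"): 0.9, ("a", "t"): 0.9,
    ("s", "b"): 0.5, ("b", "c"): 0.5, ("c", "d"): 0.5, ("d", "t"): 0.5,
}
print(layered_alp(toy))
```

On the toy graph the plain longest path would follow s, b, c, d, t (total 2.0), while the average criterion selects s, a, t.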
Complete algorithm
• Time complexity: sub-quadratic
Complete algorithm: Parallel
[Formula: time complexity of the parallel algorithm.]
Experiments: datasets
• Different domains with a range of sizes:
  o BA-degree: Random Barabási-Albert graph
  o AS-Oregon: Autonomous Systems peering information
  o Higgs: Tweets dataset (with the follower-followee network)
  o Portland: Contact network between people of Portland
  o Memetracker: Who-copies-from-whom blog and website network
  o IranElect: Follower-followee network of Twitter related to the Iran election
  o DBLP: Co-authorship network related to the ‘network’ topic
Experiments: baselines
Feature extraction & time series:
• DYNAMMO [Li et al. KDD 2009]:
  o Change-point detection (reconstruction errors)
  o # segments = # segments of SnapNETS
• K-means [Likas et al. Pattern Recognition 2003]:
  o Segment when a new cluster is detected
Dynamic graph:
• VOG [Koutra et al. SDM 2014]:
  o 10 most important sub-structures
  o Cut when the set of sub-structures changes significantly (threshold = the one that gives the best result)
Experiments: baselines-variations
• SN-ORIG: Original graphs instead of summary graphs
• SN-LP: Longest Path instead of ALP
• SN-GREEDY: Greedy approach instead of ALP
Experiments: Quantitative analysis
• SnapNETS outperforms the baselines
• Clear patterns in summary graphs
• Infection moves to a new community
[Figure: AS-Oregon.]
Case studies: Memetracker
Meme: “Can I call you Joe?”
Event: Televised vice-presidential debates
• Summary graphs are close to the case when all nodes have the same label (f5)
• Random nodes are active (f8)
• Summary graphs are substantially sparser (f2)
• Many active nodes got merged into important nodes such as CNN and BBC to form hubs (f6)
Case studies: AS-Oregon
• New community → New segment
Scalability
[Figures: scalability of SnapNETS; speedup by parallelizing the construction of the segmentation graph. Annotation: near-linear.]
Discussion: SnapNETS
• Patterns:
  o The ‘placement’ and ‘connection’ of active/inactive nodes: structural (e.g. community/role/centrality) and rate changes
• Global method:
  o SnapNETS is a ‘global’ method and not simply a change-point detection method
• Key components:
  o Graph summarization and features
  o Average Longest Path
• Properties:
  o P1. Parameter-free
  o P2. Comprehensive
  o P3. Scalable
Future Work
• Handle dynamic graphs with varying nodes and edges
• More node labels and real-valued features
• Work with partially observed graphs
Any questions?
Sorour E. Amiri, Liangzhe Chen, B. Aditya Prakash
Code at: https://github.com/SorourAmiri/SnapNETS
Summary:
• Goal 1: Graph summarization
  o Successively merge nodes
  o Keep leading eigenvalue
  o Keep same set of labels
• Goal 2: Segmentation graph
  o Nodes
  o Edges
  o Edge weights
• Goal 3: Finding the best segmentation
  o ALP
[Figure: SnapNETS result.]

Editor's Notes

  • #4: What is a network sequence? Consider the flu. It can propagate among people: an infected person can infect another if they have any sort of contact, for example if the infected person sneezes or if they shake hands. At an abstract level, as we can see in the figure, different people in a contact network get infected over time. In the figure, the yellow nodes are infected. We call this sequence of graphs with infected and uninfected nodes a network sequence. Another example comes from social networks. Assume users form a friendship network in which they are connected if they are friends on social media. A meme can spread over this network. When a user tweets about the meme, we consider that user active. So we get a sequence of graph snapshots in which different users are active over time.
  • #5: Our goal is to make sense of network sequences. In the flu propagation example, we want to detect when the infection patterns change. Our result is a set of cut points such as the ones shown here. They show that at the beginning the infection looks like a star, in the next segment the bridge nodes are getting infected, and in the last segment a new community is getting infected. These changes can be caused by virus mutation, vaccination, or other reasons.
  • #6: In the social media example, we want to find when the activation patterns change. The desired segmentation could look like this: at the beginning nodes become active in a star shape, and then in a clique shape. This change of pattern can be caused by an important event.
  • #7: So we define our network sequence segmentation problem here. Given a network sequence with node labels, find the best segmentation, which captures different distributions of node labels. As we saw in the example, the infected nodes first form a star, then a bridge, and finally a near clique. We also assume that the node labels are binary, for example 0/1, active/inactive, or infected/uninfected.
  • #9: I will give you some background on alternative approaches.
  • #12: Another possible approach is to extract the active subgraph from each graph snapshot. In this case we extract the infected nodes. We then have a sequence of networks with different structures in which all nodes have the same label, that is, a sequence of plain dynamic graphs. So we can use any dynamic graph segmentation method, such as TimeCrunch, to detect the cut points of the segmentation.
  • #16: Our main idea is to build a directed acyclic graph as a data structure. It maps the exponential search space of finding the best segmentation to a polynomial one. In this graph there is a node for each segment. As we can see… we also have two dummy nodes, source and target, which mark the start and end time of the sequence. There is an edge between adjacent segments. For each possible segmentation there is a path from source to target in the segmentation graph, and for each path there is a segmentation of the sequence. So the best segmentation problem is equivalent to the path optimization problem.
  • #25: Smaller size summarization maintains the relevant important properties effectively
  • #26: Smaller size summarization maintains the relevant important properties effectively
  • #27: Smaller size summarization maintains the relevant important properties effectively
  • #28: Smaller size summarization maintains the relevant important properties effectively
  • #39: BA-degree: We activate highest degree and then lowest degree nodes on a
  • #40: BA-degree: We activate highest degree and then lowest degree nodes on a
  • #41: BA-degree: We activate highest degree and then lowest degree nodes on a
  • #43: We also ran some case studies of our method. Here we track a popular meme in the blog and website network. The meme is the phrase “Can I call you Joe?”, which Sarah Palin used in the vice-presidential debate. We consider nodes active if they post anything related to this meme. We find a cut point that matches the vice-presidential debate. This figure shows the sequence of summary graphs. The change of pattern is obvious here. The first segment shows nodes becoming active at random. In the second segment, the summaries are sparser and important nodes such as news websites form hubs in the graph.