SnapNETS: Automatic Segmentation of Network Sequences with Node Labels

Sorour E. Amiri, Liangzhe Chen, B. Aditya Prakash
Department of Computer Science
Virginia Tech
AAAI, San Francisco, USA, February 9, 2017

Outline
 Motivation
 Alternative Approaches
 Our Proposed Method: SnapNETS
 Experiments
 Conclusion

Network Sequences
 Epidemiology: disease spreads over contact networks
 Social Media: Information spreads over friendship networks
[Figure: two example network sequences G1 to G4. Flu: nodes are Uninfected or Infected. Meme: nodes are Inactive or Active.]

Making sense of network sequences
Flu: when do the infection patterns change?
Reason:
• Virus mutation
• Vaccination
• …
[Figure: flu sequence G1 to G4 with Uninfected and Infected nodes; pattern labels: Star, Bridge, Near Clique]

Making sense of network sequences
Meme: when do the activation patterns change?
Reason:
• Event
• …
[Figure: meme sequence G1 to G4 with Inactive and Active nodes; pattern labels: Star, Clique]

Problem 1: Network sequence segmentation
 Given a sequence of networks with labeled nodes,
 Find the best segmentation which captures:
 Changes in the distribution of node labels.
[Figure: sequence G1 to G4 with pattern labels Star, Bridge, Near Clique]
In this work:
 Binary labels {0, 1}

Desirable Properties
 P1. Parameter-free:
• No threshold, No fixed granularity
 P2. Comprehensive:
• Use the entire graph
 P3. Scalable

Alternative 1: Feature Ext. & Time-series
Step 1: Feature Extraction
F1: #cliques (of active subgraph)
F2: #ladders (of inactive subgraph)
F3: #ladders (of active subgraph)
Example feature vectors: [0 0 0 … 2], [1 1 0 … 0], [0 0 0 … 1]
Step 2: Time-series segmentation
[Henderson et al. 2010] [Likas, Vlassis, and Verbeek 2003] [Li et al. 2009]
[Figure: features time series F1, F2, F3 over G1 to G4, values from -1 to 2]

Alternative 1: Feature Ext. & Time-series
 Drawbacks:
 Laborious feature-engineering
o # Cliques
o # Ladders
 “Local” change detection:
o One aggregation time period
o Threshold
[Figure: features time series F1, F2, F3 over G1 to G4, values from -1 to 2]
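
To make the 'local' drawback concrete, here is a minimal sketch of threshold-based change detection on a feature time series. The function name `local_change_points`, the Euclidean distance, and the feature values are assumptions for illustration; they are not taken from the cited baselines.

```python
import math


def local_change_points(features, threshold):
    """Return the indices t at which snapshot t starts a new segment.

    Assumption: a cut is placed whenever the Euclidean distance between
    the feature vectors of two consecutive snapshots exceeds `threshold`.
    """
    cuts = []
    for t in range(1, len(features)):
        if math.dist(features[t - 1], features[t]) > threshold:
            cuts.append(t)
    return cuts


# Hypothetical feature vectors (F1, F2, F3) for G1..G4.
features = [[0, 1, 0], [0, 1, 0], [2, 0, 1], [2, 0, 1]]
print(local_change_points(features, threshold=1.0))
print(local_change_points(features, threshold=3.0))
```

The same input yields a different segmentation for each threshold, and each decision looks only at two neighbouring snapshots, which is the behaviour the slide lists as a drawback.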

Alternative 2: Plain-graph-based analysis
Step 1: Extract active subgraphs
Step 2: Dynamic graph segmentation
[Shah et al. 2015] [Sun et al. 2007] [Lin et al. 2009] [Qu et al. 2014]
[Figure: snapshots G1 to G4 and the active subgraphs extracted from them]
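
Step 1 can be sketched as taking the subgraph induced by the active nodes. The function name `active_subgraph` and the edge-list and label-dictionary representation are assumptions made for this example.

```python
def active_subgraph(edges, labels):
    """Return the active nodes and the edges whose two endpoints are active.

    Assumption: `labels` maps each node to 1 (active) or 0 (inactive).
    """
    active = {node for node, label in labels.items() if label == 1}
    kept = [(u, v) for u, v in edges if u in active and v in active]
    return active, kept


# Hypothetical snapshot: a path a-b-c-d in which only b and c are active.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
labels = {"a": 0, "b": 1, "c": 1, "d": 0}
nodes, kept = active_subgraph(edges, labels)
print(sorted(nodes), kept)
```

Everything about how `b` and `c` sit among the inactive nodes `a` and `d` is discarded at this step, so later analysis sees only the active part.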

Alternative 2: Plain-graph-based analysis
 Drawbacks:
Inactive nodes are important to detect different patterns
[Figure: entire graph versus dynamic graph segmentation over G1 to G4; annotations: "Chain", "Roles are different"]

Desirable Properties
 P1. Parameter-free:
• No threshold, No fixed granularity
 P2. Comprehensive:
• Use the entire graph
 P3. Scalable
Comparison of SnapNETS
[Table: P1 to P3 for SnapNETS, feature eng. and time series, and plain-graph-based analysis]

Outline
 Motivation
 Main Idea and Overview
 Goal 1: Summarizing Act-snapshots
 Goal 2: Constructing the segmentation graph
 Goal 3: Finding the best segmentation
 Experiments
 Conclusion

Main Idea: Segmentation graph
 Nodes:
 For each segment there is a node + {Source (‘s’), Target (‘t’)}
 Source (‘s’) = start time, Target (‘t’) = end time
 Edges:
 There is a directed edge between the nodes of adjacent segments
Best segmentation problem ≡ Path optimization problem
[Figure: input sequence and its segmentation graph]
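
A minimal sketch of how such a segmentation graph could be enumerated for a short sequence. The tuple encoding `(i, j)` for a segment covering snapshots i to j-1 and the function name `build_segmentation_graph` are assumptions for illustration, not the authors' data structure.

```python
from itertools import combinations


def build_segmentation_graph(num_snapshots):
    """Return (nodes, edges) of an unweighted segmentation graph.

    Assumption: segment (i, j) covers snapshots i .. j-1, and segment
    (i, j) is adjacent to every segment (j, k) that starts where it ends.
    """
    segments = list(combinations(range(num_snapshots + 1), 2))
    nodes = ["s"] + segments + ["t"]
    edges = []
    for i, j in segments:
        if i == 0:
            edges.append(("s", (i, j)))      # segment starts the sequence
        if j == num_snapshots:
            edges.append(((i, j), "t"))      # segment ends the sequence
        for k in range(j + 1, num_snapshots + 1):
            edges.append(((i, j), (j, k)))   # adjacent in time
    return nodes, edges


nodes, edges = build_segmentation_graph(4)
print(len(nodes), len(edges))
```

Every edge points forward in time, so the graph is a DAG, and the number of candidate segments grows quadratically with the number of snapshots.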

Overview of SnapNETS
 Goal 1. Summarize each graph:
Keep structural and label dependent properties
 Goal 2. Construct Segmentation graph:
Define nodes and edges
Define edge weights
o extract the features of summarized graphs
 Goal 3. Find the best segmentation:
Define the best segmentation (path)
Compute the best segmentation

Technical Challenges
 Using the entire graph snapshots:
 Summarize graph while satisfying P2
 Finding the number of segments:
 Compute segmentation while satisfying P1
Reminder:
 P1. Parameter-free
 P2. Comprehensive
 P3. Scalable

Goal 1: Summarizing graph snapshots
 We want to preserve
 Structural properties
 Node labels
 Role of the eigenvalue:
The leading eigenvalue of the adjacency matrix gives the epidemic threshold in most diffusion models [Prakash et al. ICDM 2011]
Same leading eigenvalue of the adjacency matrix ⇒ same diffusive properties
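
As an illustration of the quantity being preserved, the sketch below estimates the leading eigenvalue of a small adjacency matrix with shifted power iteration in plain Python. The function name, the shift, and the iteration count are assumptions chosen for the example, and the input is assumed to be a symmetric, non-negative matrix.

```python
def leading_eigenvalue(adj, iterations=500, shift=1.0):
    """Estimate the largest eigenvalue of a symmetric, non-negative matrix.

    Power iteration is run on (A + shift * I) so that bipartite graphs,
    whose spectrum is symmetric around zero, still converge.
    """
    n = len(adj)
    vec = [1.0] * n
    for _ in range(iterations):
        nxt = [sum(adj[i][j] * vec[j] for j in range(n)) + shift * vec[i]
               for i in range(n)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the converged unit vector with the original A.
    av = [sum(adj[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(v * a for v, a in zip(vec, av))


# Star with 4 leaves (node 0 is the hub) and a clique on 5 nodes.
star = [[0, 1, 1, 1, 1]] + [[1, 0, 0, 0, 0] for _ in range(4)]
clique = [[0 if i == j else 1 for j in range(5)] for i in range(5)]
print(leading_eigenvalue(star), leading_eigenvalue(clique))
```

The star and the clique have the same number of nodes but leading eigenvalues near 2 and 4, which is why the two structures behave differently under diffusion.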

Our summarization approach
 We want to get a smaller graph with similar eigenvalues:
Successively merge nodes

Problem 2: Graph summarization
 Given: A graph with labeled nodes and a compression ratio.
 Find: a coarsened graph such that its leading eigenvalue stays close to that of the original graph and only nodes with the same label are merged.

Our Approach
 Keep leading eigenvalue
 Matrix perturbation approach
 Based on CoarseNet [Purohit et al. KDD 2014]
 Successively merge nodes
 Do not merge nodes with different labels
[Figure: coarsening example with weighted edges (0.1, 0.2, 0.2, …)]
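
A brute-force sketch of label-aware coarsening under stated assumptions. It is not CoarseNet: instead of a matrix perturbation score it recomputes the leading eigenvalue for every candidate merge, it only merges adjacent nodes, and the merged node keeps the larger of the two edge weights. All function names are hypothetical.

```python
from itertools import combinations


def leading_eigenvalue(adj, iterations=200, shift=1.0):
    """Shifted power iteration for a symmetric, non-negative matrix."""
    n = len(adj)
    vec = [1.0] * n
    for _ in range(iterations):
        nxt = [sum(adj[i][j] * vec[j] for j in range(n)) + shift * vec[i]
               for i in range(n)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    av = [sum(adj[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(v * a for v, a in zip(vec, av))


def merge_nodes(adj, labels, a, b):
    """Contract node b into node a (a < b); keep the larger edge weight."""
    keep = [i for i in range(len(adj)) if i != b]
    merged = []
    for i in keep:
        row = []
        for j in keep:
            if i == j:
                row.append(0.0)
            elif i == a:
                row.append(max(adj[a][j], adj[b][j]))
            elif j == a:
                row.append(max(adj[i][a], adj[i][b]))
            else:
                row.append(adj[i][j])
        merged.append(row)
    return merged, [labels[i] for i in keep]


def coarsen(adj, labels, target_size):
    """Greedily merge adjacent same-label nodes, changing the leading
    eigenvalue as little as possible, until target_size nodes remain."""
    while len(adj) > target_size:
        base = leading_eigenvalue(adj)
        best = None
        for a, b in combinations(range(len(adj)), 2):
            if labels[a] != labels[b] or adj[a][b] == 0:
                continue  # never merge across labels or non-neighbours
            cand_adj, cand_labels = merge_nodes(adj, labels, a, b)
            change = abs(leading_eigenvalue(cand_adj) - base)
            if best is None or change < best[0]:
                best = (change, cand_adj, cand_labels)
        if best is None:
            break  # no admissible merge is left
        _, adj, labels = best
    return adj, labels


# Path 0-1-2-3 with labels 1, 1, 0, 0.
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
summary, summary_labels = coarsen(path, [1, 1, 0, 0], target_size=2)
print(summary, summary_labels)
```

Asking for a single node on the same input stops at two nodes, because the remaining pair carries different labels.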

Goal 2: Segmentation graph
 Nodes:
 For each segment there is a node + {Source (‘s’), Target (‘t’)}
 Source (‘s’) = start time, Target (‘t’) = end time
 Edges:
 There is a directed edge between the nodes of adjacent segments

Edge Weights
How can we measure the distance between two segments, i.e. the edge weight w?

Our Approach
 Step 1: Extract features from summary graphs:
Easier and more efficient than on original graphs.
No complex features
F = [3.9, 13,..., 2.2]

Step 2: Distance of adjacent segments
[Figure: the edge weight w between two adjacent segments]
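
One way to turn per-snapshot feature vectors into an edge weight is sketched below. Representing a segment by the mean of its snapshot features and using the Euclidean distance are assumptions for illustration; the slide does not state the distance used, and the names `segment_feature` and `edge_weight` are hypothetical.

```python
import math


def segment_feature(features, start, end):
    """Mean feature vector of snapshots start .. end-1 (an assumption)."""
    rows = features[start:end]
    return [sum(column) / len(rows) for column in zip(*rows)]


def edge_weight(features, left, right):
    """Distance between two adjacent segments given as (start, end) pairs."""
    return math.dist(segment_feature(features, *left),
                     segment_feature(features, *right))


# Hypothetical features of four summary graphs.
features = [[0, 0], [0, 0], [3, 4], [3, 4]]
print(edge_weight(features, (0, 2), (2, 4)))
print(edge_weight(features, (0, 1), (1, 2)))
```

A cut between snapshots that look alike gets a small weight, while a cut that separates two different regimes gets a large one.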

Goal 3: Finding the best segmentation
 Observation:
For each segmentation there is a path from ‘s’ to ‘t’
For each path from ‘s’ to ‘t’ there is a segmentation
 Therefore,
• Best segmentation problem ≡ Path optimization problem
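
The correspondence between segmentations and paths can be sketched as a pair of conversions. The encoding of a segment as `(start, end)` and both function names are assumptions for this example.

```python
def segmentation_to_path(cut_points, num_snapshots):
    """Map cut points to the path 's' -> segments -> 't'."""
    bounds = [0] + sorted(cut_points) + [num_snapshots]
    return ["s"] + list(zip(bounds[:-1], bounds[1:])) + ["t"]


def path_to_segmentation(path):
    """Recover the cut points from a path that starts at 's' and ends at 't'."""
    segments = path[1:-1]
    return [end for _, end in segments[:-1]]


path = segmentation_to_path([2, 3], num_snapshots=4)
print(path)
print(path_to_segmentation(path))
```

Converting in one direction and back returns the original cut points, which is the one-to-one mapping the observation relies on.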

Possible approach
 Longest path?
Given a segmentation graph
Find the longest path from ‘s’ to ‘t’
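Over-segmentation problem
[Figure: a path from ‘s’ to ‘t’ with many edges of weight 0.01 has Sum = 3, while a path with three edges of weight 0.9 has Sum = 2.7, so the longest path prefers the many small segments]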

Problem 3: Finding the best segmentation
 Our idea: Average longest path
 Advantages:
 Parameter free
 Naturally balances weight of the path with the number of segments.
Given a segmentation graph
Find the average longest path from ‘s’ to ‘t’
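
A small numeric sketch of why averaging helps, using two hypothetical paths: one with many edges of weight 0.01 and one with three edges of weight 0.9. The count of 300 small edges is an assumption chosen so that the totals land near 3 and 2.7.

```python
def total_weight(path_weights):
    """Objective of the plain longest path."""
    return sum(path_weights)


def average_weight(path_weights):
    """Objective of the average longest path (mean edge weight)."""
    return sum(path_weights) / len(path_weights)


many_small = [0.01] * 300   # assumption: 300 tiny cuts
few_large = [0.9] * 3

print(total_weight(many_small) > total_weight(few_large))
print(average_weight(few_large) > average_weight(many_small))
```

The total favours the over-segmented path, while the average favours the path with a few meaningful cuts, without any threshold on the number of segments.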

Solving ALP
 Finding the ALP in general graphs is NP-hard.
 The segmentation graph is a DAG ⇒ ALP can be solved in polynomial time
 State-of-the-art algorithm [Waggoner et al. WACV 2013]
Time complexity:
Cubic: Not scalable!

Our Solution: LAYERED-ALP
 Dynamic Programming
 Optimal solution
lp1 = Longest path with 1 segment
lp2 = Longest path with 2 segments
lp4 = Longest path with 4 segments

Our Solution: LAYERED-ALP
Build layers → Find LP in each layer → Find ALP
Time complexity: Linear!
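
A minimal sketch of a layered dynamic program for the average longest path on a DAG. It follows the idea on the slides (longest path per number of edges, then the best average), but it is not claimed to be the authors' LAYERED-ALP or to match its stated running time; the function name and the edge-list input are assumptions.

```python
from collections import defaultdict


def average_longest_path(edges, source, target):
    """Return (best average edge weight, path) from source to target.

    edges: list of (u, v, weight) describing a DAG.
    layers[k][v] holds (largest total weight of a source -> v path with
    exactly k edges, predecessor of v on that path).
    """
    adj = defaultdict(list)
    nodes = {source, target}
    for u, v, w in edges:
        adj[u].append((v, w))
        nodes.update((u, v))

    layers = [{source: (0.0, None)}]
    while layers[-1] and len(layers) <= len(nodes):
        nxt = {}
        for u, (total, _) in layers[-1].items():
            for v, w in adj[u]:
                if v not in nxt or total + w > nxt[v][0]:
                    nxt[v] = (total + w, u)
        layers.append(nxt)

    best_avg, best_k = None, None
    for k in range(1, len(layers)):
        if target in layers[k]:
            avg = layers[k][target][0] / k
            if best_avg is None or avg > best_avg:
                best_avg, best_k = avg, k
    if best_k is None:
        return None, []

    # Walk the predecessors back from the target, one layer at a time.
    path, node = [target], target
    for k in range(best_k, 0, -1):
        node = layers[k][node][1]
        path.append(node)
    return best_avg, path[::-1]


# Hypothetical DAG: the 4-edge path has the largest sum (2.0),
# the 2-edge path has the largest average (0.8).
edges = [("s", "t", 0.5),
         ("s", "a", 1.0), ("a", "t", 0.6),
         ("s", "b", 0.5), ("b", "c", 0.5), ("c", "d", 0.5), ("d", "t", 0.5)]
best_avg, best_path = average_longest_path(edges, "s", "t")
print(best_avg, best_path)
```

This sketch touches every edge once per layer, so its cost is the number of layers times the number of edges.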

Complete algorithm
Time complexity:
Sub-quadratic

Complete algorithm: Parallel
Time complexity:

Experiments: datasets
 Different domains with a range of sizes:
 BA-degree: Random Barabási-Albert graph
 AS-Oregon: Autonomous Systems peering information
 Higgs: Tweets dataset (with the follower-followee network)
 Portland: Contact network between people of Portland
 Memetracker: Who-copies-from-whom blog and website network
 IranElect: Follower-followee network of Twitter related to the Iran election.
 DBLP: Co-authorship network related to ‘network’ topic.

Experiments: baselines
Feature extraction & time series:
 DYNAMMO [Li et al. KDD 2009]:
 Change point detection (reconstruction errors)
 # segments = # segments of SnapNETS.
 K-means [Likas et al. Pattern Recognition 2003]:
 Segment when a new cluster is detected
Dynamic graph:
 VOG [Koutra et al. SDM 2014]:
 10 most important sub-structures
 Cut when the set of sub-structures changes significantly
o (threshold = the one that gives the best result)

Experiments: baselines-variations
 SN-ORIG: Original graphs instead of summary graphs
 SN-LP: Longest Path instead of ALP
 SN-GREEDY: Greedy Approach instead of ALP

Experiments: Quantitative analysis
 SnapNETS outperforms the baselines
 Clear patterns in summary graphs
 Infection moves to new community
[Figure: AS-Oregon]

Case studies: Memetracker
[Figure annotations: meme "Can I call you Joe?"; event: televised vice-presidential debates]
 Summary graphs are close to the case when all nodes have the same label (f5)
 Random nodes are active (f8)
 Summary graphs are substantially sparser (f2).
 Many active nodes got merged into important nodes such as CNN and BBC to form hubs (f6)

Case studies: AS-Oregon
 New community ⇒ New segment

Scalability
Near-linear
[Figures: scalability of SnapNETS; speedup by parallelizing construction of the segmentation graph]

Discussion: SnapNETS
 Patterns:
 the ‘placement’ and ‘connection’ of active/inactive nodes:
• structural (e.g. community/role/centrality)
• rate changes.
 Global method:
 SnapNETS is a ‘global’ method and not simply a change-point detection method.
[Callouts: Graph summarization and features; Average Longest Path]
 Properties:
P1. Parameter-free
P2. Comprehensive
P3. Scalable

Future Work
 Handle dynamic graphs with varying nodes and edges
 More node labels and real-valued features
 Work with partially observed graphs

Any questions?
Code at: https://github.com/SorourAmiri/SnapNETS
Sorour E. Amiri, Liangzhe Chen, B. Aditya Prakash
Funding: [sponsor logos]
 Goal 1: Graph summarization
 Successively merge nodes
 Keep leading eigenvalue
 Keep same set of labels
 Goal 2: Segmentation graph
 Nodes
 Edges
 Edge weights
 Goal 3: Finding the best segmentation
 ALP
[Figure: SnapNETS result]
