A New Algorithm Model for Massive-Scale Streaming Graph Analysis
E. Jason Riedy, Chunxing Yin, and David A. Bader
Georgia Institute of Technology
SIAM Workshop on Network Science, 14 July 2017
Outline
Motivation and Applications
Current and Future STINGER Models
Closing
Streaming Graphs — SIAM NS, 14 July 2017 1/19
Motivation and Applications
(insert prefix here)-scale data analysis
Cyber-security: Identify anomalies, malicious actors
Health care: Finding outbreaks, population epidemiology
Social networks: Advertising, searching, grouping
Intelligence: Decisions at scale, regulating markets, smart & sustainable cities
Systems biology: Understanding interactions, drug design
Power grid: Disruptions, conservation
Simulation: Discrete events, cracking meshes
Changes are important. Cannot stop the world...
Potential Applications
• Social Networks
• Identify communities, influences, bridges, trends,
anomalies (trends before they happen)...
• Potential to help social sciences, city planning, and
others with large-scale data.
• Cybersecurity
• Determine whether new connections can access a device or
represent a new threat in < 5ms...
• Is the transfer by a virus / persistent threat?
• Bioinformatics, health
• Construct gene sequences, analyze protein
interactions, map brain interactions
• Credit fraud forensics ⇒ detection ⇒ monitoring
• Real-time integration of all the customer’s data
Streaming graph data
Network data rates:
• Gigabit ethernet: 81k – 1.5M packets per second
• Over 130 000 flows per second on 10 GigE (< 7.7 µs per flow)
Person-level data rates:
• 500M posts per day on Twitter (6k / sec) [1]
• 3M posts per minute on Facebook (50k / sec) [2]
Should analyze only changes and not entire graph.
Throughput & latency trade off and expose different
levels of concurrency.
[1] www.internetlivestats.com/twitter-statistics/
[2] www.jeffbullas.com/2015/04/17/21-awesome-facebook-facts-and-statistics-you-need-to-check-out/
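The quoted rates can be checked with a few lines of arithmetic. The sketch below is only a sanity check: the on-wire frame sizes (84 and 1538 bytes, counting preamble and inter-frame gap) are standard Ethernet figures assumed here, not numbers taken from the slides.

```python
# Back-of-the-envelope check of the data rates quoted above.
LINK_BITS_PER_SEC = 1e9  # gigabit Ethernet

# Assumed on-wire sizes in bytes: frame plus 8 bytes of preamble and a
# 12-byte inter-frame gap (64 + 20 for the smallest frame, 1518 + 20 for
# the largest standard frame).
MIN_WIRE_BYTES = 84
MAX_WIRE_BYTES = 1538


def packets_per_second(wire_bytes: int) -> float:
    """Packets per second on a saturated link for a given on-wire size."""
    return LINK_BITS_PER_SEC / (8 * wire_bytes)


low_pps = packets_per_second(MAX_WIRE_BYTES)   # large frames: roughly 81k
high_pps = packets_per_second(MIN_WIRE_BYTES)  # small frames: roughly 1.5M

flow_budget_us = 1e6 / 130_000     # microseconds available per flow
twitter_per_sec = 500e6 / 86_400   # 500M posts per day
facebook_per_sec = 3e6 / 60        # 3M posts per minute

print(f"{low_pps:.0f} to {high_pps:.0f} packets/s")
print(f"{flow_budget_us:.2f} us per flow")
print(f"{twitter_per_sec:.0f} and {facebook_per_sec:.0f} posts/s")
```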
Streaming graph analysis
Terminology (more detail later):
• Streaming changes into a massive, evolving graph
• Will compare models later...
• Need to handle deletions as well as insertions
Previous STINGER performance results (x86-64):
• Data ingest: >2M upd/sec [Ediger, McColl, Poovey, Campbell, & Bader 2014]
• Clustering coefficients: >100K upd/sec [R, Meyerhenke, B, E, & Mattson 2012]
• Connected comp.: >1M upd/sec [McColl, Green, & B 2013]
• Community clustering: >100K upd/sec∗ [R & B 2013]
• PageRank: Up to 40× latency improvement [R 2016]
Current and Future STINGER Models
STINGER: Framework for streaming graphs
Slide credit: Rob McColl and David Ediger
• OpenMP + sufficiently POSIX-ish
• Multiple processes for resilience
Current STINGER model
Input: batch of insertions / deletions for the STINGER graph.
1. Pre-process batch: sort by source vertex, reconcile ins/del.
2. Pre-change hook.
3. Alter graph (may “age off” old edges).
4. Post-change hook.
Diagram labels: affected vertices, change in metric.
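The batch pipeline above can be sketched in a few lines. This is an illustration only, not STINGER's API: the `Graph` dictionary, the `preprocess_batch` and `apply_batch` names, the hook signatures, and the "last action per edge wins" reconciliation rule are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Update = Tuple[int, int, int]  # (source, destination, op): +1 insert, -1 delete
Graph = Dict[int, Set[int]]    # hypothetical stand-in for the STINGER graph


def preprocess_batch(batch: Iterable[Update]) -> List[Update]:
    """Reconcile ins/del (assumed rule: last action per edge wins), then sort."""
    last_op: Dict[Tuple[int, int], int] = {}
    for src, dst, op in batch:
        last_op[(src, dst)] = op
    # Tuples sort by source vertex first, then destination.
    return sorted((src, dst, op) for (src, dst), op in last_op.items())


def apply_batch(
    graph: Graph,
    batch: Iterable[Update],
    pre_change: Callable[[Graph, List[Update]], None],
    post_change: Callable[[Graph, Set[int]], None],
) -> Set[int]:
    """Run the hooks around the graph change and return the affected vertices."""
    updates = preprocess_batch(batch)
    pre_change(graph, updates)          # analyses see the graph before the change
    affected: Set[int] = set()
    for src, dst, op in updates:
        neighbors = graph.setdefault(src, set())
        if op > 0:
            neighbors.add(dst)
        else:
            neighbors.discard(dst)
        affected.update((src, dst))
    post_change(graph, affected)        # analyses update their metrics here
    return affected


log = []
graph: Graph = {0: {1}}
batch = [(2, 3, +1), (0, 1, -1), (0, 2, +1), (2, 3, -1)]
affected = apply_batch(
    graph,
    batch,
    pre_change=lambda g, ups: log.append(("pre", len(ups))),
    post_change=lambda g, aff: log.append(("post", sorted(aff))),
)
```

Because every analysis hook runs inside the batch step, each added analysis lengthens the time before the next batch can be applied.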
Is STINGER’s current model good enough?
Data ingest rates, R-MAT into R-MAT, scales 24 & 30
[Figure: two log-scale panels against batch size (1 to 1e+05): update rate (upd/s) and average update time (s), for platforms Power8, Haswell, and Haswell-30.]
Want to add analysis clients without slowing data ingest!
Note that scale 30 starts with 1.1B vertices, 17B edges...
(Different STINGER internal parameters.)
What if we don’t hold up changes?
When is an algorithm valid?
Analyze concurrently with the graph changes, and
produce a result correct for the starting graph and
some subset of concurrent changes. [3]
• No locking beyond atomic operations.
• No versioned data structure.
• No stopping.
[3] Chunxing Yin, Riedy, Bader. “Validity of Graph Algorithms on Streaming Data.” 2017. (in submission)
Sample of other execution models
• Put in a query, wait for sufficient data [Phillips, et al.
at Sandia]
• Different but very interesting model.
• Evolving: Sample, accurate w/high-prob.
• Difficult to generalize into graph results (e.g.
shortest path tree).
• Classical: dynamic algorithms, versioned data
• Can require drastically more storage, possibly a copy
of the graph per property, or more overhead for
techniques like read-copy-update.
We are assuming we cannot “re-run” the world and must
keep up.
Algorithm validity in our model: Example.
Can you compute degrees in an undirected graph (no self
loops) concurrently with changes?
Algorithm: Iterate over vertices, count the number of
neighbors.
Diagram: compute deg(v1) and read 1; the edge is deleted; compute deg(v2) and read 0. Reported degrees: (1, 0).
Cannot correspond to an undirected graph at all!
Valid for our model? No!
Not incorrect, just not valid for our model.
Algorithm validity in our model: Example.
Can you compute degrees in an undirected graph (no self
loops) concurrently with changes?
Algorithm: Iterate over edges, increment the degrees of
the endpoints.
Diagram: one step increments deg(v1) and deg(v2), giving (1, 1); the edge is deleted later, and the reported degrees remain (1, 1).
Corresponds to the beginning graph plus a subset of
concurrent changes.
Valid for our model? Yes!
Undirected stored as directed: skip edges with v1 ≥ v2.
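The two degree algorithms can be compared with a small deterministic simulation. This is a sketch under stated assumptions: the "concurrent" deletion is injected at a fixed point through a hypothetical `concurrent_change` callback instead of running in a separate thread, and the graph is a plain dictionary of neighbor sets.

```python
def make_graph():
    """One undirected edge {1, 2}, stored as two directed edges."""
    return {1: {2}, 2: {1}}


def delete_edge(adj, u, w):
    """Remove the undirected edge {u, w} from both neighbor sets."""
    adj[u].discard(w)
    adj[w].discard(u)


def degrees_by_vertex(adj, concurrent_change):
    """Count neighbors vertex by vertex; the change lands between two reads."""
    deg = {}
    for i, v in enumerate(sorted(adj)):
        deg[v] = len(adj[v])
        if i == 0:
            concurrent_change(adj)
    return deg


def degrees_by_edge(adj, concurrent_change):
    """Visit each edge once (skipping u >= w) and increment both endpoints."""
    deg = {v: 0 for v in adj}
    change_applied = False
    for u in sorted(adj):
        for w in sorted(adj[u]):      # sorted() copies, so deletion is safe
            if u >= w:
                continue
            deg[u] += 1
            deg[w] += 1
            if not change_applied:
                change_applied = True
                concurrent_change(adj)
    return deg


def drop_edge(adj):
    delete_edge(adj, 1, 2)


by_vertex = degrees_by_vertex(make_graph(), drop_edge)
by_edge = degrees_by_edge(make_graph(), drop_edge)

print(by_vertex)  # degree sum is odd, so no undirected graph matches
print(by_edge)    # matches the starting graph (deletion not yet included)
```

The vertex-based result has an odd degree sum, which no undirected graph can have. The edge-based result observes each edge once and updates both endpoints from that single observation, so it always matches the starting graph plus some subset of the changes.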
Algorithm validity in our model
Diagram: source vertex s; edge weights w(e1) = 10 and w(e2) = 5 → 1 (decreased during the run); ∆ = 4.
• What is valid?
• Typical BFS
• Shiloach-Vishkin connected components
• PageRank (will describe...)
• Saved decisions...
• What is invalid?
• Making a decision twice in implementations
• ∆-stepping SSSP: Decrease a weight below ∆
• Degree optimization: Cross threshold, miss vertex
• Applying old or different information
• Multiply counting triangles: Counts match no graph
• Multiple searches: Betweenness centrality
• Labeling in S. Kahan’s components alg
PageRank without stopping
Apply Jacobi iteration to the linear system form of PageRank:
$$x^{(k+1)} = \alpha A^T D^{-1} x^{(k)} + (1 - \alpha) v.$$
Amusingly, the residual
$$r^{(k)} = (1 - \alpha) v - (I - \alpha A^T D^{-1}) x^{(k)} = x^{(k+1)} - x^{(k)}.$$
So if $r^{(k)}$ is small, the iteration has converged to a solution of a
system near the graph seen in the most recent iteration, and hence to a
graph containing the original plus some subset of the changes.
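The identity between the residual and the step difference is easy to confirm numerically. The following is a minimal sketch on a static three-vertex graph, assuming every vertex has at least one out-edge; the adjacency dictionary `out_nbrs`, the function names, and the tolerance are illustrative choices, not the authors' implementation.

```python
def jacobi_step(out_nbrs, x, alpha, v):
    """One step: x_new = alpha * A^T D^{-1} x + (1 - alpha) * v."""
    x_new = [(1.0 - alpha) * vi for vi in v]
    for u, nbrs in out_nbrs.items():
        for w in nbrs:
            x_new[w] += alpha * x[u] / len(nbrs)  # assumes out-degree >= 1
    return x_new


def residual(out_nbrs, x, alpha, v):
    """r = (1 - alpha) v - (I - alpha A^T D^{-1}) x, computed directly."""
    pushed = [0.0] * len(x)                       # A^T D^{-1} x
    for u, nbrs in out_nbrs.items():
        for w in nbrs:
            pushed[w] += x[u] / len(nbrs)
    return [(1.0 - alpha) * v[i] - (x[i] - alpha * pushed[i])
            for i in range(len(x))]


out_nbrs = {0: [1, 2], 1: [2], 2: [0]}  # small example graph
alpha = 0.85
v = [1.0 / 3.0] * 3
x = list(v)

max_gap = 0.0   # largest difference seen between r and x_next - x
max_r = None
for _ in range(1000):
    r = residual(out_nbrs, x, alpha, v)
    x_next = jacobi_step(out_nbrs, x, alpha, v)
    gap = max(abs((x_next[i] - x[i]) - r[i]) for i in range(3))
    max_gap = max(max_gap, gap)
    max_r = max(abs(ri) for ri in r)
    if max_r < 1e-10:
        break
    x = x_next

print(max_gap, max_r, sum(x))
```

Since the residual is just the difference of successive iterates, the convergence test costs nothing beyond the iteration itself.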
Fun properties for one-shot queries
Due to Chunxing Yin, under sensible assumptions:
1. You can produce a single-change stream to
demonstrate invalidity.
• Idea: Start with a graph that incorporates all the
visible changes, introduce the one change at the
right time.
2. Algorithms that produce a subgraph of their input
cannot be guaranteed to run concurrently with
changes and always produce moment-in-time
outputs.
• Idea: Any time a snapshot result could happen,
delete then re-insert an edge from the output.
On to streaming...
Can we update graph metrics as new data arrives?
• Track what changed during the one-shot query.
• Update locally around those changes, while other
changes are occurring.
• If the update is valid, can repeat to follow a
streaming graph.
Diagram: Initial → Upd. w/∆0 → Upd. w/∆1 → ..., where ∆0, ∆1, ∆2 are the changes that arrive during each successive step.
Example: PageRank. Treat only the changed portions as
unconverged.
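One way to see why only the changed portions need attention: after an edge insertion, the residual of the old PageRank vector on the new graph is non-negligible only at the out-neighbors of the changed vertex. The sketch below illustrates that observation and then restarts the iteration from the old solution. It is not the authors' update method; the graph, the `solve` helper, and the tolerances are assumptions, and every vertex is assumed to have at least one out-edge.

```python
def residual(out_nbrs, x, alpha, v):
    """r = (1 - alpha) v - (I - alpha A^T D^{-1}) x."""
    pushed = [0.0] * len(x)
    for u, nbrs in out_nbrs.items():
        for w in nbrs:
            pushed[w] += x[u] / len(nbrs)
    return [(1.0 - alpha) * v[i] - (x[i] - alpha * pushed[i])
            for i in range(len(x))]


def solve(out_nbrs, x0, alpha, v, tol=1e-12, max_iters=10_000):
    """Jacobi iteration written as x <- x + r, starting from x0."""
    x = list(x0)
    for steps in range(max_iters):
        r = residual(out_nbrs, x, alpha, v)
        if max(abs(ri) for ri in r) < tol:
            return x, steps
        x = [xi + ri for xi, ri in zip(x, r)]
    raise RuntimeError("did not converge")


alpha = 0.85
n = 6
v = [1.0 / n] * n

old_graph = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [5], 5: [0]}
x_old, _ = solve(old_graph, v, alpha, v)

# The change: insert edge 0 -> 3.
new_graph = dict(old_graph)
new_graph[0] = [1, 3]

# Only the out-neighbors of vertex 0 see a non-negligible residual.
r_after = residual(new_graph, x_old, alpha, v)
unconverged = {i for i, ri in enumerate(r_after) if abs(ri) > 1e-6}

# Restarting from the old solution reaches the same answer as a cold start.
x_warm, warm_steps = solve(new_graph, x_old, alpha, v)
x_cold, cold_steps = solve(new_graph, v, alpha, v)
max_diff = max(abs(a - b) for a, b in zip(x_warm, x_cold))

print(sorted(unconverged), warm_steps, cold_steps, max_diff)
```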
Then what?
• Many analyses do not scale in
performance to graphs with
billions of vertices.
• But we can extract
subgraphs...
• without stopping data ingest,
and...
• update the results!
Work in progress, based on PageRank and Katz.
Closing
Closing
• Summary
• Analysis concurrent with graph change can work.
• But not all methods are valid. Avoid evaluating
conditions or exploring the graph more than once.
• Valid updating methods can continue
• Future work
• Track subgraphs / communities for “slow” analyses
• Develop more valid updating methods,
approximation results
• Consider the debugging problem...
• And metadata...
Non-stop validity is only one approach! There are others.
STINGER: Where do you get it?
Home: www.cc.gatech.edu/stinger/
Code: git.cc.gatech.edu/git/project/stinger.git/
Gateway to
• code,
• development,
• documentation,
• presentations...
Remember: Academic code, but maturing
with contributions.
Users / contributors / questioners:
Georgia Tech, PNNL, CMU, Berkeley, Intel,
Cray, NVIDIA, IBM, Federal Government,
Ionic Security, Citi, Accenture, ...
