STINGER
Dynamic Graph Analysis
Contributors
• David Bader
• David Ediger
• Rob McColl
• Jason Riedy
• Kamesh Madduri
• Jason Poovey
Outline
• Motivation
• Dynamic Graph Basics
• What is STINGER?
• What can STINGER do?
• Why STINGER?
Big Data problems need Graph Analysis
• Health Care: finding outbreaks, population epidemiology
• Social Networks: advertising, searching, grouping, influence
• Intelligence: decisions at scale, regulating algorithms
• Systems Biology: understanding interactions, drug design
• Power Grid: disruptions, conversion
• Simulation: discrete events, cracking meshes
Graphs are pervasive
 • Graphs: things and relationships
    • Different kinds of things, different kinds of relationships, but graphs provide a
      framework for analyzing the relationships.
    • New challenges for analysis: data sizes, heterogeneity, uncertainty, data quality.


 • Astrophysics
    • Problem: Outlier detection
    • Challenges: Massive data sets, temporal variation
    • Graph problems: Matching, clustering
 • Bioinformatics
    • Problem: Identifying target proteins
    • Challenges: Data heterogeneity, quality
    • Graph problems: Centrality, clustering
 • Social Informatics
    • Problem: Emergent behavior, information spread
    • Challenges: New analysis, data uncertainty, scale
    • Graph problems: Clustering, flows, shortest paths
Data rates and volumes are immense
• Facebook:
  • ~1 billion users
  • Average of 130 friends per user
  • 30 billion pieces of content shared per month
• Twitter:
  • 500 million active users
  • 340 million tweets per day
• Internet: 100s of exabytes per year
  • 300 million new websites per year
  • 48 hours of video uploaded to YouTube per minute
  • 30,000 YouTube videos played per second
Our focus is streaming graphs
• As relationships change
  • Edges (relationships) are inserted, updated, and removed
  • New vertices (things) join and leave the network

• What are the effects?
  • On information flow
  • On community structure
  • On the integrity of data and structure

• Which actors and relationships are…
  • The key players and influencers in the change?
  • The anomalies and threats?
What is STINGER?
Spatio-Temporal Interaction Networks and Graphs Extensible Representation
D. A. Bader, J. Berry, A. Amos-Binks, D. Chavarría-Miranda, C. Hastings, K. Madduri, S. C. Poulos


• A scalable, high-performance, in-memory dynamic graph data structure
   •   Stores semantic and temporal information.
   •   Designed to be flexible and extendable.
   •   Intended to be useful for the entire "large graph" community.
   •   Permits good performance, recognizing that no single structure is optimal for all workloads.
   •   Assumes globally addressable memory access.
   •   Supports multiple parallel readers and a single parallel writer.

• A software suite for dynamic graph analysis
  • Targets large shared-memory x86 systems and the Cray XMT
  • Written in C with OpenMP and XMT pragma support for parallelism
As a data structure
• Fast insertions, deletions, and updates:
 A data structure that grows and changes at the speed of the data.

• Edge and vertex types and weights:
 Represent complex relationships and multiple simultaneous networks.

• Filtering traversal mechanisms:
 Traverse serially or in parallel on specific edge types, time ranges,
 vertex sets, etc.

• Experimental workflow server:
 Multiple data streams and analytics with one persistent data structure.

• Experimental Java and Python bindings:
 Use productivity-oriented languages without sacrificing performance.
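To make the data-structure bullets above concrete, here is a minimal, single-threaded sketch of an adjacency store built from fixed-size edge blocks, where each edge carries a type, a weight, and first/most-recent timestamps, plus a traversal filtered by edge type and time range. It is a simplified illustration under our own assumptions, not STINGER's actual layout or API: the names `toy_graph`, `toy_block`, `TOY_BLOCK_SIZE`, `toy_insert_edge`, `toy_remove_edge`, and `toy_count_filtered` are hypothetical, and the real library is parallel and considerably more elaborate.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy layout, NOT STINGER's real structure: each vertex owns a
 * linked list of fixed-size blocks of typed, weighted, timestamped edges. */
#define TOY_BLOCK_SIZE 4   /* edges per block; illustrative value only */
#define TOY_EMPTY (-1)     /* marks an unused or deleted edge slot */

typedef struct {
    int64_t neighbor;      /* destination vertex, or TOY_EMPTY */
    int64_t type;          /* edge type, e.g. "mention" vs. "retweet" */
    int64_t weight;
    int64_t time_first;    /* timestamp of first insertion */
    int64_t time_recent;   /* timestamp of most recent update */
} toy_edge;

typedef struct toy_block {
    toy_edge edges[TOY_BLOCK_SIZE];
    struct toy_block *next;
} toy_block;

typedef struct {
    int64_t nv;            /* number of vertices */
    toy_block **adj;       /* adj[v] = head of v's block list */
} toy_graph;

toy_graph *toy_new(int64_t nv) {
    toy_graph *g = malloc(sizeof *g);
    g->nv = nv;
    g->adj = calloc((size_t)nv, sizeof *g->adj);
    return g;
}

void toy_free(toy_graph *g) {
    for (int64_t v = 0; v < g->nv; v++)
        for (toy_block *b = g->adj[v], *nx; b; b = nx) { nx = b->next; free(b); }
    free(g->adj);
    free(g);
}

/* Insert or update directed edge u->v of a given type at time t. A matching
 * (v, type) edge is updated in place; otherwise the first empty slot is
 * reused, and a new block is prepended only when every block is full.
 * Returns 1 if a new edge was created, 0 if an existing edge was updated. */
int toy_insert_edge(toy_graph *g, int64_t u, int64_t v,
                    int64_t type, int64_t weight, int64_t t) {
    assert(u >= 0 && u < g->nv && v >= 0 && v < g->nv);
    toy_edge *slot = NULL;
    for (toy_block *b = g->adj[u]; b; b = b->next)
        for (int i = 0; i < TOY_BLOCK_SIZE; i++) {
            toy_edge *e = &b->edges[i];
            if (e->neighbor == v && e->type == type) {
                e->weight += weight;
                e->time_recent = t;
                return 0;
            }
            if (e->neighbor == TOY_EMPTY && !slot) slot = e;
        }
    if (!slot) {
        toy_block *b = malloc(sizeof *b);
        for (int i = 0; i < TOY_BLOCK_SIZE; i++) b->edges[i].neighbor = TOY_EMPTY;
        b->next = g->adj[u];
        g->adj[u] = b;
        slot = &b->edges[0];
    }
    *slot = (toy_edge){v, type, weight, t, t};
    return 1;
}

/* Delete by marking the slot empty; the block stays allocated for reuse. */
int toy_remove_edge(toy_graph *g, int64_t u, int64_t v, int64_t type) {
    for (toy_block *b = g->adj[u]; b; b = b->next)
        for (int i = 0; i < TOY_BLOCK_SIZE; i++)
            if (b->edges[i].neighbor == v && b->edges[i].type == type) {
                b->edges[i].neighbor = TOY_EMPTY;
                return 1;
            }
    return 0;
}

/* Filtered traversal: count u's edges of one type whose most recent update
 * time lies in [t0, t1]. */
int64_t toy_count_filtered(const toy_graph *g, int64_t u, int64_t type,
                           int64_t t0, int64_t t1) {
    int64_t n = 0;
    for (const toy_block *b = g->adj[u]; b; b = b->next)
        for (int i = 0; i < TOY_BLOCK_SIZE; i++) {
            const toy_edge *e = &b->edges[i];
            if (e->neighbor != TOY_EMPTY && e->type == type &&
                e->time_recent >= t0 && e->time_recent <= t1)
                n++;
        }
    return n;
}
```

A caller would create a graph with `toy_new`, stream updates through `toy_insert_edge` and `toy_remove_edge`, and query with `toy_count_filtered`. Reusing emptied slots before allocating a new block keeps deletions cheap, which is one plausible reason block-based adjacency lists suit high update rates; STINGER's own parallel insert and delete logic is not shown here.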
As an analysis package
• Streaming edge insertions and deletions:
  Performs new edge insertions, updates, and deletions in batches or individually.

• Streaming clustering coefficients:
  Tracks the local and global clustering coefficients of a graph under both edge insertions and deletions.

• Streaming connected components:
  Accurately tracks the connected components of a graph with insertions and deletions.

• Streaming community detection:
  Tracks and updates the community structure within the graph as it changes.

• Parallel agglomerative clustering:
  Finds clusters optimized for a user-defined edge scoring function.

• Streaming betweenness centrality:
  Identifies key points in information flows and structural vulnerabilities.

• K-core extraction:
  Extracts additional communities and filters noisy high-degree vertices.

• Classic breadth-first search:
  Performs a parallel breadth-first search of the graph starting at a given source vertex to find shortest paths.
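The streaming clustering-coefficient idea can be illustrated with the standard incremental triangle-count update: inserting an undirected edge {u, v} creates exactly one new triangle for each common neighbor w of u and v, so only u, v, and those w need new counts, and deleting the edge reverses the same update. The sketch below uses a dense adjacency matrix on a tiny graph purely for clarity; `MAXV`, `cc_graph`, and the `cc_*` functions are hypothetical names, the "global" coefficient here is taken as transitivity (3 × triangles / wedges), and this is not STINGER's implementation, which works on its own sparse structure and in batches.

```c
#include <stdbool.h>
#include <string.h>

#define MAXV 16  /* hypothetical tiny vertex limit for this dense sketch */

typedef struct {
    bool adj[MAXV][MAXV];
    int deg[MAXV];
    long tri[MAXV];   /* triangles incident on each vertex */
    long total_tri;   /* triangles in the whole graph */
} cc_graph;

void cc_init(cc_graph *g) { memset(g, 0, sizeof *g); }

/* Add sign (+1 or -1) to the triangle counts for every common neighbor w. */
static void cc_update_common(cc_graph *g, int u, int v, int sign) {
    for (int w = 0; w < MAXV; w++)
        if (g->adj[u][w] && g->adj[v][w]) {
            g->tri[u] += sign; g->tri[v] += sign; g->tri[w] += sign;
            g->total_tri += sign;
        }
}

/* Insert undirected edge {u,v}; self-loops and duplicates are ignored. */
void cc_insert(cc_graph *g, int u, int v) {
    if (u == v || g->adj[u][v]) return;
    cc_update_common(g, u, v, +1);   /* count before linking u and v */
    g->adj[u][v] = g->adj[v][u] = true;
    g->deg[u]++; g->deg[v]++;
}

/* Delete undirected edge {u,v}; a missing edge is ignored. */
void cc_delete(cc_graph *g, int u, int v) {
    if (u == v || !g->adj[u][v]) return;
    g->adj[u][v] = g->adj[v][u] = false;
    g->deg[u]--; g->deg[v]--;
    cc_update_common(g, u, v, -1);   /* count after unlinking u and v */
}

/* Local clustering coefficient: 2 t_v / (d_v (d_v - 1)), or 0 if d_v < 2. */
double cc_local(const cc_graph *g, int v) {
    long d = g->deg[v];
    return d < 2 ? 0.0 : 2.0 * (double)g->tri[v] / (double)(d * (d - 1));
}

/* Global clustering coefficient taken as transitivity: 3T / #wedges. */
double cc_global(const cc_graph *g) {
    long wedges = 0;
    for (int v = 0; v < MAXV; v++) wedges += (long)g->deg[v] * (g->deg[v] - 1) / 2;
    return wedges == 0 ? 0.0 : 3.0 * (double)g->total_tri / (double)wedges;
}
```

Each update touches only the endpoints and their common neighbors, which is the property that lets a streaming approach avoid recounting every triangle after each change.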
How is the graph stored?
What can STINGER represent?
• Nearly any set of relationships
   •   Healthcare
   •   Social Networks
   •   Intelligence
   •   Systems biology
   •   Power grid
   •   Travel networks

• Example: Twitter
   • Users, hashtags, tweets as vertex types
   • Authorship, retweet, mentions, follows / followed by edge types


• Example: Work Environment
   • Users, PCs, printers, emails, URLs, files, etc. as vertex types
   • Email alias, from, to, access, logon/off, print, IM, etc. as edge types
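As a hypothetical illustration of the Twitter example, vertex and edge types can be encoded as small integers and each observed event mapped to one or more typed edges. The enum values, the `typed_edge` record, and the functions `tweet_to_edges` and `follow_to_edges` are our own names, not part of STINGER; the slides only state that STINGER supports vertex and edge types.

```c
#include <stdint.h>

/* Hypothetical type codes for the Twitter example; a real deployment would
 * choose its own. Each vertex ID would also be tagged with a vertex type. */
enum vertex_type { VT_USER, VT_HASHTAG, VT_TWEET };
enum edge_type { ET_AUTHORSHIP, ET_RETWEET, ET_MENTIONS, ET_FOLLOWS, ET_FOLLOWED_BY };

typedef struct {
    int64_t src, dst;
    enum edge_type type;
} typed_edge;

/* Translate one observed tweet into typed edges.
 * original_tweet < 0 means the tweet is not a retweet.
 * `out` must have room for 2 + n_mentions entries; returns edges written. */
int tweet_to_edges(int64_t author, int64_t tweet, int64_t original_tweet,
                   const int64_t *mentions, int n_mentions, typed_edge *out) {
    int n = 0;
    out[n++] = (typed_edge){author, tweet, ET_AUTHORSHIP};        /* user -> tweet */
    if (original_tweet >= 0)
        out[n++] = (typed_edge){tweet, original_tweet, ET_RETWEET};
    for (int i = 0; i < n_mentions; i++)
        out[n++] = (typed_edge){tweet, mentions[i], ET_MENTIONS}; /* tweet -> user */
    return n;
}

/* A follow event yields both directions, matching "follows / followed by". */
int follow_to_edges(int64_t follower, int64_t followee, typed_edge *out) {
    out[0] = (typed_edge){follower, followee, ET_FOLLOWS};
    out[1] = (typed_edge){followee, follower, ET_FOLLOWED_BY};
    return 2;
}
```

Keeping several relationship kinds in one graph, distinguished only by type, is what allows a single structure to hold "multiple simultaneous networks" and lets filtered traversals select just the relationships a query needs.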
What can STINGER do?
• Optimized to update at rates of over 3 million edges per second on graphs of one billion edges
  •   D. Ediger, R. McColl, J. Riedy, and D. A. Bader, "STINGER: High Performance Data Structure for Streaming
      Graphs," The IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, September 20-22, 2012.
      Best Paper Award.

                       RMAT: Recursive MATrix graph generator. RMAT(N) indicates 2^N vertices.
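A common way to generate one R-MAT edge for a graph with 2^N vertices is to descend N levels of the adjacency matrix, at each level choosing one of four quadrants with probabilities a, b, c, and d = 1 - a - b - c. The sketch below assumes the Graph500 parameters a = 0.57, b = 0.19, c = 0.19 (d = 0.05), since the slide does not state which parameters were used, and a small xorshift generator so results are reproducible; `rmat_edge`, `xorshift64`, and `uniform01` are hypothetical names.

```c
#include <stdint.h>

/* Small deterministic PRNG (Marsaglia xorshift64); state must be nonzero. */
static uint64_t xorshift64(uint64_t *s) {
    uint64_t x = *s;
    x ^= x << 13; x ^= x >> 7; x ^= x << 17;
    return *s = x;
}

/* Uniform double in [0, 1) from the top 53 bits. */
static double uniform01(uint64_t *s) {
    return (double)(xorshift64(s) >> 11) * (1.0 / 9007199254740992.0); /* 2^53 */
}

/* Generate one R-MAT edge (src, dst) in a graph with 2^scale vertices.
 * Quadrant probabilities are a, b, c, and implicitly d = 1 - a - b - c. */
void rmat_edge(int scale, double a, double b, double c,
               uint64_t *rng, int64_t *src, int64_t *dst) {
    int64_t u = 0, v = 0;
    for (int level = 0; level < scale; level++) {
        double r = uniform01(rng);
        u <<= 1; v <<= 1;              /* descend one level of the matrix */
        if (r < a) {
            /* top-left quadrant: both bits stay 0 */
        } else if (r < a + b) {
            v |= 1;                    /* top-right */
        } else if (r < a + b + c) {
            u |= 1;                    /* bottom-left */
        } else {
            u |= 1; v |= 1;            /* bottom-right */
        }
    }
    *src = u; *dst = v;
}
```

For example, `rmat_edge(20, 0.57, 0.19, 0.19, &state, &u, &v)` draws one edge of an RMAT(20) graph. Because quadrant a is chosen most often, low-numbered vertices collect many edges, giving the skewed degree distribution typical of social networks; generators used in practice often also permute vertex labels and perturb the probabilities, which this sketch omits.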
What can STINGER do?
• Maintaining connected components in a graph of half a billion edges
  • Up to 1.26 million updates per sec.
  • 137x faster than recomputing.

• Scalable parallel streaming community detection
  • Built on parallel insert / delete mechanisms.

• Streaming approximate betweenness
  • Used to analyze influencers on Twitter during Hurricane Sandy over time.
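Handling insertions alone is the easy half of connected-component tracking: a union-find (disjoint-set) structure merges two components in near-constant amortized time per inserted edge instead of recomputing from scratch. Deletions are harder because union-find cannot split a set, so this sketch does not reproduce how STINGER handles deletions; the `uf_*` names and the `uf_set` type are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int64_t *parent;
    int64_t *size;
    int64_t components;  /* current number of connected components */
} uf_set;

/* Start with n singleton components; returns 0 on success, -1 on OOM. */
int uf_init(uf_set *uf, int64_t n) {
    uf->parent = malloc((size_t)n * sizeof *uf->parent);
    uf->size = malloc((size_t)n * sizeof *uf->size);
    if (!uf->parent || !uf->size) { free(uf->parent); free(uf->size); return -1; }
    for (int64_t i = 0; i < n; i++) { uf->parent[i] = i; uf->size[i] = 1; }
    uf->components = n;
    return 0;
}

void uf_destroy(uf_set *uf) { free(uf->parent); free(uf->size); }

/* Find the root of x, halving the path along the way. */
int64_t uf_find(uf_set *uf, int64_t x) {
    while (uf->parent[x] != x) {
        uf->parent[x] = uf->parent[uf->parent[x]];
        x = uf->parent[x];
    }
    return x;
}

/* Process one inserted edge {u,v}; returns 1 if two components merged. */
int uf_insert_edge(uf_set *uf, int64_t u, int64_t v) {
    int64_t ru = uf_find(uf, u), rv = uf_find(uf, v);
    if (ru == rv) return 0;                  /* already connected */
    if (uf->size[ru] < uf->size[rv]) { int64_t t = ru; ru = rv; rv = t; }
    uf->parent[rv] = ru;                     /* union by size */
    uf->size[ru] += uf->size[rv];
    uf->components--;
    return 1;
}
```

An edge inside an existing component changes nothing, and an edge between components costs one merge, so the per-update work stays small compared with a full recomputation over the graph.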
What does STINGER not do?
• Does not provide all ACID properties
   • Why: Not intended to be the backing data store.
   • Why: Allows for greater ingest and processing speeds.
   • Alternative: Back STINGER ingest with an ACID DB.
   • Note: STINGER does provide consistency and partial isolation.

• No text-based query language (for now)
   • Why: Currently, no language is general enough to describe most or all queries.
   • Alternative: Filtering traversal APIs provide unlimited query flexibility through code.
   • Alternative: Productivity language bindings (Python, Java).

• No distributed / Hadoop-like cluster support
   • Why: Such clusters are a good fit for ingest but poor for streaming analysis, because random access is too slow.
   • Alternative: Larger shared-memory systems such as the Cray XMT and SGI UV.
   • Alternative: Process billion-edge graphs in shared memory on affordable Intel servers.
   • Alternative: Extract key portions of the graph from a larger data store and perform fast in-memory
     processing in STINGER.
What sizes and performance can it handle?

Desktop: Intel Core i7-2600, 16 GB DDR3
  V      E      Config   Size (GB)   Connected Components (s)   Updates per Sec.
  1M     8M     22-14     1.184       0.316                      2.7M
  2M     16M    22-14     2.384       0.75                       2.3M
  4M     33M    22-14     4.768       2                          2.3M
  8M     67M    24-14     9.536       5.36                       0.85M
  4M     67M    24-14     7.984       3                          1.38M
  4M     134M   24-14    14.336       5.7                        0.8M

Server: 4x Opteron 6282, 256 GB DDR3
  V      E      Config   Size (GB)   Connected Components (s)   Updates per Sec.
  16M    512M   25-14    60           13.7                       696K
  16M    256M   25-14    24.6         9.82                       2.1M

Cray XMT2: 64 processors, 2 TB DDR2
  V      E      Config   Size (GB)   Connected Components (s)   Updates per Sec.
  67M    512M   28-32    86           13.8                       3.3M
  268M   4.3B   28-32    312          52.3                       2.34M

• The only limitation on size is system memory
    • Billions of vertices and edges are possible
• V vertices and E edges in each graph
    • E counts are undirected
    • STINGER stores both directions
• Config lists STINGER-specific configuration parameters
Why not existing technologies?
• Traditional SQL databases
   • Not structured to run meaningful graph queries efficiently or in a timely way

• Graph databases: mostly on-disk
  • Distributed disk can keep up with storing and indexing, but is too slow at
    random graph access to support analysis while the graph updates

• Hadoop and HDFS-based projects
  • Not the right programming model for many structural queries over the
    entire graph; random-access performance is poor

• Smaller graph libraries and processing tools
  • Cannot scale or process dynamic graphs, and frequently lead to
    impractical visualization attempts
Who is GTRI?
• Georgia Tech Research Institute
  • Largest research entity at Georgia Institute of Technology
  • One of the world's premier university-based applied R&D
    organizations for 75 years
  • Non-profit with over 1,600 employees and 21 locations worldwide
  • Over $240 million per year in government and industry contracts

• Innovative Computing Division of the Cyber Technology and Information Security Lab
  • Dedicated to the application of practical HPC expertise and
    cutting-edge fundamental research to solve real-world problems
  • Experts in high-performance computing, algorithms, and big data
How can I start using STINGER?
• Information, code, help
   • http://cc.gatech.edu/stinger
   • robert.mccoll@gtri.gatech.edu

• Together, GTRI and Georgia Tech can offer
   • Consulting
     Understand how your organization can benefit from graph analytics.
   • Training
     Learn how to use graph analysis and apply STINGER to your data.
   • Implementation
     Customize and extend STINGER to suit your needs with help from our experts.
   • Research Expertise
     Connect with researchers on the cutting edge of big data to develop novel
     solutions to your open problems.
