PARALLEL GRAPH ANALYTICS
STEFANO ROMANAZZI
STEFAN.ROMANAZZI@GMAIL.COM
INTRODUCTION
• Today we must deal with huge graphs
• Parallel computing is essential for analyzing such graphs
• Web graph: ~50 billion nodes, ~1 trillion edges
• Facebook graph: ~1 billion nodes, ~200 billion edges
TAO ANALYSIS
The TAO analysis classifies graph analytics applications along three axes:
• Topology: uniform degree, power-law, Erdős–Rényi
• Active nodes
  • Location: topology-driven, data-driven
  • Ordering: unordered, ordered
• Operator: morph, local computation, reader
GRAPH TOPOLOGIES
• Uniform-degree graphs (e.g., road networks)
• Power-law graphs
• Erdős–Rényi (random) graphs (see the generation sketch below)
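To make the three topologies concrete, here is a minimal sketch (not from the slides) that builds one small instance of each and prints how spread out the degrees are. The use of networkx, the specific generators, and the size n = 1000 are all illustrative assumptions:

```python
import networkx as nx

n = 1000
uniform = nx.random_regular_graph(4, n)       # every node has degree 4 (road-network-like)
power_law = nx.barabasi_albert_graph(n, 4)    # preferential attachment -> power-law degrees
random_er = nx.erdos_renyi_graph(n, 8 / n)    # Erdős–Rényi: each edge present with probability p

for name, g in [("uniform", uniform), ("power-law", power_law), ("Erdős–Rényi", random_er)]:
    degrees = [d for _, d in g.degree()]
    print(f"{name:12s} max degree = {max(degrees):4d}, mean degree = {sum(degrees) / n:.1f}")
```

The uniform-degree graph keeps the maximum degree equal to the mean, while the power-law graph shows a few very high-degree hubs.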
OPERATORS
• Morph: modifies the structure of the graph
• Local computation: updates labels on nodes and edges
  • Push-style operator: reads the label of the active node and writes to its neighbors
  • Pull-style operator: reads the labels of the neighbors and updates the active node (sketch below)
• Reader: operates on read-only data
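As an illustration, here is a minimal Python sketch of the two local-computation styles, using an assumed adjacency-list representation (graph[v] is a list of (neighbour, weight) pairs, undirected or with in-edges available for the pull case) and a distance-style label as in SSSP:

```python
def push_operator(v, graph, label):
    """Push style: read the active node's label, write to its neighbours' labels."""
    for u, w in graph[v]:
        label[u] = min(label[u], label[v] + w)

def pull_operator(v, graph, label):
    """Pull style: read the neighbours' labels, update the active node's own label."""
    for u, w in graph[v]:
        label[v] = min(label[v], label[u] + w)
```

In a parallel setting, the push version needs synchronized writes to the neighbours' labels, while the pull version only writes to the active node's own label.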
ACTIVE NODES
• ORDERING
  • Ordered algorithms: active nodes must be processed in a specific order
  • Unordered algorithms: any processing order is semantically correct
    • Soft priorities can be defined to improve efficiency
• LOCATION
  • Topology-driven algorithms: all graph nodes are active nodes
    • Work-inefficient, but easier to implement on GPUs
  • Data-driven algorithms: visit nodes only if there may be work to perform
    • Threads obtain work by pulling active nodes from a worklist (see the sketch below)
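The difference in location is easiest to see in code. Below is a minimal single-threaded sketch of BFS-level computation done both ways; graph[v] as a list of neighbours and BFS levels as the example problem are assumptions for illustration, and in a real system the worklist would be shared by many threads:

```python
from collections import deque

def topology_driven_levels(graph, source):
    """Topology-driven: every node is active in every round, until nothing changes."""
    level = {v: float("inf") for v in graph}
    level[source] = 0
    changed = True
    while changed:                      # one sweep over ALL nodes per round
        changed = False
        for v in graph:
            for u in graph[v]:
                if level[v] + 1 < level[u]:
                    level[u] = level[v] + 1
                    changed = True
    return level

def data_driven_levels(graph, source):
    """Data-driven: work is pulled from a worklist; here a simple deque."""
    level = {v: float("inf") for v in graph}
    level[source] = 0
    worklist = deque([source])
    while worklist:
        v = worklist.popleft()
        for u in graph[v]:
            if level[v] + 1 < level[u]:
                level[u] = level[v] + 1
                worklist.append(u)      # new work is created only where it is needed
    return level
```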
ASYNCHRONOUS Δ-STEPPING
• An example of a data-driven algorithm
• A parallel version of Dijkstra's algorithm
• Δ is a user-defined parameter
  • The higher Δ, the more parallelism, but also the more speculative work
• Two types of edges
  • Heavy edge if w(e) > Δ
  • Light edge if w(e) ≤ Δ
• Buckets represent worklists
  • Bucket i contains vertices with tentative distance in [Δ·(i − 1), Δ·i) (sequential sketch below)
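A sequential sketch of Δ-stepping follows; buckets are 0-indexed here, so bucket i covers [i·Δ, (i+1)·Δ). It captures the light/heavy split and the bucket worklists, but not the parallel, asynchronous execution of a real implementation; graph[v] as a list of (neighbour, weight) pairs is an assumption:

```python
import math

def delta_stepping(graph, source, delta):
    """Sequential sketch of Δ-stepping (single-source shortest paths)."""
    dist = {v: math.inf for v in graph}
    dist[source] = 0.0
    buckets = {0: {source}}                       # bucket index -> worklist of vertices

    def relax(u, d):
        if d < dist[u]:
            if dist[u] < math.inf:                # move u out of its old bucket, if any
                buckets.get(int(dist[u] // delta), set()).discard(u)
            dist[u] = d
            buckets.setdefault(int(d // delta), set()).add(u)

    while buckets:
        i = min(buckets)                          # smallest bucket index still present
        settled = set()
        while buckets.get(i):                     # light phase: edges with w <= delta
            frontier = buckets.pop(i)
            settled |= frontier
            for v in frontier:
                for u, w in graph[v]:
                    if w <= delta:
                        relax(u, dist[v] + w)
        buckets.pop(i, None)                      # drop the bucket (it may linger, empty)
        for v in settled:                         # heavy phase: edges with w > delta, once
            for u, w in graph[v]:
                if w > delta:
                    relax(u, dist[v] + w)
    return dist
```

For example, with g = {'a': [('b', 3), ('c', 12)], 'b': [('c', 2)], 'c': []} and Δ = 5, delta_stepping(g, 'a', 5) returns distances 0, 3 and 5 for a, b and c.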
ASYNCHRONOUS Δ-STEPPING (WORKED EXAMPLE)
[Slides 8–17 step through Δ-stepping on a small example graph, bucket by bucket; the narration is in editor's notes #9–#17 below.]
PARALLEL ABSTRACTIONS
• Data parallelism
  • Operates through predefined data-parallel operations (e.g., map)
• Amorphous data-parallelism
  • Prevents activities on overlapping neighborhoods from executing in parallel
  • Supports ordering constraints
  • The execution of an activity may create new activities
BSP-STYLE SEMANTICS
• Programs are executed in rounds
  • Barrier synchronization between rounds
• Multiple updates to the same node can be resolved with reduction operations
  • If (u, v) is an edge: dist(v) = min{dist(v), dist(u) + l(u, v)} (sketch below)
• Good performance if all processors are kept busy
  • e.g., on power-law graphs
• Poor performance on road networks
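As a concrete illustration of the BSP model, here is a minimal round-based SSSP (Bellman-Ford) sketch: each round reads only the distances from the previous round, copying the distance map plays the role of the barrier, and min acts as the reduction that resolves multiple updates to the same vertex. The graph representation (graph[v] = list of (neighbour, weight)) is an assumption:

```python
import math

def bsp_sssp(graph, source):
    """BSP-style sketch of single-source shortest paths (Bellman-Ford)."""
    dist = {v: math.inf for v in graph}
    dist[source] = 0.0
    for _ in range(len(graph) - 1):               # at most |V| - 1 rounds are needed
        new_dist = dict(dist)                     # updates become visible only in the next round
        for u in graph:                           # conceptually, one parallel task per vertex
            if dist[u] < math.inf:
                for v, w in graph[u]:
                    # reduction: dist(v) = min{dist(v), dist(u) + l(u, v)}
                    new_dist[v] = min(new_dist[v], dist[u] + w)
        # barrier: all tasks finish before the next round starts
        if new_dist == dist:                      # no change in this round: converged
            return dist
        dist = new_dist
    return dist
```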
TRANSACTIONAL SEMANTICS
• Prevent activities from executing in parallel if they conflict
• Activities do not see concurrent activities
  • Updates become visible only after an activity completes
• SCHEDULING TYPES
  • Autonomous scheduling: abort some of the conflicting activities (sketch below)
  • Coordinated scheduling
    • Static parallelization
    • Just-in-time parallelization
    • Runtime parallelization
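One way to picture autonomous scheduling is optimistic, lock-based conflict detection. The sketch below is a hypothetical illustration (the slides only say that conflicting activities are aborted, not how conflicts are detected): an activity tries to lock its whole neighborhood, aborts and releases everything if any lock is already held, and commits its updates otherwise. graph[v] as a list of neighbours and the apply_operator callback are assumptions:

```python
import threading

def try_activity(active_node, graph, locks, apply_operator):
    """Optimistically lock the activity's neighbourhood; abort on conflict.
    locks maps every vertex to a pre-created threading.Lock;
    apply_operator performs the actual label updates."""
    neighbourhood = sorted({active_node, *graph[active_node]}, key=str)  # fixed order
    acquired = []
    for v in neighbourhood:
        if locks[v].acquire(blocking=False):
            acquired.append(v)
        else:                                   # conflict with a concurrent activity: abort
            for a in acquired:
                locks[a].release()
            return False                        # caller re-enqueues the activity for retry
    try:
        apply_operator(active_node)             # updates become visible only on commit
        return True
    finally:
        for v in acquired:
            locks[v].release()
```

The locks dictionary would be built once up front, e.g. locks = {v: threading.Lock() for v in graph}.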
SYSTEMS FOR GRAPH ANALYTICS
• CombBLAS, Pregel, Giraph, PowerGraph, Ligra, Galois
• Important additional properties of a graph analytics system
  • Pointer-jumping operations
    • Improve performance of some algorithms, e.g., connected components (sketch below)
  • Graph partitioning among the hosts of a distributed-memory cluster
    • Minimize the number of edges that span multiple hosts
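Pointer jumping is easiest to see in a disjoint-set (union-find) structure, as used by connected-components algorithms. In the sketch below, each find() redirects every visited node to its grandparent, halving the depth of the tree; the dictionary-based representation is an assumption for illustration:

```python
parent = {}                              # node -> parent in the disjoint-set forest

def find(v):
    """Return the representative of v's set, pointer-jumping along the way."""
    parent.setdefault(v, v)
    while parent[v] != v:
        parent[v] = parent[parent[v]]    # pointer jumping: point v to its grandparent
        v = parent[v]
    return v

def union(u, v):
    parent.setdefault(u, u)
    parent.setdefault(v, v)
    parent[find(u)] = find(v)            # merge the two sets

def connected_components(edges):
    """Label every vertex that appears in an edge with its component representative."""
    for u, v in edges:
        union(u, v)
    return {v: find(v) for v in parent}
```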
PERFORMANCE STUDIES
• Trade-off between abstraction and limitations
  • Abstractions may rule out some implementations
  • Abstractions introduce performance penalties compared to native code
• Performance comparison on power-law graphs
  • Galois vs. a native implementation and other systems
• Performance comparison with varying graph structures
  • Twitter graph: 51 million nodes, 2 billion edges
  • U.S. road network: 24 million nodes, 58 million edges
PERFORMANCE STUDIES (CHARTS)
[Slides 23–24 show the two performance-comparison charts, discussed in editor's notes #24 and #25 below.]
CONCLUSIONS
• PGA (parallel graph analytics) is one of the most rapidly growing research areas in network science
• Amazon and Netflix adopt it to find patterns in their customers' purchases (recommender systems)
• Intelligence agencies use it to find the key players in terrorist networks
• Building efficiently engineered systems for PGA is one of the major current challenges
REFERENCES
[1] K. Pingali and D. Nguyen, "The Tao of Parallelism in Algorithms," 2011.
[2] R. Nasre, M. Burtscher, and K. Pingali, "Data-driven versus Topology-driven Irregular Computations on GPUs," 2013, pp. 3–6.
[3] D. Ajwani, "Trade-offs in Processing Large Graphs: Representations, Storage, Systems and Algorithms," 2015, pp. 43–53.

Editor's Notes

  • #4: I'll start with the TAO analysis, which provides a useful taxonomy for graph analytics. We'll now walk through these categories, which are full of key concepts that we'll use later.
  • #5: First off, we'll talk about graph topologies. I'll go through these quickly, since they are part of the course contents. Uniform-degree graphs have fairly uniform node degrees; a road network is a typical example. Power-law graphs have a high clustering coefficient, a low diameter relative to the size of the graph, and the rich-get-richer property. Finally, Erdős–Rényi graphs are essentially random graphs.
  • #6: Push: reads the label of the active node and writes to its neighbors. Pull: the active node reads the labels of its neighbors and updates its own label.
  • #7: Unordered: soft priorities may not be respected during execution. Topology-driven: every node is processed in each super-step, until a convergence criterion is reached. This may be work-inefficient, since not every node has useful computation to perform; it is particularly inefficient sequentially and on large sparse graphs. Despite this, it is very efficient when combined with GPU computing and is also quite easy to implement. Data-driven: worklists may become a centralization bottleneck, and each computation may create new activities that are then added to the worklist.
  • #8: By speculative work, I mean work that may turn out to be wasted (work-inefficiency). Delta must be set appropriately (usually a small value), so that the algorithm remains work-efficient (as with delta = 1) without choking off parallelism. Buckets: if delta = 10, the first bucket covers distances 0 to 9, the second 10 to 19, and so on.
  • #9: In the top-right corner of the slide is the pseudocode of the algorithm; the bold part corresponds to what is currently executing, and the color marks the currently selected bucket. In the first iteration, we select the initial node and relax along its light edges.
  • #10: So we update the node labels, and we relax again on light edges from the new nodes.
  • #11: Here we update the distance of the node from 9 to 8, and at the next step we do the same for the node whose label is currently set to 10.
  • #12: Here we have just one more node to process along light edges for the green bucket.
  • #13: We now relax along the heavy edges, which are the bold ones, updating the node labels.
  • #14: Now that we have finished the first iteration, we select the yellow bucket.
  • #15: In the next step we have just one light edge to relax along.
  • #16: We then update the label after the relaxation, and in the next step we'll relax along the heavy edges.
  • #17: There’s nothing to do, so we simply mark the nodes as processed and we terminate.
  • #18: If we view the buckets as worklists, we can easily see how new activities are pushed into the worklist; only a data-driven algorithm can do this. With a topology-driven algorithm, all nodes would be active nodes, and at each iteration we would apply the operator in a sweep over the whole graph, repeating the cycle until a convergence criterion is reached.
  • #19: An example of a data-parallel operation is a map, which applies a function f to a set of nodes and produces a new set with the same cardinality. AMORPHOUS: what matters in this parallel abstraction is the memory model, which defines the semantics of reads and writes in overlapping regions.
  • #20: The update of vertex and edge labels can be interpreted as communication between rounds. Here we have an example of a reduction operation applied to the Bellman-Ford algorithm, used to update a node label when there are conflicting updates during execution. Power-law graphs have a small diameter and high connectivity, so a large number of processors are kept busy and there are few rounds to synchronize. Road networks are the opposite: only a small number of processors are used and a large number of rounds is needed.
  • #21: Autonomous: if a conflict is detected, some of the conflicting activities are automatically aborted; otherwise the activity commits and its updates become visible. Static: the active nodes can be executed in parallel without any conflict checking, as in a matrix-vector product, which never produces conflicts. Just-in-time: the input graph is preprocessed to find conflict-free schedules to execute; this strategy is very strong when combined with topology-driven algorithms. Runtime: execution proceeds in rounds; a set of active nodes is chosen, their neighborhoods are computed, and only a set of non-conflicting activities is chosen for execution.
  • #22: So let's start talking about systems. These are the most popular systems for graph analytics; Galois is the system implemented by the researchers of the article I've chosen. In addition to the TAO components we've discussed so far, these systems should have two more properties. Pointer-jumping operations are essential for better performance in some algorithms, like connected components: we simply redirect a node with degree 1 to the father of its father, as in the figure shown, which helps iterate faster using disjoint-set operations. Graph partitioning is essential if we want to compute on a distributed-memory cluster; to reduce the number of edges spanning hosts, we can partition by edges instead of nodes, a technique called 2D partitioning.
  • #23: The abstractions make implementations easier for programmers, but may introduce limitations; some limitations may even rule out certain kinds of algorithm implementations. The abstractions also introduce performance penalties compared to a native implementation.
  • #24: In the first study, we see common problems being solved on a power-law graph using different systems, compared against the native implementation. Galois does well, coming close to the native implementation and being only 1.2% slower. We'll explain why later.
  • #25: In this study, we compare Galois with other systems on varying graph structures. Galois performs better on the road network because it is the only system that implements transactional semantics, and, as we said before, BSP-style semantics are work-inefficient on uniform-degree graphs. It also performs better on SSSP, thanks to its support for data-driven algorithms, and on the connected-components test, thanks to its pointer-jumping operations.
  • #26: Amazon and Netflix: …for their recommender systems, made possible by collaborative filtering techniques. Even Galois has its own shortcomings; in fact, it cannot operate on distributed-memory systems (for now).