PARALLEL GRAPH ANALYTICS
STEFANO ROMANAZZI
STEFAN.ROMANAZZI@GMAIL.COM
INTRODUCTION
• Today we must deal with huge graphs
• Parallel computing is essential for analyzing such graphs
• Web graph: ~50 billion nodes, ~1 trillion edges
• Facebook graph: ~1 billion nodes, ~200 billion edges
TAO ANALYSIS
The TAO analysis classifies graph analytics applications along three axes:
• Topology: uniform degree, power-law, Erdős–Rényi
• Active nodes
  • Location: topology-driven, data-driven
  • Ordering: unordered, ordered
• Operator: morph, local computation, reader
GRAPH TOPOLOGIES
• Uniform-degree graphs (e.g., road networks)
• Power-law graphs
• Erdős–Rényi (random) graphs (see the generation sketch below)
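To make the three topologies concrete, here is a minimal sketch (not from the slides) that builds one small instance of each and prints how spread out the degrees are. The use of networkx, the specific generators, and the size n = 1000 are all illustrative assumptions:

```python
import networkx as nx

n = 1000
uniform = nx.random_regular_graph(4, n)       # every node has degree 4 (road-network-like)
power_law = nx.barabasi_albert_graph(n, 4)    # preferential attachment -> power-law degrees
random_er = nx.erdos_renyi_graph(n, 8 / n)    # Erdős–Rényi: each edge present with probability p

for name, g in [("uniform", uniform), ("power-law", power_law), ("Erdős–Rényi", random_er)]:
    degrees = [d for _, d in g.degree()]
    print(f"{name:12s} max degree = {max(degrees):4d}, mean degree = {sum(degrees) / n:.1f}")
```

The uniform-degree graph keeps the maximum degree equal to the mean, while the power-law graph shows a few very high-degree hubs.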
OPERATORS
• Morph: modifies the structure of the graph
• Local computation: updates labels on nodes and edges
  • Push-style operator: reads the label of the active node and writes to its neighbors
  • Pull-style operator: reads the labels of the neighbors and updates the active node (sketch below)
• Reader: operates on read-only data
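As an illustration, here is a minimal Python sketch of the two local-computation styles, using an assumed adjacency-list representation (graph[v] is a list of (neighbour, weight) pairs, undirected or with in-edges available for the pull case) and a distance-style label as in SSSP:

```python
def push_operator(v, graph, label):
    """Push style: read the active node's label, write to its neighbours' labels."""
    for u, w in graph[v]:
        label[u] = min(label[u], label[v] + w)

def pull_operator(v, graph, label):
    """Pull style: read the neighbours' labels, update the active node's own label."""
    for u, w in graph[v]:
        label[v] = min(label[v], label[u] + w)
```

In a parallel setting, the push version needs synchronized writes to the neighbours' labels, while the pull version only writes to the active node's own label.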
ACTIVE NODES
• ORDERING
  • Ordered algorithms: active nodes must be processed in a specific order
  • Unordered algorithms: any processing order is semantically correct
    • Soft priorities can be defined to improve efficiency
• LOCATION
  • Topology-driven algorithms: all graph nodes are active nodes
    • Work-inefficient, but easier to implement on GPUs
  • Data-driven algorithms: visit nodes only if there may be work to perform
    • Threads obtain work by pulling active nodes from a worklist (see the sketch below)
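The difference in location is easiest to see in code. Below is a minimal single-threaded sketch of BFS-level computation done both ways; graph[v] as a list of neighbours and BFS levels as the example problem are assumptions for illustration, and in a real system the worklist would be shared by many threads:

```python
from collections import deque

def topology_driven_levels(graph, source):
    """Topology-driven: every node is active in every round, until nothing changes."""
    level = {v: float("inf") for v in graph}
    level[source] = 0
    changed = True
    while changed:                      # one sweep over ALL nodes per round
        changed = False
        for v in graph:
            for u in graph[v]:
                if level[v] + 1 < level[u]:
                    level[u] = level[v] + 1
                    changed = True
    return level

def data_driven_levels(graph, source):
    """Data-driven: work is pulled from a worklist; here a simple deque."""
    level = {v: float("inf") for v in graph}
    level[source] = 0
    worklist = deque([source])
    while worklist:
        v = worklist.popleft()
        for u in graph[v]:
            if level[v] + 1 < level[u]:
                level[u] = level[v] + 1
                worklist.append(u)      # new work is created only where it is needed
    return level
```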
ASYNCHRONOUS Δ-STEPPING
• An example of a data-driven algorithm
• A parallel version of Dijkstra's algorithm
• Δ is a user-defined parameter
  • The higher Δ, the more parallelism, but also the more speculative work
• Two types of edges
  • Heavy edge if w(e) > Δ
  • Light edge if w(e) ≤ Δ
• Buckets represent worklists
  • Bucket i contains vertices with tentative distance in [Δ·(i − 1), Δ·i) (sequential sketch below)
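A sequential sketch of Δ-stepping follows; buckets are 0-indexed here, so bucket i covers [i·Δ, (i+1)·Δ). It captures the light/heavy split and the bucket worklists, but not the parallel, asynchronous execution of a real implementation; graph[v] as a list of (neighbour, weight) pairs is an assumption:

```python
import math

def delta_stepping(graph, source, delta):
    """Sequential sketch of Δ-stepping (single-source shortest paths)."""
    dist = {v: math.inf for v in graph}
    dist[source] = 0.0
    buckets = {0: {source}}                       # bucket index -> worklist of vertices

    def relax(u, d):
        if d < dist[u]:
            if dist[u] < math.inf:                # move u out of its old bucket, if any
                buckets.get(int(dist[u] // delta), set()).discard(u)
            dist[u] = d
            buckets.setdefault(int(d // delta), set()).add(u)

    while buckets:
        i = min(buckets)                          # smallest bucket index still present
        settled = set()
        while buckets.get(i):                     # light phase: edges with w <= delta
            frontier = buckets.pop(i)
            settled |= frontier
            for v in frontier:
                for u, w in graph[v]:
                    if w <= delta:
                        relax(u, dist[v] + w)
        buckets.pop(i, None)                      # drop the bucket (it may linger, empty)
        for v in settled:                         # heavy phase: edges with w > delta, once
            for u, w in graph[v]:
                if w > delta:
                    relax(u, dist[v] + w)
    return dist
```

For example, with g = {'a': [('b', 3), ('c', 12)], 'b': [('c', 2)], 'c': []} and Δ = 5, delta_stepping(g, 'a', 5) returns distances 0, 3 and 5 for a, b and c.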
ASYNCHRONOUS Δ-STEPPING (WORKED EXAMPLE)
[Slides 8–17 step through Δ-stepping on a small example graph, bucket by bucket; the narration is in editor's notes #9–#17 below.]
PARALLEL ABSTRACTIONS
• Data parallelism
  • Operates through predefined data-parallel operations (e.g., map)
• Amorphous data-parallelism
  • Prevents activities on overlapping neighborhoods from executing in parallel
  • Supports ordering constraints
  • The execution of an activity may create new activities
BSP-STYLE SEMANTICS
• Programs are executed in rounds
  • Barrier synchronization between rounds
• Multiple updates to the same node can be resolved with reduction operations
  • If (u, v) is an edge: dist(v) = min{dist(v), dist(u) + l(u, v)} (sketch below)
• Good performance if all processors are kept busy
  • e.g., on power-law graphs
• Poor performance on road networks
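As a concrete illustration of the BSP model, here is a minimal round-based SSSP (Bellman-Ford) sketch: each round reads only the distances from the previous round, copying the distance map plays the role of the barrier, and min acts as the reduction that resolves multiple updates to the same vertex. The graph representation (graph[v] = list of (neighbour, weight)) is an assumption:

```python
import math

def bsp_sssp(graph, source):
    """BSP-style sketch of single-source shortest paths (Bellman-Ford)."""
    dist = {v: math.inf for v in graph}
    dist[source] = 0.0
    for _ in range(len(graph) - 1):               # at most |V| - 1 rounds are needed
        new_dist = dict(dist)                     # updates become visible only in the next round
        for u in graph:                           # conceptually, one parallel task per vertex
            if dist[u] < math.inf:
                for v, w in graph[u]:
                    # reduction: dist(v) = min{dist(v), dist(u) + l(u, v)}
                    new_dist[v] = min(new_dist[v], dist[u] + w)
        # barrier: all tasks finish before the next round starts
        if new_dist == dist:                      # no change in this round: converged
            return dist
        dist = new_dist
    return dist
```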
TRANSACTIONAL SEMANTICS
• Prevent activities from executing in parallel if they conflict
• Activities do not see concurrent activities
  • Updates become visible only after an activity completes
• SCHEDULING TYPES
  • Autonomous scheduling: abort some of the conflicting activities (sketch below)
  • Coordinated scheduling
    • Static parallelization
    • Just-in-time parallelization
    • Runtime parallelization
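One way to picture autonomous scheduling is optimistic, lock-based conflict detection. The sketch below is a hypothetical illustration (the slides only say that conflicting activities are aborted, not how conflicts are detected): an activity tries to lock its whole neighborhood, aborts and releases everything if any lock is already held, and commits its updates otherwise. graph[v] as a list of neighbours and the apply_operator callback are assumptions:

```python
import threading

def try_activity(active_node, graph, locks, apply_operator):
    """Optimistically lock the activity's neighbourhood; abort on conflict.
    locks maps every vertex to a pre-created threading.Lock;
    apply_operator performs the actual label updates."""
    neighbourhood = sorted({active_node, *graph[active_node]}, key=str)  # fixed order
    acquired = []
    for v in neighbourhood:
        if locks[v].acquire(blocking=False):
            acquired.append(v)
        else:                                   # conflict with a concurrent activity: abort
            for a in acquired:
                locks[a].release()
            return False                        # caller re-enqueues the activity for retry
    try:
        apply_operator(active_node)             # updates become visible only on commit
        return True
    finally:
        for v in acquired:
            locks[v].release()
```

The locks dictionary would be built once up front, e.g. locks = {v: threading.Lock() for v in graph}.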
SYSTEMS FOR GRAPH ANALYTICS
• CombBLAS, Pregel, Giraph, PowerGraph, Ligra, Galois
• Important additional properties of a graph analytics system
  • Pointer-jumping operations
    • Improve performance of some algorithms, e.g., connected components (sketch below)
  • Graph partitioning among the hosts of a distributed-memory cluster
    • Minimize the number of edges that span multiple hosts
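Pointer jumping is easiest to see in a disjoint-set (union-find) structure, as used by connected-components algorithms. In the sketch below, each find() redirects every visited node to its grandparent, halving the depth of the tree; the dictionary-based representation is an assumption for illustration:

```python
parent = {}                              # node -> parent in the disjoint-set forest

def find(v):
    """Return the representative of v's set, pointer-jumping along the way."""
    parent.setdefault(v, v)
    while parent[v] != v:
        parent[v] = parent[parent[v]]    # pointer jumping: point v to its grandparent
        v = parent[v]
    return v

def union(u, v):
    parent.setdefault(u, u)
    parent.setdefault(v, v)
    parent[find(u)] = find(v)            # merge the two sets

def connected_components(edges):
    """Label every vertex that appears in an edge with its component representative."""
    for u, v in edges:
        union(u, v)
    return {v: find(v) for v in parent}
```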
PERFORMANCE STUDIES
• Trade-off between abstraction and limitations
  • Abstractions may rule out some implementations
  • Abstractions introduce performance penalties compared to native code
• Performance comparison on power-law graphs
  • Galois vs. a native implementation and other systems
• Performance comparison with varying graph structures
  • Twitter graph: 51 million nodes, 2 billion edges
  • U.S. road network: 24 million nodes, 58 million edges
PERFORMANCE STUDIES (CHARTS)
[Slides 23–24 show the two performance-comparison charts, discussed in editor's notes #24 and #25 below.]
CONCLUSIONS
• PGA (parallel graph analytics) is one of the most rapidly growing research areas in network science
• Amazon and Netflix adopt it to find patterns in their customers' purchases (recommender systems)
• Intelligence agencies use it to find the key players in terrorist networks
• Building efficiently engineered systems for PGA is one of the major current challenges
REFERENCES
[1] K. Pingali and D. Nguyen, "The Tao of Parallelism in Algorithms," 2011.
[2] R. Nasre, M. Burtscher, and K. Pingali, "Data-driven versus Topology-driven Irregular Computations on GPUs," 2013, pp. 3–6.
[3] D. Ajwani, "Trade-offs in Processing Large Graphs: Representations, Storage, Systems and Algorithms," 2015, pp. 43–53.

Editor's Notes

  • #4: I'll start with the TAO analysis, which provides a useful taxonomy for graph analytics. We'll now walk through these categories, which are full of key concepts that we'll use later.
  • #5: First off, we'll talk about graph topologies. I'll go through these quickly, since they are part of the course contents. Uniform-degree graphs have fairly uniform node degrees; a road network is a typical example. Power-law graphs have a high clustering coefficient, a low diameter relative to the size of the graph, and the rich-get-richer property. Finally, Erdős–Rényi graphs are essentially random graphs.
  • #6: Push: reads the label of the active node and writes to its neighbors. Pull: the active node reads the labels of its neighbors and updates its own label.
  • #7: Unordered: soft priorities may not be respected during execution. Topology-driven: every node is processed in each super-step, until a convergence criterion is reached. This may be work-inefficient, since not every node has useful computation to perform; it is particularly inefficient sequentially and on large sparse graphs. Despite this, it is very efficient when combined with GPU computing and is also quite easy to implement. Data-driven: worklists may become a centralization bottleneck, and each computation may create new activities that are then added to the worklist.
  • #8: By speculative work, I mean work that may turn out to be wasted (work-inefficiency). Delta must be set appropriately (usually a small value), so that the algorithm remains work-efficient (as with delta = 1) without choking off parallelism. Buckets: if delta = 10, the first bucket covers distances 0 to 9, the second 10 to 19, and so on.
  • #9: In the top-right corner of the slide is the pseudocode of the algorithm; the bold part corresponds to what is currently executing, and the color marks the currently selected bucket. In the first iteration, we select the initial node and relax along its light edges.
  • #10: So we update the node labels, and we relax again on light edges from the new nodes.
  • #11: Here we update the distance of the node from 9 to 8, and at the next step we do the same for the node whose label is currently set to 10.
  • #12: Here we have just one more node to process along light edges for the green bucket.
  • #13: We now relax along the heavy edges, which are the bold ones, updating the node labels.
  • #14: Now that we have finished the first iteration, we select the yellow bucket.
  • #15: In the next step we have just one light edge to relax along.
  • #16: We then update the label after the relaxation, and in the next step we'll relax along the heavy edges.
  • #17: There’s nothing to do, so we simply mark the nodes as processed and we terminate.
  • #18: If we view the buckets as worklists, we can easily see how new activities are pushed into the worklist; only a data-driven algorithm can do this. With a topology-driven algorithm, all nodes would be active nodes, and at each iteration we would apply the operator in a sweep over the whole graph, repeating the cycle until a convergence criterion is reached.
  • #19: An example of a data-parallel operation is a map, which applies a function f to a set of nodes and produces a new set with the same cardinality. AMORPHOUS: what matters in this parallel abstraction is the memory model, which defines the semantics of reads and writes in overlapping regions.
  • #20: The update of vertex and edge labels can be interpreted as communication between rounds. Here we have an example of a reduction operation applied to the Bellman-Ford algorithm, used to update a node label when there are conflicting updates during execution. Power-law graphs have a small diameter and high connectivity, so a large number of processors are kept busy and there are few rounds to synchronize. Road networks are the opposite: only a small number of processors are used and a large number of rounds is needed.
  • #21: Autonomous: if a conflict is detected, some of the conflicting activities are automatically aborted; otherwise the activity commits and its updates become visible. Static: the active nodes can be executed in parallel without any conflict checking, as in a matrix-vector product, which never produces conflicts. Just-in-time: the input graph is preprocessed to find conflict-free schedules to execute; this strategy is very strong when combined with topology-driven algorithms. Runtime: execution proceeds in rounds; a set of active nodes is chosen, their neighborhoods are computed, and only a set of non-conflicting activities is chosen for execution.
  • #22: So let's start talking about systems. These are the most popular systems for graph analytics; Galois is the system implemented by the researchers of the article I've chosen. In addition to the TAO components we've discussed so far, these systems should have two more properties. Pointer-jumping operations are essential for better performance in some algorithms, like connected components: we simply redirect a node with degree 1 to the father of its father, as in the figure shown, which helps iterate faster using disjoint-set operations. Graph partitioning is essential if we want to compute on a distributed-memory cluster; to reduce the number of edges spanning hosts, we can partition by edges instead of nodes, a technique called 2D partitioning.
  • #23: The abstractions make implementations easier for programmers, but may introduce limitations; some limitations may even rule out certain kinds of algorithm implementations. The abstractions also introduce performance penalties compared to a native implementation.
  • #24: In the first study, we see common problems being solved on a power-law graph using different systems, compared against the native implementation. Galois does well, coming close to the native implementation and being only 1.2% slower. We'll explain why later.
  • #25: In this study, we compare Galois with other systems on varying graph structures. Galois performs better on the road network because it is the only system that implements transactional semantics, and, as we said before, BSP-style semantics are work-inefficient on uniform-degree graphs. It also performs better on SSSP, thanks to its support for data-driven algorithms, and on the connected-components test, thanks to its pointer-jumping operations.
  • #26: Amazon and Netflix: …for their recommender systems, made possible by collaborative filtering techniques. Even Galois has its own shortcomings; in fact, it cannot operate on distributed-memory systems (for now).