Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model

Fast Billion-scale Graph Computation Using a Bimodal
Block Processing Model
Hugo Gualdron1, Robson Cordeiro1, Jose Rodrigues-Jr1,
Duen Horng (Polo) Chau 2, Minsuk Kahng2, U Kang3
1University of Sao Paulo, Brazil
2Georgia Institute of Technology, Atlanta, USA
3Seoul National University, Republic of Korea
{gualdron,robson,junio}@icmc.usp.br, {polo,kahng}@gatech.edu, ukang@snu.ac.kr
Sept/2016
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 1 / 28

Summary
1 Introduction
2 Graph Organization
3 Processing Model
4 Programming Model
5 Results
6 Conclusions
Gualdron et al. (1

Introduction
Summary
1 Introduction
3 Processing Model
4 Programming Model
5 Results
6 Conclusions
Gualdron et al. (1

Introduction
Introduction
Frameworks for large-graph processing based on secondary memory
have shown better or equal performance than distributed frameworks.
They are better to solve problems such as PageRank, Connected
Components and Triangle Counting.
Large-graph processing by minimizing computational resources is very
important for the industry.
“You can have a second computer once you have shown you know
how to use the ﬁrst one.”
Paul Barham
Gualdron et al. (1

Introduction
Contributions
M-Flash, the fastest graph computation framework to date.
A Bimodal Block Processing Model, an innovation that is able to
boost the graph computation by minimizing the I/O cost even further.
A ﬂexible and simple programming model to easily implement
popular and essential graph algorithm including the ﬁrst
single-machine billion-scale eigensolver.
Gualdron et al. (1

Graph Organization
Summary
1 Introduction
3 Processing Model
4 Programming Model
5 Results
6 Conclusions
Gualdron et al. (1

Graph Organization
Graph Organization – Facts
Edges and vertex values do not ﬁt in main memory.
Sequential operations on disk must be maximized, improving the
performance on HDDs and SSDs.
Real graphs have a varying density of edges (sparse and dense
regions) and these regions can be processed in diﬀerent ways to
achieve superior performance.
Gualdron et al. (1

Graph Organization
Graph Organization
Each vertex has a set of attributes γ
Gualdron et al. (1

Graph Organization
Graph Organization
Vertices are divided into intervals.
Intersections between intervals are called blocks
Edges are divided into blocks
Gualdron et al. (1

Graph Organization
Graph Organization
M-Flash loads in memory some vertex intervals and edges are
processed using streaming.
Gualdron et al. (1

Graph Organization
Graph Organization
Edges create a source-partition (SP) when they are grouped by source interval.
Gualdron et al. (1

Graph Organization
Graph Organization
Edges create a destination-partition (DP) when they are grouped by destination interval.
Gualdron et al. (1

Graph Organization
Number of intervals to process a graph
β =
φ(T + 1) |V |
M
M = RAM size
|V | = Number of vertices of the graph
φ = Size of data per vertex
T = Number of threads
Gualdron et al. (1

Graph Organization
β =
φ(T + 1) |V |
M
M = RAM size
For example, 4 bytes of data per node, 2 threads, a graph with 2 billion
nodes, and for 1 GB RAM:
Gualdron et al. (1

Graph Organization
β =
φ(T + 1) |V |
M
M = RAM size
For example, 4 bytes of data per node, 2 threads, a graph with 2 billion
nodes, and for 1 GB RAM:
β = 23 intervals
529 blocks
Gualdron et al. (1

Processing Model
Summary
1 Introduction
3 Processing Model
4 Programming Model
5 Results
6 Conclusions
Gualdron et al. (1

Processing Model
Processing Model
Dense Block Processing model DBP
Gualdron et al. (1

Processing Model
Processing Model
Dense Block Processing model DBP
An eﬃcient model to process blocks with higher density, i.e., blocks
with high number of edges.
Gualdron et al. (1

Processing Model
Processing Model
Streaming Partition Processing Model SPP
SP = Source-partition, DP = Destination-partition
Gualdron et al. (1

Processing Model
Processing Model
Streaming Partition Processing Model SPP
SP = Source-partition, DP = Destination-partition
An eﬃcient model for processing sparse blocks.
Gualdron et al. (1

Processing Model
Processing Model
Bimodal Block Processing Model BBP
I/O cost as a metric of efficiency
DBP = Dense Block Processing, SSP = Streaming Partition Processing
O (DBP (G)) = O
(β + 1) |V | + |E|
B
+ β2
O (SPP (G)) = O


2 |V | + |E| + 2 Ê
B
+ β


β = Number of intervals
V = Number of Vertices
E = Number of Edges
Ê = Number of edges with extended size
B = Block size for I/O operations on disk
Gualdron et al. (1

Processing Model
Processing Model
I/O cost per block G(p,q)
O DBP G(p,q)
= O
ϑφ (1 + 1/β) + ξψ
B
O SPP G(p,q)
= O
2ϑφ/β + 2ξ(φ + ψ) + ξψ
B
ξ = Number of edges withing the block G(p,q)
ϑ = Number of vertices within the interval
φ = Vertex size in bytes
ψ = Edge size in bytes
B = Block size for I/O operations on disk.
Gualdron et al. (1

Processing Model
Processing Model
Ratio SPP/DBP
O
SPP
DBP
= O
1
β
+
2ξ
ϑ
1 +
ψ
φ
BlockType G(p,q)
=
sparse, if O SPP
DBP
< 1
dense, otherwise
Gualdron et al. (1

Processing Model
Processing Model — Preprocessing 1
Edges are divided in source intervals.
Gualdron et al. (1

Processing Model
The number of edges is measured at the same time to classify the block (sparse or dense).
Gualdron et al. (1

Processing Model
Source Partitions are rewritten and we store only edges of sparse blocks.
Gualdron et al. (1

Processing Model
Processing Model - an iteration
Destination-partitions are generated since the source-partitions.
For processing a source-partition, vertex values of the source interval are loaded in
memory.
The edges are rewritten and divided by destination.
Gualdron et al. (1

Processing Model
The graph processing starts over the destination-partitions and the dense blocks.
Gualdron et al. (1

Processing Model
Vertex values of the destination interval are initialized.
Next, edges of the destination partition are processed.
Gualdron et al. (1

Processing Model
Next, dense blocks of the interval are processed. To do the processing, vertex values
associated with the source interval are loaded in memory.
Gualdron et al. (1

Processing Model
Programming Model
MAlgorithm: Algorithm Interface
initialize (Vertex v);
gather (Vertex u, Vertex v, EdgeData data);
process (Accum v 1, Accum v 2, Accum v out);
apply (Vertex v);
PageRank
degree(v) = out degree for Vertex v;
initialize (v): v.value = 0
gather (u, v, data): v.value += u.value / degree(u)
process (v 1, v 2, Accum v out): v out = v 1 + v 2
apply (Vertex v): v.value = 0.15 + 0.85 * v.value
Gualdron et al. (1

Programming Model
Summary
1 Introduction
3 Processing Model
4 Programming Model
5 Results
6 Conclusions
Gualdron et al. (1

Programming Model
Programming Model
Algorithm for Connected Components
initialize (v): v.value = v.id
gather (u, v, data): v.value = min (u.value, v.value)
process: (v 1, v 2, Accum v out): v out = min (v 1, v 2)
Algorithm for Sparse Matrix-Vector Multiplication SpMV
initialize (v): v.value = 0
gather (u, v, data): v.value += u.value * data
process: (v 1, v 2, Accum v out): v out = v 1 + v 2
Gualdron et al. (1

Results
Summary
1 Introduction
3 Processing Model
4 Programming Model
5 Results
6 Conclusions
Gualdron et al. (1

Results
Results
Datasets
Graph Nodes Edges Size
LiveJournal 4,847,571 68,993,773 Small
Twitter 41,652,230 1,468,365,182 Medium
YahooWeb 1,413,511,391 6,636,600,779 Large
R-Mat (Synthetic graph) 4,000,000,000 12,000,000,000 Large
Gualdron et al. (1

Results
Results
State of the art approaches that are compared with
the experiments
GraphChi (2012)
X-Stream (2013)
TurboGraph (2013)
MMap (2014)
GridGraph (2015)
M-Flash
Gualdron et al. (1

Results
Preprocessing time (seconds)
LiveJournal Twitter YahooWeb R-Mat
GraphChi 23 511 2,781 7,440
X-Stream 5 131 865 2,553
TurboGraph 18 582 4,694 -
MMap 17 372 636 -
M-Flash 10 206 1,265 4,837
In the worst case, M-Flash is twice slower than the best framework
(MMAP)
It reads and writes two times the entire graph on disk, which is the
third best performance, after MMap and X-Stream.
Gualdron et al. (1

Results
Runtime (in seconds) with 8GB of RAM
The symbol “-” indicates that the corresponding system failed to process
the graph or the information is not available in the respective papers.
Gualdron et al. (1

Results
Eﬀect of Memory Size
Gualdron et al. (1

Conclusions
Summary
1 Introduction
3 Processing Model
4 Programming Model
5 Results
6 Conclusions
Gualdron et al. (1

Conclusions
Conclusions
M-Flash uses an innovative design that considers the varying density
of the graph;
By using an adaptive engineering, it beneﬁts from the best of both
worlds: stream processing and dense-block processing;
Its programming model supports classical and new algorithms
straightly adapting to the model of other similar frameworks – among
its processing possibilities: PageRank, Connected Components,
matrix-vector multiplication, eigensolver, clustering coeﬃcient, to
name few;
To date, M-Flash is the fastest single-node graph processing
framework, beating competitors GraphChi, X-Stream, TurboGraph
and MMAP.
Gualdron et al. (1

Conclusions
Acknowledgments
CNPq (grant 444985/2014-0)
Fapesp (grants 2016/02557-0, 2014/21483-2),
Capes
NSF (grants IIS-1563816, TWC-1526254, IIS-1217559)
GRFP (grant DGE-1148903)
Korean (MSIP) agency IITP (grant R0190-15-2012)
Thanks
Gualdron et al. (1

Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model

More Related Content

Viewers also liked (20)

Similar to Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model (20)

More from Universidade de São Paulo (13)

Recently uploaded (20)

Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model