distributed graph algorithms
Generalized Architecture For Some Graph Problems
Abhilash Kumar and Saurav Kumar
November 10, 2015
Indian Institute of Technology Kanpur
problem statement
∙ Compute all connected sub-graphs of a given graph, in a distributed environment
∙ Develop a generalized architecture to solve similar graph problems
motivation
∙ Exponential number of connected sub-graphs of a given graph
∙ Necessity to build distributed systems that utilize the worldwide plethora of distributed resources
approach
Insights
∙ Connected sub-graphs exhibit sub-structure
∙ Extend smaller sub-graphs by adding an outgoing edge to generate larger sub-graphs
∙ Base cases are the single-edge sub-graphs, one for each edge of the graph
approach
Algorithm to compute all connected sub-graphs
∙ Initialize:
  ∙ Queue Q
  ∙ For each edge in the input graph G:
    ∙ Create a sub-graph G' representing that edge
    ∙ Push G' to Q
∙ Process:
  ∙ While Q is not empty:
    ∙ G' = Q.pop()
    ∙ Save G'
    ∙ For each outgoing edge E of G':
      ∙ G'' = G' ∪ E
      ∙ If G'' has not been seen yet, push G'' to Q
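On a single machine, the steps above can be sketched in Python as follows (a minimal sketch; helper names are ours, not from the DGA repository):

```python
from collections import deque

def connected_subgraphs(edges):
    """Enumerate all connected edge-induced sub-graphs of an
    undirected graph given as a list of edges (u, v)."""
    # Map each vertex to the set of edges incident to it.
    incident = {}
    for edge in edges:
        for v in edge:
            incident.setdefault(v, set()).add(edge)

    queue, seen = deque(), set()
    # Initialize: one single-edge sub-graph per edge of the graph.
    for edge in edges:
        g = frozenset([edge])
        queue.append(g)
        seen.add(g)

    # Process: pop a sub-graph, save it, extend by one outgoing edge.
    while queue:
        g = queue.popleft()
        yield g                                   # "Save G'"
        vertices = {v for edge in g for v in edge}
        for v in vertices:
            for edge in incident[v] - g:          # outgoing edges of G'
                g2 = g | {edge}                   # G'' = G' ∪ E
                if g2 not in seen:                # uniqueness check
                    seen.add(g2)
                    queue.append(g2)
```

For the triangle on three vertices this yields 7 connected sub-graphs: 3 single edges, 3 two-edge paths and the full triangle.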
approach
Figure: Generating initial sub-graphs from a given graph
Figure: Extending a sub-graph to generate new sub-graphs
Figure: Only unique generated sub-graphs are considered for further processing
architecture
Master-Slave Architecture
∙ Commonly used approach for parallel and distributed applications
∙ Message passing over TCP for communication
∙ Master assigns tasks to slaves and finally collects the results
∙ A Task object represents a sub-graph and contains all information necessary to process it
∙ A slave may request a task from other slaves when its own task queue is empty; processing ends when all task queues are empty
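A slave's life cycle can be pictured with the following simplified, single-process sketch (function names are illustrative assumptions, not the DGA interface):

```python
import queue

def slave_loop(task_queue, request_task_from_peer, process, save, is_unique):
    """Main loop of a slave: drain the local task queue, push unique
    children back, and steal work from peers when idle."""
    while True:
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            # Local queue empty: ask other slaves for work.
            task = request_task_from_peer()
            if task is None:          # every task queue is empty: finished
                return
        save(task)                    # report the result
        for child in process(task):
            if is_unique(child):      # distributed Bloom-filter check
                task_queue.put(child)
```

In the real system `request_task_from_peer` and `is_unique` would be network calls to other slaves and to the distributed Bloom filter.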
architecture
Task, Queue and Bloom filter
∙ A task carries this information:
  ∙ A list of vertices that are already in the sub-graph
  ∙ A list of edges that can be used to extend it in the next step
∙ Task Queue
  ∙ Each slave has a task queue
  ∙ A slave picks a task from its queue and processes it
  ∙ Newly generated unique tasks are pushed into the task queue
∙ Bloom filter
  ∙ A Bloom filter checks the uniqueness of newly generated tasks (i.e. sub-graphs)
  ∙ The Bloom filter is itself distributed, so that no single server gets overloaded
architecture
Bloom Filter vs Hashing
∙ We use a Bloom filter because it is very space efficient
∙ Space required for a false-positive probability p over n items is −n · ln p / (ln 2)² bits
∙ The error probability can be reduced with very little extra space
∙ Hashing can be used instead to make the algorithm deterministic
∙ A Bloom filter can also be parallelized, whereas hashing cannot be
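The sizing formula above can be checked directly (standard Bloom-filter math, not code from the project):

```python
import math

def bloom_filter_bits(n, p):
    """Bits m needed to store n items with false-positive
    probability p:  m = -n * ln(p) / (ln 2)^2."""
    return math.ceil(-n * math.log(p) / math.log(2) ** 2)

def optimal_hash_count(n, m):
    """Optimal number of hash functions:  k = (m / n) * ln 2."""
    return max(1, round(m / n * math.log(2)))
```

About 10 bits per item already gives a 1% error rate, and doubling that to 20 bits per item squares the error down to 0.01% — the "very little extra space" claim above.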
architecture
How to use this architecture?
∙ Two functions are required: initialize and process
∙ initialize generates the initial tasks; the master randomly assigns these tasks to the slaves
∙ process defines the procedure that generates new tasks from a given task (extending a sub-graph, in our case)
architecture
Fitting the connected sub-graph problem
∙ initialize creates all the one-edge tasks (sub-graphs)
∙ process takes a connected sub-graph and extends it by adding each extendable edge, one at a time
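For this problem, the two plug-in functions might look like the following sketch (the real interface lives in github.com/abhilak/DGA; names and signatures here are illustrative):

```python
def initialize(edges):
    """Create the initial tasks: one single-edge sub-graph per edge."""
    return [frozenset([edge]) for edge in edges]

def process(task, incident):
    """Extend one sub-graph (task) by each outgoing edge, one at a
    time; `incident` maps each vertex to its set of incident edges."""
    vertices = {v for edge in task for v in edge}
    outgoing = set()
    for v in vertices:
        outgoing |= incident[v] - task   # edges touching the sub-graph
    return [task | {edge} for edge in outgoing]
```

The framework itself then handles queuing, deduplication via the Bloom filter, and distribution across slaves.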
simulation
Simulation for testing
∙ Used 2 machines, say H and L
  ∙ H: 24 cores, 200 GB RAM, Xeon E5645 @ 2.40 GHz
  ∙ L: 4 cores, 8 GB RAM, i5-3230M @ 2.60 GHz
∙ Opened multiple ports (6 on H, 2 on L) to mimic 8 slave servers
∙ Used various combinations of numbers of slaves on H and L
∙ Used 2 tree graphs, G(14, 13) and G(16, 15): easy to verify results
∙ Collected the number of tasks processed and the number of hash-check queries made by each slave
∙ Collected total running-time data for both graphs, including cases with network faults
results
Figure: Number of hash check queries vs number of slaves for G(14, 13)
Figure: Distribution of number of tasks processed by slaves for G(14, 13)
Figure: Number of hash check queries vs number of slaves for G(16, 15)
Figure: Distribution of number of tasks processed by slaves for G(16, 15)
results
Actual Running Time
∙ Network faults occurred, especially because few physical machines were available
∙ The architecture recovers from these faults, but recovery consumes a lot of time
∙ For G(14, 13), running time ranged from 15 s to 91 s
∙ For G(16, 15), running time ranged from 255 s to 447 s
∙ These figures are for the case where the process function does no additional computation per sub-graph
Figure: Running time when process does additional computation (10 ms per sub-graph)
advantages
Advantages
∙ Highly scalable
  ∙ More slaves can be added easily
  ∙ Performance increases with the number of slaves
∙ Even distribution of tasks: efficient machines process more tasks
∙ Highly reusable architecture
  ∙ Many other problems can be solved with it
  ∙ Only two functions need to be provided: initialize and process
∙ Tolerant to network faults
advantages
Other problems that can be solved using this paradigm
∙ Generating all cliques, paths, cycles, sub-trees and spanning sub-trees
∙ A few classical NP-hard problems, such as enumerating all maximal cliques and TSP
future works
Further improvements
∙ Implement a parallelized Bloom filter
∙ Solve tasks in parallel within a slave (on powerful servers)
∙ Handle slave/master failures
∙ Use file I/O to store the task queue for large problems
∙ Explore this paradigm to solve other problems
conclusion
Conclusion
∙ The algorithm is efficient: the total computation is at most m × T, where T is the minimum computation required to find all sub-graphs and m is the number of edges
∙ In practice the running time is c × T, where c is much smaller; the bound on c can be improved to min(m, log T)
∙ Since we are enumerating all connected sub-graphs, T itself had better not be very large for the problem to be tractable
∙ The architecture lets us solve this problem in a much more scalable manner and significantly reduces the computation time, given good infrastructure and a careful implementation
Questions?
Implementation of the algorithm and the architecture is available at github.com/abhilak/DGA
Slides created using Beamer (mtheme) and plot.ly on ShareLaTeX
Thank You

More Related Content

PDF
Debugging and Profiling C++ Template Metaprograms
PDF
Kotlin functional programming basic@Kotlin TW study group
PDF
Categories for the Working C++ Programmer
PDF
Cilk - An Efficient Multithreaded Runtime System
ODP
Optimized declarative transformation First Eclipse QVTc results
PDF
С++ without new and delete
PPT
Queue implementation
ODP
IIUG 2016 Gathering Informix data into R
Debugging and Profiling C++ Template Metaprograms
Kotlin functional programming basic@Kotlin TW study group
Categories for the Working C++ Programmer
Cilk - An Efficient Multithreaded Runtime System
Optimized declarative transformation First Eclipse QVTc results
С++ without new and delete
Queue implementation
IIUG 2016 Gathering Informix data into R

What's hot (17)

PDF
Python - Lecture 10
PDF
3 little clojure functions
PDF
PDF
Two C++ Tools: Compiler Explorer and Cpp Insights
PPTX
Scilab: Computing Tool For Engineers
PDF
[CCC'21] Evaluation of Work Stealing Algorithms
PDF
Model checker for NTCC
PDF
Garbage collection
PDF
Modeling the Behavior of Threads in the PREEMPT_RT Linux Kernel Using Automata
PDF
Golang dot-testing-lite
DOCX
Net practicals lab mannual
PDF
PDF
Scilab-by-dr-gomez-june2014
PDF
SLE2015: Distributed ATL
PDF
Effective java item 80 and 81
PPTX
Cape2013 scilab-workshop-19Oct13
Python - Lecture 10
3 little clojure functions
Two C++ Tools: Compiler Explorer and Cpp Insights
Scilab: Computing Tool For Engineers
[CCC'21] Evaluation of Work Stealing Algorithms
Model checker for NTCC
Garbage collection
Modeling the Behavior of Threads in the PREEMPT_RT Linux Kernel Using Automata
Golang dot-testing-lite
Net practicals lab mannual
Scilab-by-dr-gomez-june2014
SLE2015: Distributed ATL
Effective java item 80 and 81
Cape2013 scilab-workshop-19Oct13
Ad

Viewers also liked (15)

PDF
18 Basic Graph Algorithms
PPT
1535 graph algorithms
PDF
Topological Sort
PPTX
Graph Traversal Algorithm
PPTX
Fano algorithm
PPTX
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
PPTX
Graph Traversal Algorithms - Depth First Search Traversal
PPTX
Shannon Fano
PPT
2.2 topological sort 02
PPTX
DFS and BFS
PPT
Graphs bfs dfs
PDF
Graph theory
PPT
Bfs and dfs in data structure
PPTX
Depth first search and breadth first searching
PPT
Compression
18 Basic Graph Algorithms
1535 graph algorithms
Topological Sort
Graph Traversal Algorithm
Fano algorithm
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
Graph Traversal Algorithms - Depth First Search Traversal
Shannon Fano
2.2 topological sort 02
DFS and BFS
Graphs bfs dfs
Graph theory
Bfs and dfs in data structure
Depth first search and breadth first searching
Compression
Ad

Similar to Distributed Graph Algorithms (20)

PPTX
PREGEL a system for large scale graph processing
PDF
Pregel: A System for Large-Scale Graph Processing
PPT
Graphs.pptGraphs.pptGraphs.pptGraphs.pptGraphs.pptGraphs.ppt
PPTX
MathWorks Interview Lecture
PPTX
Unit ix graph
PDF
Graphhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pdf
PPT
Graphs
PPTX
Unit 9 graph
PPTX
Unit 4 dsuc
PPT
PPTX
12_Graph.pptx
PPTX
Data structure
PDF
Introducing Apache Giraph for Large Scale Graph Processing
PPTX
Data Structure and algorithms - Graph1.pptx
PPTX
WEB DEVELOPMET FRONT END WITH ADVANCED RECEAT
PPTX
Data structure Graph PPT ( BFS & DFS ) NOTES
PPTX
Graph_data_structure_information_engineering.pptx
PPTX
Algorithms and data Chapter 3 V Graph.pptx
PDF
Talk on Graph Theory - I
PPTX
6. Graphs
PREGEL a system for large scale graph processing
Pregel: A System for Large-Scale Graph Processing
Graphs.pptGraphs.pptGraphs.pptGraphs.pptGraphs.pptGraphs.ppt
MathWorks Interview Lecture
Unit ix graph
Graphhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pdf
Graphs
Unit 9 graph
Unit 4 dsuc
12_Graph.pptx
Data structure
Introducing Apache Giraph for Large Scale Graph Processing
Data Structure and algorithms - Graph1.pptx
WEB DEVELOPMET FRONT END WITH ADVANCED RECEAT
Data structure Graph PPT ( BFS & DFS ) NOTES
Graph_data_structure_information_engineering.pptx
Algorithms and data Chapter 3 V Graph.pptx
Talk on Graph Theory - I
6. Graphs

Recently uploaded (20)

PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PDF
Well-logging-methods_new................
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PDF
R24 SURVEYING LAB MANUAL for civil enggi
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPTX
Lecture Notes Electrical Wiring System Components
PPTX
Artificial Intelligence
PPTX
CH1 Production IntroductoryConcepts.pptx
PPT
introduction to datamining and warehousing
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PPT
Project quality management in manufacturing
PPTX
Internet of Things (IOT) - A guide to understanding
PPT
Mechanical Engineering MATERIALS Selection
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PDF
composite construction of structures.pdf
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PPTX
bas. eng. economics group 4 presentation 1.pptx
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
Well-logging-methods_new................
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
R24 SURVEYING LAB MANUAL for civil enggi
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
Lecture Notes Electrical Wiring System Components
Artificial Intelligence
CH1 Production IntroductoryConcepts.pptx
introduction to datamining and warehousing
Embodied AI: Ushering in the Next Era of Intelligent Systems
Model Code of Practice - Construction Work - 21102022 .pdf
Project quality management in manufacturing
Internet of Things (IOT) - A guide to understanding
Mechanical Engineering MATERIALS Selection
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
composite construction of structures.pdf
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
bas. eng. economics group 4 presentation 1.pptx

Distributed Graph Algorithms

  • 1. distributed graph algorithms Generalized Architecture For Some Graph Problems Abhilash Kumar and Saurav Kumar November 10, 2015 Indian Institute of Technology Kanpur
  • 3. problem statement ∙ Compute all connected sub-graphs of a given graph, in a distributed environment 2
  • 4. problem statement ∙ Compute all connected sub-graphs of a given graph, in a distributed environment ∙ Develop a generalized architecture to solve similar graph problems 2
  • 6. motivation ∙ Exponential number of connected sub-graphs of a given graph 4
  • 7. motivation ∙ Exponential number of connected sub-graphs of a given graph ∙ Necessity to build distributed systems which utilize the worldwide plethora of distributed resources 4
  • 10. approach Insights ∙ Connected sub-graphs exhibit sub-structure ∙ Extend smaller sub-graphs by adding an outgoing edge to generate larger sub-graphs 6
  • 11. approach Insights ∙ Connected sub-graphs exhibit sub-structure ∙ Extend smaller sub-graphs by adding an outgoing edge to generate larger sub-graphs ∙ Base cases are sub-graphs represented by all the edges of the graph 6
  • 12. approach Algorithm to compute all connected sub-graphs ∙ Initialize: 7
  • 13. approach Algorithm to compute all connected sub-graphs ∙ Initialize: ∙ Queue Q 7
  • 14. approach Algorithm to compute all connected sub-graphs ∙ Initialize: ∙ Queue Q ∙ For each edge in G 7
  • 15. approach Algorithm to compute all connected sub-graphs ∙ Initialize: ∙ Queue Q ∙ For each edge in G ∙ Create a sub-graph G’ representing the edge 7
  • 16. approach Algorithm to compute all connected sub-graphs ∙ Initialize: ∙ Queue Q ∙ For each edge in G ∙ Create a sub-graph G’ representing the edge ∙ Push G’ to Q 7
  • 17. approach Algorithm to compute all connected sub-graphs ∙ Initialize: ∙ Queue Q ∙ For each edge in G ∙ Create a sub-graph G’ representing the edge ∙ Push G’ to Q ∙ Process: 7
  • 18. approach Algorithm to compute all connected sub-graphs ∙ Initialize: ∙ Queue Q ∙ For each edge in G ∙ Create a sub-graph G’ representing the edge ∙ Push G’ to Q ∙ Process: ∙ while Q is not empty 7
  • 19. approach Algorithm to compute all connected sub-graphs ∙ Initialize: ∙ Queue Q ∙ For each edge in G ∙ Create a sub-graph G’ representing the edge ∙ Push G’ to Q ∙ Process: ∙ while Q is not empty ∙ G = Q.pop() 7
  • 20. approach Algorithm to compute all connected sub-graphs ∙ Initialize: ∙ Queue Q ∙ For each edge in G ∙ Create a sub-graph G’ representing the edge ∙ Push G’ to Q ∙ Process: ∙ while Q is not empty ∙ G = Q.pop() ∙ Save G 7
  • 21. approach Algorithm to compute all connected sub-graphs ∙ Initialize: ∙ Queue Q ∙ For each edge in G ∙ Create a sub-graph G’ representing the edge ∙ Push G’ to Q ∙ Process: ∙ while Q is not empty ∙ G = Q.pop() ∙ Save G ∙ For each outgoing edge E of G G’ = G U E if G’ has not been seen yet Push G’ to Q 7
  • 22. approach Algorithm to compute all connected sub-graphs ∙ Initialize: ∙ Queue Q ∙ For each edge in G ∙ Create a sub-graph G’ representing the edge ∙ Push G’ to Q ∙ Process: ∙ while Q is not empty ∙ G = Q.pop() ∙ Save G ∙ For each outgoing edge E of G G’ = G U E if G’ has not been seen yet Push G’ to Q 7
  • 23. approach Algorithm to compute all connected sub-graphs ∙ Initialize: ∙ Queue Q ∙ For each edge in G ∙ Create a sub-graph G’ representing the edge ∙ Push G’ to Q ∙ Process: ∙ while Q is not empty ∙ G = Q.pop() ∙ Save G ∙ For each outgoing edge E of G G’ = G U E if G’ has not been seen yet Push G’ to Q 7
  • 24. approach Figure: Generating initial sub-graphs from a given graph 8
  • 25. approach Figure: Extending a sub-graph to generate new sub-graphs 9
  • 26. approach Figure: Consider only unique sub-graphs generated for further processing 10
  • 28. architecture Master-Slave Architecture ∙ Commonly used approach for parallel and distributed applications 12
  • 29. architecture Master-Slave Architecture ∙ Commonly used approach for parallel and distributed applications ∙ Message passing to communicate over TCP 12
  • 30. architecture Master-Slave Architecture ∙ Commonly used approach for parallel and distributed applications ∙ Message passing to communicate over TCP ∙ Master assigns tasks to slaves and finally collects the results 12
  • 31. architecture Master-Slave Architecture ∙ Commonly used approach for parallel and distributed applications ∙ Message passing to communicate over TCP ∙ Master assigns tasks to slaves and finally collects the results ∙ A Task object represents a sub-graph which contains all necessary information to process that sub-graph 12
  • 32. architecture Master-Slave Architecture ∙ Commonly used approach for parallel and distributed applications ∙ Message passing to communicate over TCP ∙ Master assigns tasks to slaves and finally collects the results ∙ A Task object represents a sub-graph which contains all necessary information to process that sub-graph ∙ A slave may request a task from other slaves when its task queue is empty and processing ends when all task queues are empty 12
  • 33. architecture Task, Queue and Bloom filter ∙ A task has these information: 13
  • 34. architecture Task, Queue and Bloom filter ∙ A task has these information: ∙ A list of vertices that are already in the sub-graph 13
  • 35. architecture Task, Queue and Bloom filter ∙ A task has these information: ∙ A list of vertices that are already in the sub-graph ∙ A list of edges that can be extended in the next step 13
  • 36. architecture Task, Queue and Bloom filter ∙ A task has these information: ∙ A list of vertices that are already in the sub-graph ∙ A list of edges that can be extended in the next step ∙ Task Queue 13
  • 37. architecture Task, Queue and Bloom filter ∙ A task has these information: ∙ A list of vertices that are already in the sub-graph ∙ A list of edges that can be extended in the next step ∙ Task Queue ∙ Each slave has a task queue 13
  • 38. architecture Task, Queue and Bloom filter ∙ A task has these information: ∙ A list of vertices that are already in the sub-graph ∙ A list of edges that can be extended in the next step ∙ Task Queue ∙ Each slave has a task queue ∙ Slave picks up a task from its task queue and processes it 13
  • 39. architecture Task, Queue and Bloom filter ∙ A task has these information: ∙ A list of vertices that are already in the sub-graph ∙ A list of edges that can be extended in the next step ∙ Task Queue ∙ Each slave has a task queue ∙ Slave picks up a task from its task queue and processes it ∙ Newly generated unique tasks are pushed into the task queue 13
  • 40. architecture Task, Queue and Bloom filter ∙ A task has these information: ∙ A list of vertices that are already in the sub-graph ∙ A list of edges that can be extended in the next step ∙ Task Queue ∙ Each slave has a task queue ∙ Slave picks up a task from its task queue and processes it ∙ Newly generated unique tasks are pushed into the task queue ∙ Bloom filter 13
  • 41. architecture Task, Queue and Bloom filter ∙ A task has these information: ∙ A list of vertices that are already in the sub-graph ∙ A list of edges that can be extended in the next step ∙ Task Queue ∙ Each slave has a task queue ∙ Slave picks up a task from its task queue and processes it ∙ Newly generated unique tasks are pushed into the task queue ∙ Bloom filter ∙ We use Bloom filter to check uniqueness of the newly generated tasks (i.e. sub-graphs) 13
architecture
Task, Queue, and Bloom filter
∙ A task carries two pieces of information:
∙ The list of vertices already in the sub-graph
∙ The list of edges that can extend it in the next step
∙ Task queue
∙ Each slave has its own task queue
∙ A slave picks a task from its queue and processes it
∙ Newly generated unique tasks are pushed back into the queue
∙ Bloom filter
∙ A Bloom filter checks the uniqueness of newly generated tasks (i.e. sub-graphs)
∙ The Bloom filter is itself distributed, so no single server gets overloaded
13
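The task record and the slave's queue-plus-uniqueness-check loop can be sketched as follows. This is a single-machine illustration, not the original implementation: the names (`run_slave`, `process`) are ours, and a plain Python set stands in for the distributed Bloom filter (it is exact, so there are no false positives).

```python
from collections import deque

def run_slave(initial_tasks, process):
    """Drain one slave's task queue: pop a task, generate successor
    tasks with `process`, and enqueue only those not seen before.
    A plain set plays the Bloom filter's role of the uniqueness check."""
    queue = deque(initial_tasks)
    seen = set(initial_tasks)
    processed = []
    while queue:
        task = queue.popleft()
        processed.append(task)
        for new_task in process(task):
            if new_task not in seen:   # uniqueness check (the Bloom filter's job)
                seen.add(new_task)
                queue.append(new_task)
    return processed

# Toy process function: a "task" is a frozenset of vertices, and each
# step adds one more vertex from the fixed universe {0, 1, 2}.
grow = lambda t: [t | {v} for v in range(3) if v not in t]
tasks = run_slave([frozenset({v}) for v in range(3)], grow)
# Every non-empty subset of {0, 1, 2} is processed exactly once.
```

In the distributed setting, `seen` is the shared (partitioned) Bloom filter and each slave runs its own copy of this loop against its own queue.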
architecture
Bloom filter vs. hashing
∙ We use a Bloom filter because it is very space-efficient
∙ Space required for error probability p is −n ln p / (ln 2)² bits
∙ The error probability can be reduced with very little extra space
∙ Hashing (an exact hash set) can be used to make the algorithm deterministic
∙ A Bloom filter can also be parallelized, whereas hashing cannot
14
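As a sanity check on the space formula, a small calculation (a hypothetical helper, not part of the original code) shows just how cheaply the error probability drops:

```python
import math

def bloom_bits_per_item(p):
    """Bits per stored item for false-positive probability p,
    from m = -n * ln(p) / (ln 2)^2 with n = 1."""
    return -math.log(p) / math.log(2) ** 2

# Roughly 9.6 bits per item give a 1% error rate, and every further
# 10x reduction in p costs only about 4.8 extra bits per item.
one_percent = bloom_bits_per_item(0.01)     # ~9.6 bits
tenth_percent = bloom_bits_per_item(0.001)  # ~14.4 bits
```

Since ln is additive, each factor-of-10 improvement in p always costs the same fixed increment of bits per item, which is why tightening the filter is so cheap.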
architecture
How to use this architecture?
∙ Two functions are required: initialize and process
∙ initialize generates the initial tasks; the master randomly assigns them to the slaves
∙ process defines the procedure that generates new tasks from a given task (extending a sub-graph, in our case)
15
architecture
Fitting the connected sub-graph problem
∙ initialize creates one task (sub-graph) per edge of the graph
∙ process takes a connected sub-graph and extends it by each extendable edge, one at a time
16
simulation
Simulation for testing
∙ Used two machines, H and L
∙ H: 24 cores, 200 GB RAM, Xeon E5645 @ 2.40 GHz
∙ L: 4 cores, 8 GB RAM, i5-3230M @ 2.60 GHz
∙ Opened multiple ports (6 on H, 2 on L) to mimic 8 slave servers
18
simulation
Simulation for testing
∙ Used various combinations of slave counts on H and L
∙ Used two tree graphs, G(14, 13) and G(16, 15), whose results are easy to verify
∙ Collected the number of tasks processed and the number of hash-check queries made by each slave
∙ Collected total running times for both graphs, including runs with network faults
19
results
Figure: Number of hash-check queries vs. number of slaves for G(14, 13)
21
Figure: Distribution of the number of tasks processed by slaves for G(14, 13)
22–24
Figure: Number of hash-check queries vs. number of slaves for G(16, 15)
25
Figure: Distribution of the number of tasks processed by slaves for G(16, 15)
26–28
results
Actual running time
∙ Network faults occurred, especially because few physical machines were available
∙ The architecture recovers from these faults, but recovery consumes a lot of time
∙ For G(14, 13), the running time ranged from 15 s to 91 s
∙ For G(16, 15), the running time ranged from 255 s to 447 s
∙ These figures are for a process function that does no additional computation per sub-graph
29
results
Figure: Running time when process does additional computation (10 ms per sub-graph)
30
advantages
Advantages
∙ Highly scalable
∙ More slaves can be added easily
∙ Performance increases with the number of slaves
∙ Even distribution of tasks: more efficient machines process more tasks
∙ The architecture is highly reusable
∙ Many other problems can be solved with it
∙ Only two functions need to be provided: initialize and process
∙ Tolerant of network faults
32
advantages
Other problems that can be solved using this paradigm
∙ Generating all cliques, paths, cycles, sub-trees, and spanning sub-trees
∙ A few classical NP-hard problems, such as enumerating all maximal cliques and TSP
33
future works
Further improvements
∙ Implement a parallelized Bloom filter
∙ Solve tasks in parallel within a slave (on powerful servers)
∙ Handle slave/master failures
∙ Use file I/O to store the task queue for large problems
∙ Explore this paradigm on other problems
35
conclusion
Conclusion
∙ The algorithm is efficient: the total computation is at most m × T, where T is the minimum computation required to find all sub-graphs and m is the number of edges
∙ In practice the running time is c × T with c much smaller than m; the bound on c can be improved to min(m, log T)
∙ Since we are enumerating all connected sub-graphs, the problem is only practical when T itself is not too large
∙ Given good infrastructure and a careful implementation, the architecture lets us solve the problem in a much more scalable manner and significantly reduces computation time
37
Questions?
Implementation of the algorithm and the architecture is available at github.com/abhilak/DGA
Slides created using Beamer (mtheme) and plot.ly on ShareLaTeX
38