Graph Processing
  Applications
praveensripati@gmail.com

www.thecloudavenue.com

    @praveensripati
Agenda

Introduction to Graphs

     Representing graphs

     Different types of graphs

     Algorithms in graphs

What constitutes a graph application

     Graph databases (examples and how they work)

     Graph computing engines (examples and how they work)

Questions & Answers
What are/aren't Graphs in this context?

[Figure: two pictures, one labeled YES (a graph in this context) and one labeled NO]
How is a graph represented?
[Figure: six numbered vertices (1 to 6) joined by edges; one vertex and one edge are labeled]

A collection of vertices connected to each other using edges, with both vertices and edges
having properties. A vertex can be a person, place, account or any item which needs to be
tracked.
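
As a rough illustration of the definition above, a graph whose vertices and edges both carry properties could be modeled as follows. This is only a sketch: the class names, the `add_vertex`/`add_edge` helpers, and the sample Friend edge are assumptions made for the example and do not come from any particular graph library.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: int
    target: int
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory property graph: vertices and edges both hold properties."""

    def __init__(self):
        self.vertices = {}   # vertex id -> Vertex
        self.edges = []      # list of Edge

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = Vertex(vertex_id, properties)

    def add_edge(self, source, target, **properties):
        # Refuse edges that point at vertices we do not know about.
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(Edge(source, target, properties))

    def neighbors(self, vertex_id):
        # Vertices reachable over one outgoing edge.
        return [e.target for e in self.edges if e.source == vertex_id]


graph = PropertyGraph()
graph.add_vertex(1, name="Arun", age=25)
graph.add_vertex(2, name="Tom")
graph.add_edge(1, 2, relation="Friend")   # assumed edge, for illustration only
```

Here `graph.neighbors(1)` walks the edge list, which is fine for a toy graph but scans every edge on each call.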
A social graph

Who are Arun's friends?
Whom should I recommend Sheetal to be friends with?

[Figure: social graph with six vertices: 1 Arun, 2 Tom, 3 Bob, 4 Deepak, 5 Prajval, 6 Sheetal.
Edges are labeled Friend, Relative or Colleague.]

Vertex properties (example): Name: Arun, Age: 25, Sex: M
Edge properties (example): Relation: Colleague
Facebook Recruiting Competition

Want an interview @ Facebook?

The challenge is to recommend missing links in a social network. Participants will be
presented with an external anonymized, directed social graph (no, not Facebook, keep
guessing) from which some edges have been deleted, and asked to make ranked predictions
for each user in the test set of which other users they would want to follow.

What is Kaggle?
Kaggle is an innovative solution for statistical/analytics outsourcing. We are the leading
platform for predictive modeling competitions. Companies, governments and researchers
present datasets and problems - the world's best data scientists then compete to produce
the best solutions. At the end of a competition, the competition host pays prize money in
exchange for the intellectual property behind the winning model.

[Figure: the six-vertex example graph]

http://www.kaggle.com/c/FacebookRecruiting
A spatial graph

What is the shortest distance between Bangalore and Calcutta?
I would like to cover all the places; which is the shortest path?

[Figure: spatial graph with six vertices: 1 Bangalore, 2 Mumbai, 3 Lucknow, 4 New Delhi,
5 Chennai, 6 Kolkata. The eight edges carry distances of 250, 350, 450, 450, 600, 700, 800
and 850 km.]

Vertex properties (example): Name: Bangalore, Population: 25,00,000, Area: 35,000 sq km
Edge properties (example): Distance: 700 km
How to represent a Graph for computing?

.... as an adjacency list for sparse graph

1 -> 2,4,5
2 -> 3
3 -> 5
4 -> 3,6
5 ->
6 -> 5

.... as an adjacency matrix for dense graph

       1     2    3     4     5    6
  1    0     1    0     1     1    0
  2    0     0    1     0     0    0
  3    0     0    0     0     1    0
  4    0     0    1     0     0    1
  5    0     0    0     0     0    0
  6    0     0    0     0     1    0

A graph with few edges is sparse; a graph with many edges is dense.

The web, with billions of pages, cannot practically be represented as an adjacency matrix.

[Figure: the six-vertex example graph, each vertex annotated with its adjacency list]
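
The two representations above can be converted into each other mechanically. The sketch below builds the matrix from the adjacency list, reading vertex 4's entry as 3 and 6; the function name `to_matrix` is made up for this example. It also makes the space argument concrete: the matrix always needs n × n cells, while the list needs one entry per edge.

```python
def to_matrix(adjacency):
    """Build a 0/1 adjacency matrix from a {vertex: [neighbors]} mapping.

    Vertices are assumed to be numbered 1..n, as on the slide.
    """
    n = len(adjacency)
    matrix = [[0] * n for _ in range(n)]
    for source, targets in adjacency.items():
        for target in targets:
            matrix[source - 1][target - 1] = 1
    return matrix


adjacency = {1: [2, 4, 5], 2: [3], 3: [5], 4: [3, 6], 5: [], 6: [5]}
matrix = to_matrix(adjacency)

edge_count = sum(len(targets) for targets in adjacency.values())
matrix_cells = len(matrix) * len(matrix)

for row in matrix:
    print(" ".join(str(cell) for cell in row))
print(f"{edge_count} list entries versus {matrix_cells} matrix cells")
```

For a graph with billions of vertices, the n × n term is what rules out the matrix form.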
Different Graphs

 Social graph (Facebook, LinkedIn etc)

 Spatial graph (Google Maps, MapQuest, FedEx etc)

 Web graph (PageRank, Recommendations etc)

 Computer network graph (Optimal network layout etc)

 Financial graph (Fraud detection, Currency Flow etc)

 Data representations (Lists etc)

 Chemistry (to represent genomes/molecules)

 And others
Some of the Graph Algorithms

    Shortest path (Finding the shortest path from A to B)

    Minimum Spanning Tree (Cheapest way to connect objects so that every object is
    reachable from every other; can be used in internet, cable wiring etc)

    Graph center (placing a warehouse or hospital in a city, so that all the
    locations can be reached easily)

    Bipartite Matching (Matching in a dating site, job to employee and others)

    Finding Planar Graph (as in the case of circuit designs).

                      http://www.graph-magics.com/practic_use.php
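
To make the first item in the list above concrete, here is a minimal sketch of a shortest-path search using Dijkstra's algorithm, one common choice when edge weights are non-negative. The vertex labels and weights in `roads` are invented for the example and are not the distances from the spatial graph slide.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm over a {vertex: {neighbor: weight}} mapping.

    Returns (total_cost, path) or None if the goal is unreachable.
    """
    queue = [(0, start, [start])]   # (cost so far, vertex, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue                # a cheaper route to this vertex was already settled
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None


# Hypothetical undirected road network; weights are made up.
roads = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 5},
    "C": {"A": 1, "B": 2, "D": 8},
    "D": {"B": 5, "C": 8},
}

result = shortest_path(roads, "A", "D")
print(result)
```

With these weights the cheapest route from A to D goes through C and B, at a total cost of 8.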
Graph Applications

[Figure: applications sit on top of two kinds of systems, Graph Databases and Graph
processing frameworks (Hama and Giraph are shown)]
How to store a Graph?

Option 1 : In a flat file as

       1- 4,5,6
       4- 2,5,6

Where vertex 1 is connected to vertices 4, 5 and 6, and so on.
Simple, but not efficient and easy to maintain.

Option 2 : In a relational database using referencing
tables or join tables.

Option 3 : Using a specialized database designed
specifically for graphs.
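
Option 1 is easy to sketch. The snippet below parses the two sample lines from the flat-file format into an adjacency mapping; the function names are made up for the example. It also hints at the inefficiency: answering "who points at vertex 5?" means scanning every line.

```python
def parse_flat_file(text):
    """Parse lines such as '1- 4,5,6' into {1: [4, 5, 6]}."""
    graph = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        vertex, _, neighbors = line.partition("-")
        graph[int(vertex)] = [int(n) for n in neighbors.split(",") if n.strip()]
    return graph


def incoming(graph, target):
    # No reverse index exists, so every stored vertex has to be inspected.
    return [source for source, targets in graph.items() if target in targets]


flat_file = "1- 4,5,6\n4- 2,5,6\n"
graph = parse_flat_file(flat_file)
print(graph)
print(incoming(graph, 5))
```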
Comparing Graph with Relational DB

Which one would you prefer for storing Graph data?

In a DB of 1,000,000 users, finding friends-of-friends for 1,000 users at various depths.

     Depth     Execution Time (seconds) – MySQL     Execution Time (seconds) – Neo4j
     2         0.016                                0.010
     3         30.267                               0.168
     4         1,543.505                            1.359
     5         Not Finished in 1 Hour               2.132

              http://www.neotechnology.com/2012/06/how-much-faster-is-a-graph-database-really/
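
The friends-of-friends query measured above is a depth-limited traversal. A minimal in-memory sketch, assuming the friendships fit in a dictionary, looks like this; the names and friendships are invented, and the code says nothing about how MySQL or Neo4j execute the query internally.

```python
from collections import deque


def friends_within_depth(friends, start, max_depth):
    """Breadth-first search that stops expanding at max_depth.

    Returns {person: depth} for everyone reachable within max_depth hops,
    excluding the starting person.
    """
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if depth_of[person] == max_depth:
            continue                      # do not look past the depth limit
        for friend in friends.get(person, ()):
            if friend not in depth_of:    # first visit is the shortest route
                depth_of[friend] = depth_of[person] + 1
                queue.append(friend)
    del depth_of[start]
    return depth_of


# Hypothetical friendships (directed, as in "follows").
friends = {
    "John": ["Sara", "Joe"],
    "Sara": ["Maria"],
    "Joe": ["Steve"],
    "Maria": ["Ann"],
}

reachable = friends_within_depth(friends, "John", 2)
print(reachable)
```

Each extra level of depth only touches the neighbors of the people already found, which is the access pattern a graph store is built around.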
So, what is a Graph DB?

A graph database is any storage system that provides `index free adjacency`.

[Figure: the six-vertex example graph, each vertex annotated with its adjacency list]

Every element (node or edge) has a direct pointer to its adjacent elements.

No index lookup : We can determine which vertex is adjacent to which other vertex
without looking up an index tree.
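
One way to picture index-free adjacency is with plain object references. In the sketch below each node holds direct references to its relationships, so finding neighbors never consults a global index; the scan over `edge_table` shows the contrast. The class and function names are illustrative assumptions and are not how any specific graph database lays out its storage.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.outgoing = []          # direct references to Relationship objects


class Relationship:
    def __init__(self, start, end):
        self.start = start
        self.end = end
        start.outgoing.append(self)  # wire the pointer at creation time


def neighbors(node):
    # Follows pointers held by the node itself; cost depends on the node's degree only.
    return [rel.end.name for rel in node.outgoing]


def neighbors_by_scan(edge_table, vertex):
    # Contrast: without per-node pointers, the whole edge table is searched.
    return [target for source, target in edge_table if source == vertex]


adjacency = {1: [2, 4, 5], 2: [3], 3: [5], 4: [3, 6], 5: [], 6: [5]}
nodes = {vertex: Node(vertex) for vertex in adjacency}
edge_table = []
for source, targets in adjacency.items():
    for target in targets:
        Relationship(nodes[source], nodes[target])
        edge_table.append((source, target))

print(neighbors(nodes[1]))
print(neighbors_by_scan(edge_table, 1))
```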
So, what is a Graph DB? (.....)

Graph DB is the option when persisting graphs.
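So, what is a Graph DB? (.....)

Part of the NoSQL family

[Figure: chart with Data Size on the vertical axis and Data Complexity on the horizontal
axis. From largest data size and lowest complexity to smallest data size and highest
complexity:]

 Key Value Store like Amazon Dynamo.
 Columnar Databases like Cassandra, HBase.
 Document Databases like MongoDB, CouchDB.
 Graph Databases like Neo4J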
Graph DB Bindings (~JDBC API)
//connect to the database
//begin transaction

Node firstNode;
Node secondNode;
Relationship relationship;

firstNode = graphDb.createNode();
firstNode.setProperty( "message", "Hello, " );
secondNode = graphDb.createNode();
secondNode.setProperty( "message", "World!" );

relationship = firstNode.createRelationshipTo( secondNode, RelTypes.KNOWS );
relationship.setProperty( "message", "brave Neo4j " );

//end the transaction
//close the connection to the database


           http://docs.neo4j.org/chunked/milestone/tutorials-java-embedded-hello-world.html
Graph Adhoc Query (~SQL)

START john=node:node_auto_index(name = 'John')
MATCH john-[:friend]->()-[:friend]->fof
RETURN john, fof



 john                    fof
 Node[4]{name:"John"}    Node[2]{name:"Maria"}
 Node[4]{name:"John"}    Node[3]{name:"Steve"}




                  http://docs.neo4j.org/chunked/milestone/cypher-query-lang.html
Different Graph Databases

[Figure: logos of graph databases, including:]

 FlockDB from Twitter
 AllegroGraph
 GraphBase
 A graph database from Objectivity

     http://en.wikipedia.org/wiki/Graph_database
What is a Graph Computing Engine?

[Figure: Algorithms, together with an InputFormat and an Input Location, feed the Graph
Computing Engine, which writes its results through an OutputFormat to an Output Location]

Graph engines come with some built-in graph processing algorithms, but also provide an
easy-to-use API to build new algorithms and extend the framework.

                 http://incubator.apache.org/giraph/apidocs/index.html
                 http://incubator.apache.org/hama/docs/r0.3.0/api/index.html
Different Graph Computing Engines

Memory based graphs (graph size < local machine RAM)
     - jung.sourceforge.net
     - igraph.sourceforge.net
     - networkx.lanl.gov

Disk based graphs (graph size < local hard disk size)
       - Neo4j
       - Infinite Graph – objectivity.com
       - sparsity-technologies.com/dex

Cluster based graphs (depends on the cluster specs)
       - Apache Hama
       - Apache Giraph
       - GoldenORB
       (Based on the BSP (Bulk Synchronous Parallel) model, in the spirit of Google Pregel)
Bulk Synchronous Parallel

Some quick facts

• An alternative computing model to MapReduce (not all problems can be solved efficiently
  with MapReduce). Also, any MR algorithm can be simulated on BSP and vice versa.

• Developed by Leslie Valiant during the 1980s. Was revived by Google in the
  Pregel paper (extensively used for PageRank).

• Good for

  - Processing big data with complicated relationships, e.g., graphs and networks.
  - Iterative and recursive scientific computations
  - Continuous Event Processing (CEP)

         http://googleresearch.blogspot.in/2009/06/large-scale-graph-computing-at-google.html
                         http://arxiv.org/abs/1203.2081 – Comparing MR vs BSP
What is Bulk Synchronous Parallel?

[Figure: a BSP computation drawn as a sequence of supersteps: Super Step 1, Super Step 2,
Super Step 3]

            http://en.wikipedia.org/wiki/Bulk_synchronous_parallel/
    http://blog.octo.com/en/introduction-to-large-scale-graph-processing/
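
In the BSP model as it is usually described, each superstep has three phases: local computation, message exchange, and a barrier that every worker must reach before the next superstep starts. The single-process sketch below simulates that loop for a small vertex-centric job that spreads the largest value through the graph. The function name, graph, and values are assumptions for the example; this is not the Hama or Giraph API.

```python
def propagate_max(values, edges, max_supersteps=30):
    """Simulate BSP supersteps: each vertex keeps the largest value it has seen.

    values: {vertex: number}, edges: {vertex: [neighbors]}.
    Returns (final_values, supersteps_run).
    """
    values = dict(values)
    inbox = {vertex: [] for vertex in values}
    active = set(values)              # every vertex takes part in superstep 0
    superstep = 0
    while active and superstep < max_supersteps:
        outbox = {vertex: [] for vertex in values}
        for vertex in active:
            # Local computation: uses only the vertex's own state and its inbox.
            new_value = max([values[vertex]] + inbox[vertex])
            changed = new_value != values[vertex]
            values[vertex] = new_value
            # Communication: send only when there is something new to report.
            if superstep == 0 or changed:
                for neighbor in edges.get(vertex, ()):
                    outbox[neighbor].append(new_value)
        # Barrier: messages become visible only in the next superstep.
        inbox = outbox
        active = {vertex for vertex, messages in inbox.items() if messages}
        superstep += 1
    return values, superstep


# Hypothetical four-vertex chain A - B - C - D.
edges = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
initial = {"A": 3, "B": 6, "C": 2, "D": 1}

final_values, supersteps = propagate_max(initial, edges)
print(final_values, supersteps)
```

The job ends when a superstep delivers no messages, so no vertex has work left.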
Hama vs Giraph

[Figure: both Hama and Giraph are derived from Google Pregel **. Their stacks, from top
to bottom:]

     Hama   :  Hama   -> BSP -> HDFS
     Giraph :  Giraph -> BSP -> MapReduce -> HDFS

** http://googleresearch.blogspot.in/2009/06/large-scale-graph-computing-at-google.html
Hama vs Giraph (.....)

Hama                                                 Giraph
Pure BSP engine.                                     Uses BSP, but the BSP API is not exposed.
Matrix, graph, network and other processing.         Just for graph processing.
Jobs are run as a BSP job on HDFS.                   Jobs are run as MapReduce on Hadoop.

Both are derived from the paper `Pregel : A System for Large-Scale Graph Processing`,
published by Google. Both have recently been promoted from Incubator to Apache Top Level
Project.
Both have a few graph algorithms implemented and also provide a very easy API to
implement new graph algorithms.

        ** http://googleresearch.blogspot.in/2009/06/large-scale-graph-computing-at-google.html
Page Rank in Hama

           The PageRank algorithm assigns a numerical weight to each element of a
           hyperlinked set of documents.

           bin/hama jar ../hama-0.4.0-examples.jar pagerank
           <input path> <output path> [damping factor]
           [epsilon error] [tasks]

           Input (tab-separated)        Output

           Site1\tSite2\tSite3          Site1 0.5
           Site2\tSite3                 Site2 1.3
           Site3                        Site3 1.2

 http://wiki.apache.org/hama/PageRank
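
For intuition about what such a job computes, here is a small single-machine PageRank sketch. It assumes each input line names a site followed by the sites it links to, separated by tabs, and it uses the common normalized formulation where all ranks sum to 1. It is not Hama's implementation, so its numbers are not expected to match the output column above; `parse_links` and `pagerank` are names made up for the example.

```python
def parse_links(text):
    """Read tab-separated lines: first field is a site, the rest are its outgoing links."""
    links = {}
    for line in text.splitlines():
        fields = [f for f in line.strip().split("\t") if f]
        if fields:
            links[fields[0]] = fields[1:]
    return links


def pagerank(links, damping=0.85, epsilon=1e-6, max_iterations=100):
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(max_iterations):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        new_ranks = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            for target in targets:
                new_ranks[target] += damping * ranks[page] / len(targets)
        error = sum(abs(new_ranks[p] - ranks[p]) for p in pages)
        ranks = new_ranks
        if error < epsilon:          # stop once the ranks have settled
            break
    return ranks


sample = "Site1\tSite2\tSite3\nSite2\tSite3\nSite3\n"
links = parse_links(sample)
ranks = pagerank(links)
for site in sorted(ranks):
    print(site, round(ranks[site], 3))
```

The `damping` and `epsilon` parameters play the same role as the optional damping factor and epsilon error arguments on the command line above.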
What's next?
Deep dive into

       - Both Graph databases and frameworks, with a demo.
       - The Bulk Synchronous Parallel processing model.

Hadoop, Hive, Pig and others are too crowded. Graph frameworks and
databases are emerging and offer an easy entry point for contributing to Apache.

I would suggest subscribing to the Apache mailing lists for these projects, getting
familiar with them, and contributing.
Q&A
Graph Processing Applications @ HUG
