GoFFish: A Sub-Graph Centric Framework for
Large-Scale Graph Analytics
Guide:
Prof. Dinkar Sitaram
Department of Computer Science and Engineering
PES Institute of Technology
Ms. Prafullata
Department of Computer Science and Engineering
PES Institute of Technology
Guide:
Prof. Yogesh Simmhan
Department of SERC
Indian Institute of Science
Team Members:
Anushree P K 1PI11IS017
Bhavani B 1PI11IS027
Mithilesh K G 1PI11IS059
What is GoFFish?
GoFFish is a scalable software framework for storing graphs, and for composing and executing graph
analytics, in Cloud and commodity cluster environments.
It consists of:
1. GoFS – a distributed store for partitioning, storing and accessing graph datasets across hosts in a cluster.
2. Gopher – a programming framework that offers sub-graph centric abstractions on a Cloud or cluster,
in conjunction with GoFS.
GoFFish is implemented in Java.
Existing GoFFish Storage Architecture
[Diagram: three workers (Worker 1 + Head Node, Worker 2, Worker 3) holding Partitions 1 to 3; Slice 1 covers instances t0 – t10; same storage format in Worker 1.]
• A large graph is partitioned into subgraphs and distributed across workers.
• GoFFish default storage: 10 bins in a slice, 10 instances of every subgraph.
Problem Definition
Using the GoFFish framework:
• To store real-time graphs in temporal and spatial formats.
• To compare the efficiency of both storage formats for the input algorithm (Gopher job).
Intuition:
The time taken by a Gopher job over large graphs should improve when the graphs are stored in
whichever of the two formats suits the algorithm being run.
For example, for the vertex count algorithm the temporal format should take less computation time.
GoFFish Storage Architecture – Our Model
[Diagram: three workers (Worker 1 + Head Node, Worker 2, Worker 3) holding Partitions 1 to 3; Slices 1 to 4; instances t0 – tn; same storage format in Worker 1.]
• All instances for a subgraph in one slice.
• One bin per slice.
We used the slicing pointers instancegroupingsize and numsubgraphbins to control the way the graphs
were partitioned in order to obtain the desired storage format. These slicing pointers correspond to the
temporal and subgraph bin packing schemes in GoFFish.
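The effect of the two slicing pointers can be illustrated with a toy model. The sketch below is not GoFS code: it assumes, purely for illustration, that subgraphs are assigned to bins round-robin and that a slice is identified by a (bin, time group) pair. The function names `slice_key` and `slices_touched` are hypothetical.

```python
def slice_key(subgraph_id, instance_index, num_subgraph_bins, instance_grouping_size):
    """Return the (bin, time_group) pair holding one subgraph instance.

    Toy model only: round-robin binning is an assumption, not GoFS behaviour.
    """
    bin_id = subgraph_id % num_subgraph_bins
    time_group = instance_index // instance_grouping_size
    return (bin_id, time_group)


def slices_touched(subgraph_id, num_instances, num_subgraph_bins, instance_grouping_size):
    """Set of slices that must be read to see every instance of one subgraph."""
    return {
        slice_key(subgraph_id, t, num_subgraph_bins, instance_grouping_size)
        for t in range(num_instances)
    }


# Hypothetical workload: 40 instances of subgraph 13.
default_layout = slices_touched(13, 40, num_subgraph_bins=10, instance_grouping_size=10)
our_layout = slices_touched(13, 40, num_subgraph_bins=1, instance_grouping_size=40)
print(len(default_layout), len(our_layout))
```

Under these assumptions, grouping all instances together means a job that scans one subgraph across time reads a single slice instead of several.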
Algorithms
• Vertex Count
Each subgraph processor calculates the number of vertices within its subgraph.
It then sends its count in a message to all other subgraphs, and each subgraph calculates the total from the
messages it receives.
• Connected Components
Each subgraph finds its smallest vertex id and propagates that value to its connected
subgraphs. If an incoming value differs from a subgraph's current value, the subgraph updates its current value and
propagates the change to its neighbours.
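Both algorithms can be sketched as a single-process simulation of supersteps. This is plain Python, not the Gopher API: the functions `vertex_count` and `connected_components`, the dictionary-based inputs, and the sample data are all hypothetical. The sketch also assumes every subgraph is non-empty and, as is usual for this style of algorithm, that a subgraph updates only when an incoming value is smaller than its current one.

```python
def vertex_count(subgraphs):
    """subgraphs: dict mapping subgraph id -> non-empty set of vertex ids."""
    # Superstep 0: each subgraph counts its own vertices and sends the count to all others.
    local = {sg: len(vertices) for sg, vertices in subgraphs.items()}
    inbox = {sg: [c for other, c in local.items() if other != sg] for sg in subgraphs}
    # Superstep 1: each subgraph adds the received counts to its own count.
    return {sg: local[sg] + sum(inbox[sg]) for sg in subgraphs}


def connected_components(subgraphs, remote_edges):
    """remote_edges: iterable of (sg_a, sg_b) pairs of subgraphs joined by an edge."""
    neighbours = {sg: set() for sg in subgraphs}
    for a, b in remote_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Each subgraph starts with its own smallest vertex id as its label.
    label = {sg: min(vertices) for sg, vertices in subgraphs.items()}
    active = set(subgraphs)  # in the first superstep every subgraph sends
    while active:
        outbox = {sg: [] for sg in subgraphs}
        for sg in active:
            for n in neighbours[sg]:
                outbox[n].append(label[sg])
        active = set()
        for sg, messages in outbox.items():
            # Assumption: only a smaller incoming value triggers an update.
            if messages and min(messages) < label[sg]:
                label[sg] = min(messages)
                active.add(sg)  # changed, so propagate again next superstep
    return label


# Hypothetical input: four subgraphs, with "d" disconnected from the rest.
subgraphs = {"a": {5, 6}, "b": {1, 2, 3}, "c": {9}, "d": {7, 8}}
remote_edges = [("a", "b"), ("a", "c")]
print(vertex_count(subgraphs))
print(connected_components(subgraphs, remote_edges))
```

The loop ends when a superstep produces no label change, which mirrors subgraphs having nothing further to propagate.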
Flow Diagram
Dataset: Road network graph (Urban Dataset)
Vehicle route tracking using traffic cams
– Time-series graph of synchronized camera snapshots
– Sensors are vertices
– Edges are road connectivity, weighted by distance
Graph instance is image metadata every N sec
– License plate, vehicle color, direction, speed
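One way to picture this dataset is as a fixed road topology plus a list of timestamped instances. The sketch below is an illustrative in-memory model, not the GoFS storage format: all class and field names are hypothetical, and units for distance and speed are left unspecified because the slides do not state them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RoadEdge:
    source: str       # sensor id
    target: str       # sensor id
    distance: float   # edge weight


@dataclass
class Sighting:
    license_plate: str
    color: str
    direction: str
    speed: float


@dataclass
class GraphInstance:
    timestamp: int                                   # one snapshot every N seconds
    sightings: dict = field(default_factory=dict)    # sensor id -> list of Sighting


@dataclass
class TimeSeriesGraph:
    sensors: set                                     # vertices
    roads: list                                      # RoadEdge objects (topology is fixed)
    instances: list = field(default_factory=list)    # GraphInstance objects

    def route_of(self, license_plate):
        """Return (timestamp, sensor id) pairs where the plate was seen, in time order."""
        route = []
        for inst in sorted(self.instances, key=lambda i: i.timestamp):
            for sensor, seen in inst.sightings.items():
                if any(s.license_plate == license_plate for s in seen):
                    route.append((inst.timestamp, sensor))
        return route


# Hypothetical two-sensor network with one vehicle seen twice.
graph = TimeSeriesGraph(sensors={"s1", "s2"}, roads=[RoadEdge("s1", "s2", 1.5)])
graph.instances.append(GraphInstance(10, {"s2": [Sighting("KA01", "red", "N", 40.0)]}))
graph.instances.append(GraphInstance(0, {"s1": [Sighting("KA01", "red", "N", 35.0)]}))
print(graph.route_of("KA01"))
```

Separating the static topology from the per-timestamp metadata reflects the time-series graph described above, where only the attribute values change between instances.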
Performance Analysis
[Figure: Partition-Wise Total App Time, two panels (Vertex Count and Connected Components). X axis: Partition Id. Y axis: Time Taken. Series: S=10,t=10 and S=1,T=ALL.]
Performance Analysis
[Figure: Superstep-wise Compute Task Time for Partition 1, two panels (Vertex Count and Connected Components). X axis: Superstep. Y axis: Time Taken. Series: VC(s=10,t=10) and VC(s=1,t=ALL) for Vertex Count; CC(s=10,t=10) and CC(s=1,t=ALL) for Connected Components.]
DEMO + Loggers
Screenshots
Screenshots
References
• GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics. Indian Institute of Science, Bangalore 560012, India; University of Southern California, Los Angeles, CA 90089, USA. November 26, 2013.
• Scalable Analytics over Distributed Time-series Graphs using GoFFish. Indian Institute of Science, Bangalore 560012, India; University of Southern California, Los Angeles, CA 90089, USA. June 23, 2014.
• Chronos: A Graph Engine for Temporal Graph Analysis. Tsinghua University; University of Science and Technology of China; Microsoft Research.
The Team
Anushree Prasanna Kumar
8th Sem ISE
Bhavani B
8th Sem ISE
Mithilesh Kumar
8th Sem ISE
Thank you

Optimization of graph storage using GoFFish
