GraphX
What is a graph
Examples of graph computations
● Finding common friends
● Finding PageRank
● And many more…
Examples of graph computations: finding common friends
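As a rough illustration of the common-friends computation, the sketch below intersects two friend sets held in an ordinary Scala Map. The names and friendships are hypothetical, and `CommonFriends` and `commonFriends` are illustrative helpers rather than part of GraphX; a distributed version would express the same idea over a graph's edges.

```scala
object CommonFriends {
  // Hypothetical friendship lists: each person maps to the set of their friends.
  val friends: Map[String, Set[String]] = Map(
    "alice" -> Set("bob", "carol", "dave"),
    "bob"   -> Set("alice", "carol", "erin"),
    "carol" -> Set("alice", "bob")
  )

  // Common friends of a and b: the intersection of their friend sets.
  // A person missing from the map is treated as having no friends.
  def commonFriends(a: String, b: String): Set[String] =
    friends.getOrElse(a, Set.empty[String])
      .intersect(friends.getOrElse(b, Set.empty[String]))

  def main(args: Array[String]): Unit =
    println(commonFriends("alice", "bob")) // prints Set(carol)
}
```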
Examples of graph computations: finding PageRank
● Unifies graph computation in a single system
○ ETL
○ Exploratory analysis
○ Iterative graph computation
● View the same data as both graphs and collections
● Transform and join graphs with RDDs efficiently
● Extends the Spark RDD by introducing a new Graph abstraction
Has a library of algorithms
○ PageRank
■ If important pages link to you, you are more important
○ Connected components
■ Clusters amongst your Facebook friends
○ Triangle counting
■ Triangles passing through each vertex => a measure of clustering
○ Label propagation
○ SVD++
○ Strongly connected components
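To make two of these algorithms concrete, here is a small local sketch of triangle counting and connected components over plain Scala collections. It is not GraphX's distributed implementation: the object `GraphAlgorithmsSketch`, its helpers, and the edge list are illustrative assumptions, and edges are treated as undirected.

```scala
object GraphAlgorithmsSketch {
  // Illustrative edge list; direction and duplicate pairs are ignored below.
  val edges: Seq[(Long, Long)] = Seq(
    (2L, 1L), (4L, 1L), (1L, 2L), (6L, 3L),
    (7L, 3L), (7L, 6L), (6L, 7L), (3L, 7L)
  )

  // Undirected neighbour sets, one per vertex.
  val neighbours: Map[Long, Set[Long]] =
    edges
      .flatMap { case (a, b) => Seq(a -> b, b -> a) }
      .groupBy(_._1)
      .map { case (v, pairs) => v -> (pairs.map(_._2).toSet - v) }

  // Triangle counting: count the pairs of v's neighbours that are
  // themselves connected to each other.
  def trianglesThrough(v: Long): Int = {
    val ns = neighbours(v).toSeq
    ns.combinations(2).count { case Seq(a, b) => neighbours(a).contains(b) }
  }

  // Connected components: every vertex repeatedly adopts the smallest id
  // seen among itself and its neighbours, until no label changes.
  def connectedComponents: Map[Long, Long] = {
    var labels: Map[Long, Long] = neighbours.keys.map(v => v -> v).toMap
    var changed = true
    while (changed) {
      val next = labels.map { case (v, label) =>
        v -> (neighbours(v).map(n => labels(n)) + label).min
      }
      changed = next != labels
      labels = next
    }
    labels
  }

  def main(args: Array[String]): Unit = {
    neighbours.keys.toSeq.sorted.foreach { v =>
      println(s"vertex $v: ${trianglesThrough(v)} triangle(s)")
    }
    println(connectedComponents.toSeq.sorted.mkString(", "))
  }
}
```

On this edge list, vertices 3, 6 and 7 form one triangle, and the graph splits into two components labelled by their smallest vertex ids, 1 and 3.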
Provides a set of fundamental operations
● subgraph
● joinVertices
● aggregateMessages
● And more…
https://spark.apache.org/docs/latest/graphx-programming-guide.html
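The following is a plain-collections model of what these three operations do, intended only to convey their semantics. It does not use the GraphX API: `GraphOpsSketch`, `Edge`, and the simplified signatures are assumptions made for this sketch, and the real methods operate on distributed `Graph` and RDD types with different parameter lists.

```scala
object GraphOpsSketch {
  final case class Edge(src: Long, dst: Long)

  // Hypothetical property graph: vertex id -> name, plus directed edges.
  val vertices: Map[Long, String] = Map(1L -> "alice", 2L -> "bob", 3L -> "carol")
  val edges: Seq[Edge] = Seq(Edge(1L, 2L), Edge(1L, 3L), Edge(2L, 3L))

  // subgraph: keep the vertices that satisfy the predicate, and only those
  // edges whose two endpoints both survive.
  def subgraph(keep: Long => Boolean): (Map[Long, String], Seq[Edge]) = {
    val vs = vertices.filter { case (id, _) => keep(id) }
    (vs, edges.filter(e => vs.contains(e.src) && vs.contains(e.dst)))
  }

  // joinVertices: update a vertex attribute where the table has a matching
  // id; vertices without a match keep their original attribute.
  def joinVertices(table: Map[Long, Int]): Map[Long, String] =
    vertices.map { case (id, name) =>
      id -> table.get(id).map(n => s"$name($n)").getOrElse(name)
    }

  // aggregateMessages: each edge may send messages to vertices, and the
  // messages arriving at the same vertex are merged. Vertices that receive
  // nothing are absent from the result.
  def aggregateMessages[A](send: Edge => Seq[(Long, A)])(merge: (A, A) => A): Map[Long, A] =
    edges.flatMap(send).groupBy(_._1).map { case (id, msgs) =>
      id -> msgs.map(_._2).reduce(merge)
    }

  // In-degree expressed as "send 1 to the destination, then sum".
  val inDegrees: Map[Long, Int] =
    aggregateMessages[Int](e => Seq(e.dst -> 1))(_ + _)

  def main(args: Array[String]): Unit = {
    println(subgraph(_ != 3L))
    println(joinVertices(inDegrees))
    println(inDegrees)
  }
}
```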
GraphX - PageRank
1. BarackObama
2. Lady Gaga
3. John Resig
4. Justin Bieber
6. Matei Zaharia
7. Martin Odersky
8. anonsys
PR(A) = 0.15 + 0.85 * Σ (PR(T) / number of outgoing links of T), summed over every node T that links to A
GraphX - PageRank
$ hadoop fs -cat /data/spark/graphx/followers.txt
2 1
4 1
1 2
6 3
7 3
7 6
6 7
3 7
https://github.com/cloudxlab/bigdata/blob/master/spark/examples/graphx/pagerank.scala
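To see how the PageRank formula plays out on this file, the sketch below repeats the update locally until the ranks stop changing. It assumes each line is a "source destination" pair and that every rank starts at 1.0. `PageRankSketch`, `step`, and `pageRank` are hypothetical names, and this is a single-machine illustration rather than the algorithm as GraphX runs it on a cluster.

```scala
object PageRankSketch {
  // Edges from followers.txt, read as (source, destination).
  val edges: Seq[(Long, Long)] = Seq(
    (2L, 1L), (4L, 1L), (1L, 2L), (6L, 3L),
    (7L, 3L), (7L, 6L), (6L, 7L), (3L, 7L)
  )

  val vertices: Set[Long] = edges.flatMap { case (s, d) => Seq(s, d) }.toSet

  // Number of outgoing links per vertex.
  val outDegree: Map[Long, Int] =
    edges.groupBy(_._1).map { case (v, es) => v -> es.size }

  // One sweep: every vertex gets 0.15 plus 0.85 times the sum of
  // rank / out-degree over the vertices that link to it.
  def step(ranks: Map[Long, Double]): Map[Long, Double] = {
    val incoming: Map[Long, Double] = edges
      .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
      .groupBy(_._1)
      .map { case (v, contribs) => v -> contribs.map(_._2).sum }
    vertices.map(v => v -> (0.15 + 0.85 * incoming.getOrElse(v, 0.0))).toMap
  }

  // Repeat sweeps until no rank moves by more than the tolerance.
  def pageRank(tol: Double): Map[Long, Double] = {
    var ranks: Map[Long, Double] = vertices.map(v => v -> 1.0).toMap
    var delta = Double.MaxValue
    while (delta >= tol) {
      val next = step(ranks)
      delta = vertices.map(v => math.abs(next(v) - ranks(v))).max
      ranks = next
    }
    ranks
  }

  val ranks: Map[Long, Double] = pageRank(0.0001)

  def main(args: Array[String]): Unit =
    ranks.toSeq.sortBy(_._1).foreach { case (id, r) => println(f"$id%d -> $r%.2f") }
}
```

Vertex 4 has no incoming edges, so it stays at the base value of 0.15. The other vertices settle at roughly 1.46, 1.39, 1.00, 0.70 and 1.30 for ids 1, 2, 3, 6 and 7.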
GraphX - PageRank
import org.apache.spark.graphx.GraphLoader

// Load the edges as a graph; each line of the file is "srcId dstId".
// sc is the SparkContext provided by spark-shell.
val graph = GraphLoader.edgeListFile(sc, "/data/spark/graphx/followers.txt")

// Run PageRank until the ranks converge to within a tolerance of 0.0001
val ranks = graph.pageRank(0.0001).vertices

// Parse users.txt into (id, username) pairs
val users = sc.textFile("/data/spark/graphx/users.txt").map { line =>
  val fields = line.split(",")
  (fields(0).toLong, fields(1))
}

// Join the ranks with the usernames on the vertex id
val ranksByUsername = users.join(ranks).map {
  case (id, (username, rank)) => (username, rank)
}

// Print the result, one (username, rank) pair per line
println(ranksByUsername.collect().mkString("\n"))
GraphX - PageRank
1. BarackObama: 1.46
2. Lady Gaga: 1.39
3. John Resig: 1.0
4. Justin Bieber: 0.15
6. Matei Zaharia: 0.70
7. Martin Odersky: 1.3
8. anonsys: no rank (does not appear in followers.txt)
Thank you!
reachus@cloudxlab.com

Introduction to GraphX | Big Data Hadoop Spark Tutorial | CloudxLab
