GNNs at Scale With Graph Data Science Sampling
and GDS Python Client Integration
Adam Schill Collberg
Senior Software Engineer, Graph Data Science Team at Neo4j
Overview
● Problem Statement
○ Cora Dataset
○ Node Classification
● Using Graph Neural Networks (GNN)
○ A Very Brief Introduction
○ PyTorch and PyG
○ GNN Pros and Cons
● Graph Sampling
○ Random Walk with Restarts (RWR)
○ Neo4j Graph Data Science (GDS)
● Demo
● Summary and Further Learning
Problem Statement
Classifying subjects of research papers in a citation network
The Cora Dataset
● A research paper citation network
● 2708 research Paper nodes
● 5429 CITES relationships
Schema: (Paper)-[:CITES]->(Paper)
[Figure: Paper nodes color coded by subject]
[Figure: feature vector of Paper X, one binary entry per keyword: “equation” = 1, “turing” = 0, “graph” = 0, ...]
● Each paper node has a binary feature vector of length 1433
○ Each dimension represents a keyword
○ Value 1 if the paper has the keyword, 0 otherwise
● Each paper node belongs to a subject
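To make the feature vector concrete, here is a minimal sketch of how such a binary keyword vector could be built. The three-word vocabulary and the keyword sets per paper are made up for illustration (the real Cora vocabulary has 1433 entries), and `keyword_feature_vector` is a hypothetical helper, not part of any library.

```python
from typing import Dict, List, Set


def keyword_feature_vector(paper_keywords: Set[str], vocabulary: List[str]) -> List[int]:
    """Return one binary entry per vocabulary keyword: 1 if the paper has it, else 0."""
    return [1 if word in paper_keywords else 0 for word in vocabulary]


# Hypothetical toy vocabulary; Cora itself uses 1433 keywords.
vocabulary = ["equation", "turing", "graph"]

# Hypothetical keyword sets for two papers.
papers: Dict[str, Set[str]] = {
    "Paper 1": {"equation"},
    "Paper 2": {"turing", "graph"},
}

features = {name: keyword_feature_vector(words, vocabulary) for name, words in papers.items()}
print(features)
```

The position of each entry is fixed by the vocabulary order, so vectors of different papers are directly comparable.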
The Node Classification Problem
[Figure: Paper 1 and Paper 2 with their feature vectors, to be assigned subjects such as Computer Science and Mathematics]
Supervised Machine Learning Approach
[Diagram: Feature vector representation → Supervised ML model (e.g. a neural network) → Paper subject]

Keyword       Paper 1   Paper 2
“equation”    1         0
“turing”      0         1
“graph”       0         1
...           ...       ...

But what about all the relationship information?
Degree could be interesting?
Maybe interesting, but can we do better?
[Diagram: Feature vector representation → Supervised ML model (e.g. a neural network) → Paper subject]

Feature       Paper 1   Paper 2
“equation”    1         0
“turing”      0         1
“graph”       0         1
...           ...       ...
Node degree   82        3
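As a rough illustration of the "degree as an extra feature" idea, the sketch below counts relationships per node and appends that count to each feature vector. The node names, edge list and helper functions are hypothetical toy data, not taken from Cora.

```python
from collections import Counter
from typing import Dict, List, Tuple


def node_degrees(edges: List[Tuple[str, str]]) -> Counter:
    """Count the relationships touching each node, ignoring direction."""
    degrees: Counter = Counter()
    for source, target in edges:
        degrees[source] += 1
        degrees[target] += 1
    return degrees


def with_degree(features: Dict[str, List[int]], edges: List[Tuple[str, str]]) -> Dict[str, List[int]]:
    """Append each node's degree as one extra feature dimension."""
    degrees = node_degrees(edges)
    return {node: vector + [degrees[node]] for node, vector in features.items()}


# Hypothetical CITES relationships (citing paper, cited paper).
cites = [("P1", "P2"), ("P3", "P1"), ("P3", "P2"), ("P4", "P1")]
features = {"P1": [1, 0, 0], "P2": [0, 1, 1], "P3": [0, 0, 1], "P4": [1, 1, 0]}

augmented = with_degree(features, cites)
print(augmented)
```

A single number like this compresses the whole neighborhood into a count, which is why the slides ask whether we can do better.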
Using Graph Neural Networks
Graph topology-aware deep machine learning
A Very Brief Intro to GNNs
A neural network architecture based on message passing, where:
● Inputs are node feature vectors (same as regular ML)
● Layer connectivity is defined by relationships
Example:
Each target node gets its own computation graph, but node vectors and weights are shared
[Figure: GNN computation graph for node A. Layer 0: node feature vectors. Layer 1: outputs transformed vectors. Layer 2.]
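One way to picture the per-node computation graph is to unroll it as a tree. The sketch below does that for a small made-up graph; the adjacency lists and the `computation_graph` helper are hypothetical and only meant to show how the relationships define which vectors feed into which node at each layer.

```python
from typing import Dict, List


def computation_graph(adjacency: Dict[str, List[str]], target: str, num_layers: int) -> dict:
    """Unroll the message-passing tree rooted at `target`.

    Every entry lists the neighbors whose previous-layer vectors feed into it.
    Entries at layer 0 are leaves: they stand for the raw node feature vectors.
    """
    if num_layers == 0:
        return {"node": target, "layer": 0, "inputs": []}
    return {
        "node": target,
        "layer": num_layers,
        "inputs": [
            computation_graph(adjacency, neighbor, num_layers - 1)
            for neighbor in adjacency[target]
        ],
    }


# Hypothetical undirected toy graph: A-B, A-C, C-D.
adjacency = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}

tree = computation_graph(adjacency, "A", num_layers=2)
print(tree)
```

Note that A shows up again as a layer 0 leaf below B and C. The same feature vector and the same layer weights are reused wherever a node appears, which is the sharing the slide refers to.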
The GNN layer
The GNN layer abstractly consists of two parts:
1. Aggregating neighbor output vectors from previous layer
2. Updating target node vector with neural network
Graph Convolution Network (GCN) example:
[Figure: in layer i, the previous-layer vectors xA(i - 1) and xC(i - 1) are combined into xB(i)]
Here σ is a non-linear activation and W(i) is the weight matrix of layer i
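The two parts of the layer can be sketched in plain Python. This assumes the commonly used GCN update from the GCN paper, x_v(i) = σ( W(i) · Σ x_u(i - 1) / sqrt(d_u · d_v) ), summed over v and its neighbors u, with degrees d counted including a self-loop. The exact formula shown on the slide may differ, ReLU is picked here as σ, and the toy graph, vectors and weights are made up.

```python
import math
from typing import Dict, List

Vector = List[float]


def gcn_layer(
    adjacency: Dict[str, List[str]],
    vectors: Dict[str, Vector],
    weight: List[Vector],
) -> Dict[str, Vector]:
    """One GCN-style layer: normalized neighbor sum, linear map, ReLU.

    `adjacency` must be symmetric (undirected); `weight` has shape out_dim x in_dim.
    """
    # Degree counted with a self-loop, as in the GCN normalization.
    degree = {node: len(neighbors) + 1 for node, neighbors in adjacency.items()}
    in_dim = len(weight[0])
    outputs: Dict[str, Vector] = {}
    for node, neighbors in adjacency.items():
        # 1. Aggregate previous-layer vectors of the node itself and its neighbors.
        aggregated = [0.0] * in_dim
        for other in [node] + neighbors:
            scale = 1.0 / math.sqrt(degree[node] * degree[other])
            for k in range(in_dim):
                aggregated[k] += scale * vectors[other][k]
        # 2. Update: apply the weight matrix, then the non-linearity (ReLU here).
        outputs[node] = [
            max(0.0, sum(w * a for w, a in zip(row, aggregated))) for row in weight
        ]
    return outputs


# Hypothetical path graph A-B-C with 2-dimensional input vectors.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
vectors = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
weight = [[1.0, -1.0], [0.5, 0.5]]  # shared by all nodes in this layer

out = gcn_layer(adjacency, vectors, weight)
print(out)
```

Stacking two such calls, each with its own weight matrix, gives every node information from its two-hop neighborhood. In practice a library such as PyG provides optimized, trainable versions of this layer.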
PyTorch and PyG
● PyTorch: an optimized tensor library for deep learning using GPUs and CPUs
● Originally developed by Meta AI, now part of the Linux Foundation umbrella
● Rich ecosystem
● Interfaces in both Python and C++
● PyTorch Geometric (PyG):
○ Extension for graph learning
○ Supports lots of GNN architectures
GNN Cons
● Require a lot of memory
● Are slow to compute
○ Even with GPUs
● Hard to interpret
● Fairly complicated to implement
GNN Pros
● Leverage topology through message passing
● Can capture lots of information in weights
● Are inductive:
○ Can be trained on one graph and applied for prediction on another (similar) graph
Let’s train a GNN on a graph subsample!
Graph Sampling
Random walk with restarts and Neo4j Graph Data Science
Random Walk with Restarts (RWR) sampling
The algorithm:
Take random walks in the graph, but keep restarting from the same root node intermittently
When enough of the graph has been visited, output the visited subgraph
Shown by Leskovec et al. [*] to produce structurally representative subgraphs
[*]: Sampling from Large Graphs, Leskovec et al, 2006, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
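The idea can be sketched in a few lines of plain Python. This is a simplified single-root version for illustration only, not the GDS implementation: the function names, the `sampling_ratio` and `restart_probability` parameters, the `max_steps` safety cap and the ring-shaped toy graph are all assumptions made for this example.

```python
import random
from typing import Dict, List, Optional, Set


def rwr_sample(
    adjacency: Dict[int, List[int]],
    start: int,
    sampling_ratio: float,
    restart_probability: float = 0.1,
    seed: Optional[int] = None,
    max_steps: int = 1_000_000,
) -> Set[int]:
    """Walk randomly from `start`, jumping back to it now and then,
    until the requested fraction of nodes has been visited."""
    rng = random.Random(seed)
    target_size = max(1, round(sampling_ratio * len(adjacency)))
    visited = {start}
    current = start
    for _ in range(max_steps):  # safety cap in case the target is unreachable
        if len(visited) >= target_size:
            break
        neighbors = adjacency[current]
        # Restart at the root with the given probability, or when stuck at a dead end.
        if not neighbors or rng.random() < restart_probability:
            current = start
        else:
            current = rng.choice(neighbors)
            visited.add(current)
    return visited


def induced_subgraph(adjacency: Dict[int, List[int]], nodes: Set[int]) -> Dict[int, List[int]]:
    """Keep only the sampled nodes and the relationships between them."""
    return {n: [m for m in adjacency[n] if m in nodes] for n in sorted(nodes)}


# Hypothetical toy graph: a ring of 10 nodes.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}

sample = rwr_sample(ring, start=0, sampling_ratio=0.5, seed=42)
subgraph = induced_subgraph(ring, sample)
print(sorted(sample))
```

Because the walk keeps returning to the root, the sample stays concentrated around it rather than drifting off into a thin path, which is part of what makes the resulting subgraph look like a small version of the original.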
Neo4j Graph Data Science (GDS) Library
● A Neo4j DB plugin for analytics
○ Part of the DBMS (server) process
○ Projects database into memory for analysis
● Provides high performance graph algorithms
○ Running at scale (100s of billions of nodes)
○ Has a nice implementation of RWR sampling
● Supports rapid transfer to and from the library
○ Using the Apache Arrow memory format and libraries
[Figure: client side (Neo4j Driver over Bolt, Arrow client over Arrow) talking to the server side (Neo4j GDS in the JVM)]
But there is an easier way

Neo4j Graph Data Science Client
● A Pythonic surface for GDS
● Wraps Neo4j Python driver
● Looks very similar to the GDS Cypher API
● Adds additional convenience functionality
● Seamlessly uses an Arrow client under the hood for fast data transfer
● pip install graphdatascience
[Figure: client side (GDS Client) talking over Bolt and Arrow to the server side (Neo4j GDS in the JVM)]
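A rough sketch of what sampling through the Python client might look like is given below. It has not been run against a live server here, so treat the details as assumptions: the connection values are placeholders, `rwr_config` and `sample_cora` are hypothetical helpers, the Cora loader `gds.graph.load_cora()` requires a reasonably recent client, and depending on the GDS version the sampling call may live under `gds.alpha.graph.sample.rwr` instead of `gds.graph.sample.rwr`. Pinning `concurrency` to 1 when a seed is given is also an assumption about what reproducible runs need.

```python
from typing import Any, Dict, Optional


def rwr_config(
    sampling_ratio: float = 0.2,
    restart_probability: float = 0.1,
    random_seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the keyword configuration for the RWR sampling call."""
    if not 0.0 < sampling_ratio <= 1.0:
        raise ValueError("sampling_ratio must be in (0, 1]")
    if not 0.0 <= restart_probability <= 1.0:
        raise ValueError("restart_probability must be in [0, 1]")
    config: Dict[str, Any] = {
        "samplingRatio": sampling_ratio,
        "restartProbability": restart_probability,
    }
    if random_seed is not None:
        # Assumption: seeded sampling is only reproducible single-threaded.
        config["randomSeed"] = random_seed
        config["concurrency"] = 1
    return config


def sample_cora(uri: str, user: str, password: str, **config_overrides: Any):
    """Load Cora into GDS and project an RWR subsample of it (needs a running Neo4j with GDS)."""
    # Imported lazily so this sketch can be read and loaded without the client installed.
    from graphdatascience import GraphDataScience

    gds = GraphDataScience(uri, auth=(user, password))
    G = gds.graph.load_cora()  # convenience loader shipped with the client
    # On older GDS versions this may be gds.alpha.graph.sample.rwr instead.
    G_sample, _ = gds.graph.sample.rwr("cora_sample", G, **rwr_config(**config_overrides))
    return G, G_sample


print(rwr_config(0.15, random_seed=42))
```

With a server available, a call such as `sample_cora("bolt://localhost:7687", "neo4j", "<password>", sampling_ratio=0.15)` would return graph objects for the full and the sampled projection, whose node counts can then be compared.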
Demo
Jupyter notebook time!
Summary and Further Learning
What did we learn? And what’s next?
What did we learn?
● Node classification is an interesting graph ML problem
● It’s useful to leverage topology when learning on graphs
● GNNs can do this well, but are slow
● Since GNNs are inductive we can train them on a subsample
● We can use PyG for GNN training
● We can subsample using RWR
● We can use GDS (including its Python client) for RWR
● GDS’s rapid transfer capabilities enable a fast workflow
Also at NODES
If you liked this presentation, you might also like
● Fundamentals of Neo4j Graph Data Science Series 2.x – Pipelines and more
○ Mats Rydberg
○ Thursday, November 17, 9:40 – 10:25 CET
● Link Prediction With Graph Data Science at Scale
○ Florentin Dörre
○ Thursday, November 17, 13:40 – 13:55 CET
A Bunch of Links
● The DEMO notebook
● Neo4j Graph Data Science Manual
● Neo4j Graph Data Science Client Manual
● PyTorch Docs
● PyTorch Geometric Docs
● The Cora dataset
● GCN paper
● Sampling from Large Graphs paper
● Graph Representation Learning book
● Apache Arrow
Thank you!
Contact me at
adam.schill-collberg@neo4j.com
adamnsch@Github
