GNNs at Scale With Graph Data Science Sampling and GDS Python Client Integration
Adam Schill Collberg
Senior Software Engineer, Graph Data Science Team at Neo4j
Overview
• Problem Statement
◦ Cora Dataset
◦ Node Classification
• Using Graph Neural Networks (GNN)
◦ A Very Brief Introduction
◦ PyTorch and PyG
◦ GNN Pros and Cons
• Graph Sampling
◦ Random Walk with Restarts (RWR)
◦ Neo4j Graph Data Science (GDS)
• Demo
• Summary and Further Learning
Problem Statement
Classifying subjects of research papers in a citation network
The Cora Dataset
• A research paper citation network
• 2708 research Paper nodes
• 5429 CITES relationships
• Schema: (:Paper)-[:CITES]->(:Paper)
[Figure: the Cora graph, with Paper nodes color coded by subject]
• Each paper node has a binary feature vector of length 1433
◦ Each dimension represents a keyword
◦ Value 1 if the paper has the keyword, 0 otherwise
◦ Example: Paper X has the vector (1, 0, 0, …) over the keywords “equation”, “turing”, “graph”, …
• Each paper node belongs to a subject
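To make the binary keyword encoding concrete, here is a minimal Python sketch. The three-word vocabulary and the helper name `binary_feature_vector` are hypothetical stand-ins; the real Cora vocabulary has 1433 keywords.

```python
# Hypothetical 3-keyword vocabulary; Cora's real vocabulary has 1433 keywords.
VOCABULARY = ["equation", "turing", "graph"]

def binary_feature_vector(paper_keywords, vocabulary=VOCABULARY):
    """Return a 0/1 vector: position i is 1 if the paper contains vocabulary[i]."""
    present = set(paper_keywords)
    return [1 if word in present else 0 for word in vocabulary]

# Like Paper X: contains "equation" but neither "turing" nor "graph".
# Words outside the vocabulary (here "lemma") are simply ignored.
paper_x = binary_feature_vector(["equation", "lemma"])
print(paper_x)  # [1, 0, 0]
```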
The Node Classification Problem
[Figure: Paper 1 and Paper 2 with their feature vectors, to be classified by subject, e.g. Computer Science or Mathematics]
Supervised Machine Learning Approach
Feature vector representation → supervised ML model (e.g. a neural network) → paper subject

Keyword:   “equation”  “turing”  “graph”  …
Paper 1:   1           0         0        …
Paper 2:   0           1         1        …

But what about all the relationship information?
Degree could be interesting?
Feature vector representation plus node degree → supervised ML model (e.g. a neural network) → paper subject

Keyword:   “equation”  “turing”  “graph”  …  Node degree
Paper 1:   1           0         0        …  82
Paper 2:   0           1         1        …  3

Maybe interesting, but can we do better?
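Appending degree to the keyword vector can be sketched in a few lines of Python. The toy citation list, the toy feature values, and the helper names `degrees` and `with_degree` are hypothetical; degree here counts relationships ignoring their direction.

```python
from collections import Counter

# Hypothetical toy data: (citing paper, cited paper) pairs and 3-keyword vectors.
cites = [("p1", "p2"), ("p3", "p1"), ("p4", "p1")]
features = {
    "p1": [1, 0, 0],
    "p2": [0, 1, 1],
    "p3": [0, 0, 1],
    "p4": [1, 1, 0],
}

def degrees(edges):
    """Number of relationships per node, ignoring direction."""
    deg = Counter()
    for src, dst in edges:
        deg[src] += 1
        deg[dst] += 1
    return deg

def with_degree(features, edges):
    """Append each node's degree as one extra column of its feature vector."""
    deg = degrees(edges)  # Counter returns 0 for nodes without relationships
    return {node: vec + [deg[node]] for node, vec in features.items()}

augmented = with_degree(features, cites)
print(augmented["p1"])  # [1, 0, 0, 3]
print(augmented["p2"])  # [0, 1, 1, 1]
```

A single degree value says how many papers a paper is connected to, but not which ones or what they are about, which limits how much topology it can convey.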
Using Graph Neural Networks
Graph topology-aware deep machine learning
A Very Brief Intro to GNNs
A neural network architecture based on message passing, where:
● Inputs are node feature vectors (same as regular ML)
● Layer connectivity is defined by relationships
Example: each target node gets its own computation graph, but node vectors and weights are shared.
[Figure: GNN computation graph for node A. Layer 0: node feature vectors; Layer 1: outputs transformed vectors; Layer 2]
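The per-node computation graph can be unrolled with a short recursive function. This is a sketch on a hypothetical six-node graph; `computation_tree`, `input_nodes` and `show` are illustrative helpers, and the sketch aggregates neighbors only, without self-loops.

```python
# Hypothetical toy graph (undirected adjacency list).
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "E", "F"],
    "D": ["A"],
    "E": ["C"],
    "F": ["C"],
}

def computation_tree(graph, target, num_layers):
    """Unroll the message-passing computation graph of `target`.

    Returns (node, children): the children of a node at layer k are its
    neighbors at layer k - 1. Leaves are layer 0, i.e. raw feature vectors.
    """
    if num_layers == 0:
        return (target, [])
    return (target, [computation_tree(graph, nbr, num_layers - 1) for nbr in graph[target]])

def input_nodes(tree):
    """Distinct nodes whose layer-0 feature vectors reach the root."""
    node, children = tree
    if not children:
        return {node}
    return set().union(*(input_nodes(child) for child in children))

def show(tree, depth=0):
    """Print the tree, one node per line, indented by layer."""
    node, children = tree
    print("  " * depth + node)
    for child in children:
        show(child, depth + 1)

tree_a = computation_tree(graph, "A", num_layers=2)
show(tree_a)
print(sorted(input_nodes(tree_a)))  # ['A', 'B', 'C', 'E', 'F']
```

Every copy of a node within one layer would be transformed with that layer's single, shared weight matrix. Because this sketch has no self-loops, D's own features never reach A in two layers; GCN-style layers add a self-loop so that each node's own vector is part of its update.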
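The GNN layer
The GNN layer abstractly consists of two parts:
1. Aggregating the neighbors' output vectors from the previous layer
2. Updating the target node's vector with a neural network
Graph Convolution Network (GCN) example, layer i, where node B aggregates the previous-layer vectors of its neighbors A and C (and its own):
$$x_B^{(i)} = \sigma\left( W^{(i)} \sum_{u \in \mathcal{N}(B) \cup \{B\}} \frac{x_u^{(i-1)}}{\sqrt{\hat{d}_u \, \hat{d}_B}} \right)$$
where $\sigma$ is a non-linear activation, $W^{(i)}$ the weight matrix of layer $i$, $\mathcal{N}(B) = \{A, C\}$ the neighbors of B, and $\hat{d}_v$ the degree of $v$ plus one for the self-loop.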
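A dependency-free Python sketch of one GCN layer, assuming the symmetric degree normalization with self-loops from the GCN paper and ReLU as σ. The function names and the toy graph are hypothetical; in practice one would use an optimized layer such as PyG's `GCNConv`.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    """Element-wise ReLU, used here as the non-linearity sigma."""
    return [max(0.0, a) for a in v]

def gcn_layer(graph, x_prev, W, activation=relu):
    """Apply one GCN layer to every node.

    graph:  undirected adjacency list {node: [neighbors]}
    x_prev: {node: output vector of layer i - 1}
    W:      weight matrix of layer i, shared by all nodes
    """
    # Degree including the self-loop, i.e. d_hat_v = deg(v) + 1.
    d_hat = {v: len(nbrs) + 1 for v, nbrs in graph.items()}
    x_next = {}
    for v, nbrs in graph.items():
        # 1. Aggregate: normalized sum over neighbors plus v itself.
        agg = [0.0] * len(x_prev[v])
        for u in list(nbrs) + [v]:
            norm = 1.0 / math.sqrt(d_hat[u] * d_hat[v])
            agg = [a + norm * xu for a, xu in zip(agg, x_prev[u])]
        # 2. Update: shared linear transform followed by the non-linearity.
        x_next[v] = activation(matvec(W, agg))
    return x_next

# Toy path graph: B's neighbors are A and C, as in the slide's figure.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
x0 = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
W1 = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, so the output shows the aggregation
x1 = gcn_layer(graph, x0, W1)
print([round(a, 4) for a in x1["B"]])  # [0.8165, 0.7416]
```

The same `W` is used for every node, which is what makes the layer applicable to graphs other than the one it was trained on.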
PyTorch and PyG
● PyTorch: an optimized tensor library for deep learning using GPUs and CPUs
● Originally developed by Meta AI, now under the Linux Foundation umbrella
● Rich ecosystem
● Interfaces in both Python and C++
● PyTorch Geometric (PyG):
○ Extension for graph learning
○ Supports lots of GNN architectures
GNN Pros
• Leverage topology through message passing
• Can capture lots of information in weights
• Are inductive:
◦ Can be trained on one graph and applied for prediction on another (similar) graph
GNN Cons
• Require a lot of memory
• Are slow to compute
◦ Even with GPUs
• Are hard to interpret
• Are fairly complicated to implement
Let’s train a GNN on a graph subsample!
Graph Sampling
Random walk with restarts and Neo4j Graph Data Science
Random Walk with Restarts (RWR) sampling
The algorithm:
Take random walks in the graph, but intermittently restart from the same root node
When enough of the graph has been visited, output the visited subgraph
Shown by Leskovec et al. [*] to produce structurally representative subgraphs
[*] Leskovec et al., “Sampling from Large Graphs,” Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.
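A minimal single-machine sketch of RWR sampling in plain Python. The function name, its parameters, and the stopping rule (stop once a target fraction of nodes has been visited, with a step cap as a safety bound) are assumptions for illustration, not GDS's implementation. It returns the subgraph induced by the visited nodes, which is one reasonable reading of "output the visited graph".

```python
import random

def rwr_sample(graph, start, sampling_ratio=0.15, restart_prob=0.1, seed=None, max_steps=100_000):
    """Random Walk with Restarts sampling (illustrative sketch).

    graph: undirected adjacency list {node: [neighbors]}
    Walks from `start`; at each step it jumps back to `start` with probability
    `restart_prob` (or when stuck at a node without neighbors), otherwise it
    moves to a uniformly random neighbor. It stops once `sampling_ratio` of all
    nodes have been visited, or after `max_steps` steps.
    Returns the visited nodes and the subgraph induced by them.
    """
    rng = random.Random(seed)
    target = max(1, int(sampling_ratio * len(graph)))
    visited = {start}
    current = start
    for _ in range(max_steps):
        if len(visited) >= target:
            break
        if rng.random() < restart_prob or not graph[current]:
            current = start  # restart from the root node
            continue
        current = rng.choice(graph[current])
        visited.add(current)
    # Keep only relationships whose endpoints were both visited.
    subgraph = {v: [u for u in graph[v] if u in visited] for v in visited}
    return visited, subgraph

# Toy usage on a hypothetical 6-node graph.
toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
nodes, sub = rwr_sample(toy, start=0, sampling_ratio=0.5, seed=7)
print(sorted(nodes), sub)
```

Restarting keeps the walk anchored around the root, so the sample stays connected instead of drifting off along one long path.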
Neo4j Graph Data Science (GDS) Library
● A Neo4j DB plugin for analytics
○ Part of the DBMS (server) process
○ Projects database into memory for analysis
● Provides high performance graph algorithms
○ Running at scale (100s of billions of nodes)
○ Has a nice implementation of RWR sampling
● Supports rapid transfer to and from the library
○ Using the Apache Arrow memory format and libraries
[Figure: Client side: Neo4j Driver and Arrow Client; Server side: Neo4j GDS running in the JVM; the driver connects over Bolt, the Arrow client over Arrow]
But there is an easier way…
Neo4j Graph Data Science Client
● A Pythonic surface for GDS
● Wraps the Neo4j Python driver
● Looks very similar to the GDS Cypher API
● Adds convenience functionality
● Uses the Arrow client for fast data transfer, seamlessly under the hood
● pip install graphdatascience
[Figure: Client side: GDS Client; Server side: Neo4j GDS running in the JVM; connected over Bolt and Arrow]
Demo
Jupyter notebook time!
Summary and Further Learning
What did we learn? And what’s next?
What did we learn?
● Node classification is an interesting graph ML problem
● It’s useful to leverage topology when learning on graphs
● GNNs can do this well, but are slow
● Since GNNs are inductive we can train them on a subsample
● We can use PyG for GNN training
● We can subsample using RWR
● We can use GDS (including its Python client) for RWR
● GDS’s rapid transfer capabilities enable a fast workflow
Also at NODES
If you liked this presentation, you might also like
• Fundamentals of Neo4j Graph Data Science Series 2.x – Pipelines and more
◦ Mats Rydberg
◦ Thursday, November 17, 9:40 – 10:25 CET
• Link Prediction With Graph Data Science at Scale
◦ Florentin Dörre
◦ Thursday, November 17, 13:40 – 13:55 CET
A Bunch of Links
● The DEMO notebook
● Neo4j Graph Data Science Manual
● Neo4j Graph Data Science Client Manual
● PyTorch Docs
● PyTorch Geometric Docs
● The Cora dataset
● GCN paper
● Sampling from Large Graphs paper
● Graph Representation Learning book
● Apache Arrow
Thank you!
Contact me at
adam.schill-collberg@neo4j.com
adamnsch@Github
