Graph Neural Network in practice
Céline Brouard and Nathalie Vialaneix, INRAE/MIAT
WG GNN, December 17th, 2020
GNN in practice
Message passing principle and exploration of two libraries
Overview of GNN

The last layer is fed to a standard MLP for prediction (at the graph level).
Message passing layers

are the generalization of convolutional layers to graph data
general concept introduced in [Gilmer et al. 2017] (general framework encompassing several previous GNN)

More formally, if $G = (X, E)$ is a graph with $n$ nodes, the following are associated:

nodes $x \in X$
edges $e \in E$
node features $l_x$ for $x \in X$
edge features $l_e$ for $e \in E$
representation $h_x \in \mathbb{R}^K$ of node $x$, learned iteratively (layers $t = 1 \ldots T$):

$$h^{t+1}_x = F\left(h^t_x,\ \square_{y \in N(x)}\ \phi_t(h^t_x, h^t_y, e_{xy})\right)$$

with $\square$: a differentiable permutation-invariant function (mean, sum, max...)

Remark: actually, [Gilmer et al. 2017] use $\square = \sum$ and a layer-dependent $F_t$ (but give no example). A minimal sketch of one such step follows.
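To make the recursion concrete, here is a minimal sketch (not the authors' code) of one message-passing step in plain PyTorch, assuming sum aggregation and no edge features; phi and update are placeholder callables:

import torch

def mp_step(h, edge_index, phi, update):
    """One generic MP step: h_x <- F(h_x, sum_{y in N(x)} phi(h_x, h_y))."""
    src, dst = edge_index                      # each edge sends y = src -> x = dst
    messages = phi(h[dst], h[src])             # phi(h_x, h_y) for every edge
    aggregated = torch.zeros(h.size(0), messages.size(1))
    aggregated = aggregated.index_add(0, dst, messages)  # permutation-invariant sum
    return update(h, aggregated)

# toy usage: the message is the neighbor state, the update a ReLU of the sum
h = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])        # edges 0->1, 1->2, 2->3
out = mp_step(h, edge_index,
              phi=lambda hx, hy: hy,
              update=lambda hx, m: torch.relu(hx + m))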
Examples of standard MP layers

(restricted to those present in both PyTorch Geometric and Spektral)

spectral Chebyshev networks (ChebNets) [Defferrard et al., 2016] (detailed below)
Gated Graph Neural Network (GGNN) [Li et al., 2016] (detailed below)
attention-based (GAT) [Veličković et al., 2017]
Attention-based GNN (AGNN) [Thekumparampil et al., 2018]
GraphSAGE [Hamilton et al., 2017]
Graph Convolutional Networks (GCN) [Kipf & Welling, 2017] (detailed below)
edge-convolution operator [Wang et al., 2018]
Graph Isomorphism Network (GIN) [Xu et al., 2019] (detailed below)
ARMA [Bianchi et al., 2019]
Approximate Personalized Propagation of Neural Predictions (APPNP) [Klicpera et al., 2019]
ChebNets [Defferrard et al., 2016]

Recall: $h^{t+1}_x = F(h^t_x, \square_{y \in N(x)} \phi_t(h^t_x, h^t_y, e_{xy}))$

Setting: $l_e \in \mathbb{R}$ (weighted graph)

Main idea: signal filtering based on the Laplacian eigendecomposition $(\Lambda, U)$

$h^t_x \in \mathbb{R}^{K(t)}$ and $F(h^t_x, \,\cdot\,) = \sigma(\,\cdot\,)$; $\square_{y \in N(x)} \phi_t(h^t_x, h^t_y, e_{xy})$ is replaced by

$$\left( \sum_{k'=1}^{K(t)} g_{\theta(k,k')}(L)\, \big(h^t_{1k'}\ \ldots\ h^t_{nk'}\big)^\top \right)_{k=1,\ldots,K(t+1)} \in \mathbb{R}^{n \times K(t+1)}$$

(row $x$ corresponds to the new feature $h^{t+1}_x$), with $g_{\theta(k,k')}(L) \in \mathbb{R}^{n \times n}$, $g_{\theta(k,k')}(L) = U\, g_{\theta(k,k')}(\Lambda)\, U^\top$, and $g_{\theta(k,k')}$ a polynomial (a decomposition on the Chebyshev polynomial basis is used), with the polynomial coefficients $\theta(k,k') \in \mathbb{R}^r$ learned during training.
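In practice, the Chebyshev parametrization avoids computing the eigendecomposition $(\Lambda, U)$ explicitly: $g_\theta(L)X$ is evaluated with the three-term recurrence. A minimal sketch (assuming a rescaled Laplacian with eigenvalues in $[-1, 1]$, dense matrices, and at least two coefficients; not the authors' implementation):

import torch

def cheb_filter(L_hat, X, theta):
    """Apply g_theta(L) X with g_theta a degree-(r-1) Chebyshev polynomial;
    L_hat is the rescaled Laplacian, theta the r learned coefficients."""
    T_prev, T_curr = X, L_hat @ X              # T_0(L)X = X, T_1(L)X = L X
    out = theta[0] * T_prev + theta[1] * T_curr
    for k in range(2, len(theta)):
        T_next = 2 * (L_hat @ T_curr) - T_prev # recurrence T_k = 2 L T_{k-1} - T_{k-2}
        out = out + theta[k] * T_next
        T_prev, T_curr = T_curr, T_next
    return out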
ChebNets [Defferrard et al., 2016] (some explanations)

Why is it message passing? $\left( \sum_{k'=1}^{K(t)} g_{\theta(k,k')}(L) \big(h^t_{1k'}\ \ldots\ h^t_{nk'}\big)^\top \right)_{k=1,\ldots,K(t+1)}$ can be rewritten under the compact form

$$\sum_{y} C^t_{xy}(\theta)\, h^t_y$$

with $C^t_{xy}(\theta) \in \mathbb{R}^{K(t+1) \times K(t)}$: $\big[C^t_{xy}\big]_{kk'} = \big[g_{\theta(k,k')}(L)\big]_{xy}$

Slight difference with the general framework: MP is performed over all nodes (not just neighbors), and the Laplacian is used to provide proximity relations between nodes.
GGNN [Li et al., 2016]

Recall: $h^{t+1}_x = F(h^t_x, \square_{y \in N(x)} \phi_t(h^t_x, h^t_y, e_{xy}))$

Setting: $l_e \in \{A, B, \ldots\}$, discrete (potentially directed)

Main idea: use a GRU (Gated Recurrent Unit [Cho et al., 2014]) inside the original GNN [Scarselli et al., 2009]

$h^t_x \in \mathbb{R}^{K(t)}$, $\square = \sum$ and $\phi_t(h^t_x, h^t_y, e_{xy}) = A_{l_{e_{xy}}} h^t_y$, where $A_{l_{e_{xy}}} \in \mathbb{R}^{K(t+1) \times K(t)}$ is a learned matrix depending on $l_{e_{xy}}$ only.

Writing $a^t_x$ for the aggregated message, the GRU update is:

$z^t_x = \sigma(W^z a^t_x + U^z h^t_x)$ (update gate)
$r^t_x = \sigma(W^r a^t_x + U^r h^t_x)$ (reset gate)
$\tilde{h}^t_x = \tanh(W a^t_x + U (r^t_x \odot h^t_x))$
$h^{t+1}_x = (1 - z^t_x) \odot h^t_x + z^t_x \odot \tilde{h}^t_x$
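A direct transcription of these four equations, as a hedged sketch (one row of h and a per node; the W and U matrices are the learned gate parameters; not the library implementation):

import torch

def gru_node_update(a, h, Wz, Uz, Wr, Ur, W, U):
    """GRU-style state update: a is the aggregated message sum_y A_{l_xy} h_y,
    h the current node states (one row per node)."""
    z = torch.sigmoid(a @ Wz.T + h @ Uz.T)        # update gate z_x
    r = torch.sigmoid(a @ Wr.T + h @ Ur.T)        # reset gate r_x
    h_tilde = torch.tanh(a @ W.T + (r * h) @ U.T) # candidate state
    return (1 - z) * h + z * h_tilde              # convex combination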
GGNN [Li et al., 2016] (with some explanations)

Recall: $h^{t+1}_x = F(h^t_x, \square_{y \in N(x)} \phi_t(h^t_x, h^t_y, e_{xy}))$

$z^t_x = 0$: no update ($h^{t+1}_x = h^t_x$)
$r^t_x = 0$: reset of $h^t_x$ in $\tilde{h}^t_x$ (the candidate state ignores the previous one)

The gate matrices $W$, $U$, $W^z$, $U^z$, $W^r$, $U^r$ and the matrices $A_l$ are learned.
Graph Convolutional Networks (GCN) [Kipf & Welling, 2017]

Recall: $h^{t+1}_x = F(h^t_x, \square_{y \in N(x)} \phi_t(h^t_x, h^t_y, e_{xy}))$

$h^t_x \in \mathbb{R}^{K(t)}$, $\square = \sum$, $F(h^t_x, \,\cdot\,) = \sigma(\,\cdot\,)$, and

$$\phi_t(h^t_x, h^t_y, e_{xy}) = \frac{e_{xy}}{\sqrt{(d_x + 1)(d_y + 1)}}\, h^t_y,$$

where $d_x$ and $d_y$ are the degrees of $x$ and $y$. This step encourages similar predictions among locally connected nodes.

The propagation rule over the entire graph can be expressed as

$$H^{t+1} \leftarrow \sigma\!\left(\tilde{D}^{-\frac{1}{2}} \tilde{A}\, \tilde{D}^{-\frac{1}{2}} H^t W^t\right),$$

where $\tilde{A} = A + I$ is the adjacency matrix of the undirected graph with added self-loops (and $\tilde{D}$ its degree matrix).

This propagation rule is based on a first-order approximation of spectral convolutions on graphs.
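The whole-graph rule translates directly into a few lines; a hedged dense sketch (toy version, not the library implementation):

import torch

def gcn_layer(H, A, W, act=torch.relu):
    """One dense GCN step: sigma(D~^{-1/2} A~ D~^{-1/2} H W) with A~ = A + I."""
    A_tilde = A + torch.eye(A.size(0))            # add self-loops
    d_inv_sqrt = A_tilde.sum(dim=1).pow(-0.5)     # degrees of A~, i.e. d_x + 1
    D_inv_sqrt = torch.diag(d_inv_sqrt)
    return act(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)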
Graph Isomorphism Network (GIN) [Xu et al., 2019]

Recall: $h^{t+1}_x = F(h^t_x, \square_{y \in N(x)} \phi_t(h^t_x, h^t_y, e_{xy}))$

$h^t_x \in \mathbb{R}^{K(t)}$, $\square = \sum$, $F = \mathrm{MLP}^{t+1}$ (a multi-layer perceptron):

$$h^{t+1}_x = \mathrm{MLP}^{t+1}\!\left((1 + \epsilon^t)\, h^t_x + \sum_{y \in N(x)} h^t_y\right)$$

GIN-$\epsilon$: $\epsilon$ is learned by gradient descent; GIN-0: $\epsilon$ is fixed to 0.

GIN is proved to be as powerful as the WL test for distinguishing between different graph structures, while using a simple architecture (an MLP).

Sum aggregation is better than mean and max aggregation in terms of distinguishing graph structures.
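In dense form the update is a one-liner; a hedged sketch (adjacency matrix A, any torch MLP):

import torch

def gin_layer(H, A, mlp, eps=0.0):
    """One dense GIN step: MLP((1 + eps) H + A H),
    i.e. (1 + eps) h_x + the sum over neighbors, row by row."""
    return mlp((1.0 + eps) * H + A @ H)

# e.g. mlp = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU(),
#                                torch.nn.Linear(32, 32))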
Pooling layers

Graph pooling: reduction of the number of nodes in a graph. It helps the GNN discard information that is superfluous for the task and keeps model complexity under control.

DiffPool (Ying et al., 2018): extracts a complex hierarchical structure by performing clustering of the graphs after each MP layer.
Top-K (Gao & Ji, 2019; Lee et al., 2019): learns a projection vector and selects the nodes with the K highest projection values.
MinCut (Bianchi et al., 2020): pooling method that uses spectral clustering and aggregates nodes belonging to the same cluster.

Global pooling: reduction of a graph to a single node (a minimal sketch follows the list):

sum
average
max
SortPool (Zhang et al., 2018): sorts the vertex features in a consistent order (based on WL colors). After sorting, the output tensor is truncated from n to k in order to unify graph sizes.
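As an illustration of the sum and average variants in disjoint mode (a sketch; the batch graph-index vector is introduced with the data modes below):

import torch

def global_pool(x, batch, mode="sum"):
    """Global sum/average pooling in disjoint mode: x holds the node features
    of all graphs, batch[i] is the graph index of node i."""
    n_graphs = int(batch.max()) + 1
    out = torch.zeros(n_graphs, x.size(1)).index_add(0, batch, x)   # sum pooling
    if mode == "average":
        counts = torch.zeros(n_graphs).index_add(0, batch, torch.ones(len(batch)))
        out = out / counts.unsqueeze(1)
    return out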
The Python libraries Spektral and PyTorch Geometric
Basic overview

Spektral [Grattarola and Alippi, 2020]

based on TensorFlow (at least 2.3.1) (easy to install on Ubuntu with pip3, but installation from source is required for the latest version)
GitHub repository https://github.com/danielegrattarola/spektral and detailed documentation https://graphneural.network/ with tutorials
many datasets included: https://graphneural.network/datasets/

PyTorch Geometric [Fey and Lenssen, 2019]

based on PyTorch (a bit harder to install on Ubuntu due to dependencies)
GitHub repository https://github.com/rusty1s/pytorch_geometric and detailed documentation https://pytorch-geometric.readthedocs.io/en/latest/ with examples
many datasets included: https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html
Main available datasets in Spektral and PyTorch Geometric

Citation: Cora, CiteSeer and PubMed citation datasets (node classification)
GraphSAGE: PPI dataset and Reddit dataset containing Reddit posts belonging to different communities (node classification)
QM7, QM9: chemical datasets of molecules (graph regression)
TUDataset: benchmark datasets for graph kernels from TU Dortmund (e.g. MUTAG, ENZYMES, PROTEINS...) (graph classification)

Example in PyTorch Geometric:
dataset = torch_geometric.datasets.TUDataset(root='/tmp/MUTAG', name='MUTAG')

Example in Spektral:
dataset = spektral.datasets.TUDataset('MUTAG')
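Once loaded, a dataset can be inspected graph by graph; a small sketch in PyTorch Geometric (the printed values are the standard MUTAG statistics):

import torch_geometric

dataset = torch_geometric.datasets.TUDataset(root='/tmp/MUTAG', name='MUTAG')
print(len(dataset))                 # number of graphs (188 for MUTAG)
print(dataset.num_classes)          # 2
print(dataset.num_node_features)    # 7
data = dataset[0]                   # one graph: Data(x=[...], edge_index=[2, ...], y=[1])
print(data.num_nodes, data.num_edges, data.y)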
Data modes and mini-batching

Scaling to huge amounts of data: the examples in a mini-batch are grouped into a unified representation that can be processed efficiently in parallel.

Data modes:

single mode: only 1 graph (node classification)
disjoint mode: a set of graphs is represented as a single graph (their disjoint union)
batch mode: the graphs are zero-padded so that they fit into tensors of shape [batch, N, N]
mixed mode: a single graph shared by all examples, with different node attributes
Data modes and mini-batching

Spektral:

single mode: loader = spektral.data.SingleLoader(dataset)
disjoint mode: loader = spektral.data.DisjointLoader(dataset, batch_size=3)
batch mode: loader = spektral.data.BatchLoader(dataset, batch_size=3)

PyTorch Geometric: only uses the disjoint mode

loader = torch_geometric.data.DataLoader(dataset, batch_size=3)
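A quick look at what the disjoint mode produces (a sketch built on the call above):

from torch_geometric.data import DataLoader
from torch_geometric.datasets import TUDataset

dataset = TUDataset(root='/tmp/MUTAG', name='MUTAG')
loader = DataLoader(dataset, batch_size=3, shuffle=True)
for batch in loader:
    # batch.x stacks the node features of the 3 graphs; batch.batch maps
    # each node to its graph index (used later by global pooling layers)
    print(batch.num_graphs, batch.x.shape, batch.batch.shape)
    break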
MP Layers

Spektral

ChebNets: spektral.layers.ChebConv(channels, K)
GGNN: spektral.layers.GatedGraphConv(channels, n_layers)
GCN: spektral.layers.GCNConv(channels)
GIN: spektral.layers.GINConv(channels, epsilon)
(channels: number of output channels)

PyTorch Geometric

ChebNets: torch_geometric.nn.ChebConv(in_channels, out_channels, K)
GGNN: torch_geometric.nn.GatedGraphConv(out_channels, num_layers)
GCN: torch_geometric.nn.GCNConv(in_channels, out_channels)
GIN: torch_geometric.nn.GINConv(nn, eps, train_eps), where nn is a neural network (e.g. torch_geometric.nn.Sequential); see the sketch below
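For GIN in PyTorch Geometric, the nn argument can simply be a plain torch MLP; a hedged sketch (the dimensions are assumptions, 7 input features as for MUTAG on the graph-classification slides):

import torch
from torch_geometric.nn import GINConv

mlp = torch.nn.Sequential(
    torch.nn.Linear(7, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 32),
)
conv = GINConv(mlp, eps=0.0, train_eps=True)   # GIN-eps: eps learned; train_eps=False gives GIN-0
# usage inside forward(): h = conv(x, edge_index)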
Comparison on node classification

Example: Cora (2,708 scientific publications, edges are co-citations, features are word-occurrence descriptors, and seven classes)

Task: starting from an initial set of training nodes with known classes, learn the classes of the other nodes (test set)

Architecture: two MP layers, with ReLU after the first layer, then dropout (50%) before the second layer, softmax after the second layer; the target error is categorical_crossentropy (a sketch follows below).

Learning algorithm: ADAM optimizer, 200 iterations (no early stopping), learning rates and regularization parameters (weight decays) set to the same value (probably)
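For reference, a hedged PyTorch Geometric sketch of this setup (hidden dimension 16 and the exact learning rate/weight decay are assumptions, not the values of the reported runs):

import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

dataset = Planetoid(root='/tmp/Cora', name='Cora')
data = dataset[0]

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, data):
        h = F.relu(self.conv1(data.x, data.edge_index))
        h = F.dropout(h, p=0.5, training=self.training)   # dropout 50% before layer 2
        return F.log_softmax(self.conv2(h, data.edge_index), dim=1)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
model.train()
for epoch in range(200):                                  # 200 iterations, no early stopping
    optimizer.zero_grad()
    out = model(data)
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()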
Comparison on node classification (critical assessment)

very fast: ~4 s for PyTorch Geometric and ~13 s for Spektral on my computer
BUT: setting the different parameters (number of iterations, learning rates, dropout rates, dimension of the hidden layers), in addition to the architecture itself, is very hard

good accuracy: ~80% at every run
BUT: the results of the two libraries are not at all the same!
Comparison on graph classification with PyG

For IMDB-binary, one-hot encodings of the node degrees are used as input features.

Comparison in PyTorch Geometric of:

different MP layers: GCN, GIN0, GIN, CHEB (k=3)
different global pooling layers: average, sum, max, SortPool

Architecture: 4 MP layers of dimension 32, each followed by ReLU, 1 global pooling layer, ReLU, and then softmax. The target error is categorical_crossentropy.

Learning algorithm: ADAM optimizer, 100 iterations. The batch size is 128. Cross-validation with 10 folds is used.
Comparison on graph classification with PyG: results

(results figure from the original slides, not recoverable from the text export)
Comparison on graph classification: critical assessment

I also experimented with graph classification in Spektral; the type of the data in the loaders differs from PyTorch Geometric.

PyTorch Geometric:

data
>>> Batch(batch=[1012], edge_attr=[2244, 4], edge_index=[2, 2244], x=[1012, 7], y=[56])
x, a, e, i = data.x, data.edge_index, data.edge_attr, data.batch

Spektral:

data is a tuple: ((x, a, i), y), or ((x, a, e, i), y) if there are edge features
More difficult to handle the two cases (edge features / no edge features); see the sketch below.
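One hedged workaround for the two Spektral signatures (a sketch, not part of the Spektral API):

def unpack(batch):
    """Absorb both loader signatures: ((x, a, i), y) and ((x, a, e, i), y)."""
    inputs, y = batch
    if len(inputs) == 4:
        x, a, e, i = inputs          # with edge features
    else:
        (x, a, i), e = inputs, None  # without edge features
    return x, a, e, i, y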
That's all for now...
... questions?
References

Bianchi FM, Grattarola D, Livi L, Alippi C (2020) Graph neural networks with convolutional ARMA filters. Preprint arXiv:1901.01343.
Cho K, van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. Preprint arXiv:1406.1078.
Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. Proceedings of NIPS 2016, Barcelona, Spain, 3844-3852.
Fey M, Lenssen JE (2019) Fast graph representation learning with PyTorch Geometric. Proceedings of ICLR 2019 Workshop, New Orleans, LA, USA.
Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chemistry. Proceedings of ICML 2017, Sydney, Australia, PMLR 70.
Grattarola D, Alippi C (2020) Graph neural networks in TensorFlow and Keras with Spektral. Proceedings of the ICML 2020 workshop on Graph Representation Learning and Beyond.
Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. Proceedings of NIPS 2017, Long Beach, CA, USA.
Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. Proceedings of ICLR 2017, Toulon, France.
Klicpera J, Bojchevski A, Günnemann S (2019) Predict then propagate: graph neural networks meet personalized PageRank. Proceedings of ICLR 2019, New Orleans, LA, USA.
References

Li Y, Zemel R, Brockschmidt M, Tarlow D (2016) Gated graph sequence neural networks. Proceedings of ICLR 2016, San Juan, Puerto Rico.
Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2009) The graph neural network model. IEEE Transactions on Neural Networks, 20(1), 61-80.
Thekumparampil KK, Wang C, Oh S, Li LJ (2018) Attention-based graph neural network for semi-supervised learning. Proceedings of ICLR 2018, Vancouver, Canada.
Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks. Proceedings of ICLR 2018, Vancouver, Canada.
Wang Y, Sun Y, Liu Z, Sarma SE, Bronstein MM, Solomon JM (2018) Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics, 38(5), 146. DOI: 10.1145/3326362.
Xu K, Hu W, Leskovec J, Jegelka S (2019) How powerful are graph neural networks? Proceedings of ICLR 2019, New Orleans, LA, USA.