Graph Neural Network in practice
Céline Brouard and Nathalie Vialaneix, INRAE/MIAT
WG GNN, December 17th, 2020
GNN in practice
Message passing principle and exploration of two libraries
Overview of GNN

The last layer is fed to a standard MLP for prediction (at the graph level).
Message passing layers

are the generalization of convolutional layers to graph data
general concept introduced in [Gilmer et al. 2017] (general framework encompassing several previous GNN)

More formally, if $G = (X, E)$ is a graph with $n$ nodes, the following are associated:

nodes $x \in X$
edges $e \in E$
node features $l_x$ for $x \in X$
edge features $l_e$ for $e \in E$
representation $h_x \in \mathbb{R}^K$ of node $x$, learned iteratively (layers $t = 1 \ldots T$):

$$h^{t+1}_x = F\left(h^t_x,\ \square_{y \in N(x)}\ \phi_t(h^t_x, h^t_y, e_{xy})\right)$$

with $\square$: a differentiable permutation-invariant function (mean, sum, max...)

Remark: actually, [Gilmer et al. 2017] use $\square = \sum$ and a layer-dependent $F_t$ (but give no example). A minimal sketch of one such step follows.
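To make the recursion concrete, here is a minimal sketch (not the authors' code) of one message-passing step in plain PyTorch, assuming sum aggregation and no edge features; phi and update are placeholder callables:

import torch

def mp_step(h, edge_index, phi, update):
    """One generic MP step: h_x <- F(h_x, sum_{y in N(x)} phi(h_x, h_y))."""
    src, dst = edge_index                      # each edge sends y = src -> x = dst
    messages = phi(h[dst], h[src])             # phi(h_x, h_y) for every edge
    aggregated = torch.zeros(h.size(0), messages.size(1))
    aggregated = aggregated.index_add(0, dst, messages)  # permutation-invariant sum
    return update(h, aggregated)

# toy usage: the message is the neighbor state, the update a ReLU of the sum
h = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])        # edges 0->1, 1->2, 2->3
out = mp_step(h, edge_index,
              phi=lambda hx, hy: hy,
              update=lambda hx, m: torch.relu(hx + m))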
Examples of standard MP layers

(restricted to those present in both PyTorch Geometric and Spektral)

spectral Chebyshev networks (ChebNets) [Defferrard et al., 2016] (detailed below)
Gated Graph Neural Network (GGNN) [Li et al., 2016] (detailed below)
attention-based (GAT) [Veličković et al., 2017]
Attention-based GNN (AGNN) [Thekumparampil et al., 2018]
GraphSAGE [Hamilton et al., 2017]
Graph Convolutional Networks (GCN) [Kipf & Welling, 2017] (detailed below)
edge-convolution operator [Wang et al., 2018]
Graph Isomorphism Network (GIN) [Xu et al., 2019] (detailed below)
ARMA [Bianchi et al., 2019]
Approximate Personalized Propagation of Neural Predictions (APPNP) [Klicpera et al., 2019]
ChebNets [Defferrard et al., 2016]

Recall: $h^{t+1}_x = F(h^t_x, \square_{y \in N(x)} \phi_t(h^t_x, h^t_y, e_{xy}))$

Setting: $l_e \in \mathbb{R}$ (weighted graph)

Main idea: signal filtering based on the Laplacian eigendecomposition $(\Lambda, U)$

$h^t_x \in \mathbb{R}^{K(t)}$ and $F(h^t_x, \,\cdot\,) = \sigma(\,\cdot\,)$; $\square_{y \in N(x)} \phi_t(h^t_x, h^t_y, e_{xy})$ is replaced by

$$\left( \sum_{k'=1}^{K(t)} g_{\theta(k,k')}(L)\, \big(h^t_{1k'}\ \ldots\ h^t_{nk'}\big)^\top \right)_{k=1,\ldots,K(t+1)} \in \mathbb{R}^{n \times K(t+1)}$$

(row $x$ corresponds to the new feature $h^{t+1}_x$), with $g_{\theta(k,k')}(L) \in \mathbb{R}^{n \times n}$, $g_{\theta(k,k')}(L) = U\, g_{\theta(k,k')}(\Lambda)\, U^\top$, and $g_{\theta(k,k')}$ a polynomial (a decomposition on the Chebyshev polynomial basis is used), with the polynomial coefficients $\theta(k,k') \in \mathbb{R}^r$ learned during training.
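In practice, the Chebyshev parametrization avoids computing the eigendecomposition $(\Lambda, U)$ explicitly: $g_\theta(L)X$ is evaluated with the three-term recurrence. A minimal sketch (assuming a rescaled Laplacian with eigenvalues in $[-1, 1]$, dense matrices, and at least two coefficients; not the authors' implementation):

import torch

def cheb_filter(L_hat, X, theta):
    """Apply g_theta(L) X with g_theta a degree-(r-1) Chebyshev polynomial;
    L_hat is the rescaled Laplacian, theta the r learned coefficients."""
    T_prev, T_curr = X, L_hat @ X              # T_0(L)X = X, T_1(L)X = L X
    out = theta[0] * T_prev + theta[1] * T_curr
    for k in range(2, len(theta)):
        T_next = 2 * (L_hat @ T_curr) - T_prev # recurrence T_k = 2 L T_{k-1} - T_{k-2}
        out = out + theta[k] * T_next
        T_prev, T_curr = T_curr, T_next
    return out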
ChebNets [Defferrard et al., 2016] (some explanations)

Why is it message passing? $\left( \sum_{k'=1}^{K(t)} g_{\theta(k,k')}(L) \big(h^t_{1k'}\ \ldots\ h^t_{nk'}\big)^\top \right)_{k=1,\ldots,K(t+1)}$ can be rewritten under the compact form

$$\sum_{y} C^t_{xy}(\theta)\, h^t_y$$

with $C^t_{xy}(\theta) \in \mathbb{R}^{K(t+1) \times K(t)}$: $\big[C^t_{xy}\big]_{kk'} = \big[g_{\theta(k,k')}(L)\big]_{xy}$

Slight difference with the general framework: MP is performed over all nodes (not just neighbors), and the Laplacian is used to provide proximity relations between nodes.
GGNN [Li et al., 2016]

Recall: $h^{t+1}_x = F(h^t_x, \square_{y \in N(x)} \phi_t(h^t_x, h^t_y, e_{xy}))$

Setting: $l_e \in \{A, B, \ldots\}$, discrete (potentially directed)

Main idea: use a GRU (Gated Recurrent Unit [Cho et al., 2014]) inside the original GNN [Scarselli et al., 2009]

$h^t_x \in \mathbb{R}^{K(t)}$, $\square = \sum$ and $\phi_t(h^t_x, h^t_y, e_{xy}) = A_{l_{e_{xy}}} h^t_y$, where $A_{l_{e_{xy}}} \in \mathbb{R}^{K(t+1) \times K(t)}$ is a learned matrix depending on $l_{e_{xy}}$ only.

Writing $a^t_x$ for the aggregated message, the GRU update is:

$z^t_x = \sigma(W^z a^t_x + U^z h^t_x)$ (update gate)
$r^t_x = \sigma(W^r a^t_x + U^r h^t_x)$ (reset gate)
$\tilde{h}^t_x = \tanh(W a^t_x + U (r^t_x \odot h^t_x))$
$h^{t+1}_x = (1 - z^t_x) \odot h^t_x + z^t_x \odot \tilde{h}^t_x$
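A direct transcription of these four equations, as a hedged sketch (one row of h and a per node; the W and U matrices are the learned gate parameters; not the library implementation):

import torch

def gru_node_update(a, h, Wz, Uz, Wr, Ur, W, U):
    """GRU-style state update: a is the aggregated message sum_y A_{l_xy} h_y,
    h the current node states (one row per node)."""
    z = torch.sigmoid(a @ Wz.T + h @ Uz.T)        # update gate z_x
    r = torch.sigmoid(a @ Wr.T + h @ Ur.T)        # reset gate r_x
    h_tilde = torch.tanh(a @ W.T + (r * h) @ U.T) # candidate state
    return (1 - z) * h + z * h_tilde              # convex combination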
GGNN [Li et al., 2016] (with some explanations)

Recall: $h^{t+1}_x = F(h^t_x, \square_{y \in N(x)} \phi_t(h^t_x, h^t_y, e_{xy}))$

$z^t_x = 0$: no update ($h^{t+1}_x = h^t_x$)
$r^t_x = 0$: reset of $h^t_x$ in $\tilde{h}^t_x$ (the candidate state ignores the previous one)

The gate matrices $W$, $U$, $W^z$, $U^z$, $W^r$, $U^r$ and the matrices $A_l$ are learned.
Graph Convolutional Networks (GCN) [Kipf & Welling, 2017]

Recall: $h^{t+1}_x = F(h^t_x, \square_{y \in N(x)} \phi_t(h^t_x, h^t_y, e_{xy}))$

$h^t_x \in \mathbb{R}^{K(t)}$, $\square = \sum$, $F(h^t_x, \,\cdot\,) = \sigma(\,\cdot\,)$, and

$$\phi_t(h^t_x, h^t_y, e_{xy}) = \frac{e_{xy}}{\sqrt{(d_x + 1)(d_y + 1)}}\, h^t_y,$$

where $d_x$ and $d_y$ are the degrees of $x$ and $y$. This step encourages similar predictions among locally connected nodes.

The propagation rule over the entire graph can be expressed as

$$H^{t+1} \leftarrow \sigma\!\left(\tilde{D}^{-\frac{1}{2}} \tilde{A}\, \tilde{D}^{-\frac{1}{2}} H^t W^t\right),$$

where $\tilde{A} = A + I$ is the adjacency matrix of the undirected graph with added self-loops (and $\tilde{D}$ its degree matrix).

This propagation rule is based on a first-order approximation of spectral convolutions on graphs.
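The whole-graph rule translates directly into a few lines; a hedged dense sketch (toy version, not the library implementation):

import torch

def gcn_layer(H, A, W, act=torch.relu):
    """One dense GCN step: sigma(D~^{-1/2} A~ D~^{-1/2} H W) with A~ = A + I."""
    A_tilde = A + torch.eye(A.size(0))            # add self-loops
    d_inv_sqrt = A_tilde.sum(dim=1).pow(-0.5)     # degrees of A~, i.e. d_x + 1
    D_inv_sqrt = torch.diag(d_inv_sqrt)
    return act(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)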
Graph Isomorphism Network (GIN) [Xu et al., 2019]

Recall: $h^{t+1}_x = F(h^t_x, \square_{y \in N(x)} \phi_t(h^t_x, h^t_y, e_{xy}))$

$h^t_x \in \mathbb{R}^{K(t)}$, $\square = \sum$, $F = \mathrm{MLP}^{t+1}$ (a multi-layer perceptron):

$$h^{t+1}_x = \mathrm{MLP}^{t+1}\!\left((1 + \epsilon^t)\, h^t_x + \sum_{y \in N(x)} h^t_y\right)$$

GIN-$\epsilon$: $\epsilon$ is learned by gradient descent; GIN-0: $\epsilon$ is fixed to 0.

GIN is proved to be as powerful as the WL test for distinguishing between different graph structures, while using a simple architecture (an MLP).

Sum aggregation is better than mean and max aggregation in terms of distinguishing graph structures.
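In dense form the update is a one-liner; a hedged sketch (adjacency matrix A, any torch MLP):

import torch

def gin_layer(H, A, mlp, eps=0.0):
    """One dense GIN step: MLP((1 + eps) H + A H),
    i.e. (1 + eps) h_x + the sum over neighbors, row by row."""
    return mlp((1.0 + eps) * H + A @ H)

# e.g. mlp = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU(),
#                                torch.nn.Linear(32, 32))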
Pooling layers

Graph pooling: reduction of the number of nodes in a graph. It helps the GNN discard information that is superfluous for the task and keeps model complexity under control.

DiffPool (Ying et al., 2018): extracts a complex hierarchical structure by performing clustering of the graphs after each MP layer.
Top-K (Gao & Ji, 2019; Lee et al., 2019): learns a projection vector and selects the nodes with the K highest projection values.
MinCut (Bianchi et al., 2020): pooling method that uses spectral clustering and aggregates nodes belonging to the same cluster.

Global pooling: reduction of a graph to a single node (a minimal sketch follows the list):

sum
average
max
SortPool (Zhang et al., 2018): sorts the vertex features in a consistent order (based on WL colors). After sorting, the output tensor is truncated from n to k in order to unify graph sizes.
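As an illustration of the sum and average variants in disjoint mode (a sketch; the batch graph-index vector is introduced with the data modes below):

import torch

def global_pool(x, batch, mode="sum"):
    """Global sum/average pooling in disjoint mode: x holds the node features
    of all graphs, batch[i] is the graph index of node i."""
    n_graphs = int(batch.max()) + 1
    out = torch.zeros(n_graphs, x.size(1)).index_add(0, batch, x)   # sum pooling
    if mode == "average":
        counts = torch.zeros(n_graphs).index_add(0, batch, torch.ones(len(batch)))
        out = out / counts.unsqueeze(1)
    return out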
The Python libraries Spektral and PyTorch Geometric
Basic overview

Spektral [Grattarola and Alippi, 2020]

based on TensorFlow (at least 2.3.1) (easy to install on Ubuntu with pip3, but installation from source is required for the latest version)
GitHub repository https://github.com/danielegrattarola/spektral and detailed documentation https://graphneural.network/ with tutorials
many datasets included: https://graphneural.network/datasets/

PyTorch Geometric [Fey and Lenssen, 2019]

based on PyTorch (a bit harder to install on Ubuntu due to dependencies)
GitHub repository https://github.com/rusty1s/pytorch_geometric and detailed documentation https://pytorch-geometric.readthedocs.io/en/latest/ with examples
many datasets included: https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html
Main available datasets in Spektral and PyTorch Geometric

Citation: Cora, CiteSeer and PubMed citation datasets (node classification)
GraphSAGE: PPI dataset and Reddit dataset containing Reddit posts belonging to different communities (node classification)
QM7, QM9: chemical datasets of molecules (graph regression)
TUDataset: benchmark datasets for graph kernels from TU Dortmund (e.g. MUTAG, ENZYMES, PROTEINS...) (graph classification)

Example in PyTorch Geometric:
dataset = torch_geometric.datasets.TUDataset(root='/tmp/MUTAG', name='MUTAG')

Example in Spektral:
dataset = spektral.datasets.TUDataset('MUTAG')
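Once loaded, a dataset can be inspected graph by graph; a small sketch in PyTorch Geometric (the printed values are the standard MUTAG statistics):

import torch_geometric

dataset = torch_geometric.datasets.TUDataset(root='/tmp/MUTAG', name='MUTAG')
print(len(dataset))                 # number of graphs (188 for MUTAG)
print(dataset.num_classes)          # 2
print(dataset.num_node_features)    # 7
data = dataset[0]                   # one graph: Data(x=[...], edge_index=[2, ...], y=[1])
print(data.num_nodes, data.num_edges, data.y)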
Data modes and mini-batching

Scaling to huge amounts of data: the examples in a mini-batch are grouped into a unified representation that can be processed efficiently in parallel.

Data modes:

single mode: only 1 graph (node classification)
disjoint mode: a set of graphs is represented as a single graph (their disjoint union)
batch mode: the graphs are zero-padded so that they fit into tensors of shape [batch, N, N]
mixed mode: a single graph shared by all examples, with different node attributes
Data modes and mini-batching

Spektral:

single mode: loader = spektral.data.SingleLoader(dataset)
disjoint mode: loader = spektral.data.DisjointLoader(dataset, batch_size=3)
batch mode: loader = spektral.data.BatchLoader(dataset, batch_size=3)

PyTorch Geometric: only uses the disjoint mode

loader = torch_geometric.data.DataLoader(dataset, batch_size=3)
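A quick look at what the disjoint mode produces (a sketch built on the call above):

from torch_geometric.data import DataLoader
from torch_geometric.datasets import TUDataset

dataset = TUDataset(root='/tmp/MUTAG', name='MUTAG')
loader = DataLoader(dataset, batch_size=3, shuffle=True)
for batch in loader:
    # batch.x stacks the node features of the 3 graphs; batch.batch maps
    # each node to its graph index (used later by global pooling layers)
    print(batch.num_graphs, batch.x.shape, batch.batch.shape)
    break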
MP Layers

Spektral

ChebNets: spektral.layers.ChebConv(channels, K)
GGNN: spektral.layers.GatedGraphConv(channels, n_layers)
GCN: spektral.layers.GCNConv(channels)
GIN: spektral.layers.GINConv(channels, epsilon)
(channels: number of output channels)

PyTorch Geometric

ChebNets: torch_geometric.nn.ChebConv(in_channels, out_channels, K)
GGNN: torch_geometric.nn.GatedGraphConv(out_channels, num_layers)
GCN: torch_geometric.nn.GCNConv(in_channels, out_channels)
GIN: torch_geometric.nn.GINConv(nn, eps, train_eps), where nn is a neural network (e.g. torch_geometric.nn.Sequential); see the sketch below
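For GIN in PyTorch Geometric, the nn argument can simply be a plain torch MLP; a hedged sketch (the dimensions are assumptions, 7 input features as for MUTAG on the graph-classification slides):

import torch
from torch_geometric.nn import GINConv

mlp = torch.nn.Sequential(
    torch.nn.Linear(7, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 32),
)
conv = GINConv(mlp, eps=0.0, train_eps=True)   # GIN-eps: eps learned; train_eps=False gives GIN-0
# usage inside forward(): h = conv(x, edge_index)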
Comparison on node classification

Example: Cora (2,708 scientific publications, edges are co-citations, features are word-occurrence descriptors, and seven classes)

Task: starting from an initial set of training nodes with known classes, learn the classes of the other nodes (test set)

Architecture: two MP layers, with ReLU after the first layer, then dropout (50%) before the second layer, softmax after the second layer; the target error is categorical_crossentropy (a sketch follows below).

Learning algorithm: ADAM optimizer, 200 iterations (no early stopping), learning rates and regularization parameters (weight decays) set to the same value (probably)
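For reference, a hedged PyTorch Geometric sketch of this setup (hidden dimension 16 and the exact learning rate/weight decay are assumptions, not the values of the reported runs):

import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

dataset = Planetoid(root='/tmp/Cora', name='Cora')
data = dataset[0]

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, data):
        h = F.relu(self.conv1(data.x, data.edge_index))
        h = F.dropout(h, p=0.5, training=self.training)   # dropout 50% before layer 2
        return F.log_softmax(self.conv2(h, data.edge_index), dim=1)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
model.train()
for epoch in range(200):                                  # 200 iterations, no early stopping
    optimizer.zero_grad()
    out = model(data)
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()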
Comparison on node classification (critical assessment)

very fast: ~4 s for PyTorch Geometric and ~13 s for Spektral on my computer
BUT: setting the different parameters (number of iterations, learning rates, dropout rates, dimension of the hidden layers), in addition to the architecture itself, is very hard

good accuracy: ~80% at every run
BUT: the results of the two libraries are not at all the same!
Comparison on graph classification with PyG

For IMDB-binary, one-hot encodings of the node degrees are used as input features.

Comparison in PyTorch Geometric of:

different MP layers: GCN, GIN0, GIN, CHEB (k=3)
different global pooling layers: average, sum, max, SortPool

Architecture: 4 MP layers of dimension 32, each followed by ReLU, 1 global pooling layer, ReLU, and then softmax. The target error is categorical_crossentropy.

Learning algorithm: ADAM optimizer, 100 iterations. The batch size is 128. Cross-validation with 10 folds is used.
Comparison on graph classification with PyG: results

(results figure from the original slides, not recoverable from the text export)
Comparison on graph classification: critical assessment

I also experimented with graph classification in Spektral; the type of the data in the loaders differs from PyTorch Geometric.

PyTorch Geometric:

data
>>> Batch(batch=[1012], edge_attr=[2244, 4], edge_index=[2, 2244], x=[1012, 7], y=[56])
x, a, e, i = data.x, data.edge_index, data.edge_attr, data.batch

Spektral:

data is a tuple: ((x, a, i), y), or ((x, a, e, i), y) if there are edge features
More difficult to handle the two cases (edge features / no edge features); see the sketch below.
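One hedged workaround for the two Spektral signatures (a sketch, not part of the Spektral API):

def unpack(batch):
    """Absorb both loader signatures: ((x, a, i), y) and ((x, a, e, i), y)."""
    inputs, y = batch
    if len(inputs) == 4:
        x, a, e, i = inputs          # with edge features
    else:
        (x, a, i), e = inputs, None  # without edge features
    return x, a, e, i, y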
That's all for now...
... questions?
References

Bianchi FM, Grattarola D, Livi L, Alippi C (2020) Graph neural networks with convolutional ARMA filters. Preprint arXiv:1901.01343.
Cho K, van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. Preprint arXiv:1406.1078.
Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. Proceedings of NIPS 2016, Barcelona, Spain, 3844-3852.
Fey M, Lenssen JE (2019) Fast graph representation learning with PyTorch Geometric. Proceedings of ICLR 2019 Workshop, New Orleans, LA, USA.
Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chemistry. Proceedings of ICML 2017, Sydney, Australia, PMLR 70.
Grattarola D, Alippi C (2020) Graph neural networks in TensorFlow and Keras with Spektral. Proceedings of the ICML 2020 workshop on Graph Representation Learning and Beyond.
Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. Proceedings of NIPS 2017, Long Beach, CA, USA.
Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. Proceedings of ICLR 2017, Toulon, France.
Klicpera J, Bojchevski A, Günnemann S (2019) Predict then propagate: graph neural networks meet personalized PageRank. Proceedings of ICLR 2019, New Orleans, LA, USA.
References

Li Y, Zemel R, Brockschmidt M, Tarlow D (2016) Gated graph sequence neural networks. Proceedings of ICLR 2016, San Juan, Puerto Rico.
Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2009) The graph neural network model. IEEE Transactions on Neural Networks, 20(1), 61-80.
Thekumparampil KK, Wang C, Oh S, Li LJ (2018) Attention-based graph neural network for semi-supervised learning. Proceedings of ICLR 2018, Vancouver, Canada.
Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks. Proceedings of ICLR 2018, Vancouver, Canada.
Wang Y, Sun Y, Liu Z, Sarma SE, Bronstein MM, Solomon JM (2018) Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics, 38(5), 146. DOI: 10.1145/3326362.
Xu K, Hu W, Leskovec J, Jegelka S (2019) How powerful are graph neural networks? Proceedings of ICLR 2019, New Orleans, LA, USA.