From RNN to neural networks for cyclic undirected graphs
Nathalie Vialaneix, INRAE/MIAT
WG GNN, May 7th, 2020
1 / 32
To start: a brief overview of this talk...
2 / 32
Topic
(What is this presentation about?)
How to use (deep) NN for processing relational (graph) data?
I will first describe Recurrent Neural Networks (RNN) and their limits for
processing graphs
Then, I will present two alternatives able to address these limits
I will try to stay non-technical (so potentially too vague) to focus on the
most important take-home messages
3 / 32
Description of the purpose of the methods
Data: a graph $G$, with a set of vertices $V = \{x_1, \ldots, x_n\}$ with labels
$l(x_i) \in \mathbb{R}^p$, and a set of edges $E = (e_j)_{j=1,\ldots,m}$ that can also be
labelled $l(e_j) \in \mathbb{R}^q$. The graph can be directed or undirected, with or without
cycles.
Purpose: find a method (a neural network $\phi_w$ with weights $w$) that is able
to process these data (using the information about the relations / edges
between vertices) to obtain:
a prediction $\phi_w(x_i)$ for every node in the graph,
or a prediction $\phi_w(G)$ for the graph itself,
learning dataset: a collection of graphs or a graph (that can be
disconnected) associated to predictions $y(G)$ or $y_i$
4 / 32
The basis of the work: RNN for structured data
5 / 32
Framework
Reference: [Sperduti & Starita, 1997]
basic description of standard RNN
adaptations to deal with directed acyclic graphs (DAG)
output is obtained at the graph level ($\phi_w(G)$)
The article also mentions ways to deal with cycles and other types of learning
than the standard back-propagation that I'll describe
6 / 32
From standard neuron to recurrent neuron
standard neuron
$$o = f\left(\sum_{j=1}^{r} w_j v_j\right)$$
where the $v_j$ are the inputs of the neuron (often: the neurons in the previous
layer).
7 / 32
From standard neuron to recurrent neuron
recurrent neuron
$$o(t) = f\left(\sum_{j=1}^{r} w_j v_j + w_S\, o(t-1)\right)$$
where $w_S$ is the self weight.
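To make the difference concrete, here is a minimal numerical sketch of both neuron types (an illustrative toy in Python/NumPy, not code from the paper; the function names, the choice of tanh for $f$, and the convention that a new input vector arrives at each time step are my own assumptions).

```python
import numpy as np

def standard_neuron(w, v, f=np.tanh):
    # o = f(sum_j w_j v_j): depends only on the current inputs
    return f(np.dot(w, v))

def recurrent_neuron(w, w_self, v_sequence, f=np.tanh):
    # o(t) = f(sum_j w_j v_j + w_S * o(t-1)): the previous output is fed back
    o = 0.0
    for v in v_sequence:          # inputs presented at t = 1, 2, ...
        o = f(np.dot(w, v) + w_self * o)
    return o

rng = np.random.default_rng(0)
w = rng.normal(size=3)
print(standard_neuron(w, rng.normal(size=3)))
print(recurrent_neuron(w, w_self=0.5, v_sequence=rng.normal(size=(4, 3))))
```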
8 / 32
Using that type of recurrent neuron for DAG encoding
(for a DAG with a supersource, here $x_5$)
$$o(x_i) = f\left(\sum_{j=1}^{p} w_j\, l_j(x_i) + \sum_{x_i \to x_{i'}} \hat{w}_{n(i')}\, o(x_{i'})\right)$$
where $n(i')$ is the position of the vertex $x_{i'}$ within the children of $x_i$ (it means
that the DAG is a positional DAG).
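The formula translates into a recursion over the DAG; the sketch below is a toy Python illustration of it (the child-list representation, the memoisation, and the use of tanh for $f$ are my own choices, not details from [Sperduti & Starita, 1997]).

```python
import numpy as np

def encode_dag(children, labels, w, w_hat, f=np.tanh):
    """Compute o(x_i) for every node of a positional DAG.

    children[i] : ordered list of the children of node i (positions matter)
    labels[i]   : label vector l(x_i), shape (p,)
    w           : weights for the label part, shape (p,)
    w_hat       : weights hat{w}_1, hat{w}_2, ... indexed by child position
    """
    memo = {}

    def o(i):
        if i in memo:                      # each node is encoded only once
            return memo[i]
        s = np.dot(w, labels[i])           # sum_j w_j l_j(x_i)
        for pos, child in enumerate(children[i]):
            s += w_hat[pos] * o(child)     # hat{w}_{n(i')} o(x_{i'}), children first
        memo[i] = f(s)
        return memo[i]

    return {i: o(i) for i in children}

# Tiny example: node 0 is the supersource; node 3 is shared by two parents.
children = {0: [1, 2], 1: [3], 2: [3], 3: []}
labels = {i: np.full(2, float(i)) for i in children}
out = encode_dag(children, labels, w=np.array([0.1, -0.2]), w_hat=np.array([0.5, 0.3]))
print(out[0])    # the encoding at the supersource plays the role of phi_w(G)
```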
9 / 32
Using that type of recurrent neuron for DAG encoding
(for a DAG with a supersource, here $x_5$)
We have:
$$o(x_8) = f\left(\sum_{j=1}^{p} w_j\, l_j(x_8)\right)$$
(and similarly for $x_7$ and $x_2$)
10 / 32
Using that type of recurrent neuron for DAG encoding
(for a DAG with a supersource, here $x_5$)
We have:
$$o(x_9) = f\left(\sum_{j=1}^{p} w_j\, l_j(x_9) + \hat{w}_1\, o(x_8)\right)$$
(and similarly for $x_3$)
11 / 32
Using that type of recurrent neuron for DAG encoding
(for a DAG with a supersource, here $x_5$)
We have:
$$o(x_{10}) = f\left(\sum_{j=1}^{p} w_j\, l_j(x_{10}) + \hat{w}_1\, o(x_3)\right)$$
12 / 32
Using that type of recurrent neuron for DAG encoding
(for a DAG with a supersource, here $x_5$)
We have:
$$o(x_{11}) = f\left(\sum_{j=1}^{p} w_j\, l_j(x_{11}) + \hat{w}_1\, o(x_2) + \hat{w}_2\, o(x_7) + \hat{w}_3\, o(x_9) + \hat{w}_4\, o(x_{10})\right)$$
13 / 32
Using that type of recurrent neuron for DAG encoding
(for a DAG with a supersource, here $x_5$)
We have:
$$o(x_5) = f\left(\sum_{j=1}^{p} w_j\, l_j(x_5) + \hat{w}_1\, o(x_{11})\right)$$
14 / 32
Using that type of recurrent neuron for DAG encoding
Learning can be performed by back-propagation:
for a given set of weights $(w, \hat{w})$, recursively compute the outputs on the
graph structure
reciprocally, compute the gradient from the output, recursively on the
graph structure
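As an illustration of this idea (and not the original 1997 implementation), one can let an automatic-differentiation library unroll the same recursion; the sketch below assumes PyTorch, reuses the toy `children`/`labels` layout from the previous example, and puts a squared-error loss on the supersource output.

```python
import torch

def encode_dag_torch(children, labels, w, w_hat):
    memo = {}
    def o(i):
        if i in memo:
            return memo[i]
        s = torch.dot(w, labels[i])
        for pos, child in enumerate(children[i]):
            s = s + w_hat[pos] * o(child)
        memo[i] = torch.tanh(s)
        return memo[i]
    return o

children = {0: [1, 2], 1: [3], 2: [3], 3: []}
labels = {i: torch.full((2,), float(i)) for i in children}
w = torch.randn(2, requires_grad=True)
w_hat = torch.randn(2, requires_grad=True)

loss = (encode_dag_torch(children, labels, w, w_hat)(0) - 0.7) ** 2
loss.backward()                  # gradients flow back along the DAG structure
print(w.grad, w_hat.grad)
```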
15 / 32
Generalization: cascade correlation for networks
Idea: make several layers of outputs $o^1(x), \ldots, o^r(x)$ such that $o^l(x)$ depends
on $l(x)$ and $(o^{l'}(x'))_{x \to x',\, l' \leq l}$ (as for the previous case) and also on
$(o^{l'}(x))_{l' < l}$ (but these values are "frozen").
16 / 32
Main limits
Since the approach explicitly relies on the DAG order to successively
compute the output of the nodes, it is not adapted to undirected or cyclic
graphs
Also, the positional assumption on the neighbors of a given node (that an
objective "order" exists between neighbors) is not easily met in real-world
applications
Can only compute predictions for graphs (not for nodes)
Note: The method is tested (in this paper) on logic problems (not described
here)
17 / 32
A first approach using contraction maps by Scarselli et al., 2009
18 / 32
Overview of the method
is able to deal with undirected and cyclic graphs
does not require a positional assumption on the neighbors of a given
node
can be used to make a prediction at the graph and node levels
Main idea: use a "time"-dependent update of the neurons and use restrictions
on the weights to constrain the NN to be a contraction map so that the fixed
point theorem can be applied
19 / 32
Basic neuron equations
For each node $x_i$, we define:
a neuron value expressed as:
$$v_i = f_w\left(l(x_i),\ \{l(x_i, x_u)\}_{x_u \in \mathcal{N}(x_i)},\ \{v_u\}_{x_u \in \mathcal{N}(x_i)},\ \{l(x_u)\}_{x_u \in \mathcal{N}(x_i)}\right)$$
an output value obtained from this neuron value as:
$$o_i = g_w(v_i, l(x_i))$$
(that can be combined into a graph output value if needed)
20 / 32
Basic neuron equations
For each node $x_i$, we define:
a neuron value expressed as:
$$v_i = f_w\left(l(x_i),\ \{l(x_i, x_u)\}_{x_u \in \mathcal{N}(x_i)},\ \{v_u\}_{x_u \in \mathcal{N}(x_i)},\ \{l(x_u)\}_{x_u \in \mathcal{N}(x_i)}\right)$$
an output value obtained from this neuron value as:
$$o_i = g_w(v_i, l(x_i))$$
(that can be combined into a graph output value if needed)
In a compressed version, this gives: $V = F_w(V, l)$ and $O = G_w(V, l)$.
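To fix ideas, the two equations can be sketched as plain functions; the Python toy below is my own stand-in (single tanh layers, arbitrary dimensions, sum aggregation over the neighborhood), whereas the actual $f_w$ and $g_w$ of the paper are described on the next slides.

```python
import numpy as np

P, Q, D = 3, 2, 4     # node-label, edge-label and state dimensions (arbitrary)

def f_w(label_i, edge_labels, neighbor_states, neighbor_labels, W):
    """v_i = f_w(l(x_i), {l(x_i,x_u)}, {v_u}, {l(x_u)}): toy single-layer version."""
    agg = np.zeros(Q + D + P)
    for e, v, l in zip(edge_labels, neighbor_states, neighbor_labels):
        agg += np.concatenate([e, v, l])            # aggregate the neighborhood
    return np.tanh(W @ np.concatenate([label_i, agg]))

def g_w(v_i, label_i, U):
    """o_i = g_w(v_i, l(x_i))"""
    return np.tanh(U @ np.concatenate([v_i, label_i]))

rng = np.random.default_rng(1)
W = rng.normal(size=(D, P + Q + D + P))
U = rng.normal(size=(1, D + P))
v_i = f_w(rng.normal(size=P), [rng.normal(size=Q)], [np.zeros(D)],
          [rng.normal(size=P)], W)
print(g_w(v_i, rng.normal(size=P), U))
```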
21 / 32
Making the process recurrent...
The neuron value is made "time" dependent with:
$$v_i^t = f_w\left(l(x_i),\ \{l(x_i, x_u)\}_{x_u \in \mathcal{N}(x_i)},\ \{v_u^{t-1}\}_{x_u \in \mathcal{N}(x_i)},\ \{l(x_u)\}_{x_u \in \mathcal{N}(x_i)}\right)$$
Equivalently, $V^{t+1} = F_w(V^t, l)$, so, provided that $F_w$ is a contraction map,
$(V^t)_t$ converges to a fixed point (a sufficient condition is that the norm of
$\nabla_V F_w(V, l)$ is bounded by $\mu < 1$).
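In practice this gives a simple synchronous iteration that is stopped when the states no longer move. Below is a schematic Python loop (the tolerance, the maximum number of iterations, and the linear-plus-tanh update used in the example are arbitrary choices of mine, just meant to show convergence of a contraction).

```python
import numpy as np

def iterate_to_fixed_point(update, V0, tol=1e-6, max_iter=200):
    """Iterate V^{t+1} = F_w(V^t, l) until ||V^{t+1} - V^t|| < tol.

    `update` maps the full state matrix V (one row per node) to the next one;
    if it is a contraction, the sequence converges to the unique fixed point."""
    V = V0
    for t in range(max_iter):
        V_next = update(V)
        if np.linalg.norm(V_next - V) < tol:
            return V_next, t + 1
        V = V_next
    return V, max_iter

# Example with a small-weight update (hence a contraction): 5 nodes, state dim 4.
rng = np.random.default_rng(2)
A = 0.1 * rng.normal(size=(4, 4))
b = rng.normal(size=(5, 4))
V_star, n_iter = iterate_to_fixed_point(lambda V: np.tanh(V @ A + b), np.zeros((5, 4)))
print(n_iter)
```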
22 / 32
What are $f_w$ and $g_w$?
$g_w$ is a fully-connected MLP
$f_w$ is decomposed into
$$v_i = f_w(l(x_i), \ldots) = \sum_{x_u \in \mathcal{N}(x_i)} h_w\left(l(x_i), l(x_i, x_u), v_u, l(x_u)\right)$$
and $h_w$ is trained as a 1-hidden-layer MLP.
Rk: another version is provided in which $h_w$ is obtained as a linear function in
which the intercept and the slope are estimated by MLP.
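This neighbor-wise decomposition is easy to write down explicitly; the sketch below replaces the placeholder $f_w$ of the earlier toy with the sum-over-neighbors form (the dimensions, the single hidden layer with tanh, and the random weights are again my own illustrative choices).

```python
import numpy as np

P, Q, D, H = 3, 2, 4, 8      # label dim, edge-label dim, state dim, hidden units

def h_w(label_i, edge_label, v_u, label_u, W1, W2):
    # One-hidden-layer MLP applied to a single (node, neighbor) pair.
    z = np.concatenate([label_i, edge_label, v_u, label_u])
    return W2 @ np.tanh(W1 @ z)

def f_w(label_i, edge_labels, neighbor_states, neighbor_labels, W1, W2):
    # v_i = sum over x_u in N(x_i) of h_w(l(x_i), l(x_i,x_u), v_u, l(x_u))
    return sum((h_w(label_i, e, v, l, W1, W2)
                for e, v, l in zip(edge_labels, neighbor_states, neighbor_labels)),
               np.zeros(D))

rng = np.random.default_rng(3)
W1 = rng.normal(size=(H, P + Q + D + P))
W2 = rng.normal(size=(D, H))
v_i = f_w(rng.normal(size=P), [rng.normal(size=Q)], [np.zeros(D)],
          [rng.normal(size=P)], W1, W2)
print(v_i.shape)    # (D,)
```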
23 / 32
Training of the weights
The weights of the two MLPs are trained by the minimization of
$$\sum_{i=1}^{n} \left(y_i - g_w(v_i^T)\right)^2$$
but, to ensure that the resulting $F_w$ is a contraction map, the weights of $F_w$ are
penalized during the training:
$$\sum_{i=1}^{n} \left(y_i - g_w(v_i^T)\right)^2 + \beta\, L\left(|\nabla_V F_w|\right)$$
with $L(u) = u - \mu$ for a given $\mu \in\, ]0, 1[$ and $\beta > 0$.
The training is performed by gradient descent where the gradient is obtained
by back-propagation.
BP is simplified using the fact that $(v_i^t)_t$ tends to a fixed point.
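Put schematically, the training criterion looks like the sketch below (a hypothetical PyTorch-style fragment: the node outputs, the estimate of $|\nabla_V F_w|$, and the decision to apply the penalty only when the norm exceeds $\mu$ are placeholders/assumptions of mine, not the paper's exact implementation, which back-propagates through the fixed point more carefully).

```python
import torch

def penalized_loss(outputs, targets, grad_norm, mu=0.9, beta=1e-2):
    """Sum of squared errors plus beta * L(|grad_V F_w|) with L(u) = u - mu.

    The penalty is only applied when the norm exceeds mu (my reading: no
    penalty as long as the contraction condition already holds)."""
    fit = torch.sum((targets - outputs) ** 2)
    penalty = torch.clamp(grad_norm - mu, min=0.0)
    return fit + beta * penalty

# Toy usage with made-up numbers.
outputs = torch.tensor([0.2, 0.8, 0.5])     # g_w(v_i^T) for three nodes
targets = torch.tensor([0.0, 1.0, 0.5])     # y_i
grad_norm = torch.tensor(0.95)              # estimate of |grad_V F_w|
print(penalized_loss(outputs, targets, grad_norm))
```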
24 / 32
Applications
The method is illustrated on different types of problems:
the subgraph matching problem (finding a subgraph matching a target
graph in a large graph) in which the prediction is made at the node level
(does it belong to the subgraph or not?)
recovering the mutagenic compounds among nitroaromatic compounds
(molecules used as intermediate by-products in many industrial
reactions); compounds are described by the molecule graph with
(qualitative and numerical) information attached to the nodes
web page ranking in which the purpose is to predict a Google PageRank-
derived measure from a network of 5,000 web pages
25 / 32
A second approach using a constructive architecture by Micheli, 2009
26 / 32
Overview of the method
is able to deal with undirected and cyclic graphs (but no labels on the
edges)
does not require a positional assumption on the neighbors of a given
node
can be used to make a prediction at the graph and node levels (probably,
though it is made explicit only for the graph level)
Main idea: define an architecture close to a "cascade correlation network" with
some "frozen" neurons that are not updated. The architecture is hierarchical
and adaptive, in the sense that it stops growing when a given accuracy is
achieved.
27 / 32
Neuron equations
Similarly to before, neurons are computed in a recurrent way that depends on
"time". The neuron state at time $t$ for vertex $x_i$ depends on its label and on
the neuron states of the neighboring neurons at all past times:
$$v_i^t = f\left(\sum_{j} w_j^t\, l_j(x_i) + \sum_{t' < t} \hat{w}^{t t'} \sum_{x_u \in \mathcal{N}(x_i)} v_u^{t'}\right)$$
Rk:
a stationarity assumption (the weights do not depend on the node nor on
the edge) is critical to obtain a simple enough formulation
contrary to RNN or to the previous version, the $(v_u^{t'})_{t' < t}$ are not updated: the
layers are trained one at a time and, once the training is finished, the
neuron states are considered "frozen" (which is a way to avoid problems
with cycles)
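A schematic Python rendering of this layer-wise update (my own toy: adjacency lists, scalar states, random weights; in the real method the weights of a layer are trained before the next layer is added, which this sketch does not do):

```python
import numpy as np

def add_layer(t, adj, labels, states, w_t, w_hat_t, f=np.tanh):
    """Compute the new (frozen) states v_i^t for every node i.

    adj[i]    : neighbors of node i (undirected graph, cycles allowed)
    labels[i] : label vector l(x_i)
    states    : previous layers, states[t'][i] = v_i^{t'} (already frozen)
    w_t       : weights on the label, shape (p,)
    w_hat_t   : weights hat{w}^{t t'} for each previous layer t' < t
    """
    new = {}
    for i in adj:
        s = np.dot(w_t, labels[i])                        # sum_j w_j^t l_j(x_i)
        for t_prev in range(t):                           # sum over t' < t ...
            s += w_hat_t[t_prev] * sum(states[t_prev][u]  # ... and over neighbors
                                       for u in adj[i])
        new[i] = f(s)
    return new

# Toy cyclic, undirected graph: a triangle.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
labels = {i: np.array([1.0, float(i)]) for i in adj}
states = []
rng = np.random.default_rng(4)
for t in range(3):               # grow three layers; earlier layers stay frozen
    states.append(add_layer(t, adj, labels, states,
                            rng.normal(size=2), rng.normal(size=max(t, 1))))
print(states[-1])
```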
28 / 32
Combining neuron outputs into a prediction
output of layer $t$: $\phi_w^t(G) = \frac{1}{C} \sum_{i=1}^{n} v_i^t$, where $C$ is a normalization factor
(equal to 1 or to the number of nodes for instance)
output of the network: $\Phi_w(G) = f\left(\sum_t w^t\, \phi_w^t(G)\right)$
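Continuing the same toy (the choice $C =$ number of nodes and the readout weights are mine), the graph-level prediction is just a weighted combination of the per-layer averages:

```python
import numpy as np

def graph_output(states, w_out, f=np.tanh):
    """Phi_w(G) = f(sum_t w^t * phi_w^t(G)), with phi_w^t(G) = (1/C) sum_i v_i^t."""
    phis = [np.mean(list(layer.values())) for layer in states]   # here C = n nodes
    return f(np.dot(w_out, phis))

# Usage with the `states` list built in the previous sketch (three layers):
# print(graph_output(states, w_out=np.array([0.5, -0.3, 0.8])))
```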
29 / 32
Training
Training is also performed by minimization of the squared error but:
no constraint is needed on the weights
back-propagation is not performed through unfolded layers
Examples
QSPR/QSAR task that consists in transforming information on molecular
structure into information on chemical properties. Here: prediction of the
boiling point value
classification of cyclic/acyclic graphs
30 / 32
That's all for now...
... questions?
31 / 32
References
Micheli A (2009) Neural networks for graphs: a contextual constructive approach. IEEE
Transactions on Neural Networks, 20(3): 498-511
Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2009) The graph neural network
model. IEEE Transactions on Neural Networks, 20(1): 61-80
Sperduti A, Starita A (1997) Supervised neural network for the classification of structures.
IEEE Transactions on Neural Networks, 8(3): 714-735
32 / 32