A review on structure learning in GNN
Nathalie Vialaneix, INRAE/MIAT
WG GNN, May 25, 2021
1 / 18
Overview of the structure learning problem
2 / 18
Problem framework
Question: Given a GNN $f_\Theta$ ($\Theta$ are the network parameters learned during training) with input node features $X \in \mathbb{R}^{N \times F}$ and (unknown) graph $G$ with $N$ nodes $i \in \{1, \ldots, N\}$ and adjacency matrix $A \in \{0, 1\}^{N \times N}$,
find the GNN parameters $\Theta$ and learn the graph $G$ so as to minimize
$\sum_{i=1}^{N} \mathrm{Loss}(f_\Theta(x_i, G), y_i)$
Problem: Learning $G$ means learning edges $A_{ii'} \in \{0, 1\}$ between nodes (discrete optimization problem).
3 / 18
Possible routes to a solution
- in what Thomas and Marianne presented: the output of the NN is itself discrete and thus the loss is piecewise constant ⇒ approximation of the loss by a continuous loss to compute a gradient
- in the more general case presented before: unsure what the shape of the loss is, with respect to the graph $A$ ⇒ description of a typology of methods, as proposed by [Zhu et al., 2021]
4 / 18
Short reminder of the general learning process
Rk:
- Iteration between forward pass (prediction and update of the embedding) and backward pass (backpropagation and update of GNN parameters and graph)
- the backward pass is often based on gradient descent, and the gradient with respect to $A^*$ is possibly automatically computed through a standard NN library (almost never fully described)
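A minimal sketch of what "letting a standard NN library compute the gradient with respect to $A^*$" could look like, assuming PyTorch; the sigmoid surrogate `A_param`, the toy two-layer propagation and all shapes are illustrative choices, not taken from any of the cited papers.

```python
import torch
import torch.nn.functional as F

# Toy setup: both the GNN weights and a dense surrogate of the adjacency matrix
# are torch Parameters, so the backward pass updates the graph through autodiff.
N, d_in, n_classes = 100, 16, 3
X = torch.randn(N, d_in)
y = torch.randint(0, n_classes, (N,))

lin1 = torch.nn.Linear(d_in, 32)
lin2 = torch.nn.Linear(32, n_classes)
A_param = torch.nn.Parameter(torch.randn(N, N))   # unconstrained surrogate of A*

optim = torch.optim.Adam(
    list(lin1.parameters()) + list(lin2.parameters()) + [A_param], lr=1e-2)

for epoch in range(200):
    A_soft = torch.sigmoid(A_param)               # soft adjacency with entries in (0, 1)
    A_soft = 0.5 * (A_soft + A_soft.T)            # symmetrize
    H = torch.relu(A_soft @ lin1(X))              # forward pass: one toy propagation step
    logits = A_soft @ lin2(H)
    loss = F.cross_entropy(logits, y)
    optim.zero_grad()
    loss.backward()                               # gradients wrt GNN weights AND A_param
    optim.step()
```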
5 / 18
First class of methods: metric-based learning
6 / 18
General principle
Instead of a graph in $\{0, 1\}^{N \times N}$, a similarity matrix $S^* \in [0, +\infty[^{N \times N}$ (symmetric with null diagonal) is learned using a smoothness assumption on the node labels/embeddings $Z$ wrt the graph: edges correspond to closeness (in some sense) between the corresponding values of $Z$.
- simplest approach: Gaussian weights $s^*_{ii'} = e^{-\gamma \|z_i - z_{i'}\|^2}$
- the final graph is often
  - learned with additional constraints ($\ell_1$ or nuclear norm on $S^*$)
  - combined with the input graph (like in gated NN): $S^{**} = \alpha A + (1 - \alpha) S^*$ for the simplest form
  - post-processed to ensure sparsity (e.g., hard thresholding at $\epsilon$)

Important remark: in this approach, the learned parameters for the graph are related to the similarity $S$ (how to compute it?) and not to the adjacency matrix itself (directly deduced from $S$).
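As an illustration of the bullets above (my own toy code, not taken from any cited paper), a small numpy sketch that builds the Gaussian similarity, mixes it with an input adjacency matrix as $S^{**} = \alpha A + (1-\alpha) S^*$ and hard-thresholds the result; the function name and default values are made up for the example.

```python
import numpy as np

def metric_based_graph(Z, A_input, gamma=1.0, alpha=0.5, eps=0.1):
    """Toy metric-based graph learner: Gaussian similarity on embeddings Z,
    mixed with the input graph, then sparsified by hard thresholding."""
    sq_dists = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)  # ||z_i - z_i'||^2
    S_star = np.exp(-gamma * sq_dists)                                # Gaussian weights
    np.fill_diagonal(S_star, 0.0)                                     # null diagonal
    S_2star = alpha * A_input + (1 - alpha) * S_star                  # mix with input graph
    return np.where(S_2star >= eps, S_2star, 0.0)                     # hard thresholding

Z = np.random.randn(50, 8)                                  # node embeddings (or labels)
A_input = (np.random.rand(50, 50) < 0.05).astype(float)
A_input = np.maximum(A_input, A_input.T)                    # symmetric input graph
A_learned = metric_based_graph(Z, A_input)
```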
7 / 18
An example: [Chen et al., 2020]
The forward step iteratively computes the adjacency matrices and the output of the two GNNs $T$ times ⇒ the loss is the average of the $R$ losses combining graph loss and prediction loss.

1. Graph definition when $\Theta^{(t)}$ is given

The graph at substep $r + 1$ is learned through an adjacency matrix with:
- compute $m$ entries of adjacency matrices between nodes $i$ and $i'$ with $s^{(r+1),l}_{ii'}(v^{(r)}) = \cos\big(w^{(t)}_l \otimes z^{(r)}_i,\; w^{(t)}_l \otimes z^{(r)}_{i'}\big)$ ($v$ is either the input features on the nodes or the prediction of the first GNN as obtained in substep $r$) ⇒ the $w$ are learned parameters (using GD during the backward step)
- average the $s^{l}_{ii'}(v^{(r+1)})$ (three parameters, looks a bit like Gated NN) and threshold values below $\epsilon$: $A^{(r+1)}$. Question: How to compute a gradient here?
- define the adjacency matrix $\tilde{A}^{(r+1)}$ as a mix between $A^{(r+1)}$, $A^{(0)}$ and $A^{(1)}$
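A numpy sketch of the weighted cosine similarity step described above; the number of heads, the shapes and the helper name are illustrative assumptions, and in [Chen et al., 2020] the weight vectors $w_l$ are learned by gradient descent rather than drawn at random as here.

```python
import numpy as np

def weighted_cosine_adjacency(Z, W, eps=0.3):
    """Z: (N, d) node vectors (features or previous-substep embeddings),
    W: (m, d) weight vectors, one per similarity head."""
    m = W.shape[0]
    S = np.zeros((m, Z.shape[0], Z.shape[0]))
    for l in range(m):
        V = W[l] * Z                                          # elementwise reweighting w_l ⊗ z_i
        V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
        S[l] = V @ V.T                                        # cosine similarities for head l
    S_avg = S.mean(axis=0)                                    # average the heads
    return np.where(S_avg > eps, S_avg, 0.0)                  # threshold values below eps

Z = np.random.randn(40, 16)
W = np.random.randn(3, 16)                                    # learned by GD in practice
A_next = weighted_cosine_adjacency(Z, W)
```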
8 / 18
Prediction once the graph is given: two GNNs
$z^{(r+1)}_i = \mathrm{GNN}_1\big(\tilde{A}^{(r+1)}, x_i\big)$
$\hat{y}_i = \mathrm{GNN}_2\big(\tilde{A}^{(r+1)}, z^{(r+1)}_i\big)$
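To make the two-stage prediction concrete, a toy sketch with two single-layer message-passing "GNNs" sharing the learned adjacency $\tilde{A}^{(r+1)}$; the row normalisation and layer sizes are my own choices, not those of [Chen et al., 2020].

```python
import numpy as np

def gnn_layer(A_tilde, H, W):
    """One toy message-passing layer: row-normalised aggregation, linear map, ReLU."""
    deg = A_tilde.sum(axis=1, keepdims=True) + 1e-12
    return np.maximum((A_tilde / deg) @ H @ W, 0.0)

N, d, h, c = 40, 16, 32, 3
A_tilde = np.random.rand(N, N)                 # learned adjacency from the current substep
X = np.random.randn(N, d)
W1, W2 = np.random.randn(d, h), np.random.randn(h, c)

Z_next = gnn_layer(A_tilde, X, W1)             # z_i^{(r+1)} = GNN1(Ã^{(r+1)}, x_i)
y_hat = gnn_layer(A_tilde, Z_next, W2)         # ŷ_i = GNN2(Ã^{(r+1)}, z_i^{(r+1)})
```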
9 / 18
Loss at substep $r + 1$
- $L^{r+1}_{\mathrm{pred}} = \sum_i \mathrm{Loss}(y_i, \hat{y}_i)$
- $L^{r+1}_{\mathrm{graph}}$: loss composed of three terms: adequation to $x_i$ (Laplacian like), $\|\cdot\|^2_F$ (ridge penalty) and a "log barrier" (prevents disconnected graphs and controls sparsity)

Global loss at forward step $t$: $L^t = \sum_r L^r$. The gradient is also the sum of gradients that are back-propagated from each other (I guess).
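The numpy sketch below shows one plausible three-term version of the graph loss (Laplacian smoothness with respect to $x_i$, squared Frobenius norm, log barrier on node degrees); the weights and the exact form may differ from [Chen et al., 2020].

```python
import numpy as np

def graph_loss(A, X, alpha=1.0, beta=0.1, gamma=0.1):
    """Three-term graph regularizer: smoothness wrt x_i, ridge (Frobenius), log barrier."""
    deg = A.sum(axis=1)
    L = np.diag(deg) - A                                   # graph Laplacian
    smooth = alpha * np.trace(X.T @ L @ X)                 # adequation to x_i (Laplacian-like)
    ridge = beta * np.sum(A ** 2)                          # ||A||_F^2
    barrier = -gamma * np.sum(np.log(deg + 1e-12))         # penalizes (near-)disconnected nodes
    return smooth + ridge + barrier

A = np.random.rand(30, 30)
A = 0.5 * (A + A.T)
np.fill_diagonal(A, 0.0)
X = np.random.randn(30, 8)
print(graph_loss(A, X))
```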
10 / 18
Second class of methods: Probabilistic models
11 / 18
General principle
Assume a distribution on $G$ (or equivalently on $w$) and solve the expected problem wrt this distribution.
Ex: [Franceschi et al., ICML 2019]; the learned parameters are the distribution parameters (and not the graph itself).

Optimization over a distribution
Initial problem to solve: $\min_{\Theta(a),\, a} \sum_{\text{nodes}} \mathrm{Loss}(f_{\Theta(a)}(x_i, G), y_i)$ (can contain a regularization term for $\Theta(a)$)

Problem transformation: suppose $a_{ii'} \sim_{\mathrm{iid}} P_\theta = \mathcal{B}(\theta)$ (with unknown $\theta$); then, instead, solve
$\min_{\Theta(a),\, \theta} E_{P_\theta}\big[\mathrm{Loss}(f_{\Theta(a)}(x_i, G), y_i)\big]$

Remarks:
- it does not solve the graph learning problem directly
- but it gives information on the graph by learning $\theta$ (here, not very informative because the model is too simple)
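To fix ideas, here is a toy Monte-Carlo estimate of the expected loss under independent Bernoulli edges $a_{ii'} \sim \mathcal{B}(\theta_{ii'})$; the GNN is replaced by a trivial linear placeholder, so this only illustrates what "optimizing over a distribution" means, not [Franceschi et al.]'s actual bilevel procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def placeholder_loss(A, X, y, Theta):
    """Stand-in for Loss(f_Theta(x_i, G), y_i): a linear model on aggregated features."""
    preds = (A @ X) @ Theta
    return np.mean((preds.ravel() - y) ** 2)

def expected_loss(theta, X, y, Theta, n_samples=100):
    """Monte-Carlo estimate of E_{P_theta}[Loss], sampling a_{ii'} ~ Bernoulli(theta_{ii'})."""
    losses = []
    for _ in range(n_samples):
        A = (rng.random(theta.shape) < theta).astype(float)   # sample one graph
        losses.append(placeholder_loss(A, X, y, Theta))
    return np.mean(losses)

N, d = 30, 5
theta = np.full((N, N), 0.1)                 # distribution parameters (what is actually learned)
X, y = rng.normal(size=(N, d)), rng.normal(size=N)
Theta = rng.normal(size=(d, 1))
print(expected_loss(theta, X, y, Theta))
```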
12 / 18
Learning in practice
Denote $F(\Theta(a), a) = E_{P_\theta}\big[\mathrm{Loss}(f_{\Theta(a)}(x_i, G), y_i)\big]$ and iterate over:
- update of $\Theta$: we need to compute $\nabla_\Theta F$
  - easy: $\nabla_\Theta F$ is computed for a given $a \sim P_\theta$ (SG) and SGD steps are performed $R$ times to update $\Theta$
- update of $a$: we need to compute $\nabla_a F$
  - hard (because $a$ is involved via the expectation over a distribution)
  - magic trick: setting $a = \theta$ ($a$ is set to its expectation), $\nabla_\theta F \simeq E_{P_\theta}\big[\nabla_\Theta F(\Theta(a), a)\,\nabla_a \Theta(a) + \nabla_a F(\Theta(a), a)\big]$, and update $\theta$ with projected gradient descent
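A schematic PyTorch version of this alternation, with a linear placeholder for $f_\Theta$: sampling-based SGD on $\Theta$, then one projected gradient step on $\theta$ evaluated at $a = \theta$. Learning rates, the number of inner steps $R$ and the placeholder model are assumptions, not the authors' settings.

```python
import torch

N, d = 30, 5
X = torch.randn(N, d)
y = torch.randn(N)
Theta = torch.nn.Parameter(torch.randn(d))               # GNN parameters (placeholder)
theta = torch.full((N, N), 0.1, requires_grad=True)      # Bernoulli edge probabilities

opt_Theta = torch.optim.SGD([Theta], lr=1e-2)

def loss_fn(A):
    """Stand-in for sum_i Loss(f_Theta(x_i, G), y_i)."""
    return ((A @ X) @ Theta - y).pow(2).mean()

for outer in range(50):
    # 1) update Theta: R SGD steps, each on a graph sampled from P_theta
    for _ in range(5):                                    # R = 5 inner steps (assumption)
        A = torch.bernoulli(theta.detach())               # a ~ P_theta
        opt_Theta.zero_grad()
        loss_fn(A).backward()
        opt_Theta.step()
    # 2) update theta: set a = theta (its expectation), one projected gradient step
    if theta.grad is not None:
        theta.grad.zero_()
    loss_fn(theta).backward()                             # gradient wrt theta at a = theta
    with torch.no_grad():
        theta -= 1e-2 * theta.grad
        theta.clamp_(0.0, 1.0)                            # projection onto [0, 1]
```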
13 / 18
My concern
A bit rough:
- the distribution is too simple
- averaging over the distribution is probably not really informative
14 / 18
Third class of methods: Direct optimization
15 / 18
Main principles
Learn the graph directly (via the adjacency matrix → optimization over $A$) by targeting:
$L_{\mathrm{graph}}(A) + L_{\mathrm{prediction}}(A)$
- $L_{\mathrm{graph}}$ is usually Laplacian smoothing (minimize differences in values between edges with strong weights): $\frac{1}{2} \sum_{ii'} A^*_{ii'} \|z_i - z_{i'}\|^2 = \mathrm{Tr}\big(Z^\top L_{A^*} Z\big)$
- $L_{\mathrm{graph}}$ often includes penalties to ensure:
  - graph connectivity (ex: log barrier penalty)
  - low rank (ex: nuclear norm penalty)
  - sparsity (ex: $\ell_1$ penalty)
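A quick numerical check (my own code) of the Laplacian smoothing identity used above, $\frac{1}{2}\sum_{ii'} A^*_{ii'}\|z_i - z_{i'}\|^2 = \mathrm{Tr}(Z^\top L_{A^*} Z)$ with $L_{A^*} = D - A^*$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 20, 4
A = rng.random((N, N))
A = 0.5 * (A + A.T)
np.fill_diagonal(A, 0.0)                                   # weighted adjacency A*
Z = rng.normal(size=(N, d))                                # node embeddings

L = np.diag(A.sum(axis=1)) - A                             # Laplacian L_{A*} = D - A*

pairwise = 0.5 * sum(A[i, j] * np.sum((Z[i] - Z[j]) ** 2)
                     for i in range(N) for j in range(N))  # (1/2) Σ A*_{ii'} ||z_i - z_{i'}||²
trace_form = np.trace(Z.T @ L @ Z)                         # Tr(Zᵀ L_{A*} Z)

assert np.isclose(pairwise, trace_form)
```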
16 / 18
An example: [Jin et al., 2020]
- fix the graph $A$ and update $\theta$ by backpropagation of the error $\mathrm{Err}(\mathrm{GNN}(A, \theta)(X, Y))$ (SGD)
- fix the GNN parameters $\theta$ and update the graph: $\arg\min_A \|A - A^{(t-1)}\|^2 + \mathrm{Err}(\mathrm{GNN}(A, \theta)(X, Y)) + \lambda_1 \mathrm{Tr}(X^\top L(A) X) + \lambda_2 \|A\|_1$ (FBS algorithm: alternate between a gradient descent step and a proximal step)

($X$: node features)
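A sketch of one forward-backward splitting (FBS) step for the graph update, assuming the $\lambda_2$ term is an $\ell_1$ penalty handled by its proximal operator (soft-thresholding); the gradient of the prediction error with respect to $A$ is a placeholder, so this illustrates the alternation only, not [Jin et al., 2020]'s implementation.

```python
import numpy as np

def soft_threshold(M, tau):
    """Proximal operator of tau * ||.||_1 (entrywise soft-thresholding)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def fbs_graph_step(A, A_prev, X, grad_err, lam1=0.1, lam2=0.01, lr=0.05):
    """One FBS step on A: gradient step on the smooth terms, proximal step for the l1 term.
    grad_err stands in for the gradient of Err(GNN(A, theta)(X, Y)) with respect to A."""
    # Tr(X^T L(A) X) = 0.5 * sum_{ii'} A_{ii'} ||x_i - x_{i'}||^2, so its gradient
    # with respect to entry (i, i') is 0.5 * ||x_i - x_{i'}||^2
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    smooth_grad = 2.0 * (A - A_prev) + grad_err + 0.5 * lam1 * sq_dists
    A_new = A - lr * smooth_grad                  # forward (gradient) step on the smooth part
    A_new = soft_threshold(A_new, lr * lam2)      # backward (proximal) step for lam2 * ||A||_1
    return np.clip(A_new, 0.0, 1.0)               # keep adjacency entries in [0, 1]

N, d = 25, 6
A = np.full((N, N), 0.1)
A_prev = A.copy()
X = np.random.randn(N, d)
grad_err = np.zeros((N, N))                       # placeholder for the GNN error gradient
A = fbs_graph_step(A, A_prev, X, grad_err)
```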
17 / 18
References
Chen Y, Wu L, Zaki MJ (2020) Iterative deep graph learning for graph neural networks: better and robust node embeddings. NeurIPS.
Franceschi L, Frasconi P, Salzo S, Grazzi R, Pontil M (2018) Bilevel programming for hyperparameter optimization and meta-learning. ICML.
Jin W, Ma Y, Liu X, Tang X, Wang S, Tang J (2020) Graph structure learning for robust graph neural networks. KDD.
Zhu Y, Xu W, Zhang J, Liu Q, Wu S, Wang L (2021) Deep graph structure learning for robust representations: a survey. arXiv preprint arXiv:2103.03036v1.
18 / 18
