Graph Convolutional Networks in Apache Spark

Intelligent Workflow Automati
Scalacon
Emiliano Martínez
November 2021
BBVA Innovation Labs

About Me:
Programming in Scala for 10 years
Akka development
Functional domain models with FP libraries Cats, Scalaz
Big Data: Spark, Kafka, Cassandra, NoSQL, ...
Machine Learning: Spark ML, Sklearn, Analytics Zoo, TensorFlow, Torch
Currently, I do NLP at BBVA

Deep Learning Models
- A deep learning model is a machine learning model based on neural networks that tries to mimic the structure and function of the human brain.
- Supervised machine learning method.
- The goal is to approximate a function that maps an input x to a category y by adjusting the values of the parameters Θ: y = f(x; Θ)

Deep Learning Models
- Automatic speech recognition. The sound wave generated by a human voice is broken down into phonemes. Each phoneme is like a link in a chain; by analyzing them in sequence, starting from the first phoneme, the ASR software uses statistical probability analysis to deduce whole words and, from there, complete sentences.
- Image recognition. Based on CNNs. Automatically identifies objects, people, and places in images. It is used in guiding robots, autonomous vehicles, driver assistance systems, etc.
- Drug discovery. Using graph convolutional networks.
- Natural language processing. A set of machine learning models and techniques to process natural language data.

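Neural Networks
[Figure: fully connected network with an input layer, hidden layers, and an output layer; every layer has W + b parameters.]
- Input: training samples with n features, $x \in \mathbb{R}^{n}$
- Output label vector: $y \in \mathbb{R}^{k}$
- Prediction vector: $\hat{y} \in \mathbb{R}^{k}$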

Layer Feed Forward Equations
$z^{l} = W^{l} a^{l-1} + b^{l}$
$a^{l} = \sigma(z^{l})$
Activation functions:
- Sigmoid: $g(z) = \frac{1}{1 + e^{-z}}$
- ReLU: $g(z) = \max(0, z)$
- Tanh: $g(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
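The feed-forward equations can be sketched in a few lines of plain Scala. This is only an illustration with no Spark or Analytics Zoo dependency; the object name `FeedForward` and its helpers are hypothetical and do not come from the talk.

```scala
object FeedForward {
  type Vec = Array[Double]
  type Mat = Array[Array[Double]] // one inner array (row) per output unit

  // The three activation functions listed on the slide.
  val sigmoid: Double => Double = z => 1.0 / (1.0 + math.exp(-z))
  val relu: Double => Double = z => math.max(0.0, z)
  val tanh: Double => Double = z => math.tanh(z)

  // Computes z = W a_prev + b and then applies g element-wise to get the activation.
  def layer(w: Mat, b: Vec, g: Double => Double)(aPrev: Vec): Vec =
    w.zip(b).map { case (row, bias) =>
      val z = row.zip(aPrev).map { case (wij, aj) => wij * aj }.sum + bias
      g(z)
    }
}
```

For example, `FeedForward.layer(Array(Array(1.0, -1.0)), Array(0.0), FeedForward.relu)(Array(2.0, 3.0))` computes z = 2 - 3 = -1 and returns a single activation of 0.0, since ReLU clips negative values.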

Training Equations
Forward step
- Layer feed forward: $H^{i} = \sigma(W^{i} H^{i-1} + b^{i})$
- Loss function: $CE(P, Q) = -\mathbb{E}_{x \sim P}[\log q(x)]$
Backward step
- Gradient calculation
- Weights update
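As a rough illustration of the loss and the weights update, the sketch below computes the cross entropy between a target distribution P and a predicted distribution Q, and applies one plain gradient-descent update. The slide does not state the update rule, so `sgdStep` is an assumption, and all names here are hypothetical.

```scala
object Loss {
  // CE(P, Q) = -sum_x p(x) * log q(x); terms with p(x) = 0 contribute nothing.
  def crossEntropy(p: Array[Double], q: Array[Double]): Double = {
    require(p.length == q.length, "distributions must have the same length")
    -p.zip(q).collect { case (px, qx) if px > 0.0 => px * math.log(qx) }.sum
  }

  // Assumed update rule (plain gradient descent): w <- w - lr * grad.
  def sgdStep(w: Array[Double], grad: Array[Double], lr: Double): Array[Double] =
    w.zip(grad).map { case (wi, gi) => wi - lr * gi }
}
```

With a one-hot target, the cross entropy reduces to the negative log of the probability assigned to the correct class: for P = (0, 1, 0) and Q = (0.25, 0.5, 0.25) it equals log 2.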

Graphs
- A graph is described by a set of vertices V and a set of edges E: G = (V, E)
- Graphs hold data that cannot be represented in a Euclidean space.
- Input training samples can be represented as nodes of a graph. They are defined by their properties and by their connections with other nodes.
- Graphs can be cyclic, acyclic, weighted, ...

Graph Examples
[Figure: caffeine molecule. Image taken from Wikipedia.]
[Figure: social network graph. Image taken from Wikimedia.]

Graph Neural Networks
- A type of NN that operates directly on graph structures.
- It can be used for node classification tasks.
- Different approaches can be used depending on the case:
a. Inductive: GraphSAGE, ...
b. Transductive: Spectral graph convolutions, DeepWalk, ...

Convolutions
[Figure: 3×3 matrices from the slide: (0 1 1; 0 0 0; 1 1 1), (1 0 1; 0 1 1; 1 0 0), (1 -1 0; 0 -1 0; 1 1 1)]
Some CNN improvements:
- Filters are applied to detect details of the images.
- Fewer parameters than the fully connected layer model.
- Pixel positions and neighborhoods have semantic meaning.
- Element-wise multiplication between the filter-sized patch of the input and the filter, which is then summed, always resulting in a single value.
- Translation invariance.
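The "multiply element-wise, then sum" step can be sketched as follows. This is a minimal, unoptimized illustration with stride 1 and no padding; the object name `Conv2D` and the choice of which matrix acts as patch and which as filter are assumptions made for the example.

```scala
object Conv2D {
  type Mat = Array[Array[Int]]

  // Slides the kernel over the input (stride 1, no padding). Each output cell is
  // the sum of the element-wise product between the kernel and the patch under it.
  def convolve(input: Mat, kernel: Mat): Mat = {
    val (kh, kw) = (kernel.length, kernel.head.length)
    val outH = input.length - kh + 1
    val outW = input.head.length - kw + 1
    Array.tabulate(outH, outW) { (i, j) =>
      (for (a <- 0 until kh; b <- 0 until kw) yield input(i + a)(j + b) * kernel(a)(b)).sum
    }
  }

  // Illustrative values: a 3x3 patch and a 3x3 filter produce a single number.
  val patch: Mat = Array(Array(0, 1, 1), Array(0, 0, 0), Array(1, 1, 1))
  val filter: Mat = Array(Array(1, -1, 0), Array(0, -1, 0), Array(1, 1, 1))
}
```

Here `Conv2D.convolve(Conv2D.patch, Conv2D.filter)` yields a 1×1 result holding 2: the first row contributes -1, the second 0, and the third 3.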

Graph Convolutional Networks
[Figure: taxonomy of graph convolutional networks]
Models:
- Spectral-based models. Representation in the Fourier domain: eigendecomposition of the graph Laplacian matrix.
- Spatial-based models: classic CNN models, propagation models, general frameworks.
Applications:
- Computer vision: images, videos, point clouds, meshes
- NLP
- Science: physics, chemistry
- Social

GCN Intuition
[Figure: example graph with 5 nodes and its 5×5 adjacency matrix]
- X = N x F: N nodes by F features per node.
- A: N x N graph adjacency matrix.
- Layer rule: $H^{i} = f(H^{i-1}, A)$. For the first layer: $H^{1} = f(X, A)$.
- Pipeline: X → Convolution 1, dot(A, X) → Dense → H1 → Convolution 2, dot(A, H1) → Dense → H2 → SoftMax
Graph Analysis models
- Fast Spectral Graph Convolution. Kipf & Welling (2017)
$H^{(l+1)} = \sigma\left(D^{-1/2} \hat{A} D^{-1/2} H^{(l)} W^{(l)}\right)$
● Semi-supervised learning method.
● Simplification of Spectral Graph Analysis.
● 2-layer GCN.
● Two-hop neighborhood.
$D$ is the degree matrix.
$\hat{A}$ is the adjacency matrix plus the identity matrix.
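To make the normalization concrete, here is a minimal dense-matrix sketch. It assumes that D is computed from the adjacency matrix after the identity has been added, as in Kipf & Welling. The talk's implementation uses sparse Breeze matrices instead, and the names below are hypothetical.

```scala
object Normalize {
  type Mat = Array[Array[Double]]

  // Adjacency matrix plus the identity matrix (adds a self-loop to every node).
  def addSelfLoops(a: Mat): Mat =
    Array.tabulate(a.length, a.length)((i, j) => a(i)(j) + (if (i == j) 1.0 else 0.0))

  // Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, with D_ii = sum_j (A + I)_ij.
  def normalize(a: Mat): Mat = {
    val aHat = addSelfLoops(a)
    val dInvSqrt = aHat.map(row => 1.0 / math.sqrt(row.sum))
    Array.tabulate(a.length, a.length)((i, j) => dInvSqrt(i) * aHat(i)(j) * dInvSqrt(j))
  }
}
```

For two connected nodes, `Normalize.normalize(Array(Array(0.0, 1.0), Array(1.0, 0.0)))` gives a matrix whose four entries are all approximately 0.5. Because of the self-loops, every degree is at least 1 for a non-negative adjacency matrix, so the inverse square root is always defined.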

Laplacian Eigendecomposition
- Spectral graph convolution (Defferrard et al., 2016).
- Truncated expansion in terms of Chebyshev polynomials (Defferrard et al., 2016).
- First-order approximation of spectral graph convolutions (Kipf & Welling, 2016), with $K = 1$, $\theta_0 = 2$, and $\theta_1 = -1$.
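The equations for this slide did not survive extraction. The following is a hedged reconstruction of how the stated values can lead to the GCN propagation rule. It assumes that the first-order polynomial is applied to the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$; the slide may have used a rescaled Laplacian with different constants.

```latex
% Truncated Chebyshev expansion (Defferrard et al., 2016)
g_\theta \star x \approx \sum_{k=0}^{K} \theta_k \, T_k(L) \, x,
\qquad T_0(y) = 1, \quad T_1(y) = y

% Assumption: K = 1, \theta_0 = 2, \theta_1 = -1, L = I - D^{-1/2} A D^{-1/2}
g_\theta \star x \approx \theta_0 \, x + \theta_1 \, L \, x
  = 2x - \left(I - D^{-1/2} A D^{-1/2}\right) x
  = \left(I + D^{-1/2} A D^{-1/2}\right) x
```

The renormalization trick of Kipf & Welling then replaces $I + D^{-1/2} A D^{-1/2}$ with $D^{-1/2} \hat{A} D^{-1/2}$, where $\hat{A} = A + I$ and $D$ is recomputed from $\hat{A}$.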

Frameworks used for implementation
- Apache Spark 3.1.2
1. GraphX to get the connected components.
2. Spark ML to transform the dataframes corresponding to graphs.
3. Spark Core to create the RDD[Sample] and to partition the dataset depending on the graph's components.
- Breeze 1.0
1. Create the adjacency matrix that represents node connections as a sparse matrix.
2. Convert it to a symmetric matrix.
3. Normalize the matrix according to the spectral graph approximation.
- Analytics Zoo for Spark 3.1.2
1. Build the model graph.
2. Model optimization.

Deep Learning in Spark
- Analytics Zoo
It provides a distributed deep learning framework using a Scala, Keras-based implementation that runs on the BigDL framework.
https://arxiv.org/pdf/1804.05839.pdf
BigDL: A Distributed Deep Learning Framework for Big Data

Experiment Steps
[Figure: experiment pipeline]
1. Read files: the edges and nodes files are read into a dataset.
2. Build the adjacency matrix and the input tensor; the sparse Breeze matrix is converted to an Analytics Zoo sparse Tensor.
3. Split graph: an RDD Partitioner assigns each graph to its own partition (Graph 1 to Partition 1, Graph 2 to Partition 2).
4. Get Spark workers: each worker trains a copy of the model, made of two modules of one convolutional and one hidden layer, with the Adam optimizer and L2 regularization.
5. All-reduce parameters.

Case implementation
- Cora dataset
The Cora dataset consists of 2708 scientific publications, each classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.
Documents are classified into one of these seven classes: Neural_Networks, Rule_Learning, Reinforcement_Learning, Probabilistic_Methods, Theory, Genetic_Algorithms, Case_Based

Two main files: content, with the input features, and cites, with the edges. Data is loaded into an RDD with one partition per graph.
Cora.cites (one edge per line):
35 1033
35 103482
35 103515
35 1050679
35 1103960
887 334153
906 910
Cora.content (one node per line: id, 0/1 word vector, label):
31336 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 Neural_Networks
case class Element(id: String, words: Array[Float], label: String)
RDD[Element]
1. One partition per graph.
2. Avoid shuffle.
case class Edge(orig: Int, dest: Int)
RDD[Edge]
1. Build the adjacency matrix from this representation.
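Parsing the two files into these case classes might look like the sketch below. It assumes whitespace-separated fields laid out as in the samples above (id, word flags, label for content; two ids for cites). The parser object and its method names are hypothetical, and the case classes are repeated only to keep the example self-contained.

```scala
object CoraParser {
  final case class Element(id: String, words: Array[Float], label: String)
  final case class Edge(orig: Int, dest: Int)

  // Assumed layout: <paper id> <0/1 word flags...> <class label>
  def parseContent(line: String): Element = {
    val fields = line.trim.split("\\s+")
    Element(fields.head, fields.slice(1, fields.length - 1).map(_.toFloat), fields.last)
  }

  // Assumed layout: <paper id> <paper id>
  def parseEdge(line: String): Edge = {
    val fields = line.trim.split("\\s+")
    Edge(fields(0).toInt, fields(1).toInt)
  }
}
```

With Spark, these functions would typically be applied with `map` over the lines of each file. Note that the ids in Cora.cites are paper identifiers, so they presumably need to be mapped to row and column indices before the adjacency matrix is built.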

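// Imports assume the BigDL packages that Analytics Zoo builds on.
import com.intel.analytics.bigdl.dataset.Sample
import com.intel.analytics.bigdl.tensor.Tensor
import com.intel.analytics.bigdl.numeric.NumericFloat
import org.apache.spark.rdd.RDD

// labelIndex is an assumed helper that maps each class name to a numeric class id.
private[gcn] def createInputDataset(rdd: RDD[Element], labelIndex: Map[String, Float]): RDD[Sample[Float]] =
  rdd.map { case Element(_, words, label) =>
    Sample(
      Tensor(words, Array(words.length)), // feature vector, one entry per dictionary word
      Tensor(Array(labelIndex(label)), Array(1))
    )
  }
Build the datasets using the Sample and Tensor[T] types from the Analytics Zoo library.
Datasets are represented as RDD[Sample].
One partition per graph. Use a Spark Partitioner in case of multiple graphs.
Use GraphX to split the graph into its different components.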
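To illustrate what splitting a graph into components means, here is a small union-find sketch in plain Scala. It is a stand-in for illustration only: the talk uses GraphX, whose `connectedComponents` operation likewise labels each vertex with the lowest vertex id of its component. The names below are hypothetical.

```scala
object Components {
  // Returns, for every vertex, the smallest vertex id in its connected component.
  // Every edge endpoint must also appear in `vertices`.
  def connectedComponents(vertices: Seq[Int], edges: Seq[(Int, Int)]): Map[Int, Int] = {
    val parent = scala.collection.mutable.Map.empty[Int, Int]
    vertices.foreach(v => parent(v) = v)

    // Follows parent links until it reaches the root of the vertex's tree.
    def find(v: Int): Int = {
      var root = v
      while (parent(root) != root) root = parent(root)
      root
    }

    // Merges two trees, always keeping the smaller id as the root.
    edges.foreach { case (a, b) =>
      val (ra, rb) = (find(a), find(b))
      if (ra != rb) parent(math.max(ra, rb)) = math.min(ra, rb)
    }
    vertices.map(v => v -> find(v)).toMap
  }
}
```

Grouping the vertices by the returned component id gives one group per sub-graph, which is the information a custom partitioner needs in order to place each graph in its own partition.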

1. The adjacency matrix is built and processed using Breeze CSCMatrix.
import breeze.linalg.CSCMatrix
val builder = new CSCMatrix.Builder[Float](rows, cols)
edges.foreach { case Edge(r, c) =>
  builder.add(r, c, 1.0F)
}
val sparseAdj = builder.result()
2. Transform to a symmetric matrix.
// 1.0 where the transposed entry is larger than the original one, 0.0 elsewhere
val mask = (sparseAdj.t >:> sparseAdj).map(el => if (el) 1.0F else 0.0F)
sparseAdj +:+ (sparseAdj.t *:* mask) - (sparseAdj *:* mask)
3. Matrix normalization, according to the spectral graph convolution equation.
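The symmetrization expression in step 2 is easier to follow on a small dense matrix. The sketch below applies the same formula entry by entry in plain Scala, without Breeze; the object name is hypothetical and the directed edges are made up for the example.

```scala
object Symmetrize {
  type Mat = Array[Array[Float]]

  // A + A^T * mask - A * mask, where mask is 1 wherever A^T > A.
  // Each pair of entries (i, j) and (j, i) ends up holding max(A_ij, A_ji).
  def symmetrize(a: Mat): Mat =
    Array.tabulate(a.length, a.length) { (i, j) =>
      val mask = if (a(j)(i) > a(i)(j)) 1.0f else 0.0f
      a(i)(j) + a(j)(i) * mask - a(i)(j) * mask
    }

  // Directed edges 0 -> 1 and 2 -> 0.
  val directed: Mat = Array(Array(0f, 1f, 0f), Array(0f, 0f, 0f), Array(1f, 0f, 0f))
}
```

`Symmetrize.symmetrize(Symmetrize.directed)` turns the two directed edges into undirected ones: entries (0, 1), (1, 0), (0, 2), and (2, 0) are all 1, and every other entry stays 0.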

Model sequential implementation:
def getModel(
  dropout: Double,
  matrix: Tensor[Float],
  batchSize: Int,
  inputSize: Int,
  intermediateSize: Int,
  labelsNumber: Int
): Sequential[Float] = {
  Sequential[Float]()
    .add(Dropout(dropout))
    .add(GraphConvolution[Float](matrix, batchSize, inputSize))
    .add(Linear[Float](inputSize, intermediateSize, wRegularizer = L2Regularizer(5e-4)).setName("layer-1"))
    .add(ReLU())
    .add(Dropout(dropout))
    .add(GraphConvolution[Float](matrix, batchSize, intermediateSize))
    .add(Linear[Float](intermediateSize, labelsNumber).setName("layer-2"))
    .add(LogSoftMax())
}
NN Model
[Figure: Input → Drop → GraphConvLayer → LinearLayer → ReLU → Drop → GraphConvLayer → LinearLayer → SoftMax → Label Prediction. Trainable parameters are in the modules drawn in red on the original slide.]

Optimization Process
- We train with only 140 of the 2708 samples.
- Every mini-batch is equivalent to one epoch.
- Avoid shuffling the data in the data broadcast.
- One Spark partition for every sub-graph.
- Criterion: negative log likelihood (NLL).
- Adam optimizer with lr = 1E-3, beta1 = 0.9, beta2 = 0.999, epsilon = 1E-8, decay = 0, wdecay = 0
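For reference, one textbook Adam update with these hyperparameters can be sketched as follows. This is not the Analytics Zoo or BigDL optimizer, whose internals may differ in detail; learning-rate decay and weight decay are left out because both are set to 0 above. All names are hypothetical.

```scala
object Adam {
  // Running first and second moment estimates plus the step counter.
  final case class State(m: Array[Double], v: Array[Double], t: Int)

  def init(n: Int): State = State(Array.fill(n)(0.0), Array.fill(n)(0.0), 0)

  // One Adam update with bias-corrected moment estimates.
  def step(params: Array[Double], grad: Array[Double], s: State,
           lr: Double = 1e-3, beta1: Double = 0.9, beta2: Double = 0.999,
           eps: Double = 1e-8): (Array[Double], State) = {
    val t = s.t + 1
    val m = s.m.zip(grad).map { case (mi, g) => beta1 * mi + (1 - beta1) * g }
    val v = s.v.zip(grad).map { case (vi, g) => beta2 * vi + (1 - beta2) * g * g }
    val updated = params.indices.map { i =>
      val mHat = m(i) / (1 - math.pow(beta1, t))
      val vHat = v(i) / (1 - math.pow(beta2, t))
      params(i) - lr * mHat / (math.sqrt(vHat) + eps)
    }.toArray
    (updated, State(m, v, t))
  }
}
```

On the first step the bias correction makes the update roughly `lr * sign(gradient)`, so a parameter of 1.0 with gradient 2.0 moves to approximately 0.999.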

Results
Case 1: One graph in one partition.
- Training 1000 epochs.
- 140 labeled examples.
- Propagation function $HW$ (multilayer perceptron): $\hat{A}$ in $D^{-1/2} \hat{A} D^{-1/2} H W$ is replaced by the identity matrix. accuracy: 0.531019
- Propagation function $D^{-1/2} \hat{A} D^{-1/2} H W$ (renormalization trick). accuracy: 0.769202

Process Visualization
- The output of the second hidden layer (7 neurons) is represented.
- Dimensionality reduction is applied with t-SNE (t-Distributed Stochastic Neighbor Embedding).
- A snapshot is taken every 200 epochs.

Representation of the last layer using tSNE with NO convolution

Representation of the last layer using tSNE with spectral Conv

Conclusions and future work
- Convolutions on graphs show promising results in graph analysis using deep learning.
- We can benefit from Spark's processing power to perform distributed NN training.
- The Scala ecosystem will help to develop and integrate with the big data world.
- Future work: a Scala 3 Graph Neural Network library on top of Spark.

Implementation:
https://github.com/emartinezs44/SparkGCN
