A short and naive introduction to using network in
prediction models
Nathalie Vialaneix
nathalie.vialaneix@inra.fr
http://www.nathalievialaneix.eu
EpiFun
December 7th, 2018 - Paris
Nathalie Vialaneix | Using network in prediction models 1/22
Outline
1 A very short introduction
2 Some background on graphs and kernels
3 Using Laplacian in prediction
4 Coming back to our problem... what about SNP networks?
What is this presentation about?
How to integrate a network a priori into prediction models, with applications to biology?
Disclaimer
This presentation:
has been made with the LA RACHE methodology
might contain some (not understandable) maths...
is probably not exhaustive
should provide an intuitive idea of the basic concepts and a few directions
Material:
references at the end of the slides
these slides on my website: http://www.nathalievialaneix.eu/seminars2018.html
most articles available online at http://nextcloud.nathalievilla.org/index.php/s/VLlheqpwhwD8eeZ (ask me to be granted access rights)
Background and notations
What we have: a network (graph), G, with p nodes, v1, . . . , vp, and edges between these nodes
Example: nodes are genes; edges are known regulatory information or co-expression
An important matrix: the Laplacian
\[
L^G_{ij} = \begin{cases}
-1 & \text{if } i \neq j \text{ and } v_i \text{ and } v_j \text{ are linked by an edge} \\
0 & \text{if } i \neq j \text{ and } v_i \text{ and } v_j \text{ are not linked by an edge} \\
d_i & \text{if } i = j
\end{cases}
\]
with di the degree (i.e., the number of edges) of node vi.
Minor note (however important for those who like linear algebra): rows and columns of this matrix sum to 0. Equivalently, 0 is an eigenvalue (the smallest), with eigenvector 1p.
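The definition above is easy to check in code. A minimal sketch (the graph is a toy example, not one from the slides):

```python
# Build the graph Laplacian for a small undirected graph and check the
# properties stated above: degrees on the diagonal, rows summing to 0.

def laplacian(n_nodes, edges):
    """Return L as a list of lists: L[i][j] = -1 for an edge, degree on the diagonal."""
    L = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        L[i][j] = L[j][i] = -1
        L[i][i] += 1
        L[j][j] += 1
    return L

# Toy graph on p = 4 nodes: a triangle (0, 1, 2) plus a pendant node 3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
L = laplacian(4, edges)

# Rows (and, by symmetry, columns) sum to 0: L 1_p = 0.
assert all(sum(row) == 0 for row in L)
# The diagonal holds the degrees d_i.
assert [L[i][i] for i in range(4)] == [2, 2, 3, 1]
```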
Some properties of the Laplacian
Relations with the graph structure: for a graph with two connected components, {v1, v2, v3} and {v4, v5} (figure omitted), L has a null space (eigenvalue 0) spanned by the eigenvectors (1, 1, 1, 0, 0)⊤ and (0, 0, 0, 1, 1)⊤ (similar to the fact that the vector 1p spans the null space for connected graphs).
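This null-space property can be verified directly. A sketch on an assumed toy graph with two connected components:

```python
# The Laplacian of a graph with two connected components, {0, 1, 2} and
# {3, 4}: the indicator vector of each component lies in the null space of L.

def laplacian(n_nodes, edges):
    L = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        L[i][j] = L[j][i] = -1
        L[i][i] += 1
        L[j][j] += 1
    return L

def matvec(M, v):
    return [sum(Mi[j] * v[j] for j in range(len(v))) for Mi in M]

# Component {0, 1, 2} is a triangle; component {3, 4} is a single edge.
L = laplacian(5, [(0, 1), (1, 2), (0, 2), (3, 4)])

# L e = 0 for each component indicator e: eigenvalue 0 with multiplicity 2.
for indicator in ([1, 1, 1, 0, 0], [0, 0, 0, 1, 1]):
    assert matvec(L, indicator) == [0, 0, 0, 0, 0]
```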
Some properties of the Laplacian
Relations with the graph structure:
Random walk point of view: if we consider a random walk on the graph where the probability to jump from node vi to one of its neighbours is 1/di, then the average time to go from one node to another (commute time) is given by L+, the pseudo-inverse of L [Fouss et al., 2007] (commute time kernel).
Diffusion process point of view: if we consider a (heat) diffusion process on the graph where each node “sends” a fraction β > 0 of its value (heat) through the edges, exp(−βL) is related to the covariance of this diffusion process [Kondor and Lafferty, 2002] (heat kernel).
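The heat kernel exp(−βL) can be sketched on a toy graph with a truncated matrix-exponential series (the graph and β below are arbitrary illustrative choices):

```python
# Rough sketch of the heat kernel exp(-beta * L) via the truncated series
# sum_k (-beta L)^k / k!, on a 4-node path graph.

def laplacian(n, edges):
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] = L[j][i] = -1
        L[i][i] += 1
        L[j][j] += 1
    return L

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def heat_kernel(L, beta, n_terms=30):
    n = len(L)
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    K = [row[:] for row in term]
    M = [[-beta * x for x in row] for row in L]                   # -beta L
    for k in range(1, n_terms):
        term = [[x / k for x in row] for row in matmul(term, M)]  # (-beta L)^k / k!
        K = [[K[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return K

L = laplacian(4, [(0, 1), (1, 2), (2, 3)])
K = heat_kernel(L, beta=0.5)

# Since L 1 = 0, exp(-beta L) 1 = 1: each row of the heat kernel sums to 1.
assert all(abs(sum(row) - 1.0) < 1e-9 for row in K)
# K is symmetric, and nodes closer in the graph are more similar.
assert abs(K[0][1] - K[1][0]) < 1e-12 and K[0][1] > K[0][3]
```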
Relation between RatioCut and Laplacian (graph clustering)
[von Luxburg, 2007] shows that finding a partition of the graph into two clusters that minimizes
\[
\mathrm{RatioCut}(C_1, C_2) = \frac{1}{2} \sum_{k=1}^{2}\ \sum_{\substack{v_i \in C_k,\ v_j \notin C_k \\ v_i \sim v_j}} \frac{1}{|C_k|}
\]
is equivalent to the following constrained problem:
\[
\min_{C_1, C_2}\ v^\top L v \quad \text{s.t. } v \perp \mathbf{1}_n \text{ and } \|v\| = \sqrt{n}
\]
for v the vector of R^n obtained from the partition by:
\[
v_i = \begin{cases}
\sqrt{|C_2| / |C_1|} & \text{if } v_i \in C_1 \\
-\sqrt{|C_1| / |C_2|} & \text{otherwise.}
\end{cases}
\]
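This correspondence can be checked by brute force on a toy graph: for every 2-partition, the vector v built as above satisfies the two constraints, and ranking partitions by v⊤Lv gives the same ordering as RatioCut (a sketch; graph and community structure are made up):

```python
from itertools import combinations

# Toy graph: a triangle {0, 1, 2}, an edge {3, 4}, and one bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (2, 3)]
n = 5

def ratiocut(C1):
    C2 = [i for i in range(n) if i not in C1]
    cut = sum(1 for i, j in edges if (i in C1) != (j in C1))
    return 0.5 * (cut / len(C1) + cut / len(C2))

def vLv(C1):
    a = ((n - len(C1)) / len(C1)) ** 0.5
    b = (len(C1) / (n - len(C1))) ** 0.5
    v = [a if i in C1 else -b for i in range(n)]
    assert abs(sum(v)) < 1e-9                          # v is orthogonal to 1_n
    assert abs(sum(x * x for x in v) - n) < 1e-9       # ||v|| = sqrt(n)
    return sum((v[i] - v[j]) ** 2 for i, j in edges)   # = v^T L v

# All 2-partitions (fix node 0 in C1 to avoid complement duplicates).
parts = [set(c) for r in range(1, n) for c in combinations(range(n), r) if 0 in c]
best_rc = min(parts, key=ratiocut)
best_v = min(parts, key=vLv)
# Both criteria recover the natural community {0, 1, 2}.
assert best_rc == best_v == {0, 1, 2}
```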
Eigendecomposition of the Laplacian
L is symmetric and positive semi-definite (not positive definite), so it can be decomposed into:
\[
L = \sum_{i=1}^{p} \lambda_i e_i e_i^\top
\]
with λi the eigenvalues (in increasing order) and ei the orthonormal eigenvectors in R^p.
If you want to extract the most relevant information from the network, use the smallest eigenvalues with:
low-pass filter (similar to signal processing): \( F^G = \sum_{i=1}^{r} \lambda_i e_i e_i^\top \) for r < p
regularization: \( F^G = \sum_{i=1}^{p} \phi(\lambda_i) e_i e_i^\top \) with φ(λ) = e^{−βλ} or 1/λ, for instance (which is, most of the time, a kernel)
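A sketch of this spectral construction on the 3-node path graph, whose eigendecomposition is known in closed form (eigenvalues 0, 1, 3 with orthonormal eigenvectors (1,1,1)/√3, (1,0,−1)/√2, (1,−2,1)/√6):

```python
import math

lambdas = [0.0, 1.0, 3.0]
evecs = [
    [1 / math.sqrt(3)] * 3,
    [1 / math.sqrt(2), 0.0, -1 / math.sqrt(2)],
    [1 / math.sqrt(6), -2 / math.sqrt(6), 1 / math.sqrt(6)],
]

def spectral(phi):
    """F = sum_i phi(lambda_i) e_i e_i^T."""
    return [[sum(phi(l) * e[a] * e[b] for l, e in zip(lambdas, evecs))
             for b in range(3)] for a in range(3)]

# phi = identity recovers L itself (sanity check of the eigenpairs).
L = spectral(lambda l: l)
expected_L = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
assert all(abs(L[a][b] - expected_L[a][b]) < 1e-9
           for a in range(3) for b in range(3))

# Regularization phi(lambda) = exp(-beta lambda): a kernel whose rows sum
# to 1 because the eigenvalue-0 (constant) eigenvector is left untouched.
beta = 0.8
K = spectral(lambda l: math.exp(-beta * l))
assert all(abs(sum(row) - 1.0) < 1e-9 for row in K)
```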
Take home messages
eigenvectors of the Laplacian associated with the smallest eigenvalues are strongly related to the graph structure
many kernels performing regularization on graphs have been derived from the Laplacian [Smola and Kondor, 2003]; they can be used to measure similarities between nodes of the graph
Background
Problem: predict Y (numerical) from X (multivariate, dimension p) with a linear model:
\[
\underbrace{y}_{\text{vector, length } n} = \underbrace{X}_{\text{matrix, dimension } n \times p} \times \underbrace{\beta}_{\text{vector to be estimated, length } p} + \epsilon
\]
(with ε the noise term)
Examples:
[Rapaport et al., 2007]: Y is “Radiated/Not radiated sample” and X is gene expression. A network is given on the p genes, based on KEGG metabolic pathways
[Li and Li, 2008]: Y is time to death (glioblastoma) and X is gene expression. A network is given on the p genes, based on KEGG metabolic pathways
Sketch of main directions
1 use a kernel based on the Laplacian and its associated dot product to
compute a distance
2 use a standard linear model but regularize/penalize it with the
Laplacian norm
Working in the feature space...
[Rapaport et al., 2007]
xi is gene expression (length p) for sample i
Standard approach
→ a gene expression profile, decomposed on the canonical basis:
\[
x_i = \sum_{j=1}^{p} x_{ij} \, \delta_j, \qquad \delta_j = (0, \ldots, 1, \ldots, 0)^\top
\]
→ the dot product is \( \sum_{j=1}^{p} x_{ij} x_{i'j} \)
Laplacian approach
→ a gene expression profile:
\[
\Phi(x_i) = \sum_{j=1}^{p} \phi(\lambda_j) \, \tilde{x}_{ij} \, e_j, \qquad \tilde{x}_{ij} = e_j^\top x_i
\]
(similar to a Fourier transform)
→ the dot product is \( x_i^\top K_\phi \, x_{i'} \), with Kφ the kernel obtained from L with φ
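A sketch of this feature-space dot product on the 3-node path graph (eigenpairs hardcoded as above). One convention assumption here: the feature map rescales coordinate j by √φ(λj), so that the feature-space inner product equals x⊤Kφx′ exactly, with Kφ = Σj φ(λj) ej ej⊤:

```python
import math

# Eigenpairs of the 3-node path-graph Laplacian (known in closed form).
lambdas = [0.0, 1.0, 3.0]
evecs = [[1 / math.sqrt(3)] * 3,
         [1 / math.sqrt(2), 0.0, -1 / math.sqrt(2)],
         [1 / math.sqrt(6), -2 / math.sqrt(6), 1 / math.sqrt(6)]]
phi = lambda l: math.exp(-0.5 * l)   # arbitrary regularization choice

def feature_map(x):
    # Coordinates of x in the eigenbasis, rescaled by sqrt(phi(lambda_j)):
    # the 'graph Fourier transform' view of Phi(x).
    return [math.sqrt(phi(l)) * sum(e[a] * x[a] for a in range(3))
            for l, e in zip(lambdas, evecs)]

def kernel_dot(x, xp):
    # x^T K_phi x' with K_phi = sum_j phi(lambda_j) e_j e_j^T.
    K = [[sum(phi(l) * e[a] * e[b] for l, e in zip(lambdas, evecs))
          for b in range(3)] for a in range(3)]
    return sum(x[a] * K[a][b] * xp[b] for a in range(3) for b in range(3))

x, xp = [0.2, -1.0, 0.7], [1.5, 0.3, -0.4]
lhs = sum(u * v for u, v in zip(feature_map(x), feature_map(xp)))
# The explicit feature-space dot product matches the kernel evaluation.
assert abs(lhs - kernel_dot(x, xp)) < 1e-9
```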
Working in the feature space... and then?
[Rapaport et al., 2007]
∼ Standard linear regression with the Laplacian dot product + ℓ2 penalty with the Laplacian kernel norm:
\[
\arg\min_{w \in \mathbb{R}^p} \sum_{i=1}^{n} \left( w^\top \Phi(x_i) - y_i \right)^2 + C \|w\|^2
\quad \Leftrightarrow \quad
\arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \left( \beta^\top x_i - y_i \right)^2 + C \beta^\top K_\phi^{-1} \beta
\]
this is called an SVM...!
redundant information from two genes that are strongly connected in the graph is taken into account: this approach enforces similar contributions of these genes to the prediction
improved interpretation
Direct penalty with L
[Li and Li, 2008]
\[
\arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \left( \beta^\top x_i - y_i \right)^2 + C \beta^\top L \beta + C' \|\beta\|_1
\]
(the ℓ1 term enforces sparsity)
Remarks:
this is a very similar idea to the previous one (identical penalty for φ(·) = 1/·)
[Li and Li, 2010] explain that this forces connected vertices to have similar contributions whereas, in some cases, they can have opposite contributions
⇒ use of
\[
L^{\beta}_{jj'} = \begin{cases}
-\mathrm{sign}(\beta_j)\,\mathrm{sign}(\beta_{j'}) & \text{if } j \neq j' \text{ and } v_j \text{ and } v_{j'} \text{ are linked by an edge} \\
0 & \text{if } j \neq j' \text{ and } v_j \text{ and } v_{j'} \text{ are not linked by an edge} \\
d_j & \text{if } j = j'
\end{cases}
\]
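The penalized criterion above can be sketched with proximal gradient descent (soft-thresholding for the ℓ1 part). A rough sketch on tiny made-up data, where variables 0 and 1 are connected in the network and nearly collinear in X:

```python
# Minimize ||y - X beta||^2 + C beta^T L beta + C' ||beta||_1
# by proximal gradient descent (ISTA) on toy, hypothetical data.

def laplacian(p, edges):
    L = [[0] * p for _ in range(p)]
    for i, j in edges:
        L[i][j] = L[j][i] = -1
        L[i][i] += 1
        L[j][j] += 1
    return L

p, n = 3, 4
L = laplacian(p, [(0, 1)])            # variables 0 and 1 are connected
X = [[1.0, 0.9, 0.1], [0.8, 1.1, -0.2], [-1.0, -0.9, 0.3], [0.1, 0.2, 1.0]]
y = [2.0, 1.9, -2.1, 0.4]
C, Cp, step = 1.0, 0.1, 0.01

def objective(beta):
    fit = sum((yi - sum(Xi[j] * beta[j] for j in range(p))) ** 2
              for Xi, yi in zip(X, y))
    lap = C * sum(beta[i] * L[i][j] * beta[j]
                  for i in range(p) for j in range(p))
    return fit + lap + Cp * sum(abs(b) for b in beta)

beta = [0.0] * p
for _ in range(500):
    # gradient of the smooth part: -2 X^T (y - X beta) + 2 C L beta
    resid = [yi - sum(Xi[j] * beta[j] for j in range(p)) for Xi, yi in zip(X, y)]
    grad = [-2 * sum(X[i][j] * resid[i] for i in range(n)) +
            2 * C * sum(L[j][k] * beta[k] for k in range(p)) for j in range(p)]
    beta = [b - step * g for b, g in zip(beta, grad)]
    # proximal step for C' ||beta||_1: soft-thresholding
    beta = [max(abs(b) - step * Cp, 0.0) * (1 if b > 0 else -1) for b in beta]

assert objective(beta) < objective([0.0] * p)   # the fit improved
assert abs(beta[0] - beta[1]) < 0.5             # connected coefficients pulled together
```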
Similar approach... with SNPs
[Azencott et al., 2013]
Background: X are SNP values and Y is a phenotype, but Y is not used as such: it serves to derive an association score between each single SNP and the phenotype, c ∈ R^p (e.g., a SKAT score)
In addition, a network is given between SNPs (in the paper, derived from the KEGG gene network or from a co-expression network)
Purpose: discover sets of genetic loci that are maximally associated with a phenotype while being connected in the underlying network
→ search for s ∈ {0, 1}^p, with the association score of s given by \( \sum_{j=1}^{p} s_j c_j = s^\top c \) (to be maximized)
Solution:
\[
\arg\max_{s \in \{0,1\}^p}\ s^\top c - C\, s^\top L s - C' \|s\|_0
\]
(the ℓ0 term enforces sparsity)
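For a handful of SNPs, this criterion can be maximized by brute force. A sketch on a toy chain of p = 4 SNPs (the scores c and the trade-off constants are made-up values):

```python
from itertools import product

# Maximize s^T c - C s^T L s - C' ||s||_0 over s in {0,1}^p by enumeration.

def laplacian(p, edges):
    L = [[0] * p for _ in range(p)]
    for i, j in edges:
        L[i][j] = L[j][i] = -1
        L[i][i] += 1
        L[j][j] += 1
    return L

p = 4
L = laplacian(p, [(0, 1), (1, 2), (2, 3)])   # SNPs linked along a chain
c = [1.0, 0.6, 0.9, 0.2]                     # hypothetical association scores
C, C0 = 0.3, 0.6

def score(s):
    sc = sum(si * ci for si, ci in zip(s, c))
    # for binary s, s^T L s counts the edges cut by the selection
    sLs = sum(s[i] * L[i][j] * s[j] for i in range(p) for j in range(p))
    return sc - C * sLs - C0 * sum(s)

best = max(product((0, 1), repeat=p), key=score)
# The selected SNPs form a connected block: the weakly associated SNP 3 is
# dropped, while selecting isolated high scores would pay a cut penalty.
assert best == (1, 1, 1, 0)
```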
Another similar reference... with SNPs
[Chiquet et al., 2016]
Background: X are SNP values, Y are several phenotypes (multi-trait prediction ⇒ multivariate regression model)
In addition, the authors do not directly use a network but a similarity matrix, L, related to the genomic distance
Purpose: prediction, estimation of coefficients (in linear models) and of direct effects, obtained using a model with (i) an ℓ2 regularization with the norm induced by L and (ii) an ℓ1 penalization to enforce sparsity (i.e., to select SNPs)
Other references on SNP networks
[Hu et al., 2011]: the aim is SNP network construction in association with a phenotype (for further mining): pairwise interactions detected with entropy gain (cancer)
[Kogelman and Kadarmideen, 2014]: the aim is also SNP network construction in association with a phenotype (for clustering + mining): combines genomic correlation (similar to LD) with genetic association (interaction tests) in a very crude way (almost just a sum) (pig carcass weight)
Promising questions
using a priori SNP networks in association models
comparing SNP networks (as described in the previous slide) with a priori networks to check whether this solution is potentially interesting
... that will be for the next meeting!
References
Azencott, C.-A., Grimm, D., Sugiyama, M., Kawahara, Y., and Borgwardt, K. M. (2013).
Efficient network-guided multi-locus association mapping with graph cuts.
Bioinformatics, 29(13):i171–i179.
Chiquet, J., Mary-Huard, T., and Robin, S. (2016).
Structured regularization for conditional Gaussian graphical models.
Statistics and Computing, pages 789–804.
Fouss, F., Pirotte, A., Renders, J., and Saerens, M. (2007).
Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation.
IEEE Transactions on Knowledge and Data Engineering, 19(3):355–369.
Hu, T., Sinnott-Armstrong, N. A., Kiralis, J. W., Andrew, A. S., Karagas, M. R., and Moore, J. H. (2011).
Characterizing genetic interactions in human disease association studies using statistical epistasis networks.
BMC Bioinformatics, 12:364.
Kogelman, L. J. and Kadarmideen, H. N. (2014).
Weighted interaction SNP hub (WISH) network method for building genetic networks for complex diseases traits using whole
genome genotype data.
BMC Systems Biology, 8(Suppl 2).
Kondor, R. and Lafferty, J. (2002).
Diffusion kernels on graphs and other discrete structures.
In Sammut, C. and Hoffmann, A., editors, Proceedings of the 19th International Conference on Machine Learning, pages
315–322, Sydney, Australia. Morgan Kaufmann Publishers Inc. San Francisco, CA, USA.
Li, C. and Li, H. (2008).
Network-constrained regularization and variable selection for analysis of genomic data.
Bioinformatics, 24(9):1175–1182.
Li, C. and Li, H. (2010).
Variable selection and regression analysis for graph-structured covariates with an application to genomics.
The Annals of Applied Statistics, 4(3):1498–1516.
Rapaport, F., Zinovyev, A., Dutreix, M., Barillot, E., and Vert, J. (2007).
Classification of microarray data using gene networks.
BMC Bioinformatics, 8:35.
Smola, A. and Kondor, R. (2003).
Kernels and regularization on graphs.
In Warmuth, M. and Schölkopf, B., editors, Proceedings of the Conference on Learning Theory (COLT) and Kernel Workshop,
Lecture Notes in Computer Science, pages 144–158, Washington, DC, USA. Springer-Verlag Berlin Heidelberg.
von Luxburg, U. (2007).
A tutorial on spectral clustering.
Statistics and Computing, 17(4):395–416.

More Related Content

PDF
From RNN to neural networks for cyclic undirected graphs
PDF
Graph Neural Network in practice
PDF
Convolutional networks and graph networks through kernels
PDF
Differential analyses of structures in HiC data
PDF
A review on structure learning in GNN
PDF
Kernel methods and variable selection for exploratory analysis and multi-omic...
PDF
Graph Neural Network for Phenotype Prediction
PDF
About functional SIR
From RNN to neural networks for cyclic undirected graphs
Graph Neural Network in practice
Convolutional networks and graph networks through kernels
Differential analyses of structures in HiC data
A review on structure learning in GNN
Kernel methods and variable selection for exploratory analysis and multi-omic...
Graph Neural Network for Phenotype Prediction
About functional SIR

What's hot (20)

PDF
Kernel methods for data integration in systems biology
PDF
Reproducibility and differential analysis with selfish
PDF
Methods of Manifold Learning for Dimension Reduction of Large Data Sets
PDF
Random Forest for Big Data
PDF
Kernel methods for data integration in systems biology
PDF
Investigating the 3D structure of the genome with Hi-C data analysis
PDF
Training and Inference for Deep Gaussian Processes
PDF
Gtti 10032021
PDF
Macrocanonical models for texture synthesis
PDF
010_20160216_Variational Gaussian Process
PDF
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
PDF
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
PDF
Iclr2016 vaeまとめ
PDF
Learning from (dis)similarity data
PDF
A discussion on sampling graphs to approximate network classification functions
PDF
Estimating Space-Time Covariance from Finite Sample Sets
PDF
Information Content of Complex Networks
PDF
Polynomial Matrix Decompositions
PDF
Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...
PDF
Kernel methods for data integration in systems biology
Reproducibility and differential analysis with selfish
Methods of Manifold Learning for Dimension Reduction of Large Data Sets
Random Forest for Big Data
Kernel methods for data integration in systems biology
Investigating the 3D structure of the genome with Hi-C data analysis
Training and Inference for Deep Gaussian Processes
Gtti 10032021
Macrocanonical models for texture synthesis
010_20160216_Variational Gaussian Process
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
Iclr2016 vaeまとめ
Learning from (dis)similarity data
A discussion on sampling graphs to approximate network classification functions
Estimating Space-Time Covariance from Finite Sample Sets
Information Content of Complex Networks
Polynomial Matrix Decompositions
Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...
Ad

Similar to A short and naive introduction to using network in prediction models (20)

PDF
Network analysis for computational biology
PDF
08 Exponential Random Graph Models (ERGM)
PDF
08 Exponential Random Graph Models (2016)
PPT
Cornell Pbsb 20090126 Nets
PDF
Sparse inverse covariance estimation using skggm
PDF
Topics In Theoretical Computer Science An Algorithmists Toolkit Lecture Notes...
PPTX
WIDS 2021--An Introduction to Network Science
PDF
Application of three graph Laplacian based semisupervised learning methods to...
PDF
Networks, Deep Learning (and COVID-19)
PDF
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
PDF
Lausanne 2019 #4
PDF
Learning the structure of Gaussian Graphical models with unobserved variables...
PDF
Graph Signal Processing for Machine Learning A Review and New Perspectives - ...
PDF
Webinar on Graph Neural Networks
PDF
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
PDF
Grl book
PDF
Sparsenet
PDF
Regulation Analysis using Restricted Boltzmann Machines
PDF
08 Inference for Networks – DYAD Model Overview (2017)
PDF
Consensual gene co-expression network inference with multiple samples
Network analysis for computational biology
08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (2016)
Cornell Pbsb 20090126 Nets
Sparse inverse covariance estimation using skggm
Topics In Theoretical Computer Science An Algorithmists Toolkit Lecture Notes...
WIDS 2021--An Introduction to Network Science
Application of three graph Laplacian based semisupervised learning methods to...
Networks, Deep Learning (and COVID-19)
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Lausanne 2019 #4
Learning the structure of Gaussian Graphical models with unobserved variables...
Graph Signal Processing for Machine Learning A Review and New Perspectives - ...
Webinar on Graph Neural Networks
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
Grl book
Sparsenet
Regulation Analysis using Restricted Boltzmann Machines
08 Inference for Networks – DYAD Model Overview (2017)
Consensual gene co-expression network inference with multiple samples
Ad

More from tuxette (20)

PDF
Analyse comparative de données de génomique 3D
PDF
Detecting differences between 3D genomic data: a benchmark study
PDF
Racines en haut et feuilles en bas : les arbres en maths
PDF
Méthodes à noyaux pour l’intégration de données hétérogènes
PDF
Méthodologies d'intégration de données omiques
PDF
Projets autour de l'Hi-C
PDF
Can deep learning learn chromatin structure from sequence?
PDF
Multi-omics data integration methods: kernel and other machine learning appro...
PDF
ASTERICS : une application pour intégrer des données omiques
PDF
Autour des projets Idefics et MetaboWean
PDF
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
PDF
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
PDF
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
PDF
Journal club: Validation of cluster analysis results on validation data
PDF
Overfitting or overparametrization?
PDF
Selective inference and single-cell differential analysis
PDF
SOMbrero : un package R pour les cartes auto-organisatrices
PDF
Explanable models for time series with random forest
PDF
Présentation du projet ASTERICS
PDF
Présentation du projet ASTERICS
Analyse comparative de données de génomique 3D
Detecting differences between 3D genomic data: a benchmark study
Racines en haut et feuilles en bas : les arbres en maths
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodologies d'intégration de données omiques
Projets autour de l'Hi-C
Can deep learning learn chromatin structure from sequence?
Multi-omics data integration methods: kernel and other machine learning appro...
ASTERICS : une application pour intégrer des données omiques
Autour des projets Idefics et MetaboWean
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Journal club: Validation of cluster analysis results on validation data
Overfitting or overparametrization?
Selective inference and single-cell differential analysis
SOMbrero : un package R pour les cartes auto-organisatrices
Explanable models for time series with random forest
Présentation du projet ASTERICS
Présentation du projet ASTERICS

Recently uploaded (20)

PDF
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
PDF
Unveiling a 36 billion solar mass black hole at the centre of the Cosmic Hors...
PPTX
ECG_Course_Presentation د.محمد صقران ppt
PPTX
famous lake in india and its disturibution and importance
PDF
AlphaEarth Foundations and the Satellite Embedding dataset
PPTX
7. General Toxicologyfor clinical phrmacy.pptx
PPT
POSITIONING IN OPERATION THEATRE ROOM.ppt
DOCX
Viruses (History, structure and composition, classification, Bacteriophage Re...
PPTX
Taita Taveta Laboratory Technician Workshop Presentation.pptx
PPT
protein biochemistry.ppt for university classes
PPTX
Classification Systems_TAXONOMY_SCIENCE8.pptx
PPTX
2. Earth - The Living Planet earth and life
PDF
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
PPTX
neck nodes and dissection types and lymph nodes levels
PDF
An interstellar mission to test astrophysical black holes
PPTX
ANEMIA WITH LEUKOPENIA MDS 07_25.pptx htggtftgt fredrctvg
PDF
. Radiology Case Scenariosssssssssssssss
PDF
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
PPTX
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
PPTX
2. Earth - The Living Planet Module 2ELS
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
Unveiling a 36 billion solar mass black hole at the centre of the Cosmic Hors...
ECG_Course_Presentation د.محمد صقران ppt
famous lake in india and its disturibution and importance
AlphaEarth Foundations and the Satellite Embedding dataset
7. General Toxicologyfor clinical phrmacy.pptx
POSITIONING IN OPERATION THEATRE ROOM.ppt
Viruses (History, structure and composition, classification, Bacteriophage Re...
Taita Taveta Laboratory Technician Workshop Presentation.pptx
protein biochemistry.ppt for university classes
Classification Systems_TAXONOMY_SCIENCE8.pptx
2. Earth - The Living Planet earth and life
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
neck nodes and dissection types and lymph nodes levels
An interstellar mission to test astrophysical black holes
ANEMIA WITH LEUKOPENIA MDS 07_25.pptx htggtftgt fredrctvg
. Radiology Case Scenariosssssssssssssss
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
2. Earth - The Living Planet Module 2ELS

A short and naive introduction to using network in prediction models

  • 1. A short and naive introduction to using network in prediction models Nathalie Vialaneix nathalie.vialaneix@inra.fr http://www.nathalievialaneix.eu EpiFun December 7th, 2018 - Paris Nathalie Vialaneix | Using network in prediction models 1/22
  • 2. Outline 1 A very short introduction 2 Some background on graphs and kernels 3 Using Laplacian in prediction 4 Coming back to our problem... what about SNP networks? Nathalie Vialaneix | Using network in prediction models 2/22
  • 3. What is this presentation about? How to integrate network a priori in prediction models with application to biology? Nathalie Vialaneix | Using network in prediction models 3/22
  • 4. Disclaimer has been made with the LA RACHE methodology might contain some (not understandable) maths... is probably not exhaustive should provide an intuitive idea of the basic concepts and a few directions Nathalie Vialaneix | Using network in prediction models 4/22
  • 5. Disclaimer has been made with the LA RACHE methodology might contain some (not understandable) maths... is probably not exhaustive should provide an intuitive idea of the basic concepts and a few directions Material: References at the end of the slides these slides on my website http://www.nathalievialaneix.eu/seminars2018.html most articles available online at http://nextcloud. nathalievilla.org/index.php/s/VLlheqpwhwD8eeZ (ask me to be granted write rights) Nathalie Vialaneix | Using network in prediction models 4/22
  • 6. Outline 1 A very short introduction 2 Some background on graphs and kernels 3 Using Laplacian in prediction 4 Coming back to our problem... what about SNP networks? Nathalie Vialaneix | Using network in prediction models 5/22
  • 7. Background and notations What we have: a network (graph), G, with p nodes, v1, . . . , vp and edges between these nodes Example: nodes are genes; edges are known regulatory information or co-expression Nathalie Vialaneix | Using network in prediction models 6/22
  • 8. Background and notations What we have: a network (graph), G, with p nodes, v1, . . . , vp and edges between these nodes Example: nodes are genes; edges are known regulatory information or co-expression An important matrix: the Laplacian LG ij =          −1 if i , j and vi and vj are linked by an edge 0 if i , j and vi and vj are not linked by an edge di if i = j with di the degree (i.e., the number of edges) of node vi. Minor note (however important for those who like linear algebra): rows and columns of this matrix sum to 0. It is equivalent to notice that 0 is an eigenvalue (the smallest) with 1p its eigenvector. Nathalie Vialaneix | Using network in prediction models 6/22
  • 9. Some properties of the Laplacian Relations with the graph structure: 1 2 3 4 5 has a null space (eigenvalues = 0) spanned by the eigenvectors                      1 1 1 0 0                      and                      0 0 0 1 1                      (similar to the fact that the vector 1p spans the null space for connected graphs). Nathalie Vialaneix | Using network in prediction models 7/22
  • 10. Some properties of the Laplacian Relations with the graph structure: Random walk point of view: If we consider a random walk on the graph with probability to jump from one node to the other equal to 1 di then the average time to go from one node to another (commute time) is given by L+ [Fouss et al., 2007] (commute time kernel). Diffusion process point of view: If we consider a (heat) diffusion process on the graph where each node “send” a fraction β > 0 of its value (heat) through the edges, exp−βL is related to the covariance of this diffusion process [Kondor and Lafferty, 2002] (heat kernel). Nathalie Vialaneix | Using network in prediction models 7/22
  • 11. Relation between RatioCut and Laplacian (graph clustering) [von Luxburg, 2007] shows that find a partition of the graph into two clusters that minimize: RatioCut(C1, C2) = 1 2 2 X k=1 X vi∈Ck , vj<Ck vi∼vj 1 |Ck | is equivalent to the following constrained problem: min C1, ,C2 v> Lv st v ⊥ 1n and kvk = √ n for v the vector of Rn obtained from the partition by: vi = ( p (|C2|)/|C1| if vi ∈ C1 − p |C1|/(|C2|) otherwise. Nathalie Vialaneix | Using network in prediction models 8/22
  • 12. Eigendecomposition of the Laplacian L is symmetric and positive (not definite positive) so it can be decomposed into: L = p X i=1 λieie> i with λi the eigenvalues (in increasing order) and ei the orthonormal eigenvectors in Rp . Nathalie Vialaneix | Using network in prediction models 9/22
  • 13. Eigendecomposition of the Laplacian L is symmetric and positive (not definite positive) so it can be decomposed into: L = p X i=1 λieie> i with λi the eigenvalues (in increasing order) and ei the orthonormal eigenvectors in Rp . If you want to extract the most relevant information from the network, use the smallest eigenvalues with: low pass filter (similar to signal processing): FG = Pr i=1 λieie> i for r < p regularization FG = Pp i=1 φ(λi)eie> i with φ(x) = e−βλi or 1 λi for instance (is, most of the time, a kernel) Nathalie Vialaneix | Using network in prediction models 9/22
  • 14. Take home messages eigenvectors of the Laplacian that are associated to the smallest eigenvalues are strongly related to the graph structure many kernels have been derived from the Laplacian [Smola and Kondor, 2003] that perform regularization on graphs that can be used to measure similarities between nodes of the graph Nathalie Vialaneix | Using network in prediction models 10/22
  • 15. Outline 1 A very short introduction 2 Some background on graphs and kernels 3 Using Laplacian in prediction 4 Coming back to our problem... what about SNP networks? Nathalie Vialaneix | Using network in prediction models 11/22
  • 16. Background Problem: predict Y (numerical) from X (multivariate, dimension p) with a linear model: y |{z} vector, length n = X |{z} matrix, dimension n×p × β |{z} vector to be estimated, length p + Nathalie Vialaneix | Using network in prediction models 12/22
  • 17. Background Problem: predict Y (numerical) from X (multivariate, dimension p) with a linear model: y |{z} vector, length n = X |{z} matrix, dimension n×p × β |{z} vector to be estimated, length p + Examples: [Rapaport et al., 2007]: Y is “Radiated/Not radiated sample” and X is gene expression. A network is given on the p genes based on KEGG metabolic pathways [Li and Li, 2008]: Y is time to death (Glioblastoma) and X is gene expression. A network is given on the p genes based on KEGG metabolic pathways Nathalie Vialaneix | Using network in prediction models 12/22
  • 18. Sketch of the main directions
    1 use a kernel based on the Laplacian and its associated dot product to compute a distance
    2 use a standard linear model but regularize/penalize it with the Laplacian norm
  • 20. Working in the feature space... [Rapaport et al., 2007]
    xᵢ is the gene expression profile (length p) of sample i.
    Standard approach: xᵢ = ∑_{j=1}^p x_{ij} δⱼ, where δⱼ is the j-th canonical basis vector (0, ..., 1, ..., 0)ᵀ → the dot product is ∑_{j=1}^p x_{ij} x_{i′j}.
    Laplacian approach: Φ(xᵢ) = ∑_{j=1}^p φ(λⱼ) x̃_{ij} eⱼ with x̃_{ij} = eⱼᵀ xᵢ (similar to a Fourier transform) → the dot product is xᵢᵀ K_φ xᵢ′, with K_φ the kernel obtained from L with φ.
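A small sketch of this "graph Fourier" view: the coefficients x̃ = Eᵀx of a profile in the eigenvector basis, and the kernel identity between filtered profiles. The graph, the profile x, and β = 1 are made up, and √φ is used inside Φ so that ⟨Φ(x), Φ(x′)⟩ = xᵀ K_φ x′ holds exactly:

```python
import numpy as np

# Toy Laplacian of a 4-node path graph (illustrative only)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
lam, E = np.linalg.eigh(L)

x = np.array([1.0, 0.9, -0.8, -1.1])    # a made-up expression profile

# Graph Fourier transform: coordinates of x in the eigenvector basis
x_tilde = E.T @ x                        # x_tilde[j] = e_j^T x

# Filtered profile, with sqrt(phi) so that the dot product identity holds
beta = 1.0
phi = np.exp(-beta * lam)
Phi_x = E @ (np.sqrt(phi) * x_tilde)

# Kernel obtained from L with phi
K_phi = (E * phi) @ E.T
```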
  • 21. Working in the feature space... and then? [Rapaport et al., 2007]
    ∼ standard linear regression with the Laplacian dot product + ℓ2 penalty with the Laplacian kernel norm:
      argmin_{w∈ℝᵖ} ∑_{i=1}^n (wᵀΦ(xᵢ) − yᵢ)² + C‖w‖²  ⇔  argmin_{β∈ℝᵖ} ∑_{i=1}^n (βᵀxᵢ − yᵢ)² + C βᵀK_φ⁻¹β
    - this is called an SVM...!
    - redundant information from two genes that are strongly connected in the graph is taken into account: this approach enforces similar contributions of these genes to the prediction
    - improved interpretation
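The right-hand formulation has a closed form, β̂ = (XᵀX + C K_φ⁻¹)⁻¹ Xᵀy, which can be sketched on simulated data (the chain graph, the diffusion-kernel choice of φ, and all constants are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 5

# Chain graph on p genes and diffusion kernel K_phi (toy choices)
A = np.diag(np.ones(p - 1), 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A
lam, E = np.linalg.eigh(L)
K = (E * np.exp(-0.5 * lam)) @ E.T

# Simulated data: neighbouring genes have similar true coefficients
beta_true = np.array([1.0, 1.0, 0.8, -0.2, -0.3])
X = rng.normal(size=(n, p))
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Closed-form estimator for ||y - X beta||^2 + C beta^T K^{-1} beta
C = 1.0
beta_hat = np.linalg.solve(X.T @ X + C * np.linalg.inv(K), X.T @ y)
```

Because beta_true is smooth along the chain, the penalty barely biases the estimate here; a coefficient vector oscillating between neighbouring genes would be shrunk much more strongly.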
  • 24. Direct penalty with L [Li and Li, 2008]
    argmin_{β∈ℝᵖ} ∑_{i=1}^n (βᵀxᵢ − yᵢ)² + C βᵀLβ + C₀‖β‖₁ (the ℓ1 term enforces sparsity)
    Remarks:
    - this is a very similar idea to the previous one (the penalty is identical for φ(·) = 1/·)
    - [Li and Li, 2010] explain that this forces connected vertices to have similar contributions, whereas in some cases they can have opposite contributions ⇒ use L̃ with entries
        L̃_{jj′} = −sign(βⱼ)sign(βⱼ′) if j ≠ j′ and vⱼ and vⱼ′ are linked by an edge
        L̃_{jj′} = 0 if j ≠ j′ and vⱼ and vⱼ′ are not linked by an edge
        L̃_{jj} = dⱼ
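A minimal proximal-gradient (ISTA) sketch of this penalized criterion; the solver, the chain graph, and all constants are illustrative assumptions, not the algorithm used in the paper:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def network_lasso(X, y, L, C=1.0, C0=0.1, n_iter=500):
    """ISTA for  ||y - X b||^2 + C b^T L b + C0 ||b||_1  (simple sketch)."""
    n, p = X.shape
    # Step size: inverse Lipschitz constant of the smooth part's gradient
    step = 1.0 / (2 * (np.linalg.norm(X, 2) ** 2 + C * np.linalg.norm(L, 2)))
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = 2 * X.T @ (X @ b - y) + 2 * C * (L @ b)
        b = soft_threshold(b - step * grad, step * C0)
    return b

# Toy example: only the first two of five chained genes carry signal
rng = np.random.default_rng(1)
A = np.diag(np.ones(4), 1)
A = A + A.T
Lap = np.diag(A.sum(axis=1)) - A
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, 1.0, 0.0, 0.0, 0.0]) + 0.05 * rng.normal(size=50)
b = network_lasso(X, y, Lap, C=0.5, C0=0.1)
```

The ℓ1 step keeps the last coefficients near zero, while the βᵀLβ term pulls the coefficients of neighbouring genes towards each other.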
  • 25. Outline
    1 A very short introduction
    2 Some background on graphs and kernels
    3 Using Laplacian in prediction
    4 Coming back to our problem... what about SNP networks?
  • 28. Similar approach... with SNPs [Azencott et al., 2013]
    Background: X contains SNP values and Y is a phenotype, but Y is not used as such: it serves to derive an association score between each single SNP and the phenotype, c ∈ ℝᵖ (e.g., from the SKAT kernel). In addition, a network is given between SNPs (in the paper, derived from a KEGG gene network or a co-expression network).
    Purpose: discover sets of genetic loci that are maximally associated with a phenotype while being connected in the underlying network → search for s ∈ {0, 1}ᵖ, the association score of s being ∑_{j=1}^p sⱼcⱼ = sᵀc (to be maximized).
    Solution:
      argmax_{s∈{0,1}ᵖ} sᵀc − C sᵀLs − C₀‖s‖₀ (the ℓ0 term enforces sparsity)
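This objective can be checked by brute force when p is tiny (Azencott et al. solve it efficiently with graph min-cuts; the scores c, the chain graph, and the constants below are made up):

```python
import itertools
import numpy as np

def best_subset(c, L, C, C0):
    """Exhaustive maximization of s^T c - C s^T L s - C0 ||s||_0, s in {0,1}^p."""
    p = len(c)
    best_s, best_val = None, -np.inf
    for bits in itertools.product([0, 1], repeat=p):
        s = np.array(bits, dtype=float)
        val = s @ c - C * (s @ L @ s) - C0 * s.sum()
        if val > best_val:
            best_s, best_val = s, val
    return best_s, best_val

# Toy scores: 5 SNPs on a chain, the first three strongly associated
A = np.diag(np.ones(4), 1)
A = A + A.T
Lap = np.diag(A.sum(axis=1)) - A
c = np.array([0.9, 0.8, 0.7, 0.05, 0.05])
s_opt, val = best_subset(c, Lap, C=0.1, C0=0.2)
```

Here s_opt selects exactly SNPs 0-2: for binary s, sᵀLs counts the edges cut between selected and unselected SNPs, so connected high-score SNPs are kept together.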
  • 30. Another similar reference... with SNPs [Chiquet et al., 2016]
    Background: X contains SNP values and Y gathers several phenotypes (multi-trait prediction ⇒ multivariate regression model). In addition, the authors do not directly use a network but a similarity matrix L related to the genomic distance.
    Purpose: prediction, estimation of the coefficients (in linear models), and estimation of the direct effects are obtained using a model with i/ an ℓ2 regularization based on the norm induced by L and ii/ an ℓ1 penalization to enforce sparsity (i.e., select SNPs).
  • 31. Other references on SNP networks
    - [Hu et al., 2011]: their aim is SNP network construction in association with a phenotype (for further mining); pairwise interactions are detected with an entropy gain (application: cancer)
    - [Kogelman and Kadarmideen, 2014]: their aim is also SNP network construction in association with a phenotype (for clustering + mining); genomic correlation (similar to LD) and genetic association (interaction tests) are combined in a very crude way (almost just a sum) (application: pig carcass weight)
  • 32. Promising questions
    - using an a priori SNP network in association models
    - comparing SNP networks (as described in the previous slide) with a priori networks to check whether this solution is potentially interesting
    ... that will be for the next meeting!
  • 34. References
    Azencott, C.-A., Grimm, D., Sugiyama, M., Kawahara, Y., and Borgwardt, K. M. (2013). Efficient network-guided multi-locus association mapping with graph cuts. Bioinformatics, 29(13):i171–i179.
    Chiquet, J., Mary-Huard, T., and Robin, S. (2016). Structured regularization for conditional Gaussian graphical models. Statistics and Computing, pages 789–804.
    Fouss, F., Pirotte, A., Renders, J., and Saerens, M. (2007). Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering, 19(3):355–369.
    Hu, T., Sinnott-Armstrong, N. A., Kiralis, J. W., Andrew, A. S., Karagas, M. R., and Moore, J. H. (2011). Characterizing genetic interactions in human disease association studies using statistical epistasis networks. BMC Bioinformatics, 12:364.
    Kogelman, L. J. and Kadarmideen, H. N. (2014). Weighted interaction SNP hub (WISH) network method for building genetic networks for complex diseases traits using whole genome genotype data. BMC Systems Biology, 8(Suppl 2).
    Kondor, R. and Lafferty, J. (2002). Diffusion kernels on graphs and other discrete structures. In Sammut, C. and Hoffmann, A., editors, Proceedings of the 19th International Conference on Machine Learning, pages 315–322, Sydney, Australia. Morgan Kaufmann.
    Li, C. and Li, H. (2008). Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics, 24(9):1175–1182.
    Li, C. and Li, H. (2010). Variable selection and regression analysis for graph-structured covariates with an application to genomics. The Annals of Applied Statistics, 4(3):1498–1516.
    Rapaport, F., Zinovyev, A., Dutreix, M., Barillot, E., and Vert, J. (2007). Classification of microarray data using gene networks. BMC Bioinformatics, 8:35.
    Smola, A. and Kondor, R. (2003). Kernels and regularization on graphs. In Warmuth, M. and Schölkopf, B., editors, Proceedings of the Conference on Learning Theory (COLT) and Kernel Workshop, Lecture Notes in Computer Science, pages 144–158. Springer-Verlag.
    von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416.