Combining co-expression and co-location for gene network inference in porcine muscle development in late gestation
Laurence Liaubet, Nathalie Vialaneix
October 14th, 2019 - NETBIO
MIAT & GenPhySE units, INRA Toulouse
General context, an interdisciplinary network
A fundamental molecular question, the regulation of gene expression
An ethical and economic breeding problem, survival at birth
A statistical question, modelling gene co-expression with added biological information
Interpretation of phenotypes that genetic approaches alone do not explain.
The same genome sequence produces a wide range of differentiated cells.
Modulation of gene expression: epigenetic marks and chromatin regulation, cis and trans regulation → 3D nuclear topography…
Response to physiological context: growth, health, reproduction, adaptation
Biological context: a fundamental molecular question
Fanucchi et al. Cell 2013
Hierarchical Transcription in a Multigene Complex
Biological context: a fundamental molecular question
Maria Marti-Marimon, PhD thesis, 2018
Biological context: an increased mortality at birth
14% of newborn piglets died between birth and weaning
Peak of mortality in the first two days after birth
Maturity
Selection for higher prolificacy and meat production has been accompanied by a substantial increase in piglet mortality at birth
Wilson et al., 1998; Biensen et al., 1998; Leenhouwers et al., 2002; Canario, 2006; Valentin Voillet, PhD thesis, 2016
Specific mechanisms during late gestation in pigs
Maturity = full development allowing survival at birth
Specific mechanisms during late gestation in pigs – muscle tissue
Timeline of late gestation: d90, d110 and d114 (term)
Sampling of the longissimus muscle (n = 459)
ANR project PORCINET (2010): systems biology of piglet maturity
→ Transcriptomic analysis (n = 64)
A dramatic switch of gene expression occurred in late gestation
5,167 genes differentially expressed between 90 and 110 days of gestation (dg) in the fetal muscle (Bonferroni, 1%)
with 1,131 DEGs for the age × genotype interaction (maturity), as found in Voillet et al. (2014)
PCA without variable selection
IGF2 (paternally expressed), insulin-like growth factor 2
Pig: QTL for adiposity and muscle mass
Human: fetal growth and intrauterine growth restriction
Co-expression = co-regulation? = nuclear co-localization?
IGF2, a gene of fundamental importance in pig muscle development
Figure: IGF2 across genotypes (LWLW, MSLW, LWMS, MSMS) at 90 and 110 days of gestation
Marti-Marimon et al., 2018.
Figure: RNA + DNA FISH on a pig chromosome, gene 1 + gene 2
1. Network inference with GGM
2. Coming back to our problem: gene expression and FISH experiments
In short, what are we looking for?
Hypothesizing that co-expression is related to co-location:
• have an automated process to find relevant pairs of genes for which co-location can be tested (FISH experiments are time-consuming and targeted);
• improve network inference using co-location information.
1. Network inference with GGM
Framework
Data: large-scale gene expression data, stored in an n × p matrix
$$X = \left( X_i^j \right)_{i = 1, \dots, n;\ j = 1, \dots, p}$$
with n individuals (rows) and p variables, i.e. gene expressions (columns).
Here: microarray experiment, n = 61 (gestational age: 90 days) and p = 13,855 uniquely annotated genes.
What we want to obtain: a network with
• nodes: genes;
• edges: strong and direct co-expression between two genes (to track transcriptional regulation).
Using correlations: relevance network
Butte and Kohane (1999, 2000)
First (naive) approach: compute the correlations between expressions for all pairs of genes, threshold the smallest ones (in absolute value) and build the network from the remaining pairs.
Figure: correlations → thresholding → graph
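A minimal sketch of the relevance-network idea (standard-library Python; the function names, the toy data and the 0.8 threshold are illustrative assumptions, not the settings used by Butte and Kohane): compute the Pearson correlation of every pair of genes and keep the pairs whose absolute correlation reaches the threshold.

```python
import math
from itertools import combinations


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def relevance_network(expr, threshold):
    """Edges between genes whose absolute correlation is at least `threshold`.

    expr: dict mapping a gene name to its expression values (one per individual).
    """
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


# Toy data (hypothetical genes): geneB follows geneA, geneC is unrelated to geneA.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 7.8, 10.1],
    "geneC": [1.0, -1.0, 1.0, -1.0, 1.0],
}
print(relevance_network(expr, threshold=0.8))  # {('geneA', 'geneB')}
```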
But correlation is not causality…
Figure: x drives both y and z, which creates a strong indirect correlation between y and z
set.seed(2807); x <- runif(100)
y <- 2*x + 1 + rnorm(100, 0, 0.1); cor(x, y)  # [1] 0.9988261
z <- 2*x + 1 + rnorm(100, 0, 0.1); cor(x, z)  # [1] 0.998751
cor(y, z)  # [1] 0.9971105
# Partial correlation
cor(lm(y ~ x)$residuals, lm(z ~ x)$residuals)  # [1] -0.1933699
Networks are built using partial correlations, i.e., correlations between gene expressions given the expression of all the other genes (residual correlations).
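The R experiment can be mirrored in Python as a sketch (standard library only). With a single conditioning variable, the partial correlation is the correlation between the residuals of the regressions of y and z on x; in a gene network, the conditioning set is all the other genes. The seed is reused, but Python's random number generator is not R's, so the printed values will differ from the R output.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def residuals(y, x):
    """Residuals of the least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - intercept - slope * a for a, b in zip(x, y)]


def partial_cor(y, z, x):
    """Correlation between y and z once the linear effect of x is removed from both."""
    return pearson(residuals(y, x), residuals(z, x))


# Same design as the R example (different random generator, so different values).
rng = random.Random(2807)
x = [rng.random() for _ in range(100)]
y = [2 * a + 1 + rng.gauss(0, 0.1) for a in x]
z = [2 * a + 1 + rng.gauss(0, 0.1) for a in x]
print(round(pearson(y, z), 3))         # high: indirect correlation through x
print(round(partial_cor(y, z, x), 3))  # much weaker once x is accounted for
```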
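Gaussian Graphical Model (GGM)
$(X_i)_{i=1,\dots,n}$ are i.i.d. Gaussian random variables $\mathcal{N}(0, \Sigma)$ (gene expression); then
$$j \longleftrightarrow j' \ \text{(genes } j \text{ and } j' \text{ are linked)} \iff \mathrm{Cor}\left( X^j, X^{j'} \mid (X^k)_{k \neq j, j'} \right) \neq 0$$
$$\mathrm{Cor}\left( X^j, X^{j'} \mid (X^k)_{k \neq j, j'} \right) = -\frac{(\Sigma^{-1})_{jj'}}{\sqrt{(\Sigma^{-1})_{jj} \, (\Sigma^{-1})_{j'j'}}}$$
⇒ find the partial correlations by means of $(\Sigma_n)^{-1}$, where $\Sigma_n$ is the empirical covariance matrix.
Problem: Σ is a p × p matrix (with p large) and n is small compared to p ⇒ $(\Sigma_n)^{-1}$ is a poor estimate of $\Sigma^{-1}$ (when n < p, $\Sigma_n$ is not even invertible)!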
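A sketch of how partial correlations are read from the precision matrix $S = \Sigma^{-1}$, through $-S_{jj'}/\sqrt{S_{jj} S_{j'j'}}$. It assumes a known covariance (no estimation) and uses a small hand-written Gauss-Jordan inversion to stay self-contained; a linear-algebra library would normally be used. The covariance below corresponds to x ~ N(0, 1), y = 2x + ε₁, z = 2x + ε₂ with independent N(0, 1) noises: y and z are strongly correlated, but their partial correlation given x is zero.

```python
def invert(matrix):
    """Inverse of a small square matrix (Gauss-Jordan elimination, partial pivoting)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(sigma):
    """Partial correlation of each pair given all other variables, from S = Sigma^{-1}."""
    s = invert(sigma)
    n = len(s)
    return [[1.0 if i == j else -s[i][j] / (s[i][i] * s[j][j]) ** 0.5 for j in range(n)]
            for i in range(n)]


# Covariance of (x, y, z) when x ~ N(0, 1), y = 2x + e1, z = 2x + e2, e1, e2 ~ N(0, 1)
sigma = [[1, 2, 2],
         [2, 5, 4],
         [2, 4, 5]]
pc = partial_correlations(sigma)
print(sigma[1][2] / (sigma[1][1] * sigma[2][2]) ** 0.5)  # 0.8: marginal Cor(y, z)
print(round(pc[0][1], 4))    # 0.6667: Cor(x, y | z), edge x-y
print(abs(pc[1][2]) < 1e-9)  # True: Cor(y, z | x) = 0, no edge y-z
```

Here $S$ has entries 9, 1, 1 on the diagonal and −2 between x and each of y and z, and a zero between y and z: the zero pattern of the precision matrix is exactly the missing edge.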
Sparse approaches
Relation between partial correlation and linear models (LM): if $S = \Sigma^{-1}$ and writing
$$X^j = \beta_j^\top X^{-j} + \epsilon,$$
we have $\beta_{jj'} = -\dfrac{S_{jj'}}{S_{jj}}$. So edges (nonzero partial correlations) also correspond to nonzero coefficients in the p regression models above (for j = 1, …, p).
To ensure sparsity of $\beta_j$: Meinshausen and Bühlmann (2006)
$$\arg\min_{\beta_j} \sum_{i=1}^{n} \left( x_i^j - \beta_j^\top x_i^{-j} \right)^2 + \lambda \|\beta_j\|_{\ell_1}$$
Including prior knowledge in this model
Suppose that we have some clues that:
• for some pairs $(j, j')$ (type 1), an edge is likely to occur between j and j′;
• for some pairs $(j, j')$ (type 2), it is likely that there is no edge between j and j′;
then we want to drive $\beta_{jj'}$:
• toward ±a for type-1 pairs, with a some positive value (the sign being that of the correlation $\mathrm{Cor}(X^j, X^{j'})$);
• toward 0 for type-2 pairs.
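Including another penalty in the model
$$\arg\min_{\beta_j} \sum_{i=1}^{n} \left( x_i^j - \beta_j^\top x_i^{-j} \right)^2 + \lambda \|\beta_j\|_{\ell_1} + \mu \left( \sum_{j' \text{ of type 1}} \left( \beta_{jj'} - s_{jj'} a \right)^2 + \sum_{j' \text{ of type 2}} \beta_{jj'}^2 \right)$$
with $s_{jj'} = \mathrm{sign}\left( \mathrm{Cor}(X^j, X^{j'}) \right)$: a smooth penalty for co-localized (type 1) or non-co-localized (type 2) pairs.
In practice:
• a = 1 (after scaling of gene expressions)
• λ chosen by stability selection based on bootstrap, for a fixed μ
• μ chosen as the minimum value for which the prior knowledge is exactly recovered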
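A sketch of how the additional quadratic term changes a coordinate-descent update, under the same assumptions as a plain lasso solver (an illustration, not the authors' implementation; stability selection for λ and the search for μ are not shown). Minimizing in a single coordinate k with prior target $t_k$ (±a for type-1 pairs, 0 for type-2 pairs) gives $\beta_k = S(2\rho_k + 2\mu t_k, \lambda) / (2\|x_k\|^2 + 2\mu)$, where $S$ is soft-thresholding; coordinates without prior information keep μ = 0. The helper `prior_target` (a hypothetical name) sets the sign of a from the empirical correlation.

```python
def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def prior_target(xj, xk, a=1.0):
    """Target +a or -a for a type-1 pair, with the sign of the empirical correlation."""
    mj, mk = sum(xj) / len(xj), sum(xk) / len(xk)
    cov = sum((u - mj) * (v - mk) for u, v in zip(xj, xk))
    return a if cov >= 0 else -a


def prior_lasso(columns, y, lam, mu, targets, n_iter=1000, tol=1e-10):
    """Coordinate descent for
        sum_i (y_i - beta^T x_i)^2 + lam * ||beta||_1 + mu * sum_{k in targets} (beta_k - targets[k])^2

    targets: dict {column index: target}; +/-a for type-1 pairs, 0 for type-2 pairs.
    Columns absent from `targets` are only penalized by the lasso term.
    """
    beta = [0.0] * len(columns)
    resid = list(y)
    sq_norms = [sum(v * v for v in col) for col in columns]
    for _ in range(n_iter):
        max_change = 0.0
        for k, col in enumerate(columns):
            rho = sum(c * r for c, r in zip(col, resid)) + sq_norms[k] * beta[k]
            m, t = (mu, targets[k]) if k in targets else (0.0, 0.0)
            # closed-form minimizer in beta_k of the penalized 1-D problem
            new = soft_threshold(2 * rho + 2 * m * t, lam) / (2 * sq_norms[k] + 2 * m)
            delta = new - beta[k]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[k] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Toy centered data: y = 0.5 * x0 exactly; x1 is unrelated.
x0 = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
x1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
y = [0.5 * v for v in x0]
print(prior_lasso([x0, x1], y, lam=0.0, mu=0.0, targets={}))        # close to [0.5, 0.0]
print(prior_lasso([x0, x1], y, lam=0.0, mu=1e6, targets={1: 1.0}))  # beta_1 pulled to +1
print(prior_lasso([x0, x1], y, lam=0.0, mu=1e6, targets={0: 0.0}))  # beta_0 pulled to 0
```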
2. Coming back to our problem: gene expression and FISH experiments
Starting point
• restricted to a list of genes likely involved in fetal development (1,131 DEGs between 90 and 110 days of gestation, as found in Voillet et al. (2014))
• started from an even more restricted list including the genes of interest (IGF2, DLK1 and MEG3) and the genes highly correlated with them (p = 359 genes in the end)
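A sketch of the selection step, assuming a gene is kept when its squared Pearson correlation with at least one target gene reaches the threshold (the iteration slides report R² ≥ 0.84 for IGF2, DLK1 and MEG3; whether the criterion applies to one target or to all of them is not stated, so the "at least one" rule is an assumption). The data are toy values with a single hypothetical target T.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def select_correlated(expr, targets, r2_min):
    """Targets plus every gene whose squared correlation with at least one target >= r2_min."""
    keep = set(targets)
    for gene, values in expr.items():
        if gene not in keep and any(pearson(values, expr[t]) ** 2 >= r2_min for t in targets):
            keep.add(gene)
    return keep


expr = {
    "T": [1.0, 2.0, 3.0, 4.0, 5.0],
    "a": [2.0, 4.0, 6.0, 8.0, 10.0],   # r = 1
    "b": [5.0, 4.0, 3.0, 2.0, 1.0],    # r = -1, kept since R^2 = 1
    "c": [1.0, -1.0, 1.0, -1.0, 1.0],  # r = 0
}
print(sorted(select_correlated(expr, ["T"], r2_min=0.84)))  # ['T', 'a', 'b']
```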
Iterative process: from co-location to network and conversely
What to do with these networks? Network mining…
1. Node importance
2. Clustering of nodes (and comparison of clusterings with normalized mutual information, NMI)
3. GO analysis
Node importance
For every node, computation of:
• degree (number of edges incident to the node)
• betweenness centrality measure
In the example graph, the orange node's degree is equal to 2 and its betweenness to 4.
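A standard-library sketch of both measures for an unweighted, undirected graph: the degree is the size of each adjacency set, and betweenness is computed with Brandes' algorithm (one breadth-first search per source, then backward accumulation of pair dependencies). Betweenness here sums, over unordered pairs of other nodes, the fraction of their shortest paths that pass through the node, without normalization. On a path of five nodes, the middle node has degree 2 and betweenness 4, the same values as the orange node (the slide's graph itself is not reproduced).

```python
from collections import deque


def adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Brandes' algorithm, unweighted and undirected (unnormalized, unordered pairs)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors
        stack, preds = [], {v: [] for v in adj}
        sigma, dist = dict.fromkeys(adj, 0), dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # accumulate dependencies, farthest nodes first
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # each unordered pair {s, t} was counted once from each end
    return {v: b / 2.0 for v, b in bc.items()}


# Path a - b - o - c - d: the middle node "o" has degree 2 and betweenness 4
adj = adjacency([("a", "b"), ("b", "o"), ("o", "c"), ("c", "d")])
print(degree(adj)["o"], betweenness(adj)["o"])  # 2 4.0
```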
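Find clusters by modularity optimization
The modularity (Newman and Girvan, 2004) of the partition $(C_1, \dots, C_K)$ is equal to:
$$Q(C_1, \dots, C_K) = \frac{1}{2m} \sum_{k=1}^{K} \sum_{x_i, x_j \in C_k} \left( W_{ij} - P_{ij} \right)$$
with $P_{ij}$ the weight of a "null model" (graph with the same degree distribution but no preferential attachment):
$$P_{ij} = \frac{d_i d_j}{2m}$$
with $d_i = \sum_{j \neq i} W_{ij}$ the degree of node i and $m = \frac{1}{2} \sum_i d_i$ the total weight of the edges.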
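A direct transcription of the modularity formula into Python, as a sketch for small dense weight matrices, with $d_i = \sum_{j \neq i} W_{ij}$ and $2m = \sum_i d_i$. The inner sum runs over all ordered pairs of nodes in the same cluster, including i = j, which contributes only $-P_{ii}$ since $W_{ii} = 0$. For two triangles joined by one edge, m = 7 and the partition into the two triangles gives Q = 5/14.

```python
def modularity(W, labels):
    """Newman-Girvan modularity of a partition of a weighted, undirected graph.

    W: symmetric weight matrix with zero diagonal; labels[i]: cluster of node i.
    """
    n = len(W)
    d = [sum(W[i][j] for j in range(n) if j != i) for i in range(n)]  # degrees d_i
    two_m = sum(d)                                                      # 2m
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += W[i][j] - d[i] * d[j] / two_m  # W_ij - P_ij
    return q / two_m


def weight_matrix(n, edges):
    """Unweighted, undirected graph as a dense 0/1 matrix."""
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        W[i][j] = W[j][i] = 1.0
    return W


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3
W = weight_matrix(6, [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
print(round(modularity(W, [0, 0, 0, 1, 1, 1]), 4))  # 0.3571 (= 5/14)
print(round(modularity(W, [0, 0, 1, 1, 2, 2]), 4))  # 0.0816: this split cuts both triangles
```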
Interpretation of the modularity
A good clustering should maximize the modularity:
• Q ↗ when $(x_i, x_j)$ are in the same cluster and $W_{ij} \gg P_{ij}$
• Q ↘ when $(x_i, x_j)$ are in two different clusters and $W_{ij} \gg P_{ij}$
Example (m = 20): $d_i = 15$ and $d_j = 20$ ⇒ $P_{ij} = \frac{15 \times 20}{40} = 7.5$; with $W_{ij} = 5$, $W_{ij} - P_{ij} = -2.5$: putting i and j in the same cluster decreases the modularity.
Example (m = 20): $d_i = 1$ and $d_j = 2$ ⇒ $P_{ij} = \frac{1 \times 2}{40} = 0.05$; with $W_{ij} = 5$, $W_{ij} - P_{ij} = 4.95$: putting i and j in the same cluster increases the modularity.
• Modularity:
  • helps separate hubs;
  • is not an increasing function of the number of clusters, which makes it useful for choosing the relevant number of clusters.
Approximate optimization with the Louvain algorithm (Blondel et al., 2008), among others.
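The Louvain method alternates a local-moving phase and an aggregation phase. The sketch below implements only a simplified local-moving phase (each node in turn joins the neighboring cluster that most increases the modularity, until no move helps) and recomputes Q from scratch at each trial: adequate for a toy graph, far too slow for real networks, and not the reference implementation of Blondel et al. (2008).

```python
def modularity(W, labels):
    """Newman-Girvan modularity; W symmetric with zero diagonal."""
    n = len(W)
    d = [sum(W[i][j] for j in range(n) if j != i) for i in range(n)]
    two_m = sum(d)
    return sum(W[i][j] - d[i] * d[j] / two_m
               for i in range(n) for j in range(n) if labels[i] == labels[j]) / two_m


def local_moving(W):
    """Greedy moves of single nodes to neighboring clusters while modularity increases
    (first phase of the Louvain method, without the aggregation phase)."""
    n = len(W)
    labels = list(range(n))  # start from singletons
    improved = True
    while improved:
        improved = False
        for i in range(n):
            best_label, best_q = labels[i], modularity(W, labels)
            neighbor_labels = sorted({labels[j] for j in range(n) if j != i and W[i][j] > 0})
            for c in neighbor_labels:
                if c == labels[i]:
                    continue
                trial = labels[:i] + [c] + labels[i + 1:]
                q = modularity(W, trial)
                if q > best_q + 1e-12:  # strict improvement only, so the loop terminates
                    best_label, best_q = c, q
            if best_label != labels[i]:
                labels[i] = best_label
                improved = True
    return labels


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3
W = [[0.0] * 6 for _ in range(6)]
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i][j] = W[j][i] = 1.0
labels = local_moving(W)
print(labels, round(modularity(W, labels), 4))
```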
Network inference iteration and 3D FISH validations
359 DEGs were selected for being highly correlated with IGF2, DLK1 and MEG3 (R² ≥ 0.84)
Networks 0 to 3, each with 359 nodes
Network 0, without a priori: 2,279 edges (density: 3.55%)
Figure: full network and sub-network extracted around the three target genes
Marti-Marimon et al., 2018.
A priori 1 (co-localized pairs): IGF2–DLK1, IGF2–MEG3, DLK1–MEG3
Figure: Network 0, no a priori (359 nodes, 2,279 edges)
Lahbib-Mansais et al., 2016
Marti-Marimon et al., 2018.
Network 1, with triple co-localization of IGF2, DLK1 and MEG3: 2,250 edges (density: 3.50%)
3D FISH test: IGF2 and RPL32 were associated in 20% of the analysed nuclei (threshold: 10%)
Marti-Marimon et al., 2018.
Network 2, with tests of MEST and DCN associations: 2,091 edges (density: 3.25%)
Figure: 3D FISH tests versus network inference: 34% (+), 10% (−)
Marti-Marimon et al., 2018.
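The reported densities match the usual definition for an undirected graph without self-loops, density = E / (p(p − 1)/2), with p = 359 nodes. A quick check (the function name is illustrative):

```python
def density(n_nodes, n_edges):
    """Edge density of an undirected graph without self-loops."""
    return n_edges / (n_nodes * (n_nodes - 1) / 2)


for name, n_edges in [("Network 0", 2279), ("Network 1", 2250),
                      ("Network 2", 2091), ("Network 3", 2091)]:
    print(f"{name}: {100 * density(359, n_edges):.2f}%")
# Network 0: 3.55%
# Network 1: 3.50%
# Network 2: 3.25%
# Network 3: 3.25%
```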
Network 3, with tests of co-localization with MYH3 (Networks 0 and 1)
Figure: Networks 0 and 1
Marti-Marimon et al., 2018.
MYH3 = embryonic myosin, an excellent biomarker of muscle maturity (Voillet et al., 2018)
No functional link with IGF2 is known!
Marti-Marimon et al., 2018.
Network 3: 2,091 edges (density: 3.25%)
Figure: 3D FISH tests versus network inference: 52% (+), 45% (+), 26% (+)
Marti-Marimon et al., 2018.
Network mining (network structure with key genes)
The degree of a node is the number of edges incident to this gene. High-degree genes are connected to many other genes (hubs).
The betweenness of a node counts the shortest paths between pairs of genes in the network that pass through that gene (each pair contributing the fraction of its shortest paths that do). High-betweenness genes are central and more likely to disconnect the network if removed.
Degree (D) and betweenness (B) of key genes in Networks 0 to 3 (D0 to B3), and variation from Network 0 to Network 3 (ΔD, ΔB, in %):
Gene     D0      B0 D1      B1 D2      B2 D3      B3     ΔD     ΔB
ADIPOR2  15  646.65 14  487.32 15  628.78 14  660.97     -7      2
AKR7A2   19  492.63 17  436.71 15  474.10 14  291.90    -26    -41
CD81     17  551.17 18   616.7 15  478.76 17  600.58      0      9
CRAT     19  716.24 15  518.26 16  738.30 14  573.58    -26    -20
DCN      16  438.86 18  560.83  9  288.82  6  357.74    -63    -18
DLK1     10  103.52  6    81.7  5   74.22  5   24.13    -50    -77
DPP4     15  568.91 16  672.01 15  674.94 15  597.87      0      5
EGFR     16  624.92 12  375.87 12  385.35 11  354.78    -31    -43
GHITM    16  578.58 17  588.76 16  592.35 14  496.63    -13    -14
GLUD1    13  575.69 13  553.28 12  574.48 12  586.27     -8      2
IGF2     10  118.26 11  231.09  8  260.58  7  622.44    -30    426
LPAR4    14  464.31 17  644.76 18  812.81 16  798.82     14     72
MEG3     13  282.32  5   55.75  6  120.18  5   24.13    -62    -91
MESP1    12  228.49 14  320.34 14  483.27 14  775.31     17    239
MEST     13   148.2 12  121.44 10  345.69  7  385.27    -46    160
MRPS28   16     743 15  743.29 16  953.42 15  796.14     -6      7
MYH3     14  610.73 14   656.6 11  455.62  4    0.00    -71   -100
NMNAT3   17  562.63 18  664.84 16  473.55 17  573.15      0      2
RAVER1   16  613.84 16  665.73 16  696.35 16  745.66      0     21
RPL32    18  717.96 15  557.65  7  149.80  5  243.11    -72    -66
SELO     18  692.52 14  438.35 14  459.46 15  587.32    -17    -15
SYDE1    15  436.75 17  530.29 14  459.66 18  745.52     20     71
TFRC     15   595.1 15  534.83 13  437.38 17  846.81     13     42
TYRO3    20  785.95 18   659.9 16  603.94 17  700.03    -15    -11
YWHAB    20  670.22 17  470.35 17  538.41 17  547.17    -15    -18
Marti-Marimon et al., 2018.
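The last two columns can be recomputed as 100 × (Network 3 value − Network 0 value) / Network 0 value, rounded to an integer. A sketch on four rows of the table; rounding halves away from zero is an assumption that reproduces the printed −13 for GHITM (Python's built-in `round()` rounds −12.5 to −12).

```python
import math


def pct_change(before, after):
    """Percentage variation from Network 0 (before) to Network 3 (after)."""
    return 100.0 * (after - before) / before


def round_half_away(x):
    """Nearest integer, with halves rounded away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


# gene: (degree N0, degree N3, betweenness N0, betweenness N3), values from the table
rows = {
    "ADIPOR2": (15, 14, 646.65, 660.97),
    "GHITM": (16, 14, 578.58, 496.63),
    "IGF2": (10, 7, 118.26, 622.44),
    "MYH3": (14, 4, 610.73, 0.00),
}
for gene, (d0, d3, b0, b3) in rows.items():
    print(gene, round_half_away(pct_change(d0, d3)), round_half_away(pct_change(b0, b3)))
# ADIPOR2 -7 2
# GHITM -13 -14
# IGF2 -30 426
# MYH3 -71 -100
```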
Network clustering
           Network 0  Network 1  Network 2  Network 3
Network 0  1          0.3893     0.3381     0.3244
Network 1  0.3893     1          0.4007     0.3923
Network 2  0.3381     0.4007     1          0.4152
Network 3  0.3244     0.3923     0.4152     1
To analyse the evolution of the network structure from Network 0 to Network 3, the genes were clustered in each network.
Normalized mutual information (NMI) measures the similarity between two clusterings: its value lies between 0 and 1 and equals 1 when the two clusterings are identical.
Clusterings become more consistent as new biological information is introduced at each network inference iteration (NMI between consecutive networks: 0.3893, 0.4007, 0.4152).
Marti-Marimon et al., 2018.
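NMI comes in several normalizations and the slides do not say which one produced the table; the sketch below assumes the arithmetic-mean version, $\mathrm{NMI}(U, V) = 2I(U;V) / (H(U) + H(V))$, computed from cluster labels with natural logarithms (the base cancels out).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a clustering given as a list of labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """Mutual information between two clusterings of the same items."""
    n = len(u)
    cu, cv, cuv = Counter(u), Counter(v), Counter(zip(u, v))
    return sum(c / n * math.log(c * n / (cu[a] * cv[b])) for (a, b), c in cuv.items())


def nmi(u, v):
    """Normalized mutual information 2 I(U; V) / (H(U) + H(V)), between 0 and 1."""
    hu, hv = entropy(u), entropy(v)
    if hu + hv == 0.0:
        return 1.0  # both clusterings are a single cluster: identical partitions
    return 2.0 * mutual_information(u, v) / (hu + hv)


print(round(nmi([0, 0, 1, 1], [5, 5, 7, 7]), 3))  # 1.0: same partition, other labels
print(round(nmi([0, 0, 1, 1], [0, 1, 0, 1]), 3))  # 0.0: independent partitions
```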
Network clustering: Networks 0 and 3 were analysed in depth to
search for any correspondence between clusters
Pairwise contingency table between clusterings: percentage of the genes of each Network 0 cluster found in each Network 3 cluster. In bold and red: the most similar pairs of clusters.
Marti-Marimon et al., 2018.
Rows: clusters in Network 0; columns: clusters in Network 3 (% of genes)
        1      2      3      4      5      6
1   64.10   7.69   7.69   2.56   7.69  10.26
2    8.77  68.42   0.00   1.75  19.30   1.75
3   14.89   0.00  65.96  19.15   0.00   0.00
4    3.92   1.96  11.76  82.35   0.00   0.00
5   34.09   6.82   4.55  11.36  43.18   0.00
6    3.57  17.86  14.29  32.14   0.00  32.14
7   11.11  38.89   0.00  11.11  38.89   0.00
8    0.00  48.72  33.33  10.26   5.13   2.56
9    5.56  11.11  16.67  27.78  38.89   0.00
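A sketch of how such a row-normalized contingency table can be computed from two cluster assignments of the same genes (toy labels for six hypothetical genes, not the actual clusterings):

```python
from collections import Counter, defaultdict


def row_percentages(clusters_a, clusters_b):
    """For each cluster of partition A, the percentage of its genes falling in each
    cluster of partition B (each row sums to 100, up to rounding)."""
    counts = defaultdict(Counter)
    for a, b in zip(clusters_a, clusters_b):
        counts[a][b] += 1
    table = {}
    for a, row in counts.items():
        total = sum(row.values())
        table[a] = {b: round(100 * c / total, 2) for b, c in sorted(row.items())}
    return table


net0 = [1, 1, 1, 2, 2, 2]  # cluster of each gene in Network 0
net3 = [1, 1, 2, 2, 2, 2]  # cluster of the same genes in Network 3
print(row_percentages(net0, net3))  # {1: {1: 66.67, 2: 33.33}, 2: {2: 100.0}}
```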
Functional enrichment analysis: Gene Ontology Biological Process
Marti-Marimon et al., 2018.
Network 0, cluster 1 vs Network 3, cluster 1
GOBP term | FDR (Network 0) | FDR (Network 3) | Target gene
Extracellular structure | 5.76E-05 | 1.14E-08 | DCN
Cellular response to organonitrogen compound | 6.80E-04 | 1.16E-02 | IGF2
Response to transforming growth factor beta | 2.35E-03 | 1.24E-01
Multicellular organism metabolic process | 2.35E-03 | 3.05E-03
Skin development | 3.18E-03 | 1.44E-01
Neuron migration | 2.82E-02 | 4.37E-01
Regulation of neuron projection development | 3.07E-02 | 4.93E-01
Mesoderm development | 1.24E-01 | MEST
Muscle organ development | 8.35E-01 | MYH3
Notch signaling pathway | 5.56E-01 | DLK1
Collagen fibril organization | 1.10E-04 | 1.02E-05
Network 0, cluster 8 vs Network 3, cluster 2
GOBP term | FDR (Network 0) | FDR (Network 3)
Generation of precursor metabolites and energy | 1.64E-02 | 1.32E-07
Oxidation-reduction process | 7.25E-03 | 5.63E-09
Energy derivation by oxidation of organic compounds | 8.17E-03 | 1.88E-06
Cellular respiration | 8.17E-03 | 2.65E-07
Functional enrichment analysis based on Gene Ontology was performed using the web tool WebGestalt (WEB-based GEne SeT AnaLysis Toolkit).
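Enrichment FDRs of this kind are usually obtained from an over-representation test per GO term followed by a multiple-testing correction. The sketch below shows that generic computation (one-sided hypergeometric test, then Benjamini-Hochberg adjustment); it is not WebGestalt's code, and the gene counts in the example are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: N genes, K annotated to the term, n drawn (cluster)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 359 genes in the network, 20 annotated to a GO term,
# and a cluster of 60 genes containing 10 of them
print(hypergeom_sf(10, N=359, K=20, n=60))
print([round(p, 4) for p in benjamini_hochberg([0.01, 0.04, 0.03, 0.005])])  # [0.02, 0.04, 0.04, 0.02]
```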
Functional enrichment analysis: reconstructed network of genes in
cluster 1 of Network 3 with Ingenuity Pathway Analysis (IPA)
Marti-Marimon et al., 2018.
Villa-Vialaneix et al., 2013
IPA connected 49 (82%) of the 60 genes in a network.
MYOD1 and CTNNB1 were identified by upstream regulator analysis as potential transcription factors for a group of genes including IGF2 and MYH3.
“Cell Morphology”, 14 genes, p-value = 1.75e-08
“Quantity of cells”, 31 genes, p-value = 2.48e-09
“Morphology of connective tissue cells”, 8 genes, p-value = 1.27e-04
“Formation of muscle”, 10 genes, p-value = 2.98e-05, involving IGF2 and MYH3 together with CTNNB1 and MYOD1.
• 82% of the edges of Network 0 were conserved in Network 3.
• The most important genes in Network 0 were among those showing the highest values of betweenness and degree in Network 3.
No major disturbance of the network structure.
• In the local analysis, the NMI values revealed that the clusterings resembled one another more with each new network inferred.
• Four out of six clusters in the final network conserved more than 62% of the genes of the corresponding clusters in Network 0.
• The pairs IGF2–MEST, (DLK1/MEG3)–MEST and (DLK1/MEG3)–DCN have also been observed to be connected in co-expression networks in other studies.
• DLK1, MEG3, RPL32, MEST, DCN and MYH3 were less connected to the other genes in Network 3, but IGF2 was not.
• No association between IGF2 and MYH3 had been reported before, even though both genes are known to be involved in muscle development: overexpression and accumulation of β-catenin in the nuclei of differentiating murine myoblasts result in higher MyoD activation and Myhc induction (Ramazzotti et al., 2016).
Conclusions
Marti-Marimon et al., 2018.
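The share of conserved edges (82% of the Network 0 edges found in Network 3) is a set computation on undirected edges. A sketch with toy, hypothetical edge lists (g1, g2, … are placeholder names, not genes from the study):

```python
def edge_set(edges):
    """Undirected edges as frozensets, so that (a, b) and (b, a) are the same edge."""
    return {frozenset(e) for e in edges}


def conserved_fraction(edges_ref, edges_new):
    """Share of the reference network's edges that are also in the new network."""
    ref, new = edge_set(edges_ref), edge_set(edges_new)
    return len(ref & new) / len(ref)


net0 = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")]
net3 = [("g2", "g1"), ("g2", "g3"), ("g3", "g4"), ("g1", "g5")]
print(conserved_fraction(net0, net3))  # 0.75
```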
• What is published and what is not…
• Intermediate models are retained as valuable information on robust or non-robust interactions; new interactions are currently being tested by 3D FISH
• Dramatic change in gene expression at the end of gestation → search for genome-wide interactions (Maria Marti-Marimon's thesis)
Conclusions - Perspectives
Whole-genome interaction maps
3D chromosome conformation capture
Hi-C in progress
Thank you for your attention!
Blondel, V., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008.
Butte, A. and Kohane, I. (1999). Unsupervised knowledge
discovery in medical databases using relevance networks. In
Proceedings of the AMIA Symposium, pages 711–715.
Butte, A. and Kohane, I. (2000). Mutual information
relevance networks: functional genomic clustering using
pairwise entropy measurements. In Proceedings of the
Pacific Symposium on Biocomputing, pages 418–429.
Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34(3):1436–1462.
Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69:026113.
Voillet, V., SanCristobal, M., Lippi, Y., Martin, P. G.,
Iannuccelli, N., Lascor, C., Vignoles, F., Billon, Y., Canario,
L., and Liaubet, L. (2014). Muscle transcriptomic
investigation of late fetal development identifies candidate
genes for piglet maturity. BMC Genomics, 15:797.
21

More Related Content

PDF
'ACCOST' for differential HiC analysis
PDF
La statistique et le machine learning pour l'intégration de données de la bio...
PDF
Kernel methods for data integration in systems biology
PDF
Explanable models for time series with random forest
PDF
Selective inference and single-cell differential analysis
PDF
A short and naive introduction to epistasis in association studies
PDF
Reproducibility and differential analysis with selfish
PDF
Learning from (dis)similarity data
'ACCOST' for differential HiC analysis
La statistique et le machine learning pour l'intégration de données de la bio...
Kernel methods for data integration in systems biology
Explanable models for time series with random forest
Selective inference and single-cell differential analysis
A short and naive introduction to epistasis in association studies
Reproducibility and differential analysis with selfish
Learning from (dis)similarity data

What's hot (20)

PDF
Graph Neural Network for Phenotype Prediction
PDF
A short introduction to statistical learning
PDF
Multiple kernel learning applied to the integration of Tara oceans datasets
PDF
An introduction to neural networks
PDF
Study of Different Multi-instance Learning kNN Algorithms
PPTX
New Insights and Applications of Eco-Finance Networks and Collaborative Games
PDF
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
PDF
Dimensionality reduction by matrix factorization using concept lattice in dat...
PDF
An introduction to neural network
PDF
Irene Martelli - PhD presentation
PDF
Analysis On Classification Techniques In Mammographic Mass Data Set
PDF
COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS...
PPT
Tinkerplots
PDF
Improved probabilistic distance based locality preserving projections method ...
PDF
B colouring
PDF
Algorithms 14-00122
PPT
P1121133727
PDF
Similarity encoding for learning on dirty categorical variables
PDF
4 データ間の距離と類似度
PDF
IRJET- Agricultural Crop Classification Models in Data Mining Techniques
Graph Neural Network for Phenotype Prediction
A short introduction to statistical learning
Multiple kernel learning applied to the integration of Tara oceans datasets
An introduction to neural networks
Study of Different Multi-instance Learning kNN Algorithms
New Insights and Applications of Eco-Finance Networks and Collaborative Games
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
Dimensionality reduction by matrix factorization using concept lattice in dat...
An introduction to neural network
Irene Martelli - PhD presentation
Analysis On Classification Techniques In Mammographic Mass Data Set
COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS...
Tinkerplots
Improved probabilistic distance based locality preserving projections method ...
B colouring
Algorithms 14-00122
P1121133727
Similarity encoding for learning on dirty categorical variables
4 データ間の距離と類似度
IRJET- Agricultural Crop Classification Models in Data Mining Techniques
Ad

Similar to Combining co-expression and co-location for gene network inference in porcine muscle development in late gestation (20)

PDF
presentation
PDF
Consensual gene co-expression network inference with multiple samples
PDF
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
PPT
Cornell Pbsb 20090126 Nets
PDF
Mining co-expression network
PDF
High-Dimensional Machine Learning for Medicine
PDF
Network analysis for computational biology
PPTX
A simple Introduction to Explainability in Machine Learning and AI (XAI)
PDF
Inferring networks from multiple samples with consensus LASSO
PPTX
R Packages Unpacked
PPTX
Microarray and its application
PPT
Large scale machine learning challenges for systems biology
PDF
Lecture10 xing
PPTX
NetBioSIG2014-Talk by Ashwini Patil
PDF
A comparative study of covariance selection models for the inference of gene ...
PDF
BOSE, Debasish - Research Plan
PDF
Protein-protein interactions-graph-theoretic-modeling
PDF
Biological Network Inference via Gaussian Graphical Models
PDF
Relational machine-learning
PPT
Project3.ppt
presentation
Consensual gene co-expression network inference with multiple samples
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
Cornell Pbsb 20090126 Nets
Mining co-expression network
High-Dimensional Machine Learning for Medicine
Network analysis for computational biology
A simple Introduction to Explainability in Machine Learning and AI (XAI)
Inferring networks from multiple samples with consensus LASSO
R Packages Unpacked
Microarray and its application
Large scale machine learning challenges for systems biology
Lecture10 xing
NetBioSIG2014-Talk by Ashwini Patil
A comparative study of covariance selection models for the inference of gene ...
BOSE, Debasish - Research Plan
Protein-protein interactions-graph-theoretic-modeling
Biological Network Inference via Gaussian Graphical Models
Relational machine-learning
Project3.ppt
Ad

More from tuxette (20)

PDF
Analyse comparative de données de génomique 3D
PDF
Detecting differences between 3D genomic data: a benchmark study
PDF
Racines en haut et feuilles en bas : les arbres en maths
PDF
Méthodes à noyaux pour l’intégration de données hétérogènes
PDF
Méthodologies d'intégration de données omiques
PDF
Projets autour de l'Hi-C
PDF
Can deep learning learn chromatin structure from sequence?
PDF
Multi-omics data integration methods: kernel and other machine learning appro...
PDF
ASTERICS : une application pour intégrer des données omiques
PDF
Autour des projets Idefics et MetaboWean
PDF
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
PDF
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
PDF
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
PDF
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
PDF
Journal club: Validation of cluster analysis results on validation data
PDF
Overfitting or overparametrization?
PDF
SOMbrero : un package R pour les cartes auto-organisatrices
PDF
A short and naive introduction to using network in prediction models
PDF
Présentation du projet ASTERICS
PDF
Présentation du projet ASTERICS
Analyse comparative de données de génomique 3D
Detecting differences between 3D genomic data: a benchmark study
Racines en haut et feuilles en bas : les arbres en maths
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodologies d'intégration de données omiques
Projets autour de l'Hi-C
Can deep learning learn chromatin structure from sequence?
Multi-omics data integration methods: kernel and other machine learning appro...
ASTERICS : une application pour intégrer des données omiques
Autour des projets Idefics et MetaboWean
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Journal club: Validation of cluster analysis results on validation data
Overfitting or overparametrization?
SOMbrero : un package R pour les cartes auto-organisatrices
A short and naive introduction to using network in prediction models
Présentation du projet ASTERICS
Présentation du projet ASTERICS

Recently uploaded (20)

PPTX
microscope-Lecturecjchchchchcuvuvhc.pptx
PDF
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
PPTX
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
PDF
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
PDF
The scientific heritage No 166 (166) (2025)
PPTX
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
PPTX
Introduction to Fisheries Biotechnology_Lesson 1.pptx
PPTX
BIOMOLECULES PPT........................
PDF
AlphaEarth Foundations and the Satellite Embedding dataset
PPT
POSITIONING IN OPERATION THEATRE ROOM.ppt
PPTX
Introduction to Cardiovascular system_structure and functions-1
PPTX
INTRODUCTION TO EVS | Concept of sustainability
PPTX
ECG_Course_Presentation د.محمد صقران ppt
PDF
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
PPTX
2. Earth - The Living Planet earth and life
PPTX
TOTAL hIP ARTHROPLASTY Presentation.pptx
PPTX
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
PPT
protein biochemistry.ppt for university classes
PPTX
Microbiology with diagram medical studies .pptx
PDF
HPLC-PPT.docx high performance liquid chromatography
microscope-Lecturecjchchchchcuvuvhc.pptx
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
The scientific heritage No 166 (166) (2025)
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
Introduction to Fisheries Biotechnology_Lesson 1.pptx
BIOMOLECULES PPT........................
AlphaEarth Foundations and the Satellite Embedding dataset
POSITIONING IN OPERATION THEATRE ROOM.ppt
Introduction to Cardiovascular system_structure and functions-1
INTRODUCTION TO EVS | Concept of sustainability
ECG_Course_Presentation د.محمد صقران ppt
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
2. Earth - The Living Planet earth and life
TOTAL hIP ARTHROPLASTY Presentation.pptx
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
protein biochemistry.ppt for university classes
Microbiology with diagram medical studies .pptx
HPLC-PPT.docx high performance liquid chromatography

Combining co-expression and co-location for gene network inference in porcine muscle development in late gestation

  • 1. Combining co-expression and co-location for gene network inference in porcine muscle development in late gestation Laurence Liaubet, Nathalie Vialaneix October 14th, 2019 - NETBIO Unités MIAT & GenPhySE, INRA Toulouse
  • 2. General context, an interdisciplinary network A fundamental molecular question, the regulation of gene expression An ethical and economic breeding problem, survival at birth A statistical question, modelization of gene co-expression with adding biological information
  • 3. Interpretation of phenotypes not explained only by genetic approach. The same genome sequence produce a wide range of differentiated cells. Modulation of gene expression: Epigenetic marks and chromatin regulation, cis and trans regulation  3D nuclear topography…… Response to physiological context: growth, health, reproduction, adaptation Biological context: a fundamental molecular question
  • 4. Fanucchi et al. Cell 2013 Hierarchical Transcription in a Multigene Complex Biological context: a fundamental molecular question
  • 5. Thèse de Maria Marti-Marimon, 2018 Biological context: an increased mortality at birth 14 % of newborns died between birth and weaning Peak of mortality in the first two days after birth maturity The selection for more prolificacy and meat production has been accompanied by a substantial increase in mortality of piglets at birth
  • 6. Wilson et al., 1998 ; Biensen et al., 1998 ; Leenhouwers et al., 2002 ; Canario, 2006 ; Thèse de Valentin Voillet, 2016 Specific mechanisms during late gestation in pigs Maturity = plain development allowing survival at birth
  • 7. Specific mechanisms during late gestation in pigs – muscle tissue d114 d90 d110 Sampling of longissimus (n=459) ANR project PORCINET (2010) Systems biology of piglet maturity  Transcriptomic analysis (n = 64)
  • 8. A dramatic switch in gene expression occurs in late gestation: 5,167 genes are differentially expressed between 90 and 110 days of gestation in fetal muscle (Bonferroni 1%), with 1,131 DEGs for the age × genotype interaction (maturity), as found in Voillet et al. (2014). Figure: PCA without variable selection.
  • 9. IGF2, a gene of fundamental importance in pig muscle development. IGF2 (paternally expressed), insulin-like growth factor 2. Pig: QTL for adiposity and muscle mass. Human: fetal growth and intrauterine growth restriction. Figure: IGF2 expression for the LWLW, MSLW, LWMS and MSMS genotypes at 90 and 110 days of gestation. Co-expression = co-regulation? = nuclear co-localization? Figure labels: RNA, DNA, pig chromosome, gene 1, gene 2 (Marti-Marimon et al., 2018).
  • 10. Outline: 1. Network inference with GGM; 2. Coming back to our problem: gene expression and FISH experiments.
  • 11. In short, what are we looking for? Hypothesizing that co-expression is related to co-location, we want to: • have an automated process to find relevant pairs of genes for which co-location can be tested (because FISH experiments are time-consuming, targeted experiments); • improve network inference using co-location information.
  • 12. 1. Network inference with GGM
  • 14. Framework. Data: large-scale gene expression data, $X = (X_i^j)_{i=1,\ldots,n;\ j=1,\ldots,p}$, an $n \times p$ matrix whose rows are the $n$ individuals and whose columns are the $p$ variables (gene expressions). Here: micro-array experiment, n = 61 (gestational age: 90 days) and p = 13,855 uniquely annotated genes. What we want to obtain: a network with • nodes: genes; • edges: strong and direct co-expression between two genes (to track transcriptional regulation).
  • 15. Using correlations: relevance network (Butte and Kohane, 1999, 2000). First (naive) approach: calculate the correlations between expressions for all pairs of genes, threshold out the smallest ones and build the network from the remaining pairs. Figure: correlations → thresholding → graph.
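A minimal R sketch of the relevance-network idea, assuming Pearson correlation and a hypothetical cut-off of 0.8 on the absolute correlation (the slides do not give a threshold). `relevance_network` is an illustrative helper, not a function from any package, and the simulated genes are made up.

```r
# Relevance network: correlate all pairs of genes, keep only strong correlations
relevance_network <- function(X, threshold) {
  # X: n x p matrix, rows = individuals, columns = genes
  R <- cor(X)                     # p x p Pearson correlation matrix
  A <- abs(R) >= threshold        # TRUE where the pair is kept as an edge
  diag(A) <- FALSE                # no self-loops
  A
}

set.seed(1)
n <- 61; p <- 5
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("gene", 1:p)))
X[, 2] <- X[, 1] + rnorm(n, sd = 0.1)    # gene2 strongly tied to gene1
A <- relevance_network(X, threshold = 0.8)
which(A & upper.tri(A), arr.ind = TRUE)  # edges as (row, column) pairs
```

Because the rule only looks at pairwise correlations, two genes driven by a common third gene end up linked even when the link is indirect.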
  • 18. But correlation is not causality… Figure: x drives both y and z, which creates a strong indirect correlation between y and z.
```r
set.seed(2807)
x <- runif(100)
y <- 2*x + 1 + rnorm(100, 0, 0.1)
cor(x, y)
# [1] 0.9988261
z <- 2*x + 1 + rnorm(100, 0, 0.1)
cor(x, z)
# [1] 0.998751
cor(y, z)
# [1] 0.9971105
# Partial correlation
cor(lm(y ~ x)$residuals, lm(z ~ x)$residuals)
# [1] -0.1933699
```
  • 19. But correlation is not causality… Networks are built using partial correlations, i.e., correlations between gene expressions given the expression of all the other genes (residual correlations).
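There are two routes to a partial correlation, and they give the same number. The first regresses out the other variables and correlates the residuals, as in the R code of the previous slide. The second reads it from the inverse covariance (precision) matrix. A small R sketch reuses the slide's simulated x, y and z. `partial_cor` is a hypothetical helper, and it assumes n > p so that the empirical covariance can be inverted.

```r
# Partial correlations from the precision matrix S = solve(cov(X)):
#   pcor(j, j' | others) = -S[j, j'] / sqrt(S[j, j] * S[j', j'])
partial_cor <- function(X) {
  S <- solve(cov(X))                       # requires n > p
  P <- -S / sqrt(outer(diag(S), diag(S)))
  diag(P) <- 1
  P
}

set.seed(2807)
x <- runif(100)
y <- 2 * x + 1 + rnorm(100, 0, 0.1)
z <- 2 * x + 1 + rnorm(100, 0, 0.1)
X <- cbind(x = x, y = y, z = z)

P <- partial_cor(X)
P["y", "z"]                                # precision-matrix route
cor(resid(lm(y ~ x)), resid(lm(z ~ x)))    # residual route, same value
```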
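  • 22. Gaussian Graphical Model (GGM): $(X_i)_{i=1,\ldots,n}$ are i.i.d. Gaussian random variables $\mathcal{N}(0, \Sigma)$ (gene expression); then $j \leftrightarrow j'$ (genes $j$ and $j'$ are linked) $\Leftrightarrow \mathrm{Cor}\left(X^j, X^{j'} \mid (X^k)_{k \neq j, j'}\right) \neq 0$. Writing $S = \Sigma^{-1}$, $\mathrm{Cor}\left(X^j, X^{j'} \mid (X^k)_{k \neq j, j'}\right) = -\frac{S_{jj'}}{\sqrt{S_{jj} S_{j'j'}}}$ ⇒ find the partial correlations by means of $(\widehat{\Sigma}^n)^{-1}$, where $\widehat{\Sigma}^n$ is the empirical covariance matrix. Problem: $\Sigma$ is a $p \times p$ matrix (with $p$ large) and $n$ is small compared to $p$ ⇒ $(\widehat{\Sigma}^n)^{-1}$ is a poor estimate of $\Sigma^{-1}$, and it does not even exist when $n \leq p$ (the empirical covariance is then singular).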
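A toy R check shows why the empirical covariance matrix cannot simply be inverted when n is small compared to p. It uses hypothetical sizes n = 10 and p = 50. After centring, the rank of the empirical covariance is at most n − 1, so the matrix is singular.

```r
# Toy sizes (hypothetical), much smaller than the study's n = 61, p = 13,855
set.seed(42)
n <- 10; p <- 50
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

Sigma_n <- cov(X)               # p x p empirical covariance (columns centred)
rank_Sigma <- qr(Sigma_n)$rank
rank_Sigma                      # at most n - 1 = 9, far below p = 50
# Sigma_n therefore has no inverse, so solve(Sigma_n) cannot provide
# a usable estimate of the precision matrix.
```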
  • 24. Sparse approaches. Relation between partial correlation and linear models: if $S = \Sigma^{-1}$ and writing $X^j = \beta_j^\top X^{-j} + \epsilon$, we have $\beta_{jj'} = -\frac{S_{jj'}}{S_{jj}}$. So edges (non-zero partial correlations) also correspond to non-zero coefficients in the $p$ regression models above (for $j = 1, \ldots, p$). To ensure sparsity of $\beta_j$ (Meinshausen and Bühlmann, 2006): $\arg\min_{\beta_j} \sum_{i=1}^{n} \left(x_i^j - \beta_j^\top x_i^{-j}\right)^2 + \lambda \|\beta_j\|_{\ell_1}$.
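Below is a base-R sketch of this neighbourhood-selection idea. It solves each lasso by cyclic coordinate descent on the objective above (no intercept; genes centred and scaled). The helper names, the fixed `lambda` and the simulated genes are all hypothetical. The slides choose λ by stability selection, and a dedicated package such as glmnet would normally be used for the lasso itself. Edges are symmetrised with the "or" or "and" rule of Meinshausen and Bühlmann (2006).

```r
# Soft-thresholding operator used by the lasso coordinate update
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Lasso for: sum_i (y_i - b' z_i)^2 + lambda * sum_k |b_k|,
# solved by cyclic coordinate descent (no intercept: columns are centred)
lasso_cd <- function(Z, y, lambda, n_iter = 200) {
  b <- numeric(ncol(Z))
  r <- y                                 # residual y - Z b (b = 0 at start)
  zz <- colSums(Z^2)
  for (it in seq_len(n_iter)) {
    for (k in seq_along(b)) {
      r <- r + Z[, k] * b[k]             # partial residual without predictor k
      b[k] <- soft(sum(Z[, k] * r), lambda / 2) / zz[k]
      r <- r - Z[, k] * b[k]
    }
  }
  b
}

# Neighbourhood selection: regress every gene on all the others,
# then symmetrise the non-zero pattern with the "or" / "and" rule
neighbourhood_selection <- function(X, lambda, rule = c("or", "and")) {
  rule <- match.arg(rule)
  X <- scale(X)                          # centre and scale each gene
  p <- ncol(X)
  B <- matrix(0, p, p, dimnames = list(colnames(X), colnames(X)))
  for (j in seq_len(p)) {
    B[j, -j] <- lasso_cd(X[, -j, drop = FALSE], X[, j], lambda)
  }
  nz <- B != 0
  A <- if (rule == "or") nz | t(nz) else nz & t(nz)
  diag(A) <- FALSE
  A
}

# Hypothetical chain g1 -> g2 -> g3 plus an unrelated gene g4
set.seed(3)
n <- 100
g1 <- rnorm(n)
g2 <- g1 + rnorm(n, sd = 0.5)
g3 <- g2 + rnorm(n, sd = 0.5)
g4 <- rnorm(n)
X <- cbind(g1, g2, g3, g4)
A <- neighbourhood_selection(X, lambda = 10)
A
```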
  • 26. Including prior knowledge in this model. Suppose that we have some clues that: • for some pairs $(j, j')$ (type 1), an edge is likely to occur between $j$ and $j'$; • for some pairs $(j, j')$ (type 2), it is likely that there is no edge between $j$ and $j'$. Then we want to drive $\beta_{jj'}$ • toward $\pm a$ for type 1 pairs, with $a$ some positive value (the sign is that of the correlation $\mathrm{Cor}(X^j, X^{j'})$); • toward 0 for type 2 pairs.
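  • 28. Including another penalty in the model: $\arg\min_{\beta_j} \sum_{i=1}^{n} \left(x_i^j - \beta_j^\top x_i^{-j}\right)^2 + \lambda \|\beta_j\|_{\ell_1} + \mu \left( \sum_{j' \text{ of type 1}} \left(\beta_{jj'} \mp a\right)^2 + \sum_{j' \text{ of type 2}} \beta_{jj'}^2 \right)$, a smooth penalty for co-localized (or not) pairs (the type 1 term is minimal at $\beta_{jj'} = \pm a$). In practice: • $a = 1$ (after scaling of gene expressions); • $\lambda$ is chosen by stability selection based on the bootstrap, for a fixed $\mu$; • $\mu$ is chosen as the minimum value that exactly recovers the prior knowledge.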
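Adding the smooth penalty only changes the coordinate update. For a coefficient with a prior, minimising the residual part + $\lambda|\beta| + \mu(\beta - t)^2$ gives the soft-thresholded value of $\sum_i z_i r_i + \mu t$, divided by $\sum_i z_i^2 + \mu$. The target is $t = \pm a$ for type 1 pairs and $t = 0$ for type 2 pairs. Below is a minimal base-R sketch for a single gene, assuming a coordinate-descent solver like that of a plain lasso. The function name, the `prior` coding (1, 2, or 0 for no prior) and the data are hypothetical. This is an illustration of the penalty, not the authors' implementation.

```r
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# One regression of the penalized model:
#   sum_i (y_i - b' z_i)^2 + lambda * ||b||_1
#     + mu * [ sum_{type 1} (b_k -/+ a)^2 + sum_{type 2} b_k^2 ]
# prior[k]: 1 = pair believed linked, 2 = believed not linked, 0 = no prior
lasso_prior <- function(Z, y, lambda, mu, prior, a = 1, n_iter = 500) {
  # Target: +/- a (sign of the correlation) for type 1, 0 otherwise
  target <- ifelse(prior == 1, a * sign(drop(cor(Z, y))), 0)
  w <- ifelse(prior == 0, 0, mu)        # extra quadratic weight per coefficient
  b <- numeric(ncol(Z))
  r <- y
  zz <- colSums(Z^2)
  for (it in seq_len(n_iter)) {
    for (k in seq_along(b)) {
      r <- r + Z[, k] * b[k]            # partial residual without predictor k
      b[k] <- soft(sum(Z[, k] * r) + w[k] * target[k], lambda / 2) / (zz[k] + w[k])
      r <- r - Z[, k] * b[k]
    }
  }
  b
}

# Hypothetical data: 3 candidate neighbours of one gene, n = 61 individuals
set.seed(4)
n <- 61
Z <- scale(matrix(rnorm(n * 3), n, 3))
y <- drop(scale(0.5 * Z[, 1] + rnorm(n)))

b_plain <- lasso_prior(Z, y, lambda = 5, mu = 0, prior = c(0, 0, 0))
b_prior <- lasso_prior(Z, y, lambda = 5, mu = 1e6, prior = c(1, 2, 0))
rbind(b_plain, b_prior)  # large mu pulls b[1] toward +/-1 and b[2] toward 0
```

With `mu = 0`, or with no prior at all, the update reduces to the plain lasso step. This mirrors how the extra term only acts on the pairs for which co-location information exists.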
  • 29. 2. Coming back to our problem: gene expression and FISH experiments
  • 31. Starting point: • restricted to a list of genes likely involved in fetal development (1,131 DEGs between 90 and 110 days of gestation, as found in Voillet et al. (2014)); • started from an even more restricted list including the genes of interest (IGF2, DLK1 and MEG3) and the genes highly correlated with them (p = 359 genes in the end).
  • 32. Iterative process: from co-location to network and conversely.
  • 33. What to do with these networks? Network mining: 1. node importance; 2. clustering of nodes (and comparison of the clusterings with NMI); 3. GO analysis.
  • 34. Node importance. For every node, computation of: • degree (number of edges incident to the node); • betweenness centrality. In the example figure, the orange node's degree is 2 and its betweenness is 4.
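Both measures are easy to compute from an adjacency matrix. Below is a self-contained base-R sketch, assuming an undirected, unweighted graph, that computes betweenness with Brandes' breadth-first-search algorithm. The usage example is a hypothetical five-node path, not the slide's figure. In that path, the middle node also has degree 2 and betweenness 4. In practice, igraph's `degree()` and `betweenness()` compute the same quantities.

```r
# Degree and (unnormalised) betweenness of an undirected, unweighted graph
# given as a 0/1 adjacency matrix, using Brandes' algorithm
node_importance <- function(A) {
  p <- nrow(A)
  deg <- rowSums(A)
  bet <- numeric(p)
  for (s in seq_len(p)) {
    # BFS from s: distances, numbers of shortest paths, predecessors
    dist <- rep(-1, p); dist[s] <- 0
    sigma <- numeric(p); sigma[s] <- 1
    preds <- vector("list", p)
    visited <- integer(0)
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]
      visited <- c(visited, v)
      for (w in which(A[v, ] != 0)) {
        if (dist[w] < 0) { dist[w] <- dist[v] + 1; queue <- c(queue, w) }
        if (dist[w] == dist[v] + 1) {
          sigma[w] <- sigma[w] + sigma[v]
          preds[[w]] <- c(preds[[w]], v)
        }
      }
    }
    # Accumulate dependencies from the farthest nodes back to s
    delta <- numeric(p)
    for (w in rev(visited)) {
      for (v in preds[[w]]) delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      if (w != s) bet[w] <- bet[w] + delta[w]
    }
  }
  list(degree = deg, betweenness = bet / 2)  # each unordered pair seen twice
}

# Hypothetical path graph 1 - 2 - 3 - 4 - 5
A <- matrix(0, 5, 5)
for (i in 1:4) A[i, i + 1] <- A[i + 1, i] <- 1
res <- node_importance(A)
res$degree        # 1 2 2 2 1
res$betweenness   # 0 3 4 3 0
```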
  • 39. Find clusters by modularity optimization. The modularity (Newman and Girvan, 2004) of the partition $(C_1, \ldots, C_K)$ is $Q(C_1, \ldots, C_K) = \frac{1}{2m} \sum_{k=1}^{K} \sum_{x_i, x_j \in C_k} \left(W_{ij} - P_{ij}\right)$, where $P_{ij}$ is the weight of a “null model” (a graph with the same degree distribution but no preferential attachment): $P_{ij} = \frac{d_i d_j}{2m}$, with $d_i = \sum_{j \neq i} W_{ij}$ and $m = \frac{1}{2} \sum_i d_i$.
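The formula can be checked on a toy graph. This base-R sketch evaluates Q as written: the double sum runs over all ordered pairs of nodes in the same cluster, including i = j. It assumes $d_i = \sum_j W_{ij}$ and $m = \frac{1}{2}\sum_i d_i$. The function name and the two-triangle graph are hypothetical. The last line reproduces the null-model weight of the first example on the next slide.

```r
# Q = 1/(2m) * sum_k sum_{i,j in C_k} (W_ij - P_ij),  P_ij = d_i d_j / (2m)
modularity <- function(W, cl) {
  d <- rowSums(W)                 # weighted degrees (diag(W) assumed 0)
  m <- sum(d) / 2                 # total edge weight
  P <- outer(d, d) / (2 * m)      # null-model weights
  same <- outer(cl, cl, "==")     # TRUE when i and j share a cluster
  sum((W - P)[same]) / (2 * m)
}

# Two triangles {1,2,3} and {4,5,6} joined by the single edge 3-4
W <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
W[edges] <- 1
W[edges[, 2:1]] <- 1
modularity(W, cl = c(1, 1, 1, 2, 2, 2))   # 5/14, about 0.357

# Null-model weight for d_i = 15, d_j = 20, m = 20
15 * 20 / (2 * 20)                         # 7.5
```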
  • 40. Interpretation of the modularity. A good clustering should maximize the modularity: • Q ↗ when $(x_i, x_j)$ are in the same cluster and $W_{ij} \gg P_{ij}$; • Q ↘ when $(x_i, x_j)$ are in two different clusters and $W_{ij} \gg P_{ij}$. Example with $m = 20$: $d_i = 15$ and $d_j = 20$, so $P_{ij} = 15 \times 20 / 40 = 7.5$; with $W_{ij} = 5$, $W_{ij} - P_{ij} = -2.5$: putting i and j in the same cluster decreases the modularity.
  • 41. Interpretation of the modularity (continued). Example with $m = 20$: $d_i = 1$ and $d_j = 2$, so $P_{ij} = 1 \times 2 / 40 = 0.05$; with $W_{ij} = 5$, $W_{ij} - P_{ij} = 4.95$: putting i and j in the same cluster increases the modularity.
  • 42. Interpretation of the modularity (continued). Modularity • helps separate hubs; • is not an increasing function of the number of clusters, which makes it useful for choosing the relevant number of clusters. Approximate optimization with the Louvain algorithm (Blondel et al., 2008), among others.
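The claim that modularity does not simply grow with the number of clusters can be checked on a hypothetical toy graph: two triangles joined by one edge. Putting all nodes in one cluster gives Q = 0. The two triangles give Q = 5/14. Singletons give a negative value. This base-R sketch only evaluates Q for given partitions and does not implement the Louvain optimisation. igraph's `cluster_louvain()` is one available implementation of that algorithm.

```r
# Modularity in compact matrix form
Q <- function(W, cl) {
  d <- rowSums(W)
  m <- sum(d) / 2
  sum((W - outer(d, d) / (2 * m)) * outer(cl, cl, "==")) / (2 * m)
}

# Two triangles {1,2,3} and {4,5,6} joined by the edge 3-4
W <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
W[edges] <- 1
W[edges[, 2:1]] <- 1

partitions <- list(K1 = rep(1, 6), K2 = c(1, 1, 1, 2, 2, 2), K6 = 1:6)
q <- sapply(partitions, function(cl) Q(W, cl))
q   # K1 = 0, K2 = 5/14 (about 0.357), K6 = -17/98 (about -0.173)
```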
  • 43. Network inference iteration and 3D FISH validations. 359 DEGs were selected for being highly correlated with IGF2, DLK1 and MEG3 (R² ≥ 0.84). Networks 0 to 3 have 359 nodes. Network 0, without prior information: 2,279 edges (density: 3.55%). Figure: full network and sub-network extracted around the three target genes (Marti-Marimon et al., 2018).
  • 44. Network inference iteration and 3D FISH validations. Figure: Network 0 (359 nodes, 2,279 edges, no prior). Prior 1: co-localized pairs IGF2–DLK1, IGF2–MEG3 and DLK1–MEG3 (Lahbib-Mansais et al., 2016). Marti-Marimon et al., 2018.
  • 45. Network inference iteration and 3D FISH validations. Network 1, with the triple co-localization of IGF2, DLK1 and MEG3: 2,250 edges (density: 3.50%). 3D FISH test: IGF2 and RPL32 were associated in 20% of the analysed nuclei (threshold: 10%). Marti-Marimon et al., 2018.
  • 46. Network inference iteration and 3D FISH validations. Network 2, with the test of MEST and DCN associations: 2,091 edges (density: 3.25%). Figure (3D FISH test vs. network inference): 34% (+), 10% (-). Marti-Marimon et al., 2018.
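The densities quoted on these slides follow from the edge counts. An undirected network without self-loops on p = 359 nodes has p(p − 1)/2 = 64,261 possible edges. A quick R check (the helper name is illustrative):

```r
# Density (%) of an undirected graph without self-loops
density_pct <- function(n_edges, p) 100 * n_edges / choose(p, 2)

d <- round(density_pct(c(network0 = 2279, network1 = 2250, network2 = 2091), p = 359), 2)
d   # 3.55, 3.50 and 3.25, matching the slides
```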
  • 47. Network inference iteration and 3D FISH validations. Network 3: test of co-localization with MYH3 (from Networks 0 and 1). Figure panels: Network 0, Network 1. Marti-Marimon et al., 2018.
  • 48. Network inference iteration and 3D FISH validations. Network 3: test of co-localization with MYH3 (from Networks 0 and 1). MYH3 = embryonic myosin, an excellent biomarker of muscle maturity (Voillet et al., 2018). No known functional link with IGF2! Marti-Marimon et al., 2018.
  • 49. Network inference iteration and 3D FISH validations. Network 3, with the test of co-localization with MYH3 (from Networks 0 and 1): 2,091 edges (density: 3.25%). Figure (3D FISH test vs. network inference): 52% (+), 45% (+), 26% (+). Marti-Marimon et al., 2018.
  • 50. Network mining (network structure with key genes). The degree of a node is the number of edges incident to this gene: high-degree genes (hubs) are connected to many other genes. The betweenness of a node counts the shortest paths between pairs of other genes that pass through that gene (each pair weighted by the fraction of its shortest paths that do so). High-betweenness genes are central and more likely to disconnect the network if removed.
  • 51. Network mining (network structure with key genes). Degree and betweenness of key genes in Networks 0 to 3, and variation (%) between Network 0 and Network 3 (Marti-Marimon et al., 2018):

| Gene | N0 degree | N0 betweenness | N1 degree | N1 betweenness | N2 degree | N2 betweenness | N3 degree | N3 betweenness | Degree variation N0→N3 (%) | Betweenness variation N0→N3 (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| ADIPOR2 | 15 | 646.65 | 14 | 487.32 | 15 | 628.78 | 14 | 660.97 | -7 | 2 |
| AKR7A2 | 19 | 492.63 | 17 | 436.71 | 15 | 474.10 | 14 | 291.90 | -26 | -41 |
| CD81 | 17 | 551.17 | 18 | 616.7 | 15 | 478.76 | 17 | 600.58 | 0 | 9 |
| CRAT | 19 | 716.24 | 15 | 518.26 | 16 | 738.30 | 14 | 573.58 | -26 | -20 |
| DCN | 16 | 438.86 | 18 | 560.83 | 9 | 288.82 | 6 | 357.74 | -63 | -18 |
| DLK1 | 10 | 103.52 | 6 | 81.7 | 5 | 74.22 | 5 | 24.13 | -50 | -77 |
| DPP4 | 15 | 568.91 | 16 | 672.01 | 15 | 674.94 | 15 | 597.87 | 0 | 5 |
| EGFR | 16 | 624.92 | 12 | 375.87 | 12 | 385.35 | 11 | 354.78 | -31 | -43 |
| GHITM | 16 | 578.58 | 17 | 588.76 | 16 | 592.35 | 14 | 496.63 | -13 | -14 |
| GLUD1 | 13 | 575.69 | 13 | 553.28 | 12 | 574.48 | 12 | 586.27 | -8 | 2 |
| IGF2 | 10 | 118.26 | 11 | 231.09 | 8 | 260.58 | 7 | 622.44 | -30 | 426 |
| LPAR4 | 14 | 464.31 | 17 | 644.76 | 18 | 812.81 | 16 | 798.82 | 14 | 72 |
| MEG3 | 13 | 282.32 | 5 | 55.75 | 6 | 120.18 | 5 | 24.13 | -62 | -91 |
| MESP1 | 12 | 228.49 | 14 | 320.34 | 14 | 483.27 | 14 | 775.31 | 17 | 239 |
| MEST | 13 | 148.2 | 12 | 121.44 | 10 | 345.69 | 7 | 385.27 | -46 | 160 |
| MRPS28 | 16 | 743 | 15 | 743.29 | 16 | 953.42 | 15 | 796.14 | -6 | 7 |
| MYH3 | 14 | 610.73 | 14 | 656.6 | 11 | 455.62 | 4 | 0.00 | -71 | -100 |
| NMNAT3 | 17 | 562.63 | 18 | 664.84 | 16 | 473.55 | 17 | 573.15 | 0 | 2 |
| RAVER1 | 16 | 613.84 | 16 | 665.73 | 16 | 696.35 | 16 | 745.66 | 0 | 21 |
| RPL32 | 18 | 717.96 | 15 | 557.65 | 7 | 149.80 | 5 | 243.11 | -72 | -66 |
| SELO | 18 | 692.52 | 14 | 438.35 | 14 | 459.46 | 15 | 587.32 | -17 | -15 |
| SYDE1 | 15 | 436.75 | 17 | 530.29 | 14 | 459.66 | 18 | 745.52 | 20 | 71 |
| TFRC | 15 | 595.1 | 15 | 534.83 | 13 | 437.38 | 17 | 846.81 | 13 | 42 |
| TYRO3 | 20 | 785.95 | 18 | 659.9 | 16 | 603.94 | 17 | 700.03 | -15 | -11 |
| YWHAB | 20 | 670.22 | 17 | 470.35 | 17 | 538.41 | 17 | 547.17 | -15 | -18 |
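The last two columns are relative changes from Network 0 to Network 3, 100 × (N3 − N0) / N0, rounded to the nearest integer. A short R check on a few rows of the table (values copied from it; the helper name is illustrative):

```r
# Percentage variation between Network 0 and Network 3
pct_var <- function(before, after) round(100 * (after - before) / before)

tab <- data.frame(
  gene      = c("DLK1", "IGF2", "MESP1", "MEST", "MYH3"),
  degree_N0 = c(10, 10, 12, 13, 14),
  degree_N3 = c(5, 7, 14, 7, 4),
  betw_N0   = c(103.52, 118.26, 228.49, 148.20, 610.73),
  betw_N3   = c(24.13, 622.44, 775.31, 385.27, 0.00)
)
tab$degree_var <- pct_var(tab$degree_N0, tab$degree_N3)
tab$betw_var   <- pct_var(tab$betw_N0, tab$betw_N3)
tab[, c("gene", "degree_var", "betw_var")]
```

There is one caveat if the whole table is recomputed this way. R's `round()` rounds exact halves to the even digit, so DCN's degree change (16 to 6, i.e. −62.5%) comes out as −62, whereas the slide shows −63.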
  • 52. Network clustering. To analyse the evolution of the network structure from Network 0 to Network 3, the genes were clustered in each network. The normalized mutual information (NMI) measures the similarity between two clusterings: it lies between 0 and 1 and equals 1 when the two clusterings are identical.

| NMI | Network 0 | Network 1 | Network 2 | Network 3 |
|---|---|---|---|---|
| Network 0 | 1 | 0.3893 | 0.3381 | 0.3244 |
| Network 1 | 0.3893 | 1 | 0.4007 | 0.3923 |
| Network 2 | 0.3381 | 0.4007 | 1 | 0.4152 |
| Network 3 | 0.3244 | 0.3923 | 0.4152 | 1 |

Clusterings become more consistent as new biological information is introduced at each network inference iteration (Marti-Marimon et al., 2018).
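Below is a base-R sketch of NMI between two clusterings of the same genes. The slides do not say which normalisation was used. This sketch assumes the common arithmetic-mean version, NMI = 2 I(U, V) / (H(U) + H(V)). The helper name and toy labels are hypothetical.

```r
# Normalized mutual information between two clusterings u and v
# (assumed normalisation: 2 * I(U, V) / (H(U) + H(V)))
nmi <- function(u, v) {
  pij <- unclass(table(u, v)) / length(u)   # joint distribution of labels
  pu <- rowSums(pij)
  pv <- colSums(pij)
  nz <- pij > 0
  I <- sum(pij[nz] * log(pij[nz] / outer(pu, pv)[nz]))
  H <- function(q) -sum(q[q > 0] * log(q[q > 0]))
  2 * I / (H(pu) + H(pv))
}

nmi(c(1, 1, 2, 2, 3, 3), c(3, 3, 1, 1, 2, 2))  # 1: same partition, other labels
nmi(c(1, 1, 2, 2), c(1, 2, 1, 2))              # 0: independent partitions
```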
  • 53. Network clustering: Networks 0 and 3 were analysed in depth to search for any correspondence between clusters. Pairwise contingency table between the two clusterings: percentage of the genes of each cluster of Network 0 (rows) found in each cluster of Network 3 (columns) (Marti-Marimon et al., 2018).

| Network 0 cluster | Network 3 cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 |
|---|---|---|---|---|---|---|
| 1 | 64.10 | 7.69 | 7.69 | 2.56 | 7.69 | 10.26 |
| 2 | 8.77 | 68.42 | 0.00 | 1.75 | 19.30 | 1.75 |
| 3 | 14.89 | 0.00 | 65.96 | 19.15 | 0.00 | 0.00 |
| 4 | 3.92 | 1.96 | 11.76 | 82.35 | 0.00 | 0.00 |
| 5 | 34.09 | 6.82 | 4.55 | 11.36 | 43.18 | 0.00 |
| 6 | 3.57 | 17.86 | 14.29 | 32.14 | 0.00 | 32.14 |
| 7 | 11.11 | 38.89 | 0.00 | 11.11 | 38.89 | 0.00 |
| 8 | 0.00 | 48.72 | 33.33 | 10.26 | 5.13 | 2.56 |
| 9 | 5.56 | 11.11 | 16.67 | 27.78 | 38.89 | 0.00 |
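Such a table is a row-normalised contingency table between the two clusterings. A base-R sketch with hypothetical labels for eight genes (not the study's clusters):

```r
# Percentage of the genes of each Network 0 cluster found in each Network 3 cluster
cluster_overlap <- function(cl0, cl3) {
  round(100 * prop.table(table(cl0, cl3), margin = 1), 2)   # row percentages
}

# Toy labels for 8 genes (hypothetical)
cl0 <- c(1, 1, 1, 1, 2, 2, 2, 2)
cl3 <- c(1, 1, 1, 2, 2, 2, 2, 2)
ov <- cluster_overlap(cl0, cl3)
ov   # row 1: 75 / 25, row 2: 0 / 100
```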
  • 54. Functional enrichment analysis: Gene Ontology Biological Process (GOBP) terms (Marti-Marimon et al., 2018).

Network 0, cluster 1 vs. Network 3, cluster 1:

| GOBP term | FDR (Network 0 / Network 3) | Target gene |
|---|---|---|
| Extracellular structure | 5.76E-05 / 1.14E-08 | DCN |
| Cellular response to organonitrogen compound | 6.80E-04 / 1.16E-02 | IGF2 |
| Response to transforming growth factor beta | 2.35E-03 / 1.24E-01 | |
| Multicellular organism metabolic process | 2.35E-03 / 3.05E-03 | |
| Skin development | 3.18E-03 / 1.44E-01 | |
| Neuron migration | 2.82E-02 / 4.37E-01 | |
| Regulation of neuron projection development | 3.07E-02 / 4.93E-01 | |
| Mesoderm development | 1.24E-01 (single value reported) | MEST |
| Muscle organ development | 8.35E-01 (single value reported) | MYH3 |
| Notch signaling pathway | 5.56E-01 (single value reported) | DLK1 |
| Collagen fibril organization | 1.10E-04 / 1.02E-05 | |

Network 0, cluster 8 vs. Network 3, cluster 2:

| GOBP term | FDR (Network 0 / Network 3) |
|---|---|
| Generation of precursor metabolites and energy | 1.64E-02 / 1.32E-07 |
| Oxidation-reduction process | 7.25E-03 / 5.63E-09 |
| Energy derivation by oxidation of organic compounds | 8.17E-03 / 1.88E-06 |
| Cellular respiration | 8.17E-03 / 2.65E-07 |

Functional enrichment analysis based on Gene Ontology was performed using the web tool WebGestalt (WEB-based GEne SeT AnaLysis Toolkit).
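Over-representation analysis, as offered by tools such as WebGestalt, typically rests on a one-sided hypergeometric test per GO term followed by a multiple-testing correction. This base-R sketch illustrates that logic, assuming Benjamini-Hochberg FDR. The counts (cluster size, a background of 1,131 genes, term sizes) are hypothetical, and the function name is illustrative. It is not a re-analysis of the tables above.

```r
# One-sided hypergeometric p-value: P(X >= k) genes of the term in the cluster
ora_pvalue <- function(k, n, K, N) {
  # k: cluster genes annotated to the term, n: cluster size,
  # K: background genes annotated to the term, N: background size
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Hypothetical counts for three terms in a 60-gene cluster
k <- c(12, 5, 3)
K <- c(40, 30, 60)
p <- ora_pvalue(k, n = 60, K = K, N = 1131)
fdr <- p.adjust(p, method = "BH")   # Benjamini-Hochberg FDR across the terms
cbind(k, K, p, fdr)
```

The hypergeometric upper tail is the same quantity as a one-sided Fisher exact test on the 2 × 2 table (in cluster or not, annotated or not).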
  • 55. Functional enrichment analysis: network of the genes in cluster 1 of Network 3 reconstructed with Ingenuity Pathway Analysis (IPA) (Marti-Marimon et al., 2018; Villa-Vialaneix et al., 2013). IPA connected 49 of the 60 genes (82%) in a network. Upstream regulator analysis identified MYOD1 and CTNNB1 as potential transcriptional regulators of a group of genes including IGF2 and MYH3. Functions: “Cell Morphology”, 14 genes, p-value = 1.75e-08; “Quantity of cells”, 31 genes, p-value = 2.48e-09; “Morphology of connective tissue cells”, 8 genes, p-value = 1.27e-04; “Formation of muscle”, 10 genes, p-value = 2.98e-05, involving IGF2 and MYH3 together with CTNNB1 and MYOD1.
  • 56. Conclusions (Marti-Marimon et al., 2018). • 82% of the edges in Network 0 were conserved in Network 3. • The most important genes in Network 0 were among those showing the highest betweenness and degree values in Network 3: no major disturbance of the network structure. • In the local analysis, the NMI values revealed that the clusterings resembled one another more with each new inferred network. • Four out of six clusters in the final network conserved more than 62% of the genes of the corresponding clusters in Network 0. • The pairs IGF2–MEST, (DLK1/MEG3)–MEST and (DLK1/MEG3)–DCN were observed to be connected in co-expression networks in other studies. • DLK1, MEG3, RPL32, MEST, DCN and MYH3 were less connected to the other genes in Network 3, but IGF2 was not. • No previous association between IGF2 and MYH3 had been reported, even though both genes are known to be involved in muscle development: overexpression and accumulation of β-catenin in the nuclei of differentiating murine myoblasts result in higher MyoD activation and Myhc induction (Ramazzotti et al., 2016).
  • 57. Conclusions and perspectives. • What is published and what is not… • Intermediate models are retained as valuable information on robust or non-robust interactions; new interactions are currently being tested by 3D FISH. • Dramatic change in gene expression at the end of gestation: search for genome-wide interactions (Maria Marti-Marimon thesis). Whole-genome 3D interaction maps: chromosome conformation capture (Hi-C) in progress.
  • 58. Thank you for your attention!
  • 59. Blondel, V., Guillaume, J., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, P10008. Butte, A. and Kohane, I. (1999). Unsupervised knowledge discovery in medical databases using relevance networks. In Proceedings of the AMIA Symposium, pages 711–715. Butte, A. and Kohane, I. (2000). Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. In Proceedings of the Pacific Symposium on Biocomputing, pages 418–429.
  • 60. Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34(3):1436–1462. Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69:026113. Voillet, V., SanCristobal, M., Lippi, Y., Martin, P. G., Iannuccelli, N., Lascor, C., Vignoles, F., Billon, Y., Canario, L., and Liaubet, L. (2014). Muscle transcriptomic investigation of late fetal development identifies candidate genes for piglet maturity. BMC Genomics, 15:797.