Structured Regularization for conditional Gaussian
Graphical Models
Julien Chiquet, Stéphane Robin, Tristan Mary-Huard
MAP5 – April 4th, 2014
arXiv preprint http://arxiv.org/abs/1403.6168
Application to multi-trait genomic selection (MLCB 2013 NIPS Workshop)
R-package spring: https://r-forge.r-project.org/projects/spring-pkg/
Multivariate regression analysis
Consider n samples, and for individual i let
yi be the q-dimensional vector of responses,
xi be the p-dimensional vector of predictors,
B be the p × q matrix of regression coefficients,
εi be a noise term with q-dimensional covariance matrix R.
Then
yi = Bᵀxi + εi, εi ∼ N(0, R), ∀i = 1, …, n.
Matrix notation
Let Y (n × q) and X (n × p) be the data matrices; then
Y = XB + ε, vec(ε) ∼ N(0, In ⊗ R).
Remark
If X is a design matrix, this is called the "General Linear Model" (GLM).
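As a minimal sketch, the sampling model above can be simulated with NumPy; the sizes n, p, q and all parameter values below are arbitrary illustrative choices, not the cookie dough data.

```python
import numpy as np

# A minimal simulation of y_i = B^T x_i + eps_i; all sizes and values are
# arbitrary illustrative choices.
rng = np.random.default_rng(0)
n, p, q = 70, 5, 3

X = rng.normal(size=(n, p))          # predictors, n x p
B = rng.normal(size=(p, q))          # regression coefficients, p x q

# A valid residual covariance R: M M^T plus a small ridge is positive definite.
M = rng.normal(size=(q, q))
R = M @ M.T + 0.1 * np.eye(q)

# vec(eps) ~ N(0, I_n kron R): the rows of eps are independent N(0, R) draws.
eps = rng.multivariate_normal(np.zeros(q), R, size=n)
Y = X @ B + eps                      # matrix form: Y = X B + eps
```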
Motivating example: cookie dough data
Osborne, B.G., Fearn, T., Miller, A.R., and Douglas, S.
Application of near infrared reflectance spectroscopy to compositional analysis of
biscuits and biscuit doughs. J. Sci. Food Agr., 1984.
[Figure: left panel, the four responses (fat, sucrose, dry flour, water) measured on each dough sample; right panel, the NIR reflectance spectra of the predictors (reflectance 0.8–1.6 against wavelength position 1500–2250).]
q = 4 responses related to the composition of biscuit dough.
p = 256 wavelengths equally sampled between 1380nm and 2400nm.
n = 70 biscuit dough samples.
From Low to High dimensional setup
Low dimensional setup
Mardia, Kent and Bibby, Multivariate Analysis, Academic Press, 1979.
Mathematics is the same for both GLM and MLR.
Application of maximum likelihood, least squares and generalized least squares leads to an estimator which is not defined when n < p:
B̂ = (XᵀX)⁻¹XᵀY.
High dimensional setup: regularization is a popular answer
Bias B towards a given feasible set to enhance both prediction performance and interpretability.
What features are required for the coefficients? (sparsity, and …)
How do we shape this feasible set?
Our proposal: SPRING (Structured selection of Primordial Relationships IN the General linear model)
1. account for the dependency structure between the outputs, if it exists, by estimating R
2. pay attention to the direct links between predictors and responses, by means of sparse GGM
3. integrate some prior information about the predictors, by means of graph-regularization
Outline
Statistical Model
Regularizing Scheme
Inference and Optimization
Simulation Studies
Spectroscopy and the cookie dough data
Multi-trait genomic selection for a biparental population (Colza)
Connection between multivariate regression and GGM (I)
Multivariate Linear Regression (MLR)
The model writes
yi | xi ∼ N(Bᵀxi, R),
with negative log-likelihood
−log L(B, R) = (n/2) log|R| + (1/2) tr[(Y − XB)R⁻¹(Y − XB)ᵀ] + cst,
which is only bi-convex in (B, R).
Connection between multivariate regression and GGM (II)
Used in Sohn & Kim (2012) and others
Assume that xi, yi are centered and jointly Gaussian, such that
(xi, yi) ∼ N(0, Σ), with Σ = [Σxx Σxy; Σyx Σyy], Ω ≜ Σ⁻¹ = [Ωxx Ωxy; Ωyx Ωyy].
A convex log-likelihood
The model writes yi | xi ∼ N(−Ωyy⁻¹Ωyx xi, Ωyy⁻¹) and
−(2/n) log L(Ωxy, Ωyy) = −log|Ωyy| + tr(SyyΩyy) + 2 tr(SxyΩyx) + tr(ΩyxSxxΩxyΩyy⁻¹) + cst
(with Sxx = XᵀX/n and so on).
CGGM: interpretation (I)
Matrix Ω is related to partial correlations (direct links):
cor(Xi, Xj | rest) = −Ωij / √(Ωii Ωjj), so Ωij = 0 ⇔ Xi ⊥⊥ Xj | rest.
Linking parameters of MLR to cGGM
The cGGM "splits" the regression coefficients into two parts:
B = −ΩxyΩyy⁻¹, R = Ωyy⁻¹.
1. Ωxy describes the direct links between predictors and responses
2. Ωyy is the inverse of the residual covariance R
B entails both direct and indirect links, the latter due to correlation between responses.
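This split coincides with the classical Gaussian conditioning formulas; a small numerical check on an arbitrary random joint covariance (illustrative values, not from the talk) confirms that B = −ΩxyΩyy⁻¹ equals the usual MLR coefficients Σxx⁻¹Σxy and that R = Ωyy⁻¹ is the conditional (residual) covariance:

```python
import numpy as np

# Numerical check of the cGGM split on an arbitrary random joint covariance.
rng = np.random.default_rng(1)
p, q = 4, 2

M = rng.normal(size=(p + q, p + q))
Sigma = M @ M.T + 0.5 * np.eye(p + q)   # positive-definite joint covariance of (x, y)
Omega = np.linalg.inv(Sigma)

Sxx, Sxy = Sigma[:p, :p], Sigma[:p, p:]
Syy = Sigma[p:, p:]
Oxy, Oyy = Omega[:p, p:], Omega[p:, p:]

# cGGM parametrization: B = -Omega_xy Omega_yy^{-1}, R = Omega_yy^{-1}
B = -Oxy @ np.linalg.inv(Oyy)
R = np.linalg.inv(Oyy)

# Classical conditioning of a Gaussian vector y | x:
B_mlr = np.linalg.solve(Sxx, Sxy)                 # Sigma_xx^{-1} Sigma_xy
R_mlr = Syy - Sxy.T @ np.linalg.solve(Sxx, Sxy)   # Schur complement
```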
CGGM: interpretation (II)
Illustrative examples
Ωxy: p = 40 predictors, q = 5 outcomes, with or without structure along the predictors.
R: Toeplitz scheme Rij = τ^|i−j| with τ ∈ {0.1, 0.5, 0.9} (Rlow, Rmed, Rhigh).
B = −ΩxyR.
Direct relationships are masked in B in case of strong correlations between the responses.
Consequence
Our regularization scheme will be applied on the direct links Ωxy.
Remarks
Sparsity on Ωxy does not necessarily induce sparsity on B.
The prior structure on the predictors is identical on Ωxy and B, as it applies on the "rows".
Ball crafting towards structured regularization (1)
Elastic-Net
Grouping effect that catches highly correlated predictors simultaneously:
minimize over β ∈ Rᵖ:  ‖Xβ − y‖₂² + λ₁‖β‖₁ + λ₂‖β‖₂².
Zou, H. and Hastie, T. Regularization and variable selection via the elastic net. JRSS B, 2005.
Ball crafting towards structured regularization (2)
Fused-Lasso
Encourages sparsity and identical consecutive parameters:
minimize over β ∈ Rᵖ:  ‖Xβ − y‖₂² + λ₁‖β‖₁ + λ₂ Σ_{j=1}^{p−1} |βj+1 − βj|.
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. Sparsity and smoothness via the fused lasso. JRSS B, 2005.
Equivalently, the fusion term writes λ₂‖Dβ‖₁, where D is the (p−1) × p first-difference matrix whose j-th row is (0, …, 0, −1, 1, 0, …, 0), with −1 in position j and 1 in position j+1.
Ball crafting towards structured regularization (3)
Structured/Generalized Elastic-Net
A "smooth" version of the fused-Lasso (neighbors should be close, not identical):
minimize over β ∈ Rᵖ:  ‖Xβ − y‖₂² + λ₁‖β‖₁ + λ₂ Σ_{j=1}^{p−1} (βj+1 − βj)².
Slawski, zu Castell and Tutz. Feature selection guided by structural information. Ann. Appl. Stat., 2010.
Hebiri and van de Geer. The smooth-lasso and other ℓ1 + ℓ2 penalized methods. EJS, 2011.
Equivalently, the smooth term writes λ₂ βᵀDᵀDβ, where L = DᵀD is the tridiagonal p × p matrix with diagonal (1, 2, …, 2, 1) and −1 on the first off-diagonals.
Generalized fused penalty: the univariate case
Graphical interpretation of the fusion penalty
Σ_{j=1}^{p−1} |βj+1 − βj| (fused Lasso)  and  Σ_{j=1}^{p−1} (βj+1 − βj)² (generalized ridge):
a chain graph between the successive (ordered) predictors.
Generalization via a graphical argument
Let G = (V, E, W) be a graph with weighted edges. Then
Σ_{(i,j)∈E} wij |βj − βi| = ‖Dβ‖₁,  Σ_{(i,j)∈E} wij (βj − βi)² = βᵀLβ,
where L = DᵀD ⪰ 0 is the graph Laplacian associated to G.
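A minimal sketch of the chain-graph case, checking that ‖Dβ‖₁ and βᵀLβ recover the fused and generalized-ridge penalties (p and β are arbitrary illustrative values):

```python
import numpy as np

# Chain graph on p ordered predictors: first-difference matrix D and
# Laplacian L = D^T D. beta is an arbitrary illustrative vector.
p = 6
beta = np.array([0.0, 1.0, 1.0, 3.0, 2.0, 2.0])

D = np.zeros((p - 1, p))
for j in range(p - 1):
    D[j, j], D[j, j + 1] = -1.0, 1.0

L = D.T @ D  # graph Laplacian of the chain

fused = np.sum(np.abs(np.diff(beta)))   # sum_j |beta_{j+1} - beta_j| = ||D beta||_1
ridge = np.sum(np.diff(beta) ** 2)      # sum_j (beta_{j+1} - beta_j)^2 = beta^T L beta
```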
Adapting this scheme to multivariate settings
Bayesian interpretation
Suppose the prior structure is encoded in a matrix L.
Univariate case: the conjugate prior for β is N(0, L⁻¹).
Multivariate case: combine with the covariance; then
vec(B) ∼ N(0, R ⊗ L⁻¹).
Using vec and ⊗ properties, we have for the direct links
vec(Ωxy) ∼ N(0, R⁻¹ ⊗ L⁻¹).
Corresponding regularization term:
log P(Ωxy | L, R) = −(1/2) tr(Ωᵀxy L Ωxy R) + cst.
Optimization problem
Penalized criterion
Encourage sparsity with a structuring prior on the direct links:
J(Ωxy, Ωyy) = −(1/n) log L(Ωxy, Ωyy) + (λ₂/2) tr(ΩyxLΩxyΩyy⁻¹) + λ₁‖Ωxy‖₁.
Proposition
The objective function is jointly convex in (Ωxy, Ωyy) and admits at least one global minimum, which is unique when n ≥ q and (λ₂L + Sxx) is positive definite.
Algorithm
Alternating optimization
Ωyy^(k+1) = argmin over Ωyy ≻ 0 of Jλ₁λ₂(Ωxy^(k), Ωyy),   (1a)
Ωxy^(k+1) = argmin over Ωxy of Jλ₁λ₂(Ωxy, Ωyy^(k+1)).   (1b)
(1a) boils down to the diagonalization of a q × q matrix: O(q³).
(1b) can be recast as a generalized Elastic-Net of size pq: O(npqk), where k is the final number of nonzero entries in Ω̂xy.
Convergence
Despite the nonsmoothness of the objective, the ℓ₁ penalty is separable in (Ωxy, Ωyy), and the results of Tseng (2001, 2009) on the convergence of coordinate descent apply.
First block: covariance estimation
Analytic resolution of R̂
If Ωxy = 0, then Ωyy = Syy⁻¹. Otherwise we rely on the following:
Proposition
Let n > q. Assume that the following eigen decomposition holds:
Ω̂yx Σ̂xx^{λ₂} Ω̂xy Syy = U diag(ζ) U⁻¹,
and denote by η = (η₁, …, ηq) the roots of ηj² − ηj − ζj. Then
Ω̂yy = U diag(η/ζ) U⁻¹ Ω̂yx Σ̂xx^{λ₂} Ω̂xy (= R̂⁻¹),   (2a)
Ω̂yy⁻¹ = Syy U diag(η⁻¹) U⁻¹ (= R̂).   (2b)
Proof. Differentiation of the objective, commuting-matrices property, algebra.
Second block: parameter estimation
Reformulation as an Elastic-Net problem
Proposition
The solution Ω̂xy for a fixed Ω̂yy is given by vec(Ω̂xy) = ω̂, where ω̂ solves the Elastic-Net problem
argmin over ω ∈ R^{pq} of (1/2)‖Aω − b‖₂² + λ₁‖ω‖₁ + (λ₂/2) ωᵀ(Ω̂yy⁻¹ ⊗ L)ω,
where A and b are defined thanks to the Cholesky decomposition CᵀC = Ω̂yy⁻¹, so that
A = C ⊗ X/√n,  b = −vec(YC⁻¹)/√n.
Proof. Algebra with tedious vec/tr/⊗ properties.
Monitoring convergence
Example on the cookie dough data
[Figure: monitoring the objective against iterations along the whole path of λ1 (from 3e−01 down to 2e−04).]
[Figure: monitoring the log-likelihood against iterations along the whole path of λ1 on the cookie dough data.]
Tuning the penalty parameters
K-fold cross-validation
Computationally intensive, but works! For κ : {1, …, n} → {1, …, K},
(λ₁cv, λ₂cv) = argmin over (λ₁, λ₂) ∈ Λ₁ × Λ₂ of (1/n) Σᵢ ‖(B̂^{−κ(i)}_{λ₁,λ₂})ᵀ xi − yi‖₂².
Information criteria adapted to regularized methods
(λ₁pen, λ₂pen) = argmin over (λ₁, λ₂) of −2 log L(Ω̂^{λ₁,λ₂}_xy, Ω̂^{λ₁,λ₂}_yy) + pen(df_{λ₁,λ₂}).
Proposition (generalized degrees of freedom)
df_{λ₁,λ₂} = card(A) − λ₂ tr[(R̂ ⊗ L)_{AA} (R̂ ⊗ (Sxx + λ₂L))⁻¹_{AA}],
where A = {j : (vec Ω̂^{λ₁,λ₂}_xy)_j ≠ 0} is the active set.
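The K-fold criterion above can be sketched as follows. Since the SPRING fit itself is not reproduced here, a plain ridge estimator stands in for B̂^{−κ(i)}; the sizes, the λ grid and the noise level are all arbitrary illustrative choices.

```python
import numpy as np

# Sketch of K-fold cross-validation of a penalty parameter; a ridge fit is a
# hypothetical stand-in for the actual SPRING estimator.
rng = np.random.default_rng(2)
n, p, q, K = 60, 10, 3, 5

X = rng.normal(size=(n, p))
B_true = rng.normal(size=(p, q))
Y = X @ B_true + 0.1 * rng.normal(size=(n, q))

def fit(X_tr, Y_tr, lam):
    # Stand-in fit at penalty lam (ridge regression here).
    return np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(p), X_tr.T @ Y_tr)

kappa = rng.permutation(n) % K       # the fold map kappa: {1..n} -> {1..K}
grid = [0.01, 0.1, 1.0, 10.0]
cv_err = []
for lam in grid:
    err = 0.0
    for k in range(K):
        train, test = kappa != k, kappa == k
        B_k = fit(X[train], Y[train], lam)
        err += np.sum((X[test] @ B_k - Y[test]) ** 2)
    cv_err.append(err / n)

lam_cv = grid[int(np.argmin(cv_err))]   # the cross-validated penalty
```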
Tuning the penalty parameters
One may ask whether cross-validation should instead target the predictive log-likelihood:
(λ₁cv, λ₂cv) ?= argmin over (λ₁, λ₂) ∈ Λ₁ × Λ₂ of −(1/n) Σᵢ log L(Ω̂^{λ₁,λ₂}_xy, Ω̂^{λ₁,λ₂}_yy; xi, yi).
Assessing gain brought by covariance estimation
Simulation settings
Parameters
p = 40 predictors, q = 5 outcomes.
Ωxy: 25 non-null entries in {−1, 1}, no particular structure along the predictors.
R: Toeplitz scheme Rij = τ^|i−j| with τ ∈ {0.1, 0.5, 0.9}.
B = −ΩxyR.
Data generation
Draw ntrain = 50 + ntest = 1000 samples from
yi = Bᵀxi + εi, with xi ∼ N(0, I) and εi ∼ N(0, R).
Evaluating performance
Compare prediction error on 100 runs between Lasso, group-Lasso and SPRING.
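This simulation design can be sketched in a few lines; the random seed and the positions of the nonzero entries are arbitrary, and τ = 0.5 illustrates the "medium" scenario.

```python
import numpy as np

# Sketch of the simulation design: Toeplitz R, sparse direct links Omega_xy,
# and B = -Omega_xy R.
rng = np.random.default_rng(3)
p, q, tau = 40, 5, 0.5

# Toeplitz residual covariance R_ij = tau^|i-j|
R = tau ** np.abs(np.subtract.outer(np.arange(q), np.arange(q)))

# 25 non-null entries of Omega_xy, drawn in {-1, +1} at random positions
Oxy = np.zeros((p, q))
idx = rng.choice(p * q, size=25, replace=False)
Oxy.flat[idx] = rng.choice([-1.0, 1.0], size=25)

B = -Oxy @ R   # regression coefficients entangle direct and indirect links

n_train = 50
X = rng.normal(size=(n_train, p))
Y = X @ B + rng.multivariate_normal(np.zeros(q), R, size=n_train)
```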
Assessing gain brought by covariance estimation
Results
[Figure: boxplots of prediction error over 100 runs for the estimators spring (oracle), spring, lasso and group-lasso, illustrating the influence of correlations between outcomes. Scenarios {low, medium, high} map to τ ∈ {.1, .5, .9}.]
Assessing gain brought by structure integration
Simulation settings
Parameters
p = 100, q = 1 to remove the covariance effect.
ωxy: a vector with two successive bumps,
ωj = −((30 − j)² − 100)/200 for j = 21, …, 39;  ωj = ((70 − j)² − 100)/200 for j = 61, …, 80;  ωj = 0 otherwise.
ρ = 5: a residual (scalar) variance.
β = −ωxy/ρ.
Data generation
Draw ntrain = 120 + ntest = 1000 samples from
yi = βᵀxi + εi, with xi ∼ N(0, I) and εi ∼ N(0, ρ).
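The bump coefficients can be reproduced directly from the piecewise formula (1-based index j, as on the slide):

```python
import numpy as np

# The two-bump coefficient vector omega_xy, with 1-based index j.
def omega_bumps(p=100):
    w = np.zeros(p)
    for j in range(1, p + 1):
        if 21 <= j <= 39:
            w[j - 1] = -((30 - j) ** 2 - 100) / 200
        elif 61 <= j <= 80:
            w[j - 1] = ((70 - j) ** 2 - 100) / 200
    return w

w = omega_bumps()
```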
Assessing gain brought by structure information
Results (1): predictive performance
[Figure: mean prediction error + standard error for 100 runs on a grid of λ1 – SPRING with (λ2 = .01) and without (λ2 = .00) structural regularization (L = DᵀD), and the Lasso.]
Assessing gain brought by structure information
Results (2): robustness
What if we introduce a "wrong" structure? Evaluate performance with the same settings, but:
randomly swap all elements in ωxy to remove any structure,
keep exactly the same xi, εi,
draw yi with swapped and unswapped parameters,
use the same folds for cross-validation,
then replicate 100 times.

Method              Scenario    MSE          PE
LASSO               –           .336 (.096)  58.6 (10.2)
E-Net (L = I)       –           .340 (.095)  59 (10.3)
SPRING (L = I)      –           .358 (.094)  60.7 (10)
S. E-net (L = DᵀD)  unswapped   .163 (.036)  41.3 (4.08)
S. E-net (L = DᵀD)  swapped     .352 (.107)  60.3 (11.42)
SPRING (L = DᵀD)    unswapped   .062 (.022)  31.4 (2.99)
SPRING (L = DᵀD)    swapped     .378 (.123)  62.9 (13.15)
Cookie dough data: performance

Method        fat   sucrose  flour  water
Step. MLR     .044  1.188    .722   .221
Decision th.  .076  .566     .265   .176
PLS           .151  .583     .375   .105
PCR           .160  .614     .388   .106
Bayes. Reg.   .058  .819     .457   .080
LASSO         .045  .860     .376   .104
grp LASSO     .127  .918     .467   .102
str E-net     .039  .666     .365   .100
MRCE          .151  .821     .321   .081
SPRING (CV)   .065  .397     .237   .083
SPRING (BIC)  .048  .389     .243   .066

Table: Test error
Brown, P.J., Fearn, T., and Vannucci, M. Bayesian wavelet regression on curves with applications to a spectroscopic calibration problem. JASA, 2001.
Cookie dough data: fitted regression coefficients
[Figure: estimated coefficient profiles along wavelengths 1500–2250 for each method.]
The Lasso induces sparsity on B. No structure along the predictors; no structure between responses.
The group-Lasso induces sparsity on B group-wise across the responses. No structure along the predictors; (too) strong structure between responses.
The Structured Elastic-Net induces sparsity on B with a smooth neighborhood prior along the predictors (L = DᵀD). Structure along the predictors; no structure between responses.
MRCE induces sparsity on B and sparsity on R⁻¹. No structure along the predictors; (supposed to add) structure between responses.
SPRING uses the cGGM to induce structured sparsity on the direct links between the responses and the predictors, plus a smooth neighborhood prior via L = DᵀD.
Cookie dough data: parameters
[Figure: estimated B̂ and −Ω̂xy profiles along wavelengths 1500–2250, and the estimated residual matrix R̂ between dry flour, fat, sucrose and water (values from −0.25 to 0.50).]
Cookie dough data: model selection with BIC
[Figure: criterion value (from −700 to −400) against log10(λ1), one BIC curve per λ2 ∈ {0.01, 0.1, 1, 10}]
33
Outline
Statistical Model
Regularizing Scheme and optimization
Inference and Optimization
Simulation Studies
Spectroscopy and the cookie dough data
Multi-trait genomic selection for a biparental population (Colza)
34
Quantitative Trait Loci (QTL) study in Colza
Doubled haploid samples
n = 103 homozygous lines of Brassica napus obtained by crossing the
‘Stellar’ and ‘Major’ cultivars.
Bi-parental markers
p = 300 markers with known loci distributed over the 19 chromosomes,
with values in {Major, Stellar, Missing} → {1, −1, 0}.
Traits
Consider q = 8 traits including
survival traits (% survival in winter)
surv92, surv93, surv94, surv97, surv99
flowering traits (no vernalization, 4 weeks or 8 weeks vernalization)
flower0, flower4, flower8
35
Include genetic linkage information
Genetic distance between markers A1 and A2
Let r12 be the recombination rate between A1 and A2; then
\[ d_{12} = -\tfrac{1}{2}\,\log(1 - 2 r_{12}). \]
Linkage disequilibrium as covariance between the markers
In a biparental population with independent recombination events, one
has
\[ \mathrm{cor}(A_1, A_3) = \rho^{d_{13}} = \rho^{d_{12}+d_{23}}, \qquad \text{with } \rho = e^{-2}. \]
Proposition (Including LD information in the model)
The matrix L is given by inverting the covariance matrix, which can be
done analytically.
36
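Numerically, the two formulas above are consistent: with ρ = e⁻², ρ^d12 = exp(log(1 − 2r12)) = 1 − 2r12, so adjacent-marker correlation is 1 − 2r12 and it decays multiplicatively along the map. A small sketch (the variable names are ours):

```python
import numpy as np

def haldane_distance(r):
    """Genetic distance from recombination rate: d = -(1/2) log(1 - 2r)."""
    return -0.5 * np.log(1.0 - 2.0 * r)

rho = np.exp(-2.0)
r12, r23 = 0.10, 0.05
d12, d23 = haldane_distance(r12), haldane_distance(r23)

# adjacent-marker correlation: rho ** d12 recovers 1 - 2 * r12
print(rho ** d12, 1.0 - 2.0 * r12)
# correlation decays multiplicatively: cor(A1, A3) = rho ** (d12 + d23)
print(rho ** (d12 + d23), (1.0 - 2.0 * r12) * (1.0 - 2.0 * r23))
```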
Analytical form of L as a precision matrix
Usually met in AR(1) processes
L is given by the inverse of the correlation matrix between the markers,
L = UᵀΛU,
with
\[
U = \begin{pmatrix}
1 & -\rho^{d_{12}} & 0 & \cdots & 0 & 0 \\
0 & 1 & -\rho^{d_{23}} & \cdots & 0 & 0 \\
\vdots & & \ddots & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 1 & -\rho^{d_{m-1,m}} \\
0 & 0 & 0 & \cdots & 0 & 1
\end{pmatrix},
\qquad
\Lambda = \operatorname{diag}\!\left( (1-\rho^{2d_{12}})^{-1},\, (1-\rho^{2d_{23}})^{-1},\, \ldots,\, (1-\rho^{2d_{m-1,m}})^{-1},\, 1 \right).
\]
37
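The factorization L = UᵀΛU can be checked numerically on a toy map (a sketch; the distances d are arbitrary): build the marker correlation matrix Σjk = ρ^|posj − posk| from cumulative positions, then verify that UᵀΛU equals Σ⁻¹.

```python
import numpy as np

rho = np.exp(-2.0)
d = np.array([0.3, 0.7, 0.2, 0.5])           # consecutive genetic distances
pos = np.concatenate([[0.0], np.cumsum(d)])  # cumulative map positions, m markers
m = len(pos)

# AR(1)-like marker correlation: Sigma[j, k] = rho ** |pos_j - pos_k|
Sigma = rho ** np.abs(pos[:, None] - pos[None, :])

# bidiagonal U and diagonal Lambda from the analytical form above
U = np.eye(m)
U[np.arange(m - 1), np.arange(1, m)] = -rho ** d
lam = np.append(1.0 / (1.0 - rho ** (2.0 * d)), 1.0)
L = U.T @ np.diag(lam) @ U

print(np.allclose(L, np.linalg.inv(Sigma)))  # True
```

The agreement holds because the markers form a Markov chain along the map, so the backward innovations encoded by U whiten the correlation exactly.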
Predictive performance
1. Split the data into training/test sets (n1 = 70, n2 = 33),
2. Adjust each procedure using 5-fold CV for model selection,
3. Compute test (prediction) error.
Method surv92 surv93 surv94 surv97 surv99 Mean PE
LASSO .79 .98 .90 1.02 1.00 .938
group-LASSO .90 1.00 .92 .99 .92 .946
Enet (no LD) .87 1.01 .97 1.03 1.03 .983
Gen-Enet (LD) .75 .98 .89 1.03 1.02 .934
our proposal (LD) .77 .96 .84 1.00 1.02 .918
Table: Survival traits
Method flower0 flower4 flower8 Mean PE
LASSO .58 .53 .74 .616
group-LASSO .59 .55 .74 .626
Enet (no LD) .55 .54 .69 .593
Gen-Enet (LD) .55 .50 .74 .596
our proposal (LD) .48 .46 .68 .54
Table: Flowering traits
38
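The protocol above can be sketched schematically. Here ridge regression stands in for the competing methods (in the study each method is tuned by 5-fold CV on the training set), the data are synthetic stand-ins, and the test error is a plain per-trait MSE, which may differ from the exact normalization used on the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy stand-ins for the genotype matrix X (n x p) and the traits Y (n x q)
n, p, q = 103, 50, 5
X = rng.choice([-1.0, 1.0], size=(n, p))
B_true = rng.normal(scale=0.1, size=(p, q))
Y = X @ B_true + rng.normal(size=(n, q))

# 1. split into training (n1 = 70) and test (n2 = 33) sets
perm = rng.permutation(n)
train, test = perm[:70], perm[70:]

# 2. fit a procedure on the training set (ridge as a placeholder;
#    in the study, the tuning parameter is chosen by 5-fold CV here)
lam = 1.0
B_hat = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(p),
                        X[train].T @ Y[train])

# 3. test prediction error, one value per trait
pe = np.mean((Y[test] - X[test] @ B_hat) ** 2, axis=0)
print(pe)
```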
Estimated Residual Covariance ˆR
[Figure: heatmap of ˆR over the eight traits (flower0, flower4, flower8, surv92, surv93, surv94, surv97, surv99), correlations from −1 to 1]
39
Estimated Regression Coefficients ˆB
[Figure: ˆB against the position of the markers (0–1500), values from −0.1 to 0.1, one color per outcome (surv92, surv93, surv94, surv97, surv99, flower0, flower4, flower8)]
40
Estimated Direct Effects ˆΩxy
[Figure: ˆΩxy against the position of the markers (0–1500), values from −0.1 to 0.1, one color per outcome (surv92, surv93, surv94, surv97, surv99, flower0, flower4, flower8)]
41
QTL Mapping (chr. 2, 8, 10), regression coefficients ˆB
[Figure: ˆB along chromosomes 2, 8 and 10, marker names on the loci axis (0–120), one color per outcome (surv92, surv93, surv94, surv97, surv99, flower0, flower4, flower8)]
42
QTL Mapping (chr. 2, 8, 10), direct links ˆΩxy
[Figure: direct links ˆΩxy along chromosomes 2, 8 and 10, marker names on the loci axis (0–120), with far fewer nonzero entries, one color per outcome (surv92, surv94, surv97, surv99, flower0, flower4, flower8)]
42
QTL Mapping (all chromosomes), ˆB
[Figure: ˆB along all 19 chromosomes (loci 0–150), one color per outcome (surv92, surv93, surv94, surv97, surv99, flower0, flower4, flower8)]
43
QTL Mapping (all chromosomes), ˆΩxy
[Figure: direct links ˆΩxy along all 19 chromosomes (loci 0–150), one color per outcome (surv92, surv93, surv94, surv97, surv99, flower0, flower4, flower8)]
43
Some concluding remarks
Perspectives
1. Modelling
Generalized Fused-Lasso penalty
Automatic inference of L
Environment? Multiparental aspects? Multiple populations?
2. Technical algorithmic points
active set strategy in the alternating algorithm
smart screening of irrelevant predictors
full C++ implementation
3. Applications to regulatory motif discovery
Y is a matrix of q microarrays for n genes (the individuals),
X is the matrix of motif counts in the promoter of each gene,
L is a matrix based on the edit distance between motifs.
A first attempt is made in the paper, but we would like to consider
large-scale problems (10s–100s of q, 1000s of n, 10,000s of p).
44
Thanks
Hiring! We are looking for a post-doc with a strong background in
Optimization and Statistics.
Thanks for your patience, and thanks to my co-workers.
45

More Related Content

PDF
Pattern-based classification of demographic sequences
PDF
QMC: Operator Splitting Workshop, Composite Infimal Convolutions - Zev Woodst...
PDF
A lattice-based consensus clustering
PDF
A new generalized lindley distribution
PDF
prior selection for mixture estimation
PDF
Fixed Point Results for Weakly Compatible Mappings in Convex G-Metric Space
PDF
A Coq Library for the Theory of Relational Calculus
PDF
K-algebras on quadripartitioned single valued neutrosophic sets
Pattern-based classification of demographic sequences
QMC: Operator Splitting Workshop, Composite Infimal Convolutions - Zev Woodst...
A lattice-based consensus clustering
A new generalized lindley distribution
prior selection for mixture estimation
Fixed Point Results for Weakly Compatible Mappings in Convex G-Metric Space
A Coq Library for the Theory of Relational Calculus
K-algebras on quadripartitioned single valued neutrosophic sets

What's hot (20)

PDF
Steven Duplij - Polyadic systems, representations and quantum groups
PDF
Vancouver18
PDF
Theory of Relational Calculus and its Formalization
PDF
Algebras for programming languages
PDF
better together? statistical learning in models made of modules
PDF
Maximum likelihood estimation of regularisation parameters in inverse problem...
PDF
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
PDF
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
PDF
Continuous and Discrete-Time Analysis of SGD
PDF
Physics of Algorithms Talk
PDF
PaperNo6-YousefiHabibi-IJAM
PDF
comments on exponential ergodicity of the bouncy particle sampler
PDF
Steven Duplij, "Polyadic Hopf algebras and quantum groups"
PDF
Clustering in Hilbert geometry for machine learning
PDF
Coordinate sampler : A non-reversible Gibbs-like sampler
PDF
Laplace's Demon: seminar #1
PDF
ABC-Gibbs
PDF
Estimates for a class of non-standard bilinear multipliers
PDF
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
PDF
Solution Manual for Linear Models – Shayle Searle, Marvin Gruber
Steven Duplij - Polyadic systems, representations and quantum groups
Vancouver18
Theory of Relational Calculus and its Formalization
Algebras for programming languages
better together? statistical learning in models made of modules
Maximum likelihood estimation of regularisation parameters in inverse problem...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
Continuous and Discrete-Time Analysis of SGD
Physics of Algorithms Talk
PaperNo6-YousefiHabibi-IJAM
comments on exponential ergodicity of the bouncy particle sampler
Steven Duplij, "Polyadic Hopf algebras and quantum groups"
Clustering in Hilbert geometry for machine learning
Coordinate sampler : A non-reversible Gibbs-like sampler
Laplace's Demon: seminar #1
ABC-Gibbs
Estimates for a class of non-standard bilinear multipliers
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
Solution Manual for Linear Models – Shayle Searle, Marvin Gruber
Ad

Similar to Structured Regularization for conditional Gaussian graphical model (20)

PPTX
Extreme bound analysis based on correlation coefficient for optimal regressio...
PDF
Presentation
PDF
Reproducibility and differential analysis with selfish
PDF
Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...
PDF
ABC workshop: 17w5025
PDF
the ABC of ABC
PDF
block-mdp-masters-defense.pdf
PDF
nber_slides.pdf
PDF
MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
PDF
ABC-Gibbs
PDF
Slides: A glance at information-geometric signal processing
PDF
Paper Summary of Disentangling by Factorising (Factor-VAE)
PDF
Semi-Supervised Regression using Cluster Ensemble
PDF
2018 Modern Math Workshop - Contact Invariants and Reeb Dynamics - Jo Nelson,...
PDF
Regularization and variable selection via elastic net
PDF
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
PDF
Joint3DShapeMatching - a fast approach to 3D model matching using MatchALS 3...
PDF
A comparative analysis of predictve data mining techniques3
PDF
Factor analysis
PDF
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
Extreme bound analysis based on correlation coefficient for optimal regressio...
Presentation
Reproducibility and differential analysis with selfish
Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...
ABC workshop: 17w5025
the ABC of ABC
block-mdp-masters-defense.pdf
nber_slides.pdf
MUMS: Bayesian, Fiducial, and Frequentist Conference - Model Selection in the...
ABC-Gibbs
Slides: A glance at information-geometric signal processing
Paper Summary of Disentangling by Factorising (Factor-VAE)
Semi-Supervised Regression using Cluster Ensemble
2018 Modern Math Workshop - Contact Invariants and Reeb Dynamics - Jo Nelson,...
Regularization and variable selection via elastic net
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
Joint3DShapeMatching - a fast approach to 3D model matching using MatchALS 3...
A comparative analysis of predictve data mining techniques3
Factor analysis
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
Ad

More from Laboratoire Statistique et génome (6)

PDF
Sparsity by worst-case quadratic penalties
PDF
Sparsity with sign-coherent groups of variables via the cooperative-Lasso
PDF
Weighted Lasso for Network inference
PDF
Gaussian Graphical Models with latent structure
PDF
Multitask learning for GGM
PDF
SIMoNe: Statistical Iference for MOdular NEtworks
Sparsity by worst-case quadratic penalties
Sparsity with sign-coherent groups of variables via the cooperative-Lasso
Weighted Lasso for Network inference
Gaussian Graphical Models with latent structure
Multitask learning for GGM
SIMoNe: Statistical Iference for MOdular NEtworks

Recently uploaded (20)

PPTX
ANEMIA WITH LEUKOPENIA MDS 07_25.pptx htggtftgt fredrctvg
PPTX
Derivatives of integument scales, beaks, horns,.pptx
PPTX
TOTAL hIP ARTHROPLASTY Presentation.pptx
PDF
An interstellar mission to test astrophysical black holes
PPTX
Microbiology with diagram medical studies .pptx
PDF
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
PPTX
INTRODUCTION TO EVS | Concept of sustainability
PPTX
Taita Taveta Laboratory Technician Workshop Presentation.pptx
PDF
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
PPTX
famous lake in india and its disturibution and importance
PPTX
The KM-GBF monitoring framework – status & key messages.pptx
PDF
diccionario toefl examen de ingles para principiante
PDF
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
DOCX
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
PPTX
GEN. BIO 1 - CELL TYPES & CELL MODIFICATIONS
PDF
Phytochemical Investigation of Miliusa longipes.pdf
PDF
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
PPTX
Classification Systems_TAXONOMY_SCIENCE8.pptx
PPTX
microscope-Lecturecjchchchchcuvuvhc.pptx
PPTX
neck nodes and dissection types and lymph nodes levels
ANEMIA WITH LEUKOPENIA MDS 07_25.pptx htggtftgt fredrctvg
Derivatives of integument scales, beaks, horns,.pptx
TOTAL hIP ARTHROPLASTY Presentation.pptx
An interstellar mission to test astrophysical black holes
Microbiology with diagram medical studies .pptx
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
INTRODUCTION TO EVS | Concept of sustainability
Taita Taveta Laboratory Technician Workshop Presentation.pptx
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
famous lake in india and its disturibution and importance
The KM-GBF monitoring framework – status & key messages.pptx
diccionario toefl examen de ingles para principiante
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
GEN. BIO 1 - CELL TYPES & CELL MODIFICATIONS
Phytochemical Investigation of Miliusa longipes.pdf
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
Classification Systems_TAXONOMY_SCIENCE8.pptx
microscope-Lecturecjchchchchcuvuvhc.pptx
neck nodes and dissection types and lymph nodes levels

Structured Regularization for conditional Gaussian graphical model

  • 1. Structured Regularization for conditional Gaussian Graphical Models Julien Chiquet, St´ephane Robin, Tristan Mary-Huard MAP5 – April the 4th, 2014 arXiv preprint http://arxiv.org/abs/1403.6168 Application to Multi-trait genomic selection (MLCB 2013 NIPS Workshop) R-package spring https://r-forge.r-project.org/projects/spring-pkg/. 1
  • 2. Multivariate regression analysis Consider n samples and let for individual i yi be the q-dimensional vector of responses, xi be the p-dimensional vector of predictors, B be the p × q matrix of regression coefficients εi be a noise term with a q-dimensional covariance matrix R. yi = BT xi + εi , εi ∼ N(0, R), ∀i = 1, . . . , n, Matrix notation Let Y(n × q) and X(n × p) be the data matrices, then Y = XB + ε, vec(ε) ∼ N(0, Ip ⊗ R). Remark If X is a design matrix, this is called the“General Linear Model”(GLM). 2
  • 3. Motivating example: cookie dough data Osborne, B.G., Fearn, T., Miller, A.R., and Douglas, S. Application of near infrared reflectance spectroscopy to compositional analys is of biscuits and biscuit doughs. J. Sci. Food Agr., 1984. q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q qq q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q qq q q qqq q q q q q q 10 20 30 40 50 fat sucrose dry flour water variable fat sucrose dry flour water responses 0.8 1.2 1.6 1500 1750 2000 2250 position reflectance predictors q = 4 responses related to the composition of biscuit dough. p = 256 wavelengths equally sampled between 1380nm and 2400nm. n = 70 biscuit dough samples. 3
  • 4. From Low to High dimensional setup Low dimensional setup Mardia, Kent and Bibby, Multivariate Analysis, Academic Press, 1979. Mathematics is the same for both GLM and MLR. Application of maximum likelihood, least squares and generalized least squares lead to an estimator which is not defined when n < p ˆB = XT X −1 XT Y. High dimensional setup: regularization is a popular answer Biais B towards a given feasible set to enhance both prediction performance and interpretability. What features are required for the coefficients? (sparsity, and. . . ) How do we shape the feasible this set? 4
  • 5. From low- to high-dimensional setup
Low-dimensional setup
Mardia, Kent and Bibby, Multivariate Analysis, Academic Press, 1979.
Mathematics is the same for both GLM and MLR. Maximum likelihood, least squares and generalized least squares all lead to the estimator
B̂ = (X^T X)^{-1} X^T Y,
which is not defined when n < p.
High-dimensional setup: regularization is a popular answer
Bias B towards a given feasible set to enhance both prediction performance and interpretability.
What features are required for the coefficients? (sparsity, and. . . )
How do we shape this feasible set?
Our proposal: SPRING (Structured selection of Primordial Relationships IN the General linear model)
1. account for the dependency structure between the outputs, if it exists, by estimating R
2. pay attention to the direct links between predictors and responses by means of sparse GGM
3. integrate prior information about the predictors by means of graph-regularization
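The breakdown of the closed-form estimator when n < p can be seen directly: X^T X is p × p but has rank at most n, hence is singular. A small sketch with arbitrary dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 50, 10, 3
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))

# Low-dimensional case: X'X is invertible and the ML/LS estimator exists.
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)       # (X'X)^{-1} X'Y

# High-dimensional case n < p: X'X has rank at most n, hence is singular
# and the estimator above is no longer defined.
X_hd = rng.normal(size=(5, p))
rank_hd = np.linalg.matrix_rank(X_hd.T @ X_hd)  # at most 5 < p = 10
```

B_hat satisfies the normal equations X'X B = X'Y; with n = 5 < p = 10 the Gram matrix is rank-deficient.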
  • 6. Outline
Statistical Model
Regularizing Scheme and optimization
Inference and Optimization
Simulation Studies
Spectroscopy and the cookie dough data
Multi-trait genomic selection for a biparental population (Colza)
  • 8. Connection between multivariate regression and GGM (I)
Multivariate Linear Regression (MLR)
The model writes yi | xi ∼ N(B^T xi, R), with negative log-likelihood
− log L(B, R) = (n/2) log |R| + (1/2) tr{(Y − XB) R^{-1} (Y − XB)^T} + cst,
which is only bi-convex in (B, R).
  • 9. Connection between multivariate regression and GGM (II)
Used in Sohn & Kim (2012) and others
Assume that xi, yi are centered and jointly Gaussian, such that
(xi, yi) ∼ N(0, Σ), with Σ = [Σxx Σxy; Σyx Σyy] and Ω ≡ Σ^{-1} = [Ωxx Ωxy; Ωyx Ωyy].
A convex log-likelihood
The model writes yi | xi ∼ N(−Ωyy^{-1} Ωyx xi, Ωyy^{-1}) and
−(2/n) log L(Ωxy, Ωyy) = − log |Ωyy| + tr(Syy Ωyy) + 2 tr(Sxy Ωyx) + tr(Ωyx Sxx Ωxy Ωyy^{-1}) + cst
(with Sxx = X^T X / n and so on).
  • 10. CGGM: interpretation (I)
Matrix Ω is related to partial correlations (direct links):
cor(Xi, Xj | rest) = −Ωij / sqrt(Ωii Ωjj), so Ωij = 0 ⇔ Xi ⊥⊥ Xj | all other variables.
Linking parameters of MLR to cGGM
The cGGM "splits" the regression coefficients into two parts:
B = −Ωxy Ωyy^{-1}, R = Ωyy^{-1}.
1. Ωxy describes the direct links between predictors and responses
2. Ωyy is the inverse of the residual covariance R
B entails both direct and indirect links, the latter due to correlation between responses.
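The split B = −Ωxy R can be illustrated numerically: a predictor with a single direct link to one response still gets a nonzero regression coefficient on every response once the responses correlate. The numbers below (one predictor, τ = 0.9) are an illustrative toy, not the slides' simulation.

```python
import numpy as np

q = 5
tau = 0.9
# Toeplitz residual covariance R_ij = tau^|i-j| (strong correlation)
R = tau ** np.abs(np.subtract.outer(np.arange(q), np.arange(q)))

# one predictor with a single direct link, to response 0 only
omega_xy = np.array([[1.0, 0.0, 0.0, 0.0, 0.0]])
B = -omega_xy @ R                          # regression coefficients

n_direct = int(np.count_nonzero(omega_xy)) # one direct link...
n_regr = int(np.count_nonzero(B))          # ...but q nonzero coefficients in B
```

The direct pattern (a single nonzero) is masked in B: correlation between the responses propagates the effect to all of them.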
  • 11. CGGM: interpretation (II)
Illustrative examples, no structure along the predictors
Ωxy: p = 40 predictors, q = 5 outcomes. R: Toeplitz scheme Rij = τ^{|i−j|}, with τ ∈ {0.1, 0.5, 0.9} (low, medium, high). B = −Ωxy R.
Direct relationships are masked in B in case of strong correlations between the responses.
  • 15. CGGM: interpretation (II)
Illustrative examples, strong structure along the predictors
Ωxy: p = 40 predictors, q = 5 outcomes. R: Toeplitz scheme Rij = τ^{|i−j|}, with τ ∈ {0.1, 0.5, 0.9}. B = −Ωxy R.
Direct relationships are masked in B in case of strong correlations between the responses.
  • 19. CGGM: interpretation (II)
Illustrative examples along the predictors: Ωxy with p = 40 predictors, q = 5 outcomes; R: Toeplitz scheme Rij = τ^{|i−j|}; B = −Ωxy R.
Direct relationships are masked in B in case of strong correlations between the responses.
Consequence
Our regularization scheme will be applied on the direct links Ωxy.
Remarks
Sparsity on Ωxy does not necessarily induce sparsity on B.
The prior structure on the predictors is identical on Ωxy and B, as it applies on the "rows".
  • 20. Outline
Statistical Model
Regularizing Scheme and optimization
Inference and Optimization
Simulation Studies
Spectroscopy and the cookie dough data
Multi-trait genomic selection for a biparental population (Colza)
  • 21. Ball crafting towards structured regularization (1)
Elastic-Net
Grouping effect that catches highly correlated predictors simultaneously:
minimize over β ∈ R^p: ||Xβ − y||_2^2 + λ1 ||β||_1 + λ2 ||β||_2^2.
Zou, H. and Hastie, T. Regularization and variable selection via the elastic net. JRSS B, 2005.
  • 22. Ball crafting towards structured regularization (2)
Fused-Lasso
Encourages sparsity and identical consecutive parameters:
minimize over β ∈ R^p: ||Xβ − y||_2^2 + λ1 ||β||_1 + λ2 Σ_{j=1}^{p−1} |β_{j+1} − β_j|.
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. Sparsity and smoothness via the fused lasso. JRSS B, 2005.
  • 23. Ball crafting towards structured regularization (2)
Fused-Lasso
Encourages sparsity and identical consecutive parameters:
minimize over β ∈ R^p: ||Xβ − y||_2^2 + λ1 ||β||_1 + λ2 ||Dβ||_1,
with D the (p−1) × p first-difference matrix whose row j has entries D_{j,j} = −1 and D_{j,j+1} = 1.
  • 24. Ball crafting towards structured regularization (3)
Structured/Generalized Elastic-Net
A "smooth" version of the fused-Lasso (neighbors should be close, not identical):
minimize over β ∈ R^p: ||Xβ − y||_2^2 + λ1 ||β||_1 + λ2 Σ_{j=1}^{p−1} (β_{j+1} − β_j)^2.
Slawski, zu Castell and Tutz. Feature selection guided by structural information. Ann. Appl. Stat., 2010.
Hebiri and van de Geer. The smooth-lasso and other l1 + l2 penalized methods. EJS, 2011.
  • 25. Ball crafting towards structured regularization (3)
Structured/Generalized Elastic-Net
A "smooth" version of the fused-Lasso (neighbors should be close, not identical):
minimize over β ∈ R^p: ||Xβ − y||_2^2 + λ1 ||β||_1 + λ2 β^T D^T D β,
where L = D^T D is the p × p tridiagonal matrix with diagonal (1, 2, . . . , 2, 1) and off-diagonal entries −1.
  • 26. Generalized fused penalty: the univariate case
Graphical interpretation of the fusion penalty
Σ_{j=1}^{p−1} |β_{j+1} − β_j| (fused Lasso) and Σ_{j=1}^{p−1} (β_{j+1} − β_j)^2 (generalized ridge):
a chain graph between the successive (ordered) predictors.
Generalization via a graphical argument
Let G = (V, E, W) be a graph with weighted edges. Then
Σ_{(i,j)∈E} w_ij |β_j − β_i| = ||Dβ||_1 and Σ_{(i,j)∈E} w_ij (β_j − β_i)^2 = β^T L β,
where L = D^T D ⪰ 0 is the graph Laplacian associated to G.
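For the chain graph, both identities can be checked numerically: build the first-difference matrix D, form the Laplacian L = D^T D, and compare against the explicit sums. A small sketch (p = 6, arbitrary β):

```python
import numpy as np

p = 6
D = np.zeros((p - 1, p))
for j in range(p - 1):                 # row j encodes beta_{j+1} - beta_j
    D[j, j], D[j, j + 1] = -1.0, 1.0

L = D.T @ D                            # graph Laplacian of the chain

beta = np.array([0.0, 1.0, 1.0, 3.0, 2.0, 2.0])
fused = np.abs(np.diff(beta)).sum()    # sum_j |beta_{j+1} - beta_j|
ridge = (np.diff(beta) ** 2).sum()     # sum_j (beta_{j+1} - beta_j)^2
```

L comes out tridiagonal with diagonal (1, 2, . . . , 2, 1), matching the Laplacian of the chain graph.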
  • 27. Adapting this scheme to multivariate settings
Bayesian interpretation
Suppose the prior structure is encoded in a matrix L.
Univariate case: the conjugate prior for β is N(0, L^{-1}).
Multivariate case: combine with the covariance, then vec(B) ∼ N(0, R ⊗ L^{-1}).
Using vec and ⊗ properties, we have for the direct links
vec(Ωxy) ∼ N(0, R^{-1} ⊗ L^{-1}).
Corresponding regularization term:
log P(Ωxy | L, R) = −(1/2) tr(Ωxy^T L Ωxy R) + cst.
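The vec/⊗ step behind the regularization term is the identity vec(Ω)^T (R ⊗ L) vec(Ω) = tr(Ω^T L Ω R), with vec stacking columns. It can be verified numerically on arbitrary symmetric positive-definite stand-ins for L and R:

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 4, 3
Omega = rng.normal(size=(p, q))

# arbitrary SPD matrices standing in for L (p x p) and R (q x q)
A = rng.normal(size=(p, p)); Lmat = A @ A.T + p * np.eye(p)
Bm = rng.normal(size=(q, q)); R = Bm @ Bm.T + q * np.eye(q)

w = Omega.flatten(order="F")                    # column-stacked vec(Omega)
quad_kron = w @ np.kron(R, Lmat) @ w            # vec' (R ⊗ L) vec
quad_tr = np.trace(Omega.T @ Lmat @ Omega @ R)  # tr(Omega' L Omega R)
```

Since R ⊗ L is the precision of the prior N(0, R^{-1} ⊗ L^{-1}), the log-prior is −(1/2) of this quadratic form, up to a constant.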
  • 28. Outline
Statistical Model
Regularizing Scheme and optimization
Inference and Optimization
Simulation Studies
Spectroscopy and the cookie dough data
Multi-trait genomic selection for a biparental population (Colza)
  • 29. Optimization problem
Penalized criterion
Encourage sparsity with a structuring prior on the direct links:
J(Ωxy, Ωyy) = −(1/n) log L(Ωxy, Ωyy) + (λ2/2) tr(Ωyx L Ωxy Ωyy^{-1}) + λ1 ||Ωxy||_1.
Proposition
The objective function is jointly convex in (Ωxy, Ωyy) and admits at least one global minimum, which is unique when n ≥ q and (λ2 L + Sxx) is positive definite.
  • 30. Algorithm
Alternate optimization:
(1a) Ω̂yy^{(k+1)} = argmin over Ωyy ≻ 0 of J_{λ1λ2}(Ω̂xy^{(k)}, Ωyy),
(1b) Ω̂xy^{(k+1)} = argmin over Ωxy of J_{λ1λ2}(Ωxy, Ω̂yy^{(k+1)}).
(1a) boils down to the diagonalization of a q × q matrix: O(q^3).
(1b) can be recast as a generalized Elastic-Net problem of size pq: O(npqk), where k is the final number of nonzero entries in Ω̂xy.
Convergence
Despite the nonsmoothness of the objective, the ℓ1 penalty is separable in (Ωxy, Ωyy), and the results of Tseng (2001, 2009) on the convergence of coordinate descent apply.
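The mechanics of such an alternate scheme can be illustrated on a deliberately tiny toy problem (not the SPRING updates themselves): block-coordinate descent on a jointly convex quadratic, where each block update is exact and the objective can only decrease, as in updates (1a)-(1b).

```python
import numpy as np

# Toy jointly convex objective in (u, v); Hessian [[2, 1], [1, 2]] is SPD.
def J(u, v):
    return (u - 1.0) ** 2 + (v - 2.0) ** 2 + u * v

u, v = 0.0, 0.0
history = [J(u, v)]
for _ in range(50):
    u = (2.0 - v) / 2.0        # exact argmin in u: 2(u - 1) + v = 0
    v = (4.0 - u) / 2.0        # exact argmin in v: 2(v - 2) + u = 0
    history.append(J(u, v))

# each sweep solves one block exactly, so the objective is non-increasing
monotone = all(b <= a + 1e-12 for a, b in zip(history, history[1:]))
```

Here the iterates converge to the global minimizer (u, v) = (0, 2); joint convexity plus separable nonsmooth terms is what licenses the same monotone-decrease argument for (1a)-(1b).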
  • 31. First block: covariance estimation
Analytic resolution of R̂
If Ωxy = 0, then Ωyy = Syy^{-1}. Otherwise we rely on the following:
Proposition
Let n > q. Assume that the eigen decomposition Ω̂yx Σ̂xx^{λ2} Ω̂xy Syy = U diag(ζ) U^{-1} holds, and denote by η = (η1, . . . , ηq) the roots of ηj^2 − ηj − ζj. Then
(2a) Ω̂yy = U diag(η/ζ) U^{-1} Ω̂yx Σ̂xx^{λ2} Ω̂xy (= R̂^{-1}),
(2b) Ω̂yy^{-1} = Syy U diag(η^{-1}) U^{-1} (= R̂).
Proof. Differentiation of the objective, commuting-matrices property, algebra.
  • 32. Second block: parameters estimation
Reformulation as an Elastic-Net problem
Proposition
The solution Ω̂xy for a fixed Ω̂yy is given by vec(Ω̂xy) = ω̂, where ω̂ solves the Elastic-Net problem
argmin over ω ∈ R^{pq} of (1/2) ||Aω − b||_2^2 + λ1 ||ω||_1 + (λ2/2) ω^T (Ω̂yy^{-1} ⊗ L) ω,
where A and b are defined thanks to the Cholesky decomposition C^T C = Ω̂yy^{-1}, so that
A = (C ⊗ X) / √n, b = −vec(Y C^{-1}) / √n.
Proof. Algebra with bad vec/tr/⊗ properties.
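The reformulation can be checked numerically: with A = (C ⊗ X)/√n and b = −vec(Y C^{-1})/√n as above, the least-squares term expands into the trace terms of the log-likelihood that involve Ωxy. A sketch with arbitrary data and an arbitrary SPD stand-in for Ω̂yy^{-1}:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 30, 4, 3
X = rng.normal(size=(n, p)); Y = rng.normal(size=(n, q))
Sxx, Syx = X.T @ X / n, Y.T @ X / n

M = rng.normal(size=(q, q))
Oyy_inv = M @ M.T + q * np.eye(q)          # stands in for Omega_yy^{-1}
C = np.linalg.cholesky(Oyy_inv).T          # C'C = Omega_yy^{-1}

A = np.kron(C, X) / np.sqrt(n)
b = -(Y @ np.linalg.inv(C)).flatten(order="F") / np.sqrt(n)

Omega = rng.normal(size=(p, q))            # arbitrary candidate direct links
w = Omega.flatten(order="F")               # column-stacked vec(Omega)

lhs = 0.5 * np.sum((A @ w - b) ** 2)
rhs = (0.5 * np.trace(Omega.T @ Sxx @ Omega @ Oyy_inv)  # tr(Omega_yx Sxx Omega_xy Omega_yy^{-1}) / 2
       + np.trace(Syx @ Omega)                          # tr(Sxy Omega_yx)
       + 0.5 * b @ b)                                   # constant in Omega
```

So, up to a constant, (1/2)||Aω − b||_2^2 reproduces exactly the Ωxy-dependent part of −(1/n) log L, which is what makes update (1b) a generalized Elastic-Net problem.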
  • 33. Monitoring convergence
Example on the cookie dough data
[Figure: objective value versus iteration, monitored along the whole path of λ1 (from 3e−01 down to 2e−04).]
  • 34. Monitoring convergence
Example on the cookie dough data
[Figure: log-likelihood versus iteration, monitored along the whole path of λ1 (from 3e−01 down to 2e−04).]
  • 35. Tuning the penalty parameters
K-fold cross-validation
Computationally intensive, but works! For κ : {1, . . . , n} → {1, . . . , K},
(λ1^cv, λ2^cv) = argmin over (λ1, λ2) ∈ Λ1 × Λ2 of (1/n) Σ_{i=1}^n || x_i^T B̂^{λ1,λ2}_{−κ(i)} − y_i ||_2^2.
Information criteria adapted to regularized methods:
(λ1^pen, λ2^pen) = argmin over (λ1, λ2) of −2 log L(Ω̂xy^{λ1,λ2}, Ω̂yy^{λ1,λ2}) + pen(df_{λ1,λ2}).
Proposition (generalized degrees of freedom)
df_{λ1,λ2} = card(A) − λ2 tr[ (R̂ ⊗ L)_{AA} (R̂ ⊗ (Sxx + λ2 L))^{-1}_{AA} ],
where A = { j : vec(Ω̂xy^{λ1,λ2})_j ≠ 0 } is the active set.
  • 36. Tuning the penalty parameters
K-fold cross-validation
Computationally intensive, but works! For κ : {1, . . . , n} → {1, . . . , K}, one may also cross-validate the likelihood:
(λ1^cv, λ2^cv) =? argmin over (λ1, λ2) ∈ Λ1 × Λ2 of (1/n) Σ_{i=1}^n log L(Ω̂xy^{λ1,λ2}, Ω̂yy^{λ1,λ2}; x_i, y_i).
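The K-fold grid search can be sketched generically. Below, `fit` is a deliberate stand-in (a plain ridge solve on a single penalty), not SPRING's estimator, and the data and grid are arbitrary; the fold/grid bookkeeping is the point.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 60, 8
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:3] = 1.0
y = X @ beta + 0.1 * rng.normal(size=n)

def fit(Xtr, ytr, lam):
    # stand-in estimator (ridge); SPRING's penalized fit would go here
    return np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(p), Xtr.T @ ytr)

K = 5
folds = np.arange(n) % K                 # kappa : {1..n} -> {1..K}
grid = [0.01, 0.1, 1.0, 10.0]            # one-dimensional grid for simplicity
cv_err = []
for lam in grid:
    err = 0.0
    for k in range(K):
        tr, te = folds != k, folds == k
        b = fit(X[tr], y[tr], lam)       # fit with fold k held out
        err += np.sum((X[te] @ b - y[te]) ** 2)
    cv_err.append(err / n)

lam_cv = grid[int(np.argmin(cv_err))]    # penalty with smallest CV error
```

For SPRING the same loop runs over a two-dimensional grid Λ1 × Λ2, which is what makes cross-validation computationally intensive.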
  • 38. Outline
Statistical Model
Regularizing Scheme and optimization
Inference and Optimization
Simulation Studies
Spectroscopy and the cookie dough data
Multi-trait genomic selection for a biparental population (Colza)
  • 39. Assessing gain brought by covariance estimation
Simulation settings
Parameters: p = 40 predictors, q = 5 outcomes. Ωxy: 25 non-null entries in {−1, 1}, no particular structure along the predictors. R: Toeplitz scheme Rij = τ^{|i−j|} with τ ∈ {0.1, 0.5, 0.9}. B = −Ωxy R.
Data generation: draw ntrain = 50 + ntest = 1000 samples from yi = B^T xi + εi, with xi ∼ N(0, I) and εi ∼ N(0, R).
Evaluating performance: compare prediction error over 100 runs between Lasso, group-Lasso and SPRING.
  • 45. Assessing gain brought by covariance estimation
Results
[Figure: prediction error over 100 runs for SPRING (oracle), SPRING, Lasso and group-Lasso, illustrating the influence of correlations between outcomes. Scenarios {low, medium, high} map to τ ∈ {.1, .5, .9}.]
  • 47. Assessing gain brought by structure integration
Simulation settings
Parameters: p = 100, q = 1 to remove the covariance effect.
ωxy: a vector with two successive bumps,
ωj = −((30 − j)^2 − 100)/200 for j = 21, . . . , 39,
ωj = ((70 − j)^2 − 100)/200 for j = 61, . . . , 80,
ωj = 0 otherwise.
R: ρ = 5, a residual (scalar) variance. β = −ωxy/ρ.
Data generation: draw ntrain = 120 + ntest = 1000 samples from yi = β^T xi + εi, with xi ∼ N(0, I) and εi ∼ N(0, ρ).
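The bump vector above can be built directly from its piecewise definition; with the ranges as stated, the peaks sit at j = 30 (value +0.5) and j = 70 (value −0.5), and the entry at j = 80 vanishes because (70 − 80)^2 − 100 = 0.

```python
import numpy as np

p = 100
omega = np.zeros(p)
j = np.arange(1, p + 1)                  # 1-based positions as in the slide

bump1 = (j >= 21) & (j <= 39)
bump2 = (j >= 61) & (j <= 80)
omega[bump1] = -(((30 - j[bump1]) ** 2 - 100) / 200)
omega[bump2] = ((70 - j[bump2]) ** 2 - 100) / 200

support = int(np.count_nonzero(omega))   # 19 + 19 nonzero entries
```

This reproduces the two smooth bumps of the figure, a sparse coefficient with strong structure along the (ordered) predictors.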
  • 48. Assessing gain brought by structure information
Results (1): predictive performance
[Figure: mean prediction error ± standard error over 100 runs along a grid of λ1 — SPRING with (λ2 = .01) and without (λ2 = 0) structural regularization (L = D^T D), and the Lasso.]
  • 49. Assessing gain brought by structure information
Results (2): robustness
What if we introduce a "wrong" structure? Evaluate performance with the same settings, but randomly swap all elements in ωxy to remove any structure:
keep exactly the same xi, εi; draw yi with swapped and unswapped parameters; use the same folds for cross-validation; then replicate 100 times.
Method | Scenario | MSE | PE
LASSO | – | .336 (.096) | 58.6 (10.2)
E-Net (L = I) | – | .340 (.095) | 59 (10.3)
SPRING (L = I) | – | .358 (.094) | 60.7 (10)
S. E-net (L = D^T D) | unswapped | .163 (.036) | 41.3 (4.08)
S. E-net (L = D^T D) | swapped | .352 (.107) | 60.3 (11.42)
SPRING (L = D^T D) | unswapped | .062 (.022) | 31.4 (2.99)
SPRING (L = D^T D) | swapped | .378 (.123) | 62.9 (13.15)
  • 51. Outline
Statistical Model
Regularizing Scheme and optimization
Inference and Optimization
Simulation Studies
Spectroscopy and the cookie dough data
Multi-trait genomic selection for a biparental population (Colza)
  • 52. Cookie dough data: performance
Table: test error
Method | fat | sucrose | flour | water
Step. MLR | .044 | 1.188 | .722 | .221
Decision th. | .076 | .566 | .265 | .176
PLS | .151 | .583 | .375 | .105
PCR | .160 | .614 | .388 | .106
Bayes. Reg. | .058 | .819 | .457 | .080
LASSO | .045 | .860 | .376 | .104
grp LASSO | .127 | .918 | .467 | .102
str E-net | .039 | .666 | .365 | .100
MRCE | .151 | .821 | .321 | .081
SPRING (CV) | .065 | .397 | .237 | .083
SPRING (BIC) | .048 | .389 | .243 | .066
Brown, P.J., Fearn, T., and Vannucci, M. Bayesian wavelet regression on curves with applications to a spectroscopic calibration problem. JASA, 2001.
  • 53. Cookie dough data: performance
[Figure: estimated coefficients along the spectrum.]
The Lasso induces sparsity on B.
No structure along the predictors. No structure between responses.
  • 54. Cookie dough data: performance
[Figure: estimated coefficients along the spectrum.]
The group-Lasso induces sparsity on B group-wise across the responses.
No structure along the predictors. (Too) strong structure between responses.
  • 55. Cookie dough data: performance
[Figure: estimated coefficients along the spectrum.]
The structured Elastic-Net induces sparsity on B with a smooth neighborhood prior along the predictors (L = D^T D).
Structure along the predictors. No structure between responses.
  • 56. Cookie dough data: performance
[Figure: estimated coefficients along the spectrum.]
MRCE induces sparsity on B and sparsity on R^{-1}.
No structure along the predictors. (Supposed to add) structure between responses.
  • 57. Cookie dough data: performance
[Figure: estimated coefficients along the spectrum.]
SPRING uses GGM to induce structured sparsity on the direct links between the responses and the predictors, plus a smooth neighborhood prior via L = D^T D.
  • 58. Cookie dough data: parameters
[Figure: estimated B̂ and −Ω̂xy along the spectrum, and the estimated residual correlation R̂ between dry flour, fat, sucrose and water.]
  • 59. Cookie dough data: model selection with BIC
[Figure: BIC value along log10(λ1) for λ2 ∈ {0.01, 0.1, 1, 10}.]
  • 60. Outline
Statistical Model
Regularizing Scheme and optimization
Inference and Optimization
Simulation Studies
Spectroscopy and the cookie dough data
Multi-trait genomic selection for a biparental population (Colza)
  • 61. Quantitative Trait Loci (QTL) study in Colza Doubled haploid samples n = 103 homozygous lines of Brassica napus obtained by crossing the ‘Stellar’ and ‘Major’ cultivars. Biparental markers p = 300 markers with known loci, distributed across the 19 chromosomes, with values in {Major, Stellar, Missing} → {1, −1, 0}. Traits Consider q = 8 traits including survival traits (% survival in winter): surv92, surv93, surv94, surv97, surv99 flowering traits (no vernalization, 4 weeks or 8 weeks of vernalization): flower0, flower4, flower8 35
  • 62. Include genetic linkage information Genetic distance between markers A1 and A2 Let r12 be the recombination rate between A1 and A2; then d12 = −(1/2) log(1 − 2 r12). Linkage disequilibrium as covariance between the markers In a biparental population with independent recombination events, one has cor(A1, A3) = ρ^d13 = ρ^(d12 + d23), with ρ = e^−2. Proposition (Including LD information in the model) The matrix L is obtained by inverting the covariance matrix of the markers, which can be done analytically. 36
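The distance/correlation relation above can be checked numerically. This is a minimal sketch with made-up recombination rates (it is not code from the spring package): since ρ^d = exp(−2d) = 1 − 2r, correlations decay geometrically with genetic distance and therefore multiply along the chromosome.

```python
import numpy as np

# Map distance from a recombination rate r (slide 62):
# d = -1/2 * log(1 - 2r), so that cor(A_i, A_j) = rho**d_ij with rho = exp(-2).
def genetic_distance(r):
    return -0.5 * np.log(1.0 - 2.0 * r)

rho = np.exp(-2.0)

# Two adjacent intervals with hypothetical recombination rates:
r12, r23 = 0.10, 0.05
d12, d23 = genetic_distance(r12), genetic_distance(r23)

# Correlation with the neighbor recovers 1 - 2r, and distances add,
# so correlations multiply: cor(A1, A3) = rho**(d12 + d23).
cor_12 = rho ** d12          # equals 1 - 2*r12
cor_23 = rho ** d23
cor_13 = rho ** (d12 + d23)  # equals cor_12 * cor_23

assert np.isclose(cor_12, 1 - 2 * r12)
assert np.isclose(cor_13, cor_12 * cor_23)
```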
  • 63. Analytical form of L as a precision matrix Usually met in AR(1) processes. L is the inverse of the correlation matrix between the markers, L = U^T Λ U, where U is the m × m upper bidiagonal matrix with 1 on the diagonal and −ρ^d_{k,k+1} on the superdiagonal (k = 1, …, m−1), and Λ is the diagonal matrix with entries (1 − ρ^{2 d_{k,k+1}})^{−1} for k = 1, …, m−1 and final entry 1. 37
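The decomposition L = U^T Λ U can be verified numerically. The sketch below is illustrative (arbitrary inter-marker distances, not spring code): it builds U and Λ as described on the slide and checks that U^T Λ U is indeed the inverse of the AR(1)-type correlation matrix cor(A_i, A_j) = ρ^{d_ij}.

```python
import numpy as np

rho = np.exp(-2.0)
d = np.array([0.3, 0.1, 0.5])   # hypothetical inter-marker distances (m - 1 = 3 gaps)
m = d.size + 1
a = rho ** d                    # correlations between adjacent markers

# Upper bidiagonal U and diagonal Lambda from slide 63
U = np.eye(m)
for k in range(m - 1):
    U[k, k + 1] = -a[k]
Lam = np.diag(np.append(1.0 / (1.0 - a ** 2), 1.0))

L = U.T @ Lam @ U

# AR(1)-type correlation matrix: cor(A_i, A_j) = rho**(sum of gaps between i and j)
pos = np.concatenate(([0.0], np.cumsum(d)))
C = rho ** np.abs(pos[:, None] - pos[None, :])

assert np.allclose(L @ C, np.eye(m))  # L inverts C, analytically and cheaply
```

Note that L is tridiagonal, so encoding LD information costs O(m) storage rather than a dense m × m inverse.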
  • 64. Predictive performance 1. Split the data into training/test sets (n1 = 70, n2 = 33), 2. Fit each procedure using 5-fold CV for model selection, 3. Compute the test (prediction) error.

Table: Survival traits
Method              surv92  surv93  surv94  surv97  surv99  Mean PE
LASSO                 .79     .98     .90    1.02    1.00    .938
group-LASSO           .90    1.00     .92     .99     .92    .946
Enet (no LD)          .87    1.01     .97    1.03    1.03    .983
Gen-Enet (LD)         .75     .98     .89    1.03    1.02    .934
our proposal (LD)     .77     .96     .84    1.00    1.02    .918

Table: Flowering traits
Method              flower0  flower4  flower8  Mean PE
LASSO                  .58      .53      .74    .616
group-LASSO            .59      .55      .74    .626
Enet (no LD)           .55      .54      .69    .593
Gen-Enet (LD)          .55      .50      .74    .596
our proposal (LD)      .48      .46      .68    .54
38
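The three-step evaluation protocol above can be sketched as follows. This is a purely illustrative stand-in: ridge regression on simulated data replaces the penalized estimators actually compared in the tables, and all data and tuning grids are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the Colza data: n = 103 samples, p predictors, one trait.
n, p = 103, 30
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [1.0, -0.5, 0.8]
y = X @ beta + rng.standard_normal(n)

# Step 1: split into training (n1 = 70) and test (n2 = 33) sets.
perm = rng.permutation(n)
train, test = perm[:70], perm[70:]

# Ridge estimator (closed form), standing in for the penalized methods.
def ridge_fit(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Step 2: choose the penalty by 5-fold CV on the training set only.
folds = np.array_split(rng.permutation(train), 5)
lambdas = [0.01, 0.1, 1.0, 10.0]
cv_err = []
for lam in lambdas:
    errs = []
    for k in range(5):
        val = folds[k]
        tr = np.concatenate([folds[j] for j in range(5) if j != k])
        b = ridge_fit(X[tr], y[tr], lam)
        errs.append(np.mean((y[val] - X[val] @ b) ** 2))
    cv_err.append(np.mean(errs))
best = lambdas[int(np.argmin(cv_err))]

# Step 3: refit on the full training set, report test prediction error.
b = ridge_fit(X[train], y[train], best)
pe = np.mean((y[test] - X[test] @ b) ** 2)
print(round(pe, 3))
```

In the tables, this prediction error is computed per trait and averaged to give the "Mean PE" column.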
  • 65. Estimated Residual Covariance ˆR [Figure: heatmap of the estimated residual correlations between the eight traits (flower0, flower4, flower8, surv92, surv93, surv94, surv97, surv99), on a −1 to 1 correlation scale.] 39
  • 66. Estimated Regression Coefficients ˆB [Figure: estimated regression coefficients for the eight outcomes plotted against the position of the markers (0–1500).] 40
  • 67. Estimated Direct Effects ˆΩxy [Figure: estimated direct effects for the eight outcomes plotted against the position of the markers (0–1500).] 41
  • 69. QTL Mapping (chr. 2, 8, 10), regression coefficients ˆB [Figure: estimated regression coefficients ˆB along chromosomes 2, 8 and 10, with marker names on the horizontal axis and one color per outcome (surv92–surv99, flower0–flower8).] 42
  • 70. QTL Mapping (chr. 2, 8, 10), direct links ˆΩxy [Figure: estimated direct effects ˆΩxy along chromosomes 2, 8 and 10, same layout as the previous slide, with markedly fewer nonzero entries.] 42
  • 71. QTL Mapping (all chromosomes), ˆB [Figure: estimated regression coefficients ˆB for the eight outcomes along the 19 chromosomes, with marker names on the horizontal axis.] 43
  • 72. QTL Mapping (all chromosomes), ˆΩxy [Figure: estimated direct effects ˆΩxy for the eight outcomes along the 19 chromosomes.] 43
  • 73. Some concluding remarks Perspectives 1. Modelling: generalized fused-Lasso penalty; automatic inference of L; environment effects? multiparental and multi-population settings? 2. Technical algorithmic points: active-set strategy in the alternating algorithm; smart screening of irrelevant predictors; full C++ implementation. 3. Application to regulatory motif discovery: Y is a matrix of q microarrays for n genes (the individuals), X is the matrix of motif counts in the promoter of each gene, and L is a matrix based on the editing distance between motifs. A first attempt is made in the paper, but we would like to consider large-scale problems (tens/hundreds of q, thousands of n, tens of thousands of p). 44
  • 74. Thanks Hiring! We are looking for a post-doc with a strong background in optimization and statistics. Thanks to you for your patience, and to my co-workers. 45