About functional SIR
Victor Picheny, Rémi Servien & Nathalie Villa-Vialaneix
nathalie.villa@toulouse.inra.fr
http://www.nathalievilla.org
Journées “Données fonctionnelles”
Institut de Mathématiques de Toulouse, June 19th 2017
A joint work of the SFCB team
Victor Picheny, Rémi Servien, NV2
Outline
1 Background and motivation
2 Presentation of SIR
3 Our proposal
4 Simulations and Real data
Introduction
X a functional random variable and Y ∈ R
n i.i.d. realizations of (X, Y)
Objectives
variable selection in functional regression
selection of full intervals made of consecutive points
without any a priori information on the intervals
fully data-driven procedure
Question and mathematical framework
A functional regression problem: X: random variable (functional) & Y:
random real variable
E(Y|X)?
Data: n i.i.d. observations (x_i, y_i)_{i=1,...,n}.
x_i is not perfectly known but sampled at (fixed) points:
x_i = (x_i(t_1), . . . , x_i(t_p))^T ∈ R^p. We denote by X the (n × p) matrix
with rows x_1^T, . . . , x_n^T.
Question: Find a model that is easily interpretable and points out relevant
intervals for the prediction within the definition domain of X.
Method: Do not expand X on a functional basis but use the fact that the
entries of the digitized function x_i are ordered in a natural way.
Related works (variable selection in FDA)
LASSO / L1 regularization in linear models:
[Ferraty et al., 2010, Aneiros and Vieu, 2014] (isolated evaluation
points), [Matsui and Konishi, 2011] (selects elements of an expansion
basis)
[Fraiman et al., 2016] (blinding approach usable for various problems:
PCA, regression...)
[Gregorutti et al., 2015] adaptation of the importance of variables in
random forests to groups of variables
[Fauvel et al., 2015, Ferraty and Hall, 2015] cross-validation and a
greedy update of the selected evaluation points to select the most
relevant ones in a nonparametric framework
However, none of these approaches proposes to automatically design and
select contiguous sets of variables.
Related works (selection of groups of variables)
[James et al., 2009] L1 regularization in a linear model with sparsity on
the derivatives: piecewise-constant predictors
[Park et al., 2016] criterion based on a minimization of the overall
correlation during a greedy segmentation
[Grollemund et al., 2017] Bayesian approach from which an a posteriori
distribution of the informative intervals can be obtained
All are proposed in the framework of the linear model, and the second one
does not use the target variable to define and select the relevant intervals.
Our proposal: a semi-parametric (not entirely linear) model which selects
relevant intervals, combined with an automatic procedure to define the
intervals.
SIR in multidimensional framework
SIR: a semi-parametric regression model for X ∈ R^p:
Y = F(a_1^T X, . . . , a_d^T X, ε)
for a_1, . . . , a_d ∈ R^p (to be estimated), F : R^{d+1} → R, unknown, and ε, an
error, independent from X.
Standard assumption for SIR:
Y ⊥ X | P_A(X),
in which A is the so-called EDR space, spanned by (a_k)_{k=1,...,d}.
SIR is the regression extension of Linear Discriminant Analysis.
Estimation
Equivalence between SIR and eigendecomposition
A is included in the space spanned by the first d Σ-orthogonal
eigenvectors of the generalized eigendecomposition problem
Γa = λΣa,
with Σ the covariance matrix of X and Γ the covariance matrix of E(X|Y).
Estimation (when n > p)
compute X̄ = (1/n) Σ_{i=1}^n x_i and Σ̂ = (1/n) X^T (X − X̄)
split the range of Y into H different slices, τ_1, . . . , τ_H, and estimate
Ê(X|Y) = ( (1/n_h) Σ_{i: y_i ∈ τ_h} x_i )_{h=1,...,H}, with n_h = |{i : y_i ∈ τ_h}|,
in each slice, to obtain an estimate Γ̂
solve the eigendecomposition problem Γ̂a = λΣ̂a and obtain the
eigenvectors a_1, . . . , a_d
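To make these three steps concrete, here is a minimal sketch in plain R for the n > p case (a sketch, not the SISIR package code; slicing the ranks of y into H equal-count slices is one common choice, and all names are ours):

```r
## Minimal SIR sketch (n > p): slice y, compute within-slice means of X,
## then solve the generalized eigenproblem Gamma a = lambda Sigma a.
sir_edr <- function(X, y, H = 10, d = 2) {
  n <- nrow(X)
  Xbar  <- colMeans(X)
  Sigma <- crossprod(sweep(X, 2, Xbar)) / n              # hat(Sigma)
  slices <- cut(rank(y, ties.method = "first"), H, labels = FALSE)
  Mh <- t(sapply(1:H, function(h) colMeans(X[slices == h, , drop = FALSE])))
  ph <- as.numeric(table(slices)) / n                    # slice proportions
  Mc <- sweep(Mh, 2, Xbar)
  Gamma <- crossprod(Mc * sqrt(ph))                      # hat(Gamma)
  eig <- eigen(solve(Sigma, Gamma))                      # Sigma^{-1} Gamma
  Re(eig$vectors[, 1:d, drop = FALSE])                   # EDR directions a_1..a_d
}
```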
SIR in large dimensions: problem
In large dimensions (or in Functional Data Analysis), n < p and Σ̂ is
ill-conditioned and does not have an inverse ⇒ Z = (X − I_n X̄^T) Σ̂^{−1/2}
cannot be computed.
Different solutions have been proposed in the literature, based on:
prior dimension reduction (e.g., PCA) [Ferré and Yao, 2003] (in the
framework of FDA)
regularization (ridge...) [Li and Yin, 2008, Bernard-Michel et al., 2008]:
equivalent to the generalized eigendecomposition problem Γ̂a = λ(Σ̂ + µ_2 I)a
sparse SIR [Li and Yin, 2008, Li and Nachtsheim, 2008, Ni et al., 2005]
QZ-SIR [Coudret et al., 2014]: uses a method similar to the QR algorithm
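A sketch of the corresponding ridge fix, with Sigma and Gamma computed as in the previous sketch (names ours):

```r
## Ridge variant for n < p (sketch): the generalized eigenproblem becomes
## Gamma a = lambda (Sigma + mu2 I) a, whose left-hand matrix is invertible.
ridge_sir_edr <- function(Sigma, Gamma, mu2, d = 2) {
  eig <- eigen(solve(Sigma + mu2 * diag(nrow(Sigma)), Gamma))
  Re(eig$vectors[, 1:d, drop = FALSE])
}
```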
SIR in large dimensions: sparse versions
Specific issue to introduce sparsity in SIR
Sparsity on a multiple-index model: most authors use shrinkage
approaches, or sparsity on a single-index model and depletion (not shown)
First version: Li and Yin (2008), based on the regression formulation
Pro: sparsity common to all d dimensions
Con: minimization problem with dependent variables in R^p
Second version: Li and Nachtsheim (2008), based on the correlation
formulation
Pro: minimization problem with independent variables in R^d
Con: sparsity different in each of the d dimensions
Equivalent formulations
SIR as a regression problem: [Li and Yin, 2008] shows that SIR is
equivalent to the (double) minimization of
E(A, C) = Σ_{h=1}^H p̂_h ‖X̄_h − X̄ − Σ̂ A C_h‖²
for X̄_h = (1/n_h) Σ_{i: y_i ∈ τ_h} x_i, A a (p × d)-matrix and C_h a vector in R^d.
Rk: Given A, C is obtained as the solution of an ordinary least squares
problem...
SIR as a Canonical Correlation problem: [Li and Nachtsheim, 2008]
shows that SIR rewrites as the double optimization problem
max_{a_j, φ} Cor(φ(Y), a_j^T X),
where φ is any function R → R and the (a_j)_j are Σ-orthonormal.
Rk: The solution is shown to satisfy φ(y) = a_j^T E(X|Y = y), and a_j is
also obtained as the solution of the mean square error problem:
min_{a_j} E[(φ(Y) − a_j^T X)²]
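The first remark above in a few lines of R: given A, each C_h is an ordinary least squares fit (a sketch, with Mh the H × p matrix of slice means from the estimation sketch):

```r
## Given A, each C_h solves min_{C_h} ||(Xbar_h - Xbar) - Sigma A C_h||^2:
## an OLS fit with regressor matrix B = Sigma A.
C_given_A <- function(A, Sigma, Mh, Xbar) {
  B <- Sigma %*% A                                    # p x d regressors
  C <- apply(Mh, 1, function(m) qr.solve(B, m - Xbar))
  matrix(C, ncol = ncol(A), byrow = TRUE)             # H x d matrix of the C_h
}
```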
SIR in large dimensions: sparse versions
First version: sparse penalization of the ridge solution
If (Â, Ĉ) are the solutions of the ridge SIR, [Ni et al., 2005, Li and Yin, 2008]
propose to shrink this solution by minimizing
E_{s,1}(α) = Σ_{h=1}^H p̂_h ‖X̄_h − X̄ − Σ̂ Diag(α) Â Ĉ_h‖² + µ_1 ‖α‖_{L1}
(regression formulation of SIR)
Second version: [Li and Nachtsheim, 2008] derive the sparse optimization
problem from the correlation formulation of SIR:
min_{a_j^s} Σ_{i=1}^n ( P_{â_j}(X̄|y_i) − (a_j^s)^T x_i )² + µ_{1,j} ‖a_j^s‖_{L1},
in which P_{â_j}(X̄|y_i) is the projection of Ê(X|Y = y_i) = X̄_h onto the space
spanned by the solution of the ridge problem.
Characteristics of the different approaches and possible extensions

                        [Li and Yin, 2008]       [Li and Nachtsheim, 2008]
sparsity on             shrinkage coefficients   estimates
nb of optimization pbs  1                        d
sparsity                common to all dims       specific to each dim
SIR in large dimensions: our sparse version
Background: Back to the functional setting, we suppose that t_1, ..., t_p are
split into D intervals I_1, ..., I_D.
Based on the minimization problem of Li and Nachtsheim (2008).
Our adaptation: sparsity on the intervals, using α = (α_1, . . . , α_D):
∀ l = 1, . . . , p, â_{jl}^s = α̂_k â_{jl} for k such that t_l ∈ I_k.
the sparsity constraint is put on α and not directly on â_j^s
the α are made identical for all dimensions of the projection j = 1, . . . , d
Li and Nachtsheim (2008) (LASSO):
min_{a_j^s} Σ_{i=1}^n ( P_{â_j}(X̄|y_i) − (a_j^s)^T x_i )² + µ_{1,j} ‖a_j^s‖_{L1},
in which P_{â_j}(X̄|y_i) is the projection of Ê(X|Y = y_i) = X̄_h (for h such that
y_i is in slice h) onto the space spanned by the â_j.
Our adaptation:
α̂ = arg min_{α ∈ R^D} Σ_{j=1}^d Σ_{i=1}^n ( P_{â_j}(X̄|y_i) − (Λ(α) â_j)^T x_i )² + µ_1 ‖α‖_{L1}
with ∀ l = 1, . . . , p, â_{jl}^s = α̂_k â_{jl} for k such that t_l ∈ I_k, and
Λ(α) = Diag(α_1 I_{|I_1|}, . . . , α_D I_{|I_D|}) ∈ M_{p×p}.
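Below is a minimal sketch, in R, of this interval-wise LASSO step, assuming the ridge-SIR directions and the projections P_{â_j}(X̄|y_i) have already been computed; all names are illustrative, and this is a sketch rather than the SISIR package API:

```r
library(glmnet)

## Inputs (assumed, names ours): X (n x p) data matrix; ahat (p x d) ridge-SIR
## directions; proj (n x d), with proj[i, j] the scalar projection
## P_{ahat_j}(Xbar|y_i); interval: length-p vector giving, for each t_l,
## the index k of the interval I_k containing it.
fit_alpha <- function(X, ahat, proj, interval) {
  d <- ncol(ahat); D <- max(interval)
  ## column k of the design = sum over the t_l in I_k of ahat[l, j] x_i(t_l),
  ## i.e. the contribution of interval I_k to (Lambda(alpha) ahat_j)^T x_i
  Zj <- function(j) sapply(1:D, function(k)
    as.numeric(X[, interval == k, drop = FALSE] %*% ahat[interval == k, j]))
  Z <- do.call(rbind, lapply(1:d, Zj))   # (n*d) x D design, stacked over j
  r <- as.vector(proj)                   # responses, stacked in the same order
  cv <- cv.glmnet(Z, r, intercept = FALSE, standardize = FALSE)
  as.numeric(coef(cv, s = "lambda.min"))[-1]  # hat(alpha): one value per interval
}
```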
Summary: SISIR, a two-step approach
First step: solve the projection problem (using SIR and L2 regularization of
Σ), which provides the estimates (â_j)_{j∈{1,...,d}} of the vectors spanning the
EDR space.
Second step: sparsity on the D intervals, using α = (α_1, . . . , α_D) obtained by
solving a LASSO problem: this handles the functional setting by penalizing
entire intervals and not just isolated points.
SISIR: Characteristics
uses the approach based on the correlation formulation (because the
dimensionality of the optimization problem is smaller);
uses a shrinkage approach and optimizes the shrinkage coefficients in a
single optimization problem;
handles the functional setting by penalizing entire intervals and not just
isolated points.
An automatic approach to define intervals
1 Initial state: ∀ k = 1, . . . , p, τ_k = {t_k}
2 Iterate:
along the regularization path, select three values for µ_1: P% of the
coefficients are zero, P% of the coefficients are non zero, best GCV
define D− (the “strong zeros”: intervals whose coefficients are null for
the three values of µ_1) and D+ (the “strong non zeros”, defined similarly)
merge consecutive “strong zeros” (resp. “strong non zeros”), as well as
“strong zeros” (resp. “strong non zeros”) separated by a small number of
intervals of undetermined type (a sketch of this merge step is given below)
Until no more iterations can be performed.
3 Output: a collection of models (the first with p intervals, the last with 1),
M*_D (optimal for GCV) and the corresponding GCV_D versus D (number of
intervals).
Final solution: minimize GCV_D over D.
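An illustrative sketch of the merge step (the classification of the current intervals into the three types is assumed given; names are hypothetical and the actual SISIR implementation may differ):

```r
## status: one entry per current interval: "zero" for a "strong zero",
## "nonzero" for a "strong non zero", "und" for an undetermined interval.
merge_intervals <- function(status, max_gap = 2) {
  runs <- rle(status)
  v <- runs$values; l <- runs$lengths
  ## a short undetermined run flanked by the same determined type is absorbed
  for (r in seq_along(v)) {
    if (v[r] == "und" && l[r] <= max_gap && r > 1 && r < length(v) &&
        v[r - 1] == v[r + 1] && v[r - 1] != "und") v[r] <- v[r - 1]
  }
  filled <- inverse.rle(list(values = v, lengths = l))
  ## consecutive intervals of the same determined type get the same new index
  new_block <- c(TRUE, filled[-1] != filled[-length(filled)]) | filled == "und"
  cumsum(new_block)   # new interval index for each old interval
}

## e.g. merge_intervals(c("zero", "zero", "und", "zero", "nonzero"))
## returns c(1, 1, 1, 1, 2): the short undetermined gap is absorbed.
```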
Simulation framework
Data generated with:
X(t) a Gaussian process with mean µ(t) = −5 + 4t − 4t² and a
Matern covariance
a_j(t) = sin( t(2+j)π/2 − (j−1)π/3 ) 1_{I_j}(t)
Y = Σ_{j=1}^d log⟨X, a_j⟩
one model: (M1), d = 1, I_1 = [0.2, 0.4].
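A sketch of this simulation design in R; the slide only says "Matern", so the Matérn 5/2 kernel, its range, the sample size n and the absolute value inside the log are our assumptions:

```r
set.seed(1)
n <- 100; p <- 200                        # n is our choice; p as on the next slide
t_grid <- seq(0, 1, length.out = p)
mu <- -5 + 4 * t_grid - 4 * t_grid^2      # mean function of the slide
## Matern 5/2 covariance (assumed smoothness; range rho is our choice)
matern52 <- function(r, rho = 0.1)
  (1 + sqrt(5) * r / rho + 5 * r^2 / (3 * rho^2)) * exp(-sqrt(5) * r / rho)
K <- outer(t_grid, t_grid, function(s, t) matern52(abs(s - t)))
L <- chol(K + 1e-8 * diag(p))             # small jitter for numerical stability
X <- sweep(matrix(rnorm(n * p), n, p) %*% L, 2, mu, "+")  # n GP sample paths
## model (M1): d = 1, I_1 = [0.2, 0.4]
j <- 1
a1 <- sin(t_grid * (2 + j) * pi / 2 - (j - 1) * pi / 3) *
  (t_grid >= 0.2 & t_grid <= 0.4)
Y <- as.numeric(log(abs(X %*% a1 / p)))   # <X, a_1> by a Riemann sum; abs() is ours
```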
Definition of the intervals
[Figures: estimated coefficients along the iterations of the procedure, for
D = p = 200 (initial state = LASSO), D = 142, D = 41 and D = 5 intervals.]
Second model
(M2): d = 3 and I_1 = [0, 0.1], I_2 = [0.5, 0.65] and I_3 = [0.65, 0.78].
[Figures: results for (M2) obtained with SISIR and with standard sparse SIR.]
Tecator dataset
relevant intervals
easily interpretable
good MSE
Sunflower dataset
climatic time series (between 1975 and 2012, in France)
daily measures from April to October
X = evapotranspiration, Y = yield, n = 111, p = 309
Sunflower dataset
only two points identified outside the interval
focus on the second half of the interval
matches expert knowledge
Conclusion
SI-SIR:
sparse dimension reduction model adapted to functional framework
fully automated definition of relevant intervals in the range of the
predictors
Package SISIR available on CRAN at
https://cran.r-project.org/package=SISIR.
Perspectives
adaptation to multiple X
application to large-scale real data (agricultural application:
X={temperature,rainfall ...}, Y={yield})
replace CV criterion?
Aneiros, G. and Vieu, P. (2014).
Variable selection in infinite-dimensional problems.
Statistics and Probability Letters, 94:12–20.
Bernard-Michel, C., Gardes, L., and Girard, S. (2008).
A note on sliced inverse regression with regularizations.
Biometrics, 64(3):982–986.
Coudret, R., Liquet, B., and Saracco, J. (2014).
Comparison of sliced inverse regression approaches for underdetermined cases.
Journal de la Société Française de Statistique, 155(2):72–96.
Fauvel, M., Dechesne, C., Zullo, A., and Ferraty, F. (2015).
Fast forward feature selection of hyperspectral images for classification with Gaussian mixture models.
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 8(6):2824–2831.
Ferraty, F. and Hall, P. (2015).
An algorithm for nonlinear, nonparametric model choice and prediction.
Journal of Computational and Graphical Statistics, 24(3):695–714.
Ferraty, F., Hall, P., and Vieu, P. (2010).
Most-predictive design points for functional data predictors.
Biometrika, 97(4):807–824.
Ferré, L. and Yao, A. (2003).
Functional sliced inverse regression analysis.
Statistics, 37(6):475–488.
Fraiman, R., Gimenez, Y., and Svarc, M. (2016).
Feature selection for functional data.
Journal of Multivariate Analysis, 146:191–208.
Gregorutti, B., Michel, B., and Saint-Pierre, P. (2015).
Grouped variable importance with random forests and application to multiple functional data analysis.
Computational Statistics and Data Analysis, 90:15–35.
Grollemund, P., Abraham, C., Baragatti, M., and Pudlo, P. (2017).
Bayesian functional linear regression with sparse step functions.
Preprint.
James, G., Wang, J., and Zhu, J. (2009).
Functional linear regression that’s interpretable.
Annals of Statistics, 37(5A):2083–2108.
Li, L. and Nachtsheim, C. (2008).
Sparse sliced inverse regression.
Technometrics, 48(4):503–510.
Li, L. and Yin, X. (2008).
Sliced inverse regression with regularizations.
Biometrics, 64(1):124–131.
Liquet, B. and Saracco, J. (2012).
A graphical tool for selecting the number of slices and the dimension of the model in SIR and SAVE approaches.
Computational Statistics, 27(1):103–125.
Matsui, H. and Konishi, S. (2011).
Variable selection for functional regression models via the L1 regularization.
Computational Statistics and Data Analysis, 55(12):3304–3310.
Ni, L., Cook, D., and Tsai, C. (2005).
A note on shrinkage sliced inverse regression.
Biometrika, 92(1):242–247.
Park, A., Aston, J., and Ferraty, F. (2016).
Stable and predictive functional domain selection with application to brain images.
Preprint arXiv 1606.02186.
Parameter estimation
H (number of slices): usually, SIR is known to be not very sensitive to
the number of slices (provided H > d + 1). We took H = 10 (i.e., 10/30
observations per slice);
µ_2 and d (ridge estimate Â):
L-fold CV for µ_2 (for a d_0 large enough). Note that GCV as described in
[Li and Yin, 2008] cannot be used, since the current version of the L2
penalty involves the use of an estimate of Σ^{−1}.
using again L-fold CV, ∀ d = 1, . . . , d_0, an estimate of
R(d) = d − E[ Tr(Π_d Π̂_d) ],
in which Π_d and Π̂_d are the projectors onto the first d dimensions of the
EDR space and its estimate, is derived similarly as in
[Liquet and Saracco, 2012]. The evolution of R̂(d) versus d is studied
to select a relevant d.
µ_1 (LASSO): glmnet is used, in which µ_1 is selected by CV along the
regularization path (see the sketch below).
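A sketch of this last step with glmnet, taking Z and r as in the interval-LASSO sketch above (cv.glmnet and lambda.min are glmnet's standard tools for CV along the regularization path):

```r
library(glmnet)
## Z, r: design and response of the interval-LASSO sketch above
path  <- glmnet(Z, r, intercept = FALSE, standardize = FALSE)  # full mu_1 path
cvfit <- cv.glmnet(Z, r, intercept = FALSE, standardize = FALSE)
mu1   <- cvfit$lambda.min                                      # CV-optimal mu_1
alpha_hat <- as.numeric(coef(cvfit, s = mu1))[-1]
```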