Variable selection via fused sparse-group lasso penalized multi-state models
incorporating molecular data
Preprint · November 2024
DOI: 10.48550/arXiv.2411.17394
All content following this page was uploaded by Kaya Miah on 06 December 2024.

RESEARCH ARTICLE
Variable selection via fused sparse-group lasso penalized
multi-state models incorporating molecular data
Kaya Miah¹,² | Jelle J. Goeman³ | Hein Putter³,⁴ | Annette Kopp-Schneider¹ | Axel Benner¹
¹ Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
² Medical Faculty, Heidelberg University, Heidelberg, Germany
³ Department of Biomedical Data Sciences, Leiden University Medical Center (LUMC), Leiden, The Netherlands
⁴ Mathematical Institute, Leiden University, Leiden, The Netherlands
Correspondence
Kaya Miah, Division of Biostatistics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, D-69120 Heidelberg,
Germany. Email: k.miah@dkfz.de
Funding
Deutsche Forschungsgemeinschaft (DFG), Grant number: 514653984
Abstract
In multi-state models based on high-dimensional data, effective modeling strategies are required to determine an optimal,
ideally parsimonious model. In particular, linking covariate effects across transitions is needed to conduct joint variable
selection. A useful technique to reduce model complexity is to address homogeneous covariate effects for distinct transitions.
We integrate this approach into data-driven variable selection by extended regularization methods within multi-state model
building. We propose the fused sparse-group lasso (FSGL) penalized Cox-type regression in the framework of multi-state models,
combining the penalization concepts of pairwise differences of covariate effects along with transition grouping. For
optimization, we adapt the alternating direction method of multipliers (ADMM) algorithm to transition-specific hazards
regression in the multi-state setting. In a simulation study and application to acute myeloid leukemia (AML) data, we evaluate
the algorithm's ability to select a sparse model incorporating relevant transition-specific effects and similar
cross-transition effects. We investigate settings in which the combined penalty is beneficial compared to global lasso
regularization.
KEYWORDS:
Cox regression; High-dimensional data; Markov models; Regularization; Transition-specific hazards
1 INTRODUCTION
In medical research, common prediction models still predominantly make use of composite endpoints such as progression- or
event-free survival. However, these time-to-first-event endpoints do not take into account important aspects of the individual
disease and therapy course. Multi-state models are a natural framework to assess the effect of prognostic factors and treatment
on the event history of a patient and to separate risks for the occurrence of distinct events. These extend competing risks
analyses of event time endpoints such as time to progression, relapse, remission or death, by modeling the sequence of competing
consecutive events on a macro level. In survival analysis, the multi-state model class is used for event history data where
individuals experience a sequence of events over time. Each event is defined by an entry and exit time along with transition
types. This paper is motivated by an application to the acute myeloid leukemia (AML) disease pathway. Figure 1 illustrates the
event history for AML patients in the form of a state chart of a multi-state model with nine states and eight transitions. Distinct
states are treated as nodes and possible transitions are represented by directed arrows. To assess how probabilities of going
from state to state depend on covariates, multi-state proportional hazards regression models can be used. In the era of precision
medicine with increasingly high-dimensional information on molecular biomarkers, such a holistic analysis of a multi-state
model is of essential interest. For our motivating AML application, we will investigate the effect of biomarkers along with
established clinical covariates on the different transitions of the multi-state model depicted in Figure 1. Thus, effective variable
selection strategies for multi-state models incorporating high-dimensional molecular data are required to obtain a sparse model
and mitigate overfitting. Such data-driven model building strategies will contribute to a deeper understanding of the individual
disease progression and its therapeutic concepts as well as improved personalized prognoses.
[Figure 1: state chart. States: Active disease; 1st Complete Remission (CR1); 1st Relapse; 2nd Complete Remission (CR2);
2nd Relapse; Death (no CR); Death (CR1); Death (relapse); Death (CR2). Phases: First-line therapy, Second-line therapy.
Transitions are numbered 1 to 8.]
Figure 1 State chart of the multi-state model for acute myeloid leukemia (AML) with nine states and eight possible transitions
represented by arrows. Numbers denote the corresponding transition.
arXiv:2411.17394v1 [stat.ME] 26 Nov 2024
This paper focuses on data-driven variable selection via penalized multi-state models to capture pathogenic processes and
underlying etiologies more accurately. The goal is to select a sparse model based on high-dimensional molecular data by
extended regularization methods. We want to use a-priori knowledge about the multi-state model structure to help simplify
it. First, we assume that most biomarkers might have no effect on specific disease transitions, and if they have an effect on
one transition, they might have an effect on many. Second, we presume that parameters of similar transitions are often of a
similar direction and magnitude. Third, biomarker effects might only be relevant for specific transitions. Thus, the parameter
space dimensionality should be decreased by setting non-relevant biomarker effects to zero (i.e. sparsity), identifying similar
biomarker effects across distinct transitions (i.e. similarity) and detecting only relevant biomarker effects for specific transitions
of interest (i.e. group structuring). Further, we want to combine molecular and clinical data by incorporating established clinical
predictors into the model that remain unpenalized. In our motivating AML application illustrated in Figure 1, we assume that
biomarker effects of transitions 3 and 7, i.e. from first complete remission (CR1) to first relapse and from second complete
remission (CR2) to second relapse, might be similar. Further, we would not expect any biomarker effect on e.g. transition 1, i.e.
from active disease to early death, since this might rather be related to the initiation of intensive chemotherapy.
A scoping literature review on statistical methods for model selection in the framework of multi-state models was conducted
based on the PubMed database (http://www.ncbi.nlm.nih.gov/pubmed/advanced). In the following, we give a brief
overview of existing model selection strategies for multi-state model building in survival analysis.
Common methods for variable selection comprise regularization in the fitting process in order to avoid the inclusion of
covariates with non-relevant effects. Saadati et al.¹ proposed a lasso penalized cause-specific hazards approach for competing
risks data in higher dimensions, where the independently penalized cause-specific hazards models are linked by choosing the
combination of tuning parameters that yields the best prediction. In the multi-state setting, Sennhenn-Reulen and Kneib²
developed a data-driven regularization method for sparse modeling by combining so-called cross-transition effects of the same
baseline covariate. The structured fusion lasso penalization regularizes the $L_1$-norm of the regression coefficients as well
as all pairwise differences between distinct transitions. Dang et al.³ suggested $L_1$-penalization by a one-step coordinate
descent algorithm. Further, Huang et al.⁴ proposed a regularized continuous-time Markov model with the elastic net penalty.
Beyond that, Reulen and Kneib⁵ introduced the component-wise functional gradient descent boosting approach to perform
unsupervised variable selection and multi-state model choice simultaneously. In particular, they focused on non-linearity of
single transition-specific or cross-transition effects. Further, Edelmann et al.⁶ extended the global test to competing risks
and multi-state models to test if regression coefficients for a certain subset of transitions are equal under the Markov
assumption. Fiocco et al.⁷ introduced reduced rank proportional hazards regression to competing risks and multi-state models⁸
for limiting the number of regression parameters. Accelerated failure time (AFT) models form a model class that directly
estimates the effect of a covariate on survival times. Huang⁹ established the multi-state accelerated sojourn times model.
Ramchandani et al.¹⁰ provide insights into the estimation of an AFT model with intermediate states as auxiliary information.
With respect to model selection approaches, Huang et al.¹¹ consider regularized AFT models with high-dimensional covariates.
Beyond that, pseudo-observations in event history analysis, introduced by Andersen et al.¹², provide another direct modeling
technique. The state occupation probabilities are modeled directly instead of considering each transition intensity separately.
These pseudo-values are then used in a generalized estimating equation (GEE) to deduce estimates of the model parameters. In
terms of variable selection procedures, Wang et al.¹³ proposed penalized GEE based on high-dimensional longitudinal data.
Further, Niu et al.¹⁴ utilize penalized GEE for a marginal survival model. Based on pseudo-observations, Su et al.¹⁵ make use
of penalized GEE for proportional hazards mixture cure models.
We focus on the well-established hazard-based framework of Cox-type multi-state models, so that direct modeling approaches
are not further pursued in this work. Overall, there is already some valuable work on multi-state model selection. However, to
the best of our knowledge, a-priori information on the multi-state model structure is not yet well taken into account.
In this paper, we propose the fused sparse-group lasso (FSGL) penalty for multi-state models combining the concepts of
general sparsity, pairwise differences of covariate effects and transition grouping. For fitting such a penalized multi-state model,
we adapt the alternating direction method of multipliers (ADMM) optimization algorithm to Cox-type hazards regression in
the multi-state setting due to its beneficial feature of decomposing the objective function.
The remainder of the paper is structured as follows: The methodological background of multi-state models needed for the
proposed adaptation is given in Section 2. Section 3 introduces the FSGL penalty extended to the multi-state setting. Section 4
describes the general ADMM optimization algorithm for parameter estimation in Subsection 4.1 along with the derived ADMM
update steps to fit FSGL penalized multi-state models in Subsection 4.2. Section 5 shows the results of a proof-of-concept
simulation study to investigate the regularization performance of the derived algorithm and Section 6 illustrates a real data
application to AML patients.
2 METHODS FOR MULTI-STATE MODELING
The following Section provides a brief introduction to the multi-state model class in survival analysis needed for adapting the
fused sparse-group lasso penalty to the multi-state setting. A holistic framework to multi-state modeling theory can be found in
Andersen et al.¹⁶. Subsection 2.1 introduces the general multi-state process and defines the concept of transition-specific Cox
proportional hazards regression for multi-state models. Further, Subsection 2.2 states the explicit likelihood formulation in
the multi-state setting along with its derivatives needed for model fitting.
2.1 Multi-state proportional hazards regression model
Following Andersen and Keiding¹⁷ and Putter et al.¹⁸, a multi-state process is a stochastic process
$\{Z(t), t \in \mathcal{T}\}$ with times in $\mathcal{T} = [0, t_{\max}]$, $0 < t_{\max} < \infty$, and a finite state space
$\mathcal{K} = \{1, \dots, K\}$. The transition probabilities are given as
$$P_q(s, t) = P_{[k.k']}(s, t) = P(Z(t) = k' \mid Z(s) = k)$$
for transition $q = [k.k']$ from state $k$ to $k'$, $k, k' \in \mathcal{K}$, $s, t \in \mathcal{T}$, $s \le t$ and
$q \in \mathcal{Q} = \{1, \dots, Q\}$ the set of observable transitions.
We assume a Markovian model, i.e. the probability for a transition only depends on the current state of the multi-state process
at the current time. The transition intensities are defined as the corresponding derivatives
$$h_q(t) = \lim_{\Delta t \searrow 0} \frac{P_q(t, t + \Delta t)}{\Delta t}.$$
To assess the dependence on covariates, these transition-specific hazard rates can be modeled by separate Cox proportional
hazards models for each transition as
$$h_q(t \mid \boldsymbol{x}) = h_{0,q}(t) \exp\{\boldsymbol{\beta}_q^T \boldsymbol{x}\}, \quad q = 1, \dots, Q,$$
for an individual with covariate vector $\boldsymbol{x} = (x_1, \dots, x_P)^T \in \mathbb{R}^P$, where $h_{0,q}(t)$ denotes the
baseline hazard rate of transition $q$ at time $t$ and $\boldsymbol{\beta}_q = (\beta_{1,q}, \dots, \beta_{P,q})^T \in \mathbb{R}^P$
the vector of transition-specific regression coefficients. Thus, Cox-type regression
analysis for multi-state data enables simultaneous modeling of the relationship between covariates and all relevant transitions.¹⁹
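As a minimal sketch of the transition-specific proportional hazards form above, the following Python snippet evaluates $h_q(t \mid \boldsymbol{x})$ for all transitions at one fixed time point. The function name `transition_hazards` and the array layout (one column of coefficients per transition) are illustrative assumptions, not part of the original work.

```python
import numpy as np

def transition_hazards(x, beta, baseline):
    """Evaluate h_q(t | x) = h_{0,q}(t) * exp(beta_q^T x) for all Q transitions.

    x        : (P,) covariate vector of one individual
    beta     : (P, Q) array, column q holds beta_q (assumed layout)
    baseline : (Q,) baseline hazards h_{0,q}(t) at one fixed time t
    """
    x = np.asarray(x, dtype=float)
    beta = np.asarray(beta, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # beta.T @ x gives the Q linear predictors beta_q^T x
    return baseline * np.exp(beta.T @ x)
```

With a binary covariate and a coefficient of log(2) on one transition, the hazard of that transition doubles for carriers while the other transitions are unaffected.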
2.2 Multi-state likelihood formulation
In the multi-state framework, the generalized partial likelihood can be written in terms of a stratified formulation as a product
of Cox partial likelihoods for each transition, i.e.
$$l(\boldsymbol{\beta}) = \prod_{q=1}^{Q} l_q(\boldsymbol{\beta}_q) = \prod_{q=1}^{Q} \prod_{i=1}^{N} \left( \frac{\exp\{\boldsymbol{x}_i^T \boldsymbol{\beta}_q\}}{\sum_{l \in R_{i,q}} \exp\{\boldsymbol{x}_l^T \boldsymbol{\beta}_q\}} \right)^{\delta_{i,q}},$$
where $\boldsymbol{x}_i = (x_{1;i}, \dots, x_{P;i})^T \in \mathbb{R}^P$ denotes the covariate vector of individual $i$,
$i = 1, \dots, N$, $\boldsymbol{\beta}_q \in \mathbb{R}^P$ the transition-specific regression vector, and $\delta_{i,q}$ the event
indicator for transition $q$.¹⁸,²⁰ $R_{i,q}$ denotes the risk set for individual $i$ with transition $q$ at time $t_i$. This set
includes all individuals who are at risk of experiencing a transition of type $q$ at time $t_i$. The transition-specific Cox
partial likelihood $l_q(\boldsymbol{\beta}_q)$ compares the hazard of the individual with an event at time $t_i$ to the hazard of
all individuals at risk at $t_i$.
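The multi-state partial likelihood formulation for the stacked regression vector
$\boldsymbol{\beta} = (\beta_{1,1}, \dots, \beta_{1,Q}, \beta_{2,1}, \dots, \beta_{P,Q})^T \in \mathbb{R}^{PQ}$
and corresponding extended covariate vector
$\tilde{\boldsymbol{x}}_i = (x_{1.1;i}, \dots, x_{1.Q;i}, x_{2.1;i}, \dots, x_{P.Q;i})^T \in \mathbb{R}^{PQ}$
is then derived as
$$l(\boldsymbol{\beta}) = \prod_{i=1}^{n} \left( \frac{\exp\{\tilde{\boldsymbol{x}}_i^T \boldsymbol{\beta}\}}{\sum_{l \in \tilde{R}_i} \exp\{\tilde{\boldsymbol{x}}_l^T \boldsymbol{\beta}\}} \right)^{\delta_i},$$
where $\tilde{R}_i$ denotes the corresponding risk set formulation based on long format data according to de Wreede et al.²¹ with
single lines $i, j$ from a total of $n$ rows. In this format, each individual has a row for each transition for which they are at
risk. The negative logarithm of the multi-state partial likelihood is
$$L(\boldsymbol{\beta}) = -\log[l(\boldsymbol{\beta})] = \sum_{i=1}^{n} \delta_i \left[ -\tilde{\boldsymbol{x}}_i^T \boldsymbol{\beta} + \log\left( \sum_{l \in \tilde{R}_i} \exp\{\tilde{\boldsymbol{x}}_l^T \boldsymbol{\beta}\} \right) \right]. \quad (1)$$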
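The long-format formulation in (1) can be illustrated with a short Python sketch. It assumes that each long-format row carries an entry time, an exit time, an event indicator and a 0-based transition index, and that the risk set of an event row consists of all rows of the same transition that are under observation at the event time. The helper names `expand_covariates` and `neg_partial_loglik` are hypothetical, and ties are handled in the Breslow manner.

```python
import numpy as np

def expand_covariates(x, trans, Q):
    """Build extended covariates: value x[i, p] is placed at position p*Q + trans[i].

    This follows the stacking order (beta_{1,1}, ..., beta_{1,Q}, beta_{2,1}, ...).
    x: (n, P) covariates per long-format row, trans: (n,) 0-based transition index.
    """
    n, P = x.shape
    X = np.zeros((n, P * Q))
    for p in range(P):
        X[np.arange(n), p * Q + trans] = x[:, p]
    return X

def neg_partial_loglik(beta, X, start, stop, status, trans):
    """Negative multi-state partial log-likelihood as in (1) on long-format rows."""
    eta = X @ beta
    total = 0.0
    for i in np.flatnonzero(status == 1):
        # risk set: same transition, entered before t_i and still at risk at t_i
        at_risk = (trans == trans[i]) & (start < stop[i]) & (stop >= stop[i])
        total += -eta[i] + np.log(np.exp(eta[at_risk]).sum())
    return total
```

For three rows of a single transition with event times 1, 2, 3 and all coefficients zero, the value is log(3) + log(2) + log(1) = log(6), which provides a quick sanity check.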
The regression parameters are then estimated by minimizing this negative partial log-likelihood. The estimate
$\hat{\boldsymbol{\beta}}$ is plugged into Breslow's estimate of the cumulative baseline hazard¹⁸
$$\hat{\Lambda}_{0,q}(t) = \sum_{j: t_j \le t} \frac{1}{\sum_{l \in R_{j,q}} \exp\{\boldsymbol{x}_l^T \hat{\boldsymbol{\beta}}_q\}}.$$
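For estimation, we need the first and second derivative of the Cox partial log-likelihood function. The score vector is given as
$$U(\boldsymbol{\beta}) = \frac{\partial}{\partial \boldsymbol{\beta}} \log[l(\boldsymbol{\beta})] = \boldsymbol{X}^T (\boldsymbol{\delta} - \hat{\boldsymbol{\mu}}), \quad (2)$$
where $\boldsymbol{X} \in \mathbb{R}^{n \times PQ}$ denotes the design matrix, $\boldsymbol{\delta} = (\delta_1, \dots, \delta_n)^T$
the vector of event indicators and $\hat{\boldsymbol{\mu}} = (\hat{\mu}_1, \dots, \hat{\mu}_n)^T$ the estimated cumulative hazards
with elements $\hat{\mu}_i = \hat{\Lambda}_0(t_i) \exp\{\boldsymbol{x}_i^T \boldsymbol{\beta}\}$. The Hessian matrix is
$$J(\boldsymbol{\beta}) = \frac{\partial^2}{\partial \boldsymbol{\beta} \partial \boldsymbol{\beta}^T} \log[l(\boldsymbol{\beta})] = -\boldsymbol{X}^T \boldsymbol{W} \boldsymbol{X} \quad (3)$$
with $\boldsymbol{W} \in \mathbb{R}^{n \times n}$ the weight matrix of the estimated cumulative hazards $\hat{\boldsymbol{\mu}}$.²²,²³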
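The score representation in (2) can be checked numerically. The sketch below is restricted, as a simplifying assumption, to a single transition with right-censored data and no delayed entry; it computes the Breslow cumulative baseline hazard at each row's own time, forms $\hat{\mu}_i$, and returns $\boldsymbol{X}^T(\boldsymbol{\delta} - \hat{\boldsymbol{\mu}})$. All function names are illustrative.

```python
import numpy as np

def breslow_cumhaz_at_rows(eta, time, status):
    """Breslow cumulative baseline hazard evaluated at each row's own time."""
    w = np.exp(eta)
    cumhaz = np.zeros(len(eta))
    for j in np.flatnonzero(status == 1):
        denom = w[time >= time[j]].sum()          # sum over the risk set at t_j
        cumhaz[time >= time[j]] += 1.0 / denom    # contributes to all t >= t_j
    return cumhaz

def score(beta, X, time, status):
    """Score vector U(beta) = X^T (delta - mu_hat) as in (2)."""
    eta = X @ beta
    mu = breslow_cumhaz_at_rows(eta, time, status) * np.exp(eta)
    return X.T @ (status - mu)

def log_partial_lik(beta, X, time, status):
    """Cox log partial likelihood, used here only to verify the score."""
    eta = X @ beta
    return sum(eta[i] - np.log(np.exp(eta[time >= time[i]]).sum())
               for i in np.flatnonzero(status == 1))
```

Comparing `score` with a central finite-difference gradient of `log_partial_lik` on a small data set is a convenient way to confirm that the two expressions agree.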
3 FUSED SPARSE-GROUP LASSO PENALTY
This Section describes our adaptation of the fused sparse-group lasso (FSGL) penalty to multi-state models as a key variable
selection strategy for high-dimensional multi-state modeling.
For data-driven model selection, established methods incorporate regularization in the fitting process in order to conduct
variable selection.²⁴,²⁵ Especially in applications with few events per variable, regularization is needed in order to obtain a
unique and more stable solution of the regression parameters.²⁶ Several regularization methods that perform covariate selection
beyond the least absolute shrinkage and selection operator (lasso)²⁷ have been developed. These include elastic net²⁸, fused
lasso²⁹, sparse-group lasso³⁰ and fused sparse-group lasso³¹ penalization. In the multi-state framework, adapted regularization
approaches incorporate the lasso¹,³, elastic net⁴ and structured fusion lasso² for penalized multi-state modeling. Table 1 gives
an overview of existing penalization methods along with their penalty functions as well as their original publications for linear
regression models and adaptations to Cox models for survival outcomes.
The fused sparse-group lasso (FSGL) penalty, introduced by Zhou et al.³¹ and adapted by Beer et al.⁴⁰ for linear models,
provides a combination of lasso, fused and grouped regularization. Thus, prior information on spatial and group structure can be
incorporated into the prediction model. The global lasso penalty fosters overall sparsity. The fusion penalty regularizes absolute
pairwise differences of regression coefficients. The group penalty allows variables within the same group to be jointly selected
or shrunk to zero.
We propose to transfer this combined penalty to the multi-state framework based on transition-specific hazards regression
models in order to obtain overall sparsity, link covariate effects across transitions and incorporate transition grouping. Thus, we
advocate the FSGL penalty, which provides estimates with three properties:
1. Sparsity: The resulting estimator automatically zeros out small estimated coefficients to achieve variable selection and
simplify the model.⁴¹
2. Similarity: The resulting estimator penalizes absolute differences of covariate effects across similar transitions, thus
addressing homogeneous cross-transition effects.
3. Group structuring: The resulting estimator allows variables within the same transition to be jointly selected or shrunk to
zero, thus incorporating transition grouping.
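Table 1 Examples of penalization methods.
Penalization method | Penalty function | Parameters | Model type
--- | --- | --- | ---
Ridge | $\lambda \lVert\boldsymbol{\beta}\rVert_2^2$ | $\lambda > 0$ | Linear (Hoerl and Kennard³²), Cox (Gray³³, Verweij and van Houwelingen³⁴)
Lasso | $\lambda \lVert\boldsymbol{\beta}\rVert_1$ | $\lambda > 0$ | Linear (Tibshirani²⁷), Cox (Tibshirani³⁵)
Elastic net | $\alpha \lVert\boldsymbol{\beta}\rVert_1 + (1 - \alpha) \lVert\boldsymbol{\beta}\rVert_2^2$ | $\alpha \in [0, 1]$ | Linear (Zou and Hastie²⁸), Cox (Simon et al.³⁶)
Fused lasso | $\lambda_1 \sum_{p=1}^{P} \lvert\beta_p\rvert + \lambda_2 \sum_{p=2}^{P} \lvert\beta_p - \beta_{p-1}\rvert$ | $\lambda_1, \lambda_2 > 0$ | Linear (Tibshirani et al.²⁹), Cox (Chaturvedi et al.³⁷)
Group lasso | $\lambda \sum_{g \in \mathcal{G}} \sqrt{p_g} \lVert\boldsymbol{\beta}_g\rVert_2$ | $\lambda > 0$, groups $\mathcal{G}$, group size $p_g$ | Linear (Yuan and Lin³⁸), Cox (Kim et al.³⁹)
Sparse-group lasso | $\alpha \lVert\boldsymbol{\beta}\rVert_1 + (1 - \alpha) \sum_{g \in \mathcal{G}} \sqrt{p_g} \lVert\boldsymbol{\beta}_g\rVert_2$ | $\alpha \in [0, 1]$ | Linear & Cox (Simon et al.³⁰)
Fused sparse-group lasso | $\lambda [\alpha\gamma \lVert\boldsymbol{\beta}\rVert_1 + (1 - \gamma) \lVert\boldsymbol{D}\boldsymbol{\beta}\rVert_1 + (1 - \alpha)\gamma \sum_{g \in \mathcal{G}} \sqrt{p_g} \lVert\boldsymbol{\beta}_g\rVert_2]$ | $\lambda > 0$, $\alpha, \gamma \in [0, 1]$, fusion matrix $\boldsymbol{D}$ | Linear (Beer et al.⁴⁰)
Lasso mstate | $\lambda \sum_q \sum_p \lvert\beta_{p,q}\rvert$ | $\lambda > 0$ | Competing risks (Saadati et al.¹), Multi-state (Dang et al.³)
Elastic net mstate | $(1 - \alpha) \sum_{p,q} \beta_{p,q}^2 + \alpha \sum_{p,q} \lvert\beta_{p,q}\rvert$ | $\alpha \in [0, 1]$ | Multi-state (Huang et al.⁴)
Fusion lasso mstate | $\lambda_1 \sum_q \sum_p \lvert\beta_{p,q}\rvert + \lambda_2 \sum_{q,q'} \sum_{p=1}^{P} \lvert\beta_{p,q} - \beta_{p,q'}\rvert$ | $\lambda_1, \lambda_2 > 0$ | Multi-state (Sennhenn-Reulen and Kneib²)

We consider the same set of $P$ (time-fixed) covariates, e.g. biomarkers, for each transition $q \in \{1, \dots, Q\} = \mathcal{Q}$.
Further, we presume a subset of pairs of similar transitions $\mathcal{S} = \{(q, q') : q \ne q', q, q' \in \mathcal{Q}\}$, for
which we assume that covariate effects across these transitions are of a similar magnitude, i.e. we consider potential
cross-transition effects.² The FSGL penalty function is then defined as
$$p_{\lambda,\mathrm{FSGL}}(\boldsymbol{\beta}) = \lambda \left[ \alpha\gamma \sum_{q=1}^{Q} \sum_{p=1}^{P} |\beta_{p,q}| + (1 - \gamma) \sum_{(q,q') \in \mathcal{S}} \sum_{p=1}^{P} |\beta_{p,q} - \beta_{p,q'}| + (1 - \alpha)\gamma \sum_{q=1}^{Q} \|\boldsymbol{\beta}_q\|_2 \right], \quad (4)$$
with transition-specific regression coefficients $\beta_{p,q}$ of covariate $x_p$, $p = 1, \dots, P$ for transition $q$,
transition-specific regression vector $\boldsymbol{\beta}_q \in \mathbb{R}^P$ and tuning parameters $\lambda, \alpha, \gamma$. The
tuning parameter $\lambda > 0$ controls the overall level of regularization, $\alpha \in [0, 1]$ balances between global lasso and
group lasso and $\gamma \in [0, 1]$ balances between sparse penalties and the fusion penalty.⁴⁰ Thus, the optimal tuning parameter
$\lambda_{\mathrm{opt}}$ is chosen at pre-selected values of $\alpha$ and $\gamma$. For $(\alpha, \gamma) = (1, 1)$, the estimator
reduces to the global lasso, for $(\alpha, \gamma) = (0, 1)$ to the group penalty and for $(\alpha, \gamma) = (1, 0)$ or
$(\alpha, \gamma) = (0, 0)$ to the fusion penalty. The regression parameter $\boldsymbol{\beta}$ is estimated by minimizing the
penalized negative partial log-likelihood function, i.e.
$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \left[ L(\boldsymbol{\beta}) + p_{\lambda,\mathrm{FSGL}}(\boldsymbol{\beta}) \right].$$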
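To make the role of the three tuning parameters concrete, the following Python sketch evaluates the FSGL penalty (4) for a coefficient matrix with one column per transition. The function name `fsgl_penalty`, the matrix layout and the 0-based pair indices are assumptions made for illustration only.

```python
import numpy as np

def fsgl_penalty(beta, lam, alpha, gamma, similar_pairs):
    """Evaluate the FSGL penalty (4).

    beta          : (P, Q) array, column q holds beta_q (assumed layout)
    similar_pairs : list of 0-based transition pairs (q, q') treated as similar
    """
    beta = np.asarray(beta, dtype=float)
    lasso = np.abs(beta).sum()                              # global lasso term
    fusion = sum(np.abs(beta[:, q] - beta[:, r]).sum()      # pairwise differences
                 for q, r in similar_pairs)
    group = np.linalg.norm(beta, axis=0).sum()              # one L2 norm per transition
    return lam * (alpha * gamma * lasso
                  + (1.0 - gamma) * fusion
                  + (1.0 - alpha) * gamma * group)
```

Setting `(alpha, gamma)` to `(1, 1)`, `(0, 1)` or `(1, 0)` isolates the lasso, group and fusion terms respectively, which mirrors the special cases listed above.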
4 OPTIMIZATION ALGORITHM
This Section introduces the general concept of the alternating direction method of multipliers (ADMM) optimization algorithm
in Subsection 4.1 and provides the explicitly derived ADMM updating steps to fit FSGL penalized multi-state models in
Subsection 4.2. The criterion for selecting optimal penalty parameters is described in Subsection 4.3.
For penalized Cox-type regression, several numerical optimization algorithms exist for parameter estimation by minimizing
the penalized negative likelihood function. Simon et al.³⁰ utilize an accelerated generalized gradient algorithm for the
sparse-group lasso penalty. However, the accelerated gradient method depends on the separability of the penalty term across groups
of $\boldsymbol{\beta}$, so that the fusion penalty can only be applied within groups. For the structured fusion lasso penalty,
Sennhenn-Reulen and Kneib² make use of a penalized iteratively re-weighted least squares algorithm. This second-order optimization
has high computational cost and potential convergence problems.³ Further, coordinate descent algorithms do not work for the fused
lasso penalty, because the penalty is neither separable into a sum of functions of the individual elements of $\boldsymbol{\beta}$
nor continuously differentiable. Thus, we chose the ADMM optimization algorithm to fit FSGL penalized multi-state models due to the
decomposability of the objective function as well as superior convergence properties.
4.1 Alternating direction method of multipliers algorithm
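The alternating direction method of multipliers (ADMM) algorithm provides a very general framework for numerical optimization
of convex functions. It originates from the 1950s and was developed in the 1970s⁴²,⁴³, but was later examined comprehensively by
Boyd et al.⁴⁴ in a broader context. The algorithm combines the decomposability of the objective function with the superior
convergence properties of the method of multipliers.⁴⁴ Consider the following general optimization problem w.r.t. a variable
$\boldsymbol{\beta} \in \mathbb{R}^P$
$$\min_{\boldsymbol{\beta}} f(\boldsymbol{\beta}) + g(\boldsymbol{\beta}),$$
where $f, g$ denote convex functions. In the ADMM framework, the generic constrained optimization problem introducing an
auxiliary variable $\boldsymbol{\theta} \in \mathbb{R}^P$ is given as
$$\min_{\boldsymbol{\beta}, \boldsymbol{\theta}} f(\boldsymbol{\beta}) + g(\boldsymbol{\theta}) \quad \text{subject to} \quad \boldsymbol{\theta} - \boldsymbol{\beta} = \boldsymbol{0}.$$
Thus, the objective function becomes additively separable, which simplifies the subsequent optimization steps. As in the method
of multipliers, the augmented Lagrangian function adding an $L_2$-term to enhance optimization stability⁴⁵ is given as
$$\mathcal{L}_\rho(\boldsymbol{\beta}, \boldsymbol{\theta}, \boldsymbol{\phi}) = f(\boldsymbol{\beta}) + g(\boldsymbol{\theta}) + \boldsymbol{\phi}^T (\boldsymbol{\theta} - \boldsymbol{\beta}) + \frac{\rho}{2} \|\boldsymbol{\theta} - \boldsymbol{\beta}\|_2^2$$
$$= \mathcal{L}_\rho(\boldsymbol{\beta}, \boldsymbol{\theta}, \boldsymbol{\nu}) = f(\boldsymbol{\beta}) + g(\boldsymbol{\theta}) + \frac{\rho}{2} \|\boldsymbol{\theta} - \boldsymbol{\beta} + \boldsymbol{\nu}\|_2^2 - \frac{\rho}{2} \|\boldsymbol{\nu}\|_2^2,$$
with Lagrangian multiplier $\boldsymbol{\phi} \in \mathbb{R}^P$, augmented Lagrangian parameter $\rho > 0$ (i.e. the ADMM step size)
and scaled dual variable $\boldsymbol{\nu} = \boldsymbol{\phi} / \rho \in \mathbb{R}^P$. The general ADMM iterations consist of the
following alternating update steps at iteration $r + 1$:
$$\boldsymbol{\beta}^{r+1} = \arg\min_{\boldsymbol{\beta}} \mathcal{L}_\rho(\boldsymbol{\beta}, \boldsymbol{\theta}^r, \boldsymbol{\nu}^r),$$
$$\boldsymbol{\theta}^{r+1} = \arg\min_{\boldsymbol{\theta}} \mathcal{L}_\rho(\boldsymbol{\beta}^{r+1}, \boldsymbol{\theta}, \boldsymbol{\nu}^r),$$
$$\boldsymbol{\nu}^{r+1} = \boldsymbol{\nu}^r + \boldsymbol{\beta}^{r+1} - \boldsymbol{\theta}^{r+1}.$$
The algorithm comprises a $\boldsymbol{\beta}$-minimization step, a $\boldsymbol{\theta}$-minimization step and a dual variable
$\boldsymbol{\nu}$-update. Thus, the usual joint minimization is split into two steps according to the decomposition of the
objective function into a part in $\boldsymbol{\beta}$ (e.g. likelihood) and a part in $\boldsymbol{\theta}$ (e.g. penalty).
As a stopping criterion, Boyd et al.⁴⁴ propose sufficiently small primal and dual residuals, i.e.
$$\|\boldsymbol{\beta}^{r+1} - \boldsymbol{\theta}^{r+1}\|_2 < \epsilon_1 = \sqrt{P} \epsilon_{\mathrm{abs}} + \epsilon_{\mathrm{rel}} \max\{\|\boldsymbol{\beta}^r\|_2, \|\boldsymbol{\theta}^r\|_2\} \quad \text{and}$$
$$\|\rho(\boldsymbol{\theta}^{r+1} - \boldsymbol{\theta}^r)\|_2 < \epsilon_2 = \sqrt{P} \epsilon_{\mathrm{abs}} + \epsilon_{\mathrm{rel}} \|\boldsymbol{\nu}^r\|_2,$$
with absolute and relative tolerances $\epsilon_{\mathrm{abs}} = 10^{-4}$ and $\epsilon_{\mathrm{rel}} = 10^{-2}$.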
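The alternating updates and the residual-based stopping rule can be sketched for a simple case. The example below is not the Cox-type setting of this paper: as a deliberate simplification it uses a least-squares loss $f(\boldsymbol{\beta}) = \frac{1}{2}\|\boldsymbol{A}\boldsymbol{\beta} - \boldsymbol{y}\|_2^2$ and a plain lasso penalty, so that both minimization steps have closed forms. It is written with the constraint $\boldsymbol{\beta} - \boldsymbol{\theta} = \boldsymbol{0}$ and a scaled dual variable in the convention of Boyd et al., so that the dual update adds $\boldsymbol{\beta} - \boldsymbol{\theta}$. The name `admm_lasso` and all defaults are illustrative assumptions.

```python
import numpy as np

def soft_threshold(a, kappa):
    """Elementwise soft-thresholding, the proximal operator of kappa * |.|."""
    return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)

def admm_lasso(A, y, lam, rho=1.0, eps_abs=1e-4, eps_rel=1e-2, max_iter=1000):
    """ADMM for min 0.5*||A b - y||^2 + lam*||theta||_1 subject to b - theta = 0."""
    p = A.shape[1]
    beta, theta, nu = np.zeros(p), np.zeros(p), np.zeros(p)
    lhs = A.T @ A + rho * np.eye(p)
    Aty = A.T @ y
    for _ in range(max_iter):
        # beta-step: quadratic problem, solved exactly
        beta = np.linalg.solve(lhs, Aty + rho * (theta - nu))
        # theta-step: proximal operator of the penalty
        theta_old = theta
        theta = soft_threshold(beta + nu, lam / rho)
        # scaled dual update
        nu = nu + beta - theta
        # primal and dual residuals with absolute/relative tolerances
        r_norm = np.linalg.norm(beta - theta)
        s_norm = np.linalg.norm(rho * (theta - theta_old))
        eps1 = np.sqrt(p) * eps_abs + eps_rel * max(np.linalg.norm(beta),
                                                    np.linalg.norm(theta))
        eps2 = np.sqrt(p) * eps_abs + eps_rel * np.linalg.norm(rho * nu)
        if r_norm < eps1 and s_norm < eps2:
            break
    return theta  # the sparse auxiliary variable carries the exact zeros
```

For an identity design matrix the lasso solution is the soft-thresholded response, which gives a simple reference value when experimenting with the tolerances.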
4.2 ADMM for penalized multi-state models
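In the FSGL penalized multi-state framework, the constrained optimization problem for the stacked regression parameter
$\boldsymbol{\beta} \in \mathbb{R}^{PQ}$ is given as
$$\min_{\boldsymbol{\beta}, \boldsymbol{\theta}} f(\boldsymbol{\beta}) + g(\boldsymbol{\theta}) \quad \text{subject to} \quad \theta_m - \boldsymbol{K}_m \boldsymbol{\beta} = 0, \quad m \in \{1, \dots, M\},$$
where $f(\boldsymbol{\beta}) = L(\boldsymbol{\beta})$ is the negative multi-state partial log-likelihood function as in (1) and
$g(\boldsymbol{\theta}) = p_{\lambda,\mathrm{FSGL}}(\boldsymbol{\theta})$ is the FSGL penalty function (4) with auxiliary variable
$\boldsymbol{\theta} = (\theta_1, \dots, \theta_M)^T \in \mathbb{R}^M$, $M = PQ + s + PQ$, such that
$\theta_m = \boldsymbol{K}_m \boldsymbol{\beta}$. The penalty structure matrix is defined as
$\boldsymbol{K} = (\boldsymbol{K}_1 | \dots | \boldsymbol{K}_M)^T \in \mathbb{R}^{M \times PQ}$, with elements
$k_{ij} \in \{-1, 0, 1\}$, such that
$$\boldsymbol{K}_m = \begin{cases} \boldsymbol{u}_m, & \text{if } m \in \{1, \dots, PQ\}, \\ \boldsymbol{d}_{m-PQ}, & \text{if } m \in \{PQ + 1, \dots, PQ + s\}, \\ \boldsymbol{G}_{m-PQ-s}, & \text{if } m \in \{PQ + s + 1, \dots, PQ + s + Q\}, \end{cases}$$
where $\boldsymbol{u}_m$ denotes the unit vector of the identity matrix $\boldsymbol{I}_{PQ} \in \mathbb{R}^{PQ \times PQ}$
corresponding to the global lasso penalty. The contrast vector $\boldsymbol{d}_{m-PQ}$ is the $(m - PQ)$-th row of the fusion matrix
$\boldsymbol{D} \in \mathbb{R}^{s \times PQ}$ for $s$ pairs of similar transitions. It corresponds to the fusion penalty and has
elements $d_{ij} \in \{-1, 1\}$ at the positions of the covariates of such similar transitions, e.g.
$\boldsymbol{d}_1 = (1, -1, 0, \dots, 0)^T$ for covariates $X_{1.1}$ and $X_{1.2}$ of transitions 1 and 2.
$\boldsymbol{G}_{m-PQ-s} \in \mathbb{R}^{P \times PQ}$ are the group matrices of the $Q$ transitions consisting of unit vectors that
indicate the group allocation of a variable to a corresponding transition for the group penalty. The penalty structure matrix
$\boldsymbol{K}$ is then given as
$$\boldsymbol{K} = \begin{bmatrix} \boldsymbol{I}_{PQ} \\ \boldsymbol{D} \\ \boldsymbol{G}_1 \\ \vdots \\ \boldsymbol{G}_Q \end{bmatrix} = \begin{bmatrix}
1 & 0 & 0 & \cdots & \cdots & 0 & 0 & 0 \\
0 & 1 & 0 & \cdots & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \cdots & 0 & 0 & 1 \\
1 & -1 & 0 & \cdots & \cdots & 0 & 0 & 0 \\
1 & 0 & -1 & \cdots & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \cdots & 0 & 1 & -1 \\
1 & 0 & 0 & \cdots & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & \cdots & \cdots & 1 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \cdots & 0 & 0 & 1
\end{bmatrix}.$$
Thus, the total number of rows of the penalty structure matrix $\boldsymbol{K} \in \mathbb{R}^{M \times PQ}$ is
$M = PQ + s + PQ$. The optimization of the likelihood and penalty terms is thereby separated and therefore simplified.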
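A possible construction of the penalty structure matrix is sketched below in Python. It assumes the stacking order $(\beta_{1,1}, \dots, \beta_{1,Q}, \beta_{2,1}, \dots, \beta_{P,Q})$ from Subsection 2.2, 0-based indices, and one fusion row per covariate and pair of similar transitions. The function name `build_penalty_matrix` is hypothetical.

```python
import numpy as np

def build_penalty_matrix(P, Q, similar_pairs):
    """Stack K = [I; D; G_1; ...; G_Q] for a (P*Q)-dimensional stacked beta.

    similar_pairs: list of 0-based transition pairs (q, q') to be fused.
    Position of beta_{p,q} in the stacked vector (0-based): p * Q + q.
    """
    def idx(p, q):
        return p * Q + q

    identity = np.eye(P * Q)                      # global lasso block
    D = np.zeros((P * len(similar_pairs), P * Q)) # fusion block
    row = 0
    for q, r in similar_pairs:
        for p in range(P):
            D[row, idx(p, q)] = 1.0
            D[row, idx(p, r)] = -1.0
            row += 1
    groups = []                                   # one P x PQ block per transition
    for q in range(Q):
        G = np.zeros((P, P * Q))
        for p in range(P):
            G[p, idx(p, q)] = 1.0
        groups.append(G)
    return np.vstack([identity, D] + groups)
```

With `P = 2`, `Q = 3` and one similar pair, the matrix has 6 + 2 + 6 = 14 rows, and multiplying the first group block with the stacked coefficient vector returns the coefficients of the first transition.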
For the $\boldsymbol{\beta}$-updating step, Cox estimation of the regression parameter $\boldsymbol{\beta}$ is performed by
numerical algorithms. The gradient descent update is given as
$\boldsymbol{\beta}^{r+1}_{\mathrm{GD}} = \boldsymbol{\beta}^r + \epsilon_{\mathrm{GD}} U(\boldsymbol{\beta}^r)$ using the score
vector $U(\boldsymbol{\beta}^r)$ at iteration $r$ as in (2) and step size $\epsilon_{\mathrm{GD}}$. The Newton-Raphson update is
$$\boldsymbol{\beta}^{r+1}_{\mathrm{NR}} = \boldsymbol{\beta}^r - J(\boldsymbol{\beta}^r)^{-1} U(\boldsymbol{\beta}^r),$$
using both the gradient $U(\boldsymbol{\beta}^r)$ and Hessian matrix $J(\boldsymbol{\beta}^r)$ at iteration $r$ as in (3). The
estimation tolerance for the convergence criterion based on the partial log-likelihood is denoted as $v_{\mathrm{NR}}$. A hybrid
algorithm as proposed by Goeman²² combines adaptive gradient descent and Newton-Raphson to derive $\boldsymbol{\beta}$-estimates in
a Cox model. It starts with a single gradient descent step and then switches to Newton-Raphson updating steps. For an efficient
$\boldsymbol{\theta}$-updating step, the proximity operator is utilized, i.e. the vector soft-thresholding operator for
$\boldsymbol{a} \in \mathbb{R}^m$
$$S_\kappa(\boldsymbol{a}) = (1 - \kappa / \|\boldsymbol{a}\|_2)_+ \boldsymbol{a},$$
with $S_\kappa(\boldsymbol{0}) = \boldsymbol{0}$ and $(\cdot)_+ = \max\{0, \cdot\}$. As a shrinkage operator, it provides a simple
closed-form solution for the $\boldsymbol{\theta}$-update (see Boyd et al.⁴⁴ for details).
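The vector soft-thresholding operator translates directly into a few lines of Python. The function name is an illustrative choice; for a one-dimensional input the operator reduces to the usual scalar soft-thresholding used for the lasso and fusion terms, whereas for a block of coefficients it shrinks the whole block towards zero at once.

```python
import numpy as np

def vector_soft_threshold(a, kappa):
    """S_kappa(a) = (1 - kappa / ||a||_2)_+ * a, with S_kappa(0) = 0."""
    a = np.asarray(a, dtype=float)
    norm = np.linalg.norm(a)
    if norm == 0.0:
        return np.zeros_like(a)  # defined separately to avoid division by zero
    return max(0.0, 1.0 - kappa / norm) * a
```

A vector with Euclidean norm below `kappa` is set to zero entirely, which is how the group penalty removes all coefficients of a transition jointly.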
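We derive the augmented Lagrangian function along with its first and second derivative w.r.t. $\boldsymbol{\beta}$ as
$$\mathcal{L}_\rho(\boldsymbol{\beta}, \boldsymbol{\theta}, \boldsymbol{\nu}) = f(\boldsymbol{\beta}) + g(\boldsymbol{\theta}) + \sum_{m=1}^{M} \left[ \nu_m (\theta_m - \boldsymbol{K}_m \boldsymbol{\beta}) + \frac{\rho}{2} \|\theta_m - \boldsymbol{K}_m \boldsymbol{\beta}\|_2^2 \right],$$
$$\frac{\partial}{\partial \boldsymbol{\beta}} \mathcal{L}_\rho(\boldsymbol{\beta}, \boldsymbol{\theta}, \boldsymbol{\nu}) = f'(\boldsymbol{\beta}) + \sum_{m=1}^{M} \left[ -\nu_m \boldsymbol{K}_m + \rho(-\theta_m + \boldsymbol{K}_m \boldsymbol{\beta}) \boldsymbol{K}_m \right]$$
$$= U(\boldsymbol{\beta}) + [\rho(\boldsymbol{\beta}^T \boldsymbol{K}^T - \boldsymbol{\theta}^T) - \boldsymbol{\nu}^T] \boldsymbol{K}$$
$$= \boldsymbol{X}^T (\boldsymbol{\delta} - \hat{\boldsymbol{\mu}}) + [\rho(\boldsymbol{\beta}^T \boldsymbol{K}^T - \boldsymbol{\theta}^T) - \boldsymbol{\nu}^T] \boldsymbol{K},$$
$$\frac{\partial^2}{\partial \boldsymbol{\beta} \partial \boldsymbol{\beta}^T} \mathcal{L}_\rho(\boldsymbol{\beta}, \boldsymbol{\theta}, \boldsymbol{\nu}) = f''(\boldsymbol{\beta}) + \sum_{m=1}^{M} \left[ \rho \boldsymbol{K}_m^T \boldsymbol{K}_m \right]$$
$$= J(\boldsymbol{\beta}) + \rho \boldsymbol{K}^T \boldsymbol{K}$$
$$= -\boldsymbol{X}^T \boldsymbol{W} \boldsymbol{X} + \rho \boldsymbol{K}^T \boldsymbol{K},$$
with step size $\rho > 0$, scaled dual variable $\boldsymbol{\nu} = (\nu_1, \dots, \nu_M)^T \in \mathbb{R}^M$, score vector
$U(\boldsymbol{\beta})$ as in (2) and Hessian matrix $J(\boldsymbol{\beta})$ as in (3). Thus, by plugging both derivatives into the
Newton-Raphson $\boldsymbol{\beta}$-updating step, our ADMM algorithm for the stacked regression parameter
$\boldsymbol{\beta} \in \mathbb{R}^{PQ}$ in a multi-state model consists of the following steps:
1. Initialize $\boldsymbol{\beta}^0$, $\boldsymbol{\theta}^0$, and $\boldsymbol{\nu}^0$.
2. Update until stopping criterion met:
$$\boldsymbol{\beta}^{r+1} = \boldsymbol{\beta}^r + (\boldsymbol{X}^T \boldsymbol{W}^r \boldsymbol{X} + \rho \boldsymbol{K}^T \boldsymbol{K})^{-1} [\boldsymbol{X}^T (\boldsymbol{\delta} - \hat{\boldsymbol{\mu}}^r) + [\rho(\boldsymbol{\beta}^T \boldsymbol{K}^T - \boldsymbol{\theta}^T) - \boldsymbol{\nu}^T] \boldsymbol{K}],$$
$$\theta_m^{r+1} = S_{\lambda_m w_m / \rho}(\boldsymbol{K}_m \boldsymbol{\beta}^{r+1} + \nu_m^r / \rho), \quad m = 1, \dots, M,$$
$$\boldsymbol{\nu}^{r+1} = \boldsymbol{\nu}^r + \rho(\boldsymbol{\theta}^{r+1} - \boldsymbol{K} \boldsymbol{\beta}^{r+1}),$$
where parameter dimensions are $\boldsymbol{\theta}, \boldsymbol{\nu} \in \mathbb{R}^M$. $\lambda_m$ denotes the regularization
parameters for the global lasso, fusion and group penalties, respectively, and $w_m = \sqrt{P}$ the group weights incorporating the
group sizes corresponding to the group penalty. For the stopping criterion, we follow Boyd et al.⁴⁴ adapted for FSGL by Beer et
al.⁴⁰ (Appendix Section 2.2) as
$$\|\boldsymbol{\theta}^{r+1} - \boldsymbol{K} \boldsymbol{\beta}^{r+1}\|_2 < \epsilon_1 \quad \text{and} \quad \|\rho \boldsymbol{K}^T (\boldsymbol{\theta}^{r+1} - \boldsymbol{\theta}^r)\|_2 < \epsilon_2,$$
with $\epsilon_1 = \sqrt{PQ} \epsilon_{\mathrm{abs}} + \epsilon_{\mathrm{rel}} \max\{\|\boldsymbol{K} \boldsymbol{\beta}^{r+1}\|_2, \|\boldsymbol{\theta}^{r+1}\|_2\}$,
$\epsilon_2 = \sqrt{M} \epsilon_{\mathrm{abs}} + \epsilon_{\mathrm{rel}} \|\boldsymbol{K}^T \boldsymbol{\nu}^{r+1}\|_2$ and tolerances
$\epsilon_{\mathrm{abs}}, \epsilon_{\mathrm{rel}}$ as in Subsection 4.1.
Regarding the ADMM step size $\rho > 0$, we follow Beer et al.⁴⁰ by implementing an adaptive step size proposed by He et al.⁴⁶
to accelerate the convergence of the ADMM algorithm, i.e.
$$\rho^{r+1} = \begin{cases} \tau \rho^r, & \text{if } \|\boldsymbol{\theta}^{r+1} - \boldsymbol{K} \boldsymbol{\beta}^{r+1}\|_2 > \eta \|\rho \boldsymbol{K}^T (\boldsymbol{\theta}^{r+1} - \boldsymbol{\theta}^r)\|_2, \\ \rho^r / \tau, & \text{if } \|\boldsymbol{\theta}^{r+1} - \boldsymbol{K} \boldsymbol{\beta}^{r+1}\|_2 < \eta \|\rho \boldsymbol{K}^T (\boldsymbol{\theta}^{r+1} - \boldsymbol{\theta}^r)\|_2, \\ \rho^r, & \text{otherwise,} \end{cases}$$
where we set $\tau = 2$, $\eta = 10$ and initialize $\rho^0 = 1$. Algorithm 1 provides a summary of the adapted ADMM algorithm to
FSGL penalized multi-state models (FSGLmstate).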
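A residual-balancing rule of this kind can be sketched as follows. The sketch uses the form described by Boyd et al., in which the step size is decreased when the dual residual exceeds $\eta$ times the primal residual; reading the second case of the rule in this way is an assumption of the example. The function name `update_rho` is illustrative.

```python
def update_rho(rho, primal_res, dual_res, tau=2.0, eta=10.0):
    """Adaptive ADMM step size by residual balancing (Boyd et al. form, assumed).

    primal_res: norm of the primal residual, e.g. ||theta - K beta||_2
    dual_res  : norm of the dual residual, e.g. ||rho * K^T (theta - theta_old)||_2
    """
    if primal_res > eta * dual_res:
        return rho * tau   # primal residual dominates: penalize infeasibility more
    if dual_res > eta * primal_res:
        return rho / tau   # dual residual dominates: relax the penalty
    return rho             # residuals are balanced: keep the step size
```

With `tau = 2` and `eta = 10`, the step size doubles or halves only when the two residuals differ by more than one order of magnitude.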
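Algorithm 1 ADMM for fused sparse-group lasso penalized multi-state models (FSGLmstate)
1: Set $\boldsymbol{K} \in \mathbb{R}^{M \times PQ}$, $\alpha, \gamma \in [0, 1]$, $\rho = 1$, $\epsilon_{\mathrm{NR}} = 0.01$, and $v_{\mathrm{NR}} = 10^{-6}$.
2: initialize $\boldsymbol{\beta}^0 = \boldsymbol{0}_{PQ}$, $\boldsymbol{\theta}^0 = \boldsymbol{0}_M$, $\boldsymbol{\nu}^0 = \boldsymbol{0}_M$.
3: repeat
4: Update $\boldsymbol{\beta}^{r+1} = \boldsymbol{\beta}^r + (\boldsymbol{X}^T \boldsymbol{W}^r \boldsymbol{X} + \rho \boldsymbol{K}^T \boldsymbol{K})^{-1} [\boldsymbol{X}^T (\boldsymbol{\delta} - \hat{\boldsymbol{\mu}}^r) + [\rho(\boldsymbol{\beta}^T \boldsymbol{K}^T - \boldsymbol{\theta}^T) - \boldsymbol{\nu}^T] \boldsymbol{K}]$,
5: Update $\theta_m^{r+1} = S_{\lambda_m w_m / \rho}(\boldsymbol{K}_m \boldsymbol{\beta}^{r+1} + \nu_m^r / \rho)$, $m = 1, \dots, M$,
6: Update $\boldsymbol{\nu}^{r+1} = \boldsymbol{\nu}^r + \rho(\boldsymbol{\theta}^{r+1} - \boldsymbol{K} \boldsymbol{\beta}^{r+1})$,
7: until $\|\boldsymbol{\theta}^{r+1} - \boldsymbol{K} \boldsymbol{\beta}^{r+1}\|_2 < \epsilon_1$ and $\|\rho \boldsymbol{K}^T (\boldsymbol{\theta}^{r+1} - \boldsymbol{\theta}^r)\|_2 < \epsilon_2$ for sufficiently small $\epsilon_1$ and $\epsilon_2$.
8: obtain $\hat{\boldsymbol{\beta}} = \hat{\boldsymbol{\theta}}$.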
To tackle the dependency of the penalized estimation solution on relative variable scales, standardization is performed for
continuous covariates before applying penalization, i.e. $x^*_{p.q} = x_{p.q} / \hat{\sigma}_{x_{p.q}}$, where
$\hat{\sigma}_{x_{p.q}}$ denotes the empirical standard deviation of $x_{p.q}$.
For interpretation, the regression coefficients have to be scaled back after estimation.
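The scaling and back-scaling steps can be written compactly. The following Python sketch divides each covariate column by its empirical standard deviation (computed with `ddof=1`, which is an assumption about the estimator used) and maps coefficients estimated on the standardized scale back to the original scale; both function names are illustrative.

```python
import numpy as np

def standardize(X):
    """Scale each column by its empirical standard deviation (no centering)."""
    sd = X.std(axis=0, ddof=1)
    return X / sd, sd

def rescale_coefficients(beta_std, sd):
    """Map coefficients estimated on standardized covariates back to the original scale."""
    return beta_std / sd
```

Because `X_std @ beta_std` equals `X @ (beta_std / sd)`, the linear predictor is unchanged by the back-transformation, so hazard ratios on the original covariate scale are recovered.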
The algorithm can be easily amended to situations in which certain covariates should not be penalized (e.g. established
clinical predictors). To this end, we introduce an individual penalty scaling factor $\zeta_m \ge 0$, $m = 1, \dots, PQ$, which
allows different penalties for each variable, i.e. $\lambda_m = \lambda \zeta_m$.⁴⁷ Unpenalized parameters get a penalty scaling
factor set to zero, i.e. $\zeta_m = 0$ for the corresponding indices $m \in \{1, \dots, PQ\}$.
Further, it is important to note that the ADMM algorithm does not generate exact zeros for the
$\hat{\boldsymbol{\beta}}$-solution.⁴⁸,⁴⁵ However, the estimated auxiliary variable $\hat{\boldsymbol{\theta}}$ is sparse, so that
variable selection results are based on the derived estimate $\hat{\boldsymbol{\theta}}$. Thus, we get the final estimated
penalized regression parameters as $\hat{\boldsymbol{\beta}} = \hat{\boldsymbol{\theta}}$.
4.3 Selection of tuning parameters
For tuning parameter selection, we focus on the approximate generalized cross-validation (GCV) statistic49
. This selection
criterion was used by Tibshirani et al.29
for the fused lasso and Fan and Li41
for variable selection in penalized Cox models.
GCV is an estimator of the predictive ability of a model50
, which is defined as
GCV(𝜆) =
𝐿( ̂
𝜷)
𝑁[1 − 𝑒(𝜆)∕𝑁]2
,
where $\lambda$ is a general tuning parameter. The effective number of model parameters for the Cox proportional hazards model in the last step of the Newton-Raphson algorithm iteration41 is approximated as
$$e(\lambda) = \operatorname{tr}\left[ \left\{ \partial^{2} L(\hat{\boldsymbol{\beta}}) + \Sigma_{\lambda}(\hat{\boldsymbol{\beta}}) \right\}^{-1} \partial^{2} L(\hat{\boldsymbol{\beta}}) \right],$$
with
$$\Sigma_{\lambda}(\hat{\boldsymbol{\beta}}) = \operatorname{diag}\left\{ \frac{p'(\hat{\beta}_{1,1})}{|\hat{\beta}_{1,1}|}, \ldots, \frac{p'(\hat{\beta}_{P,Q})}{|\hat{\beta}_{P,Q}|} \right\}$$
and $p'(\cdot)$ denoting the first derivative of the locally quadratic approximated penalty function. The optimal tuning parameter is then selected as $\lambda_{\mathrm{opt}} = \arg\min_{\lambda} \{\mathrm{GCV}(\lambda)\}$. For the selection of an optimal combination of multiple tuning parameters, grid search29 along with the Brent optimization algorithm51 is utilized.
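The GCV computation above can be sketched numerically as follows. This is a simplified illustration under stated assumptions, not the FSGLmstate implementation: the penalty is taken to be a plain lasso term $p(\beta) = \lambda |\beta|$, so that $p'(|\beta|) = \lambda$, the trace is restricted to coefficients that are numerically non-zero in order to avoid division by zero, and `loss` and `hessian` stand in for $L(\hat{\boldsymbol{\beta}})$ and $\partial^{2} L(\hat{\boldsymbol{\beta}})$ supplied by the caller.

```python
import numpy as np

def effective_params(hessian, beta, lam, eps=1e-8):
    """e(lambda) = tr[{H + Sigma}^{-1} H], sketched for an L1 penalty lam * |b|."""
    beta = np.asarray(beta, dtype=float)
    active = np.abs(beta) > eps                 # assumption: use non-zero coefficients only
    if not active.any():
        return 0.0
    H = np.asarray(hessian, dtype=float)[np.ix_(active, active)]
    sigma = np.diag(lam / np.abs(beta[active]))  # p'(|b|) / |b| with p'(|b|) = lam
    return float(np.trace(np.linalg.solve(H + sigma, H)))

def gcv(loss, hessian, beta, lam, n):
    """GCV(lambda) = L(beta_hat) / (N * [1 - e(lambda) / N]^2)."""
    e = effective_params(hessian, beta, lam)
    return loss / (n * (1.0 - e / n) ** 2)

# Toy inputs (hypothetical): identity Hessian, one active coefficient.
H = np.eye(2)
beta_hat = [1.0, 0.0]
e = effective_params(H, beta_hat, lam=1.0)
score = gcv(loss=10.0, hessian=H, beta=beta_hat, lam=1.0, n=10)
```

For `lam = 0` the function returns the number of active coefficients, which matches the unpenalized degrees of freedom; larger `lam` shrinks the effective number of parameters towards zero.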
5 SIMULATION STUDY
This Section describes the design of a proof-of-concept simulation study for evaluating FSGL penalized multi-state models in
terms of variable selection in Subsection 5.1 and illustrates corresponding results in Subsection 5.2.
5.1 Simulation design
The aim of the following proof-of-concept simulation study is to evaluate the variable selection procedure based on FSGL penalized multi-state models in terms of its ability to select a sparse model distinguishing between relevant transition-specific effects and equal cross-transition effects. As a methodological phase II simulation study, it offers empirical evidence to demonstrate validity in finite samples across a limited range of scenarios.52 The corresponding ADEMP criteria of the simulation study are summarized in Table 2. A detailed simulation study plan according to ADEMP-PreReg53 can be found in the Supporting Information.
Data-generating mechanism
In each simulation run, we generate multi-state data with a sample size of $N = 1000$ from the AML multi-state model displayed in Figure 1 based on transition-specific hazards regression as a nested series of competing risks experiments according to Beyersmann et al.55 Data are generated by the following process: Waiting times in state $l$ are generated from an exponential distribution with hazard $h_{l\cdot} = \sum_{k=1, k \neq l}^{9} h_{lk}$, $l = 1, \ldots, 9$. Transition-specific baseline hazards are set constant to $h_{0,q}(t) = 0.05$ for all transitions $q = 1, \ldots, 8$. We synthesize two independent biomarkers as binary covariates $X_{p,i} \sim \mathrm{Bernoulli}(0.5)$, $p = 1, 2$, $i = 1, \ldots, 1000$. The true regression parameters for biomarker $X_1$ are set to $\beta_{1,1} = 1.5$ for transition 1, $\beta_{1,3} = \beta_{1,7} = 1.2$ for transitions 3 and 7, $\beta_{1,4} = \beta_{1,8} = -0.8$ for transitions 4 and 8 and $\beta_{1,2} = \beta_{1,5} = \beta_{1,6} = 0$ for transitions 2, 5 and 6. Similar transitions are 3 and 7, i.e. from first complete remission (CR1) to first relapse and from second complete remission (CR2) to second relapse, as well as 4 and 8, i.e. CR1 to death in CR1 and CR2 to death in CR2. Thus, covariate $X_1$ has equal effects on these two pairs of similar transitions. Covariate $X_2$ has no effect on any transition, i.e. $\beta_{2,1} = \cdots = \beta_{2,8} = 0$.
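The data-generating process described above can be sketched for a single subject as follows. This is an illustrative sketch rather than the authors' simulation code: the state labels follow the transition descriptions of the 9-state AML model, censoring is omitted, and the destination state is drawn with probability proportional to the transition-specific hazards, as is usual for competing risks simulation. The function name `simulate_path` is hypothetical.

```python
import numpy as np

# Transition q -> (from_state, to_state) of the 9-state AML model.
TRANSITIONS = {
    1: ("Active", "CR1"), 2: ("Active", "Death(noCR)"),
    3: ("CR1", "Relapse1"), 4: ("CR1", "Death(CR1)"),
    5: ("Relapse1", "CR2"), 6: ("Relapse1", "Death(relapse)"),
    7: ("CR2", "Relapse2"), 8: ("CR2", "Death(CR2)"),
}
BASELINE = 0.05                      # constant baseline hazard for all transitions
BETA = np.zeros((2, 8))              # rows: X1, X2; columns: transitions 1..8
BETA[0, [0, 2, 6]] = [1.5, 1.2, 1.2]
BETA[0, [3, 7]] = -0.8

def simulate_path(x, rng):
    """Simulate one subject's sequence of (transition, time) pairs without censoring."""
    state, time, path = "Active", 0.0, []
    while True:
        out = [q for q, (frm, _) in TRANSITIONS.items() if frm == state]
        if not out:                  # no outgoing transition: path ends here
            return path
        haz = np.array([BASELINE * np.exp(BETA[:, q - 1] @ x) for q in out])
        total = haz.sum()            # all-cause hazard of leaving the current state
        time += rng.exponential(1.0 / total)
        q = out[rng.choice(len(out), p=haz / total)]
        state = TRANSITIONS[q][1]
        path.append((q, time))

rng = np.random.default_rng(42)
x_i = rng.binomial(1, 0.5, size=2).astype(float)   # two Bernoulli(0.5) biomarkers
path_i = simulate_path(x_i, rng)
```

Each returned path starts with transition 1 or 2 and contains at most four transitions (1, 3, 5, 7), since every later state is either absorbing or has no further outgoing transition in this model.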
Target
Our primary target is the set of true non-zero regression coefficients $\beta_{p.q}$ from the penalized multi-state Cox-type proportional hazards models
$$h_q(t|\boldsymbol{x}) = h_{0,q}(t) \exp\{\boldsymbol{\beta}_q^{T} \boldsymbol{x}\}, \quad q = 1, \ldots, 8,$$
where $h_{0,q}(t)$ denotes the baseline hazard rate of transition $q$ at time $t$, $\boldsymbol{x} = (x_1, \ldots, x_P)^{T} \in \mathbb{R}^{P}$ the vector of covariates and $\boldsymbol{\beta}_q \in \mathbb{R}^{P}$ the vector of transition-specific regression coefficients for $P$ covariates.

Table 2 ADEMP criteria of the simulation study according to Morris et al.54

| ADEMP criterion | Definition |
|---|---|
| Aim | Evaluation of sparse variable selection detecting relevant transition-specific effects and equal cross-transition effects |
| Data-generating mechanism | Multi-state model based on transition-specific hazards models |
| Estimand/target | Regression coefficients |
| Methods | Unpenalized Cox-type multi-state estimation with ADMM optimization; Lasso penalized multi-state model with ADMM optimization (LASSOmstate); Fused sparse-group lasso penalized multi-state model with ADMM optimization (FSGLmstate) |
| Performance measures | True positive rate (TPR); False discovery rate (FDR); Bias; Mean squared error (MSE) |
Methods
We aim to compare the FSGLmstate algorithm to unpenalized multi-state Cox-type estimation and lasso penalized estimation (LASSOmstate) based on ADMM optimization. For fitting Cox-type multi-state models by ADMM optimization as described in Section 4.2, we chose the following parameter settings: The ADMM variables are initialized as $\boldsymbol{\beta}^{0} = \boldsymbol{\theta}^{0} = \boldsymbol{\nu}^{0} = \boldsymbol{0}$ and the adaptive ADMM step size as $\rho^{0} = 1$. The step size in gradient descent is set to $\epsilon_{\mathrm{GD}} = 0.01$, the tolerance of the stopping criterion for Cox estimation to $\mathrm{tol}_{\mathrm{GD}} = 10^{-6}$, the relative and absolute tolerance for the ADMM stopping criterion to $\epsilon_{\mathrm{rel}} = 10^{-2}$ and $\epsilon_{\mathrm{abs}} = 10^{-4}$ and the maximum number of iterations to $\mathrm{maxiter} = 500$. For each combination of tuning parameters $\alpha, \gamma \in \{0, 0.25, 0.5, 0.75, 1\}$, the optimal overall penalty parameter $\hat{\lambda}_{\mathrm{opt}} > 0$ is selected by minimal GCV over a grid of $\lambda \in \{0.01, \ldots, 500\}$, equally spaced on a logarithmic scale.
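The tuning grid can be set up as in the following sketch. The number of grid points (`num=30`) is not stated in the text and is an assumption, as are the helper `best_lambda_per_pair` and the toy criterion `toy_gcv`, which only stands in for the GCV of a fitted model.

```python
import itertools
import numpy as np

alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
gammas = [0.0, 0.25, 0.5, 0.75, 1.0]
lambdas = np.geomspace(0.01, 500.0, num=30)   # log-spaced grid; num=30 is an assumption

def best_lambda_per_pair(gcv_fn):
    """For each (alpha, gamma) pair, return the grid value of lambda with minimal GCV."""
    return {
        (a, g): float(min(lambdas, key=lambda lam: gcv_fn(a, g, lam)))
        for a, g in itertools.product(alphas, gammas)
    }

def toy_gcv(alpha, gamma, lam):
    """Hypothetical stand-in criterion with its minimum near lambda = 10."""
    return (np.log(lam) - np.log(10.0)) ** 2 + (alpha - 1.0) ** 2 + (gamma - 0.25) ** 2

lambda_opt = best_lambda_per_pair(toy_gcv)
```

In an actual analysis, `gcv_fn` would refit the penalized model for each triple $(\alpha, \gamma, \lambda)$, so warm starts along the $\lambda$ grid are a natural way to reduce run time.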
Performance measures
Regularization performance is assessed by true positive rates (TPR) and false discovery rates (FDR) of variable selection. Median counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) of variables over all simulations are calculated. Based on these absolute counts, TPR is calculated as $\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$. Further, FDR is defined as the number of unrelated variables selected (i.e. false positives) divided by the total number of selected variables, i.e. $\mathrm{FDR} = \frac{\mathrm{FP}}{\mathrm{TP} + \mathrm{FP}}$.
Estimation accuracy for the non-zero covariates is quantified by the bias, $\mathrm{Bias}(\hat{\boldsymbol{\beta}}) = \hat{\boldsymbol{\beta}} - \boldsymbol{\beta}$, and the mean squared error (MSE) over all simulation iterations. The MSE for the non-zero covariates is defined as
$$\mathrm{MSE}_{nz}(\hat{\boldsymbol{\beta}}) = \frac{1}{d} \sum_{p,q \,:\, \beta_{p,q} \neq 0} (\hat{\beta}_{p,q} - \beta_{p,q})^{2},$$
where $d$ denotes the number of non-zero covariates with $\beta_{p,q} \neq 0$ of the true model. The mean bias and mean MSE averaged over the non-zero predictors over all simulation runs along with Monte Carlo standard errors (MCSE) are calculated according to Morris et al.54
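For a single simulated data set, these performance measures can be computed as in the sketch below. The function name `selection_metrics`, the convention of returning an FDR of zero when nothing is selected and the example estimates in `beta_hat` are assumptions for illustration; the true coefficients are those of the simulation design.

```python
import numpy as np

def selection_metrics(beta_hat, beta_true):
    """Return TPR, FDR and the MSE over the truly non-zero coefficients."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    beta_true = np.asarray(beta_true, dtype=float)
    selected = beta_hat != 0
    relevant = beta_true != 0
    tp = int(np.sum(selected & relevant))
    fp = int(np.sum(selected & ~relevant))
    fn = int(np.sum(~selected & relevant))
    tpr = tp / (tp + fn)
    fdr = fp / (tp + fp) if (tp + fp) > 0 else 0.0  # assumption: FDR = 0 if nothing selected
    mse_nz = float(np.mean((beta_hat[relevant] - beta_true[relevant]) ** 2))
    return tpr, fdr, mse_nz

# True effects of X1 and X2 on transitions 1..8 as in the simulation design.
beta_true = [1.5, 0, 1.2, -0.8, 0, 0, 1.2, -0.8] + [0] * 8
# Hypothetical estimates: beta_{1,8} missed, beta_{1,6} falsely selected.
beta_hat = [1.4, 0, 1.1, -0.7, 0, 0.2, 1.1, 0] + [0] * 8
tpr, fdr, mse_nz = selection_metrics(beta_hat, beta_true)
```

In this toy example four of the five non-zero effects are found and one of the five selected effects is spurious, giving a TPR of 0.8 and an FDR of 0.2.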
The number of simulation runs is based on the TPR as one of the primary performance measures of interest. Thus, we need $n_{\mathrm{sim}} = 225$ simulation repetitions per scenario as we aim for $\mathrm{TPR} \geq 0.9$ and $\mathrm{MCSE}(\mathrm{TPR}) \leq 0.02$ and assume $\mathrm{MCSE}(\widehat{\mathrm{TPR}}) \leq 0.15$, resulting in $n_{\mathrm{sim}} = \frac{0.9 \cdot 0.1}{0.02^{2}} = 225$ and $n_{\mathrm{sim}} = \frac{0.15^{2}}{0.02^{2}} = 225$, respectively.
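The first of these calculations follows from the Monte Carlo standard error of an estimated proportion, $\mathrm{MCSE} = \sqrt{\pi (1 - \pi) / n_{\mathrm{sim}}}$, solved for $n_{\mathrm{sim}}$. A minimal sketch, with the hypothetical helper `n_sim_for_proportion` and exact rational arithmetic to avoid floating-point rounding at the boundary:

```python
import math
from fractions import Fraction

def n_sim_for_proportion(expected, target_mcse):
    """Smallest n_sim with sqrt(p * (1 - p) / n_sim) <= target_mcse."""
    p = Fraction(str(expected))
    se = Fraction(str(target_mcse))
    return math.ceil(p * (1 - p) / se ** 2)

n_sim = n_sim_for_proportion(0.9, 0.02)   # 0.9 * 0.1 / 0.02^2
```

With an anticipated TPR of 0.9 and a target MCSE of 0.02, this gives `n_sim = 225`.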
5.2 Simulation results
This Section summarizes the main simulation findings, with full results available in the Supporting Information. Tuning parameter selection by minimal GCV is illustrated in Figure 2 for FSGLmstate. Boxplots depict mean GCV across combinations of tuning parameter pairs $(\alpha, \gamma)$ for a grid of penalty parameter $\lambda \in \{0.01, \ldots, 500\}$ over all $n_{\mathrm{sim}} = 225$ simulated data sets. For LASSOmstate corresponding to $(\alpha, \gamma) = (1, 1)$, the most frequent lowest GCV is obtained for the optimal tuning parameter $\hat{\lambda}_{\mathrm{opt,L}} = 8.6$ with mean $\mathrm{GCV}(\hat{\lambda}_{\mathrm{opt,L}}) \cdot 1000 = 0.52597$ over all simulations. For FSGLmstate, the tuning parameter combination $(\alpha, \gamma) = (1, 0.25)$ yields the most frequent lowest GCV for $\hat{\lambda}_{\mathrm{opt,FSGL}} = 38.1$ with mean $\mathrm{GCV}(\hat{\lambda}_{\mathrm{opt,FSGL}}) \cdot 1000 = 0.52663$ over all simulated data sets with the corresponding penalty parameter combination. The regularization performance of the FSGLmstate algorithm in comparison to unpenalized and lasso penalized multi-state Cox-type estimation is depicted in Figure 3. For our simulation setting with $N = 1000$ observations and $PQ = 16$ regression parameters, unpenalized Cox-type estimation serves as a gold standard. The boxplots illustrate the estimated regression coefficients of the binary covariates based on $\hat{\lambda}_{\mathrm{opt,L}}$ and $\hat{\lambda}_{\mathrm{opt,FSGL}}$. Whereas LASSOmstate identifies the non-zero effects of $\beta_{1,1} = 1.5$, $\beta_{1,3} = \beta_{1,7} = 1.2$ and $\beta_{1,4} = -0.8$, the negative effect of $\beta_{1,8} = -0.8$ for the late transition 8 from CR2 to death in CR2 is set to zero on average. FSGLmstate recognizes the similarity structure of the covariate effect pairs $\beta_{1,3} = \beta_{1,7} = 1.2$ as well as $\beta_{1,4} = \beta_{1,8} = -0.8$ while setting all other covariate effects, which are truly zero, to zero. The unpenalized Cox-type estimation based on ADMM optimization identifies all non-zero effects, but inherently does not perform regularization, which results in larger variance for all truly zero coefficients. Figure 4 depicts variable selection results in terms of TPR and FDR for LASSOmstate and FSGLmstate. Whereas FSGLmstate more often detects all non-zero regression effects, LASSOmstate's estimated TPR varies between 0.8 and 1.0 (left panel). With
regard to FDR, FSGLmstate has a median estimated FDR of 0.29 and LASSOmstate of 0.38 (right panel). Figure 5 illustrates the mean bias and MSE of estimating the non-zero covariate effects along with MCSE. As expected, unpenalized Cox-type estimation exhibits the smallest mean bias and MSE of estimating the non-zero covariates in our simulation setting with $N = 1000$ observations. Notably, FSGLmstate provides smaller mean MSEs than LASSOmstate.

[Figure 2: boxplots of the mean generalized cross-validation (GCV) statistic (y-axis, ticks 0.000530 to 0.000540) for each (α, γ)-pair of tuning parameters (x-axis, all 25 combinations of α, γ ∈ {0, 0.25, 0.5, 0.75, 1}). Panel title: "FSGLmstate: Tuning parameter selection".]
Figure 2 Tuning parameter selection results for FSGLmstate: Mean generalized cross-validation (GCV) statistics across all pre-selected combinations of penalty parameters $(\alpha, \gamma)$ over all simulation runs. The pair $(\alpha, \gamma) = (1, 1)$ corresponds to the global lasso penalty.

[Figure 3: boxplots of regression coefficients (y-axis, −1 to 2) for the transition-specific covariates X1.1 to X1.8 and X2.1 to X2.8 (x-axis). Legend: βtrue; Method: FSGLmstate, LASSOmstate, Unpenalized.]
Figure 3 Boxplots of estimated regression coefficients based on simulated data of the 9-state AML model with eight transitions and two binary covariates. $X_{1.3}$ and $X_{1.7}$ as well as $X_{1.4}$ and $X_{1.8}$ refer to transitions with true equal effects of covariate $X_1$. Covariate $X_2$ has no true effect on any transition. Dots depict estimated covariate effects based on $\hat{\lambda}_{\mathrm{opt,L}}$ and $\hat{\lambda}_{\mathrm{opt,FSGL}}$ of each simulated data set. True underlying covariate effects $\boldsymbol{\beta}_{\mathrm{true}}$ are denoted as crosses (×).

[Figure 4: two panels showing the true positive rate (TPR, left) and the false discovery rate (FDR, right), each on an x-axis from 0 to 1, by method (FSGLmstate, LASSOmstate).]
Figure 4 Variable selection results in terms of true positive rates (TPR) and false discovery rates (FDR) for LASSOmstate and FSGLmstate. Dots illustrate TPR and FDR of each simulated data set.

[Figure 5: two panels showing the mean bias with MC-CI (left, x-axis −0.50 to 0.50) and the mean squared error (MSE, right, x-axis 0 to 1) by method (FSGLmstate, LASSOmstate, Unpenalized).]
Figure 5 Mean bias and mean squared error (MSE) of estimating the non-zero covariate effects along with 95% Monte Carlo confidence intervals (MC-CI). Dots illustrate mean bias and mean MSE of a single simulated data set.
6 APPLICATION TO LEUKEMIA DATA
The potential of FSGL penalized multi-state models is further investigated in an illustrative application to leukemia data. The
AMLSG 09-09 study is a randomized phase III trial conducted between 2010 and 2017 at 56 study hospitals in Germany
and Austria. The clinical trial evaluated intensive chemotherapy with or without gemtuzumab ozogamicin (GO) in patients
with NPM1-mutated AML. Final analysis results for the single and composite endpoints event-free survival (EFS), overall
survival (OS), CR rates and cumulative incidence of relapse (CIR) with long-term follow-up are published in Döhner et al.56.
In conclusion, primary endpoints of the trial in terms of EFS and OS were not met. Additional gene mutation data are available
for 𝑁 = 568 study patients.

Our motivating 9-state model for AML along with event counts based on the 09-09 trial data is illustrated in Figure 6. Late
transitions 7 and 8 are rather rarely observed with few events (𝐸7 = 31, 𝐸8 = 25). Derived from this multi-state model, Figure 7
depicts the stacked transition probabilities to all states from randomization. The probability of being in an intermediate state can
fluctuate over time, either increasing or decreasing, while the absorbing state probabilities can only increase over time.
For our 9-state model, we investigate covariate effects of $P = 24$ gene mutations with a prevalence of >3% along with the $P_c = 4$ established clinical predictors treatment (GO vs. standard), age (years), sex (male vs. female) and log10-transformed white blood cell count ($10^{9}$ cells/l). Considering these $P + P_c = 28$ covariates and $Q = 8$ transitions, we need to incorporate $(P + P_c) Q = 28 \cdot 8 = 224$ regression parameters. The clinical predictors should remain unpenalized, thus we apply the FSGL penalty to the remaining $24 \cdot 8 = 192$ mutation parameters. We assume similarity for transitions 3 and 7, i.e. CR1 to first relapse and CR2 to second relapse, as well as transitions 4 and 8, i.e. CR1 to death in CR1 and CR2 to death in CR2, so that we have $s = 2$ pairs of similar transitions. With respect to a-priori expert knowledge on similarity and grouping structures in AML mutations, tuning parameter combinations are investigated for $\alpha \in \{0.5, 0.75, 1\}$ with more weight on the global lasso and $\gamma \in \{0, 0.25, 0.5\}$ putting more weight on the fusion penalty. Among all pre-defined pairs $(\alpha, \gamma)$, the optimal combination of regularization parameters $(\hat{\alpha}_{\mathrm{opt,FSGL}}, \hat{\gamma}_{\mathrm{opt,FSGL}}) = (0.75, 0.5)$ and $\hat{\lambda}_{\mathrm{opt,FSGL}} = 20$ is then selected by minimal GCV over the grid $\lambda \in \{0.01, \ldots, 500\}$.
[Figure 6: diagram of the 9-state AML model with the states Active disease, 1st Complete Remission (CR1), Death (no CR), Death (CR1), 1st Relapse, 2nd Complete Remission (CR2), Death (relapse), Death (CR2) and 2nd Relapse, arranged under first-line and second-line therapy. Event counts listed in the figure, in the order of transitions 1 to 8: 498, 56, 190, 47, 102, 79, 31, 25.]
Figure 6 Event counts of the multi-state model for acute myeloid leukemia (AML) with nine states and eight transitions based on the AMLSG 09-09 trial data. Gray numbers indicate the corresponding transition.

[Figure 7: stacked transition probabilities (y-axis, 0 to 1) against years since randomization (x-axis, 0 to 8). States in the legend: Relapse2, Death (CR2), Death (relapse), Death (CR1), Relapse1, Death (no CR), CR2, CR1, Active disease.]
Figure 7 Stacked transition probabilities to all states from randomization derived from the multi-state model for acute myeloid leukemia (AML) based on the AMLSG 09-09 trial data. The distance between two adjacent curves represents the probability of being in the corresponding state. CR: Complete remission.
Figure 8 depicts all estimated regression coefficients of clinical and mutation variables by FSGLmstate separately for each transition. Consistent with the final analysis results for CIR, treatment has a negative regression effect on transition 3, i.e. from CR1 to first relapse, suggesting an anti-leukaemic efficacy of intensive chemotherapy including GO ($\hat{\beta}_{\mathrm{treatment}.3} = -0.34$). With respect to molecular markers, mutations of the DNA methylation gene $DNMT3A^{R882}$ are selected for transition 3 from CR1 to first relapse, as well as for transition 7 from CR2 to second relapse. This result aligns with accompanying gene mutation analyses of Cocciardi et al.57, where $DNMT3A^{R882}$ mutations were associated with an increased CIR.
[Figure 8: eight panels, one per transition, each showing the regression coefficient (y-axis, −0.50 to 0.75) for the covariates (x-axis) treatment, age, gender, log10_wbc, CEBPA, DNMT3AnonR882, DNMT3AR882, FLG, FLT3_ITD, FLT3_TKD, GATA2, IDH1, IDH2, KMT2C, KRAS, MRG, MYC, NF1, NFE2, NRAS, PTPN11, RAD21, SMC1A, SMC3, SRSF2, STAG2, TET2 and WT1. Panel titles: Transition 1: Active disease -> CR1; Transition 2: Active disease -> Death (no CR); Transition 3: CR1 -> Relapse1; Transition 4: CR1 -> Death (CR1); Transition 5: Relapse1 -> CR2; Transition 6: Relapse1 -> Death (relapse); Transition 7: CR2 -> Relapse2; Transition 8: CR2 -> Death (CR2).]
Figure 8 Estimated regression effects of clinical and mutation variables by FSGLmstate separately for each transition derived from the 9-state model for acute myeloid leukemia (AML) based on the AMLSG 09-09 trial data. Larger crosses (×) depict non-zero effects.
7 DISCUSSION
In this paper, we propose FSGL penalized multi-state models for data-driven variable selection and dimension reduction in
order to capture pathogenic disease processes more accurately while incorporating clinical and molecular data. The objective
was to select a sparse model based on high-dimensional molecular data by extended regularization methods. We adapted the
ADMM algorithm to FSGL penalized multi-state models combining the penalization concepts of general sparsity, pairwise
differences of covariate effects along with transition grouping. Thus, FSGL penalized multi-state models tackle sparse model
building while incorporating a-priori information about the covariate and transition structure into a prediction model. Further,
the ADMM algorithm can quite efficiently handle large-scale problems due to the decomposability of the objective function as
well as superior convergence properties.
The proof-of-concept simulation study evaluated the FSGLmstate algorithm’s regularization performance to select a sparse
model incorporating only relevant transition-specific effects and similar cross-transition effects. Compared to unpenalized and
global lasso penalized estimation, FSGLmstate identifies similarity and grouping structures depending on the choices of the
corresponding tuning parameters.
The real-world data application on a phase III AML trial illustrated the utility of an FSGL penalized multi-state model to
reduce model complexity while combining clinical and molecular data. Whereas an unpenalized 9-state model incorporating all established clinical predictors along with high-dimensional mutation information based on the study data suffers from overfitting due to few events per variable, our FSGLmstate approach makes it possible to fit a penalized 9-state model combining clinical predictors and mutation variables.
Several improvements and extensions of the proposed FSGL penalty to multi-state models offer further research directions.
One limitation of our work is that time-dependent covariates, e.g. allogeneic stem cell transplantation, and time-dependent effects are not yet incorporated. Further, post-selection inference remains to be investigated. In addition, the algorithm needs further adaptations to enhance computational speed and to efficiently handle very high dimensions with $P \gg N$. Finally, different tuning parameter selection criteria should be investigated, and extensive phase III simulations for empirical method comparisons are required to evaluate the performance of our variable selection method across a wide range of settings.
COMPUTATIONAL DETAILS
All implementations and statistical analyses are performed utilizing the statistical computing language R (version 4.4.1)58, along with the R packages mstate59, penalized22 and penMSM60, among others. R code to reproduce simulation study results and manuscript figures is available in the Supporting Information.
ACKNOWLEDGEMENTS
The authors would like to thank Maral Saadati, Jan Beyersmann and Jörg Rahnenführer for fruitful discussions. Further, we
are grateful to the German-Austrian AML study group (AMLSG) for providing the 09-09 trial data, as well as all patients and
centers involved in the study. The work of the corresponding author was supported by the Deutsche Forschungsgemeinschaft
(DFG, German Research Foundation, grant number 514653984).
CONFLICT OF INTEREST
The authors have declared no conflict of interest.
References
1. Saadati M, Beyersmann J, Kopp-Schneider A, Benner A. Prediction accuracy and variable selection for penalized cause-
specific hazards models. Biometrical Journal 2018; 60(2): 288–306.

2. Sennhenn-Reulen H, Kneib T. Structured fusion lasso penalized multi-state models. Statistics in Medicine 2016; 35(25):
4637–4659.
3. Dang X, Huang S, Qian X. Risk factor identification in heterogeneous disease progression with L1-regularized multi-state
models. Journal of Healthcare Informatics Research 2021; 5(1): 20–53.
4. Huang S, Hu C, Bell ML, et al. Regularized continuous-time Markov model via elastic net. Biometrics 2018; 74(3):
1045–1054.
5. Reulen H, Kneib T. Boosting multi-state models. Lifetime Data Analysis 2016; 22(2): 241–262.
6. Edelmann D, Saadati M, Putter H, Goeman J. A global test for competing risks survival analysis. Statistical Methods in
Medical Research 2020; 29(12): 3666–3683.
7. Fiocco M, Putter H, van Houwelingen JC. Reduced rank proportional hazards model for competing risks. Biostatistics
2005; 6(3): 465–478.
8. Fiocco M, Putter H, van Houwelingen HC. Reduced-rank proportional hazards regression and simulation-based prediction
for multi-state models. Statistics in Medicine 2008; 27(21): 4340–4358.
9. Huang Y. Two-sample multistate accelerated sojourn times model. Journal of the American Statistical Association 2000;
95(450): 619–627.
10. Ramchandani R, Finkelstein DM, Schoenfeld DA. Estimation for an accelerated failure time model with intermediate states
as auxiliary information. Lifetime Data Analysis 2020; 26(1): 1–20.
11. Huang J, Ma S, Xie H. Regularized estimation in the accelerated failure time model with high-dimensional covariates.
Biometrics 2006; 62(3): 813–820.
12. Andersen PK, Klein JP, Rosthoj S. Generalised linear models for correlated pseudo-observations, with applications to
multi-state models. Biometrika 2003; 90(1): 15–27.
13. Wang L, Zhou J, Qu A. Penalized generalized estimating equations for high-dimensional longitudinal data analysis.
Biometrics 2012; 68(2): 353–360.
14. Niu Y, Wang X, Cao H, Peng Y. Variable selection via penalized generalized estimating equations for a marginal survival
model. Statistical Methods in Medical Research 2020; 29(9): 2493–2506.
15. Su CL, Chiou SH, Lin FC, Platt RW. Analysis of survival data with cure fraction and variable selection: A pseudo-
observations approach. Statistical Methods in Medical Research 2022; 31(11): 1–17.
16. Andersen PK, Borgan O, Gill RD, Keiding N. Statistical models based on counting processes. Springer Science & Business
Media. 1993.
17. Andersen PK, Keiding N. Multi-state models for event history analysis. Statistical Methods in Medical Research 2002;
11(2): 91–115.
18. Putter H, Geskus RB, Fiocco M. Tutorial in biostatistics: Competing risks and multi-state models. Statistics in Medicine
2007; 26(11): 2389–2430.
19. Le-Rademacher JG, Therneau TM, Ou FS. The utility of multistate models: a flexible framework for time-to-event data.
Current Epidemiology Reports 2022; 9(3): 183–189.
20. Putter H, van der Hage J, de Bock GH, Elgalta R, van de Velde CJ. Estimation and prediction in a multi-state model for
breast cancer. Biometrical Journal 2006; 48(3): 366–380.
21. de Wreede LC, Fiocco M, Putter H. The mstate package for estimation and prediction in non-and semi-parametric multi-state
and competing risks models. Computer methods and programs in biomedicine 2010; 99(3): 261–274.
22. Goeman JJ. L1 penalized estimation in the cox proportional hazards model. Biometrical Journal 2010; 52(1): 70–84.

23. van Houwelingen HC, Bruinsma T, Hart AAM, van’t Veer LJ, Wessels LFA. Cross-validated Cox regression on microarray
gene expression data. Statistics in Medicine 2006; 25(18): 3201–3216.
24. Benner A, Zucknick M, Hielscher T, Ittrich C, Mansmann U. High-Dimensional Cox Models: The Choice of Penalty as
Part of the Model Building Process. Biometrical Journal 2010; 52(1): 50–69.
25. Heinze G, Wallisch C, Dunkler D. Variable selection - A review and recommendations for the practicing statistician.
Biometrical Journal 2018; 60(3): 431–449.
26. Salerno S, Li Y. High-dimensional survival analysis: Methods and applications. Annual review of statistics and its
application 2023; 10: 25–49.
27. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B
(Methodological) 1996; 58(1): 267–288.
28. Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B
(Statistical Methodology) 2005; 67(2): 301–320.
29. Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K. Sparsity and smoothness via the fused lasso. Journal of the Royal
Statistical Society: Series B (Statistical Methodology) 2005; 67(1): 91–108.
30. Simon N, Friedman J, Hastie T, Tibshirani R. A Sparse-Group Lasso. Journal of Computational and Graphical Statistics
2013; 22(2): 231–245.
31. Zhou J, Liu J, Narayan VA, Ye J. Modeling disease progression via fused sparse group lasso. In: Proceedings of the 18th
ACM SIGKDD international conference on Knowledge discovery and data mining. 2012:1095–1103.
32. Hoerl AE, Kennard RW. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970; 12(1):
55–67.
33. Gray RJ. Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. Journal
of the American Statistical Association 1992; 87(420): 942–951.
34. Verweij PJ, van Houwelingen HC. Penalized likelihood in Cox regression. Statistics in medicine 1994; 13(23-24): 2427–2436.
35. Tibshirani R. The lasso method for variable selection in the cox model. Statistics in medicine 1997; 16(4): 385–395.
36. Simon N, Friedman J, Hastie T, Tibshirani R. Regularization paths for Cox’s proportional hazards model via coordinate
descent. Journal of statistical software 2011; 39(5): 1.
37. Chaturvedi N, de Menezes RX, Goeman JJ. Fused lasso algorithm for cox proportional hazards and binomial logit models
with application to copy number profiles. Biometrical Journal 2014; 56(3): 477–492.
38. Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical
Society Series B: Statistical Methodology 2006; 68(1): 49–67.
39. Kim J, Sohn I, Jung SH, Kim S, Park C. Analysis of survival data with group lasso. Communications in Statistics-Simulation
and Computation 2012; 41(9): 1593–1605.
40. Beer JC, Aizenstein HJ, Anderson SJ, Krafty RT. Incorporating prior information with fused sparse group lasso: Application
to prediction of clinical measures from neuroimages. Biometrics 2019; 75(4): 1299–1309.
41. Fan J, Li R. Variable selection for cox’s proportional hazards model and frailty model. The Annals of Statistics 2002; 30(1):
74–99.
42. Gabay D, Mercier B. A dual algorithm for the solution of nonlinear variational problems via finite element approximation.
Computers & mathematics with applications 1976; 2(1): 17–40.

43. Glowinski R, Marroco A. Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité
d’une classe de problèmes de dirichlet non linéaires. Revue française d’automatique, informatique, recherche opérationnelle
Analyse numérique 1975; 9(R2): 41–76.
44. Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. Distributed Optimization and Statistical Learning via the Alternating
Direction Method of Multipliers. Foundations and Trends in Machine Learning 2010; 3(1): 1–122.
45. Parka S, Shin SJ. ADMM for least square problems with pairwise-difference penalties for coefficient grouping. Communications for Statistical Applications and Methods 2022; 29(4): 441–451.
46. He BS, Yang H, Wang S. Alternating direction method with self-adaptive penalty parameters for monotone variational
inequalities. Journal of Optimization Theory and applications 2000; 106: 337–356.
47. Friedman JH, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. Journal of
statistical software 2010; 33: 1–22.
48. Andrade D, Fukumizu K, Okajima Y. Convex covariate clustering for classification. Pattern Recognition Letters 2021; 151:
193–199.
49. Craven P, Wahba G. Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method
of generalized cross-validation. Numerische Mathematik 1978; 31(4): 377–403.
50. Jansen M. Generalized cross validation in variable selection with and without shrinkage. Journal of statistical planning and
inference 2015; 159: 90–104.
51. Brent RP. Algorithms for minimization without derivatives. Englewood-Cliffs: Prentice-Hall. 1973.
52. Heinze G, Boulesteix AL, Kammer M, et al. Phases of methodological research in biostatistics—building the evidence base
for new methods. Biometrical Journal 2024; 66(1): 2200222.
53. Siepe BS, Bartoš F, Morris T, et al. Simulation studies for methodological research in psychology: A standardized template
for planning, preregistration, and reporting [preprint]. PsyArXiv 2023.
54. Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Statistics in Medicine 2019;
38(11): 2074–2102.
55. Beyersmann J, Latouche A, Buchholz A, Schumacher M. Simulating competing risks data in survival analysis. Statistics
in Medicine 2009; 28(6): 956–971.
56. Döhner H, Weber D, Krzykalla J, et al. Intensive chemotherapy with or without gemtuzumab ozogamicin in patients with
npm1-mutated acute myeloid leukaemia (amlsg 09–09): a randomised, open-label, multicentre, phase 3 trial. The Lancet
Haematology 2023; 10(7): e495–e509.
57. Cocciardi S, Saadati M, Weiß N, et al. Impact of myelodysplasia-related and additional gene mutations in intensively treated
patients with NPM1-mutated acute myeloid leukemia. Accepted for Publication in HemaSphere 2024.
58. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; Vienna, Austria: 2024.
59. de Wreede LC, Fiocco M, Putter H. mstate: An R Package for the Analysis of Competing Risks and Multi-State Models.
Journal of Statistical Software 2011; 38(1): 1–30.
60. Reulen H. penMSM: Estimating Regularized Multi-state Models Using L1 Penalties. 2015. R package version 0.99.