Propensity Score Methods for
Comparative Effectiveness Research with
Multiple Treatment Groups
Kazuki Yoshida
Division of Rheumatology, Immunology and Allergy
Brigham and Women’s Hospital & Harvard Medical School
@kaz_yos kaz-yos kazukiyoshida@mail.harvard.edu
2019-03-18 at
Study Design and Biostatistics Center
Department of Population Health Sciences
University of Utah
1 / 50
Multi-group Comparative Effectiveness
Increasing availability of multiple medications
=⇒ Need for CER involving multiple groups.
Recent observational CER examples in literature:
[Zeng et al., 2019] Analgesics: Tramadol, Naproxen,
Diclofenac, Celecoxib, Etoricoxib, Codeine
[Pawar et al., 2019] Biological Antirheumatics:
Tocilizumab, Tumor necrosis factor inhibitors, Abatacept
[Bergstra et al., 2019] Antirheumatics: Synthetic,
Synthetic + Glucocorticoids, Biological w or w/o
synthetic
[Shah et al., 2018] Anticoagulants: Rivaroxaban,
Dabigatran, Apixaban, Warfarin
6 / 50
Propensity Score Methods and CER
Propensity score (PS) [Rosenbaum and Rubin, 1983]
methods are routinely used in CER comparing two
treatment strategies.
Adjustment [Rosenbaum and Rubin, 1983]
Stratification [Rosenbaum and Rubin, 1984]
Matching [Rosenbaum and Rubin, 1985]
Weighting [Rosenbaum, 1987]
However, when there are more than two treatment
strategies of interest, adaptation is less clear and varies
across fields. [Lopez and Gutman, 2017]
7 / 50
Approaches in Examples
Paper                     Treatment        Approach
[Zeng et al., 2019]       Analgesics       Pairwise PS, Match
[Pawar et al., 2019]      Biologics        Pairwise PS, Match
[Bergstra et al., 2019]   Antirheumatics   Multinomial PS, Adjust
[Shah et al., 2018]       Anticoagulants   Pairwise PS, Adjust
Several options in multi-group CER.
Cohort Construction: Pairwise vs Simultaneous eligibility
PS Estimation: Binary vs Multinomial (logistic) model
PS Methods: Adjustment, Stratification, Matching, or
Weighting
8 / 50
Example of RCT with Multiple Groups
Prospective Randomized Evaluation of Celecoxib
Integrated Safety versus Ibuprofen or Naproxen
(PRECISION) trial [Becker et al., 2009, Nissen et al., 2016]
9 / 50
Question
How can we better design multi-group CER using PS
methods?
10 / 50
Two-Group PS
Weighting
11 / 50
Notations
$Y_i$: Outcome
$A_i$: Treatment strategy
$X_i$: Vector of covariates
$e_i$: Propensity score
where
$e_i = P[A_i = 1 \mid X_i]$
12 / 50
Balancing Weights
[Li et al., 2018] organized existing PS weighting strategies into a single class of covariate "balancing weights".
The balancing weight for a given individual is defined as:
$$\frac{h(X_i)}{A_i e_i + (1 - A_i)(1 - e_i)} = h(X_i)\,\mathrm{IPTW}_i$$
where $h(\cdot)$ is a prespecified scalar function of $X_i$, but not of $A_i$.
Intuition:
Denominator (IPTW) balances groups in covariates
Numerator h(·) manipulates target population (estimand)
13 / 50
PS Weighting with Binary Strategy
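$$\mathrm{IPTW}_i = \frac{1}{A_i e_i + (1 - A_i)(1 - e_i)} = \begin{cases} \dfrac{1}{e_i} & \text{for } A_i = 1 \\ \dfrac{1}{1 - e_i} & \text{for } A_i = 0 \end{cases}$$
$$\mathrm{ATTW}_i = \frac{e_i}{A_i e_i + (1 - A_i)(1 - e_i)} = \begin{cases} 1 & \text{for } A_i = 1 \\ \dfrac{e_i}{1 - e_i} & \text{for } A_i = 0 \end{cases}$$
$$\mathrm{ATUW}_i = \frac{1 - e_i}{A_i e_i + (1 - A_i)(1 - e_i)} = \begin{cases} \dfrac{1 - e_i}{e_i} & \text{for } A_i = 1 \\ 1 & \text{for } A_i = 0 \end{cases}$$
$$\mathrm{MW}_i = \frac{\min\{e_i, 1 - e_i\}}{A_i e_i + (1 - A_i)(1 - e_i)} = \begin{cases} \mathrm{ATTW}_i & \text{for } e_i \le 0.5 \\ \mathrm{ATUW}_i & \text{for } e_i > 0.5 \end{cases}$$
$$\mathrm{OW}_i = \frac{e_i (1 - e_i)}{A_i e_i + (1 - A_i)(1 - e_i)} = \begin{cases} 1 - e_i & \text{for } A_i = 1 \\ e_i & \text{for } A_i = 0 \end{cases}$$
[Rosenbaum, 1987, Robins et al., 2000, Sato and Matsuyama, 2003, Li and Greene, 2013, Li et al., 2018]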
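The five weights on this slide share one denominator and differ only in the numerator h. A minimal Python sketch of that structure follows, assuming the PS has already been estimated; the function name `balancing_weights` and the example values are illustrative and do not come from the slides.

```python
def balancing_weights(a, e):
    """Return the five two-group balancing weights for one individual.

    a: treatment indicator (1 = treated, 0 = untreated)
    e: propensity score P[A = 1 | X], strictly between 0 and 1
    """
    if a not in (0, 1):
        raise ValueError("a must be 0 or 1")
    if not 0.0 < e < 1.0:
        raise ValueError("e must lie strictly between 0 and 1")
    # Shared denominator: probability of the treatment actually received.
    denom = a * e + (1 - a) * (1 - e)
    # Each numerator h(X) selects a different target population.
    numerators = {
        "IPTW": 1.0,
        "ATTW": e,
        "ATUW": 1.0 - e,
        "MW": min(e, 1.0 - e),
        "OW": e * (1.0 - e),
    }
    return {name: h / denom for name, h in numerators.items()}


# Hypothetical treated individual with a low PS of 0.2.
w = balancing_weights(a=1, e=0.2)
```

For this hypothetical individual, ATTW and MW coincide because the individual is treated and has a PS of at most 0.5, which is the first case of the MW definition.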
14 / 50
PS Methods Visualized (Equal Groups)
[Figure: PS distributions for treated and untreated groups in seven panels: Original, IPTW, ATTW, ATUW, Matching, MW, OW. x-axis: propensity score (0.0 to 1.0); y-axis: frequency.]
15 / 50
PS Methods Visualized (Fewer Treated)
[Figure: PS distributions for treated and untreated groups in seven panels: Original, IPTW, ATTW, ATUW, Matching, MW, OW. x-axis: propensity score (0.0 to 1.0); y-axis: frequency.]
16 / 50
PS Methods Visualized (More Treated)
[Figure: PS distributions for treated and untreated groups in seven panels: Original, IPTW, ATTW, ATUW, Matching, MW, OW. x-axis: propensity score (0.0 to 1.0); y-axis: frequency.]
17 / 50
Asymptotic Equivalence of MW and 1:1 PSM
[Li and Greene, 2013] proved the asymptotic equivalence
of the MW estimand and 1:1 PS matching estimand
under:
Finite PS space (no growth with n)
Positivity (i.e., perfect overlap)
1:1 exact PS matching
18 / 50
Estimands
Using balancing weights [Li et al., 2018], various
populations can be targeted for inference on the (marginal)
treatment effect.
IPTW targets the average treatment effect (ATE).
We can construct weights specifically for the average treatment
effect on the treated (ATT) or the untreated (ATU).
1:1 PSM and MW target the treatment effect in a
feasible subset of the sample.
[Samuels and Greevy, 2018] named this estimand
"average treatment effect on the evenly matchable units"
(ATM).
OW similarly targets a feasible subset.
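The link between the numerator h and the estimand can be written out explicitly. The block below is a sketch following the formulation in [Li et al., 2018]; the potential-outcome notation $Y_i(a)$ and the symbols $\tau(x)$ and $\tau_h$ are introduced here for illustration and do not appear on the slides.

```latex
\tau(x) = E[\,Y_i(1) - Y_i(0) \mid X_i = x\,], \qquad
\tau_h = \frac{E[\,h(X_i)\,\tau(X_i)\,]}{E[\,h(X_i)\,]}

% Choices of h and the population they target:
%   h(x) = 1                      -> ATE  (IPTW)
%   h(x) = e(x)                   -> ATT  (ATTW)
%   h(x) = 1 - e(x)               -> ATU  (ATUW)
%   h(x) = \min\{e(x), 1 - e(x)\} -> ATM  (MW)
%   h(x) = e(x)(1 - e(x))         -> overlap population (OW)
```

Under this reading, MW and OW down-weight individuals whose PS is near 0 or 1, which is why they target a feasible subset rather than the whole sample.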
19 / 50
Multiple Group
Setting
20 / 50
Generalized PS
Conditional probability of receiving a particular level of
the treatment given the pre-treatment variables:
[Imbens, 2000]
$A_i \in \{0, 1, \ldots, J\}$
$e_{ji} = P[A_i = j \mid X_i]$
subject to
$\sum_{j=0}^{J} e_{ji} = 1$
Each individual has a PS vector $e_i = (e_{0i}, e_{1i}, \ldots, e_{Ji})^T$.
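One way to obtain a PS vector that satisfies the sum-to-one constraint is a multinomial logistic model, which the talk lists as a PS estimation option. The sketch below only evaluates such a model for one individual; the function name `generalized_ps` and all coefficient values are hypothetical, and fitting the coefficients is out of scope here.

```python
import math


def generalized_ps(x, coefs):
    """Generalized PS vector from a multinomial logistic model.

    x: covariate values for one individual
    coefs: one (intercept, slopes) pair per non-reference group j = 1..J;
           group 0 is the reference with linear predictor 0
    """
    linear = [0.0]  # reference group
    for intercept, slopes in coefs:
        linear.append(intercept + sum(b * v for b, v in zip(slopes, x)))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(linear)
    expo = [math.exp(v - top) for v in linear]
    total = sum(expo)
    return [v / total for v in expo]


# Hypothetical coefficients for a three-group example (J = 2).
e_i = generalized_ps(x=[1.0, 0.5], coefs=[(0.2, [0.5, -1.0]), (-0.3, [1.0, 0.4])])
```

The returned list plays the role of $(e_{0i}, e_{1i}, e_{2i})$: every entry is positive and the entries sum to 1 by construction.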
21 / 50
Generalized Balancing Weights
[Li and Li, 2018] extended the balancing weights
framework using the generalized PS.
Using our notation,
$$\frac{h(X_i)}{\sum_{j=0}^{J} e_{ji}\, I(A_i = j)} = h(X_i)\,\mathrm{IPTW}_i$$
where $h(\cdot)$ is a prespecified scalar function of $X_i$, but not of $A_i$.
Intuition:
Denominator (IPTW) balances groups in covariates
Numerator h(·) manipulates target population (estimand)
22 / 50
Generalized PS Weighting
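$$\mathrm{IPTW}_i = \frac{1}{\sum_{j=0}^{J} e_{ji}\, I(A_i = j)} = \frac{1}{e_{A_i i}} > 1 \text{ for all } A_i$$
$$\mathrm{AT}(k)\mathrm{W}_i = \frac{e_{ki}}{\sum_{j=0}^{J} e_{ji}\, I(A_i = j)} = \begin{cases} 1 & \text{for } A_i = k \\ \dfrac{e_{ki}}{e_{A_i i}} & \text{for } A_i \ne k \end{cases}$$
$$\mathrm{MW}_i = \frac{\min_j \{e_{ji}\}}{\sum_{j=0}^{J} e_{ji}\, I(A_i = j)} = \begin{cases} 1 & \text{for } A_i = \arg\min_j \{e_{ji}\} \\ \dfrac{\min_j \{e_{ji}\}}{e_{A_i i}} < 1 & \text{otherwise} \end{cases}$$
$$\mathrm{OW}_i = \frac{\left( \sum_{j=0}^{J} \frac{1}{e_{ji}} \right)^{-1}}{\sum_{j=0}^{J} e_{ji}\, I(A_i = j)} = \frac{1}{e_{A_i i}} \cdot \frac{1}{\sum_{l=0}^{J} \frac{1}{e_{li}}} < 1 \text{ for all } A_i$$
[Yoshida et al., 2017, Li and Li, 2018]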
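As in the two-group case, the generalized weights share the denominator $e_{A_i i}$ and differ only in the numerator. A minimal Python sketch follows, assuming the generalized PS vector is already available; the function name `generalized_weights` and the example values are hypothetical.

```python
def generalized_weights(a, e, k=0):
    """Generalized PS weights for one individual.

    a: index of the treatment actually received (0..J)
    e: generalized PS vector (e_0, ..., e_J), positive and summing to 1
    k: reference group for the AT(k) weight
    """
    if any(p <= 0.0 for p in e) or abs(sum(e) - 1.0) > 1e-8:
        raise ValueError("e must be positive and sum to 1")
    received = e[a]  # denominator: PS for the treatment actually received
    numerators = {
        "IPTW": 1.0,
        "ATkW": e[k],
        "MW": min(e),
        "OW": 1.0 / sum(1.0 / p for p in e),
    }
    return {name: h / received for name, h in numerators.items()}


# Hypothetical individual in group 2 with PS vector (0.2, 0.3, 0.5).
w = generalized_weights(a=2, e=[0.2, 0.3, 0.5], k=0)
```

In this example the individual received the most likely treatment, so both MW and OW fall well below 1 while IPTW equals 2.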
23 / 50
Generalized PS Weighting Visualized I
[Figure: ternary plots (axes x, y, z) in seven panels: Raw, Group 0, Group 1, Group 2, IPTW, MW, OW.]
24 / 50
Generalized PS Weighting Visualized II
[Figure: ternary plots (axes x, y, z) in seven panels: Raw, Group 0, Group 1, Group 2, IPTW, MW, OW.]
25 / 50
Generalized PS Weighting Visualized III
[Figure: ternary plots (axes x, y, z) in seven panels: Raw, Group 0, Group 1, Group 2, IPTW, MW, OW.]
26 / 50
Simulation Study
[Yoshida et al., 2017] examined 3-group MW in
comparison to 3-group IPTW and 1:1:1 simultaneous
three-way matching [Rassen et al., 2013].
OW was not included.
27 / 50
Mean Squared Error
[Figure: mean squared error (y-axis, 0.0 to 1.5) by method (Unadj, Match, MW, IPTW) for the contrasts 1v0, 2v0, and 2v1 under "Modification (+)". Rows: good overlap and poor overlap, both with non-null main effects. Series (pExpo): 33:33:33, 10:45:45, 10:10:80.]
28 / 50
Estimands
[Figure: true risk ratio (y-axis, 0.40 to 1.00) by method (Unadj, Match, MW, IPTW) for the contrasts 1v0, 2v0, and 2v1 under "Modification (+)". Rows: good overlap and poor overlap, both with non-null main effects. Series (pExpo): 33:33:33, 10:45:45, 10:10:80.]
Estimand calculation was based on the counterfactual method described in [Austin, 2013]
29 / 50
Simulation: Summary results
Comparing MW to three-way matching and IPTW, we found:
Similar estimands for MW and matching, but not IPTW
Best covariate balance
Similarly small bias compared to matching
Smaller MSE compared to matching in all scenarios
More robust to rare events, unequally sized groups, and
poor covariate overlap
The full results are available in [Yoshida et al., 2017]
30 / 50
Empirical example
Medicare Beneficiary dataset from PA and NJ
(1999-2005) [Solomon et al., 2010]
Unadjusted
                                    nsNSAIDs       Coxibs         Opioids        SMD
n                                   4874           6172           12601
Charlson score, mean (SD)           1.59 (1.54)    1.72 (1.53)    2.17 (1.78)    0.23
Antithrombotic use, %               14.4           17.6           27.7           0.22
No. prescription drugs, mean (SD)   8.28 (4.69)    8.55 (4.76)    9.76 (5.38)    0.20
No. days in hospital, mean (SD)     1.85 (6.90)    2.19 (6.86)    4.18 (9.46)    0.19
White race, %                       84.6           88             92.4           0.16
Fracture, %                         6.5            7.2            13.7           0.16
Loop diuretic use, %                21.3           25.8           31.3           0.15
Age, mean (SD)                      79.67 (7.03)   80.87 (6.99)   81.15 (7.17)   0.14
No. physician visits, mean (SD)     8.72 (6.32)    8.80 (5.99)    10.08 (7.14)   0.14
Myocardial infarction, %            5.2            5.7            9.6            0.11
Stroke, %                           15.2           16.1           21.5           0.11
31 / 50
Table 1: Comparison
Unadjusted
                            nsNSAIDs      Coxibs        Opioids       SMD
Charlson score, mean (SD)   1.59 (1.54)   1.72 (1.53)   2.17 (1.78)   0.23
Antithrombotic use, %       14.4          17.6          27.7          0.22
IPTW
                            nsNSAIDs      Coxibs        Opioids       SMD
Charlson score, mean (SD)   1.98 (1.70)   1.94 (1.68)   1.94 (1.69)   0.02
Antithrombotic use, %       23.3          22.5          22.4          0.01
MW
                            nsNSAIDs      Coxibs        Opioids       SMD
Charlson score, mean (SD)   1.62 (1.53)   1.61 (1.52)   1.63 (1.53)   0.01
Antithrombotic use, %       14.9          14.8          15.2          0.01
OW
                            nsNSAIDs      Coxibs        Opioids       SMD
Charlson score, mean (SD)   1.73 (1.58)   1.71 (1.56)   1.73 (1.57)   0.01
Antithrombotic use, %       17.5          17.2          17.5          0.01
Weighted standardized mean difference (SMD) available in R package tableone.
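To make the SMD column concrete, the sketch below computes a weighted SMD for one covariate across three groups as the average of the absolute pairwise SMDs. This is written in Python for illustration and is not the tableone implementation; the pooled-variance convention, the simple weighted variance, and the function names are assumptions, and the toy data are made up.

```python
import math
from itertools import combinations


def weighted_mean_var(x, w):
    """Weighted mean and (population-style) weighted variance."""
    total = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / total
    return mean, var


def average_pairwise_smd(x, group, w):
    """Average absolute SMD over all pairs of groups for one covariate.

    Assumes every group has non-zero weighted variance.
    """
    stats = {}
    for g in sorted(set(group)):
        idx = [i for i, gi in enumerate(group) if gi == g]
        stats[g] = weighted_mean_var([x[i] for i in idx], [w[i] for i in idx])
    smds = []
    for g1, g2 in combinations(stats, 2):
        (m1, v1), (m2, v2) = stats[g1], stats[g2]
        pooled_sd = math.sqrt((v1 + v2) / 2.0)
        smds.append(abs(m1 - m2) / pooled_sd)
    return sum(smds) / len(smds)


# Toy data: three groups of three, unit weights (i.e., unweighted).
x = [1, 2, 3, 2, 3, 4, 3, 4, 5]
group = [0, 0, 0, 1, 1, 1, 2, 2, 2]
smd = average_pairwise_smd(x, group, w=[1.0] * 9)
```

Passing IPTW, MW, or OW weights as `w` instead of unit weights would give the weighted analogue used to judge balance after weighting.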
32 / 50
Empirical example: Outcome regression
[Figure: hazard ratios (HR, y-axis 1 to 3) by model (Unadj, IPTW, MW, OW) for Coxib vs nsNSAIDs and Opioids vs nsNSAIDs. Rows: Death and MI.]
33 / 50
Conclusion
MW has been suggested as a more efficient alternative to
1:1 pairwise matching. [Li and Greene, 2013]
In a simulation study with three treatment groups, MW
demonstrated similar bias, but smaller MSE compared to
1:1:1 three-way matching. [Rassen et al., 2013]
Efficiency gain compared to 1:1:1 three-way matching was
more noticeable in scenarios in which the outcome events
were rare, treatment groups were unequally sized, or
covariate overlap was poor.
Compared to IPTW, MW was more stable in the poor
covariate overlap setting.
Confirming the type of patients that MW is making
inference for is important in practice.
34 / 50
PS Trimming
35 / 50
PS Trimming
Propensity score trimming has been suggested by several
authors.
To increase efficiency [Crump et al., 2009]
To reduce unmeasured confounding
[Stürmer et al., 2010]
To guide study design [Walker et al., 2013]
[Yoshida et al., 2019] examined multi-group extension of
all three.
Here we focus on the extension of [Stürmer et al., 2010].
36 / 50
Motivation for Stürmer’s PS Trimming
[Stürmer et al., 2010] was concerned with very
heterogeneous treatment effects in the tails of PS
distribution.
[Kurth et al., 2006] tissue plasminogen activator (t-PA)
use vs no t-PA use in stroke patients. Outcome
in-hospital death. Very high mortality in t-PA users with
lowest probabilities for t-PA.
[Lunt et al., 2009] tumor necrosis factor inhibitor (TNFi)
initiation vs non-TNFi treatment in rheumatoid arthritis
patients. Outcome death. Higher mortality among
non-TNFi users with highest probabilities for TNFi
initiation.
[Stürmer et al., 2010] hypothesized that there may be
higher prevalence of unmeasured confounders that
preferentially introduce more confounding in the tails.
37 / 50
Definition of Stürmer’s PS Trimming
[Stürmer et al., 2010] proposed the asymmetric PS
trimming to remedy this.
Their simulation study confirmed its benefit in bias
reduction if indeed the tails of PS contained higher
prevalence of unmeasured confounders.
38 / 50
Question
[Stürmer et al., 2010] demonstrated benefits of PS
trimming in reducing unmeasured confounding in the
presence of unmeasured confounders that were more
prevalent in the tails of the PS distribution.
How can we conceptualize this issue in the general
setting?
How can we extend their method?
39 / 50
Original Two-Group Definition
Method     Existing Binary Definition
Stürmer    $I_s = \{ i \in I : e_i \in [F^{-1}_{e_i|A_i}(0.05 \mid 1),\ F^{-1}_{e_i|A_i}(0.95 \mid 0)] \}$
Define the lower threshold using the treated PS
distribution.
Define the upper threshold using the untreated PS
distribution.
Notation                          Explanation
$i \in \{1, \ldots, n\}$          index for an individual
$I = \{1, \ldots, n\}$            index set for entire sample
$A_i \in \{0, 1\}$                treatment variable
$e_i = P[A_i = 1 \mid X_i]$       propensity score
$p = P[A_i = 1]$                  treatment prevalence
$F^{-1}_{e_i|A_i}(x \mid a)$      treatment-specific quantile of $e_i$
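The two-group rule can be sketched in a few lines of Python. The quantile convention (linear interpolation between order statistics), the closed interval, the function names, and the toy data are assumptions made for illustration; the slides do not specify them.

```python
def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def sturmer_trim(a, e, lower_q=0.05, upper_q=0.95):
    """Indices retained after asymmetric PS trimming.

    Lower bound: lower_q quantile of the PS among the treated (a == 1).
    Upper bound: upper_q quantile of the PS among the untreated (a == 0).
    """
    lower = quantile([ei for ai, ei in zip(a, e) if ai == 1], lower_q)
    upper = quantile([ei for ai, ei in zip(a, e) if ai == 0], upper_q)
    return [i for i, ei in enumerate(e) if lower <= ei <= upper]


# Toy data: five untreated followed by five treated individuals.
a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
e = [0.05, 0.1, 0.2, 0.3, 0.6, 0.2, 0.4, 0.5, 0.7, 0.9]
kept = sturmer_trim(a, e)
```

Note the asymmetry: untreated individuals with very low PS and treated individuals with very high PS are both removed, because the bounds come from the opposite group's tail.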
40 / 50
Proposed definitions I
Method     Proposed Multinomial Definition
Stürmer    $I_{J,s} = \{ i \in I : e_{ji} \ge F^{-1}_{e_{ji}|A_i}(\alpha_{J,s} \mid j)\ \forall\, j \in \{0, \ldots, J\} \}$
Define a threshold at the $100 \times \alpha_{J,s}$ percentile of each PS
in the corresponding treatment group.
Trim individuals outside the region above all these
thresholds.
We used the following provisional thresholds for visualization.
Groups    J    $\alpha_{J,s}$
2         1    0.050
3         2    0.033
4         3    0.025
5         4    0.020
J + 1     J    $\frac{1}{J+1} \cdot \frac{1}{10}$
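The multinomial rule keeps an individual only if every component of the PS vector clears the threshold defined in the corresponding treatment group. The Python sketch below follows that reading; the quantile convention, the function names, and the toy data are assumptions, while the default threshold mirrors the provisional value of 1/(J+1) times 1/10.

```python
def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def multinomial_sturmer_trim(a, e, alpha=None):
    """Indices retained after the multinomial extension of Stürmer trimming.

    a: treatment index (0..J) for each individual
    e: list of generalized PS vectors, one per individual
    alpha: quantile level; defaults to the provisional (1/(J+1)) * (1/10)
    """
    n_groups = len(e[0])
    if alpha is None:
        alpha = 1.0 / n_groups / 10.0
    # Threshold j: alpha quantile of e_j among those who received treatment j.
    thresholds = [
        quantile([ei[j] for ai, ei in zip(a, e) if ai == j], alpha)
        for j in range(n_groups)
    ]
    return [
        i for i, ei in enumerate(e)
        if all(ei[j] >= thresholds[j] for j in range(n_groups))
    ]


# Toy three-group data; the first member of each group is very unlikely
# to have received the treatment they actually received.
a = [0, 0, 0, 1, 1, 1, 2, 2, 2]
e = [
    [0.10, 0.45, 0.45], [0.40, 0.30, 0.30], [0.50, 0.25, 0.25],
    [0.45, 0.10, 0.45], [0.30, 0.40, 0.30], [0.25, 0.50, 0.25],
    [0.45, 0.45, 0.10], [0.30, 0.30, 0.40], [0.25, 0.25, 0.50],
]
kept = multinomial_sturmer_trim(a, e)
```

With J = 1 the default level is 0.05, which matches the first row of the provisional threshold table.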
41 / 50
Visualization Explanation
[Figure: ternary plot of the generalized PS for Groups 0, 1, and 2 (axes x, y, z, each graduated from 0.1 to 0.9), annotated "84.6% (86.2; 82.1; 88.3)". Legend: Group 0, 1, 2.]
Interactive web application
42 / 50
Data generation mechanism
[Diagram with nodes $X^m_i$ (measured covariates), $X^u_i$ (unmeasured covariates), $A_i$ (treatment), and $Y_i$ (outcome).]
Treatment model
$\alpha_{01}, \alpha_{02}$ (intercepts) for treatment prevalence
$\alpha_{X1}, \alpha_{X2}$ (covariate association) for covariate overlap level
Outcome model
$\beta_0$ (intercept) for baseline rate of events
$\beta_X$ (covariate association) for strength of risk factors
$\beta_{A1}, \beta_{A2}$ (main effects) for treatment effects
$\beta_{XA1}, \beta_{XA2}$ (interactions) for additional treatment effects in subset
Unmeasured covariates $X^u_i$ were introduced in tails of PS
based on $X^m_i$ only.
Treatment generating model: Multinomial logistic model
Outcome generating model: Poisson model
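The two generating models can be sketched as follows for a three-group setting with one measured covariate. This is a simplified illustration and not the simulation code behind the talk: the unmeasured covariate is omitted, the interaction is modelled as a treatment-by-covariate product, and every parameter value and function name is hypothetical.

```python
import math
import random


def poisson_draw(rng, rate):
    """Poisson draw via Knuth's multiplication method (fine for small rates)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_individual(rng, alpha0, alpha_x, beta0, beta_x, beta_a, beta_xa):
    """Draw (x, a, y) for one individual in a three-group setting.

    alpha0, alpha_x: intercepts and covariate slopes for groups 1 and 2
    beta0, beta_x: outcome intercept and covariate slope (log scale)
    beta_a, beta_xa: treatment main effects and treatment-by-covariate
                     interactions for groups 1 and 2 (log scale)
    """
    x = rng.gauss(0.0, 1.0)
    # Treatment model: multinomial logistic with group 0 as reference.
    linear = [0.0] + [alpha0[j] + alpha_x[j] * x for j in range(2)]
    expo = [math.exp(v) for v in linear]
    probs = [v / sum(expo) for v in expo]
    a = rng.choices([0, 1, 2], weights=probs)[0]
    # Outcome model: Poisson with a log link.
    log_rate = beta0 + beta_x * x
    if a > 0:
        log_rate += beta_a[a - 1] + beta_xa[a - 1] * x
    y = poisson_draw(rng, math.exp(log_rate))
    return x, a, y


rng = random.Random(2019)  # fixed seed so the sketch is reproducible
sample = [
    simulate_individual(
        rng, alpha0=[0.0, 0.0], alpha_x=[0.5, 1.0],
        beta0=-2.0, beta_x=0.3, beta_a=[0.2, 0.4], beta_xa=[0.0, 0.0],
    )
    for _ in range(1000)
]
```

In this sketch, larger `alpha_x` values pull the groups apart on the covariate and so reduce overlap, while the intercepts `alpha0` shift treatment prevalence.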
43 / 50
Bias
[Figure: bias (y-axis, -0.8 to 0.4) against the Stürmer trimming threshold (x-axis, 0.00 to 0.15). Columns: contrasts 1vs0, 2vs0, 2vs1. Rows: Unadj, IPTW, MW, OW.]
44 / 50
Simulation SE
[Figure: simulation SE (y-axis, 0.00 to 0.20) against the Stürmer trimming threshold (x-axis, 0.00 to 0.15). Columns: contrasts 1vs0, 2vs0, 2vs1. Rows: Unadj, IPTW, MW, OW.]
45 / 50
Simulation Root MSE
[Figure: root MSE, sqrt(MSE) (y-axis, 0.0 to 0.8), against the Stürmer trimming threshold (x-axis, 0.00 to 0.15). Columns: contrasts 1vs0, 2vs0, 2vs1. Rows: Unadj, IPTW, MW, OW.]
46 / 50
Summary Result
Unmeasured confounding was reduced by trimming in
many cases even with MW and OW albeit to a lesser
extent.
Initial benefits on variance were apparent for IPTW, but
this was not the case for MW and OW.
Practical implication: Stürmer trimming with several
trimming thresholds may be useful as a sensitivity
analysis.
Important limitation in practice: Changing point estimate
with trimming can be due to both unmeasured
confounding reduction and true treatment effect
heterogeneity.
47 / 50
Recommendations for Multi-Group CER
The multinomial PS approach more closely approximates a
multi-arm RCT than the pairwise PS approach. PS
weighting is easier than matching.
When MW and IPTW results diverge, reviewing and
revising the eligibility criteria may be most important.
MW and OW, although more stable, require more
attention to whose effect we are studying. A weighted
Table 1 can help. Note that the smallest group tends to
affect the estimand most.
If unmeasured confounders are suspected in the tails of
the PS distribution, PS trimming may be a useful
sensitivity analysis even for MW and OW.
48 / 50
Further Information on MW
Slides: https://www.slideshare.net/kaz_yos
Code: https://github.com/kaz-yos/mw
Weighted tables:
https://github.com/kaz-yos/tableone
Published Paper: Epidemiology 2017;28:387
1 / 7
Further Information on Trimming
Slides: https://www.slideshare.net/kaz_yos
Code: https://github.com/kaz-yos/multinomial-ps-trimming
Published Paper: Am J Epidemiol. 2019;188:609
2 / 7
Bibliography I
[Austin, 2013] Austin, P. C. (2013).
The performance of different propensity score methods for estimating marginal hazard ratios.
Stat Med, 32(16):2837–2849.
[Becker et al., 2009] Becker, M. C., Wang, T. H., Wisniewski, L., Wolski, K., Libby, P., Lüscher, T. F.,
Borer, J. S., Mascette, A. M., Husni, M. E., Solomon, D. H., Graham, D. Y., Yeomans, N. D.,
Krum, H., Ruschitzka, F., Lincoff, A. M., Nissen, S. E., and PRECISION Investigators (2009).
Rationale, design, and governance of Prospective Randomized Evaluation of Celecoxib Integrated
Safety versus Ibuprofen Or Naproxen (PRECISION), a cardiovascular end point trial of nonsteroidal
antiinflammatory agents in patients with arthritis.
Am. Heart J., 157(4):606–612.
[Bergstra et al., 2019] Bergstra, S. A., Winchow, L.-L., Murphy, E., Chopra, A., Salomon-Escoto, K.,
Fonseca, J. a. E., Allaart, C. F., and Landewé, R. B. M. (2019).
How to treat patients with rheumatoid arthritis when methotrexate has failed? The use of a multiple
propensity score to adjust for confounding by indication in observational studies.
Ann. Rheum. Dis., 78(1):25–30.
[Crump et al., 2009] Crump, R. K., Hotz, V. J., Imbens, G. W., and Mitnik, O. A. (2009).
Dealing with limited overlap in estimation of average treatment effects.
Biometrika, 96(1):187–199.
[Imbens, 2000] Imbens, G. W. (2000).
The role of the propensity score in estimating dose-response functions.
Biometrika, 87(3):706–710.
3 / 7
Bibliography II
[Kurth et al., 2006] Kurth, T., Walker, A. M., Glynn, R. J., Chan, K. A., Gaziano, J. M., Berger, K.,
and Robins, J. M. (2006).
Results of multivariable logistic regression, propensity matching, propensity adjustment, and
propensity-based weighting under conditions of nonuniform effect.
Am. J. Epidemiol., 163(3):262–270.
[Li and Li, 2018] Li, F. and Li, F. (2018).
Propensity Score Weighting for Causal Inference with Multi-valued Treatments.
arXiv:1808.05339 [stat].
[Li et al., 2018] Li, F., Morgan, K. L., and Zaslavsky, A. M. (2018).
Balancing Covariates via Propensity Score Weighting.
Journal of the American Statistical Association, 113(521):390–400.
[Li and Greene, 2013] Li, L. and Greene, T. (2013).
A weighting analogue to pair matching in propensity score analysis.
Int J Biostat, 9(2):215–234.
[Lopez and Gutman, 2017] Lopez, M. J. and Gutman, R. (2017).
Estimation of Causal Effects with Multiple Treatments: A Review and New Ideas.
Statist. Sci., 32(3):432–454.
[Lunt et al., 2009] Lunt, M., Solomon, D., Rothman, K., Glynn, R., Hyrich, K., Symmons, D. P. M.,
Stürmer, T., British Society for Rheumatology Biologics Register, and British Society for
Rheumatology Biologics Register Control Centre Consortium (2009).
Different methods of balancing covariates leading to different effect estimates in the presence of
effect modification.
Am. J. Epidemiol., 169(7):909–917.
4 / 7
Bibliography III
[Nissen et al., 2016] Nissen, S. E., Yeomans, N. D., Solomon, D. H., Lüscher, T. F., Libby, P., Husni,
M. E., Graham, D. Y., Borer, J. S., Wisniewski, L. M., Wolski, K. E., Wang, Q., Menon, V.,
Ruschitzka, F., Gaffney, M., Beckerman, B., Berger, M. F., Bao, W., Lincoff, A. M., and
PRECISION Trial Investigators (2016).
Cardiovascular Safety of Celecoxib, Naproxen, or Ibuprofen for Arthritis.
N. Engl. J. Med., 375(26):2519–2529.
[Pawar et al., 2019] Pawar, A., Desai, R. J., Solomon, D. H., Santiago Ortiz, A. J., Gale, S., Bao, M.,
Sarsour, K., Schneeweiss, S., and Kim, S. C. (2019).
Risk of serious infections in tocilizumab versus other biologic drugs in patients with rheumatoid
arthritis: A multidatabase cohort study.
Ann. Rheum. Dis.
[Rassen et al., 2013] Rassen, J. A., Shelat, A. A., Franklin, J. M., Glynn, R. J., Solomon, D. H., and
Schneeweiss, S. (2013).
Matching by propensity score in cohort studies with three treatment groups.
Epidemiology, 24(3):401–409.
[Robins et al., 2000] Robins, J. M., Hernán, M. A., and Brumback, B. (2000).
Marginal structural models and causal inference in epidemiology.
Epidemiology, 11(5):550–560.
[Rosenbaum, 1987] Rosenbaum, P. R. (1987).
Model-Based Direct Adjustment.
Journal of the American Statistical Association, 82(398):387–394.
[Rosenbaum and Rubin, 1983] Rosenbaum, P. R. and Rubin, D. B. (1983).
The central role of the propensity score in observational studies for causal effects.
Biometrika, 70(1):41–55.
5 / 7
Bibliography IV
[Rosenbaum and Rubin, 1984] Rosenbaum, P. R. and Rubin, D. B. (1984).
Reducing Bias in Observational Studies Using Subclassification on the Propensity Score.
J Am Stat Assoc, 79(387):516.
[Rosenbaum and Rubin, 1985] Rosenbaum, P. R. and Rubin, D. B. (1985).
Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the
Propensity Score.
The American Statistician, 39(1):33–38.
[Samuels and Greevy, 2018] Samuels, L. R. and Greevy, R. A. (2018).
Bagged one-to-one matching for efficient and robust treatment effect estimation.
Stat Med, 37(29):4353–4373.
[Sato and Matsuyama, 2003] Sato, T. and Matsuyama, Y. (2003).
Marginal structural models as a tool for standardization.
Epidemiology, 14(6):680–686.
[Shah et al., 2018] Shah, S., Norby, F. L., Datta, Y. H., Lutsey, P. L., MacLehose, R. F., Chen, L. Y.,
and Alonso, A. (2018).
Comparative effectiveness of direct oral anticoagulants and warfarin in patients with cancer and atrial
fibrillation.
Blood Adv, 2(3):200–209.
[Solomon et al., 2010] Solomon, D. H., Rassen, J. A., Glynn, R. J., Lee, J., Levin, R., and
Schneeweiss, S. (2010).
The comparative safety of analgesics in older adults with arthritis.
Arch. Intern. Med., 170(22):1968–1976.
6 / 7
Bibliography V
[Stürmer et al., 2010] Stürmer, T., Rothman, K. J., Avorn, J., and Glynn, R. J. (2010).
Treatment effects in the presence of unmeasured confounding: Dealing with observations in the tails
of the propensity score distribution–a simulation study.
Am. J. Epidemiol., 172(7):843–854.
[Walker et al., 2013] Walker, A. M., Patrick, A. R., Lauer, M. S., Hornbrook, M. C., Marin, M. G.,
Platt, R., Roger, V. L., Stang, P., and Schneeweiss, S. (2013).
A tool for assessing the feasibility of comparative effectiveness research.
Comp Eff Res, 2013(3):11–20.
[Yoshida et al., 2017] Yoshida, K., Hernandez-Diaz, S., Solomon, D. H., Jackson, J. W., Gagne, J. J.,
Glynn, R. J., and Franklin, J. M. (2017).
Matching Weights to Simultaneously Compare Three Treatment Groups: Comparison to Three-way
Matching.
Epidemiology, 28(3):387–395.
[Yoshida et al., 2019] Yoshida, K., Solomon, D. H., Haneuse, S., Kim, S. C., Patorno, E., Tedeschi,
S. K., Lyu, H., Franklin, J. M., Stürmer, T., Hernández-Díaz, S., and Glynn, R. J. (2019).
Multinomial Extension of Propensity Score Trimming Methods: A Simulation Study.
Am. J. Epidemiol., 188(3):609–616.
[Zeng et al., 2019] Zeng, C., Dubreuil, M., LaRochelle, M. R., Lu, N., Wei, J., Choi, H. K., Lei, G.,
and Zhang, Y. (2019).
Association of Tramadol With All-Cause Mortality Among Patients With Osteoarthritis.
JAMA, 321(10):969–982.
7 / 7

More Related Content

PDF
Designing Causal Inference Studies Using Real-World Data
PPT
Part 1 Survival Analysis
PPTX
schedule y
PPTX
Clinical trial design
PPTX
SAS Clinical Online Training
PDF
Lecture 6-1. normal and extensive form of game theory
PPTX
T test and types of t-test
PPTX
Pharmacoeconomis
Designing Causal Inference Studies Using Real-World Data
Part 1 Survival Analysis
schedule y
Clinical trial design
SAS Clinical Online Training
Lecture 6-1. normal and extensive form of game theory
T test and types of t-test
Pharmacoeconomis

What's hot (20)

PDF
R workshop xiv--Survival Analysis with R
PPTX
PPTX
ICH E6 good clinical practice guidelines
DOCX
Ib math studies internal assessment final draft
PPTX
Survival Data Analysis for Sekolah Tinggi Ilmu Statistik Jakarta
PPT
Part 2 Cox Regression
PPTX
Survival analysis
PDF
Posthoc
PDF
Inverse variance method of meta-analysis and Cochran's Q
PPTX
Quality Assurance in Clinical Trials
DOCX
Ia final report
PPTX
SDTM Fnal Detail Training
PPT
Data management through spss
PPTX
Inspection Findings in Clinical Trials
PDF
Fixed-effect and random-effects models in meta-analysis
PPTX
Top 10 clinical trial assistant interview questions and answers
PDF
Standards-driven Oncology Studies
PDF
Oncology Therapeutic Area Workshop
PDF
Top 52 clinical research associate interview questions and answers pdf
PDF
演講-Meta analysis in medical research-張偉豪
R workshop xiv--Survival Analysis with R
ICH E6 good clinical practice guidelines
Ib math studies internal assessment final draft
Survival Data Analysis for Sekolah Tinggi Ilmu Statistik Jakarta
Part 2 Cox Regression
Survival analysis
Posthoc
Inverse variance method of meta-analysis and Cochran's Q
Quality Assurance in Clinical Trials
Ia final report
SDTM Fnal Detail Training
Data management through spss
Inspection Findings in Clinical Trials
Fixed-effect and random-effects models in meta-analysis
Top 10 clinical trial assistant interview questions and answers
Standards-driven Oncology Studies
Oncology Therapeutic Area Workshop
Top 52 clinical research associate interview questions and answers pdf
演講-Meta analysis in medical research-張偉豪
Ad

Similar to Propensity Score Methods for Comparative Effectiveness Research with Multiple Treatment Groups (20)

PDF
Matching Weights to Simultaneously Compare Three Treatment Groups: a Simulati...
PDF
ENAR 2018 Matching Weights to Simultaneously Compare Three Treatment Groups: ...
PDF
Cluster-Specific Propensity Score Weighting To Stabilize Treatment Effect Est...
PDF
Cluster-Specific Propensity Score Weighting To Stabilize Treatment Effect Est...
PDF
Propensity Score Matching Methods
PPT
Analytic Methods and Issues in CER from Observational Data
PDF
PyData Meetup Berlin 2017-04-19
PDF
A New Statistical Aspects of Cancer Diagnosis and Treatment
PDF
Causality and Propensity Score Methods
PPTX
Economic evaluation. Methods for adjusting survival estimates in the presence...
PPTX
PS.Observational.SAS_Y.Duan
PDF
Causally regularized machine learning
PDF
PMED: APPM Workshop: Overview of Methods for Subgroup Identification in Clini...
PDF
Subgroup identification for precision medicine. a comparative review of 13 me...
PPTX
Week 3 educational product puckett
PDF
Causal Inference Introduction.pdf
PDF
PMED Opening Workshop - Inference on Individualized Treatment Rules from Obse...
PDF
PMED Opening Workshop - Inference on Individualized Treatment Rules from Obse...
PDF
Life On Earth Is Dependent On The Functions Of Light As A...
PDF
PMED Opening Workshop - Measurement Error and Precision Medicine - Michael Wa...
Matching Weights to Simultaneously Compare Three Treatment Groups: a Simulati...
ENAR 2018 Matching Weights to Simultaneously Compare Three Treatment Groups: ...
Cluster-Specific Propensity Score Weighting To Stabilize Treatment Effect Est...
Cluster-Specific Propensity Score Weighting To Stabilize Treatment Effect Est...
Propensity Score Matching Methods
Analytic Methods and Issues in CER from Observational Data
PyData Meetup Berlin 2017-04-19
A New Statistical Aspects of Cancer Diagnosis and Treatment
Causality and Propensity Score Methods
Economic evaluation. Methods for adjusting survival estimates in the presence...
PS.Observational.SAS_Y.Duan
Causally regularized machine learning
PMED: APPM Workshop: Overview of Methods for Subgroup Identification in Clini...
Subgroup identification for precision medicine. a comparative review of 13 me...
Week 3 educational product puckett
Causal Inference Introduction.pdf
PMED Opening Workshop - Inference on Individualized Treatment Rules from Obse...
PMED Opening Workshop - Inference on Individualized Treatment Rules from Obse...
Life On Earth Is Dependent On The Functions Of Light As A...
PMED Opening Workshop - Measurement Error and Precision Medicine - Michael Wa...
Ad

More from Kazuki Yoshida (20)

PDF
Graphical explanation of causal mediation analysis
PPTX
Pharmacoepidemiology Lecture: Designing Observational CER to Emulate an RCT
PDF
What is the Expectation Maximization (EM) Algorithm?
PDF
Emacs Key Bindings
PDF
Visual Explanation of Ridge Regression and LASSO
PDF
Search and Replacement Techniques in Emacs: avy, swiper, multiple-cursor, ag,...
PDF
Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulat...
PDF
Spacemacs: emacs user's first impression
PDF
Multiple Imputation: Joint and Conditional Modeling of Missing Data
PDF
20130222 Data structures and manipulation in R
PDF
20130215 Reading data into R
PDF
Linear regression with R 2
PDF
Linear regression with R 1
PDF
(Very) Basic graphing with R
PDF
Introduction to Deducer
PDF
Groupwise comparison of continuous data
PDF
Categorical data with R
PDF
Install and Configure R and RStudio
PDF
Reading Data into R REVISED
PDF
Descriptive Statistics with R
Graphical explanation of causal mediation analysis
Pharmacoepidemiology Lecture: Designing Observational CER to Emulate an RCT
What is the Expectation Maximization (EM) Algorithm?
Emacs Key Bindings
Visual Explanation of Ridge Regression and LASSO
Search and Replacement Techniques in Emacs: avy, swiper, multiple-cursor, ag,...
Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulat...
Spacemacs: emacs user's first impression
Multiple Imputation: Joint and Conditional Modeling of Missing Data
20130222 Data structures and manipulation in R
20130215 Reading data into R
Linear regression with R 2
Linear regression with R 1
(Very) Basic graphing with R
Introduction to Deducer
Groupwise comparison of continuous data
Categorical data with R
Install and Configure R and RStudio
Reading Data into R REVISED
Descriptive Statistics with R

Recently uploaded (20)

PPT
6.1 High Risk New Born. Padetric health ppt
PPT
Mutation in dna of bacteria and repairss
PDF
Unit 5 Preparations, Reactions, Properties and Isomersim of Organic Compounds...
PDF
Worlds Next Door: A Candidate Giant Planet Imaged in the Habitable Zone of ↵ ...
PPTX
PMR- PPT.pptx for students and doctors tt
PPTX
GREEN FIELDS SCHOOL PPT ON HOLIDAY HOMEWORK
PPTX
perinatal infections 2-171220190027.pptx
PPTX
Introcution to Microbes Burton's Biology for the Health
PDF
Science Form five needed shit SCIENEce so
PPT
LEC Synthetic Biology and its application.ppt
PDF
The Land of Punt — A research by Dhani Irwanto
PDF
Looking into the jet cone of the neutrino-associated very high-energy blazar ...
PPTX
SCIENCE 4 Q2W5 PPT.pptx Lesson About Plnts and animals and their habitat
PPTX
Hypertension_Training_materials_English_2024[1] (1).pptx
PPT
THE CELL THEORY AND ITS FUNDAMENTALS AND USE
PDF
Is Earendel a Star Cluster?: Metal-poor Globular Cluster Progenitors at z ∼ 6
PPTX
Seminar Hypertension and Kidney diseases.pptx
PDF
CHAPTER 2 The Chemical Basis of Life Lecture Outline.pdf
PPT
Presentation of a Romanian Institutee 2.
PDF
BET Eukaryotic signal Transduction BET Eukaryotic signal Transduction.pdf
6.1 High Risk New Born. Padetric health ppt
Mutation in dna of bacteria and repairss
Unit 5 Preparations, Reactions, Properties and Isomersim of Organic Compounds...
Worlds Next Door: A Candidate Giant Planet Imaged in the Habitable Zone of ↵ ...
PMR- PPT.pptx for students and doctors tt
GREEN FIELDS SCHOOL PPT ON HOLIDAY HOMEWORK
perinatal infections 2-171220190027.pptx
Introcution to Microbes Burton's Biology for the Health
Science Form five needed shit SCIENEce so
LEC Synthetic Biology and its application.ppt
The Land of Punt — A research by Dhani Irwanto
Looking into the jet cone of the neutrino-associated very high-energy blazar ...
SCIENCE 4 Q2W5 PPT.pptx Lesson About Plnts and animals and their habitat
Hypertension_Training_materials_English_2024[1] (1).pptx
THE CELL THEORY AND ITS FUNDAMENTALS AND USE
Is Earendel a Star Cluster?: Metal-poor Globular Cluster Progenitors at z ∼ 6
Seminar Hypertension and Kidney diseases.pptx
CHAPTER 2 The Chemical Basis of Life Lecture Outline.pdf
Presentation of a Romanian Institutee 2.
BET Eukaryotic signal Transduction BET Eukaryotic signal Transduction.pdf

Propensity Score Methods for Comparative Effectiveness Research with Multiple Treatment Groups

  • 1. Propensity Score Methods for Comparative Effectiveness Research with Multiple Treatment Groups Kazuki Yoshida Division of Rheumatology, Immunology and Allergy Brigham and Women’s Hospital & Harvard Medical School @kaz_yos kaz-yos kazukiyoshida@mail.harvard.edu 2019-03-18 at Study Design and Biostatistics Center Department of Population Health Sciences University of Utah 1 / 50
  • 2. Multi-group Comparative Effectiveness Increasing availability of multiple medications ⇒ Need for CER involving multiple groups. Recent observational CER examples in literature: [Zeng et al., 2019] Analgesics: Tramadol, Naproxen, Diclofenac, Celecoxib, Etoricoxib, Codeine [Pawar et al., 2019] Biological Antirheumatics: Tocilizumab, Tumor necrosis factor inhibitors, Abatacept [Bergstra et al., 2019] Antirheumatics: Synthetic, Synthetic + Glucocorticoids, Biological w or w/o synthetic [Shah et al., 2018] Anticoagulants: Rivaroxaban, Dabigatran, Apixaban, Warfarin 6 / 50
  • 3. Propensity Score Methods and CER Propensity score (PS) [Rosenbaum and Rubin, 1983] methods are routinely used in CER comparing two treatment strategies. Adjustment [Rosenbaum and Rubin, 1983] Stratification [Rosenbaum and Rubin, 1984] Matching [Rosenbaum and Rubin, 1985] Weighting [Rosenbaum, 1987] However, when there are more than two treatment strategies of interest, adaptation is less clear and varies across fields. [Lopez and Gutman, 2017] 7 / 50
  • 4. Approaches in Examples
      Paper | Treatment | Approach
      [Zeng et al., 2019] | Analgesics | Pairwise PS, Match
      [Pawar et al., 2019] | Biologics | Pairwise PS, Match
      [Bergstra et al., 2019] | Antirheumatics | Multinom PS, Adjust
      [Shah et al., 2018] | Anticoagulants | Pairwise PS, Adjust
      Several options in multi-group CER. Cohort Construction: Pairwise vs Simultaneous eligibility. PS Estimation: Binary vs Multinomial (logistic) model. PS Methods: Adjustment, Stratification, Matching, or Weighting. 8 / 50
  • 5. Example of RCT with Multiple Groups Prospective Randomized Evaluation of Celecoxib Integrated Safety versus Ibuprofen or Naproxen (PRECISION) trial [Becker et al., 2009, Nissen et al., 2016] 9 / 50
  • 6. Question How can we better design multi-group CER using PS methods? 10 / 50
  • 8. Notations $Y_i$: Outcome; $A_i$: Treatment strategy; $X_i$: Vector of covariates; $e_i$: Propensity score, where $e_i = P[A_i = 1 \mid X_i]$ 12 / 50
  • 9. Balancing Weights [Li et al., 2018] organized existing PS weighting strategies into a single class of covariate "balancing weights". The balancing weight for a given individual is defined as: $\frac{h(X_i)}{A_i e_i + (1 - A_i)(1 - e_i)} = h(X_i)\,\mathrm{IPTW}_i$, where $h(\cdot)$ is a prespecified scalar function of $X_i$, but not of $A_i$. Intuition: the denominator (IPTW) balances the groups on covariates; the numerator $h(\cdot)$ determines the target population (estimand). 13 / 50
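  • 10. PS Weighting with Binary Strategy
      $\mathrm{IPTW}_i = \frac{1}{A_i e_i + (1 - A_i)(1 - e_i)} = \begin{cases} \frac{1}{e_i} & \text{for } A_i = 1 \\ \frac{1}{1 - e_i} & \text{for } A_i = 0 \end{cases}$
      $\mathrm{ATTW}_i = \frac{e_i}{A_i e_i + (1 - A_i)(1 - e_i)} = \begin{cases} 1 & \text{for } A_i = 1 \\ \frac{e_i}{1 - e_i} & \text{for } A_i = 0 \end{cases}$
      $\mathrm{ATUW}_i = \frac{1 - e_i}{A_i e_i + (1 - A_i)(1 - e_i)} = \begin{cases} \frac{1 - e_i}{e_i} & \text{for } A_i = 1 \\ 1 & \text{for } A_i = 0 \end{cases}$
      $\mathrm{MW}_i = \frac{\min\{e_i, 1 - e_i\}}{A_i e_i + (1 - A_i)(1 - e_i)} = \begin{cases} \mathrm{ATTW}_i & \text{for } e_i \le 0.5 \\ \mathrm{ATUW}_i & \text{for } e_i > 0.5 \end{cases}$
      $\mathrm{OW}_i = \frac{e_i (1 - e_i)}{A_i e_i + (1 - A_i)(1 - e_i)} = \begin{cases} 1 - e_i & \text{for } A_i = 1 \\ e_i & \text{for } A_i = 0 \end{cases}$
      [Rosenbaum, 1987, Robins et al., 2000, Sato and Matsuyama, 2003, Li and Greene, 2013, Li et al., 2018] 14 / 50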
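The five binary weights share one denominator and differ only in the numerator h. The following is a minimal sketch of that idea for a single individual, assuming the propensity score has already been estimated; the function name `binary_balancing_weights` is hypothetical and not taken from the author's code.

```python
def binary_balancing_weights(e, a):
    """Return IPTW, ATTW, ATUW, MW, and OW for one individual.

    e: propensity score P[A = 1 | X], strictly between 0 and 1
    a: treatment actually received, 0 or 1
    """
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    if a not in (0, 1):
        raise ValueError("treatment must be 0 or 1")
    # Shared denominator: probability of the treatment actually received.
    denom = a * e + (1 - a) * (1.0 - e)
    # Each weight is h(X) / denom with a different numerator h(X).
    return {
        "IPTW": 1.0 / denom,
        "ATTW": e / denom,
        "ATUW": (1.0 - e) / denom,
        "MW": min(e, 1.0 - e) / denom,
        "OW": e * (1.0 - e) / denom,
    }


# Two individuals with the same propensity score but different treatments.
treated = binary_balancing_weights(0.25, 1)
untreated = binary_balancing_weights(0.25, 0)
```

Because e = 0.25 is at most 0.5 here, MW coincides with ATTW for both individuals, as the case definition on the slide states.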
  • 11. PS Methods Visualized (Equal Groups) [Figure: frequency by propensity score (0.0 to 1.0) for treated and untreated individuals; panels: Original, Matching, MW, OW, IPTW, ATTW, ATUW.] 15 / 50
  • 12. PS Methods Visualized (Fewer Treated) [Figure: frequency by propensity score (0.0 to 1.0) for treated and untreated individuals; panels: Original, Matching, MW, OW, IPTW, ATTW, ATUW.] 16 / 50
  • 13. PS Methods Visualized (More Treated) [Figure: frequency by propensity score (0.0 to 1.0) for treated and untreated individuals; panels: Original, Matching, MW, OW, IPTW, ATTW, ATUW.] 17 / 50
  • 14. Asymptotic Equivalence of MW and 1:1 PSM [Li and Greene, 2013] proved the asymptotic equivalence of the MW estimand and 1:1 PS matching estimand under: Finite PS space (no growth with n) Positivity (i.e., perfect overlap) 1:1 exact PS matching 18 / 50
  • 15. Estimands Using balancing weights [Li et al., 2018], various populations can be targeted for inference on the (marginal) treatment effect. IPTW targets the average treatment effect (ATE). We can also construct weights specifically for the average treatment effect on the treated (ATT) or untreated (ATU). 1:1 PSM and MW target the treatment effect in a feasible subset of the sample. [Samuels and Greevy, 2018] named this estimand the "average treatment effect on the evenly matchable units" (ATM). OW similarly targets a feasible subset. 19 / 50
  • 17. Generalized PS Conditional probability of receiving a particular level of the treatment given the pre-treatment variables [Imbens, 2000]: $A_i \in \{0, 1, \ldots, J\}$, $e_{ji} = P[A_i = j \mid X_i]$, subject to $\sum_{j=0}^{J} e_{ji} = 1$. Each individual has a PS vector $e_i = (e_{0i}, e_{1i}, \ldots, e_{Ji})^T$. 21 / 50
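As an illustration of how a PS vector that sums to one can arise, here is a minimal sketch that assumes a multinomial logistic model with group 0 as the reference level. The function name `generalized_ps` and the coefficient values are hypothetical; in practice the coefficients would come from a fitted model.

```python
import math


def generalized_ps(x, coefs):
    """Return the PS vector (e_0, ..., e_J) for one individual.

    x: list of covariate values
    coefs: one (intercept, slopes) pair per non-reference group 1..J;
           group 0 is the reference with linear predictor 0
    """
    linear = [0.0]
    for intercept, slopes in coefs:
        linear.append(intercept + sum(b * xi for b, xi in zip(slopes, x)))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(linear)
    expo = [math.exp(v - peak) for v in linear]
    total = sum(expo)
    return [v / total for v in expo]


# Hypothetical coefficients for a three-group setting (J = 2).
ps_vector = generalized_ps(
    [1.0, 0.0],
    [(0.0, [math.log(2.0), 0.0]), (0.0, [0.0, 1.0])],
)
```

With these illustrative inputs the linear predictors are 0, log 2, and 0, so the middle group receives half of the probability mass and the other two groups a quarter each.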
  • 18. Generalized Balancing Weights [Li and Li, 2018] extended the balancing weights framework using the generalized PS. Using our notation, $\frac{h(X_i)}{\sum_{j=0}^{J} e_{ji} I(A_i = j)} = h(X_i)\,\mathrm{IPTW}_i$, where $h(\cdot)$ is a prespecified scalar function of $X_i$, but not of $A_i$. Intuition: the denominator (IPTW) balances the groups on covariates; the numerator $h(\cdot)$ determines the target population (estimand). 22 / 50
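  • 19. Generalized PS Weighting
      $\mathrm{IPTW}_i = \frac{1}{\sum_{j=0}^{J} e_{ji} I(A_i = j)} = \frac{1}{e_{A_i i}} > 1$ for all $A_i$
      $\mathrm{AT}(k)\mathrm{W}_i = \frac{e_{ki}}{\sum_{j=0}^{J} e_{ji} I(A_i = j)} = \begin{cases} 1 & \text{for } A_i = k \\ \frac{e_{ki}}{e_{A_i i}} & \text{for } A_i \neq k \end{cases}$
      $\mathrm{MW}_i = \frac{\min_j\{e_{ji}\}}{\sum_{j=0}^{J} e_{ji} I(A_i = j)} = \begin{cases} 1 & \text{for } A_i = \operatorname{argmin}_j\{e_{ji}\} \\ \frac{\min_j\{e_{ji}\}}{e_{A_i i}} < 1 & \text{otherwise} \end{cases}$
      $\mathrm{OW}_i = \frac{\left(\sum_{j=0}^{J} \frac{1}{e_{ji}}\right)^{-1}}{\sum_{j=0}^{J} e_{ji} I(A_i = j)} = \frac{1}{e_{A_i i}} \cdot \frac{1}{\sum_{l=0}^{J} \frac{1}{e_{li}}} < 1$ for all $A_i$
      [Yoshida et al., 2017, Li and Li, 2018] 23 / 50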
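The generalized weights can be computed directly from a PS vector and the treatment received. This is a minimal sketch under the definitions on the slide; the function name `generalized_weights` and the example PS vector are hypothetical.

```python
def generalized_weights(ps, a, target=None):
    """Return generalized IPTW, MW, OW, and optionally AT(k)W.

    ps: PS vector (e_0, ..., e_J), each entry in (0, 1), summing to 1
    a: index of the treatment actually received
    target: optional group k defining the AT(k)W target population
    """
    if any(p <= 0.0 or p >= 1.0 for p in ps):
        raise ValueError("every PS entry must lie strictly between 0 and 1")
    # Shared denominator: PS for the treatment actually received.
    denom = ps[a]
    weights = {
        "IPTW": 1.0 / denom,
        "MW": min(ps) / denom,
        # OW numerator: inverse of the sum of reciprocal PS entries.
        "OW": (1.0 / sum(1.0 / p for p in ps)) / denom,
    }
    if target is not None:
        weights["AT(k)W"] = ps[target] / denom
    return weights


# Three-group example: an individual in group 2, targeting group 0.
w_three = generalized_weights([0.2, 0.3, 0.5], a=2, target=0)

# With two groups the OW numerator reduces to e(1 - e).
w_two = generalized_weights([0.75, 0.25], a=1)
```

In the two-group call the PS for treatment is 0.25, so the overlap weight for a treated individual equals 1 − 0.25 = 0.75, matching the binary definition.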
  • 20. Generalized PS Weighting Visualized I [Figure: plots with axes x, y, z; panels: Raw, Group 0, Group 1, Group 2, IPTW, MW, OW.] 24 / 50
  • 21. Generalized PS Weighting Visualized II [Figure: plots with axes x, y, z; panels: Raw, Group 0, Group 1, Group 2, IPTW, MW, OW.] 25 / 50
  • 22. Generalized PS Weighting Visualized III [Figure: plots with axes x, y, z; panels: Raw, Group 0, Group 1, Group 2, IPTW, MW, OW.] 26 / 50
  • 23. Simulation Study [Yoshida et al., 2017] examined 3-group MW in comparison to 3-group IPTW and 1:1:1 simultaneous three-way matching [Rassen et al., 2013]. OW was not included. 27 / 50
  • 24. Mean Squared Error [Figure: mean squared error (0.0 to 1.5) by method (Unadj, Match, MW, IPTW); columns: Modification (+) for contrasts 1v0, 2v0, 2v1; rows: good overlap and poor overlap, both with non-null main effects; series: pExpo 33:33:33, 10:45:45, 10:10:80.] 28 / 50
  • 25. Estimands [Figure: true risk ratio (0.40 to 1.00) by method (Unadj, Match, MW, IPTW); columns: Modification (+) for contrasts 1v0, 2v0, 2v1; rows: good overlap and poor overlap, both with non-null main effects; series: pExpo 33:33:33, 10:45:45, 10:10:80.] Estimand calculation was based on the counterfactual method described in [Austin, 2013]. 29 / 50
  • 26. Simulation: Summary results Comparing MW to three-way matching and IPTW, we found: Similar estimands for MW and matching, but not IPTW Best covariate balance Similarly small bias compared to matching Smaller MSE compared to matching in all scenarios More robust to rare events, unequally sized groups, and poor covariate overlap The full results are available in [Yoshida et al., 2017] 30 / 50
  • 27. Empirical example Medicare Beneficiary dataset from PA and NJ (1999-2005) [Solomon et al., 2010]
      Unadjusted | nsNSAIDs | Coxibs | Opioids | SMD
      n | 4874 | 6172 | 12601 |
      Charlson score, mean (SD) | 1.59 (1.54) | 1.72 (1.53) | 2.17 (1.78) | 0.23
      Antithrombotic use, % | 14.4 | 17.6 | 27.7 | 0.22
      No. prescription drugs, mean (SD) | 8.28 (4.69) | 8.55 (4.76) | 9.76 (5.38) | 0.20
      No. days in hospital, mean (SD) | 1.85 (6.90) | 2.19 (6.86) | 4.18 (9.46) | 0.19
      White race, % | 84.6 | 88 | 92.4 | 0.16
      Fracture, % | 6.5 | 7.2 | 13.7 | 0.16
      Loop diuretic use, % | 21.3 | 25.8 | 31.3 | 0.15
      Age, mean (SD) | 79.67 (7.03) | 80.87 (6.99) | 81.15 (7.17) | 0.14
      No. physician visits, mean (SD) | 8.72 (6.32) | 8.80 (5.99) | 10.08 (7.14) | 0.14
      Myocardial infarction, % | 5.2 | 5.7 | 9.6 | 0.11
      Stroke, % | 15.2 | 16.1 | 21.5 | 0.11
      31 / 50
  • 28. Table 1: Comparison
      Unadjusted | nsNSAIDs | Coxibs | Opioids | SMD
      Charlson score, mean (SD) | 1.59 (1.54) | 1.72 (1.53) | 2.17 (1.78) | 0.23
      Antithrombotic use, % | 14.4 | 17.6 | 27.7 | 0.22
      IPTW | nsNSAIDs | Coxibs | Opioids | SMD
      Charlson score, mean (SD) | 1.98 (1.70) | 1.94 (1.68) | 1.94 (1.69) | 0.02
      Antithrombotic use, % | 23.3 | 22.5 | 22.4 | 0.01
      MW | nsNSAIDs | Coxibs | Opioids | SMD
      Charlson score, mean (SD) | 1.62 (1.53) | 1.61 (1.52) | 1.63 (1.53) | 0.01
      Antithrombotic use, % | 14.9 | 14.8 | 15.2 | 0.01
      OW | nsNSAIDs | Coxibs | Opioids | SMD
      Charlson score, mean (SD) | 1.73 (1.58) | 1.71 (1.56) | 1.73 (1.57) | 0.01
      Antithrombotic use, % | 17.5 | 17.2 | 17.5 | 0.01
      Weighted standardized mean difference (SMD) available in R package tableone. 32 / 50
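To make the SMD column concrete, the sketch below computes a weighted SMD for one continuous covariate across several groups. It assumes one common convention (averaging the pairwise SMDs across all group pairs) and a simple weighted variance. It is not the tableone implementation, whose variance estimator may differ, and the function names are hypothetical.

```python
import math
from itertools import combinations


def weighted_mean_var(values, weights):
    """Weighted mean and (population-style) weighted variance."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var


def average_pairwise_smd(values, groups, weights):
    """Average of absolute pairwise SMDs across all pairs of groups."""
    stats = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        stats[g] = weighted_mean_var(
            [values[i] for i in idx], [weights[i] for i in idx]
        )
    smds = []
    for g1, g2 in combinations(stats, 2):
        (m1, v1), (m2, v2) = stats[g1], stats[g2]
        # Pool the two group variances with equal weight.
        pooled_sd = math.sqrt((v1 + v2) / 2.0)
        if pooled_sd == 0.0:
            smds.append(0.0 if m1 == m2 else math.inf)
        else:
            smds.append(abs(m1 - m2) / pooled_sd)
    return sum(smds) / len(smds)


# Toy data: three groups with means 1, 2, 3 and unit variance each.
toy_smd = average_pairwise_smd(
    values=[0, 2, 1, 3, 2, 4],
    groups=[0, 0, 1, 1, 2, 2],
    weights=[1, 1, 1, 1, 1, 1],
)
```

Replacing the unit weights with IPTW, MW, or OW values gives the weighted SMD for the corresponding target population.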
  • 29. Empirical example: Outcome regression [Figure: hazard ratios (HR, axis 1 to 3) by model (Unadj, IPTW, MW, OW); columns: Coxib vs nsNSAIDs, Opioids vs nsNSAIDs; rows: Death, MI.] 33 / 50
  • 30. Conclusion MW has been suggested as a more efficient alternative to 1:1 pairwise matching. [Li and Greene, 2013] In a simulation study with three treatment groups, MW demonstrated similar bias, but smaller MSE compared to 1:1:1 three-way matching. [Rassen et al., 2013] Efficiency gain compared to 1:1:1 three-way matching was more noticeable in scenarios in which the outcome events were rare, treatment groups were unequally sized, or covariate overlap was poor. Compared to IPTW, MW was more stable in the poor covariate overlap setting. Confirming the type of patients that MW is making inference for is important in practice. 34 / 50
  • 32. PS Trimming Propensity score trimming has been suggested by several authors. To increase efficiency [Crump et al., 2009] To reduce unmeasured confounding [Stürmer et al., 2010] To guide study design [Walker et al., 2013] [Yoshida et al., 2019] examined multi-group extension of all three. Here we focus on the extension of [Stürmer et al., 2010]. 36 / 50
  • 33. Motivation for Stürmer’s PS Trimming [Stürmer et al., 2010] was concerned with very heterogeneous treatment effects in the tails of PS distribution. [Kurth et al., 2006] tissue plasminogen activator (t-PA) use vs no t-PA use in stroke patients. Outcome in-hospital death. Very high mortality in t-PA users with lowest probabilities for t-PA. [Lunt et al., 2009] tumor necrosis factor inhibitor (TNFi) initiation vs non-TNFi treatment in rheumatoid arthritis patients. Outcome death. Higher mortality among non-TNFi users with highest probabilities for TNFi initiation. [Stürmer et al., 2010] hypothesized that there may be higher prevalence of unmeasured confounders that preferentially introduce more confounding in the tails. 37 / 50
  • 34. Definition of Stürmer’s PS Trimming [Stürmer et al., 2010] proposed the asymmetric PS trimming to remedy this. Their simulation study confirmed its benefit in bias reduction if indeed the tails of PS contained higher prevalence of unmeasured confounders. 38 / 50
  • 35. Question [Stürmer et al., 2010] demonstrated benefits of PS trimming in reducing unmeasured confounding in the presence of unmeasured confounders that were more prevalent in the tails of the PS distribution. How can we conceptualize this issue in the general setting? How can we extend their method? 39 / 50
  • 36. Original Two-Group Definition
      Method: Stürmer. Existing binary definition: $I_s = \{ i \in I : e_i \in [F^{-1}_{e_i \mid A_i}(0.05 \mid 1),\ F^{-1}_{e_i \mid A_i}(0.95 \mid 0)] \}$
      Define the lower threshold using the treated PS distribution. Define the upper threshold using the untreated PS distribution.
      Notation | Explanation
      $i \in \{1, \ldots, n\}$ | index for an individual
      $I = \{1, \ldots, n\}$ | index set for entire sample
      $A_i \in \{0, 1\}$ | treatment variable
      $e_i = P[A_i = 1 \mid X_i]$ | propensity score
      $p = P[A_i = 1]$ | treatment prevalence
      $F^{-1}_{e_i \mid A_i}(x \mid a)$ | treatment-specific quantile of $e_i$
      40 / 50
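The two-group rule can be sketched as follows: take the 5th percentile of the PS among the treated as the lower cutoff, the 95th percentile among the untreated as the upper cutoff, and keep everyone in between. The function names are hypothetical, and the linear-interpolation quantile used here is only one of several quantile definitions, so cutoffs may differ slightly from other software.

```python
import math


def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def sturmer_trim(ps, a, lower=0.05, upper=0.95):
    """Return indices retained by asymmetric trimming.

    ps: propensity scores P[A = 1 | X], one per individual
    a: treatment indicators (0 or 1), one per individual
    """
    # Lower cutoff from the treated, upper cutoff from the untreated.
    lower_cut = quantile([e for e, t in zip(ps, a) if t == 1], lower)
    upper_cut = quantile([e for e, t in zip(ps, a) if t == 0], upper)
    return [i for i, e in enumerate(ps) if lower_cut <= e <= upper_cut]


# Toy data: five treated followed by five untreated individuals.
toy_ps = [0.2, 0.4, 0.6, 0.8, 0.9, 0.1, 0.2, 0.3, 0.5, 0.7]
toy_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
kept = sturmer_trim(toy_ps, toy_a)
```

In this toy sample the cutoffs are about 0.24 and 0.66, so treated individuals with very low PS and untreated individuals with very high PS are among those removed.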
  • 37. Proposed definitions I
      Method: Stürmer. Proposed multinomial definition: $I_{J,s} = \{ i \in I : e_{ji} \ge F^{-1}_{e_{ji} \mid A_i}(\alpha_{J,s} \mid j)\ \forall j \in \{0, \ldots, J\} \}$
      Define a threshold at the $100 \times \alpha_{J,s}$ percentile of each PS in the corresponding treatment group. Trim individuals outside the region above all these thresholds. We used the following provisional thresholds for visualization.
      Groups | $J$ | $\alpha_{J,s}$
      2 | 1 | 0.050
      3 | 2 | 0.033
      4 | 3 | 0.025
      5 | 4 | 0.020
      $J + 1$ | $J$ | $\frac{1}{J+1} \cdot \frac{1}{10}$
      41 / 50
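The multinomial rule can be sketched in the same way: for each group j, compute a cutoff from the distribution of e_j among individuals who received j, then keep only individuals whose PS vector is at or above every cutoff. The function names and toy data are hypothetical, and the quantile uses linear interpolation, which is an assumption rather than the authors' stated choice.

```python
import math


def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def multinomial_sturmer_trim(ps_rows, a, alpha=None):
    """Return indices retained by the multinomial trimming rule.

    ps_rows: one PS vector (e_0, ..., e_J) per individual
    a: treatment group index per individual
    alpha: percentile level; defaults to the provisional 1 / (10 * (J + 1))
    """
    n_groups = len(ps_rows[0])
    if alpha is None:
        alpha = 1.0 / (10.0 * n_groups)
    # Cutoff for e_j taken among individuals who actually received j.
    cuts = [
        quantile([row[j] for row, t in zip(ps_rows, a) if t == j], alpha)
        for j in range(n_groups)
    ]
    return [
        i
        for i, row in enumerate(ps_rows)
        if all(row[j] >= cuts[j] for j in range(n_groups))
    ]


# Toy data: in each group the first individual has a very low PS
# for the treatment they actually received.
toy_rows = [
    [0.02, 0.49, 0.49], [0.40, 0.30, 0.30], [0.50, 0.25, 0.25], [0.60, 0.20, 0.20],
    [0.49, 0.02, 0.49], [0.30, 0.40, 0.30], [0.25, 0.50, 0.25], [0.20, 0.60, 0.20],
    [0.49, 0.49, 0.02], [0.30, 0.30, 0.40], [0.25, 0.25, 0.50], [0.20, 0.20, 0.60],
]
toy_groups = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
kept_multi = multinomial_sturmer_trim(toy_rows, toy_groups)
```

Passing several values of `alpha` in turn gives the kind of threshold sweep that a sensitivity analysis would use.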
  • 38. Visualization Explanation [Figure: plot with axes x, y, z showing Groups 0, 1, and 2, annotated "84.6% (86.2; 82.1; 88.3)"; axis ticks from 0.1 to 0.9.] Interactive web application 42 / 50
  • 39. Data generation mechanism
      Variables: $X^m_i$ (measured covariates), $X^u_i$ (unmeasured covariates), $A_i$ (treatment), $Y_i$ (outcome).
      Outcome model (treatment terms): $\beta_{A1}, \beta_{A2}$ (main effects) for treatment effects; $\beta_{XA1}, \beta_{XA2}$ (interactions) for additional treatment effects in a subset.
      Treatment model: $\alpha_{01}, \alpha_{02}$ (intercepts) for treatment prevalence; $\alpha_{X1}, \alpha_{X2}$ (covariate association) for covariate overlap level.
      Outcome model (covariate terms): $\beta_0$ (intercept) for baseline rate of events; $\beta_X$ (covariate association) for strength of risk factors.
      Unmeasured covariates $X^u_i$ were introduced in the tails of the PS based on $X^m_i$ only.
      Treatment generating model: multinomial logistic model. Outcome generating model: Poisson model. 43 / 50
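A heavily simplified sketch of this kind of data generation is shown below: one measured covariate, treatment drawn from a multinomial logistic model, and a count outcome drawn from a Poisson model with a log link. It omits the unmeasured covariate and the interaction terms, and all parameter values and names are hypothetical rather than those used in the study.

```python
import math
import random


def poisson_draw(rng, rate):
    """Draw one Poisson count with Knuth's method (suitable for small rates)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_cohort(n, alpha0, alpha_x, beta0, beta_x, beta_a, seed=0):
    """Return a list of (x, a, y) tuples.

    alpha0, alpha_x: intercepts and slopes for groups 1..J (group 0 is reference)
    beta0, beta_x: outcome intercept and covariate coefficient (log scale)
    beta_a: treatment main effects for groups 1..J (log scale)
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Treatment model: multinomial logistic probabilities.
        linear = [0.0] + [a0 + ax * x for a0, ax in zip(alpha0, alpha_x)]
        expo = [math.exp(v) for v in linear]
        probs = [v / sum(expo) for v in expo]
        a = rng.choices(range(len(probs)), weights=probs)[0]
        # Outcome model: Poisson with log link and treatment main effects.
        effect = 0.0 if a == 0 else beta_a[a - 1]
        y = poisson_draw(rng, math.exp(beta0 + beta_x * x + effect))
        rows.append((x, a, y))
    return rows


# Hypothetical parameter values for a three-group cohort.
cohort = simulate_cohort(
    n=500,
    alpha0=[0.0, 0.5],
    alpha_x=[0.5, 1.0],
    beta0=-2.0,
    beta_x=0.3,
    beta_a=[-0.2, 0.4],
)
```

Larger slopes in `alpha_x` make the groups differ more on the covariate and therefore reduce covariate overlap, while the intercepts in `alpha0` shift treatment prevalence.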
  • 40. Bias [Figure: bias (−0.8 to 0.4) by trimming threshold (0.00 to 0.15) under Stürmer trimming; columns: contrasts 1vs0, 2vs0, 2vs1; rows: Unadj, IPTW, MW, OW.] 44 / 50
  • 41. Simulation SE [Figure: SE (0.00 to 0.20) by trimming threshold (0.00 to 0.15) under Stürmer trimming; columns: contrasts 1vs0, 2vs0, 2vs1; rows: Unadj, IPTW, MW, OW.] 45 / 50
  • 42. Simulation Root MSE [Figure: sqrt(MSE) (0.0 to 0.8) by trimming threshold (0.00 to 0.15) under Stürmer trimming; columns: contrasts 1vs0, 2vs0, 2vs1; rows: Unadj, IPTW, MW, OW.] 46 / 50
  • 43. Summary Result Unmeasured confounding was reduced by trimming in many cases even with MW and OW albeit to a lesser extent. Initial benefits on variance were apparent for IPTW, but this was not the case for MW and OW. Practical implication: Stürmer trimming with several trimming thresholds may be useful as a sensitivity analysis. Important limitation in practice: Changing point estimate with trimming can be due to both unmeasured confounding reduction and true treatment effect heterogeneity. 47 / 50
  • 44. Recommendations for Multi-Group CER The multinomial PS approach more closely approximates a multi-arm RCT than the pairwise PS approach. PS weighting is easier than matching. When MW and IPTW results diverge, reviewing and revising the eligibility criteria may be most important. MW and OW, although more stable, require more attention to whose effect we are studying. A weighted Table 1 can help. Note that the smallest group tends to affect the estimand most. If unmeasured confounders are suspected in the tails of the PS distribution, PS trimming may be a useful sensitivity analysis even for MW and OW. 48 / 50
  • 45. Further Information on MW Slides: https://www.slideshare.net/kaz_yos Code: https://github.com/kaz-yos/mw Weighted tables: https://github.com/kaz-yos/tableone Published Paper: Epidemiology 2017;28:387 1 / 7
  • 46. Further Information on Trimming Slides: https://www.slideshare.net/kaz_yos Code: https://github.com/kaz-yos/multinomial-ps-trimming Published Paper: Am J Epidemiol. 2019;188:609 2 / 7
  • 47. Bibliography I [Austin, 2013] Austin, P. C. (2013). The performance of different propensity score methods for estimating marginal hazard ratios. Stat Med, 32(16):2837–2849. [Becker et al., 2009] Becker, M. C., Wang, T. H., Wisniewski, L., Wolski, K., Libby, P., Lüscher, T. F., Borer, J. S., Mascette, A. M., Husni, M. E., Solomon, D. H., Graham, D. Y., Yeomans, N. D., Krum, H., Ruschitzka, F., Lincoff, A. M., Nissen, S. E., and PRECISION Investigators (2009). Rationale, design, and governance of Prospective Randomized Evaluation of Celecoxib Integrated Safety versus Ibuprofen Or Naproxen (PRECISION), a cardiovascular end point trial of nonsteroidal antiinflammatory agents in patients with arthritis. Am. Heart J., 157(4):606–612. [Bergstra et al., 2019] Bergstra, S. A., Winchow, L.-L., Murphy, E., Chopra, A., Salomon-Escoto, K., Fonseca, J. a. E., Allaart, C. F., and Landewé, R. B. M. (2019). How to treat patients with rheumatoid arthritis when methotrexate has failed? The use of a multiple propensity score to adjust for confounding by indication in observational studies. Ann. Rheum. Dis., 78(1):25–30. [Crump et al., 2009] Crump, R. K., Hotz, V. J., Imbens, G. W., and Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1):187–199. [Imbens, 2000] Imbens, G. W. (2000). The role of the propensity score in estimating dose-response functions. Biometrika, 87(3):706–710. 3 / 7
  • 48. Bibliography II [Kurth et al., 2006] Kurth, T., Walker, A. M., Glynn, R. J., Chan, K. A., Gaziano, J. M., Berger, K., and Robins, J. M. (2006). Results of multivariable logistic regression, propensity matching, propensity adjustment, and propensity-based weighting under conditions of nonuniform effect. Am. J. Epidemiol., 163(3):262–270. [Li and Li, 2018] Li, F. and Li, F. (2018). Propensity Score Weighting for Causal Inference with Multi-valued Treatments. arXiv:1808.05339 [stat]. [Li et al., 2018] Li, F., Morgan, K. L., and Zaslavsky, A. M. (2018). Balancing Covariates via Propensity Score Weighting. Journal of the American Statistical Association, 113(521):390–400. [Li and Greene, 2013] Li, L. and Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. Int J Biostat, 9(2):215–234. [Lopez and Gutman, 2017] Lopez, M. J. and Gutman, R. (2017). Estimation of Causal Effects with Multiple Treatments: A Review and New Ideas. Statist. Sci., 32(3):432–454. [Lunt et al., 2009] Lunt, M., Solomon, D., Rothman, K., Glynn, R., Hyrich, K., Symmons, D. P. M., Stürmer, T., British Society for Rheumatology Biologics Register, and British Society for Rheumatology Biologics Register Control Centre Consortium (2009). Different methods of balancing covariates leading to different effect estimates in the presence of effect modification. Am. J. Epidemiol., 169(7):909–917. 4 / 7
  • 49. Bibliography III [Nissen et al., 2016] Nissen, S. E., Yeomans, N. D., Solomon, D. H., Lüscher, T. F., Libby, P., Husni, M. E., Graham, D. Y., Borer, J. S., Wisniewski, L. M., Wolski, K. E., Wang, Q., Menon, V., Ruschitzka, F., Gaffney, M., Beckerman, B., Berger, M. F., Bao, W., Lincoff, A. M., and PRECISION Trial Investigators (2016). Cardiovascular Safety of Celecoxib, Naproxen, or Ibuprofen for Arthritis. N. Engl. J. Med., 375(26):2519–2529. [Pawar et al., 2019] Pawar, A., Desai, R. J., Solomon, D. H., Santiago Ortiz, A. J., Gale, S., Bao, M., Sarsour, K., Schneeweiss, S., and Kim, S. C. (2019). Risk of serious infections in tocilizumab versus other biologic drugs in patients with rheumatoid arthritis: A multidatabase cohort study. Ann. Rheum. Dis. [Rassen et al., 2013] Rassen, J. A., Shelat, A. A., Franklin, J. M., Glynn, R. J., Solomon, D. H., and Schneeweiss, S. (2013). Matching by propensity score in cohort studies with three treatment groups. Epidemiology, 24(3):401–409. [Robins et al., 2000] Robins, J. M., Hernán, M. A., and Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11(5):550–560. [Rosenbaum, 1987] Rosenbaum, P. R. (1987). Model-Based Direct Adjustment. Journal of the American Statistical Association, 82(398):387–394. [Rosenbaum and Rubin, 1983] Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55. 5 / 7
  • 50. Bibliography IV [Rosenbaum and Rubin, 1984] Rosenbaum, P. R. and Rubin, D. B. (1984). Reducing Bias in Observational Studies Using Subclassification on the Propensity Score. J Am Stat Assoc, 79(387):516. [Rosenbaum and Rubin, 1985] Rosenbaum, P. R. and Rubin, D. B. (1985). Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score. The American Statistician, 39(1):33–38. [Samuels and Greevy, 2018] Samuels, L. R. and Greevy, R. A. (2018). Bagged one-to-one matching for efficient and robust treatment effect estimation. Stat Med, 37(29):4353–4373. [Sato and Matsuyama, 2003] Sato, T. and Matsuyama, Y. (2003). Marginal structural models as a tool for standardization. Epidemiology, 14(6):680–686. [Shah et al., 2018] Shah, S., Norby, F. L., Datta, Y. H., Lutsey, P. L., MacLehose, R. F., Chen, L. Y., and Alonso, A. (2018). Comparative effectiveness of direct oral anticoagulants and warfarin in patients with cancer and atrial fibrillation. Blood Adv, 2(3):200–209. [Solomon et al., 2010] Solomon, D. H., Rassen, J. A., Glynn, R. J., Lee, J., Levin, R., and Schneeweiss, S. (2010). The comparative safety of analgesics in older adults with arthritis. Arch. Intern. Med., 170(22):1968–1976. 6 / 7
  • 51. Bibliography V [Stürmer et al., 2010] Stürmer, T., Rothman, K. J., Avorn, J., and Glynn, R. J. (2010). Treatment effects in the presence of unmeasured confounding: Dealing with observations in the tails of the propensity score distribution–a simulation study. Am. J. Epidemiol., 172(7):843–854. [Walker et al., 2013] Walker, A. M., Patrick, A. R., Lauer, M. S., Hornbrook, M. C., Marin, M. G., Platt, R., Roger, V. L., Stang, P., and Schneeweiss, S. (2013). A tool for assessing the feasibility of comparative effectiveness research. Comp Eff Res, 2013(3):11–20. [Yoshida et al., 2017] Yoshida, K., Hernandez-Diaz, S., Solomon, D. H., Jackson, J. W., Gagne, J. J., Glynn, R. J., and Franklin, J. M. (2017). Matching Weights to Simultaneously Compare Three Treatment Groups: Comparison to Three-way Matching. Epidemiology, 28(3):387–395. [Yoshida et al., 2019] Yoshida, K., Solomon, D. H., Haneuse, S., Kim, S. C., Patorno, E., Tedeschi, S. K., Lyu, H., Franklin, J. M., Stürmer, T., Hernández-Díaz, S., and Glynn, R. J. (2019). Multinomial Extension of Propensity Score Trimming Methods: A Simulation Study. Am. J. Epidemiol., 188(3):609–616. [Zeng et al., 2019] Zeng, C., Dubreuil, M., LaRochelle, M. R., Lu, N., Wei, J., Choi, H. K., Lei, G., and Zhang, Y. (2019). Association of Tramadol With All-Cause Mortality Among Patients With Osteoarthritis. JAMA, 321(10):969–982. 7 / 7