QUANTIFYING THE IMPACT OF DIFFERENT
APPROACHES FOR HANDLING CONTINUOUS
PREDICTORS ON THE PERFORMANCE OF A
PROGNOSTIC MODEL
Gary Collins, Emmanuel Ogundimu, Jonathan Cook,
Yannick Le Manach, Doug Altman
Centre for Statistics in Medicine
University of Oxford
20-July-2016
gary.collins@csm.ox.ac.uk
Outline
 Existing guidance
 What’s done in practice?
 Brief overview of the study sample & simulation set-up
 Findings & Discussion
Basis of this presentation
Not a new idea…
It’s all in the title…(1994-2006)
1. Problems in dichotomizing continuous variables (Altman 1994)
2. Dangers of using "optimal" cutpoints in the evaluation of prognostic
factors. (Altman et al 1994)
3. How bad is categorization? (Weinberg; 1995)
4. Seven reasons why you should NOT categorize continuous data
(Dinero; 1996)
5. Breaking Up is Hard to Do: The Heartbreak of Dichotomizing
Continuous Data (Streiner; 2002)
6. Negative consequences of dichotomizing continuous predictor
variables (Irwin & McClelland; 2003)
7. Why carve up your continuous data? (Owen 2005)
8. Chopped liver? OK. Chopped data? Not OK. (Butts & Ng 2005)
9. Categorizing continuous variables resulted in different
predictors in a prognostic model for nonspecific neck pain
(Schellingerhout et al 2006)
It’s all in the title…(2006-2014)
10. Dichotomizing continuous predictors in multiple regression: a bad idea (Royston et al 2006)
11. The cost of dichotomising continuous variables (Altman & Royston; 2006)
12. Leave 'em alone - why continuous variables should be analyzed as such (van Walraven & Hart; 2008)
13. Dichotomization of continuous data - a pitfall in prognostic factor studies (Metze; 2008)
14. Analysis by categorizing or dichotomizing continuous variables is inadvisable: an example from the natural history of unruptured aneurysms (Naggara et al 2011)
15. Against quantiles: categorization of continuous variables in epidemiologic research, and its discontents (Bennette & Vickers; 2012)
16. Dichotomizing continuous variables in statistical analysis: a practice to avoid (Dawson & Weiss; 2012)
17. The danger of dichotomizing continuous variables: A visualization (Kuss 2013)
18. The “anathema” of arbitrary categorization of continuous predictors (Vintzileos et al; 2014)
19. Ophthalmic statistics note: the perils of dichotomising continuous variables (Cumberland et al 2014)
Prognostic factor (PF)
(Figure: risk plotted against a continuous prognostic factor, dichotomised at a cut-point into “PF not present (low risk)” and “PF present (high risk)”; the implied step in risk at the cut-point is biologically implausible.)
“Convoluted Reasoning and Anti-intellectual Pomposity”: “C.R.A.P.”
(Norman & Streiner; Biostatistics: the Bare Essentials, 2008)
Slide adapted from Michael Babyak (‘Modeling with Observational Data’)
Still, what happens in practice…?
 Breast cancer models (Altman 2009)
– Categorised some/all: 34/53 (64%)
 Diabetes models (Collins et al 2011)
– Categorised some/all: 21/43 (49%)
 General medical journals (Bouwmeester et al 2012)
– Categorised: 30/64 (47%)
– Dichotomised: 21/64 (33%)
 Cancer models (Mallett et al 2010)
– All categorised/dichotomised: 24/47 (51%)
Aim of the study
 Investigate the impact of different approaches for
handling continuous predictors on the
– apparent performance (same data)
– validation performance (different data; geographical validation)
 Investigate the influence sample size has on each
approach for handling continuous predictors
Sample characteristics (THIN)
– Development data: 80,800 CVD events; 7,721 hip fractures
– Validation data (geographically distinct): 4,688 CVD events; 565 hip fractures
Models
 Cox models to predict
– 10-year risk of CVD (men & women)
– 10-year risk of hip fracture (women only)
 CVD model contained 7 predictors
– Age, sex, family history, cholesterol, SBP, BMI, hypertension
 Hip fracture model contained 5 predictors
– Age, BMI, Townsend score, asthma, antidepressants
Resampling strategy
 MODEL DEVELOPMENT
– The number of events in each sample was fixed at
25, 50, 100, and 1000 events
– Samples were drawn from those with and without the event
(separately)
– 200 samples randomly drawn (with replacement)
 MODEL VALIDATION
– All available data were used
• CVD: n=110,934 (4688 CVD events)
• Hip fracture: n=61,563 (565 hip fractures)
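The resampling scheme above can be sketched in Python (the original analyses were done in R; the function name and the rule for scaling the number of non-events to the cohort's event fraction are illustrative assumptions, as the slide does not state how non-events were scaled):

```python
import numpy as np

rng = np.random.default_rng(2016)

def draw_development_sample(event_idx, nonevent_idx, n_events, event_fraction):
    """Draw one development sample with the number of events fixed,
    sampling events and non-events separately, both with replacement."""
    # Assumption: non-events scaled so the sample keeps the cohort's event fraction
    n_nonevents = int(round(n_events * (1 - event_fraction) / event_fraction))
    events = rng.choice(event_idx, size=n_events, replace=True)
    nonevents = rng.choice(nonevent_idx, size=n_nonevents, replace=True)
    return np.concatenate([events, nonevents])

# Toy cohort: 500 events among 10,500 subjects (event fraction ~4.8%)
event_idx = np.arange(500)
nonevent_idx = np.arange(500, 10_500)

# One development sample, here with 25 events
sample = draw_development_sample(event_idx, nonevent_idx,
                                 n_events=25, event_fraction=500 / 10_500)
```

In the study this draw was repeated 200 times at each of the four event counts, and every fitted model was then validated once on the full held-out data.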
Approaches considered
 Dichotomised at the
– Median predictor value
– ‘optimal’ cut-point based on the logrank test
 Categorised into
– 3 groups (using tertile predictor values)
– 4 groups (using quartile predictor values)
– 5 groups (using quintile predictor values)
– 5-year age categories
– 10-year age categories
 Linear relationship
 Nonlinear relationship
– fractional polynomials (FP2; 4 degrees of freedom per predictor)
– restricted cubic splines (3 knots)
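A rough Python sketch of three of the codings compared above: dichotomising at the median, quartile grouping, and a 3-knot restricted cubic spline (the study itself used R; the spline below follows Harrell's truncated power basis, and the 10th/50th/90th-percentile knot placement is an assumed default, not stated on the slide):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(60, 10, size=1_000)          # a continuous predictor, e.g. age

# Dichotomised at the median
x_dich = (x > np.median(x)).astype(int)

# Categorised into 4 groups at the quartiles
quartiles = np.quantile(x, [0.25, 0.5, 0.75])
x_cat4 = np.digitize(x, quartiles)          # group codes 0..3

# Restricted cubic spline with 3 knots: x plus one nonlinear column,
# constrained to be linear beyond the outer knots
def rcs_basis(x, knots):
    t1, t2, t3 = knots
    p = lambda u: np.clip(u, 0.0, None) ** 3            # (u)_+^3
    s = (p(x - t1)
         - p(x - t2) * (t3 - t1) / (t3 - t2)
         + p(x - t3) * (t2 - t1) / (t3 - t2)) / (t3 - t1) ** 2
    return np.column_stack([x, s])

knots = np.quantile(x, [0.10, 0.50, 0.90])
X_rcs = rcs_basis(x, knots)                 # 1000 x 2 design columns
```

Each coding would then enter the Cox model in place of the raw predictor; the dichotomised and grouped versions throw away all within-group risk variation, which is what the simulations quantify.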
Performance measures calculated
 Calibration
– Calibration plot
– Harrell’s “val.surv” function; hazard regression with linear
splines
 Discrimination
– Harrell’s c-index
 Clinical utility
– Decision curve analysis (Vickers & Elkin 2006)
– Net benefit;
• weighted difference between true positives and false positives
 D-statistic, Brier score, and R-squared were also examined
– Not reported here; see the supplementary material of
Collins et al, Stat Med 2016
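Harrell's c-index, used for discrimination above, can be sketched in Python as a naive O(n²) pass over pairs (a minimal illustrative version: ties in follow-up time are simply skipped, and censored subjects contribute only as the longer-surviving member of a pair; Harrell's implementations in R handle ties and weighting more carefully):

```python
import numpy as np

def harrell_c(time, event, risk):
    """Harrell's c-index for right-censored survival data.
    A pair is usable when the subject with the shorter follow-up
    had the event; it is concordant when that subject also has
    the higher predicted risk (ties in risk count 1/2)."""
    conc = usable = 0.0
    n = len(time)
    for i in range(n):
        if event[i] != 1:
            continue                      # censored: cannot anchor a pair
        for j in range(n):
            if time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    conc += 1.0
                elif risk[i] == risk[j]:
                    conc += 0.5
    return conc / usable

time  = np.array([2.0, 4.0, 6.0, 8.0])
event = np.array([1, 1, 0, 1])            # subject 3 censored at t=6
risk  = np.array([0.9, 0.7, 0.5, 0.2])    # risk ordering matches event times
c = harrell_c(time, event, risk)          # 1.0: every usable pair is concordant
```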
Net benefit (recap)
 pt is the probability threshold used to denote ‘high
risk’
– Used to weight the FP and FN counts
 TP and FP calculated using Kaplan-Meier
estimates of the percentage surviving at 10
years among those with predicted risks
greater than pt
 Bottom line: the model with the highest NB ‘wins’
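The definition above, in a minimal Python sketch for a binary outcome (an illustrative simplification: the talk's version instead uses Kaplan-Meier estimates of 10-year survival among those flagged high risk, which handles censoring; that refinement is omitted here):

```python
import numpy as np

def net_benefit(y, risk, pt):
    """Net benefit at threshold pt:
    NB = TP/n - FP/n * pt/(1 - pt),
    so false positives are down-weighted by the odds of the threshold."""
    flagged = risk >= pt                  # treated as 'high risk'
    n = len(y)
    tp = np.sum(flagged & (y == 1))
    fp = np.sum(flagged & (y == 0))
    return tp / n - (fp / n) * pt / (1.0 - pt)

y    = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])        # 20% event rate
risk = np.array([0.8, 0.6, 0.3, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
nb = net_benefit(y, risk, pt=0.2)        # ~0.175 (2 TP, 1 FP at pt = 0.2)
```

Plotting NB against a range of pt gives the decision curve; the model whose curve is highest at clinically sensible thresholds ‘wins’.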
Age & CVD
Total serum cholesterol & CVD
Age, cholesterol, BMI, SBP & CVD
Age, BMI & Hip fracture
RESULTS: CVD 25 events
RESULTS: CVD 50 events
RESULTS: CVD 100 events
RESULTS: CVD 1000 events
RESULTS: Hip fracture 25 events
RESULTS: Hip fracture 50 events
RESULTS: Hip fracture 100 events
RESULTS: Hip fracture 1000 events
RESULTS: Discrimination CVD
 At small sample sizes (25 events)
– Large difference between apparent performance and
validation performance for ‘optimal’ dichotomisation
• 0.84 (apparent); 0.72 (validation)
– Smaller differences observed for FP/RCS/Linear
• 0.84 (apparent); 0.78 (validation)
 Observed difference between dichotomisation (at
the median) and linear/FP/RCS
– Apparent performance: difference of 0.05
– Validation performance: difference of 0.05
– Observed over all 4 sample sizes examined
 Negligible differences between linear/FP/RCS
RESULTS: Discrimination Hip Fracture
 At small sample sizes (25 events)
– Large difference between apparent performance and
validation performance for ‘optimal’ dichotomisation
• 0.86 (apparent); 0.76 (validation)
– FP/RCS/Linear
• 0.90 (apparent); 0.87 (validation)
 Observed difference between dichotomisation (at
the median) and linear/FP/RCS
– Apparent performance: difference of 0.1
– Validation performance: difference of 0.1
– Observed over all 4 sample sizes examined
 Negligible differences between linear/FP/RCS
RESULTS: Discrimination Hip Fracture (figure)
RESULTS: Decision Curve Analysis (CVD only) [higher NB = better model]
(Figure: decision curves; the FP/RCS models lie above the dichotomised models.)
RESULTS: Net cases found per 1000
Conclusions
 Systematic reviews show that dichotomising or
categorising continuous predictors is routinely done
when developing a prediction model
 Dichotomising, either at the median or at the ‘optimal’
predictor value, leads to models with substantially
poorer performance
– Poor discrimination; poor calibration; poor clinical utility
 Large discrepancies between apparent performance
and validation performance were observed for ‘optimal’
split dichotomising
 The impact of how continuous predictors are
handled is more pronounced at smaller sample
sizes
