2. Why do we need multivariable analysis?
Reminder of confounders and adjusting for confounders
The idea of a risk prediction (or risk factor) model
Two of the main regression models: linear and logistic
Strategies of adjusting for confounders
Strategies of building a risk prediction model
Opportunities to see how this is used in Sports Medicine publications
and examples
Outline of Learning Objectives
3. Scene setting example
Researchers looked at the association
between active commuting and obesity
(BMI) in mid-life
Men who walk or cycle to work have a BMI (body mass index) 1.4 kg/m² lower than those who drive
For women the difference was 0.9 kg/m²
Are there any alternative explanations for this association?
4. Active commuting → BMI
Confounders?
A confounder is:
• associated with the exposure of interest
• independently associated with the disease outcome
Age? Healthy lifestyle?
Confounding… the issue of observational data
5. Multivariable analysis allows us to adjust for confounders
Analysis depends on the type of outcome variable:
Continuous outcome: multiple linear regression
Binary outcome: multiple logistic regression
Survival time outcome: Cox regression
Multivariable (multivariate) analysis
6. Remember the simple linear regression model:
y = β0 + β1x1
where y = continuous outcome, x1 = exposure variable
β0 = constant (intercept)
β1 = regression coefficient (slope)
NB: if x1 is continuous or binary, β1 is the amount y increases for a unit increase in x1
BMI = β0 + β1 × active commuting (yes = 1 versus no = 0)
Extend this model quite naturally:
y = β0 + β1x1 + β2x2 + …
BMI = β0 + β1 × active commuting + β2 × age
β1 is the effect of active commuting adjusted for age
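To see what adjustment does to β1, here is a minimal Python sketch on a small made-up dataset. The six ages and BMI values are invented for illustration and are not taken from the study: BMI is constructed as 25 − 1.0 × active commuting + 0.1 × age, and the commuters are deliberately younger, so age confounds the comparison. The helper names `solve` and `ols` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X)β = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented data: active commuters (1) are younger than non-commuters (0).
commute = [1, 1, 1, 0, 0, 0]
age = [30, 35, 40, 40, 45, 50]
bmi = [25 - 1.0 * c + 0.1 * a for c, a in zip(commute, age)]

# Unadjusted model: BMI = β0 + β1 × active commuting
unadjusted = ols([[1, c] for c in commute], bmi)
# Adjusted model: BMI = β0 + β1 × active commuting + β2 × age
adjusted = ols([[1, c, a] for c, a in zip(commute, age)], bmi)

print("unadjusted β1:", round(unadjusted[1], 3))
print("adjusted β1:", round(adjusted[1], 3))
```

In this invented example the unadjusted β1 is about −2.0, while the age-adjusted β1 recovers the −1.0 used to build the data: half of the crude difference was due to the commuters being younger.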
7. Effect of active commuting with adjustment for confounders
After adjustment for age and other confounders, men who use active
transport weigh 0.97 kg less than men who don't
8. Multiple linear regression in SPSS
Other variables can be added to the independent box
(note that for this option in SPSS they must be
continuous or binary)
9. To predict blood pressure given waist-to-hip ratio (whr) and age:
Blood pressure = 70.748 + 30.725 × whr + 0.594 × age
Blood pressure increases by 30.725 mmHg per 1-unit increase in
whr, after adjusting for age
The effect of whr remains significant (p < 0.001) after adjustment, so it is
independent of age
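The fitted blood pressure equation can be used directly for prediction. The sketch below plugs in a hypothetical person (whr = 0.9, age = 50, values chosen only for illustration) and shows that comparing two people of the same age isolates the whr coefficient. The function name `predict_blood_pressure` is an assumption.

```python
def predict_blood_pressure(whr, age):
    """Predicted blood pressure (mmHg) from the fitted equation on the slide."""
    return 70.748 + 30.725 * whr + 0.594 * age


# Hypothetical person, chosen only for illustration.
bp = predict_blood_pressure(0.9, 50)

# Same age, whr higher by 0.1: the difference is 0.1 × 30.725, whatever the age.
difference = predict_blood_pressure(1.0, 50) - predict_blood_pressure(0.9, 50)

print(round(bp, 1), round(difference, 2))
```

Since whr typically varies over a narrow range, a 0.1-unit step (about 3.07 mmHg here) is often easier to interpret than the full 1-unit coefficient.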
11. Odds & Odds Ratios
Association between exposure to parental tobacco smoke and wheezing in children:
             wheeze        no wheeze       Total
exposed      61 (4.5%)     1282 (95.5%)    1343
unexposed    185 (3.2%)    5627 (96.8%)    5812
Total        246           6909            7155
Odds of wheeze in exposed = 0.045 / 0.955 = 0.048 (using unrounded proportions)
Odds of wheeze in unexposed = 0.032 / 0.968 = 0.033
OR = 0.048 / 0.033 = 1.45
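The odds and odds ratio for the parental-smoking and wheeze table can be recomputed from the raw counts. This is a minimal sketch; the variable names are assumptions, and it uses the fact that odds = p / (1 − p) equals cases divided by non-cases.

```python
# Counts from the cross-tabulation of parental smoking and wheeze.
exposed_wheeze, exposed_no_wheeze = 61, 1282
unexposed_wheeze, unexposed_no_wheeze = 185, 5627

# Odds = number with the outcome / number without the outcome.
odds_exposed = exposed_wheeze / exposed_no_wheeze
odds_unexposed = unexposed_wheeze / unexposed_no_wheeze

# Odds ratio compares the odds in the exposed with the odds in the unexposed.
odds_ratio = odds_exposed / odds_unexposed

print(round(odds_exposed, 3), round(odds_unexposed, 3), round(odds_ratio, 2))
```

Working from counts avoids the small rounding differences that arise when the odds are computed from percentages rounded to one decimal place.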
12. In the logistic regression model, we model log odds:
log odds = β0 + β1x
where log odds = log odds of outcome (or disease)
x = binary exposure variable
β0 = constant (log odds of outcome in unexposed group)
β1 = measure of effect (log odds ratio)
13. Does the risk of wheeze differ between those exposed and
unexposed to parental tobacco smoke?
i.e. we want to estimate the odds ratio for wheeze comparing exposed
with unexposed, and obtain a p-value
Outcome:
Must be coded 1 for event (+ve outcome), 0 for no event
no wheeze = 0
wheeze = 1
Exposure:
parents don’t smoke = 0 (unexposed/baseline group)
at least one parent smokes = 1
Example in SPSS: one binary exposure
15. Click on options and tick
‘CI for exp(β)’
Move outcome into ‘dependent’
box and binary exposure into
‘covariates’ box
16. Variables in the Equation
                                                                  95.0% C.I. for Exp(B)
                       B       S.E.   Wald       df   Sig.   Exp(B)   Lower   Upper
Step 1(a)  anysmoke    .370    .151   6.005      1    .014   1.447    1.077   1.945
           Constant    -3.415  .075   2088.807   1    .000   .033
a. Variable(s) entered on step 1: anysmoke.
Reading the output:
• B for anysmoke: β1, i.e. log OR
• B for Constant: β0, i.e. log odds of wheeze in unexposed
• Exp(B) for anysmoke: odds ratio of wheeze for exposed vs unexposed
• Exp(B) for Constant: odds of wheeze in unexposed
• Lower and Upper: 95% CI for OR
• Sig. for anysmoke: p-value for smoking effect (Wald test)
As there are no other explanatory variables (exposures) in the model, the odds ratio above is
the ‘unadjusted OR’ and is the same as that computed from the cross-tab.
The p-value above will be very similar to that obtained from the chi-squared test earlier
SPSS output and interpretation
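The columns of the SPSS output are linked by simple arithmetic, which the following Python sketch reproduces from the printed B and S.E. values. It assumes the usual Wald construction (95% CI = exp(B ± 1.96 × S.E.), Wald = (B / S.E.)²); because the inputs are rounded to three decimals, the results agree with SPSS only to about two decimals. The function name `probability` is hypothetical.

```python
import math

# Values as printed in the SPSS output (rounded to 3 decimal places).
b0 = -3.415               # Constant: log odds of wheeze in the unexposed
b1, se_b1 = 0.370, 0.151  # anysmoke: log odds ratio and its standard error

odds_ratio = math.exp(b1)               # Exp(B)
ci_lower = math.exp(b1 - 1.96 * se_b1)  # lower 95% limit for the OR
ci_upper = math.exp(b1 + 1.96 * se_b1)  # upper 95% limit for the OR
wald = (b1 / se_b1) ** 2                # Wald chi-squared statistic, 1 df


def probability(log_odds):
    """Convert log odds to a probability."""
    return 1 / (1 + math.exp(-log_odds))


p_unexposed = probability(b0)      # x = 0
p_exposed = probability(b0 + b1)   # x = 1

print(round(odds_ratio, 2), round(ci_lower, 2), round(ci_upper, 2), round(wald, 1))
print(round(p_unexposed, 3), round(p_exposed, 3))
```

The two predicted probabilities (about 0.032 and 0.045) match the 3.2% and 4.5% of children with wheeze in the cross-tabulation, which is expected for a model with a single binary exposure.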
17. Site has been put in the model as well,
and the simple (first) contrast has been
chosen for each variable, so SPSS gives
odds ratios for levels of site and
smoking relative to the lowest
level.
Multiple logistic regression:
other “confounders” can be
added to the model for
“adjustment”, e.g. site
18. Deciding on confounding
Variables in the Equation
                                                                     95.0% C.I. for Exp(B)
                          B       S.E.   Wald       df   Sig.   Exp(B)   Lower   Upper
Step 1(a)  anysmoke(1)    .560    .154   13.174     1    .000   1.751    1.294   2.369
           site(1)        -.894   .155   33.170     1    .000   .409     .302    .555
           Constant       -3.339  .082   1639.188   1    .000   .035
a. Variable(s) entered on step 1: anysmoke, site.
Interpretation:
After adjusting for site,
• the OR for smoking has increased from 1.45 (unadjusted) to 1.75
suggesting site is a confounder in this relation
• We sometimes use a 10% change in the effect of interest to
decide if there is important confounding.
• In this case, the change is more than 10% so site is an important
confounder. We should adjust for it.
OR and 95% CI for being exposed to smoke vs unexposed, adjusted for site
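The 10% change-in-estimate check can be written out as a short calculation. This sketch uses the two Exp(B) values for smoking from the SPSS outputs; computing the change on the odds ratio scale (rather than on the log odds scale) is an assumption, since the slides do not say which scale is meant. The function name `percent_change` is hypothetical.

```python
def percent_change(unadjusted, adjusted):
    """Percentage change in an effect estimate after adjustment."""
    return 100 * (adjusted - unadjusted) / unadjusted


or_unadjusted = 1.447  # Exp(B) for anysmoke with smoking alone in the model
or_adjusted = 1.751    # Exp(B) for anysmoke(1) after adjusting for site

change = percent_change(or_unadjusted, or_adjusted)
important_confounding = abs(change) > 10  # the 10% rule of thumb

print(round(change, 1), important_confounding)
```

The change is about 21%, well above the 10% threshold, which is consistent with the conclusion that site should be adjusted for.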
19. Modelling strategies
Strategy depends on study aim:
Strategy 1:
If study aim is to describe a single exposure-
outcome relation, adjusted for potential
confounding variables
Strategy 2:
If study aim is to establish which of a list of
exposure variables are associated with
outcome, ie to find the 'best' model to predict
outcome
20. STRATEGY 1
Examples:
Study to investigate whether physical activity
reduces blood pressure
Study of the relation between physical inactivity
and depression.
i.e. one exposure, one outcome
21. Want to get an ‘adjusted’ regression
coefficient, and associated CI and p-value, for
exposure of interest
1. Start by looking at unadjusted regression
coefficient for exposure of interest
2. Add all a priori confounders (age, sex, …) to
the model. What is the regression coefficient for
the exposure of interest now?
Strategy 1: Steps to follow
22. Strategy 1: Steps to follow (continued)
3. Does adding any other potential confounder
to the model make a difference to the
magnitude of the regression coefficient of
interest (>10% change)?
NO - present coefficient from step 2,
adjusted for a priori confounders only.
YES - include confounder in model and
present coefficient for exposure of interest
arising from this model.
23. Example: What are the risk factors for predicting
inactivity?
i.e. multiple exposures, one outcome
Want to find the ‘best’ model:
best at predicting dependent variable
best at explaining variation in dependent
variable
Should aim to have as simple a model as possible
STRATEGY 2
25. Strategy 2: Model building
No perfect answer
Formal selection procedures (forward, backward
and stepwise), particularly the automatic
procedures in SPSS, should be used with
caution
• Can lead to very different results depending on
procedure used
• Avoid thinking about the specific research question
and can hence lead to models that include
‘implausible’ variables and exclude ‘known risk
factors’
26. Strategy 2: Model building
1. Use a systematic approach where you think about the
particular problem
2. If the literature points to any established risk factors,
these should probably be included.
3. Look at the univariate (unadjusted) significance of each
predictor
4. Fit the significant ones together into a multivariate model
5. Remove the non-significant ones
6. Try adding back any that are omitted.
7. And repeat …!
27. 1. Don’t include any explanatory variable whose
relation with the outcome is implausible
2. Don’t include two explanatory variables that are
closely related (collinear), eg both Townsend and
Carstairs social deprivation score
3. Don’t try modelling large numbers of variables to
small datasets. There should be at least 10 times
as many observations as variables in the model
4. If you are unsure which of a number of models is
best, see how alternative models affect the
conclusions.
General principles of model building