Multivariable_Regression_Dec_2025 about reg

Multivariable regression analysis

Outline of Learning Objectives
• Why do we need multivariable analysis?
• Reminder of confounders and adjusting for confounders
• The idea of a risk prediction (or risk factor) model
• Two of the main regression models: linear and logistic
• Strategies for adjusting for confounders
• Strategies for building a risk prediction model
• Opportunities to see how this is used in Sports Medicine publications and examples

Scene setting example
Researchers looked at the association between active commuting and obesity (BMI, body mass index) in mid-life.
Men who walk or cycle to work have a BMI 1.4 kg/m² lower than those who drive.
For women the difference was 0.9 kg/m².
Are there any alternative explanations for this association?

Confounding… the issue of observational data
active commuting → BMI
Confounders? Age? Healthy lifestyle?
A confounder is:
• associated with the exposure of interest
• independently associated with the disease outcome

Multivariable (multivariate) analysis
Multivariable analysis allows us to adjust for confounders.
The analysis depends on the type of outcome variable:
• Continuous → multiple linear regression
• Binary → multiple logistic regression
• Survival time → Cox regression

Remember the simple linear regression model:
y = β0 + β1x1
where y = continuous outcome, x1 = exposure variable
β0 = constant (intercept)
β1 = regression coefficient (slope)
NB: if x1 is continuous or binary, β1 is the amount y increases for a unit increase in x1
BMI = β0 + β1 × active commuting (yes = 1 versus no = 0)
Extend this model quite naturally:
y = β0 + β1x1 + β2x2 + …
BMI = β0 + β1 × active commuting + β2 × age
β1 is the effect of active commuting adjusted for age
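To make "adjusted for age" concrete, here is a minimal sketch that fits both models by least squares. The data are synthetic and purely illustrative (not the study's data): the "true" effects, the sample size and the way age influences commuting are all assumptions chosen for the demonstration.

```python
import numpy as np

# Synthetic, illustrative data only (assumed effects, not study results).
rng = np.random.default_rng(0)
n = 20000
age = rng.uniform(40, 70, n)
# Assumption: older people are less likely to commute actively,
# so age is associated with the exposure.
p_active = 1 / (1 + np.exp(0.1 * (age - 55)))
active = (rng.uniform(size=n) < p_active).astype(float)
# Assumption: true effect of active commuting is -1.0 kg/m2,
# and BMI rises by 0.1 kg/m2 per year of age.
bmi = 22.0 - 1.0 * active + 0.1 * age + rng.normal(0, 3, n)

def ols(y, *columns):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + ...; returns [b0, b1, ...]."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b_unadj = ols(bmi, active)       # BMI = b0 + b1 * active
b_adj = ols(bmi, active, age)    # BMI = b0 + b1 * active + b2 * age

print("unadjusted effect of active commuting:", round(b_unadj[1], 2))
print("age-adjusted effect of active commuting:", round(b_adj[1], 2))
```

In this made-up setting the unadjusted coefficient exaggerates the benefit, because active commuters are also younger; adding age to the model moves β1 back towards the assumed true value.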

Effect of active commuting with adjustment for confounders
After adjustment for age and other factors, men who use active transport weigh 0.97 kg less than men who don't.

Multiple linear regression in SPSS
Other variables can be added to the ‘Independent(s)’ box (note that for this option in SPSS they must be continuous or binary).

To predict blood pressure given waist-to-hip ratio (WHR) and age:
Blood pressure = 70.748 + 30.725 × WHR + 0.594 × age
Blood pressure increases by 30.725 mmHg per 1-unit increase in WHR, after adjusting for age.
The effect of WHR remains significant (p < 0.001) after adjustment, so it is independent of age.
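The fitted equation can be used directly for prediction. The sketch below simply codes the equation from the slide; the example person (WHR 0.9, age 50) is hypothetical. Because a full 1-unit change in WHR is far larger than anything seen in practice, the second calculation shows the effect of a 0.1-unit change instead.

```python
def predict_bp(whr: float, age: float) -> float:
    """Predicted blood pressure (mmHg) from the fitted equation on the slide."""
    return 70.748 + 30.725 * whr + 0.594 * age

# Hypothetical person: WHR 0.9, aged 50
predicted = predict_bp(0.9, 50)
print("predicted blood pressure:", round(predicted, 1), "mmHg")

# Effect of a 0.1-unit higher WHR at the same age (one tenth of the coefficient)
difference = predict_bp(1.0, 50) - predict_bp(0.9, 50)
print("difference for +0.1 WHR:", round(difference, 2), "mmHg")
```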

Logistic regression
For binary outcomes …
Based on odds ratios …

Odds & Odds Ratios
Association between exposure to parental tobacco smoke and wheezing in children:
             wheeze        no wheeze       Total
exposed      61 (4.5%)     1282 (95.5%)    1343
unexposed    185 (3.2%)    5627 (96.8%)    5812
Total        246           6909            7155
Odds of wheeze in exposed = 61 / 1282 = 0.048 (i.e. 4.5% / 95.5%)
Odds of wheeze in unexposed = 185 / 5627 = 0.033 (i.e. 3.2% / 96.8%)
OR = 0.048 / 0.033 = 1.45
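The same arithmetic can be checked in a few lines. This is a minimal sketch using the counts from the table; the variable and function names are illustrative.

```python
# Counts from the 2x2 table (parental smoking vs wheeze)
exposed_wheeze, exposed_no_wheeze = 61, 1282
unexposed_wheeze, unexposed_no_wheeze = 185, 5627

def odds(events: int, non_events: int) -> float:
    """Odds = number with the outcome / number without the outcome."""
    return events / non_events

odds_exposed = odds(exposed_wheeze, exposed_no_wheeze)
odds_unexposed = odds(unexposed_wheeze, unexposed_no_wheeze)
odds_ratio = odds_exposed / odds_unexposed

print("odds in exposed:  ", round(odds_exposed, 3))
print("odds in unexposed:", round(odds_unexposed, 3))
print("odds ratio:       ", round(odds_ratio, 2))
```

Working from the counts rather than the rounded percentages avoids small rounding differences in the third decimal place.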

In the logistic regression model, we model log odds:
log odds = β0 + β1x
where log odds = log odds of outcome (or disease)
x = binary exposure variable
β0 = constant (log odds of outcome in unexposed group)
β1 = measure of effect (log odds ratio)
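As a rough illustration of how the log-odds scale relates back to risk, the sketch below takes β0 and β1 from the wheeze counts (for a single binary exposure these are just the log odds in the unexposed and the log odds ratio) and converts the model's log odds back to probabilities. The helper names are illustrative.

```python
import math

# For one binary exposure, the coefficients follow directly from the 2x2 counts
beta0 = math.log(185 / 5627)                    # log odds of wheeze in unexposed
beta1 = math.log((61 / 1282) / (185 / 5627))    # log odds ratio

def log_odds(x: int) -> float:
    """log odds = beta0 + beta1 * x, with x = 0 (unexposed) or 1 (exposed)."""
    return beta0 + beta1 * x

def probability(log_odds_value: float) -> float:
    """Convert log odds back to a probability (inverse logit)."""
    return 1 / (1 + math.exp(-log_odds_value))

p_unexposed = probability(log_odds(0))
p_exposed = probability(log_odds(1))

print("beta0:", round(beta0, 3), " beta1:", round(beta1, 3))
print("risk of wheeze, unexposed:", round(100 * p_unexposed, 1), "%")
print("risk of wheeze, exposed:  ", round(100 * p_exposed, 1), "%")
```

The back-transformed risks match the row percentages in the cross-tab, which is a useful sanity check when reading logistic regression output.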

Example in SPSS: one binary exposure
Does the risk of wheeze differ between those exposed and unexposed to parental tobacco smoke?
i.e. we want to estimate the odds ratio for wheeze comparing exposed with unexposed, and obtain a p-value.
Outcome (must be coded 1 for event (+ve outcome), 0 for no event):
no wheeze = 0
wheeze = 1
Exposure:
parents don’t smoke = 0 (unexposed/baseline group)
at least one parent smokes = 1

Move the outcome into the ‘dependent’ box and the binary exposure into the ‘covariates’ box.
Click on ‘Options’ and tick ‘CI for exp(β)’.

SPSS output and interpretation
Variables in the Equation
                                                                  95.0% C.I. for EXP(B)
                       B        S.E.    Wald       df   Sig.   Exp(B)   Lower    Upper
Step 1(a)  anysmoke    .370     .151    6.005      1    .014   1.447    1.077    1.945
           Constant    -3.415   .075    2088.807   1    .000   .033
a. Variable(s) entered on step 1: anysmoke.
Reading the output:
• B for anysmoke: β1, i.e. log OR
• B for Constant: β0, i.e. log odds of wheeze in unexposed
• Exp(B) for anysmoke: odds ratio of wheeze for exposed vs unexposed, with its 95% CI
• Exp(B) for Constant: odds of wheeze in unexposed
• Sig. for anysmoke: p-value for smoking effect (Wald test)
As there are no other explanatory variables (exposures) in the model, the odds ratio above is the ‘unadjusted OR’ and is the same as that computed from the cross-tab.
The p-value above will be very similar to that obtained from the chi-squared test earlier.
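For a model with a single binary exposure, the numbers in this output can be reproduced by hand from the 2x2 counts. The sketch below is one way to do that, assuming the usual large-sample formula for the standard error of a log odds ratio (square root of the sum of reciprocal cell counts) and a normal approximation for the p-value; it is a check on the output, not a description of how SPSS computes it internally.

```python
import math

a, b = 61, 1282      # exposed: wheeze, no wheeze
c, d = 185, 5627     # unexposed: wheeze, no wheeze

B = math.log((a / b) / (c / d))              # log odds ratio (B for anysmoke)
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of B
wald = (B / se) ** 2                         # Wald chi-squared statistic, 1 df
# Two-sided p-value: P(chi-squared with 1 df > wald) = P(|Z| > |B / se|)
p_value = math.erfc(abs(B / se) / math.sqrt(2))

odds_ratio = math.exp(B)
ci_lower = math.exp(B - 1.96 * se)
ci_upper = math.exp(B + 1.96 * se)

print("B =", round(B, 3), " S.E. =", round(se, 3), " Wald =", round(wald, 3))
print("Sig. =", round(p_value, 3))
print("Exp(B) =", round(odds_ratio, 3),
      " 95% CI:", round(ci_lower, 3), "to", round(ci_upper, 3))
```

The confidence interval is calculated on the log scale and then exponentiated, which is why it is not symmetric around the odds ratio.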

Multiple logistic regression: other “confounders” can be added to the model for “adjustment”, e.g. site.
Site has been put in the model as well, and the simple (first) contrast has been chosen for each variable, so SPSS gives odds ratios for the levels of site and smoking relative to the lowest level.

Deciding on confounding
Variables in the Equation
                                                                     95.0% C.I. for EXP(B)
                          B        S.E.    Wald       df   Sig.   Exp(B)   Lower    Upper
Step 1(a)  anysmoke(1)    .560     .154    13.174     1    .000   1.751    1.294    2.369
           site(1)        -.894    .155    33.170     1    .000   .409     .302     .555
           Constant       -3.339   .082    1639.188   1    .000   .035
a. Variable(s) entered on step 1: anysmoke, site.
The anysmoke(1) row gives the OR and 95% CI for being exposed to smoke vs unexposed, adjusted for site.
Interpretation:
After adjusting for site,
• the OR for smoking has increased from 1.45 (unadjusted) to 1.75, suggesting site is a confounder in this relation
• We sometimes use a 10% change in the effect of interest to decide if there is important confounding.
• In this case, the change is about 21% ((1.75 - 1.45) / 1.45), which is more than 10%, so site is an important confounder. We should adjust for it.
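The 10% change-in-estimate rule is simple to apply in code. A minimal sketch, using the unadjusted and site-adjusted odds ratios from the two SPSS outputs; the function name and the choice to measure the change relative to the unadjusted estimate are assumptions for illustration.

```python
def percent_change(unadjusted: float, adjusted: float) -> float:
    """Percentage change in the effect estimate after adjustment."""
    return 100 * (adjusted - unadjusted) / unadjusted

or_unadjusted = 1.447   # Exp(B) for anysmoke, model without site
or_adjusted = 1.751     # Exp(B) for anysmoke, model with site

change = percent_change(or_unadjusted, or_adjusted)
important_confounding = abs(change) > 10

print("change in OR after adjusting for site:", round(change, 1), "%")
print("important confounding (10% rule):", important_confounding)
```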

Modelling strategies
Strategy depends on study aim:
Strategy 1: the study aim is to describe a single exposure-outcome relation, adjusted for potential confounding variables.
Strategy 2: the study aim is to establish which of a list of exposure variables are associated with the outcome, i.e. to find the 'best' model to predict the outcome.

STRATEGY 1
Examples:
• Study to investigate whether physical activity reduces blood pressure
• Study of the relation between physical inactivity and depression
i.e. one exposure, one outcome

Strategy 1: Steps to follow
Want to get an ‘adjusted’ regression coefficient, and associated CI and p-value, for the exposure of interest.
1. Start by looking at the unadjusted regression coefficient for the exposure of interest.
2. Add all a priori confounders (age, sex, …) to the model. What is the regression coefficient for the exposure of interest now?

Strategy 1: Steps to follow (continued)
3. Does adding any other potential confounder
to the model make a difference to the
magnitude of the regression coefficient of
interest (>10% change)?
NO - present coefficient from step 2,
adjusted for a priori confounders only.
YES - include confounder in model and
present coefficient for exposure of interest
arising from this model.
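The three steps of Strategy 1 can be sketched as a small routine. This is one possible reading, not a definitive implementation: it uses linear regression, tests candidate confounders one at a time in the order given, and compares each against the current model. The data, variable names (`activity`, `smoker`, `shoe_size`) and effect sizes are synthetic assumptions.

```python
import numpy as np

def exposure_coef(y, exposure, covariates):
    """Coefficient for the exposure from a least-squares fit with the given covariates."""
    X = np.column_stack([np.ones(len(y)), exposure, *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def strategy_one(y, exposure, a_priori, candidates, threshold=0.10):
    """Return (unadjusted coef, final adjusted coef, names of variables adjusted for)."""
    unadjusted = exposure_coef(y, exposure, [])                     # step 1
    kept = dict(a_priori)
    current = exposure_coef(y, exposure, list(kept.values()))       # step 2
    for name, values in candidates.items():                         # step 3
        trial = exposure_coef(y, exposure, list(kept.values()) + [values])
        if abs(trial - current) / abs(current) > threshold:         # >10% change
            kept[name] = values
            current = trial
    return unadjusted, current, list(kept)

# Synthetic, illustrative data (assumed effects, not real results)
rng = np.random.default_rng(1)
n = 20000
age = rng.normal(50, 10, n)
smoker = (rng.uniform(size=n) < 0.3).astype(float)
shoe_size = rng.normal(42, 2, n)                  # unrelated to everything
activity = 5 - 0.05 * age - 1.0 * smoker + rng.normal(0, 1, n)
bp = 120 - 2.0 * activity + 0.5 * age + 4.0 * smoker + rng.normal(0, 8, n)

unadjusted, adjusted, adjusted_for = strategy_one(
    bp, activity,
    a_priori={"age": age},
    candidates={"smoker": smoker, "shoe_size": shoe_size},
)
print("unadjusted:", round(unadjusted, 2))
print("adjusted:", round(adjusted, 2), "adjusted for:", adjusted_for)
```

In this made-up example smoking changes the activity coefficient by more than 10% and is kept, while the unrelated variable is not.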

STRATEGY 2
Example: What are the risk factors for predicting inactivity?
i.e. multiple exposures, one outcome
Want to find the ‘best’ model:
• best at predicting the dependent variable
• best at explaining variation in the dependent variable
Should aim to have as simple a model as possible.

Multiple logistic regression: used to look at predictors of physical inactivity

Strategy 2: Model building
• No perfect answer
• Formal selection procedures (forward, backward and stepwise), particularly the automatic procedures in SPSS, should be used with caution:
  - They can lead to very different results depending on the procedure used
  - They avoid thinking about the specific research question and can hence lead to models that include ‘implausible’ variables and exclude ‘known risk factors’

Strategy 2: Model building
1. Use a systematic approach where you think about the
particular problem
2. If the literature points to any established risk factors,
these should probably be included.
3. Look at the univariate (unadjusted) significance of each
predictor
4. Fit the significant ones together into a multivariate model
5. Remove the non-significant ones
6. Try adding back any that are omitted.
7. And repeat …..!

General principles of model building
1. Don’t include any explanatory variable whose relation with the outcome is implausible.
2. Don’t include two explanatory variables that are closely related (collinear), e.g. both Townsend and Carstairs social deprivation scores.
3. Don’t try fitting large numbers of variables to small datasets. There should be at least 10 times as many observations as variables in the model.
4. If you are unsure which of a number of models is best, see how alternative models affect the conclusions.
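Two of these principles (collinearity and the 10-observations-per-variable rule) lend themselves to quick automated checks before fitting. The sketch below is illustrative only: the correlation cut-off of 0.8 is an assumed threshold rather than a standard, and the two deprivation scores are synthetic stand-ins.

```python
import numpy as np

def enough_observations(n_observations: int, n_variables: int, ratio: int = 10) -> bool:
    """Rule of thumb: at least `ratio` observations per variable in the model."""
    return n_observations >= ratio * n_variables

def collinear_pairs(columns: dict, threshold: float = 0.8) -> list:
    """Pairs of variables whose absolute correlation exceeds the (assumed) threshold."""
    names = list(columns)
    corr = np.corrcoef(np.column_stack([columns[k] for k in names]), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j]))
    return pairs

# Synthetic example: two deprivation scores measuring nearly the same thing
rng = np.random.default_rng(2)
n = 500
deprivation_a = rng.normal(0, 1, n)
deprivation_b = deprivation_a + 0.2 * rng.normal(0, 1, n)   # almost a copy of score A
age = rng.normal(50, 10, n)

flagged = collinear_pairs(
    {"deprivation_a": deprivation_a, "deprivation_b": deprivation_b, "age": age}
)
print("closely related pairs:", flagged)
print("200 observations, 15 variables OK?", enough_observations(200, 15))
print("100 observations, 15 variables OK?", enough_observations(100, 15))
```

A flagged pair is a prompt to choose one of the two variables, not an automatic exclusion rule.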
