Multivariable regression analysis
Outline of Learning Objectives
• Why do we need multivariable analysis?
• Reminder of confounders and adjusting for confounders
• The idea of a risk prediction (or risk factor) model
• Two of the main regression models: linear and logistic
• Strategies for adjusting for confounders
• Strategies for building a risk prediction model
• Opportunities to see how this is used in Sports Medicine publications, with examples
Scene setting example
Researchers looked at the association between active commuting and obesity (BMI) in mid-life.
• Men who walk or cycle to work have a BMI (body mass index) 1.4 kg/m² lower than those who drive.
• For women the difference was 0.9 kg/m².
Are there any alternative explanations for this association?
Confounding: the issue of observational data
active commuting → BMI
Confounders? Age? Healthy lifestyle?
A confounder is:
• associated with the exposure of interest
• independently associated with the disease outcome
Multivariable (multivariate) analysis
Multivariable analysis allows us to adjust for confounders.
The analysis depends on the type of outcome variable:
• Continuous → multiple linear regression
• Binary → multiple logistic regression
• Survival time → Cox regression
Remember the simple linear regression model:
y = β0 + β1x1
where y = continuous outcome, x1 = exposure variable
β0 = constant (intercept)
β1 = regression coefficient (slope)
NB: if x1 is continuous or binary, β1 is the amount by which y increases for a one-unit increase in x1.
BMI = β0 + β1 × active commuting (yes = 1 versus no = 0)
This model extends quite naturally:
y = β0 + β1x1 + β2x2 + …
BMI = β0 + β1 × active commuting + β2 × age
β1 is the effect of active commuting adjusted for age.
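To make "adjusted for age" concrete, here is a minimal Python sketch that fits both models by ordinary least squares. The dataset is invented for illustration (it is not the commuting study's data), the true coefficients are chosen by hand, and the helper `ols` is a hypothetical name. In this made-up data the active commuters are younger, so the unadjusted coefficient (about -2.75) overstates the effect, while the age-adjusted coefficient recovers the -1.0 built into the data.

```python
def ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]          # prepend intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical data: younger people commute actively more often,
# and BMI rises with age (assumed true model: BMI = 20 - 1.0*commute + 0.1*age)
age     = [30, 35, 40, 45, 50, 55, 60, 65]
commute = [1, 1, 1, 0, 1, 0, 0, 0]
bmi     = [20.0 - 1.0 * c + 0.1 * a for c, a in zip(commute, age)]

unadjusted = ols([[c] for c in commute], bmi)       # BMI = b0 + b1*commute
adjusted   = ols(list(zip(commute, age)), bmi)      # BMI = b0 + b1*commute + b2*age

print("unadjusted b1:", round(unadjusted[1], 2))
print("adjusted b1:  ", round(adjusted[1], 2))
```

The gap between the two coefficients is the part of the crude association that age explains in this toy example.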
Effect of active commuting with adjustment for confounders
After adjustment for age and other factors, men who use active transport weigh 0.97 kg less than men who don’t.
Multiple linear regression in SPSS
Other variables can be added to the independent box (note that for this option in SPSS they must be continuous or binary).
To predict blood pressure given waist-to-hip ratio (whr) and age:
Blood pressure = 70.748 + 30.725 × whr + 0.594 × age
Blood pressure increases by 30.725 mmHg per 1-unit increase in whr, after adjusting for age.
The effect of whr remains significant (p < 0.001), so it is independent of age.
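The fitted equation can be used directly for prediction. The sketch below simply evaluates it in Python; the function name `predict_bp` and the example person (whr 0.9, age 50) are assumptions for illustration. Because whr typically varies over a narrow range, the second line reports the change for a 0.1 increase rather than a full unit.

```python
def predict_bp(whr, age):
    """Predicted blood pressure (mmHg) from the fitted equation on the slide."""
    return 70.748 + 30.725 * whr + 0.594 * age

# Hypothetical person: waist-to-hip ratio 0.9, age 50
print(round(predict_bp(0.9, 50), 1))

# Holding age fixed, a 0.1 increase in whr shifts the prediction by 30.725 * 0.1 mmHg
difference = predict_bp(1.0, 50) - predict_bp(0.9, 50)
print(round(difference, 2))
```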
Logistic regression
For binary outcomes …
Based on odds ratios …
Odds & Odds Ratios
Association between exposure to parental tobacco smoke and wheezing in children:
             wheeze        no wheeze       Total
exposed      61 (4.5%)     1282 (95.5%)    1343
unexposed    185 (3.2%)    5627 (96.8%)    5812
Total        246           6909            7155
Odds of wheeze in exposed = 0.045 / 0.955 = 0.048
Odds of wheeze in unexposed = 0.032 / 0.968 = 0.033
OR = 0.048 / 0.033 = 1.45
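The same arithmetic can be checked from the raw counts in the 2x2 table above. This is a small Python sketch; the variable names are chosen here for illustration.

```python
# Counts from the 2x2 table above
exposed_wheeze, exposed_no_wheeze = 61, 1282
unexposed_wheeze, unexposed_no_wheeze = 185, 5627

# Odds = number with the outcome / number without the outcome
odds_exposed = exposed_wheeze / exposed_no_wheeze
odds_unexposed = unexposed_wheeze / unexposed_no_wheeze

# Odds ratio = odds in exposed / odds in unexposed
odds_ratio = odds_exposed / odds_unexposed

print(round(odds_exposed, 3), round(odds_unexposed, 3), round(odds_ratio, 2))
```

Working from counts rather than rounded percentages avoids carrying rounding error into the odds ratio.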
In the logistic regression model, we model log odds:
log odds = β0 + β1x
where log odds = log odds of outcome (or disease)
x = binary exposure variable
β0 = constant (log odds of outcome in unexposed group)
β1 = measure of effect (log odds ratio)
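With a single binary exposure, the maximum-likelihood estimates of β0 and β1 can be written down directly from the cross-tab, which makes the definitions above easy to verify. The Python sketch below uses the parental smoking and wheeze counts; the variable names and the helper `prob_wheeze` are assumptions for illustration, and the standard error uses the usual large-sample formula for a log odds ratio.

```python
import math

# Counts from the parental smoking and wheeze cross-tab
a, b = 61, 1282     # exposed: wheeze, no wheeze
c, d = 185, 5627    # unexposed: wheeze, no wheeze

beta0 = math.log(c / d)                       # log odds of wheeze in the unexposed
beta1 = math.log((a / b) / (c / d))           # log odds ratio, exposed vs unexposed
se_beta1 = math.sqrt(1/a + 1/b + 1/c + 1/d)   # large-sample standard error of the log OR

def prob_wheeze(x):
    """Predicted probability of wheeze from log odds = beta0 + beta1 * x."""
    log_odds = beta0 + beta1 * x
    return 1 / (1 + math.exp(-log_odds))

print(round(beta0, 3), round(beta1, 3), round(se_beta1, 3))
print(round(prob_wheeze(0), 3), round(prob_wheeze(1), 3))
```

Setting x = 0 or x = 1 recovers the observed proportions with wheeze in the unexposed and exposed groups respectively.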
Example in SPSS: one binary exposure
Does the risk of wheeze differ between those exposed and unexposed to parental tobacco smoke?
i.e. we want to estimate the odds ratio for wheeze comparing exposed with unexposed, and obtain a p-value.
Outcome (must be coded 1 for event (+ve outcome), 0 for no event):
• no wheeze = 0
• wheeze = 1
Exposure:
• parents don’t smoke = 0 (unexposed/baseline group)
• at least one parent smokes = 1
Logistic regression in SPSS
• Move the outcome into the ‘dependent’ box and the binary exposure into the ‘covariates’ box
• Click on options and tick ‘CI for exp(β)’
SPSS output and interpretation
Variables in the Equation
                                                                  95.0% C.I. for EXP(B)
                       B        S.E.    Wald       df   Sig.   Exp(B)   Lower    Upper
Step 1(a)  anysmoke    .370     .151    6.005      1    .014   1.447    1.077    1.945
           Constant    -3.415   .075    2088.807   1    .000   .033
a. Variable(s) entered on step 1: anysmoke.
Reading the output:
• B for anysmoke (.370) is β1, i.e. the log OR
• B for Constant (-3.415) is β0, i.e. the log odds of wheeze in the unexposed
• Exp(B) for anysmoke (1.447) is the odds ratio of wheeze for exposed vs unexposed
• Exp(B) for Constant (.033) is the odds of wheeze in the unexposed
• Lower and Upper (1.077 to 1.945) give the 95% CI for the OR
• Sig. for anysmoke (.014) is the p-value for the smoking effect (Wald test)
As there are no other explanatory variables (exposures) in the model, the odds ratio above is the ‘unadjusted OR’ and is the same as that computed from the cross-tab.
The p-value above will be very similar to that obtained from the chi-squared test earlier.
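The columns of the output are linked by simple formulas: Exp(B) is the exponential of B, the confidence limits are the exponentials of B ± 1.96 × S.E., and the Wald statistic is (B / S.E.) squared. The Python sketch below applies these to the anysmoke row; the function name `summarise` is an assumption for illustration. Because it starts from B and S.E. as printed to three decimals, the results can differ from the SPSS figures in the last digit.

```python
import math

def summarise(b, se, z=1.96):
    """Return odds ratio, 95% CI limits and Wald statistic for a logistic coefficient."""
    odds_ratio = math.exp(b)
    lower = math.exp(b - z * se)
    upper = math.exp(b + z * se)
    wald = (b / se) ** 2
    return odds_ratio, lower, upper, wald

# B and S.E. for anysmoke as printed (rounded to 3 decimals) in the output
or_, lo, hi, wald = summarise(0.370, 0.151)
print(round(or_, 2), round(lo, 2), round(hi, 2), round(wald, 1))
```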
Multiple logistic regression: other “confounders” can be added to the model for “adjustment”, e.g. site
Site has been put in the model as well, and the simple (first) contrast has been chosen for each variable, so SPSS gives odds ratios for the levels of site and smoking relative to the lowest level.
Deciding on confounding
Variables in the Equation
                                                                    95.0% C.I. for EXP(B)
                          B        S.E.    Wald       df   Sig.   Exp(B)   Lower    Upper
Step 1(a)  anysmoke(1)    .560     .154    13.174     1    .000   1.751    1.294    2.369
           site(1)        -.894    .155    33.170     1    .000   .409     .302     .555
           Constant       -3.339   .082    1639.188   1    .000   .035
a. Variable(s) entered on step 1: anysmoke, site.
The anysmoke(1) row gives the OR and 95% CI for being exposed to smoke vs unexposed, adjusted for site: 1.751 (1.294 to 2.369).
Interpretation:
After adjusting for site,
• the OR for smoking has increased from 1.45 (unadjusted) to 1.75, suggesting site is a confounder in this relation
• We sometimes use a 10% change in the effect of interest to decide if there is important confounding.
• In this case, the change is more than 10%, so site is an important confounder. We should adjust for it.
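The 10% rule can be checked with one line of arithmetic. The Python sketch below computes the relative change on the odds ratio scale, which is an assumption made here (the slides do not say which scale to use); the function name `percent_change` is also chosen for illustration.

```python
def percent_change(unadjusted, adjusted):
    """Percentage change in an effect estimate after adjustment."""
    return 100 * (adjusted - unadjusted) / unadjusted

# Unadjusted and site-adjusted odds ratios for anysmoke from the two SPSS outputs
change = percent_change(1.447, 1.751)

print(round(change, 1))      # relative change in the OR, in percent
print(abs(change) > 10)      # True means the 10% threshold is exceeded
```

Here the OR changes by roughly 21%, well above the 10% threshold.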
Modelling strategies
Strategy depends on study aim:
Strategy 1: if the study aim is to describe a single exposure-outcome relation, adjusted for potential confounding variables
Strategy 2: if the study aim is to establish which of a list of exposure variables are associated with the outcome, i.e. to find the 'best' model to predict the outcome
STRATEGY 1
i.e. one exposure, one outcome
Examples:
• Study to investigate whether physical activity reduces blood pressure
• Study of the relation between physical inactivity and depression
Strategy 1: Steps to follow
We want an ‘adjusted’ regression coefficient, with its associated CI and p-value, for the exposure of interest.
1. Start by looking at the unadjusted regression coefficient for the exposure of interest.
2. Add all a priori confounders (age, sex, …) to the model. What is the regression coefficient for the exposure of interest now?
3. Does adding any other potential confounder to the model make a difference to the magnitude of the regression coefficient of interest (>10% change)?
   NO - present the coefficient from step 2, adjusted for a priori confounders only.
   YES - include the confounder in the model and present the coefficient for the exposure of interest arising from this model.
STRATEGY 2
Example: What are the risk factors for predicting inactivity?
i.e. multiple exposures, one outcome
Want to find the ‘best’ model:
• best at predicting the dependent variable
• best at explaining variation in the dependent variable
Should aim to have as simple a model as possible.
Multiple logistic regression:
Used to look at predictors of physical inactivity
Strategy 2: Model building
• No perfect answer
• Formal selection procedures (forward, backward and stepwise), particularly the automatic procedures in SPSS, should be used with caution:
  - They can lead to very different results depending on the procedure used
  - They bypass thinking about the specific research question and can hence lead to models that include ‘implausible’ variables and exclude ‘known risk factors’
Strategy 2: Model building
1. Use a systematic approach where you think about the particular problem.
2. If the literature points to any established risk factors, these should probably be included.
3. Look at the univariate (unadjusted) significance of each predictor.
4. Fit the significant ones together into a multivariate model.
5. Remove the non-significant ones.
6. Try adding back any that are omitted.
7. And repeat …!
General principles of model building
1. Don’t include any explanatory variable whose relation with the outcome is implausible.
2. Don’t include two explanatory variables that are closely related (collinear), e.g. both the Townsend and Carstairs social deprivation scores.
3. Don’t try fitting large numbers of variables to small datasets. There should be at least 10 times as many observations as variables in the model.
4. If you are unsure which of a number of models is best, see how alternative models affect the conclusions.
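The "10 times as many observations as variables" guideline translates into a quick check before model building. The Python sketch below is only an illustration of that rule of thumb as stated on the slide; the function name `max_model_variables` is an assumption, and the sample size is taken from the wheeze example.

```python
def max_model_variables(n_observations, per_variable=10):
    """Largest number of variables allowed by the 10-observations-per-variable rule."""
    return n_observations // per_variable

# Sample size from the parental smoking and wheeze example
print(max_model_variables(7155))

# A small hypothetical study of 95 participants supports far fewer variables
print(max_model_variables(95))
```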
