REGRESSION ANALYSIS
Regression
• Regression: a technique concerned with predicting some variables by knowing others
• The process of predicting variable Y using variable X
• Tells you how values of Y change as a function of changes in values of X
Correlation and Regression
• Correlation describes the strength of a linear
relationship between two variables
• Linear means “straight line”
• Regression tells us how to draw the straight line
described by the correlation
Regression
• Calculates the “best-fit” line for a certain set of data
• The regression line makes the sum of the squares of the
residuals smaller than for any other line
• Regression minimizes residuals
[Scatter plot: Wt (kg) on the x-axis (60 to 120) versus SBP (mmHg) on the y-axis (80 to 220), with the fitted regression line]
Regression
• We are able to construct a best-fitting straight line through the scatter-diagram points and then formulate a regression equation of the form ŷ = b0 + b1X.
Simple Linear Regression
The output of a regression is a function that predicts the dependent variable (y) based upon values of the independent variable (x). Simple regression fits a straight line to the data:

ŷ = b0 + b1X ± ε

where b0 is the y-intercept, b1 is the slope (= ∆y/∆x), and ε is the error term.
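As a minimal sketch of these formulas in Python (the data are hypothetical, loosely echoing the weight/SBP scatter plot earlier), b0 and b1 can be estimated directly with the least-squares formulas:

```python
import numpy as np

# Hypothetical data: weight (kg) vs. systolic blood pressure (mmHg)
x = np.array([64, 70, 75, 82, 88, 95, 103, 110])
y = np.array([110, 118, 125, 134, 142, 155, 168, 180])

# Least-squares estimates of the slope (b1) and intercept (b0)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x        # prediction for each observed x
residuals = y - y_hat      # prediction errors (ε)

print(f"y^ = {b0:.2f} + {b1:.2f}x")
```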
Simple Linear Regression
The function will make a prediction for each observed data point. The observation is denoted by y and the prediction is denoted by ŷ.
Simple Linear Regression
For each observation, the variation can be described as:

y = ŷ + ε
Actual = Explained + Error

where ε is the prediction error.
Regression
A least squares regression selects the line with the lowest total sum of squared prediction errors. This value is called the Sum of Squares of Error, or SSE.
Calculating SSR
The Sum of Squares Regression (SSR) is the sum of the squared differences between the prediction for each observation and the population mean ȳ.
Regression Formulas
The Total Sum of Squares (SST) is equal to SSR + SSE. Mathematically,

SSR = ∑(ŷ − ȳ)²   (measure of explained variation)
SSE = ∑(y − ŷ)²   (measure of unexplained variation)
SST = SSR + SSE = ∑(y − ȳ)²   (measure of total variation in y)
The Coefficient of Determination
The proportion of total variation (SST) that is explained by the regression (SSR) is known as the Coefficient of Determination, and is often referred to as R².

R² = SSR / SST = SSR / (SSR + SSE)

The value of R² can range between 0 and 1; the higher its value, the more of the variation in y the regression explains. It is often expressed as a percentage.
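A short Python sketch of the decomposition, repeating the hypothetical data from the earlier fit so the block runs on its own; the assert confirms SST = SSR + SSE:

```python
import numpy as np

# Hypothetical data and fit, repeated from the earlier sketch
x = np.array([64, 70, 75, 82, 88, 95, 103, 110])
y = np.array([110, 118, 125, 134, 142, 155, 168, 180])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ssr = np.sum((y_hat - y.mean()) ** 2)   # explained variation
sse = np.sum((y - y_hat) ** 2)          # unexplained variation
sst = np.sum((y - y.mean()) ** 2)       # total variation
assert np.isclose(sst, ssr + sse)       # SST = SSR + SSE

r_squared = ssr / sst                   # equivalently SSR / (SSR + SSE)
print(f"SSR={ssr:.1f}  SSE={sse:.1f}  SST={sst:.1f}  R^2={r_squared:.3f}")
```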
Standard Error of Regression
The Standard Error of a regression is a measure of its variability. It can be used in a similar manner to a standard deviation, allowing for prediction intervals: ŷ ± 2 standard errors gives approximately a 95% prediction interval, and ± 3 standard errors approximately a 99% interval.

The Standard Error is calculated by taking the square root of the average squared prediction error:

Standard Error = √( SSE / (n − k) )

where n is the number of observations in the sample and k is the total number of variables in the model.
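Continuing the same hypothetical example (repeated so the block runs on its own), the Standard Error follows directly from SSE; here k = 2, counting the intercept and the slope per the slide's definition:

```python
import numpy as np

# Hypothetical data and fit, repeated from the earlier sketch
x = np.array([64, 70, 75, 82, 88, 95, 103, 110])
y = np.array([110, 118, 125, 134, 142, 155, 168, 180])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
sse = np.sum((y - (b0 + b1 * x)) ** 2)

# n observations; k = 2 variables in the model (intercept and slope)
n, k = len(y), 2
standard_error = np.sqrt(sse / (n - k))
print(f"Standard Error = {standard_error:.2f}")
# A rough 95% prediction interval around a new prediction:
# (b0 + b1 * x_new) +/- 2 * standard_error
```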
The output of a simple regression is the coefficient β and the constant A. The equation is then:

y = A + βx + ε

where ε is the residual error. β is the per-unit change in the dependent variable for each unit change in the independent variable. Mathematically:

β = ∆y / ∆x
Multiple Linear Regression
More than one independent variable can be used to explain variance in the dependent variable, as long as the independent variables are not linearly related to each other. A multiple regression takes the form:

y = A + β1X1 + β2X2 + … + βkXk + ε

where k is the number of independent variables (parameters).
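A minimal sketch of a two-predictor fit using NumPy's least-squares solver; X1, X2, and y are hypothetical data invented for illustration:

```python
import numpy as np

# Hypothetical data: two predictors X1, X2 and outcome y
X1 = np.array([3.0, 5.0, 6.0, 8.0, 9.0, 11.0])
X2 = np.array([1.0, 2.0, 2.5, 3.0, 4.0, 5.0])
y = np.array([12.0, 18.0, 20.0, 26.0, 29.0, 35.0])

# Design matrix with a leading column of ones for the constant A
X = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares solution: coefficients [A, beta1, beta2]
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
A, beta1, beta2 = coefs
print(f"y = {A:.2f} + {beta1:.2f}*X1 + {beta2:.2f}*X2")
```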
Multicollinearity
Multicollinearity is a condition in which two or more independent variables are highly linearly correlated. It inflates the variance of the coefficient estimates and can make the regression numerically unstable.

Example table of correlations:

      Y      X1     X2
Y     1.000
X1    0.802  1.000
X2    0.848  0.578  1.000

A correlation table can suggest which independent variables may be significant. Generally, an independent variable that is strongly correlated with the dependent variable and only weakly correlated with the other independent variables can be included as a possible predictor.
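A quick way to produce such a correlation table, assuming hypothetical data with columns y, x1, x2:

```python
import numpy as np

# Hypothetical observations (columns: y, x1, x2)
data = np.array([
    [12.0,  3.0, 1.0],
    [18.0,  5.0, 2.0],
    [20.0,  6.0, 2.5],
    [26.0,  8.0, 3.0],
    [29.0,  9.0, 4.0],
    [35.0, 11.0, 5.0],
])

# Pairwise Pearson correlations between the columns
corr = np.corrcoef(data, rowvar=False)
print(np.round(corr, 3))
# High off-diagonal values between the predictor columns
# (x1 vs. x2) signal potential multicollinearity.
```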
Nonlinear Regression
Nonlinear functions can also be fit as regressions. Common
choices include Power, Logarithmic, Exponential, and Logistic,
but any continuous function can be used.
[Diagnostic residual plots: when the relationship is not linear, the residuals show a curved pattern against x; when it is linear, the residuals scatter randomly around zero]
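As an illustrative sketch of the nonlinear case, a model can be fit with SciPy's curve_fit; the exponential form and the data here are assumptions chosen for the example:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical data following an exponential trend
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.7, 7.4, 20.1, 54.6, 148.4, 403.2])

def exponential(x, a, b):
    """Exponential model y = a * exp(b * x)."""
    return a * np.exp(b * x)

# Non-linear least squares fit of the model parameters
(a, b), _ = curve_fit(exponential, x, y, p0=(1.0, 1.0))
print(f"y = {a:.2f} * exp({b:.2f} * x)")
```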
Regression Output in Excel

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.982655
R Square            0.96561
Adjusted R Square   0.959879
Standard Error      26.01378
Observations        15

ANOVA
             df   SS         MS         F          Significance F
Regression    2   228014.6   114007.3   168.4712   1.65E-09
Residual     12   8120.603   676.7169
Total        14   236135.2

              Coefficients   Standard Error   t Stat      P-value    Lower 95%   Upper 95%
Intercept     562.151        21.0931          26.65094    4.78E-12   516.1931    608.1089
Temperature   -5.436581      0.336216         -16.1699    1.64E-09   -6.169133   -4.704029
Insulation    -20.01232      2.342505         -8.543127   1.91E-06   -25.1162    -14.90844

Estimated Heating Oil = 562.15 - 5.436 (Temperature) - 20.012 (Insulation)

Y = B0 + B1X1 + B2X2 + B3X3 + … ± Error
Total = Estimated/Predicted ± Error
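An analogous summary can be produced in Python with statsmodels; this is a sketch only, and the heating-oil numbers below are hypothetical, so they will not reproduce the Excel figures above:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data with the same structure as the Excel example:
# predictors Temperature and Insulation, outcome HeatingOil
temperature = np.array([35, 29, 36, 58, 65, 30, 52, 45, 53, 38, 46, 58, 34, 60, 62])
insulation  = np.array([ 3,  4,  7,  7,  5,  6,  5,  5,  4,  4,  7,  9,  3,  8,  6])
heating_oil = np.array([400, 420, 300, 150, 120, 350, 200, 250, 240, 320, 210, 100, 410, 110, 130])

# Add a constant column so the model includes an intercept
X = sm.add_constant(np.column_stack([temperature, insulation]))
model = sm.OLS(heating_oil, X).fit()

# Prints a table analogous to Excel's regression output: R-squared,
# ANOVA F statistic, coefficients, t stats, p-values, and CIs
print(model.summary())
```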
Significance testing…

Slope: the sampling distribution of the estimated slope β̂1 follows a t distribution with n − 2 degrees of freedom.

H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (a linear relationship does exist)

Tn−2 = (β̂1 − 0) / s.e.(β̂1)
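A minimal sketch of this test on the hypothetical data used earlier (repeated so the block runs on its own); scipy.stats supplies the t distribution:

```python
import numpy as np
from scipy import stats

# Hypothetical data and fit, repeated from the earlier sketch
x = np.array([64, 70, 75, 82, 88, 95, 103, 110])
y = np.array([110, 118, 125, 134, 142, 155, 168, 180])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)

# Standard error of the slope estimate
n = len(y)
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))        # residual std. dev.
se_b1 = s / np.sqrt(np.sum((x - x.mean()) ** 2))

# Test H0: beta1 = 0 against H1: beta1 != 0
t_stat = (b1 - 0) / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)      # two-sided p-value
print(f"t({n - 2}) = {t_stat:.2f}, p = {p_value:.2g}")
```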
Functions of multivariate analysis:
• Control for confounders
• Test for interactions between predictors (effect modification)
• Improve predictions
Interpreting Regression

Continuous outcome (means)

Outcome variable: Continuous (e.g., pain scale, cognitive function)

Are the observations independent or correlated?

Independent observations:
• T-test: compares means between two independent groups
• ANOVA: compares means between more than two independent groups
• Pearson’s correlation coefficient (linear correlation): shows linear correlation between two continuous variables
• Linear regression: multivariate regression technique used when the outcome is continuous; gives slopes

Correlated observations:
• Paired t-test: compares means between two related groups (e.g., the same subjects before and after)
• Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements)
• Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or more groups; gives rate of change over time

Alternatives if the normality assumption is violated (and small sample size): non-parametric statistics
• Wilcoxon signed-rank test: non-parametric alternative to the paired t-test
• Wilcoxon rank-sum test (= Mann-Whitney U test): non-parametric alternative to the t-test
• Kruskal-Wallis test: non-parametric alternative to ANOVA
• Spearman rank correlation coefficient: non-parametric alternative to Pearson’s correlation coefficient
Covariance

cov(x, y) = ∑ᵢ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

Covariance is a measure of the joint variability of two random variables.

Interpreting Covariance
• cov(X,Y) > 0: X and Y are positively correlated
• cov(X,Y) < 0: X and Y are inversely correlated
• cov(X,Y) = 0: X and Y are uncorrelated (zero covariance does not by itself imply independence)
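A small Python check of the covariance formula on hypothetical data; NumPy's cov uses the same n − 1 denominator:

```python
import numpy as np

# Hypothetical paired observations
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 5.0, 9.0])

# Sample covariance per the slide's formula
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# Same quantity from NumPy: off-diagonal of the 2x2 covariance matrix
cov_numpy = np.cov(x, y)[0, 1]
print(cov_manual, cov_numpy)   # both positive: x and y move together
```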
Types of variables to be analyzed

Predictor variable(s) | Outcome variable | Statistical procedure or measure of association

Cross-sectional/case-control studies
• Categorical (>2 groups) | Continuous | ANOVA
• Continuous | Continuous | Simple linear regression
• Multivariate (categorical and continuous) | Continuous | Multiple linear regression
• Categorical | Categorical | Chi-square test (or Fisher’s exact)
• Binary | Binary | Odds ratio, risk ratio
• Multivariate | Binary | Logistic regression

Cohort Studies/Clinical Trials
• Binary | Binary | Risk ratio
• Categorical | Time-to-event | Kaplan-Meier/log-rank test
• Multivariate | Time-to-event | Cox proportional hazards regression, hazard ratio
• Binary (two groups) | Continuous | T-test
• Binary | Ranks/ordinal | Wilcoxon rank-sum test
• Categorical | Continuous | Repeated-measures ANOVA
• Multivariate | Continuous | Mixed models; GEE modeling
Alternative summary: statistics for various types of outcome data

Outcome variable: Continuous (e.g., pain scale, cognitive function)
• Independent observations: T-test; ANOVA; linear correlation; linear regression
• Correlated observations: paired t-test; repeated-measures ANOVA; mixed models/GEE modeling
• Assumptions: outcome is normally distributed (important for small samples); outcome and predictor have a linear relationship

Outcome variable: Binary or categorical (e.g., fracture yes/no)
• Independent observations: difference in proportions; relative risks; chi-square test; logistic regression
• Correlated observations: McNemar’s test; conditional logistic regression; GEE modeling
• Assumptions: the chi-square test assumes sufficient numbers in each cell (≥5)

Outcome variable: Time-to-event (e.g., time to fracture)
• Independent observations: Kaplan-Meier statistics; Cox regression
• Correlated observations: n/a
• Assumptions: Cox regression assumes proportional hazards between groups
Binary or categorical outcomes (proportions); HRP 259/HRP 261

Outcome variable: Binary or categorical (e.g., fracture, yes/no)

Are the observations correlated?

Independent observations:
• Chi-square test: compares proportions between two or more groups
• Relative risks: odds ratios or risk ratios
• Logistic regression: multivariate technique used when the outcome is binary; gives multivariate-adjusted odds ratios

Correlated observations:
• McNemar’s chi-square test: compares a binary outcome between correlated groups (e.g., before and after)
• Conditional logistic regression: multivariate regression technique for a binary outcome when groups are correlated (e.g., matched data)
• GEE modeling: multivariate regression technique for a binary outcome when groups are correlated (e.g., repeated measures)

Alternatives to the chi-square test if cells are sparse:
• Fisher’s exact test: compares proportions between independent groups when there are sparse data (some cells <5)
• McNemar’s exact test: compares proportions between correlated groups when there are sparse data (some cells <5)
Time-to-event outcome (survival data); HRP 262

Outcome variable: Time-to-event (e.g., time to fracture)

Independent observations:
• Kaplan-Meier statistics: estimates survival functions for each group (usually displayed graphically); compares survival functions with the log-rank test
• Cox regression: multivariate technique for time-to-event data; gives multivariate-adjusted hazard ratios

Correlated observations: n/a (already over time)

Modifications to Cox regression if the proportional-hazards assumption is violated:
• Time-dependent predictors or time-dependent hazard ratios (tricky!)