Linear Regression and Correlation
• Explanatory and Response Variables are Numeric
• Relationship between the mean of the response
variable and the level of the explanatory variable
assumed to be approximately linear (straight line)
• Model:
  Y = β₀ + β₁x + ε,   ε ~ N(0, σ)
• β₁ > 0 ⇒ Positive Association
• β₁ < 0 ⇒ Negative Association
• β₁ = 0 ⇒ No Association
Least Squares Estimation of β₀, β₁
• β₀ ≡ Mean response when x=0 (y-intercept)
• β₁ ≡ Change in mean response when x increases by 1 unit (slope)
• β₀, β₁ are unknown parameters (like μ)
• β₀ + β₁x ≡ Mean response when the explanatory variable takes on the value x
• Goal: Choose values (estimates) that minimize the sum
of squared errors (SSE) of observed values to the
straight-line:
  ŷ = β̂₀ + β̂₁x
  SSE = Σᵢ₌₁ⁿ (yᵢ - ŷᵢ)²  =  Σᵢ₌₁ⁿ [yᵢ - (β̂₀ + β̂₁xᵢ)]²
Example - Pharmacodynamics of LSD
Score (y) LSD Conc (x)
78.93 1.17
58.20 2.97
67.47 3.26
37.47 4.69
45.65 5.83
32.92 6.00
29.97 6.41
• Response (y) - Math score (mean among 5 volunteers)
• Predictor (x) - LSD tissue concentration (mean of 5 volunteers)
• Raw Data and scatterplot of Score vs LSD concentration:
[Scatterplot: SCORE (20–80) vs LSD_CONC (1–7)]
Source: Wagner, et al (1968)
Least Squares Computations
  Sxx = Σ(x - x̄)²      Syy = Σ(y - ȳ)²      Sxy = Σ(x - x̄)(y - ȳ)
  β̂₁ = Sxy / Sxx      β̂₀ = ȳ - β̂₁x̄
  SSE = Σ(y - ŷ)² = Syy - β̂₁Sxy      s² = SSE / (n - 2)
Example - Pharmacodynamics of LSD
  x̄ = 30.33/7 = 4.333      ȳ = 350.61/7 = 50.087
  Sxx = 22.4749      Sxy = -202.4872
  β̂₁ = Sxy/Sxx = -202.4872/22.4749 = -9.01
  β̂₀ = ȳ - β̂₁x̄ = 50.09 - (-9.01)(4.33) = 89.10      ŷ = 89.10 - 9.01x
  s² = 50.72
Score (y) LSD Conc (x) x-xbar y-ybar Sxx Sxy Syy
78.93 1.17 -3.163 28.843 10.004569 -91.230409 831.918649
58.20 2.97 -1.363 8.113 1.857769 -11.058019 65.820769
67.47 3.26 -1.073 17.383 1.151329 -18.651959 302.168689
37.47 4.69 0.357 -12.617 0.127449 -4.504269 159.188689
45.65 5.83 1.497 -4.437 2.241009 -6.642189 19.686969
32.92 6.00 1.667 -17.167 2.778889 -28.617389 294.705889
29.97 6.41 2.077 -20.117 4.313929 -41.783009 404.693689
350.61 30.33 -0.001 0.001 22.474943 -202.487243 2078.183343
(Column totals given in bottom row of table)
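The least squares computations above can be checked in a few lines of Python (a sketch using the slide's formulas β̂₁ = Sxy/Sxx and β̂₀ = ȳ - β̂₁x̄, not part of the original analysis):

```python
# Least squares estimates for the LSD example, from the raw data above.
x = [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41]          # LSD concentration
y = [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97]   # math score

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = Sxy / Sxx           # slope estimate (beta1-hat)
b0 = ybar - b1 * xbar    # intercept estimate (beta0-hat)
print(round(b1, 2), round(b0, 2))   # -9.01 89.12
```

These agree with the SPSS output on the next slide (B = 89.124 and -9.009).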
SPSS Output and Plot of Equation
Coefficients (Dependent Variable: SCORE)
             Unstandardized Coefficients   Standardized
Model 1      B         Std. Error          Beta           t        Sig.
(Constant)   89.124    7.048                              12.646   .000
LSD_CONC     -9.009    1.503               -.937          -5.994   .002

[Scatterplot with fitted line: score = 89.12 + -9.01 * lsd_conc, R-Square = 0.88]
Math Score vs LSD Concentration (SPSS)
Inference Concerning the Slope (β₁)
• Parameter: Slope in the population model (β₁)
• Estimator: Least squares estimate β̂₁
• Estimated standard error: SE(β̂₁) = s / √Sxx
• Methods of making inference regarding population:
– Hypothesis tests (2-sided or 1-sided)
– Confidence Intervals
Hypothesis Test for β₁
• Test Statistic (both cases): t_obs = β̂₁ / SE(β̂₁)
• 2-Sided Test
– H0: β₁ = 0
– HA: β₁ ≠ 0
– R.R.: |t_obs| ≥ t_{α/2, n-2}
– P-val: 2·P(t ≥ |t_obs|)
• 1-Sided Tests
– H0: β₁ = 0
– HA⁺: β₁ > 0    R.R.: t_obs ≥ t_{α, n-2}    P-val: P(t ≥ t_obs)
– HA⁻: β₁ < 0    R.R.: t_obs ≤ -t_{α, n-2}   P-val: P(t ≤ t_obs)
(1-α)100% Confidence Interval for β₁
  β̂₁ ± t_{α/2, n-2} · SE(β̂₁)  =  β̂₁ ± t_{α/2, n-2} · (s / √Sxx)
• Conclude positive association if entire interval above 0
• Conclude negative association if entire interval below 0
• Cannot conclude an association if interval contains 0
• Conclusion based on interval is same as 2-sided hypothesis test
Example - Pharmacodynamics of LSD
  n = 7      β̂₁ = -9.01      s = √50.72 = 7.12      Sxx = 22.475
  SE(β̂₁) = s / √Sxx = 7.12 / √22.475 = 1.50
• Testing H0: β₁ = 0 vs HA: β₁ ≠ 0
  T.S.: t_obs = β̂₁ / SE(β̂₁) = -9.01 / 1.50 = -6.01
  R.R.: |t_obs| ≥ t_{.025, 5} = 2.571
• 95% Confidence Interval for β₁:
  -9.01 ± 2.571(1.50) = -9.01 ± 3.86 = (-12.87, -5.15)
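The slope inference above can be reproduced from the summary statistics (a sketch, not part of the original slides; the slide's t_obs = -6.01 uses the rounded SE of 1.50, while full precision gives -5.99, matching SPSS's -5.994):

```python
import math

# t statistic and 95% CI for the LSD slope, from the summary statistics.
n = 7
Sxx, Syy, Sxy = 22.475, 2078.183, -202.487

b1 = Sxy / Sxx                  # slope estimate
SSE = Syy - b1 * Sxy            # error sum of squares
s = math.sqrt(SSE / (n - 2))    # residual standard deviation
se_b1 = s / math.sqrt(Sxx)      # estimated standard error of b1
t_obs = b1 / se_b1
t_crit = 2.571                  # t_{.025, 5} from a t table
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
print(round(t_obs, 2), tuple(round(v, 2) for v in ci))
```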
Correlation Coefficient
• Measures the strength of the linear association
between two variables
• Takes on the same sign as the slope estimate from
the linear regression
• Not affected by linear transformations of y or x
• Does not distinguish between dependent and
independent variable (e.g. height and weight)
• Population Parameter - ρ
• Pearson’s Correlation Coefficient:
  r = Sxy / √(Sxx · Syy)      -1 ≤ r ≤ 1
Correlation Coefficient
• Values close to 1 in absolute value  strong linear
association, positive or negative from sign
• Values close to 0 imply little or no association
• If data contain outliers (are non-normal),
Spearman’s coefficient of correlation can be
computed based on the ranks of the x and y values
• Test of H0: ρ = 0 is equivalent to test of H0: β₁ = 0
• Coefficient of Determination (r²) - Proportion of variation in y “explained” by the regression on x:
  r² = (Syy - SSE) / Syy      0 ≤ r² ≤ 1
Example - Pharmacodynamics of LSD
  Sxx = 22.475      Sxy = -202.487      Syy = 2078.183      SSE = 253.89
  r = -202.487 / √((22.475)(2078.183)) = -0.94
  r² = (2078.183 - 253.89) / 2078.183 = 0.88 = (-0.94)²
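The correlation and coefficient of determination follow directly from these sums of squares (a quick check in Python, not part of the original slides):

```python
import math

# Pearson correlation and r^2 for the LSD example, from the slide's values.
Sxx, Sxy, Syy = 22.475, -202.487, 2078.183
SSE = 253.89

r = Sxy / math.sqrt(Sxx * Syy)   # same sign as the slope estimate
r2 = (Syy - SSE) / Syy           # proportion of variation "explained"
print(round(r, 2), round(r2, 2))   # -0.94 0.88
```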
[Two plots of score vs lsd_conc (1.00–6.00): one with the horizontal line Mean = 50.09, whose deviations illustrate Syy; one with the fitted line score = 89.12 + -9.01 * lsd_conc (R-Square = 0.88), whose deviations illustrate SSE]
Example - SPSS Output
Pearson’s and Spearman’s Measures
Correlations (Pearson)
                            SCORE     LSD_CONC
SCORE      Correlation      1         -.937**
           Sig. (2-tailed)  .         .002
           N                7         7
LSD_CONC   Correlation      -.937**   1
           Sig. (2-tailed)  .002      .
           N                7         7

Correlations (Spearman's rho)
                            SCORE     LSD_CONC
SCORE      Correlation      1.000     -.929**
           Sig. (2-tailed)  .         .003
           N                7         7
LSD_CONC   Correlation      -.929**   1.000
           Sig. (2-tailed)  .003      .
           N                7         7

**. Correlation is significant at the 0.01 level (2-tailed).
Analysis of Variance in Regression
• Goal: Partition the total variation in y into variation
“explained” by x and random variation
  Σ(yᵢ - ȳ)²  =  Σ(yᵢ - ŷᵢ)²  +  Σ(ŷᵢ - ȳ)²
   (Total)        (Error)         (Model)
• These three sums of squares and degrees of freedom are:
•Total (Syy) dfTotal = n-1
• Error (SSE) dfError = n-2
• Model (SSR) dfModel = 1
Analysis of Variance in Regression
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square       F
Model                 SSR              1                    MSR = SSR/1       F = MSR/MSE
Error                 SSE              n-2                  MSE = SSE/(n-2)
Total                 Syy              n-1
• Analysis of Variance - F-test
• H0: β₁ = 0    HA: β₁ ≠ 0
  T.S.: F_obs = MSR / MSE
  R.R.: F_obs ≥ F_{α, 1, n-2}
  P-val: P(F ≥ F_obs)
Example - Pharmacodynamics of LSD
• Total Sum of Squares:
  Syy = Σ(yᵢ - ȳ)² = 2078.183      df_Total = 7 - 1 = 6
• Error Sum of Squares:
  SSE = Σ(yᵢ - ŷᵢ)² = 253.890      df_Error = 7 - 2 = 5
• Model Sum of Squares:
  SSR = Σ(ŷᵢ - ȳ)² = 2078.183 - 253.890 = 1824.293      df_Model = 1
Example - Pharmacodynamics of LSD
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
Model                 1824.293         1                    1824.293      35.93
Error                 253.890          5                    50.778
Total                 2078.183         6
• Analysis of Variance - F-test
• H0: β₁ = 0    HA: β₁ ≠ 0
  T.S.: F_obs = MSR/MSE = 1824.293/50.778 = 35.93
  R.R.: F_obs ≥ F_{.05, 1, 5} = 6.61
  P-val: P(F ≥ 35.93)
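The ANOVA partition and F statistic for the LSD example are a short computation (a sketch from the slide's sums of squares, not part of the original analysis):

```python
# ANOVA F-test for the LSD regression: partition Syy into SSR + SSE,
# then form F = MSR/MSE.
Syy, SSE = 2078.183, 253.890
n = 7

SSR = Syy - SSE        # model sum of squares
MSR = SSR / 1          # df_Model = 1
MSE = SSE / (n - 2)    # df_Error = n - 2 = 5
F = MSR / MSE
print(round(F, 2))     # 35.93
```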
Example - SPSS Output
ANOVA (Dependent Variable: SCORE; Predictors: (Constant), LSD_CONC)
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   1824.302         1    1824.302      35.928   .002
Residual     253.881          5    50.776
Total        2078.183         6
Multiple Regression
• Numeric Response variable (Y)
• p Numeric predictor variables
• Model:
  Y = β₀ + β₁x₁ + ⋯ + βₚxₚ + ε
• Partial Regression Coefficients: βᵢ ≡ effect (on the mean response) of increasing the i-th predictor variable by 1 unit, holding all other predictors constant
Example - Effect of Birth weight on
Body Size in Early Adolescence
• Response: Height at Early adolescence (n =250 cases)
• Predictors (p=6 explanatory variables)
• Adolescent Age (x1, in years -- 11-14)
• Tanner stage (x2, units not given)
• Gender (x3=1 if male, 0 if female)
• Gestational age (x4, in weeks at birth)
• Birth length (x5, units not given)
• Birthweight Group (x6 = 1,...,6: <1500g (1), 1500-1999g (2), 2000-2499g (3), 2500-2999g (4), 3000-3499g (5), >3500g (6))
Source: Falkner, et al (2004)
Least Squares Estimation
• Population Model for mean response:
  E(Y) = β₀ + β₁x₁ + ⋯ + βₚxₚ
• Least Squares Fitted (predicted) equation, minimizing SSE:
  Ŷ = β̂₀ + β̂₁x₁ + ⋯ + β̂ₚxₚ      SSE = Σ(Y - Ŷ)²
• All statistical software packages/spreadsheets can
compute least squares estimates and their standard errors
Analysis of Variance
• Direct extension to ANOVA based on simple linear
regression
• Only adjustments are to degrees of freedom:
– dfModel = p dfError = n-p-1
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square         F
Model                 SSR              p                    MSR = SSR/p         F = MSR/MSE
Error                 SSE              n-p-1                MSE = SSE/(n-p-1)
Total                 Syy              n-1

  R² = SSR / Syy = 1 - SSE / Syy
Testing for the Overall Model - F-test
• Tests whether any of the explanatory variables are
associated with the response
• H0: β₁ = ⋯ = βₚ = 0 (None of the x's associated with y)
• HA: Not all βᵢ = 0
  T.S.: F_obs = MSR/MSE = [R²/p] / [(1-R²)/(n-p-1)]
  R.R.: F_obs ≥ F_{α, p, n-p-1}
  P-val: P(F ≥ F_obs)
Example - Effect of Birth weight on
Body Size in Early Adolescence
• Authors did not print ANOVA, but did provide following:
• n = 250    p = 6    R² = 0.26
• H0: β₁ = ⋯ = β₆ = 0
• HA: Not all βᵢ = 0
  T.S.: F_obs = [R²/p] / [(1-R²)/(n-p-1)] = [0.26/6] / [(1-0.26)/(250-6-1)] = .0433/.0030 = 14.2
  R.R.: F_obs ≥ F_{.05, 6, 243} = 2.13
  P-val: P(F ≥ 14.2)
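This overall F statistic needs only n, p, and R² (a sketch of the slide's arithmetic, not part of the original analysis):

```python
# Overall F-test from R^2 alone (birth-weight example):
# F = [R^2/p] / [(1-R^2)/(n-p-1)], with n=250, p=6, R^2=0.26.
n, p, R2 = 250, 6, 0.26
F = (R2 / p) / ((1 - R2) / (n - p - 1))
print(round(F, 1))   # 14.2
```

Since 14.2 far exceeds the critical value 2.13, at least one predictor is associated with height.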
Testing Individual Partial Coefficients - t-tests
• Wish to determine whether the response is
associated with a single explanatory variable, after
controlling for the others
• H0: βᵢ = 0    HA: βᵢ ≠ 0 (2-sided alternative)
  T.S.: t_obs = β̂ᵢ / SE(β̂ᵢ)
  R.R.: |t_obs| ≥ t_{α/2, n-p-1}
  P-val: 2·P(t ≥ |t_obs|)
Example - Effect of Birth weight on
Body Size in Early Adolescence
Variable b sb t=b/sb P-val (z)
Adolescent Age 2.86 0.99 2.89 .0038
Tanner Stage 3.41 0.89 3.83 <.001
Male 0.08 1.26 0.06 .9522
Gestational Age -0.11 0.21 -0.52 .6030
Birth Length 0.44 0.19 2.32 .0204
Birth Wt Grp -0.78 0.64 -1.22 .2224
Controlling for all other predictors, adolescent age, Tanner stage, and birth length are associated with adolescent height.
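The t statistics in the table are just each coefficient divided by its standard error, and can be recomputed from the reported values (a sketch, not part of the original analysis):

```python
# Recomputing t = b / s_b for each predictor in the birth-weight example
# from the reported coefficients and standard errors.
coefs = {
    "Adolescent Age":  (2.86, 0.99),
    "Tanner Stage":    (3.41, 0.89),
    "Male":            (0.08, 1.26),
    "Gestational Age": (-0.11, 0.21),
    "Birth Length":    (0.44, 0.19),
    "Birth Wt Grp":    (-0.78, 0.64),
}
t_stats = {name: round(b / sb, 2) for name, (b, sb) in coefs.items()}
print(t_stats["Adolescent Age"], t_stats["Birth Length"])   # 2.89 2.32
```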
Models with Dummy Variables
• Some models have both numeric and categorical
explanatory variables (Recall gender in example)
• If a categorical variable has k levels, need to create
k-1 dummy variables that take on the values 1 if
the level of interest is present, 0 otherwise.
• The baseline level of the categorical variable is the one for
which all k-1 dummy variables are set to 0
• The regression coefficient corresponding to a
dummy variable is the difference between the
mean for that level and the mean for baseline
group, controlling for all numeric predictors
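The k-1 dummy coding described above can be sketched in a few lines of Python (`dummy_code` and the example levels are illustrative, not from the slides):

```python
# Minimal k-1 dummy coding for a categorical predictor with k levels;
# the first level is the baseline (all dummies 0).
def dummy_code(values, levels):
    """Map each value to k-1 indicators for levels[1:]; baseline = levels[0]."""
    return [[1 if v == lev else 0 for lev in levels[1:]] for v in values]

levels = ["A", "B", "C"]                    # k = 3 levels -> 2 dummies
print(dummy_code(["A", "B", "C"], levels))  # [[0, 0], [1, 0], [0, 1]]
```

Each dummy's regression coefficient is then the mean difference between its level and the baseline "A", controlling for the numeric predictors.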
Example - Deep Cervical Infections
• Subjects - Patients with deep neck infections
• Response (Y) - Length of Stay in hospital
• Predictors: (One numeric, 11 Dichotomous)
– Age (x1)
– Gender (x2=1 if female, 0 if male)
– Fever (x3=1 if Body Temp > 38C, 0 if not)
– Neck swelling (x4=1 if Present, 0 if absent)
– Neck Pain (x5=1 if Present, 0 if absent)
– Trismus (x6=1 if Present, 0 if absent)
– Underlying Disease (x7=1 if Present, 0 if absent)
– Respiration Difficulty (x8=1 if Present, 0 if absent)
– Complication (x9=1 if Present, 0 if absent)
– WBC > 15000/mm3 (x10=1 if Present, 0 if absent)
– CRP > 100g/ml (x11=1 if Present, 0 if absent)
Source: Wang, et al (2003)
Example - Weather and Spinal Patients
• Subjects - Visitors to National Spinal Network in 23 cities
Completing SF-36 Form
• Response - Physical Function subscale (1 of 10 reported)
• Predictors:
– Patient’s age (x1)
– Gender (x2=1 if female, 0 if male)
– High temperature on day of visit (x3)
– Low temperature on day of visit (x4)
– Dew point (x5)
– Wet bulb (x6)
– Total precipitation (x7)
– Barometric Pressure (x7)
– Length of sunlight (x8)
– Moon Phase (new, wax crescent, 1st Qtr, wax gibbous, full moon, wan
gibbous, last Qtr, wan crescent, presumably had 8-1=7 dummy
variables)
Source: Glaser, et al (2004)
Analysis of Covariance
• Combination of 1-Way ANOVA and Linear
Regression
• Goal: Comparing numeric responses among k
groups, adjusting for numeric concomitant
variable(s), referred to as Covariate(s)
• Clinical trial applications: Response is Post-Trt
score, covariate is Pre-Trt score
• Epidemiological applications: Outcomes
compared across exposure conditions, adjusted for
other risk factors (age, smoking status, sex,...)