CHAPTER 16
Regression
Regression
The statistical technique for finding the best-fitting straight
line for a set of data
• Allows us to make
predictions based on
correlations
• A linear relationship
between two variables
allows the computation
of an equation that
provides a precise,
mathematical description
of the relationship: Y = bX + a
(Figure: scatterplot with its regression line)
The Relationship Between
Correlation and Regression
Both examine the relationship/association
between two variables
Both involve an X and Y variable for each
individual (one pair of scores)
Differences in practice
Correlation
Used to determine the
relationship between
two variables
Regression
Used to make
predictions about one
variable based on the
value of another
The Linear Equation:
Expresses a linear relationship between variables X and Y
• X: represents any given score on X
• Y: represents the corresponding score for Y based on X
• a: the Y-intercept
• Determines what the
value of Y equals when X = 0
• Where the line crosses the
Y-axis
• b: the slope constant
• How much the Y variable
will change when X is
increased by one point
• The direction and degree of the line’s tilt
Y = bX + a
Prediction using Regression
A local video store charges a
$5/month membership fee
which allows video rentals at
$2 each
• How much will I spend per
month?
• If you never rent a video (X = 0)
• If you rent 3 videos/mo (X = 3)
• If you rent 8 videos/mo (X = 8)
Y = bX + a
Y = 2X + 5
Y = 2(0) + 5 = 5
Y = 2(3) + 5 = 11
Y = 2(8) + 5 = 21
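The video-store equation above can be sketched as a short Python function (a minimal illustration; the function name is ours, not the slides'):

```python
# Monthly cost Y = bX + a for the video store: b = 2 (per rental), a = 5 (fee)
def monthly_cost(rentals, rate=2, fee=5):
    """Return the monthly cost Y for X = rentals."""
    return rate * rentals + fee

for x in (0, 3, 8):
    print(x, monthly_cost(x))  # 0 rentals -> 5, 3 -> 11, 8 -> 21
```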
Graphing linear equations
X = 0: Y = 5(0) + 60 = 60
X = 3: Y = 5(3) + 60 = 75
The intercept (a) is 60
(when X = 0, Y = 60)
The slope (b) is 5
(as X increases by one point, Y
increases by 5 points)
(Figure: graph of the line for X = 0 to 4, Y-axis from 0 to 80)
• To graph the line below,
we only need to find two
pairs of scores for X and Y,
and then draw the straight
line that connects them
Y = 5X + 60
The Regression Line
The line through the data points that ‘best fit’ the data
(assuming a linear relationship)
1. Makes the relationship
between two variables
easier to see (and
describe)
2. Identifies the ‘central
tendency’ of the relationship
between the variables
3. Can be used for prediction
• Best fit: the line that minimizes the total squared distance
between the points and the line
‘Best fit’
Regression
Line
Correlation and the regression line
0
1
2
3
4
5
6
7
8
0 1 2 3 4 5
• The magnitude of the
correlation coefficient (r ) is
an indicator of how well
the points aggregate
around the regression line
• What would a perfect
correlation look like?
The Distance Between a Point and the Line
Each data point will have its
own distance from the
regression line (a.k.a. error)
Y: the actual value of Y shown in
the data for a given X
Ŷ: the value of Y predicted for a
given X from your linear
equation
Distance = Y − Ŷ
How well does the line fit the data?
• How well a set of data points fits a straight line
can be measured by calculating the distance
(error) between the line and each data point
Error = Y − Ŷ
(Ŷ is read "y-hat")
How well does the line fit the data?
• Some of the distances will be positive and some
negative, so to find a total value we must square
each distance (remember SS)
Total squared error
(SS residual): Σ(Y − Ŷ)²
Remember, this is
the sum of all the
squared distances
The Regression Line
The line through the data points that ‘best fit’ the data
(assuming a linear relationship)
The Least-
Squared-Error
Solution
A.k.a.
• The “best fit”
regression line
• minimizes the distance
of each point from the line
• Gives the best prediction
of Y
• The Least-Squared-Error
Solution
• Results in the smallest possible
value for the total squared error: Ŷ = bX + a
Solving the regression equation
abXY ˆ
Remember:
n
YX
XYSP


x
y
x s
s
r
SS
SP
b 
XY bMMa 
meanM
I interrupt our regularly scheduled
program for a brief announcement….
‘Memba these?
We have spent the semester
utilizing the Computational
Formulas for all Sum of Squares
For sanity’s sake, we will now be
utilizing the definitional formulas
for all
Computational formulas:
SSX = ΣX² − (ΣX)²/n
SSY = ΣY² − (ΣY)²/n
SP = ΣXY − (ΣX)(ΣY)/n
Definitional formulas:
SSX = Σ(X − MX)²
SSY = Σ(Y − MY)²
SP = Σ(X − MX)(Y − MY)
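As a quick sanity check, both forms give identical results on the same data. A minimal Python sketch (using the Example 16.1 scores that appear later in the deck):

```python
X = [2, 6, 0, 4, 7, 5, 5, 3]
Y = [3, 11, 6, 6, 12, 7, 10, 9]
n = len(X)
MX, MY = sum(X) / n, sum(Y) / n

# Definitional formulas: work directly with deviations from the mean
SSX_def = sum((x - MX) ** 2 for x in X)
SP_def = sum((x - MX) * (y - MY) for x, y in zip(X, Y))

# Computational formulas: work with raw sums
SSX_comp = sum(x * x for x in X) - sum(X) ** 2 / n
SP_comp = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n

print(SSX_def, SSX_comp, SP_def, SP_comp)  # all 36 here
```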
And now back to our regularly
scheduled programming…..
Solving the regression equation
abXY ˆ
Remember:
x
y
x s
s
r
SS
SP
b 
XY bMMa 
meanM
  YX MYMXSP 
Let’s Try One!
(Example 16.1, p.563, using the definitional formula)
Scores      Error              Products            Squared Error
X    Y      X − MX   Y − MY    (X − MX)(Y − MY)    (X − MX)²   (Y − MY)²
2    3        -2       -5            10                4           25
6   11         2        3             6                4            9
0    6        -4       -2             8               16            4
4    6         0       -2             0                0            4
7   12         3        4            12                9           16
5    7         1       -1            -1                1            1
5   10         1        2             2                1            4
3    9        -1        1            -1                1            1
∑X = 32, MX = 4;  ∑Y = 64, MY = 8;  SP = 36;  SSX = 36;  SSY = 64
Find b and a in the regression equation
We know: MX = 4, SSX = 36; MY = 8, SSY = 64; SP = 36
b = SP/SSX = 36/36 = 1
a = MY − bMX = 8 − 1(4) = 4
Ŷ = bX + a = 1X + 4 = X + 4
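The same slope and intercept can be checked in a few lines of Python (an illustrative sketch of the definitional-formula computation):

```python
X = [2, 6, 0, 4, 7, 5, 5, 3]
Y = [3, 11, 6, 6, 12, 7, 10, 9]
n = len(X)
MX, MY = sum(X) / n, sum(Y) / n

SSX = sum((x - MX) ** 2 for x in X)                     # 36
SP = sum((x - MX) * (y - MY) for x, y in zip(X, Y))     # 36

b = SP / SSX      # 36 / 36 = 1
a = MY - b * MX   # 8 - 1(4) = 4
print(b, a)       # slope 1, intercept 4, so Y-hat = X + 4
```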
Making Predictions
We use the regression equation to make predictions.
• For the previous example: Ŷ = X + 4
• Thus, an individual with a score of X = 3 would be
predicted to have a Y score of: Ŷ = 3 + 4 = 7
However, keep in mind:
1. The predicted value will not be perfect unless the correlation is
perfect (the data points are not perfectly in line)
• Least error is NOT the absence of error
2. The regression equation should not be used to make predictions for
X values outside the range of the original data
Standardizing the Regression Equation
The standardized form of the regression equation
utilizes z-scores (standardized scores) in place of raw
scores:
Note:
1. We are now using the z-score for each X value (zx) to predict the
z-score for the corresponding Y value (zy)
2. The slope constant that was b is now identified as β (“beta”)
• The slope for standardized variables: a one-standard-deviation
change in X produces a change of β standard deviations in Y
• For an equation with two variables, β = Pearson r
3. There is no longer a constant (a) in the equation
because z-scores have a mean of 0
xy zz ˆ
xy bMMa 
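The claim that β equals Pearson r for a single predictor can be verified numerically. A small Python sketch (illustrative; it computes r directly from z-scores for the Example 16.1 data):

```python
def zscores(values):
    """Standardize a list of scores (population SD)."""
    m = sum(values) / len(values)
    sd = (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - m) / sd for v in values]

X = [2, 6, 0, 4, 7, 5, 5, 3]
Y = [3, 11, 6, 6, 12, 7, 10, 9]
zx, zy = zscores(X), zscores(Y)

# Pearson r is the mean product of paired z-scores
r = sum(u * w for u, w in zip(zx, zy)) / len(zx)   # 0.75 for these data

# Standardized prediction: z-hat_Y = beta * z_X, with beta = r
predicted_zy = [r * z for z in zx]
```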
The Accuracy of the Predictions
• These plots of two different sets of data have the same
regression equation
The regression equation does not
provide any information about the
accuracy of the predictions!
The Standard Error of the Estimate
Provides a measure of the standard distance between a
regression line (the predicted Y values) and the actual data
points (the actual Y values)
• Very similar to the standard deviation
• Answers the question:
How accurately does the regression equation predict the
observed Y values?
sY.X = √(SSresidual/df) = √(Σ(Y − Ŷ)²/(n − 2))
Let’s Compute the Standard Error of
Estimate (Example 16.1, p.563, using the definitional formula)
Data        Predicted Ŷ     Residual      Squared Residual
X    Y      (Ŷ = X + 4)      Y − Ŷ           (Y − Ŷ)²
2    3           6             -3                9
6   11          10              1                1
0    6           4              2                4
4    6           8             -2                4
5    7           9             -2                4
7   12          11              1                1
5   10           9              1                1
3    9           7              2                4
                          Σ(Y − Ŷ) = 0    SSresidual = 28

sY.X = √(SSresidual/df) = √(Σ(Y − Ŷ)²/(n − 2)) = √(28/6) = √4.67 ≈ 2.16
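The whole table above collapses to a few lines of Python (an illustrative sketch of the definitional computation):

```python
X = [2, 6, 0, 4, 5, 7, 5, 3]
Y = [3, 11, 6, 6, 7, 12, 10, 9]

predicted = [x + 4 for x in X]                    # Y-hat = X + 4
residuals = [y - p for y, p in zip(Y, predicted)]
ss_residual = sum(e ** 2 for e in residuals)      # 28

n = len(X)
std_error = (ss_residual / (n - 2)) ** 0.5        # sqrt(28/6) ~ 2.16
```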
Relationship Between the Standard
Error of the Estimate and Correlation
• r² = proportion of predicted variability
• Variability in Y that is predicted by its relationship with X
• (1 − r²) = proportion of unpredicted variability
So, if r = 0.80, then the predicted variability is r² = 0.64
• 64% of the total variability for Y scores can be predicted by X
• And the unpredicted variability is the remaining 36% (1 − r² = 0.36)
predicted variability = SSregression = r²·SSY
unpredicted variability = SSresidual = (1 − r²)·SSY
An Easier Way to Compute SSresidual
sY.X = √(SSresidual/df) = √((1 − r²)·SSY/(n − 2))
Instead of computing individual error values, it is easier to
simply use the unpredicted-variability formula for SSresidual
These are the steps we just went through to
compute the Standard Error of Estimate
Data        Predicted Ŷ     Residual      Squared Residual
X    Y      (Ŷ = X + 4)      Y − Ŷ           (Y − Ŷ)²
2    3           6             -3                9
6   11          10              1                1
0    6           4              2                4
4    6           8             -2                4
5    7           9             -2                4
7   12          11              1                1
5   10           9              1                1
3    9           7              2                4
                          Σ(Y − Ŷ) = 0    SSresidual = 28

sY.X = √(SSresidual/df) = √(Σ(Y − Ŷ)²/(n − 2)) = √(28/6) = √4.67 ≈ 2.16
Now let’s do it using the easier formula
• We know SSX = 36, SSY = 64, and SP = 36 because we
calculated it a few slides back:
Scores      Error              Products            Squared Error
X    Y      X − MX   Y − MY    (X − MX)(Y − MY)    (X − MX)²   (Y − MY)²
2    3        -2       -5            10                4           25
6   11         2        3             6                4            9
0    6        -4       -2             8               16            4
4    6         0       -2             0                0            4
7   12         3        4            12                9           16
5    7         1       -1            -1                1            1
5   10         1        2             2                1            4
3    9        -1        1            -1                1            1
∑X = 32, MX = 4;  ∑Y = 64, MY = 8;  SP = 36;  SSX = 36;  SSY = 64
Using those figures, we can compute:
• With SSY = 64 and a correlation of 0.75, the predicted
variability from the regression equation is:
r = SP/√(SSX·SSY) = 36/√(36·64) = 36/√2304 = 36/48 = 0.75
SSregression = r²·SSY = 0.75²(64) = 0.5625(64) = 36
• And the unpredicted variability is:
SSresidual = (1 − r²)·SSY = (1 − 0.5625)(64) = (0.4375)(64) = 28
• This is the same value we found working with our table!
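The shortcut can be sketched in Python as well (illustrative; it reuses the summary figures SSX = 36, SSY = 64, SP = 36 from the table):

```python
SSX, SSY, SP = 36, 64, 36

r = SP / (SSX * SSY) ** 0.5        # 36 / 48 = 0.75
ss_regression = r ** 2 * SSY       # 0.5625 * 64 = 36
ss_residual = (1 - r ** 2) * SSY   # 0.4375 * 64 = 28, no per-point residuals needed
```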
CHAPTER 16.2
Analysis of Regression:
Testing the Significance of the Regression Equation
Analysis of Regression
• Uses an F-ratio to determine whether the variance
predicted by the regression equation is significantly
greater than would be expected if there was no
relationship between X and Y.
F =
variance in Y predicted by the regression equation
unpredicted variance in the Y scores
F =
systematic changes in Y resulting from changes in X
changes in Y that are independent from changes in X
Significance testing
H0: The regression equation does not account for a
significant proportion of variance in the Y scores
H1: The equation does account for a significant
proportion of variance in the Y scores
MSregression = SSregression/dfregression ; df = 1
MSresidual = SSresidual/dfresidual ; df = n − 2
F = MSregression/MSresidual
Find and evaluate the critical F-value the same as for
ANOVA (df = # of predictors, n − 2)
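For the running example (SSregression = 36, SSresidual = 28, n = 8), the F-ratio works out as below. A minimal Python sketch:

```python
SS_regression, SS_residual, n = 36, 28, 8

MS_regression = SS_regression / 1       # df_regression = 1 (one predictor)
MS_residual = SS_residual / (n - 2)     # df_residual = 6, MS ~ 4.67
F = MS_regression / MS_residual         # ~ 7.71

# Compare F to the critical value for df = (1, 6); at alpha = .05 that is 5.99,
# so this regression accounts for a significant proportion of the Y variance.
```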
Coming up next…
• Wednesday lab
• Lab #9: Using SPSS for correlation and regression
• HW #9 is due in the beginning of class
• Read the second half of Chapter 16 (pp.572-581)
CHAPTER 16.3
Introduction to Multiple Regression with Two Predictor
Variables
Multiple
Regression
with Two
Predictor
Variables
• 40% of the variance in Academic Performance can be
predicted by IQ scores
• 30% of the variance in academic performance can be
predicted from SAT scores
• IQ and SAT also overlap: SAT contributes only an additional
10% beyond what is already predicted by IQ
Predicting the variance
in academic
performance from IQ
and SAT scores
Multiple Regression
When you have more than one predictor variable
Considering the two-predictor model:
For standardized scores:
Ŷ = b1·X1 + b2·X2 + a
ẑY = β1·zX1 + β2·zX2
Calculations for two-predictor
regression coefficients:
Where:
• SSX1= sum of squared
deviations for X1
• SSX2= sum of squared
deviations for X2
• SPX1Y= sum of products
of deviations for X1 and Y
• SPX2Y= sum of products
of deviations for X2 and Y
• SPX1X2 = sum of products
of deviations for X1 and X2
b1 = (SSX2·SPX1Y − SPX1X2·SPX2Y) / (SSX1·SSX2 − (SPX1X2)²)
b2 = (SSX1·SPX2Y − SPX1X2·SPX1Y) / (SSX1·SSX2 − (SPX1X2)²)
a = MY − b1·MX1 − b2·MX2
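These coefficient formulas can be checked by generating noiseless data from known coefficients and recovering them. A Python sketch (the data and the "true" coefficients 2, −1, 3 are made up for illustration):

```python
# Generate Y exactly from Y = 2*X1 - 1*X2 + 3, then recover b1, b2, a.
X1 = [1, 2, 3, 4, 5, 6]
X2 = [2, 1, 4, 3, 6, 5]
Y = [2 * x1 - x2 + 3 for x1, x2 in zip(X1, X2)]

n = len(Y)
M1, M2, MY = sum(X1) / n, sum(X2) / n, sum(Y) / n

SS1 = sum((x - M1) ** 2 for x in X1)
SS2 = sum((x - M2) ** 2 for x in X2)
SP1Y = sum((x - M1) * (y - MY) for x, y in zip(X1, Y))
SP2Y = sum((x - M2) * (y - MY) for x, y in zip(X2, Y))
SP12 = sum((u - M1) * (v - M2) for u, v in zip(X1, X2))

den = SS1 * SS2 - SP12 ** 2
b1 = (SS2 * SP1Y - SP12 * SP2Y) / den   # recovers 2
b2 = (SS1 * SP2Y - SP12 * SP1Y) / den   # recovers -1
a = MY - b1 * M1 - b2 * M2              # recovers 3
```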
R²
Percentage of variance accounted for by a
multiple-regression equation
R² = SSregression/SSY = (b1·SPX1Y + b2·SPX2Y)/SSY
• Proportion of unpredicted variability:
(1 − R²) = SSresidual/SSY
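A short numeric illustration of these two identities (the figures SSY = 64 and b1·SPX1Y + b2·SPX2Y = 48 are hypothetical, chosen only to make the arithmetic clean):

```python
SSY = 64
ss_regression = 48            # assumed value of b1*SP_X1Y + b2*SP_X2Y

R2 = ss_regression / SSY      # 48 / 64 = 0.75
ss_residual = (1 - R2) * SSY  # 0.25 * 64 = 16
```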
Standard error of the
estimate
Significance testing
(2-predictors)
sY.X1X2 = √MSresidual
MSresidual = SSresidual/dfresidual ; dfresidual = n − 3
MSregression = SSregression/2
F = MSregression/MSresidual ; df = (2, n − 3)
** With 3+ predictors, dfregression = # of predictors
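Continuing the hypothetical two-predictor figures (SSregression = 48, SSresidual = 16, with an assumed sample size of n = 10), the standard error and F-ratio come out as follows:

```python
n = 10
SS_regression, SS_residual = 48, 16    # hypothetical figures

MS_regression = SS_regression / 2      # df_regression = 2 predictors
MS_residual = SS_residual / (n - 3)    # df_residual = n - 3 = 7
std_error = MS_residual ** 0.5         # s_Y.X1X2 = sqrt(MS_residual)
F = MS_regression / MS_residual        # evaluated against F(2, n - 3)
```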
Evaluating the Contribution of Each
Predictor Variable
• With a multiple regression, we can evaluate the
contribution of each predictor variable
• Does variable X1 make a significant contribution
beyond what is already predicted by variable X2?
• Does variable X2 make a significant contribution
beyond what is already predicted by variable X1?
• This is useful if we want to control for a third variable and
any confounding effects