Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 12-1
Chapter 12
Simple Linear Regression
Statistics for Managers Using Microsoft® Excel, 4th Edition

Chapter Goals
After completing this chapter, you should be able to:
• Explain the simple linear regression model
• Obtain and interpret the simple linear regression equation for a set of data
• Evaluate regression residuals for aptness of the fitted model
• Understand the assumptions behind regression analysis
• Explain measures of variation and determine whether the independent variable is significant

Chapter Goals (continued)
After completing this chapter, you should be able to:
• Calculate and interpret confidence intervals for the regression coefficients
• Use the Durbin-Watson statistic to check for autocorrelation
• Form confidence and prediction intervals around an estimated Y value for a given X

Correlation vs. Regression
• A scatter plot (or scatter diagram) can be used to show the relationship between two variables
• Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
  • Correlation is only concerned with the strength of the relationship
  • No causal effect is implied by correlation
  • Correlation was first presented in Chapter 3

Introduction to Regression Analysis
• Regression analysis is used to:
  • Predict the value of a dependent variable based on the value of at least one independent variable
  • Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain the dependent variable

Simple Linear Regression Model
• Only one independent variable, X
• The relationship between X and Y is described by a linear function
• Changes in Y are assumed to be caused by changes in X

Types of Relationships
[Figure: scatter plots of Y against X showing linear relationships and curvilinear relationships]

Types of Relationships (continued)
[Figure: scatter plots of Y against X showing strong relationships and weak relationships]

Types of Relationships (continued)
[Figure: scatter plots of Y against X showing no relationship]

Simple Linear Regression Model
The population regression model:
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
where:
\( Y_i \) = dependent variable
\( \beta_0 \) = population Y intercept
\( \beta_1 \) = population slope coefficient
\( X_i \) = independent variable
\( \varepsilon_i \) = random error term
Linear component: \( \beta_0 + \beta_1 X_i \)
Random error component: \( \varepsilon_i \)

Simple Linear Regression Model (continued)
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
[Figure: population regression line of Y against X with intercept = \( \beta_0 \) and slope = \( \beta_1 \); at a given \( X_i \), the random error \( \varepsilon_i \) is the vertical distance between the observed value of Y for \( X_i \) and the predicted value of Y for \( X_i \)]

Simple Linear Regression Equation
The simple linear regression equation provides an estimate of the population regression line:
\[ \hat{Y}_i = b_0 + b_1 X_i \]
where:
\( \hat{Y}_i \) = estimated (or predicted) Y value for observation i
\( b_0 \) = estimate of the regression intercept
\( b_1 \) = estimate of the regression slope
\( X_i \) = value of X for observation i
The individual random error terms \( e_i \) have a mean of zero

Least Squares Method
• b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared differences between \( Y \) and \( \hat{Y} \):
\[ \min \sum (Y_i - \hat{Y}_i)^2 = \min \sum \left( Y_i - (b_0 + b_1 X_i) \right)^2 \]

Finding the Least Squares Equation
• The coefficients b0 and b1, and other regression results in this chapter, will be found using Excel
Formulas are shown in the text at the end of the chapter for those who are interested

Interpretation of the Slope and the Intercept
• b0 is the estimated average value of Y when the value of X is zero
• b1 is the estimated change in the average value of Y as a result of a one-unit change in X

Simple Linear Regression Example
• A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
• A random sample of 10 houses is selected
  • Dependent variable (Y) = house price in $1000s
  • Independent variable (X) = square feet

Sample Data for House Price Model
House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
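
The slides obtain b0 and b1 from Excel. As a cross-check, the least squares estimates can also be sketched directly from the sample data using the standard closed-form solution (slope = sum of cross-deviations divided by SSX). The function name `fit_line` and the variable names are illustrative and do not come from the slides.

```python
# Sample data from the slide: size in square feet (X), price in $1000s (Y).
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def fit_line(x, y):
    """Return (b0, b1) minimizing the sum of squared differences Y - (b0 + b1*X)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)                        # sum of squares of X
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # cross-deviations
    b1 = sxy / ssx              # estimated slope
    b0 = y_bar - b1 * x_bar     # estimated intercept
    return b0, b1


b0, b1 = fit_line(sqft, price)
print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")
```

The printed values should agree, to the displayed precision, with the Excel coefficients used in the rest of the chapter (98.24833 and 0.10977).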

Graphical Presentation
• House price model: scatter plot
[Figure: scatter plot of House Price ($1000s), axis 0 to 450, against Square Feet, axis 0 to 3000]

Graphical Presentation
• House price model: scatter plot and regression line
[Figure: the same scatter plot with the fitted regression line; slope = 0.10977, intercept = 98.248]
\[ \text{house price} = 98.24833 + 0.10977\,(\text{square feet}) \]

Interpretation of the Intercept, b0
\[ \text{house price} = 98.24833 + 0.10977\,(\text{square feet}) \]
• b0 is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values)
  • Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet

Interpretation of the Slope Coefficient, b1
\[ \text{house price} = 98.24833 + 0.10977\,(\text{square feet}) \]
• b1 measures the estimated change in the average value of Y as a result of a one-unit change in X
  • Here, b1 = 0.10977 tells us that the value of a house increases by 0.10977($1000) = $109.77, on average, for each additional square foot of size

Predictions Using Regression Analysis
Predict the price for a house with 2000 square feet:
\[ \text{house price} = 98.25 + 0.1098\,(\text{sq. ft.}) = 98.25 + 0.1098(2000) = 317.85 \]
The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850
(This uses the rounded coefficients 98.25 and 0.1098; the unrounded coefficients 98.24833 and 0.10977 give 317.78.)

Interpolation vs. Extrapolation
• When using a regression model for prediction, only predict within the relevant range of data
[Figure: scatter plot of House Price ($1000s) against Square Feet with the regression line; the span of observed X values is marked as the relevant range for interpolation]
Do not try to extrapolate beyond the range of observed X's
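
One way to enforce the relevant range in practice is to have the prediction routine refuse X values outside the observed data. The sketch below assumes the fitted coefficients from the Excel output and the observed range of the sample (1100 to 2450 square feet); the function name `predict_price` and its parameters are hypothetical.

```python
def predict_price(square_feet, b0=98.24833, b1=0.10977, x_min=1100, x_max=2450):
    """Predict house price in $1000s, but only inside the observed range of X."""
    if not x_min <= square_feet <= x_max:
        # Outside the sample range the fitted line is an extrapolation.
        raise ValueError(
            f"{square_feet} sq. ft. is outside the relevant range [{x_min}, {x_max}]"
        )
    return b0 + b1 * square_feet


print(round(predict_price(2000), 2))  # interpolation: inside the observed range

try:
    predict_price(3000)               # extrapolation: rejected
except ValueError as err:
    print("refused:", err)
```

Raising an error is only one design choice; returning the prediction together with a warning flag would serve the same purpose.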

Measures of Variation
• Total variation is made up of two parts:
\[ SST = SSR + SSE \]
SST = total sum of squares, SSR = regression sum of squares, SSE = error sum of squares
\[ SST = \sum (Y_i - \bar{Y})^2 \qquad SSR = \sum (\hat{Y}_i - \bar{Y})^2 \qquad SSE = \sum (Y_i - \hat{Y}_i)^2 \]
where:
\( \bar{Y} \) = average value of the dependent variable
\( Y_i \) = observed values of the dependent variable
\( \hat{Y}_i \) = predicted value of Y for the given \( X_i \) value

Measures of Variation (continued)
• SST = total sum of squares
  • Measures the variation of the \( Y_i \) values around their mean \( \bar{Y} \)
• SSR = regression sum of squares
  • Explained variation attributable to the relationship between X and Y
• SSE = error sum of squares
  • Variation attributable to factors other than the relationship between X and Y
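
Measures of Variation (continued)
[Figure: regression line of Y against X; at a given \( X_i \), the deviation of \( Y_i \) from \( \bar{Y} \) is split into the part between \( \hat{Y}_i \) and \( \bar{Y} \) and the part between \( Y_i \) and \( \hat{Y}_i \)]
\[ SST = \sum (Y_i - \bar{Y})^2 \]
\[ SSE = \sum (Y_i - \hat{Y}_i)^2 \]
\[ SSR = \sum (\hat{Y}_i - \bar{Y})^2 \]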

Coefficient of Determination, r²
• The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
• The coefficient of determination is also called r-squared and is denoted as r²
\[ r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}} \]
note: \( 0 \le r^2 \le 1 \)

Examples of Approximate r² Values
[Figure: two scatter plots of Y against X, each with r² = 1]
r² = 1
Perfect linear relationship between X and Y: 100% of the variation in Y is explained by variation in X

Examples of Approximate r² Values
[Figure: two scatter plots of Y against X]
0 < r² < 1
Weaker linear relationships between X and Y: some but not all of the variation in Y is explained by variation in X

Examples of Approximate r² Values
[Figure: scatter plot of Y against X with r² = 0]
r² = 0
No linear relationship between X and Y: the value of Y does not depend on X. (None of the variation in Y is explained by variation in X)

Excel Output
Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

\[ r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082 \]
58.08% of the variation in house prices is explained by variation in square feet
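
The decomposition SST = SSR + SSE and the resulting r² can be reproduced from the raw data. This is a minimal sketch that refits the line with the closed-form least squares formulas; all variable names are illustrative.

```python
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

# Least squares coefficients.
ssx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price)) / ssx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in sqft]

sst = sum((y - y_bar) ** 2 for y in price)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(price, y_hat))  # unexplained variation
r_squared = ssr / sst

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}, r^2 = {r_squared:.5f}")
```

The three sums should match the SS column of the ANOVA table, and r² should match the "R Square" entry.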

Standard Error of Estimate
• The standard deviation of the variation of observations around the regression line is estimated by
\[ S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}} \]
where
SSE = error sum of squares
n = sample size

Comparing Standard Errors
[Figure: two scatter plots of Y against X with fitted lines, one with small \( S_{YX} \) and one with large \( S_{YX} \)]
\( S_{YX} \) is a measure of the variation of observed Y values from the regression line
The magnitude of \( S_{YX} \) should always be judged relative to the size of the Y values in the sample data
e.g., \( S_{YX} \) = $41.33K is moderately small relative to house prices in the $200K to $300K range
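
As a worked check, the standard error of the estimate can be computed from the SSE reported in the ANOVA table and then compared with the average house price in the sample. The ratio printed at the end is only an informal way of judging the magnitude of S_YX relative to the Y values; it is not a statistic defined in the slides.

```python
import math

price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # Y, in $1000s
n = len(price)

sse = 13665.5652                 # error sum of squares from the Excel ANOVA table
s_yx = math.sqrt(sse / (n - 2))  # standard error of the estimate

mean_price = sum(price) / n
relative_size = s_yx / mean_price  # informal scale check (an assumption, not from the slides)

print(f"S_YX = {s_yx:.5f} ($1000s)")
print(f"S_YX is about {relative_size:.1%} of the mean price {mean_price:.1f}")
```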

Assumptions of Regression
• Normality of Error
  • Error values (ε) are normally distributed for any given value of X
• Homoscedasticity
  • The probability distribution of the errors has constant variance
• Independence of Errors
  • Error values are statistically independent

Residual Analysis
\[ e_i = Y_i - \hat{Y}_i \]
• The residual for observation i, \( e_i \), is the difference between its observed and predicted value
• Check the assumptions of regression by examining the residuals
  • Examine for linearity assumption
  • Examine for constant variance for all levels of X (homoscedasticity)
  • Evaluate normal distribution assumption
  • Evaluate independence assumption
• Graphical Analysis of Residuals
  • Can plot residuals vs. X

Residual Analysis for Linearity
[Figure: plots of Y against x and of residuals against x for two cases, labeled "Not Linear" and "Linear"]

Residual Analysis for Homoscedasticity
[Figure: plots of Y against x and of residuals against x for two cases, labeled "Non-constant variance" and "Constant variance"]

Residual Analysis for Independence
[Figure: plots of residuals against X, labeled "Not Independent" and "Independent"]

Excel Residual Output
RESIDUAL OUTPUT
Observation   Predicted House Price   Residuals
1             251.92316               -6.923162
2             273.87671               38.12329
3             284.85348               -5.853484
4             304.06284               3.937162
5             218.99284               -19.99284
6             268.38832               -49.38832
7             356.20251               48.79749
8             367.17929               -43.17929
9             254.6674                64.33264
10            284.85348               -29.85348
[Figure: House Price Model Residual Plot; Residuals (axis -60 to 80) against Square Feet (axis 0 to 3000)]
Does not appear to violate any regression assumptions
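
The residual table can be regenerated from the data with the definition e_i = Y_i - Ŷ_i. The sketch below refits the line and lists predicted values and residuals in the same order as the Excel output; names are illustrative.

```python
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * x for x in sqft]
residuals = [y - yh for y, yh in zip(price, predicted)]  # e_i = Y_i - Y_hat_i

for i, (yh, e) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i:2d}  {yh:10.5f}  {e:10.5f}")

# With an intercept in the model, least squares residuals sum to zero
# (up to floating-point rounding).
print("sum of residuals:", round(sum(residuals), 8))
```

Plotting `residuals` against `sqft` would reproduce the residual plot; no plotting library is used here to keep the sketch dependency-free.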

Measuring Autocorrelation: The Durbin-Watson Statistic
• Used when data are collected over time to detect if autocorrelation is present
• Autocorrelation exists if residuals in one time period are related to residuals in another period

Autocorrelation
• Autocorrelation is correlation of the errors (residuals) over time
[Figure: Time (t) Residual Plot; Residuals (axis -15 to 15) against Time (t) (axis 0 to 8)]
• Here, residuals show a cyclic pattern, not random
• Violates the regression assumption that residuals are random and independent

The Durbin-Watson Statistic
• The Durbin-Watson statistic is used to test for autocorrelation
H0: residuals are not correlated
H1: autocorrelation is present
\[ D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} \]
• The possible range is 0 ≤ D ≤ 4
• D should be close to 2 if H0 is true
• D less than 2 may signal positive autocorrelation, D greater than 2 may signal negative autocorrelation

Testing for Positive Autocorrelation
H0: positive autocorrelation does not exist
H1: positive autocorrelation is present
• Calculate the Durbin-Watson test statistic = D (the Durbin-Watson statistic can be found using PHStat in Excel)
• Find the values dL and dU from the Durbin-Watson table (for sample size n and number of independent variables k)
Decision rule: reject H0 if D < dL
Regions on the D axis: from 0 to dL, reject H0; from dL to dU, inconclusive; from dU to 2, do not reject H0

Testing for Positive Autocorrelation (continued)
• Example with n = 25:
[Figure: scatter plot of Sales (axis 0 to 160) against Time (axis 0 to 30) with fitted line y = 30.65 + 4.7038x, R² = 0.8976]
Excel/PHStat output:
Durbin-Watson Calculations
Sum of Squared Difference of Residuals    3296.18
Sum of Squared Residuals                  3279.98
Durbin-Watson Statistic                   1.00494
\[ D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} = \frac{3296.18}{3279.98} = 1.00494 \]

Testing for Positive Autocorrelation (continued)
• Here, n = 25 and there is k = 1 independent variable
• Using the Durbin-Watson table, dL = 1.29 and dU = 1.45
• D = 1.00494 < dL = 1.29, so reject H0 and conclude that significant positive autocorrelation exists
• Therefore the linear model is not the appropriate model to forecast sales
Decision: reject H0 since D = 1.00494 < dL
Regions on the D axis: from 0 to dL = 1.29, reject H0; from dL = 1.29 to dU = 1.45, inconclusive; from dU = 1.45 to 2, do not reject H0
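
The Durbin-Watson formula and the decision rule for positive autocorrelation can be sketched as two small functions. The function names are hypothetical; dL and dU must still be looked up in a Durbin-Watson table, and the values 1.29 and 1.45 below are the ones quoted in the example. The individual residuals for the sales example are not given in the slides, so the first call uses a made-up alternating series purely to exercise the formula.

```python
def durbin_watson(residuals):
    """D = sum of squared successive differences / sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def positive_autocorrelation_decision(d, d_lower, d_upper):
    """Apply the slide's rule for H0: positive autocorrelation does not exist."""
    if d < d_lower:
        return "reject H0"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"


# Illustrative residuals (not from the slides): a strictly alternating pattern.
print(durbin_watson([1, -1, 1, -1]))

# Sales example: only the two sums are reported, so D is formed from them.
d_sales = 3296.18 / 3279.98
print(round(d_sales, 5), positive_autocorrelation_decision(d_sales, 1.29, 1.45))
```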

Inferences About the Slope
• The standard error of the regression slope coefficient (b1) is estimated by
\[ S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}} \]
where:
\( S_{b_1} \) = estimate of the standard error of the least squares slope
\( S_{YX} = \sqrt{\frac{SSE}{n-2}} \) = standard error of the estimate

Excel Output
              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
\[ S_{b_1} = 0.03297 \]

Comparing Standard Errors of the Slope
\( S_{b_1} \) is a measure of the variation in the slope of regression lines from different possible samples
[Figure: two plots of Y against X, one with small \( S_{b_1} \) and one with large \( S_{b_1} \)]
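
For the house price data, the standard error of the slope can be checked by combining S_YX from the Excel output with SSX computed from the X values. This is a short sketch with illustrative names.

```python
import math

sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

s_yx = 41.33032                            # standard error of the estimate (Excel output)
x_bar = sum(sqft) / len(sqft)
ssx = sum((x - x_bar) ** 2 for x in sqft)  # sum of squared deviations of X

s_b1 = s_yx / math.sqrt(ssx)               # standard error of the slope
print(f"SSX = {ssx:.0f}, S_b1 = {s_b1:.5f}")
```

The formula shows why a wider spread of X values (larger SSX) gives a more precisely estimated slope for the same scatter around the line.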

Inference about the Slope: t Test
• t test for a population slope
  • Is there a linear relationship between X and Y?
• Null and alternative hypotheses
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (linear relationship does exist)
• Test statistic
\[ t = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad \text{d.f.} = n - 2 \]
where:
\( b_1 \) = regression slope coefficient
\( \beta_1 \) = hypothesized slope
\( S_{b_1} \) = standard error of the slope

Inference about the Slope: t Test (continued)
House Price in $1000s (y)    Square Feet (x)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
Estimated Regression Equation:
\[ \text{house price} = 98.25 + 0.1098\,(\text{sq. ft.}) \]
The slope of this model is 0.1098
Does square footage of the house affect its sales price?

Inferences about the Slope: t Test Example
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039
In the Square Feet row, Coefficients is \( b_1 \), Standard Error is \( S_{b_1} \), and t Stat is t
\[ t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938 \]

Inferences about the Slope: t Test Example (continued)
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039
Test Statistic: t = 3.329
d.f. = 10 - 2 = 8, α/2 = .025, critical values \( \pm t_{\alpha/2} = \pm 2.3060 \)
[Figure: t distribution centered at 0 with rejection regions below -2.3060 and above 2.3060 (α/2 = .025 in each tail); the test statistic 3.329 lies in the upper rejection region]
Decision: Reject H0
Conclusion: There is sufficient evidence that square footage affects house price

Inferences about the Slope: t Test Example (continued)
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039
P-value = 0.01039
This is a two-tail test, so the p-value is P(t > 3.329) + P(t < -3.329) = 0.01039 (for 8 d.f.)
Decision: P-value < α, so reject H0
Conclusion: There is sufficient evidence that square footage affects house price
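
The t statistic for the slope can be reproduced end to end from the raw data. The sketch below uses only the standard library, so it compares |t| with the critical value 2.3060 quoted on the slides instead of computing a p-value (which would need a t-distribution routine from a statistics package). The function name `slope_t_statistic` is hypothetical.

```python
import math

sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def slope_t_statistic(x, y, hypothesized_slope=0.0):
    """t = (b1 - beta1) / S_b1 with n - 2 degrees of freedom."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))  # standard error of the estimate
    s_b1 = s_yx / math.sqrt(ssx)     # standard error of the slope
    return (b1 - hypothesized_slope) / s_b1


T_CRITICAL = 2.3060  # two-tail critical value for alpha = .05 and 8 d.f., from the slide

t_stat = slope_t_statistic(sqft, price)
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"t = {t_stat:.5f}, reject H0: {reject_h0}")
```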

F-Test for Significance
• F Test statistic:
\[ F = \frac{MSR}{MSE} \]
where
\[ MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n - k - 1} \]
where F follows an F distribution with k numerator and (n - k - 1) denominator degrees of freedom
(k = the number of independent variables in the regression model)

Excel Output
ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000
\[ F = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848 \]
with 1 and 8 degrees of freedom
P-value for the F-Test: Significance F = 0.01039

F-Test for Significance (continued)
H0: β1 = 0
H1: β1 ≠ 0
α = .05
df1 = 1, df2 = 8
Critical Value: \( F_{\alpha} = 5.32 \)
Test Statistic:
\[ F = \frac{MSR}{MSE} = 11.08 \]
[Figure: F distribution with the rejection region (α = .05) to the right of \( F_{.05} = 5.32 \); do not reject H0 to the left]
Decision: Reject H0 at α = 0.05
Conclusion: There is sufficient evidence that house size affects selling price
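
The F statistic follows directly from the ANOVA sums of squares. The sketch below recomputes it from the reported SSR and SSE and also checks a standard identity that the slides do not state explicitly: with a single independent variable, F equals the square of the slope's t statistic. The critical value 5.32 is the one quoted on the slide; variable names are illustrative.

```python
ssr = 18934.9348   # regression sum of squares (Excel ANOVA table)
sse = 13665.5652   # error sum of squares (Excel ANOVA table)
n = 10             # sample size
k = 1              # number of independent variables

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

F_CRITICAL = 5.32  # F(.05; 1, 8) as quoted on the slide
t_stat = 3.32938   # t statistic for the slope from the Excel output

print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F = {f_stat:.4f}")
print("reject H0:", f_stat > F_CRITICAL)
print(f"t^2 = {t_stat ** 2:.4f}")  # agrees with F up to rounding of the inputs
```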

Confidence Interval Estimate for the Slope
Confidence Interval Estimate of the Slope:
\[ b_1 \pm t_{n-2}\, S_{b_1}, \qquad \text{d.f.} = n - 2 \]
Excel Printout for House Prices:
              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
At 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)

Confidence Interval Estimate for the Slope (continued)
              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.70 and $185.80 per square foot of house size
This 95% confidence interval does not include 0.
Conclusion: There is a significant relationship between house price and square feet at the .05 level of significance
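
The interval in the Excel printout can be reproduced from the slope, its standard error, and the critical t value. The sketch takes these three numbers from the slides; variable names are illustrative.

```python
b1 = 0.10977          # estimated slope (Excel output), in $1000s per square foot
s_b1 = 0.03297        # standard error of the slope (Excel output)
t_critical = 2.3060   # t value for 95% confidence and n - 2 = 8 d.f., from the slides

margin = t_critical * s_b1
lower, upper = b1 - margin, b1 + margin

print(f"95% CI for the slope: ({lower:.5f}, {upper:.5f})")
print(f"in dollars per square foot: ${lower * 1000:.2f} to ${upper * 1000:.2f}")
print("interval excludes 0:", not (lower <= 0 <= upper))
```

Because the interval excludes 0, it leads to the same conclusion as the two-tail t test at the .05 level.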

t Test for a Correlation Coefficient
• Hypotheses
H0: ρ = 0 (no correlation between X and Y)
H1: ρ ≠ 0 (correlation exists)
• Test statistic (with n - 2 degrees of freedom)
\[ t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \]
where
\[ r = +\sqrt{r^2} \ \text{if } b_1 > 0, \qquad r = -\sqrt{r^2} \ \text{if } b_1 < 0 \]

Example: House Prices
Is there evidence of a linear relationship between square feet and house price at the .05 level of significance?
H0: ρ = 0 (no correlation)
H1: ρ ≠ 0 (correlation exists)
α = .05, df = 10 - 2 = 8
\[ t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.762 - 0}{\sqrt{\dfrac{1 - 0.762^2}{10 - 2}}} = 3.33 \]

Example: Test Solution
\[ t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.762 - 0}{\sqrt{\dfrac{1 - 0.762^2}{10 - 2}}} = 3.33 \]
d.f. = 10 - 2 = 8, α/2 = .025, critical values \( \pm t_{\alpha/2} = \pm 2.3060 \)
[Figure: t distribution centered at 0 with rejection regions below -2.3060 and above 2.3060 (α/2 = .025 in each tail); the test statistic 3.33 lies in the upper rejection region]
Decision: Reject H0
Conclusion: There is evidence of a linear association at the 5% level of significance
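
The correlation coefficient and its t statistic can be computed from the sample data as a check. In this sketch r is calculated directly from the deviations, so it carries the sign of the slope automatically and the separate sign rule is not needed. Names are illustrative.

```python
import math

sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
ssx = sum((x - x_bar) ** 2 for x in sqft)
sst = sum((y - y_bar) ** 2 for y in price)

r = sxy / math.sqrt(ssx * sst)               # sample correlation coefficient
rho_0 = 0.0                                  # hypothesized population correlation
t_stat = (r - rho_0) / math.sqrt((1 - r ** 2) / (n - 2))

print(f"r = {r:.5f}, t = {t_stat:.5f}")
```

The result equals the t statistic for the slope, which is why the two tests reach the same decision in simple linear regression.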

Estimating Mean Values and Predicting Individual Values
Goal: Form intervals around \( \hat{Y} \) to express uncertainty about the value of Y for a given \( X_i \)
[Figure: regression line \( \hat{Y} = b_0 + b_1 X_i \) plotted against X; at a given \( X_i \), two intervals are drawn around \( \hat{Y} \): the confidence interval for the mean of Y, given \( X_i \), and the prediction interval for an individual Y, given \( X_i \)]

Confidence Interval for the Average Y, Given X
Confidence interval estimate for the mean value of Y given a particular \( X_i \)
Confidence interval for \( \mu_{Y|X=X_i} \):
\[ \hat{Y} \pm t_{n-2}\, S_{YX} \sqrt{h_i} \]
Size of interval varies according to distance away from mean, \( \bar{X} \)
\[ h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX} = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2} \]

Prediction Interval for an Individual Y, Given X
Prediction interval estimate for an individual value of Y given a particular \( X_i \)
Prediction interval for \( Y_{X=X_i} \):
\[ \hat{Y} \pm t_{n-2}\, S_{YX} \sqrt{1 + h_i} \]
The extra term (the 1 under the square root) adds to the interval width to reflect the added uncertainty for an individual case

Estimation of Mean Values: Example
Confidence Interval Estimate for \( \mu_{Y|X=X_i} \)
Find the 95% confidence interval for the mean price of 2,000 square-foot houses
Predicted Price \( \hat{Y}_i \) = 317.85 ($1,000s)
\[ \hat{Y} \pm t_{n-2}\, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.85 \pm 37.12 \]
The confidence interval endpoints are 280.66 and 354.90, or from $280,660 to $354,900
(These endpoints correspond to the unrounded prediction, 317.78 ± 37.12.)

Estimation of Individual Values: Example
Prediction Interval Estimate for \( Y_{X=X_i} \)
Find the 95% prediction interval for an individual house with 2,000 square feet
Predicted Price \( \hat{Y}_i \) = 317.85 ($1,000s)
\[ \hat{Y} \pm t_{n-2}\, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.85 \pm 102.28 \]
The prediction interval endpoints are 215.50 and 420.07, or from $215,500 to $420,070
(These endpoints correspond to the unrounded prediction, 317.78 ± 102.28.)
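
Both intervals at X = 2000 can be recomputed from the sample data. The sketch below fits the line, forms h_i, and applies the two formulas; it uses the critical value 2.3060 from the slides (95% confidence, 8 d.f.) and the unrounded fitted prediction. The function name `intervals_at` is hypothetical.

```python
import math

sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def intervals_at(x_new, x, y, t_critical):
    """Return (y_hat, CI half-width for the mean, PI half-width for an individual)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    h = 1 / n + (x_new - x_bar) ** 2 / ssx            # grows with distance from x_bar
    y_hat = b0 + b1 * x_new
    half_ci = t_critical * s_yx * math.sqrt(h)        # mean of Y given X
    half_pi = t_critical * s_yx * math.sqrt(1 + h)    # individual Y given X
    return y_hat, half_ci, half_pi


y_hat, half_ci, half_pi = intervals_at(2000, sqft, price, t_critical=2.3060)
print(f"prediction: {y_hat:.2f}")
print(f"95% CI for the mean:  {y_hat - half_ci:.2f} to {y_hat + half_ci:.2f}")
print(f"95% PI (individual):  {y_hat - half_pi:.2f} to {y_hat + half_pi:.2f}")
```

The prediction interval is always wider than the confidence interval at the same X, because of the extra 1 under the square root.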

Finding Confidence and Prediction Intervals in Excel
• In Excel, use PHStat | regression | simple linear regression …
  • Check the "confidence and prediction interval for X=" box and enter the X-value and confidence level desired

Finding Confidence and Prediction Intervals in Excel (continued)
[Figure: PHStat output showing the input values, \( \hat{Y} \), the confidence interval estimate for \( \mu_{Y|X=X_i} \), and the prediction interval estimate for \( Y_{X=X_i} \)]

Pitfalls of Regression Analysis
• Lacking an awareness of the assumptions underlying least-squares regression
• Not knowing how to evaluate the assumptions
• Not knowing the alternatives to least-squares regression if a particular assumption is violated
• Using a regression model without knowledge of the subject matter
• Extrapolating outside the relevant range

Strategies for Avoiding the Pitfalls of Regression
• Start with a scatter plot of Y against X to observe a possible relationship
• Perform residual analysis to check the assumptions
  • Plot the residuals vs. X to check for violations of assumptions such as homoscedasticity
  • Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality

Strategies for Avoiding the Pitfalls of Regression (continued)
• If there is violation of any assumption, use alternative methods or models
• If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals
• Avoid making predictions or forecasts outside the relevant range

Chapter Summary
• Introduced types of regression models
• Reviewed assumptions of regression and correlation
• Discussed determining the simple linear regression equation
• Described measures of variation
• Discussed residual analysis
• Addressed measuring autocorrelation

Chapter Summary (continued)
• Described inference about the slope
• Discussed correlation: measuring the strength of the association
• Addressed estimation of mean values and prediction of individual values
• Discussed possible pitfalls in regression and recommended strategies to avoid them

More Related Content

PPTX
1. Data Analytics-introduction
ODP
Multiple linear regression
PPTX
Types of variables in research
PDF
Probability Distributions
PPTX
Multinomial Logistic Regression Analysis
PDF
H2S management including scavengers
PDF
Analysis of Variance (ANOVA)
1. Data Analytics-introduction
Multiple linear regression
Types of variables in research
Probability Distributions
Multinomial Logistic Regression Analysis
H2S management including scavengers
Analysis of Variance (ANOVA)

What's hot (20)

PDF
An Overview of Simple Linear Regression
PPT
Simple lin regress_inference
PPT
Regression analysis
PDF
Logistic regression
PPTX
Chap11 simple regression
PPT
Simple Linier Regression
PPTX
Correlation and regression
PPTX
Regression Analysis
PPT
Regression analysis
PPT
Simple linear regression (final)
PPT
Linear regression
PPTX
Regression
PDF
Introduction to correlation and regression analysis
PPTX
Stat 2153 Stochastic Process and Markov chain
PDF
Linear regression
PPTX
Regression
PPTX
Time Series Decomposition
PDF
Probability Distributions
PPTX
Presentation on Regression Analysis
PPTX
Reporting point biserial correlation in apa
An Overview of Simple Linear Regression
Simple lin regress_inference
Regression analysis
Logistic regression
Chap11 simple regression
Simple Linier Regression
Correlation and regression
Regression Analysis
Regression analysis
Simple linear regression (final)
Linear regression
Regression
Introduction to correlation and regression analysis
Stat 2153 Stochastic Process and Markov chain
Linear regression
Regression
Time Series Decomposition
Probability Distributions
Presentation on Regression Analysis
Reporting point biserial correlation in apa
Ad

Similar to Simple Linear Regression (20)

PPT
Chap12 simple regression
PPTX
chap12.pptx
PPT
Chap14 multiple regression model building
PPT
multiple regression model building
PPT
Newbold_chap12.ppt
PPT
Simple linear regression - regression analysis.ppt
PPT
Introduction to Multiple Regression
PPT
lecture No. 3a.ppt
PPT
Chap13 intro to multiple regression
PPT
Chapter 4 power point presentation Regression models
PPT
Simple Regression
PPT
Chap06 normal distributions & continous
PDF
Bbs11 ppt ch13
PDF
simple regression-1.pdf
PPT
Analysis of Variance
PPT
Chap10 anova
PPT
Numerical Descriptive Measures
PPT
Linear Reqression_used in statistics and engineering
PPT
Linear Regression statistical analysis s
PPT
The Normal Distribution and Other Continuous Distributions
Chap12 simple regression
chap12.pptx
Chap14 multiple regression model building
multiple regression model building
Newbold_chap12.ppt
Simple linear regression - regression analysis.ppt
Introduction to Multiple Regression
lecture No. 3a.ppt
Chap13 intro to multiple regression
Chapter 4 power point presentation Regression models
Simple Regression
Chap06 normal distributions & continous
Bbs11 ppt ch13
simple regression-1.pdf
Analysis of Variance
Chap10 anova
Numerical Descriptive Measures
Linear Reqression_used in statistics and engineering
Linear Regression statistical analysis s
The Normal Distribution and Other Continuous Distributions
Ad



Simple Linear Regression

  • 1. Statistics for Managers Using Microsoft® Excel, 4th Edition. Chapter 12: Simple Linear Regression. (Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.)
  • 2. Chapter Goals. After completing this chapter, you should be able to:
      • Explain the simple linear regression model
      • Obtain and interpret the simple linear regression equation for a set of data
      • Evaluate regression residuals for aptness of the fitted model
      • Understand the assumptions behind regression analysis
      • Explain measures of variation and determine whether the independent variable is significant
  • 3. Chapter Goals (continued). After completing this chapter, you should be able to:
      • Calculate and interpret confidence intervals for the regression coefficients
      • Use the Durbin-Watson statistic to check for autocorrelation
      • Form confidence and prediction intervals around an estimated Y value for a given X
  • 4. Correlation vs. Regression
      • A scatter plot (or scatter diagram) can be used to show the relationship between two variables
      • Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
      • Correlation is only concerned with the strength of the relationship
      • No causal effect is implied with correlation
      • Correlation was first presented in Chapter 3
  • 5. Introduction to Regression Analysis. Regression analysis is used to:
      • Predict the value of a dependent variable based on the value of at least one independent variable
      • Explain the impact of changes in an independent variable on the dependent variable
      Dependent variable: the variable we wish to explain. Independent variable: the variable used to explain the dependent variable.
  • 6. Simple Linear Regression Model
      • Only one independent variable, X
      • Relationship between X and Y is described by a linear function
      • Changes in Y are assumed to be caused by changes in X
  • 7. Types of Relationships. [Figure: scatter plots of Y against X illustrating linear relationships and curvilinear relationships]
  • 8. Types of Relationships (continued). [Figure: scatter plots of Y against X illustrating strong relationships and weak relationships]
  • 9. Types of Relationships (continued). [Figure: scatter plots of Y against X illustrating no relationship]
  • 10. Simple Linear Regression Model. The population regression model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
      • $Y_i$: dependent variable
      • $\beta_0$: population Y intercept
      • $\beta_1$: population slope coefficient
      • $X_i$: independent variable
      • $\varepsilon_i$: random error term
      $\beta_0 + \beta_1 X_i$ is the linear component; $\varepsilon_i$ is the random error component.
  • 11. Simple Linear Regression Model (continued). [Figure: the line $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ plotted with intercept $\beta_0$ and slope $\beta_1$; at a given $X_i$, the random error $\varepsilon_i$ is the vertical distance between the observed value of Y and the predicted value of Y]
  • 12. Simple Linear Regression Equation. The simple linear regression equation provides an estimate of the population regression line: $\hat{Y}_i = b_0 + b_1 X_i$
      • $\hat{Y}_i$: estimated (or predicted) Y value for observation i
      • $b_0$: estimate of the regression intercept
      • $b_1$: estimate of the regression slope
      • $X_i$: value of X for observation i
      The individual random error terms $e_i$ have a mean of zero.
  • 13. Least Squares Method. $b_0$ and $b_1$ are obtained by finding the values that minimize the sum of the squared differences between $Y$ and $\hat{Y}$: $\min \sum (Y_i - \hat{Y}_i)^2 = \min \sum (Y_i - (b_0 + b_1 X_i))^2$
  • 14. Finding the Least Squares Equation. The coefficients $b_0$ and $b_1$, and other regression results in this chapter, will be found using Excel. Formulas are shown in the text at the end of the chapter for those who are interested.
  • 15. Interpretation of the Slope and the Intercept
      • $b_0$ is the estimated average value of Y when the value of X is zero
      • $b_1$ is the estimated change in the average value of Y as a result of a one-unit change in X
  • 16. Simple Linear Regression Example
      • A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
      • A random sample of 10 houses is selected
      • Dependent variable (Y) = house price in $1000s
      • Independent variable (X) = square feet
  • 17. Sample Data for House Price Model
      House Price in $1000s (Y)   Square Feet (X)
      245                         1400
      312                         1600
      279                         1700
      308                         1875
      199                         1100
      219                         1550
      405                         2350
      324                         2450
      319                         1425
      255                         1700
  • 18. Graphical Presentation. House price model: scatter plot. [Figure: House Price ($1000s) on the vertical axis against Square Feet on the horizontal axis]
  • 19. Graphical Presentation. House price model: scatter plot and regression line. [Figure: House Price ($1000s) against Square Feet with the fitted line] house price = 98.24833 + 0.10977 (square feet). Slope = 0.10977, intercept = 98.248.
  • 20. Interpretation of the Intercept, $b_0$. house price = 98.24833 + 0.10977 (square feet)
      • $b_0$ is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values)
      • Here, no houses had 0 square feet, so $b_0$ = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet
  • 21. Interpretation of the Slope Coefficient, $b_1$. house price = 98.24833 + 0.10977 (square feet)
      • $b_1$ measures the estimated change in the average value of Y as a result of a one-unit change in X
      • Here, $b_1$ = 0.10977 tells us that the average value of a house increases by 0.10977 × $1000 = $109.77 for each additional square foot of size
  • 22. Predictions using Regression Analysis. Predict the price for a house with 2000 square feet: house price = 98.25 + 0.1098 (sq. ft.) = 98.25 + 0.1098(2000) = 317.85. The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850.
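The slides obtain $b_0$ and $b_1$ from Excel. As a rough cross-check, the sketch below fits the same line to the sample data using the standard closed-form least-squares expressions, which the slides mention but do not show. The function name `fit_line` and the variable names are illustrative assumptions.

```python
# Sample data transcribed from the house price slide (price in $1000s).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price_k = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def fit_line(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)                        # sum of squares of X
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # cross products
    b1 = sxy / ssx             # slope estimate
    b0 = y_bar - b1 * x_bar    # intercept estimate
    return b0, b1


b0, b1 = fit_line(square_feet, price_k)
predicted_2000 = b0 + b1 * 2000  # predicted price in $1000s at 2000 sq ft

print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")
print(f"predicted price at 2000 sq ft: {predicted_2000:.2f} ($1000s)")
```

With unrounded coefficients the prediction comes out near 317.78; the 317.85 on the slide results from rounding the coefficients to 98.25 and 0.1098 before multiplying.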
  • 23. Interpolation vs. Extrapolation. When using a regression model for prediction, only predict within the relevant range of data. Do not try to extrapolate beyond the range of observed X's. [Figure: House Price ($1000s) against Square Feet, with the relevant range for interpolation marked]
  • 24. Measures of Variation. Total variation is made up of two parts: $SST = SSR + SSE$
      • Total sum of squares: $SST = \sum (Y_i - \bar{Y})^2$
      • Regression sum of squares: $SSR = \sum (\hat{Y}_i - \bar{Y})^2$
      • Error sum of squares: $SSE = \sum (Y_i - \hat{Y}_i)^2$
      where $\bar{Y}$ = average value of the dependent variable, $Y_i$ = observed values of the dependent variable, and $\hat{Y}_i$ = predicted value of Y for the given $X_i$ value.
  • 25. Measures of Variation (continued)
      • SST = total sum of squares: measures the variation of the $Y_i$ values around their mean $\bar{Y}$
      • SSR = regression sum of squares: explained variation attributable to the relationship between X and Y
      • SSE = error sum of squares: variation attributable to factors other than the relationship between X and Y
  • 26. Measures of Variation (continued). [Figure: at a given $X_i$, the distances between the observed $Y_i$, the fitted $\hat{Y}_i$, and the mean $\bar{Y}$, labeled $SST = \sum (Y_i - \bar{Y})^2$, $SSE = \sum (Y_i - \hat{Y}_i)^2$, and $SSR = \sum (\hat{Y}_i - \bar{Y})^2$]
  • 27. Coefficient of Determination, $r^2$
      • The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
      • The coefficient of determination is also called r-squared and is denoted as $r^2$
      $r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$, note: $0 \le r^2 \le 1$
  • 28. Examples of Approximate $r^2$ Values. $r^2 = 1$: perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X. [Figure: two scatter plots of Y against X with all points on a line]
  • 29. Examples of Approximate $r^2$ Values. $0 < r^2 < 1$: weaker linear relationships between X and Y; some but not all of the variation in Y is explained by variation in X. [Figure: two scatter plots of Y against X]
  • 30. Examples of Approximate $r^2$ Values. $r^2 = 0$: no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X). [Figure: scatter plot of Y against X]
  • 31. Excel Output

      Regression Statistics
        Multiple R           0.76211
        R Square             0.58082
        Adjusted R Square    0.52842
        Standard Error      41.33032
        Observations        10

      ANOVA
                      df   SS           MS           F         Significance F
        Regression     1   18934.9348   18934.9348   11.0848   0.01039
        Residual       8   13665.5652    1708.1957
        Total          9   32600.5000

                      Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
        Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
        Square Feet    0.10977        0.03297         3.32938   0.01039     0.03374     0.18580

      $r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$: 58.08% of the variation in house prices is explained by variation in square feet.
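To make the decomposition SST = SSR + SSE concrete, the following sketch recomputes the three sums of squares and $r^2$ from the sample data rather than reading them off the Excel output. It refits the line with the standard closed-form least-squares expressions; all variable names are illustrative.

```python
# Sample data from the house price example (price in $1000s).
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]  # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
r_squared = ssr / sst

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"r^2 = {r_squared:.5f}")
```

The results should agree with the ANOVA rows of the Excel output up to rounding.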
  • 32. Standard Error of Estimate. The standard deviation of the variation of observations around the regression line is estimated by $S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$, where SSE = error sum of squares and n = sample size.
  • 33. Comparing Standard Errors. $S_{YX}$ is a measure of the variation of observed Y values from the regression line. The magnitude of $S_{YX}$ should always be judged relative to the size of the Y values in the sample data; e.g., $S_{YX}$ = $41.33K is moderately small relative to house prices in the $200K - $300K range. [Figure: two scatter plots of Y against X, one with small $S_{YX}$ and one with large $S_{YX}$]
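As a small worked check of the formula $S_{YX} = \sqrt{SSE/(n-2)}$, the sketch below plugs in the SSE and sample size reported in the Excel output. The ratio to the mean price is an added illustration of judging the standard error relative to the Y values, not a statistic from the slides; the mean of 286.5 is computed from the ten sample prices.

```python
import math

sse = 13665.5652   # error sum of squares from the Excel ANOVA table
n = 10             # number of houses in the sample
mean_price = 286.5 # mean of the ten sample prices, in $1000s

s_yx = math.sqrt(sse / (n - 2))     # standard error of the estimate
relative_size = s_yx / mean_price   # illustrative: error relative to typical Y

print(f"S_YX = {s_yx:.5f} ($1000s)")
print(f"S_YX as a share of the mean price: {relative_size:.1%}")
```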
  • 34. Assumptions of Regression
      • Normality of Error: error values ($\varepsilon$) are normally distributed for any given value of X
      • Homoscedasticity: the probability distribution of the errors has constant variance
      • Independence of Errors: error values are statistically independent
  • 35. Residual Analysis
      • The residual for observation i, $e_i$, is the difference between its observed and predicted value: $e_i = Y_i - \hat{Y}_i$
      • Check the assumptions of regression by examining the residuals:
          • Examine for linearity assumption
          • Examine for constant variance for all levels of X (homoscedasticity)
          • Evaluate normal distribution assumption
          • Evaluate independence assumption
      • Graphical analysis of residuals: can plot residuals vs. X
  • 36. Residual Analysis for Linearity. [Figure: Y against x and residuals against x for two cases, labeled "Not Linear" and "Linear"]
  • 37. Residual Analysis for Homoscedasticity. [Figure: Y against x and residuals against x for two cases, labeled "Non-constant variance" and "Constant variance"]
  • 38. Residual Analysis for Independence. [Figure: residuals against X for cases labeled "Not Independent" and "Independent"]
  • 39. Excel Residual Output

      RESIDUAL OUTPUT
      Observation   Predicted House Price   Residuals
       1            251.92316                -6.923162
       2            273.87671                38.12329
       3            284.85348                -5.853484
       4            304.06284                 3.937162
       5            218.99284               -19.99284
       6            268.38832               -49.38832
       7            356.20251                48.79749
       8            367.17929               -43.17929
       9            254.6674                 64.33264
      10            284.85348               -29.85348

      [Figure: House Price Model Residual Plot, residuals against Square Feet] Does not appear to violate any regression assumptions.
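The residual table can be reproduced from $e_i = Y_i - \hat{Y}_i$. The sketch below refits the line with the standard closed-form least-squares expressions and lists predicted values and residuals; variable names are illustrative. A useful sanity check is that least-squares residuals from a model with an intercept sum to zero, up to floating-point error.

```python
# Sample data from the house price example (price in $1000s).
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, predicted)]  # e_i = Y_i - Y_hat_i

for i, (yh, e) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i:2d}  predicted = {yh:10.5f}  residual = {e:10.5f}")

print(f"sum of residuals = {sum(residuals):.6f}")
```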
  • 40. Measuring Autocorrelation: The Durbin-Watson Statistic
      • Used when data are collected over time to detect if autocorrelation is present
      • Autocorrelation exists if residuals in one time period are related to residuals in another period
  • 41. Autocorrelation
      • Autocorrelation is correlation of the errors (residuals) over time
      • Violates the regression assumption that residuals are random and independent
      [Figure: Time (t) Residual Plot, residuals against time] Here, residuals show a cyclic pattern, not random.
  • 42. The Durbin-Watson Statistic. The Durbin-Watson statistic is used to test for autocorrelation. H0: residuals are not correlated. H1: autocorrelation is present. $D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$
      • The possible range is $0 \le D \le 4$
      • D should be close to 2 if H0 is true
      • D less than 2 may signal positive autocorrelation; D greater than 2 may signal negative autocorrelation
  • 43. Testing for Positive Autocorrelation. H0: positive autocorrelation does not exist. H1: positive autocorrelation is present.
      • Calculate the Durbin-Watson test statistic, D (the Durbin-Watson statistic can be found using PHStat in Excel)
      • Find the values $d_L$ and $d_U$ from the Durbin-Watson table (for sample size n and number of independent variables k)
      • Decision rule: reject H0 if $D < d_L$
      [Figure: number line from 0 to 2 with regions "Reject H0" (0 to $d_L$), "Inconclusive" ($d_L$ to $d_U$), and "Do not reject H0" ($d_U$ to 2)]
  • 44. Testing for Positive Autocorrelation (continued). Example with n = 25. [Figure: Sales against Time with fitted line y = 30.65 + 4.7038x, $R^2$ = 0.8976] Excel/PHStat output:

      Durbin-Watson Calculations
      Sum of Squared Difference of Residuals   3296.18
      Sum of Squared Residuals                 3279.98
      Durbin-Watson Statistic                  1.00494

      $D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} = \frac{3296.18}{3279.98} = 1.00494$
  • 45. Testing for Positive Autocorrelation (continued)
      • Here, n = 25 and there is k = 1 independent variable
      • Using the Durbin-Watson table, $d_L$ = 1.29 and $d_U$ = 1.45
      • D = 1.00494 < $d_L$ = 1.29, so reject H0 and conclude that significant positive autocorrelation exists
      • Therefore the linear model is not the appropriate model to forecast sales
      Decision: reject H0 since D = 1.00494 < $d_L$. [Figure: number line with "Reject H0" below $d_L$ = 1.29, "Inconclusive" between 1.29 and $d_U$ = 1.45, and "Do not reject H0" from 1.45 to 2]
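The Durbin-Watson statistic and the decision rule for positive autocorrelation can be sketched in a few lines. The slides do not list the 25 residuals, so the example below only reuses the two sums reported by PHStat; the alternating residual series is hypothetical and included purely to exercise the function. The function names are illustrative, and $d_L$ and $d_U$ still have to come from a Durbin-Watson table.

```python
def durbin_watson(residuals):
    """D = sum of squared successive differences / sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def positive_autocorrelation_decision(d, d_lower, d_upper):
    """Apply the slide's decision regions for the positive-autocorrelation test."""
    if d < d_lower:
        return "reject H0"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"


# Sums reported on the slide for the n = 25 sales example.
d_slide = 3296.18 / 3279.98
decision = positive_autocorrelation_decision(d_slide, d_lower=1.29, d_upper=1.45)
print(f"D = {d_slide:.5f} -> {decision}")

# Hypothetical residuals that flip sign every period (negative autocorrelation).
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
print(f"D for alternating residuals = {alternating}")
```

Residuals that alternate in sign push D above 2, which matches the slide's remark that D greater than 2 may signal negative autocorrelation.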
  • 46. Inferences About the Slope. The standard error of the regression slope coefficient ($b_1$) is estimated by $S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$, where $S_{b_1}$ = estimate of the standard error of the least squares slope and $S_{YX} = \sqrt{\frac{SSE}{n-2}}$ = standard error of the estimate.
  • 47. Excel Output (same regression output as slide 31). The Square Feet row of the coefficient table gives the standard error of the slope: $S_{b_1} = 0.03297$

                      Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
        Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
        Square Feet    0.10977        0.03297         3.32938   0.01039     0.03374     0.18580
  • 48. Comparing Standard Errors of the Slope. $S_{b_1}$ is a measure of the variation in the slope of regression lines from different possible samples. [Figure: two plots of Y against X, one with small $S_{b_1}$ and one with large $S_{b_1}$]
  • 49. Inference about the Slope: t Test
      • t test for a population slope: is there a linear relationship between X and Y?
      • Null and alternative hypotheses: H0: $\beta_1 = 0$ (no linear relationship); H1: $\beta_1 \ne 0$ (linear relationship does exist)
      • Test statistic: $t = \frac{b_1 - \beta_1}{S_{b_1}}$, with d.f. = n − 2
      where $b_1$ = regression slope coefficient, $\beta_1$ = hypothesized slope, and $S_{b_1}$ = standard error of the slope.
  • 50. Inference about the Slope: t Test (continued). Using the same 10-house sample as slide 17 (house price in $1000s, y, against square feet, x), the estimated regression equation is house price = 98.25 + 0.1098 (sq. ft.). The slope of this model is 0.1098. Does square footage of the house affect its sales price?
  • 51. Inferences about the Slope: t Test Example. H0: $\beta_1 = 0$; H1: $\beta_1 \ne 0$. From Excel output:

                      Coefficients   Standard Error   t Stat    P-value
        Intercept     98.24833       58.03348         1.69296   0.12892
        Square Feet    0.10977        0.03297         3.32938   0.01039

      With $b_1$ = 0.10977 and $S_{b_1}$ = 0.03297: $t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$
  • 52. Inferences about the Slope: t Test Example (continued). H0: $\beta_1 = 0$; H1: $\beta_1 \ne 0$. Test statistic: t = 3.329 (from the Square Feet row of the Excel output), with d.f. = 10 − 2 = 8. With $\alpha/2$ = .025 in each tail, the critical values are $\pm t_{\alpha/2} = \pm 2.3060$. [Figure: t distribution with "Reject H0" regions below −2.3060 and above 2.3060; t = 3.329 falls in the upper rejection region]
      Decision: Reject H0. Conclusion: There is sufficient evidence that square footage affects house price.
  • 53. Inferences about the Slope: t Test Example (continued). H0: $\beta_1 = 0$; H1: $\beta_1 \ne 0$. From the Square Feet row of the Excel output, P-value = 0.01039. This is a two-tail test, so the p-value is P(t > 3.329) + P(t < −3.329) = 0.01039 (for 8 d.f.).
      Decision: P-value < $\alpha$, so reject H0. Conclusion: There is sufficient evidence that square footage affects house price.
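The t statistic and its two-tail p-value can be approximated without Excel. The sketch below takes $b_1$ and $S_{b_1}$ from the Excel output and integrates the Student t density numerically with Simpson's rule, using only the standard library. This is an illustrative substitute for Excel's built-in calculation, not the method the slides use, and the helper names are assumptions.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tail_p_value(t_stat, df, steps=2000):
    """P(|T| > |t_stat|), via Simpson's rule on [0, |t_stat|]; steps must be even."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 < T < |t_stat|)
    return 2 * (0.5 - area)       # both tails, by symmetry


b1 = 0.10977     # slope from the Excel output
s_b1 = 0.03297   # standard error of the slope
n = 10

t_stat = (b1 - 0) / s_b1                 # hypothesized slope is 0 under H0
p_value = two_tail_p_value(t_stat, n - 2)
reject_h0 = p_value < 0.05

print(f"t = {t_stat:.3f}, two-tail p-value = {p_value:.5f}, reject H0: {reject_h0}")
```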
  • 54. F-Test for Significance. F test statistic: $F = \frac{MSR}{MSE}$, where $MSR = \frac{SSR}{k}$ and $MSE = \frac{SSE}{n-k-1}$. F follows an F distribution with k numerator and (n − k − 1) denominator degrees of freedom (k = the number of independent variables in the regression model).
  • 55. Excel Output (same regression output as slide 31). From the ANOVA table: $F = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848$, with 1 and 8 degrees of freedom. The "Significance F" value, 0.01039, is the p-value for the F-test.

      ANOVA
                      df   SS           MS           F         Significance F
        Regression     1   18934.9348   18934.9348   11.0848   0.01039
        Residual       8   13665.5652    1708.1957
        Total          9   32600.5000
  • 56. F-Test for Significance (continued). H0: $\beta_1 = 0$; H1: $\beta_1 \ne 0$; $\alpha$ = .05; df1 = 1, df2 = 8. Test statistic: $F = \frac{MSR}{MSE} = 11.08$. Critical value: $F_{\alpha}$ = $F_{.05}$ = 5.32. [Figure: F distribution with "Do not reject H0" below 5.32 and "Reject H0" above 5.32, upper-tail area $\alpha$ = .05]
      Decision: Reject H0 at $\alpha$ = 0.05. Conclusion: There is sufficient evidence that house size affects selling price.
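The F statistic follows directly from the ANOVA sums of squares. The sketch below recomputes it from the values in the Excel output and compares it with the critical value 5.32 quoted on the slide. It also checks a standard property of simple linear regression (one independent variable) that the slides do not state explicitly: F equals the square of the slope's t statistic. Variable names are illustrative.

```python
ssr = 18934.9348   # regression sum of squares (Excel ANOVA table)
sse = 13665.5652   # error sum of squares (Excel ANOVA table)
n = 10             # sample size
k = 1              # number of independent variables

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

f_critical = 5.32          # F(.05; 1, 8), as quoted on the slide
reject_h0 = f_stat > f_critical

t_stat = 3.32938           # slope t statistic from the Excel output
print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F = {f_stat:.4f}")
print(f"t^2 = {t_stat ** 2:.4f}, reject H0: {reject_h0}")
```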
  • 57. Confidence Interval Estimate for the Slope. Confidence interval estimate of the slope: $b_1 \pm t_{n-2} S_{b_1}$, with d.f. = n − 2. Excel printout for house prices:

                      Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
        Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
        Square Feet    0.10977        0.03297         3.32938   0.01039     0.03374     0.18580

      At the 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858).
  • 58. Confidence Interval Estimate for the Slope (continued). From the Square Feet row of the Excel output, the 95% confidence interval for the slope is (0.03374, 0.18580). Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.70 and $185.80 per square foot of house size. This 95% confidence interval does not include 0. Conclusion: There is a significant relationship between house price and square feet at the .05 level of significance.
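The interval $b_1 \pm t_{n-2} S_{b_1}$ is simple enough to verify by hand. This sketch uses the slope and standard error from the Excel output and the critical value 2.3060 (8 d.f., two-tail $\alpha$ = .05) that appears on the t test slides; variable names are illustrative.

```python
b1 = 0.10977        # slope estimate from the Excel output
s_b1 = 0.03297      # standard error of the slope
t_critical = 2.3060 # t with n - 2 = 8 d.f. and alpha/2 = .025, from the slides

margin = t_critical * s_b1
lower = b1 - margin
upper = b1 + margin
contains_zero = lower <= 0 <= upper

print(f"95% CI for the slope: ({lower:.5f}, {upper:.5f})")
print(f"per square foot: ${lower * 1000:.2f} to ${upper * 1000:.2f}")
print(f"interval contains 0: {contains_zero}")
```

Because the interval excludes 0, it leads to the same conclusion as the two-tail t test at the .05 level.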
  • 59. t Test for a Correlation Coefficient
      • Hypotheses: H0: $\rho = 0$ (no correlation between X and Y); HA: $\rho \ne 0$ (correlation exists)
      • Test statistic (with n − 2 degrees of freedom): $t = \frac{r - \rho}{\sqrt{\frac{1 - r^2}{n-2}}}$
      where $r = +\sqrt{r^2}$ if $b_1 > 0$ and $r = -\sqrt{r^2}$ if $b_1 < 0$.
  • 60. Example: House Prices. Is there evidence of a linear relationship between square feet and house price at the .05 level of significance? H0: $\rho = 0$ (no correlation); H1: $\rho \ne 0$ (correlation exists); $\alpha$ = .05, df = 10 − 2 = 8. $t = \frac{r - \rho}{\sqrt{\frac{1 - r^2}{n-2}}} = \frac{0.762 - 0}{\sqrt{\frac{1 - 0.762^2}{10 - 2}}} = 3.33$
  • 61. Example: Test Solution. $t = \frac{r - \rho}{\sqrt{\frac{1 - r^2}{n-2}}} = \frac{0.762 - 0}{\sqrt{\frac{1 - 0.762^2}{10 - 2}}} = 3.33$, with d.f. = 10 − 2 = 8. With $\alpha/2$ = .025 in each tail, the critical values are $\pm t_{\alpha/2} = \pm 2.3060$. [Figure: t distribution with "Reject H0" regions below −2.3060 and above 2.3060; t = 3.33 falls in the upper rejection region]
      Decision: Reject H0. Conclusion: There is evidence of a linear association at the 5% level of significance.
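The correlation test can be sketched as follows, including the sign rule that recovers r from $r^2$ and the sign of $b_1$. The inputs are the R Square and slope from the Excel output; the function names are illustrative assumptions.

```python
import math


def r_from_r_squared(r_squared, b1):
    """r takes the sign of the slope: +sqrt(r^2) if b1 > 0, -sqrt(r^2) if b1 < 0."""
    return math.copysign(math.sqrt(r_squared), b1)


def correlation_t_stat(r, n, rho=0.0):
    """t = (r - rho) / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    return (r - rho) / math.sqrt((1 - r * r) / (n - 2))


r = r_from_r_squared(0.58082, b1=0.10977)  # R Square and slope from Excel output
t_stat = correlation_t_stat(r, n=10)
reject_h0 = abs(t_stat) > 2.3060           # critical value from the slides (8 d.f.)

print(f"r = {r:.5f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The statistic comes out at about 3.33, essentially the same value as the slope's t statistic (3.329) on the earlier t test slides.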
  • 62. Estimating Mean Values and Predicting Individual Values. Goal: form intervals around $\hat{Y}$ to express uncertainty about the value of Y for a given $X_i$. [Figure: the fitted line $\hat{Y} = b_0 + b_1 X_i$ with, at a given $X_i$, the confidence interval for the mean of Y and the wider prediction interval for an individual Y]
  • 63. Confidence Interval for the Average Y, Given X. Confidence interval estimate for the mean value of Y given a particular $X_i$. Confidence interval for $\mu_{Y|X=X_i}$: $\hat{Y} \pm t_{n-2} S_{YX} \sqrt{h_i}$, where $h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX} = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}$. The size of the interval varies according to the distance of $X_i$ from the mean, $\bar{X}$.
  • 64. Prediction Interval for an Individual Y, Given X. Prediction interval estimate for an individual value of Y given a particular $X_i$. Prediction interval for $Y_{X=X_i}$: $\hat{Y} \pm t_{n-2} S_{YX} \sqrt{1 + h_i}$. The extra term (the 1 under the square root) adds to the interval width to reflect the added uncertainty for an individual case.
  • 65. Estimation of Mean Values: Example. Find the 95% confidence interval for the mean price of 2,000 square-foot houses. Predicted price $\hat{Y}_i$ = 317.85 ($1,000s). Confidence interval estimate for $\mu_{Y|X=X_i}$: $\hat{Y} \pm t_{n-2} S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.85 \pm 37.12$. The confidence interval endpoints are 280.66 and 354.90, or from $280,660 to $354,900. (These endpoints correspond to the unrounded prediction, about 317.78; the displayed 317.85 uses the rounded coefficients 98.25 and 0.1098.)
  • 66. Estimation of Individual Values: Example. Find the 95% prediction interval for an individual house with 2,000 square feet. Predicted price $\hat{Y}_i$ = 317.85 ($1,000s). Prediction interval estimate for $Y_{X=X_i}$: $\hat{Y} \pm t_{n-2} S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.85 \pm 102.28$. The prediction interval endpoints are 215.50 and 420.07, or from $215,500 to $420,070. (These endpoints correspond to the unrounded prediction, about 317.78; the displayed 317.85 uses the rounded coefficients 98.25 and 0.1098.)
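Both intervals at X = 2000 can be reproduced from the sample data. The sketch below computes $h_i$, the two half-widths, and the endpoints, taking the critical value 2.3060 (8 d.f.) from the t test slides rather than computing it. The line is refit with the standard closed-form least-squares expressions, and variable names are illustrative.

```python
import math

# Sample data from the house price example (price in $1000s).
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ssx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate
t_critical = 2.3060               # t with 8 d.f., alpha/2 = .025 (from the slides)

x_new = 2000
y_hat = b0 + b1 * x_new
h = 1 / n + (x_new - x_bar) ** 2 / ssx

ci_half = t_critical * s_yx * math.sqrt(h)       # mean of Y at x_new
pi_half = t_critical * s_yx * math.sqrt(1 + h)   # individual Y at x_new

print(f"Y_hat = {y_hat:.2f}, h = {h:.5f}")
print(f"95% CI for the mean: {y_hat - ci_half:.2f} to {y_hat + ci_half:.2f}")
print(f"95% prediction interval: {y_hat - pi_half:.2f} to {y_hat + pi_half:.2f}")
```

The prediction interval is always wider than the confidence interval at the same X because of the extra 1 under the square root.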
  • 67. Finding Confidence and Prediction Intervals in Excel
      • In Excel, use PHStat | regression | simple linear regression …
      • Check the "confidence and prediction interval for X=" box and enter the X-value and confidence level desired
  • 68. Finding Confidence and Prediction Intervals in Excel (continued). [Figure: PHStat output showing the input values, $\hat{Y}$, the confidence interval estimate for $\mu_{Y|X=X_i}$, and the prediction interval estimate for $Y_{X=X_i}$]
  • 69. Pitfalls of Regression Analysis
      • Lacking an awareness of the assumptions underlying least-squares regression
      • Not knowing how to evaluate the assumptions
      • Not knowing the alternatives to least-squares regression if a particular assumption is violated
      • Using a regression model without knowledge of the subject matter
      • Extrapolating outside the relevant range
  • 70. Strategies for Avoiding the Pitfalls of Regression
      • Start with a scatter plot of Y against X to observe a possible relationship
      • Perform residual analysis to check the assumptions:
          • Plot the residuals vs. X to check for violations of assumptions such as homoscedasticity
          • Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality
  • 71. Strategies for Avoiding the Pitfalls of Regression (continued)
      • If there is violation of any assumption, use alternative methods or models
      • If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals
      • Avoid making predictions or forecasts outside the relevant range
  • 72. Chapter Summary
      • Introduced types of regression models
      • Reviewed assumptions of regression and correlation
      • Discussed determining the simple linear regression equation
      • Described measures of variation
      • Discussed residual analysis
      • Addressed measuring autocorrelation
  • 73. Chapter Summary (continued)
      • Described inference about the slope
      • Discussed correlation: measuring the strength of the association
      • Addressed estimation of mean values and prediction of individual values
      • Discussed possible pitfalls in regression and recommended strategies to avoid them