LESSON 10
SIMPLE LINEAR
REGRESSION
CHAPTER 14 SECTIONS 14.1 TO 14.3
7 SECTIONS
1.  SIMPLE LINEAR REGRESSION MODEL 14.2
2.  LEAST SQUARES METHOD 14.2
3.  COEFFICIENT OF DETERMINATION 14.2
4.  MODEL ASSUMPTIONS 14.2
5.  TESTING FOR SIGNIFICANCE 14.3
6.  COVARIANCE & COEFFICIENT OF CORRELATION 14.1
7.  USING THE ESTIMATED REGRESSION EQUATION FOR ESTIMATION AND PREDICTION 11.7
1 . SIMPLE LINEAR
REGRESSION MODEL
REGRESSION MODEL
REGRESSION EQUATION
ESTIMATED REGRESSION EQUATION
Created by Samie L.S. Ly, must not be used for profitable
purposes without permission.
THE OBJECTIVE OF
SLR
Let’s say, I would like to create the ultimate equation to figure
out my Final COMM 215 grade.
What influences my Final COMM 215 grade?
Think of a few examples.
FINAL COMM 215 Grade (y) =
Study Time (x1) +
Work Time (x2) +
# of hours spent on Facebook (x3) +
Family Time (x4) +
anything else you can think of (x…) …
Let’s say,
I study 5 hours a week,
work part-time for 25 hours a week,
I spend 3 hours on facebook,
And I have 2 kids.
Here is 1 scenario.
What if it was someone else?
With a different profile?
Do I have to make my calculations all over again?
If I set up a regression line, I just need to plug in values and
get an estimation of my Final COMM 215 grade.
Voila!
Then you might ask… how?
How do I create this regression line?
Answer: By gathering data from history.
I am going to take a sample of individuals who took COMM
215 before and write down their profile.
Hours per week   Study Time   Work Time   Family Time   Facebook Time
Bob              10           25          5             0
Sally            12           0           15            15
Eric             3            40          10            0
Since simple linear regression considers only one independent variable, let's take just one, Study Time, as the main indicator of your Final COMM 215 grade.

Hours per week   Study Time (x)   Final Grade (y)
Bob              10               89
Sally            12               67
Eric             3                45
We’ve generated this equation!
Now, if Michelle asks, if I study 8 hours a week, what would
be my estimated Final COMM 215 grade?
y = 3.4478x + 38.269
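As a rough sketch of where this equation comes from, the Python below fits a straight line to the three students in the table (Bob, Sally, Eric) and then plugs in Michelle's 8 hours. The function name fit_line and the variable names are illustrative choices, not part of the lesson; the fitting formulas are the standard least squares ones.

```python
# Weekly study hours (x) and final grade (y) for Bob, Sally and Eric.
study_hours = [10, 12, 3]
final_grade = [89, 67, 45]


def fit_line(x, y):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


b0, b1 = fit_line(study_hours, final_grade)
michelle_grade = b0 + b1 * 8  # estimated grade for 8 study hours per week

print(f"y = {b1:.4f}x + {b0:.3f}")
print(f"Estimated grade for 8 hours/week: {michelle_grade:.2f}")
```

With only three students this is a toy fit, so the estimate should be read as an illustration of the plug-in step rather than a reliable prediction.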
SIMPLE LINEAR REGRESSION
1 VARIABLE
FINAL COMM 215 GRADE (y) = STUDY TIME (x)
MULTIPLE LINEAR REGRESSION
MORE THAN 1 VARIABLE
FINAL COMM 215 GRADE (y) =
STUDY TIME (x1) + WORK TIME(x2) + ….
Managerial decisions, for example how to predict sales, are often made based on the relationship between two or more variables.
The statistical process used to develop an equation showing how the variables are related is called regression analysis.
The variable being predicted is called the dependent variable.
The variable or variables being used to predict the value of the dependent variable are called independent variables.
SIMPLE LINEAR REGRESSION EQUATION
Positive Linear Relationship
[Figure: regression line of E(y) against x with intercept β0; slope β1 is positive]
•  The relationship between the two variables is approximated by a straight line.
PAGE 572
GROEBNER et Al. (2014)
SIMPLE LINEAR REGRESSION EQUATION
Negative Linear Relationship
[Figure: regression line of E(y) against x with intercept β0; slope β1 is negative]
When would be a case of a negative relationship? If I spend
all my time watching movies instead of studying, would it
increase my grade? Or decrease my grade?
SIMPLE LINEAR REGRESSION EQUATION
No Relationship
[Figure: regression line of E(y) against x with intercept β0; slope β1 is 0]
Hmm… what would be an example of no relationship?
If my friend eats 5 ice creams every day, does it have any relationship with my grade? Not really, right?
SIMPLE LINEAR REGRESSION MODEL
y = β0 + β1x + ε
β0 and β1 – parameters of the model
ε is a random variable referred to as the error term.
ε – variability in y that cannot be explained
[Figure: regression line of E(y) against x with intercept β0; slope β1 is positive]
CHARACTERISTICS OF THE ERROR TERM
y = β0 + β1x + ε
The more variables you add, the smaller ε will become, because now you know more!
y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x7 + β8x8 + … + βixi
PAGE 573
GROEBNER et Al. (2014)
The error term ε: Cannot be explained! Cannot be calculated! Cannot... Just don't ask, sigh…
REGRESSION EQUATION
Describes how the expected value of y, E(y) is related to x
E(y) = β0 + β1x
•  E(y) is the expected value of y for a given x value.
•  β1 is the slope of the regression line.
•  β0 is the y intercept of the regression line.
•  Graph of the regression equation is a straight line.
PAGE 573
GROEBNER et Al. (2014)
ESTIMATED SIMPLE LINEAR REGRESSION EQUATION
Sample statistics b0 and b1 are used to estimate the parameters β0 and β1 of E(y) = β0 + β1x:
ŷ = b0 + b1x
•  ŷ is the estimated value of y for a given x value.
•  b1 is the slope of the line.
•  b0 is the y intercept of the line.
•  The graph is called the estimated regression line.
PROBLEM #10.1
In the linear regression equation, ŷ = b0 + b1x1, why is the term at the left given as ŷ instead of simply y?
Because it is an estimated value of the dependent variable for a given value of x.
2. LEAST SQUARES METHOD
The least squares method is a procedure for using sample data to find the estimated regression equation.
The goal of using the least squares method is to obtain
ŷ = b0 + b1x
[Figure: good fit with the line]
LEAST SQUARES METHOD
Least Squares Criterion
min Σ(yi - ŷi)²
where:
yi = observed value of the dependent variable for the ith observation
ŷi = estimated value of the dependent variable for the ith observation
PAGE 572
GROEBNER et Al. (2014)
PROBLEM # 10.2
A scatter diagram includes the data points (x=2,y=10), (x=3,
y=12), (x=4, y=20), and (x=5, y=16). Two regression lines are
proposed: (1) y = 10 + x, and (2) y = 8 + 2x. Using the least-
squares criterion, which of these regression lines is the better fit to
the data? Why?
ŷ = 10 + x
For example, at x = 2: ŷ = 10 + x = 10 + 2 = 12

x    y    ŷ    (y-ŷ)   (y-ŷ)²
2    10   12   -2      4
3    12   13   -1      1
4    20   14   6       36
5    16   15   1       1
                       Σ(y-ŷ)² = 42
ŷ = 8 + 2x

x    y    ŷ    (y-ŷ)   (y-ŷ)²
2    10   12   -2      4
3    12   14   -2      4
4    20   16   4       16
5    16   18   -2      4
                       Σ(y-ŷ)² = 28
For ŷ = 10 + x, the least squares criterion = 42
For ŷ = 8 + 2x, the least squares criterion = 28
Line (2), ŷ = 8 + 2x, is the better fit because it has the smaller sum of squared errors.
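The comparison in Problem 10.2 can be checked with a few lines of Python. This is only a sketch: the helper name sum_squared_errors is made up for illustration, and the data points and candidate lines are the ones given in the problem.

```python
# Data points (x, y) from Problem 10.2.
points = [(2, 10), (3, 12), (4, 20), (5, 16)]


def sum_squared_errors(intercept, slope, data):
    """Least squares criterion: sum of (y - y_hat)^2 over all points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in data)


sse_line1 = sum_squared_errors(10, 1, points)  # y_hat = 10 + x
sse_line2 = sum_squared_errors(8, 2, points)   # y_hat = 8 + 2x

print("Line 1 SSE:", sse_line1)
print("Line 2 SSE:", sse_line2)
print("Better fit:", "line 2" if sse_line2 < sse_line1 else "line 1")
```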
LEAST SQUARES METHOD
Slope for the Estimated Regression Equation
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
where:
xi = value of independent variable for ith observation
yi = value of dependent variable for ith observation
x̄ = mean value for independent variable
ȳ = mean value for dependent variable
PAGE 573
GROEBNER et Al. (2014)
LEAST SQUARES METHOD
y-Intercept for the Estimated Regression Equation
b0 = ȳ - b1x̄
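To see the slope and intercept formulas at work, here is a small sketch that applies them to the four points of Problem 10.2. Neither of the two lines proposed in that problem is the least squares line itself, so the fitted line should have a smaller sum of squared errors than both. The function name least_squares is an illustrative choice.

```python
# Data points from Problem 10.2.
x = [2, 3, 4, 5]
y = [10, 12, 20, 16]


def least_squares(x, y):
    """Slope and intercept from the deviation-from-the-mean formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(x, y)
# Sum of squared errors of the fitted line, for comparison with 42 and 28.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

print(f"y_hat = {b0:.1f} + {b1:.1f}x, SSE = {sse:.1f}")
```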
PROBLEM # 10.3
For a sample of 8 employees, a personnel
director has collected the following data on
ownership of company stock versus years with
the firm.
a. Determine the least-squares regression line
and interpret its slope.
b. For an employee who has been with the firm
10 years, what is the predicted number of shares
of stock owned?
X = years   Y = shares
6           300
12          408
14          560
6           252
9           288
13          650
15          630
9           522
[Figure: scatter plot "Ownership of company stock vs years with the firm" (x-axis: years, y-axis: shares)]
ŷ = b0 + b1x
x    y     xy     x²
6    300   1800   36
12   408   4896   144
14   560   7840   196
6    252   1512   36
9    288   2592   81
13   650   8450   169
15   630   9450   225
9    522   4698   81
Σx = 84   Σy = 3610   Σxy = 41,238   Σx² = 968
Do your calculations! Find b0 and b1!

b1 = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]
   = [41,238 - (84)(3610)/8] / [968 - (84)²/8]
   = 3333 / 86
   = 38.756
ymean = Σy / n = 3610 / 8 = 451.25
xmean = Σx / n = 84 / 8 = 10.5
b0 = ymean - b1 xmean = 451.25 - 38.756 (10.5) = 44.314
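The hand calculation for Problem 10.3 can be reproduced from the column sums alone. The sketch below uses the computational (sums) form of the slope formula; variable names such as sum_xy are illustrative.

```python
# Column sums from the worked table for Problem 10.3 (n = 8 employees).
n = 8
sum_x = 84
sum_y = 3610
sum_xy = 41238
sum_x2 = 968

# Slope: b1 = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]
numerator = sum_xy - sum_x * sum_y / n    # 3333
denominator = sum_x2 - sum_x ** 2 / n     # 86
b1 = numerator / denominator

# Intercept: b0 = ymean - b1 * xmean
x_mean = sum_x / n
y_mean = sum_y / n
b0 = y_mean - b1 * x_mean

print(f"b1 = {b1:.3f}")
print(f"b0 = {b0:.3f}")
```

Keeping b1 unrounded when computing b0 is what gives 44.314; rounding b1 to 38.756 first shifts the last digit slightly.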
[Figure: scatter plot "Ownership of company stock vs years with the firm" (x-axis: years, y-axis: shares) with fitted line]
ŷ = b0 + b1x
y = 38.756x + 44.314
R² = 0.72009
b1 = 38.756
b0 = 44.314
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.849
R Square            0.720
Adjusted R Square   0.673
Standard Error      91.479
Observations        8

ANOVA
             df   SS          MS          F        Significance F
Regression   1    129173.13   129173.13   15.436   0.008
Residual     6    50210.37    8368.40
Total        7    179383.50

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept   44.3140        108.51           0.408    0.697     -221.197    309.825
Years       38.7558        9.86             3.929    0.008     14.618      62.893

ŷ = b0 + b1x
PROBLEM # 10.3
For a sample of 8 employees, a personnel director has
collected the following data on ownership of company stock
versus years with the firm.
b. For an employee who has been with the firm
10 years,
what is the predicted number of shares of stock owned?
ŷ = 44.314 + 38.756 x
ŷ = 44.314 + 38.756 (10)
ŷ = 431.9
The predicted number of shares owned is about 431.9.
3. COEFFICIENT OF DETERMINATION
CORRELATION COEFFICIENT
y = 38.756x + 44.314
R² = 0.72009
[Figure: scatter plot "Ownership of company stock vs years with the firm" (x-axis: years, y-axis: shares) with fitted line]
COEFFICIENT OF
DETERMINATION
[Figure: scatter plot "Ownership of company stock vs years with the firm" (x-axis: years, y-axis: shares)]

x    y
6    300
9    408
14   560
6    252
6    288
16   650
15   630
13   522

R² = 0.98688
PROVIDES A MEASURE OF THE GOODNESS OF FIT FOR THE
ESTIMATED REGRESSION EQUATION
IN OTHER WORDS
DOES THE INDEPENDENT VARIABLE
EXPLAIN
THE DEPENDENT VARIABLE WELL?
COEFFICIENT OF
DETERMINATION
FINAL COMM 215 Grade (y) =
Study Time (x1) +                       Explains 40% of the grade
Work Time (x2) +                        Explains 15% of the grade
# of hours spent on Facebook (x3) +     Explains 9% of the grade
Family Time (x4) +                      Explains 15% of the grade
anything else you can think of (x…) …
These 4 variables explain 79% of your grade!
ymean = 451.25
Sum of Squares due to Error (SSE)
Difference between observed y and estimated ŷ
SSE = Σ(y - ŷ)²
PAGE 575
GROEBNER et Al. (2014)
ymean = 451.25
Sum of Squares due to Regression (SSR)
Difference between estimated ŷ and mean of y
SSR = Σ(ŷ - ymean)²
PAGE 581
GROEBNER et Al. (2014)
ymean = 451.25
Total Sum of Squares (SST)
Difference between observed y and mean of y
SST = Σ(y - ymean)²
PAGE 580
GROEBNER et Al. (2014)
COEFFICIENT OF DETERMINATION
Relationship Among SST, SSR, SSE
SST = SSR + SSE
Σ(yi - ȳ)² = Σ(ŷi - ȳ)² + Σ(yi - ŷi)²
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
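The identity SST = SSR + SSE can be checked numerically on the Problem 10.3 data (years with the firm vs shares owned). The sketch below refits the line from the raw data and computes each sum of squares directly; variable names are illustrative. The results should agree with the SS column of the ANOVA table in the summary output.

```python
# Problem 10.3 data: years with the firm (x) and shares owned (y).
years = [6, 12, 14, 6, 9, 13, 15, 9]
shares = [300, 408, 560, 252, 288, 650, 630, 522]

n = len(years)
x_mean = sum(years) / n
y_mean = sum(shares) / n

# Least squares slope and intercept.
b1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(years, shares))
      / sum((x - x_mean) ** 2 for x in years))
b0 = y_mean - b1 * x_mean
fitted = [b0 + b1 * x for x in years]

sst = sum((y - y_mean) ** 2 for y in shares)               # total variation
ssr = sum((f - y_mean) ** 2 for f in fitted)               # explained by the line
sse = sum((y - f) ** 2 for y, f in zip(shares, fitted))    # left unexplained

print(f"SST = {sst:.2f}")
print(f"SSR = {ssr:.2f}")
print(f"SSE = {sse:.2f}")
print(f"SSR + SSE = {ssr + sse:.2f}")
```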
COEFFICIENT OF DETERMINATION
The coefficient of determination is:
r² = SSR/SST
where:
SSR = sum of squares due to regression = explained variation
SST = total sum of squares = total variation
PAGE 392
ADCOCK ET Al. (2011)
COEFFICIENT OF
DETERMINATION (R2)
Expresses proportion of
the variation in the
dependent variable (y)
that is explained by the
regression line:
ŷ = b0+b1x1
COEFFICIENT OF
CORRELATION (R)
Describes both the
direction and the
strength of the linear
relationship between
two variables
r = (sign of b1) √ r2
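As a quick illustration, both quantities can be computed from the sums of squares reported in the ANOVA table for Problem 10.3. This is a sketch with illustrative variable names; math.copysign is used here simply to attach the sign of b1 to the square root.

```python
import math

# Values taken from the summary output for Problem 10.3.
ssr = 129173.13   # sum of squares due to regression
sst = 179383.50   # total sum of squares
b1 = 38.7558      # estimated slope

r_squared = ssr / sst                          # coefficient of determination
r = math.copysign(math.sqrt(r_squared), b1)    # r = (sign of b1) * sqrt(r²)

print(f"r² = {r_squared:.5f}")
print(f"r  = {r:.3f}")
```

The two results line up with "R Square" and "Multiple R" in the summary output; because the slope is positive, r is positive as well.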
PROBLEM #10.4
For a set of data, the total variation or sum of squares for y is
SST = 143.0, and error sum of squares is SSE = 24.0. What
proportion of the variation in y is explained by the regression
equation?
We are asked to find the coefficient of determination: R² = SSR/SST
We also know SST = SSR + SSE
SSR = SST - SSE = 143.0 - 24.0 = 119.0
R² = 119.0 / 143.0 = 0.832 = 83.2%
Interpretation:
83.2% of the variation in y is explained by x
4. MODEL
ASSUMPTIONS
Before conducting regression analysis…
What determines an appropriate model for the relationship between the dependent and the independent variable(s)?
Even if the data fit the line well, the estimated regression equation should not be used until further analysis shows how appropriate the assumed model is.
One way to determine this is to test for significance.
ASSUMPTIONS-
ERROR TERM
1.  The error term ε is a random variable with
an expected value of zero. In estimating an
element that is unpredictable, it is best to
assume it to be zero.
2.  The variance of ε, denoted by σ2 is the
same for all values of x.
3.  The values of error are independent.
4.  The error term is normally distributed.
PAGE 370
ADCOCK ET Al. (2011)
PAGE 371
ADCOCK ET Al. (2011)
Do you see the normal curve?
Right here is the population
mean, which is also your point
on the line.
Assumption # 2: The variance/ standard
deviation for each value x is the same. All
of these normal distributions have the
same standard deviation.
Assumptions # 3 & 4: the error terms are independent and are normally distributed.
If the point is right on the line, then you have an error of 0. If your point is away from the line, the further away it is, the larger the error term is.
Away from the line, larger
error = larger standard error.
Right on the line, error = 0,
standard error = 0. It is exactly
where we expect the point to
be.
5. TESTING FOR
SIGNIFICANCE
ESTIMATE σ2
TESTING FOR SIGNIFICANCE
CONFIDENCE INTERVAL FOR B1
TESTING FOR SIGNIFICANCE
An Estimate of σ²
The mean square error (MSE) provides the estimate of σ², and the notation s² is also used.
s² = MSE = SSE/(n - 2)
where:
SSE = Σ(yi - ŷi)² = Σ(yi - b0 - b1xi)²
TESTING FOR SIGNIFICANCE
An Estimate of σ
SSE = Σyi² - b0Σyi - b1Σxiyi
s = √MSE = √(SSE/(n - 2))
•  To estimate σ we take the square root of s².
•  The resulting s is called the standard error of the estimate.
Why is this n - 2? It actually is n - k - 1, where k is the number of independent variables.
In a simple linear regression you always have 1 independent variable (k = 1), so it automatically becomes n - 1 - 1 = n - 2.
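As a worked example, the standard error of the estimate for Problem 10.3 can be recovered from the residual sum of squares in its ANOVA table. This is a minimal sketch; the variable names are illustrative.

```python
import math

# From the summary output for Problem 10.3.
sse = 50210.37   # residual (error) sum of squares
n = 8            # observations
k = 1            # independent variables in a simple linear regression

mse = sse / (n - k - 1)   # s² = SSE / (n - 2), the estimate of σ²
s = math.sqrt(mse)        # standard error of the estimate

print(f"MSE = {mse:.2f}")
print(f"s   = {s:.3f}")
```

The results correspond to the "MS" entry for Residual and the "Standard Error" entry under Regression Statistics in the summary output.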
STANDARD ERROR OF THE ESTIMATE
[Figure: two panels, "Large Standard Error" and "Small Standard Error"]
TESTING FOR SIGNIFICANCE
To test for a significant regression relationship, we
must conduct a hypothesis test to determine whether
the value of β1 is zero.
Two tests are commonly used:
t Test and F Test
Both the t test and F test require an estimate of σ 2,
the variance of ε in the regression model.
TESTING FOR SIGNIFICANCE: T TEST
Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
Test Statistic
t = b1 / sb1
where
sb1 = s / √Σ(xi - x̄)²
PAGE 584
GROEBNER et Al. (2014)
TESTING FOR SIGNIFICANCE: T TEST
1. Set up hypotheses. H0: β1 = 0, Ha: β1 ≠ 0
2. What is the appropriate test statistic to use? t = b1 / sb1
3. Calculate the test statistic value.
4. Find the critical value for the test statistic. α = .05
5. Define the decision rule.
6. Make your decision.
7. Interpret the conclusion in context.
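The steps of the t test can be traced on the Problem 10.3 results. The sketch below takes s and b1 from the summary output and Σ(xi - x̄)² = 86 from the slope calculation; the critical value 2.447 is the usual t-table entry for a two-tailed test at α = .05 with n - 2 = 6 degrees of freedom, typed in by hand rather than computed. Variable names are illustrative.

```python
import math

# Quantities from the Problem 10.3 calculations and summary output.
b1 = 38.7558      # estimated slope
s = 91.479        # standard error of the estimate
sxx = 86          # Σ(xi - x̄)² = Σx² - (Σx)²/n = 968 - 882

# Steps 2-3: standard error of the slope and the test statistic.
s_b1 = s / math.sqrt(sxx)
t_stat = b1 / s_b1

# Step 4: critical value looked up in a t table (two-tailed, α = .05, df = 6).
t_critical = 2.447

# Steps 5-6: reject H0: β1 = 0 if |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_critical

print(f"s_b1 = {s_b1:.2f}")
print(f"t    = {t_stat:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```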
CONFIDENCE INTERVAL FOR β1
The form of a confidence interval for β1 is:
b1 ± tα/2 sb1
where b1 is the point estimator, tα/2 sb1 is the margin of error, and tα/2 is the t value providing an area of α/2 in the upper tail of a t distribution with n - 2 degrees of freedom.
PAGE 594
GROEBNER et Al. (2014)
CONFIDENCE INTERVAL FOR β1
•  H0 is rejected if the hypothesized value of β1 is not included in the confidence interval for β1.
•  We can use a 95% confidence interval for β1 to test the hypotheses just used in the t test.
CONFIDENCE INTERVAL FOR β1
Rejection Rule
Reject H0 if 0 is not included in the confidence interval for β1.
95% Confidence Interval for β1
b1 ± tα/2 sb1 = 5.0 ± 2.048(2.25) = 5.0 ± 4.608
or 0.392 to 9.608
Conclusion
0 is not included in the confidence interval. Reject H0.
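The interval arithmetic on this slide can be written as a small helper. This is a sketch: the function name slope_confidence_interval is illustrative, and the inputs (b1 = 5.0, tα/2 = 2.048, sb1 = 2.25) are the ones shown in the slide's example.

```python
def slope_confidence_interval(b1, t_value, s_b1):
    """Return (lower, upper) for b1 ± t_value * s_b1."""
    margin = t_value * s_b1
    return b1 - margin, b1 + margin


lower, upper = slope_confidence_interval(5.0, 2.048, 2.25)
# Reject H0: β1 = 0 when 0 falls outside the interval.
reject_h0 = not (lower <= 0 <= upper)

print(f"95% CI for β1: {lower:.3f} to {upper:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```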
SOME CAUTIONS ABOUT THE INTERPRETATION OF SIGNIFICANCE TESTS
•  Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.
•  Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
What is a standard error of the estimate?
s = √MSE = √(SSE/(n - 2))
What is a standard error of the slope?
sb1 = s / √Σ(xi - x̄)²
PROBLEM # 10.5
Use a 5% level of significance. The manager of Colonial Furniture has
been reviewing weekly advertising expenditures. During the past 6
months, all advertisements for the store have appeared in the local
newspaper. The number of ads per week has varied from one to
seven. The store’s sales staff has been tracking the number of
customers who enter the store each week. The number of ads and
the number of customers per week for the past 26 weeks were
recorded.
a.  Determine the sample regression line.
b.  Interpret the coefficients.
c.  Can the manager infer that the larger the number of ads, the larger
the number of customers?
d.  Find and interpret the coefficient of determination.
e.  In your opinion, is it a worthwhile exercise to use the regression
equation to predict the number of customers who will enter the store,
given that Colonial intends to advertise five times in the newspaper? If
so, find the 95% prediction interval. If not, explain why not.
Ads Customer
5 353
6 319
3 440
2 332
4 172
2 331
4 344
2 483
4 329
2 532
7 496
5 393
4 376
7 372
2 512
5 254
5 459
2 153
1 426
6 566
6 596
5 395
6 676
3 194
2 135
7 367
ŷ = 296.92 + 21.356x
On average each additional ad generates 21.36 customers.
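The sample regression line for Problem 10.5 can be reproduced from the 26 weekly observations listed above. The sketch below applies the same least squares formulas used earlier in the lesson; the list names ads and customers are illustrative.

```python
# Weekly number of ads (x) and customers (y) for the 26 weeks in Problem 10.5.
ads = [5, 6, 3, 2, 4, 2, 4, 2, 4, 2, 7, 5, 4,
       7, 2, 5, 5, 2, 1, 6, 6, 5, 6, 3, 2, 7]
customers = [353, 319, 440, 332, 172, 331, 344, 483, 329, 532, 496, 393, 376,
             372, 512, 254, 459, 153, 426, 566, 596, 395, 676, 194, 135, 367]

n = len(ads)
x_mean = sum(ads) / n
y_mean = sum(customers) / n

# Least squares slope and intercept.
b1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(ads, customers))
      / sum((x - x_mean) ** 2 for x in ads))
b0 = y_mean - b1 * x_mean

print(f"n = {n}")
print(f"y_hat = {b0:.2f} + {b1:.3f}x")
```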
PROBLEM # 10.5
c. Can the manager infer that the larger the number of ads, the
larger the number of customers?
1. Set up the hypotheses:
H0: β1 = 0 ; Ha: β1 > 0
2. What is the appropriate test statistic to use?
One-tailed t test, α = 0.05
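3. Calculate the test statistic value.
$$t = \frac{b_1}{s_{b_1}}$$
Order of calculation:
1. SSE
2. Standard Error of Estimate (sε), where k is the number of independent variables (k = 1 in simple linear regression):
$$s_\varepsilon = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{SSE}{n-k-1}}$$
3. SSxx:
$$SS_{xx} = \sum (x_i - \bar{x})^2$$
4. Standard error of the slope:
$$s_{b_1} = \frac{s_\varepsilon}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{s_\varepsilon}{\sqrt{SS_{xx}}}$$
5. Test statistic t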
PROBLEM # 10.5
c.  Can the manager infer that the larger
the number of ads, the larger the
number of customers?
1. SSE, with ŷ = 296.92 + 21.356x
Ads x   Customer y   ŷ        y-ŷ       (y-ŷ)2
5.00    353.00       403.70   -50.70    2570.49
6.00    319.00       425.06   -106.06   11247.88
3.00    440.00       360.99   79.01     6242.90
2.00    332.00       339.63   -7.63     58.25
4.00    172.00       382.34   -210.34   44244.60
2.00    331.00       339.63   -8.63     74.51
4.00    344.00       382.34   -38.34    1470.26
2.00    483.00       339.63   143.37    20554.38
4.00    329.00       382.34   -53.34    2845.58
2.00    532.00       339.63   192.37    37005.45
7.00    496.00       446.41   49.59     2458.97
5.00    393.00       403.70   -10.70    114.49
4.00    376.00       382.34   -6.34     40.25
7.00    372.00       446.41   -74.41    5537.15
2.00    512.00       339.63   172.37    29710.73
5.00    254.00       403.70   -149.70   22410.09
5.00    459.00       403.70   55.30     3058.09
2.00    153.00       339.63   -186.63   34831.50
1.00    426.00       318.28   107.72    11604.46
6.00    566.00       425.06   140.94    19865.21
6.00    596.00       425.06   170.94    29221.85
5.00    395.00       403.70   -8.70     75.69
6.00    676.00       425.06   250.94    62972.89
3.00    194.00       360.99   -166.99   27884.99
2.00    135.00       339.63   -204.63   41874.26
7.00    367.00       446.41   -79.41    6306.27
Total                                   424281.17
SSE = Σ(y-ŷ)2 = 424281.17
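The residual table can be checked with a short script. This is a sketch that plugs the slide's rounded coefficients (296.92 and 21.356) into the fitted line, sums the squared residuals, and then computes the standard error of estimate with n − k − 1 degrees of freedom; the variable names are illustrative only.

```python
import math

ads = [5, 6, 3, 2, 4, 2, 4, 2, 4, 2, 7, 5, 4,
       7, 2, 5, 5, 2, 1, 6, 6, 5, 6, 3, 2, 7]
customers = [353, 319, 440, 332, 172, 331, 344, 483, 329, 532, 496, 393, 376,
             372, 512, 254, 459, 153, 426, 566, 596, 395, 676, 194, 135, 367]

B0, B1 = 296.92, 21.356  # coefficients as reported on the slide

# Residual = observed y minus fitted y-hat for each week.
residuals = [y - (B0 + B1 * x) for x, y in zip(ads, customers)]
sse = sum(e ** 2 for e in residuals)

n, k = len(ads), 1                   # k = number of independent variables
s_eps = math.sqrt(sse / (n - k - 1))  # standard error of estimate

print(f"SSE = {sse:.2f}, s_eps = {s_eps:.2f}")
```

Small differences from the table total are expected because the slide rounds each ŷ to two decimals before squaring.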
PROBLEM # 10.5
c.  Can the manager infer that the larger the
number of ads, the larger the number of
customers?
2. Standard error of estimate, with SSE = Σ(y-ŷ)2 = 424281.17:
$$s_\varepsilon = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{SSE}{n-k-1}} = \sqrt{\frac{424281.17}{26-1-1}} = 132.96$$
PROBLEM # 10.5
c.  Can the manager infer that the larger the
number of ads, the larger the number of
customers?
3. SSxx = Σ(x-xbar)2
Ads x   x-xbar   (x-xbar)2
5.00    0.88     0.7744
6.00    1.88     3.5344
3.00    -1.12    1.2544
2.00    -2.12    4.4944
4.00    -0.12    0.0144
2.00    -2.12    4.4944
4.00    -0.12    0.0144
2.00    -2.12    4.4944
4.00    -0.12    0.0144
2.00    -2.12    4.4944
7.00    2.88     8.2944
5.00    0.88     0.7744
4.00    -0.12    0.0144
7.00    2.88     8.2944
2.00    -2.12    4.4944
5.00    0.88     0.7744
5.00    0.88     0.7744
2.00    -2.12    4.4944
1.00    -3.12    9.7344
6.00    1.88     3.5344
6.00    1.88     3.5344
5.00    0.88     0.7744
6.00    1.88     3.5344
3.00    -1.12    1.2544
2.00    -2.12    4.4944
7.00    2.88     8.2944
Mean of x = 4.12         Total = 86.6544
4. Standard error of the slope:
$$s_{b_1} = \frac{s_\varepsilon}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{132.96}{\sqrt{86.6544}} = 14.28$$
PROBLEM # 10.5
c.  Can the manager infer that the larger the
number of ads, the larger the number of
customers?
5. Test statistic, with b1 = 21.356 taken from ŷ = 296.92 + 21.356x and
$$s_{b_1} = \frac{s_\varepsilon}{\sqrt{\sum (x_i - \bar{x})^2}} = 14.28$$
$$t = \frac{b_1}{s_{b_1}} = \frac{21.356}{14.28} = 1.4955$$
PROBLEM # 10.5
c.  Can the manager infer that the larger the number of ads, the larger the number of
customers?
3. Calculate the test statistic value.
$$t = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{21.356 - 0}{14.28} = 1.496$$
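As a quick check of the arithmetic, the standard error of the slope and the test statistic can be recomputed from the intermediate values reported on the slides (sε = 132.96, SSxx = 86.6544, b1 = 21.356). This is only a sketch; the variable names are illustrative.

```python
import math

b1 = 21.356         # sample slope
s_eps = 132.96      # standard error of estimate
ss_xx = 86.6544     # sum of squared deviations of x
beta1_null = 0      # hypothesized slope under Ho

s_b1 = s_eps / math.sqrt(ss_xx)   # standard error of the slope
t_obs = (b1 - beta1_null) / s_b1  # observed t statistic

print(f"s_b1 = {s_b1:.2f}, t = {t_obs:.3f}")
```

The third decimal of t can differ slightly from the slide, which divides by the rounded value 14.28.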
PROBLEM # 10.5
4. Find the critical value of the test statistic.
$t_{\alpha,\,n-k-1} = t_{0.05,\,24} = 1.711$
5. Define the decision rule.
Reject Ho if t observed > t critical; otherwise, do not reject Ho.
6. Make your decision.
Since t observed = 1.4955 < t critical = 1.711, we do not reject Ho.
7. Interpret in context.
At the 5% level of significance, there is not enough evidence to conclude that the larger the number
of ads, the larger the number of customers.
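The decision rule in steps 5 and 6 amounts to a single comparison. The sketch below hard-codes the critical value 1.711 read from the t table (α = 0.05, 24 degrees of freedom) rather than computing it; `upper_tail_decision` is a hypothetical helper name.

```python
def upper_tail_decision(t_observed, t_critical):
    """Decision for a one-tailed test of Ho: beta1 = 0 against Ha: beta1 > 0."""
    if t_observed > t_critical:
        return "reject Ho"
    return "do not reject Ho"


T_CRITICAL = 1.711   # t(0.05, 24) from the t table, as on the slide
T_OBSERVED = 1.4955  # test statistic computed in step 3

decision = upper_tail_decision(T_OBSERVED, T_CRITICAL)
print(decision)
```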
6. COVARIANCE &
COEFFICIENT OF CORRELATION
Covariance
Interpretation of the covariance
Correlation coefficient
MEASURES OF ASSOCIATION
BETWEEN TWO VARIABLES
Thus far we have examined numerical methods used
to summarize the data for one variable at a time.
Often a manager or decision maker is interested in
the relationship between two variables.
Two descriptive measures of the relationship
between two variables are covariance and correlation
coefficient.
2 VARIABLE RELATIONSHIPS
COVARIANCE
Positive values indicate a positive relationship.
Negative values indicate a negative relationship.
The covariance is a measure of the linear association
between two variables.
PAGE 560
GROEBNER et Al. (2014)
COVARIANCE
The covariance is computed as follows:
For samples:
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
For populations:
$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$
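The two covariance formulas differ only in the divisor (n − 1 for a sample, N for a population). A minimal sketch follows; the function names and the three-point data set are made up for illustration and do not come from the slides.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of cross-products divided by n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def population_covariance(x, y):
    """Population covariance: sum of cross-products divided by N."""
    big_n = len(x)
    mu_x, mu_y = sum(x) / big_n, sum(y) / big_n
    return sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y)) / big_n


# Hypothetical toy data, chosen only to show the two divisors.
xs = [1, 2, 3]
ys = [2, 4, 7]
print(sample_covariance(xs, ys), population_covariance(xs, ys))
```

Both values are positive here, which indicates a positive linear relationship in the toy data.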
CORRELATION COEFFICIENT
Just because two variables are highly correlated, it
does not mean that one variable is the cause of the
other.
Correlation is a measure of linear association and not
necessarily causation.
CORRELATION COEFFICIENT
Pearson Product Moment Correlation Coefficient
The correlation coefficient is computed as follows:
For samples:
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
For populations:
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
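CORRELATION COEFFICIENT
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$$
or, in the algebraically equivalent computational form,
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
where:
r - Sample correlation coefficient
n - Sample size
x - Value of the independent variable
y - Value of the dependent variable
PAGE 561
GROEBNER et Al. (2014)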
CORRELATION COEFFICIENT
Values near +1 indicate a strong positive linear
relationship.
Values near -1 indicate a strong negative linear
relationship.
The coefficient can take on values between -1 and +1.
The closer the correlation is to zero, the weaker the
relationship.
Covariance and Correlation Coefficient
x           y      (xi - xbar)   (yi - ybar)   (xi - xbar)(yi - ybar)
277.6       69     10.65         -1.0          -10.65
259.5       71     -7.45         1.0           -7.45
269.1       70     2.15          0             0
267.0       70     0.05          0             0
255.6       71     -11.35        1.0           -11.35
272.9       69     5.95          -1.0          -5.95
Average     267.0  70.0
Std. Dev.   8.2192 .8944
Total                                          -35.40
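The table supplies everything needed for the sample covariance and correlation coefficient of this six-point example. The sketch below recomputes the deviations from the raw x and y columns and applies s_xy = Σ(x − x̄)(y − ȳ)/(n − 1) and r = s_xy/(s_x s_y); the resulting values are computed here and are not quoted from the slide.

```python
import math

x = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]
y = [69, 71, 70, 70, 71, 69]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sum of cross-products of deviations (the "Total" column in the table).
total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

s_xy = total / (n - 1)                                       # sample covariance
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))  # sample std. dev. of x
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))  # sample std. dev. of y
r_xy = s_xy / (s_x * s_y)                                    # sample correlation

print(f"s_xy = {s_xy:.2f}, s_x = {s_x:.4f}, s_y = {s_y:.4f}, r = {r_xy:.4f}")
```

A correlation close to −1 indicates a strong negative linear relationship between the two columns.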
Ads x   Customer y   x-xmean   (x-xmean)2   y-ymean   (y-ymean)2   (x-xmean)(y-ymean)
5.00    353.00       0.88      0.77         -31.81    1011.88      -27.993
6.00    319.00       1.88      3.53         -65.81    4330.96      -123.723
3.00    440.00       -1.12     1.25         55.19     3045.94      -61.813
2.00    332.00       -2.12     4.49         -52.81    2788.90      111.957
4.00    172.00       -0.12     0.01         -212.81   45288.10     25.537
2.00    331.00       -2.12     4.49         -53.81    2895.52      114.077
4.00    344.00       -0.12     0.01         -40.81    1665.46      4.897
2.00    483.00       -2.12     4.49         98.19     9641.28      -208.163
4.00    329.00       -0.12     0.01         -55.81    3114.76      6.697
2.00    532.00       -2.12     4.49         147.19    21664.90     -312.043
7.00    496.00       2.88      8.29         111.19    12363.22     320.227
5.00    393.00       0.88      0.77         8.19      67.08        7.207
4.00    376.00       -0.12     0.01         -8.81     77.62        1.057
7.00    372.00       2.88      8.29         -12.81    164.10       -36.893
2.00    512.00       -2.12     4.49         127.19    16177.30     -269.643
5.00    254.00       0.88      0.77         -130.81   17111.26     -115.113
5.00    459.00       0.88      0.77         74.19     5504.16      65.287
2.00    153.00       -2.12     4.49         -231.81   53735.88     491.437
1.00    426.00       -3.12     9.73         41.19     1696.62      -128.513
6.00    566.00       1.88      3.53         181.19    32829.82     340.637
6.00    596.00       1.88      3.53         211.19    44601.22     397.037
5.00    395.00       0.88      0.77         10.19     103.84       8.967
6.00    676.00       1.88      3.53         291.19    84791.62     547.437
3.00    194.00       -1.12     1.25         -190.81   36408.46     213.707
2.00    135.00       -2.12     4.49         -249.81   62405.04     529.597
7.00    367.00       2.88      8.29         -17.81    317.20       -51.293
Mean of x = 4.12, mean of y = 384.81
Totals: Σ(x-xmean)2 = 86.65, Σ(y-ymean)2 = 463802.04, Σ(x-xmean)(y-ymean) = 1850.577
Standard deviations: sx = 1.86, sy = 136.21
Covariance: sxy = 74.023
Variances: sx2 = 3.466176, sy2 = 18552.081544
Sample covariance, using Σ(x-xmean)(y-ymean) = 1850.577 and n = 26:
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{1850.577}{26 - 1} = 74.023$$
Sample correlation coefficient:
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
PROBLEM # 10.5
d. Find and interpret the coefficient of determination.
e. In your opinion, is it a worthwhile exercise to use the regression
equation to predict the number of customers who will enter the store,
given that Colonial intends to advertise five times in the newspaper? If
so, find the 95% prediction interval. If not, explain why not.
PROBLEM # 10.5
d. Find and interpret the coefficient of determination.
$$R^2 = \frac{s_{xy}^2}{s_x^2 s_y^2} = \frac{(74.02)^2}{(3.47)(18{,}552)} = .0851$$
Only about 8.5% of the variation in the number of customers is explained by the number of ads.
There is a weak linear relationship between the number of ads and the
number of customers.
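The coefficient of determination can be cross-checked in two ways: from the covariance and variances, as on the slide, and from the standard identity R² = 1 − SSE/SSyy using the sums computed earlier in the deck. This is a sketch with the deck's reported figures as inputs; the variable names are illustrative.

```python
# Figures reported earlier in the deck for Problem 10.5.
s_xy = 74.023            # sample covariance of ads and customers
var_x = 3.466176         # sample variance of ads
var_y = 18552.081544     # sample variance of customers
sse = 424281.17          # sum of squared residuals
ss_yy = 463802.04        # total sum of squares of customers

r2_from_cov = s_xy ** 2 / (var_x * var_y)  # slide's formula
r2_from_sse = 1 - sse / ss_yy              # equivalent identity

print(f"R^2 = {r2_from_cov:.4f} (covariance form), {r2_from_sse:.4f} (SSE form)")
```

Both forms agree to about four decimals; the slide's .0851 reflects rounding of the inputs before dividing.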
PROBLEM # 10.5
e. In your opinion, is it a worthwhile exercise to use the regression equation to predict the
number of customers who will enter the store, given that Colonial intends to advertise five
times in the newspaper? If so, find the 95% prediction interval. If not, explain why not.
No. The linear relationship is too weak for the model to produce useful predictions: the slope is not significant at the 5% level (t = 1.4955 < 1.711) and R2 = .0851.
7. ESTIMATION
POINT ESTIMATION
INTERVAL ESTIMATION
CONFIDENCE INTERVAL FOR THE MEAN VALUE OF Y
PREDICTION INTERVAL FOR AN INDIVIDUAL VALUE OF Y
POINT ESTIMATION
If 3 TV ads are run prior to a sale, we expect
the mean number of cars sold to be:
ŷ = 10 + 5(3) = 25 cars
Using the Estimated Regression
Equation for Estimation and Prediction
Confidence Interval Estimate of E(yp):
$$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$$
Prediction Interval Estimate of yp:
$$\hat{y}_p \pm t_{\alpha/2}\, s_{ind}$$
where:
confidence coefficient is 1 - α and
tα/2 is based on a t distribution
with n - 2 degrees of freedom
Confidence Interval for E(yp)
Estimate of the Standard Deviation of ŷp:
$$s_{\hat{y}_p} = s\sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
Confidence interval:
$$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$$
CONFIDENCE VS PREDICTION
INTERVAL
PROBLEM #10.6
For n=6 data points, the following quantities have been calculated:
$$\sum x = 40, \quad \sum y = 76, \quad \sum xy = 400, \quad \sum x^2 = 346, \quad \sum y^2 = 1160, \quad \sum (y - \hat{y})^2 = 52.334$$
a. Determine the least-squares regression line.
PROBLEM #10.6
a. Determine the least-squares regression line.
To determine the least squares regression line, we must calculate the slope
and y-intercept.
Using the means rounded to two decimals (x̄ = 6.67, ȳ = 12.67):
$$b_1 = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \frac{400 - 6(6.67)(12.67)}{346 - 6(6.67)^2} = -1.354$$
$$b_0 = \bar{y} - b_1\bar{x} = 12.67 - (-1.354)(6.67) = 21.701$$
Using the sums directly, without rounding the means first:
$$b_1 = \frac{400 - (40)(76)/6}{346 - (40)^2/6} = -1.345$$
$$b_0 = 12.67 - (-1.345)(6.67) = 21.641$$
The regression equation is ŷ = 21.641 – 1.345x
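The slope and intercept can also be computed straight from the summary sums. This sketch carries full precision throughout, so the intercept comes out near 21.630 rather than the slide's 21.641, which is obtained by substituting the rounded values 12.67, 6.67 and −1.345; the variable names are illustrative.

```python
# Summary quantities given for Problem 10.6.
n = 6
sum_x, sum_y = 40, 76
sum_xy, sum_x2 = 400, 346

# Computational forms of the sums of squares and cross-products.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n

b1 = ss_xy / ss_xx                      # slope
b0 = sum_y / n - b1 * (sum_x / n)       # intercept, using unrounded means

print(f"SSxx = {ss_xx:.3f}, b1 = {b1:.3f}, b0 = {b0:.3f}")
```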
PROBLEM #10.6
b. Determine the standard error of estimate.
The standard error of the estimate is
$$s_{y.x} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{52.334}{6 - 2}} = 3.617$$
PROBLEM #10.6
c. Construct the 95% confidence interval for the mean of y when x = 7.0.
Point estimate from the regression equation: ŷ = 21.641 – 1.345(7) = 12.226
$$\hat{y} \pm t\,s_{y.x}\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}} = 12.226 \pm 2.776(3.617)\sqrt{\frac{1}{6} + \frac{(7 - 6.67)^2}{346 - \frac{(40)^2}{6}}}$$
where (7 - 6.67)2 = 0.1089, 346 - (40)2/6 = 79.333, 2.776(3.617) = 10.041, and the square root equals 0.4099:
12.226 ± 10.041 x 0.4099
12.226 + 10.041 x 0.4099 = 16.342
12.226 - 10.041 x 0.4099 = 8.11
With the coefficients b0 = 21.701 and b1 = -1.354, ŷ = 12.223 and the interval is 12.223 ± 4.116 = (8.107, 16.339).
PROBLEM #10.6
c. Construct the 95% confidence interval for the mean of y when x = 7.0.
We also need the t-value with 6 - 2 = 4 degrees of freedom for a 95% interval;
this value is 2.776.
Therefore, the 95% confidence interval for the mean value of y when x = 7 is 12.226 ± 4.116.
The confidence interval ranges from 8.110 to 16.342.
PROBLEM #10.6
d. Construct the 95% confidence interval for the mean of y when x = 9.0.
Point estimate from the regression equation: ŷ = 21.641 – 1.345(9) = 9.536
$$\hat{y} \pm t\,s_{y.x}\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}} = 9.536 \pm 2.776(3.617)\sqrt{\frac{1}{6} + \frac{(9 - 6.67)^2}{346 - \frac{(40)^2}{6}}}$$
where (9 - 6.67)2 = 5.4289, 346 - (40)2/6 = 79.333, 2.776(3.617) = 10.041, and the square root equals 0.4849:
9.536 ± 10.041 x 0.4849
9.536 + 10.041 x 0.4849 = 14.4048
9.536 - 10.041 x 0.4849 = 4.668
With the coefficients b0 = 21.701 and b1 = -1.354, ŷ = 9.515 and the interval is 9.515 ± 4.868 = (4.647, 14.383).
PROBLEM #10.6
d. Construct the 95% confidence interval for the mean of y when x = 9.0.
We also need the t-value with 6 - 2 = 4 degrees of freedom for a 95% interval;
this value is 2.776.
Therefore, the 95% confidence interval for the mean value of y when x = 9 is 9.536 ± 4.868.
The confidence interval ranges from 4.668 to 14.404.
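The two confidence intervals for the mean of y can be reproduced with one small function. This sketch uses the values reported on the slides (ŷ = 21.641 – 1.345x, s = 3.617, t = 2.776, x̄ = 6.67, SSxx = 346 − 40²/6); `mean_confidence_interval` is a hypothetical helper name, and the t value is taken from the table rather than computed.

```python
import math


def mean_confidence_interval(x_p, b0, b1, s, t_crit, n, x_bar, ss_xx):
    """Interval for the mean of y at x_p: y-hat +/- t * s * sqrt(1/n + (x_p - x_bar)^2 / SSxx)."""
    y_hat = b0 + b1 * x_p
    half_width = t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width


# Values taken from the Problem 10.6 slides.
params = dict(b0=21.641, b1=-1.345, s=3.617, t_crit=2.776,
              n=6, x_bar=6.67, ss_xx=346 - 40 ** 2 / 6)

lo7, hi7 = mean_confidence_interval(7, **params)
lo9, hi9 = mean_confidence_interval(9, **params)
print(f"x = 7: ({lo7:.3f}, {hi7:.3f})")
print(f"x = 9: ({lo9:.3f}, {hi9:.3f})")
```

The interval at x = 9 is wider because the (x_p − x̄)² term grows as x_p moves away from the mean of x. Third-decimal differences from the slides come from rounding intermediate values.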
PROBLEM #10.6
e. Compare the width of the confidence interval obtained in part (c) with
the one obtained in part (d). Which is wider and why?
For x = 7,
8.110 to 16.342 (width 8.232)
For x = 9,
4.668 to 14.404 (width 9.736)
The confidence interval in (d) is wider because 9 is farther
from the mean of x (6.67) than 7 is, so the (x - x̄)2 term in the standard error is larger.
Prediction Interval for yp
Estimate of the Standard Deviation
of an Individual Value of yp:
$$s_{ind} = s\sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
Prediction interval:
$$\hat{y}_p \pm t_{\alpha/2}\, s_{ind}$$
PROBLEM #10.7
For the summary data provided in Problem #10.6, construct a
95% prediction interval for an individual y value whenever
a. x = 2
Point estimate from the regression equation: ŷ = 21.641 – 1.345(2) = 18.951
We also need the t-value with 6 - 2 = 4 degrees of freedom for a 95%
interval; this value is 2.776. The 95% prediction interval for
an individual y value when x = 2 is then computed as follows.
$$\hat{y} \pm t\,s_{y.x}\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}} = 18.951 \pm 2.776(3.617)\sqrt{1 + \frac{1}{6} + \frac{(2 - 6.67)^2}{346 - \frac{(40)^2}{6}}}$$
where (2 - 6.67)2 = 21.8089, 346 - (40)2/6 = 79.333, 2.776(3.617) = 10.041, and the square root equals 1.2006:
18.951 ± 10.041 x 1.2006
18.951 + 10.041 x 1.2006 = 31.006
18.951 - 10.041 x 1.2006 = 6.896
With the coefficients b0 = 21.701 and b1 = -1.354, ŷ = 18.993 and the interval is 18.993 ± 12.056 = (6.937, 31.049).
PROBLEM #10.7
For the summary data provided in Problem #10.6, construct a
95% prediction interval for an individual y value whenever
a. x = 2
We also need the t-value with 6 - 2 = 4 degrees of freedom for a 95%
interval; this value is 2.776. Therefore, the 95% prediction interval for
an individual y value when x = 2 is 18.951 ± 12.056, where ŷ = 21.641 – 1.345(2) = 18.951.
The prediction interval ranges from 6.895 to 31.007.
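The prediction interval differs from the confidence interval only by the extra 1 under the square root, which accounts for the scatter of an individual y around its mean. The sketch below reuses the slide values for Problem #10.6; `prediction_interval` is a hypothetical helper name, and the t value is taken from the table rather than computed.

```python
import math


def prediction_interval(x_p, b0, b1, s, t_crit, n, x_bar, ss_xx):
    """Interval for an individual y at x_p: y-hat +/- t * s * sqrt(1 + 1/n + (x_p - x_bar)^2 / SSxx)."""
    y_hat = b0 + b1 * x_p
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width


# Values taken from the slides: fitted line, standard error, t(0.025, 4), and x summary.
lower, upper = prediction_interval(
    2, b0=21.641, b1=-1.345, s=3.617, t_crit=2.776,
    n=6, x_bar=6.67, ss_xx=346 - 40 ** 2 / 6,
)
print(f"95% prediction interval at x = 2: ({lower:.3f}, {upper:.3f})")
```

For the same x, this interval is always wider than the confidence interval for the mean of y.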

Introduction to Generative Engine Optimization (GEO)
How to Get Approval for Business Funding
Negotiation and Persuasion Skills: A Shrewd Person's Perspective
ANALYZING THE OPPORTUNITIES OF DIGITAL MARKETING IN BANGLADESH TO PROVIDE AN ...
NEW - FEES STRUCTURES (01-july-2024).pdf
Module 2 - Modern Supervison Challenges - Student Resource.pdf
Technical Architecture - Chainsys dataZap
Module 3 - Functions of the Supervisor - Part 1 - Student Resource (1).pdf

Lesson 10 - Regression Analysis

  • 2. 7 SECTIONS: 1. SIMPLE LINEAR REGRESSION MODEL (14.2); 2. LEAST SQUARES METHOD (14.2); 3. COEFFICIENT OF DETERMINATION (14.2); 4. MODEL ASSUMPTIONS (14.2); 5. TESTING FOR SIGNIFICANCE (14.3); 6. COVARIANCE & COEFFICIENT OF CORRELATION (14.1); 7. USING THE ESTIMATED REGRESSION EQUATION FOR ESTIMATION AND PREDICTION (11.7)
  • 3. 1. SIMPLE LINEAR REGRESSION MODEL: Regression model; regression equation; estimated regression equation. Created by Samie L.S. Ly, must not be used for profitable purposes without permission.
  • 4. THE OBJECTIVE OF SLR: Let's say I would like to create the ultimate equation to figure out my final COMM 215 grade. What influences my final COMM 215 grade? Think of a few examples.
  • 5. FINAL COMM 215 Grade (y) = Study Time (x1) + Work Time (x2) + # of hours spent on Facebook (x3) + Family Time (x4) + anything else you can think of (x…) …
  • 6. THE OBJECTIVE OF SLR: Let's say I would like to create the ultimate equation to figure out my final COMM 215 grade. Here is 1 scenario: I study 5 hours a week, work part-time for 25 hours a week, spend 3 hours on Facebook, and have 2 kids.
  • 7. THE OBJECTIVE OF SLR: Let's say I would like to create the ultimate equation to figure out my final COMM 215 grade. What if it were someone else, with a different profile? Do I have to make my calculations all over again? If I set up a regression line, I just need to plug in values to get an estimate of my final COMM 215 grade. Voila! Then you might ask… how?
  • 8. THE OBJECTIVE OF SLR: Let's say I would like to create the ultimate equation to figure out my final COMM 215 grade. How do I create this regression line? Answer: by gathering historical data. I am going to take a sample of individuals who took COMM 215 before and write down their profiles (hours per week):
        Name    Study Time   Work Time   Family Time   Facebook Time
        Bob     10           25          5             0
        Sally   12           0           15            15
        Eric    3            40          10            0
  • 9. THE OBJECTIVE OF SLR: Let's say I would like to create the ultimate equation to figure out my final COMM 215 grade. Since simple linear regression considers only 1 independent variable, let's take just Study Time as the main indicator of your final COMM 215 grade:
        Name    Study Time (x), hours per week   Final Grade (y)
        Bob     10                               89
        Sally   12                               67
        Eric    3                                45
  • 10. THE OBJECTIVE OF SLR: Let's say I would like to create the ultimate equation to figure out my final COMM 215 grade. We've generated this equation: y = 3.4478x + 38.269. Now Michelle asks: if I study 8 hours a week, what would be my estimated final COMM 215 grade?
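As a rough check, the equation on this slide can be reproduced from the three-student sample on the previous slide (Bob, Sally, Eric). The sketch below is an illustration rather than part of the original deck: the helper name `fit_line` is hypothetical, and it assumes the standard least-squares formulas for the slope and intercept.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

study_hours = [10, 12, 3]    # Bob, Sally, Eric
final_grades = [89, 67, 45]

b0, b1 = fit_line(study_hours, final_grades)
michelle = b0 + b1 * 8       # Michelle studies 8 hours a week

print(f"y = {b1:.4f}x + {b0:.3f}")
print(f"Estimated grade for 8 hours/week: {michelle:.1f}")
```

With only three observations the line is easy to compute but not very reliable, so the estimate for Michelle should be read as an illustration of the mechanics.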
  • 11. SIMPLE LINEAR REGRESSION (1 variable): FINAL COMM 215 GRADE (y) = STUDY TIME (x). MULTIPLE LINEAR REGRESSION (more than 1 variable): FINAL COMM 215 GRADE (y) = STUDY TIME (x1) + WORK TIME (x2) + …
  • 12. Managerial decisions, such as predicting sales, are often based on the relationship between two or more variables. Regression analysis is the statistical process used to develop an equation showing how the variables are related.
  • 13. The variable being predicted is called the dependent variable. The variable or variables being used to predict the value of the dependent variable are called independent variables.
  • 14. FINAL COMM 215 Grade (y) = Study Time (x1) + Work Time (x2) + # of hours spent on Facebook (x3) + Family Time (x4) + anything else you can think of (x…) …
  • 15. SIMPLE LINEAR REGRESSION EQUATION: Positive linear relationship. [Figure: regression line of E(y) against x, with intercept β0 and positive slope β1.] The relationship between the two variables is approximated by a straight line. (Groebner et al., 2014, p. 572)
  • 16. SIMPLE LINEAR REGRESSION EQUATION: Negative linear relationship. [Figure: regression line of E(y) against x, with intercept β0 and negative slope β1.] When would there be a negative relationship? If I spend all my time watching movies instead of studying, would it increase my grade or decrease it?
  • 17. SIMPLE LINEAR REGRESSION EQUATION: No relationship. [Figure: horizontal regression line of E(y) against x, with intercept β0 and slope β1 equal to 0.] Hmm… what would be an example of no relationship? If my friend eats 5 ice creams every day, does it have any relationship with my grade? Not really, right?
  • 18. SIMPLE LINEAR REGRESSION MODEL: y = β0 + β1x + ε, where β0 and β1 are the parameters of the model and ε is a random variable referred to as the error term. ε accounts for the variability in y that cannot be explained. [Figure: regression line of E(y) against x, with intercept β0 and positive slope β1.]
  • 19. CHARACTERISTICS OF THE ERROR TERM: y = β0 + β1x + ε. The more variables you add, the smaller ε becomes, because now you know more! y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x7 + β8x8 + … + βixi (Groebner et al., 2014, p. 573)
  • 20. CHARACTERISTICS OF THE ERROR TERM (continued): y = β0 + β1x + ε. The error term ε cannot be explained! Cannot be calculated! Cannot… just don't ask (sigh).
  • 21. REGRESSION EQUATION: Describes how the expected value of y, E(y), is related to x: E(y) = β0 + β1x. E(y) is the expected value of y for a given x value; β1 is the slope of the regression line; β0 is the y-intercept of the regression line. The graph of the regression equation is a straight line. (Groebner et al., 2014, p. 573)
  • 22. ESTIMATED SIMPLE LINEAR REGRESSION EQUATION: Sample statistics b0 and b1 are used to estimate the parameters of E(y) = β0 + β1x, giving ŷ = b0 + b1x. ŷ is the estimated value of y for a given x value; b1 is the slope of the line; b0 is the y-intercept of the line. The graph is called the estimated regression line.
  • 23. PROBLEM #10.1: In the linear regression equation ŷ = b0 + b1x1, why is the term on the left given as ŷ instead of simply y? Answer: because it is an estimated value of the dependent variable for a given value of x.
  • 24. 2. LEAST SQUARES METHOD: A procedure for using sample data to find the estimated regression equation. The goal of using the least squares method is to obtain ŷ = b0 + b1x.
  • 25. Good fit with the line
  • 26. LEAST SQUARES METHOD: Least squares criterion: $\min \sum (y_i - \hat{y}_i)^2$, where $y_i$ is the observed value of the dependent variable for the ith observation and $\hat{y}_i$ is the estimated value of the dependent variable for the ith observation. (Groebner et al., 2014, p. 572)
  • 27. PROBLEM # 10.2: A scatter diagram includes the data points (x=2, y=10), (x=3, y=12), (x=4, y=20), and (x=5, y=16). Two regression lines are proposed: (1) y = 10 + x, and (2) y = 8 + 2x. Using the least-squares criterion, which of these regression lines is the better fit to the data? Why? Start with ŷ = 10 + x. For the first point, ŷ = 10 + x = 10 + 2 = 12.
        x    y    ŷ    (y − ŷ)   (y − ŷ)²
        2    10
        3    12
        4    20
        5    16
  • 28. PROBLEM # 10.2 (continued): For line (1), ŷ = 10 + x:
        x    y    ŷ    (y − ŷ)   (y − ŷ)²
        2    10   12   −2        4
        3    12   13   −1        1
        4    20   14   6         36
        5    16   15   1         1
        Sum of (y − ŷ)² = 42
  • 29. PROBLEM # 10.2 (continued): For line (2), ŷ = 8 + 2x:
        x    y    ŷ    (y − ŷ)   (y − ŷ)²
        2    10   12   −2        4
        3    12   14   −2        4
        4    20   16   4         16
        5    16   18   −2        4
        Sum of (y − ŷ)² = 28
  • 30. PROBLEM # 10.2 (continued): For ŷ = 10 + x, the least squares criterion = 42. For ŷ = 8 + 2x, the least squares criterion = 28. Line (2) has the smaller sum of squared errors, so it is the better fit to the data.
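The comparison in Problem 10.2 can be sketched in a few lines of Python. This is an illustrative check, not part of the original deck; the function name `sum_squared_errors` is a hypothetical helper.

```python
# Data points from Problem 10.2 as (x, y) pairs.
points = [(2, 10), (3, 12), (4, 20), (5, 16)]

def sum_squared_errors(b0, b1, data):
    """Least squares criterion: sum of (y - y_hat)^2 for the line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in data)

sse_line1 = sum_squared_errors(10, 1, points)   # line (1): y_hat = 10 + x
sse_line2 = sum_squared_errors(8, 2, points)    # line (2): y_hat = 8 + 2x

print(sse_line1, sse_line2)   # 42 28
better = "line (2)" if sse_line2 < sse_line1 else "line (1)"
print("Better fit:", better)
```

Neither proposed line is claimed to be the least-squares line itself; the criterion only ranks the two candidates.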
  • 31. LEAST SQUARES METHOD: Slope for the estimated regression equation: $b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$, where $x_i$ is the value of the independent variable for the ith observation, $y_i$ is the value of the dependent variable for the ith observation, $\bar{x}$ is the mean value of the independent variable, and $\bar{y}$ is the mean value of the dependent variable. (Groebner et al., 2014, p. 573)
  • 32. LEAST SQUARES METHOD: y-intercept for the estimated regression equation: $b_0 = \bar{y} - b_1 \bar{x}$
  • 33. PROBLEM # 10.3: For a sample of 8 employees, a personnel director has collected the following data on ownership of company stock versus years with the firm. a. Determine the least-squares regression line and interpret its slope. b. For an employee who has been with the firm 10 years, what is the predicted number of shares of stock owned?
        x = years   y = shares
        6           300
        12          408
        14          560
        6           252
        9           288
        13          650
        15          630
        9           522
  • 34. PROBLEM # 10.3 (continued): a. Determine the least-squares regression line and interpret its slope. [Figure: scatter plot "Ownership of company stock vs years with the firm", shares against years.] ŷ = b0 + b1x
  • 35. PROBLEM # 10.3 (continued): a. Determine the least-squares regression line and interpret its slope. ŷ = b0 + b1x
  • 36. PROBLEM # 10.3 (continued): a. Determine the least-squares regression line and interpret its slope.
        x       y        xy         x²
        6       300      1,800      36
        12      408      4,896      144
        14      560      7,840      196
        6       252      1,512      36
        9       288      2,592      81
        13      650      8,450      169
        15      630      9,450      225
        9       522      4,698      81
        Σx = 84   Σy = 3,610   Σxy = 41,238   Σx² = 968
  • 37. PROBLEM # 10.3 (continued): a. Determine the least-squares regression line ŷ = b0 + b1x and interpret its slope. Σx = 84, Σy = 3,610, Σxy = 41,238, Σx² = 968. Do your calculations! Find b0 and b1!
  • 38. PROBLEM # 10.3 (continued): a. Determine the least-squares regression line ŷ = b0 + b1x and interpret its slope. Σx = 84, Σy = 3,610, Σxy = 41,238, Σx² = 968. $b_1 = \dfrac{\sum xy - (\sum x \sum y)/n}{\sum x^2 - (\sum x)^2/n} = \dfrac{41{,}238 - (84)(3610)/8}{968 - (84)^2/8} = \dfrac{3333}{86} = 38.756$
  • 39. PROBLEM # 10.3 (continued): a. Determine the least-squares regression line ŷ = b0 + b1x and interpret its slope. $\bar{y} = \sum y / n = 3610/8 = 451.25$ and $\bar{x} = \sum x / n = 84/8 = 10.5$, so $b_0 = \bar{y} - b_1 \bar{x} = 451.25 - 38.756(10.5) = 44.314$
  • 40. PROBLEM # 10.3 (continued): a. Determine the least-squares regression line and interpret its slope. b1 = 38.756, b0 = 44.314. [Figure: scatter plot "Ownership of company stock vs years with the firm", shares against years, with fitted line y = 38.756x + 44.314 and R² = 0.72009.]
  • 41. PROBLEM # 10.3 (continued): Spreadsheet summary output for ŷ = b0 + b1x:
        Regression Statistics
        Multiple R           0.849
        R Square             0.720
        Adjusted R Square    0.673
        Standard Error       91.479
        Observations         8

        ANOVA        df   SS          MS          F        Significance F
        Regression   1    129173.13   129173.13   15.436   0.008
        Residual     6    50210.37    8368.40
        Total        7    179383.50

                     Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
        Intercept    44.3140        108.51           0.408    0.697     -221.197    309.825
        Years        38.7558        9.86             3.929    0.008     14.618      62.893
  • 42. PROBLEM # 10.3 (continued): b. For an employee who has been with the firm 10 years, what is the predicted number of shares of stock owned? ŷ = 44.314 + 38.756x = 44.314 + 38.756(10) = 431.9. The predicted number of shares is 431.9.
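Parts (a) and (b) of Problem 10.3 can be checked numerically. The sketch below is an illustration, not part of the original deck; it applies the computational formulas for b1 and b0 shown in the slides to the eight data points, and the variable names are my own.

```python
# Data from Problem 10.3: years with the firm (x) and shares owned (y).
years = [6, 12, 14, 6, 9, 13, 15, 9]
shares = [300, 408, 560, 252, 288, 650, 630, 522]

n = len(years)
sum_x = sum(years)                                   # 84
sum_y = sum(shares)                                  # 3610
sum_xy = sum(x * y for x, y in zip(years, shares))   # 41238
sum_x2 = sum(x * x for x in years)                   # 968

# Slope and intercept from the computational formulas.
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * (sum_x / n)

# Part (b): predicted shares for an employee with 10 years at the firm.
predicted_shares = b0 + b1 * 10

print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
print(f"Predicted shares at 10 years: {predicted_shares:.1f}")
```

The slope can be read as roughly 38.8 additional shares, on average, for each additional year with the firm.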
  • 43. 3. COEFFICIENT OF DETERMINATION; CORRELATION COEFFICIENT. [Figure: scatter plot "Ownership of company stock vs years with the firm", shares against years, with fitted line y = 38.756x + 44.314 and R² = 0.72009.]
  • 44. COEFFICIENT OF DETERMINATION: [Figure: scatter plot "Ownership of company stock vs years with the firm", shares against years, R² = 0.98688, for the data below.]
        x (years)   y (shares)
        6           300
        9           408
        14          560
        6           252
        6           288
        16          650
        15          630
        13          522
  • 45. COEFFICIENT OF DETERMINATION: Provides a measure of the goodness of fit for the estimated regression equation. In other words, does the independent variable explain the dependent variable well?
  • 46. FINAL COMM 215 Grade (y) = Study Time (x1) + Work Time (x2) + # of hours spent on Facebook (x3) + Family Time (x4) + anything else you can think of (x…) … The four named variables explain 40%, 15%, 9%, and 15% of the grade, respectively.
  • 47. FINAL COMM 215 Grade (y) = Study Time (x1) + Work Time (x2) + # of hours spent on Facebook (x3) + Family Time (x4) + anything else you can think of (x…) … The four named variables explain 40%, 15%, 9%, and 15% of the grade, respectively. Together, these 4 variables explain 79% of your grade!
  • 48. Sum of Squares due to Error (SSE): the difference between the observed y and the estimated ŷ, squared and summed: $\mathrm{SSE} = \sum (y - \hat{y})^2$. [Figure: scatter plot with ȳ = 451.25 marked.] (Groebner et al., 2014, p. 575)
  • 49. Sum of Squares due to Regression (SSR): the difference between the estimated ŷ and the mean of y, squared and summed: $\mathrm{SSR} = \sum (\hat{y} - \bar{y})^2$. [Figure: scatter plot with ȳ = 451.25 marked.] (Groebner et al., 2014, p. 581)
  • 50. Total Sum of Squares (SST): the difference between the observed y and the mean of y, squared and summed: $\mathrm{SST} = \sum (y - \bar{y})^2$. [Figure: scatter plot with ȳ = 451.25 marked.] (Groebner et al., 2014, p. 580)
  • 51. COEFFICIENT OF DETERMINATION: Relationship among SST, SSR, and SSE: SST = SSR + SSE, that is, $\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$, where SST = total sum of squares, SSR = sum of squares due to regression, and SSE = sum of squares due to error.
  • 52. COEFFICIENT OF DETERMINATION: The coefficient of determination is r² = SSR/SST, where SSR = sum of squares due to regression = explained variation, and SST = total sum of squares = total variation. (Adcock et al., 2011, p. 392)
  • 53. COEFFICIENT OF DETERMINATION (R²): Expresses the proportion of the variation in the dependent variable (y) that is explained by the regression line ŷ = b0 + b1x1. COEFFICIENT OF CORRELATION (R): Describes both the direction and the strength of the linear relationship between two variables: r = (sign of b1) √r²
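To make the decomposition SST = SSR + SSE concrete, the sketch below recomputes the three sums of squares, r², and r for the Problem 10.3 data (years vs shares). It is an illustration under the definitions given in the slides, not part of the original deck; variable names are my own.

```python
import math

# Data from Problem 10.3.
years = [6, 12, 14, 6, 9, 13, 15, 9]
shares = [300, 408, 560, 252, 288, 650, 630, 522]

n = len(years)
x_mean = sum(years) / n
y_mean = sum(shares) / n

# Least-squares slope and intercept.
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, shares))
sxx = sum((x - x_mean) ** 2 for x in years)
b1 = sxy / sxx
b0 = y_mean - b1 * x_mean
fitted = [b0 + b1 * x for x in years]

sst = sum((y - y_mean) ** 2 for y in shares)               # total variation
ssr = sum((f - y_mean) ** 2 for f in fitted)               # explained variation
sse = sum((y - f) ** 2 for y, f in zip(shares, fitted))    # unexplained variation

r2 = ssr / sst
r = math.copysign(math.sqrt(r2), b1)   # r takes the sign of the slope

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"r^2 = {r2:.3f}, r = {r:.3f}")
```

The values agree with the spreadsheet summary output for Problem 10.3 (R Square 0.720, Multiple R 0.849).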
  • 54. PROBLEM #10.4: For a set of data, the total variation or sum of squares for y is SST = 143.0, and the error sum of squares is SSE = 24.0. What proportion of the variation in y is explained by the regression equation?
  • 55. PROBLEM #10.4 (continued): We are asked to find the coefficient of determination: R² = SSR/SST. We also know SST = SSR + SSE.
  • 56. PROBLEM #10.4 (continued): We are asked to find the coefficient of determination: R² = SSR/SST. We also know SST = SSR + SSE, so SSR = SST − SSE = 143.0 − 24.0 = 119.0, and R² = 119.0/143.0 = 0.832 = 83.2%. Interpretation: 83.2% of the variation in y is explained by x.
  • 57. 4. MODEL ASSUMPTIONS: Before conducting regression analysis… what determines an appropriate model for the relationship between the dependent and the independent variable(s)?
  • 58. Even if the data fit well, the estimated regression equation should not be used until further analysis shows how appropriate the assumed model is. One way to determine this is to test for significance.
  • 59. ASSUMPTIONS ABOUT THE ERROR TERM: 1. The error term ε is a random variable with an expected value of zero. In estimating an element that is unpredictable, it is best to assume it to be zero. 2. The variance of ε, denoted by σ², is the same for all values of x. 3. The values of the error are independent. 4. The error term is normally distributed. (Adcock et al., 2011, p. 370)
  • 60. Do you see the normal curve? (Adcock et al., 2011, p. 371)
  • 61. Do you see the normal curve? Right here is the population mean, which is also your point on the line. (Adcock et al., 2011, p. 371)
  • 62. Assumption # 2: The variance (and standard deviation) is the same for each value of x. All of these normal distributions have the same standard deviation. (Adcock et al., 2011, p. 371)
  • 63. Assumptions # 3 & 4: The error terms are independent and normally distributed. If a point is right on the line, then you have an error of 0. The further a point is from the line, the larger its error term. (Adcock et al., 2011, p. 371)
  • 64. Assumptions # 3 & 4: The error terms are independent and normally distributed. If a point is right on the line, then you have an error of 0; it is exactly where we expect the point to be. The further a point is from the line, the larger its error term. Larger errors mean a larger standard error of the estimate; if every point were right on the line, every error would be 0 and the standard error would be 0. (Adcock et al., 2011, p. 371)
  • 66. 5. TESTING FOR SIGNIFICANCE: Estimate of σ²; testing for significance; confidence interval for β1.
  • 67. TESTING FOR SIGNIFICANCE: An estimate of σ². The mean square error (MSE) provides the estimate of σ², and the notation s² is also used: $s^2 = \mathrm{MSE} = \mathrm{SSE}/(n - 2)$, where $\mathrm{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$. An equivalent computational form is $\mathrm{SSE} = \sum y_i^2 - b_0 \sum y_i - b_1 \sum x_i y_i$.
  • 68. TESTING FOR SIGNIFICANCE: An estimate of σ. To estimate σ we take the square root of s²: $s = \sqrt{\mathrm{MSE}} = \sqrt{\mathrm{SSE}/(n - 2)}$. The resulting s is called the standard error of the estimate. Why n − 2? The divisor is actually n − k − 1, where k is the number of independent variables. In a simple linear regression you always have 1 independent variable (k = 1), so it automatically becomes n − 1 − 1 = n − 2.
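As a small worked example of the formula above, the sketch below computes the standard error of the estimate from the Problem 10.3 spreadsheet output (SSE = 50210.37 with n = 8 observations). The function name `standard_error_of_estimate` and its `k` parameter are hypothetical helpers introduced for illustration.

```python
import math

def standard_error_of_estimate(sse, n, k=1):
    """s = sqrt(SSE / (n - k - 1)); k = 1 for simple linear regression."""
    return math.sqrt(sse / (n - k - 1))

# Values taken from the Problem 10.3 summary output.
sse = 50210.37
n = 8

mse = sse / (n - 2)                       # estimate of sigma^2
s = standard_error_of_estimate(sse, n)    # estimate of sigma

print(f"MSE = {mse:.2f}, s = {s:.3f}")
```

The result matches the "Standard Error" line of the summary output (91.479).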
  • 69. STANDARD ERROR OF THE ESTIMATE: [Figure panels: Large Standard Error; Small Standard Error.]
  • 70. TESTING FOR SIGNIFICANCE: To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero. Two tests are commonly used: the t test and the F test. Both the t test and the F test require an estimate of σ², the variance of ε in the regression model.
  • 71. TESTING FOR SIGNIFICANCE: T TEST. Hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$. Test statistic: $t = \dfrac{b_1}{s_{b_1}}$, where $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$. (Groebner et al., 2014, p. 584)
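The t statistic can be illustrated with the Problem 10.3 data (years vs shares). The sketch below is not part of the original deck; the critical value 2.447 is an assumption taken from a standard t table (two-tailed, α = .05, n − 2 = 6 degrees of freedom) rather than computed in code.

```python
import math

# Data from Problem 10.3.
years = [6, 12, 14, 6, 9, 13, 15, 9]
shares = [300, 408, 560, 252, 288, 650, 630, 522]

n = len(years)
x_mean = sum(years) / n
y_mean = sum(shares) / n
sxx = sum((x - x_mean) ** 2 for x in years)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, shares))

b1 = sxy / sxx
b0 = y_mean - b1 * x_mean

# Standard error of the estimate, then standard error of the slope.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(years, shares))
s = math.sqrt(sse / (n - 2))
s_b1 = s / math.sqrt(sxx)

t_stat = b1 / s_b1

T_CRITICAL = 2.447   # assumed t-table value for alpha/2 = .025, df = 6
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"s_b1 = {s_b1:.2f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The computed values agree with the "Years" row of the spreadsheet summary output (standard error 9.86, t Stat 3.929).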
  • 72. TESTING FOR SIGNIFICANCE: T TEST. 1. Set up hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$. 2. What is the appropriate test statistic to use? $t = b_1 / s_{b_1}$. 3. Calculate the test statistic value. (α = .05) 4. Find the critical value for the test statistic. 5. Define the decision rule. 6. Make your decision. 7. Interpret the conclusion in context.
  • 73. CONFIDENCE INTERVAL FOR β1: The form of a confidence interval for β1 is $b_1 \pm t_{\alpha/2}\, s_{b_1}$, where $b_1$ is the point estimator, $t_{\alpha/2}\, s_{b_1}$ is the margin of error, and $t_{\alpha/2}$ is the t value providing an area of α/2 in the upper tail of a t distribution with n − 2 degrees of freedom. (Groebner et al., 2014, p. 594)
  • 74. CONFIDENCE INTERVAL FOR β1: H0 is rejected if the hypothesized value of β1 is not included in the confidence interval for β1. We can use a 95% confidence interval for β1 to test the hypotheses just used in the t test.
  • 75. CONFIDENCE INTERVAL FOR β1: Rejection rule: Reject H0 if 0 is not included in the confidence interval for β1. 95% confidence interval for β1: $b_1 \pm t_{\alpha/2}\, s_{b_1}$ = 5.0 ± 2.048(2.25) = 5.0 ± 4.608, or 0.392 to 9.608. Conclusion: 0 is not included in the confidence interval. Reject H0.
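The interval on this slide can be reproduced with a short sketch. It uses only the three numbers shown on the slide (b1 = 5.0, t = 2.048, s_b1 = 2.25); the function name `slope_confidence_interval` is a hypothetical helper, and the t value is taken as given rather than looked up in code.

```python
def slope_confidence_interval(b1, t_crit, s_b1):
    """Return (lower, upper) for b1 +/- t_crit * s_b1."""
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin

# Numbers from the slide: point estimate, t value, standard error of the slope.
lower, upper = slope_confidence_interval(5.0, 2.048, 2.25)

# Rejection rule: reject H0 (beta1 = 0) if 0 lies outside the interval.
reject_h0 = not (lower <= 0 <= upper)

print(f"95% CI for beta1: {lower:.3f} to {upper:.3f}")
print("Reject H0:", reject_h0)
```

Because the interval test and the two-tailed t test use the same t value, they lead to the same decision at the same α.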
  • 76. SOME CAUTIONS ABOUT THE INTERPRETATION OF SIGNIFICANCE TESTS: Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y. Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
  • 77. What is a standard error of the estimate?
    $s = \sqrt{\text{MSE}} = \sqrt{\dfrac{\text{SSE}}{n-2}}$
    What is a standard error of the slope?
    $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$
  • 78. PROBLEM # 10.5
    Use the 5% level of significance. The manager of Colonial Furniture has been reviewing weekly advertising expenditures. During the past 6 months, all advertisements for the store have appeared in the local newspaper. The number of ads per week has varied from one to seven. The store’s sales staff has been tracking the number of customers who enter the store each week. The number of ads and the number of customers per week for the past 26 weeks were recorded.
    a. Determine the sample regression line.
    b. Interpret the coefficients.
    c. Can the manager infer that the larger the number of ads, the larger the number of customers?
    d. Find and interpret the coefficient of determination.
    e. In your opinion, is it a worthwhile exercise to use the regression equation to predict the number of customers who will enter the store, given that Colonial intends to advertise five times in the newspaper? If so, find the 95% prediction interval. If not, explain why not.
  • 79. PROBLEM # 10.5
    a. Determine the sample regression line.
    b. Interpret the coefficients.

    Ads   Customers
      5     353
      6     319
      3     440
      2     332
      4     172
      2     331
      4     344
      2     483
      4     329
      2     532
      7     496
      5     393
      4     376
      7     372
      2     512
      5     254
      5     459
      2     153
      1     426
      6     566
      6     596
      5     395
      6     676
      3     194
      2     135
      7     367

    ŷ = 296.92 + 21.356x
    On average, each additional ad is associated with 21.36 more customers per week.
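The regression line above can be reproduced directly from the 26 weekly observations. The sketch below uses only the standard library; the function name `least_squares` and the variable names are illustrative, not part of the original problem.

```python
# Weekly data from the Colonial Furniture problem (26 weeks)
ads = [5, 6, 3, 2, 4, 2, 4, 2, 4, 2, 7, 5, 4,
       7, 2, 5, 5, 2, 1, 6, 6, 5, 6, 3, 2, 7]
customers = [353, 319, 440, 332, 172, 331, 344, 483, 329, 532, 496, 393, 376,
             372, 512, 254, 459, 153, 426, 566, 596, 395, 676, 194, 135, 367]


def least_squares(x, y):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx          # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1


b0, b1 = least_squares(ads, customers)
print(f"y_hat = {b0:.2f} + {b1:.3f} x")
```

Working from the unrounded means keeps the coefficients consistent with the slide's 296.92 and 21.356 to the digits shown.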
  • 80. PROBLEM # 10.5
    c. Can the manager infer that the larger the number of ads, the larger the number of customers?
    1. Set up the hypotheses: H0: β1 = 0; Ha: β1 > 0
    2. What is the appropriate test statistic to use? One-tailed t test, α = 0.05
  • 81. Testing for Significance: t Test
    1. Set up the hypotheses.
    2. What is the appropriate test statistic to use? $t = \dfrac{b_1}{s_{b_1}}$
    3. Calculate the test statistic value.
    4. Find the critical value for the test statistic ($\alpha = .05$).
    5. Define the decision rule.
    6. Make your decision.
    7. Interpret the conclusion in context.
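  • 82. 3. Calculate the test statistic value.
    The numbered callouts give the order in which the pieces are computed:
    1. SSE
    2. Standard error of estimate: $s_\varepsilon = \sqrt{\dfrac{SSE}{n-k-1}}$, which for simple linear regression (k = 1) is $s = \sqrt{\dfrac{SSE}{n-2}}$
    3. $SS_{xx} = \sum (x_i - \bar{x})^2$
    4. $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$
    5. $t = \dfrac{b_1}{s_{b_1}}$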
  • 83. PROBLEM # 10.5
    c. Can the manager infer that the larger the number of ads, the larger the number of customers?
    Step 1: SSE = Σ(y − ŷ)², with ŷ = 296.92 + 21.356x

    Ads x   Customers y      ŷ        y − ŷ     (y − ŷ)²
      5        353         403.70    -50.70     2570.49
      6        319         425.06   -106.06    11247.88
      3        440         360.99     79.01     6242.90
      2        332         339.63     -7.63       58.25
      4        172         382.34   -210.34    44244.60
      2        331         339.63     -8.63       74.51
      4        344         382.34    -38.34     1470.26
      2        483         339.63    143.37    20554.38
      4        329         382.34    -53.34     2845.58
      2        532         339.63    192.37    37005.45
      7        496         446.41     49.59     2458.97
      5        393         403.70    -10.70      114.49
      4        376         382.34     -6.34       40.25
      7        372         446.41    -74.41     5537.15
      2        512         339.63    172.37    29710.73
      5        254         403.70   -149.70    22410.09
      5        459         403.70     55.30     3058.09
      2        153         339.63   -186.63    34831.50
      1        426         318.28    107.72    11604.46
      6        566         425.06    140.94    19865.21
      6        596         425.06    170.94    29221.85
      5        395         403.70     -8.70       75.69
      6        676         425.06    250.94    62972.89
      3        194         360.99   -166.99    27884.99
      2        135         339.63   -204.63    41874.26
      7        367         446.41    -79.41     6306.27
    Total                                      424281.17

    SSE = Σ(y − ŷ)² = 424281.17
  • 84. PROBLEM # 10.5
    c. Can the manager infer that the larger the number of ads, the larger the number of customers?
    Step 2: standard error of estimate, using SSE = Σ(y − ŷ)² = 424281.17
    $s_\varepsilon = \sqrt{\dfrac{SSE}{n-k-1}} = \sqrt{\dfrac{424281.17}{26-1-1}} = 132.96$
  • 85. PROBLEM # 10.5
    c. Can the manager infer that the larger the number of ads, the larger the number of customers?
    Step 3: SSxx = Σ(x − x̄)²

    Ads x    x − x̄    (x − x̄)²
      5       0.88     0.7744
      6       1.88     3.5344
      3      -1.12     1.2544
      2      -2.12     4.4944
      4      -0.12     0.0144
      2      -2.12     4.4944
      4      -0.12     0.0144
      2      -2.12     4.4944
      4      -0.12     0.0144
      2      -2.12     4.4944
      7       2.88     8.2944
      5       0.88     0.7744
      4      -0.12     0.0144
      7       2.88     8.2944
      2      -2.12     4.4944
      5       0.88     0.7744
      5       0.88     0.7744
      2      -2.12     4.4944
      1      -3.12     9.7344
      6       1.88     3.5344
      6       1.88     3.5344
      5       0.88     0.7744
      6       1.88     3.5344
      3      -1.12     1.2544
      2      -2.12     4.4944
      7       2.88     8.2944
    x̄ = 4.12          SSxx = 86.6544

    Step 4: $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \dfrac{132.96}{\sqrt{86.6544}} = 14.28$
  • 86. PROBLEM # 10.5
    c. Can the manager infer that the larger the number of ads, the larger the number of customers?
    Step 5: with b1 = 21.356 from ŷ = 296.92 + 21.356x and $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = 14.28$,
    $t = \dfrac{b_1}{s_{b_1}} = \dfrac{21.356}{14.28} = 1.4955$
  • 87. PROBLEM # 10.5
    c. Can the manager infer that the larger the number of ads, the larger the number of customers?
    3. Calculate the test statistic value.
    $t = \dfrac{b_1 - \beta_1}{s_{b_1}} = \dfrac{21.356 - 0}{14.28} = 1.496$
  • 88. PROBLEM # 10.5
    4. Find the critical value of the test statistic: $t_{\alpha,\, n-k-1} = t_{0.05,\, 24} = 1.711$
    5. Define the decision rule: Reject H0 if t_observed > t_critical; otherwise do not reject.
    6. Make your decision: Since t_observed = 1.4955 is less than t_critical = 1.711, we do not reject H0.
    7. Interpret in context: At the 5% level of significance, there is not enough evidence to conclude that the larger the number of ads, the larger the number of customers.
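The five computational steps and the decision rule above can be chained together as follows. This is a sketch under stated assumptions: the variable names are illustrative, and the critical value 1.711 is hard-coded from the slide rather than computed from a t distribution.

```python
import math

# Weekly data from the Colonial Furniture problem (26 weeks)
ads = [5, 6, 3, 2, 4, 2, 4, 2, 4, 2, 7, 5, 4,
       7, 2, 5, 5, 2, 1, 6, 6, 5, 6, 3, 2, 7]
customers = [353, 319, 440, 332, 172, 331, 344, 483, 329, 532, 496, 393, 376,
             372, 512, 254, 459, 153, 426, 566, 596, 395, 676, 194, 135, 367]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(customers) / n

# Least-squares coefficients
ss_xx = sum((x - x_bar) ** 2 for x in ads)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, customers))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Step 1: sum of squared residuals
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ads, customers))
# Step 2: standard error of estimate, n - 2 degrees of freedom
s = math.sqrt(sse / (n - 2))
# Steps 3-4: standard error of the slope (ss_xx computed above)
s_b1 = s / math.sqrt(ss_xx)
# Step 5: test statistic for H0: beta1 = 0
t_observed = b1 / s_b1

T_CRITICAL = 1.711  # t_{0.05, 24}, copied from the slide (one-tailed test)
reject_h0 = t_observed > T_CRITICAL
print(f"SSE={sse:.2f}  s={s:.2f}  s_b1={s_b1:.2f}  t={t_observed:.4f}  reject H0: {reject_h0}")
```

Small differences in the last digit relative to the slides come from the slides rounding intermediate values such as x̄ = 4.12.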
  • 89. 6. COVARIANCE & COEFFICIENT OF CORRELATION
    Covariance
    Interpretation of the covariance
    Correlation coefficient
  • 90. MEASURES OF ASSOCIATION BETWEEN TWO VARIABLES
    Thus far we have examined numerical methods used to summarize the data for one variable at a time. Often a manager or decision maker is interested in the relationship between two variables. Two descriptive measures of the relationship between two variables are the covariance and the correlation coefficient.
  • 92. COVARIANCE
    The covariance is a measure of the linear association between two variables.
    Positive values indicate a positive relationship.
    Negative values indicate a negative relationship.
    PAGE 560 GROEBNER et Al. (2014)
  • 93. COVARIANCE
    The covariance is computed as follows:
    for samples: $s_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
    for populations: $\sigma_{xy} = \dfrac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
  • 94. CORRELATION COEFFICIENT
    Just because two variables are highly correlated, it does not mean that one variable is the cause of the other. Correlation is a measure of linear association and not necessarily causation.
  • 95. CORRELATION COEFFICIENT
    Pearson Product Moment Correlation Coefficient.
    The correlation coefficient is computed as follows:
    for samples: $r_{xy} = \dfrac{s_{xy}}{s_x s_y}$
    for populations: $\rho_{xy} = \dfrac{\sigma_{xy}}{\sigma_x \sigma_y}$
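  • 96. CORRELATION COEFFICIENT
    $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$
    or, equivalently,
    $r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$
    r - Sample correlation coefficient
    n - Sample size
    x - Value of the independent variable
    y - Value of the dependent variable
    PAGE 561 GROEBNER et Al. (2014)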
  • 97. CORRELATION COEFFICIENT
    The coefficient can take on values between -1 and +1.
    Values near -1 indicate a strong negative linear relationship.
    Values near +1 indicate a strong positive linear relationship.
    The closer the correlation is to zero, the weaker the relationship.
  • 98. Covariance and Correlation Coefficient

       x       y     (x_i − x̄)   (y_i − ȳ)   (x_i − x̄)(y_i − ȳ)
     277.6    69       10.65       -1.0          -10.65
     259.5    71       -7.45        1.0           -7.45
     269.1    70        2.15        0              0
     267.0    70        0.05        0              0
     255.6    71      -11.35        1.0          -11.35
     272.9    69        5.95       -1.0           -5.95
    Average   267.0    70.0                Total  -35.40
    Std. Dev. 8.2192   .8944
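The table's totals lead directly to the sample covariance and correlation via the formulas given earlier. The following sketch recomputes them from the six (x, y) pairs; the function names are illustrative, and the resulting s_xy and r are computed here rather than quoted from the slide.

```python
import math

x = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]
y = [69, 71, 70, 70, 71, 69]


def sample_covariance(x, y):
    """s_xy = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_std(values):
    """Sample standard deviation with an n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


s_xy = sample_covariance(x, y)
s_x = sample_std(x)
s_y = sample_std(y)
r_xy = s_xy / (s_x * s_y)   # Pearson product moment correlation
print(f"s_xy={s_xy:.2f}  s_x={s_x:.4f}  s_y={s_y:.4f}  r={r_xy:.4f}")
```

The negative covariance and a correlation close to -1 indicate a strong negative linear relationship between the two variables in this small data set.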
  • 99. Colonial Furniture data: deviations from the means

    Ads x  Customers y   x − x̄   (x − x̄)²    y − ȳ     (y − ȳ)²    (x − x̄)(y − ȳ)
      5       353         0.88     0.77     -31.81     1011.88      -27.993
      6       319         1.88     3.53     -65.81     4330.96     -123.723
      3       440        -1.12     1.25      55.19     3045.94      -61.813
      2       332        -2.12     4.49     -52.81     2788.90      111.957
      4       172        -0.12     0.01    -212.81    45288.10       25.537
      2       331        -2.12     4.49     -53.81     2895.52      114.077
      4       344        -0.12     0.01     -40.81     1665.46        4.897
      2       483        -2.12     4.49      98.19     9641.28     -208.163
      4       329        -0.12     0.01     -55.81     3114.76        6.697
      2       532        -2.12     4.49     147.19    21664.90     -312.043
      7       496         2.88     8.29     111.19    12363.22      320.227
      5       393         0.88     0.77       8.19       67.08        7.207
      4       376        -0.12     0.01      -8.81       77.62        1.057
      7       372         2.88     8.29     -12.81      164.10      -36.893
      2       512        -2.12     4.49     127.19    16177.30     -269.643
      5       254         0.88     0.77    -130.81    17111.26     -115.113
      5       459         0.88     0.77      74.19     5504.16       65.287
      2       153        -2.12     4.49    -231.81    53735.88      491.437
      1       426        -3.12     9.73      41.19     1696.62     -128.513
      6       566         1.88     3.53     181.19    32829.82      340.637
      6       596         1.88     3.53     211.19    44601.22      397.037
      5       395         0.88     0.77      10.19      103.84        8.967
      6       676         1.88     3.53     291.19    84791.62      547.437
      3       194        -1.12     1.25    -190.81    36408.46      213.707
      2       135        -2.12     4.49    -249.81    62405.04      529.597
      7       367         2.88     8.29     -17.81      317.20      -51.293

    Means: x̄ = 4.12, ȳ = 384.81
    Sums: Σ(x − x̄)² = 86.65, Σ(y − ȳ)² = 463802.04, Σ(x − x̄)(y − ȳ) = 1850.577
    Standard deviations and covariance: s_x = 1.86, s_y = 136.21, s_xy = 74.023
    Variances: s_x² = 3.466176, s_y² = 18552.081544
  • 101. Covariance and correlation for the Colonial Furniture data
    From the table of deviations: x̄ = 4.12, ȳ = 384.81, Σ(x − x̄)² = 86.65, Σ(y − ȳ)² = 463802.04, Σ(x − x̄)(y − ȳ) = 1850.577, s_x = 1.86, s_y = 136.21.
    $s_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \dfrac{1850.577}{25} = 74.023$
    $r_{xy} = \dfrac{s_{xy}}{s_x s_y} = \dfrac{74.023}{(1.86)(136.21)} \approx 0.29$
  • 102. PROBLEM # 10.5
    d. Find and interpret the coefficient of determination.
    e. In your opinion, is it a worthwhile exercise to use the regression equation to predict the number of customers who will enter the store, given that Colonial intends to advertise five times in the newspaper? If so, find the 95% prediction interval. If not, explain why not.
  • 104. PROBLEM # 10.5
    d. Find and interpret the coefficient of determination.
    $R^2 = \dfrac{s_{xy}^2}{s_x^2\, s_y^2} = \dfrac{(74.02)^2}{(3.47)(18{,}552)} = .0851$
    There is a weak linear relationship between the number of ads and the number of customers: only about 8.5% of the variation in the number of customers is explained by the number of ads.
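As a cross-check, the coefficient of determination can be computed two ways: from the covariance and variances, and from the sums of squares. The sketch below uses the summary values reported in the earlier slides (s_xy = 74.023, s_x² = 3.466176, s_y² = 18552.081544, SSE = 424281.17, Σ(y − ȳ)² = 463802.04); the variable names are illustrative.

```python
# Summary values copied from the slides
s_xy = 74.023            # sample covariance
var_x = 3.466176         # s_x squared
var_y = 18552.081544     # s_y squared
sse = 424281.17          # sum of squared residuals
sst = 463802.04          # total sum of squares, sum of (y - y_bar)^2

# R^2 as the squared sample correlation
r_squared_from_cov = s_xy ** 2 / (var_x * var_y)

# R^2 as the share of total variation explained by the regression
r_squared_from_ss = 1 - sse / sst

print(f"R^2 via covariance: {r_squared_from_cov:.4f}")
print(f"R^2 via SSE/SST:    {r_squared_from_ss:.4f}")
```

Both routes agree to about 0.085; the slide's .0851 reflects rounding the inputs to 74.02, 3.47 and 18,552 before dividing.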
  • 105. PROBLEM # 10.5
    e. In your opinion, is it a worthwhile exercise to use the regression equation to predict the number of customers who will enter the store, given that Colonial intends to advertise five times in the newspaper? If so, find the 95% prediction interval. If not, explain why not.
    The linear relationship is too weak for the model to produce useful predictions, so it is not worthwhile to compute the prediction interval.
  • 106. 7. ESTIMATION
    POINT ESTIMATION
    INTERVAL ESTIMATION
    CONFIDENCE INTERVAL FOR THE MEAN VALUE OF Y
    PREDICTION INTERVAL FOR AN INDIVIDUAL VALUE OF Y
  • 107. POINT ESTIMATION
    If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:
    ŷ = 10 + 5(3) = 25 cars
  • 108. Using the Estimated Regression Equation for Estimation and Prediction
    Confidence Interval Estimate of E(yp): $\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$
    Prediction Interval Estimate of yp: $\hat{y}_p \pm t_{\alpha/2}\, s_{\text{ind}}$
    where: confidence coefficient is 1 - α and $t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom
  • 109. Confidence Interval for E(yp)
    $\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$
    Estimate of the Standard Deviation of $\hat{y}_p$:
    $s_{\hat{y}_p} = s\sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
  • 111. PROBLEM #10.6
    For n = 6 data points, the following quantities have been calculated:
    Σx = 40, Σy = 76, Σxy = 400, Σx² = 346, Σy² = 1160, Σ(y − ŷ)² = 52.334
    a. Determine the least-squares regression line.
  • 112. PROBLEM #10.6
    a. Determine the least-squares regression line.
    To determine the least-squares regression line, we must calculate the slope and y-intercept.
    $b_1 = \dfrac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n} = \dfrac{400 - (40)(76)/6}{346 - (40)^2/6} = -1.345$
    $b_0 = \bar{y} - b_1\bar{x} = 12.67 - (-1.345)(6.67) = 21.641$
    The regression equation is ŷ = 21.641 − 1.345x.
    If the slope is instead computed from the rounded means, $b_1 = \dfrac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \dfrac{400 - 6(6.67)(12.67)}{346 - 6(6.67)^2} = -1.354$ and $b_0 = 12.67 - (-1.354)(6.67) = 21.701$. The remaining parts use b1 = −1.345 and b0 = 21.641.
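The slope and intercept can be obtained from the summary sums alone, without the raw data. The sketch below assumes the sums used in the solution (n = 6, Σx = 40, Σy = 76, Σxy = 400, Σx² = 346); the function name `fit_from_sums` is illustrative. Note that with unrounded means the intercept comes out near 21.630, slightly below the slide's 21.641, which is based on the rounded means 6.67 and 12.67.

```python
def fit_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least-squares intercept and slope computed from summary sums."""
    ss_xy = sum_xy - sum_x * sum_y / n   # sum of cross-deviations
    ss_xx = sum_x2 - sum_x ** 2 / n      # sum of squared x deviations
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = fit_from_sums(n=6, sum_x=40, sum_y=76, sum_xy=400, sum_x2=346)
print(f"y_hat = {b0:.3f} + ({b1:.3f}) x")
```

Deferring all rounding to the final step avoids the kind of drift that produces −1.354 rather than −1.345 for the slope.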
  • 113. PROBLEM #10.6
    b. Determine the standard error of estimate.
    $s_{y.x} = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\dfrac{52.334}{6 - 2}}$
  • 114. PROBLEM #10.6
    b. Determine the standard error of estimate.
    The standard error of the estimate is
    $s_{y.x} = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\dfrac{52.334}{6 - 2}} = 3.617$
  • 115. PROBLEM #10.6
    c. Construct the 95% confidence interval for the mean of y when x = 7.0
    Using the regression equation ŷ = 21.641 − 1.345x, the point estimate at x = 7 is ŷ = 21.641 − 1.345(7) = 12.226.
  • 116. $\hat{y} \pm t\, s_{y.x}\sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum x_i^2 - (\sum x_i)^2/n}} = 12.226 \pm 2.776(3.617)\sqrt{\dfrac{1}{6} + \dfrac{(7 - 6.67)^2}{346 - (40)^2/6}}$
    Here (7 − 6.67)² = 0.1089, 346 − (40)²/6 = 79.333, 2.776(3.617) = 10.041, and the square-root term is 0.4099, so the interval is 12.226 ± 10.041 × 0.4099:
    12.226 + 10.041 × 0.4099 = 16.342
    12.226 − 10.041 × 0.4099 = 8.110
  • 117. PROBLEM #10.6
    c. Construct the 95% confidence interval for the mean of y when x = 7.0
    The point estimate is ŷ = 21.641 − 1.345(7) = 12.226. We also need the t-value with 6 − 2 = 4 degrees of freedom for a 95% interval; this value is 2.776. Therefore the 95% confidence interval for the mean value of y when x = 7 is:
    $\hat{y} \pm t\, s_{y.x}\sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum x_i^2 - (\sum x_i)^2/n}} = 12.226 \pm 2.776(3.617)\sqrt{\dfrac{1}{6} + \dfrac{(7 - 6.67)^2}{346 - (40)^2/6}} = 12.226 \pm 4.116$
    The confidence interval ranges from 8.110 to 16.342.
    (With the rounded-mean coefficients b1 = −1.354 and b0 = 21.701, the point estimate is 12.223 and the interval is 12.223 ± 4.116 = (8.107, 16.339).)
  • 118. PROBLEM #10.6
    d. Construct the 95% confidence interval for the mean of y when x = 9.0
    Using the regression equation ŷ = 21.641 − 1.345x, the point estimate at x = 9 is ŷ = 21.641 − 1.345(9) = 9.536.
  • 119. $\hat{y} \pm t\, s_{y.x}\sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum x_i^2 - (\sum x_i)^2/n}} = 9.536 \pm 2.776(3.617)\sqrt{\dfrac{1}{6} + \dfrac{(9 - 6.67)^2}{346 - (40)^2/6}}$
    Here (9 − 6.67)² = 5.4289, 346 − (40)²/6 = 79.333, 2.776(3.617) = 10.041, and the square-root term is 0.4849, so the interval is 9.536 ± 10.041 × 0.4849:
    9.536 + 10.041 × 0.4849 ≈ 14.404
    9.536 − 10.041 × 0.4849 ≈ 4.668
  • 120. PROBLEM #10.6
    d. Construct the 95% confidence interval for the mean of y when x = 9.0
    The point estimate is ŷ = 21.641 − 1.345(9) = 9.536. We also need the t-value with 6 − 2 = 4 degrees of freedom for a 95% interval; this value is 2.776. Therefore the 95% confidence interval for the mean value of y when x = 9 is:
    $\hat{y} \pm t\, s_{y.x}\sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum x_i^2 - (\sum x_i)^2/n}} = 9.536 \pm 2.776(3.617)\sqrt{\dfrac{1}{6} + \dfrac{(9 - 6.67)^2}{346 - (40)^2/6}} = 9.536 \pm 4.868$
    The confidence interval ranges from 4.668 to 14.404.
    (With the rounded-mean coefficients b1 = −1.354 and b0 = 21.701, the point estimate is 9.515 and the interval is 9.515 ± 4.868 = (4.647, 14.383).)
  • 121. PROBLEM #10.6
    e. Compare the width of the confidence interval obtained in part (c) with that obtained in part (d). Which is wider and why?
  • 122. PROBLEM #10.6
    e. Compare the width of the confidence interval obtained in part (c) with that obtained in part (d). Which is wider and why?
    For x = 7: 8.110 to 16.342
    For x = 9: 4.668 to 14.404
    The confidence interval in (d) is wider because 9 is farther from the mean of x (6.67) than 7 is, which makes the (x − x̄)² term in the standard error larger.
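The comparison in part (e) can be made concrete with a small helper that evaluates the confidence interval for the mean response at any x. This is a sketch: the function name and the `SLIDE` dictionary are illustrative, and every numeric input (b0 = 21.641, b1 = −1.345, s = 3.617, t = 2.776, x̄ = 6.67, SSxx = 79.333) is copied from the slides rather than recomputed.

```python
import math


def mean_response_interval(x_p, b0, b1, s, t_crit, n, x_bar, ss_xx):
    """CI for the mean of y at x_p: y_hat ± t * s * sqrt(1/n + (x_p - x_bar)^2 / SSxx)."""
    y_hat = b0 + b1 * x_p
    margin = t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)
    return y_hat - margin, y_hat + margin


# Inputs as rounded on the slides
SLIDE = dict(b0=21.641, b1=-1.345, s=3.617, t_crit=2.776, n=6, x_bar=6.67, ss_xx=79.333)

ci_7 = mean_response_interval(7, **SLIDE)
ci_9 = mean_response_interval(9, **SLIDE)
width_7 = ci_7[1] - ci_7[0]
width_9 = ci_9[1] - ci_9[0]

print(f"x = 7: {ci_7[0]:.3f} to {ci_7[1]:.3f} (width {width_7:.3f})")
print(f"x = 9: {ci_9[0]:.3f} to {ci_9[1]:.3f} (width {width_9:.3f})")
```

Since only the (x_p − x̄)² term changes between the two calls, the width grows as x_p moves away from x̄ and is smallest at x_p = x̄.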
  • 123. Prediction Interval for yp
    $\hat{y}_p \pm t_{\alpha/2}\, s_{\text{ind}}$
    Estimate of the Standard Deviation of an Individual Value of yp:
    $s_{\text{ind}} = s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
  • 124. PROBLEM #10.7
    For the summary data provided in Problem #10.18, construct a 95% prediction interval for an individual y value whenever
    a. x = 2
    Using the regression equation ŷ = 21.641 − 1.345x, the point estimate at x = 2 is ŷ = 21.641 − 1.345(2) = 18.951.
  • 125. PROBLEM #10.7
    For the summary data provided in Problem #10.18, construct a 95% prediction interval for an individual y value whenever
    a. x = 2
    The point estimate is ŷ = 21.641 − 1.345(2) = 18.951. We also need the t-value with 6 − 2 = 4 degrees of freedom for a 95% interval; this value is 2.776.
  • 126. $\hat{y} \pm t\, s_{y.x}\sqrt{1 + \dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum x_i^2 - (\sum x_i)^2/n}} = 18.951 \pm 2.776(3.617)\sqrt{1 + \dfrac{1}{6} + \dfrac{(2 - 6.67)^2}{346 - (40)^2/6}}$
    Here (2 − 6.67)² = 21.8089, 346 − (40)²/6 = 79.333, 2.776(3.617) = 10.041, and the square-root term is 1.2006, so the interval is 18.951 ± 10.041 × 1.2006:
    18.951 + 10.041 × 1.2006 = 31.006
    18.951 − 10.041 × 1.2006 = 6.896
  • 127. PROBLEM #10.7
    For the summary data provided in Problem #10.18, construct a 95% prediction interval for an individual y value whenever
    a. x = 2
    The point estimate is ŷ = 21.641 − 1.345(2) = 18.951. We also need the t-value with 6 − 2 = 4 degrees of freedom for a 95% interval; this value is 2.776. Therefore, the 95% prediction interval for an individual y value when x = 2 is:
    $\hat{y} \pm t\, s_{y.x}\sqrt{1 + \dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum x_i^2 - (\sum x_i)^2/n}} = 18.951 \pm 2.776(3.617)\sqrt{1 + \dfrac{1}{6} + \dfrac{(2 - 6.67)^2}{346 - (40)^2/6}} = 18.951 \pm 12.056$
    The prediction interval ranges from 6.895 to 31.007.
    (With the rounded-mean coefficients b1 = −1.354 and b0 = 21.701, the point estimate is 18.993 and the interval is 18.993 ± 12.056 = (6.937, 31.049).)
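The prediction interval differs from the confidence interval for the mean only by the extra 1 under the square root. A minimal sketch follows, with an illustrative function name and all inputs copied from the slides (b0 = 21.641, b1 = −1.345, s = 3.617, t = 2.776, x̄ = 6.67, SSxx = 79.333).

```python
import math


def prediction_interval(x_p, b0, b1, s, t_crit, n, x_bar, ss_xx):
    """PI for an individual y at x_p: y_hat ± t * s * sqrt(1 + 1/n + (x_p - x_bar)^2 / SSxx)."""
    y_hat = b0 + b1 * x_p
    s_ind = s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / ss_xx)
    margin = t_crit * s_ind
    return y_hat - margin, y_hat + margin


# Inputs as rounded on the slides
pi_low, pi_high = prediction_interval(
    x_p=2, b0=21.641, b1=-1.345, s=3.617, t_crit=2.776, n=6, x_bar=6.67, ss_xx=79.333
)
print(f"95% prediction interval at x = 2: {pi_low:.3f} to {pi_high:.3f}")
```

The extra 1 accounts for the scatter of an individual observation around the regression line, which is why a prediction interval is always wider than the confidence interval for the mean at the same x.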