Tyler Anton
Spring 2014
Problem Set #3
Hypothesis Testing
1. University of Maryland University College is concerned that out of state students may be
receiving lower grades than Maryland students. Two independent random samples have been
selected: 165 observations from population 1 (Out of state students) and 177 from population 2
(Maryland students). The sample means obtained are X1(bar)=86 and X2(bar)=87. It is known
from previous studies that the population variances are 8.1 and 7.3 respectively. Using a level of
significance of .01, is there evidence that the out of state students may be receiving lower
grades? Fully explain your answer.
H0: μ1 ≥ μ2 (out-of-state mean is not lower than the Maryland mean)
H1: μ1 < μ2 [Rejection Region in lower (left) tail]
Level of Significance = 0.01, one-tailed test (Appendix B.5)
*Critical Value (infinite df) = -2.326; the sign is negative because H1 is "less than", which puts the rejection region in the lower (left) tail
Thus, reject H0 if z < -2.326
Population variances are known (σ1^2 = 8.1, σ2^2 = 7.3), so the z statistic applies
Z = (86-87) / SQRT [(8.1/165) + (7.3/177)]
Z = (-1/0.3005558965)
Z = -3.327168129
Explanation
The Z test statistic (-3.327) is lower than the critical value (-2.326) and the one-tail rejection
region is pointing towards the left (lower tail). This implies that we reject H0, and accept H1.
Thus, there is evidence that out-of-state students receive lower grades than Maryland students.
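Reject H0 if P-value < Level of significance (0.01)
*P-value = [0.5 - 0.4990] = 0.0010 at most; thus, reject H0, since a sample difference this large would be very unlikely if H0 were true
*0.4990 derived from Appendix 3.B; it is the tabled area for z of about 3.09, so 0.0010 is an upper bound. The exact lower-tail area for z = -3.327 is about 0.0004.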
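The calculation above can be checked with a short script. This is a minimal sketch using only Python's standard library (an assumption of this example; the original work used printed tables). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the problem statement
mean_out, mean_md = 86.0, 87.0   # sample means (out of state, Maryland)
var_out, var_md = 8.1, 7.3       # known population variances
n_out, n_md = 165, 177           # sample sizes
alpha = 0.01                     # level of significance (one-tailed)

# Standard error of the difference between the two sample means
std_err = sqrt(var_out / n_out + var_md / n_md)
z = (mean_out - mean_md) / std_err

std_normal = NormalDist()             # standard normal: mean 0, sd 1
z_crit = std_normal.inv_cdf(alpha)    # lower-tail critical value
p_value = std_normal.cdf(z)           # lower-tail P-value

print(f"z = {z:.4f}, critical value = {z_crit:.3f}, P-value = {p_value:.5f}")
print("Reject H0" if z < z_crit else "Fail to reject H0")
```

Because the P-value is computed from the normal distribution directly rather than read from a table, it comes out smaller than the table-based bound of 0.0010; the decision is the same.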
Simple Regression
2. A CEO of a large pharmaceutical company would like to determine if the company should
be placing more money allotted in the budget next year for television advertising of a new drug
marketed for controlling diabetes. He wonders whether there is a strong relationship between the
amount of money spent on television advertising for this new drug called DIB and the number of
orders received. The manufacturing process of this drug is very difficult and requires stability so
the CEO would prefer to generate a stable number of orders. The cost of advertising is always an
important consideration in the phase I roll-out of a new drug. Data that have been collected over
the past 20 months indicate the amount of money spent on television advertising and the number
of orders received.
The use of linear regression is a critical tool for a manager's decision-making ability.
Please carefully read the example below and try to answer the questions in terms of the problem
context. The results are as follows:
NOTE: If you do not have the Data Analysis option under Tools, you must install it. Go to
Tools, select Add-ins, and then choose the two data ToolPak options. It should take about a
minute.
Month   Advertising Cost ($)   Number of Orders
1       74,430                 2,856,000
2       62,620                 1,800,000
3       67,580                 1,299,000
4       53,680                 1,510,000
5       69,180                 1,367,000
6       73,140                 2,611,000
7       85,370                 3,788,000
8       76,880                 2,935,000
9       66,990                 1,955,000
10      77,230                 3,634,000
11      61,380                 1,598,000
12      62,750                 1,867,000
13      63,270                 1,899,000
14      86,190                 3,245,000
15      60,030                 1,934,000
16      79,210                 2,761,000
17      67,770                 1,625,000
18      84,530                 3,778,000
19      79,760                 2,979,000
20      84,640                 3,814,000
a. Set up a scatter diagram and calculate the associated correlation
coefficient. Discuss how strong you think the relationship is between the
amount of money spent on television advertising and the number of orders
received.
Please use the Correlation procedures within Excel under Tools > Data Analysis.
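Implication: The number of orders received is related to the advertising costs/budget.
Because parts b through d ask for advertising costs to be predicted from orders, the variables in the chart and the regression are assigned as follows:
Dependent Variable (y) = [Advertising Costs]
Independent Variable (x) = [Number of Orders]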
[Figure: scatter plot "Advertising Cost & Orders Received Comparison". x-axis: Orders Received (x); y-axis: Advertising Costs (y). Fitted trendline: y = 0.0097x + 47895, R² = 0.776]
Correlation Coefficient (r) = 0.880931435
The scatter plot and the correlation coefficient (r) of 0.8809 indicate a strong positive linear
relationship between the two variables, advertising costs and number of orders. A value of r near 1
means that months with higher advertising costs tend to be months with more orders received.
Correlation alone does not show that the advertising causes the orders, but on this evidence the
CEO should consider increasing the advertising budget. There is a strong, direct relationship
between the amount of money spent on television advertising for this new drug, called DIB, and
the number of orders received.
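The correlation coefficient reported by Excel can be reproduced from the 20 months of data. The following is a minimal sketch in plain Python; the function name pearson_r and the variable names are hypothetical choices for this example, and the data pairs are copied from the table in the problem statement.

```python
from math import sqrt

# (advertising cost in $, number of orders) for months 1-20
data = [
    (74430, 2856000), (62620, 1800000), (67580, 1299000), (53680, 1510000),
    (69180, 1367000), (73140, 2611000), (85370, 3788000), (76880, 2935000),
    (66990, 1955000), (77230, 3634000), (61380, 1598000), (62750, 1867000),
    (63270, 1899000), (86190, 3245000), (60030, 1934000), (79210, 2761000),
    (67770, 1625000), (84530, 3778000), (79760, 2979000), (84640, 3814000),
]
costs = [cost for cost, _ in data]
orders = [n_orders for _, n_orders in data]


def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(orders, costs)
print(f"r = {r:.4f}")
```

The correlation is symmetric, so swapping the two arguments gives the same value of r.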
b. Assuming there is a statistically significant relationship, use the least squares method to
find the regression equation to predict the advertising costs based on the number of orders
received. Please use the regression procedure within Excel under Tools > Data Analysis to
construct this equation.
Least Squares Regression Equation: y = 0.00971951x + 47895
R² = 0.776
c. Interpret the meaning of the slope, b1, in the regression equation.
The coefficient for the ‘Number of Orders Received’ (x) is 0.00971951. For each additional order
received, the predicted ‘Advertising Costs’ increase by $0.00971951 (just under 1 cent).
Equivalently, each additional 100,000 orders corresponds to about $972 more in advertising costs.
B. Regression
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.880931435
R Square            0.776040194
Adjusted R Square   0.763597982
Standard Error      4704.512237
Observations        20
ANOVA
             df   SS           MS            F            Significance F
Regression   1    1380434618   1380434618    62.3715644   2.943E-07
Residual     18   398383837    22132435.39
Total        19   1778818455
               Coefficients   Standard Error   t Stat        P-value      Lower 95%    Upper 95%    Lower 95.0%   Upper 95.0%
Intercept      47894.77763    3208.26531       14.92855891   1.3962E-11   41154.4623   54635.0929   41154.4623    54635.0929
X Variable 1   0.00971951     0.0012307        7.897566989   2.943E-07    0.00713391   0.01230511   0.00713391    0.01230511
Note that R Square here is the same (0.776) as the value shown on the chart.
The equation coefficients are also identical (47895 and 0.00972 after rounding).
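The slope and intercept in the Excel output can be reproduced with the standard least squares formulas, slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch below is plain Python rather than Excel; variable names are illustrative, and the data pairs are copied from the table in the problem statement.

```python
# (advertising cost in $, number of orders) for months 1-20
data = [
    (74430, 2856000), (62620, 1800000), (67580, 1299000), (53680, 1510000),
    (69180, 1367000), (73140, 2611000), (85370, 3788000), (76880, 2935000),
    (66990, 1955000), (77230, 3634000), (61380, 1598000), (62750, 1867000),
    (63270, 1899000), (86190, 3245000), (60030, 1934000), (79210, 2761000),
    (67770, 1625000), (84530, 3778000), (79760, 2979000), (84640, 3814000),
]
y = [cost for cost, _ in data]          # dependent variable: advertising cost
x = [n_orders for _, n_orders in data]  # independent variable: number of orders

n = len(data)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of cross-products and squares about the means
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)

slope = s_xy / s_xx                   # b1: change in cost per additional order
intercept = mean_y - slope * mean_x   # b0: predicted cost at zero orders

print(f"y = {slope:.8f}x + {intercept:.2f}")
```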
d. Predict the monthly advertising cost when the number of orders is 2,300,000. (Hint: Be very
careful with assigning the dependent variable for this problem)
y = dependent variable being estimated. In part d, Advertising Costs are forecasted; hence,
Advertising Costs are the dependent variable.
y = 0.00971951x + 47895
y (Advertising Costs) = 0.00971951(2,300,000) + 47,895 = 22,354.87 + 47,895 = 70,249.87
Monthly Advertising Cost (When x = 2,300,000 orders): approximately $70,250
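The same prediction can be written as a small function. This sketch uses the unrounded coefficients from the Excel summary output; the function name predict_advertising_cost is a hypothetical helper for this example.

```python
# Coefficients from the Excel regression output
INTERCEPT = 47894.77763   # dollars
SLOPE = 0.00971951        # dollars of advertising per order


def predict_advertising_cost(n_orders):
    """Predicted monthly advertising cost (in $) for a given number of orders."""
    return INTERCEPT + SLOPE * n_orders


predicted = predict_advertising_cost(2_300_000)
print(f"Predicted advertising cost: ${predicted:,.0f}")
```

The value 2,300,000 lies inside the observed range of orders (1,299,000 to 3,814,000), so this is an interpolation rather than an extrapolation.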
e. Compute the coefficient of determination, r², and interpret its meaning.
R² = 0.776 = share of the total variation (SS Total) explained by the regression equation (SSR)
77.6% of the total variation in Advertising Costs (y) is explained by the number of orders
received (x). The data points are therefore scattered around the least squares regression line
rather than lying exactly on it, and there will be error in the predictions (actual vs. predicted y's).
The remaining 22.4% of the total variation in the dependent variable is error/residual
(unexplained) variation, i.e. the dispersion of the actual y's around the y's predicted by the
linear regression line.
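The coefficient of determination follows directly from the sums of squares in the ANOVA section of the Excel output. A minimal sketch, with illustrative variable names:

```python
from math import sqrt

# Sums of squares from the ANOVA section of the regression output
ss_regression = 1380434618   # SSR: variation explained by the regression
ss_residual = 398383837      # SSE: unexplained (residual) variation
ss_total = 1778818455        # SS Total

r_squared = ss_regression / ss_total    # explained share
unexplained = ss_residual / ss_total    # unexplained share

print(f"R^2 = {r_squared:.4f}, unexplained = {unexplained:.4f}")
print(f"sqrt(R^2) = {sqrt(r_squared):.4f}  (matches Multiple R for a positive slope)")
```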
f. Compute the standard error of estimate, and interpret its meaning.
Sy.x = standard error of estimate for y (advertising costs, the dependent variable) for a given value of x (number of orders).
Sy.x (Excel STEYX) = 4704.51, i.e. $4,704.51, or 4.70451 when costs are expressed in thousands of dollars.
Standard error of estimate (SEE): a measure of how inaccurate a prediction from the regression might be. It
is the standard deviation of the errors (residuals), where a residual is the difference between
an actual y and the y predicted by the regression line. It measures the scatter/dispersion of the
observed values around the line of regression for a given value of x, and therefore how well the
regression line represents the scattered data.
The predicted value of the dependent variable lies on the regression line; an actual data point
may be above or below that line. Here, actual monthly advertising costs typically fall about
$4,705 above or below the cost predicted from the number of orders.
The greater the dispersion, the larger the SEE. A larger sample would estimate the SEE more
precisely, but would not by itself make it smaller.
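The standard error of estimate can be recomputed from the residual sum of squares in the ANOVA output: it is the square root of SSE divided by the residual degrees of freedom (n - 2 for simple regression). A minimal sketch with illustrative variable names:

```python
from math import sqrt

ss_residual = 398383837   # SSE from the ANOVA section of the regression output
n_obs = 20                # number of monthly observations
df_residual = n_obs - 2   # two estimated coefficients: intercept and slope

mean_square_error = ss_residual / df_residual
see = sqrt(mean_square_error)   # standard error of estimate, in dollars

print(f"MSE = {mean_square_error:,.2f}")
print(f"SEE = ${see:,.2f}")
```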
g. Do you think that the company should use these results from the regression to base any
corporate decisions on? Explain fully.
Yes.
SEE and r² are the best measures to evaluate the predictive ability of the regression equation.
The scatter plot and the correlation coefficient (r) of 0.8809 indicate a strong positive linear
relationship between the two variables, advertising costs and number of orders.
As for r², 77.6% of the variation in Advertising Costs (y) is explained by the number of orders
received (x). The remaining 22.4% of the total variation in the dependent variable is
error/residual (unexplained) variation.
The standard error of estimate is $4,704.51 (4.70451 in thousands of dollars). This is small
relative to monthly advertising costs of roughly $54,000 to $86,000.
The correlation coefficient and the SEE are inversely related: as the strength of the linear
relationship between the two variables increases, the SEE decreases. Because the correlation
here is high (0.8809), the points lie close to the regression line and the equation accounts for
more than three-quarters of the total variation.
The relationship is also highly statistically significant (Significance F = 2.943E-07). The
regression model can therefore be used to support decisions within the observed range of about
1.3 to 3.8 million orders. Predictions still carry error, and a larger sample, such as 3 or 4
years of data, would make the estimates more precise.
Hypothesis Testing on Multiple Populations
3. Dr. Michaella Evans, a statistics professor at the University of Maryland University College,
drives from her home to the school every weekday. She has three options to drive there. She can
take the Beltway, or she can take a main highway with some traffic lights, or she can take the
back road, which has no traffic lights but is a longer distance. Being as data-oriented as she is,
she is interested to know if there is a difference in the time it takes to drive each route.
As an experiment she randomly selected the route on 21 different days and wrote down the time
it took her for the round trip, getting to work in the morning and back home in the evening.
At the .01 significance level, can she conclude that there is a difference between the driving
times using the different routes?
Time (in minutes) it took to get to work and back using:
Beltway   Main highway   Back road
88        79             86
94        86             78
91        75             79
88        83             96
98        74             97
84        72             73
90                       68
77
You can check your critical value with the following table:
http://www.statsoft.com/textbook/distribution-tables
Pg 391 & 751
H0: μ1 = μ2 = μ3
H1: Not all of the mean driving times are equal
Level of Significance = 0.01
Test Statistic = F distribution
df in numerator = (k-1) or 3-1 = 2
df in denominator = (n-k) or 21-3 = 18
Appendix B.6 @ 0.01: critical F = 6.013 (intersection of 2 and 18 df); reject H0 if computed F > 6.013
Equivalently, reject H0 if P-value < Level of significance (0.01)
According to the Anova data analysis below, F (3.0684) < 6.0129 and P-value (0.071) > Level of
significance (0.01). Thus, we fail to reject H0: at the .01 significance level there is not enough
evidence to conclude that the mean driving times differ between the three routes. The P-value
means that, if the three mean driving times were in fact equal, an F statistic at least as large as
3.0684 would occur about 7.1% of the time, which is well above the 1% threshold. Failing to
reject H0 does not prove that the routes take the same time; it only means this sample does not
show a difference.
Anova: Single Factor (single driver, not multiple drivers as in Two-Factor Without Replication on pg 402)
SUMMARY
Groups         Count   Sum   Average    Variance
Beltway        8       710   88.75      40.21429
Main highway   6       469   78.16667   30.16667
Back road      7       577   82.42857   122.9524
ANOVA
Source of Variation   SS            df   MS         F          P-value       F crit
Between Groups        398.9047619   2    199.4524   3.068373   0.071341785   6.012905
Within Groups         1170.047619   18   65.00265
Total                 1568.952381   20
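The ANOVA table can be reproduced in plain Python. This is a sketch, not the Excel procedure itself. It relies on one assumption worth stating loudly: when the numerator has exactly 2 degrees of freedom, the upper-tail area of the F distribution has the closed form (1 + 2F/df2)^(-df2/2), which lets the P-value and critical value be computed without a statistics library. The formula does not apply to other numerator degrees of freedom.

```python
# Round-trip driving times in minutes, from the problem statement
beltway = [88, 94, 91, 88, 98, 84, 90, 77]
main_highway = [79, 86, 75, 83, 74, 72]
back_road = [86, 78, 79, 96, 97, 73, 68]
groups = [beltway, main_highway, back_road]
alpha = 0.01


def mean(values):
    return sum(values) / len(values)


n_total = sum(len(g) for g in groups)
k = len(groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-groups and within-groups sums of squares
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

df_between = k - 1          # 2
df_within = n_total - k     # 18
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Closed forms valid ONLY because df_between == 2 (see the note above)
assert df_between == 2
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
f_crit = (df_within / 2) * (alpha ** (-2 / df_within) - 1)

print(f"F = {f_stat:.4f}, F crit = {f_crit:.4f}, P-value = {p_value:.4f}")
print("Reject H0" if f_stat > f_crit else "Fail to reject H0")
```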

Stat_AMBA_600_Problem Set3
