CASE 01: CASE 14 – RISK AND RETURN (CHAPTER 10: SIMPLE LINEAR REGRESSION)
According to the Capital Asset Pricing Model (CAPM), the risk associated with a capital asset is
proportional to the slope 𝛽1 (or simply 𝛽) obtained by regressing the asset’s past returns on the
corresponding returns of the average portfolio, called the market portfolio. (The return of the market
portfolio represents the return earned by the average investor. It is a weighted average of the returns from
all the assets in the market.) The larger the slope 𝛽 of an asset, the larger the risk associated with that
asset. A 𝛽 of 1.00 represents average risk.
The returns from an electronics firm’s stock and the corresponding returns for the market portfolio for the
past 15 years are given below.
Market Return Stock’s Return
(%) (%)
16.02 21.05
12.17 17.25
11.48 13.1
17.62 18.23
20.01 21.52
14 13.26
13.22 15.84
17.79 22.18
15.46 16.26
8.09 5.64
11 10.54
18.52 17.86
14.05 12.75
8.79 9.13
11.6 13.87
1. Carry out the regression and find the 𝛽 for the stock. What is the regression equation?
Independent variable (𝑋): Market Return
Dependent variable (𝑌): Stock’s Return
The least-squares estimator 𝑏0, which estimates the intercept 𝛽0 of the model, is −1.090724
The least-squares estimator 𝑏1, which estimates the slope 𝛽1 of the model, is 1.166957
Regression equation: 𝑌̂ = 1.166957𝑋 − 1.090724
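The least-squares estimates can be reproduced with a few lines of Python. This is only an illustrative sketch: the variable names are ours, and observation 11 uses the stock return 10.55 shown in the regression template output.

```python
# Market return (X) and stock return (Y), in percent, for the 15 years.
# Observation 11 uses Y = 10.55, the value shown in the regression template output.
market = [16.02, 12.17, 11.48, 17.62, 20.01, 14.00, 13.22, 17.79,
          15.46, 8.09, 11.00, 18.52, 14.05, 8.79, 11.60]
stock = [21.05, 17.25, 13.10, 18.23, 21.52, 13.26, 15.84, 22.18,
         16.26, 5.64, 10.55, 17.86, 12.75, 9.13, 13.87]

n = len(market)
x_bar = sum(market) / n
y_bar = sum(stock) / n

# Sum of squares of X and sum of cross products of X and Y.
ss_x = sum((x - x_bar) ** 2 for x in market)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(market, stock))

b1 = ss_xy / ss_x          # slope estimate (the stock's beta)
b0 = y_bar - b1 * x_bar    # intercept estimate
print(f"b1 = {b1:.6f}, b0 = {b0:.6f}")
```

The slope is the ratio of the cross-product sum to the sum of squares of X, and the intercept follows from the fact that the fitted line passes through the point of means.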
[Figure: scatter plot of Y against X with the fitted line y = 1.167x − 1.0907; both axes run from 0 to 25]
Simple Regression, CASE 14: Risk and Return
(X = Market's Return, Y = Stock's Return)

Obs   X       Y       Error
1     16.02   21.05    3.44607
2     12.17   17.25    4.13886
3     11.48   13.1     0.79406
4     17.62   18.23   -1.2411
5     20.01   21.52   -0.7401
6     14      13.26   -1.9867
7     13.22   15.84    1.50356
8     17.79   22.18    2.51056
9     15.46   16.26   -0.6904
10    8.09    5.64    -2.7099
11    11      10.55   -1.1958
12    18.52   17.86   -2.6613
13    14.05   12.75   -2.555
14    8.79    9.13    -0.0368
15    11.6    13.87    1.42403

r² = 0.7775 (Coefficient of Determination)
r = 0.8818 (Coefficient of Correlation)
s(b1) = 0.17314 (Standard Error of Slope)
t = 6.73986, p-value = 0.0000
s(b0) = 2.49403 (Standard Error of Intercept)
s = 2.30593 (Standard Error of prediction)

Confidence Interval for Slope, (1 − α) = 95%: C.I. for b1 = 1.16696 ± 0.37405
Confidence Interval for Intercept, (1 − α) = 95%: C.I. for b0 = −1.0907 ± 5.38802
Prediction Interval for Y, (1 − α) = 95%, X = 10: C.I. for Y given X = 10.5788 ± 5.35692
Prediction Interval for E[Y|X], (1 − α) = 95%, X = 10: C.I. for E[Y | X] = 10.5788 ± 1.96969
ANOVA Table
Source SS df MS F F critical p-value
Regn. 241.543 1 241.543 45.4257 4.66719 0.0000
Error 69.125 13 5.3173
Total 310.667 14
2. State your interpretation about the slope 𝛽1 of the model (Hint: Does the value of the slope indicate that
the stock has above-average risk? For the purposes of this case assume that the risk is average if the slope is
in the range 1 ± 0.1, below average if it is less than 0.9, and above average if it is more than 1.1.)
Since the least-squares estimator 𝑏1, which estimates the slope 𝛽1 of the model, is 1.166957 (> 1.10), the
value of the slope indicates that the stock has above-average risk.
3. Give a 95% confidence interval for this 𝛽. Can we say the risk is above average with 95% confidence?
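Confidence Intervals for the Regression Parameters:
A (1 − 𝛼)100% confidence interval for 𝛽1 is: $b_1 \pm t_{(\alpha/2,\, n-2)}\, s(b_1)$
A 95% confidence interval for 𝛽1 is:
$b_1 \pm t_{(0.025,\, 15-2)}\, s(b_1) = 1.166957 \pm (2.16)(0.17314) = [0.79291, 1.54101]$
Because the interval includes values below 1.1 (its lower limit is 0.79291), we cannot say with 95% confidence that the risk is above average.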
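As a rough check, the interval can be recomputed from the reported slope and standard error and compared with the case's 1.1 threshold. The sketch below hard-codes the tabulated critical value t(0.025, 13) ≈ 2.160 rather than computing it, and the variable names are ours.

```python
# Values reported in the regression output; t_crit is the tabulated t(0.025, 13).
b1 = 1.166957
se_b1 = 0.17314
t_crit = 2.160

half_width = t_crit * se_b1
ci_low, ci_high = b1 - half_width, b1 + half_width

# Case rule: the risk is "above average" only if beta exceeds 1.1.
# With 95% confidence this requires the whole interval to lie above 1.1.
above_average_at_95 = ci_low > 1.1

print(f"95% CI for beta: [{ci_low:.4f}, {ci_high:.4f}]")
print("Above average with 95% confidence:", above_average_at_95)
```

The point estimate alone exceeds 1.1, but the decision with 95% confidence depends on the lower limit of the interval.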
4. If the market portfolio return for the current year is 10%, what is the stock’s return predicted by the
regression equation? Give a 95% confidence interval for this prediction.
If the market portfolio return for the current year is 10% (𝑋 = 10), the stock’s return predicted by the
regression equation: 𝑌̂ = 1.166957𝑋 − 1.090724 = 1.166957(10) − 1.090724 = 10.57884
Prediction Intervals
A (1 − 𝛼)100% prediction interval for 𝑌 is:
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_x}}$
$= 10.57884 \pm (2.16)(2.3059)\sqrt{1 + \frac{1}{15} + \frac{(10 - 13.988)^2}{177.3712}}$
$= [5.2219, 15.9358]$
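The arithmetic of the prediction interval can be traced step by step. This is a minimal sketch using the quantities reported in the regression output; the critical value 2.160 is the tabulated t(0.025, 13), and the variable names are ours.

```python
import math

# Quantities taken from the regression output.
y_hat = 10.57884      # predicted stock return at X = 10
s = 2.30593           # standard error of estimate
t_crit = 2.160        # tabulated t(0.025, 13)
n = 15
x0, x_bar, ss_x = 10.0, 13.988, 177.3712

# Half-width of the prediction interval for a single new observation of Y.
half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_x)
pi_low, pi_high = y_hat - half_width, y_hat + half_width

print(f"95% prediction interval: [{pi_low:.4f}, {pi_high:.4f}]")
```

The term (x0 − x̄)²/SSx widens the interval as the chosen X moves away from the mean market return.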
5. Construct a residual plot. Do the residuals appear random?
A Check for the Equality of Variance of the Errors
One of the assumptions in the regression model is the equality of variance of the errors. One of several ways
to check this assumption is to use a plot of the residuals.
The residual plot is constructed as follows.
Residual Plot:
The residuals appear random.
A graph of the regression errors, the residuals, versus the independent variable X will reveal whether the
variance of the errors is constant. The variance of the residuals is indicated by the width of the scatter plot
of the residuals as X increases. If the width of the scatter plot of the residuals either increases or decreases
as X increases, then the assumption of constant variance is not met. This problem is called
heteroscedasticity. When heteroscedasticity exists, the ordinary least squares estimates are no longer
efficient and their standard errors are unreliable, so a more complex method, called generalized least
squares, should be used. The residual plot for this regression shows no heteroscedasticity: the residuals
appear random.
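The residuals behind such a plot can be recomputed from the fitted line. The sketch below also compares the residual spread for the lower and upper halves of X; this split is only an informal heuristic of ours, not a formal test, and observation 11 uses the stock return 10.55 from the regression template output.

```python
import statistics

market = [16.02, 12.17, 11.48, 17.62, 20.01, 14.00, 13.22, 17.79,
          15.46, 8.09, 11.00, 18.52, 14.05, 8.79, 11.60]
stock = [21.05, 17.25, 13.10, 18.23, 21.52, 13.26, 15.84, 22.18,
         16.26, 5.64, 10.55, 17.86, 12.75, 9.13, 13.87]
b0, b1 = -1.090724, 1.166957   # estimates reported for the unconstrained fit

# Residual = observed Y minus fitted Y.
residuals = [y - (b0 + b1 * x) for x, y in zip(market, stock)]

# Informal check: order the residuals by X and compare the spread
# in the 7 smallest-X and the 7 largest-X observations.
by_x = [e for _, e in sorted(zip(market, residuals))]
half = len(by_x) // 2
spread_low = statistics.stdev(by_x[:half])
spread_high = statistics.stdev(by_x[-half:])

print(f"spread for low X: {spread_low:.3f}, spread for high X: {spread_high:.3f}")
```

Similar spreads in the two halves are consistent with constant error variance; a large imbalance would suggest the funnel shape associated with heteroscedasticity.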
6. Construct a normal probability plot. Do the residuals appear to be normally distributed?
The Normal Probability Plot
One of the assumptions in the regression model is that the errors are normally distributed. This assumption
is necessary for calculating prediction intervals and for hypothesis tests about the regression. One of several
ways to test for the normality of the residuals is to use a normal probability plot of the residuals.
The normal probability plot is constructed as follows.
[Figure for question 5: Residual Plot of Error (vertical axis, −4 to 5) against X]
Normal Probability Plot:
The residuals appear to be normally distributed.
In this plot, the residual values are on the horizontal axis and the corresponding z values from the normal
distribution are on the vertical axis. If the residuals are normal, then they should align themselves along the
straight line that appears on the plot. To the extent the points deviate from this straight line, the residuals
deviate from a normal distribution. The normal probability plot is therefore a useful way to check whether
the assumption of normally distributed errors holds. The plot here shows a case where the residuals are
relatively normal, but from the pattern of the points we can also infer that the distribution of the residuals
is flatter than the normal distribution.
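The coordinates of a normal probability plot can be sketched as follows. The residuals are those listed in the regression template output; the plotting positions (i − 0.5)/n are an assumed convention, since the template does not state which one it uses.

```python
from statistics import NormalDist

# Residuals ("Error" column) from the regression template output.
residuals = [3.44607, 4.13886, 0.79406, -1.2411, -0.7401, -1.9867, 1.50356,
             2.51056, -0.6904, -2.7099, -1.1958, -2.6613, -2.555, -0.0368,
             1.42403]

n = len(residuals)
ordered = sorted(residuals)

# Assumed plotting positions (i - 0.5) / n, mapped to standard normal quantiles.
z_scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each point is (residual on the horizontal axis, z value on the vertical axis).
points = list(zip(ordered, z_scores))
for residual, z in points:
    print(f"{residual:8.4f}  {z:7.3f}")
```

If the residuals were close to normal, these points would fall near a straight line.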
7. (Optional) The risk-free rate of return is the rate associated with an investment that has no risk at all,
such as lending money to the government. Assume that for the current year the risk-free rate is 6%.
According to the CAPM, when the return from the market portfolio is equal to the risk-free rate, the
return from every asset must also be equal to the risk-free rate. In other words, if the market portfolio
return is 6%, then the stock’s return should also be 6%. It implies that the regression line must pass
through the point (6, 6). Repeat the regression forcing this constraint. Comment on the risk based
on the new regression equation.
The Excel Solver Method for Regression
The Solver macro available in Excel can also be used to conduct a simple linear regression. The advantage
of using this method is that additional constraints can be imposed on the slope and the intercept. For
instance, if we want the intercept to be a particular value, or if we want to force the regression line to go
through a desired point, we can do that by imposing appropriate constraints.
Consider, as in the given problem, a common type of regression carried out in the area of finance. The risk of
a stock (or any capital asset) is measured by regressing its returns against the market return (which is the
average return from all the assets in the market) during the same period. The Capital Asset Pricing Model
(CAPM) stipulates that when the market return equals the risk-free interest rate (such as the interest rate of
short-term Treasury bills), the stock will also return the same amount. In other words, if the market return
equals the risk-free interest rate of 6%, then the stock’s return, according to the CAPM, will also be 6%. This means that
according to the CAPM, the regression line must pass through the point (6, 6). This can be imposed as a
constraint in the Solver method of regression. Note that forcing a regression line through the origin, (0, 0), is
the same as forcing the intercept to equal zero, and forcing the line through the point (0, 5) is the same as
forcing the intercept to equal 5. The criterion for the line of best fit by the Solver method is still the same as
before: minimize the sum of squared errors (SSE).
Without any constraint, the regression equation is 𝑌̂ = 1.166957𝑋 − 1.090724 (obtained from the
template for regular regression). For the market portfolio return of 6%, the predicted return of stock is
𝑌̂ = 1.166957𝑋 − 1.090724 = 1.166957(6) − 1.090724 = 5.911006
With the constraint, the regression equation changes as follows:
The least-squares estimator 𝑏0, which estimates the intercept 𝛽0 of the model, is −0.945353214
The least-squares estimator 𝑏1, which estimates the slope 𝛽1 of the model, is 1.157558869
Regression equation: 𝑌̂ = 1.157558869𝑋 − 0.945353214
Even though the slope of the constrained regression (𝑏1 = 1.157558869) is slightly lower than the
slope of the original regression without any constraint (𝑏1 = 1.166957958), it is still greater than 1.1,
so the value of the slope still indicates that the stock has above-average risk.
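The constrained fit does not strictly need a numerical solver. Shifting the origin to (6, 6) turns the constraint into a regression through the origin, for which the SSE-minimizing slope has a closed form. The sketch below uses that shortcut instead of Excel's Solver; the variable names are ours, and observation 11 uses the stock return 10.55 from the regression output.

```python
market = [16.02, 12.17, 11.48, 17.62, 20.01, 14.00, 13.22, 17.79,
          15.46, 8.09, 11.00, 18.52, 14.05, 8.79, 11.60]
stock = [21.05, 17.25, 13.10, 18.23, 21.52, 13.26, 15.84, 22.18,
         16.26, 5.64, 10.55, 17.86, 12.75, 9.13, 13.87]

x0, y0 = 6.0, 6.0   # the line is forced through the point (6, 6)

# Regression through the origin on the shifted data (x - x0, y - y0).
num = sum((x - x0) * (y - y0) for x, y in zip(market, stock))
den = sum((x - x0) ** 2 for x in market)
slope = num / den
intercept = y0 - slope * x0   # guarantees the line passes through (x0, y0)

sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(market, stock))
print(f"slope = {slope:.6f}, intercept = {intercept:.6f}, SSE = {sse:.4f}")
```

Because the same criterion (minimum SSE) is used, this closed form should agree with the Solver result up to numerical precision.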
Regression Using the Solver
SSE = 69.1435; Intercept b0 = −0.945353; Slope b1 = 1.157559; Prediction: X = 6, Y = 6
Obs X Y Error
1 16.02 21.05 3.4513
2 12.17 17.25 4.1079
3 11.48 13.1 0.7566
4 17.62 18.23 -1.2208
5 20.01 21.52 -0.6974
6 14 13.26 -2.0005
7 13.22 15.84 1.4824
8 17.79 22.18 2.5324
9 15.46 16.26 -0.6905
10 8.09 5.64 -2.7793
11 11 10.55 -1.2378
12 18.52 17.86 -2.6326
13 14.05 12.75 -2.5683
14 8.79 9.13 -0.0996
15 11.6 13.87 1.3877
[Figure: CASE 14: Risk and Return, scatter plot of Y against X; both axes run from 0 to 25]
CASE 02: SAIGON COOPMART
Logistics and supply chain management plays an important, arguably critical, role in the success of Saigon
Coopmart. Most supermarkets around the world follow the same model, in which a warehouse is placed
next to the supermarket for stock storage, and the size of the warehouse is more or less equal to the size of
the supermarket. However, due to harsh competition and weak finances, Saigon Coopmart decided to follow a
different model with a very small warehouse. This allows Saigon Coopmart to open more supermarkets,
but in exchange, stock is only enough for one day, or two at most, compared with the ordinary model in which a
warehouse can store enough stock for a week or more. As a consequence, Saigon Coopmart has to ship
to its supermarkets much more frequently than competitors such as Big C.
As Saigon Coopmart gained customers' trust over the years, sales increased gradually. In late 2011, the Logistics and Supply Chain
department received warnings from some directors of Coopmart supermarkets (Saigon Coopmart has many
supermarkets, each supervised by one director) that they suspected that, by the end of 2012, the
warehouses would no longer hold enough stock for a day's sales. This means the supermarkets would not have enough
products to sell to customers. A logistics improvement project was conducted to solve the problem
temporarily, giving the BOD of Saigon Coopmart time to come up with a new and complete solution. One of
the sub-projects involved improving the unloading time (i.e. when trucks carrying products arrive at a
supermarket, the products are unloaded and moved to the warehouse).
a. The indicator UWPM (Unloading Weight/minute) is used to measure the effectiveness of unloading product
management. From the information provided (Excel data file – Sheet “Case 01 (a-b) – Coopmart”), what can you
tell about the unloading product management among the four Coopmart supermarkets? (Important note: any
statistical test used HAS TO be accompanied by an explanation/argument for why that test is used.)
ANOVA Test
The required assumptions of ANOVA:
1. We assume independent random sampling from each of the r populations.
2. We assume that the r populations under study are normally distributed, with means 𝜇𝑖 that may or may
not be equal, but with equal variances 𝜎².
The null and alternative hypotheses here are,
𝐻0: 𝜇 𝐶𝑄 = 𝜇 𝐷𝑇𝐻 = 𝜇 𝐿𝑇𝐾 = 𝜇 𝐻𝑉
𝐻1: 𝑁𝑜𝑡 𝑎𝑙𝑙 𝑓𝑜𝑢𝑟 𝜇𝑖 𝑎𝑟𝑒 𝑒𝑞𝑢𝑎𝑙
ANOVA Table
ANOVA
UWPM
Sum of Squares df Mean Square F Sig.
Between Groups 205089.155 3 68363.052 141.665 .000
Within Groups 229702.540 476 482.568
Total 434791.695 479
As shown in the above table, the p-value is smaller than 0.05, so we reject the null hypothesis. We may
conclude that, based on the testing results and our assumptions, it is likely that the four supermarkets
studied are not equal in terms of average UWPM. Which supermarkets are more effective than others?
This question will be answered when we return to the given problem in the next section.
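The F statistic in the ANOVA table follows directly from the sums of squares and the degrees of freedom. A minimal sketch, with the group count r = 4 and the total sample size n = 480 inferred from the table's degrees of freedom:

```python
# Sums of squares from the ANOVA table for UWPM.
ss_between = 205089.155
ss_within = 229702.540

r = 4      # number of supermarkets (groups)
n = 480    # total observations (total df 479 + 1)

df_between = r - 1    # 3
df_within = n - r     # 476

ms_between = ss_between / df_between   # mean square between groups
ms_within = ss_within / df_within      # mean square within groups
f_stat = ms_between / ms_within

print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}, F = {f_stat:.3f}")
```

A large F means that the variation between the supermarket means is large relative to the variation within each supermarket, which is what leads to rejecting the null hypothesis.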
The method we will discuss here is the Tukey method of pairwise comparisons of the population means.
The method is also called the HSD (honestly significant differences) test. This method allows us to
compare every possible pair of means by using a single level of significance, say 𝛼 = 0.05 (or a single
confidence coefficient, say, 1 − 0.05 = 0.95). The single level of significance applies to the entire set of
pairwise comparisons.
To compare the population mean UWPM for every pair of supermarkets, we use the following
set of hypothesis tests:
𝐻0: 𝜇 𝐶𝑄 = 𝜇 𝐷𝑇𝐻
𝐻1: 𝜇 𝐶𝑄 ≠ 𝜇 𝐷𝑇𝐻
𝐻0: 𝜇 𝐶𝑄 = 𝜇 𝐻𝑉
𝐻1: 𝜇 𝐶𝑄 ≠ 𝜇 𝐻𝑉
𝐻0: 𝜇 𝐷𝑇𝐻 = 𝜇 𝐻𝑉
𝐻1: 𝜇 𝐷𝑇𝐻 ≠ 𝜇 𝐻𝑉
𝐻0: 𝜇 𝐶𝑄 = 𝜇 𝐿𝑇𝐾
𝐻1: 𝜇 𝐶𝑄 ≠ 𝜇 𝐿𝑇𝐾
𝐻0: 𝜇 𝐷𝑇𝐻 = 𝜇 𝐿𝑇𝐾
𝐻1: 𝜇 𝐷𝑇𝐻 ≠ 𝜇 𝐿𝑇𝐾
𝐻0: 𝜇 𝐿𝑇𝐾 = 𝜇 𝐻𝑉
𝐻1: 𝜇 𝐿𝑇𝐾 ≠ 𝜇 𝐻𝑉
From these comparisons we determine that our data provide statistical evidence to conclude that 𝜇 𝐶𝑄 is
different from 𝜇 𝐷𝑇𝐻; 𝜇 𝐶𝑄 is different from 𝜇 𝐿𝑇𝐾; 𝜇 𝐶𝑄 is different from 𝜇 𝐻𝑉; 𝜇 𝐷𝑇𝐻 is different from 𝜇 𝐿𝑇𝐾;
and 𝜇 𝐷𝑇𝐻 is different from 𝜇 𝐻𝑉. There are no other statistically significant differences at 𝛼 = 0.05.
b. Further investigation shows that measuring effectiveness by the mean value is not enough, because
two or more supermarkets may have the same mean (weight/minute) but
different variances. In that case, the one with the smaller variance turns out to be better. Construct the matrix of
hypothesis tests for two population variances as follows:
From the result in that matrix and the result in question a, what is your conclusion?
The F Distribution and a Test for Equality of Two Population Variances
We assume independent random sampling from the four populations in question. We also assume that the
four populations are normally distributed. The possible hypotheses to be tested are the following:
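Matrix of comparisons between two population variances
(Rows and columns: Cong Quynh (CQ), Dinh Tien Hoang (DTH), Ly Thuong Kiet (LTK), Hung Vuong (HV); only the cells above the diagonal are filled.)
Cong Quynh vs Dinh Tien Hoang: $H_0: \sigma_{CQ}^2 = \sigma_{DTH}^2$, $H_1: \sigma_{CQ}^2 \neq \sigma_{DTH}^2$
Cong Quynh vs Ly Thuong Kiet: $H_0: \sigma_{CQ}^2 = \sigma_{LTK}^2$, $H_1: \sigma_{CQ}^2 \neq \sigma_{LTK}^2$
Cong Quynh vs Hung Vuong: $H_0: \sigma_{CQ}^2 = \sigma_{HV}^2$, $H_1: \sigma_{CQ}^2 \neq \sigma_{HV}^2$
Dinh Tien Hoang vs Ly Thuong Kiet: $H_0: \sigma_{DTH}^2 = \sigma_{LTK}^2$, $H_1: \sigma_{DTH}^2 \neq \sigma_{LTK}^2$
Dinh Tien Hoang vs Hung Vuong: $H_0: \sigma_{DTH}^2 = \sigma_{HV}^2$, $H_1: \sigma_{DTH}^2 \neq \sigma_{HV}^2$
Ly Thuong Kiet vs Hung Vuong: $H_0: \sigma_{LTK}^2 = \sigma_{HV}^2$, $H_1: \sigma_{LTK}^2 \neq \sigma_{HV}^2$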
(I) Coopmart (J) Coopmart Test Statistic Critical Sig.
Coopmart Cong Quynh Coopmart Dinh Tien Hoang 1.93226 1.43485 .0003
Coopmart Ly Thuong Kiet 5.19545 1.43485 .0000
Coopmart Hung Vuong 1.02779 1.43485 .8814
Coopmart Dinh Tien Hoang Coopmart Cong Quynh 1.93226 1.43485 .0003
Coopmart Ly Thuong Kiet 2.64625 1.43485 .0000
Coopmart Hung Vuong 1.91024 1.43485 .0005
Coopmart Ly Thuong Kiet Coopmart Cong Quynh 5.19545 1.43485 .0000
Coopmart Dinh Tien Hoang 2.64625 1.43485 .0000
Coopmart Hung Vuong 5.05498 1.43485 .0000
Coopmart Hung Vuong Coopmart Cong Quynh 1.02779 1.43485 .8814
Coopmart Dinh Tien Hoang 1.91024 1.43485 .0005
Coopmart Ly Thuong Kiet 5.05498 1.43485 .0000
From these comparisons we determine that our data provide statistical evidence to conclude that $\sigma_{CQ}^2$ is
different from $\sigma_{DTH}^2$; $\sigma_{CQ}^2$ is different from $\sigma_{LTK}^2$; $\sigma_{DTH}^2$ is different from $\sigma_{LTK}^2$; $\sigma_{DTH}^2$ is different from $\sigma_{HV}^2$;
and $\sigma_{LTK}^2$ is different from $\sigma_{HV}^2$. There are no other statistically significant differences at 𝛼 = 0.05.
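The decision rule behind this conclusion is to reject the hypothesis of equal variances when the test statistic exceeds the critical value. The sketch below applies that rule to the six statistics reported in the table; the dictionary layout and the abbreviations CQ, DTH, LTK and HV are ours.

```python
critical_value = 1.43485   # critical value reported in the table

# Test statistic for each pair of supermarkets, as reported in the table.
f_stats = {
    ("CQ", "DTH"): 1.93226,
    ("CQ", "LTK"): 5.19545,
    ("CQ", "HV"): 1.02779,
    ("DTH", "LTK"): 2.64625,
    ("DTH", "HV"): 1.91024,
    ("LTK", "HV"): 5.05498,
}

# Reject equal variances when the statistic exceeds the critical value.
significant = [pair for pair, f in f_stats.items() if f > critical_value]
not_significant = [pair for pair, f in f_stats.items() if f <= critical_value]

print("Variances differ:", significant)
print("No evidence of a difference:", not_significant)
```

Only the Cong Quynh and Hung Vuong pair falls below the critical value, which matches its reported significance of .8814.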
c. (Data for question c is in sheet “Case 01 (c) – Coopmart”) To improve the unloading products
management, the indicator unloading weight per minute (UWPM) is selected. A higher UWPM is
better. To improve UWPM, the project manager needs to know which factors affect UWPM. A
sample of 240 product unloadings was recorded. It is suspected that UWPM is closely related to
two key factors. The first factor is the number of workers. The second factor is years of experience.
For the first factor, since the total weight unloaded differs from one unloading to the next, an appropriate
indicator is the number of workers involved divided by the total weight (WIPW). For example, if 3,400 kg of products need to be
unloaded and the number of workers in the trial is 7, then WIPW = 7/3,400 = 0.002059.
For the second factor, the average number of years of experience of a group of workers (AvgYr) is used as an
indicator.
Construct a regression (Reg 1) in which UWPM is the dependent variable and WIPW and AvgYr are the independent
variables. What information can the project manager draw from the regression (Reg 1) above?
Descriptive Statistics
Descriptive Statistics
Mean Std. Deviation N
UWPM 122.117 44.2590 240
WIPM .003960 .0017123 240
AvgYr 3.6669 1.64448 240
The constructed multiple regression model, in which UWPM is the dependent variable (𝑌) and WIPW (𝑋1, labelled
WIPM in the output) and AvgYr (𝑋2) are the independent variables, is given by
𝑌 = 11.299 + 16,886.185𝑋1 + 11.985𝑋2 + 𝜀
The estimated regression relationship is: 𝑌̂ = 11.299 + 16,886.185𝑋1 + 11.985𝑋2
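The estimated equation can be used to predict UWPM for a given crew. The sketch below reuses the question's example of 7 workers unloading 3,400 kg; the average experience of 3 years is a hypothetical value chosen only for illustration, and the function name is ours.

```python
def predict_uwpm(wipw: float, avg_yr: float) -> float:
    """Predicted UWPM from the estimated regression (Reg 1)."""
    return 11.299 + 16886.185 * wipw + 11.985 * avg_yr

# Example from the question: 7 workers unloading 3,400 kg.
wipw = 7 / 3400

# Hypothetical crew with 3 years of average experience.
pred = predict_uwpm(wipw, 3.0)

print(f"WIPW = {wipw:.6f}, predicted UWPM = {pred:.2f}")
```

Holding WIPW fixed, each additional year of average experience raises the predicted UWPM by about 11.985.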
F-Test
Is there a relationship between the dependent variable 𝑌 (UWPM) and any of the explanatory,
independent variables, 𝑋1 (WIPM) and 𝑋2 (AvgYr), in the regression equation under
consideration?
A statistical hypothesis test for the existence of a linear relationship between 𝑌 and any of the 𝑋1 and 𝑋2 is:
𝐻0: 𝛽1 = 𝛽2 = 0
𝐻1: 𝑁𝑜𝑡 𝑎𝑙𝑙 𝑡ℎ𝑒 𝛽𝑖 (𝑖 = 1,2) 𝑎𝑟𝑒 𝑧𝑒𝑟𝑜
ANOVAa
Model Sum of Squares df Mean Square F Sig.
1 Regression 278085.484 2 139042.742 173.363 .000b
Residual 190081.189 237 802.030
Total 468166.673 239
a. Dependent Variable: UWPM
b. Predictors: (Constant), AvgYr, WIPM
As shown in the above table, since the p-value is small, we reject the null hypothesis that both slope
parameters 𝛽1 and 𝛽2 are zero, in favor of the alternative that the slope parameters are not both zero. Based
on the testing results and our assumptions, there is statistical evidence that a linear regression
relationship exists between UWPM and at least one of the independent variables, WIPM or AvgYr (or
both), proposed in the regression model.
Model Summary
Model Summary(b)
Model  R        R Square  Adjusted R Square  Std. Error of the Estimate  Durbin-Watson
1      .771(a)  .594      .591               28.3201                     1.978
a. Predictors: (Constant), AvgYr, WIPM
b. Dependent Variable: UWPM
In the above table, 𝑅² = 0.594, which means that 59.4% of the variation in UWPM is explained by the
combination of the two independent variables, WIPM and AvgYr. Adjusted 𝑅² is 0.591, which is very
close to the unadjusted measure. We conclude that the regression model fits the data reasonably well, since
more than half of the variation in UWPM is explained by WIPM and/or AvgYr.
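The summary measures can be traced back to the sums of squares in the regression ANOVA table. A minimal sketch, with n = 240 observations and k = 2 independent variables; the variable names are ours:

```python
import math

# Sums of squares from the regression ANOVA table.
ss_reg = 278085.484
ss_res = 190081.189
n, k = 240, 2

ss_total = ss_reg + ss_res
r2 = ss_reg / ss_total                          # share of variation explained
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # penalized for the number of predictors
f_stat = (ss_reg / k) / (ss_res / (n - k - 1))  # overall F statistic
se_est = math.sqrt(ss_res / (n - k - 1))        # standard error of the estimate

print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}, F = {f_stat:.3f}, s = {se_est:.4f}")
```

With only two predictors and 240 observations the adjustment is small, which is why the adjusted and unadjusted measures are so close.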
Coefficients
Hypothesis tests about individual regression slope parameters:
(1)
𝐻0: 𝛽1 = 0
𝐻1: 𝛽1 ≠ 0
(2)
𝐻0: 𝛽2 = 0
𝐻1: 𝛽2 ≠ 0
Model         B (Unstandardized)  Std. Error  Beta (Standardized)  t       Sig.  Tolerance  VIF
1 (Constant)  11.299              6.319                            1.788   .075
  WIPM        16886.185           1071.371    .653                 15.761  .000  .997       1.003
  AvgYr       11.985              1.116       .445                 10.744  .000  .997       1.003
We start with the test for the significance of variable 𝑋1 (WIPM) as a predictor of UWPM. The hypothesis
test is 𝐻0: 𝛽1 = 0 versus 𝐻1: 𝛽1 ≠ 0. As shown in the above table, since the p-value is small, we reject the
null hypothesis that the slope parameter 𝛽1 is zero. We therefore conclude that there is statistical evidence
that the slope of 𝑌 with respect to 𝑋1, the population parameter 𝛽1, is not zero. WIPM is shown
to have some explanatory power with respect to the dependent variable, UWPM.
The hypothesis test for 𝛽2 is 𝐻0: 𝛽2 = 0 versus 𝐻1: 𝛽2 ≠ 0. This p-value, too, is small. We conclude that
𝑋2 (AvgYr) is also an important variable in the regression equation.
Finally, we conclude that both independent variables, WIPM and AvgYr, are closely related to the
dependent variable, UWPM, and affect it positively. Both estimated slope coefficients are positive,
which means that, everything else staying constant, UWPM increases on average
as WIPM increases or AvgYr increases (or both).
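Each t statistic in the coefficients table is the estimated coefficient divided by its standard error. The sketch below recomputes them from the rounded values in the table, so small rounding differences from the SPSS output are expected; the dictionary layout is ours.

```python
# (coefficient, standard error) pairs from the coefficients table.
coefficients = {
    "WIPM": (16886.185, 1071.371),
    "AvgYr": (11.985, 1.116),
}

# t statistic for H0: beta = 0 is the estimate divided by its standard error.
t_stats = {name: b / se for name, (b, se) in coefficients.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
```

Both statistics are far larger than the usual critical values near 2, which is consistent with the very small p-values reported.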
Residual Plots
The above figure is a plot of the regression residuals against the dependent variable UWPM. As we
examine this figure carefully, we see that the spread of the residuals increases as UWPM increases. Thus,
the variance of the residuals is not constant. We have the situation called heteroscedasticity—a violation of
the assumption of equal error variance.
The Normal Probability Plot
The above figure is the normal probability plot of the residuals. The more closely the residuals lie along
the diagonal line in the plot, the less they deviate from the normal distribution. In the figure, the deviations
do not appear to be significant, so we conclude that the model assumption that the population errors 𝜀𝑗 are
normally distributed with mean zero and standard deviation 𝜎 is valid.
Multicollinearity
Correlation
Correlations
UWPM WIPM AvgYr
Pearson Correlation UWPM 1.000 .629 .410
WIPM .629 1.000 -.053
AvgYr .410 -.053 1.000
Sig. (1-tailed) UWPM . .000 .000
WIPM .000 . .205
AvgYr .000 .205 .
N UWPM 240 240 240
WIPM 240 240 240
AvgYr 240 240 240
In the correlation matrix shown above, we see that the correlation between the independent
variables, WIPM and AvgYr, is not high (−0.053). This means that the two variables do not represent the
same direction in space. Being only weakly correlated with each other, the two variables do not contain the same
information about the dependent variable and therefore do not cause multicollinearity when both are in the
regression equation.
Variance inflation factor
Model
Collinearity
Statistics
Tolerance VIF
1 (Constant)
WIPM .997 1.003
AvgYr .997 1.003
The above table shows the output for the current regression problem, with the VIF values in the
last column. The VIF for both variables, WIPM and AvgYr, is 1.003, far below 5, which indicates that there is
no appreciable multicollinearity among the independent variables.
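With only two independent variables, the tolerance and the VIF follow directly from the correlation between them. A minimal sketch using the reported correlation of −0.053; the variable names are ours:

```python
# Correlation between WIPM and AvgYr from the correlation matrix.
r = -0.053

# With two predictors, the R-squared of one regressed on the other is r squared.
tolerance = 1 - r ** 2
vif = 1 / tolerance

print(f"Tolerance = {tolerance:.3f}, VIF = {vif:.3f}")
```

A VIF close to 1 means the variance of each coefficient estimate is barely inflated by the presence of the other predictor.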
CASE 03: TON DUC THANG UNIVERSITY – CONTINUOUS IMPROVEMENT IN
EDUCATION PROGRAM
Continuous improvement of its education programs is always one of the top strategic priorities of Ton Duc
Thang University. Every period, TDT University applies new teaching methods to continuously
improve its education programs. Recently, there has been a suspicion that students perform better in the
experiment classes (the classes in which the new teaching method is applied) than in the control classes (the
classes in which the old teaching method is applied).
a. Present the methodology for testing that suspicion (what is your argument, what is an
appropriate statistical test, and why);
b. How do you draw the sample for the statistical test?
c. Present the result of your statistical test;
d. What is your conclusion from the statistical test?
Data
Experiment Class Control Class Experiment Class Control Class
Students Test 1 Test 2 Test 1 Test 2 Students Test 1 Test 2 Test 1 Test 2
1 63 84 88 71 31 62 82 83 91
2 71 89 59 91 32 77 77 80 63
3 87 70 85 79 33 87 69 94 79
4 66 73 64 79 34 63 76 53 92
5 63 74 95 91 35 73 95 70 58
6 70 97 92 89 36 90 72 90 86
7 63 89 71 85 37 84 75 57 74
8 84 80 58 90 38 64 77 82 82
9 84 86 62 69 39 85 98 65 66
10 63 77 93 76 40 86 78 54 83
11 62 74 80 80 41 66 86 68 74
12 68 75 89 85 42 83 70 73 66
13 84 98 65 93 43 61 90 71 97
14 90 75 67 61 44 81 70 86 93
15 86 96 76 84 45 60 85 90 80
16 69 76 81 77 46 72 83 80 75
17 87 89 85 75 47 60 90 55 70
18 60 74 85 83 48 87 68 81 96
19 64 81 87 68 49 65 78 94 82
20 67 86 86 91 50 71 81 95 78
21 64 72 86 92 51 74 69 60 63
22 86 69 85 97 52 60 78 90 98
23 88 94 77 60 53 90 85 66 61
24 67 89 85 61 54 68 84 74 90
25 66 73 90 84 55 67 90 83 74
26 83 80 72 93 56 77 95 77 77
27 89 94 70 92 57 79 67 53 93
28 68 66 60 79 58 64 82 80 98
29 81 87 60 67 59 90 92 67 61
30 71 76 74 63 60 67 90 60 95
Experiment Class Control Class Experiment Class Control Class
Students Test 1 Test 2 Test 1 Test 2 Students Test 1 Test 2 Test 1 Test 2
61 84 93 75 89 61 84 93 75 89
62 90 87 85 61 62 90 87 85 61
63 63 90 78 74 63 63 90 78 74
64 69 72 54 71 64 69 72 54 71
65 81 66 73 97 65 81 66 73 97
66 67 80 71 93 66 67 80 71 93
67 63 74 92 61 67 63 74 92 61
68 62 78 88 89 68 62 78 88 89
69 90 74 78 69 69 90 74 78 69
70 86 80 57 72 70 86 80 57 72
71 83 96 82 73 71 83 96 82 73
72 80 72 92 59 72 80 72 92 59
73 89 89 61 84 73 89 89 61 84
74 69 69 81 70 74 69 69 81 70
75 64 72 73 86 75 64 72 73 86
76 90 83 85 61 76 90 83 85 61
77 77 85 53 95 77 77 85 53 95
78 86 75 54 93 78 86 75 54 93
79 60 75 85 92 79 60 75 85 92
80 84 94 78 84 80 84 94 78 84
81 66 77 73 70 81 66 77 73 70
82 85 71 91 65 82 85 71 91 65
83 86 86 91 98 83 86 86 91 98
84 83 71 72 90 84 83 71 72 90
85 75 87 67 91 85 75 87 67 91
86 67 87 77 98 86 67 87 77 98
87 88 70 94 98 87 88 70 94 98
88 65 80 68 62 88 65 80 68 62
89 82 74 80 90 89 82 74 80 90
90 89 66 94 84 90 89 66 94 84
Descriptive Statistics
Descriptive Statistics
N Mean Std. Deviation
Test 1 (Experiment Class) 120 75.8000 10.01981
Test 2 (Experiment Class) 120 81.6833 8.78882
Test 1 (Control Class) 120 75.6167 12.51767
Test 2 (Control Class) 120 79.1833 11.91777
Valid N (listwise) 120
Pair Samples Statistics
Test 1 Test 2 Test 1 Test 2
Students Ex. - Co. Ex. - Co. Students Ex. - Co. Ex. - Co.
1 -25 13 31 -21 -9
2 12 -2 32 -3 14
3 2 -9 33 -7 -10
4 2 -6 34 10 -16
5 -32 -17 35 3 37
6 -22 8 36 0 -14
7 -8 4 37 27 1
8 26 -10 38 -18 -5
9 22 17 39 20 32
10 -30 1 40 32 -5
11 -18 -6 41 -2 12
12 -21 -10 42 10 4
13 19 5 43 -10 -7
14 23 14 44 -5 -23
15 10 12 45 -30 5
16 -12 -1 46 -8 8
17 2 14 47 5 20
18 -25 -9 48 6 -28
19 -23 13 49 -29 -4
20 -19 -5 50 -24 3
21 -22 -20 51 14 6
22 1 -28 52 -30 -20
23 11 34 53 24 24
24 -18 28 54 -6 -6
25 -24 -11 55 -16 16
26 11 -13 56 0 18
27 19 2 57 26 -26
28 8 -13 58 -16 -16
29 21 20 59 23 31
30 -3 13 60 7 -5
Test 1 Test 2 Test 1 Test 2
Students Ex. - Co. Ex. - Co. Students Ex. - Co. Ex. - Co.
61 9 4 91 4 16
62 5 26 92 8 0
63 -15 16 93 19 12
64 15 1 94 25 -7
65 8 -31 95 -6 -13
66 -4 -13 96 -12 22
67 -29 13 97 1 32
68 -26 -11 98 9 -4
69 12 5 99 15 14
70 29 8 100 13 4
71 1 23 101 14 -11
72 -12 13 102 -19 9
73 28 5 103 1 10
74 -12 -1 104 -1 22
75 -9 -14 105 -24 17
76 5 22 106 16 1
77 24 -10 107 -6 22
78 32 -18 108 30 17
79 -25 -17 109 32 14
80 6 10 110 -5 22
81 -7 7 111 2 -6
82 -6 6 112 -28 6
83 -5 -12 113 4 29
84 11 -19 114 3 -7
85 8 -4 115 16 17
86 -10 -11 116 29 15
87 -6 -28 117 -10 -4
88 -3 18 118 -4 1
89 2 -16 119 9 -9
90 -5 -18 120 -3 30
Descriptive Statistics
N Mean Std. Deviation
Test 1 (Ex.-Co.) 120 .1833 16.85728
Test 2 (Ex.-Co.) 120 2.5000 15.53797
Valid N (listwise) 120
For each test (Test 1 and Test 2), the hypothesis test involves two populations: the population of students
who study in the experiment class and the population of students who study in the control class. We want to
test the null hypothesis that the mean test score in both populations is equal versus the alternative
hypothesis that the mean for the experiment-class students is greater. Using the same students for the tests
and pairing their observations in an experiment-and-control (Ex.-Co.) way makes the test more precise than it
would be without pairing.
Under these circumstances, it is easy to see that the variable in which we are interested is the difference
between the test score of the students who study in the experiment class and that of the students who study
in the control class. The population parameter about which we want to draw an inference is the mean
difference between the two populations.
For Test 1, we denote the population parameter by 𝜇 𝐷.𝑇𝑒𝑠𝑡 1, the mean difference. This parameter is
defined as 𝜇 𝐷.𝑇𝑒𝑠𝑡 1 = 𝜇 𝐸𝑥.𝑇𝑒𝑠𝑡 1 − 𝜇 𝐶𝑜.𝑇𝑒𝑠𝑡 1, where 𝜇 𝐸𝑥.𝑇𝑒𝑠𝑡 1 is the average test-1 score of the students
who study in the experiment class and 𝜇 𝐶𝑜.𝑇𝑒𝑠𝑡 1 is the average test-1 score of the students who study in the
control class. Our null and alternative hypotheses are, then,
𝐻0: 𝜇 𝐷.𝑇𝑒𝑠𝑡 1 ≤ 0
𝐻1: 𝜇 𝐷.𝑇𝑒𝑠𝑡 1 > 0
For Test 2, we denote the population parameter by 𝜇 𝐷.𝑇𝑒𝑠𝑡 2, the mean difference. This parameter is
defined as 𝜇 𝐷.𝑇𝑒𝑠𝑡 2 = 𝜇 𝐸𝑥.𝑇𝑒𝑠𝑡 2 − 𝜇 𝐶𝑜.𝑇𝑒𝑠𝑡 2, where 𝜇 𝐸𝑥.𝑇𝑒𝑠𝑡 2 is the average test-2 score of the students
who study in the experiment class and 𝜇 𝐶𝑜.𝑇𝑒𝑠𝑡 2 is the average test-2 score of the students who study in the
control class. Our null and alternative hypotheses are, then,
𝐻0: 𝜇 𝐷.𝑇𝑒𝑠𝑡 2 ≤ 0
𝐻1: 𝜇 𝐷.𝑇𝑒𝑠𝑡 2 > 0
The only assumption we make when we use this test is that the populations of differences are normally
distributed.
Paired Samples Test
Pair 1: Test 1 (Experiment Class) - Test 1 (Control Class)
Pair 2: Test 2 (Experiment Class) - Test 2 (Control Class)

Paired Differences                                  Pair 1      Pair 2
Mean                                                .18333      2.50000
Std. Deviation                                      16.85728    15.53797
Std. Error Mean                                     1.53885     1.41842
95% Confidence Interval of the Difference, Lower    -2.86375    -.30861
95% Confidence Interval of the Difference, Upper    3.23041     5.30861
t                                                   .119        1.763
df                                                  119         119
Sig. (2-tailed)                                     .905        .081
Sig. (R-tailed)                                     .453        .040
As shown in the above table, for Test 1 (Pair 1), the right-tailed p-value (0.453) is greater than any
conventional level of α, even 0.10, so we cannot reject the null hypothesis: there is no evidence that the test-1
scores of the students in the experiment class are higher than those of the students in the control class.
However, for Test 2 (Pair 2), the right-tailed p-value (0.040) is smaller than α = 0.05, so we conclude that the test-2
scores of the students in the experiment class are higher than those of the students in the
control class. The result is not strongly significant, however: it would not hold at a stricter level such as α = 0.01.
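The t statistics in the Paired Samples Test table can be recomputed from the mean and standard deviation of the paired differences. A minimal sketch; the function name is ours, and the critical value 1.98 used for the confidence interval is an approximate tabulated t(0.025, 119).

```python
import math

def paired_t(mean_diff: float, sd_diff: float, n: int) -> tuple[float, float]:
    """Return (t statistic, standard error) for a paired-difference test."""
    se = sd_diff / math.sqrt(n)
    return mean_diff / se, se

n = 120
t1, se1 = paired_t(0.18333, 16.85728, n)   # Test 1 (Ex. - Co.)
t2, se2 = paired_t(2.50000, 15.53797, n)   # Test 2 (Ex. - Co.)

# Approximate 95% confidence interval for the Test 2 mean difference.
t_crit = 1.98   # assumed tabulated t(0.025, 119)
ci2 = (2.50000 - t_crit * se2, 2.50000 + t_crit * se2)

print(f"Test 1: t = {t1:.3f}; Test 2: t = {t2:.3f}")
print(f"Test 2 95% CI: [{ci2[0]:.3f}, {ci2[1]:.3f}]")
```

The two-sided interval for Test 2 still includes zero, which is why the result is significant only in the one-tailed test at the 5% level.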
------ THE END ------

More Related Content

PDF
Intro to Quant Trading Strategies (Lecture 6 of 10)
PDF
Intro to Quant Trading Strategies (Lecture 9 of 10)
PDF
Intro to Quant Trading Strategies (Lecture 4 of 10)
PDF
Intro to Quant Trading Strategies (Lecture 10 of 10)
PPTX
Decision theory
PDF
Intro to Quant Trading Strategies (Lecture 2 of 10)
PPT
Decision theory
Intro to Quant Trading Strategies (Lecture 6 of 10)
Intro to Quant Trading Strategies (Lecture 9 of 10)
Intro to Quant Trading Strategies (Lecture 4 of 10)
Intro to Quant Trading Strategies (Lecture 10 of 10)
Decision theory
Intro to Quant Trading Strategies (Lecture 2 of 10)
Decision theory

Statisticsfor businessproject solution

  • 1. CASE 01: CASE 14 – RISK AND RETURN (CHAPTER 10: SIMPLE LINEAR REGRESSION) According to the Capital Asset Pricing Model (CAPM), the risk associated with a capital asset is proportional to the slope 𝛽1 (or simply 𝛽) obtained by regressing the asset’s past returns with the corresponding returns of the average portfolio called the market portfolio. (The return of the market portfolio represents the return earned by the average investor. It is a weighted average of the returns from all the assets in the market.) The larger the slope 𝛽 of an asset, the larger is the risk associated with that asset. A 𝛽 of 1.00 represents average risk. The returns from an electronics firm’s stock and the corresponding returns for the market portfolio for the past 15 years are given below. Market Return Stock’s Return (%) (%) 16.02 21.05 12.17 17.25 11.48 13.1 17.62 18.23 20.01 21.52 14 13.26 13.22 15.84 17.79 22.18 15.46 16.26 8.09 5.64 11 10.54 18.52 17.86 14.05 12.75 8.79 9.13 11.6 13.87 1. Carry out the regression and find the 𝛽 for the stock. What is the regression equation? Independent variable (𝑋): Market Return Dependent variable (𝑌): Stock’s Return Least-square estimator 𝑏0, which estimates the intercept 𝛽0 of the model, is −1.090724 Least-square estimator 𝑏1, which estimates the slope 𝛽1of the model, is 1.166957 Regression equation: 𝑌 = 1.166957𝑋 − 1.090724
  • 2. y = 1.167x - 1.0907 0 5 10 15 20 25 0 5 10 15 20 25 Y X Simple Regression CASE 14: Risk and Return Stock's Return Market's Return r 2 0.7775 Coefficient of Determination X Y Error Confidence Interval for Slope r 0.8818 Coefficient of Correlation 1 16.02 21.05 3.44607 1-a (1-a) C.I. for b1 2 12.17 17.25 4.13886 95% 1.16696 + or - 0.37405 s(b 1 ) 0.17314 Standard Error of Slope 3 11.48 13.1 0.79406 t 6.73986 4 17.62 18.23 -1.2411 Confidence Interval for Intercept p- value 0.0000 5 20.01 21.52 -0.7401 1-a (1-a) C.I. for b0 6 14 13.26 -1.9867 95% -1.0907 + or - 5.38802 s(b 0 ) 2.49403 Standard Error of Intercept 7 13.22 15.84 1.50356 8 17.79 22.18 2.51056 Prediction Interval for Y 9 15.46 16.26 -0.6904 1-a X (1-a) C.I. for Y given X 10 8.09 5.64 -2.7099 95% 10 10.5788 + or - 5.35692 s 2.30593 Standard Error of prediction 11 11 10.55 -1.1958 12 18.52 17.86 -2.6613 Prediction Interval for E[Y|X] 13 14.05 12.75 -2.555 1-a X (1-a) C.I. for E[Y | X ] 14 8.79 9.13 -0.0368 95% 10 10.5788 + or - 1.96969 15 11.6 13.87 1.42403 ANOVA Table Source SS df MS F F critical p-value Regn. 241.543 1 241.543 45.4257 4.66719 0.0000 Error 69.125 13 5.3173 Total 310.667 14
  • 3. 2. State your interpretation about the slope 𝛽1 of the model (Hint: Does the value of the slope indicate that the stock has above-average risk? For the purposes of this case assume that the risk is average if the slope is in the range 1 ± 0.1, below average if it is less than 0.9, and above average if it is more than 1.1.) Since the least-square estimator 𝑏1, which estimates the slope 𝛽1of the model, is 1.166957 (> 1.10), the value of the slope indicate that the stock has above-average risk. 3. Give a 95% confidence interval for this 𝛽. Can we say the risk is above average with 95% confidence? Confidence Intervals for the Regression Parameters: A (1 − 𝛼)100% confidence interval for 𝛽1is: 𝑏1 ± 𝑡( 𝛼 2 ,𝑛−2) 𝑠(𝑏1) A 95% confidence interval for 𝛽1is: 𝑏1 ± 𝑡(0.025,15−2) 𝑠(𝑏1) = 1.166957 ± (2.16)(0.17314) = [0.79291, 1.54101] 4. If the market portfolio return for the current year is 10%, what is the stock’s return predicted by the regression equation? Give a 95% confidence interval for this prediction. If the market portfolio return for the current year is 10% (𝑋 = 10), the stock’s return predicted by the regression equation: 𝑌̂ = 1.166957𝑋 − 1.090724 = 1.166957(10) − 1.090724 = 10.57884 Prediction Intervals A (1 − 𝛼)100% prediction interval for 𝑌 is: 𝑦̂ ± 𝑡 𝛼/2 𝑠√1 + 1 𝑛 + (𝑥 − 𝑥̅)2 𝑆𝑆 𝑥 = 10.57884 ± (2.16)(2.3059)√1 + 1 15 + (10 − 13.988)2 177.3712 = [5.2219, 15.9358] 5. Construct a residual plot. Do the residuals appear random? A Check for the Equality of Variance of the Errors One of the assumptions in the regression model is the equality of variance of the errors. One of several ways to test for the normality of the residuals is to use a residual plot of the residuals. The residual plot is constructed as follows.
  • 4. Residual Plot: The residuals appear random. A graph of the regression errors, the residuals, versus the independent variable X, will reveal whether the variance of the errors is constant. The variance of the residuals is indicated by the width of the scatter plot of the residuals as X increases. If the width of the scatter plot of the residuals either increases or decreases as X increases, then the assumption of constant variance is not met. This problem is called heteroscedasticity. When heteroscedasticity exists, we cannot use the ordinary least squares method for estimating the regression and should use a more complex method, called generalized least squares. The above figure shows a residual plot in a good regression, with no heteroscedasticity that the residuals appear random. 6. Construct a normal probability plot. Do the residuals appear to be normally distributed? The Normal Probability Plot One of the assumptions in the regression model is that the errors are normally distributed. This assumption is necessary for calculating prediction intervals and for hypothesis tests about the regression. One of several ways to test for the normality of the residuals is to use a normal probability plot of the residuals. The normal probability plot is constructed as follows. -4 -3 -2 -1 0 1 2 3 4 5Error X Residual Plot
  • 5. Normal Probability Plot: The residuals appear to be normally distributed. In this plot, the residual values are on the horizontal axis and the corresponding z values from the normal distribution are on the vertical axis. If the residuals are normal, then they should align themselves along the straight line that appears on the plot. To the extent the points deviate from this straight line, the residuals deviate from a normal distribution. It is useful to recognize whether the assumption of normally distributed errors holds on a normal probability plot. The above figure a case where the residuals are relatively normal, but from the pattern of the points we can also infer that the distribution of the residuals is flatter than the normal distribution. 7. (Optional) The risk-free rate of return is the rate associated with an investment that has no risk at all, such as lending money to the government. Assume that for the current year the risk-free rate is 6%. According to the CAPM, when the return from the market portfolio is equal to the risk-free rate, the return from every asset must also be equal to the risk-free rate. In other words, if the market portfolio return is 6%, then the stock’s return should also be 6%. It implies that the regression line must pass through the point (6, 6). Repeat the regression forcing this constraint. Comment on the risk based on the new regression equation. The Excel Solver Method for Regression The Solver macro available in Excel can also be used to conduct a simple linear regression. The advantage of using this method is that additional constraints can be imposed on the slope and the intercept. For instance, if we want the intercept to be a particular value, or if we want to force the regression line to go through a desired point, we can do that by imposing appropriate constraints. As the given problem, consider a common type of regression carried out in the area of finance. 
The risk of a stock (or any capital asset) is measured by regressing its returns against the market return (which is the average return from all the assets in the market) during the same period. The Capital Asset Pricing Model (CAPM) stipulates that when the market return equals the risk-free interest rate (such as the interest rate of short-term Treasury bills), the stock will also return the same amount. In other words, if the market return risk-free interest rate 6%, then the stock’s return, according to the CAPM, will also be 6%. This means that according to the CAPM, the regression line must pass through the point (6, 6). This can be imposed as a Normal Probability Plot of Residuals
  • 6. constraint in the Solver method of regression. Note that forcing a regression line through the origin, (0, 0), is the same as forcing the intercept to equal zero, and forcing the line through the point (0, 5) is the same as forcing the intercept to equal 5. The criterion for the line of best fit by the Solver method is still the same as before—minimize the sum of squared errors (SSE). Without any constraint, the regression equation is 𝑌̂ = 1.166957𝑋 − 1.090724 (obtained from the template for regular regression). For the market portfolio return of 6%, the predicted return of stock is 𝑌̂ = 1.166957𝑋 − 1.090724 = 1.166957(6) − 1.090724 = 5.911006 With the constraint, the regression equation changes as follows: Least-square estimator 𝑏0, which estimates the intercept 𝛽0 of the model, is −0.945353214 Least-square estimator 𝑏1, which estimates the slope 𝛽1of the model, is 1.157558869 Regression equation: 𝑌 = 1.157558869𝑋 − 0.945353214 Even though the risk of the regression model with the constraint (𝑏1 = 1.157558869) is lower than the risk of the original regression model without any constraint (𝑏1 = 1.166957958), the value of the slope still indicates that the stock has above-average risk. Regression Using the Solver SSE Intercept Slope Prediction 69.1435 b 0 b 1 X Y X Y Error -0.945353 1.157559 6 6 1 16.02 21.05 3.4513 2 12.17 17.25 4.1079 3 11.48 13.1 0.7566 4 17.62 18.23 -1.2208 5 20.01 21.52 -0.6974 6 14 13.26 -2.0005 7 13.22 15.84 1.4824 8 17.79 22.18 2.5324 9 15.46 16.26 -0.6905 10 8.09 5.64 -2.7793 11 11 10.55 -1.2378 12 18.52 17.86 -2.6326 13 14.05 12.75 -2.5683 14 8.79 9.13 -0.0996 15 11.6 13.87 1.3877 CASE 14: Risk and Return 0 5 10 15 20 25 0 5 10 15 20 25 Y X
  • 7. CASE 02: SAIGON COOPMART Logistics & Supply Chain plays an important role, if needed to say a critical factor for the success of Saigon Coopmart. Most of supermarkets over the world follow the identical model in which a warehouse is placed next to supermarket for stocks storage; and the size of warehouse is more or less equal to size of supermarket. However, due to harsh competition, and weak finance, Saigon Coopmart decided to follow a different model with very small size warehouse. This allows Saigon Coopmart to place more supermarkets; but in exchange, stocks only enough for a day, or maximum two compared to ordinary model in which a warehouse can store enough stocks for a week or more. As a consequence, Saigon Coopmart has to ship much more frequency to its supermarkets than its competitors such as Big C. Gaining trusted in customers over years, sale increased gradually. In late 2011, Logistics and Supply Chain department received warning from some directors of Coopmart supermarkets (Saigon Coopmart has many supermarkets, each supermarket is supervised by one director) that they suspected by the end of 2012, the warehouse would no longer enough for a day sale. This means supermarket would not have enough products to sell for customer. A logistics improvement project was conducted to solve the problem temporary to spare time for BOD of Saigon Coopmart to come with a new and complete solution. One of the sub-projects involved improving the unloading time (i.e. when trucks carrying products come to supermarket, the products are then unloaded and moved to warehouse). a. Indicator UWPM (Unloading Weight/minute) is used to measure the effectiveness of unloading product management. From information provide (Excel data file – Sheet “Case 01 (a-b) – Coopmart”, what can you tell about the unloading product management among four Coopmart supermarkets? (Important note: any statistics test used HAVE TO comply with explanation/argument why using that statistics test). 
ANOVA Test The required assumptions of ANOVA: 1. We assume independent random sampling from each of the r populations. 2. We assume that the r populations under study are normally distributed, with means 𝜇𝑖 that may or may not be equal, but with equal variances 𝜎2 . The null and alternative hypotheses here are, 𝐻0: 𝜇 𝐶𝑄 = 𝜇 𝐷𝑇𝐻 = 𝜇 𝐿𝑇𝐾 = 𝜇 𝐻𝑉 𝐻1: 𝑁𝑜𝑡 𝑎𝑙𝑙 𝑓𝑜𝑢𝑟 𝜇𝑖 𝑎𝑟𝑒 𝑒𝑞𝑢𝑎𝑙 ANOVA Table ANOVA UWPM Sum of Squares df Mean Square F Sig. Between Groups 205089.155 3 68363.052 141.665 .000 Within Groups 229702.540 476 482.568 Total 434791.695 479 As shown in the above table, the p-value is smaller than 0.05, we reject the null hypothesis. We may conclude that, based on the testing results and our assumptions, it is likely that the four supermarkets
  • 8. studied are not equal in terms of average UWPM. Which supermarkets are more effective than others? This question will be answered when we return to the given problem in the next section. The method we will discuss here is the Tukey method of pairwise comparisons of the population means. The method is also called the HSD (honestly significant differences) test. This method allows us to compare every possible pair of means by using a single level of significance, say 𝛼 = 0.05 (or a single confidence coefficient, say, 1 − 0.05 = 0.95). The single level of significance applies to the entire set of pairwise comparisons. To compare the population mean vacationer responses for every pair of supermarkets, we use the following set of hypothesis tests: 𝐻0: 𝜇 𝐶𝑄 = 𝜇 𝐷𝑇𝐻 𝐻1: 𝜇 𝐶𝑄 ≠ 𝜇 𝐷𝑇𝐻 𝐻0: 𝜇 𝐶𝑄 = 𝜇 𝐻𝑉 𝐻1: 𝜇 𝐶𝑄 ≠ 𝜇 𝐻𝑉 𝐻0: 𝜇 𝐷𝑇𝐻 = 𝜇 𝐻𝑉 𝐻1: 𝜇 𝐷𝑇𝐻 ≠ 𝜇 𝐻𝑉 𝐻0: 𝜇 𝐶𝑄 = 𝜇 𝐿𝑇𝐾 𝐻1: 𝜇 𝐶𝑄 ≠ 𝜇 𝐿𝑇𝐾 𝐻0: 𝜇 𝐷𝑇𝐻 = 𝜇 𝐿𝑇𝐾 𝐻1: 𝜇 𝐷𝑇𝐻 ≠ 𝜇 𝐿𝑇𝐾 𝐻0: 𝜇 𝐿𝑇𝐾 = 𝜇 𝐻𝑉 𝐻1: 𝜇 𝐿𝑇𝐾 ≠ 𝜇 𝐻𝑉 From these comparisons we determine that our data provide statistical evidence to conclude that 𝜇 𝐶𝑄 is different from 𝜇 𝐷𝑇𝐻; 𝜇 𝐶𝑄 is different from 𝜇 𝐿𝑇𝐾; 𝜇 𝐶𝑄 is different from 𝜇 𝐻𝑉; 𝜇 𝐷𝑇𝐻 is different from 𝜇 𝐿𝑇𝐾; and 𝜇 𝐷𝑇𝐻 is different from 𝜇 𝐻𝑉. There are no other statistically significant differences at 𝛼 = 0.05. b. Further investigation shows that measuring the effectiveness by mean value is not enough because there might be a case in which two or more supermarkets having the same mean (weight/minute) but with different variance. Then, the one with smaller variance turns out to be better. Construct the hypothesis testing for two population variances matrix as follow:
  • 9. From the result in that matrix and the result in question a, what is your conclusion? The F Distribution and a Test for Equality of Two Population Variances We assume independent random sampling from the four populations in question. We also assume that the four populations are normally distributed. The possible hypotheses to be tested are the following: Comparison between two population variance matrix Cong Quynh Dinh Tien Hoang Ly Thuong Kiet Hung Vuong Cong Quynh 𝐻0: 𝜎 𝐶𝑄 2 = 𝜎 𝐷𝑇𝐻 2 𝐻1: 𝜎 𝐶𝑄 2 ≠ 𝜎 𝐷𝑇𝐻 2 𝐻0: 𝜎 𝐶𝑄 2 = 𝜎𝐿𝑇𝐾 2 𝐻1: 𝜎 𝐶𝑄 2 ≠ 𝜎𝐿𝑇𝐾 2 𝐻0: 𝜎 𝐶𝑄 2 = 𝜎 𝐻𝑉 2 𝐻1: 𝜎 𝐶𝑄 2 ≠ 𝜎 𝐻𝑉 2 Dinh Tien Hoang 𝐻0: 𝜎 𝐷𝑇𝐻 2 = 𝜎𝐿𝑇𝐾 2 𝐻1: 𝜎 𝐷𝑇𝐻 2 ≠ 𝜎𝐿𝑇𝐾 2 𝐻0: 𝜎 𝐷𝑇𝐻 2 = 𝜎 𝐻𝑉 2 𝐻1: 𝜎 𝐷𝑇𝐻 2 ≠ 𝜎 𝐻𝑉 2 Ly Thuong Kiet 𝐻0: 𝜎𝐿𝑇𝐾 2 = 𝜎 𝐻𝑉 2 𝐻1: 𝜎𝐿𝑇𝐾 2 ≠ 𝜎 𝐻𝑉 2 Hung Vuong (I) Coopmart (J) Coopmart Test Statistic Critical Sig. Coopmart Cong Quynh Coopmart Dinh Tien Hoang 1.93226 1.43485 .0003 Coopmart Ly Thuong Kiet 5.19545 1.43485 .0000 Coopmart Hung Vuong 1.02779 1.43485 .8814 Coopmart Dinh Tien Hoang Coopmart Cong Quynh 1.93226 1.43485 .0003 Coopmart Ly Thuong Kiet 2.64625 1.43485 .0000 Coopmart Hung Vuong 1.91024 1.43485 .0005 Coopmart Ly Thuong Kiet Coopmart Cong Quynh 5.19545 1.43485 .0000 Coopmart Dinh Tien Hoang 2.64625 1.43485 .0000 Coopmart Hung Vuong 5.05498 1.43485 .0000 Coopmart Hung Vuong Coopmart Cong Quynh 1.02779 1.43485 .8814 Coopmart Dinh Tien Hoang 1.91024 1.43485 .0005 Coopmart Ly Thuong Kiet 5.05498 1.43485 .0000
  • 10. From these comparisons we determine that our data provide statistical evidence to conclude that 𝜎 𝐶𝑄 2 is different from 𝜎𝐿𝑇𝐾 2 ; 𝜎 𝐶𝑄 2 is different from 𝜎𝐿𝑇𝐾 2 ; 𝜎 𝐷𝑇𝐻 2 is different from 𝜎𝐿𝑇𝐾 2 ; 𝜎 𝐷𝑇𝐻 2 is different from 𝜎 𝐻𝑉 2 ; and 𝜎𝐿𝑇𝐾 2 is different from 𝜎 𝐻𝑉 2 . There are no other statistically significant differences at 𝛼 = 0.05. c. (Data for question c is in sheet “Case 01 (c) – Coopmart”) To improve the unloading products management, indicator unloading weight per minute (UWPM) is selected. This means higher UWPM is better. To improve UWPM, project manager need to know what are factors that affects to UWPM. A sample of 240 times unloading products were recorded. It is suspected that UWPM has close relation to two key factors. The first factor is the number of workers. The second factors is year of experience. For the first factor, since different time the total weight unloading is different; hence an appropriate indicator is total of worker involved/total weight (WIPW). For example, if 3,400kg of products need to unload and the number of worker in the trial is 7, then WIPW is = 7/3,400 = 0.002051. For the second factor, the average number of year experience of a group of workers (AvgYr) is used as an indicator. Construct a regression (Reg 1) in which UWPM is dependent variable, WIPW and AvgYr are independent variables. What information that the project manager can withdraw from the regression (Reg 1) above. Descriptive Statistics Descriptive Statistics Mean Std. 
Deviation N UWPM 122.117 44.2590 240 WIPM .003960 .0017123 240 AvgYr 3.6669 1.64448 240 The constructed multiple regression model in which UWPM is dependent variable and WIPW and AvgYr are independent variables is given by 𝑌 = 11.299 + 16,886.185𝑋1 + 11.985𝑋2 + 𝜀 The estimated regression relationship is: 𝑌̂ = 11.299 + 16,886.185𝑋1 + 11.985𝑋2 F-Test Is there a relationship between the dependent variable 𝑌of UWPM and any of the explanatory, independent variables, 𝑋1 and 𝑋2, of WIPM and AvgYr suggested by the regression equation under consideration?
  • 11. A statistical hypothesis test for the existence of a linear relationship between 𝑌 and any of the 𝑋1 and 𝑋2 is: 𝐻0: 𝛽1 = 𝛽2 = 0 𝐻1: 𝑁𝑜𝑡 𝑎𝑙𝑙 𝑡ℎ𝑒 𝛽𝑖 (𝑖 = 1,2) 𝑎𝑟𝑒 𝑧𝑒𝑟𝑜 ANOVAa Model Sum of Squares df Mean Square F Sig. 1 Regression 278085.484 2 139042.742 173.363 .000b Residual 190081.189 237 802.030 Total 468166.673 239 a. Dependent Variable: UWPM b. Predictors: (Constant), AvgYr, WIPM As shown in the above table, since the p-value is small, we reject the null hypothesis that both slope parameters 𝛽1 and 𝛽2 are zero, in favor of the alternative that the slope parameters are not both zero. There is statistical evidence to conclude that, based on the testing results and our assumptions, a linear regression relationship existing between UWPM and at least one of the independent variables, WIPM or AvgYr (or both), proposed in the regression model. Model Summary Model Summaryb Model R R Square Adjusted R Square Std. Error of the Estimate Durbin-Watson 1 .771a .594 .591 28.3201 1.978 a. Predictors: (Constant), AvgYr, WIPM b. Dependent Variable: UWPM In the above table, 𝑅2 = 0.594, which means that 59.4% of the variation in UWPM is explained by the combination of the two independent variables, WIPM and AvgYr. Adjusted 𝑅2 is 0.591, which is very close to the unadjusted measure. We conclude that the regression model fits the data very well since a high percentage of the variation in UWPM is explained by WIPM and/or AvgYr Coefficients Hypothesis tests about individual regression slope parameters: (1) 𝐻0: 𝛽1 = 0 𝐻1: 𝛽1 ≠ 0 (2) 𝐻0: 𝛽2 = 0 𝐻1: 𝛽2 ≠ 0
  • 12. Model Unstandardized Coefficients Standardized Coefficients t Sig. Collinearity Statistics B Std. Error Beta Tolerance VIF 1 (Constant) 11.299 6.319 1.788 .075 WIPM 16886.185 1071.371 .653 15.761 .000 .997 1.003 AvgYr 11.985 1.116 .445 10.744 .000 .997 1.003 We start with the test for the significance of variable 𝑋1 as a prediction variable of WIPM. The hypothesis test is 𝐻0: 𝛽1 = 0 versus 𝐻1: 𝛽1 ≠ 0. As shown in the above table, since the p-value is small, we reject the null hypothesis that the slope parameter 𝛽1 is zero. We therefore conclude that there is statistical evidence that the slope of 𝑌 with respect to 𝑋1, the population parameter 𝛽1, is not zero. Variable of WIPM is shown to have some explanatory power with respect to the dependent variable, UWPM. The hypothesis test for 𝛽2 is 𝐻0: 𝛽2 = 0 versus 𝐻1: 𝛽2 ≠ 0. This p-value, too, is small. We conclude that 𝑋2 of AvgYr is also an important variable in the regression equation. Finally, we conclude that both independent variables, WIPM and AvgYr, have close relation to the dependent variable, UWPM that positively affects UWPM. Both slope parameters, 𝛽1 and 𝛽2, are positive, which means that, everything else staying constant, the dependent variable of UWPM increases on average as WIPM increases or AvgYr increases (or both). Residual Plots The above figure is a plot of the regression residuals against the dependent variable UWPM. As we examine this figure carefully, we see that the spread of the residuals increases as UWPM increases. Thus, the variance of the residuals is not constant. We have the situation called heteroscedasticity—a violation of the assumption of equal error variance.
  • 13. The Normal Probability Plot The above figure is the normal probability plot of the residuals. The residuals lie along and less deviate from the diagonal lie in the plot, they less deviates from the normal distribution. In the figure, the deviations appear to be significant, so we conclude that the model assumption that the population errors ∈𝑗 are normally distributed with mean zero and standard deviation 𝜎 is valid. Multicollinearity Correlation Correlations UWPM WIPM AvgYr Pearson Correlation UWPM 1.000 .629 .410 WIPM .629 1.000 -.053 AvgYr .410 -.053 1.000 Sig. (1-tailed) UWPM . .000 .000 WIPM .000 . .205 AvgYr .000 .205 . N UWPM 240 240 240 WIPM 240 240 240 AvgYr 240 240 240 In the correlation matrix shown in the above figure, we see that the correlation between the independent variables, WIPM and AvgYr, are not high (−0.053). This means that the two variables do not represent the same direction in space. Being lowly correlated with each other, the two variables do not contain the same
  • 14. information about the dependent variable and therefore not cause multicollinearity when both are in the regression equation. Variance inflation factor Model Collinearity Statistics Tolerance VIF 1 (Constant) WIPM .997 1.003 AvgYr .997 1.003 The above figure shows the output for the current regression problem which contains the VIF values in the last column. We note that the VIF for variables, WIPM and AvgYr, are not greater than 5 that does not indicate the degree of multicollinearity existing with respect to the independent variables. CASE 03: TON DUC THANG UNIVERSITY – CONTINUOUS IMPROVEMENT IN EDUCATION PROGRAM Continuous improvement in education program is always one of the top strategic priority of Ton Duc Thang University. Every period, TDT University always applies the new teaching methods for continuously improving education programs. Recently, there is a suspect that the students perform better in the experiment classes (the classes are applied the new teaching method) compared to the control classes (the classes are applied the old teaching method). a. Present the methodology on how much test that suspect (what is your argument and what is an appropriate Statistics tests and why); b. How do you conduct sample for Statistics test; c. Present the result of your Statistics test; d. What is your conclusion from Statistics test?. Data Experiment Class Control Class Experiment Class Control Class Students Test 1 Test 2 Test 1 Test 2 Students Test 1 Test 2 Test 1 Test 2 1 63 84 88 71 31 62 82 83 91 2 71 89 59 91 32 77 77 80 63 3 87 70 85 79 33 87 69 94 79 4 66 73 64 79 34 63 76 53 92 5 63 74 95 91 35 73 95 70 58 6 70 97 92 89 36 90 72 90 86 7 63 89 71 85 37 84 75 57 74 8 84 80 58 90 38 64 77 82 82
  • 15. 9 84 86 62 69 39 85 98 65 66 10 63 77 93 76 40 86 78 54 83 11 62 74 80 80 41 66 86 68 74 12 68 75 89 85 42 83 70 73 66 13 84 98 65 93 43 61 90 71 97 14 90 75 67 61 44 81 70 86 93 15 86 96 76 84 45 60 85 90 80 16 69 76 81 77 46 72 83 80 75 17 87 89 85 75 47 60 90 55 70 18 60 74 85 83 48 87 68 81 96 19 64 81 87 68 49 65 78 94 82 20 67 86 86 91 50 71 81 95 78 21 64 72 86 92 51 74 69 60 63 22 86 69 85 97 52 60 78 90 98 23 88 94 77 60 53 90 85 66 61 24 67 89 85 61 54 68 84 74 90 25 66 73 90 84 55 67 90 83 74 26 83 80 72 93 56 77 95 77 77 27 89 94 70 92 57 79 67 53 93 28 68 66 60 79 58 64 82 80 98 29 81 87 60 67 59 90 92 67 61 30 71 76 74 63 60 67 90 60 95 Experiment Class Control Class Experiment Class Control Class Students Test 1 Test 2 Test 1 Test 2 Students Test 1 Test 2 Test 1 Test 2 61 84 93 75 89 61 84 93 75 89 62 90 87 85 61 62 90 87 85 61 63 63 90 78 74 63 63 90 78 74 64 69 72 54 71 64 69 72 54 71 65 81 66 73 97 65 81 66 73 97 66 67 80 71 93 66 67 80 71 93 67 63 74 92 61 67 63 74 92 61 68 62 78 88 89 68 62 78 88 89 69 90 74 78 69 69 90 74 78 69 70 86 80 57 72 70 86 80 57 72 71 83 96 82 73 71 83 96 82 73 72 80 72 92 59 72 80 72 92 59 73 89 89 61 84 73 89 89 61 84 74 69 69 81 70 74 69 69 81 70 75 64 72 73 86 75 64 72 73 86 76 90 83 85 61 76 90 83 85 61 77 77 85 53 95 77 77 85 53 95
  • 16. 78 86 75 54 93 78 86 75 54 93 79 60 75 85 92 79 60 75 85 92 80 84 94 78 84 80 84 94 78 84 81 66 77 73 70 81 66 77 73 70 82 85 71 91 65 82 85 71 91 65 83 86 86 91 98 83 86 86 91 98 84 83 71 72 90 84 83 71 72 90 85 75 87 67 91 85 75 87 67 91 86 67 87 77 98 86 67 87 77 98 87 88 70 94 98 87 88 70 94 98 88 65 80 68 62 88 65 80 68 62 89 82 74 80 90 89 82 74 80 90 90 89 66 94 84 90 89 66 94 84 Descriptive Statistics Descriptive Statistics N Mean Std. Deviation Test 1 (Experiment Class) 120 75.8000 10.01981 Test 2 (Experiment Class) 120 81.6833 8.78882 Test 1 (Control Class) 120 75.6167 12.51767 Test 2 (Control Class) 120 79.1833 11.91777 Valid N (listwise) 120 Pair Samples Statistics Test 1 Test 2 Test 1 Test 2 Students Ex. - Co. Ex. - Co. Students Ex. - Co. Ex. - Co. 1 -25 13 31 -21 -9 2 12 -2 32 -3 14 3 2 -9 33 -7 -10 4 2 -6 34 10 -16 5 -32 -17 35 3 37 6 -22 8 36 0 -14 7 -8 4 37 27 1 8 26 -10 38 -18 -5 9 22 17 39 20 32 10 -30 1 40 32 -5 11 -18 -6 41 -2 12 12 -21 -10 42 10 4 13 19 5 43 -10 -7
  • 17. 14 23 14 44 -5 -23 15 10 12 45 -30 5 16 -12 -1 46 -8 8 17 2 14 47 5 20 18 -25 -9 48 6 -28 19 -23 13 49 -29 -4 20 -19 -5 50 -24 3 21 -22 -20 51 14 6 22 1 -28 52 -30 -20 23 11 34 53 24 24 24 -18 28 54 -6 -6 25 -24 -11 55 -16 16 26 11 -13 56 0 18 27 19 2 57 26 -26 28 8 -13 58 -16 -16 29 21 20 59 23 31 30 -3 13 60 7 -5 Test 1 Test 2 Test 1 Test 2 Students Ex. - Co. Ex. - Co. Students Ex. - Co. Ex. - Co. 61 9 4 91 4 16 62 5 26 92 8 0 63 -15 16 93 19 12 64 15 1 94 25 -7 65 8 -31 95 -6 -13 66 -4 -13 96 -12 22 67 -29 13 97 1 32 68 -26 -11 98 9 -4 69 12 5 99 15 14 70 29 8 100 13 4 71 1 23 101 14 -11 72 -12 13 102 -19 9 73 28 5 103 1 10 74 -12 -1 104 -1 22 75 -9 -14 105 -24 17 76 5 22 106 16 1 77 24 -10 107 -6 22 78 32 -18 108 30 17 79 -25 -17 109 32 14 80 6 10 110 -5 22 81 -7 7 111 2 -6 82 -6 6 112 -28 6 83 -5 -12 113 4 29
  • 18.
             Test 1     Test 2  |               Test 1     Test 2
Students  Ex. - Co.  Ex. - Co.  |  Students  Ex. - Co.  Ex. - Co.
      84         11        -19  |       114          3         -7
      85          8         -4  |       115         16         17
      86        -10        -11  |       116         29         15
      87         -6        -28  |       117        -10         -4
      88         -3         18  |       118         -4          1
      89          2        -16  |       119          9         -9
      90         -5        -18  |       120         -3         30

Descriptive Statistics
                       N      Mean  Std. Deviation
Test 1 (Ex.-Co.)     120     .1833        16.85728
Test 2 (Ex.-Co.)     120    2.5000        15.53797
Valid N (listwise)   120

For each test (Test 1 and Test 2), the hypothesis test involves two populations: the students who study in the experiment class and the students who study in the control class. We want to test the null hypothesis that the mean test score of the experiment-class students is no greater than that of the control-class students against the alternative hypothesis that it is greater. Using the same students for the tests and pairing their observations in an experiment-and-control (Ex.-Co.) way makes the test more precise than it would be without pairing. The variable of interest is therefore the difference between the test score of a student in the experiment class and that of the paired student in the control class, and the population parameter about which we want to draw an inference is the mean of these differences.

For Test 1, we denote the population parameter by $\mu_{D.\text{Test 1}}$, the mean difference. This parameter is defined as $\mu_{D.\text{Test 1}} = \mu_{Ex.\text{Test 1}} - \mu_{Co.\text{Test 1}}$, where $\mu_{Ex.\text{Test 1}}$ is the mean Test 1 score of the students who study in the experiment class and $\mu_{Co.\text{Test 1}}$ is the mean Test 1 score of the students who study in the control class. Our null and alternative hypotheses are, then,
$H_0: \mu_{D.\text{Test 1}} \le 0$
$H_1: \mu_{D.\text{Test 1}} > 0$

For Test 2, we denote the population parameter by $\mu_{D.\text{Test 2}}$, the mean difference. This parameter is defined as $\mu_{D.\text{Test 2}} = \mu_{Ex.\text{Test 2}} - \mu_{Co.\text{Test 2}}$, where $\mu_{Ex.\text{Test 2}}$ is the mean Test 2 score of the students who study in the experiment class and $\mu_{Co.\text{Test 2}}$ is the mean Test 2 score of the students who study in the control class.
Our null and alternative hypotheses are, then,
$H_0: \mu_{D.\text{Test 2}} \le 0$
$H_1: \mu_{D.\text{Test 2}} > 0$

The only assumption we make when we use this test is that the populations of differences are normally distributed.
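
The right-tailed paired test set up above can be sketched numerically from the summary statistics of the differences (N = 120, with the means and standard deviations reported in the Descriptive Statistics for Ex.-Co.). This is only a minimal illustration, not the procedure used to produce the output in this case: the function names `t_pdf`, `t_upper_tail` and `paired_t_from_summary` are hypothetical, and the tail probability is approximated with Simpson's rule on the Student's t density rather than taken from a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on [0, |t|] plus symmetry.

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3      # P(0 < T < |t|)
    tail = 0.5 - area         # P(T > |t|)
    return tail if t >= 0 else 1.0 - tail

def paired_t_from_summary(mean_d, sd_d, n):
    """Return (t, df, right-tailed p) for H0: mu_D <= 0 vs H1: mu_D > 0."""
    se = sd_d / math.sqrt(n)  # standard error of the mean difference
    t = mean_d / se           # test statistic under mu_D = 0
    df = n - 1
    return t, df, t_upper_tail(t, df)

# Summary statistics of the Ex.-Co. differences, copied from the document.
t1, df1, p1 = paired_t_from_summary(0.1833, 16.85728, 120)  # Test 1
t2, df2, p2 = paired_t_from_summary(2.5000, 15.53797, 120)  # Test 2
print(f"Test 1: t = {t1:.3f}, df = {df1}, right-tailed p = {p1:.3f}")
print(f"Test 2: t = {t2:.3f}, df = {df2}, right-tailed p = {p2:.3f}")
```

Working from the summary statistics keeps the sketch short; with the 120 individual differences at hand, the mean and standard deviation could be computed first and passed to the same function.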
  • 19. Paired Samples Test
Pair 1: Test 1 (Experiment Class) - Test 1 (Control Class)
Pair 2: Test 2 (Experiment Class) - Test 2 (Control Class)

                                                        Pair 1      Pair 2
Paired Differences: Mean                                .18333     2.50000
Paired Differences: Std. Deviation                    16.85728    15.53797
Paired Differences: Std. Error Mean                    1.53885     1.41842
95% Confidence Interval of the Difference: Lower      -2.86375     -.30861
95% Confidence Interval of the Difference: Upper       3.23041     5.30861
t                                                         .119       1.763
df                                                         119         119
Sig. (2-tailed)                                           .905        .081
Sig. (R-tailed)                                           .453        .040

As shown in the above table, for Test 1 (Pair 1), the right-tailed p-value (.453) is greater than any conventional level of α, including 0.10, so we cannot reject the null hypothesis: there is no evidence that the Test 1 scores of the students who study in the experiment class are higher than those of the students who study in the control class. For Test 2 (Pair 2), the right-tailed p-value (.040) is smaller than α = 0.05, so we reject the null hypothesis and conclude that the Test 2 scores of the students who study in the experiment class are higher than those of the students who study in the control class. This result is not strongly significant, however, and the conclusion changes at stricter levels of α: at α = 0.01, for example, the null hypothesis would not be rejected.

------ THE END ------