Contents

Analysis of Variance
44.1 One-Way Analysis of Variance
44.2 Two-Way Analysis of Variance
44.3 Experimental Design

Learning Outcomes

In this Workbook you will learn the basics of this very important branch of Statistics and
how to do the calculations which enable you to draw conclusions about variance found in
data sets. You will also be introduced to the design of experiments, which has great
importance in science and engineering.
44.1 One-Way Analysis of Variance
Introduction
Problems in engineering often involve the exploration of the relationships between values taken by
a variable under different conditions. Workbook 41 introduced hypothesis testing, which enables us to
compare two population means using hypotheses of the general form
H0 : µ1 = µ2
H1 : µ1 ≠ µ2
or, in the case of more than two populations,
H0 : µ1 = µ2 = µ3 = . . . = µk
H1 : H0 is not true
If we are comparing more than two population means, using the type of hypothesis testing referred
to above gets very clumsy and very time consuming. As you will see, the statistical technique called
Analysis of Variance (ANOVA) enables us to compare several populations simultaneously. We
might, for example need to compare the shear strengths of five different adhesives or the surface
toughness of six samples of steel which have received different surface hardening treatments.
Prerequisites
Before starting this Section you should . . .
• be familiar with the general techniques of
hypothesis testing
• be familiar with the F-distribution
Learning Outcomes
On completion you should be able to . . .
• describe what is meant by the term one-way
ANOVA.
• perform one-way ANOVA calculations.
• interpret the results of one-way ANOVA calculations.
1. One-way ANOVA
In this Workbook we deal with one-way analysis of variance (one-way ANOVA) and two-way analysis of
variance (two-way ANOVA). One-way ANOVA enables us to compare several means simultaneously
by using the F-test and enables us to draw conclusions about the variance present in the set of
samples we wish to compare.
Multiple (greater than two) samples may be investigated using the techniques of two-population
hypothesis testing. As an example, it is possible to do a comparison looking for variation in the
surface hardness present in (say) three samples of steel which have received different surface hardening
treatments by using hypothesis tests of the form
H0 : µ1 = µ2
H1 : µ1 ≠ µ2
We would have to compare all possible pairs of samples before reaching a conclusion. If we are
dealing with three samples we would need to perform a total of
$$^{3}C_2 = \frac{3!}{1!\,2!} = 3$$
hypothesis tests. From a practical point of view this is not an efficient way of dealing with the
problem, especially since the number of tests required rises rapidly with the number of samples
involved. For example, an investigation involving ten samples would require
$$^{10}C_2 = \frac{10!}{8!\,2!} = 45$$
separate hypothesis tests.
There is also another crucially important reason why techniques involving such batteries of tests are
unacceptable. In the case of 10 samples mentioned above, if the probability of correctly accepting a
given null hypothesis is 0.95, then the probability of correctly accepting the null hypothesis
H0 : µ1 = µ2 = . . . = µ10
is $(0.95)^{45} \approx 0.10$ and we have only a 10% chance of correctly accepting the null hypothesis for
all 45 tests. Clearly, such a low success rate is unacceptable. These problems may be avoided by
simultaneously testing the significance of the difference between a set of more than two population
means by using techniques known as the analysis of variance.
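The arithmetic behind these figures is easy to verify. The short Python sketch below is purely illustrative and not part of the original Workbook (the function names are our own); it counts the pairwise tests needed to compare k samples and the chance of every test correctly accepting a true null hypothesis, assuming for simplicity that the tests behave independently.

```python
from math import comb

def pairwise_tests(k):
    """Number of two-sample hypothesis tests needed to compare k samples pairwise."""
    return comb(k, 2)

def prob_all_correct(k, p_single=0.95):
    """Probability that every pairwise test correctly accepts a true null hypothesis,
    assuming the tests behave independently (an idealisation)."""
    return p_single ** pairwise_tests(k)

print(pairwise_tests(3))               # 3
print(pairwise_tests(10))              # 45
print(round(prob_all_correct(10), 3))  # ~0.099, i.e. roughly a 10% chance
```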
Essentially, we look at the variance between samples and the variance within samples and draw
conclusions from the results. Note that the variation between samples is due to assignable (or
controlled) causes, often referred to in general as treatments, while the variation within samples is due
to chance. In the example above concerning the surface hardness present in three samples of steel
which have received different surface hardening treatments, the following diagrams illustrate the
differences which may occur when between sample and within sample variation is considered.
Case 1
In this case the variation within samples is roughly on a par with that occurring between samples.
[Figure 1: dot plots of Samples 1, 2 and 3, with the sample means $\bar{s}_1$, $\bar{s}_2$, $\bar{s}_3$ marked]
Case 2
In this case the variation within samples is considerably less than that occurring between samples.
[Figure 2: dot plots of Samples 1, 2 and 3, with the sample means $\bar{s}_1$, $\bar{s}_2$, $\bar{s}_3$ marked]
We argue that the greater the variation present between samples in comparison with the variation
present within samples the more likely it is that there are ‘real’ differences between the population
means, say µ1, µ2 and µ3. If such ‘real’ differences are shown to exist at a sufficiently high level
of significance, we may conclude that there is sufficient evidence to enable us to reject the null
hypothesis H0 : µ1 = µ2 = µ3.
Example of variance in data
This example looks at variance in data. Four machines are set up to produce alloy spacers for use in
the assembly of microlight aircraft. The spacers are supposed to be identical but the four machines
give rise to the following varied lengths in mm.
Machine A Machine B Machine C Machine D
46 56 55 49
54 55 51 53
48 56 50 57
46 60 51 60
56 53 53 51
Since the machines are set up to produce identical alloy spacers it is reasonable to ask if the evidence
we have suggests that the machine outputs are the same or different in some way. We are really
asking whether the sample means, say ¯XA, ¯XB, ¯XC and ¯XD, are different because of differences in
the respective population means, say µA, µB, µC and µD, or whether the differences in ¯XA, ¯XB, ¯XC
and ¯XD may be attributed to chance variation. Stated in terms of a hypothesis test, we would write
H0 : µA = µB = µC = µD
H1 : At least one mean is different from the others
In order to decide between the hypotheses, we calculate the mean of each sample and overall mean
(the mean of the means) and use these quantities to calculate the variation present between the
samples. We then calculate the variation present within samples. The following tables illustrate the
calculations.
H0 : µA = µB = µC = µD
H1 : At least one mean is different from the others
Machine A Machine B Machine C Machine D
46 56 55 49
54 55 51 53
48 56 50 57
46 60 51 60
56 53 53 51
$\bar{X}_A = 50$  $\bar{X}_B = 56$  $\bar{X}_C = 52$  $\bar{X}_D = 54$
The mean of the means is clearly
$$\bar{\bar{X}} = \frac{50 + 56 + 52 + 54}{4} = 53$$
so the variation present between samples may be calculated as
$$S^2_{Tr} = \frac{1}{n-1}\sum_{i=A}^{D}\left(\bar{X}_i - \bar{\bar{X}}\right)^2 = \frac{1}{4-1}\left[(50-53)^2 + (56-53)^2 + (52-53)^2 + (54-53)^2\right] = \frac{20}{3} = 6.67 \text{ to 2 d.p.}$$
Note that the notation $S^2_{Tr}$ reflects the general use of the word ‘treatment’ to describe assignable causes of variation between samples. This notation is not universal but it is fairly common.
Variation within samples
We now calculate the variation due to chance errors present within the samples and use the results to obtain a pooled estimate of the variance, say $S^2_E$, present within the samples. After this calculation we will be able to compare the two variances and draw conclusions. The variance present within the samples may be calculated as follows.
Sample A
$$\sum(X - \bar{X}_A)^2 = (46-50)^2 + (54-50)^2 + (48-50)^2 + (46-50)^2 + (56-50)^2 = 88$$
Sample B
$$\sum(X - \bar{X}_B)^2 = (56-56)^2 + (55-56)^2 + (56-56)^2 + (60-56)^2 + (53-56)^2 = 26$$
Sample C
$$\sum(X - \bar{X}_C)^2 = (55-52)^2 + (51-52)^2 + (50-52)^2 + (51-52)^2 + (53-52)^2 = 16$$
Sample D
$$\sum(X - \bar{X}_D)^2 = (49-54)^2 + (53-54)^2 + (57-54)^2 + (60-54)^2 + (51-54)^2 = 80$$
An obvious extension of the formula for a pooled variance gives
$$S^2_E = \frac{\sum(X-\bar{X}_A)^2 + \sum(X-\bar{X}_B)^2 + \sum(X-\bar{X}_C)^2 + \sum(X-\bar{X}_D)^2}{(n_A-1) + (n_B-1) + (n_C-1) + (n_D-1)}$$
where $n_A$, $n_B$, $n_C$ and $n_D$ represent the number of members (5 in each case here) in each sample. Note that the quantities comprising the denominator, $n_A-1, \cdots, n_D-1$, are the numbers of degrees of freedom present in each of the four samples. Hence our pooled estimate of the variance present within the samples is given by
$$S^2_E = \frac{88 + 26 + 16 + 80}{4 + 4 + 4 + 4} = 13.13$$
We are now in a position to ask whether the variation between samples, $S^2_{Tr}$, is large in comparison with the variation within samples, $S^2_E$. The answer to this question enables us to decide whether the difference in the calculated variations is sufficiently large to conclude that there is a difference in the population means. That is, do we have sufficient evidence to reject $H_0$?
Using the F-test
At first sight it seems reasonable to use the ratio
$$F = \frac{S^2_{Tr}}{S^2_E}$$
but in fact the ratio
$$F = \frac{nS^2_{Tr}}{S^2_E},$$
where $n$ is the sample size, is used, since it can be shown that if $H_0$ is true this ratio will have a value of approximately unity, while if $H_0$ is not true the ratio will have a value greater than unity. This is because the variance of a sample mean is $\sigma^2/n$.
The test procedure (three steps) for the data used here is as follows.
(a) Find the value of F;
(b) Find the number of degrees of freedom for both the numerator and denominator of the
ratio;
(c) Accept or reject depending on the value of F compared with the appropriate tabulated
value.
Step 1
The value of F is given by
$$F = \frac{nS^2_{Tr}}{S^2_E} = \frac{5 \times 6.67}{13.13} = 2.54$$
Step 2
The number of degrees of freedom for $S^2_{Tr}$ (the numerator) is
Number of samples − 1 = 3
The number of degrees of freedom for $S^2_E$ (the denominator) is
Number of samples × (sample size − 1) = 4 × (5 − 1) = 16
Step 3
The critical value (5% level of significance) from the F-tables (Table 1 at the end of this Workbook) is $F_{(3,16)} = 3.24$ and since $2.54 < 3.24$ we see that we cannot reject $H_0$ on the basis of the evidence available and conclude that in this case the variation present is due to chance. Note that the test used is one-tailed.
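The three-step procedure is easily checked by computer. The sketch below is an illustration in Python rather than part of the original Workbook; it recomputes $S^2_{Tr}$, $S^2_E$ and $F$ for the machine data, and assumes SciPy is available purely to reproduce the tabulated critical value.

```python
from statistics import mean, variance
from scipy.stats import f  # assumed available; used only for the critical value

samples = {
    "A": [46, 54, 48, 46, 56],
    "B": [56, 55, 56, 60, 53],
    "C": [55, 51, 50, 51, 53],
    "D": [49, 53, 57, 60, 51],
}

n = 5                                   # observations per sample
means = {k: mean(v) for k, v in samples.items()}
grand_mean = mean(means.values())       # 53, the mean of the means

# Between-sample variance: the variance of the four sample means
s2_tr = variance(means.values())        # 20/3, about 6.67

# Pooled within-sample variance
sse = sum(sum((x - means[k]) ** 2 for x in v) for k, v in samples.items())
s2_e = sse / sum(len(v) - 1 for v in samples.values())   # 210/16, about 13.13

F = n * s2_tr / s2_e                    # about 2.54
crit = f.ppf(0.95, dfn=3, dfd=16)       # about 3.24, matching Table 1

print(round(F, 2), round(crit, 2), F > crit)   # 2.54 3.24 False -> cannot reject H0
```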
ANOVA tables
It is usual to summarize the calculations we have seen so far in the form of an ANOVA table.
Essentially, the table gives us a method of recording the calculations leading to both the numerator
and the denominator of the expression
$$F = \frac{nS^2_{Tr}}{S^2_E}$$
In addition, and importantly, ANOVA tables provide us with a useful means of checking the accuracy
of our calculations. A general ANOVA table is presented below with explanatory notes.
Define a = number of treatments, n = number of observations per sample.
Source of Variation | Sum of Squares SS | Degrees of Freedom | Mean Square MS | Value of F Ratio
Between samples (due to treatments); differences between the means $\bar{X}_i$ and $\bar{\bar{X}}$ | $SS_{Tr} = n\sum_{i=1}^{a}\left(\bar{X}_i - \bar{\bar{X}}\right)^2$ | $a - 1$ | $MS_{Tr} = \dfrac{SS_{Tr}}{a - 1} = nS^2_{Tr}$ | $F = \dfrac{MS_{Tr}}{MS_E} = \dfrac{nS^2_{Tr}}{S^2_E}$
Within samples (due to chance errors); differences between the individual observations $X_{ij}$ and the means $\bar{X}_i$ | $SS_E = \sum_{i=1}^{a}\sum_{j=1}^{n}\left(X_{ij} - \bar{X}_i\right)^2$ | $a(n - 1)$ | $MS_E = \dfrac{SS_E}{a(n - 1)} = S^2_E$ |
TOTALS | $SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n}\left(X_{ij} - \bar{\bar{X}}\right)^2$ | $an - 1$ | |
In order to demonstrate this table for the example above we need to calculate
$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n}\left(X_{ij} - \bar{\bar{X}}\right)^2$$
a measure of the total variation present in the data. Such calculations are easily done using a computer (Microsoft Excel was used here), the result being
$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n}\left(X_{ij} - \bar{\bar{X}}\right)^2 = 310$$
The ANOVA table becomes

Source of Variation | Sum of Squares SS | Degrees of Freedom | Mean Square MS | Value of F Ratio
Between samples (due to treatments); differences between the means $\bar{X}_i$ and $\bar{\bar{X}}$ | 100 | 3 | $MS_{Tr} = \dfrac{SS_{Tr}}{a - 1} = \dfrac{100}{3} = 33.33$ | $F = \dfrac{MS_{Tr}}{MS_E} = 2.54$
Within samples (due to chance errors); differences between the individual observations $X_{ij}$ and the means $\bar{X}_i$ | 210 | 16 | $MS_E = \dfrac{SS_E}{a(n - 1)} = \dfrac{210}{16} = 13.13$ |
TOTALS | 310 | 19 | |
It is possible to show theoretically that
$$SS_T = SS_{Tr} + SS_E$$
that is,
$$\sum_{i=1}^{a}\sum_{j=1}^{n}\left(X_{ij} - \bar{\bar{X}}\right)^2 = n\sum_{i=1}^{a}\left(\bar{X}_i - \bar{\bar{X}}\right)^2 + \sum_{i=1}^{a}\sum_{j=1}^{n}\left(X_{ij} - \bar{X}_i\right)^2$$
As you can see from the table, SSTr and SSE do indeed sum to give SST even though we can
calculate them separately. The same is true of the degrees of freedom.
Note that calculating these quantities separately does offer a check on the arithmetic but that using
the relationship can speed up the calculations by obviating the need to calculate (say) SST . As
you might expect, it is recommended that you check your calculations! However, you should note
that it is usual to calculate SST and SSTr and then find SSE by subtraction. This saves a lot of
unnecessary calculation but does not offer a check on the arithmetic. This shorter method will be
used throughout much of this Workbook.
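As a quick illustration of the shorter method, the following plain Python sketch (not HELM code; the variable names are our own) computes $SS_T$ and $SS_{Tr}$ directly for the machine data, finds $SS_E$ by subtraction and confirms that the decomposition reproduces the ANOVA table above.

```python
samples = [
    [46, 54, 48, 46, 56],   # machine A
    [56, 55, 56, 60, 53],   # machine B
    [55, 51, 50, 51, 53],   # machine C
    [49, 53, 57, 60, 51],   # machine D
]

a = len(samples)                     # number of treatments
n = len(samples[0])                  # observations per sample (equal here)
all_obs = [x for s in samples for x in s]
grand_mean = sum(all_obs) / len(all_obs)   # 53

ss_t = sum((x - grand_mean) ** 2 for x in all_obs)                 # total SS = 310
ss_tr = n * sum((sum(s) / n - grand_mean) ** 2 for s in samples)   # between-samples SS = 100
ss_e = ss_t - ss_tr                                                # within-samples SS = 210, by subtraction

ms_tr = ss_tr / (a - 1)              # 33.33
ms_e = ss_e / (a * (n - 1))          # 13.13
print(ss_t, ss_tr, ss_e, round(ms_tr / ms_e, 2))   # 310.0 100.0 210.0 2.54
```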
Unequal sample sizes
So far we have assumed that the number of observations in each sample is the same. This is not a
necessary condition for the one-way ANOVA.
Key Point 1
Suppose that the number of samples is $a$ and the numbers of observations are $n_1, n_2, \ldots, n_a$. Then the between-samples sum of squares can be calculated using
$$SS_{Tr} = \sum_{i=1}^{a}\frac{T_i^2}{n_i} - \frac{G^2}{N}$$
where $T_i$ is the total for sample $i$, $G = \sum_{i=1}^{a} T_i$ is the overall total and $N = \sum_{i=1}^{a} n_i$. It has $a - 1$ degrees of freedom.
The total sum of squares can be calculated as before, or using
$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n_i} X_{ij}^2 - \frac{G^2}{N}$$
It has $N - 1$ degrees of freedom.
The within-samples sum of squares can be found by subtraction:
$$SS_E = SS_T - SS_{Tr}$$
It has $(N - 1) - (a - 1) = N - a$ degrees of freedom.
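A small helper implementing the formulas in Key Point 1 might look like the following Python sketch (an illustration only; the function name and return order are our own choices).

```python
def one_way_anova(samples):
    """Return (ss_tr, ss_e, ss_t, df_tr, df_e, F) for a list of samples of possibly
    unequal sizes, using the shortcut formulas of Key Point 1."""
    a = len(samples)
    totals = [sum(s) for s in samples]            # T_i, the sample totals
    sizes = [len(s) for s in samples]             # n_i, the sample sizes
    G = sum(totals)                               # overall total
    N = sum(sizes)                                # total number of observations

    ss_tr = sum(T * T / n for T, n in zip(totals, sizes)) - G * G / N   # between-samples SS
    ss_t = sum(x * x for s in samples for x in s) - G * G / N           # total SS
    ss_e = ss_t - ss_tr                                                 # within-samples SS by subtraction

    df_tr, df_e = a - 1, N - a
    F = (ss_tr / df_tr) / (ss_e / df_e)
    return ss_tr, ss_e, ss_t, df_tr, df_e, F
```

Applied to the machine data above, `one_way_anova([[46, 54, 48, 46, 56], [56, 55, 56, 60, 53], [55, 51, 50, 51, 53], [49, 53, 57, 60, 51]])` returns $SS_{Tr} = 100$, $SS_E = 210$, $SS_T = 310$ and $F \approx 2.54$, matching the ANOVA table in the previous subsection.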
Task
Three fuel injection systems are tested for efficiency and the following coded data
are obtained.
System 1 System 2 System 3
48 60 57
56 56 55
46 53 52
45 60 50
50 51 51
Do the data support the hypothesis that the systems offer equivalent levels of efficiency?
Your solution
Answer
Appropriate hypotheses are
H0 : µ1 = µ2 = µ3
H1 : At least one mean is different from the others
Variation between samples
System 1 System 2 System 3
48 60 57
56 56 55
46 53 52
45 60 50
50 51 51
$\bar{X}_1 = 49$  $\bar{X}_2 = 56$  $\bar{X}_3 = 53$
The mean of the means is
$$\bar{\bar{X}} = \frac{49 + 56 + 53}{3} = 52.67$$
and the variation present between samples is
$$S^2_{Tr} = \frac{1}{n-1}\sum_{i=1}^{3}\left(\bar{X}_i - \bar{\bar{X}}\right)^2 = \frac{1}{3-1}\left[(49-52.67)^2 + (56-52.67)^2 + (53-52.67)^2\right] = 12.33$$
Variation within samples
System 1
$$\sum(X - \bar{X}_1)^2 = (48-49)^2 + (56-49)^2 + (46-49)^2 + (45-49)^2 + (50-49)^2 = 76$$
System 2
$$\sum(X - \bar{X}_2)^2 = (60-56)^2 + (56-56)^2 + (53-56)^2 + (60-56)^2 + (51-56)^2 = 66$$
System 3
$$\sum(X - \bar{X}_3)^2 = (57-53)^2 + (55-53)^2 + (52-53)^2 + (50-53)^2 + (51-53)^2 = 34$$
Hence
$$S^2_E = \frac{\sum(X-\bar{X}_1)^2 + \sum(X-\bar{X}_2)^2 + \sum(X-\bar{X}_3)^2}{(n_1-1) + (n_2-1) + (n_3-1)} = \frac{76 + 66 + 34}{4 + 4 + 4} = 14.67$$
The value of F is given by
$$F = \frac{nS^2_{Tr}}{S^2_E} = \frac{5 \times 12.33}{14.67} = 4.20$$
The number of degrees of freedom for $S^2_{Tr}$ is No. of samples − 1 = 2
The number of degrees of freedom for $S^2_E$ is No. of samples × (sample size − 1) = 12
The critical value (5% level of significance) from the F-tables (Table 1 at the end of this Workbook) is $F_{(2,12)} = 3.89$ and since $4.20 > 3.89$ we conclude that we have sufficient evidence to reject $H_0$, so that the injection systems are not of equivalent efficiency.
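For comparison, the same Task can be checked with SciPy's built-in one-way ANOVA. The snippet below is a hedged illustration assuming SciPy is installed; `f_oneway` reports a p-value rather than a comparison with a tabulated critical value, so the critical value is recomputed separately.

```python
from scipy.stats import f, f_oneway   # assumed available

system1 = [48, 56, 46, 45, 50]
system2 = [60, 56, 53, 60, 51]
system3 = [57, 55, 52, 50, 51]

F, p = f_oneway(system1, system2, system3)   # F about 4.20, p about 0.04
crit = f.ppf(0.95, dfn=2, dfd=12)            # about 3.89, the tabulated value used above
print(round(F, 2), round(p, 3), F > crit)    # 4.2 0.041 True -> reject H0 at the 5% level
```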
Exercises
1. The yield of a chemical process, expressed in percentage of the theoretical maximum, is measured with each of two catalysts, A, B, and with no catalyst (Control: C). Five observations
are made under each condition. Making the usual assumptions for an analysis of variance, test
the hypothesis that there is no difference in mean yield between the three conditions. Use the
5% level of significance.
Catalyst A Catalyst B Control C
79.2 81.5 74.8
80.1 80.7 76.5
77.4 80.5 74.7
77.6 81.7 74.8
77.8 80.6 74.9
2. Four large trucks, A, B, C, D, are used to move stone in a quarry. On a number of days,
the amount of fuel, in litres, used per tonne of stone moved is calculated for each truck. On
some days a particular truck might not be used. The data are as follows. Making the usual
assumptions for an analysis of variance, test the hypothesis that the mean amount of fuel used
per tonne of stone moved is the same for each truck. Use the 5% level of significance.
Truck Observations
A 0.21 0.21 0.21 0.21 0.20 0.19 0.18 0.21 0.22 0.21
B 0.22 0.22 0.25 0.21 0.21 0.22 0.20 0.23
C 0.21 0.18 0.18 0.19 0.20 0.18 0.19 0.19 0.20 0.20 0.20
D 0.20 0.20 0.21 0.21 0.21 0.19 0.20 0.20 0.21
Answers
1. We calculate the treatment totals for A: 392.1, B: 405.0 and C: 375.7. The overall total is
1172.8 and $\sum y^2 = 91792.68$.
The total sum of squares is
$$91792.68 - \frac{1172.8^2}{15} = 95.357$$
on 15 − 1 = 14 degrees of freedom.
The between treatments sum of squares is
$$\frac{1}{5}\left(392.1^2 + 405.0^2 + 375.7^2\right) - \frac{1172.8^2}{15} = 86.257$$
on 3 − 1 = 2 degrees of freedom.
By subtraction, the residual sum of squares is
$$95.357 - 86.257 = 9.100$$
on 14 − 2 = 12 degrees of freedom.
The analysis of variance table is as follows:

Source of variation | Sum of squares | Degrees of freedom | Mean square | Variance ratio
Treatment | 86.257 | 2 | 43.129 | 56.873
Residual | 9.100 | 12 | 0.758 |
Total | 95.357 | 14 | |
The upper 5% point of the F2,12 distribution is 3.89. The observed variance ratio is greater
than this so we conclude that the result is significant at the 5% level and we reject the null
hypothesis at this level. The evidence suggests that there are differences in the mean yields
between the three treatments.
Answer
2. We can summarise the data as follows.

Truck | $\sum y$ | $\sum y^2$ | $n$
A | 2.05 | 0.4215 | 10
B | 1.76 | 0.3888 | 8
C | 2.12 | 0.4096 | 11
D | 1.83 | 0.3725 | 9
Total | 7.76 | 1.5924 | 38
The total sum of squares is
$$1.5924 - \frac{7.76^2}{38} = 7.7263 \times 10^{-3}$$
on 38 − 1 = 37 degrees of freedom.
The between trucks sum of squares is
$$\frac{2.05^2}{10} + \frac{1.76^2}{8} + \frac{2.12^2}{11} + \frac{1.83^2}{9} - \frac{7.76^2}{38} = 3.4581 \times 10^{-3}$$
on 4 − 1 = 3 degrees of freedom.
By subtraction, the residual sum of squares is
$$7.7263 \times 10^{-3} - 3.4581 \times 10^{-3} = 4.2682 \times 10^{-3}$$
on 37 − 3 = 34 degrees of freedom.
The analysis of variance table is as follows:

Source of variation | Sum of squares | Degrees of freedom | Mean square | Variance ratio
Trucks | $3.4581 \times 10^{-3}$ | 3 | $1.1527 \times 10^{-3}$ | 9.1824
Residual | $4.2682 \times 10^{-3}$ | 34 | $0.1255 \times 10^{-3}$ |
Total | $7.7263 \times 10^{-3}$ | 37 | |
The upper 5% point of the F3,34 distribution is approximately 2.9. The observed variance
ratio is greater than this so we conclude that the result is significant at the 5% level and we
reject the null hypothesis at this level. The evidence suggests that there are differences in the
mean fuel consumption per tonne moved between the four trucks.
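Because the truck data have unequal sample sizes, they are a natural place to apply the shortcut formulas of Key Point 1 by computer. The plain Python sketch below is illustrative only (variable names are our own) and mirrors the hand calculation above, working from the sample totals, the overall total and the sum of squared observations.

```python
trucks = [
    [0.21, 0.21, 0.21, 0.21, 0.20, 0.19, 0.18, 0.21, 0.22, 0.21],        # A
    [0.22, 0.22, 0.25, 0.21, 0.21, 0.22, 0.20, 0.23],                    # B
    [0.21, 0.18, 0.18, 0.19, 0.20, 0.18, 0.19, 0.19, 0.20, 0.20, 0.20],  # C
    [0.20, 0.20, 0.21, 0.21, 0.21, 0.19, 0.20, 0.20, 0.21],              # D
]

totals = [sum(t) for t in trucks]                    # the "sum y" column above
sizes = [len(t) for t in trucks]                     # the "n" column above
G, N, a = sum(totals), sum(sizes), len(trucks)       # 7.76, 38, 4

ss_t = sum(x * x for t in trucks for x in t) - G * G / N           # about 7.7263e-3
ss_tr = sum(T * T / n for T, n in zip(totals, sizes)) - G * G / N  # about 3.4581e-3
ss_e = ss_t - ss_tr                                                # about 4.2682e-3, by subtraction

variance_ratio = (ss_tr / (a - 1)) / (ss_e / (N - a))
print(round(ss_tr, 7), round(ss_e, 7), round(variance_ratio, 2))
# about 0.0034581 0.0042682 9.18, well above the 5% point of roughly 2.9
```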