TOPICS FOR TODAY
Analysis of Variance
ANOVA
The concept of Analysis of Variance
is explained below:
Earlier, we compared two population means by using a two-sample t-test.
However, we are often required to
compare more than two population means
simultaneously.
We might be tempted to apply the two-sample t-test to all possible pairwise comparisons of means.
For example, if we wish to compare 4 population means, there will be

$$\binom{4}{2} = 6$$

separate pairs, and to test the null hypothesis that all four population means are equal, we would require six two-sample t-tests.
Similarly, to test the null hypothesis that 10 population means are equal, we would need

$$\binom{10}{2} = 45$$

separate two-sample t-tests.
This procedure of running multiple two-sample t-tests for comparing means would obviously be tedious and time-consuming.
Thus a series of two-sample t-tests is not
an appropriate procedure to test the equality
of several means simultaneously.
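As a quick side check (not part of the original slides), these pairwise-comparison counts are just binomial coefficients and can be verified in a couple of lines of Python:

from math import comb

# Number of distinct pairs among k means, i.e. the number of two-sample t-tests required
print(comb(4, 2))    # 6 tests for k = 4
print(comb(10, 2))   # 45 tests for k = 10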
Evidently, we require a simpler
procedure for carrying out this kind of a test.
One such procedure is the Analysis of
Variance, introduced by Sir R.A. Fisher
(1890-1962) in 1923:
Analysis of Variance (ANOVA) is a
procedure which enables us to test the
hypothesis of equality of several population
means
(i.e.
H0 : μ1 = μ2 = μ3 = … = μk
against
HA: not all the means are equal)
The concept of Analysis of Variance is
closely related with the concept of
Experimental Design:
EXPERIMENTAL DESIGN
By an experimental design, we mean a
plan used to collect the data relevant to the
problem under study in such a way as to
provide a basis for valid and objective
inference about the stated problem.
The plan usually includes:
• The selection of treatments whose effects are to be studied,
• The specification of the experimental layout, and
• The assignment of treatments to the experimental units.
All these steps are accomplished before
any experiment is performed.
Experimental Design is a very vast
area. In this course, we will be presenting only a very basic introduction to this area.
There are two types of designs:
systematic and randomized designs.
Today, we will be discussing only the
randomized designs, and, in this regard, it
should be noted that for the randomized
designs, the analysis of the collected data is
carried out through the technique known as
Analysis of Variance.
Two of the very basic randomized
designs are:
i) The Completely Randomized (CR)
Design,
and
ii) The Randomized Complete
Block (RCB) Design
EXAMPLE:
An experiment was conducted to
compare the yields of three varieties of
potatoes.
Each variety was assigned at random to equal-size plots, four times.
The yields were as follows:
Variety
A B C
23 18 16
26 28 25
20 17 12
17 21 14
Test the hypothesis that the three varieties of potatoes do not differ in their yielding capabilities.
SOLUTION:
The first thing to note is that this is an
example of the Completely Randomized (CR)
Design.
We are assuming that all twelve of the
plots (i.e. farms) available to us for this
experiment are homogeneous (i.e. similar)
with regard to the fertility of the soil, the
weather conditions, etc., and hence, we are
assigning the three varieties to the twelve plots
totally at random.
Now, in order to test the hypothesis
that the mean yields of the three varieties
of potato are equal, we carry out the six-
step hypothesis-testing procedure, as given
below:
Hypothesis-Testing Procedure:
i) H0 : μA = μB = μC
HA : Not all the three means
are equal
ii) Level of Significance:
α = 0.05
iii) Test Statistic:

$$F = \frac{\text{MS Treatments}}{\text{MS Error}}$$

which, if H0 is true, has an F distribution with ν1 = k - 1 = 3 - 1 = 2 and ν2 = n - k = 12 - 3 = 9 degrees of freedom.
Step-4: Computations:
The computation of the test statistic
presented above involves quite a few steps,
including the formation of what is known as
the ANOVA Table.
First of all, let us consider what is
meant by the ANOVA Table (i.e. the
Analysis of Variance Table).
ANOVA Table

Source of Variation          df      Sum of Squares   Mean Squares   F
Between Treatments           k - 1   SST              MST            MST/MSE
Within Treatments (Error)    n - k   SSE              MSE
Total                        n - 1   TSS
Let us try to understand this table step
by step:
The very first column is headed
‘Source of Variation’, and under this
heading, we have three distinct sources of
variation:
‘Total’ stands for the overall variation
in the twelve values that we have in our
data-set.
Variety
A B C
23 18 16
26 28 25
20 17 12
17 21 14
As you can see, the values in our data-
set are 23, 26, 20, 17, 18, 28, and so on.
Evidently, there is a variation in these
values, and the term ‘Total’ in the lowest
row of the ANOVA Table stands for this
overall variation.
The term ‘Variation Between
Treatments’ stands for the variability that
exists between the three varieties of potato
that we have sown in the plots.
(In this example, the term ‘treatments’
stands for the three varieties of potato that
we are trying to compare.)
(The term ‘variation between treatments’
points to the fact that:
It is possible that the three varieties, or,
at least two of the varieties are significantly
different from each other with regard to their
yielding capabilities. This variability between
the varieties can be measured by measuring
the differences between the mean yields of the
three varieties.)
The third source of variation is
‘variation within treatments’. This points to
the fact that even if only one particular
variety of potato is sown more than once,
we do not get the same yield every time.
Variety
A B C
23 18 16
26 28 25
20 17 12
17 21 14
In this example, variety A was sown four
times, and the yields were 23, 26, 20, and 17 -
-- all different from one another!
Similar is the case for variety B as well
as variety C.
The variability in the yields of variety
A can be called ‘variation within variety A’.
Similarly, the variability in the yields
of variety B can be called ‘variation within
variety B’.
Also, the variability in the yields of
variety C can be called ‘variation within
variety C’.
We can say that the term ‘variability
within treatments’ stands for the combined
effect of the above-mentioned three
variations.
The ‘variation within treatments’ is
also known as the ‘error variation’.
This is so because we can argue that if
we are sowing the same variety in four plots
which are very similar to each other, then we
should have obtained the same yield from
each plot!
If it is not coming out to be the same
every time, we can regard this as some kind of
an ‘error’.
The second, third and fourth columns
of the ANOVA Table are entitled ‘degrees of
freedom’, ‘Sum of Squares’ and ‘Mean
Square’.
ANOVA Table

Source of Variation          df      Sum of Squares   Mean Squares   F
Between Treatments           k - 1   SST              MST            MST/MSE
Within Treatments (Error)    n - k   SSE              MSE            --
Total                        n - 1   TSS              --             --
The point to understand is that the
sources of variation corresponding to
treatments and error will be measured by
computing quantities that are called Mean
Squares, and ‘Mean Square’ can be defined
as:
$$\text{Mean Square} = \frac{\text{Sum of Squares}}{\text{Degrees of Freedom}}$$
Corresponding to these two sources of
variation, we have the following two
equations:
$$\text{(1)}\qquad \text{MS Treatment} = \frac{\text{SS Treatment}}{\text{d.f.}}$$

and

$$\text{(2)}\qquad \text{MS Error} = \frac{\text{SS Error}}{\text{d.f.}}$$
It has been mathematically proved that,
with reference to Analysis of Variance, the
degrees of freedom corresponding to the
Treatment Sum of Squares are k-1, and the
degrees of freedom corresponding to the
Error Sum of Squares are n-k.
Hence, the above two equations can be
written as:
$$\text{(1)}\qquad \text{MS Treatment} = \frac{\text{SS Treatment}}{k - 1}$$

and

$$\text{(2)}\qquad \text{MS Error} = \frac{\text{SS Error}}{n - k}$$
How do we compute the various sums
of squares?
The three sums of squares occurring in
the third column of the above ANOVA Table
are given by:
$$\text{(1)}\qquad \text{Total SS} = TSS = \sum_i \sum_j X_{ij}^2 - CF$$

$$\text{(2)}\qquad \text{SS Treatment} = SST = \frac{\sum_j T_{.j}^2}{r} - CF$$

where C.F. stands for 'Correction Factor', and is given by

$$CF = \frac{T_{..}^2}{n}$$

and r denotes the number of data-values per column (i.e. the number of rows).
With reference to the CR Design, it
should be noted that, in some situations, the
various treatments are not repeated an equal
number of times.
For example, with reference to the
twelve plots (farms) that we have been
considering above, we could have sown
variety A in five of the plots, variety B in
three plots, and variety C in four plots.
Going back to the formulae of various
sums of squares, the sum of squares for
error is given by
$$\text{(3)}\qquad \text{SS Error} = \text{Total SS} - \text{SS Treatment}, \quad \text{i.e.} \quad SSE = TSS - SST$$
It is interesting to note that,
Total SS = SS Treatment + SS Error
In a similar way, we have the equation:
Total d.f. = d.f. for Treatment + d.f. for
Error
It can be shown that the degrees of
freedom pertaining to ‘Total’ are n - 1.
Now,
n-1 = (k-1) + (n-k)
i.e.
Total d.f. = d.f. for Treatment + d.f. for Error
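As an illustration only, the formulas (1), (2) and (3) above can be collected into a short Python sketch for a balanced CR design; the function name one_way_anova and the column-wise input format are assumptions made here, not part of the original lecture:

def one_way_anova(columns):
    # columns: one list of yields per treatment, all of the same length r (balanced design)
    k = len(columns)                                          # number of treatments
    r = len(columns[0])                                       # replications per treatment
    n = k * r                                                 # total number of observations
    grand_total = sum(sum(col) for col in columns)            # T..
    cf = grand_total ** 2 / n                                 # correction factor, CF = T..^2 / n
    tss = sum(x ** 2 for col in columns for x in col) - cf    # (1) Total SS
    sst = sum(sum(col) ** 2 for col in columns) / r - cf      # (2) SS Treatment
    sse = tss - sst                                           # (3) SS Error
    mst, mse = sst / (k - 1), sse / (n - k)                   # mean squares
    return tss, sst, sse, mst, mse, mst / mse                 # the last value is F

# For the potato data above: one_way_anova([[23, 26, 20, 17], [18, 28, 17, 21], [16, 25, 12, 14]])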
The notations and terminology given in
the above equations relate to the following
table:
Variety          A           B           C        Total
              23 (529)    18 (324)    16 (256)    1109
              26 (676)    28 (784)    25 (625)    2085
              20 (400)    17 (289)    12 (144)     833
              17 (289)    21 (441)    14 (196)     926
T.j               86          84          67       237
T.j²            7396        7056        4489     18941
Σi Xij²         1894        1838        1221      4953  (Check)

(The figures in brackets are the squares of the yields; the Total column sums the squared values across each row, and the 'Check' total 4953 agrees with the column-wise sums Σi Xij².)
The entries in the body of the table i.e.
23, 26, 20, 17, and so on are the yields of
the three varieties of potato that we had
sown in the twelve farms.
The entries written in brackets next to
the above-mentioned data-values are the
squares of those values.
For example:
529 is the square of 23,
676 is the square of 26,
400 is the square of 20,
and so on.
Adding all these squares, we obtain:

$$\sum_i \sum_j X_{ij}^2 = 4953$$
The notation T.j stands for the total of the
jth column.
(The students must already be aware that, in
general, the rows of a bivariate table are
denoted by the letter ‘i’, whereas the columns
of a bivariate table are denoted by the letter ‘j’.
In other words, we talk about the ‘ith
row’, and the ‘jth column’ of a bivariate
table.)
The ‘dot’ in the notation T.j indicates
the fact that summation has been carried out
over i (i.e. over the rows).
In this example, the total of the values
in the first column is 86, the total of the
values in the second column is 84, and the
total of the values in the third column is 67.
Hence, T.j is equal to 237.
T.j is also denoted by T..
i.e.
T.. = T.j
The ‘double dot’ in the notation T..
indicates that summation has been carried
out over i as well as over j.
The row below T.j is that of T.j
2, and
squaring the three values of T.j, we obtain
the quantities 7396, 7056 and 4489.
Adding these, we obtain T.j
2 = 18941.
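The totals appearing in the table (the column totals T.j, their squares, and the two 'check' sums) can be reproduced with a small sketch like the following; storing the data column-wise in a dictionary is simply an assumption made for this illustration:

data = {"A": [23, 26, 20, 17], "B": [18, 28, 17, 21], "C": [16, 25, 12, 14]}

col_totals = {v: sum(col) for v, col in data.items()}                # T.j : A=86, B=84, C=67
grand_total = sum(col_totals.values())                               # T.. = 237
sum_sq_col_totals = sum(t ** 2 for t in col_totals.values())         # sum of T.j^2 = 18941
sum_sq_values = sum(x ** 2 for col in data.values() for x in col)    # sum of X_ij^2 = 4953 (Check)
print(col_totals, grand_total, sum_sq_col_totals, sum_sq_values)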
Now that we have obtained all the
required quantities, we are ready to
compute SS Total, SS Treatment, and SS
Error:
We have

$$CF = \frac{T_{..}^2}{n} = \frac{(237)^2}{12} = 4680.75$$

Hence, the total sum of squares is given by

$$TSS = \sum_i \sum_j X_{ij}^2 - CF = 4953 - 4680.75 = 272.25$$
Also, we have
$$SST = \frac{\sum_j T_{.j}^2}{r} - CF = \frac{18941}{4} - 4680.75 = 54.50$$
And, hence:
SS Error = SSE = TSS - SST
= 272.25 - 54.50 = 217.75
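These three results can be re-checked with a few lines of plain arithmetic (a sketch only; the totals 237, 4953 and 18941 are the ones obtained from the table above):

n, k, r = 12, 3, 4
cf = 237 ** 2 / n         # correction factor = 4680.75
tss = 4953 - cf           # Total SS          = 272.25
sst = 18941 / r - cf      # SS Treatment      = 54.50
sse = tss - sst           # SS Error          = 217.75
print(cf, tss, sst, sse)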
Also, in this example, we have n = 12 and k = 3; hence:
n - 1 = 11,
k - 1 = 2,
and
n - k = 9.
Substituting the above sums of squares and degrees of freedom in the ANOVA table, we obtain:
ANOVA Table

Source of Variation                           d.f.   Sum of Squares   Mean Square   Computed F
Between treatments (i.e. between varieties)     2        54.50
Error                                           9       217.75
Total                                          11       272.25
Now, the mean squares for treatments
and for error are very easily found by
dividing the sums of squares by the
corresponding degrees of freedom.
Hence, we have
ANOVA Table

Source of Variation                            df    Sum of Squares   Mean Squares   F
Between Treatments (i.e. Between Varieties)     2        54.50           27.25
Error                                           9       217.75           24.19       --
Total                                          11       272.25            --         --
As indicated earlier, the test-statistic
appropriate for testing the null hypothesis
H0 : μA = μB = μC
versus
HA : Not all the three means
are equal
is:

$$F = \frac{\text{MS Treatment}}{\text{MS Error}}$$

which, if H0 is true, has an F distribution with ν1 = k - 1 = 3 - 1 = 2 and ν2 = n - k = 12 - 3 = 9 degrees of freedom.
Hence, it is obvious that F will be
found by dividing the first entry of the
fourth column of our ANOVA Table by the
second entry of the same column i.e.
$$F = \frac{\text{MS Treatment}}{\text{MS Error}} = \frac{27.25}{24.19} = 1.13$$
We insert this computed value of F in
the last column of our ANOVA table, and
thus obtain:
ANOVA Table

Source of Variation                            df    Sum of Squares   Mean Squares   F
Between Treatments (i.e. Between Varieties)     2        54.50           27.25       1.13
Error                                           9       217.75           24.19       --
Total                                          11       272.25            --         --
The fifth step of the hypothesis-testing procedure is to determine the critical region.
With reference to the Analysis of
Variance procedure, it can be shown that it is
appropriate to establish the critical region in
such a way that our test is a right-tailed test.
In other words, the critical region is
given by:
Critical Region:
F > Fα(k - 1, n - k)
In this example:
The critical region is
F > F0.05 (2,9) = 4.26
vi) Conclusion:
Since the computed value of F = 1.13 does not fall in the critical region, we accept our null hypothesis and may conclude that, on average, there is no difference among the yielding capabilities of the three varieties of potatoes.
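For readers who want to cross-check the whole test in software, one possible sketch uses SciPy: f_oneway carries out this one-way ANOVA and also reports a p-value, while f.ppf returns the tabulated critical value (the p-value quoted in the comment is approximate):

from scipy.stats import f, f_oneway

a = [23, 26, 20, 17]
b = [18, 28, 17, 21]
c = [16, 25, 12, 14]

stat, p = f_oneway(a, b, c)
print(round(stat, 2), round(p, 2))    # F = 1.13, p ≈ 0.37, well above α = 0.05
print(round(f.ppf(0.95, 2, 9), 2))    # critical value F_0.05(2, 9) ≈ 4.26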
One important point that the students
should note is that the ANOVA technique
being presented here is valid under the
following assumptions:
1. The k populations (whose means are to
be compared) are normally distributed;
2. All k populations have equal variances, i.e. σ1² = σ2² = … = σk². (This property is called homoscedasticity.)
3. The k samples have been drawn
randomly and independently from the
respective populations.
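As a side note (not part of the original slides), these assumptions are often examined in practice with a normality test on each sample and a test for equality of variances; a possible sketch using SciPy is given below:

from scipy.stats import shapiro, levene

a = [23, 26, 20, 17]
b = [18, 28, 17, 21]
c = [16, 25, 12, 14]

# Normality within each sample (with only four values per group these tests have little power)
for name, sample in (("A", a), ("B", b), ("C", c)):
    print(name, shapiro(sample).pvalue)

# Equality of variances (homoscedasticity) across the three groups
print(levene(a, b, c).pvalue)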