Anova single factor

ANOVA
One way Single Factor Models
KARAN DESAI-11BIE001
DHRUV PATEL-11BIE024
VISHAL DERASHRI -11BIE030
HARDIK MEHTA-11BIE037
MALAV BHATT-11BIE056

DEFINITION
 Analysis of variance (ANOVA) is a collection of
statistical models, and their associated procedures, used
to analyze the differences between group means (such as
the "variation" among and between groups). It was
developed by R. A. Fisher. In the ANOVA setting, the
observed variance in a particular variable is partitioned
into components attributable to different sources of
variation.

-Sir Ronald
Aylmer Fisher
FRS was an English statistician,
evolutionary biologist, geneticist, and
eugenicist

Why ANOVA?
• How do we compare the means of more than two
populations?
• How do we compare populations that each contain
several subgroups or levels?

Problem with multiple t-tests
• One problem with running a t-test on every pair of groups
is that the number of tests grows quickly as the number of
groups increases.
• The probability of making a Type I error increases
as the number of tests increases.
• If the probability of a Type I error for each test
is set at 0.05 and 10 t-tests are done, the overall
probability of at least one Type I error for the set of tests
(assuming the tests are independent) = 1 - (0.95)^10 = 0.40
instead of 0.05
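The arithmetic on this slide can be checked with a short Python sketch. It assumes the tests are independent and that every pair of groups gets its own t-test; the function names are illustrative and do not come from the slides.

```python
from math import comb


def pairwise_tests(groups):
    """Number of two-sample t-tests needed to compare every pair of groups."""
    return comb(groups, 2)


def familywise_error(alpha, tests):
    """Probability of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** tests


# How quickly the problem grows with the number of groups.
for k in (3, 4, 5, 10):
    m = pairwise_tests(k)
    rate = familywise_error(0.05, m)
    print(f"{k} groups -> {m} tests, overall Type I error = {rate:.2f}")
```

Five groups already require 10 pairwise tests, which is the case worked out on the slide.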

 In its simplest form, ANOVA provides a statistical test of
whether or not the means of several groups are equal,
and therefore generalizes the t-test to more than two
groups. As doing multiple two-sample t-tests would result
in an increased chance of committing a statistical type-I
error, ANOVAs are useful in comparing (testing) three or
more means (groups or variables) for statistical
significance.

• Another way to describe the multiple comparisons
problem is to think about the meaning of an alpha
level = 0.05
• An alpha of 0.05 implies that, when the null hypothesis
is true, we expect on average one Type I error in every
20 tests: 1/20 = 0.05.
• This means that, by chance alone, a true null hypothesis
will be incorrectly rejected about once in every 20 tests
• As the number of tests increases, the probability
of finding a ‘significant’ result by chance
increases.

Importance of ANOVA
• ANOVA is an important test because
it enables us to see, for example, how
effective different types of treatment
are and how durable they are.
• In effect, an ANOVA can tell us how well a
treatment works, how long it lasts and how
budget friendly it will be.

CLASSIFICATION OF ANOVA
MODEL
1. Fixed-effects models:
The fixed-effects model of analysis of
variance applies to situations in which the experimenter
applies one or more treatments to the subjects of the
experiment to see if the response variable values
change. This allows the experimenter to estimate the
ranges of response variable values that the treatment
would generate in the population as a whole.

2. Random-effects model:
Random effects models are used
when the treatments are not fixed. This occurs when the
various factor levels are sampled from a larger
population. Because the levels themselves are random
variables, some assumptions and the method of
contrasting the treatments (a multi-variable
generalization of simple differences) differ from the fixed-effects
model.

3. Mixed-effects models:
A mixed-effects model contains experimental
factors of both fixed and random-effects types, with appropriately
different interpretations and analysis for the two types.
Example: Teaching experiments could be performed by a
university department to find a good introductory textbook, with each
text considered a treatment. The fixed-effects model would
compare a list of candidate texts. The random-effects model would
determine whether important differences exist among a list of
randomly selected texts. The mixed-effects model would
compare the (fixed) incumbent texts to randomly selected
alternatives.

ASSUMPTION
 Normal distribution
 Variances of dependent variable are equal in all
populations
 Random samples; independent scores

One way Single factor ANOVA

ONE-WAY ANOVA
 One factor (manipulated variable)
 One response variable
 Two or more groups to compare

USEFULNESS
 Similar to t-test
 More versatile than t-test
 Compare one parameter (response variable)
between two or more groups

Remember that…
 Standard deviation (s)
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} $$
 In this case: Degrees of freedom (df)
df = Number of observations or groups - 1
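As a quick illustration of the standard deviation formula and its n - 1 degrees of freedom, here is a minimal Python sketch. The numbers in `scores` are made up for the example and are not data from the slides; `sample_sd` is an illustrative helper that is compared against the standard library's `statistics.stdev`.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_devs / (n - 1))  # n - 1 degrees of freedom


# Hypothetical observations, used only to exercise the formula.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_sd(scores))
print(statistics.stdev(scores))  # the library version uses the same n - 1 divisor
```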

ANOVA
 ANOVA (ANalysis Of VAriance) is a natural extension of the
t-test used to compare the means of more than 2 populations.
 Basic Question: Even if the true means of several populations were
equal (e.g. μ1 = μ2 = μ3 = μ4 for four populations) we cannot expect
the sample means (x̄1, x̄2, x̄3, x̄4) to be equal. So when we get
different values for the x̄'s,
 How much is due to randomness?
 How much is due to the fact that we are sampling from
different populations with possibly different μj's?

ANOVA TERMINOLOGY
 Response Variable (y)
 What we are measuring
 Experimental Units
 The individual unit that we will measure
 Factors
 Independent variables whose values can change to affect
the outcome of the response variable, y
 Levels of Factors
 Values of the factors
 Treatments
 The combination of the levels of the factors applied to an
experimental unit

Example
We want to know how combinations of different
amounts of water (1 ac-ft, 3 ac-ft, 5 ac-ft) and
different fertilizers (A, B, C) affect crop yields
 Response variable
 Crop yield (bushels/acre)
 Experimental unit
 Each acre that receives a treatment
 Factors (2)
 Water and fertilizer
 Levels (3 for Water; 3 for Fertilizer)
 Water: 1, 3, 5; Fertilizer: A, B, C
 Treatments (9 = 3x3)
 1A, 3A, 5A, 1B, 3B, 5B, 1C, 3C, 5C

Total Treatments
                       Fertilizer
                 A             B             C
Water  1 AC-FT   Treatment 1   Treatment 2   Treatment 3
       3 AC-FT   Treatment 4   Treatment 5   Treatment 6
       5 AC-FT   Treatment 7   Treatment 8   Treatment 9
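The treatment grid is just every combination of one water level with one fertilizer. A minimal Python sketch of that enumeration follows. The variable names are illustrative, and the numbering follows the table's row-by-row order (water level first, then fertilizer).

```python
from itertools import product

water_levels = [1, 3, 5]        # ac-ft, the three levels of the water factor
fertilizers = ["A", "B", "C"]   # the three levels of the fertilizer factor

# Each treatment is one (water level, fertilizer) combination.
treatments = [f"{w}{f}" for w, f in product(water_levels, fertilizers)]

for number, label in enumerate(treatments, start=1):
    print(f"Treatment {number}: {label}")

print(len(treatments), "treatments in total")
```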

Single Factor ANOVA
Basic Assumptions
 If we focus on only one factor (e.g. fertilizer type in the
previous example), this is called single factor ANOVA.
 In this case, levels and treatments are the same thing since
there are no combinations between factors.
 Assumptions for Single Factor ANOVA
1. Each population in the comparison has a
normal distribution
2. The standard deviations of the populations (although
unknown) are assumed to be equal (i.e. σ1 = σ2 = σ3 = σ4)
3. Sampling is:
Random
Independent

Example
 The university would like to know if the delivery mode of the
introductory statistics class affects the performance in the
class as measured by the scores on the final exam.
 The class is given in four different formats:
 Lecture
 Text Reading
 Videotape
 Internet
 The final exam scores from random samples of students from
each of the four teaching formats were recorded.

Summary
 There is a single factor under observation: teaching format
 There are k = 4 different treatments (or levels of teaching
formats)
 The number of observations (experimental units) are n1 = 7, n2
= 8, n3 = 6, n4 = 5; total number of
observations, n = 26
Treatment means: x̄1 = 76, x̄2 = 65, x̄3 = 75, x̄4 = 74
Grand mean (of all 26 observations): x̿ = 72
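Because the four samples have different sizes, the grand mean is the size-weighted average of the treatment means, not their simple average. A small Python sketch using the sample sizes and means from this summary (variable names are illustrative):

```python
sizes = [7, 8, 6, 5]        # n1..n4 from the summary
means = [76, 65, 75, 74]    # treatment means from the summary

n = sum(sizes)

# Grand mean of all observations = weighted average of the treatment means.
grand_mean = sum(nj * xj for nj, xj in zip(sizes, means)) / n

# The unweighted average of the four means is a different number.
unweighted = sum(means) / len(means)

print(n, grand_mean, unweighted)
```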

Why aren’t all the x̄’s the same?
 There is variability due to the different treatments --
Between Treatment Variability (Treatment)
 There is variability due to randomness within each
treatment -- Within Treatment Variability (Error)
BASIC CONCEPT
If the average Between Treatment Variability is “large”
compared to the average Within Treatment Variability,
we can reasonably conclude that there really are
differences among the population means (i.e. at least
one μj differs from the others).

Basic Questions
 Given this basic concept, the natural questions are:
 What is “variability” due to treatment and due to error
and how are they measured?
 What is “average variability” due to treatment and due
to error and how are they measured?
 What is “large”?
 How much larger than the observed average
variability due to error does the observed average
variability due to treatment have to be before we
are convinced that there are differences in the true
population means (the μ’s)?

How Is “Total” Variability Measured?
Variability is defined as the Sum of Square Deviations (from the
grand mean). So,
 SST (Total Sum of Squares)
 Sum of Squared Deviations of all observations from the
grand mean. (McClave uses SSTotal)
 SSTr (Between Treatment Sum of Squares)
 Sum of Square Deviations Due to Different Treatments.
(McClave uses SST)
 SSE (Within Treatment Sum of Squares)
 Sum of Square Deviations Due to Error
SST = SSTr + SSE
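A short derivation sketch of why the total splits this way. The notation is an assumption chosen for the sketch: $x_{ij}$ is observation $i$ in treatment $j$, $\bar{x}_j$ is the mean of treatment $j$, $n_j$ is its sample size, and $\bar{\bar{x}}$ is the grand mean. Split each deviation from the grand mean into a within-treatment part and a between-treatment part, then square and sum:

$$
\begin{aligned}
x_{ij} - \bar{\bar{x}} &= (x_{ij} - \bar{x}_j) + (\bar{x}_j - \bar{\bar{x}}) \\
\sum_{j}\sum_{i} (x_{ij} - \bar{\bar{x}})^2
  &= \sum_{j}\sum_{i} (x_{ij} - \bar{x}_j)^2
   + \sum_{j} n_j (\bar{x}_j - \bar{\bar{x}})^2
   + 2 \sum_{j} (\bar{x}_j - \bar{\bar{x}}) \sum_{i} (x_{ij} - \bar{x}_j)
\end{aligned}
$$

The last term is zero because deviations from a treatment's own mean sum to zero within that treatment, which leaves SST = SSE + SSTr.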

How is “Average” Variability Measured?
“Average” Variability is measured in:
Mean Square Values (MSTr and MSE)
 Found by dividing SSTr and SSE by their respective
degrees of freedom
ANOVA TABLE
Variability               SS     DF    Mean Square (MS)
Between Tr. (Treatment)   SSTr   k-1   SSTr/DFTr
Within Tr. (Error)        SSE    n-k   SSE/DFE
TOTAL                     SST    n-1
DFTr = k-1 = # treatments - 1
DFE = n-k = DFT - DFTr
DFT = n-1 = # observations - 1
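The degrees-of-freedom column can be sketched in a few lines of Python. The function names are illustrative; the call at the end uses n = 26 observations and k = 4 treatments from the teaching-format example.

```python
def anova_degrees_of_freedom(n, k):
    """Degrees of freedom for a single factor ANOVA with n observations, k treatments."""
    df_treatment = k - 1   # DFTr: number of treatments minus 1
    df_error = n - k       # DFE: what is left after removing DFTr from DFT
    df_total = n - 1       # DFT: number of observations minus 1
    return df_treatment, df_error, df_total


def mean_square(sum_of_squares, df):
    """'Average' variability: a sum of squares divided by its degrees of freedom."""
    return sum_of_squares / df


df_tr, df_e, df_t = anova_degrees_of_freedom(n=26, k=4)
print(df_tr, df_e, df_t)  # the two parts add up to the total
```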

Formula for Calculating
SST
Calculating SST
Just like the numerator of the variance, assuming all (26)
entries come from one population:
$$ SST = \sum_{j}\sum_{i} (x_{ij} - \bar{\bar{x}})^2 = (82 - 72)^2 + \dots + (81 - 72)^2 = 4394 $$

SSTr
Calculating SSTr
Between Treatment Variability
Replace all entries within each treatment by its mean. Now all
the variability is between (not within) treatments:
Treatment 1 (n1 = 7): 76 76 76 76 76 76 76
Treatment 2 (n2 = 8): 65 65 65 65 65 65 65 65
Treatment 3 (n3 = 6): 75 75 75 75 75 75
Treatment 4 (n4 = 5): 74 74 74 74 74
$$ SSTr = \sum_{j} n_j (\bar{x}_j - \bar{\bar{x}})^2 = 7(76 - 72)^2 + 8(65 - 72)^2 + 6(75 - 72)^2 + 5(74 - 72)^2 = 578 $$
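The "replace every entry by its treatment mean" idea can be checked directly. This Python sketch uses the sample sizes, treatment means and grand mean from the example (variable names are illustrative) and computes SSTr two ways: from the replaced entries, and from the shortcut formula with the n_j weights.

```python
sizes = [7, 8, 6, 5]        # n1..n4
means = [76, 65, 75, 74]    # treatment means
grand_mean = 72

# Every observation replaced by the mean of its own treatment (26 values).
replaced = [mean for nj, mean in zip(sizes, means) for _ in range(nj)]

# Sum of squared deviations of the replaced entries from the grand mean.
sstr_from_entries = sum((x - grand_mean) ** 2 for x in replaced)

# Shortcut: weight each squared deviation by the treatment's sample size.
sstr_from_formula = sum(nj * (mean - grand_mean) ** 2
                        for nj, mean in zip(sizes, means))

print(len(replaced), sstr_from_entries, sstr_from_formula)
```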

SSE
Calculating SSE (Within Treatment Variability)
The difference between the SST and SSTr:
$$ SSE = SST - SSTr = 4394 - 578 = 3816 $$

Can we Conclude a Difference Among
the 4 Teaching Formats?
We conclude that at least one population mean differs
from the others if the average between treatment
variability is large compared to the average within
treatment variability, that is if MSTr/MSE is “large”.
 The ratio of the two measures of variability for these
normally distributed random variables has an F
distribution and the F-statistic (=MSTr/MSE) is
compared to a critical F-value from an F distribution
with:
 Numerator degrees of freedom = DFTr
 Denominator degrees of freedom = DFE
 If the ratio of MSTr to MSE (the F-statistic) exceeds
the critical F-value, we can conclude that at least one
population mean differs from the others.

Can We Conclude Different Teaching
Formats Affect Final Exam Scores?
The F-test
H0: μ1 = μ2 = μ3 = μ4
HA: At least one μj differs from the others
Select α = .05.
Reject H0 (Accept HA) if:
$$ F = \frac{MSTr}{MSE} > F_{\alpha, DFTr, DFE} = F_{.05, 3, 22} = 3.05 $$

Hand Calculations for the F-test
$$ MSTr = \frac{SSTr}{DFTr} = \frac{578}{3} = 192.67 $$
$$ MSE = \frac{SSE}{DFE} = \frac{3816}{22} = 173.45 $$
$$ F = \frac{MSTr}{MSE} = \frac{192.67}{173.45} = 1.11 $$
$$ 1.11 < 3.05 $$
Cannot conclude there is a difference among the μj’s
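The hand calculation can be retraced in Python as a sanity check. This is a minimal sketch using only the sums of squares, n and k from the example; the critical value 3.05 is copied from the slides, not computed, and the variable names are illustrative.

```python
sst, sstr = 4394, 578      # total and between-treatment sums of squares
n, k = 26, 4               # observations and treatments

sse = sst - sstr           # within-treatment (error) sum of squares
df_tr, df_e = k - 1, n - k

mstr = sstr / df_tr        # mean square for treatments
mse = sse / df_e           # mean square for error
f_stat = mstr / mse

f_critical = 3.05          # F(.05, 3, 22) as quoted on the slide
reject_h0 = f_stat > f_critical

print(f"MSTr = {mstr:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
print("Reject H0" if reject_h0 else "Cannot conclude a difference among the means")
```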

EXCEL OUTPUT
p-value = .365975 > .05
Cannot conclude differences

REVIEW
 ANOVA Situation and Terminology
 Response variable, Experimental Units, Factors,
Levels, Treatments, Error
 Basic Concept
 If the “average variability” between treatments is “a
lot” greater than the “average variability” due to error –
conclude that at least one mean differs from the
others.
 Single Factor Analysis
 By Hand
 By Excel
