Analysis of Variance
(ANOVA) Part I
Why study Analysis of Variance (ANOVA)?
• ANOVA technology was developed by R. A.
Fisher for use in the agricultural trials being
run at the Rothamsted Agricultural Research
Station where he was employed for a period of
time.
• The research was concerned with testing the
effects of different types of organic and
inorganic fertilizers on the crop yield.
• For this reason, many terms used to describe
ANOVA models still reflect the terminology
used in the agricultural settings (e.g., split-plot
designs).
• ANOVA methods were specifically
developed for the analysis of data from
experimental studies.
• The models assume that random
assignment has taken place as a means
of assuring that the IID assumptions of
the statistical test are met.
• These designs are still the staple of
certain social science disciplines where
much experimental research is
conducted (e.g., psychology).
Why study Analysis of Variance (ANOVA)?
• But what is the utility for other disciplines?
• There are several reasons why knowledge of
ANOVA is important:
– Clinical trials in all areas of research involving
human participants still employ these designs.
– Many of the techniques employed in testing
hypotheses in the ANOVA context are generalizable
to other statistical methods.
– There are other statistical procedures, e.g., variance
components models that use a similar approach to
variance decomposition as is employed in ANOVA
analysis and can be more readily understood with
an understanding of ANOVA.
– Possible involvement in interdisciplinary research.
Why study Analysis of Variance (ANOVA)?
• ANOVA models distinguish between independent
variables (IVs), dependent variables (DVs), blocking
variables, and levels of those variables.
• All IVs and Blocking variables are categorical. DVs are
continuous and assumed to be IID normal.
• We have already discussed IVs and DVs.
• Blocking variables are variables needing to be controlled
in an analysis that are not manipulated by the
researchers and hence not true IV’s (e.g., gender, grade
in school, etc.).
Overview of ANOVA
“Levels” of IVs and Blocking Variables
• Each IV and Blocking variable in an ANOVA
model must have two or more “levels.”
– Example: the independent variable may be a type of
therapy, a drug, the induction of anger or frustration
or some other experimental manipulation.
• Levels could include different types of therapies
or therapies of differing degrees of intensity,
different doses of a drug, different induction
procedures, etc.
• It is assumed that participants are randomly
assigned to levels of the IV but not to levels of
any blocking variables.
Overview of ANOVA
• IVs and Blocking variables are referred to
as “Factors” and each factor is assumed
to have two or more levels.
• In ANOVA we distinguish between
between-groups and within-groups
factors.
Between vs. Within Factors
• A between groups factor is one in which
each subject appears at only one level of
the IV.
• A within groups factor (of which a
repeated measures factor is an example),
is one in which each subject appears at
each level of the IV.
• It is possible to have a design with a
mixture of between and within groups
factors or effects.
Fixed vs. Random Effects:
Expected Mean Squares (EMS)
• Effects or factors are fixed when all
levels of an IV or blocking variable we
are interested in generalizing to are
included in the analysis.
• Effects are random when they represent
a sampling of levels from the universe of
possible values.
Examples of Random and Fixed Effects
• Drug dosages of 2, 4, or 10 mg
– random: since not all levels are
represented.
• Different raters providing observational
ratings of behavior
– random
• Gender - male and female
– fixed
• MST treatment versus usual services
– fixed
Fixed vs. Random Effects (cont.)
• The distinction between fixed and
random effects is important since it
has implications for the way in which
the treatment effects are estimated
and the generalizability of the results
• For fixed effect models, we have complete
information (all levels of a variable or factor are
observed) and we can calculate the effect of the
treatment by taking the average across the
groups present.
• In the case of random factors, we have
incomplete information, (not all levels of the
factor are included in the design). For random
factors, we are estimating the treatment effect
at the level of the population given only the
information available from the levels we
included in our design. The formulas are
designed to represent this uncertainty.
Fixed versus Random Effects (cont.)
• In the case of a fixed effect, we can
generalize the results only to the levels
of the variables included in our
analyses.
• Random effects assume that the results
will be generalized to other levels
between the endpoint values included in
our analyses.
“Levels” of IVs and Blocking Variables
• The ANOVA model is a “means model”
– i.e., it assumes that any observed differences
in behavior can be completely described
using only information in the means.
• The ANOVA model is also a population-
averaged model.
• It evaluates the effects of treatments and
blocking variables at the group rather
than at the individual level.
Hypothesis Testing
• ANOVA involves the same 4-step
hypothesis testing procedure we
applied in the case of the z-test, t-tests,
and tests for the correlation coefficients.
• We will, however, use a different
sampling distribution to determine the
critical values of our statistic.
• This sampling distribution is called the
F-distribution and the significance test
is now called an F-test.
F-test Basics
• F-statistics are formed as the ratio of two chi-
square distributions divided by their respective
degrees of freedom:
F(d.f.1, d.f.2) = (χ1² / d.f.1) / (χ2² / d.f.2), and with a single numerator d.f., F(1, d.f.denom) = t²
• As a result, unlike the t-distribution the shape of
which was determined by one degree of
freedom parameter, the F-distribution is
determined by two degree of freedom
parameters.
• When there are only 2 groups, F is equal to t².
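This relationship is easy to verify numerically. Below is a minimal Python sketch (not from the slides; the two-group data are simulated for illustration) showing that with two groups the one-way ANOVA F equals the square of the pooled-variance t:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.normal(50, 10, size=12)   # illustrative group 1 scores
g2 = rng.normal(55, 10, size=12)   # illustrative group 2 scores

t, _ = stats.ttest_ind(g1, g2)     # pooled-variance two-sample t-test
F, _ = stats.f_oneway(g1, g2)      # one-way ANOVA F-test
print(round(t**2, 6), round(F, 6)) # identical: F = t^2 when there are 2 groups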
The F Statistic (cont.)
• A very important property of the F-test under the null
hypothesis of no differences between the groups is
that, in theory, the numerator and denominator are
independent estimates of the same population
variance.
• However, the denominator measures only “error” or
“noise” while the numerator measures both error and
treatment effect.
• Under the null hypothesis of no effect of treatment, the
expected value of the F-statistic is 1.0. As the
treatment effect increases in size, F becomes greater
than 1.0
• Note: Although in theory F should never be less than
1.0, with “real” data it will fall below 1.0 at times.
Error Term
• The error term in the denominator of the F-
statistic is an extension of the two sample t-test
error term.
– In the two sample t-test we saw that since two independent
estimates of the population variance were available - one
from each sample - we could improve on the estimate of
the population parameter by averaging across the two
estimates. The error term which resulted was called a
“pooled error term.”
• In ANOVA, we will have at least two but possibly
three or more groups. Regardless, the process
is the same. To improve on our estimate of the
population parameter, we pool the variance
estimates together - one from each cell or
sample - and use this mean squared error as the
error term in our F-test.
Tabled Values of F
• The F-distribution is non-normal (positively
skewed), and the F-table includes only the
critical values at the upper tail of the distribution.
• Lower values can be obtained by taking the
reciprocal of the tabled value of F, i.e., 1/F but
these values are rarely used.
• The F-distribution changes shape depending on
the numerator and denominator degrees of
freedom as can be seen in the next slide:
[Figure omitted: F-distribution density curves for several combinations of numerator and denominator degrees of freedom.]
ANOVA Hypotheses
• Ho: μ1 = μ2 = μ3
• H1: not all μj are equal (at least one group mean differs)
• As with the other statistics covered in this
course, the F-test can be run working from
definitional formulas or computational formulas.
• We will work through the definitional formulas
in class examples.
• One reason for this is that it is easy to calculate
the statistic using these formulas. More
importantly, it is easier to see what each
component of the statistic represents.
Completely Randomized
(CR-p) Design
One-Way ANOVA Design Model
• Simplest ANOVA model – single factor:
Yij = μ + αj + εij
• Where i = person and j=group.
• This model says that each person's score
can be described by:
– μ, the overall or grand mean
– αj, an average group level treatment effect, and
– εij, a parameter describing each individual's
deviation from the average group effect.
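To make the design model concrete, here is a small Python sketch (an illustration, not from the slides; the parameter values are made up) that generates data according to Yij = μ + αj + εij:

import numpy as np

rng = np.random.default_rng(1)
mu = 300.0                               # grand mean (illustrative)
alpha = np.array([-15.0, -5.0, 20.0])    # group treatment effects, sum to zero
n, sigma = 10, 50.0                      # per-group n and error SD

group = np.repeat(np.arange(len(alpha)), n)          # j index for each person
Y = mu + alpha[group] + rng.normal(0.0, sigma, 3*n)  # Y_ij = mu + alpha_j + eps_ij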
The F-statistic
• For all ANOVA designs, the F-statistic is
comprised of two parts:
– Numerator: a component known as the “mean
square between groups”
– Denominator: “mean square within groups”
– We form a ratio of these two variance terms to
compute the F statistic:
MSBG SSBG / d.f.BG
F = --------- = ---------------
MSWG SSWG / d.f.WG
The F-statistic (cont.)
• The trick is to correctly estimate the
variance components forming the
numerator and denominator for different
combinations of fixed and random effects.
• The correct formulas to use are based on
the statistical theory underlying ANOVA
and are derived using what are termed
“expected mean square” formulas.
Components of the F-Statistic
• We define the numerator and denominator of the
F-test in terms of their expected values (the
values that one should obtain in the population if
Ho were true).
• The expected mean squares for the two types of
effects in the CR-p design are given by:
              Model I (Fixed effect)       Model II (Random effect)
_____________________________________________________________________
E(MSBG)       σε² + nΣαj²/(p - 1)          σε² + n(1 - p/P)σα²
E(MSWG)       σε²                          σε²
_____________________________________________________________________
Forming the F-statistics for the two
possible design models:
F (Model I)  = [σε² + nΣαj²/(p - 1)] / σε²

F (Model II) = [σε² + n(1 - p/P)σα²] / σε²
Calculating the Variance Components
SSBG = n Σj (X̄.j - X̄..)²,  summing over the j = 1 … J groups

SSWG = Σj Σi (Xij - X̄.j)²   or equivalently   Σj s²j(n - 1)
MSBG = SSBG/d.f.BG
MSWG = SSWG/d.f.WG
Variance Components (cont).
• If we add the SSBG and SSWG terms together, we have the
Total Sums of Squares or SSTOT:
SSTOT = SSBG + SSWG = ΣΣ(Yij - Ȳ..)²
• The ANOVA process represents a “decomposition” of
the total variation in a DV into “components” of
variation attributable to the factors included in the
model and a residual or error component.
• In the CR-p design, SSBG is the variance in the DV that
can be “explained” by the IV and SSWG is the error,
residual, or unexplained variation left over after
accounting for SSBG.
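The decomposition can be sketched in a few lines of Python (an illustration, not part of the slides); the within-groups degrees of freedom are written here as N - p, which equals p(n - 1) when group sizes are equal:

import numpy as np

def one_way_decomposition(groups):
    """groups: list of 1-D arrays, one per treatment level."""
    all_y = np.concatenate(groups)
    grand_mean = all_y.mean()
    ss_bg = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_wg = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_bg, df_wg = len(groups) - 1, len(all_y) - len(groups)
    F = (ss_bg / df_bg) / (ss_wg / df_wg)
    return ss_bg, ss_wg, ss_bg + ss_wg, F   # SSBG, SSWG, SSTOT, F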
Assumptions
• ANOVA makes the same assumptions as the t-
test.
• F assumptions:
– The dependent variable is from a population that
is normally distributed.
– The sample observations are random samples
from the population.
– The numerator and denominator of the F-test are
estimates of the same population variance.
– The numerator and denominator are
independent.
Assumptions (cont.)
• Model Assumptions:
– The model equation (design model)
reflects all sources of variation affecting a
DV and each score is the sum of several
components.
– The experiment contains all treatment
levels of interest
– The error effect is independent of all other
errors and is normally distributed within
each treatment population with a mean
equal to 0 and variance equal to σ²
(homogeneity of variances assumption)
Assumptions (cont.)
• Note that the assumptions of ANOVA are
frequently violated to at least some
extent with real-world data.
• Generally the violations have little effect
on significance or power if:
– The data are derived through random
sampling
– The sample size is not small
– Departures from normality are not large
– n’s within each cell are equal.
Violations of Assumptions
• The F-test is robust to violations of the
homogeneity of variances assumption.
When the violation is extreme, however,
the result is an incorrect Type I error rate
(It will become increasingly inflated as the
violation becomes more severe).
• As we noted with the t-test, if the group
sizes are equal, the impact of
heterogeneity is of less concern.
Violations of Assumptions
• The effect on the error term will be liberal
when the largest cell sizes are associated
with the smallest variances and
conservative if the largest cell sizes are
associated with the largest variances.
• In most cases, problems arise when the
cell or group sizes are very different.
Normalizing Transformations
• When the data are more severely skewed,
kurtotic, or both, or the homogeneity of
variances assumption has been violated,
it is sometimes necessary to apply a
normalizing transformation to the DV to
allow the analysis to be run.
Transformations
• Normalizing transformations help to
accomplish several things:
– Homogeneity of error variances
– Normality of errors
– Additivity of effects, i.e., effects that do not
interact (as is desirable, for example, in a
repeated measures ANOVA design). By
transforming the scale of measurement,
additivity can sometimes be achieved.
– We will talk about additivity more in the
context of repeated measures ANOVA.
Types of Normalizing Transformations
• Square Root
– Y'=√Y
– Use when treatment level means and
variances are proportional (moderate
positive skew)
Types of Normalizing Transformations (Cont.)
• Log 10
– Y'=log10(Y+1)
– Use when Tx means and SDs are
proportional (more extreme positive
skew)
Types of Normalizing Transformations (Cont.)
• Angular or Inverse Sine
– Y'=2*arcsin(√Y)
– Use when means and variances are
proportional and underlying distribution
is binomial.
– Often used when the DV represents
proportions.
Types of Normalizing Transformations (Cont.)
• Inverse or Reciprocal
– Y'=1/Y
– Use when squares of Tx means are
proportional to SDs (severe positive
skew or L-shaped distribution)
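For reference, the four transformations above can be applied directly in Python/NumPy (an illustrative sketch; the Y values are made up, positive, and within [0, 1] so every transform is defined):

import numpy as np

Y = np.array([0.04, 0.10, 0.25, 0.40, 0.80])   # illustrative positive scores

sqrt_t   = np.sqrt(Y)                   # square root: moderate positive skew
log_t    = np.log10(Y + 1)              # log10(Y + 1): stronger positive skew
arcsin_t = 2 * np.arcsin(np.sqrt(Y))    # angular: proportions / binomial-like DV
recip_t  = 1.0 / Y                      # reciprocal: severe positive skew (Y must be nonzero)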
Negatively Skewed Distributions
• When the distribution is negatively
skewed, one must first "reflect" the
distribution and then apply the appropriate
transformation. To reflect a set of scores,
simply subtract each value from the
highest value plus one unit.
– e.g., if an item is on a 1-5 scale and we wish to
reflect it (change it to a 5-1 scale), we would
simply subtract each value from 6:
Reflected(y) = 6 - y
Selecting a Transformation
• If you are unsure which transformation will
work best, you can apply all possible
transformations to the highest and lowest
scores in each treatment level.
• Calculate the range of values for each
treatment level by subtracting the smallest
score from the largest.
• Form a ratio of the largest and smallest ranges
for each transformation across Tx levels.
• The transformation associated with the
smallest ratio wins.
• (See Kirk Experimental Design for an example)
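A rough Python sketch of this selection heuristic is given below (a coding of the procedure described above, not taken from Kirk; the function name and the set of candidate transformations are illustrative):

import numpy as np

def pick_transformation(level_scores):
    """level_scores: list of 1-D arrays of raw scores, one per treatment level."""
    candidates = {
        "sqrt":  np.sqrt,
        "log10": lambda y: np.log10(y + 1),
        "recip": lambda y: 1.0 / y,
    }
    best_name, best_ratio = None, np.inf
    for name, f in candidates.items():
        # range of the transformed extreme scores within each treatment level
        ranges = [abs(f(g.max()) - f(g.min())) for g in level_scores]
        ratio = max(ranges) / min(ranges)   # largest range / smallest range
        if ratio < best_ratio:
            best_name, best_ratio = name, ratio
    return best_name, best_ratio            # the smallest ratio "wins"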
Additivity
• If additivity is of interest, use a test for
nonadditivity such as that developed by
Tukey (1949) and covered in many
statistics texts.
• Then select a transformation that
reduces nonadditivity to an acceptable
level.
Example of a Completely Randomized
Between Groups Design (CR-5)
• One independent variable with five levels
• The independent variable represents
different types of stranger awareness
training and the dependent variable the
latency of the children to protest verbally
about a stranger’s actions (measured in
seconds).
Example (cont.)
           Control   Experimenter   Mother    Mother    Role
                     verbal         verbal    natural   play
__________________________________________________________________
Mean         279        284           286       308      330
SD            50         53            51        56       58
n             10         10            10        10       10
__________________________________________________________________
Grand mean Ȳ.. = 297.4

One can eyeball the SDs relative to the sample sizes to see whether the
homogeneity of variances assumption has been violated.
CR-5 Example (cont.)
• Ho: μ1 = μ2 = μ3 = μ4= μ5
• H1: not all μj are equal
• Use ~F(4,45)
• Where dfnum = p-1 and dfden = p(n-1)
• Set up decision rules (from F-table):
– Fcrit(.05,4,45) = 2.61
– If Fobs > Fcrit then reject Ho
Formulas
• Calculate statistic and apply decision rules:
SSWG = Σj s²j(n - 1) = [(50)² + (53)² + (51)² + (56)² + (58)²] × 9 = 129,690

SSBG = n Σj (X̄.j - X̄..)²
     = 10 × [(279 - 297.4)² + (284 - 297.4)² + (286 - 297.4)² + (308 - 297.4)² + (330 - 297.4)²]
     = 10 × 1823.2
     = 18,232

SSTOT = SSBG + SSWG = 18,232 + 129,690 = 147,992
Formulas (cont.)
d.f.BG = p-1 = 5-1= 4
d.f.WG = p(n-1) = 5(10-1)= 45
MSBG = SSBG/d.f.BG
= 18,232/4
= 4558
MSWG = SSWG/d.f.WG
= 129,690/45
= 2882
Formulas (cont.)
MSBG
F = ------
MSWG
= 4558 / 2882
= 1.58
Fcrit(.05,4,45) = 2.61
Since F-obs < F-crit do not reject Ho.
Conclude no effect due to treatment
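The CR-5 computations above can be reproduced from the summary statistics alone; here is a Python sketch (not from the slides) using SciPy. Note the exact F quantile is about 2.58, slightly below the 2.61 read from the coarse printed table:

import numpy as np
from scipy import stats

means = np.array([279, 284, 286, 308, 330], dtype=float)
sds   = np.array([50, 53, 51, 56, 58], dtype=float)
n, p  = 10, 5

grand_mean = means.mean()                      # simple average is fine with equal n
ss_bg = n * ((means - grand_mean) ** 2).sum()  # 18,232
ss_wg = ((sds ** 2) * (n - 1)).sum()           # 129,690
F = (ss_bg / (p - 1)) / (ss_wg / (p * (n - 1)))   # about 1.58
print(F, stats.f.ppf(0.95, p - 1, p * (n - 1)),   # F-crit ~ 2.58
      stats.f.sf(F, p - 1, p * (n - 1)))          # p-value, well above .05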
What if F was significant?
• If you found that there was a significant
difference between means using the F-statistic,
you still do not know which of the groups were
significantly different from the others.
• This is because the F-test is a simultaneous or
omnibus test of the difference between all
possible combinations of group means.
• In the case of the two-sample t-test, this was
easy to resolve by looking at the group means.
In the case of three or more groups, this is not
as easy to determine.
• For this reason, it is necessary to introduce
what are known as post-hoc tests as a follow-up
to the F-test.
Post Hoc Tests
• Post hoc tests take many forms.
– For example, your book introduces one
post hoc test known as Tukey's HSD (where
HSD stands for "honestly significant
difference").
– This test compares all pairwise
combinations of means and allows one to
determine which of a set of means are
actually different from one another.
Post Hoc Tests (cont.)
• Other commonly used Post Hoc Tests:
– Tukey's HSD: Evaluates all pairwise
differences between means and is based on
the Studentized range statistic (q). Maintains
error rate familywise at alpha.
– Fisher's LSD (Do not EVER use this!):
tantamount to no control. Essentially sets
error rate per contrast.
– Dunnett’s: Allows one to compare p-1
treatment means to a control mean with the
correlation between any two contrasts
being .50. Controls error rate familywise.
Post Hoc Tests (cont.)
– Newman-Keuls: Controls the error rate at
alpha for any ordered set of means and takes
into account the distance between the
means. Control is between familywise and
per contrast.
– Scheffe's: One of the most stringent and
therefore least powerful tests. Holds Type I
error rate at alpha for all possible contrasts.
Can be used with unequal n’s and for other
than pairwise comparisons.
– Duncan's multiple range test: Similar in form to
Newman-Keuls but uses a less stringent criterion,
so its control of the Type I error rate is weaker.
Type I Error Rate
• Post-hoc tests were developed to deal
with a problem associated with running
multiple significance tests.
• Whenever more than one hypothesis test
is run on the same data, the nominal or
overall Type I error rate increases. The
amount of increase is described by a
simple formula:
True alpha = 1 - (1 - α)^C
• Where C = the number of significance
tests run.
Type I Error Rate
• As the number of tests increases, the
Type I error rate also increases. If you
run 5 significance tests each at the .05
level, the true error rate will be
1 - (1 - .05)^5 = .2262.
• It is easy to see that this is a problem in
the ANOVA contexts since determining
which groups differ from one another
requires the application of multiple
significance tests.
Type I Error Rate
• In response, the various post hoc tests were
designed to hold the Type I error rate at the
nominal level across an entire set of
significance tests.
• To better understand issues having to do with
the Type I error rate in ANOVA models, one
must first understand that the error rate can
be defined at several different levels:
– Experiment-wise or across all possible
comparisons in a study.
– Family-wise or across all possible comparisons for
one factor or effect (e.g., an interaction).
– Per contrast or comparison.
Post Hoc Tests (cont.)
• Setting the Type I error rate per contrast is like
setting no control at all.
• Setting the error rate experiment-wise is
generally too conservative for most
applications.
• In most cases, the error rate is set family-wise.
– Note: Fisher's LSD is tantamount to no control at
all and should never be used as a post hoc test for
this reason.
– Scheffe’ is the most conservative and can also be
used for other than pairwise comparisons.
Example using Tukey’s HSD
• Tukey's HSD is specially designed to control or
maintain the Type I error rate across all possible
pairwise comparisons at the .05 level. The formula for
Tukey's is:
tHSD = q(.05, p) × √(MSWG / n)
• q is given in a Table in your book. For our problem,
using the .05 level, q = 4.04:
tHSD = 4.04 × √(2882 / 10)
     = 68.58
• 68.58 represents the value that the difference between
any two pairs of means must exceed for that difference
to be considered significant.
        M1     M2     M3     M4     M5
M1       -     -5     -7    -29    -51
M2              -     -2    -24    -46
M3                     -    -22    -44
M4                           -     -22
M5                                   -
In this example none of the means using
Tukey's HSD are significantly different as
none of the absolute values of the
difference scores exceeds the critical
difference value of 68.58.
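A Python sketch of the same Tukey HSD calculation is shown below (an illustration, not from the slides; it assumes SciPy 1.7+ for scipy.stats.studentized_range, whose exact q of about 4.02 differs slightly from the tabled 4.04):

import numpy as np
from scipy.stats import studentized_range

ms_wg, n, p, df_wg = 2882.0, 10, 5, 45
q_crit = studentized_range.ppf(0.95, p, df_wg)   # ~4.02
hsd = q_crit * np.sqrt(ms_wg / n)                # critical difference, ~68

means = np.array([279, 284, 286, 308, 330], dtype=float)
diffs = np.abs(means[:, None] - means[None, :])  # all pairwise |mean differences|
print(round(hsd, 2), bool((diffs > hsd).any()))  # no pair exceeds the HSD here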
Tukey’s HSD (cont.)
• Sometimes, a pair of means may be
significantly different using a post hoc test
when the omnibus F-test tells you that there
are no differences or vice versa.
– Can happen since the two tests sometimes use a
different error term and d.f. or the means for one
pair of groups may be sufficiently different but the
difference is lost when SS are grouped for the
overall test.
– You should decide which to interpret based on your
best judgment as to which level of the Type I error
rate is most appropriate.
ω², η², and ρ
• These three statistics represent the proportion of the
total variance in the dependent variable explained by
the independent variables included in a design.
– Omega hat squared (ω̂²) is used to estimate the proportion of
variance in the dependent variable explained by the fixed-effect
independent variables in the population.
– Rho (ρ) is the equivalent of ω̂² but for random effects.
– Eta-squared (η²) is the proportion of variance explained in the
sample. Eta is similar to the Pearson r² but indexes both linear and
nonlinear association.
– These statistics are called measures of the “strength of
association” (between the IVs and DV).
Omega Hat Squared, rho, and
Eta-Squared cont.
• These statistics also provide additional
information to that of the significance
tests, allowing a researcher to determine
how meaningful the results of the
statistical tests are.
Omega Squared
• Formula for a one-way design:

ω̂² = [SSBG - (k - 1)MSWG] / (SSTOT + MSWG),   where k = # of groups

For our problem this is:

ω̂² = [18,232 - (5 - 1)(2882)] / (147,992 + 2882)
   = 6,704 / 150,874
   = .044, or approximately 4% of the variance in latency to
     protest is explained by the various types of treatment
     included in this design.
Rho (ρ)
• Our example includes a fixed effect.
• If the effect had been random, we would
have calculated rho which is given by
the following formula:
MSBG –MSWG
ρ = -------------------------------
MSBG + (n - 1)MSWG
Eta-Squared
η² = SSBG / SSTOT

For our example:
η² = 18,232 / 147,992
   = .1232, or about 12% of the sample variance in latency
     scores is explained by the treatments.
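For completeness, a small Python sketch computing all three strength-of-association measures from the CR-5 summary values (rho is shown even though the example involves a fixed effect):

ss_bg, ss_tot = 18232.0, 147992.0
ms_bg, ms_wg  = 4558.0, 2882.0
k, n = 5, 10

eta_sq   = ss_bg / ss_tot                                # ~ .123
omega_sq = (ss_bg - (k - 1) * ms_wg) / (ss_tot + ms_wg)  # ~ .044 (fixed effects)
rho      = (ms_bg - ms_wg) / (ms_bg + (n - 1) * ms_wg)   # random-effects analogue
print(round(eta_sq, 3), round(omega_sq, 3), round(rho, 3))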
A Note…
• As with the post hoc tests, the strength
of association measures are often
computed only when the F-tests are
significant to help index the importance
of the result.
• However, they can also be calculated as
an index of effect size to help understand
why a test failed to reach significance.
• Therefore, not a bad idea to always
calculate them.
Contrasts
Contrasts
• A contrast or comparison among means is
simply a difference among the means with
appropriate algebraic signs
ψ1 = (-1*Y.1) + (1*Y.2) = a comparison between the
means of groups 1 & 2.
ψ2 = (-1*Y.1) + (-1*Y.2) + (2*Y.3) = a comparison of
groups 1 & 2 vs. group 3.
• In general, contrasts have the form:
ψi = c1Ȳ.1 + c2Ȳ.2 + . . . + cpȲ.p
Contrasts
• Contrasts can be preplanned or post hoc
and pairwise or non-pairwise.
• The number of pairwise comparisons that
can be defined for any set of means is
equal to:
p(p - 1) / 2
where p = the number of means
• Contrasts can also be orthogonal or
nonorthogonal .
Orthogonal vs. Nonorthogonal
• Contrasts are orthogonal if the following
equality holds:

Σj Cij Ci'j = 0   (summing over j = 1 … p)

for the equal-n case, or:

Σj Cij Ci'j / nj = 0

for the unequal-n case.
Orthogonal vs. Nonorthogonal Contrasts
• If the contrasts are nonorthogonal, they are correlated,
with the correlation given by the following formula:

ρii' = (Σj Cij Ci'j / nj) / √[(Σj Cij² / nj)(Σj Ci'j² / nj)]

Thus the pairwise contrasts:

-1  1  0  0
-1  0  1  0
-1  0  0  1

would not be considered orthogonal, since the products of their
coefficients do not sum to 0 (for the first two contrasts the products
are 1, 0, 0, 0, which sum to 1; each pair of these contrasts is
correlated .5 if n = 10).
On the other hand, the following pairwise
contrasts are orthogonal:

-1  1  0  0
 0  0 -1  1

(the products of corresponding coefficients are 0, 0, 0, 0, which sum to 0).
• The number of orthogonal contrasts is always
equal to p-1 or the number of levels of a factor
minus 1 (or equal to the d.f. for a factor).
• For our example with 5 means, we should be
able to define 4 contrasts that are orthogonal.
• One possible combination of such contrasts
would be the following:
4 -1 -1 -1 -1
0 3 -1 -1 -1
0 0 2 -1 -1
0 0 0 1 -1
• The sum of squares for any possible set of
p-1 orthogonal contrasts when summed
will always equal the total SSBG.
– This is not true for nonorthogonal contrasts
which will sum to more than the SS associated
with a factor (and can cause software to fail).
• Note that each contrast always has only a
single numerator degree of freedom.
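A quick way to check a proposed set of contrasts is to verify that every pair satisfies the orthogonality condition; the Python sketch below (an illustration, with a made-up helper name) does this for the two contrast sets discussed above:

import numpy as np
from itertools import combinations

def is_orthogonal(contrasts, n=None):
    """contrasts: one contrast per row; n: per-group sizes (equal n if None)."""
    C = np.asarray(contrasts, dtype=float)
    w = np.ones(C.shape[1]) if n is None else 1.0 / np.asarray(n, dtype=float)
    return all(np.isclose((C[i] * C[j] * w).sum(), 0.0)
               for i, j in combinations(range(len(C)), 2))

print(is_orthogonal([[-1, 1, 0, 0], [-1, 0, 1, 0], [-1, 0, 0, 1]]))   # False
print(is_orthogonal([[4, -1, -1, -1, -1], [0, 3, -1, -1, -1],
                     [0, 0, 2, -1, -1], [0, 0, 0, 1, -1]]))           # True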
Hypothesis Tests
Using Contrasts
A Priori Test Using the t-Statistic and a Contrast
• Ho: ψi = 0
• H1: ψi ≠ 0
• The t-statistic is given by:

t = ψ̂i / σ̂ψi = (Σj Cj Ȳ.j) / √(MSE · Σj Cj² / nj)

  = (C1Ȳ.1 + C2Ȳ.2 + . . . + CpȲ.p) / √[MSE((C1²/n1) + (C2²/n2) + ... + (Cp²/np))]
• Where MSE = the error term for the appropriate effect
Example of “User Specified” (a priori / preplanned
or a posteriori) Tests of Hypotheses
• Use an extension of the t-test:

t = ψ̂i / σ̂ψi = (c1Ȳ1 + c2Ȳ2 + c3Ȳ3 + ... + cpȲp) / √[MSWG(c1²/n1 + c2²/n2 + c3²/n3 + ... + cp²/np)]
Example (cont)
• All contrasts of this type have only a
single degree of freedom for the
numerator and degrees of freedom for
the denominator equal to those
associated with the MSWG term.
• Always keep in mind that the MSWG term
to use for the contrast is the same term
associated with the overall test of the
factor whose mean differences are being
tested with the contrast.
For Our Example (1st Contrast)

t = ψ̂i / σ̂ψi
  = [-2(279) + -2(284) + -2(286) + 3(308) + 3(330)] / √[2882(4/10 + 4/10 + 4/10 + 9/10 + 9/10)]
  = (-558 - 568 - 572 + 924 + 990) / √[2882(.4 + .4 + .4 + .9 + .9)]
  = 216 / √[2882(3)]
  = 216 / 92.98
  = 2.323

• Note: t² = F when there is only 1 d.f. in the numerator, so F = (2.323)² = 5.396.
• At 1 and 45 degrees of freedom, the tabled F-crit = 4.04, so this contrast exceeds the critical value.
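The same contrast test is easy to reproduce from the summary statistics; a brief Python sketch (not from the slides) follows, and the two-tailed p-value confirms the comparison against the critical value:

import numpy as np
from scipy import stats

means = np.array([279, 284, 286, 308, 330], dtype=float)
c     = np.array([-2, -2, -2, 3, 3], dtype=float)     # contrast from the example
ms_wg, n, df_wg = 2882.0, 10, 45

psi_hat = (c * means).sum()                    # 216
se_psi  = np.sqrt(ms_wg * (c ** 2 / n).sum())  # ~92.98
t_obs   = psi_hat / se_psi                     # ~2.32
p_val   = 2 * stats.t.sf(abs(t_obs), df_wg)
print(round(t_obs, 3), round(t_obs ** 2, 3), round(p_val, 4))  # t^2 = F with 1 numerator d.f.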
The Bonferroni Inequality
• Used to control the Type I Error rate
when hypothesis testing using multiple
contrasts.
• As we showed earlier, when you run a
series of significance tests, the Type I
error rate does not remain at the .05
or .01 levels across the tests, but actually
increases as a function of the number of
comparisons that you are making.
The Bonferroni Inequality (cont.)
Recall the formula:

1 - (1 - α)^C

where α signifies the Type I error rate for each
test and C equals the number of comparisons
made.
e.g., if we ran 6 significance tests each at
the .05 level, our actual error rate across the
comparisons would be:

1 - (1 - .05)^6 = .2649

a much higher value than we would ever want to use.
The Bonferroni Inequality (cont.)
• To control the Type I error rate across a set of
contrasts it is possible to apply the Bonferroni
inequality to determine the proper significance
level to test each comparison or contrast at:
error rate = .05 / C = .05 / 6
           = .0083
• This controls the Type I error rate across the
set of contrasts so that it will not exceed our
specified level of .05 (or .01) since:
1 - (1 - .0083)^6 = .0488 < .05
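The arithmetic of the adjustment is simple enough to show in a few lines of Python (an illustration of the formulas above, not part of the slides):

alpha, C = 0.05, 6

familywise = 1 - (1 - alpha) ** C     # ~ .2649 with no adjustment
per_test   = alpha / C                # Bonferroni-adjusted level, ~ .0083
check      = 1 - (1 - per_test) ** C  # ~ .049, stays at or below .05
print(round(familywise, 4), round(per_test, 4), round(check, 4))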
The Bonferroni Inequality (cont.)
• The Bonferroni adjustment is very
conservative as with increasing tests, it
sets a very stringent level for the Type I
error rate.
• For this reason it has received increasing
criticism of late.
• Alternatives to the Bonferroni have been
proposed and should be considered
when appropriate.
Scheffe’s S Test
• Scheffe’s S test is another type of post
hoc test that is described by Kirk (1982)
as “…one of the most flexible,
conservative, and robust data snooping
procedures available.” (p. 121)
• Can be used for all possible contrasts not
just pairwise.
• Can be used with unequal n’s.
Scheffe’s S Test
• Error rate is set experimentwise for an
infinite number of possible contrasts.
• Most conservative of the post hoc tests.
• Less powerful than Tukey’s HSD.
• Uses the F-distribution and is robust
against violations of the normality and
homogeneity of variances assumption.
Scheffe’s cont.
• The critical difference a contrast must exceed is:

Ψ̂(S) = √[(p - 1) · Fα,ν1,ν2] × √[MSError · Σj(Cj² / nj)]

Where
p = # of means
Fα,ν1,ν2 is taken from the F-table with ν1 = p - 1 and ν2 = the MSError d.f.
Scheffe’ cont.
Example using the contrast previously tested (-2, -2, -2, 3, 3):

Ψ̂(S) = √[(p - 1) · Fα,ν1,ν2] × √[MSError · Σj(Cj² / nj)]

p = 5
Fα,ν1,ν2 = F(.05, 4, 45) = 2.5975

Ψ̂(S) = √[(5 - 1)(2.5975)] × √[2882 × ((-2)²/10 + (-2)²/10 + (-2)²/10 + (3)²/10 + (3)²/10)]
     = √10.39 × √[2882 × (.4 + .4 + .4 + .9 + .9)]
     = 3.223 × √[2882 × 3]
     = 3.223 × 92.98
     ≈ 299.7
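The same Scheffé critical difference can be computed in Python (a sketch, not from the slides; using the exact F quantile of about 2.58 gives a value near 299, slightly different from the hand calculation with the tabled F):

import numpy as np
from scipy import stats

p, n, df_wg, ms_wg = 5, 10, 45, 2882.0
c = np.array([-2, -2, -2, 3, 3], dtype=float)

F_crit = stats.f.ppf(0.95, p - 1, df_wg)                               # ~2.58
psi_S  = np.sqrt((p - 1) * F_crit) * np.sqrt(ms_wg * (c ** 2 / n).sum())
print(round(F_crit, 4), round(psi_S, 2))                               # critical difference ~299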
Trend Analysis and Orthogonal
Polynomial Contrasts
• If you have a factor in an ANOVA model that has
three or more levels, it is also possible to
conduct a trend analysis using orthogonal
polynomial contrasts:
If you have:       you can conduct tests for:
2 means            linear trend only
3 means            linear & quadratic trends
4 means            linear, quadratic, & cubic trends
5 means            linear, quadratic, cubic, & quartic trends
etc.
• The contrasts to use might look like this:
linear -3 -1 1 3
quadratic 1 -1 -1 1
cubic -1 3 -3 1
• Some software packages, like SAS, have
procedures to generate trend coefficients
for any number of means. Others, like
SPSS, will run a trend analysis as part of
the general output.
Steps in Conducting a Trend Analysis
1. Calculate the SSBG associated with a
linear trend component
2. Test the residual or remaining SS for
any further departures from linearity
3. Calculate the SSBG associated with a
quadratic trend component
4. Repeat step 2
5. Evaluate SS with next higher-order
trend.
6. Etc.
Example of a Linear Trend
• Assume that we have run an experiment and
have found the sums across 4 treatment levels
to be equal to 22, 28, 50, and 72.
• Assume that we also calculated the SSWG and
find it to be equal to 41.0 and SSBG = 194.5.
• Further assume that each level of the IV has 8
participants.
• Our linear trend contrast coefficients are:
-3 -1 1 3
Example (cont)
• Our linear contrast would be:

ψ̂lin = Σj (C1j · Σi Yij)
     = -3(22) + -1(28) + 1(50) + 3(72) = 172

Note: the i in Cij refers to the number of the trend
component the set of contrasts is associated with
(i.e., linear = 1, quadratic = 2, etc.).
Compute the Sums of Squares

SSψlin = ψ̂²lin / (n ΣjC²1j)
       = (172)² / (8[(-3)² + (-1)² + (1)² + (3)²])
       = 184.9
d.f. = 1

• SSψ dep. from lin = SSBG - SSψlin
                   = 194.5 - 184.9 = 9.6
d.f. = p - 2 = 2
Compute the Sums of Squares

Note: if using the group means instead of
the sums, use the following formula
instead:

SSψlin = n[Σj (Cij Ȳ.j)]² / ΣjC²ij
Compute the F-ratio for Both

F = MSψlin / MSWG = (SSψlin / 1) / (SSWG / p(n - 1)) = (184.9 / 1) / (41.0 / 28)
  = 126.3 = F obtained for the linear trend

• d.f. = 1 for the numerator and p(n - 1) = 28 for the denominator
• F-crit(1, 28) = 4.20
Compute the F-ratio (cont.)

F = MSψ dep. from lin / MSWG = (SSψ dep. from lin / 2) / (SSWG / p(n - 1)) = (9.6 / 2) / (41.0 / 28)
  = 3.28 = F obtained (any other trend)

• d.f. = 2, p(n - 1) = 28
• F-crit(2, 28) = 3.34
Trend Analysis Example Cont.
• Based on the F-tests, we would
conclude that there is a significant
linear trend but that there is no
higher-order trend present.
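The whole trend analysis above fits in a short Python sketch (an illustration, not from the slides, using the example's sums and sums of squares):

import numpy as np
from scipy import stats

sums  = np.array([22, 28, 50, 72], dtype=float)  # treatment-level sums from the example
n, p  = 8, 4
ss_bg, ss_wg = 194.5, 41.0
c_lin = np.array([-3, -1, 1, 3], dtype=float)    # linear trend coefficients

psi_lin  = (c_lin * sums).sum()                       # 172
ss_lin   = psi_lin ** 2 / (n * (c_lin ** 2).sum())    # ~184.9
ss_resid = ss_bg - ss_lin                             # ~9.6, departure from linearity
ms_wg    = ss_wg / (p * (n - 1))                      # 41.0 / 28
F_lin, F_resid = (ss_lin / 1) / ms_wg, (ss_resid / (p - 2)) / ms_wg
print(round(F_lin, 1), round(F_resid, 2),             # ~126.3 and ~3.28
      round(stats.f.ppf(0.95, 1, 28), 2), round(stats.f.ppf(0.95, 2, 28), 2))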
Additional Post Hoc Tests
• Newman-Keuls
• Tukey Kramer
• Scheffe’
• The Scheffe’ test is one of special interest
since it is the only post-hoc test apart from
ones specifically defined by the user that can
test other than pairwise comparisons. It also
offers the most stringent Type I error rate of all
of the post-hoc tests.
Power of the F-test
• Effect sizes in the ANOVA context are typically calculated
in one of two ways:
– As a standardized average difference between the
group means
– Using eta-squared (η²) and omega-hat squared (ω̂²)
• In the case of number 1, the effect size is first
calculated and translated into a value known as Phi
and then special tables are referenced to estimate
power and sample size.
• The second approach is much easier to implement.
Here one runs the analysis, calculates eta-squared or
omega-hat squared (for fixed effects) or the intra-class
correlation coefficient (for random effects).
Remember…

η² = SSBG / SSTOT

ω̂² = σ̂²BG / (σ̂²BG + σ̂²WG)
   = [SSBG - (a - 1)(MSWG)] / (SSTOT + MSWG)
   = (a - 1)(F - 1) / [(a - 1)(F - 1) + a·n]
Power (cont.)
• Once the effect size has been
calculated, it is a simple matter to turn
it into another useful value using the
following formula:
f = √[ω̂² / (1 - ω̂²)]

(f is a symbol used by Jacob Cohen to
index a value that can be used to
estimate power and sample sizes.)
Conventions for Effect Sizes (Cohen)
f = .10 = small effect size
f = .25 = medium effect size
f = .40 = large effect size
(equivalently, ω² of roughly .01, .06, and .14)
• After calculating f, one can then use
information from published tables in Cohen to
calculate power and sample size or use
existing computer software such as G-Power
• Note that there is an option available in SPSS's
GLM procedure which will output power values
for main and interaction effects for any ANOVA
design model.
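As a closing illustration (not part of the slides), the conversion from ω̂² to Cohen's f and a rough power estimate can be scripted; the statsmodels call below assumes nobs is the total sample size across groups:

import numpy as np
from statsmodels.stats.power import FTestAnovaPower

omega_sq = 0.044                           # from the CR-5 example above
f = np.sqrt(omega_sq / (1 - omega_sq))     # Cohen's f, about 0.21

power = FTestAnovaPower().power(effect_size=f, nobs=50, alpha=0.05, k_groups=5)
print(round(f, 3), round(power, 3))        # power is quite low for an effect this small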

More Related Content

PPT
anova & analysis of variance pearson.ppt
PDF
07. Repeated-Measures and Two-Factor Analysis of Variance.pdf
PPTX
Repeated-Measures and Two-Factor Analysis of Variance
PPT
1 ANOVA.ppt
DOCX
(Individuals With Disabilities Act Transformation Over the Years)D
PPTX
Anova - One way and two way
PPTX
Correlation and Regression - ANOVA - DAY 5 - B.Ed - 8614 - AIOU
PPTX
mean comparison.pptx
anova & analysis of variance pearson.ppt
07. Repeated-Measures and Two-Factor Analysis of Variance.pdf
Repeated-Measures and Two-Factor Analysis of Variance
1 ANOVA.ppt
(Individuals With Disabilities Act Transformation Over the Years)D
Anova - One way and two way
Correlation and Regression - ANOVA - DAY 5 - B.Ed - 8614 - AIOU
mean comparison.pptx

Similar to Stactistics: Analysis of Variance Part I (20)

PPTX
mean comparison.pptx
PPTX
3.4 ANOVA how to perform one way and two way ANOVA.pptx
PPTX
Parametric & non-parametric
PDF
Research 101: Inferential Quantitative Analysis
PPTX
ANOVA theory qadm pptx in qualitative decision making
PDF
Analysis of Variance (ANOVA)
PPTX
Shovan anova main
PPTX
Parametric test - t Test, ANOVA, ANCOVA, MANOVA
PPTX
Workshop on Data Analysis and Result Interpretation in Social Science Researc...
PPTX
Introduction to Analysis of Variance
PPTX
QM Unit II.pptx
PPT
PDF
Understanding ANOVA Tests: One-Way and Two-Way
PDF
Analysis of Variance
PPTX
Mean comparison2
PPTX
F unit 5.pptx
PDF
Repeated Measures ANOVA
PPT
CHAPTER 2 - NORM, CORRELATION AND REGRESSION.ppt
PDF
Analysis of Variance
PPTX
ANOVA Parametric test: Biostatics and Research Methodology
mean comparison.pptx
3.4 ANOVA how to perform one way and two way ANOVA.pptx
Parametric & non-parametric
Research 101: Inferential Quantitative Analysis
ANOVA theory qadm pptx in qualitative decision making
Analysis of Variance (ANOVA)
Shovan anova main
Parametric test - t Test, ANOVA, ANCOVA, MANOVA
Workshop on Data Analysis and Result Interpretation in Social Science Researc...
Introduction to Analysis of Variance
QM Unit II.pptx
Understanding ANOVA Tests: One-Way and Two-Way
Analysis of Variance
Mean comparison2
F unit 5.pptx
Repeated Measures ANOVA
CHAPTER 2 - NORM, CORRELATION AND REGRESSION.ppt
Analysis of Variance
ANOVA Parametric test: Biostatics and Research Methodology
Ad

Recently uploaded (20)

PDF
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
PPTX
7. General Toxicologyfor clinical phrmacy.pptx
PPTX
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
PPTX
2Systematics of Living Organisms t-.pptx
DOCX
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
PDF
An interstellar mission to test astrophysical black holes
PPTX
Comparative Structure of Integument in Vertebrates.pptx
PPTX
2. Earth - The Living Planet earth and life
PDF
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
PPT
POSITIONING IN OPERATION THEATRE ROOM.ppt
PPTX
The KM-GBF monitoring framework – status & key messages.pptx
PPTX
Microbiology with diagram medical studies .pptx
PDF
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
PDF
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
PDF
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
PDF
lecture 2026 of Sjogren's syndrome l .pdf
PDF
Biophysics 2.pdffffffffffffffffffffffffff
PDF
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
PPTX
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
PDF
Phytochemical Investigation of Miliusa longipes.pdf
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
7. General Toxicologyfor clinical phrmacy.pptx
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
2Systematics of Living Organisms t-.pptx
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
An interstellar mission to test astrophysical black holes
Comparative Structure of Integument in Vertebrates.pptx
2. Earth - The Living Planet earth and life
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
POSITIONING IN OPERATION THEATRE ROOM.ppt
The KM-GBF monitoring framework – status & key messages.pptx
Microbiology with diagram medical studies .pptx
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
lecture 2026 of Sjogren's syndrome l .pdf
Biophysics 2.pdffffffffffffffffffffffffff
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
Phytochemical Investigation of Miliusa longipes.pdf
Ad

Stactistics: Analysis of Variance Part I

  • 2. Why study Analysis of Variance (ANOVA)? • ANOVA technology was developed by R. A. Fisher for use in the agricultural trials being run at the Rothamstad Agricultural Research Station where he was employed for a period of time. • The research was concerned with testing the effects of different types of organic and inorganic fertilizers on the crop yield. • For this reason, many terms used to describe ANOVA models still reflect the terminology used in the agricultural settings (e.g., split-plot designs).
  • 3. • ANOVA methods were specifically developed for the analysis of data from experimental studies. • The models assume that random assignment has taken place as a means of assuring that the IID assumptions of the statistical test are met. • These designs are still the staple of certain social science disciplines where much experimental research is conducted (e.g., psychology). Why study Analysis of Variance (ANOVA)?
  • 4. • But what is the utility for other disciplines? • There are several reasons why knowledge of ANOVA is important: – Clinical trials in all areas of research involving human participants still employ these designs. – Many of the techniques employed in testing hypotheses in the ANOVA context are generalizable to other statistical methods. – There are other statistical procedures, e.g., variance components models that use a similar approach to variance decomposition as is employed in ANOVA analysis and can be more readily understood with an understanding of ANOVA. – Possible involvement in interdisciplinary research. Why study Analysis of Variance (ANOVA)?
  • 5. • ANOVA models distinguish between independent variables (IVs), dependent variables (DVs), blocking variables, and levels of those variables. • All IVs and Blocking variables are categorical. DVs are continuous and IID normally. • We have already discussed IVs and DVs. • Blocking variables are variables needing to be controlled in an analysis that are not manipulated by the researchers and hence not true IV’s (e.g., gender, grade in school, etc.). Overview of ANOVA
  • 6. “Levels” of IVs and Blocking Variables • Each IV and Blocking variable in an ANOVA model must have two or more “levels.” – Example: the independent variable may be a type of therapy, a drug, the induction of anger or frustration or some other experimental manipulation. • Levels could include different types of therapies or therapies of differing degrees of intensity, different doses of a drug, different induction procedures, etc. • It is assumed that participants are randomly assigned to levels of the IV but not to levels of any blocking variables.
  • 7. Overview of ANOVA • IVs and Blocking variables are referred to as “Factors” and each factor is assumed to have two or more levels. • In ANOVA we distinguish between between groups and within groups factors.
  • 8. Between vs. Within Factors • A between groups factor is one in which each subject appears at only one level of the IV. • A within groups factor (of which a repeated measures factor is an example), is one in which each subject appears at each level of the IV. • It is possible to have a design with a mixture of between and within groups factors or effects.
  • 9. Fixed vs. Random Effects: Expected Mean Squares (EMS) • Effects or factors are fixed when all levels of an IV or blocking variable we are interested in generalizing to are included in the analysis. • Effects are random when they represent a sampling of levels from the universe of possible values.
  • 10. Examples of Random and Fixed Effects • Drug dosages of 2, 4, or 10 mg – random: since not all levels are represented. • Different raters providing observational ratings of behavior – random • Gender - male and female – fixed • MST treatment versus usual services – fixed
  • 11. Fixed vs. Random Effects (cont.) • The distinction between fixed and random effects is important since it has implications for the way in which the treatment effects are estimated and the generalizability of the results
  • 12. • For fixed effect models, we have complete information (all levels of a variable or factor are observed) and we can calculate the effect of the treatment by taking the average across the groups present. • In the case of random factors, we have incomplete information, (not all levels of the factor are included in the design). For random factors, we are estimating the treatment effect at the level of the population given only the information available from the levels we included in our design. The formulas are designed to represent this uncertainty.
  • 13. Fixed versus Random Effects (cont.) • In the case of a fixed effect, we can generalize the results only to the levels of the variables included in our analyses. • Random effects assume that the results will be generalized to other levels between the endpoint values included in our analyses.
  • 14. “Levels” of IVs and Blocking Variables • The ANOVA model is a “means model” – i.e., it assumes that any observed differences in behavior can be completely described using only information in the means. • The ANOVA model is also a population- averaged model. • It evaluates the effects of treatments and blocking variables at the group rather than at the individual level.
  • 15. Hypothesis Testing • ANOVA involves the same 4-step hypothesis testing procedure we applied in the case of the z-test, t-tests, and tests for the correlation coefficients. • We will, however, use a different sampling distribution to determine the critical values of our statistic. • This sampling distribution is called the F-distribution and the significance test is now called an F-test.
  • 16. F-test Basics • F-statistics are formed as the ratio of two chi- square distributions divided by their respective degrees of freedom: χ2 /d.f.1 F(1,d.f.Denom) = t2 = -------------- χ2 /d.f.2 • As a result, unlike the t-distribution the shape of which was determined by one degree of freedom parameter, the F-distribution is determined by two degree of freedom parameters. • When there are only 2 groups, F is equal to t2 .
  • 17. The F Statistic (cont.) • A very important property of the F-test under the null hypothesis of no differences between the groups is that, in theory, the numerator and denominator are independent estimates of the same population variance. • However, the denominator measures only “error” or “noise” while the numerator measures both error and treatment effect. • Under the null hypothesis of no effect of treatment, the expected value of the F-statistic is 1.0. As the treatment effect increases in size, F becomes greater than 1.0 • Note: Although in theory F should never be less than 1.0, with “real” data it will fall below 1.0 at times.
  • 18. Error Term • The error term in the denominator of the F- statistic is an extension of the two sample t-test error term. – In the two sample t-test we saw that since two independent estimates of the population variance were available - one from each sample - we could improve on the estimate of the population parameter by averaging across the two estimates. The error term which resulted was called a “pooled error term.” • In ANOVA, we will have at least two but possibly three or more groups. Regardless, the process is the same. To improve on our estimate of the population parameter, we pool the variance estimates together - one from each cell or sample - and use this mean squared error as the error term in our F-test.
  • 19. Tabled Values of F • The critical values in the table for the F-statistic are non-normally distributed and include only the values at the upper tail of the distribution. • Lower values can be obtained by taking the reciprocal of the tabled value of F, i.e., 1/F but these values are rarely used. • The F-distribution changes shape depending on the numerator and denominator degrees of freedom as can be seen in the next slide:
  • 21. ANOVA Hypotheses • Ho: μ1 = μ2 = μ3 • H1: μ1 ≠ μ2 ≠ μ3 • As with the other statistics covered in this course, the F-test can be run working from definitional formulas or computational formulas. • We will work through the definitional formulas in class examples. • One reason for this is that it is easy to calculate the statistic using these formulas. More importantly, it is easier to see what each component of the statistic represents.
  • 23. One-Way ANOVA Design Model • Simplest ANOVA model – single factor: Yij = μ + αj + εij • Where i = person and j=group. • This model says that each person's score can be described by: – μ, the overall or grand mean – αj, an average group level treatment effect, and – εij, a parameter describing each individual's deviation from the average group effect.
  • 24. The F-statistic • For all ANOVA designs, the F-statistic is comprised of two parts: – Numerator: a component known as the “mean square between groups” – Denominator: “mean square within groups” – We form a ratio of these two variance terms to compute the F statistic: MSBG SSBG / d.f.BG F = --------- = --------------- MSWG SSWG / d.f.WG
  • 25. The F-statistic (cont.) • The trick is to correctly estimate the variance components forming the numerator and denominator for different combinations of fixed and random effects. • The correct formulas to use are based on the statistical theory underlying ANOVA and are derived using what are termed “expected mean square” formulas.
  • 26. Components of the F-Statistic • We define the numerator and denominator of the F-test in terms of their expected values (the values that one should obtain in the population if Ho were true). • The expected mean squares for the two types of effects in the CR-p design are given by: Model I Model II (Fixed effect) (Random effect) _____________________________________________ MSBG = σε 2 + nΣαj 2 / (p - 1) σε 2 + n(1 - p/P)σα 2 MSWG = σε 2 σε 2 ________________________________________
  • 27. Forming the F-statistics for the two possible design models: σε 2 + nΣαj 2 / (p - 1) F (Model 1) = ---------------------------- σε 2 σε 2 + n(1 - p/P)σα 2 F (Model 2) = ---------------------------- σε 2
  • 28. Calculating the Variance Components J _ _ SSBG = n Σ (X.j - X..)2 j=1 J I _ J ^ SSWG = Σ Σ (Xij - X.j)2 or Σ s2 (n-1) j=1 i=1 j=1 MSBG = SSBG/d.f.BG MSWG = SSWG/d.f.WG
  • 29. Variance Components (cont). • If we add the SSBG and SSWG terms together, we have the Total Sums of Squares or SSTOT: SSTOT = SSBG+SSWG = ΣΣ(Yij – Y..)2 • The ANOVA process represents a “decomposition” of the total variation in a DV into “components” of variation attributable to the factors included in the model and a residual or error component. • In the CR-p design, SSBG is the variance in the DV that can be “explained” by the IV and SSWG is the error, residual, or unexplained variation left over after accounting for SSBG.
  • 30. Assumptions • ANOVA makes the same assumptions as the t- test. • F assumptions: – The dependent variable is from a population that is normally distributed. – The sample observations are random samples from the population. – The numerator and denominator of the F-test are estimates of the same population variance. – The numerator and denominator are independent.
  • 31. Assumptions (cont.) • Model Assumptions: – The model equation (design model) reflects all sources of variation affecting a DV and each score is the sum of several components. – The experiment contains all treatment levels of interest – The error effect is independent of all other errors and is normally distributed within each treatment population with a mean equal to 0 and variance equal to σ2 (homogeneity of variances assumption)
  • 32. Assumptions (cont.) • Note that the assumptions of ANOVA are frequently violated to at least some extent with real-world data. • Generally the violations have little effect on significance or power if: – The data are derived through random sampling – The sample size is not small – Departures from normality are not large – n’s within each cell are equal.
  • 33. Violations of Assumptions • The F-test is robust to violations of the homogeneity of variances assumption. When the violation is extreme, however, the result is an incorrect Type I error rate (It will become increasingly inflated as the violation becomes more severe). • As we noted with the t-test, if the group sizes are equal, the impact of heterogeneity is of less concern.
  • 34. Violations of Assumptions • The effect on the error term will be liberal when the largest cell sizes are associated with the smallest variances and conservative if the largest cell sizes are associated with the largest variances. • In most cases, problems arise when the cell or group sizes are very different.
  • 35. Normalizing Transformations • When the data are more severely skewed, kurtotic, or both, or the homogenity of variances assumption has been violated, it is sometimes necessary to apply a normalizing transformation to the DV to allow the analysis to be run.
  • 36. Transformations • Normalizing transformations help to accomplish several things: – Homogeneity of error variances – Normality of errors – Additivity of effects, i.e., effects that do not interact (as is desirable, for example, in a repeated measures ANOVA design). By transforming the scale of measurement, additivity can sometimes be achieved. – We will talk about additivity more in the context of repeated measures ANOVA.
  • 37. Types of Normalizing Transformations • Square Root – Y'=√Y – Use when treatment level means and variances are proportional (moderate positive skew)
  • 38. Types of Normalizing Transformations (Cont.) • Log 10 – Y'=log10(Y+1) – Use when Tx means and SDs are proportional (more extreme positive skew)
  • 39. Types of Normalizing Transformations (Cont.) • Angular or Inverse Sine – Y'=2*arcsin(√Y) – Use when means and variances are proportional and underlying distribution is binomial. – Often used when the DV represents proportions.
  • 40. Types of Normalizing Transformations (Cont.) • Inverse or Reciprocal – Y'=1/Y – Use when squares of Tx means are proportional to SDs (severe positive skew or L-shaped distribution)
  • 41. Negatively Skewed Distributions • When the distribution is negatively skewed, one must first "reflex" the distribution and then apply the correct transformation. To reflex a set of scores, simply subtract each value from the highest value plus one unit. – e.g. an item is on a 1-5 scale, and we wish to reflex it (change it to a 5-1 scale) we would simply subtract each value from 6: Reflexed(y) = 6-y
  • 42. Selecting a Transformation • If you are unsure which transformation will work best, you can apply all possible transformations to the highest and lowest scores in each treatment level. • Calculate the range of values for each treatment level by subtracting the smallest score from the largest. • Form a ratio of the largest and smallest ranges for each transformation across Tx levels. • The transformation associated with the smallest ratio wins. • (See Kirk Experimental Design for an example)
  • 43. Additivity • If addititivity is of interest, use a test for nonadditivity such as that developed by Tukey (1949) and covered in many statistics texts. • Then select a transformation that reduces nonadditivity to an acceptable level.
  • 44. Example of a Completely Randomized Between Groups Design (CR-5) • One independent variable with five levels • The independent variable represents different types of stranger awareness training and the dependent variable the latency of the children to protest verbally about a stranger’s actions (measured in seconds).
  • 45. Example (cont.) Experimenter Mother Mother Role Control verbal verbal natural play __________________________________________________________ Mean 279 284 286 308 330 SD 50 53 51 56 58 n 10 10 10 10 10 __________________________________________________________ Y.. = 297.4 Can eyeball the SD’s relative to the sample sizes to see if the homogeneity assumption has been violated.
  • 46. CR-5 Example (cont.) • Ho: μ1 = μ2 = μ3 = μ4= μ5 • H1: μ1 ≠ μ2 ≠ μ3 ≠ μ4 ≠ μ5 • Use ~F(4,45) • Where dfnum = p-1 and dfden = p(n-1) • Set up decision rules (from F-table): – Fcrit(.05,4,45) = 2.61 – If Fobs > Fcrit then reject Ho
  • 47. Formulas • Calculate statistic and apply decision rules: J ^ SSWG = Σ S2 (n-1) = [(50)2 +(53)2 +(51)2 +(56)2 +(58)2 ]*9=129,690 j=1 J _ _ SSBG = nΣ (Xj - X..)2 j=1 = (10)(279-297.4)2 +(284-297.4)2 +(286-297.4)2 + (308-297.4)2 +(330-297.4)2 = (10)1823.2 = 18,232 SSTOT = SSBG+SSWG = 129,690+18,232 = 147,992
  • 48. Formulas (cont.) d.f.BG = p-1 = 5-1= 4 d.f.WG = p(n-1) = 5(10-1)= 45 MSBG = SSBG/d.f.BG = 18,232/4 = 4558 MSWG = SSWG/d.f.WG = 129,690/45 = 2882
  • 49. Formulas (cont.) MSBG F = ------ MSWG = 4558 / 2882 = 1.58 Fcrit(.05,4,45) = 2.61 Since F-obs < F-crit do not reject Ho. Conclude no effect due to treatment
  • 50. What if F was significant? • If you found that there was a significant difference between means using the F-statistic, you still do not know which of the groups were significantly different from the others. • This is because the F-test is a simultaneous or omnibus test of the difference between all possible combinations of group means. • In the case of the two-sample t-test, this was easy to resolve by looking at the group means. In the case of three or more groups, this is not as easy to determine. • For this reason, it is necessary to introduce what are known as post-hoc tests as a follow-up to the F-test.
  • 51. Post Hoc Tests • Post hoc tests take many forms. – For example, your book introduces one post hoc test known as Tukey's HSD (where the HSD stands for "honestly significantly different"). – This test compares all pairwise combinations of means and allows one to determine which of a set of means are actually different from one another.
  • 52. Post Hoc Tests (cont.) • Other commonly used Post Hoc Tests: – Tukey's HSD: Evaluates all pairwise differences between means and is based on the Studentized range statistic (q). Maintains error rate familywise at alpha. – Fisher's LSD (Do not EVER use this!): tantamount to no control. Essentially sets error rate per contrast. – Dunnett’s: Allows one to compare p-1 treatment means to a control mean with the correlation between any two contrasts being .50. Controls error rate familywise.
  • 53. Post Hoc Tests (cont.) – Newman-Keuls: Controls the error rate at alpha for any ordered set of means and takes into account the distance between the means. Control falls between familywise and per contrast. – Scheffé's: One of the most stringent and therefore least powerful tests. Holds the Type I error rate at alpha for all possible contrasts. Can be used with unequal n's and for other than pairwise comparisons. – Duncan's multiple range test: Similar in form to Newman-Keuls but uses a more liberal protection level, so it provides weaker control of the familywise error rate.
  • 54. Type I Error Rate • Post hoc tests were developed to deal with a problem associated with running multiple significance tests. • Whenever more than one hypothesis test is run on the same data, the overall (familywise) Type I error rate rises above the nominal per-test level. The amount of increase is described by a simple formula: true α = 1 - (1 - α)^C • Where C = the number of significance tests run (assuming independent tests).
  • 55. Type I Error Rate • As the number of tests increases, the Type I error rate also increases. If you run 5 significance tests each at the .05 level, the true error rate will be 1 - (1 - .05)^5 = .2262. • It is easy to see that this is a problem in the ANOVA context, since determining which groups differ from one another requires the application of multiple significance tests.
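A quick numerical check of the inflation formula, assuming independent tests:

```python
# Familywise Type I error rate for C independent tests at per-test alpha.
alpha, C = 0.05, 5
familywise = 1 - (1 - alpha) ** C
print(f"{familywise:.4f}")   # 0.2262 for C = 5
```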
  • 56. Type I Error Rate • In response, the various post hoc tests were designed to hold the Type I error rate at the nominal level across an entire set of significance tests. • To better understand issues having to do with the Type I error rate in ANOVA models, one must first understand that the error rate can be defined at several different levels: – Experiment-wise, or across all possible comparisons in a study. – Family-wise, or across all possible comparisons for a single factor or effect (including an interaction effect). – Per contrast or comparison.
  • 57. Post Hoc Tests (cont.) • Setting the Type I error rate per contrast is like setting no control at all. • Setting the error rate experiment-wise is generally too conservative for most applications. • In most cases, the error rate is set family-wise. – Note: Fisher's LSD is tantamount to no control at all and should never be used as a post hoc test for this reason. – Scheffé's is the most conservative and can also be used for other than pairwise comparisons.
  • 58. Example using Tukey's HSD • Tukey's HSD is specifically designed to maintain the Type I error rate across all possible pairwise comparisons at the nominal level (here .05). The formula for the HSD critical difference is: HSD = q(.05; p, dfWG) × √(MSWG/n) • q is given in a table in your book; for our problem, at the .05 level, q = 4.04: HSD = 4.04 × √(2882/10) = 68.58 • 68.58 is the value that the difference between any two means must exceed for that difference to be considered significant.
  • 59. Pairwise mean differences (row mean minus column mean):
        M1    M2    M3    M4    M5
  M1     -    -5    -7   -29   -51
  M2           -    -2   -24   -46
  M3                 -   -22   -44
  M4                       -   -22
  M5                             -
  In this example, none of the means are significantly different by Tukey's HSD, since none of the absolute differences exceeds the critical difference value of 68.58.
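A small Python sketch of the HSD comparison above; the q value of 4.04 is read from the table as on the slide (a studentized-range routine could supply it instead), and the means and MSWG come from the CR-5 example:

```python
import numpy as np

means = np.array([279, 284, 286, 308, 330], dtype=float)
ms_wg, n = 2882.0, 10
q = 4.04                      # q(.05; p = 5, df = 45) read from a table

hsd = q * np.sqrt(ms_wg / n)  # critical difference, about 68.6
print(f"HSD critical difference = {hsd:.2f}")

# All pairwise absolute differences; flag any that exceed the HSD value.
for i in range(len(means)):
    for j in range(i + 1, len(means)):
        diff = abs(means[i] - means[j])
        flag = "significant" if diff > hsd else "ns"
        print(f"M{i+1} vs M{j+1}: |diff| = {diff:5.1f}  {flag}")
```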
  • 60. Tukey's HSD (cont.) • Sometimes a pair of means may be significantly different using a post hoc test when the omnibus F-test tells you there are no differences, or vice versa. – This can happen because the two tests may use different error terms and degrees of freedom, or because the means for one pair of groups are sufficiently different but the difference is diluted when the sums of squares are pooled for the overall test. – You should decide which result to interpret based on your best judgment as to which level of the Type I error rate is most appropriate.
  • 61. ω̂², η², and ρ • These three statistics represent the proportion of the total variance in the dependent variable explained by the independent variables included in a design. – Omega-hat squared (ω̂²) estimates the proportion of variance in the dependent variable explained by the fixed-effect independent variables in the population. – Rho (ρ, the intraclass correlation) is the equivalent of ω̂² for random effects. – Eta-squared (η²) is the proportion of variance explained in the sample. Eta-squared is similar to the Pearson r² but indexes both linear and nonlinear association. – These statistics are called measures of the "strength of association" (between the IVs and the DV).
  • 62. Omega-Hat Squared, Rho, and Eta-Squared (cont.) • These statistics supplement the significance tests, allowing a researcher to determine how meaningful the results of the statistical tests are.
  • 63. Omega Squared • Formula for a one-way design: ω̂² = [SSBG - (k - 1)MSWG] / (SSTOT + MSWG), where k = number of groups. For our problem: ω̂² = [18,232 - (5 - 1)(2,882)] / (147,922 + 2,882) = 6,704 / 150,804 = .044, or approximately 4% of the variance in the measure latency to protest is explained by the various types of treatments included in this design.
  • 64. Rho (ρ) • Our example includes a fixed effect. • If the effect had been random, we would have calculated rho, which is given by: ρ = (MSBG - MSWG) / [MSBG + (n - 1)MSWG]
  • 65. Eta-Squared η² = SSBG / SSTOT. For our example: = 18,232 / 147,922 = .123, or about 12% of the sample variance in latency scores is explained by the treatments.
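Both strength-of-association measures for the CR-5 example can be computed directly from the sums of squares already in hand; a minimal sketch:

```python
# Strength-of-association measures for the CR-5 example.
ss_bg, ss_wg = 18232.0, 129690.0
ss_tot = ss_bg + ss_wg
k, ms_wg = 5, 2882.0

eta_sq = ss_bg / ss_tot                                   # sample proportion of variance
omega_sq = (ss_bg - (k - 1) * ms_wg) / (ss_tot + ms_wg)   # population estimate (fixed effects)
print(f"eta^2 = {eta_sq:.3f}, omega-hat^2 = {omega_sq:.3f}")
```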
  • 66. A Note… • As with the post hoc tests, the strength-of-association measures are often computed only when the F-tests are significant, to help index the importance of the result. • However, they can also be calculated as an index of effect size to help understand why a test failed to reach significance. • Therefore, it is not a bad idea to calculate them routinely.
  • 68. Contrasts • A contrast or comparison among means is simply a difference among the means with appropriate algebraic signs: ψ1 = (-1)Ȳ.1 + (1)Ȳ.2 = a comparison between the means of groups 1 and 2; ψ2 = (-1)Ȳ.1 + (-1)Ȳ.2 + (2)Ȳ.3 = a comparison of groups 1 and 2 vs. group 3. • In general, contrasts have the form: ψi = c1Ȳ.1 + c2Ȳ.2 + ... + cpȲ.p
  • 69. Contrasts • Contrasts can be preplanned or post hoc, and pairwise or non-pairwise. • The number of pairwise comparisons that can be defined for any set of means is equal to p(p - 1)/2, where p = the number of means. • Contrasts can also be orthogonal or nonorthogonal.
  • 70. Orthogonal vs. Nonorthogonal • Two contrasts i and i' are orthogonal if the following equality holds: Σj=1..p cij ci'j = 0 for the equal n case, or Σj=1..p cij ci'j / nj = 0 for the unequal n case.
  • 71. Orthogonal vs. Nonorthogonal Contrasts • If two contrasts are nonorthogonal, they are correlated, with the correlation given by: ρii' = (Σ cij ci'j / nj) / √[(Σ cij² / nj)(Σ ci'j² / nj)] • Thus the pairwise contrasts
    -1  1  0  0
    -1  0  1  0
    -1  0  0  1
  would not be considered orthogonal, since the cross-products for any pair of them sum to 1, not 0 — e.g., for the first two rows, (-1)(-1) + (1)(0) + (0)(1) + (0)(0) = 1. With equal n's of 10, each pair is correlated .5.
  • 72. On the other hand, the following pair of contrasts is orthogonal:
    -1  1  0  0
     0  0 -1  1
  since the cross-products (0, 0, 0, 0) sum to 0.
  • 73. • The number of mutually orthogonal contrasts is always equal to p - 1, the number of levels of a factor minus 1 (i.e., the d.f. for the factor). • For our example with 5 means, we should be able to define 4 orthogonal contrasts. • One possible set of such contrasts is the following (verified in the sketch below):
     4  -1  -1  -1  -1
     0   3  -1  -1  -1
     0   0   2  -1  -1
     0   0   0   1  -1
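A minimal NumPy sketch that verifies the orthogonality of the contrast set above by checking that every pairwise cross-product sums to zero (equal n case):

```python
import numpy as np

# The four contrasts listed above (one contrast per row), for five equal-n groups.
C = np.array([
    [ 4, -1, -1, -1, -1],
    [ 0,  3, -1, -1, -1],
    [ 0,  0,  2, -1, -1],
    [ 0,  0,  0,  1, -1],
], dtype=float)

# Two contrasts are orthogonal (equal n) when the sum of their elementwise
# products is zero; the cross-product matrix checks every pair at once.
cross = C @ C.T
off_diagonal = cross - np.diag(np.diag(cross))
print("All pairs orthogonal:", np.allclose(off_diagonal, 0.0))
```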
  • 74. • The sums of squares for any possible set of p - 1 orthogonal contrasts will always sum to the total SSBG. – This is not true for nonorthogonal contrasts, whose SS will sum to more than the SS associated with the factor (and can cause software to fail). • Note that each contrast always has only a single numerator degree of freedom.
  • 76. A Priori Test Using the t-Statistic and a Contrast • Ho: ψ1 = 0 • H1: ψ1 ≠ 0 • The t-statistic is given by: t = ψ̂i / σ̂ψi = (Σj=1..p cj Ȳ.j) / √(MSE Σ cj² / nj) = (c1Ȳ.1 + c2Ȳ.2 + ... + cpȲ.p) / √[MSE((c1²/n1) + (c2²/n2) + ... + (cp²/np))] • Where MSE = the error term for the appropriate effect
  • 77. Example of "User Specified" (a priori/preplanned or a posteriori) Tests of Hypotheses • Use an extension of the t-test: t = ψ̂i / σ̂ψi = (c1Ȳ1 + c2Ȳ2 + c3Ȳ3 + ... + cpȲp) / √[MSWG((c1²/n1) + (c2²/n2) + (c3²/n3) + ... + (cp²/np))]
  • 78. Example (cont) • All contrasts of this type have only a single degree of freedom for the numerator and degrees of freedom for the denominator equal to those associated with the MSWG term. • Always keep in mind that the MSWG term to use for the contrast is the same term associated with the overall test of the factor whose mean differences are being tested with the contrast.
  • 79. For Our Example (1st Contrast) t = ψ̂i / σ̂ψi = [-2(279) + -2(284) + -2(286) + 3(308) + 3(330)] / √[2882(4/10 + 4/10 + 4/10 + 9/10 + 9/10)] = (-558 - 568 - 572 + 924 + 990) / √[2882(.4 + .4 + .4 + .9 + .9)]
  • 80. Continued… t = 216 / √[2882(3)] = 216 / 92.98 = 2.323 • Note: t² = F when there is only 1 d.f. in the numerator, so F = 5.396 • With 1 and 45 degrees of freedom, F-crit = 4.04 (from the F-table), so the contrast is significant.
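A short sketch of the contrast t-test just worked through, using the CR-5 means and MSWG; SciPy is assumed available for the p-value:

```python
import numpy as np
from scipy import stats

means = np.array([279, 284, 286, 308, 330], dtype=float)
c     = np.array([-2, -2, -2, 3, 3], dtype=float)   # groups 1-3 vs. groups 4-5
ms_wg, n, df_wg = 2882.0, 10, 45

psi_hat = np.sum(c * means)                         # 216
se_psi  = np.sqrt(ms_wg * np.sum(c ** 2 / n))       # about 92.98
t = psi_hat / se_psi                                # about 2.32
p_value = 2 * stats.t.sf(abs(t), df_wg)

print(f"psi = {psi_hat:.0f}, t({df_wg}) = {t:.3f}, t^2 = {t**2:.3f}, p = {p_value:.3f}")
```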
  • 81. The Bonferroni Inequality • Used to control the Type I Error rate when hypothesis testing using multiple contrasts. • As we showed earlier, when you run a series of significance tests, the Type I error rate does not remain at the .05 or .01 levels across the tests, but actually increases as a function of the number of comparisons that you are making.
  • 82. The Bonferroni Inequality (cont.) Recall the formula: 1 - (1 - α)^C, where α signifies the Type I error rate for each test and C equals the number of comparisons made. e.g., if we ran 6 significance tests each at the .05 level, our actual error rate across the comparisons would be 1 - (1 - .05)^6 = .2649, a much higher value than we would ever want to use.
  • 83. The Bonferroni Inequality (cont.) • To control the Type I error rate across a set of contrasts, it is possible to apply the Bonferroni inequality to determine the proper significance level at which to test each comparison or contrast: error rate = .05 / C = .05 / 6 = .0083 • This controls the Type I error rate across the set of contrasts so that it will not exceed our specified level of .05 (or .01), since 1 - (1 - .0083)^6 = .0488 < .05
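A quick numerical check of the Bonferroni adjustment for the six contrasts in this example:

```python
# Bonferroni-adjusted per-contrast alpha for C contrasts.
alpha, C = 0.05, 6
per_contrast = alpha / C                       # about .0083
familywise = 1 - (1 - per_contrast) ** C       # about .0488, stays below .05
print(f"per-contrast alpha = {per_contrast:.4f}, familywise rate = {familywise:.4f}")
```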
  • 84. The Bonferroni Inequality (cont.) • The Bonferroni adjustment is very conservative as with increasing tests, it sets a very stringent level for the Type I error rate. • For this reason it has received increasing criticism of late. • Alternatives to the Bonferroni have been proposed and should be considered when appropriate.
  • 85. Scheffé's S Test • Scheffé's S test is another type of post hoc test, described by Kirk (1982) as "…one of the most flexible, conservative, and robust data snooping procedures available" (p. 121). • Can be used for all possible contrasts, not just pairwise ones. • Can be used with unequal n's.
  • 86. Scheffé's S Test • The error rate is set experimentwise for an infinite number of possible contrasts. • The most conservative of the post hoc tests. • Less powerful than Tukey's HSD. • Uses the F-distribution and is robust against violations of the normality and homogeneity of variance assumptions.
  • 87. Scheffé cont. • The critical difference a contrast must exceed is: ψ̂(S) = √[(p - 1)F(α; ν1, ν2)] × √[MSError Σj=1..p (cj²/nj)], where p = the number of means and F(α; ν1, ν2) is taken from the F-table with ν1 = p - 1 and ν2 = the MSError d.f.
  • 88. Scheffé cont. Example using the contrast previously tested (-2, -2, -2, 3, 3): p = 5, F(α; ν1, ν2) = F(.05; 4, 45) = 2.5975 (table value). ψ̂(S) = √[(5 - 1)(2.5975)] × √[2882((-2)²/10 + (-2)²/10 + (-2)²/10 + (3)²/10 + (3)²/10)] = √10.39 × √[2882(.4 + .4 + .4 + .9 + .9)] = 3.22 × √[2882(3)] = 3.22 × 92.98 ≈ 300. Since |ψ̂| = 216 does not exceed this critical difference, the contrast is not significant by Scheffé's test.
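A sketch of the Scheffé critical difference computed with SciPy's exact critical F rather than a table value, so the result may differ slightly from the hand calculation above:

```python
import numpy as np
from scipy import stats

means = np.array([279, 284, 286, 308, 330], dtype=float)
c     = np.array([-2, -2, -2, 3, 3], dtype=float)
ms_error, n, p, df_error = 2882.0, 10, 5, 45

F_crit = stats.f.ppf(0.95, p - 1, df_error)          # about 2.58 (exact, vs. table values)
critical_diff = np.sqrt((p - 1) * F_crit) * np.sqrt(ms_error * np.sum(c ** 2 / n))

psi_hat = np.sum(c * means)                          # 216
print(f"Scheffe critical difference = {critical_diff:.1f}, psi = {psi_hat:.0f}")
# psi does not exceed the critical difference, so the contrast is not significant.
```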
  • 89. Trend Analysis and Orthogonal Polynomial Contrasts • If a factor in an ANOVA model has three or more levels (typically quantitative, equally spaced levels), it is also possible to conduct a trend analysis using orthogonal polynomial contrasts:
  2 means: linear trend only
  3 means: linear & quadratic trends
  4 means: linear, quadratic, & cubic trends
  5 means: linear, quadratic, cubic, & quartic trends
  etc.
  • 90. • The contrasts to use might look like this:
  linear:     -3  -1   1   3
  quadratic:   1  -1  -1   1
  cubic:      -1   3  -3   1
  • Some software packages, like SAS, have procedures to generate trend coefficients for any number of means. Others, like SPSS, will run a trend analysis as part of the general output.
  • 91. Steps in Conducting a Trend Analysis 1. Calculate the SSBG associated with a linear trend component 2. Test the residual or remaining SS for any further departures from linearity 3. Calculate the SSBG associated with a quadratic trend component 4. Repeat step 2 5. Evaluate SS with next higher-order trend. 6. Etc.
  • 92. Example of a Linear Trend • Assume that we have run an experiment and have found the sums across 4 treatment levels to be equal to 22, 28, 50, and 72. • Assume that we also calculated the SSWG and find it to be equal to 41.0 and SSBG = 194.5. • Further assume that each level of the IV has 8 participants. • Our linear trend contrast coefficients are: -3 -1 1 3
  • 93. Example (cont) • Our linear contrast, computed on the treatment sums, would be: ψ̂lin = Σj=1..p c1j (Σi Yij) = -3(22) + -1(28) + 1(50) + 3(72) = 172. Note: the i in cij refers to the trend component the set of coefficients belongs to (i.e., linear = 1, quadratic = 2, etc.).
  • 94. Compute the Sums of Squares SSψlin = ψ̂²lin / (n Σj=1..p c²1j) = (172)² / (8[(-3)² + (-1)² + (1)² + (3)²]) = 184.9, d.f. = 1 • SSψdep from lin = SSBG - SSψlin = 194.5 - 184.9 = 9.6, d.f. = p - 2 = 2
  • 95. Compute the Sums of Squares Note: If using the group means instead of the sums, use the following formula instead: SSψlin = n[Σj=1..p cij Ȳ.j]² / Σj=1..p c²ij
  • 96. Compute the F-ratio for Both F = MSψlin / MSWG = (SSψlin / 1) / (SSWG / p(n - 1)) = (184.9/1) / (41.0/28) = 126.3 = F obtained for the linear trend • d.f. = 1 for the numerator and p(n - 1) = 28 for the denominator • F-crit(.05; 1, 28) = 4.20
  • 97. Compute the F-ratio (cont) F = MSψdep from lin / MSWG = (SSψdep from lin / 2) / (SSWG / p(n - 1)) = (9.6/2) / (41.0/28) = 3.28 = F obtained for any remaining (higher-order) trend • d.f. = 2 and p(n - 1) = 28 • F-crit(.05; 2, 28) = 3.34
  • 98. Trend Analysis Example Cont. • Based on the F-tests, we would conclude that there is a significant linear trend but that there is no higher-order trend present.
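The whole trend analysis can be reproduced in a few lines; a minimal sketch using the treatment sums, SSBG, and SSWG given in the example (SciPy supplies the critical values):

```python
import numpy as np
from scipy import stats

# Treatment sums for the 4 levels, n = 8 per level (from the example).
sums = np.array([22, 28, 50, 72], dtype=float)
n, p = 8, 4
ss_bg, ss_wg = 194.5, 41.0
df_wg = p * (n - 1)                                  # 28
ms_wg = ss_wg / df_wg

c_lin = np.array([-3, -1, 1, 3], dtype=float)        # linear trend coefficients
psi_lin = np.sum(c_lin * sums)                       # 172
ss_lin  = psi_lin ** 2 / (n * np.sum(c_lin ** 2))    # 184.9
ss_dep  = ss_bg - ss_lin                             # 9.6, with p - 2 df

F_lin = (ss_lin / 1) / ms_wg
F_dep = (ss_dep / (p - 2)) / ms_wg
print(f"F(linear)    = {F_lin:.1f} vs F-crit {stats.f.ppf(.95, 1, df_wg):.2f}")
print(f"F(departure) = {F_dep:.2f} vs F-crit {stats.f.ppf(.95, p - 2, df_wg):.2f}")
```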
  • 99. Additional Post Hoc Tests • Newman-Keuls • Tukey-Kramer • Scheffé • The Scheffé test is of special interest since it is the only post hoc test, apart from contrasts specifically defined by the user, that can test other than pairwise comparisons. It also offers the most stringent control of the Type I error rate of all the post hoc tests.
  • 100. Power of the F-test • Effect sizes in the ANOVA context are typically calculated in one of two ways: – As a standardized average difference between the group means – Using eta-squared (η²) and omega-hat squared (ω̂²) • With the first approach, the effect size is calculated, translated into a value known as phi, and special tables are then consulted to estimate power and sample size. • The second approach is much easier to implement: run the analysis, then calculate eta-squared or omega-hat squared (for fixed effects) or the intraclass correlation coefficient (for random effects).
  • 101. Remember… η² = SSBG / SSTOT; ω̂² = σ̂²BG / (σ̂²BG + σ̂²WG) = [SSBG - (a - 1)(MSWG)] / (SSTOT + MSWG) = (a - 1)(F - 1) / [(a - 1)(F - 1) + a·n], where a·n is the total N
  • 102. Power (cont.) • Once the effect size has been calculated, it is a simple matter to turn it into another useful value using the following formula: f = √[ω̂² / (1 - ω̂²)] (f is the symbol used by Jacob Cohen for a value that can be used to estimate power and sample sizes.)
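A sketch of the conversion from ω̂² to f, with an approximate power calculation for the CR-5 design via the noncentral F distribution. The noncentrality parameter λ ≈ f²N is an approximation, and scipy.stats.ncf is assumed to be available:

```python
import numpy as np
from scipy import stats

omega_sq = 0.044                     # from the CR-5 example
f = np.sqrt(omega_sq / (1 - omega_sq))

# Approximate power for a one-way design with a groups and n per group:
# noncentrality lambda ~ f^2 * N, tested against F with (a-1, a(n-1)) df.
a, n = 5, 10
N = a * n
lam = f ** 2 * N
df1, df2 = a - 1, a * (n - 1)
F_crit = stats.f.ppf(0.95, df1, df2)
power = stats.ncf.sf(F_crit, df1, df2, lam)
print(f"f = {f:.3f}, approximate power = {power:.2f}")
```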
  • 103. Conventions for Effect Sizes (Cohen) f = .10 = small effect size; f = .25 = medium effect size; f = .40 = large effect size (these correspond roughly to proportion-of-variance values, ω², of .01, .06, and .14) • After calculating f, one can then use the published tables in Cohen to estimate power and sample size, or use existing computer software such as G*Power • Note that there is an option in SPSS's GLM procedure that will output observed power for the main and interaction effects of any ANOVA design model