Analysis of Variance
(ANOVA) Part I
Why study Analysis of Variance (ANOVA)?
• ANOVA technology was developed by R. A.
Fisher for use in the agricultural trials being
run at the Rothamsted Agricultural Research
Station where he was employed for a period of
time.
• The research was concerned with testing the
effects of different types of organic and
inorganic fertilizers on the crop yield.
• For this reason, many terms used to describe
ANOVA models still reflect the terminology
used in the agricultural settings (e.g., split-plot
designs).
• ANOVA methods were specifically
developed for the analysis of data from
experimental studies.
• The models assume that random
assignment has taken place as a means
of assuring that the IID assumptions of
the statistical test are met.
• These designs are still the staple of
certain social science disciplines where
much experimental research is
conducted (e.g., psychology).
Why study Analysis of Variance (ANOVA)?
• But what is the utility for other disciplines?
• There are several reasons why knowledge of
ANOVA is important:
– Clinical trials in all areas of research involving
human participants still employ these designs.
– Many of the techniques employed in testing
hypotheses in the ANOVA context are generalizable
to other statistical methods.
– There are other statistical procedures, e.g., variance
components models that use a similar approach to
variance decomposition as is employed in ANOVA
analysis and can be more readily understood with
an understanding of ANOVA.
– Possible involvement in interdisciplinary research.
Why study Analysis of Variance (ANOVA)?
• ANOVA models distinguish between independent
variables (IVs), dependent variables (DVs), blocking
variables, and levels of those variables.
• All IVs and Blocking variables are categorical. DVs are
continuous and assumed to be IID normal.
• We have already discussed IVs and DVs.
• Blocking variables are variables needing to be controlled
in an analysis that are not manipulated by the
researchers and hence not true IV’s (e.g., gender, grade
in school, etc.).
Overview of ANOVA
“Levels” of IVs and Blocking Variables
• Each IV and Blocking variable in an ANOVA
model must have two or more “levels.”
– Example: the independent variable may be a type of
therapy, a drug, the induction of anger or frustration
or some other experimental manipulation.
• Levels could include different types of therapies
or therapies of differing degrees of intensity,
different doses of a drug, different induction
procedures, etc.
• It is assumed that participants are randomly
assigned to levels of the IV but not to levels of
any blocking variables.
Overview of ANOVA
• IVs and Blocking variables are referred to
as “Factors” and each factor is assumed
to have two or more levels.
• In ANOVA we distinguish between
between-groups and within-groups
factors.
Between vs. Within Factors
• A between groups factor is one in which
each subject appears at only one level of
the IV.
• A within groups factor (of which a
repeated measures factor is an example),
is one in which each subject appears at
each level of the IV.
• It is possible to have a design with a
mixture of between and within groups
factors or effects.
Fixed vs. Random Effects:
Expected Mean Squares (EMS)
• Effects or factors are fixed when all
levels of an IV or blocking variable we
are interested in generalizing to are
included in the analysis.
• Effects are random when they represent
a sampling of levels from the universe of
possible values.
Examples of Random and Fixed Effects
• Drug dosages of 2, 4, or 10 mg
– random: since not all levels are
represented.
• Different raters providing observational
ratings of behavior
– random
• Gender - male and female
– fixed
• MST treatment versus usual services
– fixed
Fixed vs. Random Effects (cont.)
• The distinction between fixed and
random effects is important since it
has implications for the way in which
the treatment effects are estimated
and the generalizability of the results
• For fixed effect models, we have complete
information (all levels of a variable or factor are
observed) and we can calculate the effect of the
treatment by taking the average across the
groups present.
• In the case of random factors, we have
incomplete information, (not all levels of the
factor are included in the design). For random
factors, we are estimating the treatment effect
at the level of the population given only the
information available from the levels we
included in our design. The formulas are
designed to represent this uncertainty.
Fixed versus Random Effects (cont.)
• In the case of a fixed effect, we can
generalize the results only to the levels
of the variables included in our
analyses.
• Random effects assume that the results
will be generalized to other levels
between the endpoint values included in
our analyses.
“Levels” of IVs and Blocking Variables
• The ANOVA model is a “means model”
– i.e., it assumes that any observed differences
in behavior can be completely described
using only information in the means.
• The ANOVA model is also a population-
averaged model.
• It evaluates the effects of treatments and
blocking variables at the group rather
than at the individual level.
Hypothesis Testing
• ANOVA involves the same 4-step
hypothesis testing procedure we
applied in the case of the z-test, t-tests,
and tests for the correlation coefficients.
• We will, however, use a different
sampling distribution to determine the
critical values of our statistic.
• This sampling distribution is called the
F-distribution and the significance test
is now called an F-test.
F-test Basics
• F-statistics are formed as the ratio of two chi-
square distributions divided by their respective
degrees of freedom:
F(d.f.1, d.f.2) = (χ1² / d.f.1) / (χ2² / d.f.2), and with a single numerator d.f., F(1, d.f.denom) = t²
• As a result, unlike the t-distribution the shape of
which was determined by one degree of
freedom parameter, the F-distribution is
determined by two degree of freedom
parameters.
• When there are only 2 groups, F is equal to t².
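This relationship is easy to verify numerically. Below is a minimal Python sketch (not from the slides; the two-group data are simulated for illustration) showing that with two groups the one-way ANOVA F equals the square of the pooled-variance t:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.normal(50, 10, size=12)   # illustrative group 1 scores
g2 = rng.normal(55, 10, size=12)   # illustrative group 2 scores

t, _ = stats.ttest_ind(g1, g2)     # pooled-variance two-sample t-test
F, _ = stats.f_oneway(g1, g2)      # one-way ANOVA F-test
print(round(t**2, 6), round(F, 6)) # identical: F = t^2 when there are 2 groups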
The F Statistic (cont.)
• A very important property of the F-test under the null
hypothesis of no differences between the groups is
that, in theory, the numerator and denominator are
independent estimates of the same population
variance.
• However, the denominator measures only “error” or
“noise” while the numerator measures both error and
treatment effect.
• Under the null hypothesis of no effect of treatment, the
expected value of the F-statistic is 1.0. As the
treatment effect increases in size, F becomes greater
than 1.0
• Note: Although in theory F should never be less than
1.0, with “real” data it will fall below 1.0 at times.
Error Term
• The error term in the denominator of the F-
statistic is an extension of the two sample t-test
error term.
– In the two sample t-test we saw that since two independent
estimates of the population variance were available - one
from each sample - we could improve on the estimate of
the population parameter by averaging across the two
estimates. The error term which resulted was called a
“pooled error term.”
• In ANOVA, we will have at least two but possibly
three or more groups. Regardless, the process
is the same. To improve on our estimate of the
population parameter, we pool the variance
estimates together - one from each cell or
sample - and use this mean squared error as the
error term in our F-test.
Tabled Values of F
• The F-distribution is non-normal (positively
skewed), and the F-table includes only the
critical values at the upper tail of the distribution.
• Lower values can be obtained by taking the
reciprocal of the tabled value of F, i.e., 1/F but
these values are rarely used.
• The F-distribution changes shape depending on
the numerator and denominator degrees of
freedom as can be seen in the next slide:
[Figure omitted: F-distribution density curves for several combinations of numerator and denominator degrees of freedom.]
ANOVA Hypotheses
• Ho: μ1 = μ2 = μ3
• H1: not all μj are equal (at least one group mean differs)
• As with the other statistics covered in this
course, the F-test can be run working from
definitional formulas or computational formulas.
• We will work through the definitional formulas
in class examples.
• One reason for this is that it is easy to calculate
the statistic using these formulas. More
importantly, it is easier to see what each
component of the statistic represents.
Completely Randomized
(CR-p) Design
One-Way ANOVA Design Model
• Simplest ANOVA model – single factor:
Yij = μ + αj + εij
• Where i = person and j=group.
• This model says that each person's score
can be described by:
– μ, the overall or grand mean
– αj, an average group level treatment effect, and
– εij, a parameter describing each individual's
deviation from the average group effect.
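To make the design model concrete, here is a small Python sketch (an illustration, not from the slides; the parameter values are made up) that generates data according to Yij = μ + αj + εij:

import numpy as np

rng = np.random.default_rng(1)
mu = 300.0                               # grand mean (illustrative)
alpha = np.array([-15.0, -5.0, 20.0])    # group treatment effects, sum to zero
n, sigma = 10, 50.0                      # per-group n and error SD

group = np.repeat(np.arange(len(alpha)), n)          # j index for each person
Y = mu + alpha[group] + rng.normal(0.0, sigma, 3*n)  # Y_ij = mu + alpha_j + eps_ij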
The F-statistic
• For all ANOVA designs, the F-statistic is
comprised of two parts:
– Numerator: a component known as the “mean
square between groups”
– Denominator: “mean square within groups”
– We form a ratio of these two variance terms to
compute the F statistic:
MSBG SSBG / d.f.BG
F = --------- = ---------------
MSWG SSWG / d.f.WG
The F-statistic (cont.)
• The trick is to correctly estimate the
variance components forming the
numerator and denominator for different
combinations of fixed and random effects.
• The correct formulas to use are based on
the statistical theory underlying ANOVA
and are derived using what are termed
“expected mean square” formulas.
Components of the F-Statistic
• We define the numerator and denominator of the
F-test in terms of their expected values (the
values that one should obtain in the population if
Ho were true).
• The expected mean squares for the two types of
effects in the CR-p design are given by:
              Model I (Fixed effect)       Model II (Random effect)
_____________________________________________________________________
E(MSBG)       σε² + nΣαj²/(p - 1)          σε² + n(1 - p/P)σα²
E(MSWG)       σε²                          σε²
_____________________________________________________________________
Forming the F-statistics for the two
possible design models:
F (Model I)  = [σε² + nΣαj²/(p - 1)] / σε²

F (Model II) = [σε² + n(1 - p/P)σα²] / σε²
Calculating the Variance Components
SSBG = n Σj (X̄.j - X̄..)²,  summing over the j = 1 … J groups

SSWG = Σj Σi (Xij - X̄.j)²   or equivalently   Σj s²j(n - 1)
MSBG = SSBG/d.f.BG
MSWG = SSWG/d.f.WG
Variance Components (cont).
• If we add the SSBG and SSWG terms together, we have the
Total Sums of Squares or SSTOT:
SSTOT = SSBG + SSWG = ΣΣ(Yij - Ȳ..)²
• The ANOVA process represents a “decomposition” of
the total variation in a DV into “components” of
variation attributable to the factors included in the
model and a residual or error component.
• In the CR-p design, SSBG is the variance in the DV that
can be “explained” by the IV and SSWG is the error,
residual, or unexplained variation left over after
accounting for SSBG.
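The decomposition can be sketched in a few lines of Python (an illustration, not part of the slides); the within-groups degrees of freedom are written here as N - p, which equals p(n - 1) when group sizes are equal:

import numpy as np

def one_way_decomposition(groups):
    """groups: list of 1-D arrays, one per treatment level."""
    all_y = np.concatenate(groups)
    grand_mean = all_y.mean()
    ss_bg = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_wg = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_bg, df_wg = len(groups) - 1, len(all_y) - len(groups)
    F = (ss_bg / df_bg) / (ss_wg / df_wg)
    return ss_bg, ss_wg, ss_bg + ss_wg, F   # SSBG, SSWG, SSTOT, F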
Assumptions
• ANOVA makes the same assumptions as the t-
test.
• F assumptions:
– The dependent variable is from a population that
is normally distributed.
– The sample observations are random samples
from the population.
– The numerator and denominator of the F-test are
estimates of the same population variance.
– The numerator and denominator are
independent.
Assumptions (cont.)
• Model Assumptions:
– The model equation (design model)
reflects all sources of variation affecting a
DV and each score is the sum of several
components.
– The experiment contains all treatment
levels of interest
– The error effect is independent of all other
errors and is normally distributed within
each treatment population with a mean
equal to 0 and variance equal to σ²
(homogeneity of variances assumption)
Assumptions (cont.)
• Note that the assumptions of ANOVA are
frequently violated to at least some
extent with real-world data.
• Generally the violations have little effect
on significance or power if:
– The data are derived through random
sampling
– The sample size is not small
– Departures from normality are not large
– n’s within each cell are equal.
Violations of Assumptions
• The F-test is robust to violations of the
homogeneity of variances assumption.
When the violation is extreme, however,
the result is an incorrect Type I error rate
(It will become increasingly inflated as the
violation becomes more severe).
• As we noted with the t-test, if the group
sizes are equal, the impact of
heterogeneity is of less concern.
Violations of Assumptions
• The effect on the error term will be liberal
when the largest cell sizes are associated
with the smallest variances and
conservative if the largest cell sizes are
associated with the largest variances.
• In most cases, problems arise when the
cell or group sizes are very different.
Normalizing Transformations
• When the data are more severely skewed,
kurtotic, or both, or the homogeneity of
variances assumption has been violated,
it is sometimes necessary to apply a
normalizing transformation to the DV to
allow the analysis to be run.
Transformations
• Normalizing transformations help to
accomplish several things:
– Homogeneity of error variances
– Normality of errors
– Additivity of effects, i.e., effects that do not
interact (as is desirable, for example, in a
repeated measures ANOVA design). By
transforming the scale of measurement,
additivity can sometimes be achieved.
– We will talk about additivity more in the
context of repeated measures ANOVA.
Types of Normalizing Transformations
• Square Root
– Y'=√Y
– Use when treatment level means and
variances are proportional (moderate
positive skew)
Types of Normalizing Transformations (Cont.)
• Log 10
– Y'=log10(Y+1)
– Use when Tx means and SDs are
proportional (more extreme positive
skew)
Types of Normalizing Transformations (Cont.)
• Angular or Inverse Sine
– Y'=2*arcsin(√Y)
– Use when means and variances are
proportional and underlying distribution
is binomial.
– Often used when the DV represents
proportions.
Types of Normalizing Transformations (Cont.)
• Inverse or Reciprocal
– Y'=1/Y
– Use when squares of Tx means are
proportional to SDs (severe positive
skew or L-shaped distribution)
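For reference, the four transformations above can be applied directly in Python/NumPy (an illustrative sketch; the Y values are made up, positive, and within [0, 1] so every transform is defined):

import numpy as np

Y = np.array([0.04, 0.10, 0.25, 0.40, 0.80])   # illustrative positive scores

sqrt_t   = np.sqrt(Y)                   # square root: moderate positive skew
log_t    = np.log10(Y + 1)              # log10(Y + 1): stronger positive skew
arcsin_t = 2 * np.arcsin(np.sqrt(Y))    # angular: proportions / binomial-like DV
recip_t  = 1.0 / Y                      # reciprocal: severe positive skew (Y must be nonzero)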
Negatively Skewed Distributions
• When the distribution is negatively
skewed, one must first "reflect" the
distribution and then apply the appropriate
transformation. To reflect a set of scores,
simply subtract each value from the
highest value plus one unit.
– e.g., if an item is on a 1-5 scale and we wish to
reflect it (change it to a 5-1 scale), we would
simply subtract each value from 6:
Reflected(y) = 6 - y
Selecting a Transformation
• If you are unsure which transformation will
work best, you can apply all possible
transformations to the highest and lowest
scores in each treatment level.
• Calculate the range of values for each
treatment level by subtracting the smallest
score from the largest.
• Form a ratio of the largest and smallest ranges
for each transformation across Tx levels.
• The transformation associated with the
smallest ratio wins.
• (See Kirk Experimental Design for an example)
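A rough Python sketch of this selection heuristic is given below (a coding of the procedure described above, not taken from Kirk; the function name and the set of candidate transformations are illustrative):

import numpy as np

def pick_transformation(level_scores):
    """level_scores: list of 1-D arrays of raw scores, one per treatment level."""
    candidates = {
        "sqrt":  np.sqrt,
        "log10": lambda y: np.log10(y + 1),
        "recip": lambda y: 1.0 / y,
    }
    best_name, best_ratio = None, np.inf
    for name, f in candidates.items():
        # range of the transformed extreme scores within each treatment level
        ranges = [abs(f(g.max()) - f(g.min())) for g in level_scores]
        ratio = max(ranges) / min(ranges)   # largest range / smallest range
        if ratio < best_ratio:
            best_name, best_ratio = name, ratio
    return best_name, best_ratio            # the smallest ratio "wins"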
Additivity
• If additivity is of interest, use a test for
nonadditivity such as that developed by
Tukey (1949) and covered in many
statistics texts.
• Then select a transformation that
reduces nonadditivity to an acceptable
level.
Example of a Completely Randomized
Between Groups Design (CR-5)
• One independent variable with five levels
• The independent variable represents
different types of stranger awareness
training and the dependent variable the
latency of the children to protest verbally
about a stranger’s actions (measured in
seconds).
Example (cont.)
           Control   Experimenter   Mother    Mother    Role
                     verbal         verbal    natural   play
__________________________________________________________________
Mean         279        284           286       308      330
SD            50         53            51        56       58
n             10         10            10        10       10
__________________________________________________________________
Grand mean Ȳ.. = 297.4

One can eyeball the SDs relative to the sample sizes to see whether the
homogeneity of variances assumption has been violated.
CR-5 Example (cont.)
• Ho: μ1 = μ2 = μ3 = μ4= μ5
• H1: not all μj are equal
• Use ~F(4,45)
• Where dfnum = p-1 and dfden = p(n-1)
• Set up decision rules (from F-table):
– Fcrit(.05,4,45) = 2.61
– If Fobs > Fcrit then reject Ho
Formulas
• Calculate statistic and apply decision rules:
SSWG = Σj s²j(n - 1) = [(50)² + (53)² + (51)² + (56)² + (58)²] × 9 = 129,690

SSBG = n Σj (X̄.j - X̄..)²
     = 10 × [(279 - 297.4)² + (284 - 297.4)² + (286 - 297.4)² + (308 - 297.4)² + (330 - 297.4)²]
     = 10 × 1823.2
     = 18,232

SSTOT = SSBG + SSWG = 18,232 + 129,690 = 147,992
Formulas (cont.)
d.f.BG = p-1 = 5-1= 4
d.f.WG = p(n-1) = 5(10-1)= 45
MSBG = SSBG/d.f.BG
= 18,232/4
= 4558
MSWG = SSWG/d.f.WG
= 129,690/45
= 2882
Formulas (cont.)
MSBG
F = ------
MSWG
= 4558 / 2882
= 1.58
Fcrit(.05,4,45) = 2.61
Since F-obs < F-crit do not reject Ho.
Conclude no effect due to treatment
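The CR-5 computations above can be reproduced from the summary statistics alone; here is a Python sketch (not from the slides) using SciPy. Note the exact F quantile is about 2.58, slightly below the 2.61 read from the coarse printed table:

import numpy as np
from scipy import stats

means = np.array([279, 284, 286, 308, 330], dtype=float)
sds   = np.array([50, 53, 51, 56, 58], dtype=float)
n, p  = 10, 5

grand_mean = means.mean()                      # simple average is fine with equal n
ss_bg = n * ((means - grand_mean) ** 2).sum()  # 18,232
ss_wg = ((sds ** 2) * (n - 1)).sum()           # 129,690
F = (ss_bg / (p - 1)) / (ss_wg / (p * (n - 1)))   # about 1.58
print(F, stats.f.ppf(0.95, p - 1, p * (n - 1)),   # F-crit ~ 2.58
      stats.f.sf(F, p - 1, p * (n - 1)))          # p-value, well above .05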
What if F was significant?
• If you found that there was a significant
difference between means using the F-statistic,
you still do not know which of the groups were
significantly different from the others.
• This is because the F-test is a simultaneous or
omnibus test of the difference between all
possible combinations of group means.
• In the case of the two-sample t-test, this was
easy to resolve by looking at the group means.
In the case of three or more groups, this is not
as easy to determine.
• For this reason, it is necessary to introduce
what are known as post-hoc tests as a follow-up
to the F-test.
Post Hoc Tests
• Post hoc tests take many forms.
– For example, your book introduces one
post hoc test known as Tukey's HSD (where
HSD stands for "honestly significant
difference").
– This test compares all pairwise
combinations of means and allows one to
determine which of a set of means are
actually different from one another.
Post Hoc Tests (cont.)
• Other commonly used Post Hoc Tests:
– Tukey's HSD: Evaluates all pairwise
differences between means and is based on
the Studentized range statistic (q). Maintains
error rate familywise at alpha.
– Fisher's LSD (Do not EVER use this!):
tantamount to no control. Essentially sets
error rate per contrast.
– Dunnett’s: Allows one to compare p-1
treatment means to a control mean with the
correlation between any two contrasts
being .50. Controls error rate familywise.
Post Hoc Tests (cont.)
– Newman-Keuls: Controls the error rate at
alpha for any ordered set of means and takes
into account the distance between the
means. Control is between familywise and
per contrast.
– Scheffe's: One of the most stringent and
therefore least powerful tests. Holds Type I
error rate at alpha for all possible contrasts.
Can be used with unequal n’s and for other
than pairwise comparisons.
– Duncan's multiple range test: Similar in form to
Newman-Keuls but uses a less stringent criterion,
so its control of the Type I error rate is weaker.
Type I Error Rate
• Post-hoc tests were developed to deal
with a problem associated with running
multiple significance tests.
• Whenever more than one hypothesis test
is run on the same data, the nominal or
overall Type I error rate increases. The
amount of increase is described by a
simple formula:
True alpha = 1 - (1 - α)^C
• Where C = the number of significance
tests run.
Type I Error Rate
• As the number of tests increases, the
Type I error rate also increases. If you
run 5 significance tests each at the .05
level, the true error rate will be
1 - (1 - .05)^5 = .2262.
• It is easy to see that this is a problem in
the ANOVA contexts since determining
which groups differ from one another
requires the application of multiple
significance tests.
Type I Error Rate
• In response, the various post hoc tests were
designed to hold the Type I error rate at the
nominal level across an entire set of
significance tests.
• To better understand issues having to do with
the Type I error rate in ANOVA models, one
must first understand that the error rate can
be defined at several different levels:
– Experiment-wise or across all possible
comparisons in a study.
– Family-wise or across all possible comparisons for
one factor or effect (e.g., an interaction).
– Per contrast or comparison.
Post Hoc Tests (cont.)
• Setting the Type I error rate per contrast is like
setting no control at all.
• Setting the error rate experiment-wise is
generally too conservative for most
applications.
• In most cases, the error rate is set family-wise.
– Note: Fisher's LSD is tantamount to no control at
all and should never be used as a post hoc test for
this reason.
– Scheffe’ is the most conservative and can also be
used for other than pairwise comparisons.
Example using Tukey’s HSD
• Tukey's HSD is specially designed to control or
maintain the Type I error rate across all possible
pairwise comparisons at the .05 level. The formula for
Tukey's is:
tHSD = q(.05, p) × √(MSWG / n)
• q is given in a Table in your book. For our problem,
using the .05 level, q = 4.04:
tHSD = 4.04 × √(2882 / 10)
     = 68.58
• 68.58 represents the value that the difference between
any two pairs of means must exceed for that difference
to be considered significant.
        M1     M2     M3     M4     M5
M1       -     -5     -7    -29    -51
M2              -     -2    -24    -46
M3                     -    -22    -44
M4                           -     -22
M5                                   -
In this example none of the means using
Tukey's HSD are significantly different as
none of the absolute values of the
difference scores exceeds the critical
difference value of 68.58.
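A Python sketch of the same Tukey HSD calculation is shown below (an illustration, not from the slides; it assumes SciPy 1.7+ for scipy.stats.studentized_range, whose exact q of about 4.02 differs slightly from the tabled 4.04):

import numpy as np
from scipy.stats import studentized_range

ms_wg, n, p, df_wg = 2882.0, 10, 5, 45
q_crit = studentized_range.ppf(0.95, p, df_wg)   # ~4.02
hsd = q_crit * np.sqrt(ms_wg / n)                # critical difference, ~68

means = np.array([279, 284, 286, 308, 330], dtype=float)
diffs = np.abs(means[:, None] - means[None, :])  # all pairwise |mean differences|
print(round(hsd, 2), bool((diffs > hsd).any()))  # no pair exceeds the HSD here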
Tukey’s HSD (cont.)
• Sometimes, a pair of means may be
significantly different using a post hoc test
when the omnibus F-test tells you that there
are no differences or vice versa.
– Can happen since the two tests sometimes use a
different error term and d.f. or the means for one
pair of groups may be sufficiently different but the
difference is lost when SS are grouped for the
overall test.
– You should decide which to interpret based on your
best judgment as to which level of the Type I error
rate is most appropriate.
ω², η², and ρ
• These three statistics represent the proportion of the
total variance in the dependent variable explained by
the independent variables included in a design.
– Omega hat squared (ω̂²) is used to estimate the proportion of
variance in the dependent variable explained by the fixed-effect
independent variables in the population.
– Rho (ρ) is the equivalent of ω̂² but for random effects.
– Eta-squared (η²) is the proportion of variance explained in the
sample. Eta is similar to the Pearson r² but indexes both linear and
nonlinear association.
– These statistics are called measures of the “strength of
association” (between the IVs and DV).
Omega Hat Squared, rho, and
Eta-Squared cont.
• These statistics also provide additional
information to that of the significance
tests, allowing a researcher to determine
how meaningful the results of the
statistical tests are.
Omega Squared
• Formula for a one-way design:

ω̂² = [SSBG - (k - 1)MSWG] / (SSTOT + MSWG),   where k = # of groups

For our problem this is:

ω̂² = [18,232 - (5 - 1)(2882)] / (147,992 + 2882)
   = 6,704 / 150,874
   = .044, or approximately 4% of the variance in latency to
     protest is explained by the various types of treatment
     included in this design.
Rho (ρ)
• Our example includes a fixed effect.
• If the effect had been random, we would
have calculated rho which is given by
the following formula:
MSBG –MSWG
ρ = -------------------------------
MSBG + (n - 1)MSWG
Eta-Squared
η² = SSBG / SSTOT

For our example:
η² = 18,232 / 147,992
   = .1232, or about 12% of the sample variance in latency
     scores is explained by the treatments.
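For completeness, a small Python sketch computing all three strength-of-association measures from the CR-5 summary values (rho is shown even though the example involves a fixed effect):

ss_bg, ss_tot = 18232.0, 147992.0
ms_bg, ms_wg  = 4558.0, 2882.0
k, n = 5, 10

eta_sq   = ss_bg / ss_tot                                # ~ .123
omega_sq = (ss_bg - (k - 1) * ms_wg) / (ss_tot + ms_wg)  # ~ .044 (fixed effects)
rho      = (ms_bg - ms_wg) / (ms_bg + (n - 1) * ms_wg)   # random-effects analogue
print(round(eta_sq, 3), round(omega_sq, 3), round(rho, 3))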
A Note…
• As with the post hoc tests, the strength
of association measures are often
computed only when the F-tests are
significant to help index the importance
of the result.
• However, they can also be calculated as
an index of effect size to help understand
why a test failed to reach significance.
• Therefore, not a bad idea to always
calculate them.
Contrasts
Contrasts
• A contrast or comparison among means is
simply a difference among the means with
appropriate algebraic signs
ψ1 = (-1*Y.1) + (1*Y.2) = a comparison between the
means of groups 1 & 2.
ψ2 = (-1*Y.1) + (-1*Y.2) + (2*Y.3) = a comparison of
groups 1 & 2 vs. group 3.
• In general, contrasts have the form:
ψi = c1Ȳ.1 + c2Ȳ.2 + . . . + cpȲ.p
Contrasts
• Contrasts can be preplanned or post hoc
and pairwise or non-pairwise.
• The number of pairwise comparisons that
can be defined for any set of means is
equal to:
p(p - 1) / 2
where p = the number of means
• Contrasts can also be orthogonal or
nonorthogonal .
Orthogonal vs. Nonorthogonal
• Contrasts are orthogonal if the following
equality holds:

Σj Cij Ci'j = 0   (summing over j = 1 … p)

for the equal-n case, or:

Σj Cij Ci'j / nj = 0

for the unequal-n case.
Orthogonal vs. Nonorthogonal Contrasts
• If the contrasts are nonorthogonal, they are correlated,
with the correlation given by the following formula:

ρii' = (Σj Cij Ci'j / nj) / √[(Σj Cij² / nj)(Σj Ci'j² / nj)]

Thus the pairwise contrasts:

-1  1  0  0
-1  0  1  0
-1  0  0  1

would not be considered orthogonal, since the products of their
coefficients do not sum to 0 (for the first two contrasts the products
are 1, 0, 0, 0, which sum to 1; each pair of these contrasts is
correlated .5 if n = 10).
On the other hand, the following pairwise
contrasts are orthogonal:

-1  1  0  0
 0  0 -1  1

(the products of corresponding coefficients are 0, 0, 0, 0, which sum to 0).
• The number of orthogonal contrasts is always
equal to p-1 or the number of levels of a factor
minus 1 (or equal to the d.f. for a factor).
• For our example with 5 means, we should be
able to define 4 contrasts that are orthogonal.
• One possible combination of such contrasts
would be the following:
4 -1 -1 -1 -1
0 3 -1 -1 -1
0 0 2 -1 -1
0 0 0 1 -1
• The sum of squares for any possible set of
p-1 orthogonal contrasts when summed
will always equal the total SSBG.
– This is not true for nonorthogonal contrasts
which will sum to more than the SS associated
with a factor (and can cause software to fail).
• Note that each contrast always has only a
single numerator degree of freedom.
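A quick way to check a proposed set of contrasts is to verify that every pair satisfies the orthogonality condition; the Python sketch below (an illustration, with a made-up helper name) does this for the two contrast sets discussed above:

import numpy as np
from itertools import combinations

def is_orthogonal(contrasts, n=None):
    """contrasts: one contrast per row; n: per-group sizes (equal n if None)."""
    C = np.asarray(contrasts, dtype=float)
    w = np.ones(C.shape[1]) if n is None else 1.0 / np.asarray(n, dtype=float)
    return all(np.isclose((C[i] * C[j] * w).sum(), 0.0)
               for i, j in combinations(range(len(C)), 2))

print(is_orthogonal([[-1, 1, 0, 0], [-1, 0, 1, 0], [-1, 0, 0, 1]]))   # False
print(is_orthogonal([[4, -1, -1, -1, -1], [0, 3, -1, -1, -1],
                     [0, 0, 2, -1, -1], [0, 0, 0, 1, -1]]))           # True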
Hypothesis Tests
Using Contrasts
A Priori Test Using the t-Statistic and a Contrast
• Ho: ψi = 0
• H1: ψi ≠ 0
• The t-statistic is given by:

t = ψ̂i / σ̂ψi = (Σj Cj Ȳ.j) / √(MSE · Σj Cj² / nj)

  = (C1Ȳ.1 + C2Ȳ.2 + . . . + CpȲ.p) / √[MSE((C1²/n1) + (C2²/n2) + ... + (Cp²/np))]
• Where MSE = the error term for the appropriate effect
Example of “User Specified” (a priori / preplanned
or a posteriori) Tests of Hypotheses
• Use an extension of the t-test:

t = ψ̂i / σ̂ψi = (c1Ȳ1 + c2Ȳ2 + c3Ȳ3 + ... + cpȲp) / √[MSWG(c1²/n1 + c2²/n2 + c3²/n3 + ... + cp²/np)]
Example (cont)
• All contrasts of this type have only a
single degree of freedom for the
numerator and degrees of freedom for
the denominator equal to those
associated with the MSWG term.
• Always keep in mind that the MSWG term
to use for the contrast is the same term
associated with the overall test of the
factor whose mean differences are being
tested with the contrast.
For Our Example (1st Contrast)

t = ψ̂i / σ̂ψi
  = [-2(279) + -2(284) + -2(286) + 3(308) + 3(330)] / √[2882(4/10 + 4/10 + 4/10 + 9/10 + 9/10)]
  = (-558 - 568 - 572 + 924 + 990) / √[2882(.4 + .4 + .4 + .9 + .9)]
  = 216 / √[2882(3)]
  = 216 / 92.98
  = 2.323

• Note: t² = F when there is only 1 d.f. in the numerator, so F = (2.323)² = 5.396.
• At 1 and 45 degrees of freedom, the tabled F-crit = 4.04, so this contrast exceeds the critical value.
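The same contrast test is easy to reproduce from the summary statistics; a brief Python sketch (not from the slides) follows, and the two-tailed p-value confirms the comparison against the critical value:

import numpy as np
from scipy import stats

means = np.array([279, 284, 286, 308, 330], dtype=float)
c     = np.array([-2, -2, -2, 3, 3], dtype=float)     # contrast from the example
ms_wg, n, df_wg = 2882.0, 10, 45

psi_hat = (c * means).sum()                    # 216
se_psi  = np.sqrt(ms_wg * (c ** 2 / n).sum())  # ~92.98
t_obs   = psi_hat / se_psi                     # ~2.32
p_val   = 2 * stats.t.sf(abs(t_obs), df_wg)
print(round(t_obs, 3), round(t_obs ** 2, 3), round(p_val, 4))  # t^2 = F with 1 numerator d.f.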
The Bonferroni Inequality
• Used to control the Type I Error rate
when hypothesis testing using multiple
contrasts.
• As we showed earlier, when you run a
series of significance tests, the Type I
error rate does not remain at the .05
or .01 levels across the tests, but actually
increases as a function of the number of
comparisons that you are making.
The Bonferroni Inequality (cont.)
Recall the formula:

1 - (1 - α)^C

where α signifies the Type I error rate for each
test and C equals the number of comparisons
made.
e.g., if we ran 6 significance tests each at
the .05 level, our actual error rate across the
comparisons would be:

1 - (1 - .05)^6 = .2649

a much higher value than we would ever want to use.
The Bonferroni Inequality (cont.)
• To control the Type I error rate across a set of
contrasts it is possible to apply the Bonferroni
inequality to determine the proper significance
level to test each comparison or contrast at:
error rate = .05 / C = .05 / 6
           = .0083
• This controls the Type I error rate across the
set of contrasts so that it will not exceed our
specified level of .05 (or .01) since:
1 - (1 - .0083)^6 = .0488 < .05
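The arithmetic of the adjustment is simple enough to show in a few lines of Python (an illustration of the formulas above, not part of the slides):

alpha, C = 0.05, 6

familywise = 1 - (1 - alpha) ** C     # ~ .2649 with no adjustment
per_test   = alpha / C                # Bonferroni-adjusted level, ~ .0083
check      = 1 - (1 - per_test) ** C  # ~ .049, stays at or below .05
print(round(familywise, 4), round(per_test, 4), round(check, 4))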
The Bonferroni Inequality (cont.)
• The Bonferroni adjustment is very
conservative as with increasing tests, it
sets a very stringent level for the Type I
error rate.
• For this reason it has received increasing
criticism of late.
• Alternatives to the Bonferroni have been
proposed and should be considered
when appropriate.
Scheffe’s S Test
• Scheffe’s S test is another type of post
hoc test that is described by Kirk (1982)
as “…one of the most flexible,
conservative, and robust data snooping
procedures available.” (p. 121)
• Can be used for all possible contrasts not
just pairwise.
• Can be used with unequal n’s.
Scheffe’s S Test
• Error rate is set experimentwise for an
infinite number of possible contrasts.
• Most conservative of the post hoc tests.
• Less powerful than Tukey’s HSD.
• Uses the F-distribution and is robust
against violations of the normality and
homogeneity of variances assumption.
Scheffe’s cont.
• The critical difference a contrast must exceed is:

Ψ̂(S) = √[(p - 1) · Fα,ν1,ν2] × √[MSError · Σj(Cj² / nj)]

Where
p = # of means
Fα,ν1,ν2 is taken from the F-table with ν1 = p - 1 and ν2 = the MSError d.f.
Scheffe’ cont.
Example using the contrast previously tested (-2, -2, -2, 3, 3):

Ψ̂(S) = √[(p - 1) · Fα,ν1,ν2] × √[MSError · Σj(Cj² / nj)]

p = 5
Fα,ν1,ν2 = F(.05, 4, 45) = 2.5975

Ψ̂(S) = √[(5 - 1)(2.5975)] × √[2882 × ((-2)²/10 + (-2)²/10 + (-2)²/10 + (3)²/10 + (3)²/10)]
     = √10.39 × √[2882 × (.4 + .4 + .4 + .9 + .9)]
     = 3.223 × √[2882 × 3]
     = 3.223 × 92.98
     ≈ 299.7
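The same Scheffé critical difference can be computed in Python (a sketch, not from the slides; using the exact F quantile of about 2.58 gives a value near 299, slightly different from the hand calculation with the tabled F):

import numpy as np
from scipy import stats

p, n, df_wg, ms_wg = 5, 10, 45, 2882.0
c = np.array([-2, -2, -2, 3, 3], dtype=float)

F_crit = stats.f.ppf(0.95, p - 1, df_wg)                               # ~2.58
psi_S  = np.sqrt((p - 1) * F_crit) * np.sqrt(ms_wg * (c ** 2 / n).sum())
print(round(F_crit, 4), round(psi_S, 2))                               # critical difference ~299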
Trend Analysis and Orthogonal
Polynomial Contrasts
• If you have a factor in an ANOVA model that has
three or more levels, it is also possible to
conduct a trend analysis using orthogonal
polynomial contrasts:
If you have:       you can conduct tests for:
2 means            linear trend only
3 means            linear & quadratic trends
4 means            linear, quadratic, & cubic trends
5 means            linear, quadratic, cubic, & quartic trends
etc.
• The contrasts to use might look like this:
linear -3 -1 1 3
quadratic 1 -1 -1 1
cubic -1 3 -3 1
• Some software packages, like SAS, have
procedures to generate trend coefficients
for any number of means. Others, like
SPSS, will run a trend analysis as part of
the general output.
Steps in Conducting a Trend Analysis
1. Calculate the SSBG associated with a
linear trend component
2. Test the residual or remaining SS for
any further departures from linearity
3. Calculate the SSBG associated with a
quadratic trend component
4. Repeat step 2
5. Evaluate SS with next higher-order
trend.
6. Etc.
Example of a Linear Trend
• Assume that we have run an experiment and
have found the sums across 4 treatment levels
to be equal to 22, 28, 50, and 72.
• Assume that we also calculated the SSWG and
find it to be equal to 41.0 and SSBG = 194.5.
• Further assume that each level of the IV has 8
participants.
• Our linear trend contrast coefficients are:
-3 -1 1 3
Example (cont)
• Our linear contrast would be:

ψ̂lin = Σj (C1j · Σi Yij)
     = -3(22) + -1(28) + 1(50) + 3(72) = 172

Note: the i in Cij refers to the number of the trend
component the set of contrasts is associated with
(i.e., linear = 1, quadratic = 2, etc.).
Compute the Sums of Squares

SSψlin = ψ̂²lin / (n ΣjC²1j)
       = (172)² / (8[(-3)² + (-1)² + (1)² + (3)²])
       = 184.9
d.f. = 1

• SSψ dep. from lin = SSBG - SSψlin
                   = 194.5 - 184.9 = 9.6
d.f. = p - 2 = 2
Compute the Sums of Squares

Note: if using the group means instead of
the sums, use the following formula
instead:

SSψlin = n[Σj (Cij Ȳ.j)]² / ΣjC²ij
Compute the F-ratio for Both

F = MSψlin / MSWG = (SSψlin / 1) / (SSWG / p(n - 1)) = (184.9 / 1) / (41.0 / 28)
  = 126.3 = F obtained for the linear trend

• d.f. = 1 for the numerator and p(n - 1) = 28 for the denominator
• F-crit(1, 28) = 4.20
Compute the F-ratio (cont.)

F = MSψ dep. from lin / MSWG = (SSψ dep. from lin / 2) / (SSWG / p(n - 1)) = (9.6 / 2) / (41.0 / 28)
  = 3.28 = F obtained (any other trend)

• d.f. = 2, p(n - 1) = 28
• F-crit(2, 28) = 3.34
Trend Analysis Example Cont.
• Based on the F-tests, we would
conclude that there is a significant
linear trend but that there is no
higher-order trend present.
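The whole trend analysis above fits in a short Python sketch (an illustration, not from the slides, using the example's sums and sums of squares):

import numpy as np
from scipy import stats

sums  = np.array([22, 28, 50, 72], dtype=float)  # treatment-level sums from the example
n, p  = 8, 4
ss_bg, ss_wg = 194.5, 41.0
c_lin = np.array([-3, -1, 1, 3], dtype=float)    # linear trend coefficients

psi_lin  = (c_lin * sums).sum()                       # 172
ss_lin   = psi_lin ** 2 / (n * (c_lin ** 2).sum())    # ~184.9
ss_resid = ss_bg - ss_lin                             # ~9.6, departure from linearity
ms_wg    = ss_wg / (p * (n - 1))                      # 41.0 / 28
F_lin, F_resid = (ss_lin / 1) / ms_wg, (ss_resid / (p - 2)) / ms_wg
print(round(F_lin, 1), round(F_resid, 2),             # ~126.3 and ~3.28
      round(stats.f.ppf(0.95, 1, 28), 2), round(stats.f.ppf(0.95, 2, 28), 2))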
Additional Post Hoc Tests
• Newman-Keuls
• Tukey Kramer
• Scheffe’
• The Scheffe’ test is one of special interest
since it is the only post-hoc test apart from
ones specifically defined by the user that can
test other than pairwise comparisons. It also
offers the most stringent Type I error rate of all
of the post-hoc tests.
Power of the F-test
• Effect sizes in the ANOVA context are typically calculated
in one of two ways:
– As a standardized average difference between the
group means
– Using eta-squared (η²) and omega-hat squared (ω̂²)
• In the case of number 1, the effect size is first
calculated and translated into a value known as Phi
and then special tables are referenced to estimate
power and sample size.
• The second approach is much easier to implement.
Here one runs the analysis, calculates eta-squared or
omega-hat squared (for fixed effects) or the intra-class
correlation coefficient (for random effects).
Remember…

η² = SSBG / SSTOT

ω̂² = σ̂²BG / (σ̂²BG + σ̂²WG)
   = [SSBG - (a - 1)(MSWG)] / (SSTOT + MSWG)
   = (a - 1)(F - 1) / [(a - 1)(F - 1) + a·n]
Power (cont.)
• Once the effect size has been
calculated, it is a simple matter to turn
it into another useful value using the
following formula:
f = √[ω̂² / (1 - ω̂²)]

(f is a symbol used by Jacob Cohen to
index a value that can be used to
estimate power and sample sizes.)
Conventions for Effect Sizes (Cohen)
f = .10 = small effect size
f = .25 = medium effect size
f = .40 = large effect size
(equivalently, ω² of roughly .01, .06, and .14)
• After calculating f, one can then use
information from published tables in Cohen to
calculate power and sample size or use
existing computer software such as G-Power
• Note that there is an option available in SPSS's
GLM procedure which will output power values
for main and interaction effects for any ANOVA
design model.
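As a closing illustration (not part of the slides), the conversion from ω̂² to Cohen's f and a rough power estimate can be scripted; the statsmodels call below assumes nobs is the total sample size across groups:

import numpy as np
from statsmodels.stats.power import FTestAnovaPower

omega_sq = 0.044                           # from the CR-5 example above
f = np.sqrt(omega_sq / (1 - omega_sq))     # Cohen's f, about 0.21

power = FTestAnovaPower().power(effect_size=f, nobs=50, alpha=0.05, k_groups=5)
print(round(f, 3), round(power, 3))        # power is quite low for an effect this small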

More Related Content

PPT
anova & analysis of variance pearson.ppt
PDF
07. Repeated-Measures and Two-Factor Analysis of Variance.pdf
PPTX
Repeated-Measures and Two-Factor Analysis of Variance
PPT
1 ANOVA.ppt
DOCX
(Individuals With Disabilities Act Transformation Over the Years)D
PPTX
Anova - One way and two way
PPTX
Correlation and Regression - ANOVA - DAY 5 - B.Ed - 8614 - AIOU
PPTX
mean comparison.pptx
anova & analysis of variance pearson.ppt
07. Repeated-Measures and Two-Factor Analysis of Variance.pdf
Repeated-Measures and Two-Factor Analysis of Variance
1 ANOVA.ppt
(Individuals With Disabilities Act Transformation Over the Years)D
Anova - One way and two way
Correlation and Regression - ANOVA - DAY 5 - B.Ed - 8614 - AIOU
mean comparison.pptx

Similar to Stactistics: Analysis of Variance Part I (20)

PPTX
mean comparison.pptx
PPTX
3.4 ANOVA how to perform one way and two way ANOVA.pptx
PPTX
Parametric & non-parametric
PDF
Research 101: Inferential Quantitative Analysis
PPTX
ANOVA theory qadm pptx in qualitative decision making
PDF
Analysis of Variance (ANOVA)
PPTX
Shovan anova main
PPTX
Parametric test - t Test, ANOVA, ANCOVA, MANOVA
PPTX
Workshop on Data Analysis and Result Interpretation in Social Science Researc...
PPTX
Introduction to Analysis of Variance
PPTX
QM Unit II.pptx
PPT
PDF
Understanding ANOVA Tests: One-Way and Two-Way
PDF
Analysis of Variance
PPTX
Mean comparison2
PPTX
F unit 5.pptx
PDF
Repeated Measures ANOVA
PPT
CHAPTER 2 - NORM, CORRELATION AND REGRESSION.ppt
PDF
Analysis of Variance
PPTX
ANOVA Parametric test: Biostatics and Research Methodology
mean comparison.pptx
3.4 ANOVA how to perform one way and two way ANOVA.pptx
Parametric & non-parametric
Research 101: Inferential Quantitative Analysis
ANOVA theory qadm pptx in qualitative decision making
Analysis of Variance (ANOVA)
Shovan anova main
Parametric test - t Test, ANOVA, ANCOVA, MANOVA
Workshop on Data Analysis and Result Interpretation in Social Science Researc...
Introduction to Analysis of Variance
QM Unit II.pptx
Understanding ANOVA Tests: One-Way and Two-Way
Analysis of Variance
Mean comparison2
F unit 5.pptx
Repeated Measures ANOVA
CHAPTER 2 - NORM, CORRELATION AND REGRESSION.ppt
Analysis of Variance
ANOVA Parametric test: Biostatics and Research Methodology
Ad

Recently uploaded (20)

PDF
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
PPTX
7. General Toxicologyfor clinical phrmacy.pptx
PPTX
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
PPTX
2Systematics of Living Organisms t-.pptx
DOCX
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
PDF
An interstellar mission to test astrophysical black holes
PPTX
Comparative Structure of Integument in Vertebrates.pptx
PPTX
2. Earth - The Living Planet earth and life
PDF
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
PPT
POSITIONING IN OPERATION THEATRE ROOM.ppt
PPTX
The KM-GBF monitoring framework – status & key messages.pptx
PPTX
Microbiology with diagram medical studies .pptx
PDF
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
PDF
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
PDF
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
PDF
lecture 2026 of Sjogren's syndrome l .pdf
PDF
Biophysics 2.pdffffffffffffffffffffffffff
PDF
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
PPTX
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
PDF
Phytochemical Investigation of Miliusa longipes.pdf
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
7. General Toxicologyfor clinical phrmacy.pptx
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
2Systematics of Living Organisms t-.pptx
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
An interstellar mission to test astrophysical black holes
Comparative Structure of Integument in Vertebrates.pptx
2. Earth - The Living Planet earth and life
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
POSITIONING IN OPERATION THEATRE ROOM.ppt
The KM-GBF monitoring framework – status & key messages.pptx
Microbiology with diagram medical studies .pptx
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
lecture 2026 of Sjogren's syndrome l .pdf
Biophysics 2.pdffffffffffffffffffffffffff
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
Phytochemical Investigation of Miliusa longipes.pdf
Ad

Stactistics: Analysis of Variance Part I

  • 2. Why study Analysis of Variance (ANOVA)? • ANOVA technology was developed by R. A. Fisher for use in the agricultural trials being run at the Rothamstad Agricultural Research Station where he was employed for a period of time. • The research was concerned with testing the effects of different types of organic and inorganic fertilizers on the crop yield. • For this reason, many terms used to describe ANOVA models still reflect the terminology used in the agricultural settings (e.g., split-plot designs).
  • 3. • ANOVA methods were specifically developed for the analysis of data from experimental studies. • The models assume that random assignment has taken place as a means of assuring that the IID assumptions of the statistical test are met. • These designs are still the staple of certain social science disciplines where much experimental research is conducted (e.g., psychology). Why study Analysis of Variance (ANOVA)?
  • 4. • But what is the utility for other disciplines? • There are several reasons why knowledge of ANOVA is important: – Clinical trials in all areas of research involving human participants still employ these designs. – Many of the techniques employed in testing hypotheses in the ANOVA context are generalizable to other statistical methods. – There are other statistical procedures, e.g., variance components models that use a similar approach to variance decomposition as is employed in ANOVA analysis and can be more readily understood with an understanding of ANOVA. – Possible involvement in interdisciplinary research. Why study Analysis of Variance (ANOVA)?
  • 5. • ANOVA models distinguish between independent variables (IVs), dependent variables (DVs), blocking variables, and levels of those variables. • All IVs and Blocking variables are categorical. DVs are continuous and IID normally. • We have already discussed IVs and DVs. • Blocking variables are variables needing to be controlled in an analysis that are not manipulated by the researchers and hence not true IV’s (e.g., gender, grade in school, etc.). Overview of ANOVA
  • 6. “Levels” of IVs and Blocking Variables • Each IV and Blocking variable in an ANOVA model must have two or more “levels.” – Example: the independent variable may be a type of therapy, a drug, the induction of anger or frustration or some other experimental manipulation. • Levels could include different types of therapies or therapies of differing degrees of intensity, different doses of a drug, different induction procedures, etc. • It is assumed that participants are randomly assigned to levels of the IV but not to levels of any blocking variables.
  • 7. Overview of ANOVA • IVs and Blocking variables are referred to as “Factors” and each factor is assumed to have two or more levels. • In ANOVA we distinguish between between groups and within groups factors.
  • 8. Between vs. Within Factors • A between groups factor is one in which each subject appears at only one level of the IV. • A within groups factor (of which a repeated measures factor is an example), is one in which each subject appears at each level of the IV. • It is possible to have a design with a mixture of between and within groups factors or effects.
  • 9. Fixed vs. Random Effects: Expected Mean Squares (EMS) • Effects or factors are fixed when all levels of an IV or blocking variable we are interested in generalizing to are included in the analysis. • Effects are random when they represent a sampling of levels from the universe of possible values.
  • 10. Examples of Random and Fixed Effects • Drug dosages of 2, 4, or 10 mg – random: since not all levels are represented. • Different raters providing observational ratings of behavior – random • Gender - male and female – fixed • MST treatment versus usual services – fixed
  • 11. Fixed vs. Random Effects (cont.) • The distinction between fixed and random effects is important since it has implications for the way in which the treatment effects are estimated and the generalizability of the results
  • 12. • For fixed effect models, we have complete information (all levels of a variable or factor are observed) and we can calculate the effect of the treatment by taking the average across the groups present. • In the case of random factors, we have incomplete information, (not all levels of the factor are included in the design). For random factors, we are estimating the treatment effect at the level of the population given only the information available from the levels we included in our design. The formulas are designed to represent this uncertainty.
  • 13. Fixed versus Random Effects (cont.) • In the case of a fixed effect, we can generalize the results only to the levels of the variables included in our analyses. • Random effects assume that the results will be generalized to other levels between the endpoint values included in our analyses.
  • 14. “Levels” of IVs and Blocking Variables • The ANOVA model is a “means model” – i.e., it assumes that any observed differences in behavior can be completely described using only information in the means. • The ANOVA model is also a population- averaged model. • It evaluates the effects of treatments and blocking variables at the group rather than at the individual level.
  • 15. Hypothesis Testing • ANOVA involves the same 4-step hypothesis testing procedure we applied in the case of the z-test, t-tests, and tests for the correlation coefficients. • We will, however, use a different sampling distribution to determine the critical values of our statistic. • This sampling distribution is called the F-distribution and the significance test is now called an F-test.
  • 16. F-test Basics • F-statistics are formed as the ratio of two chi- square distributions divided by their respective degrees of freedom: χ2 /d.f.1 F(1,d.f.Denom) = t2 = -------------- χ2 /d.f.2 • As a result, unlike the t-distribution the shape of which was determined by one degree of freedom parameter, the F-distribution is determined by two degree of freedom parameters. • When there are only 2 groups, F is equal to t2 .
  • 17. The F Statistic (cont.) • A very important property of the F-test under the null hypothesis of no differences between the groups is that, in theory, the numerator and denominator are independent estimates of the same population variance. • However, the denominator measures only “error” or “noise” while the numerator measures both error and treatment effect. • Under the null hypothesis of no effect of treatment, the expected value of the F-statistic is 1.0. As the treatment effect increases in size, F becomes greater than 1.0 • Note: Although in theory F should never be less than 1.0, with “real” data it will fall below 1.0 at times.
  • 18. Error Term • The error term in the denominator of the F- statistic is an extension of the two sample t-test error term. – In the two sample t-test we saw that since two independent estimates of the population variance were available - one from each sample - we could improve on the estimate of the population parameter by averaging across the two estimates. The error term which resulted was called a “pooled error term.” • In ANOVA, we will have at least two but possibly three or more groups. Regardless, the process is the same. To improve on our estimate of the population parameter, we pool the variance estimates together - one from each cell or sample - and use this mean squared error as the error term in our F-test.
  • 19. Tabled Values of F • The critical values in the table for the F-statistic are non-normally distributed and include only the values at the upper tail of the distribution. • Lower values can be obtained by taking the reciprocal of the tabled value of F, i.e., 1/F but these values are rarely used. • The F-distribution changes shape depending on the numerator and denominator degrees of freedom as can be seen in the next slide:
  • 21. ANOVA Hypotheses • Ho: μ1 = μ2 = μ3 • H1: μ1 ≠ μ2 ≠ μ3 • As with the other statistics covered in this course, the F-test can be run working from definitional formulas or computational formulas. • We will work through the definitional formulas in class examples. • One reason for this is that it is easy to calculate the statistic using these formulas. More importantly, it is easier to see what each component of the statistic represents.
  • 23. One-Way ANOVA Design Model • Simplest ANOVA model – single factor: Yij = μ + αj + εij • Where i = person and j=group. • This model says that each person's score can be described by: – μ, the overall or grand mean – αj, an average group level treatment effect, and – εij, a parameter describing each individual's deviation from the average group effect.
  • 24. The F-statistic • For all ANOVA designs, the F-statistic is comprised of two parts: – Numerator: a component known as the “mean square between groups” – Denominator: “mean square within groups” – We form a ratio of these two variance terms to compute the F statistic: MSBG SSBG / d.f.BG F = --------- = --------------- MSWG SSWG / d.f.WG
  • 25. The F-statistic (cont.) • The trick is to correctly estimate the variance components forming the numerator and denominator for different combinations of fixed and random effects. • The correct formulas to use are based on the statistical theory underlying ANOVA and are derived using what are termed “expected mean square” formulas.
  • 26. Components of the F-Statistic • We define the numerator and denominator of the F-test in terms of their expected values (the values that one should obtain in the population if Ho were true). • The expected mean squares for the two types of effects in the CR-p design are given by: Model I Model II (Fixed effect) (Random effect) _____________________________________________ MSBG = σε 2 + nΣαj 2 / (p - 1) σε 2 + n(1 - p/P)σα 2 MSWG = σε 2 σε 2 ________________________________________
  • 27. Forming the F-statistics for the two possible design models: σε 2 + nΣαj 2 / (p - 1) F (Model 1) = ---------------------------- σε 2 σε 2 + n(1 - p/P)σα 2 F (Model 2) = ---------------------------- σε 2
  • 28. Calculating the Variance Components J _ _ SSBG = n Σ (X.j - X..)2 j=1 J I _ J ^ SSWG = Σ Σ (Xij - X.j)2 or Σ s2 (n-1) j=1 i=1 j=1 MSBG = SSBG/d.f.BG MSWG = SSWG/d.f.WG
  • 29. Variance Components (cont). • If we add the SSBG and SSWG terms together, we have the Total Sums of Squares or SSTOT: SSTOT = SSBG+SSWG = ΣΣ(Yij – Y..)2 • The ANOVA process represents a “decomposition” of the total variation in a DV into “components” of variation attributable to the factors included in the model and a residual or error component. • In the CR-p design, SSBG is the variance in the DV that can be “explained” by the IV and SSWG is the error, residual, or unexplained variation left over after accounting for SSBG.
  • 30. Assumptions • ANOVA makes the same assumptions as the t- test. • F assumptions: – The dependent variable is from a population that is normally distributed. – The sample observations are random samples from the population. – The numerator and denominator of the F-test are estimates of the same population variance. – The numerator and denominator are independent.
  • 31. Assumptions (cont.) • Model Assumptions: – The model equation (design model) reflects all sources of variation affecting a DV and each score is the sum of several components. – The experiment contains all treatment levels of interest – The error effect is independent of all other errors and is normally distributed within each treatment population with a mean equal to 0 and variance equal to σ2 (homogeneity of variances assumption)
  • 32. Assumptions (cont.) • Note that the assumptions of ANOVA are frequently violated to at least some extent with real-world data. • Generally the violations have little effect on significance or power if: – The data are derived through random sampling – The sample size is not small – Departures from normality are not large – n’s within each cell are equal.
  • 33. Violations of Assumptions • The F-test is robust to violations of the homogeneity of variances assumption. When the violation is extreme, however, the result is an incorrect Type I error rate (It will become increasingly inflated as the violation becomes more severe). • As we noted with the t-test, if the group sizes are equal, the impact of heterogeneity is of less concern.
  • 34. Violations of Assumptions • The effect on the error term will be liberal when the largest cell sizes are associated with the smallest variances and conservative if the largest cell sizes are associated with the largest variances. • In most cases, problems arise when the cell or group sizes are very different.
  • 35. Normalizing Transformations • When the data are more severely skewed, kurtotic, or both, or the homogenity of variances assumption has been violated, it is sometimes necessary to apply a normalizing transformation to the DV to allow the analysis to be run.
  • 36. Transformations • Normalizing transformations help to accomplish several things: – Homogeneity of error variances – Normality of errors – Additivity of effects, i.e., effects that do not interact (as is desirable, for example, in a repeated measures ANOVA design). By transforming the scale of measurement, additivity can sometimes be achieved. – We will talk about additivity more in the context of repeated measures ANOVA.
  • 37. Types of Normalizing Transformations • Square Root – Y'=√Y – Use when treatment level means and variances are proportional (moderate positive skew)
  • 38. Types of Normalizing Transformations (Cont.) • Log 10 – Y'=log10(Y+1) – Use when Tx means and SDs are proportional (more extreme positive skew)
  • 39. Types of Normalizing Transformations (Cont.) • Angular or Inverse Sine – Y'=2*arcsin(√Y) – Use when means and variances are proportional and underlying distribution is binomial. – Often used when the DV represents proportions.
  • 40. Types of Normalizing Transformations (Cont.) • Inverse or Reciprocal – Y'=1/Y – Use when squares of Tx means are proportional to SDs (severe positive skew or L-shaped distribution)
  • 41. Negatively Skewed Distributions • When the distribution is negatively skewed, one must first "reflex" the distribution and then apply the correct transformation. To reflex a set of scores, simply subtract each value from the highest value plus one unit. – e.g. an item is on a 1-5 scale, and we wish to reflex it (change it to a 5-1 scale) we would simply subtract each value from 6: Reflexed(y) = 6-y
  • 42. Selecting a Transformation • If you are unsure which transformation will work best, you can apply all possible transformations to the highest and lowest scores in each treatment level. • Calculate the range of values for each treatment level by subtracting the smallest score from the largest. • Form a ratio of the largest and smallest ranges for each transformation across Tx levels. • The transformation associated with the smallest ratio wins. • (See Kirk Experimental Design for an example)
  • 43. Additivity • If addititivity is of interest, use a test for nonadditivity such as that developed by Tukey (1949) and covered in many statistics texts. • Then select a transformation that reduces nonadditivity to an acceptable level.
  • 44. Example of a Completely Randomized Between Groups Design (CR-5) • One independent variable with five levels • The independent variable represents different types of stranger awareness training and the dependent variable the latency of the children to protest verbally about a stranger’s actions (measured in seconds).
  • 45. Example (cont.) Experimenter Mother Mother Role Control verbal verbal natural play __________________________________________________________ Mean 279 284 286 308 330 SD 50 53 51 56 58 n 10 10 10 10 10 __________________________________________________________ Y.. = 297.4 Can eyeball the SD’s relative to the sample sizes to see if the homogeneity assumption has been violated.
  • 46. CR-5 Example (cont.) • Ho: μ1 = μ2 = μ3 = μ4= μ5 • H1: μ1 ≠ μ2 ≠ μ3 ≠ μ4 ≠ μ5 • Use ~F(4,45) • Where dfnum = p-1 and dfden = p(n-1) • Set up decision rules (from F-table): – Fcrit(.05,4,45) = 2.61 – If Fobs > Fcrit then reject Ho
  • 47. Formulas • Calculate statistic and apply decision rules: J ^ SSWG = Σ S2 (n-1) = [(50)2 +(53)2 +(51)2 +(56)2 +(58)2 ]*9=129,690 j=1 J _ _ SSBG = nΣ (Xj - X..)2 j=1 = (10)(279-297.4)2 +(284-297.4)2 +(286-297.4)2 + (308-297.4)2 +(330-297.4)2 = (10)1823.2 = 18,232 SSTOT = SSBG+SSWG = 129,690+18,232 = 147,992
  • 48. Formulas (cont.) d.f.BG = p-1 = 5-1= 4 d.f.WG = p(n-1) = 5(10-1)= 45 MSBG = SSBG/d.f.BG = 18,232/4 = 4558 MSWG = SSWG/d.f.WG = 129,690/45 = 2882
  • 49. Formulas (cont.) MSBG F = ------ MSWG = 4558 / 2882 = 1.58 Fcrit(.05,4,45) = 2.61 Since F-obs < F-crit do not reject Ho. Conclude no effect due to treatment
  • 50. What if F was significant? • If you found that there was a significant difference between means using the F-statistic, you still do not know which of the groups were significantly different from the others. • This is because the F-test is a simultaneous or omnibus test of the difference between all possible combinations of group means. • In the case of the two-sample t-test, this was easy to resolve by looking at the group means. In the case of three or more groups, this is not as easy to determine. • For this reason, it is necessary to introduce what are known as post-hoc tests as a follow-up to the F-test.
  • 51. Post Hoc Tests • Post hoc tests take many forms. – For example, your book introduces one post hoc test known as Tukey's HSD (where the HSD stands for "honestly significantly different"). – This test compares all pairwise combinations of means and allows one to determine which of a set of means are actually different from one another.
  • 52. Post Hoc Tests (cont.) • Other commonly used Post Hoc Tests: – Tukey's HSD: Evaluates all pairwise differences between means and is based on the Studentized range statistic (q). Maintains error rate familywise at alpha. – Fisher's LSD (Do not EVER use this!): tantamount to no control. Essentially sets error rate per contrast. – Dunnett’s: Allows one to compare p-1 treatment means to a control mean with the correlation between any two contrasts being .50. Controls error rate familywise.
  • 53. Post Hoc Tests (cont.) – Newman-Keuls: Controls the error rate at alpha for any ordered set of means and takes into account the distance between the means. Control falls between familywise and per contrast. – Scheffé's: One of the most stringent and therefore least powerful tests. Holds the Type I error rate at alpha for all possible contrasts. Can be used with unequal n's and for other than pairwise comparisons. – Duncan's multiple range test: Similar in form to Newman-Keuls but uses a more liberal protection level, so it provides weaker control of the familywise error rate.
  • 54. Type I Error Rate • Post hoc tests were developed to deal with a problem associated with running multiple significance tests. • Whenever more than one hypothesis test is run on the same data, the overall (familywise) Type I error rate rises above the nominal per-test level. The amount of increase is described by a simple formula: true α = 1 - (1 - α)^C • Where C = the number of significance tests run (assuming independent tests).
  • 55. Type I Error Rate • As the number of tests increases, the Type I error rate also increases. If you run 5 significance tests each at the .05 level, the true error rate will be 1 - (1 - .05)^5 = .2262. • It is easy to see that this is a problem in the ANOVA context, since determining which groups differ from one another requires the application of multiple significance tests.
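A quick numerical check of the inflation formula, assuming independent tests:

```python
# Familywise Type I error rate for C independent tests at per-test alpha.
alpha, C = 0.05, 5
familywise = 1 - (1 - alpha) ** C
print(f"{familywise:.4f}")   # 0.2262 for C = 5
```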
  • 56. Type I Error Rate • In response, the various post hoc tests were designed to hold the Type I error rate at the nominal level across an entire set of significance tests. • To better understand issues having to do with the Type I error rate in ANOVA models, one must first understand that the error rate can be defined at several different levels: – Experiment-wise, or across all possible comparisons in a study. – Family-wise, or across all possible comparisons for a single factor or effect (including an interaction effect). – Per contrast or comparison.
  • 57. Post Hoc Tests (cont.) • Setting the Type I error rate per contrast is like setting no control at all. • Setting the error rate experiment-wise is generally too conservative for most applications. • In most cases, the error rate is set family-wise. – Note: Fisher's LSD is tantamount to no control at all and should never be used as a post hoc test for this reason. – Scheffé's is the most conservative and can also be used for other than pairwise comparisons.
  • 58. Example using Tukey's HSD • Tukey's HSD is specifically designed to maintain the Type I error rate across all possible pairwise comparisons at the nominal level (here .05). The formula for the HSD critical difference is: HSD = q(.05; p, dfWG) × √(MSWG/n) • q is given in a table in your book; for our problem, at the .05 level, q = 4.04: HSD = 4.04 × √(2882/10) = 68.58 • 68.58 is the value that the difference between any two means must exceed for that difference to be considered significant.
  • 59. Pairwise mean differences (row mean minus column mean):
        M1    M2    M3    M4    M5
  M1     -    -5    -7   -29   -51
  M2           -    -2   -24   -46
  M3                 -   -22   -44
  M4                       -   -22
  M5                             -
  In this example, none of the means are significantly different by Tukey's HSD, since none of the absolute differences exceeds the critical difference value of 68.58.
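A small Python sketch of the HSD comparison above; the q value of 4.04 is read from the table as on the slide (a studentized-range routine could supply it instead), and the means and MSWG come from the CR-5 example:

```python
import numpy as np

means = np.array([279, 284, 286, 308, 330], dtype=float)
ms_wg, n = 2882.0, 10
q = 4.04                      # q(.05; p = 5, df = 45) read from a table

hsd = q * np.sqrt(ms_wg / n)  # critical difference, about 68.6
print(f"HSD critical difference = {hsd:.2f}")

# All pairwise absolute differences; flag any that exceed the HSD value.
for i in range(len(means)):
    for j in range(i + 1, len(means)):
        diff = abs(means[i] - means[j])
        flag = "significant" if diff > hsd else "ns"
        print(f"M{i+1} vs M{j+1}: |diff| = {diff:5.1f}  {flag}")
```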
  • 60. Tukey's HSD (cont.) • Sometimes a pair of means may be significantly different using a post hoc test when the omnibus F-test tells you there are no differences, or vice versa. – This can happen because the two tests may use different error terms and degrees of freedom, or because the means for one pair of groups are sufficiently different but the difference is diluted when the sums of squares are pooled for the overall test. – You should decide which result to interpret based on your best judgment as to which level of the Type I error rate is most appropriate.
  • 61. ω̂², η², and ρ • These three statistics represent the proportion of the total variance in the dependent variable explained by the independent variables included in a design. – Omega-hat squared (ω̂²) estimates the proportion of variance in the dependent variable explained by the fixed-effect independent variables in the population. – Rho (ρ, the intraclass correlation) is the equivalent of ω̂² for random effects. – Eta-squared (η²) is the proportion of variance explained in the sample. Eta-squared is similar to the Pearson r² but indexes both linear and nonlinear association. – These statistics are called measures of the "strength of association" (between the IVs and the DV).
  • 62. Omega-Hat Squared, Rho, and Eta-Squared (cont.) • These statistics supplement the significance tests, allowing a researcher to determine how meaningful the results of the statistical tests are.
  • 63. Omega Squared • Formula for a one-way design: ω̂² = [SSBG - (k - 1)MSWG] / (SSTOT + MSWG), where k = number of groups. For our problem: ω̂² = [18,232 - (5 - 1)(2,882)] / (147,922 + 2,882) = 6,704 / 150,804 = .044, or approximately 4% of the variance in the measure latency to protest is explained by the various types of treatments included in this design.
  • 64. Rho (ρ) • Our example includes a fixed effect. • If the effect had been random, we would have calculated rho, which is given by: ρ = (MSBG - MSWG) / [MSBG + (n - 1)MSWG]
  • 65. Eta-Squared η² = SSBG / SSTOT. For our example: = 18,232 / 147,922 = .123, or about 12% of the sample variance in latency scores is explained by the treatments.
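Both strength-of-association measures for the CR-5 example can be computed directly from the sums of squares already in hand; a minimal sketch:

```python
# Strength-of-association measures for the CR-5 example.
ss_bg, ss_wg = 18232.0, 129690.0
ss_tot = ss_bg + ss_wg
k, ms_wg = 5, 2882.0

eta_sq = ss_bg / ss_tot                                   # sample proportion of variance
omega_sq = (ss_bg - (k - 1) * ms_wg) / (ss_tot + ms_wg)   # population estimate (fixed effects)
print(f"eta^2 = {eta_sq:.3f}, omega-hat^2 = {omega_sq:.3f}")
```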
  • 66. A Note… • As with the post hoc tests, the strength-of-association measures are often computed only when the F-tests are significant, to help index the importance of the result. • However, they can also be calculated as an index of effect size to help understand why a test failed to reach significance. • Therefore, it is not a bad idea to calculate them routinely.
  • 68. Contrasts • A contrast or comparison among means is simply a difference among the means with appropriate algebraic signs: ψ1 = (-1)Ȳ.1 + (1)Ȳ.2 = a comparison between the means of groups 1 and 2; ψ2 = (-1)Ȳ.1 + (-1)Ȳ.2 + (2)Ȳ.3 = a comparison of groups 1 and 2 vs. group 3. • In general, contrasts have the form: ψi = c1Ȳ.1 + c2Ȳ.2 + ... + cpȲ.p
  • 69. Contrasts • Contrasts can be preplanned or post hoc, and pairwise or non-pairwise. • The number of pairwise comparisons that can be defined for any set of means is equal to p(p - 1)/2, where p = the number of means. • Contrasts can also be orthogonal or nonorthogonal.
  • 70. Orthogonal vs. Nonorthogonal • Two contrasts i and i' are orthogonal if the following equality holds: Σj=1..p cij ci'j = 0 for the equal n case, or Σj=1..p cij ci'j / nj = 0 for the unequal n case.
  • 71. Orthogonal vs. Nonorthogonal Contrasts • If two contrasts are nonorthogonal, they are correlated, with the correlation given by: ρii' = (Σ cij ci'j / nj) / √[(Σ cij² / nj)(Σ ci'j² / nj)] • Thus the pairwise contrasts
    -1  1  0  0
    -1  0  1  0
    -1  0  0  1
  would not be considered orthogonal, since the cross-products for any pair of them sum to 1, not 0 — e.g., for the first two rows, (-1)(-1) + (1)(0) + (0)(1) + (0)(0) = 1. With equal n's of 10, each pair is correlated .5.
  • 72. On the other hand, the following pair of contrasts is orthogonal:
    -1  1  0  0
     0  0 -1  1
  since the cross-products (0, 0, 0, 0) sum to 0.
  • 73. • The number of mutually orthogonal contrasts is always equal to p - 1, the number of levels of a factor minus 1 (i.e., the d.f. for the factor). • For our example with 5 means, we should be able to define 4 orthogonal contrasts. • One possible set of such contrasts is the following (verified in the sketch below):
     4  -1  -1  -1  -1
     0   3  -1  -1  -1
     0   0   2  -1  -1
     0   0   0   1  -1
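A minimal NumPy sketch that verifies the orthogonality of the contrast set above by checking that every pairwise cross-product sums to zero (equal n case):

```python
import numpy as np

# The four contrasts listed above (one contrast per row), for five equal-n groups.
C = np.array([
    [ 4, -1, -1, -1, -1],
    [ 0,  3, -1, -1, -1],
    [ 0,  0,  2, -1, -1],
    [ 0,  0,  0,  1, -1],
], dtype=float)

# Two contrasts are orthogonal (equal n) when the sum of their elementwise
# products is zero; the cross-product matrix checks every pair at once.
cross = C @ C.T
off_diagonal = cross - np.diag(np.diag(cross))
print("All pairs orthogonal:", np.allclose(off_diagonal, 0.0))
```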
  • 74. • The sums of squares for any possible set of p - 1 orthogonal contrasts will always sum to the total SSBG. – This is not true for nonorthogonal contrasts, whose SS will sum to more than the SS associated with the factor (and can cause software to fail). • Note that each contrast always has only a single numerator degree of freedom.
  • 76. A Priori Test Using the t-Statistic and a Contrast • Ho: ψ1 = 0 • H1: ψ1 ≠ 0 • The t-statistic is given by: t = ψ̂i / σ̂ψi = (Σj=1..p cj Ȳ.j) / √(MSE Σ cj² / nj) = (c1Ȳ.1 + c2Ȳ.2 + ... + cpȲ.p) / √[MSE((c1²/n1) + (c2²/n2) + ... + (cp²/np))] • Where MSE = the error term for the appropriate effect
  • 77. Example of "User Specified" (a priori/preplanned or a posteriori) Tests of Hypotheses • Use an extension of the t-test: t = ψ̂i / σ̂ψi = (c1Ȳ1 + c2Ȳ2 + c3Ȳ3 + ... + cpȲp) / √[MSWG((c1²/n1) + (c2²/n2) + (c3²/n3) + ... + (cp²/np))]
  • 78. Example (cont) • All contrasts of this type have only a single degree of freedom for the numerator and degrees of freedom for the denominator equal to those associated with the MSWG term. • Always keep in mind that the MSWG term to use for the contrast is the same term associated with the overall test of the factor whose mean differences are being tested with the contrast.
  • 79. For Our Example (1st Contrast) t = ψ̂i / σ̂ψi = [-2(279) + -2(284) + -2(286) + 3(308) + 3(330)] / √[2882(4/10 + 4/10 + 4/10 + 9/10 + 9/10)] = (-558 - 568 - 572 + 924 + 990) / √[2882(.4 + .4 + .4 + .9 + .9)]
  • 80. Continued… t = 216 / √[2882(3)] = 216 / 92.98 = 2.323 • Note: t² = F when there is only 1 d.f. in the numerator, so F = 5.396 • With 1 and 45 degrees of freedom, F-crit = 4.04 (from the F-table), so the contrast is significant.
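A short sketch of the contrast t-test just worked through, using the CR-5 means and MSWG; SciPy is assumed available for the p-value:

```python
import numpy as np
from scipy import stats

means = np.array([279, 284, 286, 308, 330], dtype=float)
c     = np.array([-2, -2, -2, 3, 3], dtype=float)   # groups 1-3 vs. groups 4-5
ms_wg, n, df_wg = 2882.0, 10, 45

psi_hat = np.sum(c * means)                         # 216
se_psi  = np.sqrt(ms_wg * np.sum(c ** 2 / n))       # about 92.98
t = psi_hat / se_psi                                # about 2.32
p_value = 2 * stats.t.sf(abs(t), df_wg)

print(f"psi = {psi_hat:.0f}, t({df_wg}) = {t:.3f}, t^2 = {t**2:.3f}, p = {p_value:.3f}")
```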
  • 81. The Bonferroni Inequality • Used to control the Type I Error rate when hypothesis testing using multiple contrasts. • As we showed earlier, when you run a series of significance tests, the Type I error rate does not remain at the .05 or .01 levels across the tests, but actually increases as a function of the number of comparisons that you are making.
  • 82. The Bonferroni Inequality (cont.) Recall the formula: 1 - (1 - α)^C, where α signifies the Type I error rate for each test and C equals the number of comparisons made. e.g., if we ran 6 significance tests each at the .05 level, our actual error rate across the comparisons would be 1 - (1 - .05)^6 = .2649, a much higher value than we would ever want to use.
  • 83. The Bonferroni Inequality (cont.) • To control the Type I error rate across a set of contrasts, it is possible to apply the Bonferroni inequality to determine the proper significance level at which to test each comparison or contrast: error rate = .05 / C = .05 / 6 = .0083 • This controls the Type I error rate across the set of contrasts so that it will not exceed our specified level of .05 (or .01), since 1 - (1 - .0083)^6 = .0488 < .05
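A quick numerical check of the Bonferroni adjustment for the six contrasts in this example:

```python
# Bonferroni-adjusted per-contrast alpha for C contrasts.
alpha, C = 0.05, 6
per_contrast = alpha / C                       # about .0083
familywise = 1 - (1 - per_contrast) ** C       # about .0488, stays below .05
print(f"per-contrast alpha = {per_contrast:.4f}, familywise rate = {familywise:.4f}")
```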
  • 84. The Bonferroni Inequality (cont.) • The Bonferroni adjustment is very conservative as with increasing tests, it sets a very stringent level for the Type I error rate. • For this reason it has received increasing criticism of late. • Alternatives to the Bonferroni have been proposed and should be considered when appropriate.
  • 85. Scheffé's S Test • Scheffé's S test is another type of post hoc test, described by Kirk (1982) as "…one of the most flexible, conservative, and robust data snooping procedures available" (p. 121). • Can be used for all possible contrasts, not just pairwise ones. • Can be used with unequal n's.
  • 86. Scheffé's S Test • The error rate is set experimentwise for an infinite number of possible contrasts. • The most conservative of the post hoc tests. • Less powerful than Tukey's HSD. • Uses the F-distribution and is robust against violations of the normality and homogeneity of variance assumptions.
  • 87. Scheffé cont. • The critical difference a contrast must exceed is: ψ̂(S) = √[(p - 1)F(α; ν1, ν2)] × √[MSError Σj=1..p (cj²/nj)], where p = the number of means and F(α; ν1, ν2) is taken from the F-table with ν1 = p - 1 and ν2 = the MSError d.f.
  • 88. Scheffé cont. Example using the contrast previously tested (-2, -2, -2, 3, 3): p = 5, F(α; ν1, ν2) = F(.05; 4, 45) = 2.5975 (table value). ψ̂(S) = √[(5 - 1)(2.5975)] × √[2882((-2)²/10 + (-2)²/10 + (-2)²/10 + (3)²/10 + (3)²/10)] = √10.39 × √[2882(.4 + .4 + .4 + .9 + .9)] = 3.22 × √[2882(3)] = 3.22 × 92.98 ≈ 300. Since |ψ̂| = 216 does not exceed this critical difference, the contrast is not significant by Scheffé's test.
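A sketch of the Scheffé critical difference computed with SciPy's exact critical F rather than a table value, so the result may differ slightly from the hand calculation above:

```python
import numpy as np
from scipy import stats

means = np.array([279, 284, 286, 308, 330], dtype=float)
c     = np.array([-2, -2, -2, 3, 3], dtype=float)
ms_error, n, p, df_error = 2882.0, 10, 5, 45

F_crit = stats.f.ppf(0.95, p - 1, df_error)          # about 2.58 (exact, vs. table values)
critical_diff = np.sqrt((p - 1) * F_crit) * np.sqrt(ms_error * np.sum(c ** 2 / n))

psi_hat = np.sum(c * means)                          # 216
print(f"Scheffe critical difference = {critical_diff:.1f}, psi = {psi_hat:.0f}")
# psi does not exceed the critical difference, so the contrast is not significant.
```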
  • 89. Trend Analysis and Orthogonal Polynomial Contrasts • If a factor in an ANOVA model has three or more levels (typically quantitative, equally spaced levels), it is also possible to conduct a trend analysis using orthogonal polynomial contrasts:
  2 means: linear trend only
  3 means: linear & quadratic trends
  4 means: linear, quadratic, & cubic trends
  5 means: linear, quadratic, cubic, & quartic trends
  etc.
  • 90. • The contrasts to use might look like this:
  linear:     -3  -1   1   3
  quadratic:   1  -1  -1   1
  cubic:      -1   3  -3   1
  • Some software packages, like SAS, have procedures to generate trend coefficients for any number of means. Others, like SPSS, will run a trend analysis as part of the general output.
  • 91. Steps in Conducting a Trend Analysis 1. Calculate the SSBG associated with a linear trend component 2. Test the residual or remaining SS for any further departures from linearity 3. Calculate the SSBG associated with a quadratic trend component 4. Repeat step 2 5. Evaluate SS with next higher-order trend. 6. Etc.
  • 92. Example of a Linear Trend • Assume that we have run an experiment and have found the sums across 4 treatment levels to be equal to 22, 28, 50, and 72. • Assume that we also calculated the SSWG and find it to be equal to 41.0 and SSBG = 194.5. • Further assume that each level of the IV has 8 participants. • Our linear trend contrast coefficients are: -3 -1 1 3
  • 93. Example (cont) • Our linear contrast, computed on the treatment sums, would be: ψ̂lin = Σj=1..p c1j (Σi Yij) = -3(22) + -1(28) + 1(50) + 3(72) = 172. Note: the i in cij refers to the trend component the set of coefficients belongs to (i.e., linear = 1, quadratic = 2, etc.).
  • 94. Compute the Sums of Squares SSψlin = ψ̂²lin / (n Σj=1..p c²1j) = (172)² / (8[(-3)² + (-1)² + (1)² + (3)²]) = 184.9, d.f. = 1 • SSψdep from lin = SSBG - SSψlin = 194.5 - 184.9 = 9.6, d.f. = p - 2 = 2
  • 95. Compute the Sums of Squares Note: If using the group means instead of the sums, use the following formula instead: SSψlin = n[Σj=1..p cij Ȳ.j]² / Σj=1..p c²ij
  • 96. Compute the F-ratio for Both F = MSψlin / MSWG = (SSψlin / 1) / (SSWG / p(n - 1)) = (184.9/1) / (41.0/28) = 126.3 = F obtained for the linear trend • d.f. = 1 for the numerator and p(n - 1) = 28 for the denominator • F-crit(.05; 1, 28) = 4.20
  • 97. Compute the F-ratio (cont) F = MSψdep from lin / MSWG = (SSψdep from lin / 2) / (SSWG / p(n - 1)) = (9.6/2) / (41.0/28) = 3.28 = F obtained for any remaining (higher-order) trend • d.f. = 2 and p(n - 1) = 28 • F-crit(.05; 2, 28) = 3.34
  • 98. Trend Analysis Example Cont. • Based on the F-tests, we would conclude that there is a significant linear trend but that there is no higher-order trend present.
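The whole trend analysis can be reproduced in a few lines; a minimal sketch using the treatment sums, SSBG, and SSWG given in the example (SciPy supplies the critical values):

```python
import numpy as np
from scipy import stats

# Treatment sums for the 4 levels, n = 8 per level (from the example).
sums = np.array([22, 28, 50, 72], dtype=float)
n, p = 8, 4
ss_bg, ss_wg = 194.5, 41.0
df_wg = p * (n - 1)                                  # 28
ms_wg = ss_wg / df_wg

c_lin = np.array([-3, -1, 1, 3], dtype=float)        # linear trend coefficients
psi_lin = np.sum(c_lin * sums)                       # 172
ss_lin  = psi_lin ** 2 / (n * np.sum(c_lin ** 2))    # 184.9
ss_dep  = ss_bg - ss_lin                             # 9.6, with p - 2 df

F_lin = (ss_lin / 1) / ms_wg
F_dep = (ss_dep / (p - 2)) / ms_wg
print(f"F(linear)    = {F_lin:.1f} vs F-crit {stats.f.ppf(.95, 1, df_wg):.2f}")
print(f"F(departure) = {F_dep:.2f} vs F-crit {stats.f.ppf(.95, p - 2, df_wg):.2f}")
```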
  • 99. Additional Post Hoc Tests • Newman-Keuls • Tukey-Kramer • Scheffé • The Scheffé test is of special interest since it is the only post hoc test, apart from contrasts specifically defined by the user, that can test other than pairwise comparisons. It also offers the most stringent control of the Type I error rate of all the post hoc tests.
  • 100. Power of the F-test • Effect sizes in the ANOVA context are typically calculated in one of two ways: – As a standardized average difference between the group means – Using eta-squared (η²) and omega-hat squared (ω̂²) • With the first approach, the effect size is calculated, translated into a value known as phi, and special tables are then consulted to estimate power and sample size. • The second approach is much easier to implement: run the analysis, then calculate eta-squared or omega-hat squared (for fixed effects) or the intraclass correlation coefficient (for random effects).
  • 101. Remember… η² = SSBG / SSTOT; ω̂² = σ̂²BG / (σ̂²BG + σ̂²WG) = [SSBG - (a - 1)(MSWG)] / (SSTOT + MSWG) = (a - 1)(F - 1) / [(a - 1)(F - 1) + a·n], where a·n is the total N
  • 102. Power (cont.) • Once the effect size has been calculated, it is a simple matter to turn it into another useful value using the following formula: f = √[ω̂² / (1 - ω̂²)] (f is the symbol used by Jacob Cohen for a value that can be used to estimate power and sample sizes.)
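A sketch of the conversion from ω̂² to f, with an approximate power calculation for the CR-5 design via the noncentral F distribution. The noncentrality parameter λ ≈ f²N is an approximation, and scipy.stats.ncf is assumed to be available:

```python
import numpy as np
from scipy import stats

omega_sq = 0.044                     # from the CR-5 example
f = np.sqrt(omega_sq / (1 - omega_sq))

# Approximate power for a one-way design with a groups and n per group:
# noncentrality lambda ~ f^2 * N, tested against F with (a-1, a(n-1)) df.
a, n = 5, 10
N = a * n
lam = f ** 2 * N
df1, df2 = a - 1, a * (n - 1)
F_crit = stats.f.ppf(0.95, df1, df2)
power = stats.ncf.sf(F_crit, df1, df2, lam)
print(f"f = {f:.3f}, approximate power = {power:.2f}")
```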
  • 103. Conventions for Effect Sizes (Cohen) f = .10 = small effect size; f = .25 = medium effect size; f = .40 = large effect size (these correspond roughly to proportion-of-variance values, ω², of .01, .06, and .14) • After calculating f, one can then use the published tables in Cohen to estimate power and sample size, or use existing computer software such as G*Power • Note that there is an option in SPSS's GLM procedure that will output observed power for the main and interaction effects of any ANOVA design model