Anova advanced

2004 A. Karpinski
Chapter 9
Advanced Topics in ANOVA
Unbalanced ANOVA designs
1. Why is the design unbalanced?
2. What happens with unbalanced designs?
3. An introduction to the problem
4. Types of sums of squares
5. An example
ANOVA designs with random effects
6. Fixed effects vs. random effects
7. Model II: One-factor random effects model
8. Model II: Two-factor random effects model
9. Model III: Two-factor mixed effects model
10. Contrasts and post-hoc tests
11. Effect sizes
12. Final considerations about random effects
ANOVA designs with nested effects
13. An introduction to nested designs
14. Structural models for nested designs
15. Testing nested effects
16. Final considerations about nested designs
ANOVA designs with randomized blocks
17. The logic of blocked designs
18. Examples of randomized block designs
19. Final consideration about blocked designs

Advanced Topics in ANOVA:
Unbalanced ANOVA designs
1. Why is the design unbalanced?
• Random factors
o The unequal cell sizes are randomly unequal
o The process leading to the missingness is independent of the levels of the
independent variable
• Scheduling problems
• Computer errors
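                           IV 1
IV 2       Level 1          Level 2          Level 3          Total
Level 1    $n_{11} = 15$    $n_{21} = 10$    $n_{31} = 20$      45
Level 2    $n_{12} = 20$    $n_{22} = 20$    $n_{32} = 15$      55
Total          35               30               35            100

                           IV 1
IV 2       Level 1          Level 2          Level 3          Total
Level 1    $n_{11} = 4$     $n_{21} = 7$     $n_{31} = 3$       14
Level 2    $n_{12} = 4$     $n_{22} = 3$     $n_{32} = 6$       13
Level 3    $n_{13} = 5$     $n_{23} = 4$     $n_{33} = 5$       14
Total          13               14               14             41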
• Systematic factors
o The unequal cell sizes are directly or indirectly related to the levels of the
independent variables
• A treatment is painful/ineffective
• High prejudice individuals refuse to answer questions regarding
attitudes toward ethnic groups
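                           IV 1
IV 2       Level 1          Level 2          Level 3          Total
Level 1    $n_{11} = 40$    $n_{21} = 40$    $n_{31} = 50$     130
Level 2    $n_{12} = 20$    $n_{22} = 20$    $n_{32} = 30$      70
Total          60               60               80            200

                           IV 1
IV 2       Level 1          Level 2          Level 3          Total
Level 1    $n_{11} = 3$     $n_{21} = 6$     $n_{31} = 9$       18
Level 2    $n_{12} = 2$     $n_{22} = 6$     $n_{32} = 9$       17
Level 3    $n_{13} = 4$     $n_{23} = 8$     $n_{33} = 13$      25
Total           9               20               31             60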

• Missing observations due to systematic factors are a serious problem.
Analyzing these data can lead to very biased results.
• All of the methods we discuss for analyzing unbalanced designs assume the
unequal cell sizes are a result of either:
o Random factors
o Real differences in the population
2. What happens with unbalanced designs?
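• Recall that, for unequal n, two contrasts
$\psi_1 = (a_1, a_2, a_3, \ldots, a_a)$
$\psi_2 = (b_1, b_2, b_3, \ldots, b_a)$
are orthogonal if
$\sum_{i=1}^{a} \frac{a_i b_i}{n_i} = 0$
or, written out,
$\frac{a_1 b_1}{n_1} + \frac{a_2 b_2}{n_2} + \cdots + \frac{a_a b_a}{n_a} = 0$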
• In general the tests for main effects and interactions are no longer orthogonal
for unbalanced designs.
• Because of this non-orthogonality, the sums of squares will not nicely
partition.
$SSA + SSB + SSAB \neq SST$
• As a result:
o The tests for the main effects and interactions are not independent of each
other.
o Single degree of freedom contrasts may not be combined into a
simultaneous test.
• The most popular method for dealing with these issues is to use different
methods of computing the sums of squares for each effect.
• These different methods of computing sums of squares DO NOT affect:
i. The error term (MSW)
ii. The test of the highest order interaction
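The orthogonality condition stated at the start of this section can be checked directly. The sketch below is only an illustration: the function name `weighted_dot` and both sets of cell sizes are hypothetical, and the two contrasts are the usual main-effect contrasts for a 2 x 2 design.

```python
from fractions import Fraction

def weighted_dot(a, b, n):
    """Sum of a_i * b_i / n_i; zero means the two contrasts are orthogonal."""
    return sum(Fraction(ai * bi, ni) for ai, bi, ni in zip(a, b, n))

# Main-effect contrasts for a 2 x 2 design, cells ordered (1,1), (1,2), (2,1), (2,2).
factor_a = [-1, -1, 1, 1]
factor_b = [1, -1, 1, -1]

balanced = [5, 5, 5, 5]      # hypothetical equal cell sizes
unbalanced = [8, 4, 3, 7]    # hypothetical unequal cell sizes

print(weighted_dot(factor_a, factor_b, balanced))    # 0
print(weighted_dot(factor_a, factor_b, unbalanced))  # 53/168
```

With equal cell sizes the sum is exactly zero, so the two main effects are orthogonal; with unequal cell sizes it is not, which is why the sums of squares stop partitioning cleanly.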

• Four possible approaches to unequal cell sizes (assuming data are missing
completely at random)
o Add observations to make the design balanced
• This solution may not be pragmatic
• It may also present problems regarding random assignment in a true
experiment
o Delete observations to make design balanced
• While an unbalanced design is less powerful than a balanced design,
you ALWAYS lose power by tossing observations
• There is not a good method for deciding whom to toss. (If you use a
random process, then a different person using the same algorithm may
come to different conclusions. If you use a systematic process, then
you may bias your results.)
• I recommend that you NEVER delete an observation to make a design
balanced.
o Impute the missing data
• A topic too advanced for this course!
o Conduct analysis on an unbalanced design

3. An introduction to the problem of unbalanced designs
• Balanced, orthogonal designs
o For balanced designs, the SS partition is complete and each component’s
contribution to the total SS is unique.
• Unbalanced, non-orthogonal designs
o For unbalanced designs, the SS are not necessarily unique to each
component
o These figures are just heuristics. With data, it is possible to have
“negative” overlapping area.
[Figure: Venn diagrams of SSA, SSB, and SSAB for the balanced (orthogonal) and unbalanced (non-orthogonal) designs]

• Approach #1: Only count the unique contribution of each factor
o This approach is known as the Unique SS or Type III SS approach
• Approach #2: Start with only the main effects. Use a unique SS approach to
divide the main effect sums of squares. Then, add the next highest order
effects. For the remaining SS, use the unique approach to divide the SS.
Continue until all effects have been added.
o This approach is known as using Type II SS
[Figure: Venn diagrams of SSA, SSB, and SSAB illustrating Approach #1 and Approach #2]

• Approach #3: Start with only the main effects. Determine an order of
importance. Give the most important effect all of its SS. Give the next
effect all of its remaining SS (the part not shared with effects already
entered). Continue until all main effects are used. Next, consider the
two-way interactions, determine an order of importance, and repeat the
process. Continue until all effects have been considered.
o This approach is known as the hierarchical or Type I SS approach.
[Figure: Venn diagrams of SSA, SSB, and SSAB in two panels: "Factor A entered first" and "Factor B entered first"]

• The problem of unequal sample sizes occurs when we collapse across cells
to look at the marginal means. There are different ways to collapse the main
effects, and each gives a different answer.
(The MSW and the highest order interaction are unaffected by these
different methods because they do not average across any cells—they say
something about individual cells.)
• An example: Salary data for female and male employees
                        Female                          Male
               College       No College        College       No College
               Degree        Degree            Degree        Degree
                 24            15                25            19
                 26            17                29            18
                 25            20                27            21
                 24            16                              20
                 27                                            21
                 24                                            22
                 27                                            19
                 23
Mean             25            17                27            20
Sample Size       8             4                 3             7

                                          Gender
Education              Female                                  Male
College Degree         $\bar{X}_{11} = 25$, $n_{11} = 8$       $\bar{X}_{21} = 27$, $n_{21} = 3$
No College Degree      $\bar{X}_{12} = 17$, $n_{12} = 4$       $\bar{X}_{22} = 20$, $n_{22} = 7$

• Question: Is there a difference in the salaries of men and women?
o Approach #1: Let’s run a contrast comparing women’s salary to men’s
salary
                            Gender
Education              Women     Men
College Degree          -1        1
No College Degree       -1        1
• Based on this approach, we conclude that men earn more than
women!
⇒ Women earn $21000: (25 + 17) / 2 = 21
⇒ Men earn $23500: (27 + 20) / 2 = 23.5
o Approach #2: Ignore education level and compute marginal gender
means.
                            Gender
Women                                     Men
$\bar{X}_F = 22.33$, $n_F = 12$           $\bar{X}_M = 22.10$, $n_M = 10$
• Based on this approach we look at the marginal means for gender, and
conclude that women earn slightly more than men
o Which answer is correct?

o It depends: each approach answers a different question
• Approach #2 asks: Are men paid a higher salary than women?
• Approach #1 asks: Within an education status, are men paid a higher
salary than women?
• This discrepancy is known as “Simpson’s Paradox”
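The two approaches can be checked numerically. The following is a minimal sketch using the salary data from the example (in thousands of dollars); the dictionary keys and function names are illustrative and not part of the original notes.

```python
from statistics import mean

# Salary data (in $1000s) from the example; the keys are illustrative labels.
salaries = {
    ("female", "college"):    [24, 26, 25, 24, 27, 24, 27, 23],
    ("female", "no_college"): [15, 17, 20, 16],
    ("male", "college"):      [25, 29, 27],
    ("male", "no_college"):   [19, 18, 21, 20, 21, 22, 19],
}

def unweighted_mean(gender):
    """Approach #1: average the cell means, ignoring the cell sizes."""
    return mean(mean(v) for (g, _), v in salaries.items() if g == gender)

def weighted_mean(gender):
    """Approach #2: pool all observations, so larger cells count for more."""
    pooled = [x for (g, _), v in salaries.items() if g == gender for x in v]
    return mean(pooled)

for g in ("female", "male"):
    print(g, round(unweighted_mean(g), 2), round(weighted_mean(g), 2))
```

The unweighted means favor men (21 vs. 23.5), while the weighted means slightly favor women (22.33 vs. 22.10), reproducing the reversal described above.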
4. Types of Sums of Squares
• I am going to focus on the use and interpretation of each type of sums of
squares, and will ignore how to compute these SS. SPSS (or any statistical
software) can calculate each of the SS, but if you must see the computational
details, see an advanced ANOVA book.
• Type III / Unique SS or Regression SS
o In general, this is the best and most common approach to analysis
o For Type III SS, each cell mean is weighted equally when computing
marginal means. These marginal means are unweighted (because the cell
means are considered equally, independent of the sample sizes).
o This approach leads to the identical results as converting the design to a
one-factor arrangement and using contrasts to test the main effects and
interactions.
o When the design is not orthogonal, the SS of each effect may sum to a
number greater than the total SS because of redundancy/overlap in SS.
For Type III SS, we only use the part of the SS that is unique to the factor
of interest.
(For those of you familiar with regression, Type III SS is equivalent to testing for
each effect after having previously controlled for/entered all other effects OR by
entering all effects simultaneously.)

o In our example, using Type III SS is equivalent to taking approach #1 to
the analysis.
Testing the main effect for gender using a Type III SS approach:
                                     Gender
Education              Women                               Men
College Degree         $\bar{X}_{11} = 25$  (c = -1)       $\bar{X}_{21} = 27$  (c = 1)
No College Degree      $\bar{X}_{12} = 17$  (c = -1)       $\bar{X}_{22} = 20$  (c = 1)
• Main effect for gender
⇒ Women earn $21000: (25 + 17) / 2 = 21
⇒ Men earn $23500: (27 + 20) / 2 = 23.5
• How is the main effect for education tested?
• In SPSS:
UNIANOVA dv BY gender edu
/METHOD = SSTYPE(3).
Tests of Between-Subjects Effects
Dependent Variable: DV
Source             Type III Sum of Squares    df    Mean Square    F           Sig.
Corrected Model         273.864 (a)            3       91.288        32.864    .000
Intercept              9305.790                1     9305.790      3350.084    .000
GENDER                   29.371                1       29.371        10.573    .004
EDU                     264.336                1      264.336        95.161    .000
GENDER * EDU              1.175                1        1.175          .423    .524
Error                    50.000               18        2.778
Total                 11193.000               22
Corrected Total         323.864               21
a. R Squared = .846 (Adjusted R Squared = .820)
Main effect for gender such that men earn more than women,
F(1,18) = 10.57, p = .004
Main effect for education such that college educated individuals earn
more than non-college educated individuals,
F(1,18) = 95.16, p < .001
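Because Type III SS weight each cell mean equally, the sums of squares in the table above can be reproduced as contrasts on the four cell means, using SS = psi^2 / sum(c_j^2 / n_j). The sketch below assumes the salary data from the example; the helper names `ms_within` and `contrast_ss` are illustrative.

```python
from statistics import mean

# Cells in the order: female/college, female/no college, male/college, male/no college.
cells = [
    [24, 26, 25, 24, 27, 24, 27, 23],
    [15, 17, 20, 16],
    [25, 29, 27],
    [19, 18, 21, 20, 21, 22, 19],
]

def ms_within(groups):
    """Pooled within-cell mean square (the error term, MSW)."""
    ss = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df = sum(len(g) for g in groups) - len(groups)
    return ss / df

def contrast_ss(groups, coefs):
    """SS for a contrast on cell means: psi^2 / sum(c_j^2 / n_j)."""
    psi = sum(c * mean(g) for c, g in zip(coefs, groups))
    return psi ** 2 / sum(c ** 2 / len(g) for c, g in zip(coefs, groups))

msw = ms_within(cells)
contrasts = {
    "gender": [-1, -1, 1, 1],
    "education": [1, -1, 1, -1],
    "gender x education": [1, -1, -1, 1],
}
for name, coefs in contrasts.items():
    ss = contrast_ss(cells, coefs)
    print(f"{name}: SS = {ss:.3f}, F = {ss / msw:.3f}")
```

The education contrast (+1 for the college cells, -1 for the no-college cells) is one way to answer the question of how the main effect for education is tested.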

• Type I / Hierarchical SS
o For Type I SS, each cell mean is weighted by its cell size when
computing marginal means.
o The order the factors are entered into SPSS makes a difference in how
the SS are computed.
o When the design is not orthogonal, the SS of each effect may sum to a
number greater than the total SS because of redundancy/overlap in SS.
For Type I SS:
• For the first factor listed, we use all the SS for that factor (unique and
redundant)
• For the next factors, we use the entire SS that is not redundant with
the previous factors
(For those of you familiar with regression, Type I SS is equivalent to testing for
each effect by entering each effect one after the other)
o In our example, Type I SS (with gender listed first) is equivalent to
ignoring education level and using weighted marginal means
                            Gender
Women                                     Men
$\bar{X}_F = 22.33$, $n_F = 12$           $\bar{X}_M = 22.10$, $n_M = 10$
• In SPSS:
UNIANOVA dv BY gender edu
/METHOD = SSTYPE(1).
Source             Type I Sum of Squares    df    Mean Square    F           Sig.
Corrected Model         273.864 (a)          3       91.288        32.864    .000
Intercept             10869.136              1    10869.136      3912.889    .000
GENDER                     .297              1         .297          .107    .747
EDU                     272.392              1      272.392        98.061    .000
GENDER * EDU              1.175              1        1.175          .423    .524
Error                    50.000             18        2.778
Total                 11193.000             22
Corrected Total         323.864             21

UNIANOVA dv BY edu gender
/METHOD = SSTYPE(1).
Source             Type I Sum of Squares    df    Mean Square    F           Sig.
Corrected Model         273.864 (a)          3       91.288        32.864    .000
Intercept             10869.136              1    10869.136      3912.889    .000
EDU                     242.227              1      242.227        87.202    .000
GENDER                   30.462              1       30.462        10.966    .004
EDU * GENDER              1.175              1        1.175          .423    .524
Error                    50.000             18        2.778
Total                 11193.000             22
Corrected Total         323.864             21
                             Gender listed first            Edu listed first
Main effect for gender       F(1,18) = 0.11, p = .75        F(1,18) = 10.97, p = .004
Main effect for education    F(1,18) = 98.06, p < .001      F(1,18) = 87.20, p < .001
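For the factor entered first, the Type I SS is the ordinary between-groups SS computed from the weighted marginal means, that is, after pooling over the other factor. The sketch below reproduces only those two first-entered values (.297 for gender and 242.227 for education); it does not compute the SS for the factor entered second. The function name `first_entered_ss` and the labels are illustrative.

```python
from statistics import mean

# (gender, education, salaries in $1000s); the labels are illustrative.
cells = [
    ("female", "college",    [24, 26, 25, 24, 27, 24, 27, 23]),
    ("female", "no_college", [15, 17, 20, 16]),
    ("male",   "college",    [25, 29, 27]),
    ("male",   "no_college", [19, 18, 21, 20, 21, 22, 19]),
]

def first_entered_ss(factor_index):
    """Between-groups SS after pooling the cells over the other factor."""
    pooled = {}
    for cell in cells:
        pooled.setdefault(cell[factor_index], []).extend(cell[2])
    grand = mean(x for values in pooled.values() for x in values)
    return sum(len(v) * (mean(v) - grand) ** 2 for v in pooled.values())

print(f"gender entered first:    SS = {first_entered_ss(0):.3f}")
print(f"education entered first: SS = {first_entered_ss(1):.3f}")
```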
• Not surprisingly, there are additional types of sums of squares
o Type II SS
A compromise between Type I and Type III SS
o Type IV SS
Use when there are missing cells in the design of the experiment
• Which SS are better?
o In general, you ran the design because you wanted to compare the cell
means. In this case, the unequal cell sizes are irrelevant and you should
use Type III SS
• If we have an experimental design and the data are missing at random,
then there is no defensible reason for allowing cells with larger
numbers of observations to exert a greater influence on the analysis
• For men and women with equal levels of education, do men and
women receive equal pay?
• Type III SS also have the advantage of being the simplest to convert
to contrast coefficients

o If your design intentionally has unequal cell sizes (perhaps to reflect
differences in the composition of the population) and you want your
analyses to reflect this feature, then Type I SS may be more appropriate
• Do men and women receive equal pay?
o This issue of which type of SS to use for unbalanced designs is still
controversial. Different texts and different authors offer different
recommendations. The important point is for you to think about what
question you are asking and which type of SS best answers that question.
You must decide this issue before you analyze your data, not after
examining the p-values!
• Important points to remember
o Regardless of the type of SS used, the error term remains unchanged
o Any analysis that does not involve marginal means remains unchanged
• The test of the highest order interaction is unchanged
• Tests of cell mean contrasts are unchanged
o In most cases Type III SS seem to be the “best” because they take into
account information about all the factors
• If important factors are omitted from the design, you may arrive at
erroneous conclusions (in regression, this is known as the omitted
variable problem).

5. An Example: Level of Management and Support of Affirmative Action
                                      Management Level
           Middle-           Upper-            Middle-           Upper-
           Management,       Management,       Management,       Management,
Gender     Minor Division    Minor Division    Major Division    Major Division    CEO
Female     21 25 29          31 30             25 22             35 25             27 36
           26 24             23 28             31 30             30                27
Male       25 18             31 28             33 31             35 35 43          36 44 43
           26                22 31             40 36             37 40             45 42
DV = Scores on an Affirmative Action Attitude Scale
• Note that this design is rather odd: it is a 2*2*2 with an extra 2 cells
                              Management Level
              Middle Management                 Upper Management
Gender        Minor Division   Major Division   Minor Division   Major Division
Male
Female

Gender        CEO
Male
Female
• Rather than trying to analyze it as a 2*2*3 with two missing cells, it is much
easier to consider this design to be a 2*5 design. Using appropriate contrasts,
we can test
o Main effect of management level
o Main effect of division
o Management by division interaction
o Interactions between all these terms and gender
But we can also make comparisons between these groups and CEOs.
• Using this approach, we can avoid designs with empty cells and the need to
learn about Type IV SS.

Your specific research questions were:
i. Do middle and upper management from minor divisions differ in their
support for AA?
ii. Do minor division managers differ from major division managers in
their support for AA?
iii. Do CEOs differ from other management in their support for AA?
iv. Do questions i. – iii. differ by gender?
• First, let’s look at the data:
[Figure: DV (axis range 10 to 50) by MANAGE (MM - Minor, UM - Minor, MM - Major, UM - Major, CEO), clustered by GENDER (Female, Male)]
EXAMINE VARIABLES=dv BY group
/PLOT NPPLOT.
Tests of Normality (DV)
            Shapiro-Wilk
GROUP       Statistic    df    Sig.
 1.00         .989        5    .977
 2.00         .895        4    .405
 3.00         .912        4    .492
 4.00        1.000        3   1.000
 5.00         .750        3    .000
 6.00         .842        3    .220
 7.00         .827        4    .161
 8.00         .971        4    .850
 9.00         .887        5    .341
10.00         .836        5    .154

Test of Homogeneity of Variances (DV)
Levene Statistic    df1    df2    Sig.
     .348            9      30    .950

• Rather than running a traditional main effects and interaction analysis, let’s
skip the omnibus tests and do a contrast-based test of the hypotheses.
o We should adopt a Type III SS approach – the variations in the cell sizes
appear to be random and we are interested in the cell means.
o To conduct contrasts with a Type III SS approach, we need to consider
each cell mean equally, regardless of its sample size – but that is what we
do when we use our standard tests for contrasts.
o However, remember that we cannot combine single degree of freedom
contrasts into a simultaneous omnibus test of a hypothesis.
Hypothesis 1
o Do middle and upper management in the minor divisions differ in their
support for AA?
o Does this level of support differ by gender?
                            Management Level
          Gender    MM, Minor   UM, Minor   MM, Major   UM, Major   CEO
Hyp 1:    Female       -1           1           0           0        0
          Male         -1           1           0           0        0
Hyp 1B:   Female       -1           1           0           0        0
          Male          1          -1           0           0        0
ONEWAY dv by group
/cont = -1 1 0 0 0 -1 1 0 0 0
/cont = -1 1 0 0 0 1 -1 0 0 0.
Contrast Tests (DV)
Contrast          Value of Contrast    Std. Error    t        df    Sig. (2-tailed)
Hyp 1                  8.0000           4.00638      1.997    30       .055
Hyp 1 * Gender        -2.0000           4.00638      -.499    30       .621

• In the minor divisions, we find that upper management is more
supportive of AA than middle management,
t(30) = 2.00, p = .06, $\omega^2 = .07$.
• This difference in support of AA does not vary by gender,
t(30) = -0.50, p = .62, $\omega^2 < .01$
• As an example of the effect size calculation, here are the omega
squared calculations for the test of Hypothesis 1:
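Hypothesis 1: $\hat{\psi}_1 = 8$
$SS_{\hat{\psi}_1} = \frac{\hat{\psi}_1^2}{\sum c_j^2 / n_j} = \frac{(8)^2}{\frac{(-1)^2}{5} + \frac{(1)^2}{4} + 0 + 0 + 0 + \frac{(-1)^2}{3} + \frac{(1)^2}{4} + 0 + 0 + 0} = \frac{64}{1.033} = 61.935$
$\hat{\omega}^2_{\psi} = \frac{SS_{\psi} - MS_{Within}}{SS_{\psi} + (N - 1) MS_{Within}} = \frac{61.935 - 15.53}{61.935 + (39)15.53} = .0695$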
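The same calculation can be scripted for any contrast on the ten cells. The sketch below is a minimal illustration: it assumes the cell membership read from the data table in this section (groups 1 to 5 female, 6 to 10 male, in the order used by the ONEWAY syntax), and the function name `contrast` is hypothetical.

```python
from math import sqrt
from statistics import mean

# Groups 1-5 are female, 6-10 are male; within gender the order is
# MM-Minor, UM-Minor, MM-Major, UM-Major, CEO (as in the ONEWAY syntax).
groups = [
    [21, 25, 29, 26, 24], [31, 30, 23, 28], [25, 22, 31, 30],
    [35, 25, 30], [27, 36, 27],
    [25, 18, 26], [31, 28, 22, 31], [33, 31, 40, 36],
    [35, 35, 43, 37, 40], [36, 44, 43, 45, 42],
]
n_total = sum(len(g) for g in groups)
df_error = n_total - len(groups)
msw = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups) / df_error

def contrast(coefs):
    """Return (psi_hat, standard error, t, SS_psi, omega squared)."""
    psi = sum(c * mean(g) for c, g in zip(coefs, groups))
    weight = sum(c ** 2 / len(g) for c, g in zip(coefs, groups))
    se = sqrt(msw * weight)
    ss = psi ** 2 / weight
    omega_sq = (ss - msw) / (ss + (n_total - 1) * msw)
    return psi, se, psi / se, ss, omega_sq

hyp1 = [-1, 1, 0, 0, 0, -1, 1, 0, 0, 0]
psi, se, t, ss, omega_sq = contrast(hyp1)
print(f"psi = {psi:.1f}, SE = {se:.3f}, t({df_error}) = {t:.3f}")
print(f"SS = {ss:.3f}, omega^2 = {omega_sq:.4f}")
```

Passing the coefficient vectors for Hypotheses 2 and 3 to `contrast` gives the corresponding contrast values, standard errors, and effect sizes; p-values are not computed here because they require the t distribution.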
Hypothesis 2
o Do minor division managers differ from major division managers in their
support for AA?
                            Management Level
          Gender    MM, Minor   UM, Minor   MM, Major   UM, Major   CEO
Hyp 2:    Female       -1          -1           1           1        0
          Male         -1          -1           1           1        0
Hyp 2B:   Female       -1          -1           1           1        0
          Male          1           1          -1          -1        0
ONEWAY dv by group
/cont = -1 -1 1 1 0 -1 -1 1 1 0
/cont = -1 -1 1 1 0 1 1 -1 -1 0.
Contrast Tests (DV)
Contrast          Value of Contrast    Std. Error    t         df    Sig. (2-tailed)
Hyp 2                 26.0000           5.66588       4.589    30       .000
Hyp 2 * Gender       -18.0000           5.66588      -3.177    30       .003

• We find a significant division of management by gender interaction,
t(30) = -3.18, p < .01, $\omega^2 = .19$.
To understand this interaction, we must conduct simple effects tests:
ONEWAY dv by group
/cont = -1 -1 1 1 0 0 0 0 0 0
/cont = 0 0 0 0 0 -1 -1 1 1 0.
Contrast Tests (DV)
Contrast               Value of Contrast    Std. Error    t        df    Sig. (2-tailed)
Hyp 2 - Women only          4.0000           4.00638       .998    30       .326
Hyp 2 - Men only           22.0000           4.00638      5.491    30       .000
• For women, we find no significant difference between major and
minor division management in their support for AA,
t(30) = 1.00, ns, $\omega^2 < .01$.
• For men, we find that managers in major divisions express more
support for AA than managers in minor divisions,
t(30) = 5.49, p < .05, $\omega^2 = .42$.
(Use the Scheffé correction, $t_{crit} = \sqrt{4 \cdot F(.05; 4, 30)} = 3.28$, as the critical value)
Hypothesis 3
o Do CEOs differ from other management in their support for AA?
                            Management Level
          Gender    MM, Minor   UM, Minor   MM, Major   UM, Major   CEO
Hyp 3:    Female       -1          -1          -1          -1        4
          Male         -1          -1          -1          -1        4
Hyp 3B:   Female       -1          -1          -1          -1        4
          Male          1           1           1           1       -4
ONEWAY dv by group
/cont = -1 -1 -1 -1 4 -1 -1 -1 -1 4
/cont = -1 -1 -1 -1 4 1 1 1 1 -4.
Contrast Tests (DV)
Contrast          Value of Contrast    Std. Error    t         df    Sig. (2-tailed)
Hyp 3                 54.0000          12.83173       4.208    30       .000
Hyp 3 * Gender       -34.0000          12.83173      -2.650    30       .013

• We find a significant level of management by gender interaction,
t(30) = -2.65, p = .01, $\omega^2 = .13$.
To understand this interaction, we must conduct simple effects tests:
ONEWAY dv by group
/cont = -1 -1 -1 -1 4 0 0 0 0 0
/cont = 0 0 0 0 0 -1 -1 -1 -1 4.
Contrast Tests (DV)
Contrast               Value of Contrast    Std. Error    t        df    Sig. (2-tailed)
Hyp 3 - Women only         10.0000           9.94462      1.006    30       .323
Hyp 3 - Men only           44.0000           8.10912      5.426    30       .000
• For women, we find no significant difference between management
and CEOs in their support for AA, t(30) = 1.01, ns, $\omega^2 < .01$.
• For men, we find that CEOs express more support for AA than other
managers, t(30) = 5.43, p < .05, $\omega^2 = .42$
(Use the Scheffé correction, $t_{crit} = \sqrt{4 \cdot F(.05; 4, 30)} = 3.28$, as the critical value)
• Note that for a contrast-based analysis, we are implicitly adopting a Type III
SS approach by weighting each cell mean equally. Single degree of
freedom tests of cell means are not affected by an unbalanced design
(However, we would not be able to combine single df tests into a
simultaneous test).

• If we had taken a traditional approach, we would have used Type III SS for
our analysis because we assume that the data are missing at random and we
want to know if attitudes toward AA differ by gender within each
management position.
UNIANOVA dv BY gender manage
/METHOD = SSTYPE(3)
/PRINT = DESC.
Source             Type III Sum of Squares    df    Mean Square    F           Sig.
Corrected Model        1427.100 (a)            9      158.567        10.208    .000
Intercept             36013.846                1    36013.846      2318.488    .000
GENDER                  260.000                1      260.000        16.738    .000
MANAGE                  687.429                4      171.857        11.064    .000
GENDER * MANAGE         268.351                4       67.088         4.319    .007
Error                   466.000               30       15.533
Total                 40706.000               40
Corrected Total        1893.100               39
o We find a significant gender by management position interaction,
F(4,30) = 4.32, p < .01
o We would be required to perform follow-up tests before interpreting the
main effects for gender and management.
[Figure: "Attitude Toward Affirmative Action". Attitude (axis range 20 to 45) by Gender (Female, Male), plotted separately for MM - Minor, MM - Major, UM - Minor, UM - Major, and CEO]

ANOVA designs with random effects
6. Fixed effects vs. random effects
• Model I: The fixed effects model
o A fixed effect is one in which the experimenter is only interested in the
levels of the IV that are included in the study
o In advance of the study, the experimenter decides to examine a relatively
small set of treatments. Each treatment of interest is included in the
study. The experimenter wishes to make inferences about those
treatments and no others.
o The effect is fixed in that if someone were to replicate the study, the
identical treatments would be used
o Example of a fixed effects model: An advertising company wants to
examine the effectiveness of five different billboards in both men and
women, and in White-Americans, Black-Americans, Asian-Americans,
and Hispanic Americans.
• This design is a 5*2*4 between subjects, fixed effects ANOVA
Factor 1: Advertisement (5 different billboards)
Factor 2: Gender (Men and Women)
Factor 3: Ethnicity (4 ethnic groups)
• Each of these factors is fixed. If the design were to be replicated, the
exact same ads, genders, and ethnicities would be used. The
experimenter wants to make inferences regarding only these ads,
genders, and ethnicities.
• (The exact same participants would not be used – participants are
always a random effect)
$Y_{ijkl} = \mu + \alpha_j + \beta_k + \gamma_l + (\alpha\beta)_{jk} + (\alpha\gamma)_{jl} + (\beta\gamma)_{kl} + (\alpha\beta\gamma)_{jkl} + \varepsilon_{ijkl}$

• Model II: The random effects model
o A random effect is one in which the factor levels are randomly sampled
from a population. Inferences are made not only for the factor levels
included in the study, but to the entire population of factor levels.
o The effect is random in that if someone were to replicate the study,
different treatments would be sampled from the population.
o Example of a random effects model: A company owns several hundred
retail stores throughout the country, and it wants to examine the
effectiveness of a new sales promotion. Five stores are randomly
sampled. The sales promotion is implemented in each store for a trial
period and then evaluated.
• This design is a 1-factor between-subjects, random effect ANOVA
Factor 1: Store (5 stores)
• The store factor is a random factor. If the design were to be
replicated, five different stores would be randomly sampled from the
population. The experimenter wants to make inferences regarding the
effectiveness of the sales promotion in all stores, not just the five
included in the study.
• Model III: Mixed model
o A mixed model is a model containing at least one fixed effect and at least
one random effect
In psychology many people refer to a design with at least one between-subjects
factor and at least one within-subjects factor as a mixed design. Although this
terminology is common in psychology it is inconsistent with the statistical usage
of the term. Consistent with the statistical usage, we will reserve the term mixed
model for a model with fixed and random factors

o Example of a mixed model: To investigate the effect of mental activity
on blood flow to the brain (BF), participants completed a math test, a
reading comprehension test, or a history task. The experimenter wanted
to generalize the results to a classroom setting, and reasoned that
different classrooms might have different effects on baseline BF. Thus,
six fifth grade classrooms were selected at random from the Philadelphia
public school system. The students in each class were randomly assigned
to the math test, the reading comprehension test, or the history test. Post-
test BF readings were taken on all participants.
• This design is a 2-factor between-subjects, mixed model ANOVA
Factor 1: Test (Math, Reading Comprehension, or History)
Factor 2: Classroom (6 classrooms)
• The test factor is a fixed factor. These three kinds of tasks are the
only tasks of interest to the experimenter. The classroom factor is a
random factor. If the design were to be replicated, six different
classrooms would be randomly sampled from the population.
• The key idea of the random effects model is that you not only take into
account random noise, $\sigma^2_{\varepsilon}$, you also take into account the variability due to
the sampling of the factor levels, $\sigma^2_{\alpha}$
7. Model II: One-factor random effects model
• Let’s consider the sales effectiveness example in more detail
                                      Store
        1                  2                  3                  4                  5
      5.80               6.00               6.30               6.40               5.70
      5.10               6.10               5.50               6.40               5.90
      5.70               6.60               5.70               6.50               6.50
      5.90               6.50               6.00               6.10               6.30
      5.60               5.90               6.10               6.60               6.20
      5.40               5.90               6.20               5.90               6.40
      5.30               6.40               5.80               6.70               6.00
      5.20               6.30               5.60               6.00               6.30
$\bar{X}_1 = 5.50$   $\bar{X}_2 = 6.21$   $\bar{X}_3 = 5.90$   $\bar{X}_4 = 6.33$   $\bar{X}_5 = 6.16$

• For a random effects model, we need to check some additional assumptions,
compared to the fixed-effects model
o Fixed effects assumptions:
• All observations are drawn from normally distributed populations
• All observations have a common variance
• All observations are independent and are randomly sampled from the
population
o Random effects assumptions:
• All treatment effects are drawn from normally distributed populations
• All treatment effects are independent and are randomly sampled from
the population
o In general, we cannot check these random effects assumptions in the
data. We must infer them from the design.
EXAMINE VARIABLES=dv BY store
/PLOT BOXPLOT NPPLOT SPREADLEVEL.
[Figure: boxplots of DV (axis range 4.5 to 7.0) by STORE (1 to 5); N = 8 per store]
Tests of Normality (DV)
            Shapiro-Wilk
STORE       Statistic    df    Sig.
1.00          .950        8    .716
2.00          .913        8    .373
3.00          .950        8    .716
4.00          .930        8    .516
5.00          .946        8    .667

Test of Homogeneity of Variance (DV)
Levene Statistic    df1    df2    Sig.
     .073            4      35    .990

• The structural model for a oneway random effects model looks similar to a
fixed model
o Fixed effects model:
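$Y_{ij} = \mu + \alpha_j + \varepsilon_{ij}$,  $\varepsilon_{ij} \sim N(0, \sigma_{\varepsilon})$
o Random effects model:
$Y_{ij} = \mu + \alpha_{\sigma j} + \varepsilon_{ij}$,  $\varepsilon_{ij} \sim N(0, \sigma_{\varepsilon})$,  $\alpha_{\sigma j} \sim N(0, \sigma_{\alpha})$
So that $\sigma^2_Y = \sigma^2_{\varepsilon} + \sigma^2_{\alpha}$
• Random effects are denoted with a subscript σ to highlight that they
are random. That is, the $\alpha_{\sigma j}$'s are not fixed at a level, but have a
distribution.
• In general, we are not interested in estimating the $\alpha_{\sigma j}$'s because they
vary from study to study. It is much more informative to estimate the
distribution of the $\alpha_{\sigma j}$'s: $\alpha_{\sigma j} \sim N(0, \sigma_{\alpha})$
• When we estimate effects, we will want to estimate $\sigma^2_{\alpha}$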
• ANOVA table for a random-effects model
o Recall the ANOVA table for the fixed-effects model
$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$
Source            SS       df     MS           E(MS)                                                        F
Between           SSBet    a-1    SSB/DFBet    $\sigma^2_{\varepsilon} + \frac{\sum n_i \alpha_i^2}{a - 1}$    MSBet/MSW
Within (Error)    SSW      N-a    SSW/DFW      $\sigma^2_{\varepsilon}$
Total             SST      N-1
o A valid F-test for a factor is constructed so that:
• When the null hypothesis is true, the expected F-value is 1
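If H0 is true: $\frac{\sum n_i \alpha_i^2}{a - 1} = 0$
Then $F = \frac{MSBet}{MSW} = \frac{\sigma^2_{\varepsilon} + \frac{\sum n_i \alpha_i^2}{a - 1}}{\sigma^2_{\varepsilon}} = \frac{\sigma^2_{\varepsilon}}{\sigma^2_{\varepsilon}} = 1$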

• When the alternative hypothesis is true, the expected F-value is
greater than 1 and this increase is only due to the factor of interest
If H1 is true: $\frac{\sum n_i \alpha_i^2}{a - 1} > 0$
Then $F = \frac{MSBet}{MSW} = \frac{\sigma^2_{\varepsilon} + \frac{\sum n_i \alpha_i^2}{a - 1}}{\sigma^2_{\varepsilon}} > 1$
o Now the ANOVA table for the random-effects model
$H_0: \sigma^2_{\alpha} = 0$
Source            SS       df     MS           E(MS)                                          F
Between           SSBet    a-1    SSB/DFBet    $\sigma^2_{\varepsilon} + n\sigma^2_{\alpha}$     MSBet/MSW
Within (Error)    SSW      N-a    SSW/DFW      $\sigma^2_{\varepsilon}$
Total             SST      N-1
o Although the F-tests are constructed in the same manner as a fixed effects
model, under the hood different components are being estimated
• When the null hypothesis is true, the expected F-value is 1
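If H0 is true: $\sigma^2_{\alpha} = 0$
Then $F = \frac{MSBet}{MSW} = \frac{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha}}{\sigma^2_{\varepsilon}} = \frac{\sigma^2_{\varepsilon}}{\sigma^2_{\varepsilon}} = 1$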
• When the alternative hypothesis is true, the expected F-value is
greater than 1 and this increase is only due to the factor of interest
If H1 is true: $\sigma^2_{\alpha} > 0$
Then $F = \frac{MSBet}{MSW} = \frac{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha}}{\sigma^2_{\varepsilon}} > 1$

• Random Effects in SPSS
UNIANOVA dv BY store
/RANDOM = store.
Source                     Type III Sum of Squares    df    Mean Square       F           Sig.
Intercept   Hypothesis         1449.616                1     1449.616          1665.507    .000
            Error                 3.482                4         .870 (a)
STORE       Hypothesis            3.482                4         .870            10.717    .000
            Error                 2.843               35    8.121E-02 (b)
a. MS(STORE)
b. MS(Error)
o To test the effect of store: F(4, 35) = 10.72, p < .01
o We reject the null hypothesis of no store effect and conclude that the
effectiveness of the sales campaign varies by store
• If store had been a fixed effect, we would conduct post-hoc tests to
determine how the stores differed.
• But when store is a random effect, we are not interested in differences
between specific stores used in the study. We only want to know if
the store variable adds any variance to the DV (or accounts for any
variance in the DV). In general, we are not interested in post-hoc tests
on the levels of a random variable.

o For any random effects model, SPSS also provides us with the E(MS) so
that we can see how the F-test was constructed:
Expected Mean Squares (a)
               Variance Component
Source         Var(STORE)    Var(Error)    Quadratic Term
Intercept        8.000         1.000       Intercept
STORE            8.000         1.000
Error             .000         1.000
a. For each source, the expected mean square equals the sum of the
coefficients in the cells times the variance components, plus a quadratic
term involving effects in the Quadratic Term cell.
E(MSSTORE) = 8*VAR(STORE) + VAR(ERROR)
VAR(STORE) = $\sigma^2_{\alpha}$ and VAR(ERROR) = $\sigma^2_{\varepsilon}$
E(MSSTORE) = $8\sigma^2_{\alpha} + \sigma^2_{\varepsilon}$
• We can use this information to estimate the variance components
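⇒ To estimate the error variance
$\hat{\sigma}^2_{\varepsilon} = MSW = .08$
⇒ To estimate the variance of the store effect
$E(MS_{STORE}) = 8\sigma^2_{\alpha} + \sigma^2_{\varepsilon}$
So that with a little algebra, we obtain:
$\hat{\sigma}^2_{\alpha} = \frac{MS_{STORE} - MSW}{8} = \frac{.87 - .08}{8} = .10$
⇒ To estimate total variance
$\hat{\sigma}^2_Y = \hat{\sigma}^2_{\varepsilon} + \hat{\sigma}^2_{\alpha} = .08 + .10 = .18$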

8. Model II: Two-factor random effects model
• An Example: Suppose a projective test involves 10 cards administered to a
patient, and the number of responses to each card is recorded. The
developer of the test suspects that the order of the cards might influence the
number of responses. Furthermore, the developer has created a standardized
set of instructions in hopes that the effect of the administrator will be
negligible.
To test these assumptions about the test, the developer randomly
selects four possible orders of the ten cards. Four administrators are
recruited to give each order of the test to two patients
                    Administrator
Order       1          2          3          4
1         26 15      30 33      25 23      28 30
2         26 24      25 33      27 17      27 26
3         33 27      26 32      30 24      31 26
4         36 28      37 42      37 33      39 25
[Figure: DV (axis range 10 to 50) by ADMIN (1 to 4), clustered by ORDER (1 to 4); N = 2 per cell]
• With 2 observations/cell, this example is obviously for pedagogical purposes
only. Due to the limited number of observations per cell, we will assume
that the assumptions are satisfied.

• The structural model for this design:
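$Y_{ijk} = \mu + \alpha_{\sigma j} + \beta_{\sigma k} + (\alpha\beta)_{\sigma jk} + \varepsilon_{ijk}$
$\varepsilon_{ijk} \sim N(0, \sigma_{\varepsilon})$
$\alpha_{\sigma j} \sim N(0, \sigma_{\alpha})$
$\beta_{\sigma k} \sim N(0, \sigma_{\beta})$
$(\alpha\beta)_{\sigma jk} \sim N(0, \sigma_{\alpha\beta})$
So that $\sigma^2_Y = \sigma^2_{\varepsilon} + \sigma^2_{\alpha} + \sigma^2_{\beta} + \sigma^2_{\alpha\beta}$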
• ANOVA table for a random-effects model
o The test of each factor is examining a different variance component
Main effect for Administrator: $H_0: \sigma^2_{\alpha} = 0$
Main effect for Order: $H_0: \sigma^2_{\beta} = 0$
Administrator by Order interaction: $H_0: \sigma^2_{\alpha\beta} = 0$
o In the two factor random effects model, we need to be much more careful
about examining the E(MS) and constructing appropriate tests of each
effect.
Source            SS      df             MS            E(MS)                                                                    F
Factor A          SSA     a-1            SSA/DFA       $\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta} + nb\sigma^2_{\alpha}$    MSA/MSAB
Factor B          SSB     b-1            SSB/DFB       $\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta} + na\sigma^2_{\beta}$     MSB/MSAB
A * B             SSAB    (a-1)*(b-1)    SSAB/DFAB     $\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta}$                          MSAB/MSW
Within (Error)    SSW     N-ab           SSW/DFW       $\sigma^2_{\varepsilon}$
Total             SST     N-1
o For multi-factor random effects ANOVA, you must always examine the
expected MS to make sure you are using the correct error term!

• To construct a test for Factor A or Factor B, we must use the MS from
the interaction as the error term
For example, let’s consider Factor A
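If H0 is true: $\sigma^2_{\alpha} = 0$
Then $F = \frac{MSA}{MSAB} = \frac{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta} + nb\sigma^2_{\alpha}}{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta}} = \frac{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta}}{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta}} = 1$
If H1 is true: $\sigma^2_{\alpha} > 0$
Then $F = \frac{MSA}{MSAB} = \frac{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta} + nb\sigma^2_{\alpha}}{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta}} > 1$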
Suppose we tried to construct an F-test using the MSW
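If H0 is true: $\sigma^2_{\alpha} = 0$
Then $F = \frac{MSA}{MSW} = \frac{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta} + nb\sigma^2_{\alpha}}{\sigma^2_{\varepsilon}} = \frac{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta}}{\sigma^2_{\varepsilon}} > 1$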
F would be greater than 1, even when the null hypothesis was
true! This test is not a test for the effect of factor A!!!
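• To construct a test for the AB interaction, we must use the MSW as
the error term
If H0 is true: $\sigma^2_{\alpha\beta} = 0$
Then $F = \frac{MSAB}{MSW} = \frac{\sigma^2_\varepsilon + n\sigma^2_{\alpha\beta}}{\sigma^2_\varepsilon} = \frac{\sigma^2_\varepsilon}{\sigma^2_\varepsilon} = 1$
If H1 is true: $\sigma^2_{\alpha\beta} > 0$
Then $F = \frac{MSAB}{MSW} = \frac{\sigma^2_\varepsilon + n\sigma^2_{\alpha\beta}}{\sigma^2_\varepsilon} > 1$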
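The choice of error terms can be checked numerically. The following is a minimal Python sketch (it is not part of the SPSS analysis in these notes) that computes the sums of squares for the administrator by order data and forms each F ratio with the error term dictated by the E(MS). The function and variable names are illustrative assumptions, and the computation assumes a balanced design.

```python
# Scores from the example: DATA[order][admin] holds the two patients in that cell.
DATA = [
    [(26, 15), (30, 33), (25, 23), (28, 30)],  # order 1
    [(26, 24), (25, 33), (27, 17), (27, 26)],  # order 2
    [(33, 27), (26, 32), (30, 24), (31, 26)],  # order 3
    [(36, 28), (37, 42), (37, 33), (39, 25)],  # order 4
]


def random_effects_anova(cells):
    """Balanced two-factor ANOVA; rows = factor B (order), columns = factor A (admin)."""
    b, a, n = len(cells), len(cells[0]), len(cells[0][0])
    obs = [y for row in cells for cell in row for y in cell]
    cf = sum(obs) ** 2 / (a * b * n)  # correction factor: (grand total)^2 / N

    row_totals = [sum(sum(cell) for cell in row) for row in cells]
    col_totals = [sum(sum(row[j]) for row in cells) for j in range(a)]
    cell_totals = [sum(cell) for row in cells for cell in row]

    ss_b = sum(t * t for t in row_totals) / (a * n) - cf
    ss_a = sum(t * t for t in col_totals) / (b * n) - cf
    ss_cells = sum(t * t for t in cell_totals) / n - cf
    ss_ab = ss_cells - ss_a - ss_b
    ss_w = sum(y * y for y in obs) - cf - ss_cells

    ms_a = ss_a / (a - 1)
    ms_b = ss_b / (b - 1)
    ms_ab = ss_ab / ((a - 1) * (b - 1))
    ms_w = ss_w / (a * b * (n - 1))

    return {
        "F_A": ms_a / ms_ab,       # random main effect: tested against MSAB
        "F_B": ms_b / ms_ab,       # random main effect: tested against MSAB
        "F_AB": ms_ab / ms_w,      # interaction: tested against MSW
        "F_A_wrong": ms_a / ms_w,  # ratio a fixed-effects analysis would use
    }


result = random_effects_anova(DATA)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The first three ratios should agree with the F values that SPSS reports for ADMIN, ORDER, and ADMIN * ORDER; the last one shows how the answer changes if MSW is used for a random main effect.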

• Using SPSS to analyze a two-factor random effects design
UNIANOVA dv BY admin order
/RANDOM = admin order.
Tests of Between-Subjects Effects
Source                       Type III Sum of Squares   df      Mean Square   F         Sig.
Intercept       Hypothesis   26507.531                 1       26507.531     155.441   .000
                Error        716.173                   4.200   170.531a
ADMIN           Hypothesis   151.094                   3       50.365        3.446     .065
                Error        131.531                   9       14.615b
ORDER           Hypothesis   404.344                   3       134.781       9.222     .004
                Error        131.531                   9       14.615b
ADMIN * ORDER   Hypothesis   131.531                   9       14.615        .631      .755
                Error        370.500                   16      23.156c
a. MS(ADMIN) + MS(ORDER) - MS(ADMIN * ORDER)
b. MS(ADMIN * ORDER)
c. MS(Error)
o SPSS highlights the fact that it is using different error terms to test each
factor
o We conclude:
• There is a significant effect of order of the test on number of
responses, F(3,9) = 9.22, p < .01
• Also there is a marginally significant effect of administrator on the
number of responses, F(3,9) = 3.45, p = .07
• But that there is no order by administrator interaction effect on the
number of responses, F(9,16) = 0.63, p = .76.

o SPSS also gives us information on the E(MS) so that we can calculate the
variance components
Expected Mean Squares (a, b)
                 Variance Component
Source           Var(ADMIN)   Var(ORDER)   Var(ADMIN * ORDER)   Var(Error)   Quadratic Term
Intercept        8.000        8.000        2.000                1.000        Intercept
ADMIN            8.000        .000         2.000                1.000
ORDER            .000         8.000        2.000                1.000
ADMIN * ORDER    .000         .000         2.000                1.000
Error            .000         .000         .000                 1.000
a. For each source, the expected mean square equals the sum of the coefficients in
the cells times the variance components, plus a quadratic term involving effects in
the Quadratic Term cell.
b. Expected Mean Squares are based on the Type III Sums of Squares.
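$\hat\sigma^2_\varepsilon = MSW = 23.16$
⇒ To estimate the variance of the interaction effect
$E(MS_{Admin*Order}) = 2\sigma^2_{\alpha\beta} + \sigma^2_\varepsilon$
$\hat\sigma^2_{\alpha\beta} = \frac{MS_{Admin*Order} - MSW}{2} = \frac{14.615 - 23.156}{2} = -4.27$, which is set to 0
⇒ To estimate the variance of the administrator effect
$E(MS_{Admin}) = 8\sigma^2_\alpha + 2\sigma^2_{\alpha\beta} + \sigma^2_\varepsilon = 8\sigma^2_\alpha + E(MS_{Admin*Order})$
$\hat\sigma^2_\alpha = \frac{MS_{Admin} - MS_{Admin*Order}}{8} = \frac{50.365 - 14.615}{8} = 4.47$
⇒ To estimate the variance of the order effect
$E(MS_{Order}) = 8\sigma^2_\beta + 2\sigma^2_{\alpha\beta} + \sigma^2_\varepsilon = 8\sigma^2_\beta + E(MS_{Admin*Order})$
$\hat\sigma^2_\beta = \frac{MS_{Order} - MS_{Admin*Order}}{8} = \frac{134.781 - 14.615}{8} = 15.02$
$\hat\sigma^2_Y = \hat\sigma^2_\varepsilon + \hat\sigma^2_\alpha + \hat\sigma^2_\beta + \hat\sigma^2_{\alpha\beta} = 23.16 + 4.47 + 15.02 + 0 = 42.65$
• Note that any component that is estimated to be less than zero is
assumed to have a value of zero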
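The same estimates can be obtained by solving the E(MS) equations in code. This is a minimal Python sketch, assuming a balanced two-factor random effects design; the function name and the dictionary keys are illustrative, not SPSS terminology. It returns both the raw estimates and the estimates after negative values are set to zero.

```python
def variance_components(ms_a, ms_b, ms_ab, ms_w, a, b, n):
    """Method-of-moments estimates for a balanced two-factor random effects design.

    a, b = number of levels of factors A and B; n = observations per cell.
    """
    raw = {
        "error": ms_w,
        "AxB": (ms_ab - ms_w) / n,       # from E(MSAB) = n*var(AB) + var(error)
        "A": (ms_a - ms_ab) / (n * b),   # from E(MSA) = n*b*var(A) + E(MSAB)
        "B": (ms_b - ms_ab) / (n * a),   # from E(MSB) = n*a*var(B) + E(MSAB)
    }
    # Negative estimates are replaced by zero, as described in the note above.
    truncated = {name: max(0.0, value) for name, value in raw.items()}
    return raw, truncated


# Mean squares taken from the SPSS table: A = ADMIN, B = ORDER.
raw, est = variance_components(50.365, 134.781, 14.615, 23.156, a=4, b=4, n=2)
total = sum(est.values())

print({k: round(v, 3) for k, v in raw.items()})
print({k: round(v, 3) for k, v in est.items()})
print(round(total, 2))
```

Keeping the raw values alongside the truncated ones makes it easy to see which components were negative before truncation.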

o SPSS can also compute variance components directly
VARCOMP dv BY order admin
/RANDOM = order admin.
Variance Estimates
Component             Estimate
Var(ORDER)            15.021
Var(ADMIN)            4.469
Var(ORDER * ADMIN)    -4.271a
Var(Error)            23.156
Method: Minimum Norm Quadratic Unbiased Estimation
(Weight = 1 for Random Effects and Residual)
a. For the ANOVA and MINQUE methods, negative
variance component estimates may occur. Some
possible reasons for their occurrence are: (a) the
specified model is not the correct model, or (b)
the true value of the variance equals zero.
9. Model III: Two-factor mixed models
• Multi-factor experiments involving only random effects are relatively rare in
behavioral research. It is much more common to encounter mixed models
(containing both fixed and random effects) than to encounter a multi-factor
random effects model
• A return to the study on the effect of mental activity on blood flow (BF) –
See p. 9-24. This design is a 2-factor between-subjects mixed model
ANOVA
Factor 1: Task (Math, Reading Comprehension, or History)
Factor 2: Classroom (6 classrooms)
                           Task (fixed)
Classroom (random)   Math          Reading Comp   History
1                    7.8   8.7     11.1  12.0     11.7  10.0
2                    8.0   9.2     11.3  10.6     9.8   11.9
3                    4.0   6.9     9.8   10.1     11.7  12.6
4                    10.3  9.4     11.4  10.5     7.9   8.1
5                    9.3   10.6    13.0  11.7     8.3   7.9
6                    9.5   9.8     12.2  12.3     8.6   10.5

• As with the previous example, due to the limited number of observations per
cell, we will assume that the assumptions are satisfied.
[Figure: boxplots of DV (y-axis, 2 to 14) by CLASS (x-axis, 1 to 6), clustered by TASK (Math, Reading, History); N = 2 per cell]
• When considering mixed models, interactions between fixed effects and
random effects are considered to be random effects.
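• The structural model for a mixed design (A fixed; B random):
$Y_{ijk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \varepsilon_{ijk}$
$\varepsilon_{ijk} \sim N(0, \sigma_\varepsilon)$
$\beta_k \sim N(0, \sigma_\beta)$
$(\alpha\beta)_{jk} \sim N(0, \sigma_{\alpha\beta})$
So that $\sigma^2_Y = \sigma^2_\varepsilon + \sigma^2_\beta + \sigma^2_{\alpha\beta}$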
• ANOVA table for a mixed-effects model
o The test of each effect:
Main effect for task: $H_0: \alpha_1 = \alpha_2 = \alpha_3 = 0$
Main effect for class: $H_0: \sigma^2_\beta = 0$
Task by class interaction: $H_0: \sigma^2_{\alpha\beta} = 0$

o Again, we need to consider the E(MS)s so that we construct valid F-tests.
Source              SS     df            MS          E(MS)                                                                                 F
Factor A (Fixed)    SSA    a-1           SSA/DFA     $\sigma^2_\varepsilon + n\sigma^2_{\alpha\beta} + \frac{nb\sum\alpha_j^2}{a-1}$       MSA/MSAB
Factor B (Random)   SSB    b-1           SSB/DFB     $\sigma^2_\varepsilon + na\sigma^2_\beta$                                             MSB/MSW
A * B               SSAB   (a-1)*(b-1)   SSAB/DFAB   $\sigma^2_\varepsilon + n\sigma^2_{\alpha\beta}$                                      MSAB/MSW
Within (Error)      SSW    N-ab          SSW/DFW     $\sigma^2_\varepsilon$
Total               SST    N-1
• To construct a test for Factor A (the fixed effect):
⇒ We must use the MS from the interaction as the error term
• To construct a test for Factor B (a random effect):
⇒ We must use the MSW as the error term
• To construct a test for the Factor AB interaction (a random effect):
⇒ We must use the MSW as the error term
• Why does having a random effect change the error term of the fixed effect,
but not of the random effect?
o Consider a design with therapy (3 fixed levels) and clinical trainee (3
random levels)
o We assume that the three trainees used in the study were drawn from a
population of trainees. Imagine that we can put on our magic glasses and
see population means for the therapy modes for the entire population of
trainees (and for simplicity, we will assume that the population is small,
consisting of 18 trainees)
                              Clinical Trainee
Therapy   a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r   Mean
A         7  6  5  7  6  5  4  4  4  1  2  3  4  4  4  1  2  3   4
B         4  4  4  1  2  3  7  6  5  7  6  5  1  2  3  4  4  4   4
C         1  2  3  4  4  4  1  2  3  4  4  4  7  6  5  7  6  5   4
Mean      4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4   4

o In our study, we randomly sample 3 of the trainees. So let’s consider a
random sample of three trainees
          Clinical Trainee
Therapy   g   k   r   Mean
A         4   2   3   3.0
B         7   6   4   5.67
C         1   4   5   3.33
Mean      4   4   4   4
o The random trainee factor does not affect our estimation of the effect of
trainee
o The random trainee factor does affect our estimation of the therapy (the
fixed factor)
• Trainee and Therapy interact, which can cause variability among
means for the fixed factor to increase
• MS(A) must be measuring something other than just error and the
effect of Therapy. When we look at the EMS for factor A, we see that
it captures variability due to the A*B interaction
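The trainee illustration can be reproduced with a short computation. The sketch below is a hedged Python example built from the hypothetical population table in these notes; the helper names are assumptions made for illustration. It shows that every trainee mean and every population therapy mean equals 4, while the therapy means in a sample of three trainees differ because of the therapy by trainee interaction.

```python
from itertools import combinations

TRAINEES = "abcdefghijklmnopqr"
POPULATION = {
    "A": [7, 6, 5, 7, 6, 5, 4, 4, 4, 1, 2, 3, 4, 4, 4, 1, 2, 3],
    "B": [4, 4, 4, 1, 2, 3, 7, 6, 5, 7, 6, 5, 1, 2, 3, 4, 4, 4],
    "C": [1, 2, 3, 4, 4, 4, 1, 2, 3, 4, 4, 4, 7, 6, 5, 7, 6, 5],
}


def therapy_means(sample):
    """Mean score of each therapy over the trainees in `sample`."""
    idx = [TRAINEES.index(t) for t in sample]
    return {th: sum(scores[i] for i in idx) / len(idx)
            for th, scores in POPULATION.items()}


def trainee_means(sample):
    """Mean score of each sampled trainee over the three therapies."""
    return {t: sum(scores[TRAINEES.index(t)] for scores in POPULATION.values())
               / len(POPULATION)
            for t in sample}


pop_means = therapy_means(TRAINEES)   # all 18 trainees: no therapy effect
sample_means = therapy_means("gkr")   # the sample used in the notes
sample_trainees = trainee_means("gkr")

# Across all possible samples of 3 trainees, count those whose therapy
# means are not all identical (an apparent therapy "effect" in the sample).
unequal = sum(1 for s in combinations(TRAINEES, 3)
              if len(set(therapy_means(s).values())) > 1)

print(pop_means, sample_means, sample_trainees, unequal)
```

The count in `unequal` indicates how often a random sample of trainees produces differences among therapy means even though the population therapy means are identical.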
• Using SPSS to analyze a two-factor mixed effects design
UNIANOVA dv BY task class
/RANDOM = class.
Tests of Between-Subjects Effects
Source                      Type III Sum of Squares   df   Mean Square   F          Sig.
Intercept      Hypothesis   3570.062                  1    3570.062      2626.655   .000
               Error        6.796                     5    1.359a
TASK           Hypothesis   44.042                    2    22.021        3.784      .060
               Error        58.195                    10   5.820b
CLASS          Hypothesis   6.796                     5    1.359         .234       .939
               Error        58.195                    10   5.820b
TASK * CLASS   Hypothesis   58.195                    10   5.820         7.207      .000
               Error        14.535                    18   .808c
a. MS(CLASS)
b. MS(TASK * CLASS)
c. MS(Error)
o But wait!! SPSS is using the wrong error term for test of the main effect
of classroom!!!
Classroom is a random effect. To test the random effect, we need to
use MSW as the error term. SPSS is using MSAB.

o We will have to do the correct test by hand
Main Effect for Class: $F(5,18) = \frac{MS_{CLASS}}{MSW} = \frac{1.36}{0.81} = 1.68$, p = .19
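As a cross-check on the hand calculation, the following minimal Python sketch computes the mean squares for the task by classroom data and forms the F ratios with the error terms from the mixed-model E(MS) table. The names are illustrative assumptions; the sketch assumes a balanced design and also reports the ratio MS(CLASS)/MS(TASK * CLASS) for comparison with the SPSS default output.

```python
# DATA[classroom][task] holds the two observations in that cell
# (task order: Math, Reading Comp, History).
DATA = [
    [(7.8, 8.7), (11.1, 12.0), (11.7, 10.0)],
    [(8.0, 9.2), (11.3, 10.6), (9.8, 11.9)],
    [(4.0, 6.9), (9.8, 10.1), (11.7, 12.6)],
    [(10.3, 9.4), (11.4, 10.5), (7.9, 8.1)],
    [(9.3, 10.6), (13.0, 11.7), (8.3, 7.9)],
    [(9.5, 9.8), (12.2, 12.3), (8.6, 10.5)],
]


def mixed_anova(cells):
    """Balanced two-factor ANOVA; columns = fixed factor A, rows = random factor B."""
    b, a, n = len(cells), len(cells[0]), len(cells[0][0])
    obs = [y for row in cells for cell in row for y in cell]
    cf = sum(obs) ** 2 / (a * b * n)  # correction factor

    row_totals = [sum(sum(cell) for cell in row) for row in cells]
    col_totals = [sum(sum(row[j]) for row in cells) for j in range(a)]
    cell_totals = [sum(cell) for row in cells for cell in row]

    ss_b = sum(t * t for t in row_totals) / (a * n) - cf
    ss_a = sum(t * t for t in col_totals) / (b * n) - cf
    ss_cells = sum(t * t for t in cell_totals) / n - cf
    ss_ab = ss_cells - ss_a - ss_b
    ss_w = sum(y * y for y in obs) - cf - ss_cells

    ms_a = ss_a / (a - 1)
    ms_b = ss_b / (b - 1)
    ms_ab = ss_ab / ((a - 1) * (b - 1))
    ms_w = ss_w / (a * b * (n - 1))

    return {
        "F_task": ms_a / ms_ab,           # fixed effect: error term is MSAB
        "F_class": ms_b / ms_w,           # random effect: error term is MSW
        "F_task_x_class": ms_ab / ms_w,   # interaction: error term is MSW
        "F_class_vs_MSAB": ms_b / ms_ab,  # ratio shown in the SPSS default table
    }


result = mixed_anova(DATA)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```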
o We can also use the TEST subcommand and ask SPSS to compute the F-
test. We need to enter the effect (class), the SS of the denominator
(14.54) and the df of the denominator (18)
UNIANOVA dv BY task class
/RANDOM = class
/TEST = class vs 14.54 df(18).
Test Results
Source     Sum of Squares   df    Mean Square   F       Sig.
Contrast   6.796            5     1.359         1.683   .190
Error      14.540a          18a   .808
a. User specified.
o BEWARE! SPSS may contain other “errors.” If you are going to be
analyzing balanced random or mixed designs, it is worth your time and
effort to look up or calculate the E(MS)s for your design (For an
algorithm see Neter, Appendix D)
o Note: SPSS does not consider this to be an error. They state that
statisticians differ in how they approach this problem.
http://spss.com/tech/answer/details.cfm?tech_tan_id=100000073
As indicated in this tech note, SAS makes the same “error.” Thus,
even if you run the analysis in SAS, you will still have to rerun the
analysis
I cannot find any recent texts that agree with the SPSS approach.
Neter et al (1996, p 981), Kirk (1995, p 374) and Maxwell & Delaney
(1990, p 429/431) all give the E(MS) I list on the previous page. For
balanced designs, SPSS does the wrong analysis. For unbalanced
designs, SPSS’s approach may be appropriate.

o The following is a hand-corrected variance components table (based on
the correct E(MS) values listed on page 9-37)
Expected Mean Squares (a)
               Variance Component
Source         Var(CLASS)   Var(TASK * CLASS)   Var(Error)   Quadratic Term
Intercept      6.000        2.000               1.000        Intercept, TASK
TASK           .000         2.000               1.000        TASK
CLASS          6.000        .000                1.000
TASK * CLASS   .000         2.000               1.000
Error          .000         .000                1.000
a. Andy's Hand-Corrected Table
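$\hat\sigma^2_\varepsilon = MSW = 0.81$
⇒ To estimate the variance of the interaction effect
$E(MS_{Task*Class}) = 2\sigma^2_{\alpha\beta} + \sigma^2_\varepsilon$
$\hat\sigma^2_{\alpha\beta} = \frac{MS_{Task*Class} - MSW}{2} = \frac{5.82 - 0.81}{2} = 2.51$
⇒ Task is a fixed effect – there is no variance component to estimate
⇒ To estimate the variance of the class effect
$E(MS_{Class}) = 6\sigma^2_\beta + \sigma^2_\varepsilon$
$\hat\sigma^2_\beta = \frac{MS_{Class} - MSW}{6} = \frac{1.36 - 0.81}{6} = 0.09$
$\hat\sigma^2_Y = \hat\sigma^2_\varepsilon + \hat\sigma^2_\beta + \hat\sigma^2_{\alpha\beta} = 0.81 + 0.09 + 2.51 = 3.41$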
o SPSS’s VARCOMP command also errs on the variance estimate for the
class effect (SPSS output not shown here)

10.Contrasts and post-hoc tests
• To perform contrasts or post-hoc tests, you can use the same formulas
previously discussed for ANOVA – with one exception. You must use the
correct error term in place of MSW, and the degrees of freedom associated
with that error term
o If you perform contrasts/post-hoc test on the marginal means for factor
A, you need to use the error term used to test factor A
o If you perform contrasts/post-hoc test on the marginal means for factor B,
you need to use the error term used to test factor B
o If you perform contrasts/post-hoc test on the individual cell means, you
need to use the error term used to test AB interaction
11. Effect sizes for random effects designs
• The random effects equivalent of eta squared is rho, ρ
• Rho is interpreted just as eta squared – as the proportion of the variance in
the DV accounted for by the factor in the sample
$\rho_A = \frac{\sigma^2_A}{\sigma^2_Y}$
• Omega squared must still be used for fixed effects in a mixed model. In
general, for a fixed factor A:
$\hat\omega^2_A = \frac{SSA - (dfA)[MS_{error term}]}{SSA - (dfA)[MS_{error term}] + (N)MSW}$
o For example, in a two-factor mixed model, with A fixed and B random,
we used MSAB as the error term to test Factor A. Thus, our equation for
omega squared would be:
$\hat\omega^2_A = \frac{SSA - (dfA)MSAB}{SSA - (dfA)MSAB + (N)MSW}$
$\hat\omega^2_{Task} = \frac{44.04 - (2)5.82}{44.04 - (2)5.82 + (36)(.808)} = .53$
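Both effect-size formulas are simple enough to wrap in small helper functions. The following is a minimal Python sketch; the function names are illustrative assumptions, and the inputs are the values reported for the task by classroom example (SS and MS from the SPSS table, variance components from the hand calculations).

```python
def omega_squared_fixed(ss_a, df_a, ms_error_term, n_total, ms_within):
    """Omega squared for a fixed factor tested against `ms_error_term`."""
    numerator = ss_a - df_a * ms_error_term
    return numerator / (numerator + n_total * ms_within)


def rho(component, all_components):
    """Proportion of total variance attributable to one random component."""
    return component / sum(all_components)


# Task (fixed) is tested against MS(TASK * CLASS) = 5.82; N = 36; MSW = .808.
omega_task = omega_squared_fixed(44.04, 2, 5.82, 36, 0.808)

# Variance components from the notes: error, class, task * class.
components = {"error": 0.81, "class": 0.09, "task_x_class": 2.51}
rho_class = rho(components["class"], components.values())
rho_interaction = rho(components["task_x_class"], components.values())

print(round(omega_task, 2), round(rho_class, 3), round(rho_interaction, 3))
```

The two rho values are computed here only to illustrate the formula; they are not reported in the original analysis.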

12.Final considerations about random effects
• The distinction between fixed and random effects is not always as clear as
presented here. For example, Clark (1973) argued – convincingly – that
when a list of words is used in a study, the words should be treated as a
random effect. The key is what type of inference you want to make
• We consider the random effects as being sampled from an infinite
population. If the population is finite but large, we are OK. However, when
the population to be sampled from is small, adjustments are necessary
• We estimate the distribution of the random effects based on the means (and
the variability of those means) of the random factor. If you only have 2-3
levels of your random factor, you will not get a good estimate of the
distribution. It is desirable to have a relatively large number of levels of any
random factor. In addition, it is important that the levels of the random
factor be randomly sampled from the population of interest
• In designs with three or more factors that include two or more random
effects, it is common to encounter situations where no exact F-test can be
constructed. In this case, quasi-F ratios (linear combinations of MSs) are
used to approximate an F-ratio.
• All of our calculations assume that cell sizes are equal. Things get very
wacky with unequal cell sizes, and it is no longer possible to construct exact
F-tests (the ratios of expected MSs no longer satisfy the requirements for a
valid F-test). Approximate tests are available and are calculated in SPSS.
• It is a good idea to calculate or look-up E(MS)s for balanced designs and/or
to replicate the analysis using another statistical package.

ANOVA designs with nested effects
13.An introduction to nested designs
• Nested designs are also known as hierarchical designs
• The factorial designs studied thus far are considered to be crossed designs.
That is, every level of a factor appears in (or is crossed with) every level of
all other factors. If you display the design in a grid, there are no empty cells
in a crossed design.
• Example #1: The effect of therapist’s sex on treatment outcome. You observed
three male and three female therapists. Each therapist sees four patients, and
you record a general measure of psychological health.
Sex of therapist   Male         Female
Therapist          1   2   3    4   5   6
o Sex is the main variable of interest and is a fixed effect
o Therapist is nested within sex (it cannot be crossed because a therapist
cannot be both male and female). Therapist will also be considered a
random effect
o Each therapist sees four patients. Thus, patients are nested within
therapist (and are a random effect)
• Example #2: The effect of race of defendant on jury decision making
Race of Defendant Black White
Jury 1 2 3 4 5 6 7 8 9 10 11 12
o Race is the main variable of interest and is a fixed effect
o Jury is nested within race. Jury will most likely be considered a random
effect
o Each jury is composed of 12 participants. The participants are nested
within jury (and are also a random effect)

• Example #3: A new intervention is developed to reduce drug use in inner-city
middle-school students. Six inner-city schools are selected at random;
three receive the new intervention and three receive the old intervention.
Within each of those schools, four classrooms are selected at random.
Old intervention
School      School A        School B        School C
Classroom   1  2  3  4      5  6  7  8      9  10  11  12
New intervention
School      School D        School E        School F
Classroom   1  2  3  4      5  6  7  8      9  10  11  12
o Type of intervention is a fixed effect
o School is a random effect nested within treatment
o Classroom is a random effect nested within school
o The participants are a random effect nested within classroom
• General comments about nested designs
o In behavioral research, nested factors are usually random effects
o In factorial between subjects designs, participants are nested within cell
• Because I am presenting only an introduction to nested designs, I will
consider only designs with random effects nested within a fixed effect (like
these examples). I can provide references for the analysis of more advanced
designs.

14. Structural models for nested designs
• Example #1: Therapist’s sex and treatment outcome
o Factor A: Therapist’s sex (Male vs. Female) Fixed effect
o Factor B: Therapist Random effect
$Y_{ijk} = \mu + \alpha_j + (\beta/\alpha)_{jk} + \varepsilon_{i(jk)}$
$\alpha_j$ The fixed effect of therapist’s sex
$(\beta/\alpha)_{jk}$ The random effect of therapist within sex
$\varepsilon_{i(jk)}$ The errors/residuals
AKA the random effect of participant within therapist
Sometimes notated $(\pi/\beta)_{i(jk)}$ to emphasize the nesting
• Example #3: Drug use intervention
o Factor A: Intervention Fixed effect
o Factor B: School within intervention Random effect
o Factor C: Classroom within school Random effect
$Y_{ijkl} = \mu + \alpha_j + (\beta/\alpha)_{jk} + (\gamma/\beta)_{jkl} + \varepsilon_{i(jkl)}$
$\alpha_j$ The fixed effect of intervention
$(\beta/\alpha)_{jk}$ The random effect of school within intervention
$(\gamma/\beta)_{jkl}$ The random effect of class within school
$\varepsilon_{i(jkl)}$ The errors/residuals
AKA the random effect of participant within class
Sometimes notated $(\pi/\gamma)_{i(jkl)}$ to emphasize the nesting
• Note that because these designs are nested, not crossed, there is no way to
estimate an interaction effect.

15.Testing nested effects
• With nested effects, we again need to make sure we use the correct error
term when constructing F-tests.
Design                                       Effect   Error Term
Two-factor B/A (A fixed, B random)           A        MS(B/A)
                                             B/A      MSW
Three-factor C/B/A (A fixed; B, C random)    A        MS(B/A)
                                             B/A      MS(C/B)
                                             C/B      MSW
o Just as for the random effect designs – the SS are calculated in the same
manner as before. The only difference is the construction of the F-test
o For more complex designs, you’ll have to look up the error term, or trust
SPSS
• Example #1: Therapist’s sex and treatment outcome
Sex of Therapist
Male Female
1 2 3 4 5 6
49 42 42 54 44 57
40 48 46 60 54 62
31 52 50 64 54 66
40 58 54 70 64 71
o To test the effect of sex of therapist, we treat each therapist as one
observation (collapsing across participants)
Sex of Therapist
Male Female
40 50 48 62 54 64
A one-factor ANOVA on these six observations would have:
1 df in the numerator
4 df in the denominator
This is essentially how the effect of sex of therapist is analyzed in a
nested design

o SPSS syntax:
UNIANOVA dv BY sex thera
/RANDOM = thera
/DESIGN = sex thera within sex .
Tests of Between-Subjects Effects
Source                    Type III Sum of Squares   df   Mean Square   F         Sig.
Intercept    Hypothesis   67416.000                 1    67416.000     601.929   .000
             Error        448.000                   4    112.000a
SEX          Hypothesis   1176.000                  1    1176.000      10.500    .032
             Error        448.000                   4    112.000a
THERA(SEX)   Hypothesis   448.000                   4    112.000       2.459     .083
             Error        820.000                   18   45.556b
a. MS(THERA(SEX))
b. MS(Error)
Effect for sex of therapist: F(1,4) = 10.50, p = .03
Effect of therapist: F(4, 18) = 2.46, p = .08
o Let’s do the one-factor ANOVA on the collapsed data to examine the
effect of sex of therapist
Sex of Therapist
Male Female
40 50 48 62 54 64
Descriptives (DV)
        N   Mean
1.00    3   46.0000
2.00    3   60.0000
Total   6   53.0000
ANOVA (DV)
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   294.000          1    294.000       10.500   .032
Within Groups    112.000          4    28.000
Total            406.000          5
• This analysis produces the same results – only the SS are different.
This analysis was tricked into thinking each observation was one
participant, but in the actual analysis, we know that each ‘observation’
was based on data from four participants. If you multiply the SS in
this oneway analysis by 4, you will get the same results as the nested
analysis. (This trick only works for balanced designs)
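The equivalence between the nested analysis and the one-way analysis on therapist means can be verified directly. The following is a minimal Python sketch using the therapist data; the function names are illustrative assumptions, and both functions assume a balanced design with one random factor nested within one fixed factor.

```python
SCORES = {
    "male": [[49, 40, 31, 40], [42, 48, 52, 58], [42, 46, 50, 54]],
    "female": [[54, 60, 64, 70], [44, 54, 54, 64], [57, 62, 66, 71]],
}


def mean(values):
    return sum(values) / len(values)


def nested_anova(groups):
    """B (therapist, random) nested within A (sex, fixed); balanced data."""
    a = len(groups)
    b = len(next(iter(groups.values())))
    n = len(next(iter(groups.values()))[0])
    grand = mean([y for ths in groups.values() for th in ths for y in th])
    ss_a = ss_b_in_a = ss_w = 0.0
    for therapists in groups.values():
        group_mean = mean([y for th in therapists for y in th])
        ss_a += b * n * (group_mean - grand) ** 2
        for th in therapists:
            th_mean = mean(th)
            ss_b_in_a += n * (th_mean - group_mean) ** 2
            ss_w += sum((y - th_mean) ** 2 for y in th)
    ms_a = ss_a / (a - 1)
    ms_b_in_a = ss_b_in_a / (a * (b - 1))
    ms_w = ss_w / (a * b * (n - 1))
    return {
        "SS_A": ss_a, "SS_B(A)": ss_b_in_a, "SS_W": ss_w,
        "F_A": ms_a / ms_b_in_a,      # fixed effect tested against MS(B/A)
        "F_B(A)": ms_b_in_a / ms_w,   # nested random effect tested against MSW
    }


def one_way_on_means(groups):
    """One-way ANOVA that treats each therapist mean as a single observation."""
    means = {g: [mean(th) for th in ths] for g, ths in groups.items()}
    grand = mean([m for ms in means.values() for m in ms])
    ss_between = sum(len(ms) * (mean(ms) - grand) ** 2 for ms in means.values())
    ss_within = sum((m - mean(ms)) ** 2 for ms in means.values() for m in ms)
    df_between = len(means) - 1
    df_within = sum(len(ms) - 1 for ms in means.values())
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_ratio


nested = nested_anova(SCORES)
ss_between, ss_within, f_collapsed = one_way_on_means(SCORES)
print(nested)
print(ss_between, ss_within, f_collapsed)
```

Multiplying `ss_between` and `ss_within` by the number of patients per therapist (4) recovers `SS_A` and `SS_B(A)` from the nested analysis.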

o To calculate the effect sizes:
• Sex is a fixed effect, so we need to calculate omega squared
$\hat\omega^2_A = \frac{SSA - (dfA)[MS_{error term}]}{SSA - (dfA)[MS_{error term}] + (N)MSW}$
$\hat\omega^2_{Sex} = \frac{1176 - (1)112}{1176 - (1)112 + (24)45.56} = .49$
• Therapist within sex is a random effect, so we need to calculate rho
$\rho_{Thera(sex)} = \frac{\sigma^2_{Thera(sex)}}{\sigma^2_Y}$
Expected Mean Squares
             Variance Component
Source       Var(THERA(SEX))   Var(Error)   Quadratic Term
Intercept    4.000             1.000        Intercept, SEX
SEX          4.000             1.000        SEX
THERA(SEX)   4.000             1.000
Error        .000              1.000
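$E(MS_{Thera(sex)}) = 4\sigma^2_{Thera(sex)} + \sigma^2_\varepsilon$
$\hat\sigma^2_{Thera(sex)} = \frac{MS_{Thera(sex)} - MSW}{4} = \frac{112 - 45.56}{4} = 16.61$
$\sigma^2_Y = \sigma^2_{Thera(sex)} + \sigma^2_\varepsilon$
$\hat\sigma^2_Y = 16.61 + 45.56 = 62.17$
$\hat\rho_{Thera(sex)} = \frac{\hat\sigma^2_{Thera(sex)}}{\hat\sigma^2_Y} = \frac{16.61}{62.17} = .27$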

• Example #3: Drug use intervention
(Let’s assume that there were three students in each class)
                                          Old Intervention
         School 1                      School 2                      School 3
Class    1      2      3      4        1      2      3      4        1      2      3      4
         11.2   16.5   18.3   19       7.3    11.9   11.3   8.9      15.3   19.5   14.1   16.5
         11.6   16.8   18.7   18.5     7.8    12.4   10.9   9.4      15.9   20.1   13.8   17.2
         12.0   16.1   19.0   18.2     7.0    12.0   10.5   9.3      16.0   19.3   14.2   16.9
                                          New Intervention
         School 1                      School 2                      School 3
Class    1      2      3      4        1      2      3      4        1      2      3      4
         13.2   17.25  20.3   20.5     9.3    12.9   10.3   10.9     17.55  20.75  15.1   18.75
         12.35  18.8   18.45  17.5     7.05   14.65  12.15  8.15     14.9   22.1   14.55  17.2
         13.25  15.85  21.0   19.2     8.5    14.25  10.0   11.55    17.75  21.3   13.7   16.9
o To gain an intuitive understanding of how nested effects are tested, it is
beneficial to examine each effect separately
o To test the effect of the intervention, we essentially treat each school as
one observation (collapsing across classrooms and participants)
Intervention
Old New
16.33 9.89 16.57 17.30 10.81 17.55
A one-factor ANOVA on these six observations has:
1 df in the numerator (a-1) = (2-1) = 1
4 df in the denominator a(b-1) = 2(3-1) = 2*2 = 4
ONEWAY dv by treat
/STAT = DESC.
Descriptives (DV)
        N   Mean      Std. Deviation
1.00    3   14.2613   3.78589
2.00    3   15.2200   3.82122
Total   6   14.7407   3.44232
ANOVA (DV)
                 Sum of Squares   df   Mean Square   F      Sig.
Between Groups   1.379            1    1.379         .095   .773
Within Groups    57.869           4    14.467
Total            59.248           5
F(1,4) = 0.10, p = .77

o To test the effect of school (within intervention), we treat each class as
one observation (collapsing across participants)
School (Treatment)
1(Old) 2(Old) 3(Old) 1(New) 2(New) 3(New)
11.60 7.37 15.73 12.93 8.28 16.73
16.47 12.10 19.63 17.30 13.93 21.38
18.67 10.90 14.03 19.92 10.81 14.45
18.57 9.20 16.86 19.07 10.20 17.61
A school within treatment ANOVA on these 24 observations has:
4 df in the numerator a(b-1) = 2(3-1) = 2*2 = 4
18 df in the denominator ab(c-1) = 2*3*(4-1) = 2*3*3 = 18
UNIANOVA dv BY treat school
/DESIGN = treat, school within treat.
Tests of Between-Subjects Effects
Source            Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model   237.029                   5    47.406        6.427     .001
Intercept         5213.833                  1    5213.833      706.816   .000
TREAT             5.491a                    1    5.491         .744      .400
SCHOOL(TREAT)     231.538                   4    57.885        7.847     .001
Error             132.777                   18   7.377
Total             5583.639                  24
Corrected Total   369.807                   23
a. Ignore this test for the effect of treatment in this setup
F(4,18) = 7.85, p = .001
o Finally, to test the effect of class (within school within intervention), we
examine the individual observations
This analysis has:
18 df in the numerator ab(c-1) = 2*3*(4-1) = 2*3*3 = 18
48 df in the denominator abc(n-1) = 2*3*4*(3-1) = 48
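The degrees of freedom and error terms for this three-level nested design follow a simple pattern. The sketch below is a minimal Python illustration of that pattern for a balanced design with A fixed and B and C random; the function name and the effect labels are assumptions made for the example.

```python
def nested_design_plan(a, b, c, n):
    """Degrees of freedom and error term for each effect in a balanced C/B/A design.

    a = levels of A, b = levels of B within each A,
    c = levels of C within each B, n = observations within each C.
    """
    return {
        "A": {"df": a - 1, "error_term": "B(A)"},
        "B(A)": {"df": a * (b - 1), "error_term": "C(B(A))"},
        "C(B(A))": {"df": a * b * (c - 1), "error_term": "Within"},
        "Within": {"df": a * b * c * (n - 1), "error_term": None},
    }


# Drug use example: 2 interventions, 3 schools each, 4 classes each, 3 students each.
plan = nested_design_plan(a=2, b=3, c=4, n=3)
for effect, info in plan.items():
    print(effect, info["df"], info["error_term"])

total_df = sum(info["df"] for info in plan.values())
print("total df:", total_df)  # should equal N - 1
```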

o To analyze all the effects in one command:
UNIANOVA dv BY treat school class
/RANDOM = school class
/PRINT = DESC
/DESIGN = treat, school within treat,
class within school within treat.
Tests of Between-Subjects Effects
Source                              Type III Sum of Squares   df   Mean Square   F        Sig.
Intercept              Hypothesis   15643.857                 1    15643.857     90.088   .001
                       Error        694.600                   4    173.650a
TREAT                  Hypothesis   16.531                    1    16.531        .095     .773
                       Error        694.600                   4    173.650a
SCHOOL(TREAT)          Hypothesis   694.600                   4    173.650       7.850    .001
                       Error        398.194                   18   22.122b
CLASS(SCHOOL(TREAT))   Hypothesis   398.194                   18   22.122        27.682   .000
                       Error        38.358                    48   .799c
a. MS(SCHOOL(TREAT))
b. MS(CLASS(SCHOOL(TREAT)))
c. MS(Error)
Effect of treatment: F(1,4) = 0.10, p = .77
Effect of school(treatment): F(4,18) = 7.85, p = .001
Effect of class(school(treatment)): F(18,48) = 27.68, p < .001
o SPSS also provides the variance components so that effect sizes can be
calculated for the random effects
Expected Mean Squares (a, b)
                       Variance Component
Source                 Var(SCHOOL(TREAT))   Var(CLASS(SCHOOL(TREAT)))   Var(Error)   Quadratic Term
Intercept              12.000               3.000                       1.000        Intercept, TREAT
TREAT                  12.000               3.000                       1.000        TREAT
SCHOOL(TREAT)          12.000               3.000                       1.000
CLASS(SCHOOL(TREAT))   .000                 3.000                       1.000
Error                  .000                 .000                        1.000
a. For each source, the expected mean square equals the sum of the
coefficients in the cells times the variance components, plus a
quadratic term involving effects in the Quadratic Term cell.
b. Expected Mean Squares are based on the Type III Sums of Squares.

16.Final considerations about nested designs
• In these examples, we did not test the assumptions for the model because of
small cell sizes. However, the ANOVA assumptions must be satisfied for the
results to be valid. The assumptions for a nested model are the same as the
assumptions for a fixed or random effects model (depending on if there are
fixed or random effects in the model).
• Pay attention to the small degrees of freedom in the tests for some of the
nested effects. In both examples, the test of the fixed effect (the effect of
most interest in these designs) is based on six observations! Nested designs
can have very low power unless you have a large number of levels of the
nested effects.
• We have focused on balanced complete nested designs with random effects
nested within a fixed effect. Many other nested designs are possible –
including partially nested designs. Before you run a more complicated
nested design, make sure that you know how to analyze it. Kirk (1995) is a
good reference.
• As in the random effects case, contrasts and post-hoc tests can be conducted
by using the appropriate error term in previously developed equations.
• We have discussed nested designs in an ANOVA framework where all the
independent variables are categorical variables. In a regression framework,
these models are usually called hierarchical linear models (HLM) and are
very popular at the moment. In an HLM analysis, different terminology and
different methods of estimation are used, but the interpretation is the same.

ANOVA designs with randomized blocks
17.The logic of blocking
• When we test the effect of a factor on a dependent variable, there are always
many other factors that lead to variability in the DV. When these variables
are not of interest to us, they are called nuisance variables.
• For example, if we are interested in the relationship between type of therapy
and psychological wellness, there are many other factors that influence
wellness other than the type of therapy.
• What can we do about nuisance variables?
o The typical approach is to use random assignment of participants to
treatment conditions.
• The nuisance variables are distributed equally over the experimental
factors so that they do not affect just one treatment level.
• However, all the variation in the DV caused by the nuisance variable
is accumulated in the MSW. A large MSW (relative to the MS of the
factor of interest) will decrease our power to detect the effect of
interest.
o An alternative approach is to hold the nuisance variables constant.
• For example, to examine the effectiveness of several types of therapy,
we can use only 18-year-old white females who have the same
severity of the disorder. By creating a homogeneous sample, we will
decrease the MSW and increase our power.
• This approach limits the generalizability of the conclusions. In
addition, if you attempt to hold several variables constant, it may be
difficult to find participants for the study.
o You can also include the nuisance variable(s) as factors in the study.
This approach is known as blocking.

• Any variable that is related to the DV may be used as a blocking variable.
There are two categories of common blocking variables:
o Characteristics associated with the participant:
• Gender
• Age
• Income
• IQ
• Education
• Attitudes
• Previous experience with
task
o Characteristics associated with the experimental setting:
• Time of day
• Batch of material
• Location
• Week
• Measuring instrument
• The participant (!)
• When we include a blocking factor in the design, we can capture the
variability it causes in the DV in a SS(Blocks). This process will reduce the
SS Within, compared to a non blocked design
Non-blocked design: SS Total (SS Corrected Total) = SS A (df = a-1) + SS Error (df = N-a)
Blocked design: SS Total (SS Corrected Total) = SS A (df = a-1) + SS Blocks (df = bl-1) + SS Residual (df = N – a – bl + 1)

18.Examples of blocked designs
• Example #1: Methods of quantifying risk. Managers were exposed to one of
three methods of quantifying risk. After learning about the method,
participants were asked to rate their degree of confidence in their risk
assessments.
Fifteen participants were grouped into five blocks, according to their age.
Within each block, participants were randomly assigned to one of the three
experimental conditions
o Layout for a randomized block design
(U = Utility, W = Worry, C = Comparison)
                              Participant
Block                         1   2   3
1 (Oldest participants)       C   W   U
2                             C   U   W
3                             U   W   C
4                             W   U   C
5 (Youngest participants)     W   C   U
o Data from the quantifying risk example:
                 Method
Block            Utility   Worry   Comparison   Average
1 (oldest)       1         5       8            4.7
2                2         8       14           8.0
3                7         9       16           10.7
4                6         13      18           12.3
5 (youngest)     12        14      17           14.3
Average          5.6       9.8     14.6
• Note that a randomized block design looks like a factorial design, but
there is only one participant per cell. If there were two or more
participants per cell, we would call this design a two-way ANOVA.
• Because there is one participant per cell, we do not have any
information to test the block by factor interaction.

o Assumptions for a randomized block design:
• Because we only have one observation/cell, we cannot check
assumptions on a cell-by-cell basis as we would for a factorial design.
• We require the standard assumptions:
⇒ Independently and randomly sampled observations
⇒ Homogeneity of variances
(Checked on the marginal means for the factor AND for the blocks)
⇒ Normality
(By block and by treatment)
⇒ We assume that there is no treatment by block interaction (that is,
treatment and block effects are additive)
Plot observed values by block and look for parallel lines
• Additional assumptions are required if the blocking factor is a random
effect
o Checking assumptions in the quantifying risk example
EXAMINE VARIABLES=dv BY block treat
/PLOT BOXPLOT SPREADLEVEL NPPLOT.
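• By treatment:
Test of Homogeneity of Variances (DV)
Levene Statistic   df1   df2   Sig.
.048               2     12    .953
[Figure: boxplots of DV (y-axis, -10 to 20) by TREAT (x-axis, 1 to 3); N = 5 per treatment]
Tests of Normality (Shapiro-Wilk), DV
TREAT   Statistic   df   Sig.
1.00    .940        5    .665
2.00    .943        5    .687
3.00    .860        5    .227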

• By block:
Test of Homogeneity of Variances (DV)
Levene Statistic   df1   df2   Sig.
.552               4     10    .702
[Figure: boxplots of DV (y-axis, -10 to 20) by BLOCK (x-axis, 1 to 5); N = 3 per block]
Tests of Normality (Shapiro-Wilk), DV
BLOCK   Statistic   df   Sig.
1.00    .993        3    .843
2.00    1.000       3    1.000
3.00    .907        3    .407
4.00    .991        3    .817
5.00    .987        3    .780
But with three observations per block, these tests are essentially
worthless!
• No treatment by block interaction
[Figure: "Test for Interaction" line plot of DV (y-axis, 0 to 20) by method (Utility, Worry, Comparison), one line per block (Block 1 to Block 5)]
It may be difficult to judge the difference between random error and a
true block * factor interaction. You are looking for an extreme pattern
in the data.
o All the assumptions appear to be satisfied in this case

o What to do if assumptions are not satisfied?
• Non-normality and/or moderate heterogeneity of variances
⇒ Rank data and perform analysis on ranked data
• Heterogeneity of variances and/or treatment by block interaction
⇒ Transform data
o Structural model for a randomized block design with one factor and one
block:
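$Y_{ij} = \mu + \alpha_j + \tau_i + \varepsilon_{ij}$
$\mu$ = Grand population mean
$\hat\mu = \bar{Y}_{..}$
$\alpha_j$ = The treatment effect:
The effect of being in level j of factor A
$\sum \alpha_j = 0$ or $\alpha_j \sim N(0, \sigma_\alpha)$
$\hat\alpha_j = \bar{Y}_{.j} - \bar{Y}_{..}$
$\tau_i$ = The block effect:
The effect of being in level i of the blocking variable
$\sum \tau_i = 0$
$\hat\tau_i = \bar{Y}_{i.} - \bar{Y}_{..}$
$\varepsilon_{ij}$ = The unexplained error associated with $Y_{ij}$
$\hat\varepsilon_{ij} = Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..}$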
• The randomized block design is identical to a two-factor ANOVA
with no interaction term.
• In this case, the blocking variable is considered to be a fixed variable.
Special accommodations are necessary for a random blocking factor.
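To make the structural model concrete, the sketch below estimates each term for the quantifying risk data. It is a minimal Python illustration; the variable names are assumptions made for the example, and the blocking factor is treated as fixed, as in the text.

```python
TREATMENTS = ["Utility", "Worry", "Comparison"]
# Rows are blocks 1 (oldest) to 5 (youngest); columns follow TREATMENTS.
DATA = [
    [1, 5, 8],
    [2, 8, 14],
    [7, 9, 16],
    [6, 13, 18],
    [12, 14, 17],
]


def mean(values):
    return sum(values) / len(values)


grand_mean = mean([y for row in DATA for y in row])              # mu-hat
alpha_hat = [mean(col) - grand_mean for col in zip(*DATA)]       # treatment effects
tau_hat = [mean(row) - grand_mean for row in DATA]               # block effects

# Residual = observed value minus the additive fit mu + alpha_j + tau_i.
residuals = [
    [y - (grand_mean + alpha_hat[j] + tau_hat[i]) for j, y in enumerate(row)]
    for i, row in enumerate(DATA)
]
ss_residual = sum(e * e for row in residuals for e in row)

print(grand_mean)
print(dict(zip(TREATMENTS, (round(x, 2) for x in alpha_hat))))
print([round(t, 2) for t in tau_hat])
print(round(ss_residual, 3))
```

The sum of squared residuals computed this way is the error sum of squares for the randomized block analysis, because no interaction term is fitted.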

o Sums of squares decomposition and ANOVA table for a randomized
block design:
                                               E(MS)
Source      SS        df            MS     Treatments Fixed                                         Treatments Random
Treatment   SSA       a-1           MSA    $\sigma^2_\varepsilon + \frac{bl\sum\alpha_j^2}{a-1}$    $\sigma^2_\varepsilon + bl\,\sigma^2_\alpha$
Blocks      SSBL      bl-1          MSBL   $\sigma^2_\varepsilon + \frac{a\sum\tau_i^2}{bl-1}$      $\sigma^2_\varepsilon + \frac{a\sum\tau_i^2}{bl-1}$
Error       SSError   (a-1)(bl-1)   MSE    $\sigma^2_\varepsilon$                                   $\sigma^2_\varepsilon$
Total       SST       N-1
• To construct a significance test
⇒ For fixed treatment effects: $H_0: \alpha_1 = \alpha_2 = ... = \alpha_a = 0$
For random treatment effects: $H_0: \sigma^2_\alpha = 0$
⇒ But for either fixed or random effects, we construct the F-test in
the same manner
$F[a-1, (a-1)(bl-1)] = \frac{MSA}{MSE}$
⇒ To test for the block effect
$F[bl-1, (a-1)(bl-1)] = \frac{MSBL}{MSE}$
However, we are usually not so interested in the test of the
blocking variable. We included this variable to reduce the error
variability.

o Using SPSS to analyze a randomized block design
UNIANOVA dv BY block treat
/DESIGN = treat block.
Note that a factorial design (treatment, block, and treatment*block) is
assumed unless otherwise stated with the DESIGN subcommand
Tests of Between-Subjects Effects
Source            Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model   374.133a                  6    62.356        20.901    .000
Intercept         1500.000                  1    1500.000      502.793   .000
TREAT             202.800                   2    101.400       33.989    .000
BLOCK             171.333                   4    42.833        14.358    .001
Error             23.867                    8    2.983
Total             1898.000                  15
Corrected Total   398.000                   14
• We find a significant treatment effect, F(2,8) = 33.99, p < .001
$\hat\omega^2_A = \frac{SSA - (dfA)MSError}{SSA + (N - dfA)MSError} = \frac{202.8 - (2)2.983}{202.8 + (15 - 2)2.983} = .814$
• Note that post-hoc tests on the marginal treatment means are required
to identify the effect
o What if we had neglected to block by age of participant?
ONEWAY dv BY treat.
ANOVA (DV)
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   202.800          2    101.400       6.234   .014
Within Groups    195.200          12   16.267
Total            398.000          14
$\hat{\omega}^2_A = \frac{SSA - (df_A)\,MSWithin}{SSA + (N - df_A)\,MSWithin} = \frac{202.8 - (2)16.267}{202.8 + (15 - 2)16.267} = .41$
• Although inclusion of the blocking effect did not change the
conclusion of the statistical test, blocking roughly doubled the
estimated size of the treatment effect ($\hat{\omega}^2_A$ = .41 without
blocking vs. .814 with blocking).
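One way to see where the lost effect size goes: when the block factor is dropped, its sum of squares and degrees of freedom are pooled into the error term. The Python sketch below (illustrative only; variable names are our own) rebuilds the one-way results from the blocked table.

```python
# Values from the randomized block SPSS table for Example 1.
ss_treat, df_treat = 202.800, 2
ss_block, df_block = 171.333, 4
ss_error, df_error = 23.867, 8
n_total = 15

# Dropping the block factor pools block variability into the error term.
ss_within = ss_block + ss_error
df_within = df_block + df_error
ms_within = ss_within / df_within

f_unblocked = (ss_treat / df_treat) / ms_within
omega_unblocked = (ss_treat - df_treat * ms_within) / (
    ss_treat + (n_total - df_treat) * ms_within
)

print(round(ss_within, 3), df_within, round(f_unblocked, 3), round(omega_unblocked, 2))
```

The pooled values match the Within Groups row of the ONEWAY output (SS = 195.2 on 12 df), so the treatment sum of squares is unchanged and only the error term differs between the two analyses.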

• Example #2: Fat in the diet. A researcher studies three low fat diets.
Participants were blocked on the basis of age. DV = post-diet reduction in
blood plasma lipid levels
                  Fat content of diet
Block (age)   Extremely Low   Fairly Low   Moderately Low
15-24         .73             .67          .35
25-34         .86             .75          .41
35-44         .94             .81          .46
45-54         1.40            1.32         .95
55-64         1.62            1.41         .98
o First, let’s check the assumptions
EXAMINE VARIABLES=dv BY block fat
/PLOT BOXPLOT NPPLOT.
[Figure: boxplots of DV by BLOCK (levels 1.00 to 5.00, N = 3 each) and by treatment level FAT (levels 1.00 to 3.00, N = 5 each)]
Tests of Normality (Shapiro-Wilk), DV by BLOCK
BLOCK   Statistic   df   Sig.
1.00    .865        3    .281
2.00    .920        3    .452
3.00    .935        3    .506
4.00    .878        3    .320
5.00    .962        3    .626

Tests of Normality (Shapiro-Wilk), DV by FAT
FAT     Statistic   df   Sig.
1.00    .898        5    .401
2.00    .829        5    .138
3.00    .792        5    .070

Levene test (DV)
                                       Levene Statistic   df1   df2      Sig.
Based on Mean                          .336               2     12       .721
Based on Median                        .047               2     12       .954
Based on Median and with adjusted df   .047               2     11.893   .954
Based on trimmed mean                  .302               2     12       .745

Check for treatment by block interaction:
[Figure: line plot of DV (axis 0 to 2) across diets (Extreme, Fair, Moderate), one line per age block (15-24, 25-34, 35-44, 45-54, 55-64)]
• All assumptions seem fine
o To examine the effect of fat in the diet on plasma lipid levels, let’s
conduct a randomized block ANOVA
UNIANOVA dv BY block fat
/DESIGN = fat block.
Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   2.045a                    6    .341          141.102    .000
Intercept         12.440                    1    12.440        5151.017   .000
FAT               .626                      2    .313          129.527    .000
BLOCK             1.419                     4    .355          146.890    .000
Error             1.932E-02                 8    2.415E-03
Total             14.504                    15
Corrected Total   2.064                     14
We find a significant effect of fat in the diet on plasma lipid levels,
F(2,8) = 129.53, p < .001
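The ANOVA table can be reproduced from the raw data using the effect estimates defined in the structural model (marginal means minus the grand mean). The following is a minimal pure-Python sketch, assuming the data layout of the table for this example (rows = age blocks, columns = diets); the variable names are our own.

```python
# Rows are age blocks (15-24 ... 55-64); columns are diets
# (extremely low, fairly low, moderately low fat).
data = [
    [0.73, 0.67, 0.35],
    [0.86, 0.75, 0.41],
    [0.94, 0.81, 0.46],
    [1.40, 1.32, 0.95],
    [1.62, 1.41, 0.98],
]
bl, a = len(data), len(data[0])
grand = sum(map(sum, data)) / (a * bl)
block_means = [sum(row) / a for row in data]
treat_means = [sum(row[j] for row in data) / bl for j in range(a)]

# Effect estimates: deviations of the marginal means from the grand mean.
alpha_hat = [m - grand for m in treat_means]
tau_hat = [m - grand for m in block_means]

ss_treat = bl * sum(x ** 2 for x in alpha_hat)
ss_block = a * sum(t ** 2 for t in tau_hat)
# Residual: what is left after removing grand mean, block, and treatment effects.
ss_error = sum(
    (data[i][j] - grand - tau_hat[i] - alpha_hat[j]) ** 2
    for i in range(bl)
    for j in range(a)
)

df_error = (a - 1) * (bl - 1)
ms_error = ss_error / df_error
f_treat = (ss_treat / (a - 1)) / ms_error
f_block = (ss_block / (bl - 1)) / ms_error

print(f"SSA={ss_treat:.3f} SSBL={ss_block:.3f} SSE={ss_error:.5f}")
print(f"F treat={f_treat:.2f} F block={f_block:.2f}")
```

Because there is one observation per cell, the residual sum of squares is the same quantity a factorial analysis would label the treatment-by-block interaction.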
Let’s conduct Tukey HSD post-hoc tests on the marginal treatment
means. We can have SPSS do the test for us:
UNIANOVA dv BY fat block
/POSTHOC = fat ( TUKEY )
/DESIGN = fat block .

Multiple Comparisons (Tukey HSD)
                    Mean Difference                        95% Confidence Interval
(I) FAT   (J) FAT   (I-J)             Std. Error   Sig.    Lower Bound   Upper Bound
1.00      2.00      .1180*            .03108       .013    .0292         .2068
1.00      3.00      .4800*            .03108       .000    .3912         .5688
2.00      1.00      -.1180*           .03108       .013    -.2068        -.0292
2.00      3.00      .3620*            .03108       .000    .2732         .4508
3.00      1.00      -.4800*           .03108       .000    -.5688        -.3912
3.00      2.00      -.3620*           .03108       .000    -.4508        -.2732
Based on observed means.
*. The mean difference is significant at the .050 level.
Extremely low vs. fairly low fat:     t(8) = 3.80,  p = .013
Extremely low vs. moderately low fat: t(8) = 15.44, p < .001
Fairly low vs. moderately low fat:    t(8) = 11.65, p < .001
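The t-values reported for these comparisons are the mean differences divided by the standard error from the SPSS table, which in turn comes from the blocked MSE. A short Python sketch of that arithmetic follows (illustrative only; it does not reproduce the Tukey-adjusted p-values, which require the studentized range distribution).

```python
import math

ms_error = 2.415e-03   # MSE from the randomized block ANOVA
bl = 5                 # observations per treatment mean (one per block)

# Standard error of a difference between two marginal treatment means.
se_diff = math.sqrt(2 * ms_error / bl)

# Mean differences copied from the Tukey HSD table.
mean_diffs = {
    "extremely low vs. fairly low": 0.1180,
    "extremely low vs. moderately low": 0.4800,
    "fairly low vs. moderately low": 0.3620,
}
t_values = {label: diff / se_diff for label, diff in mean_diffs.items()}

print(f"SE of a difference = {se_diff:.5f}")
for label, t in t_values.items():
    print(f"{label}: t(8) = {t:.2f}")
```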
o Note that if we had neglected to block on age, we would have failed to
find a significant treatment effect!
ONEWAY dv BY fat.
ANOVA (DV)
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   .626             2    .313          2.610   .115
Within Groups    1.438            12   .120
Total            2.064            14
o What would happen if we forgot this was a randomized block design, and
attempted to analyze it as a factorial design?
UNIANOVA dv BY fat block
/DESIGN = fat block fat*block.
Source            Type III Sum of Squares   df   Mean Square   F   Sig.
Corrected Model   2.064a                    14   .147          .   .
Intercept         12.440                    1    12.440        .   .
FAT               .626                      2    .313          .   .
BLOCK             1.419                     4    .355          .   .
FAT * BLOCK       1.932E-02                 8    2.415E-03     .   .
Error             .000                      0    .
Total             14.504                    15
Corrected Total   2.064                     14
a. R Squared = 1.000 (Adjusted R Squared = .)
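Why did this happen? With only one participant per cell, the fat*block
interaction uses all (a-1)(bl-1) = 8 remaining degrees of freedom, so 0 df
are left for error and no F-test can be computed.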
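The degrees-of-freedom bookkeeping behind this output can be checked with a few lines of Python. This is only a sketch; `error_df` is a hypothetical helper, not an SPSS function.

```python
def error_df(a, bl, n_per_cell, include_interaction):
    """Error df for a two-way layout with a treatments and bl blocks."""
    n_total = a * bl * n_per_cell
    df_model = (a - 1) + (bl - 1)
    if include_interaction:
        df_model += (a - 1) * (bl - 1)
    return n_total - 1 - df_model

# Fat-in-the-diet example: 3 diets, 5 age blocks, 1 participant per cell.
print(error_df(3, 5, 1, include_interaction=False))  # additive (blocked) model
print(error_df(3, 5, 1, include_interaction=True))   # full factorial model
```

With two participants per cell the factorial model would again have error degrees of freedom (`error_df(3, 5, 2, True)` gives 15), which is the case where the interaction can be tested.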

• A final example: A researcher studied how children solved a variety of
puzzles. Sixty children were blocked into groups of 6 on the basis of age,
gender, and IQ. Within each block, children were randomly assigned to
work on a specific type of puzzle. The number of puzzles (out of a possible
20) solved by each child was recorded.
Puzzle Type
Block P1 P2 P3 P4 P5 P6
1 5 14 8 10 11 6
2 7 10 7 9 12 5
3 11 9 10 11 14 6
4 9 10 6 13 15 7
5 13 12 7 14 16 11
6 7 9 8 6 11 5
7 10 11 8 12 13 8
8 4 8 5 7 9 4
9 14 13 11 15 17 12
10 9 9 8 10 14 9
o First, let’s check assumptions:
EXAMINE VARIABLES=dv by block puzzle
/PLOT BOXPLOT NPPLOT SPREADLEVEL.
• By factor
[Figure: boxplots of DV by PUZZLE (levels 1.00 to 6.00, N = 10 each); cases 45, 15, and 51 are labeled on the plot]
Tests of Normality (Shapiro-Wilk), DV by PUZZLE
PUZZLE   Statistic   df   Sig.
1.00     .970        10   .891
2.00     .924        10   .394
3.00     .941        10   .560
4.00     .974        10   .925
5.00     .979        10   .959
6.00     .927        10   .415

Levene test (DV)
                Levene Statistic   df1   df2   Sig.
Based on Mean   1.110              5     54    .366

• By block
[Figure: boxplots of DV by BLOCK (levels 1.00 to 10.00, N = 6 each); cases 59, 18, and 17 are labeled on the plot]
Tests of Normality (Shapiro-Wilk), DV by BLOCK
BLOCK   Statistic   df   Sig.
1.00    .969        6    .886
2.00    .972        6    .907
3.00    .964        6    .847
4.00    .952        6    .759
5.00    .963        6    .846
6.00    .983        6    .964
7.00    .918        6    .493
8.00    .892        6    .331
9.00    .983        6    .964
10.00   .750        6    .020

Levene test (DV)
                Levene Statistic   df1   df2   Sig.
Based on Mean   .521               9     50    .852
• Block by factor interaction
[Figure: line plot of DV (axis 0 to 18) across puzzle types P1 to P6]
• All appears OK.

• Let’s start with a general ANOVA approach
UNIANOVA dv BY puzzle block
/DESIGN = puzzle block.
Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   488.000a                  14   34.857        15.121     .000
Intercept         5684.267                  1    5684.267      2465.861   .000
PUZZLE            238.933                   5    47.787        20.730     .000
BLOCK             249.067                   9    27.674        12.005     .000
Error             103.733                   45   2.305
Total             6276.000                  60
Corrected Total   591.733                   59
o We find a significant puzzle effect, F(5,45) = 20.73, p < .01
o To describe specific differences, we conduct pair-wise posthoc tests
UNIANOVA dv BY puzzle block
/POSTHOC = puzzle ( TUKEY )
/DESIGN = puzzle block.
Multiple Comparisons (Tukey HSD)
                          Mean Difference                       95% Confidence Interval
(I) PUZZLE   (J) PUZZLE   (I-J)             Std. Error   Sig.   Lower Bound   Upper Bound
1.00         2.00         -1.6000           .67900       .194   -3.6207       .4207
1.00         3.00         1.1000            .67900       .590   -.9207        3.1207
1.00         4.00         -1.8000           .67900       .106   -3.8207       .2207
1.00         5.00         -4.3000           .67900       .000   -6.3207       -2.2793
1.00         6.00         1.6000            .67900       .194   -.4207        3.6207
2.00         3.00         2.7000            .67900       .003   .6793         4.7207
2.00         4.00         -.2000            .67900       1.000  -2.2207       1.8207
2.00         5.00         -2.7000           .67900       .003   -4.7207       -.6793
2.00         6.00         3.2000            .67900       .000   1.1793        5.2207
3.00         4.00         -2.9000           .67900       .001   -4.9207       -.8793
3.00         5.00         -5.4000           .67900       .000   -7.4207       -3.3793
3.00         6.00         .5000             .67900       .976   -1.5207       2.5207
4.00         5.00         -2.5000           .67900       .008   -4.5207       -.4793
4.00         6.00         3.4000            .67900       .000   1.3793        5.4207
5.00         6.00         5.9000            .67900       .000   3.8793        7.9207
Based on observed means.
• Puzzle 5 is solved more frequently than all other puzzles
• Puzzles 2 and 4 are solved more frequently than puzzles 3 and 6

• Alternatively, imagine that you had the following a priori hypotheses
o P2 = P4
o P3 = P6
o $P5 > \frac{P2 + P4}{2} > \frac{P3 + P6}{2}$
o We cannot enter contrasts directly into SPSS, so we’ll have to do these
contrasts by hand.
o Computing and testing a Main Effect Contrast (see 7-39)
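$\hat{\psi} = \sum_{j=1}^{a} c_j \bar{X}_{.j} = c_1 \bar{X}_{.1} + \dots + c_a \bar{X}_{.a}$
$StdError(\hat{\psi}) = \sqrt{MSError \sum_{j=1}^{a} \frac{c_j^2}{n_j}}$
Where $c_j^2$ is the squared weight for each marginal mean
$n_j$ is the sample size for each marginal mean
MSE is MSE from the omnibus ANOVA
(With the effects of the blocks removed)
$t \sim \frac{\hat{\psi}}{\text{standard error}(\hat{\psi})}$
$t_{observed} = \frac{\sum_j c_j \bar{X}_{.j}}{\sqrt{MSE \sum_j \frac{c_j^2}{n_j}}}$
$SS(\hat{\psi}) = \frac{\hat{\psi}^2}{\sum_j \frac{c_j^2}{n_j}}$
$F(1, dfw) = \frac{SSC / dfc}{SSE / dfw} = \frac{SSC}{MSE}$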

o Create contrast coefficients:
• P2 = P4 (0 –1 0 1 0 0)
• P3 = P6 (0 0 –1 0 0 1)
• $P5 > \frac{P2 + P4}{2} > \frac{P3 + P6}{2}$, tested as two contrasts:
  P5 vs. (P2 + P4)/2: (0 -1 0 -1 2 0)
  (P2 + P4)/2 vs. (P3 + P6)/2: (0 1 -1 1 0 -1)
o Compute the value of each contrast:
Descriptive Statistics
PUZZLE   Mean      Std. Deviation   N
1.00     8.9000    3.24722          10
2.00     10.5000   1.95789          10
3.00     7.8000    1.75119          10
4.00     10.7000   2.90784          10
5.00     13.2000   2.48551          10
6.00     7.3000    2.66875          10
Total    9.7333    3.16692          60
(0 -1 0 1 0 0)     $\hat{\psi}_1 = -10.5 + 10.7 = 0.2$                  $SS(\hat{\psi}_1) = 0.2$
(0 0 -1 0 0 1)     $\hat{\psi}_2 = -7.8 + 7.3 = -0.5$                   $SS(\hat{\psi}_2) = 1.25$
(0 -1 0 -1 2 0)    $\hat{\psi}_3 = -10.5 - 10.7 + 2(13.2) = 5.2$        $SS(\hat{\psi}_3) = 45.067$
(0 1 -1 1 0 -1)    $\hat{\psi}_4 = 10.5 - 7.8 + 10.7 - 7.3 = 6.1$       $SS(\hat{\psi}_4) = 93.025$
o Test the contrast:
$\psi_1: F(1,45) = \frac{0.2}{2.305} = 0.09, p = .77$
$\psi_2: F(1,45) = \frac{1.25}{2.305} = 0.54, p = .47$
$\psi_3: F(1,45) = \frac{45.067}{2.305} = 19.55, p < .01$
$\psi_4: F(1,45) = \frac{93.025}{2.305} = 40.36, p < .01$
o Note that if these were post-hoc tests, then we would need to apply the
Tukey HSD or Scheffé correction.
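The hand calculations for the four contrasts can be scripted. The sketch below, in Python, applies the contrast formulas from the notes to the marginal puzzle means in the Descriptive Statistics table, with n = 10 per puzzle and MSE = 2.305 from the randomized block ANOVA. Note that −7.8 + 7.3 evaluates to −0.5 for the second contrast. p-values are not computed here (they would need an F distribution routine).

```python
# Marginal puzzle means (P1..P6) from the Descriptive Statistics table,
# n = 10 children per puzzle, and MSE from the randomized block ANOVA.
means = [8.9, 10.5, 7.8, 10.7, 13.2, 7.3]
n = 10
ms_error = 2.305

contrasts = {
    "psi1: P2 vs P4": [0, -1, 0, 1, 0, 0],
    "psi2: P3 vs P6": [0, 0, -1, 0, 0, 1],
    "psi3: P5 vs mean(P2, P4)": [0, -1, 0, -1, 2, 0],
    "psi4: mean(P2, P4) vs mean(P3, P6)": [0, 1, -1, 1, 0, -1],
}

results = {}
for label, c in contrasts.items():
    assert sum(c) == 0  # contrast weights must sum to zero
    psi = sum(cj * m for cj, m in zip(c, means))
    ss = psi ** 2 / sum(cj ** 2 / n for cj in c)
    f = ss / ms_error  # F(1, 45): a contrast has 1 df
    results[label] = (psi, ss, f)
    print(f"{label}: psi = {psi:.1f}, SS = {ss:.3f}, F(1,45) = {f:.2f}")
```

Because every contrast is tested against the blocked MSE, the error df stay at 45 regardless of which puzzles the contrast involves.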

19. Final considerations about blocking
• As shown in the last SPSS output, when there is one participant per cell, the
SS for the interaction is the error term. Some authors create ANOVA tables
with no error term, and use the SS(BL*A) to test the effect of A. The only
difference in these approaches is the labeling of the error term.
• If the blocking variable is not related to the DV, then you actually lose
power by including it in the design.
Blocked Design
Source      SS        df            MS     F
Treatment   SSA       a-1           MSA    $F[(a-1),(N-a-bl+1)] = \frac{MSA}{MSE}$
Blocks      0         bl-1          MSBL
Error       SSError   (a-1)(bl-1)   MSE
Total       SST       N-1

Standard Design
Source      SS        df     MS     F
Treatment   SSA       a-1    MSA    $F[(a-1),(N-a)] = \frac{MSA}{MSE}$
Within      SSError   N-a    MSE
Total       SST       N-1
o When SSBL = 0, then SSError (in the blocked design) = SSWithin (in the
standard design), so both analyses work with the same error variability
o But there are fewer degrees of freedom in the error term for the blocked
design (N-a-bl+1) than in the standard design (N-a). The loss of these
bl-1 dfs results in lower power for the blocked design.
o In reality, the SSBL will never be exactly zero, but when SSBL is small
and the number of blocks is large, you will lose power.

• The blocking variable must be a discrete variable. Oftentimes in behavioral
research (and in both of our examples) the blocking variable is a continuous
variable that must be artificially grouped for the purpose of analysis. When
you treat a continuous variable as a discrete variable, you lose information
and power. An analysis of covariance (ANCOVA) is a similar design to a
randomized block design, except nuisance variables may be continuous.
• Testing for non-additivity of treatment effects and blocks:
o If looking at the plot of the DV by blocks makes you feel uneasy (it
shouldn’t!), a statistical test is available: Tukey’s test for nonadditivity.
o If you have more than 1 observation per cell, then you have a factorial
design. You can calculate a SS(Bl*A) and test the interaction.
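As a rough illustration of Tukey's test for nonadditivity, the sketch below applies the standard one-degree-of-freedom formula to the fat-in-the-diet data (Example #2). It is written in Python rather than SPSS and is our own sketch, not output from the notes; the resulting F would be referred to an F distribution with 1 and (a-1)(bl-1)-1 degrees of freedom.

```python
# Fat-in-the-diet data: rows are age blocks, columns are diets.
data = [
    [0.73, 0.67, 0.35],
    [0.86, 0.75, 0.41],
    [0.94, 0.81, 0.46],
    [1.40, 1.32, 0.95],
    [1.62, 1.41, 0.98],
]
bl, a = len(data), len(data[0])
cells = [(i, j) for i in range(bl) for j in range(a)]
grand = sum(map(sum, data)) / (a * bl)

# Estimated block (tau) and treatment (alpha) effects.
tau = [sum(row) / a - grand for row in data]
alpha = [sum(row[j] for row in data) / bl - grand for j in range(a)]

# Error SS from the additive randomized block model.
ss_error = sum((data[i][j] - grand - tau[i] - alpha[j]) ** 2 for i, j in cells)

# Tukey's single-df sum of squares for nonadditivity.
cross = sum(data[i][j] * tau[i] * alpha[j] for i, j in cells)
ss_nonadd = cross ** 2 / (sum(t ** 2 for t in tau) * sum(x ** 2 for x in alpha))

# What remains of the error term after removing the nonadditivity component.
df_remainder = (a - 1) * (bl - 1) - 1
ms_remainder = (ss_error - ss_nonadd) / df_remainder
f_nonadd = ss_nonadd / ms_remainder

print(f"SS nonadditivity = {ss_nonadd:.6f}, F(1,{df_remainder}) = {f_nonadd:.3f}")
```

The nonadditivity component is carved out of the error sum of squares, so it can never exceed SSError; a large F relative to the remainder would suggest that treatment effects are not constant across blocks.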
• If you want to block on two factors, you can use the same procedure outlined
here. Simply combine the two factors into one block. For example, to block
on age and education:
⇒ Young and no education
⇒ Young and education
⇒ Old and no education
⇒ Old and education
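Combining two blocking factors amounts to crossing their levels and treating each combination as one level of a single block variable. A minimal Python sketch of that recoding follows; the level names come from the example above, while the numeric codes are an arbitrary choice of ours.

```python
from itertools import product

age_levels = ["Young", "Old"]
education_levels = ["no education", "education"]

# Each combination of levels becomes one level of a single blocking factor.
block_labels = [
    f"{age} and {edu}" for age, edu in product(age_levels, education_levels)
]

# Arbitrary numeric codes that could serve as the block variable.
block_codes = {label: code for code, label in enumerate(block_labels, start=1)}

for label, code in block_codes.items():
    print(code, label)
```

With more levels per factor the number of combined blocks grows multiplicatively, which is worth keeping in mind given the loss of error degrees of freedom discussed above.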
