© 2004 A. Karpinski
Chapter 9
Advanced Topics in ANOVA
Unbalanced ANOVA designs
1. Why is the design unbalanced?
2. What happens with unbalanced designs?
3. An introduction to the problem
4. Types of sums of squares
5. An example
ANOVA designs with random effects
6. Fixed effects vs. random effects
7. Model II: One-factor random effects model
8. Model II: Two-factor random effects model
9. Model III: Two-factor mixed effects model
10. Contrasts and post-hoc tests
11. Effect sizes
12. Final considerations about random effects
ANOVA designs with nested effects
13. An introduction to nested designs
14. Structural models for nested designs
15. Testing nested effects
16. Final considerations about nested designs
ANOVA designs with randomized blocks
17. The logic of blocked designs
18. Examples of randomized block designs
19. Final considerations about blocked designs
Advanced Topics in ANOVA:
Unbalanced ANOVA designs
1. Why is the design unbalanced?
• Random factors
o The unequal cell sizes are randomly unequal
o The process leading to the missingness is independent of the levels of the
independent variable
• Scheduling problems
• Computer errors
                      IV 1
IV B       Level 1     Level 2     Level 3     Total
Level 1    n11 = 15    n21 = 10    n31 = 20       45
Level 2    n12 = 20    n22 = 20    n32 = 15       55
Total            35          30          35      100

                      IV 1
IV B       Level 1     Level 2     Level 3     Total
Level 1    n11 = 4     n21 = 7     n31 = 3        14
Level 2    n12 = 4     n22 = 3     n32 = 6        13
Level 3    n13 = 5     n23 = 4     n33 = 5        14
Total           13          14          14        41
• Systematic factors
o The unequal cell sizes are directly or indirectly related to the levels of the
independent variables
• A treatment is painful/ineffective
• High prejudice individuals refuse to answer questions regarding
attitudes toward ethnic groups
                      IV 1
IV B       Level 1     Level 2     Level 3     Total
Level 1    n11 = 40    n21 = 40    n31 = 50      130
Level 2    n12 = 20    n22 = 20    n32 = 30       70
Total            60          60          80      200

                      IV 1
IV B       Level 1     Level 2     Level 3     Total
Level 1    n11 = 3     n21 = 6     n31 = 9        18
Level 2    n12 = 2     n22 = 6     n32 = 9        17
Level 3    n13 = 4     n23 = 8     n33 = 13       25
Total            9          20          31        60
• Missing observations due to systematic factors are a serious problem.
Analyzing these data can lead to very biased results.
• All of the methods we discuss for analyzing unbalanced designs assume the
cell sizes are either a result of:
o Random factors
o Real differences in the population
2. What happens with unbalanced designs?
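• Recall that, for unequal n, two contrasts
\[ \psi_1 = (a_1, a_2, a_3, \ldots, a_a) \]
\[ \psi_2 = (b_1, b_2, b_3, \ldots, b_a) \]
are orthogonal if
\[ \sum_{i=1}^{a} \frac{a_i b_i}{n_i} = 0 \]
or
\[ \frac{a_1 b_1}{n_1} + \frac{a_2 b_2}{n_2} + \cdots + \frac{a_a b_a}{n_a} = 0 \]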
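The orthogonality condition can be checked numerically. The sketch below is only an illustration: the two contrasts and the cell sizes are hypothetical values chosen for the example, not data from these notes. It shows a pair of contrasts that is orthogonal with equal n but not with unequal n.

```python
from fractions import Fraction


def weighted_cross_product(a, b, n):
    """Return sum_i a_i * b_i / n_i; the contrasts are orthogonal when this is 0."""
    if not (len(a) == len(b) == len(n)):
        raise ValueError("contrasts and sample sizes must have the same length")
    return sum(Fraction(ai * bi, ni) for ai, bi, ni in zip(a, b, n))


# Hypothetical three-group example (assumed values, not from the notes).
psi1 = [1, -1, 0]
psi2 = [1, 1, -2]
equal_n = [10, 10, 10]
unequal_n = [5, 10, 20]

balanced = weighted_cross_product(psi1, psi2, equal_n)
unbalanced = weighted_cross_product(psi1, psi2, unequal_n)

print(balanced)    # 0: orthogonal when the cell sizes are equal
print(unbalanced)  # 1/10: no longer orthogonal with these unequal cell sizes
```

Exact fractions are used so that a zero result is an exact zero rather than a rounding artifact.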
• In general the tests for main effects and interactions are no longer orthogonal
for unbalanced designs.
• Because of this non-orthogonality, the sums of squares will not nicely
partition.
\[ SSA + SSB + SSAB \neq SST \]
• As a result:
o The tests for the main effects and interactions are not independent of each
other.
o Single degree of freedom contrasts may not be combined into a
simultaneous test.
• The most popular method for dealing with these issues is to use different
methods of computing the sums of squares for each effect.
• These different methods of computing sums of squares DO NOT affect:
i. The error term (MSW)
ii. The test of the highest order interaction
• Four possible approaches to unequal cell sizes (assuming data are missing
completely at random)
o Add observations to make the design balanced
• This solution may not be pragmatic
• It may also present problems regarding random assignment in a true
experiment
o Delete observations to make design balanced
• While an unbalanced design is less powerful than a balanced design,
you ALWAYS lose power by tossing observations
• There is not a good method for deciding whom to toss. (If you use a
random process, then a different person using the same algorithm may
come to different conclusions. If you use a systematic process, then
you may bias your results.)
• I recommend that you NEVER delete an observation to make a design
balanced.
o Impute the missing data
• A topic too advanced for this course!
o Conduct analysis on an unbalanced design
3. An introduction to the problem of unbalanced designs
• Balanced, orthogonal designs
o For balanced designs, the SS partition is complete and each component’s
contribution to the total SS is unique.
• Unbalanced, non-orthogonal designs
o For unbalanced designs, the SS are not necessarily unique to each
component
o These figures are just heuristics. With data, it is possible to have
“negative” overlapping area.
[Figures: diagrams of SSA, SSB, and SSAB for the balanced (orthogonal) case and for the unbalanced (non-orthogonal) case]
• Approach #1: Only count the unique contribution of each factor
o This approach is known as the Unique SS or Type III SS approach
• Approach #2: Start with only the main effects. Use a unique SS approach to
divide the main effect sums of squares. Then, add the next highest order
effects. For the remaining SS, use the unique approach to divide the SS.
Continue until all effects have been added.
o This approach is known as using Type II SS
[Figures: diagrams of SSA, SSB, and SSAB illustrating Approach #1 and Approach #2]
• Approach #3: Start with only the main effects. Determine an order of
importance. Give the most important effect all its SS. For the next effect,
give the effect its entire remaining SS. Continue until all main effects are
used. Next consider the two-way interactions, determine an order of
importance, and repeat the process. Continue until all effects have been
considered.
o This approach is known as the hierarchical or Type I SS approach.
[Figures: diagrams of SSA, SSB, and SSAB for two entry orders: Factor A entered first, and Factor B entered first]
• The problem of unequal sample sizes occurs when we collapse across cells
to look at the marginal means. There are different ways to collapse the main
effects, and each gives a different answer.
(The MSW and the highest order interaction are unaffected by these
different methods because they do not average across any cells—they say
something about individual cells.)
• An example: Salary data for female and male employees
                        Female                          Male
               College       No College        College       No College
               Degree        Degree            Degree        Degree
                 24            15                25            19
                 26            17                29            18
                 25            20                27            21
                 24            16                              20
                 27                                            21
                 24                                            22
                 27                                            19
                 23
Mean             25            17                27            20
Sample Size       8             4                 3             7

                                        Gender
Education              Female                     Male
College Degree         X̄_11 = 25, n_11 = 8        X̄_21 = 27, n_21 = 3
No College Degree      X̄_12 = 17, n_12 = 4        X̄_22 = 20, n_22 = 7
• Question: Is there a difference in the salaries of men and women?
o Approach #1: Let’s run a contrast comparing women’s salary to men’s
salary
                                   Gender
Education                 Women        Men
College Degree              -1           1
No College Degree           -1           1
• Based on this approach, we conclude that men earn more than
women!
⇒ Women earn $21,000: \( \frac{25 + 17}{2} = 21 \)
⇒ Men earn $23,500: \( \frac{27 + 20}{2} = 23.5 \)
o Approach #2: Ignore education level and compute marginal gender
means.
                                   Gender
                          Women            Men
Collapsed across          X̄_F = 22.33      X̄_M = 22.10
education                 n_F = 12         n_M = 10
• Based on this approach we look at the marginal means for gender, and
conclude that women earn slightly more than men
o Which answer is correct?
o It depends – each method answers a different question
• Method #2 asks: Are men paid a higher salary than women?
• Method #1 asks: Within an education status, are men paid a higher
salary than women?
• This discrepancy is known as “Simpson’s Paradox”
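The two answers can be reproduced from the cell means and cell sizes in the salary table. The following is a minimal sketch; the dictionary layout and function names are choices made for this illustration.

```python
# Cell means (in thousands of dollars) and cell sizes from the salary example.
cells = {
    ("female", "college"): (25.0, 8),
    ("female", "no_college"): (17.0, 4),
    ("male", "college"): (27.0, 3),
    ("male", "no_college"): (20.0, 7),
}


def unweighted_mean(gender):
    """Average the cell means, ignoring cell sizes (Approach #1)."""
    means = [m for (g, _edu), (m, _n) in cells.items() if g == gender]
    return sum(means) / len(means)


def weighted_mean(gender):
    """Average over individuals, so larger cells count more (Approach #2)."""
    pairs = [(m, n) for (g, _edu), (m, n) in cells.items() if g == gender]
    return sum(m * n for m, n in pairs) / sum(n for _m, n in pairs)


for gender in ("female", "male"):
    print(gender, round(unweighted_mean(gender), 2), round(weighted_mean(gender), 2))
# female 21.0 22.33
# male 23.5 22.1
```

The unweighted means favor men (23.5 vs. 21.0), while the weighted means slightly favor women (22.33 vs. 22.10), because most women in the sample have a college degree and most men do not.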
4. Types of Sums of Squares
• I am going to focus on the use and interpretation of each type of sums of
squares, and will ignore how to compute these SS. SPSS (or any statistical
software) can calculate each of the SS, but if you must see the computational
details, see an advanced ANOVA book.
• Type III / Unique SS or Regression SS
o In general, this is the best and most common approach to analysis
o For Type III SS, each cell mean is weighted equally when computing
marginal means. These marginal means are unweighted (because the cell
means are considered equally, independent of the sample sizes).
o This approach leads to the identical results as converting the design to a
one-factor arrangement and using contrasts to test the main effects and
interactions.
o When the design is not orthogonal, the SS of each effect may sum to a
number greater than the total SS because of redundancy/overlap in SS.
For Type III SS, we only use the part of the SS that is unique to the factor
of interest.
(For those of you familiar with regression, Type III SS is equivalent to testing for
each effect after having previously controlled for/entered all other effects OR by
entering all effects simultaneously.)
o In our example, using Type III SS is equivalent to taking approach #1 to
the analysis.
Testing the main effect for gender using a Type III SS approach:
                                   Gender
Education                 Women                 Men
College Degree            X̄_11 = 25  (-1)       X̄_21 = 27  (1)
No College Degree         X̄_12 = 17  (-1)       X̄_22 = 20  (1)
• Main effect for gender
⇒ Women earn $21,000: \( \frac{25 + 17}{2} = 21 \)
⇒ Men earn $23,500: \( \frac{27 + 20}{2} = 23.5 \)
• How is the main effect for education tested?
• In SPSS:
UNIANOVA dv BY gender edu
/METHOD = SSTYPE(3).
Tests of Between-Subjects Effects
Dependent Variable: DV
Source             Type III Sum of Squares    df    Mean Square           F    Sig.
Corrected Model                   273.864a     3         91.288      32.864    .000
Intercept                         9305.790     1       9305.790    3350.084    .000
GENDER                              29.371     1         29.371      10.573    .004
EDU                                264.336     1        264.336      95.161    .000
GENDER * EDU                         1.175     1          1.175        .423    .524
Error                               50.000    18          2.778
Total                            11193.000    22
Corrected Total                    323.864    21
a. R Squared = .846 (Adjusted R Squared = .820)
Main effect for gender such that men earn more than women,
F(1,18) = 10.57, p = .004
Main effect for education such that college educated individuals earn
more than non-college educated individuals,
F(1,18) = 95.16, p < .001
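As noted above, the Type III tests are identical to contrasts on the four cell means of a one-factor arrangement. The sketch below recomputes the Type III sums of squares in the SPSS table from the cell means and cell sizes, using \( SS_\psi = \hat{\psi}^2 / \sum c_j^2 / n_j \). The variable names are illustrative, and the error term (SSW = 50.000 on 18 df) is taken from the table rather than recomputed.

```python
# Cell order: female/college, female/no college, male/college, male/no college.
cell_means = [25.0, 17.0, 27.0, 20.0]
cell_sizes = [8, 4, 3, 7]
ms_within = 50.0 / 18  # Error SS and df from the SPSS output


def contrast_ss(coefs):
    """Sum of squares for a contrast on unweighted cell means."""
    psi_hat = sum(c * m for c, m in zip(coefs, cell_means))
    scale = sum(c * c / n for c, n in zip(coefs, cell_sizes))
    return psi_hat ** 2 / scale


contrasts = {
    "gender": [-1, -1, 1, 1],
    "education": [1, -1, 1, -1],
    "gender x education": [1, -1, -1, 1],
}

type3_ss = {name: contrast_ss(c) for name, c in contrasts.items()}
f_ratios = {name: ss / ms_within for name, ss in type3_ss.items()}

for name in contrasts:
    print(f"{name}: SS = {type3_ss[name]:.3f}, F(1,18) = {f_ratios[name]:.3f}")
# gender: SS = 29.371, F(1,18) = 10.573
# education: SS = 264.336, F(1,18) = 95.161
# gender x education: SS = 1.175, F(1,18) = 0.423
```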
• Type I / Hierarchical SS
o For Type I SS, each cell mean is weighted by its cell size when
computing marginal means.
o The order the factors are entered into SPSS makes a difference in how
the SS are computed.
o When the design is not orthogonal, the SS of each effect may sum to a
number greater than the total SS because of redundancy/overlap in SS.
For Type I SS:
• For the first factor listed, we use all the SS for that factor (unique and
redundant)
• For the next factors, we use the entire SS that is not redundant with
the previous factors
(For those of you familiar with regression, Type I SS is equivalent to testing for
each effect by entering each effect one after the other)
o In our example, Type I SS (with gender listed first) is equivalent to
ignoring education level and using weighted marginal means
                                   Gender
                          Women            Men
Collapsed across          X̄_F = 22.33      X̄_M = 22.10
education                 n_F = 12         n_M = 10
• In SPSS:
UNIANOVA dv BY gender edu
/METHOD = SSTYPE(1).
Tests of Between-Subjects Effects
Dependent Variable: DV
Source             Type I Sum of Squares    df    Mean Square           F    Sig.
Corrected Model                 273.864a     3         91.288      32.864    .000
Intercept                      10869.136     1      10869.136    3912.889    .000
GENDER                              .297     1           .297        .107    .747
EDU                              272.392     1        272.392      98.061    .000
GENDER * EDU                       1.175     1          1.175        .423    .524
Error                             50.000    18          2.778
Total                          11193.000    22
Corrected Total                  323.864    21
a. R Squared = .846 (Adjusted R Squared = .820)
UNIANOVA dv BY edu gender
/METHOD = SSTYPE(1).
Tests of Between-Subjects Effects
Dependent Variable: DV
Source             Type I Sum of Squares    df    Mean Square           F    Sig.
Corrected Model                 273.864a     3         91.288      32.864    .000
Intercept                      10869.136     1      10869.136    3912.889    .000
EDU                              242.227     1        242.227      87.202    .000
GENDER                            30.462     1         30.462      10.966    .004
EDU * GENDER                       1.175     1          1.175        .423    .524
Error                             50.000    18          2.778
Total                          11193.000    22
Corrected Total                  323.864    21
a. R Squared = .846 (Adjusted R Squared = .820)
                             Gender listed first          Edu listed first
Main effect for gender       F(1,18) = 0.11, p = .75      F(1,18) = 10.97, p = .004
Main effect for education    F(1,18) = 98.06, p < .001    F(1,18) = 87.20, p < .001
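For the factor listed first, the Type I sum of squares is the between-groups SS computed from the weighted marginal means, ignoring the other factor. The sketch below checks this against the two SPSS tables (GENDER = .297 when gender is first, EDU = 242.227 when education is first). The function name and data layout are assumptions made for this illustration, and the sketch does not compute the SS for the factor entered second.

```python
# (gender, education, cell mean, cell size) from the salary example.
cells = [
    ("female", "college", 25.0, 8),
    ("female", "no_college", 17.0, 4),
    ("male", "college", 27.0, 3),
    ("male", "no_college", 20.0, 7),
]


def first_entered_ss(factor_index):
    """Between-groups SS for one factor using weighted marginal means.

    factor_index is 0 for gender and 1 for education.
    """
    totals = {}  # level -> (sum of scores, number of observations)
    for cell in cells:
        level, mean, n = cell[factor_index], cell[2], cell[3]
        running_sum, running_n = totals.get(level, (0.0, 0))
        totals[level] = (running_sum + mean * n, running_n + n)
    grand_n = sum(n for _s, n in totals.values())
    grand_mean = sum(s for s, _n in totals.values()) / grand_n
    return sum(n * (s / n - grand_mean) ** 2 for s, n in totals.values())


ss_gender_first = first_entered_ss(0)
ss_edu_first = first_entered_ss(1)
print(round(ss_gender_first, 3))  # 0.297
print(round(ss_edu_first, 3))     # 242.227
```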
• Not surprisingly, there are additional types of sums of squares
o Type II SS
A compromise between Type I and Type III SS
o Type IV SS
Use when there are missing cells in the design of the experiment
• Which SS are better?
o In general, you ran the design because you wanted to compare the cell
means. In this case, the unequal cell sizes are irrelevant and you should
use Type III SS
• If we have an experimental design and the data are missing at random,
then there is no defensible reason for allowing cells with larger
numbers of observations to exert a greater influence on the analysis
• For men and women with equal levels of education, do men and
women receive equal pay?
• Type III SS also have the advantage of being the simplest to convert
to contrast coefficients
o If your design intentionally has unequal cell sizes (perhaps to reflect
differences in the composition of the population) and you want your
analyses to reflect this feature, then Type I SS may be more appropriate
• Do men and women receive equal pay?
o This issue of which type of SS to use for unbalanced designs is still
controversial. Different texts and different authors offer different
recommendations. The important point is for you to think about what
question you are asking and which type of SS best answers that question.
You must decide this issue before you analyze your data, not after
examining the p-values!
• Important points to remember
o Regardless of the type of SS used, the error term remains unchanged
o Any analysis that does not involve marginal means remains unchanged
• The test of the highest order interaction is unchanged
• Tests of cell mean contrasts are unchanged
o In most cases Type III SS seem to be the “best” because they take into
account information about all the factors
• If important factors are omitted from the design, you may arrive at
erroneous conclusions (in regression, this is known as the omitted
variable problem).
5. An Example: Level of Management and Support of Affirmative Action
                                        Management Level
Gender    Middle-Management,   Upper-Management,   Middle-Management,   Upper-Management,   CEO
          Minor Division       Minor Division      Major Division       Major Division
Female    21 25 29 26 24       31 30 23 28         25 22 31 30          35 25 30            27 36 27
Male      25 18 26             31 28 22 31         33 31 40 36          35 35 43 37 40      36 44 43 45 42
DV = Scores on an Affirmative Action Attitude Scale
• Note that this design is rather odd: it is a 2*2*2 with an extra 2 cells
                          Management Level
            Middle Management           Upper Management
Gender      Minor         Major         Minor         Major
            Division      Division      Division      Division
Male
Female

Gender      CEO
Male
Female
• Rather than trying to analyze it as a 2*2*3 with two missing cells, it is much
easier to consider this design to be a 2*5 design. Using appropriate contrasts,
we can test
o Main effect of management level
o Main effect of division
o Management by division interaction
o Interactions between all these terms and gender
But we can also make comparisons between these groups and CEOs.
• Using this approach, we can avoid designs with empty cells and the need to
learn about Type IV SS.
Your specific research questions were:
i. Do middle and upper management from minor divisions differ in their
support for AA?
ii. Do minor division managers differ from major division managers in
their support for AA?
iii. Do CEOs differ from other management in their support for AA?
iv. Do questions i. – iii. differ by gender?
• First, let’s look at the data:
[Figure: boxplots of DV by MANAGE (MM - Minor, UM - Minor, MM - Major, UM - Major, CEO), grouped by GENDER (Female, Male)]
EXAMINE VARIABLES=dv BY group
/PLOT NPPLOT.
Tests of Normality (Shapiro-Wilk)
DV     GROUP     Statistic    df    Sig.
        1.00        .989       5    .977
        2.00        .895       4    .405
        3.00        .912       4    .492
        4.00       1.000       3   1.000
        5.00        .750       3    .000
        6.00        .842       3    .220
        7.00        .827       4    .161
        8.00        .971       4    .850
        9.00        .887       5    .341
       10.00        .836       5    .154

Test of Homogeneity of Variances
DV
Levene Statistic    df1    df2    Sig.
            .348      9     30    .950
• Rather than running a traditional main effects and interaction analysis, let’s
skip the omnibus tests and do a contrast-based test of the hypotheses.
o We should adopt a Type III SS approach – the variations in the cell sizes
appear to be random and we are interested in the cell means.
o To conduct contrasts with a Type III SS approach, we need to consider
each cell mean equally, regardless of its sample size – but that is what we
do when we use our standard tests for contrasts.
o However, remember that we cannot combine single degree of freedom
contrasts into a simultaneous omnibus test of a hypothesis.
Hypothesis 1
o Do middle and upper management in the minor divisions differ in their
support for AA?
o Does this level of support differ by gender?
                                  Management Level
           Gender    MM, Minor   UM, Minor   MM, Major   UM, Major   CEO
Hyp 1:     Female       -1           1           0           0         0
           Male         -1           1           0           0         0
Hyp 1B:    Female       -1           1           0           0         0
           Male          1          -1           0           0         0
ONEWAY dv by group
/cont = -1 1 0 0 0 -1 1 0 0 0
/cont = -1 1 0 0 0 1 -1 0 0 0.
Contrast Tests
DV
Contrast          Value of Contrast    Std. Error        t    df    Sig. (2-tailed)
Hyp 1                        8.0000       4.00638    1.997    30    .055
Hyp 1 * Gender              -2.0000       4.00638    -.499    30    .621
• In the minor divisions, we find that upper management is more
supportive of AA than middle management,
t(30) = 2.00, p = .06, \( \omega^2 = .07 \).
• This difference in support of AA does not vary by gender,
t(30) = -0.50, p = .62, \( \omega^2 < .01 \)
• As an example of the effect size calculation, here are the omega
squared calculations for the test of Hypothesis 1:
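Hypothesis 1: \( \hat{\psi}_1 = 8 \)
\[ SS_{\hat{\psi}_1} = \frac{\hat{\psi}_1^2}{\sum_j \frac{c_j^2}{n_j}} = \frac{(8)^2}{\frac{(-1)^2}{5} + \frac{(1)^2}{4} + 0 + 0 + 0 + \frac{(-1)^2}{3} + \frac{(1)^2}{4} + 0 + 0 + 0} = \frac{64}{1.033} = 61.935 \]
\[ \hat{\omega}^2_{\psi} = \frac{SS_{\psi} - MS_{Within}}{SS_{\psi} + (N - 1)\, MS_{Within}} = \frac{61.935 - 15.53}{61.935 + (39)\,15.53} = .0695 \]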
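The contrast value, standard error, t statistic, and omega squared for Hypothesis 1 can be recomputed from the raw scores. In the sketch below, the assignment of scores to the ten cells is read from the data table at the start of this example (groups 1 to 5 are the female cells and 6 to 10 the male cells, in the column order of that table). The function and variable names are choices made for this illustration.

```python
import math

groups = [
    [21, 25, 29, 26, 24],  # 1: female, MM minor
    [31, 30, 23, 28],      # 2: female, UM minor
    [25, 22, 31, 30],      # 3: female, MM major
    [35, 25, 30],          # 4: female, UM major
    [27, 36, 27],          # 5: female, CEO
    [25, 18, 26],          # 6: male, MM minor
    [31, 28, 22, 31],      # 7: male, UM minor
    [33, 31, 40, 36],      # 8: male, MM major
    [35, 35, 43, 37, 40],  # 9: male, UM major
    [36, 44, 43, 45, 42],  # 10: male, CEO
]


def mean(xs):
    return sum(xs) / len(xs)


n_total = sum(len(g) for g in groups)
ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
ms_within = ss_within / (n_total - len(groups))


def contrast_test(coefs):
    """Return (psi_hat, std_error, t, ss_psi, omega_squared) for a contrast."""
    psi_hat = sum(c * mean(g) for c, g in zip(coefs, groups))
    scale = sum(c * c / len(g) for c, g in zip(coefs, groups))
    std_error = math.sqrt(ms_within * scale)
    ss_psi = psi_hat ** 2 / scale
    omega_sq = (ss_psi - ms_within) / (ss_psi + (n_total - 1) * ms_within)
    return psi_hat, std_error, psi_hat / std_error, ss_psi, omega_sq


hyp1 = contrast_test([-1, 1, 0, 0, 0, -1, 1, 0, 0, 0])
print([round(v, 4) for v in hyp1])
```

With these data the contrast value is 8, the standard error is about 4.006, and omega squared is about .0695, in line with the SPSS output and the hand calculation above.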
Hypothesis 2
o Do minor division managers differ from major division managers in their
support for AA?
o Does this level of support differ by gender?
                                  Management Level
           Gender    MM, Minor   UM, Minor   MM, Major   UM, Major   CEO
Hyp 2:     Female       -1          -1           1           1         0
           Male         -1          -1           1           1         0
Hyp 2B:    Female       -1          -1           1           1         0
           Male          1           1          -1          -1         0
ONEWAY dv by group
/cont = -1 -1 1 1 0 -1 -1 1 1 0
/cont = -1 -1 1 1 0 1 1 -1 -1 0.
Contrast Tests
DV
Contrast          Value of Contrast    Std. Error         t    df    Sig. (2-tailed)
Hyp 2                       26.0000       5.66588     4.589    30    .000
Hyp 2 * Gender             -18.0000       5.66588    -3.177    30    .003
• We find a significant division of management by gender interaction,
t(30) = -3.18, p < .01, \( \omega^2 = .19 \).
To understand this interaction, we must conduct simple effects tests:
ONEWAY dv by group
/cont = -1 -1 1 1 0 0 0 0 0 0
/cont = 0 0 0 0 0 -1 -1 1 1 0.
Contrast Tests
DV
Contrast              Value of Contrast    Std. Error        t    df    Sig. (2-tailed)
Hyp 2 - Women only               4.0000       4.00638     .998    30    .326
Hyp 2 - Men only                22.0000       4.00638    5.491    30    .000
• For women, we find no significant difference between major and
minor division management in their support for AA,
t(30) = 1.00, ns, \( \omega^2 < .01 \).
• For men, we find that managers in major divisions express more
support for AA than managers in minor divisions,
t(30) = 5.49, p < .05, \( \omega^2 = .42 \).
(Use the Scheffé correction, \( t_{crit} = \sqrt{4 \times F(.05; 4, 30)} = 3.28 \), as the critical value)
Hypothesis 3
o Do CEOs differ from other management in their support for AA?
o Does this level of support differ by gender?
                                  Management Level
           Gender    MM, Minor   UM, Minor   MM, Major   UM, Major   CEO
Hyp 3:     Female       -1          -1          -1          -1         4
           Male         -1          -1          -1          -1         4
Hyp 3B:    Female       -1          -1          -1          -1         4
           Male          1           1           1           1        -4
ONEWAY dv by group
/cont = -1 -1 -1 -1 4 -1 -1 -1 -1 4
/cont = -1 -1 -1 -1 4 1 1 1 1 -4.
Contrast Tests
DV
Contrast          Value of Contrast    Std. Error         t    df    Sig. (2-tailed)
Hyp 3                       54.0000      12.83173     4.208    30    .000
Hyp * Gender               -34.0000      12.83173    -2.650    30    .013
• We find a significant level of management by gender interaction,
t(30) = -2.65, p = .01, \( \omega^2 = .13 \).
To understand this interaction, we must conduct simple effects tests:
ONEWAY dv by group
/cont = -1 -1 -1 -1 4 0 0 0 0 0
/cont = 0 0 0 0 0 -1 -1 -1 -1 4.
Contrast Tests
DV
Contrast              Value of Contrast    Std. Error        t    df    Sig. (2-tailed)
Hyp 3 - Women only              10.0000       9.94462    1.006    30    .323
Hyp 3 - Men only                44.0000       8.10912    5.426    30    .000
• For women, we find no significant difference between management
and CEOs in their support for AA, t(30) = 1.01, ns, \( \omega^2 < .01 \).
• For men, we find that CEOs express more support for AA than other
managers, t(30) = 5.43, p < .05, \( \omega^2 = .42 \)
(Use the Scheffé correction, \( t_{crit} = \sqrt{4 \times F(.05; 4, 30)} = 3.28 \), as the critical value)
• Note that for a contrast-based analysis, we are implicitly adopting a Type III
SS approach by weighting each cell mean equally. Single degree of
freedom tests of cell means are not affected by an unbalanced design
(However, we would not be able to combine single df tests into a
simultaneous test).
• If we had taken a traditional approach, we would have used Type III SS for
our analysis because we assume that the data are missing at random and we
want to know if attitudes toward AA differ by gender within each
management position.
UNIANOVA dv BY gender manage
/METHOD = SSTYPE(3)
/PRINT = DESC.
Tests of Between-Subjects Effects
Dependent Variable: DV
Source             Type III Sum of Squares    df    Mean Square           F    Sig.
Corrected Model                  1427.100a     9        158.567      10.208    .000
Intercept                        36013.846     1      36013.846    2318.488    .000
GENDER                             260.000     1        260.000      16.738    .000
MANAGE                             687.429     4        171.857      11.064    .000
GENDER * MANAGE                    268.351     4         67.088       4.319    .007
Error                              466.000    30         15.533
Total                            40706.000    40
Corrected Total                   1893.100    39
a. R Squared = .754 (Adjusted R Squared = .680)
o We find a significant gender by management position interaction,
F(4,30) = 4.32, p < .01
o We would be required to perform follow-up tests before interpreting the
main effects for gender and management.
[Figure: Attitude Toward Affirmative Action. Plot of mean attitude (y-axis, 20 to 45) by gender (Female, Male), with one line per management level: MM - Minor, MM - Major, UM - Minor, UM - Major, CEO]
ANOVA designs with random effects
6. Fixed effects vs. random effects
• Model I: The fixed effects model
o A fixed effect is one in which the experimenter is only interested in the
levels of the IV that are included in the study
o In advance of the study, the experimenter decides to examine a relatively
small set of treatments. Each treatment of interest is included in the
study. The experimenter wishes to make inferences about those
treatments and no others.
o The effect is fixed in that if someone were to replicate the study, the
identical treatments would be used
o Example of a fixed effects model: An advertising company wants to
examine the effectiveness of five different billboards in both men and
women, and in White-Americans, Black-Americans, Asian-Americans,
and Hispanic Americans.
• This design is a 5*2*4 between subjects, fixed effects ANOVA
Factor 1: Advertisement (5 different billboards)
Factor 2: Gender (Men and Women)
Factor 3: Ethnicity (4 ethnic groups)
• Each of these factors is fixed. If the design were to be replicated, the
exact same ads, genders, and ethnicities would be used. The
experimenter wants to make inferences regarding only these ads,
genders, and ethnicities.
• (The exact same participants would not be used – participants are
always a random effect)
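\[ Y_{ijkl} = \mu + \alpha_j + \beta_k + \gamma_l + (\alpha\beta)_{jk} + (\alpha\gamma)_{jl} + (\beta\gamma)_{kl} + (\alpha\beta\gamma)_{jkl} + \varepsilon_{ijkl} \]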
• Model II: The random effects model
o A random effect is one in which the factor levels are randomly sampled
from a population. Inferences are made not only for the factor levels
included in the study, but to the entire population of factor levels.
o The effect is random in that if someone were to replicate the study,
different treatments would be sampled from the population.
o Example of a random effects model: A company owns several hundred
retail stores throughout the country, and it wants to examine the
effectiveness of a new sales promotion. Five stores are randomly
sampled. The sales promotion is implemented in each store for a trial
period and then evaluated.
• This design is a 1-factor between-subjects, random effect ANOVA
Factor 1: Store (5 stores)
• The store factor is a random factor. If the design were to be
replicated, five different stores would be randomly sampled from the
population. The experimenter wants to make inferences regarding the
effectiveness of the sales promotion in all stores, not just the five
included in the study.
• Model III: Mixed model
o A mixed model is a model containing at least one fixed effect and at least
one random effect
In psychology many people refer to a design with at least one between-subjects
factor and at least one within-subjects factor as a mixed design. Although this
terminology is common in psychology it is inconsistent with the statistical usage
of the term. Consistent with the statistical usage, we will reserve the term mixed
model for a model with fixed and random factors
o Example of a mixed model: To investigate the effect of mental activity
on blood flow to the brain (BF), participants completed a math test, a
reading comprehension test, or a history task. The experimenter wanted
to generalize the results to a classroom setting, and reasoned that
different classrooms might have different effects on baseline BF. Thus,
six fifth grade classrooms were selected at random from the Philadelphia
public school system. The students in each class were randomly assigned
to the math test, the reading comprehension test, or the history test. Post-
test BF readings were taken on all participants.
• This design is a 2-factor between-subjects, mixed model ANOVA
Factor 1: Test (Math, Reading Comprehension, or History)
Factor 2: Classroom (6 classrooms)
• The test factor is a fixed factor. These three kinds of tasks are the
only tasks of interest to the experimenter. The classroom factor is a
random factor. If the design were to be replicated, six different
classrooms would be randomly sampled from the population.
• The key idea of the random effects model is that you not only take into
account random noise, \( \sigma_\varepsilon^2 \), you also take into account the variability due to
the sampling of the factor levels, \( \sigma_\alpha^2 \)
7. Model II: One-factor random effects model
• Let’s consider the sales effectiveness example in more detail
                              Store
         1          2          3          4          5
        5.80       6.00       6.30       6.40       5.70
        5.10       6.10       5.50       6.40       5.90
        5.70       6.60       5.70       6.50       6.50
        5.90       6.50       6.00       6.10       6.30
        5.60       5.90       6.10       6.60       6.20
        5.40       5.90       6.20       5.90       6.40
        5.30       6.40       5.80       6.70       6.00
        5.20       6.30       5.60       6.00       6.30
Mean    X̄_1 = 5.50  X̄_2 = 6.21  X̄_3 = 5.90  X̄_4 = 6.33  X̄_5 = 6.16
• For a random effects model, we need to check some additional assumptions,
compared to the fixed-effects model
o Fixed effects assumptions:
• All observations are drawn from normally distributed populations
• All observations have a common variance
• All observations are independent and are randomly sampled from the
population
o Random effects assumptions:
• All treatment effects are drawn from normally distributed populations
• All treatment effects are independent and are randomly sampled from
the population
o In general, we cannot check these random effects assumptions in the
data. We must infer them from the design.
EXAMINE VARIABLES=dv BY store
/PLOT BOXPLOT NPPLOT SPREADLEVEL.
[Figure: boxplots of DV by STORE (stores 1 to 5, N = 8 per store)]
Tests of Normality (Shapiro-Wilk)
DV     STORE     Statistic    df    Sig.
        1.00        .950       8    .716
        2.00        .913       8    .373
        3.00        .950       8    .716
        4.00        .930       8    .516
        5.00        .946       8    .667

Test of Homogeneity of Variance
        Levene Statistic    df1    df2    Sig.
DV                  .073      4     35    .990
• The structural model for a oneway random effects model looks similar to a
fixed model
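o Fixed effects model:
\[ Y_{ij} = \mu + \alpha_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon) \]
o Random effects model:
\[ Y_{ij} = \mu + \alpha_{\sigma j} + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon), \qquad \alpha_{\sigma j} \sim N(0, \sigma_\alpha) \]
So that \( \sigma_Y^2 = \sigma_\varepsilon^2 + \sigma_\alpha^2 \)
• Random effects are denoted with a subscript σ to highlight that they
are random. That is, the \( \alpha_{\sigma j} \)'s are not fixed at a level, but have a
distribution.
• In general, we are not interested in estimating the \( \alpha_{\sigma j} \)'s because they
vary from study to study. It is much more informative to estimate the
distribution of the \( \alpha_{\sigma j} \)'s: \( \alpha_{\sigma j} \sim N(0, \sigma_\alpha) \)
• When we estimate effects, we will want to estimate \( \sigma_\alpha^2 \)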
• ANOVA table for a random-effects model
o Recall the ANOVA table for the fixed-effects model
\( H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0 \)
Source            SS       df     MS           E(MS)                                                          F
Between           SSBet    a-1    SSB/DFBet    \( \sigma_\varepsilon^2 + \frac{\sum n_i \alpha_i^2}{a-1} \)    MSBet/MSW
Within (Error)    SSW      N-a    SSW/DFW      \( \sigma_\varepsilon^2 \)
Total             SST      N-1
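o A valid F-test for a factor is constructed so that:
• When the null hypothesis is true, the expected F-value is 1
If H0 is true: \( \frac{\sum n_i \alpha_i^2}{a-1} = 0 \)
Then
\[ F = \frac{MSBet}{MSW} = \frac{\sigma_\varepsilon^2 + \frac{\sum n_i \alpha_i^2}{a-1}}{\sigma_\varepsilon^2} = \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2} = 1 \]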
• When the alternative hypothesis is true, the expected F-value is
greater than 1 and this increase is only due to the factor of interest
If H1 is true: \( \frac{\sum n_i \alpha_i^2}{a-1} > 0 \)
Then
\[ F = \frac{MSBet}{MSW} = \frac{\sigma_\varepsilon^2 + \frac{\sum n_i \alpha_i^2}{a-1}}{\sigma_\varepsilon^2} > 1 \]
o Now the ANOVA table for the random-effects model
\( H_0: \sigma_\alpha^2 = 0 \)
Source            SS       df     MS           E(MS)                                             F
Between           SSBet    a-1    SSB/DFBet    \( \sigma_\varepsilon^2 + n\sigma_\alpha^2 \)      MSBet/MSW
Within (Error)    SSW      N-a    SSW/DFW      \( \sigma_\varepsilon^2 \)
Total             SST      N-1
o Although the F-tests are constructed in the same manner as a fixed effects
model, under the hood different components are being estimated
• When the null hypothesis is true, the expected F-value is 1
If H0 is true: \( \sigma_\alpha^2 = 0 \)
Then
\[ F = \frac{MSBet}{MSW} = \frac{\sigma_\varepsilon^2 + n\sigma_\alpha^2}{\sigma_\varepsilon^2} = \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2} = 1 \]
• When the alternative hypothesis is true, the expected F-value is
greater than 1 and this increase is only due to the factor of interest
If H1 is true: \( \sigma_\alpha^2 > 0 \)
Then
\[ F = \frac{MSBet}{MSW} = \frac{\sigma_\varepsilon^2 + n\sigma_\alpha^2}{\sigma_\varepsilon^2} > 1 \]
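The expected mean square for the random-effects model can be illustrated with a small simulation. The sketch below is not part of the original analysis: the number of levels, cell size, variances, replication count, and seed are all arbitrary assumptions chosen for the illustration. It draws the level effects anew on every replication (as a random factor requires) and averages MSBet, which should come out close to \( \sigma_\varepsilon^2 + n\sigma_\alpha^2 \).

```python
import random


def mean_square_between(groups):
    """MSBet for a balanced one-factor design."""
    n = len(groups[0])
    group_means = [sum(g) / n for g in groups]
    grand_mean = sum(group_means) / len(groups)
    ss_between = n * sum((m - grand_mean) ** 2 for m in group_means)
    return ss_between / (len(groups) - 1)


def average_ms_between(a=5, n=8, var_alpha=0.5, var_error=1.0, reps=4000, seed=1):
    """Average MSBet over many replications with freshly sampled level effects."""
    rng = random.Random(seed)
    total = 0.0
    for _rep in range(reps):
        groups = []
        for _level in range(a):
            effect = rng.gauss(0.0, var_alpha ** 0.5)  # random level effect
            groups.append(
                [effect + rng.gauss(0.0, var_error ** 0.5) for _obs in range(n)]
            )
        total += mean_square_between(groups)
    return total / reps


# Theory: E(MSBet) = var_error + n * var_alpha = 1.0 + 8 * 0.5 = 5.0
alt_average = average_ms_between()
# Under H0 (var_alpha = 0), E(MSBet) = var_error = 1.0
null_average = average_ms_between(var_alpha=0.0)
print(round(alt_average, 2), round(null_average, 2))
```

Because the result depends on simulated draws, the printed averages are only approximately 5.0 and 1.0.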
• Random Effects in SPSS
UNIANOVA dv BY store
/RANDOM = store.
Tests of Between-Subjects Effects
Dependent Variable: DV
Source                    Type III Sum of Squares    df    Mean Square           F    Sig.
Intercept   Hypothesis                   1449.616     1       1449.616    1665.507    .000
            Error                           3.482     4          .870a
STORE       Hypothesis                      3.482     4           .870      10.717    .000
            Error                           2.843    35     8.121E-02b
a. MS(STORE)
b. MS(Error)
o To test the effect of store: F(4, 35) = 10.72, p < .01
o We reject the null hypothesis of no store effect and conclude that the
effectiveness of the sales campaign varies by store
• If store had been a fixed effect, we would conduct post-hoc tests to
determine how the stores differed.
• But when store is a random effect, we are not interested in differences
between specific stores used in the study. We only want to know if
the store variable adds any variance to the DV (or accounts for any
variance in the DV). In general, we are not interested in post-hoc tests
on the levels of a random variable.
o For any random effects model, SPSS also provides us with the E(MS) so
that we can see how the F-test was constructed:
Expected Mean Squares(a)
                    Variance Component
Source       Var(STORE)    Var(Error)    Quadratic Term
Intercept       8.000         1.000      Intercept
STORE           8.000         1.000
Error            .000         1.000
a. For each source, the expected mean square equals the sum of the
coefficients in the cells times the variance components, plus a quadratic
term involving effects in the Quadratic Term cell.
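E(MSSTORE) = 8*VAR(STORE) + VAR(ERROR)
VAR(STORE) = \( \sigma_\alpha^2 \) and VAR(ERROR) = \( \sigma_\varepsilon^2 \)
E(MSSTORE) = \( 8\sigma_\alpha^2 + \sigma_\varepsilon^2 \)
• We can use this information to estimate the variance components
⇒ To estimate the error variance
\[ \hat{\sigma}_\varepsilon^2 = MSW = .08 \]
⇒ To estimate the variance of the store effect
\[ E(MSSTORE) = 8\sigma_\alpha^2 + \sigma_\varepsilon^2 \]
So that with a little algebra, we obtain:
\[ \hat{\sigma}_\alpha^2 = \frac{MSSTORE - MSW}{8} = \frac{.87 - .08}{8} = .10 \]
⇒ To estimate total variance
\[ \hat{\sigma}_Y^2 = \hat{\sigma}_\varepsilon^2 + \hat{\sigma}_\alpha^2 = .08 + .10 = .18 \]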
8. Model II: Two-factor random effects model
• An Example: Suppose a projective test involves 10 cards administered to a
patient, and the number of responses to each card is recorded. The
developer of the test suspects that the order of the cards might influence the
number of responses. Furthermore, the developer has created a standardized
set of instructions in hopes that the effect of the administrator will be
negligible.
To test these assumptions about the test, the developer randomly
selects four possible orders of the ten cards. Four administrators are
recruited to give each order of the test to two patients
                        Administrator
Order        1           2           3           4
1          26  15      30  33      25  23      28  30
2          26  24      25  33      27  17      27  26
3          33  27      26  32      30  24      31  26
4          36  28      37  42      37  33      39  25
[Figure: boxplots of DV by ADMIN (administrators 1 to 4), grouped by ORDER (1 to 4), N = 2 per cell]
• With 2 observations/cell, this example is obviously for pedagogical purposes
only. Due to the limited number of observations per cell, we will assume
that the assumptions are satisfied.
• The structural model for this design:
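$Y_{ijk} = \mu + \alpha_j^{\sigma} + \beta_k^{\sigma} + (\alpha\beta)_{jk}^{\sigma} + \varepsilon_{ijk}$
$\varepsilon_{ijk} \sim N(0, \sigma_\varepsilon)$
$\alpha_j^{\sigma} \sim N(0, \sigma_\alpha)$
$\beta_k^{\sigma} \sim N(0, \sigma_\beta)$
$(\alpha\beta)_{jk}^{\sigma} \sim N(0, \sigma_{\alpha\beta})$
So that $\sigma_Y^2 = \sigma_\varepsilon^2 + \sigma_\alpha^2 + \sigma_\beta^2 + \sigma_{\alpha\beta}^2$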
• ANOVA table for a random-effects model
o The test of each factor is examining a different variance component
Main effect for Administrator: $H_0: \sigma_\alpha^2 = 0$
Main effect for Order: $H_0: \sigma_\beta^2 = 0$
Administrator by Order interaction: $H_0: \sigma_{\alpha\beta}^2 = 0$
o In the two-factor random effects model, we need to be much more careful
about examining the E(MS) and constructing appropriate tests of each
effect.
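Source           SS     df            MS          E(MS)                                                                F
Factor A         SSA    a-1           SSA/DFA     $\sigma_\varepsilon^2 + n\sigma_{\alpha\beta}^2 + nb\sigma_\alpha^2$   MSA/MSAB
Factor B         SSB    b-1           SSB/DFB     $\sigma_\varepsilon^2 + n\sigma_{\alpha\beta}^2 + na\sigma_\beta^2$    MSB/MSAB
A * B            SSAB   (a-1)*(b-1)   SSAB/DFAB   $\sigma_\varepsilon^2 + n\sigma_{\alpha\beta}^2$                       MSAB/MSW
Within (Error)   SSW    N-ab          SSW/DFW     $\sigma_\varepsilon^2$
Total            SST    N-1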
o For multi-factor random effects ANOVA, you must always examine the
expected MS to make sure you are using the correct error term!
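As a rough check on the table above, the following Python sketch computes the sums of squares for the administrator-by-order example and forms each F ratio with the error term dictated by the E(MS). The function and variable names are illustrative only, and the code assumes a balanced design with every factor random.

```python
# cells[i][j] holds the two observations for order i+1 and administrator j+1
cells = [
    [[26, 15], [30, 33], [25, 23], [28, 30]],
    [[26, 24], [25, 33], [27, 17], [27, 26]],
    [[33, 27], [26, 32], [30, 24], [31, 26]],
    [[36, 28], [37, 42], [37, 33], [39, 25]],
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def two_way_random_anova(cells):
    """Balanced two-factor ANOVA with both factors random (rows = order, cols = admin)."""
    r, c, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(y for row in cells for cell in row for y in cell)
    row_means = [mean(y for cell in row for y in cell) for row in cells]
    col_means = [mean(y for row in cells for y in row[j]) for j in range(c)]
    cell_means = [[mean(cell) for cell in row] for row in cells]
    ss = {
        "row": c * n * sum((m - grand) ** 2 for m in row_means),
        "col": r * n * sum((m - grand) ** 2 for m in col_means),
        "within": sum((y - cell_means[i][j]) ** 2
                      for i in range(r) for j in range(c) for y in cells[i][j]),
    }
    ss_cells = n * sum((m - grand) ** 2 for row in cell_means for m in row)
    ss["inter"] = ss_cells - ss["row"] - ss["col"]
    df = {"row": r - 1, "col": c - 1,
          "inter": (r - 1) * (c - 1), "within": r * c * (n - 1)}
    ms = {name: ss[name] / df[name] for name in ss}
    f = {
        "row": ms["row"] / ms["inter"],       # main effects use MSAB as error term
        "col": ms["col"] / ms["inter"],
        "inter": ms["inter"] / ms["within"],  # interaction uses MSW
    }
    return ss, df, ms, f

ss, df, ms, f = two_way_random_anova(cells)
for name in ("row", "col", "inter"):
    print(name, round(ss[name], 3), df[name], round(ms[name], 3), round(f[name], 3))
```

Only the last step (the choice of denominators) differs from a fixed effects analysis; the sums of squares themselves are computed in the usual way.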
• To construct a test for Factor A or Factor B, we must use the MS from
the interaction as the error term
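For example, let’s consider Factor A
If H0 is true: $\sigma_\alpha^2 = 0$
Then $F = \frac{MSA}{MSAB} = \frac{\sigma_\varepsilon^2 + n\sigma_{\alpha\beta}^2 + nb\sigma_\alpha^2}{\sigma_\varepsilon^2 + n\sigma_{\alpha\beta}^2} = \frac{\sigma_\varepsilon^2 + n\sigma_{\alpha\beta}^2}{\sigma_\varepsilon^2 + n\sigma_{\alpha\beta}^2} = 1$
If H1 is true: $\sigma_\alpha^2 > 0$
Then $F = \frac{MSA}{MSAB} = \frac{\sigma_\varepsilon^2 + n\sigma_{\alpha\beta}^2 + nb\sigma_\alpha^2}{\sigma_\varepsilon^2 + n\sigma_{\alpha\beta}^2} > 1$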
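Suppose we tried to construct an F-test using the MSW
If H0 is true: $\sigma_\alpha^2 = 0$
Then $F = \frac{MSA}{MSW} = \frac{\sigma_\varepsilon^2 + n\sigma_{\alpha\beta}^2 + nb\sigma_\alpha^2}{\sigma_\varepsilon^2} = \frac{\sigma_\varepsilon^2 + n\sigma_{\alpha\beta}^2}{\sigma_\varepsilon^2} > 1$ (whenever $\sigma_{\alpha\beta}^2 > 0$)
F would be greater than 1, even when the null hypothesis was
true! This test is not a test for the effect of factor A!!!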
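• To construct a test for the AB interaction, we must use the MSW as
the error term
If H0 is true: $\sigma_{\alpha\beta}^2 = 0$
Then $F = \frac{MSAB}{MSW} = \frac{\sigma_\varepsilon^2 + n\sigma_{\alpha\beta}^2}{\sigma_\varepsilon^2} = \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2} = 1$
If H1 is true: $\sigma_{\alpha\beta}^2 > 0$
Then $F = \frac{MSAB}{MSW} = \frac{\sigma_\varepsilon^2 + n\sigma_{\alpha\beta}^2}{\sigma_\varepsilon^2} > 1$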
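The argument about error terms can be checked numerically by plugging variance components into the E(MS) formulas. The sketch below uses made-up (hypothetical) variance components with no Factor A effect; only the E(MS) expressions themselves come from the ANOVA table for the two-factor random effects model.

```python
def expected_ms(var_error, var_a, var_b, var_ab, n, a, b):
    """Expected mean squares for a balanced two-factor random effects model."""
    return {
        "A": var_error + n * var_ab + n * b * var_a,
        "B": var_error + n * var_ab + n * a * var_b,
        "AB": var_error + n * var_ab,
        "W": var_error,
    }

# Hypothetical components: H0 is true for Factor A, but an interaction exists
ems = expected_ms(var_error=4.0, var_a=0.0, var_b=3.0, var_ab=2.0, n=2, a=4, b=4)

ratio_correct = ems["A"] / ems["AB"]  # error term suggested by the E(MS)
ratio_wrong = ems["A"] / ems["W"]     # MSW used as the error term
print(ratio_correct, ratio_wrong)
```

With these inputs the first ratio equals 1 and the second equals 2, which is the sense in which an F built on MSW is inflated under the null hypothesis.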
• Using SPSS to analyze a two-factor random effects design
UNIANOVA dv BY admin order
/RANDOM = admin order.
Tests of Between-Subjects Effects
Dependent Variable: DV
Source                        Type III Sum of Squares   df      Mean Square   F         Sig.
Intercept       Hypothesis    26507.531                 1       26507.531     155.441   .000
                Error         716.173                   4.200   170.531(a)
ADMIN           Hypothesis    151.094                   3       50.365        3.446     .065
                Error         131.531                   9       14.615(b)
ORDER           Hypothesis    404.344                   3       134.781       9.222     .004
                Error         131.531                   9       14.615(b)
ADMIN * ORDER   Hypothesis    131.531                   9       14.615        .631      .755
                Error         370.500                   16      23.156(c)
a. MS(ADMIN) + MS(ORDER) - MS(ADMIN * ORDER)
b. MS(ADMIN * ORDER)
c. MS(Error)
o SPSS highlights the fact that it is using different error terms to test each
factor
o We conclude:
• There is a significant effect of order of the test on number of
responses, F(3,9) = 9.22, p < .01
• Also there is a marginally significant effect of administrator on the
number of responses, F(3,9) = 3.45, p = .07
• There is no order by administrator interaction effect on the
number of responses, F(9,16) = 0.63, p = .76.
o SPSS also gives us information on the E(MS) so that we can calculate the
variance components
Expected Mean Squares (a, b)
                                   Variance Component
Source          Var(ADMIN)   Var(ORDER)   Var(ADMIN * ORDER)   Var(Error)   Quadratic Term
Intercept       8.000        8.000        2.000                1.000        Intercept
ADMIN           8.000         .000        2.000                1.000
ORDER            .000        8.000        2.000                1.000
ADMIN * ORDER    .000         .000        2.000                1.000
Error            .000         .000         .000                1.000
a. For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
b. Expected Mean Squares are based on the Type III Sums of Squares.
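⇒ To estimate the error variance
$\hat\sigma_\varepsilon^2 = MSW = 23.16$
⇒ To estimate the variance of the interaction effect
$E(MS_{Admin*Order}) = 2\sigma_{\alpha\beta}^2 + \sigma_\varepsilon^2$
So that with a little algebra, we obtain:
$\hat\sigma_{\alpha\beta}^2 = \frac{MS_{Admin*Order} - MSW}{2} = \frac{14.615 - 23.156}{2} = -4.27$, which we set to 0
⇒ To estimate the variance of the administrator effect
$E(MS_{Admin}) = 8\sigma_\alpha^2 + 2\sigma_{\alpha\beta}^2 + \sigma_\varepsilon^2 = 8\sigma_\alpha^2 + E(MS_{Admin*Order})$
So that with a little algebra, we obtain:
$\hat\sigma_\alpha^2 = \frac{MS_{Admin} - MS_{Admin*Order}}{8} = \frac{50.365 - 14.615}{8} = 4.47$
⇒ To estimate the variance of the order effect
$E(MS_{Order}) = 8\sigma_\beta^2 + 2\sigma_{\alpha\beta}^2 + \sigma_\varepsilon^2 = 8\sigma_\beta^2 + E(MS_{Admin*Order})$
So that with a little algebra, we obtain:
$\hat\sigma_\beta^2 = \frac{MS_{Order} - MS_{Admin*Order}}{8} = \frac{134.781 - 14.615}{8} = 15.02$
⇒ To estimate total variance
$\hat\sigma_Y^2 = \hat\sigma_\varepsilon^2 + \hat\sigma_\alpha^2 + \hat\sigma_\beta^2 + \hat\sigma_{\alpha\beta}^2 = 23.16 + 4.47 + 15.02 + 0 = 42.65$
• Note that any component that is estimated to be less than zero is
assumed to have a value of zero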
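The variance component algebra for the two-factor random effects model can be sketched in Python as follows. The function name and the returned dictionary layout are illustrative choices; the formulas are the method-of-moments solutions of the E(MS) equations for a balanced design, with negative estimates set to zero.

```python
def two_factor_random_components(ms_a, ms_b, ms_ab, ms_w, n, a, b):
    """Solve the E(MS) equations of a balanced two-factor random effects model."""
    raw = {
        "error": ms_w,
        "AB": (ms_ab - ms_w) / n,
        "A": (ms_a - ms_ab) / (n * b),
        "B": (ms_b - ms_ab) / (n * a),
    }
    # Components estimated below zero are assumed to be zero
    truncated = {name: max(value, 0.0) for name, value in raw.items()}
    truncated["total"] = sum(truncated.values())
    return raw, truncated

# A = administrator, B = order; n = 2 observations per cell, a = b = 4 levels
raw, est = two_factor_random_components(
    ms_a=50.365, ms_b=134.781, ms_ab=14.615, ms_w=23.156, n=2, a=4, b=4
)
print({name: round(value, 2) for name, value in raw.items()})
print({name: round(value, 2) for name, value in est.items()})
```

Keeping the raw and truncated estimates side by side makes it easy to see when a negative value has been replaced by zero.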
o SPSS can also compute variance components directly
VARCOMP dv BY order admin
/RANDOM = order admin.
Variance Estimates
Component            Estimate
Var(ORDER)           15.021
Var(ADMIN)           4.469
Var(ORDER * ADMIN)   -4.271(a)
Var(Error)           23.156
Dependent Variable: DV
Method: Minimum Norm Quadratic Unbiased Estimation (Weight = 1 for Random Effects and Residual)
a. For the ANOVA and MINQUE methods, negative variance component estimates may occur. Some possible reasons for their occurrence are: (a) the specified model is not the correct model, or (b) the true value of the variance equals zero.
9. Model III: Two-factor mixed models
• Multi-factor experiments involving only random effects are relatively rare in
behavioral research. It is much more common to encounter mixed models
(containing both fixed and random effects) than to encounter a multi-factor
random effects model
• A return to the study on the effect of mental activity on blood flow (BF) –
See p. 9-24. This design is a 2-factor between-subjects mixed model
ANOVA
Factor 1: Task (Math, Reading Comprehension, or History)
Factor 2: Classroom (6 classrooms)
                                 Task (fixed)
Classroom (random)   Math           Reading Comp    History
1                     7.8   8.7     11.1  12.0      11.7  10.0
2                     8.0   9.2     11.3  10.6       9.8  11.9
3                     4.0   6.9      9.8  10.1      11.7  12.6
4                    10.3   9.4     11.4  10.5       7.9   8.1
5                     9.3  10.6     13.0  11.7       8.3   7.9
6                     9.5   9.8     12.2  12.3       8.6  10.5
• As with the previous example, due to the limited number of observations per
cell, we will assume that the assumptions are satisfied.
[Figure: plot of DV (vertical axis, 2 to 14) by CLASS (1 to 6), clustered by TASK (Math, Reading, History); N = 2 per cell]
• When considering mixed models, interactions between fixed effects and
random effects are considered to be random effects.
• The structural model for a mixed design (A fixed; B random):
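$Y_{ijk} = \mu + \alpha_j + \beta_k^{\sigma} + (\alpha\beta)_{jk}^{\sigma} + \varepsilon_{ijk}$
$\varepsilon_{ijk} \sim N(0, \sigma_\varepsilon)$
$\beta_k^{\sigma} \sim N(0, \sigma_\beta)$
$(\alpha\beta)_{jk}^{\sigma} \sim N(0, \sigma_{\alpha\beta})$
So that $\sigma_Y^2 = \sigma_\varepsilon^2 + \sigma_\beta^2 + \sigma_{\alpha\beta}^2$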
• ANOVA table for a mixed-effects model
o The test of each effect:
Main effect for task: $H_0: \alpha_1 = \alpha_2 = \alpha_3 = 0$
Main effect for class: $H_0: \sigma_\beta^2 = 0$
Task by class interaction: $H_0: \sigma_{\alpha\beta}^2 = 0$
o Again, we need to consider the E(MS)s so that we construct valid F-tests.
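Source              SS     df            MS          E(MS)                                                                                   F
Factor A (Fixed)    SSA    a-1           SSA/DFA     $\sigma_\varepsilon^2 + n\sigma_{\alpha\beta}^2 + \frac{nb\sum\alpha_j^2}{a-1}$         MSA/MSAB
Factor B (Random)   SSB    b-1           SSB/DFB     $\sigma_\varepsilon^2 + na\sigma_\beta^2$                                               MSB/MSW
A * B               SSAB   (a-1)*(b-1)   SSAB/DFAB   $\sigma_\varepsilon^2 + n\sigma_{\alpha\beta}^2$                                        MSAB/MSW
Within (Error)      SSW    N-ab          SSW/DFW     $\sigma_\varepsilon^2$
Total               SST    N-1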
• To construct a test for Factor A (the fixed effect):
⇒ We must use the MS from the interaction as the error term
• To construct a test for Factor B (a random effect):
⇒ We must use the MSW as the error term
• To construct a test for the Factor AB interaction (a random effect):
⇒ We must use the MSW as the error term
• Why does having a random effect change the error term of the fixed effect,
but not of the random effect?
o Consider a design with therapy (3 fixed levels) and clinical trainee (3
random levels)
o We assume that the three trainees used in the study were drawn from a
population of trainees. Imagine that we can put on our magic glasses and
see population means for the therapy modes for the entire population of
trainees (and for simplicity, we will assume that the population is small,
consisting of 18 trainees)
                             Clinical Trainee
Therapy   a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r   Mean
A         7  6  5  7  6  5  4  4  4  1  2  3  4  4  4  1  2  3   4
B         4  4  4  1  2  3  7  6  5  7  6  5  1  2  3  4  4  4   4
C         1  2  3  4  4  4  1  2  3  4  4  4  7  6  5  7  6  5   4
Mean      4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4   4
o In our study, we randomly sample 3 of the trainees. So let’s consider a
random sample of three trainees
            Clinical Trainee
Therapy   g    k    r    Mean
A         4    2    3    3.0
B         7    6    4    5.67
C         1    4    5    3.33
Mean      4    4    4    4
o The random trainee factor does not affect our estimation of the effect of
trainee
o The random trainee factor does affect our estimation of the therapy (the
fixed factor)
• Trainee and Therapy interact, which can cause variability among
means for the fixed factor to increase
• MS(A) must be measuring something other than just error and the
effect of Therapy. When we look at the EMS for factor A, we see that
it captures variability due to the A*B interaction
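The trainee illustration can be replayed in Python. The sketch below stores the population table (columns a through r) and compares therapy means in the full population with therapy means in the sample of trainees g, k, and r; the helper names are invented for this example.

```python
# Population scores from the table: one list per therapy, one entry per trainee a..r
population = {
    "A": [7, 6, 5, 7, 6, 5, 4, 4, 4, 1, 2, 3, 4, 4, 4, 1, 2, 3],
    "B": [4, 4, 4, 1, 2, 3, 7, 6, 5, 7, 6, 5, 1, 2, 3, 4, 4, 4],
    "C": [1, 2, 3, 4, 4, 4, 1, 2, 3, 4, 4, 4, 7, 6, 5, 7, 6, 5],
}
trainees = "abcdefghijklmnopqr"

def therapy_means(sample):
    """Mean score of each therapy across the sampled trainees."""
    idx = [trainees.index(t) for t in sample]
    return {th: sum(scores[i] for i in idx) / len(idx)
            for th, scores in population.items()}

def trainee_means(sample):
    """Mean score of each sampled trainee across the three therapies."""
    return {t: sum(scores[trainees.index(t)] for scores in population.values())
               / len(population)
            for t in sample}

print(therapy_means(trainees))  # whole population: no therapy effect
print(therapy_means("gkr"))     # sample: therapy means now differ
print(trainee_means("gkr"))     # sample: trainee means are unchanged
```

The therapy means differ in the sample even though they are identical in the population, and the only source of that difference is the therapy-by-trainee interaction carried by the sampled trainees.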
• Using SPSS to analyze a two-factor mixed effects design
UNIANOVA dv BY task class
/RANDOM = class.
Tests of Between-Subjects Effects
Dependent Variable: DV
Source                       Type III Sum of Squares   df   Mean Square   F          Sig.
Intercept      Hypothesis    3570.062                  1    3570.062      2626.655   .000
               Error         6.796                     5    1.359(a)
TASK           Hypothesis    44.042                    2    22.021        3.784      .060
               Error         58.195                    10   5.820(b)
CLASS          Hypothesis    6.796                     5    1.359         .234       .939
               Error         58.195                    10   5.820(b)
TASK * CLASS   Hypothesis    58.195                    10   5.820         7.207      .000
               Error         14.535                    18   .808(c)
a. MS(CLASS)
b. MS(TASK * CLASS)
c. MS(Error)
o But wait!! SPSS is using the wrong error term for test of the main effect
of classroom!!!
Classroom is a random effect. To test the random effect, we need to
use MSW as the error term. SPSS is using MSAB.
o We will have to do the correct test by hand
Main Effect for Class: $F(5,18) = \frac{MS_{CLASS}}{MSW} = \frac{1.36}{0.81} = 1.68$, p = .19
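The hand calculation amounts to choosing the denominator from the E(MS) rather than accepting the default. A minimal Python sketch, using the mean squares from the SPSS output for this example (the function name and key labels are illustrative):

```python
def mixed_model_f(ms_a, ms_b, ms_ab, ms_w):
    """F ratios for a balanced two-factor mixed model (A fixed, B random)."""
    return {
        "A": ms_a / ms_ab,   # fixed effect: interaction MS is the error term
        "B": ms_b / ms_w,    # random effect: MSW is the error term
        "AB": ms_ab / ms_w,  # interaction: MSW is the error term
    }

# A = task, B = class
f = mixed_model_f(ms_a=22.021, ms_b=1.359, ms_ab=5.820, ms_w=0.808)
print({name: round(value, 2) for name, value in f.items()})
```

The p-values are not computed here; they would require an F distribution function from a statistics library.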
o We can also use the TEST subcommand and ask SPSS to compute the F-
test. We need to enter the effect (class), the SS of the denominator
(14.54) and the df of the denominator (18)
UNIANOVA dv BY task class
/RANDOM = class
/TEST = class vs 14.54 df(18).
Test Results
Dependent Variable: DV
Source     Sum of Squares   df      Mean Square   F       Sig.
Contrast   6.796            5       1.359         1.683   .190
Error      14.540(a)        18(a)   .808
a. User specified.
o BEWARE! SPSS may contain other “errors.” If you are going to be
analyzing balanced random or mixed designs, it is worth your time and
effort to look up or calculate the E(MS)s for your design (For an
algorithm see Neter, Appendix D)
o Note: SPSS does not consider this to be an error. They state that
statisticians differ in how they approach this problem.
http://spss.com/tech/answer/details.cfm?tech_tan_id=100000073
As indicated in this tech note, SAS makes the same “error.” Thus,
even if you run the analysis in SAS, you will still have to rerun the
analysis
I cannot find any recent texts that agree with the SPSS approach.
Neter et al (1996, p 981), Kirk (1995, p 374) and Maxwell & Delaney
(1990, p 429/431) all give the E(MS) I list on the previous page. For
balanced designs, SPSS does the wrong analysis. For unbalanced
designs, SPSS’s approach may be appropriate.
o The following is a hand-corrected variance components table (based on
the correct E(MS) values listed on page 9-37)
Expected Mean Squares (a)
                           Variance Component
Source         Var(CLASS)   Var(TASK * CLASS)   Var(Error)   Quadratic Term
Intercept      6.000        2.000               1.000        Intercept, TASK
TASK            .000        2.000               1.000        TASK
CLASS          6.000         .000               1.000
TASK * CLASS    .000        2.000               1.000
Error           .000         .000               1.000
a. Andy's Hand-Corrected Table
⇒ To estimate the error variance
$\hat\sigma_\varepsilon^2 = MSW = 0.81$
⇒ To estimate the variance of the interaction effect
$E(MS_{Task*Class}) = 2\sigma_{\alpha\beta}^2 + \sigma_\varepsilon^2$
So that with a little algebra, we obtain:
$\hat\sigma_{\alpha\beta}^2 = \frac{MS_{Task*Class} - MSW}{2} = \frac{5.82 - 0.81}{2} = 2.51$
⇒ Task is a fixed effect – there is no variance component to estimate
⇒ To estimate the variance of the class effect
$E(MS_{Class}) = 6\sigma_\beta^2 + \sigma_\varepsilon^2$
So that with a little algebra, we obtain:
$\hat\sigma_\beta^2 = \frac{MS_{Class} - MSW}{6} = \frac{1.36 - 0.81}{6} = 0.09$
⇒ To estimate total variance
$\hat\sigma_Y^2 = \hat\sigma_\varepsilon^2 + \hat\sigma_\beta^2 + \hat\sigma_{\alpha\beta}^2 = 0.81 + 0.09 + 2.51 = 3.41$
o SPSS’s VARCOMP command also errs on the variance estimate for the
class effect (SPSS output not shown here)
10.Contrasts and post-hoc tests
• To perform contrasts or post-hoc tests, you can use the same formulas
previously discussed for ANOVA – with one exception. You must use the
correct error term in place of MSW, and the degrees of freedom associated
with that error term
o If you perform contrasts/post-hoc test on the marginal means for factor
A, you need to use the error term used to test factor A
o If you perform contrasts/post-hoc test on the marginal means for factor B,
you need to use the error term used to test factor B
o If you perform contrasts/post-hoc test on the individual cell means, you
need to use the error term used to test AB interaction
11. Effect sizes for random effects designs
• The random effects equivalent of eta squared is rho, ρ
• Rho is interpreted just as eta squared – as the proportion of the variance in
the DV accounted for by the factor in the sample
$\rho_A = \frac{\sigma_A^2}{\sigma_Y^2}$
• Omega squared must still be used for fixed effects in a mixed model. In
general, for a fixed factor A:
$\hat\omega_A^2 = \frac{SSA - (df_A)MS_{[error term]}}{SSA - (df_A)MS_{[error term]} + (N)MSW}$
o For example, in a two-factor mixed model, with A fixed and B random,
we used MSAB as the error term to test Factor A. Thus, our equation for
omega squared would be:
$\hat\omega_A^2 = \frac{SSA - (df_A)MSAB}{SSA - (df_A)MSAB + (N)MSW}$
$\hat\omega_{Task}^2 = \frac{44.04 - (2)5.82}{44.04 - (2)5.82 + (36)(.808)} = .53$
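Both effect size formulas are short enough to express directly in Python. The sketch below is illustrative only: the function names are invented, and the inputs are the values reported for the task-by-classroom example (the class variance component and total variance are the rounded hand estimates).

```python
def omega_squared_fixed(ss_a, df_a, ms_error_term, n_total, ms_within):
    """Omega squared for a fixed factor tested against ms_error_term."""
    effect = ss_a - df_a * ms_error_term
    return effect / (effect + n_total * ms_within)

def rho(var_factor, var_total):
    """Proportion of total variance attributed to a random factor."""
    return var_factor / var_total

# Task (fixed) is tested against MS(Task * Class); N = 36 observations
omega_task = omega_squared_fixed(ss_a=44.04, df_a=2, ms_error_term=5.82,
                                 n_total=36, ms_within=0.808)
# Class (random): estimated variance component over estimated total variance
rho_class = rho(var_factor=0.09, var_total=3.41)
print(round(omega_task, 2), round(rho_class, 3))
```

Passing the error term as an argument keeps the same function usable for any design, as long as the caller supplies the mean square that was actually used to test the fixed factor.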
12.Final considerations about random effects
• The distinction between fixed and random effects is not always as clear as
presented here. For example, Clark (1973) argued – convincingly – that
when a list of words is used in a study, the words should be treated as a
random effect. The key is what type of inference you want to make
• We consider the random effects as being sampled from an infinite
population. If the population is finite but large, we are OK. However, when
the population to be sampled from is small, adjustments are necessary
• We estimate the distribution of the random effects based on the means (and
the variability of those means) of the random factor. If you only have 2-3
levels of your random factor, you will not get a good estimate of the
distribution. It is desirable to have a relatively large number of levels of any
random factor. In addition, it is important that the levels of the random
factor be randomly sampled from the population of interest
• In designs with three or more factors that include two or more random
effects, it is common to encounter situations where no exact F-test can be
constructed. In this case, quasi-F ratios (linear combinations of MSs) are
used to approximate an F-ratio.
• All of our calculations assume that cell sizes are equal. Things get very
wacky with unequal cell sizes, and it is no longer possible to construct exact
F-tests (the ratios of expected MSs no longer satisfy the requirements for a
valid F-test). Approximate tests are available and are calculated in SPSS.
• It is a good idea to calculate or look up E(MS)s for balanced designs and/or
to replicate the analysis using another statistical package.
ANOVA designs with nested effects
13.An introduction to nested designs
• Nested designs are also known as hierarchical designs
• The factorial designs studied thus far are considered to be crossed designs.
That is, every level of a factor appears in (or is crossed with) every level of
all other factors. If you display the design in a grid, there are no empty cells
in a crossed design.
• Example #1: The effect of therapist’s sex on treatment outcome. You observe
three male and three female therapists. Each therapist sees four patients, and
you record a general measure of psychological health.
Sex of therapist    Male         Female
Therapist           1  2  3      4  5  6
o Sex is the main variable of interest and is a fixed effect
o Therapist is nested within sex (it cannot be crossed because a therapist
cannot be both male and female). Therapist will also be considered a
random effect
o Each therapist sees four patients. Thus, patients are nested within
therapist (and are a random effect)
• Example #2: The effect of race of defendant on jury decision making
Race of Defendant Black White
Jury 1 2 3 4 5 6 7 8 9 10 11 12
o Race is the main variable of interest and is a fixed effect
o Jury is nested within race. Jury will most likely be considered a random
effect
o Each jury is composed of 12 participants. The participants are nested
within jury (and are also a random effect)
• Example #3: A new intervention is developed to reduce drug use in
inner-city middle-school students. Six inner-city schools are selected at
random; three receive the new intervention and three receive the old
intervention. Within each of those schools, four classrooms are selected at
random to participate.
Old intervention
School School A School B School C
Classroom 1 2 3 4 5 6 7 8 9 10 11 12
New intervention
School School D School E School F
Classroom 1 2 3 4 5 6 7 8 9 10 11 12
o Type of intervention is a fixed effect
o School is a random effect nested within treatment
o Classroom is a random effect nested within school
o The participants are a random effect nested within classroom
• General comments about nested designs
o In behavioral research, nested factors are usually random effects
o In factorial between subjects designs, participants are nested within cell
• Because I am presenting only an introduction to nested designs, I will
consider only designs with random effects nested within a fixed effect (like
these examples). I can provide references for the analysis of more advanced
designs.
14. Structural models for nested designs
• Example #1: Therapist’s sex and treatment outcome
o Factor A: Therapist’s sex (Male vs. Female) Fixed effect
o Factor B: Therapist Random effect
$Y_{ijk} = \mu + \alpha_j + (\beta/\alpha)_{k(j)}^{\sigma} + \varepsilon_{i(jk)}$
$\alpha_j$ The fixed effect of therapist’s sex
$(\beta/\alpha)_{k(j)}^{\sigma}$ The random effect of therapist within sex
$\varepsilon_{i(jk)}$ The errors/residuals
AKA the random effect of participant within therapist
Sometimes notated $(\pi/\beta)_{i(jk)}^{\sigma}$ to emphasize the nesting
• Example #3: Drug use intervention
o Factor A: Intervention Fixed effect
o Factor B: School within intervention Random effect
o Factor C: Classroom within school Random effect
$Y_{ijkl} = \mu + \alpha_j + (\beta/\alpha)_{k(j)}^{\sigma} + (\gamma/\beta)_{l(jk)}^{\sigma} + \varepsilon_{i(jkl)}$
$\alpha_j$ The fixed effect of intervention
$(\beta/\alpha)_{k(j)}^{\sigma}$ The random effect of school within intervention
$(\gamma/\beta)_{l(jk)}^{\sigma}$ The random effect of class within school
$\varepsilon_{i(jkl)}$ The errors/residuals
AKA the random effect of participant within class
Sometimes notated $(\pi/\gamma)_{i(jkl)}^{\sigma}$ to emphasize the nesting
• Note that because these designs are nested, not crossed, there is no way to
estimate an interaction effect.
15.Testing nested effects
• With nested effects, we again need to make sure we use the correct error
term when constructing F-tests.
Design                                        Effect   Error Term
Two-factor B/A (B random, A fixed)            A        MS(B/A)
                                              B/A      MSW
Three-factor C/B/A (C, B random; A fixed)     A        MS(B/A)
                                              B/A      MS(C/B)
                                              C/B      MSW
o Just as for the random effect designs – the SS are calculated in the same
manner as before. The only difference is the construction of the F-test
o For more complex designs, you’ll have to look up the error term, or trust
SPSS
• Example #1: Therapist’s sex and treatment outcome
                  Sex of Therapist
             Male                Female
Therapist    1     2     3       4     5     6
             49    42    42      54    44    57
             40    48    46      60    54    62
             31    52    50      64    54    66
             40    58    54      70    64    71
o To test the effect of sex of therapist, we treat each therapist as one
observation (collapsing across participants)
        Sex of Therapist
Male               Female
40    50    48     62    54    64
A one-factor ANOVA on these six observations would have:
1 df in the numerator
4 df in the denominator
This is essentially how the effect of sex of therapist is analyzed in a
nested design
o SPSS syntax:
UNIANOVA dv BY sex thera
/RANDOM = thera
/DESIGN = sex thera within sex .
Tests of Between-Subjects Effects
Dependent Variable: DV
Source                     Type III Sum of Squares   df   Mean Square   F         Sig.
Intercept    Hypothesis    67416.000                 1    67416.000     601.929   .000
             Error         448.000                   4    112.000(a)
SEX          Hypothesis    1176.000                  1    1176.000      10.500    .032
             Error         448.000                   4    112.000(a)
THERA(SEX)   Hypothesis    448.000                   4    112.000       2.459     .083
             Error         820.000                   18   45.556(b)
a. MS(THERA(SEX))
b. MS(Error)
Effect for sex of therapist: F(1,4) = 10.50, p = .03
Effect of therapist: F(4, 18) = 2.46, p = .08
o Let’s do the one-factor ANOVA on the collapsed data to examine the
effect of sex of therapist
        Sex of Therapist
Male               Female
40    50    48     62    54    64
Descriptives (DV)
Sex      N    Mean
1.00     3    46.0000
2.00     3    60.0000
Total    6    53.0000
ANOVA (DV)
Source           Sum of Squares   df   Mean Square   F        Sig.
Between Groups   294.000          1    294.000       10.500   .032
Within Groups    112.000          4    28.000
Total            406.000          5
• This analysis produces the same results – only the SS are different.
This analysis was tricked into thinking each observation was one
participant, but in the actual analysis, we know that each ‘observation’
was based on data from four participants. If you multiply the SS in
this oneway analysis by 4, you will get the same results as the nested
analysis. (This trick only works for balanced designs)
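The equivalence between the nested analysis and the one-factor ANOVA on therapist means can be verified with a short Python sketch. The data are the scores from the therapist example; the variable names are illustrative, and the shortcut of multiplying the sums of squares by the number of patients relies on the design being balanced.

```python
# therapist -> (sex, patient scores), from the example table
scores = {
    1: ("M", [49, 40, 31, 40]),
    2: ("M", [42, 48, 52, 58]),
    3: ("M", [42, 46, 50, 54]),
    4: ("F", [54, 60, 64, 70]),
    5: ("F", [44, 54, 54, 64]),
    6: ("F", [57, 62, 66, 71]),
}
n = 4  # patients per therapist

def mean(values):
    return sum(values) / len(values)

thera_means = {t: mean(ys) for t, (_, ys) in scores.items()}
by_sex = {}
for t, (sex, _) in scores.items():
    by_sex.setdefault(sex, []).append(thera_means[t])
grand = mean(list(thera_means.values()))

# One-factor ANOVA that treats each therapist mean as a single observation
ss_between = sum(len(m) * (mean(m) - grand) ** 2 for m in by_sex.values())
ss_within = sum((x - mean(m)) ** 2 for m in by_sex.values() for x in m)
df_between = len(by_sex) - 1
df_within = len(thera_means) - len(by_sex)
f_sex = (ss_between / df_between) / (ss_within / df_within)

# Nested analysis: the same sums of squares scaled by n, plus the patient-level error
ss_sex = n * ss_between
ss_thera = n * ss_within
ss_error = sum((y - thera_means[t]) ** 2 for t, (_, ys) in scores.items() for y in ys)
df_error = len(scores) * (n - 1)
f_thera = (ss_thera / df_within) / (ss_error / df_error)

print(ss_between, ss_within, f_sex)
print(ss_sex, ss_thera, ss_error, round(f_thera, 3))
```

Because numerator and denominator are both multiplied by n, the F ratio for sex is unchanged by the scaling.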
o To calculate the effect sizes:
• Sex is a fixed effect, so we need to calculate omega squared
$\hat\omega_A^2 = \frac{SSA - (df_A)MS_{[error term]}}{SSA - (df_A)MS_{[error term]} + (N)MSW}$
$\hat\omega_{Sex}^2 = \frac{1176 - (1)112}{1176 - (1)112 + (24)45.56} = .49$
• Therapist within sex is a random effect, so we need to calculate rho
$\rho_{Thera(sex)} = \frac{\sigma_{Thera(sex)}^2}{\sigma_Y^2}$
Expected Mean Squares
                   Variance Component
Source       Var(THERA(SEX))   Var(Error)   Quadratic Term
Intercept    4.000             1.000        Intercept, SEX
SEX          4.000             1.000        SEX
THERA(SEX)   4.000             1.000
Error         .000             1.000
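$E(MS_{Thera(sex)}) = 4\sigma_{Thera(sex)}^2 + \sigma_\varepsilon^2$
$\hat\sigma_{Thera(sex)}^2 = \frac{MS_{Thera(sex)} - MSW}{4} = \frac{112 - 45.56}{4} = 16.61$
$\sigma_Y^2 = \sigma_{Thera(sex)}^2 + \sigma_\varepsilon^2$
$\hat\sigma_Y^2 = 16.61 + 45.56 = 62.17$
$\hat\rho_{Thera(sex)} = \frac{\hat\sigma_{Thera(sex)}^2}{\hat\sigma_Y^2} = \frac{16.61}{62.17} = .27$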
• Example #3: Drug use intervention
(Let’s assume that there were three students in each class)
Old Intervention
          School 1                      School 2                      School 3
Class     1      2      3      4        1      2      3      4        1      2      3      4
          11.2   16.5   18.3   19       7.3    11.9   11.3   8.9      15.3   19.5   14.1   16.5
          11.6   16.8   18.7   18.5     7.8    12.4   10.9   9.4      15.9   20.1   13.8   17.2
          12.0   16.1   19.0   18.2     7.0    12.0   10.5   9.3      16.0   19.3   14.2   16.9
New Intervention
          School 1                      School 2                      School 3
Class     1      2      3      4        1      2      3      4        1      2      3      4
          13.2   17.25  20.3   20.5     9.3    12.9   10.3   10.9     17.55  20.75  15.1   18.75
          12.35  18.8   18.45  17.5     7.05   14.65  12.15  8.15     14.9   22.1   14.55  17.2
          13.25  15.85  21.0   19.2     8.5    14.25  10.0   11.55    17.75  21.3   13.7   16.9
o To gain an intuitive understanding of how nested effects are tested, it is
beneficial to examine each effect separately
o To test the effect of the intervention, we essentially treat each school as
one observation (collapsing across classrooms and participants)
                  Intervention
Old                          New
16.33    9.89    16.57       17.30    10.81    17.55
A one-factor ANOVA on these six observations has:
1 df in the numerator (a-1) = (2-1) = 1
4 df in the denominator a(b-1) = 2(3-1) = 2*2 = 4
ONEWAY dv by treat
/STAT = DESC.
Descriptives (DV)
Treat    N    Mean       Std. Deviation
1.00     3    14.2613    3.78589
2.00     3    15.2200    3.82122
Total    6    14.7407    3.44232
ANOVA (DV)
Source           Sum of Squares   df   Mean Square   F      Sig.
Between Groups   1.379            1    1.379         .095   .773
Within Groups    57.869           4    14.467
Total            59.248           5
F(1,4) = 0.10, p = .77
o To test the effect of school (within intervention), we treat each class as
one observation (collapsing across participants)
                           School (Treatment)
Class    1(Old)    2(Old)    3(Old)    1(New)    2(New)    3(New)
1        11.60     7.37      15.73     12.93     8.28      16.73
2        16.47     12.10     19.63     17.30     13.93     21.38
3        18.67     10.90     14.03     19.92     10.81     14.45
4        18.57     9.20      16.86     19.07     10.20     17.61
A school within treatment ANOVA on these 24 observations has:
4 df in the numerator a(b-1) = 2(3-1) = 2*2 = 4
18 df in the denominator ab(c-1) = 2*3*(4-1) = 2*3*3 = 18
UNIANOVA dv BY treat school
/DESIGN = treat, school within treat.
Tests of Between-Subjects Effects
Dependent Variable: DV
Source            Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model   237.029                   5    47.406        6.427     .001
Intercept         5213.833                  1    5213.833      706.816   .000
TREAT             5.491(a)                  1    5.491         .744      .400
SCHOOL(TREAT)     231.538                   4    57.885        7.847     .001
Error             132.777                   18   7.377
Total             5583.639                  24
Corrected Total   369.807                   23
a. Ignore this test for the effect of treatment in this setup
F(4,18) = 7.85, p = .001
o Finally, to test the effect of class (within school within intervention), we
examine the individual observations
This analysis has:
18 df in the numerator ab(c-1) = 2*3*(4-1) = 2*3*3 = 18
48 df in the denominator abc(n-1) = 2*3*4*(3-1) = 48
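The degrees of freedom at each level of nesting follow one pattern, which the Python sketch below spells out for a balanced three-level design. The function name and dictionary keys are invented for illustration.

```python
def nested_df(a, b, c, n):
    """Degrees of freedom for a balanced design with C nested in B nested in A.

    a: levels of the fixed factor, b: levels of B within each A,
    c: levels of C within each B, n: participants within each C.
    """
    return {
        "A": a - 1,
        "B(A)": a * (b - 1),
        "C(B(A))": a * b * (c - 1),
        "Error": a * b * c * (n - 1),
    }

# Drug use example: 2 interventions, 3 schools each, 4 classes each, 3 students each
df = nested_df(a=2, b=3, c=4, n=3)
print(df)

# Each effect is tested against the line directly below it
order = ["A", "B(A)", "C(B(A))", "Error"]
for effect, error in zip(order, order[1:]):
    print(f"{effect}: F({df[effect]},{df[error]})")
```

The four entries add up to N - 1 (here 71), which is a quick way to confirm that no level of nesting has been missed.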
o To analyze all the effects in one command:
UNIANOVA dv BY treat school class
/RANDOM = school class
/PRINT = DESC
/DESIGN = treat, school within treat,
class within school within treat.
Tests of Between-Subjects Effects
Dependent Variable: DV
Source                               Type III Sum of Squares   df   Mean Square   F        Sig.
Intercept              Hypothesis    15643.857                 1    15643.857     90.088   .001
                       Error         694.600                   4    173.650(a)
TREAT                  Hypothesis    16.531                    1    16.531        .095     .773
                       Error         694.600                   4    173.650(a)
SCHOOL(TREAT)          Hypothesis    694.600                   4    173.650       7.850    .001
                       Error         398.194                   18   22.122(b)
CLASS(SCHOOL(TREAT))   Hypothesis    398.194                   18   22.122        27.682   .000
                       Error         38.358                    48   .799(c)
a. MS(SCHOOL(TREAT))
b. MS(CLASS(SCHOOL(TREAT)))
c. MS(Error)
Effect of treatment: F(1,4) = 0.10, p = .77
Effect of school(treatment): F(4,18) = 7.85, p = .001
Effect of class(school(treatment)): F(18,48) = 27.68, p < .001
o SPSS also provides the variance components so that effect sizes can be
calculated for the random effects
Expected Mean Squares (a, b)
                                        Variance Component
Source                 Var(SCHOOL(TREAT))   Var(CLASS(SCHOOL(TREAT)))   Var(Error)   Quadratic Term
Intercept              12.000               3.000                       1.000        Intercept, TREAT
TREAT                  12.000               3.000                       1.000        TREAT
SCHOOL(TREAT)          12.000               3.000                       1.000
CLASS(SCHOOL(TREAT))     .000               3.000                       1.000
Error                    .000                .000                       1.000
a. For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
b. Expected Mean Squares are based on the Type III Sums of Squares.
16.Final considerations about nested designs
• In these examples, we did not test the assumptions for the model because of
small cell sizes. However, the ANOVA assumptions must be satisfied for the
results to be valid. The assumptions for a nested model are the same as the
assumptions for a fixed or random effects model (depending on if there are
fixed or random effects in the model).
• Pay attention to the small degrees of freedom in the tests for some of the
nested effects. In both examples, the test of the fixed effect (the effect of
most interest in these designs) is based on six observations! Nested designs
can have very low power unless you have a large number of levels of the
nested effects.
• We have focused on balanced complete nested designs with random effects
nested within a fixed effect. Many other nested designs are possible –
including partially nested designs. Before you run a more complicated
nested design, make sure that you know how to analyze it. Kirk (1995) is a
good reference.
• As in the random effects case, contrasts and post-hoc tests can be conducted
by using the appropriate error term in previously developed equations.
• We have discussed nested designs in an ANOVA framework where all the
independent variables are categorical variables. In a regression framework,
these models are usually called hierarchical linear models (HLM) and are
very popular at the moment. In an HLM analysis, different terminology and
different methods of estimation are used, but the interpretation is the same.
ANOVA designs with randomized blocks
17.The logic of blocking
• When we test the effect of a factor on a dependent variable, there are always
many other factors that lead to variability in the DV. When these variables
are not of interest to us, they are called nuisance variables.
• For example, if we are interested in the relationship between type of therapy
and psychological wellness, there are many other factors that influence
wellness other than the type of therapy.
• What can we do about nuisance variables?
o The typical approach is to use random assignment of participants to
treatment conditions.
• The nuisance variables are distributed equally over the experimental
factors so that they do not affect just one treatment level.
• However, all the variation in the DV caused by the nuisance variable
is accumulated in the MSW. A large MSW (relative to the MS of the
factor of interest) will decrease our power to detect the effect of
interest.
o An alternative approach is to hold the nuisance variables constant.
• For example, to examine the effectiveness of several types of therapy,
we can use only 18-year-old white females who have the same
severity of the disorder. By creating a homogeneous sample, we will
decrease the MSW and increase our power.
• This approach limits the generalizability of the conclusions. In
addition, if you attempt to hold several variables constant, it may be
difficult to find participants for the study.
o You can also include the nuisance variable(s) as factors in the study.
This approach is known as blocking.
• Any variable that is related to the DV may be used as a blocking variable.
There are two categories of common blocking variables:
o Characteristics associated with the participant:
• Gender
• Age
• Income
• IQ
• Education
• Attitudes
• Previous experience with
task
o Characteristics associated with the experimental setting:
• Time of day
• Batch of material
• Location
• Week
• Measuring instrument
• The participant (!)
• When we include a blocking factor in the design, we can capture the
variability it causes in the DV in a SS(Blocks). This process will reduce the
SS Within, compared to a non blocked design
Non-blocked design:
SS Total (SS Corrected Total) = SS A (df = a-1) + SS Error (df = N-a)
Blocked design:
SS Total (SS Corrected Total) = SS A (df = a-1) + SS Blocks (df = bl-1) + SS Residual (df = N - a - bl + 1)
18.Examples of blocked designs
• Example #1: Methods of quantifying risk. Managers were exposed to one of
three methods of quantifying risk. After learning about the method,
participants were asked to rate their degree of confidence in their risk
assessments.
Fifteen participants were grouped into five blocks, according to their age.
Within each block, participants were randomly assigned to one of the three
experimental conditions
o Layout for a randomized block design
                                       Participant
Block                                  1    2    3
1 (Oldest participants)                C    W    U
2                                      C    U    W
3                                      U    W    C
4                                      W    U    C
5 (Youngest participants)              W    C    U
(U = Utility, W = Worry, C = Comparison)
o Data from the quantifying risk example:
                          Method
Block           Utility   Worry   Comparison   Average
1 (oldest)      1         5       8            4.7
2               2         8       14           8.0
3               7         9       16           10.7
4               6         13      18           12.3
5 (youngest)    12        14      17           14.3
Average         5.6       9.8     14.6         10.0
• Note that a randomized block design looks like a factorial design, but
there is only one participant per cell. If there were two or more
participants per cell, we would call this design a two-way ANOVA.
• Because there is one participant per cell, we do not have any
information to test the block by factor interaction.
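To see how much variability the blocks absorb, the sums of squares for the quantifying risk data can be partitioned in Python. This is a sketch with illustrative variable names; it assumes exactly one observation per block-by-method cell, as in the data table.

```python
methods = ["Utility", "Worry", "Comparison"]
scores = [  # one row per block (oldest to youngest), one column per method
    [1, 5, 8],
    [2, 8, 14],
    [7, 9, 16],
    [6, 13, 18],
    [12, 14, 17],
]

def mean(values):
    return sum(values) / len(values)

bl, a = len(scores), len(methods)
N = bl * a
grand = mean([y for row in scores for y in row])
treat_means = [mean([row[j] for row in scores]) for j in range(a)]
block_means = [mean(row) for row in scores]

ss_total = sum((y - grand) ** 2 for row in scores for y in row)
ss_a = bl * sum((m - grand) ** 2 for m in treat_means)
ss_blocks = a * sum((m - grand) ** 2 for m in block_means)

# Without blocking, everything that is not SS A ends up in the error term
ss_error_unblocked, df_error_unblocked = ss_total - ss_a, N - a
# With blocking, SS Blocks is removed from the error term
ss_residual, df_residual = ss_total - ss_a - ss_blocks, N - a - bl + 1

print(round(ss_a, 2), round(ss_blocks, 2))
print(round(ss_error_unblocked, 2), df_error_unblocked)
print(round(ss_residual, 2), df_residual)
```

Comparing the last two printed lines shows the trade: the blocked error term is much smaller, at the cost of bl - 1 degrees of freedom.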
o Assumptions for a randomized block design:
• Because we only have one observation/cell, we cannot check
assumptions on a cell-by-cell basis as we would for a factorial design.
• We require the standard assumptions:
⇒ Independently and randomly sampled observations
⇒ Homogeneity of variances
(Checked on the marginal means for the factor AND for the blocks)
⇒ Normality
(By block and by treatment)
⇒ We assume that there is no treatment by block interaction (that is,
treatment and block effects are additive)
Plot observed values by block and look for parallel lines
• Additional assumptions are required if the blocking factor is a random
effect
o Checking assumptions in the quantifying risk example
EXAMINE VARIABLES=dv BY block treat
/PLOT BOXPLOT SPREADLEVEL NPPLOT.
• By treatment:
Test of Homogeneity of Variance
      Levene Statistic   df1   df2   Sig.
DV    .048               2     12    .953
[Figure: boxplots of DV (vertical axis, -10 to 20) by TREAT (1.00, 2.00, 3.00); N = 5 per treatment]
Tests of Normality (Shapiro-Wilk)
      TREAT   Statistic   df   Sig.
DV    1.00    .940        5    .665
      2.00    .943        5    .687
      3.00    .860        5    .227
• By block:
Test of Homogeneity of Variances
        Levene Statistic   df1   df2   Sig.
  DV         .552           4    10    .702
[Figure: boxplots of DV by BLOCK (1.00 to 5.00); N = 3 per block]
Tests of Normality (Shapiro-Wilk)
        BLOCK   Statistic   df   Sig.
  DV    1.00      .993       3    .843
        2.00     1.000       3   1.000
        3.00      .907       3    .407
        4.00      .991       3    .817
        5.00      .987       3    .780
But with three observations per block, these tests are essentially
worthless!
• No treatment by block interaction
[Figure: "Test for Interaction" plot of DV (0 to 20) by method (Utility, Worry, Comparison), with one line per block (Blocks 1 to 5)]
It may be difficult to judge the difference between random error and a
true block * factor interaction. You are looking for an extreme pattern
in the data.
o All the assumptions appear to be satisfied in this case
o What to do if assumptions are not satisfied?
• Non-normality and/or moderate heterogeneity of variances
⇒ Rank data and perform analysis on ranked data
• Heterogeneity of variances and/or treatment by block interaction
⇒ Transform data
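o Structural model for a randomized block design with one factor and one
block:
$Y_{ij} = \mu + \alpha_j + \tau_i + \varepsilon_{ij}$
$\mu$ = Grand population mean
$\hat{\mu} = \bar{Y}_{..}$
$\alpha_j$ = The treatment effect:
The effect of being in level j of factor A
$\sum \alpha_j = 0$ or $\alpha_j \sim N(0, \sigma_\alpha)$
$\hat{\alpha}_j = \bar{Y}_{.j} - \bar{Y}_{..}$
$\tau_i$ = The block effect:
The effect of being in level i of the blocking variable
$\sum \tau_i = 0$
$\hat{\tau}_i = \bar{Y}_{i.} - \bar{Y}_{..}$
$\varepsilon_{ij}$ = The unexplained error associated with $Y_{ij}$
$\hat{\varepsilon}_{ij} = Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..}$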
• The randomized block design is identical to a two-factor ANOVA
with no interaction term.
• In this case, the blocking variable is considered to be a fixed variable.
Special accommodations are necessary for a random blocking factor.
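The estimates in the structural model can be checked by hand on the quantifying risk data. The following is a minimal Python sketch (Python is not used in these notes, and the variable names are illustrative assumptions) that computes the grand mean, the treatment effects, the block effects, and the residuals for a fixed blocking factor.

```python
# Confidence ratings from the quantifying-risk example.
# Rows are blocks 1-5 (oldest to youngest); columns are Utility, Worry, Comparison.
y = [
    [1, 5, 8],
    [2, 8, 14],
    [7, 9, 16],
    [6, 13, 18],
    [12, 14, 17],
]
n_blocks, n_treat = len(y), len(y[0])

grand_mean = sum(sum(row) for row in y) / (n_blocks * n_treat)   # mu-hat
block_means = [sum(row) / n_treat for row in y]                  # Y-bar_i.
treat_means = [sum(row[j] for row in y) / n_blocks
               for j in range(n_treat)]                          # Y-bar_.j

alpha_hat = [m - grand_mean for m in treat_means]   # treatment effects
tau_hat = [m - grand_mean for m in block_means]     # block effects
# Residual for each cell: observation minus its block mean and its
# treatment mean, plus the grand mean.
resid = [[y[i][j] - block_means[i] - treat_means[j] + grand_mean
          for j in range(n_treat)]
         for i in range(n_blocks)]

print("grand mean:", grand_mean)
print("treatment effects:", [round(a, 2) for a in alpha_hat])
print("block effects:", [round(t, 2) for t in tau_hat])
```

The treatment effects and the block effects each sum to zero, as the side conditions of the fixed-effects model require, and the residuals sum to zero within every block and every treatment.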
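o Sums of squares decomposition and ANOVA table for a randomized
block design:
  Source      SS        df            MS     E(MS), Treatments Fixed                                    E(MS), Treatments Random
  Treatment   SSA       a-1           MSA    $\sigma^2_\varepsilon + \frac{bl \sum \alpha_j^2}{a-1}$    $\sigma^2_\varepsilon + bl\,\sigma^2_\alpha$
  Blocks      SSBL      bl-1          MSBL   $\sigma^2_\varepsilon + \frac{a \sum \tau_i^2}{bl-1}$      $\sigma^2_\varepsilon + \frac{a \sum \tau_i^2}{bl-1}$
  Error       SSError   (a-1)(bl-1)   MSE    $\sigma^2_\varepsilon$                                     $\sigma^2_\varepsilon$
  Total       SST       N-1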
• To construct a significance test
⇒ For fixed treatment effects: $H_0: \alpha_1 = \alpha_2 = \dots = \alpha_a = 0$
   For random treatment effects: $H_0: \sigma^2_\alpha = 0$
⇒ But for either fixed or random effects, we construct the F-test in
the same manner
$F[a-1, (a-1)(bl-1)] = \frac{MSA}{MSE}$
⇒ To test for the block effect
$F[bl-1, (a-1)(bl-1)] = \frac{MSBL}{MSE}$
However, we are usually not so interested in the test of the
blocking variable. We included this variable to reduce the error
variability.
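To make the decomposition concrete, here is a hedged Python sketch that builds the ANOVA table above for the quantifying risk data. The function name `randomized_block_anova` and the dictionary keys are assumptions made for this illustration. It assumes one observation per cell and an additive (no interaction) model.

```python
def randomized_block_anova(y):
    """Additive ANOVA for a blocks x treatments table with one
    observation per cell. Rows are blocks, columns are treatments."""
    bl, a = len(y), len(y[0])
    n = bl * a
    grand = sum(sum(row) for row in y) / n
    block_means = [sum(row) / a for row in y]
    treat_means = [sum(row[j] for row in y) / bl for j in range(a)]

    ss_a = bl * sum((m - grand) ** 2 for m in treat_means)
    ss_bl = a * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_error = ss_total - ss_a - ss_bl      # what is left after A and blocks

    df_a, df_bl = a - 1, bl - 1
    df_error = df_a * df_bl
    ms_error = ss_error / df_error
    return {
        "SSA": ss_a, "SSBL": ss_bl, "SSError": ss_error, "SST": ss_total,
        "df": (df_a, df_bl, df_error),
        "MSE": ms_error,
        "F_A": (ss_a / df_a) / ms_error,     # F[a-1, (a-1)(bl-1)]
        "F_BL": (ss_bl / df_bl) / ms_error,  # F[bl-1, (a-1)(bl-1)]
    }


# Rows = blocks 1-5; columns = Utility, Worry, Comparison.
risk = [[1, 5, 8], [2, 8, 14], [7, 9, 16], [6, 13, 18], [12, 14, 17]]
table = randomized_block_anova(risk)
for key, value in table.items():
    print(key, value)
```

The sketch only produces the F ratios; looking up the p-values still requires an F table or statistical software.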
o Using SPSS to analyze a randomized block design
UNIANOVA dv BY block treat
/DESIGN = treat block.
Note that a factorial design (treatment, block, and treatment*block) is
assumed unless otherwise stated with the DESIGN subcommand
Tests of Between-Subjects Effects
Dependent Variable: DV
  Source            Type III Sum of Squares   df   Mean Square      F       Sig.
  Corrected Model         374.133a             6      62.356      20.901    .000
  Intercept              1500.000              1    1500.000     502.793    .000
  TREAT                   202.800              2     101.400      33.989    .000
  BLOCK                   171.333              4      42.833      14.358    .001
  Error                    23.867              8       2.983
  Total                  1898.000             15
  Corrected Total         398.000             14
  a. R Squared = .940 (Adjusted R Squared = .895)
• We find a significant treatment effect, F(2,8) = 33.99, p < .001
$\hat{\omega}^2_A = \frac{SSA - (df_A)\,MSError}{SSA + (N - df_A)\,MSError} = \frac{202.8 - (2)2.983}{202.8 + (15 - 2)2.983} = .815$
• Note that post-hoc tests on the marginal treatment means are required
to identify the effect
o What if we had neglected to block by age of participant?
ONEWAY dv BY treat.
ANOVA
DV
  Source           Sum of Squares   df   Mean Square     F      Sig.
  Between Groups      202.800        2     101.400     6.234    .014
  Within Groups       195.200       12      16.267
  Total               398.000       14
$\hat{\omega}^2_A = \frac{SSA - (df_A)\,MSWithin}{SSA + (N - df_A)\,MSWithin} = \frac{202.8 - (2)16.267}{202.8 + (15 - 2)16.267} = .41$
• Although inclusion of the blocking effect did not change the
conclusion of the statistical test, blocking greatly increased the size of
the effect of treatment.
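The two effect sizes can be reproduced with a few lines of arithmetic. This is a small Python sketch using the omega-squared formula as written in these notes; the function name `omega_squared` is an assumption for illustration, and the inputs are the SPSS values reported above.

```python
def omega_squared(ss_effect, df_effect, ms_error, n_total):
    """Effect size from the notes:
    (SSA - dfA * MSError) / (SSA + (N - dfA) * MSError)."""
    numerator = ss_effect - df_effect * ms_error
    denominator = ss_effect + (n_total - df_effect) * ms_error
    return numerator / denominator


# Same SSA in both analyses; only the error term changes.
blocked = omega_squared(202.8, 2, 2.983, 15)      # error = MSE with blocks removed
unblocked = omega_squared(202.8, 2, 16.267, 15)   # error = MS within, no blocking
print(round(blocked, 3), round(unblocked, 3))
```

The treatment sum of squares is identical in the two analyses. Blocking changes the effect size only by moving the age-related variability out of the error term.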
• Example #2: Fat in the diet. A researcher studies three low fat diets.
Participants were blocked on the basis of age. DV = post-diet reduction in
blood plasma lipid levels
                     Fat content of diet
  Block    Extremely Low   Fairly Low   Moderately Low
  15-24         .73            .67            .35
  25-34         .86            .75            .41
  35-44         .94            .81            .46
  45-54        1.40           1.32            .95
  55-64        1.62           1.41            .98
o First, let’s check the assumptions
EXAMINE VARIABLES=dv BY block fat
/PLOT BOXPLOT NPPLOT.
[Figure: boxplots of DV (.2 to 1.8) by BLOCK (N = 3 per block) and by FAT (N = 5 per treatment level)]
Tests of Normality (Shapiro-Wilk), by block
        BLOCK   Statistic   df   Sig.
  DV    1.00      .865       3   .281
        2.00      .920       3   .452
        3.00      .935       3   .506
        4.00      .878       3   .320
        5.00      .962       3   .626
Tests of Normality (Shapiro-Wilk), by treatment level
        FAT     Statistic   df   Sig.
  DV    1.00      .898       5   .401
        2.00      .829       5   .138
        3.00      .792       5   .070
Test of Homogeneity of Variance
  DV                                      Levene Statistic   df1    df2      Sig.
  Based on Mean                                 .336          2     12       .721
  Based on Median                               .047          2     12       .954
  Based on Median and with adjusted df          .047          2     11.893   .954
  Based on trimmed mean                         .302          2     12       .745
Check for treatment by block interaction:
[Figure: DV (0 to 2) by fat content (Extreme, Fair, Moderate), with one line per age block (15-24, 25-34, 35-44, 45-54, 55-64)]
• All assumptions seem fine
o To examine the effect of fat in the diet on plasma lipid levels, let’s
conduct a randomized block ANOVA
UNIANOVA dv BY block fat
/DESIGN = fat block.
Tests of Between-Subjects Effects
Dependent Variable: DV
  Source            Type III Sum of Squares   df   Mean Square       F        Sig.
  Corrected Model          2.045a              6      .341        141.102     .000
  Intercept               12.440               1    12.440       5151.017     .000
  FAT                       .626               2      .313        129.527     .000
  BLOCK                    1.419               4      .355        146.890     .000
  Error                    1.932E-02           8     2.415E-03
  Total                   14.504              15
  Corrected Total          2.064              14
  a. R Squared = .991 (Adjusted R Squared = .984)
We find a significant effect of fat in the diet on plasma lipid levels,
F(2,8) = 129.53, p < .001
Let’s conduct Tukey HSD post-hoc tests on the marginal treatment
means. We can have SPSS do the test for us:
UNIANOVA dv BY fat block
/POSTHOC = fat ( TUKEY )
/DESIGN = fat block .
Multiple Comparisons
Dependent Variable: DV
Tukey HSD
                      Mean Difference                         95% Confidence Interval
  (I) FAT   (J) FAT       (I-J)         Std. Error   Sig.   Lower Bound   Upper Bound
  1.00      2.00          .1180*          .03108     .013      .0292         .2068
            3.00          .4800*          .03108     .000      .3912         .5688
  2.00      1.00         -.1180*          .03108     .013     -.2068        -.0292
            3.00          .3620*          .03108     .000      .2732         .4508
  3.00      1.00         -.4800*          .03108     .000     -.5688        -.3912
            2.00         -.3620*          .03108     .000     -.4508        -.2732
  Based on observed means.
  *. The mean difference is significant at the .050 level.
Extremely low vs. fairly low fat: t(8) = 3.80, p = .013
Extremely low vs. moderately low fat: t(8) = 15.44, p < .001
Fairly low vs. moderately low fat: t(8) = 11.65, p < .001
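The t statistics above are each mean difference divided by its standard error. As a rough check, the sketch below assumes the standard error of a difference between two marginal means is sqrt(MSE(1/n + 1/n)), which reproduces the .03108 in the SPSS output. The dictionary labels are illustrative. The p-values reported with these t statistics are Tukey-adjusted by SPSS and are not recomputed here.

```python
import math

ms_error = 2.415e-03   # MSE from the randomized block ANOVA
n_per_mean = 5         # each diet mean is based on five blocks

# Standard error of the difference between two marginal treatment means.
se_diff = math.sqrt(ms_error * (1 / n_per_mean + 1 / n_per_mean))

# Mean differences taken from the Multiple Comparisons table.
mean_diffs = {
    "extremely low vs. fairly low": 0.1180,
    "extremely low vs. moderately low": 0.4800,
    "fairly low vs. moderately low": 0.3620,
}
t_values = {label: diff / se_diff for label, diff in mean_diffs.items()}

print("SE of a difference:", round(se_diff, 5))
for label, t in t_values.items():
    print(label, "t(8) =", round(t, 2))
```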
o Note that if we had neglected to block on age, we would have failed to
find a significant treatment effect!
ONEWAY dv BY fat.
ANOVA
DV
  Source           Sum of Squares   df   Mean Square     F      Sig.
  Between Groups        .626         2      .313       2.610    .115
  Within Groups        1.438        12      .120
  Total                2.064        14
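The contrast between the two analyses can be seen directly from the raw diet data. In this hedged Python sketch (variable names are assumptions), the treatment mean square is the same in both analyses. What changes is whether the block sum of squares is removed from the error term or left inside it.

```python
# Diet data: rows = age blocks (15-24 ... 55-64),
# columns = extremely low, fairly low, moderately low fat.
diet = [
    [0.73, 0.67, 0.35],
    [0.86, 0.75, 0.41],
    [0.94, 0.81, 0.46],
    [1.40, 1.32, 0.95],
    [1.62, 1.41, 0.98],
]
bl, a = len(diet), len(diet[0])
n = bl * a

grand = sum(sum(row) for row in diet) / n
block_means = [sum(row) / a for row in diet]
fat_means = [sum(row[j] for row in diet) / bl for j in range(a)]

ss_fat = bl * sum((m - grand) ** 2 for m in fat_means)
ss_block = a * sum((m - grand) ** 2 for m in block_means)
ss_total = sum((v - grand) ** 2 for row in diet for v in row)
ms_fat = ss_fat / (a - 1)

# Randomized block analysis: block variability is removed from the error term.
ms_error_blocked = (ss_total - ss_fat - ss_block) / ((a - 1) * (bl - 1))
f_blocked = ms_fat / ms_error_blocked

# One-way analysis: block variability stays in the within-groups term.
ms_within = (ss_total - ss_fat) / (n - a)
f_oneway = ms_fat / ms_within

print("blocked F(2,8):", round(f_blocked, 2))
print("one-way F(2,12):", round(f_oneway, 2))
```

Almost all of the within-group variability in these data is due to age, which is why ignoring the blocks hides the treatment effect.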
o What would happen if we forgot this was a randomized block design, and
attempted to analyze it as a factorial design?
UNIANOVA dv BY fat block
/DESIGN = fat block fat*block.
Tests of Between-Subjects Effects
Dependent Variable: DV
  Source            Type III Sum of Squares   df   Mean Square   F   Sig.
  Corrected Model          2.064a             14      .147       .    .
  Intercept               12.440               1    12.440       .    .
  FAT                       .626               2      .313       .    .
  BLOCK                    1.419               4      .355       .    .
  FAT * BLOCK              1.932E-02           8     2.415E-03   .    .
  Error                     .000               0      .
  Total                   14.504              15
  Corrected Total          2.064              14
  a. R Squared = 1.000 (Adjusted R Squared = .)
Why did this happen???
• A final example: A researcher studied how children solved a variety of
puzzles. Sixty children were blocked into groups of 6 on the basis of age,
gender, and IQ. Within each block, children were randomly assigned to
work on a specific type of puzzle. The number of puzzles (out of a possible
20) solved by each child was recorded.
Puzzle Type
Block P1 P2 P3 P4 P5 P6
1 5 14 8 10 11 6
2 7 10 7 9 12 5
3 11 9 10 11 14 6
4 9 10 6 13 15 7
5 13 12 7 14 16 11
6 7 9 8 6 11 5
7 10 11 8 12 13 8
8 4 8 5 7 9 4
9 14 13 11 15 17 12
10 9 9 8 10 14 9
o First, let’s check assumptions:
EXAMINE VARIABLES=dv by block puzzle
/PLOT BOXPLOT NPPLOT SPREADLEVEL.
• By factor
[Figure: boxplots of DV (2 to 18) by PUZZLE (1.00 to 6.00); N = 10 per puzzle type]
Tests of Normality (Shapiro-Wilk)
        PUZZLE   Statistic   df   Sig.
  DV    1.00       .970      10   .891
        2.00       .924      10   .394
        3.00       .941      10   .560
        4.00       .974      10   .925
        5.00       .979      10   .959
        6.00       .927      10   .415
Test of Homogeneity of Variance
                         Levene Statistic   df1   df2   Sig.
  DV    Based on Mean         1.110          5    54    .366
• By block
[Figure: boxplots of DV (2 to 18) by BLOCK (1.00 to 10.00); N = 6 per block]
Tests of Normality (Shapiro-Wilk)
        BLOCK    Statistic   df   Sig.
  DV    1.00       .969       6   .886
        2.00       .972       6   .907
        3.00       .964       6   .847
        4.00       .952       6   .759
        5.00       .963       6   .846
        6.00       .983       6   .964
        7.00       .918       6   .493
        8.00       .892       6   .331
        9.00       .983       6   .964
        10.00      .750       6   .020
Test of Homogeneity of Variance
                         Levene Statistic   df1   df2   Sig.
  DV    Based on Mean          .521          9    50    .852
• Block by factor interaction
[Figure: DV (0 to 18) by puzzle type (P1 to P6), with one line per block]
• All appears OK.
• Let’s start with a general ANOVA approach
UNIANOVA dv BY puzzle block
/DESIGN = puzzle block.
Tests of Between-Subjects Effects
Dependent Variable: DV
  Source            Type III Sum of Squares   df   Mean Square       F        Sig.
  Corrected Model         488.000a            14      34.857       15.121     .000
  Intercept              5684.267              1    5684.267     2465.861     .000
  PUZZLE                  238.933              5      47.787       20.730     .000
  BLOCK                   249.067              9      27.674       12.005     .000
  Error                   103.733             45       2.305
  Total                  6276.000             60
  Corrected Total         591.733             59
  a. R Squared = .825 (Adjusted R Squared = .770)
o We find a significant puzzle effect, $F(5,45) = 20.73, p < .01$
o To describe specific differences, we conduct pair-wise posthoc tests
UNIANOVA dv BY puzzle block
/POSTHOC = puzzle ( TUKEY )
/DESIGN = puzzle block.
Multiple Comparisons
Dependent Variable: DV
Tukey HSD
                            Mean Difference                         95% Confidence Interval
  (I) PUZZLE   (J) PUZZLE       (I-J)         Std. Error   Sig.    Lower Bound   Upper Bound
  1.00         2.00            -1.6000          .67900     .194     -3.6207         .4207
               3.00             1.1000          .67900     .590      -.9207        3.1207
               4.00            -1.8000          .67900     .106     -3.8207         .2207
               5.00            -4.3000          .67900     .000     -6.3207       -2.2793
               6.00             1.6000          .67900     .194      -.4207        3.6207
  2.00         3.00             2.7000          .67900     .003       .6793        4.7207
               4.00             -.2000          .67900    1.000     -2.2207        1.8207
               5.00            -2.7000          .67900     .003     -4.7207        -.6793
               6.00             3.2000          .67900     .000      1.1793        5.2207
  3.00         4.00            -2.9000          .67900     .001     -4.9207        -.8793
               5.00            -5.4000          .67900     .000     -7.4207       -3.3793
               6.00              .5000          .67900     .976     -1.5207        2.5207
  4.00         5.00            -2.5000          .67900     .008     -4.5207        -.4793
               6.00             3.4000          .67900     .000      1.3793        5.4207
  5.00         6.00             5.9000          .67900     .000      3.8793        7.9207
  Based on observed means.
• Puzzle 5 is solved more frequently than all other puzzles
• Puzzles 2 and 4 are solved more frequently than puzzles 3 and 6
• Alternatively, imagine that you had the following a priori hypotheses
o P2 = P4
o P3 = P6
o $P5 > \frac{P2 + P4}{2} > \frac{P3 + P6}{2}$
o We cannot enter contrasts directly into SPSS, so we’ll have to do these
contrasts by hand.
o Computing and testing a Main Effect Contrast (see 7-39)
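$\hat{\psi} = \sum_{j=1}^{a} c_j \bar{X}_{.j} = c_1 \bar{X}_{.1} + \dots + c_a \bar{X}_{.a}$
$StdError(\hat{\psi}) = \sqrt{MSError \sum_{j=1}^{a} \frac{c_j^2}{n_j}}$
Where $c_j^2$ is the squared weight for each marginal mean
$n_j$ is the sample size for each marginal mean
MSE is MSE from the omnibus ANOVA
(With the effects of the blocks removed)
$t \sim \frac{\hat{\psi}}{\text{standard error}(\hat{\psi})}$
$t_{observed} = \frac{\sum c_j \bar{X}_{.j}}{\sqrt{MSE \sum \frac{c_j^2}{n_j}}}$
$SS(\hat{\psi}) = \frac{\hat{\psi}^2}{\sum \frac{c_j^2}{n_j}}$
$F(1, dfw) = \frac{SSC / dfc}{SSE / dfw} = \frac{SSC}{MSE}$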
o Create contrast coefficients:
• P2 = P4 (0 –1 0 1 0 0)
• P3 = P6 (0 0 –1 0 0 1)
• $P5 > \frac{P2 + P4}{2} > \frac{P3 + P6}{2}$ (0 -1 0 -1 2 0) (0 1 -1 1 0 -1)
o Compute the value of each contrast:
Descriptive Statistics
Dependent Variable: DV
  PUZZLE     Mean     Std. Deviation    N
  1.00       8.9000      3.24722       10
  2.00      10.5000      1.95789       10
  3.00       7.8000      1.75119       10
  4.00      10.7000      2.90784       10
  5.00      13.2000      2.48551       10
  6.00       7.3000      2.66875       10
  Total      9.7333      3.16692       60
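(0 –1 0 1 0 0) $\hat{\psi}_1 = -10.5 + 10.7 = 0.2$, $SS(\hat{\psi}_1) = 0.2$
(0 0 –1 0 0 1) $\hat{\psi}_2 = -7.8 + 7.3 = -0.5$, $SS(\hat{\psi}_2) = 1.25$
(0 -1 0 -1 2 0) $\hat{\psi}_3 = -10.5 - 10.7 + 2(13.2) = 5.2$, $SS(\hat{\psi}_3) = 45.067$
(0 1 -1 1 0 -1) $\hat{\psi}_4 = 10.5 - 7.8 + 10.7 - 7.3 = 6.1$, $SS(\hat{\psi}_4) = 93.025$
o Test the contrast:
$\psi_1$: $F(1,45) = \frac{0.2}{2.305} = 0.09,\ p = .77$
$\psi_2$: $F(1,45) = \frac{1.25}{2.305} = 0.54,\ p = .47$
$\psi_3$: $F(1,45) = \frac{45.067}{2.305} = 19.55,\ p < .01$
$\psi_4$: $F(1,45) = \frac{93.025}{2.305} = 40.36,\ p < .01$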
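Since these contrasts are done by hand, a short script helps avoid arithmetic slips. The following Python sketch applies the main effect contrast formulas to the marginal puzzle means and the MSE from the omnibus ANOVA (both taken from the SPSS output above). The function name `evaluate_contrast` and the dictionary labels are assumptions for illustration, and p-values are not computed.

```python
import math

means = [8.9, 10.5, 7.8, 10.7, 13.2, 7.3]   # marginal means for P1..P6
n_per_mean = 10                              # ten blocks per puzzle type
ms_error = 2.305                             # MSE with block effects removed


def evaluate_contrast(weights, means, n, mse):
    """Return psi-hat, its standard error, t, SS(psi-hat), and F(1, dfw)."""
    psi = sum(c * m for c, m in zip(weights, means))
    sum_c2_over_n = sum(c * c / n for c in weights)
    se = math.sqrt(mse * sum_c2_over_n)
    ss = psi ** 2 / sum_c2_over_n
    return {"psi": psi, "se": se, "t": psi / se, "ss": ss, "F": ss / mse}


contrasts = {
    "P2 vs P4": [0, -1, 0, 1, 0, 0],
    "P3 vs P6": [0, 0, -1, 0, 0, 1],
    "P5 vs mean(P2, P4)": [0, -1, 0, -1, 2, 0],
    "mean(P2, P4) vs mean(P3, P6)": [0, 1, -1, 1, 0, -1],
}
results = {label: evaluate_contrast(w, means, n_per_mean, ms_error)
           for label, w in contrasts.items()}

for label, r in results.items():
    print(label, {k: round(v, 3) for k, v in r.items()})
```

For a single degree of freedom contrast, the squared t statistic equals the F statistic, so either form of the test can be reported.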
o Note that if these were post-hoc tests, then we would need to apply the
Tukey HSD or Scheffé correction.
19. Final considerations about blocking
• As shown in the last SPSS output, when there is one participant per cell, the
SS for the interaction is the error term. Some authors create ANOVA tables
with no error term, and use the SS(BL*A) to test the effect of A. The only
difference in these approaches is the labeling of the error term.
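This point can be verified numerically on the diet data. The sketch below (Python, with illustrative variable names) shows that a full factorial model leaves zero degrees of freedom and zero sum of squares within cells when there is one observation per cell, and that the interaction sum of squares is exactly the quantity the randomized block analysis uses as its error term.

```python
# Diet data: rows = age blocks, columns = fat levels (one observation per cell).
diet = [
    [0.73, 0.67, 0.35],
    [0.86, 0.75, 0.41],
    [0.94, 0.81, 0.46],
    [1.40, 1.32, 0.95],
    [1.62, 1.41, 0.98],
]
bl, a = len(diet), len(diet[0])
n = bl * a

grand = sum(sum(row) for row in diet) / n
block_means = [sum(row) / a for row in diet]
fat_means = [sum(row[j] for row in diet) / bl for j in range(a)]

# The factorial model fits one mean per cell, so the within-cell df is N - a*bl.
df_error = n - a * bl
df_interaction = (a - 1) * (bl - 1)

# With one observation per cell, each cell mean is the observation itself,
# so every within-cell deviation is zero.
ss_within = sum((diet[i][j] - diet[i][j]) ** 2
                for i in range(bl) for j in range(a))

# The interaction SS is what the additive (blocked) model treats as error.
ss_interaction = sum(
    (diet[i][j] - block_means[i] - fat_means[j] + grand) ** 2
    for i in range(bl) for j in range(a)
)

print(df_error, df_interaction, ss_within, round(ss_interaction, 5))
```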
• If the blocking variable is not related to the DV, then you actually lose
power by including it in the design.
Blocked Design
  Source      SS        df            MS     F
  Treatment   SSA       a-1           MSA    $F[(a-1), (N-a-bl+1)] = \frac{MSA}{MSE}$
  Blocks      0         bl-1          MSBL
  Error       SSError   (a-1)(bl-1)   MSE
  Total       SST       N-1
Standard Design
  Source      SS        df     MS     F
  Treatment   SSA       a-1    MSA    $F[(a-1), (N-a)] = \frac{MSA}{MSE}$
  Within      SSError   N-a    MSE
  Total       SST       N-1
o When SSBL = 0, then MSE (in blocked design) = MSW (in the standard
design), so that the F-ratios in the two cases are identical
o But there are fewer degrees of freedom in the error term for the blocked
design (N-a-bl+1) than in the standard design (N-a). The loss of these
bl-1 dfs results in lower power for the blocked design.
o In reality, the SSBL will never be exactly zero, but when SSBL is small
and the number of blocks is large, you will lose power.
• The blocking variable must be a discrete variable. Oftentimes in behavioral
research (and in both of our examples) the blocking variable is a continuous
variable that must be artificially grouped for the purpose of analysis. When
you treat a continuous variable as a discrete variable, you lose information
and power. An analysis of covariance (ANCOVA) is a similar design to a
randomized block design, except nuisance variables may be continuous.
• Testing for non-additivity of treatment effects and blocks:
o If looking at the plot of the DV by blocks makes you feel uneasy (it
shouldn’t!), a statistical test is available: Tukey’s test for nonadditivity.
o If you have more than 1 observation per cell, then you have a factorial
design. You can calculate a SS(Bl*A) and test the interaction.
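For readers who want to see what Tukey's test for nonadditivity involves, here is a hedged Python sketch of the standard one degree of freedom version for a table with one observation per cell. The function name and return keys are assumptions for illustration. It carves a single-df component out of the error sum of squares and tests it against the remainder. The sketch returns the F ratio only, so a p-value would still need an F table with (1, (a-1)(bl-1)-1) degrees of freedom.

```python
def tukey_nonadditivity(y):
    """Tukey's one-df test for nonadditivity.
    y: rows = blocks, columns = treatments, one observation per cell."""
    bl, a = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (bl * a)
    tau = [sum(row) / a - grand for row in y]                         # block effects
    alpha = [sum(row[j] for row in y) / bl - grand for j in range(a)]  # treatment effects

    # One-df sum of squares for a (block effect x treatment effect) product term.
    cross = sum(y[i][j] * tau[i] * alpha[j] for i in range(bl) for j in range(a))
    ss_nonadd = cross ** 2 / (sum(t * t for t in tau) * sum(x * x for x in alpha))

    # Error SS from the additive model, and what remains after removing the 1-df term.
    ss_error = sum((y[i][j] - grand - tau[i] - alpha[j]) ** 2
                   for i in range(bl) for j in range(a))
    df_rem = (a - 1) * (bl - 1) - 1
    ss_rem = ss_error - ss_nonadd
    f_ratio = ss_nonadd / (ss_rem / df_rem) if ss_rem > 1e-12 else float("inf")
    return {"SS_nonadd": ss_nonadd, "SS_error": ss_error,
            "df": (1, df_rem), "F": f_ratio}


# Quantifying-risk data: rows = blocks, columns = Utility, Worry, Comparison.
risk = [[1, 5, 8], [2, 8, 14], [7, 9, 16], [6, 13, 18], [12, 14, 17]]
result = tukey_nonadditivity(risk)
print(result)
```

An F ratio well below 1, as the risk data give, offers no evidence against additivity. A large F ratio would suggest transforming the data before running the randomized block analysis.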
• If you want to block on two factors, you can use the same procedure outlined
here. Simply combine the two factors into one block. For example, to block
on age and education:
⇒ Young and no education
⇒ Young and education
⇒ Old and no education
⇒ Old and education
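Combining two blocking factors amounts to crossing their levels and treating each combination as one block. This is a small Python sketch of that bookkeeping. The participant list is hypothetical and exists only to show how each person would be mapped to a combined block number.

```python
from itertools import product

age_levels = ["Young", "Old"]
education_levels = ["no education", "education"]

# Cross the two factors: every (age, education) pair becomes one block.
combined_blocks = [f"{age} and {edu}"
                   for age, edu in product(age_levels, education_levels)]
block_number = {label: k + 1 for k, label in enumerate(combined_blocks)}

# Hypothetical participants described by (age group, education group).
participants = [("Old", "education"), ("Young", "no education"), ("Young", "education")]
assigned = [block_number[f"{age} and {edu}"] for age, edu in participants]

print(combined_blocks)
print(assigned)
```

The combined block variable can then be used as the single blocking factor in the analysis, exactly as in the examples above.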

Anova advanced

  • 1. 9-1  2004 A. Karpinski Chapter 9 Advanced Topics in ANOVA Page Unbalanced ANOVA designs 1. Why is the design unbalanced? 9-2 2. What happens with unbalanced designs? 9-3 3. An introduction to the problem 9-5 4. Types of sums of squares 9-10 5. An example 9-15 ANOVA designs with random effects 6. Fixed effects vs. random effects 9-22 7. Model II: One-factor random effects model 9-24 8. Model II: Two-factor random effects model 9-30 9. Model III: Two-factor mixed effects model 9-35 10. Contrasts and post-hoc tests 9-41 11. Effect sizes 9-41 12. Final considerations about random effects 9-42 ANOVA designs with nested effects 13. An introduction to nested designs 9-43 14. Structural models for nested designs 9-45 15. Testing nested effects 9-46 16. Final considerations about nested designs 9-52 ANOVA designs with randomized blocks 17. The logic of blocked designs 9-53 18. Examples of randomized block designs 9-55 19. Final consideration about blocked designs 9-69
  • 2. 9-2  2004 A. Karpinski Advanced Topics in ANOVA: Unbalanced ANOVA designs 1. Why is the design unbalanced? • Random factors o The unequal cell sizes are randomly unequal o The process leading to the missingness is independent of the levels of the independent variable • Scheduling problems • Computer errors IV 1 IV B Level 1 Level 2 Level 3 Level 1 11n =15 21n =10 31n =20 45 Level 2 12n =20 22n =20 32n =15 55 35 30 35 100 IV 1 IV B Level 1 Level 2 Level 3 Level 1 11n =4 21n =7 31n =3 14 Level 2 12n =4 22n =3 32n =6 13 Level 3 13n =5 23n =4 33n =5 14 13 14 14 41 • Systematic factors o The unequal cell sizes are directly or indirectly related to the levels of the independent variables • A treatment is painful/ineffective • High prejudice individuals refuse to answer questions regarding attitudes toward ethnic groups IV 1 IV B Level 1 Level 2 Level 3 Level 1 11n =40 21n =40 31n =50 130 Level 2 12n =20 22n =20 32n =30 70 60 60 80 200 IV 1 IV B Level 1 Level 2 Level 3 Level 1 11n =3 21n =6 31n =9 18 Level 2 12n =2 22n =6 32n =9 17 Level 3 13n =4 23n =8 33n =13 25 9 20 31 60
  • 3. 9-3  2004 A. Karpinski • Missing observations due to systematic factors is bad. Analyzing these data can lead to very biased results. • All of the methods we discuss for analyzing unbalanced designs assume the cell sizes are either a result of: o Random factors o Real differences in the population 2. What happens with unbalanced designs? • Recall that two contrasts are orthogonal if for unequal n 1ψ = ),...,,,( 321 aaaaa 2ψ = ),...,,,( 321 abbbb 0 1 =∑= a j i ii n ba or 0... 2 22 1 11 =+++ a aa n ba n ba n ba • In general the tests for main effects and interactions are no longer orthogonal for unbalanced designs. • Because of this non-orthogonality, the sums of squares will not nicely partition. SSTSSABSSBSSA ≠++ • As a result: o The tests for the main effects and interactions are not independent of each other. o Single degree of freedom contrasts may not be combined into a simultaneous test. • The most popular method for dealing with these issues is to use different methods of computing the sums of squares for each effect. • These different methods of computing sums of squares DO NOT affect: i. The error term (MSW) ii. The test of the highest order interaction
  • 4. 9-4  2004 A. Karpinski • Three possible approaches to unequal cell sizes (assuming data are missing completely at random) o Add observations to make the design balanced • This solution may not be pragmatic • It may also present problems regarding random assignment in a true experiment o Delete observations to make design balanced • While an unbalanced design is less powerful than a balanced design, you ALWAYS lose power by tossing observations • There is not a good method for deciding whom to toss. (If you use a random process, then a different person using the same algorithm may come to different conclusions. If you use a systematic process, then you may bias your results.) • I recommend that you NEVER delete an observation to make a design balanced. o Impute the missing data • A topic too advanced for this course! o Conduct analysis on an unbalanced design
  • 5. 9-5  2004 A. Karpinski 3. An introduction to the problem of unbalanced designs • Balanced, orthogonal designs o For balanced designs, the SS partition is complete and each component’s contribution to the total SS is unique. • Unbalanced, non-orthogonal designs o For unbalanced designs, the SS are not necessarily unique to each component o These figures are just heuristics. With data, it is possible to have “negative” overlapping area. SSA SSB SSAB SSA SSB SSAB
  • 6. 9-6  2004 A. Karpinski • Approach #1: Only count the unique contribution of each factor o This approach is known as the Unique SS or Type III SS approach • Approach #2: Start with only the main effects. Use a unique SS approach to divide the main effect sums of squares. Then, add the next highest order effects. For the remaining SS, use the unique approach to divide the SS. Continue until all effects have been added. o This approach is known as using Type II SS SSA SSAB SSB SSAB SSBSSA
  • 7. 9-7  2004 A. Karpinski • Approach #3: Start with only the main effects. Determine an order of importance. Give the most important effect all its SS. For next effect, give the effect its entire remaining SS. Continue until all main effects are used. Next consider the two-way interactions, and determine an order of importance and repeat the process. Continue until all effects have been considered. o This approach is known as the hierarchical or Type I SS approach. Factor A entered first Factor B entered first SSAB SSBSSA SSAB SSA SSB
  • 8. 9-8  2004 A. Karpinski • The problem of unequal sample sizes occurs when we collapse across cells to look at the marginal means. There are different ways to collapse the main effects, and each gives a different answer. (The MSW and the highest order interaction are unaffected by these different methods because they do not average across any cells—they say something about individual cells.) • An example: Salary data for female and male employees Female Male College Degree No College Degree College Degree No College Degree 24 15 25 19 26 17 29 18 25 20 27 21 24 16 20 27 21 24 22 27 19 23 Mean 25 17 27 20 Sample Size 8 4 3 7 Gender Female Male Education College Degree 25 8 11 11 = = X n 27 3 21 21 = = X n No College Degree 17 4 12 12 = = X n 20 7 22 22 = = X n
  • 9. 9-9  2004 A. Karpinski • Question: Is there a difference in the salaries of men and women? o Approach #1: Let’s run a contrast comparing women’s salary to men’s salary Gender Women Men Education College Degree -1 1 No College Degree -1 1 • Based on this approach, we conclude that men earn more than women! ⇒ Women earn $21000 21 2 1725 =      + ⇒ Men earn $23500 5.23 2 2027 =      + o Approach #2: Ignore education level and compute marginal gender means. Gender Women Men College Degree 33.22 12 = = F F X n 10.22 10 = = M M X n • Based on this approach we look at the marginal means for gender, and conclude that women earn slightly more than men o Which answer is correct?
  • 10. 9-10  2004 A. Karpinski o It depends – each method answers a different question • Method #2 asks: Are men paid a higher salary than women? • Method #1 asks: Within an education status, are men paid a higher salary than women? • This discrepancy is known as “Simpson’s Paradox” 4. Types of Sums of Squares • I am going to focus on the use and interpretation of each type of sums of squares, and will ignore how to compute these SS. SPSS (or any statistical software) can calculate each of the SS, but if you must see the computational details, see an advanced ANOVA book. • Type III / Unique SS or Regression SS o In general, this is the best and most common approach to analysis o For Type III SS, each cell mean is weighted equally when computing marginal means. These cell means are unweighted (because they considered equally, independent of the sample sizes). o This approach leads to the identical results as converting the design to a one-factor arrangement and using contrasts to test the main effects and interactions. o When the design is not orthogonal, the SS of each effect may sum to a number greater than the total SS because of redundancy/overlap in SS. For Type III SS, we only use the part of the SS that is unique to the factor of interest. (For those of you familiar with regression, Type III SS is equivalent to testing for each effect after having previously controlled for/entered all other effects OR by entering all effects simultaneously.)
  • 11. 9-11  2004 A. Karpinski o In our example, using Type III SS is equivalent to taking approach #1 to the analysis. Testing the main effect for gender using a Type III SS approach: Gender Women Men Education College Degree 2511 =X -1 2721 =X 1 No College Degree 1712 =X -1 2022 =X 1 • Main effect for gender ⇒ Women earn $21000 21 2 1725 =      + ⇒ Men earn $23500 5.23 2 2027 =      + • How is the main effect for education tested? • In SPSS: UNIANOVA dv BY gender edu /METHOD = SSTYPE(3). Tests of Between-Subjects Effects Dependent Variable: DV 273.864a 3 91.288 32.864 .000 9305.790 1 9305.790 3350.084 .000 29.371 1 29.371 10.573 .004 264.336 1 264.336 95.161 .000 1.175 1 1.175 .423 .524 50.000 18 2.778 11193.000 22 323.864 21 Source Corrected Model Intercept GENDER EDU GENDER * EDU Error Total Corrected Total Type III Sum of Squares df Mean Square F Sig. R Squared = .846 (Adjusted R Squared = .820)a. Main effect for gender such that men earn more than women, F(1,22) = 10.57, p = .004 Main effect for education such that college educated individuals earn more than non-college educated individuals, F(1,22) = 95.16, p < .001
  • 12. 9-12  2004 A. Karpinski • Type I / Hierarchical SS o For Type I SS, each cell mean is weighted by its cell size when computing marginal means. o The order the factors are entered into SPSS makes a difference in how the SS are computed. o When the design is not orthogonal, the SS of each effect may sum to a number greater than the total SS because of redundancy/overlap in SS. For Type I SS: • For the first factor listed, we use all the SS for that factor (unique and redundant) • For the next factors, we use the entire SS that is not redundant with the previous factors (For those of you familiar with regression, Type I SS is equivalent to testing for each effect by entering each effect one after the other) o In our example, Type I SS (with gender listed first) is equivalent to ignoring education level and using weighted marginal means Gender Women Men College Degree 33.22 12 = = F F X n 10.22 10 = = M M X n • In SPSS: UNIANOVA dv BY gender edu /METHOD = SSTYPE(1). Tests of Between-Subjects Effects Dependent Variable: DV 273.864a 3 91.288 32.864 .000 10869.136 1 10869.136 3912.889 .000 .297 1 .297 .107 .747 272.392 1 272.392 98.061 .000 1.175 1 1.175 .423 .524 50.000 18 2.778 11193.000 22 323.864 21 Source Corrected Model Intercept GENDER EDU GENDER * EDU Error Total Corrected Total Type I Sum of Squares df Mean Square F Sig. R Squared = .846 (Adjusted R Squared = .820)a.
  • 13. 9-13  2004 A. Karpinski UNIANOVA dv BY edu gender /METHOD = SSTYPE(1). Tests of Between-Subjects Effects Dependent Variable: DV 273.864a 3 91.288 32.864 .000 10869.136 1 10869.136 3912.889 .000 242.227 1 242.227 87.202 .000 30.462 1 30.462 10.966 .004 1.175 1 1.175 .423 .524 50.000 18 2.778 11193.000 22 323.864 21 Source Corrected Model Intercept EDU GENDER EDU * GENDER Error Total Corrected Total Type I Sum of Squares df Mean Square F Sig. R Squared = .846 (Adjusted R Squared = .820)a. Gender listed first Edu listed first Main effect for gender F(1,18) = 0.11, p = .75 F(1,18) = 10.97, p < .001 Main effect for education F(1,18) = 98.06, p < .001 F(1,18) = 87.20, p < .001 • Not surprisingly, there are additional types of sums of squares o Type II SS A compromise between Type I and Type III SS o Type IV SS Use when there are missing cells in the design of the experiment • Which SS are better? o In general, you ran the design because you wanted to compare the cell means. In this case, the unequal cell sizes are irrelevant and you should use Type III SS • If we have an experimental design and the data are missing at random, then there is no defensible reason for allowing cells with larger numbers of observations to exert a greater influence on the analysis • For men and women with equal levels of education, do men and women receive equal pay? • Type III SS also have the advantage of being the simplest to convert to contrast coefficients
  • 14. 9-14  2004 A. Karpinski
      o If your design intentionally has unequal cell sizes (perhaps to reflect differences in the composition of the population) and you want your analyses to reflect this feature, then Type I SS may be more appropriate
        • Do men and women receive equal pay?
      o The issue of which type of SS to use for unbalanced designs is still controversial. Different texts and different authors offer different recommendations. The important point is for you to think about what question you are asking and which type of SS best answers that question. You must decide this issue before you analyze your data, not after examining the p-values!
    • Important points to remember
      o Regardless of the type of SS used, the error term remains unchanged
      o Any analysis that does not involve marginal means remains unchanged
        • The test of the highest order interaction is unchanged
        • Tests of cell mean contrasts are unchanged
      o In most cases Type III SS seem to be the “best” because they take into account information about all the factors
        • If important factors are omitted from the design, you may arrive at erroneous conclusions (In regression, this is known as the omitted variable problem).
  • 15. 9-15  2004 A. Karpinski
    5. An Example: Level of Management and Support of Affirmative Action
      DV = Scores on an Affirmative Action Attitude Scale
                Management Level
      Gender    Middle-Management,   Upper-Management,   Middle-Management,   Upper-Management,   CEO
                Minor Division       Minor Division      Major Division       Major Division
      Female    21 25 29 26 24       31 30 23 28         25 22 31 30          35 25 30            27 36 27
      Male      25 18 26             31 28 22 31         33 31 40 36          35 35 43 37 40      36 44 43 45 42
    • Note that this design is rather odd – it is a 2*2*2 with an extra 2 cells
                Middle Management                  Upper Management
      Gender    Minor Division   Major Division    Minor Division   Major Division
      Male
      Female

      Gender    CEO
      Male
      Female
    • Rather than trying to analyze it as a 2*2*3 with two missing cells, it is much easier to consider this design to be a 2*5 design. Using appropriate contrasts, we can test
      o Main effect of management level
      o Main effect of division
      o Management by division interaction
      o Interactions between all these terms and gender
      But we can also make comparisons between these groups and CEOs.
    • Using this approach, we can avoid designs with empty cells and the need to learn about Type IV SS.
  • 16. 9-16  2004 A. Karpinski
      Your specific research questions were:
        i. Do middle and upper management from minor divisions differ in their support for AA?
        ii. Do minor division managers differ from major division managers in their support for AA?
        iii. Do CEOs differ from other management in their support for AA?
        iv. Do questions i. – iii. differ by gender?
    • First, let’s look at the data:
      [Figure: boxplots of DV by management level (MM - Minor, UM - Minor, MM - Major, UM - Major, CEO), clustered by gender (Female, Male)]
        EXAMINE VARIABLES=dv BY group
          /PLOT NPPLOT.
      Tests of Normality (Shapiro-Wilk), DV by GROUP
      GROUP   Statistic   df   Sig.
      1.00    .989        5    .977
      2.00    .895        4    .405
      3.00    .912        4    .492
      4.00    1.000       3    1.000
      5.00    .750        3    .000
      6.00    .842        3    .220
      7.00    .827        4    .161
      8.00    .971        4    .850
      9.00    .887        5    .341
      10.00   .836        5    .154
      Test of Homogeneity of Variances (DV)
      Levene Statistic   df1   df2   Sig.
      .348               9     30    .950
  • 17. 9-17  2004 A. Karpinski
    • Rather than running a traditional main effects and interaction analysis, let’s skip the omnibus tests and do a contrast-based test of the hypotheses.
      o We should adopt a Type III SS approach – the variations in the cell sizes appear to be random and we are interested in the cell means.
      o To conduct contrasts with a Type III SS approach, we need to consider each cell mean equally, regardless of its sample size – but that is what we do when we use our standard tests for contrasts.
      o However, remember that we cannot combine single degree of freedom contrasts into a simultaneous omnibus test of a hypothesis.
    Hypothesis 1
      o Do middle and upper management in the minor divisions differ in their support for AA?
      o Does this level of support differ by gender?
                         Management Level
               Gender    MM, Minor   UM, Minor   MM, Major   UM, Major   CEO
      Hyp 1:   Female    -1          1           0           0           0
               Male      -1          1           0           0           0
      Hyp 1B:  Female    -1          1           0           0           0
               Male      1           -1          0           0           0
        ONEWAY dv by group
          /cont = -1 1 0 0 0 -1 1 0 0 0
          /cont = -1 1 0 0 0 1 -1 0 0 0.
      Contrast Tests (DV)
      Contrast         Value of Contrast   Std. Error   t       df   Sig. (2-tailed)
      Hyp 1            8.0000              4.00638      1.997   30   .055
      Hyp 1 * Gender   -2.0000             4.00638      -.499   30   .621
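The contrast test for Hypothesis 1 can be reproduced by hand as a check on the SPSS output. The sketch below is only an illustration: it assumes the flattened data table on p. 9-15 reads cell by cell (females first, then males), an assumption that reproduces the cell sizes and contrast values reported in the notes. The helper names `pooled_msw` and `contrast_test` are hypothetical, not part of SPSS.

```python
import math

# Scores per cell, assumed order: MM-Minor, UM-Minor, MM-Major, UM-Major, CEO
# for females, then the same five cells for males (reconstructed from p. 9-15).
cells = [
    [21, 25, 29, 26, 24], [31, 30, 23, 28], [25, 22, 31, 30],
    [35, 25, 30], [27, 36, 27],
    [25, 18, 26], [31, 28, 22, 31], [33, 31, 40, 36],
    [35, 35, 43, 37, 40], [36, 44, 43, 45, 42],
]

def mean(xs):
    return sum(xs) / len(xs)

def pooled_msw(groups):
    """Pooled within-cell mean square and its degrees of freedom."""
    ss = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df = sum(len(g) for g in groups) - len(groups)
    return ss / df, df

def contrast_test(groups, weights):
    """Return (estimate, standard error, t, df) for a contrast on cell means.

    Each cell mean enters with its contrast weight only, regardless of
    cell size; cell sizes affect the standard error alone.
    """
    msw, df = pooled_msw(groups)
    est = sum(c * mean(g) for c, g in zip(weights, groups))
    se = math.sqrt(msw * sum(c ** 2 / len(g) for c, g in zip(weights, groups)))
    return est, se, est / se, df

hyp1 = contrast_test(cells, [-1, 1, 0, 0, 0, -1, 1, 0, 0, 0])
hyp1_by_gender = contrast_test(cells, [-1, 1, 0, 0, 0, 1, -1, 0, 0, 0])
print("Hyp 1:          est=%.4f se=%.5f t=%.3f df=%d" % hyp1)
print("Hyp 1 x Gender: est=%.4f se=%.5f t=%.3f df=%d" % hyp1_by_gender)
```

The same function can be called with the Hypothesis 2 and 3 weights used in the ONEWAY syntax. The p-values are left out because the standard library has no t distribution.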
  • 18. 9-18  2004 A. Karpinski
    • In the minor divisions, we find that upper management is more supportive of AA than middle management, t(30) = 2.00, p = .06, ω² = .07.
    • This difference in support of AA does not vary by gender, t(30) = -0.50, p = .62, ω² < .01
    • As an example of the effect size calculation, here are the omega squared calculations for the test of Hypothesis 1:
      Hypothesis 1: $\hat{\psi}_1 = 8$
      $SS_{\hat{\psi}_1} = \frac{\hat{\psi}_1^2}{\sum_j c_j^2 / n_j} = \frac{(8)^2}{\frac{(-1)^2}{5} + \frac{(1)^2}{4} + 0 + 0 + 0 + \frac{(-1)^2}{3} + \frac{(1)^2}{4} + 0 + 0 + 0} = \frac{64}{1.033} = 61.935$
      $\hat{\omega}^2_{\psi} = \frac{SS_{\psi} - MS_{Within}}{SS_{\psi} + (N-1) MS_{Within}} = \frac{61.935 - 15.53}{61.935 + (39)15.53} = .0695$
    Hypothesis 2
      o Do minor division managers differ from major division managers in their support for AA?
      o Does this level of support differ by gender?
                         Management Level
               Gender    MM, Minor   UM, Minor   MM, Major   UM, Major   CEO
      Hyp 2:   Female    -1          -1          1           1           0
               Male      -1          -1          1           1           0
      Hyp 2B:  Female    -1          -1          1           1           0
               Male      1           1           -1          -1          0
        ONEWAY dv by group
          /cont = -1 -1 1 1 0 -1 -1 1 1 0
          /cont = -1 -1 1 1 0 1 1 -1 -1 0.
      Contrast Tests (DV)
      Contrast         Value of Contrast   Std. Error   t        df   Sig. (2-tailed)
      Hyp 2            26.0000             5.66588      4.589    30   .000
      Hyp 2 * Gender   -18.0000            5.66588      -3.177   30   .003
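The omega squared arithmetic for Hypothesis 1 can be checked with a few lines of code. This is a minimal sketch using only summary numbers from the notes (contrast value 8, error SS 466 on 30 df, and the cell sizes shown as df in the Shapiro-Wilk table); the function names `contrast_ss` and `omega_squared` are hypothetical.

```python
def contrast_ss(psi_hat, weights, sizes):
    """SS for a contrast: psi_hat^2 / sum(c_j^2 / n_j)."""
    return psi_hat ** 2 / sum(c ** 2 / n for c, n in zip(weights, sizes))

def omega_squared(ss_psi, ms_within, n_total):
    """Omega squared for a single-df contrast, using the formula in the notes."""
    return (ss_psi - ms_within) / (ss_psi + (n_total - 1) * ms_within)

# Cell sizes: five female cells, then five male cells
sizes = [5, 4, 4, 3, 3, 3, 4, 4, 5, 5]
weights = [-1, 1, 0, 0, 0, -1, 1, 0, 0, 0]   # Hypothesis 1
ms_within = 466.0 / 30                        # error SS / error df

ss_hyp1 = contrast_ss(8.0, weights, sizes)
omega_hyp1 = omega_squared(ss_hyp1, ms_within, sum(sizes))
print(round(ss_hyp1, 3), round(omega_hyp1, 4))
```

Small differences in the last decimal place relative to the hand calculation come from rounding MSW to 15.53.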
  • 19. 9-19  2004 A. Karpinski
    • We find a significant division of management by gender interaction, t(30) = -3.18, p < .01, ω² = .19. To understand this interaction, we must conduct simple effects tests:
        ONEWAY dv by group
          /cont = -1 -1 1 1 0 0 0 0 0 0
          /cont = 0 0 0 0 0 -1 -1 1 1 0.
      Contrast Tests (DV)
      Contrast             Value of Contrast   Std. Error   t       df   Sig. (2-tailed)
      Hyp 2 - Women only   4.0000              4.00638      .998    30   .326
      Hyp 2 - Men only     22.0000             4.00638      5.491   30   .000
    • For women, we find no significant difference between major and minor division management in their support for AA, t(30) = 1.00, ns, ω² < .01.
    • For men, we find that managers in major divisions express more support for AA than managers in minor divisions, t(30) = 5.49, p < .05, ω² = .42. (Use the Scheffé correction, $t_{crit} = \sqrt{4 \cdot F(.05; 4, 30)} = 3.28$, as the critical value)
    Hypothesis 3
      o Do CEOs differ from other management in their support for AA?
      o Does this level of support differ by gender?
                         Management Level
               Gender    MM, Minor   UM, Minor   MM, Major   UM, Major   CEO
      Hyp 3:   Female    -1          -1          -1          -1          4
               Male      -1          -1          -1          -1          4
      Hyp 3B:  Female    -1          -1          -1          -1          4
               Male      1           1           1           1           -4
        ONEWAY dv by group
          /cont = -1 -1 -1 -1 4 -1 -1 -1 -1 4
          /cont = -1 -1 -1 -1 4 1 1 1 1 -4.
      Contrast Tests (DV)
      Contrast         Value of Contrast   Std. Error   t        df   Sig. (2-tailed)
      Hyp 3            54.0000             12.83173     4.208    30   .000
      Hyp 3 * Gender   -34.0000            12.83173     -2.650   30   .013
  • 20. 9-20  2004 A. Karpinski
    • We find a significant level of management by gender interaction, t(30) = -2.65, p = .01, ω² = .13. To understand this interaction, we must conduct simple effects tests:
        ONEWAY dv by group
          /cont = -1 -1 -1 -1 4 0 0 0 0 0
          /cont = 0 0 0 0 0 -1 -1 -1 -1 4.
      Contrast Tests (DV)
      Contrast             Value of Contrast   Std. Error   t       df   Sig. (2-tailed)
      Hyp 3 - Women only   10.0000             9.94462      1.006   30   .323
      Hyp 3 - Men only     44.0000             8.10912      5.426   30   .000
    • For women, we find no significant difference between management and CEOs in their support for AA, t(30) = 1.01, ns, ω² < .01.
    • For men, we find that CEOs express more support for AA than other managers, t(30) = 5.43, p < .05, ω² = .42 (Use the Scheffé correction, $t_{crit} = \sqrt{4 \cdot F(.05; 4, 30)} = 3.28$, as the critical value)
    • Note that for a contrast-based analysis, we are implicitly adopting a Type III SS approach by weighting each cell mean equally. Single degree of freedom tests of cell means are not affected by an unbalanced design (However, we would not be able to combine single df tests into a simultaneous test).
  • 21. 9-21  2004 A. Karpinski
    • If we had taken a traditional approach, we would have used Type III SS for our analysis because we assume that the data are missing at random and we want to know if attitudes toward AA differ by gender within each management position.
        UNIANOVA dv BY gender manage
          /METHOD = SSTYPE(3)
          /PRINT = DESC.
      Tests of Between-Subjects Effects (Dependent Variable: DV)
      Source            Type III SS   df   Mean Square   F          Sig.
      Corrected Model   1427.100(a)   9    158.567       10.208     .000
      Intercept         36013.846     1    36013.846     2318.488   .000
      GENDER            260.000       1    260.000       16.738     .000
      MANAGE            687.429       4    171.857       11.064     .000
      GENDER * MANAGE   268.351       4    67.088        4.319      .007
      Error             466.000       30   15.533
      Total             40706.000     40
      Corrected Total   1893.100      39
      a. R Squared = .754 (Adjusted R Squared = .680)
      o We find a significant gender by management position interaction, F(4,30) = 4.32, p < .01
      o We would be required to perform follow-up tests before interpreting the main effects for gender and management.
      [Figure: Attitude Toward Affirmative Action. Attitude (y-axis, 20 to 45) by Gender (Female, Male), with separate lines for MM - Minor, MM - Major, UM - Minor, UM - Major, and CEO]
  • 22. 9-22  2004 A. Karpinski
    ANOVA designs with random effects
    6. Fixed effects vs. random effects
    • Model I: The fixed effects model
      o A fixed effect is one in which the experimenter is only interested in the levels of the IV that are included in the study
      o In advance of the study, the experimenter decides to examine a relatively small set of treatments. Each treatment of interest is included in the study. The experimenter wishes to make inferences about those treatments and no others.
      o The effect is fixed in that if someone were to replicate the study, the identical treatments would be used
      o Example of a fixed effects model: An advertising company wants to examine the effectiveness of five different billboards in both men and women, and in White-Americans, Black-Americans, Asian-Americans, and Hispanic-Americans.
        • This design is a 5*2*4 between subjects, fixed effects ANOVA
            Factor 1: Advertisement (5 different billboards)
            Factor 2: Gender (Men and Women)
            Factor 3: Ethnicity (4 ethnic groups)
        • Each of these factors is fixed. If the design were to be replicated, the exact same ads, genders, and ethnicities would be used. The experimenter wants to make inferences regarding only these ads, genders, and ethnicities.
        • (The exact same participants would not be used – participants are always a random effect)
      $Y_{ijkl} = \mu + \alpha_j + \beta_k + \gamma_l + (\alpha\beta)_{jk} + (\alpha\gamma)_{jl} + (\beta\gamma)_{kl} + (\alpha\beta\gamma)_{jkl} + \varepsilon_{ijkl}$
  • 23. 9-23  2004 A. Karpinski
    • Model II: The random effects model
      o A random effect is one in which the factor levels are randomly sampled from a population. Inferences are made not only about the factor levels included in the study, but about the entire population of factor levels.
      o The effect is random in that if someone were to replicate the study, different treatments would be sampled from the population.
      o Example of a random effects model: A company owns several hundred retail stores throughout the country, and it wants to examine the effectiveness of a new sales promotion. Five stores are randomly sampled. The sales promotion is implemented in each store for a trial period and then evaluated.
        • This design is a 1-factor between-subjects, random effects ANOVA
            Factor 1: Store (5 stores)
        • The store factor is a random factor. If the design were to be replicated, five different stores would be randomly sampled from the population. The experimenter wants to make inferences regarding the effectiveness of the sales promotion in all stores, not just the five included in the study.
    • Model III: Mixed model
      o A mixed model is a model containing at least one fixed effect and at least one random effect
        In psychology many people refer to a design with at least one between-subjects factor and at least one within-subjects factor as a mixed design. Although this terminology is common in psychology, it is inconsistent with the statistical usage of the term. Consistent with the statistical usage, we will reserve the term mixed model for a model with fixed and random factors
  • 24. 9-24  2004 A. Karpinski
      o Example of a mixed model: To investigate the effect of mental activity on blood flow to the brain (BF), participants completed a math test, a reading comprehension test, or a history test. The experimenter wanted to generalize the results to a classroom setting, and reasoned that different classrooms might have different effects on baseline BF. Thus, six fifth grade classrooms were selected at random from the Philadelphia public school system. The students in each class were randomly assigned to the math test, the reading comprehension test, or the history test. Post-test BF readings were taken on all participants.
        • This design is a 2-factor between-subjects, mixed model ANOVA
            Factor 1: Test (Math, Reading Comprehension, or History)
            Factor 2: Classroom (6 classrooms)
        • The test factor is a fixed factor. These three kinds of tasks are the only tasks of interest to the experimenter. The classroom factor is a random factor. If the design were to be replicated, six different classrooms would be randomly sampled from the population.
    • The key idea of the random effects model is that you not only take into account random noise, $\sigma^2_{\varepsilon}$, you also take into account the variability due to the sampling of the factor levels, $\sigma^2_{\alpha}$
    7. Model II: One-factor random effects model
    • Let’s consider the sales effectiveness example in more detail
      Store
      1      2      3      4      5
      5.80   6.00   6.30   6.40   5.70
      5.10   6.10   5.50   6.40   5.90
      5.70   6.60   5.70   6.50   6.50
      5.90   6.50   6.00   6.10   6.30
      5.60   5.90   6.10   6.60   6.20
      5.40   5.90   6.20   5.90   6.40
      5.30   6.40   5.80   6.70   6.00
      5.20   6.30   5.60   6.00   6.30
      $\bar{X}_1 = 5.50$   $\bar{X}_2 = 6.21$   $\bar{X}_3 = 5.90$   $\bar{X}_4 = 6.33$   $\bar{X}_5 = 6.16$
  • 25. 9-25  2004 A. Karpinski
    • For a random effects model, we need to check some additional assumptions, compared to the fixed-effects model
      o Fixed effects assumptions:
        • All observations are drawn from normally distributed populations
        • All observations have a common variance
        • All observations are independent and are randomly sampled from the population
      o Random effects assumptions:
        • All treatment effects are drawn from normally distributed populations
        • All treatment effects are independent and are randomly sampled from the population
      o In general, we cannot check these random effects assumptions in the data. We must infer them from the design.
        EXAMINE VARIABLES=dv BY store
          /PLOT BOXPLOT NPPLOT SPREADLEVEL.
      [Figure: boxplots of DV (y-axis, 4.5 to 7.0) by STORE (1.00 to 5.00), N = 8 per store]
      Tests of Normality (Shapiro-Wilk), DV by STORE
      STORE   Statistic   df   Sig.
      1.00    .950        8    .716
      2.00    .913        8    .373
      3.00    .950        8    .716
      4.00    .930        8    .516
      5.00    .946        8    .667
      Test of Homogeneity of Variance (DV)
      Levene Statistic   df1   df2   Sig.
      .073               4     35    .990
  • 26. 9-26  2004 A. Karpinski
    • The structural model for a oneway random effects model looks similar to a fixed model
      o Fixed effects model:
          $Y_{ij} = \mu + \alpha_j + \varepsilon_{ij}$, with $\varepsilon_{ij} \sim N(0, \sigma_{\varepsilon})$
      o Random effects model:
          $Y_{ij} = \mu + \alpha_{\sigma j} + \varepsilon_{ij}$, with $\varepsilon_{ij} \sim N(0, \sigma_{\varepsilon})$ and $\alpha_{\sigma j} \sim N(0, \sigma_{\alpha})$
          So that $\sigma^2_Y = \sigma^2_{\varepsilon} + \sigma^2_{\alpha}$
    • Random effects are denoted with a subscript σ to highlight that they are random. That is, the $\alpha_{\sigma j}$'s are not fixed at a level, but have a distribution.
    • In general, we are not interested in estimating the $\alpha_{\sigma j}$'s because they vary from study to study. It is much more informative to estimate the distribution of the $\alpha_{\sigma j}$'s: $\alpha_{\sigma j} \sim N(0, \sigma_{\alpha})$
    • When we estimate effects, we will want to estimate $\sigma^2_{\alpha}$
    • ANOVA table for a random-effects model
      o Recall the ANOVA table for the fixed-effects model
          $H_0: \alpha_1 = \alpha_2 = \dots = \alpha_a = 0$
          Source           SS      df    MS            E(MS)                                                          F
          Between          SSBet   a-1   SSBet/DFBet   $\sigma^2_{\varepsilon} + \frac{\sum n_i \alpha_i^2}{a-1}$     MSBet/MSW
          Within (Error)   SSW     N-a   SSW/DFW       $\sigma^2_{\varepsilon}$
          Total            SST     N-1
      o A valid F-test for a factor is constructed so that:
        • When the null hypothesis is true, the expected F-value is 1
            If H0 is true: $\frac{\sum n_i \alpha_i^2}{a-1} = 0$
            Then $F = \frac{MSBet}{MSW} = \frac{\sigma^2_{\varepsilon} + \frac{\sum n_i \alpha_i^2}{a-1}}{\sigma^2_{\varepsilon}} = \frac{\sigma^2_{\varepsilon}}{\sigma^2_{\varepsilon}} = 1$
  • 27. 9-27  2004 A. Karpinski
        • When the alternative hypothesis is true, the expected F-value is greater than 1 and this increase is only due to the factor of interest
            If H1 is true: $\frac{\sum n_i \alpha_i^2}{a-1} > 0$
            Then $F = \frac{MSBet}{MSW} = \frac{\sigma^2_{\varepsilon} + \frac{\sum n_i \alpha_i^2}{a-1}}{\sigma^2_{\varepsilon}} > 1$
      o Now the ANOVA table for the random-effects model
          $H_0: \sigma^2_{\alpha} = 0$
          Source           SS      df    MS            E(MS)                                             F
          Between          SSBet   a-1   SSBet/DFBet   $\sigma^2_{\varepsilon} + n\sigma^2_{\alpha}$     MSBet/MSW
          Within (Error)   SSW     N-a   SSW/DFW       $\sigma^2_{\varepsilon}$
          Total            SST     N-1
      o Although the F-tests are constructed in the same manner as in a fixed effects model, under the hood different components are being estimated
        • When the null hypothesis is true, the expected F-value is 1
            If H0 is true: $\sigma^2_{\alpha} = 0$
            Then $F = \frac{MSBet}{MSW} = \frac{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha}}{\sigma^2_{\varepsilon}} = \frac{\sigma^2_{\varepsilon}}{\sigma^2_{\varepsilon}} = 1$
        • When the alternative hypothesis is true, the expected F-value is greater than 1 and this increase is only due to the factor of interest
            If H1 is true: $\sigma^2_{\alpha} > 0$
            Then $F = \frac{MSBet}{MSW} = \frac{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha}}{\sigma^2_{\varepsilon}} > 1$
  • 28. 9-28  2004 A. Karpinski
    • Random Effects in SPSS
        UNIANOVA dv BY store
          /RANDOM = store.
      Tests of Between-Subjects Effects (Dependent Variable: DV)
      Source                   Type III SS   df   Mean Square    F          Sig.
      Intercept   Hypothesis   1449.616      1    1449.616       1665.507   .000
                  Error        3.482         4    .870(a)
      STORE       Hypothesis   3.482         4    .870           10.717     .000
                  Error        2.843         35   8.121E-02(b)
      a. MS(STORE)
      b. MS(Error)
      o To test the effect of store: F(4, 35) = 10.72, p < .01
      o We reject the null hypothesis of no store effect and conclude that the effectiveness of the sales campaign varies by store
        • If store had been a fixed effect, we would conduct post-hoc tests to determine how the stores differed.
        • But when store is a random effect, we are not interested in differences between specific stores used in the study. We only want to know if the store variable adds any variance to the DV (or accounts for any variance in the DV). In general, we are not interested in post-hoc tests on the levels of a random variable.
  • 29. 9-29  2004 A. Karpinski
      o For any random effects model, SPSS also provides us with the E(MS) so that we can see how the F-test was constructed:
      Expected Mean Squares(a)
                  Variance Component
      Source      Var(STORE)   Var(Error)   Quadratic Term
      Intercept   8.000        1.000        Intercept
      STORE       8.000        1.000
      Error       .000         1.000
      a. For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
          E(MS_STORE) = 8*VAR(STORE) + VAR(ERROR)
          VAR(STORE) = $\sigma^2_{\alpha}$ and VAR(ERROR) = $\sigma^2_{\varepsilon}$
          $E(MS_{STORE}) = 8\sigma^2_{\alpha} + \sigma^2_{\varepsilon}$
    • We can use this information to estimate the variance components
      ⇒ To estimate the error variance
          $\hat{\sigma}^2_{\varepsilon} = MSW = .08$
      ⇒ To estimate the variance of the store effect
          $E(MS_{STORE}) = 8\sigma^2_{\alpha} + \sigma^2_{\varepsilon}$
          So that with a little algebra, we obtain:
          $\hat{\sigma}^2_{\alpha} = \frac{MS_{STORE} - MSW}{8} = \frac{.87 - .08}{8} = .10$
      ⇒ To estimate total variance
          $\hat{\sigma}^2_Y = \hat{\sigma}^2_{\varepsilon} + \hat{\sigma}^2_{\alpha} = .08 + .10 = .18$
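The one-factor variance components can be recomputed directly from the store data. The following is a rough sketch rather than a replacement for the SPSS output: it assumes the balanced design shown on p. 9-24 (8 observations per store), and the variable names are illustrative only.

```python
# Sales data by store, taken from the table on p. 9-24
stores = {
    1: [5.80, 5.10, 5.70, 5.90, 5.60, 5.40, 5.30, 5.20],
    2: [6.00, 6.10, 6.60, 6.50, 5.90, 5.90, 6.40, 6.30],
    3: [6.30, 5.50, 5.70, 6.00, 6.10, 6.20, 5.80, 5.60],
    4: [6.40, 6.40, 6.50, 6.10, 6.60, 5.90, 6.70, 6.00],
    5: [5.70, 5.90, 6.50, 6.30, 6.20, 6.40, 6.00, 6.30],
}

def mean(xs):
    return sum(xs) / len(xs)

a = len(stores)                          # number of sampled stores
n = len(stores[1])                       # observations per store (balanced)
grand = mean([x for xs in stores.values() for x in xs])

ss_between = n * sum((mean(xs) - grand) ** 2 for xs in stores.values())
ss_within = sum((x - mean(xs)) ** 2 for xs in stores.values() for x in xs)

ms_between = ss_between / (a - 1)
ms_within = ss_within / (a * n - a)
f_ratio = ms_between / ms_within         # tests H0: sigma^2_alpha = 0

# E(MS_between) = n * sigma^2_alpha + sigma^2_eps, so solve for sigma^2_alpha.
# A negative estimate would be set to zero.
var_error = ms_within
var_store = max(0.0, (ms_between - ms_within) / n)
var_total = var_error + var_store

print("MS store = %.3f, MSW = %.4f, F = %.3f" % (ms_between, ms_within, f_ratio))
print("var(store) = %.3f, var(error) = %.3f, total = %.3f"
      % (var_store, var_error, var_total))
```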
  • 30. 9-30  2004 A. Karpinski
    8. Model II: Two-factor random effects model
    • An Example: Suppose a projective test involves 10 cards administered to a patient, and the number of responses to each card is recorded. The developer of the test suspects that the order of the cards might influence the number of responses. Furthermore, the developer has created a standardized set of instructions in hopes that the effect of the administrator will be negligible. To test these assumptions about the test, the developer randomly selects four possible orders of the ten cards. Four administrators are recruited to give each order of the test to two patients
                Administrator
      Order     1        2        3        4
      1         26 15    30 33    25 23    28 30
      2         26 24    25 33    27 17    27 26
      3         33 27    26 32    30 24    31 26
      4         36 28    37 42    37 33    39 25
      [Figure: boxplots of DV (y-axis, 10 to 50) by ADMIN (1.00 to 4.00), clustered by ORDER (1.00 to 4.00), N = 2 per cell]
    • With 2 observations/cell, this example is obviously for pedagogical purposes only. Due to the limited number of observations per cell, we will assume that the assumptions are satisfied.
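  • 31. 9-31  2004 A. Karpinski
    • The structural model for this design:
        $Y_{ijk} = \mu + \alpha_{\sigma j} + \beta_{\sigma k} + (\alpha\beta)_{\sigma jk} + \varepsilon_{ijk}$
        $\varepsilon_{ijk} \sim N(0, \sigma_{\varepsilon})$   $\alpha_{\sigma j} \sim N(0, \sigma_{\alpha})$   $\beta_{\sigma k} \sim N(0, \sigma_{\beta})$   $(\alpha\beta)_{\sigma jk} \sim N(0, \sigma_{\alpha\beta})$
        So that $\sigma^2_Y = \sigma^2_{\varepsilon} + \sigma^2_{\alpha} + \sigma^2_{\beta} + \sigma^2_{\alpha\beta}$
    • ANOVA table for a random-effects model
      o The test of each factor is examining a different variance component
          Main effect for Administrator: $H_0: \sigma^2_{\alpha} = 0$
          Main effect for Order: $H_0: \sigma^2_{\beta} = 0$
          Administrator by Order interaction: $H_0: \sigma^2_{\alpha\beta} = 0$
      o In the two factor random effects model, we need to be much more careful about examining the E(MS) and constructing appropriate tests of each effect.
          Source           SS     df            MS          E(MS)                                                                            F
          Factor A         SSA    a-1           SSA/DFA     $\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta} + nb\sigma^2_{\alpha}$        MSA/MSAB
          Factor B         SSB    b-1           SSB/DFB     $\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta} + na\sigma^2_{\beta}$         MSB/MSAB
          A * B            SSAB   (a-1)*(b-1)   SSAB/DFAB   $\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta}$                               MSAB/MSW
          Within (Error)   SSW    N-ab          SSW/DFW     $\sigma^2_{\varepsilon}$
          Total            SST    N-1
      o For multi-factor random effects ANOVA, you must always examine the expected MS to make sure you are using the correct error term!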
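  • 32. 9-32  2004 A. Karpinski
    • To construct a test for Factor A or Factor B, we must use the MS from the interaction as the error term
        For example, let’s consider Factor A
        If H0 is true: $\sigma^2_{\alpha} = 0$
        Then $F = \frac{MSA}{MSAB} = \frac{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta} + nb\sigma^2_{\alpha}}{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta}} = \frac{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta}}{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta}} = 1$
        If H1 is true: $\sigma^2_{\alpha} > 0$
        Then $F = \frac{MSA}{MSAB} = \frac{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta} + nb\sigma^2_{\alpha}}{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta}} > 1$
        Suppose we tried to construct an F-test using the MSW
        If H0 is true: $\sigma^2_{\alpha} = 0$
        Then $F = \frac{MSA}{MSW} = \frac{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta} + nb\sigma^2_{\alpha}}{\sigma^2_{\varepsilon}} = \frac{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta}}{\sigma^2_{\varepsilon}} > 1$
        F would be greater than 1 (whenever $\sigma^2_{\alpha\beta} > 0$), even when the null hypothesis was true! This test is not a test for the effect of factor A!!!
    • To construct a test for the AB interaction, we must use the MSW as the error term
        If H0 is true: $\sigma^2_{\alpha\beta} = 0$
        Then $F = \frac{MSAB}{MSW} = \frac{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta}}{\sigma^2_{\varepsilon}} = \frac{\sigma^2_{\varepsilon}}{\sigma^2_{\varepsilon}} = 1$
        If H1 is true: $\sigma^2_{\alpha\beta} > 0$
        Then $F = \frac{MSAB}{MSW} = \frac{\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta}}{\sigma^2_{\varepsilon}} > 1$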
  • 33. 9-33  2004 A. Karpinski
    • Using SPSS to analyze a two-factor random effects design
        UNIANOVA dv BY admin order
          /RANDOM = admin order.
      Tests of Between-Subjects Effects (Dependent Variable: DV)
      Source                       Type III SS   df      Mean Square   F         Sig.
      Intercept       Hypothesis   26507.531     1       26507.531     155.441   .000
                      Error        716.173       4.200   170.531(a)
      ADMIN           Hypothesis   151.094       3       50.365        3.446     .065
                      Error        131.531       9       14.615(b)
      ORDER           Hypothesis   404.344       3       134.781       9.222     .004
                      Error        131.531       9       14.615(b)
      ADMIN * ORDER   Hypothesis   131.531       9       14.615        .631      .755
                      Error        370.500       16      23.156(c)
      a. MS(ADMIN) + MS(ORDER) - MS(ADMIN * ORDER)
      b. MS(ADMIN * ORDER)
      c. MS(Error)
      o SPSS highlights the fact that it is using different error terms to test each factor
      o We conclude:
        • There is a significant effect of order of the test on number of responses, F(3,9) = 9.22, p < .01
        • There is also a marginally significant effect of administrator on the number of responses, F(3,9) = 3.45, p = .07
        • But there is no order by administrator interaction effect on the number of responses, F(9,16) = 0.63, p = .76.
  • 34. 9-34  2004 A. Karpinski
      o SPSS also gives us information on the E(MS) so that we can calculate the variance components
      Expected Mean Squares(a,b)
                      Variance Component
      Source          Var(ADMIN)   Var(ORDER)   Var(ADMIN * ORDER)   Var(Error)   Quadratic Term
      Intercept       8.000        8.000        2.000                1.000        Intercept
      ADMIN           8.000        .000         2.000                1.000
      ORDER           .000         8.000        2.000                1.000
      ADMIN * ORDER   .000         .000         2.000                1.000
      Error           .000         .000         .000                 1.000
      a. For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
      b. Expected Mean Squares are based on the Type III Sums of Squares.
      ⇒ To estimate the error variance
          $\hat{\sigma}^2_{\varepsilon} = MSW = 23.16$
      ⇒ To estimate the variance of the interaction effect
          $E(MS_{Admin*Order}) = 2\sigma^2_{\alpha\beta} + \sigma^2_{\varepsilon}$
          So that with a little algebra, we obtain:
          $\hat{\sigma}^2_{\alpha\beta} = \frac{MS_{Admin*Order} - MSW}{2} = \frac{14.615 - 23.156}{2} = -4.27$, which is set to 0
      ⇒ To estimate the variance of the administrator effect
          $E(MS_{Admin}) = 8\sigma^2_{\alpha} + 2\sigma^2_{\alpha\beta} + \sigma^2_{\varepsilon} = 8\sigma^2_{\alpha} + E(MS_{Admin*Order})$
          So that with a little algebra, we obtain:
          $\hat{\sigma}^2_{\alpha} = \frac{MS_{Admin} - MS_{Admin*Order}}{8} = \frac{50.365 - 14.615}{8} = 4.47$
      ⇒ To estimate the variance of the order effect
          $E(MS_{Order}) = 8\sigma^2_{\beta} + 2\sigma^2_{\alpha\beta} + \sigma^2_{\varepsilon} = 8\sigma^2_{\beta} + E(MS_{Admin*Order})$
          So that with a little algebra, we obtain:
          $\hat{\sigma}^2_{\beta} = \frac{MS_{Order} - MS_{Admin*Order}}{8} = \frac{134.781 - 14.615}{8} = 15.02$
      ⇒ To estimate total variance
          $\hat{\sigma}^2_Y = \hat{\sigma}^2_{\varepsilon} + \hat{\sigma}^2_{\alpha} + \hat{\sigma}^2_{\beta} + \hat{\sigma}^2_{\alpha\beta} = 23.16 + 4.47 + 15.02 + 0 = 42.65$
    • Note that any component that is estimated to be less than zero is assumed to have a value of zero
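The mean squares, the F-ratios with their different error terms, and the variance components for the card-order example can all be recomputed from the raw scores. The sketch below assumes that each row of the data table on p. 9-30 lists the two patients for Administrator 1, then Administrator 2, and so on (this reading reproduces the SS in the SPSS output). It handles balanced designs only, and the variable names are illustrative.

```python
# scores[order][administrator] holds the two patients in that cell (p. 9-30)
scores = [
    [(26, 15), (30, 33), (25, 23), (28, 30)],
    [(26, 24), (25, 33), (27, 17), (27, 26)],
    [(33, 27), (26, 32), (30, 24), (31, 26)],
    [(36, 28), (37, 42), (37, 33), (39, 25)],
]
b = len(scores)            # levels of Order
a = len(scores[0])         # levels of Administrator
n = len(scores[0][0])      # patients per cell

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

grand = mean(x for row in scores for cell in row for x in cell)
order_means = [mean(x for cell in row for x in cell) for row in scores]
admin_means = [mean(x for row in scores for x in row[j]) for j in range(a)]

ss_order = n * a * sum((m - grand) ** 2 for m in order_means)
ss_admin = n * b * sum((m - grand) ** 2 for m in admin_means)
ss_cells = n * sum((mean(cell) - grand) ** 2 for row in scores for cell in row)
ss_inter = ss_cells - ss_order - ss_admin
ss_error = sum((x - mean(cell)) ** 2
               for row in scores for cell in row for x in cell)

ms_admin = ss_admin / (a - 1)
ms_order = ss_order / (b - 1)
ms_inter = ss_inter / ((a - 1) * (b - 1))
ms_error = ss_error / (a * b * (n - 1))

# Both factors random: main effects are tested against MS(interaction),
# the interaction against MS(error).
f_admin = ms_admin / ms_inter
f_order = ms_order / ms_inter
f_inter = ms_inter / ms_error

# Variance components from the E(MS); negative estimates are set to zero.
var_error = ms_error
var_inter = max(0.0, (ms_inter - ms_error) / n)
var_admin = max(0.0, (ms_admin - ms_inter) / (n * b))
var_order = max(0.0, (ms_order - ms_inter) / (n * a))

print("F admin = %.3f, F order = %.3f, F interaction = %.3f"
      % (f_admin, f_order, f_inter))
print("var: admin = %.3f, order = %.3f, interaction = %.3f, error = %.3f"
      % (var_admin, var_order, var_inter, var_error))
```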
  • 35. 9-35  2004 A. Karpinski
      o SPSS can also compute variance components directly
        VARCOMP dv BY order admin
          /RANDOM = order admin.
      Variance Estimates
      Component            Estimate
      Var(ORDER)           15.021
      Var(ADMIN)           4.469
      Var(ORDER * ADMIN)   -4.271(a)
      Var(Error)           23.156
      Dependent Variable: DV
      Method: Minimum Norm Quadratic Unbiased Estimation (Weight = 1 for Random Effects and Residual)
      a. For the ANOVA and MINQUE methods, negative variance component estimates may occur. Some possible reasons for their occurrence are: (a) the specified model is not the correct model, or (b) the true value of the variance equals zero.
    9. Model III: Two-factor mixed models
    • Multi-factor experiments involving only random effects are relatively rare in behavioral research. It is much more common to encounter mixed models (containing both fixed and random effects) than to encounter a multi-factor random effects model
    • A return to the study on the effect of mental activity on blood flow (BF) – See p. 9-24. This design is a 2-factor between-subjects mixed model ANOVA
        Factor 1: Test (Math, Reading Comprehension, or History)
        Factor 2: Classroom (6 classrooms)
                   Task (fixed)
      Classroom
      (random)     Math         Reading Comp   History
      1            7.8  8.7     11.1 12.0      11.7 10.0
      2            8.0  9.2     11.3 10.6      9.8  11.9
      3            4.0  6.9     9.8  10.1      11.7 12.6
      4            10.3 9.4     11.4 10.5      7.9  8.1
      5            9.3  10.6    13.0 11.7      8.3  7.9
      6            9.5  9.8     12.2 12.3      8.6  10.5
  • 36. 9-36  2004 A. Karpinski
    • As with the previous example, due to the limited number of observations per cell, we will assume that the assumptions are satisfied.
      [Figure: boxplots of DV (y-axis, 2 to 14) by CLASS (1.00 to 6.00), clustered by TASK (Math, Reading, History), N = 2 per cell]
    • When considering mixed models, interactions between fixed effects and random effects are considered to be random effects.
    • The structural model for a mixed design (A fixed; B random):
        $Y_{ijk} = \mu + \alpha_j + \beta_{\sigma k} + (\alpha\beta)_{\sigma jk} + \varepsilon_{ijk}$
        $\varepsilon_{ijk} \sim N(0, \sigma_{\varepsilon})$   $\beta_{\sigma k} \sim N(0, \sigma_{\beta})$   $(\alpha\beta)_{\sigma jk} \sim N(0, \sigma_{\alpha\beta})$
        So that $\sigma^2_Y = \sigma^2_{\varepsilon} + \sigma^2_{\beta} + \sigma^2_{\alpha\beta}$
    • ANOVA table for a mixed-effects model
      o The test of each effect:
          Main effect for task: $H_0: \alpha_1 = \alpha_2 = \alpha_3 = 0$
          Main effect for class: $H_0: \sigma^2_{\beta} = 0$
          Task by class interaction: $H_0: \sigma^2_{\alpha\beta} = 0$
  • 37. 9-37  2004 A. Karpinski
      o Again, we need to consider the E(MS)s so that we construct valid F-tests.
          Source              SS     df            MS          E(MS)                                                                                     F
          Factor A (Fixed)    SSA    a-1           SSA/DFA     $\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta} + \frac{nb\sum \alpha_j^2}{a-1}$        MSA/MSAB
          Factor B (Random)   SSB    b-1           SSB/DFB     $\sigma^2_{\varepsilon} + na\sigma^2_{\beta}$                                             MSB/MSW
          A * B               SSAB   (a-1)*(b-1)   SSAB/DFAB   $\sigma^2_{\varepsilon} + n\sigma^2_{\alpha\beta}$                                        MSAB/MSW
          Within (Error)      SSW    N-ab          SSW/DFW     $\sigma^2_{\varepsilon}$
          Total               SST    N-1
        • To construct a test for Factor A (the fixed effect):
          ⇒ We must use the MS from the interaction as the error term
        • To construct a test for Factor B (a random effect):
          ⇒ We must use the MSW as the error term
        • To construct a test for the Factor AB interaction (a random effect):
          ⇒ We must use the MSW as the error term
    • Why does having a random effect change the error term of the fixed effect, but not of the random effect?
      o Consider a design with therapy (3 fixed levels) and clinical trainee (3 random levels)
      o We assume that the three trainees used in the study were drawn from a population of trainees. Imagine that we can put on our magic glasses and see population means for the therapy modes for the entire population of trainees (and for simplicity, we will assume that the population is small – consisting of 18 trainees)
                 Clinical Trainee
      Therapy    a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r   Mean
      A          7  6  5  7  6  5  4  4  4  1  2  3  4  4  4  1  2  3   4
      B          4  4  4  1  2  3  7  6  5  7  6  5  1  2  3  4  4  4   4
      C          1  2  3  4  4  4  1  2  3  4  4  4  7  6  5  7  6  5   4
      Mean       4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4   4
  • 38. 9-38  2004 A. Karpinski
      o In our study, we randomly sample 3 of the trainees. So let’s consider a random sample of three trainees
                 Clinical Trainee
      Therapy    g   k   r   Mean
      A          4   2   3   3.0
      B          7   6   4   5.67
      C          1   4   5   3.33
      Mean       4   4   4   4
      o The random trainee factor does not affect our estimation of the effect of trainee
      o The random trainee factor does affect our estimation of the therapy (the fixed factor)
        • Trainee and Therapy interact, which can cause variability among means for the fixed factor to increase
        • MS(A) must be measuring something other than just error and the effect of Therapy. When we look at the EMS for factor A, we see that it captures variability due to the A*B interaction
    • Using SPSS to analyze a two-factor mixed effects design
        UNIANOVA dv BY task class
          /RANDOM = class.
      Tests of Between-Subjects Effects (Dependent Variable: DV)
      Source                      Type III SS   df   Mean Square   F          Sig.
      Intercept      Hypothesis   3570.062      1    3570.062      2626.655   .000
                     Error        6.796         5    1.359(a)
      TASK           Hypothesis   44.042        2    22.021        3.784      .060
                     Error        58.195        10   5.820(b)
      CLASS          Hypothesis   6.796         5    1.359         .234       .939
                     Error        58.195        10   5.820(b)
      TASK * CLASS   Hypothesis   58.195        10   5.820         7.207      .000
                     Error        14.535        18   .808(c)
      a. MS(CLASS)
      b. MS(TASK * CLASS)
      c. MS(Error)
      o But wait!! SPSS is using the wrong error term for the test of the main effect of classroom!!! Classroom is a random effect. To test the random effect, we need to use MSW as the error term. SPSS is using MSAB.
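The therapy-by-trainee illustration can be pushed one step further by enumerating every possible sample of three trainees from the population table on p. 9-37. This is only a sketch of the argument (the function and variable names are hypothetical): in the population every therapy mean is 4, yet most samples of trainees show differences among the therapy means, purely because of the therapy by trainee interaction.

```python
from itertools import combinations

# Population means for each therapy, one value per trainee a..r (p. 9-37)
population = {
    "A": [7, 6, 5, 7, 6, 5, 4, 4, 4, 1, 2, 3, 4, 4, 4, 1, 2, 3],
    "B": [4, 4, 4, 1, 2, 3, 7, 6, 5, 7, 6, 5, 1, 2, 3, 4, 4, 4],
    "C": [1, 2, 3, 4, 4, 4, 1, 2, 3, 4, 4, 4, 7, 6, 5, 7, 6, 5],
}
trainees = "abcdefghijklmnopqr"

def therapy_means(sample):
    """Mean of each therapy over the sampled trainees."""
    idx = [trainees.index(t) for t in sample]
    return {th: sum(vals[i] for i in idx) / len(idx)
            for th, vals in population.items()}

# The sample used in the notes: trainees g, k, and r
example = therapy_means("gkr")
print({th: round(m, 2) for th, m in example.items()})

# Every possible sample of 3 trainees from the population of 18
all_means = [therapy_means(s) for s in combinations(trainees, 3)]
spreads = [max(m.values()) - min(m.values()) for m in all_means]
share_with_differences = sum(s > 0 for s in spreads) / len(spreads)
avg_A = sum(m["A"] for m in all_means) / len(all_means)

print("number of possible samples:", len(all_means))
print("share of samples with unequal therapy means: %.2f" % share_with_differences)
print("average of the Therapy A means over all samples: %.2f" % avg_A)
```

Averaged over all samples, each therapy mean returns to its population value, so the extra spread seen in any one sample is sampling variability that MS(A) picks up and MSAB has to absorb in the denominator.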
  • 39. 9-39  2004 A. Karpinski
      o We will have to do the correct test by hand
          Main Effect for Class: $F(5, 18) = \frac{MS_{CLASS}}{MSW} = \frac{1.36}{0.81} = 1.68$, $p = .19$
      o We can also use the TEST subcommand and ask SPSS to compute the F-test. We need to enter the effect (class), the SS of the denominator (14.54) and the df of the denominator (18)
        UNIANOVA dv BY task class
          /RANDOM = class
          /TEST = class vs 14.54 df(18).
      Test Results (Dependent Variable: DV)
      Source     Sum of Squares   df      Mean Square   F       Sig.
      Contrast   6.796            5       1.359         1.683   .190
      Error      14.540(a)        18(a)   .808
      a. User specified.
      o BEWARE! SPSS may contain other “errors.” If you are going to be analyzing balanced random or mixed designs, it is worth your time and effort to look up or calculate the E(MS)s for your design (For an algorithm see Neter, Appendix D)
      o Note: SPSS does not consider this to be an error. They state that statisticians differ in how they approach this problem.
          http://spss.com/tech/answer/details.cfm?tech_tan_id=100000073
        As indicated in this tech note, SAS makes the same “error.” Thus, even if you run the analysis in SAS, you will still have to rerun the analysis.
        I cannot find any recent texts that agree with the SPSS approach. Neter et al (1996, p 981), Kirk (1995, p 374) and Maxwell & Delaney (1990, p 429/431) all give the E(MS) I list on the previous page. For balanced designs, SPSS does the wrong analysis. For unbalanced designs, SPSS’s approach may be appropriate.
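The hand calculation for the mixed model can also be scripted, which makes it easy to compare the two candidate error terms for the classroom effect. The sketch below assumes that each row of the data table on p. 9-35 lists the two students for Math, then Reading Comp, then History (a reading that reproduces the SS in the SPSS output). It covers balanced designs only, the variable names are illustrative, and it reports F-ratios without p-values because the standard library has no F distribution.

```python
# scores[classroom][task] holds the two students in that cell (p. 9-35);
# task order: Math, Reading Comp, History
scores = [
    [(7.8, 8.7), (11.1, 12.0), (11.7, 10.0)],
    [(8.0, 9.2), (11.3, 10.6), (9.8, 11.9)],
    [(4.0, 6.9), (9.8, 10.1), (11.7, 12.6)],
    [(10.3, 9.4), (11.4, 10.5), (7.9, 8.1)],
    [(9.3, 10.6), (13.0, 11.7), (8.3, 7.9)],
    [(9.5, 9.8), (12.2, 12.3), (8.6, 10.5)],
]
b = len(scores)            # classrooms (random factor B)
a = len(scores[0])         # tasks (fixed factor A)
n = len(scores[0][0])      # students per cell

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

grand = mean(x for row in scores for cell in row for x in cell)
class_means = [mean(x for cell in row for x in cell) for row in scores]
task_means = [mean(x for row in scores for x in row[j]) for j in range(a)]

ss_class = n * a * sum((m - grand) ** 2 for m in class_means)
ss_task = n * b * sum((m - grand) ** 2 for m in task_means)
ss_cells = n * sum((mean(cell) - grand) ** 2 for row in scores for cell in row)
ss_inter = ss_cells - ss_class - ss_task
ss_error = sum((x - mean(cell)) ** 2
               for row in scores for cell in row for x in cell)

ms_task = ss_task / (a - 1)
ms_class = ss_class / (b - 1)
ms_inter = ss_inter / ((a - 1) * (b - 1))
ms_error = ss_error / (a * b * (n - 1))

f_task = ms_task / ms_inter          # fixed effect: error term is MSAB
f_class_msw = ms_class / ms_error    # random effect tested against MSW
f_class_spss = ms_class / ms_inter   # what the default SPSS output reports
f_inter = ms_inter / ms_error

print("Task:        F(%d,%d) = %.3f" % (a - 1, (a - 1) * (b - 1), f_task))
print("Class (MSW): F(%d,%d) = %.3f" % (b - 1, a * b * (n - 1), f_class_msw))
print("Class (SPSS default, MSAB): F = %.3f" % f_class_spss)
print("Task x Class: F = %.3f" % f_inter)
```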
  • 40. 9-40  2004 A. Karpinski
      o The following is a hand-corrected variance components table (based on the correct E(MS) values listed on page 9-37)
      Expected Mean Squares(a)
                     Variance Component
      Source         Var(CLASS)   Var(TASK * CLASS)   Var(Error)   Quadratic Term
      Intercept      6.000        2.000               1.000        Intercept, TASK
      TASK           .000         2.000               1.000        TASK
      CLASS          6.000        .000                1.000
      TASK * CLASS   .000         2.000               1.000
      Error          .000         .000                1.000
      a. Andy's Hand-Corrected Table
      ⇒ To estimate the error variance
          $\hat{\sigma}^2_{\varepsilon} = MSW = 0.81$
      ⇒ To estimate the variance of the interaction effect
          $E(MS_{Task*Class}) = 2\sigma^2_{\alpha\beta} + \sigma^2_{\varepsilon}$
          So that with a little algebra, we obtain:
          $\hat{\sigma}^2_{\alpha\beta} = \frac{MS_{Task*Class} - MSW}{2} = \frac{5.82 - 0.81}{2} = 2.51$
      ⇒ Task is a fixed effect – there is no variance component to estimate
      ⇒ To estimate the variance of the class effect
          $E(MS_{Class}) = 6\sigma^2_{\beta} + \sigma^2_{\varepsilon}$
          So that with a little algebra, we obtain:
          $\hat{\sigma}^2_{\beta} = \frac{MS_{Class} - MSW}{6} = \frac{1.36 - 0.81}{6} = 0.09$
      ⇒ To estimate total variance
          $\hat{\sigma}^2_Y = \hat{\sigma}^2_{\varepsilon} + \hat{\sigma}^2_{\beta} + \hat{\sigma}^2_{\alpha\beta} = 0.81 + 0.09 + 2.51 = 3.41$
      o SPSS’s VARCOMP command also errs on the variance estimate for the class effect (SPSS output not shown here)
  • 41. 9-41  2004 A. Karpinski
    10. Contrasts and post-hoc tests
    • To perform contrasts or post-hoc tests, you can use the same formulas previously discussed for ANOVA – with one exception. You must use the correct error term in place of MSW, and the degrees of freedom associated with that error term
      o If you perform contrasts/post-hoc tests on the marginal means for factor A, you need to use the error term used to test factor A
      o If you perform contrasts/post-hoc tests on the marginal means for factor B, you need to use the error term used to test factor B
      o If you perform contrasts/post-hoc tests on the individual cell means, you need to use the error term used to test the AB interaction
    11. Effect sizes for random effects designs
    • The random effects equivalent of eta squared is rho, ρ
    • Rho is interpreted just as eta squared – as the proportion of the variance in the DV accounted for by the factor in the sample
        $\rho_A = \frac{\sigma^2_A}{\sigma^2_Y}$
    • Omega squared must still be used for fixed effects in a mixed model. In general, for a fixed factor A:
        $\hat{\omega}^2_A = \frac{SSA - (df_A)\,MS[error\ term]}{SSA - (df_A)\,MS[error\ term] + (N)\,MSW}$
      o For example, in a two-factor mixed model, with A fixed and B random, we used MSAB as the error term to test Factor A. Thus, our equation for omega squared would be:
        $\hat{\omega}^2_A = \frac{SSA - (df_A)\,MSAB}{SSA - (df_A)\,MSAB + (N)\,MSW}$
        $\hat{\omega}^2_{Task} = \frac{44.04 - (2)5.82}{44.04 - (2)5.82 + (36)(.808)} = .53$
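The omega squared value for Task is plain arithmetic on numbers already reported in the SPSS output, so it is easy to verify. A minimal sketch follows; the function name `omega_squared_fixed` is hypothetical.

```python
def omega_squared_fixed(ss_a, df_a, ms_error_term, n_total, ms_within):
    """Omega squared for a fixed factor, using the formula given in the notes."""
    numerator = ss_a - df_a * ms_error_term
    return numerator / (numerator + n_total * ms_within)

# Task effect in the mixed model: the error term is MS(TASK * CLASS)
omega_task = omega_squared_fixed(
    ss_a=44.04, df_a=2, ms_error_term=5.82, n_total=36, ms_within=0.808
)
print(round(omega_task, 2))
```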
  • 42. 9-42  2004 A. Karpinski 12.Final considerations about random effects • The distinction between fixed and random effects is not always as clear as presented here. For example, Clark (1973) argued – convincingly – that when a list of words is used in a study, the words should be treated as a random effect. The key is what type of inference you want to make • We consider the random effects as being sampled from an infinite population. If the population is finite but large, we are OK. However, when the population to be sampled from is small, adjustments are necessary • We estimate the distribution of the random effects based on the means (and the variability of those means) of the random factor. If you only have 2-3 levels of your random factor, you will not get a good estimate of the distribution. It is desirable to have a relatively large number of levels of any random factor. In addition, it is important that the levels of the random factor be randomly sampled from the population of interest • In designs with three or more factors that include two or more random effects, it is common to encounter situations where no exact F-test can be constructed. In this case, quasi-F ratios (linear combinations of MSs) are used to approximate an F-ratio. • All of our calculations assume that cell sizes are equal. Things get very wacky with unequal cell sizes, and it is no longer possible to construct exact F-tests (the ratios of expected MSs no longer satisfy the requirements for a valid F-test). Approximate tests are available and are calculated in SPSS. • It is a good idea to calculate or look-up E(MS)s for balanced designs and/or to replicate the analysis using another statistical package.
  • 43. 9-43
ANOVA designs with nested effects
13. An introduction to nested designs
• Nested designs are also known as hierarchical designs
• The factorial designs studied thus far are considered to be crossed designs. That is, every level of a factor appears in (or is crossed with) every level of all other factors. If you display the design in a grid, there are no empty cells in a crossed design.
• Example #1: The effect of therapist's sex on treatment outcome. You observed three male and three female therapists. Each therapist sees four patients, and you record a general measure of psychological health.

Sex of therapist    Male         Female
Therapist           1  2  3      4  5  6

o Sex is the main variable of interest and is a fixed effect
o Therapist is nested within sex (it cannot be crossed because a therapist cannot be both male and female). Therapist will also be considered a random effect
o Each therapist sees four patients. Thus, patients are nested within therapist (and are a random effect)
• Example #2: The effect of race of defendant on jury decision making

Race of defendant   Black               White
Jury                1  2  3  4  5  6    7  8  9  10  11  12

o Race is the main variable of interest and is a fixed effect
o Jury is nested within race. Jury will most likely be considered a random effect
o Each jury is composed of 12 participants. The participants are nested within jury (and are also a random effect)
  • 44. 9-44
• Example #3: A new intervention is developed to reduce drug use in inner-city middle-school students. Six inner-city schools are selected at random; three receive the new intervention and three receive the old intervention. Within each of those schools, four classrooms are selected at random to participate.

Old intervention
School       School A      School B      School C
Classroom    1  2  3  4    5  6  7  8    9  10  11  12

New intervention
School       School D      School E      School F
Classroom    1  2  3  4    5  6  7  8    9  10  11  12

o Type of intervention is a fixed effect
o School is a random effect nested within treatment
o Classroom is a random effect nested within school
o The participants are a random effect nested within classroom
• General comments about nested designs
o In behavioral research, nested factors are usually random effects
o In factorial between-subjects designs, participants are nested within cell
• Because I am presenting only an introduction to nested designs, I will consider only designs with random effects nested within a fixed effect (like these examples). I can provide references for the analysis of more advanced designs.
  • 45. 9-45
14. Structural models for nested designs
• Example #1: Therapist's sex and treatment outcome
o Factor A: Therapist's sex (Male vs. Female), fixed effect
o Factor B: Therapist, random effect
$Y_{ijk} = \mu + \alpha_j + \sigma_{\beta/\alpha\,(jk)} + \varepsilon_{(jk)i}$
$\alpha_j$: The fixed effect of therapist's sex
$\sigma_{\beta/\alpha\,(jk)}$: The random effect of therapist within sex
$\varepsilon_{(jk)i}$: The errors/residuals, AKA the random effect of participant within therapist. Sometimes notated $\sigma_{\pi/\beta\,(jk)i}$ to emphasize the nesting
• Example #3: Drug use intervention
o Factor A: Intervention, fixed effect
o Factor B: School within intervention, random effect
o Factor C: Classroom within school, random effect
$Y_{ijkl} = \mu + \alpha_j + \sigma_{\beta/\alpha\,(jk)} + \sigma_{\gamma/\beta\,(jkl)} + \varepsilon_{(jkl)i}$
$\alpha_j$: The fixed effect of intervention
$\sigma_{\beta/\alpha\,(jk)}$: The random effect of school within intervention
$\sigma_{\gamma/\beta\,(jkl)}$: The random effect of class within school
$\varepsilon_{(jkl)i}$: The errors/residuals, AKA the random effect of participant within class. Sometimes notated $\sigma_{\pi/\gamma\,(jkl)i}$ to emphasize the nesting
• Note that because these designs are nested, not crossed, there is no way to estimate an interaction effect.
  • 46. 9-46
15. Testing nested effects
• With nested effects, we again need to make sure we use the correct error term when constructing F-tests.

Design                                           Effect   Error Term
Two-factor B/A (A fixed, B random)               A        MS(B/A)
                                                 B/A      MSW
Three-factor C/B/A (A fixed, B and C random)     A        MS(B/A)
                                                 B/A      MS(C/B)
                                                 C/B      MSW

o Just as for the random effects designs, the SS are calculated in the same manner as before. The only difference is the construction of the F-test
o For more complex designs, you'll have to look up the error term, or trust SPSS
• Example #1: Therapist's sex and treatment outcome

Sex of Therapist    Male            Female
Therapist           1    2    3     4    5    6
                    49   42   42    54   44   57
                    40   48   46    60   54   62
                    31   52   50    64   54   66
                    40   58   54    70   64   71

o To test the effect of sex of therapist, we treat each therapist as one observation (collapsing across participants)

Sex of Therapist    Male            Female
Therapist mean      40   50   48    62   54   64

A one-factor ANOVA on these six observations would have:
1 df in the numerator
4 df in the denominator
This is essentially how the effect of sex of therapist is analyzed in a nested design
  • 47. 9-47
o SPSS syntax:
UNIANOVA dv BY sex thera
  /RANDOM = thera
  /DESIGN = sex thera within sex .

Tests of Between-Subjects Effects (Dependent Variable: DV)
Source                    Type III SS   df   Mean Square   F         Sig.
Intercept    Hypothesis   67416.000     1    67416.000     601.929   .000
             Error        448.000       4    112.000 (a)
SEX          Hypothesis   1176.000      1    1176.000      10.500    .032
             Error        448.000       4    112.000 (a)
THERA(SEX)   Hypothesis   448.000       4    112.000       2.459     .083
             Error        820.000       18   45.556 (b)
a. MS(THERA(SEX))
b. MS(Error)

Effect for sex of therapist: F(1,4) = 10.50, p = .03
Effect of therapist: F(4,18) = 2.46, p = .08
o Let's do the one-factor ANOVA on the collapsed data to examine the effect of sex of therapist

Sex of Therapist    Male            Female
Therapist mean      40   50   48    62   54   64

Descriptives (DV)
Group    N   Mean
1.00     3   46.0000
2.00     3   60.0000
Total    6   53.0000

ANOVA (DV)
Source           Sum of Squares   df   Mean Square   F        Sig.
Between Groups   294.000          1    294.000       10.500   .032
Within Groups    112.000          4    28.000
Total            406.000          5

• This analysis produces the same results; only the SS are different. This analysis was tricked into thinking each observation was one participant, but in the actual analysis, we know that each 'observation' was based on data from four participants. If you multiply the SS in this oneway analysis by 4, you will get the same results as the nested analysis. (This trick only works for balanced designs)
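The nested sums of squares and F-tests reported by SPSS can be reproduced from the raw scores with a few lines of Python. This is a minimal sketch under the balanced-design assumption used throughout these notes; the dictionary layout and variable names are illustrative choices, not anything SPSS produces.

```python
from statistics import mean

# Four patient scores per therapist, grouped by sex of therapist (therapists 1-3, 4-6).
data = {
    "male": [[49, 40, 31, 40], [42, 48, 52, 58], [42, 46, 50, 54]],
    "female": [[54, 60, 64, 70], [44, 54, 54, 64], [57, 62, 66, 71]],
}

a = len(data)                       # levels of sex (fixed)
b = len(data["male"])               # therapists per sex (random, nested)
n = len(data["male"][0])            # patients per therapist
grand_mean = mean(y for groups in data.values() for t in groups for y in t)

ss_sex = ss_thera = ss_error = 0.0
for groups in data.values():
    sex_mean = mean(y for t in groups for y in t)
    ss_sex += b * n * (sex_mean - grand_mean) ** 2
    for therapist in groups:
        thera_mean = mean(therapist)
        ss_thera += n * (thera_mean - sex_mean) ** 2          # therapist within sex
        ss_error += sum((y - thera_mean) ** 2 for y in therapist)

df_sex, df_thera, df_error = a - 1, a * (b - 1), a * b * (n - 1)
ms_sex, ms_thera, ms_error = ss_sex / df_sex, ss_thera / df_thera, ss_error / df_error

# Sex is tested against therapist-within-sex; therapist is tested against MSW.
f_sex = ms_sex / ms_thera
f_thera = ms_thera / ms_error
print(f"Sex:        F({df_sex},{df_thera}) = {f_sex:.2f}")
print(f"Thera(Sex): F({df_thera},{df_error}) = {f_thera:.2f}")
```

The key line is the choice of denominator: the fixed effect of sex is divided by MS(Thera(Sex)), not by MSW, which is why only 4 error degrees of freedom are available for that test.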
  • 48. 9-48
o To calculate the effect sizes:
• Sex is a fixed effect, so we need to calculate omega squared
$\hat{\omega}_A^2 = \frac{SSA - (dfA)\,MS[\text{error term}]}{SSA - (dfA)\,MS[\text{error term}] + (N)\,MSW}$
$\hat{\omega}_{Sex}^2 = \frac{1176 - (1)112}{1176 - (1)112 + (24)45.56} = .49$
• Therapist within sex is a random effect, so we need to calculate rho
$\rho_{Thera(sex)} = \frac{\sigma_{Thera(sex)}^2}{\sigma_Y^2}$

Expected Mean Squares
Source        Var(THERA(SEX))   Var(Error)   Quadratic Term
Intercept     4.000             1.000        Intercept, SEX
SEX           4.000             1.000        SEX
THERA(SEX)    4.000             1.000
Error         .000              1.000

$E(MS_{Thera(sex)}) = 4\sigma_{Thera(sex)}^2 + \sigma_{\varepsilon}^2$
$\hat{\sigma}_{Thera(sex)}^2 = \frac{MS_{Thera(sex)} - MSW}{4} = \frac{112 - 45.56}{4} = 16.61$
$\sigma_Y^2 = \sigma_{Thera(sex)}^2 + \sigma_{\varepsilon}^2$
$\hat{\sigma}_Y^2 = 16.61 + 45.56 = 62.17$
$\hat{\rho}_{Thera(sex)} = \frac{\hat{\sigma}_{Thera(sex)}^2}{\hat{\sigma}_Y^2} = \frac{16.61}{62.17} = .27$
  • 49. 9-49
• Example #3: Drug use intervention (Let's assume that there were three students in each class)

Old Intervention
School      School 1                    School 2                    School 3
Class       1      2      3      4      1      2      3      4      1      2      3      4
            11.2   16.5   18.3   19     7.3    11.9   11.3   8.9    15.3   19.5   14.1   16.5
            11.6   16.8   18.7   18.5   7.8    12.4   10.9   9.4    15.9   20.1   13.8   17.2
            12.0   16.1   19.0   18.2   7.0    12.0   10.5   9.3    16.0   19.3   14.2   16.9

New Intervention
School      School 1                    School 2                    School 3
Class       1      2      3      4      1      2      3      4      1      2      3      4
            13.2   17.25  20.3   20.5   9.3    12.9   10.3   10.9   17.55  20.75  15.1   18.75
            12.35  18.8   18.45  17.5   7.05   14.65  12.15  8.15   14.9   22.1   14.55  17.2
            13.25  15.85  21.0   19.2   8.5    14.25  10.0   11.55  17.75  21.3   13.7   16.9

o To gain an intuitive understanding of how nested effects are tested, it is beneficial to examine each effect separately
o To test the effect of the intervention, we essentially treat each school as one observation (collapsing across classrooms and participants)

Intervention    School 1   School 2   School 3
Old             16.33      9.89       16.57
New             17.30      10.81      17.55

A one-factor ANOVA on these six observations has:
1 df in the numerator: (a-1) = (2-1) = 1
4 df in the denominator: a(b-1) = 2(3-1) = 2*2 = 4

ONEWAY dv by treat
  /STAT = DESC.

Descriptives (DV)
Group    N   Mean      Std. Deviation
1.00     3   14.2613   3.78589
2.00     3   15.2200   3.82122
Total    6   14.7407   3.44232

ANOVA (DV)
Source           Sum of Squares   df   Mean Square   F      Sig.
Between Groups   1.379            1    1.379         .095   .773
Within Groups    57.869           4    14.467
Total            59.248           5

F(1,4) = 0.10, p = .77
  • 50. 9-50
o To test the effect of school (within intervention), we treat each class as one observation (collapsing across participants)

School (Treatment)   1(Old)   2(Old)   3(Old)   1(New)   2(New)   3(New)
                     11.60    7.37     15.73    12.93    8.28     16.73
                     16.47    12.10    19.63    17.30    13.93    21.38
                     18.67    10.90    14.03    19.92    10.81    14.45
                     18.57    9.20     16.86    19.07    10.20    17.61

A school within treatment ANOVA on these 24 observations has:
4 df in the numerator: a(b-1) = 2(3-1) = 2*2 = 4
18 df in the denominator: ab(c-1) = 2*3*(4-1) = 2*3*3 = 18

UNIANOVA dv BY treat school
  /DESIGN = treat, school within treat.

Tests of Between-Subjects Effects (Dependent Variable: DV)
Source            Type III SS   df   Mean Square   F         Sig.
Corrected Model   237.029       5    47.406        6.427     .001
Intercept         5213.833      1    5213.833      706.816   .000
TREAT             5.491 (a)     1    5.491         .744      .400
SCHOOL(TREAT)     231.538       4    57.885        7.847     .001
Error             132.777       18   7.377
Total             5583.639      24
Corrected Total   369.807       23
a. Ignore this test for the effect of treatment in this setup

F(4,18) = 7.85, p = .001
o Finally, to test the effect of class (within school within intervention), we examine the individual observations
This analysis has:
18 df in the numerator: ab(c-1) = 2*3*(4-1) = 2*3*3 = 18
48 df in the denominator: abc(n-1) = 2*3*4*(3-1) = 48
  • 51. 9-51
o To analyze all the effects in one command:
UNIANOVA dv BY treat school class
  /RANDOM = school class
  /PRINT = DESC
  /DESIGN = treat, school within treat, class within school within treat.

Tests of Between-Subjects Effects (Dependent Variable: DV)
Source                              Type III SS   df   Mean Square   F        Sig.
Intercept              Hypothesis   15643.857     1    15643.857     90.088   .001
                       Error        694.600       4    173.650 (a)
TREAT                  Hypothesis   16.531        1    16.531        .095     .773
                       Error        694.600       4    173.650 (a)
SCHOOL(TREAT)          Hypothesis   694.600       4    173.650       7.850    .001
                       Error        398.194       18   22.122 (b)
CLASS(SCHOOL(TREAT))   Hypothesis   398.194       18   22.122        27.682   .000
                       Error        38.358        48   .799 (c)
a. MS(SCHOOL(TREAT))
b. MS(CLASS(SCHOOL(TREAT)))
c. MS(Error)

Effect of treatment: F(1,4) = 0.10, p = .77
Effect of school(treatment): F(4,18) = 7.85, p = .001
Effect of class(school(treatment)): F(18,48) = 27.68, p < .001
o SPSS also provides the variance components so that effect sizes can be calculated for the random effects

Expected Mean Squares (a, b)
Source                 Var(SCHOOL(TREAT))   Var(CLASS(SCHOOL(TREAT)))   Var(Error)   Quadratic Term
Intercept              12.000               3.000                       1.000        Intercept, TREAT
TREAT                  12.000               3.000                       1.000        TREAT
SCHOOL(TREAT)          12.000               3.000                       1.000
CLASS(SCHOOL(TREAT))   .000                 3.000                       1.000
Error                  .000                 .000                        1.000
a. For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
b. Expected Mean Squares are based on the Type III Sums of Squares.
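The three nested F-tests can also be reproduced from the raw scores. The Python sketch below assumes the balanced layout of this example (2 interventions, 3 schools per intervention, 4 classes per school, 3 students per class); the `rows` layout mirrors the data tables on page 9-49, and all variable and helper names are illustrative assumptions.

```python
from statistics import mean

# One list per student row; 12 columns = 3 schools x 4 classes, as in the data tables.
rows = {
    "old": [
        [11.2, 16.5, 18.3, 19.0, 7.3, 11.9, 11.3, 8.9, 15.3, 19.5, 14.1, 16.5],
        [11.6, 16.8, 18.7, 18.5, 7.8, 12.4, 10.9, 9.4, 15.9, 20.1, 13.8, 17.2],
        [12.0, 16.1, 19.0, 18.2, 7.0, 12.0, 10.5, 9.3, 16.0, 19.3, 14.2, 16.9],
    ],
    "new": [
        [13.2, 17.25, 20.3, 20.5, 9.3, 12.9, 10.3, 10.9, 17.55, 20.75, 15.1, 18.75],
        [12.35, 18.8, 18.45, 17.5, 7.05, 14.65, 12.15, 8.15, 14.9, 22.1, 14.55, 17.2],
        [13.25, 15.85, 21.0, 19.2, 8.5, 14.25, 10.0, 11.55, 17.75, 21.3, 13.7, 16.9],
    ],
}
A, B, C, N = 2, 3, 4, 3  # interventions, schools, classes, students per class


def class_scores(treat, school, klass):
    """Return the N student scores for one class (0-based school and class indices)."""
    col = school * C + klass
    return [row[col] for row in rows[treat]]


grand_mean = mean(y for table in rows.values() for row in table for y in row)
ss_treat = ss_school = ss_class = ss_error = 0.0
for treat, table in rows.items():
    treat_mean = mean(y for row in table for y in row)
    ss_treat += B * C * N * (treat_mean - grand_mean) ** 2
    for s in range(B):
        school_mean = mean(y for c in range(C) for y in class_scores(treat, s, c))
        ss_school += C * N * (school_mean - treat_mean) ** 2
        for c in range(C):
            scores = class_scores(treat, s, c)
            class_mean = mean(scores)
            ss_class += N * (class_mean - school_mean) ** 2
            ss_error += sum((y - class_mean) ** 2 for y in scores)

df_treat, df_school = A - 1, A * (B - 1)
df_class, df_error = A * B * (C - 1), A * B * C * (N - 1)
ms_treat, ms_school = ss_treat / df_treat, ss_school / df_school
ms_class, ms_error = ss_class / df_class, ss_error / df_error

# Each effect is tested against the mean square of the level nested directly beneath it.
f_treat, f_school, f_class = ms_treat / ms_school, ms_school / ms_class, ms_class / ms_error
print(f"Treatment: F({df_treat},{df_school}) = {f_treat:.3f}")
print(f"School:    F({df_school},{df_class}) = {f_school:.3f}")
print(f"Class:     F({df_class},{df_error}) = {f_class:.3f}")
```

The sums of squares and F-ratios should agree with the SPSS table above up to rounding.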
  • 52. 9-52  2004 A. Karpinski 16.Final considerations about nested designs • In these examples, we did not test the assumptions for the model because of small cell sizes. However, the ANOVA assumptions must be satisfied for the results to be valid. The assumptions for a nested model are the same as the assumptions for a fixed or random effects model (depending on if there are fixed or random effects in the model). • Pay attention to the small degrees of freedom in the tests for some of the nested effects. In both examples, the test of the fixed effect (the effect of most interest in these designs) is based on six observations! Nested designs can have very low power unless you have a large number of levels of the nested effects. • We have focused on balanced complete nested designs with random effects nested within a fixed effect. Many other nested designs are possible – including partially nested designs. Before you run a more complicated nested design, make sure that you know how to analyze it. Kirk (1995) is a good reference. • As in the random effects case, contrasts and post-hoc tests can be conducted by using the appropriate error term in previously developed equations. • We have discussed nested designs in an ANOVA framework where all the independent variables are categorical variables. In a regression framework, these models are usually called hierarchical linear models (HLM) and are very popular at the moment. In an HLM analysis, different terminology and different methods of estimation are used, but the interpretation is the same.
  • 53. 9-53  2004 A. Karpinski ANOVA designs with randomized blocks 17.The logic of blocking • When we test the effect of a factor on a dependent variable, there are always many other factors that lead to variability in the DV. When these variables are not of interest to us, they are called nuisance variables. • For example, if we are interested in the relationship between type of therapy and psychological wellness, there are many other factors that influence wellness other than the type of therapy. • What can we do about nuisance variables? o The typical approach is to use random assignment of participants to treatment conditions. • The nuisance variables are distributed equally over the experimental factors so that they do not affect just one treatment level. • However, all the variation in the DV caused by the nuisance variable is accumulated in the MSW. A large MSW (relative to the MS of the factor of interest) will decrease our power to detect the effect of interest. o An alternative approach is to hold the nuisance variables constant. • For example, to examine the effectiveness of several types of therapy, we can use only 18-year-old white females who have the same severity of the disorder. By creating a homogenous sample, we will decrease the MSW and increase our power. • This approach limits the generalizability of the conclusions. In addition, if you attempt to hold several variables constant, it may be difficult to find participants for the study. o You can also include the nuisance variable(s) as factors in the study. This approach is known as blocking.
  • 54. 9-54
• Any variable that is related to the DV may be used as a blocking variable. There are two categories of common blocking variables:
o Characteristics associated with the participant:
• Gender
• Age
• Income
• IQ
• Education
• Attitudes
• Previous experience with task
o Characteristics associated with the experimental setting:
• Time of day
• Batch of material
• Location
• Week
• Measuring instrument
• The participant (!)
• When we include a blocking factor in the design, we can capture the variability it causes in the DV in a SS(Blocks). This process will reduce the SS Within, compared to a non-blocked design

Partition of SS Total (SS Corrected Total)
Non-blocked design:   SS A (df = a-1)  +  SS Error (df = N-a)
Blocked design:       SS A (df = a-1)  +  SS Blocks (df = bl-1)  +  SS Residual (df = N - a - bl + 1)
  • 55. 9-55
18. Examples of blocked designs
• Example #1: Methods of quantifying risk. Managers were exposed to one of three methods of quantifying risk. After learning about the method, participants were asked to rate their degree of confidence in their risk assessments. Fifteen participants were grouped into five blocks, according to their age. Within each block, participants were randomly assigned to one of the three experimental conditions
o Layout for a randomized block design (C = Comparison, W = Worry, U = Utility)

                                Participant
Block                           1    2    3
1 (Oldest participants)         C    W    U
2                               C    U    W
3                               U    W    C
4                               W    U    C
5 (Youngest participants)       W    C    U

o Data from the quantifying risk example:

                Method
Block           Utility   Worry   Comparison   Average
1 (oldest)      1         5       8            4.7
2               2         8       14           8.0
3               7         9       16           10.7
4               6         13      18           12.3
5 (youngest)    12        14      17           14.3
Average         5.6       9.8     14.6         10.0

• Note that a randomized block design looks like a factorial design, but there is only one participant per cell. If there were two or more participants per cell, we would call this design a two-way ANOVA.
• Because there is one participant per cell, we do not have any information to test the block by factor interaction.
  • 56. 9-56
o Assumptions for a randomized block design:
• Because we only have one observation/cell, we cannot check assumptions on a cell-by-cell basis as we would for a factorial design.
• We require the standard assumptions:
⇒ Independently and randomly sampled observations
⇒ Homogeneity of variances (checked on the marginal means for the factor AND for the blocks)
⇒ Normality (by block and by treatment)
⇒ We assume that there is no treatment by block interaction (additivity of treatment and blocks). Plot observed values by block and look for parallel lines
• Additional assumptions are required if the blocking factor is a random effect
o Checking assumptions in the quantifying risk example
EXAMINE VARIABLES=dv BY block treat
  /PLOT BOXPLOT SPREADLEVEL NPPLOT.
• By treatment:

Test of Homogeneity of Variance (DV)
Levene Statistic   df1   df2   Sig.
.048               2     12    .953

[Boxplot of DV by TREAT (levels 1.00, 2.00, 3.00; N = 5 per level)]

Tests of Normality (Shapiro-Wilk, DV)
TREAT   Statistic   df   Sig.
1.00    .940        5    .665
2.00    .943        5    .687
3.00    .860        5    .227
  • 57. 9-57
• By block:

Test of Homogeneity of Variances (DV)
Levene Statistic   df1   df2   Sig.
.552               4     10    .702

[Boxplot of DV by BLOCK (levels 1.00 to 5.00; N = 3 per level)]

Tests of Normality (Shapiro-Wilk, DV)
BLOCK   Statistic   df   Sig.
1.00    .993        3    .843
2.00    1.000       3    1.000
3.00    .907        3    .407
4.00    .991        3    .817
5.00    .987        3    .780

But with three observations per block, these tests are essentially worthless!
• No treatment by block interaction

[Plot: Test for Interaction. DV (0 to 20) by method (Utility, Worry, Comparison), one line per block (Block 1 to Block 5)]

It may be difficult to judge the difference between random error and a true block * factor interaction. You are looking for an extreme pattern in the data.
o All the assumptions appear to be satisfied in this case
  • 58. 9-58
o What to do if assumptions are not satisfied?
• Non-normality and/or moderate heterogeneity of variances
⇒ Rank data and perform analysis on ranked data
• Heterogeneity of variances and/or treatment by block interaction
⇒ Transform data
o Structural model for a randomized block design with one factor and one block:
$Y_{ij} = \mu + \alpha_j + \tau_i + \varepsilon_{ij}$
$\mu$ = Grand population mean
$\hat{\mu} = \bar{Y}_{..}$
$\alpha_j$ = The treatment effect: the effect of being in level j of factor A
$\sum \alpha_j = 0$ or $\alpha_j \sim N(0, \sigma_{\alpha})$
$\hat{\alpha}_j = \bar{Y}_{.j} - \bar{Y}_{..}$
$\tau_i$ = The block effect: the effect of being in level i of the blocking variable
$\sum \tau_i = 0$
$\hat{\tau}_i = \bar{Y}_{i.} - \bar{Y}_{..}$
$\varepsilon_{ij}$ = The unexplained error associated with $Y_{ij}$
$\hat{\varepsilon}_{ij} = Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..}$
• The randomized block design is identical to a two-factor ANOVA with no interaction term.
• In this case, the blocking variable is considered to be a fixed variable. Special accommodations are necessary for a random blocking factor.
  • 59. 9-59
o Sums of squares decomposition and ANOVA table for a randomized block design:

Source      SS        df            MS     E(MS), Treatments Fixed                                      E(MS), Treatments Random
Treatment   SSA       a-1           MSA    $\sigma_{\varepsilon}^2 + \frac{bl \sum \alpha_j^2}{a-1}$    $\sigma_{\varepsilon}^2 + bl\,\sigma_{\alpha}^2$
Blocks      SSBL      bl-1          MSBL   $\sigma_{\varepsilon}^2 + \frac{a \sum \tau_i^2}{bl-1}$      $\sigma_{\varepsilon}^2 + \frac{a \sum \tau_i^2}{bl-1}$
Error       SSError   (a-1)(bl-1)   MSE    $\sigma_{\varepsilon}^2$                                     $\sigma_{\varepsilon}^2$
Total       SST       N-1

• To construct a significance test
⇒ For fixed treatment effects: $H_0: \alpha_1 = \alpha_2 = \dots = \alpha_a = 0$
⇒ For random treatment effects: $H_0: \sigma_{\alpha}^2 = 0$
⇒ But for either fixed or random effects, we construct the F-test in the same manner:
$F[a-1, (a-1)(bl-1)] = \frac{MSA}{MSE}$
⇒ To test for the block effect:
$F[bl-1, (a-1)(bl-1)] = \frac{MSBL}{MSE}$
However, we are usually not so interested in the test of the blocking variable. We included this variable to reduce the error variability.
  • 60. 9-60
o Using SPSS to analyze a randomized block design
UNIANOVA dv BY block treat
  /DESIGN = treat block.
Note that a factorial design (treatment, block, and treatment*block) is assumed unless otherwise stated with the DESIGN subcommand

Tests of Between-Subjects Effects (Dependent Variable: DV)
Source            Type III SS   df   Mean Square   F         Sig.
Corrected Model   374.133 (a)   6    62.356        20.901    .000
Intercept         1500.000      1    1500.000      502.793   .000
TREAT             202.800       2    101.400       33.989    .000
BLOCK             171.333       4    42.833        14.358    .001
Error             23.867        8    2.983
Total             1898.000      15
Corrected Total   398.000       14
a. R Squared = .940 (Adjusted R Squared = .895)

• We find a significant treatment effect, F(2,8) = 33.99, p < .001
$\hat{\omega}_A^2 = \frac{SSA - (dfA)\,MSError}{SSA + (N - dfA)\,MSError} = \frac{202.8 - (2)2.983}{202.8 + (15 - 2)2.983} = .814$
• Note that post-hoc tests on the marginal treatment means are required to identify the effect
o What if we had neglected to block by age of participant?
ONEWAY dv BY treat.

ANOVA (DV)
Source           Sum of Squares   df   Mean Square   F       Sig.
Between Groups   202.800          2    101.400       6.234   .014
Within Groups    195.200          12   16.267
Total            398.000          14

$\hat{\omega}_A^2 = \frac{SSA - (dfA)\,MSWithin}{SSA + (N - dfA)\,MSWithin} = \frac{202.8 - (2)16.267}{202.8 + (15 - 2)16.267} = .41$
• Although inclusion of the blocking effect did not change the conclusion of the statistical test, blocking greatly increased the size of the effect of treatment.
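To see where the blocked and unblocked results come from, the sums of squares can be computed directly from the 5 x 3 data table of the quantifying risk example. The Python sketch below is illustrative only: the variable names and the helper `omega_sq` are assumptions, and the omega squared helper simply encodes the formula used on this page.

```python
from statistics import mean

# One row per age block (oldest to youngest); columns = Utility, Worry, Comparison.
scores = [
    [1, 5, 8],
    [2, 8, 14],
    [7, 9, 16],
    [6, 13, 18],
    [12, 14, 17],
]
bl, a = len(scores), len(scores[0])
n_total = a * bl

grand_mean = mean(y for row in scores for y in row)
treat_means = [mean(row[j] for row in scores) for j in range(a)]
block_means = [mean(row) for row in scores]

ss_total = sum((y - grand_mean) ** 2 for row in scores for y in row)
ss_treat = bl * sum((m - grand_mean) ** 2 for m in treat_means)
ss_block = a * sum((m - grand_mean) ** 2 for m in block_means)
ss_error = ss_total - ss_treat - ss_block      # residual after removing blocks

ms_treat = ss_treat / (a - 1)
ms_error = ss_error / ((a - 1) * (bl - 1))
f_blocked = ms_treat / ms_error

# Ignoring the blocks pools SS Blocks back into the error term.
ms_within = (ss_block + ss_error) / (n_total - a)
f_unblocked = ms_treat / ms_within


def omega_sq(ss_a, df_a, ms_err, n):
    """Omega squared as written in these notes: (SSA - dfA*MSE) / (SSA + (N - dfA)*MSE)."""
    return (ss_a - df_a * ms_err) / (ss_a + (n - df_a) * ms_err)


omega_blocked = omega_sq(ss_treat, a - 1, ms_error, n_total)
omega_unblocked = omega_sq(ss_treat, a - 1, ms_within, n_total)
print(f"Blocked:   F = {f_blocked:.3f}, omega squared = {omega_blocked:.3f}")
print(f"Unblocked: F = {f_unblocked:.3f}, omega squared = {omega_unblocked:.3f}")
```

The treatment sum of squares is identical in both analyses; only the error term changes, which is what drives the difference in F and in omega squared.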
  • 61. 9-61
• Example #2: Fat in the diet. A researcher studies three low fat diets. Participants were blocked on the basis of age. DV = post-diet reduction in blood plasma lipid levels

         Fat content of diet
Block    Extremely Low   Fairly Low   Moderately Low
15-24    .73             .67          .35
25-34    .86             .75          .41
35-44    .94             .81          .46
45-54    1.40            1.32         .95
55-64    1.62            1.41         .98

o First, let's check the assumptions
EXAMINE VARIABLES=dv BY block fat
  /PLOT BOXPLOT NPPLOT.

[Boxplot of DV by BLOCK (levels 1.00 to 5.00; N = 3 per level)]
[Boxplot of DV by FAT (levels 1.00, 2.00, 3.00; N = 5 per level)]

Tests of Normality by block (Shapiro-Wilk, DV)
BLOCK   Statistic   df   Sig.
1.00    .865        3    .281
2.00    .920        3    .452
3.00    .935        3    .506
4.00    .878        3    .320
5.00    .962        3    .626

Tests of Normality by treatment level (Shapiro-Wilk, DV)
FAT     Statistic   df   Sig.
1.00    .898        5    .401
2.00    .829        5    .138
3.00    .792        5    .070

Test of Homogeneity of Variance (DV)
                                       Levene Statistic   df1   df2      Sig.
Based on Mean                          .336               2     12       .721
Based on Median                        .047               2     12       .954
Based on Median and with adjusted df   .047               2     11.893   .954
Based on trimmed mean                  .302               2     12       .745
  • 62. 9-62
Check for treatment by block interaction:

[Plot: DV (0 to 2) by fat content (Extreme, Fair, Moderate), one line per age block (Age 15-24, Age 25-34, Age 35-44, Age 45-54, Age 55-64)]

• All assumptions seem fine
o To examine the effect of fat in the diet on plasma lipid levels, let's conduct a randomized block ANOVA
UNIANOVA dv BY block fat
  /DESIGN = fat block.

Tests of Between-Subjects Effects (Dependent Variable: DV)
Source            Type III SS   df   Mean Square   F          Sig.
Corrected Model   2.045 (a)     6    .341          141.102    .000
Intercept         12.440        1    12.440        5151.017   .000
FAT               .626          2    .313          129.527    .000
BLOCK             1.419         4    .355          146.890    .000
Error             1.932E-02     8    2.415E-03
Total             14.504        15
Corrected Total   2.064         14
a. R Squared = .991 (Adjusted R Squared = .984)

We find a significant effect of fat in the diet on plasma lipid levels, F(2,8) = 129.53, p < .001
Let's conduct Tukey HSD post-hoc tests on the marginal treatment means. We can have SPSS do the test for us:
UNIANOVA dv BY fat block
  /POSTHOC = fat ( TUKEY )
  /DESIGN = fat block .
  • 63. 9-63
Multiple Comparisons (Dependent Variable: DV, Tukey HSD)
(I) FAT   (J) FAT   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
1.00      2.00      .1180*                  .03108       .013   .0292                .2068
1.00      3.00      .4800*                  .03108       .000   .3912                .5688
2.00      1.00      -.1180*                 .03108       .013   -.2068               -.0292
2.00      3.00      .3620*                  .03108       .000   .2732                .4508
3.00      1.00      -.4800*                 .03108       .000   -.5688               -.3912
3.00      2.00      -.3620*                 .03108       .000   -.4508               -.2732
Based on observed means.
*. The mean difference is significant at the .050 level.

Extremely low vs. fairly low fat: t(8) = 3.80, p = .013
Extremely low vs. moderately low fat: t(8) = 15.44, p < .001
Fairly low vs. moderately low fat: t(8) = 11.65, p < .001
o Note that if we had neglected to block on age, we would have failed to find a significant treatment effect!
ONEWAY dv BY fat.

ANOVA (DV)
Source           Sum of Squares   df   Mean Square   F       Sig.
Between Groups   .626             2    .313          2.610   .115
Within Groups    1.438            12   .120
Total            2.064            14

o What would happen if we forgot this was a randomized block design, and attempted to analyze it as a factorial design?
UNIANOVA dv BY fat block
  /DESIGN = fat block fat*block.

Tests of Between-Subjects Effects (Dependent Variable: DV)
Source            Type III SS   df   Mean Square   F   Sig.
Corrected Model   2.064 (a)     14   .147          .   .
Intercept         12.440        1    12.440        .   .
FAT               .626          2    .313          .   .
BLOCK             1.419         4    .355          .   .
FAT * BLOCK       1.932E-02     8    2.415E-03     .   .
Error             .000          0    .
Total             14.504        15
Corrected Total   2.064         14
a. R Squared = 1.000 (Adjusted R Squared = .)

Why did this happen???
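The standard error and t-statistics reported for the pairwise comparisons follow from the marginal diet means and the MSE of the randomized block ANOVA. The short Python sketch below assumes the MSE value read from the SPSS table (1.932E-02 / 8); the dictionary keys and variable names are illustrative. It reproduces only the test statistics, not the Tukey-adjusted p-values, which require the studentized range distribution.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# Reduction in plasma lipid levels: one list per diet, ordered by age block.
diets = {
    "extremely low": [0.73, 0.86, 0.94, 1.40, 1.62],
    "fairly low": [0.67, 0.75, 0.81, 1.32, 1.41],
    "moderately low": [0.35, 0.41, 0.46, 0.95, 0.98],
}
mse = 1.932e-02 / 8          # error mean square with block effects removed (df = 8)
n_per_mean = 5               # each marginal diet mean is based on five blocks

# Standard error of a difference between two marginal means.
se_diff = sqrt(2 * mse / n_per_mean)

t_stats = {}
for first, second in combinations(diets, 2):
    diff = mean(diets[first]) - mean(diets[second])
    t_stats[(first, second)] = diff / se_diff
    print(f"{first} vs. {second}: diff = {diff:.3f}, t(8) = {diff / se_diff:.2f}")
```

Using the blocked MSE here is what makes the comparisons so sensitive; substituting the unblocked MSW of .120 would shrink every t-statistic considerably.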
  • 64. 9-64
• A final example: A researcher studied how children solved a variety of puzzles. Sixty children were blocked into groups of 6 on the basis of age, gender, and IQ. Within each block, children were randomly assigned to work on a specific type of puzzle. The number of puzzles (out of a possible 20) solved by each child was recorded.

         Puzzle Type
Block    P1   P2   P3   P4   P5   P6
1        5    14   8    10   11   6
2        7    10   7    9    12   5
3        11   9    10   11   14   6
4        9    10   6    13   15   7
5        13   12   7    14   16   11
6        7    9    8    6    11   5
7        10   11   8    12   13   8
8        4    8    5    7    9    4
9        14   13   11   15   17   12
10       9    9    8    10   14   9

o First, let's check assumptions:
EXAMINE VARIABLES=dv by block puzzle
  /PLOT BOXPLOT NPPLOT SPREADLEVEL.
• By factor

[Boxplot of DV by PUZZLE (levels 1.00 to 6.00; N = 10 per level)]

Tests of Normality (Shapiro-Wilk, DV)
PUZZLE   Statistic   df   Sig.
1.00     .970        10   .891
2.00     .924        10   .394
3.00     .941        10   .560
4.00     .974        10   .925
5.00     .979        10   .959
6.00     .927        10   .415

Test of Homogeneity of Variance (DV, Based on Mean)
Levene Statistic   df1   df2   Sig.
1.110              5     54    .366
  • 65. 9-65
• By block

[Boxplot of DV by BLOCK (levels 1.00 to 10.00; N = 6 per level)]

Tests of Normality (Shapiro-Wilk, DV)
BLOCK   Statistic   df   Sig.
1.00    .969        6    .886
2.00    .972        6    .907
3.00    .964        6    .847
4.00    .952        6    .759
5.00    .963        6    .846
6.00    .983        6    .964
7.00    .918        6    .493
8.00    .892        6    .331
9.00    .983        6    .964
10.00   .750        6    .020

Test of Homogeneity of Variance (DV, Based on Mean)
Levene Statistic   df1   df2   Sig.
.521               9     50    .852

• Block by factor interaction

[Plot: DV (0 to 18) by puzzle type (P1 to P6), one line per block]

• All appears OK.
  • 66. 9-66
• Let's start with a general ANOVA approach
UNIANOVA dv BY puzzle block
  /DESIGN = puzzle block.

Tests of Between-Subjects Effects (Dependent Variable: DV)
Source            Type III SS   df   Mean Square   F          Sig.
Corrected Model   488.000 (a)   14   34.857        15.121     .000
Intercept         5684.267      1    5684.267      2465.861   .000
PUZZLE            238.933       5    47.787        20.730     .000
BLOCK             249.067       9    27.674        12.005     .000
Error             103.733       45   2.305
Total             6276.000      60
Corrected Total   591.733       59
a. R Squared = .825 (Adjusted R Squared = .770)

o We find a significant puzzle effect, F(5,45) = 20.73, p < .01
o To describe specific differences, we conduct pair-wise post-hoc tests
UNIANOVA dv BY puzzle block
  /POSTHOC = puzzle ( TUKEY )
  /DESIGN = puzzle block.

Multiple Comparisons (Dependent Variable: DV, Tukey HSD)
(I) PUZZLE   (J) PUZZLE   Mean Difference (I-J)   Std. Error   Sig.    95% CI Lower Bound   95% CI Upper Bound
1.00         2.00         -1.6000                 .67900       .194    -3.6207              .4207
1.00         3.00         1.1000                  .67900       .590    -.9207               3.1207
1.00         4.00         -1.8000                 .67900       .106    -3.8207              .2207
1.00         5.00         -4.3000                 .67900       .000    -6.3207              -2.2793
1.00         6.00         1.6000                  .67900       .194    -.4207               3.6207
2.00         3.00         2.7000                  .67900       .003    .6793                4.7207
2.00         4.00         -.2000                  .67900       1.000   -2.2207              1.8207
2.00         5.00         -2.7000                 .67900       .003    -4.7207              -.6793
2.00         6.00         3.2000                  .67900       .000    1.1793               5.2207
3.00         4.00         -2.9000                 .67900       .001    -4.9207              -.8793
3.00         5.00         -5.4000                 .67900       .000    -7.4207              -3.3793
3.00         6.00         .5000                   .67900       .976    -1.5207              2.5207
4.00         5.00         -2.5000                 .67900       .008    -4.5207              -.4793
4.00         6.00         3.4000                  .67900       .000    1.3793               5.4207
5.00         6.00         5.9000                  .67900       .000    3.8793               7.9207
Based on observed means.

• Puzzle 5 is solved more frequently than all other puzzles
• Puzzles 2 and 4 are solved more frequently than puzzles 3 and 6
  • 67. 9-67
• Alternatively, imagine that you had the following a priori hypotheses
o P2 = P4
o P3 = P6
o $P5 > \frac{P2 + P4}{2} > \frac{P3 + P6}{2}$
o We cannot enter contrasts directly into SPSS, so we'll have to do these contrasts by hand.
o Computing and testing a Main Effect Contrast (see 7-39)
$\hat{\psi} = \sum_{j=1}^{a} c_j \bar{X}_{.j} = c_1 \bar{X}_{.1} + \dots + c_a \bar{X}_{.a}$
$\text{Std Error}(\hat{\psi}) = \sqrt{MSE \sum_{j=1}^{a} \frac{c_j^2}{n_j}}$
Where
$c_j^2$ is the squared weight for each marginal mean
$n_j$ is the sample size for each marginal mean
MSE is the MSE from the omnibus ANOVA (with the effects of the blocks removed)
$t \sim \frac{\hat{\psi}}{\text{standard error}(\hat{\psi})}$
$t_{observed} = \frac{\sum c_j \bar{X}_{.j}}{\sqrt{MSE \sum \frac{c_j^2}{n_j}}}$
$SS(\hat{\psi}) = \frac{\hat{\psi}^2}{\sum \frac{c_j^2}{n_j}}$
$F(1, dfw) = \frac{SSC / dfc}{SSE / dfw} = \frac{SSC}{MSE}$
  • 68. 9-68
o Create contrast coefficients:
• P2 = P4: (0 -1 0 1 0 0)
• P3 = P6: (0 0 -1 0 0 1)
• $P5 > \frac{P2 + P4}{2} > \frac{P3 + P6}{2}$: (0 -1 0 -1 2 0) and (0 1 -1 1 0 -1)
o Compute the value of each contrast:

Descriptive Statistics (Dependent Variable: DV)
PUZZLE   Mean      Std. Deviation   N
1.00     8.9000    3.24722          10
2.00     10.5000   1.95789          10
3.00     7.8000    1.75119          10
4.00     10.7000   2.90784          10
5.00     13.2000   2.48551          10
6.00     7.3000    2.66875          10
Total    9.7333    3.16692          60

(0 -1 0 1 0 0): $\hat{\psi}_1 = -10.5 + 10.7 = 0.2$, $SS(\hat{\psi}_1) = 0.2$
(0 0 -1 0 0 1): $\hat{\psi}_2 = -7.8 + 7.3 = -0.5$, $SS(\hat{\psi}_2) = 1.25$
(0 -1 0 -1 2 0): $\hat{\psi}_3 = -10.5 - 10.7 + 2(13.2) = 5.2$, $SS(\hat{\psi}_3) = 45.067$
(0 1 -1 1 0 -1): $\hat{\psi}_4 = 10.5 - 7.8 + 10.7 - 7.3 = 6.1$, $SS(\hat{\psi}_4) = 93.025$
o Test the contrast:
$\psi_1$: $F(1,45) = \frac{0.2}{2.305} = 0.09$, p = .77
$\psi_2$: $F(1,45) = \frac{1.25}{2.305} = 0.54$, p = .47
$\psi_3$: $F(1,45) = \frac{45.067}{2.305} = 19.55$, p < .01
$\psi_4$: $F(1,45) = \frac{93.025}{2.305} = 40.36$, p < .01
o Note that if these were post-hoc tests, then we would need to apply the Tukey HSD or Scheffé correction.
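Because these contrasts are computed by hand, a short script is a useful arithmetic check. The Python sketch below applies the contrast formulas from page 9-67 to the marginal puzzle means; the function name `contrast_test` and the dictionary labels are illustrative assumptions, and the MSE (2.305, df = 45) is taken from the omnibus randomized block ANOVA. P-values are not computed, since that would need an F distribution from a statistics library.

```python
means = [8.9, 10.5, 7.8, 10.7, 13.2, 7.3]   # marginal means for P1..P6
n_per_mean = 10                              # one child per block, ten blocks
mse = 2.305                                  # MSE with block effects removed


def contrast_test(coefs, means, n, mse):
    """Return (psi_hat, SS(psi_hat), F) for a contrast on equal-n marginal means."""
    psi = sum(c * m for c, m in zip(coefs, means))
    ss = psi ** 2 / sum(c ** 2 / n for c in coefs)
    return psi, ss, ss / mse                 # contrast has 1 df, so F = SS / MSE


contrasts = {
    "psi1: P2 = P4": [0, -1, 0, 1, 0, 0],
    "psi2: P3 = P6": [0, 0, -1, 0, 0, 1],
    "psi3: P5 vs (P2, P4)": [0, -1, 0, -1, 2, 0],
    "psi4: (P2, P4) vs (P3, P6)": [0, 1, -1, 1, 0, -1],
}
results = {}
for label, coefs in contrasts.items():
    assert sum(coefs) == 0, "contrast coefficients must sum to zero"
    results[label] = contrast_test(coefs, means, n_per_mean, mse)
    psi, ss, f = results[label]
    print(f"{label}: psi = {psi:.2f}, SS = {ss:.3f}, F(1,45) = {f:.2f}")
```

Each contrast is tested against the same blocked MSE, so the only quantities that vary from one contrast to the next are the coefficients.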
  • 69. 9-69
19. Final considerations about blocking
• As shown in the last SPSS output, when there is one participant per cell, the SS for the interaction is the error term. Some authors create ANOVA tables with no error term, and use the SS(BL*A) to test the effect of A. The only difference in these approaches is the labeling of the error term.
• If the blocking variable is not related to the DV, then you actually lose power by including it in the design.

Blocked Design
Source      SS        df            MS     F
Treatment   SSA       a-1           MSA    $F[(a-1), (N-a-bl+1)] = \frac{MSA}{MSE}$
Blocks      0         bl-1          MSBL
Error       SSError   (a-1)(bl-1)   MSE
Total       SST       N-1

Standard Design
Source      SS        df     MS     F
Treatment   SSA       a-1    MSA    $F[(a-1), (N-a)] = \frac{MSA}{MSE}$
Within      SSError   N-a    MSE
Total       SST       N-1

o When SSBL = 0, SSError is the same in both designs, but it is divided by fewer degrees of freedom in the blocked design (N-a-bl+1) than in the standard design (N-a). The loss of these bl-1 dfs makes the error mean square larger and the critical F value higher, which results in lower power for the blocked design.
o In reality, the SSBL will never be exactly zero, but when SSBL is small and the number of blocks is large, you will lose power.
  • 70. 9-70  2004 A. Karpinski • The blocking variable must be a discrete variable. Oftentimes in behavioral research (and in both of our examples) the blocking variable is a continuous variable that must be artificially grouped for the purpose of analysis. When you treat a continuous variable as a discrete variable, you lose information and power. An analysis of covariance (ANCOVA) is a similar design to a randomized block design, except nuisance variables may be continuous. • Testing for non-additivity of treatment effects and blocks: o If looking at the plot of the DV by blocks makes you feel uneasy (it shouldn’t!), a statistical test is available: Tukey’s test for nonadditivity. o If you have more than 1 observation per cell, then you have a factorial design. You can calculate a SS(Bl*A) and test the interaction. • If you want to block on two factors, you can use the same procedure outlined here. Simply combine the two factors into one block. For example, to block on age and education: ⇒ Young and no education ⇒ Young and education ⇒ Old and no education ⇒ Old and education