Step-by-step guide to critiquing research. Part 1: quantitative research
Michael Coughlan, Patricia Cronin, Frances Ryan
Abstract
When caring for patients it is essential that nurses are using the current best practice. To determine what this is, nurses must be able to read research critically. But for many qualified and student nurses the terminology used in research can be difficult to understand, thus making critical reading even more daunting. It is imperative in nursing that care has its foundations in sound research and it is essential that all nurses have the ability to critically appraise research to identify what is best practice. This article is a step-by-step approach to critiquing quantitative research to help nurses demystify the process and decode the terminology.

Key words: Quantitative research methodologies • Review process • Research
For many qualified nurses and nursing students research is research, and it is often quite difficult to grasp what others are referring to when they discuss the limitations and/or strengths within
a research study. Research texts and journals refer to
critiquing the literature, critical analysis, reviewing the
literature, evaluation and appraisal of the literature which
are in essence the same thing (Bassett and Bassett, 2003).
Terminology in research can be confusing for the novice
research reader where a term like 'random' refers to an
organized manner of selecting items or participants, and the
word 'significance' is applied to a degree of chance. Thus
the aim of this article is to take a step-by-step approach to
critiquing research in an attempt to help nurses demystify
the process and decode the terminology.
When caring for patients it is essential that nurses are
using the current best practice. To determine what this is
nurses must be able to read research. The adage 'All that
glitters is not gold' is also true in research. Not all research
is of the same quality or of a high standard and therefore
nurses should not take research at face value simply because it has been published (Cullum and Droogan, 1999; Polit and Beck, 2006). Critiquing is a systematic method of
Michael Coughlan, Patricia Cronin and Frances Ryan are Lecturers, School of Nursing and Midwifery, University of Dublin, Trinity College, Dublin
Accepted for publication: March 2007
appraising the strengths and limitations of a piece of research
in order to determine its credibility and/or its applicability
to practice (Valente, 2003). Seeking only limitations in a study is criticism, and critiquing and criticism are not the same (Burns and Grove, 1997). A critique is an impersonal evaluation of the strengths and limitations of the research being reviewed and should not be seen as a disparagement of the researcher's ability. Neither should it be regarded as
a jousting match between the researcher and the reviewer.
Burns and Grove (1999) call this an 'intellectual critique'
in that it is not the creator but the creation that is being
evaluated. The reviewer maintains objectivity throughout
the critique. No personal views are expressed by the
reviewer and the strengths and/or limitations of the study
and the implications of these are highlighted with reference
to research texts or journals. It is also important to remember
that research works within the realms of probability where
nothing is absolutely certain. It is therefore important to
refer to the apparent strengths, limitations and findings
of a piece of research (Burns and Grove, 1997). The use
of personal pronouns is also avoided in order that an
appearance of objectivity can be maintained.
Credibility and integrity
There are numerous tools available to help both novice and
advanced reviewers to critique research studies (Tanner,
2003). These tools generally ask questions that can help the
reviewer to determine the degree to which the steps in the
research process were followed. However, some steps are
more important than others and very few tools acknowledge
this. Ryan-Wenger (1992) suggests that questions in a
critiquing tool can be subdivided in those that are useful
for getting a feel for the study being presented which she
calls 'credibility variables' and those that are essential for
evaluating the research process called 'integrity variables'.
Credibility variables concentrate on how believable the
work appears and focus on the researcher's qualifications and
ability to undertake and accurately present the study. The
answers to these questions are important when critiquing
a piece of research as they can offer the reader an insight
into what to expect in the remainder of the study.
However, the reader should be aware that identified strengths
and limitations within this section will not necessarily
correspond with what will be found in the rest of the work.
Integrity questions, on the other hand, are interested in the
robustness of the research method, seeking to identify how
appropriately and accurately the researcher followed the
steps in the research process. The answers to these questions
658 British Journal of Nursing, 2007, Vol 16, No 11
RESEARCH METHODOLOGIES
Table 1. Research questions: guidelines for critiquing a quantitative research study

Elements influencing the believability of the research

Writing style: Is the report well written: concise, grammatically correct, avoiding jargon? Is it well laid out and organized?
Author: Do the researcher(s') qualifications/position indicate a degree of knowledge in this particular field?
Report title: Is the title clear, accurate and unambiguous?
Abstract: Does the abstract offer a clear overview of the study, including the research problem, sample, methodology, findings and recommendations?

Elements influencing the robustness of the research

Purpose/research problem: Is the purpose of the study/research problem clearly identified?
Logical consistency: Does the research report follow the steps of the research process in a logical manner? Do these steps flow naturally and are the links clear?
Literature review: Is the review logically organized? Does it offer a balanced critical analysis of the literature? Is the majority of the literature of recent origin? Is it mainly from primary sources and of an empirical nature?
Theoretical framework: Has a conceptual or theoretical framework been identified? Is the framework adequately described? Is the framework appropriate?
Aims/objectives/research question/hypotheses: Have aims and objectives, a research question or hypothesis been identified? If so, are they clearly stated? Do they reflect the information presented in the literature review?
Sample: Has the target population been clearly identified? How was the sample selected? Was it a probability or non-probability sample? Is it of adequate size? Are the inclusion/exclusion criteria clearly identified?
Ethical considerations: Were the participants fully informed about the nature of the research? Was the autonomy/confidentiality of the participants guaranteed? Were the participants protected from harm? Was ethical permission granted for the study?
Operational definitions: Are all the terms, theories and concepts mentioned in the study clearly defined?
Methodology: Is the research design clearly identified? Has the data-gathering instrument been described? Is the instrument appropriate? How was it developed? Were reliability and validity testing undertaken and the results discussed? Was a pilot study undertaken?
Data analysis/results: What type of data and statistical analysis was undertaken? Was it appropriate? How many of the sample participated? Were the findings significant?
Discussion: Are the findings linked back to the literature review? If a hypothesis was identified, was it supported? Were the strengths and limitations of the study, including generalizability, discussed? Was a recommendation for further research made?
References: Were all the books, journals and other media alluded to in the study accurately referenced?
will help to identify the trustworthiness of the study and its
applicability to nursing practice.
Critiquing the research steps
In critiquing the steps in the research process a number
of questions need to be asked. However, these questions
are seeking more than a simple 'yes' or 'no' answer. The
questions are posed to stimulate the reviewer to consider
the implications of what the researcher has done. Does the
way a step has been applied appear to add to the strength
of the study or does it appear as a possible limitation to
implementation of the study's findings? (Table 1).
Elements influencing believability of the study
Writing style
Research reports should be well written, grammatically
correct, concise and well organized. The use of jargon should
be avoided where possible. The style should be such that it
attracts the reader to read on (Polit and Beck, 2006).
Author(s)
The author(s') qualifications and job title can be a useful indicator of the researcher(s') knowledge of the area
under investigation and ability to ask the appropriate
questions (Conkin Dale, 2005). Conversely a research
study should be evaluated on its own merits and not
assumed to be valid and reliable simply based on the
author(s') qualifications.
Report title
The title should be between 10 and 15 words long and
should clearly identify for the reader the purpose of the
study (Connell Meehan, 1999). Titles that are too long or
too short can be confusing or misleading (Parahoo, 2006).
Abstract
The abstract should provide a succinct overview of the
research and should include information regarding the
purpose of the study, method, sample size and selection, the main findings, conclusions and recommendations
(Conkin Dale, 2005). From the abstract the reader should
be able to determine if the study is of interest and whether
or not to continue reading (Parahoo, 2006).
Elements influencing robustness
Purpose of the study/research problem
A research problem is often first presented to the reader in
the introduction to the study (Bassett and Bassett, 2003).
Depending on what is to be investigated some authors will
refer to it as the purpose of the study. In either case the
statement should at least broadly indicate to the reader what
is to be studied (Polit and Beck, 2006). Broad problems are
often multi-faceted and will need to become narrower and
more focused before they can be researched. In this the
literature review can play a major role (Parahoo, 2006).
Logical consistency
A research study needs to follow the steps in the process in a
logical manner. There should also be a clear link between the
steps beginning with the purpose of the study and following
through the literature review, the theoretical framework, the
research question, the methodology section, the data analysis,
and the findings (Ryan-Wenger, 1992).
Literature review
The primary purpose of the literature review is to define
or develop the research question while also identifying
an appropriate method of data collection (Burns and
Grove, 1997). It should also help to identify any gaps in
the literature relating to the problem and to suggest how
those gaps might be filled. The literature review should
demonstrate an appropriate depth and breadth of reading
around the topic in question. The majority of studies
included should be of recent origin and ideally less than
five years old. However, there may be exceptions to this,
for example, in areas where there is a lack of research, or a
seminal or all-important piece of work that is still relevant to
current practice. It is important also that the review should
include some historical as well as contemporary material
in order to put the subject being studied into context. The
depth of coverage will depend on the nature of the subject,
for example, for a subject with a vast range of literature then
the review will need to concentrate on a very specific area
(Carnwell, 1997). Another important consideration is the
type and source of literature presented. Primary empirical
data from the original source is more favourable than a
secondary source or anecdotal information where the
author relies on personal evidence or opinion that is not
founded on research.
A good review usually begins with an introduction which
identifies the key words used to conduct the search and
information about which databases were used. The themes
that emerged from the literature should then be presented
and discussed (Carnwell, 1997). In presenting previous
work it is important that the data is reviewed critically,
highlighting both the strengths and limitations of the study.
It should also be compared and contrasted with the findings
of other studies (Burns and Grove, 1997).
Theoretical framework
Following the identification of the research problem
and the review of the literature the researcher should
present the theoretical framework (Bassett and Bassett,
2003). Theoretical frameworks are a concept that novice
and experienced researchers find confusing. It is initially
important to note that not all research studies use a defined
theoretical framework (Robson, 2002). A theoretical
framework can be a conceptual model that is used as a
guide for the study (Conkin Dale, 2005) or themes from
the literature that are conceptually mapped and used to set
boundaries for the research (Miles and Huberman, 1994).
A sound framework also identifies the various concepts
being studied and the relationship between those concepts
(Burns and Grove, 1997). Such relationships should have
been identified in the literature. The research study should
then build on this theory through empirical observation.
Some theoretical frameworks may include a hypothesis.
Theoretical frameworks tend to be better developed in
experimental and quasi-experimental studies and often
poorly developed or non-existent in descriptive studies
(Burns and Grove, 1999). The theoretical framework should
be clearly identified and explained to the reader.
Aims and objectives/research question/
research hypothesis
The purpose of the aims and objectives of a study, the research
question and the research hypothesis is to form a link between
the initially stated purpose of the study or research problem
and how the study will be undertaken (Burns and Grove,
1999). They should be clearly stated and be congruent with
the data presented in the literature review. The use of these
items is dependent on the type of research being performed.
Some descriptive studies may not identify any of these items
but simply refer to the purpose of the study or the research
problem, others will include either aims and objectives or
research questions (Burns and Grove, 1999). Correlational
designs study the relationships that exist between two or
more variables and accordingly use either a research question
or hypothesis. Experimental and quasi-experimental studies
should clearly state a hypothesis identifying the variables to
be manipulated, the population that is being studied and the
predicted outcome (Burns and Grove, 1999).
Sample and sample size
The degree to which a sample reflects the population it
was drawn from is known as representativeness and in
quantitative research this is a decisive factor in determining
the adequacy of a study (Polit and Beck, 2006). In order
to select a sample that is likely to be representative and
thus identify findings that are probably generalizable to
the target population a probability sample should be used
(Parahoo, 2006). The size of the sample is also important in
quantitative research as small samples are at risk of being
overly representative of small subgroups within the target
population. For example, if, in a sample of general nurses, it
was noticed that 40% of the respondents were males, then
males would appear to be over represented in the sample,
thereby creating a sampling error. The risk of sampling
errors decreases as larger sample sizes are used (Burns and
Grove, 1997). In selecting the sample the researcher should
clearly identify who the target population are and what
criteria were used to include or exclude participants. It
should also be evident how the sample was selected and
how many were invited to participate (Russell, 2005).
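The idea of representativeness described above can be sketched in code. The sketch below is illustrative only: the population size, the 10% male share and the use of Python's `random.sample` as a stand-in for probability sampling are all invented for the example. It draws two simple random samples and shows that the subgroup share in the larger sample tends to sit closer to the population's.

```python
import random

random.seed(42)  # reproducible illustration

# Hypothetical target population of 10,000 general nurses,
# 10% of whom are male (invented figures for illustration).
population = (["male"] * 1_000) + (["female"] * 9_000)

def subgroup_share(sample, group):
    """Proportion of the sample belonging to one subgroup."""
    return sample.count(group) / len(sample)

# A small probability sample risks over-representing a subgroup...
small = random.sample(population, 20)
# ...while a larger one usually lies much closer to the true 10%.
large = random.sample(population, 2_000)

print(f"male share, n=20:   {subgroup_share(small, 'male'):.2%}")
print(f"male share, n=2000: {subgroup_share(large, 'male'):.2%}")
```

Re-running with different seeds shows the small sample's share swinging widely while the large sample's stays near 10%, which is the sampling-error point made above.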
Ethical considerations
Beauchamp and Childress (2001) identify four fundamental
moral principles: autonomy, non-maleficence, beneficence
and justice. Autonomy infers that an individual has the right
to freely decide to participate in a research study without
fear of coercion and with a full knowledge of what is being
investigated. Non-maleficence implies an intention of not
harming and preventing harm occurring to participants
both of a physical and psychological nature (Parahoo,
2006). Beneficence is interpreted as the research benefiting
the participant and society as a whole (Beauchamp and
Childress, 2001). Justice is concerned with all participants
being treated as equals and no one group of individuals
receiving preferential treatment because, for example, of
their position in society (Parahoo, 2006). Beauchamp and
Childress (2001) also identify four moral rules that are both
closely connected to each other and with the principle of
autonomy. They are veracity (truthfulness), fidelity (loyalty
and trust), confidentiality and privacy. The latter pair are often
linked and imply that the researcher has a duty to respect the
confidentiality and/or the anonymity of participants and
non-participating subjects.
Ethical committees or institutional review boards have to
give approval before research can be undertaken. Their role
is to determine that ethical principles are being applied and
that the rights of the individual are being adhered to (Burns
and Grove, 1999).
Operational definitions
In a research study the researcher needs to ensure that
the reader understands what is meant by the terms and
concepts that are used in the research. To ensure this any
concepts or terms referred to should be clearly defined
(Parahoo, 2006).
Methodology: research design
Methodology refers to the nuts and bolts of how a
research study is undertaken. There are a number of
important elements that need to be referred to here and
the first of these is the research design. There are several
types of quantitative studies that can be structured under
the headings of true experimental, quasi-experimental
and non-experimental designs (Robson, 2002) (Table 2).
Although it is outside the remit of this article, within each
of these categories there are a range of designs that will
impact on how the data collection and data analysis phases
of the study are undertaken. However, Robson (2002)
states these designs are similar in many respects as most
are concerned with patterns of group behaviour, averages,
tendencies and properties.
Methodology: data collection
The next element to consider after the research design
is the data collection method. In a quantitative study any
number of strategies can be adopted when collecting data
and these can include interviews, questionnaires, attitude
scales or observational tools. Questionnaires are the most
commonly used data gathering instruments and consist
mainly of closed questions with a choice of fixed answers.
Postal questionnaires are administered via the mail and have
the value of perceived anonymity. Questionnaires can also be
administered in face-to-face interviews or in some instances
over the telephone (Polit and Beck, 2006).
Methodology: instrument design
After identifying the appropriate data gathering method
the next step that needs to be considered is the design
of the instrument. Researchers have the choice of using
a previously designed instrument or developing one for
the study and this choice should be clearly declared for
the reader. Designing an instrument is a protracted and
sometimes difficult process (Burns and Grove, 1997) but the
overall aim is that the final questions will be clearly linked
to the research questions and will elicit accurate information
and will help achieve the goals of the research. This, however,
needs to be demonstrated by the researcher.
Table 2. Research designs

Experimental
  Sample: two or more groups
  Sample allocation: random
  Features: groups get different treatments
  Outcome: cause-and-effect relationship

Quasi-experimental
  Sample: one or more groups
  Sample allocation: random
  Features: one variable has not been manipulated or controlled (usually because it cannot be)
  Outcome: cause-and-effect relationship, but less powerful than experimental

Non-experimental (e.g. descriptive; includes cross-sectional, correlational, comparative and longitudinal studies)
  Sample: one or more groups
  Sample allocation: not applicable
  Features: discover new meaning; describe what already exists; measure the relationship between two or more variables
  Outcome: possible hypothesis for future research; tentative explanations
If a previously designed instrument is selected the researcher
should clearly establish that the chosen instrument is the most appropriate. This is achieved by outlining how the instrument
has measured the concepts under study. Previously designed
instruments are often in the form of standardized tests
or scales that have been developed for the purpose of
measuring a range of views, perceptions, attitudes, opinions
or even abilities. There are a multitude of tests and scales
available, therefore the researcher is expected to provide the
appropriate evidence in relation to the validity and reliability
of the instrument (Polit and Beck, 2006).
Methodology: validity and reliability
One of the most important features of any instrument is
that it measures the concept being studied in an unwavering
and consistent way. These are addressed under the broad
headings of validity and reliability respectively. In general,
validity is described as the ability of the instrument to
measure what it is supposed to measure and reliability the
instrument's ability to consistently and accurately measure
the concept under study (Wood et al, 2006). For the most
part, if a well established 'off the shelf' instrument has been
used and not adapted in any way, the validity and reliability
will have been determined already and the researcher
should outline what this is. However, if the instrument
has been adapted in any way or is being used for a new
population then previous validity and reliability will not
apply. In these circumstances the researcher should indicate
how the reliability and validity of the adapted instrument
was established (Polit and Beck, 2006).
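One widely used way of expressing an instrument's internal-consistency reliability is Cronbach's alpha. The sketch below is offered purely as an illustration (the article does not prescribe a particular statistic, and the item scores are invented): alpha compares the variance of the items taken individually with the variance of respondents' total scores.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-item score columns.

    item_scores: list of k lists, each holding one item's scores
    across the same respondents.
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    """
    k = len(item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_var = sum(pvariance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Invented scores: three questionnaire items answered by five respondents.
items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 5],
]
print(f"alpha = {cronbach_alpha(items):.2f}")  # here approximately 0.89
```

Values nearer 1 indicate that the items move together, which is the consistency the paragraph above asks the researcher to demonstrate.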
To establish if the chosen instrument is clear and
unambiguous and to ensure that the proposed study has
been conceptually well planned a mini-version of the main
study, referred to as a pilot study, should be undertaken before
the main study. Samples used in the pilot study are generally
omitted from the main study. Following the pilot study the
researcher may adjust definitions, alter the research question,
address changes to the measuring instrument or even alter
the sampling strategy.
Having described the research design, the researcher should
outline in clear, logical steps the process by which the data
was collected. All steps should be fully described and easy to
follow (Russell, 2005).
Analysis and results
Data analysis in quantitative research studies is often seen
as a daunting process. Much of this is associated with
apparently complex language and the notion of statistical
tests. The researcher should clearly identify what statistical
tests were undertaken, why these tests were used and
what the results were. A rule of thumb is that studies that are descriptive in design use only descriptive statistics, while correlational, quasi-experimental and experimental studies use inferential statistics. The latter are subdivided
into tests to measure relationships and differences between
variables (Clegg, 1990).
Inferential statistical tests are used to identify if a
relationship or difference between variables is statistically
significant. Statistical significance helps the researcher to
rule out one important threat to validity and that is that the
result could be due to chance rather than to real differences
in the population. Quantitative studies usually identify the
lowest level of significance as P≤0.05 (P = probability)
(Clegg, 1990).
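The idea that an observed difference "could be due to chance" can be made concrete with a small simulation. The sketch below uses a permutation test, chosen here only because it needs no statistical tables; it is not a method the article itself discusses, and the two groups and their scores are invented.

```python
import random

random.seed(1)  # reproducible illustration

def permutation_p_value(group_a, group_b, n_perm=10_000):
    """Two-sided p-value for the difference in group means.

    Repeatedly shuffles the pooled scores back into two groups of
    the original sizes and counts how often a mean difference at
    least as large as the observed one arises by chance alone.
    """
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = group_a + group_b
    hits = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            hits += 1
    return hits / n_perm

# Invented pain scores under two hypothetical care regimes.
control = [6, 7, 5, 8, 7, 6, 7]
treated = [4, 5, 3, 5, 4, 6, 4]
p = permutation_p_value(control, treated)
verdict = "significant" if p <= 0.05 else "not significant"
print(f"P = {p:.4f} -> {verdict} at the 0.05 level")
```

A small P here means a difference this large would rarely arise from shuffling alone, which is exactly the "rule out chance" reasoning described above.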
To enhance readability researchers frequently present
their findings and data analysis section under the headings
of the research questions (Russell, 2005). This can help the
reviewer determine if the results that are presented clearly
answer the research questions. Tables, charts and graphs may
be used to summarize the results and should be accurate,
clearly identified and enhance the presentation of results
(Russell, 2005).
The percentage of the sample who participated in
the study is an important element in considering the
generalizability of the results. At least fifty percent of the
sample is needed to participate if a response bias is to be
avoided (Polit and Beck, 2006).
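The response-rate check described above is simple arithmetic; a minimal helper might look like this (the figures are invented, and the 50% threshold is the article's rule of thumb rather than a universal standard).

```python
def response_rate(responses: int, sample_size: int) -> float:
    """Percentage of the invited sample who actually participated."""
    return 100 * responses / sample_size

# Below 50% participation, response bias becomes a concern
# (threshold per the article's rule of thumb; figures invented).
rate = response_rate(120, 300)
flag = "risk of response bias" if rate < 50 else "acceptable"
print(f"{rate:.0f}% responded -> {flag}")
```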
Discussion/conclusion/recommendations
The discussion of the findings should flow logically from the data and should be related back to the literature review, thus placing the study in context (Russell, 2005). If the hypothesis
was deemed to have been supported by the findings,
the researcher should develop this in the discussion. If a
theoretical or conceptual framework was used in the study
then the relationship with the findings should be explored.
Any interpretations or inferences drawn should be clearly
identified as such and consistent with the results.
The significance of the findings should be stated but
these should be considered within the overall strengths
and limitations of the study (Polit and Beck, 2006). In this
section some consideration should be given to whether
or not the findings of the study were generalizable, also
referred to as external validity. Not all studies make a claim
to generalizability but the researcher should have undertaken
an assessment of the key factors in the design, sampling and
analysis of the study to support any such claim.
Finally the researcher should have explored the clinical
significance and relevance of the study. Applying findings
in practice should be suggested with caution and will
obviously depend on the nature and purpose of the study.
In addition, the researcher should make relevant and
meaningful suggestions for future research in the area
(Connell Meehan, 1999).
References
The research study should conclude with an accurate list
of all the books, journal articles, reports and other media
that were referred to in the work (Polit and Beck, 2006).
The referenced material is also a useful source of further
information on the subject being studied.
Conclusions
The process of critiquing involves an in-depth examination
of each stage of the research process. It is not a criticism but
rather an impersonal scrutiny of a piece of work using a
balanced and objective approach, the purpose of which is to
highlight both strengths and weaknesses, in order to identify
whether a piece of research is trustworthy and unbiased. As nursing practice is becoming increasingly evidence based, it is important that care has its foundations in sound research. It is therefore important that all nurses have the ability to critically appraise research in order to identify what is best practice.
References

Bassett C, Bassett J (2003) Reading and critiquing research. Br J Perioper Nurs 13(4): 162-4
Beauchamp T, Childress J (2001) Principles of Biomedical Ethics. 5th edn. Oxford University Press, Oxford
Burns N, Grove S (1997) The Practice of Nursing Research: Conduct, Critique and Utilization. 3rd edn. WB Saunders Company, Philadelphia
Burns N, Grove S (1999) Understanding Nursing Research. 2nd edn. WB Saunders Company, Philadelphia
Carnwell R (1997) Critiquing research. Nurs Pract 8(12): 16-21
Clegg F (1990) Simple Statistics: A Course Book for the Social Sciences. 2nd edn. Cambridge University Press, Cambridge
Conkin Dale J (2005) Critiquing research for use in practice. J Pediatr Health Care 19: 183-6
Connell Meehan T (1999) The research critique. In: Treacy P, Hyde A, eds. Nursing Research and Design. UCD Press, Dublin: 57-74
Cullum N, Droogan J (1999) Using research and the role of systematic reviews of the literature. In: Mulhall A, Le May A, eds. Nursing Research: Dissemination and Implementation. Churchill Livingstone, Edinburgh: 109-23
Miles M, Huberman A (1994) Qualitative Data Analysis. 2nd edn. Sage, Thousand Oaks, Ca
Parahoo K (2006) Nursing Research: Principles, Process and Issues. 2nd edn. Palgrave Macmillan, Houndmills Basingstoke
Polit D, Beck C (2006) Essentials of Nursing Research: Methods, Appraisal and Utilization. 6th edn. Lippincott Williams and Wilkins, Philadelphia
Robson C (2002) Real World Research. 2nd edn. Blackwell Publishing, Oxford
Russell C (2005) Evaluating quantitative research reports. Nephrol Nurs J 32(1): 61-4
Ryan-Wenger N (1992) Guidelines for critique of a research report. Heart Lung 21(4): 394-401
Tanner J (2003) Reading and critiquing research. Br J Perioper Nurs 13(4): 162-4
Valente S (2003) Research dissemination and utilization: improving care at the bedside. J Nurs Care Qual 18(2): 114-21
Wood MJ, Ross-Kerr JC, Brink PJ (2006) Basic Steps in Planning Nursing Research: From Question to Proposal. 6th edn. Jones and Bartlett, Sudbury
KEY POINTS

- Many qualified and student nurses have difficulty understanding the concepts and terminology associated with research and research critique.
- The ability to critically read research is essential if the profession is to achieve and maintain its goal to be evidence based.
- A critique of a piece of research is not a criticism of the work, but an impersonal review to highlight the strengths and limitations of the study.
- It is important that all nurses have the ability to critically appraise research in order to identify what is best practice.
nurses should not take research at face value simply
because it has been published (Cullum and Droogan, 1999;
Polit and Beck, 2006). Critiquing is a systematic method of
appraising the strengths and limitations of a piece of research
in order to determine its credibility and/or its applicability
to practice (Valente, 2003).

Michael Coughlan, Patricia Cronin and Frances Ryan are
Lecturers, School of Nursing and Midwifery, University of Dublin,
Trinity College, Dublin. Accepted for publication: March 2007

Seeking only limitations in a
study is criticism; critiquing and criticism are not the
same (Burns and Grove, 1997). A critique is an impersonal
evaluation of the strengths and limitations of the research
being reviewed and should not be seen as a disparagement
of the researcher's ability. Neither should it be regarded as
a jousting match between the researcher and the reviewer.
Burns and Grove (1999) call this an 'intellectual critique'
in that it is not the creator but the creation that is being
evaluated. The reviewer maintains objectivity throughout
the critique. No personal views are expressed by the
reviewer and the strengths and/or limitations of the study
and the implications of these are highlighted with reference
to research texts or journals. It is also important to remember
that research works within the realms of probability where
nothing is absolutely certain. It is therefore important to
refer to the apparent strengths, limitations and findings
of a piece of research (Burns and Grove, 1997). The use
of personal pronouns is also avoided in order that an
appearance of objectivity can be maintained.
Credibility and integrity
There are numerous tools available to help both novice and
advanced reviewers to critique research studies (Tanner,
2003). These tools generally ask questions that can help the
reviewer to determine the degree to which the steps in the
research process were followed. However, some steps are
more important than others and very few tools acknowledge
this. Ryan-Wenger (1992) suggests that questions in a
critiquing tool can be subdivided into those that are useful
for getting a feel for the study being presented which she
calls 'credibility variables' and those that are essential for
evaluating the research process called 'integrity variables'.
Credibility variables concentrate on how believable the
work appears and focus on the researcher's qualifications and
ability to undertake and accurately present the study. The
answers to these questions are important when critiquing
a piece of research as they can offer the reader an insight
into what to expect in the remainder of the study.
However, the reader should be aware that identified strengths
and limitations within this section will not necessarily
correspond with what will be found in the rest of the work.
Integrity questions, on the other hand, are interested in the
robustness of the research method, seeking to identify how
appropriately and accurately the researcher followed the
steps in the research process. The answers to these questions
British Journal of Nursing, 2007, Vol 16, No 11: 658
RESEARCH METHODOLOGIES
Table 1. Research questions: guidelines for critiquing a quantitative research study

Elements influencing the believability of the research

Writing style: Is the report well written - concise, grammatically correct, avoiding jargon? Is it well laid out and organized?
Author: Do the researchers' qualifications/positions indicate a degree of knowledge in this particular field?
Report title: Is the title clear, accurate and unambiguous?
Abstract: Does the abstract offer a clear overview of the study, including the research problem, sample, methodology, findings and recommendations?

Elements influencing the robustness of the research

Purpose/research problem: Is the purpose of the study/research problem clearly identified?
Logical consistency: Does the research report follow the steps of the research process in a logical manner? Do these steps naturally flow and are the links clear?
Literature review: Is the review logically organized? Does it offer a balanced critical analysis of the literature? Is the majority of the literature of recent origin? Is it mainly from primary sources and of an empirical nature?
Theoretical framework: Has a conceptual or theoretical framework been identified? Is the framework adequately described? Is the framework appropriate?
Aims/objectives/research question/hypotheses: Have aims and objectives, a research question or hypothesis been identified? If so, are they clearly stated? Do they reflect the information presented in the literature review?
Sample: Has the target population been clearly identified? How was the sample selected? Was it a probability or non-probability sample? Is it of adequate size? Are the inclusion/exclusion criteria clearly identified?
Ethical considerations: Were the participants fully informed about the nature of the research? Was the autonomy/confidentiality of the participants guaranteed? Were the participants protected from harm? Was ethical permission granted for the study?
Operational definitions: Are all the terms, theories and concepts mentioned in the study clearly defined?
Methodology: Is the research design clearly identified? Has the data-gathering instrument been described? Is the instrument appropriate? How was it developed? Were reliability and validity testing undertaken and the results discussed? Was a pilot study undertaken?
Data analysis/results: What type of data and statistical analysis was undertaken? Was it appropriate? How many of the sample participated? What was the significance of the findings?
Discussion: Are the findings linked back to the literature review? If a hypothesis was identified, was it supported? Were the strengths and limitations of the study, including generalizability, discussed? Was a recommendation for further research made?
References: Were all the books, journals and other media alluded to in the study accurately referenced?
will help to identify the trustworthiness of the study and its
applicability to nursing practice.
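For reviewers who like to work through the Table 1 questions systematically, the guideline can be treated as a simple checklist. This is purely an illustrative sketch: the element names follow Table 1, but the data structure and the "strength/limitation/unclear" labels are assumptions of this example, not part of the authors' guideline.

```python
# Hypothetical checklist built from the Table 1 elements:
# record a judgement per element, then summarize coverage.

CHECKLIST = {
    "believability": ["Writing style", "Author", "Report title", "Abstract"],
    "robustness": [
        "Purpose/research problem", "Logical consistency", "Literature review",
        "Theoretical framework", "Aims/objectives/question/hypotheses",
        "Sample", "Ethical considerations", "Operational definitions",
        "Methodology", "Data analysis/results", "Discussion", "References",
    ],
}

def summarize(answers):
    """Count how many elements were judged a strength vs a limitation.

    `answers` maps element name -> "strength", "limitation" or "unclear";
    elements not yet answered default to "unclear".
    """
    summary = {"strength": 0, "limitation": 0, "unclear": 0}
    for group in CHECKLIST.values():
        for element in group:
            summary[answers.get(element, "unclear")] += 1
    return summary

answers = {"Writing style": "strength", "Sample": "limitation"}
print(summarize(answers))  # 14 of the 16 elements remain "unclear"
```

The point of the structure is simply that a critique should cover every element, not stop at the first limitation found.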
Critiquing the research steps
In critiquing the steps in the research process a number
of questions need to be asked. However, these questions
are seeking more than a simple 'yes' or 'no' answer. The
questions are posed to stimulate the reviewer to consider
the implications of what the researcher has done. Does the
way a step has been applied appear to add to the strength
of the study or does it appear as a possible limitation to
implementation of the study's findings? (Table 1).
Elements influencing believability of the study
Writing style
Research reports should be well written, grammatically
correct, concise and well organized. The use of jargon should
be avoided where possible. The style should be such that it
attracts the reader to read on (Polit and Beck, 2006).
Author(s)
The authors' qualifications and job title can be a useful
indicator of the researchers' knowledge of the area
under investigation and their ability to ask the appropriate
questions (Conkin Dale, 2005). Conversely, a research
study should be evaluated on its own merits and not
assumed to be valid and reliable simply on the basis of the
authors' qualifications.
Report title
The title should be between 10 and 15 words long and
should clearly identify for the reader the purpose of the
study (Connell Meehan, 1999). Titles that are too long or
too short can be confusing or misleading (Parahoo, 2006).
Abstract
The abstract should provide a succinct overview of the
research and should include information regarding the
purpose of the study, method, sample size and selection,
the main findings, conclusions and recommendations
(Conkin Dale, 2005). From the abstract the reader should
be able to determine if the study is of interest and whether
or not to continue reading (Parahoo, 2006).
Elements influencing robustness
Purpose of the study/research problem
A research problem is often first presented to the reader in
the introduction to the study (Bassett and Bassett, 2003).
Depending on what is to be investigated some authors will
refer to it as the purpose of the study. In either case the
statement should at least broadly indicate to the reader what
is to be studied (Polit and Beck, 2006). Broad problems are
often multi-faceted and will need to become narrower and
more focused before they can be researched. In this the
literature review can play a major role (Parahoo, 2006).
Logical consistency
A research study needs to follow the steps in the process in a
logical manner. There should also be a clear link between the
steps beginning with the purpose of the study and following
through the literature review, the theoretical framework, the
research question, the methodology section, the data analysis,
and the findings (Ryan-Wenger, 1992).
Literature review
The primary purpose of the literature review is to define
or develop the research question while also identifying
an appropriate method of data collection (Burns and
Grove, 1997). It should also help to identify any gaps in
the literature relating to the problem and to suggest how
those gaps might be filled. The literature review should
demonstrate an appropriate depth and breadth of reading
around the topic in question. The majority of studies
included should be of recent origin and ideally less than
five years old. However, there may be exceptions to this,
for example, in areas where there is a lack of research, or a
seminal or all-important piece of work that is still relevant to
current practice. It is important also that the review should
include some historical as well as contemporary material
in order to put the subject being studied into context. The
depth of coverage will depend on the nature of the subject,
for example, for a subject with a vast range of literature then
the review will need to concentrate on a very specific area
(Carnwell, 1997). Another important consideration is the
type and source of literature presented. Primary empirical
data from the original source is more favourable than a
secondary source or anecdotal information where the
author relies on personal evidence or opinion that is not
founded on research.
A good review usually begins with an introduction which
identifies the key words used to conduct the search and
information about which databases were used. The themes
that emerged from the literature should then be presented
and discussed (Carnwell, 1997). In presenting previous
work it is important that the data is reviewed critically,
highlighting both the strengths and limitations of the study.
It should also be compared and contrasted with the findings
of other studies (Burns and Grove, 1997).
Theoretical framework
Following the identification of the research problem
and the review of the literature the researcher should
present the theoretical framework (Bassett and Bassett,
2003). Theoretical frameworks are a concept that novice
and experienced researchers find confusing. It is initially
important to note that not all research studies use a defined
theoretical framework (Robson, 2002). A theoretical
framework can be a conceptual model that is used as a
guide for the study (Conkin Dale, 2005) or themes from
the literature that are conceptually mapped and used to set
boundaries for the research (Miles and Huberman, 1994).
A sound framework also identifies the various concepts
being studied and the relationship between those concepts
(Burns and Grove, 1997). Such relationships should have
been identified in the literature. The research study should
then build on this theory through empirical observation.
Some theoretical frameworks may include a hypothesis.
Theoretical frameworks tend to be better developed in
experimental and quasi-experimental studies and often
poorly developed or non-existent in descriptive studies
(Burns and Grove, 1999). The theoretical framework should
be clearly identified and explained to the reader.
Aims and objectives/research question/
research hypothesis
The purpose of the aims and objectives of a study, the research
question and the research hypothesis is to form a link between
the initially stated purpose of the study or research problem
and how the study will be undertaken (Burns and Grove,
1999). They should be clearly stated and be congruent with
the data presented in the literature review. The use of these
items is dependent on the type of research being performed.
Some descriptive studies may not identify any of these items
but simply refer to the purpose of the study or the research
problem, others will include either aims and objectives or
research questions (Burns and Grove, 1999). Correlational
designs study the relationships that exist between two or
more variables and accordingly use either a research question
or hypothesis. Experimental and quasi-experimental studies
should clearly state a hypothesis identifying the variables to
be manipulated, the population that is being studied and the
predicted outcome (Burns and Grove, 1999).
Sample and sample size
The degree to which a sample reflects the population it
was drawn from is known as representativeness and in
quantitative research this is a decisive factor in determining
the adequacy of a study (Polit and Beck, 2006). In order
to select a sample that is likely to be representative and
thus identify findings that are probably generalizable to
the target population a probability sample should be used
(Parahoo, 2006). The size of the sample is also important in
quantitative research as small samples are at risk of being
overly representative of small subgroups within the target
population. For example, if, in a sample of general nurses, it
was noticed that 40% of the respondents were males, then
males would appear to be over-represented in the sample,
thereby creating a sampling error. The risk of sampling
errors decreases as larger sample sizes are used (Burns and
Grove, 1997). In selecting the sample the researcher should
clearly identify who the target population are and what
criteria were used to include or exclude participants. It
should also be evident how the sample was selected and
how many were invited to participate (Russell, 2005).
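The sampling-error point above, that small samples can over-represent subgroups while larger samples land closer to the true population proportions, can be shown with a quick simulation. The population make-up used here (10% male among 10,000 nurses) is invented purely for illustration:

```python
import random

random.seed(1)

# Hypothetical population: 10% of 10,000 general nurses are male
# (figure invented for illustration only).
population = ["male"] * 1000 + ["female"] * 9000

def male_share(n):
    """Estimate the male proportion from a simple random sample of size n."""
    sample = random.sample(population, n)
    return sample.count("male") / n

for n in (20, 200, 2000):
    print(n, round(male_share(n), 3))
# Larger samples tend to land closer to the true 0.10:
# the risk of sampling error decreases as sample size grows.
```

A single small sample can easily show 20% or 0% males by chance alone, which is exactly the over-representation problem the authors describe.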
Ethical considerations
Beauchamp and Childress (2001) identify four fundamental
moral principles: autonomy, non-maleficence, beneficence
and justice. Autonomy infers that an individual has the right
to freely decide to participate in a research study without
fear of coercion and with a full knowledge of what is being
investigated. Non-maleficence imphes an intention of not
harming and preventing harm occurring to participants
both of a physical and psychological nature (Parahoo,
2006). Beneficence is interpreted as the research benefiting
the participant and society as a whole (Beauchamp and
Childress, 2001). Justice is concerned with all participants
being treated as equals and no one group of individuals
receiving preferential treatment because, for example, of
their position in society (Parahoo, 2006). Beauchamp and
Childress (2001) also identify four moral rules that are both
closely connected to each other and with the principle of
autonomy. They are veracity (truthfulness), fidelity (loyalty
and trust), confidentiality and privacy. The latter pair are often
linked and imply that the researcher has a duty to respect the
confidentiality and/or the anonymity of participants and
non-participating subjects.
Ethical committees or institutional review boards have to
give approval before research can be undertaken. Their role
is to determine that ethical principles are being applied and
that the rights of the individual are being adhered to (Burns
and Grove, 1999).
Operational definitions
In a research study the researcher needs to ensure that
the reader understands what is meant by the terms and
concepts that are used in the research. To ensure this any
concepts or terms referred to should be clearly defined
(Parahoo, 2006).
Methodology: research design
Methodology refers to the nuts and bolts of how a
research study is undertaken. There are a number of
important elements that need to be referred to here and
the first of these is the research design. There are several
types of quantitative studies that can be structured under
the headings of true experimental, quasi-experimental
and non-experimental designs (Robson, 2002) {Table 2).
Although it is outside the remit of this article, within each
of these categories there are a range of designs that will
impact on how the data collection and data analysis phases
of the study are undertaken. However, Robson (2002)
states these designs are similar in many respects as most
are concerned with patterns of group behaviour, averages,
tendencies and properties.
Methodology: data collection
The next element to consider after the research design
is the data collection method. In a quantitative study any
number of strategies can be adopted when collecting data
and these can include interviews, questionnaires, attitude
scales or observational tools. Questionnaires are the most
commonly used data gathering instruments and consist
mainly of closed questions with a choice of fixed answers.
Postal questionnaires are administered via the mail and have
the value of perceived anonymity. Questionnaires can also be
administered in face-to-face interviews or in some instances
over the telephone (Polit and Beck, 2006).
Methodology: instrument design
After identifying the appropriate data gathering method
the next step that needs to be considered is the design
of the instrument. Researchers have the choice of using
a previously designed instrument or developing one for
the study and this choice should be clearly declared for
the reader. Designing an instrument is a protracted and
sometimes difficult process (Burns and Grove, 1997) but the
overall aim is that the final questions will be clearly linked
to the research questions and will elicit accurate information
and will help achieve the goals of the research. This, however,
needs to be demonstrated by the researcher.
Table 2. Research designs

Experimental
Sample: 2 or more groups
Sample allocation: Random
Features: Groups get different treatments
Outcome: Cause and effect relationship

Quasi-experimental
Sample: One or more groups
Sample allocation: Random
Features: One variable has not been manipulated or controlled (usually because it cannot be)
Outcome: Cause and effect relationship, but less powerful than experimental

Non-experimental, e.g. descriptive (includes cross-sectional, correlational, comparative and longitudinal studies)
Sample: One or more groups
Sample allocation: Not applicable
Features: Discover new meaning; describe what already exists; measure the relationship between two or more variables
Outcome: Possible hypothesis for future research; tentative explanations
If a previously designed instrument is selected the researcher
should clearly establish that the chosen instrument is the most
appropriate. This is achieved by outlining how the instrument
has measured the concepts under study. Previously designed
instruments are often in the form of standardized tests
or scales that have been developed for the purpose of
measuring a range of views, perceptions, attitudes, opinions
or even abilities. There are a multitude of tests and scales
available, therefore the researcher is expected to provide the
appropriate evidence in relation to the validity and reliability
of the instrument (Polit and Beck, 2006).
Methodology: validity and reliability
One of the most important features of any instrument is
that it measures the concept being studied in an unwavering
and consistent way. These are addressed under the broad
headings of validity and reliability respectively. In general,
validity is described as the ability of the instrument to
measure what it is supposed to measure and reliability the
instrument's ability to consistently and accurately measure
the concept under study (Wood et al, 2006). For the most
part, if a well established 'off the shelf' instrument has been
used and not adapted in any way, the validity and reliability
will have been determined already and the researcher
should outline what this is. However, if the instrument
has been adapted in any way or is being used for a new
population then previous validity and reliability will not
apply. In these circumstances the researcher should indicate
how the reliability and validity of the adapted instrument
was established (Polit and Beck, 2006).
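The article does not name a specific reliability statistic, but one commonly reported measure of an instrument's internal consistency is Cronbach's alpha. A minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of totals), with invented questionnaire scores:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns.

    items: list of k lists, each holding one item's scores
    across the same respondents.
    """
    def variance(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(items)
    # Per-respondent total scores across all items
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Invented 3-item, 4-respondent example:
scores = [[4, 5, 3, 4], [4, 4, 3, 5], [5, 5, 2, 4]]
print(round(cronbach_alpha(scores), 2))  # about 0.82
```

Values closer to 1 indicate that the items measure the underlying concept consistently; an adapted instrument would need this (or a similar statistic) re-established for the new population, as the paragraph above notes.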
To establish if the chosen instrument is clear and
unambiguous and to ensure that the proposed study has
been conceptually well planned a mini-version of the main
study, referred to as a pilot study, should be undertaken before
the main study. Samples used in the pilot study are generally
omitted from the main study. Following the pilot study the
researcher may adjust definitions, alter the research question,
address changes to the measuring instrument or even alter
the sampling strategy.
Having described the research design, the researcher should
outline in clear, logical steps the process by which the data
was collected. All steps should be fully described and easy to
follow (Russell, 2005).
Analysis and results
Data analysis in quantitative research studies is often seen
as a daunting process. Much of this is associated with
apparently complex language and the notion of statistical
tests. The researcher should clearly identify what statistical
tests were undertaken, why these tests were used and
what the results were. A rule of thumb is that studies that
are descriptive in design use only descriptive statistics, while
correlational, quasi-experimental and experimental
studies use inferential statistics. The latter are subdivided
into tests to measure relationships and differences between
variables (Clegg, 1990).
Inferential statistical tests are used to identify if a
relationship or difference between variables is statistically
significant. Statistical significance helps the researcher to
rule out one important threat to validity and that is that the
result could be due to chance rather than to real differences
in the population. Quantitative studies usually identify the
lowest level of significance as P≤0.05 (P = probability)
(Clegg, 1990).
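The idea that a "significant" result is one unlikely to be due to chance can be made concrete with a small permutation test. This technique is not mentioned by the authors (a real study would typically report a t-test or similar from a statistics package), and the two groups' scores below are invented:

```python
import random

random.seed(7)

# Invented scores for two groups (e.g. two wards rating a protocol).
group_a = [72, 75, 68, 80, 77, 74]
group_b = [65, 70, 62, 69, 66, 71]

observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

# How often does randomly shuffling the group labels produce a mean
# difference at least as large as the observed one? A small proportion
# (p <= 0.05) is conventionally taken as statistically significant.
pooled = group_a + group_b
count = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    a, b = pooled[:6], pooled[6:]
    diff = sum(a) / 6 - sum(b) / 6
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / trials
print(round(observed, 2), p_value)
```

Here the p-value estimates exactly the threat described above: the probability that a difference this large could arise by chance when no real difference exists in the population.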
To enhance readability researchers frequently present
their findings and data analysis section under the headings
of the research questions (Russell, 2005). This can help the
reviewer determine if the results that are presented clearly
answer the research questions. Tables, charts and graphs may
be used to summarize the results and should be accurate,
clearly identified and enhance the presentation of results
(Russell, 2005).
The percentage of the sample who participated in
the study is an important element in considering the
generalizability of the results. At least 50% of the sample
needs to participate if a response bias is to be
avoided (Polit and Beck, 2006).
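Checking a study against the 50% participation threshold mentioned above is simple arithmetic; a sketch, with invented numbers:

```python
def response_rate(returned, invited):
    """Percentage of invited participants who actually took part."""
    return 100 * returned / invited

# Hypothetical study: 200 questionnaires sent, 120 returned.
rate = response_rate(returned=120, invited=200)
print(rate)        # 60.0
print(rate >= 50)  # meets the 50% threshold suggested above
```

A reviewer can only do this check if the report states both how many were invited and how many participated, which is why the earlier Sample section asks for both figures.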
Discussion/conclusion/recommendations
The discussion of the findings should flow logically from the
data and should be related back to the literature review thus
placing the study in context (Russell, 2002). If the hypothesis
was deemed to have been supported by the findings,
the researcher should develop this in the discussion. If a
theoretical or conceptual framework was used in the study
then the relationship with the findings should be explored.
Any interpretations or inferences drawn should be clearly
identified as such and consistent with the results.
The significance of the findings should be stated but
these should be considered within the overall strengths
and limitations of the study (Polit and Beck, 2006). In this
section some consideration should be given to whether
or not the findings of the study were generalizable, also
referred to as external validity. Not all studies make a claim
to generalizability but the researcher should have undertaken
an assessment of the key factors in the design, sampling and
analysis of the study to support any such claim.
Finally the researcher should have explored the clinical
significance and relevance of the study. Applying findings
in practice should be suggested with caution and will
obviously depend on the nature and purpose of the study.
In addition, the researcher should make relevant and
meaningful suggestions for future research in the area
(Connell Meehan, 1999).
References
The research study should conclude with an accurate list
of all the books, journal articles, reports and other media
that were referred to in the work (Polit and Beck, 2006).
The referenced material is also a useful source of further
information on the subject being studied.
Conclusions
The process of critiquing involves an in-depth examination
of each stage of the research process. It is not a criticism but
rather an impersonal scrutiny of a piece of work using a
balanced and objective approach, the purpose of which is to
highlight both strengths and weaknesses, in order to identify
whether a piece of research is trustworthy and unbiased. As
nursing practice is becoming increasingly evidence
based, it is important that care has its foundations in sound
research. It is therefore important that all nurses have the
ability to critically appraise research in order to identify what
is best practice.
References
Bassett C, Bassett J (2003) Reading and critiquing research. Br J Perioper Nurs 13(4): 162-4
Beauchamp T, Childress J (2001) Principles of Biomedical Ethics. 5th edn. Oxford University Press, Oxford
Burns N, Grove S (1997) The Practice of Nursing Research: Conduct, Critique and Utilization. 3rd edn. WB Saunders, Philadelphia
Burns N, Grove S (1999) Understanding Nursing Research. 2nd edn. WB Saunders, Philadelphia
Carnwell R (1997) Critiquing research. Nurs Pract 8(12): 16-21
Clegg F (1990) Simple Statistics: A Course Book for the Social Sciences. 2nd edn. Cambridge University Press, Cambridge
Conkin Dale J (2005) Critiquing research for use in practice. J Pediatr Health Care 19: 183-6
Connell Meehan T (1999) The research critique. In: Treacy P, Hyde A, eds. Nursing Research and Design. UCD Press, Dublin: 57-74
Cullum N, Droogan J (1999) Using research and the role of systematic reviews of the literature. In: Mulhall A, Le May A, eds. Nursing Research: Dissemination and Implementation. Churchill Livingstone, Edinburgh: 109-23
Miles M, Huberman A (1994) Qualitative Data Analysis. 2nd edn. Sage, Thousand Oaks, CA
Parahoo K (2006) Nursing Research: Principles, Process and Issues. 2nd edn. Palgrave Macmillan, Houndmills, Basingstoke
Polit D, Beck C (2006) Essentials of Nursing Research: Methods, Appraisal and Utilization. 6th edn. Lippincott Williams and Wilkins, Philadelphia
Robson C (2002) Real World Research. 2nd edn. Blackwell Publishing, Oxford
Russell C (2005) Evaluating quantitative research reports. Nephrol Nurs J 32(1): 61-4
Ryan-Wenger N (1992) Guidelines for critique of a research report. Heart Lung 21(4): 394-401
Tanner J (2003) Reading and critiquing research. Br J Perioper Nurs 13(4): 162-4
Valente S (2003) Research dissemination and utilization: improving care at the bedside. J Nurs Care Qual 18(2): 114-21
Wood MJ, Ross-Kerr JC, Brink PJ (2006) Basic Steps in Planning Nursing Research: From Question to Proposal. 6th edn. Jones and Bartlett, Sudbury
KEY POINTS
- Many qualified and student nurses have difficulty understanding the concepts and terminology associated with research and research critique.
- The ability to critically read research is essential if the profession is to achieve and maintain its goal to be evidence based.
- A critique of a piece of research is not a criticism of the work, but an impersonal review to highlight the strengths and limitations of the study.
- It is important that all nurses have the ability to critically appraise research in order to identify what is best practice.
Critiquing Nursing Research 2nd edition
Critiquing
Nursing Research
2nd edition
ISBN-W; 1- 85642-316-6; lSBN-13; 978-1-85642-316-8; 234 x
156 mm; p/back; 224 pages;
publicatior) November 2006; £25.99
By John R Cutdiffe and Martin Ward
This 2nd edition of Critiquing Nursing Research retains the
features which made the original
a 'best seller' whilst incorporating new material in order to
expand the book's applicability. In
addition to reviewing and subsequently updating the material of
the original text, the authors
have added two further examples of approaches to crtitique
along with examples and an
additonal chapter on how to critique research as part of the
work of preparing a dissertation.
The fundamentals of the book however remain the same. It
focuses specifically on critiquing
nursing research; the increasing requirement for nurses to
become conversant with research,
understand its link with the use of evidence to underpin
practice; and the movement towards
becoming an evidence-based discipline.
As nurse education around the world increasingly moves
towards an all-graduate discipline, it
is vital for nurses to have the ability to critique research in
order to benefit practice. This book
is the perfect tool for those seeking to gain or develop precisely
that skill and is a must-have
for all students nurses, teachers and academics.
John Cutcliffe holds the 'David G. Braithwaite' Professor of
Nursing Endowed Chair at the University of Texas (Tyler); he is
also an Adjunct Professor of Psychiatric Nursing at Stenberg
College International School of Nursing, Vancouver, Canada.
Martin Ward is an Independent Mental Health Nurse Consultant
and Director of MW Professional Development Ltd.
To order your copy please contact us using the details below or
visit our website www.quaybooks.co.uk where you will also find
details of other Quay Books offers and titles.
Quay Books Division | MA Healthcare Limited
Jesses Farm | Snow Hill | Dinton | Salisbury | Wiltshire | SP3 5HN | UK
Tel: 01722 716998 | Fax: 01722 716887 | E-mail: [email protected] | Web: www.quaybooks.co.uk
British Journal of Nursing, 2007, Vol 16, No 11: 663
Chapter 7
Dependent t-Tests and Repeated
Measures Analysis of Variance
Learning Objectives
After reading this chapter, you will be able to. . .
1. describe the impact that initial between-groups differences
have on test results when using the
t-test or analysis of variance.
2. compare the independent t-test to the dependent-groups t-test.
3. complete a dependent-(paired/repeated-)samples t-test.
4. explain what power means in statistical testing.
5. compare the one-way ANOVA to the repeated-measures
ANOVA.
6. complete a repeated-measures ANOVA.
7. interpret results and draw conclusions of within-group
designs.
8. present within-group analysis results in APA format.
9. employ Wilcoxon signed-ranks W-test and Friedman’s
nonparametric ANOVA.
suk85842_07_c07.indd 235 10/23/13 1:29 PM
CHAPTER 7 Section 7.1 Reconsidering the t and F Ratios
Tests of significant difference, such as the t-test and analysis of
variance, are of two kinds: tests involving independent (or
between) groups and those that employ related,
or dependent (or within) groups. The tests covered to this point
in the book have involved
only independent groups tests. However, there are important
advantages related to the
dependent groups procedures, and they are used frequently in
data analysis.
In this chapter, the focus will be on the dependent groups
equivalents of the independent
t-test and the one-way ANOVA. Because the same group is measured
across times or treatments, these are called dependent- or
within-groups designs; designs that use matched or equivalent
groups are an alternative of the same kind, and all are
collectively known as repeated-measures designs. Although
repeated-measures designs answer the same questions as their
independent-groups equivalents (i.e., are there significant
differences within groups, across times or treatments, or between
matched groups?), under particular circumstances they can do so
with greater economy and more statistical power.
7.1 Reconsidering the t and F Ratios
The scores produced in both the independent t and the one-way
ANOVA are ratios. In the case of the t-test, the ratio is the
result of dividing the difference between the means
of the groups by the standard error of the difference:
t = (M1 − M2) / SEd
With ANOVA, the F ratio is the mean square between divided
by the mean square within:
F = MSbet / MSwith
With either t or F, the denominator in the ratio reflects how
much scores vary within (rather
than between) the groups of subjects involved in the study.
These differences are easiest
to see in the way the standard error of the difference is
calculated for a t-test. When group
sizes are equal, the formula is
SEd = √(SEM1² + SEM2²)
with SEM = s / √n
and s, of course, a measure of score variation in any group.
So the standard error of the difference is based on the standard
error of the mean, which
in turn is based on the standard deviation. These connections
make it clear that score variance within groups in a t-test has
its roots in the standard deviation for each group of scores. If we
reverse the order and work from the standard deviation back to
the standard error of the
difference, note the following:
• When scores vary substantially in a group, it is reflected in a
large standard
deviation.
• When the standard deviation is relatively large, the standard
error of the mean
must likewise be large because the standard deviation is the
numerator in the
formula for SEM.
• A large standard error of the mean results in a large standard
error of the difference because that statistic is the square root
of the sum of the squared standard errors of the mean.
• When the standard error of the difference is large, the
difference between the means has to be correspondingly larger in
order for the result to be statistically significant. The table of
critical values indicates that no t ratio (the ratio of the
difference between the means to the standard error of the
difference) may be less than 1.96 for a two-tailed test or
1.645 for a one-tailed test, based on the critical α = .05.
Error Variance
The point of this is that the value of t in the t-test—and it is the
same for F in an ANOVA—
is greatly affected by the amount of variability within the
groups involved. When the
variability within those groups is extensive, the values of t and
F are correspondingly
diminished and less likely to be statistically significant than
when there is relatively little
variability within the groups.
These differences within groups stem from differences in the
way individuals within the
samples react to whatever treatment is the independent variable;
different people respond
differently to the same stimulus. These differences represent
error variance, which is what
occurs whenever scores differ for reasons not related to the
influence of the IV.
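To make this concrete, here is a minimal Python sketch (standard library only; the data are invented for illustration and are not from the text) showing that the same mean difference yields a large t when within-group variability is small and a small t when it is large:

```python
import math
import statistics

def independent_t(group_1, group_2):
    """Independent-groups t for equal-size groups: t = (M1 - M2) / SEd."""
    sem_1 = statistics.stdev(group_1) / math.sqrt(len(group_1))
    sem_2 = statistics.stdev(group_2) / math.sqrt(len(group_2))
    se_d = math.sqrt(sem_1 ** 2 + sem_2 ** 2)  # standard error of the difference
    return (statistics.mean(group_1) - statistics.mean(group_2)) / se_d

# Both comparisons have the same 2-point difference between group means . . .
t_low_variance = independent_t([10, 11, 12, 13, 14], [8, 9, 10, 11, 12])
t_high_variance = independent_t([4, 8, 12, 16, 20], [2, 6, 10, 14, 18])

# . . . but the noisier groups produce a much smaller t ratio.
print(round(t_low_variance, 2), round(t_high_variance, 2))  # 2.0 0.5
```

Everything in the denominator traces back to the spread of scores within the groups, which is why greater within-group variability shrinks t.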
Other Sources of Error Variance
Within-group differences are not the only source of error
variance in the calculation of t
and F. Both t-test and ANOVA are based on the assumption that
the groups involved are
equivalent before the independent variable is introduced. In a t-
test where the impact of
relaxation therapy on clients’ anxiety is the issue, the
assumption is that before the ther-
apy is introduced, the group, which will receive the therapy, and
the control group, which
will not, begin with equivalent levels of anxiety. That
assumption is the key to attributing
any differences after the treatment to the therapy, the IV.
Confounding Variables
In comparisons such as this, the initial equivalence of the
groups can be a problem, how-
ever. Maybe there were differences in anxiety before the
therapy was introduced. There
might be differences in the employment circumstances of each
group, and perhaps those
threatened with unemployment are more anxious than the
others. Maybe there are age-
related differences. These other influences that are not
controlled in an experiment are
sometimes called confounding variables.
If a psychologist wants to examine the impact that a substance
abuse program has on
addicts’ behavior, a study might be set up as follows. Two
groups of the same number of
addicts are selected, and the substance abuse program is
provided to one group. After the
program, the psychologist measures the level of substance abuse
in both groups to see
whether there is a difference.
Try It! (A)
If the size of the standard deviation is related to the size of the group, in a t-test, what is the relationship between sample size and error?
The problem is that the presence or absence of the program is
not the only thing that might
prompt subjects to respond differently. Perhaps subjects’
background experiences are dif-
ferent. Maybe there are ethnic group differences, age
differences, or social class differ-
ences. If any of those differences affect substance abuse
behavior, there is an opportunity
to confuse the influence of those factors with the impact of the
substance abuse program,
which is the IV. If those other differences are not controlled and
they affect the dependent
variable, they contribute to error variance. There is error
variance any time the dependent
variable (DV) scores fluctuate for reasons unrelated to the IV.
Therefore, there is error variance reflected in the variability
within groups, and there is error
variance represented in any difference between groups that is
not related to the IV. Test
results can be meaningful only when the score variance that is
related to the independent
variable is substantially greater than the error variance—what is
controlled must contrib-
ute more to score values than what is left uncontrolled. This
makes it important to look for
ways to control error variance so that it is not confused with the
variability in scores that
stems from the independent variable. Controlling for
confounding variables is a necessary
research activity. A confounding variable can affect the IV-DV
relationship, thereby lowering
internal validity and thus the statistical conclusion validity of
your findings. Failing to take
confounding variables into account can result in misleading data
and erroneous conclusions, to the detriment of the researcher's
reputation. In other words, treat research findings and sweeping
general statements with caution, as several confounding elements
may be at work. Extraneous confounding variables can, however, be
controlled in several ways.
7.2 Dependent-Groups Designs
Ideally, any before-the-treatment differences between the
groups in a study will be min-
imal. Recall that random selection occurs when every member
of a population has an
equal chance of being selected. The logic behind random selection
is that when groups are randomly drawn from the same population,
they will differ only by chance. Differ they will, though, because
no sample can represent the population with complete fidelity, and
occasionally the chance differences will affect the way subjects
respond to the IV.
One way to reduce error variance is to adopt what are called
dependent-groups designs. The independent t-test and the one-
way
ANOVA required independent groups. Members of one group
can-
not also be members of other groups in the same study.
However, in
the case of the t-test, if the same group is measured, exposed to
the
treatment, and then measured again, an important source of
error
variance is controlled. Using the same group twice makes initial
equivalence no longer a concern. Any scoring variability
between the
first and second measure should more accurately reflect the
impact of
the independent variable.
The Dependent-Samples t-Tests
One dependent-groups test where the same group is measured
twice is called the before/
after t-test, also known as the pre/post t-test. An alternative is
called the matched-pairs
or dependent-samples t-test, where each participant in the first
group is matched to
someone in the second group who has a similar characteristic.
Yet a third alternative that
Try It! (B)
How does random selection attempt to control error variance in statistical testing?
is basically the same as a before/after design is the within-
treatment design where each
participant is used across two treatment groups (usually given at
two different times,
which makes it the same as the before/after t-test). In the latter
option the participant acts
as his or her own control where one of the treatments may be a
placebo. All three types of
dependent-samples t-tests have the same objective, which is
controlling the error variance
that is due to initial between-groups differences. Following are
examples of each test.
• The before/after design: A researcher is interested in the
impact that positive rein-
forcement has on employees’ sales productivity. Besides the
sales commission,
the researcher introduces a rewards program that can result in
increased vaca-
tion time. The researcher gauges sales productivity for a month,
introduces the
rewards program, and gauges sales productivity during the
second month for the
same people. The researcher will explore differences in
employee productivity
before and after the positive reinforcement intervention (the
rewards program).
If significance is obtained, then the null hypothesis (i.e., there
is no difference in employee productivity after the introduction
of the rewards program) can be rejected and support found for the
alternative hypothesis (i.e., there is a significant increase in
employee productivity after the introduction of the rewards
program).
• The matched-pairs design: A school counselor is interested in
the impact that verbal
reinforcement has on students’ reading achievement. To
eliminate between-groups
differences, the researcher selects 30 people for the treatment
group and matches
each person in the treatment group to someone in the control
group who has a
similar reading score on a standardized test. The researcher then
introduces the
verbal reinforcement program to those in the treatment group
for a specified period
and over time compares the performance of students in the two
groups as well as
their performance within the group. The matched-pairs design is
similar to an independent-groups design with one major exception:
the groups are matched (made equivalent) to each other as closely
as possible on a particular measure, in this case reading scores
on a standardized test.
• Within-treatment design: A psychiatrist measures each study
participant on taking a
placebo, and then the actual drug for depression to test for
significant differences
over the two treatments (placebo versus drug). Here a
counterbalancing design may be
employed to minimize the order effects that plague repeated-
measures design. Specif-
ically the order in which treatments are given can influence the
outcome. Therefore,
the treatments are given to the groups at different times as
depicted in Figure 7.1.
Figure 7.1: Counterbalanced design

Group 1: Treatment A → Treatment B → Posttest
Group 2: Treatment B → Treatment A → Posttest

Source: Oskar Blakstad (May 8, 2009). Counterbalanced Measures Design by Martyn Shuttleworth. Retrieved Aug 22, 2013, from Explorable.com: http://explorable.com/counterbalanced-measures-design

Although there are differences in how the tests are set up,
calculating the t-statistic is the same in each case. The
differences between the approaches are conceptual, not
mathematical. Both approaches have the same purpose—to
control for any score varia-
tion stemming from nonrelevant factors. They both reduce the
error variance that comes
from using nonequivalent groups. Therefore, testing for
homogeneity of variance or the
Levene’s test is moot here, as we are not dealing with
differences between groups. On the
other hand, we are dealing with variance within groups and
across pairs of treatments.
If there are such significant differences, then this issue
constitutes what is described as a
violation of sphericity, which will be discussed in Section 7.3
with more depth when we
examine repeated-measures ANOVA.
Calculating t in a Dependent-Groups Design
Although the differences between before/after, matched-pairs,
and within-treatment
t-tests are not math-related, there are several approaches to
calculating the t statistic in the
dependent-groups tests. Whatever their differences, they all take
into
account the fact that the two sets of scores are related. One
approach is
to calculate the correlation between the two sets of scores and
then to
use the strength of the correlation value to reduce the error
variance—
the higher the correlation between the two sets of scores, the
lower the
error variance. Rather than correlations, which come up later in
the
book, we will rely on “difference scores.” But whether we use
correla-
tion values or difference scores, the result is the same.
The distribution of difference scores was discussed in Chapter 5
when the independent
t-test was introduced. The point of that distribution is to
determine the point at which the
difference between a pair of sample means (M1 2 M2) is so
great that the most probable
explanation is that the samples were not drawn from populations
with the same means.
That same distribution also provides the theoretical
underpinning for the dependent-
groups tests, but rather than the difference between the means
of the two groups
(M1 2 M2), the difference score in the dependent-groups tests is
based on the mean of the
differences between pairs of individual scores. That is, the
differences between each pair of
related scores will be determined, and then the mean of those
differences will become the
numerator in the t ratio. If the mean of the difference scores is
sufficiently different from
the mean of the distribution of difference scores (which, recall,
is 0), the t value will be
statistically significant.
The denominator in the t ratio is another standard error of the
mean value, but in this case,
it is the standard error of the mean for those difference scores.
The mechanics of checking
for significance are similar to what was done for the
independent t:
• A critical value from the t table defines the point at which the
t ratio is statisti-
cally significant.
• The critical value is dependent upon the degrees of freedom
for the problem.
For the dependent-samples t, the degrees of freedom are the
number of pairs of
scores minus 1 (n 2 1).
The dependent-groups t-test statistic has this form:
t = Md / SEMd    (Formula 7.1)
Try It! (C)
How are the before/after t-test and the matched-pairs t-test different?
where
Md = the mean of the difference scores
SEMd = the standard error of the mean for the difference scores
The steps for completing the test follow:
1. From the two scores for each subject, subtract the second from the first to determine the difference score, d, for each pair.
2. Determine the mean of the d scores: Md = Σd / number of pairs.
3. Calculate the standard deviation of the d values, sd.
4. Calculate the standard error of the mean for the difference scores, SEMd, by dividing the result of Step 3 by the square root of the number of pairs of scores: SEMd = sd / √(number of pairs).
5. t = Md / SEMd
Following is an example to illustrate the steps to calculating the
dependent measures
t-test. A psychologist is investigating the impact that verbal
reinforcement has on the
number of questions students ask in a seminar.
• Ten upper-level students participate in two seminars where a
presentation is fol-
lowed by students’ questions.
• In the first seminar, no feedback is provided by the instructor
after a student asks
the presenter a question.
• In the second seminar, the instructor offers feedback—such as
“That’s an excel-
lent question” or “Very interesting question” or “Yes, that had
occurred to me as
well”—after each question.
• The psychologist will test the following hypothesis:
H0: There is no significant mean difference in the number of student questions asked from seminar 1 to seminar 2.
H0: μseminar_1_questions = μseminar_2_questions
• By rejecting H0, the psychologist will find support for the alternative hypothesis:
Ha: There is a significant mean difference in the number of student questions asked from seminar 1 to seminar 2.
Ha: μseminar_1_questions ≠ μseminar_2_questions
Is there a significant difference between the number of
questions students ask in the first
seminar compared to the number of questions students ask in the
second seminar? The
number of questions asked by each student in both seminars and
the solution to the prob-
lem are in Figure 7.2.
Figure 7.2: Calculating the before/after and within-treatment t

Student   Seminar 1   Seminar 2    d
1.            1           3       −2
2.            0           2       −2
3.            3           4       −1
4.            0           0        0
5.            2           3       −1
6.            1           1        0
7.            3           5       −2
8.            2           4       −2
9.            1           3       −2
10.           2           1        1

1. Determine the difference between each pair of scores, d, by subtraction.
2. Determine the mean of the differences, the d values (Md): Σd = −11, so Md = Σd/10 = −11/10 = −1.1.
3. Calculate the standard deviation of the d values (sd). Verify that sd = 1.101.
4. Just as the standard error of the mean in the earlier tests was s/√n, determine the standard error of the mean for the difference scores (SEMd) by dividing the result of Step 3 by the square root of the number of pairs. Verify that SEMd = sd/√np = 1.101/√10 = 0.348.
5. Divide Md by SEMd to determine t: t = Md/SEMd = −1.1/0.348 = −3.161.
6. As noted earlier, the degrees of freedom for the critical value of t for this test are the number of pairs of scores, np − 1: t0.05(9) = 2.262.
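The five steps can be checked with a short Python sketch (standard library only; the variable names are mine) using the seminar data from Figure 7.2:

```python
import math
import statistics

# Question counts for the same 10 students in each seminar (Figure 7.2).
seminar_1 = [1, 0, 3, 0, 2, 1, 3, 2, 1, 2]
seminar_2 = [3, 2, 4, 0, 3, 1, 5, 4, 3, 1]

# Step 1: difference score d for each pair (first score minus second).
d = [s1 - s2 for s1, s2 in zip(seminar_1, seminar_2)]

# Step 2: mean of the difference scores.
m_d = statistics.mean(d)              # -1.1

# Steps 3 and 4: standard deviation of d, then its standard error.
s_d = statistics.stdev(d)             # about 1.101
sem_d = s_d / math.sqrt(len(d))       # about 0.348

# Step 5: the dependent-groups t ratio (Formula 7.1).
t = m_d / sem_d

print(round(t, 3))                    # -3.161, matching the longhand result
```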
The calculated value of t exceeds the critical value from Table
5.1 (which is also Table B in
the Appendix). The result is statistically significant. Note that it
is the absolute value of the
calculated t in which we are interested. Because the question
was whether there is a sig-
nificant difference in the number of questions, it is a two-tailed
test, and it does not matter
which session had the greater number; it also does not matter
whether Session 1 is larger
than Session 2 or the other way around. The students in the
second session, where ques-
tions were followed by feedback, asked significantly more
questions than the students did
in the first session, when the instructor offered no feedback.
The Degrees of Freedom, the Dependent-Groups Test, and
Power
With Md = −1.1, there is comparatively little difference between
the two sets of scores.
What makes such a small mean difference statistically
significant? The answer is in the
amount of error variance in this problem. When the error
variance is also very small—the
standard error of the difference scores is just .348—
comparatively small mean differences
can be statistically significant. The rationale for using
dependent-
groups tests as opposed to independent-group designs is that the
for-
mer are comparatively more powerful; there is less error to
contend
with, thereby increasing the probability of rejecting the null
hypoth-
esis. This brings us to the discussion of power in statistical
testing.
Table B in the Appendix, the critical values of t, indicates that
critical
values decline as degrees of freedom increase. That occurs not
only in
the critical values for t but also for F in analysis of variance and
in fact
for most tables of critical values for statistical tests. For the
dependent-groups t-test, the degrees of freedom are based on
• the number of pairs of related scores, minus 1.
For the independent-groups t-test, the degrees of freedom are based on
• the number of scores in both groups, minus 2 (Chapter 5).
This means that critical values are larger in a dependent-groups
test for the same number
of raw scores involved. But even a test with a larger critical
value can produce significant
results when there is more control of error variance. This is
what the dependent-groups
test provides. The central point is this: When each pair of scores
comes from the same par-
ticipant, or from a matched pair of participants, the random
variability from nonequiva-
lent groups is minimal because scores tend to vary similarly for
each pair, resulting in
relatively little error variance. The small SEMd value that
results more than compensates
for the fewer degrees of freedom and the associated larger
critical value connected to
dependent-groups tests.
In statistical testing, power is defined as the likelihood of
detecting a significant differ-
ence when it is present. The more powerful statistical test is the
one that will most read-
ily detect a significant difference. As long as the sets of scores
are closely related, the
dependent-measures test is more powerful than the independent-
groups equivalent.
Try It! (D)
What does it mean to say that the within-subjects test has more power than the independent t-test?
A Matched-Pairs Example
Another form of the dependent-groups t-test is the matched-
pairs design. In this approach,
rather than measure the same people repeatedly, each
participant in one group is paired
with a participant in the other group who is similar.
For example, consider a market analyst who wants to determine
whether a television com-
mercial will induce consumers to spend more on a breakfast
cereal. The analyst selects a
group of consumers entering a grocery store, induces them to
view the television com-
mercial, and then tracks their expenditures on breakfast cereal.
A second group is selected,
and they also shop, but they do not view the television
commercial. The analyst selects
people for the second group who match the age and gender
characteristics of those in the
first group. This controls for age and gender because those
characteristics might affect
spending for the particular product. Each individual from Group
1 has a companion in
Group 2 of the same age and sex. The expenditures in dollars
for the members of each
group and the solution to the problem are in Figure 7.3.
Figure 7.3: Calculating a matched-pairs t-test

Pair      Viewed   Did not view      d
1.          1.5          3         −1.5
2.          4            0           4
3.          3            2           1
4.          0            0           0
5.          2            0           2
6.          4.5          4           0.5
7.          6            2           4
8.          0            1          −1
9.          5.25         2           3.25
10.         2            3          −1

Verify that Md = 1.125 and sd = 2.092.
SEMd = sd/√np = 2.092/√10 = 0.662
t = Md/SEMd = 1.125/0.662 = 1.700
t0.05(9) = 2.262
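The same arithmetic, wrapped in a small reusable function (a Python sketch on the Figure 7.3 data; the function and variable names are my own), reproduces this result:

```python
import math
import statistics

def dependent_t(first, second):
    """Dependent-samples t from difference scores: t = Md / SEMd."""
    d = [a - b for a, b in zip(first, second)]
    sem_d = statistics.stdev(d) / math.sqrt(len(d))
    return statistics.mean(d) / sem_d

# Cereal spending in dollars for the matched pairs in Figure 7.3.
viewed = [1.5, 4, 3, 0, 2, 4.5, 6, 0, 5.25, 2]
did_not_view = [3, 0, 2, 0, 0, 4, 2, 1, 2, 3]

t = dependent_t(viewed, did_not_view)
print(f"{t:.3f}")   # 1.700 -- below the critical 2.262, so not significant
```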
The absolute value of t is less than the critical value from Table
5.1 (or Appendix Table B) for df = 9. The difference is not
statistically significant. There are probably several ways to
explain the outcome, but we will explore just three.
• The most obvious explanation is that the commercial did not
work. The shoppers
who viewed the commercial were not induced to spend
significantly more than
those who did not view it.
• Another explanation has to do with the matching. Perhaps age
and gender are
not related to how much people spend shopping for the
particular product. Per-
haps the shopper’s level of income is the most important
characteristic, and that
was not controlled in the pairing.
• Another explanation is related to sample size. Small samples
tend to be more
variable than larger samples, and variability is what the
denominator in the
t-ratio reflects. Perhaps if this had been a larger sample, the
SEMd would have
had a smaller value, and the t would have been significant.
The second explanation points out the disadvantage of matched-
pairs designs compared
to repeated-measures designs. The individual conducting the
study must be in a posi-
tion to know which characteristics of the participants are most
relevant to explaining the
dependent variable so that they can be matched in both groups.
Otherwise, it is impos-
sible to know whether a nonsignificant outcome reflects an
inadequate match, control of
the wrong variables, or a treatment that just does not affect the
DV.
Comparing the Dependent-Samples t-Test to the Independent t
In order to compare the dependent-samples t-test and the
independent t more directly, we
are going to apply both tests to the same data. This will
illustrate how each test deals with
error variance; however, a caution is necessary before
beginning: Once data is collected,
there really is no situation where someone can choose which
test to use because either the
groups are independent, or they are not. Therefore, we proceed
purely as an academic
exercise, recognizing that such a situation is not going to
happen in the ordinary course
of events.
As an example, a university program encourages students to
take a service learning class
that emphasizes the importance of community service as a part
of the students’ educa-
tional experience. Data is gathered on the number of hours
former students spend in com-
munity service per month after they complete the course and
graduate from the university.
• For the independent t-test, the students are divided between
those who took a
service learning class and graduates of the same year who did
not.
• For the dependent-groups t-test, those who took the service
learning class are
matched to a student with the same major, age, and gender who
did not take
the class.
The data and the solutions to both tests are in Figure 7.4.
Figure 7.4: The before/after t-test versus the independent t-test
Because the differences between the scores are quite consistent,
as they tend to be when
participants are matched effectively, there is very little variance
in the difference scores.
This results in a comparatively small standard deviation of
difference scores and a small
standard error of the mean for the difference scores. This allows
for t ratios with even rela-
tively small numerators to be statistically significant. Because
for the independent t-test,
there is no assumption that the two groups are related, error
variance is based on the dif-
ferences within the groups of raw scores, and the denominator is
large enough that the t
value is not significant.
Pair    Class   No Class     d
1.        4        3         1
2.        3        2         1
3.        3        2         1
4.        2        2         0
5.        3        2.5       0.5
6.        4        3         1
7.        1        2        −1
8.        5        4         1
9.        6        5         1
10.       4        3         1

        Class   No Class     d
M       3.50     2.850     0.650
s       1.434    1.001     0.669
SEM     0.453    0.316     0.211

As an independent t-test we have
SEd = √(SEM1² + SEM2²) = √(0.453² + 0.316²) = 0.553
t = (M1 − M2)/SEd = (3.50 − 2.850)/0.553 = 1.175; t0.05(18) = 2.101. The result is not significant.

As a matched-pairs t-test the results are
t = Md/SEMd = 0.650/0.211 = 3.081; t0.05(9) = 2.262. The result is significant.
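Run as a Python sketch (standard library only; the variable names are mine), the two analyses of the same data give the values above. Note that the unrounded dependent t comes out at about 3.074, matching the Excel output in Figure 7.5, while the longhand 3.081 reflects intermediate rounding:

```python
import math
import statistics

# Community-service hours per month for the Figure 7.4 data.
took_class = [4, 3, 3, 2, 3, 4, 1, 5, 6, 4]
no_class = [3, 2, 2, 2, 2.5, 3, 2, 4, 5, 3]

# Independent t: the error term comes from each group's own spread.
sem_1 = statistics.stdev(took_class) / math.sqrt(len(took_class))
sem_2 = statistics.stdev(no_class) / math.sqrt(len(no_class))
se_d = math.sqrt(sem_1 ** 2 + sem_2 ** 2)
t_independent = (statistics.mean(took_class) - statistics.mean(no_class)) / se_d

# Dependent t: the error term comes from the difference scores instead.
d = [a - b for a, b in zip(took_class, no_class)]
t_dependent = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

print(f"{t_independent:.3f}")   # 1.175 -- below t0.05(18) = 2.101, not significant
print(f"{t_dependent:.3f}")     # 3.074 -- exceeds t0.05(9) = 2.262, significant
```

The contrast illustrates the chapter's point: the consistent pair-by-pair differences give a much smaller error term than the raw within-group spreads do.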
The Dependent-Groups t-Test on Excel
If the problem in Figure 7.4 is completed in Excel as a
dependent-groups test, the proce-
dure is as follows:
• Create the data file in Excel.
Column A is labeled Class to indicate those who had the service
learning class, and
column B is labeled No Class.
Enter the data, beginning with cell A2 for the first group and
cell B2 for the second
group.
• Click the Data tab at the top of the page.
• At the extreme right, choose Data Analysis.
• In the Analysis Tools window, select t-test: Paired Two
Sample for Means and
click OK.
• There are two blanks near the top of the window for Variable
1 Range and
Variable 2 Range. In the first, enter A2:A11 indicating that the
data for the first
(Class) group is in cells A2 to A11. In the second, enter B2:B11
for the No Class
group.
• Indicate that the hypothesized mean difference is 0. This
reflects the value for the
mean of the distribution of difference scores.
• Indicate A13 for the output range, so that the results do not
overlay the data
scores.
• Click OK.
Widen column A so that all the output is readable. The result is
the screenshot that is
Figure 7.5.
Figure 7.5: The Excel output for the dependent-samples t-test
using the
data from Figure 7.4
In the Excel solution, t = 3.074 rather than the 3.081 from the
longhand solution. The Excel approach is to calculate the
correlation between scores to find a solution, rather than to
determine the difference between scores as we did. Note that the
Pearson correlation (which will be explained in Chapter 8) is
indicated at .91. In any event, the very minor difference, .007,
between the solution in Figure 7.4 and the Excel solution in
Figure 7.5 is not relevant to the outcome, as it is attributable
to rounding error. The Excel output also indicates results for
both one-tailed and two-tailed tests at p = .05; the outcome is
statistically significant.
Apply It!
Repeated Measures
Diabetes is a group of metabolic diseases in which the body cannot properly regulate blood sugar. Management of this disease is achieved by keeping blood glucose at normal levels for as much of the time as possible. This requires an accurate, portable glucose monitor for home use.
A medical device company has developed a new portable glucose monitor and wishes to compare it against a laboratory standard. This will produce a data set in which two different monitors measure the glucose level of 11 randomly chosen diabetes patients. Although the two monitors take the blood samples at the same time, this can be considered an example of the before/after dependent-samples t-test because the same group is measured twice. By using the same set of patients for both monitors, each patient is his or her own control. Obtaining two measurements for each patient reduces measurement variability compared to using two independent sets of patients. Choosing a level of significance of p ≤ .05, we use the paired-sample t-test to test the null hypothesis that there is no difference in measurements between the two monitors.
• H0: μglucose_portable_monitor = μglucose_lab_monitor
By rejecting H0 the company will find support for the alternative hypothesis that there is a significant mean difference in the glucose level between the two machines.
• Ha: μglucose_portable_monitor ≠ μglucose_lab_monitor
The glucose readings from each of the two monitors are measured in milligrams per deciliter and are shown in the following table. There is a large variability within each column because each patient is different, and the readings were taken at various times of the day.
Patient Portable Monitor Laboratory Standard
A 112 120
B 85 82
C 103 116
D 154 168
E 65 75
F 52 51
G 85 96
H 72 79
I 167 178
J 123 141
K 142 153
Comparing the Three Dependent t-Tests With the Independent t-Test
The before/after and matched-pairs approaches to calculating a dependent-groups t-test each have advantages. The before/after design provides the greatest control over the extraneous variables that can confound the results in a matched-pairs design. When using the matching approach, there is always a chance that subjects in Group 2 are not matched closely enough on some relevant variable and the resulting mismatches create error variance. In the service learning example, students were matched according to age, major, and gender. But if marital status affects students' willingness to be involved in community service and it is not controlled, there could be an imbalance of married/not-married
Apply It! (continued)
The Excel solution follows:

                               Variable 1   Variable 2
Mean                           105.45       114.45
Variance                       1428.67      1736.27
Observations                   11           11
Pearson Correlation            0.99
Hypothesized Mean Difference   0.00
df                             10
t Stat                         −4.817
P(T≤t) one-tail                0.0003
t Critical one-tail            1.8125
P(T≤t) two-tail                0.0007
t Critical two-tail            2.2281

The magnitude of the calculated value of t = −4.817 exceeds the critical two-tail value from the table of tcrit = 2.23. The result is statistically significant, so we reject the null hypothesis that the means are the same. The portable monitor measures glucose levels lower than the laboratory standard.
Based on the results of this test, the company continued research on the portable monitor until it could devise a solution that would more accurately replicate laboratory-standard results.
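The longhand difference-score method from Figure 7.4 can be checked against this output with a short script. This is an illustrative sketch in pure Python, not part of the Excel workflow; the readings are the 11 patient values from the table above.

```python
# Paired (dependent-groups) t-test via difference scores:
# t = M_d / (s_d / sqrt(n)), using the glucose readings from the table.
from math import sqrt

portable = [112, 85, 103, 154, 65, 52, 85, 72, 167, 123, 142]
lab      = [120, 82, 116, 168, 75, 51, 96, 79, 178, 141, 153]

d = [p - l for p, l in zip(portable, lab)]            # difference scores
n = len(d)
mean_d = sum(d) / n                                   # mean difference
var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)   # sample variance of d
t = mean_d / sqrt(var_d / n)

print(round(mean_d, 1), round(t, 3))  # -9.0 -4.817
```

The negative mean difference reflects the portable monitor reading lower than the laboratory standard, and the t value matches the magnitude of the t Stat in the Excel output.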
Apply It! boxes written by Shawn Murphy.
students that confounds results. The before/after procedure involves the same students, and unless their status on some important variable changes between measures (a rash of marriages between the first and second measurement, for example), there is going to be better control of error variance with that approach.
Note that the matched-pairs and within-treatments approaches also assume a large sample from which to draw in order to select participants who match those in the first group. As the number of variables on which participants must be matched increases, so must the size of the sample from which to draw in order to find participants with the correct combination of characteristics.
The advantage of the matched-pairs design, on the other hand, is that it takes less time to execute. The treatment group and the control group can both be involved in the study at the same time. By way of a summary, note the comparisons in Table 7.1.
Table 7.1: Comparing the t-tests

                 Independent t      Before/After      Matched-Pairs             Within-Treatments
Groups           Independent        One group         Two groups: each          One group
                 groups             measured twice    subject from the first    measured twice
                                                      group matched to one      for two treatments
                                                      in the second
Denominator/     Within-groups      Only within-      Only within-groups        Only within-groups
error term       variability plus   groups            variability               variability
                 between-groups     variability
7.3 The Within-Subjects F
Sometimes two measures of the same group are not enough to track changes in the DV. Maybe the researchers running the service learning study want to compare how much time students devoted to community service the year they graduated, one year later, and then two years after graduation. The within-subjects F is a dependent-groups procedure for two or more groups of scores when the dependent variable is interval or ratio scale. Because the dependent-groups t-test is the repeated-measures equivalent of the independent t-test, the within-subjects F is the repeated-measures or matched-pairs equivalent of the one-way ANOVA. The same Ronald A. Fisher who developed analysis of variance also developed this test, which is a form of ANOVA, and the test statistic is still F.
Here too, the dependent groups can be formed by either repeatedly measuring the same group or by matching separate groups of participants on the relevant variables. When there are more than two groups, matching becomes increasingly problematic, however, and although it is theoretically possible to match any number of participant groups, it is
a highly complex undertaking to match all the relevant variables across more than two or three measures. Repeatedly measuring the same participants is much more common than matching.
Managing Error Variance in the Within-Subjects F
Recall from Chapter 6 that when Fisher developed ANOVA, he shifted away from calculating score variability with the standard deviation, standard error of the mean, and so on and used sums of squares instead. The particular sums of squares computed are the key to the strength of this procedure.
If a group of participants in a study is measured on a dependent variable at three different intervals and their scores are recorded in parallel columns, the researcher will have a data sheet similar to the following:
First Measure Second Measure Third Measure
Participant 1 . . .
Participant 2 . . .
• The column scores are the equivalent of scores from the different groups in a one-way ANOVA, and any differences from column to column reflect the effect of the IV, the treatment.
• The participant-to-participant differences, the within-group differences, are reflected in the differences in the scores from row to row. Those differences are error variance just as they are with the one-way ANOVA.
• The within-subjects F approach is to calculate the variability between rows (the within-groups variance), and then, because it comes from participant-to-participant differences that are the same in each group, to eliminate it from further analysis.
• The only error variance that remains is that which does not stem from the person-to-person differences.
In the dependent-samples t-test, the within-subjects variance is managed by reducing the denominator in the t ratio according to how highly correlated the two sets of measures are (the Excel approach) or by the longhand approach of using the standard deviation of the difference scores, which is relatively small when scores are related.
In the within-subjects F, the variability within groups is calculated and then adjusted if there is too much variance between pairs of treatments. This detection of variance is based on Mauchly's test of sphericity (W), developed by John W. Mauchly in 1940. If the W test is significant (p < .05), then there is a violation of sphericity, which means that there is too much variance within the group across pairs of times/treatments (see Table 7.2). When sphericity is violated, degrees-of-freedom adjustments are made using the Greenhouse-Geisser or Huynh-Feldt calculations. These are adjustments of the degrees of freedom (df) based on their respective epsilon (ε) values (discussed more in Chapter 8). Of the two options, the Greenhouse-Geisser is more conservative in that it is harder to reject the null hypothesis, with a lower probability of a type I error. The Huynh-Feldt is based on a bias-corrected value that is not as conservative.
One final note regarding error variance: it can only be calculated across comparisons of pairs of treatments, so the W test is not necessary for the dependent-samples t-test, since there is only one pair of values. In addition, the test for sphericity cannot be done in the one-way ANOVA because the amount of variability within groups is different for each group, and there is no way to separate it from the balance of the error variance in the problem, which can be a severe limitation affecting the power of between-group designs. An example and interpretation of sphericity will be shown in the SPSS example later in the chapter.
Table 7.2: The concept of sphericity

Patient    Tx A   Tx B   Tx C   Tx1−Tx2   Tx1−Tx3   Tx2−Tx3
1           30     27     20       3         10        7
2           35     30     28       5          7        2
3           25     30     20      −5          5       10
4           15     15     12       0          3        3
5            9     12      7      −3          2        5
Variance                          17       10.3      10.3
A Within-Subjects F Example
An industrial/organizational psychologist is conducting a study of employees who assemble electronic components. The study examines how productivity changes during the length of time employed. The psychologist identifies five workers hired in the same month and then gauges the number of assembled components each employee averages per hour one week, one month, and then two months after beginning work. Is there a relationship between the number of completed components and the length of time employed? The data for the five employees follow:
Products Assembled per Hour
1 week 1 month 2 months
Diego 2 5 4
Harold 4 7 7
Wilma 3 6 5
Carol 4 5 6
Moua 5 8 9
• The independent variable (the IV, the treatment) is the time elapsed.
• The dependent variable (the DV) is the number of components assembled.
• The issue is whether there are significant differences in the measures from column to column (over time).
In Chapter 6 the variability related to the IV was measured in the sum of squares between (SSbet). The same source of variance is gauged here, except that it is called the sum of squares between columns (SScol).
The Components of the Within-Subjects F
Calculating the within-subjects F begins just as the one-way ANOVA begins, by determining all variability from all sources with the sum of squares total (SStot). It is even calculated the same way as it was in Chapter 6:
1. The sum of squares total.
   SStot = Σ(x − MG)²
   a. Subtract each score from the mean of all the scores from all the groups,
   b. square the difference, and then
   c. sum the squared differences.
The balance of the problem is completed with the following steps:
2. The sum of squares between columns (SScol). This equation is much like SSbet in the one-way ANOVA. The scores in each column are treated the same way the individual groups are treated in the one-way ANOVA. For columns 1, 2, through k,
   SScol = (Mcol 1 − MG)²ncol 1 + (Mcol 2 − MG)²ncol 2 + . . . + (Mcol k − MG)²ncol k     Formula 7.2
   a. calculate the mean for each column of scores,
   b. subtract the mean for all the data (MG) from each column mean,
   c. square the result, and
   d. multiply the squared result by the number of scores in the column.
3. The sum of squares between rows. This, too, is like the SSbet from the one-way problem except that it treats the scores for each row as a separate group. For rows 1, 2, through i,
   SSrows = (Mrow 1 − MG)²nrow 1 + (Mrow 2 − MG)²nrow 2 + . . . + (Mrow i − MG)²nrow i     Formula 7.3
   a. calculate the mean for each row of scores,
   b. subtract the mean for all the data from each row mean,
   c. square the result, and
   d. multiply the squared result by the number of scores in the row.
4. The residual sum of squares is the error term in the within-subjects F. It is the equivalent of SSwith in the one-way ANOVA. With the within-subjects F, the variability in scores due to person-to-person differences within the same measure is calculated, and because it is the same for each set of measures, it is eliminated. This will result in a reduced error term. It is determined as follows:
   SSresid = SStot − SScol − SSrows     Formula 7.4
   a. Take all variance from all sources (SStot),
   b. subtract from it the treatment effect (SScol), and
   c. subtract the person-to-person differences (SSrows).
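Steps 1 through 4 can be sketched in a few lines of pure Python, using the five employees' scores from the table above. This is an illustrative check of the longhand arithmetic, not the book's Excel procedure.

```python
# Sums of squares for the within-subjects F (Formulas 7.2-7.4),
# computed on the products-assembled-per-hour data.
rows = [            # columns: 1 week, 1 month, 2 months
    [2, 5, 4],      # Diego
    [4, 7, 7],      # Harold
    [3, 6, 5],      # Wilma
    [4, 5, 6],      # Carol
    [5, 8, 9],      # Moua
]
scores = [x for r in rows for x in r]
MG = sum(scores) / len(scores)                     # grand mean

ss_tot = sum((x - MG) ** 2 for x in scores)        # step 1
cols = list(zip(*rows))
ss_col = sum(len(c) * (sum(c) / len(c) - MG) ** 2 for c in cols)    # step 2
ss_rows = sum(len(r) * (sum(r) / len(r) - MG) ** 2 for r in rows)   # step 3
ss_resid = ss_tot - ss_col - ss_rows               # step 4

print(round(ss_tot, 3), round(ss_col, 3), round(ss_rows, 3), round(ss_resid, 3))
# 49.333 22.533 23.333 3.467
```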
The Within-Subjects F Calculations
When the sums of squares values are completed, the next step is to complete the ANOVA table. The degrees-of-freedom values are as follows:
• dftot = N − 1
• dfcol = number of columns − 1
• dfrows = number of rows − 1
• dfresid = dfcol × dfrows
Just as with one-way problems, the mean square values are calculated by dividing the sums of squares by their degrees of freedom. The components of the F value, and the only MS values required, are MScol, which includes the treatment effect, and MSresid, which is the error term. The MS is not determined for total or for rows. The F value in the within-subjects ANOVA is then MScol ÷ MSresid.
The calculations and the table for the products-assembled-per-hour problem are in Figure 7.6.
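The degrees-of-freedom and mean-square steps just described can be sketched as a short Python check. The sums of squares are the values computed in the text; this is an illustration, not part of the Excel workflow.

```python
# Completing the ANOVA table: df, MS, and F for the
# products-assembled example (SS values from the text).
ss_tot, ss_col, ss_rows = 49.333, 22.533, 23.333
ss_resid = ss_tot - ss_col - ss_rows   # Formula 7.4

df_col = 3 - 1                 # number of columns - 1
df_rows = 5 - 1                # number of rows - 1
df_resid = df_col * df_rows    # 2 x 4 = 8

ms_col = ss_col / df_col       # treatment effect
ms_resid = ss_resid / df_resid # error term
F = ms_col / ms_resid          # F = MScol / MSresid

print(round(F, 1))  # 26.0
```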
Try It!
How is the error term in the within-subjects F different from that in the one-way ANOVA?
Figure 7.6: A within-subjects F example

The calculated value of F exceeds the critical value of F from the table. The number of products assembled per hour is significantly different according to the amount of time the employee has been on the job. The significant F indicates that this much difference between measures is unlikely to have occurred by chance.

The Products Assembled per Hour

                1 week   1 month   2 months   Row Means
Diego             2         5         4         3.667
Harold            4         7         7         6.0
Wilma             3         6         5         4.667
Carol             4         5         6         5.0
Moua              5         8         9         7.333
Column Means    3.60      6.20      6.20
Grand Mean (MG) = 5.333

The ANOVA Table

Source      SS       df   MS       F      Fcrit
Total       49.333   14
Columns     22.533    2   11.267   26.0   4.46
Rows        23.333    4
Residual     3.467    8    0.433

1. SStot = Σ(x − MG)²
   (2 − 5.333)² + (4 − 5.333)² + ... + (9 − 5.333)² = 49.333
2. SScol = (Mcol1 − MG)²ncol1 + (Mcol2 − MG)²ncol2 + ... + (Mcolk − MG)²ncolk
   (3.6 − 5.333)²(5) + (6.2 − 5.333)²(5) + (6.2 − 5.333)²(5) = 22.533
3. SSrows = (Mr1 − MG)²nr1 + (Mr2 − MG)²nr2 + ... + (Mri − MG)²nri
   (3.667 − 5.333)²(3) + (6.0 − 5.333)²(3) + (4.667 − 5.333)²(3) + (5.0 − 5.333)²(3) + (7.333 − 5.333)²(3) = 23.333
4. The residual sum of squares.
   SSresid = SStot − SScol − SSrows = 49.333 − 22.533 − 23.333 = 3.467
Completing the Post Hoc Test
Ordinarily, the calculation of F leaves unanswered the question of which set of measures is significantly different from which. However, in this particular problem there is only one possibility. Because both the 1-month and the 2-month groups of measures have the same mean (M = 6.20), they must both be significantly different from the only other group of measures in the problem, the 1-week-on-the-job measures for which M = 3.6. As a demonstration, HSD is completed anyway.
The HSD procedure is the same as for the one-way test, except that the error term is now MSresid. Substituting MSresid for MSwith in the formula provides

HSD = x√(MSresid/n)

where
x is a value from Appendix Table D. It is based on the number of means, which is the same as the number of groups of measures, 3 in the example, and the df for MSresid, which are 8.
n = the number of scores there are in any one measure, 5 in this instance.
For the number-of-products-assembled-per-hour study,

HSD = 4.04√(.433/5) = 1.19

A difference of 1.19 or greater between any pair of means is statistically significant. Using the same approach used in Chapter 6, a matrix indicating the difference between each pair of means makes it easier to interpret the HSD value.

                 1 week (3.6)   1 month (6.2)   2 months (6.2)
1 week (3.6)     diff = 0       diff = 2.6*     diff = 2.6*
1 month (6.2)                   diff = 0
2 months (6.2)
*Indicates a significant difference.

The 1-week measures of productivity are significantly different from the 1-month and 2-month measures of productivity. Because the mean values of the 1- and 2-month measures are the same, neither of the last two measures is significantly different from the other. The largest increase in productivity comes between the first week and first month of employment.
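The HSD step can be sketched as follows. The value q = 4.04 is the Appendix Table D entry the text cites (3 means, df = 8); MSresid and the group means come from Figure 7.6. This is an illustrative check, not a general HSD implementation.

```python
# Tukey's HSD with the within-subjects error term:
# HSD = q * sqrt(MSresid / n), then compare each pair of means.
from math import sqrt

q = 4.04           # Appendix Table D value (3 means, df = 8)
ms_resid = 0.433   # error term from the ANOVA table
n = 5              # scores per measure

hsd = q * sqrt(ms_resid / n)
print(round(hsd, 2))  # 1.19

means = {"1 week": 3.6, "1 month": 6.2, "2 months": 6.2}
pairs = [("1 week", "1 month"), ("1 week", "2 months"), ("1 month", "2 months")]
for a, b in pairs:
    diff = abs(means[a] - means[b])
    print(f"{a} vs {b}: diff = {diff:.1f}", "*" if diff > hsd else "")
```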
Calculating the Effect Size
The final question for a significant F is the question of the practical importance of the result. Using partial-eta squared as the measure of effect size yields the following formula:

partial-η² = SScol / (SSresid + SScol)

For the problem just completed, SScol = 22.533 and SSresid = 3.467, so

partial-η² = 22.533 / 26 = 0.87

Approximately 87% of the variance in productivity can be explained by how long the individual has been on the job.
Apply It!
Pilot Program Revisited
Let us return to the example of the middle school that adopted a meditation program known as quiet time to relieve stress, increase test scores, and improve student behavior. In Chapter 5, we used a one-sample t-test to determine that a statistically significant increase in GPAs occurred among participating students. Now, we will use a within-subjects F test to see if their stress levels have decreased.
Ten randomly chosen students who participated in the program filled out questionnaires about their stress levels. The aggregate score was from 1 to 10, with 10 indicating the most stress. The survey was given before the start of the program and at 3-month intervals. The time elapsed represents the independent variable, the treatment effect that drives this analysis. The dependent variable is the stress score. The within-subjects F is a dependent-groups procedure for two or more groups of scores for which the dependent variable is interval or ratio scale. In this example, we have four groups of scores.
Results of the stress questionnaires follow.
0 Months 3 Months 6 Months 9 Months
Student 1 7 6 6 6
Student 2 9 6 5 5
Student 3 7 5 5 4
Student 4 5 3 3 2
Student 5 7 6 4 4
Student 6 8 5 7 5
Student 7 5 4 4 3
Student 8 7 5 6 5
Student 9 6 6 4 4
Student 10 7 5 5 5
Apply It! (continued)
The following table shows results of the within-subjects F test calculations.

Source       SS      df   MS      F
Total        82.000  39
Columns      34.475   3   11.492  26.36
Subjects     35.725   9
Residual     11.775  27    0.436
F.05(3,27) = 2.96

The F value of 26.36 is greater than the critical F value of 2.96, so the results are statistically significant.
Because the calculation of F did not identify the measures that were significantly different from the others, we calculate HSD using the following formula:

HSD = x√(MSresid/n)
HSD = 3.875√(0.436/10) = 0.81

A difference of 0.81 or greater between any pair of means is statistically significant. A matrix indicating the difference between each pair of means makes it easier to interpret the HSD value.

                 0 months (6.8)   3 months (5.1)   6 months (4.9)   9 months (4.3)
0 months (6.8)                    diff = 1.7*      diff = 1.9*      diff = 2.5*
3 months (5.1)                                     diff = 0.2       diff = 0.8
6 months (4.9)                                                      diff = 0.6
9 months (4.3)

The differences marked with an asterisk are significant. The largest decrease in stress occurs during the first 3 months of the program.
To determine the practical importance of these numbers, partial-eta squared is used. For the problem just completed, SScol = 34.475 and SSresid = 11.775, so

partial-η² = 34.475 / 46.25 = 0.75
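As a check, the table can be recomputed from the raw stress scores with a short Python sketch. The small gap between the F printed here (26.35) and the table's 26.36 comes from the table using rounded MS values; everything else matches.

```python
# Within-subjects F for the quiet-time data, following Formulas 7.2-7.4.
rows = [  # columns: 0, 3, 6, 9 months; one row per student
    [7, 6, 6, 6], [9, 6, 5, 5], [7, 5, 5, 4], [5, 3, 3, 2], [7, 6, 4, 4],
    [8, 5, 7, 5], [5, 4, 4, 3], [7, 5, 6, 5], [6, 6, 4, 4], [7, 5, 5, 5],
]
scores = [x for r in rows for x in r]
MG = sum(scores) / len(scores)                     # grand mean

ss_tot = sum((x - MG) ** 2 for x in scores)
cols = list(zip(*rows))
ss_col = sum(len(c) * (sum(c) / len(c) - MG) ** 2 for c in cols)
ss_subj = sum(len(r) * (sum(r) / len(r) - MG) ** 2 for r in rows)
ss_resid = ss_tot - ss_col - ss_subj
F = (ss_col / 3) / (ss_resid / 27)   # df_col = 3, df_resid = 3 * 9 = 27

print(round(ss_col, 3), round(ss_subj, 3), round(ss_resid, 3), round(F, 2))
# 34.475 35.725 11.775 26.35
```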
Comparing the Within-Subjects F and the One-Way ANOVA
In the one-way ANOVA, within-group variance is different for each group because each group is made up of different participants. There is no way to eliminate the error variance as it was eliminated for the within-subjects F because that source of error variance cannot be separated from the balance of the error variance. The smaller error term in the within-subjects test (which is the divisor in the F ratio) allows relatively small differences between the sets of measures to result in a significant F.
This is illustrated by using the same data as the example of the workers who assemble electronic components, except here we calculated a one-way ANOVA instead of the within-subjects F. This is for illustration only, because groups are either independent or dependent; there is no situation in which, once the test is conducted, someone would wonder which approach is appropriate.
The SStot and the SSbet will be the same as the SStot and the SScol are in the within-subjects problem.
SStot = 49.333
SSbet = 22.533
SSwith = Σ(xa − Ma)² + Σ(xb − Mb)² + Σ(xc − Mc)²     (Formula 6.3)
       = (2 − 3.60)² + (4 − 3.60)² + . . . + (9 − 6.20)² = 26.80
The value for the SSwith in a one-way ANOVA is the same as SSrows + SSresid in the within-subjects F in Figure 7.6. It has to be because in the one-way ANOVA, there is no way to separate the participant-to-participant differences from the balance of the error variance
Apply It! (continued)
About 75% of the variance in stress can be explained by how long the student has been enrolled in the program.
The within-subjects F test allowed analysis of students' stress levels at multiple times throughout the year and showed that the program was reducing stress levels by significant amounts.
Apply It! boxes written by Shawn Murphy.
because they are different for each group. With the SSrows added back into the error term, note in Table 7.3 the changes made to the ANOVA table and to F in particular.
• The degrees of freedom for "within" change to 12 from the 8 for residual, which results in a smaller critical value for the independent-groups test, but that adjustment does not compensate for the additional error in the term.
• Note that the sum of squares for the error term jumps from 3.467 in the within-subjects test to 26.80 in the independent-groups test.
• The F value is reduced from 26.0 in the within problem to 5.045 in the one-way problem, a factor of about 1/5.
Because groups are either independent or not, the example is not realistic. Nevertheless, the calculations illustrate the advantage to statistical power of setting up a dependent-groups test, an option researchers have at the planning level.
Table 7.3: The within-subjects F example repeated as a one-way ANOVA
The ANOVA table

Source     SS      df   MS      F      Fcrit
Between    22.533   2   11.267  5.045  3.89
Within     26.800  12    2.233
Total      49.333  14
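The comparison can be verified in a few lines of Python. This sketch runs the same scores as a one-way ANOVA (Chapter 6's Formula 6.3 for SSwith) to show the larger error term when the row differences cannot be separated out.

```python
# The products-assembled data treated as three independent groups.
groups = [
    [2, 4, 3, 4, 5],   # 1 week
    [5, 7, 6, 5, 8],   # 1 month
    [4, 7, 5, 6, 9],   # 2 months
]
scores = [x for g in groups for x in g]
MG = sum(scores) / len(scores)

ss_bet = sum(len(g) * (sum(g) / len(g) - MG) ** 2 for g in groups)
ss_with = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)  # Formula 6.3
F = (ss_bet / 2) / (ss_with / 12)   # df_bet = 2, df_with = 12

print(round(ss_with, 2), round(F, 3))  # 26.8 5.045
```

The same treatment effect (SSbet = 22.533) divided by the inflated error term drops F from 26.0 to about 5.045.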
Another Within-Subjects F Example
A psychologist working at a federal prison is interested in the relationship between the amount of time a prisoner is incarcerated and the number of violent acts in which the prisoner is involved. Using self-reported data, inmates respond anonymously to a questionnaire administered 1 month, 3 months, 6 months, and 9 months after incarceration. The data and the solution are in Figure 7.7.
The results (F) indicate that there are significant differences in the number of violent acts documented for the inmate related to the length of time the inmate has been incarcerated. The HSD results indicate that those incarcerated for 1 month are involved in a significantly different number of violent acts than those who have been in for 3 or 6 months. Those who have been in for 6 months are involved in a significantly different number of violent acts than those who have been in for 9 months. The eta-squared value indicates that about 37% of the variance in number of violent acts is a function of how long the inmate has been incarcerated.
Try It!
We compared a one-way ANOVA to a within-subjects F using the same data. How would the eta-squared values for the two problems compare?
Figure 7.7: Another within-subjects F: Violence and the time of incarceration

           1 month   3 months   6 months   9 months   Row Means
Inmate 1      4         3          2          5         3.50
Inmate 2      5         4          3          4         4.00
Inmate 3      3         1          1          2         1.75
Inmate 4      4         2          1          3         2.50
Inmate 5      2         1          2          3         2.00
Column
Means       3.60       2.20       1.80       3.40

MG = 2.750

The ANOVA Table

Source      SS      df   MS      F
Total       31.75   19
Columns     11.75    3   3.917   9.393
Subjects    15.00    4
Residual     5.00   12   0.417

1. SStot = Σ(x − MG)² = 31.750
2. SScol = (Mcol1 − MG)²ncol1 + (Mcol2 − MG)²ncol2 + (Mcol3 − MG)²ncol3 + (Mcol4 − MG)²ncol4
   (3.6 − 2.75)²(5) + (2.2 − 2.75)²(5) + (1.8 − 2.75)²(5) + (3.4 − 2.75)²(5) = 11.750
3. SSsubj = (Mr1 − MG)²nr1 + (Mr2 − MG)²nr2 + (Mr3 − MG)²nr3 + (Mr4 − MG)²nr4 + (Mr5 − MG)²nr5
   (3.5 − 2.75)²(4) + (4.0 − 2.75)²(4) + (1.75 − 2.75)²(4) + (2.5 − 2.75)²(4) + (2.0 − 2.75)²(4) = 15.0
4. SSresid = SStot − SScol − SSsubj = 31.75 − 11.75 − 15 = 5.0

Verify that F0.05(3,12) = 3.49. F is significant.

The post hoc test: HSD = x.05√(MSresid/n) = 4.20√(0.417/5) = 1.213

            M1 = 3.6   M2 = 2.2   M3 = 1.8   M4 = 3.4
M1 = 3.6               1.4*       1.8*       0.2
M2 = 2.2                          0.4        1.2
M3 = 1.8                                     1.6*
M4 = 3.4

η² = SScol/SStot = 11.75/31.75 = 0.370. About 37% of the variance in violence witnessed is related to how long the inmate has been incarcerated.
A Within-Subjects F in Excel
In spite of the important increase in power that is available compared to independent-groups tests, a dependent-groups ANOVA is not one of the more common tests. It is not one of the options Excel offers in the list of Data Analysis Tools, for example. However, like many statistical procedures, there are a number of repetitive calculations involved, and Excel can simplify these. We will complete the second problem as an example.
1. Set the data up in four columns just as they are in Figure 7.8, but create a blank column to the right of each column of data. With a row at the top for the labels, data begins in cell A2.
2. Calculate the row and column means as well as a grand mean as follows:
   a. For the column means, place the cursor in cell A7 just beneath the last value in the first column and enter the formula =AVERAGE(A2:A6) followed by Enter. To repeat this for the other columns, left-click on the solution that is now in A7, drag the cursor across to G7, and release the mouse button. In the Home tab, click Fill and then Right. This will repeat the column-means calculations for the other columns. Delete the entries this makes to cells B7, D7, and F7, which are still empty at this point.
   b. For the row means, place the cursor in cell I2 and enter the formula =AVERAGE(A2,C2,E2,G2) followed by Enter. To repeat this for the other rows, left-click on the solution that is now in I2, drag the cursor down to I6, and release the mouse button. In the Home tab, click Fill and then Down. This will repeat the calculation of means for the other rows.
   c. For the grand mean, place the cursor in cell I7 and enter the formula =AVERAGE(I2:I6) followed by Enter (the mean of the row means will be the same as the grand mean; the same could have been done with the column means).
3. To determine the SStot:
   a. In cell B2, enter the formula =(A2-2.75)^2 and press Enter. This will square the difference between the value in A2 and the grand mean. To repeat this for the other data in the column, left-click the cursor in cell B2 and drag down to cell B6. Click Fill and Down. With the cursor in cell B7, click the summation sign (Σ) at the upper right of the screen and press Enter. Repeat these steps for columns D, F, and H.
   b. With the cursor in H9, type in SStot= and press Enter. In cell I9, enter the formula =SUM(B7,D7,F7,H7) and press Enter. The value will be 31.75, which is the total sum of squares.
4. For the SScol:
   a. In cell A8, enter the formula =(3.6-2.75)^2*5 and press Enter. This will square the difference between the column mean and the grand mean and multiply the result by the number of measures in the column, 5. In cells C8, E8, and G8, repeat this for each of the other columns, substituting the mean for each column for the 3.60 that was the column 1 mean.
   b. With the cursor in H10, type in SScol= and press Enter. In cell I10, enter the formula =SUM(A8,C8,E8,G8) and press Enter. The value will be 11.75, which is the sum of squares for the columns.
5. For the SSrows:
   a. In cell J2, enter the formula =(I2-2.75)^2*4 and press Enter. Repeat this in rows J3 to J6 by left-clicking on what is now J2 and dragging the cursor down to cell J6. Click Fill and Down.
   b. With the cursor in H11, type in SSrow= and press Enter. In cell I11, enter the formula =SUM(J2:J6) and press Enter. The value will be 15.0, which is the sum of squares for the participants.
6. For the SSresid, in cell H12, enter SSresid= and press Enter. In cell I12, enter the formula =I9-I10-I11. The resulting value will be 5.0.
We used Excel to determine all the sum-of-squares values. Now the mean squares are determined by dividing the sums of squares for columns and residual by their degrees of freedom:

MScol = 11.75/3 = 3.917
MSresid = 5/12 = .417
F = MScol/MSresid = 3.917/.417 = 9.393, which agrees with the earlier calculations done by hand.
To create the ANOVA table,
• beginning in cell A10, type in Source; in B10, SS; in C10, df; in D10, MS; in E10, F; and in F10, Fcrit.
• Beginning in cell A11 and working down, type in total, columns, rows, residual.
• For the sum-of-squares values:
  • In cell B11, enter =I9.
  • In cell B12, enter =I10.
  • In cell B13, enter =I11.
  • In cell B14, enter =I12.
• For the degrees of freedom:
  • In cell C11, enter 19 for total degrees of freedom.
  • In cell C12, enter 3 for columns degrees of freedom.
  • In cell C13, enter 4 for rows degrees of freedom.
  • In cell C14, enter 12 for residual degrees of freedom.
• For the mean squares:
  • In cell D12, enter =B12/C12. The result is MScol.
  • In cell D14, enter =B14/C14. The result is MSresid.
• For the F value, in cell E12, enter =D12/D14.
• In cell F12, enter the critical value of F for 3 and 12 degrees of freedom, which is 3.49.
The list of commands looks intimidating, but mostly because every keystroke has been included. With some practice, this will become second nature. Figure 7.8 is a screenshot of the result of the calculations.
Figure 7.8: A within-subjects F problem in Excel
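For readers who prefer a script, the Excel walkthrough above can be reproduced in pure Python. This is an illustrative sketch on the incarceration data; the printed F of 9.4 matches the 9.393 in the text, which reflects rounded MS values.

```python
# Within-subjects F for the incarceration data (Formulas 7.2-7.4),
# mirroring the spreadsheet steps: SStot, SScol, SSrows, SSresid, F.
rows = [  # columns: 1, 3, 6, 9 months; one row per inmate
    [4, 3, 2, 5],
    [5, 4, 3, 4],
    [3, 1, 1, 2],
    [4, 2, 1, 3],
    [2, 1, 2, 3],
]
scores = [x for r in rows for x in r]
MG = sum(scores) / len(scores)                     # grand mean, 2.75

ss_tot = sum((x - MG) ** 2 for x in scores)        # 31.75
cols = list(zip(*rows))
ss_col = sum(len(c) * (sum(c) / len(c) - MG) ** 2 for c in cols)    # 11.75
ss_rows = sum(len(r) * (sum(r) / len(r) - MG) ** 2 for r in rows)   # 15.0
ss_resid = ss_tot - ss_col - ss_rows               # 5.0

F = (ss_col / 3) / (ss_resid / 12)                 # MScol / MSresid
print(round(F, 3))  # 9.4
```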
7.4 Presenting Results
Using the data from Figure 7.8 and analyzing it in Excel, we see the output table highlighted in yellow. The table is broken down, reading from left to right, in columns that include sum of squares (SS), degrees of freedom (df), mean squares (MS), F ratio (F), and F critical value (Fcrit). Interpreting the results, we can see that the F ratio is based on MScol/MSresid, which is 3.92/.417 = 9.4. This value is larger than our F critical value, indicating significance at the p < .05 level. Recall that a psychologist has collected data on an incarcerated group over a 9-month span and the number of violent crimes they have committed. Upon analyzing the findings, we see that as time elapses from 1 month to 3 months to 6 months to 9 months there is a significant change in the number of violent acts being committed. However, you cannot be sure where the significant differences occurred, since there are four points in time at which data was captured (1 month, 3 months, 6 months, and 9 months). As a result, post hoc tests will be needed to indicate where these differences lie.
With regard to the hypotheses of the repeated-measures ANOVA, the comparison is of mean differences across time. Therefore,

H0: μ1month = μ3months = μ6months = μ9months

The null hypothesis states there is no significant difference between the mean number of violent incidents from 1 month to 3 months to 6 months to 9 months. Keep in mind that the ANOVA is an omnibus test, so we are testing for any overall difference between the months. There may be differences between any two months and not necessarily among all of the months, which we can follow up with paired comparisons.

Ha: μ1month ≠ μ3months ≠ μ6months ≠ μ9months

The alternative (or research) hypothesis states there is a significant difference between the mean number of violent incidents from 1 month to 3 months to 6 months to 9 months. The alternative can also be a prediction of an increase in the mean number of violent incidents from 1 month to 9 months:

Ha: μ1month < μ3months < μ6months < μ9months
To analyze and present results using SPSS, let us first look at an example of a paired-samples/dependent t-test and then a repeated-measures ANOVA example.

SPSS Example 1: Steps for a Paired (Matched)-Samples t-Test
From the data set provided (Figure 7.9), a college professor wants to look at mean differences in scores over the first two quizzes of his statistics class. With his scores in SPSS, go to Analyze → Compare Means → Paired-Samples T Test. Input Score_1 in the first box and Score_2 in the second box, as seen in Figure 7.10. Then click OK. The resulting SPSS output tables are provided in Figure 7.11.
Figure 7.9: Data set for quiz scores
Figure 7.10: SPSS steps in performing a paired-samples t-test
Figure 7.11: SPSS results of a paired-samples t-test
SPSS Example 2: Steps for a Repeated-Measures ANOVA
This example uses data gathered from the SPSS (PASW) On-Line Training Workshop (1999), available at the following link: http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm

The data measure cancer treatments over time (see Figure 7.12):

TOTALCIN = oral condition at the initial stage
TOTALCW2 = oral condition at the end of week 2
TOTALCW4 = oral condition at the end of week 4
TOTALCW6 = oral condition at the end of week 6

Go to Analyze → General Linear Model → Repeated Measures. As shown in Figure 7.13, type in the Within-Subject Factor Name: CW_Times, Number of Levels: 4, and the Measure Name: CW; then click Define. As shown in Figure 7.14, put the four TOTALCW variables in sequential order in the Within-Subjects Variables box, click Plots, and move CW_Times into the Horizontal Axis. Then click Options and move CW_Times into Display Means for, click Compare Main Effects, and select Sidak from the dropdown box just below. Then click Descriptive statistics and Estimates of effect size. Click Continue and OK.
Paired Samples Statistics

                 Mean      N    Std. Deviation   Std. Error Mean
Pair 1  Score_1  42.8930   14   4.39920          1.17570
        Score_2  38.7857   14   7.15949          1.91345

Paired Samples Test

Pair 1 (Score_1 - Score_2), Paired Differences: Mean = 4.10714, Std. Deviation = 7.00598, Std. Error Mean = 1.87243, 95% Confidence Interval of the Difference [.06201, 8.15228], t = 2.193, df = 13, Sig. (2-tailed) = .047
Figure 7.12: Data set of cancer treatments over time
Data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm

Figure 7.13: Repeated-measures steps (part 1)
Data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
Figure 7.14: Repeated-measures steps (part 2)
Data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
Figure 7.15: SPSS results of cancer treatments over time

Descriptive Statistics (Imputation Number 1)
            Mean    Std. Deviation   N
TOTALCIN    8.28    2.542            25
TOTALCW2    6.52    1.531            25
TOTALCW4    10.36   3.475            25
TOTALCW6    9.76    3.566            25

Mauchly's Test of Sphericitya (Measure: CW; Within Subjects Effect: CW_Times)
Mauchly's W   Approx. Chi-Square   df   Sig.   Epsilonb: Greenhouse-Geisser / Huynh-Feldt / Lower-Bound
.596          11.752               5    .039   .731 / .808 / .333
Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. Design: Intercept; Within Subjects Design: CW_Times
b. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.

Tests of Within-Subjects Effects (Measure: CW)
Source                                  Type III SS   df       Mean Square   F        Sig.   Partial Eta Squared
CW_Times         Sphericity Assumed     220.340       3        73.447        13.760   .000   .364
                 Greenhouse-Geisser     220.340       2.194    100.428       13.760   .000   .364
                 Huynh-Feldt            220.340       2.424    90.913        13.760   .000   .364
                 Lower-Bound            220.340       1.000    220.340       13.760   .000   .364
Error(CW_Times)  Sphericity Assumed     384.324       72       5.338
                 Greenhouse-Geisser     384.324       52.656   7.299
                 Huynh-Feldt            384.324       58.167   6.607
                 Lower-Bound            384.324       24.000   16.013
Figure 7.15: SPSS results of cancer treatments over time (continued)
Data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm

Figure 7.16: SPSS output graph of cancer treatments over time
Data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
Pairwise Comparisons (Measure: CW; Imputation Number 1)

(I) CW_Times   (J) CW_Times   Mean Difference (I-J)   Std. Error   Sig.b   95% Confidence Interval for Differenceb
1              2              -1.760*                 .504         .011    [-3.205, -.315]
1              3              -3.840*                 .694         .000    [-5.830, -1.850]
1              4              -3.244*                 .755         .001    [-5.407, -1.082]
2              1              1.760*                  .504         .011    [.315, 3.205]
2              3              -2.080*                 .709         .043    [-4.113, -.047]
2              4              -1.484                  .717         .262    [-3.539, .571]
3              1              3.840*                  .694         .000    [1.850, 5.830]
3              2              2.080*                  .709         .043    [.047, 4.113]
3              4              .596                    .489         .800    [-.806, 1.997]
4              1              3.244*                  .755         .001    [1.082, 5.407]
4              2              1.484                   .717         .262    [-.571, 3.539]
4              3              -.596                   .489         .800    [-1.997, .806]

Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Sidak.
Based on the results (Figures 7.15 and 7.16), the Descriptive Statistics table shows that there are differences in the means; how significant those differences are is determined by the ANOVA and the consequent post hoc tests. Next, Mauchly's test of sphericity shows a significant value based on the χ² distribution, with a significance value at the p < .05 level, indicating a violation of the sphericity assumption. To reiterate, this indicates that there is variance between pairs of treatments or measures for the group. Since there were significant differences between some pairs of treatments compared to other pairs, a violation has occurred. Therefore, looking at the Tests of Within-Subjects Effects table, sphericity cannot be assumed, and a df adjustment will be made by using the Greenhouse-Geisser or the Huynh-Feldt calculations. As seen in the F value (13.760) and the df, the adjustment does not make any difference, as there are significant differences across CW_Times (p < .05). The Pairwise Comparisons table is where we see between-treatment differences, indicating that all treatment times are significantly different except times 2 and 4 (p = .262) and times 3 and 4 (p = .800), which are not statistically significant. The line graph also indicates a trend in differences between the first, second, and third treatment times but not much difference from the third to fourth treatments.
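The Sidak adjustment SPSS applies in the Pairwise Comparisons table can be computed directly: for m comparisons, a raw p value is adjusted to 1 − (1 − p)^m. A minimal sketch in plain Python (the .01 input is just an illustrative raw p value, not from the SPSS output):

```python
def sidak_adjust(p_raw, m):
    """Sidak-adjusted p value for one of m pairwise comparisons."""
    return 1 - (1 - p_raw) ** m

# With four time points there are 6 unique pairwise comparisons.
print(round(sidak_adjust(0.01, 6), 4))  # 0.0585
```

Because the adjusted p is always at least as large as the raw p, the Sidak procedure keeps the familywise Type I error rate at the nominal level.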
7.5 Interpreting Results
Refer to the most recent edition of the APA manual for specific detail on formatting statistics, but Table 7.4 may be used as a quick guide in presenting the statistics covered in this chapter.
Table 7.4: Guide to APA formatting of F statistic results

Abbreviation or Term   Description
F                      F test statistic score
Partial-η²             Partial-eta-squared: a measure of effect size for ANOVA
W                      Mauchly's Test of Sphericity
χ²                     Distribution used for nonparametric tests such as Mauchly's test of sphericity and Friedman's ANOVA
SS                     Sum of Squares
MS                     Mean Square

Source: Publication Manual of the American Psychological Association, 6th edition. © 2009 American Psychological Association, pp. 119–122.
Using the results from SPSS Example 1, Figure 7.11, we could present the results in the following way:

Try It!
Access the data and the accompanying video via the links below to perform this analysis yourself. Both resources are provided by Central Michigan University.
Data link: http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
Video: http://calcnet.mth.cmich.edu/org/spss/V16_materials/Video_Clips_v16/19repeated_measures/19repeated_measures.swf
• There was a significant difference between quiz scores 1 (M = 42.89) and 2 (M = 38.79), as the mean score significantly decreased over time, t(13) = 2.19, p < .05.
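The t statistic SPSS reports can be reproduced from the difference scores alone. A minimal sketch in plain Python, using hypothetical quiz scores (not the Figure 7.9 data, which is not reproduced here):

```python
import math

# Hypothetical paired quiz scores for six students (illustration only).
quiz1 = [10, 12, 9, 11, 13, 10]
quiz2 = [8, 11, 9, 9, 12, 9]

diffs = [a - b for a, b in zip(quiz1, quiz2)]
n = len(diffs)
mean_d = sum(diffs) / n
# Sample standard deviation of the differences (n - 1 in the denominator).
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
se_d = sd_d / math.sqrt(n)  # standard error of the mean difference
t = mean_d / se_d           # paired t with df = n - 1
print(round(t, 2), n - 1)   # 3.8 5
```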
Using the results from SPSS Example 2, Figures 7.15 and 7.16, we could present the results in the following way:

• The overall difference between CW_Times was significant using the Greenhouse-Geisser results, F(2.19, 52.66) = 13.76, p < .001, partial-η² = .364.
• Based on the Sidak pairwise comparison, the CW_1 time (M = 6.52, SD = 1.53) was significantly different from all the other times. CW_2 (M = 8.28, SD = 2.54) and CW_4 (M = 10.36, SD = 3.56) were also significantly different from each other.
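The partial-η² of .364 can be verified from the sums of squares in Figure 7.15, since partial-η² = SSeffect / (SSeffect + SSerror):

```python
ss_effect = 220.340  # CW_Times, from the Tests of Within-Subjects Effects table
ss_error = 384.324   # Error(CW_Times)

partial_eta_sq = ss_effect / (ss_effect + ss_error)
print(round(partial_eta_sq, 3))  # 0.364
```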
7.6 Nonparametric Tests
You may have noticed that for every parametric test, there is a nonparametric equivalent. The rationale behind nonparametric tests is to obtain a conservative estimate of significance when violations in parametric assumptions have occurred. Such violations include nonlinearity, non-normal distributions, and small data sets.
The nonparametric equivalent of the dependent-samples t-test is the Wilcoxon signed-ranks test (not to be confused with Chapter 5's Wilcoxon rank-sum test for the independent-samples t-test). Frank Wilcoxon proposed both of these in a single paper published in 1945. The Wilcoxon signed-ranks W-test is known as the Wilcoxon t-test for dependent samples (not independent ones). In brief, the steps in the calculation of W are calculating the differences between scores, taking the absolute value (removing the +/− sign), ranking the absolute values, reassigning the original (+/−) sign, and then summing the ranks.
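The steps just listed can be sketched in a few lines of plain Python. This is a minimal illustration with hypothetical before/after scores: zero differences are dropped, and tied absolute differences receive their average rank:

```python
def signed_rank_sums(before, after):
    """Return (W+, W-): sums of ranks of positive and negative differences."""
    diffs = [x - y for x, y in zip(before, after) if x != y]  # drop zero diffs
    abs_sorted = sorted(abs(d) for d in diffs)

    def avg_rank(value):
        # Average rank across ties (ranks are 1-based positions in sorted order).
        positions = [i + 1 for i, v in enumerate(abs_sorted) if v == value]
        return sum(positions) / len(positions)

    w_pos = sum(avg_rank(abs(d)) for d in diffs if d > 0)
    w_neg = sum(avg_rank(abs(d)) for d in diffs if d < 0)
    return w_pos, w_neg

# Hypothetical scores for five participants; the tied pair (140, 140) drops out.
w_pos, w_neg = signed_rank_sums([125, 115, 130, 140, 140],
                                [110, 122, 125, 120, 140])
print(w_pos, w_neg)  # 8.0 2.0
```

The two sums always total n(n + 1)/2 for the n nonzero differences, which is a handy sanity check when computing W by hand.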
The nonparametric equivalent of the parametric repeated-measures ANOVA is Friedman's ANOVA. Essentially, the analysis looks at the difference in the mean ranks (instead of means) across time, treatments, or matched/equivalent groups. By analyzing differences in the mean ranks, the analysis is in effect eliminating extreme points, or outliers, in the distribution. As noted, sensitivity to outliers is the disadvantage of using the mean. Again, nonparametric tests are conservative, distribution-free analyses that are used when parametric violations have occurred. As a result, it is more difficult to find significance; on the other hand, they are conservative in that there is a lower probability of committing a Type I error. One important point to note is that even though mean ranks are used to calculate significant differences between times, treatments, or matched/equivalent groups, the results are reported in terms of the median differences, as will be shown in the next example.
Friedman's nonparametric ANOVA dates back to 1937 and is based on ranked (ordinal) data and the comparison of medians. An alternative nonparametric test similar to Friedman's test is Cochran's Q-test, which is used for dichotomous data (i.e., only two response choices, as in yes/no).
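Friedman's statistic is computed by ranking each participant's scores across the k conditions and comparing the rank sums, using χ²F = 12 / (N·k·(k + 1)) · ΣRj² − 3·N·(k + 1) with k − 1 degrees of freedom. A minimal sketch with hypothetical data (4 participants × 3 conditions):

```python
def friedman_chi_sq(data):
    """data: list of rows, one per participant, with scores across k conditions."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        # Rank the scores within this row (average rank for ties).
        sorted_row = sorted(row)
        for j, score in enumerate(row):
            positions = [i + 1 for i, v in enumerate(sorted_row) if v == score]
            rank_sums[j] += sum(positions) / len(positions)
    chi_sq = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
    return chi_sq, k - 1  # statistic and its degrees of freedom

# Hypothetical scores: 4 participants measured under 3 conditions.
chi_sq, df = friedman_chi_sq([[5, 7, 9], [4, 8, 6], [6, 3, 9], [2, 5, 8]])
print(round(chi_sq, 1), df)  # 4.5 2
```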
Worked Example of the Friedman's Nonparametric ANOVA and Wilcoxon Signed-Ranks W Using SPSS
To perform the Friedman's nonparametric ANOVA using the data set in Figure 7.17, go to Analyze → Nonparametric Tests → Legacy Dialogs → K Related Samples. Place Score_1, Score_2, and Score_3 into the Test Variables box (see Figure 7.18). Click on Statistics, check the Quartiles box, and then click Continue and OK.
Figure 7.17: Data set for the Friedman’s ANOVA test
Figure 7.18: Steps in SPSS for the Friedman’s nonparametric
ANOVA
Figure 7.19: Results of the Friedman's nonparametric ANOVA

Descriptive Statistics
           N    25th      50th (Median)   75th (Percentiles)
Score_1    14   39.0000   44.5000         46.1250
Score_2    14   35.2500   40.0000         43.1250
Score_3    14   30.3750   40.2500         44.7500

Ranks
           Mean Rank
Score_1    2.43
Score_2    1.86
Score_3    1.71

Test Statisticsa
N             14
Chi-Square    4.148
df            2
Asymp. Sig.   .126
a. Friedman Test
Interpreting Results
Based on the results of the Friedman's nonparametric ANOVA in Figure 7.19, there is no significant difference between quiz scores over time, χ²(2) = 4.15, p = .126. In other words, the students did not change significantly over testing times. Since there were no significant differences in the scores, some researchers would conclude that no post hoc tests are needed. However, performing a post hoc Wilcoxon signed-ranks W-test using software requires minimal additional effort, and the researcher can then be certain whether any significant differences exist between pairs of scores.
The steps for performing the W-test (Figure 7.20) are Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples. Input Score_1 and Score_2 in Row 1, Score_2 and Score_3 in Row 2, and then Score_1 and Score_3 in Row 3. Click on Options, check Quartiles, and click Continue and OK.

Figure 7.20: Steps in SPSS for the Wilcoxon signed-ranks W-test
Figure 7.21: Results of the Wilcoxon signed-ranks W-test

Descriptive Statistics
           N    25th      50th (Median)   75th (Percentiles)
Score_1    14   39.0000   44.5000         46.1250
Score_2    14   35.2500   40.0000         43.1250
Score_3    14   30.3750   40.2500         44.7500

Ranks
                                       N     Mean Rank   Sum of Ranks
Score_2 - Score_1   Negative Ranks     9a    7.17        64.50
                    Positive Ranks     3b    4.50        13.50
                    Ties               2c
                    Total              14
Score_3 - Score_2   Negative Ranks     8d    6.81        54.50
                    Positive Ranks     6e    8.42        50.50
                    Ties               0f
                    Total              14
Score_3 - Score_1   Negative Ranks     10g   8.15        81.50
                    Positive Ranks     4h    5.88        23.50
                    Ties               0i
                    Total              14
a. Score_2 < Score_1   b. Score_2 > Score_1   c. Score_2 = Score_1
d. Score_3 < Score_2   e. Score_3 > Score_2   f. Score_3 = Score_2
g. Score_3 < Score_1   h. Score_3 > Score_1   i. Score_3 = Score_1

Test Statisticsa
                         Score_2 - Score_1   Score_3 - Score_2   Score_3 - Score_1
Z                        -2.001b             -.126b              -1.821b
Asymp. Sig. (2-tailed)   .045                .900                .069
a. Wilcoxon Signed Ranks Test
b. Based on positive ranks.
Looking at the results (Figure 7.21) of the Wilcoxon signed-ranks W-test, we see that using this as a post hoc to the Friedman's ANOVA does have benefits, as it identifies a significant difference between Scores 1 and 2 that was not detected using Friedman's ANOVA. This is a significant point in the previously noted debate over whether to run post hocs based on the significance of the F value. As a result, the conclusion here, based on the W-test, is that there is a significant difference between Score_1 (Mdn = 44.50) and Score_2 (Mdn = 40.00), Z = -2.00, p < .05, while there were no significant differences between Score_2 (Mdn = 40.00) and Score_3 (Mdn = 40.25), Z = -0.13, p = .90, or between Score_1 (Mdn = 44.50) and Score_3 (Mdn = 40.25), Z = -1.82, p = .069.
Summary
Any statistical procedure has advantages and disadvantages. The downside of the different independent-groups designs is that subjects within groups often respond to the independent variable differently. Those differences are a source of error variance that is different for each group. No matter how carefully a researcher randomly selects the groups to be used in a study, there are going to be differences in the way that people in the same group respond to whatever stimulus is offered. Both the before/after t-test and the within-subjects F test eliminate that source of error variance by either using the same people repeatedly or matching subjects on the most important characteristics. Controlling error variance results in a test that is more likely to detect a significant difference (Objectives 1 and 5).

In dependent-groups designs, using the same group repeatedly allows the number of participants involved to be fewer (Objectives 1, 2, 3, 4, and 6). One of the downsides to repeated-measures designs is that they take more time to complete. Unless subjects are matched across measures, the different levels of the independent variable cannot be administered concurrently as they can in independent-groups tests. With more time, there is an increased potential for attrition. If one of the participants drops out of a repeated-measures study, the data is lost from all the measures of the dependent variable for that subject (Objectives 2 and 4).

Having noted some of the differences between dependent-groups designs and their independent-groups equivalents, it is important to note their consistencies as well. Whether the test is independent t, before/after t, one-way ANOVA, or a within-subjects F, in each case the independent variable is nominal scale, and the dependent variable is interval or ratio scale (Objective 2).

In addition, two repeated-measures designs were performed (i.e., t-tests and ANOVAs) where we presented several scenarios to test their respective null hypotheses to find support for alternative ones (Objectives 3 and 6). Results and conclusions were presented, interpreted, and reported in APA format (Objectives 7 and 8). Finally, the Wilcoxon signed-ranks W-test and the Friedman's Nonparametric ANOVA were discussed with an appropriate example (Objective 9).
There is something else that all the tests in this chapter have in common. They all test the hypothesis of difference. Like the z-test and the one-sample t-test, they are about significant differences. Sometimes, however, the question involves the strength of the relationships between variables. Those discussions will introduce correlation and the hypothesis of association, which are the focus of Chapter 8.
Key Terms

before/after t-test  A dependent-groups application of the t-test, also known as a pre/post t-test. In this particular application, one group is measured before and after a treatment.

confounding variables  Variables that influence an outcome but are uncontrolled in the analysis and obscure the effects of other variables. For example, if a psychologist is interested in gender-related differences in problem-solving ability but doesn't control for age differences, differences in gender may be confounded by differences that are actually age-related.

dependent-groups designs  Statistical procedures in which the groups are related, either because multiple measures are taken of the same participants or because each participant in a particular group is matched on characteristics relevant to the analysis to a participant in the other groups with the same characteristics. Dependent-groups designs reduce error variance because they reduce score variation due to factors unrelated to the independent variable.

matched-pairs or dependent-samples t-test  A dependent-groups application of the t-test. In this particular application, each participant in the second group is paired to a participant in the first group with the same characteristics in order to limit the error variance that would otherwise stem from using dissimilar groups.

sphericity  Nonsignificant differences in the dependent variable across pairs of treatments or times for all participants in the group. By minimizing this within-group error variance, sphericity may be assumed. Significant within-group error variances between pairs of treatments are a violation of sphericity. Such variances are detected using Mauchly's sphericity (W) test.

within-subjects F  The dependent-groups equivalent of the one-way ANOVA. In this procedure either participants in each group are paired on the relevant characteristics with participants in the other groups, or one group is measured repeatedly after different levels of the independent variable are introduced.
Chapter Exercises
Answers to Try It! Questions
The answers to all Try It! questions introduced in this chapter
are provided below.
A. Small samples tend to be platykurtic because the data in
small samples is often
highly variable. This translates into relatively large standard
deviations and
large error terms.
B. If groups are created by random sampling, they will differ
from the population
from which they were drawn only by chance. That means that
with random
sampling, there can be error, but its potential to affect research
results diminishes as the sample size grows.
C. The before/after t-test and the matched-pairs t-test differ
only in that the before/
after test uses the same group twice and the matched-pairs test
matches each
subject in the first group with one in the second group who has
similar characteristics. The calculation and interpretation of the t value are
the same in both
procedures.
D. The within-subjects test will detect a significant difference
more readily than an
independent t-test. Power in statistical testing is the likelihood
of detecting significance.
E. Because the same subjects are involved in each set of
measures, the within-
subjects test allows us to calculate the amount of score
variability due to indi-
vidual differences in the group and eliminate it because it is the
same for each
group. This source of error variance is eliminated from the
analysis, leaving a
smaller error term.
F. The eta-squared value would be the same in either
problem. Note that in a one-
way ANOVA, eta-squared is the ratio of SSbet to SStot. In the
within-subjects F, it
is SScol to SStot. Because SSbet and SScol both measure the
same variance and the
SStot values will be the same in either case, the eta-squared
values will likewise
be the same. What changes, of course, is the error term.
Ordinarily, SSresid will
be much smaller than SSwith, but those values show up in the F
ratio by virtue of
their respective MS values, not in eta-squared.
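Plugging in the sums of squares from this chapter's Excel example (SScol = 11.75, SSrows = 15.0, SSresid = 5.0, with SStot taken as their sum) illustrates the point:

```python
ss_col, ss_rows, ss_resid = 11.75, 15.0, 5.0
ss_tot = ss_col + ss_rows + ss_resid  # 31.75

# Eta-squared is SScol/SStot whether the design is one-way or within-subjects.
eta_sq = ss_col / ss_tot
print(round(eta_sq, 3))  # 0.37
```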
Review Questions
The answers to the odd-numbered items can be found in the
answers appendix.
1. A group of clients is being treated for a compulsive behavior
disorder. The number
of times in an hour that each one manifests the compulsivity is
gauged before and
after a mild sedative is administered. The data is as follows:
Before After
1. 5 4
2. 6 4
3. 4 3
Before After
4. 9 5
5. 5 6
6. 7 3
7. 4 2
8. 5 5
a. What is the standard deviation of the difference scores?
b. What is the standard error of the mean for the difference
scores?
c. What is the calculated value of t?
d. Are the differences statistically significant?
2. A researcher is examining the impact that a political ad has
on potential donors’
willingness to contribute. The data indicates the amount (in
dollars) each is willing
to donate before viewing that advertisement and after viewing
the advertisement.
Before After
1. 0 10
2. 20 20
3. 10 0
4. 25 50
5. 0 0
6. 50 75
7. 10 20
8. 0 20
9. 50 60
10. 25 35
a. Are there significant differences in the amount?
b. What is the value of t if this is done as an independent t-
test?
c. Explain the difference between before/after and independent
t-tests.
3. Participants attend three consecutive sessions in a business
seminar. In the first,
there is no reinforcement for responding to the session
moderator’s questions. In
the second, those who respond are provided with verbal
reinforcers. In the third,
responders receive bits of candy as reinforcers. The dependent variable is the number of times the participants respond in each session.
ber of times the participants respond in each session.
None Verbal Token
1. 2 4 5
2. 3 5 6
3. 3 4 7
4. 4 6 7
5. 6 6 8
6. 2 4 5
7. 1 3 4
8. 2 5 7
a. Are the column-to-column differences significant? If so,
which groups are significantly different from which?
b. Of what data scale is the dependent variable?
c. Calculate and explain the effect size.
4. In the calculations for Exercise 3, what step is taken to
minimize error variance?
a. What is the source of that error variance?
b. If Exercise 3 had been a one-way ANOVA, what would have
been the degrees of
freedom for the error term?
c. How does the change in degrees of freedom for the error
term in the within-
subjects F affect the value of the test statistic?
5. Because SScol in the within-subjects F contains the treatment
effect and measurement error, if there is no treatment effect, what will be the value
of F?
6. Why is matching uncommon in within-subjects F analyses?
7. A group of nursing students is approaching the licensing test.
The level of anxiety
for each student is measured at 8 weeks prior to the test, then 4
weeks, 2 weeks,
and 1 week before the test. Assuming that anxiety is measured
on an interval scale,
are there significant differences?
Student
Number
8 weeks 4 weeks 2 weeks 1 week
1. 5 8 9 9
2. 4 7 8 10
3. 4 4 4 5
4. 2 3 5 5
5. 4 6 6 8
6. 3 5 7 9
7. 4 5 5 4
8. 2 3 6 7
a. Is anxiety related to the time interval?
b. Which groups are significantly different from which?
c. How much of anxiety is a function of test proximity?
8. A psychology department sponsors a study of the relationship
between participation in a particular internship opportunity and students' final
grades. Eight
students in their second year of graduate study are matched to
eight students in
the same year by grade. Those in the first group participate in
the internship. Students' grades after the second year are compared.
Student Pair Number Internship No Internship
1. 3.6 3.2
2. 2.8 3.0
3. 3.3 3.0
4. 3.8 3.2
5. 3.2 2.9
6. 3.3 3.1
7. 2.9 2.9
8. 3.1 3.4
a. Are the differences statistically significant?
b. This should be done as a dependent-samples t-test even though there are two separate groups involved. Why?
9. A team of researchers associated with an accrediting body
studies the amount of
time professors devote to their scholarship before and after they
receive tenure.
Scores are hours per week.
Professor Number Before Tenure After Tenure
1. 12 5
2. 10 3
3. 5 6
4. 8 5
5. 6 5
6. 12 10
7. 9 8
8. 7 7
a. Are the differences statistically significant?
b. What is t if the groups had been independent?
c. What is the primary reason for the difference in the two t
values?
10. A supervisor is monitoring the number of sick days
employees take by month. For
seven people, they are as follows:
Employee Number Oct Nov Dec
1. 2 4 3
2. 0 0 0
3. 1 5 4
4. 2 5 3
5. 2 7 7
6. 1 3 4
7. 2 3 2
a. Are the month-to-month differences significant?
b. What is the scale of the independent variable in this
analysis?
c. How much of the variance does the month explain?
11. If the people in each month of the Exercise 10 data were
different, it would have
been a one-way ANOVA.
a. Would the result have been significant?
b. Because total variance (SStot) is the same in either Exercise
10 or 11, and the SScol
(Exercise 10) is the same as SSbet (Exercise 11), why are the F
values different?
Analyzing the Research
Review the article abstracts provided below. You can then
access the full articles via your
university’s online library portal to answer the critical thinking
questions. Answers can be
found in the answers appendix.
Using Repeated Measures ANOVA for a Stress Management
Study
Elo, A., Ervasti, J., Kuosma, E., & Mattila, P. (2008). Evaluation of an organizational stress management program in a municipal public works organization. Journal of Occupational Health Psychology, 13(1), 10–23.
Article Abstract
The aim of this study was to investigate the effects of employee participation in an organizational stress management program consisting of several interventions aiming to improve psychosocial work environment and well-being. Pre- and postintervention questionnaires were used to measure the outcomes with a 2-year interval. This article describes the background of the program, results of previously published effect studies, and a qualitative evaluation of the program. The authors also tested the effects of level of participation in all interventions among the employees of the service production units by 2 (time) × 3 (group) repeated-measures ANOVAs (n = 625). "Active participation" (more than 5.5 days) had a positive effect on feedback from supervisor and flow of information. Work climate remained on a permanent level while it decreased in the categories of moderate
and nonparticipation. The level of participation did not improve
individual well-being or
other aspects of psychosocial work environment as postulated
by the work stress models.
The qualitative evaluation and practical conclusions drawn by the management of the organization provided a positive impression of the impact of the program.
Critical Thinking Questions
1. What is the independent variable for which subjects are being tested, under all treatment levels?
2. Explain the importance of power in relation to this within-group design.
3. What is the disadvantage of testing a subject group’s pre- and postparticipation program intervention?
4. Does this study need to worry about sphericity when conducting the repeated-measures ANOVA? Why or why not?
Using ANOVA for a Personality Disorder Scales Study

Wise, E. A. (1995). Personality disorder correspondence among the MMPI, MBHI, and MCMI. Journal of Clinical Psychology, 51(6), 790–798.
Article Abstract
MMPI, MBHI, and MCMI personality disorder scales were analyzed for convergent and discriminant validity. Friedman’s ANOVA indicated that there were no significant differences among the sample’s averaged scale scores. Further analyses of the data, however, demonstrated that the Millon instruments classified significantly more of the sample as personality disordered when compared to Morey’s MMPI personality disorder scales. In addition, codetype correspondence among the three instruments was only 4 to 6%. When the instruments were analyzed in a pair-wise fashion, codetype correspondence increased to approximately 10 to 20%. These data indicate that these personality disorder scales do not demonstrate construct equivalence, particularly at the level of the individual profile.
Critical Thinking Questions
1. Why did this study run a Friedman’s Nonparametric ANOVA?
2. The Friedman’s Nonparametric ANOVA showed no significant differences among the tests by scale means. What was the significance level for this to be true?
3. What is reported in Friedman’s Nonparametric ANOVA? Please label what each piece is from the Friedman output when comparing MBHI and MCMI compared to MMPI.
4. Suppose the Friedman’s ANOVA was significant. Would we run a post hoc test? What type of post hoc test?
Chapter 6
Analysis of Variance (ANOVA)
Learning Objectives
After reading this chapter, you will be able to . . .

1. explain why it is a mistake to analyze the differences between more than two groups with multiple t-tests.
2. relate sum of squares to other measures of data variability.
3. compare and contrast t-test with ANOVA.
4. demonstrate how to determine which group is significant in an ANOVA with more than two groups.
5. explain the use of eta-squared in ANOVA.
6. present statistics based on ANOVA results in APA format.
7. interpret results and draw conclusions of ANOVA.
8. discuss the nonparametric Kruskal-Wallis H-test compared to the ANOVA.
CHAPTER 6 Section 6.1 One-Way Analysis of Variance
Ronald A. Fisher was present at the creation of modern statistical analysis. During the early part of the 20th century, Fisher worked at an agricultural research station in rural southern England. In his work analyzing the effect of pesticides and fertilizers on crop yields, he was stymied by the limitations in Gosset’s independent t-test, which allowed him to compare only one pair of samples at a time. In the effort to develop a more comprehensive approach, Fisher created analysis of variance (ANOVA).

Like Gosset, he felt that his work was important enough to publish, and like Gosset in his effort to publish the t-test, Fisher had opposition. In Fisher’s case, the opposition came from a fellow statistician, Karl Pearson. This is the same man who created the first department of statistical analysis at University College, London. In Chapters 9 and 11 you will study some of Pearson’s work with correlations as well as Spearman’s rho (ρ) and chi-square (χ²), which are used in the analysis of categorical (nominal and ordinal) data. Pearson also founded what is probably the most prominent journal for statisticians, Biometrika. Pearson was an advocate of making one comparison at a time and of using the largest groups possible to make those comparisons.
When Fisher submitted his work to Pearson’s journal with procedures suggesting that samples can be small and many comparisons can be made in the same analysis, Pearson rejected the manuscript. So began a long and increasingly acrimonious relationship between two men who would become giants in the field of statistical analysis and end up in the same department at University College. Interestingly, Gosset also gravitated to the department and managed to get along with both of them.

Fisher’s contributions affect more than this chapter. Besides the development of the ANOVA, the concept of statistical significance is his, as is the hypothesis testing discussed in Chapter 5. Note that although significance testing is a ubiquitous phenomenon, it is not always accepted by other statisticians. One such adversary was William [Bill] Kruskal, who derived the nonparametric version of the ANOVA, the Kruskal-Wallis H-test, which is discussed in this chapter. Despite these philosophical and statistical differences, R. A. Fisher made an enormous contribution to the field of quantitative analysis, as did his nemesis Karl Pearson, with additional statistical contributions by William Sealy Gosset and Bill Kruskal.
6.1 One-Way Analysis of Variance

In any experiment, scores and measurements vary for many reasons. If a researcher is interested in whether children will emulate the videotaped behavior of adults whom they have watched, any differences in the children’s behavior from before they see the adults to after are attributed primarily to the adults’ behaviors. But even if all of the children watch with equal attentiveness, it is likely there will be differences in their behaviors
after the video. Some of those differences might stem from age differences among the children. Perhaps the amount of exposure children otherwise have to television will prompt differences in their behavior. Probably differences in their background experiences will also affect the way they behave.

In an analysis of how behavior changes as a result of watching the video, the independent variable (IV) is whether or not the children have seen the video. Changes in their behavior, the dependent variable (DV), reflect the effect of the IV, but they also reflect all the other factors that prompt the children to behave differently. An IV is also referred to as a factor, particularly in procedures that involve more than one IV. Behavior changes that are not related to the IV reflect the presence of error variance, attributable to other factors known as confounding variables.
When researchers work with human subjects, some level of error variance is inescapable. Even under tightly controlled conditions where all members of a sample receive exactly the same treatment, the subjects are unlikely to respond the same way. There are just too many confounding variables that also affect their behavior. Fisher’s approach was to calculate the total variability in a problem and then analyze it, thus the name analysis of variance.

Any number of IVs can be included in an ANOVA. Here, we are interested primarily in ANOVA in its simplest form, a procedure called one-way ANOVA. The “one” in one-way ANOVA indicates that there is just one IV in this model. In that regard, one-way ANOVA is similar to the independent-samples t-test discussed in Chapter 5. Both tests have one IV and one DV. The difference is that the independent t-test allows for an IV with just two groups, but the IV in ANOVA can have any number of groups, generally more than two. In other words, a one-way ANOVA with just two groups is the same as an independent-samples t-test, where the statistic calculated in ANOVA, F, is equal to t²; this is addressed and illustrated in Section 6.5.
The ANOVA Advantage

The ANOVA and the t-test both answer the same question: Are there significant differences between groups? So why bother with another test when we have the t-test? Suppose someone has developed a group therapy program for people with anger management problems and the question is, are there significant differences in the behavior of clients who spend (a) 8, (b) 16, and (c) 24 hours in therapy over a period of weeks? Why not answer the question by performing three t-tests as follows?

1. Compare the 8-hour group to the 16-hour group.
2. Compare the 16-hour group to the 24-hour group.
3. Compare the 8-hour group to the 24-hour group.
Try It! A: What does the “one” in one-way ANOVA refer to?
suk85842_06_c06.indd 185 10/23/13 1:40 PM
CHAPTER 6Section 6.1 One-Way Analysis of Variance
The Problem of Multiple Comparisons

These three tests represent all possible comparisons, but there are two problems with this approach. First, making all possible comparisons is a good deal more manageable if there are three groups than if there are, say, five groups. If there were five groups, labeled a through e, note the number of comparisons needed to cover all possible comparisons:
1. a to b
2. a to c
3. a to d
4. a to e
5. b to c
6. b to d
7. b to e
8. c to d
9. c to e
10. d to e
All possible comparisons among five groups thus involve 10 tests, as seen above; with only three groups, just 3 tests cover all the combinations.
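The count of pairwise tests grows as k(k − 1)/2, the number of ways to choose 2 groups from k. A quick Python sketch (the function name is ours, not from the text):

```python
from math import comb

def pairwise_comparisons(k):
    """Number of two-group t-tests needed to cover every
    pair among k groups: C(k, 2) = k(k - 1) / 2."""
    return comb(k, 2)

# Three groups need 3 comparisons; five groups need the 10 listed above.
print(pairwise_comparisons(3))  # 3
print(pairwise_comparisons(5))  # 10
```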
Family-Wise Error
The other problem is an issue of inflated error in hypothesis testing when doing multiple tests, known as family-wise error. Recall that the potential for type I error (α) is determined by the level at which the test is conducted. At α = .05, any significant finding will result in a type I error an average of 5% of the time. However, that level of error assumes that each test is conducted with new data; repeated testing with the same data increases the family-wise error rate (FWER). Specifically, if statistical testing is done repeatedly with the same data, the potential for type I error does not remain fixed at .05 (or whatever the level of the testing), but grows. In fact, if 10 tests are conducted in succession with the same data, as with the groups labeled a, b, c, d, and e mentioned earlier, and each finding is significant, by the time the 10th test is completed, the potential for alpha error is FWER = .40, or a 40% error probability, as the following procedure illustrates:
Pα = 1 − (1 − pα)^n

Where
Pα = the probability of alpha error overall
pα = the probability of alpha error for the initial significant finding
n = the number of tests conducted where the result was significant

Pα = 1 − (1 − .05)^10
   = 1 − .599
FWER = .401

The probability of a type I error at this point is 4 in 10, or 40%!
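The family-wise error computation is easy to check numerically. A minimal Python sketch (the function name is ours):

```python
def family_wise_error(alpha, n_tests):
    """FWER = 1 - (1 - alpha)^n: the probability of at least one
    type I error across n successive tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests

# Ten successive significant tests at alpha = .05, as in the example.
print(round(family_wise_error(0.05, 10), 3))  # 0.401
```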
The business of raising the (1 − pα) difference to the 10th power (or however many comparisons there are) is not only tedious, but the more important problem is that the probability of a type I error does not remain fixed when there are successive significant results with the same data. Therefore, using multiple t-tests is never a good option.

In the end, running one test in an overall ANOVA will control for inflated FWER. An ANOVA is therefore termed an omnibus test, as it will test the overall significance of the research model based on the differences between sample means. It will not tell you which two means are significantly different, which is why follow-up post hoc comparisons are executed. These concepts will be discussed in further detail throughout the chapter.
The Variance in Analysis of Variance (ANOVA)

To analyze variance, Fisher began by calculating total variability from all sources. He recognized that when scores vary in a research study, they do so for two reasons. They vary because the independent variable (the “treatment”) has had an effect, and they vary because of factors beyond the control of the researcher, producing the error variance referred to earlier.

The test statistic in ANOVA is the F ratio (named for Fisher), which is treatment variance (variance that can be explained by the IV on the DV) divided by error variance (variance that cannot be explained, due to confounding variables on the DV). When F is large, it indicates that the difference between at least two of the groups in the analysis is not random and that there are significant differences between at least two group means. When the F ratio is small (close to a value of 1), it indicates that the IV has not had enough impact to overcome error variability, and the differences between groups are not significant. We will return to the F ratio when we discuss Formula 6.4.
Variance Between and Within Groups

If three groups of the same size are all selected from one population, they could be represented by three distributions, as shown in Figure 6.1. They do not have exactly the same mean, but that is because even when they are selected from the same population, samples are rarely identical. Those initial differences between sample means indicate some degree of sampling error.

Figure 6.1: Three groups drawn from the same population
The reason that each of the three distributions has width is that there are differences within each of the groups. Even if the sample means were the same, individuals selected to the same sample will rarely manifest precisely the same level of whatever is measured. If a population is identified (for example, a population of the academically gifted) and a sample is drawn from that population, the individuals in the sample will not all have the same level of ability. Because they are all members of the population of the academically gifted, they will probably all be higher than the norm for academic ability, but there will still be differences in the subjects’ academic ability within the sample.

These differences within are sources of error variance.
The treatment effect is indicated in how the IV affects the way the DV is manifested. For example, three groups of subjects are administered different levels of a mild stimulant (the IV) to see the effect on level of attentiveness. The issue in ANOVA is whether the IV, the treatment, creates enough additional between-groups variability to exceed any error variance. Ultimately, the question is whether, as a result of the treatment, the samples still represent populations with the same mean, or whether, as is suggested by the distributions in Figure 6.2, they may represent populations with different means.
Figure 6.2: Three groups after the treatment
The within-groups variability in these three distributions is the same as it was in the distributions in Figure 6.1. It is the between-groups variability that has changed in Figure 6.2. More particularly, it is the difference between the group means that has changed. Although there was some between-groups variability before the treatment, it was comparatively minor and probably reflected sampling variability. After the treatment, the differences between means are much greater. What F indicates is whether group differences are great enough to be statistically significant, that is, not due to chance.
The Statistical Hypotheses in One-Way ANOVA

The hypotheses are very much like they were for the independent t-test, except that they accommodate more groups. For the t-test, the null hypothesis is written H0: μ1 = μ2. It indicates that the two samples involved were drawn from populations with the same means. For a one-way ANOVA with three groups, the null hypothesis has this form:

H0: μ1 = μ2 = μ3
Try It! B: If a psychologist is interested in the impact that 1 hour, 5 hours, or 10 hours of therapy have on client behavior, how are behavior differences related to gender explained?
It indicates that the three samples were drawn from populations with the same means. Things have to change for the alternate hypothesis, however, because with three groups, there is not just one possible alternative. Note that each of the following is possible:

a. Ha: μ1 ≠ μ2 = μ3
   Sample 1 represents a population with a mean value different from the mean of the population represented by Samples 2 and 3.
b. Ha: μ1 = μ2 ≠ μ3
   Samples 1 and 2 represent a population with a mean value different from the mean of the population represented by Sample 3.
c. Ha: μ1 = μ3 ≠ μ2
   Samples 1 and 3 represent a population with a mean value different from the population represented by Sample 2.
d. Ha: μ1 ≠ μ2 ≠ μ3
   All three samples represent populations with different means.

Because the several possible alternative outcomes multiply rapidly when the number of groups increases, a more general alternate hypothesis is given. Either all the groups involved come from populations with the same means, or at least one of them does not. So the form of the alternate hypothesis for an ANOVA with any number of groups is simply

Ha: At least one of the means is different from the other means.
Also remember that alternative hypotheses are either nondirectional, in that there is no prediction of which sample mean will be higher than the others:

Nondirectional alternative hypothesis: Ha: μ1 ≠ μ2 ≠ μ3

or directional, in that there is a prediction of which sample mean will be higher than the other means. As seen below for the directional alternative hypothesis, there is a prediction that μ3 will be higher than μ2, which is higher than μ1.

Directional alternative hypothesis: Ha: μ1 < μ2 < μ3

As a researcher, it is important to consider the value of prediction in terms of a one-tailed test versus no prediction in a two-tailed test, as discussed in Chapter 5.
Measuring Data Variability in the One-Way ANOVA

We have discussed several different measures of data variability to this point, including the standard deviation (s), the variance (s²), the standard error of the mean (SEM), the standard error of the difference (SEd), and the range. For ANOVA, Fisher added one more, the sum of squares (SS). The sum of squares is the sum of the squared differences between scores and one of several mean values.

Try It! C: How many t-tests would it take to make all possible comparisons in a procedure with six groups?

In ANOVA,
• One sum-of-squares value involves the differences between individual scores and the mean of all the scores in all the groups (the grand mean). This is called the sum of squares total (SStot) because it measures all variability from all sources.
• A second sum-of-squares value indicates the difference between the means of the individual groups and the grand mean. This is the sum of squares between (SSbet). It measures the effect of the IV, the treatment effect, as well as any differences that existed between the groups before the study began.
• A third sum-of-squares value measures the difference between scores in the samples and the means of their sample. These sum of squares within (SSwith) values reflect the differences in the way subjects respond to the same stimulus. Because this value is entirely error variance, it is also called the sum of squares error (SSerr) or the sum of squares residual (SSres).
All Variability From All Sources: The Sum of Squares Total (SStot)

There are multiple formulas for SStot. They all provide the same answer, but some make more sense to look at than others. Formula 6.1 makes it clear that at the heart of SStot is the difference between each individual score (x) and the mean of all scores, or the grand mean, for which the notation is MG.

SStot = Σ(x − MG)²    Formula 6.1

Where
x = each score in all groups
MG = the mean of all data from all groups, the grand mean

To calculate SStot, follow these steps:
1. Sum all scores from all groups and divide by the number of scores to determine the grand mean, MG.
2. Subtract MG from each score (x) in each group, and then square the difference: (x − MG)²
3. Sum all the squared differences: Σ(x − MG)²
The Treatment Effect: The Sum of Squares Between (SSbet)

The between-groups variance, the sum of squares between (SSbet), contains the variability due to the independent variable, the treatment effect. It will also contain any initial differences between the groups, which of course is error variance. For three groups labeled a, b, and c, the formula is

SSbet = (Ma − MG)²na + (Mb − MG)²nb + (Mc − MG)²nc    Formula 6.2
Where
Ma = the mean of the scores in the first group (a)
MG = the same grand mean used in SStot
na = the number of scores in the first group (a)

To calculate SSbet, follow these steps:
1. Determine the mean for each group: Ma, Mb, and so on.
2. Subtract MG from each sample mean and square the difference: (Ma − MG)²
3. Multiply the squared difference by the number in the group: (Ma − MG)²na
4. Repeat for each group.
5. Sum (Σ) the results across groups.

The value that results from Formula 6.2 represents the differences between the group means and the mean of all the data.
The Error Term: The Sum of Squares Within (SSwith)

When a group receives the same treatment but individuals within the group respond differently, their differences constitute error: unexplained variability. Maybe subjects’ age differences are the cause, or perhaps the circumstances of their family lives, but for some reason not analyzed in the particular study, subjects in the same group often respond differently to the same stimulus. The amount of this unexplained variance within the groups is calculated with the SSwith, for which we have Formula 6.3:

SSwith = Σ(xa − Ma)² + Σ(xb − Mb)² + Σ(xc − Mc)²    Formula 6.3

Where
SSwith = the sum of squares within
xa = each of the individual scores in Group a
Ma = the score mean in Group a

To calculate SSwith, follow these steps:
1. Take the mean for each of the groups; these are available from calculating the SSbet earlier.
2. From each score in each group,
   a. subtract the mean of the group,
   b. square the difference, and
   c. sum the squared differences within each group.
3. Repeat this for each group.
4. Sum the results across the groups.
Try It! D: When will the sum-of-squares values be negative?
The SSwith (or the SSerr) measures the degree to which scores vary due to factors not controlled in the study, fluctuations that constitute error variance.

Because the SStot consists of the SSbet and the SSwith, once the SStot and the SSbet are known, the SSwith can be determined by subtraction:

SStot − SSbet = SSwith

However, there are two reasons not to determine the SSwith by simple subtraction. First, if there is an error in the SSbet, it is only perpetuated with the subtraction. Second, calculating the value with Formula 6.3 helps clarify that what is being determined is a measure of how much variation in scores there is within each group. For the few problems done entirely by hand, we will take the “high road” and use the conceptual formula.

Conceptual formulas (6.1, 6.2, and 6.3) clarify the logic involved, but in the case of analysis of variance, they also require a good deal of tiresome subtracting and then squaring of numbers. To minimize the tedium, the data sets here are all relatively small. When larger studies are done by hand, people often shift to the “calculation formulas” for simpler arithmetic, but there is a sacrifice of clarity. Happily, you will seldom find yourself doing manual ANOVA calculations, and after a few simple longhand problems, this chapter will explain how you can utilize Excel or SPSS for help with the larger data sets.
Calculating the Sums of Squares

A researcher is interested in the level of social isolation people feel in small towns (a), suburbs (b), and cities (c). Participants randomly selected from each of those three settings take the Assessment List of Nonnormal Environments (ALONE), for which the following scores are available:

a. 3, 4, 4, 3
b. 6, 6, 7, 8
c. 6, 7, 7, 9

We know we are going to need the mean of all the data (MG) as well as the mean for each group (Ma, Mb, Mc), so we will start there. Verify that

Σx = 70 and N = 12, so that MG = 5.833.

For the small-town subjects,

Σxa = 14 and na = 4, so Ma = 3.50.

For the suburban subjects,

Σxb = 27 and nb = 4, so Mb = 6.750.
For the city subjects,

Σxc = 29 and nc = 4, so Mc = 7.250.
For the sum of squares total, the formula is

SStot = Σ(x − MG)²
SStot = 41.67

The calculations are in Table 6.1.
Table 6.1: Calculating the sum of squares total (SStot)
SStot = Σ(x − MG)², MG = 5.833

For the Town Data
x − M                  (x − M)²
3 − 5.833 = −2.833     8.026
4 − 5.833 = −1.833     3.360
4 − 5.833 = −1.833     3.360
3 − 5.833 = −2.833     8.026

For the Suburb Data
x − M                  (x − M)²
6 − 5.833 = 0.167      0.028
6 − 5.833 = 0.167      0.028
7 − 5.833 = 1.167      1.362
8 − 5.833 = 2.167      4.696

For the City Data
x − M                  (x − M)²
6 − 5.833 = 0.167      0.028
7 − 5.833 = 1.167      1.362
7 − 5.833 = 1.167      1.362
9 − 5.833 = 3.167      10.030

SStot = 41.668
For the sum of squares between, the formula is

SSbet = (Ma − MG)²na + (Mb − MG)²nb + (Mc − MG)²nc

The SSbet involves three group means rather than the 12 individual scores required for SStot. The SSbet is as follows:

SSbet = (3.5 − 5.833)²(4) + (6.75 − 5.833)²(4) + (7.25 − 5.833)²(4)
      = 21.772 + 3.364 + 8.032
      = 33.17
The SSwith indicates the error variance by determining the differences between individual scores in a group and their means. The formula is

SSwith = Σ(xa − Ma)² + Σ(xb − Mb)² + Σ(xc − Mc)²
SSwith = 8.50

The calculations are in Table 6.2.
Because we calculated the SSwith directly instead of determining it by subtraction, we can now check for accuracy by adding its value to the SSbet. If the calculations are correct, SSwith + SSbet = SStot. For the isolation example, we have

8.504 + 33.168 = 41.672

In the initial calculation, SStot = 41.668. The difference of .004 is round-off difference and is unimportant.
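The three sums of squares for the ALONE data can be reproduced from the conceptual formulas (6.1–6.3). A Python sketch (the function names are ours); carrying full precision avoids the small round-off noted above:

```python
def grand_mean(groups):
    """Mean of all scores pooled across groups (MG)."""
    scores = [x for g in groups for x in g]
    return sum(scores) / len(scores)

def ss_total(groups):
    """Formula 6.1: squared deviations of every score from the grand mean."""
    mg = grand_mean(groups)
    return sum((x - mg) ** 2 for g in groups for x in g)

def ss_between(groups):
    """Formula 6.2: n-weighted squared deviations of group means from MG."""
    mg = grand_mean(groups)
    return sum(len(g) * (sum(g) / len(g) - mg) ** 2 for g in groups)

def ss_within(groups):
    """Formula 6.3: squared deviations of scores from their own group mean."""
    return sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

town, suburb, city = [3, 4, 4, 3], [6, 6, 7, 8], [6, 7, 7, 9]
data = [town, suburb, city]
print(round(ss_total(data), 2))    # 41.67
print(round(ss_between(data), 2))  # 33.17
print(round(ss_within(data), 2))   # 8.5
```

At full precision the accuracy check is exact: ss_between + ss_within equals ss_total.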
Although they were not called sums of squares, we have been calculating an equivalent statistic since Chapter 1. At the heart of the standard deviation calculation are those repetitive x − M differences for each score in the sample. The difference values are then squared and summed, much as they are for calculating SSwith and SStot. Further, the denominator in the standard deviation calculation is n − 1, which should look suspiciously like some of the degrees of freedom values we will discuss in the next section.
Interpreting the Sums of Squares

Try It! E: What will SStot − SSwith yield?

The different sums-of-squares values are measures of data variability, which makes them like the standard deviation, variance measures, the standard error of the mean, and so on. But there is an important difference between SS and the other statistics. In addition to data variability, the magnitude of the SS value reflects the number of scores included. Because sums of squares are in fact the sum of squared values, the more values there are, the larger
the value becomes. With statistics like the standard deviation, adding more values near the mean of the distribution actually shrinks its value. But this cannot happen with the sum of squares. Additional scores, whatever their value, will almost always increase the sum of squares.
Table 6.2: Calculating the sum of squares within (SSwith)
SSwith = Σ(xa − Ma)² + Σ(xb − Mb)² + Σ(xc − Mc)²

a: 3, 4, 4, 3
b: 6, 6, 7, 8
c: 6, 7, 7, 9
Ma = 3.50, Mb = 6.750, Mc = 7.250

For the Town Data
x − M                  (x − M)²
3 − 3.50 = −0.50       0.250
4 − 3.50 = 0.50        0.250
4 − 3.50 = 0.50        0.250
3 − 3.50 = −0.50       0.250

For the Suburb Data
x − M                  (x − M)²
6 − 6.750 = −0.750     0.563
6 − 6.750 = −0.750     0.563
7 − 6.750 = 0.250      0.063
8 − 6.750 = 1.250      1.563

For the City Data
x − M                  (x − M)²
6 − 7.250 = −1.250     1.563
7 − 7.250 = −0.250     0.063
7 − 7.250 = −0.250     0.063
9 − 7.250 = 1.750      3.063

SSwith = 8.504
This characteristic makes the sum of squares difficult to interpret. What constitutes much or little variability depends not just on how much difference there is between the scores and the mean to which they are compared but also on how many scores there are. Fisher turned the sum-of-squares values into a “mean measure of variability” by dividing each sum-of-squares value by its degrees of freedom. The SS ÷ df operation creates what is called the mean square (MS).

In the one-way ANOVA, there is an MS value associated with both the SSbet and the SSwith (SSerr). There is no mean square total given in the table, but if it were to be calculated, it would be the total variance (SSbet + SSwith) divided by the degrees of freedom for the entire data set treated as a single sample (N − 1). Dividing the SStot by its degrees of freedom (N − 1) would provide a mean level of overall variability, but that would not help answer questions about the ratio of between-groups variance to within-groups variance.
The degrees of freedom for each of the sums of squares calculated for the one-way ANOVA are as follows:

• Degrees of freedom total (dftot) = N − 1, where N is the total number of scores
• Degrees of freedom between (dfbet) = k − 1, where k is the number of groups
  SSbet ÷ dfbet = MSbet
• Degrees of freedom within (dfwith) = N − k
  SSwith ÷ dfwith = MSwith

Although there is no MStot, we need the sum of squares total (SStot) and the degrees of freedom total (dftot) because they provide an accuracy check:

a. The sums of squares between and within should equal the total sum of squares:
   SSbet + SSwith = SStot
b. The sum of degrees of freedom between and within should equal degrees of freedom total:
   dfbet + dfwith = dftot
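The degrees-of-freedom bookkeeping can be sketched for the 12-score, three-group social isolation example:

```python
# Degrees of freedom for a one-way ANOVA with N total scores and k groups.
N, k = 12, 3

df_tot = N - 1   # 11
df_bet = k - 1   # 2
df_with = N - k  # 9

# Accuracy check: between + within must recover total.
assert df_bet + df_with == df_tot
print(df_tot, df_bet, df_with)  # 11 2 9
```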
Remembering these relationships can help reveal errors. In other words, the concept of error is unexplained or unsystematic variance within groups (SSwith), variance not caused by the experimental manipulation, as opposed to explained or systematic variance between groups (SSbet), which is due to the experimental manipulation.
The F Ratio

The mean squares for between and within are the components of F, and the F ratio is the test statistic in ANOVA. As noted earlier in this chapter, F is a ratio:

F = MSbet / MSwith    Formula 6.4
The issue is whether the MSbet, which contains the treatment effect and some error, is substantially greater than the MSwith, which contains only error. This is illustrated in Figure 6.3 by comparing the distance from the mean of the first distribution to the mean of the second distribution, the A variance, to the B and C variances, which indicate the differences within groups.

If the MSbet/MSwith ratio is large (it must be substantially greater than 1), the difference between groups is likely to be significant. When that ratio is small (close to 1), F is likely to be nonsignificant. How large F must be to be significant depends on the degrees of freedom for the problem, just as it did for the t-tests.

Figure 6.3: The F-ratio: comparing variance between groups (A) to variance within groups (B + C)
The ANOVA Table

With the sums of squares and the degrees of freedom for the different values in hand, the ANOVA results are presented in a table often referred to as a source table, indicating the sources of variability. It lists

• the source of the variance,
• the sums-of-squares values,
• the degrees of freedom:
  for total, dftot = N − 1 (because N = 12, dftot = 11),
  for between, dfbet = k − 1 (because k, the number of groups, = 3, dfbet = 2),
  for within, dfwith = N − k (because N = 12 and k = 3, dfwith = 9),
• the mean square values, which are SS/df, and
• the F value, which is MSbet/MSwith.
For the social isolation problem, the ANOVA table is
Source SS df MS F
Between 33.17 2 16.58 17.55
Within 8.50 9 .95
Total 41.67 11
The table makes it easy to check some of the results for accuracy. Check that
SSbet + SSwith = SStot
Also verify that
dfbet + dfwith = dftot
In the course of checking results, note that sums-of-squares values can never be negative. Because the SS values are literally sums of squares, a negative number indicates a calculation error somewhere; there is no such thing as negative variability (Chapter 1).
The smallest a sum-of-squares value can be is 0, and this can
happen only if all scores in
the sum-of-squares calculation have the same value.
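The checks above are easy to script. This is a minimal sketch, not part of the original text: it takes the SS and df values from the social isolation source table and rebuilds the MS and F entries, verifying the additivity of the sums of squares and the degrees of freedom along the way.

```python
# Rebuild the ANOVA source table from the values reported for the
# social isolation problem: SSbet = 33.17, SSwith = 8.50, N = 12, k = 3.
ss_bet, ss_with = 33.17, 8.50
N, k = 12, 3

df_bet = k - 1         # between df = k - 1 = 2
df_with = N - k        # within df = N - k = 9
df_tot = N - 1         # total df = N - 1 = 11

# Additivity checks: SSbet + SSwith = SStot and dfbet + dfwith = dftot
ss_tot = ss_bet + ss_with
assert df_bet + df_with == df_tot

ms_bet = ss_bet / df_bet      # mean square between = SS/df
ms_with = ss_with / df_with   # mean square within = SS/df
F = ms_bet / ms_with          # the F ratio

print(f"SStot = {ss_tot:.2f}, MSbet = {ms_bet:.2f}, "
      f"MSwith = {ms_with:.2f}, F = {F:.2f}")
```

The computed F should closely reproduce the value in the source table (small differences come only from rounding MSwith to .95 in the table).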
Understanding F
The larger F is, the more likely it is to be statistically significant, but how large is large enough? In the preceding ANOVA table, F = 17.55, which seems like a comparatively large value.
• Because F is determined by dividing MSbet by MSwith, the value of F indicates how many times greater MSbet is than MSwith.
• Here MSbet is 17.55 times greater than MSwith, which seems promising, but to be sure, it must be compared to a value from the critical values of F (Table 6.3, which is repeated in the Appendix as Table C).
As with the t-test, as degrees of freedom increase, the critical values decline. The difference is that with F two df values are involved: one for the MSbet and the other for the MSwith.
• In Table 6.3 (also Table C in the Appendix), the critical value is identified by moving across the top of the table to the dfbet (the df numerator) and then moving down that column to the dfwith (the df denominator). According to the social isolation test ANOVA table above, these are
the dfbet = 2 and
the dfwith = 9.
• The intersection of the 2 at the top and the 9 along the left side of the table leads to two critical values, one in regular type, which is for α = .05 and is the default, and one in bold type, which is the value for testing at α = .01.
• The critical value when testing at p = .05 is 4.26.
• The critical value indicates that any ANOVA test with 2 and 9 df that has an F value equal to or greater than 4.26 is statistically significant.
The social isolation differences between the three groups are probably not due to sampling variability. The statistical decision is to reject H0. The relatively large value of F (more than four times the critical value) indicates that much more of the difference in social isolation is probably related to where respondents live than to error variance.
Table 6.3: The critical values of F
In each cell, the first value is the critical value for p = .05; the second (bold in the original) is for p = .01.

df den\df num   1            2           3           4           5           6           7           8           9           10
 2   18.51/98.49  19.00/99.01  19.16/99.17  19.25/99.25  19.30/99.30  19.33/99.33  19.35/99.36  19.37/99.38  19.38/99.39  19.40/99.40
 3   10.13/34.12  9.55/30.82   9.28/29.46   9.12/28.71   9.01/28.24   8.94/27.91   8.89/27.67   8.85/27.49   8.81/27.34   8.79/27.23
 4   7.71/21.20   6.94/18.00   6.59/16.69   6.39/15.98   6.26/15.52   6.16/15.21   6.09/14.98   6.04/14.80   6.00/14.66   5.96/14.55
 5   6.61/16.26   5.79/13.27   5.41/12.06   5.19/11.39   5.05/10.97   4.95/10.67   4.88/10.46   4.82/10.29   4.77/10.16   4.74/10.05
 6   5.99/13.75   5.14/10.92   4.76/9.78    4.53/9.15    4.39/8.75    4.28/8.47    4.21/8.26    4.15/8.10    4.10/7.98    4.06/7.87
 7   5.59/12.25   4.74/9.55    4.35/8.45    4.12/7.85    3.97/7.46    3.87/7.19    3.79/6.99    3.73/6.84    3.68/6.72    3.64/6.62
 8   5.32/11.26   4.46/8.65    4.07/7.59    3.84/7.01    3.69/6.63    3.58/6.37    3.50/6.18    3.44/6.03    3.39/5.91    3.35/5.81
 9   5.12/10.56   4.26/8.02    3.86/6.99    3.63/6.42    3.48/6.06    3.37/5.80    3.29/5.61    3.23/5.47    3.18/5.35    3.14/5.26
10   4.96/10.04   4.10/7.56    3.71/6.55    3.48/5.99    3.33/5.64    3.22/5.39    3.14/5.20    3.07/5.06    3.02/4.94    2.98/4.85
11   4.84/9.65    3.98/7.21    3.59/6.22    3.36/5.67    3.20/5.32    3.09/5.07    3.01/4.89    2.95/4.74    2.90/4.63    2.85/4.54
12   4.75/9.33    3.89/6.93    3.49/5.95    3.26/5.41    3.11/5.06    3.00/4.82    2.91/4.64    2.85/4.50    2.80/4.39    2.75/4.30
13   4.67/9.07    3.81/6.70    3.41/5.74    3.18/5.21    3.03/4.86    2.92/4.62    2.83/4.44    2.77/4.30    2.71/4.19    2.67/4.10
Try It! If the F in an ANOVA is 4.0 and the MSwith = 2.0, what will be the value of MSbet?
(continued)
Table 6.3: The critical values of F (continued)
In each cell, the first value is the critical value for p = .05; the second (bold in the original) is for p = .01.

df den\df num   1           2          3          4          5          6          7          8          9          10
14   4.60/8.86   3.74/6.51   3.34/5.56   3.11/5.04   2.96/4.69   2.85/4.46   2.76/4.28   2.70/4.14   2.65/4.03   2.60/3.94
15   4.54/8.68   3.68/6.36   3.29/5.42   3.06/4.89   2.90/4.56   2.79/4.32   2.71/4.14   2.64/4.00   2.59/3.89   2.54/3.80
16   4.49/8.53   3.63/6.23   3.24/5.29   3.01/4.77   2.85/4.44   2.74/4.20   2.66/4.03   2.59/3.89   2.54/3.78   2.49/3.69
17   4.45/8.40   3.59/6.11   3.20/5.19   2.96/4.67   2.81/4.34   2.70/4.10   2.61/3.93   2.55/3.79   2.49/3.68   2.45/3.59
18   4.41/8.29   3.55/6.01   3.16/5.09   2.93/4.58   2.77/4.25   2.66/4.01   2.58/3.84   2.51/3.71   2.46/3.60   2.41/3.51
19   4.38/8.18   3.52/5.93   3.13/5.01   2.90/4.50   2.74/4.17   2.63/3.94   2.54/3.77   2.48/3.63   2.42/3.52   2.38/3.43
20   4.35/8.10   3.49/5.85   3.10/4.94   2.87/4.43   2.71/4.10   2.60/3.87   2.51/3.70   2.45/3.56   2.39/3.46   2.35/3.37
21   4.32/8.02   3.47/5.78   3.07/4.87   2.84/4.37   2.68/4.04   2.57/3.81   2.49/3.64   2.42/3.51   2.37/3.40   2.32/3.31
22   4.30/7.95   3.44/5.72   3.05/4.82   2.82/4.31   2.66/3.99   2.55/3.76   2.46/3.59   2.40/3.45   2.34/3.35   2.30/3.26
23   4.28/7.88   3.42/5.66   3.03/4.76   2.80/4.26   2.64/3.94   2.53/3.71   2.44/3.54   2.37/3.41   2.32/3.30   2.27/3.21
24   4.26/7.82   3.40/5.61   3.01/4.72   2.78/4.22   2.62/3.90   2.51/3.67   2.42/3.50   2.36/3.36   2.30/3.26   2.25/3.17
25   4.24/7.77   3.39/5.57   2.99/4.68   2.76/4.18   2.60/3.85   2.49/3.63   2.40/3.46   2.34/3.32   2.28/3.22   2.24/3.13
26   4.23/7.72   3.37/5.53   2.98/4.64   2.74/4.14   2.59/3.82   2.47/3.59   2.39/3.42   2.32/3.29   2.27/3.18   2.22/3.09
27   4.21/7.68   3.35/5.49   2.96/4.60   2.73/4.11   2.57/3.78   2.46/3.56   2.37/3.39   2.31/3.26   2.25/3.15   2.20/3.06
28   4.20/7.64   3.34/5.45   2.95/4.57   2.71/4.07   2.56/3.75   2.45/3.53   2.36/3.36   2.29/3.23   2.24/3.12   2.19/3.03
29   4.18/7.60   3.33/5.42   2.93/4.54   2.70/4.04   2.55/3.73   2.43/3.50   2.35/3.33   2.28/3.20   2.22/3.09   2.18/3.00
30   4.17/7.56   3.32/5.39   2.92/4.51   2.69/4.02   2.53/3.70   2.42/3.47   2.33/3.30   2.27/3.17   2.21/3.07   2.16/2.98
Source: Richard Lowry. Retrieved from http://vassarstats.net/textbook/apx_d.html
6.2 Identifying the Difference: Post Hoc Tests and Tukey’s HSD
A significant t from an independent t-test allows a simpler interpretation than a significant F from an ANOVA with three or more groups. A significant t indicates that the two groups probably belong to populations
with different means. A
significant F indicates that at least one group is significantly
different from at least one
other group in the study, but unless there are only two groups in
the ANOVA, it is not
clear which group is significantly different from which. If the
null hypothesis is rejected,
there are a number of possible alternatives, as we noted when
we listed all the possible
HA outcomes earlier.
The point of a post hoc test (an “after this” test) conducted
following an ANOVA is to
determine which groups are significantly different from each
other. So when F is significant, a post hoc test is the next step. Statisticians debate whether to run a post hoc test if F is not significant, as there may be instances in which the overall F is nonsignificant yet the post hoc tests detect a significant difference between two groups.
With the ease of running the analysis in Excel or SPSS, researchers may run post hoc tests to determine whether there are significant differences in means between pairs of groups. When specific pairwise differences are of interest from the outset, however, a planned comparison is the more prudent choice. Whether to use a planned comparison or a post hoc test should be determined by the purpose of the study. If the goal is to test the null hypothesis that the means are not significantly different, then a significant omnibus F is appropriate. On the other hand, if the aim is to detect differences between specific means, the omnibus F result is not necessary, and going straight to the comparisons is appropriate, as in a planned comparison between means.
There are many post hoc tests that are used for different
purposes and based on their own
assumptions and calculations (18 of them in SPSS, named after
their respective authors).
Each of them has particular strengths, but one of the more
common in the psychological
disciplines, and also one of the easiest to calculate, is John
Tukey’s HSD test, for “honestly
significant difference.”
Many statisticians use the terms liberal and conservative to describe post hoc tests. A liberal test is one in which there is a greater chance of finding a significant difference between means but a higher chance of a type I error. Fisher's least significant difference (LSD) test is an example of a liberal test. Liberal tests are seldom used, precisely because of this concern about committing a type I error. Conversely, a conservative post hoc test has a lower chance of finding a significant difference between means but also a lower chance of a type I error. One such conservative test is Bonferroni's post hoc. Because of their conservative nature, these post hoc tests are more widely used.
Formula 6.5 produces a value that is the smallest difference between the means of any two samples that can be statistically significant:

HSD = x√(MSwith/n)    (Formula 6.5)
Where
x = a table value indexed to the number of groups (k) in the problem and the degrees of freedom within (dfwith) from the ANOVA table
MSwith = the value from the ANOVA table
n = the number in one group when group sizes are equal.
In order to compute Tukey's HSD, follow these steps:
1. From Table 6.4 locate the value of x by moving across the top of the table to the number of groups/treatments (k = 3), and then down the left side for the within degrees of freedom (dfwith = 9). The intersecting values are 3.95 and 5.43. The smaller of the two is the value when p = .05, as it was in our test. The post hoc test is always conducted at the same probability level as the ANOVA. In this case, it is p = .05.
2. The calculation is 3.95 times the square root of .945 (the MSwith) divided by 4 (n):

3.95√(.945/4) = 1.920
3. This value is the minimum difference between the means of two significantly different samples. The sign of the difference does not matter; it is the absolute value we need.

The means for social isolation in the three groups are the following:
Ma = 3.500 for small town respondents
Mb = 6.750 for suburban respondents
Mc = 7.250 for city respondents

Small towns minus suburbs:
Ma − Mb = 3.50 − 6.75 = −3.25; this difference exceeds 1.92 and is significant.
Small towns minus cities:
Ma − Mc = 3.50 − 7.25 = −3.75; this difference exceeds 1.92 and is significant.
Suburbs minus cities:
Mb − Mc = 6.75 − 7.25 = −0.50; this difference is less than 1.92 and is not significant.
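The three pairwise checks above can be scripted in a few lines. This sketch hardcodes the values from the text (q = 3.95 from Table 6.4 for k = 3 and dfwith = 9, MSwith = .945 from the ANOVA, n = 4 per group) rather than computing them from raw data:

```python
import math

q = 3.95          # Table 6.4 value for k = 3 groups, dfwith = 9, p = .05
ms_with = 0.945   # MSwith from the ANOVA source table
n = 4             # respondents per group

# Tukey's HSD: the smallest mean difference that can be significant
hsd = q * math.sqrt(ms_with / n)   # about 1.92

means = {"small towns": 3.50, "suburbs": 6.75, "cities": 7.25}
pairs = [("small towns", "suburbs"),
         ("small towns", "cities"),
         ("suburbs", "cities")]

for a, b in pairs:
    diff = means[a] - means[b]
    # The sign does not matter; compare the absolute difference to HSD
    verdict = "significant" if abs(diff) >= hsd else "not significant"
    print(f"{a} vs {b}: diff = {diff:+.2f} -> {verdict}")
```

Only the small-town comparisons exceed the HSD, reproducing the conclusions reached by hand.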
When several groups are involved, sometimes it is helpful to
create a table that presents
all the differences between pairs of means. Table 6.5, which is
repeated in the Appendix as
Table D, is the Tukey’s HSD results for the social isolation
problem.
Formula 6.5 is used when group sizes are equal. However, there is an alternate formula for unequal group sizes for the more adventurous:

HSD = x√((MSwith/2)(1/n1 + 1/n2))

with a separate HSD value completed for each pair of means in the problem.
Try It!
Table 6.4: Tukey's HSD critical values: q (alpha, k, df)
In each cell, the first value is the critical value of q for alpha = .05; the second (bold in the original) is for alpha = .01.

df   k=2        k=3        k=4        k=5        k=6        k=7        k=8        k=9        k=10
 5   3.64/5.70  4.60/6.98  5.22/7.80  5.67/8.42  6.03/8.91  6.33/9.32  6.58/9.67  6.80/9.97  6.99/10.24
 6   3.46/5.24  4.34/6.33  4.90/7.03  5.30/7.56  5.63/7.97  5.90/8.32  6.12/8.61  6.32/8.87  6.49/9.10
 7   3.34/4.95  4.16/5.92  4.68/6.54  5.06/7.01  5.36/7.37  5.61/7.68  5.82/7.94  6.00/8.17  6.16/8.37
 8   3.26/4.75  4.04/5.64  4.53/6.20  4.89/6.62  5.17/6.96  5.40/7.24  5.60/7.47  5.77/7.68  5.92/7.86
 9   3.20/4.60  3.95/5.43  4.41/5.96  4.76/6.35  5.02/6.66  5.24/6.91  5.43/7.13  5.59/7.33  5.74/7.49
10   3.15/4.48  3.88/5.27  4.33/5.77  4.65/6.14  4.91/6.43  5.12/6.67  5.30/6.87  5.46/7.05  5.60/7.21
11   3.11/4.39  3.82/5.15  4.26/5.62  4.57/5.97  4.82/6.25  5.03/6.48  5.20/6.67  5.35/6.84  5.49/6.99
12   3.08/4.32  3.77/5.05  4.20/5.50  4.51/5.84  4.75/6.10  4.95/6.32  5.12/6.51  5.27/6.67  5.39/6.81
13   3.06/4.26  3.73/4.96  4.15/5.40  4.45/5.73  4.69/5.98  4.88/6.19  5.05/6.37  5.19/6.53  5.32/6.67
14   3.03/4.21  3.70/4.89  4.11/5.32  4.41/5.63  4.64/5.88  4.83/6.08  4.99/6.26  5.13/6.41  5.25/6.54
15   3.01/4.17  3.67/4.84  4.08/5.25  4.37/5.56  4.59/5.80  4.78/5.99  4.94/6.16  5.08/6.31  5.20/6.44
16   3.00/4.13  3.65/4.79  4.05/5.19  4.33/5.49  4.56/5.72  4.74/5.92  4.90/6.08  5.03/6.22  5.15/6.35
17   2.98/4.10  3.63/4.74  4.02/5.14  4.30/5.43  4.52/5.66  4.70/5.85  4.86/6.01  4.99/6.15  5.11/6.27
18   2.97/4.07  3.61/4.70  4.00/5.09  4.28/5.38  4.49/5.60  4.67/5.79  4.82/5.94  4.96/6.08  5.07/6.20
19   2.96/4.05  3.59/4.67  3.98/5.05  4.25/5.33  4.47/5.55  4.65/5.73  4.79/5.89  4.92/6.02  5.04/6.14
20   2.95/4.02  3.58/4.64  3.96/5.02  4.23/5.29  4.45/5.51  4.62/5.69  4.77/5.84  4.90/5.97  5.01/6.09
24   2.92/3.96  3.53/4.55  3.90/4.91  4.17/5.17  4.37/5.37  4.54/5.54  4.68/5.69  4.81/5.81  4.92/5.92
30   2.89/3.89  3.49/4.45  3.85/4.80  4.10/5.05  4.30/5.24  4.46/5.40  4.60/5.54  4.72/5.65  4.82/5.76
40   2.86/3.82  3.44/4.37  3.79/4.70  4.04/4.93  4.23/5.11  4.39/5.26  4.52/5.39  4.63/5.50  4.73/5.60
Source: Tukey's HSD critical values (n.d.). Retrieved from http://www.stat.duke.edu/courses/Spring98/sta110c/qtable.html
Table 6.5: Presenting Tukey's HSD results in a table

HSD = x√(MSwith/n) = 3.95√(.945/4) = 1.920

Any difference between pairs of means of 1.920 or greater is a statistically significant difference. Significant mean differences are marked with an asterisk (shown in orange in the original).

                          Suburbs (M = 6.750)   Cities (M = 7.250)
Small towns (M = 3.500)   Diff = 3.250*         Diff = 3.750*
Suburbs (M = 6.750)                             Diff = 0.500
The values entered in the cells in Table 6.5 indicate the
differences between each pair of
means in the study. Comparing the mean scores from each of the
three groups indicates
that the respondents from small towns expressed a significantly
lower level of social
isolation than those in either the suburbs or cities. Comparing
the mean scores from the
suburban and city groups indicates that social isolation scores
are higher in the city, but
the difference is not large enough to be statistically significant.
The significant F from the ANOVA indicated that at least one
group had a significantly
different level of social isolation from at least one other group,
but that is all a significant F
can reveal. The result does not indicate which group is
significantly different from which
other group, unless there are only two groups. The post hoc test
indicates which pairs of
groups are significantly different from each other. Table 6.5 is
an example of how to illus-
trate the significant and the nonsignificant differences. One
caveat in using Tukey’s HSD
is that there is an assumption of equality of variances
(homogeneity) between groups
based on Levene’s test. This assumption applies here as well.
Suppose there is a violation of homogeneity. In that instance, an adjusted post hoc test that accounts for inequality of variances (or heterogeneity) will need to be employed. To implement this in SPSS, for instance, there are four options under the Equal Variances Not Assumed heading when conducting a post hoc test for ANOVA. One of these approaches is the Games-Howell post hoc, which is executed by checking that box in the SPSS Post Hoc tests tab for ANOVA.
Apply It!
ANOVA and Product Development
A product development specialist in a major computer company
decides that
it would be a significant improvement to keyboards if they were
designed to
fit the shape of human hands. Instead of being flat, the new
keyboard would
curve like the surface of a football. Before the company
executives are willing to expend the
resources necessary to produce and distribute such a product,
they need to know whether
it will sell and what the most comfortable curvature of the
keyboard would be.
The company produces prototypes for four different keyboards,
labeled Prototype A through
D (see Table 6.6). Prototype A is a standard flat keyboard, and
the others each have varying
amounts of curve. Everything else about the keyboards is the
same, so this is a one-way
ANOVA. Forty different users are randomly assigned to test one of the four keyboards and rate its comfort on a 100-point scale. The results are shown in Table 6.6.
Table 6.6: Prototype A–D data set
Prototype A Prototype B Prototype C Prototype D
49 57 77 65
57 53 82 61
73 69 77 73
68 65 85 81
65 61 93 89
62 73 79 77
61 57 73 81
45 69 89 77
53 73 82 69
61 77 85 77
Next, the test results are analyzed in Excel, which produces the
information in Figure 6.4.
(continued)
Apply It! (continued)
Figure 6.4: Excel results of the comparison of means and ANOVA of prototypes
The null hypothesis is that there is no difference among the four keyboards. From Figure 6.4, we see that the F value is 16.72, which is larger than the critical value of F = 2.87 at α = .05. Therefore the null hypothesis is rejected at p < .05. At least one of the prototypes is significantly different from at least one other prototype.
Because there is a significant F, the marketers next compute HSD:

HSD = x√(MSwith/n)

Where
x = 3.81 (based on k = 4, dfwith = 36, and p = .05)
MSwith = 61.07, the value from the ANOVA table
n = 10, the number in one group when group sizes are equal
HSD = 9.42
(continued)
Summary
Groups       Count  Sum  Average  Variance
Prototype A  10     594  59.4     73.82
Prototype B  10     654  65.4     65.60
Prototype C  10     822  82.2     36.40
Prototype D  10     750  75.0     68.44

ANOVA
Source of Variation  SS      df  MS       F      p-value   Fcrit
Between Groups       3063.6  3   1021.20  16.72  5.71E-07  2.87
Within Groups        2198.4  36  61.07
Total                5262    39
Apply It! (continued)
This value is the minimum difference between the means of two significantly different samples. The differences in means between the groups are shown below:
A − B = −6.0
A − C = −22.8
A − D = −15.6
B − C = −16.8
B − D = −9.6
C − D = 7.2
The differences in comfort between Prototypes A-B and C-D are
not statistically significant,
because the absolute values are less than the Tukey’s HSD value
of 9.42. However, the differ-
ences in comfort between the remaining prototypes are
statistically significant.
Based on analysis of the one-way ANOVA, the marketing team
decides to produce and sell
the keyboard configuration of Prototype C. This had the highest
mean comfort level and will
be a significant improvement over existing keyboards.
Apply It! boxes written by Shawn Murphy
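The Excel analysis in the Apply It! box can be reproduced from the raw scores in Table 6.6. This plain-Python sketch (not part of the original text) recomputes the sums of squares, degrees of freedom, and F, which should match the values Excel reports in Figure 6.4:

```python
# Comfort ratings for the four keyboard prototypes (Table 6.6)
data = {
    "A": [49, 57, 73, 68, 65, 62, 61, 45, 53, 61],
    "B": [57, 53, 69, 65, 61, 73, 57, 69, 73, 77],
    "C": [77, 82, 77, 85, 93, 79, 73, 89, 82, 85],
    "D": [65, 61, 73, 81, 89, 77, 81, 77, 69, 77],
}

all_scores = [x for grp in data.values() for x in grp]
grand_mean = sum(all_scores) / len(all_scores)

# Between-groups SS: squared deviation of each group mean from the
# grand mean, weighted by group size
ss_bet = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
             for g in data.values())
# Within-groups SS: squared deviations of scores from their group mean
ss_with = sum((x - sum(g) / len(g)) ** 2
              for g in data.values() for x in g)

df_bet = len(data) - 1                  # k - 1 = 3
df_with = len(all_scores) - len(data)   # N - k = 36

F = (ss_bet / df_bet) / (ss_with / df_with)
print(f"SSbet = {ss_bet:.1f}, SSwith = {ss_with:.1f}, F = {F:.2f}")
```

The result should reproduce SSbet = 3063.6, SSwith = 2198.4, and F = 16.72 from the Excel output.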
6.3 Determining the Results’ Practical Importance
Three questions can come up in an ANOVA. The second and
third questions depend upon
the answer to the first:
1. Are any of the differences statistically significant? The
answer depends upon
how the calculated F value compares to the critical value from
the table.
2. If the F is significant, which groups are significantly
different from each other?
That question is answered by completing a post hoc test such as
Tukey’s HSD.
3. If F is significant, how important is the result? The answer
comes by calculating an
effect size.
After addressing the first two questions, we now turn our
attention to the third question,
effect size. With the t-test in Chapter 5, Cohen's d answered the question about how important the result was. Several effect-size statistics have been used to explain the importance of a significant ANOVA result. Omega squared (ω²) and partial eta squared (partial η²) are both quite common in the social science research literature, but the one we will use is called eta squared (η²). The Greek letter eta (η, pronounced like "ate a" as in "ate a grape") is the equivalent of the letter h. Because some of the variance in scores is unexplained and is therefore error variance, eta squared answers this question: How much of the score variance can be attributed to the independent variable?
In the social isolation problem, the question was whether
residents of small towns, subur-
ban areas, and cities differ in the amount of social isolation they
indicate. The respondents’
location is the IV. Eta-squared estimates how much of the
difference in social isolation is
related to where respondents live.
There are only two values involved in the η² calculation, both retrievable from the ANOVA table. Formula 6.6 shows the eta-squared calculation:

η² = SSbet / SStot    (Formula 6.6)
Eta-squared is the ratio of between-groups variability to total
variability. If there were no error variance, all variance would be due to the independent variable, the sums of squares for between-groups variability and for total variability would have the same values, and the effect size would be 1.0.
never happens because scores
fluctuate for reasons other than the IV, but it is important to
know that 1.0 is the “upper
bound” for this effect size. The lower bound is 0, of course—
none of the variance is
explained. But we also never see eta-squared values of 0
because the only time the effect
size is calculated is when F is significant, and that can only
happen when the effect of the
IV is great enough that the ratio of MSbet to MSwith exceeds
the critical value.
For the social isolation problem, SSbet = 33.168 and SStot = 41.672, so

η² = 33.168/41.672 = 0.796.

According to this data, about 80% (79.6% to be exact) of the variance in social isolation scores is related to whether the respondent lives in a small town, a suburb, or a city. (Note that this amount of variance is unrealistically high, which can happen when numbers are contrived.)
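The η² computation is a one-liner. This sketch uses the SS values from the social isolation source table quoted above:

```python
# Eta-squared: proportion of total variability attributable to the IV
ss_bet = 33.168   # between-groups sum of squares
ss_tot = 41.672   # total sum of squares

eta_squared = ss_bet / ss_tot
print(f"eta-squared = {eta_squared:.3f}")  # about 0.796, i.e., ~80%
```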
Try It! If the F in ANOVA is not significant, should the post hoc test or the effect-size calculation be made?
Apply It!
Using ANOVA to Test Effectiveness
A pharmaceutical company has developed a new medicine to
treat a skin condition. This medi-
cine has been proven effective in previous tests, but now the
company is trying to decide the
best method to deliver the medicine. The options are
1. pills that are taken orally,
2. a cream that is rubbed into the affected area, or
3. drops that are placed on the affected area.
(continued)
suk85842_06_c06.indd 208 10/23/13 1:40 PM
CHAPTER 6Section 6.3 Determining the Results’ Practical
Importance
Apply It! (continued)
To test the application methods, the company uses 24 volunteers
who suffer
from this skin condition. Each of the volunteers is randomly
assigned to one of
the three treatment methods. Note that each volunteer tests only
one of the
delivery methods. This satisfies the requirement that the
categories of the IV
must be independent. This is a one-way ANOVA test with the
delivery method
being the only independent variable.
To evaluate the effectiveness of each delivery method, three
different dermatologists exam-
ine each patient after the course of treatment. They then rate the
skin condition on a scale
of 1 through 20, with 20 being a total absence of the condition.
The scores from the three
doctors are then averaged.
The null hypothesis is that all three delivery methods are equally effective:

H0: μpills = μcream = μdrops

The null hypothesis indicates that the three treatments were drawn from populations with the same mean. The alternate hypothesis for the ANOVA test is

Ha: μpills ≠ μcream ≠ μdrops
Data from the trial is shown in Table 6.7.
Table 6.7: Data from trial of skin treatment conditions
Pills Cream Drops
14 18 13
13 15 15
19 16 16
18 18 15
15 17 14
16 13 17
12 17 13
12 18 16
(continued)
Apply It! (continued)
Figure 6.5: Analysis of the data performed in Excel
Figure 6.5 shows the value for F is 1.72, which is less than the Fcrit value of 3.47 when testing at p = .05. Therefore, the null hypothesis is not rejected. We cannot say that the different delivery methods come from populations with different means. Looking at the p value generated by Excel, we see that there is a 20% probability that a difference in means this large could have occurred by chance alone. Because the null hypothesis is not rejected, there is no need to perform either a Tukey's HSD test or an η² calculation.
The pharmaceutical company decides to offer the medicine as a cream because this is generally their preferred delivery method. The ANOVA test has assured them that this is a reasonable choice: neither of the two alternative methods provided a more effective delivery option. In other words, the data do not support the alternative hypothesis.
Apply It! boxes written by Shawn Murphy
Summary
Groups  Count  Sum  Average  Variance
Pills   8      119  14.88    6.98
Cream   8      132  16.50    3.14
Drops   8      119  14.88    2.13

ANOVA
Source of Variation  SS     df  MS    F     p-value  Fcrit
Between Groups       14.08  2   7.04  1.72  0.20     3.47
Within Groups        85.75  21  4.08
Total                99.83  23
6.4 Conditions for the One-Way ANOVA
As we saw with the t-tests, any statistical test requires that
certain conditions (also referred
to as assumptions) are met. The conditions might be
characteristics such as the scale of the
data, the way the data is distributed, the relationships between
the groups in the analysis,
and so on. In the case of the one-way ANOVA, the name
indicates one of the conditions.
• This particular test can accommodate just one independent
variable.
• That one variable can have any number of categories, but there
can be just one IV.
In the example of small-town, suburban, and city isolation, the IV was the location of the respondents' residence. We might have added more categories such as rural, semirural, small town, large town, suburbs of small cities, suburbs of large cities, and so on, all of which relate to the respondents' place of residence, but like the independent t-test, there is no way to add another variable, such as the respondents' gender, in a one-way ANOVA.
• The categories of the IV must be independent.
• Like the independent t-test, the groups involved must be
independent. Those
who are members of one group cannot also be members of
another group
involved in the same analysis.
• The IV must be nominal scale.
• Because the IV must be nominal scale, sometimes data of
some other scale is
reduced to categorical data to complete the analysis. If someone
is interested
in whether there are differences in social isolation related to
age, age must be
changed from ratio to nominal data prior to the analysis. Rather
than using each
person’s age in years as the independent variable, ages are
grouped into catego-
ries such as 20s, 30s, and so on. This is not ideal, because by
reducing ratio data
to nominal or even ordinal scale, the differences in social
isolation between, for
example, 20- and 29-year-olds are lost.
• The DV must be interval or ratio scale.
• Technically, social isolation would need to be measured with
something like the
number of verbal exchanges that one has daily with neighbors or
co-workers,
rather than asking on a scale of 1–10 to indicate how isolated
one feels, which is
probably an example of ordinal data.
• The groups in the analysis must be similarly distributed. The
technical descrip-
tion for this similarity of distribution is homogeneity of
variance. For example,
this condition means that the groups should all have reasonably
similar standard
deviations. This was discussed in Chapter 5 where the Levene’s
test is used to
test equality of variances.
• Finally, using ANOVA assumes that the samples are drawn
from a normally dis-
tributed population.
It may seem difficult to meet all these conditions. However,
keep in mind that normality
and homogeneity of variance in particular represent ideals more
than practical necessities.
As it turns out, Fisher’s procedure can tolerate a certain amount
of deviation from these
requirements; this test is quite robust.
6.5 ANOVA and the Independent t-Test
The one-way ANOVA and the independent t-test share several assumptions, although they employ distinct statistics: the sums of squares for ANOVA and the standard error of the difference for the t-test. Most important, both tests will lead the analyst to the same conclusion. This consistency can be illustrated by completing ANOVA and the independent t-test for the same data.
Suppose an industrial psychologist is interested in how people
from two separate divi-
sions of a company differ in their work habits. The dependent
variable is the amount of
work completed after-hours at home per week for supervisors in
marketing versus super-
visors in manufacturing. The data is as follows:
Marketing: 3, 4, 5, 7, 7, 9, 11, 12
Manufacturing: 0, 1, 3, 3, 4, 5, 7, 7
Calculating some of the basic statistics yields the following:

                M     s      SEM    SEd    MG
Marketing:      7.25  3.240  1.146  1.458  5.50
Manufacturing:  3.75  2.550  0.901
First, the t-test:

t = (M1 − M2)/SEd = (7.25 − 3.75)/1.458 = 2.401; t.05(14) = 2.145
The difference is significant. Those in marketing (M1) take
significantly more work home
than those in manufacturing (M2).
Now the ANOVA:
• SStot = Σ(x − MG)² = 168
• Verify that the result of subtracting MG from each score in both groups, squaring the differences, and summing the squares = 168.
• SSbet = (Ma − MG)²na + (Mb − MG)²nb
• This one is not too lengthy to do here: (7.25 − 5.50)²(8) + (3.75 − 5.50)²(8) = 24.5 + 24.5 = 49.
• SSwith = Σ(xa − Ma)² + Σ(xb − Mb)²
• Verify that the result of subtracting the group means from each score in the particular group, squaring the differences, and summing the squares = 119.
• Check that SSwith + SSbet = SStot: 119 + 49 = 168.
Source SS df MS F Fcrit
Between 49 1 49 5.765 F.05(1,14) 5 4.60
Within 119 14 8.5
Total 168 15
Like the t-test, ANOVA indicates that the difference in the amount of work completed at home is significantly different for the two groups, so both tests draw the same conclusion about whether the result is significant, but there is more similarity than this.
• Note that the calculated value of t = 2.401, and the calculated value of F = 5.765.
• If the value of t is squared, it equals the value of F: 2.401² = 5.765.
• The same is true for the critical values:
t.05(14) = 2.145
F.05(1,14) = 4.60
2.145² = 4.60
Gosset’s and Fisher’s tests draw exactly equivalent conclusions
when there are two
groups. The ANOVA tends to be more work, and researchers
ordinarily use the t-test for
two groups, but the point is that the two tests are entirely
consistent.
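The longhand arithmetic above is easy to verify with a few lines of code. Here is a minimal sketch in plain Python (standard library only; the variable names are ours, not from the text), using the marketing and manufacturing data from this section:

```python
# Verify the two-group t-test and ANOVA longhand results, and that t^2 = F.
marketing = [3, 4, 5, 7, 7, 9, 11, 12]
manufacturing = [0, 1, 3, 3, 4, 5, 7, 7]

def mean(xs):
    return sum(xs) / len(xs)

m1, m2 = mean(marketing), mean(manufacturing)   # 7.25 and 3.75
grand = mean(marketing + manufacturing)         # MG = 5.50

# Sums of squares
ss_tot = sum((x - grand) ** 2 for x in marketing + manufacturing)  # 168
ss_bet = (m1 - grand) ** 2 * 8 + (m2 - grand) ** 2 * 8             # 49
ss_with = sum((x - m1) ** 2 for x in marketing) + \
          sum((x - m2) ** 2 for x in manufacturing)                # 119

# Mean squares and F (df between = 1, df within = 14)
ms_bet, ms_with = ss_bet / 1, ss_with / 14
f_stat = ms_bet / ms_with                       # 5.765

# Independent t-test with pooled variance; SEd = sqrt(sp2 * (1/n1 + 1/n2))
sp2 = ss_with / 14
se_d = (sp2 * (1 / 8 + 1 / 8)) ** 0.5           # 1.458
t_stat = (m1 - m2) / se_d                       # 2.401

# Eta-squared, the effect size discussed earlier in the chapter
eta_sq = ss_bet / ss_tot                        # about .29

print(round(t_stat, 3), round(f_stat, 3), round(t_stat ** 2, 3))
```

Running it confirms the decomposition SStot = SSbet + SSwith (168 = 49 + 119) and that t² equals F.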
6.6 Completing ANOVA with Excel
The ANOVA by longhand involves enough calculated means,
subtractions, squaring of differences, and so on that doing an
ANOVA on Excel is beneficial. A researcher
is comparing the level of optimism indicated by people in
different vocations during an
economic recession. The data is from laborers, clerical staff in
professional offices, and the
professionals in those offices. The data for the three groups
follows:
Laborers: 33, 35, 38, 39, 42, 44, 44, 47, 50, 52
Clerical staff: 27, 36, 37, 37, 39, 39, 41, 42, 45, 46
Professionals: 22, 24, 25, 27, 28, 28, 29, 31, 33, 34
1. Create the data file in Excel. Enter Laborers, Clerical staff,
and Professionals in
cells A1, B1, and C1, respectively.
2. In the columns below those labels, enter the optimism scores,
beginning in cell
A2 for the laborers, B2 for the clerical workers, and C2 for the
professionals. Once
the data is entered and checked for accuracy, proceed with the
following steps.
3. Click the Data tab at the top of the page.
Try It! (H): What is the relationship between the values of t and F if both are
performed for the same two-group test?
4. At the extreme right, choose Data Analysis.
5. In the Analysis Tools window, select ANOVA Single Factor
and click OK.
6. Indicate where the data is located in the Input Range. In the
example here, the
range is A2:C11.
7. Note that the default is “Grouped by Columns.” If the data is
arrayed along rows
instead of columns, this would need to be changed.
Because we designated A2 instead of A1 as the point where the
data begins, there is no
need to indicate that labels are in the first row.
8. Select Output Range and enter a cell location where you wish
the display of the
output to begin. In the example in Figure 6.6, the location is
A13.
9. Click OK.
Widen column A to make the output easier to read. It will look
like the screenshot in
Figure 6.6.
Figure 6.6: Performing an ANOVA on Excel
As you have already seen in the two Apply It! boxes, the results
appear in two tables.
The first provides descriptive statistics. The second table looks
like the longhand table of
results for the social isolation example, except that
• the figures shown for the total follow those for between and
within instead of
preceding them, and
• the P-value column indicates the probability that an F of this
magnitude could
have occurred by chance.
Note that the P value is 4.31E-06. The "E-06" is scientific notation, a shorthand way of
indicating that the actual value is p = .00000431, that is, 4.31 with the decimal point
moved six places to the left (4.31 × 10⁻⁶). This is far smaller than the p = .05 standard,
so the result easily meets the criterion for statistical significance.
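You can confirm how this scientific notation reads with a quick check (Python shown here; any language with floating-point literals behaves the same way):

```python
# "4.31E-06" is shorthand for 4.31 x 10^-6: the decimal moved six places left.
p_value = float("4.31E-06")
print(p_value)                # 4.31e-06
print(p_value == 0.00000431)  # True: same number, two notations
print(p_value < 0.05)         # True: comfortably significant
```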
6.7 Presenting Results
The previous analyses all used Excel, so we will now shift to
using SPSS for the execution of these steps and the
interpretation of the results. We will first use the
data in Table 6.7 and then proceed with actual data gathered
from published research.
You will see that we use the same steps regardless of the sample
size, and that using
technology like Excel and SPSS makes hand calculations
unnecessary. While hand cal-
culations are instructive, they are also laborious and more prone
to errors, especially
with large data sets.
SPSS Example 1: Steps for ANOVA
After setting up the data in SPSS as seen in Figure 6.7 (data from Table 6.7), the steps in
executing this analysis are as follows:

Analyze → Compare Means → One-Way ANOVA. Place Treatment into the Factor box
and Skin Condition into the Dependent List. Click Post Hoc on the left and check Tukey
and Games-Howell; then click Options and check Descriptive and Homogeneity of
variance test. Click Continue and OK. (Note that the three treatment groups in the data
set (Figure 6.7) are numerically coded: Pills = 1, Creams = 2, and Drops = 3.)
Figure 6.7: Data set in SPSS
Figure 6.8: SPSS output from trial of skin treatment conditions

Test of Homogeneity of Variances (SkinCondition)
Levene Statistic  df1  df2  Sig.
1.822             2    21   .186

ANOVA (SkinCondition)
Source          Sum of Squares  df  Mean Square  F      Sig.
Between Groups  14.083          2   7.042        1.724  .203
Within Groups   85.750          21  4.083
Total           99.833          23

Descriptives (SkinCondition)
Group   N   Mean   Std. Deviation  Std. Error  95% CI for Mean  Minimum  Maximum
Pills   8   16.50  1.773           .627        15.02 to 17.98   13       18
Creams  8   14.88  1.458           .515        13.66 to 16.09   13       17
Drops   8   14.88  2.642           .934        12.67 to 17.08   12       19
Total   24  15.42  2.083           .425        14.54 to 16.30   12       19

Multiple Comparisons (Dependent Variable: SkinCondition; Tukey HSD and Games-Howell)
The pairwise mean differences are Pills − Creams = 1.625, Pills − Drops = 1.625, and
Creams − Drops = .000. None of the comparisons is significant under either Tukey HSD
or Games-Howell; the smallest Sig. value in either panel is .149.
As seen in the SPSS output (Figure 6.8), the ANOVA results are the same as when
executed in Excel earlier in the chapter. Here SPSS allows execution of the ANOVA along
with descriptive statistics, tests of homogeneity of variance, post hoc tests, and a line
graph, all simultaneously, using the SPSS steps outlined earlier. The results begin with
the Descriptives table, where you can see that each group has an even number of partici-
pants (n = 8). Here you can see differences in the means, with Pills (M = 16.50) highest of
the three treatments. The Test of Homogeneity of Variance shows a favorable result in that
it is not significant (p > .05), specifically p = .186. This indicates that there is no significant
difference in the variance of the three treatments, indicating equal variances. As you will
recall from earlier chapters, if there is inequality of variance across groups, an adjustment
is needed to compare groups. Next, the ANOVA table shows a nonsignificant F statistic,
p = .203. At this stage, since F is not significant, we do not need to interpret the post hoc
tests, as there will be no significant differences between groups. As noted earlier in the
chapter, this is a debatable topic: with the ease of running post hoc tests, the analyst can
easily look at their results regardless of the F statistic. Findings may occasionally indicate
a significant difference between two groups even with a nonsignificant F, but such a
result is rare, and you can clearly see from the example that none of the post hoc
comparisons is significant.
SPSS Example 2: Steps for ANOVA
Using public data about higher education and housing from Pew Research (2010),
Social and Demographic Trends, the steps in executing this analysis are as follows:

Analyze → Compare Means → One-Way ANOVA. Place schl (currently enrolled in school)
into the Factor box and age into the Dependent List. Click Post Hoc on the left and check
Tukey and Games-Howell; then click Options and check Descriptive, Homogeneity of
variance test, and Means plot. Click Continue and OK.
Figure 6.9: SPSS output from Pew research social and demographic trends (2010)
education data set

Test of Homogeneity of Variances (AGE: What is your age?)
Levene Statistic  df1  df2   Sig.
44.884            5    1692  .000

ANOVA (AGE: What is your age?)
Source          Sum of Squares  df    Mean Square  F       Sig.
Between Groups  72748.597       5     14549.719    90.395  .000
Within Groups   272338.706      1692  160.957
Total           345087.303      1697

Descriptives (AGE: What is your age?)
Group                                                 N     Mean   Std. Deviation  Std. Error  95% CI for Mean  Minimum  Maximum
Yes, in High School                                   33    19.64  4.801           .836        17.93 to 21.34   18       43
Yes, in Technical, trade, or vocational school        33    36.03  15.503          2.699       30.53 to 41.53   18       64
Yes, in College (undergraduate, incl. 2-year)         212   24.81  8.433           .579        23.66 to 25.95   18       58
Yes, in Graduate School                               81    31.38  10.692          1.188       29.02 to 33.75   18       64
No                                                    1336  42.05  13.399          .367        41.33 to 42.77   18       64
Don't know/Refused (VOL.)                             3     29.33  6.429           3.712       13.36 to 45.30   22       34
Total                                                 1698  38.82  14.260          .346        38.14 to 39.49   18       64
Figure 6.9: SPSS output from Pew research social and demographic trends (2010)
education data set (continued)

Multiple Comparisons (Dependent Variable: AGE What is your age?; Tukey HSD)
The table lists each pairwise mean difference with its standard error, significance, and
95% confidence interval. The significant (*) comparisons under Tukey HSD are: No vs.
Yes, in High School (mean difference 22.417*), No vs. Yes, in College (17.247*), No vs.
Yes, in Graduate School (10.670*), Yes, in High School vs. Yes, in Technical, trade, or
vocational school (−16.394*), Yes, in High School vs. Yes, in Graduate School (−11.746*),
Yes, in Technical, trade, or vocational school vs. Yes, in College (11.224*), and Yes, in
Graduate School vs. Yes, in College (6.576*). Yes, in High School vs. Yes, in College
(−5.170, p = .249) and No vs. Yes, in Technical, trade, or vocational school (6.023,
p = .077) are not significant under Tukey HSD, and no comparison involving the
Don't know/Refused (VOL.) group is significant.

* The mean difference is significant at the 0.05 level.
Figure 6.9: SPSS output from Pew research social and demographic trends (2010)
education data set (continued)

Source: Data from Pew Research: Social and Demographic Trends. (2011). Higher Education/Housing. Retrieved from
http://www.pewsocialtrends.org/category/datasets/.
Multiple Comparisons (Dependent Variable: AGE What is your age?; Games-Howell)
Under Games-Howell, which does not assume equal variances, the same comparisons are
significant as under Tukey HSD, with one addition: Yes, in High School vs. Yes, in College
is now significant (−5.170*, p = .000). The comparisons that remain nonsignificant are
Yes, in Technical, trade, or vocational school vs. Yes, in Graduate School (4.648, p = .618),
Yes, in Technical, trade, or vocational school vs. No (6.023, p = .260), and every
comparison involving the Don't know/Refused (VOL.) group.

* The mean difference is significant at the 0.05 level.
Figure 6.10: SPSS output graph from Pew research social and demographic trends (2010)
education data set

Source: Data from Pew Research: Social and Demographic Trends. (2011). Higher Education/Housing. Retrieved from
http://www.pewsocialtrends.org/category/datasets/.
The Descriptives table in Figure 6.9 shows that the groups have unequal numbers of
participants, with the No (not in school) group having the most participants (n = 1,336)
and the highest mean age (M = 42.05). The Test of Homogeneity of Variance shows an
unfavorable result in that it is significant (p < .05). This indicates that there is a significant
difference in the variance of the six education groups, indicating unequal variances (or
heterogeneity of variance). Next, the ANOVA table indicates a significant F statistic
(p < .05). To determine which of the group comparisons are significant using a post hoc
test when there is a violation of homogeneity, equal variances will not be assumed.
Therefore, we will interpret the equal-variances-not-assumed post hoc test, which is
Games-Howell. Here, the Don't Know/Refused group does not differ significantly from
any of the other education groups. You can also see significant differences between
several groups, such as Yes, in High School and Yes, in Technical, trade, or vocational
school. All comparisons can be made in a similar manner based on the significance value
in the Multiple Comparisons table. The line graph, or means plot (Figure 6.10), shows the
mean age of each group, with the No group having the highest mean age and the Yes, in
High School group having the lowest.
6.8 Interpreting Results
Though you should refer to the most recent edition of the APA
manual for specific detail on formatting statistics, the following
may be used as a quick guide in presenting the
statistics covered in this chapter.
Table 6.8: Guide to APA formatting of F statistic results

Abbreviation or Term  Description
F                     F test statistic score
η²                    Eta-squared: an effect size
ω²                    Omega-squared: an effect size
HSD                   Honestly significant difference: a Tukey's post hoc test
SS                    Sum of squares
MS                    Mean square

Source: Publication Manual of the American Psychological Association, 6th edition. © 2009 American Psychological Association,
pp. 119–122.

Note that all of the statistical symbols in Table 6.8 are italicized, while HSD is not.
The following are some examples of how to present results using these abbreviations,
though you may use different combinations of results.
Using the data from the SPSS examples above (Figures 6.8 through 6.10), we could
present the results in the following way:
• The overall difference in skin condition across treatments was not significant,
F(2, 21) = 1.724, p = .203. (Note that the df listed are those from the between-
and within-groups lines in the ANOVA table.)
• The overall difference in age across school-enrollment groups was significant,
F(5, 1692) = 90.39, p < .05.
• The No [school] group was significantly older (M = 42.05, SD = 13.39) than the
Yes, in High School group (M = 19.64, SD = 4.80), the Yes, in College. . . group
(M = 24.81, SD = 8.43), and the Yes, in Graduate School group (M = 31.38,
SD = 10.69), whereas there were no significant differences with the Yes, in Tech-
nical, trade. . . group (M = 36.03, SD = 15.50) and the Don't Know/Refused
group (M = 29.33, SD = 6.43).
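If you report many such results, the APA-style string can be assembled programmatically. Here is a small helper sketch in Python; the function name and the exact formatting choices (for example, retaining the leading zero in p values, which strict APA style omits) are ours, not prescribed by the text:

```python
def apa_f(df_between, df_within, f_stat, p):
    """Format an F result in APA-like style, e.g., F(2, 21) = 1.72, p = 0.203."""
    if p < .05:
        p_text = "p < .05"
    else:
        p_text = "p = " + format(p, ".3f")
    return "F({}, {}) = {:.2f}, {}".format(df_between, df_within, f_stat, p_text)

# The two ANOVA results reported above:
print(apa_f(2, 21, 1.724, 0.203))      # F(2, 21) = 1.72, p = 0.203
print(apa_f(5, 1692, 90.39, 4.31e-6))  # F(5, 1692) = 90.39, p < .05
```

Consult the most recent APA manual for the exact typographic rules (italics, leading zeros, and exact p values).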
6.9 Nonparametric Test: Kruskal-Wallis H-test
The one-way ANOVA's nonparametric equivalent is the Kruskal-Wallis H-test, also
known as the Kruskal-Wallis ANOVA. Like the Mann-Whitney U-test, the Kruskal-Wallis
H-test is based on ranked (ordinal) data. It is used as an alternative to its parametric
counterpart when violations of assumptions have occurred. In fact, Kruskal was
not a proponent of significance testing, as Bradburn (2007)
has quoted him as saying, “I am thinking these days about
the many senses in which relative importance gets consid-
ered. Of these senses, some seem reasonable and others not
so. Statistical significance is low on my ordering.” That said,
his derived equivalent of a parametric technique is very
apropos.
As in the Mann-Whitney U-test, the ranks within each group are determined and then
summed. The H statistic is calculated from the squared rank sums, each divided by its
respective sample size.

H = [12 / (N(N + 1))] Σ(Tg² / ng) − 3(N + 1)    (Formula 6.7)

Where
N = total sample size
Tg = sum of ranks for group g
ng = sample size of group g
To illustrate the calculation of the H-test, we will use the same data from Table 6.7 with a
few modifications, as seen in Table 6.9. The first step is to rank all the values across
treatments, with 1 being the lowest rank. If there are tied values, an average of their ranks
is taken. For instance, in the Pills column, the two values of 12 have initial ranks of 1 and
2; their average is 1.5, as seen in the Rank column. The same is true for the values of 13,
where there are four ranks with an average rank of 4.5, and so on with the other ties. Once
all of these are complete, the ranks are summed, as seen in the last row of the table.
Table 6.9: Data from trial of skin treatment conditions
Pills  Initial Rank  Rank    Cream  Initial Rank  Rank    Drops  Initial Rank  Rank
14 7 7.5 18 21 21.5 13 3 4.5
13 4 4.5 15 10 10.5 15 11 10.5
19 24 24 16 14 14.5 16 15 14.5
18 20 21.5 18 22 21.5 15 12 10.5
15 9 10.5 17 17 18 14 8 7.5
16 13 14.5 13 5 4.5 17 19 18
12 1 1.5 17 18 18 13 6 4.5
12 2 1.5 18 23 21.5 16 16 14.5
Sum of ranks: 85.5 (Pills), 130 (Cream), 84.5 (Drops)
Try It! There are several websites that will help in these calculations. One well-used
statistical calculator for various analyses, such as the Kruskal-Wallis H-test, is available
at the VassarStats website: http://vassarstats.net/index.html. Use the data provided in
this chapter section to see if you get the same results.
Next, each of the summed ranks is squared and divided by its respective sample size,
completing Formula 6.7:

H = [12 / (24(24 + 1))] [(85.5)²/8 + (130)²/8 + (84.5)²/8] − 3(24 + 1)
H = 0.02 [7,310.25/8 + 16,900/8 + 7,140.25/8] − 75
H = 0.02 (913.78 + 2,112.50 + 892.53) − 75
H = 0.02 (3,918.81) − 75
H = 3.38
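The ranking and Formula 6.7 arithmetic can be reproduced with a short script. Here is a stdlib-only Python sketch using the Table 6.9 scores; the tie handling (averaging the ranks of tied values) follows the procedure described above, and the variable names are ours:

```python
# Kruskal-Wallis H by hand: rank all scores together (ties get the average rank),
# sum the ranks per group, then apply Formula 6.7.
pills = [14, 13, 19, 18, 15, 16, 12, 12]
cream = [18, 15, 16, 18, 17, 13, 17, 18]
drops = [13, 15, 16, 15, 14, 17, 13, 16]
groups = [pills, cream, drops]

# Average rank for each distinct value across the pooled, sorted data
pooled = sorted(x for g in groups for x in g)
rank_of = {}
for value in set(pooled):
    positions = [i + 1 for i, v in enumerate(pooled) if v == value]
    rank_of[value] = sum(positions) / len(positions)

rank_sums = [sum(rank_of[x] for x in g) for g in groups]  # 85.5, 130.0, 84.5
n_total = sum(len(g) for g in groups)                     # N = 24

h = (12 / (n_total * (n_total + 1))) * \
    sum(t ** 2 / len(g) for t, g in zip(rank_sums, groups)) - 3 * (n_total + 1)
print(rank_sums, round(h, 2))
```

The rank sums match the last row of Table 6.9 (85.5, 130, 84.5), and H is evaluated directly from Formula 6.7.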
The H statistic approximates a chi-square (χ²) distribution, which will be discussed in
Chapter 11, based on k − 1 degrees of freedom, where k is the number of comparison
groups. Here, df = k − 1 = 3 − 1 = 2. Therefore, using the chi-square distribution table in
Table 6.10, χ²critical = 5.991 at the α = .05 level. Our χ²observed value of 3.38 is less than
this χ²critical = 5.991, meaning that there is no significant difference between groups.
Given the nonsignificant ANOVA conducted on these data earlier in the chapter, a
nonsignificant outcome was expected. Nonparametric tests are more conservative than
parametric ones in that there is a lower probability of finding a significant outcome
compared to the parametric counterpart. This also leads to a lower probability of a
Type I error.
Table 6.10: Chi-square distribution
Area to the right of critical value
Degrees of
freedom
0.99 0.975 0.95 0.90 0.10 0.05 0.025 0.01
1 — 0.001 0.004 0.016 2.706 3.841 5.024 6.635
2 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210
3 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345
4 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277
5 0.554 0.831 1.145 1.610 9.236 11.071 12.833 15.086
6 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812
7 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475
8 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090
9 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666
10 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209
11 3.053 3.816 4.575 5.578 17.275 19.675 21.920 24.725
12 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217
13 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688
14 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141
15 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578
16 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000
17 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409
18 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805
19 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191
20 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566
21 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932
22 9.542 10.982 12.338 14.042 30.813 33.924 36.781 40.289
23 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638
24 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980
25 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314
26 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642
27 12.879 14.573 16.151 18.114 36.741 40.113 43.194 46.963
28 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278
29 14.257 16.047 17.708 19.768 39.087 42.557 45.722 49.588
30 14.954 16.791 18.493 20.599 40.256 43.773 46.979 50.892
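The decision rule (compare H against the chi-square critical value on k − 1 degrees of freedom) can be written compactly. Here is a sketch with a few α = .05 critical values transcribed from Table 6.10; the function and dictionary names are ours:

```python
# Critical chi-square values at alpha = .05, keyed by degrees of freedom
# (transcribed from the .05 column of Table 6.10).
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.071}

def kruskal_wallis_decision(h, k):
    """Compare H against the chi-square critical value on k - 1 df."""
    critical = CHI2_CRIT_05[k - 1]
    return "significant" if h > critical else "not significant"

print(kruskal_wallis_decision(3.38, 3))    # skin-treatment data, H from Formula 6.7
print(kruskal_wallis_decision(17.166, 3))  # optimism data, chi-square from SPSS
```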
As you will see in the next section, when this analysis is performed in SPSS, a χ² value is
given rather than an H value per se.
SPSS Steps for the Kruskal-Wallis H-test

Reexamining the data set used in Figure 6.6, but rearranging the data as depicted in
Figure 6.11, the employee groups (Position) are categorically coded with 1 = Laborers,
2 = Clerical, and 3 = Professional. To execute, go to Analyze → Nonparametric Tests →
Legacy Dialogs → K Independent Samples. As shown in Figure 6.12, input Optimism
(DV) into the Test Variable List box and Position (IV) into the Grouping Variable box,
then click the Define Range button just below to input the range of codes for the Position
variable; this will be 1 and 3 for the minimum and maximum codes, respectively.
Then click OK.
Figure 6.11: Data set in SPSS
Figure 6.12: The Kruskal-Wallis H-test steps in SPSS
Interpreting Results
The output in Figure 6.13 shows the results of the Kruskal-Wallis H-test. The χ² value in
the Test Statistics table is significant, KW χ²(2) = 17.17, p < .05, so there is an overall
statistical difference in optimism among the three employee groups. This can be seen in
the Ranks table, where the Laborers' mean rank (MR = 21.80) is the highest and the
Professionals' (MR = 6.30) is the lowest. Post hoc tests are not readily available as they
were for the ANOVA, so follow-up Mann-Whitney U or Wilcoxon rank-sum tests of all
possible combinations will have to be performed (see Chapter 5 for these procedures).
The conclusion to these results would read as follows:

Based on the Kruskal-Wallis H-test there is a significant difference in the
level of optimism of the three groups (KW χ²(2) = 17.17, p < .05). Laborers
reported the highest level of optimism (MR = 21.80), followed by Clerical
positions (MR = 18.40), and then Professionals (MR = 6.30), which
reported the lowest level of optimism.
Figure 6.13: The Kruskal-Wallis H-test output

Ranks (Optimism)
Position       N   Mean Rank
Laborers       10  21.80
Clerical       10  18.40
Professionals  10  6.30
Total          30

Test Statistics(a,b) (Optimism)
Chi-Square   17.166
df           2
Asymp. Sig.  .000
a. Kruskal Wallis Test
b. Grouping Variable: Position
Summary
This chapter is the natural extension of Chapters 4 and 5. Like
the z- and t-tests, analysis
of variance is a test of significant differences. Also like the z-
and t-tests, the IV in ANOVA
is nominal and the DV is interval or ratio. With each
procedure—whether z, t, or F—the
test statistic is a ratio of the differences between groups to the
differences within groups
(Objective 3).
There are differences between ANOVA and the earlier
procedures, of course. The vari-
ance statistics are sums of squares and mean squares values. But
perhaps the most impor-
tant difference is that ANOVA can accommodate any number of
groups (Objectives 2 and
3). Remember that trying to deal with multiple groups in a t-test
introduces the problem
of mounting type I error when repeated analyses with the same
data indicate statistical
significance. One-way ANOVA lifts the limitation of a one-
pair-at-a-time comparison
(Objective 1).
The other side of multiple comparisons, however, is the
difficulty of determining which
comparisons are statistically significant when F is significant.
This problem is solved
with the post hoc test. In this chapter, we used Tukey’s HSD
(Objective 4). There are
other post hoc tests, each having their strengths and drawbacks,
but HSD is one of the
most widely used.
Years ago, the emphasis in the scholarly literature was on
whether a result was statisti-
cally significant. Today, the focus is on measuring the effect
size of a significant result, a
statistic that in the case of analysis of variance can indicate how
much of the variability
in the dependent variable can be attributed to the effect of the
independent variable. We
answered that question with eta-squared (η²). But neither the post hoc test nor eta-squared
post hoc test nor eta-squared
is relevant if the F is not significant (Objective 5). Then, further
ANOVAs were executed in
SPSS, and the results were presented (Objective 6) in APA
format and interpreted accord-
ingly (Objective 7). Finally, the nonparametric equivalent of
ANOVA, Kruskal-Wallis
H-test, was discussed as an alternative method and compared to
its parametric equivalent,
the ANOVA. The same data set was used to compare outcomes.
In addition, an appropri-
ate example in SPSS was provided (Objective 8).
The independent t-test and the one-way ANOVA both require
that groups be indepen-
dent. What if they are not? What if we wish to measure one
group twice over time, or
perhaps more than twice? Such dependent-groups procedures are
the focus of Chapter 7.
Rather than different thinking, it is more of an elaboration of
familiar concepts. For this
reason, consider reviewing Chapter 5 and the independent t-test
discussion before start-
ing Chapter 7.
The one-way ANOVA dramatically broadens the kinds of
questions the researcher can
ask. The procedures in Chapter 7 for nonindependent groups
represent the next incre-
mental step.
Key Terms
analysis of variance Fisher’s test that
allows one to detect significant differences
among any number of groups. The acro-
nym is ANOVA.
error variance The variability in a measure
unrelated to the variables being analyzed.
eta-squared A measure of effect size for
ANOVA. It estimates the amount of vari-
ability in the DV explained by the IV.
F ratio The test statistic calculated in an
analysis of variance problem. It is the ratio
of the variance between the groups to the
variance within the groups.
factor Refers to an IV, particularly in pro-
cedures that involve more than one.
family-wise error An inflated type I error
rate in hypothesis testing when doing mul-
tiple tests with the assumption of different
sets of data. Specifically, when comparing
multiple groups in dyad combinations
using a series of t-tests instead of executing
one omnibus ANOVA.
homogeneity of variance When multiple
groups of data are distributed similarly.
mean square The sum of squares divided
by its degrees of freedom. This division
allows the mean square to reflect a mean, or
average, amount of variability from a source.
omnibus test A test of the overall sig-
nificance of the model based on difference
between sample means when there are
more than two groups to compare. The test
will not tell you which two means are sig-
nificantly different, which is why follow-up
post hoc comparisons are executed.
one-way ANOVA The ANOVA in its sim-
plest form, this model has only one inde-
pendent variable.
post hoc test A test conducted after a sig-
nificant ANOVA or some similar test that
identifies which among multiple possibili-
ties is statistically significant.
sum of squares (SS) The variance measure
in analysis of variance. They are literally
the sum of squared deviations between a
set of scores and their mean.
sum of squares between The variability
related to the independent variable and any
measurement error that may occur.
sum of squares total Total variance from
all sources.
sum of squares within Variability stem-
ming from different responses from indi-
viduals in the same group. It is exclusively
error variance. Is also referred to as the sum
of squares error or the sum of squares residual.
Chapter Exercises
Answers to Try It! Questions
The answers to all Try It! questions introduced in this chapter
are provided below.
A. The “one” in one-way ANOVA refers to the fact that this test
accommodates just
one independent variable.
B. There is no gender variable in the analysis and consequently,
gender-related
variance emerges as error variance. The same would be true for
any variability
in scores stemming from any variable not being analyzed in the
study.
C. It would take 15 comparisons! The answer is the number of groups (6)
times the number of groups minus 1 (5), with the product divided by 2:
(6 × 5)/2 = 30/2 = 15.
D. The only way SS values can be negative is if there has been a
calculation error.
Because the values are all squared values, if they have any
value other than 0,
they have to be positive.
E. The difference between SStot and SSwith is the SSbet.
F. If F = 4 and MSwith = 2, then MSbet = 8, because F = MSbet ÷ MSwith.
G. The answer is neither. If F is not significant, there is no
question of which group
is significantly different from which other group because any
variability may be
nothing more than sampling variability. By the same token,
there is no effect to
calculate because, as far as we know, the IV does not have any
effect on the DV.
H. F = t²
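Answers C, F and H above are all small arithmetic facts, and they can be checked directly. The sketch below, with invented scores, verifies that a two-group one-way ANOVA yields an F equal to the squared independent-samples t, and that k groups generate k(k − 1)/2 pairwise comparisons.

```python
import math
from itertools import combinations

# Invented scores for two groups, used only to illustrate the relationship F = t^2.
a, b = [1, 2, 3], [3, 4, 5]

mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)

# Independent-samples t with pooled variance.
pooled = ((len(a) - 1) * var_a + (len(b) - 1) * var_b) / (len(a) + len(b) - 2)
t = (mean_a - mean_b) / math.sqrt(pooled * (1 / len(a) + 1 / len(b)))

# One-way ANOVA F for the same two groups.
grand = sum(a + b) / (len(a) + len(b))
ss_bet = len(a) * (mean_a - grand) ** 2 + len(b) * (mean_b - grand) ** 2
ss_with = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
f = (ss_bet / 1) / (ss_with / (len(a) + len(b) - 2))

print(round(t ** 2, 10), round(f, 10))  # the two values match: F = t^2

# Answer C's counting rule: with k groups there are k(k - 1)/2 pairwise comparisons.
k = 6
assert k * (k - 1) // 2 == len(list(combinations(range(k), 2)))  # 15 pairs for k = 6
```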
Review Questions
The answers to the odd-numbered items can be found in the
answers appendix.
1. Several people selected at random are given a story problem
to solve. They take
3.5, 3.8, 4.2, 4.5, 4.7, 5.3, 6.0, and 7.5 minutes. What is the
total sum of squares for
this data?
2. Identify the following symbols and statistics in a one-way
ANOVA:
a. The statistic that indicates the mean amount of difference
between groups.
b. The symbol that indicates the total number of participants.
c. The symbol that indicates the number of groups.
d. The mean amount of uncontrolled variability.
3. The theory is that there are differences by gender in
manifested aggression. With
data from Measuring Expressed Aggression Numbers (MEAN),
a researcher has
the following:
Males: 13, 14, 16, 16, 17, 18, 18, 18
Females: 11, 12, 12, 14, 14, 14, 14, 16
Complete the problem as an ANOVA. Is the difference
statistically significant?
4. Complete Exercise 3 as an independent t-test and demonstrate
the relationship
between t2 and F.
5. Even with a significant F, there is never a need for a post hoc
in a two-group
ANOVA. Why?
6. A researcher completes an ANOVA in which the number of years of education completed is analyzed by ethnic group. If η² = .36, how should that be interpreted?
7. Three groups of clients involved in a program for substance
abuse attend weekly
sessions for 8, 12, and 16 weeks. The DV is the number of days
drug free.
8 weeks: 0, 5, 7, 8, 8
12 weeks: 3, 5, 12, 16, 17
16 weeks: 11, 15, 16, 19, 22
a. Is F significant?
b. What is the location of the significant difference?
c. What does the effect size indicate?
8. Regarding Exercise 7,
a. what is the IV?
b. what is the scale of the IV?
c. what is the DV?
d. what is the scale of the DV?
9. For an ANOVA problem, k = 4 and n = 8.
If SSbet = 24.0
and SSwith = 72,
a. what is F?
b. is the result significant?
10. Consider this partially completed ANOVA table:

Source    SS    df    MS    F    Fcrit
Total     94
Between          2
Within    63           3

a. What must be the value of N − k?
b. What must be the value of k?
c. What must be the value of N?
d. What must SSbet be?
e. Determine MSbet.
f. Determine F.
g. What is Fcrit?
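A partially completed ANOVA table like the one in Exercise 10 is filled in from the bookkeeping identities SStot = SSbet + SSwith, dftot = N − 1 = dfbet + dfwith, MS = SS/df, and F = MSbet/MSwith. The sketch below walks through those identities with made-up starting values, deliberately not the exercise's numbers.

```python
# Filling in a partially completed ANOVA table from the identities
#   SS_tot = SS_bet + SS_with,  df_tot = df_bet + df_with = N - 1,
#   MS = SS / df,  F = MS_bet / MS_with.
# The starting values below are made up; they are not the exercise's numbers.

ss_total = 120.0   # given
df_between = 2     # given: k - 1, so k = 3 groups
ss_within = 90.0   # given
ms_within = 6.0    # given

df_within = ss_within / ms_within      # df = SS / MS, so N - k = 15
n_total = df_within + df_between + 1   # N = df_tot + 1 = 18
ss_between = ss_total - ss_within      # 120 - 90 = 30.0
ms_between = ss_between / df_between   # 30 / 2 = 15.0
f = ms_between / ms_within             # 15 / 6 = 2.5

print(df_within, n_total, ss_between, ms_between, f)
```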
Analyzing the Research
Review the article abstracts provided below. You can then
access the full articles via your
university’s online library portal to answer the critical thinking
questions. Answers can be
found in the answers appendix.
Using ANOVA for an Emotions Study
Carolan, L. A., & Power, M. J. (2011). What basic emotions are
experienced in bipolar
disorder? Clinical Psychology & Psychotherapy, 18(5), 366–
378.
Article Abstract
Aims: The aims of this study were to investigate the basic
emotions experienced within
and between episodes of bipolar disorder and, more specifically,
to test the predictions
made by the Schematic, Propositional, Analogical and
Associative Representation Sys-
tems (SPAARS) model that mania is predominantly
characterized by the coupling of
happiness with anger whereas depression (unipolar and bipolar)
primarily comprises a
coupling between sadness and disgust.
Design: A cross-sectional design was employed to examine the differences within and between the bipolar, unipolar and control groups in the emotional profiles. Data were analyzed using one-way ANOVAs.
Method: Psychiatric diagnoses in the clinical groups were
confirmed using the Structured
Clinical Interview for DSM-IV (SCID). It was not administered
in the control group. Cur-
rent mood state was measured using the Beck Depression
Inventory-II, the State–Trait
Anxiety Inventory and the Bech–Rafaelsen Mania Scale. The
Basic Emotions Scale was
used to explore the emotional profiles.
Results: The results confirmed the predictions made by the SPAARS model about emotions in mania and depression. Outwith these episodes, individuals with bipolar disorder experienced elevated levels of disgust.
Discussion: Evidence was found in support of the proposal of
SPAARS that there are five
basic emotions, which form the basis for both normal emotional
experience and emotional
disorders. Disgust is an important feature of bipolar disorder.
Strengths and limitations
are discussed, and suggestions for future research are explored.
Critical Thinking Questions
1. Why does this study use a one-way ANOVA instead of a t-
test?
2. What means are being compared in the bipolar group in this
study?
3. According to the following ANOVA results between bipolar and unipolar groups, which result(s) showed significance?
F(1, 46) = 0.00; p = .93
F(1, 19.22) = 9.81; p = .005
F(1, 45) = 1.26; p = .26
F(1, 44) = 0.02; p = .87
F(1, 45) = 0.13; p = .71
4. What types of post hoc test did the paper use as a follow-up
to the F statistic?
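Question 3 turns on a single rule: a result is conventionally called significant when its p-value falls below α = .05. A one-line screen over the reported p-values, copied from the question, makes the rule concrete.

```python
# Screen the reported p-values from question 3 against the conventional alpha = .05.
alpha = 0.05
results = {
    "F(1, 46) = 0.00": 0.93,
    "F(1, 19.22) = 9.81": 0.005,
    "F(1, 45) = 1.26": 0.26,
    "F(1, 44) = 0.02": 0.87,
    "F(1, 45) = 0.13": 0.71,
}
significant = [name for name, p in results.items() if p < alpha]
print(significant)  # only the test with p = .005 clears the threshold
```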
Using ANOVA for a Health and Physical Activity Study
Bize, R., & Plotnikoff, R. C. (2009). The relationship between a
short measure of health
status and physical activity in a workplace population.
Psychology, Health & Medi-
cine, 14(1), 53–61.
Article Abstract
Many interventions promoting physical activity (PA) are
effective in preventing disease
onset, and although studies have found a positive relationship
between health-related
quality of life (HRQL) and PA, most of these studies have
focused on older adults and
those with chronic conditions. Less is known regarding the
association between PA level
and HRQL among healthy adults. Our objective was to analyse
the relationship between
PA level and HRQL among a sample of 573 employees aged 20–
68 taking part in a work-
place intervention to promote PA. Measures included HRQL
(using a single item) and
PA (i.e., Godin Leisure-Time Questionnaire). The Modified
Canadian Aerobic Fitness
Test (MCAFT) was also completed by 10% of the employees.
MET-minute scores (assess-
ing energy expenditure over one week) were compared across
HRQL categories using
ANOVA. A multiple linear regression analysis was conducted to
further examine the rela-
tionship between HRQL and PA, controlling for potential
covariates. Participants in the
higher health status categories were found to report higher
levels of energy expenditure
(one-way ANOVA, p < 0.001). In the multiple linear regression model, each unit increase in health status level translated into a mean increase of 356 MET-minutes in energy expenditure (p < 0.001). This single-item assessment of health status
explained six percent of
the variance in energy expenditure. The study concludes that
higher energy expenditure
through PA among an adult workplace population is positively
associated with increased
health status, and it also suggests that a single-item HRQL measure is suitable for community- and population-based studies, reducing response burden and research costs.
Critical Thinking Questions
1. Why did this study execute a Kruskal-Wallis H-test?
2. It was stated that the higher health status categories reported
higher mean energy
expenditure of the one-way ANOVA, and the Kruskal-Wallis
yielded similar
results. To make this plausible, what would the significance
level of the Kruskal-
Wallis have been?
3. After evaluating figure 1, we can see there is a difference in
higher health status
and higher energy expenditure. From this information, should
they have run a
post hoc test? Why or why not?
Research Questions for Week One
Background
During this week you will brainstorm a list of research
questions you are interested in, which will help you work
towards your Week 1 Assignment. You are working towards
creating a list of at least 10 unique research questions that
encompass a variety of topics and types of variables. Think
about exploring relationships between variables, making
predictions for one variable using one or more other variables,
and determining differences between groups across one or two
variables. In future weeks, you will pull questions from this list
that might lend themselves to a particular statistical analysis,
thus saving valuable time in not needing to brainstorm research
ideas. During those weeks you will take the research question
and create a mini-research proposal that will help you consider
the application of a specific statistical analysis to that question.
Discussion Assignment Requirements
Initial Posting - To earn full participation points, include in
your initial posting at least 5 potential research questions by
Day 3. Have fun with these questions and choose topics you are
truly interested in, whether they are leadership, training, sports,
social media, politics, movies, or food. This will make the
research design process much more enjoyable. If you need help
coming up with ideas, ask your instructor for examples. Also,
feel free to post more than 5 research questions as it would be
useful to get feedback on as many questions as possible.
For each of the questions, provide the following:
· List the research question (be sure to phrase as a measurable
question)
· Identify the variables presented in the question
· Provide an operational definition for each variable
· Describe each variable’s scale of measurement (nominal,
ordinal, interval, or ratio) and characteristics (i.e., discrete vs.
continuous, numerical vs. categorical, etc.)
Replies - Though you may respond to your peers multiple times
during the week to provide support or feedback, students are
required to respond substantively to at least two of their
classmates’ postings by
ANSWER FOR DISCUSSION WEEK 1
Research discussion
Research question one: How does leadership style affect organizational performance?
In this research question, the independent variable is leadership style, while the dependent variable is organizational performance (Sukal, 2019). Leadership styles are techniques used by organizations to run their activities to achieve their objectives. Organizational performance entails the various achievements of an entity that accrue from its business operations. An ordinal scale of measurement can be used in this case.
Research question two: What are the effects of technology on students' performance?
In this case, technology is the independent variable while students' performance is the dependent variable. Technology in education is scientific knowledge used to improve the level of education (Sukal, 2019). Student performance refers to how students carry out their studies. An ordinal scale of measurement is appropriate to measure how technology affects students' performance.
Research question three: What are the effects of smoking on human health?
Smoking is the independent variable, while human health is the dependent variable. Smoking is the inhalation of tobacco products, while human health is the well-being of the human condition (Carruthers & Maggard, 2019). An ordinal scale of measurement is used in this case.
Research question four: What are the effects of training on employee performance?
Training is the independent variable, while employee performance is the dependent variable (Carruthers & Maggard, 2019). Training involves equipping employees with the knowledge to perform their duties appropriately. Employee performance is the output that accrues from different activities. An ordinal scale is used in this research question.
Research question five: How do management styles affect employee performance?
Management styles are the independent variable, while employee performance is the dependent variable (Carruthers & Maggard, 2019). Management styles are techniques used by management to run business activities, while employee performance is the output accrued from employees' actions. An ordinal scale is used in this research question.
References
Carruthers, M. W., & Maggard, M. (2019). Smart Lab: A statistics primer. San Diego, CA: Bridgepoint Education, Inc.
Sukal, M. (2019). Research methods: Applying statistics in research. San Diego, CA: Bridgepoint Education, Inc.
PROFESSOR'S RESPONSE:
Interesting questions!
Please be sure to include operational definitions of your DVs -
i.e. employee performance. How would you measure it? It
might be helpful to review the operational definition
announcement in the course. Remember, we need to include
enough detail about our methodology and variables so that
anyone could replicate our work.
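The professor's point about operational definitions can be made concrete with a small data structure: a replicable variable spells out its role, how it is concretely measured, and its scale. The sketch below is purely illustrative; the example DV and its measurement details are hypothetical, not part of the assignment.

```python
from dataclasses import dataclass

@dataclass
class Variable:
    """A study variable described with the detail a replication would need."""
    name: str
    role: str                    # "IV" or "DV"
    operational_definition: str  # how, concretely, the variable is measured
    scale: str                   # nominal, ordinal, interval, or ratio
    kind: str                    # categorical or numerical

# Hypothetical example: pinning down "employee performance" as the professor asks.
dv = Variable(
    name="employee performance",
    role="DV",
    operational_definition=(
        "Supervisor rating on a standardized 10-item review form, "
        "averaged to a 1-5 score for the most recent quarter"
    ),
    scale="ordinal",
    kind="numerical",
)
print(dv.name, dv.scale)
```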
Step-by-step guide to critiquing research. Part 1: quantitative research
Michael Coughlan, Patricia Cronin, Frances Ryan

Abstract
When caring for patients it is essential that nurses are using the current best practice. To determine what this is, nurses must be able to read research critically. But for many qualified and student nurses the terminology used in research can be difficult to understand, thus making critical reading even more daunting. It is imperative in nursing that care has its foundations in sound research and it is essential that all nurses have the ability to critically appraise research to identify what is best practice. This article is a step-by-step approach to critiquing quantitative research to help nurses demystify the process and decode the terminology.

Key words: Quantitative research methodologies • Review process • Research

For many qualified nurses and nursing students research is research, and it is often quite difficult to grasp what others are referring to when they discuss the limitations and or strengths within
a research study. Research texts and journals refer to critiquing the literature, critical analysis, reviewing the literature, evaluation and appraisal of the literature, which are in essence the same thing (Bassett and Bassett, 2003). Terminology in research can be confusing for the novice research reader where a term like 'random' refers to an organized manner of selecting items or participants, and the word 'significance' is applied to a degree of chance. Thus the aim of this article is to take a step-by-step approach to critiquing research in an attempt to help nurses demystify the process and decode the terminology.

When caring for patients it is essential that nurses are using the current best practice. To determine what this is nurses must be able to read research. The adage 'All that glitters is not gold' is also true in research. Not all research is of the same quality or of a high standard and therefore nurses should not take research at face value simply because it has been published (Cullum and Droogan, 1999; Polit and Beck, 2006).

Michael Coughlan, Patricia Cronin and Frances Ryan are Lecturers, School of Nursing and Midwifery, University of Dublin, Trinity College, Dublin. Accepted for publication: March 2007.

Critiquing is a systematic method of appraising the strengths and limitations of a piece of research in order to determine its credibility and/or its applicability to practice (Valente, 2003). Seeking only limitations in a study is criticism, and critiquing and criticism are not the same (Burns and Grove, 1997). A critique is an impersonal evaluation of the strengths and limitations of the research being reviewed and should not be seen as a disparagement
of the researcher's ability. Neither should it be regarded as a jousting match between the researcher and the reviewer. Burns and Grove (1999) call this an 'intellectual critique' in that it is not the creator but the creation that is being evaluated. The reviewer maintains objectivity throughout the critique. No personal views are expressed by the reviewer, and the strengths and/or limitations of the study and the implications of these are highlighted with reference to research texts or journals. It is also important to remember that research works within the realms of probability, where nothing is absolutely certain. It is therefore important to refer to the apparent strengths, limitations and findings of a piece of research (Burns and Grove, 1997). The use of personal pronouns is also avoided in order that an appearance of objectivity can be maintained.

Credibility and integrity
There are numerous tools available to help both novice and advanced reviewers to critique research studies (Tanner, 2003). These tools generally ask questions that can help the reviewer to determine the degree to which the steps in the research process were followed. However, some steps are more important than others and very few tools acknowledge this. Ryan-Wenger (1992) suggests that questions in a critiquing tool can be subdivided into those that are useful for getting a feel for the study being presented, which she calls 'credibility variables', and those that are essential for evaluating the research process, called 'integrity variables'. Credibility variables concentrate on how believable the work appears and focus on the researcher's qualifications and ability to undertake and accurately present the study. The answers to these questions are important when critiquing a piece of research as they can offer the reader an insight into what to expect in the remainder of the study. However, the reader should be aware that identified strengths
and limitations within this section will not necessarily correspond with what will be found in the rest of the work. Integrity questions, on the other hand, are interested in the robustness of the research method, seeking to identify how appropriately and accurately the researcher followed the steps in the research process.

658 British Journal of Nursing, 2007, Vol 16, No 11

RESEARCH METHODOLOGIES

Table 1. Research questions - guidelines for critiquing a quantitative research study

Elements influencing the believability of the research

Writing style: Is the report well written (concise, grammatically correct, avoiding the use of jargon)? Is it well laid out and organized?
Author: Do the researcher(s') qualifications/position indicate a degree of knowledge in this particular field?
Report title: Is the title clear, accurate and unambiguous?
Abstract: Does the abstract offer a clear overview of the study, including the research problem, sample, methodology, findings and recommendations?

Elements influencing the robustness of the research

Purpose/research problem: Is the purpose of the study/research problem clearly identified?
Logical consistency: Does the research report follow the steps of the research process in a logical manner? Do these steps naturally flow and are the links clear?
Literature review: Is the review logically organized? Does it offer a balanced critical analysis of the literature? Is the majority of the literature of recent origin? Is it mainly from primary sources and of an empirical nature?
Theoretical framework: Has a conceptual or theoretical framework been identified? Is the framework adequately described? Is the framework appropriate?
Aims/objectives/research question/hypotheses: Have aims and objectives, a research question or hypothesis been identified? If so, are they clearly stated? Do they reflect the information presented in the literature review?
Sample: Has the target population been clearly identified? How were the sample selected? Was it a probability or non-probability sample? Is it of adequate size? Are the inclusion/exclusion criteria clearly identified?
Ethical considerations: Were the participants fully informed about the nature of the research? Was the autonomy/confidentiality of the participants guaranteed? Were the participants protected from harm? Was ethical permission granted for the study?
Operational definitions: Are all the terms, theories and concepts mentioned in the study clearly defined?
Methodology: Is the research design clearly identified? Has the data gathering instrument been described? Is the instrument appropriate? How was it developed? Were reliability and validity testing undertaken and the results discussed? Was a pilot study undertaken?
Data analysis/results: What type of data and statistical analysis was undertaken? Was it appropriate? How many of the sample participated? What was the significance of the findings?
Discussion: Are the findings linked back to the literature review? If a hypothesis was identified, was it supported? Were the strengths and limitations of the study, including generalizability, discussed? Was a recommendation for further research made?
References: Were all the books, journals and other media alluded to in the study accurately referenced?

The answers to these questions
will help to identify the trustworthiness of the study and its applicability to nursing practice.

Critiquing the research steps
In critiquing the steps in the research process a number of questions need to be asked. However, these questions are seeking more than a simple 'yes' or 'no' answer. The questions are posed to stimulate the reviewer to consider the implications of what the researcher has done. Does the way a step has been applied appear to add to the strength of the study, or does it appear as a possible limitation to implementation of the study's findings? (Table 1).

Elements influencing believability of the study

Writing style
Research reports should be well written, grammatically correct, concise and well organized. The use of jargon should be avoided where possible. The style should be such that it attracts the reader to read on (Polit and Beck, 2006).

Author(s)
The author(s') qualifications and job title can be a useful indicator of the researcher(s') knowledge of the area under investigation and ability to ask the appropriate questions (Conkin Dale, 2005). Conversely, a research study should be evaluated on its own merits and not assumed to be valid and reliable simply based on the author(s') qualifications.

Report title
The title should be between 10 and 15 words long and should clearly identify for the reader the purpose of the study (Connell Meehan, 1999). Titles that are too long or too short can be confusing or misleading (Parahoo, 2006).
Abstract
The abstract should provide a succinct overview of the research and should include information regarding the purpose of the study, method, sample size and selection, the main findings and conclusions, and recommendations (Conkin Dale, 2005). From the abstract the reader should be able to determine if the study is of interest and whether or not to continue reading (Parahoo, 2006).

Elements influencing robustness

Purpose of the study/research problem
A research problem is often first presented to the reader in the introduction to the study (Bassett and Bassett, 2003). Depending on what is to be investigated some authors will refer to it as the purpose of the study. In either case the statement should at least broadly indicate to the reader what is to be studied (Polit and Beck, 2006). Broad problems are often multi-faceted and will need to become narrower and more focused before they can be researched. In this the literature review can play a major role (Parahoo, 2006).

Logical consistency
A research study needs to follow the steps in the process in a logical manner. There should also be a clear link between the steps, beginning with the purpose of the study and following through the literature review, the theoretical framework, the research question, the methodology section, the data analysis, and the findings (Ryan-Wenger, 1992).

Literature review
The primary purpose of the literature review is to define
or develop the research question while also identifying an appropriate method of data collection (Burns and Grove, 1997). It should also help to identify any gaps in the literature relating to the problem and to suggest how those gaps might be filled. The literature review should demonstrate an appropriate depth and breadth of reading around the topic in question. The majority of studies included should be of recent origin and ideally less than five years old. However, there may be exceptions to this, for example, in areas where there is a lack of research, or a seminal or all-important piece of work that is still relevant to current practice. It is important also that the review should include some historical as well as contemporary material in order to put the subject being studied into context. The depth of coverage will depend on the nature of the subject; for example, for a subject with a vast range of literature the review will need to concentrate on a very specific area (Carnwell, 1997).

Another important consideration is the type and source of literature presented. Primary empirical data from the original source is more favourable than a secondary source or anecdotal information where the author relies on personal evidence or opinion that is not founded on research. A good review usually begins with an introduction which identifies the key words used to conduct the search and information about which databases were used. The themes that emerged from the literature should then be presented and discussed (Carnwell, 1997). In presenting previous work it is important that the data is reviewed critically, highlighting both the strengths and limitations of the study. It should also be compared and contrasted with the findings of other studies (Burns and Grove, 1997).

Theoretical framework
Following the identification of the research problem
and the review of the literature, the researcher should present the theoretical framework (Bassett and Bassett, 2003). Theoretical frameworks are a concept that novice and experienced researchers find confusing. It is initially important to note that not all research studies use a defined theoretical framework (Robson, 2002). A theoretical framework can be a conceptual model that is used as a guide for the study (Conkin Dale, 2005) or themes from the literature that are conceptually mapped and used to set boundaries for the research (Miles and Huberman, 1994). A sound framework also identifies the various concepts being studied and the relationship between those concepts (Burns and Grove, 1997). Such relationships should have been identified in the literature. The research study should then build on this theory through empirical observation. Some theoretical frameworks may include a hypothesis. Theoretical frameworks tend to be better developed in experimental and quasi-experimental studies and are often poorly developed or non-existent in descriptive studies (Burns and Grove, 1999). The theoretical framework should be clearly identified and explained to the reader.

Aims and objectives/research question/research hypothesis
The purpose of the aims and objectives of a study, the research question and the research hypothesis is to form a link between the initially stated purpose of the study or research problem and how the study will be undertaken (Burns and Grove, 1999). They should be clearly stated and be congruent with the data presented in the literature review. The use of these items is dependent on the type of research being performed. Some descriptive studies may not identify any of these items but simply refer to the purpose of the study or the research problem; others will include either aims and objectives or research questions (Burns and Grove, 1999). Correlational designs study the relationships that exist between two or
more variables and accordingly use either a research question or hypothesis. Experimental and quasi-experimental studies should clearly state a hypothesis identifying the variables to be manipulated, the population that is being studied and the predicted outcome (Burns and Grove, 1999).

Sample and sample size
The degree to which a sample reflects the population it was drawn from is known as representativeness, and in quantitative research this is a decisive factor in determining the adequacy of a study (Polit and Beck, 2006). In order to select a sample that is likely to be representative, and thus identify findings that are probably generalizable to the target population, a probability sample should be used (Parahoo, 2006). The size of the sample is also important in quantitative research as small samples are at risk of being overly representative of small subgroups within the target population. For example, if, in a sample of general nurses, it was noticed that 40% of the respondents were males, then males would appear to be over-represented in the sample, thereby creating a sampling error. The risk of sampling errors decreases as larger sample sizes are used (Burns and Grove, 1997). In selecting the sample the researcher should clearly identify who the target population are and what criteria were used to include or exclude participants. It should also be evident how the sample was selected and how many were invited to participate (Russell, 2005).

Ethical considerations
Beauchamp and Childress (2001) identify four fundamental moral principles: autonomy, non-maleficence, beneficence and justice. Autonomy infers that an individual has the right to decide freely to participate in a research study, without fear of coercion and with a full knowledge of what is being investigated. Non-maleficence implies an intention of not harming, and preventing harm occurring to, participants of both a physical and psychological nature (Parahoo, 2006). Beneficence is interpreted as the research benefiting the participant and society as a whole (Beauchamp and Childress, 2001). Justice is concerned with all participants being treated as equals and no one group of individuals receiving preferential treatment because, for example, of their position in society (Parahoo, 2006). Beauchamp and Childress (2001) also identify four moral rules that are both closely connected to each other and to the principle of autonomy. They are veracity (truthfulness), fidelity (loyalty and trust), confidentiality and privacy. The latter pair are often linked and imply that the researcher has a duty to respect the confidentiality and/or the anonymity of participants and non-participating subjects. Ethical committees or institutional review boards have to give approval before research can be undertaken. Their role is to determine that ethical principles are being applied and that the rights of the individual are being adhered to (Burns and Grove, 1999).

Operational definitions
In a research study the researcher needs to ensure that the reader understands what is meant by the terms and concepts that are used in the research. To ensure this, any concepts or terms referred to should be clearly defined (Parahoo, 2006).

Methodology: research design
Methodology refers to the nuts and bolts of how a research study is undertaken. There are a number of important elements that need to be referred to here and the first of these is the research design. There are several types of quantitative studies that can be structured under the headings of true experimental, quasi-experimental and non-experimental designs (Robson, 2002) (Table 2). Although it is outside the remit of this article, within each of these categories there is a range of designs that will impact on how the data collection and data analysis phases of the study are undertaken. However, Robson (2002) states these designs are similar in many respects as most are concerned with patterns of group behaviour, averages, tendencies and properties.

Methodology: data collection
The next element to consider after the research design is the data collection method. In a quantitative study any number of strategies can be adopted when collecting data and these can include interviews, questionnaires, attitude scales or observational tools. Questionnaires are the most commonly used data gathering instruments and consist mainly of closed questions with a choice of fixed answers. Postal questionnaires are administered via the mail and have the value of perceived anonymity. Questionnaires can also be administered in face-to-face interviews or in some instances over the telephone (Polit and Beck, 2006).

Methodology: instrument design
After identifying the appropriate data gathering method the next step that needs to be considered is the design of the instrument. Researchers have the choice of using a previously designed instrument or developing one for the study and this choice should be clearly declared for the reader. Designing an instrument is a protracted and sometimes difficult process (Burns and Grove, 1997) but the
overall aim is that the final questions will be clearly linked to the research questions, will elicit accurate information and will help achieve the goals of the research. This, however, needs to be demonstrated by the researcher.

Table 2. Research designs

Experimental
Sample: two or more groups
Sample allocation: random
Features: groups get different treatments
Outcome: cause and effect relationship

Quasi-experimental
Sample: one or more groups
Sample allocation: random
Features: one variable has not been manipulated or controlled (usually because it cannot be)
Outcome: cause and effect relationship, but less powerful than experimental

Non-experimental (e.g. descriptive; includes cross-sectional, correlational, comparative and longitudinal studies)
Sample: one or more groups
Sample allocation: not applicable
Features: discover new meaning; describe what already exists; measure the relationship between two or more variables
Outcome: possible hypotheses for future research; tentative explanations
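The random allocation that distinguishes the experimental designs in Table 2 can be illustrated with a short sketch. This is an illustrative example only; the participant identifiers, group labels and function are hypothetical and not part of the article:

```python
import random

def allocate(participants, seed=None):
    """Randomly allocate participants to a control and an
    intervention group, as in a true experimental design."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "intervention": shuffled[half:]}

# Hypothetical participant identifiers P1..P20
groups = allocate([f"P{i}" for i in range(1, 21)], seed=42)
print(len(groups["control"]), len(groups["intervention"]))  # 10 10
```

Because allocation is left to chance rather than to the researcher, systematic differences between the groups are minimized, which is what supports the cause and effect claims listed under 'Outcome' above.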
If a previously designed instrument is selected, the researcher should clearly establish that the chosen instrument is the most appropriate. This is achieved by outlining how the instrument has measured the concepts under study. Previously designed instruments are often in the form of standardized tests or scales that have been developed for the purpose of measuring a range of views, perceptions, attitudes, opinions or even abilities. There is a multitude of tests and scales available, therefore the researcher is expected to provide the appropriate evidence in relation to the validity and reliability of the instrument (Polit and Beck, 2006).

Methodology: validity and reliability
One of the most important features of any instrument is that it measures the concept being studied in an unwavering and consistent way. These are addressed under the broad headings of validity and reliability respectively. In general, validity is described as the ability of the instrument to measure what it is supposed to measure, and reliability as the instrument's ability to consistently and accurately measure the concept under study (Wood et al, 2006). For the most part, if a well established 'off the shelf' instrument has been used and not adapted in any way, the validity and reliability will have been determined already and the researcher should outline what this is. However, if the instrument has been adapted in any way or is being used for a new population then previous validity and reliability will not apply. In these circumstances the researcher should indicate how the reliability and validity of the adapted instrument was established (Polit and Beck, 2006). To establish if the chosen instrument is clear and unambiguous, and to ensure that the proposed study has been conceptually well planned, a mini-version of the main study, referred to as a pilot study, should be undertaken before
the main study. Samples used in the pilot study are generally omitted from the main study. Following the pilot study the researcher may adjust definitions, alter the research question, address changes to the measuring instrument or even alter the sampling strategy. Having described the research design, the researcher should outline in clear, logical steps the process by which the data was collected. All steps should be fully described and easy to follow (Russell, 2005).

Analysis and results
Data analysis in quantitative research studies is often seen as a daunting process. Much of this is associated with apparently complex language and the notion of statistical tests. The researcher should clearly identify what statistical tests were undertaken, why these tests were used and what the results were. A rule of thumb is that studies that are descriptive in design use only descriptive statistics; correlational, quasi-experimental and experimental studies use inferential statistics. The latter are subdivided into tests to measure relationships and differences between variables (Clegg, 1990). Inferential statistical tests are used to identify if a relationship or difference between variables is statistically significant. Statistical significance helps the researcher to rule out one important threat to validity: that the result could be due to chance rather than to real differences in the population. Quantitative studies usually identify the lowest level of significance as P≤0.05 (P = probability) (Clegg, 1990). To enhance readability researchers frequently present their findings and data analysis section under the headings
of the research questions (Russell, 2005). This can help the reviewer determine if the results that are presented clearly answer the research questions. Tables, charts and graphs may be used to summarize the results and should be accurate, clearly identified and enhance the presentation of results (Russell, 2005). The percentage of the sample who participated in the study is an important element in considering the generalizability of the results. At least fifty percent of the sample needs to participate if a response bias is to be avoided (Polit and Beck, 2006).

Discussion/conclusion/recommendations
The discussion of the findings should flow logically from the data and should be related back to the literature review, thus placing the study in context (Russell, 2005). If the hypothesis was deemed to have been supported by the findings, the researcher should develop this in the discussion. If a theoretical or conceptual framework was used in the study then the relationship with the findings should be explored. Any interpretations or inferences drawn should be clearly identified as such and consistent with the results. The significance of the findings should be stated, but these should be considered within the overall strengths and limitations of the study (Polit and Beck, 2006). In this section some consideration should be given to whether or not the findings of the study were generalizable, also referred to as external validity. Not all studies make a claim to generalizability, but the researcher should have undertaken an assessment of the key factors in the design, sampling and analysis of the study to support any such claim. Finally the researcher should have explored the clinical significance and relevance of the study. Applying findings
in practice should be suggested with caution and will obviously depend on the nature and purpose of the study. In addition, the researcher should make relevant and meaningful suggestions for future research in the area (Connell Meehan, 1999).

References
The research study should conclude with an accurate list of all the books, journal articles, reports and other media that were referred to in the work (Polit and Beck, 2006). The referenced material is also a useful source of further information on the subject being studied.

Conclusions
The process of critiquing involves an in-depth examination of each stage of the research process. It is not a criticism but rather an impersonal scrutiny of a piece of work using a balanced and objective approach, the purpose of which is to highlight both strengths and weaknesses, in order to identify whether a piece of research is trustworthy and unbiased. As nursing practice is becoming increasingly more evidence based, it is important that care has its foundations in sound research. It is therefore important that all nurses have the ability to critically appraise research in order to identify what
is best practice.

Bassett C, Bassett J (2003) Reading and critiquing research. Br J Perioper Nurs 13(4): 162-4
Beauchamp T, Childress J (2001) Principles of Biomedical Ethics. 5th edn. Oxford University Press, Oxford
Burns N, Grove S (1997) The Practice of Nursing Research: Conduct, Critique and Utilization. 3rd edn. WB Saunders Company, Philadelphia
Burns N, Grove S (1999) Understanding Nursing Research. 2nd edn. WB Saunders Company, Philadelphia
Carnwell R (1997) Critiquing research. Nurs Pract 8(12): 16-21
Clegg F (1990) Simple Statistics: A Course Book for the Social Sciences. 2nd edn. Cambridge University Press, Cambridge
Conkin Dale J (2005) Critiquing research for use in practice. J Pediatr Health Care 19: 183-6
Connell Meehan T (1999) The research critique. In: Treacy P, Hyde A, eds. Nursing Research and Design. UCD Press, Dublin: 57-74
Cullum N, Droogan J (1999) Using research and the role of systematic reviews of the literature. In: Mulhall A, Le May A, eds. Nursing Research: Dissemination and Implementation. Churchill Livingstone, Edinburgh: 109-23
Miles M, Huberman A (1994) Qualitative Data Analysis. 2nd edn. Sage, Thousand Oaks, CA
Parahoo K (2006) Nursing Research: Principles, Process and Issues. 2nd edn. Palgrave Macmillan, Houndmills, Basingstoke
Polit D, Beck C (2006) Essentials of Nursing Research: Methods, Appraisal and Utilization. 6th edn. Lippincott Williams and Wilkins, Philadelphia
Robson C (2002) Real World Research. 2nd edn. Blackwell Publishing, Oxford
Russell C (2005) Evaluating quantitative research reports. Nephrol Nurs J 32(1): 61-4
Ryan-Wenger N (1992) Guidelines for critique of a research report. Heart Lung 21(4): 394-401
Tanner J (2003) Reading and critiquing research. Br J Perioper Nurs 13(4): 162-4
Valente S (2003) Research dissemination and utilization: improving care at the bedside. J Nurs Care Quality 18(2): 114-21
Wood MJ, Ross-Kerr JC, Brink PJ (2006) Basic Steps in Planning Nursing Research: From Question to Proposal. 6th edn. Jones and Bartlett, Sudbury

KEY POINTS

• Many qualified and student nurses have difficulty understanding the concepts and terminology associated with research and research critique.
• The ability to critically read research is essential if the profession is to achieve and maintain its goal to be evidence based.
• A critique of a piece of research is not a criticism of the work, but an impersonal review to highlight the strengths and limitations of the study.
• It is important that all nurses have the ability to critically appraise research in order to identify what is best practice.
Michael Coughlan, Patricia Cronin and Frances Ryan are Lecturers, School of Nursing and Midwifery, University of Dublin, Trinity College, Dublin
Accepted for publication: March 2007

Nurses should not take research at face value simply because it has been published (Cullum and Droogan, 1999; Polit and Beck, 2006). Critiquing is a systematic method of appraising the strengths and limitations of a piece of research in order to determine its credibility and/or its applicability to practice (Valente, 2003). Seeking only limitations in a study is criticism, and critiquing and criticism are not the same (Burns and Grove, 1997). A critique is an impersonal evaluation of the strengths and limitations of the research being reviewed and should not be seen as a disparagement of the researcher's ability. Neither should it be regarded as a jousting match between the researcher and the reviewer.
Burns and Grove (1999) call this an 'intellectual critique' in that it is not the creator but the creation that is being evaluated. The reviewer maintains objectivity throughout the critique. No personal views are expressed by the reviewer, and the strengths and/or limitations of the study, and the implications of these, are highlighted with reference to research texts or journals. It is also important to remember that research works within the realms of probability, where nothing is absolutely certain. It is therefore important to refer to the apparent strengths, limitations and findings of a piece of research (Burns and Grove, 1997). The use of personal pronouns is also avoided in order that an appearance of objectivity can be maintained.

Credibility and integrity
There are numerous tools available to help both novice and advanced reviewers to critique research studies (Tanner, 2003). These tools generally ask questions that can help the reviewer to determine the degree to which the steps in the research process were followed. However, some steps are more important than others and very few tools acknowledge this. Ryan-Wenger (1992) suggests that questions in a critiquing tool can be subdivided into those that are useful for getting a feel for the study being presented, which she calls 'credibility variables', and those that are essential for evaluating the research process, called 'integrity variables'. Credibility variables concentrate on how believable the work appears and focus on the researcher's qualifications and ability to undertake and accurately present the study. The answers to these questions are important when critiquing a piece of research as they can offer the reader an insight into what to expect in the remainder of the study. However, the reader should be aware that identified strengths and limitations within this section will not necessarily correspond with what will be found in the rest of the work.
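Ryan-Wenger's (1992) distinction can be sketched as a small data structure for a critiquing checklist. The questions are paraphrased from Table 1, but the structure and the tallying function are illustrative assumptions, not part of any published tool:

```python
# Illustrative sketch: a critiquing checklist split into credibility
# variables (how believable the report appears) and integrity
# variables (how rigorously the research process was followed).
CHECKLIST = {
    "credibility": [
        "Is the report well written, concise and free of jargon?",
        "Do the authors' qualifications indicate knowledge of the field?",
        "Is the title clear, accurate and unambiguous?",
    ],
    "integrity": [
        "Is the purpose of the study clearly identified?",
        "Was the sample selection method appropriate?",
        "Were reliability and validity testing undertaken?",
    ],
}

def summarize(answers):
    """Count 'yes' answers per category. `answers` maps a question
    to True/False; unanswered questions count as False."""
    return {
        category: sum(answers.get(q, False) for q in questions)
        for category, questions in CHECKLIST.items()
    }
```

Keeping the two categories separate mirrors the point above: credibility answers give a feel for the study, while integrity answers bear on its trustworthiness.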
Integrity questions, on the other hand, are interested in the robustness of the research method, seeking to identify how appropriately and accurately the researcher followed the steps in the research process. The answers to these questions will help to identify the trustworthiness of the study and its applicability to nursing practice.

Table 1. Research questions - guidelines for critiquing a quantitative research study

Elements influencing the believability of the research

Writing style: Is the report well written - concise, grammatically correct, avoiding the use of jargon? Is it well laid out and organized?
Author: Do the researcher(s') qualifications/position indicate a degree of knowledge in this particular field?
Report title: Is the title clear, accurate and unambiguous?
Abstract: Does the abstract offer a clear overview of the study, including the research problem, sample, methodology, findings and recommendations?

Elements influencing the robustness of the research

Purpose/research problem: Is the purpose of the study/research problem clearly identified?
Logical consistency: Does the research report follow the steps of the research process in a logical manner? Do these steps naturally flow and are the links clear?
Literature review: Is the review logically organized? Does it offer a balanced critical analysis of the literature? Is the majority of the literature of recent origin? Is it mainly from primary sources and of an empirical nature?
Theoretical framework: Has a conceptual or theoretical framework been identified? Is the framework adequately described? Is the framework appropriate?
Aims/objectives/research question/hypotheses: Have aims and objectives, a research question or hypothesis been identified? If so, are they clearly stated? Do they reflect the information presented in the literature review?
Sample: Has the target population been clearly identified? How were the sample selected? Was it a probability or non-probability sample? Is it of adequate size? Are the inclusion/exclusion criteria clearly identified?
Ethical considerations: Were the participants fully informed about the nature of the research? Was the autonomy/confidentiality of the participants guaranteed? Were the participants protected from harm? Was ethical permission granted for the study?
Operational definitions: Are all the terms, theories and concepts mentioned in the study clearly defined?
Methodology: Is the research design clearly identified? Has the data gathering instrument been described? Is the instrument appropriate? How was it developed? Were reliability and validity testing undertaken and the results discussed? Was a pilot study undertaken?
Data analysis/results: What type of data and statistical analysis was undertaken? Was it appropriate? How many of the sample participated? What is the significance of the findings?
Discussion: Are the findings linked back to the literature review? If a hypothesis was identified, was it supported? Were the strengths and limitations of the study, including generalizability, discussed? Was a recommendation for further research made?
References: Were all the books, journals and other media alluded to in the study accurately referenced?

Critiquing the research steps
In critiquing the steps in the research process a number of questions need to be asked. However, these questions are seeking more than a simple 'yes' or 'no' answer. The questions are posed to stimulate the reviewer to consider the implications of what the researcher has done. Does the way a step has been applied appear to add to the strength of the study, or does it appear as a possible limitation to implementation of the study's findings? (Table 1).

Elements influencing believability of the study

Writing style
Research reports should be well written, grammatically correct, concise and well organized. The use of jargon should be avoided where possible. The style should be such that it attracts the reader to read on (Polit and Beck, 2006).

Author(s)
The author(s') qualifications and job title can be a useful indicator of the researcher(s') knowledge of the area under investigation and ability to ask the appropriate questions (Conkin Dale, 2005). Conversely, a research study should be evaluated on its own merits and not assumed to be valid and reliable simply based on the author(s') qualifications.

Report title
The title should be between 10 and 15 words long and should clearly identify for the reader the purpose of the study (Connell Meehan, 1999). Titles that are too long or too short can be confusing or misleading (Parahoo, 2006).

Abstract
The abstract should provide a succinct overview of the
  • 32. research and should include information regarding the purpose of the study, method, sample size and selection. Hritislijourn.il of Nursing. 2007. Vol 16. No 11 659 the main findings and conclusions and recommendations (Conkin Dale, 2005). From the abstract the reader should be able to determine if the study is of interest and whether or not to continue reading (Parahoo, 2006). Eiements influencing robustness Purpose of the study/research problem A research problem is often first presented to the reader in the introduction to the study (Bassett and Bassett, 2003). Depending on what is to be investigated some authors will refer to it as the purpose of the study. In either case the statement should at least broadly indicate to the reader what is to be studied (Polit and Beck, 2006). Broad problems are often multi-faceted and will need to become narrower and more focused before they can be researched. In this the literature review can play a major role (Parahoo, 2006). Logical consistency A research study needs to follow the steps in the process in a logical manner.There should also be a clear link between the steps beginning with the purpose of the study and following through the literature review, the theoretical framework, the research question, the methodology section, the data analysis, and the findings (Ryan-Wenger, 1992). Literature review The primary purpose of the literature review is to define or develop the research question while also identifying an appropriate method of data collection (Burns and
  • 33. Grove, 1997). It should also help to identify any gaps in the literature relating to the problem and to suggest how those gaps might be filled. The literature review should demonstrate an appropriate depth and breadth of reading around the topic in question. The majority of studies included should be of recent origin and ideally less than five years old. However, there may be exceptions to this, for example, in areas where there is a lack of research, or a seminal or all-important piece of work that is still relevant to current practice. It is important also that the review should include some historical as well as contemporary material in order to put the subject being studied into context. The depth of coverage will depend on the nature of the subject, for example, for a subject with a vast range of literature then the review will need to concentrate on a very specific area (Carnwell, 1997). Another important consideration is the type and source of hterature presented. Primary empirical data from the original source is more favourable than a secondary source or anecdotal information where the author relies on personal evidence or opinion that is not founded on research. A good review usually begins with an introduction which identifies the key words used to conduct the search and information about which databases were used. The themes that emerged from the literature should then be presented and discussed (Carnwell, 1997). In presenting previous work it is important that the data is reviewed critically, highlighting both the strengths and limitations of the study. It should also be compared and contrasted with the findings of other studies (Burns and Grove, 1997). Theoretical framework Following the identification of the research problem and the review of the literature the researcher should present the theoretical framework (Bassett and Bassett,
2003). Theoretical frameworks are a concept that novice and experienced researchers find confusing. It is initially important to note that not all research studies use a defined theoretical framework (Robson, 2002). A theoretical framework can be a conceptual model that is used as a guide for the study (Conkin Dale, 2005) or themes from the literature that are conceptually mapped and used to set boundaries for the research (Miles and Huberman, 1994). A sound framework also identifies the various concepts being studied and the relationship between those concepts (Burns and Grove, 1997). Such relationships should have been identified in the literature. The research study should then build on this theory through empirical observation. Some theoretical frameworks may include a hypothesis. Theoretical frameworks tend to be better developed in experimental and quasi-experimental studies and often poorly developed or non-existent in descriptive studies (Burns and Grove, 1999). The theoretical framework should be clearly identified and explained to the reader.

Aims and objectives/research question/research hypothesis
The purpose of the aims and objectives of a study, the research question and the research hypothesis is to form a link between the initially stated purpose of the study or research problem and how the study will be undertaken (Burns and Grove, 1999). They should be clearly stated and be congruent with the data presented in the literature review. The use of these items is dependent on the type of research being performed. Some descriptive studies may not identify any of these items but simply refer to the purpose of the study or the research problem; others will include either aims and objectives or research questions (Burns and Grove, 1999). Correlational designs study the relationships that exist between two or more variables and accordingly use either a research question or hypothesis. Experimental and quasi-experimental studies
should clearly state a hypothesis identifying the variables to be manipulated, the population that is being studied and the predicted outcome (Burns and Grove, 1999).

Sample and sample size
The degree to which a sample reflects the population it was drawn from is known as representativeness, and in quantitative research this is a decisive factor in determining the adequacy of a study (Polit and Beck, 2006). In order to select a sample that is likely to be representative, and thus identify findings that are probably generalizable to the target population, a probability sample should be used (Parahoo, 2006). The size of the sample is also important in quantitative research as small samples are at risk of being overly representative of small subgroups within the target population. For example, if, in a sample of general nurses, it was noticed that 40% of the respondents were males, then males would appear to be over-represented in the sample, thereby creating a sampling error. The risk of sampling errors decreases as larger sample sizes are used (Burns and Grove, 1997). In selecting the sample the researcher should clearly identify who the target population are and what criteria were used to include or exclude participants. It should also be evident how the sample was selected and how many were invited to participate (Russell, 2005).

Ethical considerations
Beauchamp and Childress (2001) identify four fundamental moral principles: autonomy, non-maleficence, beneficence
and justice. Autonomy infers that an individual has the right to freely decide to participate in a research study without fear of coercion and with a full knowledge of what is being investigated. Non-maleficence implies an intention of not harming, and preventing harm occurring to, participants of both a physical and psychological nature (Parahoo, 2006). Beneficence is interpreted as the research benefiting the participant and society as a whole (Beauchamp and Childress, 2001). Justice is concerned with all participants being treated as equals and no one group of individuals receiving preferential treatment because, for example, of their position in society (Parahoo, 2006).

Beauchamp and Childress (2001) also identify four moral rules that are both closely connected to each other and with the principle of autonomy. They are veracity (truthfulness), fidelity (loyalty and trust), confidentiality and privacy. The latter pair are often linked and imply that the researcher has a duty to respect the confidentiality and/or the anonymity of participants and non-participating subjects. Ethical committees or institutional review boards have to give approval before research can be undertaken. Their role is to determine that ethical principles are being applied and that the rights of the individual are being adhered to (Burns and Grove, 1999).

Operational definitions
In a research study the researcher needs to ensure that the reader understands what is meant by the terms and concepts that are used in the research. To ensure this, any concepts or terms referred to should be clearly defined (Parahoo, 2006).

Methodology: research design
Methodology refers to the nuts and bolts of how a research study is undertaken. There are a number of
important elements that need to be referred to here, and the first of these is the research design. There are several types of quantitative studies that can be structured under the headings of true experimental, quasi-experimental and non-experimental designs (Robson, 2002) (Table 2). Although it is outside the remit of this article, within each of these categories there is a range of designs that will impact on how the data collection and data analysis phases of the study are undertaken. However, Robson (2002) states these designs are similar in many respects as most are concerned with patterns of group behaviour, averages, tendencies and properties.

Methodology: data collection
The next element to consider after the research design is the data collection method. In a quantitative study any number of strategies can be adopted when collecting data, and these can include interviews, questionnaires, attitude scales or observational tools. Questionnaires are the most commonly used data gathering instruments and consist mainly of closed questions with a choice of fixed answers. Postal questionnaires are administered via the mail and have the value of perceived anonymity. Questionnaires can also be administered in face-to-face interviews or in some instances over the telephone (Polit and Beck, 2006).

Methodology: instrument design
After identifying the appropriate data gathering method, the next step that needs to be considered is the design of the instrument. Researchers have the choice of using a previously designed instrument or developing one for the study, and this choice should be clearly declared for the reader. Designing an instrument is a protracted and sometimes difficult process (Burns and Grove, 1997), but the overall aim is that the final questions will be clearly linked to the research questions and will elicit accurate information
and will help achieve the goals of the research. This, however, needs to be demonstrated by the researcher.

Table 2. Research designs

Experimental
  Sample: 2 or more groups
  Sample allocation: Random
  Features: Groups get different treatments
  Outcome: Cause and effect relationship

Quasi-experimental
  Sample: One or more groups
  Sample allocation: Random
  Features: One variable has not been manipulated or controlled (usually because it cannot be)
  Outcome: Cause and effect relationship, but less powerful than experimental

Non-experimental (e.g. descriptive; includes cross-sectional, correlational, comparative and longitudinal studies)
  Sample: One or more groups
  Sample allocation: Not applicable
  Features: Discover new meaning; describe what already exists; measure the relationship between two or more variables
  Outcome: Possible hypothesis for future research; tentative explanations

If a previously designed instrument is selected the researcher
should clearly establish that the chosen instrument is the most appropriate. This is achieved by outlining how the instrument has measured the concepts under study. Previously designed instruments are often in the form of standardized tests or scales that have been developed for the purpose of measuring a range of views, perceptions, attitudes, opinions or even abilities. There are a multitude of tests and scales available; therefore the researcher is expected to provide the appropriate evidence in relation to the validity and reliability of the instrument (Polit and Beck, 2006).

Methodology: validity and reliability
One of the most important features of any instrument is that it measures the concept being studied in an unwavering and consistent way. These are addressed under the broad headings of validity and reliability respectively. In general, validity is described as the ability of the instrument to measure what it is supposed to measure, and reliability as the instrument's ability to consistently and accurately measure the concept under study (Wood et al, 2006). For the most part, if a well-established 'off the shelf' instrument has been used and not adapted in any way, the validity and reliability will have been determined already and the researcher should outline what this is. However, if the instrument has been adapted in any way or is being used for a new population then previous validity and reliability will not apply. In these circumstances the researcher should indicate how the reliability and validity of the adapted instrument was established (Polit and Beck, 2006).

To establish if the chosen instrument is clear and unambiguous, and to ensure that the proposed study has been conceptually well planned, a mini-version of the main study, referred to as a pilot study, should be undertaken before the main study. Samples used in the pilot study are generally omitted from the main study. Following the pilot study the
researcher may adjust definitions, alter the research question, address changes to the measuring instrument or even alter the sampling strategy. Having described the research design, the researcher should outline in clear, logical steps the process by which the data was collected. All steps should be fully described and easy to follow (Russell, 2005).

Analysis and results
Data analysis in quantitative research studies is often seen as a daunting process. Much of this is associated with apparently complex language and the notion of statistical tests. The researcher should clearly identify what statistical tests were undertaken, why these tests were used and what were the results. A rule of thumb is that studies that are descriptive in design use only descriptive statistics; correlational studies, quasi-experimental and experimental studies use inferential statistics. The latter is subdivided into tests to measure relationships and differences between variables (Clegg, 1990).

Inferential statistical tests are used to identify if a relationship or difference between variables is statistically significant. Statistical significance helps the researcher to rule out one important threat to validity, namely that the result could be due to chance rather than to real differences in the population. Quantitative studies usually identify the lowest level of significance as P≤0.05 (P = probability) (Clegg, 1990). To enhance readability researchers frequently present their findings and data analysis section under the headings of the research questions (Russell, 2005). This can help the reviewer determine if the results that are presented clearly
answer the research questions. Tables, charts and graphs may be used to summarize the results and should be accurate, clearly identified and enhance the presentation of results (Russell, 2005). The percentage of the sample who participated in the study is an important element in considering the generalizability of the results. At least fifty percent of the sample is needed to participate if a response bias is to be avoided (Polit and Beck, 2006).

Discussion/conclusion/recommendations
The discussion of the findings should flow logically from the data and should be related back to the literature review, thus placing the study in context (Russell, 2002). If the hypothesis was deemed to have been supported by the findings, the researcher should develop this in the discussion. If a theoretical or conceptual framework was used in the study then the relationship with the findings should be explored. Any interpretations or inferences drawn should be clearly identified as such and consistent with the results. The significance of the findings should be stated, but these should be considered within the overall strengths and limitations of the study (Polit and Beck, 2006). In this section some consideration should be given to whether or not the findings of the study were generalizable, also referred to as external validity. Not all studies make a claim to generalizability, but the researcher should have undertaken an assessment of the key factors in the design, sampling and analysis of the study to support any such claim. Finally the researcher should have explored the clinical significance and relevance of the study. Applying findings in practice should be suggested with caution and will obviously depend on the nature and purpose of the study.
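The distinction drawn above between descriptive statistics (summarizing what was observed) and inferential statistics (judging whether a difference is statistically significant at P≤0.05) can be illustrated with a short worked example. This is an illustrative sketch only, not part of the original article: the anxiety scores are invented, and for simplicity the t ratio is compared against the familiar two-tailed critical value of about 1.96 rather than an exact P value.

```python
from statistics import mean, stdev

# Hypothetical data (not from the article): anxiety scores for a
# treatment group and a control group in a quantitative study.
treatment = [12, 14, 11, 9, 13, 10, 12, 11, 13, 10]
control = [15, 17, 14, 16, 13, 18, 15, 14, 16, 17]

# Descriptive statistics: summarize the samples.
m1, m2 = mean(treatment), mean(control)
s1, s2 = stdev(treatment), stdev(control)

# Inferential statistics: an independent t ratio for equal-sized groups.
# SEM = s / sqrt(n);  SEd = sqrt(SEM1^2 + SEM2^2);  t = (M1 - M2) / SEd
n = len(treatment)
se_d = (s1**2 / n + s2**2 / n) ** 0.5
t = (m1 - m2) / se_d

# With a two-tailed critical value of roughly 1.96 (alpha = .05),
# |t| >= 1.96 suggests the difference is unlikely to be due to chance alone.
significant = abs(t) >= 1.96
print(round(m1, 1), round(m2, 1), round(t, 2), significant)
```

A reviewer critiquing a study would expect the report to state both layers: the descriptive summary (means, standard deviations) and the inferential decision (the test used, the statistic and whether P≤0.05 was reached).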
In addition, the researcher should make relevant and meaningful suggestions for future research in the area (Connell Meehan, 1999).

References
The research study should conclude with an accurate list of all the books, journal articles, reports and other media that were referred to in the work (Polit and Beck, 2006). The referenced material is also a useful source of further information on the subject being studied.

Conclusions
The process of critiquing involves an in-depth examination of each stage of the research process. It is not a criticism but rather an impersonal scrutiny of a piece of work using a balanced and objective approach, the purpose of which is to highlight both strengths and weaknesses, in order to identify whether a piece of research is trustworthy and unbiased. As nursing practice is becoming increasingly more evidence based, it is important that care has its foundations in sound research. It is therefore important that all nurses have the ability to critically appraise research in order to identify what is best practice.
Russell C (2005) Evaluating quantitative research reports. Nephrol Nurs J 32(1): 61-4
Ryan-Wenger N (1992) Guidelines for critique of a research report. Heart Lung 21(4): 394-401
Tanner J (2003) Reading and critiquing research. Br J Perioper Nurs 13(4): 162-4
Valente S (2003) Research dissemination and utilization: improving care at the bedside. J Nurs Care Quality 18(2): 114-21
Wood MJ, Ross-Kerr JC, Brink PJ (2006) Basic Steps in Planning Nursing Research: From Question to Proposal. 6th edn. Jones and Bartlett, Sudbury
Bassett C, Bassett J (2003) Reading and critiquing research. Br J Perioper Nurs 13(4): 162-4
Beauchamp T, Childress J (2001) Principles of Biomedical Ethics. 5th edn. Oxford University Press, Oxford
Burns N, Grove S (1997) The Practice of Nursing Research: Conduct, Critique and Utilization. 3rd edn. WB Saunders Company, Philadelphia
Burns N, Grove S (1999) Understanding Nursing Research. 2nd edn. WB Saunders Company, Philadelphia
Carnwell R (1997) Critiquing research. Nurs Pract 8(12): 16-21
Clegg F (1990) Simple Statistics: A Course Book for the Social Sciences. 2nd edn. Cambridge University Press, Cambridge
Conkin Dale J (2005) Critiquing research for use in practice. J Pediatr Health Care 19: 183-6
Connell Meehan T (1999) The research critique. In: Treacy P, Hyde A, eds. Nursing Research and Design. UCD Press, Dublin: 57-74
Cullum N, Droogan J (1999) Using research and the role of systematic reviews of the literature. In: Mulhall A, Le May A, eds. Nursing Research: Dissemination and Implementation. Churchill Livingstone, Edinburgh: 109-23
Miles M, Huberman A (1994) Qualitative Data Analysis. 2nd edn. Sage, Thousand Oaks, CA
Parahoo K (2006) Nursing Research: Principles, Process and Issues. 2nd edn. Palgrave Macmillan, Houndmills, Basingstoke
Polit D, Beck C (2006) Essentials of Nursing Research: Methods, Appraisal and Utilization. 6th edn. Lippincott Williams and Wilkins, Philadelphia
Robson C (2002) Real World Research. 2nd edn. Blackwell Publishing, Oxford

KEY POINTS
• Many qualified and student nurses have difficulty understanding the concepts and terminology associated with research and research critique.
• The ability to critically read research is essential if the profession is to achieve and maintain its goal to be evidence based.
• A critique of a piece of research is not a criticism of the work, but an impersonal review to highlight the strengths and limitations of the study.
• It is important that all nurses have the ability to critically appraise research in order to identify what is best practice.
Jeff Rotman/The Image Bank/Getty Images

chapter 7
Dependent t-Tests and Repeated Measures Analysis of Variance

Learning Objectives
After reading this chapter, you will be able to . . .
1. describe the impact that initial between-groups differences have on test results when using the t-test or analysis of variance.
2. compare the independent t-test to the dependent-groups t-test.
3. complete a dependent- (paired/repeated-) samples t-test.
4. explain what power means in statistical testing.
5. compare the one-way ANOVA to the repeated-measures ANOVA.
6. complete a repeated-measures ANOVA.
7. interpret results and draw conclusions of within-group designs.
8. present within-group analysis results in APA format.
9. employ the Wilcoxon signed-ranks W-test and Friedman's nonparametric ANOVA.
suk85842_07_c07.indd 235 10/23/13 1:29 PM

CHAPTER 7 Section 7.1 Reconsidering the t and F Ratios

Tests of significant difference, such as the t-test and analysis of variance, are of two kinds: tests involving independent (or between) groups and those that employ related, or dependent (or within) groups. The tests covered to this point in the book have involved only independent groups tests. However, there are important advantages related to the dependent groups procedures, and they are used frequently in data analysis. In this chapter, the focus will be on the dependent groups equivalents of the independent t-test and the one-way ANOVA. Since the same groups are used over time or treatments, these are called dependent/within-groups designs, whereas matched or equivalent groups can also be employed as an alternative design; all are collectively known as repeated-measures designs. Although repeated-measures designs answer the same questions as
their independent groups equivalents (i.e., are there significant differences within groups, across times/treatments, or between matched/equivalent groups), under particular circumstances they can do so with greater economy and more statistical power.

7.1 Reconsidering the t and F Ratios

The scores produced in both the independent t and the one-way ANOVA are ratios. In the case of the t-test, the ratio is the result of dividing the difference between the means of the groups by the standard error of the difference:

t = (M1 − M2) / SEd

With ANOVA, the F ratio is the mean square between divided by the mean square within:

F = MSbet / MSwith

With either t or F, the denominator in the ratio reflects how much scores vary within (rather than between) the groups of subjects involved in the study. These differences are easiest to see in the way the standard error of the difference is calculated for a t-test. When group sizes are equal, the formula is

SEd = √((SEM1)² + (SEM2)²)

with SEM =
s / √n

and s is, of course, a measure of score variation in any group. So the standard error of the difference is based on the standard error of the mean, which in turn is based on the standard deviation. These connections make it clear that within-group score variance in a t-test has its root in the standard deviation for each group of scores. If we reverse the order and work from the standard deviation back to the standard error of the difference, note the following:

• When scores vary substantially in a group, it is reflected in a large standard deviation.
• When the standard deviation is relatively large, the standard error of the mean must likewise be large because the standard deviation is the numerator in the formula for SEM.
• A large standard error of the mean results in a large standard error of the difference because that statistic is the square root of the sum of the squared standard errors of the mean.
• When the standard error of the difference is large, the difference between the means has to be correspondingly larger in order for the result to be statistically significant. The table of critical values indicates that no t ratio (the ratio of the difference between the means to the standard error of the difference) may be less than 1.96 for a two-tailed test, or less than 1.645 for a one-tailed test, based on the critical α = .05.

Error Variance

The point of this is that the value of t in the t-test (and it is the same for F in an ANOVA) is greatly affected by the amount of variability within the groups involved. When the variability within those groups is extensive, the values of t and F are correspondingly diminished and less likely to be statistically significant than when there is relatively little variability within the groups. These differences within groups stem from differences in the way individuals within the samples react to whatever treatment is the independent variable; different people respond differently to the same stimulus. These differences represent
error variance, which is what occurs whenever scores differ for reasons not related to the influence of the IV.

Other Sources of Error Variance

Within-group differences are not the only source of error variance in the calculation of t and F. Both the t-test and ANOVA are based on the assumption that the groups involved are equivalent before the independent variable is introduced. In a t-test where the impact of relaxation therapy on clients' anxiety is the issue, the assumption is that before the therapy is introduced, the group which will receive the therapy and the control group which will not begin with equivalent levels of anxiety. That assumption is the key to attributing any differences after the treatment to the therapy, the IV.

Confounding Variables

In comparisons such as this, the initial equivalence of the groups can be a problem, however. Maybe there were differences in anxiety before the therapy was introduced. There might be differences in the employment circumstances of each group, and perhaps those threatened with unemployment are more anxious than the others. Maybe there are age-related differences. These other influences that are not controlled in an experiment are sometimes called confounding variables.

If a psychologist wants to examine the impact that a substance abuse program has on
addicts' behavior, a study might be set up as follows. Two groups of the same number of addicts are selected, and the substance abuse program is provided to one group. After the program, the psychologist measures the level of substance abuse in both groups to see whether there is a difference.

Try It!
A: If the size of the standard deviation is related to the size of the group, in a t-test, what is the relationship between sample size and error?
  • 56. variable, they contribute to error variance. There is error variance any time the dependent variable (DV) scores fluctuate for reasons unrelated to the IV. Therefore, there is error variance reflected in the variability within groups, and there is error variance represented in any difference between groups that is not related to the IV. Test results can be meaningful only when the score variance that is related to the independent variable is substantially greater than the error variance—what is controlled must contrib- ute more to score values than what is left uncontrolled. This makes it important to look for ways to control error variance so that it is not confused with the variability in scores that stems from the independent variable. Controlling for confounding variables is a necessary research activity. A confounding variable can affect the IV-DV relationship, thereby lowering internal validity and thus the statistical conclusion validity of your findings. Failing to take confounding variables into account can result in misleading data and erroneous conclu- sions, to the detriment of the researcher’s reputation. In other words, be careful of research findings and sweeping general statements as there may be several confounding elements. That said, controlling for extraneous confounding variables could be done in several ways. 7.2 Dependent-Groups Designs Ideally, any before-the-treatment differences between the groups in a study will be min- imal. Recall that random selection occurs when every member of a population has an
  • 57. equal chance of being selected. The logic behind random selection is that when groups are randomly drawn from the same population, they will differ only by chance, but they will differ because no sample can represent the population with complete fidelity, and occa- sionally, the chance differences will affect the way subjects respond to the IV. One way to reduce error variance is to adopt what are called dependent-groups designs. The independent t-test and the one- way ANOVA required independent groups. Members of one group can- not also be members of other groups in the same study. However, in the case of the t-test, if the same group is measured, exposed to the treatment, and then measured again, an important source of error variance is controlled. Using the same group twice makes initial equivalence no longer a concern. Any scoring variability between the first and second measure should more accurately reflect the impact of the independent variable. The Dependent-Samples t-Tests One dependent-groups test where the same group is measured twice is called the before/ after t-test, also known as the pre/post t-test. An alternative is called the matched-pairs or dependent-samples t-test, where each participant in the first group is matched to someone in the second group who has a similar characteristic.
  • 58. Yet a third alternative that B How does random selection attempt to control error variance in statistical testing? Try It! suk85842_07_c07.indd 238 10/23/13 1:29 PM CHAPTER 7Section 7.2 Dependent-Groups Designs is basically the same as a before/after design is the within- treatment design where each participant is used across two treatment groups (usually given at two different times, which makes it the same as the before/after t-test). In the latter option the participant acts as his or her own control where one of the treatments may be a placebo. All three types of dependent-samples t-tests have the same objective, which is controlling the error variance that is due to initial between-groups differences. Following are examples of each test. • The before/after design: A researcher is interested in the impact that positive rein- forcement has on employees’ sales productivity. Besides the sales commission, the researcher introduces a rewards program that can result in increased vaca- tion time. The researcher gauges sales productivity for a month,
  • 59. introduces the rewards program, and gauges sales productivity during the second month for the same people. The researcher will explore differences in employee productivity before and after the positive reinforcement intervention (the rewards program). If significance was obtained then the null (i.e., there is no difference in employee productivity after the introduction of the rewards program) can be rejected and find support for the alternative hypothesis (i.e., there is a significant increase in employee productivity after the introduction of the rewards program). • The matched-pairs design: A school counselor is interested in the impact that verbal reinforcement has on students’ reading achievement. To eliminate between-groups differences, the researcher selects 30 people for the treatment group and matches each person in the treatment group to someone in the control group who has a similar reading score on a standardized test. The researcher then introduces the verbal reinforcement program to those in the treatment group for a specified period and over time compares the performance of students in the two groups as well as their performance within the group. The matched-pairs design is similar to an inde- pendent group design with one major exception: that the groups are as matched (or equivalent) to each other as closely as possible based on a particular measure—in
this case the match (or equivalent) is based on reading scores on a standardized test.
• Within-treatment design: A psychiatrist measures each study participant on taking a placebo, and then the actual drug for depression, to test for significant differences over the two treatments (placebo versus drug). Here a counterbalancing design may be employed to minimize the order effects that plague repeated-measures designs. Specifically, the order in which treatments are given can influence the outcome. Therefore, the treatments are given to the groups at different times, as depicted in Figure 7.1.
Figure 7.1: Counterbalance design
Group 1: Treatment A, then Treatment B, then Posttest
Group 2: Treatment B, then Treatment A, then Posttest
Source: Oskar Blakstad (May 8, 2009). Counterbalanced Measures Design by Martyn Shuttleworth. Retrieved Aug 22, 2013 from Explorable.com: http://explorable.com/counterbalanced-measures-design
Although there are differences in how the tests are set up, calculating the t-statistic is the same in each case. The differences between the approaches are conceptual, not
mathematical. Both approaches have the same purpose: to control for any score variation stemming from nonrelevant factors. They both reduce the error variance that comes from using nonequivalent groups. Therefore, testing for homogeneity of variance (the Levene's test) is moot here, as we are not dealing with differences between groups. On the other hand, we are dealing with variance within groups and across pairs of treatments. If there are such significant differences, then this issue constitutes what is described as a violation of sphericity, which will be discussed in more depth in Section 7.3 when we examine repeated-measures ANOVA.
Calculating t in a Dependent-Groups Design
Although the differences between before/after, matched-pairs, and within-treatment t-tests are not math-related, there are several approaches to calculating the t statistic in the dependent-groups tests. Whatever their differences, they all take into account the fact that the two sets of scores are related. One approach is to calculate the correlation between the two sets of scores and then to use the strength of the correlation value to reduce the error variance: the higher the correlation between the two sets of scores, the
lower the error variance. Rather than correlations, which come up later in the book, we will rely on "difference scores." But whether we use correlation values or difference scores, the result is the same.
The distribution of difference scores was discussed in Chapter 5 when the independent t-test was introduced. The point of that distribution is to determine the point at which the difference between a pair of sample means (M1 − M2) is so great that the most probable explanation is that the samples were not drawn from populations with the same means. That same distribution also provides the theoretical underpinning for the dependent-groups tests, but rather than the difference between the means of the two groups (M1 − M2), the difference score in the dependent-groups tests is based on the mean of the differences between pairs of individual scores. That is, the differences between each pair of related scores will be determined, and then the mean of those differences will become the numerator in the t ratio. If the mean of the difference scores is sufficiently different from the mean of the distribution of difference scores (which, recall, is 0), the t value will be statistically significant. The denominator in the t ratio is another standard error of the mean value, but in this case, it is the standard error of the mean for those difference scores. The mechanics of checking
for significance are similar to what was done for the independent t:
• A critical value from the t table defines the point at which the t ratio is statistically significant.
• The critical value is dependent upon the degrees of freedom for the problem. For the dependent-samples t, the degrees of freedom are the number of pairs of scores minus 1 (n − 1).
Try It! C: How are the before/after t-test and the matched-pairs t-test different?
The dependent-groups t-test statistic has this form:
t = Md / SEMd    Formula 7.1
Where
Md = the mean of the difference scores
SEMd = the standard error of the mean for the difference scores
The steps for completing the test follow:
1. From the two scores for each subject, subtract the second from the first to determine the difference score, d, for each pair.
2. Determine the mean of the d scores: Md = Σd / number of pairs
3. Calculate the standard deviation of the d values, sd.
4. Calculate the standard error of the mean for the difference scores, SEMd, by dividing the result of Step 3 by the square root of the number of pairs of scores: SEMd = sd / √(number of pairs)
5. t = Md / SEMd
Following is an example to illustrate the steps to calculating the dependent-measures t-test. A psychologist is investigating the impact that verbal reinforcement has on the number of questions students ask in a seminar.
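The five steps above translate directly into code. The following sketch uses hypothetical scores (not from the text) and only Python's standard library, with one line per step:

```python
from math import sqrt
from statistics import mean, stdev

def dependent_t(first, second):
    """Dependent-groups t, following the five steps in the text."""
    # Step 1: difference score d for each pair (first minus second)
    d = [a - b for a, b in zip(first, second)]
    # Step 2: mean of the d scores
    m_d = mean(d)
    # Step 3: standard deviation of the d values (sample sd)
    s_d = stdev(d)
    # Step 4: standard error of the mean for the difference scores
    sem_d = s_d / sqrt(len(d))
    # Step 5: the t ratio
    return m_d / sem_d

# Hypothetical before/after scores for four participants
before = [5, 7, 6, 8]
after = [7, 8, 8, 9]
t = dependent_t(before, after)
```

With df = number of pairs − 1 = 3, the computed t would then be compared against the tabled critical value, exactly as in the worked examples that follow.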
• Ten upper-level students participate in two seminars where a presentation is followed by students' questions.
• In the first seminar, no feedback is provided by the instructor after a student asks the presenter a question.
• In the second seminar, the instructor offers feedback ("That's an excellent question," "Very interesting question," or "Yes, that had occurred to me as well") after each question.
• The psychologist will test the following hypothesis:
H0: There is no significant mean difference in the number of student questions asked from seminar 1 to seminar 2
H0: μ_seminar_1_questions = μ_seminar_2_questions
• By rejecting H0, the psychologist will find support for the alternative hypothesis:
Ha: There is a significant mean difference in the number of student questions asked from seminar 1 to seminar 2
Ha: μ_seminar_1_questions ≠ μ_seminar_2_questions
Is there a significant difference between the number of questions students ask in the first seminar compared to the number of questions students ask in the second seminar? The number of questions asked by each student in both seminars and
the solution to the problem are in Figure 7.2.
Figure 7.2: Calculating the before/after and within-treatment t
(Seminar 1 and Seminar 2 question counts for the 10 students, followed by the worked solution.)
1. Determine the difference between each pair of scores, d, by subtraction.
2. Determine the mean of the differences, the d values (Md): Md = Σd / 10 = −11 / 10 = −1.1
3. Calculate the standard deviation of the d values (sd). Verify that sd = 1.101
4. Just as the standard error of the mean in the earlier tests was s/√n, determine the standard error of the mean for the difference scores (SEMd) by dividing the result of Step 3 by the square root of the number of pairs. Verify that SEMd = sd / √np = 1.101 / √10 = 0.348
5. Divide Md by SEMd to determine t: t = Md / SEMd = −1.1 / 0.348 = −3.161
6. As noted earlier, the degrees of freedom for the critical value of t for this test are the number of pairs of scores, np − 1. t0.05(9) = 2.262
The calculated value of t exceeds the critical value from Table 5.1 (which is also Table B in the Appendix). The result is statistically significant. Note that it
is the absolute value of the calculated t in which we are interested. Because the question was whether there is a significant difference in the number of questions, it is a two-tailed test, and it does not matter which session had the greater number; it also does not matter whether Session 1 is larger than Session 2 or the other way around. The students in the second session, where questions were followed by feedback, asked significantly more questions than the students did in the first session, when the instructor offered no feedback.
The Degrees of Freedom, the Dependent-Groups Test, and Power
With Md = −1.1, there is comparatively little difference between the two sets of scores. What makes such a small mean difference statistically significant? The answer is in the amount of error variance in this problem. When the error variance is also very small (the standard error of the difference scores is just .348), comparatively small mean differences can be statistically significant. The rationale for using dependent-groups tests as opposed to independent-group designs is that the former are comparatively more powerful; there is less error to contend with, thereby increasing the probability of rejecting the null hypothesis. This brings us to the discussion of power in statistical testing.
Table B in the Appendix, the critical values of t, indicates that
critical values decline as degrees of freedom increase. That occurs not only in the critical values for t but also for F in analysis of variance, and in fact for most tables of critical values for statistical tests. For the dependent-groups t-test, the degrees of freedom are based on
• the number of pairs of related scores, minus 1.
For the independent-groups t-test, the degrees of freedom are based on
• the number of scores in both groups, minus 2 (Chapter 5).
This means that critical values are larger in a dependent-groups test for the same number of raw scores involved. But even a test with a larger critical value can produce significant results when there is more control of error variance. This is what the dependent-groups test provides. The central point is this: When each pair of scores comes from the same participant, or from a matched pair of participants, the random variability from nonequivalent groups is minimal because scores tend to vary similarly for each pair, resulting in relatively little error variance. The small SEMd value that results more than compensates for the fewer degrees of freedom and the associated larger critical value connected to dependent-groups tests.
In statistical testing, power is defined as the likelihood of detecting a significant difference when it is present. The more powerful statistical test is the one that will most readily detect a significant difference. As long as the sets of scores are closely related, the dependent-measures test is more powerful than the independent-groups equivalent.
Try It! D: What does it mean to say that the within-subjects test has more power than the independent t-test?
A Matched-Pairs Example
Another form of the dependent-groups t-test is the matched-pairs design. In this approach, rather than measure the same people repeatedly, each participant in one group is paired with a participant in the other group who is similar.
For example, consider a market analyst who wants to determine whether a television commercial will induce consumers to spend more on a breakfast cereal. The analyst selects a group of consumers entering a grocery store, induces them to view the television commercial, and then tracks their expenditures on breakfast cereal.
A second group is selected, and they also shop, but they do not view the television commercial. The analyst selects people for the second group who match the age and gender characteristics of those in the first group. This controls for age and gender because those characteristics might affect spending for the particular product. Each individual from Group 1 has a companion in Group 2 of the same age and sex. The expenditures in dollars for the members of each group and the solution to the problem are in Figure 7.3.
Figure 7.3: Calculating a matched-pairs t-test
(Expenditures for the Viewed and Did-not-View groups, followed by the worked solution.)
Verify that Md = 1.125 and sd = 2.092
SEMd = sd / √np = 2.092 / √10 = 0.662
t = Md / SEMd = 1.125 / 0.662 = 1.700
t0.05(9) = 2.262
The absolute value of t is less than the critical value from Table 5.1 (or Appendix Table B) for df = 9. The difference is not statistically significant. There
are probably several ways to explain the outcome, but we will explore just three.
• The most obvious explanation is that the commercial did not work. The shoppers who viewed the commercial were not induced to spend significantly more than those who did not view it.
• Another explanation has to do with the matching. Perhaps age and gender are not related to how much people spend shopping for the particular product. Perhaps the shopper's level of income is the most important characteristic, and that was not controlled in the pairing.
• Another explanation is related to sample size. Small samples tend to be more variable than larger samples, and variability is what the denominator in the t-ratio reflects. Perhaps if this had been a larger sample, the SEMd would have had a smaller value, and the t would have been significant.
The second explanation points out the disadvantage of matched-pairs designs compared to repeated-measures designs. The individual conducting the study must be in a position to know which characteristics of the participants are most relevant to explaining the dependent variable so that they can be matched in both groups. Otherwise, it is impossible to know whether a nonsignificant outcome reflects an inadequate match, control of the wrong variables, or a treatment that just does not affect the DV.
Comparing the Dependent-Samples t-Test to the Independent t
In order to compare the dependent-samples t-test and the independent t more directly, we are going to apply both tests to the same data. This will illustrate how each test deals with error variance; however, a caution is necessary before beginning: Once data is collected, there really is no situation where someone can choose which test to use, because either the groups are independent, or they are not. Therefore, we proceed purely as an academic exercise, recognizing that such a situation is not going to happen in the ordinary course of events.
As an example, a university program encourages students to take a service learning class that emphasizes the importance of community service as a part of the students' educational experience. Data is gathered on the number of hours former students spend in community service per month after they complete the course and graduate from the university.
• For the independent t-test, the students are divided between those who took a service learning class and graduates of the same year who did not.
• For the dependent-groups t-test, those who took the service learning class are matched to a student with the same major, age, and gender who did not take
the class. The data and the solutions to both tests are in Figure 7.4.
Figure 7.4: The before/after t-test versus the independent t-test
(Community service hours for the Class and No Class groups, followed by both solutions.)
Because the differences between the scores are quite consistent, as they tend to be when participants are matched effectively, there is very little variance in the difference scores. This results in a comparatively small standard deviation of difference scores and a small standard error of the mean for the difference scores. This allows t ratios with even relatively small numerators to be statistically significant. Because for the independent t-test there is no assumption that the two groups are related, error variance is based on the differences within the groups of raw scores, and the denominator is large enough that the t value is not significant.
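That contrast can be reproduced in a few lines of code. The data below are hypothetical (the Figure 7.4 raw scores are not reprinted here), but they show the same pattern: when pairs of scores are highly related, the paired test reaches significance while the independent test on the identical numbers does not:

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical matched data: each "class" score is paired with a similar "no class" score
class_hours = [3, 5, 2, 6, 4, 7, 3, 5, 4, 6]
no_class_hours = [2, 5, 2, 5, 3, 6, 3, 4, 4, 5]
n = len(class_hours)

# Dependent-groups t: error term based on the difference scores
d = [a - b for a, b in zip(class_hours, no_class_hours)]
t_paired = mean(d) / (stdev(d) / sqrt(n))

# Independent t: error term based on the within-group variability of the raw scores
se_d = sqrt(variance(class_hours) / n + variance(no_class_hours) / n)
t_independent = (mean(class_hours) - mean(no_class_hours)) / se_d
```

Here t_paired is about 3.67, exceeding t0.05(9) = 2.262, while t_independent is about 0.91, short of t0.05(18) = 2.101: the same mean difference, but very different error terms.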
As a matched-pairs t-test the results are: t = Md / SEMd = 0.650 / 0.211 = 3.081; t0.05(9) = 2.262. The result is significant.
As an independent t-test: SEd = √(SEM1² + SEM2²) = √(0.453² + 0.316²) = 0.553
t = (M1 − M2) / SEd = (3.50 − 2.850) / 0.553 = 1.175; t0.05(18) = 2.101. The result is not significant.
The Dependent-Groups t-Test on Excel
If the problem in Figure 7.4 is completed in Excel as a dependent-groups test, the procedure is as follows:
• Create the data file in Excel. Column A is labeled Class to indicate those who had the service learning class, and
column B is labeled No Class. Enter the data, beginning with cell A2 for the first group and cell B2 for the second group.
• Click the Data tab at the top of the page.
• At the extreme right, choose Data Analysis.
• In the Analysis Tools window, select t-test: Paired Two Sample for Means and click OK.
• There are two blanks near the top of the window for Variable 1 Range and Variable 2 Range. In the first, enter A2:A11, indicating that the data for the first (Class) group is in cells A2 to A11. In the second, enter B2:B11 for the No Class group.
• Indicate that the hypothesized mean difference is 0. This reflects the value for the mean of the distribution of difference scores.
• Indicate A13 for the output range, so that the results do not overlay the data scores.
• Click OK. Widen column A so that all the output is readable.
The result is the screenshot that is Figure 7.5.
Figure 7.5: The Excel output for the dependent-samples t-test using the data from Figure 7.4
In the Excel solution, t = 3.074 rather than the 3.081 from the longhand solution. The Excel approach is to calculate the correlation between scores to find a solution, rather than to determine the difference between scores as we did. Note that the Pearson correlation (which will be explained in Chapter 8) is indicated at .91. In any event, the very minor difference, .007, between the solution in Figure 7.4 and the Excel solution in Figure 7.5 is not relevant to the outcome, as it is attributable to rounding errors. The Excel output also indicates results for both one-tailed and two-tailed tests at p = .05; the outcome is statistically significant.
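The two computational routes, the longhand difference scores and Excel's correlation-based error term, are algebraically equivalent, which a short check can demonstrate. The data here are hypothetical; the point is only that both routes produce the same t:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired scores
x = [12, 9, 14, 10, 15]
y = [10, 8, 11, 9, 13]
n = len(x)

# Route 1: difference scores (the longhand approach in the text)
d = [a - b for a, b in zip(x, y)]
t_diff = mean(d) / (stdev(d) / sqrt(n))

# Route 2: correlation-based error term (the approach Excel uses)
mx, my = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)
r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)
sem_d = sqrt((sx**2 + sy**2 - 2 * r * sx * sy) / n)
t_corr = (mx - my) / sem_d
```

Both routes give t of about 4.81 here; a discrepancy like the .007 in the text comes from rounding intermediate values, not from the method.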
  • 89. cannot properly regulate blood sugar. Management of this disease is achieved by controlling normal levels of glucose in the blood for as much of the time as possible. This requires an accurate, portable glucose monitor for home use. A medical device company has developed a new portable glucose monitor and wishes to compare it against a laboratory standard. This will produce a data set in which two different monitors measure the glucose level of 11 randomly chosen diabetes patients. Although the two monitors take the blood samples at the same time, this can be considered an example of the before/after dependent-samples t-test because the same group is measured twice. By using the same set of patients for both monitors, each patient is his or her own control. Obtaining two measurements for each patient reduces measurement variability compared to using two independent sets of patients. Choosing a level of significance of p ≤ .05, we use the paired-sample t-test to test the null hypothesis that there is no difference in measure- ments between the two monitors. • H0: mglucose_portable_monitor 5 mglucose_lab_monitor By rejecting H0 the company will find support for the alternative hypothesis that there is a significant mean difference in the glucose level between both machines. • Ha: mglucose_portable_monitor ? mglucose_lab_monitor
The glucose readings from each of the two monitors are measured in milligrams per deciliter and are shown in the following table. There is a large variability within each column because each patient is different, and the readings were taken at various times of the day.
Patient   Portable Monitor   Laboratory Standard
A         112                120
B          85                 82
C         103                116
D         154                168
E          65                 75
F          52                 51
G          85                 96
H          72                 79
I         167                178
J         123                141
K         142                153
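Using the readings in the table above, the paired-sample t statistic can be verified directly (a sketch using Python's standard library; the variable names are my own):

```python
from math import sqrt
from statistics import mean, stdev

# Glucose readings (mg/dL) for patients A through K
portable = [112, 85, 103, 154, 65, 52, 85, 72, 167, 123, 142]
lab = [120, 82, 116, 168, 75, 51, 96, 79, 178, 141, 153]

# Difference score for each patient: portable reading minus laboratory reading
d = [p - l for p, l in zip(portable, lab)]
md = mean(d)                      # mean difference: -9.0 mg/dL
sem_d = stdev(d) / sqrt(len(d))   # standard error of the mean difference
t = md / sem_d                    # about -4.82, matching the Excel t Stat
```

The magnitude of t, about 4.82, exceeds the two-tailed critical value, so the null hypothesis is rejected, matching the Apply It! conclusion.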
Apply It! (continued)
The Excel solution follows:
                                Variable 1   Variable 2
Mean                            105.45       114.45
Variance                        1428.67      1736.27
Observations                    11           11
Pearson Correlation             0.99
Hypothesized Mean Difference    0.00
df                              10
t Stat                          -4.817
P(T<=t) one-tail                0.0003
t crit one-tail                 1.812
P(T<=t) two-tail                0.0007
t crit two-tail                 2.228
The magnitude of the calculated value of t = -4.817 exceeds the critical two-tail value from the table. The result is statistically significant, so we reject the null hypothesis that the means are the same. The portable monitor measures glucose levels lower than the laboratory standard. Based on the results of this test, the company continued research on the portable monitor until it could devise a solution that would more accurately replicate laboratory standard results.
Apply It! boxes written by Shawn Murphy.
Comparing the Three Dependent t-Tests With the Independent t-Test
The before/after and matched-pairs approaches to calculating a dependent-groups t-test each have advantages. The before/after design provides the greatest control over the extraneous variables that can confound the results in a matched-pairs design. When using the matching approach, there is always a chance that subjects in Group 2 are not matched closely enough on some relevant variable and the resulting mismatches create error variance. In the service learning example, students were matched according to age, major, and gender. But if marital status affects students' willingness to be involved in community service and it is not controlled, there could be an imbalance of married/not-married
  • 93. CHAPTER 7Section 7.3 The Within-Subjects F students that confounds results. The before/after procedure involves the same students, and unless their status on some important variable changes between measures (a rash of marriages between the first and second measurement, for example), there is going to be better control of error variance with that approach. Note that the matched-pairs and the within-treatments approach also assume a large sam- ple from which to draw in order to select participants who match those in the first group. As the number of variables on which participants must be matched increases, so must the size of the sample from which to draw in order to find participants with the correct com- bination of characteristics. The advantage of the matched-pairs design, on the other hand, is that it takes less time to execute. The treatment group and the control group can both be involved in the study at the same time. By way of a summary, note the comparisons in Table 7.1. Table 7.1: Comparing the t-tests Independent t Before/After Matched-Pairs Within- Treatments Groups Independent
  • 94. groups One group measured twice Two groups: each subject from the first group matched to one in the second One group measured twice for two treatments Denominator/ error term Within groups variability plus between groups Only within groups variability Only within groups variability Only within groups variability 7.3 The Within-Subjects F Sometimes two measures of the same group are not enough to
  • 95. track changes in the DV. Maybe the researchers running the service learning study want to compare how much time students devoted to community service the year they graduated, one year later, and then two years after graduation. The within-subjects F is a dependent-groups procedure for two or more groups of scores when the dependent variable is interval or ratio scale. Because the dependent-groups t-test is the repeated-measures equivalent of the indepen- dent t-test, the within-subjects F is the repeated-measures or matched-pairs equivalent of the one-way ANOVA. The same Ronald A. Fisher who developed analysis of variance also developed this test, which is a form of ANOVA, and the test statistic is still F. Here too, the dependent groups can be formed by either repeatedly measuring the same group or by matching separate groups of participants on the relevant variables. When there are more than two groups, matching becomes increasingly problematic, however, and although it is theoretically possible to match any number of participant groups, it is suk85842_07_c07.indd 251 10/23/13 1:29 PM CHAPTER 7Section 7.3 The Within-Subjects F a highly complex undertaking to match all the relevant variables across more than two or three measures. Repeatedly measuring the same participants is much more common than
  • 96. matching. Managing Error Variance in the Within-Subjects F Recall from Chapter 6 that when Fisher developed ANOVA, he shifted away from calcu- lating score variability with the standard deviation, standard error of the mean, and so on and used sums of squares instead. The particular sums of squares computed are the key to the strength of this procedure. If a group of participants in a study is measured on a dependent variable at three different intervals and their scores are recorded in parallel columns, the researcher will have a data sheet similar to the following: First Measure Second Measure Third Measure Participant 1 . . . Participant 2 . . . • The column scores are the equivalent of scores from the different groups in a one- way ANOVA, and any differences from column to column reflect the effect of the IV, the treatment. • The participant-to-participant differences, the within-group differences, are reflected in the differences in the scores from row to row. Those differences are error variance just as they are with the one-way ANOVA.
  • 97. • The within-subjects F approach is to calculate the variability between rows (the within-groups variance), and then, because it comes from participant-to- participant differences that are the same in each group, to eliminate it from further analysis. • The only error variance that remains is that which does not stem from the person- to-person differences. In the dependent-samples t-test, the within-subjects variance is managed by reducing the denominator in the t ratio according to how highly correlated the two sets of measures are (the Excel approach) or by the longhand approach of using the standard deviation of the difference scores, which is relatively small when scores are related. In the within-subjects F, the variability within groups is calculated and then adjusted if there are issues with too much variance between pairs of treatments. This detection of variance is based on the Mauchly’s test of sphericity (W) developed by John W. Mauchly in 1940. If the W-test is significant (p , .05), then there is a violation of sphericity, which means that there is too much variance within the group across pairs of times/treat- ments (see Table 7.2). Therefore, since sphericity is violated, degrees of freedom adjust- ments are made that include the Greenhouse-Geisser or Huynh- Feldt calculations. These are adjustments of the degrees of freedom (df ) based on their
  • 98. respective epsilon or suk85842_07_c07.indd 252 10/23/13 1:29 PM CHAPTER 7Section 7.3 The Within-Subjects F e-values (discussed more in Chapter 8). Of the two options, the Greenhouse-Geisser is more conservative in that it is harder to reject the null hypothesis, with a lower prob- ability of a type I error. The Huynh-Feldt is based on a bias corrected value that is not as conservative. One final note regarding error variance is that it can only be calculated across comparison of pairs of treatments so therefore the W-test for dependent- samples t-test is not necessary since there is only one pair of values. In addition, the test for sphericity cannot be done in the one-way ANOVA because the amount of variability within groups is different for each group, and there is no way to separate it from the balance of the error variance in the problem, which can be a severe limitation in affecting the power of between-group designs. An example and interpretation of sphericity will be shown in the SPSS example later in the chapter. Table 7.2: The concept of sphericity Patient Tx A Tx B Tx C Tx 12Tx 2 Tx 12Tx 3 Tx 22Tx 3
1         30     27     20      3          10           7
2         35     30     28      5           7           2
3         25     30     20     −5           5          10
4         15     15     12      0           3           3
5          9     12      7     −3           2           5
Variance                       17          10.3        10.3
A Within-Subjects F Example
An industrial/organizational psychologist is conducting a study of employees who assemble electronic components. The study examines how productivity changes during the length of time employed. The psychologist identifies five workers hired in the same month and then gauges the number of assembled components each employee averages per hour one week, one month, and then two months after beginning work. Is there a relationship between the number of completed components and the length of time employed? The data for the five employees follows:
Products Assembled per Hour
          1 week   1 month   2 months
Diego     2        5         4
Harold    4        7         7
Wilma     3        6         5
Carol     4        5         6
Moua      5        8         9
• The independent variable (the IV, the treatment) is the time elapsed.
• The dependent variable (the DV) is the number of components assembled.
• The issue is whether there are significant differences in the measures from column to column (over time).
In Chapter 6 the variability related to the IV was measured in the sum of squares between (SSbet). The same source of variance is gauged here, except that it is called the sum of squares between columns (SScol).
The Components of the Within-Subjects F
Calculating the within-subjects F begins just as the one-way ANOVA begins, by determining all variability from all sources with the sum of squares total (SStot). It is even calculated the same way as it was in Chapter 6:
1. The sum of squares total.
SStot = Σ(x − MG)²
a. Subtract each score from the mean of all the scores from all the groups,
b. square the difference, and then
c. sum the squared differences.
The balance of the problem is completed with the following steps:
2. The sum of squares between columns (SScol). This equation is much like SSbet in the one-way ANOVA. The scores in each column are treated the same way the individual groups are treated in the one-way ANOVA. For columns 1, 2, through k,
SScol = (Mcol 1 − MG)²·ncol 1 + (Mcol 2 − MG)²·ncol 2 + . . . + (Mcol k − MG)²·ncol k    Formula 7.2
a. calculate the mean for each column of scores,
b. subtract the mean for all the data (MG) from each column mean,
c. square the result, and
d. multiply the squared result by the number of scores in the column.
3. The sum of squares between rows. This, too, is like the SSbet from the one-way problem, except that it treats the scores for each row as a separate group. For rows 1, 2, through i,
SSrows = (Mrow 1 − MG)²·nrow 1 + (Mrow 2 − MG)²·nrow 2 + . . . + (Mrow i − MG)²·nrow i    Formula 7.3
a. calculate the mean for each row of scores,
b. subtract the mean for all the data from each row mean,
c. square the result, and
d. multiply the squared result by the number of scores in the row.
4. The residual sum of squares is the error term in the within-subjects F. It is the equivalent of SSwith in the one-way ANOVA. With the within-subjects F, the variability in scores due to person-to-person differences within the same measure is calculated, and because it is the same for each set of measures, it is eliminated. This will result in a reduced error term. It is determined as follows:
SSresid = SStot − SScol − SSrows    Formula 7.4
a. Take all variance from all sources (SStot),
b. subtract from it the treatment effect (SScol), and
c. subtract the person-to-person differences (SSrows).
The Within-Subjects F Calculations
When the sums of squares values are completed, the next step is to complete the ANOVA table. The degrees-of-freedom values are as follows:
• dftot = N − 1
• dfcol = number of columns − 1
• dfrows = number of rows − 1
• dfresid = dfcol × dfrows
Just as with one-way problems, the mean square values are calculated by dividing the sums of squares by their degrees of freedom. The only MS values required, the components of the F value, are the MScol, which includes the treatment effect, and the MSresid, which is the error term. The MS is not determined for total or for rows. The F value in the within-subjects ANOVA is then MScol ÷ MSresid.
The calculations and the table for the products-assembled-per-hour problem are in Figure 7.6.
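Formulas 7.2 through 7.4 can be checked against the products-assembled-per-hour data with a short script (a sketch; the variable names are my own):

```python
# Within-subjects F for the products-assembled-per-hour data in the text.
# Rows are employees; columns are 1 week, 1 month, and 2 months on the job.
data = [
    [2, 5, 4],   # Diego
    [4, 7, 7],   # Harold
    [3, 6, 5],   # Wilma
    [4, 5, 6],   # Carol
    [5, 8, 9],   # Moua
]
n_rows, n_cols = len(data), len(data[0])
all_scores = [x for row in data for x in row]
mg = sum(all_scores) / len(all_scores)  # grand mean

# SStot: all variability from all sources
ss_tot = sum((x - mg) ** 2 for x in all_scores)

# SScol: the treatment effect (Formula 7.2)
col_means = [sum(row[j] for row in data) / n_rows for j in range(n_cols)]
ss_col = sum(n_rows * (m - mg) ** 2 for m in col_means)

# SSrows: the person-to-person differences (Formula 7.3)
row_means = [sum(row) / n_cols for row in data]
ss_rows = sum(n_cols * (m - mg) ** 2 for m in row_means)

# SSresid: the error term (Formula 7.4)
ss_resid = ss_tot - ss_col - ss_rows

df_col = n_cols - 1
df_resid = df_col * (n_rows - 1)
f_ratio = (ss_col / df_col) / (ss_resid / df_resid)
```

This reproduces the values in Figure 7.6: SStot = 49.333, SScol = 22.533, SSrows = 23.333, SSresid = 3.467, and F = 26.0.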
Try It!
E. How is the error term in the within-subjects F different from that in the one-way ANOVA?

Figure 7.6: A within-subjects F example

The calculated value of F exceeds the critical value of F from the table. The number of products assembled per hour is significantly different according to the amount of time the employee has been on the job. The significant F indicates that this much difference between measures is unlikely to have occurred by chance.

[Figure 7.6 shows the products-assembled-per-hour scores for five employees (Diego, Moua, Carol, Wilma, and Harold) at 1 week, 1 month, and 2 months on the job. Column means: 3.6, 6.2, and 6.2; row means: 3.667, 6.0, 4.667, 5.0, and 7.333; grand mean (MG) = 5.333.]

The Products Assembled per Hour: The ANOVA Table

Source     SS       df    MS       F      Fcrit
Columns    22.533    2    11.267   26.0   4.46
Rows       23.333    4
Residual    3.467    8     0.433
Total      49.333   14

1. SStot = Σ(x − MG)² = (2 − 5.333)² + (4 − 5.333)² + . . . + (9 − 5.333)² = 49.333

2. SScol = (Mcol1 − MG)²ncol1 + (Mcol2 − MG)²ncol2 + . . . + (Mcolk − MG)²ncolk
   = (3.6 − 5.333)²5 + (6.2 − 5.333)²5 + (6.2 − 5.333)²5 = 22.533

3. SSrows = (Mr1 − MG)²nr1 + (Mr2 − MG)²nr2 + . . . + (Mri − MG)²nri
   = (3.667 − 5.333)²3 + (6.0 − 5.333)²3 + (4.667 − 5.333)²3 + (5.0 − 5.333)²3 + (7.333 − 5.333)²3 = 23.333

4. The residual sum of squares: SSresid = SStot − SScol − SSrows = 49.333 − 22.533 − 23.333 = 3.467

Completing the Post Hoc Test

Ordinarily, the calculation of F leaves unanswered the question of which set of measures is significantly different from which. However, in this particular problem there is only one possibility. Because both the 1-month and the 2-month groups of measures have the same mean (M = 6.20), they must both be significantly different from the only other group of measures in the problem, the 1-week-on-the-job measures, for which M = 3.6. As a demonstration, HSD is completed anyway.
The HSD procedure is the same as for the one-way test, except that the error term is now MSresid. Substituting MSresid for MSwith in the formula provides

HSD = x√(MSresid/n)

where x is a value from Appendix Table D. It is based on the number of means, which is the same as the number of groups of measures, 3 in the example, and the df for MSresid, which are 8; n = the number of scores there are in any one measure, 5 in this instance. For the number-of-products-assembled-per-hour study,

HSD = 4.04√(.433/5) = 1.19

A difference of 1.19 or greater between any pair of means is statistically significant. Using the same approach used in Chapter 6, a matrix indicating the difference between each pair of means makes it easier to interpret the HSD value.

                 1 week (3.6)   1 month (6.2)   2 months (6.2)
1 week (3.6)     diff = 0       diff = 2.6*     diff = 2.6*
1 month (6.2)                                   diff = 0
2 months (6.2)

*Indicates a significant difference.

The 1-week measures of productivity are significantly different from the 1-month and 2-month measures of productivity. Because the mean values of the 1- and 2-month measures are the same, neither of the last two measures is significantly different from the other. The largest increase in productivity comes between the first week and the first month of employment.

Calculating the Effect Size

The final question for a significant F is the question of the practical importance of the result. Using partial-eta squared as the measure of effect size yields the following formula:

partial-η² = SScol / (SSresid + SScol)

For the problem just completed, SScol = 22.533 and SSresid = 3.467, so

partial-η² = 22.533/26 = 0.87

Approximately 87% of the variance in productivity can be explained by how long the individual has been on the job.

Apply It! Pilot Program Revisited

Let us return to the example of the middle school that adopted a meditation program known as quiet time to relieve stress, increase test scores, and improve student behavior. In Chapter 5, we used a one-sample t-test to determine that a statistically significant increase in GPAs occurred among participating students. Now, we will use a within-subjects F test to see if their stress levels have decreased.

Ten randomly chosen students who participated in the program filled out questionnaires about their stress levels. The aggregate score was from 1 to 10, with 10 indicating the most stress. The survey was given before the start of the program and at 3-month intervals. The time elapsed represents the independent variable, the treatment effect that drives this analysis. The dependent variable is the stress score. The within-subjects F is a dependent-groups procedure for two or more groups of scores for which the dependent variable is interval or ratio scale. In this example, we have four groups of scores. Results of the stress questionnaires follow.

              0 Months   3 Months   6 Months   9 Months
Student 1         7          6          6          6
Student 2         9          6          5          5
Student 3         7          5          5          4
Student 4         5          3          3          2
Student 5         7          6          4          4
Student 6         8          5          7          5
Student 7         5          4          4          3
Student 8         7          5          6          5
Student 9         6          6          4          4
Student 10        7          5          5          5

(continued)
Apply It! (continued)

The following table shows results of the within-subjects F test calculations.

Source     SS       df    MS       F
Columns    34.475    3    11.492   26.36
Subjects   35.725    9
Residual   11.775   27     0.436
Total      82.000   39

F.05(3, 27) = 2.96

The F value of 26.36 is greater than the critical F value of 2.96, so the results are statistically significant. Because the calculation of F did not identify the measures that were significantly different from the others, we calculate HSD using the following formula:

HSD = x√(MSresid/n) = 3.875√(0.436/10) = 0.81

A difference of 0.81 or greater between any pair of means is statistically significant. A matrix indicating the difference between each pair of means makes it easier to interpret the HSD value.

                  0 months (6.8)   3 months (5.1)   6 months (4.9)   9 months (4.3)
0 months (6.8)                     diff = 1.7*      diff = 1.9*      diff = 2.5*
3 months (5.1)                                      diff = 0.2       diff = 0.8
6 months (4.9)                                                       diff = 0.6
9 months (4.3)

The differences marked with an asterisk are significant. The largest decrease in stress occurs during the first 3 months of the program. To determine the practical importance of these numbers, partial-eta squared is used. For the problem just completed, SScol = 34.475 and SSresid = 11.775, so

partial-η² = 34.475/46.25 = 0.75
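The stress-survey calculations in this Apply It! box can be verified in a few lines of Python. This is a sketch; the multiplier 3.875 is the tabled studentized-range value the box uses, and the table's Total of 82.000 reflects rounding (the sums of squares below add to 81.975):

```python
# Stress scores from the Apply It! box: 10 students x 4 measurement times.
data = [[7, 6, 6, 6], [9, 6, 5, 5], [7, 5, 5, 4], [5, 3, 3, 2], [7, 6, 4, 4],
        [8, 5, 7, 5], [5, 4, 4, 3], [7, 5, 6, 5], [6, 6, 4, 4], [7, 5, 5, 5]]
n_subj, k = len(data), len(data[0])
grand = sum(x for row in data for x in row) / (n_subj * k)

col_means = [sum(row[j] for row in data) / n_subj for j in range(k)]   # 6.8, 5.1, 4.9, 4.3
ss_col = sum(n_subj * (m - grand) ** 2 for m in col_means)             # 34.475
ss_subj = sum(k * (sum(row) / k - grand) ** 2 for row in data)         # 35.725
ss_tot = sum((x - grand) ** 2 for row in data for x in row)
ss_resid = ss_tot - ss_col - ss_subj                                   # 11.775

ms_col = ss_col / (k - 1)                        # 34.475 / 3
ms_resid = ss_resid / ((k - 1) * (n_subj - 1))   # 11.775 / 27
F = ms_col / ms_resid                            # about 26.35; 26.36 with rounded MS values

hsd = 3.875 * (ms_resid / n_subj) ** 0.5         # about 0.81
print(col_means, round(F, 2), round(hsd, 2))
```

All three pairwise differences involving the 0-month mean (1.7, 1.9, and 2.5) exceed the HSD of 0.81, matching the matrix above.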
Apply It! (continued)

About 75% of the variance in stress can be explained by how long the student has been enrolled in the program. The within-subjects F test allowed analysis of students' stress levels at multiple times throughout the year and showed that the program was reducing stress levels by significant amounts.

Apply It! boxes written by Shawn Murphy.

Comparing the Within-Subjects F and the One-Way ANOVA

In the one-way ANOVA, within-group variance is different for each group because each group is made up of different participants. There is no way to eliminate the error variance as it was eliminated for the within-subjects F, because that source of error variance cannot be separated from the balance of the error variance. The smaller error term in the within-subjects test (which is the divisor in the F ratio) allows relatively small differences between the sets of measures to result in a significant F.

This is illustrated by using the same data as the example of the workers who assemble electronic components, except here we calculated a one-way ANOVA instead of the within-subjects F. This is for illustration only, because groups are either independent or dependent; there is no situation in which, once the test is conducted, someone would wonder which approach is appropriate. The SStot and the SSbet will be the same as the SStot and the SScol are in the within-subjects problem.

SStot = 49.333
SSbet = 22.533
SSwith = Σ(xa − Ma)² + Σ(xb − Mb)² + Σ(xc − Mc)²   (Formula 6.3)
       = (2 − 3.60)² + (4 − 3.60)² + . . . + (9 − 6.20)² = 26.80

The value for the SSwith in a one-way ANOVA is the same as SSrows + SSresid in the within-subjects F in Figure 7.6. It has to be, because in the one-way ANOVA there is no way to separate the participant-to-participant differences from the balance of the error variance, since they are different for each group. With the SSrows added back into the error term, note in Table 7.3 the changes made to the ANOVA table and to F in particular.

• The degrees of freedom for "within" change to 12 from the 8 for residual, which results in a smaller critical value for the independent-groups test, but that adjustment does not compensate for the additional error in the term.
• Note that the sum of squares for the error term jumps from 3.467 in the within-subjects test to 26.80 in the independent-groups test.
• The F value is reduced from 26.0 in the within problem to 5.045 in the one-way problem, a factor of about 1/5.

Because groups are either independent or not, the example is not realistic. Nevertheless, the calculations illustrate the advantage to statistical power of setting up a dependent-groups test, an option researchers have at the planning level.

Table 7.3: The within-subjects F example repeated as a one-way ANOVA

The ANOVA table

Source    SS       df    MS       F       Fcrit
Between   22.533    2    11.267   5.045   3.89
Within    26.800   12     2.233
Total     49.333   14

Another Within-Subjects F Example

A psychologist working at a federal prison is interested in the relationship between the amount of time a prisoner is incarcerated and the number of violent acts in which the prisoner is involved. Using self-reported data, inmates respond anonymously to a questionnaire administered 1 month, 3 months, 6 months, and 9 months after incarceration. The data and the solution are in Figure 7.7.

The results (F) indicate that there are significant differences in the number of violent acts documented for the inmate related to the length of time the inmate has been incarcerated. The HSD results indicate that those incarcerated for 1 month are involved with a significantly different number of violent acts than those who have been in for 3 or 6 months. Those who have been in for 6 months are involved with a significantly different number of violent acts than those who have been in for 9 months. The eta-squared value indicates that about 37% of the variance in number of violent acts is a function of how long the inmate has been incarcerated.
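Both the power contrast in Table 7.3 and the prisoner example's summary values can be checked numerically in Python. This is a sketch working from the means and sums of squares reported in the text, since Figure 7.7's raw scores appear only in the figure; 4.20 is the tabled studentized-range value used there:

```python
from math import sqrt

# Prisoner example (Figure 7.7), reconstructed from reported summary values.
MG = 2.75
col_means = [3.6, 2.2, 1.8, 3.4]          # 1, 3, 6, 9 months (n = 5 inmates)
subj_means = [3.5, 4.0, 1.75, 2.5, 2.0]   # each inmate measured k = 4 times
n, k = 5, 4

ss_col = sum(n * (m - MG) ** 2 for m in col_means)     # 11.75
ss_subj = sum(k * (m - MG) ** 2 for m in subj_means)   # 15.0
ss_tot = 31.75                                         # given in Figure 7.7
ss_resid = ss_tot - ss_col - ss_subj                   # 5.0

F = (ss_col / (k - 1)) / (ss_resid / ((k - 1) * (n - 1)))   # about 9.4
hsd = 4.20 * sqrt((ss_resid / 12) / n)                      # about 1.213
eta_sq = ss_col / ss_tot                                    # about 0.370

# Products example: the same data treated as a one-way ANOVA (Table 7.3).
F_within = (22.533 / 2) / (3.467 / 8)     # about 26.0
F_oneway = (22.533 / 2) / (26.800 / 12)   # about 5.045
print(round(F, 2), round(hsd, 3), round(eta_sq, 3), round(F_oneway, 3))
```

Moving SSrows back into the error term cuts F from roughly 26 to roughly 5 on identical data, which is the whole argument for the dependent-groups design.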
Try It!
F. We compared a one-way ANOVA to a within-subjects F using the same data. How would the eta-squared values for the two problems compare?

Figure 7.7: Another within-subjects F: Violence and the time of incarceration

[Figure 7.7 shows the number of violent acts reported by five inmates at 1, 3, 6, and 9 months after incarceration. Column means: M1 = 3.6, M2 = 2.2, M3 = 1.8, M4 = 3.4; inmate (row) means: 3.5, 4.0, 1.75, 2.5, 2.0; grand mean MG = 2.750.]

The ANOVA Table

1. SStot = Σ(x − MG)² = 31.750
2. SScol = (Mcol1 − MG)²ncol1 + (Mcol2 − MG)²ncol2 + (Mcol3 − MG)²ncol3 + (Mcol4 − MG)²ncol4
   = (3.6 − 2.75)²5 + (2.2 − 2.75)²5 + (1.8 − 2.75)²5 + (3.4 − 2.75)²5 = 11.750
3. SSsubj = (Mr1 − MG)²nr1 + (Mr2 − MG)²nr2 + (Mr3 − MG)²nr3 + (Mr4 − MG)²nr4 + (Mr5 − MG)²nr5
   = (3.5 − 2.75)²4 + (4.0 − 2.75)²4 + (1.75 − 2.75)²4 + (2.5 − 2.75)²4 + (2.0 − 2.75)²4 = 15.0
4. Verify that SSresid = SStot − SScol − SSsubj = 31.75 − 11.75 − 15 = 5.0

F = 9.393; F0.05(3, 12) = 3.49, so F is significant.

The post hoc test: HSD = x0.05√(MSw/n) = 4.20√(0.417/5) = 1.213

            M1 = 3.6   M2 = 2.2   M3 = 1.8   M4 = 3.4
M1 = 3.6               1.4*       1.8*       0.2
M2 = 2.2                          0.4        1.2
M3 = 1.8                                     1.6*
M4 = 3.4

*Indicates a significant difference.

η² = SScol/SStot = 11.75/31.75 = 0.370, so 37% of the variance in violence witnessed is related to how long the inmate has been incarcerated.

A Within-Subjects F in Excel

In spite of the important increase in power that is available compared to independent-groups tests, a dependent-groups ANOVA is not one of the more common tests. It is not one of the options Excel offers in the list of Data Analysis Tools, for example. However, like many statistical procedures, there are a number of repetitive calculations involved, and Excel can simplify these. We will complete the second problem as an example.

1. Set the data up in four columns just as they are in Figure 7.8, but create a blank column to the right of each column of data. With a row at the top for the labels, data begins in cell A2.
2. Calculate the row and column means as well as a grand mean as follows:
   a. For the column means, place the cursor in cell A7 just
beneath the last value in the first column and enter the formula =AVERAGE(A2:A6), followed by Enter. To repeat this for the other columns, left-click on the solution that is now in A7, drag the cursor across to G7, and release the mouse button. In the Home tab, click Fill and then Right. This will repeat the column-means calculations for the other columns. Delete the entries this makes to cells B7, D7, and F7, which are still empty at this point.
   b. For the row means, place the cursor in cell I2 and enter the formula =AVERAGE(A2,C2,E2,G2), followed by Enter. To repeat this for the other rows, left-click on the solution that is now in I2, drag the cursor down to I6, and release the mouse button. In the Home tab, click Fill and then Down. This will repeat the calculation of means for the other rows.
   c. For the grand mean, place the cursor in cell I7 and enter the formula =AVERAGE(I2:I6), followed by Enter (the mean of the row means will be the same as the grand mean; the same could have been done with the column means).
3. To determine the SStot:
   a. In cell B2, enter the formula =(A2−2.75)^2 and press Enter. This will square the difference between the value in A2 and the grand mean. To repeat this for the other data in the column, left-click the cursor in cell B2 and drag down to cell B6. Click Fill and Down. With the cursor in cell B7, click the summation sign (Σ) at the upper right of the screen and press Enter. Repeat these steps for columns D, F, and H.
   b. With the cursor in H9, type in SStot= and press Enter. In cell I9, enter the formula =SUM(B7,D7,F7,H7) and press Enter. The value will be 31.75, which is the total sum of squares.
4. For the SScol:
   a. In cell A8, enter the formula =(3.6−2.75)^2*5 and press Enter. This will square the difference between the column mean and the grand mean and multiply the result by the number of measures in the column, 5. In cells C8, E8, and G8, repeat this for each of the other columns, substituting the mean for each column for the 3.60 that was the column 1 mean.
   b. With the cursor in H10, type in SScol= and press Enter. In cell I10, enter the formula =SUM(A8,C8,E8,G8) and press Enter. The value will be 11.75, which is the sum of squares for the columns.
5. For the SSrows:
   a. In cell J2, enter the formula =(I2−2.75)^2*4 and press Enter. Repeat this for the row means in I3–I6 by left-clicking on what is now J2 and dragging the cursor down to cell J6. Click Fill and Down.
   b. With the cursor in H11, type in SSrow= and press Enter. In cell I11, enter the formula =SUM(J2:J6) and press Enter. The value will be 15.0, which is the sum of squares for the participants.
6. For the SSresid, in cell H12, enter SSresid= and press Enter. In cell I12, enter the formula =I9−I10−I11. The resulting value will be 5.0.

We used Excel to determine all the sum-of-squares values. Now the mean squares are determined by dividing the sums of squares for columns and residual by their degrees of freedom:

MScol = 11.75/3 = 3.917
MSresid = 5/12 = .417
F = MScol/MSresid = 3.917/.417 = 9.393, which agrees with the earlier calculations done by hand.

To create the ANOVA table,
• Beginning in cell A10, type in Source; in B10, SS; in C10, df; in D10, MS; in E10, F; and in F10, Fcrit.
• Beginning in cell A11 and working down, type in total, columns, rows, residual.
• For the sum-of-squares values:
  • In cell B11, enter =I9.
  • In cell B12, enter =I10.
  • In cell B13, enter =I11.
  • In cell B14, enter =I12.
• For the degrees of freedom:
  • In cell C11, enter 19 for total degrees of freedom.
  • In cell C12, enter 3 for columns degrees of freedom.
  • In cell C13, enter 4 for rows degrees of freedom.
  • In cell C14, enter 12 for residual degrees of freedom.
• For the mean squares:
  • In cell D12, enter =B12/C12. The result is MScol.
  • In cell D14, enter =B14/C14. The result is MSresid.
• For the F value, in cell E12, enter =D12/D14. In cell F12, enter the critical value of F for 3 and 12 degrees of freedom, which is 3.49.

The list of commands looks intimidating, but mostly because every keystroke has been included. With some practice, this will become second nature. Figure 7.8 is a screenshot of the result of the calculations.

Figure 7.8: A within-subjects F problem in Excel

7.4 Presenting Results

Using the data from Figure 7.8 and analyzing it in Excel, we see the output table highlighted in yellow. The table is broken down, reading from left to right, into columns that include sum of squares (SS), degrees of freedom (df), mean square (MS), F ratio (F), and F critical value (Fcrit). Interpreting the results, we can see that the F ratio is based on MScol/MSresid, which is 3.92/.416 = 9.4. This value is larger than our F critical value,
indicating significance at the p < .05 level. Recall that a psychologist has collected data on an incarcerated group over a 9-month span and the number of violent crimes they have committed. Upon analyzing the findings, we see that as time elapses from 1 month to 3 months to 6 months to 9 months, there is a significant change in the number of violent acts being committed. However, you cannot be sure where the significant differences occurred, since there are four points in time at which data was captured (1 month, 3 months, 6 months, and 9 months). As a result, post hoc tests will be needed to indicate where these differences lie.

In regard to the hypotheses of the repeated-measures ANOVA, it would be a comparison of mean differences across time. Therefore,

H0: μ1month = μ3months = μ6months = μ9months

The null hypothesis states there is no significant difference between the mean number of violent incidences from 1 month to 3 months to 6 months to 9 months. Keep in mind that ANOVAs are an omnibus test, so we are testing any overall differences between the months. There may be differences between any two months and not necessarily all of the months with each other, which we can follow up with paired comparisons.

Ha: μ1month ≠ μ3months ≠ μ6months ≠ μ9months

The alternative (or research) hypothesis states there is a significant difference between the mean number of violent incidences from 1 month to 3 months to 6 months to 9 months. The alternative can also be a prediction of an increase in the mean number of violent incidences from 1 month to 3 months to 6 months to 9 months:

Ha: μ1month < μ3months < μ6months < μ9months

To analyze and present results using SPSS, let us first look at an example of a paired-sample/dependent t-test, then a repeated-measures ANOVA example.

SPSS Example 1: Steps for a Paired (Matched)-Samples t-Test

From the data set provided (Figure 7.9), a college professor wants to look at mean differences in scores over the first two quizzes of his statistics class. With his scores in SPSS, go to Analyze → Compare Means → Paired-Samples T Test. Input Score 1 in the first box and Score 2 in the second box that is available, as seen in Figure 7.10. Then click OK. The resulting SPSS output tables are provided in Figure 7.11.
Figure 7.9: Data set for quiz scores

Figure 7.10: SPSS steps in performing a paired-samples t-test

Figure 7.11: SPSS results of a paired-samples t-test

[Figure 7.11 output.

Paired Samples Statistics
                  Mean      N    Std. Deviation   Std. Error Mean
Pair 1  Score_1   42.893   14    4.3992           1.1757
        Score_2   38.7857  14    7.15949          1.91345

Paired Samples Test (Score_1 − Score_2): mean difference = 4.10714, Std. Deviation = 7.00598, Std. Error Mean = 1.87243, 95% CI of the difference [.06201, 8.15228], t = 2.193, df = 13, Sig. (2-tailed) = .047]

SPSS Example 2: Steps for a Repeated-Measures ANOVA

This example uses data gathered from the SPSS (PASW) On-Line Training Workshop (1999), available at the following link: http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm

The data is measuring cancer treatments over time (see Figure 7.12):

TOTALCIN = oral condition at the initial stage
TOTALCW2 = oral condition at the end of week 2
TOTALCW4 = oral condition at the end of week 4
TOTALCW6 = oral condition at the end of week 6

Go to Analyze → General Linear Model → Repeated Measures. As shown in Figure 7.13, type in the Within-Subject Factor Name: CW_Times, Number of Levels: 4, and the Measure Name: CW; then click Define. As shown in Figure 7.14, put the four TOTALCW variables in order in the Within-Subjects Variables box, click Plots, and move CW_Times into the Horizontal Axis. Then click Options and move CW_Times into Display Means for, click Compare Main Effects, and select Sidak from the dropdown box just below. Then click Descriptive statistics and Estimates of effect size. Click Continue and OK.

Figure 7.12: Data set of cancer treatments over time (data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm)

Figure 7.13: Repeated-measures steps (part 1)

Figure 7.14: Repeated-measures steps (part 2)

Figure 7.15: SPSS results of cancer treatments over time

[Figure 7.15 begins with the Tests of Within-Subjects Effects table (Measure: CW); the F value for CW_Times is 13.760 under each sphericity correction.]
[Figure 7.15 (continued): Mauchly's Test of Sphericity (Measure: CW) for CW_Times: approx. chi-square = 11.752, df = 5, Sig. = .039, with Greenhouse-Geisser, Huynh-Feldt, and lower-bound epsilon estimates. Table notes: tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix. a. Design: Intercept. Within Subjects Design: CW_Times. b. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.]
[Figure 7.15 (continued): Descriptive Statistics (N = 25 per measure): TOTALCIN M = 6.52, TOTALCW2 M = 8.28, TOTALCW4 M = 10.36, TOTALCW6 M = 9.76, with standard deviations of 1.531, 2.542, 3.475, and 3.566, respectively.]
Figure 7.15: SPSS results of cancer treatments over time (continued) (data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm)

Figure 7.16: SPSS output graph of cancer treatments over time (data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm)

[Figure 7.15 also includes the Pairwise Comparisons table (Measure: CW).]
[Pairwise Comparisons (Sidak-adjusted): mean differences between each pair of treatment times, with significance values and 95% confidence intervals for the differences; the interpretation follows below.]
b. Adjustment for multiple comparisons: Sidak.

Based on the results (Figures 7.15 and 7.16), the Descriptive Statistics table shows that there are differences in the means; how significant those differences are is determined by the ANOVA and the consequent post hoc tests. Next, Mauchly's test of sphericity shows a significant value, based on the χ² distribution, with a significance value at the p < .05 level, indicating a violation of the sphericity assumption. To reiterate, this indicates that the variance between some pairs of treatments or measures differs from that between other pairs; since there were significant differences between pairs of treatments compared to other pairs, a violation has occurred. Therefore, looking at the Tests of Within-Subjects Effects table, sphericity cannot be assumed, and a df adjustment will be made by using the Greenhouse-Geisser or the Huynh-Feldt calculations. As seen in the F value (13.760) and the df, adjustment does
not make any difference, as there are significant differences across CW_Times (p < .05). The Pairwise Comparisons table is where we see between-treatment differences, indicating that all treatment times are significantly different except between times 2 and 4 (p = .262) and times 3 and 4 (p = .800), which are not statistically significant. The line graph also indicates a trend in differences between the first, second, and third treatment times but not much difference from the third to the fourth treatment.

7.5 Interpreting Results

Refer to the most recent edition of the APA manual for specific detail on formatting statistics, but Table 7.4 may be used as a quick guide in presenting the statistics covered in this chapter.

Table 7.4: Guide to APA formatting of F statistic results

Abbreviation or Term   Description
F                      F test statistic score
Partial-η²             Partial-eta-squared: a measure of effect size for ANOVA
W                      Mauchly's Test of Sphericity
χ²                     Distribution used for nonparametric tests such as Mauchly's test of
sphericity and Friedman's ANOVA
SS                     Sum of Squares
MS                     Mean Square

Source: Publication Manual of the American Psychological Association, 6th edition. © 2009 American Psychological Association, pp. 119–122.

Try It!
Access the data and the accompanying video via the links below to perform this analysis yourself. Both links are resources provided by Central Michigan University.
Data link: http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
Video: http://calcnet.mth.cmich.edu/org/spss/V16_materials/Video_Clips_v16/19repeated_measures/19repeated_measures.swf

Using the results from SPSS Example 1, Figure 7.11, we could present the results in the following way:
• There was a significant difference between quiz scores 1 (M = 42.89) and 2 (M = 38.79), as the mean score significantly decreased over time, t(13) = 2.19, p < .05.

Using the results from SPSS Example 2, Figures 7.15 and 7.16, we could present the results in the following way:

• The overall difference between CW_Times was significant using the Greenhouse-Geisser results, F(2.19, 52.66) = 13.76, p < .05, partial-η² = .364.
• Based on the Sidak pairwise comparison, the CW_1 time (M = 6.52, SD = 1.53) was significantly different from all the other times. CW_2 (M = 8.28, SD = 2.54) and CW_4 (M = 10.36, SD = 3.56) were also significantly different from each other.

7.6 Nonparametric Tests

You may have noticed that for every parametric test, there is a
nonparametric equivalent. The rationale behind nonparametric tests is to obtain a conservative estimate of significance when violations of parametric assumptions have occurred. Such violations include nonlinearity, abnormal distributions, and small data sets.

The nonparametric equivalent of the dependent-samples t-test is the Wilcoxon signed-ranks test (not to be confused with Chapter 5's Wilcoxon rank-sum test for the independent-samples t-test). Frank Wilcoxon proposed both of these in a single paper published in 1945. The Wilcoxon signed-ranks W-test is known as the Wilcoxon t-test for dependent samples (not independent ones). In brief, the steps in the calculation of W are calculating the differences between scores, taking the absolute value (removing the +/− sign), ranking the absolute values, reassigning the original (+/−) sign, and then summing the ranks.

The nonparametric equivalent of the parametric repeated-measures ANOVA is Friedman's ANOVA. Essentially the analysis looks at the differences in the mean ranks (instead of means) across time, treatments, or matched/equivalent groups. By analyzing differences in the mean ranks, the analysis is in effect eliminating extreme points, or outliers, in the distribution. As noted, this is the disadvantage of using the mean, as means are affected by outliers. Again, nonparametric tests are a conservative, distribution-free analysis used when parametric violations have occurred. As a result, it is more difficult to find significance; on the other hand, they are conservative in that there is a lower probability of a type I error. One important point to note is that even though mean ranks are used to calculate significant differences between times, treatments, or matched/equivalent groups, the results are reported in terms of the median differences, as will be shown in the next example.

Friedman's nonparametric ANOVA dates back to 1937 and is based on ranked (ordinal) data and the comparison of medians. An alternative test that is nonparametric and similar to Friedman's test is Cochran's Q-test, which is used for dichotomous data (i.e., only two response choices, as in yes/no).
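The Wilcoxon signed-ranks steps just described (difference, absolute value, rank, re-sign, sum) can be sketched in Python. The scores below are hypothetical; ties share an average rank, and zero differences are dropped, a common convention the text does not spell out:

```python
def wilcoxon_w(before, after):
    """Wilcoxon signed-ranks W: rank absolute differences, re-sign, sum each side."""
    diffs = [b - a for a, b in zip(before, after) if b - a != 0]  # drop zero differences
    ordered = sorted(abs(d) for d in diffs)
    # Assign the average rank to tied absolute differences.
    rank = {}
    for v in set(ordered):
        positions = [i + 1 for i, x in enumerate(ordered) if x == v]
        rank[v] = sum(positions) / len(positions)
    w_plus = sum(rank[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank[abs(d)] for d in diffs if d < 0)
    return min(w_plus, w_minus)  # W is the smaller of the two signed-rank sums

# Hypothetical before/after scores for six participants.
before = [10, 12, 9, 15, 11, 13]
after = [12, 11, 13, 18, 11, 17]
print(wilcoxon_w(before, after))  # -> 1.0
```

A small W means nearly all of the ranked differences point the same direction, which is what the test converts into a significance decision.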
Worked Example of the Friedman's Nonparametric ANOVA and Wilcoxon Signed-Ranks W Using SPSS

To perform the Friedman's nonparametric ANOVA using the data set in Figure 7.17, go to Analyze → Nonparametric Tests → Legacy Dialogs → K Related Samples. Place Score_1, Score_2, and Score_3 into the Test Variables box (see Figure 7.18). Click on Statistics, check the Quartiles box, and then click Continue and OK.

Figure 7.17: Data set for the Friedman's ANOVA test
Figure 7.18: Steps in SPSS for the Friedman's nonparametric ANOVA

Figure 7.19: Results of the Friedman's nonparametric ANOVA

[Figure 7.19, Descriptive Statistics: 25th, 50th (median), and 75th percentiles for Score_1, Score_2, and Score_3; the medians are 44.50, 40.00, and 40.25, respectively.]
[Figure 7.19 (continued), Test Statistics: Chi-Square = 4.148, Asymp. Sig. = .126. a. Friedman Test]
The steps for performing the W-test (Figure 7.20) are Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples. Input Score_1 and Score_2 in Row 1, Score_2 and Score_3 into Row 2, and then Score_1 and Score_3 into Row 3. Click on Options, check Quartiles, and click Continue and OK.

Figure 7.20: Steps in SPSS for the Wilcoxon signed-ranks W-test

Figure 7.21: Results of the Wilcoxon signed-ranks W-test
[SPSS output: a Descriptive Statistics table reporting N and the 25th, 50th (median), and 75th percentiles for each score variable.]
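The pairwise post hoc procedure described above can be sketched in Python as well. This is a hedged illustration, not the book's SPSS output: the data are hypothetical stand-ins, and scipy's `wilcoxon` is used for each of the three related-samples pairs.

```python
# Post hoc pairwise Wilcoxon signed-ranks tests for three related
# samples, one test per pair. Data are hypothetical illustrations.
from itertools import combinations
from scipy.stats import wilcoxon

scores = {
    "Score_1": [45, 40, 39, 47, 36, 41, 50, 43, 38, 44, 39, 35, 46, 42, 40],
    "Score_2": [41, 44, 40, 44, 39, 42, 46, 45, 40, 40, 42, 37, 43, 45, 41],
    "Score_3": [42, 41, 43, 45, 37, 46, 48, 44, 42, 41, 40, 39, 44, 43, 44],
}

results = []
for (name_a, a), (name_b, b) in combinations(scores.items(), 2):
    stat, p = wilcoxon(a, b)          # signed-ranks test on the paired differences
    results.append((name_a, name_b, stat, p))
    print(f"{name_a} vs {name_b}: W = {stat:.1f}, p = {p:.3f}")
```

In practice, researchers often adjust the alpha level for these multiple pairwise tests (e.g., a Bonferroni correction), a point this chapter raises in its family-wise error discussion.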
[SPSS output: a Test Statistics table with Z and Asymp. Sig. (2-tailed) for each pairwise comparison (e.g., Z = −1.821, p = .069), and a Ranks table reporting N, Mean Rank, and Sum of Ranks for the negative and positive ranks in each pair. Notes: a. Wilcoxon Signed Ranks Test; b. Based on positive ranks.]
Looking at the results (Figure 7.21) of the Wilcoxon signed-ranks W-test, we see that using it as a post hoc to the Friedman's ANOVA does have benefits, as it identifies a significant difference between Scores 1 and 2 that was not detected using Friedman's ANOVA. This is an important point in the previously noted debate over whether to run post hocs based on the significance of the F value. As a result, the conclusion here, based on the
W-test, is that there is a significant difference between Score_1 (Mdn = 44.50) and Score_2 (Mdn = 40.00), Z = −2.00, p < .05, while there were no significant differences between Score_2 (Mdn = 40.00) and Score_3 (Mdn = 40.25), Z = −0.126, p = .90, or between Score_1 (Mdn = 44.50) and Score_3 (Mdn = 40.25), Z = −1.821, p = .069.

Summary

Any statistical procedure has advantages and disadvantages. The downside of the different independent-groups designs is that subjects within groups often respond to the independent variable differently. Those differences are a source of error variance that is different for each group. No matter how carefully a researcher randomly selects the groups to be used in a study, there are going to be differences in the way that people in the same group respond to whatever stimulus is offered. Both the before/after t-test and the within-subjects F test eliminate that source of error variance by either using the same people repeatedly or matching subjects on the most important characteristics. Controlling error variance results in a test that is more likely to detect a significant difference (Objectives 1 and 5). In dependent-groups designs, using the same group repeatedly allows for the number of participants involved to be fewer (Objectives 1, 2, 3, 4, and 6). One of the downsides to repeated-measures designs is that they take more time to complete. Unless subjects
are matched across measures, the different levels of the independent variable cannot be administered concurrently as they can in independent-groups tests. With more time, there is an increased potential for attrition. If one of the participants drops out of a repeated-measures study, the data is lost from all the measures of the dependent variable for that subject (Objectives 2 and 4). Having noted some of the differences between dependent-groups designs and their independent-groups equivalents, it is important to note their consistencies as well. Whether the test is independent t, before/after t, one-way ANOVA, or a within-subjects F, in each case the independent variable is nominal scale, and the dependent variable is interval or ratio scale (Objective 2). In addition, two repeated-measures designs were performed (i.e., t-tests and ANOVAs) where we presented several scenarios to test their respective null hypotheses to find support for alternative ones (Objectives 3 and 6). Results and conclusions were presented, interpreted, and reported in APA format (Objectives 7 and 8). Finally, the Wilcoxon signed-ranks W-test and the Friedman's nonparametric ANOVA were discussed with an appropriate example (Objective 9).
CHAPTER 7 Key Terms

There is something else that all the tests in this chapter have in common. They all test the hypothesis of difference. Like the z-test and the one-sample t-test, they are about significant differences. Sometimes, however, the question involves the strength of the relationships between variables. Those discussions will introduce correlation and the hypothesis of association, which are the focus of Chapter 8.

Key Terms

before/after t-test
A dependent-groups application of the t-test, also known as a pre/post t-test. In this particular application, one group is measured before and after a treatment.

confounding variables
Variables that influence an outcome but are uncontrolled in the analysis and obscure the effects of other variables. For example, if a psychologist is interested in gender-related differences in problem-solving ability but doesn't control for age differences, differences in gender may be confounded by differences that are actually age-related.

dependent-groups designs
Statistical procedures in which the groups are related, either because multiple measures are taken of the same participants or because each participant in a particular group is matched on characteristics relevant to the analysis to
a participant in the other groups with the same characteristics. Dependent-groups designs reduce error variance because they reduce score variation due to factors unrelated to the independent variable.

matched-pairs or dependent-samples t-test
A dependent-groups application of the t-test. In this particular application, each participant in the second group is paired to a participant in the first group with the same characteristics in order to limit the error variance that would otherwise stem from using dissimilar groups.

sphericity
Nonsignificant differences in the dependent variable across pairs of treatments or times for all participants in the group. By minimizing this within-group error variance, sphericity may be assumed. Significant within-group error variances between pairs of treatments are a violation of sphericity. Such variances are detected using Mauchly's sphericity (W) test.

within-subjects F
The dependent-groups equivalent of the one-way ANOVA. In this procedure either participants in each group are paired on the relevant characteristics with participants in the other groups, or one group is measured repeatedly after different levels of the independent variable are introduced.
CHAPTER 7 Chapter Exercises

Answers to Try It! Questions

The answers to all Try It! questions introduced in this chapter are provided below.

A. Small samples tend to be platykurtic because the data in small samples is often highly variable. This translates into relatively large standard deviations and large error terms.
B. If groups are created by random sampling, they will differ from the population from which they were drawn only by chance. That means that with random sampling, there can be error, but its potential to affect research results diminishes as the sample size grows.
C. The before/after t-test and the matched-pairs t-test differ only in that the before/after test uses the same group twice and the matched-pairs test matches each subject in the first group with one in the second group who has similar characteristics. The calculation and interpretation of the t value are the same in both procedures.
D. The within-subjects test will detect a significant difference more readily than an
independent t-test. Power in statistical testing is the likelihood of detecting significance.
E. Because the same subjects are involved in each set of measures, the within-subjects test allows us to calculate the amount of score variability due to individual differences in the group and eliminate it because it is the same for each group. This source of error variance is eliminated from the analysis, leaving a smaller error term.
F. The eta-squared value would be the same in either problem. Note that in a one-way ANOVA, eta-squared is the ratio of SSbet to SStot. In the within-subjects F, it is SScol to SStot. Because SSbet and SScol both measure the same variance and the SStot values will be the same in either case, the eta-squared values will likewise be the same. What changes, of course, is the error term. Ordinarily, SSresid will be much smaller than SSwith, but those values show up in the F ratio by virtue of their respective MS values, not in eta-squared.

Review Questions

The answers to the odd-numbered items can be found in the answers appendix.

1. A group of clients is being treated for a compulsive behavior disorder. The number of times in an hour that each one manifests the compulsivity is gauged before and
after a mild sedative is administered. The data is as follows:

    Before  After
1.    5      4
2.    6      4
3.    4      3
4.    9      5
5.    5      6
6.    7      3
7.    4      2
8.    5      5

a. What is the standard deviation of the difference scores?
b. What is the standard error of the mean for the difference scores?
c. What is the calculated value of t?
d. Are the differences statistically significant?
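For readers who want to check their hand calculations on a problem like this one, here is a sketch of the before/after (paired) t-test in Python using the Exercise 1 data; scipy's `ttest_rel` implements the dependent-samples test.

```python
# Before/after t-test on the Exercise 1 data: difference scores,
# their SD and SEM, then the paired-samples t statistic.
from statistics import stdev
from scipy.stats import ttest_rel

before = [5, 6, 4, 9, 5, 7, 4, 5]
after = [4, 4, 3, 5, 6, 3, 2, 5]

diffs = [b - a for b, a in zip(before, after)]
sd_diff = stdev(diffs)                   # standard deviation of difference scores
sem_diff = sd_diff / len(diffs) ** 0.5   # standard error of the mean difference

t, p = ttest_rel(before, after)
print(f"SD of differences  = {sd_diff:.3f}")
print(f"SEM of differences = {sem_diff:.3f}")
print(f"t = {t:.3f}, p = {p:.4f}")
```

Comparing t against the two-tailed critical value for df = n − 1 = 7 (or simply checking p against .05) answers part d.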
2. A researcher is examining the impact that a political ad has on potential donors' willingness to contribute. The data indicates the amount (in dollars) each is willing to donate before and after viewing the advertisement.

     Before  After
1.     0      10
2.    20      20
3.    10       0
4.    25      50
5.     0       0
6.    50      75
7.    10      20
8.     0      20
9.    50      60
10.   25      35

a. Are there significant differences in the amount?
b. What is the value of t if this is done as an independent t-test?
c. Explain the difference between before/after and independent t-tests.

3. Participants attend three consecutive sessions in a business
seminar. In the first, there is no reinforcement for responding to the session moderator's questions. In the second, those who respond are provided with verbal reinforcers. In the third, responders receive bits of candy as reinforcers. The dependent variable is the number of times the participants respond in each session.

    None  Verbal  Token
1.    2     4      5
2.    3     5      6
3.    3     4      7
4.    4     6      7
5.    6     6      8
6.    2     4      5
7.    1     3      4
8.    2     5      7
a. Are the column-to-column differences significant? If so, which groups are significantly different from which?
b. Of what data scale is the dependent variable?
c. Calculate and explain the effect size.

4. In the calculations for Exercise 3, what step is taken to minimize error variance?
a. What is the source of that error variance?
b. If Exercise 3 had been a one-way ANOVA, what would have been the degrees of freedom for the error term?
c. How does the change in degrees of freedom for the error term in the within-subjects F affect the value of the test statistic?

5. Because SScol in the within-subjects F contains the treatment effect and measurement error, if there is no treatment effect, what will be the value of F?

6. Why is matching uncommon in within-subjects F analyses?

7. A group of nursing students is approaching the licensing test. The level of anxiety for each student is measured at 8 weeks prior to the test, then 4 weeks, 2 weeks, and 1 week before the test. Assuming that anxiety is measured on an interval scale, are there significant differences?

Student Number
    8 weeks  4 weeks  2 weeks  1 week
1.     5        8        9       9
2.     4        7        8      10
3.     4        4        4       5
4.     2        3        5       5
5.     4        6        6       8
6.     3        5        7       9
7.     4        5        5       4
8.     2        3        6       7

a. Is anxiety related to the time interval?
b. Which groups are significantly different from which?
c. How much of anxiety is a function of test proximity?

8. A psychology department sponsors a study of the relationship between participation in a particular internship opportunity and students' final grades. Eight students in their second year of graduate study are matched to eight students in the same year by grade. Those in the first group participate in
the internship. Students' grades after the second year are compared.

Student Pair Number  Internship  No Internship
1.                      3.6          3.2
2.                      2.8          3.0
3.                      3.3          3.0
4.                      3.8          3.2
5.                      3.2          2.9
6.                      3.3          3.1
7.                      2.9          2.9
8.                      3.1          3.4

a. Are the differences statistically significant?
b. This should be done as a dependent-samples t-test. Why, given that there are two separate groups involved?

9. A team of researchers associated with an accrediting body studies the amount of time professors devote to their scholarship before and after they receive tenure. Scores are hours per week.

Professor Number  Before Tenure  After Tenure
1.                     12             5
2.                     10             3
3.                      5             6
4.                      8             5
5.                      6             5
6.                     12            10
7.                      9             8
8.                      7             7

a. Are the differences statistically significant?
b. What is t if the groups had been independent?
c. What is the primary reason for the difference in the two t values?

10. A supervisor is monitoring the number of sick days employees take by month. For seven people, they are as follows:

Employee Number  Oct  Nov  Dec
1.                2    4    3
2.                0    0    0
3.                1    5    4
4.                2    5    3
5.                2    7    7
6.                1    3    4
7.                2    3    2

a. Are the month-to-month differences significant?
b. What is the scale of the independent variable in this analysis?
c. How much of the variance does the month explain?

11. If the people in each month of the Exercise 10 data were different, it would have been a one-way ANOVA.
a. Would the result have been significant?
b. Because total variance (SStot) is the same in either Exercise 10 or 11, and the SScol (Exercise 10) is the same as SSbet (Exercise 11), why are the F values different?

Analyzing the Research

Review the article abstracts provided below. You can then access the full articles via your university's online library portal to answer the critical thinking questions. Answers can be found in the answers appendix.

Using Repeated Measures ANOVA for a Stress Management Study
Elo, A., Ervasti, J., Kuosma, E., & Mattila, P. (2008). Evaluation of an organizational stress management program in a municipal public works organization. Journal of Occupational Health Psychology, 13(1), 10–23.

Article Abstract

The aim of this study was to investigate the effects of employee participation in an organizational stress management program consisting of several interventions aiming to improve psychosocial work environment and well-being. Pre- and postintervention questionnaires were used to measure the outcomes with a 2-year interval. This article describes the background of the program, results of previously published effect studies, and a qualitative evaluation of the program. The authors also tested the effects of level of participation in all interventions among the employees of the service production units by 2 (time) × 3 (group) repeated measures ANOVAs (n = 625). "Active participation" (more than 5.5 days) had a positive effect on feedback from supervisor and flow of information. Work climate remained on a permanent level while it decreased in the categories of moderate
  • 178. individual well-being or other aspects of psychosocial work environment as postulated by the work stress models. The qualitative evaluation and practical conclusions drawn by the management of the Organization provided a positive impression of the impact of the program. Critical Thinking Questions 1. What is the independent variable for which subjects are being tested, under all treatment levels? 2. Explain the importance of power in relation to this within- group design. 3. What is the disadvantage of testing subject group’s pre- and postparticipation program intervention? 4. Does this study need to worry about sphericity when conducting the repeated- measures ANOVA? Why or why not? Using ANOVA for a Personality Disorder Scales Study Wise, E. A. (1995). Personality disorder correspondence among the MMPI, MBHI, and MCMI. Journal of Clinical Psychology, 51(6), 790–798. Article Abstract MMPI, MBHI, and MCMI personality disorder scales were analyzed for convergent and discriminant validity. Friedman’s ANOVA indicated that there
  • 179. were no significant differ- ences among the sample’s averaged scale scores. Further analyses of the data, however, demonstrated that the Millon instruments classified significantly more of the sample as personality disordered when compared to Morey’s MMPI personality disorder scales. In addition, codetype correspondence among the three instruments was only 4 to 6%. When the instruments were analyzed in a pair-wise fashion, codetype correspondence increased to approximately 10 to 20%. These data indicate that these personality disorder scales do not demonstrate construct equivalence, particularly at the level of the individual profile. Critical Thinking Questions 1. Why did this study run a Friedman’s Nonparametric ANOVA? 2. The Friedman’s Nonparametric ANOVA showed no significant differences among the tests by scale means. What was the significance level for this to be true? 3. What is reported in Friedman’s Nonparametric ANOVA? Please label what each piece is from the Friedman output when comparing MBHI and MCMI compared to MMPI. 4. Suppose the Friedman’s ANOVA was significant. Would we run a post hoc test? What type of post hoc test?
iStockphoto/Thinkstock

Chapter 6 Analysis of Variance (ANOVA)

Learning Objectives

After reading this chapter, you will be able to . . .

1. explain why it is a mistake to analyze the differences between more than two groups with multiple t-tests.
2. relate sum of squares to other measures of data variability.
3. compare and contrast t-test with ANOVA.
4. demonstrate how to determine which group is significant in an ANOVA with more than two groups.
5. explain the use of eta-squared in ANOVA.
6. present statistics based on ANOVA results in APA format.
7. interpret results and draw conclusions of ANOVA.
8. discuss nonparametric Kruskal-Wallis H-test compared to the ANOVA.
suk85842_06_c06.indd 183 10/23/13 1:40 PM

CHAPTER 6 Section 6.1 One-Way Analysis of Variance

Ronald A. Fisher was present at the creation of modern statistical analysis. During the early part of the 20th century, Fisher worked at an agricultural research station in rural southern England. In his work analyzing the effect of pesticides and fertilizers on crop yields, he was stymied by the limitations in Gosset's independent t-test, which allowed him to compare only one pair of samples at a time. In the effort to develop a more comprehensive approach, Fisher created analysis of variance (ANOVA). Like Gosset, he felt that his work was important enough to publish, and like Gosset in his effort to publish the t-test, Fisher had opposition. In Fisher's case, the opposition came from a fellow statistician, Karl Pearson. This is the same man who created the first department
of statistical analysis at University College, London. In Chapters 9 and 11 you will study some of Pearson's work with correlations as well as Spearman's rho (ρ) and chi-square (χ²), which are used for the analysis of categorical (nominal and ordinal) data. Pearson also founded what is probably the most prominent journal for statisticians, Biometrika. Pearson was an advocate of making one comparison at a time and of using the largest groups possible to make those comparisons. When Fisher submitted his work to Pearson's journal with procedures suggesting that samples can be small and many comparisons can be made in the same analysis, Pearson rejected the manuscript. So began a long and increasingly acrimonious relationship between two men who would become giants in the field of statistical analysis and end up in the same department at University College. Interestingly, Gosset also gravitated to the department and managed to get along with both of them.

Fisher's contributions affect more than this chapter. Besides the development of the ANOVA, the concept of statistical significance is his, as is the hypothesis testing discussed in Chapter 5. Note that although a ubiquitous phenomenon, significance testing itself is not always accepted by other statisticians. One such adversary was William [Bill] Kruskal, who consequently derived the nonparametric version of the ANOVA, the Kruskal-Wallis H-test, which is discussed in this chapter. Despite these philosophical and statistical differences, R. A. Fisher made an enormous contribution to the field of quantitative analysis, as did his nemesis, Karl Pearson, with additional statistical contributions by William Sealy Gosset and Bill Kruskal.

6.1 One-Way Analysis of Variance

In any experiment, scores and measurements vary for many reasons. If a researcher is interested in whether children will emulate the videotaped behavior of adults whom they have watched, any differences in the children's behavior from before they see the adults to after are attributed primarily to the adults' behaviors. But even if all of the children watch with equal attentiveness, it is likely there will be differences in their behaviors
after the video. Some of those differences might stem from age differences among the children. Perhaps the amount of exposure children otherwise have to television will prompt differences in their behavior. Probably differences in their background experiences will also affect the way they behave. In an analysis of how behavior changes as a result of watching the video, the independent variable (IV) is whether or not the children have seen the video. Changes in their behavior, the dependent variable (DV), reflect the effect of the IV, but they also reflect all the other factors that prompt the children to behave differently. An IV is also referred to as a factor, particularly in procedures that involve more than one IV. Behavior changes that are not related to the IV reflect the presence of error variance attributed to other factors known as confounding variables.

When researchers work with human subjects, some level of error variance is inescapable. Even under tightly controlled conditions where all members of a sample receive exactly the same treatment, the subjects are unlikely to respond the same way. There are just too many confounding variables that also affect their behavior. Fisher's approach was to calculate the total variability in a problem and then analyze it, thus the name analysis of variance. Any number of IVs can be included in an ANOVA. Here, we are interested primarily in
ANOVA in its simplest form, a procedure called one-way ANOVA. The "one" in one-way ANOVA indicates that there is just one IV in this model. In that regard, one-way ANOVA is similar to the independent-samples t-test discussed in Chapter 5. Both tests have one IV and one DV. The difference is that the independent t-test allows for an IV with just two groups, but the IV in ANOVA can have any number of groups, generally more than two. In other words, a one-way ANOVA with just two groups is the same as an independent-samples t-test, where the statistic calculated in ANOVA, F, is equal to t²; this is addressed and illustrated in Section 6.5.

The ANOVA Advantage

The ANOVA and the t-test both answer the same question: Are there significant differences between groups? So why bother with another test when we have the t-test? Suppose someone has developed a group therapy program for people with anger management problems and the question is, are there significant differences in the behavior of clients who spend (a) 8, (b) 16, and (c) 24 hours in therapy over a period of weeks? Why not answer the question by performing three t-tests as follows?

1. Compare the 8-hour group to the 16-hour group.
2. Compare the 16-hour group to the 24-hour group.
3. Compare the 8-hour group to the 24-hour group.

Try It! A: What does the "one" in one-way ANOVA refer to?
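The F = t² relationship for two groups can be verified numerically. A short sketch with scipy, using arbitrary illustrative data:

```python
# With exactly two groups, the one-way ANOVA F statistic equals the
# square of the independent-samples t statistic, and the p values match.
from scipy.stats import f_oneway, ttest_ind

group_a = [12, 15, 11, 14, 13, 16]   # arbitrary illustrative scores
group_b = [18, 17, 20, 16, 19, 21]

t, p_t = ttest_ind(group_a, group_b)   # equal-variance (pooled) t-test
f, p_f = f_oneway(group_a, group_b)

print(f"t = {t:.4f}, t^2 = {t**2:.4f}")
print(f"F = {f:.4f}")
```

Running this shows F agreeing with t² to floating-point precision, which is why the two-group case is treated as interchangeable in the text.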
The Problem of Multiple Comparisons

These three tests represent all possible comparisons, but there are two problems with this approach. First, all possible comparisons is a good deal more manageable if there are three groups than if there are, say, five groups. If there were five groups, labeled a through e, note the number of comparisons needed to cover all possible comparisons:

1. a to b
2. a to c
3. a to d
4. a to e
5. b to c
6. b to d
7. b to e
8. c to d
9. c to e
10. d to e

All possible comparisons among five groups involve 10 tests, as seen above, to cover all the combinations of tests.
  • 187. seen above to cover all the combinations of tests. Family-Wise Error The other problem is an issue of inflated error in hypothesis testing when doing multiple tests known as family-wise error. Recall that the potential for type I error (a) is deter- mined by the level at which the test is conducted. At a 5 .05, any significant finding will result in a type I error an average of 5% of the time. However, that level of error assumes that each test is conducted with new data thereby increasing the family-wise error rate (FWER). Specifically, if statistical testing is done repeatedly with the same data, the poten- tial for type I error does not remain fixed at .05 (or whatever the level of the testing), but grows. In fact, if 10 tests are conducted in succession with the same data as with groups labeled a, b, c, d, and e mentioned earlier, and each finding is significant, by the time the 10th test is completed, the potential for alpha error is FWER 5 .40 or a 40% error prob- ability, as the following procedure illustrates: P a 5 1 2 (1 2 pa)n Where Pa 5 the probability of alpha error overall pa 5 the probability of alpha error for the initial significant
finding
n = the number of tests conducted where the result was significant

Pα = 1 − (1 − .05)^10 = 1 − .599
FWER = .401

The probability of a type I error at this point is 4 in 10, or 40%!
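The arithmetic above is easy to check directly. A minimal sketch (the .05 alpha and the 10-test case come from the example in the text; the helper name `fwer` is ours):

```python
# Family-wise error rate for n repeated tests at the same alpha,
# following FWER = 1 - (1 - alpha)^n from the text.
def fwer(alpha: float, n_tests: int) -> float:
    """Probability of at least one type I error across n_tests tests."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 3, 10):
    print(f"n = {n:2d}: FWER = {fwer(0.05, n):.3f}")
```

At n = 10 the result is .401, matching the 40% figure in the worked computation.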
The business of raising the (1 − pα) difference to the 10th power (or however many comparisons there are) is not only tedious, but the more important problem is that the probability of a type I error does not remain fixed when there are successive significant results with the same data. Therefore, using multiple t-tests is never a good option. In the end, running one test in an overall ANOVA will control for inflated FWER. An ANOVA is therefore termed an omnibus test, as it tests the overall significance of the research model based on the differences between sample means. It will not tell you which two means are significantly different, which is why follow-up post hoc comparisons are executed. These concepts will be discussed in further detail throughout the chapter.

The Variance in Analysis of Variance (ANOVA)

To analyze variance, Fisher began by calculating total variability from all sources. He recognized that when scores vary in a research study, they do so for two reasons. They vary because the independent variable (the "treatment") has had an effect, and they vary because of factors beyond the control of the researcher, producing the error variance referred to earlier. The test statistic in ANOVA is the F ratio (named for Fisher), which is treatment variance (variance in the DV that can be explained by the IV) divided by error variance (variance in the DV that cannot be explained because it is due to confounding variables). When F is large, it indicates that the difference between at least two of the groups in the analysis is not random and that there are significant differences between at least two group means. When the F ratio is small (close to a value of 1), it indicates that the IV has not had enough impact to overcome error variability, and the differences between groups are not significant. We will return to the F ratio when we discuss Formula 6.4.

Variance Between and Within Groups

If three groups of the same size are all selected from one population, they could be represented by three distributions, as shown in Figure 6.1. They do not have exactly the same mean, but that is because even when they are selected from the same population, samples are rarely identical. Those initial differences between sample means indicate some degree of sampling error.

Figure 6.1: Three groups drawn from the same population
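The behavior of the F ratio described above can be seen numerically with scipy's one-way ANOVA. The two data sets below are arbitrary illustrative values, chosen so that one has nearly identical group means and the other has well-separated means:

```python
# One-way ANOVA F ratio: between-groups (treatment) variance over
# within-groups (error) variance. Illustrative data only.
from scipy.stats import f_oneway

# Groups with very similar means: expect a small F and a large p value.
a1 = [10, 12, 11, 13, 12, 11]
b1 = [11, 13, 10, 12, 11, 13]
c1 = [12, 11, 13, 10, 12, 12]
f_small, p_small = f_oneway(a1, b1, c1)

# Groups with well-separated means: expect a large F and a small p value.
a2 = [10, 12, 11, 13, 12, 11]
b2 = [16, 18, 17, 19, 18, 17]
c2 = [22, 24, 23, 25, 24, 23]
f_large, p_large = f_oneway(a2, b2, c2)

print(f"similar means:   F = {f_small:.3f}, p = {p_small:.4f}")
print(f"separated means: F = {f_large:.3f}, p = {p_large:.6f}")
```

The first case corresponds to Figure 6.1 (all groups drawn from one population); the second corresponds to the post-treatment situation pictured in Figure 6.2.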
  • 190. sented by three distributions, as shown in Figure 6.1. They do not have exactly the same mean, but that is because even when they are selected from the same population, samples are rarely identical. Those initial differences between sample means indicate some degree of sampling error. Figure 6.1: Three groups drawn from the same population suk85842_06_c06.indd 187 10/23/13 1:40 PM CHAPTER 6Section 6.1 One-Way Analysis of Variance The reason that each of the three distributions has width is that there are differences within each of the groups. Even if the sample means were the same, individuals selected to the same sample will rarely manifest precisely the same level of whatever is measured. If a population is identified—for example, a population of the academically gifted—and a sample is drawn from that population, the individuals in the sample will not all have the same level of ability. Because they are all members of the population of the academically gifted, they will probably all be higher than the norm for academic ability, but there will still be differences in the subjects’ academic ability within the sample. These differences within are sources of error variance. The treatment effect is indicated in how the IV affects the way
  • 191. the DV is manifested. For example, three groups of subjects are administered different levels of a mild stimulant (the IV) to see the effect on level of attentiveness. The issue in ANOVA is whether the IV, the treat- ment, creates enough additional between-groups variability to exceed any error variance. Ultimately, the question is whether, as a result of the treatment, the samples still represent populations with the same mean, or whether, as is suggested by the distributions in Figure 6.2, they may represent populations with different means. Figure 6.2: Three groups after the treatment The within-groups variability in these three distributions is the same as it was in the dis- tributions in Figure 6.1. It is the between-groups variability that has changed in Figure 6.2. More particularly, it is the difference between the group means that has changed. Although there was some between-groups variability before the treatment, it was comparatively minor and probably reflected sampling variability. After the treatment, the differences between means are much greater. What F indicates is whether group differences are great enough to be statistically significant not due to chance. The Statistical Hypotheses in One-Way ANOVA The hypotheses are very much like they were for the
  • 192. independent t-test, except that they accommodate more groups. For the t-test, the null hypothesis is written H0: m1 5 m2. It indicates that the two samples involved were drawn from populations with the same means. For a one-way ANOVA with three groups, the null hypothesis has this form: H0: m1 5 m2 5 m3 B If a psychologist is interested in the impact that 1 hour, 5 hours, or 10 hours of therapy have on client behavior, how are behavior differences related to gender explained? Try It! suk85842_06_c06.indd 188 10/23/13 1:40 PM CHAPTER 6Section 6.1 One-Way Analysis of Variance It indicates that the three samples were drawn from populations with the same means. Things have to change for the alternate hypothesis, however, because with three groups, there is not just one possible alternative. Note that each of the following is possible:
  • 193. a. Ha: m1 ? m2 5 m3 Sample 1 represents a population with a mean value different from the mean of the population represented by Samples 2 and 3. b. Ha: m1 5 m2 ? m3 Samples 1 and 2 represent a population with a mean value different from the mean of the population represented by Sample 3. c. Ha: m1 5 m3 ? m2 Samples 1 and 3 represent a population with a mean value different from the population represented by Sample 2. d. Ha: m1 ? m2 ? m3 All three samples represent populations with different means. Because the several possible alternative outcomes multiply rap- idly when the number of groups increases, a more general alternate hypothesis is given. Either all the groups involved come from popu- lations with the same means, or at least one of them does not. So the form of the alternate hypothesis for an ANOVA with any number of groups is simply Ha: At least one of the means is different from the other means. Also remember that all the hypotheses are either nondirectional, in that there is no predic- tion of which sample mean will be higher than the others:
Nondirectional alternative hypothesis: Ha: μ1 ≠ μ2 ≠ μ3

or directional, in that there is a prediction of which sample mean will be higher than the other means. In the directional alternative hypothesis below, the prediction is that μ3 will be higher than μ2, which in turn will be higher than μ1.

Directional alternative hypothesis: Ha: μ1 < μ2 < μ3

As a researcher, it is important to weigh the value of making a prediction in a one-tailed test against making no prediction in a two-tailed test, as discussed in Chapter 5.

Measuring Data Variability in the One-Way ANOVA

We have discussed several measures of data variability to this point, including the standard deviation (s), the variance (s²), the standard error of the mean (SEM), the standard error of the difference (SEd), and the range. For ANOVA, Fisher added one more,

Try It! C: How many t-tests would it take to make all possible comparisons in a procedure with six groups?
Chapter 6, Section 6.1: One-Way Analysis of Variance

the sum of squares (SS). The sum of squares is the sum of the squared differences between scores and one of several mean values. In ANOVA,

• One sum-of-squares value involves the differences between individual scores and the mean of all the scores in all the groups (the grand mean). This is called the sum of squares total (SStot) because it measures all variability from all sources.
• A second sum-of-squares value indicates the difference between the means of the individual groups and the grand mean. This is the sum of squares between (SSbet). It measures the effect of the IV, the treatment effect, as well as any differences that existed between the groups before the study began.
• A third sum-of-squares value measures the difference between scores in the samples and the means of their samples. These sum of squares within (SSwith) values reflect the differences in the way subjects respond to the same stimulus. Because this value is entirely error variance, it is also called the sum of squares error (SSerr) or the sum of squares residual (SSres).

All Variability From All Sources: The Sum of Squares Total (SStot)

There are multiple formulas for SStot. They all provide the
same answer, but some make more sense to look at than others. Formula 6.1 makes it clear that at the heart of SStot is the difference between each individual score (x) and the mean of all scores, or the grand mean, for which the notation is MG.

SStot = Σ(x − MG)²   Formula 6.1

Where
x = each score in all groups
MG = the mean of all data from all groups, the grand mean

To calculate SStot, follow these steps:
1. Sum all scores from all groups and divide by the number of scores to determine the grand mean, MG.
2. Subtract MG from each score (x) in each group, and then square the difference: (x − MG)²
3. Sum all the squared differences: Σ(x − MG)²

The Treatment Effect: The Sum of Squares Between (SSbet)

The between-groups variance, the sum of squares between (SSbet), contains the variability due to the independent variable, the treatment effect. It will also contain any initial differences between the groups, which of course is error variance. For three groups labeled a, b, and c, the formula is

SSbet = (Ma − MG)²na + (Mb − MG)²nb + (Mc − MG)²nc   Formula 6.2

Where
Ma = the mean of the scores in the first group (a)
MG = the same grand mean used in SStot
na = the number of scores in the first group (a)

To calculate SSbet, follow these steps:
1. Determine the mean for each group: Ma, Mb, and so on.
2. Subtract MG from each sample mean and square the difference: (Ma − MG)²
3. Multiply the squared difference by the number in the group: (Ma − MG)²na
4. Repeat for each group.
5. Sum (Σ) the results across groups.

The value that results from Formula 6.2 represents the differences between the groups and the mean of all the data.

The Error Term: The Sum of Squares Within (SSwith)

When a group receives the same treatment but individuals within the group respond differently, their differences constitute error: unexplained variability. Maybe subjects' age differences are the cause, or perhaps the circumstances of their family lives, but for some reason not analyzed in the particular study, subjects in the same group often respond differently to the same stimulus. The amount of this unexplained variance within the groups is calculated with the SSwith, for which we have Formula 6.3:

SSwith = Σ(xa − Ma)² + Σ(xb − Mb)² + Σ(xc − Mc)²   Formula 6.3

Where
SSwith = the sum of squares within
xa = each of the individual scores in Group a
Ma = the score mean in Group a

To calculate SSwith, follow these steps:
1. Take the mean for each of the groups; these are available from calculating the SSbet earlier.
2. From each score in each group,
   a. subtract the mean of the group,
   b. square the difference, and
   c. sum the squared differences within each group.
3. Repeat this for each group.
4. Sum the results across the groups.

Try It! D: When will the sum-of-squares values be negative?

The SSwith (or the SSerr) measures the degree to which scores vary due to factors not controlled in the study, fluctuations that constitute error variance. Because the SStot consists of the SSbet and the SSwith, once the SStot and the SSbet are known, the SSwith can be determined by subtraction:

SStot − SSbet = SSwith

However, there are two reasons not to determine the SSwith by simple subtraction. First, if there is an error in the SSbet, it is only perpetuated by the subtraction. Second, calculating the value with Formula 6.3 helps clarify that what is being
determined is a measure of how much variation in scores there is within each group.

For the few problems done entirely by hand, we will take the "high road" and use the conceptual formulas. Conceptual formulas (6.1, 6.2, and 6.3) clarify the logic involved, but in the case of analysis of variance they also require a good deal of tiresome subtracting and then squaring of numbers. To minimize the tedium, the data sets here are all relatively small. When larger studies are done by hand, people often shift to the "calculation formulas" for simpler arithmetic, but clarity is sacrificed. Happily, you will seldom find yourself doing manual ANOVA calculations, and after a few simple longhand problems, this chapter will explain how you can use Excel or SPSS for help with larger data sets.

Calculating the Sums of Squares

A researcher is interested in the level of social isolation people feel in small towns (a), suburbs (b), and cities (c). Participants randomly selected from each of those three settings take the Assessment List of Nonnormal Environments (ALONE), for which the following scores are available:

a. 3, 4, 4, 3
b. 6, 6, 7, 8
c. 6, 7, 7, 9

We know we are going to need the mean of all the data (MG) as
well as the mean for each group (Ma, Mb, Mc), so we will start there.

Verify that Σx = 70 and N = 12, so that MG = 5.833.
For the small-town subjects, Σxa = 14 and na = 4, so Ma = 3.500.
For the suburban subjects, Σxb = 27 and nb = 4, so Mb = 6.750.
For the city subjects, Σxc = 29 and nc = 4, so Mc = 7.250.

For the sum of squares total, the formula is SStot = Σ(x − MG)².

SStot = 41.67

The calculations are in Table 6.1.

Table 6.1: Calculating the sum of squares total (SStot)
SStot = Σ(x − MG)², MG = 5.833

For the town data:
x − M                  (x − M)²
3 − 5.833 = −2.833     8.026
4 − 5.833 = −1.833     3.360
4 − 5.833 = −1.833     3.360
3 − 5.833 = −2.833     8.026

For the suburb data:
x − M                  (x − M)²
6 − 5.833 = 0.167      0.028
6 − 5.833 = 0.167      0.028
7 − 5.833 = 1.167      1.362
8 − 5.833 = 2.167      4.696

For the city data:
x − M                  (x − M)²
6 − 5.833 = 0.167      0.028
7 − 5.833 = 1.167      1.362
7 − 5.833 = 1.167      1.362
9 − 5.833 = 3.167      10.030

SStot = 41.668
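The grand mean and Formula 6.1 can be checked in a few lines of Python. This is an illustrative sketch; the variable names are ours, not the text's.

```python
# Sum of squares total for the ALONE scores (Formula 6.1): SStot = sum of (x - MG)^2.
# The three groups are the social-isolation example from the text.
town = [3, 4, 4, 3]
suburb = [6, 6, 7, 8]
city = [6, 7, 7, 9]

scores = town + suburb + city
grand_mean = sum(scores) / len(scores)                  # MG = 70 / 12 = 5.833
ss_total = sum((x - grand_mean) ** 2 for x in scores)   # all variability, all sources

print(round(grand_mean, 3))  # 5.833
print(round(ss_total, 2))    # 41.67
```

The result matches the hand calculation in Table 6.1 to rounding.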
For the sum of squares between, the formula is

SSbet = (Ma − MG)²na + (Mb − MG)²nb + (Mc − MG)²nc

The SSbet involves three groups rather than the 12 individuals required for SStot. The SSbet is as follows:

SSbet = (Ma − MG)²na + (Mb − MG)²nb + (Mc − MG)²nc
      = (3.5 − 5.833)²(4) + (6.75 − 5.833)²(4) + (7.25 − 5.833)²(4)
      = 21.772 + 3.364 + 8.032
      = 33.17

The SSwith indicates the error variance by determining the differences between individual scores in a group and their means. The formula is

SSwith = Σ(xa − Ma)² + Σ(xb − Mb)² + Σ(xc − Mc)²

SSwith = 8.50
The calculations are in Table 6.2.

Because we calculated the SSwith directly instead of determining it by subtraction, we can now check for accuracy by adding its value to the SSbet. If the calculations are correct, SSwith + SSbet = SStot. For the isolation example, we have

8.504 + 33.168 = 41.672

In the initial calculation, SStot = 41.668. The difference of 0.004 is round-off error and is unimportant.

Although they were not called sums of squares, we have been calculating an equivalent statistic since Chapter 1. At the heart of the standard deviation calculation are those repetitive x − M differences for each score in the sample. The difference values are then squared and summed, much as they are for calculating SSwith and SStot. Further, the denominator in the standard deviation calculation is n − 1, which should look suspiciously like some of the degrees of freedom values we will discuss in the next section.

Interpreting the Sums of Squares

The different sums-of-squares values are measures of data variability, which makes them like the standard deviation, variance measures, the standard error of the mean, and so on. But there is an important difference between SS and the other statistics. In addition to data variability, the magnitude of the SS value reflects the number of scores included. Because
sums of squares are in fact the sum of squared values, the more values there are, the larger the value becomes. With statistics like the standard deviation, adding more values near the mean of the distribution actually shrinks its value. But this cannot happen with the sum of squares. Additional scores, whatever their value, will almost always increase the sum of squares.

Try It! E: What will SStot − SSwith yield?

Table 6.2: Calculating the sum of squares within (SSwith)
SSwith = Σ(xa − Ma)² + Σ(xb − Mb)² + Σ(xc − Mc)²
Scores: 3, 4, 4, 3 | 6, 6, 7, 8 | 6, 7, 7, 9
Ma = 3.500, Mb = 6.750, Mc = 7.250

For the town data:
x − M                  (x − M)²
3 − 3.50 = −0.50       0.250
4 − 3.50 = 0.50        0.250
4 − 3.50 = 0.50        0.250
3 − 3.50 = −0.50       0.250

For the suburb data:
x − M                  (x − M)²
6 − 6.750 = −0.750     0.563
6 − 6.750 = −0.750     0.563
7 − 6.750 = 0.250      0.063
8 − 6.750 = 1.250      1.563

For the city data:
x − M                  (x − M)²
6 − 7.250 = −1.250     1.563
7 − 7.250 = −0.250     0.063
7 − 7.250 = −0.250     0.063
9 − 7.250 = 1.750      3.063

SSwith = 8.504
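Formulas 6.2 and 6.3 can be checked the same way. The sketch below (again with hypothetical variable names) also confirms that the between and within pieces add up to SStot.

```python
# Between- and within-group sums of squares for the social-isolation data
# (Formulas 6.2 and 6.3), plus the partition check SSbet + SSwith = SStot.
groups = {"town": [3, 4, 4, 3], "suburb": [6, 6, 7, 8], "city": [6, 7, 7, 9]}

all_scores = [x for g in groups.values() for x in g]
grand_mean = sum(all_scores) / len(all_scores)

# SSbet: squared distance of each group mean from the grand mean, weighted by n.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())

# SSwith: squared distance of each score from its own group mean (error variance).
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups.values())

ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

print(round(ss_between, 2))  # 33.17
print(round(ss_within, 2))   # 8.5
print(round(ss_between + ss_within, 2) == round(ss_total, 2))  # True
```

Computed in full precision, the partition is exact; the 0.004 discrepancy in the text comes only from rounding intermediate values by hand.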
This characteristic makes the sum of squares difficult to interpret. What constitutes much or little variability depends not just on how much difference there is between the scores and the mean to which they are compared but also on how many scores there are. Fisher turned the sum-of-squares values into a "mean measure of variability" by dividing each sum-of-squares value by its degrees of freedom. The SS ÷ df operation creates what is called the mean square (MS). In the one-way ANOVA, there is an MS value associated with both the SSbet and the SSwith (SSerr).

There is no mean square total given in the table, but if it were calculated, it would be the total variance (SSbet + SSwith) divided by the degrees of freedom for the entire data set treated as a single sample (N − 1). Dividing the SStot by its degrees of freedom (N − 1) would provide a mean level of overall variability, but that would not help answer questions about the ratio of between-groups variance to within-groups variance.

The degrees of freedom for each of the sums of squares calculated for the one-way ANOVA are as follows:

• Degrees of freedom total (dftot) = N − 1, where N is the total number of scores
• Degrees of freedom between (dfbet) = k − 1, where k is the number of groups
  SSbet ÷ dfbet = MSbet
• Degrees of freedom within (dfwith) = N − k
  SSwith ÷ dfwith = MSwith

Although there is no MStot, we need the sum of squares total (SStot) and the degrees of freedom total (dftot) because they provide an accuracy check:

a. The sums of squares between and within should equal the total sum of squares: SSbet + SSwith = SStot
b. The sum of degrees of freedom between and within should equal degrees of freedom total: dfbet + dfwith = dftot

Remembering these relationships can help reveal errors. In other words, error is the unexplained or unsystematic variance within groups (SSwith), variance not caused by the experimental manipulation, as opposed to the explained or systematic variance between groups (SSbet) that is due to the experimental treatment.

The F Ratio
The mean squares for between and within are the components of F, and the F ratio is the test statistic in ANOVA. As noted earlier in this chapter, F is a ratio:

F = MSbet / MSwith   Formula 6.4

The issue is whether the MSbet, which contains the treatment effect and some error, is substantially greater than the MSwith, which contains only error. This is illustrated in Figure 6.3 by comparing the distance from the mean of the first distribution to the mean of the second distribution, the A variance, to the B and C variances, which indicate the differences within groups.

If the MSbet/MSwith ratio is large (it must be substantially greater than 1), the difference between groups is likely to be significant. When that ratio is small (close to 1), F is likely to be nonsignificant. How large F must be to be significant depends on the degrees of freedom for the problem, just as it did for the t-tests.
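As a quick check on Formula 6.4, the mean squares and F for the social-isolation data can be computed directly from the sums of squares found earlier. This is a sketch; the variable names are illustrative.

```python
# Mean squares and the F ratio (Formula 6.4) for the social-isolation example:
# k = 3 groups, N = 12 scores, using the SS values computed earlier.
ss_between, ss_within = 33.1667, 8.5
k, n_total = 3, 12

df_between = k - 1                     # 2
df_within = n_total - k                # 9
ms_between = ss_between / df_between   # about 16.58
ms_within = ss_within / df_within      # about 0.94
f_ratio = ms_between / ms_within

print(round(f_ratio, 2))  # 17.56
```

Full precision gives F = 17.56; the text's 17.55 reflects rounding the mean squares before dividing, a trivial difference.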
Figure 6.3: The F ratio: comparing variance between groups (A) to variance within groups (B + C)

The ANOVA Table

With the sums of squares and the degrees of freedom for the different values in hand, the ANOVA results are presented in a table often referred to as a source table because it lists the sources of variability. It indicates

• the source of the variance,
• the sums-of-squares values,
• the degrees of freedom: dftot = N − 1 (because N = 12, dftot = 11); dfbet = k − 1 (because k, the number of groups, = 3, dfbet = 2); dfwith = N − k (because N = 12 and k = 3, dfwith = 9),
• the mean square values, which are SS/df, and
• the F value, which is MSbet/MSwith.
For the social isolation problem, the ANOVA table is

Source    SS       df    MS       F
Between   33.17     2    16.58    17.55
Within     8.50     9     0.95
Total     41.67    11

The table makes it easy to check some of the results for accuracy. Check that

SSbet + SSwith = SStot

Also verify that

dfbet + dfwith = dftot

In the course of checking results, note that the sums-of-squares values can never be negative. Because the SS values are literally sums of squares, a negative number indicates a calculation error somewhere; there is no such thing as negative variability (Chapter 1). The smallest a sum-of-squares value can be is 0, and this can happen only if all scores in the sum-of-squares calculation have the same value.

Understanding F

The larger F is, the more likely it is to be statistically significant, but how large is large
enough? In the preceding ANOVA table, F = 17.55, which seems like a comparatively large value.

• Because F is determined by dividing MSbet by MSwith, the value of F indicates the number of times MSbet is greater than MSwith.
• Here MSbet is about 17.55 times greater than MSwith, which seems promising, but to be sure, it must be compared to a value from the critical values of F (Table 6.3, which is repeated in the Appendix as Table C). As with the t-test, as degrees of freedom increase, the critical values decline. The difference is that with F, two df values are involved: one for the MSbet and the other for the MSwith.
• In Table 6.3 (also Table C in the Appendix), the critical value is identified by moving across the top of the table to the dfbet (the df numerator) and then moving down that column to the dfwith (the df denominator). According to the social isolation ANOVA table above, these are dfbet = 2 and dfwith = 9.
• The intersection of the 2 at the top and the 9 along the left side of the table leads to two critical values: one in regular type, which is for α = .05 and is the default, and one in bold type, which is the value for testing at α = .01.
• The critical value when testing at p = .05 is 4.26.
• The critical value indicates that any ANOVA test with 2 and 9 df that has an F value equal to or greater than 4.26 is statistically significant. The social isolation differences between the three groups are probably not due to sampling variability. The statistical decision is to reject H0.

The relatively large value of F (it is more than four times the critical value) indicates that much more of the variability in social isolation is probably related to where respondents live than is attributable to error variance.

Table 6.3: The critical values of F
Values in regular type indicate the critical value for p = .05; values in bold type indicate the critical value for p = .01. In this plain-text rendering, each cell is shown as a .05/.01 pair. Columns give the df numerator; rows give the df denominator.
Try It! F: If the F in an ANOVA is 4.0 and the MSwith = 2.0, what will be the value of MSbet?

Table 6.3: The critical values of F (continued)

df denominator 14, df numerator 1 to 4: 4.60/8.86, 3.74/6.51, 3.34/5.56, 3.11/5.04
df denominator 30, df numerator 1 to 10: 4.17/7.56, 3.32/5.39, 2.92/4.51, 2.69/4.02, 2.53/3.70, 2.42/3.47, 2.33/3.30, 2.27/3.17, 2.21/3.07, 2.16/2.98

Source: Richard Lowry, www.vassarstats.net. Retrieved from http://vassarstats.net/textbook/apx_d.html
6.2 Identifying the Difference: Post Hoc Tests and Tukey's HSD

A significant t from an independent t-test provides a simpler interpretation than a significant F from an ANOVA with three or more groups can provide. A significant t indicates that the two groups probably belong to populations with different means. A significant F indicates that at least one group is significantly different from at least one other group in the study, but unless there are only two groups in the ANOVA, it is not clear which group is significantly different from which. If the null hypothesis is rejected, there are a number of possible alternatives, as we noted when we listed all the possible Ha outcomes earlier.

The point of a post hoc test (an "after this" test) conducted following an ANOVA is to determine which groups are significantly different from each other. So when F is significant, a post hoc test is the next step. Statisticians debate whether to run a post hoc test when F is not significant, as there may be instances in which the overall F is nonsignificant yet the post hoc tests detect a significant difference between two groups. With the ease of running the analysis in Excel or SPSS, researchers may run post hoc tests to determine whether there are significant differences in means
between pairs of groups. In the latter case, a planned comparison is most prudent for detecting specific mean differences. Whether a planned comparison or a post hoc test is used, the determination should be based on the purpose of the study. If the goal is to test the null hypothesis that the means are not significantly different, then a significant omnibus F is appropriate. On the other hand, if there are specific differences between means to detect, then the omnibus F result is not necessary and going straight to the comparisons is appropriate, as in a planned comparison between means.

There are many post hoc tests, used for different purposes and based on their own assumptions and calculations (18 of them in SPSS, named after their respective authors). Each has particular strengths, but one of the more common in the psychological disciplines, and also one of the easiest to calculate, is John Tukey's HSD test, for "honestly significant difference."

Many statisticians use the terms liberal and conservative to describe post hoc tests. A liberal test is one in which there is a greater chance of finding a significant difference between means but a higher chance of a Type I error. Fisher's least significant difference (LSD) test is an example of a liberal test. These are seldom used, for the very concern of committing a Type I error. Conversely, a conservative post hoc test has a lower chance of finding a significant difference between means but also a lower chance of a Type I
error. One such conservative test is Bonferroni's post hoc. By their very conservative nature, these post hoc tests are more widely used.

Formula 6.5 produces a value that is the smallest difference between the means of any two samples that can be statistically significant:

HSD = x√(MSwith / n)   Formula 6.5

Where
x = a table value indexed to the number of groups (k) in the problem and the degrees of freedom within (dfwith) from the ANOVA table
MSwith = the value from the ANOVA table
n = the number in one group when group sizes are equal

In order to compute Tukey's HSD, follow these steps:

1. From Table 6.4 locate the value of x by moving across the top of the table to the number
of groups/treatments (k = 3), and then down the left side for the within degrees of freedom (dfwith = 9). The intersecting values are 3.95 and 5.43. The smaller of the two is the value for p = .05, as it was in our test. The post hoc test is always conducted at the same probability level as the ANOVA; in this case, p = .05.
2. The calculation is 3.95 times the square root of .945 (the MSwith) divided by 4 (n):
   3.95√(.945 / 4) = 1.920
3. This value is the minimum difference between the means of two significantly different samples. The sign of the difference does not matter; it is the absolute value we need.

The means for social isolation in the three groups are the following:

Ma = 3.500 for small-town respondents
Mb = 6.750 for suburban respondents
Mc = 7.250 for city respondents

Small towns minus suburbs: Ma − Mb = 3.50 − 6.75 = −3.25. This difference exceeds 1.92 and is significant.
Small towns minus cities: Ma − Mc = 3.50 − 7.25 = −3.75. This difference exceeds 1.92 and is significant.
Suburbs minus cities: Mb − Mc = 6.75 − 7.25 = −0.50. This difference is less than 1.92 and is not significant.

When several groups are involved, it is sometimes helpful to create a table that presents all the differences between pairs of means. Table 6.5, which is repeated in the Appendix as Table D, presents the Tukey's HSD results for the social isolation problem.

Formula 6.5 is used when group sizes are equal. However, there is an alternate formula for unequal group sizes for the more adventurous:

HSD = x√((MSwith / 2)(1/n1 + 1/n2))

with a separate HSD value computed for each pair of means in the problem.

Table 6.4: Tukey's HSD critical values: q(alpha, k, df)
The critical value for q corresponding to alpha = .05 (top value) and alpha = .01 (bottom value); shown here as .05/.01 pairs.

df = 5, k = 2 to 5 treatments: 3.64/5.70, 4.60/6.98, 5.22/7.80, 5.67/8.42
df = 40, k = 2 to 10 treatments: 2.86/3.82, 3.44/4.37, 3.79/4.70, 4.04/4.93, 4.23/5.11, 4.39/5.26, 4.52/5.39, 4.63/5.50, 4.73/5.60

Source: Tukey's HSD critical values (n.d.). Retrieved from http://www.stat.duke.edu/courses/Spring98/sta110c/qtable.html
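The HSD steps can be sketched in Python. Here q_crit = 3.95 is the tabled value read above for k = 3 and dfwith = 9 at p = .05; the variable names are illustrative.

```python
# Tukey's HSD for the social-isolation example (Formula 6.5), equal group sizes.
import math

means = {"town": 3.50, "suburb": 6.75, "city": 7.25}
ms_within = 8.5 / 9     # MSwith from the ANOVA source table
n_per_group = 4
q_crit = 3.95           # tabled q for k = 3 treatments, df_within = 9, alpha = .05

hsd = q_crit * math.sqrt(ms_within / n_per_group)
print(round(hsd, 2))    # 1.92

# Compare every pair of group means against the HSD threshold.
names = list(means)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        a, b = names[i], names[j]
        diff = abs(means[a] - means[b])
        print(a, b, round(diff, 2), "significant" if diff >= hsd else "not significant")
```

The loop reproduces the three comparisons above: town vs. suburb and town vs. city exceed 1.92, while suburb vs. city does not.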
Table 6.5: Presenting Tukey's HSD results in a table

HSD = x√(MSwith / n) = 3.95√(.945 / 4) = 1.920

Any difference between pairs of means of 1.920 or greater is a statistically significant difference. Significant mean differences are marked with an asterisk.

                          Suburbs (M = 6.750)    Cities (M = 7.250)
Small towns (M = 3.500)   Diff = 3.250*          Diff = 3.750*
Suburbs (M = 6.750)                              Diff = 0.500

The values entered in the cells in Table 6.5 indicate the differences between each pair of means in the study. Comparing the mean scores from each of the three groups indicates that the respondents from small towns expressed a significantly lower level of social isolation than those in either the suburbs or cities. Comparing the mean scores from the suburban and city groups indicates that social isolation scores are higher in the city, but the difference is not large enough to be statistically significant.

The significant F from the ANOVA indicated that at least one group had a significantly different level of social isolation from at least one other group, but that is all a significant F can reveal. The result does not indicate which group is significantly different from which other group, unless there are only two groups. The post hoc test indicates which pairs of groups are significantly different from each other. Table 6.5 is an example of how to illustrate the significant and the nonsignificant differences.

One caveat in using Tukey's HSD is that there is an assumption of equality of variances
(homogeneity) between groups, commonly checked with Levene's test. This assumption applies here as well. Suppose there is a violation of homogeneity. In that instance, an adjusted post hoc test that accounts for inequality of variances (heterogeneity) will need to be employed. To implement this in SPSS, for instance, there are four options under the Equal Variances Not Assumed heading when conducting a post hoc test for ANOVA. One of these approaches is the Games-Howell post hoc test, which is executed by checking that box in the SPSS Post Hoc tests tab for ANOVA.
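The text leaves Levene's test to SPSS. As a rough illustration of the idea (not the SPSS implementation, which centers on the mean by default), the median-centered Brown-Forsythe variant is just a one-way ANOVA run on each score's absolute deviation from its group median:

```python
# Median-centered Levene (Brown-Forsythe) check on the social-isolation groups:
# transform each score to |x - group median|, then compute a one-way F on the result.
from statistics import median

groups = [[3, 4, 4, 3], [6, 6, 7, 8], [6, 7, 7, 9]]
z = [[abs(x - median(g)) for x in g] for g in groups]   # transformed scores

all_z = [v for g in z for v in g]
grand = sum(all_z) / len(all_z)
ss_bet = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in z)
ss_with = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in z)

k, n = len(z), len(all_z)
w = (ss_bet / (k - 1)) / (ss_with / (n - k))   # Levene's W statistic (an F ratio)
print(round(w, 2))   # a small W: no evidence against equal variances here
```

A large W (compared to the F critical value for k − 1 and N − k df) would signal heterogeneity and point toward an adjusted post hoc test such as Games-Howell.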
  • 262. it will sell and what the most comfortable curvature of the keyboard would be. The company produces prototypes for four different keyboards, labeled Prototype A through D (see Table 6.6). Prototype A is a standard flat keyboard, and the others each have varying amounts of curve. Everything else about the keyboards is the same, so this is a one-way ANOVA. Forty different users are randomly assigned to test one of the four keyboards and rank them in comfort on a 100-point scale. The results are shown below Figure 6.4. Table 6.6: Prototype A–D data set Prototype A Prototype B Prototype C Prototype D 49 57 77 65 57 53 82 61 73 69 77 73 68 65 85 81 65 61 93 89 62 73 79 77 61 57 73 81 45 69 89 77 53 73 82 69
  • 263. 61 77 85 77 Next, the test results are analyzed in Excel, which produces the information in Figure 6.4. (continued) suk85842_06_c06.indd 205 10/23/13 1:40 PM CHAPTER 6Section 6.2 Identifying the Difference: Post Hoc Tests and Tukey’s HSD Apply It! (continued) Figure 6.4: Excel results of comparison means and ANOVA of prototypes The null hypothesis is that there is no difference among the four keyboards. From Table 6.6, we see that the F value is 16.72, which is larger than the critical value of F 5 2.87 at the critical a 5 .05. Therefore the null hypothesis is rejected at p , .05. At least one of the prototypes is significantly different from at least one other prototype. Because there is a significant F, the marketers next compute HSD: HSD 5 x Å MSwith n
Where
x = 3.81 (based on k = 4, dfwith = 36, and p = .05)
MSwith = 61.07, the value from the ANOVA table
n = 10, the number in one group when group sizes are equal

HSD = 9.42

(Figure 6.4 also reports the descriptive output for each prototype, including a count of 10 scores per group and the group sums, beginning with 594 for Prototype A and 654 for Prototype B.)

This value is the minimum difference between the means of two significantly different samples. The differences in means between the groups are shown below:

A − B = −6.0
A − C = −22.8
A − D = −15.6
B − C = −16.8
B − D = −9.6
C − D = 7.2

The differences in comfort between Prototypes A and B and between Prototypes C and D are not statistically significant, because their absolute values are less than the Tukey's HSD value of 9.42. However, the differences in comfort between the remaining pairs of prototypes are statistically significant.

Based on the one-way ANOVA, the marketing team decides to produce and sell the keyboard configuration of Prototype C. It had the highest mean comfort level and will be a significant improvement over existing keyboards.

Apply It! boxes written by Shawn Murphy
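The whole Apply It! analysis can be reproduced without Excel. This sketch recomputes F and the HSD from the Table 6.6 data, using q_crit = 3.81 as given in the box; variable names are ours.

```python
# End-to-end check of the keyboard example: one-way ANOVA, then Tukey's HSD.
import math

data = {
    "A": [49, 57, 73, 68, 65, 62, 61, 45, 53, 61],
    "B": [57, 53, 69, 65, 61, 73, 57, 69, 73, 77],
    "C": [77, 82, 77, 85, 93, 79, 73, 89, 82, 85],
    "D": [65, 61, 73, 81, 89, 77, 81, 77, 69, 77],
}

scores = [x for g in data.values() for x in g]
grand = sum(scores) / len(scores)
means = {name: sum(g) / len(g) for name, g in data.items()}

ss_bet = sum(len(g) * (means[name] - grand) ** 2 for name, g in data.items())
ss_with = sum(sum((x - means[name]) ** 2 for x in g) for name, g in data.items())

df_bet, df_with = len(data) - 1, len(scores) - len(data)        # 3 and 36
f_ratio = (ss_bet / df_bet) / (ss_with / df_with)
hsd = 3.81 * math.sqrt((ss_with / df_with) / 10)                # q_crit = 3.81, n = 10

print(round(f_ratio, 2))                 # 16.72
print(round(hsd, 2))                     # 9.42
print(round(means["C"] - means["D"], 1)) # 7.2, below HSD: not significant
```

The recomputed MSwith (about 61.07), F (16.72), and HSD (9.42) match the values quoted in the box.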
  • 268. 6.3 Determining the Results’ Practical Importance Three questions can come up in an ANOVA. The second and third questions depend upon the answer to the first: 1. Are any of the differences statistically significant? The answer depends upon how the calculated F value compares to the critical value from the table. 2. If the F is significant, which groups are significantly different from each other? That question is answered by completing a post hoc test such as Tukey’s HSD. 3. If F is significant, how important is the result? The answer comes by calculating an effect size. After addressing the first two questions, we now turn our attention to the third question, effect size. With the t-test in Chapter 5, Cohen’s d answered the question about how impor- tant the result was. Several effect-size statistics have been used to explain the importance of a significant ANOVA result. Omega squared (v2) and partial- eta-squared (partial-h2) are both quite common in the social science research literature, but the one we will use is called eta-squared (H2). The Greek letter eta (h pronounced like “ate a” as in “ate a grape”) is the equivalent of the letter h. Because some of the variance in scores is unexplained and is therefore error variance, eta-squared answers this question: How much of the score
  • 269. variance can be attributed to the independent variable? suk85842_06_c06.indd 207 10/23/13 1:40 PM CHAPTER 6Section 6.3 Determining the Results’ Practical Importance In the social isolation problem, the question was whether residents of small towns, subur- ban areas, and cities differ in the amount of social isolation they indicate. The respondents’ location is the IV. Eta-squared estimates how much of the difference in social isolation is related to where respondents live. There are only two values involved in the h2 calculation, both retrievable from the ANOVA table. Formula 6.6 shows the eta-squared calculation: h2 5 SSbet SStot Formula 6.6 Eta-squared is the ratio of between-groups variability to total variability. If there was no error variance, all variance would be due to the independent variable, and the sums of squares for between-groups variability and for total variability would have the same val- ues; the effect size would be 1.0. With human subjects, this never happens because scores fluctuate for reasons other than the IV, but it is important to
know that 1.0 is the “upper bound” for this effect size. The lower bound is 0, of course—none of the variance is explained. But we also never see eta-squared values of 0 because the only time the effect size is calculated is when F is significant, and that can only happen when the effect of the IV is great enough that the ratio of MSbet to MSwith exceeds the critical value.

For the social isolation problem, SSbet = 33.168 and SStot = 41.672, so η² = 33.168/41.672 = 0.796. According to this data, about 80% (79.6% to be exact) of the variance in social isolation scores is related to whether the respondent lives in a small town, a suburb, or a city. (Note that this amount of variance is unrealistically high, which can happen when numbers are contrived.)

Try It! If the F in ANOVA is not significant, should the post hoc test or the effect-size calculation be made?
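Formula 6.6 is simple enough to check in a few lines of code. A minimal sketch in Python, using the SSbet and SStot values from the social isolation problem; the function name is an illustration, not something from the chapter:

```python
# Eta-squared (Formula 6.6): proportion of total variance attributable to the IV.
# A minimal sketch; eta_squared is a hypothetical helper, not from the text.

def eta_squared(ss_between, ss_total):
    """Effect size for a one-way ANOVA: SSbet / SStot."""
    return ss_between / ss_total

# SSbet = 33.168 and SStot = 41.672 from the social isolation ANOVA table
effect = eta_squared(33.168, 41.672)
print(round(effect, 3))  # 0.796, i.e., about 80% of the variance
```

Because the two sums of squares come straight from the ANOVA table, the calculation needs nothing beyond a division, which is why η² is such a convenient effect size to report.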
Using ANOVA to Test Effectiveness

A pharmaceutical company has developed a new medicine to treat a skin condition. This medicine has been proven effective in previous tests, but now the company is trying to decide the best method to deliver the medicine. The options are

1. pills that are taken orally,
2. a cream that is rubbed into the affected area, or
3. drops that are placed on the affected area.

To test the application methods, the company uses 24 volunteers who suffer from this skin condition. Each of the volunteers is randomly assigned to one of the three treatment methods. Note that each volunteer tests only one of the delivery methods. This satisfies the requirement that the categories of the IV must be independent. This is a one-way ANOVA test with the delivery method being the only independent variable.
To evaluate the effectiveness of each delivery method, three different dermatologists examine each patient after the course of treatment. They then rate the skin condition on a scale of 1 through 20, with 20 being a total absence of the condition. The scores from the three doctors are then averaged. The null hypothesis is that all three delivery methods are equally effective:

H0: μpills = μcream = μdrops

The null hypothesis indicates that the three treatments were drawn from populations with the same mean. The alternate hypothesis for the ANOVA test is

Ha: μpills ≠ μcream ≠ μdrops

Data from the trial is shown in Table 6.7.

Table 6.7: Data from trial of skin treatment conditions

Pills   Cream   Drops
14      18      13
13      15      15
19      16      16
18      18      15
15      17      14
16      13      17
12      17      13
12      18      16

Figure 6.5: Analysis of the data that was performed in Excel

Figure 6.5 shows the value for F is 1.72, which is less than the Fcrit value of 3.47 when testing at p = .05. Therefore, the null hypothesis is not rejected. We cannot say that the different delivery methods come from populations with different means. Looking at the p value generated by Excel, we see that there is a 20% probability that a difference in means this large could have occurred by chance alone. Because the null hypothesis is not rejected, there is no need to perform either a Tukey’s HSD test or an η² calculation.

The pharmaceutical company decides to offer the medicine as a cream because this is generally their preferred delivery method. The ANOVA test has assured them that this is the correct choice, and that neither of the two alternate methods
provided a more effective delivery option. In other words, the alternative hypothesis is not correct.

Apply It! boxes written by Shawn Murphy

[Excel output for the skin-treatment ANOVA: each group had a Count of 8; the Sums were 119 (Pills), 132 (Cream), and 119 (Drops); the Averages were 14.88, 16.50, and 14.88. The ANOVA table reported MS values of 7.04 (between) and 4.08 (within), F = 1.72, p-value = 0.20, and Fcrit = 3.47, with 23 total degrees of freedom.]

6.4 Conditions for the One-Way ANOVA

As we saw with the t-tests, any statistical test requires that certain conditions (also referred to as assumptions) are met. The conditions might be characteristics such as the scale of the data, the way the data is distributed, the relationships between the groups in the analysis,
and so on. In the case of the one-way ANOVA, the name indicates one of the conditions.

• This particular test can accommodate just one independent variable. That one variable can have any number of categories, but there can be just one IV. In the example of small-town, suburban, and city isolation, the IV was the location of the respondents’ residence. We might have added more categories, such as semirural, small town, large town, suburbs of small cities, suburbs of large cities, and so on, all of which relate to the respondents’ place of residence, but like the independent t-test, there is no way to add another variable, such as the respondents’ gender, in a one-way ANOVA.
• The categories of the IV must be independent. Like the independent t-test, the groups involved must be independent. Those who are members of one group cannot also be members of another group involved in the same analysis.
• The IV must be nominal scale. Because the IV must be nominal scale, sometimes data of some other scale is reduced to categorical data to complete the analysis. If someone is interested in whether there are differences in social isolation related to age, age must be changed from ratio to nominal data prior to the analysis. Rather than using each person’s age in years as the independent variable, ages are grouped into categories such as 20s, 30s, and so on. This is not ideal, because by reducing ratio data to nominal or even ordinal scale, the differences in social isolation between, for example, 20- and 29-year-olds are lost.
• The DV must be interval or ratio scale. Technically, social isolation would need to be measured with something like the number of verbal exchanges that one has daily with neighbors or co-workers, rather than asking on a scale of 1–10 to indicate how isolated one feels, which is probably an example of ordinal data.
• The groups in the analysis must be similarly distributed. The technical description for this similarity of distribution is homogeneity of variance. For example, this condition means that the groups should all have reasonably similar standard deviations. This was discussed in Chapter 5, where the Levene’s test is used to test equality of variances.
• Finally, using ANOVA assumes that the samples are drawn from a normally distributed population.

It may seem difficult to meet all these conditions. However, keep in mind that normality
and homogeneity of variance in particular represent ideals more than practical necessities. As it turns out, Fisher’s procedure can tolerate a certain amount of deviation from these requirements; this test is quite robust.

6.5 ANOVA and the Independent t-Test

The one-way ANOVA and the independent t-test share several assumptions, although they employ distinct statistics: sums of squares are used for ANOVA, and the standard error of the difference is used for the t-test. Even so, both tests will lead the analyst to the same conclusion. This consistency can be illustrated by completing ANOVA and the independent t-test for the same data.

Suppose an industrial psychologist is interested in how people from two separate divisions of a company differ in their work habits. The dependent variable is the amount of work completed after-hours at home per week for supervisors in marketing versus supervisors in manufacturing. The data is as follows:

Marketing: 3, 4, 5, 7, 7, 9, 11, 12
Manufacturing: 0, 1, 3, 3, 4, 5, 7, 7

Calculating some of the basic statistics yields the following:
               M      s      SEM     SEd     MG
Marketing      7.25   3.240  1.146   1.458   5.50
Manufacturing  3.75   2.550  0.901

First, the t-test:

t = (M1 − M2) / SEd = (7.25 − 3.75) / 1.458 = 2.401;  t.05(14) = 2.145

The difference is significant. Those in marketing (M1) take significantly more work home than those in manufacturing (M2).

Now the ANOVA:

• SStot = Σ(x − MG)² = 168. Verify that the result of subtracting MG from each score in both groups, squaring the differences, and summing the squares = 168.
• SSbet = (Ma − MG)²na + (Mb − MG)²nb. This one is not too lengthy to do here: (7.25 − 5.50)²(8) + (3.75 − 5.50)²(8) = 24.5 + 24.5 = 49.
• SSwith = Σ(xa − Ma)² + Σ(xb − Mb)². Verify that the result of subtracting the group means from each score in the particular group, squaring the differences, and summing the squares = 119.
• Check that SSwith + SSbet = SStot: 119 + 49 = 168.

Source    SS     df    MS     F        Fcrit
Between   49     1     49     5.765    F.05(1,14) = 4.60
Within    119    14    8.5
Total     168    15

Like the t-test, ANOVA indicates that the difference in the amount of work completed at home is significantly different for the two groups, so at least both tests draw the same
conclusion about whether the result is significant, but there is more similarity than this.

• Note that the calculated value of t = 2.401, and the calculated value of F = 5.765.
• If the value of t is squared, it equals the value of F: 2.401² = 5.765.
• The same is true for the critical values: t.05(14) = 2.145, F.05(1,14) = 4.60, and 2.145² = 4.60.

Gosset’s and Fisher’s tests draw exactly equivalent conclusions when there are two groups. The ANOVA tends to be more work, and researchers ordinarily use the t-test for two groups, but the point is that the two tests are entirely consistent.

6.6 Completing ANOVA with Excel

The ANOVA by longhand involves enough calculated means, subtractions, squaring of differences, and so on that doing an ANOVA on Excel is beneficial.

A researcher is comparing the level of optimism indicated by people in different vocations during an economic recession. The data is from laborers, clerical staff in professional offices, and the professionals in those offices. The data for the three groups follows:
Laborers: 33, 35, 38, 39, 42, 44, 44, 47, 50, 52
Clerical staff: 27, 36, 37, 37, 39, 39, 41, 42, 45, 46
Professionals: 22, 24, 25, 27, 28, 28, 29, 31, 33, 34

1. Create the data file in Excel. Enter Laborers, Clerical staff, and Professionals in cells A1, B1, and C1, respectively.
2. In the columns below those labels, enter the optimism scores, beginning in cell A2 for the laborers, B2 for the clerical workers, and C2 for the professionals. Once the data is entered and checked for accuracy, proceed with the following steps.
3. Click the Data tab at the top of the page.
4. At the extreme right, choose Data Analysis.
5. In the Analysis Tools window, select ANOVA Single Factor and click OK.
6. Indicate where the data is located in the Input Range. In the example here, the range is A2:C11.
7. Note that the default is “Grouped by Columns.” If the data is arrayed along rows instead of columns, this would need to be changed. Because we designated A2 instead of A1 as the point where the data begins, there is no need to indicate that labels are in the first row.
8. Select Output Range and enter a cell location where you wish the display of the output to begin. In the example in Figure 6.6, the location is A13.
9. Click OK. Widen column A to make the output easier to read. It will look like the screenshot in Figure 6.6.

Try It! What is the relationship between the values of t and F if both are performed for the same two-group test?

Figure 6.6: Performing an ANOVA on Excel

As you have already seen in the two Apply It! boxes, the results appear in two tables. The first provides descriptive statistics. The second table looks
like the longhand table of results for the social isolation example, except that

• the figures shown for the total follow those for between and within instead of preceding them, and
• the P-value column indicates the probability that an F of this magnitude could have occurred by chance.

Note that the P value is 4.31E-06. The “E-06” is scientific notation. It is a shorthand way of indicating that the actual value is p = .00000431, that is, 4.31 with the decimal moved six places to the left (4.31 × 10⁻⁶). This probability is far below the p = .05 standard, so the result is easily statistically significant.

6.7 Presenting Results

The previous analyses all used Excel, so we will now shift to using SPSS for the execution of these steps and the interpretation of the results. We will first use the data in Table 6.7 and then proceed with actual data gathered from published research. You will see that we use the same steps regardless of the sample size, and that using technology like Excel and SPSS makes hand calculations unnecessary. While hand calculations are instructive, they are also laborious and more prone to errors, especially with large data sets.

SPSS Example 1: Steps for ANOVA
After setting up the data in SPSS as seen in Figure 6.7 (data from Table 6.7), the steps in executing this analysis are as follows: Analyze → Compare Means → One-Way ANOVA. Place Treatment into the Factor box and Skin Condition into the Dependent List. Click Post Hoc on the left and check Tukey and Games-Howell; then click Options and check Descriptive and Homogeneity of variance test. Click Continue and OK. (Note that the three treatment groups in the data set (Figure 6.7) are numerically coded: Pills = 1, Creams = 2, and Drops = 3.)

Figure 6.7: Data set in SPSS

Figure 6.8: SPSS output from trial of skin treatment conditions

[Figure 6.8 shows the SPSS output: the Test of Homogeneity of Variances (Levene statistic = 1.822, df1 = 2, df2 = 21, sig. = .186); the ANOVA table (between-groups SS = 14.083, within-groups SS = 85.750, total SS = 99.833, F = 1.724, sig. = .203); the Descriptives table (n = 8 per group; means of 14.88 for Pills, 16.50 for Creams, and 14.88 for Drops, 15.42 overall); and the Tukey HSD and Games-Howell multiple comparisons tables, in which none of the pairwise differences is significant.]

As seen in the SPSS output (Figure 6.8), the ANOVA results are the same as when executed in Excel earlier in the chapter. Here SPSS allows execution of the ANOVA including descriptive statistics, tests of homogeneity of variance, post hoc tests, and line graphs—all simultaneously executed using the SPSS steps outlined earlier. The results begin with the Descriptives table, where you can see that each group has an even number of participants (n = 8). Here you can see differences in the means, with Creams (M = 16.50) highest of the three treatments. The Test of Homogeneity of Variance shows a favorable result in that it is not significant (p > .05), specifically p = .186. This
indicates that there is no significant difference in the variance of the three treatments, indicating equal variances. As you recall from earlier chapters, if there is inequality of variance across groups, an adjustment is needed to compare groups. Next, the ANOVA table shows a nonsignificant F statistic, p = .203. At this stage, since F is not significant, we do not need to interpret the post hoc tests, as there will be no significance between groups. As noted earlier in the chapter, this is a debatable topic in that, with the ease of running post hoc tests, the analyst can easily look at the results of these regardless of the F statistic result. Findings may indicate significant differences between any two groups even though there is a nonsignificant F, but this is rare, and you can clearly see from the example that none of the post hoc tests is significant between groups.

SPSS Example 2: Steps for ANOVA

Using public data about higher education and housing from Pew Research (2010), Social and Demographic Trends, the steps in executing this analysis are as follows: Analyze → Compare Means → One-Way ANOVA. Place schl (currently enrolled in school) into the Factor box and age into the Dependent List. Click Post Hoc on the left and check Tukey and Games-Howell; then click Options and check Descriptive, Homogeneity of variance test, and Means plot. Click Continue and OK.
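Before turning to the SPSS output, note that the same one-way ANOVA logic can also be checked programmatically. A minimal sketch assuming SciPy (a tool the chapter itself does not use), reproducing the Example 1 analysis of the Table 6.7 skin-treatment data:

```python
# One-way ANOVA on the Table 6.7 skin-treatment data,
# a sketch assuming SciPy rather than Excel or SPSS.
from scipy.stats import f_oneway

pills = [14, 13, 19, 18, 15, 16, 12, 12]
cream = [18, 15, 16, 18, 17, 13, 17, 18]
drops = [13, 15, 16, 15, 14, 17, 13, 16]

# f_oneway returns the F statistic and its p value
f_stat, p_value = f_oneway(pills, cream, drops)
print(round(f_stat, 2), round(p_value, 3))  # F = 1.72, p = .203, as in the chapter
```

Whatever the tool, the test statistic is the same ratio of between-groups to within-groups mean squares, so Excel, SPSS, and code agree to rounding error.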
Figure 6.9: SPSS output from Pew Research Social and Demographic Trends (2010) education data set

[Figure 6.9 shows the SPSS output for this analysis: the Test of Homogeneity of Variances (Levene statistic = 44.884, df1 = 5, df2 = 1692, sig. = .000); the ANOVA table for age by school enrollment, with a significant F; and the Descriptives table for the six enrollment groups (Yes, in High School; Yes, in Technical, trade, or vocational school; Yes, in College; Yes, in Graduate School; No; Don’t know/Refused), with mean ages of 19.64, 36.03, 24.81, 31.38, 42.05, and 29.33, respectively, and an overall mean of 38.82. The Tukey HSD and Games-Howell Multiple Comparisons tables follow, with mean differences flagged as significant at the 0.05 level.]

Figure 6.10: SPSS output graph from Pew Research Social and Demographic Trends (2010) education data set

Source: Data from Pew Research: Social and Demographic Trends. (2011). Higher Education/Housing. Retrieved from http://www.pewsocialtrends.org/category/datasets/.
The Descriptives table in Figures 6.9 and 6.10 shows that the groups have unequal numbers of participants, with the No (not in school) group at n = 1,336 and the highest mean age (M = 42.05). The Test of Homogeneity of Variance shows an unfavorable result in that it is significant (p < .05). This indicates that there is a significant difference in the variance of the six education groups, indicating unequal variances (or heterogeneity of variance). Next, the ANOVA table indicates a significant F statistic (p < .05). To determine which of the group comparisons is significant using a post hoc test when there is a violation of homogeneity, equal variance will not be assumed. Therefore, we will interpret the Equal variances not assumed post hoc test, which is Games-Howell. Here, the Don’t Know/Refused group is not significantly different from any of the other education groups. You can also see significant differences between several groups, such as Yes, in High School and Yes, in Technical, trade, or vocational school. All comparisons can be made in a similar manner based on the significance value in the Multiple Comparisons table. The line graph or means plot shows the mean age of each group, with the No group having the highest mean age and the Yes, in High School group having the lowest.
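The homogeneity check that drove the choice of Games-Howell here can also be run in code. A minimal sketch assuming SciPy, applied to the Table 6.7 skin-treatment data, for which SPSS reported a Levene statistic of 1.822; note that SciPy's default is the median-centered (Brown-Forsythe) variant, so `center='mean'` is needed to match the classic Levene test:

```python
# Levene's test for homogeneity of variance, a sketch assuming SciPy.
# Data are the Table 6.7 skin-treatment groups.
from scipy.stats import levene

pills = [14, 13, 19, 18, 15, 16, 12, 12]
cream = [18, 15, 16, 18, 17, 13, 17, 18]
drops = [13, 15, 16, 15, 14, 17, 13, 16]

# center='mean' gives the classic Levene test (an ANOVA on |x - group mean|);
# the SciPy default, center='median', is the Brown-Forsythe variant.
stat, p = levene(pills, cream, drops, center='mean')
print(round(stat, 3), round(p, 3))  # 1.822, p = .186: not significant, equal variances
```

A nonsignificant result (p > .05) supports the equal-variances assumption, so the ordinary Tukey HSD could be interpreted; a significant result would point to Games-Howell instead, as in the Pew example.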
6.8 Interpreting Results

Though you should refer to the most recent edition of the APA manual for specific detail on formatting statistics, the following may be used as a quick guide in presenting the statistics covered in this chapter.

Table 6.8: Guide to APA formatting of F statistic results

Abbreviation or Term    Description
F                       F test statistic score
η²                      Eta-squared: an effect size
ω²                      Omega-squared: an effect size
HSD                     Honestly significant difference: a Tukey’s post hoc test
SS                      Sum of squares
MS                      Mean square

Source: Publication Manual of the American Psychological Association, 6th edition. © 2009 American Psychological Association, pp. 119–122.

Note that all of the terms in Table 6.8 are italicized, while HSD is not. The following are
some examples of how to present results using these abbreviations, though you may use different combinations of results. Using the data from SPSS Examples 1 and 2 (Figures 6.8–6.10), we could present the results in the following way:

• The overall difference in skin condition between treatments was not significant, F(2,21) = 1.724, p = .203. (Note that the df listed is for both the between- and within-group lines in the ANOVA table.)
• The overall difference in age between school-enrollment groups was significant, F(5,1692) = 90.39, p < .05.
• The No [school] group was significantly older (M = 42.05, SD = 13.39) than the Yes, in High School group (M = 19.64, SD = 4.80), the Yes, in College. . . group (M = 24.81, SD = 8.43), and the Yes, in Graduate School group (M = 31.38, SD = 10.69), whereas there were no significant differences with the Yes, in Technical, trade. . . group (M = 36.03, SD = 15.50) and the Don’t Know/Refused group (M = 29.33, SD = 6.43).

6.9 Nonparametric Test: Kruskal-Wallis H-Test

The one-way ANOVA nonparametric equivalent is the Kruskal-Wallis H-test, also known as the Kruskal-Wallis ANOVA. Like the Mann-Whitney U-test, the Kruskal-Wallis H-test is based on ranked (ordinal) data. It is used as an alternative to its parametric counterpart when violations of assumptions have occurred. In fact, Kruskal was
not a proponent of significance testing, as Bradburn (2007) has quoted him as saying, “I am thinking these days about the many senses in which relative importance gets considered. Of these senses, some seem reasonable and others not so. Statistical significance is low on my ordering.” That said, his derived equivalent of a parametric technique is very apropos. As in the Mann-Whitney U-test, the rank of each group is determined and then summed. The H is calculated as a proportion of the summed ranks divided by their respective sample sizes.

H = [12 / (N(N + 1))] Σ(Tg² / ng) − 3(N + 1)    (Formula 6.7)

Where
N = total sample size
Tg = sum of ranks for group g
ng = sample size of group g

To illustrate the calculation of the H-test, we will use the same data from Table 6.7 with a few modifications, as seen in Table 6.9. Here the initial step is to rank all the values across treatments, with 1 being the lowest rank. If there are tied ranks, then an average of the ranks is taken. For instance, in the Pills column, the two values of 12 have initial ranks of 1 and 2. The average of them is 1.5, as seen in the Rank column. The same is true for values of 13, where there are four ranks with the average rank of 4.5, and so on with the other ties. Once all of these are complete, the ranks are summed, as seen in the last row of the table.

Table 6.9: Data from trial of skin treatment conditions

Pills  Initial rank  Rank    Cream  Initial rank  Rank    Drops  Initial rank  Rank
14     7             7.5     18     21            21.5    13     3             4.5
13     4             4.5     15     10            10.5    15     11            10.5
19     24            24      16     14            14.5    16     15            14.5
18     20            21.5    18     22            21.5    15     12            10.5
15     9             10.5    17     17            18      14     8             7.5
16     13            14.5    13     5             4.5     17     19            18
12     1             1.5     17     18            18      13     6             4.5
12     2             1.5     18     23            21.5    16     16            14.5
Sum of ranks         85.5                         130                          84.5

Try It! There are several websites that will help in these calculations. One well-used statistical calculator for various analyses, such as the Kruskal-Wallis H-test, is available at the VassarStats website via the link provided below. Use the data provided in this chapter section to see if you get the same results. http://vassarstats.net/index.html
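The ranking in Table 6.9 and the test itself can also be verified programmatically. A sketch assuming SciPy; note that `scipy.stats.kruskal` applies a correction for tied ranks, so its H will differ slightly from an uncorrected longhand value:

```python
# Verify the Table 6.9 rank sums, then run the Kruskal-Wallis H-test.
# A sketch assuming SciPy; kruskal() corrects H for ties.
from scipy.stats import rankdata, kruskal

pills = [14, 13, 19, 18, 15, 16, 12, 12]
cream = [18, 15, 16, 18, 17, 13, 17, 18]
drops = [13, 15, 16, 15, 14, 17, 13, 16]

# Rank all 24 scores together; ties receive the average of their ranks
ranks = rankdata(pills + cream + drops)
t_pills, t_cream, t_drops = ranks[:8].sum(), ranks[8:16].sum(), ranks[16:].sum()
print(t_pills, t_cream, t_drops)  # 85.5, 130.0, 84.5, as in Table 6.9

h_stat, p = kruskal(pills, cream, drops)
# H falls below the chi-square critical value for df = k - 1 = 2 (5.991),
# so the groups do not differ significantly, consistent with the earlier ANOVA.
print(h_stat, p)
```

Letting the library handle the ranking removes the most error-prone step of the longhand procedure, the averaging of tied ranks.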
Next, each of the summed ranks is squared and divided by its respective sample size, completing Formula 6.7:

H = [12 / (24(24 + 1))] [(85.5)²/8 + (130)²/8 + (84.5)²/8] − 3(24 + 1)
H = (12/600) [(7,310.25/8) + (16,900/8) + (7,140.25/8)] − 75
H = 0.02 (913.78 + 2,112.50 + 892.53) − 75
H = 0.02 (3,918.81) − 75
H = 3.38

The H statistic approximates a chi-square (χ²) distribution, which will be discussed in Chapter 11, based on k − 1 degrees of freedom, where k is the number of comparison groups. The chi-square distribution table in Table 6.10 has the critical values based on the degrees of freedom, here k − 1 = 2. Therefore, using the table, χ²critical = 5.991 at the α = .05 level. Our χ²observed value of 3.38 is less than this χ²critical = 5.991 value, meaning that there is no significant difference between groups. As noted in the ANOVA conducted earlier in the chapter, it was expected that a nonsignificant outcome would occur. Nonparametric tests are more conservative compared to parametric ones in that there is a lower probability of finding a significant outcome compared to the parametric counterpart. This also leads to a lower probability of a type I error.

Table 6.10: Chi-square distribution

Area to the right of critical value
Degrees of freedom   0.99    0.975   0.95    0.90    0.10    0.05    0.025   0.01
1       —       0.001   0.004   0.016   2.706   3.841   5.024   6.635
2       0.020   0.051   0.103   0.211   4.605   5.991   7.378   9.210
3       0.115   0.216   0.352   0.584   6.251   7.815   9.348   11.345
4       0.297   0.484   0.711   1.064   7.779   9.488   11.143  13.277
5       0.554   0.831   1.145   1.610   9.236   11.071  12.833  15.086
6       0.872   1.237   1.635   2.204   10.645  12.592  14.449  16.812
7       1.239   1.690   2.167   2.833   12.017  14.067  16.013  18.475
8       1.646   2.180   2.733   3.490   13.362  15.507  17.535  20.090
9       2.088   2.700   3.325   4.168   14.684  16.919  19.023  21.666
10      2.558   3.247   3.940   4.865   15.987  18.307  20.483  23.209
11      3.053   3.816   4.575   5.578   17.275  19.675  21.920  24.725
12      3.571   4.404   5.226   6.304   18.549  21.026  23.337  26.217
(continued)
Table 6.10: Chi-square distribution (continued)

Area to the right of critical value
Degrees of freedom   0.99    0.975   0.95    0.90    0.10    0.05    0.025   0.01
13      4.107   5.009   5.892   7.042   19.812  22.362  24.736  27.688
14      4.660   5.629   6.571   7.790   21.064  23.685  26.119  29.141
15      5.229   6.262   7.261   8.547   22.307  24.996  27.488  30.578
16      5.812   6.908   7.962   9.312   23.542  26.296  28.845  32.000
17      6.408   7.564   8.672   10.085  24.769  27.587  30.191  33.409
18      7.015   8.231   9.390   10.865  25.989  28.869  31.526  34.805
19      7.633   8.907   10.117  11.651  27.204  30.144  32.852  36.191
20      8.260   9.591   10.851  12.443  28.412  31.410  34.170  37.566
21      8.897   10.283  11.591  13.240  29.615  32.671  35.479  38.932
22      9.542   10.982  12.338  14.042  30.813  33.924  36.781  40.289
23      10.196  11.689  13.091  14.848  32.007  35.172  38.076  41.638
24      10.856  12.401  13.848  15.659  33.196  36.415  39.364  42.980
25      11.524  13.120  14.611  16.473  34.382  37.652  40.646  44.314
26      12.198  13.844  15.379  17.292  35.563  38.885  41.923  45.642
27      12.879  14.573  16.151  18.114  36.741  40.113  43.194  46.963
28      13.565  15.308  16.928  18.939  37.916  41.337  44.461  48.278
29      14.257  16.047  17.708  19.768  39.087  42.557  45.722  49.588
30      14.954  16.791  18.493  20.599  40.256  43.773  46.979  50.892

As you will see in the next section, when this analysis is performed in SPSS, a χ² value is given and not an H value per se.

SPSS Steps for the Kruskal-Wallis H-Test

Reexamining the data set used in Figure 6.6, but rearranging the data as depicted in Figure 6.11, the employee groups (Position) are categorically coded with 1 = Laborers, 2 = Clerical, and 3 = Professional. To execute, go to Analyze → Nonparametric Tests → Legacy Dialogs → K Independent Samples. As shown in Figure 6.12, input Optimism (DV) into the Test Variable List box and Position (IV) into the Grouping Variable box, then click the Define Range button just below to input the range of codes for the Position variable—this will be 1 and 3 for the minimum and maximum codes, respectively. Then click OK.
Figure 6.11: Data set in SPSS

Figure 6.12: The Kruskal-Wallis H-test steps in SPSS
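For comparison, the same test can be run outside SPSS. A minimal sketch assuming SciPy, using the optimism data from Section 6.6:

```python
# Kruskal-Wallis H-test on the optimism data (Section 6.6 / Figure 6.11),
# a sketch assuming SciPy; kruskal() reports the tie-corrected chi-square value.
from scipy.stats import kruskal

laborers      = [33, 35, 38, 39, 42, 44, 44, 47, 50, 52]
clerical      = [27, 36, 37, 37, 39, 39, 41, 42, 45, 46]
professionals = [22, 24, 25, 27, 28, 28, 29, 31, 33, 34]

h_stat, p = kruskal(laborers, clerical, professionals)
print(round(h_stat, 3), p)  # matches the SPSS output: chi-square = 17.166, p < .05
```

As with the ANOVA examples, the point-and-click SPSS procedure and a few lines of code produce the same test statistic.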
Figure 6.11: Data set in SPSS

Figure 6.12: The Kruskal-Wallis H-test steps in SPSS

Interpreting Results

The output in Figure 6.13 shows the results of the Kruskal-Wallis H-test. The χ² value in the Test Statistics table shows a result of KW χ²(2) = 17.17, p < .05, so there is an overall statistical difference in optimism amongst the three employee groups. This can be seen in the Ranks table, where the Laborers' mean rank (MR = 21.80) is the highest and the Professionals' (MR = 6.30) is the lowest. Post hoc tests, however, are not readily available as they were for the ANOVA, so follow-up Mann-Whitney U or Wilcoxon rank-sum tests of all possible combinations will have to be performed (see Chapter 5 for these procedures). The conclusion to these results would read as follows:

Based on the Kruskal-Wallis H-test there is a significant difference in the level of optimism of the three groups (KW χ²(2) = 17.17, p < .05). Laborers reported the highest level of optimism (MR = 21.80), followed by Clerical positions (MR = 18.40), and then Professionals (MR = 6.30), who reported the lowest level of optimism.

Figure 6.13: The Kruskal-Wallis H-test output

Ranks
Position        N     Mean Rank
Laborers        10    21.80
Clerical        10    18.40
Professionals   10    6.30
Total           30

Test Statistics (a, b)
              Optimism
Chi-Square    17.166
df            2
Asymp. Sig.   .000
a. Kruskal Wallis Test
b. Grouping Variable: Position

Summary

This chapter is the natural extension of Chapters 4 and 5. Like the z- and t-tests, analysis of variance is a test of significant differences. Also like the z- and t-tests, the IV in ANOVA is nominal and the DV is interval or ratio. With each procedure, whether z, t, or F, the test statistic is a ratio of the differences between groups to the differences within groups (Objective 3).

There are differences between ANOVA and the earlier procedures, of course. The variance statistics are sums of squares and mean squares values. But perhaps the most important difference is that ANOVA can accommodate any number of groups (Objectives 2 and 3). Remember that trying to deal with multiple groups in a t-test introduces the problem of mounting type I error when repeated analyses with the same data indicate statistical significance. One-way ANOVA lifts the limitation of a one-pair-at-a-time comparison (Objective 1).

The other side of multiple comparisons, however, is the difficulty of determining which comparisons are statistically significant when F is significant. This problem is solved with the post hoc test. In this chapter, we used Tukey's HSD (Objective 4). There are other post hoc tests, each having strengths and drawbacks, but HSD is one of the most widely used.

Years ago, the emphasis in the scholarly literature was on whether a result was statistically significant. Today, the focus is on measuring the effect size of a significant result, a statistic that in the case of analysis of variance can indicate how much of the variability in the dependent variable can be attributed to the effect of the independent variable. We answered that question with eta-squared (η²). But neither the post hoc test nor eta-squared is relevant if the F is not significant (Objective 5). Then, further ANOVAs were executed in SPSS, and the results were presented (Objective 6) in APA format and interpreted accordingly (Objective 7). Finally, the nonparametric equivalent of ANOVA, the Kruskal-Wallis
H-test, was discussed as an alternative method and compared to its parametric equivalent, the ANOVA. The same data set was used to compare outcomes. In addition, an appropriate example in SPSS was provided (Objective 8).

The independent t-test and the one-way ANOVA both require that groups be independent. What if they are not? What if we wish to measure one group twice over time, or perhaps more than twice? Such dependent-groups procedures are the focus of Chapter 7. Rather than requiring different thinking, they are more of an elaboration of familiar concepts. For this reason, consider reviewing Chapter 5 and the independent t-test discussion before starting Chapter 7. The one-way ANOVA dramatically broadens the kinds of questions the researcher can ask. The procedures in Chapter 7 for nonindependent groups represent the next incremental step.

Key Terms

analysis of variance: Fisher's test that allows one to detect significant differences among any number of groups. The acronym is ANOVA.

error variance: The variability in a measure unrelated to the variables being analyzed.

eta-squared (η²): A measure of effect size for ANOVA. It estimates the amount of variability in the DV explained by the IV.

F ratio: The test statistic calculated in an analysis of variance problem. It is the ratio of the variance between the groups to the variance within the groups.

factor: Refers to an IV, particularly in procedures that involve more than one.

family-wise error: An inflated type I error rate in hypothesis testing when doing multiple tests with the assumption of different sets of data; specifically, when comparing multiple groups in dyad combinations using a series of t-tests instead of executing one omnibus ANOVA.

homogeneity of variance: When multiple groups of data are distributed similarly.

mean square: The sum of squares divided by its degrees of freedom. This division allows the mean square to reflect a mean, or average, amount of variability from a source.

omnibus test: A test of the overall significance of the model based on differences between sample means when there are more than two groups to compare. The test will not tell you which two means are significantly different, which is why follow-up post hoc comparisons are executed.

one-way ANOVA: The ANOVA in its simplest form; this model has only one independent variable.

post hoc test: A test conducted after a significant ANOVA or some similar test that identifies which among multiple possibilities is statistically significant.

sum of squares (SS): The variance measure in analysis of variance. They are literally the sum of squared deviations between a set of scores and their mean.

sum of squares between: The variability related to the independent variable and any measurement error that may occur.

sum of squares total: Total variance from all sources.

sum of squares within: Variability stemming from different responses from individuals in the same group. It is exclusively error variance. It is also referred to as the sum of squares error or the sum of squares residual.

Chapter Exercises

Answers to Try It! Questions

The answers to all Try It! questions introduced in this chapter are provided below.
A. The "one" in one-way ANOVA refers to the fact that this test accommodates just one independent variable.

B. There is no gender variable in the analysis and, consequently, gender-related variance emerges as error variance. The same would be true for any variability in scores stemming from any variable not being analyzed in the study.
  • 348. is significantly different from which other group because any variability may be nothing more than sampling variability. By the same token, there is no effect to calculate because, as far as we know, the IV does not have any effect on the DV. H. F 5 t2 Review Questions The answers to the odd-numbered items can be found in the answers appendix. 1. Several people selected at random are given a story problem to solve. They take 3.5, 3.8, 4.2, 4.5, 4.7, 5.3, 6.0, and 7.5 minutes. What is the total sum of squares for this data? 2. Identify the following symbols and statistics in a one-way ANOVA: a. The statistic that indicates the mean amount of difference between groups. b. The symbol that indicates the total number of participants. c. The symbol that indicates the number of groups. d. The mean amount of uncontrolled variability. 3. The theory is that there are differences by gender in manifested aggression. With data from Measuring Expressed Aggression Numbers (MEAN), a researcher has the following: Males: 13, 14, 16, 16, 17, 18, 18, 18 Females: 11, 12, 12, 14, 14, 14, 14, 16
  • 349. Complete the problem as an ANOVA. Is the difference statistically significant? 4. Complete Exercise 3 as an independent t-test and demonstrate the relationship between t2 and F. 5. Even with a significant F, there is never a need for a post hoc in a two-group ANOVA. Why? 6. A researcher completes an ANOVA in which the number of years of education completed is analyzed by ethnic group. If h2 5 .36, how should that be interpreted? suk85842_06_c06.indd 231 10/23/13 1:40 PM CHAPTER 6Chapter Exercises 7. Three groups of clients involved in a program for substance abuse attend weekly sessions for 8, 12, and 16 weeks. The DV is the number of days drug free. 8 weeks: 0, 5, 7, 8, 8 12 weeks: 3, 5, 12, 16, 17 16 weeks: 11, 15, 16, 19, 22 a. Is F significant? b. What is the location of the significant difference?
  • 350. c. What does the effect size indicate? 8. Regarding Exercise 7, a. what is the IV? b. what is the scale of the IV? c. what is the DV? d. what is the scale of the DV? 9. For an ANOVA problem, k 5 4 and n 5 8. If SSbet 5 24.0 and SSwith 5 72, a. what is F? b. is the result significant? 10. Consider this partially completed ANOVA table: Source SS df MS F Fcrit Total 94 Between 2 Within 63 3 a. What must be the value of N 2 k? b. What must be the value of k? c. What must be the value of N? d. What must SSbet be? e. Determine MSbet. f. Determine F. g. What is Fcrit? Analyzing the Research Review the article abstracts provided below. You can then
  • 351. access the full articles via your university’s online library portal to answer the critical thinking questions. Answers can be found in the answers appendix. Using ANOVA for an Emotions Study Carolan, L. A., & Power, M. J. (2011). What basic emotions are experienced in bipolar disorder? Clinical Psychology & Psychotherapy, 18(5), 366– 378. suk85842_06_c06.indd 232 10/23/13 1:40 PM CHAPTER 6Chapter Exercises Article Abstract Aims: The aims of this study were to investigate the basic emotions experienced within and between episodes of bipolar disorder and, more specifically, to test the predictions made by the Schematic, Propositional, Analogical and Associative Representation Sys- tems (SPAARS) model that mania is predominantly characterized by the coupling of happiness with anger whereas depression (unipolar and bipolar) primarily comprises a coupling between sadness and disgust. Design: Across-sectional design was employed to examine the differences within and between the bipolar, unipolar and control groups in the emotional profiles. Data were
  • 352. analyzed using one-way ANOVAs. Method: Psychiatric diagnoses in the clinical groups were confirmed using the Structured Clinical Interview for DSM-IV (SCID). It was not administered in the control group. Cur- rent mood state was measured using the Beck Depression Inventory-II, the State–Trait Anxiety Inventory and the Bech–Rafaelsen Mania Scale. The Basic Emotions Scale was used to explore the emotional profiles. Results: The results confirmed the predictions made by the SPAARS model about emo- tions in mania and depression. Out with these episodes, individuals with bipolar disorder experienced elevated levels of disgust. Discussion: Evidence was found in support of the proposal of SPAARS that there are five basic emotions, which form the basis for both normal emotional experience and emotional disorders. Disgust is an important feature of bipolar disorder. Strengths and limitations are discussed, and suggestions for future research are explored. Critical Thinking Questions 1. Why does this study use a one-way ANOVA instead of a t- test? 2. What means are being compared in the bipolar group in this study? 3. According to the following ANOVA results between bipolar and unipolar groups,
  • 353. which result(s) showed significance? F(1,46) 5 0.00; p 5 .93 F(1,19.22) 5 9.81; p 5 .005 F(1,45) 5 1.26; p 5 .26 F(1,44) 5 0.02; p 5 .87 F(1,45) 5 0.13; p 5 .71 4. What types of post hoc test did the paper use as a follow-up to the F statistic? suk85842_06_c06.indd 233 10/23/13 1:40 PM CHAPTER 6Chapter Exercises Using ANOVA for a Health and Physical Activity Study Bize, R., & Plotnikoff, R. C. (2009). The relationship between a short measure of health status and physical activity in a workplace population. Psychology, Health & Medi- cine, 14(1), 53–61. Article Abstract Many interventions promoting physical activity (PA) are effective in preventing disease onset, and although studies have found a positive relationship between health-related quality of life (HRQL) and PA, most of these studies have
  • 354. focused on older adults and those with chronic conditions. Less is known regarding the association between PA level and HRQL among healthy adults. Our objective was to analyse the relationship between PA level and HRQL among a sample of 573 employees aged 20– 68 taking part in a work- place intervention to promote PA. Measures included HRQL (using a single item) and PA (i.e., Godin Leisure-Time Questionnaire). The Modified Canadian Aerobic Fitness Test (MCAFT) was also completed by 10% of the employees. MET-minute scores (assess- ing energy expenditure over one week) were compared across HRQL categories using ANOVA. A multiple linear regression analysis was conducted to further examine the rela- tionship between HRQL and PA, controlling for potential covariates. Participants in the higher health status categories were found to report higher levels of energy expenditure (one-way ANOVA, p , 0.001). In the multiple linear regression model, each unit increase in health status level translated in a mean increase of 356 MET- minutes in energy expen- diture (p , 0.001). This single-item assessment of health status explained six percent of the variance in energy expenditure. The study concludes that higher energy expenditure through PA among an adult workplace population is positively associated with increased health status, and it also suggests that a single-item HRQL measure is suitable for com- munity- and population based studies, reducing response burden and research costs.
  • 355. Critical Thinking Questions 1. Why did this study execute a Kruskal-Wallis H-test? 2. It was stated that the higher health status categories reported higher mean energy expenditure of the one-way ANOVA, and the Kruskal-Wallis yielded similar results. To make this plausible, what would the significance level of the Kruskal- Wallis have been? 3. After evaluating figure 1, we can see there is a difference in higher health status and higher energy expenditure. From this information, should they have run a post hoc test? Why or why not? suk85842_06_c06.indd 234 10/23/13 1:40 PM Research Question FOR WEEK ONE Background During this week you will brainstorm a list of research questions you are interested in, which will help you work towards your Week 1 Assignment. You are working towards creating a list of at least 10 unique research questions that encompass a variety of topics and types of variables. Think about exploring relationships between variables, making predictions for one variable using one or more other variables, and determining differences between groups across one or two variables. In future weeks, you will pull questions from this list that might lend themselves to a particular statistical analysis, thus saving valuable time in not needing to brainstorm research ideas. During those weeks you will take the research question and create a mini-research proposal that will help you consider
the application of a specific statistical analysis to that question.

Discussion Assignment Requirements

Initial Posting - To earn full participation points, include in your initial posting at least 5 potential research questions by Day 3. Have fun with these questions and choose topics you are truly interested in, whether they are leadership, training, sports, social media, politics, movies, or food. This will make the research design process much more enjoyable. If you need help coming up with ideas, ask your instructor for examples. Also, feel free to post more than 5 research questions as it would be useful to get feedback on as many questions as possible.

For each of the questions, provide the following:
· List the research question (be sure to phrase it as a measurable question)
· Identify the variables presented in the question
· Provide an operational definition for each variable
· Describe each variable's scale of measurement (nominal, ordinal, interval, or ratio) and characteristics (i.e., discrete vs. continuous, numerical vs. categorical, etc.)

Replies - Though you may respond to your peers multiple times during the week to provide support or feedback, students are required to respond substantively to at least two of their classmates' postings by

ANSWER FOR DISCUSSION WEEK 1

Research Discussion

Research question one: How does leadership style affect organizational performance?
In this research question, the independent variable is leadership style, while the dependent variable is organizational performance (Sukal, 2019). Leadership styles are techniques used by organizations to run their activities to achieve their objectives. Organizational performance entails the various achievements of an entity that accrue from its business operations. An ordinal scale of measurement can be used in this case.
Research question two: What are the effects of technology on students' performance?
In this case, technology is the independent variable while students' performance is the dependent variable. Technology in education is scientific knowledge used to improve the level of education (Sukal, 2019). Student performance refers to how students carry out their studies. An ordinal scale of measurement is appropriate to measure how technology affects students' performance.

Research question three: What are the effects of smoking on human health?
Smoking is the independent variable, while human health is the dependent variable. Smoking is the inhalation of tobacco products, while human health is the well-being of the human condition (Carruthers & Maggard, 2019). An ordinal scale of measurement is used in this case.

Research question four: What are the effects of training on employee performance?
Training is the independent variable, while employee performance is the dependent variable (Carruthers & Maggard, 2019). Training involves equipping employees with the knowledge to perform their duties appropriately. Employee performance is the output that accrues from different activities. An ordinal scale is used in this research question.

Research question five: How do management styles affect employee performance?
Management styles are the independent variable, while employee performance is the dependent variable (Carruthers & Maggard, 2019). Management styles are techniques used by the management to run business activities, while employee performance is the output accrued from employees' actions. An ordinal scale is used in this research question.

References

Carruthers, M. W., & Maggard, M. (2019). Smart Lab: A statistics primer. San Diego, CA: Bridgepoint Education, Inc.

Sukal, M. (2019). Research methods: Applying statistics in
research. San Diego, CA: Bridgepoint Education, Inc.

PROFESSOR RESPONSE: Interesting questions! Please be sure to include operational definitions of your DVs, e.g., employee performance. How would you measure it? It might be helpful to review the operational definition announcement in the course. Remember, we need to include enough detail about our methodology and variables so that anyone could replicate our work.
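Following up on the instructor's feedback: a research variable is only fully specified once it has a name, an operational definition, and a scale of measurement. The sketch below is an illustrative Python recording of research question four, with made-up operational definitions (the scale-to-statistics mapping reflects the standard nominal/ordinal/interval/ratio hierarchy):

```python
from dataclasses import dataclass

# Which summary statistics are meaningful at each scale of measurement
MEANINGFUL_STATS = {
    "nominal": {"mode", "frequency"},
    "ordinal": {"mode", "frequency", "median", "range"},
    "interval": {"mode", "frequency", "median", "range", "mean", "sd"},
    "ratio": {"mode", "frequency", "median", "range", "mean", "sd", "ratio"},
}

@dataclass
class Variable:
    name: str
    operational_definition: str  # exactly how the variable is measured
    scale: str                   # nominal / ordinal / interval / ratio

    def allowed_stats(self):
        return MEANINGFUL_STATS[self.scale]

# Research question four restated with operational definitions,
# as the instructor requested (definitions here are illustrative only)
training = Variable(
    "training hours",
    "total hours of formal training completed in the last quarter",
    "ratio",
)
performance = Variable(
    "employee performance",
    "supervisor rating on a 1-5 behaviorally anchored rating scale",
    "ordinal",
)
```

An ordinal DV such as a 1-5 supervisor rating points toward rank-based tests like the Kruskal-Wallis H-test covered in this chapter, whereas a ratio-scaled DV would support a parametric ANOVA.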