Step-by-step guide to critiquing research. Part 1: quantitative research
Michael Coughlan, Patricia Cronin, Frances Ryan
Abstract
When caring for patients it is essential that nurses are using the current best practice. To determine what this is, nurses must be able to read research critically. But for many qualified and student nurses the terminology used in research can be difficult to understand, thus making critical reading even more daunting. It is imperative in nursing that care has its foundations in sound research and it is essential that all nurses have the ability to critically appraise research to identify what is best practice. This article is a step-by-step approach to critiquing quantitative research to help nurses demystify the process and decode the terminology.

Key words: Quantitative research methodologies • Review process • Research
For many qualified nurses and nursing students research is research, and it is often quite difficult to grasp what others are referring to when they discuss the limitations and/or strengths within
a research study. Research texts and journals refer to
critiquing the literature, critical analysis, reviewing the
literature, evaluation and appraisal of the literature which
are in essence the same thing (Bassett and Bassett, 2003).
Terminology in research can be confusing for the novice
research reader where a term like 'random' refers to an
organized manner of selecting items or participants, and the
word 'significance' is applied to a degree of chance. Thus
the aim of this article is to take a step-by-step approach to
critiquing research in an attempt to help nurses demystify
the process and decode the terminology.
When caring for patients it is essential that nurses are
using the current best practice. To determine what this is
nurses must be able to read research. The adage 'All that
glitters is not gold' is also true in research. Not all research
is of the same quality or of a high standard and therefore
nurses should not take research at face value simply because it has been published (Cullum and Droogan, 1999; Polit and Beck, 2006). Critiquing is a systematic method of
Michael Coughlan, Patricia Cronin and Frances Ryan are Lecturers, School of Nursing and Midwifery, University of Dublin, Trinity College, Dublin
Accepted for publication: March 2007
appraising the strengths and limitations of a piece of research
in order to determine its credibility and/or its applicability
to practice (Valente, 2003). Seeking only limitations in a study is criticism, and critiquing and criticism are not the same (Burns and Grove, 1997). A critique is an impersonal evaluation of the strengths and limitations of the research being reviewed and should not be seen as a disparagement of the researcher's ability. Neither should it be regarded as
a jousting match between the researcher and the reviewer.
Burns and Grove (1999) call this an 'intellectual critique'
in that it is not the creator but the creation that is being
evaluated. The reviewer maintains objectivity throughout
the critique. No personal views are expressed by the
reviewer and the strengths and/or limitations of the study
and the implications of these are highlighted with reference
to research texts or journals. It is also important to remember
that research works within the realms of probability where
nothing is absolutely certain. It is therefore important to
refer to the apparent strengths, limitations and findings
of a piece of research (Burns and Grove, 1997). The use
of personal pronouns is also avoided in order that an
appearance of objectivity can be maintained.
Credibility and integrity
There are numerous tools available to help both novice and
advanced reviewers to critique research studies (Tanner,
2003). These tools generally ask questions that can help the
reviewer to determine the degree to which the steps in the
research process were followed. However, some steps are
more important than others and very few tools acknowledge
this. Ryan-Wenger (1992) suggests that questions in a
critiquing tool can be subdivided in those that are useful
for getting a feel for the study being presented which she
calls 'credibility variables' and those that are essential for
evaluating the research process called 'integrity variables'.
Credibility variables concentrate on how believable the
work appears and focus on the researcher's qualifications and
ability to undertake and accurately present the study. The
answers to these questions are important when critiquing
a piece of research as they can offer the reader an insight
into what to expect in the remainder of the study.
However, the reader should be aware that identified strengths
and limitations within this section will not necessarily
correspond with what will be found in the rest of the work.
Integrity questions, on the other hand, are interested in the
robustness of the research method, seeking to identify how
appropriately and accurately the researcher followed the
steps in the research process. The answers to these questions
658 British Journal of Nursing, 2007, Vol 16, No 11
RESEARCH METHODOLOGIES
Table 1. Research questions: guidelines for critiquing a quantitative research study

Elements influencing the believability of the research

Writing style: Is the report well written: concise, grammatically correct, avoiding jargon? Is it well laid out and organized?
Author: Do the researcher(s') qualifications/position indicate a degree of knowledge in this particular field?
Report title: Is the title clear, accurate and unambiguous?
Abstract: Does the abstract offer a clear overview of the study, including the research problem, sample, methodology, findings and recommendations?

Elements influencing the robustness of the research

Purpose/research problem: Is the purpose of the study/research problem clearly identified?
Logical consistency: Does the research report follow the steps of the research process in a logical manner? Do these steps flow naturally and are the links clear?
Literature review: Is the review logically organized? Does it offer a balanced critical analysis of the literature? Is the majority of the literature of recent origin? Is it mainly from primary sources and of an empirical nature?
Theoretical framework: Has a conceptual or theoretical framework been identified? Is the framework adequately described? Is the framework appropriate?
Aims/objectives/research question/hypotheses: Have aims and objectives, a research question or hypothesis been identified? If so, are they clearly stated? Do they reflect the information presented in the literature review?
Sample: Has the target population been clearly identified? How was the sample selected? Was it a probability or non-probability sample? Is it of adequate size? Are the inclusion/exclusion criteria clearly identified?
Ethical considerations: Were the participants fully informed about the nature of the research? Was the autonomy/confidentiality of the participants guaranteed? Were the participants protected from harm? Was ethical permission granted for the study?
Operational definitions: Are all the terms, theories and concepts mentioned in the study clearly defined?
Methodology: Is the research design clearly identified? Has the data-gathering instrument been described? Is the instrument appropriate? How was it developed? Were reliability and validity testing undertaken and the results discussed? Was a pilot study undertaken?
Data analysis/results: What type of data and statistical analysis was undertaken? Was it appropriate? How many of the sample participated? Were the findings significant?
Discussion: Are the findings linked back to the literature review? If a hypothesis was identified, was it supported? Were the strengths and limitations of the study, including generalizability, discussed? Was a recommendation for further research made?
References: Were all the books, journals and other media alluded to in the study accurately referenced?
will help to identify the trustworthiness of the study and its
applicability to nursing practice.
Critiquing the research steps
In critiquing the steps in the research process a number
of questions need to be asked. However, these questions
are seeking more than a simple 'yes' or 'no' answer. The
questions are posed to stimulate the reviewer to consider
the implications of what the researcher has done. Does the
way a step has been applied appear to add to the strength
of the study or does it appear as a possible limitation to
implementation of the study's findings? (Table 1).
Elements influencing believability of the study
Writing style
Research reports should be well written, grammatically
correct, concise and well organized. The use of jargon should
be avoided where possible. The style should be such that it
attracts the reader to read on (Polit and Beck, 2006).
Author(s)
The author(s') qualifications and job title can be a useful indicator of the researcher(s') knowledge of the area
under investigation and ability to ask the appropriate
questions (Conkin Dale, 2005). Conversely a research
study should be evaluated on its own merits and not
assumed to be valid and reliable simply based on the
author(s') qualifications.
Report title
The title should be between 10 and 15 words long and
should clearly identify for the reader the purpose of the
study (Connell Meehan, 1999). Titles that are too long or
too short can be confusing or misleading (Parahoo, 2006).
Abstract
The abstract should provide a succinct overview of the
research and should include information regarding the
purpose of the study, method, sample size and selection, the main findings, conclusions and recommendations
(Conkin Dale, 2005). From the abstract the reader should
be able to determine if the study is of interest and whether
or not to continue reading (Parahoo, 2006).
Elements influencing robustness
Purpose of the study/research problem
A research problem is often first presented to the reader in
the introduction to the study (Bassett and Bassett, 2003).
Depending on what is to be investigated some authors will
refer to it as the purpose of the study. In either case the
statement should at least broadly indicate to the reader what
is to be studied (Polit and Beck, 2006). Broad problems are
often multi-faceted and will need to become narrower and
more focused before they can be researched. In this the
literature review can play a major role (Parahoo, 2006).
Logical consistency
A research study needs to follow the steps in the process in a
logical manner. There should also be a clear link between the
steps beginning with the purpose of the study and following
through the literature review, the theoretical framework, the
research question, the methodology section, the data analysis,
and the findings (Ryan-Wenger, 1992).
Literature review
The primary purpose of the literature review is to define
or develop the research question while also identifying
an appropriate method of data collection (Burns and
Grove, 1997). It should also help to identify any gaps in
the literature relating to the problem and to suggest how
those gaps might be filled. The literature review should
demonstrate an appropriate depth and breadth of reading
around the topic in question. The majority of studies
included should be of recent origin and ideally less than
five years old. However, there may be exceptions to this,
for example, in areas where there is a lack of research, or a
seminal or all-important piece of work that is still relevant to
current practice. It is important also that the review should
include some historical as well as contemporary material
in order to put the subject being studied into context. The
depth of coverage will depend on the nature of the subject,
for example, for a subject with a vast range of literature then
the review will need to concentrate on a very specific area
(Carnwell, 1997). Another important consideration is the
type and source of literature presented. Primary empirical
data from the original source is more favourable than a
secondary source or anecdotal information where the
author relies on personal evidence or opinion that is not
founded on research.
A good review usually begins with an introduction which
identifies the key words used to conduct the search and
information about which databases were used. The themes
that emerged from the literature should then be presented
and discussed (Carnwell, 1997). In presenting previous
work it is important that the data is reviewed critically,
highlighting both the strengths and limitations of the study.
It should also be compared and contrasted with the findings
of other studies (Burns and Grove, 1997).
Theoretical framework
Following the identification of the research problem
and the review of the literature the researcher should
present the theoretical framework (Bassett and Bassett,
2003). Theoretical frameworks are a concept that novice
and experienced researchers find confusing. It is initially
important to note that not all research studies use a defined
theoretical framework (Robson, 2002). A theoretical
framework can be a conceptual model that is used as a
guide for the study (Conkin Dale, 2005) or themes from
the literature that are conceptually mapped and used to set
boundaries for the research (Miles and Huberman, 1994).
A sound framework also identifies the various concepts
being studied and the relationship between those concepts
(Burns and Grove, 1997). Such relationships should have
been identified in the literature. The research study should
then build on this theory through empirical observation.
Some theoretical frameworks may include a hypothesis.
Theoretical frameworks tend to be better developed in
experimental and quasi-experimental studies and often
poorly developed or non-existent in descriptive studies
(Burns and Grove, 1999). The theoretical framework should
be clearly identified and explained to the reader.
Aims and objectives/research question/
research hypothesis
The purpose of the aims and objectives of a study, the research
question and the research hypothesis is to form a link between
the initially stated purpose of the study or research problem
and how the study will be undertaken (Burns and Grove,
1999). They should be clearly stated and be congruent with
the data presented in the literature review. The use of these
items is dependent on the type of research being performed.
Some descriptive studies may not identify any of these items
but simply refer to the purpose of the study or the research
problem, others will include either aims and objectives or
research questions (Burns and Grove, 1999). Correlational
designs study the relationships that exist between two or
more variables and accordingly use either a research question
or hypothesis. Experimental and quasi-experimental studies
should clearly state a hypothesis identifying the variables to
be manipulated, the population that is being studied and the
predicted outcome (Burns and Grove, 1999).
Sample and sample size
The degree to which a sample reflects the population it
was drawn from is known as representativeness and in
quantitative research this is a decisive factor in determining
the adequacy of a study (Polit and Beck, 2006). In order
to select a sample that is likely to be representative and
thus identify findings that are probably generalizable to
the target population a probability sample should be used
(Parahoo, 2006). The size of the sample is also important in
quantitative research as small samples are at risk of being
overly representative of small subgroups within the target
population. For example, if, in a sample of general nurses, it
was noticed that 40% of the respondents were males, then
males would appear to be over represented in the sample,
thereby creating a sampling error. The risk of sampling
errors decreases as larger sample sizes are used (Burns and
Grove, 1997). In selecting the sample the researcher should
clearly identify who the target population are and what
criteria were used to include or exclude participants. It
should also be evident how the sample was selected and
how many were invited to participate (Russell, 2005).
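The idea of representativeness described above can be sketched in code. The sketch below is illustrative only: the population size, the 10% male share and the use of Python's `random.sample` as a stand-in for probability sampling are all invented for the example. It draws two simple random samples and shows that the subgroup share in the larger sample tends to sit closer to the population's.

```python
import random

random.seed(42)  # reproducible illustration

# Hypothetical target population of 10,000 general nurses,
# 10% of whom are male (invented figures for illustration).
population = (["male"] * 1_000) + (["female"] * 9_000)

def subgroup_share(sample, group):
    """Proportion of the sample belonging to one subgroup."""
    return sample.count(group) / len(sample)

# A small probability sample risks over-representing a subgroup...
small = random.sample(population, 20)
# ...while a larger one usually lies much closer to the true 10%.
large = random.sample(population, 2_000)

print(f"male share, n=20:   {subgroup_share(small, 'male'):.2%}")
print(f"male share, n=2000: {subgroup_share(large, 'male'):.2%}")
```

Re-running with different seeds shows the small sample's share swinging widely while the large sample's stays near 10%, which is the sampling-error point made above.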
Ethical considerations
Beauchamp and Childress (2001) identify four fundamental
moral principles: autonomy, non-maleficence, beneficence
and justice. Autonomy infers that an individual has the right
to freely decide to participate in a research study without
fear of coercion and with a full knowledge of what is being
investigated. Non-maleficence implies an intention of not
harming and preventing harm occurring to participants
both of a physical and psychological nature (Parahoo,
2006). Beneficence is interpreted as the research benefiting
the participant and society as a whole (Beauchamp and
Childress, 2001). Justice is concerned with all participants
being treated as equals and no one group of individuals
receiving preferential treatment because, for example, of
their position in society (Parahoo, 2006). Beauchamp and
Childress (2001) also identify four moral rules that are both
closely connected to each other and with the principle of
autonomy. They are veracity (truthfulness), fidelity (loyalty
and trust), confidentiality and privacy. The latter pair are often
linked and imply that the researcher has a duty to respect the
confidentiality and/or the anonymity of participants and
non-participating subjects.
Ethical committees or institutional review boards have to
give approval before research can be undertaken. Their role
is to determine that ethical principles are being applied and
that the rights of the individual are being adhered to (Burns
and Grove, 1999).
Operational definitions
In a research study the researcher needs to ensure that
the reader understands what is meant by the terms and
concepts that are used in the research. To ensure this any
concepts or terms referred to should be clearly defined
(Parahoo, 2006).
Methodology: research design
Methodology refers to the nuts and bolts of how a
research study is undertaken. There are a number of
important elements that need to be referred to here and
the first of these is the research design. There are several
types of quantitative studies that can be structured under
the headings of true experimental, quasi-experimental
and non-experimental designs (Robson, 2002) (Table 2).
Although it is outside the remit of this article, within each
of these categories there are a range of designs that will
impact on how the data collection and data analysis phases
of the study are undertaken. However, Robson (2002)
states these designs are similar in many respects as most
are concerned with patterns of group behaviour, averages,
tendencies and properties.
Methodology: data collection
The next element to consider after the research design
is the data collection method. In a quantitative study any
number of strategies can be adopted when collecting data
and these can include interviews, questionnaires, attitude
scales or observational tools. Questionnaires are the most
commonly used data gathering instruments and consist
mainly of closed questions with a choice of fixed answers.
Postal questionnaires are administered via the mail and have
the value of perceived anonymity. Questionnaires can also be
administered in face-to-face interviews or in some instances
over the telephone (Polit and Beck, 2006).
Methodology: instrument design
After identifying the appropriate data gathering method
the next step that needs to be considered is the design
of the instrument. Researchers have the choice of using
a previously designed instrument or developing one for
the study and this choice should be clearly declared for
the reader. Designing an instrument is a protracted and
sometimes difficult process (Burns and Grove, 1997) but the
overall aim is that the final questions will be clearly linked
to the research questions and will elicit accurate information
and will help achieve the goals of the research. This, however,
needs to be demonstrated by the researcher.
Table 2. Research designs

Experimental
  Sample: two or more groups
  Sample allocation: random
  Features: groups get different treatments
  Outcome: cause-and-effect relationship

Quasi-experimental
  Sample: one or more groups
  Sample allocation: random
  Features: one variable has not been manipulated or controlled (usually because it cannot be)
  Outcome: cause-and-effect relationship, but less powerful than experimental

Non-experimental (e.g. descriptive; includes cross-sectional, correlational, comparative and longitudinal studies)
  Sample: one or more groups
  Sample allocation: not applicable
  Features: discover new meaning; describe what already exists; measure the relationship between two or more variables
  Outcome: possible hypothesis for future research; tentative explanations
If a previously designed instrument is selected the researcher
should clearly establish that the chosen instrument is the most appropriate. This is achieved by outlining how the instrument
has measured the concepts under study. Previously designed
instruments are often in the form of standardized tests
or scales that have been developed for the purpose of
measuring a range of views, perceptions, attitudes, opinions
or even abilities. There are a multitude of tests and scales
available, therefore the researcher is expected to provide the
appropriate evidence in relation to the validity and reliability
of the instrument (Polit and Beck, 2006).
Methodology: validity and reliability
One of the most important features of any instrument is
that it measures the concept being studied in an unwavering
and consistent way. These are addressed under the broad
headings of validity and reliability respectively. In general,
validity is described as the ability of the instrument to
measure what it is supposed to measure and reliability the
instrument's ability to consistently and accurately measure
the concept under study (Wood et al, 2006). For the most
part, if a well established 'off the shelf' instrument has been
used and not adapted in any way, the validity and reliability
will have been determined already and the researcher
should outline what this is. However, if the instrument
has been adapted in any way or is being used for a new
population then previous validity and reliability will not
apply. In these circumstances the researcher should indicate
how the reliability and validity of the adapted instrument
was established (Polit and Beck, 2006).
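One widely used way of expressing an instrument's internal-consistency reliability is Cronbach's alpha. The sketch below is offered purely as an illustration (the article does not prescribe a particular statistic, and the item scores are invented): alpha compares the variance of the items taken individually with the variance of respondents' total scores.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-item score columns.

    item_scores: list of k lists, each holding one item's scores
    across the same respondents.
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    """
    k = len(item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_var = sum(pvariance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Invented scores: three questionnaire items answered by five respondents.
items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 5],
]
print(f"alpha = {cronbach_alpha(items):.2f}")  # here approximately 0.89
```

Values nearer 1 indicate that the items move together, which is the consistency the paragraph above asks the researcher to demonstrate.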
To establish if the chosen instrument is clear and
unambiguous and to ensure that the proposed study has
been conceptually well planned a mini-version of the main
study, referred to as a pilot study, should be undertaken before
the main study. Samples used in the pilot study are generally
omitted from the main study. Following the pilot study the
researcher may adjust definitions, alter the research question,
address changes to the measuring instrument or even alter
the sampling strategy.
Having described the research design, the researcher should
outline in clear, logical steps the process by which the data
was collected. All steps should be fully described and easy to
follow (Russell, 2005).
Analysis and results
Data analysis in quantitative research studies is often seen
as a daunting process. Much of this is associated with
apparently complex language and the notion of statistical
tests. The researcher should clearly identify what statistical
tests were undertaken, why these tests were used and
what the results were. A rule of thumb is that studies that are descriptive in design use only descriptive statistics, while correlational, quasi-experimental and experimental studies use inferential statistics. The latter are subdivided
into tests to measure relationships and differences between
variables (Clegg, 1990).
Inferential statistical tests are used to identify if a
relationship or difference between variables is statistically
significant. Statistical significance helps the researcher to
rule out one important threat to validity and that is that the
result could be due to chance rather than to real differences
in the population. Quantitative studies usually identify the
lowest level of significance as P≤0.05 (P = probability)
(Clegg, 1990).
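The idea that an observed difference "could be due to chance" can be made concrete with a small simulation. The sketch below uses a permutation test, chosen here only because it needs no statistical tables; it is not a method the article itself discusses, and the two groups and their scores are invented.

```python
import random

random.seed(1)  # reproducible illustration

def permutation_p_value(group_a, group_b, n_perm=10_000):
    """Two-sided p-value for the difference in group means.

    Repeatedly shuffles the pooled scores back into two groups of
    the original sizes and counts how often a mean difference at
    least as large as the observed one arises by chance alone.
    """
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = group_a + group_b
    hits = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            hits += 1
    return hits / n_perm

# Invented pain scores under two hypothetical care regimes.
control = [6, 7, 5, 8, 7, 6, 7]
treated = [4, 5, 3, 5, 4, 6, 4]
p = permutation_p_value(control, treated)
verdict = "significant" if p <= 0.05 else "not significant"
print(f"P = {p:.4f} -> {verdict} at the 0.05 level")
```

A small P here means a difference this large would rarely arise from shuffling alone, which is exactly the "rule out chance" reasoning described above.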
To enhance readability researchers frequently present
their findings and data analysis section under the headings
of the research questions (Russell, 2005). This can help the
reviewer determine if the results that are presented clearly
answer the research questions. Tables, charts and graphs may
be used to summarize the results and should be accurate,
clearly identified and enhance the presentation of results
(Russell, 2005).
The percentage of the sample who participated in
the study is an important element in considering the
generalizability of the results. At least fifty percent of the
sample is needed to participate if a response bias is to be
avoided (Polit and Beck, 2006).
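The response-rate check described above is simple arithmetic; a minimal helper might look like this (the figures are invented, and the 50% threshold is the article's rule of thumb rather than a universal standard).

```python
def response_rate(responses: int, sample_size: int) -> float:
    """Percentage of the invited sample who actually participated."""
    return 100 * responses / sample_size

# Below 50% participation, response bias becomes a concern
# (threshold per the article's rule of thumb; figures invented).
rate = response_rate(120, 300)
flag = "risk of response bias" if rate < 50 else "acceptable"
print(f"{rate:.0f}% responded -> {flag}")
```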
Discussion/conclusion/recommendations
The discussion of the findings should flow logically from the data and should be related back to the literature review, thus placing the study in context (Russell, 2005). If the hypothesis
was deemed to have been supported by the findings,
the researcher should develop this in the discussion. If a
theoretical or conceptual framework was used in the study
then the relationship with the findings should be explored.
Any interpretations or inferences drawn should be clearly
identified as such and consistent with the results.
The significance of the findings should be stated but
these should be considered within the overall strengths
and limitations of the study (Polit and Beck, 2006). In this
section some consideration should be given to whether
or not the findings of the study were generalizable, also
referred to as external validity. Not all studies make a claim
to generalizability but the researcher should have undertaken
an assessment of the key factors in the design, sampling and
analysis of the study to support any such claim.
Finally the researcher should have explored the clinical
significance and relevance of the study. Applying findings
in practice should be suggested with caution and will
obviously depend on the nature and purpose of the study.
In addition, the researcher should make relevant and
meaningful suggestions for future research in the area
(Connell Meehan, 1999).
References
The research study should conclude with an accurate list
of all the books, journal articles, reports and other media
that were referred to in the work (Polit and Beck, 2006).
The referenced material is also a useful source of further
information on the subject being studied.
Conclusions
The process of critiquing involves an in-depth examination
of each stage of the research process. It is not a criticism but
rather an impersonal scrutiny of a piece of work using a
balanced and objective approach, the purpose of which is to
highlight both strengths and weaknesses, in order to identify
whether a piece of research is trustworthy and unbiased. As nursing practice is becoming increasingly evidence based, it is important that care has its foundations in sound research. It is therefore important that all nurses have the ability to critically appraise research in order to identify what is best practice.
References

Bassett C, Bassett J (2003) Reading and critiquing research. Br J Perioper Nurs 13(4): 162-4
Beauchamp T, Childress J (2001) Principles of Biomedical Ethics. 5th edn. Oxford University Press, Oxford
Burns N, Grove S (1997) The Practice of Nursing Research: Conduct, Critique and Utilization. 3rd edn. WB Saunders Company, Philadelphia
Burns N, Grove S (1999) Understanding Nursing Research. 2nd edn. WB Saunders Company, Philadelphia
Carnwell R (1997) Critiquing research. Nurs Pract 8(12): 16-21
Clegg F (1990) Simple Statistics: A Course Book for the Social Sciences. 2nd edn. Cambridge University Press, Cambridge
Conkin Dale J (2005) Critiquing research for use in practice. J Pediatr Health Care 19: 183-6
Connell Meehan T (1999) The research critique. In: Treacy P, Hyde A, eds. Nursing Research and Design. UCD Press, Dublin: 57-74
Cullum N, Droogan J (1999) Using research and the role of systematic reviews of the literature. In: Mulhall A, Le May A, eds. Nursing Research: Dissemination and Implementation. Churchill Livingstone, Edinburgh: 109-23
Miles M, Huberman A (1994) Qualitative Data Analysis. 2nd edn. Sage, Thousand Oaks, Ca
Parahoo K (2006) Nursing Research: Principles, Process and Issues. 2nd edn. Palgrave Macmillan, Houndmills Basingstoke
Polit D, Beck C (2006) Essentials of Nursing Research: Methods, Appraisal and Utilization. 6th edn. Lippincott Williams and Wilkins, Philadelphia
Robson C (2002) Real World Research. 2nd edn. Blackwell Publishing, Oxford
Russell C (2005) Evaluating quantitative research reports. Nephrol Nurs J 32(1): 61-4
Ryan-Wenger N (1992) Guidelines for critique of a research report. Heart Lung 21(4): 394-401
Tanner J (2003) Reading and critiquing research. Br J Perioper Nurs 13(4): 162-4
Valente S (2003) Research dissemination and utilization: improving care at the bedside. J Nurs Care Qual 18(2): 114-21
Wood MJ, Ross-Kerr JC, Brink PJ (2006) Basic Steps in Planning Nursing Research: From Question to Proposal. 6th edn. Jones and Bartlett, Sudbury
KEY POINTS

- Many qualified and student nurses have difficulty understanding the concepts and terminology associated with research and research critique.
- The ability to critically read research is essential if the profession is to achieve and maintain its goal to be evidence based.
- A critique of a piece of research is not a criticism of the work, but an impersonal review to highlight the strengths and limitations of the study.
- It is important that all nurses have the ability to critically appraise research in order to identify what is best practice.
nurses should not take research at face value simply
because it has been published (Cullum and Droogan, 1999;
Polit and Beck, 2006). Critiquing is a systematic method of
appraising the strengths and limitations of a piece of research
in order to determine its credibility and/or its applicability
to practice (Valente, 2003).

Michael Coughlan, Patricia Cronin and Frances Ryan are
Lecturers, School of Nursing and Midwifery, University of Dublin,
Trinity College, Dublin. Accepted for publication: March 2007

Seeking only limitations in a
study is criticism; critiquing and criticism are not the
same (Burns and Grove, 1997). A critique is an impersonal
evaluation of the strengths and limitations of the research
being reviewed and should not be seen as a disparagement
of the researcher's ability. Neither should it be regarded as
a jousting match between the researcher and the reviewer.
Burns and Grove (1999) call this an 'intellectual critique'
in that it is not the creator but the creation that is being
evaluated. The reviewer maintains objectivity throughout
the critique. No personal views are expressed by the
reviewer and the strengths and/or limitations of the study
and the implications of these are highlighted with reference
to research texts or journals. It is also important to remember
that research works within the realms of probability where
nothing is absolutely certain. It is therefore important to
refer to the apparent strengths, limitations and findings
of a piece of research (Burns and Grove, 1997). The use
of personal pronouns is also avoided in order that an
appearance of objectivity can be maintained.
Credibility and integrity
There are numerous tools available to help both novice and
advanced reviewers to critique research studies (Tanner,
2003). These tools generally ask questions that can help the
reviewer to determine the degree to which the steps in the
research process were followed. However, some steps are
more important than others and very few tools acknowledge
this. Ryan-Wenger (1992) suggests that questions in a
critiquing tool can be subdivided into those that are useful
for getting a feel for the study being presented which she
calls 'credibility variables' and those that are essential for
evaluating the research process called 'integrity variables'.
Credibility variables concentrate on how believable the
work appears and focus on the researcher's qualifications and
ability to undertake and accurately present the study. The
answers to these questions are important when critiquing
a piece of research as they can offer the reader an insight
into what to expect in the remainder of the study.
However, the reader should be aware that identified strengths
and limitations within this section will not necessarily
correspond with what will be found in the rest of the work.
Integrity questions, on the other hand, are interested in the
robustness of the research method, seeking to identify how
appropriately and accurately the researcher followed the
steps in the research process. The answers to these questions
British Journal of Nursing, 2007, Vol 16, No 11: 658
RESEARCH METHODOLOGIES
Table 1. Research questions: guidelines for critiquing a quantitative research study

Elements influencing the believability of the research

Writing style: Is the report well written - concise, grammatically correct, avoiding jargon? Is it well laid out and organized?
Author: Do the researchers' qualifications/positions indicate a degree of knowledge in this particular field?
Report title: Is the title clear, accurate and unambiguous?
Abstract: Does the abstract offer a clear overview of the study, including the research problem, sample, methodology, findings and recommendations?

Elements influencing the robustness of the research

Purpose/research problem: Is the purpose of the study/research problem clearly identified?
Logical consistency: Does the research report follow the steps of the research process in a logical manner? Do these steps naturally flow and are the links clear?
Literature review: Is the review logically organized? Does it offer a balanced critical analysis of the literature? Is the majority of the literature of recent origin? Is it mainly from primary sources and of an empirical nature?
Theoretical framework: Has a conceptual or theoretical framework been identified? Is the framework adequately described? Is the framework appropriate?
Aims/objectives/research question/hypotheses: Have aims and objectives, a research question or hypothesis been identified? If so, are they clearly stated? Do they reflect the information presented in the literature review?
Sample: Has the target population been clearly identified? How was the sample selected? Was it a probability or non-probability sample? Is it of adequate size? Are the inclusion/exclusion criteria clearly identified?
Ethical considerations: Were the participants fully informed about the nature of the research? Was the autonomy/confidentiality of the participants guaranteed? Were the participants protected from harm? Was ethical permission granted for the study?
Operational definitions: Are all the terms, theories and concepts mentioned in the study clearly defined?
Methodology: Is the research design clearly identified? Has the data-gathering instrument been described? Is the instrument appropriate? How was it developed? Were reliability and validity testing undertaken and the results discussed? Was a pilot study undertaken?
Data analysis/results: What type of data and statistical analysis was undertaken? Was it appropriate? How many of the sample participated? What was the significance of the findings?
Discussion: Are the findings linked back to the literature review? If a hypothesis was identified, was it supported? Were the strengths and limitations of the study, including generalizability, discussed? Was a recommendation for further research made?
References: Were all the books, journals and other media alluded to in the study accurately referenced?
will help to identify the trustworthiness of the study and its
applicability to nursing practice.
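For reviewers who like to work through the Table 1 questions systematically, the guideline can be treated as a simple checklist. This is purely an illustrative sketch: the element names follow Table 1, but the data structure and the "strength/limitation/unclear" labels are assumptions of this example, not part of the authors' guideline.

```python
# Hypothetical checklist built from the Table 1 elements:
# record a judgement per element, then summarize coverage.

CHECKLIST = {
    "believability": ["Writing style", "Author", "Report title", "Abstract"],
    "robustness": [
        "Purpose/research problem", "Logical consistency", "Literature review",
        "Theoretical framework", "Aims/objectives/question/hypotheses",
        "Sample", "Ethical considerations", "Operational definitions",
        "Methodology", "Data analysis/results", "Discussion", "References",
    ],
}

def summarize(answers):
    """Count how many elements were judged a strength vs a limitation.

    `answers` maps element name -> "strength", "limitation" or "unclear";
    elements not yet answered default to "unclear".
    """
    summary = {"strength": 0, "limitation": 0, "unclear": 0}
    for group in CHECKLIST.values():
        for element in group:
            summary[answers.get(element, "unclear")] += 1
    return summary

answers = {"Writing style": "strength", "Sample": "limitation"}
print(summarize(answers))  # 14 of the 16 elements remain "unclear"
```

The point of the structure is simply that a critique should cover every element, not stop at the first limitation found.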
Critiquing the research steps
In critiquing the steps in the research process a number
of questions need to be asked. However, these questions
are seeking more than a simple 'yes' or 'no' answer. The
questions are posed to stimulate the reviewer to consider
the implications of what the researcher has done. Does the
way a step has been applied appear to add to the strength
of the study or does it appear as a possible limitation to
implementation of the study's findings? (Table 1).
Elements influencing believability of the study
Writing style
Research reports should be well written, grammatically
correct, concise and well organized. The use of jargon should
be avoided where possible. The style should be such that it
attracts the reader to read on (Polit and Beck, 2006).
Author(s)
The authors' qualifications and job title can be a useful
indicator of the researchers' knowledge of the area
under investigation and their ability to ask the appropriate
questions (Conkin Dale, 2005). Conversely, a research
study should be evaluated on its own merits and not
assumed to be valid and reliable simply on the basis of the
authors' qualifications.
Report title
The title should be between 10 and 15 words long and
should clearly identify for the reader the purpose of the
study (Connell Meehan, 1999). Titles that are too long or
too short can be confusing or misleading (Parahoo, 2006).
Abstract
The abstract should provide a succinct overview of the
research and should include information regarding the
purpose of the study, method, sample size and selection,
the main findings, conclusions and recommendations
(Conkin Dale, 2005). From the abstract the reader should
be able to determine if the study is of interest and whether
or not to continue reading (Parahoo, 2006).
Elements influencing robustness
Purpose of the study/research problem
A research problem is often first presented to the reader in
the introduction to the study (Bassett and Bassett, 2003).
Depending on what is to be investigated some authors will
refer to it as the purpose of the study. In either case the
statement should at least broadly indicate to the reader what
is to be studied (Polit and Beck, 2006). Broad problems are
often multi-faceted and will need to become narrower and
more focused before they can be researched. In this the
literature review can play a major role (Parahoo, 2006).
Logical consistency
A research study needs to follow the steps in the process in a
logical manner. There should also be a clear link between the
steps beginning with the purpose of the study and following
through the literature review, the theoretical framework, the
research question, the methodology section, the data analysis,
and the findings (Ryan-Wenger, 1992).
Literature review
The primary purpose of the literature review is to define
or develop the research question while also identifying
an appropriate method of data collection (Burns and
Grove, 1997). It should also help to identify any gaps in
the literature relating to the problem and to suggest how
those gaps might be filled. The literature review should
demonstrate an appropriate depth and breadth of reading
around the topic in question. The majority of studies
included should be of recent origin and ideally less than
five years old. However, there may be exceptions to this,
for example, in areas where there is a lack of research, or a
seminal or all-important piece of work that is still relevant to
current practice. It is important also that the review should
include some historical as well as contemporary material
in order to put the subject being studied into context. The
depth of coverage will depend on the nature of the subject,
for example, for a subject with a vast range of literature then
the review will need to concentrate on a very specific area
(Carnwell, 1997). Another important consideration is the
type and source of literature presented. Primary empirical
data from the original source is more favourable than a
secondary source or anecdotal information where the
author relies on personal evidence or opinion that is not
founded on research.
A good review usually begins with an introduction which
identifies the key words used to conduct the search and
information about which databases were used. The themes
that emerged from the literature should then be presented
and discussed (Carnwell, 1997). In presenting previous
work it is important that the data is reviewed critically,
highlighting both the strengths and limitations of the study.
It should also be compared and contrasted with the findings
of other studies (Burns and Grove, 1997).
Theoretical framework
Following the identification of the research problem
and the review of the literature the researcher should
present the theoretical framework (Bassett and Bassett,
2003). Theoretical frameworks are a concept that novice
and experienced researchers find confusing. It is initially
important to note that not all research studies use a defined
theoretical framework (Robson, 2002). A theoretical
framework can be a conceptual model that is used as a
guide for the study (Conkin Dale, 2005) or themes from
the literature that are conceptually mapped and used to set
boundaries for the research (Miles and Huberman, 1994).
A sound framework also identifies the various concepts
being studied and the relationship between those concepts
(Burns and Grove, 1997). Such relationships should have
been identified in the literature. The research study should
then build on this theory through empirical observation.
Some theoretical frameworks may include a hypothesis.
Theoretical frameworks tend to be better developed in
experimental and quasi-experimental studies and often
poorly developed or non-existent in descriptive studies
(Burns and Grove, 1999). The theoretical framework should
be clearly identified and explained to the reader.
Aims and objectives/research question/
research hypothesis
The purpose of the aims and objectives of a study, the research
question and the research hypothesis is to form a link between
the initially stated purpose of the study or research problem
and how the study will be undertaken (Burns and Grove,
1999). They should be clearly stated and be congruent with
the data presented in the literature review. The use of these
items is dependent on the type of research being performed.
Some descriptive studies may not identify any of these items
but simply refer to the purpose of the study or the research
problem, others will include either aims and objectives or
research questions (Burns and Grove, 1999). Correlational
designs study the relationships that exist between two or
more variables and accordingly use either a research question
or hypothesis. Experimental and quasi-experimental studies
should clearly state a hypothesis identifying the variables to
be manipulated, the population that is being studied and the
predicted outcome (Burns and Grove, 1999).
Sample and sample size
The degree to which a sample reflects the population it
was drawn from is known as representativeness and in
quantitative research this is a decisive factor in determining
the adequacy of a study (Polit and Beck, 2006). In order
to select a sample that is likely to be representative and
thus identify findings that are probably generalizable to
the target population a probability sample should be used
(Parahoo, 2006). The size of the sample is also important in
quantitative research as small samples are at risk of being
overly representative of small subgroups within the target
population. For example, if, in a sample of general nurses, it
was noticed that 40% of the respondents were males, then
males would appear to be over-represented in the sample,
thereby creating a sampling error. The risk of sampling
errors decreases as larger sample sizes are used (Burns and
Grove, 1997). In selecting the sample the researcher should
clearly identify who the target population are and what
criteria were used to include or exclude participants. It
should also be evident how the sample was selected and
how many were invited to participate (Russell, 2005).
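The sampling-error point above, that small samples can over-represent subgroups while larger samples land closer to the true population proportions, can be shown with a quick simulation. The population make-up used here (10% male among 10,000 nurses) is invented purely for illustration:

```python
import random

random.seed(1)

# Hypothetical population: 10% of 10,000 general nurses are male
# (figure invented for illustration only).
population = ["male"] * 1000 + ["female"] * 9000

def male_share(n):
    """Estimate the male proportion from a simple random sample of size n."""
    sample = random.sample(population, n)
    return sample.count("male") / n

for n in (20, 200, 2000):
    print(n, round(male_share(n), 3))
# Larger samples tend to land closer to the true 0.10:
# the risk of sampling error decreases as sample size grows.
```

A single small sample can easily show 20% or 0% males by chance alone, which is exactly the over-representation problem the authors describe.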
Ethical considerations
Beauchamp and Childress (2001) identify four fundamental
moral principles: autonomy, non-maleficence, beneficence
and justice. Autonomy infers that an individual has the right
to freely decide to participate in a research study without
fear of coercion and with a full knowledge of what is being
investigated. Non-maleficence imphes an intention of not
harming and preventing harm occurring to participants
both of a physical and psychological nature (Parahoo,
2006). Beneficence is interpreted as the research benefiting
the participant and society as a whole (Beauchamp and
Childress, 2001). Justice is concerned with all participants
being treated as equals and no one group of individuals
receiving preferential treatment because, for example, of
their position in society (Parahoo, 2006). Beauchamp and
Childress (2001) also identify four moral rules that are both
closely connected to each other and with the principle of
autonomy. They are veracity (truthfulness), fidelity (loyalty
and trust), confidentiality and privacy. The latter pair are often
linked and imply that the researcher has a duty to respect the
confidentiality and/or the anonymity of participants and
non-participating subjects.
Ethical committees or institutional review boards have to
give approval before research can be undertaken. Their role
is to determine that ethical principles are being applied and
that the rights of the individual are being adhered to (Burns
and Grove, 1999).
Operational definitions
In a research study the researcher needs to ensure that
the reader understands what is meant by the terms and
concepts that are used in the research. To ensure this any
concepts or terms referred to should be clearly defined
(Parahoo, 2006).
Methodology: research design
Methodology refers to the nuts and bolts of how a
research study is undertaken. There are a number of
important elements that need to be referred to here and
the first of these is the research design. There are several
types of quantitative studies that can be structured under
the headings of true experimental, quasi-experimental
and non-experimental designs (Robson, 2002) {Table 2).
Although it is outside the remit of this article, within each
of these categories there are a range of designs that will
impact on how the data collection and data analysis phases
of the study are undertaken. However, Robson (2002)
states these designs are similar in many respects as most
are concerned with patterns of group behaviour, averages,
tendencies and properties.
Methodology: data collection
The next element to consider after the research design
is the data collection method. In a quantitative study any
number of strategies can be adopted when collecting data
and these can include interviews, questionnaires, attitude
scales or observational tools. Questionnaires are the most
commonly used data gathering instruments and consist
mainly of closed questions with a choice of fixed answers.
Postal questionnaires are administered via the mail and have
the value of perceived anonymity. Questionnaires can also be
administered in face-to-face interviews or in some instances
over the telephone (Polit and Beck, 2006).
Methodology: instrument design
After identifying the appropriate data gathering method
the next step that needs to be considered is the design
of the instrument. Researchers have the choice of using
a previously designed instrument or developing one for
the study and this choice should be clearly declared for
the reader. Designing an instrument is a protracted and
sometimes difficult process (Burns and Grove, 1997) but the
overall aim is that the final questions will be clearly linked
to the research questions and will elicit accurate information
and will help achieve the goals of the research. This, however,
needs to be demonstrated by the researcher.
Table 2. Research designs

Experimental
Sample: 2 or more groups
Sample allocation: Random
Features: Groups get different treatments
Outcome: Cause and effect relationship

Quasi-experimental
Sample: One or more groups
Sample allocation: Random
Features: One variable has not been manipulated or controlled (usually because it cannot be)
Outcome: Cause and effect relationship, but less powerful than experimental

Non-experimental, e.g. descriptive (includes cross-sectional, correlational, comparative and longitudinal studies)
Sample: One or more groups
Sample allocation: Not applicable
Features: Discover new meaning; describe what already exists; measure the relationship between two or more variables
Outcome: Possible hypothesis for future research; tentative explanations
If a previously designed instrument is selected the researcher
should clearly establish that the chosen instrument is the most
appropriate. This is achieved by outlining how the instrument
has measured the concepts under study. Previously designed
instruments are often in the form of standardized tests
or scales that have been developed for the purpose of
measuring a range of views, perceptions, attitudes, opinions
or even abilities. There are a multitude of tests and scales
available, therefore the researcher is expected to provide the
appropriate evidence in relation to the validity and reliability
of the instrument (Polit and Beck, 2006).
Methodology: validity and reliability
One of the most important features of any instrument is
that it measures the concept being studied in an unwavering
and consistent way. These are addressed under the broad
headings of validity and reliability respectively. In general,
validity is described as the ability of the instrument to
measure what it is supposed to measure and reliability the
instrument's ability to consistently and accurately measure
the concept under study (Wood et al, 2006). For the most
part, if a well established 'off the shelf' instrument has been
used and not adapted in any way, the validity and reliability
will have been determined already and the researcher
should outline what this is. However, if the instrument
has been adapted in any way or is being used for a new
population then previous validity and reliability will not
apply. In these circumstances the researcher should indicate
how the reliability and validity of the adapted instrument
was established (Polit and Beck, 2006).
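The article does not name a specific reliability statistic, but one commonly reported measure of an instrument's internal consistency is Cronbach's alpha. A minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of totals), with invented questionnaire scores:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns.

    items: list of k lists, each holding one item's scores
    across the same respondents.
    """
    def variance(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(items)
    # Per-respondent total scores across all items
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Invented 3-item, 4-respondent example:
scores = [[4, 5, 3, 4], [4, 4, 3, 5], [5, 5, 2, 4]]
print(round(cronbach_alpha(scores), 2))  # about 0.82
```

Values closer to 1 indicate that the items measure the underlying concept consistently; an adapted instrument would need this (or a similar statistic) re-established for the new population, as the paragraph above notes.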
To establish if the chosen instrument is clear and
unambiguous and to ensure that the proposed study has
been conceptually well planned a mini-version of the main
study, referred to as a pilot study, should be undertaken before
the main study. Samples used in the pilot study are generally
omitted from the main study. Following the pilot study the
researcher may adjust definitions, alter the research question,
address changes to the measuring instrument or even alter
the sampling strategy.
Having described the research design, the researcher should
outline in clear, logical steps the process by which the data
was collected. All steps should be fully described and easy to
follow (Russell, 2005).
Analysis and results
Data analysis in quantitative research studies is often seen
as a daunting process. Much of this is associated with
apparently complex language and the notion of statistical
tests. The researcher should clearly identify what statistical
tests were undertaken, why these tests were used and
what the results were. A rule of thumb is that studies that
are descriptive in design use only descriptive statistics, while
correlational, quasi-experimental and experimental
studies use inferential statistics. The latter are subdivided
into tests to measure relationships and differences between
variables (Clegg, 1990).
Inferential statistical tests are used to identify if a
relationship or difference between variables is statistically
significant. Statistical significance helps the researcher to
rule out one important threat to validity and that is that the
result could be due to chance rather than to real differences
in the population. Quantitative studies usually identify the
lowest level of significance as P≤0.05 (P = probability)
(Clegg, 1990).
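The idea that a "significant" result is one unlikely to be due to chance can be made concrete with a small permutation test. This technique is not mentioned by the authors (a real study would typically report a t-test or similar from a statistics package), and the two groups' scores below are invented:

```python
import random

random.seed(7)

# Invented scores for two groups (e.g. two wards rating a protocol).
group_a = [72, 75, 68, 80, 77, 74]
group_b = [65, 70, 62, 69, 66, 71]

observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

# How often does randomly shuffling the group labels produce a mean
# difference at least as large as the observed one? A small proportion
# (p <= 0.05) is conventionally taken as statistically significant.
pooled = group_a + group_b
count = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    a, b = pooled[:6], pooled[6:]
    diff = sum(a) / 6 - sum(b) / 6
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / trials
print(round(observed, 2), p_value)
```

Here the p-value estimates exactly the threat described above: the probability that a difference this large could arise by chance when no real difference exists in the population.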
To enhance readability researchers frequently present
their findings and data analysis section under the headings
of the research questions (Russell, 2005). This can help the
reviewer determine if the results that are presented clearly
answer the research questions. Tables, charts and graphs may
be used to summarize the results and should be accurate,
clearly identified and enhance the presentation of results
(Russell, 2005).
The percentage of the sample who participated in
the study is an important element in considering the
generalizability of the results. At least 50% of the sample
needs to participate if a response bias is to be
avoided (Polit and Beck, 2006).
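Checking a study against the 50% participation threshold mentioned above is simple arithmetic; a sketch, with invented numbers:

```python
def response_rate(returned, invited):
    """Percentage of invited participants who actually took part."""
    return 100 * returned / invited

# Hypothetical study: 200 questionnaires sent, 120 returned.
rate = response_rate(returned=120, invited=200)
print(rate)        # 60.0
print(rate >= 50)  # meets the 50% threshold suggested above
```

A reviewer can only do this check if the report states both how many were invited and how many participated, which is why the earlier Sample section asks for both figures.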
Discussion/conclusion/recommendations
The discussion of the findings should flow logically from the
data and should be related back to the literature review thus
placing the study in context (Russell, 2002). If the hypothesis
was deemed to have been supported by the findings,
the researcher should develop this in the discussion. If a
theoretical or conceptual framework was used in the study
then the relationship with the findings should be explored.
Any interpretations or inferences drawn should be clearly
identified as such and consistent with the results.
The significance of the findings should be stated but
these should be considered within the overall strengths
and limitations of the study (Polit and Beck, 2006). In this
section some consideration should be given to whether
or not the findings of the study were generalizable, also
referred to as external validity. Not all studies make a claim
to generalizability but the researcher should have undertaken
an assessment of the key factors in the design, sampling and
analysis of the study to support any such claim.
Finally the researcher should have explored the clinical
significance and relevance of the study. Applying findings
in practice should be suggested with caution and will
obviously depend on the nature and purpose of the study.
In addition, the researcher should make relevant and
meaningful suggestions for future research in the area
(Connell Meehan, 1999).
References
The research study should conclude with an accurate list
of all the books, journal articles, reports and other media
that were referred to in the work (Polit and Beck, 2006).
The referenced material is also a useful source of further
information on the subject being studied.
Conclusions
The process of critiquing involves an in-depth examination
of each stage of the research process. It is not a criticism but
rather an impersonal scrutiny of a piece of work using a
balanced and objective approach, the purpose of which is to
highlight both strengths and weaknesses, in order to identify
whether a piece of research is trustworthy and unbiased. As
nursing practice is becoming increasingly evidence
based, it is important that care has its foundations in sound
research. It is therefore important that all nurses have the
ability to critically appraise research in order to identify what
is best practice.
References
Bassett C, Bassett J (2003) Reading and critiquing research. Br J Perioper Nurs 13(4): 162-4
Beauchamp T, Childress J (2001) Principles of Biomedical Ethics. 5th edn. Oxford University Press, Oxford
Burns N, Grove S (1997) The Practice of Nursing Research: Conduct, Critique and Utilization. 3rd edn. WB Saunders, Philadelphia
Burns N, Grove S (1999) Understanding Nursing Research. 2nd edn. WB Saunders, Philadelphia
Carnwell R (1997) Critiquing research. Nurs Pract 8(12): 16-21
Clegg F (1990) Simple Statistics: A Course Book for the Social Sciences. 2nd edn. Cambridge University Press, Cambridge
Conkin Dale J (2005) Critiquing research for use in practice. J Pediatr Health Care 19: 183-6
Connell Meehan T (1999) The research critique. In: Treacy P, Hyde A, eds. Nursing Research and Design. UCD Press, Dublin: 57-74
Cullum N, Droogan J (1999) Using research and the role of systematic reviews of the literature. In: Mulhall A, Le May A, eds. Nursing Research: Dissemination and Implementation. Churchill Livingstone, Edinburgh: 109-23
Miles M, Huberman A (1994) Qualitative Data Analysis. 2nd edn. Sage, Thousand Oaks, CA
Parahoo K (2006) Nursing Research: Principles, Process and Issues. 2nd edn. Palgrave Macmillan, Houndmills, Basingstoke
Polit D, Beck C (2006) Essentials of Nursing Research: Methods, Appraisal and Utilization. 6th edn. Lippincott Williams and Wilkins, Philadelphia
Robson C (2002) Real World Research. 2nd edn. Blackwell Publishing, Oxford
Russell C (2005) Evaluating quantitative research reports. Nephrol Nurs J 32(1): 61-4
Ryan-Wenger N (1992) Guidelines for critique of a research report. Heart Lung 21(4): 394-401
Tanner J (2003) Reading and critiquing research. Br J Perioper Nurs 13(4): 162-4
Valente S (2003) Research dissemination and utilization: improving care at the bedside. J Nurs Care Qual 18(2): 114-21
Wood MJ, Ross-Kerr JC, Brink PJ (2006) Basic Steps in Planning Nursing Research: From Question to Proposal. 6th edn. Jones and Bartlett, Sudbury
KEY POINTS
- Many qualified and student nurses have difficulty understanding the concepts and terminology associated with research and research critique.
- The ability to critically read research is essential if the profession is to achieve and maintain its goal to be evidence based.
- A critique of a piece of research is not a criticism of the work, but an impersonal review to highlight the strengths and limitations of the study.
- It is important that all nurses have the ability to critically appraise research in order to identify what is best practice.
Critiquing Nursing Research 2nd edition
Critiquing
Nursing Research
2nd edition
ISBN-W; 1- 85642-316-6; lSBN-13; 978-1-85642-316-8; 234 x
156 mm; p/back; 224 pages;
publicatior) November 2006; £25.99
By John R Cutdiffe and Martin Ward
This 2nd edition of Critiquing Nursing Research retains the
features which made the original
a 'best seller' whilst incorporating new material in order to
expand the book's applicability. In
addition to reviewing and subsequently updating the material of
the original text, the authors
have added two further examples of approaches to crtitique
along with examples and an
additonal chapter on how to critique research as part of the
work of preparing a dissertation.
The fundamentals of the book however remain the same. It
focuses specifically on critiquing
nursing research; the increasing requirement for nurses to
become conversant with research,
understand its link with the use of evidence to underpin
practice; and the movement towards
becoming an evidence-based discipline.
As nurse education around the world increasingly moves
towards an all-graduate discipline, it
is vital for nurses to have the ability to critique research in
order to benefit practice. This book
is the perfect tool for those seeking to gain or develop precisely
that skill and is a must-have
for all students nurses, teachers and academics.
John Cutcliffe holds the 'David G. Braithwaite' Professor of
Nursing Endowed Chair at the University of Texas (Tyler); he is
also an Adjunct Professor of Psychiatric Nursing at Stenberg
College International School of Nursing, Vancouver, Canada.
Martin Ward is an Independent Mental Health Nurse Consultant
and Director of MW Professional Development Ltd.
To order your copy please contact us using the details below or
visit our website www.quaybooks.co.uk where you will also find
details of other Quay Books offers and titles.
Quay Books Division | MA Healthcare Limited
Jesses Farm | Snow Hill | Dinton | Salisbury | Wiltshire | SP3 5HN | UK
Tel: 01722 716998 | Fax: 01722 716887 | E-mail: [email protected] | Web: www.quaybooks.co.uk
British Journal of Nursing, 2007, Vol 16, No 11: 663
Chapter 7
Dependent t-Tests and Repeated
Measures Analysis of Variance
Learning Objectives
After reading this chapter, you will be able to. . .
1. describe the impact that initial between-groups differences
have on test results when using the
t-test or analysis of variance.
2. compare the independent t-test to the dependent-groups t-test.
3. complete a dependent-(paired/repeated-)samples t-test.
4. explain what power means in statistical testing.
5. compare the one-way ANOVA to the repeated-measures
ANOVA.
6. complete a repeated-measures ANOVA.
7. interpret results and draw conclusions of within-group
designs.
8. present within-group analysis results in APA format.
9. employ Wilcoxon signed-ranks W-test and Friedman’s
nonparametric ANOVA.
suk85842_07_c07.indd 235 10/23/13 1:29 PM
CHAPTER 7 Section 7.1 Reconsidering the t and F Ratios
Tests of significant difference, such as the t-test and analysis of
variance, are of two kinds: tests involving independent (or
between) groups and those that employ related,
or dependent (or within) groups. The tests covered to this point
in the book have involved
only independent groups tests. However, there are important
advantages related to the
dependent groups procedures, and they are used frequently in
data analysis.
In this chapter, the focus will be on the dependent groups
equivalents of the independent
t-test and the one-way ANOVA. Because the same group is measured
across times or treatments, these are called dependent- or
within-groups designs; designs that use matched or equivalent
groups are an alternative of the same kind, and all are
collectively known as repeated-measures designs. Although
repeated-measures designs answer the same questions as their
independent-groups equivalents (i.e., are there significant
differences within groups, across times or treatments, or between
matched groups?), under particular circumstances they can do so
with greater economy and more statistical power.
7.1 Reconsidering the t and F Ratios
The scores produced in both the independent t and the one-way
ANOVA are ratios. In the case of the t-test, the ratio is the
result of dividing the difference between the means
of the groups by the standard error of the difference:
t = (M1 − M2) / SEd
With ANOVA, the F ratio is the mean square between divided
by the mean square within:
F = MSbet / MSwith
With either t or F, the denominator in the ratio reflects how
much scores vary within (rather
than between) the groups of subjects involved in the study.
These differences are easiest
to see in the way the standard error of the difference is
calculated for a t-test. When group
sizes are equal, the formula is
SEd = √(SEM1² + SEM2²)
with SEM = s / √n
and s, of course, a measure of score variation in any group.
So the standard error of the difference is based on the standard
error of the mean, which
in turn is based on the standard deviation. These connections
make it clear that score variance within groups in a t-test has
its roots in the standard deviation for each group of scores. If we
reverse the order and work from the standard deviation back to
the standard error of the
difference, note the following:
• When scores vary substantially in a group, it is reflected in a
large standard
deviation.
• When the standard deviation is relatively large, the standard
error of the mean
must likewise be large because the standard deviation is the
numerator in the
formula for SEM.
• A large standard error of the mean results in a large standard
error of the difference because that statistic is the square root
of the sum of the squared standard errors of the mean.
• When the standard error of the difference is large, the
difference between the means has to be correspondingly larger in
order for the result to be statistically significant. The table of
critical values indicates that no t ratio (the ratio of the
difference between the means to the standard error of the
difference) may be less than 1.96 for a two-tailed test or
1.645 for a one-tailed test, based on the critical α = .05.
Error Variance
The point of this is that the value of t in the t-test—and it is the
same for F in an ANOVA—
is greatly affected by the amount of variability within the
groups involved. When the
variability within those groups is extensive, the values of t and
F are correspondingly
diminished and less likely to be statistically significant than
when there is relatively little
variability within the groups.
These differences within groups stem from differences in the
way individuals within the
samples react to whatever treatment is the independent variable;
different people respond
differently to the same stimulus. These differences represent
error variance, which is what
occurs whenever scores differ for reasons not related to the
influence of the IV.
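To make this concrete, here is a minimal Python sketch (standard library only; the data are invented for illustration and are not from the text) showing that the same mean difference yields a large t when within-group variability is small and a small t when it is large:

```python
import math
import statistics

def independent_t(group_1, group_2):
    """Independent-groups t for equal-size groups: t = (M1 - M2) / SEd."""
    sem_1 = statistics.stdev(group_1) / math.sqrt(len(group_1))
    sem_2 = statistics.stdev(group_2) / math.sqrt(len(group_2))
    se_d = math.sqrt(sem_1 ** 2 + sem_2 ** 2)  # standard error of the difference
    return (statistics.mean(group_1) - statistics.mean(group_2)) / se_d

# Both comparisons have the same 2-point difference between group means . . .
t_low_variance = independent_t([10, 11, 12, 13, 14], [8, 9, 10, 11, 12])
t_high_variance = independent_t([4, 8, 12, 16, 20], [2, 6, 10, 14, 18])

# . . . but the noisier groups produce a much smaller t ratio.
print(round(t_low_variance, 2), round(t_high_variance, 2))  # 2.0 0.5
```

Everything in the denominator traces back to the spread of scores within the groups, which is why greater within-group variability shrinks t.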
Other Sources of Error Variance
Within-group differences are not the only source of error
variance in the calculation of t
and F. Both t-test and ANOVA are based on the assumption that
the groups involved are
equivalent before the independent variable is introduced. In a t-
test where the impact of
relaxation therapy on clients’ anxiety is the issue, the
assumption is that before the ther-
apy is introduced, the group, which will receive the therapy, and
the control group, which
will not, begin with equivalent levels of anxiety. That
assumption is the key to attributing
any differences after the treatment to the therapy, the IV.
Confounding Variables
In comparisons such as this, the initial equivalence of the
groups can be a problem, how-
ever. Maybe there were differences in anxiety before the
therapy was introduced. There
might be differences in the employment circumstances of each
group, and perhaps those
threatened with unemployment are more anxious than the
others. Maybe there are age-
related differences. These other influences that are not
controlled in an experiment are
sometimes called confounding variables.
If a psychologist wants to examine the impact that a substance
abuse program has on
addicts’ behavior, a study might be set up as follows. Two
groups of the same number of
addicts are selected, and the substance abuse program is
provided to one group. After the
program, the psychologist measures the level of substance abuse
in both groups to see
whether there is a difference.
Try It! (A)
If the size of the standard deviation is related to the size of the group, in a t-test, what is the relationship between sample size and error?
The problem is that the presence or absence of the program is
not the only thing that might
prompt subjects to respond differently. Perhaps subjects’
background experiences are dif-
ferent. Maybe there are ethnic group differences, age
differences, or social class differ-
ences. If any of those differences affect substance abuse
behavior, there is an opportunity
to confuse the influence of those factors with the impact of the
substance abuse program,
which is the IV. If those other differences are not controlled and
they affect the dependent
variable, they contribute to error variance. There is error
variance any time the dependent
variable (DV) scores fluctuate for reasons unrelated to the IV.
Therefore, there is error variance reflected in the variability
within groups, and there is error
variance represented in any difference between groups that is
not related to the IV. Test
results can be meaningful only when the score variance that is
related to the independent
variable is substantially greater than the error variance—what is
controlled must contrib-
ute more to score values than what is left uncontrolled. This
makes it important to look for
ways to control error variance so that it is not confused with the
variability in scores that
stems from the independent variable. Controlling for
confounding variables is a necessary
research activity. A confounding variable can affect the IV-DV
relationship, thereby lowering
internal validity and thus the statistical conclusion validity of
your findings. Failing to take
confounding variables into account can result in misleading data
and erroneous conclusions, to the detriment of the researcher's
reputation. In other words, treat research findings and sweeping
general statements with caution, as several confounding elements
may be at work. Extraneous confounding variables can, however, be
controlled in several ways.
7.2 Dependent-Groups Designs
Ideally, any before-the-treatment differences between the
groups in a study will be min-
imal. Recall that random selection occurs when every member
of a population has an
equal chance of being selected. The logic behind random selection
is that when groups are randomly drawn from the same population,
they will differ only by chance. Differ they will, though, because
no sample can represent the population with complete fidelity, and
occasionally the chance differences will affect the way subjects
respond to the IV.
One way to reduce error variance is to adopt what are called
dependent-groups designs. The independent t-test and the one-
way
ANOVA required independent groups. Members of one group
can-
not also be members of other groups in the same study.
However, in
the case of the t-test, if the same group is measured, exposed to
the
treatment, and then measured again, an important source of
error
variance is controlled. Using the same group twice makes initial
equivalence no longer a concern. Any scoring variability
between the
first and second measure should more accurately reflect the
impact of
the independent variable.
The Dependent-Samples t-Tests
One dependent-groups test where the same group is measured
twice is called the before/
after t-test, also known as the pre/post t-test. An alternative is
called the matched-pairs
or dependent-samples t-test, where each participant in the first
group is matched to
someone in the second group who has a similar characteristic.
Yet a third alternative that
Try It! (B)
How does random selection attempt to control error variance in statistical testing?
is basically the same as a before/after design is the within-
treatment design where each
participant is used across two treatment groups (usually given at
two different times,
which makes it the same as the before/after t-test). In the latter
option the participant acts
as his or her own control where one of the treatments may be a
placebo. All three types of
dependent-samples t-tests have the same objective, which is
controlling the error variance
that is due to initial between-groups differences. Following are
examples of each test.
• The before/after design: A researcher is interested in the
impact that positive rein-
forcement has on employees’ sales productivity. Besides the
sales commission,
the researcher introduces a rewards program that can result in
increased vaca-
tion time. The researcher gauges sales productivity for a month,
introduces the
rewards program, and gauges sales productivity during the
second month for the
same people. The researcher will explore differences in
employee productivity
before and after the positive reinforcement intervention (the
rewards program).
If significance is obtained, then the null hypothesis (i.e., there
is no difference in employee productivity after the introduction
of the rewards program) can be rejected and support found for the
alternative hypothesis (i.e., there is a significant increase in
employee productivity after the introduction of the rewards
program).
• The matched-pairs design: A school counselor is interested in
the impact that verbal
reinforcement has on students’ reading achievement. To
eliminate between-groups
differences, the researcher selects 30 people for the treatment
group and matches
each person in the treatment group to someone in the control
group who has a
similar reading score on a standardized test. The researcher then
introduces the
verbal reinforcement program to those in the treatment group
for a specified period
and over time compares the performance of students in the two
groups as well as
their performance within the group. The matched-pairs design is
similar to an independent-groups design with one major exception:
the groups are matched (made equivalent) to each other as closely
as possible on a particular measure, in this case reading scores
on a standardized test.
• Within-treatment design: A psychiatrist measures each study
participant on taking a
placebo, and then the actual drug for depression to test for
significant differences
over the two treatments (placebo versus drug). Here a
counterbalancing design may be
employed to minimize the order effects that plague repeated-
measures design. Specif-
ically the order in which treatments are given can influence the
outcome. Therefore,
the treatments are given to the groups at different times as
depicted in Figure 7.1.
Figure 7.1: Counterbalanced design

Group 1: Treatment A → Treatment B → Posttest
Group 2: Treatment B → Treatment A → Posttest

Source: Oskar Blakstad (May 8, 2009). Counterbalanced Measures Design by Martyn Shuttleworth. Retrieved Aug 22, 2013, from Explorable.com: http://explorable.com/counterbalanced-measures-design

Although there are differences in how the tests are set up,
calculating the t-statistic is the same in each case. The
differences between the approaches are conceptual, not
mathematical. Both approaches have the same purpose—to
control for any score varia-
tion stemming from nonrelevant factors. They both reduce the
error variance that comes
from using nonequivalent groups. Therefore, testing for
homogeneity of variance or the
Levene’s test is moot here, as we are not dealing with
differences between groups. On the
other hand, we are dealing with variance within groups and
across pairs of treatments.
If there are such significant differences, then this issue
constitutes what is described as a
violation of sphericity, which will be discussed in Section 7.3
with more depth when we
examine repeated-measures ANOVA.
Calculating t in a Dependent-Groups Design
Although the differences between before/after, matched-pairs,
and within-treatment
t-tests are not math-related, there are several approaches to
calculating the t statistic in the
dependent-groups tests. Whatever their differences, they all take
into
account the fact that the two sets of scores are related. One
approach is
to calculate the correlation between the two sets of scores and
then to
use the strength of the correlation value to reduce the error
variance—
the higher the correlation between the two sets of scores, the
lower the
error variance. Rather than correlations, which come up later in
the
book, we will rely on “difference scores.” But whether we use
correla-
tion values or difference scores, the result is the same.
The distribution of difference scores was discussed in Chapter 5
when the independent
t-test was introduced. The point of that distribution is to
determine the point at which the
difference between a pair of sample means (M1 2 M2) is so
great that the most probable
explanation is that the samples were not drawn from populations
with the same means.
That same distribution also provides the theoretical
underpinning for the dependent-
groups tests, but rather than the difference between the means
of the two groups
(M1 2 M2), the difference score in the dependent-groups tests is
based on the mean of the
differences between pairs of individual scores. That is, the
differences between each pair of
related scores will be determined, and then the mean of those
differences will become the
numerator in the t ratio. If the mean of the difference scores is
sufficiently different from
the mean of the distribution of difference scores (which, recall,
is 0), the t value will be
statistically significant.
The denominator in the t ratio is another standard error of the
mean value, but in this case,
it is the standard error of the mean for those difference scores.
The mechanics of checking
for significance are similar to what was done for the
independent t:
• A critical value from the t table defines the point at which the
t ratio is statisti-
cally significant.
• The critical value is dependent upon the degrees of freedom
for the problem.
For the dependent-samples t, the degrees of freedom are the
number of pairs of
scores minus 1 (n 2 1).
The dependent-groups t-test statistic has this form:
t = Md / SEMd    (Formula 7.1)
Try It! (C)
How are the before/after t-test and the matched-pairs t-test different?
where
Md = the mean of the difference scores
SEMd = the standard error of the mean for the difference scores
The steps for completing the test follow:
1. From the two scores for each subject, subtract the second from the first to determine the difference score, d, for each pair.
2. Determine the mean of the d scores: Md = Σd / number of pairs.
3. Calculate the standard deviation of the d values, sd.
4. Calculate the standard error of the mean for the difference scores, SEMd, by dividing the result of Step 3 by the square root of the number of pairs of scores: SEMd = sd / √(number of pairs).
5. t = Md / SEMd
Following is an example to illustrate the steps to calculating the
dependent measures
t-test. A psychologist is investigating the impact that verbal
reinforcement has on the
number of questions students ask in a seminar.
• Ten upper-level students participate in two seminars where a
presentation is fol-
lowed by students’ questions.
• In the first seminar, no feedback is provided by the instructor
after a student asks
the presenter a question.
• In the second seminar, the instructor offers feedback—such as
“That’s an excel-
lent question” or “Very interesting question” or “Yes, that had
occurred to me as
well”—after each question.
• The psychologist will test the following hypothesis:
H0: There is no significant mean difference in the number of student questions asked from seminar 1 to seminar 2.
H0: μseminar_1_questions = μseminar_2_questions
• By rejecting H0, the psychologist will find support for the alternative hypothesis:
Ha: There is a significant mean difference in the number of student questions asked from seminar 1 to seminar 2.
Ha: μseminar_1_questions ≠ μseminar_2_questions
Is there a significant difference between the number of
questions students ask in the first
seminar compared to the number of questions students ask in the
second seminar? The
number of questions asked by each student in both seminars and
the solution to the prob-
lem are in Figure 7.2.
Figure 7.2: Calculating the before/after and within-treatment t

Student   Seminar 1   Seminar 2    d
1.            1           3       −2
2.            0           2       −2
3.            3           4       −1
4.            0           0        0
5.            2           3       −1
6.            1           1        0
7.            3           5       −2
8.            2           4       −2
9.            1           3       −2
10.           2           1        1

1. Determine the difference between each pair of scores, d, by subtraction.
2. Determine the mean of the differences, the d values (Md): Σd = −11, so Md = Σd/10 = −11/10 = −1.1.
3. Calculate the standard deviation of the d values (sd). Verify that sd = 1.101.
4. Just as the standard error of the mean in the earlier tests was s/√n, determine the standard error of the mean for the difference scores (SEMd) by dividing the result of Step 3 by the square root of the number of pairs. Verify that SEMd = sd/√np = 1.101/√10 = 0.348.
5. Divide Md by SEMd to determine t: t = Md/SEMd = −1.1/0.348 = −3.161.
6. As noted earlier, the degrees of freedom for the critical value of t for this test are the number of pairs of scores, np − 1: t0.05(9) = 2.262.
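The five steps can be checked with a short Python sketch (standard library only; the variable names are mine) using the seminar data from Figure 7.2:

```python
import math
import statistics

# Question counts for the same 10 students in each seminar (Figure 7.2).
seminar_1 = [1, 0, 3, 0, 2, 1, 3, 2, 1, 2]
seminar_2 = [3, 2, 4, 0, 3, 1, 5, 4, 3, 1]

# Step 1: difference score d for each pair (first score minus second).
d = [s1 - s2 for s1, s2 in zip(seminar_1, seminar_2)]

# Step 2: mean of the difference scores.
m_d = statistics.mean(d)              # -1.1

# Steps 3 and 4: standard deviation of d, then its standard error.
s_d = statistics.stdev(d)             # about 1.101
sem_d = s_d / math.sqrt(len(d))       # about 0.348

# Step 5: the dependent-groups t ratio (Formula 7.1).
t = m_d / sem_d

print(round(t, 3))                    # -3.161, matching the longhand result
```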
The calculated value of t exceeds the critical value from Table
5.1 (which is also Table B in
the Appendix). The result is statistically significant. Note that it
is the absolute value of the
calculated t in which we are interested. Because the question
was whether there is a sig-
nificant difference in the number of questions, it is a two-tailed
test, and it does not matter
which session had the greater number; it also does not matter
whether Session 1 is larger
than Session 2 or the other way around. The students in the
second session, where ques-
tions were followed by feedback, asked significantly more
questions than the students did
in the first session, when the instructor offered no feedback.
The Degrees of Freedom, the Dependent-Groups Test, and
Power
With Md = −1.1, there is comparatively little difference between
the two sets of scores.
What makes such a small mean difference statistically
significant? The answer is in the
amount of error variance in this problem. When the error
variance is also very small—the
standard error of the difference scores is just .348—
comparatively small mean differences
can be statistically significant. The rationale for using
dependent-
groups tests as opposed to independent-group designs is that the
for-
mer are comparatively more powerful; there is less error to
contend
with, thereby increasing the probability of rejecting the null
hypoth-
esis. This brings us to the discussion of power in statistical
testing.
Table B in the Appendix, the critical values of t, indicates that
critical
values decline as degrees of freedom increase. That occurs not
only in
the critical values for t but also for F in analysis of variance and
in fact
for most tables of critical values for statistical tests. For the
dependent-groups t-test, the degrees of freedom are based on
• the number of pairs of related scores, minus 1.
For the independent-groups t-test, the degrees of freedom are based on
• the number of scores in both groups, minus 2 (Chapter 5).
This means that critical values are larger in a dependent-groups
test for the same number
of raw scores involved. But even a test with a larger critical
value can produce significant
results when there is more control of error variance. This is
what the dependent-groups
test provides. The central point is this: When each pair of scores
comes from the same par-
ticipant, or from a matched pair of participants, the random
variability from nonequiva-
lent groups is minimal because scores tend to vary similarly for
each pair, resulting in
relatively little error variance. The small SEMd value that
results more than compensates
for the fewer degrees of freedom and the associated larger
critical value connected to
dependent-groups tests.
In statistical testing, power is defined as the likelihood of
detecting a significant differ-
ence when it is present. The more powerful statistical test is the
one that will most read-
ily detect a significant difference. As long as the sets of scores
are closely related, the
dependent-measures test is more powerful than the independent-
groups equivalent.
Try It! (D)
What does it mean to say that the within-subjects test has more power than the independent t-test?
A Matched-Pairs Example
Another form of the dependent-groups t-test is the matched-
pairs design. In this approach,
rather than measure the same people repeatedly, each
participant in one group is paired
with a participant in the other group who is similar.
For example, consider a market analyst who wants to determine
whether a television com-
mercial will induce consumers to spend more on a breakfast
cereal. The analyst selects a
group of consumers entering a grocery store, induces them to
view the television com-
mercial, and then tracks their expenditures on breakfast cereal.
A second group is selected,
and they also shop, but they do not view the television
commercial. The analyst selects
people for the second group who match the age and gender
characteristics of those in the
first group. This controls for age and gender because those
characteristics might affect
spending for the particular product. Each individual from Group
1 has a companion in
Group 2 of the same age and sex. The expenditures in dollars
for the members of each
group and the solution to the problem are in Figure 7.3.
Figure 7.3: Calculating a matched-pairs t-test

Pair      Viewed   Did not view      d
1.          1.5          3         −1.5
2.          4            0           4
3.          3            2           1
4.          0            0           0
5.          2            0           2
6.          4.5          4           0.5
7.          6            2           4
8.          0            1          −1
9.          5.25         2           3.25
10.         2            3          −1

Verify that Md = 1.125 and sd = 2.092.
SEMd = sd/√np = 2.092/√10 = 0.662
t = Md/SEMd = 1.125/0.662 = 1.700
t0.05(9) = 2.262
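The same arithmetic, wrapped in a small reusable function (a Python sketch on the Figure 7.3 data; the function and variable names are my own), reproduces this result:

```python
import math
import statistics

def dependent_t(first, second):
    """Dependent-samples t from difference scores: t = Md / SEMd."""
    d = [a - b for a, b in zip(first, second)]
    sem_d = statistics.stdev(d) / math.sqrt(len(d))
    return statistics.mean(d) / sem_d

# Cereal spending in dollars for the matched pairs in Figure 7.3.
viewed = [1.5, 4, 3, 0, 2, 4.5, 6, 0, 5.25, 2]
did_not_view = [3, 0, 2, 0, 0, 4, 2, 1, 2, 3]

t = dependent_t(viewed, did_not_view)
print(f"{t:.3f}")   # 1.700 -- below the critical 2.262, so not significant
```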
The absolute value of t is less than the critical value from Table
5.1 (or Appendix Table B) for df = 9. The difference is not
statistically significant. There are probably several ways to
explain the outcome, but we will explore just three.
• The most obvious explanation is that the commercial did not
work. The shoppers
who viewed the commercial were not induced to spend
significantly more than
those who did not view it.
• Another explanation has to do with the matching. Perhaps age
and gender are
not related to how much people spend shopping for the
particular product. Per-
haps the shopper’s level of income is the most important
characteristic, and that
was not controlled in the pairing.
• Another explanation is related to sample size. Small samples
tend to be more
variable than larger samples, and variability is what the
denominator in the
t-ratio reflects. Perhaps if this had been a larger sample, the
SEMd would have
had a smaller value, and the t would have been significant.
The second explanation points out the disadvantage of matched-
pairs designs compared
to repeated-measures designs. The individual conducting the
study must be in a posi-
tion to know which characteristics of the participants are most
relevant to explaining the
dependent variable so that they can be matched in both groups.
Otherwise, it is impos-
sible to know whether a nonsignificant outcome reflects an
inadequate match, control of
the wrong variables, or a treatment that just does not affect the
DV.
Comparing the Dependent-Samples t-Test to the Independent t
In order to compare the dependent-samples t-test and the
independent t more directly, we
are going to apply both tests to the same data. This will
illustrate how each test deals with
error variance; however, a caution is necessary before
beginning: Once data is collected,
there really is no situation where someone can choose which
test to use because either the
groups are independent, or they are not. Therefore, we proceed
purely as an academic
exercise, recognizing that such a situation is not going to
happen in the ordinary course
of events.
As an example, a university program encourages students to
take a service learning class
that emphasizes the importance of community service as a part
of the students’ educa-
tional experience. Data is gathered on the number of hours
former students spend in com-
munity service per month after they complete the course and
graduate from the university.
• For the independent t-test, the students are divided between
those who took a
service learning class and graduates of the same year who did
not.
• For the dependent-groups t-test, those who took the service
learning class are
matched to a student with the same major, age, and gender who
did not take
the class.
The data and the solutions to both tests are in Figure 7.4.
Figure 7.4: The before/after t-test versus the independent t-test
Because the differences between the scores are quite consistent,
as they tend to be when
participants are matched effectively, there is very little variance
in the difference scores.
This results in a comparatively small standard deviation of
difference scores and a small
standard error of the mean for the difference scores. This allows
for t ratios with even rela-
tively small numerators to be statistically significant. Because
for the independent t-test,
there is no assumption that the two groups are related, error
variance is based on the dif-
ferences within the groups of raw scores, and the denominator is
large enough that the t
value is not significant.
Pair    Class   No Class     d
1.        4        3         1
2.        3        2         1
3.        3        2         1
4.        2        2         0
5.        3        2.5       0.5
6.        4        3         1
7.        1        2        −1
8.        5        4         1
9.        6        5         1
10.       4        3         1

        Class   No Class     d
M       3.50     2.850     0.650
s       1.434    1.001     0.669
SEM     0.453    0.316     0.211

As an independent t-test we have
SEd = √(SEM1² + SEM2²) = √(0.453² + 0.316²) = 0.553
t = (M1 − M2)/SEd = (3.50 − 2.850)/0.553 = 1.175; t0.05(18) = 2.101. The result is not significant.

As a matched-pairs t-test the results are
t = Md/SEMd = 0.650/0.211 = 3.081; t0.05(9) = 2.262. The result is significant.
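Run as a Python sketch (standard library only; the variable names are mine), the two analyses of the same data give the values above. Note that the unrounded dependent t comes out at about 3.074, matching the Excel output in Figure 7.5, while the longhand 3.081 reflects intermediate rounding:

```python
import math
import statistics

# Community-service hours per month for the Figure 7.4 data.
took_class = [4, 3, 3, 2, 3, 4, 1, 5, 6, 4]
no_class = [3, 2, 2, 2, 2.5, 3, 2, 4, 5, 3]

# Independent t: the error term comes from each group's own spread.
sem_1 = statistics.stdev(took_class) / math.sqrt(len(took_class))
sem_2 = statistics.stdev(no_class) / math.sqrt(len(no_class))
se_d = math.sqrt(sem_1 ** 2 + sem_2 ** 2)
t_independent = (statistics.mean(took_class) - statistics.mean(no_class)) / se_d

# Dependent t: the error term comes from the difference scores instead.
d = [a - b for a, b in zip(took_class, no_class)]
t_dependent = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

print(f"{t_independent:.3f}")   # 1.175 -- below t0.05(18) = 2.101, not significant
print(f"{t_dependent:.3f}")     # 3.074 -- exceeds t0.05(9) = 2.262, significant
```

The contrast illustrates the chapter's point: the consistent pair-by-pair differences give a much smaller error term than the raw within-group spreads do.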
The Dependent-Groups t-Test on Excel
If the problem in Figure 7.4 is completed in Excel as a
dependent-groups test, the proce-
dure is as follows:
• Create the data file in Excel.
Column A is labeled Class to indicate those who had the service
learning class, and
column B is labeled No Class.
Enter the data, beginning with cell A2 for the first group and
cell B2 for the second
group.
• Click the Data tab at the top of the page.
• At the extreme right, choose Data Analysis.
• In the Analysis Tools window, select t-test: Paired Two
Sample for Means and
click OK.
• There are two blanks near the top of the window for Variable
1 Range and
Variable 2 Range. In the first, enter A2:A11 indicating that the
data for the first
(Class) group is in cells A2 to A11. In the second, enter B2:B11
for the No Class
group.
• Indicate that the hypothesized mean difference is 0. This
reflects the value for the
mean of the distribution of difference scores.
• Indicate A13 for the output range, so that the results do not
overlay the data
scores.
• Click OK.
Widen column A so that all the output is readable. The result is
the screenshot that is
Figure 7.5.
Figure 7.5: The Excel output for the dependent-samples t-test
using the
data from Figure 7.4
In the Excel solution, t = 3.074 rather than the 3.081 from the
longhand solution. The Excel approach is to calculate the
correlation between scores to find a solution, rather than to
determine the difference between scores as we did. Note that the
Pearson correlation (which will be explained in Chapter 8) is
indicated at .91. In any event, the very minor difference, .007,
between the solution in Figure 7.4 and the Excel solution in
Figure 7.5 is not relevant to the outcome, as it is attributable
to rounding error. The Excel output also indicates results for
both one-tailed and two-tailed tests at p = .05; the outcome is
statistically significant.
Apply It!
Repeated Measures
Diabetes is a group of metabolic diseases in which the body cannot properly regulate blood sugar. Management of this disease is achieved by keeping blood glucose at normal levels for as much of the time as possible. This requires an accurate, portable glucose monitor for home use.
A medical device company has developed a new portable glucose monitor and wishes to compare it against a laboratory standard. This will produce a data set in which two different monitors measure the glucose level of 11 randomly chosen diabetes patients. Although the two monitors take the blood samples at the same time, this can be considered an example of the before/after dependent-samples t-test because the same group is measured twice. By using the same set of patients for both monitors, each patient is his or her own control. Obtaining two measurements for each patient reduces measurement variability compared to using two independent sets of patients. Choosing a level of significance of p ≤ .05, we use the paired-sample t-test to test the null hypothesis that there is no difference in measurements between the two monitors.
• H0: μglucose_portable_monitor = μglucose_lab_monitor
By rejecting H0 the company will find support for the alternative hypothesis that there is a significant mean difference in the glucose level between the two machines.
• Ha: μglucose_portable_monitor ≠ μglucose_lab_monitor
The glucose readings from each of the two monitors are measured in milligrams per deciliter and are shown in the following table. There is a large variability within each column because each patient is different, and the readings were taken at various times of the day.
Patient Portable Monitor Laboratory Standard
A 112 120
B 85 82
C 103 116
D 154 168
E 65 75
F 52 51
G 85 96
H 72 79
I 167 178
J 123 141
K 142 153
Comparing the Three Dependent t-Tests With the Independent t-Test
The before/after and matched-pairs approaches to calculating a dependent-groups t-test each have advantages. The before/after design provides the greatest control over the extraneous variables that can confound the results in a matched-pairs design. When using the matching approach, there is always a chance that subjects in Group 2 are not matched closely enough on some relevant variable and the resulting mismatches create error variance. In the service learning example, students were matched according to age, major, and gender. But if marital status affects students' willingness to be involved in community service and it is not controlled, there could be an imbalance of married/not-married
Apply It! (continued)
The Excel solution follows:

                               Variable 1   Variable 2
Mean                           105.45       114.45
Variance                       1428.67      1736.27
Observations                   11           11
Pearson Correlation            0.99
Hypothesized Mean Difference   0.00
df                             10
t Stat                         −4.817
P(T≤t) one-tail                0.0003
t Critical one-tail            1.8125
P(T≤t) two-tail                0.0007
t Critical two-tail            2.2281

The magnitude of the calculated value of t = −4.817 exceeds the critical two-tail value from the table of tcrit = 2.23. The result is statistically significant, so we reject the null hypothesis that the means are the same. The portable monitor measures glucose levels lower than the laboratory standard.
Based on the results of this test, the company continued research on the portable monitor until it could devise a solution that would more accurately replicate laboratory-standard results.
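The longhand difference-score method from Figure 7.4 can be checked against this output with a short script. This is an illustrative sketch in pure Python, not part of the Excel workflow; the readings are the 11 patient values from the table above.

```python
# Paired (dependent-groups) t-test via difference scores:
# t = M_d / (s_d / sqrt(n)), using the glucose readings from the table.
from math import sqrt

portable = [112, 85, 103, 154, 65, 52, 85, 72, 167, 123, 142]
lab      = [120, 82, 116, 168, 75, 51, 96, 79, 178, 141, 153]

d = [p - l for p, l in zip(portable, lab)]            # difference scores
n = len(d)
mean_d = sum(d) / n                                   # mean difference
var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)   # sample variance of d
t = mean_d / sqrt(var_d / n)

print(round(mean_d, 1), round(t, 3))  # -9.0 -4.817
```

The negative mean difference reflects the portable monitor reading lower than the laboratory standard, and the t value matches the magnitude of the t Stat in the Excel output.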
Apply It! boxes written by Shawn Murphy.
students that confounds results. The before/after procedure involves the same students, and unless their status on some important variable changes between measures (a rash of marriages between the first and second measurement, for example), there is going to be better control of error variance with that approach.
Note that the matched-pairs and within-treatments approaches also assume a large sample from which to draw in order to select participants who match those in the first group. As the number of variables on which participants must be matched increases, so must the size of the sample from which to draw in order to find participants with the correct combination of characteristics.
The advantage of the matched-pairs design, on the other hand, is that it takes less time to execute. The treatment group and the control group can both be involved in the study at the same time. By way of a summary, note the comparisons in Table 7.1.
Table 7.1: Comparing the t-tests

                 Independent t      Before/After      Matched-Pairs             Within-Treatments
Groups           Independent        One group         Two groups: each          One group
                 groups             measured twice    subject from the first    measured twice
                                                      group matched to one      for two treatments
                                                      in the second
Denominator/     Within-groups      Only within-      Only within-groups        Only within-groups
error term       variability plus   groups            variability               variability
                 between-groups     variability
7.3 The Within-Subjects F
Sometimes two measures of the same group are not enough to track changes in the DV. Maybe the researchers running the service learning study want to compare how much time students devoted to community service the year they graduated, one year later, and then two years after graduation. The within-subjects F is a dependent-groups procedure for two or more groups of scores when the dependent variable is interval or ratio scale. Because the dependent-groups t-test is the repeated-measures equivalent of the independent t-test, the within-subjects F is the repeated-measures or matched-pairs equivalent of the one-way ANOVA. The same Ronald A. Fisher who developed analysis of variance also developed this test, which is a form of ANOVA, and the test statistic is still F.
Here too, the dependent groups can be formed by either repeatedly measuring the same group or by matching separate groups of participants on the relevant variables. When there are more than two groups, matching becomes increasingly problematic, however, and although it is theoretically possible to match any number of participant groups, it is
a highly complex undertaking to match all the relevant variables across more than two or three measures. Repeatedly measuring the same participants is much more common than matching.
Managing Error Variance in the Within-Subjects F
Recall from Chapter 6 that when Fisher developed ANOVA, he shifted away from calculating score variability with the standard deviation, standard error of the mean, and so on and used sums of squares instead. The particular sums of squares computed are the key to the strength of this procedure.
If a group of participants in a study is measured on a dependent variable at three different intervals and their scores are recorded in parallel columns, the researcher will have a data sheet similar to the following:
First Measure Second Measure Third Measure
Participant 1 . . .
Participant 2 . . .
• The column scores are the equivalent of scores from the different groups in a one-way ANOVA, and any differences from column to column reflect the effect of the IV, the treatment.
• The participant-to-participant differences, the within-group differences, are reflected in the differences in the scores from row to row. Those differences are error variance just as they are with the one-way ANOVA.
• The within-subjects F approach is to calculate the variability between rows (the within-groups variance), and then, because it comes from participant-to-participant differences that are the same in each group, to eliminate it from further analysis.
• The only error variance that remains is that which does not stem from the person-to-person differences.
In the dependent-samples t-test, the within-subjects variance is managed by reducing the denominator in the t ratio according to how highly correlated the two sets of measures are (the Excel approach) or by the longhand approach of using the standard deviation of the difference scores, which is relatively small when scores are related.
In the within-subjects F, the variability within groups is calculated and then adjusted if there is too much variance between pairs of treatments. This detection of variance is based on Mauchly's test of sphericity (W), developed by John W. Mauchly in 1940. If the W test is significant (p < .05), then there is a violation of sphericity, which means that there is too much variance within the group across pairs of times/treatments (see Table 7.2). When sphericity is violated, degrees-of-freedom adjustments are made using the Greenhouse-Geisser or Huynh-Feldt calculations. These are adjustments of the degrees of freedom (df) based on their respective epsilon (ε) values (discussed more in Chapter 8). Of the two options, the Greenhouse-Geisser is more conservative in that it is harder to reject the null hypothesis, with a lower probability of a type I error. The Huynh-Feldt is based on a bias-corrected value that is not as conservative.
One final note regarding error variance: it can only be calculated across comparisons of pairs of treatments, so the W test is not necessary for the dependent-samples t-test, since there is only one pair of values. In addition, the test for sphericity cannot be done in the one-way ANOVA because the amount of variability within groups is different for each group, and there is no way to separate it from the balance of the error variance in the problem, which can be a severe limitation affecting the power of between-group designs. An example and interpretation of sphericity will be shown in the SPSS example later in the chapter.
Table 7.2: The concept of sphericity

Patient    Tx A   Tx B   Tx C   Tx1−Tx2   Tx1−Tx3   Tx2−Tx3
1           30     27     20       3         10        7
2           35     30     28       5          7        2
3           25     30     20      −5          5       10
4           15     15     12       0          3        3
5            9     12      7      −3          2        5
Variance                          17       10.3      10.3
A Within-Subjects F Example
An industrial/organizational psychologist is conducting a study of employees who assemble electronic components. The study examines how productivity changes during the length of time employed. The psychologist identifies five workers hired in the same month and then gauges the number of assembled components each employee averages per hour one week, one month, and then two months after beginning work. Is there a relationship between the number of completed components and the length of time employed? The data for the five employees follow:
Products Assembled per Hour
1 week 1 month 2 months
Diego 2 5 4
Harold 4 7 7
Wilma 3 6 5
Carol 4 5 6
Moua 5 8 9
• The independent variable (the IV, the treatment) is the time elapsed.
• The dependent variable (the DV) is the number of components assembled.
• The issue is whether there are significant differences in the measures from column to column (over time).
In Chapter 6 the variability related to the IV was measured in the sum of squares between (SSbet). The same source of variance is gauged here, except that it is called the sum of squares between columns (SScol).
The Components of the Within-Subjects F
Calculating the within-subjects F begins just as the one-way ANOVA begins, by determining all variability from all sources with the sum of squares total (SStot). It is even calculated the same way as it was in Chapter 6:
1. The sum of squares total.
   SStot = Σ(x − MG)²
   a. Subtract each score from the mean of all the scores from all the groups,
   b. square the difference, and then
   c. sum the squared differences.
The balance of the problem is completed with the following steps:
2. The sum of squares between columns (SScol). This equation is much like SSbet in the one-way ANOVA. The scores in each column are treated the same way the individual groups are treated in the one-way ANOVA. For columns 1, 2, through k,
   SScol = (Mcol 1 − MG)²ncol 1 + (Mcol 2 − MG)²ncol 2 + . . . + (Mcol k − MG)²ncol k     Formula 7.2
   a. calculate the mean for each column of scores,
   b. subtract the mean for all the data (MG) from each column mean,
   c. square the result, and
   d. multiply the squared result by the number of scores in the column.
3. The sum of squares between rows. This, too, is like the SSbet from the one-way problem except that it treats the scores for each row as a separate group. For rows 1, 2, through i,
   SSrows = (Mrow 1 − MG)²nrow 1 + (Mrow 2 − MG)²nrow 2 + . . . + (Mrow i − MG)²nrow i     Formula 7.3
   a. calculate the mean for each row of scores,
   b. subtract the mean for all the data from each row mean,
   c. square the result, and
   d. multiply the squared result by the number of scores in the row.
4. The residual sum of squares is the error term in the within-subjects F. It is the equivalent of SSwith in the one-way ANOVA. With the within-subjects F, the variability in scores due to person-to-person differences within the same measure is calculated, and because it is the same for each set of measures, it is eliminated. This will result in a reduced error term. It is determined as follows:
   SSresid = SStot − SScol − SSrows     Formula 7.4
   a. Take all variance from all sources (SStot),
   b. subtract from it the treatment effect (SScol), and
   c. subtract the person-to-person differences (SSrows).
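Steps 1 through 4 can be sketched in a few lines of pure Python, using the five employees' scores from the table above. This is an illustrative check of the longhand arithmetic, not the book's Excel procedure.

```python
# Sums of squares for the within-subjects F (Formulas 7.2-7.4),
# computed on the products-assembled-per-hour data.
rows = [            # columns: 1 week, 1 month, 2 months
    [2, 5, 4],      # Diego
    [4, 7, 7],      # Harold
    [3, 6, 5],      # Wilma
    [4, 5, 6],      # Carol
    [5, 8, 9],      # Moua
]
scores = [x for r in rows for x in r]
MG = sum(scores) / len(scores)                     # grand mean

ss_tot = sum((x - MG) ** 2 for x in scores)        # step 1
cols = list(zip(*rows))
ss_col = sum(len(c) * (sum(c) / len(c) - MG) ** 2 for c in cols)    # step 2
ss_rows = sum(len(r) * (sum(r) / len(r) - MG) ** 2 for r in rows)   # step 3
ss_resid = ss_tot - ss_col - ss_rows               # step 4

print(round(ss_tot, 3), round(ss_col, 3), round(ss_rows, 3), round(ss_resid, 3))
# 49.333 22.533 23.333 3.467
```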
The Within-Subjects F Calculations
When the sums of squares values are completed, the next step is to complete the ANOVA table. The degrees-of-freedom values are as follows:
• dftot = N − 1
• dfcol = number of columns − 1
• dfrows = number of rows − 1
• dfresid = dfcol × dfrows
Just as with one-way problems, the mean square values are calculated by dividing the sums of squares by their degrees of freedom. The components of the F value, and the only MS values required, are MScol, which includes the treatment effect, and MSresid, which is the error term. The MS is not determined for total or for rows. The F value in the within-subjects ANOVA is then MScol ÷ MSresid.
The calculations and the table for the products-assembled-per-hour problem are in Figure 7.6.
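The degrees-of-freedom and mean-square steps just described can be sketched as a short Python check. The sums of squares are the values computed in the text; this is an illustration, not part of the Excel workflow.

```python
# Completing the ANOVA table: df, MS, and F for the
# products-assembled example (SS values from the text).
ss_tot, ss_col, ss_rows = 49.333, 22.533, 23.333
ss_resid = ss_tot - ss_col - ss_rows   # Formula 7.4

df_col = 3 - 1                 # number of columns - 1
df_rows = 5 - 1                # number of rows - 1
df_resid = df_col * df_rows    # 2 x 4 = 8

ms_col = ss_col / df_col       # treatment effect
ms_resid = ss_resid / df_resid # error term
F = ms_col / ms_resid          # F = MScol / MSresid

print(round(F, 1))  # 26.0
```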
Try It!
How is the error term in the within-subjects F different from that in the one-way ANOVA?
Figure 7.6: A within-subjects F example

The calculated value of F exceeds the critical value of F from the table. The number of products assembled per hour is significantly different according to the amount of time the employee has been on the job. The significant F indicates that this much difference between measures is unlikely to have occurred by chance.

The Products Assembled per Hour

                1 week   1 month   2 months   Row Means
Diego             2         5         4         3.667
Harold            4         7         7         6.0
Wilma             3         6         5         4.667
Carol             4         5         6         5.0
Moua              5         8         9         7.333
Column Means    3.60      6.20      6.20
Grand Mean (MG) = 5.333

The ANOVA Table

Source      SS       df   MS       F      Fcrit
Total       49.333   14
Columns     22.533    2   11.267   26.0   4.46
Rows        23.333    4
Residual     3.467    8    0.433

1. SStot = Σ(x − MG)²
   (2 − 5.333)² + (4 − 5.333)² + ... + (9 − 5.333)² = 49.333
2. SScol = (Mcol1 − MG)²ncol1 + (Mcol2 − MG)²ncol2 + ... + (Mcolk − MG)²ncolk
   (3.6 − 5.333)²(5) + (6.2 − 5.333)²(5) + (6.2 − 5.333)²(5) = 22.533
3. SSrows = (Mr1 − MG)²nr1 + (Mr2 − MG)²nr2 + ... + (Mri − MG)²nri
   (3.667 − 5.333)²(3) + (6.0 − 5.333)²(3) + (4.667 − 5.333)²(3) + (5.0 − 5.333)²(3) + (7.333 − 5.333)²(3) = 23.333
4. The residual sum of squares.
   SSresid = SStot − SScol − SSrows = 49.333 − 22.533 − 23.333 = 3.467
Completing the Post Hoc Test
Ordinarily, the calculation of F leaves unanswered the question of which set of measures is significantly different from which. However, in this particular problem there is only one possibility. Because both the 1-month and the 2-month groups of measures have the same mean (M = 6.20), they must both be significantly different from the only other group of measures in the problem, the 1-week-on-the-job measures for which M = 3.6. As a demonstration, HSD is completed anyway.
The HSD procedure is the same as for the one-way test, except that the error term is now MSresid. Substituting MSresid for MSwith in the formula provides

HSD = x√(MSresid/n)

where
x is a value from Appendix Table D. It is based on the number of means, which is the same as the number of groups of measures, 3 in the example, and the df for MSresid, which are 8.
n = the number of scores there are in any one measure, 5 in this instance.
For the number-of-products-assembled-per-hour study,

HSD = 4.04√(.433/5) = 1.19

A difference of 1.19 or greater between any pair of means is statistically significant. Using the same approach used in Chapter 6, a matrix indicating the difference between each pair of means makes it easier to interpret the HSD value.

                 1 week (3.6)   1 month (6.2)   2 months (6.2)
1 week (3.6)     diff = 0       diff = 2.6*     diff = 2.6*
1 month (6.2)                   diff = 0
2 months (6.2)
*Indicates a significant difference.

The 1-week measures of productivity are significantly different from the 1-month and 2-month measures of productivity. Because the mean values of the 1- and 2-month measures are the same, neither of the last two measures is significantly different from the other. The largest increase in productivity comes between the first week and first month of employment.
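The HSD step can be sketched as follows. The value q = 4.04 is the Appendix Table D entry the text cites (3 means, df = 8); MSresid and the group means come from Figure 7.6. This is an illustrative check, not a general HSD implementation.

```python
# Tukey's HSD with the within-subjects error term:
# HSD = q * sqrt(MSresid / n), then compare each pair of means.
from math import sqrt

q = 4.04           # Appendix Table D value (3 means, df = 8)
ms_resid = 0.433   # error term from the ANOVA table
n = 5              # scores per measure

hsd = q * sqrt(ms_resid / n)
print(round(hsd, 2))  # 1.19

means = {"1 week": 3.6, "1 month": 6.2, "2 months": 6.2}
pairs = [("1 week", "1 month"), ("1 week", "2 months"), ("1 month", "2 months")]
for a, b in pairs:
    diff = abs(means[a] - means[b])
    print(f"{a} vs {b}: diff = {diff:.1f}", "*" if diff > hsd else "")
```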
Calculating the Effect Size
The final question for a significant F is the question of the practical importance of the result. Using partial-eta squared as the measure of effect size yields the following formula:

partial-η² = SScol / (SSresid + SScol)

For the problem just completed, SScol = 22.533 and SSresid = 3.467, so

partial-η² = 22.533 / 26 = 0.87

Approximately 87% of the variance in productivity can be explained by how long the individual has been on the job.
Apply It!
Pilot Program Revisited
Let us return to the example of the middle school that adopted a meditation program known as quiet time to relieve stress, increase test scores, and improve student behavior. In Chapter 5, we used a one-sample t-test to determine that a statistically significant increase in GPAs occurred among participating students. Now, we will use a within-subjects F test to see if their stress levels have decreased.
Ten randomly chosen students who participated in the program filled out questionnaires about their stress levels. The aggregate score was from 1 to 10, with 10 indicating the most stress. The survey was given before the start of the program and at 3-month intervals. The time elapsed represents the independent variable, the treatment effect that drives this analysis. The dependent variable is the stress score. The within-subjects F is a dependent-groups procedure for two or more groups of scores for which the dependent variable is interval or ratio scale. In this example, we have four groups of scores.
Results of the stress questionnaires follow.
0 Months 3 Months 6 Months 9 Months
Student 1 7 6 6 6
Student 2 9 6 5 5
Student 3 7 5 5 4
Student 4 5 3 3 2
Student 5 7 6 4 4
Student 6 8 5 7 5
Student 7 5 4 4 3
Student 8 7 5 6 5
Student 9 6 6 4 4
Student 10 7 5 5 5
Apply It! (continued)
The following table shows results of the within-subjects F test calculations.

Source       SS      df   MS      F
Total        82.000  39
Columns      34.475   3   11.492  26.36
Subjects     35.725   9
Residual     11.775  27    0.436
F.05(3,27) = 2.96

The F value of 26.36 is greater than the critical F value of 2.96, so the results are statistically significant.
Because the calculation of F did not identify the measures that were significantly different from the others, we calculate HSD using the following formula:

HSD = x√(MSresid/n)
HSD = 3.875√(0.436/10) = 0.81

A difference of 0.81 or greater between any pair of means is statistically significant. A matrix indicating the difference between each pair of means makes it easier to interpret the HSD value.

                 0 months (6.8)   3 months (5.1)   6 months (4.9)   9 months (4.3)
0 months (6.8)                    diff = 1.7*      diff = 1.9*      diff = 2.5*
3 months (5.1)                                     diff = 0.2       diff = 0.8
6 months (4.9)                                                      diff = 0.6
9 months (4.3)

The differences marked with an asterisk are significant. The largest decrease in stress occurs during the first 3 months of the program.
To determine the practical importance of these numbers, partial-eta squared is used. For the problem just completed, SScol = 34.475 and SSresid = 11.775, so

partial-η² = 34.475 / 46.25 = 0.75
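As a check, the table can be recomputed from the raw stress scores with a short Python sketch. The small gap between the F printed here (26.35) and the table's 26.36 comes from the table using rounded MS values; everything else matches.

```python
# Within-subjects F for the quiet-time data, following Formulas 7.2-7.4.
rows = [  # columns: 0, 3, 6, 9 months; one row per student
    [7, 6, 6, 6], [9, 6, 5, 5], [7, 5, 5, 4], [5, 3, 3, 2], [7, 6, 4, 4],
    [8, 5, 7, 5], [5, 4, 4, 3], [7, 5, 6, 5], [6, 6, 4, 4], [7, 5, 5, 5],
]
scores = [x for r in rows for x in r]
MG = sum(scores) / len(scores)                     # grand mean

ss_tot = sum((x - MG) ** 2 for x in scores)
cols = list(zip(*rows))
ss_col = sum(len(c) * (sum(c) / len(c) - MG) ** 2 for c in cols)
ss_subj = sum(len(r) * (sum(r) / len(r) - MG) ** 2 for r in rows)
ss_resid = ss_tot - ss_col - ss_subj
F = (ss_col / 3) / (ss_resid / 27)   # df_col = 3, df_resid = 3 * 9 = 27

print(round(ss_col, 3), round(ss_subj, 3), round(ss_resid, 3), round(F, 2))
# 34.475 35.725 11.775 26.35
```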
Comparing the Within-Subjects F and the One-Way ANOVA
In the one-way ANOVA, within-group variance is different for each group because each group is made up of different participants. There is no way to eliminate the error variance as it was eliminated for the within-subjects F because that source of error variance cannot be separated from the balance of the error variance. The smaller error term in the within-subjects test (which is the divisor in the F ratio) allows relatively small differences between the sets of measures to result in a significant F.
This is illustrated by using the same data as the example of the workers who assemble electronic components, except here we calculated a one-way ANOVA instead of the within-subjects F. This is for illustration only, because groups are either independent or dependent; there is no situation in which, once the test is conducted, someone would wonder which approach is appropriate.
The SStot and the SSbet will be the same as the SStot and the SScol are in the within-subjects problem.
SStot = 49.333
SSbet = 22.533
SSwith = Σ(xa − Ma)² + Σ(xb − Mb)² + Σ(xc − Mc)²     (Formula 6.3)
       = (2 − 3.60)² + (4 − 3.60)² + . . . + (9 − 6.20)² = 26.80
The value for the SSwith in a one-way ANOVA is the same as SSrows + SSresid in the within-subjects F in Figure 7.6. It has to be because in the one-way ANOVA, there is no way to separate the participant-to-participant differences from the balance of the error variance
Apply It! (continued)
About 75% of the variance in stress can be explained by how long the student has been enrolled in the program.
The within-subjects F test allowed analysis of students' stress levels at multiple times throughout the year and showed that the program was reducing stress levels by significant amounts.
Apply It! boxes written by Shawn Murphy.
because they are different for each group. With the SSrows added back into the error term, note in Table 7.3 the changes made to the ANOVA table and to F in particular.
• The degrees of freedom for "within" change to 12 from the 8 for residual, which results in a smaller critical value for the independent-groups test, but that adjustment does not compensate for the additional error in the term.
• Note that the sum of squares for the error term jumps from 3.467 in the within-subjects test to 26.80 in the independent-groups test.
• The F value is reduced from 26.0 in the within problem to 5.045 in the one-way problem, a factor of about 1/5.
Because groups are either independent or not, the example is not realistic. Nevertheless, the calculations illustrate the advantage to statistical power of setting up a dependent-groups test, an option researchers have at the planning level.
Table 7.3: The within-subjects F example repeated as a one-way ANOVA
The ANOVA table

Source     SS      df   MS      F      Fcrit
Between    22.533   2   11.267  5.045  3.89
Within     26.800  12    2.233
Total      49.333  14
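The comparison can be verified in a few lines of Python. This sketch runs the same scores as a one-way ANOVA (Chapter 6's Formula 6.3 for SSwith) to show the larger error term when the row differences cannot be separated out.

```python
# The products-assembled data treated as three independent groups.
groups = [
    [2, 4, 3, 4, 5],   # 1 week
    [5, 7, 6, 5, 8],   # 1 month
    [4, 7, 5, 6, 9],   # 2 months
]
scores = [x for g in groups for x in g]
MG = sum(scores) / len(scores)

ss_bet = sum(len(g) * (sum(g) / len(g) - MG) ** 2 for g in groups)
ss_with = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)  # Formula 6.3
F = (ss_bet / 2) / (ss_with / 12)   # df_bet = 2, df_with = 12

print(round(ss_with, 2), round(F, 3))  # 26.8 5.045
```

The same treatment effect (SSbet = 22.533) divided by the inflated error term drops F from 26.0 to about 5.045.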
Another Within-Subjects F Example
A psychologist working at a federal prison is interested in the relationship between the amount of time a prisoner is incarcerated and the number of violent acts in which the prisoner is involved. Using self-reported data, inmates respond anonymously to a questionnaire administered 1 month, 3 months, 6 months, and 9 months after incarceration. The data and the solution are in Figure 7.7.
The results (F) indicate that there are significant differences in the number of violent acts documented for the inmate related to the length of time the inmate has been incarcerated. The HSD results indicate that those incarcerated for 1 month are involved in a significantly different number of violent acts than those who have been in for 3 or 6 months. Those who have been in for 6 months are involved in a significantly different number of violent acts than those who have been in for 9 months. The eta-squared value indicates that about 37% of the variance in number of violent acts is a function of how long the inmate has been incarcerated.
Try It!
We compared a one-way ANOVA to a within-subjects F using the same data. How would the eta-squared values for the two problems compare?
Figure 7.7: Another within-subjects F: Violence and the time of incarceration

           1 month   3 months   6 months   9 months   Row Means
Inmate 1      4         3          2          5         3.50
Inmate 2      5         4          3          4         4.00
Inmate 3      3         1          1          2         1.75
Inmate 4      4         2          1          3         2.50
Inmate 5      2         1          2          3         2.00
Column
Means       3.60       2.20       1.80       3.40

MG = 2.750

The ANOVA Table

Source      SS      df   MS      F
Total       31.75   19
Columns     11.75    3   3.917   9.393
Subjects    15.00    4
Residual     5.00   12   0.417

1. SStot = Σ(x − MG)² = 31.750
2. SScol = (Mcol1 − MG)²ncol1 + (Mcol2 − MG)²ncol2 + (Mcol3 − MG)²ncol3 + (Mcol4 − MG)²ncol4
   (3.6 − 2.75)²(5) + (2.2 − 2.75)²(5) + (1.8 − 2.75)²(5) + (3.4 − 2.75)²(5) = 11.750
3. SSsubj = (Mr1 − MG)²nr1 + (Mr2 − MG)²nr2 + (Mr3 − MG)²nr3 + (Mr4 − MG)²nr4 + (Mr5 − MG)²nr5
   (3.5 − 2.75)²(4) + (4.0 − 2.75)²(4) + (1.75 − 2.75)²(4) + (2.5 − 2.75)²(4) + (2.0 − 2.75)²(4) = 15.0
4. SSresid = SStot − SScol − SSsubj = 31.75 − 11.75 − 15 = 5.0

Verify that F0.05(3,12) = 3.49. F is significant.

The post hoc test: HSD = x.05√(MSresid/n) = 4.20√(0.417/5) = 1.213

            M1 = 3.6   M2 = 2.2   M3 = 1.8   M4 = 3.4
M1 = 3.6               1.4*       1.8*       0.2
M2 = 2.2                          0.4        1.2
M3 = 1.8                                     1.6*
M4 = 3.4

η² = SScol/SStot = 11.75/31.75 = 0.370. About 37% of the variance in violence witnessed is related to how long the inmate has been incarcerated.
A Within-Subjects F in Excel
In spite of the important increase in power that is available compared to independent-groups tests, a dependent-groups ANOVA is not one of the more common tests. It is not one of the options Excel offers in the list of Data Analysis Tools, for example. However, like many statistical procedures, there are a number of repetitive calculations involved, and Excel can simplify these. We will complete the second problem as an example.
1. Set the data up in four columns just as they are in Figure 7.8, but create a blank column to the right of each column of data. With a row at the top for the labels, data begins in cell A2.
2. Calculate the row and column means as well as a grand mean as follows:
   a. For the column means, place the cursor in cell A7 just beneath the last value in the first column and enter the formula =AVERAGE(A2:A6) followed by Enter. To repeat this for the other columns, left-click on the solution that is now in A7, drag the cursor across to G7, and release the mouse button. In the Home tab, click Fill and then Right. This will repeat the column-means calculations for the other columns. Delete the entries this makes to cells B7, D7, and F7, which are still empty at this point.
   b. For the row means, place the cursor in cell I2 and enter the formula =AVERAGE(A2,C2,E2,G2) followed by Enter. To repeat this for the other rows, left-click on the solution that is now in I2, drag the cursor down to I6, and release the mouse button. In the Home tab, click Fill and then Down. This will repeat the calculation of means for the other rows.
   c. For the grand mean, place the cursor in cell I7 and enter the formula =AVERAGE(I2:I6) followed by Enter (the mean of the row means will be the same as the grand mean; the same could have been done with the column means).
3. To determine the SStot:
   a. In cell B2, enter the formula =(A2-2.75)^2 and press Enter. This will square the difference between the value in A2 and the grand mean. To repeat this for the other data in the column, left-click the cursor in cell B2 and drag down to cell B6. Click Fill and Down. With the cursor in cell B7, click the summation sign (Σ) at the upper right of the screen and press Enter. Repeat these steps for columns D, F, and H.
   b. With the cursor in H9, type in SStot= and press Enter. In cell I9, enter the formula =SUM(B7,D7,F7,H7) and press Enter. The value will be 31.75, which is the total sum of squares.
4. For the SScol:
   a. In cell A8, enter the formula =(3.6-2.75)^2*5 and press Enter. This will square the difference between the column mean and the grand mean and multiply the result by the number of measures in the column, 5. In cells C8, E8, and G8, repeat this for each of the other columns, substituting the mean for each column for the 3.60 that was the column 1 mean.
   b. With the cursor in H10, type in SScol= and press Enter. In cell I10, enter the formula =SUM(A8,C8,E8,G8) and press Enter. The value will be 11.75, which is the sum of squares for the columns.
5. For the SSrows:
   a. In cell J2, enter the formula =(I2-2.75)^2*4 and press Enter. Repeat this in rows J3 to J6 by left-clicking on what is now J2 and dragging the cursor down to cell J6. Click Fill and Down.
   b. With the cursor in H11, type in SSrow= and press Enter. In cell I11, enter the formula =SUM(J2:J6) and press Enter. The value will be 15.0, which is the sum of squares for the participants.
6. For the SSresid, in cell H12, enter SSresid= and press Enter. In cell I12, enter the formula =I9-I10-I11. The resulting value will be 5.0.
We used Excel to determine all the sum-of-squares values. Now the mean squares are determined by dividing the sums of squares for columns and residual by their degrees of freedom:

MScol = 11.75/3 = 3.917
MSresid = 5/12 = .417
F = MScol/MSresid = 3.917/.417 = 9.393, which agrees with the earlier calculations done by hand.
To create the ANOVA table,
• beginning in cell A10, type in Source; in B10, SS; in C10, df; in D10, MS; in E10, F; and in F10, Fcrit.
• Beginning in cell A11 and working down, type in total, columns, rows, residual.
• For the sum-of-squares values:
  • In cell B11, enter =I9.
  • In cell B12, enter =I10.
  • In cell B13, enter =I11.
  • In cell B14, enter =I12.
• For the degrees of freedom:
  • In cell C11, enter 19 for total degrees of freedom.
  • In cell C12, enter 3 for columns degrees of freedom.
  • In cell C13, enter 4 for rows degrees of freedom.
  • In cell C14, enter 12 for residual degrees of freedom.
• For the mean squares:
  • In cell D12, enter =B12/C12. The result is MScol.
  • In cell D14, enter =B14/C14. The result is MSresid.
• For the F value, in cell E12, enter =D12/D14.
• In cell F12, enter the critical value of F for 3 and 12 degrees of freedom, which is 3.49.
The list of commands looks intimidating, but mostly because every keystroke has been included. With some practice, this will become second nature. Figure 7.8 is a screenshot of the result of the calculations.
Figure 7.8: A within-subjects F problem in Excel
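For readers who prefer a script, the Excel walkthrough above can be reproduced in pure Python. This is an illustrative sketch on the incarceration data; the printed F of 9.4 matches the 9.393 in the text, which reflects rounded MS values.

```python
# Within-subjects F for the incarceration data (Formulas 7.2-7.4),
# mirroring the spreadsheet steps: SStot, SScol, SSrows, SSresid, F.
rows = [  # columns: 1, 3, 6, 9 months; one row per inmate
    [4, 3, 2, 5],
    [5, 4, 3, 4],
    [3, 1, 1, 2],
    [4, 2, 1, 3],
    [2, 1, 2, 3],
]
scores = [x for r in rows for x in r]
MG = sum(scores) / len(scores)                     # grand mean, 2.75

ss_tot = sum((x - MG) ** 2 for x in scores)        # 31.75
cols = list(zip(*rows))
ss_col = sum(len(c) * (sum(c) / len(c) - MG) ** 2 for c in cols)    # 11.75
ss_rows = sum(len(r) * (sum(r) / len(r) - MG) ** 2 for r in rows)   # 15.0
ss_resid = ss_tot - ss_col - ss_rows               # 5.0

F = (ss_col / 3) / (ss_resid / 12)                 # MScol / MSresid
print(round(F, 3))  # 9.4
```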
7.4 Presenting Results
Using the data from Figure 7.8 and analyzing it in Excel, we see the output table highlighted in yellow. The table is broken down, reading from left to right, in columns that include sum of squares (SS), degrees of freedom (df), mean squares (MS), F ratio (F), and F critical value (Fcrit). Interpreting the results, we can see that the F ratio is based on MScol/MSresid, which is 3.92/.417 = 9.4. This value is larger than our F critical value, indicating significance at the p < .05 level. Recall that a psychologist has collected data on an incarcerated group over a 9-month span and the number of violent crimes they have committed. Upon analyzing the findings, we see that as time elapses from 1 month to 3 months to 6 months to 9 months there is a significant change in the number of violent acts being committed. However, you cannot be sure where the significant differences occurred, since there are four points in time at which data was captured (1 month, 3 months, 6 months, and 9 months). As a result, post hoc tests will be needed to indicate where these differences lie.
With regard to the hypotheses of the repeated-measures ANOVA, the comparison is of mean differences across time. Therefore,

H0: μ1month = μ3months = μ6months = μ9months

The null hypothesis states there is no significant difference between the mean number of violent incidents from 1 month to 3 months to 6 months to 9 months. Keep in mind that the ANOVA is an omnibus test, so we are testing for any overall difference between the months. There may be differences between any two months and not necessarily among all of the months, which we can follow up with paired comparisons.

Ha: μ1month ≠ μ3months ≠ μ6months ≠ μ9months

The alternative (or research) hypothesis states there is a significant difference between the mean number of violent incidents from 1 month to 3 months to 6 months to 9 months. The alternative can also be a prediction of an increase in the mean number of violent incidents from 1 month to 9 months:

Ha: μ1month < μ3months < μ6months < μ9months
To analyze and present results using SPSS, let us first look at an example of a paired-samples/dependent t-test and then a repeated-measures ANOVA example.

SPSS Example 1: Steps for a Paired (Matched)-Samples t-Test
From the data set provided (Figure 7.9), a college professor wants to look at mean differences in scores over the first two quizzes of his statistics class. With his scores in SPSS, go to Analyze → Compare Means → Paired-Samples T Test. Input Score_1 in the first box and Score_2 in the second box, as seen in Figure 7.10. Then click OK. The resulting SPSS output tables are provided in Figure 7.11.
Figure 7.9: Data set for quiz scores
Figure 7.10: SPSS steps in performing a paired-samples t-test
Figure 7.11: SPSS results of a paired-samples t-test
SPSS Example 2: Steps for a Repeated-Measures ANOVA
This example uses data gathered from the SPSS (PASW) On-Line Training Workshop (1999), available at the following link: http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm

The data measure cancer treatments over time (see Figure 7.12):

TOTALCIN = oral condition at the initial stage
TOTALCW2 = oral condition at the end of week 2
TOTALCW4 = oral condition at the end of week 4
TOTALCW6 = oral condition at the end of week 6

Go to Analyze → General Linear Model → Repeated Measures. As shown in Figure 7.13, type in the Within-Subject Factor Name: CW_Times, Number of Levels: 4, and the Measure Name: CW; then click Define. As shown in Figure 7.14, put the four TOTALCW variables in sequential order in the Within-Subjects Variables box, click Plots, and move CW_Times into the Horizontal Axis. Then click Options and move CW_Times into Display Means for, click Compare Main Effects, and select Sidak from the dropdown box just below. Then click Descriptive statistics and Estimates of effect size. Click Continue and OK.
Paired Samples Statistics

                 Mean      N    Std. Deviation   Std. Error Mean
Pair 1  Score_1  42.8930   14   4.39920          1.17570
        Score_2  38.7857   14   7.15949          1.91345

Paired Samples Test

Pair 1 (Score_1 - Score_2), Paired Differences: Mean = 4.10714, Std. Deviation = 7.00598, Std. Error Mean = 1.87243, 95% Confidence Interval of the Difference [.06201, 8.15228], t = 2.193, df = 13, Sig. (2-tailed) = .047
Figure 7.12: Data set of cancer treatments over time
Data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm

Figure 7.13: Repeated-measures steps (part 1)
Data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
Figure 7.14: Repeated-measures steps (part 2)
Data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
Figure 7.15: SPSS results of cancer treatments over time

Descriptive Statistics (Imputation Number 1)
            Mean    Std. Deviation   N
TOTALCIN    8.28    2.542            25
TOTALCW2    6.52    1.531            25
TOTALCW4    10.36   3.475            25
TOTALCW6    9.76    3.566            25

Mauchly's Test of Sphericitya (Measure: CW; Within Subjects Effect: CW_Times)
Mauchly's W   Approx. Chi-Square   df   Sig.   Epsilonb: Greenhouse-Geisser / Huynh-Feldt / Lower-Bound
.596          11.752               5    .039   .731 / .808 / .333
Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. Design: Intercept; Within Subjects Design: CW_Times
b. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.

Tests of Within-Subjects Effects (Measure: CW)
Source                                  Type III SS   df       Mean Square   F        Sig.   Partial Eta Squared
CW_Times         Sphericity Assumed     220.340       3        73.447        13.760   .000   .364
                 Greenhouse-Geisser     220.340       2.194    100.428       13.760   .000   .364
                 Huynh-Feldt            220.340       2.424    90.913        13.760   .000   .364
                 Lower-Bound            220.340       1.000    220.340       13.760   .000   .364
Error(CW_Times)  Sphericity Assumed     384.324       72       5.338
                 Greenhouse-Geisser     384.324       52.656   7.299
                 Huynh-Feldt            384.324       58.167   6.607
                 Lower-Bound            384.324       24.000   16.013
Figure 7.15: SPSS results of cancer treatments over time (continued)
Data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm

Figure 7.16: SPSS output graph of cancer treatments over time
Data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
Pairwise Comparisons (Measure: CW; Imputation Number 1)

(I) CW_Times   (J) CW_Times   Mean Difference (I-J)   Std. Error   Sig.b   95% Confidence Interval for Differenceb
1              2              -1.760*                 .504         .011    [-3.205, -.315]
1              3              -3.840*                 .694         .000    [-5.830, -1.850]
1              4              -3.244*                 .755         .001    [-5.407, -1.082]
2              1              1.760*                  .504         .011    [.315, 3.205]
2              3              -2.080*                 .709         .043    [-4.113, -.047]
2              4              -1.484                  .717         .262    [-3.539, .571]
3              1              3.840*                  .694         .000    [1.850, 5.830]
3              2              2.080*                  .709         .043    [.047, 4.113]
3              4              .596                    .489         .800    [-.806, 1.997]
4              1              3.244*                  .755         .001    [1.082, 5.407]
4              2              1.484                   .717         .262    [-.571, 3.539]
4              3              -.596                   .489         .800    [-1.997, .806]

Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Sidak.
Based on the results (Figures 7.15 and 7.16), the Descriptive Statistics table shows that there are differences in the means; how significant those differences are is determined by the ANOVA and the consequent post hoc tests. Next, Mauchly's test of sphericity shows a significant value based on the χ² distribution, with a significance value at the p < .05 level, indicating a violation of the sphericity assumption. To reiterate, this indicates that there is variance between pairs of treatments or measures for the group. Since there were significant differences between some pairs of treatments compared to other pairs, a violation has occurred. Therefore, looking at the Tests of Within-Subjects Effects table, sphericity cannot be assumed, and a df adjustment will be made by using the Greenhouse-Geisser or the Huynh-Feldt calculations. As seen in the F value (13.760) and the df, the adjustment does not make any difference, as there are significant differences across CW_Times (p < .05). The Pairwise Comparisons table is where we see between-treatment differences, indicating that all treatment times are significantly different except times 2 and 4 (p = .262) and times 3 and 4 (p = .800), which are not statistically significant. The line graph also indicates a trend in differences between the first, second, and third treatment times but not much difference from the third to fourth treatments.
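The Sidak adjustment SPSS applies in the Pairwise Comparisons table can be computed directly: for m comparisons, a raw p value is adjusted to 1 − (1 − p)^m. A minimal sketch in plain Python (the .01 input is just an illustrative raw p value, not from the SPSS output):

```python
def sidak_adjust(p_raw, m):
    """Sidak-adjusted p value for one of m pairwise comparisons."""
    return 1 - (1 - p_raw) ** m

# With four time points there are 6 unique pairwise comparisons.
print(round(sidak_adjust(0.01, 6), 4))  # 0.0585
```

Because the adjusted p is always at least as large as the raw p, the Sidak procedure keeps the familywise Type I error rate at the nominal level.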
7.5 Interpreting Results
Refer to the most recent edition of the APA manual for specific detail on formatting statistics, but Table 7.4 may be used as a quick guide in presenting the statistics covered in this chapter.
Table 7.4: Guide to APA formatting of F statistic results

Abbreviation or Term   Description
F                      F test statistic score
Partial-η²             Partial-eta-squared: a measure of effect size for ANOVA
W                      Mauchly's Test of Sphericity
χ²                     Distribution used for nonparametric tests such as Mauchly's test of sphericity and Friedman's ANOVA
SS                     Sum of Squares
MS                     Mean Square

Source: Publication Manual of the American Psychological Association, 6th edition. © 2009 American Psychological Association, pp. 119–122.
Using the results from SPSS Example 1, Figure 7.11, we could present the results in the following way:

Try It!
Access the data and the accompanying video via the links below to perform this analysis yourself. Both resources are provided by Central Michigan University.
Data link: http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
Video: http://calcnet.mth.cmich.edu/org/spss/V16_materials/Video_Clips_v16/19repeated_measures/19repeated_measures.swf
• There was a significant difference between quiz scores 1 (M = 42.89) and 2 (M = 38.79), as the mean score significantly decreased over time, t(13) = 2.19, p < .05.
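The t statistic SPSS reports can be reproduced from the difference scores alone. A minimal sketch in plain Python, using hypothetical quiz scores (not the Figure 7.9 data, which is not reproduced here):

```python
import math

# Hypothetical paired quiz scores for six students (illustration only).
quiz1 = [10, 12, 9, 11, 13, 10]
quiz2 = [8, 11, 9, 9, 12, 9]

diffs = [a - b for a, b in zip(quiz1, quiz2)]
n = len(diffs)
mean_d = sum(diffs) / n
# Sample standard deviation of the differences (n - 1 in the denominator).
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
se_d = sd_d / math.sqrt(n)  # standard error of the mean difference
t = mean_d / se_d           # paired t with df = n - 1
print(round(t, 2), n - 1)   # 3.8 5
```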
Using the results from SPSS Example 2, Figures 7.15 and 7.16, we could present the results in the following way:

• The overall difference between CW_Times was significant using the Greenhouse-Geisser results, F(2.19, 52.66) = 13.76, p < .001, partial-η² = .364.
• Based on the Sidak pairwise comparison, the CW_1 time (M = 6.52, SD = 1.53) was significantly different from all the other times. CW_2 (M = 8.28, SD = 2.54) and CW_4 (M = 10.36, SD = 3.56) were also significantly different from each other.
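The partial-η² of .364 can be verified from the sums of squares in Figure 7.15, since partial-η² = SSeffect / (SSeffect + SSerror):

```python
ss_effect = 220.340  # CW_Times, from the Tests of Within-Subjects Effects table
ss_error = 384.324   # Error(CW_Times)

partial_eta_sq = ss_effect / (ss_effect + ss_error)
print(round(partial_eta_sq, 3))  # 0.364
```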
7.6 Nonparametric Tests
You may have noticed that for every parametric test, there is a nonparametric equivalent. The rationale behind nonparametric tests is to obtain a conservative estimate of significance when violations in parametric assumptions have occurred. Such violations include nonlinearity, non-normal distributions, and small data sets.
The nonparametric equivalent of the dependent-samples t-test is the Wilcoxon signed-ranks test (not to be confused with Chapter 5's Wilcoxon rank-sum test for the independent-samples t-test). Frank Wilcoxon proposed both of these in a single paper published in 1945. The Wilcoxon signed-ranks W-test is known as the Wilcoxon t-test for dependent samples (not independent ones). In brief, the steps in the calculation of W are calculating the differences between scores, taking the absolute value (removing the +/− sign), ranking the absolute values, reassigning the original (+/−) sign, and then summing the ranks.
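The steps just listed can be sketched in a few lines of plain Python. This is a minimal illustration with hypothetical before/after scores: zero differences are dropped, and tied absolute differences receive their average rank:

```python
def signed_rank_sums(before, after):
    """Return (W+, W-): sums of ranks of positive and negative differences."""
    diffs = [x - y for x, y in zip(before, after) if x != y]  # drop zero diffs
    abs_sorted = sorted(abs(d) for d in diffs)

    def avg_rank(value):
        # Average rank across ties (ranks are 1-based positions in sorted order).
        positions = [i + 1 for i, v in enumerate(abs_sorted) if v == value]
        return sum(positions) / len(positions)

    w_pos = sum(avg_rank(abs(d)) for d in diffs if d > 0)
    w_neg = sum(avg_rank(abs(d)) for d in diffs if d < 0)
    return w_pos, w_neg

# Hypothetical scores for five participants; the tied pair (140, 140) drops out.
w_pos, w_neg = signed_rank_sums([125, 115, 130, 140, 140],
                                [110, 122, 125, 120, 140])
print(w_pos, w_neg)  # 8.0 2.0
```

The two sums always total n(n + 1)/2 for the n nonzero differences, which is a handy sanity check when computing W by hand.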
The nonparametric equivalent of the parametric repeated-measures ANOVA is Friedman's ANOVA. Essentially, the analysis looks at the difference in the mean ranks (instead of means) across time, treatments, or matched/equivalent groups. By analyzing differences in the mean ranks, the analysis is in effect eliminating extreme points, or outliers, in the distribution. As noted, sensitivity to outliers is the disadvantage of using the mean. Again, nonparametric tests are conservative, distribution-free analyses that are used when parametric violations have occurred. As a result, it is more difficult to find significance; on the other hand, they are conservative in that there is a lower probability of committing a Type I error. One important point to note is that even though mean ranks are used to calculate significant differences between times, treatments, or matched/equivalent groups, the results are reported in terms of the median differences, as will be shown in the next example.
Friedman's nonparametric ANOVA dates back to 1937 and is based on ranked (ordinal) data and the comparison of medians. An alternative nonparametric test similar to Friedman's test is Cochran's Q-test, which is used for dichotomous data (i.e., only two response choices, as in yes/no).
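Friedman's statistic is computed by ranking each participant's scores across the k conditions and comparing the rank sums, using χ²F = 12 / (N·k·(k + 1)) · ΣRj² − 3·N·(k + 1) with k − 1 degrees of freedom. A minimal sketch with hypothetical data (4 participants × 3 conditions):

```python
def friedman_chi_sq(data):
    """data: list of rows, one per participant, with scores across k conditions."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        # Rank the scores within this row (average rank for ties).
        sorted_row = sorted(row)
        for j, score in enumerate(row):
            positions = [i + 1 for i, v in enumerate(sorted_row) if v == score]
            rank_sums[j] += sum(positions) / len(positions)
    chi_sq = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
    return chi_sq, k - 1  # statistic and its degrees of freedom

# Hypothetical scores: 4 participants measured under 3 conditions.
chi_sq, df = friedman_chi_sq([[5, 7, 9], [4, 8, 6], [6, 3, 9], [2, 5, 8]])
print(round(chi_sq, 1), df)  # 4.5 2
```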
Worked Example of the Friedman's Nonparametric ANOVA and Wilcoxon Signed-Ranks W Using SPSS
To perform the Friedman's nonparametric ANOVA using the data set in Figure 7.17, go to Analyze → Nonparametric Tests → Legacy Dialogs → K Related Samples. Place Score_1, Score_2, and Score_3 into the Test Variables box (see Figure 7.18). Click on Statistics, check the Quartiles box, and then click Continue and OK.
Figure 7.17: Data set for the Friedman’s ANOVA test
Figure 7.18: Steps in SPSS for the Friedman’s nonparametric
ANOVA
Figure 7.19: Results of the Friedman's nonparametric ANOVA

Descriptive Statistics
           N    25th      50th (Median)   75th (Percentiles)
Score_1    14   39.0000   44.5000         46.1250
Score_2    14   35.2500   40.0000         43.1250
Score_3    14   30.3750   40.2500         44.7500

Ranks
           Mean Rank
Score_1    2.43
Score_2    1.86
Score_3    1.71

Test Statisticsa
N             14
Chi-Square    4.148
df            2
Asymp. Sig.   .126
a. Friedman Test
Interpreting Results
Based on the results of the Friedman's nonparametric ANOVA in Figure 7.19, there is no significant difference between quiz scores over time, χ²(2) = 4.15, p = .126. In other words, the students did not change significantly over testing times. Since there were no significant differences in the scores, some researchers would conclude that no post hoc tests are needed. However, performing a post hoc Wilcoxon signed-ranks W-test using software requires minimal additional effort, and the researcher can then be certain whether any significant differences exist between pairs of scores.
The steps for performing the W-test (Figure 7.20) are Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples. Input Score_1 and Score_2 in Row 1, Score_2 and Score_3 in Row 2, and then Score_1 and Score_3 in Row 3. Click on Options, check Quartiles, and click Continue and OK.

Figure 7.20: Steps in SPSS for the Wilcoxon signed-ranks W-test
Figure 7.21: Results of the Wilcoxon signed-ranks W-test

Descriptive Statistics
           N    25th      50th (Median)   75th (Percentiles)
Score_1    14   39.0000   44.5000         46.1250
Score_2    14   35.2500   40.0000         43.1250
Score_3    14   30.3750   40.2500         44.7500

Ranks
                                       N     Mean Rank   Sum of Ranks
Score_2 - Score_1   Negative Ranks     9a    7.17        64.50
                    Positive Ranks     3b    4.50        13.50
                    Ties               2c
                    Total              14
Score_3 - Score_2   Negative Ranks     8d    6.81        54.50
                    Positive Ranks     6e    8.42        50.50
                    Ties               0f
                    Total              14
Score_3 - Score_1   Negative Ranks     10g   8.15        81.50
                    Positive Ranks     4h    5.88        23.50
                    Ties               0i
                    Total              14
a. Score_2 < Score_1   b. Score_2 > Score_1   c. Score_2 = Score_1
d. Score_3 < Score_2   e. Score_3 > Score_2   f. Score_3 = Score_2
g. Score_3 < Score_1   h. Score_3 > Score_1   i. Score_3 = Score_1

Test Statisticsa
                         Score_2 - Score_1   Score_3 - Score_2   Score_3 - Score_1
Z                        -2.001b             -.126b              -1.821b
Asymp. Sig. (2-tailed)   .045                .900                .069
a. Wilcoxon Signed Ranks Test
b. Based on positive ranks.
Looking at the results (Figure 7.21) of the Wilcoxon signed-ranks W-test, we see that using this as a post hoc to the Friedman's ANOVA does have benefits, as it identifies a significant difference between Scores 1 and 2 that was not detected using Friedman's ANOVA. This is a significant point in the previously noted debate over whether to run post hocs based on the significance of the F value. As a result, the conclusion here, based on the W-test, is that there is a significant difference between Score_1 (Mdn = 44.50) and Score_2 (Mdn = 40.00), Z = -2.00, p < .05, while there were no significant differences between Score_2 (Mdn = 40.00) and Score_3 (Mdn = 40.25), Z = -0.13, p = .90, or between Score_1 (Mdn = 44.50) and Score_3 (Mdn = 40.25), Z = -1.82, p = .069.
Summary
Any statistical procedure has advantages and disadvantages. The downside of the different independent-groups designs is that subjects within groups often respond to the independent variable differently. Those differences are a source of error variance that is different for each group. No matter how carefully a researcher randomly selects the groups to be used in a study, there are going to be differences in the way that people in the same group respond to whatever stimulus is offered. Both the before/after t-test and the within-subjects F test eliminate that source of error variance by either using the same people repeatedly or matching subjects on the most important characteristics. Controlling error variance results in a test that is more likely to detect a significant difference (Objectives 1 and 5).

In dependent-groups designs, using the same group repeatedly allows the number of participants involved to be fewer (Objectives 1, 2, 3, 4, and 6). One of the downsides to repeated-measures designs is that they take more time to complete. Unless subjects are matched across measures, the different levels of the independent variable cannot be administered concurrently as they can in independent-groups tests. With more time, there is an increased potential for attrition. If one of the participants drops out of a repeated-measures study, the data is lost from all the measures of the dependent variable for that subject (Objectives 2 and 4).

Having noted some of the differences between dependent-groups designs and their independent-groups equivalents, it is important to note their consistencies as well. Whether the test is independent t, before/after t, one-way ANOVA, or a within-subjects F, in each case the independent variable is nominal scale, and the dependent variable is interval or ratio scale (Objective 2).

In addition, two repeated-measures designs were performed (i.e., t-tests and ANOVAs) where we presented several scenarios to test their respective null hypotheses to find support for alternative ones (Objectives 3 and 6). Results and conclusions were presented, interpreted, and reported in APA format (Objectives 7 and 8). Finally, the Wilcoxon signed-ranks W-test and the Friedman's Nonparametric ANOVA were discussed with an appropriate example (Objective 9).
There is something else that all the tests in this chapter have in common. They all test the hypothesis of difference. Like the z-test and the one-sample t-test, they are about significant differences. Sometimes, however, the question involves the strength of the relationships between variables. Those discussions will introduce correlation and the hypothesis of association, which are the focus of Chapter 8.
Key Terms

before/after t-test  A dependent-groups application of the t-test, also known as a pre/post t-test. In this particular application, one group is measured before and after a treatment.

confounding variables  Variables that influence an outcome but are uncontrolled in the analysis and obscure the effects of other variables. For example, if a psychologist is interested in gender-related differences in problem-solving ability but doesn't control for age differences, differences in gender may be confounded by differences that are actually age-related.

dependent-groups designs  Statistical procedures in which the groups are related, either because multiple measures are taken of the same participants or because each participant in a particular group is matched on characteristics relevant to the analysis to a participant in the other groups with the same characteristics. Dependent-groups designs reduce error variance because they reduce score variation due to factors unrelated to the independent variable.

matched-pairs or dependent-samples t-test  A dependent-groups application of the t-test. In this particular application, each participant in the second group is paired to a participant in the first group with the same characteristics in order to limit the error variance that would otherwise stem from using dissimilar groups.

sphericity  Nonsignificant differences in the dependent variable across pairs of treatments or times for all participants in the group. By minimizing this within-group error variance, sphericity may be assumed. Significant within-group error variances between pairs of treatments are a violation of sphericity. Such variances are detected using Mauchly's sphericity (W) test.

within-subjects F  The dependent-groups equivalent of the one-way ANOVA. In this procedure either participants in each group are paired on the relevant characteristics with participants in the other groups, or one group is measured repeatedly after different levels of the independent variable are introduced.
Chapter Exercises
Answers to Try It! Questions
The answers to all Try It! questions introduced in this chapter
are provided below.
A. Small samples tend to be platykurtic because the data in
small samples is often
highly variable. This translates into relatively large standard
deviations and
large error terms.
B. If groups are created by random sampling, they will differ
from the population
from which they were drawn only by chance. That means that
with random
sampling, there can be error, but its potential to affect research
results diminishes as the sample size grows.
C. The before/after t-test and the matched-pairs t-test differ
only in that the before/
after test uses the same group twice and the matched-pairs test
matches each
subject in the first group with one in the second group who has
similar characteristics. The calculation and interpretation of the t value are
the same in both
procedures.
D. The within-subjects test will detect a significant difference
more readily than an
independent t-test. Power in statistical testing is the likelihood
of detecting significance.
E. Because the same subjects are involved in each set of
measures, the within-
subjects test allows us to calculate the amount of score
variability due to indi-
vidual differences in the group and eliminate it because it is the
same for each
group. This source of error variance is eliminated from the
analysis, leaving a
smaller error term.
F. The eta-squared value would be the same in either
problem. Note that in a one-
way ANOVA, eta-squared is the ratio of SSbet to SStot. In the
within-subjects F, it
is SScol to SStot. Because SSbet and SScol both measure the
same variance and the
SStot values will be the same in either case, the eta-squared
values will likewise
be the same. What changes, of course, is the error term.
Ordinarily, SSresid will
be much smaller than SSwith, but those values show up in the F
ratio by virtue of
their respective MS values, not in eta-squared.
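Plugging in the sums of squares from this chapter's Excel example (SScol = 11.75, SSrows = 15.0, SSresid = 5.0, with SStot taken as their sum) illustrates the point:

```python
ss_col, ss_rows, ss_resid = 11.75, 15.0, 5.0
ss_tot = ss_col + ss_rows + ss_resid  # 31.75

# Eta-squared is SScol/SStot whether the design is one-way or within-subjects.
eta_sq = ss_col / ss_tot
print(round(eta_sq, 3))  # 0.37
```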
Review Questions
The answers to the odd-numbered items can be found in the
answers appendix.
1. A group of clients is being treated for a compulsive behavior
disorder. The number
of times in an hour that each one manifests the compulsivity is
gauged before and
after a mild sedative is administered. The data is as follows:
Before After
1. 5 4
2. 6 4
3. 4 3
Before After
4. 9 5
5. 5 6
6. 7 3
7. 4 2
8. 5 5
a. What is the standard deviation of the difference scores?
b. What is the standard error of the mean for the difference
scores?
c. What is the calculated value of t?
d. Are the differences statistically significant?
2. A researcher is examining the impact that a political ad has
on potential donors’
willingness to contribute. The data indicates the amount (in
dollars) each is willing
to donate before viewing that advertisement and after viewing
the advertisement.
Before After
1. 0 10
2. 20 20
3. 10 0
4. 25 50
5. 0 0
6. 50 75
7. 10 20
8. 0 20
9. 50 60
10. 25 35
a. Are there significant differences in the amount?
b. What is the value of t if this is done as an independent t-
test?
c. Explain the difference between before/after and independent
t-tests.
3. Participants attend three consecutive sessions in a business
seminar. In the first,
there is no reinforcement for responding to the session
moderator’s questions. In
the second, those who respond are provided with verbal
reinforcers. In the third,
responders receive bits of candy as reinforcers. The dependent variable is the number of times the participants respond in each session.
ber of times the participants respond in each session.
None Verbal Token
1. 2 4 5
2. 3 5 6
3. 3 4 7
4. 4 6 7
5. 6 6 8
6. 2 4 5
7. 1 3 4
8. 2 5 7
a. Are the column-to-column differences significant? If so,
which groups are significantly different from which?
b. Of what data scale is the dependent variable?
c. Calculate and explain the effect size.
4. In the calculations for Exercise 3, what step is taken to
minimize error variance?
a. What is the source of that error variance?
b. If Exercise 3 had been a one-way ANOVA, what would have
been the degrees of
freedom for the error term?
c. How does the change in degrees of freedom for the error
term in the within-
subjects F affect the value of the test statistic?
5. Because SScol in the within-subjects F contains the treatment
effect and measurement error, if there is no treatment effect, what will be the value
of F?
6. Why is matching uncommon in within-subjects F analyses?
7. A group of nursing students is approaching the licensing test.
The level of anxiety
for each student is measured at 8 weeks prior to the test, then 4
weeks, 2 weeks,
and 1 week before the test. Assuming that anxiety is measured
on an interval scale,
are there significant differences?
Student
Number
8 weeks 4 weeks 2 weeks 1 week
1. 5 8 9 9
2. 4 7 8 10
3. 4 4 4 5
4. 2 3 5 5
5. 4 6 6 8
6. 3 5 7 9
7. 4 5 5 4
8. 2 3 6 7
a. Is anxiety related to the time interval?
b. Which groups are significantly different from which?
c. How much of anxiety is a function of test proximity?
8. A psychology department sponsors a study of the relationship
between participation in a particular internship opportunity and students' final
grades. Eight
students in their second year of graduate study are matched to
eight students in
the same year by grade. Those in the first group participate in
the internship. Students' grades after the second year are compared.
Student Pair Number Internship No Internship
1. 3.6 3.2
2. 2.8 3.0
3. 3.3 3.0
4. 3.8 3.2
5. 3.2 2.9
6. 3.3 3.1
7. 2.9 2.9
8. 3.1 3.4
a. Are the differences statistically significant?
b. This should be done as a dependent-samples t-test even though there are two separate groups involved. Why?
9. A team of researchers associated with an accrediting body
studies the amount of
time professors devote to their scholarship before and after they
receive tenure.
Scores are hours per week.
Professor Number Before Tenure After Tenure
1. 12 5
2. 10 3
3. 5 6
4. 8 5
5. 6 5
6. 12 10
7. 9 8
8. 7 7
a. Are the differences statistically significant?
b. What is t if the groups had been independent?
c. What is the primary reason for the difference in the two t
values?
10. A supervisor is monitoring the number of sick days
employees take by month. For
seven people, they are as follows:
Employee Number Oct Nov Dec
1. 2 4 3
2. 0 0 0
3. 1 5 4
4. 2 5 3
5. 2 7 7
6. 1 3 4
7. 2 3 2
a. Are the month-to-month differences significant?
b. What is the scale of the independent variable in this
analysis?
c. How much of the variance does the month explain?
11. If the people in each month of the Exercise 10 data were
different, it would have
been a one-way ANOVA.
a. Would the result have been significant?
b. Because total variance (SStot) is the same in either Exercise
10 or 11, and the SScol
(Exercise 10) is the same as SSbet (Exercise 11), why are the F
values different?
Analyzing the Research
Review the article abstracts provided below. You can then
access the full articles via your
university’s online library portal to answer the critical thinking
questions. Answers can be
found in the answers appendix.
Using Repeated Measures ANOVA for a Stress Management
Study
Elo, A., Ervasti, J., Kuosma, E., & Mattila, P. (2008). Evaluation of an organizational stress management program in a municipal public works organization. Journal of Occupational Health Psychology, 13(1), 10–23.
Article Abstract
The aim of this study was to investigate the effects of employee participation in an organizational stress management program consisting of several interventions aiming to improve psychosocial work environment and well-being. Pre- and postintervention questionnaires were used to measure the outcomes with a 2-year interval. This article describes the background of the program, results of previously published effect studies, and a qualitative evaluation of the program. The authors also tested the effects of level of participation in all interventions among the employees of the service production units by 2 (time) × 3 (group) repeated-measures ANOVAs (n = 625). "Active participation" (more than 5.5 days) had a positive effect on feedback from supervisor and flow of information. Work climate remained on a permanent level while it decreased in the categories of moderate
and nonparticipation. The level of participation did not improve
individual well-being or
other aspects of psychosocial work environment as postulated
by the work stress models.
The qualitative evaluation and practical conclusions drawn by the management of the organization provided a positive impression of the impact of the program.
Critical Thinking Questions
1. What is the independent variable for which subjects are being tested, under all treatment levels?
2. Explain the importance of power in relation to this within-group design.
3. What is the disadvantage of testing a subject group’s pre- and postparticipation program intervention?
4. Does this study need to worry about sphericity when conducting the repeated-measures ANOVA? Why or why not?
Using ANOVA for a Personality Disorder Scales Study

Wise, E. A. (1995). Personality disorder correspondence among the MMPI, MBHI, and MCMI. Journal of Clinical Psychology, 51(6), 790–798.
Article Abstract
MMPI, MBHI, and MCMI personality disorder scales were analyzed for convergent and discriminant validity. Friedman’s ANOVA indicated that there were no significant differences among the sample’s averaged scale scores. Further analyses of the data, however, demonstrated that the Millon instruments classified significantly more of the sample as personality disordered when compared to Morey’s MMPI personality disorder scales. In addition, codetype correspondence among the three instruments was only 4 to 6%. When the instruments were analyzed in a pair-wise fashion, codetype correspondence increased to approximately 10 to 20%. These data indicate that these personality disorder scales do not demonstrate construct equivalence, particularly at the level of the individual profile.
Critical Thinking Questions
1. Why did this study run a Friedman’s Nonparametric ANOVA?
2. The Friedman’s Nonparametric ANOVA showed no significant differences among the tests by scale means. What was the significance level for this to be true?
3. What is reported in Friedman’s Nonparametric ANOVA? Please label what each piece is from the Friedman output when comparing MBHI and MCMI compared to MMPI.
4. Suppose the Friedman’s ANOVA was significant. Would we run a post hoc test? What type of post hoc test?
Chapter 6
Analysis of Variance (ANOVA)
Learning Objectives
After reading this chapter, you will be able to . . .

1. explain why it is a mistake to analyze the differences between more than two groups with multiple t-tests.
2. relate sum of squares to other measures of data variability.
3. compare and contrast t-test with ANOVA.
4. demonstrate how to determine which group is significant in an ANOVA with more than two groups.
5. explain the use of eta-squared in ANOVA.
6. present statistics based on ANOVA results in APA format.
7. interpret results and draw conclusions of ANOVA.
8. discuss the nonparametric Kruskal-Wallis H-test compared to the ANOVA.
CHAPTER 6 Section 6.1 One-Way Analysis of Variance
Ronald A. Fisher was present at the creation of modern statistical analysis. During the early part of the 20th century, Fisher worked at an agricultural research station in rural southern England. In his work analyzing the effect of pesticides and fertilizers on crop yields, he was stymied by the limitations in Gosset’s independent t-test, which allowed him to compare only one pair of samples at a time. In the effort to develop a more comprehensive approach, Fisher created analysis of variance (ANOVA).

Like Gosset, he felt that his work was important enough to publish, and like Gosset in his effort to publish the t-test, Fisher had opposition. In Fisher’s case, the opposition came from a fellow statistician, Karl Pearson. This is the same man who created the first department of statistical analysis at University College, London. In Chapters 9 and 11 you will study some of Pearson’s work with correlations as well as Spearman’s rho (ρ) and chi-square (χ²), which are used in the analysis of categorical (nominal and ordinal) data. Pearson also founded what is probably the most prominent journal for statisticians, Biometrika. Pearson was an advocate of making one comparison at a time and of using the largest groups possible to make those comparisons.
When Fisher submitted his work to Pearson’s journal with procedures suggesting that samples can be small and many comparisons can be made in the same analysis, Pearson rejected the manuscript. So began a long and increasingly acrimonious relationship between two men who would become giants in the field of statistical analysis and end up in the same department at University College. Interestingly, Gosset also gravitated to the department and managed to get along with both of them.

Fisher’s contributions affect more than this chapter. Besides the development of the ANOVA, the concept of statistical significance is his, as is the hypothesis testing discussed in Chapter 5. Note that although significance testing is a ubiquitous phenomenon, it is not always accepted by other statisticians. One such adversary was William [Bill] Kruskal, who derived the nonparametric version of the ANOVA, the Kruskal-Wallis H-test, which is discussed in this chapter. Despite these philosophical and statistical differences, R. A. Fisher made an enormous contribution to the field of quantitative analysis, as did his nemesis Karl Pearson, with additional statistical contributions by William Sealy Gosset and Bill Kruskal.
6.1 One-Way Analysis of Variance

In any experiment, scores and measurements vary for many reasons. If a researcher is interested in whether children will emulate the videotaped behavior of adults whom they have watched, any differences in the children’s behavior from before they see the adults to after are attributed primarily to the adults’ behaviors. But even if all of the children watch with equal attentiveness, it is likely there will be differences in their behaviors
after the video. Some of those differences might stem from age differences among the children. Perhaps the amount of exposure children otherwise have to television will prompt differences in their behavior. Probably differences in their background experiences will also affect the way they behave.

In an analysis of how behavior changes as a result of watching the video, the independent variable (IV) is whether or not the children have seen the video. Changes in their behavior, the dependent variable (DV), reflect the effect of the IV, but they also reflect all the other factors that prompt the children to behave differently. An IV is also referred to as a factor, particularly in procedures that involve more than one IV. Behavior changes that are not related to the IV reflect the presence of error variance, attributable to other factors known as confounding variables.
When researchers work with human subjects, some level of error variance is inescapable. Even under tightly controlled conditions where all members of a sample receive exactly the same treatment, the subjects are unlikely to respond the same way. There are just too many confounding variables that also affect their behavior. Fisher’s approach was to calculate the total variability in a problem and then analyze it, thus the name analysis of variance.

Any number of IVs can be included in an ANOVA. Here, we are interested primarily in ANOVA in its simplest form, a procedure called one-way ANOVA. The “one” in one-way ANOVA indicates that there is just one IV in this model. In that regard, one-way ANOVA is similar to the independent-samples t-test discussed in Chapter 5. Both tests have one IV and one DV. The difference is that the independent t-test allows for an IV with just two groups, but the IV in ANOVA can have any number of groups, generally more than two. In other words, a one-way ANOVA with just two groups is the same as an independent-samples t-test, where the statistic calculated in ANOVA, F, is equal to t²; this is addressed and illustrated in Section 6.5.
The ANOVA Advantage

The ANOVA and the t-test both answer the same question: Are there significant differences between groups? So why bother with another test when we have the t-test? Suppose someone has developed a group therapy program for people with anger management problems and the question is, are there significant differences in the behavior of clients who spend (a) 8, (b) 16, and (c) 24 hours in therapy over a period of weeks? Why not answer the question by performing three t-tests as follows?

1. Compare the 8-hour group to the 16-hour group.
2. Compare the 16-hour group to the 24-hour group.
3. Compare the 8-hour group to the 24-hour group.
Try It! A: What does the “one” in one-way ANOVA refer to?
suk85842_06_c06.indd 185 10/23/13 1:40 PM
CHAPTER 6Section 6.1 One-Way Analysis of Variance
The Problem of Multiple Comparisons

These three tests represent all possible comparisons, but there are two problems with this approach. First, making all possible comparisons is a good deal more manageable if there are three groups than if there are, say, five groups. If there were five groups, labeled a through e, note the number of comparisons needed to cover all possible comparisons:
1. a to b
2. a to c
3. a to d
4. a to e
5. b to c
6. b to d
7. b to e
8. c to d
9. c to e
10. d to e
All possible comparisons among five groups thus involve 10 tests, as seen above; with only three groups, just 3 tests cover all the combinations.
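The count of pairwise tests grows as k(k − 1)/2, the number of ways to choose 2 groups from k. A quick Python sketch (the function name is ours, not from the text):

```python
from math import comb

def pairwise_comparisons(k):
    """Number of two-group t-tests needed to cover every
    pair among k groups: C(k, 2) = k(k - 1) / 2."""
    return comb(k, 2)

# Three groups need 3 comparisons; five groups need the 10 listed above.
print(pairwise_comparisons(3))  # 3
print(pairwise_comparisons(5))  # 10
```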
Family-Wise Error
The other problem is an issue of inflated error in hypothesis testing when doing multiple tests, known as family-wise error. Recall that the potential for type I error (α) is determined by the level at which the test is conducted. At α = .05, any significant finding will result in a type I error an average of 5% of the time. However, that level of error assumes that each test is conducted with new data; repeated testing with the same data increases the family-wise error rate (FWER). Specifically, if statistical testing is done repeatedly with the same data, the potential for type I error does not remain fixed at .05 (or whatever the level of the testing), but grows. In fact, if 10 tests are conducted in succession with the same data, as with the groups labeled a, b, c, d, and e mentioned earlier, and each finding is significant, by the time the 10th test is completed, the potential for alpha error is FWER = .40, or a 40% error probability, as the following procedure illustrates:
Pα = 1 − (1 − pα)^n

Where
Pα = the probability of alpha error overall
pα = the probability of alpha error for the initial significant finding
n = the number of tests conducted where the result was significant

Pα = 1 − (1 − .05)^10
   = 1 − .599
FWER = .401

The probability of a type I error at this point is 4 in 10, or 40%!
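The family-wise error computation is easy to check numerically. A minimal Python sketch (the function name is ours):

```python
def family_wise_error(alpha, n_tests):
    """FWER = 1 - (1 - alpha)^n: the probability of at least one
    type I error across n successive tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests

# Ten successive significant tests at alpha = .05, as in the example.
print(round(family_wise_error(0.05, 10), 3))  # 0.401
```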
The business of raising the (1 − pα) difference to the 10th power (or however many comparisons there are) is not only tedious, but the more important problem is that the probability of a type I error does not remain fixed when there are successive significant results with the same data. Therefore, using multiple t-tests is never a good option.

In the end, running one test in an overall ANOVA will control for inflated FWER. An ANOVA is therefore termed an omnibus test, as it will test the overall significance of the research model based on the differences between sample means. It will not tell you which two means are significantly different, which is why follow-up post hoc comparisons are executed. These concepts will be discussed in further detail throughout the chapter.
The Variance in Analysis of Variance (ANOVA)

To analyze variance, Fisher began by calculating total variability from all sources. He recognized that when scores vary in a research study, they do so for two reasons. They vary because the independent variable (the “treatment”) has had an effect, and they vary because of factors beyond the control of the researcher, producing the error variance referred to earlier.

The test statistic in ANOVA is the F ratio (named for Fisher), which is treatment variance (variance that can be explained by the IV on the DV) divided by error variance (variance that cannot be explained, due to confounding variables on the DV). When F is large, it indicates that the difference between at least two of the groups in the analysis is not random and that there are significant differences between at least two group means. When the F ratio is small (close to a value of 1), it indicates that the IV has not had enough impact to overcome error variability, and the differences between groups are not significant. We will return to the F ratio when we discuss Formula 6.4.
Variance Between and Within Groups

If three groups of the same size are all selected from one population, they could be represented by three distributions, as shown in Figure 6.1. They do not have exactly the same mean, but that is because even when they are selected from the same population, samples are rarely identical. Those initial differences between sample means indicate some degree of sampling error.

Figure 6.1: Three groups drawn from the same population
The reason that each of the three distributions has width is that there are differences within each of the groups. Even if the sample means were the same, individuals selected to the same sample will rarely manifest precisely the same level of whatever is measured. If a population is identified (for example, a population of the academically gifted) and a sample is drawn from that population, the individuals in the sample will not all have the same level of ability. Because they are all members of the population of the academically gifted, they will probably all be higher than the norm for academic ability, but there will still be differences in the subjects’ academic ability within the sample.

These differences within are sources of error variance.
The treatment effect is indicated in how the IV affects the way the DV is manifested. For example, three groups of subjects are administered different levels of a mild stimulant (the IV) to see the effect on level of attentiveness. The issue in ANOVA is whether the IV, the treatment, creates enough additional between-groups variability to exceed any error variance. Ultimately, the question is whether, as a result of the treatment, the samples still represent populations with the same mean, or whether, as is suggested by the distributions in Figure 6.2, they may represent populations with different means.
Figure 6.2: Three groups after the treatment
The within-groups variability in these three distributions is the same as it was in the distributions in Figure 6.1. It is the between-groups variability that has changed in Figure 6.2. More particularly, it is the difference between the group means that has changed. Although there was some between-groups variability before the treatment, it was comparatively minor and probably reflected sampling variability. After the treatment, the differences between means are much greater. What F indicates is whether group differences are great enough to be statistically significant, that is, not due to chance.
The Statistical Hypotheses in One-Way ANOVA

The hypotheses are very much like they were for the independent t-test, except that they accommodate more groups. For the t-test, the null hypothesis is written H0: μ1 = μ2. It indicates that the two samples involved were drawn from populations with the same means. For a one-way ANOVA with three groups, the null hypothesis has this form:

H0: μ1 = μ2 = μ3
Try It! B: If a psychologist is interested in the impact that 1 hour, 5 hours, or 10 hours of therapy have on client behavior, how are behavior differences related to gender explained?
It indicates that the three samples were drawn from populations with the same means. Things have to change for the alternate hypothesis, however, because with three groups, there is not just one possible alternative. Note that each of the following is possible:

a. Ha: μ1 ≠ μ2 = μ3
   Sample 1 represents a population with a mean value different from the mean of the population represented by Samples 2 and 3.
b. Ha: μ1 = μ2 ≠ μ3
   Samples 1 and 2 represent a population with a mean value different from the mean of the population represented by Sample 3.
c. Ha: μ1 = μ3 ≠ μ2
   Samples 1 and 3 represent a population with a mean value different from the population represented by Sample 2.
d. Ha: μ1 ≠ μ2 ≠ μ3
   All three samples represent populations with different means.

Because the several possible alternative outcomes multiply rapidly when the number of groups increases, a more general alternate hypothesis is given. Either all the groups involved come from populations with the same means, or at least one of them does not. So the form of the alternate hypothesis for an ANOVA with any number of groups is simply

Ha: At least one of the means is different from the other means.
Also remember that alternative hypotheses are either nondirectional, in that there is no prediction of which sample mean will be higher than the others:

Nondirectional alternative hypothesis: Ha: μ1 ≠ μ2 ≠ μ3

or directional, in that there is a prediction of which sample mean will be higher than the other means. As seen below for the directional alternative hypothesis, there is a prediction that μ3 will be higher than μ2, which is higher than μ1.

Directional alternative hypothesis: Ha: μ1 < μ2 < μ3

As a researcher, it is important to consider the value of prediction in terms of a one-tailed test versus no prediction in a two-tailed test, as discussed in Chapter 5.
Measuring Data Variability in the One-Way ANOVA

We have discussed several different measures of data variability to this point, including the standard deviation (s), the variance (s²), the standard error of the mean (SEM), the standard error of the difference (SEd), and the range. For ANOVA, Fisher added one more, the sum of squares (SS). The sum of squares is the sum of the squared differences between scores and one of several mean values.

Try It! C: How many t-tests would it take to make all possible comparisons in a procedure with six groups?

In ANOVA,
• One sum-of-squares value involves the differences between individual scores and the mean of all the scores in all the groups (the grand mean). This is called the sum of squares total (SStot) because it measures all variability from all sources.
• A second sum-of-squares value indicates the difference between the means of the individual groups and the grand mean. This is the sum of squares between (SSbet). It measures the effect of the IV, the treatment effect, as well as any differences that existed between the groups before the study began.
• A third sum-of-squares value measures the difference between scores in the samples and the means of their sample. These sum of squares within (SSwith) values reflect the differences in the way subjects respond to the same stimulus. Because this value is entirely error variance, it is also called the sum of squares error (SSerr) or the sum of squares residual (SSres).
All Variability From All Sources: The Sum of Squares Total (SStot)

There are multiple formulas for SStot. They all provide the same answer, but some make more sense to look at than others. Formula 6.1 makes it clear that at the heart of SStot is the difference between each individual score (x) and the mean of all scores, or the grand mean, for which the notation is MG.

SStot = Σ(x − MG)²    Formula 6.1

Where
x = each score in all groups
MG = the mean of all data from all groups, the grand mean

To calculate SStot, follow these steps:
1. Sum all scores from all groups and divide by the number of scores to determine the grand mean, MG.
2. Subtract MG from each score (x) in each group, and then square the difference: (x − MG)²
3. Sum all the squared differences: Σ(x − MG)²
The Treatment Effect: The Sum of Squares Between (SSbet)

The between-groups variance, the sum of squares between (SSbet), contains the variability due to the independent variable, the treatment effect. It will also contain any initial differences between the groups, which of course is error variance. For three groups labeled a, b, and c, the formula is

SSbet = (Ma − MG)²na + (Mb − MG)²nb + (Mc − MG)²nc    Formula 6.2
Where
Ma = the mean of the scores in the first group (a)
MG = the same grand mean used in SStot
na = the number of scores in the first group (a)

To calculate SSbet, follow these steps:
1. Determine the mean for each group: Ma, Mb, and so on.
2. Subtract MG from each sample mean and square the difference: (Ma − MG)²
3. Multiply the squared difference by the number in the group: (Ma − MG)²na
4. Repeat for each group.
5. Sum (Σ) the results across groups.

The value that results from Formula 6.2 represents the differences between the group means and the mean of all the data.
The Error Term: The Sum of Squares Within (SSwith)

When a group receives the same treatment but individuals within the group respond differently, their differences constitute error: unexplained variability. Maybe subjects’ age differences are the cause, or perhaps the circumstances of their family lives, but for some reason not analyzed in the particular study, subjects in the same group often respond differently to the same stimulus. The amount of this unexplained variance within the groups is calculated with the SSwith, for which we have Formula 6.3:

SSwith = Σ(xa − Ma)² + Σ(xb − Mb)² + Σ(xc − Mc)²    Formula 6.3

Where
SSwith = the sum of squares within
xa = each of the individual scores in Group a
Ma = the score mean in Group a

To calculate SSwith, follow these steps:
1. Take the mean for each of the groups; these are available from calculating the SSbet earlier.
2. From each score in each group,
   a. subtract the mean of the group,
   b. square the difference, and
   c. sum the squared differences within each group.
3. Repeat this for each group.
4. Sum the results across the groups.
Try It! D: When will the sum-of-squares values be negative?
The SSwith (or the SSerr) measures the degree to which scores vary due to factors not controlled in the study, fluctuations that constitute error variance.

Because the SStot consists of the SSbet and the SSwith, once the SStot and the SSbet are known, the SSwith can be determined by subtraction:

SStot − SSbet = SSwith

However, there are two reasons not to determine the SSwith by simple subtraction. First, if there is an error in the SSbet, it is only perpetuated with the subtraction. Second, calculating the value with Formula 6.3 helps clarify that what is being determined is a measure of how much variation in scores there is within each group. For the few problems done entirely by hand, we will take the “high road” and use the conceptual formula.

Conceptual formulas (6.1, 6.2, and 6.3) clarify the logic involved, but in the case of analysis of variance, they also require a good deal of tiresome subtracting and then squaring of numbers. To minimize the tedium, the data sets here are all relatively small. When larger studies are done by hand, people often shift to the “calculation formulas” for simpler arithmetic, but there is a sacrifice of clarity. Happily, you will seldom find yourself doing manual ANOVA calculations, and after a few simple longhand problems, this chapter will explain how you can utilize Excel or SPSS for help with the larger data sets.
Calculating the Sums of Squares

A researcher is interested in the level of social isolation people feel in small towns (a), suburbs (b), and cities (c). Participants randomly selected from each of those three settings take the Assessment List of Nonnormal Environments (ALONE), for which the following scores are available:

a. 3, 4, 4, 3
b. 6, 6, 7, 8
c. 6, 7, 7, 9

We know we are going to need the mean of all the data (MG) as well as the mean for each group (Ma, Mb, Mc), so we will start there. Verify that

Σx = 70 and N = 12, so that MG = 5.833.

For the small-town subjects,

Σxa = 14 and na = 4, so Ma = 3.50.

For the suburban subjects,

Σxb = 27 and nb = 4, so Mb = 6.750.
For the city subjects,

Σxc = 29 and nc = 4, so Mc = 7.250.
For the sum of squares total, the formula is

SStot = Σ(x − MG)²
SStot = 41.67

The calculations are in Table 6.1.
Table 6.1: Calculating the sum of squares total (SStot)
SStot = Σ(x − MG)², MG = 5.833

For the Town Data
x − M                  (x − M)²
3 − 5.833 = −2.833     8.026
4 − 5.833 = −1.833     3.360
4 − 5.833 = −1.833     3.360
3 − 5.833 = −2.833     8.026

For the Suburb Data
x − M                  (x − M)²
6 − 5.833 = 0.167      0.028
6 − 5.833 = 0.167      0.028
7 − 5.833 = 1.167      1.362
8 − 5.833 = 2.167      4.696

For the City Data
x − M                  (x − M)²
6 − 5.833 = 0.167      0.028
7 − 5.833 = 1.167      1.362
7 − 5.833 = 1.167      1.362
9 − 5.833 = 3.167      10.030

SStot = 41.668
For the sum of squares between, the formula is

SSbet = (Ma − MG)²na + (Mb − MG)²nb + (Mc − MG)²nc

The SSbet involves three group means rather than the 12 individual scores required for SStot. The SSbet is as follows:

SSbet = (3.5 − 5.833)²(4) + (6.75 − 5.833)²(4) + (7.25 − 5.833)²(4)
      = 21.772 + 3.364 + 8.032
      = 33.17
The SSwith indicates the error variance by determining the differences between individual scores in a group and their means. The formula is

SSwith = Σ(xa − Ma)² + Σ(xb − Mb)² + Σ(xc − Mc)²
SSwith = 8.50

The calculations are in Table 6.2.
Because we calculated the SSwith directly instead of determining it by subtraction, we can now check for accuracy by adding its value to the SSbet. If the calculations are correct, SSwith + SSbet = SStot. For the isolation example, we have

8.504 + 33.168 = 41.672

In the initial calculation, SStot = 41.668. The difference of .004 is round-off difference and is unimportant.
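The three sums of squares for the ALONE data can be reproduced from the conceptual formulas (6.1–6.3). A Python sketch (the function names are ours); carrying full precision avoids the small round-off noted above:

```python
def grand_mean(groups):
    """Mean of all scores pooled across groups (MG)."""
    scores = [x for g in groups for x in g]
    return sum(scores) / len(scores)

def ss_total(groups):
    """Formula 6.1: squared deviations of every score from the grand mean."""
    mg = grand_mean(groups)
    return sum((x - mg) ** 2 for g in groups for x in g)

def ss_between(groups):
    """Formula 6.2: n-weighted squared deviations of group means from MG."""
    mg = grand_mean(groups)
    return sum(len(g) * (sum(g) / len(g) - mg) ** 2 for g in groups)

def ss_within(groups):
    """Formula 6.3: squared deviations of scores from their own group mean."""
    return sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

town, suburb, city = [3, 4, 4, 3], [6, 6, 7, 8], [6, 7, 7, 9]
data = [town, suburb, city]
print(round(ss_total(data), 2))    # 41.67
print(round(ss_between(data), 2))  # 33.17
print(round(ss_within(data), 2))   # 8.5
```

At full precision the accuracy check is exact: ss_between + ss_within equals ss_total.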
Although they were not called sums of squares, we have been calculating an equivalent statistic since Chapter 1. At the heart of the standard deviation calculation are those repetitive x − M differences for each score in the sample. The difference values are then squared and summed, much as they are for calculating SSwith and SStot. Further, the denominator in the standard deviation calculation is n − 1, which should look suspiciously like some of the degrees of freedom values we will discuss in the next section.
Interpreting the Sums of Squares

Try It! E: What will SStot − SSwith yield?

The different sums-of-squares values are measures of data variability, which makes them like the standard deviation, variance measures, the standard error of the mean, and so on. But there is an important difference between SS and the other statistics. In addition to data variability, the magnitude of the SS value reflects the number of scores included. Because sums of squares are in fact the sum of squared values, the more values there are, the larger
the value becomes. With statistics like the standard deviation, adding more values near the mean of the distribution actually shrinks its value. But this cannot happen with the sum of squares. Additional scores, whatever their value, will almost always increase the sum of squares.
Table 6.2: Calculating the sum of squares within (SSwith)
SSwith = Σ(xa − Ma)² + Σ(xb − Mb)² + Σ(xc − Mc)²

a: 3, 4, 4, 3
b: 6, 6, 7, 8
c: 6, 7, 7, 9
Ma = 3.50, Mb = 6.750, Mc = 7.250

For the Town Data
x − M                  (x − M)²
3 − 3.50 = −0.50       0.250
4 − 3.50 = 0.50        0.250
4 − 3.50 = 0.50        0.250
3 − 3.50 = −0.50       0.250

For the Suburb Data
x − M                  (x − M)²
6 − 6.750 = −0.750     0.563
6 − 6.750 = −0.750     0.563
7 − 6.750 = 0.250      0.063
8 − 6.750 = 1.250      1.563

For the City Data
x − M                  (x − M)²
6 − 7.250 = −1.250     1.563
7 − 7.250 = −0.250     0.063
7 − 7.250 = −0.250     0.063
9 − 7.250 = 1.750      3.063

SSwith = 8.504
This characteristic makes the sum of squares difficult to interpret. What constitutes much or little variability depends not just on how much difference there is between the scores and the mean to which they are compared but also on how many scores there are. Fisher turned the sum-of-squares values into a “mean measure of variability” by dividing each sum-of-squares value by its degrees of freedom. The SS ÷ df operation creates what is called the mean square (MS).

In the one-way ANOVA, there is an MS value associated with both the SSbet and the SSwith (SSerr). There is no mean square total given in the table, but if it were to be calculated, it would be the total variance (SSbet + SSwith) divided by the degrees of freedom for the entire data set treated as a single sample (N − 1). Dividing the SStot by its degrees of freedom (N − 1) would provide a mean level of overall variability, but that would not help answer questions about the ratio of between-groups variance to within-groups variance.
The degrees of freedom for each of the sums of squares calculated for the one-way ANOVA are as follows:

• Degrees of freedom total (dftot) = N − 1, where N is the total number of scores
• Degrees of freedom between (dfbet) = k − 1, where k is the number of groups
  SSbet ÷ dfbet = MSbet
• Degrees of freedom within (dfwith) = N − k
  SSwith ÷ dfwith = MSwith

Although there is no MStot, we need the sum of squares total (SStot) and the degrees of freedom total (dftot) because they provide an accuracy check:

a. The sums of squares between and within should equal the total sum of squares:
   SSbet + SSwith = SStot
b. The sum of degrees of freedom between and within should equal degrees of freedom total:
   dfbet + dfwith = dftot
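The degrees-of-freedom bookkeeping can be sketched for the 12-score, three-group social isolation example:

```python
# Degrees of freedom for a one-way ANOVA with N total scores and k groups.
N, k = 12, 3

df_tot = N - 1   # 11
df_bet = k - 1   # 2
df_with = N - k  # 9

# Accuracy check: between + within must recover total.
assert df_bet + df_with == df_tot
print(df_tot, df_bet, df_with)  # 11 2 9
```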
Remembering these relationships can help reveal errors. In other words, the concept of error is unexplained or unsystematic variance within groups (SSwith), variance not caused by the experimental manipulation, as opposed to explained or systematic variance between groups (SSbet), which is due to the experimental manipulation.
The F Ratio

The mean squares for between and within are the components of F, and the F ratio is the test statistic in ANOVA. As noted earlier in this chapter, F is a ratio:

F = MSbet / MSwith    Formula 6.4
The issue is whether the MSbet, which contains the treatment effect and some error, is substantially greater than the MSwith, which contains only error. This is illustrated in Figure 6.3 by comparing the distance from the mean of the first distribution to the mean of the second distribution, the A variance, to the B and C variances, which indicate the differences within groups.

If the MSbet/MSwith ratio is large (it must be substantially greater than 1), the difference between groups is likely to be significant. When that ratio is small (close to 1), F is likely to be nonsignificant. How large F must be to be significant depends on the degrees of freedom for the problem, just as it did for the t-tests.

Figure 6.3: The F-ratio: comparing variance between groups (A) to variance within groups (B + C)
The ANOVA Table

With the sums of squares and the degrees of freedom for the different values in hand, the ANOVA results are presented in a table often referred to as a source table, indicating the sources of variability. It lists

• the source of the variance,
• the sums-of-squares values,
• the degrees of freedom:
  for total, dftot = N − 1 (because N = 12, dftot = 11),
  for between, dfbet = k − 1 (because k, the number of groups, = 3, dfbet = 2),
  for within, dfwith = N − k (because N = 12 and k = 3, dfwith = 9),
• the mean square values, which are SS/df, and
• the F value, which is MSbet/MSwith.
For the social isolation problem, the ANOVA table is
Source SS df MS F
Between 33.17 2 16.58 17.55
Within 8.50 9 .95
Total 41.67 11
The table makes it easy to check some of the results for accuracy. Check that
SSbet + SSwith = SStot
Also verify that
dfbet + dfwith = dftot
In the course of checking results, note that sums-of-squares values can never be negative. Because the SS values are literally sums of squares, a negative number indicates a calculation error somewhere; there is no such thing as negative variability (Chapter 1).
The smallest a sum-of-squares value can be is 0, and this can
happen only if all scores in
the sum-of-squares calculation have the same value.
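The checks above are easy to script. This is a minimal sketch, not part of the original text: it takes the SS and df values from the social isolation source table and rebuilds the MS and F entries, verifying the additivity of the sums of squares and the degrees of freedom along the way.

```python
# Rebuild the ANOVA source table from the values reported for the
# social isolation problem: SSbet = 33.17, SSwith = 8.50, N = 12, k = 3.
ss_bet, ss_with = 33.17, 8.50
N, k = 12, 3

df_bet = k - 1         # between df = k - 1 = 2
df_with = N - k        # within df = N - k = 9
df_tot = N - 1         # total df = N - 1 = 11

# Additivity checks: SSbet + SSwith = SStot and dfbet + dfwith = dftot
ss_tot = ss_bet + ss_with
assert df_bet + df_with == df_tot

ms_bet = ss_bet / df_bet      # mean square between = SS/df
ms_with = ss_with / df_with   # mean square within = SS/df
F = ms_bet / ms_with          # the F ratio

print(f"SStot = {ss_tot:.2f}, MSbet = {ms_bet:.2f}, "
      f"MSwith = {ms_with:.2f}, F = {F:.2f}")
```

The computed F should closely reproduce the value in the source table (small differences come only from rounding MSwith to .95 in the table).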
Understanding F
The larger F is, the more likely it is to be statistically significant, but how large is large enough? In the preceding ANOVA table, F = 17.55, which seems like a comparatively large value.
• Because F is determined by dividing MSbet by MSwith, the value of F indicates how many times greater MSbet is than MSwith.
• Here MSbet is 17.55 times greater than MSwith, which seems promising, but to be sure, it must be compared to a value from the critical values of F (Table 6.3, which is repeated in the Appendix as Table C).
As with the t-test, as degrees of freedom increase, the critical values decline. The difference is that with F two df values are involved: one for the MSbet and the other for the MSwith.
• In Table 6.3 (also Table C in the Appendix), the critical value is identified by moving across the top of the table to the dfbet (the df numerator) and then moving down that column to the dfwith (the df denominator). According to the social isolation test ANOVA table above, these are
the dfbet = 2 and
the dfwith = 9.
• The intersection of the 2 at the top and the 9 along the left side of the table leads to two critical values, one in regular type, which is for α = .05 and is the default, and one in bold type, which is the value for testing at α = .01.
• The critical value when testing at p = .05 is 4.26.
• The critical value indicates that any ANOVA test with 2 and 9 df that has an F value equal to or greater than 4.26 is statistically significant.
The social isolation differences between the three groups are probably not due to sampling variability. The statistical decision is to reject H0. The relatively large value of F (more than four times the critical value) indicates that much more of the difference in social isolation is probably related to where respondents live than to error variance.
Table 6.3: The critical values of F
In each cell, the first value is the critical value for p = .05; the second (bold in the original) is for p = .01.

df den\df num   1            2           3           4           5           6           7           8           9           10
 2   18.51/98.49  19.00/99.01  19.16/99.17  19.25/99.25  19.30/99.30  19.33/99.33  19.35/99.36  19.37/99.38  19.38/99.39  19.40/99.40
 3   10.13/34.12  9.55/30.82   9.28/29.46   9.12/28.71   9.01/28.24   8.94/27.91   8.89/27.67   8.85/27.49   8.81/27.34   8.79/27.23
 4   7.71/21.20   6.94/18.00   6.59/16.69   6.39/15.98   6.26/15.52   6.16/15.21   6.09/14.98   6.04/14.80   6.00/14.66   5.96/14.55
 5   6.61/16.26   5.79/13.27   5.41/12.06   5.19/11.39   5.05/10.97   4.95/10.67   4.88/10.46   4.82/10.29   4.77/10.16   4.74/10.05
 6   5.99/13.75   5.14/10.92   4.76/9.78    4.53/9.15    4.39/8.75    4.28/8.47    4.21/8.26    4.15/8.10    4.10/7.98    4.06/7.87
 7   5.59/12.25   4.74/9.55    4.35/8.45    4.12/7.85    3.97/7.46    3.87/7.19    3.79/6.99    3.73/6.84    3.68/6.72    3.64/6.62
 8   5.32/11.26   4.46/8.65    4.07/7.59    3.84/7.01    3.69/6.63    3.58/6.37    3.50/6.18    3.44/6.03    3.39/5.91    3.35/5.81
 9   5.12/10.56   4.26/8.02    3.86/6.99    3.63/6.42    3.48/6.06    3.37/5.80    3.29/5.61    3.23/5.47    3.18/5.35    3.14/5.26
10   4.96/10.04   4.10/7.56    3.71/6.55    3.48/5.99    3.33/5.64    3.22/5.39    3.14/5.20    3.07/5.06    3.02/4.94    2.98/4.85
11   4.84/9.65    3.98/7.21    3.59/6.22    3.36/5.67    3.20/5.32    3.09/5.07    3.01/4.89    2.95/4.74    2.90/4.63    2.85/4.54
12   4.75/9.33    3.89/6.93    3.49/5.95    3.26/5.41    3.11/5.06    3.00/4.82    2.91/4.64    2.85/4.50    2.80/4.39    2.75/4.30
13   4.67/9.07    3.81/6.70    3.41/5.74    3.18/5.21    3.03/4.86    2.92/4.62    2.83/4.44    2.77/4.30    2.71/4.19    2.67/4.10
Try It! If the F in an ANOVA is 4.0 and the MSwith = 2.0, what will be the value of MSbet?
(continued)
Table 6.3: The critical values of F (continued)
In each cell, the first value is the critical value for p = .05; the second (bold in the original) is for p = .01.

df den\df num   1           2          3          4          5          6          7          8          9          10
14   4.60/8.86   3.74/6.51   3.34/5.56   3.11/5.04   2.96/4.69   2.85/4.46   2.76/4.28   2.70/4.14   2.65/4.03   2.60/3.94
15   4.54/8.68   3.68/6.36   3.29/5.42   3.06/4.89   2.90/4.56   2.79/4.32   2.71/4.14   2.64/4.00   2.59/3.89   2.54/3.80
16   4.49/8.53   3.63/6.23   3.24/5.29   3.01/4.77   2.85/4.44   2.74/4.20   2.66/4.03   2.59/3.89   2.54/3.78   2.49/3.69
17   4.45/8.40   3.59/6.11   3.20/5.19   2.96/4.67   2.81/4.34   2.70/4.10   2.61/3.93   2.55/3.79   2.49/3.68   2.45/3.59
18   4.41/8.29   3.55/6.01   3.16/5.09   2.93/4.58   2.77/4.25   2.66/4.01   2.58/3.84   2.51/3.71   2.46/3.60   2.41/3.51
19   4.38/8.18   3.52/5.93   3.13/5.01   2.90/4.50   2.74/4.17   2.63/3.94   2.54/3.77   2.48/3.63   2.42/3.52   2.38/3.43
20   4.35/8.10   3.49/5.85   3.10/4.94   2.87/4.43   2.71/4.10   2.60/3.87   2.51/3.70   2.45/3.56   2.39/3.46   2.35/3.37
21   4.32/8.02   3.47/5.78   3.07/4.87   2.84/4.37   2.68/4.04   2.57/3.81   2.49/3.64   2.42/3.51   2.37/3.40   2.32/3.31
22   4.30/7.95   3.44/5.72   3.05/4.82   2.82/4.31   2.66/3.99   2.55/3.76   2.46/3.59   2.40/3.45   2.34/3.35   2.30/3.26
23   4.28/7.88   3.42/5.66   3.03/4.76   2.80/4.26   2.64/3.94   2.53/3.71   2.44/3.54   2.37/3.41   2.32/3.30   2.27/3.21
24   4.26/7.82   3.40/5.61   3.01/4.72   2.78/4.22   2.62/3.90   2.51/3.67   2.42/3.50   2.36/3.36   2.30/3.26   2.25/3.17
25   4.24/7.77   3.39/5.57   2.99/4.68   2.76/4.18   2.60/3.85   2.49/3.63   2.40/3.46   2.34/3.32   2.28/3.22   2.24/3.13
26   4.23/7.72   3.37/5.53   2.98/4.64   2.74/4.14   2.59/3.82   2.47/3.59   2.39/3.42   2.32/3.29   2.27/3.18   2.22/3.09
27   4.21/7.68   3.35/5.49   2.96/4.60   2.73/4.11   2.57/3.78   2.46/3.56   2.37/3.39   2.31/3.26   2.25/3.15   2.20/3.06
28   4.20/7.64   3.34/5.45   2.95/4.57   2.71/4.07   2.56/3.75   2.45/3.53   2.36/3.36   2.29/3.23   2.24/3.12   2.19/3.03
29   4.18/7.60   3.33/5.42   2.93/4.54   2.70/4.04   2.55/3.73   2.43/3.50   2.35/3.33   2.28/3.20   2.22/3.09   2.18/3.00
30   4.17/7.56   3.32/5.39   2.92/4.51   2.69/4.02   2.53/3.70   2.42/3.47   2.33/3.30   2.27/3.17   2.21/3.07   2.16/2.98
Source: Richard Lowry. Retrieved from http://vassarstats.net/textbook/apx_d.html
6.2 Identifying the Difference: Post Hoc Tests and Tukey’s HSD
A significant t from an independent t-test allows a simpler interpretation than a significant F from an ANOVA with three or more groups. A significant t indicates that the two groups probably belong to populations
with different means. A
significant F indicates that at least one group is significantly
different from at least one
other group in the study, but unless there are only two groups in
the ANOVA, it is not
clear which group is significantly different from which. If the
null hypothesis is rejected,
there are a number of possible alternatives, as we noted when
we listed all the possible
HA outcomes earlier.
The point of a post hoc test (an “after this” test) conducted
following an ANOVA is to
determine which groups are significantly different from each
other. So when F is significant, a post hoc test is the next step. Statisticians debate whether to run a post hoc test if F is not significant, as there may be instances in which the overall F is nonsignificant yet the post hoc tests detect a significant difference between two groups.
With the ease of running the analysis in Excel or SPSS, researchers may run post hoc tests to determine whether there are significant differences in means between pairs of groups. When specific pairwise differences are of interest from the outset, however, a planned comparison is the more prudent choice. Whether to use a planned comparison or a post hoc test should be determined by the purpose of the study. If the goal is to test the null hypothesis that the means are not significantly different, then a significant omnibus F is appropriate. On the other hand, if the aim is to detect differences between specific means, the omnibus F result is not necessary, and going straight to the comparisons is appropriate, as in a planned comparison between means.
There are many post hoc tests that are used for different
purposes and based on their own
assumptions and calculations (18 of them in SPSS, named after
their respective authors).
Each of them has particular strengths, but one of the more
common in the psychological
disciplines, and also one of the easiest to calculate, is John
Tukey’s HSD test, for “honestly
significant difference.”
Many statisticians use the terms liberal and conservative to describe post hoc tests. A liberal test is one in which there is a greater chance of finding a significant difference between means but a higher chance of a type I error. Fisher's least significant difference (LSD) test is an example of a liberal test. Liberal tests are seldom used, precisely because of this concern about committing a type I error. Conversely, a conservative post hoc test has a lower chance of finding a significant difference between means but also a lower chance of a type I error. One such conservative test is Bonferroni's post hoc. Because of their conservative nature, these post hoc tests are more widely used.
Formula 6.5 produces a value that is the smallest difference between the means of any two samples that can be statistically significant:

HSD = x√(MSwith/n)    (Formula 6.5)
Where
x = a table value indexed to the number of groups (k) in the problem and the degrees of freedom within (dfwith) from the ANOVA table
MSwith = the value from the ANOVA table
n = the number in one group when group sizes are equal.
In order to compute Tukey's HSD, follow these steps:
1. From Table 6.4 locate the value of x by moving across the top of the table to the number of groups/treatments (k = 3), and then down the left side for the within degrees of freedom (dfwith = 9). The intersecting values are 3.95 and 5.43. The smaller of the two is the value when p = .05, as it was in our test. The post hoc test is always conducted at the same probability level as the ANOVA. In this case, it is p = .05.
2. The calculation is 3.95 times the square root of .945 (the MSwith) divided by 4 (n):

3.95√(.945/4) = 1.920
3. This value is the minimum difference between the means of two significantly different samples. The sign of the difference does not matter; it is the absolute value we need.

The means for social isolation in the three groups are the following:
Ma = 3.500 for small town respondents
Mb = 6.750 for suburban respondents
Mc = 7.250 for city respondents

Small towns minus suburbs:
Ma − Mb = 3.50 − 6.75 = −3.25; this difference exceeds 1.92 and is significant.
Small towns minus cities:
Ma − Mc = 3.50 − 7.25 = −3.75; this difference exceeds 1.92 and is significant.
Suburbs minus cities:
Mb − Mc = 6.75 − 7.25 = −0.50; this difference is less than 1.92 and is not significant.
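The three pairwise checks above can be scripted in a few lines. This sketch hardcodes the values from the text (q = 3.95 from Table 6.4 for k = 3 and dfwith = 9, MSwith = .945 from the ANOVA, n = 4 per group) rather than computing them from raw data:

```python
import math

q = 3.95          # Table 6.4 value for k = 3 groups, dfwith = 9, p = .05
ms_with = 0.945   # MSwith from the ANOVA source table
n = 4             # respondents per group

# Tukey's HSD: the smallest mean difference that can be significant
hsd = q * math.sqrt(ms_with / n)   # about 1.92

means = {"small towns": 3.50, "suburbs": 6.75, "cities": 7.25}
pairs = [("small towns", "suburbs"),
         ("small towns", "cities"),
         ("suburbs", "cities")]

for a, b in pairs:
    diff = means[a] - means[b]
    # The sign does not matter; compare the absolute difference to HSD
    verdict = "significant" if abs(diff) >= hsd else "not significant"
    print(f"{a} vs {b}: diff = {diff:+.2f} -> {verdict}")
```

Only the small-town comparisons exceed the HSD, reproducing the conclusions reached by hand.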
When several groups are involved, sometimes it is helpful to
create a table that presents
all the differences between pairs of means. Table 6.5, which is
repeated in the Appendix as
Table D, is the Tukey’s HSD results for the social isolation
problem.
Formula 6.5 is used when group sizes are equal. However, there is an alternate formula for unequal group sizes for the more adventurous:

HSD = x√((MSwith/2)(1/n1 + 1/n2))

with a separate HSD value completed for each pair of means in the problem.
Try It!
Table 6.4: Tukey's HSD critical values: q (alpha, k, df)
In each cell, the first value is the critical value of q for alpha = .05; the second (bold in the original) is for alpha = .01.

df   k=2        k=3        k=4        k=5        k=6        k=7        k=8        k=9        k=10
 5   3.64/5.70  4.60/6.98  5.22/7.80  5.67/8.42  6.03/8.91  6.33/9.32  6.58/9.67  6.80/9.97  6.99/10.24
 6   3.46/5.24  4.34/6.33  4.90/7.03  5.30/7.56  5.63/7.97  5.90/8.32  6.12/8.61  6.32/8.87  6.49/9.10
 7   3.34/4.95  4.16/5.92  4.68/6.54  5.06/7.01  5.36/7.37  5.61/7.68  5.82/7.94  6.00/8.17  6.16/8.37
 8   3.26/4.75  4.04/5.64  4.53/6.20  4.89/6.62  5.17/6.96  5.40/7.24  5.60/7.47  5.77/7.68  5.92/7.86
 9   3.20/4.60  3.95/5.43  4.41/5.96  4.76/6.35  5.02/6.66  5.24/6.91  5.43/7.13  5.59/7.33  5.74/7.49
10   3.15/4.48  3.88/5.27  4.33/5.77  4.65/6.14  4.91/6.43  5.12/6.67  5.30/6.87  5.46/7.05  5.60/7.21
11   3.11/4.39  3.82/5.15  4.26/5.62  4.57/5.97  4.82/6.25  5.03/6.48  5.20/6.67  5.35/6.84  5.49/6.99
12   3.08/4.32  3.77/5.05  4.20/5.50  4.51/5.84  4.75/6.10  4.95/6.32  5.12/6.51  5.27/6.67  5.39/6.81
13   3.06/4.26  3.73/4.96  4.15/5.40  4.45/5.73  4.69/5.98  4.88/6.19  5.05/6.37  5.19/6.53  5.32/6.67
14   3.03/4.21  3.70/4.89  4.11/5.32  4.41/5.63  4.64/5.88  4.83/6.08  4.99/6.26  5.13/6.41  5.25/6.54
15   3.01/4.17  3.67/4.84  4.08/5.25  4.37/5.56  4.59/5.80  4.78/5.99  4.94/6.16  5.08/6.31  5.20/6.44
16   3.00/4.13  3.65/4.79  4.05/5.19  4.33/5.49  4.56/5.72  4.74/5.92  4.90/6.08  5.03/6.22  5.15/6.35
17   2.98/4.10  3.63/4.74  4.02/5.14  4.30/5.43  4.52/5.66  4.70/5.85  4.86/6.01  4.99/6.15  5.11/6.27
18   2.97/4.07  3.61/4.70  4.00/5.09  4.28/5.38  4.49/5.60  4.67/5.79  4.82/5.94  4.96/6.08  5.07/6.20
19   2.96/4.05  3.59/4.67  3.98/5.05  4.25/5.33  4.47/5.55  4.65/5.73  4.79/5.89  4.92/6.02  5.04/6.14
20   2.95/4.02  3.58/4.64  3.96/5.02  4.23/5.29  4.45/5.51  4.62/5.69  4.77/5.84  4.90/5.97  5.01/6.09
24   2.92/3.96  3.53/4.55  3.90/4.91  4.17/5.17  4.37/5.37  4.54/5.54  4.68/5.69  4.81/5.81  4.92/5.92
30   2.89/3.89  3.49/4.45  3.85/4.80  4.10/5.05  4.30/5.24  4.46/5.40  4.60/5.54  4.72/5.65  4.82/5.76
40   2.86/3.82  3.44/4.37  3.79/4.70  4.04/4.93  4.23/5.11  4.39/5.26  4.52/5.39  4.63/5.50  4.73/5.60
Source: Tukey's HSD critical values (n.d.). Retrieved from http://www.stat.duke.edu/courses/Spring98/sta110c/qtable.html
Table 6.5: Presenting Tukey's HSD results in a table

HSD = x√(MSwith/n) = 3.95√(.945/4) = 1.920

Any difference between pairs of means of 1.920 or greater is a statistically significant difference. Significant mean differences are marked with an asterisk (shown in orange in the original).

                          Suburbs (M = 6.750)   Cities (M = 7.250)
Small towns (M = 3.500)   Diff = 3.250*         Diff = 3.750*
Suburbs (M = 6.750)                             Diff = 0.500
The values entered in the cells in Table 6.5 indicate the
differences between each pair of
means in the study. Comparing the mean scores from each of the
three groups indicates
that the respondents from small towns expressed a significantly
lower level of social
isolation than those in either the suburbs or cities. Comparing
the mean scores from the
suburban and city groups indicates that social isolation scores
are higher in the city, but
the difference is not large enough to be statistically significant.
The significant F from the ANOVA indicated that at least one
group had a significantly
different level of social isolation from at least one other group,
but that is all a significant F
can reveal. The result does not indicate which group is
significantly different from which
other group, unless there are only two groups. The post hoc test
indicates which pairs of
groups are significantly different from each other. Table 6.5 is
an example of how to illus-
trate the significant and the nonsignificant differences. One
caveat in using Tukey’s HSD
is that there is an assumption of equality of variances
(homogeneity) between groups
based on Levene’s test. This assumption applies here as well.
Suppose there is a violation of homogeneity. In that instance, an adjusted post hoc test that accounts for inequality of variances (or heterogeneity) will need to be employed. To implement this in SPSS, for instance, there are four options under the Equal Variances Not Assumed heading when conducting a post hoc test for ANOVA. One of these approaches is the Games-Howell post hoc, which is executed by checking that box in the SPSS Post Hoc tests tab for ANOVA.
Apply It!
ANOVA and Product Development
A product development specialist in a major computer company
decides that
it would be a significant improvement to keyboards if they were
designed to
fit the shape of human hands. Instead of being flat, the new
keyboard would
curve like the surface of a football. Before the company
executives are willing to expend the
resources necessary to produce and distribute such a product,
they need to know whether
it will sell and what the most comfortable curvature of the
keyboard would be.
The company produces prototypes for four different keyboards,
labeled Prototype A through
D (see Table 6.6). Prototype A is a standard flat keyboard, and
the others each have varying
amounts of curve. Everything else about the keyboards is the
same, so this is a one-way
ANOVA. Forty different users are randomly assigned to test one of the four keyboards and rate its comfort on a 100-point scale. The results are shown in Table 6.6.
Table 6.6: Prototype A–D data set
Prototype A Prototype B Prototype C Prototype D
49 57 77 65
57 53 82 61
73 69 77 73
68 65 85 81
65 61 93 89
62 73 79 77
61 57 73 81
45 69 89 77
53 73 82 69
61 77 85 77
Next, the test results are analyzed in Excel, which produces the
information in Figure 6.4.
(continued)
Apply It! (continued)
Figure 6.4: Excel results of the comparison of means and ANOVA of prototypes
The null hypothesis is that there is no difference among the four keyboards. From Figure 6.4, we see that the F value is 16.72, which is larger than the critical value of F = 2.87 at α = .05. Therefore the null hypothesis is rejected at p < .05. At least one of the prototypes is significantly different from at least one other prototype.
Because there is a significant F, the marketers next compute HSD:

HSD = x√(MSwith/n)

Where
x = 3.81 (based on k = 4, dfwith = 36, and p = .05)
MSwith = 61.07, the value from the ANOVA table
n = 10, the number in one group when group sizes are equal
HSD = 9.42
(continued)
Summary
Groups       Count  Sum  Average  Variance
Prototype A  10     594  59.4     73.82
Prototype B  10     654  65.4     65.60
Prototype C  10     822  82.2     36.40
Prototype D  10     750  75.0     68.44

ANOVA
Source of Variation  SS      df  MS       F      p-value   Fcrit
Between Groups       3063.6  3   1021.20  16.72  5.71E-07  2.87
Within Groups        2198.4  36  61.07
Total                5262    39
Apply It! (continued)
This value is the minimum difference between the means of two significantly different samples. The differences in means between the groups are shown below:
A − B = −6.0
A − C = −22.8
A − D = −15.6
B − C = −16.8
B − D = −9.6
C − D = 7.2
The differences in comfort between Prototypes A-B and C-D are
not statistically significant,
because the absolute values are less than the Tukey’s HSD value
of 9.42. However, the differ-
ences in comfort between the remaining prototypes are
statistically significant.
Based on analysis of the one-way ANOVA, the marketing team
decides to produce and sell
the keyboard configuration of Prototype C. This had the highest
mean comfort level and will
be a significant improvement over existing keyboards.
Apply It! boxes written by Shawn Murphy
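The Excel analysis in the Apply It! box can be reproduced from the raw scores in Table 6.6. This plain-Python sketch (not part of the original text) recomputes the sums of squares, degrees of freedom, and F, which should match the values Excel reports in Figure 6.4:

```python
# Comfort ratings for the four keyboard prototypes (Table 6.6)
data = {
    "A": [49, 57, 73, 68, 65, 62, 61, 45, 53, 61],
    "B": [57, 53, 69, 65, 61, 73, 57, 69, 73, 77],
    "C": [77, 82, 77, 85, 93, 79, 73, 89, 82, 85],
    "D": [65, 61, 73, 81, 89, 77, 81, 77, 69, 77],
}

all_scores = [x for grp in data.values() for x in grp]
grand_mean = sum(all_scores) / len(all_scores)

# Between-groups SS: squared deviation of each group mean from the
# grand mean, weighted by group size
ss_bet = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
             for g in data.values())
# Within-groups SS: squared deviations of scores from their group mean
ss_with = sum((x - sum(g) / len(g)) ** 2
              for g in data.values() for x in g)

df_bet = len(data) - 1                  # k - 1 = 3
df_with = len(all_scores) - len(data)   # N - k = 36

F = (ss_bet / df_bet) / (ss_with / df_with)
print(f"SSbet = {ss_bet:.1f}, SSwith = {ss_with:.1f}, F = {F:.2f}")
```

The result should reproduce SSbet = 3063.6, SSwith = 2198.4, and F = 16.72 from the Excel output.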
6.3 Determining the Results’ Practical Importance
Three questions can come up in an ANOVA. The second and
third questions depend upon
the answer to the first:
1. Are any of the differences statistically significant? The
answer depends upon
how the calculated F value compares to the critical value from
the table.
2. If the F is significant, which groups are significantly
different from each other?
That question is answered by completing a post hoc test such as
Tukey’s HSD.
3. If F is significant, how important is the result? The answer
comes by calculating an
effect size.
After addressing the first two questions, we now turn our
attention to the third question,
effect size. With the t-test in Chapter 5, Cohen's d answered the question about how important the result was. Several effect-size statistics have been used to explain the importance of a significant ANOVA result. Omega squared (ω²) and partial eta squared (partial η²) are both quite common in the social science research literature, but the one we will use is called eta squared (η²). The Greek letter eta (η, pronounced like "ate a" as in "ate a grape") is the equivalent of the letter h. Because some of the variance in scores is unexplained and is therefore error variance, eta squared answers this question: How much of the score variance can be attributed to the independent variable?
In the social isolation problem, the question was whether
residents of small towns, subur-
ban areas, and cities differ in the amount of social isolation they
indicate. The respondents’
location is the IV. Eta-squared estimates how much of the
difference in social isolation is
related to where respondents live.
There are only two values involved in the η² calculation, both retrievable from the ANOVA table. Formula 6.6 shows the eta-squared calculation:

η² = SSbet / SStot    (Formula 6.6)
Eta-squared is the ratio of between-groups variability to total
variability. If there were no error variance, all variance would be due to the independent variable, the sums of squares for between-groups variability and for total variability would have the same values, and the effect size would be 1.0.
never happens because scores
fluctuate for reasons other than the IV, but it is important to
know that 1.0 is the “upper
bound” for this effect size. The lower bound is 0, of course—
none of the variance is
explained. But we also never see eta-squared values of 0
because the only time the effect
size is calculated is when F is significant, and that can only
happen when the effect of the
IV is great enough that the ratio of MSbet to MSwith exceeds
the critical value.
For the social isolation problem, SSbet = 33.168 and SStot = 41.672, so

η² = 33.168/41.672 = 0.796.

According to this data, about 80% (79.6% to be exact) of the variance in social isolation scores is related to whether the respondent lives in a small town, a suburb, or a city. (Note that this amount of variance is unrealistically high, which can happen when numbers are contrived.)
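The η² computation is a one-liner. This sketch uses the SS values from the social isolation source table quoted above:

```python
# Eta-squared: proportion of total variability attributable to the IV
ss_bet = 33.168   # between-groups sum of squares
ss_tot = 41.672   # total sum of squares

eta_squared = ss_bet / ss_tot
print(f"eta-squared = {eta_squared:.3f}")  # about 0.796, i.e., ~80%
```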
Try It! If the F in ANOVA is not significant, should the post hoc test or the effect-size calculation be made?
Apply It!
Using ANOVA to Test Effectiveness
A pharmaceutical company has developed a new medicine to
treat a skin condition. This medi-
cine has been proven effective in previous tests, but now the
company is trying to decide the
best method to deliver the medicine. The options are
1. pills that are taken orally,
2. a cream that is rubbed into the affected area, or
3. drops that are placed on the affected area.
(continued)
suk85842_06_c06.indd 208 10/23/13 1:40 PM
CHAPTER 6Section 6.3 Determining the Results’ Practical
Importance
Apply It! (continued)
To test the application methods, the company uses 24 volunteers
who suffer
from this skin condition. Each of the volunteers is randomly
assigned to one of
the three treatment methods. Note that each volunteer tests only
one of the
delivery methods. This satisfies the requirement that the
categories of the IV
must be independent. This is a one-way ANOVA test with the
delivery method
being the only independent variable.
To evaluate the effectiveness of each delivery method, three
different dermatologists exam-
ine each patient after the course of treatment. They then rate the
skin condition on a scale
of 1 through 20, with 20 being a total absence of the condition.
The scores from the three
doctors are then averaged.
The null hypothesis is that all three delivery methods are equally effective:

H0: μpills = μcream = μdrops

The null hypothesis indicates that the three treatments were drawn from populations with the same mean. The alternate hypothesis for the ANOVA test is

Ha: μpills ≠ μcream ≠ μdrops
Data from the trial is shown in Table 6.7.
Table 6.7: Data from trial of skin treatment conditions
Pills Cream Drops
14 18 13
13 15 15
19 16 16
18 18 15
15 17 14
16 13 17
12 17 13
12 18 16
(continued)
Apply It! (continued)
Figure 6.5: Analysis of the data performed in Excel
Figure 6.5 shows the value for F is 1.72, which is less than the Fcrit value of 3.47 when testing at p = .05. Therefore, the null hypothesis is not rejected. We cannot say that the different delivery methods come from populations with different means. Looking at the p value generated by Excel, we see that there is a 20% probability that a difference in means this large could have occurred by chance alone. Because the null hypothesis is not rejected, there is no need to perform either a Tukey's HSD test or an η² calculation.
The pharmaceutical company decides to offer the medicine as a cream because this is generally their preferred delivery method. The ANOVA test has assured them that this is a reasonable choice: neither of the two alternative methods provided a more effective delivery option. In other words, the data do not support the alternative hypothesis.
Apply It! boxes written by Shawn Murphy
Summary
Groups  Count  Sum  Average  Variance
Pills   8      119  14.88    6.98
Cream   8      132  16.50    3.14
Drops   8      119  14.88    2.13

ANOVA
Source of Variation  SS     df  MS    F     p-value  Fcrit
Between Groups       14.08  2   7.04  1.72  0.20     3.47
Within Groups        85.75  21  4.08
Total                99.83  23
6.4 Conditions for the One-Way ANOVA
As we saw with the t-tests, any statistical test requires that
certain conditions (also referred
to as assumptions) are met. The conditions might be
characteristics such as the scale of the
data, the way the data is distributed, the relationships between
the groups in the analysis,
and so on. In the case of the one-way ANOVA, the name
indicates one of the conditions.
• This particular test can accommodate just one independent
variable.
• That one variable can have any number of categories, but there
can be just one IV.
In the example of small-town, suburban, and city isolation, the IV was the location of the respondents' residence. We might have added more categories such as rural, semirural, small town, large town, suburbs of small cities, suburbs of large cities, and so on, all of which relate to the respondents' place of residence, but like the independent t-test, there is no way to add another variable, such as the respondents' gender, in a one-way ANOVA.
• The categories of the IV must be independent.
• Like the independent t-test, the groups involved must be
independent. Those
who are members of one group cannot also be members of
another group
involved in the same analysis.
• The IV must be nominal scale.
• Because the IV must be nominal scale, sometimes data of
some other scale is
reduced to categorical data to complete the analysis. If someone
is interested
in whether there are differences in social isolation related to
age, age must be
changed from ratio to nominal data prior to the analysis. Rather
than using each
person’s age in years as the independent variable, ages are
grouped into catego-
ries such as 20s, 30s, and so on. This is not ideal, because by
reducing ratio data
to nominal or even ordinal scale, the differences in social
isolation between, for
example, 20- and 29-year-olds are lost.
• The DV must be interval or ratio scale.
• Technically, social isolation would need to be measured with
something like the
number of verbal exchanges that one has daily with neighbors or
co-workers,
rather than asking on a scale of 1–10 to indicate how isolated
one feels, which is
probably an example of ordinal data.
• The groups in the analysis must be similarly distributed. The
technical descrip-
tion for this similarity of distribution is homogeneity of
variance. For example,
this condition means that the groups should all have reasonably
similar standard
deviations. This was discussed in Chapter 5 where the Levene’s
test is used to
test equality of variances.
• Finally, using ANOVA assumes that the samples are drawn
from a normally dis-
tributed population.
It may seem difficult to meet all these conditions. However,
keep in mind that normality
and homogeneity of variance in particular represent ideals more
than practical necessities.
As it turns out, Fisher’s procedure can tolerate a certain amount
of deviation from these
requirements; this test is quite robust.
6.5 ANOVA and the Independent t-Test
The one-way ANOVA and the independent t-test share several assumptions, although they employ distinct statistics: the sums of squares for ANOVA and the standard error of the difference for the t-test. Most important, both tests will lead the analyst to the same conclusion. This consistency can be illustrated by completing ANOVA and the independent t-test for the same data.
Suppose an industrial psychologist is interested in how people
from two separate divi-
sions of a company differ in their work habits. The dependent
variable is the amount of
work completed after-hours at home per week for supervisors in
marketing versus super-
visors in manufacturing. The data is as follows:
Marketing: 3, 4, 5, 7, 7, 9, 11, 12
Manufacturing: 0, 1, 3, 3, 4, 5, 7, 7
Calculating some of the basic statistics yields the following:

                M     s      SEM    SEd    MG
Marketing:      7.25  3.240  1.146  1.458  5.50
Manufacturing:  3.75  2.550  0.901
First, the t-test:

t = (M1 − M2)/SEd = (7.25 − 3.75)/1.458 = 2.401; t.05(14) = 2.145
The difference is significant. Those in marketing (M1) take
significantly more work home
than those in manufacturing (M2).
Now the ANOVA:
• SStot = Σ(x − MG)² = 168
• Verify that the result of subtracting MG from each score in both groups, squaring the differences, and summing the squares = 168.
• SSbet = (Ma − MG)²na + (Mb − MG)²nb
• This one is not too lengthy to do here: (7.25 − 5.50)²(8) + (3.75 − 5.50)²(8) = 24.5 + 24.5 = 49.
• SSwith = Σ(xa − Ma)² + Σ(xb − Mb)²
• Verify that the result of subtracting the group means from each score in the particular group, squaring the differences, and summing the squares = 119.
• Check that SSwith + SSbet = SStot: 119 + 49 = 168.
Source SS df MS F Fcrit
Between 49 1 49 5.765 F.05(1,14) 5 4.60
Within 119 14 8.5
Total 168 15
Like the t-test, ANOVA indicates that the difference in the amount of work completed at home is significantly different for the two groups, so both tests draw the same conclusion about whether the result is significant, but there is more similarity than this.
• Note that the calculated value of t = 2.401, and the calculated value of F = 5.765.
• If the value of t is squared, it equals the value of F: 2.401² = 5.765.
• The same is true for the critical values:
t.05(14) = 2.145
F.05(1,14) = 4.60
2.145² = 4.60
Gosset’s and Fisher’s tests draw exactly equivalent conclusions
when there are two
groups. The ANOVA tends to be more work, and researchers
ordinarily use the t-test for
two groups, but the point is that the two tests are entirely
consistent.
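The longhand arithmetic above is easy to verify with a few lines of code. Here is a minimal sketch in plain Python (standard library only; the variable names are ours, not from the text), using the marketing and manufacturing data from this section:

```python
# Verify the two-group t-test and ANOVA longhand results, and that t^2 = F.
marketing = [3, 4, 5, 7, 7, 9, 11, 12]
manufacturing = [0, 1, 3, 3, 4, 5, 7, 7]

def mean(xs):
    return sum(xs) / len(xs)

m1, m2 = mean(marketing), mean(manufacturing)   # 7.25 and 3.75
grand = mean(marketing + manufacturing)         # MG = 5.50

# Sums of squares
ss_tot = sum((x - grand) ** 2 for x in marketing + manufacturing)  # 168
ss_bet = (m1 - grand) ** 2 * 8 + (m2 - grand) ** 2 * 8             # 49
ss_with = sum((x - m1) ** 2 for x in marketing) + \
          sum((x - m2) ** 2 for x in manufacturing)                # 119

# Mean squares and F (df between = 1, df within = 14)
ms_bet, ms_with = ss_bet / 1, ss_with / 14
f_stat = ms_bet / ms_with                       # 5.765

# Independent t-test with pooled variance; SEd = sqrt(sp2 * (1/n1 + 1/n2))
sp2 = ss_with / 14
se_d = (sp2 * (1 / 8 + 1 / 8)) ** 0.5           # 1.458
t_stat = (m1 - m2) / se_d                       # 2.401

# Eta-squared, the effect size discussed earlier in the chapter
eta_sq = ss_bet / ss_tot                        # about .29

print(round(t_stat, 3), round(f_stat, 3), round(t_stat ** 2, 3))
```

Running it confirms the decomposition SStot = SSbet + SSwith (168 = 49 + 119) and that t² equals F.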
6.6 Completing ANOVA with Excel
The ANOVA by longhand involves enough calculated means,
subtractions, squaring of differences, and so on that doing an
ANOVA on Excel is beneficial. A researcher
is comparing the level of optimism indicated by people in
different vocations during an
economic recession. The data is from laborers, clerical staff in
professional offices, and the
professionals in those offices. The data for the three groups
follows:
Laborers: 33, 35, 38, 39, 42, 44, 44, 47, 50, 52
Clerical staff: 27, 36, 37, 37, 39, 39, 41, 42, 45, 46
Professionals: 22, 24, 25, 27, 28, 28, 29, 31, 33, 34
1. Create the data file in Excel. Enter Laborers, Clerical staff,
and Professionals in
cells A1, B1, and C1, respectively.
2. In the columns below those labels, enter the optimism scores,
beginning in cell
A2 for the laborers, B2 for the clerical workers, and C2 for the
professionals. Once
the data is entered and checked for accuracy, proceed with the
following steps.
3. Click the Data tab at the top of the page.
Try It! (H): What is the relationship between the values of t and F if both are
performed for the same two-group test?
4. At the extreme right, choose Data Analysis.
5. In the Analysis Tools window, select ANOVA Single Factor
and click OK.
6. Indicate where the data is located in the Input Range. In the
example here, the
range is A2:C11.
7. Note that the default is “Grouped by Columns.” If the data is
arrayed along rows
instead of columns, this would need to be changed.
Because we designated A2 instead of A1 as the point where the
data begins, there is no
need to indicate that labels are in the first row.
8. Select Output Range and enter a cell location where you wish
the display of the
output to begin. In the example in Figure 6.6, the location is
A13.
9. Click OK.
Widen column A to make the output easier to read. It will look
like the screenshot in
Figure 6.6.
Figure 6.6: Performing an ANOVA on Excel
As you have already seen in the two Apply It! boxes, the results
appear in two tables.
The first provides descriptive statistics. The second table looks
like the longhand table of
results for the social isolation example, except that
• the figures shown for the total follow those for between and
within instead of
preceding them, and
• the P-value column indicates the probability that an F of this
magnitude could
have occurred by chance.
Note that the P value is 4.31E-06. The "E-06" is scientific notation, a shorthand way of
indicating that the actual value is p = .00000431, that is, 4.31 with the decimal point
moved six places to the left (4.31 × 10⁻⁶). This is far smaller than the p = .05 standard,
so the result easily meets the criterion for statistical significance.
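You can confirm how this scientific notation reads with a quick check (Python shown here; any language with floating-point literals behaves the same way):

```python
# "4.31E-06" is shorthand for 4.31 x 10^-6: the decimal moved six places left.
p_value = float("4.31E-06")
print(p_value)                # 4.31e-06
print(p_value == 0.00000431)  # True: same number, two notations
print(p_value < 0.05)         # True: comfortably significant
```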
6.7 Presenting Results
The previous analyses all used Excel, so we will now shift to
using SPSS for the execution of these steps and the
interpretation of the results. We will first use the
data in Table 6.7 and then proceed with actual data gathered
from published research.
You will see that we use the same steps regardless of the sample
size, and that using
technology like Excel and SPSS makes hand calculations
unnecessary. While hand cal-
culations are instructive, they are also laborious and more prone
to errors, especially
with large data sets.
SPSS Example 1: Steps for ANOVA
After setting up the data in SPSS as seen in Figure 6.7 (data from Table 6.7), the steps in
executing this analysis are as follows:

Analyze → Compare Means → One-Way ANOVA. Place Treatment into the Factor box
and Skin Condition into the Dependent List. Click Post Hoc on the left and check Tukey
and Games-Howell; then click Options and check Descriptive and Homogeneity of
variance test. Click Continue and OK. (Note that the three treatment groups in the data
set (Figure 6.7) are numerically coded: Pills = 1, Creams = 2, and Drops = 3.)
Figure 6.7: Data set in SPSS
Figure 6.8: SPSS output from trial of skin treatment conditions

Test of Homogeneity of Variances (SkinCondition)
Levene Statistic  df1  df2  Sig.
1.822             2    21   .186

ANOVA (SkinCondition)
Source          Sum of Squares  df  Mean Square  F      Sig.
Between Groups  14.083          2   7.042        1.724  .203
Within Groups   85.750          21  4.083
Total           99.833          23

Descriptives (SkinCondition)
Group   N   Mean   Std. Deviation  Std. Error  95% CI for Mean  Minimum  Maximum
Pills   8   16.50  1.773           .627        15.02 to 17.98   13       18
Creams  8   14.88  1.458           .515        13.66 to 16.09   13       17
Drops   8   14.88  2.642           .934        12.67 to 17.08   12       19
Total   24  15.42  2.083           .425        14.54 to 16.30   12       19

Multiple Comparisons (Dependent Variable: SkinCondition; Tukey HSD and Games-Howell)
The pairwise mean differences are Pills − Creams = 1.625, Pills − Drops = 1.625, and
Creams − Drops = .000. None of the comparisons is significant under either Tukey HSD
or Games-Howell; the smallest Sig. value in either panel is .149.
As seen in the SPSS output (Figure 6.8), the ANOVA results are the same as when
executed in Excel earlier in the chapter. Here SPSS allows execution of the ANOVA along
with descriptive statistics, tests of homogeneity of variance, post hoc tests, and a line
graph, all simultaneously, using the SPSS steps outlined earlier. The results begin with
the Descriptives table, where you can see that each group has an even number of partici-
pants (n = 8). Here you can see differences in the means, with Pills (M = 16.50) highest of
the three treatments. The Test of Homogeneity of Variance shows a favorable result in that
it is not significant (p > .05), specifically p = .186. This indicates that there is no significant
difference in the variance of the three treatments, indicating equal variances. As you will
recall from earlier chapters, if there is inequality of variance across groups, an adjustment
is needed to compare groups. Next, the ANOVA table shows a nonsignificant F statistic,
p = .203. At this stage, since F is not significant, we do not need to interpret the post hoc
tests, as there will be no significant differences between groups. As noted earlier in the
chapter, this is a debatable topic: with the ease of running post hoc tests, the analyst can
easily look at their results regardless of the F statistic. Findings may occasionally indicate
a significant difference between two groups even with a nonsignificant F, but such a
result is rare, and you can clearly see from the example that none of the post hoc
comparisons is significant.
SPSS Example 2: Steps for ANOVA
Using public data about higher education and housing from Pew Research (2010),
Social and Demographic Trends, the steps in executing this analysis are as follows:

Analyze → Compare Means → One-Way ANOVA. Place schl (currently enrolled in school)
into the Factor box and age into the Dependent List. Click Post Hoc on the left and check
Tukey and Games-Howell; then click Options and check Descriptive, Homogeneity of
variance test, and Means plot. Click Continue and OK.
Figure 6.9: SPSS output from Pew research social and demographic trends (2010)
education data set

Test of Homogeneity of Variances (AGE: What is your age?)
Levene Statistic  df1  df2   Sig.
44.884            5    1692  .000

ANOVA (AGE: What is your age?)
Source          Sum of Squares  df    Mean Square  F       Sig.
Between Groups  72748.597       5     14549.719    90.395  .000
Within Groups   272338.706      1692  160.957
Total           345087.303      1697

Descriptives (AGE: What is your age?)
Group                                                 N     Mean   Std. Deviation  Std. Error  95% CI for Mean  Minimum  Maximum
Yes, in High School                                   33    19.64  4.801           .836        17.93 to 21.34   18       43
Yes, in Technical, trade, or vocational school        33    36.03  15.503          2.699       30.53 to 41.53   18       64
Yes, in College (undergraduate, incl. 2-year)         212   24.81  8.433           .579        23.66 to 25.95   18       58
Yes, in Graduate School                               81    31.38  10.692          1.188       29.02 to 33.75   18       64
No                                                    1336  42.05  13.399          .367        41.33 to 42.77   18       64
Don't know/Refused (VOL.)                             3     29.33  6.429           3.712       13.36 to 45.30   22       34
Total                                                 1698  38.82  14.260          .346        38.14 to 39.49   18       64
Figure 6.9: SPSS output from Pew research social and demographic trends (2010)
education data set (continued)

Multiple Comparisons (Dependent Variable: AGE What is your age?; Tukey HSD)
The table lists each pairwise mean difference with its standard error, significance, and
95% confidence interval. The significant (*) comparisons under Tukey HSD are: No vs.
Yes, in High School (mean difference 22.417*), No vs. Yes, in College (17.247*), No vs.
Yes, in Graduate School (10.670*), Yes, in High School vs. Yes, in Technical, trade, or
vocational school (−16.394*), Yes, in High School vs. Yes, in Graduate School (−11.746*),
Yes, in Technical, trade, or vocational school vs. Yes, in College (11.224*), and Yes, in
Graduate School vs. Yes, in College (6.576*). Yes, in High School vs. Yes, in College
(−5.170, p = .249) and No vs. Yes, in Technical, trade, or vocational school (6.023,
p = .077) are not significant under Tukey HSD, and no comparison involving the
Don't know/Refused (VOL.) group is significant.

* The mean difference is significant at the 0.05 level.
Figure 6.9: SPSS output from Pew research social and demographic trends (2010)
education data set (continued)

Source: Data from Pew Research: Social and Demographic Trends. (2011). Higher Education/Housing. Retrieved from
http://www.pewsocialtrends.org/category/datasets/.
Multiple Comparisons (Dependent Variable: AGE What is your age?; Games-Howell)
Under Games-Howell, which does not assume equal variances, the same comparisons are
significant as under Tukey HSD, with one addition: Yes, in High School vs. Yes, in College
is now significant (−5.170*, p = .000). The comparisons that remain nonsignificant are
Yes, in Technical, trade, or vocational school vs. Yes, in Graduate School (4.648, p = .618),
Yes, in Technical, trade, or vocational school vs. No (6.023, p = .260), and every
comparison involving the Don't know/Refused (VOL.) group.

* The mean difference is significant at the 0.05 level.
Figure 6.10: SPSS output graph from Pew research social and demographic trends (2010)
education data set

Source: Data from Pew Research: Social and Demographic Trends. (2011). Higher Education/Housing. Retrieved from
http://www.pewsocialtrends.org/category/datasets/.
The Descriptives table in Figure 6.9 shows that the groups have unequal numbers of
participants, with the No (not in school) group having the most participants (n = 1,336)
and the highest mean age (M = 42.05). The Test of Homogeneity of Variance shows an
unfavorable result in that it is significant (p < .05). This indicates that there is a significant
difference in the variance of the six education groups, indicating unequal variances (or
heterogeneity of variance). Next, the ANOVA table indicates a significant F statistic
(p < .05). To determine which of the group comparisons are significant using a post hoc
test when there is a violation of homogeneity, equal variances will not be assumed.
Therefore, we will interpret the equal-variances-not-assumed post hoc test, which is
Games-Howell. Here, the Don't Know/Refused group does not differ significantly from
any of the other education groups. You can also see significant differences between
several groups, such as Yes, in High School and Yes, in Technical, trade, or vocational
school. All comparisons can be made in a similar manner based on the significance value
in the Multiple Comparisons table. The line graph, or means plot (Figure 6.10), shows the
mean age of each group, with the No group having the highest mean age and the Yes, in
High School group having the lowest.
6.8 Interpreting Results
Though you should refer to the most recent edition of the APA
manual for specific detail on formatting statistics, the following
may be used as a quick guide in presenting the
statistics covered in this chapter.
Table 6.8: Guide to APA formatting of F statistic results

Abbreviation or Term  Description
F                     F test statistic score
η²                    Eta-squared: an effect size
ω²                    Omega-squared: an effect size
HSD                   Honestly significant difference: a Tukey's post hoc test
SS                    Sum of squares
MS                    Mean square

Source: Publication Manual of the American Psychological Association, 6th edition. © 2009 American Psychological Association,
pp. 119–122.

Note that all of the statistical symbols in Table 6.8 are italicized, while HSD is not.
The following are some examples of how to present results using these abbreviations,
though you may use different combinations of results.
Using the data from the SPSS examples above (Figures 6.8 through 6.10), we could
present the results in the following way:
• The overall difference in skin condition across treatments was not significant,
F(2, 21) = 1.724, p = .203. (Note that the df listed are those from the between-
and within-groups lines in the ANOVA table.)
• The overall difference in age across school-enrollment groups was significant,
F(5, 1692) = 90.39, p < .05.
• The No [school] group was significantly older (M = 42.05, SD = 13.39) than the
Yes, in High School group (M = 19.64, SD = 4.80), the Yes, in College. . . group
(M = 24.81, SD = 8.43), and the Yes, in Graduate School group (M = 31.38,
SD = 10.69), whereas there were no significant differences with the Yes, in Tech-
nical, trade. . . group (M = 36.03, SD = 15.50) and the Don't Know/Refused
group (M = 29.33, SD = 6.43).
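If you report many such results, the APA-style string can be assembled programmatically. Here is a small helper sketch in Python; the function name and the exact formatting choices (for example, retaining the leading zero in p values, which strict APA style omits) are ours, not prescribed by the text:

```python
def apa_f(df_between, df_within, f_stat, p):
    """Format an F result in APA-like style, e.g., F(2, 21) = 1.72, p = 0.203."""
    if p < .05:
        p_text = "p < .05"
    else:
        p_text = "p = " + format(p, ".3f")
    return "F({}, {}) = {:.2f}, {}".format(df_between, df_within, f_stat, p_text)

# The two ANOVA results reported above:
print(apa_f(2, 21, 1.724, 0.203))      # F(2, 21) = 1.72, p = 0.203
print(apa_f(5, 1692, 90.39, 4.31e-6))  # F(5, 1692) = 90.39, p < .05
```

Consult the most recent APA manual for the exact typographic rules (italics, leading zeros, and exact p values).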
6.9 Nonparametric Test: Kruskal-Wallis H-test
The one-way ANOVA's nonparametric equivalent is the Kruskal-Wallis H-test, also
known as the Kruskal-Wallis ANOVA. Like the Mann-Whitney U-test, the Kruskal-Wallis
H-test is based on ranked (ordinal) data. It is used as an alternative to its parametric
counterpart when violations of assumptions have occurred. In fact, Kruskal was
not a proponent of significance testing, as Bradburn (2007)
has quoted him as saying, “I am thinking these days about
the many senses in which relative importance gets consid-
ered. Of these senses, some seem reasonable and others not
so. Statistical significance is low on my ordering.” That said,
his derived equivalent of a parametric technique is very
apropos.
As in the Mann-Whitney U-test, the ranks within each group are determined and then
summed. The H statistic is calculated from the squared rank sums, each divided by its
respective sample size.

H = [12 / (N(N + 1))] Σ(Tg² / ng) − 3(N + 1)    (Formula 6.7)

Where
N = total sample size
Tg = sum of ranks for group g
ng = sample size of group g
To illustrate the calculation of the H-test, we will use the same data from Table 6.7 with a
few modifications, as seen in Table 6.9. The first step is to rank all the values across
treatments, with 1 being the lowest rank. If there are tied values, an average of their ranks
is taken. For instance, in the Pills column, the two values of 12 have initial ranks of 1 and
2; their average is 1.5, as seen in the Rank column. The same is true for the values of 13,
where there are four ranks with an average rank of 4.5, and so on with the other ties. Once
all of these are complete, the ranks are summed, as seen in the last row of the table.
Table 6.9: Data from trial of skin treatment conditions
Pills  Initial Rank  Rank    Cream  Initial Rank  Rank    Drops  Initial Rank  Rank
14 7 7.5 18 21 21.5 13 3 4.5
13 4 4.5 15 10 10.5 15 11 10.5
19 24 24 16 14 14.5 16 15 14.5
18 20 21.5 18 22 21.5 15 12 10.5
15 9 10.5 17 17 18 14 8 7.5
16 13 14.5 13 5 4.5 17 19 18
12 1 1.5 17 18 18 13 6 4.5
12 2 1.5 18 23 21.5 16 16 14.5
Sum of ranks: 85.5 (Pills), 130 (Cream), 84.5 (Drops)
Try It! There are several websites that will help in these calculations. One well-used
statistical calculator for various analyses, such as the Kruskal-Wallis H-test, is available
at the VassarStats website: http://vassarstats.net/index.html. Use the data provided in
this chapter section to see if you get the same results.
Next, each of the summed ranks is squared and divided by its respective sample size,
completing Formula 6.7:

H = [12 / (24(24 + 1))] [(85.5)²/8 + (130)²/8 + (84.5)²/8] − 3(24 + 1)
H = 0.02 [7,310.25/8 + 16,900/8 + 7,140.25/8] − 75
H = 0.02 (913.78 + 2,112.50 + 892.53) − 75
H = 0.02 (3,918.81) − 75
H = 3.38
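The ranking and Formula 6.7 arithmetic can be reproduced with a short script. Here is a stdlib-only Python sketch using the Table 6.9 scores; the tie handling (averaging the ranks of tied values) follows the procedure described above, and the variable names are ours:

```python
# Kruskal-Wallis H by hand: rank all scores together (ties get the average rank),
# sum the ranks per group, then apply Formula 6.7.
pills = [14, 13, 19, 18, 15, 16, 12, 12]
cream = [18, 15, 16, 18, 17, 13, 17, 18]
drops = [13, 15, 16, 15, 14, 17, 13, 16]
groups = [pills, cream, drops]

# Average rank for each distinct value across the pooled, sorted data
pooled = sorted(x for g in groups for x in g)
rank_of = {}
for value in set(pooled):
    positions = [i + 1 for i, v in enumerate(pooled) if v == value]
    rank_of[value] = sum(positions) / len(positions)

rank_sums = [sum(rank_of[x] for x in g) for g in groups]  # 85.5, 130.0, 84.5
n_total = sum(len(g) for g in groups)                     # N = 24

h = (12 / (n_total * (n_total + 1))) * \
    sum(t ** 2 / len(g) for t, g in zip(rank_sums, groups)) - 3 * (n_total + 1)
print(rank_sums, round(h, 2))
```

The rank sums match the last row of Table 6.9 (85.5, 130, 84.5), and H is evaluated directly from Formula 6.7.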
The H statistic approximates a chi-square (χ²) distribution, which will be discussed in
Chapter 11, based on k − 1 degrees of freedom, where k is the number of comparison
groups. Here, df = k − 1 = 3 − 1 = 2. Therefore, using the chi-square distribution table in
Table 6.10, χ²critical = 5.991 at the α = .05 level. Our χ²observed value of 3.38 is less than
this χ²critical = 5.991, meaning that there is no significant difference between groups.
Given the nonsignificant ANOVA conducted on these data earlier in the chapter, a
nonsignificant outcome was expected. Nonparametric tests are more conservative than
parametric ones in that there is a lower probability of finding a significant outcome
compared to the parametric counterpart. This also leads to a lower probability of a
Type I error.
Table 6.10: Chi-square distribution
Area to the right of critical value
Degrees of
freedom
0.99 0.975 0.95 0.90 0.10 0.05 0.025 0.01
1 — 0.001 0.004 0.016 2.706 3.841 5.024 6.635
2 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210
3 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345
4 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277
5 0.554 0.831 1.145 1.610 9.236 11.071 12.833 15.086
6 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812
7 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475
8 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090
9 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666
10 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209
11 3.053 3.816 4.575 5.578 17.275 19.675 21.920 24.725
12 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217
13 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688
14 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141
15 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578
16 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000
17 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409
18 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805
19 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191
20 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566
21 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932
22 9.542 10.982 12.338 14.042 30.813 33.924 36.781 40.289
23 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638
24 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980
25 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314
26 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642
27 12.879 14.573 16.151 18.114 36.741 40.113 43.194 46.963
28 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278
29 14.257 16.047 17.708 19.768 39.087 42.557 45.722 49.588
30 14.954 16.791 18.493 20.599 40.256 43.773 46.979 50.892
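The decision rule (compare H against the chi-square critical value on k − 1 degrees of freedom) can be written compactly. Here is a sketch with a few α = .05 critical values transcribed from Table 6.10; the function and dictionary names are ours:

```python
# Critical chi-square values at alpha = .05, keyed by degrees of freedom
# (transcribed from the .05 column of Table 6.10).
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.071}

def kruskal_wallis_decision(h, k):
    """Compare H against the chi-square critical value on k - 1 df."""
    critical = CHI2_CRIT_05[k - 1]
    return "significant" if h > critical else "not significant"

print(kruskal_wallis_decision(3.38, 3))    # skin-treatment data, H from Formula 6.7
print(kruskal_wallis_decision(17.166, 3))  # optimism data, chi-square from SPSS
```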
As you will see in the next section, when this analysis is performed in SPSS, a χ² value is
given rather than an H value per se.
SPSS Steps for the Kruskal-Wallis H-test

Reexamining the data set used in Figure 6.6, but rearranging the data as depicted in
Figure 6.11, the employee groups (Position) are categorically coded with 1 = Laborers,
2 = Clerical, and 3 = Professional. To execute, go to Analyze → Nonparametric Tests →
Legacy Dialogs → K Independent Samples. As shown in Figure 6.12, input Optimism
(DV) into the Test Variable List box and Position (IV) into the Grouping Variable box,
then click the Define Range button just below to input the range of codes for the Position
variable; this will be 1 and 3 for the minimum and maximum codes, respectively.
Then click OK.
Figure 6.11: Data set in SPSS
Figure 6.12: The Kruskal-Wallis H-test steps in SPSS
Interpreting Results
The output in Figure 6.13 shows the results of the Kruskal-Wallis H-test. The χ² value in
the Test Statistics table is significant, KW χ²(2) = 17.17, p < .05, so there is an overall
statistical difference in optimism among the three employee groups. This can be seen in
the Ranks table, where the Laborers' mean rank (MR = 21.80) is the highest and the
Professionals' (MR = 6.30) is the lowest. Post hoc tests are not readily available as they
were for the ANOVA, so follow-up Mann-Whitney U or Wilcoxon rank-sum tests of all
possible combinations will have to be performed (see Chapter 5 for these procedures).
The conclusion to these results would read as follows:

Based on the Kruskal-Wallis H-test there is a significant difference in the
level of optimism of the three groups (KW χ²(2) = 17.17, p < .05). Laborers
reported the highest level of optimism (MR = 21.80), followed by Clerical
positions (MR = 18.40), and then Professionals (MR = 6.30), which
reported the lowest level of optimism.
Figure 6.13: The Kruskal-Wallis H-test output

Ranks (Optimism)
Position       N   Mean Rank
Laborers       10  21.80
Clerical       10  18.40
Professionals  10  6.30
Total          30

Test Statistics(a,b) (Optimism)
Chi-Square   17.166
df           2
Asymp. Sig.  .000
a. Kruskal Wallis Test
b. Grouping Variable: Position
Summary
This chapter is the natural extension of Chapters 4 and 5. Like
the z- and t-tests, analysis
of variance is a test of significant differences. Also like the z-
and t-tests, the IV in ANOVA
is nominal and the DV is interval or ratio. With each
procedure—whether z, t, or F—the
test statistic is a ratio of the differences between groups to the
differences within groups
(Objective 3).
There are differences between ANOVA and the earlier
procedures, of course. The vari-
ance statistics are sums of squares and mean squares values. But
perhaps the most impor-
tant difference is that ANOVA can accommodate any number of
groups (Objectives 2 and
3). Remember that trying to deal with multiple groups in a t-test
introduces the problem
of mounting type I error when repeated analyses with the same
data indicate statistical
significance. One-way ANOVA lifts the limitation of a one-
pair-at-a-time comparison
(Objective 1).
The other side of multiple comparisons, however, is the
difficulty of determining which
comparisons are statistically significant when F is significant.
This problem is solved
with the post hoc test. In this chapter, we used Tukey’s HSD
(Objective 4). There are
other post hoc tests, each having their strengths and drawbacks,
but HSD is one of the
most widely used.
Years ago, the emphasis in the scholarly literature was on
whether a result was statisti-
cally significant. Today, the focus is on measuring the effect
size of a significant result, a
statistic that in the case of analysis of variance can indicate how
much of the variability
in the dependent variable can be attributed to the effect of the
independent variable. We
answered that question with eta-squared (η²). But neither the post hoc test nor eta-squared
post hoc test nor eta-squared
is relevant if the F is not significant (Objective 5). Then, further
ANOVAs were executed in
SPSS, and the results were presented (Objective 6) in APA
format and interpreted accord-
ingly (Objective 7). Finally, the nonparametric equivalent of
ANOVA, Kruskal-Wallis
H-test, was discussed as an alternative method and compared to
its parametric equivalent,
the ANOVA. The same data set was used to compare outcomes.
In addition, an appropri-
ate example in SPSS was provided (Objective 8).
The independent t-test and the one-way ANOVA both require
that groups be indepen-
dent. What if they are not? What if we wish to measure one
group twice over time, or
perhaps more than twice? Such dependent-groups procedures are
the focus of Chapter 7.
Rather than different thinking, it is more of an elaboration of
familiar concepts. For this
reason, consider reviewing Chapter 5 and the independent t-test
discussion before start-
ing Chapter 7.
The one-way ANOVA dramatically broadens the kinds of
questions the researcher can
ask. The procedures in Chapter 7 for nonindependent groups
represent the next incre-
mental step.
Key Terms
analysis of variance Fisher’s test that
allows one to detect significant differences
among any number of groups. The acro-
nym is ANOVA.
error variance The variability in a measure
unrelated to the variables being analyzed.
eta-squared A measure of effect size for
ANOVA. It estimates the amount of vari-
ability in the DV explained by the IV.
F ratio The test statistic calculated in an
analysis of variance problem. It is the ratio
of the variance between the groups to the
variance within the groups.
factor Refers to an IV, particularly in pro-
cedures that involve more than one.
family-wise error An inflated type I error
rate in hypothesis testing when doing mul-
tiple tests with the assumption of different
sets of data. Specifically, when comparing
multiple groups in dyad combinations
using a series of t-tests instead of executing
one omnibus ANOVA.
homogeneity of variance When multiple
groups of data are distributed similarly.
mean square The sum of squares divided
by its degrees of freedom. This division
allows the mean square to reflect a mean, or
average, amount of variability from a source.
omnibus test A test of the overall sig-
nificance of the model based on difference
between sample means when there are
more than two groups to compare. The test
will not tell you which two means are sig-
nificantly different, which is why follow-up
post hoc comparisons are executed.
one-way ANOVA The ANOVA in its sim-
plest form, this model has only one inde-
pendent variable.
post hoc test A test conducted after a sig-
nificant ANOVA or some similar test that
identifies which among multiple possibili-
ties is statistically significant.
sum of squares (SS) The variance measure
in analysis of variance. They are literally
the sum of squared deviations between a
set of scores and their mean.
sum of squares between The variability
related to the independent variable and any
measurement error that may occur.
sum of squares total Total variance from
all sources.
sum of squares within Variability stem-
ming from different responses from indi-
viduals in the same group. It is exclusively
error variance. Is also referred to as the sum
of squares error or the sum of squares residual.
Chapter Exercises
Answers to Try It! Questions
The answers to all Try It! questions introduced in this chapter
are provided below.
A. The “one” in one-way ANOVA refers to the fact that this test
accommodates just
one independent variable.
B. There is no gender variable in the analysis and consequently,
gender-related
variance emerges as error variance. The same would be true for
any variability
in scores stemming from any variable not being analyzed in the
study.
C. It would take 15 comparisons! The answer is the number of groups (6)
times the number of groups minus 1 (5), with the product divided by 2:
(6 × 5)/2 = 30/2 = 15.
D. The only way SS values can be negative is if there has been a
calculation error.
Because the values are all squared values, if they have any
value other than 0,
they have to be positive.
E. The difference between SStot and SSwith is the SSbet.
F. If F = 4 and MSwith = 2, then MSbet = 8, because F = MSbet ÷ MSwith.
G. The answer is neither. If F is not significant, there is no
question of which group
is significantly different from which other group because any
variability may be
nothing more than sampling variability. By the same token,
there is no effect to
calculate because, as far as we know, the IV does not have any
effect on the DV.
H. F = t²
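Answers C, F and H above are all small arithmetic facts, and they can be checked directly. The sketch below, with invented scores, verifies that a two-group one-way ANOVA yields an F equal to the squared independent-samples t, and that k groups generate k(k − 1)/2 pairwise comparisons.

```python
import math
from itertools import combinations

# Invented scores for two groups, used only to illustrate the relationship F = t^2.
a, b = [1, 2, 3], [3, 4, 5]

mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)

# Independent-samples t with pooled variance.
pooled = ((len(a) - 1) * var_a + (len(b) - 1) * var_b) / (len(a) + len(b) - 2)
t = (mean_a - mean_b) / math.sqrt(pooled * (1 / len(a) + 1 / len(b)))

# One-way ANOVA F for the same two groups.
grand = sum(a + b) / (len(a) + len(b))
ss_bet = len(a) * (mean_a - grand) ** 2 + len(b) * (mean_b - grand) ** 2
ss_with = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
f = (ss_bet / 1) / (ss_with / (len(a) + len(b) - 2))

print(round(t ** 2, 10), round(f, 10))  # the two values match: F = t^2

# Answer C's counting rule: with k groups there are k(k - 1)/2 pairwise comparisons.
k = 6
assert k * (k - 1) // 2 == len(list(combinations(range(k), 2)))  # 15 pairs for k = 6
```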
Review Questions
The answers to the odd-numbered items can be found in the
answers appendix.
1. Several people selected at random are given a story problem
to solve. They take
3.5, 3.8, 4.2, 4.5, 4.7, 5.3, 6.0, and 7.5 minutes. What is the
total sum of squares for
this data?
2. Identify the following symbols and statistics in a one-way
ANOVA:
a. The statistic that indicates the mean amount of difference
between groups.
b. The symbol that indicates the total number of participants.
c. The symbol that indicates the number of groups.
d. The mean amount of uncontrolled variability.
3. The theory is that there are differences by gender in
manifested aggression. With
data from Measuring Expressed Aggression Numbers (MEAN),
a researcher has
the following:
Males: 13, 14, 16, 16, 17, 18, 18, 18
Females: 11, 12, 12, 14, 14, 14, 14, 16
Complete the problem as an ANOVA. Is the difference
statistically significant?
4. Complete Exercise 3 as an independent t-test and demonstrate
the relationship
between t2 and F.
5. Even with a significant F, there is never a need for a post hoc
in a two-group
ANOVA. Why?
6. A researcher completes an ANOVA in which the number of years of education completed is analyzed by ethnic group. If η² = .36, how should that be interpreted?
7. Three groups of clients involved in a program for substance
abuse attend weekly
sessions for 8, 12, and 16 weeks. The DV is the number of days
drug free.
8 weeks: 0, 5, 7, 8, 8
12 weeks: 3, 5, 12, 16, 17
16 weeks: 11, 15, 16, 19, 22
a. Is F significant?
b. What is the location of the significant difference?
c. What does the effect size indicate?
8. Regarding Exercise 7,
a. what is the IV?
b. what is the scale of the IV?
c. what is the DV?
d. what is the scale of the DV?
9. For an ANOVA problem, k = 4 and n = 8.
If SSbet = 24.0
and SSwith = 72,
a. what is F?
b. is the result significant?
10. Consider this partially completed ANOVA table:

Source    SS    df    MS    F    Fcrit
Total     94
Between          2
Within    63           3

a. What must be the value of N − k?
b. What must be the value of k?
c. What must be the value of N?
d. What must SSbet be?
e. Determine MSbet.
f. Determine F.
g. What is Fcrit?
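A partially completed ANOVA table like the one in Exercise 10 is filled in from the bookkeeping identities SStot = SSbet + SSwith, dftot = N − 1 = dfbet + dfwith, MS = SS/df, and F = MSbet/MSwith. The sketch below walks through those identities with made-up starting values, deliberately not the exercise's numbers.

```python
# Filling in a partially completed ANOVA table from the identities
#   SS_tot = SS_bet + SS_with,  df_tot = df_bet + df_with = N - 1,
#   MS = SS / df,  F = MS_bet / MS_with.
# The starting values below are made up; they are not the exercise's numbers.

ss_total = 120.0   # given
df_between = 2     # given: k - 1, so k = 3 groups
ss_within = 90.0   # given
ms_within = 6.0    # given

df_within = ss_within / ms_within      # df = SS / MS, so N - k = 15
n_total = df_within + df_between + 1   # N = df_tot + 1 = 18
ss_between = ss_total - ss_within      # 120 - 90 = 30.0
ms_between = ss_between / df_between   # 30 / 2 = 15.0
f = ms_between / ms_within             # 15 / 6 = 2.5

print(df_within, n_total, ss_between, ms_between, f)
```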
Analyzing the Research
Review the article abstracts provided below. You can then
access the full articles via your
university’s online library portal to answer the critical thinking
questions. Answers can be
found in the answers appendix.
Using ANOVA for an Emotions Study
Carolan, L. A., & Power, M. J. (2011). What basic emotions are
experienced in bipolar
disorder? Clinical Psychology & Psychotherapy, 18(5), 366–
378.
Article Abstract
Aims: The aims of this study were to investigate the basic
emotions experienced within
and between episodes of bipolar disorder and, more specifically,
to test the predictions
made by the Schematic, Propositional, Analogical and
Associative Representation Sys-
tems (SPAARS) model that mania is predominantly
characterized by the coupling of
happiness with anger whereas depression (unipolar and bipolar)
primarily comprises a
coupling between sadness and disgust.
Design: A cross-sectional design was employed to examine the differences within and between the bipolar, unipolar and control groups in the emotional profiles. Data were analyzed using one-way ANOVAs.
Method: Psychiatric diagnoses in the clinical groups were
confirmed using the Structured
Clinical Interview for DSM-IV (SCID). It was not administered
in the control group. Cur-
rent mood state was measured using the Beck Depression
Inventory-II, the State–Trait
Anxiety Inventory and the Bech–Rafaelsen Mania Scale. The
Basic Emotions Scale was
used to explore the emotional profiles.
Results: The results confirmed the predictions made by the SPAARS model about emotions in mania and depression. Outwith these episodes, individuals with bipolar disorder experienced elevated levels of disgust.
Discussion: Evidence was found in support of the proposal of
SPAARS that there are five
basic emotions, which form the basis for both normal emotional
experience and emotional
disorders. Disgust is an important feature of bipolar disorder.
Strengths and limitations
are discussed, and suggestions for future research are explored.
Critical Thinking Questions
1. Why does this study use a one-way ANOVA instead of a t-
test?
2. What means are being compared in the bipolar group in this
study?
3. According to the following ANOVA results between bipolar and unipolar groups, which result(s) showed significance?
F(1, 46) = 0.00; p = .93
F(1, 19.22) = 9.81; p = .005
F(1, 45) = 1.26; p = .26
F(1, 44) = 0.02; p = .87
F(1, 45) = 0.13; p = .71
4. What types of post hoc test did the paper use as a follow-up
to the F statistic?
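Question 3 turns on a single rule: a result is conventionally called significant when its p-value falls below α = .05. A one-line screen over the reported p-values, copied from the question, makes the rule concrete.

```python
# Screen the reported p-values from question 3 against the conventional alpha = .05.
alpha = 0.05
results = {
    "F(1, 46) = 0.00": 0.93,
    "F(1, 19.22) = 9.81": 0.005,
    "F(1, 45) = 1.26": 0.26,
    "F(1, 44) = 0.02": 0.87,
    "F(1, 45) = 0.13": 0.71,
}
significant = [name for name, p in results.items() if p < alpha]
print(significant)  # only the test with p = .005 clears the threshold
```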
Using ANOVA for a Health and Physical Activity Study
Bize, R., & Plotnikoff, R. C. (2009). The relationship between a
short measure of health
status and physical activity in a workplace population.
Psychology, Health & Medi-
cine, 14(1), 53–61.
Article Abstract
Many interventions promoting physical activity (PA) are
effective in preventing disease
onset, and although studies have found a positive relationship
between health-related
quality of life (HRQL) and PA, most of these studies have
focused on older adults and
those with chronic conditions. Less is known regarding the
association between PA level
and HRQL among healthy adults. Our objective was to analyse
the relationship between
PA level and HRQL among a sample of 573 employees aged 20–
68 taking part in a work-
place intervention to promote PA. Measures included HRQL
(using a single item) and
PA (i.e., Godin Leisure-Time Questionnaire). The Modified
Canadian Aerobic Fitness
Test (MCAFT) was also completed by 10% of the employees.
MET-minute scores (assess-
ing energy expenditure over one week) were compared across
HRQL categories using
ANOVA. A multiple linear regression analysis was conducted to
further examine the rela-
tionship between HRQL and PA, controlling for potential
covariates. Participants in the
higher health status categories were found to report higher
levels of energy expenditure
(one-way ANOVA, p < 0.001). In the multiple linear regression model, each unit increase in health status level translated into a mean increase of 356 MET-minutes in energy expenditure (p < 0.001). This single-item assessment of health status
explained six percent of
the variance in energy expenditure. The study concludes that
higher energy expenditure
through PA among an adult workplace population is positively
associated with increased
health status, and it also suggests that a single-item HRQL measure is suitable for community- and population-based studies, reducing response burden and research costs.
Critical Thinking Questions
1. Why did this study execute a Kruskal-Wallis H-test?
2. It was stated that the higher health status categories reported
higher mean energy
expenditure of the one-way ANOVA, and the Kruskal-Wallis
yielded similar
results. To make this plausible, what would the significance
level of the Kruskal-
Wallis have been?
3. After evaluating figure 1, we can see there is a difference in
higher health status
and higher energy expenditure. From this information, should
they have run a
post hoc test? Why or why not?
Research Questions for Week One
Background
During this week you will brainstorm a list of research
questions you are interested in, which will help you work
towards your Week 1 Assignment. You are working towards
creating a list of at least 10 unique research questions that
encompass a variety of topics and types of variables. Think
about exploring relationships between variables, making
predictions for one variable using one or more other variables,
and determining differences between groups across one or two
variables. In future weeks, you will pull questions from this list
that might lend themselves to a particular statistical analysis,
thus saving valuable time in not needing to brainstorm research
ideas. During those weeks you will take the research question
and create a mini-research proposal that will help you consider
the application of a specific statistical analysis to that question.
Discussion Assignment Requirements
Initial Posting - To earn full participation points, include in
your initial posting at least 5 potential research questions by
Day 3. Have fun with these questions and choose topics you are
truly interested in, whether they are leadership, training, sports,
social media, politics, movies, or food. This will make the
research design process much more enjoyable. If you need help
coming up with ideas, ask your instructor for examples. Also,
feel free to post more than 5 research questions as it would be
useful to get feedback on as many questions as possible.
For each of the questions, provide the following:
· List the research question (be sure to phrase as a measurable
question)
· Identify the variables presented in the question
· Provide an operational definition for each variable
· Describe each variable’s scale of measurement (nominal,
ordinal, interval, or ratio) and characteristics (i.e., discrete vs.
continuous, numerical vs. categorical, etc.)
Replies - Though you may respond to your peers multiple times
during the week to provide support or feedback, students are
required to respond substantively to at least two of their
classmates’ postings by
ANSWER FOR DISCUSSION WEEK 1
Research discussion
Research question one: How does leadership style affect organizational performance?
In this research question, the independent variable is leadership style, while the dependent variable is organizational performance (Sukal, 2019). Leadership styles are techniques used by organizations to run their activities to achieve their objectives. Organizational performance entails the various achievements of an entity that accrue from its business operations. An ordinal scale of measurement can be used in this case.
Research question two: What are the effects of technology on students' performance?
In this case, technology is the independent variable while students' performance is the dependent variable. Technology in education is scientific knowledge used to improve the level of education (Sukal, 2019). Student performance refers to how students carry out their studies. An ordinal scale of measurement is appropriate to measure how technology affects students' performance.
Research question three: What are the effects of smoking on human health?
Smoking is the independent variable, while human health is the dependent variable. Smoking is the inhalation of tobacco products, while human health is the well-being of the human condition (Carruthers & Maggard, 2019). An ordinal scale of measurement is used in this case.
Research question four: What are the effects of training on employee performance?
Training is the independent variable, while employee performance is the dependent variable (Carruthers & Maggard, 2019). Training involves equipping employees with the knowledge to perform their duties appropriately. Employee performance is the output that accrues from different activities. An ordinal scale is used in this research question.
Research question five: How do management styles affect employee performance?
Management styles are the independent variable, while employee performance is the dependent variable (Carruthers & Maggard, 2019). Management styles are techniques used by management to run business activities, while employee performance is the output accrued from employees' actions. An ordinal scale is used in this research question.
References
Carruthers, M. W., & Maggard, M. (2019). Smart Lab: A statistics primer. San Diego, CA: Bridgepoint Education, Inc.
Sukal, M. (2019). Research methods: Applying statistics in research. San Diego, CA: Bridgepoint Education, Inc.
PROFESSOR'S RESPONSE:
Interesting questions!
Please be sure to include operational definitions of your DVs -
i.e. employee performance. How would you measure it? It
might be helpful to review the operational definition
announcement in the course. Remember, we need to include
enough detail about our methodology and variables so that
anyone could replicate our work.
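The professor's point about operational definitions can be made concrete with a small data structure: a replicable variable spells out its role, how it is concretely measured, and its scale. The sketch below is purely illustrative; the example DV and its measurement details are hypothetical, not part of the assignment.

```python
from dataclasses import dataclass

@dataclass
class Variable:
    """A study variable described with the detail a replication would need."""
    name: str
    role: str                    # "IV" or "DV"
    operational_definition: str  # how, concretely, the variable is measured
    scale: str                   # nominal, ordinal, interval, or ratio
    kind: str                    # categorical or numerical

# Hypothetical example: pinning down "employee performance" as the professor asks.
dv = Variable(
    name="employee performance",
    role="DV",
    operational_definition=(
        "Supervisor rating on a standardized 10-item review form, "
        "averaged to a 1-5 score for the most recent quarter"
    ),
    scale="ordinal",
    kind="numerical",
)
print(dv.name, dv.scale)
```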
Step-by-step guide to critiquing research. Part 1: quantitative research
Michael Coughlan, Patricia Cronin, Frances Ryan

Abstract
When caring for patients it is essential that nurses are using the current best practice. To determine what this is, nurses must be able to read research critically. But for many qualified and student nurses the terminology used in research can be difficult to understand, thus making critical reading even more daunting. It is imperative in nursing that care has its foundations in sound research and it is essential that all nurses have the ability to critically appraise research to identify what is best practice. This article is a step-by-step approach to critiquing quantitative research to help nurses demystify the process and decode the terminology.

Key words: Quantitative research methodologies • Review process • Research

For many qualified nurses and nursing students research is research, and it is often quite difficult to grasp what others are referring to when they discuss the limitations and or strengths within
a research study. Research texts and journals refer to critiquing the literature, critical analysis, reviewing the literature, evaluation and appraisal of the literature, which are in essence the same thing (Bassett and Bassett, 2003). Terminology in research can be confusing for the novice research reader where a term like 'random' refers to an organized manner of selecting items or participants, and the word 'significance' is applied to a degree of chance. Thus the aim of this article is to take a step-by-step approach to critiquing research in an attempt to help nurses demystify the process and decode the terminology.

When caring for patients it is essential that nurses are using the current best practice. To determine what this is nurses must be able to read research. The adage 'All that glitters is not gold' is also true in research. Not all research is of the same quality or of a high standard and therefore nurses should not take research at face value simply because it has been published (Cullum and Droogan, 1999; Polit and Beck, 2006).

Michael Coughlan, Patricia Cronin and Frances Ryan are Lecturers, School of Nursing and Midwifery, University of Dublin, Trinity College, Dublin. Accepted for publication: March 2007.

Critiquing is a systematic method of appraising the strengths and limitations of a piece of research in order to determine its credibility and/or its applicability to practice (Valente, 2003). Seeking only limitations in a study is criticism, and critiquing and criticism are not the same (Burns and Grove, 1997). A critique is an impersonal evaluation of the strengths and limitations of the research being reviewed and should not be seen as a disparagement
of the researcher's ability. Neither should it be regarded as a jousting match between the researcher and the reviewer. Burns and Grove (1999) call this an 'intellectual critique' in that it is not the creator but the creation that is being evaluated. The reviewer maintains objectivity throughout the critique. No personal views are expressed by the reviewer, and the strengths and/or limitations of the study and the implications of these are highlighted with reference to research texts or journals. It is also important to remember that research works within the realms of probability, where nothing is absolutely certain. It is therefore important to refer to the apparent strengths, limitations and findings of a piece of research (Burns and Grove, 1997). The use of personal pronouns is also avoided in order that an appearance of objectivity can be maintained.

Credibility and integrity
There are numerous tools available to help both novice and advanced reviewers to critique research studies (Tanner, 2003). These tools generally ask questions that can help the reviewer to determine the degree to which the steps in the research process were followed. However, some steps are more important than others and very few tools acknowledge this. Ryan-Wenger (1992) suggests that questions in a critiquing tool can be subdivided into those that are useful for getting a feel for the study being presented, which she calls 'credibility variables', and those that are essential for evaluating the research process, called 'integrity variables'. Credibility variables concentrate on how believable the work appears and focus on the researcher's qualifications and ability to undertake and accurately present the study. The answers to these questions are important when critiquing a piece of research as they can offer the reader an insight into what to expect in the remainder of the study. However, the reader should be aware that identified strengths
and limitations within this section will not necessarily correspond with what will be found in the rest of the work. Integrity questions, on the other hand, are interested in the robustness of the research method, seeking to identify how appropriately and accurately the researcher followed the steps in the research process.

658 British Journal of Nursing, 2007, Vol 16, No 11

RESEARCH METHODOLOGIES

Table 1. Research questions - guidelines for critiquing a quantitative research study

Elements influencing the believability of the research

Writing style: Is the report well written (concise, grammatically correct, avoiding the use of jargon)? Is it well laid out and organized?
Author: Do the researcher(s') qualifications/position indicate a degree of knowledge in this particular field?
Report title: Is the title clear, accurate and unambiguous?
Abstract: Does the abstract offer a clear overview of the study, including the research problem, sample, methodology, findings and recommendations?

Elements influencing the robustness of the research

Purpose/research problem: Is the purpose of the study/research problem clearly identified?
Logical consistency: Does the research report follow the steps of the research process in a logical manner? Do these steps naturally flow and are the links clear?
Literature review: Is the review logically organized? Does it offer a balanced critical analysis of the literature? Is the majority of the literature of recent origin? Is it mainly from primary sources and of an empirical nature?
Theoretical framework: Has a conceptual or theoretical framework been identified? Is the framework adequately described? Is the framework appropriate?
Aims/objectives/research question/hypotheses: Have aims and objectives, a research question or hypothesis been identified? If so, are they clearly stated? Do they reflect the information presented in the literature review?
Sample: Has the target population been clearly identified? How were the sample selected? Was it a probability or non-probability sample? Is it of adequate size? Are the inclusion/exclusion criteria clearly identified?
Ethical considerations: Were the participants fully informed about the nature of the research? Was the autonomy/confidentiality of the participants guaranteed? Were the participants protected from harm? Was ethical permission granted for the study?
Operational definitions: Are all the terms, theories and concepts mentioned in the study clearly defined?
Methodology: Is the research design clearly identified? Has the data gathering instrument been described? Is the instrument appropriate? How was it developed? Were reliability and validity testing undertaken and the results discussed? Was a pilot study undertaken?
Data analysis/results: What type of data and statistical analysis was undertaken? Was it appropriate? How many of the sample participated? What was the significance of the findings?
Discussion: Are the findings linked back to the literature review? If a hypothesis was identified, was it supported? Were the strengths and limitations of the study, including generalizability, discussed? Was a recommendation for further research made?
References: Were all the books, journals and other media alluded to in the study accurately referenced?

The answers to these questions
will help to identify the trustworthiness of the study and its applicability to nursing practice.

Critiquing the research steps
In critiquing the steps in the research process a number of questions need to be asked. However, these questions are seeking more than a simple 'yes' or 'no' answer. The questions are posed to stimulate the reviewer to consider the implications of what the researcher has done. Does the way a step has been applied appear to add to the strength of the study, or does it appear as a possible limitation to implementation of the study's findings? (Table 1).

Elements influencing believability of the study

Writing style
Research reports should be well written, grammatically correct, concise and well organized. The use of jargon should be avoided where possible. The style should be such that it attracts the reader to read on (Polit and Beck, 2006).

Author(s)
The author(s') qualifications and job title can be a useful indicator of the researcher(s') knowledge of the area under investigation and ability to ask the appropriate questions (Conkin Dale, 2005). Conversely, a research study should be evaluated on its own merits and not assumed to be valid and reliable simply based on the author(s') qualifications.

Report title
The title should be between 10 and 15 words long and should clearly identify for the reader the purpose of the study (Connell Meehan, 1999). Titles that are too long or too short can be confusing or misleading (Parahoo, 2006).
Abstract
The abstract should provide a succinct overview of the research and should include information regarding the purpose of the study, method, sample size and selection, the main findings and conclusions, and recommendations (Conkin Dale, 2005). From the abstract the reader should be able to determine if the study is of interest and whether or not to continue reading (Parahoo, 2006).

Elements influencing robustness

Purpose of the study/research problem
A research problem is often first presented to the reader in the introduction to the study (Bassett and Bassett, 2003). Depending on what is to be investigated some authors will refer to it as the purpose of the study. In either case the statement should at least broadly indicate to the reader what is to be studied (Polit and Beck, 2006). Broad problems are often multi-faceted and will need to become narrower and more focused before they can be researched. In this the literature review can play a major role (Parahoo, 2006).

Logical consistency
A research study needs to follow the steps in the process in a logical manner. There should also be a clear link between the steps, beginning with the purpose of the study and following through the literature review, the theoretical framework, the research question, the methodology section, the data analysis, and the findings (Ryan-Wenger, 1992).

Literature review
The primary purpose of the literature review is to define
or develop the research question while also identifying an appropriate method of data collection (Burns and Grove, 1997). It should also help to identify any gaps in the literature relating to the problem and to suggest how those gaps might be filled. The literature review should demonstrate an appropriate depth and breadth of reading around the topic in question. The majority of studies included should be of recent origin and ideally less than five years old. However, there may be exceptions to this, for example, in areas where there is a lack of research, or a seminal or all-important piece of work that is still relevant to current practice. It is important also that the review should include some historical as well as contemporary material in order to put the subject being studied into context. The depth of coverage will depend on the nature of the subject; for example, for a subject with a vast range of literature the review will need to concentrate on a very specific area (Carnwell, 1997).

Another important consideration is the type and source of literature presented. Primary empirical data from the original source is more favourable than a secondary source or anecdotal information where the author relies on personal evidence or opinion that is not founded on research. A good review usually begins with an introduction which identifies the key words used to conduct the search and information about which databases were used. The themes that emerged from the literature should then be presented and discussed (Carnwell, 1997). In presenting previous work it is important that the data is reviewed critically, highlighting both the strengths and limitations of the study. It should also be compared and contrasted with the findings of other studies (Burns and Grove, 1997).

Theoretical framework
Following the identification of the research problem
and the review of the literature, the researcher should present the theoretical framework (Bassett and Bassett, 2003). Theoretical frameworks are a concept that novice and experienced researchers find confusing. It is initially important to note that not all research studies use a defined theoretical framework (Robson, 2002). A theoretical framework can be a conceptual model that is used as a guide for the study (Conkin Dale, 2005) or themes from the literature that are conceptually mapped and used to set boundaries for the research (Miles and Huberman, 1994). A sound framework also identifies the various concepts being studied and the relationship between those concepts (Burns and Grove, 1997). Such relationships should have been identified in the literature. The research study should then build on this theory through empirical observation. Some theoretical frameworks may include a hypothesis. Theoretical frameworks tend to be better developed in experimental and quasi-experimental studies and are often poorly developed or non-existent in descriptive studies (Burns and Grove, 1999). The theoretical framework should be clearly identified and explained to the reader.

Aims and objectives/research question/research hypothesis
The purpose of the aims and objectives of a study, the research question and the research hypothesis is to form a link between the initially stated purpose of the study or research problem and how the study will be undertaken (Burns and Grove, 1999). They should be clearly stated and be congruent with the data presented in the literature review. The use of these items is dependent on the type of research being performed. Some descriptive studies may not identify any of these items but simply refer to the purpose of the study or the research problem; others will include either aims and objectives or research questions (Burns and Grove, 1999). Correlational designs study the relationships that exist between two or
more variables and accordingly use either a research question or hypothesis. Experimental and quasi-experimental studies should clearly state a hypothesis identifying the variables to be manipulated, the population that is being studied and the predicted outcome (Burns and Grove, 1999).

Sample and sample size
The degree to which a sample reflects the population it was drawn from is known as representativeness, and in quantitative research this is a decisive factor in determining the adequacy of a study (Polit and Beck, 2006). In order to select a sample that is likely to be representative, and thus identify findings that are probably generalizable to the target population, a probability sample should be used (Parahoo, 2006). The size of the sample is also important in quantitative research as small samples are at risk of being overly representative of small subgroups within the target population. For example, if, in a sample of general nurses, it was noticed that 40% of the respondents were males, then males would appear to be over-represented in the sample, thereby creating a sampling error. The risk of sampling errors decreases as larger sample sizes are used (Burns and Grove, 1997). In selecting the sample the researcher should clearly identify who the target population are and what criteria were used to include or exclude participants. It should also be evident how the sample was selected and how many were invited to participate (Russell, 2005).

Ethical considerations
Beauchamp and Childress (2001) identify four fundamental moral principles: autonomy, non-maleficence, beneficence and justice. Autonomy infers that an individual has the right to decide freely to participate in a research study, without fear of coercion and with a full knowledge of what is being investigated. Non-maleficence implies an intention of not harming, and preventing harm occurring to, participants of both a physical and psychological nature (Parahoo, 2006). Beneficence is interpreted as the research benefiting the participant and society as a whole (Beauchamp and Childress, 2001). Justice is concerned with all participants being treated as equals and no one group of individuals receiving preferential treatment because, for example, of their position in society (Parahoo, 2006). Beauchamp and Childress (2001) also identify four moral rules that are both closely connected to each other and to the principle of autonomy. They are veracity (truthfulness), fidelity (loyalty and trust), confidentiality and privacy. The latter pair are often linked and imply that the researcher has a duty to respect the confidentiality and/or the anonymity of participants and non-participating subjects. Ethical committees or institutional review boards have to give approval before research can be undertaken. Their role is to determine that ethical principles are being applied and that the rights of the individual are being adhered to (Burns and Grove, 1999).

Operational definitions
In a research study the researcher needs to ensure that the reader understands what is meant by the terms and concepts that are used in the research. To ensure this, any concepts or terms referred to should be clearly defined (Parahoo, 2006).

Methodology: research design
Methodology refers to the nuts and bolts of how a research study is undertaken. There are a number of important elements that need to be referred to here and the first of these is the research design. There are several types of quantitative studies that can be structured under the headings of true experimental, quasi-experimental and non-experimental designs (Robson, 2002) (Table 2). Although it is outside the remit of this article, within each of these categories there is a range of designs that will impact on how the data collection and data analysis phases of the study are undertaken. However, Robson (2002) states these designs are similar in many respects as most are concerned with patterns of group behaviour, averages, tendencies and properties.

Methodology: data collection
The next element to consider after the research design is the data collection method. In a quantitative study any number of strategies can be adopted when collecting data and these can include interviews, questionnaires, attitude scales or observational tools. Questionnaires are the most commonly used data gathering instruments and consist mainly of closed questions with a choice of fixed answers. Postal questionnaires are administered via the mail and have the value of perceived anonymity. Questionnaires can also be administered in face-to-face interviews or in some instances over the telephone (Polit and Beck, 2006).

Methodology: instrument design
After identifying the appropriate data gathering method the next step that needs to be considered is the design of the instrument. Researchers have the choice of using a previously designed instrument or developing one for the study and this choice should be clearly declared for the reader. Designing an instrument is a protracted and sometimes difficult process (Burns and Grove, 1997) but the
overall aim is that the final questions will be clearly linked to the research questions, will elicit accurate information and will help achieve the goals of the research. This, however, needs to be demonstrated by the researcher.

Table 2. Research designs

Experimental
Sample: two or more groups
Sample allocation: random
Features: groups get different treatments
Outcome: cause and effect relationship

Quasi-experimental
Sample: one or more groups
Sample allocation: random
Features: one variable has not been manipulated or controlled (usually because it cannot be)
Outcome: cause and effect relationship, but less powerful than experimental

Non-experimental (e.g. descriptive; includes cross-sectional, correlational, comparative and longitudinal studies)
Sample: one or more groups
Sample allocation: not applicable
Features: discover new meaning; describe what already exists; measure the relationship between two or more variables
Outcome: possible hypotheses for future research; tentative explanations
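The random allocation that distinguishes the experimental designs in Table 2 can be illustrated with a short sketch. This is an illustrative example only; the participant identifiers, group labels and function are hypothetical and not part of the article:

```python
import random

def allocate(participants, seed=None):
    """Randomly allocate participants to a control and an
    intervention group, as in a true experimental design."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "intervention": shuffled[half:]}

# Hypothetical participant identifiers P1..P20
groups = allocate([f"P{i}" for i in range(1, 21)], seed=42)
print(len(groups["control"]), len(groups["intervention"]))  # 10 10
```

Because allocation is left to chance rather than to the researcher, systematic differences between the groups are minimized, which is what supports the cause and effect claims listed under 'Outcome' above.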
If a previously designed instrument is selected, the researcher should clearly establish that the chosen instrument is the most appropriate. This is achieved by outlining how the instrument has measured the concepts under study. Previously designed instruments are often in the form of standardized tests or scales that have been developed for the purpose of measuring a range of views, perceptions, attitudes, opinions or even abilities. There is a multitude of tests and scales available, therefore the researcher is expected to provide the appropriate evidence in relation to the validity and reliability of the instrument (Polit and Beck, 2006).

Methodology: validity and reliability
One of the most important features of any instrument is that it measures the concept being studied in an unwavering and consistent way. These are addressed under the broad headings of validity and reliability respectively. In general, validity is described as the ability of the instrument to measure what it is supposed to measure, and reliability as the instrument's ability to consistently and accurately measure the concept under study (Wood et al, 2006). For the most part, if a well established 'off the shelf' instrument has been used and not adapted in any way, the validity and reliability will have been determined already and the researcher should outline what this is. However, if the instrument has been adapted in any way or is being used for a new population then previous validity and reliability will not apply. In these circumstances the researcher should indicate how the reliability and validity of the adapted instrument was established (Polit and Beck, 2006). To establish if the chosen instrument is clear and unambiguous, and to ensure that the proposed study has been conceptually well planned, a mini-version of the main study, referred to as a pilot study, should be undertaken before
the main study. Samples used in the pilot study are generally omitted from the main study. Following the pilot study the researcher may adjust definitions, alter the research question, address changes to the measuring instrument or even alter the sampling strategy. Having described the research design, the researcher should outline in clear, logical steps the process by which the data was collected. All steps should be fully described and easy to follow (Russell, 2005).

Analysis and results
Data analysis in quantitative research studies is often seen as a daunting process. Much of this is associated with apparently complex language and the notion of statistical tests. The researcher should clearly identify what statistical tests were undertaken, why these tests were used and what the results were. A rule of thumb is that studies that are descriptive in design use only descriptive statistics; correlational, quasi-experimental and experimental studies use inferential statistics. The latter are subdivided into tests to measure relationships and differences between variables (Clegg, 1990). Inferential statistical tests are used to identify if a relationship or difference between variables is statistically significant. Statistical significance helps the researcher to rule out one important threat to validity: that the result could be due to chance rather than to real differences in the population. Quantitative studies usually identify the lowest level of significance as P≤0.05 (P = probability) (Clegg, 1990). To enhance readability researchers frequently present their findings and data analysis section under the headings
of the research questions (Russell, 2005). This can help the reviewer determine if the results that are presented clearly answer the research questions. Tables, charts and graphs may be used to summarize the results and should be accurate, clearly identified and enhance the presentation of results (Russell, 2005). The percentage of the sample who participated in the study is an important element in considering the generalizability of the results. At least fifty percent of the sample needs to participate if a response bias is to be avoided (Polit and Beck, 2006).

Discussion/conclusion/recommendations
The discussion of the findings should flow logically from the data and should be related back to the literature review, thus placing the study in context (Russell, 2005). If the hypothesis was deemed to have been supported by the findings, the researcher should develop this in the discussion. If a theoretical or conceptual framework was used in the study then the relationship with the findings should be explored. Any interpretations or inferences drawn should be clearly identified as such and consistent with the results. The significance of the findings should be stated, but these should be considered within the overall strengths and limitations of the study (Polit and Beck, 2006). In this section some consideration should be given to whether or not the findings of the study were generalizable, also referred to as external validity. Not all studies make a claim to generalizability, but the researcher should have undertaken an assessment of the key factors in the design, sampling and analysis of the study to support any such claim. Finally the researcher should have explored the clinical significance and relevance of the study. Applying findings
in practice should be suggested with caution and will obviously depend on the nature and purpose of the study. In addition, the researcher should make relevant and meaningful suggestions for future research in the area (Connell Meehan, 1999).

References
The research study should conclude with an accurate list of all the books, journal articles, reports and other media that were referred to in the work (Polit and Beck, 2006). The referenced material is also a useful source of further information on the subject being studied.

Conclusions
The process of critiquing involves an in-depth examination of each stage of the research process. It is not a criticism but rather an impersonal scrutiny of a piece of work using a balanced and objective approach, the purpose of which is to highlight both strengths and weaknesses, in order to identify whether a piece of research is trustworthy and unbiased. As nursing practice is becoming increasingly more evidence based, it is important that care has its foundations in sound research. It is therefore important that all nurses have the ability to critically appraise research in order to identify what
is best practice.

Bassett C, Bassett J (2003) Reading and critiquing research. Br J Perioper Nurs 13(4): 162-4
Beauchamp T, Childress J (2001) Principles of Biomedical Ethics. 5th edn. Oxford University Press, Oxford
Burns N, Grove S (1997) The Practice of Nursing Research: Conduct, Critique and Utilization. 3rd edn. WB Saunders Company, Philadelphia
Burns N, Grove S (1999) Understanding Nursing Research. 2nd edn. WB Saunders Company, Philadelphia
Carnwell R (1997) Critiquing research. Nurs Pract 8(12): 16-21
Clegg F (1990) Simple Statistics: A Course Book for the Social Sciences. 2nd edn. Cambridge University Press, Cambridge
Conkin Dale J (2005) Critiquing research for use in practice. J Pediatr Health Care 19: 183-6
Connell Meehan T (1999) The research critique. In: Treacy P, Hyde A, eds. Nursing Research and Design. UCD Press, Dublin: 57-74
Cullum N, Droogan J (1999) Using research and the role of systematic reviews of the literature. In: Mulhall A, Le May A, eds. Nursing Research: Dissemination and Implementation. Churchill Livingstone, Edinburgh: 109-23
Miles M, Huberman A (1994) Qualitative Data Analysis. 2nd edn. Sage, Thousand Oaks, CA
Parahoo K (2006) Nursing Research: Principles, Process and Issues. 2nd edn. Palgrave Macmillan, Houndmills, Basingstoke
Polit D, Beck C (2006) Essentials of Nursing Research: Methods, Appraisal and Utilization. 6th edn. Lippincott Williams and Wilkins, Philadelphia
Robson C (2002) Real World Research. 2nd edn. Blackwell Publishing, Oxford
Russell C (2005) Evaluating quantitative research reports. Nephrol Nurs J 32(1): 61-4
Ryan-Wenger N (1992) Guidelines for critique of a research report. Heart Lung 21(4): 394-401
Tanner J (2003) Reading and critiquing research. Br J Perioper Nurs 13(4): 162-4
Valente S (2003) Research dissemination and utilization: improving care at the bedside. J Nurs Care Quality 18(2): 114-21
Wood MJ, Ross-Kerr JC, Brink PJ (2006) Basic Steps in Planning Nursing Research: From Question to Proposal. 6th edn. Jones and Bartlett, Sudbury

KEY POINTS

• Many qualified and student nurses have difficulty understanding the concepts and terminology associated with research and research critique.
• The ability to critically read research is essential if the profession is to achieve and maintain its goal to be evidence based.
• A critique of a piece of research is not a criticism of the work, but an impersonal review to highlight the strengths and limitations of the study.
• It is important that all nurses have the ability to critically appraise research in order to identify what is best practice.
Michael Coughlan, Patricia Cronin and Frances Ryan are Lecturers, School of Nursing and Midwifery, University of Dublin, Trinity College, Dublin
Accepted for publication: March 2007

Nurses should not take research at face value simply because it has been published (Cullum and Droogan, 1999; Polit and Beck, 2006). Critiquing is a systematic method of appraising the strengths and limitations of a piece of research in order to determine its credibility and/or its applicability to practice (Valente, 2003). Seeking only limitations in a study is criticism, and critiquing and criticism are not the same (Burns and Grove, 1997). A critique is an impersonal evaluation of the strengths and limitations of the research being reviewed and should not be seen as a disparagement of the researcher's ability. Neither should it be regarded as a jousting match between the researcher and the reviewer.
Burns and Grove (1999) call this an 'intellectual critique' in that it is not the creator but the creation that is being evaluated. The reviewer maintains objectivity throughout the critique. No personal views are expressed by the reviewer, and the strengths and/or limitations of the study, and the implications of these, are highlighted with reference to research texts or journals. It is also important to remember that research works within the realms of probability, where nothing is absolutely certain. It is therefore important to refer to the apparent strengths, limitations and findings of a piece of research (Burns and Grove, 1997). The use of personal pronouns is also avoided in order that an appearance of objectivity can be maintained.

Credibility and integrity
There are numerous tools available to help both novice and advanced reviewers to critique research studies (Tanner, 2003). These tools generally ask questions that can help the reviewer to determine the degree to which the steps in the research process were followed. However, some steps are more important than others and very few tools acknowledge this. Ryan-Wenger (1992) suggests that questions in a critiquing tool can be subdivided into those that are useful for getting a feel for the study being presented, which she calls 'credibility variables', and those that are essential for evaluating the research process, called 'integrity variables'. Credibility variables concentrate on how believable the work appears and focus on the researcher's qualifications and ability to undertake and accurately present the study. The answers to these questions are important when critiquing a piece of research as they can offer the reader an insight into what to expect in the remainder of the study. However, the reader should be aware that identified strengths and limitations within this section will not necessarily correspond with what will be found in the rest of the work.
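Ryan-Wenger's (1992) distinction can be sketched as a small data structure for a critiquing checklist. The questions are paraphrased from Table 1, but the structure and the tallying function are illustrative assumptions, not part of any published tool:

```python
# Illustrative sketch: a critiquing checklist split into credibility
# variables (how believable the report appears) and integrity
# variables (how rigorously the research process was followed).
CHECKLIST = {
    "credibility": [
        "Is the report well written, concise and free of jargon?",
        "Do the authors' qualifications indicate knowledge of the field?",
        "Is the title clear, accurate and unambiguous?",
    ],
    "integrity": [
        "Is the purpose of the study clearly identified?",
        "Was the sample selection method appropriate?",
        "Were reliability and validity testing undertaken?",
    ],
}

def summarize(answers):
    """Count 'yes' answers per category. `answers` maps a question
    to True/False; unanswered questions count as False."""
    return {
        category: sum(answers.get(q, False) for q in questions)
        for category, questions in CHECKLIST.items()
    }
```

Keeping the two categories separate mirrors the point above: credibility answers give a feel for the study, while integrity answers bear on its trustworthiness.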
Integrity questions, on the other hand, are interested in the robustness of the research method, seeking to identify how appropriately and accurately the researcher followed the steps in the research process. The answers to these questions will help to identify the trustworthiness of the study and its applicability to nursing practice.

Table 1. Research questions - guidelines for critiquing a quantitative research study

Elements influencing the believability of the research

Writing style: Is the report well written - concise, grammatically correct, avoiding the use of jargon? Is it well laid out and organized?
Author: Do the researcher(s') qualifications/position indicate a degree of knowledge in this particular field?
Report title: Is the title clear, accurate and unambiguous?
Abstract: Does the abstract offer a clear overview of the study, including the research problem, sample, methodology, findings and recommendations?

Elements influencing the robustness of the research

Purpose/research problem: Is the purpose of the study/research problem clearly identified?
Logical consistency: Does the research report follow the steps of the research process in a logical manner? Do these steps naturally flow and are the links clear?
Literature review: Is the review logically organized? Does it offer a balanced critical analysis of the literature? Is the majority of the literature of recent origin? Is it mainly from primary sources and of an empirical nature?
Theoretical framework: Has a conceptual or theoretical framework been identified? Is the framework adequately described? Is the framework appropriate?
Aims/objectives/research question/hypotheses: Have aims and objectives, a research question or hypothesis been identified? If so, are they clearly stated? Do they reflect the information presented in the literature review?
Sample: Has the target population been clearly identified? How were the sample selected? Was it a probability or non-probability sample? Is it of adequate size? Are the inclusion/exclusion criteria clearly identified?
Ethical considerations: Were the participants fully informed about the nature of the research? Was the autonomy/confidentiality of the participants guaranteed? Were the participants protected from harm? Was ethical permission granted for the study?
Operational definitions: Are all the terms, theories and concepts mentioned in the study clearly defined?
Methodology: Is the research design clearly identified? Has the data gathering instrument been described? Is the instrument appropriate? How was it developed? Were reliability and validity testing undertaken and the results discussed? Was a pilot study undertaken?
Data analysis/results: What type of data and statistical analysis was undertaken? Was it appropriate? How many of the sample participated? What is the significance of the findings?
Discussion: Are the findings linked back to the literature review? If a hypothesis was identified, was it supported? Were the strengths and limitations of the study, including generalizability, discussed? Was a recommendation for further research made?
References: Were all the books, journals and other media alluded to in the study accurately referenced?

Critiquing the research steps
In critiquing the steps in the research process a number of questions need to be asked. However, these questions are seeking more than a simple 'yes' or 'no' answer. The questions are posed to stimulate the reviewer to consider the implications of what the researcher has done. Does the way a step has been applied appear to add to the strength of the study, or does it appear as a possible limitation to implementation of the study's findings? (Table 1).

Elements influencing believability of the study

Writing style
Research reports should be well written, grammatically correct, concise and well organized. The use of jargon should be avoided where possible. The style should be such that it attracts the reader to read on (Polit and Beck, 2006).

Author(s)
The author(s') qualifications and job title can be a useful indicator of the researcher(s') knowledge of the area under investigation and ability to ask the appropriate questions (Conkin Dale, 2005). Conversely, a research study should be evaluated on its own merits and not assumed to be valid and reliable simply based on the author(s') qualifications.

Report title
The title should be between 10 and 15 words long and should clearly identify for the reader the purpose of the study (Connell Meehan, 1999). Titles that are too long or too short can be confusing or misleading (Parahoo, 2006).

Abstract
The abstract should provide a succinct overview of the
  • 32. research and should include information regarding the purpose of the study, method, sample size and selection. Hritislijourn.il of Nursing. 2007. Vol 16. No 11 659 the main findings and conclusions and recommendations (Conkin Dale, 2005). From the abstract the reader should be able to determine if the study is of interest and whether or not to continue reading (Parahoo, 2006). Eiements influencing robustness Purpose of the study/research problem A research problem is often first presented to the reader in the introduction to the study (Bassett and Bassett, 2003). Depending on what is to be investigated some authors will refer to it as the purpose of the study. In either case the statement should at least broadly indicate to the reader what is to be studied (Polit and Beck, 2006). Broad problems are often multi-faceted and will need to become narrower and more focused before they can be researched. In this the literature review can play a major role (Parahoo, 2006). Logical consistency A research study needs to follow the steps in the process in a logical manner.There should also be a clear link between the steps beginning with the purpose of the study and following through the literature review, the theoretical framework, the research question, the methodology section, the data analysis, and the findings (Ryan-Wenger, 1992). Literature review The primary purpose of the literature review is to define or develop the research question while also identifying an appropriate method of data collection (Burns and
  • 33. Grove, 1997). It should also help to identify any gaps in the literature relating to the problem and to suggest how those gaps might be filled. The literature review should demonstrate an appropriate depth and breadth of reading around the topic in question. The majority of studies included should be of recent origin and ideally less than five years old. However, there may be exceptions to this, for example, in areas where there is a lack of research, or a seminal or all-important piece of work that is still relevant to current practice. It is important also that the review should include some historical as well as contemporary material in order to put the subject being studied into context. The depth of coverage will depend on the nature of the subject, for example, for a subject with a vast range of literature then the review will need to concentrate on a very specific area (Carnwell, 1997). Another important consideration is the type and source of hterature presented. Primary empirical data from the original source is more favourable than a secondary source or anecdotal information where the author relies on personal evidence or opinion that is not founded on research. A good review usually begins with an introduction which identifies the key words used to conduct the search and information about which databases were used. The themes that emerged from the literature should then be presented and discussed (Carnwell, 1997). In presenting previous work it is important that the data is reviewed critically, highlighting both the strengths and limitations of the study. It should also be compared and contrasted with the findings of other studies (Burns and Grove, 1997). Theoretical framework Following the identification of the research problem and the review of the literature the researcher should present the theoretical framework (Bassett and Bassett,
2003). Theoretical frameworks are a concept that novice and experienced researchers find confusing. It is initially important to note that not all research studies use a defined theoretical framework (Robson, 2002). A theoretical framework can be a conceptual model that is used as a guide for the study (Conkin Dale, 2005) or themes from the literature that are conceptually mapped and used to set boundaries for the research (Miles and Huberman, 1994). A sound framework also identifies the various concepts being studied and the relationship between those concepts (Burns and Grove, 1997). Such relationships should have been identified in the literature. The research study should then build on this theory through empirical observation. Some theoretical frameworks may include a hypothesis. Theoretical frameworks tend to be better developed in experimental and quasi-experimental studies and often poorly developed or non-existent in descriptive studies (Burns and Grove, 1999). The theoretical framework should be clearly identified and explained to the reader.

Aims and objectives/research question/research hypothesis
The purpose of the aims and objectives of a study, the research question and the research hypothesis is to form a link between the initially stated purpose of the study or research problem and how the study will be undertaken (Burns and Grove, 1999). They should be clearly stated and be congruent with the data presented in the literature review. The use of these items is dependent on the type of research being performed. Some descriptive studies may not identify any of these items but simply refer to the purpose of the study or the research problem; others will include either aims and objectives or research questions (Burns and Grove, 1999). Correlational designs study the relationships that exist between two or more variables and accordingly use either a research question or hypothesis. Experimental and quasi-experimental studies
should clearly state a hypothesis identifying the variables to be manipulated, the population that is being studied and the predicted outcome (Burns and Grove, 1999).

Sample and sample size
The degree to which a sample reflects the population it was drawn from is known as representativeness, and in quantitative research this is a decisive factor in determining the adequacy of a study (Polit and Beck, 2006). In order to select a sample that is likely to be representative, and thus identify findings that are probably generalizable to the target population, a probability sample should be used (Parahoo, 2006). The size of the sample is also important in quantitative research as small samples are at risk of being overly representative of small subgroups within the target population. For example, if, in a sample of general nurses, it was noticed that 40% of the respondents were males, then males would appear to be over-represented in the sample, thereby creating a sampling error. The risk of sampling errors decreases as larger sample sizes are used (Burns and Grove, 1997). In selecting the sample the researcher should clearly identify who the target population are and what criteria were used to include or exclude participants. It should also be evident how the sample was selected and how many were invited to participate (Russell, 2005).

Ethical considerations
Beauchamp and Childress (2001) identify four fundamental moral principles: autonomy, non-maleficence, beneficence
and justice. Autonomy infers that an individual has the right to freely decide to participate in a research study without fear of coercion and with a full knowledge of what is being investigated. Non-maleficence implies an intention of not harming, and preventing harm occurring to, participants of both a physical and psychological nature (Parahoo, 2006). Beneficence is interpreted as the research benefiting the participant and society as a whole (Beauchamp and Childress, 2001). Justice is concerned with all participants being treated as equals and no one group of individuals receiving preferential treatment because, for example, of their position in society (Parahoo, 2006).

Beauchamp and Childress (2001) also identify four moral rules that are both closely connected to each other and with the principle of autonomy. They are veracity (truthfulness), fidelity (loyalty and trust), confidentiality and privacy. The latter pair are often linked and imply that the researcher has a duty to respect the confidentiality and/or the anonymity of participants and non-participating subjects. Ethical committees or institutional review boards have to give approval before research can be undertaken. Their role is to determine that ethical principles are being applied and that the rights of the individual are being adhered to (Burns and Grove, 1999).

Operational definitions
In a research study the researcher needs to ensure that the reader understands what is meant by the terms and concepts that are used in the research. To ensure this, any concepts or terms referred to should be clearly defined (Parahoo, 2006).

Methodology: research design
Methodology refers to the nuts and bolts of how a research study is undertaken. There are a number of
important elements that need to be referred to here, and the first of these is the research design. There are several types of quantitative studies that can be structured under the headings of true experimental, quasi-experimental and non-experimental designs (Robson, 2002) (Table 2). Although it is outside the remit of this article, within each of these categories there is a range of designs that will impact on how the data collection and data analysis phases of the study are undertaken. However, Robson (2002) states these designs are similar in many respects as most are concerned with patterns of group behaviour, averages, tendencies and properties.

Methodology: data collection
The next element to consider after the research design is the data collection method. In a quantitative study any number of strategies can be adopted when collecting data, and these can include interviews, questionnaires, attitude scales or observational tools. Questionnaires are the most commonly used data gathering instruments and consist mainly of closed questions with a choice of fixed answers. Postal questionnaires are administered via the mail and have the value of perceived anonymity. Questionnaires can also be administered in face-to-face interviews or in some instances over the telephone (Polit and Beck, 2006).

Methodology: instrument design
After identifying the appropriate data gathering method, the next step that needs to be considered is the design of the instrument. Researchers have the choice of using a previously designed instrument or developing one for the study, and this choice should be clearly declared for the reader. Designing an instrument is a protracted and sometimes difficult process (Burns and Grove, 1997), but the overall aim is that the final questions will be clearly linked to the research questions and will elicit accurate information
and will help achieve the goals of the research. This, however, needs to be demonstrated by the researcher.

Table 2. Research designs

Experimental
  Sample: 2 or more groups
  Sample allocation: Random
  Features: Groups get different treatments
  Outcome: Cause and effect relationship

Quasi-experimental
  Sample: One or more groups
  Sample allocation: Random
  Features: One variable has not been manipulated or controlled (usually because it cannot be)
  Outcome: Cause and effect relationship, but less powerful than experimental

Non-experimental (e.g. descriptive; includes cross-sectional, correlational, comparative and longitudinal studies)
  Sample: One or more groups
  Sample allocation: Not applicable
  Features: Discover new meaning; describe what already exists; measure the relationship between two or more variables
  Outcome: Possible hypothesis for future research; tentative explanations

If a previously designed instrument is selected the researcher
should clearly establish that the chosen instrument is the most appropriate. This is achieved by outlining how the instrument has measured the concepts under study. Previously designed instruments are often in the form of standardized tests or scales that have been developed for the purpose of measuring a range of views, perceptions, attitudes, opinions or even abilities. There are a multitude of tests and scales available; therefore the researcher is expected to provide the appropriate evidence in relation to the validity and reliability of the instrument (Polit and Beck, 2006).

Methodology: validity and reliability
One of the most important features of any instrument is that it measures the concept being studied in an unwavering and consistent way. These are addressed under the broad headings of validity and reliability respectively. In general, validity is described as the ability of the instrument to measure what it is supposed to measure, and reliability as the instrument's ability to consistently and accurately measure the concept under study (Wood et al, 2006). For the most part, if a well-established 'off the shelf' instrument has been used and not adapted in any way, the validity and reliability will have been determined already and the researcher should outline what this is. However, if the instrument has been adapted in any way or is being used for a new population then previous validity and reliability will not apply. In these circumstances the researcher should indicate how the reliability and validity of the adapted instrument was established (Polit and Beck, 2006).

To establish if the chosen instrument is clear and unambiguous, and to ensure that the proposed study has been conceptually well planned, a mini-version of the main study, referred to as a pilot study, should be undertaken before the main study. Samples used in the pilot study are generally omitted from the main study. Following the pilot study the
researcher may adjust definitions, alter the research question, address changes to the measuring instrument or even alter the sampling strategy. Having described the research design, the researcher should outline in clear, logical steps the process by which the data was collected. All steps should be fully described and easy to follow (Russell, 2005).

Analysis and results
Data analysis in quantitative research studies is often seen as a daunting process. Much of this is associated with apparently complex language and the notion of statistical tests. The researcher should clearly identify what statistical tests were undertaken, why these tests were used and what were the results. A rule of thumb is that studies that are descriptive in design use only descriptive statistics; correlational studies, quasi-experimental and experimental studies use inferential statistics. The latter is subdivided into tests to measure relationships and differences between variables (Clegg, 1990).

Inferential statistical tests are used to identify if a relationship or difference between variables is statistically significant. Statistical significance helps the researcher to rule out one important threat to validity, namely that the result could be due to chance rather than to real differences in the population. Quantitative studies usually identify the lowest level of significance as P≤0.05 (P = probability) (Clegg, 1990). To enhance readability researchers frequently present their findings and data analysis section under the headings of the research questions (Russell, 2005). This can help the reviewer determine if the results that are presented clearly
answer the research questions. Tables, charts and graphs may be used to summarize the results and should be accurate, clearly identified and enhance the presentation of results (Russell, 2005). The percentage of the sample who participated in the study is an important element in considering the generalizability of the results. At least fifty percent of the sample is needed to participate if a response bias is to be avoided (Polit and Beck, 2006).

Discussion/conclusion/recommendations
The discussion of the findings should flow logically from the data and should be related back to the literature review, thus placing the study in context (Russell, 2002). If the hypothesis was deemed to have been supported by the findings, the researcher should develop this in the discussion. If a theoretical or conceptual framework was used in the study then the relationship with the findings should be explored. Any interpretations or inferences drawn should be clearly identified as such and consistent with the results. The significance of the findings should be stated, but these should be considered within the overall strengths and limitations of the study (Polit and Beck, 2006). In this section some consideration should be given to whether or not the findings of the study were generalizable, also referred to as external validity. Not all studies make a claim to generalizability, but the researcher should have undertaken an assessment of the key factors in the design, sampling and analysis of the study to support any such claim. Finally the researcher should have explored the clinical significance and relevance of the study. Applying findings in practice should be suggested with caution and will obviously depend on the nature and purpose of the study.
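The distinction drawn above between descriptive statistics (summarizing what was observed) and inferential statistics (judging whether a difference is statistically significant at P≤0.05) can be illustrated with a short worked example. This is an illustrative sketch only, not part of the original article: the anxiety scores are invented, and for simplicity the t ratio is compared against the familiar two-tailed critical value of about 1.96 rather than an exact P value.

```python
from statistics import mean, stdev

# Hypothetical data (not from the article): anxiety scores for a
# treatment group and a control group in a quantitative study.
treatment = [12, 14, 11, 9, 13, 10, 12, 11, 13, 10]
control = [15, 17, 14, 16, 13, 18, 15, 14, 16, 17]

# Descriptive statistics: summarize the samples.
m1, m2 = mean(treatment), mean(control)
s1, s2 = stdev(treatment), stdev(control)

# Inferential statistics: an independent t ratio for equal-sized groups.
# SEM = s / sqrt(n);  SEd = sqrt(SEM1^2 + SEM2^2);  t = (M1 - M2) / SEd
n = len(treatment)
se_d = (s1**2 / n + s2**2 / n) ** 0.5
t = (m1 - m2) / se_d

# With a two-tailed critical value of roughly 1.96 (alpha = .05),
# |t| >= 1.96 suggests the difference is unlikely to be due to chance alone.
significant = abs(t) >= 1.96
print(round(m1, 1), round(m2, 1), round(t, 2), significant)
```

A reviewer critiquing a study would expect the report to state both layers: the descriptive summary (means, standard deviations) and the inferential decision (the test used, the statistic and whether P≤0.05 was reached).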
In addition, the researcher should make relevant and meaningful suggestions for future research in the area (Connell Meehan, 1999).

References
The research study should conclude with an accurate list of all the books, journal articles, reports and other media that were referred to in the work (Polit and Beck, 2006). The referenced material is also a useful source of further information on the subject being studied.

Conclusions
The process of critiquing involves an in-depth examination of each stage of the research process. It is not a criticism but rather an impersonal scrutiny of a piece of work using a balanced and objective approach, the purpose of which is to highlight both strengths and weaknesses, in order to identify whether a piece of research is trustworthy and unbiased. As nursing practice is becoming increasingly more evidence based, it is important that care has its foundations in sound research. It is therefore important that all nurses have the ability to critically appraise research in order to identify what is best practice.
Russell C (2005) Evaluating quantitative research reports. Nephrol Nurs J 32(1): 61-4
Ryan-Wenger N (1992) Guidelines for critique of a research report. Heart Lung 21(4): 394-401
Tanner J (2003) Reading and critiquing research. Br J Perioper Nurs 13(4): 162-4
Valente S (2003) Research dissemination and utilization: improving care at the bedside. J Nurs Care Quality 18(2): 114-21
Wood MJ, Ross-Kerr JC, Brink PJ (2006) Basic Steps in Planning Nursing Research: From Question to Proposal. 6th edn. Jones and Bartlett, Sudbury
Bassett C, Bassett J (2003) Reading and critiquing research. Br J Perioper Nurs 13(4): 162-4
Beauchamp T, Childress J (2001) Principles of Biomedical Ethics. 5th edn. Oxford University Press, Oxford
Burns N, Grove S (1997) The Practice of Nursing Research: Conduct, Critique and Utilization. 3rd edn. WB Saunders Company, Philadelphia
Burns N, Grove S (1999) Understanding Nursing Research. 2nd edn. WB Saunders Company, Philadelphia
Carnwell R (1997) Critiquing research. Nurs Pract 8(12): 16-21
Clegg F (1990) Simple Statistics: A Course Book for the Social Sciences. 2nd edn. Cambridge University Press, Cambridge
Conkin Dale J (2005) Critiquing research for use in practice. J Pediatr Health Care 19: 183-6
Connell Meehan T (1999) The research critique. In: Treacy P, Hyde A, eds. Nursing Research and Design. UCD Press, Dublin: 57-74
Cullum N, Droogan J (1999) Using research and the role of systematic reviews of the literature. In: Mulhall A, Le May A, eds. Nursing Research: Dissemination and Implementation. Churchill Livingstone, Edinburgh: 109-23
Miles M, Huberman A (1994) Qualitative Data Analysis. 2nd edn. Sage, Thousand Oaks, CA
Parahoo K (2006) Nursing Research: Principles, Process and Issues. 2nd edn. Palgrave Macmillan, Houndmills, Basingstoke
Polit D, Beck C (2006) Essentials of Nursing Research: Methods, Appraisal and Utilization. 6th edn. Lippincott Williams and Wilkins, Philadelphia
Robson C (2002) Real World Research. 2nd edn. Blackwell Publishing, Oxford

KEY POINTS
• Many qualified and student nurses have difficulty understanding the concepts and terminology associated with research and research critique.
• The ability to critically read research is essential if the profession is to achieve and maintain its goal to be evidence based.
• A critique of a piece of research is not a criticism of the work, but an impersonal review to highlight the strengths and limitations of the study.
• It is important that all nurses have the ability to critically appraise research in order to identify what is best practice.
Jeff Rotman/The Image Bank/Getty Images

chapter 7
Dependent t-Tests and Repeated Measures Analysis of Variance

Learning Objectives
After reading this chapter, you will be able to . . .
1. describe the impact that initial between-groups differences have on test results when using the t-test or analysis of variance.
2. compare the independent t-test to the dependent-groups t-test.
3. complete a dependent- (paired/repeated-) samples t-test.
4. explain what power means in statistical testing.
5. compare the one-way ANOVA to the repeated-measures ANOVA.
6. complete a repeated-measures ANOVA.
7. interpret results and draw conclusions of within-group designs.
8. present within-group analysis results in APA format.
9. employ the Wilcoxon signed-ranks W-test and Friedman's nonparametric ANOVA.
suk85842_07_c07.indd 235 10/23/13 1:29 PM

CHAPTER 7 Section 7.1 Reconsidering the t and F Ratios

Tests of significant difference, such as the t-test and analysis of variance, are of two kinds: tests involving independent (or between) groups and those that employ related, or dependent (or within) groups. The tests covered to this point in the book have involved only independent groups tests. However, there are important advantages related to the dependent groups procedures, and they are used frequently in data analysis. In this chapter, the focus will be on the dependent groups equivalents of the independent t-test and the one-way ANOVA. Since the same groups are used over time or treatments, these are called dependent/within-groups designs, whereas matched or equivalent groups can also be employed as an alternative design; all are collectively known as repeated-measures designs. Although repeated-measures designs answer the same questions as
their independent groups equivalents (i.e., are there significant differences within groups, across times/treatments, or between matched/equivalent groups), under particular circumstances they can do so with greater economy and more statistical power.

7.1 Reconsidering the t and F Ratios

The scores produced in both the independent t and the one-way ANOVA are ratios. In the case of the t-test, the ratio is the result of dividing the difference between the means of the groups by the standard error of the difference:

t = (M1 − M2) / SEd

With ANOVA, the F ratio is the mean square between divided by the mean square within:

F = MSbet / MSwith

With either t or F, the denominator in the ratio reflects how much scores vary within (rather than between) the groups of subjects involved in the study. These differences are easiest to see in the way the standard error of the difference is calculated for a t-test. When group sizes are equal, the formula is

SEd = √((SEM1)² + (SEM2)²)

with SEM =
s / √n

and s is, of course, a measure of score variation in any group. So the standard error of the difference is based on the standard error of the mean, which in turn is based on the standard deviation. These connections make it clear that within-group score variance in a t-test has its root in the standard deviation for each group of scores. If we reverse the order and work from the standard deviation back to the standard error of the difference, note the following:

• When scores vary substantially in a group, it is reflected in a large standard deviation.
• When the standard deviation is relatively large, the standard error of the mean must likewise be large because the standard deviation is the numerator in the formula for SEM.
• A large standard error of the mean results in a large standard error of the difference because that statistic is the square root of the sum of the squared standard errors of the mean.
• When the standard error of the difference is large, the difference between the means has to be correspondingly larger in order for the result to be statistically significant. The table of critical values indicates that no t ratio (the ratio of the difference between the means to the standard error of the difference) may be less than 1.96 for a two-tailed test, or less than 1.645 for a one-tailed test, based on the critical α = .05.

Error Variance

The point of this is that the value of t in the t-test (and it is the same for F in an ANOVA) is greatly affected by the amount of variability within the groups involved. When the variability within those groups is extensive, the values of t and F are correspondingly diminished and less likely to be statistically significant than when there is relatively little variability within the groups. These differences within groups stem from differences in the way individuals within the samples react to whatever treatment is the independent variable; different people respond differently to the same stimulus. These differences represent
error variance, which is what occurs whenever scores differ for reasons not related to the influence of the IV.

Other Sources of Error Variance

Within-group differences are not the only source of error variance in the calculation of t and F. Both the t-test and ANOVA are based on the assumption that the groups involved are equivalent before the independent variable is introduced. In a t-test where the impact of relaxation therapy on clients' anxiety is the issue, the assumption is that before the therapy is introduced, the group which will receive the therapy and the control group which will not begin with equivalent levels of anxiety. That assumption is the key to attributing any differences after the treatment to the therapy, the IV.

Confounding Variables

In comparisons such as this, the initial equivalence of the groups can be a problem, however. Maybe there were differences in anxiety before the therapy was introduced. There might be differences in the employment circumstances of each group, and perhaps those threatened with unemployment are more anxious than the others. Maybe there are age-related differences. These other influences that are not controlled in an experiment are sometimes called confounding variables.

If a psychologist wants to examine the impact that a substance abuse program has on
addicts' behavior, a study might be set up as follows. Two groups of the same number of addicts are selected, and the substance abuse program is provided to one group. After the program, the psychologist measures the level of substance abuse in both groups to see whether there is a difference.

Try It!
A: If the size of the standard deviation is related to the size of the group, in a t-test, what is the relationship between sample size and error?
  • 56. variable, they contribute to error variance. There is error variance any time the dependent variable (DV) scores fluctuate for reasons unrelated to the IV. Therefore, there is error variance reflected in the variability within groups, and there is error variance represented in any difference between groups that is not related to the IV. Test results can be meaningful only when the score variance that is related to the independent variable is substantially greater than the error variance—what is controlled must contrib- ute more to score values than what is left uncontrolled. This makes it important to look for ways to control error variance so that it is not confused with the variability in scores that stems from the independent variable. Controlling for confounding variables is a necessary research activity. A confounding variable can affect the IV-DV relationship, thereby lowering internal validity and thus the statistical conclusion validity of your findings. Failing to take confounding variables into account can result in misleading data and erroneous conclu- sions, to the detriment of the researcher’s reputation. In other words, be careful of research findings and sweeping general statements as there may be several confounding elements. That said, controlling for extraneous confounding variables could be done in several ways. 7.2 Dependent-Groups Designs Ideally, any before-the-treatment differences between the groups in a study will be min- imal. Recall that random selection occurs when every member of a population has an
  • 57. equal chance of being selected. The logic behind random selection is that when groups are randomly drawn from the same population, they will differ only by chance, but they will differ because no sample can represent the population with complete fidelity, and occa- sionally, the chance differences will affect the way subjects respond to the IV. One way to reduce error variance is to adopt what are called dependent-groups designs. The independent t-test and the one- way ANOVA required independent groups. Members of one group can- not also be members of other groups in the same study. However, in the case of the t-test, if the same group is measured, exposed to the treatment, and then measured again, an important source of error variance is controlled. Using the same group twice makes initial equivalence no longer a concern. Any scoring variability between the first and second measure should more accurately reflect the impact of the independent variable. The Dependent-Samples t-Tests One dependent-groups test where the same group is measured twice is called the before/ after t-test, also known as the pre/post t-test. An alternative is called the matched-pairs or dependent-samples t-test, where each participant in the first group is matched to someone in the second group who has a similar characteristic.
  • 58. Yet a third alternative that B How does random selection attempt to control error variance in statistical testing? Try It! suk85842_07_c07.indd 238 10/23/13 1:29 PM CHAPTER 7Section 7.2 Dependent-Groups Designs is basically the same as a before/after design is the within- treatment design where each participant is used across two treatment groups (usually given at two different times, which makes it the same as the before/after t-test). In the latter option the participant acts as his or her own control where one of the treatments may be a placebo. All three types of dependent-samples t-tests have the same objective, which is controlling the error variance that is due to initial between-groups differences. Following are examples of each test. • The before/after design: A researcher is interested in the impact that positive rein- forcement has on employees’ sales productivity. Besides the sales commission, the researcher introduces a rewards program that can result in increased vaca- tion time. The researcher gauges sales productivity for a month,
  • 59. introduces the rewards program, and gauges sales productivity during the second month for the same people. The researcher will explore differences in employee productivity before and after the positive reinforcement intervention (the rewards program). If significance was obtained then the null (i.e., there is no difference in employee productivity after the introduction of the rewards program) can be rejected and find support for the alternative hypothesis (i.e., there is a significant increase in employee productivity after the introduction of the rewards program). • The matched-pairs design: A school counselor is interested in the impact that verbal reinforcement has on students’ reading achievement. To eliminate between-groups differences, the researcher selects 30 people for the treatment group and matches each person in the treatment group to someone in the control group who has a similar reading score on a standardized test. The researcher then introduces the verbal reinforcement program to those in the treatment group for a specified period and over time compares the performance of students in the two groups as well as their performance within the group. The matched-pairs design is similar to an inde- pendent group design with one major exception: that the groups are as matched (or equivalent) to each other as closely as possible based on a particular measure—in
this case the match (or equivalent) is based on reading scores on a standardized test.
• Within-treatment design: A psychiatrist measures each study participant on taking a placebo, and then the actual drug for depression, to test for significant differences over the two treatments (placebo versus drug). Here a counterbalancing design may be employed to minimize the order effects that plague repeated-measures designs. Specifically, the order in which treatments are given can influence the outcome. Therefore, the treatments are given to the groups at different times, as depicted in Figure 7.1.
Figure 7.1: Counterbalance design
Group 1: Treatment A, then Treatment B, then Posttest
Group 2: Treatment B, then Treatment A, then Posttest
Source: Oskar Blakstad (May 8, 2009). Counterbalanced Measures Design by Martyn Shuttleworth. Retrieved Aug 22, 2013 from Explorable.com: http://explorable.com/counterbalanced-measures-design
Although there are differences in how the tests are set up, calculating the t-statistic is the same in each case. The differences between the approaches are conceptual, not
mathematical. Both approaches have the same purpose: to control for any score variation stemming from nonrelevant factors. They both reduce the error variance that comes from using nonequivalent groups. Therefore, testing for homogeneity of variance (the Levene's test) is moot here, as we are not dealing with differences between groups. On the other hand, we are dealing with variance within groups and across pairs of treatments. If there are such significant differences, then this issue constitutes what is described as a violation of sphericity, which will be discussed in more depth in Section 7.3 when we examine repeated-measures ANOVA.
Calculating t in a Dependent-Groups Design
Although the differences between before/after, matched-pairs, and within-treatment t-tests are not math-related, there are several approaches to calculating the t statistic in the dependent-groups tests. Whatever their differences, they all take into account the fact that the two sets of scores are related. One approach is to calculate the correlation between the two sets of scores and then to use the strength of the correlation value to reduce the error variance: the higher the correlation between the two sets of scores, the
lower the error variance. Rather than correlations, which come up later in the book, we will rely on "difference scores." But whether we use correlation values or difference scores, the result is the same.
The distribution of difference scores was discussed in Chapter 5 when the independent t-test was introduced. The point of that distribution is to determine the point at which the difference between a pair of sample means (M1 − M2) is so great that the most probable explanation is that the samples were not drawn from populations with the same means. That same distribution also provides the theoretical underpinning for the dependent-groups tests, but rather than the difference between the means of the two groups (M1 − M2), the difference score in the dependent-groups tests is based on the mean of the differences between pairs of individual scores. That is, the differences between each pair of related scores will be determined, and then the mean of those differences will become the numerator in the t ratio. If the mean of the difference scores is sufficiently different from the mean of the distribution of difference scores (which, recall, is 0), the t value will be statistically significant. The denominator in the t ratio is another standard error of the mean value, but in this case, it is the standard error of the mean for those difference scores. The mechanics of checking
for significance are similar to what was done for the independent t:
• A critical value from the t table defines the point at which the t ratio is statistically significant.
• The critical value is dependent upon the degrees of freedom for the problem. For the dependent-samples t, the degrees of freedom are the number of pairs of scores minus 1 (n − 1).
Try It! C: How are the before/after t-test and the matched-pairs t-test different?
The dependent-groups t-test statistic has this form:
t = Md / SEMd    Formula 7.1
Where
Md = the mean of the difference scores
SEMd = the standard error of the mean for the difference scores
The steps for completing the test follow:
1. From the two scores for each subject, subtract the second from the first to determine the difference score, d, for each pair.
2. Determine the mean of the d scores: Md = Σd / number of pairs
3. Calculate the standard deviation of the d values, sd.
4. Calculate the standard error of the mean for the difference scores, SEMd, by dividing the result of Step 3 by the square root of the number of pairs of scores: SEMd = sd / √(number of pairs)
5. t = Md / SEMd
Following is an example to illustrate the steps to calculating the dependent-measures t-test. A psychologist is investigating the impact that verbal reinforcement has on the number of questions students ask in a seminar.
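The five steps above translate directly into code. The following sketch uses hypothetical scores (not from the text) and only Python's standard library, with one line per step:

```python
from math import sqrt
from statistics import mean, stdev

def dependent_t(first, second):
    """Dependent-groups t, following the five steps in the text."""
    # Step 1: difference score d for each pair (first minus second)
    d = [a - b for a, b in zip(first, second)]
    # Step 2: mean of the d scores
    m_d = mean(d)
    # Step 3: standard deviation of the d values (sample sd)
    s_d = stdev(d)
    # Step 4: standard error of the mean for the difference scores
    sem_d = s_d / sqrt(len(d))
    # Step 5: the t ratio
    return m_d / sem_d

# Hypothetical before/after scores for four participants
before = [5, 7, 6, 8]
after = [7, 8, 8, 9]
t = dependent_t(before, after)
```

With df = number of pairs − 1 = 3, the computed t would then be compared against the tabled critical value, exactly as in the worked examples that follow.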
• Ten upper-level students participate in two seminars where a presentation is followed by students' questions.
• In the first seminar, no feedback is provided by the instructor after a student asks the presenter a question.
• In the second seminar, the instructor offers feedback ("That's an excellent question," "Very interesting question," or "Yes, that had occurred to me as well") after each question.
• The psychologist will test the following hypothesis:
H0: There is no significant mean difference in the number of student questions asked from seminar 1 to seminar 2
H0: μ_seminar_1_questions = μ_seminar_2_questions
• By rejecting H0, the psychologist will find support for the alternative hypothesis:
Ha: There is a significant mean difference in the number of student questions asked from seminar 1 to seminar 2
Ha: μ_seminar_1_questions ≠ μ_seminar_2_questions
Is there a significant difference between the number of questions students ask in the first seminar compared to the number of questions students ask in the second seminar? The number of questions asked by each student in both seminars and
the solution to the problem are in Figure 7.2.
Figure 7.2: Calculating the before/after and within-treatment t
(Seminar 1 and Seminar 2 question counts for the 10 students, followed by the worked solution.)
1. Determine the difference between each pair of scores, d, by subtraction.
2. Determine the mean of the differences, the d values (Md): Md = Σd / 10 = −11 / 10 = −1.1
3. Calculate the standard deviation of the d values (sd). Verify that sd = 1.101
4. Just as the standard error of the mean in the earlier tests was s/√n, determine the standard error of the mean for the difference scores (SEMd) by dividing the result of Step 3 by the square root of the number of pairs. Verify that SEMd = sd / √np = 1.101 / √10 = 0.348
5. Divide Md by SEMd to determine t: t = Md / SEMd = −1.1 / 0.348 = −3.161
6. As noted earlier, the degrees of freedom for the critical value of t for this test are the number of pairs of scores, np − 1. t0.05(9) = 2.262
The calculated value of t exceeds the critical value from Table 5.1 (which is also Table B in the Appendix). The result is statistically significant. Note that it
is the absolute value of the calculated t in which we are interested. Because the question was whether there is a significant difference in the number of questions, it is a two-tailed test, and it does not matter which session had the greater number; it also does not matter whether Session 1 is larger than Session 2 or the other way around. The students in the second session, where questions were followed by feedback, asked significantly more questions than the students did in the first session, when the instructor offered no feedback.
The Degrees of Freedom, the Dependent-Groups Test, and Power
With Md = −1.1, there is comparatively little difference between the two sets of scores. What makes such a small mean difference statistically significant? The answer is in the amount of error variance in this problem. When the error variance is also very small (the standard error of the difference scores is just .348), comparatively small mean differences can be statistically significant. The rationale for using dependent-groups tests as opposed to independent-group designs is that the former are comparatively more powerful; there is less error to contend with, thereby increasing the probability of rejecting the null hypothesis. This brings us to the discussion of power in statistical testing.
Table B in the Appendix, the critical values of t, indicates that
critical values decline as degrees of freedom increase. That occurs not only in the critical values for t but also for F in analysis of variance, and in fact for most tables of critical values for statistical tests. For the dependent-groups t-test, the degrees of freedom are based on
• the number of pairs of related scores, minus 1.
For the independent-groups t-test, the degrees of freedom are based on
• the number of scores in both groups, minus 2 (Chapter 5).
This means that critical values are larger in a dependent-groups test for the same number of raw scores involved. But even a test with a larger critical value can produce significant results when there is more control of error variance. This is what the dependent-groups test provides. The central point is this: When each pair of scores comes from the same participant, or from a matched pair of participants, the random variability from nonequivalent groups is minimal because scores tend to vary similarly for each pair, resulting in relatively little error variance. The small SEMd value that results more than compensates for the fewer degrees of freedom and the associated larger critical value connected to dependent-groups tests.
In statistical testing, power is defined as the likelihood of detecting a significant difference when it is present. The more powerful statistical test is the one that will most readily detect a significant difference. As long as the sets of scores are closely related, the dependent-measures test is more powerful than the independent-groups equivalent.
Try It! D: What does it mean to say that the within-subjects test has more power than the independent t-test?
A Matched-Pairs Example
Another form of the dependent-groups t-test is the matched-pairs design. In this approach, rather than measure the same people repeatedly, each participant in one group is paired with a participant in the other group who is similar.
For example, consider a market analyst who wants to determine whether a television commercial will induce consumers to spend more on a breakfast cereal. The analyst selects a group of consumers entering a grocery store, induces them to view the television commercial, and then tracks their expenditures on breakfast cereal.
A second group is selected, and they also shop, but they do not view the television commercial. The analyst selects people for the second group who match the age and gender characteristics of those in the first group. This controls for age and gender because those characteristics might affect spending for the particular product. Each individual from Group 1 has a companion in Group 2 of the same age and sex. The expenditures in dollars for the members of each group and the solution to the problem are in Figure 7.3.
Figure 7.3: Calculating a matched-pairs t-test
(Expenditures for the Viewed and Did-not-View groups, followed by the worked solution.)
Verify that Md = 1.125 and sd = 2.092
SEMd = sd / √np = 2.092 / √10 = 0.662
t = Md / SEMd = 1.125 / 0.662 = 1.700
t0.05(9) = 2.262
The absolute value of t is less than the critical value from Table 5.1 (or Appendix Table B) for df = 9. The difference is not statistically significant. There
are probably several ways to explain the outcome, but we will explore just three.
• The most obvious explanation is that the commercial did not work. The shoppers who viewed the commercial were not induced to spend significantly more than those who did not view it.
• Another explanation has to do with the matching. Perhaps age and gender are not related to how much people spend shopping for the particular product. Perhaps the shopper's level of income is the most important characteristic, and that was not controlled in the pairing.
• Another explanation is related to sample size. Small samples tend to be more variable than larger samples, and variability is what the denominator in the t-ratio reflects. Perhaps if this had been a larger sample, the SEMd would have had a smaller value, and the t would have been significant.
The second explanation points out the disadvantage of matched-pairs designs compared to repeated-measures designs. The individual conducting the study must be in a position to know which characteristics of the participants are most relevant to explaining the dependent variable so that they can be matched in both groups. Otherwise, it is impossible to know whether a nonsignificant outcome reflects an inadequate match, control of the wrong variables, or a treatment that just does not affect the DV.
Comparing the Dependent-Samples t-Test to the Independent t
In order to compare the dependent-samples t-test and the independent t more directly, we are going to apply both tests to the same data. This will illustrate how each test deals with error variance; however, a caution is necessary before beginning: Once data is collected, there really is no situation where someone can choose which test to use, because either the groups are independent, or they are not. Therefore, we proceed purely as an academic exercise, recognizing that such a situation is not going to happen in the ordinary course of events.
As an example, a university program encourages students to take a service learning class that emphasizes the importance of community service as a part of the students' educational experience. Data is gathered on the number of hours former students spend in community service per month after they complete the course and graduate from the university.
• For the independent t-test, the students are divided between those who took a service learning class and graduates of the same year who did not.
• For the dependent-groups t-test, those who took the service learning class are matched to a student with the same major, age, and gender who did not take
the class. The data and the solutions to both tests are in Figure 7.4.
Figure 7.4: The before/after t-test versus the independent t-test
(Community service hours for the Class and No Class groups, followed by both solutions.)
Because the differences between the scores are quite consistent, as they tend to be when participants are matched effectively, there is very little variance in the difference scores. This results in a comparatively small standard deviation of difference scores and a small standard error of the mean for the difference scores. This allows t ratios with even relatively small numerators to be statistically significant. Because for the independent t-test there is no assumption that the two groups are related, error variance is based on the differences within the groups of raw scores, and the denominator is large enough that the t value is not significant.
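That contrast can be reproduced in a few lines of code. The data below are hypothetical (the Figure 7.4 raw scores are not reprinted here), but they show the same pattern: when pairs of scores are highly related, the paired test reaches significance while the independent test on the identical numbers does not:

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical matched data: each "class" score is paired with a similar "no class" score
class_hours = [3, 5, 2, 6, 4, 7, 3, 5, 4, 6]
no_class_hours = [2, 5, 2, 5, 3, 6, 3, 4, 4, 5]
n = len(class_hours)

# Dependent-groups t: error term based on the difference scores
d = [a - b for a, b in zip(class_hours, no_class_hours)]
t_paired = mean(d) / (stdev(d) / sqrt(n))

# Independent t: error term based on the within-group variability of the raw scores
se_d = sqrt(variance(class_hours) / n + variance(no_class_hours) / n)
t_independent = (mean(class_hours) - mean(no_class_hours)) / se_d
```

Here t_paired is about 3.67, exceeding t0.05(9) = 2.262, while t_independent is about 0.91, short of t0.05(18) = 2.101: the same mean difference, but very different error terms.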
As a matched-pairs t-test the results are: t = Md / SEMd = 0.650 / 0.211 = 3.081; t0.05(9) = 2.262. The result is significant.
As an independent t-test: SEd = √(SEM1² + SEM2²) = √(0.453² + 0.316²) = 0.553
t = (M1 − M2) / SEd = (3.50 − 2.850) / 0.553 = 1.175; t0.05(18) = 2.101. The result is not significant.
The Dependent-Groups t-Test on Excel
If the problem in Figure 7.4 is completed in Excel as a dependent-groups test, the procedure is as follows:
• Create the data file in Excel. Column A is labeled Class to indicate those who had the service learning class, and
column B is labeled No Class. Enter the data, beginning with cell A2 for the first group and cell B2 for the second group.
• Click the Data tab at the top of the page.
• At the extreme right, choose Data Analysis.
• In the Analysis Tools window, select t-test: Paired Two Sample for Means and click OK.
• There are two blanks near the top of the window for Variable 1 Range and Variable 2 Range. In the first, enter A2:A11, indicating that the data for the first (Class) group is in cells A2 to A11. In the second, enter B2:B11 for the No Class group.
• Indicate that the hypothesized mean difference is 0. This reflects the value for the mean of the distribution of difference scores.
• Indicate A13 for the output range, so that the results do not overlay the data scores.
• Click OK. Widen column A so that all the output is readable.
The result is the screenshot that is Figure 7.5.
Figure 7.5: The Excel output for the dependent-samples t-test using the data from Figure 7.4
In the Excel solution, t = 3.074 rather than the 3.081 from the longhand solution. The Excel approach is to calculate the correlation between scores to find a solution, rather than to determine the difference between scores as we did. Note that the Pearson correlation (which will be explained in Chapter 8) is indicated at .91. In any event, the very minor difference, .007, between the solution in Figure 7.4 and the Excel solution in Figure 7.5 is not relevant to the outcome, as it is attributable to rounding errors. The Excel output also indicates results for both one-tailed and two-tailed tests at p = .05; the outcome is statistically significant.
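The two computational routes, the longhand difference scores and Excel's correlation-based error term, are algebraically equivalent, which a short check can demonstrate. The data here are hypothetical; the point is only that both routes produce the same t:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired scores
x = [12, 9, 14, 10, 15]
y = [10, 8, 11, 9, 13]
n = len(x)

# Route 1: difference scores (the longhand approach in the text)
d = [a - b for a, b in zip(x, y)]
t_diff = mean(d) / (stdev(d) / sqrt(n))

# Route 2: correlation-based error term (the approach Excel uses)
mx, my = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)
r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)
sem_d = sqrt((sx**2 + sy**2 - 2 * r * sx * sy) / n)
t_corr = (mx - my) / sem_d
```

Both routes give t of about 4.81 here; a discrepancy like the .007 in the text comes from rounding intermediate values, not from the method.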
  • 89. cannot properly regulate blood sugar. Management of this disease is achieved by controlling normal levels of glucose in the blood for as much of the time as possible. This requires an accurate, portable glucose monitor for home use. A medical device company has developed a new portable glucose monitor and wishes to compare it against a laboratory standard. This will produce a data set in which two different monitors measure the glucose level of 11 randomly chosen diabetes patients. Although the two monitors take the blood samples at the same time, this can be considered an example of the before/after dependent-samples t-test because the same group is measured twice. By using the same set of patients for both monitors, each patient is his or her own control. Obtaining two measurements for each patient reduces measurement variability compared to using two independent sets of patients. Choosing a level of significance of p ≤ .05, we use the paired-sample t-test to test the null hypothesis that there is no difference in measure- ments between the two monitors. • H0: mglucose_portable_monitor 5 mglucose_lab_monitor By rejecting H0 the company will find support for the alternative hypothesis that there is a significant mean difference in the glucose level between both machines. • Ha: mglucose_portable_monitor ? mglucose_lab_monitor
The glucose readings from each of the two monitors are measured in milligrams per deciliter and are shown in the following table. There is a large variability within each column because each patient is different, and the readings were taken at various times of the day.
Patient   Portable Monitor   Laboratory Standard
A         112                120
B          85                 82
C         103                116
D         154                168
E          65                 75
F          52                 51
G          85                 96
H          72                 79
I         167                178
J         123                141
K         142                153
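Using the readings in the table above, the paired-sample t statistic can be verified directly (a sketch using Python's standard library; the variable names are my own):

```python
from math import sqrt
from statistics import mean, stdev

# Glucose readings (mg/dL) for patients A through K
portable = [112, 85, 103, 154, 65, 52, 85, 72, 167, 123, 142]
lab = [120, 82, 116, 168, 75, 51, 96, 79, 178, 141, 153]

# Difference score for each patient: portable reading minus laboratory reading
d = [p - l for p, l in zip(portable, lab)]
md = mean(d)                      # mean difference: -9.0 mg/dL
sem_d = stdev(d) / sqrt(len(d))   # standard error of the mean difference
t = md / sem_d                    # about -4.82, matching the Excel t Stat
```

The magnitude of t, about 4.82, exceeds the two-tailed critical value, so the null hypothesis is rejected, matching the Apply It! conclusion.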
Apply It! (continued)
The Excel solution follows:
                                Variable 1   Variable 2
Mean                            105.45       114.45
Variance                        1428.67      1736.27
Observations                    11           11
Pearson Correlation             0.99
Hypothesized Mean Difference    0.00
df                              10
t Stat                          -4.817
P(T<=t) one-tail                0.0003
t crit one-tail                 1.812
P(T<=t) two-tail                0.0007
t crit two-tail                 2.228
The magnitude of the calculated value of t = -4.817 exceeds the critical two-tail value from the table. The result is statistically significant, so we reject the null hypothesis that the means are the same. The portable monitor measures glucose levels lower than the laboratory standard. Based on the results of this test, the company continued research on the portable monitor until it could devise a solution that would more accurately replicate laboratory standard results.
Apply It! boxes written by Shawn Murphy.
Comparing the Three Dependent t-Tests With the Independent t-Test
The before/after and matched-pairs approaches to calculating a dependent-groups t-test each have advantages. The before/after design provides the greatest control over the extraneous variables that can confound the results in a matched-pairs design. When using the matching approach, there is always a chance that subjects in Group 2 are not matched closely enough on some relevant variable and the resulting mismatches create error variance. In the service learning example, students were matched according to age, major, and gender. But if marital status affects students' willingness to be involved in community service and it is not controlled, there could be an imbalance of married/not-married
  • 93. CHAPTER 7Section 7.3 The Within-Subjects F students that confounds results. The before/after procedure involves the same students, and unless their status on some important variable changes between measures (a rash of marriages between the first and second measurement, for example), there is going to be better control of error variance with that approach. Note that the matched-pairs and the within-treatments approach also assume a large sam- ple from which to draw in order to select participants who match those in the first group. As the number of variables on which participants must be matched increases, so must the size of the sample from which to draw in order to find participants with the correct com- bination of characteristics. The advantage of the matched-pairs design, on the other hand, is that it takes less time to execute. The treatment group and the control group can both be involved in the study at the same time. By way of a summary, note the comparisons in Table 7.1. Table 7.1: Comparing the t-tests Independent t Before/After Matched-Pairs Within- Treatments Groups Independent
  • 94. groups One group measured twice Two groups: each subject from the first group matched to one in the second One group measured twice for two treatments Denominator/ error term Within groups variability plus between groups Only within groups variability Only within groups variability Only within groups variability 7.3 The Within-Subjects F Sometimes two measures of the same group are not enough to
  • 95. track changes in the DV. Maybe the researchers running the service learning study want to compare how much time students devoted to community service the year they graduated, one year later, and then two years after graduation. The within-subjects F is a dependent-groups procedure for two or more groups of scores when the dependent variable is interval or ratio scale. Because the dependent-groups t-test is the repeated-measures equivalent of the indepen- dent t-test, the within-subjects F is the repeated-measures or matched-pairs equivalent of the one-way ANOVA. The same Ronald A. Fisher who developed analysis of variance also developed this test, which is a form of ANOVA, and the test statistic is still F. Here too, the dependent groups can be formed by either repeatedly measuring the same group or by matching separate groups of participants on the relevant variables. When there are more than two groups, matching becomes increasingly problematic, however, and although it is theoretically possible to match any number of participant groups, it is suk85842_07_c07.indd 251 10/23/13 1:29 PM CHAPTER 7Section 7.3 The Within-Subjects F a highly complex undertaking to match all the relevant variables across more than two or three measures. Repeatedly measuring the same participants is much more common than
  • 96. matching. Managing Error Variance in the Within-Subjects F Recall from Chapter 6 that when Fisher developed ANOVA, he shifted away from calcu- lating score variability with the standard deviation, standard error of the mean, and so on and used sums of squares instead. The particular sums of squares computed are the key to the strength of this procedure. If a group of participants in a study is measured on a dependent variable at three different intervals and their scores are recorded in parallel columns, the researcher will have a data sheet similar to the following: First Measure Second Measure Third Measure Participant 1 . . . Participant 2 . . . • The column scores are the equivalent of scores from the different groups in a one- way ANOVA, and any differences from column to column reflect the effect of the IV, the treatment. • The participant-to-participant differences, the within-group differences, are reflected in the differences in the scores from row to row. Those differences are error variance just as they are with the one-way ANOVA.
  • 97. • The within-subjects F approach is to calculate the variability between rows (the within-groups variance), and then, because it comes from participant-to- participant differences that are the same in each group, to eliminate it from further analysis. • The only error variance that remains is that which does not stem from the person- to-person differences. In the dependent-samples t-test, the within-subjects variance is managed by reducing the denominator in the t ratio according to how highly correlated the two sets of measures are (the Excel approach) or by the longhand approach of using the standard deviation of the difference scores, which is relatively small when scores are related. In the within-subjects F, the variability within groups is calculated and then adjusted if there are issues with too much variance between pairs of treatments. This detection of variance is based on the Mauchly’s test of sphericity (W) developed by John W. Mauchly in 1940. If the W-test is significant (p , .05), then there is a violation of sphericity, which means that there is too much variance within the group across pairs of times/treat- ments (see Table 7.2). Therefore, since sphericity is violated, degrees of freedom adjust- ments are made that include the Greenhouse-Geisser or Huynh- Feldt calculations. These are adjustments of the degrees of freedom (df ) based on their
  • 98. respective epsilon or suk85842_07_c07.indd 252 10/23/13 1:29 PM CHAPTER 7Section 7.3 The Within-Subjects F e-values (discussed more in Chapter 8). Of the two options, the Greenhouse-Geisser is more conservative in that it is harder to reject the null hypothesis, with a lower prob- ability of a type I error. The Huynh-Feldt is based on a bias corrected value that is not as conservative. One final note regarding error variance is that it can only be calculated across comparison of pairs of treatments so therefore the W-test for dependent- samples t-test is not necessary since there is only one pair of values. In addition, the test for sphericity cannot be done in the one-way ANOVA because the amount of variability within groups is different for each group, and there is no way to separate it from the balance of the error variance in the problem, which can be a severe limitation in affecting the power of between-group designs. An example and interpretation of sphericity will be shown in the SPSS example later in the chapter. Table 7.2: The concept of sphericity Patient Tx A Tx B Tx C Tx 12Tx 2 Tx 12Tx 3 Tx 22Tx 3
1         30     27     20      3          10           7
2         35     30     28      5           7           2
3         25     30     20     −5           5          10
4         15     15     12      0           3           3
5          9     12      7     −3           2           5
Variance                       17          10.3        10.3
A Within-Subjects F Example
An industrial/organizational psychologist is conducting a study of employees who assemble electronic components. The study examines how productivity changes during the length of time employed. The psychologist identifies five workers hired in the same month and then gauges the number of assembled components each employee averages per hour one week, one month, and then two months after beginning work. Is there a relationship between the number of completed components and the length of time employed? The data for the five employees follows:
Products Assembled per Hour
          1 week   1 month   2 months
Diego     2        5         4
Harold    4        7         7
Wilma     3        6         5
Carol     4        5         6
Moua      5        8         9
• The independent variable (the IV, the treatment) is the time elapsed.
• The dependent variable (the DV) is the number of components assembled.
• The issue is whether there are significant differences in the measures from column to column (over time).
In Chapter 6 the variability related to the IV was measured in the sum of squares between (SSbet). The same source of variance is gauged here, except that it is called the sum of squares between columns (SScol).
The Components of the Within-Subjects F
Calculating the within-subjects F begins just as the one-way ANOVA begins, by determining all variability from all sources with the sum of squares total (SStot). It is even calculated the same way as it was in Chapter 6:
1. The sum of squares total.
SStot = Σ(x − MG)²
a. Subtract each score from the mean of all the scores from all the groups,
b. square the difference, and then
c. sum the squared differences.
The balance of the problem is completed with the following steps:
2. The sum of squares between columns (SScol). This equation is much like SSbet in the one-way ANOVA. The scores in each column are treated the same way the individual groups are treated in the one-way ANOVA. For columns 1, 2, through k,
SScol = (Mcol 1 − MG)²·ncol 1 + (Mcol 2 − MG)²·ncol 2 + . . . + (Mcol k − MG)²·ncol k    Formula 7.2
a. calculate the mean for each column of scores,
b. subtract the mean for all the data (MG) from each column mean,
c. square the result, and
d. multiply the squared result by the number of scores in the column.
3. The sum of squares between rows. This, too, is like the SSbet from the one-way problem, except that it treats the scores for each row as a separate group. For rows 1, 2, through i,
SSrows = (Mrow 1 − MG)²·nrow 1 + (Mrow 2 − MG)²·nrow 2 + . . . + (Mrow i − MG)²·nrow i    Formula 7.3
a. calculate the mean for each row of scores,
b. subtract the mean for all the data from each row mean,
c. square the result, and
d. multiply the squared result by the number of scores in the row.
4. The residual sum of squares is the error term in the within-subjects F. It is the equivalent of SSwith in the one-way ANOVA. With the within-subjects F, the variability in scores due to person-to-person differences within the same measure is calculated, and because it is the same for each set of measures, it is eliminated. This will result in a reduced error term. It is determined as follows:
SSresid = SStot − SScol − SSrows    Formula 7.4
a. Take all variance from all sources (SStot),
b. subtract from it the treatment effect (SScol), and
c. subtract the person-to-person differences (SSrows).
The Within-Subjects F Calculations
When the sums of squares values are completed, the next step is to complete the ANOVA table. The degrees-of-freedom values are as follows:
• dftot = N − 1
• dfcol = number of columns − 1
• dfrows = number of rows − 1
• dfresid = dfcol × dfrows
Just as with one-way problems, the mean square values are calculated by dividing the sums of squares by their degrees of freedom. The only MS values required, the components of the F value, are the MScol, which includes the treatment effect, and the MSresid, which is the error term. The MS is not determined for total or for rows. The F value in the within-subjects ANOVA is then MScol ÷ MSresid.
The calculations and the table for the products-assembled-per-hour problem are in Figure 7.6.
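Formulas 7.2 through 7.4 can be checked against the products-assembled-per-hour data with a short script (a sketch; the variable names are my own):

```python
# Within-subjects F for the products-assembled-per-hour data in the text.
# Rows are employees; columns are 1 week, 1 month, and 2 months on the job.
data = [
    [2, 5, 4],   # Diego
    [4, 7, 7],   # Harold
    [3, 6, 5],   # Wilma
    [4, 5, 6],   # Carol
    [5, 8, 9],   # Moua
]
n_rows, n_cols = len(data), len(data[0])
all_scores = [x for row in data for x in row]
mg = sum(all_scores) / len(all_scores)  # grand mean

# SStot: all variability from all sources
ss_tot = sum((x - mg) ** 2 for x in all_scores)

# SScol: the treatment effect (Formula 7.2)
col_means = [sum(row[j] for row in data) / n_rows for j in range(n_cols)]
ss_col = sum(n_rows * (m - mg) ** 2 for m in col_means)

# SSrows: the person-to-person differences (Formula 7.3)
row_means = [sum(row) / n_cols for row in data]
ss_rows = sum(n_cols * (m - mg) ** 2 for m in row_means)

# SSresid: the error term (Formula 7.4)
ss_resid = ss_tot - ss_col - ss_rows

df_col = n_cols - 1
df_resid = df_col * (n_rows - 1)
f_ratio = (ss_col / df_col) / (ss_resid / df_resid)
```

This reproduces the values in Figure 7.6: SStot = 49.333, SScol = 22.533, SSrows = 23.333, SSresid = 3.467, and F = 26.0.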
Try It!
E. How is the error term in the within-subjects F different from that in the one-way ANOVA?

Figure 7.6: A within-subjects F example

The calculated value of F exceeds the critical value of F from the table. The number of products assembled per hour is significantly different according to the amount of time the employee has been on the job. The significant F indicates that this much difference between measures is unlikely to have occurred by chance.

[Figure 7.6 shows the products-assembled-per-hour scores for five employees (Diego, Moua, Carol, Wilma, and Harold) at 1 week, 1 month, and 2 months on the job. Column means: 3.6, 6.2, and 6.2; row means: 3.667, 6.0, 4.667, 5.0, and 7.333; grand mean (MG) = 5.333.]

The Products Assembled per Hour: The ANOVA Table

Source     SS       df    MS       F      Fcrit
Columns    22.533    2    11.267   26.0   4.46
Rows       23.333    4
Residual    3.467    8     0.433
Total      49.333   14

1. SStot = Σ(x − MG)² = (2 − 5.333)² + (4 − 5.333)² + . . . + (9 − 5.333)² = 49.333

2. SScol = (Mcol1 − MG)²ncol1 + (Mcol2 − MG)²ncol2 + . . . + (Mcolk − MG)²ncolk
   = (3.6 − 5.333)²5 + (6.2 − 5.333)²5 + (6.2 − 5.333)²5 = 22.533

3. SSrows = (Mr1 − MG)²nr1 + (Mr2 − MG)²nr2 + . . . + (Mri − MG)²nri
   = (3.667 − 5.333)²3 + (6.0 − 5.333)²3 + (4.667 − 5.333)²3 + (5.0 − 5.333)²3 + (7.333 − 5.333)²3 = 23.333

4. The residual sum of squares: SSresid = SStot − SScol − SSrows = 49.333 − 22.533 − 23.333 = 3.467

Completing the Post Hoc Test

Ordinarily, the calculation of F leaves unanswered the question of which set of measures is significantly different from which. However, in this particular problem there is only one possibility. Because both the 1-month and the 2-month groups of measures have the same mean (M = 6.20), they must both be significantly different from the only other group of measures in the problem, the 1-week-on-the-job measures, for which M = 3.6. As a demonstration, HSD is completed anyway.
The HSD procedure is the same as for the one-way test, except that the error term is now MSresid. Substituting MSresid for MSwith in the formula provides

HSD = x√(MSresid/n)

where x is a value from Appendix Table D. It is based on the number of means, which is the same as the number of groups of measures, 3 in the example, and the df for MSresid, which are 8; n = the number of scores there are in any one measure, 5 in this instance. For the number-of-products-assembled-per-hour study,

HSD = 4.04√(.433/5) = 1.19

A difference of 1.19 or greater between any pair of means is statistically significant. Using the same approach used in Chapter 6, a matrix indicating the difference between each pair of means makes it easier to interpret the HSD value.

                 1 week (3.6)   1 month (6.2)   2 months (6.2)
1 week (3.6)     diff = 0       diff = 2.6*     diff = 2.6*
1 month (6.2)                                   diff = 0
2 months (6.2)

*Indicates a significant difference.

The 1-week measures of productivity are significantly different from the 1-month and 2-month measures of productivity. Because the mean values of the 1- and 2-month measures are the same, neither of the last two measures is significantly different from the other. The largest increase in productivity comes between the first week and the first month of employment.

Calculating the Effect Size

The final question for a significant F is the question of the practical importance of the result. Using partial-eta squared as the measure of effect size yields the following formula:

partial-η² = SScol / (SSresid + SScol)

For the problem just completed, SScol = 22.533 and SSresid = 3.467, so

partial-η² = 22.533/26 = 0.87

Approximately 87% of the variance in productivity can be explained by how long the individual has been on the job.

Apply It! Pilot Program Revisited

Let us return to the example of the middle school that adopted a meditation program known as quiet time to relieve stress, increase test scores, and improve student behavior. In Chapter 5, we used a one-sample t-test to determine that a statistically significant increase in GPAs occurred among participating students. Now, we will use a within-subjects F test to see if their stress levels have decreased.

Ten randomly chosen students who participated in the program filled out questionnaires about their stress levels. The aggregate score was from 1 to 10, with 10 indicating the most stress. The survey was given before the start of the program and at 3-month intervals. The time elapsed represents the independent variable, the treatment effect that drives this analysis. The dependent variable is the stress score. The within-subjects F is a dependent-groups procedure for two or more groups of scores for which the dependent variable is interval or ratio scale. In this example, we have four groups of scores. Results of the stress questionnaires follow.

              0 Months   3 Months   6 Months   9 Months
Student 1         7          6          6          6
Student 2         9          6          5          5
Student 3         7          5          5          4
Student 4         5          3          3          2
Student 5         7          6          4          4
Student 6         8          5          7          5
Student 7         5          4          4          3
Student 8         7          5          6          5
Student 9         6          6          4          4
Student 10        7          5          5          5

(continued)
Apply It! (continued)

The following table shows results of the within-subjects F test calculations.

Source     SS       df    MS       F
Columns    34.475    3    11.492   26.36
Subjects   35.725    9
Residual   11.775   27     0.436
Total      82.000   39

F.05(3, 27) = 2.96

The F value of 26.36 is greater than the critical F value of 2.96, so the results are statistically significant. Because the calculation of F did not identify the measures that were significantly different from the others, we calculate HSD using the following formula:

HSD = x√(MSresid/n) = 3.875√(0.436/10) = 0.81

A difference of 0.81 or greater between any pair of means is statistically significant. A matrix indicating the difference between each pair of means makes it easier to interpret the HSD value.

                  0 months (6.8)   3 months (5.1)   6 months (4.9)   9 months (4.3)
0 months (6.8)                     diff = 1.7*      diff = 1.9*      diff = 2.5*
3 months (5.1)                                      diff = 0.2       diff = 0.8
6 months (4.9)                                                       diff = 0.6
9 months (4.3)

The differences marked with an asterisk are significant. The largest decrease in stress occurs during the first 3 months of the program. To determine the practical importance of these numbers, partial-eta squared is used. For the problem just completed, SScol = 34.475 and SSresid = 11.775, so

partial-η² = 34.475/46.25 = 0.75
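The stress-survey calculations in this Apply It! box can be verified in a few lines of Python. This is a sketch; the multiplier 3.875 is the tabled studentized-range value the box uses, and the table's Total of 82.000 reflects rounding (the sums of squares below add to 81.975):

```python
# Stress scores from the Apply It! box: 10 students x 4 measurement times.
data = [[7, 6, 6, 6], [9, 6, 5, 5], [7, 5, 5, 4], [5, 3, 3, 2], [7, 6, 4, 4],
        [8, 5, 7, 5], [5, 4, 4, 3], [7, 5, 6, 5], [6, 6, 4, 4], [7, 5, 5, 5]]
n_subj, k = len(data), len(data[0])
grand = sum(x for row in data for x in row) / (n_subj * k)

col_means = [sum(row[j] for row in data) / n_subj for j in range(k)]   # 6.8, 5.1, 4.9, 4.3
ss_col = sum(n_subj * (m - grand) ** 2 for m in col_means)             # 34.475
ss_subj = sum(k * (sum(row) / k - grand) ** 2 for row in data)         # 35.725
ss_tot = sum((x - grand) ** 2 for row in data for x in row)
ss_resid = ss_tot - ss_col - ss_subj                                   # 11.775

ms_col = ss_col / (k - 1)                        # 34.475 / 3
ms_resid = ss_resid / ((k - 1) * (n_subj - 1))   # 11.775 / 27
F = ms_col / ms_resid                            # about 26.35; 26.36 with rounded MS values

hsd = 3.875 * (ms_resid / n_subj) ** 0.5         # about 0.81
print(col_means, round(F, 2), round(hsd, 2))
```

All three pairwise differences involving the 0-month mean (1.7, 1.9, and 2.5) exceed the HSD of 0.81, matching the matrix above.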
Apply It! (continued)

About 75% of the variance in stress can be explained by how long the student has been enrolled in the program. The within-subjects F test allowed analysis of students' stress levels at multiple times throughout the year and showed that the program was reducing stress levels by significant amounts.

Apply It! boxes written by Shawn Murphy.

Comparing the Within-Subjects F and the One-Way ANOVA

In the one-way ANOVA, within-group variance is different for each group because each group is made up of different participants. There is no way to eliminate the error variance as it was eliminated for the within-subjects F, because that source of error variance cannot be separated from the balance of the error variance. The smaller error term in the within-subjects test (which is the divisor in the F ratio) allows relatively small differences between the sets of measures to result in a significant F.

This is illustrated by using the same data as the example of the workers who assemble electronic components, except here we calculated a one-way ANOVA instead of the within-subjects F. This is for illustration only, because groups are either independent or dependent; there is no situation in which, once the test is conducted, someone would wonder which approach is appropriate. The SStot and the SSbet will be the same as the SStot and the SScol are in the within-subjects problem.

SStot = 49.333
SSbet = 22.533
SSwith = Σ(xa − Ma)² + Σ(xb − Mb)² + Σ(xc − Mc)²   (Formula 6.3)
       = (2 − 3.60)² + (4 − 3.60)² + . . . + (9 − 6.20)² = 26.80

The value for the SSwith in a one-way ANOVA is the same as SSrows + SSresid in the within-subjects F in Figure 7.6. It has to be, because in the one-way ANOVA there is no way to separate the participant-to-participant differences from the balance of the error variance, since they are different for each group. With the SSrows added back into the error term, note in Table 7.3 the changes made to the ANOVA table and to F in particular.

• The degrees of freedom for "within" change to 12 from the 8 for residual, which results in a smaller critical value for the independent-groups test, but that adjustment does not compensate for the additional error in the term.
• Note that the sum of squares for the error term jumps from 3.467 in the within-subjects test to 26.80 in the independent-groups test.
• The F value is reduced from 26.0 in the within problem to 5.045 in the one-way problem, a factor of about 1/5.

Because groups are either independent or not, the example is not realistic. Nevertheless, the calculations illustrate the advantage to statistical power of setting up a dependent-groups test, an option researchers have at the planning level.

Table 7.3: The within-subjects F example repeated as a one-way ANOVA

The ANOVA table

Source    SS       df    MS       F       Fcrit
Between   22.533    2    11.267   5.045   3.89
Within    26.800   12     2.233
Total     49.333   14

Another Within-Subjects F Example

A psychologist working at a federal prison is interested in the relationship between the amount of time a prisoner is incarcerated and the number of violent acts in which the prisoner is involved. Using self-reported data, inmates respond anonymously to a questionnaire administered 1 month, 3 months, 6 months, and 9 months after incarceration. The data and the solution are in Figure 7.7.

The results (F) indicate that there are significant differences in the number of violent acts documented for the inmate related to the length of time the inmate has been incarcerated. The HSD results indicate that those incarcerated for 1 month are involved with a significantly different number of violent acts than those who have been in for 3 or 6 months. Those who have been in for 6 months are involved with a significantly different number of violent acts than those who have been in for 9 months. The eta-squared value indicates that about 37% of the variance in number of violent acts is a function of how long the inmate has been incarcerated.
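Both the power contrast in Table 7.3 and the prisoner example's summary values can be checked numerically in Python. This is a sketch working from the means and sums of squares reported in the text, since Figure 7.7's raw scores appear only in the figure; 4.20 is the tabled studentized-range value used there:

```python
from math import sqrt

# Prisoner example (Figure 7.7), reconstructed from reported summary values.
MG = 2.75
col_means = [3.6, 2.2, 1.8, 3.4]          # 1, 3, 6, 9 months (n = 5 inmates)
subj_means = [3.5, 4.0, 1.75, 2.5, 2.0]   # each inmate measured k = 4 times
n, k = 5, 4

ss_col = sum(n * (m - MG) ** 2 for m in col_means)     # 11.75
ss_subj = sum(k * (m - MG) ** 2 for m in subj_means)   # 15.0
ss_tot = 31.75                                         # given in Figure 7.7
ss_resid = ss_tot - ss_col - ss_subj                   # 5.0

F = (ss_col / (k - 1)) / (ss_resid / ((k - 1) * (n - 1)))   # about 9.4
hsd = 4.20 * sqrt((ss_resid / 12) / n)                      # about 1.213
eta_sq = ss_col / ss_tot                                    # about 0.370

# Products example: the same data treated as a one-way ANOVA (Table 7.3).
F_within = (22.533 / 2) / (3.467 / 8)     # about 26.0
F_oneway = (22.533 / 2) / (26.800 / 12)   # about 5.045
print(round(F, 2), round(hsd, 3), round(eta_sq, 3), round(F_oneway, 3))
```

Moving SSrows back into the error term cuts F from roughly 26 to roughly 5 on identical data, which is the whole argument for the dependent-groups design.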
Try It!
F. We compared a one-way ANOVA to a within-subjects F using the same data. How would the eta-squared values for the two problems compare?

Figure 7.7: Another within-subjects F: Violence and the time of incarceration

[Figure 7.7 shows the number of violent acts reported by five inmates at 1, 3, 6, and 9 months after incarceration. Column means: M1 = 3.6, M2 = 2.2, M3 = 1.8, M4 = 3.4; inmate (row) means: 3.5, 4.0, 1.75, 2.5, 2.0; grand mean MG = 2.750.]

The ANOVA Table

1. SStot = Σ(x − MG)² = 31.750
2. SScol = (Mcol1 − MG)²ncol1 + (Mcol2 − MG)²ncol2 + (Mcol3 − MG)²ncol3 + (Mcol4 − MG)²ncol4
   = (3.6 − 2.75)²5 + (2.2 − 2.75)²5 + (1.8 − 2.75)²5 + (3.4 − 2.75)²5 = 11.750
3. SSsubj = (Mr1 − MG)²nr1 + (Mr2 − MG)²nr2 + (Mr3 − MG)²nr3 + (Mr4 − MG)²nr4 + (Mr5 − MG)²nr5
   = (3.5 − 2.75)²4 + (4.0 − 2.75)²4 + (1.75 − 2.75)²4 + (2.5 − 2.75)²4 + (2.0 − 2.75)²4 = 15.0
4. Verify that SSresid = SStot − SScol − SSsubj = 31.75 − 11.75 − 15 = 5.0

F = 9.393; F0.05(3, 12) = 3.49, so F is significant.

The post hoc test: HSD = x0.05√(MSw/n) = 4.20√(0.417/5) = 1.213

            M1 = 3.6   M2 = 2.2   M3 = 1.8   M4 = 3.4
M1 = 3.6               1.4*       1.8*       0.2
M2 = 2.2                          0.4        1.2
M3 = 1.8                                     1.6*
M4 = 3.4

*Indicates a significant difference.

η² = SScol/SStot = 11.75/31.75 = 0.370, so 37% of the variance in violence witnessed is related to how long the inmate has been incarcerated.

A Within-Subjects F in Excel

In spite of the important increase in power that is available compared to independent-groups tests, a dependent-groups ANOVA is not one of the more common tests. It is not one of the options Excel offers in the list of Data Analysis Tools, for example. However, like many statistical procedures, there are a number of repetitive calculations involved, and Excel can simplify these. We will complete the second problem as an example.

1. Set the data up in four columns just as they are in Figure 7.8, but create a blank column to the right of each column of data. With a row at the top for the labels, data begins in cell A2.
2. Calculate the row and column means as well as a grand mean as follows:
   a. For the column means, place the cursor in cell A7 just
beneath the last value in the first column and enter the formula =AVERAGE(A2:A6), followed by Enter. To repeat this for the other columns, left-click on the solution that is now in A7, drag the cursor across to G7, and release the mouse button. In the Home tab, click Fill and then Right. This will repeat the column-means calculations for the other columns. Delete the entries this makes to cells B7, D7, and F7, which are still empty at this point.
   b. For the row means, place the cursor in cell I2 and enter the formula =AVERAGE(A2,C2,E2,G2), followed by Enter. To repeat this for the other rows, left-click on the solution that is now in I2, drag the cursor down to I6, and release the mouse button. In the Home tab, click Fill and then Down. This will repeat the calculation of means for the other rows.
   c. For the grand mean, place the cursor in cell I7 and enter the formula =AVERAGE(I2:I6), followed by Enter (the mean of the row means will be the same as the grand mean; the same could have been done with the column means).
3. To determine the SStot:
   a. In cell B2, enter the formula =(A2−2.75)^2 and press Enter. This will square the difference between the value in A2 and the grand mean. To repeat this for the other data in the column, left-click the cursor in cell B2 and drag down to cell B6. Click Fill and Down. With the cursor in cell B7, click the summation sign (Σ) at the upper right of the screen and press Enter. Repeat these steps for columns D, F, and H.
   b. With the cursor in H9, type in SStot= and press Enter. In cell I9, enter the formula =SUM(B7,D7,F7,H7) and press Enter. The value will be 31.75, which is the total sum of squares.
4. For the SScol:
   a. In cell A8, enter the formula =(3.6−2.75)^2*5 and press Enter. This will square the difference between the column mean and the grand mean and multiply the result by the number of measures in the column, 5. In cells C8, E8, and G8, repeat this for each of the other columns, substituting the mean for each column for the 3.60 that was the column 1 mean.
   b. With the cursor in H10, type in SScol= and press Enter. In cell I10, enter the formula =SUM(A8,C8,E8,G8) and press Enter. The value will be 11.75, which is the sum of squares for the columns.
5. For the SSrows:
   a. In cell J2, enter the formula =(I2−2.75)^2*4 and press Enter. Repeat this for the row means in I3–I6 by left-clicking on what is now J2 and dragging the cursor down to cell J6. Click Fill and Down.
   b. With the cursor in H11, type in SSrow= and press Enter. In cell I11, enter the formula =SUM(J2:J6) and press Enter. The value will be 15.0, which is the sum of squares for the participants.
6. For the SSresid, in cell H12, enter SSresid= and press Enter. In cell I12, enter the formula =I9−I10−I11. The resulting value will be 5.0.

We used Excel to determine all the sum-of-squares values. Now the mean squares are determined by dividing the sums of squares for columns and residual by their degrees of freedom:

MScol = 11.75/3 = 3.917
MSresid = 5/12 = .417
F = MScol/MSresid = 3.917/.417 = 9.393, which agrees with the earlier calculations done by hand.

To create the ANOVA table,
• Beginning in cell A10, type in Source; in B10, SS; in C10, df; in D10, MS; in E10, F; and in F10, Fcrit.
• Beginning in cell A11 and working down, type in total, columns, rows, residual.
• For the sum-of-squares values:
  • In cell B11, enter =I9.
  • In cell B12, enter =I10.
  • In cell B13, enter =I11.
  • In cell B14, enter =I12.
• For the degrees of freedom:
  • In cell C11, enter 19 for total degrees of freedom.
  • In cell C12, enter 3 for columns degrees of freedom.
  • In cell C13, enter 4 for rows degrees of freedom.
  • In cell C14, enter 12 for residual degrees of freedom.
• For the mean squares:
  • In cell D12, enter =B12/C12. The result is MScol.
  • In cell D14, enter =B14/C14. The result is MSresid.
• For the F value, in cell E12, enter =D12/D14. In cell F12, enter the critical value of F for 3 and 12 degrees of freedom, which is 3.49.

The list of commands looks intimidating, but mostly because every keystroke has been included. With some practice, this will become second nature. Figure 7.8 is a screenshot of the result of the calculations.

Figure 7.8: A within-subjects F problem in Excel

7.4 Presenting Results

Using the data from Figure 7.8 and analyzing it in Excel, we see the output table highlighted in yellow. The table is broken down, reading from left to right, into columns that include sum of squares (SS), degrees of freedom (df), mean square (MS), F ratio (F), and F critical value (Fcrit). Interpreting the results, we can see that the F ratio is based on MScol/MSresid, which is 3.92/.416 = 9.4. This value is larger than our F critical value,
indicating significance at the p < .05 level. Recall that a psychologist has collected data on an incarcerated group over a 9-month span and the number of violent crimes they have committed. Upon analyzing the findings, we see that as time elapses from 1 month to 3 months to 6 months to 9 months, there is a significant change in the number of violent acts being committed. However, you cannot be sure where the significant differences occurred, since there are four points in time at which data was captured (1 month, 3 months, 6 months, and 9 months). As a result, post hoc tests will be needed to indicate where these differences lie.

In regard to the hypotheses of the repeated-measures ANOVA, it would be a comparison of mean differences across time. Therefore,

H0: μ1month = μ3months = μ6months = μ9months

The null hypothesis states there is no significant difference between the mean number of violent incidences from 1 month to 3 months to 6 months to 9 months. Keep in mind that ANOVAs are an omnibus test, so we are testing any overall differences between the months. There may be differences between any two months and not necessarily all of the months with each other, which we can follow up with paired comparisons.

Ha: μ1month ≠ μ3months ≠ μ6months ≠ μ9months

The alternative (or research) hypothesis states there is a significant difference between the mean number of violent incidences from 1 month to 3 months to 6 months to 9 months. The alternative can also be a prediction of an increase in the mean number of violent incidences from 1 month to 3 months to 6 months to 9 months:

Ha: μ1month < μ3months < μ6months < μ9months

To analyze and present results using SPSS, let us first look at an example of a paired-sample/dependent t-test, then a repeated-measures ANOVA example.

SPSS Example 1: Steps for a Paired (Matched)-Samples t-Test

From the data set provided (Figure 7.9), a college professor wants to look at mean differences in scores over the first two quizzes of his statistics class. With his scores in SPSS, go to Analyze → Compare Means → Paired-Samples T Test. Input Score 1 in the first box and Score 2 in the second box that is available, as seen in Figure 7.10. Then click OK. The resulting SPSS output tables are provided in Figure 7.11.
Figure 7.9: Data set for quiz scores

Figure 7.10: SPSS steps in performing a paired-samples t-test

Figure 7.11: SPSS results of a paired-samples t-test

[Figure 7.11 output.

Paired Samples Statistics
                  Mean      N    Std. Deviation   Std. Error Mean
Pair 1  Score_1   42.893   14    4.3992           1.1757
        Score_2   38.7857  14    7.15949          1.91345

Paired Samples Test (Score_1 − Score_2): mean difference = 4.10714, Std. Deviation = 7.00598, Std. Error Mean = 1.87243, 95% CI of the difference [.06201, 8.15228], t = 2.193, df = 13, Sig. (2-tailed) = .047]

SPSS Example 2: Steps for a Repeated-Measures ANOVA

This example uses data gathered from the SPSS (PASW) On-Line Training Workshop (1999), available at the following link: http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm

The data is measuring cancer treatments over time (see Figure 7.12):

TOTALCIN = oral condition at the initial stage
TOTALCW2 = oral condition at the end of week 2
TOTALCW4 = oral condition at the end of week 4
TOTALCW6 = oral condition at the end of week 6

Go to Analyze → General Linear Model → Repeated Measures. As shown in Figure 7.13, type in the Within-Subject Factor Name: CW_Times, Number of Levels: 4, and the Measure Name: CW; then click Define. As shown in Figure 7.14, put the four TOTALCW variables in order in the Within-Subjects Variables box, click Plots, and move CW_Times into the Horizontal Axis. Then click Options and move CW_Times into Display Means for, click Compare Main Effects, and select Sidak from the dropdown box just below. Then click Descriptive statistics and Estimates of effect size. Click Continue and OK.

Figure 7.12: Data set of cancer treatments over time (data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm)

Figure 7.13: Repeated-measures steps (part 1)

Figure 7.14: Repeated-measures steps (part 2)

Figure 7.15: SPSS results of cancer treatments over time

[Figure 7.15 begins with the Tests of Within-Subjects Effects table (Measure: CW); the F value for CW_Times is 13.760 under each sphericity correction.]
[Figure 7.15 (continued): Mauchly's Test of Sphericity (Measure: CW) for CW_Times: approx. chi-square = 11.752, df = 5, Sig. = .039, with Greenhouse-Geisser, Huynh-Feldt, and lower-bound epsilon estimates. Table notes: tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix. a. Design: Intercept. Within Subjects Design: CW_Times. b. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.]
[Figure 7.15 (continued): Descriptive Statistics (N = 25 per measure): TOTALCIN M = 6.52, TOTALCW2 M = 8.28, TOTALCW4 M = 10.36, TOTALCW6 M = 9.76, with standard deviations of 1.531, 2.542, 3.475, and 3.566, respectively.]
Figure 7.15: SPSS results of cancer treatments over time (continued) (data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm)

Figure 7.16: SPSS output graph of cancer treatments over time (data from http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm)

[Figure 7.15 also includes the Pairwise Comparisons table (Measure: CW).]
[Pairwise Comparisons (Sidak-adjusted): mean differences between each pair of treatment times, with significance values and 95% confidence intervals for the differences; the interpretation follows below.]
b. Adjustment for multiple comparisons: Sidak.

Based on the results (Figures 7.15 and 7.16), the Descriptive Statistics table shows that there are differences in the means; how significant those differences are is determined by the ANOVA and the consequent post hoc tests. Next, Mauchly's test of sphericity shows a significant value, based on the χ² distribution, with a significance value at the p < .05 level, indicating a violation of the sphericity assumption. To reiterate, this indicates that the variance between some pairs of treatments or measures differs from that between other pairs; since there were significant differences between pairs of treatments compared to other pairs, a violation has occurred. Therefore, looking at the Tests of Within-Subjects Effects table, sphericity cannot be assumed, and a df adjustment will be made by using the Greenhouse-Geisser or the Huynh-Feldt calculations. As seen in the F value (13.760) and the df, adjustment does
not make any difference, as there are significant differences across CW_Times (p < .05). The Pairwise Comparisons table is where we see between-treatment differences, indicating that all treatment times are significantly different except between times 2 and 4 (p = .262) and times 3 and 4 (p = .800), which are not statistically significant. The line graph also indicates a trend in differences between the first, second, and third treatment times but not much difference from the third to the fourth treatment.

7.5 Interpreting Results

Refer to the most recent edition of the APA manual for specific detail on formatting statistics, but Table 7.4 may be used as a quick guide in presenting the statistics covered in this chapter.

Table 7.4: Guide to APA formatting of F statistic results

Abbreviation or Term   Description
F                      F test statistic score
Partial-η²             Partial-eta-squared: a measure of effect size for ANOVA
W                      Mauchly's Test of Sphericity
χ²                     Distribution used for nonparametric tests such as Mauchly's test of
sphericity and Friedman's ANOVA
SS                     Sum of Squares
MS                     Mean Square

Source: Publication Manual of the American Psychological Association, 6th edition. © 2009 American Psychological Association, pp. 119–122.

Try It!
Access the data and the accompanying video via the links below to perform this analysis yourself. Both links are resources provided by Central Michigan University.
Data link: http://calcnet.mth.cmich.edu/org/spss/Prj_cancer_data.htm
Video: http://calcnet.mth.cmich.edu/org/spss/V16_materials/Video_Clips_v16/19repeated_measures/19repeated_measures.swf

Using the results from SPSS Example 1, Figure 7.11, we could present the results in the following way:
• There was a significant difference between quiz scores 1 (M = 42.89) and 2 (M = 38.79), as the mean score significantly decreased over time, t(13) = 2.19, p < .05.

Using the results from SPSS Example 2, Figures 7.15 and 7.16, we could present the results in the following way:

• The overall difference between CW_Times was significant using the Greenhouse-Geisser results, F(2.19, 52.66) = 13.76, p < .05, partial-η² = .364.
• Based on the Sidak pairwise comparison, the CW_1 time (M = 6.52, SD = 1.53) was significantly different from all the other times. CW_2 (M = 8.28, SD = 2.54) and CW_4 (M = 10.36, SD = 3.56) were also significantly different from each other.

7.6 Nonparametric Tests

You may have noticed that for every parametric test, there is a
nonparametric equivalent. The rationale behind nonparametric tests is to obtain a conservative estimate of significance when violations of parametric assumptions have occurred. Such violations include nonlinearity, abnormal distributions, and small data sets.

The nonparametric equivalent of the dependent-samples t-test is the Wilcoxon signed-ranks test (not to be confused with Chapter 5's Wilcoxon rank-sum test for the independent-samples t-test). Frank Wilcoxon proposed both of these in a single paper published in 1945. The Wilcoxon signed-ranks W-test is known as the Wilcoxon t-test for dependent samples (not independent ones). In brief, the steps in the calculation of W are calculating the differences between scores, taking the absolute value (removing the +/− sign), ranking the absolute values, reassigning the original (+/−) sign, and then summing the ranks.

The nonparametric equivalent of the parametric repeated-measures ANOVA is Friedman's ANOVA. Essentially the analysis looks at the differences in the mean ranks (instead of means) across time, treatments, or matched/equivalent groups. By analyzing differences in the mean ranks, the analysis is in effect eliminating extreme points, or outliers, in the distribution. As noted, this is the disadvantage of using the mean, as means are affected by outliers. Again, nonparametric tests are a conservative, distribution-free analysis used when parametric violations have occurred. As a result, it is more difficult to find significance; on the other hand, they are conservative in that there is a lower probability of a type I error. One important point to note is that even though mean ranks are used to calculate significant differences between times, treatments, or matched/equivalent groups, the results are reported in terms of the median differences, as will be shown in the next example.

Friedman's nonparametric ANOVA dates back to 1937 and is based on ranked (ordinal) data and the comparison of medians. An alternative test that is nonparametric and similar to Friedman's test is Cochran's Q-test, which is used for dichotomous data (i.e., only two response choices, as in yes/no).
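The Wilcoxon signed-ranks steps just described (difference, absolute value, rank, re-sign, sum) can be sketched in Python. The scores below are hypothetical; ties share an average rank, and zero differences are dropped, a common convention the text does not spell out:

```python
def wilcoxon_w(before, after):
    """Wilcoxon signed-ranks W: rank absolute differences, re-sign, sum each side."""
    diffs = [b - a for a, b in zip(before, after) if b - a != 0]  # drop zero differences
    ordered = sorted(abs(d) for d in diffs)
    # Assign the average rank to tied absolute differences.
    rank = {}
    for v in set(ordered):
        positions = [i + 1 for i, x in enumerate(ordered) if x == v]
        rank[v] = sum(positions) / len(positions)
    w_plus = sum(rank[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank[abs(d)] for d in diffs if d < 0)
    return min(w_plus, w_minus)  # W is the smaller of the two signed-rank sums

# Hypothetical before/after scores for six participants.
before = [10, 12, 9, 15, 11, 13]
after = [12, 11, 13, 18, 11, 17]
print(wilcoxon_w(before, after))  # -> 1.0
```

A small W means nearly all of the ranked differences point the same direction, which is what the test converts into a significance decision.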
Worked Example of the Friedman's Nonparametric ANOVA and Wilcoxon Signed-Ranks W Using SPSS

To perform the Friedman's nonparametric ANOVA using the data set in Figure 7.17, go to Analyze → Nonparametric Tests → Legacy Dialogs → K Related Samples. Place Score_1, Score_2, and Score_3 into the Test Variables box (see Figure 7.18). Click on Statistics, check the Quartiles box, and then click Continue and OK.

Figure 7.17: Data set for the Friedman's ANOVA test
Figure 7.18: Steps in SPSS for the Friedman's nonparametric ANOVA

Figure 7.19: Results of the Friedman's nonparametric ANOVA

[Figure 7.19, Descriptive Statistics: 25th, 50th (median), and 75th percentiles for Score_1, Score_2, and Score_3; the medians are 44.50, 40.00, and 40.25, respectively.]
[Figure 7.19 (continued), Test Statistics: Chi-Square = 4.148, Asymp. Sig. = .126. a. Friedman Test]
The steps for performing the W-test (Figure 7.20) are Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples. Input Score_1 and Score_2 in Row 1, Score_2 and Score_3 into Row 2, and then Score_1 and Score_3 into Row 3. Click on Options, check Quartiles, and click Continue and OK.

Figure 7.20: Steps in SPSS for the Wilcoxon signed-ranks W-test

Figure 7.21: Results of the Wilcoxon signed-ranks W-test
[SPSS output: a Descriptive Statistics table reporting N and the 25th, 50th (median), and 75th percentiles for each score variable.]
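The pairwise post hoc procedure described above can be sketched in Python as well. This is a hedged illustration, not the book's SPSS output: the data are hypothetical stand-ins, and scipy's `wilcoxon` is used for each of the three related-samples pairs.

```python
# Post hoc pairwise Wilcoxon signed-ranks tests for three related
# samples, one test per pair. Data are hypothetical illustrations.
from itertools import combinations
from scipy.stats import wilcoxon

scores = {
    "Score_1": [45, 40, 39, 47, 36, 41, 50, 43, 38, 44, 39, 35, 46, 42, 40],
    "Score_2": [41, 44, 40, 44, 39, 42, 46, 45, 40, 40, 42, 37, 43, 45, 41],
    "Score_3": [42, 41, 43, 45, 37, 46, 48, 44, 42, 41, 40, 39, 44, 43, 44],
}

results = []
for (name_a, a), (name_b, b) in combinations(scores.items(), 2):
    stat, p = wilcoxon(a, b)          # signed-ranks test on the paired differences
    results.append((name_a, name_b, stat, p))
    print(f"{name_a} vs {name_b}: W = {stat:.1f}, p = {p:.3f}")
```

In practice, researchers often adjust the alpha level for these multiple pairwise tests (e.g., a Bonferroni correction), a point this chapter raises in its family-wise error discussion.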
[SPSS output: a Test Statistics table with Z and Asymp. Sig. (2-tailed) for each pairwise comparison (e.g., Z = −1.821, p = .069), and a Ranks table reporting N, Mean Rank, and Sum of Ranks for the negative and positive ranks in each pair. Notes: a. Wilcoxon Signed Ranks Test; b. Based on positive ranks.]
Looking at the results (Figure 7.21) of the Wilcoxon signed-ranks W-test, we see that using it as a post hoc to the Friedman's ANOVA does have benefits, as it identifies a significant difference between Scores 1 and 2 that was not detected using Friedman's ANOVA. This is an important point in the previously noted debate over whether to run post hocs based on the significance of the F value. As a result, the conclusion here, based on the
W-test, is that there is a significant difference between Score_1 (Mdn = 44.50) and Score_2 (Mdn = 40.00), Z = −2.00, p < .05, while there were no significant differences between Score_2 (Mdn = 40.00) and Score_3 (Mdn = 40.25), Z = −0.126, p = .90, or between Score_1 (Mdn = 44.50) and Score_3 (Mdn = 40.25), Z = −1.821, p = .069.

Summary

Any statistical procedure has advantages and disadvantages. The downside of the different independent-groups designs is that subjects within groups often respond to the independent variable differently. Those differences are a source of error variance that is different for each group. No matter how carefully a researcher randomly selects the groups to be used in a study, there are going to be differences in the way that people in the same group respond to whatever stimulus is offered. Both the before/after t-test and the within-subjects F test eliminate that source of error variance by either using the same people repeatedly or matching subjects on the most important characteristics. Controlling error variance results in a test that is more likely to detect a significant difference (Objectives 1 and 5). In dependent-groups designs, using the same group repeatedly allows for the number of participants involved to be fewer (Objectives 1, 2, 3, 4, and 6). One of the downsides to repeated-measures designs is that they take more time to complete. Unless subjects
are matched across measures, the different levels of the independent variable cannot be administered concurrently as they can in independent-groups tests. With more time, there is an increased potential for attrition. If one of the participants drops out of a repeated-measures study, the data is lost from all the measures of the dependent variable for that subject (Objectives 2 and 4). Having noted some of the differences between dependent-groups designs and their independent-groups equivalents, it is important to note their consistencies as well. Whether the test is independent t, before/after t, one-way ANOVA, or a within-subjects F, in each case the independent variable is nominal scale, and the dependent variable is interval or ratio scale (Objective 2). In addition, two repeated-measures designs were performed (i.e., t-tests and ANOVAs) where we presented several scenarios to test their respective null hypotheses to find support for alternative ones (Objectives 3 and 6). Results and conclusions were presented, interpreted, and reported in APA format (Objectives 7 and 8). Finally, the Wilcoxon signed-ranks W-test and the Friedman's nonparametric ANOVA were discussed with an appropriate example (Objective 9).
CHAPTER 7 Key Terms

There is something else that all the tests in this chapter have in common. They all test the hypothesis of difference. Like the z-test and the one-sample t-test, they are about significant differences. Sometimes, however, the question involves the strength of the relationships between variables. Those discussions will introduce correlation and the hypothesis of association, which are the focus of Chapter 8.

Key Terms

before/after t-test
A dependent-groups application of the t-test, also known as a pre/post t-test. In this particular application, one group is measured before and after a treatment.

confounding variables
Variables that influence an outcome but are uncontrolled in the analysis and obscure the effects of other variables. For example, if a psychologist is interested in gender-related differences in problem-solving ability but doesn't control for age differences, differences in gender may be confounded by differences that are actually age-related.

dependent-groups designs
Statistical procedures in which the groups are related, either because multiple measures are taken of the same participants or because each participant in a particular group is matched on characteristics relevant to the analysis to
a participant in the other groups with the same characteristics. Dependent-groups designs reduce error variance because they reduce score variation due to factors unrelated to the independent variable.

matched-pairs or dependent-samples t-test
A dependent-groups application of the t-test. In this particular application, each participant in the second group is paired to a participant in the first group with the same characteristics in order to limit the error variance that would otherwise stem from using dissimilar groups.

sphericity
Nonsignificant differences in the dependent variable across pairs of treatments or times for all participants in the group. By minimizing this within-group error variance, sphericity may be assumed. Significant within-group error variances between pairs of treatments are a violation of sphericity. Such variances are detected using Mauchly's sphericity (W) test.

within-subjects F
The dependent-groups equivalent of the one-way ANOVA. In this procedure either participants in each group are paired on the relevant characteristics with participants in the other groups, or one group is measured repeatedly after different levels of the independent variable are introduced.
CHAPTER 7 Chapter Exercises

Answers to Try It! Questions

The answers to all Try It! questions introduced in this chapter are provided below.

A. Small samples tend to be platykurtic because the data in small samples is often highly variable. This translates into relatively large standard deviations and large error terms.
B. If groups are created by random sampling, they will differ from the population from which they were drawn only by chance. That means that with random sampling, there can be error, but its potential to affect research results diminishes as the sample size grows.
C. The before/after t-test and the matched-pairs t-test differ only in that the before/after test uses the same group twice and the matched-pairs test matches each subject in the first group with one in the second group who has similar characteristics. The calculation and interpretation of the t value are the same in both procedures.
D. The within-subjects test will detect a significant difference more readily than an
independent t-test. Power in statistical testing is the likelihood of detecting significance.
E. Because the same subjects are involved in each set of measures, the within-subjects test allows us to calculate the amount of score variability due to individual differences in the group and eliminate it because it is the same for each group. This source of error variance is eliminated from the analysis, leaving a smaller error term.
F. The eta-squared value would be the same in either problem. Note that in a one-way ANOVA, eta-squared is the ratio of SSbet to SStot. In the within-subjects F, it is SScol to SStot. Because SSbet and SScol both measure the same variance and the SStot values will be the same in either case, the eta-squared values will likewise be the same. What changes, of course, is the error term. Ordinarily, SSresid will be much smaller than SSwith, but those values show up in the F ratio by virtue of their respective MS values, not in eta-squared.

Review Questions

The answers to the odd-numbered items can be found in the answers appendix.

1. A group of clients is being treated for a compulsive behavior disorder. The number of times in an hour that each one manifests the compulsivity is gauged before and
after a mild sedative is administered. The data is as follows:

    Before  After
1.    5      4
2.    6      4
3.    4      3
4.    9      5
5.    5      6
6.    7      3
7.    4      2
8.    5      5

a. What is the standard deviation of the difference scores?
b. What is the standard error of the mean for the difference scores?
c. What is the calculated value of t?
d. Are the differences statistically significant?
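For readers who want to check their hand calculations on a problem like this one, here is a sketch of the before/after (paired) t-test in Python using the Exercise 1 data; scipy's `ttest_rel` implements the dependent-samples test.

```python
# Before/after t-test on the Exercise 1 data: difference scores,
# their SD and SEM, then the paired-samples t statistic.
from statistics import stdev
from scipy.stats import ttest_rel

before = [5, 6, 4, 9, 5, 7, 4, 5]
after = [4, 4, 3, 5, 6, 3, 2, 5]

diffs = [b - a for b, a in zip(before, after)]
sd_diff = stdev(diffs)                   # standard deviation of difference scores
sem_diff = sd_diff / len(diffs) ** 0.5   # standard error of the mean difference

t, p = ttest_rel(before, after)
print(f"SD of differences  = {sd_diff:.3f}")
print(f"SEM of differences = {sem_diff:.3f}")
print(f"t = {t:.3f}, p = {p:.4f}")
```

Comparing t against the two-tailed critical value for df = n − 1 = 7 (or simply checking p against .05) answers part d.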
2. A researcher is examining the impact that a political ad has on potential donors' willingness to contribute. The data indicates the amount (in dollars) each is willing to donate before and after viewing the advertisement.

     Before  After
1.     0      10
2.    20      20
3.    10       0
4.    25      50
5.     0       0
6.    50      75
7.    10      20
8.     0      20
9.    50      60
10.   25      35

a. Are there significant differences in the amount?
b. What is the value of t if this is done as an independent t-test?
c. Explain the difference between before/after and independent t-tests.

3. Participants attend three consecutive sessions in a business
seminar. In the first, there is no reinforcement for responding to the session moderator's questions. In the second, those who respond are provided with verbal reinforcers. In the third, responders receive bits of candy as reinforcers. The dependent variable is the number of times the participants respond in each session.

    None  Verbal  Token
1.    2     4      5
2.    3     5      6
3.    3     4      7
4.    4     6      7
5.    6     6      8
6.    2     4      5
7.    1     3      4
8.    2     5      7
a. Are the column-to-column differences significant? If so, which groups are significantly different from which?
b. Of what data scale is the dependent variable?
c. Calculate and explain the effect size.

4. In the calculations for Exercise 3, what step is taken to minimize error variance?
a. What is the source of that error variance?
b. If Exercise 3 had been a one-way ANOVA, what would have been the degrees of freedom for the error term?
c. How does the change in degrees of freedom for the error term in the within-subjects F affect the value of the test statistic?

5. Because SScol in the within-subjects F contains the treatment effect and measurement error, if there is no treatment effect, what will be the value of F?

6. Why is matching uncommon in within-subjects F analyses?

7. A group of nursing students is approaching the licensing test. The level of anxiety for each student is measured at 8 weeks prior to the test, then 4 weeks, 2 weeks, and 1 week before the test. Assuming that anxiety is measured on an interval scale, are there significant differences?

Student Number
    8 weeks  4 weeks  2 weeks  1 week
1.     5        8        9       9
2.     4        7        8      10
3.     4        4        4       5
4.     2        3        5       5
5.     4        6        6       8
6.     3        5        7       9
7.     4        5        5       4
8.     2        3        6       7

a. Is anxiety related to the time interval?
b. Which groups are significantly different from which?
c. How much of anxiety is a function of test proximity?

8. A psychology department sponsors a study of the relationship between participation in a particular internship opportunity and students' final grades. Eight students in their second year of graduate study are matched to eight students in the same year by grade. Those in the first group participate in
the internship. Students' grades after the second year are compared.

Student Pair Number  Internship  No Internship
1.                      3.6          3.2
2.                      2.8          3.0
3.                      3.3          3.0
4.                      3.8          3.2
5.                      3.2          2.9
6.                      3.3          3.1
7.                      2.9          2.9
8.                      3.1          3.4

a. Are the differences statistically significant?
b. This should be done as a dependent-samples t-test. Why, given that there are two separate groups involved?

9. A team of researchers associated with an accrediting body studies the amount of time professors devote to their scholarship before and after they receive tenure. Scores are hours per week.

Professor Number  Before Tenure  After Tenure
1.                     12             5
2.                     10             3
3.                      5             6
4.                      8             5
5.                      6             5
6.                     12            10
7.                      9             8
8.                      7             7

a. Are the differences statistically significant?
b. What is t if the groups had been independent?
c. What is the primary reason for the difference in the two t values?

10. A supervisor is monitoring the number of sick days employees take by month. For seven people, they are as follows:

Employee Number  Oct  Nov  Dec
1.                2    4    3
2.                0    0    0
3.                1    5    4
4.                2    5    3
5.                2    7    7
6.                1    3    4
7.                2    3    2

a. Are the month-to-month differences significant?
b. What is the scale of the independent variable in this analysis?
c. How much of the variance does the month explain?

11. If the people in each month of the Exercise 10 data were different, it would have been a one-way ANOVA.
a. Would the result have been significant?
b. Because total variance (SStot) is the same in either Exercise 10 or 11, and the SScol (Exercise 10) is the same as SSbet (Exercise 11), why are the F values different?

Analyzing the Research

Review the article abstracts provided below. You can then access the full articles via your university's online library portal to answer the critical thinking questions. Answers can be found in the answers appendix.

Using Repeated Measures ANOVA for a Stress Management Study
Elo, A., Ervasti, J., Kuosma, E., & Mattila, P. (2008). Evaluation of an organizational stress management program in a municipal public works organization. Journal of Occupational Health Psychology, 13(1), 10–23.

Article Abstract

The aim of this study was to investigate the effects of employee participation in an organizational stress management program consisting of several interventions aiming to improve psychosocial work environment and well-being. Pre- and postintervention questionnaires were used to measure the outcomes with a 2-year interval. This article describes the background of the program, results of previously published effect studies, and a qualitative evaluation of the program. The authors also tested the effects of level of participation in all interventions among the employees of the service production units by 2 (time) × 3 (group) repeated measures ANOVAs (n = 625). "Active participation" (more than 5.5 days) had a positive effect on feedback from supervisor and flow of information. Work climate remained on a permanent level while it decreased in the categories of moderate
  • 178. individual well-being or other aspects of psychosocial work environment as postulated by the work stress models. The qualitative evaluation and practical conclusions drawn by the management of the Organization provided a positive impression of the impact of the program. Critical Thinking Questions 1. What is the independent variable for which subjects are being tested, under all treatment levels? 2. Explain the importance of power in relation to this within- group design. 3. What is the disadvantage of testing subject group’s pre- and postparticipation program intervention? 4. Does this study need to worry about sphericity when conducting the repeated- measures ANOVA? Why or why not? Using ANOVA for a Personality Disorder Scales Study Wise, E. A. (1995). Personality disorder correspondence among the MMPI, MBHI, and MCMI. Journal of Clinical Psychology, 51(6), 790–798. Article Abstract MMPI, MBHI, and MCMI personality disorder scales were analyzed for convergent and discriminant validity. Friedman’s ANOVA indicated that there
  • 179. were no significant differ- ences among the sample’s averaged scale scores. Further analyses of the data, however, demonstrated that the Millon instruments classified significantly more of the sample as personality disordered when compared to Morey’s MMPI personality disorder scales. In addition, codetype correspondence among the three instruments was only 4 to 6%. When the instruments were analyzed in a pair-wise fashion, codetype correspondence increased to approximately 10 to 20%. These data indicate that these personality disorder scales do not demonstrate construct equivalence, particularly at the level of the individual profile. Critical Thinking Questions 1. Why did this study run a Friedman’s Nonparametric ANOVA? 2. The Friedman’s Nonparametric ANOVA showed no significant differences among the tests by scale means. What was the significance level for this to be true? 3. What is reported in Friedman’s Nonparametric ANOVA? Please label what each piece is from the Friedman output when comparing MBHI and MCMI compared to MMPI. 4. Suppose the Friedman’s ANOVA was significant. Would we run a post hoc test? What type of post hoc test?
iStockphoto/Thinkstock

Chapter 6 Analysis of Variance (ANOVA)

Learning Objectives

After reading this chapter, you will be able to . . .

1. explain why it is a mistake to analyze the differences between more than two groups with multiple t-tests.
2. relate sum of squares to other measures of data variability.
3. compare and contrast t-test with ANOVA.
4. demonstrate how to determine which group is significant in an ANOVA with more than two groups.
5. explain the use of eta-squared in ANOVA.
6. present statistics based on ANOVA results in APA format.
7. interpret results and draw conclusions of ANOVA.
8. discuss nonparametric Kruskal-Wallis H-test compared to the ANOVA.
suk85842_06_c06.indd 183 10/23/13 1:40 PM

CHAPTER 6 Section 6.1 One-Way Analysis of Variance

Ronald A. Fisher was present at the creation of modern statistical analysis. During the early part of the 20th century, Fisher worked at an agricultural research station in rural southern England. In his work analyzing the effect of pesticides and fertilizers on crop yields, he was stymied by the limitations in Gosset's independent t-test, which allowed him to compare only one pair of samples at a time. In the effort to develop a more comprehensive approach, Fisher created analysis of variance (ANOVA). Like Gosset, he felt that his work was important enough to publish, and like Gosset in his effort to publish the t-test, Fisher had opposition. In Fisher's case, the opposition came from a fellow statistician, Karl Pearson. This is the same man who created the first department
of statistical analysis at University College, London. In Chapters 9 and 11 you will study some of Pearson's work with correlations as well as Spearman's rho (ρ) and chi-square (χ²), which are used for the analysis of categorical (nominal and ordinal) data. Pearson also founded what is probably the most prominent journal for statisticians, Biometrika. Pearson was an advocate of making one comparison at a time and of using the largest groups possible to make those comparisons. When Fisher submitted his work to Pearson's journal with procedures suggesting that samples can be small and many comparisons can be made in the same analysis, Pearson rejected the manuscript. So began a long and increasingly acrimonious relationship between two men who would become giants in the field of statistical analysis and end up in the same department at University College. Interestingly, Gosset also gravitated to the department and managed to get along with both of them.

Fisher's contributions affect more than this chapter. Besides the development of the ANOVA, the concept of statistical significance is his, as is the hypothesis testing discussed in Chapter 5. Note that although a ubiquitous phenomenon, significance testing itself is not always accepted by other statisticians. One such adversary was William [Bill] Kruskal, who consequently derived the nonparametric version of the ANOVA, the Kruskal-Wallis H-test, which is discussed in this chapter. Despite these philosophical and statistical differences, R. A. Fisher made an enormous contribution to the field of quantitative analysis, as did his nemesis, Karl Pearson, with additional statistical contributions by William Sealy Gosset and Bill Kruskal.

6.1 One-Way Analysis of Variance

In any experiment, scores and measurements vary for many reasons. If a researcher is interested in whether children will emulate the videotaped behavior of adults whom they have watched, any differences in the children's behavior from before they see the adults to after are attributed primarily to the adults' behaviors. But even if all of the children watch with equal attentiveness, it is likely there will be differences in their behaviors
after the video. Some of those differences might stem from age differences among the children. Perhaps the amount of exposure children otherwise have to television will prompt differences in their behavior. Probably differences in their background experiences will also affect the way they behave. In an analysis of how behavior changes as a result of watching the video, the independent variable (IV) is whether or not the children have seen the video. Changes in their behavior, the dependent variable (DV), reflect the effect of the IV, but they also reflect all the other factors that prompt the children to behave differently. An IV is also referred to as a factor, particularly in procedures that involve more than one IV. Behavior changes that are not related to the IV reflect the presence of error variance attributed to other factors known as confounding variables.

When researchers work with human subjects, some level of error variance is inescapable. Even under tightly controlled conditions where all members of a sample receive exactly the same treatment, the subjects are unlikely to respond the same way. There are just too many confounding variables that also affect their behavior. Fisher's approach was to calculate the total variability in a problem and then analyze it, thus the name analysis of variance. Any number of IVs can be included in an ANOVA. Here, we are interested primarily in
ANOVA in its simplest form, a procedure called one-way ANOVA. The "one" in one-way ANOVA indicates that there is just one IV in this model. In that regard, one-way ANOVA is similar to the independent-samples t-test discussed in Chapter 5. Both tests have one IV and one DV. The difference is that the independent t-test allows for an IV with just two groups, but the IV in ANOVA can have any number of groups, generally more than two. In other words, a one-way ANOVA with just two groups is the same as an independent-samples t-test, where the statistic calculated in ANOVA, F, is equal to t²; this is addressed and illustrated in Section 6.5.

The ANOVA Advantage

The ANOVA and the t-test both answer the same question: Are there significant differences between groups? So why bother with another test when we have the t-test? Suppose someone has developed a group therapy program for people with anger management problems and the question is, are there significant differences in the behavior of clients who spend (a) 8, (b) 16, and (c) 24 hours in therapy over a period of weeks? Why not answer the question by performing three t-tests as follows?

1. Compare the 8-hour group to the 16-hour group.
2. Compare the 16-hour group to the 24-hour group.
3. Compare the 8-hour group to the 24-hour group.

Try It! A: What does the "one" in one-way ANOVA refer to?
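The F = t² relationship for two groups can be verified numerically. A short sketch with scipy, using arbitrary illustrative data:

```python
# With exactly two groups, the one-way ANOVA F statistic equals the
# square of the independent-samples t statistic, and the p values match.
from scipy.stats import f_oneway, ttest_ind

group_a = [12, 15, 11, 14, 13, 16]   # arbitrary illustrative scores
group_b = [18, 17, 20, 16, 19, 21]

t, p_t = ttest_ind(group_a, group_b)   # equal-variance (pooled) t-test
f, p_f = f_oneway(group_a, group_b)

print(f"t = {t:.4f}, t^2 = {t**2:.4f}")
print(f"F = {f:.4f}")
```

Running this shows F agreeing with t² to floating-point precision, which is why the two-group case is treated as interchangeable in the text.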
The Problem of Multiple Comparisons

These three tests represent all possible comparisons, but there are two problems with this approach. First, all possible comparisons is a good deal more manageable if there are three groups than if there are, say, five groups. If there were five groups, labeled a through e, note the number of comparisons needed to cover all possible comparisons:

1. a to b
2. a to c
3. a to d
4. a to e
5. b to c
6. b to d
7. b to e
8. c to d
9. c to e
10. d to e

All possible comparisons among five groups involve 10 tests, as seen above, to cover all the combinations of tests.
  • 187. seen above to cover all the combinations of tests. Family-Wise Error The other problem is an issue of inflated error in hypothesis testing when doing multiple tests known as family-wise error. Recall that the potential for type I error (a) is deter- mined by the level at which the test is conducted. At a 5 .05, any significant finding will result in a type I error an average of 5% of the time. However, that level of error assumes that each test is conducted with new data thereby increasing the family-wise error rate (FWER). Specifically, if statistical testing is done repeatedly with the same data, the poten- tial for type I error does not remain fixed at .05 (or whatever the level of the testing), but grows. In fact, if 10 tests are conducted in succession with the same data as with groups labeled a, b, c, d, and e mentioned earlier, and each finding is significant, by the time the 10th test is completed, the potential for alpha error is FWER 5 .40 or a 40% error prob- ability, as the following procedure illustrates: P a 5 1 2 (1 2 pa)n Where Pa 5 the probability of alpha error overall pa 5 the probability of alpha error for the initial significant
finding
n = the number of tests conducted where the result was significant

Pα = 1 − (1 − .05)^10 = 1 − .599
FWER = .401

The probability of a type I error at this point is 4 in 10, or 40%!
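The arithmetic above is easy to check directly. A minimal sketch (the .05 alpha and the 10-test case come from the example in the text; the helper name `fwer` is ours):

```python
# Family-wise error rate for n repeated tests at the same alpha,
# following FWER = 1 - (1 - alpha)^n from the text.
def fwer(alpha: float, n_tests: int) -> float:
    """Probability of at least one type I error across n_tests tests."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 3, 10):
    print(f"n = {n:2d}: FWER = {fwer(0.05, n):.3f}")
```

At n = 10 the result is .401, matching the 40% figure in the worked computation.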
The business of raising the (1 − pα) difference to the 10th power (or however many comparisons there are) is not only tedious, but the more important problem is that the probability of a type I error does not remain fixed when there are successive significant results with the same data. Therefore, using multiple t-tests is never a good option. In the end, running one test in an overall ANOVA will control for inflated FWER. An ANOVA is therefore termed an omnibus test, as it tests the overall significance of the research model based on the differences between sample means. It will not tell you which two means are significantly different, which is why follow-up post hoc comparisons are executed. These concepts will be discussed in further detail throughout the chapter.

The Variance in Analysis of Variance (ANOVA)

To analyze variance, Fisher began by calculating total variability from all sources. He recognized that when scores vary in a research study, they do so for two reasons. They vary because the independent variable (the "treatment") has had an effect, and they vary because of factors beyond the control of the researcher, producing the error variance referred to earlier. The test statistic in ANOVA is the F ratio (named for Fisher), which is treatment variance (variance in the DV that can be explained by the IV) divided by error variance (variance in the DV that cannot be explained because it is due to confounding variables). When F is large, it indicates that the difference between at least two of the groups in the analysis is not random and that there are significant differences between at least two group means. When the F ratio is small (close to a value of 1), it indicates that the IV has not had enough impact to overcome error variability, and the differences between groups are not significant. We will return to the F ratio when we discuss Formula 6.4.

Variance Between and Within Groups

If three groups of the same size are all selected from one population, they could be represented by three distributions, as shown in Figure 6.1. They do not have exactly the same mean, but that is because even when they are selected from the same population, samples are rarely identical. Those initial differences between sample means indicate some degree of sampling error.

Figure 6.1: Three groups drawn from the same population
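The behavior of the F ratio described above can be seen numerically with scipy's one-way ANOVA. The two data sets below are arbitrary illustrative values, chosen so that one has nearly identical group means and the other has well-separated means:

```python
# One-way ANOVA F ratio: between-groups (treatment) variance over
# within-groups (error) variance. Illustrative data only.
from scipy.stats import f_oneway

# Groups with very similar means: expect a small F and a large p value.
a1 = [10, 12, 11, 13, 12, 11]
b1 = [11, 13, 10, 12, 11, 13]
c1 = [12, 11, 13, 10, 12, 12]
f_small, p_small = f_oneway(a1, b1, c1)

# Groups with well-separated means: expect a large F and a small p value.
a2 = [10, 12, 11, 13, 12, 11]
b2 = [16, 18, 17, 19, 18, 17]
c2 = [22, 24, 23, 25, 24, 23]
f_large, p_large = f_oneway(a2, b2, c2)

print(f"similar means:   F = {f_small:.3f}, p = {p_small:.4f}")
print(f"separated means: F = {f_large:.3f}, p = {p_large:.6f}")
```

The first case corresponds to Figure 6.1 (all groups drawn from one population); the second corresponds to the post-treatment situation pictured in Figure 6.2.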
  • 190. sented by three distributions, as shown in Figure 6.1. They do not have exactly the same mean, but that is because even when they are selected from the same population, samples are rarely identical. Those initial differences between sample means indicate some degree of sampling error. Figure 6.1: Three groups drawn from the same population suk85842_06_c06.indd 187 10/23/13 1:40 PM CHAPTER 6Section 6.1 One-Way Analysis of Variance The reason that each of the three distributions has width is that there are differences within each of the groups. Even if the sample means were the same, individuals selected to the same sample will rarely manifest precisely the same level of whatever is measured. If a population is identified—for example, a population of the academically gifted—and a sample is drawn from that population, the individuals in the sample will not all have the same level of ability. Because they are all members of the population of the academically gifted, they will probably all be higher than the norm for academic ability, but there will still be differences in the subjects’ academic ability within the sample. These differences within are sources of error variance. The treatment effect is indicated in how the IV affects the way
  • 191. the DV is manifested. For example, three groups of subjects are administered different levels of a mild stimulant (the IV) to see the effect on level of attentiveness. The issue in ANOVA is whether the IV, the treat- ment, creates enough additional between-groups variability to exceed any error variance. Ultimately, the question is whether, as a result of the treatment, the samples still represent populations with the same mean, or whether, as is suggested by the distributions in Figure 6.2, they may represent populations with different means. Figure 6.2: Three groups after the treatment The within-groups variability in these three distributions is the same as it was in the dis- tributions in Figure 6.1. It is the between-groups variability that has changed in Figure 6.2. More particularly, it is the difference between the group means that has changed. Although there was some between-groups variability before the treatment, it was comparatively minor and probably reflected sampling variability. After the treatment, the differences between means are much greater. What F indicates is whether group differences are great enough to be statistically significant not due to chance. The Statistical Hypotheses in One-Way ANOVA The hypotheses are very much like they were for the
  • 192. independent t-test, except that they accommodate more groups. For the t-test, the null hypothesis is written H0: m1 5 m2. It indicates that the two samples involved were drawn from populations with the same means. For a one-way ANOVA with three groups, the null hypothesis has this form: H0: m1 5 m2 5 m3 B If a psychologist is interested in the impact that 1 hour, 5 hours, or 10 hours of therapy have on client behavior, how are behavior differences related to gender explained? Try It! suk85842_06_c06.indd 188 10/23/13 1:40 PM CHAPTER 6Section 6.1 One-Way Analysis of Variance It indicates that the three samples were drawn from populations with the same means. Things have to change for the alternate hypothesis, however, because with three groups, there is not just one possible alternative. Note that each of the following is possible:
  • 193. a. Ha: m1 ? m2 5 m3 Sample 1 represents a population with a mean value different from the mean of the population represented by Samples 2 and 3. b. Ha: m1 5 m2 ? m3 Samples 1 and 2 represent a population with a mean value different from the mean of the population represented by Sample 3. c. Ha: m1 5 m3 ? m2 Samples 1 and 3 represent a population with a mean value different from the population represented by Sample 2. d. Ha: m1 ? m2 ? m3 All three samples represent populations with different means. Because the several possible alternative outcomes multiply rap- idly when the number of groups increases, a more general alternate hypothesis is given. Either all the groups involved come from popu- lations with the same means, or at least one of them does not. So the form of the alternate hypothesis for an ANOVA with any number of groups is simply Ha: At least one of the means is different from the other means. Also remember that all the hypotheses are either nondirectional, in that there is no predic- tion of which sample mean will be higher than the others:
Nondirectional alternative hypothesis: Ha: μ1 ≠ μ2 ≠ μ3

or directional, in that there is a prediction of which sample mean will be higher than the other means. In the directional alternative hypothesis below, the prediction is that μ3 will be higher than μ2, which in turn will be higher than μ1.

Directional alternative hypothesis: Ha: μ1 < μ2 < μ3

As a researcher, it is important to weigh the value of making a prediction in a one-tailed test against making no prediction in a two-tailed test, as discussed in Chapter 5.

Measuring Data Variability in the One-Way ANOVA

We have discussed several measures of data variability to this point, including the standard deviation (s), the variance (s²), the standard error of the mean (SEM), the standard error of the difference (SEd), and the range. For ANOVA, Fisher added one more,

Try It! C: How many t-tests would it take to make all possible comparisons in a procedure with six groups?
Chapter 6, Section 6.1: One-Way Analysis of Variance

the sum of squares (SS). The sum of squares is the sum of the squared differences between scores and one of several mean values. In ANOVA,

• One sum-of-squares value involves the differences between individual scores and the mean of all the scores in all the groups (the grand mean). This is called the sum of squares total (SStot) because it measures all variability from all sources.
• A second sum-of-squares value indicates the difference between the means of the individual groups and the grand mean. This is the sum of squares between (SSbet). It measures the effect of the IV, the treatment effect, as well as any differences that existed between the groups before the study began.
• A third sum-of-squares value measures the difference between scores in the samples and the means of their samples. These sum of squares within (SSwith) values reflect the differences in the way subjects respond to the same stimulus. Because this value is entirely error variance, it is also called the sum of squares error (SSerr) or the sum of squares residual (SSres).

All Variability From All Sources: The Sum of Squares Total (SStot)

There are multiple formulas for SStot. They all provide the
same answer, but some make more sense to look at than others. Formula 6.1 makes it clear that at the heart of SStot is the difference between each individual score (x) and the mean of all scores, or the grand mean, for which the notation is MG.

SStot = Σ(x − MG)²   Formula 6.1

Where
x = each score in all groups
MG = the mean of all data from all groups, the grand mean

To calculate SStot, follow these steps:
1. Sum all scores from all groups and divide by the number of scores to determine the grand mean, MG.
2. Subtract MG from each score (x) in each group, and then square the difference: (x − MG)²
3. Sum all the squared differences: Σ(x − MG)²

The Treatment Effect: The Sum of Squares Between (SSbet)

The between-groups variance, the sum of squares between (SSbet), contains the variability due to the independent variable, the treatment effect. It will also contain any initial differences between the groups, which of course is error variance. For three groups labeled a, b, and c, the formula is

SSbet = (Ma − MG)²na + (Mb − MG)²nb + (Mc − MG)²nc   Formula 6.2

Where
Ma = the mean of the scores in the first group (a)
MG = the same grand mean used in SStot
na = the number of scores in the first group (a)

To calculate SSbet, follow these steps:
1. Determine the mean for each group: Ma, Mb, and so on.
2. Subtract MG from each sample mean and square the difference: (Ma − MG)²
3. Multiply the squared difference by the number in the group: (Ma − MG)²na
4. Repeat for each group.
5. Sum (Σ) the results across groups.

The value that results from Formula 6.2 represents the differences between the groups and the mean of all the data.

The Error Term: The Sum of Squares Within (SSwith)

When a group receives the same treatment but individuals within the group respond differently, their differences constitute error: unexplained variability. Maybe subjects' age differences are the cause, or perhaps the circumstances of their family lives, but for some reason not analyzed in the particular study, subjects in the same group often respond differently to the same stimulus. The amount of this unexplained variance within the groups is calculated with the SSwith, for which we have Formula 6.3:

SSwith = Σ(xa − Ma)² + Σ(xb − Mb)² + Σ(xc − Mc)²   Formula 6.3

Where
SSwith = the sum of squares within
xa = each of the individual scores in Group a
Ma = the score mean in Group a

To calculate SSwith, follow these steps:
1. Take the mean for each of the groups; these are available from calculating the SSbet earlier.
2. From each score in each group,
   a. subtract the mean of the group,
   b. square the difference, and
   c. sum the squared differences within each group.
3. Repeat this for each group.
4. Sum the results across the groups.

Try It! D: When will the sum-of-squares values be negative?

The SSwith (or the SSerr) measures the degree to which scores vary due to factors not controlled in the study, fluctuations that constitute error variance. Because the SStot consists of the SSbet and the SSwith, once the SStot and the SSbet are known, the SSwith can be determined by subtraction:

SStot − SSbet = SSwith

However, there are two reasons not to determine the SSwith by simple subtraction. First, if there is an error in the SSbet, it is only perpetuated by the subtraction. Second, calculating the value with Formula 6.3 helps clarify that what is being
determined is a measure of how much variation in scores there is within each group.

For the few problems done entirely by hand, we will take the "high road" and use the conceptual formulas. Conceptual formulas (6.1, 6.2, and 6.3) clarify the logic involved, but in the case of analysis of variance they also require a good deal of tiresome subtracting and then squaring of numbers. To minimize the tedium, the data sets here are all relatively small. When larger studies are done by hand, people often shift to the "calculation formulas" for simpler arithmetic, but clarity is sacrificed. Happily, you will seldom find yourself doing manual ANOVA calculations, and after a few simple longhand problems, this chapter will explain how you can use Excel or SPSS for help with larger data sets.

Calculating the Sums of Squares

A researcher is interested in the level of social isolation people feel in small towns (a), suburbs (b), and cities (c). Participants randomly selected from each of those three settings take the Assessment List of Nonnormal Environments (ALONE), for which the following scores are available:

a. 3, 4, 4, 3
b. 6, 6, 7, 8
c. 6, 7, 7, 9

We know we are going to need the mean of all the data (MG) as
well as the mean for each group (Ma, Mb, Mc), so we will start there.

Verify that Σx = 70 and N = 12, so that MG = 5.833.
For the small-town subjects, Σxa = 14 and na = 4, so Ma = 3.500.
For the suburban subjects, Σxb = 27 and nb = 4, so Mb = 6.750.
For the city subjects, Σxc = 29 and nc = 4, so Mc = 7.250.

For the sum of squares total, the formula is SStot = Σ(x − MG)².

SStot = 41.67

The calculations are in Table 6.1.

Table 6.1: Calculating the sum of squares total (SStot)
SStot = Σ(x − MG)², MG = 5.833

For the town data:
x − M                  (x − M)²
3 − 5.833 = −2.833     8.026
4 − 5.833 = −1.833     3.360
4 − 5.833 = −1.833     3.360
3 − 5.833 = −2.833     8.026

For the suburb data:
x − M                  (x − M)²
6 − 5.833 = 0.167      0.028
6 − 5.833 = 0.167      0.028
7 − 5.833 = 1.167      1.362
8 − 5.833 = 2.167      4.696

For the city data:
x − M                  (x − M)²
6 − 5.833 = 0.167      0.028
7 − 5.833 = 1.167      1.362
7 − 5.833 = 1.167      1.362
9 − 5.833 = 3.167      10.030

SStot = 41.668
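The grand mean and Formula 6.1 can be checked in a few lines of Python. This is an illustrative sketch; the variable names are ours, not the text's.

```python
# Sum of squares total for the ALONE scores (Formula 6.1): SStot = sum of (x - MG)^2.
# The three groups are the social-isolation example from the text.
town = [3, 4, 4, 3]
suburb = [6, 6, 7, 8]
city = [6, 7, 7, 9]

scores = town + suburb + city
grand_mean = sum(scores) / len(scores)                  # MG = 70 / 12 = 5.833
ss_total = sum((x - grand_mean) ** 2 for x in scores)   # all variability, all sources

print(round(grand_mean, 3))  # 5.833
print(round(ss_total, 2))    # 41.67
```

The result matches the hand calculation in Table 6.1 to rounding.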
For the sum of squares between, the formula is

SSbet = (Ma − MG)²na + (Mb − MG)²nb + (Mc − MG)²nc

The SSbet involves three groups rather than the 12 individuals required for SStot. The SSbet is as follows:

SSbet = (Ma − MG)²na + (Mb − MG)²nb + (Mc − MG)²nc
      = (3.5 − 5.833)²(4) + (6.75 − 5.833)²(4) + (7.25 − 5.833)²(4)
      = 21.772 + 3.364 + 8.032
      = 33.17

The SSwith indicates the error variance by determining the differences between individual scores in a group and their means. The formula is

SSwith = Σ(xa − Ma)² + Σ(xb − Mb)² + Σ(xc − Mc)²

SSwith = 8.50
The calculations are in Table 6.2.

Because we calculated the SSwith directly instead of determining it by subtraction, we can now check for accuracy by adding its value to the SSbet. If the calculations are correct, SSwith + SSbet = SStot. For the isolation example, we have

8.504 + 33.168 = 41.672

In the initial calculation, SStot = 41.668. The difference of 0.004 is round-off error and is unimportant.

Although they were not called sums of squares, we have been calculating an equivalent statistic since Chapter 1. At the heart of the standard deviation calculation are those repetitive x − M differences for each score in the sample. The difference values are then squared and summed, much as they are for calculating SSwith and SStot. Further, the denominator in the standard deviation calculation is n − 1, which should look suspiciously like some of the degrees of freedom values we will discuss in the next section.

Interpreting the Sums of Squares

The different sums-of-squares values are measures of data variability, which makes them like the standard deviation, variance measures, the standard error of the mean, and so on. But there is an important difference between SS and the other statistics. In addition to data variability, the magnitude of the SS value reflects the number of scores included. Because
sums of squares are in fact the sum of squared values, the more values there are, the larger the value becomes. With statistics like the standard deviation, adding more values near the mean of the distribution actually shrinks its value. But this cannot happen with the sum of squares. Additional scores, whatever their value, will almost always increase the sum of squares.

Try It! E: What will SStot − SSwith yield?

Table 6.2: Calculating the sum of squares within (SSwith)
SSwith = Σ(xa − Ma)² + Σ(xb − Mb)² + Σ(xc − Mc)²
Scores: 3, 4, 4, 3 | 6, 6, 7, 8 | 6, 7, 7, 9
Ma = 3.500, Mb = 6.750, Mc = 7.250

For the town data:
x − M                  (x − M)²
3 − 3.50 = −0.50       0.250
4 − 3.50 = 0.50        0.250
4 − 3.50 = 0.50        0.250
3 − 3.50 = −0.50       0.250

For the suburb data:
x − M                  (x − M)²
6 − 6.750 = −0.750     0.563
6 − 6.750 = −0.750     0.563
7 − 6.750 = 0.250      0.063
8 − 6.750 = 1.250      1.563

For the city data:
x − M                  (x − M)²
6 − 7.250 = −1.250     1.563
7 − 7.250 = −0.250     0.063
7 − 7.250 = −0.250     0.063
9 − 7.250 = 1.750      3.063

SSwith = 8.504
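Formulas 6.2 and 6.3 can be checked the same way. The sketch below (again with hypothetical variable names) also confirms that the between and within pieces add up to SStot.

```python
# Between- and within-group sums of squares for the social-isolation data
# (Formulas 6.2 and 6.3), plus the partition check SSbet + SSwith = SStot.
groups = {"town": [3, 4, 4, 3], "suburb": [6, 6, 7, 8], "city": [6, 7, 7, 9]}

all_scores = [x for g in groups.values() for x in g]
grand_mean = sum(all_scores) / len(all_scores)

# SSbet: squared distance of each group mean from the grand mean, weighted by n.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())

# SSwith: squared distance of each score from its own group mean (error variance).
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups.values())

ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

print(round(ss_between, 2))  # 33.17
print(round(ss_within, 2))   # 8.5
print(round(ss_between + ss_within, 2) == round(ss_total, 2))  # True
```

Computed in full precision, the partition is exact; the 0.004 discrepancy in the text comes only from rounding intermediate values by hand.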
This characteristic makes the sum of squares difficult to interpret. What constitutes much or little variability depends not just on how much difference there is between the scores and the mean to which they are compared but also on how many scores there are. Fisher turned the sum-of-squares values into a "mean measure of variability" by dividing each sum-of-squares value by its degrees of freedom. The SS ÷ df operation creates what is called the mean square (MS). In the one-way ANOVA, there is an MS value associated with both the SSbet and the SSwith (SSerr).

There is no mean square total given in the table, but if it were calculated, it would be the total variance (SSbet + SSwith) divided by the degrees of freedom for the entire data set treated as a single sample (N − 1). Dividing the SStot by its degrees of freedom (N − 1) would provide a mean level of overall variability, but that would not help answer questions about the ratio of between-groups variance to within-groups variance.

The degrees of freedom for each of the sums of squares calculated for the one-way ANOVA are as follows:

• Degrees of freedom total (dftot) = N − 1, where N is the total number of scores
• Degrees of freedom between (dfbet) = k − 1, where k is the number of groups
  SSbet ÷ dfbet = MSbet
• Degrees of freedom within (dfwith) = N − k
  SSwith ÷ dfwith = MSwith

Although there is no MStot, we need the sum of squares total (SStot) and the degrees of freedom total (dftot) because they provide an accuracy check:

a. The sums of squares between and within should equal the total sum of squares: SSbet + SSwith = SStot
b. The sum of degrees of freedom between and within should equal degrees of freedom total: dfbet + dfwith = dftot

Remembering these relationships can help reveal errors. In other words, error is the unexplained or unsystematic variance within groups (SSwith), variance not caused by the experimental manipulation, as opposed to the explained or systematic variance between groups (SSbet) that is due to the experimental treatment.

The F Ratio
The mean squares for between and within are the components of F, and the F ratio is the test statistic in ANOVA. As noted earlier in this chapter, F is a ratio:

F = MSbet / MSwith   Formula 6.4

The issue is whether the MSbet, which contains the treatment effect and some error, is substantially greater than the MSwith, which contains only error. This is illustrated in Figure 6.3 by comparing the distance from the mean of the first distribution to the mean of the second distribution, the A variance, to the B and C variances, which indicate the differences within groups.

If the MSbet/MSwith ratio is large (it must be substantially greater than 1), the difference between groups is likely to be significant. When that ratio is small (close to 1), F is likely to be nonsignificant. How large F must be to be significant depends on the degrees of freedom for the problem, just as it did for the t-tests.
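As a quick check on Formula 6.4, the mean squares and F for the social-isolation data can be computed directly from the sums of squares found earlier. This is a sketch; the variable names are illustrative.

```python
# Mean squares and the F ratio (Formula 6.4) for the social-isolation example:
# k = 3 groups, N = 12 scores, using the SS values computed earlier.
ss_between, ss_within = 33.1667, 8.5
k, n_total = 3, 12

df_between = k - 1                     # 2
df_within = n_total - k                # 9
ms_between = ss_between / df_between   # about 16.58
ms_within = ss_within / df_within      # about 0.94
f_ratio = ms_between / ms_within

print(round(f_ratio, 2))  # 17.56
```

Full precision gives F = 17.56; the text's 17.55 reflects rounding the mean squares before dividing, a trivial difference.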
Figure 6.3: The F ratio: comparing variance between groups (A) to variance within groups (B + C)

The ANOVA Table

With the sums of squares and the degrees of freedom for the different values in hand, the ANOVA results are presented in a table often referred to as a source table because it lists the sources of variability. It indicates

• the source of the variance,
• the sums-of-squares values,
• the degrees of freedom: dftot = N − 1 (because N = 12, dftot = 11); dfbet = k − 1 (because k, the number of groups, = 3, dfbet = 2); dfwith = N − k (because N = 12 and k = 3, dfwith = 9),
• the mean square values, which are SS/df, and
• the F value, which is MSbet/MSwith.
For the social isolation problem, the ANOVA table is

Source    SS       df    MS       F
Between   33.17     2    16.58    17.55
Within     8.50     9     0.95
Total     41.67    11

The table makes it easy to check some of the results for accuracy. Check that

SSbet + SSwith = SStot

Also verify that

dfbet + dfwith = dftot

In the course of checking results, note that the sums-of-squares values can never be negative. Because the SS values are literally sums of squares, a negative number indicates a calculation error somewhere; there is no such thing as negative variability (Chapter 1). The smallest a sum-of-squares value can be is 0, and this can happen only if all scores in the sum-of-squares calculation have the same value.

Understanding F

The larger F is, the more likely it is to be statistically significant, but how large is large
enough? In the preceding ANOVA table, F = 17.55, which seems like a comparatively large value.

• Because F is determined by dividing MSbet by MSwith, the value of F indicates the number of times MSbet is greater than MSwith.
• Here MSbet is about 17.55 times greater than MSwith, which seems promising, but to be sure, it must be compared to a value from the critical values of F (Table 6.3, which is repeated in the Appendix as Table C). As with the t-test, as degrees of freedom increase, the critical values decline. The difference is that with F, two df values are involved: one for the MSbet and the other for the MSwith.
• In Table 6.3 (also Table C in the Appendix), the critical value is identified by moving across the top of the table to the dfbet (the df numerator) and then moving down that column to the dfwith (the df denominator). According to the social isolation ANOVA table above, these are dfbet = 2 and dfwith = 9.
• The intersection of the 2 at the top and the 9 along the left side of the table leads to two critical values: one in regular type, which is for α = .05 and is the default, and one in bold type, which is the value for testing at α = .01.
• The critical value when testing at p = .05 is 4.26.
• The critical value indicates that any ANOVA test with 2 and 9 df that has an F value equal to or greater than 4.26 is statistically significant. The social isolation differences between the three groups are probably not due to sampling variability. The statistical decision is to reject H0.

The relatively large value of F (it is more than four times the critical value) indicates that much more of the variability in social isolation is probably related to where respondents live than is attributable to error variance.

Table 6.3: The critical values of F
Values in regular type indicate the critical value for p = .05; values in bold type indicate the critical value for p = .01. In this plain-text rendering, each cell is shown as a .05/.01 pair. Columns give the df numerator; rows give the df denominator.
Try It! F: If the F in an ANOVA is 4.0 and the MSwith = 2.0, what will be the value of MSbet?

Table 6.3: The critical values of F (continued)

df denominator 14, df numerator 1 to 4: 4.60/8.86, 3.74/6.51, 3.34/5.56, 3.11/5.04
df denominator 30, df numerator 1 to 10: 4.17/7.56, 3.32/5.39, 2.92/4.51, 2.69/4.02, 2.53/3.70, 2.42/3.47, 2.33/3.30, 2.27/3.17, 2.21/3.07, 2.16/2.98

Source: Richard Lowry, www.vassarstats.net. Retrieved from http://vassarstats.net/textbook/apx_d.html
6.2 Identifying the Difference: Post Hoc Tests and Tukey's HSD

A significant t from an independent t-test provides a simpler interpretation than a significant F from an ANOVA with three or more groups can provide. A significant t indicates that the two groups probably belong to populations with different means. A significant F indicates that at least one group is significantly different from at least one other group in the study, but unless there are only two groups in the ANOVA, it is not clear which group is significantly different from which. If the null hypothesis is rejected, there are a number of possible alternatives, as we noted when we listed all the possible Ha outcomes earlier.

The point of a post hoc test (an "after this" test) conducted following an ANOVA is to determine which groups are significantly different from each other. So when F is significant, a post hoc test is the next step. Statisticians debate whether to run a post hoc test when F is not significant, as there may be instances in which the overall F is nonsignificant yet the post hoc tests detect a significant difference between two groups. With the ease of running the analysis in Excel or SPSS, researchers may run post hoc tests to determine whether there are significant differences in means
between pairs of groups. In the latter case, a planned comparison is most prudent for detecting specific mean differences. Whether a planned comparison or a post hoc test is used, the determination should be based on the purpose of the study. If the goal is to test the null hypothesis that the means are not significantly different, then a significant omnibus F is appropriate. On the other hand, if there are specific differences between means to detect, then the omnibus F result is not necessary and going straight to the comparisons is appropriate, as in a planned comparison between means.

There are many post hoc tests, used for different purposes and based on their own assumptions and calculations (18 of them in SPSS, named after their respective authors). Each has particular strengths, but one of the more common in the psychological disciplines, and also one of the easiest to calculate, is John Tukey's HSD test, for "honestly significant difference."

Many statisticians use the terms liberal and conservative to describe post hoc tests. A liberal test is one in which there is a greater chance of finding a significant difference between means but a higher chance of a Type I error. Fisher's least significant difference (LSD) test is an example of a liberal test. These are seldom used, for the very concern of committing a Type I error. Conversely, a conservative post hoc test has a lower chance of finding a significant difference between means but also a lower chance of a Type I
error. One such conservative test is Bonferroni's post hoc. By their very conservative nature, these post hoc tests are more widely used.

Formula 6.5 produces a value that is the smallest difference between the means of any two samples that can be statistically significant:

HSD = x√(MSwith / n)   Formula 6.5

Where
x = a table value indexed to the number of groups (k) in the problem and the degrees of freedom within (dfwith) from the ANOVA table
MSwith = the value from the ANOVA table
n = the number in one group when group sizes are equal

In order to compute Tukey's HSD, follow these steps:

1. From Table 6.4 locate the value of x by moving across the top of the table to the number
of groups/treatments (k = 3), and then down the left side for the within degrees of freedom (dfwith = 9). The intersecting values are 3.95 and 5.43. The smaller of the two is the value for p = .05, as it was in our test. The post hoc test is always conducted at the same probability level as the ANOVA; in this case, p = .05.
2. The calculation is 3.95 times the square root of .945 (the MSwith) divided by 4 (n):
   3.95√(.945 / 4) = 1.920
3. This value is the minimum difference between the means of two significantly different samples. The sign of the difference does not matter; it is the absolute value we need.

The means for social isolation in the three groups are the following:

Ma = 3.500 for small-town respondents
Mb = 6.750 for suburban respondents
Mc = 7.250 for city respondents

Small towns minus suburbs: Ma − Mb = 3.50 − 6.75 = −3.25. This difference exceeds 1.92 and is significant.
Small towns minus cities: Ma − Mc = 3.50 − 7.25 = −3.75. This difference exceeds 1.92 and is significant.
Suburbs minus cities: Mb − Mc = 6.75 − 7.25 = −0.50. This difference is less than 1.92 and is not significant.

When several groups are involved, it is sometimes helpful to create a table that presents all the differences between pairs of means. Table 6.5, which is repeated in the Appendix as Table D, presents the Tukey's HSD results for the social isolation problem.

Formula 6.5 is used when group sizes are equal. However, there is an alternate formula for unequal group sizes for the more adventurous:

HSD = x√((MSwith / 2)(1/n1 + 1/n2))

with a separate HSD value computed for each pair of means in the problem.

Table 6.4: Tukey's HSD critical values: q(alpha, k, df)
The critical value for q corresponding to alpha = .05 (top value) and alpha = .01 (bottom value); shown here as .05/.01 pairs.

df = 5, k = 2 to 5 treatments: 3.64/5.70, 4.60/6.98, 5.22/7.80, 5.67/8.42
df = 40, k = 2 to 10 treatments: 2.86/3.82, 3.44/4.37, 3.79/4.70, 4.04/4.93, 4.23/5.11, 4.39/5.26, 4.52/5.39, 4.63/5.50, 4.73/5.60

Source: Tukey's HSD critical values (n.d.). Retrieved from http://www.stat.duke.edu/courses/Spring98/sta110c/qtable.html
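The HSD steps can be sketched in Python. Here q_crit = 3.95 is the tabled value read above for k = 3 and dfwith = 9 at p = .05; the variable names are illustrative.

```python
# Tukey's HSD for the social-isolation example (Formula 6.5), equal group sizes.
import math

means = {"town": 3.50, "suburb": 6.75, "city": 7.25}
ms_within = 8.5 / 9     # MSwith from the ANOVA source table
n_per_group = 4
q_crit = 3.95           # tabled q for k = 3 treatments, df_within = 9, alpha = .05

hsd = q_crit * math.sqrt(ms_within / n_per_group)
print(round(hsd, 2))    # 1.92

# Compare every pair of group means against the HSD threshold.
names = list(means)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        a, b = names[i], names[j]
        diff = abs(means[a] - means[b])
        print(a, b, round(diff, 2), "significant" if diff >= hsd else "not significant")
```

The loop reproduces the three comparisons above: town vs. suburb and town vs. city exceed 1.92, while suburb vs. city does not.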
Table 6.5: Presenting Tukey's HSD results in a table

HSD = x√(MSwith / n) = 3.95√(.945 / 4) = 1.920

Any difference between pairs of means of 1.920 or greater is a statistically significant difference. Significant mean differences are marked with an asterisk.

                          Suburbs (M = 6.750)    Cities (M = 7.250)
Small towns (M = 3.500)   Diff = 3.250*          Diff = 3.750*
Suburbs (M = 6.750)                              Diff = 0.500

The values entered in the cells in Table 6.5 indicate the differences between each pair of means in the study. Comparing the mean scores from each of the three groups indicates that the respondents from small towns expressed a significantly lower level of social isolation than those in either the suburbs or cities. Comparing the mean scores from the suburban and city groups indicates that social isolation scores are higher in the city, but the difference is not large enough to be statistically significant.

The significant F from the ANOVA indicated that at least one group had a significantly different level of social isolation from at least one other group, but that is all a significant F can reveal. The result does not indicate which group is significantly different from which other group, unless there are only two groups. The post hoc test indicates which pairs of groups are significantly different from each other. Table 6.5 is an example of how to illustrate the significant and the nonsignificant differences.

One caveat in using Tukey's HSD is that there is an assumption of equality of variances
(homogeneity) between groups, commonly checked with Levene's test. This assumption applies here as well. Suppose there is a violation of homogeneity. In that instance, an adjusted post hoc test that accounts for inequality of variances (heterogeneity) will need to be employed. To implement this in SPSS, for instance, there are four options under the Equal Variances Not Assumed heading when conducting a post hoc test for ANOVA. One of these approaches is the Games-Howell post hoc test, which is executed by checking that box in the SPSS Post Hoc tests tab for ANOVA.
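The text leaves Levene's test to SPSS. As a rough illustration of the idea (not the SPSS implementation, which centers on the mean by default), the median-centered Brown-Forsythe variant is just a one-way ANOVA run on each score's absolute deviation from its group median:

```python
# Median-centered Levene (Brown-Forsythe) check on the social-isolation groups:
# transform each score to |x - group median|, then compute a one-way F on the result.
from statistics import median

groups = [[3, 4, 4, 3], [6, 6, 7, 8], [6, 7, 7, 9]]
z = [[abs(x - median(g)) for x in g] for g in groups]   # transformed scores

all_z = [v for g in z for v in g]
grand = sum(all_z) / len(all_z)
ss_bet = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in z)
ss_with = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in z)

k, n = len(z), len(all_z)
w = (ss_bet / (k - 1)) / (ss_with / (n - k))   # Levene's W statistic (an F ratio)
print(round(w, 2))   # a small W: no evidence against equal variances here
```

A large W (compared to the F critical value for k − 1 and N − k df) would signal heterogeneity and point toward an adjusted post hoc test such as Games-Howell.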
  • 262. it will sell and what the most comfortable curvature of the keyboard would be. The company produces prototypes for four different keyboards, labeled Prototype A through D (see Table 6.6). Prototype A is a standard flat keyboard, and the others each have varying amounts of curve. Everything else about the keyboards is the same, so this is a one-way ANOVA. Forty different users are randomly assigned to test one of the four keyboards and rank them in comfort on a 100-point scale. The results are shown below Figure 6.4. Table 6.6: Prototype A–D data set Prototype A Prototype B Prototype C Prototype D 49 57 77 65 57 53 82 61 73 69 77 73 68 65 85 81 65 61 93 89 62 73 79 77 61 57 73 81 45 69 89 77 53 73 82 69
  • 263. 61 77 85 77 Next, the test results are analyzed in Excel, which produces the information in Figure 6.4. (continued) suk85842_06_c06.indd 205 10/23/13 1:40 PM CHAPTER 6Section 6.2 Identifying the Difference: Post Hoc Tests and Tukey’s HSD Apply It! (continued) Figure 6.4: Excel results of comparison means and ANOVA of prototypes The null hypothesis is that there is no difference among the four keyboards. From Table 6.6, we see that the F value is 16.72, which is larger than the critical value of F 5 2.87 at the critical a 5 .05. Therefore the null hypothesis is rejected at p , .05. At least one of the prototypes is significantly different from at least one other prototype. Because there is a significant F, the marketers next compute HSD: HSD 5 x Å MSwith n
Where
x = 3.81 (based on k = 4, dfwith = 36, and p = .05)
MSwith = 61.07, the value from the ANOVA table
n = 10, the number in one group when group sizes are equal

HSD = 9.42

(Figure 6.4 also reports the descriptive output for each prototype, including a count of 10 scores per group and the group sums, beginning with 594 for Prototype A and 654 for Prototype B.)

This value is the minimum difference between the means of two significantly different samples. The differences in means between the groups are shown below:

A − B = −6.0
A − C = −22.8
A − D = −15.6
B − C = −16.8
B − D = −9.6
C − D = 7.2

The differences in comfort between Prototypes A and B and between Prototypes C and D are not statistically significant, because their absolute values are less than the Tukey's HSD value of 9.42. However, the differences in comfort between the remaining pairs of prototypes are statistically significant.

Based on the one-way ANOVA, the marketing team decides to produce and sell the keyboard configuration of Prototype C. It had the highest mean comfort level and will be a significant improvement over existing keyboards.

Apply It! boxes written by Shawn Murphy
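The whole Apply It! analysis can be reproduced without Excel. This sketch recomputes F and the HSD from the Table 6.6 data, using q_crit = 3.81 as given in the box; variable names are ours.

```python
# End-to-end check of the keyboard example: one-way ANOVA, then Tukey's HSD.
import math

data = {
    "A": [49, 57, 73, 68, 65, 62, 61, 45, 53, 61],
    "B": [57, 53, 69, 65, 61, 73, 57, 69, 73, 77],
    "C": [77, 82, 77, 85, 93, 79, 73, 89, 82, 85],
    "D": [65, 61, 73, 81, 89, 77, 81, 77, 69, 77],
}

scores = [x for g in data.values() for x in g]
grand = sum(scores) / len(scores)
means = {name: sum(g) / len(g) for name, g in data.items()}

ss_bet = sum(len(g) * (means[name] - grand) ** 2 for name, g in data.items())
ss_with = sum(sum((x - means[name]) ** 2 for x in g) for name, g in data.items())

df_bet, df_with = len(data) - 1, len(scores) - len(data)        # 3 and 36
f_ratio = (ss_bet / df_bet) / (ss_with / df_with)
hsd = 3.81 * math.sqrt((ss_with / df_with) / 10)                # q_crit = 3.81, n = 10

print(round(f_ratio, 2))                 # 16.72
print(round(hsd, 2))                     # 9.42
print(round(means["C"] - means["D"], 1)) # 7.2, below HSD: not significant
```

The recomputed MSwith (about 61.07), F (16.72), and HSD (9.42) match the values quoted in the box.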
  • 268. 6.3 Determining the Results’ Practical Importance Three questions can come up in an ANOVA. The second and third questions depend upon the answer to the first: 1. Are any of the differences statistically significant? The answer depends upon how the calculated F value compares to the critical value from the table. 2. If the F is significant, which groups are significantly different from each other? That question is answered by completing a post hoc test such as Tukey’s HSD. 3. If F is significant, how important is the result? The answer comes by calculating an effect size. After addressing the first two questions, we now turn our attention to the third question, effect size. With the t-test in Chapter 5, Cohen’s d answered the question about how impor- tant the result was. Several effect-size statistics have been used to explain the importance of a significant ANOVA result. Omega squared (v2) and partial- eta-squared (partial-h2) are both quite common in the social science research literature, but the one we will use is called eta-squared (H2). The Greek letter eta (h pronounced like “ate a” as in “ate a grape”) is the equivalent of the letter h. Because some of the variance in scores is unexplained and is therefore error variance, eta-squared answers this question: How much of the score
  • 269. variance can be attributed to the independent variable? suk85842_06_c06.indd 207 10/23/13 1:40 PM CHAPTER 6Section 6.3 Determining the Results’ Practical Importance In the social isolation problem, the question was whether residents of small towns, subur- ban areas, and cities differ in the amount of social isolation they indicate. The respondents’ location is the IV. Eta-squared estimates how much of the difference in social isolation is related to where respondents live. There are only two values involved in the h2 calculation, both retrievable from the ANOVA table. Formula 6.6 shows the eta-squared calculation: h2 5 SSbet SStot Formula 6.6 Eta-squared is the ratio of between-groups variability to total variability. If there was no error variance, all variance would be due to the independent variable, and the sums of squares for between-groups variability and for total variability would have the same val- ues; the effect size would be 1.0. With human subjects, this never happens because scores fluctuate for reasons other than the IV, but it is important to
know that 1.0 is the “upper bound” for this effect size. The lower bound is 0, of course—none of the variance is explained. But we also never see eta-squared values of 0 because the only time the effect size is calculated is when F is significant, and that can only happen when the effect of the IV is great enough that the ratio of MSbet to MSwith exceeds the critical value.

For the social isolation problem, SSbet = 33.168 and SStot = 41.672, so η² = 33.168/41.672 = 0.796. According to this data, about 80% (79.6% to be exact) of the variance in social isolation scores is related to whether the respondent lives in a small town, a suburb, or a city. (Note that this amount of variance is unrealistically high, which can happen when numbers are contrived.)

Try It! If the F in ANOVA is not significant, should the post hoc test or the effect-size calculation be made?
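Formula 6.6 is simple enough to check in a few lines of code. A minimal sketch in Python, using the SSbet and SStot values from the social isolation problem; the function name is an illustration, not something from the chapter:

```python
# Eta-squared (Formula 6.6): proportion of total variance attributable to the IV.
# A minimal sketch; eta_squared is a hypothetical helper, not from the text.

def eta_squared(ss_between, ss_total):
    """Effect size for a one-way ANOVA: SSbet / SStot."""
    return ss_between / ss_total

# SSbet = 33.168 and SStot = 41.672 from the social isolation ANOVA table
effect = eta_squared(33.168, 41.672)
print(round(effect, 3))  # 0.796, i.e., about 80% of the variance
```

Because the two sums of squares come straight from the ANOVA table, the calculation needs nothing beyond a division, which is why η² is such a convenient effect size to report.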
Using ANOVA to Test Effectiveness

A pharmaceutical company has developed a new medicine to treat a skin condition. This medicine has been proven effective in previous tests, but now the company is trying to decide the best method to deliver the medicine. The options are

1. pills that are taken orally,
2. a cream that is rubbed into the affected area, or
3. drops that are placed on the affected area.

To test the application methods, the company uses 24 volunteers who suffer from this skin condition. Each of the volunteers is randomly assigned to one of the three treatment methods. Note that each volunteer tests only one of the delivery methods. This satisfies the requirement that the categories of the IV must be independent. This is a one-way ANOVA test with the delivery method being the only independent variable.
To evaluate the effectiveness of each delivery method, three different dermatologists examine each patient after the course of treatment. They then rate the skin condition on a scale of 1 through 20, with 20 being a total absence of the condition. The scores from the three doctors are then averaged. The null hypothesis is that all three delivery methods are equally effective:

H0: μpills = μcream = μdrops

The null hypothesis indicates that the three treatments were drawn from populations with the same mean. The alternate hypothesis for the ANOVA test is

Ha: μpills ≠ μcream ≠ μdrops

Data from the trial is shown in Table 6.7.

Table 6.7: Data from trial of skin treatment conditions

Pills   Cream   Drops
14      18      13
13      15      15
19      16      16
18      18      15
15      17      14
16      13      17
12      17      13
12      18      16

Figure 6.5: Analysis of the data that was performed in Excel

Figure 6.5 shows the value for F is 1.72, which is less than the Fcrit value of 3.47 when testing at p = .05. Therefore, the null hypothesis is not rejected. We cannot say that the different delivery methods come from populations with different means. Looking at the p value generated by Excel, we see that there is a 20% probability that a difference in means this large could have occurred by chance alone. Because the null hypothesis is not rejected, there is no need to perform either a Tukey’s HSD test or an η² calculation.

The pharmaceutical company decides to offer the medicine as a cream because this is generally their preferred delivery method. The ANOVA test has assured them that this is the correct choice, and that neither of the two alternate methods
provided a more effective delivery option. In other words, the alternative hypothesis is not correct.

Apply It! boxes written by Shawn Murphy

[Excel output for the skin-treatment ANOVA: each group had a Count of 8; the Sums were 119 (Pills), 132 (Cream), and 119 (Drops); the Averages were 14.88, 16.50, and 14.88. The ANOVA table reported MS values of 7.04 (between) and 4.08 (within), F = 1.72, p-value = 0.20, and Fcrit = 3.47, with 23 total degrees of freedom.]

6.4 Conditions for the One-Way ANOVA

As we saw with the t-tests, any statistical test requires that certain conditions (also referred to as assumptions) are met. The conditions might be characteristics such as the scale of the data, the way the data is distributed, the relationships between the groups in the analysis,
and so on. In the case of the one-way ANOVA, the name indicates one of the conditions.

• This particular test can accommodate just one independent variable. That one variable can have any number of categories, but there can be just one IV. In the example of small-town, suburban, and city isolation, the IV was the location of the respondents’ residence. We might have added more categories, such as semirural, small town, large town, suburbs of small cities, suburbs of large cities, and so on, all of which relate to the respondents’ place of residence, but like the independent t-test, there is no way to add another variable, such as the respondents’ gender, in a one-way ANOVA.
• The categories of the IV must be independent. Like the independent t-test, the groups involved must be independent. Those who are members of one group cannot also be members of another group involved in the same analysis.
• The IV must be nominal scale. Because the IV must be nominal scale, sometimes data of some other scale is reduced to categorical data to complete the analysis. If someone is interested in whether there are differences in social isolation related to age, age must be changed from ratio to nominal data prior to the analysis. Rather than using each person’s age in years as the independent variable, ages are grouped into categories such as 20s, 30s, and so on. This is not ideal, because by reducing ratio data to nominal or even ordinal scale, the differences in social isolation between, for example, 20- and 29-year-olds are lost.
• The DV must be interval or ratio scale. Technically, social isolation would need to be measured with something like the number of verbal exchanges that one has daily with neighbors or co-workers, rather than asking on a scale of 1–10 to indicate how isolated one feels, which is probably an example of ordinal data.
• The groups in the analysis must be similarly distributed. The technical description for this similarity of distribution is homogeneity of variance. For example, this condition means that the groups should all have reasonably similar standard deviations. This was discussed in Chapter 5, where the Levene’s test is used to test equality of variances.
• Finally, using ANOVA assumes that the samples are drawn from a normally distributed population.

It may seem difficult to meet all these conditions. However, keep in mind that normality
and homogeneity of variance in particular represent ideals more than practical necessities. As it turns out, Fisher’s procedure can tolerate a certain amount of deviation from these requirements; this test is quite robust.

6.5 ANOVA and the Independent t-Test

The one-way ANOVA and the independent t-test share several assumptions, although they employ distinct statistics: sums of squares are used for ANOVA, and the standard error of the difference is used for the t-test. Even so, both tests will lead the analyst to the same conclusion. This consistency can be illustrated by completing ANOVA and the independent t-test for the same data.

Suppose an industrial psychologist is interested in how people from two separate divisions of a company differ in their work habits. The dependent variable is the amount of work completed after-hours at home per week for supervisors in marketing versus supervisors in manufacturing. The data is as follows:

Marketing: 3, 4, 5, 7, 7, 9, 11, 12
Manufacturing: 0, 1, 3, 3, 4, 5, 7, 7

Calculating some of the basic statistics yields the following:
               M      s      SEM     SEd     MG
Marketing      7.25   3.240  1.146   1.458   5.50
Manufacturing  3.75   2.550  0.901

First, the t-test:

t = (M1 − M2) / SEd = (7.25 − 3.75) / 1.458 = 2.401;  t.05(14) = 2.145

The difference is significant. Those in marketing (M1) take significantly more work home than those in manufacturing (M2).

Now the ANOVA:

• SStot = Σ(x − MG)² = 168. Verify that the result of subtracting MG from each score in both groups, squaring the differences, and summing the squares = 168.
• SSbet = (Ma − MG)²na + (Mb − MG)²nb. This one is not too lengthy to do here: (7.25 − 5.50)²(8) + (3.75 − 5.50)²(8) = 24.5 + 24.5 = 49.
• SSwith = Σ(xa − Ma)² + Σ(xb − Mb)². Verify that the result of subtracting the group means from each score in the particular group, squaring the differences, and summing the squares = 119.
• Check that SSwith + SSbet = SStot: 119 + 49 = 168.

Source    SS     df    MS     F        Fcrit
Between   49     1     49     5.765    F.05(1,14) = 4.60
Within    119    14    8.5
Total     168    15

Like the t-test, ANOVA indicates that the difference in the amount of work completed at home is significantly different for the two groups, so at least both tests draw the same
conclusion about whether the result is significant, but there is more similarity than this.

• Note that the calculated value of t = 2.401, and the calculated value of F = 5.765.
• If the value of t is squared, it equals the value of F: 2.401² = 5.765.
• The same is true for the critical values: t.05(14) = 2.145, F.05(1,14) = 4.60, and 2.145² = 4.60.

Gosset’s and Fisher’s tests draw exactly equivalent conclusions when there are two groups. The ANOVA tends to be more work, and researchers ordinarily use the t-test for two groups, but the point is that the two tests are entirely consistent.

6.6 Completing ANOVA with Excel

The ANOVA by longhand involves enough calculated means, subtractions, squaring of differences, and so on that doing an ANOVA on Excel is beneficial.

A researcher is comparing the level of optimism indicated by people in different vocations during an economic recession. The data is from laborers, clerical staff in professional offices, and the professionals in those offices. The data for the three groups follows:
Laborers: 33, 35, 38, 39, 42, 44, 44, 47, 50, 52
Clerical staff: 27, 36, 37, 37, 39, 39, 41, 42, 45, 46
Professionals: 22, 24, 25, 27, 28, 28, 29, 31, 33, 34

1. Create the data file in Excel. Enter Laborers, Clerical staff, and Professionals in cells A1, B1, and C1, respectively.
2. In the columns below those labels, enter the optimism scores, beginning in cell A2 for the laborers, B2 for the clerical workers, and C2 for the professionals. Once the data is entered and checked for accuracy, proceed with the following steps.
3. Click the Data tab at the top of the page.
4. At the extreme right, choose Data Analysis.
5. In the Analysis Tools window, select ANOVA Single Factor and click OK.
6. Indicate where the data is located in the Input Range. In the example here, the range is A2:C11.
7. Note that the default is “Grouped by Columns.” If the data is arrayed along rows instead of columns, this would need to be changed. Because we designated A2 instead of A1 as the point where the data begins, there is no need to indicate that labels are in the first row.
8. Select Output Range and enter a cell location where you wish the display of the output to begin. In the example in Figure 6.6, the location is A13.
9. Click OK. Widen column A to make the output easier to read. It will look like the screenshot in Figure 6.6.

Try It! What is the relationship between the values of t and F if both are performed for the same two-group test?

Figure 6.6: Performing an ANOVA on Excel

As you have already seen in the two Apply It! boxes, the results appear in two tables. The first provides descriptive statistics. The second table looks
like the longhand table of results for the social isolation example, except that

• the figures shown for the total follow those for between and within instead of preceding them, and
• the P-value column indicates the probability that an F of this magnitude could have occurred by chance.

Note that the P value is 4.31E-06. The “E-06” is scientific notation. It is a shorthand way of indicating that the actual value is p = .00000431, that is, 4.31 with the decimal moved six places to the left (4.31 × 10⁻⁶). This probability is far below the p = .05 standard, so the result is easily statistically significant.

6.7 Presenting Results

The previous analyses all used Excel, so we will now shift to using SPSS for the execution of these steps and the interpretation of the results. We will first use the data in Table 6.7 and then proceed with actual data gathered from published research. You will see that we use the same steps regardless of the sample size, and that using technology like Excel and SPSS makes hand calculations unnecessary. While hand calculations are instructive, they are also laborious and more prone to errors, especially with large data sets.

SPSS Example 1: Steps for ANOVA
After setting up the data in SPSS as seen in Figure 6.7 (data from Table 6.7), the steps in executing this analysis are as follows: Analyze → Compare Means → One-Way ANOVA. Place Treatment into the Factor box and Skin Condition into the Dependent List. Click Post Hoc on the left and check Tukey and Games-Howell; then click Options and check Descriptive and Homogeneity of variance test. Click Continue and OK. (Note that the three treatment groups in the data set (Figure 6.7) are numerically coded: Pills = 1, Creams = 2, and Drops = 3.)

Figure 6.7: Data set in SPSS

Figure 6.8: SPSS output from trial of skin treatment conditions

[Figure 6.8 shows the SPSS output: the Test of Homogeneity of Variances (Levene statistic = 1.822, df1 = 2, df2 = 21, sig. = .186); the ANOVA table (between-groups SS = 14.083, within-groups SS = 85.750, total SS = 99.833, F = 1.724, sig. = .203); the Descriptives table (n = 8 per group; means of 14.88 for Pills, 16.50 for Creams, and 14.88 for Drops, 15.42 overall); and the Tukey HSD and Games-Howell multiple comparisons tables, in which none of the pairwise differences is significant.]

As seen in the SPSS output (Figure 6.8), the ANOVA results are the same as when executed in Excel earlier in the chapter. Here SPSS allows execution of the ANOVA including descriptive statistics, tests of homogeneity of variance, post hoc tests, and line graphs—all simultaneously executed using the SPSS steps outlined earlier. The results begin with the Descriptives table, where you can see that each group has an even number of participants (n = 8). Here you can see differences in the means, with Creams (M = 16.50) highest of the three treatments. The Test of Homogeneity of Variance shows a favorable result in that it is not significant (p > .05), specifically p = .186. This
indicates that there is no significant difference in the variance of the three treatments, indicating equal variances. As you recall from earlier chapters, if there is inequality of variance across groups, an adjustment is needed to compare groups. Next, the ANOVA table shows a nonsignificant F statistic, p = .203. At this stage, since F is not significant, we do not need to interpret the post hoc tests, as there will be no significance between groups. As noted earlier in the chapter, this is a debatable topic in that, with the ease of running post hoc tests, the analyst can easily look at the results of these regardless of the F statistic result. Findings may indicate significant differences between any two groups even though there is a nonsignificant F, but this is rare, and you can clearly see from the example that none of the post hoc tests is significant between groups.

SPSS Example 2: Steps for ANOVA

Using public data about higher education and housing from Pew Research (2010), Social and Demographic Trends, the steps in executing this analysis are as follows: Analyze → Compare Means → One-Way ANOVA. Place schl (currently enrolled in school) into the Factor box and age into the Dependent List. Click Post Hoc on the left and check Tukey and Games-Howell; then click Options and check Descriptive, Homogeneity of variance test, and Means plot. Click Continue and OK.
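Before turning to the SPSS output, note that the same one-way ANOVA logic can also be checked programmatically. A minimal sketch assuming SciPy (a tool the chapter itself does not use), reproducing the Example 1 analysis of the Table 6.7 skin-treatment data:

```python
# One-way ANOVA on the Table 6.7 skin-treatment data,
# a sketch assuming SciPy rather than Excel or SPSS.
from scipy.stats import f_oneway

pills = [14, 13, 19, 18, 15, 16, 12, 12]
cream = [18, 15, 16, 18, 17, 13, 17, 18]
drops = [13, 15, 16, 15, 14, 17, 13, 16]

# f_oneway returns the F statistic and its p value
f_stat, p_value = f_oneway(pills, cream, drops)
print(round(f_stat, 2), round(p_value, 3))  # F = 1.72, p = .203, as in the chapter
```

Whatever the tool, the test statistic is the same ratio of between-groups to within-groups mean squares, so Excel, SPSS, and code agree to rounding error.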
Figure 6.9: SPSS output from Pew Research Social and Demographic Trends (2010) education data set

[Figure 6.9 shows the SPSS output for this analysis: the Test of Homogeneity of Variances (Levene statistic = 44.884, df1 = 5, df2 = 1692, sig. = .000); the ANOVA table for age by school enrollment, with a significant F; and the Descriptives table for the six enrollment groups (Yes, in High School; Yes, in Technical, trade, or vocational school; Yes, in College; Yes, in Graduate School; No; Don’t know/Refused), with mean ages of 19.64, 36.03, 24.81, 31.38, 42.05, and 29.33, respectively, and an overall mean of 38.82. The Tukey HSD and Games-Howell Multiple Comparisons tables follow, with mean differences flagged as significant at the 0.05 level.]

Figure 6.10: SPSS output graph from Pew Research Social and Demographic Trends (2010) education data set

Source: Data from Pew Research: Social and Demographic Trends. (2011). Higher Education/Housing. Retrieved from http://www.pewsocialtrends.org/category/datasets/.
The Descriptives table in Figures 6.9 and 6.10 shows that the groups have unequal numbers of participants, with the No (not in school) group at n = 1,336 and the highest mean age (M = 42.05). The Test of Homogeneity of Variance shows an unfavorable result in that it is significant (p < .05). This indicates that there is a significant difference in the variance of the six education groups, indicating unequal variances (or heterogeneity of variance). Next, the ANOVA table indicates a significant F statistic (p < .05). To determine which of the group comparisons is significant using a post hoc test when there is a violation of homogeneity, equal variance will not be assumed. Therefore, we will interpret the Equal variances not assumed post hoc test, which is Games-Howell. Here, the Don’t Know/Refused group is not significantly different from any of the other education groups. You can also see significant differences between several groups, such as Yes, in High School and Yes, in Technical, trade, or vocational school. All comparisons can be made in a similar manner based on the significance value in the Multiple Comparisons table. The line graph or means plot shows the mean age of each group, with the No group having the highest mean age and the Yes, in High School group having the lowest.
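The homogeneity check that drove the choice of Games-Howell here can also be run in code. A minimal sketch assuming SciPy, applied to the Table 6.7 skin-treatment data, for which SPSS reported a Levene statistic of 1.822; note that SciPy's default is the median-centered (Brown-Forsythe) variant, so `center='mean'` is needed to match the classic Levene test:

```python
# Levene's test for homogeneity of variance, a sketch assuming SciPy.
# Data are the Table 6.7 skin-treatment groups.
from scipy.stats import levene

pills = [14, 13, 19, 18, 15, 16, 12, 12]
cream = [18, 15, 16, 18, 17, 13, 17, 18]
drops = [13, 15, 16, 15, 14, 17, 13, 16]

# center='mean' gives the classic Levene test (an ANOVA on |x - group mean|);
# the SciPy default, center='median', is the Brown-Forsythe variant.
stat, p = levene(pills, cream, drops, center='mean')
print(round(stat, 3), round(p, 3))  # 1.822, p = .186: not significant, equal variances
```

A nonsignificant result (p > .05) supports the equal-variances assumption, so the ordinary Tukey HSD could be interpreted; a significant result would point to Games-Howell instead, as in the Pew example.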
6.8 Interpreting Results

Though you should refer to the most recent edition of the APA manual for specific detail on formatting statistics, the following may be used as a quick guide in presenting the statistics covered in this chapter.

Table 6.8: Guide to APA formatting of F statistic results

Abbreviation or Term    Description
F                       F test statistic score
η²                      Eta-squared: an effect size
ω²                      Omega-squared: an effect size
HSD                     Honestly significant difference: a Tukey’s post hoc test
SS                      Sum of squares
MS                      Mean square

Source: Publication Manual of the American Psychological Association, 6th edition. © 2009 American Psychological Association, pp. 119–122.

Note that all of the terms in Table 6.8 are italicized, while HSD is not. The following are
some examples of how to present results using these abbreviations, though you may use different combinations of results. Using the data from SPSS Examples 1 and 2 (Figures 6.8–6.10), we could present the results in the following way:

• The overall difference in skin condition between treatments was not significant, F(2,21) = 1.724, p = .203. (Note that the df listed is for both the between- and within-group lines in the ANOVA table.)
• The overall difference in age between school-enrollment groups was significant, F(5,1692) = 90.39, p < .05.
• The No [school] group was significantly older (M = 42.05, SD = 13.39) than the Yes, in High School group (M = 19.64, SD = 4.80), the Yes, in College. . . group (M = 24.81, SD = 8.43), and the Yes, in Graduate School group (M = 31.38, SD = 10.69), whereas there were no significant differences with the Yes, in Technical, trade. . . group (M = 36.03, SD = 15.50) and the Don’t Know/Refused group (M = 29.33, SD = 6.43).

6.9 Nonparametric Test: Kruskal-Wallis H-Test

The one-way ANOVA nonparametric equivalent is the Kruskal-Wallis H-test, also known as the Kruskal-Wallis ANOVA. Like the Mann-Whitney U-test, the Kruskal-Wallis H-test is based on ranked (ordinal) data. It is used as an alternative to its parametric counterpart when violations of assumptions have occurred. In fact, Kruskal was
not a proponent of significance testing, as Bradburn (2007) has quoted him as saying, “I am thinking these days about the many senses in which relative importance gets considered. Of these senses, some seem reasonable and others not so. Statistical significance is low on my ordering.” That said, his derived equivalent of a parametric technique is very apropos. As in the Mann-Whitney U-test, the rank of each group is determined and then summed. The H is calculated as a proportion of the summed ranks divided by their respective sample sizes.

H = [12 / (N(N + 1))] Σ(Tg² / ng) − 3(N + 1)    (Formula 6.7)

Where
N = total sample size
Tg = sum of ranks for group g
ng = sample size of group g

To illustrate the calculation of the H-test, we will use the same data from Table 6.7 with a few modifications, as seen in Table 6.9. Here the initial step is to rank all the values across treatments, with 1 being the lowest rank. If there are tied ranks, then an average of the ranks is taken. For instance, in the Pills column, the two values of 12 have initial ranks of 1 and 2. The average of them is 1.5, as seen in the Rank column. The same is true for values of 13, where there are four ranks with the average rank of 4.5, and so on with the other ties. Once all of these are complete, the ranks are summed, as seen in the last row of the table.

Table 6.9: Data from trial of skin treatment conditions

Pills  Initial rank  Rank    Cream  Initial rank  Rank    Drops  Initial rank  Rank
14     7             7.5     18     21            21.5    13     3             4.5
13     4             4.5     15     10            10.5    15     11            10.5
19     24            24      16     14            14.5    16     15            14.5
18     20            21.5    18     22            21.5    15     12            10.5
15     9             10.5    17     17            18      14     8             7.5
16     13            14.5    13     5             4.5     17     19            18
12     1             1.5     17     18            18      13     6             4.5
12     2             1.5     18     23            21.5    16     16            14.5
Sum of ranks         85.5                         130                          84.5

Try It! There are several websites that will help in these calculations. One well-used statistical calculator for various analyses, such as the Kruskal-Wallis H-test, is available at the VassarStats website via the link provided below. Use the data provided in this chapter section to see if you get the same results. http://vassarstats.net/index.html
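The ranking in Table 6.9 and the test itself can also be verified programmatically. A sketch assuming SciPy; note that `scipy.stats.kruskal` applies a correction for tied ranks, so its H will differ slightly from an uncorrected longhand value:

```python
# Verify the Table 6.9 rank sums, then run the Kruskal-Wallis H-test.
# A sketch assuming SciPy; kruskal() corrects H for ties.
from scipy.stats import rankdata, kruskal

pills = [14, 13, 19, 18, 15, 16, 12, 12]
cream = [18, 15, 16, 18, 17, 13, 17, 18]
drops = [13, 15, 16, 15, 14, 17, 13, 16]

# Rank all 24 scores together; ties receive the average of their ranks
ranks = rankdata(pills + cream + drops)
t_pills, t_cream, t_drops = ranks[:8].sum(), ranks[8:16].sum(), ranks[16:].sum()
print(t_pills, t_cream, t_drops)  # 85.5, 130.0, 84.5, as in Table 6.9

h_stat, p = kruskal(pills, cream, drops)
# H falls below the chi-square critical value for df = k - 1 = 2 (5.991),
# so the groups do not differ significantly, consistent with the earlier ANOVA.
print(h_stat, p)
```

Letting the library handle the ranking removes the most error-prone step of the longhand procedure, the averaging of tied ranks.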
Next, each of the summed ranks is squared and divided by its respective sample size, completing Formula 6.7:

H = [12 / (24(24 + 1))] [(85.5)²/8 + (130)²/8 + (84.5)²/8] − 3(24 + 1)
H = (12/600) [(7,310.25/8) + (16,900/8) + (7,140.25/8)] − 75
H = 0.02 (913.78 + 2,112.50 + 892.53) − 75
H = 0.02 (3,918.81) − 75
H = 3.38

The H statistic approximates a chi-square (χ²) distribution, which will be discussed in Chapter 11, based on k − 1 degrees of freedom, where k is the number of comparison groups. The chi-square distribution table in Table 6.10 has the critical values based on the degrees of freedom, here k − 1 = 2. Therefore, using the table, χ²critical = 5.991 at the α = .05 level. Our χ²observed value of 3.38 is less than this χ²critical = 5.991 value, meaning that there is no significant difference between groups. As noted in the ANOVA conducted earlier in the chapter, it was expected that a nonsignificant outcome would occur. Nonparametric tests are more conservative compared to parametric ones in that there is a lower probability of finding a significant outcome compared to the parametric counterpart. This also leads to a lower probability of a type I error.

Table 6.10: Chi-square distribution

Area to the right of critical value
Degrees of freedom   0.99    0.975   0.95    0.90    0.10    0.05    0.025   0.01
1       —       0.001   0.004   0.016   2.706   3.841   5.024   6.635
2       0.020   0.051   0.103   0.211   4.605   5.991   7.378   9.210
3       0.115   0.216   0.352   0.584   6.251   7.815   9.348   11.345
4       0.297   0.484   0.711   1.064   7.779   9.488   11.143  13.277
5       0.554   0.831   1.145   1.610   9.236   11.071  12.833  15.086
6       0.872   1.237   1.635   2.204   10.645  12.592  14.449  16.812
7       1.239   1.690   2.167   2.833   12.017  14.067  16.013  18.475
8       1.646   2.180   2.733   3.490   13.362  15.507  17.535  20.090
9       2.088   2.700   3.325   4.168   14.684  16.919  19.023  21.666
10      2.558   3.247   3.940   4.865   15.987  18.307  20.483  23.209
11      3.053   3.816   4.575   5.578   17.275  19.675  21.920  24.725
12      3.571   4.404   5.226   6.304   18.549  21.026  23.337  26.217
(continued)
Table 6.10: Chi-square distribution (continued)

Area to the right of critical value
Degrees of freedom   0.99    0.975   0.95    0.90    0.10    0.05    0.025   0.01
13      4.107   5.009   5.892   7.042   19.812  22.362  24.736  27.688
14      4.660   5.629   6.571   7.790   21.064  23.685  26.119  29.141
15      5.229   6.262   7.261   8.547   22.307  24.996  27.488  30.578
16      5.812   6.908   7.962   9.312   23.542  26.296  28.845  32.000
17      6.408   7.564   8.672   10.085  24.769  27.587  30.191  33.409
18      7.015   8.231   9.390   10.865  25.989  28.869  31.526  34.805
19      7.633   8.907   10.117  11.651  27.204  30.144  32.852  36.191
20      8.260   9.591   10.851  12.443  28.412  31.410  34.170  37.566
21      8.897   10.283  11.591  13.240  29.615  32.671  35.479  38.932
22      9.542   10.982  12.338  14.042  30.813  33.924  36.781  40.289
23      10.196  11.689  13.091  14.848  32.007  35.172  38.076  41.638
24      10.856  12.401  13.848  15.659  33.196  36.415  39.364  42.980
25      11.524  13.120  14.611  16.473  34.382  37.652  40.646  44.314
26      12.198  13.844  15.379  17.292  35.563  38.885  41.923  45.642
27      12.879  14.573  16.151  18.114  36.741  40.113  43.194  46.963
28      13.565  15.308  16.928  18.939  37.916  41.337  44.461  48.278
29      14.257  16.047  17.708  19.768  39.087  42.557  45.722  49.588
30      14.954  16.791  18.493  20.599  40.256  43.773  46.979  50.892

As you will see in the next section, when this analysis is performed in SPSS, a χ² value is given and not an H value per se.

SPSS Steps for the Kruskal-Wallis H-Test

Reexamining the data set used in Figure 6.6, but rearranging the data as depicted in Figure 6.11, the employee groups (Position) are categorically coded with 1 = Laborers, 2 = Clerical, and 3 = Professional. To execute, go to Analyze → Nonparametric Tests → Legacy Dialogs → K Independent Samples. As shown in Figure 6.12, input Optimism (DV) into the Test Variable List box and Position (IV) into the Grouping Variable box, then click the Define Range button just below to input the range of codes for the Position variable—this will be 1 and 3 for the minimum and maximum codes, respectively. Then click OK.
Figure 6.11: Data set in SPSS

Figure 6.12: The Kruskal-Wallis H-test steps in SPSS
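For comparison, the same test can be run outside SPSS. A minimal sketch assuming SciPy, using the optimism data from Section 6.6:

```python
# Kruskal-Wallis H-test on the optimism data (Section 6.6 / Figure 6.11),
# a sketch assuming SciPy; kruskal() reports the tie-corrected chi-square value.
from scipy.stats import kruskal

laborers      = [33, 35, 38, 39, 42, 44, 44, 47, 50, 52]
clerical      = [27, 36, 37, 37, 39, 39, 41, 42, 45, 46]
professionals = [22, 24, 25, 27, 28, 28, 29, 31, 33, 34]

h_stat, p = kruskal(laborers, clerical, professionals)
print(round(h_stat, 3), p)  # matches the SPSS output: chi-square = 17.166, p < .05
```

As with the ANOVA examples, the point-and-click SPSS procedure and a few lines of code produce the same test statistic.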
Figure 6.11: Data set in SPSS

Figure 6.12: The Kruskal-Wallis H-test steps in SPSS

Interpreting Results

The output in Figure 6.13 shows the results of the Kruskal-Wallis H-test. The χ² value in the Test Statistics table shows a result of KW χ²(2) = 17.17, p < .05, so there is an overall statistical difference in optimism amongst the three employee groups. This can be seen in the Ranks table, where the Laborers' mean rank (MR = 21.80) is the highest and the Professionals' (MR = 6.30) is the lowest. Post hoc tests, however, are not readily available as they were for the ANOVA, so follow-up Mann-Whitney U or Wilcoxon rank-sum tests of all possible combinations will have to be performed (see Chapter 5 for these procedures). The conclusion to these results would read as follows:

Based on the Kruskal-Wallis H-test there is a significant difference in the level of optimism of the three groups (KW χ²(2) = 17.17, p < .05). Laborers reported the highest level of optimism (MR = 21.80), followed by Clerical positions (MR = 18.40), and then Professionals (MR = 6.30), who reported the lowest level of optimism.

Figure 6.13: The Kruskal-Wallis H-test output

Ranks
Position        N     Mean Rank
Laborers        10    21.80
Clerical        10    18.40
Professionals   10    6.30
Total           30

Test Statistics (a, b)
              Optimism
Chi-Square    17.166
df            2
Asymp. Sig.   .000
a. Kruskal Wallis Test
b. Grouping Variable: Position

Summary

This chapter is the natural extension of Chapters 4 and 5. Like the z- and t-tests, analysis of variance is a test of significant differences. Also like the z- and t-tests, the IV in ANOVA is nominal and the DV is interval or ratio. With each procedure, whether z, t, or F, the test statistic is a ratio of the differences between groups to the differences within groups (Objective 3).

There are differences between ANOVA and the earlier procedures, of course. The variance statistics are sums of squares and mean squares values. But perhaps the most important difference is that ANOVA can accommodate any number of groups (Objectives 2 and 3). Remember that trying to deal with multiple groups in a t-test introduces the problem of mounting type I error when repeated analyses with the same data indicate statistical significance. One-way ANOVA lifts the limitation of a one-pair-at-a-time comparison (Objective 1).

The other side of multiple comparisons, however, is the difficulty of determining which comparisons are statistically significant when F is significant. This problem is solved with the post hoc test. In this chapter, we used Tukey's HSD (Objective 4). There are other post hoc tests, each having strengths and drawbacks, but HSD is one of the most widely used.

Years ago, the emphasis in the scholarly literature was on whether a result was statistically significant. Today, the focus is on measuring the effect size of a significant result, a statistic that in the case of analysis of variance can indicate how much of the variability in the dependent variable can be attributed to the effect of the independent variable. We answered that question with eta-squared (η²). But neither the post hoc test nor eta-squared is relevant if the F is not significant (Objective 5). Then, further ANOVAs were executed in SPSS, and the results were presented (Objective 6) in APA format and interpreted accordingly (Objective 7). Finally, the nonparametric equivalent of ANOVA, the Kruskal-Wallis
H-test, was discussed as an alternative method and compared to its parametric equivalent, the ANOVA. The same data set was used to compare outcomes. In addition, an appropriate example in SPSS was provided (Objective 8).

The independent t-test and the one-way ANOVA both require that groups be independent. What if they are not? What if we wish to measure one group twice over time, or perhaps more than twice? Such dependent-groups procedures are the focus of Chapter 7. Rather than requiring different thinking, they are more of an elaboration of familiar concepts. For this reason, consider reviewing Chapter 5 and the independent t-test discussion before starting Chapter 7. The one-way ANOVA dramatically broadens the kinds of questions the researcher can ask. The procedures in Chapter 7 for nonindependent groups represent the next incremental step.

Key Terms

analysis of variance: Fisher's test that allows one to detect significant differences among any number of groups. The acronym is ANOVA.

error variance: The variability in a measure unrelated to the variables being analyzed.

eta-squared (η²): A measure of effect size for ANOVA. It estimates the amount of variability in the DV explained by the IV.

F ratio: The test statistic calculated in an analysis of variance problem. It is the ratio of the variance between the groups to the variance within the groups.

factor: Refers to an IV, particularly in procedures that involve more than one.

family-wise error: An inflated type I error rate in hypothesis testing when doing multiple tests with the assumption of different sets of data; specifically, when comparing multiple groups in dyad combinations using a series of t-tests instead of executing one omnibus ANOVA.

homogeneity of variance: When multiple groups of data are distributed similarly.

mean square: The sum of squares divided by its degrees of freedom. This division allows the mean square to reflect a mean, or average, amount of variability from a source.

omnibus test: A test of the overall significance of the model based on differences between sample means when there are more than two groups to compare. The test will not tell you which two means are significantly different, which is why follow-up post hoc comparisons are executed.

one-way ANOVA: The ANOVA in its simplest form; this model has only one independent variable.

post hoc test: A test conducted after a significant ANOVA or some similar test that identifies which among multiple possibilities is statistically significant.

sum of squares (SS): The variance measure in analysis of variance. They are literally the sum of squared deviations between a set of scores and their mean.

sum of squares between: The variability related to the independent variable and any measurement error that may occur.

sum of squares total: Total variance from all sources.

sum of squares within: Variability stemming from different responses from individuals in the same group. It is exclusively error variance. It is also referred to as the sum of squares error or the sum of squares residual.

Chapter Exercises

Answers to Try It! Questions

The answers to all Try It! questions introduced in this chapter are provided below.
A. The "one" in one-way ANOVA refers to the fact that this test accommodates just one independent variable.

B. There is no gender variable in the analysis and, consequently, gender-related variance emerges as error variance. The same would be true for any variability in scores stemming from any variable not being analyzed in the study.
  • 348. is significantly different from which other group because any variability may be nothing more than sampling variability. By the same token, there is no effect to calculate because, as far as we know, the IV does not have any effect on the DV. H. F 5 t2 Review Questions The answers to the odd-numbered items can be found in the answers appendix. 1. Several people selected at random are given a story problem to solve. They take 3.5, 3.8, 4.2, 4.5, 4.7, 5.3, 6.0, and 7.5 minutes. What is the total sum of squares for this data? 2. Identify the following symbols and statistics in a one-way ANOVA: a. The statistic that indicates the mean amount of difference between groups. b. The symbol that indicates the total number of participants. c. The symbol that indicates the number of groups. d. The mean amount of uncontrolled variability. 3. The theory is that there are differences by gender in manifested aggression. With data from Measuring Expressed Aggression Numbers (MEAN), a researcher has the following: Males: 13, 14, 16, 16, 17, 18, 18, 18 Females: 11, 12, 12, 14, 14, 14, 14, 16
  • 349. Complete the problem as an ANOVA. Is the difference statistically significant? 4. Complete Exercise 3 as an independent t-test and demonstrate the relationship between t2 and F. 5. Even with a significant F, there is never a need for a post hoc in a two-group ANOVA. Why? 6. A researcher completes an ANOVA in which the number of years of education completed is analyzed by ethnic group. If h2 5 .36, how should that be interpreted? suk85842_06_c06.indd 231 10/23/13 1:40 PM CHAPTER 6Chapter Exercises 7. Three groups of clients involved in a program for substance abuse attend weekly sessions for 8, 12, and 16 weeks. The DV is the number of days drug free. 8 weeks: 0, 5, 7, 8, 8 12 weeks: 3, 5, 12, 16, 17 16 weeks: 11, 15, 16, 19, 22 a. Is F significant? b. What is the location of the significant difference?
  • 350. c. What does the effect size indicate? 8. Regarding Exercise 7, a. what is the IV? b. what is the scale of the IV? c. what is the DV? d. what is the scale of the DV? 9. For an ANOVA problem, k 5 4 and n 5 8. If SSbet 5 24.0 and SSwith 5 72, a. what is F? b. is the result significant? 10. Consider this partially completed ANOVA table: Source SS df MS F Fcrit Total 94 Between 2 Within 63 3 a. What must be the value of N 2 k? b. What must be the value of k? c. What must be the value of N? d. What must SSbet be? e. Determine MSbet. f. Determine F. g. What is Fcrit? Analyzing the Research Review the article abstracts provided below. You can then
  • 351. access the full articles via your university’s online library portal to answer the critical thinking questions. Answers can be found in the answers appendix. Using ANOVA for an Emotions Study Carolan, L. A., & Power, M. J. (2011). What basic emotions are experienced in bipolar disorder? Clinical Psychology & Psychotherapy, 18(5), 366– 378. suk85842_06_c06.indd 232 10/23/13 1:40 PM CHAPTER 6Chapter Exercises Article Abstract Aims: The aims of this study were to investigate the basic emotions experienced within and between episodes of bipolar disorder and, more specifically, to test the predictions made by the Schematic, Propositional, Analogical and Associative Representation Sys- tems (SPAARS) model that mania is predominantly characterized by the coupling of happiness with anger whereas depression (unipolar and bipolar) primarily comprises a coupling between sadness and disgust. Design: Across-sectional design was employed to examine the differences within and between the bipolar, unipolar and control groups in the emotional profiles. Data were
  • 352. analyzed using one-way ANOVAs. Method: Psychiatric diagnoses in the clinical groups were confirmed using the Structured Clinical Interview for DSM-IV (SCID). It was not administered in the control group. Cur- rent mood state was measured using the Beck Depression Inventory-II, the State–Trait Anxiety Inventory and the Bech–Rafaelsen Mania Scale. The Basic Emotions Scale was used to explore the emotional profiles. Results: The results confirmed the predictions made by the SPAARS model about emo- tions in mania and depression. Out with these episodes, individuals with bipolar disorder experienced elevated levels of disgust. Discussion: Evidence was found in support of the proposal of SPAARS that there are five basic emotions, which form the basis for both normal emotional experience and emotional disorders. Disgust is an important feature of bipolar disorder. Strengths and limitations are discussed, and suggestions for future research are explored. Critical Thinking Questions 1. Why does this study use a one-way ANOVA instead of a t- test? 2. What means are being compared in the bipolar group in this study? 3. According to the following ANOVA results between bipolar and unipolar groups,
  • 353. which result(s) showed significance? F(1,46) 5 0.00; p 5 .93 F(1,19.22) 5 9.81; p 5 .005 F(1,45) 5 1.26; p 5 .26 F(1,44) 5 0.02; p 5 .87 F(1,45) 5 0.13; p 5 .71 4. What types of post hoc test did the paper use as a follow-up to the F statistic? suk85842_06_c06.indd 233 10/23/13 1:40 PM CHAPTER 6Chapter Exercises Using ANOVA for a Health and Physical Activity Study Bize, R., & Plotnikoff, R. C. (2009). The relationship between a short measure of health status and physical activity in a workplace population. Psychology, Health & Medi- cine, 14(1), 53–61. Article Abstract Many interventions promoting physical activity (PA) are effective in preventing disease onset, and although studies have found a positive relationship between health-related quality of life (HRQL) and PA, most of these studies have
  • 354. focused on older adults and those with chronic conditions. Less is known regarding the association between PA level and HRQL among healthy adults. Our objective was to analyse the relationship between PA level and HRQL among a sample of 573 employees aged 20– 68 taking part in a work- place intervention to promote PA. Measures included HRQL (using a single item) and PA (i.e., Godin Leisure-Time Questionnaire). The Modified Canadian Aerobic Fitness Test (MCAFT) was also completed by 10% of the employees. MET-minute scores (assess- ing energy expenditure over one week) were compared across HRQL categories using ANOVA. A multiple linear regression analysis was conducted to further examine the rela- tionship between HRQL and PA, controlling for potential covariates. Participants in the higher health status categories were found to report higher levels of energy expenditure (one-way ANOVA, p , 0.001). In the multiple linear regression model, each unit increase in health status level translated in a mean increase of 356 MET- minutes in energy expen- diture (p , 0.001). This single-item assessment of health status explained six percent of the variance in energy expenditure. The study concludes that higher energy expenditure through PA among an adult workplace population is positively associated with increased health status, and it also suggests that a single-item HRQL measure is suitable for com- munity- and population based studies, reducing response burden and research costs.
  • 355. Critical Thinking Questions 1. Why did this study execute a Kruskal-Wallis H-test? 2. It was stated that the higher health status categories reported higher mean energy expenditure of the one-way ANOVA, and the Kruskal-Wallis yielded similar results. To make this plausible, what would the significance level of the Kruskal- Wallis have been? 3. After evaluating figure 1, we can see there is a difference in higher health status and higher energy expenditure. From this information, should they have run a post hoc test? Why or why not? suk85842_06_c06.indd 234 10/23/13 1:40 PM Research Question FOR WEEK ONE Background During this week you will brainstorm a list of research questions you are interested in, which will help you work towards your Week 1 Assignment. You are working towards creating a list of at least 10 unique research questions that encompass a variety of topics and types of variables. Think about exploring relationships between variables, making predictions for one variable using one or more other variables, and determining differences between groups across one or two variables. In future weeks, you will pull questions from this list that might lend themselves to a particular statistical analysis, thus saving valuable time in not needing to brainstorm research ideas. During those weeks you will take the research question and create a mini-research proposal that will help you consider
the application of a specific statistical analysis to that question.

Discussion Assignment Requirements

Initial Posting - To earn full participation points, include in your initial posting at least 5 potential research questions by Day 3. Have fun with these questions and choose topics you are truly interested in, whether they are leadership, training, sports, social media, politics, movies, or food. This will make the research design process much more enjoyable. If you need help coming up with ideas, ask your instructor for examples. Also, feel free to post more than 5 research questions as it would be useful to get feedback on as many questions as possible.

For each of the questions, provide the following:
· List the research question (be sure to phrase it as a measurable question)
· Identify the variables presented in the question
· Provide an operational definition for each variable
· Describe each variable's scale of measurement (nominal, ordinal, interval, or ratio) and characteristics (i.e., discrete vs. continuous, numerical vs. categorical, etc.)

Replies - Though you may respond to your peers multiple times during the week to provide support or feedback, students are required to respond substantively to at least two of their classmates' postings by

ANSWER FOR DISCUSSION WEEK 1

Research Discussion

Research question one: How does leadership style affect organizational performance?
In this research question, the independent variable is leadership style, while the dependent variable is organizational performance (Sukal, 2019). Leadership styles are techniques used by organizations to run their activities to achieve their objectives. Organizational performance entails the various achievements of an entity that accrue from its business operations. An ordinal scale of measurement can be used in this case.
Research question two: What are the effects of technology on students' performance?
In this case, technology is the independent variable while students' performance is the dependent variable. Technology in education is scientific knowledge used to improve the level of education (Sukal, 2019). Student performance refers to how students carry out their studies. An ordinal scale of measurement is appropriate to measure how technology affects students' performance.

Research question three: What are the effects of smoking on human health?
Smoking is the independent variable, while human health is the dependent variable. Smoking is the inhalation of tobacco products, while human health is the well-being of the human condition (Carruthers & Maggard, 2019). An ordinal scale of measurement is used in this case.

Research question four: What are the effects of training on employee performance?
Training is the independent variable, while employee performance is the dependent variable (Carruthers & Maggard, 2019). Training involves equipping employees with the knowledge to perform their duties appropriately. Employee performance is the output that accrues from different activities. An ordinal scale is used in this research question.

Research question five: How do management styles affect employee performance?
Management styles are the independent variable, while employee performance is the dependent variable (Carruthers & Maggard, 2019). Management styles are techniques used by the management to run business activities, while employee performance is the output accrued from employees' actions. An ordinal scale is used in this research question.

References

Carruthers, M. W., & Maggard, M. (2019). Smart Lab: A statistics primer. San Diego, CA: Bridgepoint Education, Inc.

Sukal, M. (2019). Research methods: Applying statistics in
research. San Diego, CA: Bridgepoint Education, Inc.

PROFESSOR RESPONSE: Interesting questions! Please be sure to include operational definitions of your DVs, e.g., employee performance. How would you measure it? It might be helpful to review the operational definition announcement in the course. Remember, we need to include enough detail about our methodology and variables so that anyone could replicate our work.
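Following up on the instructor's feedback: a research variable is only fully specified once it has a name, an operational definition, and a scale of measurement. The sketch below is an illustrative Python recording of research question four, with made-up operational definitions (the scale-to-statistics mapping reflects the standard nominal/ordinal/interval/ratio hierarchy):

```python
from dataclasses import dataclass

# Which summary statistics are meaningful at each scale of measurement
MEANINGFUL_STATS = {
    "nominal": {"mode", "frequency"},
    "ordinal": {"mode", "frequency", "median", "range"},
    "interval": {"mode", "frequency", "median", "range", "mean", "sd"},
    "ratio": {"mode", "frequency", "median", "range", "mean", "sd", "ratio"},
}

@dataclass
class Variable:
    name: str
    operational_definition: str  # exactly how the variable is measured
    scale: str                   # nominal / ordinal / interval / ratio

    def allowed_stats(self):
        return MEANINGFUL_STATS[self.scale]

# Research question four restated with operational definitions,
# as the instructor requested (definitions here are illustrative only)
training = Variable(
    "training hours",
    "total hours of formal training completed in the last quarter",
    "ratio",
)
performance = Variable(
    "employee performance",
    "supervisor rating on a 1-5 behaviorally anchored rating scale",
    "ordinal",
)
```

An ordinal DV such as a 1-5 supervisor rating points toward rank-based tests like the Kruskal-Wallis H-test covered in this chapter, whereas a ratio-scaled DV would support a parametric ANOVA.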