Aspects of Validity
An Argument-Based, Systematic Framework
to Study Validity and Reliability of Unit and
Program Assessments
Nancy Wellenzohn, EdD
Associate Dean & Director of Accreditation
CAEP Coordinator
WHY ANALYZE VALIDITY?
• Assessments are instruments that
demonstrate that goals and objectives are
being met.
• Goals and objectives are established using
standards, current research in best practice,
conversations with field partners, and other
relevant sources.
• A validity study lends legitimacy to the
program assessments.
WHY ANALYZE VALIDITY?
• Connecting assessments and curriculum
to standards, best practices, and needs of
the field is something that has always
been done.
• The new CAEP processes now want us to
prove it.
• Validity is more than just statistical validity.
CAEP REQUIREMENTS
• CAEP Evidence Guide provides a broad
discussion of what makes an assessment
valid and reliable.
• CAEP White Paper “Principles for measures
used in CAEP Accreditation Process” (Ewell,
2013) provides relevant insights.
• Supporting literature provides additional
guidance.
• Informal perceptions of validity are no longer
enough.
CAEP REQUIREMENTS
CAEP 5.2 says “provider’s quality assurance
system relies on relevant, verifiable,
representative, cumulative, and actionable
measures, and produces empirical evidence
that interpretations of data are valid and
consistent.”
VALIDITY LITERATURE
• Messick (1995) defined validity as “nothing
less than an evaluative summary of both
the evidence for and the actual as well as
potential consequences of score
interpretation and use.”
• Need to look at the validity of the
instrument and the validity of the data.
ASPECTS OF VALIDITY
• Need a clear and practical way to
systematically study validity.
• Messick separated the concept of validity
into six distinct aspects.
• These aspects provide a good place to
start.
ASPECTS OF VALIDITY
Instrument Aspects
• Content
• Structural
• Consequential
ASPECTS OF VALIDITY
Results Aspects
• Generalizability
• External
• Substantive
CONTENT ASPECT OF VALIDITY
• Evidence of “content relevance,
representativeness, and technical quality”
(Messick 1995, p.6)
• Can be supported by content and
performance standards
• Topic of assessment can be found in
professional domain
STRUCTURAL ASPECT OF
VALIDITY
• The instrument is appraised for “the extent
to which the internal structure of the
assessment is consistent with the construct
domain” (Messick, 1995, p. 6)
• Are we asking the right question?
CONSEQUENTIAL ASPECT OF
VALIDITY
• “Appraises the value implications of score
interpretation as a basis for action as well
as the actual and potential consequences
of test use…” (Messick, 1995, p.6)
• Does the instrument lead to results,
positive or negative, that are meaningful?
GENERALIZABILITY ASPECT OF
VALIDITY
• “Extent to which score properties and
interpretations generalize to and across
population groups, settings, and tasks”
(Messick, 1995, p.6)
• Are the data consistent between groups,
over time, and consistent with best
practice in the field?
• Are the data predictive?
EXTERNAL ASPECT OF VALIDITY
• Includes “convergent and discriminant
evidence from multi-trait and multi-method
comparisons as well as evidence of criterion
relevance and applied utility” (Messick 1995,
p. 6)
• Do the data correlate with other variables?
Are the results consistent with other
assessments? Are conclusions made
considering results of multiple assessments?
SUBSTANTIVE ASPECT OF
VALIDITY
• “Theoretical rationales for the observed
consistencies in test responses” (Messick,
1995, p.6)
• Are the candidates taking the right actions,
that is, actions similar to those taken by
practitioners in the field?
CAEP WHITE PAPER
PETER EWELL
“PRINCIPLES FOR MEASURES USED IN THE CAEP ACCREDITATION PROCESS”
• Validity and Reliability
• Relevance
• Verifiability
• Representativeness
• Cumulativeness
• Fairness
• Stakeholder Interest
• Benchmarks
• Vulnerability to Manipulation
• Actionability
OVERLAP BETWEEN THE EWELL AND MESSICK
CONCEPTS IS APPARENT
CONCEPT OF UNITARY VALIDITY
(MESSICK 1989)
• The standard for studying validity for years
had been to consider content, construct, and
criterion validity.
• Messick said “an ideal validation includes
several different types of evidence that spans
all three of the categories.” (Messick 1989)
• This allows for the consideration of different
types of evidence rather than separately
studying different types of validity.
AN ARGUMENT-BASED
APPROACH TO VALIDATION
(KANE, 2013)
• “Under the argument-based approach to
validity, test-score interpretations and uses
that are clearly stated and are supported by
appropriate evidence are considered to be
valid.” (Kane, 2013)
• This means that programs can validate their
assessments by providing multiple pieces of
evidence that support the claim that an
assessment is valid.
TAKEAWAYS FROM THE
LITERATURE
• Messick’s aspects of validity provide a useful
framework for analysis
• Ewell’s principles for measures used in CAEP
accreditation are supportive of and related to
the aspects of validity
• Kane suggests that programs can make
arguments that assessments are valid
• Messick’s unitary theory suggests that
multiple factors can be considered
Aspects of Validity
ASPECTS OF VALIDITY REVIEW - INSTRUMENT
Rubric levels: Unacceptable (1), Acceptable (2), Target (3)

Content Aspect of Construct Validity: Evidence of content relevance
• Unacceptable (1): Assessment content does not meet at least two of the criteria below.
• Acceptable (2): Assessment content meets at least two of the criteria below.
• Target (3): Assessment content meets all of the criteria below.
Criteria: Aligned with national or state standards; developed with input from external partners; measure is relevant and demonstrably related to an issue of importance.

Structural Aspect of Construct Validity: Observed consistency in responses
• Unacceptable (1): Assessment structure does not meet at least 2 of the criteria below.
• Acceptable (2): Assessment structure meets at least 2 of the criteria below.
• Target (3): Assessment structure meets at least 3 of the criteria below.
Criteria: Data are verifiable and can be replicated by third parties; measure is typical of the underlying situation, not an isolated case; data are compared to benchmarks such as peers or best practices; program considers sources of potential bias.

Consequential Aspect of Construct Validity: Positive and negative consequences, either intended or unintended, are observed and discussed
• Unacceptable (1): Assessment consequences are reviewed but do not ensure at least 2 of the criteria below.
• Acceptable (2): Assessment consequences are reviewed to ensure at least 2 of the criteria below.
• Target (3): Assessment consequences are reviewed to ensure all of the criteria below.
Criteria: Measure is free of bias; measure is justly applied; data are reinforced by reviewing related measures to decrease vulnerability to manipulation.
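Each rubric row follows the same pattern: mark each criterion as met or unmet, then map the count to the 1-3 scale using the thresholds stated in that row. A minimal sketch (function name and example values are illustrative only):

```python
def score_row(criteria_met, acceptable_needed, target_needed):
    """Map the number of rubric criteria met to the 1-3 rubric scale.

    criteria_met: booleans, one per criterion in the rubric row.
    acceptable_needed / target_needed: thresholds stated in the row;
    e.g. the Structural row uses 2 for Acceptable and 3 for Target.
    """
    met = sum(criteria_met)
    if met >= target_needed:
        return 3  # Target
    if met >= acceptable_needed:
        return 2  # Acceptable
    return 1      # Unacceptable

# Structural row: 4 criteria, Acceptable at 2, Target at 3.
print(score_row([True, True, True, False], 2, 3))    # Target -> 3
print(score_row([True, True, False, False], 2, 3))   # Acceptable -> 2
print(score_row([True, False, False, False], 2, 3))  # Unacceptable -> 1
```

For rows where Target requires "all of the following," `target_needed` is simply the number of criteria in the row.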
ASPECTS OF VALIDITY REVIEW - RESULTS
Rubric levels: Unacceptable (1), Acceptable (2), Target (3)

Substantive Aspect of Construct Validity: Observed consistency in the test responses/scores
• Unacceptable (1): Assessment substance does not meet at least 2 of the criteria below.
• Acceptable (2): Assessment substance meets at least 2 of the criteria below.
• Target (3): Assessment substance meets at least 3 of the criteria below.
Criteria: Measure has been subject to independent verification; measure is typical of the situation, not an isolated case, and representative of the entire population; data are compared to benchmarks such as peers or best practices; the program considers whether data are vulnerable to manipulation.

Generalizability Aspect of Construct Validity: Results generalize to and across population groups
• Unacceptable (1): Assessment generalizability does not include at least 2 of the criteria below.
• Acceptable (2): Assessment generalizability includes at least 2 of the criteria below.
• Target (3): Assessment generalizability includes all of the criteria below.
Criteria: The results can be subject to independent verification and, if repeated by other observers, would yield similar results; measure is free of bias and able to be justly applied by any potential user or observer; measure provides specific guidance for action and improvement.

External Aspect of Construct Validity: Correlations with external variables exist
• Unacceptable (1): The external aspect of the assessment does not include at least 2 of the criteria below.
• Acceptable (2): The external aspect of the assessment includes at least 2 of the criteria below.
• Target (3): The external aspect of the assessment includes 3 of the criteria below, with student work created without instructor support.
Criteria: Data are correlated with external data; if repeated by third parties, results would replicate; measure is combined with other measures to increase the cumulative weight of results; student work is created with some instructor support (without instructor support at the Target level).
RESULTS CAN BE GRAPHED
• The instrument aspect scores (content, structural,
and consequential aspects) can be averaged.
• The results aspects scores (substantive,
generalizability, and external aspects) can be
averaged.
• Resulting scores can be plotted.
• This can be done for a collection of assessments
in a program to provide a visual representation of
how the program is doing overall.
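The averaging step above can be sketched as follows; the assessment names and rubric scores are hypothetical, and each aspect is scored 1-3 using the review rubric:

```python
# Hypothetical rubric scores (1 = Unacceptable, 2 = Acceptable, 3 = Target)
# for a few program assessments; names and values are illustrative only.
scores = {
    "Portfolio":     {"content": 3, "structural": 2, "consequential": 2,
                      "substantive": 2, "generalizability": 2, "external": 1},
    "Clinical eval": {"content": 3, "structural": 3, "consequential": 2,
                      "substantive": 3, "generalizability": 2, "external": 2},
}

INSTRUMENT = ("content", "structural", "consequential")
RESULTS = ("substantive", "generalizability", "external")

def aspect_point(s):
    """Return (instrument average, results average) for one assessment."""
    instrument = sum(s[a] for a in INSTRUMENT) / len(INSTRUMENT)
    results = sum(s[a] for a in RESULTS) / len(RESULTS)
    return instrument, results

points = {name: aspect_point(s) for name, s in scores.items()}
for name, (x, y) in points.items():
    print(f"{name}: instrument={x:.2f}, results={y:.2f}")
```

Each (instrument, results) pair could then be plotted, for example with a matplotlib scatter plot, so that all of a program's assessments appear on one chart.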
Aspects of Validity
IMPLEMENTATION
• This is a peer-review process.
• Directors/Chairs make arguments that their
assessments are valid.
• They submit evidence for each argument.
The Director/Chair self-scores the evidence
in a software system.
• A 2-3 person panel reviews the arguments
and applies the rubric. They meet to form a
consensus for scoring.
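The panel step also yields a simple consistency check before the consensus meeting: how often did panelists independently assign the same rubric score? A sketch with hypothetical panel scores:

```python
from itertools import combinations

# Hypothetical scores from a 3-person panel for one rubric row,
# keyed by assessment name; each panelist scores 1-3.
panel_scores = {
    "Portfolio": [2, 2, 3],
    "Clinical eval": [3, 3, 3],
    "Dispositions survey": [1, 2, 2],
}

def agreement_rate(scores):
    """Fraction of panelist pairs that assigned the same score."""
    pairs = list(combinations(scores, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

for name, scores in panel_scores.items():
    print(f"{name}: {agreement_rate(scores):.2f}")
```

Low agreement on an assessment flags it for extra discussion when the panel meets to form its consensus score.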
EXAMPLE EDUCATIONAL LEADERSHIP
Content Aspect of Construct Validity: Evidence of content relevance
• Unacceptable (1): Assessment content does not meet at least two of the criteria below.
• Acceptable (2): Assessment content meets at least two of the criteria below.
• Target (3): Assessment content meets all of the criteria below.
Criteria: Aligned with national or state standards; developed with input from external partners; measure is relevant and demonstrably related to an issue of importance.
EDUCATIONAL LEADERSHIP EXAMPLE
Substantive Aspect of Construct Validity: Observed consistency in the test responses/scores
• Unacceptable (1): Assessment substance does not meet at least 2 of the criteria below.
• Acceptable (2): Assessment substance meets at least 2 of the criteria below.
• Target (3): Assessment substance meets at least 3 of the criteria below.
Criteria: Measure has been subject to independent verification; measure is typical of the situation, not an isolated case, and representative of the entire population; data are compared to benchmarks such as peers or best practices; the program considers whether data are vulnerable to manipulation.
RESULT OF VALIDITY REVIEW
REFERENCES
Council for the Accreditation of Educator Preparation. (2014, February). CAEP evidence guide.
Ewell, P. (2013). Principles for measures used in CAEP accreditation process (White paper). Council for the Accreditation of Educator Preparation.
Kane, M. (2013). The argument-based approach to validation. School Psychology Review, 42(4), 448-457.
Messick, S. (1995). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues and Practice, 14(4), 5-8.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: American Council on Education/Macmillan.
QUESTIONS?
wellenzn@Canisius.edu