Aspects of Validity
An Argument-Based, Systematic Framework
to Study Validity and Reliability of Unit and
Program Assessments
Nancy Wellenzohn, EdD
Associate Dean & Director of Accreditation
CAEP Coordinator
WHY ANALYZE VALIDITY?
• Assessments are instruments that
demonstrate that goals and objectives are
being met.
• Goals and objectives are established using
standards, current research in best practice,
conversations with field partners, and other
relevant sources.
• A validity study lends legitimacy to the
program assessments.
WHY ANALYZE VALIDITY?
• Connecting assessments and curriculum
to standards, best practices, and needs of
the field is something that has always
been done.
• The new CAEP processes now want us to
prove it.
• Validity is more than just statistical validity.
CAEP REQUIREMENTS
• CAEP Evidence Guide provides a broad
discussion of what makes an assessment
valid and reliable.
• CAEP White Paper “Principles for measures
used in CAEP Accreditation Process” (Ewell,
2013) provides relevant insights.
• Supporting literature provides additional
guidance.
• Informal perceptions of validity are no longer
enough.
CAEP REQUIREMENTS
CAEP 5.2 says “provider’s quality assurance
system relies on relevant, verifiable,
representative, cumulative, and actionable
measures, and produces empirical evidence
that interpretations of data are valid and
consistent.”
VALIDITY LITERATURE
• Messick (1995) defined validity as “nothing
less than an evaluative summary of both
the evidence for and the actual as well as
potential consequences of score
interpretation and use.”
• Need to look at the validity of the
instrument and the validity of the data.
ASPECTS OF VALIDITY
• Need a clear and practical way to
systematically study validity.
• Messick separated the concept of validity
into six distinct aspects.
• These aspects provide a good place to
start.
ASPECTS OF VALIDITY
Instrument Aspects
• Content
• Structural
• Consequential
ASPECTS OF VALIDITY
Results Aspects
• Generalizability
• External
• Substantive
CONTENT ASPECT OF VALIDITY
• Evidence of “content relevance,
representativeness, and technical quality”
(Messick 1995, p.6)
• Can be supported by content and
performance standards
• Topic of assessment can be found in
professional domain
STRUCTURAL ASPECT OF
VALIDITY
• The instrument is appraised for “the extent
to which the internal structure of the
assessment is consistent with the construct
domain” (Messick, 1995, p. 6)
• Are we asking the right question?
CONSEQUENTIAL ASPECT OF
VALIDITY
• “Appraises the value implications of score
interpretation as a basis for action as well
as the actual and potential consequences
of test use…” (Messick, 1995, p.6)
• Does the instrument lead to results,
positive or negative, that are meaningful?
GENERALIZABILITY ASPECT OF
VALIDITY
• “Extent to which score properties and
interpretations generalize to and across
population groups, settings, and tasks”
(Messick, 1995, p.6)
• Are the data consistent between groups,
over time, and consistent with best
practice in the field?
• Are the data predictive?
EXTERNAL ASPECT OF VALIDITY
• Includes “convergent and discriminant
evidence from multi-trait and multi-method
comparisons as well as evidence of criterion
relevance and applied utility” (Messick 1995,
p. 6)
• Do the data correlate with other variables?
Are the results consistent with other
assessments? Are conclusions made
considering results of multiple assessments?
SUBSTANTIVE ASPECT OF
VALIDITY
• “Theoretical rationales for the observed
consistencies in test responses” (Messick,
1995, p.6)
• Are the candidates taking the right actions,
that is, actions similar to those taken by
practitioners in the field?
CAEP WHITE PAPER
PETER EWELL
“PRINCIPLES FOR MEASURES USED IN THE CAEP ACCREDITATION PROCESS”
• Validity and Reliability
• Relevance
• Verifiability
• Representativeness
• Cumulativeness
• Fairness
• Stakeholder Interest
• Benchmarks
• Vulnerability to Manipulation
• Actionability
OVERLAP BETWEEN THE EWELL AND MESSICK
CONCEPTS IS APPARENT
CONCEPT OF UNITARY VALIDITY
(MESSICK 1989)
• The standard for studying validity for years
had been to consider content, construct, and
criterion validity.
• Messick said “an ideal validation includes
several different types of evidence that spans
all three of the categories.” (Messick 1989)
• This allows for the consideration of different
types of evidence rather than separately
studying different types of validity.
AN ARGUMENT-BASED
APPROACH TO VALIDATION
(KANE, 2013)
• “Under the argument-based approach to
validity, test-score interpretations and uses
that are clearly stated and are supported by
appropriate evidence are considered to be
valid.” (Kane, 2013)
• This means that programs can validate their
assessments by providing multiple pieces of
evidence that support the claim that an
assessment is valid.
TAKEAWAYS FROM THE
LITERATURE
• Messick’s aspects of validity provide a useful
framework for analysis
• Ewell’s principles for measures used in CAEP
accreditation are supportive of and related to
the aspects of validity
• Kane suggests that programs can make
arguments that assessments are valid
• Messick’s unitary theory suggests that
multiple factors can be considered
Aspects of Validity
ASPECTS OF VALIDITY REVIEW - INSTRUMENT
Rubric levels: Unacceptable (1), Acceptable (2), Target (3)

Content Aspect of Construct Validity: Evidence of content relevance
• Unacceptable (1): Assessment content does not meet at least two of the criteria below.
• Acceptable (2): Assessment content meets at least two of the criteria below.
• Target (3): Assessment content meets all of the criteria below.
Criteria: Aligned with national or state standards; developed with input from external partners; measure is relevant and demonstrably related to an issue of importance.

Structural Aspect of Construct Validity: Observed consistency in responses
• Unacceptable (1): Assessment structure does not meet at least 2 of the criteria below.
• Acceptable (2): Assessment structure meets at least 2 of the criteria below.
• Target (3): Assessment structure meets at least 3 of the criteria below.
Criteria: Data are verifiable and can be replicated by third parties; measure is typical of the underlying situation, not an isolated case; data are compared to benchmarks such as peers or best practices; program considers sources of potential bias.

Consequential Aspect of Construct Validity: Positive and negative consequences, either intended or unintended, are observed and discussed
• Unacceptable (1): Assessment consequences are reviewed but do not ensure at least 2 of the criteria below.
• Acceptable (2): Assessment consequences are reviewed to ensure at least 2 of the criteria below.
• Target (3): Assessment consequences are reviewed to ensure all of the criteria below.
Criteria: Measure is free of bias; measure is justly applied; data are reinforced by reviewing related measures to decrease vulnerability to manipulation.
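Each rubric row follows the same pattern: mark each criterion as met or unmet, then map the count to the 1-3 scale using the thresholds stated in that row. A minimal sketch (function name and example values are illustrative only):

```python
def score_row(criteria_met, acceptable_needed, target_needed):
    """Map the number of rubric criteria met to the 1-3 rubric scale.

    criteria_met: booleans, one per criterion in the rubric row.
    acceptable_needed / target_needed: thresholds stated in the row;
    e.g. the Structural row uses 2 for Acceptable and 3 for Target.
    """
    met = sum(criteria_met)
    if met >= target_needed:
        return 3  # Target
    if met >= acceptable_needed:
        return 2  # Acceptable
    return 1      # Unacceptable

# Structural row: 4 criteria, Acceptable at 2, Target at 3.
print(score_row([True, True, True, False], 2, 3))    # Target -> 3
print(score_row([True, True, False, False], 2, 3))   # Acceptable -> 2
print(score_row([True, False, False, False], 2, 3))  # Unacceptable -> 1
```

For rows where Target requires "all of the following," `target_needed` is simply the number of criteria in the row.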
ASPECTS OF VALIDITY REVIEW - RESULTS
Rubric levels: Unacceptable (1), Acceptable (2), Target (3)

Substantive Aspect of Construct Validity: Observed consistency in the test responses/scores
• Unacceptable (1): Assessment substance does not meet at least 2 of the criteria below.
• Acceptable (2): Assessment substance meets at least 2 of the criteria below.
• Target (3): Assessment substance meets at least 3 of the criteria below.
Criteria: Measure has been subject to independent verification; measure is typical of the situation, not an isolated case, and representative of the entire population; data are compared to benchmarks such as peers or best practices; the program considers whether data are vulnerable to manipulation.

Generalizability Aspect of Construct Validity: Results generalize to and across population groups
• Unacceptable (1): Assessment generalizability does not include at least 2 of the criteria below.
• Acceptable (2): Assessment generalizability includes at least 2 of the criteria below.
• Target (3): Assessment generalizability includes all of the criteria below.
Criteria: The results can be subject to independent verification and, if repeated by other observers, would yield similar results; measure is free of bias and able to be justly applied by any potential user or observer; measure provides specific guidance for action and improvement.

External Aspect of Construct Validity: Correlations with external variables exist
• Unacceptable (1): The external aspect of the assessment does not include at least 2 of the criteria below.
• Acceptable (2): The external aspect of the assessment includes at least 2 of the criteria below.
• Target (3): The external aspect of the assessment includes 3 of the criteria below, with student work created without instructor support.
Criteria: Data are correlated with external data; if repeated by third parties, results would replicate; measure is combined with other measures to increase the cumulative weight of results; student work is created with some instructor support (without instructor support at the Target level).
RESULTS CAN BE GRAPHED
• The instrument aspect scores (content, structural,
and consequential aspects) can be averaged.
• The results aspects scores (substantive,
generalizability, and external aspects) can be
averaged.
• Resulting scores can be plotted.
• This can be done for a collection of assessments
in a program to provide a visual representation of
how the program is doing overall.
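The averaging step above can be sketched as follows; the assessment names and rubric scores are hypothetical, and each aspect is scored 1-3 using the review rubric:

```python
# Hypothetical rubric scores (1 = Unacceptable, 2 = Acceptable, 3 = Target)
# for a few program assessments; names and values are illustrative only.
scores = {
    "Portfolio":     {"content": 3, "structural": 2, "consequential": 2,
                      "substantive": 2, "generalizability": 2, "external": 1},
    "Clinical eval": {"content": 3, "structural": 3, "consequential": 2,
                      "substantive": 3, "generalizability": 2, "external": 2},
}

INSTRUMENT = ("content", "structural", "consequential")
RESULTS = ("substantive", "generalizability", "external")

def aspect_point(s):
    """Return (instrument average, results average) for one assessment."""
    instrument = sum(s[a] for a in INSTRUMENT) / len(INSTRUMENT)
    results = sum(s[a] for a in RESULTS) / len(RESULTS)
    return instrument, results

points = {name: aspect_point(s) for name, s in scores.items()}
for name, (x, y) in points.items():
    print(f"{name}: instrument={x:.2f}, results={y:.2f}")
```

Each (instrument, results) pair could then be plotted, for example with a matplotlib scatter plot, so that all of a program's assessments appear on one chart.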
Aspects of Validity
IMPLEMENTATION
• This is a peer-review process.
• Directors/Chairs make arguments that their
assessments are valid.
• They submit evidence for each argument.
The Director/Chair self-scores the evidence
in a software system.
• A 2-3 person panel reviews the arguments
and applies the rubric. They meet to form a
consensus for scoring.
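The panel step also yields a simple consistency check before the consensus meeting: how often did panelists independently assign the same rubric score? A sketch with hypothetical panel scores:

```python
from itertools import combinations

# Hypothetical scores from a 3-person panel for one rubric row,
# keyed by assessment name; each panelist scores 1-3.
panel_scores = {
    "Portfolio": [2, 2, 3],
    "Clinical eval": [3, 3, 3],
    "Dispositions survey": [1, 2, 2],
}

def agreement_rate(scores):
    """Fraction of panelist pairs that assigned the same score."""
    pairs = list(combinations(scores, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

for name, scores in panel_scores.items():
    print(f"{name}: {agreement_rate(scores):.2f}")
```

Low agreement on an assessment flags it for extra discussion when the panel meets to form its consensus score.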
EXAMPLE EDUCATIONAL LEADERSHIP
Content Aspect of Construct Validity: Evidence of content relevance
• Unacceptable (1): Assessment content does not meet at least two of the criteria below.
• Acceptable (2): Assessment content meets at least two of the criteria below.
• Target (3): Assessment content meets all of the criteria below.
Criteria: Aligned with national or state standards; developed with input from external partners; measure is relevant and demonstrably related to an issue of importance.
EDUCATIONAL LEADERSHIP EXAMPLE
Substantive Aspect of Construct Validity: Observed consistency in the test responses/scores
• Unacceptable (1): Assessment substance does not meet at least 2 of the criteria below.
• Acceptable (2): Assessment substance meets at least 2 of the criteria below.
• Target (3): Assessment substance meets at least 3 of the criteria below.
Criteria: Measure has been subject to independent verification; measure is typical of the situation, not an isolated case, and representative of the entire population; data are compared to benchmarks such as peers or best practices; the program considers whether data are vulnerable to manipulation.
RESULT OF VALIDITY REVIEW
REFERENCES
Council for the Accreditation of Educator Preparation. (2014, February). CAEP evidence guide.
Ewell, P. (2013). Principles for measures used in CAEP accreditation process (White paper). Council for the Accreditation of Educator Preparation.
Kane, M. (2013). The argument-based approach to validation. School Psychology Review, 42(4), 448-457.
Messick, S. (1995). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues and Practice, 14(4), 5-8.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: American Council on Education/Macmillan.
QUESTIONS?
wellenzn@Canisius.edu