NATIONAL FORUM OF TEACHER EDUCATION JOURNAL
VOLUME 19, NUMBER 3, 2009
Enhancing Validity of Critical Tasks Selected for
College and University Program Portfolios
Pattie Johnston, PhD
Assistant Professor of Education
The University of Tampa
Tampa, FL
Karen Wilkinson, M.A.
Middle School Teacher for Social Sciences
Tampa, FL
ABSTRACT
University and college departments are often charged with creating assessment
systems that measure student outcomes on identified objectives to meet
accreditation standards. Some departments, such as Education, have additional accreditation requirements because they answer to domain-specific accreditation agencies in addition to national or regional university agencies. Assessment systems usually
consist of department objectives with assessments to measure student performance
on each of the department goals. Students are often required to keep a portfolio of
these assessment tasks. It is essential that the tasks used to measure department objectives align directly with the stated objectives. Departments have typically relied on faculty consensus to assure the desired alignment, but consensus can be difficult, an excessive number of tasks may be identified, or departments may simply want to spend additional effort to confirm that their consensus is valid.
Introduction
Departments in accredited colleges and universities are usually required to assess
student outcomes. This is particularly true for departments with additional domain-specific accreditation, such as Education. Departments are expected to create assessment systems that identify student learning goals and the assessments given to measure acquisition of those goals. Data on student performance on the objective-related assessments are typically tracked, aggregated, and used to drive department
improvement.
As mentioned above, it is of extreme importance for departments like Education
to use student performance data to drive continual department improvement. National
Council for Accreditation of Teacher Education (NCATE) 2000 standards require teacher
education programs to assess pre-service teachers on departmental objectives over time
using multiple measures (NCATE, 2000; Takona, 2003). Teachers are usually required to pass some type of basic skills test, such as Praxis I or Praxis II, as one required data point upon program completion (Selk, Mehigan, Fiene, & Victor, 2004). Some states have created their own tests to use instead of the Praxis exams. The passing rates of students
on these exams can serve as one of the measures over time.
The other measure over time that occurs during course completion is more
complicated. In previous years, perusal of course syllabi was sufficient to show outcomes
of teacher education programs. As time went on, standards were changed and
departments were required to collect actual assessment evidence of each departmental
goal (Fetter, 2003). Thus, there has been a major shift in the assessment of students in
education programs. Departments often opt to have students compile a portfolio of
critical tasks or assessments for faculty review in order to document objective
acquisition. The portfolio assessments are to reflect knowledge gained on all
departmental objectives/learning outcomes. Programs now must show that candidates
have mastered the selected outcomes and that these outcomes have a positive impact on
learners (Fetter, 2003).
Klecker (2000) suggests that there are three major expectations for pre-service
teacher portfolios: providing more meaningful and valid indicators of what pre-service teachers know and can do, enhancing both teaching and learning, and providing useful assessment information. Education faculty must determine the assessments
contained in the portfolios. Portfolios must include products that clearly demonstrate that
the candidate can perform required outcomes, not just exposure to a concept in a course
(Fetter, 2003). Assessment entries in portfolios may include written work such as reports,
term papers, graded tests, assignments and lesson/unit plans. Other entries may include
artwork, lists of conferences, letters from parents, notes from students, and video recordings of teaching (Takona, 2003). Products may come from multiple sources such
as course work, field experiences and volunteer work. The products must be connected to
the program outcomes as established by the conceptual framework (Fetter, 2003).
Other issues include who determines the content of the portfolio and what should
be included. Some programs are quite prescriptive whereas others are more student-
oriented. The type is determined by the purpose of the portfolio (Dougan, 1996).
Accordingly, if the portfolio is to show mastery of content, then the department faculty
should choose the products (Dougan, 1996). Stakeholders in the teacher education
programs need to determine a specific set of assessments related to the program even if
multiple sections are taught by different instructors (Fetter, 2003). These critical
assessments need to have consistent assignment descriptions and rubrics to provide
consistency in scoring.
Purpose of the Article
The purpose of this article is to present three easy-to-implement methods for augmenting the faculty consensus method that may better assure the alignment of department goals and assessment tasks.
Need for Valid Critical Portfolio Tasks
The idea of the faculty choosing the assessment content and the inclusion of valid
tasks are closely related. It is imperative that the chosen critical tasks and assessments be held to the same standards as most measurement systems, which require estimations of reliability and evidence of validity (Ghiselli, Campbell, & Zedeck, 1981; Mehrens, 1992;
Klecker, 2000). Reliability estimates allow for an examination of consistency of scoring
by professors on critical assessments. Evidence of the content validity allows the
department to suggest that the critical assessments represent the state objectives well.
This evidence is of primary importance to any assessment system because there are
inferences made on mastery based on assessment ratings. That is, students who score
high enough on critical assessments are considered to have demonstrated mastery of
those objectives. This inference is only true if the task/assessment represents the
objective in a meaningful way. Providing evidence of the validity of critical tasks in
students’ portfolios may be done in several ways and to varying degrees.
The Standards for Educational and Psychological Testing (AERA, 1999), a joint publication of the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education, suggests that the most common method of providing evidence of content validity for any test or assessment is to have content area experts rate the degree to which each test item represents the objective or domain. These standards may be applied to portfolios. The
items are like the critical tasks and the domain is represented by the state objectives. The
validity question with portfolios becomes how well each critical task represents the
mandated goal. The alignment of assessment task and state goal is of key concern. There
are several ways to assure the alignment between assessment tasks and objectives that
vary in degrees of certainty.
A fairly typical way of trying to provide evidence of assessment task validity is to
use faculty members as content/domain experts. This method would require faculty to
work collaboratively and agree upon the representativeness of each critical task.
Individual faculty could write critical task/assessment descriptions associated with each
objective. The faculty can then discuss each task and come to consensus on the tasks’
ability to represent the objective. The method is informal and fairly simple to implement
but consensus can be difficult. Sometimes there are more critical tasks identified than
needed per goal so departments need to select the most valid or representative assessment
tasks to include in student portfolios. If consensus is difficult, if more tasks are identified than needed, or if faculty simply want to be more certain of task validity, they may opt to use one of the following straightforward methods to evaluate task validity/representativeness.
Methods for Providing Additional Evidence of Portfolio Task Validity
Q-Sort Method
There is another step education faculty could take that may create a more
organized collaborative discussion and perhaps be associated with more valid results.
Again, classical measurement literature has suggested the use of a Q-sort as a method for
looking at representativeness (Crocker & Algina, 1986). Q-sorts force a ranking of items
by content experts. They have had wide application and could easily be adapted for use
here. The method would require more effort on the faculty’s part but may be associated
with increased validity. The process could be broken down into the following steps:
1. Faculty would have an objective and a list of possible critical assessments that
could be indicative of the objective.
2. The number of assessment tasks used should exceed the number of tasks needed
to represent the domain or objective.
3. Each faculty member is asked to rank order the assessments by their representativeness of the objective.
4. Data are collected for each assessment task and mean rankings are calculated.
5. Assessment tasks with the highest rankings are the tasks selected for use.
This method serves to formalize the dialogue between faculty members by forcing
a ranking from each faculty member as to the representativeness of each critical task.
Producing rankings may require more careful consideration than dialogue alone, so combining dialogue with rankings may yield more certainty about representativeness than dialogue by itself. The more certainty that assessment tasks represent state goals, the more valid the assessment tasks are.
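As a minimal sketch of steps 4 and 5, the rankings can be aggregated with a short script. The task names, the number of raters, and the rankings below are hypothetical; a rank of 1 is treated as the highest (most representative) ranking, so the tasks with the lowest mean rank are the ones selected.

```python
# Hypothetical sketch of aggregating Q-sort rankings (steps 4 and 5).
# Each faculty member ranks the candidate tasks for one objective,
# with rank 1 meaning "most representative of the objective."

from statistics import mean

# Rankings from four hypothetical faculty members for four candidate tasks.
rankings = {
    "Unit plan with differentiated activities": [1, 2, 1, 1],
    "Traditional test built from a table of specifications": [2, 1, 2, 3],
    "Reflection paper on classroom management": [4, 4, 3, 4],
    "Annotated bibliography of teaching strategies": [3, 3, 4, 2],
}

tasks_needed = 2  # how many tasks the portfolio requires for this objective

# Mean rank per task; a lower mean rank indicates higher representativeness.
mean_ranks = {task: mean(ranks) for task, ranks in rankings.items()}

# Select the tasks the panel ranked highest (i.e., with the lowest mean rank).
selected = sorted(mean_ranks, key=mean_ranks.get)[:tasks_needed]

for task, m in sorted(mean_ranks.items(), key=lambda kv: kv[1]):
    print(f"{m:.2f}  {task}")
print("Selected tasks:", selected)
```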
Lawshe’s Content Validity Ratio
There is another method borrowed from the classical measurement literature
which has applications for the evaluation of portfolio task representativeness. Lawshe
(1975) created a Content Validity Ratio (CVR) that is used to gauge the content validity
of items on an empirical measure. In this approach, a panel of Subject Matter Experts
(SMEs) is asked to indicate whether or not a measurement item in a set of items is
“essential” to the operationalization of a theoretical construct. “Essential” items or
assessment tasks are ones that best represent the goal and are desired. Faculty members may serve as SMEs. The measurement item in this case is one of several possible
portfolio tasks and the construct is the goal. For example, the portfolio assessment task
may require the pre-service teacher to construct a traditional test following item writing
guidelines and the state goal is “assessment.” The question to the SMEs becomes: to what degree, on a scale of one to five with five being very essential and one being not essential at all, is the construction of a traditional test essential to “assessment” in the classroom? There
could be another possible portfolio task in the set of items which requires the candidate to
write objectives at varying levels of Bloom’s Taxonomy. Again, the question becomes how essential the SMEs rate this objective-writing task to assessment. The two assessment tasks would receive varying ratings of “essentialness,” but
the best and most valid assessment task would be the one with the highest CVR because
the ratio indicates the proportion of “essential” ratings.
The following ratio is used after collecting the “essential” responses:

CVR = (2n_e / N) – 1

where n_e is the number of SMEs who rate the item as essential and N is the total number of SMEs. Again, the SMEs should be rating the items/portfolio tasks in terms of representativeness and essentialness to the goal. One can infer from the equation that the CVR takes on values between -1.00 and +1.00, where a CVR of 0.00 means that 50% of the SMEs on a panel of size N believe that the portfolio task is essential and thereby valid.
Lawshe further established minimum CVRs for varying panel sizes based on a one-tailed test at the α = .05 significance level. For example, if 25 SMEs make up the panel,
then measurement items for a specific task whose CVR values are less than 0.37 would
be deemed not essential enough and deleted from use. Faculty could submit several candidate assessment tasks for a single goal and use the ones with the highest CVRs as evidence of the content validity of their assessment tasks. This
method would provide the department with quantitative data about the validity of each
accepted assessment task being used to measure goal mastery.
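A minimal sketch of the CVR computation described above follows. The two candidate tasks mirror the examples in the text, and the panel size of 25 and the 0.37 minimum come from the preceding paragraph; the counts of “essential” ratings are invented for illustration.

```python
# Hypothetical sketch of Lawshe's Content Validity Ratio for candidate portfolio tasks.
# CVR = (2 * n_e / N) - 1, where n_e is the number of SMEs rating the task
# "essential" to the goal and N is the panel size.

def cvr(n_essential: int, n_panel: int) -> float:
    return (2 * n_essential / n_panel) - 1

N = 25          # panel of 25 SMEs (faculty and/or practicing teachers)
MIN_CVR = 0.37  # Lawshe's minimum CVR for a 25-member panel (one-tailed, alpha = .05)

# Hypothetical counts of "essential" ratings for two competing tasks
# intended to represent the goal of "assessment".
essential_counts = {
    "Construct a traditional test following item-writing guidelines": 22,
    "Write objectives at varying levels of Bloom's Taxonomy": 14,
}

for task, n_e in essential_counts.items():
    value = cvr(n_e, N)
    decision = "retain" if value >= MIN_CVR else "drop"
    print(f"CVR = {value:+.2f}  ({decision})  {task}")

# With these hypothetical counts, 22 of 25 essential ratings give CVR = 0.76
# (retained), while 14 of 25 give CVR = 0.12, below the 0.37 minimum (dropped).
```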
There is a third procedure that can serve to augment either of the above-mentioned methods. This method is more time consuming but could provide field-based evidence of validity, which may be optimal because of the high-stakes nature of portfolio tasks. The field-based approach is described below.
Field Testing Procedure
Departments could conduct a field action study to assess how well each critical
task previously delineated by university faculty represents the goal. A department may
want to consider using people employed in the field as content experts. In the case of
education departments, teachers could be considered as content experts. It is thought that
they may be good judges of how well assessments represent the objectives in a real life
way.
A sample of teachers could be given a brief description of each of the objectives
and a list of several possible associated critical tasks for each objective already identified
by university faculty. The teachers would be instructed to read each task and rate the representativeness of each critical assessment to the associated objective on a Likert
scale. The survey could include ratings of one to five, with a rating of five indicating the
most representative of the objective and a rating of one indicating the least representative
of the objective. High ratings would be associated with valid assessment tasks. The data allow for calculation of mean scores for each critical assessment task. Low means suggest that the respective assignments be reevaluated in terms of their relationships to
their intended objectives. Individual departments could set an acceptable benchmark for mean ratings and review any task whose mean falls below that level.
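A minimal sketch of how the field-test ratings might be summarized is shown below. The teacher ratings, task names, and the 4.0 benchmark are hypothetical; as noted above, each department would set its own benchmark.

```python
# Hypothetical sketch of the field-test analysis: practicing teachers rate each
# candidate task's representativeness of its objective on a 1-5 Likert scale,
# and tasks whose mean rating falls below a department-chosen benchmark are flagged.

from statistics import mean

BENCHMARK = 4.0  # example benchmark; each department would set its own

# Likert ratings (1 = least representative, 5 = most representative) from a
# hypothetical sample of six teachers.
ratings = {
    "Design and teach a standards-aligned unit plan": [5, 4, 5, 4, 5, 4],
    "Summarize a journal article on motivation": [3, 2, 3, 4, 2, 3],
}

for task, scores in ratings.items():
    m = mean(scores)
    flag = "review" if m < BENCHMARK else "acceptable"
    print(f"mean = {m:.2f}  ({flag})  {task}")
```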
Discussion
The intent of this article is to suggest that a triangulation of validity evidence be
considered when making decisions about critical assessments for student portfolios.
Faculty discussion and informal consensus may not be enough on their own when dealing with high-stakes assessment that is overseen by accrediting agencies. The intent is
not to force formal experimental research but rather to consider use of an informal
strategy to augment collaborative decisions made by faculty acting as content experts.
The suggested action study uses teachers in the field as a second source of content area experts. Another group of content area specialists may be other professors in the state. A department may want to ask professors from other universities to rate the degree of representativeness of each critical task to the state objective. Departments could also collect data on what types of tasks other universities are using and compare the critical assessments through an informal content analysis. There are different methods
available for education departments to use when trying to provide evidence of the validity
of their portfolio assessments. The important factor is the recognition of the need to
extend beyond typical faculty consensus in situations where consensus is difficult, where more tasks are identified than needed, or where departments feel the need to confirm consensus because of the high-stakes nature of portfolios.
References
Dougan, A. (1996). Student assessment by portfolio: One institution’s journey. The History Teacher, 29(2), 171-178.
Ghiselli, E., Campbell, J., & Zedeck, S. (1981). Measurement theory for the behavioral sciences. New York: W. H. Freeman and Company.
Klecker, B. (2000). Content validity of pre-service teachers’ portfolios in a standards-based program. Journal of Instructional Psychology, 27(1), 35-39.
Mehrens, W. (1992). Using performance assessment for accountability purposes. Educational Measurement: Issues and Practice, 11(1), 3-9.
Selk, M., Mehigan, S., Fiene, J., & Victor, D. (2004). Validity of standardized teacher test scores for predicting beginning teacher performance. Action in Teacher Education, 25(4), 20-29.
Takona, J. P. (2003). Development for teacher candidates. College Park, MD: ERIC Clearinghouse on Assessment and Evaluation. (ERIC Document Reproduction Service No. 481816)
