Justifying the Use of an English
Language Placement Test
with an Assessment Use Argument
Presented by:
Parisa Mehran
Alzahra University
Tehran, Iran
Placement Tests
A placement test is considered a fairly high-stakes test (Bachman & Palmer, 1996, 2010),
and the social consequences of placement decisions are significant and need to
be investigated, since such decisions can affect the lives of students (Murray, 2001;
Schmitz & delMas, 1991).
Thus, as Brown (1989) emphasizes, it is important to make valid placement decisions to
avoid the mismatches that inappropriate placement testing can cause.
Validity in Language Assessment
Validity has been regarded as the most significant and complex concept in language
assessment, and it has always been under investigation by language testing experts
and researchers (e.g., Bachman, 1990, 2004, 2005; Bachman & Palmer, 2010; Chapelle,
1999; Cronbach & Meehl, 1955; Kane, 2001, 2012, 2013; Lado, 1961; Messick, 1989). As a
result, the conception of validity has undergone a series of reinterpretations
throughout the history of language assessment.
Argument-based Approaches to Validity
Argument-based approaches to validity rest on the concept of a validity argument, which has
been used in the process of validation for more than twenty years (e.g., Bachman, 2005; Cronbach,
1988, 1989; House, 1980; Kane, 1992; Mislevy, 2003; Mislevy, Steinberg, & Almond, 2002, 2003).
The process is comparable to building a legal case to persuade a judge or a jury. Validation
thus becomes ongoing: as long as a test is in use, relevant evidence continues to be
collected (Bachman & Palmer, 2010). Hence, any kind of relevant evidence is gathered
to show the plausibility of the intended interpretations and uses (Bachman, 2004; Kane, 2012).
Purpose of the Study
Using Bachman and Palmer's (2010) AUA as a framework, this study examined the
validity of an English language placement test, which is composed of the Oxford Quick
Placement Test (OQPT) and a follow-up oral examination. The following research
question was addressed:
To what extent are the OQPT and the oral examination justifiable in placing students
appropriately according to Bachman and Palmer's (2010) AUA?
Methodology: Participants and Setting
This study was conducted at one of the English language institutes in Tehran, Iran.
Three hundred and thirty-two (332) newcomers to the institute who had to take the
placement test participated in this study, and 15 of them were interviewed. The head of
the institute, three examiners of the placement test, ten teachers, and four experts also
took part in the study.
Methodology: Instrumentation
1. OQPT
2. Oral Examination
3. TOEFL
4. Interview
5. Observation
The AUA for Justifying the Placement Test
As Vongpumivitch (2010) and Wang et al. (2012) remark, Bachman and Palmer's (2010) framework
has a top-down approach. That is, the four claims are discussed from the perspective of test
development rather than from that of test use. Therefore, in this study, where the aim is to evaluate
the overall usefulness of an English language placement test, the four claims are presented in the
reverse order from that in Bachman and Palmer (2010).
It should also be mentioned that, as Bachman and Palmer (2010) emphasize, not all of the warrants
and rebuttals they list will necessarily be needed in the AUA for any given test. Moreover,
due to practical research limitations, not all of the warrants and rebuttals have been investigated in
the present study.
Claim 4: The assessment records of the OQPT and the oral examination are consistent across
different assessment tasks, different aspects of the assessment procedure, and across different
groups of test takers.
Claim 4: Assessment Records
Consistency
The first warrant for this claim is that the procedures for administering the OQPT and the oral
examination are followed consistently across different occasions and for all test takers:
• Observation of how the OQPT and the oral examination were administered, together with
interviews with the examiners and the head of the institute, revealed that there is a set of
administrative procedures which the test administrators strictly follow; hence, the
administrative procedures are consistent across different occasions and for all test taker groups.
Consistency (cont.)
Another warrant to support the consistency claim involves the scoring criteria and procedures:
• The criteria and procedures for rating test takers' performance on the OQPT are well specified and
are adhered to. Since the OQPT is in multiple-choice format, its rating criteria and procedures are
quite objective, and scoring is done based on an answer key.
• However, the criteria and procedures for the oral examination are not well specified and are quite
subjective. A set of questions has been devised based on the coursebook. In this sense, the
administration of the oral examination is consistent, yet its scoring process does not follow any
specific procedures. This lack of specified scoring procedures could be a rebuttal here.
Consistency (cont.)
With respect to the warrant of rater training:
Raters undergo training before administering the placement test.
However, one of the examiners was not satisfied with the training process; she
believed that all that matters is the examiner's marketing skill to "grab more
customers" for the institute.
Consistency (cont.)
To check the internal consistency of the items, as another warrant:
• Kuder-Richardson formula 20 (KR-20) was used.
• The reliability coefficient (KR-20) obtained was .93 for the OQPT Version 1 and .88 for the OQPT Version 2,
showing that the OQPT has reasonable internal consistency reliability.
• Two main test item indices (item difficulty and item discrimination) were used in the test item analysis for the
OQPT.
• In terms of difficulty, the items have been ordered from the easiest to the most difficult, and this is in line
with the view of experts, examiners, and test takers. The analysis of item difficulty showed that, by and
large, most of the items were difficult (56% in Version 1 and 63% in Version 2), and neither version of the
OQPT had an acceptable distribution of difficulty.
• In terms of item discrimination, the analysis of items demonstrated that the items in the OQPT Version 1
had a good amount of discrimination (75%). However, the OQPT Version 2 contained fewer items of fair
discrimination (46%) in comparison to the first version.
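The KR-20 coefficient and the two item indices reported above can be sketched as follows. This is a minimal illustration, not the study's actual analysis: the response matrix is invented toy data, the function names are hypothetical, population variance is assumed for the total scores (some texts use sample variance), and the upper/lower 27% split used for discrimination is one common convention that the slides do not confirm.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for dichotomous (0/1) item scores; rows are test takers, columns are items.

    Assumes the total scores are not all identical (otherwise the variance is zero).
    """
    n_people = len(responses)
    n_items = len(responses[0])
    total_variance = pvariance([sum(row) for row in responses])
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people  # proportion answering item j correctly
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / total_variance)


def item_difficulty(responses, j):
    """Proportion of test takers answering item j correctly (higher means easier)."""
    return sum(row[j] for row in responses) / len(responses)


def item_discrimination(responses, j, fraction=0.27):
    """Upper-group minus lower-group proportion correct on item j.

    The 27% group size is an assumed convention, not taken from the study.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    return (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / k


# Hypothetical toy data: 5 test takers, 4 items ordered from easiest to hardest.
toy_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(kr20(toy_responses))                    # approximately 0.8 for this toy matrix
print(item_difficulty(toy_responses, 0))      # 0.8: four of five answered item 0 correctly
print(item_discrimination(toy_responses, 3))  # 1.0: only the top scorer answered item 3 correctly
```

With real data, each row would hold one test taker's scored answers to one version of the OQPT, and the two versions would be analysed separately.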
Consistency (cont.)
• In regard to inter-rater and intra-rater reliability, inconsistencies between and within human
raters are not a source of measurement error for the OQPT because it is scored with
an answer key.
• In the case of the oral examination, Cronbach's alpha was computed. The alpha was .93 for
inter-rater reliability and .96 for intra-rater reliability, indicating that despite the lack of
consistent criteria and procedures for the oral examination, the oral examination has
reasonable internal consistency reliability.
• The analysis of the two versions of the OQPT brought a serious rebuttal to the consistency claim.
Cronbach's alpha was calculated, and the alpha was .000, indicating that the two versions are not
equivalent. Two experts also remarked that the second version is much more difficult and that
it cannot be considered equivalent to the first version.
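To make the alpha values above easier to interpret, here is a minimal sketch of Cronbach's alpha. It assumes that each column holds one "part" being compared (for example one rater's scores, or one test version's scores); the slides do not say how the data were arranged, so that layout, the function name, and the toy scores are all assumptions for illustration only.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha; rows are test takers, columns are parts (items, raters, or forms).

    Assumes the row totals are not all identical (otherwise the variance is zero).
    """
    k = len(scores[0])
    part_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(part_variances) / total_variance)


# Hypothetical toy data: two raters who agree perfectly on three test takers.
agreeing_raters = [[1, 1], [2, 2], [3, 3]]

# Hypothetical toy data: two columns whose scores are unrelated to each other.
unrelated_columns = [[1, 2], [2, 1], [1, 1], [2, 2]]

print(cronbach_alpha(agreeing_raters))    # close to 1: the columns rank test takers identically
print(cronbach_alpha(unrelated_columns))  # 0: the columns share no variance
```

An alpha near zero, as in the second toy case, arises when the columns do not covary at all, which is the pattern the reported value of .000 for the two OQPT versions would suggest.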
Claim 3: The OQPT scores and the oral examination results can be interpreted as indicators of test
takers' level of English proficiency and used to place them in their appropriate level. Such
interpretations are meaningful, impartial, generalizable, relevant, and sufficient.
Claim 3: Interpretations
Meaningfulness
• Interpretations about the construct to be assessed are meaningful if they are based on a frame of reference such as a
course syllabus, a needs analysis, or a general theory of language ability. The head of the institute and the
examiners believed that the OQPT is a suitable placement test because Oxford University Press is the publisher of
both this test and the coursebook taught in the institute (i.e., English Result). The teaching method followed in the
institute is Communicative Language Teaching (CLT), and speaking is thus the primary focus; therefore, the oral
examination has a significant role in placement testing. However, the lack of a listening section in the OQPT can be a
rebuttal to the meaningfulness of the interpretations.
Impartiality
To support the impartiality warrant, the assessment items/tasks should be checked for response
formats or content that may either favor or disfavor some test takers, and test takers should be
treated impartially in terms of all aspects of test administration.
• As mentioned earlier, the oral examination lacks a specific rubric. Interviews with test takers and
examiners revealed a belief that, for this reason, interpretations about the ability to
be assessed were not free of bias against some groups of test takers.
• No complaint was made in regard to the appropriateness of the content. Bias and item
sensitivity studies need to be done for deeper analysis.
Generalizability
According to the generalizability warrant, the characteristics of the assessment items/tasks (e.g., input,
expected response, type of interaction) as well as the scoring criteria and procedures of the test tasks
should correspond closely to those of the target language use (TLU) domain.
• It might be that the items/tasks in the OQPT and the oral examination do not exactly correspond to all
of the TLU tasks; however, the content of the OQPT and the oral examination corresponds to the
content of the textbook taught in the institute. Moreover, in the oral examination, test takers' real-world
language performance is examined. Thus, this can to some extent support the generalizability warrant.
• Here, it is worth mentioning that some of the teachers, examiners, and experts asserted that a TOEFL or
an IELTS test would be a better placement test, indeed an ideal one, because it has writing and especially
listening parts, but that time limitations prevented such tests from being used for placement.
Consequently, a TOEFL test was given to those who had taken the OQPT, and the results showed
that the correlation between the OQPT and TOEFL scores was not high (r=.66).
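As a rough aid to reading the reported correlation, the sketch below computes a Pearson r from paired scores and converts a correlation into the proportion of shared variance (r squared). The function names and the paired scores are hypothetical; only the value r = .66 comes from the study, and squaring it gives about .44, meaning that under this reading the two tests share roughly 44% of their score variance.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def shared_variance(r):
    """Coefficient of determination: proportion of variance two measures share."""
    return r ** 2


# Hypothetical paired scores, used only to show the call; not data from the study.
toy_oqpt = [1, 2, 3, 4]
toy_toefl = [2, 4, 6, 8]

print(pearson_r(toy_oqpt, toy_toefl))  # perfectly linear toy data, so r is 1 (up to rounding)
print(shared_variance(0.66))           # about 0.44 for the r reported in the study
```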
Relevance
The fourth warrant is relevance, according to which the assessment-based interpretations should
provide information that is relevant and helpful for the decision makers to make decisions.
• The interviews conducted with examiners revealed that the OQPT scores were
not sufficiently helpful in placement testing. This could have been a rebuttal to the relevance
warrant; yet the oral examination supports this warrant, since it is quite helpful for the
examiners in making their placement decisions. As noted before, the lack of a rubric is a serious
rebuttal.
Sufficiency
The fifth warrant demands that the assessment-based interpretations should provide
information that is sufficient for the decision makers to make decisions.
Again, because the process of placement testing includes both a written and an oral
test, sufficient information is obtained to make placement decisions.
Claim 2: The placement decisions that are made on the basis of the OQPT scores and the oral
examination results are sensitive to local values and equitable to all stakeholders.
Claim 2: Decisions
Values Sensitivity
According to the values sensitivity warrant, the existing community values and relevant legal
requirements should be carefully and critically considered in the admission decisions that are to be
made and in determining the relative seriousness of false positive and false negative classification
errors.
• The interviews and observations revealed that the process of placement testing does not
guarantee test fairness, and that in this phase attracting more clients is what
matters. Hence, it is possible to have false positives (i.e., individuals are placed in a
level higher than their actual level) and false negatives (i.e., individuals are placed in a level
lower than their actual level). Usually the latter happens, because it is less risky; nevertheless, if
at the time of placement testing the institute does not have the level appropriate for the test
taker (due to limitations such as space, time, or lack of students, not all levels are always
offered), the test taker will be placed in a higher level.
Equitability
• Due to the subjectivity of the oral examination, it cannot be claimed that the same cut scores
and decision rules are used to classify all students who have applied for the same program, or that
no other considerations are used. Economic and practical considerations always exist.
Consequently, test takers and other stakeholders are not fully informed about how decisions will
be made and whether decisions are actually made in the way described to them.
Claim 1: The consequences of the placement decisions based on the OQPT scores and the oral
examination results are beneficial to all stakeholders that use the test, including the test takers, the
institution, the teachers, and the supervisor.
Claim 1: Consequences
Beneficence
The first warrant is that the consequences of using the assessment that are specific to each
stakeholder will be beneficial.
Some of the test takers were interviewed after they had taken the placement test and attended
their classes. On the whole, they were satisfied, although some of them believed that their
level was higher and that they had been placed in a lower level. In their view, the reason was basically
cost-effectiveness. Most of the teachers believed that the students in their classes were
homogeneous; yet two teachers strongly disagreed and believed that their classes were not at all
homogeneous, especially at higher levels.
Conclusion
• Based on the evidence gathered, this study found that the assessment records of the OQPT and the oral
examination were consistent across different assessment tasks, different aspects of the assessment
procedure, and across different groups of test takers. However, the oral examination required a set of
clear criteria.
• Moreover, the analysis of the two versions of the OQPT showed that their parallelism was in
question, which could threaten the consistency of the assessment records.
• The findings also indicated that the OQPT scores and the oral examination results could be interpreted
to some extent as test takers' level of English proficiency and could place them in their appropriate levels.
Such interpretations were meaningful, impartial, relevant, and sufficient, although the lack of a listening
section in the OQPT and the lack of a rubric for the oral examination could be threats, and
the generalizability of the results was to some extent in question.
• In addition, the placement decisions were neither sensitive to local values nor equitable to all stakeholders,
due to the subjectivity of the oral examination and the economic considerations of the institute.
• Lastly, by and large, the consequences of the placement decisions were beneficial to all stakeholders that
use the test, namely the test takers, the institution, the teachers, and the supervisor.
Local Implications
To support the intended test use, it would be helpful to examine the negative evidence that has
been identified in this study and resolve the identified issues or mitigate the potential negative
impact of unresolved issues. For instance, in the case of the current placement test, the oral
examination can be given based on a rubric, a listening section can be added to the written test,
and economic considerations can be given less weight by the institute. With these changes, the
intended uses of the test would become much more justifiable and rest on stronger evidence.
The Merits/Demerits of Using an AUA
Finally, this study serves as an illustration of the merits/demerits of using an AUA.
• On the whole, the AUA provides a sound framework in which the validity of a test and its use
can be justified and the test developers/users can be accountable for their test.
• With the help of the AUA framework, the process of assessment justification becomes more
comprehensive, systematic, and coherent. In fact, one of the merits of an AUA is its clear
articulation of which types of evidence should be collected for which claims or warrants.
• However, in the process of assessment justification, an AUA demands that the evaluation of the
test be done at many levels, and this needs different types of data and analyses. Thus, in practice,
the complexity of the justification study may be a big challenge for a single researcher.
Thank you for your attention!

Justifying the Use of an English Language Placement Test with an Assessment Use Argument

  • 1. LOGO Justifying the Use of an English Language Placement Test with an Assessment Use Argument Presented by: Parisa Mehran Alzahra University Tehran, Iran
  • 2. Placement Tests Placement test is considered as a fairly high-stakes test (Bachman & Palmer, 1996, 2010), and the social consequences of placement decisions are of great significance and need to be investigated, since such decisions can affect the lives of students (Murray, 2001; Schmitz & delMas, 1991). Thus, as Brown (1989) emphasizes, it is important to make valid placement decisions to avoid mismatches that can occur due to inappropriate placement testing.
  • 3. Validity in Language Assessment Validity has been regarded as the most significant and complex concept in language assessment, and it has always been under investigation by language testing experts and researchers (e.g., Bachman, 1990, 2004, 2005; Bachman & Palmer, 2010; Chapelle, 1999; Cronbach & Meehl, 1955; Kane, 2001, 2012, 2013; Lado, 1961; Messick, 1989). As a result, the conception of validity has undergone a series of reinterpretations throughout the history of language assessment.
  • 4. Argument-based Approaches to Validity Argument-based approaches to validity are based on the concept of a validity argument which has been used in the process of validation for more than twenty years (e.g., Bachman, 2005; Cronbach, 1988, 1989; House, 1980, Kane, 1992; Mislevy, 2003; Mislevy, Steinberg, & Almond, 2002, 2003). The process is comparable to building a legal case to persuade a judge or a jury. The process of validation thus becomes ongoing: As long as a test is alive, the collection of relevant evidence is going to be continued (Bachman & Palmer, 2010). Hence, any kind of relevant evidence is gathered to show the plausibility of the intended interpretations and uses (Bachman, 2004; Kane, 2012).
  • 5. Purpose of the Study Using Bachman and Palmer's (2010) AUA as a framework, this study examined the validity of an English language placement test, which is composed of the Oxford Quick Placement Test (OQPT) and a follow-up oral examination. The following research question was addressed: To what extent are the OQPT and the oral examination justifiable in placing students appropriately according to Bachman and Palmer's (2010) AUA?
  • 6. Methodology: Participants and Setting This study was conducted at one of the English language institutes in Tehran, Iran. Three hundred and thirty-two (332) newcomers to the institute who had to take the placement test participated in this study, and 15 of them were interviewed. The head of the institute, three examiners of the placement test, ten teachers, and four experts also attended the current study.
  • 7. Methodology: Instrumentation 1. OQPT 2. Oral Examination 3. TOEFL 4. Interview 5. Observation
  • 8. The AUA for Justifying the Placement Test As Vongpumivitch (2010) and Wang et al. (2012) remark, Bachman and Palmer's (2010) framework has a top-down approach. That is, the four claims are discussed from the perspective of test development rather than from that of test use. Therefore, in this study, where the aim is to evaluate the overall usefulness of an English language placement test, the four claims are presented in the reverse order from that in Bachman and Palmer (2010). It should also be mentioned that as Bachman and Palmer (2010) emphasize not all of the warrants and rebuttals listed by them will necessarily be needed in the AUA for any given test. Moreover, due to practical research limitations, not all of the warrants and rebuttals have been investigated in the present study.
  • 9. Claim 4: The assessment records of the OQPT and the oral examination are consistent across different assessment tasks, different aspects of the assessment procedure, and across different groups of test takers. Claim 4: Assessment Records
  • 10. Consistency The first warrant for this claim is that the procedures for administrating the OQPT and the oral examination are followed consistently across different occasions and for all test takers:  The observation of how the OQPT and the oral examination were administered as well as the interviews with the examiners and the head of the institute revealed that there are a set of administrative procedures which are strictly followed by the test administrators; hence, the administrative procedures are consistent across different occasions and for all test taker groups.
  • 11. Consistency (cont.) Another warrant to support the consistency claim involves the scoring criteria and procedures:  The criteria and procedures for rating test takers' performance on the OQPT are well specified and are adhered to. Since the OQPT is in multiple-choice format, its rating criteria and procedures are quite objective, and scoring is done based on an answer key.  However, the criteria and procedures for the oral examination are not well specified and are quite subjective. A set of questions have been devised based on the coursebook. In this sense, the administration of the oral examination is consistent, yet its scoring process does not follow any specific procedures. This lack of evidence could be a rebuttal here.
  • 12. Consistency (cont.) With respect to the warrant of rater training: Raters undergo training before administrating the placement test. However, one of the examiners was not satisfied with the training process, and she believed that what matters is just the examiner's marketing skill to "grab more customers" for the institute.
  • 13. Consistency (cont.) To check internal consistency of the items, as another warrant:  Kuder-Richardson formula 20 (KR-20) was used.  The reliability coefficient (KR-20) obtained for the OQPT Version 1 was .93 and for the OQPT Version 2 was .88 showing that the OQPT has reasonable internal consistency reliability.  Two main test item indices (item difficulty and item discrimination) were used in the test item analysis for the OQPT.  In terms of difficulty, the items have been ordered from the easiest to the most difficult, and this is in line with the view of experts, examiners, and test takers. The analysis of item difficulty showed that, by and large, most of the items were difficult (56% in the Version 1 and 63% in Version 2), and both versions of the OQPT did not have a fairly acceptable distribution of difficulty.  In terms of item discrimination, the analysis of items demonstrated that the items in the OQPT Version 1 had good amount of discrimination (75%). However, the OQPT Version 2 contained less items of fair discrimination (46%) in comparison to the first version.
  • 14. Consistency (cont.)  In regard to inter-rater and intra-rater reliability, inconsistencies between and within human raters are not a source of measurement error because the scoring of the OQPT is done through an answer key.  In the case of the oral examination, Cronbach's alpha was computed. The alpha was .93 for inter-rater reliability and .96 for intra-rater reliability indicating that despite the lack of consistent criteria and procedures for the oral examination, the oral examination has reasonable internal consistency reliability.  The analysis of the two versions of the OQPT brought a serious rebuttal to the consistency claim. Cronbach's alpha was calculated and the alpha was .000 indicating that the two versions are not equivalent. Two experts also remarked that the second version is much more difficult, and that it cannot be considered as equivalent to the first version.
  • 15. Claim 3: Interpretations. The OQPT scores and the oral examination results can be interpreted as indicators of test takers' level of English proficiency and used to place them in their appropriate level. Such interpretations are meaningful, impartial, generalizable, relevant, and sufficient.
  • 16. Meaningfulness Interpretations about the construct to be assessed are meaningful if they are based on a frame of reference such as a course syllabus, a needs analysis, or a general theory of language ability. The head of the institute and the examiners believed that the OQPT is a suitable placement test because Oxford University Press publishes both this test and the coursebook taught in the institute (i.e., English Result). The teaching method followed in the institute is Communicative Language Teaching (CLT), so speaking is the primary focus; therefore, the oral examination has a significant role in placement testing. However, the lack of a listening section in the OQPT can be a rebuttal to the meaningfulness of the interpretations.
  • 17. Impartiality To support the impartiality warrant, the assessment items/tasks should be checked for response formats or content that may either favor or disfavor some test takers, and test takers should be treated impartially in all aspects of test administration. As mentioned earlier, interviews with test takers and examiners revealed a belief that, because the oral examination lacked a specific rubric, interpretations about the ability to be assessed were not free of bias against some groups of test takers. No complaint was made about the appropriateness of the content. Bias and item sensitivity studies are needed for deeper analysis.
  • 18. Generalizability According to the generalizability warrant, the characteristics of the assessment items/tasks (e.g., input, expected response, type of interaction) as well as the scoring criteria and procedures of the test tasks should correspond closely to those of the target language use (TLU) domain. The items/tasks in the OQPT and the oral examination may not correspond exactly to all of the TLU tasks; however, the content of the OQPT and the oral examination corresponds to the content of the textbook taught in the institute. Moreover, in the oral examination, test takers' real-world language performance is examined. Thus, this can to some extent support the generalizability warrant. Here, it is worth mentioning that some of the teachers, examiners, and experts asserted that a TOEFL or an IELTS test would be a better, indeed an ideal, test for placement because it includes writing and especially listening parts, but that time limitations prevented its use as a placement test. Consequently, a TOEFL test was given to those who had taken the OQPT, and the results showed that the correlation between the OQPT and TOEFL scores was not high (r = .66).
  • 19. Relevance The fourth warrant is relevance, according to which the assessment-based interpretations should provide information that is relevant and helpful for the decision makers. Interviews with the examiners revealed that the OQPT scores were not sufficiently helpful in placement testing. This could have been a rebuttal to the relevance warrant; yet, the oral examination supports this warrant since it is quite helpful for the examiners in making their placement decisions. As noted before, the lack of a rubric is a serious rebuttal.
  • 20. Sufficiency The fifth warrant demands that the assessment-based interpretations should provide information that is sufficient for the decision makers to make decisions. Again, because the process of placement testing includes both a written and an oral test, sufficient information is obtained to make placement decisions.
  • 21. Claim 2: Decisions. The placement decisions that are made on the basis of the OQPT scores and the oral examination results are sensitive to local values and equitable to all stakeholders.
  • 22. Values Sensitivity According to the values sensitivity warrant, the existing community values and relevant legal requirements should be carefully and critically considered in the admission decisions that are to be made and in determining the relative seriousness of false positive and false negative classification errors. The interviews and observations revealed that the process of placement testing does not guarantee test fairness considerations, and that in this phase attracting more clients is the main concern. Hence, there may be false positives (i.e., individuals placed in a level higher than their actual level) and false negatives (i.e., individuals placed in a level lower than their actual level). Usually the latter happens, because it is less risky; nevertheless, if at the time of placement testing the institute does not have the level appropriate for the test taker (because of limitations such as space, time, or lack of students, not all levels are always offered), the test taker will be placed in a higher level.
  • 23. Equitability Due to the subjectivity of the oral examination, it cannot be claimed that the same cut scores and decision rules are used to classify all students who have applied for the same program, or that no other considerations are used. Economic and practical considerations always exist. Consequently, test takers and other stakeholders are not fully informed about how decisions will be made and whether decisions are actually made in the way described to them.
  • 24. Claim 1: Consequences. The consequences of the placement decisions based on the OQPT scores and the oral examination results are beneficial to all stakeholders that use the test, including the test takers, the institution, the teachers, and the supervisor.
  • 25. Beneficence The first warrant is that the consequences of using the assessment that are specific to each stakeholder will be beneficial. Some of the test takers were interviewed after they had taken the placement test and attended their classes. On the whole, they were satisfied, although some of them believed that their level was higher and that they had been placed in a lower level. In their view, the reason was basically cost-effectiveness. Most of the teachers believed that the students in their classes were homogeneous; yet, two teachers strongly disagreed and believed that their classes were not at all homogeneous, especially at higher levels.
  • 26. Conclusion Based on the evidence gathered, this study found that the assessment records of the OQPT and the oral examination were consistent across different assessment tasks, different aspects of the assessment procedure, and different groups of test takers. However, the oral examination required a set of clear criteria. Moreover, the analysis of the two versions of the OQPT showed that their parallelism was in question, which could threaten the consistency of the assessment records. The findings also indicated that the OQPT scores and the oral examination results could be interpreted, to some degree, as test takers' level of English proficiency and could place them in their appropriate levels. Such interpretations were meaningful, impartial, relevant, and sufficient, although the lack of a listening section in the OQPT and the lack of a rubric for the oral examination could be threats, and the generalizability of the results was to some extent in question. In addition, the placement decisions were neither sensitive to local values nor equitable to all stakeholders, due to the subjectivity of the oral examination and the economic considerations of the institute. Lastly, by and large, the consequences of the placement decisions were beneficial to all stakeholders that use the test, including the test takers, the institution, the teachers, and the supervisor.
  • 27. Local Implications To support the intended test use, it would be helpful to examine the negative evidence identified in this study and either resolve the identified issues or mitigate the potential negative impact of unresolved ones. For instance, in the case of the current placement test, the oral examination could be given based on a rubric, a listening section could be added to the written test, and economic considerations could become less important for the institute. The intended uses of the test would then become much more justifiable, with stronger evidence.
  • 28. The Merits/Demerits of Using an AUA Finally, this study serves as an illustration of the merits/demerits of using an AUA. On the whole, the AUA provides a sound framework in which the validity of a test and its use can be justified and the test developers/users can be held accountable for their test. With the help of the AUA framework, the process of assessment justification becomes more comprehensive, systematic, and coherent. In fact, one of the merits of an AUA is its clear articulation of which types of evidence should be collected for which claims or warrants. However, in the process of assessment justification, an AUA demands that the test be evaluated at many levels, and this requires different types of data and analyses. Thus, in practice, the complexity of the justification study may be a big challenge for a single researcher.
  • 29. Thank you for your attention!
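
The internal-consistency coefficients reported in slides 13 and 14 (KR-20 for the OQPT, Cronbach's alpha for the oral examination) can be illustrated with a minimal sketch. The response matrix below is invented for illustration and is not the study's data; the function names are hypothetical, and the use of population variance is an assumption (with that convention, KR-20 equals Cronbach's alpha for items scored 0/1).

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for rows of scores (one row per test taker,
    one column per item or rater). Uses population variance (assumption)."""
    k = len(scores[0])
    item_var_sum = sum(pvariance(col) for col in zip(*scores))
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def kr20(responses):
    """KR-20 for rows of dichotomous (0/1) item scores."""
    n = len(responses)
    k = len(responses[0])
    # p = proportion answering each item correctly; q = 1 - p
    p = [sum(col) / n for col in zip(*responses)]
    pq_sum = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - pq_sum / total_var)


# Hypothetical data: 5 test takers x 4 items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(round(kr20(responses), 2))            # 0.8
print(round(cronbach_alpha(responses), 2))  # 0.8
```

For rater data, the same `cronbach_alpha` function could be applied with one column per rater (or per rating occasion) instead of one column per item.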

Editor's Notes

  • #14: According to Bachman (2004), items within a fairly narrow range of item difficulty, around .50, are desirable. Oller (1979) asserts that, for item discrimination, correlations of less than .35 are not useful for discriminating between participants.
  • #17: There are seven warrants for meaningfulness.
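
The two item indices mentioned in note #14 can be sketched as follows. This is only an illustration with invented data: the function names are hypothetical, the difficulty band of .30 to .70 is an assumed reading of "around .50", and discrimination is computed here as the correlation between an item and the total of the remaining items, which is one common choice and not necessarily the one used in the study. Only the .35 threshold comes from the note.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def item_analysis(responses, band=(0.30, 0.70), min_disc=0.35):
    """Per-item difficulty (proportion correct) and discrimination.

    band: assumed acceptable difficulty range around .50 (hypothetical).
    min_disc: .35 threshold cited from Oller (1979) in note #14.
    """
    n = len(responses)
    results = []
    for j, col in enumerate(zip(*responses)):
        difficulty = sum(col) / n
        # Score on all other items, so the item is not correlated with itself.
        rest = [sum(row) - row[j] for row in responses]
        disc = pearson(col, rest)
        results.append({
            "item": j + 1,
            "difficulty": difficulty,
            "discrimination": disc,
            "difficulty_ok": band[0] <= difficulty <= band[1],
            "discrimination_ok": disc >= min_disc,
        })
    return results


# Hypothetical data: 5 test takers x 4 items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

for row in item_analysis(responses):
    print(row["item"], round(row["difficulty"], 2), round(row["discrimination"], 2))
```

The percentage of items flagged by `difficulty_ok` or `discrimination_ok` would correspond to the kind of summary figures reported on slide 13.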