Language Testing: The Social Dimension
Tim McNamara and
Carsten Roever
(Pages 1-64)
Presented by: Amir Hamid Forough Ameri
ahfameri@gmail.com
December 2015
Chapter 2: Validity and the Social Dimension
of Language Testing
In what ways is the social dimension of language assessment reflected
in current theories of the validation of language tests?
Mislevy
Lynch
Cronbach
Kunnan
Bachman
Shohamy
Messick
Chapelle
Kane
Cronbach
Contemporary discussions of validity in educational assessment are
heavily influenced by the thinking of the American Lee Cronbach;
The “father” of construct validity.
Meehl and Challman coined the term “construct validity.”
Cronbach
Their new concept of construct validity was an alternative to criterion-
related validity:
Construct validity is ordinarily studied when the tester has no definite
criterion measure of the quality with which he [sic] is concerned and
must use indirect measures. Here the trait or quality underlying the test
is of central importance, rather than either the test behavior or the scores
on the criteria. (Cronbach & Meehl, 1955, p. 283)
Cronbach
Since Cronbach and Meehl’s article:
A: There has been the increasingly central role taken by construct
validity,
B: There is also clear recognition that validity is not a mathematical
property like discrimination or reliability, but a matter of judgment.
Cronbach (1989) emphasized the need for a validity argument, which
focuses on collecting evidence for or against a certain interpretation of
test scores: In other words, it is the validity of inferences that construct
validation work is concerned with, rather than the validity of
instruments.
Cronbach
In fact, Cronbach argued that there is no such thing as a “valid test,”
“One does not validate a test, but only a principle for making
inferences” (Cronbach & Meehl, 1955, p. 297);
“One validates not a test, but an interpretation of data arising from a
specified procedure” (Cronbach, 1971, p. 447).
Cronbach
Cronbach and Meehl (1955) distinguished between a weak and a
strong program for construct validation:
 The weak one is a fairly haphazard collection of any sort of evidence
(mostly correlational) that supports the particular interpretation to be
validated.
In contrast, the strong program is based on the falsification idea
advanced by Popperian philosophy (Popper, 1962): Rival hypotheses
for interpretations are proposed and logically or empirically examined.
Cronbach
Cronbach (1988) admitted that the actual approach taken in most
validation research is “confirmationist” rather than “falsificationist”
and aimed at refuting rival hypotheses.
Through his experiences in program evaluation, Cronbach highlighted
the role of beliefs and values in validity arguments, which “must link
concepts, evidence, social and personal consequences, and values”
(Cronbach, 1988, p. 4).
What we have here then is a concern for social consequences as a kind
of corrective to an earlier entirely cognitive and individualistic way of
thinking about tests.
Messick
 The most influential current theory of validity was developed by Samuel
Messick (1989).
 Messick incorporated a social dimension of assessment quite explicitly
within his model.
 Messick, like Cronbach, saw assessment as a process of reasoning and
evidence gathering carried out in order for inferences to be made about
individuals and saw the task of establishing the meaningfulness of those
inferences as being the primary task of assessment development and
research.
 This reflects an individualist, psychological tradition of measurement
concerned with fairness.
Messick on Construct Validity
Setting out the nature of the claims that we wish to make about test
takers and providing arguments and evidence in support of them, as in
Messick’s cell 1, represents the process of construct definition and
validation.
Those claims then provide the rationale for making decisions about
individuals (cells 2 and 4) on the basis of test scores.
Consider tests such as IELTS, TOEFL iBT, Occupational English Test
(McNamara, 1996):
Messick on Construct Validity
The path from the observed test performance to the predicted real-
world performance (in performance assessments) or estimate of the
actual knowledge of the target domain (in knowledge-based tests)
involves a chain of inferences.
In principle, how the person will fare in the target setting cannot be
known directly, but must be predicted.
 Deciding whether the person should be admitted then depends on two prior steps:
1. Modeling what you believe the demands of the target setting are and
2. predicting what the standing of the individual is in relation to this
construct.
Messick on Construct Validity
 The test is a procedure for gathering evidence in support of decisions that
need to be made.
 The relationships among test, construct and target are set out in Figure 2.3.
(Next Slide)
 Validity therefore implies considerations of social responsibility, both to
the candidate (protecting him or her against unfair exclusion) and to the
receiving institution.
 Fairness in this sense can only be achieved through carefully planning the
design of the observations of candidate performance and carefully
articulating the relationship between the evidence we gain and the inferences
about candidate standing.
Messick on Construct Validity
Test validation steers between what Messick called
1. construct underrepresentation, and
2. construct-irrelevant variance,
The former: the assessment requires less of the test taker than is
required in reality.
The latter: differences in scores might not be due only to differences
in the ability being measured but that other factors are illegitimately
affecting scores.
Mislevy
Central to assessment is the chain of reasoning from the observations
to the claims we make about test takers, on which the decisions about
them will be based. Mislevy calls this the “assessment argument”.
According to Mislevy: An assessment is a machine for reasoning about
what students know, can do, or have accomplished, based on a handful
of things they say, do, or make in particular settings.
Mislevy has developed an approach called Evidence Centered Design
(Figure 2.5), which focuses on the chain of reasoning in designing
tests.
Construct Definition and Validation:
Mislevy
A preliminary first stage, Domain Analysis, involves what in
performance assessment is called job analysis (the testing equivalent of
needs analysis).
Here the test developer needs to develop insight into the conceptual
and organizational structure of the target domain.
What follows is the most crucial stage of the process, Domain
Modeling.
It involves modeling three things: claims, evidence, and tasks.
Together, the modeling of claims and evidence is equivalent to
articulating the test construct.
Construct Definition and Validation:
Mislevy
Step 1 involves the test designer in articulating the claims the test will
make about candidates on the basis of test performance.
This involves conceptualizing the aspects of knowledge or
performance ability to which the evidence of the test will be directed
and on which decisions about candidates will be based.
Claims might be stated in broader or narrower terms; the latter
approach brings us closer to the specification of criterion behaviors in
criterion-referenced assessment (Brown & Hudson, 2004).
Construct Definition and Validation:
Mislevy
Step 2 involves determining the kind of evidence that would be
necessary to support the claims.
This stage depends on a theory of the characteristics of a successful
performance, e.g., the categories of rating scales used to judge the
adequacy of the performance.
Step 3 involves defining in general terms the kinds of task in which the
candidate will be required to engage.
Construct Definition and Validation:
Mislevy
All three steps precede the actual writing of specifications for test
tasks; they constitute the “thinking stage” of test design.
Only when this chain of reasoning is completed can the specifications
for test tasks be written.
In further stages of ECD, Mislevy deals with turning the conceptual
framework developed in the domain modeling stage into an actual
assessment.
The final outcome of this is an operational assessment.
Construct Definition and Validation:
Mislevy
The consideration of the social dimension of assessment in Mislevy’s
conceptual analysis remains implicit and limited to issues of fairness
(McNamara, 2003):
1. Mislevy does not consider the context in which tests are
commissioned.
2. Nor does Mislevy deal directly with the uses of test scores.
Kane
Kane points out that we interpret scores as having meaning. The same
score might have different interpretations.
Whatever the interpretation we choose, Kane argues, we need an
argument to defend the relationship of the score to that interpretation.
He calls this an interpretative argument, defined as a:
“chain of inferences from the observed
performances to conclusions and decisions included
in the interpretation”
Kane
Kane proposes four types of inference in the chain of inferences. He
uses the metaphor of bridges for each of these inferences; all bridges
need to be crossed safely for the final interpretations to be reached.
The ultimate decisions are vulnerable to weaknesses in any of the
preceding steps; in this way, Kane is clear about the dependency of
valid interpretations on the reliability of scores.
Kane
The first inference is from observation to observed score. In order
for assessment to be possible, an instance of learner behavior needs to
be observable.
This behavior is then scored.
The first type of inference is that the observed score is a reflection of
the observed behavior (i.e., that there is a clear scoring procedure.)
Kane
The second inference is from the observed score to what Kane called
the universe score. This inference is that the observed score is
consistent across tasks, judges, and occasions.
This involves reliability and can be studied effectively using
generalizability theory, item response modeling, etc.
A number of variables can threaten the validity of this inference,
including raters, task, rating scale, candidate characteristics, and
interactions among these.
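The generalizability reasoning behind this second inference can be illustrated numerically. The following is a minimal, illustrative Python sketch (the score matrix is invented) estimating a one-facet generalizability coefficient for candidates crossed with raters; it is not the full G-theory machinery, only the simplest persons-by-raters case:

```python
import numpy as np

# Hypothetical data: rows = candidates (persons), columns = raters.
X = np.array([
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 2],
    [4, 4, 5],
], dtype=float)

n_p, n_r = X.shape
grand = X.mean()
person_means = X.mean(axis=1)
rater_means = X.mean(axis=0)

# Sums of squares for a two-way crossed design without replication
ss_p = n_r * ((person_means - grand) ** 2).sum()
ss_r = n_p * ((rater_means - grand) ** 2).sum()
ss_res = ((X - grand) ** 2).sum() - ss_p - ss_r

ms_p = ss_p / (n_p - 1)
ms_res = ss_res / ((n_p - 1) * (n_r - 1))

# Variance components (negative estimates truncated at zero)
var_res = ms_res
var_p = max((ms_p - ms_res) / n_r, 0.0)

# Relative G-coefficient for the mean score over n_r raters:
# the share of score variance attributable to real candidate differences.
g_coeff = var_p / (var_p + var_res / n_r)
print(round(g_coeff, 2))  # 0.9
```

A coefficient near 1 supports the inference from observed score to universe score; a low coefficient signals that raters, tasks, or their interactions, rather than candidate ability, are driving the scores.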
Kane
The third inference, from the universe score to the target score, concerns
construct validity and is closest to the first cell of Messick's validity
matrix.
This inference involves extrapolation to nontest behavior—in some
cases, via explanation in terms of a model.
The fourth inference, from the target score to the decision based on
the score, moves the test into the world of test use and test context;
 It encompasses the material in the second, third, and fourth cells of
Messick’s matrix involving questions of relevance (cell 2), values (cell
3), and consequences (cell 4)
Kane
Kane thus distinguishes two types of inference:
1. semantic inferences
2. policy inferences
Kane also distinguishes two related types of interpretation:
1. Interpretations that only involve semantic inferences are called
descriptive interpretations;
2. Interpretations involving policy inferences are called decision-based
interpretations.
The Social Dimension of Validity in Language Testing
Some of the implications for assessment of the more socially oriented
views of communication were represented in the work of Hymes (1967,
1972)
However, the most influential discussion of the validity of
communicative language tests, Bachman’s landmark Fundamental
Considerations in Language Testing (1990), builds its discussion
around Messick’s approach to validity.
Earlier accounts, such as Lado (1961) and Davies (1977), had reflected the
existing modeling of validity in terms of different types: criterion validity
(concurrent/predictive), content validity, and construct validity.
The Social Dimension of Validity in Language Testing
Following Messick, Bachman presented validity as a unitary concept,
requiring evidence to support the inferences that we make on the basis
of test scores.
Bachman (1990) introduced his model of communicative language ability,
refining the earlier formulations of Canale and Swain (1980) and
Canale (1983).
Conceptualizing the demands of the target domain or criterion is a
critical stage in the development of a test validation framework.
The Social Dimension of Validity in Language Testing
Bachman does this in two stages:
First, he assumes that all contexts make specific demands on aspects
of test-taker competence;
The famous model of communicative language ability
Second, Bachman handled characterization of specific target language
use situations and of test content in terms of what he called
Test method facets.
The Social Dimension of Validity in Language Testing
Problems of Bachman’s Model:
 This a priori approach to characterizing the social context of use in
terms of a model of individual ability obviously severely constrains
the conceptualization of the social dimension of the assessment
context.
 The model is clearly primarily cognitive and psychological.
 Those who have used the test method facets approach in actual
research and development projects have found it difficult to use.
The Social Dimension of Validity in Language Testing
One influential attempt to make language test development and
validation more manageable is Bachman and Palmer's (1996) notion of test
usefulness: reliability, construct validity, authenticity, interactiveness,
impact, and practicality.
Note: Authenticity, interactiveness and impact are three qualities
that many measurement specialists consider to be part of validity.
The Social Dimension of Validity in Language Testing
Bachman (2004): the process of validation includes two interrelated
activities:
1. Articulating an interpretive argument (also referred to as a validation
argument), which provides the logical framework linking test
performance to an intended interpretation and use.
Following Kane
2. Collecting relevant evidence in support of the intended
interpretations and uses. Following Mislevy
The Social Dimension of Validity in Language Testing
 Bachman describes procedures for validation of test use decisions,
following Mislevy et al. (2003) in suggesting that they follow the structure
of a Toulmin argument (i.e., a procedure for practical reasoning involving
articulating claims and providing arguments and evidence both in their
favor [warrants or backing] and against [rebuttals]).
 An assessment use argument (AUA) is an overall logical framework for
linking assessment performance to use (decisions). This assessment use
argument includes two parts: an assessment utilization argument, linking an
interpretation to a decision, and an assessment validity argument, which
links assessment performance to an interpretation.
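The Toulmin structure behind an assessment use argument can be made concrete with a small data-structure sketch. The following Python example is purely illustrative; the admission scenario, score, and rebuttals are invented, and the class simply names the parts of a Toulmin argument rather than implementing Bachman's AUA itself:

```python
from dataclasses import dataclass, field

@dataclass
class ToulminArgument:
    claim: str                       # what we assert about the test taker
    data: str                        # the observed evidence (test performance)
    warrant: str                     # why the data is taken to support the claim
    backing: list = field(default_factory=list)    # evidence supporting the warrant
    rebuttals: list = field(default_factory=list)  # conditions under which the claim fails

# Hypothetical admission decision based on a speaking score
admission = ToulminArgument(
    claim="The candidate can cope with English-medium academic study.",
    data="Band 7 on a face-to-face speaking test.",
    warrant="Speaking-test scores generalize to academic oral tasks.",
    backing=["Extrapolation study correlating test scores with seminar performance"],
    rebuttals=[
        "Interlocutor effects inflated the score",
        "Test tasks underrepresent the academic domain",
    ],
)
print(admission.claim)
```

The point of the structure is that rebuttals are first-class: a validation effort must actively seek and weigh evidence against the claim, not only warrants in its favor.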
The Social Dimension of Validity in Language Testing
Following Bachman, Chapelle, Enright, and Jamieson (2004) used
Kane’s framework as the basis for a validity argument for the new
TOEFL iBT, arguably the largest language test development and
validation effort yet undertaken.
In both the work of Kane and in work in language testing, with one or
two notable exceptions, the wider social context in which language
tests are commissioned and have their place is still not adequately
theorized.
The Social Dimension of Validity in Language Testing
A kind of optimism about the social role of tests is reflected by
Kunnan and Bachman.
In sharp contrast to this position is Shohamy’s analysis of the political
function of tests: critical language testing (1998, 2001).
Tests have become symbols of power for both individuals and society.
 Lynch (2001), like Shohamy: tests have the potential to be sites of
unfairness or injustice at the societal level.
Chapter 3: The Social Dimension of
Proficiency: How Testable Is It?
 The changes in assessment are associated with the advent of the communicative
movement in language teaching in the mid-1970s.
 The most significant contribution of that movement was a renewed focus on
performance tests.
 There was a shift from seeing language proficiency in terms of knowledge of structure
which could be tested using discrete-point, multiple-choice items, to an emphasis on the
integration of many aspects of language knowledge and skill in performance.
 Two areas will be considered in which the assessment construct involves a social
dimension of communication:
• the assessment of face-to-face interaction
• the assessment of pragmatics.
Assessing Face-to-face Interaction
The advent of communicative language testing saw a growing
preference for face-to-face interaction as the context in which the
assessment of spoken language skills would occur.
In the past, face-to-face assessment existed in the U.S. Government
Foreign Service Institute (FSI) test and in the British tradition of
examinations, but these practices were seriously undertheorized.
Assessing Face-to-face Interaction
Language assessment in face-to-face interaction takes place within
what Goffman (1983) called the interaction order:
Social interaction . . . [is] that which uniquely transpires in social
situations, that is, environments in which two or more individuals are
physically in one another’s response presence. . . . which might be
titled the interaction order . . . [and] whose preferred method of study is
microanalysis.
Face-to-face interaction has its own rules, its own processes, and its
own structure.
Assessing Face-to-face Interaction
The nature of behavior in the face-to-face domain, and the rules that
constrain it, were systematically explored in the work of Goffman's
students Sacks and Schegloff, who developed Conversation Analysis.
The realization was that for speaking tests, the presence of an
interlocutor in the test setting introduces an immediate and overt
social context, which presented fundamental challenges for the existing
individualistic theories of language proficiency.
Assessing Face-to-face Interaction
A social view of performance is incompatible with taking the
traditional view of performance as a simple projection or display of
individual competence.
Psychology, linguistics, and psychometrics assume that it is possible to
read off underlying individual cognitive abilities from the data of
performance.
 However, the position of Conversation Analysis is that the
interlocutor is implicated in each move by the candidate.
 How are we to speak of communicative competence as residing in the
individual if we are to “include the hearer in the speaker’s processes”?
Assessing Face-to-face Interaction
Research on this issue has taken a number of directions. Discourse
analysts have focused on defining and clarifying the interactional
nature of the oral proficiency interview, e.g., Young and He (1998);
Johnson (2001). Results:
1. To emphasize the contribution of both parties to the performance, thus
making it difficult to isolate the contribution of the candidate, which
is the target of measurement and reporting.
2. To disclose the peculiar character of the oral proficiency interview as
a communicative event.
Assessing Face-to-face Interaction
I do not agree that the interview is more natural than the
other forms of tests, because if I’m being interviewed and
I know that my salary and my promotion depend on it, no
matter how charming the interviewer and his assistants
are, this couldn’t be any more unnatural.
(Jones & Spolsky, 1975, p. 7)
Assessing Face-to-face Interaction
The response from within the psychometric tradition is to maintain the
goal of disentangling the role of the interlocutor by treating it as a
variable like any other, to be controlled for, thus allowing us to focus
again exclusively on the candidate.
The most comprehensive study of the role of the interlocutor in
performances in oral proficiency interviews is that of Brown (2003,
2005).
Assessing Face-to-face Interaction
Findings:
 It was possible to identify score patterns (higher or lower scores) for
candidates paired with particular interlocutors.
What was the source of this effect? Two possible sources:
 The behavior of interlocutors as revealed through discourse analysis,
 The comments of raters as they listened to the performances, using
think-aloud techniques.
Assessing Face-to-face Interaction
The social character of the interaction has also been conceptualized
from more macrosociological points of view, as potentially being
influenced by the identities of the candidate and interlocutor/rater.
 What is at issue here is the extent to which such features as
the gender of participants,
the professional identity and experience of the interviewer/rater,
the native-speaker status (or otherwise) of the interviewer/rater,
the language background of the candidate,
and so on influence the interaction and its outcome.
Assessing Face-to-face Interaction
This is the subject of intensive ongoing debate in discourse studies of
macrosocial phenomena e.g.:
o feminist discourse studies,
o studies of racism,
o the movement known as Critical Discourse Analysis in general, and
o discursive psychology.
Assessing Second Language Pragmatics
Assessment of L2 pragmatics tests language use in social settings, but
unlike oral proficiency tests, it does not necessarily focus on
conversation or on eliciting speech samples.
Because of its highly contextualized nature, assessment of pragmatics
leads to significant tension between
the construction of authentic assessment tasks and
practicality;
only a few tests are available in this area.
Second Language Pragmatics
 Pragmatics is the study of language use in a social context, and language
users’ pragmatic competence is their “ability to act and interact by means of
language” (Kasper & Roever, 2005, p. 317).
 Pragmatics covers
 implicature,
 deixis,
 speech acts,
 conversational management,
 situational routines, and others (Leech, 1983; Levinson, 1983; Mey, 2001).
Second Language Pragmatics
Components of pragmatic competence (both are equally necessary):
A: “sociopragmatic” knowledge:
knowledge of the target language community’s
social rules, appropriateness norms, discourse practices,
and accepted behaviors
B: “pragmalinguistic” knowledge:
the linguistic tools necessary to
“do things with words”
(Austin, 1962)
Second Language Pragmatics
 Because of the close connection between pragmalinguistics and sociopragmatics, it is
difficult to design a test that tests pragmalinguistics to the exclusion of sociopragmatics or
vice versa.
Pragmatic Tests
Hudson, Detmer, & Brown’s (1995) test of English as a second language
(ESL) sociopragmatics
Bouton’s test of ESL implicature (Bouton, 1988, 1994, 1999)
Roever’s Web-based test of ESL pragmalinguistics (Roever, 2005, 2006b)
Liu’s (2006) test of EFL sociopragmatics
Testing Sociopragmatics
 Hudson et al. (1995) designed a test battery for assessing Japanese ESL
learners’ ability to produce and recognize appropriate realizations of the
speech acts of request, apology, and refusal.
 Their battery consisted of five sections:
1. a written DCT,
2. an oral (language lab) DCT,
3. a multiple-choice DCT,
4. a role-play, as well as
5. self-assessment measures for
• the DCTs and the role play.
Testing Sociopragmatics
Result:
A central problem of sociopragmatically oriented tests that focus
on appropriateness: Judgments of what is and what is not appropriate
differ widely among NSs and are probably more a function of
personality and social background variables than of language
knowledge.
Testing Implicature
 Bouton (1988, 1994, 1999) designed a 33-item test of implicature, incorporating two
major types of implicature, which he termed “idiosyncratic implicature” and
“formulaic implicature.”
 Idiosyncratic implicature is conversational implicature in Grice’s terms (1975); that is, it
violates a Gricean maxim and forces the hearer to infer meaning beyond the literal
meaning of the utterance by using background knowledge.
 Bouton viewed formulaic implicature as a specific kind of implicature, which follows a
routinized schema.
 He placed the Pope Q (“Is the Pope Catholic?”), indirect criticism (“How did you like the
food?”—“Let’s just say it was colorful.”), and sequences of events in this category.
Testing Pragmalinguistics
 Roever (2005, 2006b) developed a test battery that focused squarely on the
pragmalinguistic side of pragmatic knowledge.
 Unlike Hudson et al., who limited themselves to speech acts, and Bouton,
who assessed only implicature, Roever tested three aspects of ESL
pragmalinguistic competence:
1. recognition of situational routine formulas,
2. comprehension of implicature, and
3. knowledge of speech act strategies.
 Roever tried to strike a balance between practicality and broad content
coverage to avoid construct underrepresentation.
Apologies and Requests for Chinese EFL
Learners
 Liu (2006) developed a test of requests and apologies for Chinese EFL learners, which
consisted of
 a multiple-choice DCT,
a written DCT, and
 a self-assessment instrument.
 All three test papers contained 24 situations, evenly split between apologies and requests.
 As Liu (2006) stated quite clearly, his test exclusively targets Chinese EFL learners, so its
usability for other first language (L1) groups is unclear.
 It also only considers one aspect of L2 pragmatic competence—speech acts—so that
conclusions drawn from it and decisions based on scores would have to be fairly limited
and restricted.
The Tension in Testing Pragmatics: Keeping It
Social but Practical
If pragmatics is understood as language use in social settings, tests
would necessarily have to construct such social settings.
The usual DCT method of cramming all possible speech act
strategies into one gap is highly inauthentic in terms of actual
conversation.
The obvious alternative would be to test pragmatics through face-to-
face interaction, most likely role-plays. In this way, speech acts could
unfold naturally as they would in real-world interaction.
The Tension in Testing Pragmatics: Keeping It
Social but Practical
 Problems with Role Plays:
 Practicality.
 Time-consuming to conduct
 Requiring multiple ratings.
 A standardization issue because every role-play would be somewhat unique
if the conversation is truly co-constructed.
 How can this dilemma be solved?
 One way is to acknowledge that pragmatics does not exclusively equal
speech acts.
The Tension in Testing Pragmatics: Keeping It
Social but Practical
 It is possible to test other aspects of pragmatic competence without simulating
conversations. Certain aspects of pragmatics can be tested more easily in isolation than
others. For example,
 Routine formulas
 Implicature
 However, such a limitation raises the issue of construct underrepresentation.
 Given that pragmatics is a fairly broad area, it is difficult to design a single test that
assesses the entirety of a learner’s pragmatic competence.
 Depending on the purpose of the test, different aspects of pragmatic competence can be
tested, and for some purposes, it might be unavoidable to use role-plays or other
simulations of social situations
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
01-Introduction-to-Information-Management.pdf
Chinmaya Tiranga quiz Grand Finale.pdf
Trump Administration's workforce development strategy
Complications of Minimal Access Surgery at WLH
VCE English Exam - Section C Student Revision Booklet
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
Classroom Observation Tools for Teachers
FourierSeries-QuestionsWithAnswers(Part-A).pdf
Cell Structure & Organelles in detailed.
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
Anesthesia in Laparoscopic Surgery in India
Yogi Goddess Pres Conference Studio Updates
RTP_AR_KS1_Tutor's Guide_English [FOR REPRODUCTION].pdf
Final Presentation General Medicine 03-08-2024.pptx
2.FourierTransform-ShortQuestionswithAnswers.pdf
Soft-furnishing-By-Architect-A.F.M.Mohiuddin-Akhand.doc
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
human mycosis Human fungal infections are called human mycosis..pptx
Microbial diseases, their pathogenesis and prophylaxis
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
01-Introduction-to-Information-Management.pdf

Language testing the social dimension

  • 1. Language Testing: The Social Dimension Tim McNamara and Carsten Roever (Pages 1-64) Presented by: Amir Hamid Forough Ameri ahfameri@gmail.com December 2015
  • 2. Chapter 2: Validity and the Social Dimension of Language Testing In what ways is the social dimension of language assessment reflected in current theories of the validation of language tests? Mislevy Lynch Cronbach Kunnan BachmanShohamy Messick Chapelle Kane
  • 3. Cronbach Contemporary discussions of validity in educational assessment are heavily influenced by the thinking of the American Lee Cronbach; The “father” of construct validity. Meehl and Challman coined the term “construct validity.”
  • 4. Cronbach Their new concept of construct validity was an alternative to criterion- related validity: Construct validity is ordinarily studied when the tester has no definite criterion measure of the quality with which he [sic] is concerned and must use indirect measures. Here the trait or quality underlying the test is of central importance, rather than either the test behavior or the cores on the criteria. (Cronbach & Meehl, 1955, p. 283)
  • 5. Cronbach Since Cronbach and Meehl’s article: A: There has been the increasingly central role taken by construct validity, B: There is also clear recognition that validity is not a mathematical property like discrimination or reliability, but a matter of judgment. Cronbach (1989) emphasized the need for a validity argument, which focuses on collecting evidence for or against a certain interpretation of test scores: In other words, it is the validity of inferences that construct validation work is concerned with, rather than the validity of instruments.
  • 6. Cronbach In fact, Cronbach argued that there is no such thing as a “valid test,” “One does not validate a test, but only a principle for making inferences” (Cronbach & Meehl, 1955, p. 297); “One validates not a test, but an interpretation of data arising from a specified procedure” (Cronbach, 1971, p. 447).
  • 7. Cronbach Cronbach and Meehl (1955) distinguished between a weak and a strong program for construct validation:  The weak one is a fairly haphazard collection of any sort of evidence (mostly correlational) that supports the particular interpretation to be validated. In contrast, the strong program is based on the falsification idea advanced by Popperian philosophy (Popper, 1962): Rival hypotheses for interpretations are proposed and logically or empirically examined.
  • 8. Cronbach Cronbach (1988) admitted that the actual approach taken in most validation research is “confirmationist” rather than “falsificationist” and aimed at refuting rival hypotheses. Through his experiences in program evaluation, Cronbach highlighted the role of beliefs and values in validity arguments, which “must link concepts, evidence, social and personal consequences, and values” (Cronbach, 1988, p. 4). What we have here then is a concern for social consequences as a kind of corrective to an earlier entirely cognitive and individualistic way of thinking about tests.
  • 9. Messick  The most influential current theory of validity is developed by Samuel Messick (1989).  Messick incorporated a social dimension of assessment quite explicitly within his model.  Messick, like Cronbach, saw assessment as a process of reasoning and evidence gathering carried out in order for inferences to be made about individuals and saw the task of establishing the meaningfulness of those inferences as being the primary task of assessment development and research.  This reflects an individualist, psychological tradition of measurement concerned with fairness.
  • 12. Messick on Construct Validity Setting out the nature of the claims that we wish to make about test takers and providing arguments and evidence in support of them, as in Messick’s cell 1, represents the process of construct definition and validation. Those claims then provide the rationale for making decisions about individuals (cells 2 and 4) on the basis of test scores. Consider tests such as IELTS, TOEFL iBT, Occupational English Test (McNamara, 1996):
  • 13. Messick on Construct Validity The path from the observed test performance to the predicted real- world performance (in performance assessments) or estimate of the actual knowledge of the target domain (in knowledge-based tests) involves a chain of inferences. In principle, how the person will fare in the target setting cannot be known directly, but must be predicted.  Deciding whether the person should be admitted then depends on two prior steps: 1. Modeling what you believe the demands of the target setting are and 2. predicting what the standing of the individual is in relation to this construct.
  • 14. Messick on Construct Validity  The test is a procedure for gathering evidence in support of decisions that need to be made.  The relationships among test, construct and target are set out in Figure 2.3. (Next Slide)  Validity therefore implies considerations of social responsibility, both to the candidate (protecting him or her against unfair exclusion) and to the receiving institution.  Fairness in this sense can only be achieved through carefully planning the design of the observations of candidate performance and carefully articulating the relationship between the evidence we gain and the inferences about candidate standing.
  • 16. Messick on Construct Validity Test validation steers between what Messick called 1. construct underrepresentation, and 2. construct-irrelevant variance, The former: the assessment requires less of the test taker than is required in reality. The latter: differences in scores might not be due only to differences in the ability being measured but that other factors are illegitimately affecting scores.
  • 17. Mislevy Central to assessment is the chain of reasoning from the observations to the claims we make about test takers, on which the decisions about them will be based. Mislevy calls this the “assessment argument”. According to Mislevy: An assessment is a machine for reasoning about what students know, can do, or have accomplished, based on a handful of things they say, do, or make in particular settings. Mislevy has developed an approach called Evidence Centered Design (Figure 2.5), which focuses on the chain of reasoning in designing tests.
  • 18. Construct Definition and Validation: Mislevy
  • 22. Construct Definition and Validation: Mislevy A preliminary first stage, Domain Analysis, involves what in performance assessment is called job analysis (the testing equivalent of needs analysis). Here the test developer needs to develop insight into the conceptual and organizational structure of the target domain. What follows is the most crucial stage of the process, Domain Modeling. It involves modeling three things: claims, evidence, and tasks. Together, the modeling of claims and evidence is equivalent to articulating the test construct.
  • 23. Construct Definition and Validation: Mislevy Step 1 involves the test designer in articulating the claims the test will make about candidates on the basis of test performance. This involves conceptualizing the aspects of knowledge or performance ability to which the evidence of the test will be directed and on which decisions about candidates will be based. Claims might be stated in broader or narrower terms; the latter approach brings us closer to the specification of criterion behaviors in criterion-referenced assessment (Brown & Hudson, 2004).
  • 24. Construct Definition and Validation: Mislevy Step 2 involves determining the kind of evidence that would be necessary to support the claims. This stage depends on a theory of the characteristics of a successful performance, e.g., the categories of rating scales used to judge the adequacy of the performance. Step 3 involves defining in general terms the kinds of task in which the candidate will be required to engage.
  • 25. Construct Definition and Validation: Mislevy All three steps precede the actual writing of specifications for test tasks; they constitute the “thinking stage” of test design. Only when this chain of reasoning is completed can the specifications for test tasks be written. In further stages of ECD, Mislevy deals with turning the conceptual framework developed in the domain modeling stage into an actual assessment. The final outcome of this is an operational assessment.
  • 26. Construct Definition and Validation: Mislevy The consideration of the social dimension of assessment in Mislevy’s conceptual analysis remains implicit and limited to issues of fairness (McNamara, 2003): 1. Mislevy does not consider the context in which tests are commissioned. 2. Nor does Mislevy deal directly with the uses of test scores.
  • 27. Kane Kane points out that we interpret scores as having meaning. The same score might have different interpretations. Whatever the interpretation we choose, Kane argues, we need an argument to defend the relationship of the score to that interpretation. He calls this an interpretative argument, defined as a: “chain of inferences from the observed performances to conclusions and decisions included in the interpretation”
  • 28. Kane
  • 29. Kane Kane proposes four types of inference in the chain of inferences. He uses the metaphor of bridges for each of these inferences; all bridges need to be crossed safely for the final interpretations to be reached. The ultimate decisions are vulnerable to weaknesses in any of the preceding steps; in this way, Kane is clear about the dependency of valid interpretations on the reliability of scores.
  • 30. Kane
  • 31. Kane The first inference is from observation to observed score. In order for assessment to be possible, an instance of learner behavior needs to be observable. This behavior is then scored. The first type of inference is that the observed score is a reflection of the observed behavior (i.e., that there is a clear scoring procedure.)
  • 32. Kane The second inference is from the observed score to what Kane called the universe score. This inference is that the observed score is consistent across tasks, judges, and occasions. This involves reliability and can be studied effectively using generalizability theory, item response modeling, etc. A number variables can threaten the validity of this inference, including raters, task, rating scale, candidate characteristics, and interactions among these.
  • 33. Kane The third inference, from the universe score to the target score: construct validity which is closest to the first cell of Messick’s validity matrix. This inference involves extrapolation to nontest behavior—in some cases, via explanation in terms of a model. The fourth inference, from the target score to the decision based on the score, moves the test into the world of test use and test context;  It encompasses the material in the second, third, and fourth cells of Messick’s matrix involving questions of relevance (cell 2), values (cell 3), and consequences (cell 4)
  • 34. Kane Kane thus distinguishes two types of inference: 1. semantic inferences 2. policy inferences Kane also distinguishes two related types of interpretation: 1. Interpretations that only involve semantic inferences are called descriptive interpretations; 2. Interpretations involving policy inferences are called decision-based interpretations.
  • 35. The Social Dimension of Validity in Language Testing Some of the implications for assessment of the more socially oriented views of communication were represented in the work of Hymes (1967, 1972) However, the most influential discussion of the validity of communicative language tests, Bachman’s landmark Fundamental Considerations in Language Testing (1990), builds its discussion around Messick’s approach to validity. Earlier treatments, such as Lado (1961) and Davies (1977), had reflected the existing modeling of validity in terms of different types: criterion validity (concurrent/predictive), content validity, and construct validity.
  • 36. The Social Dimension of Validity in Language Testing Following Messick, Bachman presented validity as a unitary concept, requiring evidence to support the inferences that we make on the basis of test scores. Bachman (1990) introduced the Bachman model of communicative language ability, clarifying the earlier interpretations by Canale and Swain (1980) and Canale (1983). Conceptualizing the demands of the target domain or criterion is a critical stage in the development of a test validation framework.
  • 37. The Social Dimension of Validity in Language Testing Bachman does this in two stages: First, he assumes that all contexts make specific demands on aspects of test-taker competence; The famous model of communicative language ability Second, Bachman handled characterization of specific target language use situations and of test content in terms of what he called Test method facets.
  • 38. The Social Dimension of Validity in Language Testing Problems of Bachman’s Model:  This a priori approach to characterizing the social context of use in terms of a model of individual ability obviously severely constrains the conceptualization of the social dimension of the assessment context.  The model is clearly primarily cognitive and psychological.  Those who have used the test method facets approach in actual research and development projects have found it difficult to use.
  • 39. The Social Dimension of Validity in Language Testing One influential attempt to make language test development and validation more manageable is Bachman and Palmer’s (1996) notion of test usefulness: reliability, construct validity, authenticity, interactiveness, impact, and practicality. Note: Authenticity, interactiveness and impact are three qualities that many measurement specialists consider to be part of validity.
  • 40. The Social Dimension of Validity in Language Testing Bachman (2004): the process of validation includes two interrelated activities: 1. Articulating an interpretive argument (also referred to as a validation argument), which provides the logical framework linking test performance to an intended interpretation and use. Following Kane 2. Collecting relevant evidence in support of the intended interpretations and uses. Following Mislevy
  • 41. The Social Dimension of Validity in Language Testing  Bachman describes procedures for validation of test use decisions, following Mislevy et al. (2003) in suggesting that they follow the structure of a Toulmin argument (i.e., a procedure for practical reasoning involving articulating claims and providing arguments and evidence both in their favor [warrants or backing] and against [rebuttals]).  An assessment use argument (AUA) is an overall logical framework for linking assessment performance to use (decisions). This assessment use argument includes two parts: an assessment utilization argument, linking an interpretation to a decision, and an assessment validity argument, which links assessment performance to an interpretation.
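The Toulmin structure behind an assessment use argument — claims defended by warrants with backing, and exposed to rebuttals — can be rendered schematically in code. The sketch below is hypothetical: the class names, fields, and example statements are illustrative, not Bachman's or Mislevy's formalism.

```python
# A schematic rendering of a Toulmin-style assessment use argument:
# a claim stands only if its warrants have backing and its rebuttals
# have been answered. All names and examples are illustrative.
from dataclasses import dataclass, field

@dataclass
class Warrant:
    statement: str
    backing: list = field(default_factory=list)  # evidence for the warrant

@dataclass
class Rebuttal:
    statement: str
    refuted: bool = False  # has counter-evidence been answered?

@dataclass
class Claim:
    statement: str
    warrants: list = field(default_factory=list)
    rebuttals: list = field(default_factory=list)

    def is_defensible(self):
        """Every warrant needs backing; every rebuttal must be refuted."""
        return (all(w.backing for w in self.warrants)
                and all(r.refuted for r in self.rebuttals))

claim = Claim(
    "Candidate X can cope with English-medium study",
    warrants=[Warrant("Test tasks sample academic language use",
                      backing=["domain analysis of lectures and readings"])],
    rebuttals=[Rebuttal("Scores may reflect test-wiseness", refuted=False)],
)
print(claim.is_defensible())  # False: an unanswered rebuttal weakens the claim
```

The point of the structure is exactly the one the slide makes: validation is not a property of the instrument but an explicit argument, and an unaddressed rebuttal leaves the intended use undefended.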
  • 42. The Social Dimension of Validity in Language Testing Following Bachman, Chapelle, Enright, and Jamieson (2004) used Kane’s framework as the basis for a validity argument for the new TOEFL iBT, arguably the largest language test development and validation effort yet undertaken. In both the work of Kane and in work in language testing, with one or two notable exceptions, the wider social context in which language tests are commissioned and have their place is still not adequately theorized.
  • 43. The Social Dimension of Validity in Language Testing A kind of optimism about the social role of tests is reflected by Kunnan and Bachman. In sharp contrast to this position is Shohamy’s analysis of the political function of tests: critical language testing (1998, 2001). Tests have become symbols of power for both individuals and society  Lynch (2001), like Shohamy: tests have the potential to be sites of unfairness or injustice at the societal level.
  • 44. Chapter 3: The Social Dimension of Proficiency: How Testable Is It?  The changes in assessment are associated with the advent of the communicative movement in language teaching in the mid-1970s.  The most significant contribution of that movement was a renewed focus on performance tests.  There was a shift from seeing language proficiency in terms of knowledge of structure which could be tested using discrete-point, multiple-choice items, to an emphasis on the integration of many aspects of language knowledge and skill in performance.  Two areas will be considered in which the assessment construct involves a social dimension of communication: • the assessment of face-to-face interaction • the assessment of pragmatics.
  • 45. Assessing Face-to-face Interaction The advent of communicative language testing saw a growing preference for face-to-face interaction as the context in which the assessment of spoken language skills would occur. In the past: U.S. Government Foreign Service Institute (FSI) test and the British tradition of examinations, but these practices were seriously undertheorized.
  • 46. Assessing Face-to-face Interaction Language assessment in face-to-face interaction takes place within what Goffman (1983) called the interaction order: Social interaction . . . [is] that which uniquely transpires in social situations, that is, environments in which two or more individuals are physically in one another’s response presence. . . . which might be titled the interaction order . . . [and] whose preferred method of study is microanalysis. Face-to-face interaction has its own regulations; it has its own processes and its own structure.
  • 47. Assessing Face-to-face Interaction The nature of behavior in face-to-face domain, and the rules that constrain it was systematically explored in the work of students of Goffman, Sacks and Schegloff, who developed Conversation Analysis. The realization was that for speaking tests, the presence of an interlocutor in the test setting introduces an immediate and overt social context, which presented fundamental challenges for the existing individualistic theories of language proficiency.
  • 48. Assessing Face-to-face Interaction A social view of performance is incompatible with taking the traditional view of performance as a simple projection or display of individual competence. Psychology, linguistics, and psychometrics assume that it is possible to read off underlying individual cognitive abilities from the data of performance.  However, the position of Conversation Analysis is that the interlocutor is implicated in each move by the candidate.  How are we to speak of communicative competence as residing in the individual if we are to “include the hearer in the speaker’s processes”?
  • 49. Assessing Face-to-face Interaction Research on this issue has taken a number of directions. Discourse analysts have focused on defining and clarifying the interactional nature of the oral proficiency interview, e.g., Young and He (1998); Johnson (2001). Results: 1. To emphasize the contribution of both parties to the performance, thus making it difficult to isolate the contribution of the candidate, which is the target of measurement and reporting. 2. To disclose the peculiar character of the oral proficiency interview as a communicative event.
  • 50. Assessing Face-to-face Interaction I do not agree that the interview is more natural than the other forms of tests, because if I’m being interviewed and I know that my salary and my promotion depend on it, no matter how charming the interviewer and his assistants are, this couldn’t be any more unnatural. (Jones & Spolsky, 1975, p. 7)
  • 51. Assessing Face-to-face Interaction The response from within the psychometric tradition is to maintain the goal of disentangling the role of the interlocutor by treating it as a variable like any other, to be controlled for, thus allowing us to focus again exclusively on the candidate. The most comprehensive study of the role of the interlocutor in performances in oral proficiency interviews is that of Brown (2003, 2005).
  • 52. Assessing Face-to-face Interaction Findings:  It was possible to identify score patterns (higher or lower scores) for candidates paired with particular interlocutors. What was the source of this effect? Two possible sources:  The behavior of interlocutors as revealed through discourse analysis,  The comments of raters as they listened to the performances, using think-aloud techniques.
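The score patterns Brown identified can be screened for with a simple descriptive check: compare mean candidate scores by interlocutor and flag pairings that sit unusually high or low. The sketch below uses invented data and a hypothetical threshold; it is a first-pass screen, not Brown's method.

```python
# Minimal sketch (hypothetical data) of screening for interlocutor effects:
# flag interlocutors whose candidates' mean score departs from the overall mean.
from collections import defaultdict

def interlocutor_means(records):
    """records: (interlocutor_id, score) pairs from interview sessions."""
    by_int = defaultdict(list)
    for interlocutor, score in records:
        by_int[interlocutor].append(score)
    return {i: sum(s) / len(s) for i, s in by_int.items()}

def flag_outliers(records, threshold=0.5):
    """Return interlocutors whose mean deviates from the overall mean."""
    means = interlocutor_means(records)
    overall = sum(score for _, score in records) / len(records)
    return sorted(i for i, m in means.items() if abs(m - overall) > threshold)

records = [("A", 4), ("A", 5), ("B", 3), ("B", 2), ("C", 4), ("C", 4)]
print(flag_outliers(records))  # A patterns high, B low
```

A mean-deviation screen like this cannot by itself separate interlocutor behavior from candidate ability — which is precisely why Brown turned to discourse analysis of the interlocutors' talk and raters' think-aloud comments to locate the source of the effect.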
  • 53. Assessing Face-to-face Interaction The social character of the interaction has also been conceptualized from more macrosociological points of view, as potentially being influenced by the identities of the candidate and interlocutor/rater.  What is at issue here is the extent to which such features as the gender of participants, the professional identity and experience of the interviewer/rater, the native-speaker status (or otherwise) of the interviewer/rater, the language background of the candidate, and so on influence the interaction and its outcome.
  • 54. Assessing Face-to-face Interaction This is the subject of intensive ongoing debate in discourse studies of macrosocial phenomena e.g.: o feminist discourse studies, o studies of racism, o the movement known as Critical Discourse Analysis in general, and o discursive psychology.
  • 55. Assessing Second Language Pragmatics Assessment of L2 pragmatics tests language use in social settings, but unlike oral proficiency tests, it does not necessarily focus on conversation or extracting speech samples. Because of its highly contextualized nature, assessment of pragmatics leads to significant tension between the construction of authentic assessment tasks and practicality; only a few tests are available in this area.
  • 56. Second Language Pragmatics  Pragmatics is the study of language use in a social context, and language users’ pragmatic competence is their “ability to act and interact by means of language” (Kasper & Roever, 2005, p. 317).  Pragmatics covers  implicature,  deixis,  speech acts,  conversational management,  situational routines, and others (Leech, 1983; Levinson, 1983; Mey, 2001).
  • 57. Second Language Pragmatics Components of pragmatic competence (both are equally necessary): A: “sociopragmatic” knowledge: knowledge of the target language community’s social rules, appropriateness norms, discourse practices, and accepted behaviors; B: “pragmalinguistic” knowledge: the linguistic tools necessary to “do things with words” (Austin, 1962)
  • 58. Second Language Pragmatics  Because of the close connection between pragmalinguistics and sociopragmatics, it is difficult to design a test that tests pragmalinguistics to the exclusion of sociopragmatics or vice versa. Pragmatic Tests Hudson, Detmer, & Brown’s (1995) test of English as a second language (ESL) sociopragmatics Bouton’s test of ESL implicature (Bouton, 1988, 1994, 1999) Roever’s Web-based test of ESL pragmalinguistics (Roever, 2005, 2006b) Liu’s (2006) test of EFL sociopragmatics.
  • 59. Testing Sociopragmatics  Hudson et al. (1995) designed a test battery for assessing Japanese ESL learners’ ability to produce and recognize appropriate realizations of the speech acts of request, apology, and refusal.  Their battery consisted of the following sections: 1. a written DCT, 2. an oral (language lab) DCT, 3. a multiple-choice DCT, 4. a role-play, as well as 5. self-assessment measures for the DCTs and the role-play.
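The multiple-choice DCT section of such a battery scores mechanically against a key of "most appropriate" options. The sketch below is hypothetical: the item IDs and key are invented for illustration, not taken from Hudson et al.'s instrument.

```python
# Hypothetical sketch of scoring a multiple-choice DCT section:
# each situation has one keyed "most appropriate" option.
# Items and key are invented for illustration.
KEY = {"request_01": "b", "apology_01": "c", "refusal_01": "a"}

def score_mc_dct(responses, key=KEY):
    """responses: item_id -> chosen option; returns (raw score, proportion)."""
    raw = sum(1 for item, choice in responses.items() if key.get(item) == choice)
    return raw, raw / len(key)

responses = {"request_01": "b", "apology_01": "a", "refusal_01": "a"}
print(score_mc_dct(responses))
```

The mechanical scoring hides the hard part: deciding which option to key as "appropriate" in the first place, which is exactly where native-speaker judgments diverge.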
  • 60. Testing Sociopragmatics Result: A central problem of sociopragmatically oriented tests that focus on appropriateness: Judgments of what is and what is not appropriate differ widely among native speakers (NSs) and are probably more a function of personality and social background variables than of language knowledge.
  • 61. Testing Implicature  Bouton (1988, 1994, 1999) designed a 33-item test of implicature, incorporating two major types of implicature, which he termed “idiosyncratic implicature” and “formulaic implicature.”  Idiosyncratic implicature is conversational implicature in Grice’s terms (1975); that is, it violates a Gricean maxim and forces the hearer to infer meaning beyond the literal meaning of the utterance by using background knowledge.  Bouton viewed formulaic implicature as a specific kind of implicature, which follows a routinized schema.  He placed the Pope Q (“Is the Pope Catholic?”), indirect criticism (“How did you like the food?”—“Let’s just say it was colorful.”), and sequences of events in this category.
  • 62. Testing Pragmalinguistics  Roever (2005, 2006b) developed a test battery that focused squarely on the pragmalinguistic side of pragmatic knowledge.  Unlike Hudson et al., who limited themselves to speech acts, and Bouton, who assessed only implicature, Roever tested three aspects of ESL pragmalinguistic competence: 1. recognition of situational routine formulas, 2. comprehension of implicature, and 3. knowledge of speech act strategies.  Roever tried to strike a balance between practicality and broad content coverage to avoid construct underrepresentation.
  • 63. Apologies and Requests for Chinese EFL Learners  Liu (2006) developed a test of requests and apologies for Chinese EFL learners, which consisted of  a multiple-choice DCT,  a written DCT, and  a self-assessment instrument.  All three test papers contained 24 situations, evenly split between apologies and requests.  As Liu (2006) stated quite clearly, his test exclusively targets Chinese EFL learners, so its usability for other first language (L1) groups is unclear.  It also only considers one aspect of L2 pragmatic competence—speech acts—so that conclusions drawn from it and decisions based on scores would have to be fairly limited and restricted.
  • 64. The Tension in Testing Pragmatics: Keeping It Social but Practical If pragmatics is understood as language use in social settings, tests would necessarily have to construct such social settings. The usual DCT method of cramming all possible speech act strategies into one gap is highly inauthentic in terms of actual conversation. The obvious alternative would be to test pragmatics through face-to-face interaction, most likely role-plays. In this way, speech acts could unfold naturally as they would in real-world interaction.
  • 65. The Tension in Testing Pragmatics: Keeping It Social but Practical  Problems with Role-Plays:  Practicality.  Being time-consuming to conduct  Requiring multiple ratings.  A standardization issue, because every role-play would be somewhat unique if the conversation is truly co-constructed.  How can this dilemma be solved?  One way is to acknowledge that pragmatics does not exclusively equal speech acts.
  • 66. The Tension in Testing Pragmatics: Keeping It Social but Practical  It is possible to test other aspects of pragmatic competence without simulating conversations. Certain aspects of pragmatics can be tested more easily in isolation than others. For example,  Routine formulas  Implicature  However, such a limitation raises the question of construct underrepresentation.  Given that pragmatics is a fairly broad area, it is difficult to design a single test that assesses the entirety of a learner’s pragmatic competence.  Depending on the purpose of the test, different aspects of pragmatic competence can be tested, and for some purposes, it might be unavoidable to use role-plays or other simulations of social situations.