LANGUAGE
PROFICIENCY
TESTING
A Critical Survey
Language Testing
Earth without art is just
eh
Language Testing
Please God may I not fail
Please God may I get over sixty per cent
Please God may I get a high place
Please God may all those likely to beat me get killed in
road accidents and may they die roaring.
Irish novelist McGahern
Overview
Types of language tests
Ways of describing tests
Evaluating the usefulness of language tests
Overview of common language tests:
TOEFL, TOEIC, IELTS, and CAEL
Impact of testing on learning and teaching
Critical use of language tests
Testing Questions
What is actually being tested by the test
we are using?
What is the “best” test to use?
What relevant information does the test
provide?
How is testing affecting teaching and
learning behaviour?
Is language testing “fair”?
Validity, reliability, feasibility
Reliability relates to the consistency of an
assessment.
A reliable assessment is one which
consistently achieves the same results with
the same (or similar) cohort of students.
A valid assessment is one which
measures what it is intended to
measure
Totally valid or reliable/Driving test
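One common way to put a number on "consistently achieves the same results with the same cohort" is a test-retest correlation. The sketch below is only an illustration, not a method taken from the slides: the function name `pearson_r` and the two lists of scores are hypothetical, and Pearson's correlation is just one of several possible reliability estimates.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two sittings around their respective means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary within each sitting")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same five students on two sittings of one test
first_sitting = [52, 61, 70, 78, 90]
second_sitting = [55, 60, 72, 75, 88]

reliability = pearson_r(first_sitting, second_sitting)
print(f"test-retest reliability estimate: {reliability:.2f}")
```

A value close to 1 suggests that the test ranks the same students consistently across sittings. It says nothing about validity: a test can be highly consistent and still measure the wrong thing.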
Process of observation and objective
accumulation of evidence about the
individual learning process of students.
- How to assess?
−Checklist
−Informal teaching observation
Assessment
Consider the following:
o You apply for a part-time job to work your way through
school. You learn that as part of the application process,
you must take a test of word-processing speed and a
personality test.
o Mr. and Mrs. Gómez receive a call from their child’s
third-grade teacher, who says she is concerned about
Luis’ performance on a reading test. She would like to
refer Luis for further testing to see whether Luis has a
learning disability.
o Mr. and Mrs. Torres tell you that their son is not eligible
for special-education services because he scored “too
high” on an intelligence test.
Types of Assessment ( moments
of…)
Assessment – The process of collecting data for the purpose of
making decisions about individuals and groups, and this
decision-making role is the reason that assessment touches
so many people’s lives.
People react strongly when test scores are used to make
interpersonal comparisons in which they or those they
love look inferior.
Power of Testing
Language Testing
Testing – Consists of administering a particular set of questions to an
individual or group of individuals to obtain a score. The score is the
end product of testing.
Testing may be part of the larger process
Testing and assessment are not synonymous.
Assessment is a multifaceted process that involves far
more than just administering a test.
High-quality assessment procedures recognize that anyone’s
performance on any task is influenced by (1) the
demands of the task itself, (2) the history and
characteristics the individual brings to the task, and (3)
the factors inherent in the context in which the
assessment is carried out.
Facts
• Results
• Formats
• Quantitative
• Grades
• Letters
• Indicators
testing
SECOND
LANGUAGE
EVALUATION
• Standard test: TOEFL – IELTS – PET- CAE
• Placement test: Licenciatura test for freshmen
students
• Proficiency test: TOEFL - IELTS
• Achievement test: Parciales – workshops in ALx
Types of tests in language
education
Goal: the aim expected at the end of the
learning process.
Standard: accurate conceptual mastery of a
topic.
Descriptors: the achievements by
competence; they are written in the present tense with
closed and specific characteristics.
Indicators: the “regulator” of the curriculum;
not a final result, because it is subject to
GOAL •To use English in common situations.
STANDARD •Students will use English to involve themselves in social
circumstances.
descriptor •To recognize social codes.
Assume a critical position on current events.
indicators •Students recognize social codes.
•Students recognize social codes with difficulty.
•Students have great difficulty recognizing social
codes.
Avoid the use of
not.
NO
How do you create an evaluation?
1.Formulate the descriptors
2.Design a plan
3.Observe the learning process
4.Evaluate
5.Determine the efficiency of
pedagogies.
Evaluation in Colombian settings
National standard for evaluation:
ICFES Saber 5 – 9 - 11 ECAES
Saber pro
National standard for grading:
LAW 230: E S A I D
Decreto 1290:
1 – 5 /10 - 100
EVALUATION AT “INITIAL
SCHOOL”
1.Goal, standards, descriptors and indicators
based on the “Unified” Standards.
2.Strategies for evaluating in the five skills.
3.Continuous assessment of students’ development
4.Supportive strategies for solving academic and
personal problems
1.Scales to compare national standards with
school’s scales
2.Explicit self evaluation
3.Participation of the educational community
Types of Language Tests
Achievement test
associated with process of instruction
assesses where progress has been
made
should support the teaching to which it
relates
Alternative Assessment
need for assessment to be integrated
with the goals of the curriculum
Proficiency test
aims to establish a test taker’s
readiness for a particular
communicative role
general measure of “language ability”
measures a relatively stable trait
used to make predictions about future
language performance (Hamp-Lyons,
1998)
high-stakes test
Some ways of describing tests
Objective Subjective
Indirect Direct
Discrete-point Integrative
Aptitude / Achievement/
Proficiency Performance
External Internal
Norm-Referenced Criterion-Referenced
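The last pair above can be made concrete with a small example. Norm-referenced scoring compares a test taker with a population of similar test takers, whereas criterion-referenced scoring compares the performance with fixed level descriptions such as bands. The following is a minimal sketch under stated assumptions: the reference group, the cut-off scores, and the names `percentile_rank`, `BAND_CUTOFFS`, and `band` are all hypothetical and do not come from any real test.

```python
def percentile_rank(score, reference_scores):
    """Norm-referenced view: percentage of the reference group scoring below `score`."""
    if not reference_scores:
        raise ValueError("reference group is empty")
    below = sum(1 for s in reference_scores if s < score)
    return 100 * below / len(reference_scores)


# Hypothetical cut-offs (minimum score, band label), highest band first
BAND_CUTOFFS = [
    (80, "band 5"),
    (65, "band 4"),
    (50, "band 3"),
    (35, "band 2"),
    (0, "band 1"),
]


def band(score):
    """Criterion-referenced view: band whose minimum score the test taker reaches."""
    for minimum, label in BAND_CUTOFFS:
        if score >= minimum:
            return label
    raise ValueError("score below the lowest cut-off")


# Hypothetical scores of ten comparable test takers
reference_group = [40, 45, 55, 60, 62, 68, 70, 74, 81, 90]
score = 68

print(f"norm-referenced: above {percentile_rank(score, reference_group):.0f}% of the group")
print(f"criterion-referenced: {band(score)}")
```

The same raw score yields two different statements. The percentile changes whenever the reference group changes, while the band changes only if the cut-offs or descriptors are revised.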
Evaluating the usefulness of a
language test
Usefulness = reliability + validity + impact +
authenticity + interactiveness + practicality
(Bachman and Palmer, 1996)
[Diagram: test usefulness, comprising reliability, validity, impact, authenticity, practicality, and interactiveness]
Evaluating the usefulness of a
language test
Essential measurement qualities
reliability
construct validity
Evaluation: test taker - test task - Target
Language Use (TLU)
[Diagram: TLU, test task, test taker]
Overview of common language
proficiency tests
TOEFL TOEIC
IELTS
CAEL
Test of English as a Foreign
Language
One million test takers per
year
P&P 310-677/ CBT 0-300
Three sections:
Listening
Structure and Written
Expression
Reading
Comprehension
TWE
Test of English as a Foreign
Language
Objective Subjective
Discrete-point Integrative
Proficiency
Achievement
discord between test and understanding of
language and communication
passive recognition of language
cutoff scores are very problematic
general proficiency ≠ academic proficiency
Test of English for International
Communication
TOEFL equivalent for
workplace setting
two sections, 200 questions
listening
reading
entertainment,
manufacturing, health,
travel, finance, etc.
“objective and cost-efficient”
Test of English for International
Communication
Objective
Subjective
Discrete-point
Integrative
Proficiency
Achievement
lack of correspondence with TLU
International English Language
Testing System
Academic/General
Results reported in
band scores 1-9
Listening
General Reading / Academic Reading
General Writing / Academic Writing
Speaking
International English Language
Testing System
Objective
Subjective
Discrete-point
Integrative
Proficiency
Achievement
test tasks reflective of academic
tasks
Canadian Academic English
Language Assessment
Mirrors language
use in university
Topic-based, integrated
reading, listening,
and writing tasks
provides specific
diagnostic
information
scores are reported
in bands 10-90
Canadian Academic English
Language Assessment
Objective Subjective
Discrete-point Integrative
Proficiency Achievement
tests performance and use
diminished gap between test and classroom
validity is supported by teacher evaluations
studies on predicting academic success
Washback: The Impact of Tests on
Teaching and Learning
“The power of tests has a strong influence on
curriculum and learning outcomes”
(Shohamy, 1993)
good test ≠ positive washback
form of test impact depends on
antecedent: educational context and condition
process
consequences (Wall,
2000)
Critical Language Testing
Focus on consequence and ethics of test
use
Tests are embedded in cultural,
educational, and political arenas
whose agenda?
Questions traditional testing knowledge
English proficiency= academic success?
English: got it or get it!
Responsible test use (Hamp-Lyons, 2000)
Testing Questions
What is actually being tested by the test we
are using?
What is the “best” test to use?
What relevant information does the test
provide?
How is testing affecting teaching and
learning behaviour?
Is language testing “fair”?
Test design criteria
Usefulness = reliability + validity + impact +
authenticity + interactiveness + practicality
• reliability = consistency of measurement
• validity = the extent to which the inferences that we make
on the basis of the test are valid given the target language
use situation
• authenticity = how closely does the test resemble the
actual language use situation
• interactiveness = to what extent is the test taker involved in
active communication
• impact = what is the effect of the test on test takers, test
users, teachers etc.
Time – language level – design
Layout
Theoretical support (one page to explain
the test: why your test is
useful, the type of test)
Score 1 – 5 (create bands for scores)
Make copies for the whole group
15 minutes per skill (except speaking)

Editor's Notes

  • #6: I would like to begin today’s presentation with a quote which, taken to an extreme, illustrates the effect that high stakes testing can have on students.
  • #8:
      1. It is a common assumption that well-known testing tools are useful for our purposes, because we believe that they are technically sound and that ongoing research is being carried out. However…
      2. What is the “best” test to use?
      3. Given that we use the results of language proficiency tests to determine, in large part, the academic future of our students, we must ask what relevant information the test provides that would justify this use.
      These are questions that I will begin to address in today’s presentation, and we will return to them at the end.
  • #25: REFER AUDIENCE TO HANDOUT
      1. Examples: end-of-course tests, portfolio assessments.
      2. We accumulate evidence during, or at the end of, a course of study in order to see whether and where progress has been made in terms of the goals of learning.
      3. Achievement tests are designed to measure how much of a syllabus a learner has mastered; thus they are only valid to the extent to which the content of the test matches the content of the syllabus.
      4. The use of achievement tests allows instructors to be innovative and to reflect progressive aspects of the curriculum. They are thus associated with some interesting new developments, a movement known as alternative assessment.
      5. This approach stresses the need for assessment to be integrated with the goals of the curriculum.
         - Learners may be encouraged to share responsibility in assessment and be trained to evaluate their own capacities.
         - This is known as self-assessment.
      Refer to Brown and Hudson for a detailed discussion of alternative assessment.
  • #26:
      1. Readiness is established for university admission, professional certification, the workplace, etc.
      2. “Language ability”: consequently not reflective of a specific syllabus.
      3. “Stable trait”: scores tend not to change within a short period of time; thus this type of test would not be useful for assessing learning over a few weeks.
         - Indeed, such a change would mainly indicate statistical variance.
         - However, programs are often pressured to employ such tests in order to determine the effectiveness of teaching.
      4. “Predictions”: this is why such tests are used for admissions decisions and consequently are high-stakes.
         - They determine in great part a student’s academic and economic future.
         - Interestingly, Hamp-Lyons notes that the vast majority of people who interpret test scores are neither teachers nor testing professionals; they are administrators.
  • #27:
      - Objective = no human interference, very highly reliable
      - Subjective = individuals are involved in the evaluation process
      - Indirect = we make inferences from the test tasks, e.g. using a sentence structure question to infer the writing ability of a test taker
      - Direct = no gap between test task and target language situation, e.g. assessing speaking skills in an interview
      - Discrete-point = multiple-choice, often isolated items
      - Integrative = different skills are not separated but assessed holistically
      - External / internal
      - Norm-referenced = a test taker’s performance is evaluated against the range of performances typical of a population of similar test takers
      - Criterion-referenced = performances are compared to one or more descriptions of adequate performance at a given level, e.g. band scores
      Describing and evaluating tests on a continuum allows us to steer away from a black-and-white judgment.
  • #28: REFER AUDIENCE TO HANDOUT
      1. In order to determine which test is “best” for a given assessment situation, we need to evaluate its overall usefulness.
      2. Bachman and Palmer include six qualities in their definition of usefulness:
         - reliability = consistency of measurement
         - validity = the extent to which the inferences that we make on the basis of the test are valid given the target language use situation
         - authenticity = how closely the test resembles the actual language use situation
         - interactiveness = the extent to which the test taker is involved in active communication
         - impact = the effect of the test on test takers, test users, teachers, etc.
      3. These qualities are not all granted equal regard, but they must all be considered in order to achieve a desired balance; consequently the balance will vary from one testing situation to another.
         - These elements cannot be evaluated independently but must be looked at in terms of their combined effect.
         - It is OVERALL usefulness that needs to be emphasized rather than the individual qualities.
         - Evaluation of test usefulness is essentially subjective because it is based on judgements on the part of the test user.
      REFER AUDIENCE TO QUESTIONS FOR EVALUATION ON HANDOUT
  • #29: 1. Two essential considerations in the evaluation of test usefulness are reliability and validity. 2. Reliability is necessary because we want to ensure that test results are scored in a reliable and consistent manner. However, strong reliability without validity tells us essentially nothing. 3. Therefore, construct validity is of specific interest to us because it is concerned with the extent to which we can interpret a given test score as an indicator of the ability we want to measure - thus, it addresses the meaningfulness and appropriateness of the interpretations that we make. 4. Threats to construct validity can occur when real requirements of the TLU domain are not fully represented in the test. We frequently hear people complain that even though students score very high on the TOEFL they lack basic communication skills. This is probably the case because interaction is not required by the test. The TSE is sometimes employed to remedy this; however, describing how a tourist can find the way to the train station will not necessarily translate into the ability to take part in round-table discussions. - Threats to content validity: the issue is to what extent the test content forms a satisfactory basis for the inferences to be made from performance, e.g. using the TOEFL to make inferences about the ability of an international student to act as a teaching assistant. - If we want to use the scores from a language test to make inferences about individuals’ language ability, and possibly make various types of decisions, we must be able to demonstrate how performance on that language test is related to language use in specific situations other than the language test itself. - That is why, when considering the six qualities just addressed, we always need to examine them in connection with the test taker, the test task and the Target Language Use. Ideally there should be a seamless connection between these three elements - the greater the distance, the less useful the inferences that we can make.
  • #31: 1. The largest language test-prep industry has developed around this test. The introduction to a test-prep book states: “You are well aware that the TOEFL is one of the most important examinations that you will ever take. Your entire future may well depend on your performance in the TOEFL. The results of this test will determine whether you will be admitted to the school of your choice.” 2. 1 million test takers. 3. The TOEFL is 100% multiple choice - it uses “generic, or neutral” language and does not specify a context. 4. Four sections - Listening section: test takers are not given the opportunity to preview questions, to see them while listening, or to take notes. 5. TOEFL research places heavy emphasis on reliability but provides inadequate validity evidence. New developments include automatic essay scoring, done by computer analysis of written structures, and the TOEFL 2000 project, which aims to make changes to the construct of the test, which dates back to the 1960s.
  • #32: 1. Does not reflect current teaching and learning practices and could thus have negative effects on students and teachers because it is in conflict with them. 2. Passive recognition: students who “pass” the test are often unable to communicate. However, institutions and other TOEFL score recipients that note inconsistencies such as high TOEFL scores and apparent weak English proficiency should refer to the photo on the Official Score Report for evidence of impersonation. 3. Cutoff scores: CPA called upon Canadian universities to refrain from using TOEFL as a standard for university admission - contrary to recommendations, decisions are often based solely on the score - interpretation of scores is difficult because the test is norm-referenced and simply provides a number - many have increased TOEFL cutoffs, ranging from 580 to 600 - many who would otherwise be qualified for university admission are denied access - after an 8-week summer university orientation program given in English, students’ scores on the TOEFL itself increased from an average of 570 to 601 - the mean score of native speakers reported by ETS is 590. 4. General proficiency: in his critique of language tests and admission procedures, Elson quoted several studies that have found that merely knowing how a student scored on TOEFL will tell us practically nothing we need to know to predict the student’s academic performance. 5. Dissatisfaction has led to disuse of TOEFL by some, e.g. Australia - misuse of the TOEFL, cycles of raising and lowering requirements - TOEFL is used as an initial screen but other tests have to be taken upon arrival.
  • #33: 1. Listening section includes a variety of statements, questions, and short conversations. 2. Reading section includes incomplete sentences, error recognition, and reading comprehension. 3. Content is drawn from a wide variety of areas. 4. Tailored to provide rapid, affordable, and convenient service; therefore it only measures listening and reading, since these can be tested objectively. Testing writing and speaking requires time and expense and is “less objective and less reliable”.
  • #34: 1. Concern with the lack of correspondence between test tasks and target language use. Does not measure speaking - how do you know that a person will be able to communicate in a business setting? 2. It only measures listening and reading but makes inferences about communicative ability. 3. The test content is extremely broad and may in the end not provide any useful information to any of the fields that use this test.
  • #35: 1. 205 test centers in over 100 countries. 2. Test is divided into four modules, which have no central theme or topic but offer separate reading and writing tasks for either general or academic English use. 3. Listening: a number of recorded texts which increase in difficulty as the test progresses, a mixture of conversations and dialogues - test takers are allowed to preview. 4. Readings are taken from books, magazines, journals. 5. Writing includes two tasks: (a) write a 150-word report based on material found in a table or diagram, demonstrating ability to describe and explain; (b) a short essay of 250 words in response to an opinion or a problem, expected to demonstrate ability to discuss issues, construct an argument, and use appropriate tone and register. 6. Speaking is assessed during a 10-15 min one-on-one interview. Requires the test taker to describe, narrate, and provide explanations on a variety of personal and general interest topics - objective key for listening and reading components; speaking and writing components are marked on a subjective key. 7. The test includes a variety of task and response types.
  • #36: 1. The actual tasks are reflective of academic tasks. 2. Comprehensive scoring structure has the advantage of giving students knowledge of what specific area of language needs special attention - when asked whether the subjective component of the assessment procedure might introduce a degree of unfairness into the testing process, Jill Richardson said that if the test is truly to be regarded as a communication-oriented process, personal interaction is a necessary ingredient without which it is difficult to truly establish a person’s capacity to use language. 3. Need for more reliability research - the emphasis for UCLES has been on validity, and this is also reflected in their certificate exams. It comes from a tradition where teaching professionals are trusted to make fair judgements. 4. It is one of the two tests accepted by the Canadian government for immigration purposes.
  • #37: 1. Was designed by Carleton U. in response to the perceived failure of standardized tests to effectively identify students who were able to use English at the levels required for university study. 2. Test is grounded in day-to-day use of language within first-year courses at the university - this test is designed not for global knowledge of English but for English-medium academic contexts - attempts to recreate for the test taker the experience of joining an introductory first-year course. 3. Integrated, criterion-referenced, topic-based test for EAP - uses constructed response rather than multiple-choice items - there is direct overlap between taking a CAEL assessment, taking an academically oriented ESL course, and taking a first-year course at a university. The overlap is clear in the tasks and activities of the test - in this way the test aims to promote positive and useful learning. - When completing practice tests, students are provided with a conversion key that states which skill is tested by each question.
  • #38: 1. The nature of the test tasks encourages students to make use of their language knowledge and actively engages them. 2. The language skills that are promoted by the test are in line with what a teacher would use in an EAP classroom. 3. Research has shown that teachers evaluate their students’ in-class performances similarly. 4. There is an ongoing tracking study that aims to link test performance with future academic performance. 5. Even though the test was designed to create positive washback for language learners and teachers, some students reportedly have the same study habits as for the TOEFL: staying at home for independent cramming. This demonstrates that a “positive” test does not have the same impact on all students.
  • #39: 1. Bailey: “there is a natural tendency for both teachers and students to tailor their classroom activities to the demands of the test, especially when the test is very important to the future of the students”. 2. Washback can be either positive or negative to the extent that it promotes or hinders achievement of the language learning goals held by learners and educators. 3. Complex interaction of factors. 4. The more information is available to teachers, learners, and test users, and the more they are involved in the testing process, the more likely we are to create positive impact.
  • #40: - Considering that proficiency tests are the most powerful indicator for determining the academic future of ESL students, discussion needs to start focusing on the ethics and consequences of test use. - Shohamy introduced the concept of critical language testing - this concept builds on a critical pedagogy perspective and emphasizes that the act of testing is both a product and an agent of cultural, social, and political agendas - consequently the notion of “just a test” does not exist. - What sort of vision of society does the test create? This question puts at the center the responsibility that test users carry with regard to the consequences of test use. - Need to examine the extent to which test agendas reflect the interests of the field of language teaching and learning. - It calls into question traditional testing knowledge that views numbers as symbols of objectivity and truth - these numbers are powerful not only because those who use them consider them truthful but also because they allow classification, quantification and judgement. Success and failure are determined by arbitrary cutoff scores and all test takers are judged according to the same yardstick. - There is research to suggest that academic achievement in selected disciplines is hardly affected by degree of English language proficiency - how much do we actually know about the degree of English facility that is required for successful completion? Test developers and experts cannot agree on what indeed the tests measure and they do not have a clear sense of [...] We must accept responsibility for all the consequences that we are aware of.
  • #41: 1. There is what the receiving institution wants to know from a test, and there is what the test actually tests; these interests are not necessarily compatible. 2. There is no “best” test. We need to consider all variables to make an appropriate choice. 3. Different tests produce different information. What connection is there between test items that measure surface structure recognition and the ability to be a successful student? If a test is isolated from the reality that the student will experience as a learner, it becomes accordingly less relevant. 4. Impact = many of us may have encountered the answer to this question in our classrooms, when students demand to be taught to the test. 5. Language testing is used as a basis for refusing or admitting a student and thus shifts responsibility away from the institution itself. If students meet the admission requirements to which native speakers are subject, then they should be admitted on the same basis. The provision of opportunities to continue developing English facility is part of a commitment to learning. 6. AERA standards state that test developers should provide information on the strengths and the weaknesses of their instruments. However, the ultimate responsibility for appropriate test use and interpretation lies predominantly with the test user. 7. I hope that this brief overview of language proficiency testing will lead to further reflection on language testing and that these testing questions remain with us.