LANGUAGE TESTING
A Course Presentation by:
Dr. Jihan Zayed
Mustaqbal University, KSA
2019
Arthur Hughes (2001). Testing for Language Teachers.
Cambridge University Press
Outline
1. Teaching and Testing
2. Kinds of Testing
3. Approaches to Testing
4. Validity and Reliability
5. Achieving Beneficial Backwash
6. Stages of Test Construction
7. Test Techniques for Testing Overall Ability
8. Testing of Language Skills
9. Testing Grammar and Vocabulary
10. Test Administration
An accurate test must be:
1. Valid
• A valid test measures accurately what it is intended to measure. For example, if we want to test writing, we have to ask our students to write, not to read.
2. Reliable
• A reliable test provides consistent results no matter how many times a student takes it. For example, a student receives approximately the same score whether s/he takes the test on a particular day or the next.
Invalidity has 2 origins:
• Test Content
• Test Techniques
For example, if we want to know how well students can write, there is absolutely no way to get a really accurate measure of their ability by means of a multiple-choice test.
Unreliability has 2 origins:
Features of the test
• Unclear instructions
• Ambiguous questions
• Easily guessed answers
Scoring
• The same composition may
be given different scores by
different markers or by the
same marker on different
occasions.
Language Testing
• The need for language tests:
• Testing a language is a structured attempt to measure what students can do in, or with, a language. It is important, for example, for:
1. accepting students from overseas to study in, for example, British
and American universities;
2. hiring translators or interpreters in different organizations; and
3. getting information about the achievement of groups of learners.
• What is to be done?
• The teaching profession can contribute to the improvement of testing in two ways:
1. They can write better tests themselves.
2. They can put pressure on others to improve their tests, as when a writing test (the Test of Written English) was added to supplement TOEFL (the Test of English as a Foreign Language), the test taken by most non-native speakers of English applying to North American universities.
Testing as problem solving
• No Best Test
A test which proves ideal for one purpose may be useless for another. That is, objectives must be specified before a test is designed.
• Tests should …
1. consistently and accurately measure the abilities to be
measured;
2. have a beneficial effect on teaching; and
3. be practical – economical in terms of time and money.
Kinds of Tests and Testing
Language tests can be classified by:
• Design (Method): Paper-and-Pen Tests vs. Performance Tests
• Purpose: Proficiency Tests, Achievement Tests, Diagnostic Tests, Placement Tests
Kinds of Tests: (Method of Testing)
• Paper-and-pen Tests are typically used for the assessment of:
1. Separate components of language (grammar, vocabulary …)
2. Receptive skills (listening and reading)
• Performance Tests assess language skills in the act of communication (i.e.,
productive skills: speaking and writing) where:
1. extended samples of speech/writing are elicited;
2. the samples are judged by trained markers; and
3. common rating procedures are used.
Kinds of Tests: (Purpose of Testing)
• Proficiency Tests measure a student’s general ability in the language, regardless of any training they have had in it; what matters is what they have to do in the language.
• Achievement Tests discover how far students have achieved the objectives of a course of study.
• Diagnostic Tests identify students’ strengths and weaknesses to ascertain what further teaching is necessary.
• Placement Tests assist the placement of students by identifying the stage or level of a teaching programme most appropriate to their abilities.
Achievement Tests:
1. Final Achievement Test:
• Administered at the end of a course of study
• Intended to measure course contents and/or objectives
2. Progress Achievement Test:
• Administered repeatedly during a course of study
• Measures the progress the students are making
towards course objectives
• Increasing scores indicate the progress being made
• Establishes a series of well-defined short-term
objectives on which to test or quiz the students
Approaches to Testing
1. Direct vs. Indirect Testing
2. Discrete Point vs. Integrative Testing
3. Norm-referenced vs. Criterion-referenced Testing
4. Objective vs. Subjective Testing
Direct vs. Indirect Testing
• Direct Testing:
• It requires the test taker to perform precisely the skill we wish to
measure. For example, if we want to know how well students write
essays, then we ask them to write an essay.
• It is easier to carry out with the productive skills of speaking and writing; with the receptive skills of reading and listening, students must somehow demonstrate what they have understood.
• It makes it straightforward to create the conditions that elicit the skills to be measured.
• It gives a helpful backwash effect, because practice for the test involves practice of the skills to be measured.
• Indirect Testing
• It attempts to measure the sub-skills that underlie the main skills; a writing test, for example, might measure vocabulary and knowledge of grammatical structures.
• It is dangerous, as mastery of the underlying sub-skills does not always lead to mastery of the main skills.
• In 1961, Lado measured pronunciation by a paper-and-pencil test in which the candidate had to identify pairs of words which rhyme with each other.
Discrete Point vs. Integrative Testing
• Discrete Point Testing:
• It entails testing one element at a time, item by item.
• For example, a series of items, each testing a particular grammatical structure.
• Integrative Testing:
• It requires the test taker to combine many language elements
for the completion of a task.
• For example, writing a composition, making notes in a lecture, taking a dictation, etc.
Norm-referenced vs. Criterion-referenced Testing
• Norm-referenced Testing:
• places a student in a percentage category;
• relates one candidate’s performance to that of other
candidates; and
• seeks a bell-shaped curve of student assessment.
• Criterion-referenced Testing:
• sets meaningful standards for students to measure their
progress – these standards do not change with different groups
of students;
• classifies students according to what they can actually do with
the language; and
• motivates students to perform “up-to-standard” rather than
trying to be “better” than other students.
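To make the contrast concrete, here is a minimal Python sketch showing how the same raw score is reported under each orientation: as a percentile rank relative to the group, or as a mastery decision against a fixed standard. The scores, the group, and the 80% criterion are all hypothetical illustrations.

```python
# Minimal sketch: the same raw scores reported norm-referenced
# (percentile rank within the group) and criterion-referenced
# (mastery against a fixed standard). Scores and the 80% criterion
# are hypothetical.

def percentile_rank(score: float, group: list[float]) -> float:
    """Norm-referenced: percent of the group scoring below this student."""
    below = sum(1 for s in group if s < score)
    return 100.0 * below / len(group)

def mastery(score: float, max_score: float, criterion: float = 0.80) -> bool:
    """Criterion-referenced: has the student reached the fixed standard?"""
    return score / max_score >= criterion

scores = [12, 15, 18, 21, 24, 27, 30, 33, 36, 39]  # marks out of 40
for s in scores:
    print(f"score {s:2d}: percentile {percentile_rank(s, scores):5.1f}, "
          f"mastery {'yes' if mastery(s, 40) else 'no'}")
```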
Objective Testing vs. Subjective Testing
• Objective Testing:
• No judgement is required on the part of the scorer.
• For example, Multiple Choice, Fill-in-the-blank, True or False,
Match, … etc.
• Subjective Testing
• Judgement is required on the part of the scorer.
• Different degrees of subjectivity in scoring.
• Complexity increases subjectivity, for example, the scoring of a
composition is more subjective than short-answer responses.
• The less subjective the scoring, the greater the agreement will be between two different scorers, and between the scores of one person scoring the same test paper on different occasions.
Types of Validity
• Content Validity:
• The test will have content validity only if it includes a representative sample of the language skills, structures, vocabulary, etc. that it is intended to test.
• A comparison of test specifications and test content is the basis for this type.
• Criterion-related Validity:
• The results of the test agree with those provided by an independent and highly dependable assessment of the candidates’ ability. This can be concurrent or predictive: in concurrent validation the criterion measure is taken at about the same time as the test, while in predictive validation the test is used to predict candidates’ future performance.
• Construct Validity:
• A construct refers to an underlying trait or ability hypothesized in language
learning theory. It is an important consideration in indirect testing of main
skills or the testing of sub-skills like guessing the meaning of unknown words
as a sub-skill of reading.
• Face Validity:
• A test has face validity if it seems as if it is measuring what it is supposed to
be measuring.
Types of Reliability
• Test Reliability:
• A student’s score on a test will be approximately the same no matter
how many times s/he takes it.
• Scorer Reliability:
• When the test is objective, the scoring requires no judgment on the
part of the scorer, and the scores should always be the same.
• When the test is subjective, the scoring requires judgment on the part
of the scorer, and the scores will not be the same on different
occasions.
• A reliable scorer would give the same score on two different occasions, and this would be the same as the score given by another scorer on either occasion.
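Scorer reliability can be estimated numerically. A minimal sketch, assuming two raters have marked the same ten scripts (all marks hypothetical), computes the Pearson correlation between their scores; values near 1 indicate close agreement.

```python
# Minimal sketch: estimating scorer reliability as the Pearson
# correlation between two raters' marks for the same ten scripts.
# All marks are hypothetical.
import math

def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

rater_a = [14, 12, 17, 9, 15, 11, 18, 13, 16, 10]  # marks out of 20
rater_b = [15, 11, 16, 10, 14, 12, 17, 14, 15, 9]
print(f"scorer reliability (Pearson r) = {pearson(rater_a, rater_b):.2f}")
```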
How to Make Tests More Reliable!
• Test for enough independent samples of behavior and allow for as many fresh starts as
possible.
• Do not allow test takers too much freedom. Restrict and specify their range of possible
answers.
• Write unambiguous items.
• Provide clear and explicit instructions.
• Ensure that tests are well laid out and perfectly legible.
• Make sure candidates are familiar with format and test-taking procedures.
• Provide uniform and non-distracting conditions of administration.
• Use items that permit scoring which is as objective as possible.
• Make comparisons between candidates as direct as possible.
• Provide a detailed scoring key.
• Train scorers.
• Agree on acceptable responses and appropriate scores at the outset of scoring.
• Identify test takers by number, not name.
• Employ multiple, independent scoring.
Achieving Beneficial Backwash
• Test abilities whose development we want to encourage.
• Sample widely and unpredictably.
• Use both direct and indirect testing.
• Make testing criterion-referenced.
• Base achievement tests on objectives.
• Ensure test is known and understood by both teachers and
students.
• Provide assistance to teachers.
• Count the cost.
Stages of Test Construction
Statement of the Problem
Providing a Solution to the Problem
Statement of the Problem
Be clear about what one wants to know and why! The following questions have to be answered:
• What kind of test is most appropriate?
• What is the precise purpose?
• What abilities are to be tested?
• How detailed must the results be?
• How accurate must the results be?
• How important is backwash?
• What constraints are set by unavailability of expertise, facilities, and time (for construction, administration, and scoring)?
Providing a Solution to the Problem
• Once the problem is clear, then steps can be taken to solve it.
• Efforts should be made to gather information on similar tests
designed for similar situations. If possible, samples of these
tests should be obtained. As each testing situation is unique,
they should not be copied, but rather used to suggest
possibilities.
1. Writing Specifications for the Test
• The first form that the solution takes is a set of specifications for the test. These include:
Test Specifications
• Content: Operations, Types of Text, Addressees, Topics
• Format and Timing
• Criterial Levels of Performance
• Scoring Procedures
1. Content refers not to the content of a single, particular version of the test,
but to the entire potential content of any number of versions. The content
should be specified regarding:
• Operations: the tasks students will have to be able to carry out (e.g., for reading: skim, scan, guess, etc.).
• Types of Text: a writing test might include letters, forms, academic essays, etc.
• Addressees: the people the test-taker is expected to be able to speak or write to; or
the people for whom reading and listening are primarily intended (for example,
native-speaker university students).
• Topics: topics should be selected according to their suitability for the test takers and
the type of test.
2. Format and Timing should specify the test structure and item types/elicitation
procedures, with examples. It should state what weighting is to be allocated to
each component, how many passages will be presented or required, and how many
items will be in each component.
3. Criterial Levels of Performance: The required levels of performance for
different levels of success should be specified. For example, to demonstrate
mastery, 80 % of the items must be responded to correctly. It may entail a
complex rubric including the following: accuracy, appropriacy, range of
expression, flexibility, size of utterances.
4. Scoring Procedures: These are most relevant when scoring is subjective.
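As an illustration only, the specifications above could be held together as a single record, with the criterial level applied to a candidate’s results. A minimal sketch follows; every concrete value in it (operations, text types, weightings, the 80% criterion) is a hypothetical example, not a prescribed format.

```python
# Minimal sketch: holding test specifications as a single record and
# applying the criterial level of performance to a candidate's result.
# All concrete values below are hypothetical examples.
from dataclasses import dataclass

@dataclass
class TestSpecification:
    operations: list[str]          # tasks candidates must be able to carry out
    text_types: list[str]          # kinds of text the test may contain
    addressees: str                # who the language is addressed to
    topics: list[str]              # admissible topic areas
    components: dict[str, float]   # component name -> weighting
    criterial_level: float = 0.80  # proportion correct needed for "mastery"

spec = TestSpecification(
    operations=["skim", "scan", "guess word meaning"],
    text_types=["letters", "forms", "academic essays"],
    addressees="native-speaker university students",
    topics=["study skills", "campus life"],
    components={"reading": 0.6, "writing": 0.4},
)

def demonstrates_mastery(correct: int, total: int, spec: TestSpecification) -> bool:
    """Apply the criterial level, e.g. 80% of items responded to correctly."""
    return correct / total >= spec.criterial_level

print(demonstrates_mastery(42, 50, spec))  # 84% of items correct -> True
```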
2. Writing the Test
Sampling
• Choose widely from the whole area of content. Succeeding versions of the test should sample widely and unpredictably.
Item Writing and Moderation
•Some items will have to be rejected – others reworked.
•Best way is through teamwork!
•Item writers must be open to, and ready to accept, criticism. Critical questions must be asked:
•Is the task perfectly clear?
•Is there more than one possible correct answer?
•Do test takers have enough time to perform the tasks?
Writing and Moderation of Scoring Key
• When there is only one correct response, this is quite straightforward.
• When there are alternative acceptable responses, which may be awarded different scores, or where partial credit may be given for incomplete responses, greater care is needed in writing and moderating the key.
3. Pretesting
• The aim should be to administer the test first to a group as similar as possible to the one for which it is really intended.
• Problems in administration and scoring are noted.
• The reliability coefficients of the whole test and of its components are calculated, and individual items are analyzed.
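For a test of dichotomously scored (right/wrong) items, one common reliability coefficient is KR-20. A minimal sketch with a hypothetical response matrix computes it, together with item facility values as a first step of item analysis.

```python
# Minimal sketch of pretest analysis: the KR-20 reliability coefficient
# for dichotomously scored items, plus item facility values. The 1/0
# response matrix (rows = candidates, columns = items) is hypothetical.

def kr20(responses: list[list[int]]) -> float:
    k = len(responses[0])                     # number of items
    n = len(responses)                        # number of candidates
    p = [sum(r[i] for r in responses) / n for i in range(k)]  # item facility
    totals = [sum(r) for r in responses]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n  # population variance
    return (k / (k - 1)) * (1 - sum(pi * (1 - pi) for pi in p) / var)

responses = [
    [1, 1, 0, 1, 1], [1, 0, 0, 1, 0], [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0], [1, 1, 1, 0, 1], [1, 1, 0, 1, 1],
]
print(f"KR-20 reliability = {kr20(responses):.2f}")
for i in range(len(responses[0])):
    facility = sum(r[i] for r in responses) / len(responses)
    print(f"item {i + 1}: facility {facility:.2f}")
```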
Test Techniques for Testing Overall Ability
• Test Techniques are means of eliciting behavior from test
takers which inform us about their language abilities.
• We need test techniques which:
• elicit valid and reliable behavior regarding ability in
which we are interested;
• will elicit behavior which will be reliably scored;
• are economical; and
• have a positive backwash effect.
TOEFL iBT
• TOEFL iBT stands for Test of English as a Foreign Language – Internet-based Test.
• It measures the ability of nonnative speakers of English to use and
understand English as it is spoken, written, and heard in college and
university settings.
• This test emphasizes integrated skills and provides better information
about students’ ability to communicate in an academic setting and their
readiness for academic coursework.
• In 2005, it replaced TOEFL PBT and TOEFL CBT.
IELTS
• IELTS stands for International English Language Testing System.
• It measures ability to communicate in English across all four language
skills (listening, reading, writing and speaking) for people who intend
to study or work where English is the language of communication.
• Since 1989, it has been managed by the British Council, IELTS Australia and the University of Cambridge.
• Settings that use this test: English-medium universities, colleges,
professional organizations, Immigration Canada (proof of English
language ability)
STEP
• STEP stands for Standardized Test of English Proficiency.
• Based on growing international needs for the English language, several
academic and non-academic institutions have approached the National
Center for Assessment in Higher Education calling for the
development of an English test that could measure the proficiency of
their applicants.
• It is designed to be an objective and unbiased test. It is made up of:
1. Reading Comprehension (RC – 40%),
2. Structure (ST – 30%),
3. Listening Comprehension (LC – 20%), and
4. Compositional Analysis (CA – 10%)
• STEP has 100 questions distributed among these four components.
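The weightings above imply a simple weighted composite. A minimal sketch of the arithmetic, with hypothetical component scores:

```python
# Minimal sketch: combining STEP component scores using the weightings
# above (RC 40%, ST 30%, LC 20%, CA 10%). The raw percentage scores
# for a candidate are hypothetical.
weights = {"RC": 0.40, "ST": 0.30, "LC": 0.20, "CA": 0.10}
raw = {"RC": 75.0, "ST": 60.0, "LC": 80.0, "CA": 50.0}  # percent correct

composite = sum(weights[c] * raw[c] for c in weights)
print(f"weighted composite = {composite:.1f}%")  # 30 + 18 + 16 + 5 = 69.0%
```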
Test Techniques for Testing Overall Ability
Multiple Choice
• Multiple Choice
• Advantages
• Scoring is reliable and can be done rapidly and economically.
• Possible to include many more items than would otherwise be
possible in a given period of time – making the test more
reliable.
• Disadvantages
• Tests only recognition knowledge
• Guessing may have a considerable but unknowable effect on
test scores
• Technique severely restricts what can be tested
• It is very difficult to write successful items
• Backwash may be harmful
• Cheating may be facilitated.
Test Techniques for Testing Overall Ability
Multiple Choice
• Multiple Choice items take many forms, but their basic
structure is as follows:
There is a stem:
Enid has been here ………………… half an hour.
and a number of options, one of which is correct, the
others being distractors:
A. during
B. for
C. while
D. since
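Scoring such items is mechanical, and the guessing problem noted above can be partly addressed by the classical correction R − W/(k − 1) for k options. A minimal sketch with a hypothetical key and answer sheet:

```python
# Minimal sketch: objective scoring of a multiple-choice test with the
# classical correction for guessing, R - W/(k - 1) for k options.
# The answer key and the candidate's answers are hypothetical.
from typing import Optional

def score_mc(key: list[str], answers: list[Optional[str]], options: int = 4) -> float:
    right = sum(1 for k, a in zip(key, answers) if a == k)
    wrong = sum(1 for k, a in zip(key, answers) if a is not None and a != k)
    return right - wrong / (options - 1)  # omitted items carry no penalty

key     = ["B", "D", "A", "C", "B", "A"]
answers = ["B", "D", "C", "C", None, "A"]  # None marks an omitted item
print(f"corrected score = {score_mc(key, answers):.2f}")  # 4 right, 1 wrong -> 3.67
```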
Test Techniques for Testing Overall Ability
Cloze (Fill in the Blanks)
• Cloze
• It involves deleting a number of words in a passage, leaving blanks, and requiring the person taking the test to replace the original words.
• It can be used with a tape-recorded oral passage to test oral ability indirectly.
• Clear instructions should be provided and students should
initially be encouraged to read through the passage first.
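A fixed-ratio cloze can be produced mechanically. A minimal sketch that blanks every nth word after an intact lead-in (the passage, the deletion rate, and the lead-in length are arbitrary illustrative choices):

```python
# Minimal sketch: building a fixed-ratio cloze by blanking every nth
# word after an intact lead-in. The passage, the deletion rate n, and
# the lead-in length are arbitrary illustrative choices.

def make_cloze(text: str, n: int = 7, lead_in: int = 10):
    """Blank every nth word after `lead_in` untouched words; return the
    mutilated text and a key of the deleted words for scoring."""
    words = text.split()
    key = {}
    for i in range(lead_in + n - 1, len(words), n):
        key[i] = words[i]
        words[i] = "______"
    return " ".join(words), key

passage = ("Language testing is an attempt to measure what learners can do "
           "with a language rather than simply what they know about it and "
           "a good test should be both valid and reliable in its results")
cloze_text, answer_key = make_cloze(passage, n=5, lead_in=8)
print(cloze_text)
print(answer_key)  # the original words, kept as the scoring key
```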
Test Techniques for Testing Overall Ability
The C-Test
• A variety of cloze
• Instead of whole words, it is the second half of every second word that is deleted.
• Advantages over the cloze test are
1. Only exact scoring is necessary
2. Shorter (and so more) passages are possible
• In comparison to a cloze, a C-Test of 100 items takes up little space and not nearly so much time to complete.
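The C-test mutilation can likewise be generated automatically. A minimal sketch that keeps the first sentence intact and deletes the second half of every second word thereafter (the passage and the rounding rule for odd-length words are arbitrary choices):

```python
# Minimal sketch: building a C-test by deleting the second half of
# every second word after an intact first sentence. The passage and
# the rounding rule (keep the longer half of odd-length words) are
# arbitrary illustrative choices.

def mutilate(word: str) -> str:
    keep = (len(word) + 1) // 2              # keep the first half
    return word[:keep] + "_" * (len(word) - keep)

def make_c_test(intact_sentence: str, body: str) -> str:
    words = body.split()
    for i in range(1, len(words), 2):        # every second word
        words[i] = mutilate(words[i])
    return intact_sentence + " " + " ".join(words)

print(make_c_test(
    "Tests are used for many purposes.",
    "They can place students in classes or diagnose their weaknesses",
))
```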
Test Techniques for Testing Overall Ability
Dictation
• Dictation tests:
• are useful for predicting overall ability and have the advantage of involving listening ability; and
• are easy to create and administer.
• However, they are:
• not easy to score, and
• time-consuming.
• With poorer students, scoring becomes tedious.
• Partial dictation may be a better alternative, since it is easier for both the test taker and the scorer.
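Scoring can also be partly mechanized for full or partial dictation. A crude minimal sketch counts how many words of the original are reproduced in order (a longest-common-subsequence count; the passage and the transcript are hypothetical):

```python
# Minimal sketch: a crude dictation score counting how many words of
# the original are reproduced in order (a longest-common-subsequence
# count). The original passage and the transcript are hypothetical.

def lcs_len(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two word lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, wa in enumerate(a):
        for j, wb in enumerate(b):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if wa == wb
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[len(a)][len(b)]

original = "the test was administered to every student in the class".split()
transcript = "the test was administrated to student in class".split()
print(f"score: {lcs_len(original, transcript)}/{len(original)} words")  # 7/10
```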