Session 5: Analysing Tests and Test Items
using Classical Test Theory (CTT)
Professor Jim Tognolini
Analysing Tests and Test Items
using Classical Test Theory (CTT)
During this session we will:
• define some basic test-level statistics using Classical Test Theory analyses:
test mean, test discrimination and test reliability (Cronbach's Alpha);
• define some basic item-level statistics from Classical Test Theory: item
difficulty and item discrimination (Findlay Index and Point Biserial Correlation).
Capacity Development Workshop: Test and Item Development and Design, Laos, September 2016
Test characteristics to evaluate
• Difficulty
• Discrimination
• Reliability
• Validity
Test difficulty
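The graphic for this slide did not survive the export. As a minimal sketch, test difficulty in CTT is commonly summarised by the test mean, here computed from the total scores in the worked example table later in the session (10 candidates, maximum 28 marks):

```python
# Minimal sketch (not from the slides): test difficulty summarised by the
# test mean, using the total scores from the worked example later in the
# session (10 candidates, maximum 28 marks).
scores = [18, 18, 21, 16, 21, 24, 6, 24, 6, 22]
max_marks = 28

mean_score = sum(scores) / len(scores)
difficulty = mean_score / max_marks  # closer to 1 = easier test

print(round(mean_score, 1), round(difficulty, 2))  # 17.6 0.63
```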
Test discrimination
The ability of a test to discriminate between high- and low-achieving
individuals is a function of the items that comprise the test.
Methods of estimating reliability

Method             Type of Reliability      Procedure
Test-Retest        Stability                Give the same test to the same group on
                                            different occasions, with some time
                                            between administrations.
Equivalent Forms   Equivalence              Give two (parallel) forms of the test
                                            to the same group in close succession.
Split-half         Internal Consistency     Give the test once; split it in half
                                            (odd/even items); correlate the scores
                                            on the two halves; correct the
                                            correlation between halves using the
                                            Spearman-Brown formula.
Coefficient Alpha  Internal Consistency     Give the test once to a group and
                                            apply the formula.
Interrater         Consistency of Ratings   Have two or more raters score the
                                            responses and calculate the
                                            correlation coefficient.
Split-halves method
Reliability can also be estimated from a single administration of a test,
either by correlating the two halves or by using the Kuder-Richardson method.
The split-halves method requires the test to be split into the two halves
that are most equivalent.
To estimate the reliability of the full test, the Spearman-Brown adjustment
is usually applied.
Kuder-Richardson (KR-20 and KR-21) Method
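The KR-20 formula itself did not survive this export. As a sketch under the standard definition, KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores), computed on a hypothetical 0/1 response matrix:

```python
# Sketch of KR-20 for dichotomous (0/1) items under its standard
# definition; the response matrix is hypothetical.
from statistics import pvariance

responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
]
n = len(responses)
k = len(responses[0])

p = [sum(col) / n for col in zip(*responses)]   # item difficulties
pq_sum = sum(pi * (1 - pi) for pi in p)
totals = [sum(row) for row in responses]

kr20 = (k / (k - 1)) * (1 - pq_sum / pvariance(totals))
print(round(kr20, 3))  # 0.5
```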
Cronbach’s alpha method
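The alpha formula is likewise missing from the export. As a sketch, Cronbach's alpha extends KR-20 to polytomous items: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The score matrix below is hypothetical:

```python
# Sketch of Cronbach's alpha for (possibly polytomous) item scores;
# the data are hypothetical.
from statistics import pvariance

scores = [
    [3, 2, 3, 4],
    [2, 2, 1, 3],
    [3, 3, 3, 4],
    [1, 1, 2, 1],
    [2, 3, 2, 3],
]
k = len(scores[0])

item_vars = [pvariance(col) for col in zip(*scores)]      # per-item variances
total_var = pvariance([sum(row) for row in scores])       # variance of totals

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 3))  # 0.867
```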
Ways to improve reliability
1. Test length
In general, the longer the test the higher the reliability (more adequate
sampling), provided that the material that is added is identical in
statistical and substantive properties.
2. Homogeneity of group
The more heterogeneous the group, the higher the reliability. Reliability
can vary at different score levels and across gender, location, etc.
3. Difficulty of items
Tests that are too difficult or too easy produce results of low reliability.
Generally, set items with difficulty close to 0.5. In general, for tests
that are required to discriminate, spread questions over the range in
which the discrimination is required.
Ways to improve reliability (cont.)
4. Objectivity
The more objective the test (and marking scheme), the more reliable the
resulting test scores.
5. Retain discriminating items
In general, replace items with low discrimination with items that
discriminate highly. There comes a point where this practice raises the
reliability so far that it lowers validity (the attenuation paradox).
6. Increase the speededness of the test
Highly speeded tests usually show higher reliability, but internal
consistency estimates should not be used with them.
Types of validity
There are many different types of validity. Traditionally three main types
are distinguished, with face validity often added as a fourth:
I. Content Validity (sometimes referred to as curricular or
instructional validity)
II. Criterion-Related Validity (types include predictive and concurrent
validity)
III. Construct Validity
IV. Face Validity
Loevinger (1957) argued that "since predictive, concurrent and content
validities are all essentially ad hoc, construct validity is the whole of
validity from a scientific point of view".
Define some basic item-level statistics from Classical Test Theory
Item difficulty
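The item difficulty formula is not reproduced in this export. In CTT it is the proportion of candidates answering the item correctly, or mean score over maximum marks for a polytomous item; the sketch below uses Items 4 (1 mark) and 10 (4 marks) from the worked example table later in the session:

```python
# Sketch: CTT item difficulty as proportion correct (dichotomous item)
# or mean score / maximum marks (polytomous item). Data are Items 4
# and 10 from the worked example table in this session.
item4 = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]    # 1-mark short-answer item
item10 = [1, 1, 2, 1, 4, 3, 0, 4, 0, 2]   # 4-mark extended item

p4 = sum(item4) / len(item4)              # 5/10 = 0.5
p10 = sum(item10) / (4 * len(item10))     # 18/40 = 0.45
print(p4, p10)
```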
Item discrimination
Methods for checking item discrimination include:
• The Findlay Index (FI)
• The Point Biserial Correlation
• The Biserial Correlation
The Findlay Index (FI)
The Findlay Index (FI) – An example

Item  NRU  NRL  NU  FI    Comment
1     9    2    10  0.7   Good item; better students do well
2     6    6    10  0.0   Weak item; does not discriminate
3     6    8    10  -0.2  Invalid item; weak students do better
The Findlay Index (FI)
If the number of students in the top group is not equal to the number
in the bottom group, proportions must be used:
FI = PRU - PRL
where
PRU = proportion of persons right in the upper group
PRL = proportion of persons right in the lower group
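The proportion-based form of the index is a one-liner; the group counts below are hypothetical:

```python
# Sketch of FI = PRU - PRL for unequal group sizes (hypothetical counts).
def findlay_index(n_right_upper, n_upper, n_right_lower, n_lower):
    pru = n_right_upper / n_upper   # proportion right in upper group
    prl = n_right_lower / n_lower   # proportion right in lower group
    return pru - prl

fi = findlay_index(9, 12, 2, 10)    # 0.75 - 0.20
print(round(fi, 2))  # 0.55
```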
Graphical display of the Findlay Index (FI)
Calculate the proportion of each group getting the item correct, and then
plot this against the mean score of each group.
Graphical display of the Findlay Index (FI)
[Chart: proportion correct (0 to 1) plotted against score group (L, M, U)
for Items 2, 6.2, 7 and 10.4]
The Findlay Index (FI) – An example

Item Type    SA SA SA SA SA  E  E  E  E  E  E  E  Total
Item Number   1  2  3  4  5  6  7  8  9 10 11 12
Max Marks     1  1  1  1  1  3  2  2  3  4  3  6     28
Astha         1  1  0  0  1  3  0  1  3  1  3  4     18
Bosco         1  1  1  0  1  3  0  1  3  1  3  3     18
Chetan        1  1  1  1  1  3  0  2  1  2  3  5     21
Devika        1  1  1  0  1  3  0  2  1  1  2  3     16
Emily         1  1  1  1  1  3  0  1  3  4  2  3     21
Farhan        1  1  1  1  1  3  1  2  3  3  3  4     24
Gogi          1  1  1  0  1  0  0  1  0  0  0  1      6
Harshita      1  1  1  1  1  3  2  1  3  4  3  3     24
Indu          0  1  0  0  1  0  0  2  0  0  2  0      6
Jagat         1  1  1  1  1  2  1  1  3  2  3  5     22
TOTAL         9 10  8  5 10 23  4 14 20 18 24 31    176
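The worked calculations on the following slides did not survive the export. As a sketch, the FI can be computed for every item in the table above using the top and bottom thirds by total score (the group-size choice is a judgement call), expressing each group's performance as a proportion of maximum marks so that polytomous items are handled too:

```python
# Sketch: FI for each item in the score table above, using the top and
# bottom thirds by total score (group size is a judgement call).
data = {
    "Astha":    [1, 1, 0, 0, 1, 3, 0, 1, 3, 1, 3, 4],
    "Bosco":    [1, 1, 1, 0, 1, 3, 0, 1, 3, 1, 3, 3],
    "Chetan":   [1, 1, 1, 1, 1, 3, 0, 2, 1, 2, 3, 5],
    "Devika":   [1, 1, 1, 0, 1, 3, 0, 2, 1, 1, 2, 3],
    "Emily":    [1, 1, 1, 1, 1, 3, 0, 1, 3, 4, 2, 3],
    "Farhan":   [1, 1, 1, 1, 1, 3, 1, 2, 3, 3, 3, 4],
    "Gogi":     [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1],
    "Harshita": [1, 1, 1, 1, 1, 3, 2, 1, 3, 4, 3, 3],
    "Indu":     [0, 1, 0, 0, 1, 0, 0, 2, 0, 0, 2, 0],
    "Jagat":    [1, 1, 1, 1, 1, 2, 1, 1, 3, 2, 3, 5],
}
max_marks = [1, 1, 1, 1, 1, 3, 2, 2, 3, 4, 3, 6]

ranked = sorted(data.values(), key=sum, reverse=True)
upper, lower = ranked[:3], ranked[-3:]

fi = {}
for i, mx in enumerate(max_marks, start=1):
    pru = sum(r[i - 1] for r in upper) / (3 * mx)  # proportion of max, upper group
    prl = sum(r[i - 1] for r in lower) / (3 * mx)  # proportion of max, lower group
    fi[i] = pru - prl
    print(f"Item {i}: FI = {fi[i]:+.2f}")
```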
Guttman scale
Point-biserial correlation
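The point-biserial slide's formula is missing from the export. As a sketch under the standard definition, it correlates a dichotomous item with the total test score; the data below are Item 1 and the totals from the worked example table in this session:

```python
# Sketch of the point-biserial correlation between a dichotomous item
# and the total test score, using Item 1 and the candidate totals from
# the worked example table.
from statistics import mean, pstdev

item1  = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
totals = [18, 18, 21, 16, 21, 24, 6, 24, 6, 22]

mean_correct = mean(t for i, t in zip(item1, totals) if i == 1)
mean_all = mean(totals)
sd_all = pstdev(totals)
p = mean(item1)   # proportion answering the item correctly
q = 1 - p

r_pb = (mean_correct - mean_all) / sd_all * (p / q) ** 0.5
print(round(r_pb, 2))
```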
The Guttman structure
If person A scores better than person B on the test, then
person A should have all the items correct that person B has,
and in addition, some other items that are more difficult.
Louis Guttman
The Guttman structure (cont.)

Item:  1  2  3  4  5  6   Total Score
       0  0  0  0  0  0   0
       1  0  0  0  0  0   1
       1  1  0  0  0  0   2
       1  1  1  0  0  0   3
       1  1  1  1  0  0   4
       1  1  1  1  1  0   5
       1  1  1  1  1  1   6
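Checking whether a response pattern conforms to a strict Guttman structure is straightforward in code; the patterns below are hypothetical, with items ordered easiest to hardest:

```python
# Sketch: check whether a dichotomous response pattern conforms to a
# strict Guttman structure (items ordered easiest to hardest).
def is_guttman(pattern):
    # A strict Guttman pattern is a run of 1s followed by a run of 0s.
    return sorted(pattern, reverse=True) == list(pattern)

print(is_guttman([1, 1, 1, 0, 0, 0]))  # True
print(is_guttman([1, 0, 1, 0, 0, 0]))  # False: harder item right, easier missed
```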
Reasons for not obtaining a strict Guttman pattern
• The items do not go together as expected and the scores on the items
should not be added.
• The items are very close in difficulty and the persons are all close in
ability.
Individual reporting
[Chart: items in order 3, 11, 2, 15, 14, 9, 8, 1, 7, 4, 13, 12, 5, 10, 6]