Session 5: Analysing Tests and Test Items
using Classical Test Theory (CTT)
Professor Jim Tognolini
Analysing Tests and Test Items
using Classical Test Theory (CTT)
During this session we will:
• define some basic test-level statistics using Classical Test Theory analyses:
test mean, test discrimination and test reliability (Cronbach's Alpha);
• define some basic item-level statistics from Classical Test Theory: item
difficulty and item discrimination (Findlay Index and Point Biserial Correlation).
Capacity Development Workshop: Test and Item Development and Design, Laos, September 2016
Test characteristics to evaluate
• Difficulty
• Discrimination
• Reliability
• Validity
Test difficulty
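The graphic for this slide did not survive the export. As a minimal sketch, test difficulty in CTT is commonly summarised by the test mean, here computed from the total scores in the worked example table later in the session (10 candidates, maximum 28 marks):

```python
# Minimal sketch (not from the slides): test difficulty summarised by the
# test mean, using the total scores from the worked example later in the
# session (10 candidates, maximum 28 marks).
scores = [18, 18, 21, 16, 21, 24, 6, 24, 6, 22]
max_marks = 28

mean_score = sum(scores) / len(scores)
difficulty = mean_score / max_marks  # closer to 1 = easier test

print(round(mean_score, 1), round(difficulty, 2))  # 17.6 0.63
```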
Test discrimination
The ability of a test to discriminate between high- and low-achieving
individuals is a function of the items that comprise the test.
Methods of estimating reliability

Method             Type of Reliability      Procedure
Test-Retest        Stability                Give the same test to the same group on
                                            different occasions, with some time
                                            between administrations.
Equivalent Forms   Equivalence              Give two (parallel) forms of the test
                                            to the same group in close succession.
Split-half         Internal Consistency     Give the test once; split it in half
                                            (odd/even items); correlate the scores
                                            on the two halves; correct the
                                            correlation between halves using the
                                            Spearman-Brown formula.
Coefficient Alpha  Internal Consistency     Give the test once to a group and
                                            apply the formula.
Interrater         Consistency of Ratings   Have two or more raters score the
                                            responses and calculate the
                                            correlation coefficient.
Split-halves method
Reliability can also be estimated from a single administration of a test,
either by correlating the two halves or by using the Kuder-Richardson method.
The split-halves method requires the test to be split into the two halves
that are most equivalent.
To estimate the reliability of the full test, the Spearman-Brown adjustment
is usually applied.
Kuder-Richardson (KR-20 and KR-21) Method
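The KR-20 formula itself did not survive this export. As a sketch under the standard definition, KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores), computed on a hypothetical 0/1 response matrix:

```python
# Sketch of KR-20 for dichotomous (0/1) items under its standard
# definition; the response matrix is hypothetical.
from statistics import pvariance

responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
]
n = len(responses)
k = len(responses[0])

p = [sum(col) / n for col in zip(*responses)]   # item difficulties
pq_sum = sum(pi * (1 - pi) for pi in p)
totals = [sum(row) for row in responses]

kr20 = (k / (k - 1)) * (1 - pq_sum / pvariance(totals))
print(round(kr20, 3))  # 0.5
```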
Cronbach’s alpha method
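The alpha formula is likewise missing from the export. As a sketch, Cronbach's alpha extends KR-20 to polytomous items: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The score matrix below is hypothetical:

```python
# Sketch of Cronbach's alpha for (possibly polytomous) item scores;
# the data are hypothetical.
from statistics import pvariance

scores = [
    [3, 2, 3, 4],
    [2, 2, 1, 3],
    [3, 3, 3, 4],
    [1, 1, 2, 1],
    [2, 3, 2, 3],
]
k = len(scores[0])

item_vars = [pvariance(col) for col in zip(*scores)]      # per-item variances
total_var = pvariance([sum(row) for row in scores])       # variance of totals

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 3))  # 0.867
```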
Ways to improve reliability
1. Test length
In general, the longer the test the higher the reliability (more adequate
sampling), provided that the material that is added is identical in
statistical and substantive properties.
2. Homogeneity of group
The more heterogeneous the group, the higher the reliability. Reliability
can vary at different score levels and across gender, location, etc.
3. Difficulty of items
Tests that are too difficult or too easy produce results of low reliability.
Generally, set items with difficulty close to 0.5. In general, for tests
that are required to discriminate, spread questions over the range in
which the discrimination is required.
Ways to improve reliability (cont.)
4. Objectivity
The more objective the test (and marking scheme), the more reliable the
resulting test scores.
5. Retain discriminating items
In general, replace items with low discrimination with items that
discriminate highly. There comes a point where this practice raises the
reliability so far that it lowers validity (the attenuation paradox).
6. Increase the speededness of the test
Highly speeded tests usually show higher reliability, but internal
consistency estimates should not be used with them.
Types of validity
There are many different types of validity. Traditionally three main types
are distinguished, with face validity often added as a fourth:
I. Content Validity (sometimes referred to as curricular or
instructional validity)
II. Criterion-Related Validity (types include predictive and concurrent
validity)
III. Construct Validity
IV. Face Validity
Loevinger (1957) argued that "since predictive, concurrent and content
validities are all essentially ad hoc, construct validity is the whole of
validity from a scientific point of view".
Define some basic item-level statistics from Classical Test Theory
Item difficulty
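The item difficulty formula is not reproduced in this export. In CTT it is the proportion of candidates answering the item correctly, or mean score over maximum marks for a polytomous item; the sketch below uses Items 4 (1 mark) and 10 (4 marks) from the worked example table later in the session:

```python
# Sketch: CTT item difficulty as proportion correct (dichotomous item)
# or mean score / maximum marks (polytomous item). Data are Items 4
# and 10 from the worked example table in this session.
item4 = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]    # 1-mark short-answer item
item10 = [1, 1, 2, 1, 4, 3, 0, 4, 0, 2]   # 4-mark extended item

p4 = sum(item4) / len(item4)              # 5/10 = 0.5
p10 = sum(item10) / (4 * len(item10))     # 18/40 = 0.45
print(p4, p10)
```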
Item discrimination
Methods for checking item discrimination include:
• The Findlay Index (FI)
• The Point Biserial Correlation
• The Biserial Correlation
The Findlay Index (FI)
The Findlay Index (FI) – An example

Item  NRU  NRL  NU  FI    Comment
1     9    2    10  0.7   Good item; better students do well
2     6    6    10  0.0   Weak item; does not discriminate
3     6    8    10  -0.2  Invalid item; weak students do better
The Findlay Index (FI)
If the number of students in the top group is not equal to the number
in the bottom group, proportions must be used:
FI = PRU - PRL
where
PRU = proportion of persons right in the upper group
PRL = proportion of persons right in the lower group
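The proportion-based form of the index is a one-liner; the group counts below are hypothetical:

```python
# Sketch of FI = PRU - PRL for unequal group sizes (hypothetical counts).
def findlay_index(n_right_upper, n_upper, n_right_lower, n_lower):
    pru = n_right_upper / n_upper   # proportion right in upper group
    prl = n_right_lower / n_lower   # proportion right in lower group
    return pru - prl

fi = findlay_index(9, 12, 2, 10)    # 0.75 - 0.20
print(round(fi, 2))  # 0.55
```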
Graphical display of the Findlay Index (FI)
Calculate the proportion of each group getting the item correct, and then
plot this against the mean score of each group.
Graphical display of the Findlay Index (FI)
[Chart: proportion correct (0 to 1) plotted against score group (L, M, U)
for Items 2, 6.2, 7 and 10.4]
The Findlay Index (FI) – An example

Item Type    SA SA SA SA SA  E  E  E  E  E  E  E  Total
Item Number   1  2  3  4  5  6  7  8  9 10 11 12
Max Marks     1  1  1  1  1  3  2  2  3  4  3  6     28
Astha         1  1  0  0  1  3  0  1  3  1  3  4     18
Bosco         1  1  1  0  1  3  0  1  3  1  3  3     18
Chetan        1  1  1  1  1  3  0  2  1  2  3  5     21
Devika        1  1  1  0  1  3  0  2  1  1  2  3     16
Emily         1  1  1  1  1  3  0  1  3  4  2  3     21
Farhan        1  1  1  1  1  3  1  2  3  3  3  4     24
Gogi          1  1  1  0  1  0  0  1  0  0  0  1      6
Harshita      1  1  1  1  1  3  2  1  3  4  3  3     24
Indu          0  1  0  0  1  0  0  2  0  0  2  0      6
Jagat         1  1  1  1  1  2  1  1  3  2  3  5     22
TOTAL         9 10  8  5 10 23  4 14 20 18 24 31    176
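The worked calculations on the following slides did not survive the export. As a sketch, the FI can be computed for every item in the table above using the top and bottom thirds by total score (the group-size choice is a judgement call), expressing each group's performance as a proportion of maximum marks so that polytomous items are handled too:

```python
# Sketch: FI for each item in the score table above, using the top and
# bottom thirds by total score (group size is a judgement call).
data = {
    "Astha":    [1, 1, 0, 0, 1, 3, 0, 1, 3, 1, 3, 4],
    "Bosco":    [1, 1, 1, 0, 1, 3, 0, 1, 3, 1, 3, 3],
    "Chetan":   [1, 1, 1, 1, 1, 3, 0, 2, 1, 2, 3, 5],
    "Devika":   [1, 1, 1, 0, 1, 3, 0, 2, 1, 1, 2, 3],
    "Emily":    [1, 1, 1, 1, 1, 3, 0, 1, 3, 4, 2, 3],
    "Farhan":   [1, 1, 1, 1, 1, 3, 1, 2, 3, 3, 3, 4],
    "Gogi":     [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1],
    "Harshita": [1, 1, 1, 1, 1, 3, 2, 1, 3, 4, 3, 3],
    "Indu":     [0, 1, 0, 0, 1, 0, 0, 2, 0, 0, 2, 0],
    "Jagat":    [1, 1, 1, 1, 1, 2, 1, 1, 3, 2, 3, 5],
}
max_marks = [1, 1, 1, 1, 1, 3, 2, 2, 3, 4, 3, 6]

ranked = sorted(data.values(), key=sum, reverse=True)
upper, lower = ranked[:3], ranked[-3:]

fi = {}
for i, mx in enumerate(max_marks, start=1):
    pru = sum(r[i - 1] for r in upper) / (3 * mx)  # proportion of max, upper group
    prl = sum(r[i - 1] for r in lower) / (3 * mx)  # proportion of max, lower group
    fi[i] = pru - prl
    print(f"Item {i}: FI = {fi[i]:+.2f}")
```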
Guttman scale
Point-biserial correlation
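The point-biserial slide's formula is missing from the export. As a sketch under the standard definition, it correlates a dichotomous item with the total test score; the data below are Item 1 and the totals from the worked example table in this session:

```python
# Sketch of the point-biserial correlation between a dichotomous item
# and the total test score, using Item 1 and the candidate totals from
# the worked example table.
from statistics import mean, pstdev

item1  = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
totals = [18, 18, 21, 16, 21, 24, 6, 24, 6, 22]

mean_correct = mean(t for i, t in zip(item1, totals) if i == 1)
mean_all = mean(totals)
sd_all = pstdev(totals)
p = mean(item1)   # proportion answering the item correctly
q = 1 - p

r_pb = (mean_correct - mean_all) / sd_all * (p / q) ** 0.5
print(round(r_pb, 2))
```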
The Guttman structure
If person A scores better than person B on the test, then
person A should have all the items correct that person B has,
and in addition, some other items that are more difficult.
Louis Guttman
The Guttman structure (cont.)

Item:  1  2  3  4  5  6   Total Score
       0  0  0  0  0  0   0
       1  0  0  0  0  0   1
       1  1  0  0  0  0   2
       1  1  1  0  0  0   3
       1  1  1  1  0  0   4
       1  1  1  1  1  0   5
       1  1  1  1  1  1   6
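Checking whether a response pattern conforms to a strict Guttman structure is straightforward in code; the patterns below are hypothetical, with items ordered easiest to hardest:

```python
# Sketch: check whether a dichotomous response pattern conforms to a
# strict Guttman structure (items ordered easiest to hardest).
def is_guttman(pattern):
    # A strict Guttman pattern is a run of 1s followed by a run of 0s.
    return sorted(pattern, reverse=True) == list(pattern)

print(is_guttman([1, 1, 1, 0, 0, 0]))  # True
print(is_guttman([1, 0, 1, 0, 0, 0]))  # False: harder item right, easier missed
```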
Reasons for not obtaining a strict Guttman pattern
• The items do not go together as expected and the scores on the items
should not be added.
• The items are very close in difficulty and the persons are all close in
ability.
Individual reporting
[Chart: items in order 3, 11, 2, 15, 14, 9, 8, 1, 7, 4, 13, 12, 5, 10, 6]