Day 1 PM: Using IRT
Item and test information
Comparison of IRT to Classical Test Theory
How to do IRT analysis
Part 1
Item and test information
Information
 Information is the tool that IRT uses
to build tests
 It is a statistical term that quantifies
how much something “adds” to a
procedure
 Or, alternatively, how much
uncertainty (error) it decreases
 A good test has a lot of information!
Item information
 IRT calculates information for each
item and test at each level of θ
 It is therefore not a single number –
it is a function across ability
 Each item has an item information
function
 Each test has a test information
function
Item information
 Some items provide information for
high students, some for low
 Same is true for tests: a test can be
more accurate for certain score
ranges – and IRT will tell you which
Information
 Item information is additive, that
is, it can be summed across items to obtain the
test information function (TIF)
 Then we know where to add/subtract
items
 Bonus: The TIF can also be converted
(SEM = 1/√TIF) to obtain a predicted SEM curve
Item information
 With CTT, “information” can be
conceptualized by jointly considering
the P and rpbis
◦ Obviously, a higher rpbis is better
 Definitely don’t want negative!
◦ P represents which examinees it is most
appropriate for
 P = 0.95 is easy, good for low examinees
 P = 0.50 is hard, good for high examinees
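The two classical statistics on this slide can be computed directly from a scored 0/1 response matrix. The sketch below is only an illustration: the response data are made up, and the point-biserial is the uncorrected version (the item is left in the total score), which is an assumption rather than something the slides specify.

```python
import math

def p_value(item_scores):
    """Classical difficulty: proportion of examinees answering correctly."""
    return sum(item_scores) / len(item_scores)

def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total score."""
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(item_scores, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_scores)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scored responses: rows = examinees, columns = items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
totals = [sum(row) for row in responses]

for j in range(len(responses[0])):
    item = [row[j] for row in responses]
    print(f"item {j + 1}: P = {p_value(item):.2f}, rpbis = {point_biserial(item, totals):.2f}")
```

Note that both numbers depend on who happened to take the test, which is the sample dependence the IRT comparison later in the deck is about.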
Item information
 But since items and examinees are
not on the same scale, there is no
direct connection
 With IRT, there is
 Item with b = 0.7 is good for person
with θ = 0.7
◦ This is the basis of adaptive testing –
doing this continually
Item information
 Item information takes this idea and
quantifies it across the spectrum
 It is therefore a function of θ as well as
the item parameters:
$$I_i(\theta) = D^2 a_i^2 \, \frac{Q_i(\theta)}{P_i(\theta)} \left[ \frac{P_i(\theta) - c_i}{1 - c_i} \right]^2$$
 Where P_i(θ) is the probability of a
correct answer for a given value of θ and
Q_i(θ) is 1 − P_i(θ)
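The computational equation on this slide translates almost line for line into code. A minimal sketch, assuming the standard three-parameter logistic response function and a scaling constant D = 1.7 (the slides do not state the value of D, and the function names are hypothetical):

```python
import math

def irf_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def item_information(theta, a, b, c, D=1.7):
    """Item information: D^2 a^2 (Q/P) ((P - c) / (1 - c))^2."""
    p = irf_3pl(theta, a, b, c, D)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

# Item 5 from the example table later in the deck: a = 0.80, b = 0.00, c = 0.22
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"theta = {theta:+.1f}  I = {item_information(theta, 0.80, 0.00, 0.22):.3f}")
```

With c = 0 the expression reduces to the 2PL case, where information at θ = b equals (Da)²/4.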
Item information
 That is the computational equation
 Conceptual version that is seen in the
literature is
$$I_i(\theta) = \frac{[P_i'(\theta)]^2}{P_i(\theta)\,[1 - P_i(\theta)]}$$
 Or the slope squared over the
conditional variance
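The computational and conceptual versions are the same quantity. A short derivation, assuming the standard 3PL response function; the logistic kernel symbol L_i(θ) is introduced here only for convenience and does not appear in the slides:

```latex
\begin{aligned}
P_i(\theta) &= c_i + (1 - c_i)\,L_i(\theta), \qquad
L_i(\theta) = \frac{1}{1 + e^{-D a_i (\theta - b_i)}} \\[4pt]
P_i'(\theta) &= (1 - c_i)\, D a_i\, L_i(\theta)\,[1 - L_i(\theta)]
 = D a_i\, \frac{[P_i(\theta) - c_i]\, Q_i(\theta)}{1 - c_i} \\[4pt]
\frac{[P_i'(\theta)]^2}{P_i(\theta)\, Q_i(\theta)}
 &= D^2 a_i^2\, \frac{Q_i(\theta)}{P_i(\theta)}
 \left[ \frac{P_i(\theta) - c_i}{1 - c_i} \right]^2
\end{aligned}
```

The second line uses L_i = (P_i − c_i)/(1 − c_i) and 1 − L_i = Q_i/(1 − c_i), both of which follow from the first line.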
Graphing info
 So what does this mean?
 We calculate with that equation, and
it will be higher wherever the slope
of the IRF is higher (for a given value
of θ)
 This is the item information function
(IIF)
Graphing info
 So the location of the item
determines the location of the IIF
 The discrimination of the item
determines the spread/peakedness of
the IIF
 Information decreases as the guessing
parameter increases
Some example items
Seq a b c
1 1.00 -2.00 0.26
2 0.70 -1.00 0.21
3 0.40 -0.50 0.30
4 0.50 1.00 0.00
5 0.80 0.00 0.22
Example item IRFs
IIFs – example items
Graphing info functions
 Note that a lower slope is not ALL
bad
 Even though Item 3’s peak is lower, it
provides some info at a much wider
range
 So items like that are quite useful
when info is needed across a wide
range
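The trade-off between peak height and spread can be checked numerically for the five example items. The sketch below assumes D = 1.7 and the standard 3PL information formula, and it measures "spread" as the width of the θ region where information is at least half of its peak; that width measure is a choice made here for illustration, not something from the slides.

```python
import math

D = 1.7  # scaling constant; an assumption, the slides do not state it

# (a, b, c) for the five example items
ITEMS = {
    1: (1.00, -2.00, 0.26),
    2: (0.70, -1.00, 0.21),
    3: (0.40, -0.50, 0.30),
    4: (0.50, 1.00, 0.00),
    5: (0.80, 0.00, 0.22),
}

def info(theta, a, b, c):
    """3PL item information at theta."""
    p = c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

GRID = [i / 100 for i in range(-600, 601)]  # theta from -6 to 6 in steps of 0.01

def summarize(a, b, c):
    """Return (peak location, peak height, width of the region above half the peak)."""
    values = [info(t, a, b, c) for t in GRID]
    peak = max(values)
    peak_theta = GRID[values.index(peak)]
    above = [t for t, v in zip(GRID, values) if v >= peak / 2]
    return peak_theta, peak, above[-1] - above[0]

for seq, params in ITEMS.items():
    loc, height, width = summarize(*params)
    print(f"item {seq}: peak at theta = {loc:+.2f}, height = {height:.3f}, "
          f"half-height width = {width:.2f}")
```

For items with c > 0 the peak sits somewhat above b, while Item 4 (c = 0) peaks exactly at its b.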
Using item info
 Item information is inversely related
to error in measurement
 If the item provides more info, it
reduces error
 The equation:
$$SEM(\theta) = \frac{1}{\sqrt{I(\theta)}}$$
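A quick worked example of the inverse relationship, using nothing beyond the equation on this slide (the function name is hypothetical):

```python
import math

def sem_from_information(information):
    """Standard error of measurement implied by a given amount of information."""
    if information <= 0:
        raise ValueError("information must be positive")
    return 1.0 / math.sqrt(information)

# Worked arithmetic: quadrupling information halves the error.
for info in (1.0, 4.0, 16.0):
    print(f"I = {info:5.1f}  ->  SEM = {sem_from_information(info):.3f}")
```

Because of the square root, cutting the error in half takes four times the information, which is why adding a few items to an already informative region of the scale buys relatively little.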
Using item info
Key point: an item has less
error where it has more
information
--> where it has more slope
A test has less error where
it has more information
(items)
Using item info
 IIFs are another way to examine
items individually
 They are also what adaptive testing
utilizes for item selection
 But the best use of item info: test
information and test assembly…
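Maximum-information item selection, the rule adaptive tests commonly use, can be sketched in a few lines. This is an illustration under assumptions: D = 1.7, the five example items serve as a tiny bank, and the function names are hypothetical. Operational adaptive tests add exposure control and content constraints on top of this rule.

```python
import math

D = 1.7  # scaling constant; an assumption, the slides do not state it

# Example items from the deck, keyed by sequence number: (a, b, c)
BANK = {
    1: (1.00, -2.00, 0.26),
    2: (0.70, -1.00, 0.21),
    3: (0.40, -0.50, 0.30),
    4: (0.50, 1.00, 0.00),
    5: (0.80, 0.00, 0.22),
}

def info(theta, a, b, c):
    """3PL item information at theta."""
    p = c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

def select_next_item(theta_hat, bank, administered):
    """Return the id of the unused item with the most information at theta_hat."""
    candidates = [item_id for item_id in bank if item_id not in administered]
    return max(candidates, key=lambda item_id: info(theta_hat, *bank[item_id]))

# A low-ability and a high-ability provisional estimate get different items.
print(select_next_item(-2.0, BANK, administered=set()))
print(select_next_item(1.5, BANK, administered=set()))
# Once an item has been given, the next-best one is chosen.
print(select_next_item(-2.0, BANK, administered={1}))
```

In a real adaptive test this selection step alternates with re-estimating θ after every response.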
Test information
 As a result of the assumption of local
independence, IIFs can be summed to
obtain a test information function
(TIF)
 Same is true for IRFs – they can be
summed into a TRF
◦ This converts thetas to estimated raw
score
Test information
 Test information, like item
information, shows how well a test
measures at each value of θ
 The TIF also converts to the conditional
SEM (CSEM): CSEM(θ) = 1/√TIF(θ)
 This is extremely useful for test
assembly (aka construction, design,
or building)
Test information
 Consider the 5 IRFs…
Test information
 The TRF is…
Test information
 Consider the 5 IIFs…
Test information
 The TIF is…
Test information
 The CSEM curve is…
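Since the figures for these slides did not survive, the three curves can be regenerated from the example item parameters. A minimal sketch, assuming D = 1.7 and the standard 3PL model; it tabulates the TRF, TIF, and CSEM at a few θ values rather than plotting them.

```python
import math

D = 1.7  # scaling constant; an assumption, the slides do not state it

# (a, b, c) for the five example items
ITEMS = [
    (1.00, -2.00, 0.26),
    (0.70, -1.00, 0.21),
    (0.40, -0.50, 0.30),
    (0.50, 1.00, 0.00),
    (0.80, 0.00, 0.22),
]

def prob(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

def info(theta, a, b, c):
    """3PL item information."""
    p = prob(theta, a, b, c)
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

def trf(theta):
    """Test response function: expected number-correct score at theta."""
    return sum(prob(theta, *item) for item in ITEMS)

def tif(theta):
    """Test information function: sum of the item information functions."""
    return sum(info(theta, *item) for item in ITEMS)

def csem(theta):
    """Conditional standard error of measurement at theta."""
    return 1.0 / math.sqrt(tif(theta))

print("theta   TRF    TIF   CSEM")
for theta in (-3, -2, -1, 0, 1, 2, 3):
    print(f"{theta:+5.1f}  {trf(theta):5.2f}  {tif(theta):5.3f}  {csem(theta):5.2f}")
```

The TRF never drops below the sum of the guessing parameters, and the CSEM is smallest where the TIF is largest.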
Test assembly
 Form building is more efficient and
better directed with IRT
 Reason: we can predict measurement
error (SEM) at each level of θ, not
just overall reliability
Test assembly
 This then allows you to build test
forms with specific TIFs or CSEMs in
mind
 Or multiple forms with the same TIF
 The following figures have the same
average a (0.9) but differ in where
they provide information
TRFs
TIFs
CSEMs
Test development
 You can build your test with a specific
TRF/TIF/SEM graph in mind
 Peak at cutscore?
 This can be done inside item bankers
(FastTEST & FT Web) or in separate
spreadsheets (my Form Building Tool)
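One simple way to get a TIF that peaks at the cutscore is to rank the bank by information at that point and take the top items. The sketch below is not how FastTEST or the Form Building Tool work internally (the slides do not say); it assumes D = 1.7, a made-up 50-item bank, and a hypothetical cutscore of θ = 0.5.

```python
import math
import random

D = 1.7  # scaling constant; an assumption, the slides do not state it

def info(theta, a, b, c):
    """3PL item information at theta."""
    p = c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

# Hypothetical 50-item bank with made-up (a, b, c) parameters.
rng = random.Random(42)
BANK = {
    item_id: (rng.uniform(0.4, 1.2), rng.uniform(-2.5, 2.5), rng.uniform(0.0, 0.3))
    for item_id in range(1, 51)
}

def tif(form, theta):
    """Test information of a form (a list of item ids) at theta."""
    return sum(info(theta, *BANK[item_id]) for item_id in form)

def assemble_peaked_form(cutscore, length):
    """Pick the items with the most information at the cutscore."""
    ranked = sorted(BANK, key=lambda item_id: info(cutscore, *BANK[item_id]), reverse=True)
    return ranked[:length]

CUTSCORE = 0.5  # hypothetical cutscore on the theta scale
form = assemble_peaked_form(CUTSCORE, length=10)
first_ten = list(BANK)[:10]  # naive comparison form

print("selected items:", sorted(form))
print(f"TIF at cutscore, targeted form: {tif(form, CUTSCORE):.2f}")
print(f"TIF at cutscore, first ten items: {tif(first_ten, CUTSCORE):.2f}")
```

Taking the most informative items every time is the behaviour that leads to overexposure, so operational assembly usually adds content and exposure constraints to a rule like this.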
Bank development
 You can also build the bank for a
testing program with the desired TIF
in mind
 If you know you want it to be peaked,
write items at the desired level of
difficulty to build an adequate bank
Bank development
 Otherwise you risk overexposure
 Don’t use all your best items at once
to make a peaked TIF – or any TIF for
that matter
 In the theoretical IRT world, we don’t
have to worry about that, but
exposure is a real issue
Bank development
 That is the reason linear-on-the-fly
testing (LOFT) was developed – to massively
reduce exposure and increase
security
◦ Every person gets a very similar TIF, but
a completely different test
◦ These tests are parallel, from an IRT
point of view
◦ Tests are conventional fixed-form
Part 2
A brief comparison of CTT and IRT
CTT and IRT Assumptions
 IRT:
◦ Unidimensionality and local independence
◦ Responses modeled by IRF
◦ Parameters, not statistics (sample
independence)
 CTT:
◦ X = T + E
◦ (1) true scores and error scores are
uncorrelated; (2) the average error score in
the sample is zero
◦ Statistics (not parameters) are sample-based
Comparing CTT and IRT
 CTT is said to have weaker assumptions
◦ Does not explicitly assume
unidimensionality
 But if not there, statistics will be iffy, and rpbis
and reliability suffer
 Sum scoring implicitly assumes items are
equivalent, which means unidimensional (all
items count equally on one total score)
Comparing CTT and IRT
 CTT is said to have weaker assumptions
◦ Does not explicitly assume IRF
 But if the idea of an IRF is not working, then the
item isn’t either
 And if you use rpbis, you assume a linear IRF,
which is actually impossible!
Comparing CTT and IRT
 CTT item statistics are at odds with
each other
◦ P says that there is one common
probability of a correct response
(binomial)
◦ But rpbis says that P increases with total
score (~ability)
Comparing CTT and IRT
 Classical SEM: same for everyone
 IRT SEM: different for everyone –
depends on the items you see and
your ability
 Which is more realistic?
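For reference, the classical SEM is a single number derived from the score standard deviation and the reliability, SEM = SD × √(1 − reliability). A small sketch with made-up values (the SD of 5.0 and reliability of 0.91 are hypothetical):

```python
import math

def classical_sem(score_sd, reliability):
    """Classical SEM: one value applied to every examinee, on the raw-score scale."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)

# Hypothetical test: raw-score SD of 5.0 points and reliability of 0.91.
sem = classical_sem(5.0, 0.91)
print(f"classical SEM = {sem:.2f} raw-score points for all examinees")
```

The IRT counterpart is a curve over θ rather than one value, and it is expressed on the θ scale rather than in raw-score points.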
Comparing CTT and IRT
 Direct comparison of item statistics
◦ We still use “difficulty” and
“discrimination”
◦ How different are they from CTT?
◦ Difficulty correlates highly (>0.90)
◦ Discrimination does not – because rpbis
assumes a linear relationship and the IRT model does not
Comparing CTT and IRT
 IRT and CTT scores also correlate
>0.95
 So why use IRT?
 There are distinct advantages…
Advantages of IRT
 IRT has parameters, not statistics
 Sample-independent… within a linear
transformation
 Huh? This means that if you have two
calibration groups of different levels,
we can convert parameters/scores
with a simple y = mx + b
 (Linking)
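The y = mx + b conversion can be illustrated with the mean/sigma method, one common way to estimate the linking constants from anchor items calibrated in both groups. The slides do not say which linking method is meant, so treat this as one possible choice. The anchor difficulties are made up, and the intercept is named k here to avoid a clash with the item difficulty parameter b.

```python
import statistics

def mean_sigma_constants(b_new, b_base):
    """Slope m and intercept k such that b_base is approximately m * b_new + k."""
    m = statistics.stdev(b_base) / statistics.stdev(b_new)
    k = statistics.mean(b_base) - m * statistics.mean(b_new)
    return m, k

def transform_item(a, b, c, m, k):
    """Put 3PL parameters from the new calibration onto the base scale."""
    return a / m, m * b + k, c

def transform_theta(theta, m, k):
    """Put an ability estimate from the new calibration onto the base scale."""
    return m * theta + k

# Hypothetical difficulty estimates for the same anchor items in two calibrations.
b_new = [-1.2, -0.4, 0.1, 0.9, 1.6]
b_base = [-1.02, -0.14, 0.41, 1.29, 2.06]

m, k = mean_sigma_constants(b_new, b_base)
print(f"m = {m:.3f}, k = {k:.3f}")
print(transform_item(0.80, 0.00, 0.22, m, k))
print(transform_theta(0.5, m, k))
```

Dividing a by m while rescaling b and θ leaves a(θ − b) unchanged, so the response probabilities are the same on either scale; c is not affected.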
Advantages of IRT
 Items and people are on the same
scale
 Easier to interpret, and allows
adaptive testing
Advantages of IRT
 Information provides an important
tool for test building and bank
development
 Better match the purposes of a test
 IRT CSEM allows far better
description of precision
Advantages of IRT
 More precise scores
 CTT number correct scoring is limited
to k + 1 scores
 3PL has 2^k possible scores (one per response pattern)
 Compare with 10 items:
◦ 11 vs 1024 possible scores
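The counts on this slide can be confirmed by brute-force enumeration of response patterns. A small sketch (the function name is hypothetical); note that 2^k is an upper bound on distinct 3PL scores, reached when different patterns lead to different θ estimates.

```python
from itertools import product

def count_scores(k):
    """Return (number of distinct number-correct scores, number of response patterns)."""
    patterns = list(product((0, 1), repeat=k))
    number_correct = {sum(pattern) for pattern in patterns}
    return len(number_correct), len(patterns)

for k in (3, 5, 10):
    n_sum, n_patterns = count_scores(k)
    print(f"k = {k:2d}: {n_sum} number-correct scores, {n_patterns} response patterns")
```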
Advantages of IRT
 Scores take item difficulty into
account
 Allows direct comparison of
examinees that saw different sets of
items
 Scores also account for guessing
Advantages of IRT
 Nonlinear IRF – the linear IRF
assumed by CTT is impossible
 Allows for different SEM for every
examinee
 Not realistic to assume they are all
the same
Disadvantages of IRT
 Sample size
 CTT: 50 is OK, 100 is great
◦ It is much easier to fit a straight line
“model” than an IRF because it is an
oversimplification
 IRT: 100 is bare minimum for 1PL
◦ 3PL? ~500
◦ Puts it out of reach of small testing
programs
Disadvantages of IRT
 No “native” distractor analysis unless
polytomous models
 Can adapt the CTT idea of
quantile/distractor plot with IRT
◦ IRT programs will also give you option P
and Rpbis
Disadvantages of IRT
 Complexity
◦ Not only do you have to understand it
yourself, but…
◦ You also have to explain it to
stakeholders!
Disadvantages of IRT
 However, note that these are not big
problems
◦ Many places have plenty of sample size
◦ You can still use CTT for distractor
analysis (always use both!!!!)
◦ The complexity is not too bad unless
using complex models
◦ Often, the biggest issue is the
stakeholders!
IRT Analysis
How do I go about doing this?
IRT Analysis
 Xcalibre 4 for IRT
 CTT analysis with Iteman 4 (not
necessary, but sometimes helps)
 Also:
◦ Scoring and graphing tool
◦ Form building tool
◦ Empirical IRFs in Excel
◦ Have we covered these sufficiently?
IRT Analysis
 I’m assuming here we are analyzing
just one sample of one test
 What would I look for? Basic…
◦ Items with good parameters (keep/clone)
◦ Items with bad parameters (retire)
 Evaluate their CTT option statistics
◦ TIF/CSEM – meet our needs? (not
good/bad in absolute sense)
IRT Analysis
 What would I look for? Advanced…
◦ Dimensionality assessment (reliability,
any items/sections “off on their own”)
◦ Item fit (also dimensionality, and possible
item issues)
◦ Test sections – any stand out for being
hard, easy, low discriminations, poor
precision, etc?
◦ CSEM/TIF for sections: anything under-
measured?
IRT Analysis
 What would I look for? Advanced…
◦ Finally: what do you want to see in the
data, and how will the test be used?
 Later, we’ll talk about more
advanced uses like:
◦ Linking and equating multiple forms
◦ Test assembly
◦ Adaptive testing
◦ Dimensionality evaluation
Iteman 4.1
 Performs comprehensive classical
analysis
 Quantile plots allow broad evaluation
of IRF shape
 Advantages:
◦ Easily understandable – can use with SMEs
◦ Includes distractors
Xcalibre 4.1
 Provides a comprehensive and user-
friendly IRT analysis
 Allows evaluation of individual items
and test as a whole
 All major graphs
 Many summary graphs (freqs etc.)
 Classical analysis too
Reasons for Xcalibre 4.1
 Currently available software (Parscale,
Bilog, Multilog, ConQuest, WinSteps,
ICL) still requires programming skills
 Some still run on DOS!
 If IRT is to be more widely used, it
needs a user-friendly system
◦ Input and output
Reasons for Xcalibre 4.1
 Better input
◦ Yes: Point and click buttons
◦ No: DOS programming quasi-language
 Better output
◦ Yes: Word docs (RTF), spreadsheets (CSV)
◦ No: DOS txt files with ugly tables
Reasons for Xcalibre 4.1
 Advanced users with programming
skills and need for customized analysis
can still utilize previous software
 Xcalibre 4.1 is designed for a wider
range of users
 The following description is of Xcalibre
4, but also applies to Iteman 4
Xcalibre 4.1 Interface
 Divided into tabs
 Move left to right…
Xcalibre 4.1 Interface
 All options are specified with buttons
or simple entry boxes
 No code based on keywords
◦ Best example: IRT models (you’ll see)
 Also: usable error messages
Specify files/input; choose options
 I’ll now show how to use X4, and do
some analysis of real data…

Using Item Response Theory to Improve Assessment
