Developing a Computerized Adaptive Test

What is CAT?
CAT is an algorithm
We need to break down and specify all aspects
Choice of major algorithms
Subalgorithms
Input parameters
Item bank needs

CAT Components
1. Calibrated item bank
2. Starting rule
3. Item selection rule
4. Scoring rule
5. Stopping rule
We must provide validity documentation on each
Component 1: test development side
Components 2-5: algorithms inside your testing engine
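The five components can be sketched as one short loop. This is a minimal illustration, not a production engine: the 2PL model, maximum-information selection, EAP scoring with a N(0, 1) prior, the SEM target of 0.30, and the randomly generated bank are all assumptions chosen for the example, and every function name is hypothetical.

```python
import math
import random

GRID = [g / 10.0 for g in range(-40, 41)]  # theta grid used for EAP scoring

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def eap(responses, bank):
    """Scoring rule: EAP estimate and posterior SD under a N(0, 1) prior."""
    post = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for item, correct in responses:
            p = p_correct(t, *bank[item])
            w *= p if correct else 1.0 - p
        post.append(w)
    total = sum(post)
    theta = sum(t * w for t, w in zip(GRID, post)) / total
    var = sum((t - theta) ** 2 * w for t, w in zip(GRID, post)) / total
    return theta, math.sqrt(var)

def run_cat(bank, answer, start_theta=0.0, target_sem=0.30, max_items=50):
    """Administer one CAT; `answer(item)` returns True for a correct response."""
    theta, sem = start_theta, float("inf")          # 2. starting rule
    responses, unused = [], set(range(len(bank)))   # 1. calibrated item bank
    # 5. stopping rule: target SEM reached, max length reached, or bank exhausted
    while unused and sem > target_sem and len(responses) < max_items:
        # 3. item selection rule: most informative unused item at current theta
        item = max(unused, key=lambda i: information(theta, *bank[i]))
        unused.remove(item)
        responses.append((item, answer(item)))
        theta, sem = eap(responses, bank)           # 4. scoring rule
    return theta, sem, len(responses)

# Hypothetical bank of (a, b) pairs and one simulated examinee with theta = 1.0
rng = random.Random(1)
bank = [(rng.uniform(0.8, 1.6), rng.uniform(-3.0, 3.0)) for _ in range(200)]
true_theta = 1.0
theta_hat, sem, n_items = run_cat(
    bank, lambda i: rng.random() < p_correct(true_theta, *bank[i]))
print(f"theta = {theta_hat:.2f}, SEM = {sem:.2f}, items = {n_items}")
```

Each numbered comment marks a place where a different rule could be swapped in, which is also where a documented justification is needed.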

Background
CAT remains underutilized. What are the barriers?
What do you think?
Cost
Complexity
Few guidelines on how to develop one

Background
We have approximately 4 decades of technical research on CAT
Numerous books and other resources (Rudner’s tutorial) on what CAT is and how it works
Discussions of issues (Wise & Kingsbury, 2000)
Very few resources on how to develop a CAT

Background
Best existing resource: descriptions of current CAT programs
Sands, Waters, & McBride (1997): ASVAB
Elements of Adaptive Testing: Part 2 = 5 examples
JATT issue on CAT

Background
Framework, not complete recipe
Identify choices for your org and best way to investigate/decide
Leads to better quality in the end
Also the foundation for validity arguments
Why did you choose certain things?

The 5-step model
Seq.  Stage                                               Primary work
1     Feasibility, applicability, and planning studies    Monte Carlo simulation; business case evaluation
2     Develop item bank content or utilize existing bank  Item writing and review
3     Pretest and calibrate item bank                     Pretesting; item analysis
4     Determine specifications for final CAT              Post-hoc or hybrid simulations
5     Publish live CAT                                    Publishing and distribution; software development

1. Feasibility, applicability, planning
Big question: is CAT worth the investment?
If so, how can we develop a project plan and timeline?

Answer: simulations
Simulate how a CAT would operate under specified conditions
Independent variables (IVs)
  Item bank size
  Item quality
  Desired precision
Dependent variables (DVs)
  Average test length
  Accuracy: CAT θ vs. true θ (or full-bank θ)

For those newer to CAT…
Three types of simulations
Monte Carlo (generated data)
Post hoc (real data)
Hybrid (real data, with missing responses simulated)

At this point, real data not likely, so Monte Carlo
Generate plausible situations
Item bank: 100, 200, 300…
Item quality: a = 0.7, 0.8…; spread of b
Desired precision: SEM = 0.2, 0.3, 0.4…
Compare results to each other and fixed forms
Base values on reality (e.g., mean a)

Think of the results table you want to see
Bank size       Target SEM  Mean test length  Mean SEM
(current test)  -           100               0.32
200             0.30        ?                 ?
200             0.40        ?                 ?
300             0.30        ?                 ?
300             0.40        ?                 ?
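A small Monte Carlo experiment that fills in a table of this shape might look like the sketch below. It is an illustration under stated assumptions, not the method used by any of the packages named in this talk: a 2PL model, maximum-information selection, EAP scoring, a mean a of 0.8, b spread uniformly from -3 to 3, and only 20 simulated examinees per condition to keep it quick. All function names are hypothetical.

```python
import math
import random

GRID = [g / 5.0 for g in range(-20, 21)]  # theta grid from -4 to 4

def p2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info(theta, a, b):
    p = p2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def simulate_examinee(bank, true_theta, target_sem, max_items, rng):
    """One simulated CAT: max-information selection, EAP scoring, SEM stopping."""
    post = [math.exp(-0.5 * t * t) for t in GRID]  # N(0, 1) prior
    theta, sem, n = 0.0, float("inf"), 0
    unused = set(range(len(bank)))
    while unused and sem > target_sem and n < max_items:
        item = max(unused, key=lambda i: info(theta, *bank[i]))
        unused.remove(item)
        a, b = bank[item]
        correct = rng.random() < p2pl(true_theta, a, b)  # generated response
        post = [w * (p2pl(t, a, b) if correct else 1.0 - p2pl(t, a, b))
                for t, w in zip(GRID, post)]
        total = sum(post)
        post = [w / total for w in post]  # renormalise after each response
        theta = sum(t * w for t, w in zip(GRID, post))
        sem = math.sqrt(sum((t - theta) ** 2 * w for t, w in zip(GRID, post)))
        n += 1
    return theta, sem, n

def run_condition(bank_size, target_sem, n_examinees=20, mean_a=0.8,
                  max_items=100, seed=0):
    """Simulate one row of the results table for a generated bank."""
    rng = random.Random(seed)
    bank = [(max(0.3, rng.gauss(mean_a, 0.15)), rng.uniform(-3.0, 3.0))
            for _ in range(bank_size)]
    runs = [simulate_examinee(bank, rng.gauss(0.0, 1.0), target_sem, max_items, rng)
            for _ in range(n_examinees)]
    return {
        "bank_size": bank_size,
        "target_sem": target_sem,
        "mean_length": sum(n for _, _, n in runs) / n_examinees,
        "mean_sem": sum(s for _, s, _ in runs) / n_examinees,
    }

table = [run_condition(size, sem) for size in (200, 300) for sem in (0.30, 0.40)]
for row in table:
    print(f"{row['bank_size']:>9}  {row['target_sem']:.2f}  "
          f"{row['mean_length']:6.1f}  {row['mean_sem']:.3f}")
```

In a real feasibility study the generating values (mean a, spread of b, θ distribution) would be based on your own program's data, and far more examinees would be simulated per condition.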

Software will do this for you, allowing you to simulate CATs for thousands of examinees in seconds
CATSim (ASC)
WinGen (Han)
FireStar (Choi)
You can then easily set up an experiment with a wide range of conditions, and run a simulation for each
Workshop by Cito on this

Example takeaway:
CAT with a bank of 300 items and target SEM = 0.25 averages 53 items
Current fixed test has 100 items, with SEM = 0.23 in the middle and 0.35+ beyond θ of ±1.5
CAT will make the test more accurate for extreme examinees and about as accurate for the middle, with roughly a 50% reduction in test length

Another question: Business Case Evaluation
Example:
You deliver 100,000 tests per year
You estimate $20/hour seat time
Reducing a test from 2 hours to 1 hour then saves $2 million per year
More difficult to estimate for K-12: the cost is not seat time but time away from instruction
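The arithmetic behind the seat-time example, written out as a tiny helper. The function name is hypothetical, and the calculation covers seat time only; development and software costs would have to be set against it.

```python
def annual_seat_time_savings(tests_per_year, cost_per_hour, hours_before, hours_after):
    """Yearly saving from shortening each test, counting seat time only."""
    return tests_per_year * cost_per_hour * (hours_before - hours_after)

savings = annual_seat_time_savings(100_000, 20.0, 2.0, 1.0)
print(f"${savings:,.0f}")  # $2,000,000
```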

2. Develop item bank
Now that we have an idea what we need, we need to build it
CAT-based considerations:
Difficulty spread
Anticipated exposure/security issues
TIF adequacy
Normal considerations
Content blueprints
Cognitive level
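One way to look at difficulty spread and TIF adequacy together is to ask what SEM the bank could support at each θ if only the most informative items there were administered. The sketch below assumes a 2PL model, a hypothetical bank with a = 1.0 and b spaced evenly from -2 to 2, and a 30-item test; `best_n_sem` is an illustrative name, not a standard function.

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def best_n_sem(bank, theta, n_items):
    """SEM at theta if the n most informative items at theta were administered."""
    infos = sorted((info_2pl(theta, a, b) for a, b in bank), reverse=True)
    return 1.0 / math.sqrt(sum(infos[:n_items]))

# Hypothetical bank: 100 items, a = 1.0, b evenly spaced from -2 to 2
bank = [(1.0, -2.0 + 4.0 * i / 99) for i in range(100)]
for theta in (-3, -2, -1, 0, 1, 2, 3):
    print(f"theta = {theta:+d}: best-case SEM = {best_n_sem(bank, theta, 30):.2f}")
```

Where the best-case SEM rises above the precision you want, the bank needs more items at that difficulty level.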

3. Pretesting and analysis
Must pretest items to obtain bank calibration
Two situations
New test, new scale: present large numbers of items to examinees
Existing test, old scale: seed items
Will obviously take longer to pilot
Requires a linking study

3. Pretesting and analysis
Then calibrate, usually IRT
Also perform other due diligence
Dimensionality
DIF
Model fit
Distractor analysis
Remove/revise items based on stats?
Etc.
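Before or alongside IRT calibration, a classical item analysis is a common first screen for items to remove or revise. The sketch below computes proportion correct and the item-rest correlation from a scored 0/1 response matrix; the tiny matrix is made up for illustration and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns NaN if either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def item_analysis(matrix):
    """Proportion correct and item-rest correlation for each item (column)."""
    totals = [sum(row) for row in matrix]
    stats = []
    for j in range(len(matrix[0])):
        col = [row[j] for row in matrix]
        rest = [t - u for t, u in zip(totals, col)]  # total score without item j
        stats.append({"item": j, "p": sum(col) / len(col), "r_rest": pearson(col, rest)})
    return stats

# Made-up scored responses: rows are examinees, columns are items
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
stats = item_analysis(responses)
for s in stats:
    print(f"item {s['item']}: p = {s['p']:.2f}, item-rest r = {s['r_rest']:.2f}")
```

An item with a negative item-rest correlation, like the last column here, would be a candidate for a key check or removal before calibration.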

4. Determine final specifications
To publish a CAT, we need to specify algorithms
Starting point
Item selection
Scoring
Termination criterion
Also subalgorithms, such as item exposure, content, test length constraints

But we must have a reason for selecting specifications
Validity documentation
Defensibility
Again, we turn to simulation studies
Define competing conditions
Big difference now: we have real data!
Post Hoc or Hybrid simulations
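A post hoc simulation replays the CAT over responses that examinees have already given, so the "answer" to each selected item is looked up rather than generated. The sketch below compares two competing termination criteria. It assumes a 2PL model, maximum-information selection, and EAP scoring, and because no real data are available here it generates a stand-in response matrix; in practice `bank` and `data` would be your calibrated parameters and pretest responses. All names are hypothetical.

```python
import math
import random

GRID = [g / 5.0 for g in range(-20, 21)]  # theta grid from -4 to 4

def p2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info(theta, a, b):
    p = p2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def eap(items, row, bank):
    """EAP estimate and posterior SD from the stored responses to `items`."""
    post = [math.exp(-0.5 * t * t) for t in GRID]  # N(0, 1) prior
    for i in items:
        a, b = bank[i]
        post = [w * (p2pl(t, a, b) if row[i] else 1.0 - p2pl(t, a, b))
                for t, w in zip(GRID, post)]
    total = sum(post)
    theta = sum(t * w for t, w in zip(GRID, post)) / total
    sd = math.sqrt(sum((t - theta) ** 2 * w for t, w in zip(GRID, post)) / total)
    return theta, sd

def post_hoc_cat(row, bank, target_sem, max_items):
    """Replay a CAT for one examinee, looking up answers instead of generating them."""
    used, theta, sem = [], 0.0, float("inf")
    unused = set(range(len(bank)))
    while unused and sem > target_sem and len(used) < max_items:
        item = max(unused, key=lambda i: info(theta, *bank[i]))
        unused.remove(item)
        used.append(item)
        theta, sem = eap(used, row, bank)
    return theta, len(used)

# Stand-in for real data: 40 examinees who each answered all 100 bank items
rng = random.Random(7)
bank = [(rng.uniform(0.8, 1.4), rng.uniform(-2.5, 2.5)) for _ in range(100)]
data = []
for _ in range(40):
    t = rng.gauss(0.0, 1.0)
    data.append([rng.random() < p2pl(t, a, b) for a, b in bank])

full = [eap(range(len(bank)), row, bank)[0] for row in data]  # full-bank theta
summary = {}
for target in (0.35, 0.45):
    runs = [post_hoc_cat(row, bank, target, max_items=60) for row in data]
    rmsd = math.sqrt(sum((th - f) ** 2 for (th, _), f in zip(runs, full)) / len(runs))
    summary[target] = (sum(n for _, n in runs) / len(runs), rmsd)
    print(f"target SEM {target:.2f}: mean length {summary[target][0]:.1f}, "
          f"RMSD vs full bank {rmsd:.3f}")
```

The same loop can be rerun with different starting points, selection rules, or exposure constraints, and the resulting table becomes part of the validity documentation for the specifications chosen.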

After determining psychometric specifications, evaluate more practical issues
For example, time limits: these can't really be set until you know how many items will be administered
CAT-ASVAB approach: set limits for 90-95% of population
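One reading of a percentile-based approach is to take completion times from simulations or pilot sessions and set the limit at the 90th or 95th percentile. The sketch below uses the nearest-rank percentile on made-up completion times in minutes; it is an illustration of that reading, not a description of the actual CAT-ASVAB procedure, and the function name is hypothetical.

```python
import math

def percentile_nearest_rank(values, pct):
    """Smallest observed value that at least pct% of observations do not exceed."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Made-up completion times in minutes for 20 pilot examinees
times = [22, 25, 27, 28, 30, 31, 33, 34, 35, 36,
         38, 39, 41, 43, 45, 47, 50, 54, 61, 75]
limit_90 = percentile_nearest_rank(times, 90)
limit_95 = percentile_nearest_rank(times, 95)
print(f"90% finish within {limit_90} min; 95% finish within {limit_95} min")
```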

5. Publish live CAT
Once you have finalized your item bank and CAT design, it is time to publish
Need to put everything into item banker and CAT engine
First: obtain the item banker and CAT engine
If developing your own, this can be the biggest step
If purchasing, this is the easiest step

Epilogue: Maintaining CAT
Like fixed form testing, maintenance is usually necessary
Check that the CAT is performing as expected
Is termination criterion being satisfied?
Are examinees hitting test length or other constraints?
Is average test length what you expected?
Exposure or security issues?
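These checks can be run routinely against delivery logs. The sketch below assumes a hypothetical log format (one record per session with the administered item IDs and the final SEM) and an arbitrary exposure threshold; the field names and the `monitor` function are illustrative only.

```python
from collections import Counter

def monitor(sessions, target_sem, max_items, max_exposure=0.25):
    """Summarise live CAT sessions against the design specifications."""
    n = len(sessions)
    lengths = [len(s["items"]) for s in sessions]
    counts = Counter(item for s in sessions for item in s["items"])
    exposure = {item: c / n for item, c in counts.items()}  # share of sessions
    return {
        "mean_length": sum(lengths) / n,
        "pct_met_sem": sum(s["sem"] <= target_sem for s in sessions) / n,
        "pct_hit_max": sum(length >= max_items for length in lengths) / n,
        "overexposed": sorted(i for i, r in exposure.items() if r > max_exposure),
    }

# Made-up session logs
sessions = [
    {"items": ["i01", "i07", "i12"], "sem": 0.29},
    {"items": ["i01", "i03", "i12", "i20"], "sem": 0.30},
    {"items": ["i01", "i05", "i09", "i14", "i20"], "sem": 0.34},
    {"items": ["i02", "i07", "i11"], "sem": 0.28},
]
report = monitor(sessions, target_sem=0.30, max_items=5, max_exposure=0.5)
print(report)
```

Comparing these figures with the values predicted by the pre-launch simulations shows whether the CAT is behaving as designed.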

Thank you!
nthompson@assess.com
See PARE, Volume 16, #1
