Developing a Computerized Adaptive Test (CAT)
What is CAT?
CAT is an algorithm
We need to break down and specify all aspects
Choice of major algorithms
Subalgorithms
Input parameters
Item bank needs
CAT Components
1. Calibrated item bank
2. Starting rule
3. Item selection rule
4. Scoring rule
5. Stopping rule
We must provide validity documentation on each
Algorithms inside your testing engine
Test development side
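The five components can be sketched as one loop. This is only a minimal sketch under assumptions that are not in the slides: a 2PL IRT model, maximum-information item selection, EAP scoring with a standard normal prior, and a fixed SEM stopping rule with a maximum test length. The names `run_cat`, `answer`, and `GRID` are hypothetical.

```python
import math

GRID = [-4.0 + 0.1 * k for k in range(81)]  # theta points used for EAP scoring


def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_info(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def eap(responses, bank):
    """Scoring rule: posterior mean and SD of theta under a N(0, 1) prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for item_id, u in responses:
            p = p_correct(t, *bank[item_id])
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(bank, answer, start_theta=0.0, target_sem=0.30, max_items=50):
    """bank: calibrated list of (a, b); answer(item_id) returns 0 or 1."""
    theta, sem = start_theta, float("inf")  # starting rule
    responses, used = [], set()
    # Stopping rule: target SEM reached, max length reached, or bank exhausted.
    while sem > target_sem and len(responses) < max_items and len(used) < len(bank):
        # Item selection rule: most informative unused item at current theta.
        item_id = max(
            (i for i in range(len(bank)) if i not in used),
            key=lambda i: item_info(theta, *bank[i]),
        )
        used.add(item_id)
        responses.append((item_id, answer(item_id)))
        theta, sem = eap(responses, bank)  # scoring rule
    return theta, sem, len(responses)
```

Each argument of `run_cat` corresponds to a choice that needs its own validity documentation.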
Background
CAT remains underutilized. What are the barriers?
What do you think?
Cost
Complexity
Few guidelines on how to develop one
Background
We have approximately 4 decades of technical research on CAT
Numerous books and other resources (Rudner’s tutorial) on what CAT is and how it works
Discussions of issues (Wise & Kingsbury, 2000)
Very few resources on how to develop a CAT
Background
Best existing resource: descriptions of current CAT programs
Sands, Waters, & McBride (1997): ASVAB
Elements of Adaptive Testing: Part 2 = 5 examples
JATT issue on CAT
Background
Framework, not complete recipe
Identify choices for your organization and the best way to investigate/decide
Leads to better quality in the end
Also the foundation for validity arguments
Why did you choose certain things?
The 5 step model
Seq.  Stage                                                 Primary work
1     Feasibility, applicability, and planning studies      Monte Carlo simulation; business case evaluation
2     Develop item bank content or utilize existing bank    Item writing and review
3     Pretest and calibrate item bank                       Pretesting; item analysis
4     Determine specifications for final CAT                Post-hoc or hybrid simulations
5     Publish live CAT                                      Publishing and distribution; software development
1. Feasibility, applicability, planning
Big question: is CAT worth the investment?
If so, how can we develop a project plan and timeline?
1. Feasibility, applicability, planning
Answer: simulations
Simulate how a CAT would operate under specified conditions
Independent variables (IVs)
Item bank size
Item quality
Desired precision
Dependent variables (DVs)
Average test length
Accuracy: CAT θ vs. true θ (or full bank)
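The dependent variables can be summarized per condition roughly as follows. A minimal sketch: the choice of bias, RMSE, and Pearson correlation as accuracy measures is an assumption, and `accuracy_summary` is a hypothetical helper name.

```python
import math


def accuracy_summary(true_thetas, cat_thetas, test_lengths):
    """Summarize one simulated condition: test length and CAT vs. true theta."""
    n = len(true_thetas)
    errors = [c - t for t, c in zip(true_thetas, cat_thetas)]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_t = sum(true_thetas) / n
    mean_c = sum(cat_thetas) / n
    cov = sum((t - mean_t) * (c - mean_c) for t, c in zip(true_thetas, cat_thetas))
    var_t = sum((t - mean_t) ** 2 for t in true_thetas)
    var_c = sum((c - mean_c) ** 2 for c in cat_thetas)
    return {
        "mean_length": sum(test_lengths) / n,
        "bias": bias,
        "rmse": rmse,
        "r": cov / math.sqrt(var_t * var_c),
    }
```

In a post hoc study the "true" values would be replaced by full-bank θ estimates.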
1. Feasibility, applicability, planning
For those newer to CAT…
Three types of simulations
Monte Carlo
Post hoc (real data)
Hybrid
1. Feasibility, applicability, planning
At this point, real data are unlikely to exist, so use Monte Carlo
Generate plausible situations
Item bank: 100, 200, 300…
Item quality: a = 0.7, 0.8…; spread of b
Desired precision: SEM = 0.2, 0.3, 0.4…
Compare results to each other and fixed forms
Base values on reality (e.g., mean a)
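Generating the plausible situations might look like the sketch below. It assumes a 2PL model, lognormal discriminations whose median is the chosen `typical_a`, difficulties spread uniformly over [-3, 3], and standard normal simulees; all of these distributional choices and the function names are illustrative assumptions, not values from the slides.

```python
import math
import random


def make_bank(n_items, typical_a, rng, sd_log_a=0.2, b_range=(-3.0, 3.0)):
    """Draw (a, b) pairs: lognormal a with median typical_a, uniform b."""
    bank = []
    for _ in range(n_items):
        a = typical_a * math.exp(rng.gauss(0.0, sd_log_a))
        b = rng.uniform(*b_range)
        bank.append((a, b))
    return bank


def simulate_responses(thetas, bank, rng):
    """Return one row of 0/1 responses per simulee under the 2PL model."""
    rows = []
    for theta in thetas:
        row = []
        for a, b in bank:
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            row.append(1 if rng.random() < p else 0)
        rows.append(row)
    return rows


rng = random.Random(2024)  # fixed seed so the experiment is repeatable
thetas = [rng.gauss(0.0, 1.0) for _ in range(200)]  # simulated examinees

# One generated data set per (bank size, item quality) condition.
data = {}
for n_items in (100, 200, 300):
    for typical_a in (0.7, 0.8):
        bank = make_bank(n_items, typical_a, rng)
        data[(n_items, typical_a)] = (bank, simulate_responses(thetas, bank, rng))
```

Each data set would then be run through the CAT at every desired SEM and compared with the other conditions and with the fixed form.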
1. Feasibility, applicability, planning
Think of the results table you want to see
Bank size        Target SEM   Mean test length   Mean SEM
(current test)   -            100                0.32
200              0.30         ?                  ?
200              0.40         ?                  ?
300              0.30         ?                  ?
300              0.40         ?                  ?
1. Feasibility, applicability, planning
Software will do this for you, allowing you to simulate CATs for thousands of examinees in seconds
CATSim (ASC)
WinGen (Han)
FireStar (Choi)
You can then easily set up an experiment with a wide range of conditions, and run a simulation for each
Workshop by Cito on this
1. Feasibility, applicability, planning
Example takeaway:
A CAT with a bank of 300 items and a target SEM of 0.25 averages 53 items
The current fixed test has 100 items, with SEM = 0.23 in the middle and 0.35+ beyond θ of ±1.5
CAT will make the test more accurate for extreme examinees and about as accurate for the middle, with roughly a 50% reduction in test length
1. Feasibility, applicability, planning
Another question: Business Case Evaluation
Example:
You deliver 100,000 tests per year
You estimate $20/hour seat time
Reducing a test from 2 hours to 1 hour then saves $2 million per year
More difficult to estimate for K-12 – cost is not seat time but time away from instruction
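The arithmetic behind the example, as a small sketch (the function name is hypothetical; the figures are the ones on the slide):

```python
def annual_seat_time_savings(tests_per_year, cost_per_hour, hours_before, hours_after):
    """Yearly saving in dollars from shortening every delivered test."""
    hours_saved = hours_before - hours_after
    return tests_per_year * cost_per_hour * hours_saved


savings = annual_seat_time_savings(100_000, 20.0, 2.0, 1.0)
print(f"${savings:,.0f} per year")  # $2,000,000 per year
```

The same function can be rerun with the mean test length from each simulated condition to compare the savings against development cost.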
2. Develop item bank
Now that we have an idea of what we need, we need to build it
CAT-based considerations:
Difficulty spread
Anticipated exposure/security issues
Test information function (TIF) adequacy
Normal considerations
Content blueprints
Cognitive level
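One way to check TIF adequacy is to compare the bank's information with what the target SEM requires, using SEM = 1/sqrt(information). A minimal sketch, assuming a 2PL model; `weak_regions` is a hypothetical helper. Because a CAT administers only part of the bank, the full-bank TIF is an upper bound, so any θ flagged here cannot reach the target at all.

```python
import math


def item_info(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def tif(theta, bank):
    """Test information function: sum of item information over the bank."""
    return sum(item_info(theta, a, b) for a, b in bank)


def weak_regions(bank, target_sem, thetas):
    """Theta points where even the full bank cannot reach target_sem."""
    needed_info = 1.0 / target_sem ** 2
    return [t for t in thetas if tif(t, bank) < needed_info]


# Illustrative bank: 60 items with a = 1.0 and difficulties only in [-1, 1].
bank = [(1.0, -1.0 + 2.0 * k / 59) for k in range(60)]
thetas = [-3.0 + 0.5 * k for k in range(13)]
print(weak_regions(bank, target_sem=0.30, thetas=thetas))
```

A bank with a narrow difficulty spread, like the illustrative one, shows gaps at the extremes, which is where new item writing would be targeted.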
3. Pretesting and analysis
Must pretest items to obtain bank calibration
Two situations
New test, new scale: present large numbers of items to examinees
Existing test, old scale: seed items
Will obviously take longer to pilot
Requires a linking study
3. Pretesting and analysis
Then calibrate, usually with IRT
Also perform other due diligence
Dimensionality
Differential item functioning (DIF)
Model fit
Distractor analysis
Remove/revise items based on stats?
Etc.
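As one small piece of this due diligence, items can be screened on classical statistics before or alongside IRT calibration. The sketch below is not an IRT calibration; it computes proportion correct and an item-rest point-biserial, and the flagging thresholds are illustrative assumptions rather than recommended values.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def item_statistics(responses):
    """Per item: (proportion correct, correlation with the rest score)."""
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [total - u for total, u in zip(totals, item)]  # exclude the item itself
        stats.append((sum(item) / len(item), pearson(item, rest)))
    return stats


def flag_items(stats, min_p=0.10, max_p=0.95, min_r=0.10):
    """Indices of items that are too hard, too easy, or poorly discriminating."""
    return [j for j, (p, r) in enumerate(stats) if p < min_p or p > max_p or r < min_r]
```

Flagged items are candidates for review, not automatic removal.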
4. Determine final specifications
To publish a CAT, we need to specify algorithms
Starting point
Item selection
Scoring
Termination criterion
Also subalgorithms, such as item exposure, content, test length constraints
4. Determine final specifications
But we must have a reason for selecting specifications
Validity documentation
Defensibility
Again, we turn to simulation studies
Define competing conditions
Big difference now: we have real data!
Post Hoc or Hybrid simulations
Example sim study
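The difference between post hoc and hybrid simulation lies in where each response comes from. A minimal sketch, assuming a 2PL model and a response matrix in which `None` marks items the examinee never saw; `make_hybrid_answer` and its arguments are hypothetical names.

```python
import math
import random


def make_hybrid_answer(person_row, theta_hat, bank, rng):
    """Build answer(item_id) for one examinee in a hybrid simulation.

    person_row: real responses (1/0), with None where no response exists.
    theta_hat:  the examinee's theta estimated from the real responses.
    bank:       list of calibrated (a, b) item parameters.
    """
    def answer(item_id):
        observed = person_row[item_id]
        if observed is not None:
            return observed  # post hoc part: replay the real response
        a, b = bank[item_id]
        p = 1.0 / (1.0 + math.exp(-a * (theta_hat - b)))
        return 1 if rng.random() < p else 0  # Monte Carlo part: impute from the model

    return answer


bank = [(1.0, -0.5), (1.2, 0.0), (0.9, 0.8)]
answer = make_hybrid_answer([1, None, 0], theta_hat=0.3, bank=bank, rng=random.Random(7))
print(answer(0), answer(2))  # 1 0
```

With a complete response matrix the imputation branch is never reached and the study is purely post hoc.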
4. Determine final specifications
After determining psychometric specifications, evaluate more practical issues
For example, time limits; can’t really be set until you know how many items will be administered
CAT-ASVAB approach: set limits that allow 90-95% of the population to finish
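A limit of this kind can be read off the distribution of completion times. A minimal sketch using a nearest-rank percentile; the function name and the sample times are made up for illustration.

```python
import math


def time_limit(completion_minutes, coverage_percent=95):
    """Smallest observed time within which coverage_percent of examinees finished."""
    ordered = sorted(completion_minutes)
    rank = max(1, math.ceil(coverage_percent * len(ordered) / 100))  # nearest rank
    return ordered[rank - 1]


# Hypothetical completion times in minutes, e.g. from simulations or a pilot.
times = [30, 35, 40, 45, 50, 55, 60, 65, 70, 120]
print(time_limit(times, coverage_percent=90))  # 70
```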
5. Publish live CAT
Once you have finalized your item bank and CAT design, time to publish
Need to put everything into item banker and CAT engine
First: obtain the item banker and CAT engine
If developing your own, this can be the biggest step
If purchasing, this is the easiest step
Epilogue: Maintaining CAT
Like fixed form testing, maintenance is usually necessary
Check that the CAT is performing as expected
Is the termination criterion being satisfied?
Are examinees hitting test length or other constraints?
Is average test length what you expected?
Exposure or security issues?
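These checks can be computed from live session records. A minimal sketch: the record layout (`items`, `final_sem`) and the function name are assumptions, and each session is assumed to list every administered item once.

```python
from collections import Counter


def monitor(sessions, target_sem, max_items, bank_size):
    """Summarize live CAT sessions against the design expectations."""
    n = len(sessions)
    lengths = [len(s["items"]) for s in sessions]
    met_sem = sum(1 for s in sessions if s["final_sem"] <= target_sem)
    hit_max = sum(1 for length in lengths if length >= max_items)
    exposure = Counter(item for s in sessions for item in s["items"])
    return {
        "pct_met_sem": met_sem / n,                 # termination criterion satisfied?
        "pct_hit_max_length": hit_max / n,          # examinees hitting the length cap?
        "mean_length": sum(lengths) / n,            # compare with the simulated value
        "max_exposure_rate": max(exposure.values()) / n,
        "unused_items": bank_size - len(exposure),
    }


sessions = [
    {"items": [0, 1, 2], "final_sem": 0.28},
    {"items": [0, 3, 4, 5], "final_sem": 0.35},
]
report = monitor(sessions, target_sem=0.30, max_items=4, bank_size=10)
```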
Thank you!
nthompson@assess.com
See PARE, Volume 16, #1
