Computer Generated Items,
Within-Template Variation, and the
Impact on the Parameters of Response
Models
Quinn N Lathrop
University of Notre Dame
January 8, 2015
Item Response Theory
[Figure: probability of a correct response plotted against Ability/Difficulty; x-axis from −3 to 3, y-axis "Prob Correct" from 0.0 to 1.0]
i = 1, 2, ..., I for Items
p = 1, 2, ..., N for Persons
Ypi = 0 or 1
ηpi = αi × (θp − βi)
P(Ypi = 1) = exp(ηpi) / (1 + exp(ηpi))
θp is Person Ability
βi is Item Difficulty
αi is Item Discrimination
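The model above can be sketched in a few lines of Python. This is only an illustration of the formula on the slide; the function name `prob_correct` and the example parameter values are assumptions, not part of the talk.

```python
import math


def prob_correct(theta, beta, alpha=1.0):
    """Probability of a correct response under the two-parameter model.

    eta = alpha * (theta - beta), and P(Y = 1) = exp(eta) / (1 + exp(eta)),
    written here in the algebraically equivalent form 1 / (1 + exp(-eta)).
    """
    eta = alpha * (theta - beta)
    return 1.0 / (1.0 + math.exp(-eta))


# When ability equals difficulty, eta is 0 and the probability is one half,
# regardless of the discrimination value (1.4 here is an arbitrary choice).
print(prob_correct(theta=0.0, beta=0.0, alpha=1.4))
# A more able person has a higher chance of answering the same item correctly.
print(prob_correct(theta=1.0, beta=0.0) > prob_correct(theta=0.0, beta=0.0))
```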
New Technology Leads to New Psychometrics
Summer Internship with Pearson’s Center for Digital Data,
Analytics & Adaptive Learning
Computer/tablet-based course: lots of data
Technology allows for algorithmically generated items (the generators
are called Templates)
What are templates?
Templates generate items/tasks during computerized assessment.
• Create a secure and inexpensive item bank
• Create miniature randomized experiments
• Students can repeat and practice templates
Templates contain:
• a question form
• distributions for all variables in the question form
Examples of Templates
Question Form:
“What is X + Y ?”
Distributions:
fX (x) = 1/5, x ∈ {1, 2, 3, 4, 5}
fY (y) = 1/6, y ∈ {3, 4, 5, 6, 7, 8}
Question Form:
“What is the average of x1, x2, x3, x4, x5?”
Distributions:
x1, ..., x5 ∼ Binom(40, .5)
Our Motivating Template
Question Form:
“What is the probability of rolling a X on a Y-sided die?”
Distributions:
fY (y) = 1/5, y ∈ {6, 8, 10, 12, 20}
fX (x) = 1/y, x ∈ {1, 2, ..., y}
Correct strategy: 1/y
An incorrect strategy: x/y
For a subset (x = 1), students
can use the wrong strategy and
still get the correct answer!
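As a rough sketch of how such a template could generate items, the Python below draws Y and then X from the distributions on the slide and flags the x = 1 subset. The function name `draw_die_item`, the dictionary layout, and the seed are illustrative assumptions, not the system described in the talk.

```python
import random

DIE_SIDES = [6, 8, 10, 12, 20]  # support of Y from the slide


def draw_die_item(rng):
    """Draw one item: Y uniform on DIE_SIDES, then X uniform on 1..Y."""
    y = rng.choice(DIE_SIDES)
    x = rng.randint(1, y)  # randint includes both endpoints
    return {
        "x": x,
        "y": y,
        "text": f"What is the probability of rolling a {x} on a {y}-sided die?",
        # For x = 1 the incorrect strategy x/y coincides with the correct 1/y.
        "in_subset": int(x == 1),
    }


rng = random.Random(2015)  # arbitrary seed, only for reproducibility
for item in (draw_die_item(rng) for _ in range(5)):
    print(item["text"], "| subset:", item["in_subset"])
```

With these distributions the template contains 6 + 8 + 10 + 12 + 20 = 56 distinct items, of which 5 (one per die) fall in the x = 1 subset.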
Model within-template differences with multi-level IRT?
• Albers (1995)
• Glas and van der Linden (2003)
• Johnson and Sinharay (2005)
Model within-template differences with covariates?
• Fischer (1973)
• de Boeck and Wilson (2004)
Both?
Neither?
Publications/Presentations
Lathrop, Q. N. & Cheng, Y. (Under Review). Computer Generated
Items, Within-Template Variation, and the Impact on the Parameters of
Response Models.
Lathrop, Q. N. (2014). The Impact of Within-Template Systematic
Variation. Presented at NCME and regional conferences.
Lathrop, Q. N. & Behrens, J. (2014). Psychometric, Computational, and
Interactional Issues in Designing Integrated Assessment and Learning Systems.
Presented at AERA.
Lathrop, Q. N. & Cheng, Y. (2013). Modeling Tests Using Templates
and Effect of Ignoring Template Structure on Educational Outcomes.
Presented at NCME.
Notation Changes
Each person responds to templates
• p = 1, 2, ..., N for persons and t = 1, 2, ..., T for templates
Each template has some number of items
• ti = 1, 2, ..., tI
The items within a template may be grouped by a design matrix
• A dummy variable Xti = 1 if ti is in the “subset” and 0 otherwise
When person p is assigned template t, item ti is randomly drawn from available items
• The response is Ypti ∼ Bernoulli(exp(ηpti) / (1 + exp(ηpti)))
Four Models
2P-T: ηpt = αt × (θp − µt)
• the “neither” option, just a template-level IRT model
2P-TX: ηpti = αt × (θp − µt + λtXti)
• adds a coefficient λt on the covariate Xti to explain the differences encoded in Xti
2P-R: ηpti = αt × (θp − βti), with βti ∼ N(µt, σt)
• multi-level model
2P-RX: ηpti = αt × (θp − βti + λtXti), with βti ∼ N(µt, σt)
• the “both” option
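To make the relationship between the four models concrete, the following Python sketch writes each linear predictor as a small function, using the same symbols as the slide. The function names are assumptions made for illustration; the estimation itself (done with MCMC in the talk) is not shown.

```python
def eta_2p_t(theta_p, alpha_t, mu_t):
    """2P-T: template-level predictor, no within-template structure."""
    return alpha_t * (theta_p - mu_t)


def eta_2p_tx(theta_p, alpha_t, mu_t, lambda_t, x_ti):
    """2P-TX: template difficulty shifted by lambda_t when x_ti = 1."""
    return alpha_t * (theta_p - mu_t + lambda_t * x_ti)


def eta_2p_r(theta_p, alpha_t, beta_ti):
    """2P-R: item-level difficulty beta_ti, drawn around mu_t in the model."""
    return alpha_t * (theta_p - beta_ti)


def eta_2p_rx(theta_p, alpha_t, beta_ti, lambda_t, x_ti):
    """2P-RX: item-level difficulty plus the covariate shift."""
    return alpha_t * (theta_p - beta_ti + lambda_t * x_ti)


# The simpler models are special cases: lambda_t = 0 removes the covariate,
# and beta_ti = mu_t for every item removes the random item variation.
print(eta_2p_tx(0.5, 1.2, -0.3, 0.0, 1) == eta_2p_t(0.5, 1.2, -0.3))
print(eta_2p_rx(0.5, 1.2, -0.3, 2.0, 0) == eta_2p_r(0.5, 1.2, -0.3))
```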
“What is the probability of rolling a X on a Y-sided die?”
[Figure: four panels, one per model (2P-T, 2P-TX, 2P-R, 2P-RX); x-axis from 1 to 20, y-axis "Prob of Correct" from 0.70 to 1.00]
Parameter Estimation and Model Evaluation
          2P-T     2P-TX    2P-R     2P-RX
αt        1.382    1.383    1.439    1.412
µt       -0.624   -0.484   -0.758   -0.557
λt        -        2.381    -        2.279
σt        -        -        0.493    0.332
DIC       423.1    405.8    391.6    391.8
cv-AUC    0.653    0.705    0.664    0.698
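A short worked calculation may help in reading the size of λt. The Python sketch below plugs the 2P-TX estimates from the table into that model's linear predictor. Evaluating at θ = 0 is an assumption made purely for illustration, since the slides do not state the ability scale or the sample's ability distribution.

```python
import math

# Estimates copied from the 2P-TX column of the table above.
ALPHA_T, MU_T, LAMBDA_T = 1.383, -0.484, 2.381


def p_correct(theta, x):
    """P(correct) under 2P-TX: logistic(alpha * (theta - mu + lambda * x))."""
    eta = ALPHA_T * (theta - MU_T + LAMBDA_T * x)
    return 1.0 / (1.0 + math.exp(-eta))


# theta = 0 is an illustrative choice, not a value taken from the talk.
p_other = p_correct(0.0, 0)   # items outside the subset (x != 1)
p_subset = p_correct(0.0, 1)  # items in the subset (x = 1)
print(round(p_other, 2), round(p_subset, 2))
```

Under that assumption the probability of a correct answer is about 0.66 outside the subset and about 0.98 inside it, which is the kind of gap the covariate is meant to capture.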
Model selection may be out of our hands
Person ID   Template ID   Item ID   Response   X
1           1             12        0          0
1           2             37        1          0
...         ...           ...       ...        ...
1           T             2         0          1
2           1             7         1          1
...         ...           ...       ...        ...
N           T             It        1          1
Simulation Study
Simulate data (so we know the true values of the parameters)
Fit all four models (2P-T, 2P-TX, 2P-R, and 2P-RX) with
MCMC
Compare their results
What happens if we fit the simpler 2P-T? (ignore template
variability and ignore systematic variation)
Quick summary of MCMC inference
Bayesian analysis combines the data, our model for the data
(likelihood), and any prior information.
The results of MCMC are samples from the posterior distribution of all
parameters of interest:
          parameters
iterations theta[40]
      [1,] -1.590404
      [2,] -1.625150
      [3,] -1.676880
      [4,] -1.986976
      [5,] -1.808551
      [6,] -1.562125
      [7,] -1.837187
      [8,] -1.518175
• Point estimate is the average across iterations.
• Standard error is the standard deviation across iterations.
• Hypothesis testing can be done with posterior intervals (like confidence
intervals).
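The three summaries can be computed directly from the draws. The Python sketch below uses the eight theta[40] values shown on the slide; the helper `central_interval` and its nearest-rank rule are assumptions chosen for brevity, and eight draws are far too few for a real analysis.

```python
import statistics

# The eight draws of theta[40] shown on the slide.
draws = [-1.590404, -1.625150, -1.676880, -1.986976,
         -1.808551, -1.562125, -1.837187, -1.518175]

point_estimate = statistics.fmean(draws)   # average across iterations
standard_error = statistics.stdev(draws)   # standard deviation across iterations


def central_interval(samples, level=0.95):
    """Crude central posterior interval using nearest-rank order statistics."""
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * (len(ordered) - 1))
    hi_idx = len(ordered) - 1 - lo_idx
    return ordered[lo_idx], ordered[hi_idx]


lower, upper = central_interval(draws)
print(round(point_estimate, 6), round(standard_error, 3), (lower, upper))
# An interval that excludes 0 plays the role of a rejected null hypothesis.
print("excludes zero:", not (lower <= 0.0 <= upper))
```

With so few draws the interval collapses to the minimum and maximum; with thousands of iterations the same function returns the 2.5% and 97.5% order statistics.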
Simulation Study
1000 persons answering 40 templates
Each template has 12 or 100 items
σt = 0, σt ∼ |N(0, .3)|, or σt ∼ |N(0, .6)|
For X, a random 25% of items within a template belong to the
“subset”
λt is zero (Type I error) or nonzero
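One cell of this design could be simulated roughly as follows, producing rows in the same long format as the data table (person, template, item, response, X). Several choices here are assumptions that the slides do not specify: θp and µt drawn from N(0, 1), αt fixed at 1, the second argument of N(0, .3) read as a standard deviation, and a single λ shared by all templates.

```python
import math
import random


def simulate(n_persons=1000, n_templates=40, items_per_template=12,
             sigma_scale=0.3, lambda_value=0.0, seed=1):
    """Simulate responses under the 2P-RX data-generating model."""
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]  # assumed N(0, 1)
    templates = []
    for _ in range(n_templates):
        mu = rng.gauss(0.0, 1.0)                    # assumed N(0, 1)
        sigma = abs(rng.gauss(0.0, sigma_scale))    # |N(0, scale)|; 0 if scale is 0
        beta = [rng.gauss(mu, sigma) for _ in range(items_per_template)]
        # A random 25% of the items in each template belong to the subset.
        n_subset = round(0.25 * items_per_template)
        subset = set(rng.sample(range(items_per_template), n_subset))
        templates.append({"alpha": 1.0, "beta": beta, "subset": subset})

    rows = []
    for p in range(n_persons):
        for t, tpl in enumerate(templates):
            i = rng.randrange(items_per_template)   # item drawn at random
            x = int(i in tpl["subset"])
            eta = tpl["alpha"] * (theta[p] - tpl["beta"][i] + lambda_value * x)
            y = int(rng.random() < 1.0 / (1.0 + math.exp(-eta)))
            rows.append((p, t, i, y, x))
    return rows


rows = simulate(n_persons=20, n_templates=5)  # small run for a quick look
print(len(rows), rows[0])
```

Setting `lambda_value=0.0` gives the Type I error condition, and `sigma_scale=0.0` gives templates with no random item variation.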
How does systematic variation affect α̂t?
How does systematic variation affect σ̂t?
How does systematic variation affect µ̂t?
What about λ̂t?
The covariate performs well in terms of bias.
But the 2P-TX has very high Type I error.
And the 2P-RX properly controls Type I error.
There is a clear benefit to the other parameters in the model.
Specification of Xti is the limiting factor.
Review
ηpti = αt × (θp − βti + λtXti)
βti ∼ N(µt, σt)
How does systematic variation affect θ̂p?
Is there any situation that biases θ̂p?
[Figure: bias of θ̂ (y-axis, −0.02 to 0.03) plotted against received ∑λX (x-axis, −2 to 2), with one series per model: 2P-T, 2P-TX, 2P-R, 2P-RX]
Implications - what is our inferential focus?
While templates are increasingly used, there is relatively little
methodological work
• If within-template variation exists, the 2P-TX, 2P-R, and 2P-RX
models can account for it
• Useful for item analysis, item selection, and other item-based
inferences
• But it does not meaningfully affect inferences based on θ
The Simple 2P-T Model
• while it cannot uncover the within-template effects, it can still
measure θ very well
• and the 2P-T’s discrimination parameter can be used to
screen for high within-template variation
Already used in large assessment and learning systems
Data Collection is Key
Many systems have thousands of templates each with
potentially thousands of items.
• Is the item index being recorded (needed for R
models)?
• How do we organize the items by meaningful
dimensions in X (needed for X models)?
If not, the 2P-T is generally the only option.
If we don’t collect the data, we can’t even begin to ask.
Thank you
Quinn N Lathrop
qlathrop@nd.edu
irtND.wikispaces.com
30
31

More Related Content

PPT
Active learning lecture
PPT
Machine learning
PDF
Rohan's Masters presentation
PPTX
Deep Learning Opening Workshop - Improving Generative Models - Junier Oliva, ...
PPT
4646150.ppt
PPT
ppt
PPT
ppt
PPTX
Math Exam Help
Active learning lecture
Machine learning
Rohan's Masters presentation
Deep Learning Opening Workshop - Improving Generative Models - Junier Oliva, ...
4646150.ppt
ppt
ppt
Math Exam Help

Similar to Computer Generated Items, Within-Template Variation, and the Impact on the Parameters of Response Models (20)

PDF
Intro to Classification: Logistic Regression & SVM
PPTX
PREDICT 422 - Module 1.pptx
PDF
Machine Learning Foundations
PDF
Mixed Effects Models - Fixed Effects
PPTX
ItemResponseTheory+ComputerizedAdaptiveTesting.pptx
PPTX
Data simulation basics
PPT
Machine Learning
PPT
MLlectureMethod.ppt
PPT
MLlectureMethod.ppt
PDF
Statistical Modeling: The Two Cultures
PPTX
Simulating data to gain insights into power and p-hacking
PPTX
Introduction to Item Response Theory
PDF
Pattern Recognition
PDF
A/B Testing for Game Design
PDF
Microchip Mfg. problem
PPT
AML_030607.ppt
PDF
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
PPT
COMP60431 Machine Learning Advanced Computer Science MSc
PPTX
Introduction to simulating data to improve your research
PDF
ICTIR2016tutorial
Intro to Classification: Logistic Regression & SVM
PREDICT 422 - Module 1.pptx
Machine Learning Foundations
Mixed Effects Models - Fixed Effects
ItemResponseTheory+ComputerizedAdaptiveTesting.pptx
Data simulation basics
Machine Learning
MLlectureMethod.ppt
MLlectureMethod.ppt
Statistical Modeling: The Two Cultures
Simulating data to gain insights into power and p-hacking
Introduction to Item Response Theory
Pattern Recognition
A/B Testing for Game Design
Microchip Mfg. problem
AML_030607.ppt
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
COMP60431 Machine Learning Advanced Computer Science MSc
Introduction to simulating data to improve your research
ICTIR2016tutorial
Ad

Recently uploaded (20)

PDF
Heart disease approach using modified random forest and particle swarm optimi...
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PPT
Teaching material agriculture food technology
PDF
Empathic Computing: Creating Shared Understanding
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
August Patch Tuesday
PPTX
OMC Textile Division Presentation 2021.pptx
PDF
Approach and Philosophy of On baking technology
PDF
Accuracy of neural networks in brain wave diagnosis of schizophrenia
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Machine learning based COVID-19 study performance prediction
PPTX
1. Introduction to Computer Programming.pptx
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
Univ-Connecticut-ChatGPT-Presentaion.pdf
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PPTX
Group 1 Presentation -Planning and Decision Making .pptx
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Heart disease approach using modified random forest and particle swarm optimi...
Spectral efficient network and resource selection model in 5G networks
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Teaching material agriculture food technology
Empathic Computing: Creating Shared Understanding
MIND Revenue Release Quarter 2 2025 Press Release
August Patch Tuesday
OMC Textile Division Presentation 2021.pptx
Approach and Philosophy of On baking technology
Accuracy of neural networks in brain wave diagnosis of schizophrenia
Programs and apps: productivity, graphics, security and other tools
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Advanced methodologies resolving dimensionality complications for autism neur...
Machine learning based COVID-19 study performance prediction
1. Introduction to Computer Programming.pptx
Building Integrated photovoltaic BIPV_UPV.pdf
Univ-Connecticut-ChatGPT-Presentaion.pdf
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Group 1 Presentation -Planning and Decision Making .pptx
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Ad

Computer Generated Items, Within-Template Variation, and the Impact on the Parameters of Response Models

  • 1. 0
  • 2. Computer Generated Items, Within-Template Variation, and the Impact on the Parameters of Response Models Quinn N Lathrop University of Notre Dame January 8, 2015 1
  • 4. Item Response Theory Ability/Difficulty Prob Correct -3 -2 -1 0 1 2 3 0.0 0.2 0.4 0.6 0.8 1.0 2
  • 5. Item Response Theory i = 1, 2, ..., I for Items p = 1, 2, ..., N for Persons Ypi = 0 or 1 ηpi = αi × (θp − βi) P(Ypi = 1) = exp(ηpi) 1 + exp(ηpi) θp is Person Ability βi is Item Difficulty αi is Item Discrimination 3
  • 6. New Technology Leads to New Psychometrics Summer Internship with Pearson’s Center for Digital Data, Analytics & Adaptive Learning Computer/tablet-based course: lots of data Technology allows for algorithmically generated items (called Templates) 4
  • 7. 5
  • 9. What are templates? Templates generate items/tasks during computerized assessment. 6
  • 10. What are templates? Templates generate items/tasks during computerized assessment. I Creates secure and inexpensive item bank I Creates miniature randomized experiments I Students can repeat and practice templates 6
  • 11. What are templates? Templates generate items/tasks during computerized assessment. I Creates secure and inexpensive item bank I Creates miniature randomized experiments I Students can repeat and practice templates Templates contain: 6
  • 12. What are templates? Templates generate items/tasks during computerized assessment. I Creates secure and inexpensive item bank I Creates miniature randomized experiments I Students can repeat and practice templates Templates contain: I a question form 6
  • 13. What are templates? Templates generate items/tasks during computerized assessment. I Creates secure and inexpensive item bank I Creates miniature randomized experiments I Students can repeat and practice templates Templates contain: I a question form I distributions for all variables in the question form 6
  • 15. Examples of Templates Question Form: “What is X + Y ?” 7
  • 16. Examples of Templates Question Form: “What is X + Y ?” Distributions: fX (x) = 1/5, x ∈ {1, 2, 3, 4, 5} fY (y) = 1/6, y ∈ {3, 4, 5, 6, 7, 8} 7
  • 17. Examples of Templates Question Form: “What is X + Y ?” Distributions: fX (x) = 1/5, x ∈ {1, 2, 3, 4, 5} fY (y) = 1/6, y ∈ {3, 4, 5, 6, 7, 8} Question Form: “What is the average of x1, x2, x3, x4, x5?” 7
  • 18. Examples of Templates Question Form: “What is X + Y ?” Distributions: fX (x) = 1/5, x ∈ {1, 2, 3, 4, 5} fY (y) = 1/6, y ∈ {3, 4, 5, 6, 7, 8} Question Form: “What is the average of x1, x2, x3, x4, x5?” Distributions: x1−5 ∼ Binom(40, .5) 7
  • 20. Our Motivating Template Question Form: “What is the probability of rolling a X on a Y -sided die?” 8
  • 21. Our Motivating Template Question Form: “What is the probability of rolling a X on a Y -sided die?” Distributions: fY (y) = 1/5, y ∈ {6, 8, 10, 12, 20} fX (x) = 1/y, x ∈ {1, 2, ..., y} 8
  • 22. Our Motivating Template Question Form: “What is the probability of rolling a X on a Y -sided die?” Distributions: fY (y) = 1/5, y ∈ {6, 8, 10, 12, 20} fX (x) = 1/y, x ∈ {1, 2, ..., y} Correct strategy: 1 y 8
  • 23. Our Motivating Template Question Form: “What is the probability of rolling a X on a Y -sided die?” Distributions: fY (y) = 1/5, y ∈ {6, 8, 10, 12, 20} fX (x) = 1/y, x ∈ {1, 2, ..., y} Correct strategy: 1 y An incorrect strategy: x y 8
  • 24. Our Motivating Template Question Form: “What is the probability of rolling a X on a Y -sided die?” Distributions: fY (y) = 1/5, y ∈ {6, 8, 10, 12, 20} fX (x) = 1/y, x ∈ {1, 2, ..., y} Correct strategy: 1 y An incorrect strategy: x y For a subset (x = 1), students can use the wrong strategy and still get the correct answer! 8
  • 26. Model within-template differences with multi-level IRT? 10
  • 27. Model within-template differences with multi-level IRT? • Albers (1995) • Glas and van der Linden (2003) • Johnson and Sinharay (2005) 10
  • 28. Model within-template differences with multi-level IRT? • Albers (1995) • Glas and van der Linden (2003) • Johnson and Sinharay (2005) Model within-template differences with covariates? 10
  • 29. Model within-template differences with multi-level IRT? • Albers (1995) • Glas and van der Linden (2003) • Johnson and Sinharay (2005) Model within-template differences with covariates? • Fischer (1973) • de Boeck and Wilson (2004) 10
  • 30. Model within-template differences with multi-level IRT? • Albers (1995) • Glas and van der Linden (2003) • Johnson and Sinharay (2005) Model within-template differences with covariates? • Fischer (1973) • de Boeck and Wilson (2004) Both? 10
  • 31. Model within-template differences with multi-level IRT? • Albers (1995) • Glas and van der Linden (2003) • Johnson and Sinharay (2005) Model within-template differences with covariates? • Fischer (1973) • de Boeck and Wilson (2004) Both? Neither? 10
  • 32. Publications/Presentations Lathrop, Q. N. & Cheng, Y. (Under Review). Computer Generated Items, Within-Template Variation, and the Impact on the Parameters of Response Models. Lathrop, Q. N. (2014). The Impact of Within-Template Systematic Variation. Presented at NCME and regional conferences. Lathrop, Q. N. & Behrens, J. (2014). Psychometric, Computational, and Interactional Issues in Designing Integrated Assessment and Learning Systems. Presented at the AERA. Lathrop, Q. N. & Cheng, Y. (2013). Modeling Tests Using Templates and Effect of Ignoring Template Structure on Educational Outcomes. Presented at NCME 11
  • 34. Notation Changes Each person responds to templates • p = 1, 2, ..., N for persons and t = 1, 2, ..., T for templates 12
  • 35. Notation Changes Each person responds to templates • p = 1, 2, ..., N for persons and t = 1, 2, ..., T for templates Each template has some number of items • ti = 1, 2, ..., tI 12
  • 36. Notation Changes Each person responds to templates • p = 1, 2, ..., N for persons and t = 1, 2, ..., T for templates Each template has some number of items • ti = 1, 2, ..., tI The items within a template may be grouped by a design matrix • A dummy variable Xti = 1 if ti is in the “subset” and 0 otherwise 12
  • 37. Notation Changes Each person responds to templates • p = 1, 2, ..., N for persons and t = 1, 2, ..., T for templates Each template has some number of items • ti = 1, 2, ..., tI The items within a template may be grouped by a design matrix • A dummy variable Xti = 1 if ti is in the “subset” and 0 otherwise When person p is assigned template t, item ti is randomly drawn from available items • The response is Ypti ∼ Bernoulli( exp(ηpti ) 1+exp(ηpti )) 12
  • 39. Four Models 2P-T ηpt = αt × (θp − µt) • the “neither” option, just a template level IRT model 13
  • 40. Four Models 2P-T ηpt = αt × (θp − µt) • the “neither” option, just a template level IRT model 2P-TX ηpti = αt × (θp − µt + λtXti ) • adds a covariate, λt to explain differences contained in Xti 13
  • 41. Four Models 2P-T ηpt = αt × (θp − µt) • the “neither” option, just a template level IRT model 2P-TX ηpti = αt × (θp − µt + λtXti ) • adds a covariate, λt to explain differences contained in Xti 2P-R ηpti = αt × (θp − βti ) βti ∼ N(µt, σt) • multi-level model 13
  • 42. Four Models 2P-T ηpt = αt × (θp − µt) • the “neither” option, just a template level IRT model 2P-TX ηpti = αt × (θp − µt + λtXti ) • adds a covariate, λt to explain differences contained in Xti 2P-R ηpti = αt × (θp − βti ) βti ∼ N(µt, σt) • multi-level model 2P-RX ηpti = αt × (θp − βti + λtXti ) βti ∼ N(µt, σt) • the “both” option 13
  • 43. “What is the probability of rolling a X on a Y -sided die?” 14
  • 44. “What is the probability of rolling a X on a Y -sided die?” 15
  • 45. “What is the probability of rolling a X on a Y -sided die?” 2P-T Prob of Correct 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0.70 0.75 0.80 0.85 0.90 0.95 1.00 15
  • 46. “What is the probability of rolling a X on a Y -sided die?” 2P-T Prob of Correct 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0.70 0.75 0.80 0.85 0.90 0.95 1.00 2P-TX Prob of Correct 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0.70 0.75 0.80 0.85 0.90 0.95 1.00 15
  • 47. “What is the probability of rolling a X on a Y -sided die?” 2P-T Prob of Correct 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0.70 0.75 0.80 0.85 0.90 0.95 1.00 2P-TX Prob of Correct 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0.70 0.75 0.80 0.85 0.90 0.95 1.00 2P-R Prob of Correct 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0.70 0.75 0.80 0.85 0.90 0.95 1.00 15
  • 48. “What is the probability of rolling a X on a Y -sided die?” 2P-T Prob of Correct 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0.70 0.75 0.80 0.85 0.90 0.95 1.00 2P-TX Prob of Correct 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0.70 0.75 0.80 0.85 0.90 0.95 1.00 2P-R Prob of Correct 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0.70 0.75 0.80 0.85 0.90 0.95 1.00 2P-RX Prob of Correct 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0.70 0.75 0.80 0.85 0.90 0.95 1.00 15
  • 49. Parameter Estimation and Model Evaluation
                    2P-T     2P-TX    2P-R     2P-RX
      $\alpha_t$    1.382    1.383    1.439    1.412
      $\mu_t$      -0.624   -0.484   -0.758   -0.557
      $\lambda_t$   -        2.381    -        2.279
      $\sigma_t$    -        -        0.493    0.332
      DIC           423.1    405.8    391.6    391.8
      cv-AUC        0.653    0.705    0.664    0.698
  • 50. Model selection may be out of our hands
      Person ID   Template ID   Item ID   Response   X
      1           1             12        0          0
      1           2             37        1          0
      ...         ...           ...       ...        ...
      1           T             2         0          1
      2           1             7         1          1
      ...         ...           ...       ...        ...
      N           T             $I_t$     1          1
  • 55. Simulation Study
      Simulate data (so we know the true values of the parameters).
      Fit all four models (2P-T, 2P-TX, 2P-R, and 2P-RX) with MCMC.
      Compare their results.
      What happens if we fit the simpler 2P-T, ignoring both within-template variability and systematic variation?
  • 62. Quick summary of MCMC inference
      Bayesian analysis combines the data, our model for the data (the likelihood), and any prior information.
      The results of MCMC are samples from the posterior distribution of all parameters of interest, for example:
        iteration   theta[40]
        [1,]        -1.590404
        [2,]        -1.625150
        [3,]        -1.676880
        [4,]        -1.986976
        [5,]        -1.808551
        [6,]        -1.562125
        [7,]        -1.837187
        [8,]        -1.518175
      - The point estimate is the average across iterations.
      - The standard error is the standard deviation across iterations.
      - Hypothesis testing can be done with posterior intervals (like confidence intervals).
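The three summaries can be computed directly from the eight `theta[40]` draws shown on the slide. The sketch below uses only the Python standard library. The helper names `quantile` and `posterior_interval` are hypothetical, the interval is a simple equal-tailed one based on linear interpolation, and eight draws are far too few for a real analysis.

```python
import math
import statistics

# The eight MCMC draws of theta[40] shown on the slide.
draws = [-1.590404, -1.625150, -1.676880, -1.986976,
         -1.808551, -1.562125, -1.837187, -1.518175]

# Point estimate: the average across iterations.
post_mean = statistics.mean(draws)

# Standard error: the standard deviation across iterations.
post_sd = statistics.stdev(draws)


def quantile(sorted_samples, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_samples) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_samples) - 1)
    frac = pos - lower
    return sorted_samples[lower] * (1 - frac) + sorted_samples[upper] * frac


def posterior_interval(samples, level=0.95):
    """Equal-tailed posterior interval at the given level."""
    ordered = sorted(samples)
    tail = (1 - level) / 2
    return quantile(ordered, tail), quantile(ordered, 1 - tail)


lower, upper = posterior_interval(draws)
print(f"mean={post_mean:.3f} sd={post_sd:.3f} interval=({lower:.3f}, {upper:.3f})")
```

An interval that excludes zero is the posterior-interval analogue of rejecting the hypothesis that the parameter is zero.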
  • 67. Simulation Study
      1000 persons answering 40 templates.
      Each template has 12 or 100 items.
      $\sigma_t$ is 0, or is drawn from $|N(0, 0.3)|$ or $|N(0, 0.6)|$.
      For $X$, a random 25% of the items within a template belong to the “subset”.
      $\lambda_t$ is zero (to assess Type I error) or nonzero.
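One cell of this design could be generated along the following lines. This is a rough sketch, not the author's simulation code: the slides do not give the distributions of $\theta_p$, $\alpha_t$, or $\mu_t$, nor how items are assigned to persons, so the standard normal abilities, the uniform discriminations, the normal template difficulties, and the "one random item per person per template" rule are all assumptions. The second argument of $N(\cdot, \cdot)$ is also assumed to be a standard deviation, and the function name `simulate` is hypothetical.

```python
import math
import random


def simulate(n_persons=1000, n_templates=40, items_per_template=12,
             sigma_scale=0.3, lam=0.5, seed=1):
    """Simulate 2P-RX responses as (person, template, item, response, x) rows."""
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]  # assumed N(0, 1)
    rows = []
    for t in range(n_templates):
        alpha = rng.uniform(0.8, 1.6)  # assumed range for discrimination
        mu = rng.gauss(0.0, 1.0)       # assumed template difficulty
        # sigma_t is 0, or a half-normal draw |N(0, sigma_scale)|.
        sigma = abs(rng.gauss(0.0, sigma_scale)) if sigma_scale > 0 else 0.0
        # Random item difficulties within the template.
        beta = [rng.gauss(mu, sigma) for _ in range(items_per_template)]
        # A random 25% of the items belong to the "subset" (X = 1).
        n_subset = round(0.25 * items_per_template)
        subset = set(rng.sample(range(items_per_template), n_subset))
        for p in range(n_persons):
            i = rng.randrange(items_per_template)  # assumed: one random item
            x = 1 if i in subset else 0
            eta = alpha * (theta[p] - beta[i] + lam * x)
            y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0
            rows.append((p, t, i, y, x))
    return rows
```

Setting `sigma_scale=0` and `lam=0.0` gives data for which the 2P-T is the true model, which is the condition needed to check Type I error for $\lambda_t$.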
  • 68. How does systematic variation affect $\hat{\alpha}_t$?
  • 69. How does systematic variation affect $\hat{\sigma}_t$?
  • 70. How does systematic variation affect $\hat{\mu}_t$?
  • 76. What about $\hat{\lambda}_t$?
      The covariate performs well in terms of bias.
      But the 2P-TX has very high Type I error.
      And the 2P-RX properly controls Type I error.
      There is a clear benefit to the other parameters in the model.
      Specification of $X_{ti}$ is the limiting factor.
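In the conditions where the true $\lambda_t$ is zero, the Type I error rate is the share of fitted templates whose posterior interval for $\lambda_t$ excludes zero. A minimal sketch of that calculation follows. The function name `type_one_error_rate` and the example intervals are hypothetical and are not results from the study.

```python
def type_one_error_rate(intervals):
    """Share of (lower, upper) posterior intervals that exclude zero.

    Assumes every interval comes from a condition where the true value of
    the parameter is zero, so each exclusion is a false positive.
    """
    rejections = sum(1 for lower, upper in intervals if lower > 0 or upper < 0)
    return rejections / len(intervals)


# Hypothetical intervals for lambda_t from four templates with true lambda_t = 0.
example = [(-0.2, 0.3), (0.1, 0.5), (-0.4, -0.1), (-0.1, 0.1)]
print(type_one_error_rate(example))
```

With 95% intervals, a model that controls Type I error should give a rate near 0.05 across many replications.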
  • 77. Review: $\eta_{pti} = \alpha_t(\theta_p - \beta_{ti} + \lambda_t X_{ti})$, with $\beta_{ti} \sim N(\mu_t, \sigma_t)$
  • 78. How does systematic variation affect θ̂p? 26
  • 80. Is there any situation that biases $\hat{\theta}_p$? [Figure: bias of $\hat{\theta}$ (y-axis, about −0.02 to 0.03) against the received $\sum \lambda X$ (x-axis, −2 to 2), for the 2P-T, 2P-TX, 2P-R, and 2P-RX models.]
  • 89. Implications - what is our inferential focus?
      While templates are increasingly used, there is relatively little methodological work on them.
      - If within-template variation exists, the 2P-TX, 2P-R, and 2P-RX models can account for it.
      - This is useful for item analysis, item selection, and other item-based inferences.
      - But it does not meaningfully affect inferences based on $\theta$.
      The Simple 2P-T Model
      - While it cannot uncover the within-template effects, it can still measure $\theta$ very well.
      - The 2P-T’s discrimination parameter can be used to screen for high within-template variation.
      Already used in large assessment and learning systems.
  • 94. Data Collection is Key
      Many systems have thousands of templates, each with potentially thousands of items.
      - Is the item index being recorded (needed for the R models)?
      - How do we organize the items by meaningful dimensions in $X$ (needed for the X models)?
      If not, the 2P-T is generally the only option.
      If we don’t collect the data, we can’t even begin to ask.
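The two questions above map directly onto which of the four models can be fitted at all. The sketch below makes that mapping explicit. The function name `feasible_models` and its two boolean arguments are hypothetical, and it assumes the covariate $X$ can be attached to each response without the item index.

```python
def feasible_models(has_item_id, has_x):
    """List the models that the recorded data columns allow us to fit."""
    models = ["2P-T"]              # needs only person and template IDs
    if has_x:
        models.append("2P-TX")     # needs the covariate X
    if has_item_id:
        models.append("2P-R")      # needs the item index within each template
    if has_item_id and has_x:
        models.append("2P-RX")     # needs both
    return models


print(feasible_models(has_item_id=False, has_x=False))
```

When neither column is recorded, the list contains only the 2P-T, which is the situation described on the slide.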
  • 95. Thank you. Quinn N Lathrop, qlathrop@nd.edu, irtND.wikispaces.com