WHAT CAN MACHINE LEARNING DO FOR OPEN EDUCATION?
Geoff Gordon
CMU Machine Learning
ggordon@cs.cmu.edu
Civilization advances by
extending the number of
important operations which
we can perform without
thinking about them.
—Alfred North Whitehead, 1911
Dr. Geoffrey J. Gordon: What can machine learning do for open education?
Geoff Gordon—OCWC—April 2014
CONTRIBUTION OF ML
Machine learning can help us understand how students learn
‣ Not just any ML, but latent variable (“hidden feature”) discovery
‣ Not just any latent variables, but highly structured ones
WHY BOTHER?
Student feedback
‣ what does the student know?
‣ what are common causes for the mistake the student just made?
Instructor feedback
‣ what do the students know?
‣ what skills does this course content address?
‣ what skills doesn’t this course content address?
Evaluation
‣ help design rubrics for (peer, instructor) grading
‣ cluster submissions by similar approach, skill level, …
Etc…
GOAL: UNDERSTAND HOW STUDENTS LEARN SOMETHING
[Figure: geometry area problem with side lengths 10m, 4m, 5m, 10m] → 70 m²
3x + 4 = x + 10 → x = 3
John took Joe for a ride on his boat. ___ boat was blue with a red stripe. (A/The/[]) → The
EX: GEOMETRY TUTOR
http://www.carnegielearning.com/
STEP-LEVEL DATA
Record right/wrong, timing, use of hints, …
one step = fill in a box
SIMPLEST MODEL: RASCH / 1-PARAMETER ITEM RESPONSE THEORY
$$\ln\left(\frac{p_t}{1 - p_t}\right) = \theta_{i_t} + \beta_{j_t}$$
θ = student mean (knowledge level)
β = item mean (easy/difficult)
i_t = student ID, j_t = step ID
p_t = P(correct answer)
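A minimal sketch of the Rasch prediction, assuming the sign convention of the formula above (a larger β means an easier step). The function names and the parameter values are hypothetical and serve only as an illustration.

```python
import math

def p_correct(theta_i: float, beta_j: float) -> float:
    """P(correct) under the Rasch model: logit(p) = theta_i + beta_j."""
    return 1.0 / (1.0 + math.exp(-(theta_i + beta_j)))

def predict(theta_i: float, beta_j: float) -> int:
    """Predict 1 (correct) when theta_i + beta_j > 0, i.e. when p > 0.5."""
    return 1 if theta_i + beta_j > 0 else 0

# Hypothetical parameter values, chosen only for illustration.
theta = {"student_a": 1.2, "student_b": -0.4}   # student knowledge levels
beta = {"step_easy": 0.8, "step_hard": -1.5}    # step easiness

for student, th in theta.items():
    for step, b in beta.items():
        print(student, step, round(p_correct(th, b), 3), predict(th, b))
```

Thresholding θ + β at 0 is the same as thresholding the predicted probability at 0.5, because the logistic function maps 0 to 0.5.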
[Figure: students × steps matrix. Rows: students (parameter θ); columns: steps in tutor (parameter β). Each entry: does student i get step j right?]
Predict 1 if θ_i + β_j > 0
STRUCTURE: SIMILARITY AMONG STEPS
Learn a “step map”: each point = 1 step
Steps over here are more similar to each other…
… than to steps over here
hic sunt dracones ("here be dragons")
HOW PRINCIPAL COMPONENTS ANALYSIS GOT FAMOUS
[Figure: users × movies ratings matrix with rows Y1, Y2, Y3, …, Yn. Rows: users; columns: movies. Each entry: how many stars does user i give to movie j?]
RESULT OF FACTORING
[Figure: the users × movies matrix factored into basis weights (rows u1, u2, u3, …, un, one per user) times basis vectors (v1, …, vk, over movies)]
Low-d basis = latent variables
Basis vectors represent latent properties of movies, e.g., “is a comedy”
IN OUR CASE (STUDENT-STEP DATA)
[Figure: the students × steps matrix factored into basis weights (rows U1, U2, U3, …, UN, one per student) times basis vectors (V1, …, VK, over steps)]
Basis vectors are candidate “eigenskills”
Weights are students’ knowledge levels
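The factoring of a students × steps matrix can be sketched with plain PCA computed through an SVD. The data below is synthetic, the rank k = 2 is an arbitrary choice, and the talk may well have used a different PCA variant, so treat this only as an illustration of how "basis weights" and "basis vectors" fall out of the factorization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the students x steps matrix (1 = step answered correctly).
n_students, n_steps, k = 59, 370, 2
true_weights = rng.normal(size=(n_students, k))
true_basis = rng.normal(size=(k, n_steps))
noise = 0.5 * rng.normal(size=(n_students, n_steps))
Y = (true_weights @ true_basis + noise > 0).astype(float)

# PCA via SVD of the column-centered matrix.
col_mean = Y.mean(axis=0)
U, S, Vt = np.linalg.svd(Y - col_mean, full_matrices=False)

weights = U[:, :k] * S[:k]   # one row per student: candidate knowledge levels
basis = Vt[:k]               # one row per component: candidate "eigenskills" over steps

Y_hat = weights @ basis + col_mean   # rank-k reconstruction of the matrix
```

Each row of `basis` assigns a loading to every step, so steps with similar loadings end up close together when the loadings are plotted as a map.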
DOES IT WORK?
[Figure: step map, with points marked as steps about pentagons, steps about circles, and other steps]
Learned features let us predict held-out data better than chance (ρ = .3, p < 0.0001)
Yes, sort of …
STRUCTURE: PRACTICE MAKES PERFECT
PCA ignores step order — clearly wrong…
Add model of student learning to PCA
‣ based on “additive factor model” [Draney et al., 1995]
Result: predictions of held-out data get slightly better
‣ ρ = .45 (p < 0.01 vs. plain PCA)
Step map still looks the same
Meh…
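For orientation, here is a sketch of one commonly used form of the additive factor model, in which the log-odds of a correct answer grow with the number of earlier practice opportunities on each skill the step uses. This is an assumption about the general model family, not the exact variant that the talk combined with PCA; the skill names and parameter values are hypothetical.

```python
import math

def afm_p_correct(theta_i, kcs, beta, gamma, practice):
    """Additive-factor-style prediction.

    logit(p) = theta_i + sum over skills k used by the step of
               (beta[k] + gamma[k] * practice[k])
    """
    logit = theta_i + sum(beta[k] + gamma[k] * practice.get(k, 0) for k in kcs)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical parameters for illustration only.
beta = {"circle_area": -0.5, "subtract_areas": -1.0}   # skill easiness
gamma = {"circle_area": 0.3, "subtract_areas": 0.2}    # gain per practice opportunity

practice = {}   # practice opportunities seen so far, per skill
history = []    # predicted P(correct) at each step, in order
for step_kcs in [["circle_area"], ["circle_area", "subtract_areas"], ["circle_area"]]:
    history.append(afm_p_correct(0.0, step_kcs, beta, gamma, practice))
    for k in step_kcs:
        practice[k] = practice.get(k, 0) + 1
```

Unlike plain PCA, the prediction for a step now depends on where it falls in the student's sequence: the third step uses the same skill as the first but is predicted to be easier because of the intervening practice.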
WHAT WE REALLY WANT
To be understandable to us humans, latents need to be sparse and binary (‘is about circles’, ‘requires subtracting areas’)
Can’t do this fully automatically from this small data set (only 59 students, 370 steps)
Challenge: can we discover sparse, binary, understandable latents automatically from MOOC-scale data?
“KC HYPOTHESIS”
Knowledge comes in atomic units (“KCs”)
Each KC is learned independently (no transfer)
‣ transfer among steps mediated by common KCs
‣ or prerequisite structure (can’t learn algebra w/o knowing arithmetic)
Each student has a (latent, scalar) proficiency level for each KC
‣ learn/forget = transition to a higher/lower proficiency level
Learning a KC happens only through exposure to that KC
‣ problem, worked example, lecture, real life, …
step 17: {A, B}
step 23: {A, C}
[Koedinger, Corbett, Perfetti. Cognitive Science, 2012]
CONSEQUENCES OF KC HYPOTHESIS
Mistakes are at KC level: select wrong KC; apply right KC to wrong
data; mistake in application of KC
‣ identifying the KC at fault makes it easier to give student feedback
If we can accurately
‣ determine list of KCs
‣ label instructional activities by KCs
…then we immediately know the quality/coverage of our content
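Once steps are labeled with KCs, a coverage check is a few lines of bookkeeping. The sketch below reuses the {A, B} and {A, C} labels from the KC hypothesis slide; step 31, KC "D", and the function name are hypothetical additions for illustration.

```python
# Hypothetical KC labels; steps 17 and 23 echo the slide's {A, B} / {A, C} example.
step_kcs = {
    17: {"A", "B"},
    23: {"A", "C"},
    31: {"A"},
}
all_kcs = {"A", "B", "C", "D"}   # assumed full KC list for the course

def kc_coverage(step_kcs, all_kcs):
    """Count how many steps exercise each KC; a KC with count 0 is not covered."""
    counts = {k: 0 for k in sorted(all_kcs)}
    for kcs in step_kcs.values():
        for k in kcs:
            counts[k] += 1
    return counts

coverage = kc_coverage(step_kcs, all_kcs)
uncovered = [k for k, n in coverage.items() if n == 0]
shared = step_kcs[17] & step_kcs[23]   # transfer between steps runs through shared KCs
```

Here KC "D" is never practiced, which is the kind of content gap the labeling is meant to expose.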
COMPOSE-BY-ADDITION
[Stamper & Koedinger, AIED 2011]
WHY ARE SOME COMPOSE-BY-ADDITION STEPS HARDER?
[Figure: compose-by-addition steps labeled hard, easy, and medium]
HYPOTHESIS: DIFFERENCE IS IN HOW MUCH PLANNING IS NEEDED
[Figure labels: plan to compose; subtract; compose by addition]
KC DISCOVERY
[Paper excerpt, clipped on the slide:] “… Other problems were “unscaffolded” and did not start with such […] Thus students had to pose these subgoals themselves. Indeed the blips for compose-by-addition (seen in the learning curve in Figure 2) do correspond with a […] frequency of these unscaffolded problems.”
[Stamper & Koedinger, AIED 2011]
USE DATA-DRIVEN MODEL TO
REDESIGN TUTOR
New skill bars for planning skills
‣ skill bars are a tutor interface to show students where they are in acquiring skills
Sequence for gentle slope
‣ adaptive fading of scaffolding
New problems that focus on planning
‣ next slide…
[Figure: two sets of skill bars. First set: Combine areas; Enter given values; Find regular area. Second set: Plan to combine areas; Combine areas; Subtract; Enter given values; Find regular area]
NEW PROBLEMS: ISOLATE PRACTICE ON
PLANNING STEP
Decompose complex problem into simpler ones
RESULTS
More efficient: 25% less student time
‣ instructional time by step type
Better learning of planning skills
‣ post-test %correct by item type
[Figure: panels (a) time in minutes and (b) % correct, clipped from Fig. 4 of the cited paper]
[Stamper & Koedinger, AIED 2011]
MORE STRUCTURE: WHAT’S IN A KC?
So far, each KC is just present or absent in a student or problem
Nothing to distinguish algebra KCs from ESL KCs
What’s going on under the hood as a student solves a problem?
RULE-BASED COGNITIVE MODEL
Problem: 3(2x – 5) = 9
KCs:
‣ IF GOAL IS SOLVE A(BX+C) = D THEN REWRITE AS ABX + AC = D → 6x – 15 = 9
‣ IF GOAL IS SOLVE A(BX+C) = D THEN REWRITE AS BX+C = D/A → 2x – 5 = 3
Bug:
‣ IF GOAL IS SOLVE A(BX+C) = D THEN REWRITE AS ABX + C = D → 6x – 5 = 9
What does it look like inside the student’s brain?
‣ … maybe a rule-based system
‣ … in which case KCs might correspond to rules
‣ :- president of US is Obama
‣ constant C on LHS of equation E :- move C to RHS of E
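The three rewrite rules for A(BX+C) = D can be written out directly, which makes it easy to see that the two correct rules preserve the solution and the buggy one does not. The function names and the (coefficient, constant, right-hand side) encoding are assumptions made for this sketch.

```python
from fractions import Fraction

def distribute(a, b, c, d):
    """Correct rule: A(BX+C) = D  ->  ABX + AC = D. Returns (coef, const, rhs)."""
    return (a * b, a * c, d)

def divide(a, b, c, d):
    """Correct rule: A(BX+C) = D  ->  BX + C = D/A."""
    return (b, c, Fraction(d, a))

def buggy_distribute(a, b, c, d):
    """Bug: A(BX+C) = D  ->  ABX + C = D (forgets to multiply C by A)."""
    return (a * b, c, d)

def solve_linear(coef, const, rhs):
    """Solve coef*x + const = rhs for x."""
    return Fraction(rhs - const, coef)

# The slide's problem: 3(2x - 5) = 9
problem = (3, 2, -5, 9)
for rule in (distribute, divide, buggy_distribute):
    rewritten = rule(*problem)
    print(rule.__name__, rewritten, "x =", solve_linear(*rewritten))
```

Both correct rules lead to x = 4, while the buggy rule leads to x = 7/3, so a wrong final answer can be traced back to the specific rule that produced it.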
RULE-BASED SYSTEM
Aka production system:
‣ declarative knowledge held in working memory
‣ production rules match declarative knowledge
‣ and act on WM or external world
Much cognitive modeling work endorses this claim explicitly or implicitly
‣ ACT-R, SimStudent, Russell & Norvig, …
But two problems: uncertainty handling, representation learning
‣ here’s where more ML research can help!
“I see 3x+5 = 8”
“if LHS has constant C… then subtract C from both sides”
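A toy match-act loop shows how a production system operates on working memory. It is a bare-bones sketch, not ACT-R or SimStudent: the dictionary encoding of working memory, the second rule, and all names are assumptions made for illustration.

```python
def rule_subtract_constant(wm):
    """If the LHS has a constant C, subtract C from both sides."""
    if wm["lhs_const"] != 0:
        c = wm["lhs_const"]
        return {**wm, "lhs_const": 0, "rhs": wm["rhs"] - c}
    return None

def rule_divide_by_coefficient(wm):
    """If the LHS is coef*x with coef other than 1, divide both sides by coef."""
    if wm["lhs_const"] == 0 and wm["lhs_coef"] != 1:
        return {**wm, "lhs_coef": 1, "rhs": wm["rhs"] / wm["lhs_coef"]}
    return None

RULES = [rule_subtract_constant, rule_divide_by_coefficient]

def run(wm, max_cycles=10):
    """Match-act loop: fire the first rule whose condition matches working memory."""
    trace = [wm]
    for _ in range(max_cycles):
        for rule in RULES:
            new_wm = rule(wm)
            if new_wm is not None:
                wm = new_wm
                trace.append(wm)
                break
        else:
            break   # no rule matched: halt
    return trace

# Working memory for "I see 3x + 5 = 8".
trace = run({"lhs_coef": 3, "lhs_const": 5, "rhs": 8})
for state in trace:
    print(state)
```

The rules here fire deterministically and rely on hand-built features such as `lhs_const`, which is exactly where the two problems named on the slide, uncertainty handling and representation learning, come in.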
PROBLEM 1: UNCERTAINTY
A DAY IN THE LIFE OF A RAT
Trial  Bell?  Light?  Food?
1      ×      ✓       ✓
2      ✓      ×       ×
3      ×      ✓       ×
4      ×      ✓       ✓
…      …      …       …
RAT AS BAYESIAN
Priors over: how many trial types, sparsity of connections, reliability of
connections, …
(This is a common architecture for medical diagnosis systems)
[Figure: two-layer network. Top layer: trial types 1, 2, …; bottom layer: observables bell, light, food, …]
QUIZ: ARE YOU SMARTER THAN A RAT?
Trial  Bell?  Light?  Food?
1      ×      ✓       ✓
2      ×      ✓       ×
3      ×      ✓       ✓
4      ×      ✓       ✓
…      …      …       …
100    ×      ✓       ✓
101    ✓      ✓       ×
AND THE RAT SAYS…
Both right! With more light-bell trials, evidence increases for a
separate trial type.
Effect name                 2nd-order conditioning   Conditioned inhibition
light-food trials           many                     many
bell-light trials           few                      many
test: bell predicts food?   ↑                        ↓
BAYESIAN RULE LEARNING IN CLASSICAL CONDITIONING
Only fully Bayesian inference/learning captured both effects [Courville, Daw, Gordon, Touretzky, NIPS 2003]
Few bell-light trials, 1 trial type: (bell, light, food) all associated
More trials: (bell, light, no food) vs. (light, food, no bell)
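The way evidence for a separate trial type accumulates can be illustrated with a toy Bayes factor. This is a deliberately simplified stand-in, not the model of the cited paper: it assumes Beta(1, 1) priors on every rate, treats the assignment of trials to types as known, and puts no prior penalty on having two types, so it shows only the trend in the evidence and not the point at which a learner would switch explanations.

```python
from math import lgamma

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(k, n):
    """Log marginal likelihood of k successes in n Bernoulli trials, Beta(1, 1) prior."""
    return log_beta(1 + k, 1 + n - k) - log_beta(1, 1)

def log_bayes_factor(n_light_food, n_bell_light):
    """log [ P(data | two trial types) / P(data | one trial type) ].

    Light-food trials show (no bell, light, food);
    bell-light trials show (bell, light, no food).
    """
    n = n_light_food + n_bell_light
    # One trial type: a single rate per observable, shared by all trials.
    one = (log_marginal(n_bell_light, n)       # bell present
           + log_marginal(n, n)                # light present
           + log_marginal(n_light_food, n))    # food present
    # Two trial types: separate rates for each group of trials.
    two = (log_marginal(0, n_light_food)                # bell, light-food trials
           + log_marginal(n_light_food, n_light_food)   # light, light-food trials
           + log_marginal(n_light_food, n_light_food)   # food, light-food trials
           + log_marginal(n_bell_light, n_bell_light)   # bell, bell-light trials
           + log_marginal(n_bell_light, n_bell_light)   # light, bell-light trials
           + log_marginal(0, n_bell_light))             # food, bell-light trials
    return two - one

for n_bl in (1, 5, 20):
    print(n_bl, round(log_bayes_factor(20, n_bl), 2))
```

With the number of light-food trials held fixed, the log Bayes factor grows as bell-light trials are added, matching the qualitative claim that more such trials strengthen the case for a second trial type.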
[Figure: (a) second-order conditioning. x-axis: number of bell-light trials (0 to 60); y-axis: probability (0 to 1); curves: P(food | light) and P(food | bell)]
PROBLEM 2: REPRESENTATION LEARNING
Flaw with “KC = rule”: many bugs come from weak features
REPRESENTATION LEARNING
Some student errors come from failure to correctly interpret
(internally represent) a problem
As student sees more and more examples like 3x + 5 = 8, gets better and better “language model” to explain them (build internal representation)
→ some KCs must correspond to features of the improved language model
EXPERIMENT
Present algebra examples to a machine learning system
As part of learning, induce a language model (an unsupervised probabilistic context-free grammar) for algebra equations
Make output of language model (grammar nonterminals, e.g.,
SignedNumber) available as features of each example
Use these features in simulated problem-solving to discover KCs
[Li, Cohen, Koedinger, Matsuda, 2010]
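To make "grammar nonterminals as features" concrete, the sketch below tags the tokens of an equation with category labels. It is a hand-written stand-in: the cited experiment induces its grammar from examples, whereas this pattern is fixed by hand, and only the label SignedNumber comes from the slide. The other labels and the function name are assumptions.

```python
import re

# Hand-written token categories standing in for induced grammar nonterminals.
# Only "SignedNumber" is named on the slide; the other labels are assumptions.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<SignedNumber>-?\d+)|(?P<Variable>[a-z])|(?P<Operator>[+=]))"
)

def features(equation: str):
    """Return (label, text) pairs for each token in a simple linear equation."""
    out, pos = [], 0
    while pos < len(equation):
        m = TOKEN_PATTERN.match(equation, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}")
        out.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return out

print(features("-3x + 5 = 8"))
```

A learner with the SignedNumber feature sees the coefficient of x as -3. One without it might, for instance, see only the digit 3, which is the kind of weak-feature bug the preceding slides describe.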
NEW COGNITIVE MODELS ARE MORE ACCURATE
[Li, Cohen, Koedinger, Matsuda, 2010]
OPEN RESEARCH QUESTION
Can we build a new generation of rule-based system that has
‣ rich uncertainty handling
‣ integrated representation learning
… and use it to help us model student learning?
SUMMARY
A key contribution of machine learning to education will be to help
understand the educational content we’re creating and delivering
Essential idea: ML models of structured latent variables
Specifically, build and test hypotheses about the knowledge,
procedures and representations students use to solve problems
‣ latents = KCs, rules, representations, strategies, …
Need to link uncertainty handling (traditional domain of ML) to new,
harder situations encountered in understanding student knowledge
Exciting time for research in ML and education!