Data-Driven Education 2020: Using Big Educational Data to Improve Teaching and Learning

Peter Brusilovsky
School of Computing and Information,
University of Pittsburgh

MOOC
Massive Open Online Course

MOOC Completion Rate
http://www.katyjordan.com/MOOCproject.html

What Else Do These Students Need?
• Top colleges
– Stanford, Caltech, Princeton, Georgia Tech, Penn, Duke, …
• Great faculty – top guns in their fields
• Great content
• Top online platforms – Coursera, edX, Udacity
• FREE!

But now we have data!
• MOOC-level data
– Who did what, when, and what was the result
• What can we do?
– Find out who drops the course (and guess why)
– Predict drop-outs (and possibly intervene)
– Personalize the learning process
– Learn more about learning
– Find out how we can teach better

Nexus of Human and Machine Learning
[Diagram labels: Human Learning, Machine Learning]

Data-Driven Education
• Using data left by past learners to benefit future
learners
• How could this data be used? Who is making
sense of the data?
• Machine-Centered Approach
– Educational Data Mining
• Human-Centered Approach
– Visual Learning Analytics

Educational Data Mining
• The idea: Feed data to various data mining and
machine learning approaches to tune the current
learning process and discover important things
for future improvements
• Could be used on all levels
– Which courses form hidden prerequisites?
– How to order problems and readings in a course?
– Which hint or remedial activity should be offered to
students in a problem-solving context?

EDM for Personalized Learning
• Several ways to support personalized learning
• Better domain modeling
• Better student modeling
• Better adaptation approaches
• Better student engagement
• Finding what works best for different categories
of students

Visual Learning Analytics
• The idea: Present data in visual form to
students, teachers, and administrators, helping them
to make better decisions about the learning process
• Support self-regulated learning
• Provide navigation support for students
• Show performance to instructors to support their
decisions
• Show data to administrators to redesign the process

Research at PAWS Lab, U of Pittsburgh
• http://adapt2.sis.pitt.edu/wiki/PAWS
• Educational Data Mining
– Mining student online behavior patterns
– Domain modeling and latent topic discovery
– Data-driven student modeling
– Concept extraction
• Visual Learning Analytics
– Social navigation in E-learning
– Open social learner modeling with social comparison

BEHAVIOR MINING
Case 1: Use Past Learners' Data to Augment AI Decision Support

Latent Groups and MOOC Performance
• How could we explain large differences in MOOC
dropouts and performance?
• Could it be connected to demography?
➢ Guo, P. and Reinecke, K. (2014) Demographic Differences in How Students Navigate Through
MOOCs. In: Proceedings of the First ACM Conference on Learning @ Scale,
Atlanta, Georgia, USA, ACM, pp. 21-30.
• Could it be connected to behavior?
– Different learners behave differently!
– Expert-defined models of differences: share of
assignments in all work, share of forum use…
➢ Anderson, A., Huttenlocher, D., Kleinberg, J., and Leskovec, J. (2014) Engaging with Massive
Online Courses. In: Proceedings of the 23rd International World Wide Web Conference (WWW
2014), Seoul, Korea, April 7-11, 2014, pp. 687-698.

Mining Learner Behavior Patterns
• If simple measures do not work, complex
behavior patterns might help
• Find behavior patterns
• Identify clusters with similar behavior
• Use patterns to predict retention and success
• Help different students in different ways

How to Find Behavior Patterns
• Encode behavior as a vector
– Balance of activities by type
– Balance of activities by time
– Transition between activities
• Find likeminded groups
– Clustering
– Matrix and tensor factorization
• Markov models
• Sequence mining
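The first step listed above, encoding behavior as a vector, can be sketched as follows. This is a minimal illustration, not the encoding used in the cited studies: the function name `behavior_vector`, the activity types, and the choice of proportions as features are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def behavior_vector(events, activity_types):
    """Encode one learner's event log as proportions of activity types and transitions."""
    type_counts = Counter(events)
    # Consecutive pairs of events, e.g. ("video", "quiz").
    transition_counts = Counter(zip(events, events[1:]))
    n_events = max(len(events), 1)
    n_transitions = max(len(events) - 1, 1)

    # Balance of activities by type.
    vector = [type_counts[t] / n_events for t in activity_types]
    # Transitions between activities, in a fixed (from, to) order.
    vector += [
        transition_counts[(a, b)] / n_transitions
        for a, b in product(activity_types, repeat=2)
    ]
    return vector


ACTIVITY_TYPES = ["video", "quiz", "forum"]  # hypothetical activity types
example = behavior_vector(["video", "quiz", "quiz", "forum"], ACTIVITY_TYPES)
print(len(example), example[:3])
```

Vectors of this kind, one per learner, are what a clustering or factorization step would take as input.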

Problem-Solving Genome
• Key ideas
– Individual differences are important for understanding
students and adapting learning
– "Old generation" of individual differences (i.e., learning
styles) is not valuable in the e-learning context
– Could we use "data-driven" science to extract individual
differences from behavior data?
• Main challenge
– How to process the data to find and use individual
differences
• Our approach uses sequence mining and student
modeling based on the use of micro-sequences

Context: Parameterized Java Exercises
Some numbers change each time
the exercise is loaded
Hard to game
Exercise from QuizJet system

Dataset
Exercises
• 101 parameterized exercises
• 19 topics
• Exercises labeled as easy (41), medium (41) or hard (19)
complexity
Students
• 3 terms, a total of 101 students
• 21,215 attempts, 14,726 correct and 6,489 incorrect
• We formed sequences from a student's repeated attempts at the
same exercise within the same session in the system
• We recorded the time spent on each attempt
• Pretest and posttest (not for all students)

Labeling Steps (attempts)
Correctness: Success (S) or Failure (F)
Time: Short (lowercase) or Long (uppercase)
– Using median of the distribution of time per exercise
– Using different distributions for first attempt
label   correctness   time
s       success       short
S       success       long
f       failure       short
F       failure       long
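The labeling rule in this table can be sketched in a few lines. The field names (`exercise`, `is_first`, `correct`, `seconds`) are hypothetical, and treating a time strictly above the median as "long" is an assumption; the slide only states that the median per exercise is used, with a separate distribution for first attempts.

```python
from collections import defaultdict
from statistics import median


def label_attempts(attempts):
    """Label each attempt as s, S, f or F from its correctness and duration."""
    # Separate time distributions per exercise, and for first vs. later attempts.
    times = defaultdict(list)
    for a in attempts:
        times[(a["exercise"], a["is_first"])].append(a["seconds"])
    thresholds = {key: median(values) for key, values in times.items()}

    labels = []
    for a in attempts:
        letter = "s" if a["correct"] else "f"
        # Assumption: strictly longer than the median counts as "long" (uppercase).
        if a["seconds"] > thresholds[(a["exercise"], a["is_first"])]:
            letter = letter.upper()
        labels.append(letter)
    return labels


attempts = [
    {"exercise": "q1", "is_first": True, "correct": False, "seconds": 10},
    {"exercise": "q1", "is_first": True, "correct": False, "seconds": 50},
    {"exercise": "q1", "is_first": True, "correct": True, "seconds": 30},
    {"exercise": "q1", "is_first": False, "correct": True, "seconds": 5},
    {"exercise": "q1", "is_first": False, "correct": True, "seconds": 15},
]
labels = label_attempts(attempts)
print(labels)
```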

Labeled Sequences
• The first and last attempts are labeled differently; here
they are marked with an underscore ‘_’
• Example sequences:
_fS_
_fFs_
_ss_
This labeled representation is for making sequences and patterns more
readable. The actual labeling used for running the pattern mining
algorithm uses only uppercase letters and different sets of letters for
first and last attempts within sequences.

Pattern mining
• Using PexSPAM algorithm with gap = 0
• Each possible pattern of length 2 or higher is
explored
• Support of a pattern: proportion of sequences
containing the pattern (at least once)
– Does not count multiple occurrences of the pattern within a
sequence
• Select all patterns with minimum support of 1%
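The support definition above can be illustrated with a brute-force sketch. This is not the PexSPAM algorithm; it simply enumerates every contiguous (gap = 0) pattern of length 2 or more and counts each pattern once per sequence. Representing a sequence as a list of position-marked tokens such as `"_f"` (first attempt) and `"s_"` (last attempt) is an assumption made for the example.

```python
from collections import Counter


def frequent_patterns(sequences, min_support=0.01, min_length=2):
    """Return {pattern: support} for contiguous patterns meeting min_support."""
    support_counts = Counter()
    for tokens in sequences:
        # A set ensures a pattern is counted once per sequence,
        # even if it occurs several times within it.
        seen = set()
        for start in range(len(tokens)):
            for end in range(start + min_length, len(tokens) + 1):
                seen.add(tuple(tokens[start:end]))
        support_counts.update(seen)

    total = len(sequences)
    return {
        pattern: count / total
        for pattern, count in support_counts.items()
        if count / total >= min_support
    }


sequences = [
    ["_f", "S_"],
    ["_f", "S_"],
    ["_f", "F", "s_"],
    ["_s", "s", "s", "s", "s_"],
]
patterns = frequent_patterns(sequences, min_support=0.5)
all_patterns = frequent_patterns(sequences, min_support=0.0)
print(patterns)
```

Exhaustive enumeration is only practical because the sequences here are short; a dedicated sequence-mining algorithm would be the usual choice at scale.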

Pattern mining
• There were 102 frequent micro patterns
Top 20 frequent micro patterns

The Problem Solving Genome
• Constructed a frequency vector over the 102
patterns (vector of size 102) for each student
– Each common pattern is a gene
• The vector represents how frequently a student uses
each of the micro patterns
• The vector is an individual genome built from genes

Problem Solving Genome
Example sequences of one student: _fSss_, _fSS_, _FFss_, _FSss_, _fSs_
Frequencies of each of the 102 common patterns:
pattern     ss_   ss    Ss    SS_   _FS_   …
frequency   3/5   0/5   2/5   1/5   0/5    …
Guerra, J., Sahebi, S., Lin, Y.-R., and Brusilovsky, P. (2014) The Problem
Solving Genome: Analyzing Sequential Patterns of Student Work with
Parameterized Exercises. In: Proceedings of the 7th International Conference on
Educational Data Mining (EDM 2014) pp. 153-160.
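The frequencies in the example above can be reproduced with a short sketch. It assumes that a pattern letter matches only an attempt in the same position class (first, middle, or last), which follows the note that first and last attempts use different letter sets. The helper names `tokenize`, `contains`, and `genome` are hypothetical.

```python
from fractions import Fraction


def tokenize(text):
    """Turn '_fSs_' into ['_f', 'S', 's_']; unmarked letters stay as middle attempts."""
    starts, ends = text.startswith("_"), text.endswith("_")
    tokens = list(text.strip("_"))
    if starts:
        tokens[0] = "_" + tokens[0]
    if ends:
        tokens[-1] = tokens[-1] + "_"
    return tokens


def contains(sequence, pattern):
    """True if the pattern occurs as a contiguous run of tokens in the sequence."""
    n = len(pattern)
    return any(sequence[i:i + n] == pattern for i in range(len(sequence) - n + 1))


def genome(sequences, patterns):
    """Fraction of a student's sequences that contain each pattern."""
    tokenized = [tokenize(s) for s in sequences]
    return [
        Fraction(sum(contains(seq, tokenize(p)) for seq in tokenized), len(tokenized))
        for p in patterns
    ]


student_sequences = ["_fSss_", "_fSS_", "_FFss_", "_FSss_", "_fSs_"]
student_genome = genome(student_sequences, ["ss_", "ss", "Ss", "SS_", "_FS_"])
print([str(f) for f in student_genome])
```

With the five sequences from the slide, this yields 3/5, 0/5, 2/5, 1/5 and 0/5 for the five patterns shown.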

Exploring the Genome
• Stability
– Are the patterns stable within a
student?
• Effect of complexity
– Are the patterns different across
complexity levels?
• Patterns of success
– Are successful students
following different patterns?

Genome Stability
• Is the student more similar to him/herself
than to others?
– Select students with at least 60 sequences (32 students)
– For each student:
• Split sequences per student in two random sets (set 1, set 2)
• Form Genome of each set
– Compute Jensen-Shannon (JS) divergence between:
• The genomes of the 2 sets of each student (self-distance)
• The student’s set 1 genome and the set 1 genomes of other
students, averaged (other-distance)
• Are students changing patterns over time?
– Repeat the procedure splitting sets in early (first half) and
late (second half) sequences per student
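The self-distance and other-distance comparison can be sketched as below. Normalizing each genome so that it sums to 1 before computing the Jensen-Shannon divergence, and using base-2 logarithms, are assumptions; the slides do not state either. The function names and the toy data are hypothetical.

```python
import math


def normalize(vector):
    """Scale a non-negative vector so it sums to 1 (assumes a non-zero sum)."""
    total = sum(vector)
    return [x / total for x in vector]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two genome vectors."""
    p, q = normalize(p), normalize(q)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def self_and_other_distance(student, halves):
    """halves maps student -> (set 1 genome, set 2 genome)."""
    set1, set2 = halves[student]
    self_distance = js_divergence(set1, set2)
    others = [js_divergence(set1, halves[o][0]) for o in halves if o != student]
    return self_distance, sum(others) / len(others)


# Toy genomes over two patterns, for illustration only.
halves = {"a": ([3, 1], [3, 1]), "b": ([1, 3], [1, 3]), "c": ([1, 3], [1, 3])}
self_d, other_d = self_and_other_distance("a", halves)
print(self_d, other_d)
```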

Results (1)
                            Self-distances     Other-distances
                            M       SE         M       SE        Sig.    Cohen’s d
Randomly split Genome (a)   .2370   .0169      .4815   .0141     <.001   2.693
Early/Late Genome (b)       .3211   .0214      .4997   .0164     <.001   1.205
Paired-sample t-test
• Even when changing from early to late sequences,
student self distance is significantly smaller than the
distance to others
The genome is stable within individuals

Clustering by Genome
• Cluster students by their genomes and analyze
different patterns
– Between clusters
– Between low- and high-performing students within each cluster
• Spectral Clustering with k = 2
– Larger eigen-gap with k = 2

• Cluster 1: confirmers (repeat short successes)
• Cluster 2: non-confirmers (quitters)
Ordering patterns by difference magnitude
(cluster 2 – cluster 1)

Groups and guidance
• Successful patterns in each cluster
are closer to the other cluster
– Successful confirmers tend to stop
after long success
– Successful non-confirmers (cluster 2) tend
to continue after hard success
• Patterns that differ most between
clusters are “harmful”
• How could it be used for
personalization?
– Identify student type
– Offer different interface or discourage
poor behavior with recommendation
_FS_

More Results with Behavior Mining
• Hosseini, R., Brusilovsky, P., Yudelson, M., and Hellas, A. (2017) Stereotype Modeling
for Problem-Solving Performance Predictions in MOOCs and Traditional Courses.
In: Proceedings of the 25th Conference on User Modeling, Adaptation and
Personalization, Bratislava, Slovakia, ACM, pp. 76-84.
• Mirzaei, M., Sahebi, S., and Brusilovsky, P. (2019) Annotated Examples and Parameterized
Exercises: Analyzing Student's Sequential Patterns. In: S. Isotani, et al. (eds.) Proceedings of 20th
International Conference on Artificial Intelligence in Education, AIED 2019, Part I, Chicago, IL,
USA, June 25-29, 2019, Springer, pp. 308-319.
• Mirzaei, M., Sahebi, S., and Brusilovsky, P. (2020) Detecting Trait versus Performance
Student Behavioral Patterns Using Discriminative Non-Negative Matrix Factorization.
In: Proceedings of The Thirty-Third International Florida Artificial Intelligence Research Society
Conference (FLAIRS-33), Miami, FL, Association for the Advancement of Artificial Intelligence,
pp. 439-444.
• Wen, X., Lin, Y.-R., Liu, X., Brusilovsky, P., and Barria Pineda, J. (2019) Iterative
Discriminant Tensor Factorization for Behavior Comparison in Massive Open Online Courses.
In: Proceedings of The World Wide Web Conference, San Francisco, CA, USA, ACM, pp. 2068-2079.

Tinkerers vs. Movers
[Figure: ratio of pattern occurrence by pattern, Cluster 1 (Tinkerers) vs. Cluster 2 (Movers)]
• Tinkerers were less efficient and had lower grades
• Patterns tended to distinguish low/high performers in both groups
• Both groups included a mixture of strong and …
• Movers have more steps of adding code concepts and passing more tests
• Tinkerers have more steps of changing code concepts without passing more tests

SOCIAL NAVIGATION
Case 2: Use Past Learners' Data to Augment Humans

Navigation Support
• Students need personalized guidance (navigation
support) to access the right content at the right time
– Too late – easy but mostly useless
– Too early – not yet ready to understand/apply
– Students start with different knowledge and learn at
different speeds
• Knowledge-based navigation support based on
student modeling works well to increase success and
motivation
• Knowledge-based approaches require considerable
knowledge engineering – domain modeling, content
analysis, prerequisite elicitation, etc.

Social Navigation
• Wisdom from user data vs. wisdom from experts
• Social navigation uses behavior of past users to
guide new users
• Can we use “wisdom” extracted from the work of
a community of learners to replace knowledge-
based guidance?
• Knowledge engineering vs. data analysis

Knowledge Sea II
Brusilovsky, P., Chavan, G., and Farzan, R. (2004) Social adaptive navigation support for open corpus electronic
textbooks. Third International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems

Knowledge Sea II (+ AnnotatEd)
Farzan, R. and Brusilovsky, P. (2008) AnnotatEd: A social navigation and annotation service for web-based
educational resources. New Review of Hypermedia and Multimedia 14 (1), 3-32.

Open Social Student Modeling
• Key ideas
– Make traditional student models open to the users
– Allow students to compare themselves with class and
peers
– Social navigation based on performance data
• Main challenge
– How to design the interface to make an easy
comparison and provide social guidance and
motivation?
– We went through several attempts…

MasteryGrids
• Adaptive Navigation Support
• Topic-based Adaptation
• Open Social Learner Modeling
• Social Educational Progress Visualization
• Multiple Content Types
• Concept-Based Recommendation
• Open Source

Open Social Student Modeling - I
Interactive Demo YouTube Demo

Open Social Student Modeling - II

Open Social Student Modeling - III

The Study
• A classroom study in a graduate Database Course
• Two sections of the same class. Same teacher, same
lectures, etc.
• The students were able to access non-mandatory
database practice content (exercises, examples) through
Mastery Grids
• 47 students worked with the OSM (Open Student Modeling) interface and 42 students
worked with the OSSM (Open Social Student Modeling) interface
Brusilovsky, P., Somyurek, S., Guerra, J., Hosseini, R., Zadorozhny, V., and Durlach, P. (2016) The Value of
Social: Comparing Open Student Modeling and Open Social Student Modeling. IEEE Transactions on
Emerging Topics in Computing 4 (3), 450-461.

Impact on Learning
• Student knowledge significantly increased in both
groups
• The number of attempted problems significantly
predicts the final grade (SE=0.04, p=.017).
• We obtained a coefficient of 0.09 for the number of
attempts on problems, meaning that attempting 100
problems corresponds to an increase of 9 in the final grade
• The mean learning gain was higher for both weak and
strong students in the OSSM group
• The difference was significant for weak students
(p=.033)

Does OSSM Increase System Usage?
Variable                                OSM Mean   OSSM Mean   U
Sessions                                3.93       6.26        685.500*
Topics coverage                         19.0%      56.4%       567.500**
Total attempts to problems              25.86      97.62       548.500**
Correct attempts to problems            14.62      60.28       548.000**
Distinct problems attempted             7.71       23.51       549.000**
Distinct problems attempted correctly   7.52       23.11       545.000**
Distinct examples viewed                18.19      38.55       611.500**
Views to example lines                  91.60      209.40      609.000**
MG loads                                5.05       9.83        618.500**
MG clicks on topic cells                24.17      61.36       638.500**
MG click on content cells               46.17      119.19      577.500**
MG difficulty feedback answers          6.83       14.68       599.500**
Total time in the system                5145.34    9276.58     667.000**
Time in problems                        911.86     2727.38     582.000**
Time in MG (navigation)                 2260.10    4085.31     625.000**
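The U column is presumably a Mann-Whitney U statistic; the slide labels it only "U", so this reading is an assumption. Under that assumption, the statistic for two independent groups can be computed by direct pair counting, as in the sketch below. The function name and the toy data are hypothetical and unrelated to the study's raw data.

```python
def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U by pair counting; ties count as half a win."""
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )
    u_b = len(group_a) * len(group_b) - u_a
    # The smaller of the two U values is the one usually reported.
    return min(u_a, u_b)


# Toy session counts, for illustration only.
u_separated = mann_whitney_u([1, 2, 3], [4, 5, 6])
u_mixed = mann_whitney_u([1, 4], [2, 3])
print(u_separated, u_mixed)
```

A small U means the two groups barely overlap; the significance markers in the table would come from comparing U against its null distribution, which this sketch does not do.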

Does OSSM increase Efficiency?
• Time per line, time per example, and time per activity
scores of students in the OSSM group are significantly lower
than in the OSM group.
• Students who used the OSSM interface worked more
efficiently.
Variable            OSM Mean   OSSM Mean   U
Time per line       22.93      11.61       570.000**
Time per example    97.74      58.54       508.000*
Time per problem    37.96      29.72       242.000
Time per activity   47.92      34.33       277.000*

Does OSLM Increase Student Retention?
[Figure: % of students in class (0 to 100) by number of problem attempts (0+ to 50+), OSSM vs. OSM]
• The OSLM group had much higher
student usage
• The system looked much more interesting to
students at the start (compare
#students after the first login)
• At the level of 30+ attempts, indicating serious
engagement with the system, the
OSLM group still retained more
than 50% of its original users
while OSM engagement was below
20%.
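The retention curve described here, the share of the class reaching each level of problem attempts, can be computed as in the sketch below. The function name, the parameter names, and the toy numbers are hypothetical; it also assumes the input lists only students who logged in at least once, so that the "0+" point counts students who opened the system.

```python
def retention_curve(attempts_per_student, class_size, thresholds=(0, 10, 20, 30, 40, 50)):
    """Percent of the class with at least `threshold` problem attempts."""
    curve = {}
    for threshold in thresholds:
        retained = sum(1 for n in attempts_per_student if n >= threshold)
        curve[f"{threshold}+"] = 100.0 * retained / class_size
    return curve


# Toy data: 5 of 10 enrolled students logged in, with these attempt counts.
curve = retention_curve([0, 5, 12, 35, 60], class_size=10)
print(curve)
```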

Why Is Engagement Important?
• Many systems demonstrated their educational
effectiveness in lab-like settings: once students are
pushed to use them, they benefit their learning
• However, once released to real classes, these systems are
under-used: most of them offer additional
non-mandatory learning opportunities
• “Students are only interested in points and grades”
• Convert all tools into credit-bearing activities?
• Or use alternative approaches to increase motivation
• Critical to support students in self-organized non-credit
learning contexts like MOOCs

Current State of OSSM
• MasteryGrids is an open source system
• Full support offered for three domains
– Java, 5 types of smart content
– Python, 6 types of smart content
– SQL, 4 types of smart content
• Several large-scale studies in progress
• Exploring new concept-based OSSM Interface
• Collaborators are welcome!
– Can use your own content and course structure!

More on Social Navigation and OSLM
• Farzan, R. and Brusilovsky, P. (2018) Social Navigation. In: P. Brusilovsky and
D. He (eds.): Social Information Access: Systems and Technologies. Lecture Notes in
Computer Science, Cham: Springer, pp. 142-180.
• Hsiao, I.-H., Bakalov, F., Brusilovsky, P., and König-Ries, B. (2013)
Progressor: social navigation support through open social student modeling. New
Review of Hypermedia and Multimedia 19 (2), 112-131.
• Hsiao, I.-H. and Brusilovsky, P. (2017) Guiding and Motivating Students
Through Open Social Student Modeling: Lessons Learned. Teachers College
Record 119 (3).
• Akhuseyinoglu, K., Barria-Pineda, J., Sosnovsky, S., Lamprecht, A.-L.,
Guerra, J., and Brusilovsky, P. (2020) Exploring Student-Controlled Social
Comparison. In: C. Alario-Hoyos, M. J. Rodríguez-Triana, M. Scheffel, I. Arnedillo-
Sánchez and S. M. Dennerlein (eds.) Proceedings of European Conference on
Technology Enhanced Learning EC-TEL 2020, Cham, 14-18 September, 2020,
Springer International Publishing, pp. 244-258.

More on Data-Driven E-learning
• Domain Modeling and Latent topic discovery
– Sahebi, S., Lin, Y.-R., and Brusilovsky, P. (2016) Tensor Factorization
for Student Modeling and Performance Prediction in Unstructured Domain.
Proceedings of the 9th International Conference on Educational Data
Mining (EDM 2016), pp. 502-505.
• Data-driven student modeling
– González-Brenes, J. P., Huang, Y., and Brusilovsky, P. (2014)
General Features in Knowledge Tracing to Model Multiple Subskills,
Temporal Item Response Theory, and Expert Knowledge. Proceedings of the
7th International Conference on Educational Data Mining (EDM 2014),
London, UK, July 4-7, 2014, pp. 84-91.
• Concept extraction for modeling and adaptation
– Huang, Y., Yudelson, M., Han, S., He, D., and Brusilovsky, P.
(2016) A Framework for Dynamic Knowledge Modeling in Textbook-Based
Learning. In: Proceedings of 24th Conference on User Modeling,
Adaptation and Personalization (UMAP 2016), pp. 141-150.

Acknowledgements
• Joint work with
– Rosta Farzan, Tomek Loboda, Michael Yudelson
– Sharon Hsiao, Sherry Sahebi, Julio Guerra,
– Roya Hosseini, Yun Huang, Rafael Dias Araújo
• U. of Pittsburgh “Innovation in Education” awards
• NSF Grants
– CAREER 0447083
– EHR 0310576
– IIS 0426021
• ADL.net support for OSLM work

Visit us in Pittsburgh to Learn More!

… or Read our Papers
• http://www.pitt.edu/~peterb/papers.html
• https://www.researchgate.net/profile/Peter_Brusilovsky
