Using Multiple Accounts for Harvesting Solutions in MOOCs
José A. Ruipérez Valiente (b,c,*) @JoseARuiperez
Giora Alexandron (a,*)
Zhongzhou Chen (a)
David E. Pritchard (a)
Contact: {jruipere, giora, zchen22, dpritch}@mit.edu
(a) Massachusetts Institute of Technology
(b) Universidad Carlos III de Madrid
(c) IMDEA Networks Institute
(*) Equal contribution to this work
Overview
• Identify students who use fake accounts to harvest solutions that they later submit from their main account.
• CAMEO: Copying Answers using Multiple Existence Online (Northcutt, Ho, and Chuang, 2015)
• Main goals:
  – Study the amount of CAMEO, the motivation for using it, and how it can be reduced
• Course studied: MITx 8.MReV Introductory Physics on edX
• Some of the results:
  – ~3% of the certificate earners used this method for more than 50% of their correct submissions
  – More cheating on high-stakes questions
  – Less cheating on randomized questions and when feedback is limited
• Importance of the research:
  – Threatens the value of certificates
  – Interferes with educational research
• Closely related work:
  – Related to academic dishonesty (students break the edX Honor Code) and to gaming the system (exploiting properties of the system)
  – CAMEO in MITx/HarvardX courses (Northcutt, Ho, and Chuang, 2015)
  – Online homework tutor system (Palazzo, Lee, Warnakulasooriya, and Pritchard, 2010)
Using Multiple Accounts for Harvesting Solutions in MOOCs @JoseARuiperez
Methodology
[Figure: Harvester and Master account activity on Question 1 and Question 2, with the delay between harvesting and submission marked]
• What we detect: cheating using multiple accounts
  – A harvesting account is used to collect correct solutions (using ‘show answer’ or exhaustive search)
  – A master account submits these solutions
• Methodology: educational data mining on the tracking logs
• Algorithm:
  1. Collect all events (harvester, master, q) with the following properties:
     • Harvester obtains the solution for q, then master submits it
     • Harvester and master share an IP group
     • Delay between the two events < 24h
  2. Apply criteria:
     • Master never behaves as a harvester, and vice versa
     • Harvester works for others:
       – Most of its submissions are actually used by the master
       – Does not earn a certificate
     • Master has harvested answers for at least 10 questions
  3. Remove events of users who don't meet these criteria
• We manually verified a sample of the detected events
• The identified events show a distinctive delay distribution
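The three algorithm steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: events are assumed to be pre-extracted from the tracking logs into hypothetical `Event` records; `kind` is a hypothetical label (`"harvest"` for a correct answer obtained via ‘show answer’ or exhaustive search, `"correct"` for a correct submission); `ip_group` is an assumed pre-computed IP grouping attached to each event; and "most of its submissions are used by the master" is read as more than half of the harvester's harvested questions.

```python
from collections import defaultdict
from dataclasses import dataclass

DAY_SECONDS = 24 * 60 * 60


@dataclass(frozen=True)
class Event:
    user: str
    question: str
    time: float    # seconds since an arbitrary reference point (assumed)
    kind: str      # hypothetical labels: "harvest" or "correct"
    ip_group: str  # hypothetical pre-computed IP group label


def candidate_triples(events, max_delay=DAY_SECONDS):
    """Step 1: (harvester, master, q) where the harvester obtained the answer
    to q before the master submitted it correctly, both events fall in the
    same IP group, and the delay is below max_delay."""
    harvests = defaultdict(list)  # (question, ip_group) -> harvest events
    for e in events:
        if e.kind == "harvest":
            harvests[(e.question, e.ip_group)].append(e)
    triples = set()
    for e in events:
        if e.kind != "correct":
            continue
        for h in harvests[(e.question, e.ip_group)]:
            if h.user != e.user and 0 <= e.time - h.time < max_delay:
                triples.add((h.user, e.user, e.question))
    return triples


def apply_criteria(triples, events, certified, min_questions=10):
    """Steps 2-3: keep only triples whose accounts satisfy the criteria."""
    masters = {m for _, m, _ in triples}
    harvesters = {h for h, _, _ in triples}
    harvested_by = defaultdict(set)  # every question each account harvested
    for e in events:
        if e.kind == "harvest":
            harvested_by[e.user].add(e.question)
    used = defaultdict(set)  # (harvester, master) -> questions the master used
    for h, m, q in triples:
        used[(h, m)].add(q)

    # Role separation and "harvester works for others".
    kept = set()
    for h, m, q in triples:
        roles_ok = h not in masters and m not in harvesters
        mostly_used = len(used[(h, m)]) > 0.5 * len(harvested_by[h])
        if roles_ok and mostly_used and h not in certified:
            kept.add((h, m, q))

    # The master must have harvested answers for at least min_questions questions.
    per_master = defaultdict(set)
    for _, m, q in kept:
        per_master[m].add(q)
    return {t for t in kept if len(per_master[t[1]]) >= min_questions}


# Toy example (invented data): h1 harvests q0..q9; m1 submits each 30 s later.
demo = []
for i in range(10):
    demo.append(Event("h1", f"q{i}", 100.0 * i, "harvest", "g1"))
    demo.append(Event("m1", f"q{i}", 100.0 * i + 30, "correct", "g1"))
print(len(apply_criteria(candidate_triples(demo), demo, certified=set())))  # 10
```

Indexing harvest events by (question, IP group) avoids comparing every pair of accounts in step 1. The harvester-side criteria are applied before the 10-question threshold, so pairs discarded earlier do not count toward a master's total.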
Findings: Amount of CAMEO
• Population of the analysis:
  – Students who completed at least 5% of the questions (1581 students)
  – 502 certificate earners
Group                     # master accounts   # harvester accounts   # harvested answers
All students              99 (6.3%)           112                    19602 (3%)
Certificate earners       52 (10.3%)          63                     12396 (3%)
Non-certificate earners   47 (4.35%)          49                     7206 (3%)
2% of the certificate earners obtained 70% of their correct answers using CAMEO
Findings: Characteristics of CAMEO events
• Harvesting technique: 53.5% using ‘show answer’ and 46.5% using exhaustive search
• Harvesting precedes the master's first answer in 91% of cases
• Delay between the harvesting event and the submission in the master account:
  – 90% of events below 5 min
  – Median 27 seconds
  – Mode 5 seconds
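Summary statistics like these can be computed from a list of harvest-to-submission delays. A minimal sketch using Python's standard `statistics` module; the toy delays are invented for illustration and are not the study's data. Delays are rounded to whole seconds before taking the mode, on the assumption that raw timestamps may carry sub-second precision.

```python
import statistics


def delay_summary(delays_seconds, threshold=5 * 60):
    """Summarize harvest-to-submission delays given in seconds."""
    rounded = [round(d) for d in delays_seconds]  # mode needs discrete values
    below = sum(1 for d in delays_seconds if d < threshold)
    return {
        "median": statistics.median(delays_seconds),
        "mode": statistics.mode(rounded),
        "share_below_threshold": below / len(delays_seconds),
    }


# Toy delays, not the study's data: median 27, mode 5, 6 of 7 below 5 min.
print(delay_summary([5, 5, 12, 27, 40, 90, 400]))
```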
Findings: Cheaters vs. other students
[Figure, left: density distribution of performance per type of account. Right: scatterplot of response time vs. performance. Values shown: 82%, 61%, 43%.]
ANOVA (F = 72.43, p < 0.001)
Findings: Certificate vs. non-certificate earners
• Comparison of cheating over time
  – Non-certificate earners tend to cheat more → cheating for a certificate
  – Drop in chapters 9+10 due to the certification peak
  – Behavior differs depending on the student
[Figure annotation: 85% of certificates were earned within this chunk!]
Findings: Reduction methods
• Limiting feedback reduces harvesting
  – Our findings suggest that questions with delayed feedback are harvested about 3x less often than the rest
• Randomization reduces harvesting
  – Reduces the number of harvested answers by about 2x
  – Statistically significant both per student and per problem
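A factor such as ‘3x less’ or ‘about 2x’ depends on how harvesting is normalized, which the slide does not state. One plausible reading, sketched here as an assumption, compares the fraction of correct answers that were harvested on treated questions (delayed feedback or randomized) against the remaining questions. `stats` and `is_treated` are hypothetical names, and the counts are toy values.

```python
def harvesting_rate(stats, questions):
    """Fraction of correct answers on `questions` that were harvested."""
    harvested = sum(stats[q][0] for q in questions)
    correct = sum(stats[q][1] for q in questions)
    return harvested / correct


def reduction_factor(stats, is_treated):
    """How many times less often treated questions are harvested than the rest.

    stats maps question id -> (harvested_answers, correct_answers).
    """
    treated = [q for q in stats if is_treated(q)]
    others = [q for q in stats if not is_treated(q)]
    return harvesting_rate(stats, others) / harvesting_rate(stats, treated)


# Toy counts; delayed-feedback questions use a hypothetical "d" prefix.
stats = {"d1": (1, 32), "d2": (1, 32), "q1": (3, 32), "q2": (3, 32)}
print(reduction_factor(stats, lambda q: q.startswith("d")))  # 3.0
```

Pooling counts per group, rather than averaging per-question rates, keeps rarely answered questions from dominating the ratio; the study's own normalization may differ.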
Discussion and conclusions
• Motivation? Certificate!
  – Masters do not try to solve the problem first (no learning?)
  – High-stakes questions are more likely to be harvested
  – Behavior changes after the certificate is earned
  – Non-certificate earners start with the intention of ‘cheating for a certificate’
• A problematic issue
  – 10% of students harvested around 1% of their solutions
  – A relatively large number of students cheat in an introductory physics course
  – This might increase if MOOC certificates become significant for industry
• Implications
  – Decreases confidence in the assessment → affects the perceived value of certificates
  – Can interfere with MOOC research
  – Can apply to other online learning environments as well
• Remedies
  – Problems with randomized variables
  – Delaying (or removing) feedback on high-stakes questions
• Limitations
  – Internal validity: no definitive evidence that what we detect is CAMEO, and no tagged sample (‘training set’)
    • Too high? (4x more CAMEO users than Northcutt et al. found in our course)
    • Too low? (very strict criteria; our figures are low compared to the general literature)
  – Generalizability: we studied only one course. How typical is it?
• Future research
  – More courses, to broaden the understanding of the phenomenon
  – Detection without relying on IP
  – Run-time detection [by the platform itself]
  – Other ways of copying
Appendix 1: Example pattern
[Figure: example activity pattern of a Harvester and a Master account]
Appendix 2: Example of individual profiles
