Aspiring Minds
www.aspiringminds.com
Grading Programs using Machine Learning
Varun Aggarwal
Presented at KDD, 2014
Programming Assessments: Existing solutions
• Manual evaluation: can’t scale; not standardized
• Test-case based evaluation:
  • High false positives: hard-coded outputs, inadvertent errors
  • High false negatives: correct code that is not efficient
• Similarity metric between control flow graphs or syntax trees:
  • Must handle multiple correct implementations, which a similarity to a single reference does not theoretically accommodate
  • No mapping from the metric to objective feedback
Automatic grading of programs: why?
• Widely performed: it will help professors and TAs save a lot of time.
• Companies can recruit efficiently.
• MOOCs need automated assessment of open responses to be truly effective; true scaling of such systems has not yet been achieved.
Our Approach
A model to predict the logical correctness of a program, given the control and data dependencies it possesses.
Automata – Automatic program evaluation engine
• Machine learning based scoring engine
• Evaluation of programming best practices: a lint-style, rule-based system that detects programs not following programming best practices
• Asymptotic complexity evaluation: measures the run time of the code for various input sizes and empirically derives the complexity
Why do programming modules give a better test-shortlist rate?
• Programming has more predictive power than Logical ability in identifying good performers.
• Because Logical has lower predictive power, a higher cut-off has to be applied to it than to Programming to get the same organizational efficiency.
• The higher a person’s Programming capability, the lower the requirement on their Logical score.
• If a person scores below a given threshold on Programming, even a higher Logical ability does not help.
ML based scoring
[Figure: scoring pipeline. An evaluation rubric and an understanding of the human grading process lead to problem- and language-independent features; a machine learning model built from graded programs produces predicted grades for ungraded programs.]
OBJECTIVE
To print the pattern of integers
1
2 3
3 4 5
4 5 6 7
An implementation
#include <stdio.h>

void print_1(int N){
    int i, j, count;
    for(i = 1; i <= N; i++){
        count = i;                  /* row i starts at i */
        for(j = 0; j < i; j++){     /* row i has i numbers */
            printf("%d ", count);
            count++;
        }
        printf("\n");
    }
}
What does a grader look for?
1. Are there loops? Are there print statements?
2. Is there a nested-loop structure?
3. Is the conditional in the inner loop dependent on
- a variable modified in the outer loop?
- a variable used in the conditional of the outer loop?
Grammar for expressing features
• Simple features: keywords and tokens (counts)
  • Tokens like for, if, return, break; function calls like printf, strrev, strcat; declarations like int, char
  • Operators: the arithmetic, logical and relational operators used
  • Character constants like ‘0’ and ‘ ’, and character codes like 65 and 96
• Capturing logical constructs (interactions)
  • Control flow structure
  • Data dependencies
  • Data dependencies in the context of control flow
CONTROL FEATURES – COUNTS
• Counts of control-related keywords/tokens, e.g. count(for) = 2, count(for-in-for) = 1, count(while) = 0
• Control context of these keywords, e.g. the print command appears as loop(loop(print))
[Figure: control flow graph of the target program, with nodes i = 1, i <= N, i++, count = i, j = 0, j < i, print(count), count++, j++ and END, grouped into the parent scope, Loop 1 and Loop 2]
TARGET PROGRAM
void print(int N){
    for(i = 1; i <= N; i++){
        print newline;
        count = i;
        for(j = 0; j < i; j++){
            print count;
            count++;
        }
    }
}
DATA OPERATION FEATURES IN CONTROL-CONTEXT
• Counts of data-related tokens in the context of the control structure, e.g. count(block1:loop(loop(++))) = 2, count(block1:loop(loop_cond(<))) = 1
• Control context of data dependencies, captured in groups of expressions. For i++ and j < i:
  - var(i) is related to var(j), appearing in loop(loop_cond)
  - var(i) was previously incremented, appearing in a loop
  - the relation and the increment happen in the same block
[Figure: control-flow information annotated in the A-D-D graph of the target program; nodes i = 1, i <= N, i++, count = i, j = 0, j < i, print(count), count++ and j++, grouped into the parent scope, Loop 1 and Loop 2]
Case study
• Deployed Automata in a major product-based company’s recruitment
• Analyzed the performance improvement from using Automata over a test-case-pass based selection criterion
• 22.6% of candidates who were not shortlisted through test-case pass were shortlisted using Automata.
Experimental Results
Sort Problem
Doing it the one-class way!
PROBLEM   All features      Basic features
          Mean    Min25     Mean    Min25
1         0.57    0.61      0.52    0.56
2         0.80    0.83      0.72    0.75
3         0.75    0.81      0.59    0.73
4         0.81    0.81      0.75    0.75
5         0.68    0.69      0.55    0.61
Beats the test-case score on all problems but one.
How good is the final ML-based score?
Validation correlation ≥ 0.79 on all five problems
This matches the inter-rater correlation between two human raters
PROBLEM  # of features  Cross-val correl  Train correl  Validation correl  Test-case score
1        80             0.61              0.85          0.79               0.54
2        68             0.77              0.93          0.91               0.80
3        193            0.91              0.98          0.90               0.64
4        66             0.90              0.94          0.90               0.80
5        87             0.81              0.92          0.84               0.84
Can we get insight?
• The most contributing feature for the FindDigit problem:
int findDigit(int N, int digit){
    …
    …
    LOOP (N != <constant value>){
        …
        N = N / <constant value>
        …
    }
}
Features analyzed for the FindDigit problem: given a multi-digit number and a digit, find the number of times the digit appears in the number.
Yes, we can!
int findDigit(int N, int digit){
    ...
    while(N != 0){
        d = N%10;
        if(d == digit)
            ...
        N = N / 10;
    }
}
Evaluation Rubric
Score  Interpretation
5  Completely correct and efficient: an efficient implementation of the problem using the right control structures and data dependencies.
4  Correct with some silly errors: correct control structures and closely matching data dependencies; some silly mistakes cause the code to fail test cases.
3  Inconsistent logical structures: the right control structures exist, with few correct data dependencies.
2  Emerging basic structures: appropriate keywords and tokens are present, showing some understanding of the problem.
1  Gibberish code: seemingly unrelated to the problem at hand.
Automata – Sample report
[Figure: sample report showing the problem summary, the candidate’s source code, test-case pass/fail information, feedback on programming best practices, and the asymptotic complexity of the candidate’s solution]
Do our fancy features help?
Control and data-dependency features add around 0.15 points of validation correlation over token information on problems 1 to 4 (0.03 on problem 5).
PROBLEM  Type of feature       # of features  Cross-val correl  Train correl  Validation correl
1        All (w/o test case)   35             0.57              0.72          0.56
1        Basic                 60             0.62              0.87          0.41
2        All (w/o test case)   80             0.81              0.99          0.80
2        Basic                 26             0.59              0.72          0.67
3        All (w/o test case)   190            0.87              0.97          0.90
3        Basic                 26             0.74              0.89          0.74
4        All (w/o test case)   134            0.85              0.91          0.82
4        Basic                 35             0.83              0.88          0.69
5        All (w/o test case)   166            0.66              0.81          0.64
5        Basic                 40             0.61              0.78          0.61
Conclusion
• We propose the first machine learning based approach to automatically grade programs.
• We propose an innovative feature grammar that matches human intuition for grading programs.
• Models built for sample problems show promising results.
• We propose and demonstrate machine learning techniques that reduce the need for human-graded data to build models.
