EVALUATION OF PROGRAMS CODES USING MACHINE LEARNING
SUBMITTED BY –
MANAS CHHABRA 2K12/SE/041
ROHIT PAL 2K12/SE/066
TANMAY AGGARWAL 2K12/SE/
OBJECTIVE
• To detect copied code submitted by different users on online judges.
HOW CAN WE SAY WHETHER TWO CODES
ARE COPIED ?
• Traditional way
• Write a program that compares every submission with every other
submission in the database.
THEN WHAT IS THE PROBLEM WITH THIS
METHOD?
Assume that 4000 users solve a given programming question and
that every submission has 30 lines of code. Every submission must
then be compared with each of the other 3999.
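To make the scale concrete, here is a minimal sketch of the arithmetic for the 4000-user, 30-line scenario. The function names are hypothetical, and the per-pair cost assumes a naive comparison of every line of one submission against every line of the other, which the slides do not specify.

```python
from math import comb


def pairwise_comparisons(num_submissions: int) -> int:
    """Number of unordered pairs of submissions to compare."""
    return comb(num_submissions, 2)


def naive_line_comparisons(num_submissions: int, lines_per_submission: int) -> int:
    """Line-level comparisons, ASSUMING each pair is compared line against line."""
    return pairwise_comparisons(num_submissions) * lines_per_submission ** 2


print(pairwise_comparisons(4000))        # 7998000
print(naive_line_comparisons(4000, 30))  # 7198200000
```

The number of pairs grows quadratically with the number of users, which is the cost the rest of the deck tries to avoid.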
WE CAN RESOLVE THIS COMPLEXITY USING
MACHINE LEARNING
Before we see how we reduced this complexity, let's first look at
the machine learning concepts we applied.
WHAT IS MACHINE LEARNING?
• According to Arthur Samuel, machine learning is the field of
study that gives computers the ability to learn without being
explicitly programmed.
• According to Tom Mitchell, a well-posed learning problem is:
a computer program is said to learn from experience E with
respect to some task T and some performance measure P if its
performance on T, as measured by P, improves with experience
E.
MACHINE LEARNING ALGORITHMS
• Supervised Learning
• Unsupervised Learning
WE ARE GOING TO USE UNSUPERVISED
MACHINE LEARNING
Clustering: learning from unlabeled data.
Unsupervised learning
Try to determine structure in the data
A clustering algorithm groups data together based on data
features
WHAT IS CLUSTERING GOOD FOR
• Market segmentation - group customers into different market
segments
• Social network analysis - Facebook "smartlists"
• Organizing computer clusters and data centers for network
layout & location
• Astronomical data analysis - Understanding galaxy formation
K-MEANS ALGORITHM
• Used to automatically group the data into coherent clusters
• e.g. Assume we have a set of unlabeled data
• Step 1: Randomly allocate two points as the cluster
centroids
• Step 2: Go through each example and, depending on whether it is
closer to the red or the blue centroid, assign it to one of the two
clusters
• Step 3: Move centroid step
• Take each centroid and move it to the average of the
data points assigned to it
• Repeat Step 2 and Step 3 until convergence
MORE FORMAL DEFINITION
• INPUT:
• K (number of clusters in the data)
• Training set {x1, x2, x3, ..., xn}
• Algorithm
• Randomly initialize K cluster centroids as {μ1, μ2, μ3 ... μK}
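The three steps (random initialization, cluster assignment, move centroid) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the function names, the `max_iters` cap, the fixed `seed`, and the choice to pick the initial centroids from the training points are all assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step 1: randomly pick K training points as the initial centroids (assumption).
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Step 2: assign every point to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Converged: no point changed cluster since the last pass.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(data, k=2)
print(labels)
```

Which cluster gets index 0 depends on the random initialization, so only the grouping is meaningful, not the label values.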
DIMENSIONALITY REDUCTION
• Speeds up algorithms
• Reduces the space needed to store the data
• Reduce dimension from nD to mD
e.g. Reduction of 3D->2D
PRINCIPAL COMPONENT ANALYSIS (PCA)
• To reduce from nD to kD, we find k vectors (u(1), u(2), ... u(k)) onto
which to project the data so as to minimize the projection error
• So there are several vectors onto which we project the data
• Find a set of vectors and project the data onto the linear
subspace spanned by that set of vectors
• A point in that subspace can be described using the k vectors
• e.g. 3D->2D: find a pair of vectors which define a 2D plane
(surface) onto which you're going to project your data
• Much like the "shallow box" example in compression, we're
trying to create the shallowest box possible (by defining two of
its three dimensions, so the box's depth is minimized)
ALGORITHM
• Reducing data from n dimensions to k dimensions
• Compute the covariance matrix
• This is commonly denoted as
• Σ (Greek upper-case sigma), NOT the summation symbol
• Σ = sigma (the variable name used in code)
• This is an [n x n] matrix
• Remember that xi is an [n x 1] matrix
• In MATLAB or Octave we can implement this as follows:
• Compute the eigenvectors of matrix Σ
• [U,S,V] = svd(sigma)
• svd = singular value decomposition
• More numerically stable than eig
• eig also gives the eigenvectors
• U, S and V are matrices
• The U matrix is also an [n x n] matrix
• It turns out the columns of U are the u vectors we want!
• So to reduce a system from n dimensions to k dimensions
• Just take the first k vectors from U (the first k columns)
• Now if we need to reduce to k dimensions
• Then we extract the first k columns of matrix U into Ureduce
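The covariance formula itself did not survive in the slide text. The block below writes out the conventional form of the steps above as a sketch under stated assumptions: N is the number of training examples, the data is taken to be mean-normalized, and the 1/N scaling and the projected point z_i are the usual textbook conventions rather than something the slides state.

```latex
% Assumption: x_i \in \mathbb{R}^{n} are mean-normalized, i = 1, \dots, N.
\Sigma = \frac{1}{N} \sum_{i=1}^{N} x_i \, x_i^{T}
  \qquad (\text{an } n \times n \text{ matrix})

% Singular value decomposition, as in [U,S,V] = svd(sigma).
\Sigma = U S V^{T}, \qquad
U = \begin{bmatrix} u^{(1)} & u^{(2)} & \cdots & u^{(n)} \end{bmatrix}

% Keep the first k columns of U.
U_{\mathrm{reduce}} = \begin{bmatrix} u^{(1)} & \cdots & u^{(k)} \end{bmatrix}
  \in \mathbb{R}^{n \times k}

% Assumed projection step: map each example to k dimensions.
z_i = U_{\mathrm{reduce}}^{T} \, x_i \in \mathbb{R}^{k}
```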
NOW LET’S IMPLEMENT THIS INFORMATION
TO OUR PROJECT
FEATURES TO BE CONSIDERED
- Type of file, e.g. .java, .cpp, etc.
- No. of lines of code
- No. of functions
- No. of variables used
- No. of if-else conditions
- No. of loops
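One way these features might be extracted is sketched below. It is only a rough illustration: `extract_features` and its dictionary keys are hypothetical names, and the regular expressions are crude heuristics for C-like and Java-like code (they ignore comments, strings, and user-defined types), not a real parser.

```python
import re
from pathlib import PurePath

# Heuristic patterns; all of them are simplifying assumptions.
TYPES = r"(?:void|int|long|float|double|char|bool|boolean|String)"
LOOP_RE = re.compile(r"\b(?:for|while)\b")
IF_RE = re.compile(r"\bif\b")
# A type keyword, an identifier, then '=', ';' or ','.
VAR_RE = re.compile(r"\b" + TYPES + r"\s+\w+\s*[=;,]")
# A type keyword, an identifier, a parameter list, then an opening brace.
FUNC_RE = re.compile(r"\b" + TYPES + r"\s+\w+\s*\([^)]*\)\s*\{")


def extract_features(filename, source):
    """Return the listed features for one submission as a dictionary."""
    non_blank = [line for line in source.splitlines() if line.strip()]
    return {
        "file_type": PurePath(filename).suffix,
        "lines_of_code": len(non_blank),
        "functions": len(FUNC_RE.findall(source)),
        "variables": len(VAR_RE.findall(source)),
        "if_conditions": len(IF_RE.findall(source)),
        "loops": len(LOOP_RE.findall(source)),
    }


sample = """int main() {
    int total = 0;
    for (int i = 0; i < 10; i++) {
        if (i % 2 == 0) {
            total += i;
        }
    }
    return total;
}
"""

print(extract_features("solution.cpp", sample))
```

The numeric counts can be placed in a vector per submission for clustering; the file type is categorical and would need separate handling, for example by clustering each language separately.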
APPLICATIONS
• Besides detecting cheating in online programming contests,
this approach has the following applications:
• We can use it in our programming labs to evaluate the
programs submitted by students according to complexity.
• We can identify which programming styles are currently popular.
THANK YOU !
