Machine learning implementations on different datasets using Python
Submitted by: Oanh Doan, Tess Mundadan, Ramandeep Kaur Bagri
Problem formulation
● Electricity data set
● A classification problem: predict electrical grid stability as one of two classes;
● 1. Stable
● 2. Unstable
● Olivetti dataset
● Recognize faces with the highest possible accuracy using machine learning algorithms
● Compare the algorithms' accuracies to select the best-fitting face recognition algorithm.
About the Electricity Data Set
• Electrical grid data set with different attributes used to examine the stability of the system.
• We examine system stability using 10,000 observations with 13 attributes plus 1 class attribute (stabf). The attributes given in the dataset are:
● tau[x]: Reaction time of participant (real, from the range [0.5, 10] s).
● p[x]: Nominal power consumed (negative) / produced (positive) (real).
● g[x]: Coefficient (gamma) proportional to price elasticity (real, from the range [0.05, 1] s^-1).
● stab: The maximal real part of the characteristic equation root (if positive, the system is linearly unstable) (real).
● stabf: The stability label of the system (stable/unstable).
Head of the data set
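The head of the data set appeared as a screenshot on the original slide. A minimal sketch of loading the data with pandas and inspecting it (the file name is an assumption):

```python
# A minimal sketch (assumed file name) of loading the simulated grid
# stability data and inspecting its head, as shown on the original slide.
import pandas as pd

df = pd.read_csv("Data_for_UCI_named.csv")   # assumed file name

print(df.shape)                    # per the slides: 10,000 rows, 13 attributes + stabf
print(df.head())                   # tau1..tau4, p1..p4, g1..g4, stab, stabf
print(df["stabf"].value_counts())  # class balance: stable vs. unstable
```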
Preprocessing
Head of the data set
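A sketch of a preprocessing step consistent with the later slides (label encoded, features standardized); the exact steps in the original notebook may differ:

```python
# Sketch of a preprocessing step consistent with the later slides: encode the
# stabf label, drop the numeric 'stab' column (it directly encodes the label),
# and standardize the features. The original notebook's exact steps may differ.
from sklearn.preprocessing import StandardScaler

X = df.drop(columns=["stab", "stabf"])        # numeric predictors only
y = (df["stabf"] == "unstable").astype(int)   # 1 = unstable, 0 = stable

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)            # zero mean, unit variance
```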
PCA transformation
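A minimal sketch of the PCA transformation; the number of components kept on the original slide is not stated, so two components are used here purely for illustration:

```python
# Minimal PCA sketch; the number of components kept on the original slide is
# not stated, so two components are used here purely for illustration.
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)   # X_scaled from the preprocessing sketch
print(pca.explained_variance_ratio_)  # variance captured per component
```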
Classification using PCA
Method          90:10   80:20   75:25   70:30
Naive Bayes     0.706   0.6985  0.6932  0.6997
KNN (N=12)      0.686   0.6735  0.666   0.6723
SVM             0.694   0.689   0.6848  0.6897
Decision Tree   0.618   0.622   0.6352  0.6216
Random Forest   0.661   0.6655  0.6612  0.6703
(Confusion-matrix plots shown on the original slide are omitted here.)
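A sketch of how the comparison above could be produced: each classifier is trained on the PCA-transformed features (X_pca and y from the sketches above) at the four train:test ratios; unspecified hyperparameters are left at scikit-learn defaults:

```python
# Sketch of the accuracy comparison: train each classifier on the
# PCA-transformed features at several train:test ratios.
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

models = {
    "Naive Bayes": GaussianNB(),
    "KNN (N=12)": KNeighborsClassifier(n_neighbors=12),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
}

for test_size in (0.10, 0.20, 0.25, 0.30):    # 90:10, 80:20, 75:25, 70:30
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_pca, y, test_size=test_size, random_state=0)
    for name, model in models.items():
        acc = accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
        print(f"{name:15s} test={test_size:.2f} acc={acc:.4f}")
```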
Classification without PCA, using standardization
Method                           90:10   80:20   75:25   70:30
Naive Bayes                      0.831   0.832   0.833   0.83466
KNN (N=12)                       0.906   0.9102  0.9108  0.909
SVM                              0.81    0.812   0.8108  0.8123
Decision Tree                    0.879   0.8715  0.869   0.85766
Random Forest (n_estimators=60)  0.923   0.9215  0.916   0.9186
(Confusion-matrix plots shown on the original slide are omitted here.)
Classification at a 95:5 split ratio
Method          Accuracy
Naive Bayes     0.832
KNN             0.92
SVM             0.92
Decision Tree   0.878
Random Forest   0.826
(Confusion-matrix plots shown on the original slide are omitted here.)
Selecting different features: results (test size = 0.05)
Naive Bayes: accuracy = 0.982
KNN: accuracy = 0.964
Decision Tree: accuracy = 1.0
Random Forest: accuracy = 1.0
SVM: accuracy = 0.992
Best algorithm for classification on the simulated electricity data set
Random Forest (n_estimators = 60):
● Train:test split: 90:10
● Accuracy = 0.923
● Confusion matrix: (shown as an image on the original slide)
● Although KNN and SVM gave good accuracy at the 95:5 ratio, that result comes from a very small test set and needs further evaluation against the training data.
● With different feature selections the observed accuracy is 1.0, but the model is overfitted; proper preprocessing and feature selection are needed to train the models and improve accuracy. (A sketch of the selected model follows below.)
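A sketch of the selected model: Random Forest with n_estimators = 60 on the standardized, non-PCA features (X_scaled and y from the sketches above) at a 90:10 split; random_state is an added assumption for reproducibility:

```python
# Sketch of the selected model: Random Forest with n_estimators=60 at a
# 90:10 split, reporting accuracy and the confusion matrix.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

X_tr, X_te, y_tr, y_te = train_test_split(
    X_scaled, y, test_size=0.10, random_state=0)

rf = RandomForestClassifier(n_estimators=60, random_state=0)
rf.fit(X_tr, y_tr)
pred = rf.predict(X_te)

print("Accuracy:", accuracy_score(y_te, pred))
print("Confusion matrix:\n", confusion_matrix(y_te, pred))
```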
Step 1: Load and explore the data
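A minimal sketch of loading and exploring the Olivetti faces with scikit-learn; the original notebook may have loaded the data differently:

```python
# Minimal sketch of loading and exploring the Olivetti faces data set.
from sklearn.datasets import fetch_olivetti_faces

faces = fetch_olivetti_faces()
print(faces.images.shape)   # (400, 64, 64): 40 people x 10 images each
print(faces.data.shape)     # (400, 4096): each image already flattened
print(faces.target.shape)   # (400,): person id 0..39
```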
Step 2: Visualize the data (Olivetti data set)
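A sketch of one possible visualization: the first image of each of the 40 subjects in a grid ('faces' comes from the loading sketch above; the layout is an illustrative assumption):

```python
# Sketch of a visualization: one image per subject in a 4 x 10 grid.
import matplotlib.pyplot as plt

fig, axes = plt.subplots(4, 10, figsize=(12, 5))
for person, ax in enumerate(axes.ravel()):
    ax.imshow(faces.images[person * 10], cmap="gray")  # 10 images per person
    ax.set_title(str(person), fontsize=6)
    ax.axis("off")
plt.show()
```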
Python Packages Used
Reshape data
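A sketch of the reshape step described in the speaker notes: estimators expect a 2-D (n_samples, n_features) array, so each 64 × 64 face matrix is flattened into a 4096-element vector:

```python
# Reshape: flatten each 64x64 face matrix into a 4096-element feature vector.
X = faces.images.reshape(len(faces.images), -1)  # (400, 4096)
y = faces.target                                 # (400,)
print(X.shape, y.shape)
```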
Accuracy Metrics Comparison without PCA
Method               90:10   80:20   70:30
Naive Bayes          92.5    87.50   73.33
KNN                  90.0    91.25   90.00
Random Forest        90.0    93.75   93.33
SVM                  95.0    96.25   96.67
Logistic Regression  95.0    97.50   97.50
Dimensionality Reduction using PCA
Mean Face of the Samples
Eigenfaces
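The mean face and eigenfaces were shown as images on the original slides. A sketch of how they could be obtained with PCA (103 components matches one setting in the next table; X is the flattened face matrix from the reshape sketch):

```python
# Sketch of computing the mean face and eigenfaces via PCA.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca = PCA(n_components=103)
X_faces_pca = pca.fit_transform(X)

plt.imshow(pca.mean_.reshape(64, 64), cmap="gray")   # mean face
plt.title("Mean face")
plt.show()

fig, axes = plt.subplots(2, 5, figsize=(10, 4))      # first 10 eigenfaces
for ax, component in zip(axes.ravel(), pca.components_):
    ax.imshow(component.reshape(64, 64), cmap="gray")
    ax.axis("off")
plt.show()
```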
Metrics with different Principal Components
Method               PC = 90   PC = 103   PC = 200
Naive Bayes          76.67     77.50      69.17
KNN                  90.83     90.83      90.00
Random Forest        93.33     92.50      91.67
Logistic Regression  96.67     98.33      98.33
SVM                  96.67     96.67      96.67
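A sketch reproducing this kind of comparison: fit PCA with different numbers of principal components and score one classifier on each (logistic regression is shown; the 70:30 split and stratification are assumptions):

```python
# Sketch: score a classifier for different numbers of principal components.
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y)

for n_pc in (90, 103, 200):
    pca = PCA(n_components=n_pc).fit(X_tr)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(pca.transform(X_tr), y_tr)
    acc = accuracy_score(y_te, clf.predict(pca.transform(X_te)))
    print(f"PC = {n_pc}: accuracy = {acc:.4f}")
```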
Accuracy Metrics Comparison with PCA
Method               90:10                   80:20                   70:30
                     w/o PCA    with PCA     w/o PCA    with PCA     w/o PCA    with PCA
Naive Bayes          92.5       92.5         87.50      90.0         73.33      77.50
KNN                  90.0       90.0         91.25      92.5         90.00      90.83
Random Forest        90.0       100.0        93.75      95.0         93.33      92.50
Logistic Regression  95.0       97.5         96.25      97.5         97.50      98.33
SVM                  95.0       95.0         97.5       97.5         96.67      96.67
Best fit for the Olivetti data set
Random Forest with PCA (103 components)
Train:test split = 90:10
Accuracy = 100%
Conclusion
Use of PCA improved the accuracy metrics.
Project enhancement idea: identify smiling vs. non-smiling faces, or identify male vs. female faces.
Reference
● https://www.kaggle.com/serkanpeldek/face-recognition-on-olivetti-dataset
● https://colab.research.google.com/notebooks/welcome.ipynb
● https://www.youtube.com/watch?v=PitcORQSjNM
● https://www.youtube.com/watch?v=_VTtrSDHPwU&t=86s
● https://www.kaggle.com/elikplim/eergy-efficiency-dataset/kernels
● Class notes: Introduction to data mining and machine learning
Thank you for your attention

Editor's Notes

  • #15: The data consists of pictures of 40 different people, shown one person per column in the visualization. Each row is a sample of one person with a different facial expression, used to train the models. Dimensions: 10 images for each of the 40 people, 400 in total.
  • #19: No N/As. The machine learning models work on feature vectors, so since the image data is in matrix form it must be flattened into a vector. Different test-set sizes are used: 0.1, 0.2, 0.3.
  • #24: SVM is not affected by PCA. The 100% accuracy is due to the small, clean data set. Increasing the percentage of the training set improves accuracy.