Performance Analysis of Machine Learning Algorithms on Self Localization Systems
In this paper the author uses SVM (Support Vector Machine), Decision Tree Classifier, K-Neighbors Classifier, Naïve Bayes, Random Forest Classifier, Bagging Classifier, AdaBoost Classifier and MLP Classifier.
All the algorithms build a model from the training dataset, and new data is then applied to the trained model to predict its class. The Random Forest algorithm gives better prediction accuracy than all the other algorithms.
Support vector machine:
Machine learning involves predicting and classifying data, and to do so we employ various machine learning algorithms according to the dataset. SVM, or Support Vector Machine, is a linear model for classification and regression problems. It can solve linear and non-linear problems and works well for many practical problems. The idea of SVM is simple: the algorithm creates a line or a hyperplane which separates the data into classes. In machine learning, the radial basis function kernel, or RBF kernel, is a popular kernel function used in various kernelized learning algorithms; in particular, it is commonly used in support vector machine classification. As a simple example, for a classification task with only two features, you can think of a hyperplane as a line that linearly separates and classifies a set of data.
Intuitively, the further from the hyperplane our data points lie, the more confident we are that they have been correctly classified. We therefore want our data points to be as far away from the hyperplane as possible, while still being on the correct side of it.
So when new testing data is added, whichever side of the hyperplane it lands on will decide the class that we assign to it.
How do we find the right hyperplane?
Or, in other words, how do we best segregate the two classes within the data?
The distance between the hyperplane and the nearest data point from either set is known as the margin. The goal is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly.
Like the other classifiers, SVM builds a model from the training dataset, and new data is then applied to the trained model to predict its class. In our results the SVM algorithm gives better prediction accuracy than the ANN (MLP) algorithm.
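As a hedged illustration, here is a minimal scikit-learn sketch of training an RBF-kernel SVM; the synthetic data from make_classification merely stands in for the project's 'student data' dataset.

    # Minimal sketch: RBF-kernel SVM on synthetic stand-in data.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    X, y = make_classification(n_samples=500, n_features=4, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=42)

    model = SVC(kernel='rbf', C=1.0, gamma='scale')  # RBF kernel, default regularization
    model.fit(X_train, y_train)
    print('SVM accuracy:', accuracy_score(y_test, model.predict(X_test)))

Maximizing the margin is what the C parameter trades off against training errors: a smaller C allows a wider margin at the cost of some misclassified training points.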
Naïve Bayes Classifier Algorithm
It would be difficult and practically impossible to classify a web page, a document, an email or any other lengthy text manually. This is where the Naïve Bayes Classifier machine learning algorithm comes to the rescue. A classifier is a function that assigns a population's element to one of the available categories. For instance, spam filtering is a popular application of the Naïve Bayes algorithm: the spam filter is a classifier that assigns the label "Spam" or "Not Spam" to each email.
The Naïve Bayes Classifier is among the most popular learning methods grouped by similarities that work on the popular Bayes theorem of probability to build machine learning models, particularly for disease prediction and document classification. It is a simple classification of words based on the Bayes probability theorem for subjective analysis of content.
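A toy sketch of the spam-filtering example with scikit-learn; the example messages below are made up for illustration, and MultinomialNB applies Bayes' theorem over word-count features.

    # Tiny illustrative Naïve Bayes spam filter (made-up messages).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    messages = ['win a free prize now', 'free money click now',
                'meeting at noon tomorrow', 'lunch with the team today']
    labels = ['Spam', 'Spam', 'Not Spam', 'Not Spam']

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(messages)            # word-count features
    clf = MultinomialNB().fit(X, labels)
    print(clf.predict(vectorizer.transform(['free prize now'])))  # likely 'Spam'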
Decision tree:
A decision tree is a graphical representation that makes use of branching methodology to exemplify all possible outcomes of a decision, based on certain conditions. In a decision tree, an internal node represents a test on an attribute, each branch of the tree represents an outcome of the test, and a leaf node represents a particular class label, i.e. the decision made after computing all of the attributes.
The classification rules are represented by the paths from the root to the leaf nodes.
Types of Decision Trees
Classification Trees - These are considered the default kind of decision trees, used to separate a dataset into different classes based on the response variable. They are generally used when the response variable is categorical in nature.
Regression Trees - When the response or target variable is continuous or numerical, regression trees are used. These are generally used for predictive problems rather than classification.
Decision trees can also be classified into two types based on the type of target variable: Continuous Variable Decision Trees and Binary Variable Decision Trees. It is the target variable that helps decide what kind of decision tree is required for a particular problem.
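A short sketch of a classification tree in scikit-learn; the built-in iris dataset is used here purely as a stand-in, and export_text prints each root-to-leaf path as a classification rule.

    # Classification-tree sketch on the built-in iris dataset.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_iris()
    tree = DecisionTreeClassifier(max_depth=2, random_state=42)
    tree.fit(data.data, data.target)
    # Each path from the root to a leaf is one classification rule:
    print(export_text(tree, feature_names=list(data.feature_names)))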
Random forest:
Random Forest is the go-to machine learning algorithm that uses a bagging approach to create a collection of decision trees from random subsets of the data. A model is trained several times on random samples of the dataset to achieve good prediction performance from the random forest algorithm. In this ensemble learning method, the outputs of all the decision trees in the random forest are combined to make the final prediction. The final prediction of the random forest algorithm is derived by polling the results of each decision tree, i.e. by going with the prediction that appears most often among the trees.
For instance, if 5 friends decide that you will like restaurant R but only 2 friends decide that you will not, then the final prediction is that you will like restaurant R, as the majority always wins.
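The restaurant analogy maps directly onto the forest's vote: each fitted tree is one 'friend', and the majority prediction wins. A small sketch on synthetic data (scikit-learn actually averages the trees' probability estimates, which for fully grown trees amounts to a majority vote):

    # Random-forest majority vote made explicit (synthetic data).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=200, n_features=4, random_state=42)
    forest = RandomForestClassifier(n_estimators=7, random_state=42).fit(X, y)

    sample = X[:1]
    votes = [int(t.predict(sample)[0]) for t in forest.estimators_]
    print('Individual tree votes:', votes)
    print('Majority prediction:', int(forest.predict(sample)[0]))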
K – nearest neighbor:
The k-nearest neighbors algorithm (k-NN) is a nonparametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:
• In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
• In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.
k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification.
Both for classification and regression, a useful technique is to assign weights to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.
The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.
A peculiarity of the k-NN algorithm is that it is sensitive to the local structure of the data.
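A minimal k-NN sketch with scikit-learn showing both the plurality vote (k = 3) and the 1/d distance weighting described above; the toy points are made up.

    # k-NN classification with distance (1/d) weighting on toy points.
    from sklearn.neighbors import KNeighborsClassifier

    X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]  # toy feature space
    y = [0, 0, 0, 1, 1, 1]

    knn = KNeighborsClassifier(n_neighbors=3, weights='distance')
    knn.fit(X, y)                   # lazy learner: fit just stores the data
    print(knn.predict([[4, 4]]))    # plurality vote among the 3 nearest neighbors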
Bagging classifier:
A Bagging classifier is an ensemble meta-estimator that fits base classifiers, each on a random subset of the original dataset, and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree) by introducing randomization into its construction procedure and then making an ensemble out of it.
Each base classifier is trained in parallel on a training set generated by randomly drawing, with replacement, N examples from the original training dataset, where N is the size of the original training set. The training set for each base classifier is drawn independently of the others. Many of the original examples may be repeated in a resulting training set, while others may be left out.
Bagging reduces overfitting (variance) by averaging or voting; this can lead to a small increase in bias, which is compensated for by the reduction in variance.
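A hedged sketch of a bagging ensemble of decision trees in scikit-learn, with each tree fit on a bootstrap sample of the same size as the training set. (Note: the base_estimator argument was renamed to estimator in scikit-learn 1.2; the name below matches the 0.22.x version listed later in this document.)

    # Bagging decision trees on bootstrap samples (synthetic data).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=6, random_state=42)
    bag = BaggingClassifier(base_estimator=DecisionTreeClassifier(),
                            n_estimators=10,
                            max_samples=1.0,   # each bootstrap sample has N examples
                            bootstrap=True,    # drawn with replacement
                            random_state=42)
    bag.fit(X, y)
    print('Bagging accuracy on training data:', bag.score(X, y))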
AdaBoost:
Adaptive Boosting (AdaBoost) is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire. It can be used in conjunction with many other types of learning algorithms to improve performance. The output of the other learning algorithms ('weak learners') is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers. In some problems it can be less susceptible to the overfitting problem than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing, the final model can be proven to converge to a strong learner.
Every learning algorithm tends to suit some problem types better than others, and typically has many different parameters and configurations to adjust before it achieves optimal performance on a dataset; AdaBoost with decision trees as the weak learners is often referred to as the best out-of-the-box classifier.[2] When used with decision tree learning, information gathered at each stage of the AdaBoost algorithm about the relative 'hardness' of each training sample is fed into the tree-growing algorithm, so that later trees tend to focus on harder-to-classify examples.
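A brief AdaBoost sketch using depth-1 decision trees ('stumps') as the weak learners, as is conventional; synthetic data again stands in for the real dataset. (As with bagging, base_estimator is named estimator in scikit-learn 1.2 and later.)

    # AdaBoost over decision stumps (synthetic data).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=6, random_state=42)
    ada = AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=1),
                             n_estimators=50, random_state=42)
    ada.fit(X, y)
    # Later stumps are weighted toward examples the earlier stumps misclassified.
    print('AdaBoost accuracy on training data:', ada.score(X, y))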
Multilayer perceptron (MLP):
A multilayer perceptron (MLP) is a class of feedforward artificial neural network (ANN). The term MLP is used ambiguously, sometimes loosely to refer to any feedforward ANN, sometimes strictly to refer to networks composed of multiple layers of perceptrons (with threshold activation). Multilayer perceptrons are sometimes colloquially referred to as "vanilla" neural networks, especially when they have a single hidden layer.
An MLP consists of at least three layers of nodes: an input layer, a hidden layer and an output layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. MLP utilizes a supervised learning technique called backpropagation for training. Its multiple layers and non-linear activation distinguish MLP from a linear perceptron. It can distinguish data that is not linearly separable.
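The classic demonstration of that last point is the XOR function, which no linear perceptron can represent. A minimal sketch with scikit-learn's MLPClassifier:

    # One hidden layer with nonlinear activation can learn XOR,
    # which is not linearly separable.
    from sklearn.neural_network import MLPClassifier

    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 1, 1, 0]                      # XOR labels
    mlp = MLPClassifier(hidden_layer_sizes=(8,), activation='relu',
                        solver='lbfgs', max_iter=1000, random_state=1)
    mlp.fit(X, y)
    print(mlp.predict(X))                 # typically recovers [0 1 1 0]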
To implement all of the above algorithms we have used Python and the 'student data' dataset. This dataset is available inside the dataset folder, which contains the test dataset along with a dataset information file.
Python packages and libraries used: NumPy, pandas, tkinter, and the packages below (installed version, latest available version):

Package Installed Latest
PyVISA 1.10.1 1.10.1
PyVISA-py 0.3.1 0.3.1
cycler 0.10.0 0.10.0
imutils 0.5.3 0.5.3
joblib 0.14.1 0.14.1
kiwisolver 1.1.0 1.1.0
matplotlib 3.1.2 3.1.2
nltk 3.4.5 3.4.5
numpy 1.18.1 1.18.1
opencv-python 4.1.2.30 4.1.2.30
pandas 0.25.3 0.25.3
pip 19.0.3 20.0.1
pylab 0.0.2 0.0.2
pyparsing 2.4.6 2.4.6
python-dateutil 2.8.1 2.8.1
pytz 2019.3 2019.3
pyusb 1.0.2 1.0.2
scikit-learn 0.22.1 0.22.1
scipy 1.4.1 1.4.1
seaborn 0.9.0 0.9.0
setuptools 40.8.0 45.1.0
six 1.14.0 1.14.0
sklearn 0.0 0.0
style 1.1.6 1.1.6
styled 0.2.0.post1 0.2.0.post1
scikit-learn components used: classification_report, confusion_matrix, accuracy_score, train_test_split, KFold, cross_val_score, GridSearchCV, DecisionTreeClassifier, KNeighborsClassifier, SVC, naive_bayes, RandomForestClassifier, BaggingClassifier, AdaBoostClassifier, MLPClassifier.
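A hedged sketch of how these pieces fit together to train and score every classifier; the synthetic data and the model keys (KNN, CART, ...) are assumptions chosen to mirror the outputs shown in the screenshots below.

    # End-to-end comparison sketch: split once, fit each classifier, report accuracy.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.ensemble import (RandomForestClassifier, BaggingClassifier,
                                  AdaBoostClassifier)
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=1000, n_features=8, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=42)
    models = {
        'KNN': KNeighborsClassifier(),
        'CART': DecisionTreeClassifier(),
        'SVM': SVC(kernel='rbf'),
        'NB': GaussianNB(),
        'RF': RandomForestClassifier(),
        'Bagging': BaggingClassifier(),
        'Ada': AdaBoostClassifier(),
        'MLP': MLPClassifier(max_iter=500),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        print(f'{name} Predicted Values on Test Data is {acc:.2%}')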
Screenshots
When we run the code it displays the below window.
Now click on 'Upload dataset' to upload the data.
Now click on 'Read data' to read the data.
Now click on 'Train_Test_split' to split the data into training and testing sets.
Now click on 'All classifiers' to train and evaluate all the classification models.
KNN Predicted Values on Test Data is 98%
CART Predicted Values on Test Data is 97.31%
SVM Predicted Values on Test Data is 98.18%
RF Predicted Values on Test Data is 98.62%
Bagging Predicted Values on Test Data is 97.41%
Ada Predicted Values on Test Data is 87.43%
MLP Predicted Values on Test Data is 98.00%
Now click on 'Model comparison' to see the comparison between the models.
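A sketch of what the model-comparison step might look like, using K-Fold cross-validation and a matplotlib boxplot; the fold count and the three classifiers shown are illustrative assumptions.

    # 'Model comparison' sketch: cross-validated accuracy per classifier, boxplotted.
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=8, random_state=42)
    models = {'KNN': KNeighborsClassifier(), 'SVM': SVC(),
              'RF': RandomForestClassifier()}

    kfold = KFold(n_splits=10, shuffle=True, random_state=42)
    results = [cross_val_score(m, X, y, cv=kfold, scoring='accuracy')
               for m in models.values()]

    plt.boxplot(results, labels=list(models))
    plt.title('Algorithm Comparison')
    plt.ylabel('Accuracy')
    plt.show()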