Performance Analysis of Machine Learning Algorithms on Self Localization Systems
In this paper the author uses SVM (Support Vector Machine), Decision Tree Classifier, K-Neighbors Classifier, Naïve Bayes, Random Forest Classifier, Bagging Classifier, AdaBoost Classifier and MLP Classifier.
All the algorithms build a model from the training dataset, and new data is then applied to the trained model to predict its class. The Random Forest algorithm gives better prediction accuracy than all the other algorithms.
Support vector machine:
Machine learning involves predicting and classifying data, and to do so we employ various machine learning algorithms according to the dataset. SVM, or Support Vector Machine, is a linear model for classification and regression problems. It can solve linear and non-linear problems and works well for many practical problems. The idea of SVM is simple: the algorithm creates a line or a hyperplane which separates the data into classes. In machine learning, the radial basis function kernel, or RBF kernel, is a popular kernel function used in various kernelized learning algorithms; in particular, it is commonly used in support vector machine classification. As a simple example, for a classification task with only two features, you can think of a hyperplane as a line that linearly separates and classifies a set of data.
Intuitively, the further from the hyperplane our data points lie, the more confident we are that they have been correctly classified. We therefore want our data points to be as far away from the hyperplane as possible, while still being on the correct side of it.
So when new testing data is added, whichever side of the hyperplane it lands on will decide the class that we assign to it.
How do we find the right hyperplane?
Or, in other words, how do we best segregate the two classes within the data?
The distance between the hyperplane and the nearest data point from either set is known as the margin. The goal is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly.
Like the other classifiers, SVM builds a model from the training dataset, and new data is then applied to the trained model to predict its class. In our results the SVM algorithm gives better prediction accuracy than the ANN (MLP) algorithm.
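As a hedged illustration, here is a minimal scikit-learn sketch of training an RBF-kernel SVM; the synthetic data from make_classification merely stands in for the project's 'student data' dataset.

    # Minimal sketch: RBF-kernel SVM on synthetic stand-in data.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    X, y = make_classification(n_samples=500, n_features=4, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=42)

    model = SVC(kernel='rbf', C=1.0, gamma='scale')  # RBF kernel, default regularization
    model.fit(X_train, y_train)
    print('SVM accuracy:', accuracy_score(y_test, model.predict(X_test)))

Maximizing the margin is what the C parameter trades off against training errors: a smaller C allows a wider margin at the cost of some misclassified training points.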
Naïve Bayes Classifier Algorithm
It would be difficult and practically impossible to classify a web page, a document, an email or any other lengthy text manually. This is where the Naïve Bayes Classifier machine learning algorithm comes to the rescue. A classifier is a function that assigns a population's element to one of the available categories. For instance, spam filtering is a popular application of the Naïve Bayes algorithm: the spam filter is a classifier that assigns the label "Spam" or "Not Spam" to each email.
The Naïve Bayes Classifier is among the most popular learning methods grouped by similarities that work on the popular Bayes theorem of probability to build machine learning models, particularly for disease prediction and document classification. It is a simple classification of words based on the Bayes probability theorem for subjective analysis of content.
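A toy sketch of the spam-filtering example with scikit-learn; the example messages below are made up for illustration, and MultinomialNB applies Bayes' theorem over word-count features.

    # Tiny illustrative Naïve Bayes spam filter (made-up messages).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    messages = ['win a free prize now', 'free money click now',
                'meeting at noon tomorrow', 'lunch with the team today']
    labels = ['Spam', 'Spam', 'Not Spam', 'Not Spam']

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(messages)            # word-count features
    clf = MultinomialNB().fit(X, labels)
    print(clf.predict(vectorizer.transform(['free prize now'])))  # likely 'Spam'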
Decision tree:
A decision tree is a graphical representation that makes use of branching methodology to exemplify all possible outcomes of a decision, based on certain conditions. In a decision tree, an internal node represents a test on an attribute, each branch of the tree represents an outcome of the test, and a leaf node represents a particular class label, i.e. the decision made after computing all of the attributes.
The classification rules are represented by the paths from the root to the leaf nodes.
Types of Decision Trees
Classification Trees - These are considered the default kind of decision trees, used to separate a dataset into different classes based on the response variable. They are generally used when the response variable is categorical in nature.
Regression Trees - When the response or target variable is continuous or numerical, regression trees are used. These are generally used for predictive problems rather than classification.
Decision trees can also be classified into two types based on the type of target variable: Continuous Variable Decision Trees and Binary Variable Decision Trees. It is the target variable that helps decide what kind of decision tree is required for a particular problem.
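A short sketch of a classification tree in scikit-learn; the built-in iris dataset is used here purely as a stand-in, and export_text prints each root-to-leaf path as a classification rule.

    # Classification-tree sketch on the built-in iris dataset.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_iris()
    tree = DecisionTreeClassifier(max_depth=2, random_state=42)
    tree.fit(data.data, data.target)
    # Each path from the root to a leaf is one classification rule:
    print(export_text(tree, feature_names=list(data.feature_names)))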
Random forest:
Random Forest is the go-to machine learning algorithm that uses a bagging approach to create a collection of decision trees from random subsets of the data. A model is trained several times on random samples of the dataset to achieve good prediction performance from the random forest algorithm. In this ensemble learning method, the outputs of all the decision trees in the random forest are combined to make the final prediction. The final prediction of the random forest algorithm is derived by polling the results of each decision tree, i.e. by going with the prediction that appears most often among the trees.
For instance, if 5 friends decide that you will like restaurant R but only 2 friends decide that you will not, then the final prediction is that you will like restaurant R, as the majority always wins.
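The restaurant analogy maps directly onto the forest's vote: each fitted tree is one 'friend', and the majority prediction wins. A small sketch on synthetic data (scikit-learn actually averages the trees' probability estimates, which for fully grown trees amounts to a majority vote):

    # Random-forest majority vote made explicit (synthetic data).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=200, n_features=4, random_state=42)
    forest = RandomForestClassifier(n_estimators=7, random_state=42).fit(X, y)

    sample = X[:1]
    votes = [int(t.predict(sample)[0]) for t in forest.estimators_]
    print('Individual tree votes:', votes)
    print('Majority prediction:', int(forest.predict(sample)[0]))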
K – nearest neighbor:
The k-nearest neighbors algorithm (k-NN) is a nonparametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:
• In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
• In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.
k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification.
Both for classification and regression, a useful technique is to assign weights to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.
The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.
A peculiarity of the k-NN algorithm is that it is sensitive to the local structure of the data.
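A minimal k-NN sketch with scikit-learn showing both the plurality vote (k = 3) and the 1/d distance weighting described above; the toy points are made up.

    # k-NN classification with distance (1/d) weighting on toy points.
    from sklearn.neighbors import KNeighborsClassifier

    X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]  # toy feature space
    y = [0, 0, 0, 1, 1, 1]

    knn = KNeighborsClassifier(n_neighbors=3, weights='distance')
    knn.fit(X, y)                   # lazy learner: fit just stores the data
    print(knn.predict([[4, 4]]))    # plurality vote among the 3 nearest neighbors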
Bagging classifier:
A Bagging classifier is an ensemble meta-estimator that fits base classifiers, each on a random subset of the original dataset, and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree) by introducing randomization into its construction procedure and then making an ensemble out of it.
Each base classifier is trained in parallel on a training set generated by randomly drawing, with replacement, N examples from the original training dataset, where N is the size of the original training set. The training set for each base classifier is drawn independently of the others. Many of the original examples may be repeated in a resulting training set, while others may be left out.
Bagging reduces overfitting (variance) by averaging or voting; this can lead to a small increase in bias, which is compensated for by the reduction in variance.
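A hedged sketch of a bagging ensemble of decision trees in scikit-learn, with each tree fit on a bootstrap sample of the same size as the training set. (Note: the base_estimator argument was renamed to estimator in scikit-learn 1.2; the name below matches the 0.22.x version listed later in this document.)

    # Bagging decision trees on bootstrap samples (synthetic data).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=6, random_state=42)
    bag = BaggingClassifier(base_estimator=DecisionTreeClassifier(),
                            n_estimators=10,
                            max_samples=1.0,   # each bootstrap sample has N examples
                            bootstrap=True,    # drawn with replacement
                            random_state=42)
    bag.fit(X, y)
    print('Bagging accuracy on training data:', bag.score(X, y))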
AdaBoost:
Adaptive Boosting (AdaBoost) is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire. It can be used in conjunction with many other types of learning algorithms to improve performance. The output of the other learning algorithms ('weak learners') is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers. In some problems it can be less susceptible to the overfitting problem than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing, the final model can be proven to converge to a strong learner.
Every learning algorithm tends to suit some problem types better than others, and typically has many different parameters and configurations to adjust before it achieves optimal performance on a dataset; AdaBoost with decision trees as the weak learners is often referred to as the best out-of-the-box classifier.[2] When used with decision tree learning, information gathered at each stage of the AdaBoost algorithm about the relative 'hardness' of each training sample is fed into the tree-growing algorithm, so that later trees tend to focus on harder-to-classify examples.
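A brief AdaBoost sketch using depth-1 decision trees ('stumps') as the weak learners, as is conventional; synthetic data again stands in for the real dataset. (As with bagging, base_estimator is named estimator in scikit-learn 1.2 and later.)

    # AdaBoost over decision stumps (synthetic data).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=6, random_state=42)
    ada = AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=1),
                             n_estimators=50, random_state=42)
    ada.fit(X, y)
    # Later stumps are weighted toward examples the earlier stumps misclassified.
    print('AdaBoost accuracy on training data:', ada.score(X, y))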
Multilayer perceptron (MLP):
A multilayer perceptron (MLP) is a class of feedforward artificial neural network (ANN). The term MLP is used ambiguously, sometimes loosely to refer to any feedforward ANN, sometimes strictly to refer to networks composed of multiple layers of perceptrons (with threshold activation). Multilayer perceptrons are sometimes colloquially referred to as "vanilla" neural networks, especially when they have a single hidden layer.
An MLP consists of at least three layers of nodes: an input layer, a hidden layer and an output layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. MLP utilizes a supervised learning technique called backpropagation for training. Its multiple layers and non-linear activation distinguish MLP from a linear perceptron. It can distinguish data that is not linearly separable.
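The classic demonstration of that last point is the XOR function, which no linear perceptron can represent. A minimal sketch with scikit-learn's MLPClassifier:

    # One hidden layer with nonlinear activation can learn XOR,
    # which is not linearly separable.
    from sklearn.neural_network import MLPClassifier

    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 1, 1, 0]                      # XOR labels
    mlp = MLPClassifier(hidden_layer_sizes=(8,), activation='relu',
                        solver='lbfgs', max_iter=1000, random_state=1)
    mlp.fit(X, y)
    print(mlp.predict(X))                 # typically recovers [0 1 1 0]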
To implement all of the above algorithms we have used Python and the 'student data' dataset. This dataset is available inside the dataset folder, which contains the test dataset along with a dataset information file.
Python packages and libraries used: NumPy, pandas, tkinter, and the packages below (installed version, latest available version):

Package Installed Latest
PyVISA 1.10.1 1.10.1
PyVISA-py 0.3.1 0.3.1
cycler 0.10.0 0.10.0
imutils 0.5.3 0.5.3
joblib 0.14.1 0.14.1
kiwisolver 1.1.0 1.1.0
matplotlib 3.1.2 3.1.2
nltk 3.4.5 3.4.5
numpy 1.18.1 1.18.1
opencv-python 4.1.2.30 4.1.2.30
pandas 0.25.3 0.25.3
pip 19.0.3 20.0.1
pylab 0.0.2 0.0.2
pyparsing 2.4.6 2.4.6
python-dateutil 2.8.1 2.8.1
pytz 2019.3 2019.3
pyusb 1.0.2 1.0.2
scikit-learn 0.22.1 0.22.1
scipy 1.4.1 1.4.1
seaborn 0.9.0 0.9.0
setuptools 40.8.0 45.1.0
six 1.14.0 1.14.0
sklearn 0.0 0.0
style 1.1.6 1.1.6
styled 0.2.0.post1 0.2.0.post1
scikit-learn components used: classification_report, confusion_matrix, accuracy_score, train_test_split, KFold, cross_val_score, GridSearchCV, DecisionTreeClassifier, KNeighborsClassifier, SVC, naive_bayes, RandomForestClassifier, BaggingClassifier, AdaBoostClassifier, MLPClassifier.
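A hedged sketch of how these pieces fit together to train and score every classifier; the synthetic data and the model keys (KNN, CART, ...) are assumptions chosen to mirror the outputs shown in the screenshots below.

    # End-to-end comparison sketch: split once, fit each classifier, report accuracy.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.ensemble import (RandomForestClassifier, BaggingClassifier,
                                  AdaBoostClassifier)
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=1000, n_features=8, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=42)
    models = {
        'KNN': KNeighborsClassifier(),
        'CART': DecisionTreeClassifier(),
        'SVM': SVC(kernel='rbf'),
        'NB': GaussianNB(),
        'RF': RandomForestClassifier(),
        'Bagging': BaggingClassifier(),
        'Ada': AdaBoostClassifier(),
        'MLP': MLPClassifier(max_iter=500),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        print(f'{name} Predicted Values on Test Data is {acc:.2%}')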
Screenshots
When we run the code it displays the below window.
Now click on 'Upload dataset' to upload the data.
Now click on 'Read data' to read the data.
Now click on 'Train_Test_split' to split the data into training and testing sets.
Now click on 'All classifiers' to train and evaluate all the classification models.
KNN Predicted Values on Test Data is 98%
CART Predicted Values on Test Data is 97.31%
SVM Predicted Values on Test Data is 98.18%
RF Predicted Values on Test Data is 98.62%
Bagging Predicted Values on Test Data is 97.41%
Ada Predicted Values on Test Data is 87.43%
MLP Predicted Values on Test Data is 98.00%
Now click on 'Model comparison' to see the comparison between the models.
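A sketch of what the model-comparison step might look like, using K-Fold cross-validation and a matplotlib boxplot; the fold count and the three classifiers shown are illustrative assumptions.

    # 'Model comparison' sketch: cross-validated accuracy per classifier, boxplotted.
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=8, random_state=42)
    models = {'KNN': KNeighborsClassifier(), 'SVM': SVC(),
              'RF': RandomForestClassifier()}

    kfold = KFold(n_splits=10, shuffle=True, random_state=42)
    results = [cross_val_score(m, X, y, cv=kfold, scoring='accuracy')
               for m in models.values()]

    plt.boxplot(results, labels=list(models))
    plt.title('Algorithm Comparison')
    plt.ylabel('Accuracy')
    plt.show()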