A Hybridized model using clustering
with ensemble classifier for
prediction of diseases
PRESENTED BY-
RASHMI GUPTA
Outline
1. Introduction
2. Focus of the Research
3. Literature reviews
4. Methodologies used
5. Data collection
6. Implementation and Results
7. Conclusions & Future perspectives
8. References
INTRODUCTION
• Machine learning algorithms are used in data mining (DM) applications to retrieve hidden
information that can support decision making. They also assist healthcare
professionals in diagnosing disease.
• Ensemble techniques in machine learning can be used for feature selection
and classification, and for improving the accuracy of the system.
• Ensemble learning combines different models to solve a particular problem.
• It usually combines the outputs of multiple models and provides better
performance than a single model.
• The proposed hybrid model uses a ranker algorithm and PSO for feature optimization,
and applies a voting ensemble framework with K-means clustering to five
different datasets.
Focus of the research
• The main focus is on the benefits of an ensemble base-learner model with K-means
clustering for improving the performance and accuracy of the disease
prediction model.
• The model shows the importance of clustering before classification for optimizing the
data by removing unclustered or wrongly clustered instances from the dataset.
• The system analyzes the influence of different voting ensembles combined with K-means
on five different medical datasets.
• Combinations of different classifiers help to predict outcomes more
accurately than single classifiers.
Literature reviews
| Author | Methodologies used | Result |
|---|---|---|
| Thaseen et al. (2019) | Ensemble-based system using SVM, NB, LPBoost, and chi-square for feature selection | 99% accuracy |
| Leon et al. (2017) | Voting ensemble classifiers with bagging | Good performance |
| Panthong and Srivihok (2015) | Wrapper feature-selection method based on an ensemble learning algorithm, using bagging on decision trees | 89.60% accuracy |
| Das et al. (2009) | Neural-network-based ensemble model | 89.95% accuracy |
| Cohagan et al. (2010) | Voting ensemble approach on 16 different datasets | The voting-based approach gave better accuracy |
| Dietterich (2000) | Compared the efficiency of three ensemble methods (bagging, boosting, and randomization) against the C4.5 classifier | Boosting-C4.5 gave the best results on noise-free data; Randomized-C4.5 also improved on Bagging-C4.5 |
| M. Van Erp et al. (2002) | Discussed the unweighted voting, confidence voting, and ranked voting methods on two datasets | Compared these voting methods for pattern recognition |
Methodologies used
PSO (Particle Swarm Optimization):
• A stochastic global search algorithm inspired by flocks of birds looking for
resources: particles move around a search space at particular velocities in
search of better solutions.
• It facilitates the classification algorithm by reducing data complexity.
• It improves the stability of the selected model (a small sketch follows below).
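As a minimal sketch of the PSO feature-selection idea, the snippet below runs a binary PSO wrapper over a stand-in scikit-learn dataset; the k-NN fitness function, swarm size, and coefficients are illustrative assumptions rather than the configuration used in this work.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)   # stand-in medical dataset
n_particles, n_features, n_iter = 20, X.shape[1], 10
w, c1, c2 = 0.7, 1.5, 1.5                    # inertia and acceleration coefficients

def fitness(mask):
    """Wrapper fitness: 3-fold CV accuracy of k-NN on the selected features."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

pos = rng.uniform(-1, 1, (n_particles, n_features))   # continuous positions
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([fitness(p > 0) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    fits = np.array([fitness(p > 0) for p in pos])    # position > 0 means "keep feature"
    better = fits > pbest_fit
    pbest[better], pbest_fit[better] = pos[better], fits[better]
    gbest = pbest[pbest_fit.argmax()].copy()

print("selected features:", np.flatnonzero(gbest > 0))
```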
K-means clustering:
• Clustering is an unsupervised learning approach used to partition the data into
clusters.
• In this model, it is used to remove unclustered or wrongly clustered instances from
the datasets prior to the classification/ensemble learning step, as in the sketch below.
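A minimal sketch of this filtering step, assuming "wrongly clustered" means an instance whose cluster's majority class differs from its own label; the dataset and cluster count are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)             # stand-in dataset
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

keep = np.zeros(len(y), dtype=bool)
for c in np.unique(clusters):
    members = clusters == c
    majority = np.bincount(y[members]).argmax()         # dominant class in cluster c
    keep |= members & (y == majority)                   # keep instances that agree

X_filtered, y_filtered = X[keep], y[keep]
print(f"kept {keep.sum()} of {len(y)} instances")
```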
Voting ensemble technique
• Ensemble learning is a meta-approach to machine learning that combines the
predictions from multiple models, e.g. bagging, AdaBoost, boosting, etc.
• It uses multiple independent models/weak learners, of the same or different types,
to derive an output or make predictions.
• It sums the prediction probabilities produced by one or more algorithms and can be
used to improve classification, prediction, function approximation, etc.
• A voting classifier is a machine learning model that trains an ensemble of models
and predicts the class with the highest combined probability as its output.
• In this work, a vote ensemble with AdaBoost, bagging, MLP, FURIA, and Random
Forest as base learners and Naïve Bayes as the meta-learner classifier is used, as
sketched below.
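The base-learner plus Naïve Bayes meta-learner setup corresponds to stacking; a minimal scikit-learn sketch is shown below. FURIA is a WEKA-specific fuzzy rule learner with no scikit-learn counterpart, so it is omitted here, and the dataset is a placeholder.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)              # stand-in dataset

base_learners = [
    ("adaboost", AdaBoostClassifier(random_state=0)),
    ("bagging", BaggingClassifier(random_state=0)),
    ("mlp", MLPClassifier(max_iter=1000, random_state=0)),
    ("rf", RandomForestClassifier(random_state=0)),
]
# Naive Bayes acts as the meta-learner over the base learners' predictions.
ensemble = StackingClassifier(estimators=base_learners, final_estimator=GaussianNB())

print("CV accuracy: %.3f" % cross_val_score(ensemble, X, y, cv=5).mean())
```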
Figure: voting ensemble learner model.
Datasets
This experiment was designed with five well-known datasets taken from the UCI
machine learning repository. These medical datasets have different features, classes,
and characteristics and need to be preprocessed before any algorithm is applied.

| Dataset | Number of classes | Number of attributes | Number of instances | Characteristics |
|---|---|---|---|---|
| Heart disease | 2 | 14 | 1025 | Categorical, real |
| CKD | 2 | 25 | 400 | Real |
| Thyroid | 4 | 30 | 3772 | Categorical, real |
| Diabetes | 2 | 9 | 768 | Categorical, integer |
| Dermatology | 5 | 35 | 366 | Categorical, integer |
Experimental design and Results
The proposed system follows an overall process of four steps:
i) Feature selection using PSO and a ranking algorithm.
ii) Removing the unclustered or wrongly clustered instances using K-means clustering.
iii) Ensembling Naïve Bayes with different classifiers such as AdaBoost, bagging, FURIA,
MLP, and Random Forest.
iv) Comparing and evaluating the results of the different steps after running the
various algorithms.
In this system, the unsupervised algorithm is combined with the Naïve Bayes ensemble,
and classification is then performed using different base classifiers such as MLP, Random
Forest, FURIA, bagging, and AdaBoost, with Naïve Bayes as the meta-learner. The system is
trained in different stages to allow comparison with the proposed approach: on the
datasets before and after applying the optimization technique, and on the datasets after
PSO + K-means. The results after prediction are evaluated and compared at each of these
stages to show the improved performance of the proposed ensemble voting system; a rough
sketch of this staged evaluation is given below. The proposed flow of the system predicts
the various diseases correctly and gives better solutions.
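A rough sketch of this staged comparison follows; pso_select() and kmeans_filter() are hypothetical helpers in the spirit of the earlier sketches, and 10-fold cross-validation is an illustrative choice, not the authors' exact protocol.

```python
from sklearn.model_selection import cross_val_score

def evaluate_stages(X, y, base_classifiers, ensemble, pso_select, kmeans_filter):
    """Report cross-validated accuracy for each training stage described above."""
    X_pso = X[:, pso_select(X, y)]            # after PSO feature selection
    X_km, y_km = kmeans_filter(X_pso, y)      # after K-means instance filtering
    results = {}
    for name, clf in base_classifiers.items():
        results[name] = {
            "original": cross_val_score(clf, X, y, cv=10).mean(),
            "PSO": cross_val_score(clf, X_pso, y, cv=10).mean(),
            "PSO + k-means": cross_val_score(clf, X_km, y_km, cv=10).mean(),
        }
    # Proposed stage: PSO + k-means data fed to the voting/stacking ensemble.
    results["proposed"] = {
        "PSO + k-means + ensemble": cross_val_score(ensemble, X_km, y_km, cv=10).mean()
    }
    return results
```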
Flow diagram of the proposed system: medical datasets → preprocessing and feature
optimization with Ranker + PSO → K-means → ensemble classifier (AdaBoost, bagging, FURIA,
Random Forest, MLP) → predict outcome → compare and evaluate the performances. Training is
carried out without feature selection, after feature selection, after applying clustering,
and after clustering ensembled with Naïve Bayes.
Classification accuracy (%) for different methods on the diabetes dataset:

| Classifier | Original data | With PSO | PSO + K-means | Ensemble on original data | PSO + ensemble | PSO + K-means + ensemble (proposed) |
|---|---|---|---|---|---|---|
| MLP | 75.39 | 75.52 | 99.62 | 76.82 | 76.39 | 99.24 |
| FURIA | 74.47 | 75.52 | 98.49 | 76.17 | 76.3 | 98.49 |
| RF | 75.78 | 74.73 | 99.24 | 76.95 | 76.56 | 98.68 |
| AdaBoost | 74.34 | 74.34 | 98.12 | 77.08 | 76.82 | 98.87 |
| Bagging | 75.78 | 74.47 | 98.31 | 76.95 | 77.21 | 98.49 |

(Figure: accuracy chart of the diabetes dataset.)
Classification accuracy (%) for different methods on the thyroid dataset:

| Classifier | Original data | With PSO | PSO + K-means | Ensemble on original data | PSO + ensemble | PSO + K-means + ensemble (proposed) |
|---|---|---|---|---|---|---|
| MLP | 94.69 | 96.23 | 97.91 | 95.36 | 95.04 | 99.33 |
| FURIA | 99.52 | 97.61 | 99.94 | 99.46 | 97.5 | 99.84 |
| RF | 99.39 | 96.92 | 99.94 | 96.87 | 95.7 | 99.64 |
| AdaBoost | 95.38 | 95.38 | 100 | 95.57 | 94.61 | 99.94 |
| Bagging | 99.57 | 97.4 | 100 | 98.93 | 95.14 | 99.74 |

(Figure: accuracy chart of the thyroid dataset.)
Classification accuracy (%) for different methods on the dermatology dataset:

| Classifier | Original data | With PSO | PSO + K-means | Ensemble on original data | PSO + ensemble | PSO + K-means + ensemble (proposed) |
|---|---|---|---|---|---|---|
| MLP | 98.36 | 96.72 | 100 | 98.08 | 97.81 | 99.63 |
| FURIA | 93.98 | 95.62 | 98.52 | 97.81 | 97.81 | 99.63 |
| RF | 97.26 | 96.72 | 99.63 | 97.54 | 98.36 | 99.63 |
| AdaBoost | 50.27 | 50.27 | 50.18 | 97.54 | 98.63 | 99.53 |
| Bagging | 95.62 | 95.62 | 97.41 | 97.81 | 98.63 | 98.52 |

(Figure: accuracy chart of the dermatology dataset.)
Classification accuracy (%) for different methods on the heart disease dataset:

| Classifier | Original data | With PSO | PSO + K-means | Ensemble on original data | PSO + ensemble | PSO + K-means + ensemble (proposed) |
|---|---|---|---|---|---|---|
| MLP | 95.51 | 91.41 | 100 | 93.56 | 90.04 | 100 |
| FURIA | 100 | 99.51 | 100 | 100 | 99.6 | 100 |
| RF | 100 | 100 | 100 | 83.12 | 93.26 | 100 |
| AdaBoost | 84.29 | 83.12 | 100 | 86.24 | 84.78 | 100 |
| Bagging | 94.55 | 95.12 | 100 | 89.75 | 88.68 | 100 |

(Figure: accuracy chart of the heart disease dataset.)
Classification accuracy (%) for different methods on the chronic kidney disease (CKD) dataset:

| Classifier | Original data | With PSO | PSO + K-means | Ensemble on original data | PSO + ensemble | PSO + K-means + ensemble (proposed) |
|---|---|---|---|---|---|---|
| MLP | 83.33 | 99.25 | 100 | 98.5 | 98.75 | 100 |
| FURIA | 97.5 | 96.5 | 99.28 | 97.75 | 97 | 98.92 |
| RF | 99 | 98.75 | 100 | 96.25 | 96.5 | 99.64 |
| AdaBoost | 96.25 | 96 | 100 | 97 | 96.75 | 100 |
| Bagging | 97.25 | 97.25 | 100 | 97 | 97.25 | 100 |

(Figure: accuracy chart of the CKD dataset.)
Result analysis and comparisons
• The results are analyzed by comparing the accuracy of different combinations against
the proposed voting hybrid ensemble classifier, for a deeper evaluation and
interpretation of the proposed system. The comparison of performance in terms of
accuracy is carried out on five different medical datasets.
• Comparisons are made between the base learning classifiers on the original data, the
datasets after optimizing features with PSO, the data after PSO + K-means clustering,
the original data with the voting ensemble, the datasets with the PSO + voting ensemble
classifier, and the proposed PSO + K-means + voting ensemble approach.
• The results reveal that combining the voting ensemble with K-means is a good approach
for improving the performance and efficiency of the prediction model.
Conclusions and future perspective
• In our evaluation, we tested how effectively the data are used by this hybrid
ensemble approach, producing a better and more accurate prediction system.
• Comparisons across different stages and combinations make the experiment and
model more convincing and offer a better alternative approach for disease diagnosis.
• The results also suggest that the K-means clustering algorithm combined with the hybrid
ensemble contributes effectively and strongly impacts the classification results.
• This framework provides an effective decision support system and is helpful for further
implementation.
• In the future, this experiment will be extended with different clustering and
ensemble algorithms for further validation.
References
[1] Alaba, A., Maitanmi, S., & Ajayi, O. (n.d.). An ensemble of classification techniques for intrusion detection systems. https://sites.google.com/site/ijcsis/
[2] Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1), 105–139. https://doi.org/10.1023/a:1007515423169
[3] Carvalho, M., & Ludermir, T. B. (2006). Hybrid training of feed-forward neural networks with particle swarm optimization. Lecture Notes in Computer Science, 4233, 1061–1070. https://doi.org/10.1007/11893257_116
[4] Cohagan, C., Grzymala-Busse, J. W., & Hippe, Z. S. (2010). A comparison of three voting methods for bagging with the MLEM2 algorithm. IDEAL'10: Proceedings of the 11th International Conference on Intelligent Data Engineering and Automated Learning, 118–125.
[5] Cordón, O., Kazienko, P., & Trawiński, B. (2011). Special issue on hybrid and ensemble methods in machine learning. New Generation Computing, 29(3), 241–244. https://doi.org/10.1007/s00354-011-0300-3
[6] Das, R., & Sengur, A. (2010). Evaluation of ensemble methods for diagnosing of valvular heart disease. Expert Systems with Applications, 37(7), 5110–5115. https://doi.org/10.1016/j.eswa.2009.12.085
[7] Das, R., Turkoglu, I., & Sengur, A. (2008). Diagnosis of valvular heart disease through neural networks ensembles. Computer Methods and Programs in Biomedicine, 185–191. https://doi.org/10.1016/j.cmpb.2008.09.005
[8] Das, R., Turkoglu, I., & Sengur, A. (2009). Effective diagnosis of heart disease through neural networks ensembles. Expert Systems with Applications, 36(4), 7675–7680. https://doi.org/10.1016/j.eswa.2008.09.013
[9] Das, S., Abraham, A., & Konar, A. (2008). Automatic clustering using an improved differential evolution algorithm. 38(1), 218–237.
[10] Dietterich, T. G. (2000). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40, 139–157.
[11] Hambali, M. A., Saheed, Y. K., Oladele, T. O., & Gbolagade, M. D. (2019). AdaBoost ensemble algorithms for breast cancer classification. Journal of Advances in Computer Research, 10(2). www.jacr.iausari.ac.ir
[12] Kazemi, Y., & Mirroshandel, S. A. (2018). A novel method for predicting kidney stone type using ensemble learning. Artificial Intelligence in Medicine, 84, 117–126. https://doi.org/10.1016/j.artmed.2017.12.001
[13] Leon, F., Floria, S. A., & Badica, C. (2017). Evaluating the effect of voting methods on ensemble-based classification. Proceedings of the 2017 IEEE International Conference on INnovations in Intelligent SysTems and Applications (INISTA 2017), 1–6. https://doi.org/10.1109/INISTA.2017.8001122
[14] Leung, K. T., & Stott Parker, D. (2003). Empirical comparisons of various voting methods in bagging. KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 595–600.
[15] Lin, K. C., & Hsieh, Y. H. (2015). Classification of medical datasets using SVMs with hybrid evolutionary algorithms based on endocrine-based particle swarm optimization and artificial bee colony algorithms. Journal of Medical Systems, 39(10). https://doi.org/10.1007/s10916-015-0306-3
[16] Manonmani, M., & Balakrishnan, S. (2020). An ensemble feature selection method for prediction of chronic diseases. International Journal of Advanced Trends in Computer Science and Engineering, 9(5), 7405–7410. https://doi.org/10.30534/ijatcse/2020/72952020
[17] Mohebian, M. R., Marateb, H. R., Mansourian, M., Mañanas, M. A., & Mokarian, F. (2017). A hybrid computer-aided-diagnosis system for prediction of breast cancer recurrence (HPBCR) using optimized ensemble learning. Computational and Structural Biotechnology Journal, 15, 75–85. https://doi.org/10.1016/j.csbj.2016.11.004
[18] Naaz, E., Sharma, D., Sirisha, D., & Venkatesan, M. (2016). Enhanced K-means clustering approach for health care analysis using clinical documents. International Journal of Pharmaceutical and Clinical Research, 8(1), 60–64.
[19] Panthong, R., & Srivihok, A. (2015). Wrapper feature subset selection for dimension reduction based on ensemble learning algorithm. Procedia Computer Science, 72, 162–169. https://doi.org/10.1016/j.procs.2015.12.117
[20] Patil, B. M., Joshi, R. C., & Toshniwal, D. (2010). Hybrid prediction model for Type-2 diabetic patients. Expert Systems with Applications, 37(12), 8102–8108. https://doi.org/10.1016/j.eswa.2010.05.078
THANK YOU