1. A Hybridized Model Using Clustering with an Ensemble Classifier for Prediction of Diseases
PRESENTED BY:
RASHMI GUPTA
2. Outline
1. Introduction
2. Focus of the research
3. Literature review
4. Methodologies used
5. Data collection
6. Implementation and results
7. Conclusions & future perspectives
8. References
3. INTRODUCTION
• Machine learning algorithms are used in data mining (DM) applications to retrieve hidden information that can support decision making. They also assist healthcare professionals in disease diagnosis.
• Ensemble techniques in machine learning can be used for feature selection and classification, and for improving the accuracy of the system.
• An ensemble learning technique combines different models to solve a particular problem.
• It usually combines the outputs of multiple models and provides better performance than a single model.
• The proposed hybrid model uses a ranker algorithm and PSO for feature optimization, and a voting ensemble framework with a K-means clustering approach, on five different datasets.
4. Focus of the research
• The main focus is on the benefits of an ensemble base-learner model with K-means clustering for improving the performance and accuracy of the disease prediction model.
• This model shows the importance of clustering in classification for data optimization, removing unclustered or wrongly clustered instances from the dataset.
• The system analyzes the influence of different voting ensembles combined with K-means on five different medical datasets.
• Combinations of different classifiers help to predict data more accurately than a single classifier.
5. Literature review
Author | Methodologies used | Result
Thaseen et al. (2019) | Ensemble-based system using SVM, NB, LPBoost and chi-square feature selection | 99% accuracy
Leon et al. (2017) | Voting ensemble classifiers with bagging | Good performance
Panthong and Srivihok (2015) | Wrapper feature selection method based on an ensemble learning algorithm, using bagging on decision trees | 89.60% accuracy
Das et al. (2009) | Neural-network-based ensemble model | 89.95% accuracy
Cohagan et al. (2010) | Voting ensemble approach on 16 different datasets | The voting-based approach gave better accuracy
Dietterich, T. G. (2000) | Compared the efficiency of three ensemble methods (bagging, boosting, and randomization) against the C4.5 classifier | Boosting-C4.5 gave the best results in the no-noise setting, and Randomization-C4.5 also improved on Bagging-C4.5
M. Van Erp et al. (2002) | Discussed the unweighted voting, confidence voting, and ranked voting methods on two datasets | Compared these voting methods for pattern recognition
6. Methodologies used
PSO (Particle Swarm Optimization):
• A stochastic global search algorithm inspired by a flock of birds looking for resources: particles move around a search space at specific velocities in search of better solutions.
• It facilitates classification algorithms by reducing data complexity.
• It improves the stability of the selected model.
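As an illustration of the idea only (not the exact configuration used in this work), a binary PSO for feature selection can be sketched in Python. The sigmoid transfer function, swarm parameters, and the toy fitness function below are all assumptions made for the sketch:

```python
import numpy as np

def binary_pso_feature_select(fitness, n_features, n_particles=10, n_iter=30, seed=0):
    """Minimal binary PSO: each particle is a 0/1 mask over the features."""
    rng = np.random.default_rng(seed)
    pos = rng.integers(0, 2, size=(n_particles, n_features)).astype(float)
    vel = rng.normal(0.0, 0.1, size=(n_particles, n_features))
    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    gbest, gbest_fit = pbest[pbest_fit.argmax()].copy(), pbest_fit.max()
    w, c1, c2 = 0.7, 1.5, 1.5                      # inertia and attraction weights
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, n_features))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        prob = 1.0 / (1.0 + np.exp(-vel))          # sigmoid transfer to bit probabilities
        pos = (rng.random((n_particles, n_features)) < prob).astype(float)
        fit = np.array([fitness(p) for p in pos])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        if fit.max() > gbest_fit:
            gbest, gbest_fit = pos[fit.argmax()].copy(), fit.max()
    return gbest.astype(bool)

# Toy fitness (hypothetical): reward masks close to a "true" informative subset.
true_mask = np.array([1, 1, 0, 0, 1, 0, 0, 0], dtype=float)
selected = binary_pso_feature_select(lambda m: -np.abs(m - true_mask).sum(), n_features=8)
print(selected)
```

In practice the fitness would be a classifier's cross-validated accuracy on the masked feature subset, which is what makes PSO act as a wrapper feature selector.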
K-means clustering:
• Clustering is an unsupervised learning technique used to partition data into clusters.
• In this model, it is used to remove unclustered or wrongly clustered instances from the datasets prior to the classification or ensemble learning step.
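A minimal sketch of this filtering step, assuming the common convention of dropping instances whose cluster's majority class disagrees with their own label (the slides do not spell out the exact rule):

```python
import numpy as np
from sklearn.cluster import KMeans

def filter_misclustered(X, y, n_clusters, seed=0):
    """Keep only instances whose K-means cluster's majority class matches their label."""
    cluster = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    keep = np.zeros(len(y), dtype=bool)
    for c in range(n_clusters):
        in_c = cluster == c
        if in_c.any():
            majority = np.bincount(y[in_c]).argmax()
            keep |= in_c & (y == majority)
    return X[keep], y[keep]

# Two well-separated synthetic blobs with three deliberately mislabeled points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
y[:3] = 1                      # inject label noise
Xf, yf = filter_misclustered(X, y, n_clusters=2)
print(len(Xf))                 # the three noisy points are removed
```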
7. Voting ensemble technique
• Ensemble learning is generally a meta-approach to machine learning that combines the predictions from multiple models, e.g. bagging, AdaBoost, boosting, etc.
• It uses multiple independent models (similar or different weak learners) to derive an output or make predictions.
• It sums the prediction probabilities produced by one or more algorithms and can be used to improve classification, prediction, function approximation, etc.
• A voting classifier is a machine learning model that trains an ensemble of several models and predicts the class with the highest combined probability as its output.
• In this work, a voting ensemble is used with AdaBoost, Bagging, MLP, FURIA, and Random forest as base learners and Naïve Bayes as the meta-learner classifier.
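Since the slides describe Naïve Bayes as a meta learner over the base classifiers, this maps to stacking in scikit-learn terms. A hedged sketch follows: FURIA is a Weka fuzzy-rule learner with no scikit-learn counterpart, so it is omitted, and the breast cancer dataset merely stands in for the UCI medical sets used in the work:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset; the slides use five UCI medical datasets instead.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# FURIA (a Weka fuzzy-rule learner) has no scikit-learn counterpart and is omitted.
base = [
    ("ada", AdaBoostClassifier(random_state=0)),
    ("bag", BaggingClassifier(random_state=0)),
    ("rf", RandomForestClassifier(random_state=0)),
    ("mlp", make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=0))),
]
# Naive Bayes combines the base learners' predictions, i.e. acts as the meta learner.
ens = StackingClassifier(estimators=base, final_estimator=GaussianNB())
ens.fit(X_tr, y_tr)
acc = ens.score(X_te, y_te)
print(round(acc, 3))
```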
9. Datasets
This experiment was designed with five well-known datasets taken from the UCI machine learning repository. These medical datasets contain different features, classes, and characteristics, and need to be preprocessed before any algorithm is applied.

Dataset | Classes | Attributes | Instances | Characteristics
Heart disease | 2 | 14 | 1025 | Categorical, real
CKD | 2 | 25 | 400 | Real
Thyroid | 4 | 30 | 3772 | Categorical, real
Diabetes | 2 | 9 | 768 | Categorical, integer
Dermatology | 5 | 35 | 366 | Categorical, integer
10. Experimental design and Results
The proposed system follows an overall process in four steps:
i) Feature selection using PSO and a ranking algorithm.
ii) Removal of unclustered or wrongly clustered instances using K-means clustering.
iii) Ensembling Naïve Bayes with different classifiers such as AdaBoost, Bagging, FURIA, MLP, and Random forest.
iv) Comparing and evaluating the results at the different steps after running the various algorithms.
In this system, the unsupervised algorithm is ensembled with Naïve Bayes, and classification is then performed with different base classifier algorithms such as MLP, Random forest, FURIA, Bagging, and AdaBoost; Naïve Bayes acts as the meta learner. The system is trained in different stages for better comparison with our proposed system: on the datasets before and after applying the optimization technique, and on the datasets with PSO + K-means. The results after prediction are evaluated and compared across all these stages to show the improved performance of the proposed ensemble voting system. The proposed flow of the system predicts the various diseases correctly, giving better solutions.
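The four steps above can be sketched end to end. SelectKBest stands in for the Ranker + PSO stage, and the dataset, cluster count, and soft-voting combiner are assumptions for the sketch rather than the authors' exact setup:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in for the UCI medical sets

# Step i) feature selection -- SelectKBest stands in for the Ranker + PSO stage.
X = SelectKBest(f_classif, k=10).fit_transform(X, y)

# Step ii) drop instances whose K-means cluster majority disagrees with their label.
cluster = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
keep = np.zeros(len(y), dtype=bool)
for c in np.unique(cluster):
    keep |= (cluster == c) & (y == np.bincount(y[cluster == c]).argmax())
X, y = X[keep], y[keep]

# Step iii) a small soft-voting ensemble over a few base learners.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
ens = VotingClassifier(
    [("rf", RandomForestClassifier(random_state=0)),
     ("dt", DecisionTreeClassifier(random_state=0)),
     ("nb", GaussianNB())],
    voting="soft")
ens.fit(X_tr, y_tr)

# Step iv) evaluate on the held-out split.
acc = ens.score(X_te, y_te)
print(round(acc, 3))
```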
11. Flow diagram of the proposed system: medical datasets → preprocessing and feature optimization with Ranker + PSO → K-means → ensemble classifier (AdaBoost, Bagging, FURIA, Random forest, MLP) → predict outcome → compare and evaluate the performances. Training is carried out in four variants: without feature selection, after feature selection, after applying clustering, and after clustering ensembled with Naïve Bayes.
12. Classification accuracy (%) for different methods on the diabetes dataset:

Classifier | Original data | With PSO | PSO + K-means | Ensemble on original data | PSO + ensemble | PSO + K-means + ensemble (proposed)
MLP | 75.39 | 75.52 | 99.62 | 76.82 | 76.39 | 99.24
FURIA | 74.47 | 75.52 | 98.49 | 76.17 | 76.3 | 98.49
RF | 75.78 | 74.73 | 99.24 | 76.95 | 76.56 | 98.68
AdaBoost | 74.34 | 74.34 | 98.12 | 77.08 | 76.82 | 98.87
Bagging | 75.78 | 74.47 | 98.31 | 76.95 | 77.21 | 98.49

Figure: accuracy chart of the diabetes dataset.
13. Classification accuracy (%) for different methods on the thyroid dataset:

Classifier | Original data | With PSO | PSO + K-means | Ensemble on original data | PSO + ensemble | PSO + K-means + ensemble (proposed)
MLP | 94.69 | 96.23 | 97.91 | 95.36 | 95.04 | 99.33
FURIA | 99.52 | 97.61 | 99.94 | 99.46 | 97.5 | 99.84
RF | 99.39 | 96.92 | 99.94 | 96.87 | 95.7 | 99.64
AdaBoost | 95.38 | 95.38 | 100 | 95.57 | 94.61 | 99.94
Bagging | 99.57 | 97.4 | 100 | 98.93 | 95.14 | 99.74

Figure: accuracy chart of the thyroid dataset.
14. Classification accuracy (%) for different methods on the dermatology dataset:

Classifier | Original data | With PSO | PSO + K-means | Ensemble on original data | PSO + ensemble | PSO + K-means + ensemble (proposed)
MLP | 98.36 | 96.72 | 100 | 98.08 | 97.81 | 99.63
FURIA | 93.98 | 95.62 | 98.52 | 97.81 | 97.81 | 99.63
RF | 97.26 | 96.72 | 99.63 | 97.54 | 98.36 | 99.63
AdaBoost | 50.27 | 50.27 | 50.18 | 97.54 | 98.63 | 99.53
Bagging | 95.62 | 95.62 | 97.41 | 97.81 | 98.63 | 98.52

Figure: accuracy chart of the dermatology dataset.
15. Classification accuracy (%) for different methods on the heart disease dataset:

Classifier | Original data | With PSO | PSO + K-means | Ensemble on original data | PSO + ensemble | PSO + K-means + ensemble (proposed)
MLP | 95.51 | 91.41 | 100 | 93.56 | 90.04 | 100
FURIA | 100 | 99.51 | 100 | 100 | 99.6 | 100
RF | 100 | 100 | 100 | 83.12 | 93.26 | 100
AdaBoost | 84.29 | 83.12 | 100 | 86.24 | 84.78 | 100
Bagging | 94.55 | 95.12 | 100 | 89.75 | 88.68 | 100

Figure: accuracy chart of the heart disease dataset.
16. Classification accuracy (%) for different methods on the chronic kidney disease dataset:

Classifier | Original data | With PSO | PSO + K-means | Ensemble on original data | PSO + ensemble | PSO + K-means + ensemble (proposed)
MLP | 83.33 | 99.25 | 100 | 98.5 | 98.75 | 100
FURIA | 97.5 | 96.5 | 99.28 | 97.75 | 97 | 98.92
RF | 99 | 98.75 | 100 | 96.25 | 96.5 | 99.64
AdaBoost | 96.25 | 96 | 100 | 97 | 96.75 | 100
Bagging | 97.25 | 97.25 | 100 | 97 | 97.25 | 100

Figure: accuracy chart of the CKD dataset.
17. Result analysis and comparisons
• The result analysis is performed by comparing the accuracy of different combinations against the proposed hybrid voting ensemble classifier, for a deeper evaluation and interpretation of the proposed system. The performance comparison in terms of accuracy is carried out on five different medical datasets.
• Comparisons are made between base learning classifiers on the original data, datasets after optimizing features with PSO, data after PSO + K-means clustering, the voting ensemble on the original data, and the PSO + voting ensemble classifier, against our proposed PSO + K-means + voting ensemble approach.
• The results reveal that combining the voting ensemble with K-means is a good approach for improving the performance and efficiency of the prediction model.
18. Conclusions and future perspective
• In our evaluation, we tested how effectively the data are used by this hybrid ensemble technique and whether it produces a better and more accurate prediction system.
• Comparisons across different stages and combinations made our experiment and model superior and provided a better alternative approach for disease diagnosis.
• The results also suggest that the K-means clustering algorithm, combined with the hybrid ensemble, contributed effectively and strongly impacted the classification results.
• This framework provides an effective decision support system and is helpful for further implementation.
• In future work, this experiment will be repeated with different clustering and ensemble algorithms for further validation.
19. [1] Alaba, A., Maitanmi, S., & Ajayi, O. (n.d.). An Ensemble of classification techniques for Intrusion Detection Systems.
https://sites.google.com/site/ijcsis/.
[2] Bauer, E., & Kohavi, R. (1999). Empirical comparison of voting classification algorithms: bagging, boosting, and
variants. Machine Learning, 36(1), 105–139. https://doi.org/10.1023/a:1007515423169.
[3] Carvalho, M., & Ludermir, T. B. (2006). Hybrid training of feed-forward neural networks with particle swarm optimization.
Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), 4233 LNCS, 1061–1070. https://doi.org/10.1007/11893257_116.
[4] Cohagan, C., Grzymala-Busse, J. W., & Hippe, Z. S. (2010). “A Comparison of Three Voting Methods for Bagging with
the MLEM2 Algorithm ”, IDEAL'10 Proceedings of the 11th international conference on Intelligent data engineering and
automated learning, pp. 118-125.
[5] Cordón, O., Kazienko, P., & Trawiński, B. (2011). Special issue on hybrid and ensemble methods in machine learning.
New Generation Computing, 29(3), 241–244. https://doi.org/10.1007/s00354-011-0300-3
[6] Das, R., & Sengur, A. (2010). Evaluation of ensemble methods for diagnosing of valvular heart disease. Expert Systems With
Applications, 37(7), 5110–5115. https://doi.org/10.1016/j.eswa.2009.12.085
[7] Das, R., Turkoglu, I., & Sengur, A. (2008). Diagnosis of valvular heart disease through neural networks ensembles. 3,
185–191. https://doi.org/10.1016/j.cmpb.2008.09.005
[8] Das, R., Turkoglu, I., & Sengur, A. (2009). Effective diagnosis of heart disease through neural networks ensembles.
Expert Systems with Applications, 36(4), 7675–7680. https://doi.org/10.1016/j.eswa.2008.09.013
[9] Das, S., Abraham, A., & Konar, A. (2008). Automatic Clustering Using an Improved Differential Evolution Algorithm.
38(1), 218–237.
[10] Dietterich, T. G. (2000). An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees:
Bagging, Boosting, and Randomization. Kluwer Academic Publishers. Manufactured in The Netherlands, Machine Learning,
40, 139–157.
References
20. [11] Hambali, M. A., Saheed, Y. K., Oladele, T. O., & Gbolagade, M. D. (2019). Adaboost ensemble algorithms for breast
cancer classification. Journal of Advances in Computer Research Quarterly, 10(2). www.jacr.iausari.ac.ir
[12] Kazemi, Y., & Mirroshandel, S. A. (2018). A novel method for predicting kidney stone type using ensemble learning.
Artificial Intelligence in Medicine, 84, 117–126. https://doi.org/10.1016/j.artmed.2017.12.001
[13] Leon, F., Floria, S. A., & Badica, C. (2017). Evaluating the effect of voting methods on ensemble-based classification.
Proceedings - 2017 IEEE International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2017, July,
1–6. https://doi.org/10.1109/INISTA.2017.8001122.
[14] Leung, K. T., & Stott Parker, D. (2003). “Empirical Comparisons of Various Voting Methods in Bagging”, KDD '03
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 595-600.
[15] Lin, K. C., & Hsieh, Y. H. (2015). Classification of Medical Datasets Using SVMs with Hybrid Evolutionary Algorithms
Based on Endocrine-Based Particle Swarm Optimization and Artificial Bee Colony Algorithms. Journal of Medical Systems,
39(10). https://doi.org/10.1007/s10916-015-0306-3
[16] Manonmani, M., & Balakrishnan, S. (2020). An ensemble feature selection method for prediction of chronic diseases.
International Journal of Advanced Trends in Computer Science and Engineering, 9(5), 7405–7410.
https://doi.org/10.30534/ijatcse/2020/72952020
[17] Mohebian, M. R., Marateb, H. R., Mansourian, M., Mañanas, M. A., & Mokarian, F. (2017). A Hybrid Computer-aided-
diagnosis System for Prediction of Breast Cancer Recurrence (HPBCR) Using Optimized Ensemble Learning. Computational
and Structural Biotechnology Journal, 15, 75–85. https://doi.org/10.1016/j.csbj.2016.11.004
[18] Naaz, E., Sharma, D., Sirisha, D., & Venkatesan, M. (2016). Enhanced K-means clustering approach for health care
analysis using clinical documents. International Journal of Pharmaceutical and Clinical Research, 8(1), 60–64.
[19] Panthong, R., & Srivihok, A. (2015). Wrapper Feature Subset Selection for Dimension Reduction Based on Ensemble
Learning Algorithm. Procedia Computer Science, 72, 162–169. https://doi.org/10.1016/j.procs.2015.12.117
[20] Patil, B. M., Joshi, R. C., & Toshniwal, D. (2010). Hybrid prediction model for Type-2 diabetic patients. Expert Systems with
Applications, 37(12), 8102–8108. https://doi.org/10.1016/j.eswa.2010.05.078