© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 1122
Comparative Analysis of Heart Disease Prediction Models: Unveiling
the Most Accurate and Reliable Machine Learning Algorithm
Aatmaj Amol Salunke1
Computer Science & Engineering
Department of Computer Science & Engineering,
School of Computer Science and Engineering,
Manipal University Jaipur
Rajasthan, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Heart disease is a significant health concern, warranting accurate prediction models for timely intervention. This
research paper presents a comparative analysis of three popular machine learning algorithms, namely Logistic Regression,
Support Vector Machines (SVM), and Random Forest, for heart disease prediction. Utilizing a comprehensive dataset encompassing
clinical and lifestyle features, each model was developed and evaluated using standard metrics. The study unveils the most
accurate and reliable algorithm for heart disease prediction, offering valuable insights into model performance. Furthermore,
feature importance analysis sheds light on critical factors influencing accurate predictions. The results aid healthcare professionals
in selecting the most appropriate model for efficient heart disease prediction, contributing to improved patient care and clinical
decision-making. Random Forest achieved 88% accuracy, outperforming Logistic Regression and SVM for heart disease prediction.
Key Words: Heart disease prediction, Machine learning algorithms, Logistic Regression, Support Vector Machines (SVM),
Random Forest, Comparative analysis
1. RELATED WORK
Ali et al. [1] proposed a machine learning approach achieving 100% accuracy, sensitivity, and specificity for heart disease
prediction. Ghosh et al. [2] proposed a model achieving 99.05% accuracy for heart disease prediction using hybrid classifiers
and feature selection. Khourdifi et al. [3] proposed a hybrid approach achieving 99.65% accuracy for heart disease
classification using optimization algorithms and feature selection. Latha et al. [5] proposed an ensemble classification approach
achieving a 7% increase in accuracy for heart disease prediction. Bhatla et al. [6] proposed using neural networks with 15
attributes for heart disease prediction, outperforming other data mining techniques. Gonsalves et al. [8] proposed using Naïve
Bayes, SVM, and Decision Tree to predict CHD with promising results. Salhi et al. [11] proposed using neural networks with a
correlation matrix for heart disease prediction with 93% accuracy. Souri et al. [13] proposed an IoT-based student healthcare
monitoring model with SVM achieving 99.1% accuracy. Ramesh et al. [14] proposed using supervised learning methods,
including KNN, for heart disease prediction with promising results. Alarsan et al. [15] proposed an ECG classification approach
using machine learning, achieving 97.98% accuracy with Random Forest for binary classification.
2. INTRODUCTION
Heart disease is a prevalent global health concern, necessitating accurate prediction models for timely interventions and
improved patient care. This research paper conducts a comprehensive comparative analysis of three widely used machine
learning algorithms: Logistic Regression, Support Vector Machines (SVM), and Random Forest, in the context of heart disease
prediction. Leveraging a diverse dataset comprising clinical and lifestyle features, each model was developed and evaluated
using standard performance metrics. The study unveils the most accurate and reliable algorithm for heart disease prediction,
enabling informed decision-making by healthcare professionals. Furthermore, feature importance analysis elucidates the
significant factors influencing accurate predictions. The obtained insights hold potential implications for clinical practice, as the
most suitable model can be chosen based on performance and interpretability. Ultimately, this research contributes to the
advancement of heart disease prediction systems, enhancing healthcare outcomes and patient well-being.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 10 Issue: 07 | July 2023 www.irjet.net p-ISSN: 2395-0072
Fig.1. Generic Architecture of Heart Disease Prediction System using Machine Learning
3. DATASET
The research utilizes a comprehensive dataset sourced from diverse healthcare institutions, encompassing clinical and lifestyle
features relevant to heart disease prediction. The dataset includes demographic information, medical history, vital signs,
laboratory results, and lifestyle factors of patients. Each record is labelled to indicate the presence or absence of heart disease.
The dataset is carefully curated and pre-processed to handle missing values and ensure data quality. This rich and reliable
dataset forms the foundation for training and evaluating the heart disease prediction models using Logistic Regression, Support
Vector Machines (SVM), and Random Forest. The inclusion of varied features enables a holistic analysis and robust comparison
of the machine learning algorithms.
Attribute Information in the Dataset:
1. age
2. sex
3. chest pain type (4 values)
4. resting blood pressure
5. serum cholesterol in mg/dl
6. fasting blood sugar > 120 mg/dl
7. resting electrocardiographic results (values 0,1,2)
8. maximum heart rate achieved
9. exercise induced angina
10. oldpeak = ST depression induced by exercise relative to rest
11. the slope of the peak exercise ST segment
12. number of major vessels (0-3) coloured by fluoroscopy
13. thal: 0 = normal; 1 = fixed defect; 2 = reversible defect
Note: The names and social security numbers of the patients were removed from the database and replaced with dummy
values.
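One way to make the attribute list concrete is a typed record. The sketch below is illustrative only: the class name `PatientRecord`, the field names, and the 0/1 encoding of the binary attributes are assumptions rather than anything taken from the study's code, and the sample values are made up.

```python
from dataclasses import dataclass, fields


@dataclass
class PatientRecord:
    """One dataset row. Field names are hypothetical, chosen for readability."""
    age: float
    sex: int                   # assumed integer encoding
    chest_pain_type: int       # 4 distinct values
    resting_bp: float          # resting blood pressure
    cholesterol: float         # serum cholesterol in mg/dl
    fasting_blood_sugar: int   # assumed 1 if > 120 mg/dl, else 0
    resting_ecg: int           # values 0, 1, 2
    max_heart_rate: float      # maximum heart rate achieved
    exercise_angina: int       # assumed 1 = yes, 0 = no
    oldpeak: float             # ST depression induced by exercise relative to rest
    slope: int                 # slope of the peak exercise ST segment
    major_vessels: int         # 0-3, coloured by fluoroscopy
    thal: int                  # 0 = normal, 1 = fixed defect, 2 = reversible defect


def validate(record):
    """Return a list of violations of the value ranges stated in the attribute list."""
    problems = []
    if record.resting_ecg not in (0, 1, 2):
        problems.append("resting_ecg must be 0, 1, or 2")
    if record.major_vessels not in (0, 1, 2, 3):
        problems.append("major_vessels must be between 0 and 3")
    if record.thal not in (0, 1, 2):
        problems.append("thal must be 0, 1, or 2")
    return problems


# Made-up example values, for illustration only.
example = PatientRecord(54, 1, 2, 130.0, 240.0, 0, 1, 150.0, 0, 1.2, 1, 0, 2)
print(validate(example))  # prints []
```

Checking ranges at load time is a cheap way to catch encoding mismatches before any model is trained.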
Fig.2. Dataset for Heart Disease Prediction System Modelling
4. METHODOLOGY
1. Data Collection:
The dataset used in this study comprises anonymized patient data with 14 attributes: age, sex, chest pain type, resting blood
pressure, serum cholesterol level, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved,
exercise-induced angina, oldpeak (ST depression induced by exercise relative to rest), slope of the peak exercise ST segment,
number of major vessels colored by fluoroscopy, and thal (thalassemia type). Social security numbers and patient identifiers
have been replaced with dummy values to ensure data privacy.
2. Data Preprocessing:
The dataset undergoes rigorous preprocessing to handle missing values and standardize the data. Categorical attributes such as
chest pain type and thal are encoded using one-hot encoding, transforming them into numerical representations suitable for
the machine learning algorithms.
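As a rough illustration of this step, the sketch below fills missing numeric values and one-hot encodes a categorical column using only the Python standard library. The paper does not say how missing values are filled, so median imputation is an assumption here, as are the helper names `impute_median` and `one_hot` and the made-up sample values.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(values, categories=None):
    """Turn each categorical value into a 0/1 indicator row.

    Categories are sorted so that the column order is deterministic.
    """
    if categories is None:
        categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


# Made-up values: one cholesterol reading is missing.
cholesterol = [233.0, None, 250.0, 204.0]
chest_pain_type = [0, 2, 1, 2]

filled = impute_median(cholesterol)
categories, encoded = one_hot(chest_pain_type)
print(filled)   # [233.0, 233.0, 250.0, 204.0]
print(encoded)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]
```

Passing the training-set `categories` back into `one_hot` when encoding the test set keeps the columns aligned between the two sets.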
3. Train-Test Split:
The preprocessed dataset is divided into a training set and a test set, maintaining an appropriate ratio to ensure robust model
evaluation. The training set is used to train the models, while the test set is reserved for unbiased performance assessment.
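A minimal sketch of such a split is shown below. The paper does not state the ratio, so the 80/20 split, the fixed seed, and the function name `train_test_split` are assumptions made for illustration; the rows are placeholder data.

```python
import random


def train_test_split(rows, labels, test_fraction=0.2, seed=42):
    """Shuffle row indices once and cut them into test and train parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)      # seeded, so the split is reproducible
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [rows[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Placeholder data: ten single-feature rows with alternating labels.
rows = [[i] for i in range(10)]
labels = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = train_test_split(rows, labels)
print(len(X_train), len(X_test))  # 8 2
```

Because rows and labels are indexed by the same shuffled positions, every row keeps its own label, and no row appears in both sets.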
4. Feature Scaling:
Numerical features such as age, blood pressure, serum cholesterol, and maximum heart rate are scaled to bring them within a
common range, facilitating better convergence and training stability for the machine learning algorithms.
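The scaling step could look like the sketch below. The paper does not name the scaler; min-max scaling is assumed here because the text speaks of a common range. The helper names and the sample ages are hypothetical.

```python
def fit_min_max(column):
    """Learn the scaling parameters from the training column only."""
    return min(column), max(column)


def apply_min_max(column, lo, hi):
    """Map values to [0, 1] relative to the training range."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]          # constant column: nothing to scale
    return [(value - lo) / span for value in column]


# Made-up ages for illustration.
train_age = [29.0, 45.0, 61.0, 77.0]
test_age = [53.0]

lo, hi = fit_min_max(train_age)
scaled_train = apply_min_max(train_age, lo, hi)
scaled_test = apply_min_max(test_age, lo, hi)
print(scaled_test)  # [0.5]
```

Fitting the minimum and maximum on the training set and reusing them for the test set avoids leaking test-set information into training. A test value outside the training range will simply fall outside [0, 1].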
5. Model Development:
Three machine learning algorithms, namely Logistic Regression, Support Vector Machines (SVM), and Random Forest, are
implemented for heart disease prediction. Each model is trained using the training set with the target variable as the presence
or absence of heart disease.
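To show what training on a binary target involves, the sketch below fits only the simplest of the three models, Logistic Regression, by batch gradient descent in plain Python. It is not the study's implementation: the learning rate, the epoch count, the function names, and the toy single-feature data are all assumptions, and SVM and Random Forest would normally come from a machine learning library rather than be written by hand.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, learning_rate=1.0, epochs=5000):
    """Minimise the average log-loss with batch gradient descent."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, target in zip(X, y):
            score = sum(w * x for w, x in zip(weights, row)) + bias
            error = sigmoid(score) - target   # gradient of log-loss w.r.t. the score
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(X, weights, bias, threshold=0.5):
    """Label a row 1 (disease present) when the predicted probability reaches the threshold."""
    return [
        1 if sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) >= threshold else 0
        for row in X
    ]


# Toy, already-scaled data: the label is 1 when the single feature is large.
X_train = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y_train = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(X_train, y_train)
predictions = predict(X_train, weights, bias)
```

The same fit and predict pattern applies to the other two models; only the learning procedure behind it differs.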
6. Model Evaluation:
The trained models are evaluated using standard performance metrics, including accuracy, precision, recall, F1-score, and area
under the receiver operating characteristic curve (AUC-ROC). The evaluation aims to determine the predictive capabilities of
each algorithm and identify the most accurate model for heart disease prediction.
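The metrics named above can be computed directly from true labels, predicted labels, and predicted scores. The following sketch uses the standard definitions; the function names, the 0.5 decision threshold, and the sample labels and scores are assumptions chosen for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, true negatives, false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(metrics["accuracy"], auc)  # 0.75 0.9375
```

Accuracy, precision, recall, and F1-score depend on the chosen threshold, whereas AUC-ROC depends only on how the scores rank positives against negatives.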
7. Feature Importance Analysis:
For interpretability, feature importance analysis is conducted to identify the most influential attributes contributing to accurate
heart disease predictions. Techniques such as permutation importance or feature importance scores from the Random Forest
model are employed for this analysis.
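Permutation importance, one of the techniques mentioned, measures how much a model's score drops when a single feature column is shuffled. The sketch below is a plain-Python illustration under stated assumptions: `toy_model` is a hypothetical stand-in for a trained classifier, accuracy is used as the score, and the data are made up.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(1 for row, target in zip(X, y) if model(row) == target) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)               # break the link between feature j and the label
            X_permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_permuted, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical classifier that looks only at feature 0."""
    return 1 if row[0] > 0.5 else 0


# Made-up data: feature 0 determines the label, feature 1 is noise.
X = [[0.1, 0.9], [0.2, 0.1], [0.3, 0.8], [0.7, 0.2], [0.8, 0.7], [0.9, 0.3]]
y = [0, 0, 0, 1, 1, 1]

importances = permutation_importance(toy_model, X, y)
```

Here the feature the model ignores receives an importance of exactly zero, while the feature it relies on shows a positive drop. Unlike the impurity-based scores built into Random Forest, this procedure works with any fitted model, which makes it usable for all three classifiers.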
8. Comparative Analysis:
The performance metrics and feature importance results are compared across the three models, Logistic Regression, SVM, and
Random Forest, to determine the most reliable and effective algorithm for heart disease prediction.
9. Ethical Considerations:
Throughout the methodology, ethical considerations are given paramount importance, ensuring the confidentiality and privacy
of patient data. The study adheres to data protection regulations and guidelines, guaranteeing responsible data usage for
research purposes.
By following this methodology, the research aims to develop a robust and interpretable heart disease prediction system using
machine learning and provide valuable insights into the most influential factors for accurate predictions.
Fig.3. Comparing the Outcomes of the 3 proposed models
5. RESULTS AND ANALYSIS
The heart disease prediction system was evaluated using three machine learning algorithms: Logistic Regression, Support
Vector Machines (SVM), and Random Forest. The models were trained and tested on the dataset with 14 attributes. The
performance of each model was assessed using various metrics, including accuracy, error rate, precision, recall, F1-score, and
AUC-ROC.
Table 1. Results for Logistic Regression Model
Metric Value
Accuracy 0.82
Error Rate 0.18
Precision 0.84
Recall 0.80
F1-Score 0.82
AUC-ROC 0.88
Table 2. Results for Support Vector Machine (SVM)
Metric Value
Accuracy 0.85
Error Rate 0.15
Precision 0.87
Recall 0.83
F1-Score 0.85
AUC-ROC 0.90
Table 3. Results for Random Forest Model
Metric Value
Accuracy 0.88
Error Rate 0.12
Precision 0.90
Recall 0.87
F1-Score 0.88
AUC-ROC 0.93
From the tabulated results, it is evident that all three models - Logistic Regression, SVM, and Random Forest - demonstrate
promising performance in heart disease prediction. However, the Random Forest model exhibits the highest accuracy of 0.88,
followed closely by the SVM model with an accuracy of 0.85. The Logistic Regression model also performs reasonably well,
achieving an accuracy of 0.82.
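The values in Tables 1 to 3 can be checked for internal consistency using two standard identities: error rate = 1 - accuracy, and F1 = 2 × precision × recall / (precision + recall). The sketch below copies the reported figures from the tables; the dictionary layout and function names are assumptions made for illustration.

```python
# Values copied from Tables 1-3.
reported = {
    "Logistic Regression": {"accuracy": 0.82, "error_rate": 0.18, "precision": 0.84,
                            "recall": 0.80, "f1": 0.82, "auc_roc": 0.88},
    "SVM": {"accuracy": 0.85, "error_rate": 0.15, "precision": 0.87,
            "recall": 0.83, "f1": 0.85, "auc_roc": 0.90},
    "Random Forest": {"accuracy": 0.88, "error_rate": 0.12, "precision": 0.90,
                      "recall": 0.87, "f1": 0.88, "auc_roc": 0.93},
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def is_consistent(values):
    """Check both identities at the two-decimal precision used in the tables."""
    error_ok = round(1 - values["accuracy"], 2) == values["error_rate"]
    f1_ok = round(f1_from(values["precision"], values["recall"]), 2) == values["f1"]
    return error_ok and f1_ok


consistency = {name: is_consistent(values) for name, values in reported.items()}

# For each higher-is-better metric, find the model with the largest reported value.
best_per_metric = {
    metric: max(reported, key=lambda name: reported[name][metric])
    for metric in ["accuracy", "precision", "recall", "f1", "auc_roc"]
}
print(consistency)
print(best_per_metric)
```

All three tables satisfy both identities at two-decimal precision, and Random Forest has the highest reported value on every metric.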
Fig.4. Line graph for Accuracy
The Random Forest model outperforms the other models in terms of accuracy, precision, recall, F1-score, and AUC-ROC. The
higher accuracy and AUC-ROC suggest that the Random Forest model is more effective in distinguishing between positive and
negative instances of heart disease, making it a promising candidate for real-world heart disease prediction systems.
Fig.5. Line graph for Error Rate
While the Logistic Regression model demonstrates competitive performance, it may have limitations in handling complex
nonlinear relationships between features. SVM and Random Forest, being capable of capturing complex patterns, seem better
suited for this particular task.
Fig.6. Line graph for Precision, Recall, and F1-score
In conclusion, the Random Forest model stands out as the most accurate and reliable algorithm for heart disease prediction
among the three tested models. However, further analysis and validation on additional datasets may be necessary to ensure the
model's generalizability and robustness across different patient populations. The results obtained from this study hold
significant implications for the development of efficient heart disease prediction systems in clinical settings.
Fig.7. Line graph for AUC-ROC
6. DISCUSSION
The results illustrate the effectiveness of the three algorithms in accurately predicting heart disease. Among them, the Random
Forest model demonstrates superior performance, exhibiting the highest accuracy, precision, recall, F1-score, and AUC-ROC
values. This suggests that Random Forest is better suited to capture complex patterns and relationships within the dataset.
Despite this, the Logistic Regression model still shows competitive performance, making it a viable choice for certain
applications.
The comparative analysis underscores the significance of selecting the most appropriate algorithm based on the specific
context and data characteristics. Researchers and healthcare professionals can leverage these insights to develop an efficient
and reliable heart disease prediction system. Such a system holds significant potential in aiding early diagnosis and
personalized treatment, ultimately leading to improved patient outcomes and reduced healthcare burden. Further research and
validation on larger and diverse datasets are recommended to ascertain the generalizability and robustness of the models
across various patient populations and clinical settings.
Fig.8. Heatmap showing the various parameters of the 3 models.
7. CONCLUSION
The research successfully developed and evaluated heart disease prediction models using three machine learning algorithms:
Logistic Regression, Support Vector Machines (SVM), and Random Forest. The results demonstrated the effectiveness of these
models in predicting heart disease, with Random Forest showing the highest accuracy and overall performance among the
three. The findings provide valuable insights for healthcare professionals and researchers looking to implement an efficient
heart disease prediction system.
The comparative analysis revealed that model selection should consider the dataset's characteristics and the specific context of
application. The superior performance of Random Forest can be attributed to its capability to handle complex relationships in
the data. These results hold significant implications for improving patient care through early diagnosis and timely
interventions. Future research should focus on validating the models on diverse and larger datasets to ensure their robustness
and practical applicability in real-world healthcare scenarios.
8. REFERENCES
[1] Ali, M. M., Paul, B. K., Ahmed, K., Bui, F. M., Quinn, J. M., & Moni, M. A. (2021). Heart disease prediction using supervised
machine learning algorithms: Performance analysis and comparison. Computers in Biology and Medicine, 136, 104672.
[2] Ghosh, P., Azam, S., Jonkman, M., Karim, A., Shamrat, F. J. M., Ignatious, E., ... & De Boer, F. (2021). Efficient prediction of
cardiovascular disease using machine learning algorithms with Relief and LASSO feature selection techniques. IEEE Access,
9, 19304-19326.
[3] Khourdifi, Y., & Baha, M. (2019). Heart disease prediction and classification using machine learning algorithms optimized
by particle swarm optimization and ant colony optimization. International journal of Intelligent engineering & systems,
12(1).
[4] Islam, M. M., Haque, M. R., Iqbal, H., Hasan, M. M., Hasan, M., & Kabir, M. N. (2020). Breast cancer prediction: a comparative
study using machine learning techniques. SN Computer Science, 1, 1-14.
[5] Latha, C. B. C., & Jeeva, S. C. (2019). Improving the accuracy of prediction of heart disease risk based on ensemble
classification techniques. Informatics in Medicine Unlocked, 16, 100203.
[6] Bhatla, N., & Jyoti, K. (2012). An analysis of heart disease prediction using different data mining techniques. International
Journal of Engineering, 1(8), 1-4.
[7] Hassan, C. A. U., Khan, M. S., & Shah, M. A. (2018, September). Comparison of machine learning algorithms in data
classification. In 2018 24th International Conference on Automation and Computing (ICAC) (pp. 1-6). IEEE.
[8] Gonsalves, A. H., Thabtah, F., Mohammad, R. M. A., & Singh, G. (2019, July). Prediction of coronary heart disease using
machine learning: an experimental analysis. In Proceedings of the 2019 3rd International Conference on Deep Learning
Technologies (pp. 51-56).
[9] Ak, M. F. (2020, April). A comparative analysis of breast cancer detection and diagnosis using data visualization and
machine learning applications. In Healthcare (Vol. 8, No. 2, p. 111). MDPI.
[10] Tu, M. C., Shin, D., & Shin, D. (2009, December). A comparative study of medical data classification methods based on
decision tree and bagging algorithms. In 2009 Eighth IEEE International Conference on Dependable, Autonomic and Secure
Computing (pp. 183-187). IEEE.
[11] Salhi, D. E., Tari, A., & Kechadi, M. T. (2021). Using machine learning for heart disease prediction. In Advances in Computing
Systems and Applications: Proceedings of the 4th Conference on Computing Systems and Applications (pp. 70-81).
Springer International Publishing.
[12] Zhang, J., Lafta, R. L., Tao, X., Li, Y., Chen, F., Luo, Y., & Zhu, X. (2017). Coupling a fast fourier transformation with a machine
learning ensemble model to support recommendations for heart disease patients in a telehealth environment. IEEE Access,
5, 10674-10685.
[13] Souri, A., Ghafour, M. Y., Ahmed, A. M., Safara, F., Yamini, A., & Hoseyninezhad, M. (2020). A new machine learning-based
healthcare monitoring model for student’s condition diagnosis in Internet of Things environment. Soft Computing, 24(22),
17111-17121.
[14] Ramesh, T. R., Lilhore, U. K., Poongodi, M., Simaiya, S., Kaur, A., & Hamdi, M. (2022). Predictive analysis of heart diseases
with machine learning approaches. Malaysian Journal of Computer Science, 132-148.
[15] Alarsan, F. I., & Younes, M. (2019). Analysis and classification of heart diseases using heartbeat features and machine
learning algorithms. Journal of big data, 6(1), 1-15.

Comparative Analysis of Heart Disease Prediction Models: Unveiling the Most Accurate and Reliable Machine Learning Algorithm

  • 1. © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 1122 Comparative Analysis of Heart Disease Prediction Models: Unveiling the Most Accurate and Reliable Machine Learning Algorithm Aatmaj Amol Salunke1 Computer Science & Engineering Department of Computer Science & Engineering, School of Computer Science and Engineering, Manipal University Jaipur Rajasthan, India ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - Heart disease is a significant health concern, warranting accurate prediction models for timely intervention. This research paper presents a comparative analysis of three popular machine learning algorithms, namely Logistic Regression, Support Vector Machines (SVM), and RandomForest, forheartdisease prediction. Utilizingacomprehensivedatasetencompassing clinical and lifestyle features, each model was developed and evaluated using standard metrics. The study unveils the most accurate and reliable algorithm for heart disease prediction, offering valuable insights into model performance. Furthermore, feature importance analysis sheds lightoncriticalfactors influencingaccuratepredictions. Theresultsaidhealthcare professionals in selecting the most appropriate model for efficient heart disease prediction, contributing to improved patient care and clinical decision-making. Random Forest achieved 88% accuracy, outperforming LogisticRegressionandSVMforheartdiseaseprediction. Key Words: Heart disease prediction, Machine learning algorithms, Logistic Regression, Support Vector Machines (SVM), Random Forest, Comparative analysis 1.RELATED WORK Ali et al. [1] proposed a machine learning approach achieving 100% accuracy, sensitivity, and specificity for heart disease prediction. Ghosh et al. [2] proposed a model achieving 99.05% accuracy for heart disease prediction using hybrid classifiers and feature selection. Khourdifi et al. 
[3] proposed a hybrid approach achieving 99.65% accuracy for heart disease classification using optimization algorithmsandfeatureselection.Latha etal.[5]proposedanensembleclassificationapproach achieving 7% increase in accuracy for heart disease prediction. Bhatla et al. [6] proposed using neural networks with 15 attributes for heart disease prediction, outperforming other data mining techniques. Gonsalves et al.[8]proposedusingNaïve Bayes, SVM, and Decision Tree to predict CHD with promising results. Salhi et al. [11] proposed using neural networks with a correlation matrix for heart disease prediction with 93% accuracy. Souri et al. [13] proposed an IoT-based student healthcare monitoring model with SVM achieving 99.1% accuracy. Ramesh et al. [14] proposed using supervised learning methods, including KNN, for heart disease prediction with promising results. Alarsan et al. [15]proposedanECGclassificationapproach using machine learning, achieving 97.98% accuracy with Random Forest for binary classification. 2.INTRODUCTION Heart disease is a prevalent global health concern, necessitating accurate prediction models for timely interventions and improved patient care. This research paper conducts a comprehensive comparative analysis of three widely used machine learning algorithms: Logistic Regression, Support Vector Machines (SVM), and Random Forest, in the context of heart disease prediction. Leveraging a diverse dataset comprising clinical and lifestyle features, each model was developed and evaluated using standard performance metrics. The study unveils the most accurate and reliable algorithm for heart disease prediction, enabling informed decision-making by healthcare professionals. Furthermore, feature importance analysis elucidates the significant factors influencing accurate predictions.Theobtainedinsightsholdpotential implicationsforclinical practice,asthe most suitable model can be chosen based on performance and interpretability. 
Ultimately, this research contributes to the advancement of heart disease prediction systems, enhancing healthcare outcomes and patient well-being. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 07 | July 2023 www.irjet.net p-ISSN: 2395-0072
  • 2. Fig. 1. Generic Architecture of Heart Disease Prediction System using Machine Learning
3. DATASET
The research utilizes a comprehensive dataset sourced from diverse healthcare institutions, encompassing clinical and lifestyle features relevant to heart disease prediction. The dataset includes demographic information, medical history, vital signs, laboratory results, and lifestyle factors of patients. Each record is labelled to indicate the presence or absence of heart disease. The dataset is carefully curated and pre-processed to handle missing values and ensure data quality. This rich and reliable dataset forms the foundation for training and evaluating the heart disease prediction models using Logistic Regression, Support Vector Machines (SVM), and Random Forest. The inclusion of varied features enables a holistic analysis and robust comparison of the machine learning algorithms.
Attribute Information in the Dataset:
1. age
2. sex
3. chest pain type (4 values)
4. resting blood pressure
5. serum cholesterol in mg/dl
6. fasting blood sugar > 120 mg/dl
7. resting electrocardiographic results (values 0, 1, 2)
8. maximum heart rate achieved
9. exercise induced angina
10. oldpeak = ST depression induced by exercise relative to rest
11. the slope of the peak exercise ST segment
12. number of major vessels (0-3) coloured by fluoroscopy
13. thal: 0 = normal; 1 = fixed defect; 2 = reversible defect
Note: The names and social security numbers of the patients were recently removed from the database and replaced with dummy values.
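As a rough illustration of how the coded attributes in the list above could be checked before modelling, the sketch below validates one record against the value ranges the list states. The short column names (`cp`, `trestbps`, `thalach`, and so on) are hypothetical labels chosen here, the chest pain codes 0-3 are an assumption (the list only says "4 values"), and the example record is made up.

```python
# Hypothetical short names for the 13 clinical attributes, in list order.
ATTRIBUTES = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]

# Codes stated in the attribute list; "cp" is an assumed 0-3 coding.
ALLOWED_CODES = {
    "cp": {0, 1, 2, 3},
    "restecg": {0, 1, 2},
    "ca": {0, 1, 2, 3},
    "thal": {0, 1, 2},
}


def invalid_fields(record):
    """Return names of attributes that are missing or carry an unlisted code."""
    problems = []
    for name in ATTRIBUTES:
        value = record.get(name)
        if value is None:
            problems.append(name)
        elif name in ALLOWED_CODES and value not in ALLOWED_CODES[name]:
            problems.append(name)
    return problems


# Made-up record for illustration only.
example = {
    "age": 54, "sex": 1, "cp": 2, "trestbps": 130, "chol": 240,
    "fbs": 0, "restecg": 1, "thalach": 155, "exang": 0,
    "oldpeak": 1.2, "slope": 1, "ca": 0, "thal": 2,
}
print(invalid_fields(example))
```

A check of this kind is one possible way to flag records that need the missing-value handling the section mentions; it is not a description of the preprocessing actually used in the study.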
  • 3. Fig. 2. Dataset for Heart Disease Prediction System Modelling
4. METHODOLOGY
1. Data Collection and Preprocessing: The dataset used in this study comprises anonymized patient data with 14 attributes: age, sex, chest pain type, resting blood pressure, serum cholesterol level, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, oldpeak (ST depression induced by exercise relative to rest), slope of the peak exercise ST segment, number of major vessels colored by fluoroscopy, and thal (thalassemia type). Social security numbers and patient identifiers have been replaced with dummy values to ensure data privacy.
2. Data Preprocessing: The dataset undergoes rigorous preprocessing to handle missing values and standardize the data. Categorical attributes such as chest pain type and thal are encoded using one-hot encoding, transforming them into numerical representations suitable for the machine learning algorithms.
3. Train-Test Split: The preprocessed dataset is divided into a training set and a test set, maintaining an appropriate ratio to ensure robust model evaluation. The training set is used to train the models, while the test set is reserved for unbiased performance assessment.
4. Feature Scaling: Numerical features such as age, blood pressure, serum cholesterol, and maximum heart rate are scaled to bring them within a common range, facilitating better convergence and training stability for the machine learning algorithms.
5. Model Development: Three machine learning algorithms, namely Logistic Regression, Support Vector Machines (SVM), and Random Forest, are implemented for heart disease prediction. Each model is trained using the training set, with the target variable being the presence or absence of heart disease.
6. Model Evaluation: The trained models are evaluated using standard performance metrics, including accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). The evaluation aims to determine the predictive capabilities of each algorithm and identify the most accurate model for heart disease prediction.
7. Feature Importance Analysis: For interpretability, feature importance analysis is conducted to identify the most influential attributes contributing to accurate heart disease predictions. Techniques such as permutation importance or feature importance scores from the Random Forest model are employed for this analysis.
8. Comparative Analysis: The performance metrics and feature importance results are compared across the three models, Logistic Regression, SVM, and Random Forest, to determine the most reliable and effective algorithm for heart disease prediction.
  • 4. 9. Ethical Considerations: Throughout the methodology, ethical considerations are given paramount importance, ensuring the confidentiality and privacy of patient data. The study adheres to data protection regulations and guidelines, guaranteeing responsible data usage for research purposes.
By following this methodology, the research aims to develop a robust and interpretable heart disease prediction system using machine learning and provide valuable insights into the most influential factors for accurate predictions.
Fig. 3. Comparing the Outcomes of the 3 proposed models
5. RESULTS AND ANALYSIS
The heart disease prediction system was evaluated using three machine learning algorithms: Logistic Regression, Support Vector Machines (SVM), and Random Forest. The models were trained and tested on the dataset with 14 attributes. The performance of each model was assessed using various metrics, including accuracy, error rate, precision, recall, F1-score, and AUC-ROC.
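The methodology steps (one-hot encoding, train-test split, feature scaling, training three models, evaluation, and permutation importance) can be sketched with scikit-learn roughly as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the data is synthetic, the 80/20 split, hyperparameters, and column layout are choices made here, and the scaler and encoder are fitted inside a pipeline so that they only see the training split.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT the paper's dataset): four numeric columns
# followed by two categorical ones. All ranges are made up.
rng = np.random.default_rng(0)
n = 400
age = rng.integers(30, 80, n)
resting_bp = rng.integers(90, 200, n)
cholesterol = rng.integers(120, 400, n)
max_hr = rng.integers(70, 200, n)
chest_pain = rng.integers(0, 4, n)
thal = rng.integers(0, 3, n)
X = np.column_stack(
    [age, resting_bp, cholesterol, max_hr, chest_pain, thal]).astype(float)
feature_names = ["age", "resting_bp", "cholesterol", "max_hr",
                 "chest_pain", "thal"]
# Made-up labelling rule so the models have some signal to learn.
logit = 0.06 * (age - 55) - 0.03 * (max_hr - 135) + 0.8 * (chest_pain == 0) - 0.4
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

NUMERIC_COLS = [0, 1, 2, 3]
CATEGORICAL_COLS = [4, 5]


def build_pipeline(model):
    """Scale numeric columns, one-hot encode categorical ones, then fit model."""
    prep = ColumnTransformer([
        ("scale", StandardScaler(), NUMERIC_COLS),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
    ])
    return Pipeline([("prep", prep), ("model", model)])


# Assumed 80/20 stratified split; the paper does not state its ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results, fitted = {}, {}
for name, model in models.items():
    clf = build_pipeline(model).fit(X_train, y_train)
    pred = clf.predict(X_test)
    # AUC needs a continuous score: class-1 probability, or the SVM margin.
    if hasattr(clf, "predict_proba"):
        score = clf.predict_proba(X_test)[:, 1]
    else:
        score = clf.decision_function(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        "auc_roc": roc_auc_score(y_test, score),
    }
    fitted[name] = clf

# Permutation importance on the held-out split for the Random Forest pipeline.
perm = permutation_importance(
    fitted["Random Forest"], X_test, y_test, n_repeats=10, random_state=0)
importance = dict(zip(feature_names, perm.importances_mean))

for name, metrics in results.items():
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```

Because the data here is synthetic, the printed numbers say nothing about the values reported in this paper; the sketch only shows how the steps fit together.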
  • 5. Table 1. Results for Logistic Regression Model
Metric        Value
Accuracy      0.82
Error Rate    0.18
Precision     0.84
Recall        0.80
F1-Score      0.82
AUC-ROC       0.88
Table 2. Results for Support Vector Machine (SVM)
Metric        Value
Accuracy      0.85
Error Rate    0.15
Precision     0.87
Recall        0.83
F1-Score      0.85
AUC-ROC       0.90
Table 3. Results for Random Forest Model
Metric        Value
Accuracy      0.88
Error Rate    0.12
Precision     0.90
Recall        0.87
F1-Score      0.88
AUC-ROC       0.93
From the tabulated results, it is evident that all three models (Logistic Regression, SVM, and Random Forest) demonstrate promising performance in heart disease prediction. However, the Random Forest model exhibits the highest accuracy of 0.88, followed closely by the SVM model with an accuracy of 0.85. The Logistic Regression model also performs reasonably well, achieving an accuracy of 0.82.
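The tabulated metrics are related by two standard identities: error rate = 1 - accuracy, and F1 = 2PR / (P + R) for precision P and recall R. The short sketch below checks the values in Tables 1 to 3 against these identities at two-decimal precision. The dictionary layout and function names are illustrative choices; the numbers are copied from the tables.

```python
# Values copied from Tables 1-3.
reported = {
    "Logistic Regression": {"accuracy": 0.82, "error_rate": 0.18,
                            "precision": 0.84, "recall": 0.80, "f1": 0.82},
    "SVM": {"accuracy": 0.85, "error_rate": 0.15,
            "precision": 0.87, "recall": 0.83, "f1": 0.85},
    "Random Forest": {"accuracy": 0.88, "error_rate": 0.12,
                      "precision": 0.90, "recall": 0.87, "f1": 0.88},
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def is_consistent(row):
    """True if error rate and F1 match the identities at two decimals."""
    error_ok = round(1 - row["accuracy"], 2) == row["error_rate"]
    f1_ok = round(f1_from(row["precision"], row["recall"]), 2) == row["f1"]
    return error_ok and f1_ok


for model, row in reported.items():
    print(model, "consistent:", is_consistent(row))
```

All three rows pass both checks, so the reported error rates and F1-scores agree with the reported accuracy, precision, and recall.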
  • 6. Fig. 4. Line graph for Accuracy
The Random Forest model outperforms the other models in terms of accuracy, precision, recall, F1-score, and AUC-ROC. The higher accuracy and AUC-ROC suggest that the Random Forest model is more effective in distinguishing between positive and negative instances of heart disease, making it a promising candidate for real-world heart disease prediction systems.
Fig. 5. Line graph for Error Rate
While the Logistic Regression model demonstrates competitive performance, it may have limitations in handling complex nonlinear relationships between features. SVM and Random Forest, being capable of capturing complex patterns, seem better suited for this particular task.
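To make the AUC-ROC interpretation above concrete: AUC-ROC equals the fraction of (positive, negative) patient pairs in which the positive case receives the higher score, with ties counted as half. The sketch below computes it directly from that definition. The labels and scores are made-up illustrative values, not outputs of the models in this study, and the pairwise loop is chosen for clarity rather than speed.

```python
def auc_roc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up example: 3 patients with heart disease (1) and 3 without (0).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(auc_roc(labels, scores))
```

In this example 8 of the 9 pairs are ordered correctly, so the AUC is 8/9. An AUC of 0.93, as reported for Random Forest, would mean roughly 93% of such pairs are ordered correctly.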
  • 7. Fig. 6. Line graph for Precision, Recall, and F1-score
In conclusion, the Random Forest model stands out as the most accurate and reliable algorithm for heart disease prediction among the three tested models. However, further analysis and validation on additional datasets may be necessary to ensure the model's generalizability and robustness across different patient populations. The results obtained from this study hold significant implications for the development of efficient heart disease prediction systems in clinical settings.
Fig. 7. Line graph for AUC-ROC
  • 8. 6. DISCUSSION
The results illustrate the effectiveness of the three algorithms in accurately predicting heart disease. Among them, the Random Forest model demonstrates superior performance, exhibiting the highest accuracy, precision, recall, F1-score, and AUC-ROC values. This suggests that Random Forest is better suited to capture complex patterns and relationships within the dataset. Despite this, the Logistic Regression model still shows competitive performance, making it a viable choice for certain applications. The comparative analysis underscores the significance of selecting the most appropriate algorithm based on the specific context and data characteristics. Researchers and healthcare professionals can leverage these insights to develop an efficient and reliable heart disease prediction system. Such a system holds significant potential in aiding early diagnosis and personalized treatment, ultimately leading to improved patient outcomes and reduced healthcare burden. Further research and validation on larger and diverse datasets are recommended to ascertain the generalizability and robustness of the models across various patient populations and clinical settings.
Fig. 8. Heatmap showing the various parameters of the 3 models.
7. CONCLUSION
The research successfully developed and evaluated heart disease prediction models using three machine learning algorithms: Logistic Regression, Support Vector Machines (SVM), and Random Forest. The results demonstrated the effectiveness of these models in predicting heart disease, with Random Forest showing the highest accuracy and overall performance among the three. The findings provide valuable insights for healthcare professionals and researchers looking to implement an efficient heart disease prediction system. The comparative analysis revealed that model selection should consider the dataset's characteristics and the specific context of application. The superior performance of Random Forest can be attributed to its capability to handle complex relationships in the data. These results hold significant implications for improving patient care through early diagnosis and timely interventions. Future research should focus on validating the models on diverse and larger datasets to ensure their robustness and practical applicability in real-world healthcare scenarios.
  • 9. 8. REFERENCES
[1] Ali, M. M., Paul, B. K., Ahmed, K., Bui, F. M., Quinn, J. M., & Moni, M. A. (2021). Heart disease prediction using supervised machine learning algorithms: Performance analysis and comparison. Computers in Biology and Medicine, 136, 104672.
[2] Ghosh, P., Azam, S., Jonkman, M., Karim, A., Shamrat, F. J. M., Ignatious, E., ... & De Boer, F. (2021). Efficient prediction of cardiovascular disease using machine learning algorithms with relief and LASSO feature selection techniques. IEEE Access, 9, 19304-19326.
[3] Khourdifi, Y., & Baha, M. (2019). Heart disease prediction and classification using machine learning algorithms optimized by particle swarm optimization and ant colony optimization. International Journal of Intelligent Engineering & Systems, 12(1).
[4] Islam, M. M., Haque, M. R., Iqbal, H., Hasan, M. M., Hasan, M., & Kabir, M. N. (2020). Breast cancer prediction: a comparative study using machine learning techniques. SN Computer Science, 1, 1-14.
[5] Latha, C. B. C., & Jeeva, S. C. (2019). Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Informatics in Medicine Unlocked, 16, 100203.
[6] Bhatla, N., & Jyoti, K. (2012). An analysis of heart disease prediction using different data mining techniques. International Journal of Engineering, 1(8), 1-4.
[7] Hassan, C. A. U., Khan, M. S., & Shah, M. A. (2018, September). Comparison of machine learning algorithms in data classification. In 2018 24th International Conference on Automation and Computing (ICAC) (pp. 1-6). IEEE.
[8] Gonsalves, A. H., Thabtah, F., Mohammad, R. M. A., & Singh, G. (2019, July). Prediction of coronary heart disease using machine learning: an experimental analysis. In Proceedings of the 2019 3rd International Conference on Deep Learning Technologies (pp. 51-56).
[9] Ak, M. F. (2020, April). A comparative analysis of breast cancer detection and diagnosis using data visualization and machine learning applications. In Healthcare (Vol. 8, No. 2, p. 111). MDPI.
[10] Tu, M. C., Shin, D., & Shin, D. (2009, December). A comparative study of medical data classification methods based on decision tree and bagging algorithms. In 2009 Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing (pp. 183-187). IEEE.
[11] Salhi, D. E., Tari, A., & Kechadi, M. T. (2021). Using machine learning for heart disease prediction. In Advances in Computing Systems and Applications: Proceedings of the 4th Conference on Computing Systems and Applications (pp. 70-81). Springer International Publishing.
[12] Zhang, J., Lafta, R. L., Tao, X., Li, Y., Chen, F., Luo, Y., & Zhu, X. (2017). Coupling a fast Fourier transformation with a machine learning ensemble model to support recommendations for heart disease patients in a telehealth environment. IEEE Access, 5, 10674-10685.
[13] Souri, A., Ghafour, M. Y., Ahmed, A. M., Safara, F., Yamini, A., & Hoseyninezhad, M. (2020). A new machine learning-based healthcare monitoring model for student’s condition diagnosis in Internet of Things environment. Soft Computing, 24(22), 17111-17121.
[14] Ramesh, T. R., Lilhore, U. K., Poongodi, M., Simaiya, S., Kaur, A., & Hamdi, M. (2022). Predictive analysis of heart diseases with machine learning approaches. Malaysian Journal of Computer Science, 132-148.
[15] Alarsan, F. I., & Younes, M. (2019). Analysis and classification of heart diseases using heartbeat features and machine learning algorithms. Journal of Big Data, 6(1), 1-15.