IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 13, No. 1, March 2024, pp. 658~672
ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i1.pp658-672
Journal homepage: http://ijai.iaescore.com
Potential directions on coronary artery disease prediction using
machine learning algorithms: A survey
Anu Ragavi Vijayaraj, Subbulakshmi Pasupathi
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
Article Info
Article history:
Received Feb 9, 2023
Revised May 25, 2023
Accepted Jun 3, 2023
ABSTRACT
Coronary artery disease (CAD) is the most widespread and prominent cause
of death. The rise in the mortality rate is attributed to lifestyle variables
including unhealthy diet, tobacco and drug use, and physical inactivity, as well
as environmental pollution. Traditional screening tests, including computed
tomography, angiography, electrocardiography, and magnetic resonance
imaging, are employed for diagnosis but require considerable manpower.
Machine learning (ML) has been used in healthcare to make early
predictions from massive volumes of data. The Scopus and Web of Science
databases were exhaustively searched using a search strategy that
comprised CAD prediction, cardiac illness detection, and heart disease
categorization. After applying the inclusion and exclusion criteria to the 99
articles obtained, 30 articles were retained as the population of the study.
This review offers an organized look at the articles published on ML-based
CAD detection and classification models that include clinical variables.
ML can produce strong results in CAD detection, as evidenced by the
random forest, decision tree, and k-nearest-neighbour classifiers, which
reach accuracies above 90%. The use of ML in CAD diagnosis lowers
false-positive and false-negative errors and presents a special opportunity
to provide patients with quick and affordable diagnostic services.
Keywords:
Classification
Clinical variables
Coronary artery disease
Machine learning
This is an open access article under the CC BY-SA license.
Corresponding Author:
Subbulakshmi Pasupathi
School of Computer Science and Engineering, Vellore Institute of Technology
Chennai, Tamil Nadu, India
Email: subbulakshmi.p@vit.ac.in
ABBREVIATIONS
LR : Logistic regression DNN : Deep neural network
SVM : Support vector machine FNN : Fuzzy neural network
KNN : K nearest neighbor ADB : AdaBoost
DT : Decision tree XGB : Extreme gradient boosting
RF : Random forest SMO : Sequential minimal optimization
RT : Random tree MLR : Multinomial logistic regression model
NB : Naïve Bayes SVC : Support vector classifier
GB : Gradient boosting PSO : Particle swarm optimization
GNB : Gaussian Naive Bayes CHAID : Chi-squared automatic interaction detection
NN : Neural network NFC : Neuro fuzzy classifier
GA : Genetic algorithm CART : Classification and regression trees
MLP : Multilayer perceptron CNN : Convolutional neural networks
ANN : Artificial neural network
1. INTRODUCTION
People worry about their hectic day-to-day schedules, and some turn to drugs and tobacco as stress
relievers. Obesity often develops in parallel. Together these lead to serious life-threatening
disorders such as heart disease, cancer, tuberculosis, and many more. The most challenging task is
to predict them before they get worse. Cardiovascular diseases (CVD) are one such life-threatening
group of diseases, accounting for 31.8% of all global deaths according to recent World Health
Organisation (WHO) statistics [1], as depicted in Figure 1. A rise in the mortality rate can be
avoided if the disease is predicted in advance and proper lifestyle choices are made.
There are four main types of CVD. A stroke, also known as a brain attack, happens when the blood
supply to a portion of the brain is cut off or when a blood vessel in the brain bursts. Peripheral
artery disease (PAD) is a common disorder characterized by narrowed arteries that restrict blood
flow to the arms or legs. Heart valve disorders include aortic disease: aortic valve dysfunction
results in improper operation of the valve between the left ventricle, the lower left heart
chamber, and the aorta, the main artery to the body. The last type is coronary heart disease,
commonly referred to as coronary artery disease (CAD) or ischemic heart disease (IHD), which is
caused by an accumulation of plaque deposits in the arteries that obstructs the heart's blood flow.
Figure 1. Global causes of death [1]
According to the American Heart Association, IHD was estimated to affect 244.1 million people
worldwide in 2020, and it was more common in men than in women (141.0 and 103.1 million people,
respectively) [2]. Coronary microvascular disease, another type of heart disease, affects the
small arteries in the heart muscle and is more prevalent in women.
Although doctors have not provided a definitive explanation for why CAD develops, risk factors
contribute significantly. They include obesity, drug and tobacco use, high cholesterol, and family
history. In addition to these risk factors, environmental pollution is a leading contributor to
the development of disease: small particles from polluted air can affect the heart and blood
vessels.
According to the global burden of disease (GBD) report, pollution caused 9 million deaths worldwide
in 2019, with cardiovascular disease, including IHD (31.7%) and stroke (27.7%), accounting for 61.9% of all
fatalities. This data underscores the substantial impact of pollution on mortality worldwide, particularly its
strong association with heart-related health issues. Figure 2 shows the impact of pollution on cardiovascular
disease: Figure 2(a) displays the age-standardized number of fine particulate matter (PM2.5) related deaths per
100,000 people in 2019, Figure 2(b) displays the total number of noncommunicable disease-related deaths caused
by pollution worldwide in 2019, Figure 2(c) displays the annual mean population-weighted PM2.5 concentrations
in China, India, and the United States from 1990 through 2019, and Figure 2(d) displays the model for the
exposure-response relationship between CVD and PM2.5 air pollution in a 50-year-old person.
The three main coronary arteries are the right coronary artery (RCA), the left circumflex artery
(LCX), and the left anterior descending artery (LAD). CAD is caused when blood flow to the heart
stops partially or completely because of plaque that develops in the arteries. The plaque narrows
the arteries, blocking the flow of oxygen-rich blood to the heart, as shown in Figure 3. The
reduced oxygen supply to the heart causes chest pain, shortness of breath, and heart attack.
Figure 2. Impact of pollution on cardiovascular disease [3]: (a) worldwide mortality from cardiovascular disease
associated with air pollution, (b) deaths caused by global pollution, (c) annual mean PM2.5 pollution levels
(1990-2019), and (d) cardiovascular disease and PM2.5 pollution: an exposure-response connection
Figure 3. Coronary artery disease [4]
Medical experts recommend a variety of expensive and technically complex tests, such as
electrocardiography (ECG), angiography, computed tomography, and magnetic resonance imaging, to
diagnose individuals with positive signs of CAD. Currently, these examinations are costly and
require skilled professionals. Researchers are therefore striving to develop a less expensive yet
equally effective alternative to these standard tests.
Machine learning (ML) and data mining methods are widely used to analyse and extract information
from medical data. ML-based techniques are effective in a variety of disciplines, including
agriculture, credit card fraud detection, and speech recognition, and are frequently recommended
for predicting heart disease because they extract accurate information from large datasets
efficiently, making prediction straightforward. The key strengths of ML are its ability to manage
large volumes of data, its fast processing speed, and its capacity to generate predictions in the
early phases of development.
Two types of studies can be found in the literature. Some research used clinical indicators,
including age, blood pressure, and smoking history, to categorize CAD patients, while other
studies used signal recordings such as electrocardiography (ECG), photoplethysmography (PPG), and
phonocardiography (PCG) to identify CAD symptoms. To guide future work, our study concentrates on
the workflow based on clinical factors. The remainder of the paper is structured as follows:
section 2 highlights ML for CAD diagnosis. Sections 3 and 4 discuss data collection and
classification. Finally, sections 5 and 6 present the results and conclusion.
ML enables computers to learn and improve without being explicitly programmed, with little or no
human intervention. Both scientists and medical professionals are looking for affordable, precise,
and quick CAD diagnosis and treatment options. To help researchers better address the difficulties
in their future work, this review paper highlights the strongest findings of earlier studies.
2. ML FOR CAD DIAGNOSIS
Artificial intelligence (AI) has a substantial impact on the diagnosis of heart disease through the
analysis of medical data. AI facilitates accurate forecasting of cardiovascular outcomes and the
non-invasive identification of CAD. Machine learning is a branch of AI. In this section, the
following topics are addressed:
− General flow diagram of the prediction model
− Data acquisition-CAD prediction
− Review of the articles on CAD prediction using machine learning algorithms (background study)
2.1. General flow diagram of the prediction model
Figure 4 shows the general heart disease prediction model. Patient details, including age,
cholesterol, sugar, and blood pressure, are maintained in the hospital's patient database. Data
collected from this database is passed to the next phase for pre-processing. With the preprocessed
data, the best classification method is identified. The dataset is divided into training and
testing data; the model is trained on the training part and prepared for the final phase, where
the testing part is used to assess the algorithm's classification accuracy.
Figure 4. Heart disease prediction model
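The flow in Figure 4 can be illustrated with a minimal sketch. It is not the implementation of any surveyed paper: the records are synthetic, the field names `chol` and `target` are assumptions, and the "model" is a single-threshold rule chosen only to keep the example short.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the records and split them into training and testing parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train, feature):
    """'Train' a one-feature rule: predict disease when feature >= threshold.
    The threshold is the candidate value with the best training accuracy."""
    best_threshold, best_correct = None, -1
    for candidate in sorted({r[feature] for r in train}):
        correct = sum((r[feature] >= candidate) == bool(r["target"]) for r in train)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold

def accuracy(test, feature, threshold):
    """Fraction of held-out records that the rule classifies correctly."""
    hits = sum((r[feature] >= threshold) == bool(r["target"]) for r in test)
    return hits / len(test)

# Synthetic records (hypothetical, not patient data): high cholesterol means disease.
records = [{"chol": c, "target": int(c >= 240)} for c in range(150, 350, 5)]
train, test = train_test_split(records)
threshold = fit_threshold(train, "chol")
test_accuracy = accuracy(test, "chol", threshold)
```

Keeping the test part out of `fit_threshold` is the point of the split: the reported accuracy then reflects records the rule has never seen.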
2.2. Dataset
2.2.1. Kaggle database
The Kaggle heart disease dataset, created in 1988, comprises four databases: Cleveland, Hungary,
Switzerland, and Long Beach V. Although it has 76 features, including the predicted attribute,
only 14 of them are used in all reported studies. The "target" field refers to the patient's
cardiac status: it takes the integer value 0 for no disease and 1 for disease [5].
2.2.2. UCI database
The UCI heart disease databases include Cleveland with 303 instances, Hungarian with 294
instances, Switzerland with 123 instances, and Long Beach VA with 200 instances. The UCI database
has 76 features, and all published experiments use a subset of 14 of them [6]. The Cleveland database in
particular is commonly used by ML researchers. The "goal" attribute indicates the presence of
heart disease in the patient and takes integer values from zero (no presence) to four. Experiments
with the Cleveland database have focused mainly on distinguishing the presence of disease
(values 1, 2, 3, 4) from its absence (value 0).
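The reduction from the five-level "goal" attribute to a binary presence/absence label can be sketched as follows. The function name `binarize_target` is hypothetical and the input values are made up for illustration.

```python
def binarize_target(goal):
    """Map the 0-4 Cleveland 'goal' value to absence (0) or presence (1)."""
    if goal not in (0, 1, 2, 3, 4):
        raise ValueError(f"unexpected goal value: {goal!r}")
    return 0 if goal == 0 else 1

# Illustrative values only; not taken from the dataset.
labels = [binarize_target(g) for g in [0, 2, 4, 0, 1, 3]]
```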
2.2.3. Z-Alizadeh Sani database
The Z-Alizadeh Sani dataset consists of 303 patient records with 56 features each. The features
are divided into four categories: demographic; examination and symptom; ECG; and laboratory and
echo features. Every patient is classified as either CAD or normal: if a patient's diameter
narrowing exceeds 50%, they are classified as CAD; otherwise, they are considered normal [7].
2.2.4. Indira Gandhi Medical College database, Shimla, India
The dataset collected in real time from Indira Gandhi Medical College contains 335 patient
instances. All participants agreed to undergo angiography after CAD was suspected. Each
patient had 27 characteristics, including historical, demographic, and laboratory information [8].
2.2.5. General hospital, Nigeria database
The dataset includes 506 instances with 12 features from two general hospitals in Kano, Nigeria.
It contains eleven clinical risk factors for CAD along with one demographic feature. The "goal"
variable is binary, with integer values 0 for no disease and 1 for disease [9], [10].
2.2.6. Extension of Z-Alizadeh Sani dataset
This dataset is an extended version of Z-Alizadeh Sani. The main difference is that the extended
dataset supports not only the classification of CAD but also the prediction of stenosis in the
individual arteries, through three features added to the existing dataset: LAD, LCX, and RCA [7].
The datasets used for CAD diagnosis by researchers and authors are listed in Table 1.
Table 1. The list of datasets for CAD diagnosis
Dataset No.  Dataset name                                            No. of samples  No. of input features  Stenosis
1            Heart disease dataset-Kaggle                            1026            14
2            UCI repository                                          920             14
3            Z-Alizadeh Sani                                         303             56
4            Extension of Z-Alizadeh Sani                            303             59                     ✓
5            Indira Gandhi Medical College Database, Shimla, India   335             25
6            General hospital, Nigeria                               506             18
2.3. Review of the articles on CAD prediction using machine learning algorithms (background study)
Most CAD detection systems employ supervised ML techniques. There have also been a few reports
of clustering algorithms being used for CAD diagnosis. The articles published on ML for CAD
diagnosis are listed in Table 2 (see appendix).
Alizadehsani et al. [7] proposed a CAD diagnosis model that assesses the stenosis of each vessel.
The effect of features on the three vessels was evaluated using information gain. ML classifiers
including C4.5, NB, and KNN were applied to the new features added to the Alizadeh dataset. C4.5
reached the highest accuracy, with 74.20%, 63.76%, and 68.33% for the LAD, LCX, and RCA vessels,
respectively. Garavand et al. [39] compared the efficiency of the ML classifiers MLP, SVM, LR,
J48, RF, KNN, and NB in predicting CAD. The most effective algorithms for diagnosing CAD from
patient examination data were SVM and RF.
In the survey summarized in Table 2, most papers used public datasets such as the UCI repository
and the Z-Alizadeh Sani dataset; only a few researchers used real-world data. A limitation of the
Cleveland dataset is that a few values are missing, so suitable imputation methods are applied to
fill them in. Unlike other datasets, the Cleveland dataset can be used for multilevel
classification problems, where the target variable indicates the level of disease from 0 to 4.
Some studies [40]–[42] on CAD diagnosis achieved 90%, 70%, and 75% accuracy, respectively.
However, they did not assess the stenosis of each vessel independently. Researchers can use the
extension of the Z-Alizadeh dataset to study arterial stenosis. 37 features of the extended
dataset were examined, and the stenosis of these arteries was significantly influenced by the
features age and typical chest pain.
Researchers generally employ performance measures including accuracy, specificity, sensitivity,
and F1 score, as shown in (2), (3), (4), and (5). Table 2 makes it clear that accuracy was the selection criterion used
by all authors. References [11], [15], [21], [28], [31], [37], and [38] additionally used F1 score,
sensitivity, and specificity as selection criteria.
3. DATA COLLECTION
Data collection is the first step in the ML pipeline. It is the process of gathering, measuring,
and analysing information from diverse sources. The information gathered is used to create ML and
AI solutions. In this section, data collection includes preprocessing, feature extraction, and
feature selection.
3.1. Data preprocessing
Preprocessing, which is done to ensure high-quality data, consists of four stages. Cleaning
removes noisy and missing values from the dataset so that an accurate result can be produced.
Transformation converts the cleaned data into a different format without changing the contents of
the dataset; it involves aggregation, standardization, and smoothing. Integration merges data from
numerous sources into a single database. Reduction structures the gathered data so that it gives
relevant findings.
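Two of these stages can be sketched in a few lines. This is an illustration under stated assumptions: mean imputation stands in for cleaning and z-score scaling stands in for standardization, neither of which is prescribed by the surveyed papers, and the cholesterol values are synthetic.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Cleaning step: replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Transformation step: rescale to zero mean and unit variance (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

chol = [200, None, 240, 220]   # synthetic values with one missing entry
cleaned = impute_mean(chol)    # missing entry becomes 220, the observed mean
scaled = standardize(cleaned)
```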
3.2. Dimensionality reduction
Feature extraction is a dimensionality reduction procedure that reduces an initial collection of
raw data to more manageable groups for processing. Principal component analysis (PCA) is a
commonly used linear transformation for this purpose. It looks for mutually orthogonal directions
in the feature space that maximize variance.
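The idea of a maximum-variance direction can be made concrete with a small sketch. It finds only the first principal component, using power iteration on the sample covariance matrix; this is one of several ways to compute it and is chosen here because it needs no external library. The data points are synthetic.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return the unit vector of maximum variance via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Arbitrary non-zero start; it must not be orthogonal to the answer.
    vec = [1.0] + [0.5] * (d - 1)
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    return vec

# Synthetic points lying on the line y = x, so all variance is along (1, 1).
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
pc = first_principal_component(points)
```

Projecting each centered point onto `pc` would give a one-dimensional representation that keeps all of the variance in this toy example.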
Feature selection (FS) eliminates redundant and irrelevant features, which increases accuracy and
makes the model easier to understand. By choosing the most prominent features, a fast conditional
mutual information feature selection technique (FCMIM) improves accuracy [25]. The algorithm is
used together with the SVM classifier to detect cardiac problems. To remove features that
contribute little to system performance, Ali et al. [17] proposed an optimally configured and
improved deep belief network (OCI-DBN) approach for heart disease prediction, based on the
Ruzzo-Tompa and stacked genetic algorithms. The Ruzzo-Tompa feature selection algorithm
eliminates the irrelevant features from the dataset. With the selected features, an optimally
configured and improved deep belief network is created, and the accuracy is improved up to
94.61%.
With the feature selection techniques MLR and sequential feature selection (SFS), age, slope,
exang, fluoroscopy, and thalach are the features selected in [34]. NFC with the MLR feature
selection method (MLR+NFC) attains an accuracy of 84%, higher than SFS+NFC. With feature
selection through PSO [32], attributes are selected from the IGMC, Shimla dataset. The selected
attributes include smoking, diabetes mellitus, and high density lipoprotein. These attributes are
then used with the classification algorithms, and MLR achieved an accuracy of 84.17% by correctly
identifying misclassified instances.
4. CLASSIFICATION
4.1. Machine learning algorithms
ML algorithms discover hidden patterns in data, anticipate outcomes, and improve their performance
with experience. These algorithms and models are intended to learn from data and generate
predictions or decisions without explicit instructions. After attributes are selected with
suitable feature selection techniques, the classifier results for the selected feature subsets are
compared.
4.1.1. K-nearest neighbor
Fix and Hodges introduced the KNN rule, a nonparametric technique for classification and
regression, in 1951. KNN is a straightforward but effective classification method that is useful
when little or no prior knowledge about the distribution of the data is available, since it makes
no assumptions about the data. The strategy locates the k data points in the training set that are
most similar to the data point whose target value is missing and assigns it the average value of
those neighbours (for regression) or their most common class (for classification).
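A minimal sketch of the KNN rule for classification follows. It assumes Euclidean distance as the similarity measure and a majority vote among the k neighbours; the points and labels are synthetic, and `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Synthetic two-feature data: one cluster per class.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = [0, 0, 0, 1, 1, 1]
near_first_cluster = knn_predict(points, labels, (2, 2))
near_second_cluster = knn_predict(points, labels, (6, 5))
```

For regression, the vote would be replaced by the mean of the neighbours' target values.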
Acharya et al. [15] developed a single ECG-based diagnostic method for the prediction of three
cardiac abnormalities, namely CAD, myocardial infarction, and congestive heart failure. It also
pinpoints the precise cardiac abnormalities seen in patients during an ECG test, obviating the
need for other diagnostic techniques. This non-invasive, cost-effective method could be extended
in the future to detect cardiac abnormalities at an early stage using a single ECG pulse.
4.1.2. Logistic regression
LR is a supervised method for binary problems such as true/false, yes/no, or pass/fail. The
independent variables can be categorical or numerical, whereas the dependent variable is always
categorical. Applications of LR include credit scoring, predicting user behavior, and discrete
choice analysis. After removing insignificant features from the Cleveland dataset, the optimal
attributes were chosen using MLR and SFS. With the selected attributes, Marateb and Goudarzi [34]
proposed a novel NFC; the best performance, an accuracy of 84%, is achieved when NFC is integrated
with MLR rather than with SFS.
4.1.3. Naïve Bayes
NB uses Bayes' theorem for classification problems. Applications of NB include medical diagnosis,
spam filters, and text analysis. Tarawneh and Embarak [23] created a hybrid model for CAD
diagnosis with 12 selected features, in which different classification algorithms including SVM,
KNN, GA, RF, NN, and J4.8 are implemented. NB and SVM show good performance, with an accuracy of
89.2%. The formula for Bayes' theorem is:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{1}$$
𝑃(𝐴|𝐵) = Posterior probability of class given predictor
𝑃(𝐵|𝐴) = Likelihood is the probability of predictor given class
𝑃(𝐴) = Class prior probability
𝑃(𝐵) = Predictor Prior probability
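A worked example of (1) may help. The numbers below are invented for illustration, and the predictor prior P(B) is expanded with the law of total probability, which the text does not state explicitly.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A|B) from Bayes' theorem, with P(B) expanded by total probability."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical figures: 10% of patients have the disease (P(A)), the predictor
# is present in 90% of diseased patients (P(B|A)) and in 20% of healthy ones.
p = posterior(prior=0.1, likelihood=0.9, false_positive_rate=0.2)
```

Here P(B) = 0.9 × 0.1 + 0.2 × 0.9 = 0.27, so the posterior is 0.09 / 0.27, or one third.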
4.1.4. Decision tree
DT is a supervised approach that solves prediction and classification problems. It is a tree-like
structure in which internal nodes represent tests on attributes, branches represent the outcomes
of the tests, and leaf nodes provide the classification labels. By following each path from the
root to a leaf node, a DT can easily be transformed into a set of rules, and conclusions are
reached by following these rules. Applications include sentiment analysis and product selection.
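The transformation of root-to-leaf paths into rules can be sketched as below. The tree is a hand-written, hypothetical example stored as nested dictionaries; its tests and labels are not taken from any surveyed model.

```python
def extract_rules(node, conditions=()):
    """Walk every root-to-leaf path and return one IF-THEN rule per leaf."""
    if "label" in node:                      # leaf node: emit the finished rule
        premise = " AND ".join(conditions) or "TRUE"
        return [f"IF {premise} THEN {node['label']}"]
    test = node["test"]
    return (extract_rules(node["yes"], conditions + (test,))
            + extract_rules(node["no"], conditions + (f"NOT ({test})",)))

# Hypothetical two-level tree for illustration only.
tree = {
    "test": "chest_pain == typical",
    "yes": {"test": "age > 55",
            "yes": {"label": "CAD"},
            "no": {"label": "normal"}},
    "no": {"label": "normal"},
}
rules = extract_rules(tree)
```

Each leaf yields exactly one rule, so the rule set has as many entries as the tree has leaves.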
Muhammad and Algehyne [10] developed a fuzzy expert system for the prediction of CAD, which
includes knowledge base, inference engine, and defuzzification phases. The knowledge base contains
87 diagnostic rules. Instead of traditional methods such as interviews and questionnaires, an
upgraded C4.5 is used to incorporate human knowledge into the system's knowledge base. The
C4.5-based fuzzy expert system attained an accuracy of 94.55%. In a study on the real-world
dataset collected from Indira Gandhi Medical College, Shimla, missing values are handled using a
new hybrid k-means cluster centroid-based method, and C4.5, NB Tree, and MLP are used to predict
CAD. Compared with the other predictive models, C4.5 built with 25 features yielded the highest
accuracy, sensitivity, and specificity [8].
With the Z-Alizadeh dataset, a CART model is created by calculating the importance score of each
feature. A typical feature attains the highest feature score of 16.5%. Features with 0% importance
are excluded from classification [28]. Idris et al. [33] created an embedded DT method using RF,
with features selected by Gini impurity. 20 features were used for the classification of CAD. The
NN model with embedded DT features achieved the highest precision and accuracy, 94.5%. The
University of Malaya Medical Centre (UMMC) and subarachnoid hemorrhage (SAH) datasets are compared
on the performance metrics.
4.1.5. Random forest
RF is a technique for solving regression and classification problems. As an ensemble approach, a
random forest model is built from a number of small DTs, or estimators, each of which provides its
own prediction. The random forest combines the estimators' predictions to obtain a more accurate
prediction. The 'forest' in RF is trained using bootstrap aggregation (bagging). Applications
include credit card default prediction, stock market prediction, and product recommendation.
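Bootstrap aggregation can be sketched with a deliberately simplified ensemble. Each "tree" below is a one-split decision stump, and the random feature sub-sampling that a full random forest also performs is omitted, so this illustrates bagging and majority voting rather than a complete RF. The (age, cholesterol) records are synthetic.

```python
import random
from collections import Counter

def fit_stump(samples):
    """Best single-feature threshold rule on (features, label) pairs."""
    best = None
    for f in range(len(samples[0][0])):
        for threshold in sorted({x[f] for x, _ in samples}):
            for positive_side in (0, 1):
                correct = sum(
                    (positive_side if x[f] >= threshold else 1 - positive_side) == y
                    for x, y in samples)
                if best is None or correct > best[0]:
                    best = (correct, f, threshold, positive_side)
    return best[1:]

def stump_predict(stump, x):
    f, threshold, positive_side = stump
    return positive_side if x[f] >= threshold else 1 - positive_side

def fit_forest(samples, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(samples) for _ in samples]
        forest.append(fit_stump(bootstrap))
    return forest

def forest_predict(forest, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(stump_predict(s, x) for s in forest)
    return votes.most_common(1)[0][0]

# Synthetic (age, cholesterol) records; label 1 means disease.
data = [((40, 180), 0), ((45, 190), 0), ((50, 200), 0), ((52, 210), 0), ((55, 205), 0),
        ((60, 260), 1), ((63, 270), 1), ((66, 280), 1), ((70, 290), 1), ((72, 300), 1)]
forest = fit_forest(data)
low_risk = forest_predict(forest, (30, 150))
high_risk = forest_predict(forest, (80, 320))
```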
According to Muhammad et al. [11], the DT created using the RF approach was the most effective
model in terms of accuracy and receiver operating characteristic (ROC). Production rules are
created from this combined DT-from-RF model, and the resulting expert system diagnosed CAD
patients in Nigeria with an accuracy of 92.04%. According to Jinny and Mate [21], among all the
classifiers compared, the RF classifier, using as many features as possible for the classification
model, achieved an accuracy of about 91%.
The hybrid RF with a linear model (HRFLM) proposed by Mohan et al. [24] used 11 attributes and
achieved an accuracy of 88.7% on the UCI heart dataset. Rajdhan et al. [36] developed an ML
model that compares four methods on the UCI Cleveland dataset. Examining the classification
accuracy of the RF, LR, NB, and DT algorithms, it was found that the RF approach had an accuracy of 90.16%.
In the method proposed by Joloudari et al. [16], the data mining techniques SVM, CHAID, C5.0, and
RT were applied to the Z-Alizadeh dataset. Random trees performed best, with 91.47% accuracy
using 10-fold validation. The following metrics were examined and evaluated: accuracy, AUC, Gini,
return on investment (ROI), profit, confidence, response, and gain.
The Cleveland heart dataset was used to train the classifiers NB, SVM, LR, RF, and AdaBoost [35].
The results with and without feature selection are compared. The 8 features selected by the hybrid
model yielded the highest accuracy.
4.1.6. Support vector machine
SVM is a supervised learning approach that categorizes data with a hyperplane. The hyperplane
partitions the n-dimensional feature space so that new data can be assigned to the appropriate
category. The data points closest to the hyperplane are called support vectors. SVM maximizes the
margin to reduce the chance of misclassification. Applications include face detection and image
classification. The algorithm can classify both linearly and non-linearly separable data.
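The roles of the hyperplane, the margin, and the support vectors can be illustrated for the linear case. The weights `w` and bias `b` below are hand-picked assumptions rather than the result of training, and the sketch uses the usual convention that support vectors satisfy |w·x + b| = 1.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between the two margin boundaries w.x + b = +1 and -1."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))

def support_vectors(w, b, points, tol=1e-9):
    """Points lying exactly on a margin boundary."""
    return [p for p in points if abs(abs(decision_value(w, b, p)) - 1) <= tol]

# Hand-picked separating hyperplane x + y = 3 (not learned from data).
w, b = (1.0, 1.0), -3.0
points = [(0, 0), (1, 1), (2, 2), (3, 3)]
predicted = [classify(w, b, p) for p in points]
on_margin = support_vectors(w, b, points)
width = margin_width(w)
```

A trained SVM would choose `w` and `b` so that `width` is as large as possible while the training points stay on the correct side.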
Li et al. [25] compared various feature selection methods, such as relief, maximum relevance
minimum redundancy, least absolute shrinkage and selection operator, and LLBFS, and presented a
novel feature selection method, FCMIM. Among the classifiers considered (LR, KNN, ANN, NB, DT, and
SVM), SVM is combined with FCMIM for the prediction of cardiac illnesses. Abdar et al. [26]
created a novel ML methodology for accurate CAD prediction. The Z-Alizadeh Sani dataset is
preprocessed and normalized, and the GA and PSO algorithms are employed for feature selection.
Classification algorithms including NB, generalized linear models (GLM), LR, DL, DT, RF, GBT, and
three types of SVM, namely C-SVC, nu-SVC, and linear SVM, were tested on the dataset. The
N2Genetic-NuSVM method has the highest accuracy (93.08%) and F1-score (0.9151).
4.1.7. Artificial neural network
An NN that mimics the human brain is known as an artificial neural network (ANN). The human brain
consists of interconnected neurons. Similarly, the neurons in an ANN are arranged in several
layers and connected to one another. The layers comprise an input layer, hidden layers, and an
output layer.
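The layered structure can be shown with a forward pass through a tiny fully connected network. The sigmoid activation, the layer sizes, and all weight values are assumptions made for illustration; no training is performed.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def dense(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of all inputs plus a bias."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """Pass the input through every layer in order and return the final output."""
    activation = inputs
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation

# Hypothetical network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
network = [
    ([[0.2, -0.1, 0.4], [0.5, 0.3, -0.2]], [0.0, 0.1]),   # hidden layer
    ([[0.7, -0.6]], [0.05]),                              # output layer
]
output = forward([1.0, 0.5, -1.0], network)

# With all weights and biases at zero, every sigmoid neuron outputs 0.5.
zero_network = [([[0, 0, 0], [0, 0, 0]], [0, 0]), ([[0, 0]], [0])]
zero_output = forward([1.0, 2.0, 3.0], zero_network)
```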
Kahramanli and Allahverdi [19] created a hybrid system for diabetes and heart disease prediction,
implementing a hybridization of ANN and FNN on the UCI dataset. There are three types of FNN:
FNN1, FNN2, and FNN3. In FNN1 the inputs are crisp values while the weights are fuzzy, and vice
versa in FNN2. In FNN3 both the inputs and the weights are fuzzy values. The hybrid system
combines an FNN2 and an ANN trained with backpropagation. The model shows 84.2% accuracy for the
Pima Indians database and 86.8% for the Cleveland database.
A computer-aided system called the heart disease prediction system (HDPS) [29] was created using
the C and C# environments to predict heart disorders. With 13 clinical features, the ANN
classifies with a degree of precision of 80%. The HDPS interface consists of a data input panel,
an ROC curve display section, and a performance display section. Dixit and Kala [18] designed a
low-cost compact sensor that records ECG signals; the preprocessed signal is then segmented into
windows. Suitable features are extracted from each window and classified. Finally, the windows are
fused by a 1D CNN model, with an accuracy of 93%.
4.1.8. Fuzzy logic
Fuzzy logic is a multi-valued logic in which variables take values between 0 and 1. A fuzzy
expert system-based prediction model [37] focused on modules, meta-rules, and consistency checks
in the rule base for improved rule organization, with specific emphasis on effective rule
organization methods. Because the patient may be unaware of the values of all clinical parameters,
rules are created for various combinations of criteria, including age, blood pressure, and
cholesterol. In the rule base, two consistency checks, namely i) contradictory rule checking and
ii) redundant rule checking, are defined to improve the results.
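How a variable takes a value between 0 and 1 can be illustrated with a membership function. The triangular shape, the breakpoints for "high cholesterol" and "old age", and the use of the minimum as fuzzy AND are assumptions chosen for illustration; the surveyed systems may use other shapes and operators.

```python
def triangular(x, left, peak, right):
    """Degree of membership in [0, 1] for a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def fuzzy_and(*degrees):
    """Common choice for fuzzy AND: the minimum of the membership degrees."""
    return min(degrees)

# Hypothetical fuzzy sets and patient values.
high_chol = triangular(230, 200, 260, 320)   # halfway up the rising edge
old_age = triangular(60, 40, 70, 100)        # two thirds of the way up
rule_strength = fuzzy_and(high_chol, old_age)
```

The value `rule_strength` is the degree to which a rule such as "IF cholesterol is high AND age is old" fires for this patient.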
In CAD diagnosis using a PSO-based fuzzy expert system, the membership functions are optimised
using PSO, and a fuzzy rule base is produced from the optimised membership functions. With this
fuzzy rule base, Mamdani inference yielded the highest accuracy of 93.27% on the Cleveland and
Hungarian datasets [20]. A fuzzy system using Mamdani inference with rules generated by C4.5
predicted heart disease with the highest accuracy of 94.55%, achieved by Muhammad and
Algehyne [10].
4.1.9. Other learning methods
Using two-level stacking, Wang et al. [30] applied an enumeration algorithm to identify the best
classifiers on the Z-Alizadeh dataset. A novel hybrid dataset, ‘Sathvi’ [38], integrating public health datasets
is created. The objective of this hybrid dataset is to provide noise-free data. The attributes ‘ca’ and ‘thal’
were eliminated from the dataset. Among the classifiers compared in this research, the CatBoost
classifier outperformed the others, with accuracy from 88% to 98.11%.
4.2. Performance measurements
The classification of heart disease data involves several supervised ML algorithms. The
classification models were assessed using eight quantities: true positive (TP), false positive
(FP), true negative (TN), false negative (FN), accuracy, specificity, sensitivity, and F1 score.
TP – Number of patients with heart disease predicted as having heart disease.
FP – Number of patients without heart disease predicted as having heart disease.
TN – Number of patients without heart disease predicted as not having heart disease.
FN – Number of patients with heart disease predicted as not having heart disease.
An accuracy score indicates how often a model's predictions are correct overall. It is calculated as the
sum of TPs and TNs divided by the sum of TPs, FPs, TNs, and FNs. The formula is:
\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{2} \]
Following accuracy, specificity measures the proportion of actual negative cases that the classifier
recognizes as negative. The formula is:
\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{3} \]
Sensitivity, also known as recall, is the proportion of actual positive cases that are predicted to be
positive. In other words, it measures how often an unhealthy person is predicted to be unhealthy.
The formula is:
\[ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{4} \]
The F1 measure is the harmonic mean of precision and recall, where precision is TP/(TP+FP), the
proportion of predicted positive cases that are actually positive. Its value is 1 for the best performance
and 0 for the worst. The formula is:
\[ F1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{5} \]
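The formulas (2) to (5) can be checked with a small worked example. The sketch below computes the four measures from confusion-matrix counts; the counts are invented for illustration, and returning 0.0 when a denominator is zero is a convention assumed here, not something prescribed by the surveyed studies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, specificity, sensitivity, precision, and F1 from counts."""
    def ratio(numerator, denominator):
        # Assumed convention: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    accuracy = ratio(tp + tn, tp + fp + tn + fn)   # Eq. (2)
    specificity = ratio(tn, tn + fp)               # Eq. (3)
    sensitivity = ratio(tp, tp + fn)               # Eq. (4), also called recall
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)  # Eq. (5)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts for 100 patients.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For these counts the accuracy is 85/100 = 0.85, while specificity (40/45) and sensitivity (45/55) differ noticeably, which is why accuracy alone can hide an imbalance between false positives and false negatives.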
5. RESULT ANALYSIS
In this review, traditional classifiers and boosting classifiers applied to public datasets were studied.
RF, DT, and KNN outperform state-of-the-art techniques, with an accuracy of 100% achieved using the KNN
algorithm. The performance evaluation results from the literature are shown in Figures 5(a)-5(c). All
three classifiers achieved accuracies above 85%.
With regard to the articles examined in Table 2, seven entries (Sr. No. 1, 2, 7, 13, 16, 27, and 28)
exhibit the best results for RF, and their authors used 5, 6, 6, 12, 7, 4, and 5 studies, respectively, for
comparison. Five entries (Sr. No. 3, 5, 8, 20, and 25) showed the best results for DT, and their authors used
4, 7, 5, 34, and 6 studies, respectively, for comparison. Three entries (Sr. No. 4, 6, and 19) showed the best
results for KNN, where 2, 29, and 6 studies were compared, respectively. Figure 5 shows the performance
comparison of different CAD prediction models: Figure 5(a) shows the accuracy achieved by RF, Figure 5(b)
the accuracy achieved by DT, and Figure 5(c) the accuracy achieved by KNN. With RF, the maximum accuracy,
92.90%, is attained by Sr. No. 2. With DT, Sr. No. 3 showed 99.2% accuracy. With KNN, Sr. No. 4 attained
100% accuracy, and KNN proved to be one of the best classifiers in CAD prediction.
5.1. Challenges and research directions
ML is frequently employed to solve categorization problems in the healthcare sector. Our study of
ML algorithms opens up several research issues, especially in healthcare. We observe that ML research is
actively taking place in the field of cardiology, with some intriguing proofs-of-concept and proprietary
solutions being developed by the research community and the healthcare industry, respectively.
In general, the nature and quality of the data, in addition to the quality of the learning algorithms, will
determine the success and efficiency of an ML-based solution. It is challenging to gather real-world data in
the pertinent fields, such as agriculture, IoT, and healthcare. Therefore, a more thorough investigation of
data collection is required. It is also challenging to accurately pre-process data collected from different
sources. Therefore, to employ the learning algorithms effectively, it is necessary to modify or extend
existing pre-processing methods or to propose new data preparation strategies. Hybrid learning models, for
instance ensembles of methods, the modification or refinement of current learning approaches, or the
construction of new learning methods, could be potential future work in the classification and prediction
of heart-related disease.
Figure 5. Accuracy of CAD prediction models: (a) accuracy of random forest classifier, (b) accuracy of
decision tree classifier, and (c) accuracy of k-nearest neighbor classifier
6. CONCLUSION
ML algorithms have enormous potential in predicting heart-related ailments. Each of the
aforementioned algorithms performed excellently in certain cases while performing poorly in others. To
examine this, we searched extensively across a number of search engines and databases. According to our
review, the most significant research on CAD diagnosis using ML algorithms was done from 2006 to 2022.
The outcomes also show that the most popular CAD detection methods are RF, DT, KNN, SVM, ANN, fuzzy logic,
and NB. Because of the inherent variety of datasets and ML algorithms, different performance metrics have
been documented. According to the results, RF, DT, and KNN have the highest accuracy levels for the majority
of the CAD datasets. Additionally, hybridizing classifiers and feature selection can enhance performance for
precise CAD diagnosis. With the same methodology and dataset, the studies can be extended to other
heart-related diseases. With the proper utilization of real-time datasets, the studies can be improved.
APPENDIX
Table 2. The list of articles published using ML for CAD diagnosis
| Sr. No | Author | Year | Techniques | Conclusion | Dataset | Observations |
|---|---|---|---|---|---|---|
| 1 | Muhammad et al. [11] | 2021 | LR, SVM, KNN, RF, NB, GB | RF; Accuracy = 92.04% | General Hospital, Nigeria | Results from various ML algorithms were obtained, and they were compared. |
| 2 | Yilmaz et al. [12] | 2022 | RF, SVM, LR | RF; Accuracy = 92.9% | IEEE DataPort | Three traditional ML approaches were employed, and RF is used as a best classifier for the prediction of CHD. |
| 3 | Soni et al. [13] | 2011 | DT with GA, NB, Classification via clustering | DT with GA; Accuracy = 99.2% | NR | With GA, the optimal subset of features is retrieved, and DT, Bayesian classification shows improvement in results. |
| 4 | Jabbar et al. [14] | 2013 | KNN, J48, NB | KNN; Accuracy = 100% | UCI Repository | Optimal feature selection method, along with KNN classifier is compared with other datasets. Classifier hybridization with feature selection can best choose the features, enhancing accuracy. |
| 5 | Verma et al. [8] | 2017 | DT, MLP, NB tree | C4.5; Accuracy = 97.6%; Sensitivity = 97.5%; Specificity = 97.6% | Indira Gandhi Medical College, Shimla, India | Three predictive data mining approaches were compared. |
| 6 | Acharya et al. [15] | 2018 | DT, KNN | KNN; Accuracy = 99.55%; Sensitivity = 99.93%; Specificity = 99.24% | St.-Petersburg Institute of Cardiological Technic (ECG) | With the selected 20 features from the ECG signal, KNN achieved the highest accuracy from contourlet coefficients. |
| 7 | Joloudari et al. [16] | 2020 | SVM, CHAID, C5.0, and RT | RT; Accuracy = 91.47% | Z-Alizadeh Sani heart disease dataset | Different classifiers were experimented and RT reports best in Accuracy, (area under the curve) AUC, Gini. The most important rules were extracted using the random trees model for CAD diagnosis. |
| 8 | Muhammad and Algehyne [10] | 2021 | C4.5 Fuzzy Expert system | Fuzzy Logics; Accuracy = 94.55%; Sensitivity = 95.35%; Specificity = 95.0% | General Hospital, Nigeria | The C4.5 data mining algorithm is used to incorporate human expertise into the knowledge base of the system. |
| 9 | Ali et al. [17] | 2020 | OCI-DBN, ANN, DNN | OCI-DBN using SGA; Accuracy = 94.61%; Sensitivity = 96.03%; Specificity = 93.15% | Cleveland heart disease dataset | Ruzzo-Tompa method eliminates features that less contributes to system performance. Stacked genetic algorithm is used to build the best-configured DBN. |
| 10 | Dixit and Kala [18] | 2021 | RF, GB, Deep Learning models (1D CNN) | 1D CNN; Accuracy = 93.0% | Swaroop Rani Nehru Hospital-Allahabad | Oversampling with 1D CNN and voting strategy emerges as the suitable classification technique with 93% accuracy. |
Table 2. The list of articles published using ML for CAD diagnosis (continued)
| Sr. No | Author | Year | Techniques | Conclusion | Dataset | Observations |
|---|---|---|---|---|---|---|
| 11 | Kahramanli and Allahverdi [19] | 2008 | ANN, FNN on 2 datasets | HNN (Pima dataset): Accuracy = 84.2%; Sensitivity = 80.3%; Specificity = 87.3%. HNN (Cleveland dataset): Accuracy = 87.4%; Sensitivity = 93.0%; Specificity = 78.5% | Pima Indians diabetes and Cleveland heart disease | The databases cover Cleveland heart disease and Pima Indian diabetes. The categorization accuracy of these datasets was assessed using k-fold cross-validation. |
| 12 | Muthukaruppan et al. [20] | 2012 | Fuzzy expert system | PSO based Fuzzy expert system; Accuracy = 93.27% | UCI Repository | Cleveland and Hungarian Heart Disease dataset from UCI repository. DT was employed to identify the characteristics that influence the diagnosis. To fine-tune the fuzzy membership functions, PSO was used. The optimized MFs yielded the highest accuracy. |
| 13 | Jinny and Mate [21] | 2021 | DT, RF, AdaBoost, GNB, LR, KNN, GB, XGB | RF; Accuracy = 90.7% | Framingham’s Dataset | Performance comparison of both conventional ML methods and cutting-edge Gradient Boosting techniques, with feature selection and without feature selection, is made. |
| 14 | Tiwari et al. [22] | 2022 | ET, RF, XG, RF, MLP, KNN, XGB, SVM, SGD, AdaBoost, CART, GB, NB | Stacked ensemble classifier; Accuracy = 92.34% | IEEE Data Port | Hungarian, Cleveland, Long Beach VA, Switzerland, and Statlog datasets were combined into one dataset. The stacked ensemble classifier achieved the highest accuracy, 92.34%, compared with the other classifiers. |
| 15 | Tarawneh and Embarak [23] | 2019 | NB, SVM, KNN, NN, J4.8M, RF, GA | NB, SVM; Accuracy = 89.2% | Cleveland heart disease dataset | A variety of ML classifiers were deployed on Cleveland datasets to predict cardiac disease, and NB and SVM show improvement in accuracy when compared to other classifiers. |
| 16 | Mohan et al. [24] | 2019 | NB, LR, DL, DT, RF, GB Tree, SVM, VOTE, HRFLM | HRFLM; Accuracy = 88.4%; Sensitivity = 92.8%; Specificity = 82.6% | UCI Repository | With the DT features, the dataset has been clustered. The classifiers' performance is then estimated by applying them to each clustered dataset. |
| 17 | Li et al. [25] | 2020 | LR, KNN, ANN, SVM, NB, DT | FCMIM – SVM; Accuracy = 92.37%; Specificity = 98.0%; Sensitivity = 89.0% | Cleveland heart disease dataset | The feature extraction process used Fast Conditional Mutual Information, which improved prediction accuracy with SVM. |
| 18 | Abdar et al. [26] | 2019 | SVC, nuSVM, LinSVM | N2Genetic-nuSVM; Accuracy = 93.08% | Z-Alizadeh Sani heart disease dataset | Several ML techniques were explored, and three different forms of SVM were deployed, along with feature tuning for improved accuracy. |
| 19 | Shah et al. [27] | 2020 | NB, KNN, DT, RF | KNN; Accuracy = 90.78% | UCI Repository | Four traditional ML approaches were employed, and KNN is used as a classifier for the prediction of CAD. |
| 20 | Ghiasi et al. [28] | 2020 | Bagging, SMO, Bagging SMO, NB, C4.5, J48, SVM, ANN, ANN-GA, CART | CART; Accuracy = 92.41% | Z-Alizadeh Sani heart disease dataset | The important and non-important features were categorized and the CART model for CAD diagnosis was created using those important features. CART SMO and Bagging SMO models, ANN-GA model gives the accurate results but CART reaches the highest accuracy. |
| 21 | Chen et al. [29] | 2011 | ANN | ANN; Accuracy = 80.0% | UCI Repository | A user-friendly ANN based Heart Disease Prediction System (HDPS) is developed, and reliably predicts outcomes. |
Table 2. The list of articles published using ML for CAD diagnosis (continued)
| Sr. No | Author | Year | Techniques | Conclusion | Dataset | Observations |
|---|---|---|---|---|---|---|
| 22 | Wang et al. [30] | 2020 | LR, RF, GNB, SVC, DT, KNN, ADB, GB, MLP, XGB, Stacking model | Stacking model; Accuracy = 95.43%; Sensitivity = 95.84%; Specificity = 94.44% | Z-Alizadeh Sani heart disease dataset | The stacking model is tested on three distinct datasets and its performance is compared to that of other classifiers. The best combining classifiers are then discovered using the enumeration approach. |
| 23 | Sayadi et al. [31] | 2022 | DT, DL, LR, RF, SVM, XGB | LR, SVM; Accuracy = 95.08% | Z-Alizadeh Sani heart disease dataset | With the features selected from Pearson feature selection, six traditional ML approaches were compared. LR and SVM reached the highest accuracy level for CAD detection. |
| 24 | Verma et al. [32] | 2016 | MLP, MLR, FURIA, C4.5 | MLR; Accuracy = 88.4% | Indira Gandhi Medical College (IGMC), Shimla, India | Dimension reduction using correlation-based feature subset selection using PSO, and data clustering to spot cluster data points that were improperly assigned. Finally, C4.5, MLP, MLR, and FURIA were used to build the hybrid model. |
| 25 | Idris et al. [33] | 2020 | LR, NN, kNN, DT, NB, SVM, DL, Vote | NN with Embedded DT features; Accuracy = 94.5% | Malaysian National Cardiovascular Disease Database (NCVD)–ACS registry | With the significant features selected from suitable feature selection methods, a prediction model was built and its performance metrics were compared. |
| 26 | Marateb and Goudarzi [34] | 2015 | Fuzzy expert system (MLR+NFC) | Fuzzy rule-based system (MLR+NFC); Accuracy = 84.0% | Cleveland heart disease dataset | The features are selected with MLR and integrated with the Neuro-Fuzzy classifier for the prediction of heart diseases. |
| 27 | Rani et al. [35] | 2021 | NB, SVM, LR, RF, AdaBoost | RF; Accuracy = 86.6% | Cleveland heart disease dataset | A hybrid technique that included GA and RFE was applied for feature selection. Missing values in the dataset were treated using Multiple Imputation by Chained Equations algorithm. RF provided the highest performance in combination with MICE, GARFE, Scaling and SMOTE. |
| 28 | Rajdhan et al. [36] | 2020 | DT, LR, RF, NB | RF; Accuracy = 90.16% | Cleveland heart disease dataset | When the accuracy of multiple ML algorithms was compared, RF performed well. |
| 29 | Pal et al. [37] | 2012 | ANN, ID3, CART, Fuzzy | Fuzzy Logics; Accuracy = 84.20%; Sensitivity = 95.85%; Specificity = 83.33% | NR | The risk factors for developing CAD, approaches for gathering and representing knowledge, a strategy for organizing rules, fuzzifying clinical parameters, and defuzzification to crisp value are all described in this study. |
| 30 | Kanagarathinam et al. [38] | 2022 | NB, XGB, KNN, SVM, MLP, CatBoost | CatBoost classifier; Accuracy = 94.34% | Sathvi dataset | Sathvi dataset is created by combining the 4 CVD datasets. The dataset has 531 instances and 12 attributes with absence of missing data. |
*NR = Not reported
REFERENCES
[1] S. Pouriyeh, S. Vahid, G. Sannino, G. De Pietro, H. Arabnia, and J. Gutierrez, “A comprehensive investigation and comparison of
machine learning techniques in the domain of heart disease,” in Proceedings - IEEE Symposium on Computers and
Communications, Jul. 2017, pp. 204–207, doi: 10.1109/ISCC.2017.8024530.
[2] C. W. Tsao et al., “Heart disease and stroke statistics-2022 update: A report from the American Heart Association,” Circulation,
vol. 145, no. 8, pp. E153–E639, 2022, doi: 10.1161/CIR.0000000000001052.
[3] C. Abbafati et al., “Global burden of 87 risk factors in 204 countries and territories, 1990–2019: a systematic analysis for the Global
Burden of Disease Study 2019,” The Lancet, vol. 396, no. 10258, pp. 1223–1249, 2020, doi: 10.1016/S0140-6736(20)30752-2.
[4] P. Libby and P. Theroux, “Pathophysiology of coronary artery disease,” Circulation, vol. 111, no. 25, pp. 3481–3488, Jun. 2005,
doi: 10.1161/CIRCULATIONAHA.105.537878.
[5] D. Lapp, “Heart disease dataset,” Kaggle, pp. 1–6, 2020, [Online]. Available:
https://ieee-dataport.org/open-access/heart-disease-dataset-comprehensive.
[6] A. Frank and A. Asuncion, “UCI machine learning repository,” UCI, 2010, [Online]. Available: http://archive.ics.uci.edu/ml.
[7] R. Alizadehsani et al., “A data mining approach for diagnosis of coronary artery disease,” Computer Methods and Programs in
Biomedicine, vol. 111, no. 1, pp. 52–61, Jul. 2013, doi: 10.1016/j.cmpb.2013.03.004.
[8] L. Verma, S. Srivastava, and P. C. Negi, “An intelligent noninvasive model for coronary artery disease detection,” Complex &
Intelligent Systems, vol. 4, no. 1, pp. 11–18, Jul. 2018, doi: 10.1007/s40747-017-0048-6.
[9] A. A. Haruna, L. J. Muhammad, B. Z. Yahaya, E. J. Garba, N. D. Oye, and L. T. Jung, “An improved C4.5 data mining driven
algorithm for the diagnosis of coronary artery disease,” in Proceeding of 2019 International Conference on Digitization:
Landscaping Artificial Intelligence, ICD 2019, Nov. 2019, pp. 48–52, doi: 10.1109/ICD47981.2019.9105844.
[10] L. J. Muhammad and E. A. Algehyne, “Fuzzy based expert system for diagnosis of coronary artery disease in nigeria,” Health and
Technology, vol. 11, no. 2, pp. 319–329, Feb. 2021, doi: 10.1007/s12553-021-00531-z.
[11] L. J. Muhammad, I. Al-Shourbaji, A. A. Haruna, I. A. Mohammed, A. Ahmad, and M. B. Jibrin, “Machine learning predictive
models for coronary artery disease,” SN Computer Science, vol. 2, no. 5, Jun. 2021, doi: 10.1007/s42979-021-00731-4.
[12] R. Yilmaz and F. H. Yagin, “Early detection of coronary heart disease based on machine learning methods,” Medical Records, vol.
4, no. 1, pp. 1–6, Jan. 2022, doi: 10.37990/medr.1011924.
[13] J. Soni, U. Ansari, D. Sharma, and S. Soni, “Predictive data mining for medical diagnosis: An overview of heart disease prediction,”
International Journal of Computer Applications, vol. 17, no. 8, pp. 43–48, Mar. 2011, doi: 10.5120/2237-2860.
[14] M. A. Jabbar, B. L. Deekshatulu, and P. Chandra, “Heart disease classification using nearest neighbor classifier with feature subset
selection,” Annals Computer Science Series, vol. XI, no. 1, pp. 47–54, 2013, [Online]. Available:
http://www.anale-informatica.tibiscus.ro/download/lucrari/11-1-06-Jabbar.pdf.
[15] U. R. Acharya et al., “Automated characterization of coronary artery disease, myocardial infarction, and congestive heart failure
using contourlet and shearlet transforms of electrocardiogram signal,” Knowledge-Based Systems, vol. 132, pp. 156–166, Sep. 2017,
doi: 10.1016/j.knosys.2017.06.026.
[16] J. H. Joloudari et al., “Coronary artery disease diagnosis; ranking the significant features using a random trees model,” International
Journal of Environmental Research and Public Health, vol. 17, no. 3, p. 731, Jan. 2020, doi: 10.3390/ijerph17030731.
[17] S. A. Ali et al., “An optimally configured and improved deep belief network (OCI-DBN) approach for heart disease prediction
based on ruzzo-tompa and stacked genetic algorithm,” IEEE Access, vol. 8, pp. 65947–65958, 2020, doi:
10.1109/ACCESS.2020.2985646.
[18] S. Dixit and R. Kala, “Early detection of heart diseases using a low-cost compact ECG sensor,” Multimedia Tools and Applications,
vol. 80, no. 21–23, pp. 32615–32637, Aug. 2021, doi: 10.1007/s11042-021-11083-9.
[19] H. Kahramanli and N. Allahverdi, “Design of a hybrid system for the diabetes and heart diseases,” Expert Systems with Applications,
vol. 35, no. 1–2, pp. 82–89, Jul. 2008, doi: 10.1016/j.eswa.2007.06.004.
[20] S. Muthukaruppan and M. J. Er, “A hybrid particle swarm optimization based fuzzy expert system for the diagnosis of coronary
artery disease,” Expert Systems with Applications, vol. 39, no. 14, pp. 11657–11665, Oct. 2012, doi: 10.1016/j.eswa.2012.04.036.
[21] P. R. L, S. V. Jinny, and Y. V. Mate, “Early prediction model for coronary heart disease using genetic algorithms, hyper-parameter
optimization and machine learning techniques,” Health and Technology, vol. 11, no. 1, pp. 63–73, Nov. 2021, doi: 10.1007/s12553-
020-00508-4.
[22] A. Tiwari, A. Chugh, and A. Sharma, “Ensemble framework for cardiovascular disease prediction,” Computers in Biology and
Medicine, vol. 146, p. 105624, Jul. 2022, doi: 10.1016/j.compbiomed.2022.105624.
[23] M. Tarawneh and O. Embarak, “Hybrid approach for heart disease prediction using data mining techniques,” Lecture Notes on Data
Engineering and Communications Technologies, vol. 29, pp. 447–454, 2019, doi: 10.1007/978-3-030-12839-5_41.
[24] S. Mohan, C. Thirumalai, and G. Srivastava, “Effective heart disease prediction using hybrid machine learning techniques,” IEEE
Access, vol. 7, pp. 81542–81554, 2019, doi: 10.1109/ACCESS.2019.2923707.
[25] J. P. Li, A. U. Haq, S. U. Din, J. Khan, A. Khan, and A. Saboor, “Heart disease identification method using machine learning
classification in e-healthcare,” IEEE Access, vol. 8, pp. 107562–107582, 2020, doi: 10.1109/ACCESS.2020.3001149.
[26] M. Abdar, W. Książek, U. R. Acharya, R. S. Tan, V. Makarenkov, and P. Pławiak, “A new machine learning technique for an
accurate diagnosis of coronary artery disease,” Computer Methods and Programs in Biomedicine, vol. 179, p. 104992, Oct. 2019,
doi: 10.1016/j.cmpb.2019.104992.
[27] D. Shah, S. Patel, and S. K. Bharti, “Heart disease prediction using machine learning techniques,” SN Computer Science, vol. 1, no.
6, Oct. 2020, doi: 10.1007/s42979-020-00365-y.
[28] M. M. Ghiasi, S. Zendehboudi, and A. A. Mohsenipour, “Decision tree-based diagnosis of coronary artery disease: CART model,”
Computer Methods and Programs in Biomedicine, vol. 192, 2020, doi: 10.1016/j.cmpb.2020.105400.
[29] A. H. Chen, S. Y. Huang, P. S. Hong, C. H. Cheng, and E. J. Lin, “HDPS: Heart disease prediction system,” Computing in
Cardiology, vol. 38, pp. 557–560, 2011.
[30] J. Wang et al., “A stacking-based model for non-invasive detection of coronary heart disease,” IEEE Access, vol. 8, pp. 37124–
37133, 2020, doi: 10.1109/ACCESS.2020.2975377.
[31] M. Sayadi, V. Varadarajan, F. Sadoughi, S. Chopannejad, and M. Langarizadeh, “A machine learning model for detection of
coronary artery disease using noninvasive clinical parameters,” Life, vol. 12, no. 11, p. 1933, Nov. 2022, doi: 10.3390/life12111933.
[32] L. Verma, S. Srivastava, and P. C. Negi, “A hybrid data mining model to predict coronary artery disease cases using non-invasive
clinical data,” Journal of Medical Systems, vol. 40, no. 7, Jun. 2016, doi: 10.1007/s10916-016-0536-z.
[33] N. Md Idris, Y. K. Chiam, K. D. Varathan, W. A. Wan Ahmad, K. H. Chee, and Y. M. Liew, “Feature selection and risk prediction
for patients with coronary artery disease using data mining,” Medical and Biological Engineering and Computing, vol. 58, no. 12,
pp. 3123–3140, Nov. 2020, doi: 10.1007/s11517-020-02268-9.
[34] H. R. Marateb and S. Goudarzi, “A noninvasive method for coronary artery diseases diagnosis using a clinically-interpretable fuzzy
rule-based system,” Journal of Research in Medical Sciences, vol. 20, no. 3, pp. 214–223, 2015.
[35] P. Rani, R. Kumar, N. M. O. S. Ahmed, and A. Jain, “A decision support system for heart disease prediction based upon machine
learning,” Journal of Reliable Intelligent Environments, vol. 7, no. 3, pp. 263–275, Jan. 2021, doi: 10.1007/s40860-021-00133-6.
[36] A. Rajdhan, A. Agarwal, M. Sai, D. Ravi, and P. Ghuli, “Heart Disease Prediction using Machine Learning,” International Journal
of Engineering Research and, vol. V9, no. 04, May 2020, doi: 10.17577/IJERTV9IS040614.
[37] D. Pal, K. M. Mandana, S. Pal, D. Sarkar, and C. Chakraborty, “Fuzzy expert system approach for coronary artery disease screening
using clinical parameters,” Knowledge-Based Systems, vol. 36, pp. 162–174, Dec. 2012, doi: 10.1016/j.knosys.2012.06.013.
[38] K. Kanagarathinam, D. Sankaran, and R. Manikandan, “Machine learning-based risk prediction model for cardiovascular disease
using a hybrid dataset,” Data and Knowledge Engineering, vol. 140, p. 102042, Jul. 2022, doi: 10.1016/j.datak.2022.102042.
[39] A. Garavand, C. Salehnasab, A. Behmanesh, N. Aslani, A. H. Zadeh, and M. Ghaderzadeh, “Efficient model for coronary artery
disease diagnosis: a comparative study of several machine learning algorithms,” Journal of Healthcare Engineering, vol. 2022, pp.
1–9, Oct. 2022, doi: 10.1155/2022/5359540.
[40] H. G. Lee, K. Y. Noh, and K. H. Ryu, “A data mining approach for coronary heart disease prediction using HRV features and
carotid arterial wall thickness,” in BioMedical Engineering and Informatics: New Development and the Future - Proceedings of the
1st International Conference on BioMedical Engineering and Informatics, BMEI 2008, May 2008, vol. 1, pp. 200–206, doi:
10.1109/BMEI.2008.189.
[41] C. M. Chu et al., “A Bayesian expert system for clinical detecting coronary artery disease,” Journal of Medical Sciences, vol. 29,
no. 4, pp. 187–194, 2009.
[42] M. A. Karaolis, J. A. Moutiris, D. Hadjipanayi, and C. S. Pattichis, “Assessment of the risk factors of coronary heart events based
on data mining with decision trees,” IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 3, pp. 559–566,
May 2010, doi: 10.1109/TITB.2009.2038906.
BIOGRAPHIES OF AUTHORS
Anu Ragavi Vijayaraj is a full-time research scholar in the school of Computer
Science and Engineering at VIT, Chennai, India. She received her B.Tech. in Information
Technology from Kongu Engineering College, Perundurai (India) in 2013 and M.E. in
Computer and Communication Engineering from Kongu Engineering College, Perundurai
(India) in 2015. Her research interests include machine learning, deep learning, and artificial
intelligence. She can be contacted at email: anuragaviraj@gmail.com.
Subbulakshmi Pasupathi is an Assistant Professor in the school of Computer
Science and Engineering at VIT, Chennai, India. She received her B.Tech. in Information
Technology in 2009, M.E. in Computer Science and Engineering in 2011 and Ph.D. from
Anna University, Chennai (India) in 2019. She is the author of many scientific publications
in international journals and conferences. Her current research interests include cognitive
networks, artificial intelligence, and machine learning. She can be contacted at email:
subbulakshmi.p@vit.ac.in.

PDF
Hybrid horned lizard optimization algorithm-aquila optimizer for DC motor
DOCX
search engine optimization ppt fir known well about this
PDF
STKI Israel Market Study 2025 version august
PDF
Five Habits of High-Impact Board Members
Enhancing emotion recognition model for a student engagement use case through...
Hindi spoken digit analysis for native and non-native speakers
Zenith AI: Advanced Artificial Intelligence
MicrosoftCybserSecurityReferenceArchitecture-April-2025.pptx
A proposed approach for plagiarism detection in Myanmar Unicode text
Microsoft Excel 365/2024 Beginner's training
sustainability-14-14877-v2.pddhzftheheeeee
sbt 2.0: go big (Scala Days 2025 edition)
A review of recent deep learning applications in wood surface defect identifi...
A comparative study of natural language inference in Swahili using monolingua...
OpenACC and Open Hackathons Monthly Highlights July 2025
UiPath Agentic Automation session 1: RPA to Agents
Module 1.ppt Iot fundamentals and Architecture
Credit Without Borders: AI and Financial Inclusion in Bangladesh
Produktkatalog für HOBO Datenlogger, Wetterstationen, Sensoren, Software und ...
A Late Bloomer's Guide to GenAI: Ethics, Bias, and Effective Prompting - Boha...
Hybrid horned lizard optimization algorithm-aquila optimizer for DC motor
search engine optimization ppt fir known well about this
STKI Israel Market Study 2025 version august
Five Habits of High-Impact Board Members

Potential directions on coronary artery disease prediction using machine learning algorithms: A survey

  • 1. IAES International Journal of Artificial Intelligence (IJ-AI), Vol. 13, No. 1, March 2024, pp. 658~672. ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i1.pp658-672. Journal homepage: http://ijai.iaescore.com
Potential directions on coronary artery disease prediction using machine learning algorithms: A survey
Anu Ragavi Vijayaraj, Subbulakshmi Pasupathi
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
Article history: Received Feb 9, 2023; Revised May 25, 2023; Accepted Jun 3, 2023
ABSTRACT
Coronary artery disease (CAD) is the most widespread and prominent cause of death. The rise in mortality is attributed to lifestyle variables including unhealthy diet, use of tobacco and drugs, physical inactivity, and environmental pollution. Traditional screening tests including computed tomography, angiography, electrocardiography, and magnetic resonance imaging are employed for diagnosis and require considerable manpower. Machine learning (ML) has been utilized in healthcare to create early predictions from massive volumes of data. The Scopus and Web of Science databases were exhaustively searched using a search strategy that comprised CAD prediction, cardiac illness detection, and heart disease categorization. After applying the inclusion and exclusion criteria to the 99 articles obtained, 30 articles made up the study population. This review offers an organized look at published ML-based CAD detection and classification models that include clinical variables. ML can produce strong results in CAD detection, as evidenced by the random forest, decision tree, and k-nearest-neighbour classifiers, which reach accuracies above 90%. The use of ML in CAD diagnosis lowers false-positive and false-negative errors, and presents a special opportunity by providing patients with quick and affordable diagnostic services.
Keywords: Classification; Clinical variables; Coronary artery disease; Machine learning
This is an open access article under the CC BY-SA license.
Corresponding Author: Subbulakshmi Pasupathi, School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India. Email: subbulakshmi.p@vit.ac.in
ABBREVIATIONS
LR : Logistic regression
DNN : Deep neural network
SVM : Support vector machine
FNN : Fuzzy neural network
KNN : K nearest neighbor
ADB : AdaBoost
DT : Decision tree
XGB : Extreme gradient boosting
RF : Random forest
SMO : Sequential minimal optimization
RT : Random tree
MLR : Multinomial logistic regression model
NB : Naïve Bayes
SVC : Support vector classifier
GB : Gradient boosting
PSO : Particle swarm optimization
GNB : Gaussian Naive Bayes
CHAID : Chi-squared automatic interaction detection
NN : Neural network
NFC : Neuro fuzzy classifier
GA : Genetic algorithm
CART : Classification and regression trees
MLP : Multi layer perceptron
CNN : Convolutional neural networks
ANN : Artificial neural network
  • 2.
1. INTRODUCTION
People worry about their hectic day-to-day schedules and turn to drugs and tobacco as stress relievers, and obesity often develops in parallel. These factors build up into serious life-threatening disorders such as heart disease, cancer, tuberculosis, and many more. The most challenging task is to predict them before they get worse. Cardiovascular diseases (CVD) are one such life-threatening group of diseases, accounting for 31.8% of all global deaths according to recent World Health Organisation (WHO) statistics [1], as depicted in Figure 1. A rise in the mortality rate can be avoided if the disease is predicted in advance and proper lifestyle choices are made. There are four main types of CVD. A stroke, also known as a brain attack, happens when blood supply to a portion of the brain is cut off or when a blood vessel in the brain bursts. Peripheral artery disease (PAD) is a common disorder characterized by constricted arteries that restrict blood flow to the arms or legs. Heart valve disorders include aortic disease: aortic valve dysfunction results in improper operation of the valve between the left ventricle (the lower left heart chamber) and the aorta, the main artery to the body. The last type is coronary heart disease, commonly referred to as coronary artery disease (CAD) or ischemic heart disease (IHD), which is caused by an accumulation of plaque deposits in the arteries that obstructs the heart's blood flow.
Figure 1. Global causes of death [1]
According to the American Heart Association, IHD was predicted to affect 244.1 million people worldwide in 2020, and it was more common in men than in women (141.0 and 103.1 million people, respectively) [2]. The small arteries in the heart muscle are affected by coronary microvascular disease, another type of heart disease.
Coronary microvascular disease is more prevalent in women. Although doctors have not provided a definitive explanation for why CAD develops, risk factors are a significant contributing element. They include obesity, use of drugs and tobacco, high cholesterol, and family history. In addition to these risk variables, environmental pollution is a foremost contributor to the development of disease: small particles from polluted air can affect the heart and blood vessels. According to the global burden of disease (GBD) report, pollution caused 9 million deaths worldwide in 2019, with cardiovascular disease, including IHD (31.7%) and stroke (27.7%), accounting for 61.9% of all fatalities. This data underscores the substantial impact of pollution on mortality worldwide, particularly its strong association with heart-related health issues. Figure 2 shows the impact of pollution on cardiovascular disease: Figure 2(a) displays the age-standardized number of fine particulate matter (PM2.5) related deaths per 100,000 people in 2019, Figure 2(b) displays the total number of noncommunicable disease-related deaths caused by pollution worldwide in 2019, Figure 2(c) displays the annual mean population-weighted PM2.5 concentrations in China, India, and the United States from 1990 through 2019, and Figure 2(d) displays the model for the exposure-response relationship between CVD and PM2.5 air pollution in a 50-year-old person. The three main coronary arteries are the right coronary artery (RCA), the left circumflex artery (LCX), and the left anterior descending artery (LAD). CAD is caused when blood flow to the heart stops partially or completely because of plaque that develops in the arteries. The plaque narrows the arteries, blocking the flow of oxygen-rich blood to the heart, as shown in Figure 3. The reduced oxygen supply to the heart causes chest pain, shortness of breath, and heart attack.
  • 3.
Figure 2. Impact of pollution on cardiovascular disease [3]: (a) worldwide mortality from cardiovascular disease associated with air pollution, (b) deaths caused by global pollution, (c) annual mean PM2.5 pollution levels (1990-2019), and (d) cardiovascular disease and PM2.5 pollution: an exposure-response connection
Figure 3. Coronary artery disease [4]
Medical experts recommend a variety of expensive and technically complex tests, such as electrocardiography (ECG), angiography, computed tomography, and magnetic resonance imaging, to diagnose individuals with positive signs of CAD. Currently, these examinations are costly and require skilled professionals. To replace these standard tests, researchers are striving to develop a less expensive yet equally effective test. Machine learning (ML) and data mining methods are widely used in the analysis and extraction of information from medical data. ML-based techniques are effective in a variety of disciplines, including agriculture, credit card fraud detection, and speech recognition, and are frequently recommended for predicting heart disease because they extract accurate information efficiently from large datasets, making predictions straightforward. The main strengths of ML are its ability to manage large volumes of data, its fast processing speed, and its capacity to generate predictions in the early phases of disease development.
  • 4.
Two different types of studies can be found in the literature. Some research employed clinical indicators to categorize CAD patients, including age, blood pressure, and smoking history, while other studies used signal recordings such as electrocardiography (ECG), photoplethysmography (PPG), and phonocardiography (PCG) to identify CAD symptoms. To direct the evaluation of future works, our study concentrates on the workflow based on clinical factors. The remainder of the paper is structured as follows: section 2 highlights ML for CAD diagnosis. Sections 3 and 4 discuss data collection and classification. Finally, sections 5 and 6 present the results and conclusion.
2. ML FOR CAD DIAGNOSIS
Artificial intelligence (AI) has a substantial impact on the diagnosis of heart disease through the analysis of medical data. AI facilitates accurate forecasting of cardiovascular outcomes and the non-invasive identification of CAD. A branch of AI called machine learning (ML) enables computers to learn and develop without being explicitly programmed, with little or no human intervention. Both scientists and medical professionals are looking for affordable, precise, and quick CAD diagnosis and treatment options. To help researchers better address numerous difficulties in their future work, this review paper highlights the strongest findings of earlier studies. This study addresses:
− General flow diagram of the prediction model
− Data acquisition-CAD prediction
− Review of the articles on CAD prediction using machine learning algorithms (background study)
2.1. General flow diagram of the prediction model
Figure 4 is the general heart disease prediction model. The patient details, including age, cholesterol, sugar, and blood pressure, are maintained in the hospital's patient database.
From the patient database, the data is collected and handed to the next phase for pre-processing. With the preprocessed data, the best classification method is identified. In this layer, the model is also trained using the processed dataset and prepared for the next layer. In the final phase, the trained model is tested, after the dataset has been divided into training and testing data. The testing part of the dataset is used to assess the algorithm's classification accuracy.
Figure 4. Heart disease prediction model
2.2. Dataset
2.2.1. Kaggle database
The Kaggle heart disease dataset, created in 1988, is made up of four databases: Cleveland, Hungary, Switzerland, and Long Beach V. Although it has 76 features, including the predicted attribute, only 14 of them are used in all reported studies. The "target" field describes the patient's cardiac status. It takes the integer value 0 to indicate no disease and 1 to indicate disease [5].
2.2.2. UCI database
The UCI heart disease databases include Cleveland with 303 instances, Hungarian with 294 instances, Switzerland with 123 instances, and Long Beach VA with 200 instances. There are 76 features in the UCI database, and all published experiments employ a subset of 14 of them [6]. Cleveland database in
  • 5. particular is commonly employed by ML investigators. The "goal" attribute indicates whether the patient has heart disease, with integer values ranging from zero (no presence) to four. Experiments carried out with the Cleveland database have focused mainly on distinguishing between the disease's presence (values 1, 2, 3, 4) and absence (value 0).
2.2.3. Z-Alizadeh Sani database
The 303 patient records that make up the Z-Alizadeh Sani dataset have 56 features each. The characteristics are divided into four categories: demographic, examination and symptom, ECG, and laboratory and echo features. Every patient is classified as either CAD or normal. If a patient's diameter narrowing exceeds 50%, they are classified as CAD; otherwise, they are considered normal [7].
2.2.4. Indira Gandhi Medical College database, Shimla, India
The dataset collected in real time from Indira Gandhi Medical College contains 335 patient instances. All of the participants agreed to undergo angiography after CAD was suspected. Each patient had 27 characteristics, including historical, demographic, and laboratory information [8].
2.2.5. General hospital, Nigeria database
The dataset includes 506 instances with 12 features from two general hospitals in Kano, Nigeria. Eleven clinical risk factors for CAD are present in the dataset, along with one demographic component. The "goal" variable is a binary class with integer values 0 for no disease and 1 for disease [9], [10].
2.2.6. Extension of Z-Alizadeh Sani dataset
This dataset is Z-Alizadeh Sani's extended version.
The main difference between the Z-Alizadeh Sani dataset and its extension is that the extended dataset supports not only the classification of CAD but also the prediction of stenosis of the arteries, through 3 features (LAD, LCX, and RCA) added to the existing dataset [7]. The list of datasets utilized for CAD diagnosis by the researchers and authors is given in Table 1.
Table 1. The list of datasets for CAD diagnosis
Dataset No.   Dataset name                                             Sample No.   Input features: Features No.   Input features: Stenosis
1             Heart disease dataset-Kaggle                             1026         14
2             UCI repository                                           920          14
3             Z-Alizadeh Sani                                          303          56
4             Extension of Z-Alizadeh Sani                             303          59                             ✓
5             Indira Gandhi Medical College Database, Shimla, India    335          25
6             General hospital, Nigeria                                506          18
2.3. Review of the articles on CAD prediction using machine learning algorithms (background study)
Most CAD detection systems employ supervised ML techniques. There have also been a few reports of clustering algorithms being used for CAD diagnosis. The list of articles published using ML for CAD diagnosis is given in Table 2 (see appendix). Alizadehsani et al. [7] proposed a CAD diagnosis model that calculates the stenosis of each vessel. The effect of features on the three vessels was evaluated using information gain. ML classifiers including C4.5, NB, and KNN were applied to the new features added to the Alizadeh dataset. C4.5 reached the highest accuracy, with 74.20%, 63.76%, and 68.33% for the LAD, LCX, and RCA vessels, respectively. A study by Garavand et al. [39] compared the efficiency of the ML classifiers MLP, SVM, LR, J48, RF, KNN, and NB in predicting CAD. The most effective algorithms for diagnosing CAD from patient examination data were SVM and RF. According to the survey summarized in Table 2, all the papers used public datasets including the UCI repository and the Z-Alizadeh Sani dataset. Only a few researchers use real-world data stores.
A limitation of the Cleveland dataset is that a few values are missing, so suitable missing-value imputation methods are applied to eliminate missing values from the dataset. Unlike other datasets, the Cleveland dataset is used for multilevel classification problems, where the target variable indicates the level of disease, ranging from 0 to 4. Some studies [40]–[42] on CAD diagnosis achieved 90%, 70%, and 75% accuracy, respectively. However, they did not assess the stenosis of each vessel independently. Researchers can employ the extension of the Z-Alizadeh dataset to study stenosis of the arteries. 37 features of the extension of the Z-Alizadeh dataset were examined, and the stenosis of these arteries was significantly influenced by the features age and typical chest pain. Researchers generally employ performance estimators including accuracy, sensitivity, specificity, and F1 score, as shown in (2), (3), (4), and (5). Table 2 makes it clear that accuracy was the selection criterion used
  • 6. by all authors. As additional selection criteria, [11], [15], [21], [28], [31], [37], [38] used F1 score, sensitivity, and specificity.
3. DATA COLLECTION
Collecting data is the foremost step in the ML pipeline. It is the process of gathering, measuring, and analysing information drawn from a wide range of sources. The information gathered is used to create ML and AI solutions. Data collection includes preprocessing, feature extraction, and feature selection.
3.1. Data preprocessing
Preprocessing is made up of four stages and is done to ensure high-quality data. Cleaning removes noisy and missing values from the dataset so that an accurate result can be produced. Transformation then converts the cleaned data into a different format without changing the contents of the dataset; it involves aggregation, standardization, and smoothing. Integration is the process of merging data from numerous sources into a single database. Reduction structures the gathered data so that it gives relevant findings.
3.2. Dimensionality reduction
Feature extraction is a dimensionality reduction procedure that reduces an initial collection of raw data to more manageable groups for processing. A commonly used linear transformation algorithm is principal component analysis, which looks for mutually orthogonal directions in the feature space that maximize variance. Feature selection (FS) eliminates redundant and irrelevant data, increasing accuracy and providing a better understanding of the model. By choosing the most prominent features, a fast conditional mutual information feature selection technique (FCMIM) improves accuracy [25].
The algorithm is feasible with the SVM classifier for detecting cardiac problems. For features that contribute little to the improvement of the system, Ali et al. [17] proposed an optimally configured and improved deep belief network (OCI-DBN) approach for heart disease prediction, based on a stacked genetic algorithm and a novel feature selection algorithm, Ruzzo-Tompa, which eliminates irrelevant features from the dataset. With the selected features, an optimally configured and improved deep belief network is created, and the accuracy improves up to 94.61%. With the feature selection techniques MLR and sequential feature selection (SFS), the features age, slope, exang, fluoroscopy, and thalach are selected in [34]. NFC with the feature selection method MLR (MLR+NFC) attains an accuracy of 84%, higher than SFS+NFC. With feature selection through PSO [32], attributes are selected from the IGMC, Shimla dataset. The selected attributes include smoking, diabetes mellitus, and high density lipoprotein. These attributes are then used with the classification algorithms, and MLR achieved an accuracy of 84.17% by properly identifying the wrong instances.
4. CLASSIFICATION
4.1. Machine learning algorithms
ML algorithms discover hidden patterns in data, anticipate outcomes, and improve performance based on their own experience. These algorithms and models are intended to learn from data and generate predictions or decisions in the absence of explicit instructions. After the attributes are selected by suitable feature selection techniques, the results of the classifiers on the selected feature subsets are compared.
4.1.1. K-nearest neighbor
Fix and Hodges introduced the KNN rule, a nonparametric technique for classification and regression, in 1951. KNN is a straightforward but efficient classification method for cases where little to no prior knowledge about the distribution of the data is available, since it makes no assumptions about the data.
The strategy entails locating the k data points in the training set that are most similar to the data point whose target value is missing, and assigning to it the average value of those points (for regression) or their majority class (for classification). A single diagnostic method for the prediction of 3 cardiac abnormalities, namely CAD, myocardial infarction, and congestive heart failure, was developed by Acharya et al. [15] using ECG. Additionally, it pinpoints the precise cardiac abnormalities seen in patients during an ECG test, obviating the need for other diagnostic techniques. This non-invasive, cost-effective method can be extended in the near future to detect cardiac abnormalities in the early phase using a single ECG pulse.
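The KNN strategy described above can be illustrated with a minimal sketch. The toy records, the two features (age and resting blood pressure), the Euclidean distance, and the function name `knn_predict` are assumptions made for illustration; they are not taken from any of the surveyed studies.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest points and count their class labels
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy records: (age, resting blood pressure); 1 = disease, 0 = no disease
train_X = [(45, 120), (50, 130), (62, 160), (67, 170), (58, 150)]
train_y = [0, 0, 1, 1, 1]
prediction = knn_predict(train_X, train_y, (60, 155), k=3)
```

In practice the features would usually be standardized first, since an unscaled feature with a large range would otherwise dominate the distance.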
  • 7.
4.1.2. Logistic regression
LR is a supervised method for binary problems such as true/false, yes/no, or pass/fail. The independent variables can be categorical or numerical, whereas the dependent variable is always categorical. Applications of LR include credit scoring, predicting user behavior, and discrete choice analysis. By removing the insignificant features from the Cleveland dataset, the optimal attributes were chosen using MLR and SFS. With the selected attributes, a novel NFC was proposed by Marateb and Goudarzi [34]; the best performance, 84%, is achieved when NFC is integrated with MLR rather than with SFS.
4.1.3. Naïve Bayes
NB uses Bayes' theorem for classification problems. Applications of NB include medical diagnosis, spam filters, and text analysis. Tarawneh and Embarak [23] created a hybrid model for CAD diagnosis with 12 selected features, and different classification algorithms including SVM, KNN, GA, RF, NN, and J4.8 were implemented. NB and SVM show good performance, with an accuracy of 89.2%. The formula for Bayes' theorem is:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{1}$$
where $P(A \mid B)$ is the posterior probability of the class given the predictor, $P(B \mid A)$ is the likelihood, that is, the probability of the predictor given the class, $P(A)$ is the class prior probability, and $P(B)$ is the predictor prior probability.
4.1.4. Decision tree
DT is a supervised approach that solves prediction and classification problems. It is a tree-like structure in which internal nodes indicate a test on an attribute, branches express the test's outcome, and leaf nodes provide the classification label. By following each path from the root to a leaf node, a DT can be easily transformed into a set of rules, and conclusions are reached by applying these rules. Applications include sentiment analysis and product selection.
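The conversion of root-to-leaf paths into rules can be sketched as follows. The tree below is a hypothetical hand-built example; its attribute names, thresholds, and the helper `extract_rules` are illustrative assumptions and do not come from any model in the surveyed papers.

```python
# Hypothetical toy tree; attribute names and thresholds are illustrative only.
tree = {
    "test": "typical_chest_pain == yes",
    "yes": {
        "test": "age > 55",
        "yes": "CAD",
        "no": "Normal",
    },
    "no": "Normal",
}

def extract_rules(node, conditions=()):
    """Walk every root-to-leaf path and return it as an IF ... THEN rule."""
    if isinstance(node, str):  # leaf node: holds the class label
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {node}"]
    rules = []
    # Each branch corresponds to one outcome of the test at this node
    rules += extract_rules(node["yes"], conditions + (node["test"],))
    rules += extract_rules(node["no"], conditions + (f"NOT ({node['test']})",))
    return rules

rules = extract_rules(tree)
for rule in rules:
    print(rule)
```

Each printed line corresponds to one leaf, so the number of rules equals the number of leaves in the tree.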
A fuzzy expert system for the prediction of CAD was developed by Muhammad and Algehyne [10]; it includes knowledge base, inference engine, and defuzzification phases. 87 diagnostic rules are framed in the knowledge base. Instead of traditional methods such as interviews and questionnaires, the upgraded C4.5 is used to incorporate human knowledge into the system's knowledge base. The C4.5-based fuzzy expert system attained an accuracy of 94.55%. With the real-world dataset collected from Indira Gandhi Medical College, Shimla, missing values are handled using a new hybrid k-means cluster centroid-based method, and C4.5, NB Tree, and MLPs are used to predict CAD. Compared with the other predictive models, C4.5 constructed with 25 features yielded the highest accuracy, sensitivity, and specificity [8]. With the Z-Alizadeh dataset, a CART model is created by calculating the feature importance score of the features. A typical feature attains the highest feature score of 16.5%. Features with 0% importance are excluded from classification [28]. Idris et al. [33] created an embedded DT method using RF, with features selected by Gini impurity. 20 features were used for the classification of CAD. The NN model with embedded DT features achieved the highest precision and accuracy, 94.5%. The University of Malaya Medical Centre (UMMC) and subarachnoid hemorrhage (SAH) datasets are compared on the performance metrics.
4.1.5. Random forest
RF is a technique for solving regression and classification problems. As an ensemble approach, a random forest model is built from a number of small DTs, or estimators, each of which provides its own prediction. The random forest model combines the estimators' predictions to obtain a more accurate prediction. The 'forest' in RF is trained using bootstrap aggregation (bagging). Applications include credit card default, stock market prediction, and product recommendation.
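The two ideas named above, bootstrap aggregation and combining the estimators' predictions, can be sketched in a few lines. This is only an illustration under stated assumptions: the weak learner `train_stump` is a hypothetical one-feature threshold standing in for a small decision tree, it assumes the positive class has the larger feature values, the toy data are invented, and the random feature subsetting that a full random forest also performs is omitted.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap replicate)."""
    return [rng.choice(rows) for _ in rows]

def train_stump(sample):
    """Hypothetical weak learner: threshold halfway between the class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:  # degenerate replicate: predict the only class seen
        only = 1 if pos else 0
        return lambda x: only
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else 0

def bagged_predict(estimators, x):
    """Combine the estimators' predictions by majority vote."""
    votes = Counter(est(x) for est in estimators)
    return votes.most_common(1)[0][0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Invented toy data: (cholesterol, label), 1 = disease, 0 = no disease
rows = [(180, 0), (190, 0), (200, 0), (260, 1), (270, 1), (290, 1)]
estimators = [train_stump(bootstrap_sample(rows, rng)) for _ in range(25)]
high_risk = bagged_predict(estimators, 300)
low_risk = bagged_predict(estimators, 170)
```

An odd number of estimators is used so that a binary majority vote cannot tie.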
According to Muhammad et al. [11], the DT created using the RF approach was the most effective model in terms of accuracy and receiver operating characteristic (ROC). From this combined DT from the RF model, production rules are created, and the resulting expert system diagnosed CAD patients in Nigeria with an accuracy of 92.04%. According to Jinny and Mate [21], of all the classifiers, the RF classifier, using the most features possible for the classification model, achieved an accuracy of about 91%. The hybrid RF with a linear model (HRFLM) proposed by Mohan et al. [24] used 11 attributes and achieved an accuracy of 88.7% when applied to the UCI heart dataset. Rajdhan et al. [36] developed an ML model that contrasts four different methods using the UCI Cleveland dataset. Examining the classification
  • 8. accuracy of RF, LR, NB, and DT algorithms, it was found that the RF approach had an accuracy rate of 90.16%. In the method proposed by Joloudari et al. [16], the data mining techniques SVM, CHAID, C5.0, and RTs were implemented on the Z-Alizadeh dataset. Random trees performed best, with 91.47% accuracy using 10-fold validation. The following metrics were examined and evaluated: accuracy, AUC, Gini, return on investment (ROI), profit, confidence, response, and gain. The Cleveland heart dataset was used to train the classifiers NB, SVM, LR, RF, and AdaBoost [35]. The results without and with feature selection are compared. 8 features selected from the hybrid model yielded the highest accuracy results.
4.1.6. Support vector machine
SVM is a supervised learning approach that categorizes data with a hyperplane. The hyperplane partitions the n-dimensional space so that new data can be assigned to the appropriate category. The data points closest to the hyperplane are called support vectors. SVM maximizes the margin to reduce the chance of misclassification. Applications include face detection and image classification. The algorithm can classify both linear and non-linear data. Li et al. [25] contrasted various feature selection methodologies, such as relief, maximum relevance minimum redundancy, least absolute shrinkage and selection operator, and LLBFS FS algorithms, and presented a novel feature selection methodology, FCMIM. Of the classifiers considered (LR, KNN, ANN, NB, DT, and SVM), SVM is the one combined with FCMIM for the prediction of cardiac illnesses. Abdar et al. [26] created a novel ML methodology for accurate CAD prediction. The Z-Alizadeh Sani dataset is preprocessed and normalized.
The GA and PSO algorithms are employed in the selection of the features. Classification algorithms including NB, generalized linear models (GLM), LR, DL, DT, RF, GBT, and 3 types of SVM, namely C-SVC, nu-SVC, and linear SVM, were also tested on the dataset. The N2Genetic-NuSVM method has the highest accuracy (93.08%) and F1-score (0.9151).
4.1.7. Artificial neural network
An artificial neural network (ANN), also called a neural network (NN), mimics the human brain. The human brain consists of interconnected neurons; similarly, the neurons in an ANN are arranged in several layers and connected to one another. The layers include input layers, hidden layers, and output layers. Kahramanli and Allahverdi [19] created a hybrid system for diabetes and heart disease prediction. Using the UCI dataset, ANN and FNN hybridization was implemented. FNN1, FNN2, and FNN3 are the three types of FNN. In FNN1 the weights are fuzzy and the inputs are crisp values, and vice versa in FNN2. In FNN3 both the inputs and the weights are fuzzy values. A hybrid system is created by combining an FNN2 and an ANN trained with backpropagation. The model shows 84.2% accuracy for the Pima Indians database and 86.8% for the Cleveland database. A computer-aided system called heart disease prediction system (HDPS) [29] was created using the C and C# environments to forecast heart disorders. With 13 clinical features, the ANN classifies with a degree of precision of 80%. The HDPS interface has a data input panel, a ROC curve display section, and a performance display section. A low compact sensor designed by Dixit and Kala [18] records ECG signals, and the preprocessed signal is then segmented. From each window, suitable features are extracted and the classification is made. Finally, fusion of the windows is done by a 1D CNN model with an accuracy of 93%.
4.1.8. Fuzzy logics
Fuzzy logic is a multi-valued logic in which variables take values between 0 and 1.
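A degree of truth between 0 and 1 is usually assigned by a membership function. The sketch below uses a triangular membership function, which is a common choice but is an assumption here, as are the fuzzy sets for systolic blood pressure and all breakpoints; none of them are taken from the expert systems reviewed in this section.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b.

    Assumes a < b < c.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def bp_memberships(bp):
    """Hypothetical fuzzy sets for systolic blood pressure in mmHg."""
    return {
        "normal": triangular(bp, 90, 110, 130),
        "high": triangular(bp, 120, 150, 180),
    }

# A reading of 125 mmHg belongs partly to both sets
degrees = bp_memberships(125)
```

Because the two sets overlap, a single reading can belong to both "normal" and "high" to different degrees, which is what lets a fuzzy rule base fire several rules at once.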
A fuzzy expert system-based prediction [37] focused on modules, meta-rules, and consistency checks in the rule base for improved rule organization. In this expert system, specific emphasis is placed on effective rule organization methods. Because the patient may be unaware of the values of all clinical parameters, various combinations of the criteria, including age, blood pressure, and cholesterol, are created. In the rule base, two consistency checks, namely i) contradictory rule checking and ii) redundant rule checking, are defined to improve the results. When diagnosing CAD using a fuzzy expert system based on PSO, the membership functions are optimised using PSO, and a fuzzy rule base is produced using the optimised membership functions. With this fuzzy rule base, Mamdani inference was implemented and yielded the highest accuracy, 93.27%, with the Cleveland and Hungarian datasets [20]. In another study, a fuzzy system with Mamdani inference predicted heart disease using rules generated by C4.5; the highest accuracy, 94.55%, was achieved by Muhammad and Algehyne [10].

4.1.9. Other learning methods
Using two-level stacking, Wang et al. [30] applied an enumeration algorithm to identify the best classifiers on the Z-Alizadeh Sani dataset. A novel hybrid dataset, ‘Sathvi’ [38], integrating public health datasets
is created. The objective of this hybrid dataset is to make the data free from noise. The attributes ‘ca’ and ‘thal’ were eliminated from the dataset. Among the classifiers compared in this research, the CatBoost classifier outperformed the others, with accuracy from 88% to 98.11%.

4.2. Performance measurements
The classification of heart disease data requires the use of several supervised ML algorithms. The classification models were assessed using eight quality factors: true positive (TP), false positive (FP), true negative (TN), false negative (FN), accuracy, specificity, sensitivity, and F1 score.
TP: number of patients with heart disease predicted as having heart disease.
FP: number of patients without heart disease predicted as having heart disease.
TN: number of patients without heart disease predicted as not having heart disease.
FN: number of patients with heart disease predicted as not having heart disease.
An accuracy score indicates how effectively a model performs. It is the sum of TPs and TNs divided by the sum of TPs, FPs, TNs, and FNs:
\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{2} \]
Specificity is the proportion of negative cases recognized as negative by the classifier:
\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{3} \]
Sensitivity, also called recall, is the proportion of actually positive cases that were predicted to be positive; in other words, an unhealthy person was predicted to be unhealthy:
\[ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{4} \]
The F1 measure is the harmonic mean of precision and recall, where precision is TP/(TP + FP). Its value is 1 for the best performance and 0 for the worst:
\[ F1 = \frac{2\,(\text{precision} \times \text{recall})}{\text{precision} + \text{recall}} \tag{5} \]

5. RESULT ANALYSIS
In this review, traditional classifiers and boosting classifiers applied to public datasets were studied. RF, DT, and KNN outperform state-of-the-art techniques, with KNN reaching an accuracy of 100%. The performance evaluation results from the literature are shown in Figures 5(a)-5(c); all three classifiers achieved accuracies above 85%. Among the articles examined in Table 2, the 7 studies with serial numbers (Sr. No.) 1, 2, 7, 13, 16, 27, and 28 report the best results for RF, and their authors used 5, 6, 6, 12, 7, 4, and 5 studies for comparison, respectively. The 5 studies with Sr. No. 3, 5, 8, 20, and 25 report the best results for DT, and their authors used 4, 7, 5, 34, and 6 studies for comparison. The 3 studies with Sr. No. 4, 6, and 19 report the best results for KNN, with 2, 29, and 6 studies compared, respectively. Figure 5 shows the performance comparison of different CAD prediction models: Figure 5(a) shows the accuracy achieved by RF, Figure 5(b) by DT, and Figure 5(c) by KNN. With RF, the maximum accuracy, 92.90%, is attained by Sr. No. 2. With DT, Sr. No. 3 showed 99.2% accuracy. With KNN, Sr. No. 4 attained 100% accuracy, which makes KNN one of the best classifiers for CAD prediction.

5.1. Challenges and research directions
ML is frequently employed to solve classification problems in the healthcare sector. Our study of ML algorithms opens up several research issues, especially in healthcare. We observe that ML research is actively taking place in the field of cardiology, with some intriguing proofs-of-concept and proprietary solutions being developed by the research community and the healthcare industry, respectively.
In general, the nature and quality of the data, in addition to the quality of the learning algorithms, determine the success and efficiency of an ML-based solution. It is challenging to gather real-world data in the pertinent fields, such as agriculture, IoT, and healthcare. Therefore, a more thorough investigation of data collection is required. It is also challenging to accurately pre-process data collected from different sources. Therefore, to employ the learning algorithms effectively, existing pre-processing methods must be modified or extended, or new data preparation strategies proposed. Hybrid learning models, for instance ensembles of methods, modifications or refinements of current learning approaches, or newly constructed learning methods, are potential future work in the classification and prediction of heart-related disease.

Figure 5. Accuracy of CAD prediction models: (a) accuracy of random forest classifier, (b) accuracy of decision tree classifier, and (c) accuracy of k-nearest neighbor classifier
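Figure 5 compares the models by accuracy alone; the other measures defined in Section 4.2 follow from the same four confusion-matrix counts. The Python sketch below computes them for a hypothetical set of counts. The function name `classification_metrics` and the example counts are assumptions made for illustration and do not come from any of the surveyed studies; the function also assumes that every denominator is non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, specificity, sensitivity (recall), precision and F1
    from confusion-matrix counts. Assumes all denominators are non-zero."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)          # also called recall
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for 100 patients (illustrative only).
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts the accuracy is 0.85 while the sensitivity is 0.80, which illustrates why accuracy alone can hide missed positive cases.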
6. CONCLUSION
ML algorithms have enormous potential in predicting heart-related ailments. Each of the aforementioned algorithms performed excellently in certain cases while failing in others. For this review, we searched extensively across a number of search engines and databases. According to our review, the most significant research on CAD diagnosis using ML algorithms was done from 2006 to 2022. The results also show that the most popular CAD detection methods are RF, DT, KNN, SVM, ANN, fuzzy logic, and NB. Because of the inherent variety of datasets and ML algorithms, different performance metrics have been documented. According to the results, RF, DT, and KNN have the highest accuracy levels for the majority of the CAD datasets. Additionally, hybridizing classifiers and feature selection can enhance performance for precise CAD diagnosis. With the same methodology and dataset, the studies can be extended to other heart-related diseases. With proper utilization of real-time datasets, the studies can be improved.

APPENDIX
Table 2. The list of articles published using ML for CAD diagnosis
Sr. No | Author | Year | Techniques | Conclusion | Dataset | Observations
1 | Muhammad et al. [11] | 2021 | LR, SVM, KNN, RF, NB, GB | RF: Accuracy = 92.04% | General Hospital, Nigeria | Results from various ML algorithms were obtained and compared.
2 | Yilmaz et al. [12] | 2022 | RF, SVM, LR | RF: Accuracy = 92.9% | IEEE DataPort | Three traditional ML approaches were employed, and RF was the best classifier for the prediction of CHD.
3 | Soni et al. [13] | 2011 | DT with GA, NB, Classification via clustering | DT with GA: Accuracy = 99.2% | NR | With GA, the optimal subset of features is retrieved, and DT and Bayesian classification show improved results.
4 | Jabbar et al. [14] | 2013 | KNN, J48, NB | KNN: Accuracy = 100% | UCI Repository | The optimal feature selection method, together with the KNN classifier, is compared with other datasets. Classifier hybridization with feature selection can best choose the features, enhancing accuracy.
5 | Verma et al. [8] | 2017 | DT, MLP, NB tree | C4.5: Accuracy = 97.6%, Sensitivity = 97.5%, Specificity = 97.6% | Indira Gandhi Medical College, Shimla, India | Three predictive data mining approaches were compared.
6 | Acharya et al. [15] | 2018 | DT, KNN | KNN: Accuracy = 99.55%, Sensitivity = 99.93%, Specificity = 99.24% | St.-Petersburg Institute of Cardiological Technic (ECG) | With the selected 20 features from the ECG signal, KNN achieved the highest accuracy from contourlet coefficients.
7 | Joloudari et al. [16] | 2020 | SVM, CHAID, C5.0, and RT | RT: Accuracy = 91.47% | Z-Alizadeh Sani heart disease dataset | Different classifiers were tested, and RT performs best in accuracy, area under the curve (AUC), and Gini. The most important rules were extracted using the random trees model for CAD diagnosis.
8 | Muhammad and Algehyne [10] | 2021 | C4.5 Fuzzy Expert system | Fuzzy Logics: Accuracy = 94.55%, Sensitivity = 95.35%, Specificity = 95.0% | General Hospital, Nigeria | The C4.5 data mining algorithm is used to incorporate human expertise into the knowledge base of the system.
9 | Ali et al. [17] | 2020 | OCI-DBN, ANN, DNN | OCI-DBN using SGA: Accuracy = 94.61%, Sensitivity = 96.03%, Specificity = 93.15% | Cleveland heart disease dataset | The Ruzzo-Tompa method eliminates features that contribute less to system performance. A stacked genetic algorithm is used to build the best-configured DBN.
10 | Dixit and Kala [18] | 2021 | RF, GB, Deep Learning models (1D CNN) | 1D CNN: Accuracy = 93.0% | Swaroop Rani Nehru Hospital, Allahabad | Oversampling with 1D CNN and a voting strategy emerges as the suitable classification technique, with 93% accuracy.
11 | Kahramanli and Allahverdi [19] | 2008 | ANN, FNN on 2 datasets | HNN (Pima dataset): Accuracy = 84.2%, Sensitivity = 80.3%, Specificity = 87.3%; HNN (Cleveland dataset): Accuracy = 87.4%, Sensitivity = 93.0%, Specificity = 78.5% | Pima Indians diabetes and Cleveland heart disease | The databases cover Cleveland heart disease and Pima Indian diabetes. The classification accuracy on these datasets was assessed using k-fold cross-validation.
12 | Muthukaruppan et al. [20] | 2012 | Fuzzy expert system | PSO based Fuzzy expert system: Accuracy = 93.27% | UCI Repository | Cleveland and Hungarian Heart Disease datasets from the UCI repository. DT was employed to identify the characteristics that influence the diagnosis. PSO was used to fine-tune the fuzzy membership functions (MFs). The optimized MFs yielded the highest accuracy.
13 | Jinny and Mate [21] | 2021 | DT, RF, AdaBoost, GNB, LR, KNN, GB, XGB | RF: Accuracy = 90.7% | Framingham’s Dataset | Conventional ML methods and cutting-edge gradient boosting techniques are compared with and without feature selection.
14 | Tiwari et al. [22] | 2022 | ET, RF, XG, RF, MLP, KNN, XGB, SVM, SGD, AdaBoost, CART, GB, NB | Stacked ensemble classifier: Accuracy = 92.34% | IEEE Data Port | The Hungarian, Cleveland, Long Beach VA, Switzerland, and Statlog datasets were combined into one dataset. The stacked ensemble classifier achieved the highest accuracy, 92.34%, among the classifiers.
15 | Tarawneh and Embarak [23] | 2019 | NB, SVM, KNN, NN, J4.8M, RF, GA | NB, SVM: Accuracy = 89.2% | Cleveland heart disease dataset | A variety of ML classifiers were deployed on the Cleveland dataset to predict cardiac disease; NB and SVM show improved accuracy compared with the other classifiers.
16 | Mohan et al. [24] | 2019 | NB, LR, DL, DT, RF, GB Tree, SVM, VOTE, HRFLM | HRFLM: Accuracy = 88.4%, Sensitivity = 92.8%, Specificity = 82.6% | UCI Repository | The dataset is clustered using the DT features. The classifiers' performance is then estimated by applying them to each clustered dataset.
17 | Li et al. [25] | 2020 | LR, KNN, ANN, SVM, NB, DT | FCMIM-SVM: Accuracy = 92.37%, Specificity = 98.0%, Sensitivity = 89.0% | Cleveland heart disease dataset | The feature extraction process used Fast Conditional Mutual Information, which improved prediction accuracy with SVM.
18 | Abdar et al. [26] | 2019 | SVC, nuSVM, LinSVM | N2Genetic-nuSVM: Accuracy = 93.08% | Z-Alizadeh Sani heart disease dataset | Several ML techniques were explored, and three different forms of SVM were deployed, along with feature tuning for improved accuracy.
19 | Shah et al. [27] | 2020 | NB, KNN, DT, RF | KNN: Accuracy = 90.78% | UCI Repository | Four traditional ML approaches were employed, and KNN is used as the classifier for the prediction of CAD.
20 | Ghiasi et al. [28] | 2020 | Bagging, SMO, Bagging SMO, NB, C4.5, J48, SVM, ANN, ANN-GA, CART | CART: Accuracy = 92.41% | Z-Alizadeh Sani heart disease dataset | The features were categorized as important or non-important, and the CART model for CAD diagnosis was created using the important features. The CART SMO, Bagging SMO, and ANN-GA models give accurate results, but CART reaches the highest accuracy.
21 | Chen et al. [29] | 2011 | ANN | ANN: Accuracy = 80.0% | UCI Repository | A user-friendly ANN-based Heart Disease Prediction System (HDPS) is developed and reliably predicts outcomes.
22 | Wang et al. [30] | 2020 | LR, RF, GNB, SVC, DT, KNN, ADB, GB, MLP, XGB, Stacking model | Stacking model: Accuracy = 95.43%, Sensitivity = 95.84%, Specificity = 94.44% | Z-Alizadeh Sani heart disease dataset | The stacking model is tested on three distinct datasets and its performance is compared to that of other classifiers. The best combination of classifiers is then found using the enumeration approach.
23 | Sayadi et al. [31] | 2022 | DT, DL, LR, RF, SVM, XGB | LR, SVM: Accuracy = 95.08% | Z-Alizadeh Sani heart disease dataset | With the features selected by Pearson feature selection, six traditional ML approaches were compared. LR and SVM reached the highest accuracy for CAD detection.
24 | Verma et al. [32] | 2016 | MLP, MLR, FURIA, C4.5 | MLR: Accuracy = 88.4% | Indira Gandhi Medical College (IGMC), Shimla, India | Dimension reduction using correlation-based feature subset selection with PSO, and data clustering to spot data points that were improperly assigned. Finally, C4.5, MLP, MLR, and FURIA were used to build the hybrid model.
25 | Idris et al. [33] | 2020 | LR, NN, kNN, DT, NB, SVM, DL, Vote | NN with Embedded DT features: Accuracy = 94.5% | Malaysian National Cardiovascular Disease Database (NCVD)–ACS registry | With the significant features selected by suitable feature selection methods, a prediction model was built and its performance metrics were compared.
26 | Marateb and Goudarzi [34] | 2015 | Fuzzy expert system (MLR+NFC) | Fuzzy rule-based system (MLR+NFC): Accuracy = 84.0% | Cleveland heart disease dataset | The features are selected with MLR and integrated with the Neuro-Fuzzy classifier for the prediction of heart diseases.
27 | Rani et al. [35] | 2021 | NB, SVM, LR, RF, AdaBoost | RF: Accuracy = 86.6% | Cleveland heart disease dataset | A hybrid technique combining GA and RFE was applied for feature selection. Missing values in the dataset were treated using the Multiple Imputation by Chained Equations (MICE) algorithm. RF provided the highest performance in combination with MICE, GARFE, scaling, and SMOTE.
28 | Rajdhan et al. [36] | 2020 | DT, LR, RF, NB | RF: Accuracy = 90.16% | Cleveland heart disease dataset | When the accuracy of multiple ML algorithms was compared, RF performed well.
29 | Pal et al. [37] | 2012 | ANN, ID3, CART, Fuzzy | Fuzzy Logics: Accuracy = 84.20%, Sensitivity = 95.85%, Specificity = 83.33% | NR | The risk factors for developing CAD, approaches for gathering and representing knowledge, a strategy for organizing rules, fuzzifying clinical parameters, and defuzzification to crisp values are all described in this study.
30 | Kanagarathinam et al. [38] | 2022 | NB, XGB, KNN, SVM, MLP, CatBoost | CatBoost classifier: Accuracy = 94.34% | Sathvi dataset | The Sathvi dataset is created by combining the 4 CVD datasets. The dataset has 531 instances and 12 attributes, with no missing data.
*NR = Not reported

REFERENCES
[1] S. Pouriyeh, S. Vahid, G. Sannino, G. De Pietro, H. Arabnia, and J. Gutierrez, “A comprehensive investigation and comparison of machine learning techniques in the domain of heart disease,” in Proceedings - IEEE Symposium on Computers and Communications, Jul. 2017, pp. 204–207, doi: 10.1109/ISCC.2017.8024530.
[2] C. W. Tsao et al., “Heart disease and stroke statistics-2022 update: A report from the American Heart Association,” Circulation, vol. 145, no. 8, pp. E153–E639, 2022, doi: 10.1161/CIR.0000000000001052.
[3] C. Abbafati et al., “Global burden of 87 risk factors in 204 countries and territories, 1990–2019: a systematic analysis for the Global
Burden of Disease Study 2019,” The Lancet, vol. 396, no. 10258, pp. 1223–1249, 2020, doi: 10.1016/S0140-6736(20)30752-2.
[4] P. Libby and P. Theroux, “Pathophysiology of coronary artery disease,” Circulation, vol. 111, no. 25, pp. 3481–3488, Jun. 2005, doi: 10.1161/CIRCULATIONAHA.105.537878.
[5] D. Lapp, “Heart disease dataset,” Kaggle, pp. 1–6, 2020, [Online]. Available: https://ieee-dataport.org/open-access/heart-disease-dataset-comprehensive.
[6] A. Frank and A. Asuncion, “UCI machine learning repository,” UCI, 2010, [Online]. Available: http://archive.ics.uci.edu/ml.
[7] R. Alizadehsani et al., “A data mining approach for diagnosis of coronary artery disease,” Computer Methods and Programs in Biomedicine, vol. 111, no. 1, pp. 52–61, Jul. 2013, doi: 10.1016/j.cmpb.2013.03.004.
[8] L. Verma, S. Srivastava, and P. C. Negi, “An intelligent noninvasive model for coronary artery disease detection,” Complex & Intelligent Systems, vol. 4, no. 1, pp. 11–18, Jul. 2018, doi: 10.1007/s40747-017-0048-6.
[9] A. A. Haruna, L. J. Muhammad, B. Z. Yahaya, E. J. Garba, N. D. Oye, and L. T. Jung, “An improved C4.5 data mining driven algorithm for the diagnosis of coronary artery disease,” in Proceeding of 2019 International Conference on Digitization: Landscaping Artificial Intelligence, ICD 2019, Nov. 2019, pp. 48–52, doi: 10.1109/ICD47981.2019.9105844.
[10] L. J. Muhammad and E. A. Algehyne, “Fuzzy based expert system for diagnosis of coronary artery disease in Nigeria,” Health and Technology, vol. 11, no. 2, pp. 319–329, Feb. 2021, doi: 10.1007/s12553-021-00531-z.
[11] L. J. Muhammad, I. Al-Shourbaji, A. A. Haruna, I. A. Mohammed, A. Ahmad, and M. B. Jibrin, “Machine learning predictive models for coronary artery disease,” SN Computer Science, vol. 2, no. 5, Jun.
2021, doi: 10.1007/s42979-021-00731-4.
[12] R. Yilmaz and F. H. Yagin, “Early detection of coronary heart disease based on machine learning methods,” Medical Records, vol. 4, no. 1, pp. 1–6, Jan. 2022, doi: 10.37990/medr.1011924.
[13] J. Soni, U. Ansari, D. Sharma, and S. Soni, “Predictive data mining for medical diagnosis: An overview of heart disease prediction,” International Journal of Computer Applications, vol. 17, no. 8, pp. 43–48, Mar. 2011, doi: 10.5120/2237-2860.
[14] M. A. Jabbar, B. L. Deekshatulu, and P. Chandra, “Heart disease classification using nearest neighbor classifier with feature subset selection,” Annals Computer Science Series, vol. XI, no. 1, pp. 47–54, 2013, [Online]. Available: http://www.anale-informatica.tibiscus.ro/download/lucrari/11-1-06-Jabbar.pdf.
[15] U. R. Acharya et al., “Automated characterization of coronary artery disease, myocardial infarction, and congestive heart failure using contourlet and shearlet transforms of electrocardiogram signal,” Knowledge-Based Systems, vol. 132, pp. 156–166, Sep. 2017, doi: 10.1016/j.knosys.2017.06.026.
[16] J. H. Joloudari et al., “Coronary artery disease diagnosis; ranking the significant features using a random trees model,” International Journal of Environmental Research and Public Health, vol. 17, no. 3, p. 731, Jan. 2020, doi: 10.3390/ijerph17030731.
[17] S. A. Ali et al., “An optimally configured and improved deep belief network (OCI-DBN) approach for heart disease prediction based on ruzzo-tompa and stacked genetic algorithm,” IEEE Access, vol. 8, pp. 65947–65958, 2020, doi: 10.1109/ACCESS.2020.2985646.
[18] S. Dixit and R. Kala, “Early detection of heart diseases using a low-cost compact ECG sensor,” Multimedia Tools and Applications, vol. 80, no. 21–23, pp. 32615–32637, Aug. 2021, doi: 10.1007/s11042-021-11083-9.
[19] H. Kahramanli and N. Allahverdi, “Design of a hybrid system for the diabetes and heart diseases,” Expert Systems with Applications, vol. 35, no.
1–2, pp. 82–89, Jul. 2008, doi: 10.1016/j.eswa.2007.06.004.
[20] S. Muthukaruppan and M. J. Er, “A hybrid particle swarm optimization based fuzzy expert system for the diagnosis of coronary artery disease,” Expert Systems with Applications, vol. 39, no. 14, pp. 11657–11665, Oct. 2012, doi: 10.1016/j.eswa.2012.04.036.
[21] P. R. L, S. V. Jinny, and Y. V. Mate, “Early prediction model for coronary heart disease using genetic algorithms, hyper-parameter optimization and machine learning techniques,” Health and Technology, vol. 11, no. 1, pp. 63–73, Nov. 2021, doi: 10.1007/s12553-020-00508-4.
[22] A. Tiwari, A. Chugh, and A. Sharma, “Ensemble framework for cardiovascular disease prediction,” Computers in Biology and Medicine, vol. 146, p. 105624, Jul. 2022, doi: 10.1016/j.compbiomed.2022.105624.
[23] M. Tarawneh and O. Embarak, “Hybrid approach for heart disease prediction using data mining techniques,” Lecture Notes on Data Engineering and Communications Technologies, vol. 29, pp. 447–454, 2019, doi: 10.1007/978-3-030-12839-5_41.
[24] S. Mohan, C. Thirumalai, and G. Srivastava, “Effective heart disease prediction using hybrid machine learning techniques,” IEEE Access, vol. 7, pp. 81542–81554, 2019, doi: 10.1109/ACCESS.2019.2923707.
[25] J. P. Li, A. U. Haq, S. U. Din, J. Khan, A. Khan, and A. Saboor, “Heart disease identification method using machine learning classification in e-healthcare,” IEEE Access, vol. 8, pp. 107562–107582, 2020, doi: 10.1109/ACCESS.2020.3001149.
[26] M. Abdar, W. Książek, U. R. Acharya, R. S. Tan, V. Makarenkov, and P. Pławiak, “A new machine learning technique for an accurate diagnosis of coronary artery disease,” Computer Methods and Programs in Biomedicine, vol. 179, p. 104992, Oct. 2019, doi: 10.1016/j.cmpb.2019.104992.
[27] D. Shah, S. Patel, and S. K. Bharti, “Heart disease prediction using machine learning techniques,” SN Computer Science, vol. 1, no. 6, Oct. 2020, doi: 10.1007/s42979-020-00365-y.
[28] M. M. Ghiasi, S.
Zendehboudi, and A. A. Mohsenipour, “Decision tree-based diagnosis of coronary artery disease: CART model,” Computer Methods and Programs in Biomedicine, vol. 192, 2020, doi: 10.1016/j.cmpb.2020.105400.
[29] A. H. Chen, S. Y. Huang, P. S. Hong, C. H. Cheng, and E. J. Lin, “HDPS: Heart disease prediction system,” Computing in Cardiology, vol. 38, pp. 557–560, 2011.
[30] J. Wang et al., “A stacking-based model for non-invasive detection of coronary heart disease,” IEEE Access, vol. 8, pp. 37124–37133, 2020, doi: 10.1109/ACCESS.2020.2975377.
[31] M. Sayadi, V. Varadarajan, F. Sadoughi, S. Chopannejad, and M. Langarizadeh, “A machine learning model for detection of coronary artery disease using noninvasive clinical parameters,” Life, vol. 12, no. 11, p. 1933, Nov. 2022, doi: 10.3390/life12111933.
[32] L. Verma, S. Srivastava, and P. C. Negi, “A hybrid data mining model to predict coronary artery disease cases using non-invasive clinical data,” Journal of Medical Systems, vol. 40, no. 7, Jun. 2016, doi: 10.1007/s10916-016-0536-z.
[33] N. Md Idris, Y. K. Chiam, K. D. Varathan, W. A. Wan Ahmad, K. H. Chee, and Y. M. Liew, “Feature selection and risk prediction for patients with coronary artery disease using data mining,” Medical and Biological Engineering and Computing, vol. 58, no. 12, pp. 3123–3140, Nov. 2020, doi: 10.1007/s11517-020-02268-9.
[34] H. R. Marateb and S. Goudarzi, “A noninvasive method for coronary artery diseases diagnosis using a clinically-interpretable fuzzy rule-based system,” Journal of Research in Medical Sciences, vol. 20, no. 3, pp. 214–223, 2015.
[35] P. Rani, R. Kumar, N. M. O. S. Ahmed, and A. Jain, “A decision support system for heart disease prediction based upon machine learning,” Journal of Reliable Intelligent Environments, vol. 7, no. 3, pp. 263–275, Jan. 2021, doi: 10.1007/s40860-021-00133-6.
[36] A. Rajdhan, A. Agarwal, M. Sai, D. Ravi, and P.
Ghuli, “Heart Disease Prediction using Machine Learning,” International Journal of Engineering Research and, vol. V9, no. 04, May 2020, doi: 10.17577/IJERTV9IS040614.
[37] D. Pal, K. M. Mandana, S. Pal, D. Sarkar, and C. Chakraborty, “Fuzzy expert system approach for coronary artery disease screening using clinical parameters,” Knowledge-Based Systems, vol. 36, pp. 162–174, Dec. 2012, doi: 10.1016/j.knosys.2012.06.013.
[38] K. Kanagarathinam, D. Sankaran, and R. Manikandan, “Machine learning-based risk prediction model for cardiovascular disease using a hybrid dataset,” Data and Knowledge Engineering, vol. 140, p. 102042, Jul. 2022, doi: 10.1016/j.datak.2022.102042.
[39] A. Garavand, C. Salehnasab, A. Behmanesh, N. Aslani, A. H. Zadeh, and M. Ghaderzadeh, “Efficient model for coronary artery disease diagnosis: a comparative study of several machine learning algorithms,” Journal of Healthcare Engineering, vol. 2022, pp. 1–9, Oct. 2022, doi: 10.1155/2022/5359540.
[40] H. G. Lee, K. Y. Noh, and K. H. Ryu, “A data mining approach for coronary heart disease prediction using HRV features and carotid arterial wall thickness,” in BioMedical Engineering and Informatics: New Development and the Future - Proceedings of the 1st International Conference on BioMedical Engineering and Informatics, BMEI 2008, May 2008, vol. 1, pp. 200–206, doi: 10.1109/BMEI.2008.189.
[41] C. M. Chu et al., “A Bayesian expert system for clinical detecting coronary artery disease,” Journal of Medical Sciences, vol. 29, no. 4, pp. 187–194, 2009.
[42] M. A. Karaolis, J. A. Moutiris, D. Hadjipanayi, and C. S. Pattichis, “Assessment of the risk factors of coronary heart events based on data mining with decision trees,” IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 3, pp. 559–566, May 2010, doi: 10.1109/TITB.2009.2038906.

BIOGRAPHIES OF AUTHORS
Anu Ragavi Vijayaraj is a full-time research scholar in the School of Computer Science and Engineering at VIT, Chennai, India. She received her B.Tech. in Information Technology from Kongu Engineering College, Perundurai (India) in 2013 and M.E. in Computer and Communication Engineering from Kongu Engineering College, Perundurai (India) in 2015. Her research interests include machine learning, deep learning, and artificial intelligence. She can be contacted at email: anuragaviraj@gmail.com.

Subbulakshmi Pasupathi is an Assistant Professor in the School of Computer Science and Engineering at VIT, Chennai, India. She received her B.Tech. in Information Technology in 2009, M.E. in Computer Science and Engineering in 2011 and Ph.D. from Anna University, Chennai (India) in 2019. She is the author of many scientific publications in international journals and conferences. Her current research interests include cognitive networks, artificial intelligence, and machine learning. She can be contacted at email: subbulakshmi.p@vit.ac.in.