SlideShare a Scribd company logo
IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 13, No. 3, September 2024, pp. 3686~3694
ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i3.pp3686-3694  3686
Journal homepage: http://ijai.iaescore.com
Deep neural networks and conventional machine learning
classifiers to analyze thoracic survival data
Cucu Ika Agustyaningrum1
, Yudi Ramdhani2
, Doni Purnama Alamsyah3
, Oda I. B. Hariyanto4
1
Faculty of Engineering and Informatics, Universitas Bina Sarana Informatika, Jakarta, Indonesia
2
Department of Information System, Faculty of Creative Technology, Satu University, Bandung, Indonesia
3
Department of Entrepreneurship, BINUS Business School, Bina Nusantara University, Jakarta, Indonesia
4
Department of Tourism, Batam International University, Batam, Indonesia
Article Info ABSTRACT
Article history:
Received Aug 13, 2023
Revised Feb 20, 2024
Accepted Mar 15, 2024
Lung cancer is a prevalent global health concern and most prevalent
malignancy in Indonesian hospitals. Following thoracic surgery, patients were
categorized into two classes: individuals who experienced mortality within a
year and those who achieved survival. Despite being about socks, the dataset
for the deceased category consisted of 70 data samples, while the dataset for
the final group comprised 400 samples. Data calculation involves the
utilization of both deep neural networks and standard machine learning
algorithms. The study use the Python programming language to evaluate the
algorithms, and it measures their performance using metrics such as accuracy,
F1-score, precision, recall, receiver operating characteristic (ROC), and area
under curve (AUC). The test results indicate that the deep neural network
method achieves an accuracy of 95.56%, an F1 score of 79.24%, a precision
of 91.96%, a recall of 85.52%, and an AUC of 85.52%. This study suggests
that utilizing deep neural network data mining techniques, specifically with a
cross-validation fold of 10, variations of six hidden layer encoder-decoder,
relu, sigmoid activation function, optimizer Adam, and learning rate of 0,01,
dropout rate of 0,2. Employing the synthetic minority over-sampling
technique data preprocessing method, can effectively analyze thoracic patient
survival data sets.
Keywords:
Algorithm
Area under curve
Conventional machine learning
Deep neural network
Pythons
Toraric survival data
This is an open access article under the CC BY-SA license.
Corresponding Author:
Doni Purnama Alamsyah
Department of Entrepreneurship, BINUS Business School, Bina Nusantara University
Jakarta 11480, Indonesia
Email: doni.syah@binus.ac.id
1. INTRODUCTION
According to the Indonesian sample registration system (SRS) report from 2014, 10 diseases are
responsible for most fatalities in Indonesia. Smoking cigarettes, both actively and passively, increases the risk
of developing lung cancer. Air pollution and exposure to the workplace are additional risk factors [1]. Because
of the high prevalence of smoking in society, lung cancer will become a public health issue in Indonesia. Lung
cancer is the fourth most common cancer detected in hospitals in Indonesia [2]. Surgery, radiation,
chemotherapy, immunotherapy, hormone therapy, and gene therapy can all be used to treat lung cancer. The
airways are almost entirely located on the ribs. The thorax is essential in the breathing process [3]. Thoracic
surgery is one of the most common treatments for lung cancer. However, thoracic surgery carries several risks
and complications, including neurological disorders, infections, and life-threatening complications. Many
complications occur in people with cardiovascular disease, including heart and blood vessel problems that can
lead to strokes. As a result, the life expectancy following thoracic surgery is extremely low [4]. The research
Int J Artif Intell ISSN: 2252-8938 
Deep neural networks and conventional machine learning classifiers to … (Cucu Ika Agustyaningrum)
3687
focus on predicting thoracic patient survival using a computer-aided diagnosis (CAD) system. The analysis of
the patient's condition before and after surgery, CAD can help predict the life expectancy of patients with lung
cancer. Data collection for patients with lung cancer undergoing thoracic surgery [5]. Following thoracic
surgery, patients were divided into two groups: those who died within a year (die) and those who were able to
survive. A total of 70 samples of data were included in the dataset for the die class and 400 samples were
included in the dataset for the latter group [5]. The performance of the prediction model for the survival of
thoracic patients may be affected by the class imbalance issue in the algorithm for detecting lung cancer in the
thoracic dataset [6]. Because there are more false positives than true positives in the thoracic surgery dataset,
there is a class imbalance. Bootstrap aggregating is a classification improvement technique that uses a random
combination of classifications on training and bagging datasets to reduce variance and avoid overfitting [7].
Previously, several researchers used various algorithms to predict the survival of thoracic patients in previous
studies, including the multi-class support vector machine algorithm with active learning for network traffic
classification [8]. Lung cancer classification using support vector machine and neural networks[9]. The medical
internet of things uses machine learning [10]. Based on unbalanced data, a comprehensive data-level analysis
for cancer diagnosis is performed [11]. Previous study indicates that the deep neural network technology has
not been employed thus far, with only traditional machine learning methods being utilized. The objective of
this study is to compare conventional machine learning and deep neural network algorithms in order to
determine the most effective approach for assessing potential online shopper intentions using the Python
programming language. The aim is to obtain the most accurate results for doctors and medical services in
receiving thoracic patient survival data.
2. METHOD
2.1. Research stages
The research on thoracic surgical data involved the utilization of a model constructed through a
systematic research method. This approach encompassed various steps, including dataset acquisition,
preprocessing, feature selection, smoothing, modeling, and evaluation. The research commences with
gathering data from the UCI machine learning repository website [12]. Subsequently, the data undergoes a
transformation to generate initial data for preprocessing. Following this, features are chosen utilizing the
Python programming language. The data is then balanced using the synthetic minority over-sampling technique
(SMOTE) method. Finally, the data is evaluated using conventional machine learning algorithms and deep
neural networks through data cross-validation. During this modeling phase, two techniques will be evaluated:
traditional machine learning and deep neural networks. Prior to inputing the modeling procedure, the data is
gathered using the SMOTE method, a data augmentation methodology that ensures a balanced representation
of both one-year survival and mortality possibilities. After a seamless procedure, the data undergoes
cross-validation, a model validation approach employed to evaluate the accuracy of the analytic outcomes.
Cross-validation is a technique used to do classification on preprocessed data. It involves partitioning the data
into separate training and testing sets [13]. Once the data has undergone preprocessing, feature selection,
smoothing modeling, and testing, the last stage involves analyzing the data produced by the Python
programming language using traditional machine learning methods and deep neural networks as shown in
Figure 1.
2.2. Application of research methods
2.2.1. Dataset
The main goal of this research is to examine how research approaches can be practically applied in
the setting of the thoracic surgery data set. To accomplish this goal, we painstakingly divided our study into
six distinct but related steps, each of which added to the overall analytical process. The data utilized is derived
from secondary sources, namely obtained from UCI machine learning. The dataset contains 470 data points,
each with 17 parameters and 1 class. Its purpose is to analyze and determine the survival statistics for patients
with thoracic conditions [12]. In order to overcome these challenges and offer more convenient, efficient, and
precise outcomes, classification methods that exhibit superior levels of forecasting and precision can be
employed. This study employed a comparison between the deep neural network algorithm and the classic
machine learning algorithm to yield highly predictive and accurate findings.
2.2.2. Pre-processing
A total of 470 patients' data were gathered during this phase of the trial, and they will be examined to
predict the patients' survival and mortality rates after one year. The initial step in the data preparation stage is
the selection or examination of attributes that will modify the data type. The data cleaning process commences
once the data selection phase is completed. During this technique, our objective is to identify any absent values.
 ISSN: 2252-8938
Int J Artif Intell, Vol. 13, No. 3, September 2024: 3686-3694
3688
Figure 1. Research stages
2.2.3. Feature selection
During the pivotal feature selection stage, our primary goal is to identify the attributes that have the
greatest influence on the dataset's dynamics. This determination is made using a thorough feature analysis, with
a particular emphasis on the feature option. In accordance with our research objectives, we make the strategic
decision to incorporate all attributes into our modelling process, apart from the class attribute. This deliberate
choice is based on the expectation that including these attributes will significantly improve our model's
performance metrics, which include accuracy, F1 score, precision, recall, and area under curve (AUC)
measurement. This meticulous feature selection process is critical for fine-tuning our models and ensuring their
efficacy in extracting valuable insights from the dataset.
2.2.4. Smooth
Following the critical feature selection phase, the next step in our research is to address class
imbalance within the dataset. This is accomplished by utilizing the SMOTE, a method designed to equalize
class distribution. Simultaneously, a critical step is taken during the modelling stage to ensure the dependability
and robustness of our analysis. Cross-validation techniques are used in both deep neural network models and
traditional machine learning methods to achieve this. We reduce the risk of overfitting by splitting the data into
training and testing sets in a systematic manner, allowing our models to generalize effectively to previously
unseen data and providing a solid foundation for our research findings.
2.2.5. Modelling
The proposed approach is used to carry out the prediction procedure at the modelling stage. The
suggested method uses deep neural networks and conventional machine learning. Deep neural networks use
three to eight layers and the Python programming language to assess the level of accuracy, F1 scores, precision,
recall, and AUC from live thoracic patients. Conventional machine learning includes several methods, such as
random forest, support vector machine, and logistics regression.
− Machine learning: machine learning is the automatic recognition of significant patterns in data. Computers
can learn things from people through machine learning. The computer can learn to process the data that is
supplied to it without any explicit programming. Algorithms for machine learning are used to train
computers to process data [14].
− Random forest: the random forest concept involves the generation of several decision trees that are
correlated, where each decision tree functions as a collection of models. Every decision tree creates class
predictions, and the ultimate decision is determined by the highest yield [15]. The random forest
classification method employs a decision tree approach, where attributes are randomly selected at each node
to decide categorization. The decision tree utilizes the largest number of votes it receives to classify data
[16], [17]. Random forest employs a voting system, namely the highest count, to merge classifiers (CARTs)
that are mutually independent and originate from the same distribution. This process results in classification
predictions. Decreased correlation can diminish the impact of prediction errors in random forest, which is
an inherent characteristic of this algorithm [18]. Random forest formula [19]:
Int J Artif Intell ISSN: 2252-8938 
Deep neural networks and conventional machine learning classifiers to … (Cucu Ika Agustyaningrum)
3689
=Entropy (Y)=- ∑ ⬚
i P (Y)log2
p(Y), (1)
=Entropy (Y)- ∑ ⬚
V εvalues(a)
|Yv|
|Ya|
Entropy (Yv).
Information:
Y=case set
P(c|Y) is the ratio of grades in class Y to those in class c.
Values(a)=Possible values when a is set.
Yv=subclass of Y with class v, which is related to class a.
Ya=All values that correspond to a.
− Support vector machine: because it requires specific learning objectives during training, support vector
machine is an integrated (supervised) classification method [20]. The following is the support vector
machine formula [21]:
similarity =
∑ f(Ti,Si)
n
i=1
Wi
(2)
Information:
T: A new case
S: cases in storage
n: the number of attributes
I: individual attribute between 1 and n
f: TRIBUTE similarity function between case T and case S
W: weight assigned to the i-th attribute
− Logistics regression: the supervised classification algorithm includes logistic regression. This algorithm
has grown in popularity in recent years, and its application has expanded significantly. This is a sigmoid
curve. It is a subset of logistic regression. Next step is start with a simple linear regression formula to
understand the mathematical version of the explanation [22].
y=b0+b1*x (3)
Thus, it has now been subjected to the sigmoid function, and the result is provided by the formula.
p=
1
1+e-y
(4)
Now that one formula has been substituted for another to get the value of y, we have our logistic regression
formula.
logistic(S)=b0+b1M1+b2M2+b3M3…bkMk… (5)
where S denotes the likelihood of the presence of interesting features. The predictor values are M1, M2,
M3, ... Mk. The intercepts of the model are b0, b1, b2, b3, ... bk.
− Deep neural network: the deep neural network is a complex artificial neural network consisting of multiple
layers. A deep neural network often consists of more than three layers, including an input layer, hidden
layers, and an output layer. This architecture classifies it as a multilayer perceptron (MLP) with multiple
layers. The depth is attributed to the multitude of layers. Deep learning refers to the process of learning in
deep neural network [23]. Deep neural network belongs to the category of neural networks. Deep neural
network consists of multiple hidden units that are interconnected across layers, but not inside each
individual unit within a layer. This approach employs supervised training and shares a comparable structure
with an artificial neural network. The deep learning approach can optimize numerous parameters for speech
recognition. deep neural network possesses the ability to accurately identify and interpret speech, exhibiting
enhanced efficiency in understanding various languages and dialects [24].
− Confusion matrix: a particularly helpful tool for examining bias in the recognition of tuples of various types
is the confusion matrix [25]. This technique makes use of a matrix array with positive and negative classes
[26]. The values of accuracy, precision, recall, and error rate will be obtained during the evaluation stage
using the confusion matrix. The accuracy ratio is the number of correctly identified cases divided by the
total number of cases. The following formula can be used to calculate accuracy:
 ISSN: 2252-8938
Int J Artif Intell, Vol. 13, No. 3, September 2024: 3686-3694
3690
Accuracy=
TP+TN
TP+TN+FP+FN
(6)
The proportion of cases with a true positive result is referred to as precision.
Precision=
TP
TP+FP
(7)
The proportion of correctly identified positive cases is referred to as recall.
Recall=
TP
TP+FN
(8)
Information:
The number of positive cases is designated as positive.
The number of negative cases is considered positive.
The number of negative cases is classified as such.
The number of positive case fields is considered negative.
2.2.6. Evaluation
During the evaluation phase, we conduct a comprehensive assessment of the model's performance
using Python. This involves examining important metrics such accuracy, F1 score, precision, recall, and
success or mistake rates. This study utilizes two separate algorithms: conventional machine learning and deep
neural networks, enabling us to completely evaluate the usefulness and efficiency of the model.
2.3. Method of collecting data
Primary data and secondary data are the two categories of data sources that are available for utilization
in the process of data collection. Primary data are the ones that are obtained directly, whereas secondary data
are the ones that are obtained from other researchers who have already carried out an investigation that is
comparable. To complete this investigation, researchers looked at secondary data. Thoracic surgery data set
results from UCI machine learning were used for the research. These results included 470 records with a total
of 17 attributes and 1 class attribute [27], as stated in Table 1. Table 2 contains categories of user behavior
analysis.
Table 1. Description of the attributes of the survival dataset of patients with thorax
No Attributes Data type Description
1 DGN Category Diagnosis: specific combination of ICD-10 codes for primary and secondary tumors and
multiples if present (DGN3, DGN2, DGN4, DGN6, DGN5, DGN8, DGN1).
2 PRE4 Numeric Forced vital capacity - FVC
3 PRE5 Numeric Volume that has been exhaled at the end of the first second of a forced expiration, FEV
4 PRE6 Category Performance status, Zubrod scale (PRZ2, PRZ1, PRZ0)
5 PRE7 Category Pain before surgery (T, F)
6 PRE8 Category Hemoptysis before surgery (T, F)
7 PRE9 Category Dyspnea before surgery (T, F)
8 PRE10 Category Cough before surgery (T, F)
9 PRE11 Category Weakness before surgery (T, F)
10 PRE14 Category T in clinical TNM: original tumor size, from OC11 (smallest) to OC14 (largest) (OC11,
OC14, OC12, OC13)
11 PRE17 Category Type 2 DM, diabetes mellitus (T, F)
12 PRE19 Category MI for up to six months (T, F)
13 PRE25 Category PAD: peripheral arterial disease (T, F)
14 PRE30 Category Smoking (T, F)
15 PRE32 Category Asthma (T,F)
16 AGE Numeric Age at surgery (numeric)
17 Risk1Y Category 1 year survival period - (T) true value if dead (T,F)
Table 2. Numerical attributes and categories of user behavior analysis
No Nama atribute Min. Value Max. Value SD
1 PRE4 1.44 6.30 0.87
2 PRE5 0.96 86.30 11.77
3 AGE 21.00 87.00 8.71
Int J Artif Intell ISSN: 2252-8938 
Deep neural networks and conventional machine learning classifiers to … (Cucu Ika Agustyaningrum)
3691
3. RESULTS AND DISCUSSION
Research data on the survival of thoracic patients is secondary data, where the total data is 470 with
17 attributes and 1 class. In a previous study that discussed the survival of thoracic patients conducted by
Ziȩba et al. [5] with focused on boosted support vector machine to prediction of the post-operative life
expectancy in the lung cancer patients. This study aims to determine the best results using the boosted support
vector machine algorithm to test each predicted classification for the survival of thoracic patients. In a study,
the experimental process used two methods of class formation, namely the minority class and the majority
class. The study's conclusions consist of calculations derived from both qualitative and quantitative processing
methods, which are based on the proposed model. All available datasets were utilized for this investigation.
This study employs deep neural networks and conventional machine learning approaches to predict data sets
through trials and testing. The datasets utilized in this experiment have undergone preprocessing, feature
selection, smoothing, and modeling using Python in Google Collaboratory.
3.1. Pre-processing step validation
During the validation process of survival data for thoracic patients, we obtained a set of noteworthy
findings during the course of our study, which involved meticulous data preprocessing, as shown in Table 3.
When comparing the performance of conventional machine learning algorithms, the random forest algorithm
stood out, boasting an impressive accuracy rate of 92 percent, an F1 score of 91.06%, a precision of 92.94%,
and a recall rate of 88.17%. Surprisingly, these results outperformed those of the support vector machine
algorithm and another regression logistics, albeit by only 1-2 percent.
Table 3. Preprocessing value results of a conventional machine learning algorithm
Model Accuracy (%) F1 Score (%) Precision (%) Recall (%)
Random forest 92 91.06 92.94 88.17
Support vector machine 88 81.67 79.06 85.87
Logistic regression 87 - - -
3.2. Conventional machine learning algorithm model
This study employed various conventional machine learning methods, such as random forest, support
vector machine, and logistic regression. This work use conventional machine learning methods to compute
accuracy, F1 score, precision, and recall metrics for analyzing survival data in thoracic patients. Table 4
presents the test results of many conventional machine learning techniques that have been employed,
facilitating a comparison of the accuracy, F1 score, precision, and recall values. When the conventional
machine learning algorithm is applied to the survival data of thoracic patients, the random forest algorithm is
generated. This random forest algorithm has outstanding performance metrics, including a 92% accuracy rate,
a 91.06% F1 score, a precision level of 92.94%, and a recall rate of 88.17%. Notably, these performance metrics
outperform the support vector machine algorithm and another regression logistics.
Table 4. The results of a comparison of thorax conventional machine learning algorithm
Model Accuracy (%) F1 score (%) Precision (%) Recall (%)
Random forest 92 91.06 92.94 88.17
Support vector machine 88 81.67 79.06 85.87
Logistic regression 87 - - -
3.3. Deep neural network algorithm
Several techniques are used in this study, beginning with standard machine learning algorithms and
progressing to a deep neural network algorithm with a variable number of hidden layers ranging from 3 to 8.
We conducted extensive testing with multiple iterations of the deep neural network method, resulting in a
thorough comparison of various performance metrics. Accuracy, F1 score, precision, recall, and AUC values
are among the metrics presented in Table 5 for a thorough analysis and evaluation of our models' effectiveness.
The test results, as presented in Table 5, demonstrate that the optimization of the deep neural network algorithm
using different variations of 6 hidden layer decoders (encoder, activation function parameter sigmoid, optimizer
Adam), with a learning rate of 0.01 and dropout rate of 0.2, yields higher values compared to the deep neural
network algorithm with alternative variations.
3.4. Comparison model
The results of each algorithm are meticulously documented in Table 6, revealing a compelling pattern
of performance. Notably, deep neural network methods are significantly more effective for research purposes
when compared to traditional machine learning algorithms. In line with previous studies, it has been
 ISSN: 2252-8938
Int J Artif Intell, Vol. 13, No. 3, September 2024: 3686-3694
3692
demonstrated that deep neural networks improve the capabilities of intrusion detection systems and optimize
classification algorithms [28], [29]. A thorough examination of the test results, including an examination of
both the confusion matrix and the receiver operating characteristic (ROC) curve, elucidates the specific
conditions under which this superiority is most apparent. The deep neural network algorithm was optimized,
with variations of 6 hidden layers in a decoder-encoder architecture, an activation function parameter set to
sigmoid, the Adam optimizer with a learning rate of 0.01, and a dropout rate of 0.2. This configuration produces
impressive performance metrics, including an accuracy rate of 95.56%, an F1 score of 79.24%, a precision
level of 91.96%, a recall rate of 85.52%, and an AUC of 85.52%. In stark contrast, the conventional machine
learning model, specifically the random forest algorithm, achieves an accuracy rate of 92.00%, an F1 score of
91.06%, a precision of 92.94%, and a recall of 88.17%. These figures show a noticeable average difference in
accuracy of 3.56%, an F1 score difference of 11.82%, a precision difference of 0.98%, and a recall difference
of 2.65%, highlighting the significant performance gains offered by the deep neural network approach.
Table 5. Results of the deep neural network algorithm's accuracy, precision, recall, F1-score, and AUC scores
Layers
Activation
function
Optimizer Learn rate
Accuracy
(%)
F1 score
(%)
Precision
(%)
Recall
(%)
AUC
(%)
3 Hidden Decod-Encod Sigmoid RMSProop 0.001 93.95 65.11 96.79 74.13 74.13
3 Hidden Encod-Decod Sigmoid Adagrad 0.1 93.14 58.53 96.39 70.68 70.68
4 Hidden Decod-Encod Sigmoid Adam 0.01 94.35 70.83 92.11 78.85 78.85
4 Hidden Encod-Decod Sigmoid RMSProop 0.01 92.33 55.81 89.22 70.23 70.23
5 Hidden Decod-Encod Sigmoid Adam 0.1 94.75 73.46 92.58 80.57 80.57
5 Hidden Encod-Decod Sigmoid Adam 0.1 94.75 72.34 94.61 79.08 79.08
6 Hidden Decod-Encod Sigmoid Adam 0.01 95.56 77.55 95.30 82.53 82.53
6 Hidden Encod-Decod Sigmoid Adam 0.1 95.56 79.24 91.96 85.52 85.52
7 Hidden Decod-Encod Sigmoid Adam 0.01 95.56 76.59 97.60 81.03 81.03
7 Hidden Encod-Decod Sigmoid Adagrad 0.1 92.74 57.14 92.53 70.46 70.46
8 Hidden Decod-Encod Sigmoid Adam 0.01 95.56 76.59 97.60 81.03 81.03
8 Hidden Encod-Decod Sigmoid RMSProop 0.01 92.33 53.65 92.01 68.73 68.73
Table 6. Testing conventional machine learning algorithms and deep neural networks
Layers
Activation
function
Optimizer
Learning
rate
Accuracy
(%)
F1 score
(%)
Precision
(%)
Recall
(%)
AUC
(%)
6 hidden encod-decod Sigmoid Adam 0.01 95.56 79.24 91.96 85.52 85.52
Random forest 92.00 91.06 92.94 88.17 0.00
4. CONCLUSION
Based on real-time data collected from patients with thoracic conditions, the crucial step in studying
survival data for thoracic patients is a pre-processing procedure that includes data filtering and cleansing,
feature selection, and SMOTE. The deep neural network data mining technique can be a valuable tool for
analyzing the survival data set of thoracic patients using a 10-fold cross-validation approach. The deep neural
network's six hidden layers can be modified with different variations, such as decoder-encoder, relu, and
sigmoid activation functions, optimizer Adam, a learning rate of 0.01, and a dropout rate of 0.2. By
implementing these modifications, the network can achieve an accuracy of 95.56%, precision of 91.96%, recall
of 85.52%, F1 score of 79.24%, and an accuracy of 85.52%. When comparing this value to conventional
machine learning algorithms, it becomes evident how superior it is. Based on this rationale, deep neural
network algorithms can produce superior outcomes in terms of accuracy, precision, recall, F1 score, and AUC
compared to machine learning approaches and related research. There are no definitive guidelines dictating the
learning rate, node, and dropout, although reducing these factors often enhances the accuracy of the network.
An increase in learning speed, node density, and dropout rates will lead to a decrease in network accuracy,
hence prolonging the process.
REFERENCES
[1] E. Dimakakou, H. J. Johnston, G. Streftaris, and J. W. Cherrie, “Exposure to environmental and occupational particulate air pollution
as a potential contributor to neurodegeneration and diabetes: A systematic review of epidemiological research,” International
Journal of Environmental Research and Public Health, vol. 15, no. 8. 2018, doi: 10.3390/ijerph15081704.
[2] M. G. Sholih, D. A. Perwitasari, R. Hendriani, H. Sukandar, and M. I. Barliana, “Risk factors of lung cancer in Indonesia: A
qualitative study,” Journal of Advanced Pharmacy Education and Research, vol. 9, no. 2, pp. 40–41, 2019.
[3] G. Haixiang, L. Yijing, J. Shang, G. Mingyun, H. Yuanyue, and G. Bing, “Learning from class-imbalanced data: Review of methods
and applications,” Expert Syst Appl, vol. 73, pp. 220–239, 2017, doi: 10.1016/j.eswa.2016.12.035.
Int J Artif Intell ISSN: 2252-8938 
Deep neural networks and conventional machine learning classifiers to … (Cucu Ika Agustyaningrum)
3693
[4] J. A. ALzubi, B. Bharathikannan, S. Tanwar, R. Manikandan, A. Khanna, and C. Thaventhiran, “Boosted neural network ensemble
classification for lung cancer disease diagnosis,” Applied Soft Computing Journal, vol. 80, pp. 579–591, 2019, doi:
10.1016/j.asoc.2019.04.031.
[5] M. Ziȩba, J. M. Tomczak, M. Lubicz, and J. Świa
̧ tek, “Boosted SVM for extracting rules from imbalanced data in application to
prediction of the post-operative life expectancy in the lung cancer patients,” Applied Soft Computing Journal, vol. 14, pp. 99–108,
2014, doi: 10.1016/j.asoc.2013.07.016.
[6] H. Kaur, H. S. Pannu, and A. K. Malhi, “A systematic review on imbalanced data challenges in machine learning: Applications and
solutions,” ACM Computing Surveys, vol. 52, no. 4, 2019, doi: 10.1145/3343440.
[7] L. Yu, R. Zhou, L. Tang, and R. Chen, “A DBN-based resampling SVM ensemble learning paradigm for credit classification with
imbalanced data,” Applied Soft Computing Journal, vol. 69, pp. 192–202, 2018, doi: 10.1016/j.asoc.2018.04.049.
[8] S. Dong, “Multi class SVM algorithm with active learning for network traffic classification,” Expert Systems with Applications, vol.
176, 2021, doi: 10.1016/j.eswa.2021.114885.
[9] P. Nanglia, S. Kumar, A. N. Mahajan, P. Singh, and D. Rathee, “A hybrid algorithm for lung cancer classification using SVM and
Neural Networks,” ICT Express, vol. 7, no. 3, pp. 335–341, 2021, doi: 10.1016/j.icte.2020.06.007.
[10] K. Pradhan and P. Chawla, “Medical internet of things using machine learning algorithms for lung cancer detection,” Journal of
Management Analytics, vol. 7, no. 4, pp. 591–623, 2020, doi: 10.1080/23270012.2020.1811789.
[11] S. Fotouhi, S. Asadi, and M. W. Kattan, “A comprehensive data level analysis for cancer diagnosis on imbalanced data,” Journal
of Biomedical Informatics, vol. 90, 2019, doi: 10.1016/j.jbi.2018.12.003.
[12] M. Lubicz, K. Pawelczyk, A. Rzechonek, and J. Kolodziej, “Thoracic surgery data,” UCI Machine Learning Repository, 2013.
Accessed: Apr. 05, 2023. [Online]. Available: http://archive.ics.uci.edu/dataset/277/thoracic+surgery+data
[13] L. Akter and M. M. Islam, “Hepatocellular carcinoma patient’s survival prediction using oversampling and machine learning
techniques,” in 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), IEEE,
2021, pp. 445–450, doi: 10.1109/ICREST51555.2021.9331108.
[14] I. H. Sarker, “Machine learning: Algorithms, real-world applications and research directions,” SN Computer Science, vol. 2, no. 3,
2021, doi: 10.1007/s42979-021-00592-x.
[15] M. R. Kabir, F. B. Ashraf, and R. Ajwad, “Analysis of different predicting model for online shoppers’ purchase intention from empirical
data,” 2019 22nd International Conference on Computer and Information Technology, 5 2019, doi: 10.1109/ICCIT48885.2019.9038521.
[16] A. R. Panhalkar and D. D. Doye, “Optimization of decision trees using modified African buffalo algorithm,” Journal of King Saud
University - Computer and Information Sciences, vol. 34, no. 8, 2022, doi: 10.1016/j.jksuci.2021.01.011.
[17] C. S. Lee, P. Y. S. Cheang, and M. Moslehpour, “Predictive analytics in business analytics: decision tree,” Advances in Decision
Sciences, vol. 26, no. 1, 2022, doi: 10.47654/V26Y2022I1P1-30.
[18] X. Geng, S. Wang, A. Ullah, G. Wu, and H. Wang, “Prediction of hardenability curves for non-boron steels via a combined machine
learning model,” Materials, vol. 15, no. 9, 2022, doi: 10.3390/ma15093127.
[19] J. Zhou, Y. Dai, M. Tao, M. Khandelwal, M. Zhao, and Q. Li, “Estimating the mean cutting force of conical picks using random
forest with salp swarm algorithm,” Results in Engineering, vol. 17, 2023, doi: 10.1016/j.rineng.2023.100892.
[20] R. Rastogi and S. Sharma, “Fast Laplacian twin support vector machine with active learning for pattern classification,” Applied Soft
Computing Journal, vol. 74, 2019, doi: 10.1016/j.asoc.2018.10.042.
[21] S. K. Sharma and X. Hoque, “Sentiment predictions using support vector machines for odd-even formula in Delhi,” International
Journal of Intelligent Systems and Applications, vol. 9, no. 7, 2017, doi: 10.5815/ijisa.2017.07.07.
[22] K. Shah, H. Patel, D. Sanghvi, and M. Shah, “A comparative analysis of logistic regression, random forest and KNN models for the
text classification,” Augmented Human Research, vol. 5, no. 1, 2020, doi: 10.1007/s41133-020-00032-0.
[23] S. Watanabe and H. Nishimori, “Fall lecture note on statistical learning theory,” in Lecture note for Tokyo Institute of Technology, 2016.
[24] S. A. Mohamed, M. A. Abdou, and A. A. Elsayed, “Residual information flow for neural machine translation,” IEEE Access, vol.
10, 2022, doi: 10.1109/ACCESS.2022.3220691.
[25] J. Xu, Y. Zhang, and D. Miao, “Three-way confusion matrix for classification: A measure driven view,” Information Sciences, vol.
507, 2020, doi: 10.1016/j.ins.2019.06.064.
[26] C. Sammut and G. I. Webb, Encyclopedia of Machine Learning and Data Mining, New York: Springer Science + Business Media, 2017.
[27] R. S. Wahono, “A systematic literature review of software defect prediction: research trends, datasets, methods and frameworks,”
Journal of Software Engineering, vol. 1, no. 1, pp. 1–16, 2015.
[28] K. Farhana, M. Rahman, and M. T. Ahmed, “An intrusion detection system for packet and flow based networks using deep neural
network approach,” International Journal of Electrical and Computer Engineering (IJECE), vol. 10, no. 5, pp. 5514-5525, 2020,
doi: 10.11591/ijece.v10i5.pp5514-5525.
[29] M. Al-Smadi, M. Hammad, Q. B. Baker, and A. Sa’ad, “A transfer learning with deep neural network approach for diabetic
retinopathy classification,” International Journal of Electrical and Computer Engineering (IJECE), vol. 11, no. 4, pp. 3492-3501,
2021, doi: 10.11591/ijece.v11i4.pp3492-3501.
BIOGRAPHIES OF AUTHORS
Cucu Ika Agustyaningrum is a faculty member and researcher at Universitas
Bina Sarana Informatika at the Faculty of Engineering and Informatics. Focusing research on
data mining, data science, and computer science. She can be contacted at email:
cucu.cuk@bsi.ac.id.
 ISSN: 2252-8938
Int J Artif Intell, Vol. 13, No. 3, September 2024: 3686-3694
3694
Yudi Ramdhani is faculty member and researcher from Department of
Information Systems at the Faculty of Creative Technology, Satu University. His
research focuses on data analysis using data mining and decision support systems.
He can be contacted at email: yudi.ramdhani@univ.satu.ac.id.
Doni Purnama Alamsyah is faculty member from Bina Nusantara University,
teaching at Department of Entrepreneurship in Customer Behavior study. He received a
Doctorate degree from Padjadjaran University in the field of Management Science. Currently
has an interest and research focus on consumer behavior and is very open to conducting
research collaboration. He can be contacted at email: doni.syah@binus.ac.id
Oda I. B. Hariyanto is a Professor of Culture and Food History, currently a
Faculty Member and Head of Department in Department of Tourism at Batam International
University. She has an interest in studying behavior and culture related to food in Indonesia.
She can be contacted at email: oda@uib.ac.id.

More Related Content

PDF
LIFE EXPECTANCY PREDICTION FOR POST THORACIC SURGERY
PDF
Lung Cancer Detection using Decision Tree Algorithm
PDF
Early Detection of Lung Cancer Using Neural Network Techniques
PDF
IRJET- Effect of Principal Component Analysis in Lung Cancer Detection us...
PDF
AN AUTOMATED FRAMEWORK FOR DIAGNOSING LUNGS RELATED ISSUES USING ML AND DATA ...
PPTX
Thesis Ivvvvvvvvvvvvvvvvvvvvvvvntro.pptx
DOC
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
LIFE EXPECTANCY PREDICTION FOR POST THORACIC SURGERY
Lung Cancer Detection using Decision Tree Algorithm
Early Detection of Lung Cancer Using Neural Network Techniques
IRJET- Effect of Principal Component Analysis in Lung Cancer Detection us...
AN AUTOMATED FRAMEWORK FOR DIAGNOSING LUNGS RELATED ISSUES USING ML AND DATA ...
Thesis Ivvvvvvvvvvvvvvvvvvvvvvvntro.pptx
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...

Similar to Deep neural networks and conventional machine learning classifiers to analyze thoracic survival data (20)

PDF
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...
PDF
IRJET- Intelligent Prediction of Lung Cancer Via MRI Images using Morphologic...
PDF
Accurate prediction of chronic diseases using deep learning algorithms
PDF
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
PDF
PREDICTION OF DISEASE WITH MINING ALGORITHMS IN MACHINE LEARNING
PPTX
Lung Conditions Prognosis Using CNN Model.pptx
PDF
Classification techniques using gray level co-occurrence matrix features for ...
PDF
Comparative analysis of explainable artificial intelligence models for predic...
PPTX
Prediction of time-to-event outcomes in diagnosing lung cancer.pptx
PDF
IRJET - Lung Disease Prediction using Image Processing and CNN Algorithm
PDF
Enhancing Pneumonia Detection: A Comparative Study of CNN, DenseNet201, and V...
PPTX
PPT.pptx
PDF
An Approach for Disease Data Classification Using Fuzzy Support Vector Machine
PDF
A Study on Cancer Perpetuation Using the Classification Algorithms
PDF
X-Ray Disease Identifier
PDF
A comparative analysis of classification techniques on medical data sets
PDF
IRJET- Detection of Breast Cancer using Machine Learning Techniques
PDF
Using Artificial Neural Networks to Detect Multiple Cancers from a Blood Test
PDF
Virtual Expert -Disease Prediction Using Machine Learning
PDF
vaagdevi paper.pdf
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...
IRJET- Intelligent Prediction of Lung Cancer Via MRI Images using Morphologic...
Accurate prediction of chronic diseases using deep learning algorithms
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
PREDICTION OF DISEASE WITH MINING ALGORITHMS IN MACHINE LEARNING
Lung Conditions Prognosis Using CNN Model.pptx
Classification techniques using gray level co-occurrence matrix features for ...
Comparative analysis of explainable artificial intelligence models for predic...
Prediction of time-to-event outcomes in diagnosing lung cancer.pptx
IRJET - Lung Disease Prediction using Image Processing and CNN Algorithm
Enhancing Pneumonia Detection: A Comparative Study of CNN, DenseNet201, and V...
PPT.pptx
An Approach for Disease Data Classification Using Fuzzy Support Vector Machine
A Study on Cancer Perpetuation Using the Classification Algorithms
X-Ray Disease Identifier
A comparative analysis of classification techniques on medical data sets
IRJET- Detection of Breast Cancer using Machine Learning Techniques
Using Artificial Neural Networks to Detect Multiple Cancers from a Blood Test
Virtual Expert -Disease Prediction Using Machine Learning
vaagdevi paper.pdf
Ad

More from IAESIJAI (20)

PDF
A comparative study of natural language inference in Swahili using monolingua...
PDF
Abstractive summarization using multilingual text-to-text transfer transforme...
PDF
Enhancing emotion recognition model for a student engagement use case through...
PDF
Automatic detection of dress-code surveillance in a university using YOLO alg...
PDF
Hindi spoken digit analysis for native and non-native speakers
PDF
Two-dimensional Klein-Gordon and Sine-Gordon numerical solutions based on dee...
PDF
Improved convolutional neural networks for aircraft type classification in re...
PDF
Primary phase Alzheimer's disease detection using ensemble learning model
PDF
Deep learning-based techniques for video enhancement, compression and restora...
PDF
Hybrid model detection and classification of lung cancer
PDF
Adaptive kernel integration in visual geometry group 16 for enhanced classifi...
PDF
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
PDF
Enhancing fall detection and classification using Jarratt‐butterfly optimizat...
PDF
Deep ensemble learning with uncertainty aware prediction ranking for cervical...
PDF
Event detection in soccer matches through audio classification using transfer...
PDF
Detecting road damage utilizing retinaNet and mobileNet models on edge devices
PDF
Optimizing deep learning models from multi-objective perspective via Bayesian...
PDF
Squeeze-excitation half U-Net and synthetic minority oversampling technique o...
PDF
A novel scalable deep ensemble learning framework for big data classification...
PDF
Exploring DenseNet architectures with particle swarm optimization: efficient ...
A comparative study of natural language inference in Swahili using monolingua...
Abstractive summarization using multilingual text-to-text transfer transforme...
Enhancing emotion recognition model for a student engagement use case through...
Automatic detection of dress-code surveillance in a university using YOLO alg...
Hindi spoken digit analysis for native and non-native speakers
Two-dimensional Klein-Gordon and Sine-Gordon numerical solutions based on dee...
Improved convolutional neural networks for aircraft type classification in re...
Primary phase Alzheimer's disease detection using ensemble learning model
Deep learning-based techniques for video enhancement, compression and restora...
Hybrid model detection and classification of lung cancer
Adaptive kernel integration in visual geometry group 16 for enhanced classifi...
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
Enhancing fall detection and classification using Jarratt‐butterfly optimizat...
Deep ensemble learning with uncertainty aware prediction ranking for cervical...
Event detection in soccer matches through audio classification using transfer...
Detecting road damage utilizing retinaNet and mobileNet models on edge devices
Optimizing deep learning models from multi-objective perspective via Bayesian...
Squeeze-excitation half U-Net and synthetic minority oversampling technique o...
A novel scalable deep ensemble learning framework for big data classification...
Exploring DenseNet architectures with particle swarm optimization: efficient ...
Ad

Recently uploaded (20)

PPTX
Spectroscopy.pptx food analysis technology
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
DOCX
The AUB Centre for AI in Media Proposal.docx
PPTX
Cloud computing and distributed systems.
PDF
cuic standard and advanced reporting.pdf
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
Empathic Computing: Creating Shared Understanding
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
Network Security Unit 5.pdf for BCA BBA.
PPTX
Big Data Technologies - Introduction.pptx
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
NewMind AI Weekly Chronicles - August'25-Week II
Spectroscopy.pptx food analysis technology
Spectral efficient network and resource selection model in 5G networks
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
20250228 LYD VKU AI Blended-Learning.pptx
Advanced methodologies resolving dimensionality complications for autism neur...
The AUB Centre for AI in Media Proposal.docx
Cloud computing and distributed systems.
cuic standard and advanced reporting.pdf
Reach Out and Touch Someone: Haptics and Empathic Computing
Dropbox Q2 2025 Financial Results & Investor Presentation
Empathic Computing: Creating Shared Understanding
Mobile App Security Testing_ A Comprehensive Guide.pdf
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Network Security Unit 5.pdf for BCA BBA.
Big Data Technologies - Introduction.pptx
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Encapsulation_ Review paper, used for researhc scholars
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Chapter 3 Spatial Domain Image Processing.pdf
NewMind AI Weekly Chronicles - August'25-Week II

Deep neural networks and conventional machine learning classifiers to analyze thoracic survival data

  • 1. IAES International Journal of Artificial Intelligence (IJ-AI) Vol. 13, No. 3, September 2024, pp. 3686~3694 ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i3.pp3686-3694  3686 Journal homepage: http://ijai.iaescore.com Deep neural networks and conventional machine learning classifiers to analyze thoracic survival data Cucu Ika Agustyaningrum1 , Yudi Ramdhani2 , Doni Purnama Alamsyah3 , Oda I. B. Hariyanto4 1 Faculty of Engineering and Informatics, Universitas Bina Sarana Informatika, Jakarta, Indonesia 2 Department of Information System, Faculty of Creative Technology, Satu University, Bandung, Indonesia 3 Department of Entrepreneurship, BINUS Business School, Bina Nusantara University, Jakarta, Indonesia 4 Department of Tourism, Batam International University, Batam, Indonesia Article Info ABSTRACT Article history: Received Aug 13, 2023 Revised Feb 20, 2024 Accepted Mar 15, 2024 Lung cancer is a prevalent global health concern and most prevalent malignancy in Indonesian hospitals. Following thoracic surgery, patients were categorized into two classes: individuals who experienced mortality within a year and those who achieved survival. Despite being about socks, the dataset for the deceased category consisted of 70 data samples, while the dataset for the final group comprised 400 samples. Data calculation involves the utilization of both deep neural networks and standard machine learning algorithms. The study use the Python programming language to evaluate the algorithms, and it measures their performance using metrics such as accuracy, F1-score, precision, recall, receiver operating characteristic (ROC), and area under curve (AUC). The test results indicate that the deep neural network method achieves an accuracy of 95.56%, an F1 score of 79.24%, a precision of 91.96%, a recall of 85.52%, and an AUC of 85.52%. This study suggests that utilizing deep neural network data mining techniques, specifically with a cross-validation fold of 10, variations of six hidden layer encoder-decoder, relu, sigmoid activation function, optimizer Adam, and learning rate of 0,01, dropout rate of 0,2. Employing the synthetic minority over-sampling technique data preprocessing method, can effectively analyze thoracic patient survival data sets. Keywords: Algorithm Area under curve Conventional machine learning Deep neural network Pythons Toraric survival data This is an open access article under the CC BY-SA license. Corresponding Author: Doni Purnama Alamsyah Department of Entrepreneurship, BINUS Business School, Bina Nusantara University Jakarta 11480, Indonesia Email: doni.syah@binus.ac.id 1. INTRODUCTION According to the Indonesian sample registration system (SRS) report from 2014, 10 diseases are responsible for most fatalities in Indonesia. Smoking cigarettes, both actively and passively, increases the risk of developing lung cancer. Air pollution and exposure to the workplace are additional risk factors [1]. Because of the high prevalence of smoking in society, lung cancer will become a public health issue in Indonesia. Lung cancer is the fourth most common cancer detected in hospitals in Indonesia [2]. Surgery, radiation, chemotherapy, immunotherapy, hormone therapy, and gene therapy can all be used to treat lung cancer. The airways are almost entirely located on the ribs. The thorax is essential in the breathing process [3]. Thoracic surgery is one of the most common treatments for lung cancer. However, thoracic surgery carries several risks and complications, including neurological disorders, infections, and life-threatening complications. Many complications occur in people with cardiovascular disease, including heart and blood vessel problems that can lead to strokes. As a result, the life expectancy following thoracic surgery is extremely low [4]. The research
  • 2. Int J Artif Intell ISSN: 2252-8938  Deep neural networks and conventional machine learning classifiers to … (Cucu Ika Agustyaningrum) 3687 focus on predicting thoracic patient survival using a computer-aided diagnosis (CAD) system. The analysis of the patient's condition before and after surgery, CAD can help predict the life expectancy of patients with lung cancer. Data collection for patients with lung cancer undergoing thoracic surgery [5]. Following thoracic surgery, patients were divided into two groups: those who died within a year (die) and those who were able to survive. A total of 70 samples of data were included in the dataset for the die class and 400 samples were included in the dataset for the latter group [5]. The performance of the prediction model for the survival of thoracic patients may be affected by the class imbalance issue in the algorithm for detecting lung cancer in the thoracic dataset [6]. Because there are more false positives than true positives in the thoracic surgery dataset, there is a class imbalance. Bootstrap aggregating is a classification improvement technique that uses a random combination of classifications on training and bagging datasets to reduce variance and avoid overfitting [7]. Previously, several researchers used various algorithms to predict the survival of thoracic patients in previous studies, including the multi-class support vector machine algorithm with active learning for network traffic classification [8]. Lung cancer classification using support vector machine and neural networks[9]. The medical internet of things uses machine learning [10]. Based on unbalanced data, a comprehensive data-level analysis for cancer diagnosis is performed [11]. Previous study indicates that the deep neural network technology has not been employed thus far, with only traditional machine learning methods being utilized. The objective of this study is to compare conventional machine learning and deep neural network algorithms in order to determine the most effective approach for assessing potential online shopper intentions using the Python programming language. The aim is to obtain the most accurate results for doctors and medical services in receiving thoracic patient survival data. 2. METHOD 2.1. Research stages The research on thoracic surgical data involved the utilization of a model constructed through a systematic research method. This approach encompassed various steps, including dataset acquisition, preprocessing, feature selection, smoothing, modeling, and evaluation. The research commences with gathering data from the UCI machine learning repository website [12]. Subsequently, the data undergoes a transformation to generate initial data for preprocessing. Following this, features are chosen utilizing the Python programming language. The data is then balanced using the synthetic minority over-sampling technique (SMOTE) method. Finally, the data is evaluated using conventional machine learning algorithms and deep neural networks through data cross-validation. During this modeling phase, two techniques will be evaluated: traditional machine learning and deep neural networks. Prior to inputing the modeling procedure, the data is gathered using the SMOTE method, a data augmentation methodology that ensures a balanced representation of both one-year survival and mortality possibilities. After a seamless procedure, the data undergoes cross-validation, a model validation approach employed to evaluate the accuracy of the analytic outcomes. Cross-validation is a technique used to do classification on preprocessed data. It involves partitioning the data into separate training and testing sets [13]. Once the data has undergone preprocessing, feature selection, smoothing modeling, and testing, the last stage involves analyzing the data produced by the Python programming language using traditional machine learning methods and deep neural networks as shown in Figure 1. 2.2. Application of research methods 2.2.1. Dataset The main goal of this research is to examine how research approaches can be practically applied in the setting of the thoracic surgery data set. To accomplish this goal, we painstakingly divided our study into six distinct but related steps, each of which added to the overall analytical process. The data utilized is derived from secondary sources, namely obtained from UCI machine learning. The dataset contains 470 data points, each with 17 parameters and 1 class. Its purpose is to analyze and determine the survival statistics for patients with thoracic conditions [12]. In order to overcome these challenges and offer more convenient, efficient, and precise outcomes, classification methods that exhibit superior levels of forecasting and precision can be employed. This study employed a comparison between the deep neural network algorithm and the classic machine learning algorithm to yield highly predictive and accurate findings. 2.2.2. Pre-processing A total of 470 patients' data were gathered during this phase of the trial, and they will be examined to predict the patients' survival and mortality rates after one year. The initial step in the data preparation stage is the selection or examination of attributes that will modify the data type. The data cleaning process commences once the data selection phase is completed. During this technique, our objective is to identify any absent values.
  • 3.  ISSN: 2252-8938 Int J Artif Intell, Vol. 13, No. 3, September 2024: 3686-3694 3688 Figure 1. Research stages 2.2.3. Feature selection During the pivotal feature selection stage, our primary goal is to identify the attributes that have the greatest influence on the dataset's dynamics. This determination is made using a thorough feature analysis, with a particular emphasis on the feature option. In accordance with our research objectives, we make the strategic decision to incorporate all attributes into our modelling process, apart from the class attribute. This deliberate choice is based on the expectation that including these attributes will significantly improve our model's performance metrics, which include accuracy, F1 score, precision, recall, and area under curve (AUC) measurement. This meticulous feature selection process is critical for fine-tuning our models and ensuring their efficacy in extracting valuable insights from the dataset. 2.2.4. Smooth Following the critical feature selection phase, the next step in our research is to address class imbalance within the dataset. This is accomplished by utilizing the SMOTE, a method designed to equalize class distribution. Simultaneously, a critical step is taken during the modelling stage to ensure the dependability and robustness of our analysis. Cross-validation techniques are used in both deep neural network models and traditional machine learning methods to achieve this. We reduce the risk of overfitting by splitting the data into training and testing sets in a systematic manner, allowing our models to generalize effectively to previously unseen data and providing a solid foundation for our research findings. 2.2.5. Modelling The proposed approach is used to carry out the prediction procedure at the modelling stage. The suggested method uses deep neural networks and conventional machine learning. Deep neural networks use three to eight layers and the Python programming language to assess the level of accuracy, F1 scores, precision, recall, and AUC from live thoracic patients. Conventional machine learning includes several methods, such as random forest, support vector machine, and logistics regression. − Machine learning: machine learning is the automatic recognition of significant patterns in data. Computers can learn things from people through machine learning. The computer can learn to process the data that is supplied to it without any explicit programming. Algorithms for machine learning are used to train computers to process data [14]. − Random forest: the random forest concept involves the generation of several decision trees that are correlated, where each decision tree functions as a collection of models. Every decision tree creates class predictions, and the ultimate decision is determined by the highest yield [15]. The random forest classification method employs a decision tree approach, where attributes are randomly selected at each node to decide categorization. The decision tree utilizes the largest number of votes it receives to classify data [16], [17]. Random forest employs a voting system, namely the highest count, to merge classifiers (CARTs) that are mutually independent and originate from the same distribution. This process results in classification predictions. Decreased correlation can diminish the impact of prediction errors in random forest, which is an inherent characteristic of this algorithm [18]. Random forest formula [19]:
  • 4. Int J Artif Intell ISSN: 2252-8938  Deep neural networks and conventional machine learning classifiers to … (Cucu Ika Agustyaningrum) 3689 =Entropy (Y)=- ∑ ⬚ i P (Y)log2 p(Y), (1) =Entropy (Y)- ∑ ⬚ V εvalues(a) |Yv| |Ya| Entropy (Yv). Information: Y=case set P(c|Y) is the ratio of grades in class Y to those in class c. Values(a)=Possible values when a is set. Yv=subclass of Y with class v, which is related to class a. Ya=All values that correspond to a. − Support vector machine: because it requires specific learning objectives during training, support vector machine is an integrated (supervised) classification method [20]. The following is the support vector machine formula [21]: similarity = ∑ f(Ti,Si) n i=1 Wi (2) Information: T: A new case S: cases in storage n: the number of attributes I: individual attribute between 1 and n f: TRIBUTE similarity function between case T and case S W: weight assigned to the i-th attribute − Logistics regression: the supervised classification algorithm includes logistic regression. This algorithm has grown in popularity in recent years, and its application has expanded significantly. This is a sigmoid curve. It is a subset of logistic regression. Next step is start with a simple linear regression formula to understand the mathematical version of the explanation [22]. y=b0+b1*x (3) Thus, it has now been subjected to the sigmoid function, and the result is provided by the formula. p= 1 1+e-y (4) Now that one formula has been substituted for another to get the value of y, we have our logistic regression formula. logistic(S)=b0+b1M1+b2M2+b3M3…bkMk… (5) where S denotes the likelihood of the presence of interesting features. The predictor values are M1, M2, M3, ... Mk. The intercepts of the model are b0, b1, b2, b3, ... bk. − Deep neural network: the deep neural network is a complex artificial neural network consisting of multiple layers. A deep neural network often consists of more than three layers, including an input layer, hidden layers, and an output layer. This architecture classifies it as a multilayer perceptron (MLP) with multiple layers. The depth is attributed to the multitude of layers. Deep learning refers to the process of learning in deep neural network [23]. Deep neural network belongs to the category of neural networks. Deep neural network consists of multiple hidden units that are interconnected across layers, but not inside each individual unit within a layer. This approach employs supervised training and shares a comparable structure with an artificial neural network. The deep learning approach can optimize numerous parameters for speech recognition. deep neural network possesses the ability to accurately identify and interpret speech, exhibiting enhanced efficiency in understanding various languages and dialects [24]. − Confusion matrix: a particularly helpful tool for examining bias in the recognition of tuples of various types is the confusion matrix [25]. This technique makes use of a matrix array with positive and negative classes [26]. The values of accuracy, precision, recall, and error rate will be obtained during the evaluation stage using the confusion matrix. The accuracy ratio is the number of correctly identified cases divided by the total number of cases. The following formula can be used to calculate accuracy:
  • 5.  ISSN: 2252-8938 Int J Artif Intell, Vol. 13, No. 3, September 2024: 3686-3694 3690 Accuracy= TP+TN TP+TN+FP+FN (6) The proportion of cases with a true positive result is referred to as precision. Precision= TP TP+FP (7) The proportion of correctly identified positive cases is referred to as recall. Recall= TP TP+FN (8) Information: The number of positive cases is designated as positive. The number of negative cases is considered positive. The number of negative cases is classified as such. The number of positive case fields is considered negative. 2.2.6. Evaluation During the evaluation phase, we conduct a comprehensive assessment of the model's performance using Python. This involves examining important metrics such accuracy, F1 score, precision, recall, and success or mistake rates. This study utilizes two separate algorithms: conventional machine learning and deep neural networks, enabling us to completely evaluate the usefulness and efficiency of the model. 2.3. Method of collecting data Primary data and secondary data are the two categories of data sources that are available for utilization in the process of data collection. Primary data are the ones that are obtained directly, whereas secondary data are the ones that are obtained from other researchers who have already carried out an investigation that is comparable. To complete this investigation, researchers looked at secondary data. Thoracic surgery data set results from UCI machine learning were used for the research. These results included 470 records with a total of 17 attributes and 1 class attribute [27], as stated in Table 1. Table 2 contains categories of user behavior analysis. Table 1. Description of the attributes of the survival dataset of patients with thorax No Attributes Data type Description 1 DGN Category Diagnosis: specific combination of ICD-10 codes for primary and secondary tumors and multiples if present (DGN3, DGN2, DGN4, DGN6, DGN5, DGN8, DGN1). 2 PRE4 Numeric Forced vital capacity - FVC 3 PRE5 Numeric Volume that has been exhaled at the end of the first second of a forced expiration, FEV 4 PRE6 Category Performance status, Zubrod scale (PRZ2, PRZ1, PRZ0) 5 PRE7 Category Pain before surgery (T, F) 6 PRE8 Category Hemoptysis before surgery (T, F) 7 PRE9 Category Dyspnea before surgery (T, F) 8 PRE10 Category Cough before surgery (T, F) 9 PRE11 Category Weakness before surgery (T, F) 10 PRE14 Category T in clinical TNM: original tumor size, from OC11 (smallest) to OC14 (largest) (OC11, OC14, OC12, OC13) 11 PRE17 Category Type 2 DM, diabetes mellitus (T, F) 12 PRE19 Category MI for up to six months (T, F) 13 PRE25 Category PAD: peripheral arterial disease (T, F) 14 PRE30 Category Smoking (T, F) 15 PRE32 Category Asthma (T,F) 16 AGE Numeric Age at surgery (numeric) 17 Risk1Y Category 1 year survival period - (T) true value if dead (T,F) Table 2. Numerical attributes and categories of user behavior analysis No Nama atribute Min. Value Max. Value SD 1 PRE4 1.44 6.30 0.87 2 PRE5 0.96 86.30 11.77 3 AGE 21.00 87.00 8.71
  • 6. Int J Artif Intell ISSN: 2252-8938  Deep neural networks and conventional machine learning classifiers to … (Cucu Ika Agustyaningrum) 3691 3. RESULTS AND DISCUSSION Research data on the survival of thoracic patients is secondary data, where the total data is 470 with 17 attributes and 1 class. In a previous study that discussed the survival of thoracic patients conducted by Ziȩba et al. [5] with focused on boosted support vector machine to prediction of the post-operative life expectancy in the lung cancer patients. This study aims to determine the best results using the boosted support vector machine algorithm to test each predicted classification for the survival of thoracic patients. In a study, the experimental process used two methods of class formation, namely the minority class and the majority class. The study's conclusions consist of calculations derived from both qualitative and quantitative processing methods, which are based on the proposed model. All available datasets were utilized for this investigation. This study employs deep neural networks and conventional machine learning approaches to predict data sets through trials and testing. The datasets utilized in this experiment have undergone preprocessing, feature selection, smoothing, and modeling using Python in Google Collaboratory. 3.1. Pre-processing step validation During the validation process of survival data for thoracic patients, we obtained a set of noteworthy findings during the course of our study, which involved meticulous data preprocessing, as shown in Table 3. When comparing the performance of conventional machine learning algorithms, the random forest algorithm stood out, boasting an impressive accuracy rate of 92 percent, an F1 score of 91.06%, a precision of 92.94%, and a recall rate of 88.17%. Surprisingly, these results outperformed those of the support vector machine algorithm and another regression logistics, albeit by only 1-2 percent. Table 3. Preprocessing value results of a conventional machine learning algorithm Model Accuracy (%) F1 Score (%) Precision (%) Recall (%) Random forest 92 91.06 92.94 88.17 Support vector machine 88 81.67 79.06 85.87 Logistic regression 87 - - - 3.2. Conventional machine learning algorithm model This study employed various conventional machine learning methods, such as random forest, support vector machine, and logistic regression. This work use conventional machine learning methods to compute accuracy, F1 score, precision, and recall metrics for analyzing survival data in thoracic patients. Table 4 presents the test results of many conventional machine learning techniques that have been employed, facilitating a comparison of the accuracy, F1 score, precision, and recall values. When the conventional machine learning algorithm is applied to the survival data of thoracic patients, the random forest algorithm is generated. This random forest algorithm has outstanding performance metrics, including a 92% accuracy rate, a 91.06% F1 score, a precision level of 92.94%, and a recall rate of 88.17%. Notably, these performance metrics outperform the support vector machine algorithm and another regression logistics. Table 4. The results of a comparison of thorax conventional machine learning algorithm Model Accuracy (%) F1 score (%) Precision (%) Recall (%) Random forest 92 91.06 92.94 88.17 Support vector machine 88 81.67 79.06 85.87 Logistic regression 87 - - - 3.3. Deep neural network algorithm Several techniques are used in this study, beginning with standard machine learning algorithms and progressing to a deep neural network algorithm with a variable number of hidden layers ranging from 3 to 8. We conducted extensive testing with multiple iterations of the deep neural network method, resulting in a thorough comparison of various performance metrics. Accuracy, F1 score, precision, recall, and AUC values are among the metrics presented in Table 5 for a thorough analysis and evaluation of our models' effectiveness. The test results, as presented in Table 5, demonstrate that the optimization of the deep neural network algorithm using different variations of 6 hidden layer decoders (encoder, activation function parameter sigmoid, optimizer Adam), with a learning rate of 0.01 and dropout rate of 0.2, yields higher values compared to the deep neural network algorithm with alternative variations. 3.4. Comparison model The results of each algorithm are meticulously documented in Table 6, revealing a compelling pattern of performance. Notably, deep neural network methods are significantly more effective for research purposes when compared to traditional machine learning algorithms. In line with previous studies, it has been
  • 7.  ISSN: 2252-8938 Int J Artif Intell, Vol. 13, No. 3, September 2024: 3686-3694 3692 demonstrated that deep neural networks improve the capabilities of intrusion detection systems and optimize classification algorithms [28], [29]. A thorough examination of the test results, including an examination of both the confusion matrix and the receiver operating characteristic (ROC) curve, elucidates the specific conditions under which this superiority is most apparent. The deep neural network algorithm was optimized, with variations of 6 hidden layers in a decoder-encoder architecture, an activation function parameter set to sigmoid, the Adam optimizer with a learning rate of 0.01, and a dropout rate of 0.2. This configuration produces impressive performance metrics, including an accuracy rate of 95.56%, an F1 score of 79.24%, a precision level of 91.96%, a recall rate of 85.52%, and an AUC of 85.52%. In stark contrast, the conventional machine learning model, specifically the random forest algorithm, achieves an accuracy rate of 92.00%, an F1 score of 91.06%, a precision of 92.94%, and a recall of 88.17%. These figures show a noticeable average difference in accuracy of 3.56%, an F1 score difference of 11.82%, a precision difference of 0.98%, and a recall difference of 2.65%, highlighting the significant performance gains offered by the deep neural network approach. Table 5. Results of the deep neural network algorithm's accuracy, precision, recall, F1-score, and AUC scores Layers Activation function Optimizer Learn rate Accuracy (%) F1 score (%) Precision (%) Recall (%) AUC (%) 3 Hidden Decod-Encod Sigmoid RMSProop 0.001 93.95 65.11 96.79 74.13 74.13 3 Hidden Encod-Decod Sigmoid Adagrad 0.1 93.14 58.53 96.39 70.68 70.68 4 Hidden Decod-Encod Sigmoid Adam 0.01 94.35 70.83 92.11 78.85 78.85 4 Hidden Encod-Decod Sigmoid RMSProop 0.01 92.33 55.81 89.22 70.23 70.23 5 Hidden Decod-Encod Sigmoid Adam 0.1 94.75 73.46 92.58 80.57 80.57 5 Hidden Encod-Decod Sigmoid Adam 0.1 94.75 72.34 94.61 79.08 79.08 6 Hidden Decod-Encod Sigmoid Adam 0.01 95.56 77.55 95.30 82.53 82.53 6 Hidden Encod-Decod Sigmoid Adam 0.1 95.56 79.24 91.96 85.52 85.52 7 Hidden Decod-Encod Sigmoid Adam 0.01 95.56 76.59 97.60 81.03 81.03 7 Hidden Encod-Decod Sigmoid Adagrad 0.1 92.74 57.14 92.53 70.46 70.46 8 Hidden Decod-Encod Sigmoid Adam 0.01 95.56 76.59 97.60 81.03 81.03 8 Hidden Encod-Decod Sigmoid RMSProop 0.01 92.33 53.65 92.01 68.73 68.73 Table 6. Testing conventional machine learning algorithms and deep neural networks Layers Activation function Optimizer Learning rate Accuracy (%) F1 score (%) Precision (%) Recall (%) AUC (%) 6 hidden encod-decod Sigmoid Adam 0.01 95.56 79.24 91.96 85.52 85.52 Random forest 92.00 91.06 92.94 88.17 0.00 4. CONCLUSION Based on real-time data collected from patients with thoracic conditions, the crucial step in studying survival data for thoracic patients is a pre-processing procedure that includes data filtering and cleansing, feature selection, and SMOTE. The deep neural network data mining technique can be a valuable tool for analyzing the survival data set of thoracic patients using a 10-fold cross-validation approach. The deep neural network's six hidden layers can be modified with different variations, such as decoder-encoder, relu, and sigmoid activation functions, optimizer Adam, a learning rate of 0.01, and a dropout rate of 0.2. By implementing these modifications, the network can achieve an accuracy of 95.56%, precision of 91.96%, recall of 85.52%, F1 score of 79.24%, and an accuracy of 85.52%. When comparing this value to conventional machine learning algorithms, it becomes evident how superior it is. Based on this rationale, deep neural network algorithms can produce superior outcomes in terms of accuracy, precision, recall, F1 score, and AUC compared to machine learning approaches and related research. There are no definitive guidelines dictating the learning rate, node, and dropout, although reducing these factors often enhances the accuracy of the network. An increase in learning speed, node density, and dropout rates will lead to a decrease in network accuracy, hence prolonging the process. REFERENCES [1] E. Dimakakou, H. J. Johnston, G. Streftaris, and J. W. Cherrie, “Exposure to environmental and occupational particulate air pollution as a potential contributor to neurodegeneration and diabetes: A systematic review of epidemiological research,” International Journal of Environmental Research and Public Health, vol. 15, no. 8. 2018, doi: 10.3390/ijerph15081704. [2] M. G. Sholih, D. A. Perwitasari, R. Hendriani, H. Sukandar, and M. I. Barliana, “Risk factors of lung cancer in Indonesia: A qualitative study,” Journal of Advanced Pharmacy Education and Research, vol. 9, no. 2, pp. 40–41, 2019. [3] G. Haixiang, L. Yijing, J. Shang, G. Mingyun, H. Yuanyue, and G. Bing, “Learning from class-imbalanced data: Review of methods and applications,” Expert Syst Appl, vol. 73, pp. 220–239, 2017, doi: 10.1016/j.eswa.2016.12.035.
  • 8. Int J Artif Intell ISSN: 2252-8938  Deep neural networks and conventional machine learning classifiers to … (Cucu Ika Agustyaningrum) 3693 [4] J. A. ALzubi, B. Bharathikannan, S. Tanwar, R. Manikandan, A. Khanna, and C. Thaventhiran, “Boosted neural network ensemble classification for lung cancer disease diagnosis,” Applied Soft Computing Journal, vol. 80, pp. 579–591, 2019, doi: 10.1016/j.asoc.2019.04.031. [5] M. Ziȩba, J. M. Tomczak, M. Lubicz, and J. Świa ̧ tek, “Boosted SVM for extracting rules from imbalanced data in application to prediction of the post-operative life expectancy in the lung cancer patients,” Applied Soft Computing Journal, vol. 14, pp. 99–108, 2014, doi: 10.1016/j.asoc.2013.07.016. [6] H. Kaur, H. S. Pannu, and A. K. Malhi, “A systematic review on imbalanced data challenges in machine learning: Applications and solutions,” ACM Computing Surveys, vol. 52, no. 4, 2019, doi: 10.1145/3343440. [7] L. Yu, R. Zhou, L. Tang, and R. Chen, “A DBN-based resampling SVM ensemble learning paradigm for credit classification with imbalanced data,” Applied Soft Computing Journal, vol. 69, pp. 192–202, 2018, doi: 10.1016/j.asoc.2018.04.049. [8] S. Dong, “Multi class SVM algorithm with active learning for network traffic classification,” Expert Systems with Applications, vol. 176, 2021, doi: 10.1016/j.eswa.2021.114885. [9] P. Nanglia, S. Kumar, A. N. Mahajan, P. Singh, and D. Rathee, “A hybrid algorithm for lung cancer classification using SVM and Neural Networks,” ICT Express, vol. 7, no. 3, pp. 335–341, 2021, doi: 10.1016/j.icte.2020.06.007. [10] K. Pradhan and P. Chawla, “Medical internet of things using machine learning algorithms for lung cancer detection,” Journal of Management Analytics, vol. 7, no. 4, pp. 591–623, 2020, doi: 10.1080/23270012.2020.1811789. [11] S. Fotouhi, S. Asadi, and M. W. Kattan, “A comprehensive data level analysis for cancer diagnosis on imbalanced data,” Journal of Biomedical Informatics, vol. 90, 2019, doi: 10.1016/j.jbi.2018.12.003. [12] M. Lubicz, K. Pawelczyk, A. Rzechonek, and J. Kolodziej, “Thoracic surgery data,” UCI Machine Learning Repository, 2013. Accessed: Apr. 05, 2023. [Online]. Available: http://archive.ics.uci.edu/dataset/277/thoracic+surgery+data [13] L. Akter and M. M. Islam, “Hepatocellular carcinoma patient’s survival prediction using oversampling and machine learning techniques,” in 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), IEEE, 2021, pp. 445–450, doi: 10.1109/ICREST51555.2021.9331108. [14] I. H. Sarker, “Machine learning: Algorithms, real-world applications and research directions,” SN Computer Science, vol. 2, no. 3, 2021, doi: 10.1007/s42979-021-00592-x. [15] M. R. Kabir, F. B. Ashraf, and R. Ajwad, “Analysis of different predicting model for online shoppers’ purchase intention from empirical data,” 2019 22nd International Conference on Computer and Information Technology, 5 2019, doi: 10.1109/ICCIT48885.2019.9038521. [16] A. R. Panhalkar and D. D. Doye, “Optimization of decision trees using modified African buffalo algorithm,” Journal of King Saud University - Computer and Information Sciences, vol. 34, no. 8, 2022, doi: 10.1016/j.jksuci.2021.01.011. [17] C. S. Lee, P. Y. S. Cheang, and M. Moslehpour, “Predictive analytics in business analytics: decision tree,” Advances in Decision Sciences, vol. 26, no. 1, 2022, doi: 10.47654/V26Y2022I1P1-30. [18] X. Geng, S. Wang, A. Ullah, G. Wu, and H. Wang, “Prediction of hardenability curves for non-boron steels via a combined machine learning model,” Materials, vol. 15, no. 9, 2022, doi: 10.3390/ma15093127. [19] J. Zhou, Y. Dai, M. Tao, M. Khandelwal, M. Zhao, and Q. Li, “Estimating the mean cutting force of conical picks using random forest with salp swarm algorithm,” Results in Engineering, vol. 17, 2023, doi: 10.1016/j.rineng.2023.100892. [20] R. Rastogi and S. Sharma, “Fast Laplacian twin support vector machine with active learning for pattern classification,” Applied Soft Computing Journal, vol. 74, 2019, doi: 10.1016/j.asoc.2018.10.042. [21] S. K. Sharma and X. Hoque, “Sentiment predictions using support vector machines for odd-even formula in Delhi,” International Journal of Intelligent Systems and Applications, vol. 9, no. 7, 2017, doi: 10.5815/ijisa.2017.07.07. [22] K. Shah, H. Patel, D. Sanghvi, and M. Shah, “A comparative analysis of logistic regression, random forest and KNN models for the text classification,” Augmented Human Research, vol. 5, no. 1, 2020, doi: 10.1007/s41133-020-00032-0. [23] S. Watanabe and H. Nishimori, “Fall lecture note on statistical learning theory,” in Lecture note for Tokyo Institute of Technology, 2016. [24] S. A. Mohamed, M. A. Abdou, and A. A. Elsayed, “Residual information flow for neural machine translation,” IEEE Access, vol. 10, 2022, doi: 10.1109/ACCESS.2022.3220691. [25] J. Xu, Y. Zhang, and D. Miao, “Three-way confusion matrix for classification: A measure driven view,” Information Sciences, vol. 507, 2020, doi: 10.1016/j.ins.2019.06.064. [26] C. Sammut and G. I. Webb, Encyclopedia of Machine Learning and Data Mining, New York: Springer Science + Business Media, 2017. [27] R. S. Wahono, “A systematic literature review of software defect prediction: research trends, datasets, methods and frameworks,” Journal of Software Engineering, vol. 1, no. 1, pp. 1–16, 2015. [28] K. Farhana, M. Rahman, and M. T. Ahmed, “An intrusion detection system for packet and flow based networks using deep neural network approach,” International Journal of Electrical and Computer Engineering (IJECE), vol. 10, no. 5, pp. 5514-5525, 2020, doi: 10.11591/ijece.v10i5.pp5514-5525. [29] M. Al-Smadi, M. Hammad, Q. B. Baker, and A. Sa’ad, “A transfer learning with deep neural network approach for diabetic retinopathy classification,” International Journal of Electrical and Computer Engineering (IJECE), vol. 11, no. 4, pp. 3492-3501, 2021, doi: 10.11591/ijece.v11i4.pp3492-3501. BIOGRAPHIES OF AUTHORS Cucu Ika Agustyaningrum is a faculty member and researcher at Universitas Bina Sarana Informatika at the Faculty of Engineering and Informatics. Focusing research on data mining, data science, and computer science. She can be contacted at email: cucu.cuk@bsi.ac.id.
  • 9.  ISSN: 2252-8938 Int J Artif Intell, Vol. 13, No. 3, September 2024: 3686-3694 3694 Yudi Ramdhani is faculty member and researcher from Department of Information Systems at the Faculty of Creative Technology, Satu University. His research focuses on data analysis using data mining and decision support systems. He can be contacted at email: yudi.ramdhani@univ.satu.ac.id. Doni Purnama Alamsyah is faculty member from Bina Nusantara University, teaching at Department of Entrepreneurship in Customer Behavior study. He received a Doctorate degree from Padjadjaran University in the field of Management Science. Currently has an interest and research focus on consumer behavior and is very open to conducting research collaboration. He can be contacted at email: doni.syah@binus.ac.id Oda I. B. Hariyanto is a Professor of Culture and Food History, currently a Faculty Member and Head of Department in Department of Tourism at Batam International University. She has an interest in studying behavior and culture related to food in Indonesia. She can be contacted at email: oda@uib.ac.id.