IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 13, No. 3, September 2024, pp. 3119~3128
ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i3.pp3119-3128
Journal homepage: http://ijai.iaescore.com
Comparing logistic regression and extreme gradient boosting on
student arguments
Tri Wahyuningsih, Danny Manongga, Irwan Sembiring
Department of Computer Science, Universitas Kristen Satya Wacana, Salatiga, Indonesia
Article Info ABSTRACT
Article history:
Received Jan 1, 2024
Revised Jan 27, 2024
Accepted Feb 10, 2024
Identifying the effectiveness level and quality of students' arguments poses a
challenge for teachers. This is due to the lack of techniques that can accurately
assist in identifying the effectiveness and quality of students' arguments. This
research aims to develop a model that can identify effectiveness categories in
students' arguments. The method employed involves the logistic
regression+XGBoost algorithm combined with separate implementations of
term frequency-inverse document frequency (TF-IDF) and CountVectorizer.
Student argument data were collected and processed using natural language
processing techniques. The research results indicate that TF-IDF outperforms
CountVectorizer in identifying effectiveness classes in student arguments, with an accuracy of
66.20%. The multi-output classification yielded an accuracy of 89.32% in the
initial testing, which further improved to 92.34% after implementing one-hot
encoding. A novel finding in this research is the superiority of TF-IDF as a
technique for identifying effectiveness classes in student arguments compared
to CountVectorizer. The implications of this research include the development
of a model that can assist teachers in identifying the effectiveness level of
students' arguments, thereby improving the quality of learning and enhancing
students' argumentative competence.
Keywords:
Argumentative competence
Effectiveness identification
Logistic regression
Multi-output classification
TF-IDF vs. CountVectorizer
This is an open access article under the CC BY-SA license.
Corresponding Author:
Tri Wahyuningsih
Department of Computer Science, Universitas Kristen Satya Wacana
Salatiga, Indonesia
Email: 982022001@student.uksw.edu
1. INTRODUCTION
The development of argumentation identification models poses a significant challenge in artificial
intelligence development, especially when dealing with the complexity and diversity of human language
structures [1]. Argument identification involves not only understanding individual words but also requires the
model's capacity to interpret context, capture nuanced meanings, and recognize relationships between parts of a
text. While many advancements have been made in natural language processing (NLP) and machine learning, the
development of argumentation identification models still faces several critical challenges [2]. One of them is the
diversity in how humans present arguments, ranging from linear structures to more concealed and implicit
deliveries. Models must be able to capture this depth and complexity to provide satisfactory results. Additionally,
in developing argumentation identification models, it is crucial to address biases that may emerge in the training
data. These biases can create less accurate or even harmful models when applied in real-world situations.
Therefore, a critical aspect of this model development is ensuring that the training data representation includes
diversity that reflects the reality of human argumentation without causing distortion or imbalance.
Logistic regression has been employed in previous studies [2], which found that it performs well. This
finding supports the choice of algorithm and also motivates a closer examination, in this study, of the
accuracy of the algorithm's performance. The method used in this study
is a combination of the logistic regression and XGBoost algorithms. Both algorithms were chosen because they
can measure the effectiveness and quality of student arguments with high accuracy [3]. Logistic regression is
used to predict the effectiveness class in student arguments, while XGBoost is used to enhance the prediction
results of the logistic regression algorithm [4]. The use of logistic regression algorithms along with XGBoost
as a method for evaluating student arguments is an important step in addressing teachers' challenges in
measuring the effectiveness and quality of student arguments.
The study compares the efficacy of the term frequency-inverse document frequency (TF-IDF) and
CountVectorizer methods in analyzing student arguments, aiming to aid teachers in evaluating argument quality.
TF-IDF weights each term by its frequency in a document while down-weighting terms that are common across
documents, whereas CountVectorizer converts text into raw term counts for algorithmic processing. Results
demonstrate TF-IDF's superiority in discerning argument effectiveness. This research innovates by addressing
the challenge of accurately assessing argument quality, offering a model to support teachers in this task where
current tools are lacking. Contributions include the development of a model for assessing argument quality, a
comparison of TF-IDF and CountVectorizer methods, and enhancements through preprocessing techniques such as
apostrophe handling and one-hot encoding.
This research endeavors to construct a model aimed at gauging the effectiveness and quality of student
arguments, utilizing machine learning techniques to aid educators in a more streamlined and unbiased
assessment process. The primary research inquiries revolve around the feasibility of employing the logistic
regression+XGBoost algorithm for this task, the comparison between logistic regression+XGBoost+TF-IDF
and logistic regression+XGBoost+CountVectorizer algorithms to determine superiority, and the potential
enhancements in model performance through text mining optimization techniques such as apostrophe handling and
one-hot encoding. Through the development and evaluation of this model, the study aims to provide educators
with a more objective and efficient means of evaluating student arguments, ultimately assisting in their
instructional endeavors.
2. RELATED RESEARCH
The previous section noted that the quality of students' arguments spans several levels. This section
organizes the related work around those areas and concentrates on approaches that share the objectives of this
research. It first presents text mining approaches to detecting text quality, then discusses work on argument
quality detection, and finally outlines work on the development and comparative analysis of machine learning
models.
2.1. Related research on the use of text mining to detect text quality
Text mining, as one branch of data mining, utilizes techniques and algorithms to analyze and extract
information from textual data [5]–[9]. The use of text mining in identifying text quality is highly beneficial and
relevant, considering that most information is currently conveyed through text [10]–[13]. Some previous
studies have employed text mining to measure text quality using methods such as sentiment analysis, opinion
mining, and text classification [6], [14]–[17]. The results of these studies vary greatly and still have ample
room for improvement [18]–[22]. Several criteria are used to measure text quality, such as accuracy, precision,
recall, and F1-score. These criteria provide an overview of how well text mining algorithms identify text
quality. In identifying text quality, text mining also employs techniques such as bag of words,
n-gram, TF-IDF, and word embedding. These techniques are used to transform text into numerical
representations that can be processed by machines [8], [10], [17]. Research related to text mining for identifying
text quality also takes into account factors such as context, slang, emotion, and sentiment. These factors
significantly influence measuring text quality, necessitating better algorithms for measurement. With research
related to the use of text mining to identify text quality, it is hoped that better and more accurate methods for
measuring text quality can be discovered.
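To illustrate how a bag-of-words count and a TF-IDF weight differ as numerical representations, the following is a minimal pure-Python sketch. The two example sentences are made up for illustration, the helper names `count_vectors` and `tfidf_vectors` are hypothetical, and the idf variant used here, ln(N/df), is only one common textbook form; it is not necessarily the variant used in this study.

```python
import math
from collections import Counter

def count_vectors(docs):
    """Bag-of-words: one raw term-count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

def tfidf_vectors(docs):
    """TF-IDF with idf = ln(N / df), an assumed textbook variant."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return vocab, [[c * w for c, w in zip(vec, idf)] for vec in counts]

# Made-up example sentences (not taken from the dataset).
docs = [
    "the face is a natural landform",
    "the face is evidence of life",
]
vocab, counts = count_vectors(docs)
_, weights = tfidf_vectors(docs)
the, natural = vocab.index("the"), vocab.index("natural")
print(counts[0][the], counts[0][natural])    # both terms occur once
print(weights[0][the], weights[0][natural])  # the shared term gets weight 0
```

In this sketch, a term that appears in every document receives a weight of zero, while a term unique to one document keeps a positive weight. Library implementations differ in detail; scikit-learn's `TfidfVectorizer`, for instance, applies a smoothed idf and L2 normalization by default, so its values would not match these.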
2.2. Related research on argument quality detection
Argument quality detection is a crucial field in communication and learning sciences. Several studies
have been conducted to explore how to identify argument quality. However, many of these studies use manual
methods that are time-consuming and inefficient. Therefore, there is a need to develop more effective and
efficient tools for identifying argument quality [23]–[26]. Some research on argument quality detection uses
machine learning algorithms such as logistic regression and XGBoost [24]. These algorithms can identify the
effectiveness and quality levels of arguments more accurately and efficiently than manual methods. One
interesting study in this field is the development of sentiment analysis models to identify the effectiveness and
quality of student arguments. In this research, machine learning algorithms are used to learn and identify
patterns in student arguments and determine the effectiveness and quality of the arguments. Despite many
studies related to argument quality detection, there are still many challenges to be addressed. One of the biggest
challenges is ensuring that the developed tools have high accuracy and can be widely used by teachers and
learners. Therefore, research related to argument quality detection is still evolving to address these challenges
and provide more effective and efficient solutions.
2.3. Related research on the development of machine learning models
The development of machine learning models for sentiment measurement systems is an area undergoing
significant development. Related studies aim to develop predictive and clustering models for categorizing
sentiment in text or paragraphs [27]–[29]. Some studies have been conducted to develop models for educational
aspects, such as assessing student responses [20], [30]–[33]. In some studies, models are also compared with other
text mining methods such as TF-IDF and CountVectorizer to evaluate their effectiveness in text mining [22].
Previous research results indicate that models have higher performance levels compared to other methods. There
is also research focusing on improving the accuracy and effectiveness of models [13], [18], [21]. This research
involves the development of better machine learning models and the application of optimization techniques to
improve the results of previous models. This study examines previous research [13], [14], [33], which used
a dataset similar to this study, namely "feedback prize-predicting effective arguments" on Kaggle. Nevertheless,
the goals of this research differ significantly from the works of [13], [14], [34]. Ding et al. [13] focused on
improving the accuracy and reliability of automatic assessment of the quality of student arguments by using
multi-task learning that combines three tasks: automatic span detection, type prediction, and quality
prediction. Meanwhile, Ding et al. [14] investigate the influence of prompts in identifying student arguments.
Furthermore, Wang et al. [15] also adopt a combination of logistic regression and XGBoost models but use a
different dataset, focusing on the detection of credit card fraud risk in the UCI public German credit data set
in a study published in 2018. Table 1 summarizes the related research considered in this study.
Table 1. Research summary
| Research | Dataset | Algorithm | Target | Year |
|---|---|---|---|---|
| Wang et al. [15] | UCI public Germany "the German credit data set (1994)" | Logistic regression+XGBoost | Credit fraud risk detection | 2018 |
| Rahman et al. [1] | Twitter "sentiment polarity datasets" | N-gram, TF-IDF, ensemble | Comparison | 2020 |
| Dogra et al. [18] | Bank report "banking financial news" | DistilBERT | Evaluation | 2021 |
| Yu et al. [17] | Questionnaire "Chengde Medical College, China" | TF-IDF + GRU neural networks | Model development | 2021 |
| Ding et al. [14] | Kaggle "feedback prize - predicting effective arguments" | k-means clustering & TF-IDF | Model development and evaluation | 2022 |
| Ding et al. [13] | Kaggle "feedback prize - predicting effective arguments" | Logistic regression, BERT, multi-task learning (MTL) | Comparison | 2023 |
| This research | Kaggle "feedback prize - predicting effective arguments" | Logistic regression, XGBoost, TF-IDF, CountVectorizer | Model development and evaluation | 2023 |
3. METHOD
3.1. Student argument dataset
The dataset, sourced from Kaggle as secondary data, contains argumentative essays written by
students in grades 6 to 12 in the United States, originating from Georgia State University (GSU). It includes
contextual information gathered from students' questionnaire responses on various essay topics, totaling 36,765
entries. Utilizing this dataset, a model was developed to assess the quality of students' arguments, leveraging
text and argument types for predictions and providing feedback for improvement. The choice of English for
the model's development stems from its widespread usage, particularly in scientific literature and NLP research,
benefiting from available resources and preprocessing techniques. However, efforts to extend the model's
applicability to other languages are underway to enhance inclusivity and broaden its impact. Table 2
describes the dataset columns, and Table 3 shows an example record.
Table 2. Dataset description
| Column | Explanation |
|---|---|
| discourse_id | Identifier of the discourse containing the argument. |
| essay_id | Identifier of the essay to which the discourse belongs. |
| discourse_text | Text of the argument itself. |
| discourse_type | Type of argument: evidence, rebuttal, counterclaim, lead, concluding statement, position, or claim. |
| discourse_effectiveness | Effectiveness rating of the argument, such as 1 for ineffective, 2 for adequate, or 3 for effective. |
Table 3. Example dataset
| discourse_id | essay_id | discourse_text | discourse_type | discourse_effectiveness |
|---|---|---|---|---|
| c22adee811b6 | 007ACE74B050 | I think that the face is a natural landform because there is no life on Mars that we have descovered yet | Claim | Adequate |
This research employs one dependent variable, namely discourse effectiveness, which is divided into
three categories: "effective," "adequate," and "ineffective." This dependent variable reflects the extent to which
a student's argument or text is considered effective in the context of discourse analysis. The independent variable
used in this study is a combination of discourse text and discourse type, with the rationale that discourse type is
combined with discourse text to clarify the type of argument. Discourse type encompasses seven categories:
lead, position, claim, counterclaim, rebuttal, evidence, and concluding statement. Each type of discourse
contributes uniquely to constructing the structure and content of an argument. In the context of this research,
this variable becomes the main focus in identifying patterns and characteristics of discourse types that can
influence discourse effectiveness. The comparison diagram of the distribution of each category is shown in
Figure 1, and words with the highest frequency in each category are visualized using a word cloud in Figure 2.
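One simple way to combine discourse type with discourse text into a single model input is to prefix the text with its type. The sketch below assumes this prefixing scheme, which the text does not specify; the function name `build_model_input` is hypothetical, and the record is abbreviated from the Table 3 example.

```python
def build_model_input(discourse_type: str, discourse_text: str) -> str:
    """Prefix the argument text with its discourse type (assumed scheme)."""
    return f"{discourse_type.strip().lower()} {discourse_text.strip()}"

# Abbreviated version of the example record from Table 3.
record = {
    "discourse_type": "Claim",
    "discourse_text": "I think that the face is a natural landform",
    "discourse_effectiveness": "Adequate",
}

features = build_model_input(record["discourse_type"], record["discourse_text"])
label = record["discourse_effectiveness"]  # dependent variable
print(features)
```

With this scheme the discourse type becomes an ordinary token that a vectorizer can weight like any other word. Encoding the type as a separate feature would be an equally plausible alternative.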
Wordcloud is one of the crucial visualization techniques in text mining, particularly for datasets with
two classes. It provides a visual representation of the most commonly used words in each class, aiding in the
identification of key words that may differentiate the two classes. Therefore, wordclouds can expedite and
simplify the data exploration process, helping researchers understand the characteristics of each class more
effectively. Additionally, wordclouds assist researchers in selecting the most relevant and important features
when constructing a classification model. Consequently, wordclouds contribute to enhancing the quality and
accuracy of text mining analysis on datasets with two classes.
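The per-class word frequencies that underlie a word cloud can be sketched with the Python standard library. The texts and labels below are invented for illustration, and `top_words_per_class` is a hypothetical helper; rendering the actual cloud would need a plotting library, which is outside this sketch.

```python
from collections import Counter, defaultdict

def top_words_per_class(texts, labels, k=3):
    """Return the k most frequent words for each class label."""
    per_class = defaultdict(Counter)
    for text, label in zip(texts, labels):
        per_class[label].update(text.lower().split())
    return {label: counter.most_common(k) for label, counter in per_class.items()}

# Invented toy data (not taken from the dataset).
texts = [
    "voting matters because voting is a right",
    "voting helps",
    "cars are bad",
    "cars cars everywhere",
]
labels = ["Effective", "Effective", "Ineffective", "Ineffective"]

result = top_words_per_class(texts, labels)
for label, words in result.items():
    print(label, words)
```

Words that rank high in one class but not in the others are candidate features for distinguishing the classes.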
Figure 1. Class type distribution
Figure 2. Wordcloud on each type and category
3.2. Research flow
The study followed a systematic research flow. Data collection used a Kaggle dataset of argumentative essays
whose elements were evaluated by experts and categorized into argument types and effectiveness classes.
Preprocessing involved data cleaning, transformation, integration, and reduction to enhance efficiency,
followed by word and text preprocessing using apostrophe handling and one-hot encoding for better data
preparation [30]–[32], [35], [36]. Model evaluation tested three models: logistic regression combined with
XGBoost, and the same combination with TF-IDF or CountVectorizer features to weight words. The goal was to
identify the most accurate model for predicting data classes. The evaluation validated analysis accuracy
through classification, cluster similarity, and topic representativeness, and a multi-output classification
model was used to select the best-performing model and test multi-output predictions. Finally, the conclusion
stage presents the results of the text mining analysis and summarizes the findings of this research in a clear
and understandable manner, whether in the form of tables, graphs, or narratives. The steps of this research are
illustrated in Figure 3.
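The two preprocessing steps named above can be sketched as follows. This assumes that apostrophe handling means expanding contractions, which is one common reading and not confirmed by the text; the contraction table and the function names `expand_apostrophes` and `one_hot` are hypothetical. The category list is the seven discourse types given in section 3.1.

```python
import re

# Hypothetical contraction table; a practical one would be much longer.
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot"}

# The seven discourse types listed in section 3.1.
DISCOURSE_TYPES = [
    "lead", "position", "claim", "counterclaim",
    "rebuttal", "evidence", "concluding statement",
]

def expand_apostrophes(text):
    """Lowercase the text and expand known contractions."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    for short, full in CONTRACTIONS.items():
        text = re.sub(rf"\b{re.escape(short)}\b", full, text)
    return text

def one_hot(discourse_type):
    """Encode a discourse type as a 0/1 vector with a single 1."""
    key = discourse_type.lower()
    return [1 if key == name else 0 for name in DISCOURSE_TYPES]

print(expand_apostrophes("It's clear we don't know"))
print(one_hot("Claim"))
```

One-hot encoding avoids implying an order among categories, which an integer code such as 1 to 7 would do.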
Figure 3. Research step
4. RESULTS AND DISCUSSION
4.1. Model development
The comparison of the developed models is focused on identifying answers to the research question.
This study investigates how efficiently students' arguments can be detected using:
a) Logistic regression+XGBoost
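Combining logistic regression with XGBoost using (1):
$\hat{y}_{\mathrm{KomA}} = \alpha \hat{y}_{\mathrm{LR}} + (1 - \alpha) \hat{y}_{\mathrm{XGB}}$ (1)
where $\hat{y}_{\mathrm{LR}}$ is the prediction from logistic regression, $\hat{y}_{\mathrm{XGB}}$ is the prediction from the XGBoost model, and
$\alpha$ is the weight assigned to the logistic regression prediction.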
The weight α will influence the extent to which the model wants to give preference to the predictions
from each model. If α=0, the model will only use predictions from the XGBoost model, and if α=1, the
model will only use predictions from logistic regression. If α is between 0 and 1, the model will assign
different weights to both predictions.
b) Logistic regression+XGBoost+TF-IDF
The combination of logistic regression with XGBoost and TF-IDF is given by (2):
$\hat{y}_{\mathrm{KomB}} = \alpha \hat{y}_{\mathrm{LR}} + \beta \hat{y}_{\mathrm{XGB}} + (1 - \alpha - \beta) \hat{y}_{\mathrm{TFIDF}}$ (2)
where $\alpha$ is the weight assigned to the predictions from the logistic regression model, $\beta$ is the weight
assigned to the predictions from the XGBoost model, and $(1 - \alpha - \beta)$ is the weight assigned to the
predictions $\hat{y}_{\mathrm{TFIDF}}$ from the TF-IDF method.
The weights $\alpha$, $\beta$, and $(1 - \alpha - \beta)$ will influence how much the model prefers each prediction source.
If $\alpha = 1$ (with $\beta = 0$), only predictions from logistic regression will be used; if $\beta = 1$ (with $\alpha = 0$), only
predictions from XGBoost will be used; if $(1 - \alpha - \beta) = 1$, only predictions from TF-IDF will be used.
c) Logistic regression+XGBoost+CountVectorizer
The combination of logistic regression with XGBoost and CountVectorizer is achieved using (3):
$\hat{y}_{\mathrm{KomC}} = \alpha \hat{y}_{\mathrm{LR}} + \beta \hat{y}_{\mathrm{XGB}} + (1 - \alpha - \beta) \hat{y}_{\mathrm{CV}}$ (3)
where $\alpha$ is the weight assigned to the predictions from the logistic regression model, $\beta$ is the weight
assigned to the predictions from the XGBoost model, and $(1 - \alpha - \beta)$ is the weight assigned to the
predictions $\hat{y}_{\mathrm{CV}}$ from the CountVectorizer method.
The weights $\alpha$, $\beta$, and $(1 - \alpha - \beta)$ will influence the extent to which the model prefers each source of
predictions. If $\alpha = 1$ (with $\beta = 0$), then only predictions from logistic regression will be used. If $\beta = 1$
(with $\alpha = 0$), then only predictions from XGBoost will be used. If $(1 - \alpha - \beta) = 1$, then only predictions
from CountVectorizer will be used.
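As a minimal sketch of the weighted combination in (1), the function below blends two per-class probability vectors. It assumes that the two predictions are class-probability vectors over the three effectiveness classes, which the text does not state explicitly. The name `blend_probabilities` and the example probabilities are hypothetical.

```python
def blend_probabilities(p_lr, p_xgb, alpha):
    """Convex combination alpha * p_lr + (1 - alpha) * p_xgb, per class."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_lr, p_xgb)]

CLASSES = ["ineffective", "adequate", "effective"]

# Made-up class probabilities for a single argument.
p_lr = [0.2, 0.5, 0.3]   # hypothetical logistic regression output
p_xgb = [0.1, 0.3, 0.6]  # hypothetical XGBoost output

blended = blend_probabilities(p_lr, p_xgb, alpha=0.5)
predicted = CLASSES[blended.index(max(blended))]
print(blended, predicted)
```

Setting `alpha=1.0` returns the logistic regression probabilities unchanged and `alpha=0.0` returns the XGBoost probabilities, matching the limiting cases described for (1). The three-term forms in (2) and (3) extend the same idea with a second weight.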
This research discusses two methods for determining the parameters α and β in the composite models:
‒ Manual determination involves optimization techniques like grid or random search across the parameter space,
offering insights into parameter effects and potentially utilizing complex optimization algorithms.
‒ Automatic tuning with machine learning utilizes algorithms such as random search or Bayesian
optimization.
While manual determination offers deeper insights into parameter variations, the choice depends on research
goals and resource availability. The modeling aims to analyze machine learning algorithm performance across
different feature space configurations, as depicted in Table 4, which provides clear experimental labels for
result differentiation and discussion.
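A grid search over the weight α can be sketched as follows. The validation probabilities and labels are invented, the helper names are hypothetical, and selecting α by accuracy on a held-out validation set is an assumption about the procedure, not a description of the tuning actually performed in this study.

```python
def accuracy(probabilities, labels):
    """Fraction of rows whose highest-probability class equals the label."""
    hits = sum(1 for p, y in zip(probabilities, labels) if p.index(max(p)) == y)
    return hits / len(labels)

def grid_search_alpha(p_lr, p_xgb, labels, steps=10):
    """Try alpha = 0, 1/steps, ..., 1 and keep the first best value."""
    best_alpha, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        alpha = i / steps
        blended = [
            [alpha * a + (1 - alpha) * b for a, b in zip(row_lr, row_xgb)]
            for row_lr, row_xgb in zip(p_lr, p_xgb)
        ]
        acc = accuracy(blended, labels)
        if acc > best_acc:
            best_alpha, best_acc = alpha, acc
    return best_alpha, best_acc

# Invented validation data: three arguments, three classes (0, 1, 2).
labels = [0, 1, 2]
p_lr = [[0.5, 0.4, 0.1], [0.5, 0.4, 0.1], [0.2, 0.3, 0.5]]
p_xgb = [[0.3, 0.5, 0.2], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]

best_alpha, best_acc = grid_search_alpha(p_lr, p_xgb, labels)
print(best_alpha, best_acc)
```

In this toy data each model alone misclassifies one argument, while an intermediate weight classifies all three correctly. The weights should be chosen on validation data, not on the test split, so that the reported test accuracy stays unbiased.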
Table 4. The experimental setup is based on the algorithm used
| Experiment label | Model |
|---|---|
| A | Logistic regression+XGBoost |
| B | Logistic regression+XGBoost+TF-IDF |
| C | Logistic regression+XGBoost+CountVectorizer |
For the purpose of machine learning education, this research utilizes Python, a programming language
that supports data analysis and mining using various machine learning algorithms. The objective of employing
multiple machine learning algorithms is to examine the consistency of the acquired knowledge. The study
conducts three experiments (A, B, and C), employing three classifications for each. The next section presents
experimental findings for all label combinations and experimental classifiers, followed by a conclusive
discussion on the research findings.
4.2. Model training and testing
In this study, the initial dataset consists of 36,765 annotated argument records ready for
analysis. To optimize computational performance and provide flexibility to model users, the initial dataset is
divided into two parts: training data and testing data. This division is done in a 70:30 ratio, where 70% of the
total initial data (25,735 records) is used as training data, while 30% (11,030 records) is used as testing data.
The split is performed automatically and at random, so that model users can easily change the test data
without rearranging the training data, thereby facilitating the testing and evaluation
process of the model. In this way, the research can test and evaluate the model's performance on previously
unseen data, maintaining the integrity of the analysis results and checking that the model
generalizes well to new data.
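The reported split sizes can be reproduced with a seeded random shuffle, as in the sketch below. The function name `split_indices` and the seed value are hypothetical; the text does not say which tool or seed was used.

```python
import random

def split_indices(n_records, train_percent=70, seed=42):
    """Shuffle record indices and split them into train and test parts."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = n_records * train_percent // 100
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = split_indices(36_765)
print(len(train_idx), len(test_idx))  # 25735 11030
```

Because 70% of 36,765 is 25,735.5, one record has to be assigned by rounding. Rounding the training part down gives the 25,735 and 11,030 reported above.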
4.3. Model evaluation
Table 5 summarizes the results of the three experiments in this research. Experiment A uses a combination of
logistic regression and XGBoost models and achieves an accuracy of 58.29%. Precision, the proportion of
records assigned to a class that truly belong to it, is 65.05%. Recall, the proportion of records in a class
that the model correctly identifies, is 57.93%. The F-measure, which combines precision and recall, is 61.28%.
Experiment B uses a combined model of logistic regression, XGBoost, and TF-IDF features. It achieves an
accuracy of 66.20%, with precision at 75.44%, recall at 70.12%, and an F-measure of 72.31%. Experiment C also
uses a combined logistic regression and XGBoost model, this time with CountVectorizer features. Its accuracy
is 66.02%, with precision at 64.11%, recall at 87.32%, and an F-measure of 74.12%.
The evaluation metrics in the table give an overview of the performance of the three experiments. Accuracy
measures the overall agreement of model predictions with the actual labels. Experiment B shows the highest
accuracy at 66.20%, marginally ahead of experiment C at 66.02%. Precision is highest in experiment B at 75.44%,
and recall is highest in experiment C at 87.32%. The F-measure is highest in experiment C at 74.12%, with
experiment B close behind at 72.31%; experiment B, however, shows the smaller gap between precision and recall.
Overall, experiments B and C perform better than experiment A: experiment C excels in recall, while experiment
B demonstrates a better balance between precision and recall. The higher accuracy in both experiments indicates
that adding TF-IDF or CountVectorizer features improves classification of the dataset.
Table 5. Summary of model accuracy results
| Metric | Experiment A (Logistic regression+XGBoost) | Experiment B (Logistic regression+XGBoost+TF-IDF) | Experiment C (Logistic regression+XGBoost+CountVectorizer) |
|---|---|---|---|
| Accuracy (%) | 58.29 | 66.20 | 66.02 |
| Precision (%) | 65.05 | 75.44 | 64.11 |
| Recall (%) | 57.93 | 70.12 | 87.32 |
| F-Measure (%) | 61.28 | 72.31 | 74.12 |
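The metrics in Table 5 can be computed from true and predicted labels as sketched below. The sketch assumes the F-measure is the F1 score, the harmonic mean of precision and recall, which the text does not state explicitly. The label lists are invented and the function names are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def per_class_scores(y_true, y_pred, label):
    """Precision, recall, and F1 for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)

# Invented labels for illustration.
y_true = ["effective", "adequate", "adequate", "ineffective"]
y_pred = ["effective", "adequate", "effective", "ineffective"]
print(per_class_scores(y_true, y_pred, "effective"))

# Check against experiment A in Table 5 (values in percent).
print(round(f_measure(65.05, 57.93), 2))  # 61.28
```

The harmonic mean of the precision and recall reported for experiment A reproduces its F-measure. For experiments B and C the same calculation gives values close to, but not identical with, the reported F-measures. Such a gap can arise when scores are averaged per class, although the averaging scheme is not stated in the text.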
4.4. Multi-output classification
Multi-output classification is a text classification method used to categorize text into multiple different
classes simultaneously. This method is useful for identifying more than one attribute or category within a text,
providing more comprehensive and accurate information. For example, in sentiment analysis research, multi-
output classification can be employed to classify a text as positive, negative, or neutral, while also identifying
the topics or themes discussed in the text. With multi-output classification, we can analyze text in more detail
and obtain more useful information for specific purposes. In this study, model testing uses as its reference
testing data in which each argument has already been assigned a class. The goal
of the multi-class, multi-output setup is to use this testing data as a reference for classifying the quality
classes based on the text, resulting in primary classes such as effective, adequate, or ineffective. The testing
data and results used in this study can be seen in Figures 4(a) to 4(c).
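Accuracy for a multi-output classifier can be reported per output or as an exact match across all outputs. The sketch below assumes two outputs per argument, discourse type and effectiveness class; the text does not specify which outputs were predicted or which accuracy definition was used, and the labels are invented.

```python
def multi_output_accuracy(y_true, y_pred):
    """Return (per-output accuracies, exact-match accuracy)."""
    n = len(y_true)
    n_outputs = len(y_true[0])
    per_output = [
        sum(1 for t, p in zip(y_true, y_pred) if t[j] == p[j]) / n
        for j in range(n_outputs)
    ]
    # Exact match: every output of a record must be correct.
    exact_match = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    return per_output, exact_match

# Invented (discourse type, effectiveness) pairs.
y_true = [("claim", "adequate"), ("evidence", "effective"),
          ("lead", "ineffective"), ("claim", "effective")]
y_pred = [("claim", "adequate"), ("evidence", "adequate"),
          ("lead", "ineffective"), ("position", "effective")]

per_output, exact_match = multi_output_accuracy(y_true, y_pred)
print(per_output, exact_match)
```

Exact-match accuracy is never higher than the lowest per-output accuracy, so stating which definition is used matters when comparing results.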
In the initial test results, the model accuracy was approximately 89.32%, which corresponds to
10.68% prediction errors out of 11,030 tested records. After the one-hot encoding
technique was applied in the second test, the model's accuracy
reached 92.34%. Therefore, it can be concluded that the model used with TF-IDF and one-hot
encoding techniques is highly effective in classifying student arguments with a high level of accuracy.
Regarding the use of different data with this model, integrating additional data is feasible,
especially if the new dataset is consistent with the same target labels. A model that has been trained with
the original dataset can be updated or fine-tuned with additional datasets to enhance its classification
capabilities. This approach leverages the diversity of information that may exist in new datasets, enriching the
model's understanding of variations in the data. It is expected that the combination of different datasets can
help the model improve its generalization and performance. However, it is crucial to emphasize the importance
of maintaining consistency in annotations or target labels across the entire dataset so that the model can produce
consistent and reliable classification results. By merging different data, it is hoped that the model can achieve
better robustness and handle variations that may arise in the context of classification tasks.
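The error rate and the approximate number of misclassified records implied by the two reported accuracies can be checked with simple arithmetic. The function name `implied_errors` is hypothetical, and the record counts are rounded estimates derived from the reported percentages, not figures taken from the study.

```python
def implied_errors(accuracy_percent, n_records):
    """Error rate (percent) and rounded error count implied by an accuracy."""
    error_percent = round(100 - accuracy_percent, 2)
    return error_percent, round(n_records * error_percent / 100)

before = implied_errors(89.32, 11_030)  # initial test
after = implied_errors(92.34, 11_030)   # after one-hot encoding
print(before)  # (10.68, 1178)
print(after)   # (7.66, 845)
```

An accuracy of 89.32% therefore corresponds to an error rate of 10.68%, or roughly 1,178 of the 11,030 test records, and the improvement to 92.34% corresponds to roughly 333 fewer errors.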
Figure 4. Data testing class prediction result: (a) actual data, (b) before using one-hot encoding, and
(c) after using one-hot encoding
5. CONCLUSION
The results of developing the model using logistic regression and XGBoost indicate that both methods
can be used to predict a text category with fairly high accuracy. The comparison of the TF-IDF and
CountVectorizer techniques shows that TF-IDF is slightly superior in terms of prediction accuracy (66.20%
versus 66.02%). In the initial test, the model achieved an accuracy of approximately 89.32%, which corresponds
to a 10.68% prediction error out of 11,030 tested records. In the second test, using the one-hot encoding
technique, the model reached an accuracy of 92.34%. Therefore, it can be concluded that the model used with
TF-IDF and one-hot encoding techniques is highly effective in classifying student arguments. Using
TF-IDF, the model can predict a text category more accurately because it considers the importance of a word
in a document. This can reduce bias compared to using CountVectorizer, which only counts the occurrences
of words in a document. The developed model offers educators a new means of
evaluating and categorizing classes of arguments written by students. This work can be further developed by
using the data from the model to test patterns of effectiveness and types of arguments. For future research,
other classification algorithms can also be applied to the same model to test their performance
with different classifications.
REFERENCES
[1] S. S. M. M. Rahman, K. B. M. B. Biplob, M. H. Rahman, K. Sarker, and T. Islam, “An investigation and evaluation of N-gram, TF-
IDF and ensemble methods in sentiment classification,” in Cyber Security and Computer Science, Cham: Springer, 2020, pp. 391–
402, doi: 10.1007/978-3-030-52856-0_31.
[2] G. N. Harywanto, R. Siautama, A. C. I. Ardison, and D. Suhartono, “Extractive hotel review summarization based on TF/IDF and
adjective-noun pairing by considering annual sentiment trends,” Procedia Computer Science, vol. 179, pp. 558–565, 2021, doi:
10.1016/j.procs.2021.01.040.
[3] N. C. Dang, M. N. Moreno-García, and F. De la Prieta, “Sentiment analysis based on deep learning: a comparative study,”
Electronics, vol. 9, no. 3, pp. 1–29, 2020, doi: 10.3390/electronics9030483.
[4] R. Ahuja, A. Chug, S. Kohli, S. Gupta, and P. Ahuja, “The impact of features extraction on the sentiment analysis,” Procedia
Computer Science, vol. 152, pp. 341–348, 2019, doi: 10.1016/j.procs.2019.05.008.
[5] H. Liu, X. Chen, and X. Liu, “A study of the application of weight distributing method combining sentiment dictionary and TF-IDF
for text sentiment analysis,” IEEE Access, vol. 10, pp. 32280–32289, 2022, doi: 10.1109/ACCESS.2022.3160172.
[6] M. Kamyab, G. Liu, and M. Adjeisah, “Attention-based CNN and Bi-LSTM model based on TF-IDF and glove word embedding
for sentiment analysis,” Applied Sciences, vol. 11, no. 23, pp. 1–17, 2021, doi: 10.3390/app112311255.
[7] M. Chiny, M. Chihab, Y. Chihab, and O. Bencharef, “LSTM, VADER and TF-IDF based hybrid sentiment analysis model,”
International Journal of Advanced Computer Science and Applications, vol. 12, no. 7, pp. 265–275, 2021, doi:
10.14569/IJACSA.2021.0120730.
[8] R. S. Patil and S. R. Kolhe, “Supervised classifiers with TF-IDF features for sentiment analysis of Marathi tweets,” Social Network
Analysis and Mining, vol. 12, no. 1, 2022, doi: 10.1007/s13278-022-00877-w.
[9] S. Fransiska and A. Irham Gufroni, “Sentiment analysis provider by. U on Google Play Store reviews with TF-IDF and support
vector machine (SVM) method,” Scientific Journal of Informatics, vol. 7, no. 2, pp. 2407–7658, 2020.
[10] A. Mee, E. Homapour, F. Chiclana, and O. Engel, “Sentiment analysis using TF–IDF weighting of UK MPs’ tweets on Brexit,”
Knowledge-Based Systems, vol. 228, 2021, doi: 10.1016/j.knosys.2021.107238.
[11] F. M. Al-kharboush and M. A. Al-Hagery, “Features extraction effect on the accuracy of sentiment classification using ensemble
models,” International Journal of Science and Research (IJSR), vol. 10, no. 3, pp. 228–231, 2021, doi: 10.21275/SR21303123511.
[12] K. S. Kalaivani, S. Uma, and C. S. Kanimozhiselvi, “A review on feature extraction techniques for sentiment classification,” in
2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC), 2020, pp. 679–683, doi:
10.1109/ICCMC48092.2020.ICCMC-000126.
[13] Y. Ding, M. Bexte, and A. Horbach, “Score it all together: a multi-task learning study on automatic scoring of argumentative
essays,” in Findings of the Association for Computational Linguistics: ACL 2023, 2023, pp. 13052–13063, doi:
10.18653/v1/2023.findings-acl.825.
[14] Y. Ding, M. Bexte, and A. Horbach, “Don’t drop the topic-the role of the prompt in argument identification in student writing,” in
Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022), 2022, pp. 124–
133, doi: 10.18653/v1/2022.bea-1.17.
[15] M. Wang, J. Yu, and Z. Ji, “Credit fraud risk detection based on XGBoost-LR hybrid model,” in Proceedings of the International
Conference on Electronic Business (ICEB) 2018, 2018, pp. 336–343.
[16] N. S. M. Nafis and S. Awang, “An enhanced hybrid feature selection technique using term frequency-inverse document frequency
and support vector machine-recursive feature elimination for sentiment classification,” IEEE Access, vol. 9, pp. 52177–52192, 2021,
doi: 10.1109/ACCESS.2021.3069001.
[17] H. Yu, Y. Ji, and Q. Li, “Student sentiment classification model based on GRU neural network and TF-IDF algorithm,” Journal of
Intelligent and Fuzzy Systems, vol. 40, no. 2, pp. 2301–2311, 2021, doi: 10.3233/JIFS-189227.
[18] V. Dogra, A. Singh, S. Verma, Kavita, N. Z. Jhanjhi, and M. N. Talib, “Analyzing DistilBERT for sentiment classification of banking
financial news,” Lecture Notes in Networks and Systems, vol. 248, pp. 501–510, 2021, doi: 10.1007/978-981-16-3153-5_53.
[19] D. Suhartono and K. Khodirun, “System of information feedback on archive using term frequency-inverse document frequency and
vector space model methods,” IJIIS: International Journal of Informatics and Information Systems, vol. 3, no. 1, pp. 36–42, 2020,
doi: 10.47738/ijiis.v3i1.6.
[20] J. R. Batmetan and T. Hariguna, “Sentiment unleashed: electric vehicle incentives under the lens of support vector machine and
TF-IDF analysis,” Journal of Applied Data Sciences, vol. 5, no. 1, pp. 122–132, 2024, doi: 10.47738/jads.v5i1.162.
[21] Riyanto and A. Azis, “Application of the vector machine support method in twitter social media sentiment analysis regarding the
covid-19 vaccine issue in Indonesia,” Journal of Applied Data Sciences, vol. 2, no. 3, pp. 102–108, 2021, doi:
10.47738/jads.v2i3.40.
[22] S. Qaiser and R. Ali, “Text mining: use of TF-IDF to examine the relevance of words to documents,” International Journal of
Computer Applications, vol. 181, no. 1, pp. 25–29, 2018, doi: 10.5120/ijca2018917395.
[23] A. Shankar, C. Jebarajakirthy, and M. Ashaduzzaman, “How do electronic word of mouth practices contribute to mobile banking
adoption?,” Journal of Retailing and Consumer Services, vol. 52, 2020, doi: 10.1016/j.jretconser.2019.101920.
[24] H. Cho and W. Chiu, “COVID-19 pandemic: consumers’ purchase intention of indoor fitness products during the partial lockdown
period in Singapore,” Asia Pacific Journal of Marketing and Logistics, vol. 34, no. 10, pp. 2299–2313, 2022, doi: 10.1108/APJML-
04-2021-0235.
[25] M. C. G. Davidson, “Does organizational climate add to service quality in hotels?,” International Journal of Contemporary
Hospitality Management, vol. 15, no. 4, pp. 206–213, 2003, doi: 10.1108/09596110310475658.
[26] H. R. Zeinabadi, “Principal-teacher high-quality exchange indicators and student achievement: testing a model,” Journal of
Educational Administration, vol. 52, no. 3, pp. 404–420, 2014.
[27] S. Klotz and A. Lindermeir, “Multivariate credit portfolio management using cluster analysis,” Journal of Risk Finance, vol. 16,
no. 2, pp. 145–163, 2015, doi: 10.1108/JRF-09-2014-0131.
[28] Q. Li, K. H. Cheung, J. You, R. Tong, and A. Mak, “A robust automatic face recognition system for real-time personal
identification,” Sensor Review, vol. 26, no. 1, pp. 38–44, 2006, doi: 10.1108/02602280610640661.
[29] K. Plangger, M. Montecchi, I. Danatzis, M. Etter, and J. Clement, “Strategic enablement investments: exploring differences in
human and technological knowledge transfers to supply chain partners,” Industrial Marketing Management, vol. 91, pp. 187–195,
2020, doi: 10.1016/j.indmarman.2020.09.001.
[30] L. S. Riza, A. B. Rachmat, Munir, T. Hidayat, and S. Nazir, “Genomic repeat detection using the Knuth-Morris-Pratt algorithm on
R high-performance-computing package,” International Journal of Advances in Soft Computing and its Applications, vol. 11, no.
1, pp. 94–111, 2019.
[31] F. Wang, Q. Wang, F. Nie, Z. Li, W. Yu, and F. Ren, “A linear multivariate binary decision tree classifier based on K-means
splitting,” Pattern Recognition, vol. 107, 2020, doi: 10.1016/j.patcog.2020.107521.
[32] Y. Choi, “Finding ‘just right’ books for children: analyzing sentiments in online book reviews,” Electronic Library, vol. 37, no. 3,
pp. 563–576, 2019, doi: 10.1108/EL-01-2019-0018.
[33] N. Süzen, A. N. Gorban, J. Levesley, and E. M. Mirkes, “Automatic short answer grading and feedback using text mining methods,”
Procedia Computer Science, vol. 169, pp. 726–743, 2020, doi: 10.1016/j.procs.2020.02.171.
[34] S. M. H. Dadgar, M. S. Araghi, and M. M. Farahani, “A novel text mining approach based on TF-IDF and support vector machine
for news classification,” in 2016 IEEE International Conference on Engineering and Technology (ICETECH), 2016, pp. 112–116,
doi: 10.1109/ICETECH.2016.7569223.
[35] D. Antons, E. Grünwald, P. Cichy, and T. O. Salge, “The application of text mining methods in innovation research: current state,
evolution patterns, and development priorities,” R and D Management, vol. 50, no. 3, pp. 329–351, 2020, doi: 10.1111/radm.12408.
[36] I. A. Orobor and N. O. Obi, “Machine learning pipeline for multi-class text classification,” International Journal of Engineering
Applied Sciences and Technology, vol. 7, no. 2, pp. 64–69, 2022.
BIOGRAPHIES OF AUTHORS
Tri Wahyuningsih is a doctoral student in the computer science program at Satya
Wacana Christian University. She has a strong interest in information systems management
and decided to pursue her doctoral degree in computer science. She has interests and
capabilities in data mining and text mining. Before starting her doctoral program, she
completed her bachelor's and master's degrees in informatics engineering. She has work
experience as a system analyst and showed excellent achievements during her work. She
is now focusing on her doctoral studies and working on several research projects in the field
of information systems management. She can be contacted at email:
982022001@student.uksw.edu.
Danny Manongga obtained his Bachelor's degree (Drs) from Universitas Kristen
Satya Wacana in 1982, followed by a Ph.D. from the University of East Anglia in 1997. His
international exposure includes an M.Sc. in IT from Queen Mary College, London, in 1989.
Currently serving as a professor and the dean of the Faculty of Information Technology, he
brings a wealth of expertise to his administrative and academic roles. Professor Manongga's
extensive contributions to research and education underscore his commitment to advancing
technology and information studies. He can be contacted at email:
danny.manongga@uksw.edu.
Irwan Sembiring earned his Bachelor of Engineering degree in informatics
engineering from Universitas Pembangunan Nasional "Veteran" Yogyakarta in 2001, Master
of Computer Science in computer science from Gadjah Mada University Yogyakarta in 2004,
and Doctorate in computer science from Gadjah Mada University Yogyakarta in 2015. His
main research interests are computer network security and computer network design, and he
has authored more than 40 publications during his education and teaching career.
Currently, he is active as a lecturer at the
Faculty of Information Technology, Satya Wacana Christian University Salatiga. He can be
contacted at email: irwan@uksw.edu.

Comparing logistic regression and extreme gradient boosting on student arguments

IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 13, No. 3, September 2024, pp. 3119~3128
ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i3.pp3119-3128
Journal homepage: http://ijai.iaescore.com
Comparing logistic regression and extreme gradient boosting on student arguments
Tri Wahyuningsih, Danny Manongga, Irwan Sembiring
Department of Computer Science, Universitas Kristen Satya Wacana, Salatiga, Indonesia
Article history:
Received Jan 1, 2024
Revised Jan 27, 2024
Accepted Feb 10, 2024
ABSTRACT
Identifying the effectiveness level and quality of students' arguments poses a challenge for teachers. This is due to the lack of techniques that can accurately assist in identifying the effectiveness and quality of students' arguments. This research aims to develop a model that can identify effectiveness categories in students' arguments. The method employed involves the logistic regression+XGBoost algorithm combined with separate implementations of term frequency-inverse document frequency (TF-IDF) and CountVectorizer. Student argument data were collected and processed using natural language processing techniques. The results indicate that TF-IDF outperforms CountVectorizer in identifying effectiveness classes in student arguments, with an accuracy of 66.20%. The multi-output classification yielded an accuracy of 89.32% in the initial testing, which improved to 92.34% after implementing one-hot encoding. A novel finding in this research is the superiority of TF-IDF over CountVectorizer as a technique for identifying effectiveness classes in student arguments. The implications of this research include the development of a model that can assist teachers in identifying the effectiveness level of students' arguments, thereby improving the quality of learning and enhancing students' argumentative competence.
Keywords:
Argumentative competence
Effectiveness identification
Logistic regression
Multi-output classification
TF-IDF vs. CountVectorizer
This is an open access article under the CC BY-SA license.
Corresponding Author:
Tri Wahyuningsih
Department of Computer Science, Universitas Kristen Satya Wacana
Salatiga, Indonesia
Email: 982022001@student.uksw.edu
1. INTRODUCTION
The development of argumentation identification models poses a significant challenge in artificial intelligence, especially given the complexity and diversity of human language structures [1]. Argument identification involves not only understanding individual words but also interpreting context, capturing nuanced meanings, and recognizing relationships between parts of a text. Although natural language processing (NLP) and machine learning have advanced considerably, argumentation identification models still face several critical challenges [2]. One is the diversity in how humans present arguments, ranging from linear structures to more concealed and implicit deliveries. Models must capture this depth and complexity to provide satisfactory results. It is also crucial to address biases that may emerge in the training data, since such biases can produce models that are less accurate or even harmful when applied in real-world situations. A critical aspect of model development is therefore ensuring that the training data reflect the real diversity of human argumentation without distortion or imbalance. Logistic regression has been employed in previous studies [2], which found that it performs well. This provides supporting evidence but also raises the question, examined in this study, of how accurate the algorithm is on this task. The method used in this study
is a combination of the logistic regression and XGBoost algorithms. Both algorithms were chosen because they can measure the effectiveness and quality of student arguments with high accuracy [3]. Logistic regression is used to predict the effectiveness class of student arguments, while XGBoost is used to enhance the predictions of the logistic regression algorithm [4]. Using logistic regression together with XGBoost to evaluate student arguments is an important step in addressing the challenges teachers face in measuring the effectiveness and quality of those arguments. The study compares the efficacy of the term frequency-inverse document frequency (TF-IDF) and CountVectorizer methods in analyzing student arguments, aiming to aid teachers in evaluating argument quality. TF-IDF weights each term by how often it occurs in a document while discounting terms that are common across documents, whereas CountVectorizer converts text into raw term counts for algorithmic processing. Results demonstrate TF-IDF's superiority in discerning argument effectiveness. This research addresses the challenge of accurately assessing argument quality, offering a model to support teachers in a task where current tools are lacking. Contributions include a model for assessing argument quality, a comparison of the TF-IDF and CountVectorizer methods, and enhancements through preprocessing techniques such as apostrophe handling and one-hot encoding. This research seeks to construct a model that gauges the effectiveness and quality of student arguments, using machine learning techniques to help educators assess them in a more streamlined and unbiased way.
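The difference between count-based and TF-IDF representations can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the toy sentences, and the unsmoothed weighting idf = ln(N / df) are assumptions chosen for readability, and library vectorizers typically add smoothing and normalization on top of this basic idea.

```python
import math
from collections import Counter


def count_vectors(docs):
    """Raw term counts per document (the bag-of-words view)."""
    return [Counter(doc.lower().split()) for doc in docs]


def tfidf_vectors(docs):
    """Term counts reweighted by idf = ln(N / df).

    Assumption: plain, unsmoothed idf, so a term that occurs in
    every document receives weight 0.
    """
    counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for c in counts for term in c)
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in c.items()}
        for c in counts
    ]


# Hypothetical toy corpus, not taken from the dataset.
docs = [
    "the face is a natural landform",
    "the face is evidence of life",
    "the claim needs evidence",
]
counts = count_vectors(docs)
tfidf = tfidf_vectors(docs)

for term in ("the", "face", "landform"):
    print(term, counts[0][term], round(tfidf[0][term], 3))
```

In the count view every word of the first sentence has the same value, while the TF-IDF view suppresses "the" (present in all three sentences) and gives the highest weight to "landform" (present in only one).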
The primary research questions are whether the logistic regression+XGBoost algorithm is feasible for this task, whether logistic regression+XGBoost+TF-IDF or logistic regression+XGBoost+CountVectorizer performs better, and whether text mining optimization techniques such as apostrophe handling and one-hot encoding can improve model performance. Through the development and evaluation of this model, the study aims to provide educators with a more objective and efficient means of evaluating student arguments, ultimately assisting in their instructional endeavors.
2. RELATED RESEARCH
Because the quality of students' arguments can be examined at several levels, as outlined in the previous section, this review of related work is organized by area and concentrates on approaches aligned with the research objectives. It highlights the different methodological approaches used to detect the quality of student arguments. It begins by presenting relevant approaches in the domain of argument quality detection, followed by model development techniques. It then discusses works on the quality of student arguments in general. Finally, this section outlines works related to model development with comparative analysis.
2.1. Related research on the use of text mining to detect text quality
Text mining, as one branch of data mining, uses techniques and algorithms to analyze and extract information from textual data [5]–[9]. Its use in identifying text quality is highly relevant, considering that most information is currently conveyed through text [10]–[13]. Some previous studies have employed text mining to measure text quality using methods such as sentiment analysis, opinion mining, and text classification [6], [14]–[17]. The results of these studies vary greatly and still have ample room for improvement [18]–[22].
Several criteria are used to measure text quality, such as accuracy, precision, recall, and F1-score. These criteria indicate how well text mining algorithms identify text quality. Text mining also employs techniques such as bag of words, n-gram, TF-IDF, and word embedding, which transform text into numerical representations that can be processed by machines [8], [10], [17]. Research on text mining for identifying text quality also takes into account factors such as context, slang, emotion, and sentiment. These factors significantly affect the measurement of text quality and call for better algorithms. Such research is expected to lead to better and more accurate methods for measuring text quality.
2.2. Related research on argument quality detection
Argument quality detection is a crucial field in communication and learning sciences. Several studies have explored how to identify argument quality, but many of them use manual methods that are time-consuming and inefficient. More effective and efficient tools for identifying argument quality are therefore needed [23]–[26]. Some research on argument quality detection uses machine learning algorithms such as logistic regression and XGBoost [24]. These algorithms can identify the effectiveness and quality levels of arguments more accurately and efficiently than manual methods. One interesting study in this field is the development of sentiment analysis models to identify the effectiveness and quality of student arguments. In this research, machine learning algorithms are used to learn and identify
patterns in student arguments and determine the effectiveness and quality of the arguments. Despite the many studies on argument quality detection, many challenges remain. One of the biggest is ensuring that the developed tools are highly accurate and can be widely used by teachers and learners. Research on argument quality detection therefore continues to evolve to address these challenges and provide more effective and efficient solutions.
2.3. Related research on the development of machine learning models
Machine learning models for sentiment measurement systems are an area undergoing significant development. Related studies aim to develop predictive and clustering models for categorizing sentiment in texts or paragraphs [27]–[29]. Some studies have developed models for educational purposes, such as assessing student responses [20], [30]–[33]. In some studies, models are also compared with other text mining methods such as TF-IDF and CountVectorizer to evaluate their effectiveness [22]. Previous results indicate that these models perform better than the other methods. Other research focuses on improving model accuracy and effectiveness [13], [18], [21] by developing better machine learning models and applying optimization techniques to improve on earlier results. This study examines previous research [13], [14], [33] that used a dataset similar to that of this study, namely "feedback prize-predicting effective arguments" on Kaggle. Nevertheless, the goals of this research differ significantly from the works of [13], [14], [34]. Ding et al.
[13] focused on improving the accuracy and reliability of automatic assessment of the quality of student arguments by using multi-task learning, while this research aims to explore the combination of three tasks: automatic span detection, type prediction, and quality prediction. Meanwhile, Ding et al. [14] investigate the influence of prompts in identifying student arguments. Furthermore, Wang et al. [15] also adopt a combination of logistic regression and XGBoost models but use a different dataset, focusing on credit fraud risk detection on the UCI public German credit dataset in 2018. Table 1 summarizes the research reviewed in this study.
Table 1. Research summary
Research | Dataset | Algorithm | Target | Year
Wang et al. [15] | UCI public Germany, “the German credit data set (1994)” | Logistic regression+XGBoost | Credit fraud risk detection | 2018
Rahman et al. [1] | Twitter, “sentiment polarity datasets” | N-gram, TF-IDF, ensemble | Comparison | 2020
Dogra et al. [18] | Bank report, “banking financial news” | DistilBERT | Evaluation | 2021
Yu et al. [17] | Questionnaire, “Chengde Medical College, China” | TF-IDF + GRU neural networks | Model development | 2021
Ding et al. [14] | Kaggle, “feedback prize - predicting effective arguments” | k-means clustering & TF-IDF | Model development and evaluation | 2022
Ding et al. [13] | Kaggle, “feedback prize - predicting effective arguments” | Logistic regression, BERT, multi-task learning (MTL) | Comparison | 2023
This research | Kaggle, “feedback prize - predicting effective arguments” | Logistic regression, XGBoost, TF-IDF, CountVectorizer | Model development and evaluation | 2023
3. METHOD
3.1. Student argument dataset
The dataset, sourced from Kaggle as secondary data, contains argumentative essays written by students in grades 6 to 12 in the United States, originating from Georgia State University (GSU). It includes contextual information gathered from students' questionnaire responses on various essay topics, totaling 36,765 entries.
Using this dataset, a model was developed to assess the quality of students' arguments, leveraging the text and argument type for predictions and providing feedback for improvement. English was chosen for the model's development because of its widespread usage, particularly in scientific literature and NLP research, and the availability of resources and preprocessing techniques. However, efforts to extend the model's applicability to other languages are underway to enhance inclusivity and broaden its impact. Table 2 describes the dataset columns, and Table 3 shows an example record.
Table 2. Dataset description
Column | Explanation
discourse_id | Identifier of the discourse containing the argument.
essay_id | Identifier of the essay to which the discourse belongs.
discourse_text | Text of the argument itself.
discourse_type | Type of argument: evidence, rebuttal, counterclaim, lead, concluding statement, position, or claim.
discourse_effectiveness | Effectiveness rating of the argument: 1 for ineffective, 2 for adequate, or 3 for effective.
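As a hedged illustration of how the columns in Table 2 might be turned into model-ready values, the sketch below maps the effectiveness label to its numeric code and one-hot encodes the argument type. The helper `one_hot`, the category order, and the choice to one-hot encode `discourse_type` are assumptions made for this example; the text does not state which column the authors encode. The record is an excerpt of the example row reported for the dataset.

```python
# Category order is an arbitrary assumption; only the seven names come from Table 2.
DISCOURSE_TYPES = [
    "Lead", "Position", "Claim", "Counterclaim",
    "Rebuttal", "Evidence", "Concluding Statement",
]

# Numeric codes as described in Table 2.
EFFECTIVENESS = {"Ineffective": 1, "Adequate": 2, "Effective": 3}


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


# Excerpt of the example record (text shortened for the sketch).
row = {
    "discourse_id": "c22adee811b6",
    "essay_id": "007ACE74B050",
    "discourse_text": "I think that the face is a natural landform",
    "discourse_type": "Claim",
    "discourse_effectiveness": "Adequate",
}

type_vector = one_hot(row["discourse_type"], DISCOURSE_TYPES)
label = EFFECTIVENESS[row["discourse_effectiveness"]]
print(type_vector, label)
```

One-hot encoding avoids implying an order among argument types, whereas the effectiveness codes 1 to 3 keep the natural ordering of the target classes.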
Table 3. Example dataset
discourse_id | essay_id | discourse_text | discourse_type | discourse_effectiveness
c22adee811b6 | 007ACE74B050 | I think that the face is a natural landform because there is no life on Mars that we have descovered yet | Claim | Adequate
This research employs one dependent variable, discourse effectiveness, which is divided into three categories: "effective," "adequate," and "ineffective." This dependent variable reflects the extent to which a student's argument or text is considered effective in the context of discourse analysis. The independent variable used in this study is a combination of discourse text and discourse type; the discourse type is attached to the discourse text to make the type of argument explicit. Discourse type encompasses seven categories: lead, position, claim, counterclaim, rebuttal, evidence, and concluding statement. Each type of discourse contributes uniquely to the structure and content of an argument. In this research, this variable is the main focus in identifying patterns and characteristics of discourse types that can influence discourse effectiveness. The distribution of each category is compared in Figure 1, and the words with the highest frequency in each category are visualized as a word cloud in Figure 2. The wordcloud is one of the crucial visualization techniques in text mining, particularly for datasets with two classes. It provides a visual representation of the most commonly used words in each class, aiding in the identification of key words that may differentiate the classes. Therefore, wordclouds can speed up and simplify data exploration, helping researchers understand the characteristics of each class more effectively.
Additionally, wordclouds assist researchers in selecting the most relevant and important features when constructing a classification model. Consequently, wordclouds contribute to enhancing the quality and accuracy of text mining analysis on datasets with two classes.
Figure 1. Class type distribution
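The feature construction and class-level exploration described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the discourse type is simply prefixed to the discourse text, and it approximates the word cloud by listing the most frequent words per class. The function names and the three sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def combine_features(discourse_type, discourse_text):
    """Prefix the text with its discourse type so both feed one text feature."""
    return f"{discourse_type} {discourse_text}".lower()


def class_distribution(labels):
    """Count how many records fall into each class (the idea behind Figure 1)."""
    return Counter(labels)


def top_words_per_class(texts, labels, k=3):
    """Return the k most frequent whitespace tokens for each class."""
    per_class = defaultdict(Counter)
    for text, label in zip(texts, labels):
        per_class[label].update(text.split())
    return {
        label: [word for word, _ in counts.most_common(k)]
        for label, counts in per_class.items()
    }


# Hypothetical rows: (discourse_type, discourse_text, discourse_effectiveness).
rows = [
    ("Claim", "the face is a natural landform", "Adequate"),
    ("Evidence", "the picture shows the shadow", "Effective"),
    ("Claim", "the face is fake", "Adequate"),
]
texts = [combine_features(kind, text) for kind, text, _ in rows]
labels = [label for _, _, label in rows]
distribution = class_distribution(labels)
frequent = top_words_per_class(texts, labels, k=4)
```

A real word cloud would additionally remove stop words such as "the" and "is", which otherwise dominate every class.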
Figure 2. Wordcloud on each type and category
3.2. Research flow
The study followed a systematic research flow, starting with data collection from a Kaggle dataset of argumentative essays whose elements were evaluated by experts and categorized into argument types and effectiveness classes. Preprocessing involved data cleaning, transformation, integration, and reduction to enhance efficiency, followed by word and text preprocessing, including apostrophe handling and one-hot encoding, for better data preparation [30]–[32], [35], [36]. Model evaluation tested three models: logistic regression combined with XGBoost, and the same combination with TF-IDF or CountVectorizer features used to weight words. The goal was to identify the most accurate model for predicting data classes. The evaluation results were validated through classification, cluster similarity, and topic representativeness, and a multi-output classification model was used to select the best-performing model and to test multi-output predictions. Finally, the conclusion stage presents the results of the text mining analysis and summarizes the findings of this research in a clear and understandable manner, whether in the form of tables, graphs, or narratives. The steps of this research are illustrated in Figure 3.
Figure 3. Research step
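The two word-weighting schemes compared in the research flow differ only in how a term count is scaled. The sketch below illustrates this with the standard library alone, as a stand-in for a library vectorizer rather than a reproduction of the authors' pipeline. It assumes whitespace tokenization and the smoothed inverse document frequency idf(t) = ln((1 + n) / (1 + df(t))) + 1, without any vector normalization; the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def count_features(docs):
    """Raw term counts per document: the idea behind count-based vectorization."""
    return [Counter(tokenize(doc)) for doc in docs]


def tfidf_features(docs):
    """Term counts scaled by a smoothed inverse document frequency."""
    counts = count_features(docs)
    n_docs = len(docs)
    doc_freq = Counter()
    for doc_counts in counts:
        doc_freq.update(doc_counts.keys())  # each document counts once per term
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }
    return [
        {term: tf * idf[term] for term, tf in doc_counts.items()}
        for doc_counts in counts
    ]


# Hypothetical documents, for illustration only.
docs = [
    "the face is a natural landform",
    "the face was built by aliens",
    "the evidence is weak",
]
counts = count_features(docs)
weights = tfidf_features(docs)
```

In the first document, "the" and "landform" both have a count of 1, but TF-IDF gives "landform" the larger weight because it occurs in only one of the three documents, whereas "the" occurs in all of them.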
4. RESULTS AND DISCUSSION
4.1. Model development
The comparison of the developed models focuses on answering the research question. This study investigates how efficiently students' arguments can be detected using:
a) Logistic regression+XGBoost
Logistic regression is combined with XGBoost using (1):
$$\hat{y}_{\mathrm{KomA}} = \alpha\,\hat{y}_{\mathrm{LR}} + (1-\alpha)\,\hat{y}_{\mathrm{XGB}} \tag{1}$$
where $\hat{y}_{\mathrm{LR}}$ is the prediction from logistic regression, $\hat{y}_{\mathrm{XGB}}$ is the prediction from the XGBoost model, and $\alpha$ is the weight assigned to the logistic regression prediction. The weight $\alpha$ determines how strongly the combined model prefers the predictions of each component. If $\alpha = 0$, the model uses only the predictions from XGBoost, and if $\alpha = 1$, it uses only the predictions from logistic regression. If $\alpha$ lies strictly between 0 and 1, both predictions contribute, with weights $\alpha$ and $1-\alpha$.
b) Logistic regression+XGBoost+TF-IDF
The combination of logistic regression with XGBoost and TF-IDF is expressed as (2):
$$\hat{y}_{\mathrm{KomB}} = \alpha\,\hat{y}_{\mathrm{LR}} + \beta\,\hat{y}_{\mathrm{XGB}} + (1-\alpha-\beta)\,\hat{y}_{\mathrm{TFIDF}} \tag{2}$$
where $\alpha$ is the weight assigned to the predictions from the logistic regression model, $\beta$ is the weight assigned to the predictions from the XGBoost model, and $(1-\alpha-\beta)$ is the weight assigned to the predictions $\hat{y}_{\mathrm{TFIDF}}$ from the TF-IDF method. The weights $\alpha$, $\beta$, and $(1-\alpha-\beta)$ determine how much the model prefers each prediction source. If $\alpha = 1$, only predictions from logistic regression are used; if $\beta = 1$, only predictions from XGBoost are used; and if $(1-\alpha-\beta) = 1$, only predictions from TF-IDF are used.
c) Logistic regression+XGBoost+CountVectorizer
The combination of logistic regression with XGBoost and CountVectorizer is expressed as (3):
$$\hat{y}_{\mathrm{KomC}} = \alpha\,\hat{y}_{\mathrm{LR}} + \beta\,\hat{y}_{\mathrm{XGB}} + (1-\alpha-\beta)\,\hat{y}_{\mathrm{CV}} \tag{3}$$
where $\alpha$ is the weight assigned to the predictions from the logistic regression model, $\beta$ is the weight assigned to the predictions from the XGBoost model, and $(1-\alpha-\beta)$ is the weight assigned to the predictions $\hat{y}_{\mathrm{CV}}$ from the CountVectorizer method. The weights $\alpha$, $\beta$, and $(1-\alpha-\beta)$ determine the extent to which the model prefers each source of predictions. If $\alpha = 1$, only predictions from logistic regression are used. If $\beta = 1$, only predictions from XGBoost are used. If $(1-\alpha-\beta) = 1$, only predictions from CountVectorizer are used.
This research discusses two methods for determining the parameters $\alpha$ and $\beta$ in the composite models:
‒ Manual determination involves optimization techniques such as grid or random search across the parameter space; it offers insight into the effect of each parameter and may use more complex optimization algorithms.
‒ Automatic tuning with machine learning uses algorithms such as random search or Bayesian optimization.
While manual determination offers deeper insight into parameter variations, the choice depends on the research goals and the resources available. The modeling aims to analyze the performance of the machine learning algorithms across different feature space configurations, as listed in Table 4, which provides clear experiment labels for distinguishing and discussing the results.
Table 4. Experimental setup based on the algorithm used
Experiment label | Model
A | Logistic regression+XGBoost
B | Logistic regression+XGBoost+TF-IDF
C | Logistic regression+XGBoost+CountVectorizer
For the machine learning experiments, this research uses Python, a programming language that supports data analysis and mining with a variety of machine learning algorithms.
The objective of employing multiple machine learning algorithms is to examine the consistency of the acquired knowledge. The study conducts three experiments (A, B, and C), employing three classifications for each. The next section presents experimental findings for all label combinations and experimental classifiers, followed by a conclusive discussion on the research findings.
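Equation (1) and the grid-search option for choosing α can be illustrated with a short sketch. It assumes that each model outputs one probability per effectiveness class and that α is chosen to maximize accuracy on held-out data; the source does not state which selection criterion was used. The function names, the probability values, and the labels are hypothetical.

```python
def blend(p_lr, p_xgb, alpha):
    """Equation (1), applied per class: alpha * p_lr + (1 - alpha) * p_xgb."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_lr, p_xgb)]


def predict_class(probs):
    """Index of the class with the highest blended score."""
    return max(range(len(probs)), key=probs.__getitem__)


def accuracy(lr_probs, xgb_probs, labels, alpha):
    """Share of samples whose blended prediction matches the true label."""
    hits = sum(
        predict_class(blend(a, b, alpha)) == y
        for a, b, y in zip(lr_probs, xgb_probs, labels)
    )
    return hits / len(labels)


def grid_search_alpha(lr_probs, xgb_probs, labels, steps=10):
    """Try alpha = 0, 1/steps, ..., 1 and keep the first value with best accuracy."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda a: accuracy(lr_probs, xgb_probs, labels, a))


# Hypothetical class probabilities (ineffective, adequate, effective).
lr_probs = [[0.6, 0.3, 0.1], [0.4, 0.25, 0.35], [0.2, 0.6, 0.2]]
xgb_probs = [[0.3, 0.4, 0.3], [0.1, 0.1, 0.8], [0.2, 0.5, 0.3]]
labels = [0, 2, 1]

best_alpha = grid_search_alpha(lr_probs, xgb_probs, labels)
best_accuracy = accuracy(lr_probs, xgb_probs, labels, best_alpha)
```

In this toy data each model alone misclassifies one of the three samples, while an intermediate α classifies all three correctly, which is the situation in which a weighted combination is worth the extra parameter. Equations (2) and (3) extend the same idea to two weights, α and β, searched over a two-dimensional grid with α + β ≤ 1.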
4.2. Model training and testing
In this study, the initial dataset consists of 36,765 annotated argumentative essay records ready for analysis. To optimize computational performance and provide flexibility to model users, the initial dataset is divided into two parts: training data and testing data. The division uses a 70:30 ratio, where 70% of the initial data (25,735 records) is used as training data and 30% (11,030 records) is used as testing data. The split is performed automatically and at random, so that model users can easily change the test data without rearranging the training data, which simplifies testing and evaluating the model. In this way, the research can test and evaluate the model's performance on previously unseen data, maintaining the integrity of the analysis results and checking that the model generalizes well to new data.
4.3. Model evaluation
Table 5 summarizes the results of the three experiments in this research. Experiment A uses a combination of logistic regression and XGBoost models. Its accuracy is 58.29%. Precision, the proportion of records predicted as a given class that actually belong to that class, is 65.05%. Recall, the proportion of records that actually belong to a given class and are correctly identified by the model, is 57.93%. The F-measure, which combines precision and recall, is 61.28%. Experiment B uses a combined model of logistic regression, XGBoost, and TF-IDF features. This experiment achieves an accuracy of 66.20%. Precision reaches 75.44%, recall is 70.12%, and the F-measure reaches 72.31%. Experiment C also uses a combined logistic regression and XGBoost model, this time with CountVectorizer features.
Its accuracy is 66.02%, with precision at 64.11%, recall at 87.32%, and F-measure at 74.12%. The evaluation metrics in the table give a comprehensive overview of the performance of the three experiments in this research. Accuracy measures the overall agreement of the model's predictions with the actual data. Experiment B, which combines logistic regression, XGBoost, and TF-IDF, shows the highest accuracy at 66.20%, narrowly ahead of experiment C at 66.02%. Precision, which indicates how well the model avoids false positives, is highest in experiment B at 75.44%. Recall, which measures the model's ability to identify all positive data, is highest in experiment C at 87.32%. The F-measure, which combines precision and recall, is highest in experiment C at 74.12%, with experiment B close behind at 72.31%; experiment B nevertheless shows the smaller gap between precision (75.44%) and recall (70.12%). Overall, experiments B and C stand out with better performance than experiment A: experiment C excels in recognizing all positive data (recall), while experiment B shows a better balance between precision and recall. Both experiments also achieve higher accuracy than experiment A across the whole test set.
Table 5. Summary of model accuracy results
Metric | Experiment A (Logistic regression+XGBoost) | Experiment B (Logistic regression+XGBoost+TF-IDF) | Experiment C (Logistic regression+XGBoost+CountVectorizer)
Accuracy (%) | 58.29 | 66.20 | 66.02
Precision (%) | 65.05 | 75.44 | 64.11
Recall (%) | 57.93 | 70.12 | 87.32
F-Measure (%) | 61.28 | 72.31 | 74.12
4.4. Multi-output classification
Multi-output classification is a text classification method used to categorize text into multiple different classes simultaneously. This method is useful for identifying more than one attribute or category within a text, providing more comprehensive and accurate information.
For example, in sentiment analysis research, multi-output classification can be employed to classify a text as positive, negative, or neutral, while also identifying the topics or themes discussed in the text. With multi-output classification, text can be analyzed in more detail and more useful information can be obtained for specific purposes. In this study, model testing uses as its reference the testing data in which each argument has already been assigned a class. The goal of the multi-class, multi-output setup is to use this testing data as a reference for classifying the quality classes from the text, resulting in the primary classes effective, adequate, or ineffective. The testing data and results used in this study can be seen in Figures 4(a) to 4(c). In the initial test, the model accuracy was approximately 89.32%, which corresponds to about 10.68% prediction errors out of 11,030 tested records. After the one-hot encoding technique was applied in the second test, the model's accuracy reached 92.34%. Therefore, it can be concluded that the model combined with TF-IDF and one-hot encoding is effective in classifying student arguments with a high level of accuracy. Regarding the use of different data with this model, integrating additional data is quite feasible, especially if the new dataset uses the same target labels. A model that has been trained on the original dataset can be updated or fine-tuned with additional datasets to enhance its classification capabilities. This approach leverages the diversity of information that may exist in new datasets, enriching the model's understanding of variations in the data. It is expected that combining different datasets can help the model improve its generalization and performance. However, it is crucial to keep annotations and target labels consistent across the entire dataset so that the model can produce consistent and reliable classification results. By merging different data, it is hoped that the model can become more robust and handle variations that may arise in classification tasks.
Figure 4. Data testing class prediction result: (a) actual data, (b) before using one-hot encoding, and (c) after using one-hot encoding
5. CONCLUSION
The results of developing the model using logistic regression and XGBoost indicate that both methods can be used to predict a text category with fairly high accuracy. The comparison of the TF-IDF and CountVectorizer techniques shows that TF-IDF is superior in terms of prediction accuracy. In the initial test, the model achieved an accuracy of approximately 89.32%, corresponding to about 10.68% prediction errors out of 11,030 tested records. In the second test, using the one-hot encoding technique, the model reached an accuracy of 92.34%. Therefore, it can be concluded that the model combined with TF-IDF and one-hot encoding is effective in classifying student arguments with a high level of accuracy. Using TF-IDF, the model can predict a text category more accurately because it considers the importance of a word in a document.
This can reduce bias compared to using CountVectorizer, which only counts the occurrences of words in a document. The successful development of the model offers educators a new means of evaluating and categorizing the classes of arguments written by students. This work can be further developed by using the data from the model to test patterns of effectiveness and types of arguments. For future research, other classification algorithms can also be applied to the same model to compare their performance.
REFERENCES
[1] S. S. M. M. Rahman, K. B. M. B. Biplob, M. H. Rahman, K. Sarker, and T. Islam, “An investigation and evaluation of N-gram, TF-IDF and ensemble methods in sentiment classification,” in Cyber Security and Computer Science, Cham: Springer, 2020, pp. 391–402, doi: 10.1007/978-3-030-52856-0_31.
[2] G. N. Harywanto, R. Siautama, A. C. I. Ardison, and D. Suhartono, “Extractive hotel review summarization based on TF/IDF and adjective-noun pairing by considering annual sentiment trends,” Procedia Computer Science, vol. 179, pp. 558–565, 2021, doi: 10.1016/j.procs.2021.01.040.
[3] N. C. Dang, M. N. Moreno-García, and F. De la Prieta, “Sentiment analysis based on deep learning: a comparative study,” Electronics, vol. 9, no. 3, pp. 1–29, 2020, doi: 10.3390/electronics9030483.
[4] R. Ahuja, A. Chug, S. Kohli, S. Gupta, and P. Ahuja, “The impact of features extraction on the sentiment analysis,” Procedia Computer Science, vol. 152, pp. 341–348, 2019, doi: 10.1016/j.procs.2019.05.008.
[5] H. Liu, X. Chen, and X. Liu, “A study of the application of weight distributing method combining sentiment dictionary and TF-IDF for text sentiment analysis,” IEEE Access, vol. 10, pp. 32280–32289, 2022, doi: 10.1109/ACCESS.2022.3160172.
[6] M. Kamyab, G. Liu, and M. Adjeisah, “Attention-based CNN and Bi-LSTM model based on TF-IDF and glove word embedding for sentiment analysis,” Applied Sciences, vol. 11, no. 23, pp. 1–17, 2021, doi: 10.3390/app112311255.
[7] M. Chiny, M. Chihab, Y. Chihab, and O. Bencharef, “LSTM, VADER and TF-IDF based hybrid sentiment analysis model,” International Journal of Advanced Computer Science and Applications, vol. 12, no. 7, pp. 265–275, 2021, doi: 10.14569/IJACSA.2021.0120730.
[8] R. S. Patil and S. R. Kolhe, “Supervised classifiers with TF-IDF features for sentiment analysis of Marathi tweets,” Social Network Analysis and Mining, vol. 12, no. 1, 2022, doi: 10.1007/s13278-022-00877-w.
[9] S. Fransiska and A. Irham Gufroni, “Sentiment analysis provider by.U on Google Play Store reviews with TF-IDF and support vector machine (SVM) method,” Scientific Journal of Informatics, vol. 7, no. 2, pp. 2407–7658, 2020.
[10] A. Mee, E. Homapour, F. Chiclana, and O. Engel, “Sentiment analysis using TF–IDF weighting of UK MPs’ tweets on Brexit,” Knowledge-Based Systems, vol. 228, 2021, doi: 10.1016/j.knosys.2021.107238.
[11] F. M. Al-kharboush and M. A. Al-Hagery, “Features extraction effect on the accuracy of sentiment classification using ensemble models,” International Journal of Science and Research (IJSR), vol. 10, no. 3, pp. 228–231, 2021, doi: 10.21275/SR21303123511.
[12] K. S. Kalaivani, S. Uma, and C. S. Kanimozhiselvi, “A review on feature extraction techniques for sentiment classification,” in 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC), 2020, pp. 679–683, doi: 10.1109/ICCMC48092.2020.ICCMC-000126.
[13] Y. Ding, M. Bexte, and A. Horbach, “Score it all together: a multi-task learning study on automatic scoring of argumentative essays,” in Findings of the Association for Computational Linguistics: ACL 2023, 2023, pp. 13052–13063, doi: 10.18653/v1/2023.findings-acl.825.
[14] Y. Ding, M. Bexte, and A. Horbach, “Don’t drop the topic - the role of the prompt in argument identification in student writing,” in Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022), 2022, pp. 124–133, doi: 10.18653/v1/2022.bea-1.17.
[15] M. Wang, J. Yu, and Z. Ji, “Credit fraud risk detection based on XGBoost-LR hybrid model,” in Proceedings of the International Conference on Electronic Business (ICEB) 2018, 2018, pp. 336–343.
[16] N. S. M. Nafis and S. Awang, “An enhanced hybrid feature selection technique using term frequency-inverse document frequency and support vector machine-recursive feature elimination for sentiment classification,” IEEE Access, vol. 9, pp. 52177–52192, 2021, doi: 10.1109/ACCESS.2021.3069001.
[17] H. Yu, Y. Ji, and Q. Li, “Student sentiment classification model based on GRU neural network and TF-IDF algorithm,” Journal of Intelligent and Fuzzy Systems, vol. 40, no. 2, pp. 2301–2311, 2021, doi: 10.3233/JIFS-189227.
[18] V. Dogra, A. Singh, S. Verma, Kavita, N. Z. Jhanjhi, and M. N. Talib, “Analyzing DistilBERT for sentiment classification of banking financial news,” Lecture Notes in Networks and Systems, vol. 248, pp. 501–510, 2021, doi: 10.1007/978-981-16-3153-5_53.
[19] D. Suhartono and K. Khodirun, “System of information feedback on archive using term frequency-inverse document frequency and vector space model methods,” IJIIS: International Journal of Informatics and Information Systems, vol. 3, no. 1, pp. 36–42, 2020, doi: 10.47738/ijiis.v3i1.6.
[20] J. R. Batmetan and T. Hariguna, “Sentiment unleashed: electric vehicle incentives under the lens of support vector machine and TF-IDF analysis,” Journal of Applied Data Sciences, vol. 5, no. 1, pp. 122–132, 2024, doi: 10.47738/jads.v5i1.162.
[21] Riyanto and A. Azis, “Application of the vector machine support method in twitter social media sentiment analysis regarding the covid-19 vaccine issue in Indonesia,” Journal of Applied Data Sciences, vol. 2, no. 3, pp. 102–108, 2021, doi: 10.47738/jads.v2i3.40.
[22] S. Qaiser and R. Ali, “Text mining: use of TF-IDF to examine the relevance of words to documents,” International Journal of Computer Applications, vol. 181, no. 1, pp. 25–29, 2018, doi: 10.5120/ijca2018917395.
[23] A. Shankar, C. Jebarajakirthy, and M. Ashaduzzaman, “How do electronic word of mouth practices contribute to mobile banking adoption?,” Journal of Retailing and Consumer Services, vol. 52, 2020, doi: 10.1016/j.jretconser.2019.101920.
[24] H. Cho and W. Chiu, “COVID-19 pandemic: consumers’ purchase intention of indoor fitness products during the partial lockdown period in Singapore,” Asia Pacific Journal of Marketing and Logistics, vol. 34, no. 10, pp. 2299–2313, 2022, doi: 10.1108/APJML-04-2021-0235.
[25] M. C. G. Davidson, “Does organizational climate add to service quality in hotels?,” International Journal of Contemporary Hospitality Management, vol. 15, no. 4, pp. 206–213, 2003, doi: 10.1108/09596110310475658.
[26] H. R. Zeinabadi, “Principal-teacher high-quality exchange indicators and student achievement: testing a model,” Journal of Educational Administration, vol. 52, no. 3, pp. 404–420, 2014.
[27] S. Klotz and A. Lindermeir, “Multivariate credit portfolio management using cluster analysis,” Journal of Risk Finance, vol. 16, no. 2, pp. 145–163, 2015, doi: 10.1108/JRF-09-2014-0131.
[28] Q. Li, K. H. Cheung, J. You, R. Tong, and A. Mak, “A robust automatic face recognition system for real-time personal identification,” Sensor Review, vol. 26, no. 1, pp. 38–44, 2006, doi: 10.1108/02602280610640661.
[29] K. Plangger, M. Montecchi, I. Danatzis, M. Etter, and J. Clement, “Strategic enablement investments: exploring differences in human and technological knowledge transfers to supply chain partners,” Industrial Marketing Management, vol. 91, pp. 187–195, 2020, doi: 10.1016/j.indmarman.2020.09.001.
[30] L. S. Riza, A. B. Rachmat, Munir, T. Hidayat, and S. Nazir, “Genomic repeat detection using the Knuth-Morris-Pratt algorithm on R high-performance-computing package,” International Journal of Advances in Soft Computing and its Applications, vol. 11, no. 1, pp. 94–111, 2019.
[31] F. Wang, Q. Wang, F. Nie, Z. Li, W. Yu, and F. Ren, “A linear multivariate binary decision tree classifier based on K-means splitting,” Pattern Recognition, vol. 107, 2020, doi: 10.1016/j.patcog.2020.107521.
[32] Y. Choi, “Finding ‘just right’ books for children: analyzing sentiments in online book reviews,” Electronic Library, vol. 37, no. 3, pp. 563–576, 2019, doi: 10.1108/EL-01-2019-0018.
[33] N. Süzen, A. N. Gorban, J. Levesley, and E. M. Mirkes, “Automatic short answer grading and feedback using text mining methods,” Procedia Computer Science, vol. 169, pp. 726–743, 2020, doi: 10.1016/j.procs.2020.02.171.
[34] S. M. H. Dadgar, M. S. Araghi, and M. M. Farahani, “A novel text mining approach based on TF-IDF and support vector machine for news classification,” in 2016 IEEE International Conference on Engineering and Technology (ICETECH), 2016, pp. 112–116, doi: 10.1109/ICETECH.2016.7569223.
[35] D. Antons, E. Grünwald, P. Cichy, and T. O. Salge, “The application of text mining methods in innovation research: current state, evolution patterns, and development priorities,” R and D Management, vol. 50, no. 3, pp. 329–351, 2020, doi: 10.1111/radm.12408.
[36] I. A. Orobor and N. O. Obi, “Machine learning pipeline for multi-class text classification,” International Journal of Engineering Applied Sciences and Technology, vol. 7, no. 2, pp. 64–69, 2022.
BIOGRAPHIES OF AUTHORS
Tri Wahyuningsih is a doctoral student in the computer science program at Satya Wacana Christian University. She has a strong interest in information systems management and decided to pursue her doctoral degree in computer science. She has interests and capabilities in data mining and text mining. Before starting her doctoral program, she completed her bachelor's and master's degrees in informatics engineering. She has working experience as a system analyst and showed excellent achievements during her work. She is now focusing on her doctoral studies and working on several research projects in the field of information systems management. She can be contacted at email: 982022001@student.uksw.edu.
Danny Manongga obtained his Bachelor's degree (Drs) from Universitas Kristen Satya Wacana in 1982, followed by an M.Sc. in IT from Queen Mary College, London, in 1989 and a Ph.D. from the University of East Anglia in 1997. Currently serving as a professor and the dean of the Faculty of Information Technology, he brings a wealth of expertise to his administrative and academic roles. Professor Manongga's extensive contributions to research and education underscore his commitment to advancing technology and information studies. He can be contacted at email: danny.manongga@uksw.edu.
Irwan Sembiring earned his Bachelor of Engineering degree in informatics engineering from Universitas Pembangunan Nasional "Veteran" Yogyakarta in 2001, his Master of Computer Science from Gadjah Mada University Yogyakarta in 2004, and his Doctorate in computer science from Gadjah Mada University Yogyakarta in 2015. His research interests include computer network security and computer network design, and he has produced more than 40 publications during his education and teaching career.
Currently, he is active as a lecturer at the Faculty of Information Technology, Satya Wacana Christian University Salatiga. He can be contacted at email: irwan@uksw.edu.