TELKOMNIKA, Vol.17, No.5, October 2019, pp.2667~2674
ISSN: 1693-6930, accredited First Grade by Kemenristekdikti, Decree No: 21/E/KPT/2018
DOI: 10.12928/TELKOMNIKA.v17i5.12646 ◼ 2667
Received January 5, 2019; Revised March 17, 2019; Accepted April 22, 2019
The classification of the modern Arabic poetry using machine learning
Munef Abdullah Ahmed*¹, Raed Abdulkareem Hasan², Ahmed Hussein Ali³, Mostafa Abdulghafoor Mohammed⁴
¹Faculty of Al Hawija Technical Institute, Northern Technical University, Mosul, 41002, Iraq
²Faculty of Al-Dour Technical Institute, Northern Technical University, Mosul, 41002, Iraq
³Computer Science Department, Al Salam University College, Baghdad, Iraq
⁴Great Imam University College, Baghdad, 10053, Iraq
⁴Faculty of Automatic Control and Computers, University Politehnica of Bucharest, 060042, Romania
*Corresponding author, e-mail: drmunef69@gmail.com¹, raed.isc.sa@gmail.com², msc.ahmed.h.ali@gmail.com³, alqaisy86@gmail.com⁴
Abstract
In recent years, research on the classification and analysis of Arabic texts using machine learning
has made some progress, but most of this research has not focused on Arabic poetry. Arabic poetry is
difficult to analyze because the analysis requires Standard Arabic, on which “Al Arud”, the science of
Arabic prosody, is based. This paper presents an approach that uses machine learning to classify modern
Arabic poetry into four types: love poems, Islamic poems, social poems, and political poems. Each of
these types usually has features that indicate the class of the poem. Despite the challenges posed by
the complex rules of the Arabic language on which this classification depends, we propose a new
automatic method for classifying modern Arabic poems that addresses these issues. The proposed method
is suitable for the four classes of poems listed above. This study used Naïve Bayes, Support Vector
Machines, and Linear Support Vector Classification for the classification. Data preprocessing was an
important step of the approach, as it increased the accuracy of the classification.
Keywords: classification of Arabic poems, machine learning algorithms, modern Arabic poems
Copyright © 2019 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Despite the number of approaches to automatic text classification for English and other languages,
Arabic still needs much more research, especially on Arabic poetry. This is due to several factors,
including the difficulty of the language and the need to master its rules when studying poetry. A full
understanding of the theory of “Al Arud”, which specializes in the study of Arabic poetry [1], is also
required. Existing work on Arabic text, whether regular text or poems, has focused either on the topic
or on the affect of the text [2], and few studies have used sentiment analysis to classify Arabic
texts [3]. In this study, we used Naïve Bayes (NB), Support Vector Machines (SVM), and Linear Support
Vector Classification (linear SVC, also abbreviated LSVC) for the classification task.
The next section of this paper covers a review of the related work, followed by
the introduction of the four categories of modern Arabic poetry. After that, the dataset of
the work is presented, followed by the data preprocessing step which has a direct effect on
the accuracy of the classification process. The sixth and seventh sections focus on feature
selection and the machine learning algorithms used. These sections are followed by those that
discuss the methodology, results, and conclusions from the study.
2. State of the Art
Several methods have been used to classify emotions in English text. Some of these studies depended on
keyword spotting, that is, on unambiguous affect words such as “happy” and “sad” [4]. Lexical affinity,
a more effective approach in this field, assigns arbitrary words an affinity to particular emotions.
This method generally performs better than keyword spotting, although it cannot be used as an
independent model [5]. Other methods rely on a deep understanding of the language and its
semantics [5]. Models based on psychological theory, which determine desires, goals, and needs, have
also been used for classification [6]. Machine learning techniques have been applied to the
emotion-based classification of classical Arabic poetry [7]; that work classified Arabic poems into
Fakhr (pride), Retha (elegy), Ghazal (love), and Heija (satire). Polynomial networks have been used for
Arabic text classification [8]. Several other classification algorithms have been applied to Arabic
text, such as SVM [8, 9], NB [10], K-Nearest Neighbor (KNN) [11], Artificial Neural Networks
(ANN) [12], and the Rocchio feedback algorithm [13].
3. Categories of Modern Arabic Poetry
Modern Arabic poetry generally falls into the following types [14]:
− Love poems: a poetic art used to express the feelings between lovers. The poet draws the meanings of
the poem from his relationship with the subject, his outlook, the influence of the environment, and
the reality of those feelings.
− Islamic (religious) poems: the poets draw on the stories in the Holy Quran, taking its precepts,
rulings, and meanings and employing them in their poetry to treat the community issues and problems
that were widespread in their countries at the time.
− Social poems: social poems aim to remedy bad social conditions by diagnosing the problem, identifying
its cause, and describing its solution. The poets use encouragement and motivation to urge their people
to contribute to the society's progress and to avoid the ills and conditions that undermine the
foundations of its renaissance.
− Political poems: this type of poetry expresses certain political orientations and the personal views
of poets while preserving the conventions of poetic writing and the literary and artistic values of
poetry.
4. The Dataset
Arabic research using Natural Language Processing (NLP) differs from English research in the number and
size of the available datasets. Because few Arabic datasets are freely available, which is an obstacle
for researchers, most researchers rely on collections of texts taken from magazines, news stations, and
websites; for example, some researchers used Saudi newspapers [11]. In Arabic research, several schools
of thought split the datasets into training and testing groups. In our work, the main problem was
finding datasets for training and testing, because this is the first work to use machine learning to
classify modern Arabic poetry. We relied on a website as the source of the data used to train and test
the four categories of modern Arabic poetry.
5. Data Pre-Processing
The Arabic language is difficult both in speaking and in writing. It has 29 letters: the 28 letters
(أ ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ي) and the “Hamza” (ء). These letters are divided
into two types: the long vowels, which comprise three letters (ا, و, ي), and the consonants. Arabic also
uses several diacritics, such as “sukoon”, “dammah”, “kasra”, “fatha”, “tanween fatha”, “tanween kasra”,
“tanween dammah”, “shadda”, and “madda”. These short vowels determine the correct pronunciation and
meaning of a word. Table 1 lists the short vowels and their pronunciation, and Table 2 shows words that
have the same letters but different pronunciations and meanings.
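In Unicode, the short vowels described above are stored as separate combining marks that follow the base letter (U+064B fathatan through U+0652 sukun), so the same three letters written with different diacritics are different strings. The Python sketch below shows how stripping these marks collapses vocalized forms of the letters س ل م into one base form, which is the ambiguity Table 2 illustrates. Whether the authors removed diacritics is not stated; the function name and the exact set of marks are assumptions.

```python
import re

# Assumed set of marks: U+064B (fathatan) .. U+0652 (sukun). This covers the three
# tanween forms, fatha, dammah, kasra, shadda and sukoon. The madda is usually
# precomposed with alef (U+0622) and falls outside this range, so it is kept.
DIACRITICS = re.compile("[\u064B-\u0652]")


def strip_diacritics(text: str) -> str:
    """Remove short-vowel diacritics, keeping only the base letters."""
    return DIACRITICS.sub("", text)


# Two vocalized words built from seen-lam-meem (escapes keep the marks visible):
sullam = "\u0633\u064F\u0644\u0651\u064E\u0645"   # sullam: "ladder"
salima = "\u0633\u064E\u0644\u0650\u0645\u064E"   # salima: "was saved"

print(sullam == salima)                                      # False
print(strip_diacritics(sullam) == strip_diacritics(salima))  # True
```

Stripping diacritics shrinks the vocabulary, but it also merges words whose meanings differ, so it is a trade-off rather than a free gain.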
Arabic writing differs from writing in the Latin alphabet in that it runs from right to left. In
addition, some Arabic letters take several forms depending on their position in the word. These
features must be considered in this work, as shown in Table 3.
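The positional shapes in Table 3 are normally chosen by the text renderer: a word is stored with a single base letter (for example U+0647 HEH) whatever its position. Text copied from PDFs, however, may contain the separate Arabic Presentation Forms code points instead. A minimal Python sketch, assuming this could affect the collected poems (the source does not say), folds such forms back to base letters with Unicode NFKC normalization from the standard library:

```python
import unicodedata

# Presentation forms of HEH (Arabic Presentation Forms-B): isolated, final, initial, medial.
HEH_FORMS = ["\uFEE9", "\uFEEA", "\uFEEB", "\uFEEC"]


def normalize_letters(text: str) -> str:
    """Map positional presentation forms to their base letters via NFKC."""
    return unicodedata.normalize("NFKC", text)


for form in HEH_FORMS:
    # Prints e.g. "U+FEE9 -> U+0647": every positional form maps to the same base letter.
    print(f"U+{ord(form):04X} -> U+{ord(normalize_letters(form)):04X}")
```

NFKC also folds other compatibility characters, such as the lam-alef ligatures, into their component letters, which is usually what a classifier wants.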
Table 1. The Diacritics in Modern Arabic Poems
The short vowel | The sign | Applied to the letters | Pronunciation
sukoon | ـْ | سْ - لْ | S - L
dammah | ـُ | سُ - لُ | Su - Lu
kasra | ـِ | سِ - لِ | Si - Li
fatha | ـَ | سَ - لَ | Sa - La
tanween fatha | ـً | سً - لً | San - Lan
tanween kasra | ـٍ | سٍ - لٍ | Sin - Lin
tanween dammah | ـٌ | سٌ - لٌ | Sun - Lun
shadda | ـّ | سّ - لّ | Ss - Ll
madda | ~ | آ | Aa

Table 2. Example of the Effect of the Diacritics on the Arabic Word
The word | The meaning
سلم | Hello
سلم | Ladder
سلم | Was delivered
سلم | Safety
سلم | Saved

Table 3. The Effect of Position on the Form of a Letter
Position | The letter | The Arabic word | The meaning
Initial | هـ | هديه | Gift
Medial | ــهــ | الهام | Important
Final | ـــه | له | For him
Isolated | ه | كره | A ball
The Arabic language has two genders, masculine and feminine, each with its own rules in Arabic grammar.
It also distinguishes three numbers: singular, dual, and plural, where the plural has two types (regular
and broken). Arabic grammar has many such ramifications. It is a very rich language, which makes it
difficult to reach the required accuracy in the classification of modern Arabic poetry.
Preprocessing the data is an important step when building classification systems with machine learning,
for the following reasons:
− It removes noise from the text used in the classification.
− It reduces the number of terms or features on which the classification is based.
− It helps reduce the amount of memory required for the classification.
− It helps increase the accuracy of the classification.
We applied the following preprocessing steps to the data used in our work:
− Tokenization: we divided the text into tokens using delimiters such as punctuation, special
characters, and white space.
− We removed non-Arabic words, numbers, punctuation marks, and any other symbols.
− We removed stop words such as pronouns, prepositions, and conjunctions, using an extended version of
the list adopted by Khoja and Garside [15, 16].
− Stemming: the main aim of stemming is to reduce an inflated vocabulary. In Arabic, many words can be
derived from the same stem, so stemming reduces the number of terms in the dataset and the complexity
of text classification. It also reduces the storage requirements of classification systems [17, 18].
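The four preprocessing steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: `STOP_WORDS` is a tiny hypothetical sample rather than the extended Khoja and Garside list, and `light_stem` is a simple hypothetical prefix/suffix stripper rather than the root-based Khoja stemmer [15]. Diacritics are removed before tokenizing so that vocalized words are not split apart.

```python
import re

# Hypothetical, tiny stop-word sample (pronouns, prepositions, conjunctions).
STOP_WORDS = {"في", "من", "على", "إلى", "عن", "و", "هو", "هي"}
# Hypothetical light-stemming affixes; longer prefixes are tried first.
PREFIXES = ("وال", "بال", "كال", "فال", "ال")
SUFFIXES = ("ات", "ون", "ين", "ها", "ة")

DIACRITICS = re.compile("[\u064B-\u0652]")
ARABIC_WORD = re.compile("[\u0621-\u064A]+")  # hamza .. yeh: Arabic letters only


def tokenize(text):
    """Keep runs of Arabic letters; punctuation, digits, Latin words and spaces act as delimiters."""
    return ARABIC_WORD.findall(DIACRITICS.sub("", text))


def light_stem(token, min_len=3):
    """Strip at most one known prefix and one known suffix, keeping at least min_len letters."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= min_len:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= min_len:
            token = token[: -len(s)]
            break
    return token


def preprocess(text):
    """Tokenize (dropping non-Arabic material), remove stop words, then stem."""
    return [light_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


print(preprocess("والكتاب في البيت، hello 123"))  # ['كتاب', 'بيت']
```

A light stemmer like this keeps more distinctions than a root-based stemmer, which maps many different words to the same three-letter root; which choice helps classification is an empirical question.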
6. Features Selection
In machine learning, constructing the feature vectors that represent the objects is a critical step
with a significant impact on the results of the algorithm. Each object should be represented by its own
features, as follows:
$$D = \{d_1, d_2, \ldots, d_n\} \tag{1}$$
$$d_i = \{W_1, W_2, \ldots, W_n\} \tag{2}$$
$$\ddot{d} = g(d) \tag{3}$$
where $D$ is the collection of documents, $d_i$ is a document, $W$ is a word, $\ddot{d}$ is the feature
representation of document $d$, and $g$ is the function that maps the domain of documents to the domain
of features. The function $g$ may be linear or nonlinear. The number of classes is denoted by $C$ and the
number of features by $K$, so the feature vector has length $C \times K$. We computed the mutually
deducted occurrence as follows. Let $n_c(f_i)$ denote the number of occurrences of feature $f_i$ in class
$c$. The mutually deducted count of the feature is then
$$dn_c(f_i) = n_c(f_i) - \sum_{d \neq c} n_d(f_i) \tag{4}$$
that is, the number of occurrences of a feature in one class minus the number of its occurrences in all
the other classes.
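Equation (4) can be computed directly from per-class feature counts. In the Python sketch below, `mutually_deducted_counts` is a hypothetical helper, not the authors' code, and `docs_by_class` maps each class to its tokenized documents. Because the count in all other classes equals the total count minus the count in class c, each score is n_c(f) minus (total(f) minus n_c(f)). The result holds one value per class and feature, i.e. C × K values; how these scores are then attached to individual documents is not specified in the text.

```python
from collections import Counter


def mutually_deducted_counts(docs_by_class):
    """Return {class: {feature: n_c(f) minus the sum of n_d(f) over every other class d}}."""
    # n_c(f): how often each feature occurs in the documents of class c
    counts = {c: Counter(tok for doc in docs for tok in doc)
              for c, docs in docs_by_class.items()}
    # Occurrences of each feature summed over all classes
    total = Counter()
    for cnt in counts.values():
        total.update(cnt)
    vocab = sorted(total)
    # total[f] - cnt[f] is the count in all other classes; Counter gives 0 for unseen features
    return {c: {f: cnt[f] - (total[f] - cnt[f]) for f in vocab}
            for c, cnt in counts.items()}


example = {
    "love": [["hub", "qalb"], ["hub"]],   # hypothetical transliterated tokens
    "islamic": [["salat", "hub"]],
}
scores = mutually_deducted_counts(example)
print(scores["love"])  # {'hub': 1, 'qalb': 1, 'salat': -1}
```

A positive score marks a feature that is more frequent in that class than in all the others combined, which is what makes it indicative of the class.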
The feature vector was built once for each document: whenever a feature was found in the document, a
Boolean flag was set. For this type of classification, the Boolean vector model performs better than
the count model [19, 20].
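A minimal Python sketch of the Boolean vector model: each document becomes a vector with one position per vocabulary feature, set to 1 if the feature occurs at least once and 0 otherwise, so repeated words do not raise the value as they would in the count model. The helper names and toy tokens are illustrative; scikit-learn's `CountVectorizer(binary=True)` provides the same representation off the shelf.

```python
def build_vocabulary(docs):
    """Assign each distinct token a column index (sorted for reproducibility)."""
    return {tok: i for i, tok in enumerate(sorted({t for doc in docs for t in doc}))}


def boolean_vector(doc, vocab):
    """1 where a vocabulary feature appears in the document, else 0; unknown tokens are ignored."""
    vec = [0] * len(vocab)
    for tok in doc:
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] = 1
    return vec


train_docs = [["hub", "qalb"], ["qalb", "watan"]]
vocab = build_vocabulary(train_docs)                     # {'hub': 0, 'qalb': 1, 'watan': 2}
print(boolean_vector(["qalb", "qalb", "bahr"], vocab))  # [0, 1, 0]
```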
7. Machine Learning Algorithms
In our approach, three machine learning algorithms that have been successful in classifying English
text were selected to classify modern Arabic poetry: Support Vector Machines, Naïve Bayes, and Linear
Support Vector Classification. The dataset consists of four groups (folders): Islamic contains 23 files,
Love contains 25 files, Politic contains 22 files, and Social contains 22 files, as shown in Table 4.
Classifier performance is evaluated by computing precision [21], recall [16], and F-measure [22].
Table 4. The Datasets for the Classification
Folder name | Number of files | Number of verses
Islamic | 23 | 600
Love | 25 | 600
Politic | 22 | 500
Social | 22 | 550
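The folder layout in Table 4 maps naturally to a loader that uses each folder name as the class label. The Python sketch below assumes, since the source does not say, that every poem file is a UTF-8 plain-text file with a `.txt` extension placed directly inside its class folder; `load_poem_folders` is a hypothetical helper.

```python
from pathlib import Path


def load_poem_folders(root):
    """Read root/<class>/<file>.txt and return (texts, labels), using folder names as labels."""
    texts, labels = [], []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for path in sorted(class_dir.glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(class_dir.name)
    return texts, labels


# Hypothetical usage, assuming the four folders of Table 4 sit under "poems/":
# texts, labels = load_poem_folders("poems")
```

This attaches one label per file. Table 4 also counts verses, and whether the classification was done per file or per verse is not stated.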
7.1. Support Vector Machines
SVM is a kernel-based algorithm for regression and binary classification [17, 18]. Based on the
structural risk minimization principle, SVM avoids the local-minimum problem (its training problem is
convex) and copes well with high-dimensional data, and it often generalizes better than other ML methods
such as ANNs [19, 20]. SVM has performed well on several real-world predictive data mining problems,
such as time series prediction, text categorization, image processing, and pattern recognition [21, 22].
Despite these achievements, certain drawbacks still need to be addressed, such as the relationship of
statistical learning theory to other theoretical frameworks, big data processing, parameter selection,
and the generalization ability for a given problem [23, 24]. These issues are becoming more pressing as
the rapid development of information systems easily generates high-dimensional, dynamic, and complex
data [25, 26].
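The paragraph above refers to kernels and structural risk minimization without giving a formula. For reference, the standard soft-margin SVM (not stated in the source) trades off a large margin, through the norm of the weight vector, against the average hinge loss on the N training pairs (x_j, y_j) with y_j ∈ {−1, +1}. Here φ is the feature map induced by the kernel k(x, x′) = φ(x)ᵀφ(x′), and λ > 0 is a regularization parameter; the more common constant is written C = 1/(λN), but C already denotes the number of classes in Section 6.

```latex
\min_{\mathbf{w},\,b}\;\frac{\lambda}{2}\lVert\mathbf{w}\rVert^{2}
  + \frac{1}{N}\sum_{j=1}^{N}\max\!\bigl(0,\;1 - y_j\,(\mathbf{w}^{\top}\phi(\mathbf{x}_j) + b)\bigr),
\qquad
\hat{y}(\mathbf{x}) = \operatorname{sign}\bigl(\mathbf{w}^{\top}\phi(\mathbf{x}) + b\bigr)
```

The first term controls model capacity and the second is the empirical error, which is how the SVM balances the two parts of structural risk minimization.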
7.2. Naïve Bayes
The NB method is a classification scheme based on Bayes’ theorem that assumes its predictors are
independent: the NB classifier assumes that the presence of a particular feature in a class is
unrelated to the presence of any other feature [27-30]. The class of a document is determined using the
following equation:
$$C^* = \arg\max_c P(c \mid d) \tag{5}$$
where $c$ represents the class and $d$ represents the document. Applying Bayes’ theorem gives
$$C^* = \arg\max_c \frac{P(d \mid c)\, P(c)}{P(d)} \tag{6}$$
Because $P(d)$ is the same for every class, it does not affect the result, and the equation becomes
$$C^* = \arg\max_c P(d \mid c)\, P(c) \tag{7}$$
The key hypothesis of this algorithm is that each feature $f_i$ of the document is independent of the
other features, so $P(d \mid c)$ factorizes into a product over the $n$ features, which gives
$$C^* = \arg\max_c P(c) \prod_{i=1}^{n} P(f_i \mid c) \tag{8}$$
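Equation (8) is usually evaluated in log space, since a product of many small probabilities underflows. The Python sketch below is a multinomial NB with add-one (Laplace) smoothing; the source names neither the NB variant nor the smoothing, so both are assumptions, and the toy tokens are hypothetical. With the Boolean features of Section 6, a Bernoulli NB would be an equally reasonable choice.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Estimate log P(c) and log P(f|c) with add-alpha (Laplace) smoothing."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # class -> Counter of feature occurrences
    vocab = set()
    for doc, c in zip(docs, labels):
        feature_counts[c].update(doc)
        vocab.update(doc)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        denom = sum(feature_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {f: math.log((feature_counts[c][f] + alpha) / denom)
                             for f in vocab}
    return log_prior, log_likelihood


def predict_nb(doc, log_prior, log_likelihood):
    """Equation (8) in log space: argmax_c log P(c) + sum_i log P(f_i|c)."""
    def score(c):
        # Features never seen during training carry no evidence and are skipped.
        return log_prior[c] + sum(log_likelihood[c][f] for f in doc if f in log_likelihood[c])
    return max(log_prior, key=score)


docs = [["hub", "qalb"], ["hub", "shawq"], ["salat", "iman"], ["iman", "quran"]]
labels = ["love", "love", "islamic", "islamic"]
prior, likelihood = train_nb(docs, labels)
print(predict_nb(["hub", "qalb"], prior, likelihood))  # love
```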
7.3. Linear Support Vector Classification
Linear SVC is a machine learning algorithm similar to the SVM but restricted to a linear kernel. It
offers more flexibility in the choice of penalties and loss functions and is suitable for large numbers
of samples. For multiclass problems, Linear SVC uses a one-against-rest approach, whereas the SVM uses a
one-against-one approach. This model is used in several applications, such as the classification of
text documents using sparse features [22-24].
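The one-against-rest scheme trains one binary linear classifier per class (that class against all others) and predicts the class with the largest decision value; scikit-learn's `LinearSVC`, for instance, follows this scheme, while its kernel `SVC` trains one-against-one classifiers internally. The dependency-free Python sketch below only illustrates the idea. It swaps in a Pegasos-style stochastic subgradient solver for the plain hinge loss instead of the liblinear solver that `LinearSVC` uses (which defaults to the squared hinge), it regularizes the bias along with the weights, and its hyperparameters `lam` and `epochs` are illustrative choices.

```python
import random


def train_binary_pegasos(X, y, lam=0.1, epochs=100, seed=0):
    """Linear SVM for labels y in {-1, +1}, trained with Pegasos-style stochastic
    subgradient descent on (lam/2)*||w||^2 + mean hinge loss. A constant 1.0 is
    appended to every sample so the last weight acts as a (regularized) bias."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            # Gradient step on the regularizer: shrink the weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # The hinge term contributes only for samples inside the margin.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def train_one_vs_rest(X, labels, **kwargs):
    """One binary classifier per class: that class (+1) against all the others (-1)."""
    return {c: train_binary_pegasos(X, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in sorted(set(labels))}


def predict_one_vs_rest(x, models):
    """Return the class whose classifier gives the largest decision value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], xb)))


# Toy Boolean feature vectors (hypothetical), one indicative feature per class.
X = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
labels = ["love", "love", "islamic", "islamic", "social", "social"]
models = train_one_vs_rest(X, labels)
print(predict_one_vs_rest([0, 1, 0], models))  # islamic
```

With four classes, one-against-rest needs four binary classifiers, whereas one-against-one needs 4 × 3 / 2 = 6 pairwise classifiers combined by voting.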
8. Methodology
Figure 1 presents the outline of our work. In the beginning, we choose the dataset used
in our work; after that, we segmented it into words and all the steps of data preprocessing were
applied, including features extraction. We used three machine learning algorithms (SVM, LSVC,
and NB) in training and testing.
Figure 1. Block diagram of the proposed method
9. Results
The work was implemented in Python on a machine with the following configuration: OS: Windows 7,
CPU speed: 3.20 GHz, processor: Intel Core i7, RAM: 4 GB. To evaluate the performance of the proposed
approach, precision, recall, and F-measure were measured for all types of modern Arabic poems.
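The three measures can be computed per class from true and predicted labels, as in the Python sketch below (the helper names are illustrative). The F-measure is the harmonic mean of precision and recall; for example, the NB precision 0.14 and recall 0.5 for the Islamic class give an F-measure of about 0.22, matching Table 5. The text does not say how the Average rows were computed; an unweighted macro-average reproduces the average precision 0.1775 of Table 6 from its four per-class values.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1); 0 when both are 0."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def per_class_prf(y_true, y_pred):
    """Return {class: (precision, recall, F-measure)} from parallel label lists."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (precision, recall, f_measure(precision, recall))
    return scores


def macro_average(values):
    """Unweighted mean over classes: every class counts equally."""
    return sum(values) / len(values)


print(round(f_measure(0.14, 0.5), 2))                    # 0.22 (Table 5, Islamic row)
print(round(macro_average([0.5, 0.02, 0.07, 0.12]), 4))  # 0.1775 (Table 6, precision)
```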
The performance of the proposed method is presented in Tables 5 to 8 and Figures 2 to 5, as described
below. The first algorithm used was Naïve Bayes. Table 5 shows its precision, recall, and F-measure. The
highest precision was obtained for the politic class and the highest recall for the love class, while
the F-measure was highest for the love and politic classes. These results were compared with those of
the other machine learning algorithms.
Table 6 presents the results of the SVM algorithm, for which the highest precision, recall, and
F-measure were all obtained for the Islamic class. These results were also compared with those of the
other algorithms. Table 7 shows the results of the linear SVC algorithm: the highest precision was
obtained for the social class, while the highest recall and F-measure were obtained for the love class.
Table 8 gives the average precision, recall, and F-measure of the three algorithms on our dataset.
Linear SVC achieved the highest average precision and F-measure, and tied with NB for the highest
average recall.
Figure 2 shows the precision of the three algorithms for each type of modern Arabic poem. Linear SVC
gave the highest precision for the love and social classes, while SVM gave the lowest precision for the
love, politic, and social classes. Comparing recall across the three algorithms (Figure 3), the highest
values were obtained by NB and LSVC and the lowest by SVM. Figure 4 shows the F-measure; ranked from
highest to lowest, the algorithms are LSVC, NB, and SVM. Figure 5 shows the average results on our
dataset: the best result was obtained by LSVC, followed by NB and then SVM.
Table 5. Classification of our Dataset using Naïve Bayes
Class | Precision | Recall | F-measure
Islamic | 0.14 | 0.5 | 0.22
Love | 0.57 | 0.8 | 0.67
Politic | 1 | 0.5 | 0.67
Social | 0.5 | 0.17 | 0.25
Average | 0.64 | 0.47 | 0.49

Table 6. Classification of our Dataset using Support Vector Machine
Class | Precision | Recall | F-measure
Islamic | 0.5 | 0.25 | 0.33
Love | 0.02 | 0.1 | 0.2
Politic | 0.07 | 0.05 | 0.09
Social | 0.12 | 0.16 | 0.1
Average | 0.1775 | 0.14 | 0.18

Table 7. Classification of our Dataset using Linear Support Vector Classification
Class | Precision | Recall | F-measure
Islamic | 0.17 | 0.5 | 0.25
Love | 0.83 | 0.71 | 0.77
Politic | 0.2 | 0.33 | 0.25
Social | 1 | 0.29 | 0.44
Average | 0.72 | 0.47 | 0.51

Table 8. Average Results of our Dataset using Three Machine Learning Algorithms
Algorithm | Precision | Recall | F-measure
Naïve Bayes | 0.64 | 0.47 | 0.49
Support Vector Machine | 0.1775 | 0.14 | 0.18
Linear Support Vector Classification | 0.72 | 0.47 | 0.51
Figure 2. The precision for our dataset using three machine learning algorithms
Figure 3. The recall for our dataset using three machine learning algorithms
Figure 4. The F-measure for our dataset using three machine learning algorithms
Figure 5. Average results of our dataset using three machine learning algorithms
10. Conclusion
In this paper, we used Support Vector Machine, linear Support Vector Classification, and Naïve Bayes to
classify modern Arabic poems. Machine learning algorithms proved to be useful tools for this text
classification task, although their performance varied. Comparing precision, recall, and F-measure
across all types of modern Arabic poems, the best results were obtained with linear Support Vector
Classification and Naïve Bayes, while SVM performed considerably worse. One possible reason for this
disparity is the small size of the dataset, since some machine learning algorithms work better than
others on small datasets. The preprocessing of our dataset was also an important step, as it increased
the accuracy of the classification and reduced the memory required for the classification process.
This classification method can be extended to other types of Arabic poetry.
References
[1] MA Ahmed, S Trausan-Matu. Using natural language processing for analyzing Arabic poetry rhythm.
in Networking in Education and Research (RoEduNet), 2017 16th RoEduNet Conference. 2017: 1-5.
[2] S Al-Harbi, A Almuhareb, A Al-Thubaity, M Khorsheed, A Al-Rajeh. Automatic Arabic text
classification. JADT 2008: 9es Journées internationales d’Analyse statistique des Données
Textuelles. 2008: 77-83.
[3] M Abdul-Mageed, M T Diab, M Korayem. Subjectivity and sentiment analysis of modern standard
Arabic. in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics:
Human Language Technologies. 2011; 2: 587-591.
[4] A Ortony, GL Clore, A Collins. The cognitive structure of emotions. Cambridge University Press. 1990.
[5] H Liu, H Lieberman, T Selker. A model of textual affect sensing using real-world knowledge. in
Proceedings of the 8th international conference on Intelligent user interfaces. 2003: 125-132.
[6] MG Dyer. Emotions and their computations: Three computer models. Cognition and emotion. 1987;
1(3): 323-347.
[7] O Alsharif, D Alshamaa, N Ghneim. Emotion classification in Arabic poetry using machine learning.
International Journal of Computer Applications. 2013; 56(16):10-15.
[8] MM Al-Tahrawi, SN Al-Khatib. Arabic text classification using Polynomial Networks. Journal of King
Saud University-Computer and Information Sciences. 2015; 27(4): 437-449.
[9] S Alsaleem. Automated Arabic Text Categorization using SVM and NB. Int. Arab J. e-Technol.
2011; 2(2): 124-128.
[10] R Belkebir, A Guessoum. A hybrid BSO-Chi2-SVM approach to Arabic text categorization. in ACS
International Conference on Computer Systems and Applications (AICCSA). 2013: 1-7.
[11] J Ababneh, O Almomani, W Hadi, NKT El-Omari, A Al-Ibrahim. Vector space models to classify
Arabic text. International Journal of Computer Trends and Technology (IJCTT). 2014; 7(4):
219-223.
[12] S Khorsheed, AO Al-Thubaity. Comparative evaluation of text classification techniques using a large
diverse Arabic dataset. Language Resources and Evaluation. 2013; 47(2): 513-538.
[13] L Fodil, H Sayoud, S Ouamour. Theme classification of Arabic text: A statistical approach.
Terminology and Knowledge Engineering. 2014: 01005873.
[14] C Holes. Modern Arabic: Structures, functions, and varieties: Georgetown University Press. 2004.
[15] S Khoja, R Garside. Stemming Arabic text. Lancaster, UK, Computing Department, Lancaster
University. 1999.
[16] B Pang, L Lee, S Vaithyanathan. Thumbs up?: sentiment classification using machine learning
techniques. in Proceedings of the ACL-02 conference on Empirical methods in natural language
processing. 2002; 10: 79-86.
[17] C Sudheer, R Maheswaran, B K Panigrahi, S Mathur. A hybrid SVM-PSO model for forecasting
monthly streamflow. Neural Computing and Applications. 2014; 24(6): 1381-1389.
[18] X Zhang, S Ding, Y Xue. An improved multiple birth support vector machine for pattern classification.
Neurocomputing. 2017; 225: 119-128.
[19] Z Chen, Z Qi, B Wang, L Cui, F Meng, Y Shi. Learning with label proportions based on nonparallel
support vector machines. Knowledge-Based Systems. 2017; 119: 126-141.
[20] W Jiang, D-S Huang, S Li. Random walk-based solution to triple level stochastic point location
problem. IEEE transactions on cybernetics. 2016; 46(6): 1438-1451.
[21] T Joachims. Text categorization with support vector machines: Learning with many relevant features.
in European conference on machine learning. 1998: 137-142.
[22] F Debole, F Sebastiani. An analysis of the relative hardness of Reuters-21578 subsets. Journal of the
American Society for Information Science and Technology. 2005; 56(6): 584-596.
[23] RA Hasan, MA Mohammed, ZH Salih, MAB Ameedeen, N Ţăpuş, MN Mohammed. HSO: A Hybrid
Swarm Optimization Algorithm for Reducing Energy Consumption in the Cloudlets. TELKOMNIKA
Telecommunication, Computing, Electronics and Control. 2018; 16(5): 2144-2154.
[24] RA Hasan, MA Mohammed, N Ţăpuş, OA Hammood. A comprehensive study: Ant Colony
Optimization (ACO) for facility layout problem. in 2017 16th RoEduNet Conference: Networking in
Education and Research (RoEduNet). 2017: 1-8.
[25] MA Mohammed, ZH Salih, N Ţăpuş, RAK Hasan. Security and accountability for sharing the data
stored in the cloud. in 2016 15th RoEduNet Conference: Networking in Education and Research.
2016: 1-5.
[26] MA Mohammed, N Ţăpuş. A Novel Approach of Reducing Energy Consumption by Utilizing
Enthalpy in Mobile Cloud Computing. Studies in Informatics and Control. 2017; 26: 425-434.
[27] MA Mohammed, RA Hasan. Particle swarm optimization for facility layout problems FLP—A
comprehensive study. in 2017 13th IEEE International Conference on Intelligent Computer
Communication and Processing (ICCP). 2017: 93-99.
[28] ZH Salih, GT Hasan, MA Mohammed. Investigate and analyze the levels of electromagnetic
radiations emitted from underground power cables extended in modern cities. in 2017 9th
International Conference on Electronics, Computers and Artificial Intelligence (ECAI). 2017.
[29] RA Hasan, MN Mohammed. A krill herd behaviour inspired load balancing of tasks in cloud
computing. Studies in Informatics and Control. 2017; 26: 413-424.
[30] MA Mohammed, RA Hasan, MA Ahmed, N Ţăpuş, MA Shanan, MK Khaleel, et al. A Focal load
balancer based algorithm for task assignment in cloud environment. in 2018 10th International
Conference on Electronics, Computers and Artificial Intelligence (ECAI). 2018: 1-4.

PDF
Computer organization and architecuture Digital Notes....pdf
PDF
Applications of Equal_Area_Criterion.pdf
PPTX
CONTRACTS IN CONSTRUCTION PROJECTS: TYPES
PDF
Accra-Kumasi Expressway - Prefeasibility Report Volume 1 of 7.11.2018.pdf
PDF
Introduction to Power System StabilityPS
PDF
MLpara ingenieira CIVIL, meca Y AMBIENTAL
PPT
Chapter 1 - Introduction to Manufacturing Technology_2.ppt
PPTX
tack Data Structure with Array and Linked List Implementation, Push and Pop O...
PDF
Unit I -OPERATING SYSTEMS_SRM_KATTANKULATHUR.pptx.pdf
PPTX
Sorting and Hashing in Data Structures with Algorithms, Techniques, Implement...
PDF
Prof. Dr. KAYIHURA A. SILAS MUNYANEZA, PhD..pdf
PPTX
CyberSecurity Mobile and Wireless Devices
PPTX
Graph Data Structures with Types, Traversals, Connectivity, and Real-Life App...
PDF
Java Basics-Introduction and program control
PDF
Influence of Green Infrastructure on Residents’ Endorsement of the New Ecolog...
PPTX
CN_Unite_1 AI&DS ENGGERING SPPU PUNE UNIVERSITY
PDF
Soil Improvement Techniques Note - Rabbi
PPTX
ASME PCC-02 TRAINING -DESKTOP-NLE5HNP.pptx
Cryptography and Network Security-Module-I.pdf
20250617 - IR - Global Guide for HR - 51 pages.pdf
Computer organization and architecuture Digital Notes....pdf
Applications of Equal_Area_Criterion.pdf
CONTRACTS IN CONSTRUCTION PROJECTS: TYPES
Accra-Kumasi Expressway - Prefeasibility Report Volume 1 of 7.11.2018.pdf
Introduction to Power System StabilityPS
MLpara ingenieira CIVIL, meca Y AMBIENTAL
Chapter 1 - Introduction to Manufacturing Technology_2.ppt
tack Data Structure with Array and Linked List Implementation, Push and Pop O...
Unit I -OPERATING SYSTEMS_SRM_KATTANKULATHUR.pptx.pdf
Sorting and Hashing in Data Structures with Algorithms, Techniques, Implement...
Prof. Dr. KAYIHURA A. SILAS MUNYANEZA, PhD..pdf
CyberSecurity Mobile and Wireless Devices
Graph Data Structures with Types, Traversals, Connectivity, and Real-Life App...
Java Basics-Introduction and program control
Influence of Green Infrastructure on Residents’ Endorsement of the New Ecolog...
CN_Unite_1 AI&DS ENGGERING SPPU PUNE UNIVERSITY
Soil Improvement Techniques Note - Rabbi
ASME PCC-02 TRAINING -DESKTOP-NLE5HNP.pptx
