International Journal of Database Management Systems (IJDMS) Vol.13, No.4, August 2021
DOI:10.5121/ijdms.2021.13401
TEXT ADVERTISEMENTS ANALYSIS USING
CONVOLUTIONAL NEURAL NETWORKS
Abdulwahed Almarimi and Asmaa Salem
Department of Computer Science, Bani Waleed University, Libya
ABSTRACT
In this paper, we describe a convolutional neural network (CNN) model developed for the
classification of advertisements. The method was tested on texts in two languages, Arabic and
Slovak. The advertisements are short texts taken from classified-advertisement websites. We
developed a modified CNN model, implemented it, and then developed further modifications,
studying their influence on the performance of the proposed network. The result is a functional
model of the network, its implementation in Java and Python, and an analysis of the model results
for different network parameters and input data. The experimental results show that the developed
CNN model is useful for Arabic and Slovak short texts, mainly for the classification of
advertisements.
KEYWORDS
Convolutional neural networks, advertisement text, back-propagation algorithm, classification, encoding
of text
1. INTRODUCTION
Advertisement texts form a large set of data in which each item can be assigned an appropriate
category. Authors of advertisements can select the category themselves, but sometimes they do
not recognize the suitable one. Generally, an advertisement consists of sentences that are not
long and can be categorized automatically. Good classification results using convolutional neural
networks (CNN) are reported in [1], [2]. Convolutional neural networks are used in many domains
of data processing and achieve their best results mostly in image processing, as presented in [3].
There are also models that apply them to text processing with very good results [1]. For the
processing of sentences and the classification of advertisements, modified CNNs are used.
The classification of advertisement texts can be done using a method for sentence classification
in two ways: (1) by treating the whole advertisement text as one sentence formed by words, or
(2) by analyzing each sentence and then combining the results of all sentences. In the first step
of our analysis we used the first method. Arabic and Slovak have different grammars with free
word order. This means that the positions of subjects and predicates in a sentence are not fixed,
and it appears that Arabic grammar is more complicated than Slovak grammar. We used grammar
information in the preparation of the text data as input to the CNN, which can help to achieve
better classification. We used some connections between pairs of words as auxiliary information
in the input. As we explained in [4], [5], working on Arabic and English texts, the results were
less complicated than those we obtained using Arabic and Slovak texts.
The paper is structured as follows. The second section contains a description of the data and
their preparation as input to the network. In the third section, we describe the developed model
of the convolutional neural network. The results and their comparison for both languages are
given in the fourth section, and we summarize the results in the conclusion.
2. DATA PREPROCESSING
The advertisements are very similar in both languages, Arabic and Slovak. We would nevertheless
like to compare the results of advertisement classification using a convolutional neural network
within each language and between the two languages.
2.1. Information on Used Databases of Advertisements
• Arabic advertisements: advertisements from the 3qaratonline.com web portal [6]. We used
3 categories of advertisements: realty, furniture and electric devices. We supposed that
descriptions of electric devices have many features in common with descriptions of
furniture, while the realty category has very little similarity to the other categories.
• Slovak advertisements: advertisements from the Bazos.sk portal [7], from which we
selected specific categories to examine. We chose 3 categories, namely mobiles, computers
and laptops, and animals. We supposed that descriptions of mobile phones have many
features in common with descriptions of computers and laptops, while the animals category
has very little similarity to the other categories, and therefore advertisements in this
category could be evaluated better by the neural network.
The numbers of advertisements used in each category are given in Table 1.
Table 1. The numbers of used advertisements from three categories in both languages.

Category                 Arabic texts   Slovak texts
Realty                   205            -
Furniture                194            -
Electric devices         206            -
Mobiles                  -              11470
Computers and laptops    -              4968
Animals                  -              3247
2.2. Creating Word Vectors
In order for the neural network to be able to process the text, we need to convert it to vectors
of real numbers. In our case, we want to process the text word by word in order to determine the
category based on the words used. Therefore, we need to create a vector of uniform length for each
word that will be used in the input data for the CNN. Most often, pretrained models are used,
which are trained on a large number of texts to recognize the correlations between different
words based on their use in text. They create a vector model that maps words into a vector
space in which semantically related words are represented by points situated close to each other.
The most widely used of these models is the software package word2vec [8].
The disadvantage of these solutions is that all well-trained prepared models are in English. Given
the time needed to create our own model for learning similar word vectors and the
characteristics of our data, we chose not to use this approach. Instead, we look at words as
images of coded letters (one letter represents one pixel) and create word vectors so that
each word is coded letter by letter. For each letter a-z, we randomly generated a code
$v \in (0, 1)$, ensuring that the code for each letter is unique. This means that different
words have different codes.
Thereafter, we have removed the diacritics for each word in our data and created an encoding
vector

$$v = (v_1, v_2, \ldots, v_d, 0, \ldots, 0), \quad v_i \in (0, 1), \quad 1 \le i \le d, \qquad (1)$$
where $v_1, v_2, \ldots, v_d$ are the codes of the individual letters in a word of length d. The
vector is padded with zeros to the uniform vector length l. Thus, we created a table that contains
each word and its encoding vector. After text processing, we obtained a matrix of real numbers for
each advertisement, with a fixed number of columns equal to the selected vector length and
a variable number of rows depending on the number of words in the advertisement. We used these
matrices as input to the neural networks. The preprocessing was very similar for both
languages.
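The encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the fixed random seed, the example text, the decision to drop characters outside the chosen alphabet, and the truncation of words longer than the vector length are all hypothetical choices. The alphabet is a parameter, so another alphabet (for example, Arabic letters) could be passed instead of a-z.

```python
import random
import string
import unicodedata


def make_letter_codes(alphabet=string.ascii_lowercase, seed=0):
    """Assign each letter a unique random code in the open interval (0, 1)."""
    rng = random.Random(seed)
    codes = {}
    used = set()
    for letter in alphabet:
        code = rng.random()
        # Redraw in the very unlikely case of a duplicate or an exact 0.0.
        while code in used or code == 0.0:
            code = rng.random()
        used.add(code)
        codes[letter] = code
    return codes


def strip_diacritics(word):
    """Remove combining marks, e.g. 'mačka' becomes 'macka'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def encode_word(word, codes, length):
    """Encode a word letter by letter and pad with zeros up to `length`."""
    # Assumption: characters without a code are dropped, long words are cut.
    letters = [ch for ch in strip_diacritics(word.lower()) if ch in codes]
    vector = [codes[ch] for ch in letters[:length]]
    return vector + [0.0] * (length - len(vector))


def encode_advertisement(text, codes, length):
    """Return one row per word: an n x length matrix as a list of lists."""
    return [encode_word(word, codes, length) for word in text.split()]


codes = make_letter_codes()
matrix = encode_advertisement("Predám mačku", codes, length=8)
```

Here `matrix` has one row per word and 8 columns; the six letters of "predam" fill the first six positions of the first row and the remaining two are zeros.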
3. DEVELOPED MODEL OF CNN
We developed a network structure similar to the one suggested in [1], which was used for the
processing of sentences in texts, and we tried to find parameters with which the network would
evaluate our data well, using the findings of [9].
3.1. Input Layer
As mentioned above, the network input is a matrix of real numbers in which each row is the vector
representation of one word of the advertisement after its processing. The number of rows differs
for each input, while the number of columns is equal to the selected word vector length l.
Let $x_i \in R^l$ be a vector of length l representing the i-th word in the processed
advertisement. If the advertisement has n words, the resulting representation is

$$x = x_1 \oplus x_2 \oplus \ldots \oplus x_n, \qquad (2)$$

where $\oplus$ is the concatenation operator. We get the vector x of length $n \cdot l$, which
is represented as a matrix of type $n \times l$.
Figure 1. The basic model of the prepared convolutional network [1].
3.2. Convolutional Layer
The main idea of a convolutional layer is to work in an invariant way across a text. The
convolutional layer consists of filters $F_j \in R^{h \times l}$, $1 \le j \le \tilde{n}$, where
$\tilde{n}$ is the number of filters and h is the number of words in the filter, which is applied
to obtain a new feature. For example, the feature $c_j^i$ is calculated by applying the filter
$F_j$ to the words $x_i$ to $x_{i+h-1}$:

$$c_j^i = f(F_j \cdot x_{i:i+h-1} + b), \qquad (3)$$

where $b \in R$ is the threshold for the filter and f is an activation function such as the
hyperbolic tangent or the identity. In the literature, the convolution operation is often
performed by rotating the filter before it is applied, but this does not affect the network
processing [Goodfellow2016]. If the filter $F_j$ is applied to every possible part of the input
with shift 1, we obtain the vector of features

$$c_j = (c_j^1, c_j^2, \ldots, c_j^{n-h+1}). \qquad (4)$$
If h > n, we need to modify the input so that the filter can be applied to it. This is achieved
by adding zero vectors (so-called zero padding). This means that a filter of any size can be
applied to any input (we do not have to impose a minimum size on the input). The following
vector $c^i$ represents the features of the same words for all filters:

$$c^i = (c_1^i, c_2^i, \ldots, c_{\tilde{n}}^i). \qquad (5)$$
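The application of one filter with shift 1, including the zero padding for inputs shorter than the filter, might look as follows. This is a minimal pure-Python sketch under assumptions: the names `zero_pad` and `apply_filter`, the list-of-lists matrix representation, and the toy numbers are illustrative and are not taken from the paper's Java or Python implementation.

```python
import math


def zero_pad(x, h):
    """Append zero rows so that the input has at least h rows."""
    width = len(x[0])
    missing = max(0, h - len(x))
    return x + [[0.0] * width for _ in range(missing)]


def apply_filter(x, filt, b=0.0, f=math.tanh):
    """Slide an h x l filter over the rows of x with shift 1.

    Each window of h consecutive word vectors is multiplied element-wise
    with the filter, summed, shifted by the threshold b and passed
    through the activation function f. The result has n - h + 1 features.
    """
    h = len(filt)
    x = zero_pad(x, h)
    features = []
    for i in range(len(x) - h + 1):
        window = x[i:i + h]
        total = sum(
            filt[r][c] * window[r][c]
            for r in range(h)
            for c in range(len(filt[r]))
        )
        features.append(f(total + b))
    return features


def identity(value):
    return value


# Toy input: n = 3 words, vector length l = 2, filter height h = 2.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filt = [[1.0, 0.0], [0.0, 1.0]]
features = apply_filter(x, filt, f=identity)  # two windows -> two features

# A one-word input is padded with a zero row before the filter is applied.
padded_features = apply_filter([[1.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]], f=identity)
```

With the identity activation the toy input gives the features 2.0 and 1.0, and the padded one-word input gives a single feature 3.0.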
3.3. Pooling Layer
The output of the convolutional layer consists of the so-called feature vectors. For each filter
we get one such vector, and the vectors have different lengths depending on the length of the
input and the length of the filter. We remove these differences by subsampling each vector with
the max pooling function, which gives each filter its most distinctive feature:

$$c_j^{\max} = \max\{c_j^i;\ 1 \le i \le n-h+1\}, \quad j = 1, 2, \ldots, \tilde{n}. \qquad (6)$$

The best feature (the value of the best filter) for the words $x_i \ldots x_{i+h-1}$ is

$$c_{\max}^i = \max\{c_j^i;\ 1 \le j \le \tilde{n}\}. \qquad (7)$$

We join the values of all filters into one vector of real numbers whose length equals the number
of filters used, regardless of the length of the input. This vector is the output of the pooling
layer.
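A short sketch of the pooling step, assuming the feature vectors are stored as plain Python lists (the name `max_pool` and the numbers are illustrative only), shows why the output length depends only on the number of filters:

```python
def max_pool(feature_vectors):
    """Keep only the largest feature produced by each filter."""
    return [max(features) for features in feature_vectors]


# Three filters of different heights yield vectors of different lengths.
feature_vectors = [[0.1, 0.7, 0.3], [0.5, 0.2], [-0.4]]
pooled = max_pool(feature_vectors)
```

Whatever the lengths of the individual feature vectors, `pooled` always has one value per filter, here `[0.7, 0.5, -0.4]`.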
3.4. Output Layer
The next layer is the output layer. It is fully connected to the pooling layer, which means that
each neuron in this layer is connected to each neuron in the previous layer. We compute the
value of the i-th output neuron as

$$o_i = \sum_{j=1}^{\tilde{n}} c_j \cdot w_{ij}, \qquad (8)$$

where c is the vector obtained by applying the $\tilde{n}$ filters and then the pooling layer, and
w is the matrix of the output layer weights. The activation function is the softmax function,
which converts a vector from $R^k$ to a vector from $(0, 1)^k$ whose values sum to 1, using the
values from equation (7). The activation function is applied to the calculated vector o:

$$y_i = \frac{\exp(o_i)}{\sum_{j=1}^{k} \exp(o_j)}. \qquad (9)$$
Thus, the calculated values of the output neurons can be interpreted as coefficients expressing
how strongly the network assigns the input to each category. If the output of the i-th neuron is
equal to 1, we conclude that the network has assigned the input to the i-th category.
However, there is a problem with using the softmax function. If $o_i$ is large enough (for
example, when the identity is used as the activation function of the filters), $\exp(o_i)$ is
close to infinity, causing errors in the calculations. To avoid this, we added a constant
$D \in R$ to the exponents:

$$y_i = \frac{\exp(o_i)}{\sum_{j=1}^{k} \exp(o_j)} = \frac{\exp(o_i + D)}{\sum_{j=1}^{k} \exp(o_j + D)}. \qquad (10)$$

By applying the modified softmax function, all inputs to the function are shifted to negative
values (with the exception of the largest one, which moves to 0), so that an element with a very
low negative value (when the difference between the element and the maximum is large enough) has
a resulting value of 0. We have thus avoided "NaN" values, but the output vector y can now also
contain the values 0 and 1.
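The effect of the shift can be illustrated with a small Python sketch. It assumes that the constant is chosen as the negative of the largest input, which matches the description that the largest input moves to 0; the function name `softmax_shifted` and the example values are hypothetical.

```python
import math


def softmax_shifted(o):
    """Softmax with every input shifted by D = -max(o)."""
    d = -max(o)
    # After the shift the largest exponent is 0, so exp() cannot overflow.
    exps = [math.exp(value + d) for value in o]
    total = sum(exps)
    return [e / total for e in exps]


# Calling math.exp(1000.0) directly raises OverflowError in Python,
# but the shifted version handles the same inputs without error.
y = softmax_shifted([1000.0, 0.0, -1000.0])
y_small = softmax_shifted([1.0, 2.0, 3.0])
```

For the extreme input the result is exactly `[1.0, 0.0, 0.0]`, which shows how the values 0 and 1 can appear in the output vector, while for moderate inputs the result is the same as for the unshifted softmax.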
3.5. The Learning Algorithm
Back-propagation is an algorithm in which network errors are propagated back across the layers
so that the respective weights can be modified appropriately and the network outputs gradually
improve. The network error is the difference between the expected and the actual neuron output.
The amount by which an individual weight is adjusted depends on how much the neuron into which
the weight enters contributed to the overall error and on the learning rate $\alpha$.
• Output Layer: The output of our network is a vector of length k, where k is the number of
categories we are examining. Each value of the output vector y is a number $y_i \in [0, 1]$.
The expected output is a vector a in which the value $a_i$ for the index i associated with the
category of the input is equal to 1 and the other values are equal to 0. The resulting network
error in such cases is best calculated as the cross-entropy cost in order to avoid a learning
slowdown of the network:

$$C = -\sum_{i=1}^{k} a_i \cdot \log y_i, \qquad (11)$$

where $a_i = 1$ if the input belongs to category i and $a_i = 0$ otherwise, so that C reduces to
$-\log y_i$ for the correct category i. For each neuron of the output layer we calculated its
error signal $\delta$, which we later propagated into the previous layers. The error signal of a
neuron is the amount by which the given neuron contributed to the resulting network error. In
our case, we denote by y the output vector of the network and by o the vector of potentials of
the output layer neurons before the activation function is applied. The expected network output
is the vector a. Subsequently we calculated $\delta^v$ for the output layer with respect to the
cross-entropy function:

$$\delta_i^v = \frac{\partial C}{\partial o_i} = y_i - a_i. \qquad (12)$$

• Pooling Layer: Since the pooling layer only passes on the greatest value from the vectors
produced by the filters, its activation function is the identity, whose derivative equals 1.
Therefore, for this layer we simply calculated

$$\delta_i^p = \sum_{j=1}^{k} w_{ji} \cdot \delta_j^v, \qquad (13)$$

where the sum runs over the k output neurons and $w_{ji}$ is the weight leading from the i-th
pooling neuron to the j-th output neuron, as in equation (8).
• Convolutional Layer: The pooling layer passes the largest value of each filter's output vector
to its output. This means that the weights are equal to 0 for all neurons except the maximum,
for which the weight is equal to 1. The error signal therefore propagates only to the neuron
that contains the maximum value, which is the only neuron whose error signal we need to
compute. For each filter $F_i$ we calculated $\delta_i^F$ based on the activation function f
used in the filters, the potential $h_j^i$ of the maximum value of the resulting feature vector
before the activation function is applied, and the error signal $\delta_i^p$ of the pooling
layer:

$$\delta_i^F = f'(h_j^i) \cdot \delta_i^p. \qquad (14)$$

• Modification of weights: We adjusted the weights based on the parameter $\alpha$, which
indicates the learning speed and is set to a small value (usually 0.001 - 0.05) before the
training of the network. The weights are adjusted by equation (15), where $w_{ij}^{new}$ is the
new value of the weight, $w_{ij}^{old}$ is its original value, $\delta_i$ is the calculated error
signal of the neuron into which the weight enters, and $v_j^{prev}$ is the output value of the
neuron of the previous layer associated with the weight:

$$w_{ij}^{new} = w_{ij}^{old} - \alpha \cdot \delta_i \cdot v_j^{prev}. \qquad (15)$$

We did not modify the weights in the pooling layer because we still want it to return the
maximal value of the filter output. We only adjusted the weights of the filter based on those
values of the input layer that contributed to the maximum value (for all others, the value of the
error signal is 0). If $c_l = \max(c)$, so that l is the index of the maximum value of the
vector, we adjusted the weights of the filter as follows:

$$w_{ij}^{new} = w_{ij}^{old} - \alpha \cdot \delta_i^F \cdot x_{l+i-1,\,j}, \qquad (16)$$

where x is the input matrix after the possible application of zero padding.
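The output-layer part of the learning step can be sketched as follows. This is an illustration under explicit assumptions, not the authors' code: the function names and toy numbers are hypothetical, the weight matrix is indexed as `w[output][pooled]`, and the weight update subtracts the gradient term, which is the usual gradient-descent convention when the error signal is defined as the actual output minus the expected output.

```python
import math


def cross_entropy(y, a):
    """Cross-entropy cost C = -sum_i a_i * log(y_i) for a one-hot target a."""
    return -sum(ai * math.log(yi) for yi, ai in zip(y, a) if ai > 0.0)


def output_delta(y, a):
    """Error signal of the output layer: actual output minus expected output."""
    return [yi - ai for yi, ai in zip(y, a)]


def pooling_delta(w, delta_out):
    """Propagate the output error signals back to the pooling layer."""
    n_pool = len(w[0])
    return [
        sum(w[j][i] * delta_out[j] for j in range(len(delta_out)))
        for i in range(n_pool)
    ]


def update_weights(w, delta, v_prev, alpha):
    """One gradient step: w_ij <- w_ij - alpha * delta_i * v_prev_j (assumed sign)."""
    return [
        [w[i][j] - alpha * delta[i] * v_prev[j] for j in range(len(v_prev))]
        for i in range(len(delta))
    ]


y = [0.7, 0.2, 0.1]          # network output for k = 3 categories
a = [1.0, 0.0, 0.0]          # expected output: the first category
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 output neurons x 2 pooled values
v_prev = [1.0, 0.5]          # output of the pooling layer

cost = cross_entropy(y, a)
delta = output_delta(y, a)
delta_pool = pooling_delta(w, delta)
new_w = update_weights(w, delta, v_prev, alpha=0.1)
```

In this toy example the error signal of the correct category is negative, so the weights entering its output neuron grow, while the weights entering the other two output neurons shrink.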
4. RESULTS OF APPLICATION
For word vectors we chose the length 50. The length was chosen with respect to the longest word
in [10] and to allow for grammatical errors in the scanned text. The following measures are
used:
1. accuracy: how often the predictions match the labels;
2. false negatives: the total number of false negatives, i.e. sentences whose actual class is
yes but whose predicted class is no;
3. false positives: the total number of false positives, i.e. sentences whose actual class is
no but whose predicted class is yes;
4. precision: the precision of the predictions with respect to the labels.
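The four measures listed above can be computed as in the following sketch, which uses the standard definitions for one category treated as the positive class. The function names and the toy labels are assumptions made for illustration; the paper does not state which library functions were used.

```python
def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one category."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1          # predicted the category, but it was another one
        elif a == positive:
            fn += 1          # missed an item of the category
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(actual, predicted):
    """Share of predictions that match the labels."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


def precision(tp, fp):
    """Share of positive predictions that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


actual = ["realty", "realty", "furniture", "electric", "furniture"]
predicted = ["realty", "furniture", "furniture", "furniture", "furniture"]

tp, fp, fn, tn = confusion_counts(actual, predicted, positive="furniture")
acc = accuracy(actual, predicted)
prec = precision(tp, fp)
```

For these toy labels the counts for the furniture category are 2 true positives, 2 false positives, 0 false negatives and 1 true negative, giving an accuracy of 0.6 and a precision of 0.5.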
Table 2. Statistics of results

Language   Set: #segments   Accuracy   False negative   False positive   Precision
Arabic     TR: 484          0.587      34.00            13.00            0.5810
Arabic     TE: 121          0.735      10.10            17.00            0.8760
Slovak     TR: 484          0.460      0.420            36.00            0.7330
Slovak     TE: 121          0.589      124.0            106.00           0.7850
The accuracy is better for Arabic texts than for Slovak texts, while the precision is comparable
for both languages. We consider all results positive, and they indicate that this is a promising
direction for further research.
5. CONCLUSION
In this paper we dealt with one type of neural network for the classification of advertisements.
The model was tested on short advertisement texts written in Arabic and Slovak. We have shown
a way to obtain data from advertisement websites. The modified model is suitable for use with
both languages, independently of the obtained results, but more texts need to be analyzed. We
designed the convolutional neural network model, implemented it in the Java and Python
programming languages, and examined it using different activation functions, learning rate
coefficients, filter counts and filter sizes. We have shown the results of the network using the
proposed model. The results on the testing data demonstrate that the neural network model
achieves good classification on different data sets (Arabic and Slovak advertisements) after
training on correctly labeled inputs. According to the given results, within our dataset Arabic
advertisements are classified into the suitable category in 87% of cases and Slovak
advertisements in 78% of cases. Part of the output is used along with a database model that uses
the order file to train the selected network model, and it shows the percentage of success on the
training and test sets during the learning of the network. In future work we will continue with
different encodings and put more information into the codes. We plan to produce more statistical
results from evaluations of bigger text sets, to analyze them in different languages, and to
compare our previous results with the results we will obtain.
ACKNOWLEDGEMENTS
Thanks to Prof. Gabriela Andrejková, CSc. and Mgr. Šimon Horváth for their help.
REFERENCES
[1] Kim, Y. (2014) Convolutional Neural Networks for Sentence Classification. arXiv:1408.5882.
Chicago, web-page: https://arxiv.org/abs/1408.5882
[2] Johnson, R. Zhang, T. (2014) “Effective Use of Word Order for Text Categorization with
Convolutional Neural Networks”, arXiv:1412.1058. web-page: https://arxiv.org/abs/1412.1058,
Denver, Colorado.
[3] LeCun, Y. (1990) Handwritten Digit Recognition with a Back-Propagation Network. Morgan
Kaufmann, San Francisco.
[4] Asmaa, Salem, Abdulwahed, Almarimi & Gabriela Andrejkova “Text Dissimilarities Predictions
using Convolutional Neural Networks and Clustering”, 1st World Symposium on Digital Intelligence
for Systems and Machines August 23-25, 2018, International Conference in Technical University of
Košice, Slovakia, ISBN: 978-1-5386-5101-8, pp 343-348.
[5] Asmaa, Salem & Gabriela, Andrejková “Analysis of text advertisements using convolutional neural
networks”, In proceedings of Cognition and artificial life, Brno, 30. 5. - 1. 6. 2018,
ISBN 978-80-88123-24-8, pp 59-61.
[6] 3qaratonline.com - web-page [online]: https://www.3qaratonline.com/
[7] Bazos.sk - web-page [online]: https://www.bazos.sk/
[8] Mikolov, T (2013) “Distributed Representations of Words and Phrases and their Compositionality”,
proceedings of the 26th International Conference on Neural Information Processing Systems.
[9] Zhang, Y. Wallace, B. (2015) “A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional
Neural Networks for Sentence classification”, arXiv:1510.03820. web-page:
http://arxiv.org/abs/1510.03820
[10] Tvaroslovník - web page [online], http://tvaroslovnik.ics.upjs.sk/
AUTHORS
Master's degree from P. J. Šafárik University in Košice, Institute of Computer Science,
Faculty of Science, 2012; PhD from P. J. Šafárik University in Košice, Institute of
Computer Science, Faculty of Science, 2016; faculty member at Bani Waleed University,
Libya, since 2017.
Master's degree from P. J. Šafárik University in Košice, Institute of Computer Science, Faculty of
Science, 2014; PhD from P. J. Šafárik University in Košice, Institute of Computer Science, Faculty of
Science, 2019; faculty member at Bani Waleed University, Libya, since 2020.

More Related Content

PDF
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
PDF
Texts Classification with the usage of Neural Network based on the Word2vec’s...
PDF
Texts Classification with the usage of Neural Network based on the Word2vec’s...
PDF
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
PDF
Nlp and Neural Networks workshop
PPTX
Industrial Trainingdbhkbdbdwjb dbxjnwbndcbj
PPTX
NLP Bootcamp
PDF
Representation Learning of Text for NLP
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
Texts Classification with the usage of Neural Network based on the Word2vec’s...
Texts Classification with the usage of Neural Network based on the Word2vec’s...
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
Nlp and Neural Networks workshop
Industrial Trainingdbhkbdbdwjb dbxjnwbndcbj
NLP Bootcamp
Representation Learning of Text for NLP

Similar to TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS (20)

PDF
Anthiil Inside workshop on NLP
PPTX
Talk from NVidia Developer Connect
PDF
Automatic Text Classification Of News Blog using Machine Learning
PPTX
Text Classification
PPTX
Text features
PPTX
Natural Language Processing
PDF
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and Applications
PPTX
Introducción a NLP (Natural Language Processing) en Azure
PDF
NLP and Deep Learning for non_experts
PDF
NLP Bootcamp 2018 : Representation Learning of text for NLP
PDF
Text Document Classification System
PPTX
Deep Learning for Natural Language Processing
PDF
IRJET- Visual Information Narrator using Neural Network
PDF
AINL 2016: Nikolenko
PDF
MACHINE-DRIVEN TEXT ANALYSIS
PDF
1_Introduction_to_ML,_Machine_Learning_Process,_Applications_of.pdf
PPTX
Group 5 Text Vectorization in Natural Language Processing.pptx
PDF
IRJET- Extension to Visual Information Narrator using Neural Network
PPTX
Feature Engineering for NLP
PDF
Statistical Machine Learning for Text Classification with scikit-learn and NLTK
Anthiil Inside workshop on NLP
Talk from NVidia Developer Connect
Automatic Text Classification Of News Blog using Machine Learning
Text Classification
Text features
Natural Language Processing
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and Applications
Introducción a NLP (Natural Language Processing) en Azure
NLP and Deep Learning for non_experts
NLP Bootcamp 2018 : Representation Learning of text for NLP
Text Document Classification System
Deep Learning for Natural Language Processing
IRJET- Visual Information Narrator using Neural Network
AINL 2016: Nikolenko
MACHINE-DRIVEN TEXT ANALYSIS
1_Introduction_to_ML,_Machine_Learning_Process,_Applications_of.pdf
Group 5 Text Vectorization in Natural Language Processing.pptx
IRJET- Extension to Visual Information Narrator using Neural Network
Feature Engineering for NLP
Statistical Machine Learning for Text Classification with scikit-learn and NLTK
Ad

Recently uploaded (20)

PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
Geodesy 1.pptx...............................................
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PPTX
Current and future trends in Computer Vision.pptx
PDF
R24 SURVEYING LAB MANUAL for civil enggi
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PPTX
Artificial Intelligence
PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PPTX
Sustainable Sites - Green Building Construction
PPTX
OOP with Java - Java Introduction (Basics)
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PPTX
additive manufacturing of ss316l using mig welding
PDF
composite construction of structures.pdf
PPTX
UNIT 4 Total Quality Management .pptx
PDF
Digital Logic Computer Design lecture notes
CYBER-CRIMES AND SECURITY A guide to understanding
Geodesy 1.pptx...............................................
Foundation to blockchain - A guide to Blockchain Tech
CH1 Production IntroductoryConcepts.pptx
Operating System & Kernel Study Guide-1 - converted.pdf
Current and future trends in Computer Vision.pptx
R24 SURVEYING LAB MANUAL for civil enggi
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
Artificial Intelligence
Automation-in-Manufacturing-Chapter-Introduction.pdf
Sustainable Sites - Green Building Construction
OOP with Java - Java Introduction (Basics)
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
UNIT-1 - COAL BASED THERMAL POWER PLANTS
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
additive manufacturing of ss316l using mig welding
composite construction of structures.pdf
UNIT 4 Total Quality Management .pptx
Digital Logic Computer Design lecture notes
Ad

TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS

  • 1. International Journal of Database Management Systems (IJDMS) Vol.13, No.4, August 2021 DOI:10.5121/ijdms.2021.13401 1 TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS AbdulwahedAlmarimi and Asmaa Salem Department of Computer Science, Bani Waleed University, Libya ABSTRACT In this paper, we describe the developed model of the Convolutional Neural Networks CNN to a classification of advertisements. The developed method has been tested on both texts (Arabic and Slovak texts).The advertisements are chosen on a classified advertisements websites as short texts. We evolved a modified model of the CNN, we have implemented it and developed next modifications. We studied their influence on the performing activity of the proposed network. The result is a functional model of the network and its implementation in Java and Python. And analysis of model results using different parameters for the network and input data. The results on experiments data show that the developed model of CNN is useful in the domains of Arabic and Slovak short texts, mainly for some classification of advertisements. KEYWORDS Convolutional neural networks, advertisement text, back-propagation algorithm, classification, encoding of text 1. INTRODUCTION Advertisement texts form a big set of data that we can individually choose an appropriate category. Advertisements authors can select the category but sometimes they do not recognize the suitable category. Generally, an advertisement consists of sentences, which are not long and can be categorized in a good automatic way. Its good classification can be found in [1], [2] using convolutional neural networks (CNN). Convolutional neural networks are used in many domains of data processing and realize the best results mostly in image processing as it is presented in [3]. There exist models that have searched their use for word processing and get great results [1]. 
For the processing of sentence and the classification of advertisements, modified CNN are used. The classification of advertisement texts can be done using method for sentence classification in two ways: (1) To suppose that all advertisement text is one sentence formed by words, or (2) to analyze each sentence and then to estimate results of all sentences. We have used the first method, In the first step of our analysis. Arabic and Slovak languages have different free grammars. This indicates that positions of subjects and predicates in a sentence are not constant, and it looks that the Arabic grammar is more complicated than the Slovak grammar. We used grammar information in text data preparation as input to CNN and it can help to do some better classification. We used some connections between pairs of words as assistance information in input. As we explained in [4],[5] working on Arabic and English texts, the results were less complicated than which we have got using Arabic and Slovak texts. The structure of the paper as the following: The second section contains a description of data and their preparation as input to the network. In the third section, we describe a developed model of
  • 2. International Journal of Database Management Systems (IJDMS) Vol.13, No.4, August 2021 2 the convolutional neural network. The results and their comparison for both languages are given in the fourth section, we illustrated the results and their comparison for both languages and we formulate summary of results in the conclusion. 2. DATA PREPROCESSING The advertisements are very similar in both languages, Arabic and Slovak. But we would like to do a comparison for the results of the advertisement classification using a convolutional neural network for both languages and between both languages. 2.1. Information on used Databases of Advertisements  Arabic Advertisements - advertisements on the web: 3qaratonline.com portal [6]. There are 3 used categories of different advertisements: realty, furniture and electric devices. We supposed that electric devices descriptions have many common features with descriptions of furniture, while the realty category has very little similarity to other categories.  Slovak Advertisements - advertisements on the Bazos.sk portal [7], from which we have selected specific categories that we will examine. We chose 3 categories, namely mobiles, computers and laptops and animals. We supposed that cell phone descriptions have many common features with descriptions of computers and notebooks, while the animal category has very little similarity to the rest of categories, and therefore advertisements in this category could be better evaluate using the neural network.  The numbers of used advertisements in categories are in Table 1. Table 1. The numbers of used advertisements from three categories in both languages. Category Arabic texts Slovak texts Realty 205 - Furniture 194 - Electric devices 206 - Mobiles - 11470 Computers and laptops - 4968 Animals - 3247 2.2. Creating Word Vectors In order for the neural network to be able to process the text, we need to convert it to a vector of real numbers. 
In our situation, we want to process the word by word in order to determine the category based on the words used. Therefore, we need to create a uniform length vector for each word that will be used in the input data for CNN. Most often, preprocessed models are used, which are trained on a large number of texts to recognize the correlations between different words based on their use in text. They create a vector model that indicates words into a vector space, where semantically connected words are represented by points situated close to each other. The most widely used of these models is the software package word2vec [8]. The disadvantage of these solutions is that all well-trained prepared models are in English. Given the time-consumption of creating our own model for a similar learning of word vectors and the characteristics of our data, we have chosen not to use this field. We will look to the words as to images of coded letters (one letter represents one pixel) and we create vectors for words so that
each word is encoded letter by letter. For each letter a-z, we randomly generated a code $v \in (0, 1)$, ensuring that the code for each letter is unique. It means that different words have different codes. Thereafter, we removed the diacritics from each word in our data and created an encoding vector

$$v = (v_1, v_2, \ldots, v_d, 0, \ldots, 0), \quad v_i \in (0, 1), \quad i \le d, \tag{1}$$

where $v_1, v_2, \ldots, v_d$ are the codes for the individual letters of a word of length d. The vector is padded with zeros to the uniform vector length l. Thus, we created a table that contains each word and its encoding vector. After text processing, we obtained a matrix of real numbers for each advertisement, with a fixed number of columns equal to the selected vector length and a variable number of rows depending on the number of words in the advertisement. We used these matrices as the input to the neural networks. The preprocessing was very similar for both languages.

3. DEVELOPED MODEL OF CNN

We developed a network structure similar to the one suggested by [1], which was used for the processing of sentences in texts, and we tried to find parameters such that the network would evaluate our data well, using the knowledge found by [9].

3.1. Input Layer

As mentioned above, the network input is a matrix of real numbers, where each row is the vector representation of one word of the advertisement after its processing. The number of rows differs for each input, while the number of columns is equal to the selected word-vector length l. Let $x_i \in R^l$ be a vector of length l representing the i-th word in the processed advertisement. If the advertisement has n words, the resulting representation is

$$x = x_1 \oplus x_2 \oplus \ldots \oplus x_n, \tag{2}$$

where $\oplus$ is the symbol of the concatenation operation. We get the vector x of length n*l, which is represented as a matrix of the type n × l.
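The encoding of Sections 2.2 and 3.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the fixed random seed, the dropping of characters outside a-z, and the truncation of words longer than l are all hypothetical choices, and an Arabic alphabet would need its own code table.

```python
import random
import unicodedata

VECTOR_LENGTH = 50  # uniform word-vector length l (the value reported in Section 4)


def make_letter_codes(seed=0):
    """Assign each letter a-z a unique random code in the open interval (0, 1)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    codes = {}
    used = set()
    for letter in "abcdefghijklmnopqrstuvwxyz":
        value = rng.random()
        while value == 0.0 or value in used:  # keep codes non-zero and unique
            value = rng.random()
        used.add(value)
        codes[letter] = value
    return codes


def strip_diacritics(word):
    """Remove diacritics, e.g. 'mačku' becomes 'macku'."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def encode_word(word, codes, length=VECTOR_LENGTH):
    """Equation (1): letter codes followed by zero padding up to `length`."""
    letters = [ch for ch in strip_diacritics(word.lower()) if ch in codes]
    vector = [codes[ch] for ch in letters[:length]]  # truncation is an assumption
    return vector + [0.0] * (length - len(vector))


def encode_advertisement(text, codes, length=VECTOR_LENGTH):
    """Equation (2): one row per word, giving an n x l matrix."""
    return [encode_word(word, codes, length) for word in text.split()]


codes = make_letter_codes()
matrix = encode_advertisement("Predám mačku", codes)
print(len(matrix), len(matrix[0]))  # 2 50
```

Each advertisement thus yields a matrix with as many rows as words and l columns, which is the shape the input layer expects.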
Figure 1. The basic model of the prepared convolutional network [1].

3.2. Convolutional Layer

The main idea of a convolutional layer is to work in an invariant way across a text. The convolutional layer consists of filters $\omega_j \in R^{h \times l}$, $1 \le j \le n_\omega$, where $n_\omega$ is the number of filters and h is the number of words covered by a filter, which is applied to obtain a new feature. For example, the feature $c_j^i$ is calculated by applying the filter $\omega_j$ to the words $x_i$ to $x_{i+h-1}$:

$$c_j^i = f(\omega_j \cdot x_{i:i+h-1} + b), \tag{3}$$

where $b \in R$ is the threshold for the filter and f is an activation function such as the hyperbolic tangent or the identity. In the literature, the convolution operation is often performed by rotating the filter before it is applied, but this does not affect the network processing [Goodfellow2016]. If the filter $\omega_j$ is applied to every possible part of the input with shift 1, we obtain a vector of features

$$c_j = (c_j^1, c_j^2, \ldots, c_j^{n-h+1}). \tag{4}$$

If h > n, we need to modify the input so that we can apply the filter to it. This is achieved by adding zero vectors (so-called zero padding). It means that we can apply a filter of any size to any input (we do not have to restrict the size of the smallest input). The following vector $c^i$ represents the features of the same words for all filters:

$$c^i = (c_1^i, c_2^i, \ldots, c_{n_\omega}^i). \tag{5}$$

3.3. Pooling Layer

The output of the convolutional layer consists of the so-called vectors of features. For each filter, we get one such vector, and each vector has a different length depending on the length of the input and the length of the filter. We remove these differences by applying subsampling to each vector using the max pooling function, giving each filter its most distinctive feature
$$c_j^{\max} = \max\{c_j^i ;\ 1 \le i \le n-h+1\}, \quad j = 1, 2, \ldots, n_\omega. \tag{6}$$

The best feature (the value of the best filter) for the words $x_i, \ldots, x_{i+h-1}$ is

$$c_{\max}^i = \max\{c_j^i ;\ 1 \le j \le n_\omega\}. \tag{7}$$

We join the values from (6) over all filters into one vector of real numbers whose length equals the number of filters used, regardless of the length of the input. This vector is the output of the pooling layer.

3.4. Output Layer

The next layer is the output layer. This layer is fully connected to the pooling layer, which means that each neuron in this layer is connected to each neuron in the previous layer. We compute the value of the i-th neuron of the output layer as

$$o_i = \sum_{j=1}^{n_\omega} c_j \cdot w_{ij}, \tag{8}$$

where c is the resulting vector after applying the $n_\omega$ filters and then the pooling layer, and w is the matrix of the output layer weights. The activation function is the softmax function, which converts a vector belonging to $R^k$ to a vector belonging to $(0, 1)^k$ whose values sum to 1. The activation function is applied to the calculated vector o:

$$y_i = \frac{\exp(o_i)}{\sum_{j=1}^{k} \exp(o_j)}. \tag{9}$$

Thus, the calculated values of the output neurons can be interpreted as coefficients of how strongly the network determined that the input belongs to each category. If the output of the i-th neuron is equal to 1, we can conclude that the network has assigned the input to the i-th category. However, using the softmax function raises a problem. If $o_i$ is large enough (for example, when the identity is used as the activation function when applying filters), $\exp(o_i)$ approaches infinity, causing errors in the calculations. To avoid this, we added a constant $D \in R$ to the exponents:

$$y_i = \frac{\exp(o_i)}{\sum_{j=1}^{k} \exp(o_j)} = \frac{\exp(o_i + D)}{\sum_{j=1}^{k} \exp(o_j + D)}. \tag{10}$$

By applying the modified softmax function with $D = -\max_j o_j$, all inputs to the function are shifted to negative values (with the exception of the largest one, which moves to 0), so that an element with a very low negative value (if the difference between the given element and the maximum is large enough) has a resulting value of 0. We have thus avoided "NaN" values, but the output vector y can now also contain the values 0 and 1.
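The forward pass of Sections 3.2 to 3.4 can be sketched in plain Python as follows. This is an illustration under assumptions rather than the authors' code: the function names are hypothetical, one threshold per filter is assumed, the hyperbolic tangent is used as the default activation, and the constant D of equation (10) is taken to be the negative of the largest potential, as the text above describes.

```python
import math


def softmax_stable(o):
    """Equation (10) with D = -max(o): the largest exponent becomes 0."""
    shift = -max(o)
    exps = [math.exp(value + shift) for value in o]
    total = sum(exps)
    return [e / total for e in exps]


def conv_features(x, filt, bias, f=math.tanh):
    """Equations (3)-(4): slide an h x l filter over the n x l input with shift 1."""
    h, l = len(filt), len(filt[0])
    rows = [list(row) for row in x]
    while len(rows) < h:  # zero padding when the filter is taller than the input
        rows.append([0.0] * l)
    features = []
    for i in range(len(rows) - h + 1):
        potential = bias
        for r in range(h):
            for c in range(l):
                potential += filt[r][c] * rows[i + r][c]
        features.append(f(potential))
    return features


def forward(x, filters, biases, w):
    """Convolution, max pooling (6), output layer (8) and softmax (9)-(10).

    `w` holds one row of weights per output neuron (k rows, one column per filter).
    """
    pooled = [max(conv_features(x, flt, b)) for flt, b in zip(filters, biases)]
    o = [sum(c_j * w_ij for c_j, w_ij in zip(pooled, row)) for row in w]
    return softmax_stable(o)


# Large potentials no longer overflow, and far-apart potentials give exactly 0 and 1.
print(softmax_stable([0.0, -2000.0]))  # [1.0, 0.0]
```

A direct evaluation of equation (9) with `math.exp(1000.0)` would raise an overflow error in Python, whereas the shifted version only ever exponentiates non-positive numbers.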
3.5. The Learning Algorithm

Back-propagation is the algorithm in which network errors are propagated back across the layers so that the respective weights can be appropriately modified and the network outputs gradually improve. The difference between the expected and the actual neuron output is the network error. The value by which an individual weight is adjusted depends on how much the neuron into which the weight enters contributed to the overall error, and on the learning rate $\eta$.

• Output Layer: The output of our network is a vector of length k, where k is the number of categories we are examining. Each value of the output vector y is a number $y_i \in [0, 1]$. The expected output is a vector a, where the value $a_i$ for the index i of the input's category is equal to 1 and the other values are equal to 0. The resulting network error in such cases is best calculated as the cross-entropy cost, in order to avoid a slowdown of the network's learning:

$$C = -\sum_{i=1}^{k} a_i \cdot \log y_i = -\log y_t, \tag{11}$$

where $a_i = 1$ if the input belongs to the category i, otherwise $a_i = 0$, and t is the index of the expected category. For each neuron of the output layer we calculated its error signal $\delta$, which we later propagated into the previous layers. The error signal of a neuron is the value that the given neuron has contributed to the resulting network error. In our case, we denote by y the output vector of the network and by o the vector of potentials of the output layer neurons before applying the activation function. The expected network output is the vector a. Subsequently, we calculated $\delta$ for the output layer with respect to the cross-entropy function:

$$\delta_i^v = \frac{\partial C}{\partial o_i} = y_i - a_i. \tag{12}$$

• Pooling Layer: Since the pooling layer only passes on the greatest value from the vectors produced by the filters, its activation function is the identity, whose derivative equals 1.
Therefore, we calculated $\delta$ for this layer simply as

$$\delta_i^p = \sum_{j=1}^{k} w_{ji}\, \delta_j^v, \tag{13}$$

where the sum runs over the k neurons of the output layer and $w_{ji}$ is the weight connecting the i-th neuron of the pooling layer to the j-th output neuron, as in equation (8).

• Convolutional Layer: The pooling layer passes the largest value of each filter's output vector to its output. This means that the weights are equal to 0 for all neurons except the one holding the maximum, for which the weight is equal to 1. The error signal is therefore propagated only to the neuron that contains the maximum value, and this is the only neuron whose error signal we need to compute. For each filter $F_i$ we calculated $\delta$ based on the derivative of the activation function f used in the filters, on $h_j^i$, the value at the position of the maximum of the filter's output vector before applying the activation function, and on $\delta_i^p$, the error signal of the pooling layer:
$$\delta_i^F = f'(h_j^i)\, \delta_i^p. \tag{14}$$

• Modification of weights: We adjusted the weights based on the parameter $\eta$, which indicates the learning rate and is set before the training of the network to a small value (usually 0.001 - 0.05). The weights are adjusted by the following equation (15), where $w_{ij}^{new}$ is the new value of the weight, $w_{ij}^{old}$ is its original value, $\delta_i$ is the calculated error signal of the neuron into which the weight enters, and $v_j^{prev}$ is the output value of the neuron of the previous layer associated with the weight:

$$w_{ij}^{new} = w_{ij}^{old} - \eta \cdot \delta_i\, v_j^{prev}. \tag{15}$$

We did not modify the weights in the pooling layer, because we still want it to return the maximal value of the filter output. We adjusted only the weights of the filter, based on those values of the input layer that contributed to the maximum value (for all others, the value of the error signal is 0). If $c_l = \max(c)$, so that l is the index of the maximum value of the vector, we adjusted the weights of the filter as follows:

$$w_{ij}^{new} = w_{ij}^{old} - \eta \cdot \delta^F x_{l+i-1,\, j}, \tag{16}$$

where $w_{ij}$ is the weight in row i and column j of the filter, $\delta^F$ is the error signal of the filter from equation (14), and x is the input matrix after zero padding, if it was applied.

4. RESULTS OF APPLICATION

For word vectors we chose the length 50. The length was chosen with respect to the longest word in [10] and to possible grammatical errors in the scanned text. The following measures are used:

1. accuracy: calculates how often predictions match labels;
2. false negatives: the total number of false negatives, that is, the number of sentences whose actual class is yes but whose predicted class is no;
3. false positives: the total number of false positives, that is, the number of sentences whose actual class is no but whose predicted class is yes;
4. precision: computes the precision of the predictions with respect to the labels.

Table 2. Statistics of results (TR: training set, TE: test set).

Language  Set: #segments  Accuracy  False negative  False positive  Precision
Arabic    TR: 484         0.587     34.00           13.00           0.5810
          TE: 121         0.735     10.10           17.00           0.8760
Slovak    TR: 484         0.460     0.420           36.00           0.7330
          TE: 121         0.589     124.0           106.00          0.7850
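The four measures listed above can be computed from lists of actual and predicted categories as in the following sketch. It is only an illustration of the standard definitions: the function names and the toy labels are hypothetical, and the one-category-versus-rest counting is an assumption, since the paper does not state how the counts in Table 2 were aggregated over the three categories.

```python
def binary_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one category against the rest."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:      # predicted yes, actual no
            fp += 1
        elif a == positive:      # predicted no, actual yes
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(actual, predicted):
    """Fraction of predictions that match the labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def precision(tp, fp):
    """Share of positive predictions that are correct (0.0 if there are none)."""
    return tp / (tp + fp) if tp + fp else 0.0


# Toy labels, not data from the paper.
actual = ["realty", "realty", "furniture", "electric", "furniture"]
predicted = ["realty", "furniture", "furniture", "electric", "realty"]

tp, fp, fn, tn = binary_counts(actual, predicted, "realty")
print(tp, fp, fn, tn)               # 1 1 1 2
print(accuracy(actual, predicted))  # 0.6
print(precision(tp, fp))            # 0.5
```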
The accuracy is better for Arabic texts than for Slovak texts, while the precision is comparable for both languages. The results are encouraging for us and indicate that this is a promising direction for further research.

5. CONCLUSION

In this paper we deal with one type of neural network for the classification of advertisements. The model was tested on short advertisement texts written in Arabic and Slovak. We have shown a way to get data from advertisement websites. The modified model is suitable for use in this area for both languages, independently of the obtained results, but more texts need to be analyzed. We designed the convolutional neural network model, implemented it in the Java and Python programming languages, and examined it using different activation functions, learning rate coefficients, filter counts and filter sizes. We have shown the results of the network using the proposed model. The results on testing data demonstrate that the neural network model achieves good classification on different data sets (Arabic and Slovak advertisements) after training on correct examples of inputs. According to the given results, Arabic advertisements are classified into the suitable category in 87% of cases and Slovak advertisements in 78% of cases within our dataset. Part of the output is used together with a database model that uses the order file to train the selected network model and shows the percentage of success on the training and test sets during the learning of the network. As future work, we will continue with different encodings and put more information into the codes. We plan to produce more statistical evaluations on bigger text sets, to analyze them in different languages, and to compare our previous results with the results we will obtain.

ACKNOWLEDGEMENTS

Thanks to Prof. Gabriela Andrejková, CSc. and Mgr. Šimon Horváth for their help.
REFERENCES

[1] Kim, Y. (2014) Convolutional Neural Networks for Sentence Classification. arXiv:1408.5882. Chicago, web-page: https://arxiv.org/abs/1408.5882
[2] Johnson, R. Zhang, T. (2014) "Effective Use of Word Order for Text Categorization with Convolutional Neural Networks", arXiv:1412.1058. web-page: https://arxiv.org/abs/1412.1058, Denver, Colorado.
[3] LeCun, Y. (1990) Handwritten Digit Recognition with a Back-Propagation Network. Morgan Kaufmann, San Francisco.
[4] Asmaa, Salem, Abdulwahed, Almarimi & Gabriela Andrejkova "Text Dissimilarities Predictions using Convolutional Neural Networks and Clustering", 1st World Symposium on Digital Intelligence for Systems and Machines, August 23-25, 2018, International Conference in Technical University of Košice, Slovakia, ISBN: 978-1-5386-5101-8, pp 343-348.
[5] Asmaa, Salem & Gabriela, Andrejková "Analysis of text advertisements using convolutional neural networks", In proceedings of Cognition and artificial life, Brno, 30. 5. - 1. 6. 2018, ISBN 978-80-88123-24-8, pp 59-61.
[6] 3qaratonline.com - web-page [online]: https://www.3qaratonline.com/
[7] Bazos.sk - web-page [online]: https://www.bazos.sk/
[8] Mikolov, T (2013) "Distributed Representations of Words and Phrases and their Compositionality", proceedings of the 26th International Conference on Neural Information Processing Systems.
[9] Zhang, Y. Wallace, B. (2015) "A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence classification", arXiv:1510.03820. web-page: http://arxiv.org/abs/1510.03820
[10] Tvaroslovník - web-page [online]: http://tvaroslovnik.ics.upjs.sk/
AUTHORS

Master's degree from P. J. Šafárik University in Košice, Institute of Computer Science, Faculty of Science, 2012; PhD from P. J. Šafárik University in Košice, Institute of Computer Science, Faculty of Science, 2016; faculty member at Bani Waleed University, Libya, from 2017.

Master's degree from P. J. Šafárik University in Košice, Institute of Computer Science, Faculty of Science, 2014; PhD from P. J. Šafárik University in Košice, Institute of Computer Science, Faculty of Science, 2019; faculty member at Bani Waleed University, Libya, from 2020.