International Journal of Electrical and Computer Engineering (IJECE)
Vol. 13, No. 3, June 2023, pp. 3010~3018
ISSN: 2088-8708, DOI: 10.11591/ijece.v13i3.pp3010-3018
Journal homepage: http://ijece.iaescore.com
An optimized deep learning model for optical character
recognition applications
Sinan Q. Salih¹, Ahmed L. Khalaf¹, Nuha Sami Mohsin², Saadya Fahad Jabbar²
¹ Department of Communication Engineering Technology, College of Information Technology, Imam Ja’afar Al-Sadiq University, Baghdad, Iraq
² College of Education, Ibn Rushed for Human Science, University of Baghdad, Baghdad, Iraq
Article Info ABSTRACT
Article history:
Received Jul 14, 2022
Revised Oct 5, 2022
Accepted Dec 2, 2022
The convolutional neural networks (CNN) are among the most utilized
neural networks in various applications, including deep learning. In recent
years, the continuing extension of CNN into increasingly complicated
domains has made its training process more difficult. Thus, researchers
adopted optimized hybrid algorithms to address this problem. In this work, a
novel chaotic black hole algorithm-based approach was created for the
training of CNN to optimize its performance via avoidance of entrapment in
the local minima. The logistic chaotic map was used to initialize the
population instead of using the uniform distribution. The proposed training
algorithm was developed based on a specific benchmark problem for optical
character recognition applications; the proposed method was evaluated for
performance in terms of computational accuracy, convergence analysis, and
cost.
Keywords:
Chaotic maps
Convolutional neural network
Training algorithm
Computational intelligence
Black hole algorithm
This is an open access article under the CC BY-SA license.
Corresponding Author:
Sinan Q. Salih
Department of Communication Engineering Technology, College of Information Technology, Imam
Ja’afar Al-Sadiq University
Baghdad, Iraq
Email: sinan.salih@sadiq.edu.iq
1. INTRODUCTION
Every simple artificial neural network (ANN) is basically made up of the input and output layers of
neurons, but the necessity for an intermediate hidden layer in ANN gave rise to the concept of deep learning
(DL) [1], [2]. So, DL can be considered a more complicated version of ANN that relies on the use of
numerous layers with nonlinear processing units; most DL frameworks rely on the supervised or
unsupervised data representations learning concept [2]. The year 1965 witnessed the introduction of the first
deep feedforward multilayer perceptrons-based working algorithm [3] and since then, DL has improved and
been adopted in many applications. DL is applicable in several areas, such as pattern recognition, neural
networks, optimization, graphical modeling, artificial intelligence, and signal processing [4]–[6].
More recently, convolutional neural networks (CNNs) were developed as a way of achieving better
accuracy in recognition tasks. However, one of the main problems of CNNs is the difficulty of designing an
architecture for a specific task. Thus, several CNN architectures have been developed, such as the LeNet
architecture which was originally developed by LeCun [7], [8]. This architecture was implemented for optical
character recognition (OCR) and character recognition in several documents. ConvNet is another CNN
architecture that relies on the use of 7 layers where each layer has a specific role. Using the same architecture
for several tasks has a poor chance of reaching optimal performance; consequently, distinct CNN
architectures are built for specific tasks, which requires a lot of research work since there are many types of
machine learning (ML) tasks across different sectors [8]. CNN is a robust network; nevertheless, it still has
some parameters that need to be optimized, including the learning parameters and the network configuration
settings. The performance of CNN has been reported to depend on the proper tuning of its network
configuration parameters [9], [10]. The theoretical basis has been thoroughly investigated to develop strategies
for tuning CNN parameters and thereby improving overall performance. This demands a conceptual shift
from visual feature extraction to the optimization of the network structure configuration [10]. The last three
decades have seen wide use of metaheuristics and nature-inspired algorithms for solving various kinds
of NP-hard optimization problems; such metaheuristics include the firefly algorithm (FA), grey wolf optimizer
(GWO), particle swarm optimization (PSO), nomadic people optimizer (NPO), and teaching-learning-based
optimization (TLBO) [11]–[14]. These algorithms have been applied to problems in different
engineering fields, information security, and machine learning [15]–[22].
A nature-inspired algorithm, the black hole algorithm (BHA), was recently developed based on the
behavior of a black hole (BH) as it pulls in its surrounding stars [23]. The BHA models the interaction of the
BH with the bodies around it: it treats a set of stars as the candidate solutions in a given iteration, and the
best of them is taken to be the BH. The next iteration generates a new set of solutions by moving the
surrounding stars toward the BH. Stars that come within a pre-determined distance of the BH are engulfed by
it, and new stars are immediately generated at random to replace them. With this concept, the
BHA can explore unexplored areas of the solution space rather than searching areas that have already been
explored. The ability of BHA to solve data clustering problems has been demonstrated; its
optimization performance was reportedly better than that of other metaheuristics [24]. Recently, the BH
algorithm was adopted for CNN training [25]. The authors demonstrated that the exploration of the BH
algorithm slows down the search process. Herein, a novel chaotic BHA-based training algorithm (CBH-CNN)
for the training of CNNs was developed and evaluated. The role of BHA in this approach is to establish the
optimal CNN parameters. The exploration of BH was improved by initializing the population using the logistic
chaotic map rather than the uniform distribution. The performance of the proposed approach was
evaluated based on specific criteria, such as accuracy and calculated errors. The Modified National Institute of
Standards and Technology (MNIST) dataset, a standard dataset of handwritten digit images, was used as a
reference. This dataset contains 70,000 images of 28×28 pixels, of which 60,000
images were used as the training set while the other 10,000 images served as the testing set. The
rest of this article is arranged as follows: section 2 covers the basics of the CNN and BH models, while
section 3 explains the proposed CBH-CNN algorithm. Section 4 presents the analysis of the results of
the proposed CBH-CNN algorithm, and section 5 concludes the work.
2. THEORETICAL BACKGROUND
2.1. Black hole algorithm
As mentioned earlier, the black hole algorithm (BHA) is based on the idea of a region of space
in which so much mass is concentrated that nothing that comes close to it can escape its gravitational pull.
Whatever is pulled into the BH is lost forever. The BHA has two major aspects: the migration of stars and
the re-initialization of stars that have crossed the event horizon around the BH. The BHA relies on the
following working principle: first, $N+1$ stars, $x_i \in \mathbb{R}^D$, $i = 1, \ldots, N+1$ (where $N$ is the
population size), are randomly initialized in the search space and their fitness is evaluated. The candidate
star with the best evaluation function value is considered the black hole $x_{BH}$; because the BH is static, it
maintains its position until another star with a better solution is found. $N$ represents the number of
candidate stars searching for the optimum. The movement of each star towards the BH in each generation can be
calculated using this relation:
$$x_i(t+1) = x_i(t) + \mathrm{rand} \times \left(x_{BH} - x_i(t)\right), \quad i = 1, 2, \ldots, N, \qquad (1)$$
where rand is a randomly generated number in the range between 0 and 1. Any star whose distance to the BH
is less than the radius of the event horizon disappears. The radius of the event horizon (R) is given as:
$$R = \frac{f_{BH}}{\sum_{i=1}^{N} f_i}, \qquad (2)$$
where $f_{BH}$ and $f_i$ represent the fitness values of the BH and the $i$th star, respectively, while $N$
represents the number of stars (individual solutions). If the distance between the BH and an individual
solution is less than $R$, then that individual solution collapses, and another individual is created and
randomly distributed in the solution space. The BHA is simple to implement because it is a parameter-less
algorithm. The
BHA has been reported to converge to the global optimum in all runs and to avoid entrapment in local optima,
unlike some other heuristics [23], [24], [26]. In this work, BHA was used to train the CNN because of its simple
structure, ease of implementation, and lack of tunable parameters.
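The two aspects described above (migration and re-initialization) can be sketched as a single BHA generation. This is a minimal illustration for a minimization problem, not the authors' implementation: the names `bha_step`, `evaluate`, `lower`, `upper`, and `rng` are hypothetical, the population array is assumed to include the black hole itself, and engulfed stars are re-created uniformly at random as in the standard BHA.

```python
import numpy as np

def bha_step(stars, fitness, evaluate, lower, upper, rng):
    """One black hole algorithm generation (minimization sketch).

    stars: (N, D) array of candidate solutions; fitness: (N,) array of costs.
    evaluate, lower, upper and rng are hypothetical names for this sketch.
    """
    # The star with the best (lowest) cost acts as the black hole.
    bh = int(np.argmin(fitness))
    x_bh = stars[bh].copy()

    # Equation (1): pull every star towards the black hole.
    rand = rng.random((stars.shape[0], 1))
    stars = np.clip(stars + rand * (x_bh - stars), lower, upper)
    fitness = np.array([evaluate(s) for s in stars])

    # A moved star may now be better than the old black hole.
    bh = int(np.argmin(fitness))
    x_bh = stars[bh].copy()

    # Equation (2): event-horizon radius from the fitness values.
    total = fitness.sum()
    radius = fitness[bh] / total if total > 0 else 0.0

    # Stars inside the event horizon are engulfed and re-created at random.
    dist = np.linalg.norm(stars - x_bh, axis=1)
    engulfed = dist < radius
    engulfed[bh] = False  # the black hole itself is kept
    n_new = int(engulfed.sum())
    if n_new:
        stars[engulfed] = rng.uniform(lower, upper, size=(n_new, stars.shape[1]))
        fitness[engulfed] = [evaluate(s) for s in stars[engulfed]]
    return stars, fitness
```

Because the black hole is neither moved nor replaced within a generation, the best cost in the population cannot get worse from one call to the next.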
2.2. The basics of CNNs
The concept of DL was introduced by Kelley [27] in 1960 based on the idea of an output becoming
the input in the next iteration. Basically, the CNN is a form of DL network that comprises several
convolution layers, pooling layers (which reduce the size of the received input), ReLU layers
that enhance the non-linearity of the outputs, and a fully connected layer that controls the output range using
the softmax activation function. The CNN exploits the relationship between neighboring
neurons. The three basic concepts of CNN are local receptive fields, shared weights, and pooling [28], [29].
Figure 1 depicts a typical CNN architecture.
- Local receptive fields (LRF): Each neuron in the first hidden layer is connected only to a small region of
the input. The LRF is slid across the input image, and each position yields a different hidden
neuron; together, these neurons make up the first hidden layer.
- Shared weights and biases: The role of this feature is to detect the presence of the same feature at
different input image locations; the map between the input and hidden layers is called the feature map,
while the weights and bias that define it are called the shared weights and bias. The shared weights and
bias also define the filter or kernel. Image recognition uses more than one
feature map to detect several features; hence, a convolutional layer comprises several different
feature maps.
- Pooling layers: This layer follows the convolutional layers, and its role is to condense the incoming
information from the convolutional layer. The commonly used pooling procedures are max-pooling and L2
pooling; max-pooling outputs the maximum neuron activation in a region, while L2 pooling takes the
square root of the sum of the squares of the activations in the 2×2 region.
Together, these three concepts form the core of a complete CNN architecture.
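The two pooling procedures named above can be illustrated on a small feature map. This is a sketch that assumes non-overlapping 2×2 regions and an input with even height and width; the function name `pool2x2` and the sample values are made up for illustration.

```python
import numpy as np

def pool2x2(activations, mode="max"):
    """Pool a 2-D feature map over non-overlapping 2x2 regions."""
    h, w = activations.shape
    # Group the map into (h/2, 2, w/2, 2) blocks; assumes even height and width.
    blocks = activations.reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))                  # maximum activation per region
    if mode == "l2":
        return np.sqrt((blocks ** 2).sum(axis=(1, 3)))  # root of the sum of squares
    raise ValueError(f"unknown pooling mode: {mode}")

feature_map = np.array([[1.0, 2.0, 0.0, 0.0],
                        [3.0, 4.0, 0.0, 5.0],
                        [0.0, 0.0, 6.0, 8.0],
                        [1.0, 0.0, 0.0, 0.0]])
max_pooled = pool2x2(feature_map, "max")  # [[4, 5], [1, 8]]
l2_pooled = pool2x2(feature_map, "l2")    # [[sqrt(30), 5], [1, 10]]
```

Both variants halve each spatial dimension, which is the size reduction attributed to the pooling layers.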
Before one or more fully connected layers are introduced, several convolution, activation function,
and pooling stages are first stacked. A loss function is applied to the model output to evaluate
performance in terms of the difference between the actual image label and the CNN output. The training of
the CNN aims to minimize the loss function; this is achieved using stochastic gradient descent [7], an
optimization strategy that first estimates the gradient of the loss function with respect to the weight of each
edge in the network and then updates the weights using the computed gradient [27].
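The gradient-then-update cycle described above can be sketched for the simplest possible case. The example below is an assumption-laden illustration rather than the training code of any CNN: it uses a single linear layer with a softmax output and cross-entropy loss, and the names `softmax`, `sgd_step`, and the learning rate `lr` are hypothetical.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sgd_step(W, b, x, y, lr=0.1):
    """One gradient-descent update of a linear softmax classifier on a mini-batch.

    x: (n, d) inputs, y: (n,) integer labels. W and b are updated in place.
    Returns the cross-entropy loss measured before the update.
    """
    n = x.shape[0]
    probs = softmax(x @ W + b)
    loss = -np.log(probs[np.arange(n), y]).mean()

    # Gradient of the loss with respect to the logits: p - onehot(y), averaged.
    grad_logits = probs.copy()
    grad_logits[np.arange(n), y] -= 1.0
    grad_logits /= n

    # Update the weights and biases using the computed gradient.
    W -= lr * (x.T @ grad_logits)
    b -= lr * grad_logits.sum(axis=0)
    return float(loss)
```

In a full CNN the same two phases apply, with the gradient propagated backwards through every layer instead of one.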
Figure 1. The typical architecture of a CNN model
3. THE PROPOSED TRAINING ALGORITHM
This paper presents the development of a training algorithm for CNN; hence, the major aim here is
to determine the best values for the parameters of the CNN. The BHA was used in this work as the training
algorithm for the reasons stated earlier. Figure 2 depicts the schematic of the proposed CBH-CNN and
illustrates its main steps. Three of these steps need to be detailed, as
presented in the subsequent subsections.
3.1. Encoding the stars
The two basic parts of a CNN are the feature extraction component and the classification
component; feature extraction (FE) requires several convolution layers, activation
functions, and max-pooling. The classifier is mostly made up of fully connected NN layers. In the proposed
system, the set of NN weights is considered the structure of each star; hence, the approach uses a vector
of real values that contains all the NN weights. Figure 3 depicts a star for the training of NNs.
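The encoding can be pictured as flattening every weight and bias array of the classifier into one vector and splitting it back before evaluation. The sketch below assumes a hypothetical two-layer classifier; the layer shapes and the helper names `encode_star` and `decode_star` are illustrative and are not taken from the paper.

```python
import numpy as np

def encode_star(weights):
    """Flatten a list of weight/bias arrays into one real-valued vector."""
    return np.concatenate([w.ravel() for w in weights])

def decode_star(star, shapes):
    """Split a star back into arrays with the given shapes."""
    arrays, start = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        arrays.append(star[start:start + size].reshape(shape))
        start += size
    return arrays

# Hypothetical classifier: 4 inputs -> 3 hidden units -> 2 outputs.
shapes = [(4, 3), (3,), (3, 2), (2,)]
weights = [np.arange(int(np.prod(s)), dtype=float).reshape(s) for s in shapes]
star = encode_star(weights)           # vector of length 12 + 3 + 6 + 2 = 23
restored = decode_star(star, shapes)  # same arrays as `weights`
```

With this layout, the dimension of the search space equals the total number of weights and biases in the classifier.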
Figure 2. CBH-CNN
Figure 3. The encoding of each solution in the neural network in CNN [25]
3.2. Stars initialization
To initialize the population, the population size is first set up, followed by random creation of stars
until the expected population size is reached. As stated earlier, the chaotic logistic map is used for initializing
the stars in the search space, as in (3):
$$x_{i+1} = (Upper - Lower) \times \left[(1 - x_i) \times \mu\right] + Lower \qquad (3)$$
where $x_i$ is the current position of a star, $x_{i+1}$ is the next position generated via the logistic map,
$Upper$ and $Lower$ are the values of the upper bound and lower bound, respectively, and $\mu$ is a control
parameter in the range [0, 4].
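A chaotic initialization of this kind can be sketched as follows. Note the assumption: the sketch uses the standard logistic map $z_{k+1} = \mu z_k (1 - z_k)$ on a state $z \in (0, 1)$ and then scales it to the search bounds, which differs from (3) as printed (the printed form has no leading $x_i$ factor). The function name `chaotic_init`, the start value `seed=0.7`, and the choice `mu=4.0` are hypothetical.

```python
import numpy as np

def chaotic_init(n_stars, dim, lower, upper, mu=4.0, seed=0.7):
    """Fill an (n_stars, dim) population from a logistic-map sequence."""
    # Chaotic state in (0, 1); avoid values such as 0, 0.25, 0.5, 0.75 and 1,
    # which collapse to a fixed point when mu = 4.
    z = seed
    population = np.empty((n_stars, dim))
    for i in range(n_stars):
        for j in range(dim):
            z = mu * z * (1.0 - z)                          # standard logistic map (assumed form)
            population[i, j] = lower + (upper - lower) * z  # scale to the search bounds
    return population

population = chaotic_init(n_stars=30, dim=5, lower=-1.0, upper=1.0)
```

Unlike sampling from a uniform distribution, the sequence is deterministic for a given start value, so two runs with the same settings produce the same initial stars.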
3.3. Evaluation function
The cost function of any classification problem is the classification error which can be estimated
thus:
$$Err = \frac{FP + FN}{FP + FN + TP + TN} \qquad (4)$$
where FP is the number of false positives, FN false negatives, TP true positives, and TN true negatives; these
counts are taken from the confusion matrix. The classification error is thus determined by the number of
incorrectly classified samples: the CNN, with its weights and biases set from a star, classifies the samples,
and the resulting error is the evaluation value of that star. The pseudocode of the proposed CBH-CNN is given in Figure 4.
CBH-CNN Algorithm
Input: Dataset, Iterations, No. of Stars, Max Bound, Min Bound
Output: Best Star
Procedure:
  Generate the initial positions for all stars X via (3)
  Set the weights and biases of CNN using each star in the population
  Calculate the evaluation value of each star via (4)
  Determine the black hole star, which is the current best solution
  For itr = 1 → Iterations
    For i = 1 → No. of Stars
      Calculate the new position for star_i using (1)
      Check the boundaries for the new position of star_i
      Set the weights and biases of CNN using the newly generated position
      Calculate the evaluation value of star_i via (4)
    End For
    Determine the event horizon using (2)
    For i = 1 → No. of Stars
      If star_i crosses the event horizon Then
        Replace star_i with a newly generated star via (3)
      End If
    End For
    Determine the black hole star, which is the current best solution
  End For
  Return the final black hole solution
Figure 4. The main steps of CBH-CNN
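The steps in Figure 4 can be turned into a compact runnable sketch. Several things here are assumptions rather than the authors' code: `evaluate` stands in for "set the CNN weights from the star and compute (4)", a toy cost would normally be passed in its place, the chaotic generator uses the standard logistic map with a hypothetical start value of 0.7, and the function name `cbh_optimize` and its default arguments are illustrative (the defaults of 30 stars and 20 iterations mirror the settings reported in section 4).

```python
import numpy as np

def cbh_optimize(evaluate, dim, n_stars=30, iterations=20,
                 lower=-1.0, upper=1.0, mu=4.0, seed=0):
    """Minimize evaluate(star) with a chaotic black hole search (sketch)."""
    rng = np.random.default_rng(seed)
    z = 0.7  # logistic-map state, assumed start value in (0, 1)

    def chaotic_star():
        nonlocal z
        star = np.empty(dim)
        for j in range(dim):
            z = mu * z * (1.0 - z)                 # standard logistic map (assumption)
            star[j] = lower + (upper - lower) * z  # scale to the search bounds
        return star

    # Initial population, its evaluation, and the first black hole.
    stars = np.array([chaotic_star() for _ in range(n_stars)])
    cost = np.array([evaluate(s) for s in stars])
    bh = int(np.argmin(cost))

    for _ in range(iterations):
        # Move every star towards the black hole, equation (1).
        for i in range(n_stars):
            if i == bh:
                continue
            stars[i] += rng.random() * (stars[bh] - stars[i])
            stars[i] = np.clip(stars[i], lower, upper)  # boundary check
            cost[i] = evaluate(stars[i])
        bh = int(np.argmin(cost))

        # Event horizon, equation (2), then replace the engulfed stars.
        total = cost.sum()
        radius = cost[bh] / total if total > 0 else 0.0
        for i in range(n_stars):
            if i != bh and np.linalg.norm(stars[i] - stars[bh]) < radius:
                stars[i] = chaotic_star()
                cost[i] = evaluate(stars[i])
        bh = int(np.argmin(cost))

    return stars[bh].copy(), float(cost[bh])
```

For example, `cbh_optimize(lambda s: float(np.sum((s - 0.3) ** 2)), dim=5)` searches for a vector close to 0.3 in every coordinate; in the paper's setting the callable would instead decode the star into CNN weights and return the classification error.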
4. RESULTS AND DISCUSSION
In this study, we evaluated our enhanced algorithm using the same experiment as in previous work. The
dataset used was sourced from the Modified National Institute of Standards and Technology
(MNIST) database; the dataset comprises 70,000 images. The dataset was divided into two parts. The first part
comprised 60,000 images (the training set), which were scanned images of the handwriting of
250 people, of whom 50% were employees of the US Census Bureau while the remaining 50% were
high school students. These images were stored in grayscale
mode and were sized 28×28 pixels. The remaining 10,000 images in the dataset were used as the testing
dataset; they were collected from another set of 250 persons for the sake of comparison [29]. A population
size of 30 stars was used in this study, while the maximum number of epochs was 20. The new approach was
benchmarked against the standard CNN, deep belief network (DBN), CNN based on simulated annealing
(CNN-SA) [30], CNN based on PSO (CNN-PSO) [31], and CNN based on the standard black hole algorithm
(BH-CNN) [25], as shown in Table 1; the comparison was in terms of
the classification accuracy based only on five epochs.
All the models attained broadly similar performance, as shown in Table 1, while the proposed
CBH-CNN performed best on the considered metric. The performance of the CNN-PSO algorithm was
acceptable but below that of the new CBH-CNN. The PSO algorithm has three control
parameters (cognitive parameter $c_1$, social parameter $c_2$, and inertia weight $w$), and these parameters must be
optimally tuned (which is itself another optimization problem). Hence, the simple structure of BHA, which requires
no parameter tuning, makes it a simpler approach with less complexity compared to the other models. Figure 5
and Figure 6 show the accuracy and error rate of the evaluated models, respectively.
The comparison of the models based on execution time is presented in Figure 7. From Figure 7,
it can be seen that DBN requires less execution time owing to its training process, which relies on a rapid
contrastive divergence. The proposed CBH-CNN shows a longer execution time than DBN and the
standard CNN because of the additional iterations required by the BHA and the need to calculate the event
horizon.
Table 1. Comparison of the classification accuracy (%) of the new CBH-CNN with some existing models

| Model   | Epoch 1 | Epoch 2 | Epoch 3 | Epoch 4 | Epoch 5 |
|---------|---------|---------|---------|---------|---------|
| CNN     | 88.87   | 92.25   | 93.9    | 94.81   | 95.68   |
| DBN     | 87.46   | 89.72   | 90.64   | 91.14   | 92.93   |
| CNN-SA  | 89.18   | 92.38   | 94.2    | 95.19   | 96.04   |
| CNN-PSO | 89.52   | 92.38   | 93.91   | 95.08   | 96.31   |
| BH-CNN  | 89.91   | 92.38   | 93.91   | 95.08   | 96.31   |
| CBH-CNN | 89.93   | 93.12   | 94.98   | 95.62   | 96.96   |
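Since (4) counts the misclassified share of all samples, the error rate is the complement of the accuracy. The short calculation below applies this to the epoch-5 column of Table 1; it assumes the tabulated values are percentages, and the variable names are illustrative.

```python
# Epoch-5 accuracies (%) copied from Table 1.
accuracy = {"CNN": 95.68, "DBN": 92.93, "CNN-SA": 96.04,
            "CNN-PSO": 96.31, "BH-CNN": 96.31, "CBH-CNN": 96.96}

# Err = (FP + FN) / total = 1 - accuracy, with accuracy expressed as a fraction.
error = {model: round(1.0 - acc / 100.0, 4) for model, acc in accuracy.items()}

# Model with the lowest error after five epochs.
best = min(error, key=error.get)
```

Under that assumption, CBH-CNN has the lowest epoch-5 error, about 0.0304, against about 0.0707 for DBN.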
Figure 5. The classification accuracy (y-axis: accuracy; x-axis: run/epoch; series: CNN, DBN, CNN-SA, CNN-PSO, BH-CNN)
Figure 6. The error rates (y-axis: error; x-axis: run/epoch; series: CNN, DBN, CNN-SA, CNN-PSO, BH-CNN)
Figure 7. The execution times (y-axis: time; x-axis: run/epoch; series: CNN, DBN, CNN-SA, CNN-PSO, BH-CNN)
5. CONCLUSION
Deep learning models have received considerable attention in the field of machine learning because they
can recognize the patterns in most classification problems. The CNN is among the well-known DL models,
and its main structure consists of two parts: feature extraction and a neural network classifier. This paper
focused on the training of the second part of the CNN using the chaotic BHA (CBHA), a nature-inspired
algorithm. The proposed model outperformed the other DL models considered on the MNIST dataset using
only five epochs. The outcome of this study showed that the BHA is a suitable alternative for the training of CNNs.
REFERENCES
[1] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7,
pp. 1527–1554, Jul. 2006, doi: 10.1162/neco.2006.18.7.1527.
[2] N. B. Karayiannis and G. W. Mi, “Growing radial basis neural networks: merging supervised and unsupervised learning with
network growth techniques,” IEEE Transactions on Neural Networks, vol. 8, no. 6, pp. 1492–1506, Nov. 1997, doi:
10.1109/72.641471.
[3] A. G. Ivakhnenko and V. G. Lapa, “Cybernetic predicting devices,” 1966.
[4] T. Dhieb, W. Ouarda, H. Boubaker, and A. M. Alimi, “Deep neural network for online writer identification using beta-elliptic
model,” in 2016 International Joint Conference on Neural Networks (IJCNN), Jul. 2016, pp. 1863–1870, doi:
10.1109/IJCNN.2016.7727426.
[5] L. Haddad, T. M. Hamdani, W. Ouarda, A. M. Alimi, and A. Abraham, “An adaptation module with dynamic radial basis
function neural network using significance concept for writer adaptation,” Journal of Information Assurance, vol. 12, no. 1, 2017.
[6] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the
IEEE, vol. 86, no. 11, pp. 2278–2324, 1998, doi: 10.1109/5.726791.
[7] Y. LeCun and M. Ranzato, “Deep learning tutorial,” in Tutorials in international conference on machine learning (ICML’13),
2013, pp. 1–29.
[8] B. Wang, Y. Sun, B. Xue, and M. Zhang, “Evolving deep convolutional neural networks by variable-length particle swarm
optimization for image classification,” in 2018 IEEE Congress on Evolutionary Computation (CEC), Jul. 2018, pp. 1–8, doi:
10.1109/CEC.2018.8477735.
[9] K. Jarrett, K. Kavukcuoglu, M. A. Ranzato, and Y. LeCun, “What is the best multi-stage architecture for object recognition?,” in
2009 IEEE 12th International Conference on Computer Vision, Sep. 2009, pp. 2146–2153, doi: 10.1109/ICCV.2009.5459469.
[10] T. Yamasaki, T. Honma, and K. Aizawa, “Efficient optimization of convolutional neural networks using particle swarm
optimization,” in 2017 IEEE Third International Conference on Multimedia Big Data (BigMM), Apr. 2017, pp. 70–73, doi:
10.1109/BigMM.2017.69.
[11] R. Eberhart and J. Kennedy, “A new optimizer using particle swarm theory,” in Micro Machine and Human Science, 1995.
MHS’95., Proceedings of the 6th International Symposium, 1995, pp. 39–43, doi: 10.1109/MHS.1995.494215.
[12] X.-S. Yang, “Firefly algorithms for multimodal optimization,” in Lecture Notes in Computer Science (including subseries Lecture
Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2009, vol. 5792, pp. 169–178, doi: 10.1007/978-3-642-
04944-6_14.
[13] S. Mirjalili, S. M. Mirjalili, and A. Lewis, “Grey Wolf Optimizer,” Advances in Engineering Software, vol. 69, pp. 46–61, Mar.
2014, doi: 10.1016/j.advengsoft.2013.12.007.
[14] R. V Rao, V. J. Savsani, and D. P. Vakharia, “Teaching–learning-based optimization: a novel method for constrained mechanical
design optimization problems,” Computer-Aided Design, vol. 43, no. 3, pp. 303–315, Mar. 2011, doi: 10.1016/j.cad.2010.12.015.
[15] S. Q. Salih, A. A. Alsewari, and Z. M. Yaseen, “Pressure vessel design simulation,” in Proceedings of the 2019 8th International
Conference on Software and Computer Applications, Feb. 2019, pp. 120–124, doi: 10.1145/3316615.3316643.
[16] S. Q. Salih et al., “Integrative stochastic model standardization with genetic algorithm for rainfall pattern forecasting in tropical
and semi-arid environments,” Hydrological Sciences Journal, vol. 65, no. 7, pp. 1145–1157, May 2020, doi:
10.1080/02626667.2020.1734813.
[17] U. Beyaztas, S. Q. Salih, K.-W. Chau, N. Al-Ansari, and Z. M. Yaseen, “Construction of functional data analysis modeling
strategy for global solar radiation prediction: application of cross-station paradigm,” Engineering Applications of Computational
Fluid Mechanics, vol. 13, no. 1, pp. 1165–1181, Jan. 2019, doi: 10.1080/19942060.2019.1676314.
[18] A. H. Zahid et al., “A novel construction of dynamic s-box with high nonlinearity using heuristic evolution,” IEEE Access, vol. 9,
pp. 67797–67812, 2021, doi: 10.1109/ACCESS.2021.3077194.
[19] K. Z. Zamli, A. Kader, F. Din, and H. S. Alhadawi, “Selective chaotic maps tiki-taka algorithm for the s-box generation and
optimization,” Neural Computing and Applications, vol. 33, no. 23, pp. 16641–16658, Dec. 2021, doi: 10.1007/s00521-021-
06260-8.
[20] S. Q. Salih, A. A. Alsewari, B. Al-Khateeb, and M. F. Zolkipli, “Novel multi-swarm approach for balancing exploration and
exploitation in particle swarm optimization,” in Proceedings of 3rd International Conference of Reliable Information and
Communication Technology 2018 (IRICT 2018), 2018, pp. 196–206, doi: 10.1007/978-3-319-99007-1_19.
[21] H. Tao et al., “A newly developed integrative bio-inspired artificial intelligence model for wind speed prediction,” IEEE Access,
vol. 8, pp. 83347–83358, 2020, doi: 10.1109/ACCESS.2020.2990439.
[22] H. S. Alhadawi, S. Q. Salih, and Y. D. Salman, “Chaotic particle swarm optimization based on meeting room approach for
designing bijective S-boxes,” in Proceedings of International Conference on Emerging Technologies and Intelligent Systems,
2022, pp. 331–341, doi: 10.1007/978-3-030-85990-9_28.
[23] A. Hatamlou, “Black hole: a new heuristic optimization approach for data clustering,” Information Sciences, vol. 222, pp. 175–
184, Feb. 2013, doi: 10.1016/j.ins.2012.08.023.
[24] S. Kumar, D. Datta, and S. K. Singh, “Black hole algorithm and its applications,” in Computational Intelligence Applications in
Modeling and Control, Springer International Publishing, 2015, pp. 147–170, doi: 10.1007/978-3-319-11017-2_7.
[25] S. Q. Salih, “A new training method based on black hole algorithm for convolutional neural network,” Journal of Southwest
Jiaotong University, vol. 54, no. 3, p. 1, Jun. 2019, doi: 10.35741/issn.0258-2724.54.3.22.
[26] A. P. Piotrowski, J. J. Napiorkowski, and P. M. Rowinski, “How novel is the ‘novel’ black hole optimization approach?,”
Information Sciences, vol. 267, pp. 191–200, May 2014, doi: 10.1016/j.ins.2014.01.026.
[27] H. J. Kelley, “Gradient theory of optimal flight paths,” ARS Journal, vol. 30, no. 10, pp. 947–954, Oct. 1960, doi:
10.2514/8.5282.
[28] M. Liu, J. Shi, Z. Li, C. Li, J. Zhu, and S. Liu, “Towards better analysis of deep convolutional neural networks,” IEEE
Transactions on Visualization and Computer Graphics, vol. 23, no. 1, pp. 91–100, Jan. 2017, doi: 10.1109/TVCG.2016.2598831.
[29] J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson, “Understanding neural networks through deep visualization,” Jun.
2015, [Online]. Available: http://arxiv.org/abs/1506.06579
[30] L. M. R. Rere, M. I. Fanany, and A. M. Arymurthy, “Simulated annealing algorithm for deep learning,” Procedia Computer
Science, vol. 72, pp. 137–144, 2015, doi: 10.1016/j.procs.2015.12.114.
[31] A. R. Syulistyo, D. M. Jati Purnomo, M. F. Rachmadi, and A. Wibowo, “Particle swarm optimization (PSO) for training
optimization on convolutional neural network (CNN),” Jurnal of Computer Sciences and Information, vol. 9, no. 1, Feb. 2016,
doi: 10.21609/jiki.v9i1.366.
BIOGRAPHIES OF AUTHORS
Sinan Q. Salih received the B.Sc. degree in information systems from the
University of Anbar, Al-Anbar, Iraq, in 2010, the M.Sc. degree in computer sciences from
Universiti Tenaga National (UNITEN), Malaysia, in 2012, and the Ph.D. degree in soft
modeling and intelligent systems from Universiti Malaysia Pahang (UMP). His current
research interests include optimization algorithms, nature-inspired metaheuristics, machine
learning, and feature selection problem for real world problems. He can be contacted at
email: sinan.salih@sadiq.edu.iq.
Ahmed L. Khalaf received his B.Sc Eng. degree (Control and Systems
Engineering) from University of Technology, Iraq (2001) and M.Sc. Eng, degree (Computer
Engineering) from Middle Technical University, Iraq (2008). He did his PhD research at
Universiti Putra Malaysia, Malaysia, (2018) in the area of optical sensor based on
nanomaterials for chemical sensing applications. Currently, he is a senior lecturer at the
Department of Computer Engineering Techniques, Al-Ma’moon University College. His
main research interests are fiber optics sensors, optical chemical sensors, nanomaterials, and
computer engineering. He can be contacted at email: ahmed.l.khalaf@sadiq.edu.iq.
Nuha Sami Mohsin was born in Baghdad, Iraq. She received her M.Sc. and
degree in Computer Science in 2005 and 2021 from University of Technology and
University of Information and Communication Technology/Institute of Informatics for
Postgraduate Studies, respectively. She is currently a doctor teacher in the information
technology unit at the College of Education Ibn Rushed/University of Baghdad. Her research
interests include optimization, machine learning, artificial intelligence, algorithm
optimization, software engineering, copyright and watermark protection and information
security. She can be contacted at email: nuha.sami@ircoedu.uobaghdad.edu.iq.
Saadya Fahad Jabbar was born in Baghdad, Iraq. She received her B.Sc. and
M.Sc. degree in Computer Science in 2004 and 2015 from University of Baghdad and
University of Al-Mustansiriyah, respectively. She is currently a senior lecturer in the information
technology unit at the College of Education Ibn Rushed/University of Baghdad. Her research
interests include optimization, machine learning and natural language processing. She can be
contacted at email: saadya.fahad@ircoedu.uobaghdad.edu.iq.

PDF
Smart grid deployment: from a bibliometric analysis to a survey
PDF
Use of analytical hierarchy process for selecting and prioritizing islanding ...
PDF
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...
PDF
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...
PDF
Adaptive synchronous sliding control for a robot manipulator based on neural ...
PDF
Remote field-programmable gate array laboratory for signal acquisition and de...
PDF
Detecting and resolving feature envy through automated machine learning and m...
PDF
Smart monitoring technique for solar cell systems using internet of things ba...
PDF
An efficient security framework for intrusion detection and prevention in int...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Embedded machine learning-based road conditions and driving behavior monitoring
Advanced control scheme of doubly fed induction generator for wind turbine us...
Neural network optimizer of proportional-integral-differential controller par...
An improved modulation technique suitable for a three level flying capacitor ...
A review on features and methods of potential fishing zone
Electrical signal interference minimization using appropriate core material f...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Bibliometric analysis highlighting the role of women in addressing climate ch...
Voltage and frequency control of microgrid in presence of micro-turbine inter...
Enhancing battery system identification: nonlinear autoregressive modeling fo...
Smart grid deployment: from a bibliometric analysis to a survey
Use of analytical hierarchy process for selecting and prioritizing islanding ...
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...
Adaptive synchronous sliding control for a robot manipulator based on neural ...
Remote field-programmable gate array laboratory for signal acquisition and de...
Detecting and resolving feature envy through automated machine learning and m...
Smart monitoring technique for solar cell systems using internet of things ba...
An efficient security framework for intrusion detection and prevention in int...
Ad

Recently uploaded (20)

PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PPTX
OOP with Java - Java Introduction (Basics)
PDF
PPT on Performance Review to get promotions
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
Internet of Things (IOT) - A guide to understanding
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PDF
composite construction of structures.pdf
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PPTX
web development for engineering and engineering
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PPTX
Geodesy 1.pptx...............................................
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPTX
Construction Project Organization Group 2.pptx
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PDF
Digital Logic Computer Design lecture notes
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
OOP with Java - Java Introduction (Basics)
PPT on Performance Review to get promotions
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
Internet of Things (IOT) - A guide to understanding
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
composite construction of structures.pdf
Model Code of Practice - Construction Work - 21102022 .pdf
web development for engineering and engineering
CH1 Production IntroductoryConcepts.pptx
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
Geodesy 1.pptx...............................................
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
Construction Project Organization Group 2.pptx
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
Digital Logic Computer Design lecture notes
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx

An optimized deep learning model for optical character recognition applications

1. INTRODUCTION

A simple artificial neural network (ANN) consists of an input layer and an output layer of neurons; the need for intermediate hidden layers in the ANN gave rise to the concept of deep learning (DL) [1], [2]. DL can therefore be regarded as a more elaborate form of ANN that stacks numerous layers of nonlinear processing units, and most DL frameworks rely on supervised or unsupervised learning of data representations [2]. The first working algorithm for deep feedforward multilayer perceptrons was introduced in 1965 [3]; since then, DL has improved and been adopted in many applications. DL is applicable in several areas, such as pattern recognition, neural networks, optimization, graphical modeling, artificial intelligence, and signal processing [4]–[6].

More recently, convolutional neural networks (CNNs) were developed to achieve better accuracy in recognition tasks. However, one of the main problems of the CNN is the difficulty of designing its architecture for a specific task. Several CNN architectures have therefore been developed, such as the LeNet architecture, originally developed by LeCun [7], [8], which was applied to optical character recognition (OCR) and character recognition in documents. ConvNet is another CNN architecture; it uses 7 layers, each with a specific role. Using the same architecture for several tasks has a poor chance of reaching optimal performance; consequently, distinct CNN architectures are built for specific tasks, which requires a lot of research work since there are many types of machine learning (ML) tasks across sectors [8]. Although the CNN is a robust network, it still has parameters that need to be optimized, including the learning parameters and the network configuration settings. The performance of the CNN has been reported to depend on the proper tuning of its network configuration parameters [9], [10]. The theoretical basis has been investigated thoroughly in order to develop strategies for tuning CNN parameters and thereby improve overall performance. This demands a conceptual shift from visual feature extraction to the optimization of the network structure configuration [10].

The last three decades have seen wide use of metaheuristics and nature-inspired algorithms for solving various kinds of NP optimization problems; such metaheuristics include the firefly algorithm (FA), grey wolf optimizer (GWO), particle swarm optimization (PSO), nomadic people optimizer (NPO), and teaching-learning based optimization (TLBO) [11]–[14]. These algorithms were developed to handle problems in different engineering fields, information security, and machine learning [15]–[22]. One such nature-inspired algorithm, the black hole algorithm (BHA), was developed based on the behavior of a black hole (BH) as it pulls in its surrounding stars [23]. The concept of the BHA stems from the interaction of the BH with the bodies around it: the set of stars represents the candidate solutions in a given iteration, and the BH, toward which every star is pulled, represents the best solution. The next iteration generates a new set of solutions by moving the surrounding stars toward the BH.
Stars that come within a pre-determined distance of the BH are engulfed by it, and new stars are immediately generated at random to replace them. In this way, the BHA can explore unexplored areas of the solution space rather than searching the already explored areas again. The ability of the BHA to solve data clustering problems has been demonstrated; its optimization performance was reportedly better than that of other metaheuristics [24]. Recently, the BHA was adopted for CNN training [25]; the authors demonstrated that the exploration behavior of the BHA slows down the search process.

Herein, a novel chaotic BHA-based training algorithm for CNNs (CBH-CNN) was developed and evaluated. The role of the BHA in this approach is to establish the optimal CNN parameters. The exploration of the BHA was improved by initializing the population with the logistic chaotic map rather than the uniform distribution. The performance of the proposed approach was evaluated based on specific criteria, such as accuracy and calculated errors. The Modified National Institute of Standards and Technology (MNIST) dataset, an established dataset of handwritten digit images, was used as the benchmark. It contains 70,000 images of 28×28 pixels, of which 60,000 were used as the training set and the remaining 10,000 as the testing set.

The rest of this article is arranged as follows: section 2 covers the basics of the CNN and BH models, and section 3 explains the proposed CBH-CNN algorithm. Section 4 presents the results of the analysis of the proposed CBH-CNN algorithm, and section 5 concludes the work.

2. THEORETICAL BACKGROUND
2.1. Black hole algorithm

As mentioned above, the black hole algorithm (BHA) is based on the idea of a region of space in which so much mass is concentrated that nothing that comes close to it can escape its gravitational pull. Whatever is pulled into the BH is lost forever. The BHA has two major aspects: the migration of the stars toward the BH, and the re-initialization of stars that have crossed the event horizon around the BH. The BHA works as follows. First, $N + 1$ stars, $x_i \in \mathbb{R}^D$, $i = 1, \ldots, N + 1$ (where $N$ is the population size), are randomly initialized in the search space and their fitness is evaluated. The star with the best evaluation value is taken as the black hole $x_{BH}$; because the BH is static, it keeps its position until a star with a better solution is found. $N$ is the number of candidate stars searching for the optimum. The movement of each star toward the BH in each generation is calculated as:

$$x_i(t + 1) = x_i(t) + \mathrm{rand} \times \left(x_{BH} - x_i(t)\right), \quad i = 1, 2, \ldots, N \qquad (1)$$

where rand is a randomly generated number in the range between 0 and 1. Any star whose distance to the BH is less than the event horizon disappears. The event horizon has a radius ($R$) given as:

$$R = \frac{f_{BH}}{\sum_{i=1}^{N} f_i} \qquad (2)$$

where $f_{BH}$ and $f_i$ represent the fitness values of the BH and the $i$th star, respectively, and $N$ represents the number of stars (individual solutions). If the distance between the BH and an individual solution is less than $R$, that solution collapses, and a new individual is created and randomly placed in the solution space. The BHA is simple to implement because it is a parameter-less algorithm. The BHA has been reported to converge to the global optimum in all runs without being trapped in local optima as some other heuristics are [23], [24], [26]. In this work, the BHA was chosen to train the CNN because of its simple structure, easy implementation, and lack of control parameters.

2.2. The basics of CNNs

The concept of DL was introduced by Kelley [27] in 1960 based on the idea of an output becoming the input in the next iteration. Basically, the CNN is a form of DL network that comprises several convolution layers, pooling layers (which reduce the size of the received input), ReLU layers (which enhance the non-linearity of the outputs), and a fully connected layer that controls the output range using the softmax activation function. The CNN exploits the relationship between neighboring neurons. The three basic concepts of the CNN are local receptive fields, shared weights, and pooling [28], [29]. Figure 1 depicts a typical CNN architecture.
- Local receptive fields (LRF): A small region of the input is connected to a single neuron in the first hidden layer. The LRF is passed across the input image, and each position produces a different hidden neuron; together, these neurons make up the first hidden layer.
- Shared weights and biases: The role of this feature is to detect the presence of the same feature at different locations of the input image. The map from the input layer to the hidden layer is called the feature map, and the weights and biases that define it are called the shared weights and biases; they also determine the filter (kernel) to be used. Image recognition uses more than one feature map so that several features can be detected; hence, the convolutional layer comprises several different feature maps.
- Pooling layers: These layers follow the convolutional layers, and their role is to condense the information coming from the convolution layer. The commonly used pooling procedures are max-pooling and L2 pooling; max-pooling outputs the maximum neuron activation in a region, while L2 pooling outputs the square root of the sum of the squares of the activations in the 2×2 region.

Together, these three concepts form the core of a complete CNN architecture. Several convolution, activation function, and pooling stages are combined first, before one or more fully connected layers are introduced. A loss function is applied to the model output to evaluate performance in terms of the difference between the actual image label and the CNN output. The training of the CNN aims to reduce the loss function; this is achieved using stochastic gradient descent [7], an optimization strategy that first estimates the gradient of the loss function with respect to the weight of each edge in the network and then updates the weights using the computed gradient [27].

Figure 1. The typical architecture of a CNN model

3. THE PROPOSED TRAINING ALGORITHM

This paper presents the development of a training algorithm for the CNN; hence, the major aim is to determine the best values for the parameters of the CNN. The BHA was used as the training algorithm for the reasons stated earlier. Figure 2 depicts the schematic of the proposed CBH-CNN and illustrates its main steps. Three of these steps need to be detailed, as presented in the following subsections.

3.1. Encoding the stars

The two basic parts of a CNN are the feature extraction component and the classification component. Feature extraction (FE) involves several convolution layers, activation functions, and max-pooling. The classifier is mostly made up of fully connected NN layers.
In the proposed system, the set of NN weights is taken as the structure of each star; hence, the approach represents each star as a vector of real values that contains all the NN weights. Figure 3 depicts a star used for the training of NNs.
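The encoding described above can be illustrated with a minimal sketch, assuming the classifier weights are available as a list of per-layer matrices. The function names `encode_star` and `decode_star`, and the two small example matrices, are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def encode_star(layers: List[Matrix]) -> Tuple[List[float], List[Tuple[int, int]]]:
    """Flatten per-layer weight matrices into one real-valued vector (a star)."""
    shapes = [(len(w), len(w[0])) for w in layers]
    star = [value for w in layers for row in w for value in row]
    return star, shapes


def decode_star(star: List[float], shapes: List[Tuple[int, int]]) -> List[Matrix]:
    """Rebuild the weight matrices from a star, using the recorded shapes."""
    layers, offset = [], 0
    for rows, cols in shapes:
        matrix = [star[offset + r * cols: offset + (r + 1) * cols] for r in range(rows)]
        layers.append(matrix)
        offset += rows * cols
    if offset != len(star):
        raise ValueError("star length does not match the layer shapes")
    return layers


# Hypothetical classifier with a 2x3 and a 3x1 weight matrix (illustration only).
weights = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], [[0.7], [0.8], [0.9]]]
star, shapes = encode_star(weights)
restored = decode_star(star, shapes)
```

Keeping the shapes next to the flat vector lets the optimizer work on plain real-valued vectors while the network can still be rebuilt before each fitness evaluation.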
Figure 2. CBH-CNN

Figure 3. The encoding of each solution in the neural network in CNN [25]

3.2. Stars initialization

To initialize the population, the population size is set first, and stars are then created until the expected population size is reached. As stated earlier, the chaotic logistic map is used to initialize the stars in the search space, as in (3):

$$x_{i+1} = (UB - LB) \times \left[(1 - x_i) \times \mu\right] + LB \qquad (3)$$

where $x_i$ is the current position of a star, $x_{i+1}$ is the next position generated via the logistic map, $UB$ and $LB$ are the upper and lower bounds, respectively, and $\mu$ is a control parameter in the range [0, 4].
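A chaotic initialization of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the standard logistic map $z_{k+1} = \mu z_k (1 - z_k)$ on [0, 1], which includes a factor $z_k$ that (3) as printed does not show, and then scales each chaotic value into the bounds. The function name, the seed value of 0.7, and the bounds used in the example are hypothetical.

```python
from typing import List


def logistic_map_population(n_stars: int, dim: int, lower: float, upper: float,
                            mu: float = 4.0, seed_value: float = 0.7) -> List[List[float]]:
    """Create n_stars positions in [lower, upper]^dim from a logistic-map sequence."""
    if not 0.0 < seed_value < 1.0:
        raise ValueError("seed_value must lie strictly between 0 and 1")
    z = seed_value
    population = []
    for _ in range(n_stars):
        star = []
        for _ in range(dim):
            # Standard logistic map (assumed form); z stays inside [0, 1] for mu in [0, 4].
            z = mu * z * (1.0 - z)
            # Scale the chaotic value into the search bounds.
            star.append(lower + (upper - lower) * z)
        population.append(star)
    return population


# Example: 30 stars (the population size used in the paper) with 5 hypothetical weights each.
stars = logistic_map_population(n_stars=30, dim=5, lower=-1.0, upper=1.0)
```

Unlike a pseudo-random uniform draw, the sequence is fully determined by `seed_value` and `mu`, so the same initial population can be reproduced exactly.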
3.3. Evaluation function

The cost function of a classification problem is the classification error, which can be estimated as:

$$Err = \frac{FP + FN}{FP + FN + TP + TN} \qquad (4)$$

where FP is false positive, FN is false negative, TP is true positive, and TN is true negative; these metrics are computed from the confusion matrix. The classification error is determined by the number of incorrectly classified samples; that is, the CNN classifies the samples using the weight values held by each star, and the resulting error is the evaluation value of that star. The pseudocode of the proposed CBH-CNN is given in Figure 4.

CBH-CNN Algorithm
Input: Dataset, Iterations, No. of Stars, Max Bound, Min Bound
Output: Best Star
Procedure:
    Generate the initial positions for all stars X via (3)
    Update the weights and biases of CNN using each star in the population
    Calculate the evaluation value of each star via (4)
    Determine the black hole star, which is the current best solution
    For itr = 1 → Iterations
        For i = 1 → No. of Stars
            Calculate the new position for star_i using (1)
            Check the boundaries for the new position of star_i
            Set the weights and biases of CNN using the newly generated position
            Calculate the evaluation value of star_i via (4)
        End For
        Determine the event horizon using (2)
        For i = 1 → No. of Stars
            If star_i crosses the event horizon Then
                Replace star_i with a newly generated star via (3)
            End If
        End For
        Determine the black hole star, which is the current best solution
    End For
    Return final black hole solution

Figure 4. The main steps of CBH-CNN

4. RESULTS AND DISCUSSION

In this study, the enhanced algorithm was evaluated using the same experiment as in previous work. The dataset was sourced from the database of the Modified National Institute of Standards and Technology (MNIST) and comprises 70,000 images.
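The steps of Figure 4, together with the update rule (1), the event horizon (2), and the error measure (4), can be sketched in a few lines. This is an illustrative sketch under several assumptions and not the authors' implementation: a simple sphere function stands in for the CNN classification error, replaced stars are drawn uniformly rather than through the logistic map in (3), the black hole is kept as a separate copy of the best star, and one random factor is drawn per star. All function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def classification_error(tp: int, tn: int, fp: int, fn: int) -> float:
    """Equation (4): fraction of misclassified samples."""
    return (fp + fn) / (fp + fn + tp + tn)


def black_hole_search(cost: Callable[[Sequence[float]], float],
                      initial: List[List[float]], lower: float, upper: float,
                      iterations: int, rng: random.Random) -> Tuple[List[float], float]:
    """Minimize `cost` with a basic black hole loop; lower cost is better."""
    stars = [list(s) for s in initial]
    costs = [cost(s) for s in stars]
    best = min(range(len(stars)), key=lambda i: costs[i])
    bh, bh_cost = list(stars[best]), costs[best]
    for _ in range(iterations):
        for i in range(len(stars)):
            # Equation (1): move the star toward the black hole, then clamp to the bounds.
            r = rng.random()
            moved = [x + r * (b - x) for x, b in zip(stars[i], bh)]
            stars[i] = [min(upper, max(lower, v)) for v in moved]
            costs[i] = cost(stars[i])
        # Equation (2): event horizon radius from the fitness values.
        total = sum(costs)
        radius = bh_cost / total if total > 0 else 0.0
        for i in range(len(stars)):
            if math.dist(stars[i], bh) < radius:
                # Assumption: uniform re-initialization instead of the chaotic map in (3).
                stars[i] = [rng.uniform(lower, upper) for _ in bh]
                costs[i] = cost(stars[i])
        # The best star found so far becomes (or remains) the black hole.
        best = min(range(len(stars)), key=lambda i: costs[i])
        if costs[best] < bh_cost:
            bh, bh_cost = list(stars[best]), costs[best]
    return bh, bh_cost


def sphere(x: Sequence[float]) -> float:
    """Stand-in cost; in CBH-CNN the cost would be the CNN classification error."""
    return sum(v * v for v in x)


rng = random.Random(0)
initial = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(30)]
initial_best = min(sphere(s) for s in initial)
best_star, best_cost = black_hole_search(sphere, initial, -5.0, 5.0, 20, rng)
```

In the setting of the paper, `cost` would decode a star into CNN weights, classify the training samples, and return the value of (4); the toy cost is used here only so that the loop can be run on its own.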
The dataset was divided into two parts. The first part comprised 60,000 images (the training set), which were scanned images of the handwriting of 250 people; 50% of these people were employees of the US Census Bureau, while the remaining 50% were high school students. The images were stored in grayscale and sized 28×28 pixels. The remaining 10,000 images were used as the testing set; they were collected from another set of 250 persons for the sake of comparison [29]. A population size of 30 stars was used in this study, and the maximum number of epochs was 20.

The new approach was benchmarked against the standard CNN, the deep belief network (DBN), the CNN based on simulated annealing (CNN-SA) [30], the CNN based on PSO (CNN-PSO) [31], and the CNN based on the standard black hole algorithm (BH-CNN) [25], as shown in Table 1; the comparison was in terms of classification accuracy over five epochs only. All the existing models attained broadly similar performance, as shown in Table 1, while the proposed CBH-CNN achieved the highest accuracy at every epoch. The performance of the CNN-PSO algorithm was acceptable but did not match that of the new CBH-CNN. The PSO algorithm has three control parameters (the cognitive parameter $c_1$, the social parameter $c_2$, and the inertia weight $w$), and these parameters must be optimally tuned, which is an optimization problem in itself. Hence, the simple structure of the BHA, which requires no parameter tuning, makes it a simpler and less complex model than the others. Figure 5 and Figure 6 show the accuracy and the error rate of the evaluated models, respectively.

The comparison of the models in terms of execution time is presented in Figure 7. Figure 7 shows that the DBN requires less execution time, owing to a training process that relies on rapid contrastive divergence. The proposed CBH-CNN shows a longer execution time than the DBN and the standard CNN because of the additional iterations required by the BHA and the need to calculate the event horizon.

Table 1. Comparison of the classification accuracy (%) of the new CBH-CNN with some existing models

Model      Epoch=1   Epoch=2   Epoch=3   Epoch=4   Epoch=5
CNN        88.87     92.25     93.9      94.81     95.68
DBN        87.46     89.72     90.64     91.14     92.93
CNN-SA     89.18     92.38     94.2      95.19     96.04
CNN-PSO    89.52     92.38     93.91     95.08     96.31
BH-CNN     89.91     92.38     93.91     95.08     96.31
CBH-CNN    89.93     93.12     94.98     95.62     96.96

Figure 5. The classification accuracy (axes: accuracy versus run/epoch)

Figure 6. The error rates (axes: error versus run/epoch)
Figure 7. The execution times (axes: time versus run/epoch)

5. CONCLUSION

Deep learning models have received the most attention in the field of machine learning because they can recognize the patterns in most classification problems. The CNN is among the best-known DL models, and its main structure consists of two parts: feature extraction and a neural network. This paper focused on training the second part of the CNN using the CBHA, a novel nature-inspired algorithm. The proposed model achieved higher classification accuracy than the other DL models on the MNIST dataset using only five epochs. The outcome of this study suggests that the BHA is a suitable alternative for the training of CNNs.

REFERENCES
[1] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, Jul. 2006, doi: 10.1162/neco.2006.18.7.1527.
[2] N. B. Karayiannis and G. W. Mi, “Growing radial basis neural networks: merging supervised and unsupervised learning with network growth techniques,” IEEE Transactions on Neural Networks, vol. 8, no. 6, pp. 1492–1506, Nov. 1997, doi: 10.1109/72.641471.
[3] A. G. Ivakhnenko and V. G. Lapa, “Cybernetic predicting devices,” 1966.
[4] T. Dhieb, W. Ouarda, H. Boubaker, and A. M. Alimi, “Deep neural network for online writer identification using beta-elliptic model,” in 2016 International Joint Conference on Neural Networks (IJCNN), Jul. 2016, pp. 1863–1870, doi: 10.1109/IJCNN.2016.7727426.
[5] L. Haddad, T. M. Hamdani, W. Ouarda, A. M. Alimi, and A. Abraham, “An adaptation module with dynamic radial basis function neural network using significance concept for writer adaptation,” Journal of Information Assurance, vol. 12, no. 1, 2017.
[6] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998, doi: 10.1109/5.726791.
[7] Y. LeCun and M. Ranzato, “Deep learning tutorial,” in Tutorials in international conference on machine learning (ICML’13), 2013, pp. 1–29.
[8] B. Wang, Y. Sun, B. Xue, and M. Zhang, “Evolving deep convolutional neural networks by variable-length particle swarm optimization for image classification,” in 2018 IEEE Congress on Evolutionary Computation (CEC), Jul. 2018, pp. 1–8, doi: 10.1109/CEC.2018.8477735.
[9] K. Jarrett, K. Kavukcuoglu, M. A. Ranzato, and Y. LeCun, “What is the best multi-stage architecture for object recognition?,” in 2009 IEEE 12th International Conference on Computer Vision, Sep. 2009, pp. 2146–2153, doi: 10.1109/ICCV.2009.5459469.
[10] T. Yamasaki, T. Honma, and K. Aizawa, “Efficient optimization of convolutional neural networks using particle swarm optimization,” in 2017 IEEE Third International Conference on Multimedia Big Data (BigMM), Apr. 2017, pp. 70–73, doi: 10.1109/BigMM.2017.69.
[11] R. Eberhart and J. Kennedy, “A new optimizer using particle swarm theory,” in Micro Machine and Human Science, 1995. MHS’95., Proceedings of the 6th International Symposium, 1995, pp. 39–43, doi: 10.1109/MHS.1995.494215.
[12] X.-S. Yang, “Firefly algorithms for multimodal optimization,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2009, vol. 5792, pp. 169–178, doi: 10.1007/978-3-642-04944-6_14.
[13] S. Mirjalili, S. M. Mirjalili, and A. Lewis, “Grey Wolf Optimizer,” Advances in Engineering Software, vol. 69, pp. 46–61, Mar. 2014, doi: 10.1016/j.advengsoft.2013.12.007.
[14] R. V Rao, V. J. Savsani, and D. P. Vakharia, “Teaching–learning-based optimization: a novel method for constrained mechanical design optimization problems,” Computer-Aided Design, vol. 43, no. 3, pp. 303–315, Mar. 2011, doi: 10.1016/j.cad.2010.12.015.
[15] S. Q. Salih, A. A. Alsewari, and Z. M. Yaseen, “Pressure vessel design simulation,” in Proceedings of the 2019 8th International Conference on Software and Computer Applications, Feb. 2019, pp. 120–124, doi: 10.1145/3316615.3316643.
[16] S. Q. Salih et al., “Integrative stochastic model standardization with genetic algorithm for rainfall pattern forecasting in tropical and semi-arid environments,” Hydrological Sciences Journal, vol. 65, no. 7, pp. 1145–1157, May 2020, doi: 10.1080/02626667.2020.1734813.
[17] U. Beyaztas, S. Q. Salih, K.-W. Chau, N. Al-Ansari, and Z. M. Yaseen, “Construction of functional data analysis modeling strategy for global solar radiation prediction: application of cross-station paradigm,” Engineering Applications of Computational Fluid Mechanics, vol. 13, no. 1, pp. 1165–1181, Jan. 2019, doi: 10.1080/19942060.2019.1676314.
[18] A. H. Zahid et al., “A novel construction of dynamic s-box with high nonlinearity using heuristic evolution,” IEEE Access, vol. 9, pp. 67797–67812, 2021, doi: 10.1109/ACCESS.2021.3077194.
[19] K. Z. Zamli, A. Kader, F. Din, and H. S. Alhadawi, “Selective chaotic maps tiki-taka algorithm for the s-box generation and optimization,” Neural Computing and Applications, vol. 33, no. 23, pp. 16641–16658, Dec. 2021, doi: 10.1007/s00521-021-06260-8.
[20] S. Q. Salih, A. A. Alsewari, B. Al-Khateeb, and M. F. Zolkipli, “Novel multi-swarm approach for balancing exploration and exploitation in particle swarm optimization,” in Proceedings of 3rd International Conference of Reliable Information and Communication Technology 2018 (IRICT 2018), 2018, pp. 196–206, doi: 10.1007/978-3-319-99007-1_19.
[21] H. Tao et al., “A newly developed integrative bio-inspired artificial intelligence model for wind speed prediction,” IEEE Access, vol. 8, pp. 83347–83358, 2020, doi: 10.1109/ACCESS.2020.2990439.
[22] H. S. Alhadawi, S. Q. Salih, and Y. D. Salman, “Chaotic particle swarm optimization based on meeting room approach for designing bijective S-boxes,” in Proceedings of International Conference on Emerging Technologies and Intelligent Systems, 2022, pp. 331–341, doi: 10.1007/978-3-030-85990-9_28.
[23] A. Hatamlou, “Black hole: a new heuristic optimization approach for data clustering,” Information Sciences, vol. 222, pp. 175–184, Feb. 2013, doi: 10.1016/j.ins.2012.08.023.
[24] S. Kumar, D. Datta, and S. K. Singh, “Black hole algorithm and its applications,” in Computational Intelligence Applications in Modeling and Control, Springer International Publishing, 2015, pp. 147–170, doi: 10.1007/978-3-319-11017-2_7.
[25] S. Q. Salih, “A new training method based on black hole algorithm for convolutional neural network,” Journal of Southwest Jiaotong University, vol. 54, no. 3, p. 1, Jun. 2019, doi: 10.35741/issn.0258-2724.54.3.22.
[26] A. P. Piotrowski, J. J. Napiorkowski, and P. M. Rowinski, “How novel is the ‘novel’ black hole optimization approach?,” Information Sciences, vol. 267, pp. 191–200, May 2014, doi: 10.1016/j.ins.2014.01.026.
[27] H. J. Kelley, “Gradient theory of optimal flight paths,” ARS Journal, vol. 30, no. 10, pp. 947–954, Oct. 1960, doi: 10.2514/8.5282.
[28] M. Liu, J. Shi, Z. Li, C. Li, J. Zhu, and S. Liu, “Towards better analysis of deep convolutional neural networks,” IEEE Transactions on Visualization and Computer Graphics, vol. 23, no. 1, pp. 91–100, Jan. 2017, doi: 10.1109/TVCG.2016.2598831.
[29] J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson, “Understanding neural networks through deep visualization,” Jun. 2015, [Online]. Available: http://arxiv.org/abs/1506.06579
[30] L. M. R. Rere, M. I. Fanany, and A. M. Arymurthy, “Simulated annealing algorithm for deep learning,” Procedia Computer Science, vol. 72, pp. 137–144, 2015, doi: 10.1016/j.procs.2015.12.114.
[31] A. R. Syulistyo, D. M. Jati Purnomo, M. F. Rachmadi, and A. Wibowo, “Particle swarm optimization (PSO) for training optimization on convolutional neural network (CNN),” Jurnal of Computer Sciences and Information, vol. 9, no. 1, Feb. 2016, doi: 10.21609/jiki.v9i1.366.

BIOGRAPHIES OF AUTHORS

Sinan Q. Salih received the B.Sc. degree in information systems from the University of Anbar, Al-Anbar, Iraq, in 2010, the M.Sc. degree in computer sciences from Universiti Tenaga National (UNITEN), Malaysia, in 2012, and the Ph.D. degree in soft modeling and intelligent systems from Universiti Malaysia Pahang (UMP). His current research interests include optimization algorithms, nature-inspired metaheuristics, machine learning, and feature selection problems for real-world applications. He can be contacted at email: sinan.salih@sadiq.edu.iq.

Ahmed L. Khalaf received his B.Sc. Eng. degree (Control and Systems Engineering) from University of Technology, Iraq (2001) and his M.Sc. Eng. degree (Computer Engineering) from Middle Technical University, Iraq (2008). He did his Ph.D. research at Universiti Putra Malaysia, Malaysia (2018) in the area of optical sensors based on nanomaterials for chemical sensing applications. Currently, he is a senior lecturer at the Department of Computer Engineering Techniques, Al-Ma’moon University College. His main research interests are fiber optic sensors, optical chemical sensors, nanomaterials, and computer engineering. He can be contacted at email: ahmed.l.khalaf@sadiq.edu.iq.
  • 9.
Nuha Sami Mohsin was born in Baghdad, Iraq. She received her M.Sc. and degree in Computer Science in 2005 and 2021 from the University of Technology and the University of Information and Communication Technology/Institute of Informatics for Postgraduate Studies, respectively. She is currently a doctor teacher in the information technology unit at the College of Education Ibn Rushed, University of Baghdad. Her research interests include optimization, machine learning, artificial intelligence, algorithm optimization, software engineering, copyright and watermark protection, and information security. She can be contacted at email: nuha.sami@ircoedu.uobaghdad.edu.iq.

Saadya Fahad Jabbar was born in Baghdad, Iraq. She received her B.Sc. and M.Sc. degrees in Computer Science in 2004 and 2015 from the University of Baghdad and the University of Al-Mustansiriyah, respectively. She is currently a senior lecturer in the information technology unit at the College of Education Ibn Rushed, University of Baghdad. Her research interests include optimization, machine learning, and natural language processing. She can be contacted at email: saadya.fahad@ircoedu.uobaghdad.edu.iq.