TELKOMNIKA Telecommunication Computing Electronics and Control
Vol. 20, No. 5, October 2022, pp. 1109~1116
ISSN: 1693-6930, DOI: 10.12928/TELKOMNIKA.v20i5.24096
Journal homepage: http://telkomnika.uad.ac.id
Static-gesture word recognition in Bangla sign language using
convolutional neural network
Kulsum Ara Lipi1, Sumaita Faria Karim Adrita1, Zannatul Ferdous Tunny1, Abir Hasan Munna1, Ahmedul Kabir2
1Department of Information and Communication Technology, Faculty of Science and Technology, Bangladesh University of Professionals, Dhaka, Bangladesh
2Institute of Information Technology, University of Dhaka, Dhaka, Bangladesh
Article Info
Article history: Received Sep 17, 2021; Revised Jun 23, 2022; Accepted Jul 01, 2022

ABSTRACT
Sign language is the communication medium of people with hearing impairments. For hearing-impaired communication in Bangladesh and parts of India, Bangla sign language (BSL) is the standard. While Bangla is one of the most widely spoken languages in the world, there is a scarcity of research in the field of BSL recognition. The few research works done so far have focused on detecting BSL alphabets. To the best of our knowledge, no work on detecting BSL words has been conducted till now, owing to the unavailability of a BSL word dataset. In this research, a small static-gesture word dataset has been developed, and a deep learning-based method has been introduced that can detect BSL static-gesture words from images. The dataset, “BSLword”, contains 30 static-gesture BSL words with 1200 images for training. The training is done using a multi-layered convolutional neural network with the Adam optimizer. OpenCV is used for image processing and TensorFlow is used to build the deep learning models. The system can recognize BSL static-gesture words with 92.50% accuracy on the word dataset.
Keywords:
BSL
BSL word dataset
Convolutional neural network
Static-gesture signs
This is an open access article under the CC BY-SA license.
Corresponding Author:
Ahmedul Kabir
Institute of Information Technology, University of Dhaka
Dhaka, Bangladesh
Email: kabir@iit.du.ac.bd
1. INTRODUCTION
Bangla is the fifth-most widely spoken language on the planet, spoken by almost 230 million people in Bangladesh and the eastern parts of India. Among them, more than three million are mute or hard of hearing [1]. There is an enormous communication gap between those who can speak and listen to the language and those who cannot. The only way deaf and mute people can communicate is through sign language, which uses manual gestures and body language to convey meaning. This mode of communication is quite hard for hearing people to understand. This is where computer vision can help: nowadays, computer vision is used to assist deaf and mute people through automated sign language detection techniques. However, these technologies are not so readily available to the people of underdeveloped countries like Bangladesh.
There are not many books from which deaf and mute people can study Bangla sign-based communication. The National Centre for Special Education, under the Ministry of Social Welfare, published a book named “Bangla Ishara Bhasha Obhidhan” (Bangla sign language dictionary), edited by the Bangladesh sign language (BSL) committee, in January 1994, reprinted in March 1997. This book follows the British sign pattern. The Centre for Disability in Development (CDD) published another book named “Ishara Bhashay Jogajog” (communication in sign language) in 2005, reprinted in 2015. Apart from these, there are not many
options for people to learn sign language, and learning it is a substantial undertaking that very few people take on. If there were a Bangla sign language recognizer model, general individuals could easily interact with hearing- and speech-impaired individuals. This would reduce the disparity between people with disabilities and the general population, and ensure a more just society with equal opportunity for all.
This, however, is a far cry from the current reality for a number of reasons. There is no proper dataset of Bangla sign words for scientific work to build on. There is also not enough successful research on Bangla sign-based communication. In an attempt to alleviate this situation to some extent, we built a dataset called BSLword consisting of images of different words in Bangla sign language. This dataset will support research on, and the improvement of, Bangla sign language recognition. Moreover, we utilized the deep learning method called convolutional neural network (CNN) to build a model that can recognize words from the dataset. In this paper, we describe our whole process of dataset construction and model development.
In 2019, Hasan and Akhsan [2] proposed an easily understandable model that recognizes Bangla sign digits. They used histogram of oriented gradients (HOG) image features and multiple support vector machines to build the classifier. They selected 900 images for training and 100 for testing from the ten digit groups. Their system achieved approximately 95% accuracy. Earlier, in 2018, Hoque et al. [3] proposed a procedure to recognize BSL from pictures in real time. They utilized a convolutional neural network-based object detection strategy built on Faster R-CNN and obtained an average accuracy of 98.2 percent. Their main limitation was recognizing letters whose sign patterns are very similar to one another. Before that, Uddin et al. [4] in 2017 suggested an image processing model focused on Bangla sign language interpretation. First, YCbCr color components are used to detect the user’s skin tone, after which a set of features is extracted from each input picture. Finally, the extracted features are fed to a support vector machine (SVM) for training and testing. The suggested model showed an average of 86% accuracy on their trial dataset.
Hossen et al. [5] proposed another strategy for Bengali sign language recognition that uses a deep CNN (DCNN). The technique interprets static hand signs for 37 letters of the Bengali alphabet. Conducting experiments on three sets of the 37 signs with a total of 1147 images, they achieved a recognition rate of 96.33 percent on the training dataset and 84.68 percent on the validation dataset using a deep CNN. In the same year, Islam et al. [6] developed a deep learning model to recognize the digits of BSL. In this methodology, they used a CNN model trained on specific signs with a separate training dataset. The model was designed and tested with 860 training pictures and 215 test pictures, and achieved about 95% accuracy. Prior to that, in 2016, Uddin and Chowdhury [7] introduced a framework to recognize BSL using a support vector machine. Bangla sign letters are recognized by analysing the features that distinguish each symbol. In the proposed system, hand signs are converted from the red, green, blue (RGB) image to the hue, saturation, value (HSV) color space. Gabor filters were then used to extract the desired hand sign features. The accuracy of their proposed framework is 97.7%.
Ishara-Lipi, published by Islam et al. [1] in 2018, was the first complete isolated BSL character dataset. The dataset includes 50 sets of 36 basic Bangla sign characters, collected from people with different hearing disabilities as well as hearing volunteers; 1800 character images of Bangla sign language were kept in the final version. They obtained 92.65% accuracy on the training set and 94.74% accuracy on the validation set. Ahmed and Akhand (2016) [8] presented a BSL recognition system centred on the positions of the fingertips. To train an artificial neural network (ANN) for recognition, the method used the relative tip positions of the five fingers in two-dimensional space as location vectors. The proposed strategy was evaluated on a dataset of 518 images covering 37 symbols, and a 99% recognition rate was achieved.
In 2012, Rahman et al. [9] proposed a framework for recognizing static hand gestures of the alphabet in Bangla sign language. They trained an ANN on the sign letters’ features using the feedforward backpropagation learning algorithm. They worked with 36 letters of the BSL alphabet, and their framework obtained an average recognition accuracy of 80.902%. Later, in 2015, Yasir et al. [10] introduced a computational approach to recognize BSL. For image preparation and normalization of the sign image, Gaussian distribution and grayscaling methods are applied. K-means clustering is performed on all the descriptors, and an SVM classifier is applied.
Islam et al. [11] proposed hand gesture recognition for American sign language (ASL) using a DCNN. In order to extract more informative features from hand images, they used a DCNN before performing the final character recognition with a multi-class SVM. Cui et al. [12] proposed a recurrent convolutional neural network (RCNN) for continuous sign language recognition. They designed a staged optimization process for their model, tuned it using vast amounts of data, and compared it with other sign language recognition models. Earlier, in 2015, Hasan and Ahmad [13] proposed a sign language recognition system for bilingual users. They used a combination of principal component analysis (PCA) and linear discriminant
analysis (LDA) in order to maximize the discrimination between classes. Their system can translate a set of 27 signs to Bengali text with an average recognition rate of 96.463%. In 2017, Islam et al. [14] applied different feature extraction algorithms for hand gesture recognition. They designed a process for real-time ASL recognition using an ANN, which achieves an accuracy of 94.32% when recognizing alphanumeric character signs.
Huang et al. [15] proposed a 3D CNN model for sign language recognition, using a multilayer perceptron to extract features. They also compared their 3D CNN against a Gaussian mixture model with hidden Markov model (GMM-HMM) on the same dataset, and their approach achieved higher accuracy than the GMM-HMM model. In 2019, Khan et al. [16] proposed an approach that reduces the workload of training huge models by using a customizable segmented region of interest (ROI). In their approach, the user can move a bounding box to the hand area on screen, thus relieving the system of the burden of finding the hand area. Naglot and Kulkarni [17] used a Leap Motion controller to recognize sign language in real time. The Leap Motion controller is a 3D non-contact motion sensor that can detect the discrete positions and motion of the fingers. A multi-layer perceptron (MLP) neural network with the backpropagation (BP) algorithm was used to recognize 26 letters of ASL with a recognition rate of 96.15%. Rafi et al. [18] proposed a VGG19-based CNN for recognizing 38 classes, which achieved an accuracy of 89.6%. The proposed framework includes two processing steps: hand form segmentation and feature extraction from the hand sign.
Rahaman et al. [19] presented a real-time computer vision-based Bengali sign language (BdSL) recognition system. The system first detects the location of the hand in the frame using Haar-like feature-based classifiers. The system attained a vowel recognition accuracy of 98.17 percent and a consonant recognition accuracy of 94.75 percent. Masood et al. [20] classified signs based on spatial and temporal features using two alternative techniques: the spatial features were classified using a CNN, whereas the temporal features were classified using an RNN. The proposed model was able to achieve a high accuracy of 95.2% over a large set of images. Rony et al. [21] suggested a system that lets all members of a family converse quickly and easily when one or more members are deaf or mute. They used convolutional neural networks in their proposed system for hand gesture recognition and classification. In 2019, Urmee et al. [22] suggested a solution that works in real time using Xception and their BdSLInfinite dataset. They employed a large dataset for training in order to produce highly accurate results that were as close to real-life scenarios as possible. With an average detection time of 48.53 milliseconds, they achieved a test accuracy of 98.93 percent. Yasir and Khan [23] proposed a framework for sign language detection and recognition (SLDR) for BSL. They created a system that can recognize the numerous alphabets of BSL for human-computer interaction, producing accurate outcomes in the shortest possible time. In 2020, Angona et al. [24] proposed a system for recognizing BSL letters using MobileNet.
In this paper, we have built a dataset of BSL words that use static gestures. To the best of our knowledge, this is the first dataset that deals with BSL words. The dataset can be used for training any machine learning model. We used a CNN on the training portion of the dataset and built a model that achieved 92.50% accuracy on the test set. The rest of the paper describes our methodology and the results obtained.
2. METHODOLOGY
2.1. Data collection and pre-processing
There are more than a hundred thousand words in the Bangla language, but not all of them have a corresponding word in sign language. Most sign language words are represented by waving one hand or both hands, while some words are represented by static poses just like BSL characters. Since this is a rudimentary study in this field, we collected only those words that can be expressed with a single-hand gesture and captured as static images. We found 30 such words in the BSL dictionary. The words
are shown here in Bangla script with the English transliteration and translation in brackets: দেশ (‘desh’, country), স্যার (‘sir’, sir), এখানে (‘ekhane’, here), কিছুটা (‘kichuta’, a little bit), গুণ (‘gun’, multiply), বিয়োগ (‘biyog’, subtract), দাঁড়াও (‘darao’, stand), বাসা (‘basha’, house), সুন্দর (‘shundor’, beautiful), বন্ধু (‘bondhu’, friend), তুমি (‘tumi’, you), কোথায় (‘kothay’, where), সাহায্য (‘shahajjo’, help), তারা (‘tara’, star), আজ (‘aaj’, today), সময় (‘shomoi’, time), সে (‘she’, he), সমাজকল্যাণ (‘shomajkollan’, social welfare), অনুরোধ (‘onurodh’, request), দাঁড়ানো (‘darano’, to stand), বাঘ (‘bagh’, tiger), চামড়া (‘chamra’, skin), গির্জা (‘girja’, church), হকি (‘hockey’, hockey), জেল (‘jail’, jail), ক্যারম (‘keram’, carrom), পিয়ানো (‘piano’, piano), পুরু (‘puru’, thick), সত্য (‘shotto’, truth), বৌদ্ধ (‘bouddho’, Buddha). The whole data collection method is divided into five separate steps: capture images, label all data, crop images, resize images, and convert to RGB format.
2.1.1. Capture images
Our dataset contains a total of 1200 static images, 40 images for each of the 30 words. We collected data from several undergraduate students who volunteered for the work. We captured images of different hand gestures made with bare hands in front of a white background. A high-resolution mobile phone camera was used to take all the pictures. Figure 1 shows some sample pictures.
Figure 1. Some captured images (one sample image per word shown)
2.1.2. Label all data
In this step, we categorized all the images and labelled them according to the words. This labelling is important since we are using supervised classification. Our labelling followed a numerical convention, assigning each word an index from 0 to 29.
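To make the numerical labelling concrete, the following minimal Python sketch (an illustration, not the authors' code) assigns each word an index from 0 to 29 by sorting per-word image folders; the folder layout and names are assumptions.

```python
import os

# Assumed layout: one folder per word, e.g. BSLword/aaj, BSLword/bagh, ...
DATASET_DIR = "BSLword"

# Sorting the folder names gives every run the same word-to-index assignment.
word_names = sorted(os.listdir(DATASET_DIR))          # 30 word folders
label_map = {word: idx for idx, word in enumerate(word_names)}

print(label_map)  # e.g. {'aaj': 0, 'bagh': 1, ..., 'tumi': 29}
```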
2.1.3. Crop all images
Because the images were captured under slightly different conditions, the hand position varies from image to image. Cropping is therefore an essential step before the data can be used in the experiment. All images are cropped while keeping track of the width-to-height proportion for later use. Figure 2 shows an example of image cropping.
2.1.4. Resize images and convert to RGB
All cropped images are resized to 64×64 pixels. This step is necessary to make the dataset consistent and suitable to be fed to our deep learning model. The original pictures are read in blue, green, red (BGR) color order, so we then convert them to the RGB color space.
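As a concrete illustration of the cropping, resizing, and BGR-to-RGB conversion described above, a minimal OpenCV sketch follows. The file path and crop box are placeholders: the paper does not state how the hand region was located, so this sketch assumes a manually chosen bounding box.

```python
import cv2

def preprocess(image_path, box):
    """Crop the hand region, resize to 64x64, and convert from BGR to RGB."""
    img = cv2.imread(image_path)             # OpenCV reads images in BGR order
    x, y, w, h = box                         # assumed (x, y, width, height) crop box
    cropped = img[y:y + h, x:x + w]          # crop around the hand
    resized = cv2.resize(cropped, (64, 64))  # match the model input size
    return cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)

# Hypothetical usage:
# sample = preprocess("BSLword/desh/img_01.jpg", box=(120, 80, 300, 300))
```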
2.2. Model development
We divided our dataset into two parts using stratified random sampling: 80% for training and 20% for testing. We then train our model using the CNN architecture described in the next section. Once the CNN model is trained, we can input a hand image of any person and the model will detect the sign word.
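The split and inference steps can be sketched as follows. The arrays X and y are dummy placeholders standing in for the 1200 preprocessed images and their integer labels, and scikit-learn's stratified train_test_split is used here as a stand-in for whatever sampling routine was actually employed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy placeholders standing in for the real preprocessed dataset:
# 1200 images of size 64x64x3 and 40 labels per class for 30 classes.
X = np.zeros((1200, 64, 64, 3), dtype="float32")
y = np.repeat(np.arange(30), 40)

# Stratified 80/20 split keeps 32 training and 8 test images per word.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

def predict_word(model, image, word_names):
    """Classify one preprocessed 64x64x3 hand image and return the word name."""
    probs = model.predict(image[np.newaxis, ...])  # add the batch dimension
    return word_names[int(np.argmax(probs))]
```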
Figure 2. An example of image cropping
2.2.1. CNN architecture
CNNs are artificial neural networks that try to mimic the visual cortex of the human brain. The artificial neurons in a CNN are connected to a local region of the visual field, called the receptive field. Discrete convolutions are conducted on the image. The input images are taken in the form of color planes in the RGB spectrum, and the images are then transformed in order to facilitate predictive analysis. High-level features, such as image edges, are obtained using a kernel that traverses the whole image from top-left to bottom-right. A CNN model is used to recognize these sign words; here, multiple convolutional layers connected to each other are used [25].
In this paper, the proposed model utilizes the Adam optimizer, an extension of stochastic gradient descent that has recently been adopted for most computer-vision and natural language processing tasks. The method computes an individual adaptive learning rate for each parameter from estimates of the first and second moments of the gradients [26]. The model is trained for 200 epochs. We used a 12-layer CNN similar to the one used in [1], as shown in Figure 3. Convolution layers 1, 2, 3, 4, 5, and 6 have 16, 32, 32, 64, 128, and 256 filters, respectively. The kernel size of each of these layers is 3×3, and the activation function is ReLU. The max pooling layers are each 3×3 as well. Then we use a dropout layer with 50% dropout. After that we have a dense layer with 512 units and ReLU activation. Finally, the output layer uses 30 units (one per word class) with softmax activation.
Figure 3. CNN model architecture
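A minimal TensorFlow/Keras sketch of the architecture described above is given below. The paper does not state exactly where the 3×3 max-pooling layers sit, so the sketch assumes one pooling layer after every second convolution; the filter counts, kernel sizes, dropout rate, dense width, optimizer, and epoch count follow the text, and the 30-unit softmax output corresponds to the 30 word classes.

```python
from tensorflow.keras import layers, models

def build_model(num_classes=30):
    model = models.Sequential([
        # Six convolution layers with 16, 32, 32, 64, 128, 256 filters (3x3, ReLU).
        # Pooling placement is an assumption; the text leaves it unspecified.
        layers.Conv2D(16, (3, 3), padding="same", activation="relu",
                      input_shape=(64, 64, 3)),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((3, 3)),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((3, 3)),
        layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(256, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((3, 3)),
        layers.Flatten(),
        layers.Dropout(0.5),                        # 50% dropout
        layers.Dense(512, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()

# Training for 200 epochs as described; the batch size here is an assumption.
# model.fit(X_train, y_train, epochs=200, batch_size=32,
#           validation_data=(X_test, y_test))
```

With sparse categorical cross-entropy, the integer labels from the split sketched in section 2.2 can be used directly.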
3. EVALUATION AND RESULT
As stated earlier, we used an 80%-20% split, resulting in 960 images for training and 240 images for testing. After training the model for 200 epochs using the multi-layered CNN architecture detailed in the previous section, we obtained a test set accuracy of 92.50%. We also calculated precision, recall, and F1-score for each class. The metrics obtained for each class (each of the 30 word signs) are shown in Table 1. It is seen from the table that the performance of the model is quite good for most of the signs. For only a few words does the model occasionally fail to recognize the correct word. Some of these words include কিছুটা (‘kichuta’), তারা (‘tara’), সময় (‘shomoi’), সে (‘she’), চামড়া (‘chamra’), and অনুরোধ (‘onurodh’). Looking at the pictures of these signs (as in Figure 1), we can see that some of them are visually similar and hence prone to confusion by the model. For example, কিছুটা (‘kichuta’, row 1 column 4 in Figure 1) and তারা (‘tara’, row 3 column 2 in Figure 1) are strikingly similar. The average precision, recall, and F1-score are all above 0.9, so we can say that the overall performance of the model is quite satisfactory.
Table 1. Metrics of each class (sign) in the BSLword dataset. English transliteration of the word is shown
Word Precision Recall F1-score Word Precision Recall F1-score Word Precision Recall F1-score
Sir 1.00 1.00 1.00 Darao 1.00 0.80 0.89 Shomajkollan 0.90 1.00 0.95
Shundor 1.00 1.00 1.00 Desh 1.00 1.00 1.00 Hockey 1.00 1.00 1.00
She 0.82 0.75 0.78 Ekhane 1.00 0.90 0.95 Piano 1.00 0.70 0.82
Tara 0.75 0.90 0.82 Gun 1.00 1.00 1.00 Puru 0.88 1.00 0.93
Shotto 1.00 1.00 1.00 Kichuta 0.80 0.89 0.84 Chamra 0.75 0.86 0.80
Shomoi 1.00 0.67 0.80 Kothay 1.00 0.71 0.83 Jail 0.83 0.83 0.83
Aaj 0.80 1.00 0.89 Onurodh 0.80 0.89 0.84 Girja 1.00 1.00 1.00
Basha 1.00 1.00 1.00 Shahajjo 0.80 1.00 0.89 Bouddho 0.89 1.00 0.94
Biyog 1.00 0.80 0.89 Tumi 1.00 1.00 1.00 Bagh 1.00 1.00 1.00
Bondhu 1.00 1.00 1.00 Darano 0.86 1.00 0.92 Keram 1.00 1.00 1.00
Avg. precision = 0.93, Avg. recall = 0.93, Avg. F1-score = 0.92
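For reference, per-class precision, recall, and F1-scores such as those in Table 1 can be computed from a trained model with scikit-learn; this sketch reuses the variable names assumed in the earlier snippets (and a model already trained on the training split) and is not the authors' code.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

# y_test: true integer labels; X_test: held-out images; word_names: index-to-word list
y_pred = np.argmax(model.predict(X_test), axis=1)

print("Test accuracy: %.4f" % accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=word_names, digits=2))
```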
4. CONCLUSION
This paper has introduced a dataset named BSLword, containing 1200 images of 30 static-gesture
words in BSL. To the best of our knowledge, this dataset is the very first word-level dataset of BSL. We used
a CNN model to correctly identify the words represented by the images in the dataset. The system can
recognize BSL static-gesture words with 92.50% accuracy on the word dataset. The average precision, recall
and F1-scores are 0.93, 0.93, and 0.92 respectively. We believe that our dataset would be an exceptional asset
for BSL recognition specialists. Simultaneously, the dataset can also be beneficial for machine learning and
related methods intended for the study of movements for recognizing gestures and signs. We have plans to
extend our work in the future in the following ways: currently BSLword only contains a small subset of
words of BSL. Our next goal would be to include words with dynamic gestures and make it a comprehensive
dataset. This would require not only a huge undertaking in data collection, but also thorough research to
find the most suitable model. Ultimately, our vision is to complete a system that can recognize any word with
a reasonable degree of accuracy. If that happens, the mute and deaf people of Bangladesh will no longer
suffer from the communication gap that they must endure at present.
REFERENCES
[1] M. S. Islam, S. S. S. Mousumi, N. A. Jessan, A. S. A. Rabby, and S. A. Hossain, “Ishara-Lipi : The First Complete Multipurpose
Open Access Dataset of Isolated Characters for Bangla Sign Language,” 2018 International Conference on Bangla Speech and
Language Processing (ICBSLP), 2018, pp. 1-4, doi: 10.1109/icbslp.2018.8554466.
[2] M. M. Hasan and S. M. M. Akhsan, “Bangla Sign Digits Recognition Using HOG Feature Based Multi-Class Support Vector
Machine,” 2019 4th International Conference on Electrical Information and Communication Technology (EICT), 2019, pp. 1-5,
doi: 10.1109/EICT48899.2019.9068832.
[3] O. B. Hoque, M. I. Jubair, M. S. Islam, A. -F. Akash, and A. S. Paulson, “Real Time Bangladeshi Sign Language Detection using
Faster R-CNN,” 2018 International Conference on Innovation in Engineering and Technology (ICIET), 2018, pp. 1-6,
doi: 10.1109/ciet.2018.8660780.
[4] J. Uddin, F. N. Arko, N. Tabassum, T. R. Trisha, and F. Ahmed, “Bangla Sign Language Interpretation using Bag of Features and
Support Vector Machine,” 2017 3rd International Conference on Electrical Information and Communication Technology (EICT),
2017, pp. 1-4, doi: 10.1109/eict.2017.8275173.
[5] M. A. Hossen, A. Govindaiah, S. Sultana, and A. Bhuiyan, “Bengali Sign Language Recognition Using Deep Convolutional
Neural Network,” 2018 Joint 7th International Conference on Informatics, Electronics & Vision (ICIEV) and 2018 2nd
International Conference on Imaging, Vision & Pattern Recognition (icIVPR), 2018, pp. 369-373,
doi: 10.1109/iciev.2018.8640962.
[6] M. S. Islam, S. S. S. Mousumi, AKM. S. A. Rabby, S. A. Hossain, and S. Abujar, “A Potent Model to Recognize Bangla Sign
Language Digits Using Convolutional Neural Network,” Procedia Computer Science, vol. 143, pp. 611-618, 2018,
doi: 10.1016/j.procs.2018.10.438.
[7] M. A. Uddin, and S. A. Chowdhury, “Hand sign language recognition for bangla alphabet using support vector machine,” 2016
International Conference on Innovations in Science, Engineering and Technology (ICISET), 2016, pp. 1-4,
doi: 10.1109/iciset.2016.7856479.
[8] S. T. Ahmed and M. A. H. Akhand, “Bangladeshi Sign Language Recognition using Fingertip Position,” 2016 International
conference on medical engineering, health informatics and technology (MediTec), 2016, pp. 1-5,
doi: 10.1109/meditec.2016.7835364.
[9] M. A. Rahman, A. U. Ambia, I. Abdullah, and S. K. Mondal, “Recognition of Static Hand Gestures of Alphabet in Bangla Sign
Language,” IOSR Journal of Computer Engineering (IOSRJCE), vol. 8, no. 1, pp. 7–13, 2012, doi: 10.9790/0661/0810713.
[10] F. Yasir, P. W. C. Prasad, A. Alsadoon, and A. Elchouemi, “SIFT based approach on Bangla Sign Language Recognition,” 2015
IEEE 8th International Workshop on Computational Intelligence and Applications (IWCIA), 2015, pp. 35–39,
doi: 10.1109/iwcia.2015.7449458.
[11] M. R. Islam, U. K. Mitu, R. A. Bhuiyan, and J. Shin, “Hand gesture feature extraction using deep convolutional neural network
for recognizing American sign language,” 2018 4th International Conference on Frontiers of Signal Processing (ICFSP), 2018,
pp. 115-119, doi: 10.1109/ICFSP.2018.8552044.
[12] R. Cui, H. Liu, and C. Zhang, “Recurrent convolutional neural networks for continuous sign language recognition by staged
optimization,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1610-1618,
doi: 10.1109/CVPR.2017.175.
[13] S. M. K. Hasan and M. Ahmad, “A new approach of sign language recognition system for bilingual users,” 2015 International
Conference on Electrical & Electronic Engineering (ICEEE), 2015, pp. 33-36, doi: 10.1109/CEEE.2015.7428284.
[14] M. M. Islam, S. Siddiqua, and J. Afnan, “Real time Hand Gesture Recognition using different algorithms based on American Sign
Language,” 2017 IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR), 2017, pp. 1-6,
doi: 10.1109/ICIVPR.2017.7890854.
[15] J. Huang, W. Zhou, H. Li, and W. Li, “Sign language recognition using 3D convolutional neural networks,” 2015 IEEE
international conference on multimedia and expo (ICME), 2015, pp. 1-6, doi: 10.1109/icme.2015.7177428.
[16] S. A. Khan, A. D. Joy, S. M. Asaduzzaman, and M. Hossain, “An Efficient Sign Language Translator Device Using
Convolutional Neural Network and Customized ROI Segmentation,” 2019 2nd International Conference on Communication
Engineering and Technology (ICCET), 2019, pp. 152-156, doi: 10.1109/ICCET.2019.8726895.
[17] D. Naglot and M. Kulkarni, “Real time sign language recognition using the Leap Motion Controller,” 2016 International
Conference on Inventive Computation Technologies (ICICT), 2016, pp. 1-5, doi: 10.1109/INVENTIVE.2016.7830097.
[18] A. M. Rafi, N. Nawal, N. S. N. Bayev, L. Nima, C. Shahnaz, and S. A. Fattah, “Image-based Bengali Sign Language Alphabet
Recognition for Deaf and Dumb Community,” 2019 IEEE Global Humanitarian Technology Conference (GHTC), 2019, pp. 5-11,
doi: 10.1109/GHTC46095.2019.9033031.
[19] M. A. Rahaman, M. Jasim, M. H. Ali, and M. Hasanuzzaman, “Real-time computer vision-based Bengali sign language
recognition,” 2014 17th International Conference on Computer and Information Technology (ICCIT), 2014, pp. 192-197,
doi: 10.1109/ICCITechn.2014.7073150.
[20] S. Masood, A. Srivastava, H. C. Thuwal, and M. Ahmad, “Real-time sign language gesture (word) recognition from video
sequences using CNN and RNN,” Intelligent Engineering Informatics, 2018, pp. 623-632, doi: 10.1007/978-981-10-7566-7_63.
[21] A. J. Rony, K. H. Saikat, M. Tanzeem, and F. M. R. H. Robi, “An effective approach to communicate with the deaf and mute
people by recognizing characters of one-hand bangla sign language using convolutional neural-network,” 2018 4th International
Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT), 2018, pp. 74-79,
doi: 10.1109/CEEICT.2018.8628158.
[22] P. P. Urmee, M. A. A. Mashud, J. Akter, A. S. M. M. Jameel, and S. Islam, “Real-time bangla sign language detection using
xception model with augmented dataset,” 2019 IEEE International WIE Conference on Electrical and Computer Engineering
(WIECON-ECE), 2019, pp. 1-5, doi: 10.1109/WIECON-ECE48653.2019.9019934.
[23] R. Yasir and R. A. Khan, “Two-handed hand gesture recognition for Bangla sign language using LDA and ANN,” The 8th
International Conference on Software, Knowledge, Information Management and Applications (SKIMA 2014), 2014, pp. 1-5,
doi: 10.1109/SKIMA.2014.7083527.
[24] T. M. Angona et al., “Automated Bangla sign language translation system for alphabets by means of MobileNet,” TELKOMNIKA,
vol. 18, no. 3, pp. 1292-1301, 2020, doi: 10.12928/telkomnika.v18i3.15311.
[25] S. Albawi, T. A. Mohammed, and S. Al-Zawi, “Understanding of a convolutional neural network,” 2017 International
Conference on Engineering and Technology (ICET), 2017, pp. 1-6, doi: 10.1109/icengtechnol.2017.8308186.
[26] Z. Zhang, “Improved Adam Optimizer for Deep Neural Networks,” 2018 IEEE/ACM 26th International Symposium on Quality of
Service (IWQoS), 2018, pp. 1-2, doi: 10.1109/IWQoS.2018.8624183.
BIOGRAPHIES OF AUTHORS
Kulsum Ara Lipi is pursuing her B.Sc. in Information and Communication Engineering at Bangladesh University of Professionals, Dhaka, Bangladesh. She is interested in Machine Learning and Data Science. Her current research interests include Deep Learning and Natural Language Processing. She can be contacted at email: kulsumlipi@gmail.com.
Sumaita Faria Karim Adrita is pursuing her B.Sc. in Information and Communication Engineering at Bangladesh University of Professionals, Dhaka, Bangladesh. She is interested in Machine Translation, specifically for Bangla Sign Language. Her current research interests include Natural Language Processing and Deep Learning. She can be contacted at email: sumaitafaria@gmail.com.
Zannatul Ferdous Tunny is pursuing her B.Sc. in Information and Communication Engineering at Bangladesh University of Professionals, Dhaka, Bangladesh. She is interested in the areas of Artificial Intelligence (AI), Robotics, IoT, and Data Science. She is also interested in Blockchain, NLP, and Computer Vision. She can be contacted at email: zannatulferdous489@gmail.com.
Abir Hasan Munna is pursuing his B.Sc. in Information and Communication Engineering at Bangladesh University of Professionals, Dhaka, Bangladesh. His main areas of interest are Artificial Intelligence (AI), Robotics, IoT, and Data Science. He is also interested in Computer Vision, NLP, and Blockchain. He can be contacted at email: abirmunna091@gmail.com.
Ahmedul Kabir is an Assistant Professor at the Institute of Information Technology, University of Dhaka, Dhaka, Bangladesh. His principal areas of interest are Machine Learning and Data Mining. He would like to conduct research in these fields both theoretically and in practical applications across different domains. He is also interested in Software Analytics and Natural Language Processing. He can be contacted at email: kabir@iit.du.ac.bd.

More Related Content

PDF
Automated Bangla sign language translation system for alphabets by means of M...
PDF
IRJET- Sign Language Interpreter using Image Processing and Machine Learning
PDF
Real Time Sign Language Detection
PDF
Literature Review on Indian Sign Language Recognition System
PDF
IRJET- Hand Gesture based Recognition using CNN Methodology
PDF
IRJET- Sign Language Recognition using Machine Learning Algorithm
PDF
electronics-11-01780-v2.pdf
PDF
KANNADA SIGN LANGUAGE RECOGNITION USINGMACHINE LEARNING
Automated Bangla sign language translation system for alphabets by means of M...
IRJET- Sign Language Interpreter using Image Processing and Machine Learning
Real Time Sign Language Detection
Literature Review on Indian Sign Language Recognition System
IRJET- Hand Gesture based Recognition using CNN Methodology
IRJET- Sign Language Recognition using Machine Learning Algorithm
electronics-11-01780-v2.pdf
KANNADA SIGN LANGUAGE RECOGNITION USINGMACHINE LEARNING

Similar to Static-gesture word recognition in Bangla sign language using convolutional neural network (20)

PDF
IRJET- Gesture Recognition for Indian Sign Language using HOG and SVM
PDF
Sign Language Recognition
PDF
Deep convolutional neural network for hand sign language recognition using mo...
PDF
SIGN LANGUAGE RECOGNITION USING MACHINE LEARNING
PDF
Real time Myanmar Sign Language Recognition System using PCA and SVM
PDF
Paper id 23201490
PDF
IRJET- Communication Aid for Deaf and Dumb People
PDF
IRJET - Sign Language Text to Speech Converter using Image Processing and...
PDF
Gesture Acquisition and Recognition of Sign Language
PDF
SIGN LANGUAGE RECOGNITION USING CNN
PDF
A SIGNATURE BASED DRAVIDIAN SIGN LANGUAGE RECOGNITION BY SPARSE REPRESENTATION
PDF
IRJET- Vision Based Sign Language by using Matlab
PDF
SignReco: Sign Language Translator
PDF
IRJET- Tamil Sign Language Recognition Using Machine Learning to Aid Deaf and...
PDF
Live Sign Language Translation: A Survey
PDF
GRS '“ Gesture based Recognition System for Indian Sign Language Recognition ...
PDF
Real-Time Sign Language Detector
PDF
Translation of sign language using generic fourier descriptor and nearest nei...
PDF
Sign Language Recognition using Deep Learning
PDF
SIGN LANGUAGE RECOGNITION USING CONVOLUTIONAL NEURAL NETWORK.pdf
IRJET- Gesture Recognition for Indian Sign Language using HOG and SVM
Sign Language Recognition
Deep convolutional neural network for hand sign language recognition using mo...
SIGN LANGUAGE RECOGNITION USING MACHINE LEARNING
Real time Myanmar Sign Language Recognition System using PCA and SVM
Paper id 23201490
IRJET- Communication Aid for Deaf and Dumb People
IRJET - Sign Language Text to Speech Converter using Image Processing and...
Gesture Acquisition and Recognition of Sign Language
SIGN LANGUAGE RECOGNITION USING CNN
A SIGNATURE BASED DRAVIDIAN SIGN LANGUAGE RECOGNITION BY SPARSE REPRESENTATION
IRJET- Vision Based Sign Language by using Matlab
SignReco: Sign Language Translator
IRJET- Tamil Sign Language Recognition Using Machine Learning to Aid Deaf and...
Live Sign Language Translation: A Survey
GRS '“ Gesture based Recognition System for Indian Sign Language Recognition ...
Real-Time Sign Language Detector
Translation of sign language using generic fourier descriptor and nearest nei...
Sign Language Recognition using Deep Learning
SIGN LANGUAGE RECOGNITION USING CONVOLUTIONAL NEURAL NETWORK.pdf
Ad

More from TELKOMNIKA JOURNAL (20)

PDF
Earthquake magnitude prediction based on radon cloud data near Grindulu fault...
PDF
Implementation of ICMP flood detection and mitigation system based on softwar...
PDF
Indonesian continuous speech recognition optimization with convolution bidir...
PDF
Recognition and understanding of construction safety signs by final year engi...
PDF
The use of dolomite to overcome grounding resistance in acidic swamp land
PDF
Clustering of swamp land types against soil resistivity and grounding resistance
PDF
Hybrid methodology for parameter algebraic identification in spatial/time dom...
PDF
Integration of image processing with 6-degrees-of-freedom robotic arm for adv...
PDF
Deep learning approaches for accurate wood species recognition
PDF
Neuromarketing case study: recognition of sweet and sour taste in beverage pr...
PDF
Reversible data hiding with selective bits difference expansion and modulus f...
PDF
Website-based: smart goat farm monitoring cages
PDF
Novel internet of things-spectroscopy methods for targeted water pollutants i...
PDF
XGBoost optimization using hybrid Bayesian optimization and nested cross vali...
PDF
Convolutional neural network-based real-time drowsy driver detection for acci...
PDF
Addressing overfitting in comparative study for deep learningbased classifica...
PDF
Integrating artificial intelligence into accounting systems: a qualitative st...
PDF
Leveraging technology to improve tuberculosis patient adherence: a comprehens...
PDF
Adulterated beef detection with redundant gas sensor using optimized convolut...
PDF
A 6G THz MIMO antenna with high gain and wide bandwidth for high-speed wirele...
Earthquake magnitude prediction based on radon cloud data near Grindulu fault...
Implementation of ICMP flood detection and mitigation system based on softwar...
Indonesian continuous speech recognition optimization with convolution bidir...
Recognition and understanding of construction safety signs by final year engi...
The use of dolomite to overcome grounding resistance in acidic swamp land
Clustering of swamp land types against soil resistivity and grounding resistance
Hybrid methodology for parameter algebraic identification in spatial/time dom...
Integration of image processing with 6-degrees-of-freedom robotic arm for adv...
Deep learning approaches for accurate wood species recognition
Neuromarketing case study: recognition of sweet and sour taste in beverage pr...
Reversible data hiding with selective bits difference expansion and modulus f...
Website-based: smart goat farm monitoring cages
Novel internet of things-spectroscopy methods for targeted water pollutants i...
XGBoost optimization using hybrid Bayesian optimization and nested cross vali...
Convolutional neural network-based real-time drowsy driver detection for acci...
Addressing overfitting in comparative study for deep learningbased classifica...
Integrating artificial intelligence into accounting systems: a qualitative st...
Leveraging technology to improve tuberculosis patient adherence: a comprehens...
Adulterated beef detection with redundant gas sensor using optimized convolut...
A 6G THz MIMO antenna with high gain and wide bandwidth for high-speed wirele...
Ad

Recently uploaded (20)

PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PDF
composite construction of structures.pdf
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
DOCX
573137875-Attendance-Management-System-original
PPTX
Lecture Notes Electrical Wiring System Components
PDF
Digital Logic Computer Design lecture notes
PPTX
additive manufacturing of ss316l using mig welding
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PPTX
Geodesy 1.pptx...............................................
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PPTX
Sustainable Sites - Green Building Construction
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
UNIT-1 - COAL BASED THERMAL POWER PLANTS
CYBER-CRIMES AND SECURITY A guide to understanding
Operating System & Kernel Study Guide-1 - converted.pdf
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
composite construction of structures.pdf
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
573137875-Attendance-Management-System-original
Lecture Notes Electrical Wiring System Components
Digital Logic Computer Design lecture notes
additive manufacturing of ss316l using mig welding
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
Geodesy 1.pptx...............................................
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Model Code of Practice - Construction Work - 21102022 .pdf
Foundation to blockchain - A guide to Blockchain Tech
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
Sustainable Sites - Green Building Construction

Static-gesture word recognition in Bangla sign language using convolutional neural network

  • 1. TELKOMNIKA Telecommunication Computing Electronics and Control Vol. 20, No. 5, October 2022, pp. 1109~1116 ISSN: 1693-6930, DOI: 10.12928/TELKOMNIKA.v20i5.24096  1109 Journal homepage: http://telkomnika.uad.ac.id Static-gesture word recognition in Bangla sign language using convolutional neural network Kulsum Ara Lipi1 , Sumaita Faria Karim Adrita1 , Zannatul Ferdous Tunny1 , Abir Hasan Munna1 , Ahmedul Kabir2 1 Department of Information and Communication Technology, Faculty of Science and Technology, Bangladesh University of Professionals, Dhaka, Bangladesh 2 Institute of Information Technology, University of Dhaka, Dhaka, Bangladesh Article Info ABSTRACT Article history: Received Sep 17, 2021 Revised Jun 23, 2022 Accepted Jul 01, 2022 Sign language is the communication process of people with hearing impairments. For hearing-impaired communication in Bangladesh and parts of India, Bangla sign language (BSL) is the standard. While Bangla is one of the most widely spoken languages in the world, there is a scarcity of research in the field of BSL recognition. The few research works done so far focused on detecting BSL alphabets. To the best of our knowledge, no work on detecting BSL words has been conducted till now for the unavailability of BSL word dataset. In this research, a small static-gesture word dataset has been developed, and a deep learning-based method has been introduced that can detect BSL static-gesture words from images. The dataset, “BSLword” contains 30 static-gesture BSL words with 1200 images for training. The training is done using a multi-layered convolutional neural network with the Adam optimizer. OpenCV is used for image processing and TensorFlow is used to build the deep learning models. This system can recognize BSL static-gesture words with 92.50% accuracy on the word dataset. Keywords: BSL BSL word dataset Convolutional neural network Static-gesture signs This is an open access article under the CC BY-SA license. Corresponding Author: Ahmedul Kabir Institute of Information Technology, University of Dhaka Dhaka, Bangladesh Email: kabir@iit.du.ac.bd 1. INTRODUCTION Bangla is the fifth-most widely spoken language on the planet, spoken by almost 230 million people in Bangladesh and the eastern parts of India. Among them, more than three million are mute or hard of hearing [1]. There is an enormous correspondence gap between those who can speak and listen to the language, and those who cannot. The only way deaf and mute people can communicate is using sign language which uses manual correspondence and body language to pass on significant information. This mode of communication is quite hard to understand for regular people. This is where the field of computer vision is arriving at a potential area to help this communication. Nowadays, computer vision is used for assisting deaf and mute people by automated sign language detection technique. However, these technologies are not so readily available to the people of underdeveloped countries like Bangladesh. There are not many books where Bangla gesture-based communication can be studied by deaf and mute people. National Centre for Special Education Ministry of Social published a book named “Bangla Ishara Bhasha Obhidhan” (Bangla sign language dictionary) edited by Bangladesh sign language (BSL) committee in January 1994, and reprinted in March 1997. This book follows British sign pattern. 
The centre for disability in development (CDD) published another book named “Ishara Bhashay Jogajog” (communication in sign language) in 2005 and reprinted in 2015. Apart from these, there are not many
  • 2.  ISSN: 1693-6930 TELKOMNIKA Telecommun Comput El Control, Vol. 20, No. 5, October 2022: 1109-1116 1110 options for people to understand sign language. And this is a huge undertaking that very few people are able to do. If there would be a Bangla sign language recognizer model, general individuals could easily interact with disabled individuals. This would reduce the disparity between people with disabilities and the general population, and ensure a more just society with equal opportunity for all. This however is a far cry from the current reality for a number of reasons. There is no proper dataset for Bangla sign words for scientific work and progression. There is also not enough successful research on Bangla gesture-based communication. In an attempt to alleviate this situation to some extent, we built up a dataset called BSLword consisting of images of different words in Bangla sign language. This dataset will help in research-based work and improvement of Bangla sign language. Moreover, we utilized the deep learning method called convolutional neural network (CNN) to build a model that can recognize words from the dataset. In this paper, we describe our whole process of dataset construction and model development. In 2019, Hasan et al. [2] proposed an easily understandable model that recognizes Bangla finger numerical digits. Using numerous support vector machines for classifying images, they used the histogram of directed gradient image features to build a classifier. They selected 900 images for training and 100 for testing, respectively, from ten-digit groups. Their system acquired approximately 95% accuracy. Earlier in 2018, Hoque et al. [3] proposed a procedure to recognize BSL from pictures that acts continuously. They utilized the convolutional neural organization-based article recognition strategy. Their approach was faster region-based and they obtained an average accuracy rate of 98.2 percent. Their constraint was perceiving the letters, which have numerous likenesses among their patterns. Before that, Uddin et al. [4] in 2017 suggested a model of image handling focused on Bangla sign language translation. At first, YCbCr shading segments recognize the client’s skin shade and afterward separates the set of features for each input picture. At last, the separated features are fed to the support vector machine (SVM) to prepare and test. The suggested model showed an average of 86% accuracy for their trial dataset. Hossen et al. [5] proposed another strategy of Bengali sign language recognition that uses deep CNN (DCNN). Static hand signals for 37 letters of the Bengali letter set are interpreted by the technique. Directing tests on three 37 sign arrangements with full 1147 images with shifting the accuracy of feature concentrations taken from each test, they have achieved a robust general recognition rate of 96.33 percent in the training dataset and 84.68 percent in the validation dataset using a deep CNN. In the same year, Islam et al. [6] developed a deep learning model to cope with perception of the digits of BSL. In this methodology, they utilized the CNN model to prepare specific signs with a separate preparing dataset. The model was designed and tried with separately 860 training pictures and 215 test pictures. Their training model picked up about 95% precision. Prior to that, in 2016, Uddin and Chowdhury [7] introduced a structure in 2016 to perceive BSL by the use of support vector machine. 
By analysing their structure and looking at their features, which distinguish each symbol, Bangla sign letters are perceived. They changed hand signs to hue, saturation, value (HSV) shading space from the red, green, blue (RGB) picture in the proposed system. At that point, Gabor channels were utilized to obtain wanted hand sign features. The accuracy of their proposed structure is 97.7%. Islam et al [1], Ishara-Lipi published in 2018, was the primary complete segregated BSL dataset of characters. The dataset includes 50 arrangements of 36 characters of Bangla basic signs, gathered from people with different hearing disabilities, including typical volunteers. 1800 characters pictures of Bangla communication via gestures were considered for the last state. They got 92.65% precision on the training set and 94.74% precision on the validation set. Ahmed and Akhand (2016) [8] presented a BSL recognition system centred on the position of fingers. To train the artificial neural network (ANN) for recognition, the method considered relative tip places of five fingers in two-measurement space, and used location vectors. The proposed strategy was evaluated on a data set with 518 images with 37 symbols, and 99% recognition rates were achieved. In 2012, Rahman et al. [9] proposed a framework for perceiving static hand gestures of the letter set in Bangla gesture-based communication. They prepared ANN with the sign letters’ features to utilize feedforward back propagation learning calculation. They worked with 36 letters of BSL letter sets. Their framework obtains an average precision of 80.902%. Later, in 2015, Yasir et al. [10] introduced a computational way to actively recognize BSL. For picture preparation and normalization of the sign image, Gaussian distribution and grayscaling methods are applied. K-means clustering is performed on all the descriptors, and a SVM classifier is applied. Islam et al. [11] proposed hand gesture recognition using American sign language (ASL) and DCNN. In order to find more informative features from hand images, they used DCNN before performing the final character recognition using a multi-class SVM. Cui et al. [12] proposed a recurrent convolutional neural network (RCNN) for continuous sign language recognition. They designed a staged optimization process for their CNN model and tuned it using vast amounts of data and compared their model with other sign language recognition models. Earlier, in 2016, Hasan and Ahmed [13] proposed a sign language recognition system for bilingual users. They used a combination of principal component analysis (PCA) and linear discrimination
  • 3. TELKOMNIKA Telecommun Comput El Control  Static-gesture word recognition in Bangla sign language using … (Kulsum Ara Lipi) 1111 analysis (LDA) in order to maximize data discrimination between classes. Their system can translate a set of 27 signs to Bengali text with a recognition rate of 96.463% on average. In 2017, Islam et al. [14] have applied different algorithms for feature extraction of the hand gesture recognition system. They designed a process for real time ASL recognition using ANN, which achieves an accuracy of 94.32% when recognizing alphanumeric character signs. Huang et al. [15] proposed a 3D CNN model for sign language recognition. They used a multilayer perceptron in order to extract features. They also evaluated their model against 3D CNN and Gaussian mixture model with hidden markov model (GMM-HMM) using the same dataset. Their approach has higher accuracy than the GMM-HMM model. In 2019, Khan et al. [16] proposed an approach which will shorten the workload of training huge models and use a customizable segmented region of interest (ROI). In their approach, there is a bounding box that the user can move to the hand area on screen, thus relieveing the system of the burden of finding the hand area. Naglot and Kulkarni [17] used a leap motion controller in order to recognize real time sign language. Leap motion controller is a 3D non-contact motion sensor which can detect discrete position and motion of the fingers. Multi-layer perceptron (MLP) neural network with back propagation (BP) algorithm used to recognize 26 letters of ASL with a recognition rate of 96.15%. Rafi et al. [18] proposed a VGG19 based CNN for recognizing 38 classes which achieved an accuracy of 89.6%. The proposed framework includes two processing steps: hand form segmentation and feature extraction from the hand sign. Rahaman et al. [19] presented a real-time computer vision-based Bengali sign language (BdSL) recognition system. The system first detects the location of the hand in the using Haar-like feature-based classifiers. The system attained a vowel recognition accuracy of 98.17 percent and a consonant recognition accuracy of 94.75 percent. Masood et al. [20] classified based on geographical and temporal variables using two alternative techniques. The spatial features were classified using CNN, whereas the temporal features were classified using RNN. The proposed model was able to achieve a high accuracy of 95.2% over a large set of images. In 2019, Rony et al. [21] suggested a system in which all members of a family, if one or more members are deaf or mute members are able to converse quickly and easily. They used convolutional neural networks in our proposed system for hand gesture recognition and classification as well as the other way around. Also in 2019, Urmee et al. [22] suggested a solution that works in real-time using Xception and our BdSLInfinite dataset. They employed a big dataset for training in order to produce extremely accurate findings that were as close to real-life scenarios as possible. With an average detection time of 48.53 milliseconds, they achieved a test accuracy of 98.93 percent. Yasir and Khan [23] Proposed a framework for BSL detection and recognition (SLDR) in this paper. They have created a system that can recognize the numerous alphabets of BSL for human-computer interaction, resulting in more accurate outcomes in the shortest time possible. In 2020, Ongona et al. [24] proposed a system of recognizing BSL letters using MobileNet. 
In this paper, we have built a dataset of BSL words that use a static gesture sign. To the best of our knowledge, this is the first dataset that deals with BSL words. The dataset can be used for training any machine learning model. We used a CNN on the training portion of the dataset and built a model that gained 92.50% accuracy on the test set. The rest of the paper discusses our methodology and results obtained. 2. METHODOLOGY 2.1. Data collection and pre-processing There are more than a hundred thousand words in the Bangla language, but all of them do not have a corresponding word in sign language. Most sign language words are represented by waving of one hand or both the hands, while some words are represented with static images just like BSL characters. Since this is rudimentary study in this field, we collected only those words which can be understandable by one hand gesture and can be taken with static images. We found 30 such words from the BSL dictionary. The words are shown here in Bangla script with the English transliteration and translation in brackets: দেশ (‘desh’, country), স্যার (‘sir’, sir), এখানে (‘ekhane’, here), কিছ ু টা (‘kichuta’, a little bit), গুণ (‘gun’, multiply), কিন াগ (‘biyog’, subtract), োাঁড়াও (‘darao’, stand), িাস্া (‘basha’, house), স্ুন্দর (‘shundor’, beautiful), িন্ধ ু (‘bondhu’, friend), তুকি (‘tumi’, you), দিাথা (‘kothay’, where), স্াহায্য (‘shahajjo’, help), তারা (‘tara’, star), আজ (‘aaj’, today), স্ি (‘shomoi’, time), দস্ (‘she’, he), স্িাজিল্যাণ (‘shomajkollan’, social welfare), অেুনরাধ (‘onurodh’, request), োড়ানো (‘darano’, to stand), িাঘ (‘bagh’, tiger), চািড়া (‘chamra’, skin), কগজজা (‘girja’, church), হকি (‘hockey’, hockey), দজল্ (‘jail’, jail), দিরাি (‘keram’, carrom), কি ানো (‘piano’, piano), িূরু (‘puru’, thick), স্তয (‘shotto’, truth), দিৌদ্ধ (‘bouddho’, Buddha). The whole data collection method is divided into five separate steps: Capture images, label all data, crop images, resize images, and convert to RGB format.
  • 4.  ISSN: 1693-6930 TELKOMNIKA Telecommun Comput El Control, Vol. 20, No. 5, October 2022: 1109-1116 1112 2.1.1. Capture images Our dataset contains a total of 1200 static images, 40 images for each of the 30 words. We collected data from several undergraduate students who volunteered for the work. We captured images of different hand gestures with bare hands in front of a white background. A high-quality resolution mobile camera was used to take all the pictures. Figure 1 shows some sample pictures. Figure 1. Some captured images (one sample image per word shown) 2.1.2. Label all data In this step, we categorized all the images and labelled them according to the words. This labelling is important since we are using supervised classification. Our labelling followed a numerical convention from 0 to 29 (0, 1, 2, 3, …, 29). 2.1.3. Crop all images Due to differences in capturing the images, the hand position within the images is different. Hence cropping is an essential step to use data for continuing the experiment. Uncropped images are all cropped to observe the proportion of width and height for later usage. Figure 2 shows an example of image cropping. 2.1.4. Resize images and converting to RGB All cropped images are resized to 64×64 images. This step is necessary to make the dataset consistent and to make it suitable to be fed to our deep learning model. Our original pictures are captured in blue, green, red (BGR) color space. So next we convert them to RGB color space. 2.2. Model development We divided our dataset into two parts using stratified random sampling 80% for training and 20% for testing. We then train our model using the CNN architecture described in the next section. Once the CNN model is created, we can input a random person’s Hnd image and the model will detect the sign word.
Figure 2. An example of image cropping

2.2.1. CNN architecture
CNNs are artificial neural networks that try to mimic the visual cortex of the human brain. The artificial neurons in a CNN are connected to a local region of the visual field, called the receptive field. Discrete convolutions are conducted on the image. The input images are taken in the form of color planes in the RGB spectrum, and the images are then transformed in order to facilitate predictive analysis. High-level features, such as image edges, are obtained by using a kernel that traverses the whole image from the top-left towards the bottom-right. A CNN model is used to recognize these sign words; here, multiple convolutional layers connected to one another are used [25]. The proposed model utilizes the Adam optimizer, an extension of stochastic gradient descent that has recently been widely adopted for computer vision and natural language processing tasks. The approach computes an adaptive learning rate for each parameter from estimates of the first and second moments of the gradients [26]. The model is trained in mini-batches for 200 epochs. We used a 12-layer CNN similar to the one used in [1], as shown in Figure 3. For convolution layers 1, 2, 3, 4, 5, and 6, the numbers of filters are 16, 32, 32, 64, 128, and 256 respectively. The kernel size of each of these layers is 3×3, and the activation function is ReLU. The max pooling layers are each 3×3 as well. Then we use a dropout layer with 50% dropout. After that we have a dense layer with 512 units and ReLU activation. Finally, the output layer uses softmax activation with one unit per word class.

Figure 3. CNN model architecture
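To make the architecture description concrete, here is a minimal Keras sketch of one plausible 12-layer arrangement consistent with the description above. The placement of the pooling layers, the "same" padding, the batch size, the loss function, and the 30-unit output (inferred from the 30 word classes) are assumptions, since the text does not fully specify them; Figure 3 may differ in detail. The sketch reuses the X_train, y_train, X_test, and y_test arrays from the pre-processing example above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 30  # one softmax unit per static-gesture word (inferred, not stated explicitly)

def build_model(input_shape=(64, 64, 3)):
    # Six 3x3 convolution layers with 16, 32, 32, 64, 128, and 256 filters,
    # 3x3 max pooling after every pair of convolutions (assumed placement),
    # then 50% dropout, a 512-unit dense layer, and the softmax output.
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(3, 3)),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(3, 3)),
        layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(256, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(3, 3)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(512, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",                          # Adam optimizer as stated
                  loss="sparse_categorical_crossentropy",    # integer labels 0..29 (assumption)
                  metrics=["accuracy"])
    return model

model = build_model()
# 200 epochs as stated in the paper; the batch size of 32 is an assumption
model.fit(X_train, y_train, epochs=200, batch_size=32,
          validation_data=(X_test, y_test))
test_loss, test_acc = model.evaluate(X_test, y_test)
```

With 64×64 inputs, the three 3×3 pooling stages reduce the feature maps to roughly 2×2 before the dense layers, which keeps the fully connected part of the network small.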
3. EVALUATION AND RESULTS
As stated earlier, we used an 80%-20% split, resulting in a total of 960 images for training and 240 images for testing. After training the model for 200 epochs using the multi-layered CNN architecture detailed in the previous section, we obtained a test set accuracy of 92.50%. We also calculated precision, recall, and F1-score for each class. The metrics obtained for each class (each of the 30 word signs) are shown in Table 1. It is seen from the table that the performance of the model is quite good for most of the signs. For only a few words does the model occasionally fail to recognize the correct word. These words include কিছুটা (‘kichuta’), তারা (‘tara’), সময় (‘shomoi’), সে (‘she’), চামড়া (‘chamra’), and অনুরোধ (‘onurodh’). Looking at the pictures of these signs (as in Figure 1), we can see that some of them are visually similar and hence prone to confusion by the model. For example, কিছুটা (‘kichuta’, row 1 column 4 in Figure 1) and তারা (‘tara’, row 3 column 2 in Figure 1) are strikingly similar. The average precision, recall, and F1-score are all above 0.9, so we can say that the overall performance of the model is quite satisfactory.

Table 1. Metrics of each class (sign) in the BSLword dataset. English transliteration of the word is shown
Word | Precision | Recall | F1-score || Word | Precision | Recall | F1-score || Word | Precision | Recall | F1-score
Sir | 1.00 | 1.00 | 1.00 || Darao | 1.00 | 0.80 | 0.89 || Shomajkollan | 0.90 | 1.00 | 0.95
Shundor | 1.00 | 1.00 | 1.00 || Desh | 1.00 | 1.00 | 1.00 || Hockey | 1.00 | 1.00 | 1.00
She | 0.82 | 0.75 | 0.78 || Ekhane | 1.00 | 0.90 | 0.95 || Piano | 1.00 | 0.70 | 0.82
Tara | 0.75 | 0.90 | 0.82 || Gun | 1.00 | 1.00 | 1.00 || Puru | 0.88 | 1.00 | 0.93
Shotto | 1.00 | 1.00 | 1.00 || Kichuta | 0.80 | 0.89 | 0.84 || Chamra | 0.75 | 0.86 | 0.80
Shomoi | 1.00 | 0.67 | 0.80 || Kothay | 1.00 | 0.71 | 0.83 || Jail | 0.83 | 0.83 | 0.83
Aaj | 0.80 | 1.00 | 0.89 || Onurodh | 0.80 | 0.89 | 0.84 || Girja | 1.00 | 1.00 | 1.00
Basha | 1.00 | 1.00 | 1.00 || Shahajjo | 0.80 | 1.00 | 0.89 || Bouddho | 0.89 | 1.00 | 0.94
Biyog | 1.00 | 0.80 | 0.89 || Tumi | 1.00 | 1.00 | 1.00 || Bagh | 1.00 | 1.00 | 1.00
Bondhu | 1.00 | 1.00 | 1.00 || Darano | 0.86 | 1.00 | 0.92 || Keram | 1.00 | 1.00 | 1.00
Avg. precision = 0.93, Avg. recall = 0.93, Avg. F1-score = 0.92

4. CONCLUSION
This paper has introduced a dataset named BSLword, containing 1200 images of 30 static-gesture words in BSL. To the best of our knowledge, this is the very first word-level dataset of BSL. We used a CNN model to identify the words represented by the images in the dataset. The system can recognize BSL static-gesture words with 92.50% accuracy on the word dataset. The average precision, recall, and F1-score are 0.93, 0.93, and 0.92 respectively. We believe that our dataset will be a valuable asset for BSL recognition researchers. At the same time, the dataset can also be beneficial for machine learning and related methods intended for the study of movements for recognizing gestures and signs. We plan to extend our work in the following ways: currently, BSLword contains only a small subset of the words of BSL. Our next goal is to include words with dynamic gestures and make it a comprehensive dataset. This would require not only a huge undertaking in data collection, but also thorough research to find the most suitable model. Ultimately, our vision is to complete a system that can recognize any word with a reasonable degree of accuracy.
If that happens, the mute and deaf people of Bangladesh will no longer suffer from the communication gap that they must endure at present.

REFERENCES
[1] M. S. Islam, S. S. S. Mousumi, N. A. Jessan, A. S. A. Rabby, and S. A. Hossain, “Ishara-Lipi: The First Complete Multipurpose Open Access Dataset of Isolated Characters for Bangla Sign Language,” 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), 2018, pp. 1-4, doi: 10.1109/icbslp.2018.8554466.
[2] M. M. Hasan and S. M. M. Akhsan, “Bangla Sign Digits Recognition Using HOG Feature Based Multi-Class Support Vector Machine,” 2019 4th International Conference on Electrical Information and Communication Technology (EICT), 2019, pp. 1-5, doi: 10.1109/EICT48899.2019.9068832.
[3] O. B. Hoque, M. I. Jubair, M. S. Islam, A. -F. Akash, and A. S. Paulson, “Real Time Bangladeshi Sign Language Detection using Faster R-CNN,” 2018 International Conference on Innovation in Engineering and Technology (ICIET), 2018, pp. 1-6, doi: 10.1109/ciet.2018.8660780.
[4] J. Uddin, F. N. Arko, N. Tabassum, T. R. Trisha, and F. Ahmed, “Bangla Sign Language Interpretation using Bag of Features and Support Vector Machine,” 2017 3rd International Conference on Electrical Information and Communication Technology (EICT), 2017, pp. 1-4, doi: 10.1109/eict.2017.8275173.
[5] M. A. Hossen, A. Govindaiah, S. Sultana, and A. Bhuiyan, “Bengali Sign Language Recognition Using Deep Convolutional Neural Network,” 2018 Joint 7th International Conference on Informatics, Electronics & Vision (ICIEV) and 2018 2nd International Conference on Imaging, Vision & Pattern Recognition (icIVPR), 2018, pp. 369-373, doi: 10.1109/iciev.2018.8640962.
[6] M. S. Islam, S. S. S. Mousumi, AKM. S. A. Rabby, S. A. Hossain, and S. Abujar, “A Potent Model to Recognize Bangla Sign
Language Digits Using Convolutional Neural Network,” Procedia Computer Science, vol. 143, pp. 611-618, 2018, doi: 10.1016/j.procs.2018.10.438.
[7] M. A. Uddin and S. A. Chowdhury, “Hand sign language recognition for bangla alphabet using support vector machine,” 2016 International Conference on Innovations in Science, Engineering and Technology (ICISET), 2016, pp. 1-4, doi: 10.1109/iciset.2016.7856479.
[8] S. T. Ahmed and M. A. H. Akhand, “Bangladeshi Sign Language Recognition using Fingertip Position,” 2016 International Conference on Medical Engineering, Health Informatics and Technology (MediTec), 2016, pp. 1-5, doi: 10.1109/meditec.2016.7835364.
[9] M. A. Rahman, A. U. Ambia, I. Abdullah, and S. K. Mondal, “Recognition of Static Hand Gestures of Alphabet in Bangla Sign Language,” IOSR Journal of Computer Engineering (IOSRJCE), vol. 8, no. 1, pp. 7–13, 2012, doi: 10.9790/0661/0810713.
[10] F. Yasir, P. W. C. Prasad, A. Alsadoon, and A. Elchouemi, “SIFT based approach on Bangla Sign Language Recognition,” 2015 IEEE 8th International Workshop on Computational Intelligence and Applications (IWCIA), 2015, pp. 35–39, doi: 10.1109/iwcia.2015.7449458.
[11] M. R. Islam, U. K. Mitu, R. A. Bhuiyan, and J. Shin, “Hand gesture feature extraction using deep convolutional neural network for recognizing American sign language,” 2018 4th International Conference on Frontiers of Signal Processing (ICFSP), 2018, pp. 115-119, doi: 10.1109/ICFSP.2018.8552044.
[12] R. Cui, H. Liu, and C. Zhang, “Recurrent convolutional neural networks for continuous sign language recognition by staged optimization,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1610-1618, doi: 10.1109/CVPR.2017.175.
[13] S. M. K. Hasan and M. Ahmad, “A new approach of sign language recognition system for bilingual users,” 2015 International Conference on Electrical & Electronic Engineering (ICEEE), 2015, pp. 33-36, doi: 10.1109/CEEE.2015.7428284.
[14] M. M. Islam, S. Siddiqua, and J. Afnan, “Real time Hand Gesture Recognition using different algorithms based on American Sign Language,” 2017 IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR), 2017, pp. 1-6, doi: 10.1109/ICIVPR.2017.7890854.
[15] J. Huang, W. Zhou, H. Li, and W. Li, “Sign language recognition using 3D convolutional neural networks,” 2015 IEEE International Conference on Multimedia and Expo (ICME), 2015, pp. 1-6, doi: 10.1109/icme.2015.7177428.
[16] S. A. Khan, A. D. Joy, S. M. Asaduzzaman, and M. Hossain, “An Efficient Sign Language Translator Device Using Convolutional Neural Network and Customized ROI Segmentation,” 2019 2nd International Conference on Communication Engineering and Technology (ICCET), 2019, pp. 152-156, doi: 10.1109/ICCET.2019.8726895.
[17] D. Naglot and M. Kulkarni, “Real time sign language recognition using the Leap Motion Controller,” 2016 International Conference on Inventive Computation Technologies (ICICT), 2016, pp. 1-5, doi: 10.1109/INVENTIVE.2016.7830097.
[18] A. M. Rafi, N. Nawal, N. S. N. Bayev, L. Nima, C. Shahnaz, and S. A. Fattah, “Image-based Bengali Sign Language Alphabet Recognition for Deaf and Dumb Community,” 2019 IEEE Global Humanitarian Technology Conference (GHTC), 2019, pp. 5-11, doi: 10.1109/GHTC46095.2019.9033031.
[19] M. A. Rahaman, M. Jasim, M. H. Ali, and M.
Hasanuzzaman, “Real-time computer vision-based Bengali sign language recognition,” 2014 17th International Conference on Computer and Information Technology (ICCIT), 2014, pp. 192-197, doi: 10.1109/ICCITechn.2014.7073150.
[20] S. Masood, A. Srivastava, H. C. Thuwal, and M. Ahmad, “Real-time sign language gesture (word) recognition from video sequences using CNN and RNN,” Intelligent Engineering Informatics, 2018, pp. 623-632, doi: 10.1007/978-981-10-7566-7_63.
[21] A. J. Rony, K. H. Saikat, M. Tanzeem, and F. M. R. H. Robi, “An effective approach to communicate with the deaf and mute people by recognizing characters of one-hand bangla sign language using convolutional neural-network,” 2018 4th International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT), 2018, pp. 74-79, doi: 10.1109/CEEICT.2018.8628158.
[22] P. P. Urmee, M. A. A. Mashud, J. Akter, A. S. M. M. Jameel, and S. Islam, “Real-time bangla sign language detection using xception model with augmented dataset,” 2019 IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE), 2019, pp. 1-5, doi: 10.1109/WIECON-ECE48653.2019.9019934.
[23] R. Yasir and R. A. Khan, “Two-handed hand gesture recognition for Bangla sign language using LDA and ANN,” The 8th International Conference on Software, Knowledge, Information Management and Applications (SKIMA 2014), 2014, pp. 1-5, doi: 10.1109/SKIMA.2014.7083527.
[24] T. M. Angona et al., “Automated Bangla sign language translation system for alphabets by means of MobileNet,” TELKOMNIKA, vol. 18, no. 3, pp. 1292-1301, 2020, doi: 10.12928/telkomnika.v18i3.15311.
[25] S. Albawi, T. A. Mohammed, and S. Al-Zawi, “Understanding of a convolutional neural network,” 2017 International Conference on Engineering and Technology (ICET), 2017, pp. 1-6, doi: 10.1109/icengtechnol.2017.8308186.
[26] Z. Zhang, “Improved Adam Optimizer for Deep Neural Networks,” 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), 2018, pp. 1-2, doi: 10.1109/IWQoS.2018.8624183.

BIOGRAPHIES OF AUTHORS
Kulsum Ara Lipi is pursuing her B.Sc. in Information and Communication Engineering at Bangladesh University of Professionals, Dhaka, Bangladesh. She is interested in Machine Learning and Data Science. Her current research interests include Deep Learning and Natural Language Processing. She can be contacted at email: kulsumlipi@gmail.com.
Sumaita Faria Karim Adrita is pursuing her B.Sc. in Information and Communication Engineering at Bangladesh University of Professionals, Dhaka, Bangladesh. She is interested in Machine Translation, specifically for Bangla Sign Language. Her current research interests include Natural Language Processing and Deep Learning. She can be contacted at email: sumaitafaria@gmail.com.
Zannatul Ferdous Tunny is pursuing her B.Sc. in Information and Communication Engineering at Bangladesh University of Professionals, Dhaka, Bangladesh. She is interested in Artificial Intelligence (AI), Robotics, IoT, and Data Science. She is also interested in Blockchain, NLP, and Computer Vision. She can be contacted at email: zannatulferdous489@gmail.com.
Abir Hasan Munna is pursuing his B.Sc. in Information and Communication Engineering at Bangladesh University of Professionals, Dhaka, Bangladesh. His main areas of interest are Artificial Intelligence (AI), Robotics, IoT, and Data Science. He is also interested in Computer Vision, NLP, and Blockchain. He can be contacted at email: abirmunna091@gmail.com.
Ahmedul Kabir is an Assistant Professor at the Institute of Information Technology, University of Dhaka, Dhaka, Bangladesh. His principal areas of interest are Machine Learning and Data Mining. He would like to conduct research in these fields, both theoretically and in practical applications across different domains. He is also interested in Software Analytics and Natural Language Processing. He can be contacted at email: kabir@iit.du.ac.bd.