International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-5, May- 2016]
Infogain Publication (Infogainpublication.com) ISSN: 2454-1311
A Real-Time Letter Recognition Model for
Arabic Sign Language Using Kinect and Leap
Motion Controller v2
Miada A. Almasre, Hana Al-Nuaim
Department of Computer Science, King AbdulAziz University, Jeddah, Saudi Arabia
Abstract— The objective of this research is to develop a
supervised machine learning hand-gesturing model to
recognize Arabic Sign Language (ArSL), using two sensors:
Microsoft's Kinect with a Leap Motion Controller. The
proposed model relies on the concept of supervised learning
to predict a hand pose from two depth images and defines a
classifier algorithm to dynamically transform gestural
interactions based on 3D positions of a hand-joint direction
into their corresponding letters whereby live gesturing can
be then compared and letters displayed in real time.
This research is motivated by the need to increase the
opportunity for the Arabic hearing-impaired to
communicate with ease using ArSL and is the first step
towards building a full communication system for the
Arabic hearing impaired that can improve the interpretation
of detected letters using fewer calculations.
To evaluate the model, participants were asked to gesture
the 28 letters of the Arabic alphabet multiple times each to
create an ArSL letter data set of gestures built by the depth
images retrieved by these devices. Then, participants were
later asked to gesture letters to validate the classifier
algorithm developed. The results indicated that using both devices in the ArSL model was essential, as 22 of the 28 Arabic alphabet letters were detected and recognized correctly 100% of the time.
Keywords— Hand gesture recognition, Arabic Sign
Language, Kinect version 2, Leap Motion Controller,
skeleton.
I. INTRODUCTION
One of the current research areas in the field of computer
vision is automatic hand-gesture recognition, especially in
domains that serve hearing-impaired or deaf individuals.
Hand gestures use the palms, finger positions, and shapes to
create forms referring to different letters and phrases.
Traditional vision-based methods that estimate hand poses
lack robust image detection because they exploit a certain
global-image feature like the contour or silhouette, which
rely totally on input that could be imperfect and are basically
2D renderings of 3D hand poses [1]. These traditional
methods rely on template matching, where the hand pose is
obtained by the nearest-neighbor search in a vast set of
templates, each of which contains a matching descriptor that
does not allow full encoding of all needed information to
extract complete pose parameters [2-4].
Current research tests different technologies that can be used
to detect and interpret hand signs, such as tracking hand
devices based on spatial-temporal features, hand-parsing-
based color devices, and colored gloves or 3D hand-
reference models [5-7]. Many devices used for gesture
recognition map the user’s body or hand gestures to specific
features such as the whole hand, finger position, or velocity.
It seems that most research focuses on exploiting these
devices, but they rarely investigate their combined
capabilities [8]. Microsoft Kinect and the Leap Motion Controller (LMC), for example, are two of the latest devices used to create natural user interfaces (NUIs), which enhance communication between the user and the device by relying on hand and body gestures and movements [8-9].
Microsoft Kinect relies on depth technology, which allows
users to deal with any system via a Web camera through an
NUI in which the user uses hand gestures or verbal
commands to recognize objects without the need to touch the
controller [9]. In 2013, Leap Motion released the LMC, a device that lets a user interact with on-screen objects through hand gestures. It is a combination of hardware and software that tracks the movement of the hands and fingers and converts it into 3D input [11-12].
This research follows the concept of supervised learning to
predict the hand pose from two depth images and define a
classifier that relies on hand-joint direction. In addition, it
investigates how exploiting both Kinect’s and LMC’s
skeleton detection is useful in the recognition of gestures
used for Arabic Sign Language (ArSL) and whether their
detection can improve letter interpretation with fewer
calculations. We used 3D positions of one hand’s joints and
presented it as a 3D model. The proposed model is a real-
time recognition model for any letter, and it is the first step
toward building a full communication system for the hearing
impaired.
II. LITERATURE REVIEW
Hand detection and recognition are the main communication
phases between humans and computers. In the hand-
detection phase, some researchers rely on a hand kinematic
model and use it to mold the hand’s shape as a skeleton or
3D hand shape [13]. The hand recognition or classification
phase must actually pass through two steps: (a) hand-feature
extraction, which uses many parameters to detect the hand-
shape details by relying on the application type and goal, and
(b) the gesture classification or recognition process, which
translates the sign [14]. The algorithms and calculation
methods used in these steps will vary depending on whether
the gesture is static or dynamic [13-14].
The recent trend in pose estimation seems more efficient and
robust compared to those that rely on global-image features
because it fuses individual estimations obtained by many
weak pose estimators that gather a subset of the pose
parameters based on part of the input [15-16] to reconstruct
the complete pose. This recent trend has been implemented
in many projects in various fields including those related to
sign-language recognition. Many projects have attempted to
enhance the communication process with the hearing
impaired; some examples include websites and mobile or desktop applications such as SignSpeak, ClassAct, Tawasoul (Connect), and Maktabi (My Office) [17-20].
All such projects are used to translate and improve
communication between the signer and hearing communities
through vision-based sign-language interpretation or video-
tracker technology. However, because of the lack of a
standard dictionary (and even before an established sign
language), home signs have been used by hearing-impaired
communities as means of communication with each other
[20]. These home signs were not part of a unified language
but were still used as familiar motions and expressions
employed within families or regions [20-21].
In 2014, a technique was introduced to predict the 3D joint
positions from the depth images and the parsed hand parts
[22]. Consequently, the authors enhanced the multimodal predictions produced by the previous regression-based method without making the calculations more complicated
[22]. The prediction accuracy was significantly improved by
the proposed method compared to other experiments that
employed the joint locations from the depth images [22].
Since they presented a method that works on single depth
images (static), they planned to analyze the dynamic
constraints of the joint positions and track the hand-joint
positions in the successive input images [22].
At Nara Women's University in Japan, researchers proposed a finger-spelling recognition method using LMC to provide an alternative to voice input for the hearing impaired
[23]. Different experiments were conducted to apply the
genetic algorithm [23]. A quasi-optimal solution with a
recognition rate of 82.71% was obtained, and therefore, the
researchers recommended including the finger spelling of
two letters with movement and that the decision tree
employed should not be fixed [23].
A limited amount of research has been conducted over the
last 5 years within the Arab world regarding the gesture
recognition of ArSL. Nevertheless, a few exhaustive studies have been conducted toward a full recognition system that can be employed as a means of communication for the hearing impaired [24]. Each study usually focused on one part of gesture recognition: alphabet letters, isolated words, or continuous phrases [24].
As for letter recognition, El-Bendary, Zawbaa, Daoud, Hassanien, and Nakamatsu [25] developed, in 2010, an Arabic-alphabet sign translator capable of identifying 30 characters of the Arabic alphabet with an accuracy of up to 91.3%. They employed rotation, scale, and translation features extracted from a video of signs, producing text representing the letter [25]. In addition, Hemayed and Hassanien presented, in 2010, a model that converted ArSL signs to voice [26]. Furthermore, an LMC device was used to implement a system for Arabic-alphabet sign recognition [27]. A single signer made the 28 Arabic-alphabet signs, each repeated 10 times for a total of 2,800 frames of data, but the authors recommended using two devices for better accuracy [27].
Regarding isolated word recognition [28-29], Shanableh and Assaleh (2007) presented various approaches to recognizing isolated ArSL gestures [28]. Different spatiotemporal feature-extraction techniques were used, with the temporal features of a video-based gesture extracted through forward image prediction. Both the k-nearest-neighbors (KNN) algorithm and a Bayesian classifier were used, and the results showed high classification performance, with recognition rates ranging from 97% to 100% [28].
For continuous phrase recognition [30-32], Assaleh (2010) used a system based on spatiotemporal features and hidden Markov models (HMMs), achieving an average word-recognition rate of 94% [30]. However, the recognized phrases were not always grammatically functional and were sometimes linguistically incoherent [30].
Indeed, to produce powerful recognition systems that are
close to real-life accuracy for ArSL, researchers need to
improve the accuracy of error detection and correction for
two-way communication translation. When the size of the
vocabulary is large, the difficulty of obtaining strong
accuracy within the recognition system increases, but the
system becomes more relevant [21]. The important point of
real-life applications is to get better recognition accuracy
with shorter processing times.
III. RESEARCH OBJECTIVES
As with all sign languages, the recognition of ArSL poses some problems because its letters are signed using gestures that are not always easy to detect.
Sometimes there are difficulties reading hand parts such as the joints, the palm, or the fingertips (Fig. 1). In such cases, predicting the meaning of a sign is difficult.
Fig. 1: Some ArSL issues.
Some new sensor devices, such as LMC and Kinect, have
some limitations that require further development. Using
both devices may provide more details and enhance any
hand gesture recognition system, especially if more devices
are placed in different settings [21].
To build a translator system using several sensors, we need a
compatible algorithm for keeping track of all high-precision
parts. Therefore, the main objective of this research is to
develop a supervised machine learning hand-gesturing model
to recognize ArSL using Kinect together with LMC. This paper describes the first step (phase) in constructing a sign-language interpreter system, using a novel approach that combines Kinect V2 with LMC to assess the accuracy of ArSL classification.
IV. KINECT AND LMC
Microsoft's Kinect Version 2.0 was used to track a standing skeleton with high depth fidelity. It has an RGB camera, voice-recognition capability, face-tracking capabilities, and access to the raw sensor data [33-34].
The infrared (IR) emitter and IR depth sensor work together with the PrimeSense chip. Through this chip, Kinect has a wider sensing range and tracks 24 joint points (Fig. 2) [35-36].
Fig. 2: The 24 joint points that Kinect detects.
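For illustration, a minimal C# sketch of reading these joint positions through the standard Kinect v2 SDK is shown below; this is an assumption-based sketch of the usual API pattern, not the authors' code.

using Microsoft.Kinect;   // Kinect for Windows SDK v2

class KinectJointReader
{
    // Opens the default sensor and prints the tracked joints of each tracked body.
    static void Main()
    {
        KinectSensor sensor = KinectSensor.GetDefault();
        sensor.Open();

        using (BodyFrameReader reader = sensor.BodyFrameSource.OpenReader())
        {
            reader.FrameArrived += (s, e) =>
            {
                using (BodyFrame frame = e.FrameReference.AcquireFrame())
                {
                    if (frame == null) return;
                    Body[] bodies = new Body[frame.BodyCount];
                    frame.GetAndRefreshBodyData(bodies);

                    foreach (Body body in bodies)
                    {
                        if (!body.IsTracked) continue;
                        foreach (var joint in body.Joints)      // JointType -> Joint, positions in meters
                            System.Console.WriteLine("{0}: ({1:F3}, {2:F3}, {3:F3})",
                                joint.Key, joint.Value.Position.X,
                                joint.Value.Position.Y, joint.Value.Position.Z);
                    }
                }
            };
            System.Console.ReadLine();   // keep the application alive while frames arrive
        }
    }
}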
LMC Version 2.0 was used to provide a skeletal tracking model that offers information about the hands and fingers and overall hand-tracking data, even if the hands cross over each other (Fig. 3). LMC uses a right-handed Cartesian coordinate system, and its application programming interface (API) reports physical quantities in different units: distance in millimeters, time in microseconds, and angles in radians [37].
Fig. 3: Hand skeleton points that LMC retrieves
LMC has an interaction box, a cube inside an inverted pyramid, that defines a rectilinear (straight-line-bounded) area within the LMC field of view (Fig. 4). The size of this box is determined by the LMC field of view and the user's settings in the
application. When the device is placed horizontally on a table, the origin (0, 0, 0) is the center of the LMC, so it can detect any user's hand above it [34]. However, many researchers, as in our case, rotate the LMC so that a user's hand gesture becomes more natural: instead of hovering a hand over the device, the user can simply gesture normally in front of it. Thus, the user stands normally and makes sign-language gestures.
Fig. 4: Rotating the field of view from the horizontal to the vertical plane.
Like Kinect, LMC has a software library (Leap SDK Version 2). This library provides APIs that retrieve many details about the hand, such as the direction and index of each finger and of each bone in a finger, as sketched below.
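As an illustration, here is a minimal C# sketch of reading per-bone directions through the Leap SDK v2 API; it is a sketch of the typical API usage, not the authors' code, and connection handling is simplified.

using Leap;

class LeapBoneReader
{
    // Polls one frame and prints the direction of every bone of every finger.
    // In a real application, frames would be consumed continuously (e.g. via a Listener).
    static void Main()
    {
        var controller = new Controller();
        Frame frame = controller.Frame();              // most recent tracking frame

        foreach (Hand hand in frame.Hands)
        {
            foreach (Finger finger in hand.Fingers)
            {
                // Four bones per finger: 0 = metacarpal, 1 = proximal,
                // 2 = intermediate, 3 = distal (the thumb's metacarpal has zero length).
                for (int b = 0; b < 4; b++)
                {
                    Bone bone = finger.Bone((Bone.BoneType)b);
                    Vector dir = bone.Direction;       // unit direction vector
                    System.Console.WriteLine("{0} hand, finger {1}, bone {2}: ({3:F2}, {4:F2}, {5:F2})",
                        hand.IsLeft ? "Left" : "Right", finger.Id, b, dir.x, dir.y, dir.z);
                }
            }
        }
    }
}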
V. THE PROPOSED MODEL
The proposed model of this research is to detect and
recognize the hand gestures of ArSL letters using supervised
learning and natural user interface libraries with the Kinect
SDK Version 2 and LMC Kit as follows:
1.1. Device calibration (joints):
Kinect and LMC provide the same type of data that we need (object depth) based on a skeleton algorithm. Thus, we get data about several objects (the user's hand and body) with high accuracy. Initially, the two devices are placed facing the same direction with 30 centimeters between them (Fig. 5).
Fig. 5: The two devices' initial placement.
Kinect tracks the joints of the user’s body, while LMC tracks
the hand details of the same user. In this experiment, we
used Microsoft Visual Studio with C# and Version 2 of the Kinect and LMC SDKs, together with the Windows Media3D namespace, to present a 3D hand model that supports 3D object transformations such as rotation and zooming.
In our research, calibration between the two devices means drawing one skeleton from the details retrieved from both. The skeleton drawing consists of lines between joints, built using a vector class. We converted each vector length from meters (used by Kinect) to millimeters (used by Leap) so that all lengths use a standard unit of millimeters. The Vector class exists in the Leap SDK, and the Vector3D class exists in the .NET C# library. Moreover, the only skeleton part we need is the upper part of the user's body (head, neck, shoulders, arms, spine, and hands with fingers). A vector is a structure that represents a three-component mathematical vector or point, such as a direction or position in 3D space. Fig. 6 shows the defined vectors, with the complete skeleton of the user drawn using Vector3D.
Fig. 6: Vectors defined from the two devices, with the skeleton overlaid on the user.
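A minimal sketch of this unit conversion is shown below, assuming the standard Kinect v2 SDK types and the .NET Media3D classes; the helper names are illustrative, not the authors' code.

using Microsoft.Kinect;                    // Kinect SDK v2: Body, Joint, JointType
using System.Windows.Media.Media3D;        // Point3D, Vector3D

static class SkeletonCalibration
{
    // Kinect reports joint positions in meters; Leap reports millimeters.
    // Converting to millimeters gives both devices a common length unit.
    public static Point3D ToMillimeters(Joint joint)
    {
        CameraSpacePoint p = joint.Position;               // meters
        return new Point3D(p.X * 1000.0, p.Y * 1000.0, p.Z * 1000.0);
    }

    // A skeleton "bone" is simply the vector between two joints.
    public static Vector3D BoneVector(Body body, JointType from, JointType to)
    {
        Point3D a = ToMillimeters(body.Joints[from]);
        Point3D b = ToMillimeters(body.Joints[to]);
        return b - a;                                      // direction and length in millimeters
    }
}

// Example usage: the upper-arm segment from the right shoulder to the right elbow.
// Vector3D upperArm = SkeletonCalibration.BoneVector(body, JointType.ShoulderRight, JointType.ElbowRight);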
We stored the hand information in an XML file instead of using tables in a relational database (Fig. 9).
Fig. 7 presents the first perspective structure that explains the
main concept of hand detection and recognition for ArSL
letters using Kinect and LMC.
Fig. 7. The general process of the experiment level 0.
1.2. The testing environment
Data collection was done in a room with white lighting. The four participants (two males and two females) were between fifteen and forty years old, capturing different hand sizes, styles, speeds, and hand orientations for each gesture. They were not deaf; rather, they were able to mimic sign gestures. Each participant stood 1 to 2 meters in front of the Kinect device and 30 to 50 cm in front of the LMC.
Kinect and LMC were mounted on a table 70 cm from the
floor. The two devices were placed on the same horizontal
surface and 50 cm apart. The background behind each
participant was different because it was captured in different
places. Fig. 8 presents two users in the testing environment
with the main interface of the application that we built to
apply our methodology.
Fig. 8. Experimental environment with non-deaf participants
with the main window interface.
The capturing procedure used Microsoft Kinect and LMC as input devices; all of the signals captured by the LMC were activated, but only the depth and color frames were activated in Kinect. Then, all users were asked to gesture every letter in turn by making the "new" gesture and then clicking the "select" button for each letter. For letters with more than one movement, the recording process was used and the comparison was then made on frames, where each frame represents one sign.
An evaluation button exports any live sign to a Microsoft Excel sheet with all of the calculations between this unknown gesture and all stored gestures, in order to classify the gesture and check whether it matches any existing gesture.
1.3. Data set collection
To evaluate the calculation process, we collected an
experimental dataset with all 28 ArSL letters. Each of the
four participants performed each letter at least two times.
Thus, there were 28 × 4 × 2 = 224 signs for all letters. The
average for one user was 28 gestures in 20 minutes.
The depth-data hand-gesture dataset collected for these cases using LMC and Kinect consisted of 3D coordinate values (x, y, z) and an RGB image. Bone direction was the main
feature used to compare the hand gestures. Four other features were used to draw each finger bone as a 3D model: the fingertip, the bone center point, the bone "from" point, and the bone "to" point.
We attempted to find out how LMC and Microsoft Kinect
would recognize and translate ArSL from the depth images
of the sensor to ordinary speech or text.
Fig. 9 shows the features of the first bone in the thumb; it is a snapshot of the sample XML dataset file for the letter "ﺃ".
The data file contains many tags that describe the letter (a code sketch of this structure follows Fig. 9), such as:
• Sign name: the name of the gesture that the user enters in
the class name field
• Hand: contains tags for recognizing the left or right hand, the palm position, the palm direction, and the wrist point. These are values with respect to the x, y, and z axes obtained from LMC.
• Fingers: the finger index, the (x,y,z) coordinates, and
distance of the fingertip
• Bones: the bone index, direction, center point, and from
point and to point with respect to the (x,y,z) coordinates
and distance
Fig. 9: Gesture dataset XML file.
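For illustration, here is a short C# sketch of how one such gesture entry could be written with System.Xml.Linq; the element names mirror the tags described above, but the exact schema is an assumption, not the authors' file format.

using System.Xml.Linq;

static class GestureStore
{
    // Builds one <Sign> entry with hypothetical element names mirroring the
    // described tags (SignName, Hand, Fingers, Bones); not the authors' exact schema.
    public static XElement BuildSignElement(string signName, bool isLeft,
                                            double[] palmPosition, double[] palmDirection)
    {
        return new XElement("Sign",
            new XElement("SignName", signName),
            new XElement("Hand",
                new XAttribute("side", isLeft ? "Left" : "Right"),
                new XElement("PalmPosition",
                    new XAttribute("x", palmPosition[0]),
                    new XAttribute("y", palmPosition[1]),
                    new XAttribute("z", palmPosition[2])),
                new XElement("PalmDirection",
                    new XAttribute("x", palmDirection[0]),
                    new XAttribute("y", palmDirection[1]),
                    new XAttribute("z", palmDirection[2]))),
            new XElement("Fingers"),   // would hold finger index, tip coordinates, distance
            new XElement("Bones"));    // would hold bone index, direction, center, from/to points
    }
}

// Usage sketch: new XDocument(GestureStore.BuildSignElement("ta", false,
//     new[] {12.3, 150.0, -20.1}, new[] {0.1, -0.9, 0.4})).Save("gestures.xml");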
The threshold is a value that affects gesture-recognition accuracy. Through experimentation, researchers can determine a threshold that reduces the computation in the search process, as it becomes the point separating accepted from rejected values. For instance, if a value is under the threshold, the gesture is accepted; otherwise, the gesture is rejected. The threshold value used in this research was 15 degrees, because when the hand leans more than this value, the sign becomes a different one.
For each pair of corresponding joint points, we drew a line between them (a vector). Each vector has a direction, and its orientation was measured as angles around the x, y, and z axes. These angles are the main factor in comparing two gestures. To clarify the calculation process, the following example compares values from a live gesture with a stored one (Fig. 10).
The first value (point A in the live gesture) represents the first bone in the thumb: point A (x1, y1, z1) = (1.78, 0.44, 1.95), in radians. To compare it with the same bone in the first stored sign, (x2, y2, z2) = (0.92, 0.66, 1.51), we take the absolute difference for each axis and convert it to degrees:
X = |x1 − x2| × 180/π = 49.59
Y = |y1 − y2| × 180/π = 12.56
Z = |z1 − z2| × 180/π = 25
Fig. 10: The skeleton shown on the device.
Thus, the result shows values greater than the threshold, so the live gesture did not match the first stored gesture. Keep in mind that the comparison was made between the same hand, finger, and bone indices.
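A minimal C# helper reproducing this per-axis comparison under one plausible reading of the matching rule (convert the radian difference to degrees and check it against the 15-degree threshold); the method names are illustrative, not the authors' code.

using System;

static class AngleComparison
{
    const double ThresholdDegrees = 15.0;

    // Absolute per-axis difference between two angles given in radians, reported in degrees.
    public static double DiffDegrees(double liveRad, double storedRad)
        => Math.Abs(liveRad - storedRad) * 180.0 / Math.PI;

    // Under this reading, a bone matches only if all three axis differences stay under the threshold.
    public static bool BoneMatches(double[] liveXyz, double[] storedXyz)
    {
        for (int axis = 0; axis < 3; axis++)
            if (DiffDegrees(liveXyz[axis], storedXyz[axis]) >= ThresholdDegrees)
                return false;
        return true;
    }
}

// Worked example from the text: live thumb-bone angles (1.78, 0.44, 1.95) rad vs stored
// (0.92, 0.66, 1.51) rad give differences of roughly 49, 13, and 25 degrees, so the X and Z
// differences exceed the threshold and the bone does not match.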
1.4. Training phase
Using the distance or depth value for each hand-skeleton point stored in an XML file, each bone direction in an unknown live gesture was compared with the same bone index in the stored signs. If more than one stored gesture gave an accepted value, we calculated the average over the compared bones and accepted the one with the minimum average.
“Fig. 11” shows a snapshot from the training phase for the
gesture of the letter “‫,”ﺏ‬ pronounced “ba”. The first
participant clicked on the “new” button using a mouse, and
then the participant made the gesture of the letter and clicked
“select” to fix the gesture; next, the participant clicked
“save” to store it. The participant repeated the same process
for all letters, and so on for all participants. The testing phase
“detection” had enough information to classify or recognize
the gesture, as in Fig. 12. The information about gestures appears under the left-hand model, where the sign text is, for example, "ﺕ-user1-2", meaning that the gesture is "ﺕ", pronounced "ta", and that it matched the stored gesture made by participant 1 in his second attempt ("1-2"). In addition, a timestamp appeared to give some indication of the response time in milliseconds.
Fig. 11: A snapshot from the training phase.
Fig. 12. A snapshot from the classification phase.
Table I presents samples of the letters "ﺕ" and "ﺝ" as signed by the participants. In the training phase, we repeated some gestures more than once because some participants became tired and the device became too slow to detect gestures due to heating.
TABLE I. SAMPLE OF “‫”ﺕ‬ AND “‫”ﺝ‬ LETTERS SIGNED
BY THREE PARTICIPANTS
Gesture of sign ﺕ: [images of the gesture by User 1, User 2, and User 3]
Gesture of sign ﺝ: [images of the gesture by User 1, User 2, and User 3]
1.5. The algorithm of gesture classification
Bone direction was used to detect the gesture's meaning, using the skeletal tracking that both devices provide. To check the similarity of letters, the comparison between each pair of bones with the same index in both hands is the core of our algorithm. The four steps of the proposed model's algorithm are listed below (a C# sketch follows the steps):
1- Initialize values:
• Threshold = threshold angle
• List[Sign details] = stored gestures loaded from the XML file
• Local Info = coordinate values (X, Y, Z)
• IsSimilar(Local Info value 1, Local Info value 2, Threshold)
• Get RGB from Kinect; then draw the hand as vectors between each pair of joints (a line from the previous joint to the next joint) and points based on LMC coordinates
• Calculate the direction value for each bone (previous joint, next joint)
2- Get angles:
• X = angle of the vector of point(i) around the X-axis
• Y = angle of the vector of point(i) around the Y-axis
• Z = angle of the vector of point(i) around the Z-axis
3- IsSimilar(Local Info value 1 (x, y, z), Local Info value 2 (x, y, z), Threshold): if the hand index and finger index are the same:
• Loop over all points to check the difference between value 1 and value 2
• If the point value < threshold, it is matched
• Else it is not matched; exit
• Calculate the average of (X, Y, Z)
• Calculate the average over all points (all hand/finger joints)
4- If the result contains more than one matching hand, select the hand with the minimum average.
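The following compact C# sketch implements these four steps under one plausible reading of the matching rule; the data structures and method names are illustrative, and the stored gestures are assumed to have already been loaded from the XML file described earlier.

using System;
using System.Collections.Generic;
using System.Linq;

// One stored or live gesture: per-bone angles (radians) around X, Y, Z,
// keyed by (hand, finger, bone) indices; 19 bones per hand, as in the text.
class Gesture
{
    public string Name;
    public Dictionary<(int hand, int finger, int bone), double[]> BoneAngles =
        new Dictionary<(int hand, int finger, int bone), double[]>();
}

static class GestureClassifier
{
    const double ThresholdDegrees = 15.0;

    static double DiffDegrees(double a, double b) => Math.Abs(a - b) * 180.0 / Math.PI;

    // Returns the average per-bone difference if every compared bone stays under the
    // threshold; otherwise returns null (the stored candidate is rejected).
    static double? Similarity(Gesture live, Gesture stored)
    {
        var perBone = new List<double>();
        foreach (var kv in live.BoneAngles)
        {
            if (!stored.BoneAngles.TryGetValue(kv.Key, out var storedXyz)) return null;
            double avg = Enumerable.Range(0, 3)
                                   .Average(i => DiffDegrees(kv.Value[i], storedXyz[i]));
            if (avg >= ThresholdDegrees) return null;     // one bad bone rejects the candidate
            perBone.Add(avg);
        }
        return perBone.Average();
    }

    // Step 4: among all accepted stored gestures, take the one with the minimum average.
    public static string Classify(Gesture live, IEnumerable<Gesture> stored) =>
        stored.Select(s => (s.Name, score: Similarity(live, s)))
              .Where(t => t.score.HasValue)
              .OrderBy(t => t.score.Value)
              .Select(t => t.Name)
              .FirstOrDefault() ?? "unknown";
}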
For example, consider measuring the similarity of bone directions between the skeleton points of an unknown gesture and one of the gestures in the dataset (e.g., the sign for the number 2). The following example shows the calculation process: Fig. 13 shows the values of each point in the hand skeleton, and Table II contains the full point values obtained by LMC for the thumb and index finger; the same values were available for every bone in the hand.
Fig. 13: Values of some hand skeleton points.
TABLE II. VALUES OBTAINED BY LMC FOR EACH BONE IN THE HAND (THUMB AND INDEX SHOWN)

Angle values (radians) between each bone's direction and one coordinate axis: X1, Y1, Z1 for the live (unknown) sign and X2, Y2, Z2 for the stored sign of the number 2.

#   Finger     Bone   X1     Y1     Z1     X2     Y2     Z2
1   0-thumb    1      1.78   0.44   1.95   0.92   0.66   1.51
2   0-thumb    2      2.19   0.72   1.90   1.46   0.36   1.91
3   0-thumb    3      2.25   0.78   1.89   2.36   1.24   2.24
4   1-index    0      1.60   0.10   1.67   1.49   0.21   1.38
5   1-index    1      1.43   0.32   1.85   1.15   1.00   2.40
6   1-index    2      1.45   0.36   1.91   1.44   2.33   2.37
7   1-index    3      1.46   0.40   1.95   1.72   2.91   1.74
Thus, the same calculations are applied, and the values in the table above become Table III. The last column in Table III tests whether the unknown letter matches the sign for the number two, based on the threshold; there are four values over 15, which means the unknown sign did not match the sign for the number two. Each letter passes through a comparison process:
• Same hand (left, right) = 2
• Same finger with the same index and direction (0 = thumb, 1 = index, 2 = middle, 3 = ring, 4 = little) = 5
• Same bone with the same index and direction (0 = metacarpal, 1 = proximal, 2 = intermediate, 3 = distal), where the thumb has only three bones, giving 3 + (4 × 4) = 19 bone comparisons
Thus, if the letter received 19 "yes" results, it was a match; otherwise, it was not. If it matched more than one stored sign, the stored sign with the minimum average was taken. The bar chart in Fig. 14 illustrates the comparison of the bones of the letter "ﺕ", pronounced "ta", against each stored letter. It was measured by counting the number of "yes" and "no" matches (Fig. 15).
• "Yes" means a bone's direction matched, and "no" means it did not.
• The y-axis is the number of “yes” and “no” matches (it
will be between 0 and 19).
• The x-axis is the sign of users for all stored letters (u1-
1-‫,)ﺃ‬ meaning the first participant in the first attempt for
letter “‫.”ﺃ‬
Overall, it can be seen that the letter "ﺕ" matches correctly with participant 2's first attempt and participant 3's second attempt, where the comparison count equaled 19. The line graph (Fig. 15) compares the average for the letter "ﺕ" against all of the stored letters and shows that the minimum average corresponds to the same letter from participant 2's first attempt, so that is taken as the matched result.
Some of the results were incorrect, such as those for the letter "ﻫـ", which the devices could not capture correctly. This could be because several fingers need to cross over one another, or, since it comes near the end of the alphabet, the participants may have been fatigued or the device may have heated up.
Even if the participant made the same sign gesture several
times, the devices were not able to capture it properly, so one
extra camera (maybe another LMC) might have been needed
to capture the sign from a different angle. Letters with
multiple frames need a special algorithm to take certain
frames from each sign. Thus, comparing becomes more
efficient.
In spite of these few cases, the recognition of the other letters
was correct 100%, especially for the letters that had clear
shapes, such as “ ‫ﺹ‬,‫ﺩ‬,‫ﻝ‬,‫ﻥ‬,‫ﺱ‬,‫ﺏ‬,‫ﺕ‬,‫ﺙ‬,‫ﺵ‬,‫ﻱ‬,‫ﻡ‬,‫ﺽ‬ ”.
VI. DISCUSSION AND CONCLUSION
This research examined 224 matching cases of 28 letters. We found the class name (correct letter) with high accuracy for more than 20 letters because the matched letter's values were below the 15-degree threshold. Meanwhile, the other letters' values were not only above the threshold but also very far from those of the same letter, because the device obtained data from only one direction. Some critical cases
included difficulty in counting the joints, or the palm might
have been covered by the fingers. Thus, letters with
overlapping fingers or letters with multiple frames need
more processing to be accurate.
The bone-direction feature used in our algorithm gave very high accuracy and was insensitive to the user's hand position. This means that if a participant stood in a
position and made gestures, there was no need in the next
iteration to stand in the same position. The hand did not need
to be in the same position, either. This was true even in the
dynamic classification of letters (once the hand changes, the
letter class changes directly). However, the limitation of
using both LMC and Kinect is that each depends on one
direction. For the accurate recognition of hand gestures in
LMC, only one person should stay in the visible range,
which is only 57 degrees in front of the device and 50 cm
away from the device (users’ movement is restricted).
Using Kinect could possibly overcome a crucial problem
for Arab hearing-impaired people that manifests in the lack
of ArSL interpreters and the excessive variation of ArSL
dialects. Hearing-impaired people who use such a platform
(Kinect and Leap) as an interpreter could actually be offered
the chance to be more interactive if the device is fed with
enough data about these various dialects.
International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-5, May- 2016]
Infogain Publication (Infogainpublication.com) ISSN: 2454-1311
www.ijaems.com Page | 521
TABLE III. VALUES AFTER APPLYING THE CALCULATIONS

For each bone, X, Y, and Z are the absolute angle differences (in degrees) between the unknown sign and the stored sign of the number 2; the similarity percentages are those values divided by 360; the last column indicates whether the average falls below the threshold (15).

#   Finger     Bone   X        Y         Z        X/360    Y/360    Z/360    Average   < 15
1   0-thumb    1      33.623   12.494    7.580    9.34%    3.47%    2.11%    4.97%     Yes
2   0-thumb    2      20.807   3.700     22.642   5.78%    1.03%    6.29%    4.37%     Yes
3   0-thumb    3      8.033    24.307    17.238   2.23%    6.75%    4.79%    4.59%     Yes
4   1-index    0      3.023    7.658     7.740    0.84%    2.13%    2.15%    1.71%     Yes
5   1-index    1      3.272    62.643    63.421   0.91%    17.40%   17.62%   11.98%    Yes
6   1-index    2      6.997    103.946   35.900   1.94%    28.87%   9.97%    13.60%    Yes
7   1-index    3      8.744    123.427   11.890   2.43%    34.29%   3.30%    13.34%    Yes
8   2-Middle   0      2.691    5.578     10.466   0.75%    1.55%    2.91%    1.73%     Yes
9   2-Middle   1      19.019   15.298    7.568    5.28%    4.25%    2.10%    3.88%     Yes
10  2-Middle   2      21.267   9.027     2.349    5.91%    2.51%    0.65%    3.02%     Yes
11  2-Middle   3      22.976   3.749     2.229    6.38%    1.04%    0.62%    2.68%     Yes
12  3-Ring     0      2.376    3.366     13.235   0.66%    0.93%    3.68%    1.76%     Yes
13  3-Ring     1      5.441    53.755    89.312   1.51%    14.93%   24.81%   13.75%    Yes
14  3-Ring     2      15.621   91.278    82.127   4.34%    25.35%   22.81%   17.50%    No
15  3-Ring     3      19.905   110.122   61.538   5.53%    30.59%   17.09%   17.74%    No
16  4-Little   0      1.180    1.591     15.439   0.33%    0.44%    4.29%    1.69%     Yes
17  4-Little   1      28.895   35.547    94.157   8.03%    9.87%    26.15%   14.69%    Yes
18  4-Little   2      46.680   69.417    85.296   12.97%   19.28%   23.69%   18.65%    No
19  4-Little   3      53.066   84.655    67.394   14.74%   23.52%   18.72%   18.99%    No
Fig. 14. Bar chart of evaluation of the “ ‫”ﺕ‬ letter.
International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-5, May- 2016]
Infogain Publication (Infogainpublication.com) ISSN: 2454-1311
www.ijaems.com Page | 522
Fig. 15. Line chart of evaluation of the “‫”ﺕ‬ letter.
REFERENCES
[1] Sarkat, A., Sanyal, G., & Majumder S. (2013). Hand
gesture recognition system: A survey. International
Journal of Computer Applications, 71(15), 0975–
8887.
[2] Ballan, L., Taneja, A., Gall, J., Gool, L. V., &
Pollefeys, M. (2012). Motion capture of hands in
action using discriminative salient points. Paper
presented at European Conference on Computer
Vision (ECCV), Florence, Italy. Springer: Springer
Berlin Heidelberg.
[3] Oikonomidis, I. Kyriazis, N., & Argyros, A. A.
(2011). Efficient model-based 3D tracking of hand
articulations using Kinect. Paper presented at British
Machine Vision Conference (BMVC), UK. Place of
Publication: University of Dundee.
[4] Wang, R. Y., & Popović, J. (2009). Real-time hand-
tracking with a color glove. ACM Transactions on
Graphics (TOG), 28(3), 63.
[5] Hui, L., & Junsong, Y. (2014). Hand parsing and
gesture recognition with a commodity depth camera.
In L. S. Ling, J. Han, P. Kohli, & Z. Zhang (Eds.),
Advances in computer vision and pattern recognition
(pp. 239–265). Switzerland: Spring International
Publishing.
[6] Liang, H., Yuan, J., Thalmann, D., & Zhang, Z.
(2013). Model-based hand pose estimation via
spatial-temporal hand parsing and 3D fingertip
localization. The Visual Computer International
Journal of Computer Graphics, 29(6-8), 837-848.
[7] Lingchen, C., Feng, W., Hui, D., & Kaifan, J. (2013).
A survey on hand gesture recognition. Proceedings of
the International Conference on Computer Sciences
and Applications, Kunming, China. Place of
publication: IEEE Xplore Digital Library.
[8] Suarez, J., & Murphy, R. (2012). Hand gesture
recognition with depth images: A review. Paper
presented at the 21st IEEE International Symposium
on Robot and Human Interactive Communication,
Paris, France. Place of Publication: IEEE Xplore
Digital Library.
[9] Han, J., Shao, L., Xu, D., & Shotton, J. (2013).
Enhanced computer vision with Microsoft Kinect
Sensor: A review. IEEE Trans Cybern, 43(5), 2168–
2267.
[10]Chen, X., Li, H., & Tansley, S. (2013). Kinect Sign
Language Translator expands communication
possibilities. Retrieved from
http://research.microsoft.com/en-
us/collaboration/stories/kinectforsignlanguage_cs.pdf
[11]Leap Motion – Best practices guidelines. (2015)
Available from
https://developer.leapmotion.com/documentation/
[12]Marin, G., Dominio, F., & Zanuttigh, P. (2015,7).
Hand gesture recognition with jointly calibrated Leap
Motion and depth sensor. Multimedia Tools and
Applications journal, 1380-7501, 1-25.
[13]Murthy, G., & Jadon, R. (2009). Review of vision
based hand gestures recognition. International
Journal of Information Technology and Knowledge
Management, India, 2(2), 105–410.
[14]Ali, E., Mircea, N., Richard, B., & Xander, T. (2007).
Vision-based hand pose estimation: A review.
Computer Vision and Image Understanding, 108(1),
52–73.
[15]Hsiang, Y., & Han, J. (2014). Real-time dynamic
hand gesture recognition. Proceedings of the
International Symposium on Computer, Consumer
and Control, Taichung. IEEE Xplore Digital Library:
IEEE.
[16]Li, Y. (2012, June). Hand gesture recognition using
Kinect. In Software Engineering and Service Science
(ICSESS), 2012 IEEE 3rd International Conference
on (pp. 196-199). IEEE.
[17]SignSpeak. (2009). Retrieved from
http://www.signspeak.eu/en/index.html.
[18]ClassAct. (n.d.). Retrieved from
https://www.deaftec.org/classact
[19]tawasoul4arsl. (2012). Retrieved from
http://tawasoul4arsl.ksu.edu.sa/
[20]Senghas, R. J., & Monaghan, L. (2002). Signs of
their times: Deaf communities and the culture of
language. Annual Reviews of Anthropology from
Journal Storage (JSTOR), 31(2002), 69–97.
[21]Mohandes, M., Liu, J., & Deriche, M. (2014,
February). A survey of image-based arabic sign
language recognition. In Multi-Conference on
Systems, Signals & Devices (SSD), Barcelona, 2014
11th International (pp. 1-4). IEEE.
[22]Hui, L., Junsong, Y. Y., & Daniel, T. (2015).
Resolving ambiguous hand pose predictions by
exploiting part correlations. IEEE, 25(7), 1125–1139.
[23]Youssif, A., Aboutabl, A., & Ali, H. (2011). Arabic
Sign Language (ArSL) recognition system using
HMM. International Journal of Advanced Computer
Science and Applications, 2(11), 45–51.
[24]El-Bendary, N., Zawbaa, H. M., Daoud, M. S.,
Hassanien, A. E., & Nakamatsu, K. (2010). ArSLAT:
Arabic Sign Language Alphabets Translator. Paper
presented at the 2010 International Conference on
Computer Information Systems and Industrial
Management Applications (CISIM), Krackow. IEEE
Xplore Digital Library: IEEE.
[25]Hemayed, E. E., & Hassanien, A. S. (2010). Edge-
based recognizer for Arabic sign language alphabet
(ArS2VArabic sign to voice). Paper presented at the
Computer Engineering Conference (ICENCO), 2010
International, Giza. IEEE Xplore Digital Library:
IEEE.
[26]Mohandes, M., Aliyu, S., & Deriche, M. (2014).
Arabic sign language recognition using the Leap
Motion Controller. Paper presented at King Fahd
University of Petroleum & Minerals Dhahran, 31261,
Saudi Arabia. Place of Publication: IEEE Xplore
Digital Library.
[27]Samir, A., & Aboul-Ela, M. (2012). Error detection
and correction approach for Arabic sign language
recognition. Paper presented at the 2012 Seventh
International Conference on Computer Engineering
Systems (ICCES), Cairo. Place of Publication: IEEE
Xplore Digital Library.
[28]Shanableh, T., & Assaleh, K. (2007). Video-based
feature extraction techniques for isolated Arabic sign
language recognition. Paper presented at the 9th
International Symposium on Signal Processing and
Its Applications, Sharjah. Place of Publication: IEEE
Xplore Digital Library.
[29]Mohandes, M., Deriche, M., Johar, U., & Ilyas, S.
(2012). A signer-independent Arabic Sign Language
recognition system using face detection, geometric
features, and a Hidden Markov Model. Comput.
Electr. Eng., 38(2), 422–433.
[30]Assaleh, K. (2010). Continuous Arabic sign language
recognition in user dependent mode. J Intel. Learn.
Syst. App!, 2(1), 19–27.
[31]Tolba, M. F., Samir, A., & Abul-Ela, M. (2012). A
proposed graph matching technique for Arabic sign
language continuous sentences recognition. Paper
presented at the 2012 8th International Conference on
Informatics and Systems (INFOS), Cairo. Place of
Publication: IEEE Xplore Digital Library.
[32]Spiegelmock, M. (2015). Leap Motion Development
Essentials packet. Retrieved from
www.packtpub.com/leap-motion-development-
essentials/book
[33]Jana, A. (2012). Kinect for Windows SDK
programming guide. Birmingham B3 2PB, UK: Packt
Publishing Ltd.
[34]Chen, X., Li, H., Pan, T., Tansley, S., & Zhou, M.
Kinect Sign Language Translator expands
communication possibilities. Retrieved from
http://research.microsoft.com/en-
us/collaboration/stories/kinect-sign-language-
translator.aspx.
More Related Content

PDF
Review on Hand Gesture Recognition
PDF
IRJET- Gesture Recognition for Indian Sign Language using HOG and SVM
PDF
Using The Hausdorff Algorithm to Enhance Kinect's Recognition of Arabic Sign ...
PDF
[IJET-V1I5P9] Author: Prutha Gandhi, Dhanashri Dalvi, Pallavi Gaikwad, Shubha...
PDF
IRJET - Paint using Hand Gesture
PDF
Ay4102371374
PDF
Appearance based static hand gesture alphabet recognition
PDF
Real time Myanmar Sign Language Recognition System using PCA and SVM
Review on Hand Gesture Recognition
IRJET- Gesture Recognition for Indian Sign Language using HOG and SVM
Using The Hausdorff Algorithm to Enhance Kinect's Recognition of Arabic Sign ...
[IJET-V1I5P9] Author: Prutha Gandhi, Dhanashri Dalvi, Pallavi Gaikwad, Shubha...
IRJET - Paint using Hand Gesture
Ay4102371374
Appearance based static hand gesture alphabet recognition
Real time Myanmar Sign Language Recognition System using PCA and SVM

What's hot (19)

PDF
Sign Language Recognition with Gesture Analysis
PDF
A gesture recognition system for the Colombian sign language based on convolu...
PDF
Speech Recognition using HMM & GMM Models: A Review on Techniques and Approaches
PDF
Design and Development of a 2D-Convolution CNN model for Recognition of Handw...
PDF
IRJET- Survey Paper on Vision based Hand Gesture Recognition
PDF
IRJET- Tamil Sign Language Recognition Using Machine Learning to Aid Deaf and...
PDF
L01117074
PDF
Automated Bangla sign language translation system for alphabets by means of M...
PDF
A SIGNATURE BASED DRAVIDIAN SIGN LANGUAGE RECOGNITION BY SPARSE REPRESENTATION
PDF
Gesture Acquisition and Recognition of Sign Language
PDF
Translation of sign language using generic fourier descriptor and nearest nei...
PDF
IRJET- Sign Language Interpreter using Image Processing and Machine Learning
PDF
Artificial Neural Network For Recognition Of Handwritten Devanagari Character
PDF
Character Recognition (Devanagari Script)
PDF
Handwritten character recognition in
PDF
Hand Segmentation for Hand Gesture Recognition
PPTX
Text Detection and Recognition
PDF
IRJET- Finger Gesture Recognition using Laser Line Generator and Camera
PDF
Character recognition of kannada text in scene images using neural
Sign Language Recognition with Gesture Analysis
A gesture recognition system for the Colombian sign language based on convolu...
Speech Recognition using HMM & GMM Models: A Review on Techniques and Approaches
Design and Development of a 2D-Convolution CNN model for Recognition of Handw...
IRJET- Survey Paper on Vision based Hand Gesture Recognition
IRJET- Tamil Sign Language Recognition Using Machine Learning to Aid Deaf and...
L01117074
Automated Bangla sign language translation system for alphabets by means of M...
A SIGNATURE BASED DRAVIDIAN SIGN LANGUAGE RECOGNITION BY SPARSE REPRESENTATION
Gesture Acquisition and Recognition of Sign Language
Translation of sign language using generic fourier descriptor and nearest nei...
IRJET- Sign Language Interpreter using Image Processing and Machine Learning
Artificial Neural Network For Recognition Of Handwritten Devanagari Character
Character Recognition (Devanagari Script)
Handwritten character recognition in
Hand Segmentation for Hand Gesture Recognition
Text Detection and Recognition
IRJET- Finger Gesture Recognition using Laser Line Generator and Camera
Character recognition of kannada text in scene images using neural
Ad

Similar to A Real-Time Letter Recognition Model for Arabic Sign Language Using Kinect and Leap Motion Controller v2 (20)

PDF
Dynamic hand gesture recognition of Arabic sign language by using deep convol...
PDF
Sign Language Recognition with Gesture Analysis
PDF
Literature Review on Indian Sign Language Recognition System
PDF
AUTOMATIC TRANSLATION OF ARABIC SIGN TO ARABIC TEXT (ATASAT) SYSTEM
PDF
gesture-recognition
PDF
Automatic recognition of Arabic alphabets sign language using deep learning
PDF
IRJET- Hand Gesture Recognition for Deaf and Dumb
PDF
Smart hand gestures recognition using K-NN based algorithm for video annotati...
PDF
The International Journal of Engineering and Science
PDF
The International Journal of Engineering and Science (IJES)
PDF
Ijaia040203
PDF
Pakistan sign language to Urdu translator using Kinect
PDF
Kinect Sensor based Indian Sign Language Detection with Voice Extraction
PDF
D0361015021
PDF
Novel Approach to Use HU Moments with Image Processing Techniques for Real Ti...
PDF
Glove based wearable devices for sign language-GloSign
DOC
PDF
Automatic Isolated word sign language recognition
PDF
KANNADA SIGN LANGUAGE RECOGNITION USINGMACHINE LEARNING
PDF
Video Audio Interface for recognizing gestures of Indian sign Language
Dynamic hand gesture recognition of Arabic sign language by using deep convol...
Sign Language Recognition with Gesture Analysis
Literature Review on Indian Sign Language Recognition System
AUTOMATIC TRANSLATION OF ARABIC SIGN TO ARABIC TEXT (ATASAT) SYSTEM
gesture-recognition
Automatic recognition of Arabic alphabets sign language using deep learning
IRJET- Hand Gesture Recognition for Deaf and Dumb
Smart hand gestures recognition using K-NN based algorithm for video annotati...
The International Journal of Engineering and Science
The International Journal of Engineering and Science (IJES)
Ijaia040203
Pakistan sign language to Urdu translator using Kinect
Kinect Sensor based Indian Sign Language Detection with Voice Extraction
D0361015021
Novel Approach to Use HU Moments with Image Processing Techniques for Real Ti...
Glove based wearable devices for sign language-GloSign
Automatic Isolated word sign language recognition
KANNADA SIGN LANGUAGE RECOGNITION USINGMACHINE LEARNING
Video Audio Interface for recognizing gestures of Indian sign Language
Ad

Recently uploaded (20)

PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
DOCX
573137875-Attendance-Management-System-original
PPTX
Geodesy 1.pptx...............................................
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
CH1 Production IntroductoryConcepts.pptx
PPTX
UNIT 4 Total Quality Management .pptx
PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PDF
Well-logging-methods_new................
PDF
R24 SURVEYING LAB MANUAL for civil enggi
PDF
PPT on Performance Review to get promotions
PPTX
OOP with Java - Java Introduction (Basics)
PPT
introduction to datamining and warehousing
PPTX
web development for engineering and engineering
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPTX
additive manufacturing of ss316l using mig welding
PDF
composite construction of structures.pdf
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PPT
Mechanical Engineering MATERIALS Selection
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
573137875-Attendance-Management-System-original
Geodesy 1.pptx...............................................
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
CH1 Production IntroductoryConcepts.pptx
UNIT 4 Total Quality Management .pptx
Automation-in-Manufacturing-Chapter-Introduction.pdf
CYBER-CRIMES AND SECURITY A guide to understanding
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
Well-logging-methods_new................
R24 SURVEYING LAB MANUAL for civil enggi
PPT on Performance Review to get promotions
OOP with Java - Java Introduction (Basics)
introduction to datamining and warehousing
web development for engineering and engineering
UNIT-1 - COAL BASED THERMAL POWER PLANTS
additive manufacturing of ss316l using mig welding
composite construction of structures.pdf
Operating System & Kernel Study Guide-1 - converted.pdf
Mechanical Engineering MATERIALS Selection

A Real-Time Letter Recognition Model for Arabic Sign Language Using Kinect and Leap Motion Controller v2

  • 1. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-5, May- 2016] Infogain Publication (Infogainpublication.com) ISSN: 2454-1311 www.ijaems.com Page | 514 A Real-Time Letter Recognition Model for Arabic Sign Language Using Kinect and Leap Motion Controller v2 Miada A. Almasre, Hana Al-Nuaim Department of Computer Science, King AbdulAziz University, Jeddah, Saudi Arabia Abstract— The objective of this research is to develop a supervised machine learning hand-gesturing model to recognize Arabic Sign Language (ArSL), using two sensors: Microsoft's Kinect with a Leap Motion Controller. The proposed model relies on the concept of supervised learning to predict a hand pose from two depth images and defines a classifier algorithm to dynamically transform gestural interactions based on 3D positions of a hand-joint direction into their corresponding letters whereby live gesturing can be then compared and letters displayed in real time. This research is motivated by the need to increase the opportunity for the Arabic hearing-impaired to communicate with ease using ArSL and is the first step towards building a full communication system for the Arabic hearing impaired that can improve the interpretation of detected letters using fewer calculations. To evaluate the model, participants were asked to gesture the 28 letters of the Arabic alphabet multiple times each to create an ArSL letter data set of gestures built by the depth images retrieved by these devices. Then, participants were later asked to gesture letters to validate the classifier algorithm developed. The results indicated that using both devices for the ArSL model were essential in detecting and recognizing 22 of the 28 Arabic alphabet correctly 100 %. Keywords— Hand gesture recognition, Arabic Sign Language, Kinect version 2, Leap Motion Controller, skeleton. I. INTRODUCTION One of the current research areas in the field of computer vision is automatic hand-gesture recognition, especially in domains that serve hearing-impaired or deaf individuals. Hand gestures use the palms, finger positions, and shapes to create forms referring to different letters and phrases. Traditional vision-based methods that estimate hand poses lack robust image detection because they exploit a certain global-image feature like the contour or silhouette, which rely totally on input that could be imperfect and are basically 2D renderings of 3D hand poses [1]. These traditional methods rely on template matching, where the hand pose is obtained by the nearest-neighbor search in a vast set of templates, each of which contains a matching descriptor that does not allow full encoding of all needed information to extract complete pose parameters [2-4]. Current research tests different technologies that can be used to detect and interpret hand signs, such as tracking hand devices based on spatial-temporal features, hand-parsing- based color devices, and colored gloves or 3D hand- reference models [5-7]. Many devices used for gesture recognition map the user’s body or hand gestures to specific features such as the whole hand, finger position, or velocity. It seems that most research focuses on exploiting these devices, but they rarely investigate their combined capabilities [8]. Microsoft Kinect and LMC, for example, are two of the latest devices used to create natural user interfaces (NUIs), which enhance communication between the user and the devices, relying on hand and body gestures and movements [8-9]. 
Microsoft Kinect relies on depth technology, which allows users to deal with any system via a Web camera through an NUI in which the user uses hand gestures or verbal commands to recognize objects without the need to touch the controller [9]. In 2013, LMC released a device that a user can use to interact with objects on screens through hand gestures. This device is a combination of hardware and software that tracks the movement of hands and fingers and then converts them into a 3D input [11-12]. This research follows the concept of supervised learning to predict the hand pose from two depth images and define a classifier that relies on hand-joint direction. In addition, it investigates how exploiting both Kinect’s and LMC’s skeleton detection is useful in the recognition of gestures used for Arabic Sign Language (ArSL) and whether their detection can improve letter interpretation with fewer calculations. We used 3D positions of one hand’s joints and presented it as a 3D model. The proposed model is a real- time recognition model for any letter, and it is the first step toward building a full communication system for the hearing impaired. II. LITERATURE REVIEW Hand detection and recognition are the main communication phases between humans and computers. In the hand-
  • 2. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-5, May- 2016] Infogain Publication (Infogainpublication.com) ISSN: 2454-1311 www.ijaems.com Page | 515 detection phase, some researchers rely on a hand kinematic model and use it to mold the hand’s shape as a skeleton or 3D hand shape [13]. The hand recognition or classification phase must actually pass through two steps: (a) hand-feature extraction, which uses many parameters to detect the hand- shape details by relying on the application type and goal, and (b) the gesture classification or recognition process, which translates the sign [14]. The algorithms and calculation methods used in these steps will vary depending on whether the gesture is static or dynamic [13-14]. The recent trend in pose estimation seems more efficient and robust compared to those that rely on global-image features because it fuses individual estimations obtained by many weak pose estimators that gather a subset of the pose parameters based on part of the input [15-16] to reconstruct the complete pose. This recent trend has been implemented in many projects in various fields including those related to sign-language recognition. Many projects have attempted to enhance the communication process with the hearing impaired; some examples include Web sites, mobile application or desk top like SignSpeak, the ClassAct, Tawasoul (Connect) and Maktabi (my office) application [17-20]. All such projects are used to translate and improve communication between the signer and hearing communities through vision-based sign-language interpretation or video- tracker technology. However, because of the lack of a standard dictionary (and even before an established sign language), home signs have been used by hearing-impaired communities as means of communication with each other [20]. These home signs were not part of a unified language but were still used as familiar motions and expressions employed within families or regions [20-21]. In 2014, a technique was introduced to predict the 3D joint positions from the depth images and the parsed hand parts [22]. Consequently, they enhanced the multimodal predictions produced by the previous regression-based method without making the calculations more complicated [22]. The prediction accuracy was significantly improved by the proposed method compared to other experiments that employed the joint locations from the depth images [22]. Since they presented a method that works on single depth images (static), they planned to analyze the dynamic constraints of the joint positions and track the hand-joint positions in the successive input images [22]. In Japan’s Nara Women’s University, researchers proposed a finger-spelling recognition method using LMC to provide an alternative method of voice input for the hearing impaired [23]. Different experiments were conducted to apply the genetic algorithm [23]. A quasi-optimal solution with a recognition rate of 82.71% was obtained, and therefore, the researchers recommended including the finger spelling of two letters with movement and that the decision tree employed should not be fixed [23]. A limited amount of research has been conducted over the last 5 years within the Arab world regarding the gesture recognition of ArSL. Nevertheless, a few exhaustive researches have been conducted to enhance a full recognition system that can be employed as a means of communication for the hearing impaired [24]. 
Each research usually focused on one part of gesture recognition: alphabet letters, isolated words, or continuous phrases [24]. As for letter recognition, El-Bendary, Zawbaa, Daoud, ella Hassanien, and Nakamatsu [25] at 2010, developed an Arabic-alphabet sign translator capable of identifying 30 characters in Arabic alphabet, with an accuracy of up to 91.3%. They employed rotation, scale, and translation features extracted from a video of signs, producing a text representing the letter [25]. In addition, Hemayed and Hassanien at 2010 presented a model that converted ArSL signs to voice [26]. As well as, an LMC device was introduced to implement a system for Arabic-alphabet sign recognition [27]. Only one signer was responsible for making 28 Arabic-alphabet signs, which was repeated 10 times for a total of 2,800 frames of data but they recommended two devices for more accuracy [27]. [28-29], Regarding isolated word recognition, [28] Shanableh and Assaleh (2007) presented various applications to recognize isolated ArSL gestures. Different spatiotemporal feature-extraction techniques were used with temporal features of a video-based gesture extracted through forward image predictions. The k-nearest-neighbors algorithm (KNN) and the Bayesian classifier both is used, and the results showed higher classification performances, ranging from 97% to 100% recognition rates [28]. [30-32] for continuous phrase recognition, [30] Assaleh at 2010, used a system based on spatiotemporal features and HMM models, and the result was a 94% average of word- recognition rate. However, recognized phrases were not always grammatically functional or were sometimes incoherent linguistically [30]. Indeed, to produce powerful recognition systems that are close to real-life accuracy for ArSL, researchers need to improve the accuracy of error detection and correction for two-way communication translation. When the size of the vocabulary is large, the difficulty of obtaining strong accuracy within the recognition system increases, but the system becomes more relevant [21]. The important point of real-life applications is to get better recognition accuracy with shorter processing times. III. RESEARCH OBJECTIVES The recognition of ArSL poses some problems as with all sign languages due to the fact that the letters of ArSL are signed using gestures that are not always detected easily.
Sometimes there are difficulties in reading hand parts such as the joints, the palm, and the fingertips ("fig. 1"). In such cases, predicting the meaning of a sign is difficult.

Fig. 1: Some ArSL issues.

Some new sensor devices, such as LMC and Kinect, have limitations that require further development. Using both devices may provide more detail and enhance any hand-gesture recognition system, especially if more devices are placed in different settings [21]. To build a translator system using several sensors, we need a compatible algorithm for keeping track of all high-precision parts. Therefore, the main objective of this research is to develop a supervised machine learning hand-gesturing model to recognize ArSL using Kinect with LMC. This paper describes the first step (phase) the researcher must take when constructing a sign-language interpreter system, which uses a novel approach that combines Kinect V2 with LMC to assess the accuracy of ArSL classification.

IV. KINECT AND LMC
Microsoft's Kinect Version 2.0 was used to track a standing skeleton with high depth fidelity. It has an RGB camera, voice-recognition capability, face-tracking capabilities, and access to the raw sensor records [33-34]. The infrared (IR) emitter and the IR depth sensor work together with the PrimeSense chip. Through this chip, Kinect has a wide sensing range and tracks 24 joint points ("fig. 2") [35-36].

Fig. 2: The 24 joint points that Kinect detects.

LMC Version 2.0 was used to provide a skeletal tracking model that offers information about hands, fingers, and overall hand-tracking data, even if the hands cross over each other ("fig. 3"). LMC uses a right-handed Cartesian coordinate system, and its application programming interface (API) reports physical quantities in different units: distance in millimeters, time in microseconds, and angles in radians [37].

Fig. 3: Hand skeleton points that LMC retrieves.

LMC has an interaction box, a cube inside an inverse pyramid, that defines a rectilinear area ("rectilinear" means related to a straight line; it may refer to a rectilinear grid, a tessellation of the Euclidean plane) within the LMC field of view ("fig. 4"). The size of this box is determined by the LMC field of view and the user's settings in the application. When the device is placed horizontally on a table, the origin of the coordinate system (0, 0, 0) is the center of the LMC, so it can detect any user's hand above it [34]. However, many researchers, as in our case, rotate the LMC so that a user's hand gestures become more natural; instead of a hand hovering over the device, it can simply gesture normally in front of it. Thus, the user stands normally and makes sign-language gestures.

Fig. 4: Rotating the field of view from the horizontal to the vertical plane.

Like Kinect, LMC has an open-source library (Leap SDK Version 2). This library provides APIs that retrieve many details about the hand, such as the direction and the index of each finger and of each bone in the finger.

V. THE PROPOSED MODEL
The proposed model detects and recognizes the hand gestures of ArSL letters using supervised learning and natural-user-interface libraries with the Kinect SDK Version 2 and the LMC kit, as follows:
1.1. Device calibration (joints)
Kinect and LMC provide the same type of data that we need (the depth of an object) based on a skeleton algorithm. Thus, we obtain data about several objects (the user's hand and body) with high accuracy. Initially, the two devices are placed facing the same direction, with 30 centimeters between them ("fig. 5").

Fig. 5: The two devices' initial placement.
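As the next paragraphs describe, calibrating the two devices amounts to expressing both skeletons in the same unit (Kinect reports joint positions in meters, Leap in millimeters) and then drawing vectors (bones) between joints. The following is a minimal C# sketch of that idea only; the Joint record and the method names are hypothetical illustrations, not the Kinect or Leap SDK types used in the actual application.

using System.Numerics;

// Minimal sketch of the unit alignment used when merging the two skeletons:
// Kinect joint positions (meters) are scaled to millimeters, Leap's unit,
// before vectors (bones) are drawn between joints.
// "Joint" is a hypothetical record, not an SDK type.
public record Joint(string Name, Vector3 PositionMeters);

public static class Calibration
{
    // Convert a Kinect joint position from meters to millimeters.
    public static Vector3 ToMillimeters(Vector3 positionMeters) =>
        positionMeters * 1000f;

    // Direction of the bone (vector) from one joint to the next, normalized
    // so that only its orientation is used in later comparisons.
    public static Vector3 BoneDirection(Joint from, Joint to)
    {
        Vector3 fromMm = ToMillimeters(from.PositionMeters);
        Vector3 toMm = ToMillimeters(to.PositionMeters);
        return Vector3.Normalize(toMm - fromMm);
    }
}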
Kinect tracks the joints of the user's body, while LMC tracks the hand details of the same user. In this experiment, we used Microsoft Visual Studio with C# and SDK Version 2 of both Kinect and LMC, together with the Windows Media3D namespace, to present a 3D model of a hand that supports 3D object transformations such as rotation and zooming.

Calibration between the two devices in our research means drawing one skeleton from the details retrieved from both devices. The skeleton drawing is a set of lines between joints constructed using a vector class. We converted the length of each vector from meters, used by Kinect, to millimeters, used by Leap, to obtain a standard length unit in millimeters. This vector class exists in the Leap SDK, along with the Vector3D class in the .NET C# library. Moreover, the skeleton part that we need is just the upper part of the user's body (head, neck, shoulders, arms, spine, and hands with fingers). A vector is a structure that represents a three-component mathematical vector or point, such as a direction or position in 3D space. Fig. 6 shows the defined vectors together with the complete skeleton drawn on the user.

Fig. 6: Defined vectors in the two devices, with the skeleton appearing on the user.

We stored the hand information in an XML file instead of tables in a relational database ("fig. 9"). Fig. 7 presents the top-level (level 0) structure that explains the main concept of hand detection and recognition for ArSL letters using Kinect and LMC.

Fig. 7: The general process of the experiment (level 0).

1.2. The testing environment
The data collection was done in a room with white light. The four participants (two males and two females) were between fifteen and forty years old, to capture different hand sizes, styles, speeds, and hand orientations for each gesture. They were not deaf; rather, they were able to mimic sign gestures. Each participant stood 1 to 2 meters away from the front of the Kinect device and 30 to 50 cm away from the front of the LMC. Kinect and LMC were mounted on a table 70 cm above the floor. The two devices were placed on the same horizontal surface, 50 cm apart. The background behind each participant differed because the data were captured in different places. Fig. 8 presents two users in the testing environment, together with the main interface of the application we built to apply our methodology.

Fig. 8: Experimental environment with non-deaf participants and the main window interface.

The capturing procedure used Microsoft Kinect and LMC as input devices; all of the signals captured by the LMC device were activated, but only the depth and color frames were activated in Kinect. All users were asked to gesture all letters continuously by making a "new" gesture and then clicking the "select" button for each letter. For letters with more than one movement, the recording process was used and the comparison was made on frames, where each frame represents one sign. An evaluation button exports any live sign to a Microsoft Excel sheet, with all of the calculations between this unknown gesture and all stored gestures, to classify the gesture and check whether it matches any existing gesture.

1.3. Data set collection
To evaluate the calculation process, we collected an experimental dataset covering all 28 ArSL letters.
Each of the four participants performed each letter at least twice. Thus, there were 28 × 4 × 2 = 224 signs in total. On average, one participant made 28 gestures in 20 minutes. The depth-data hand-gesture dataset collected for these cases using LMC and Kinect consisted of 3D coordinate values (x, y, z) and an RGB image.
  • 5. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-5, May- 2016] Infogain Publication (Infogainpublication.com) ISSN: 2454-1311 www.ijaems.com Page | 518 feature, which was used to compare the hand gestures. The four other features were used to draw each bone finger as a 3D model: the fingertip, the bone center point, the bone from point, and the bone to point. We attempted to find out how LMC and Microsoft Kinect would recognize and translate ArSL from the depth images of the sensor to ordinary speech or text. Fig. 9 shows the features of the first bone in the thumb; it is an XML file that is a snapshot of the sample of the dataset for the letter “‫.”ﺃ‬ We attempted to find out how LMC and Microsoft Kinect would recognize and translate ArSL from the depth images of the sensor to ordinary speech or text. The data file contains many tags that describe the letter, such as: • Sign name: the name of the gesture that the user enters in the class name field • Hand: contains a tag for recognizing the left or right hand, palm position, palm direction, and wrist point finger. These are the values with respect to the x, y, and z axes that were obtained from LMC. • Fingers: the finger index, the (x,y,z) coordinates, and distance of the fingertip • Bones: the bone index, direction, center point, and from point and to point with respect to the (x,y,z) coordinates and distance Fig. 9:Gesture dataset XML file. The threshold value is a point or value that the gesture accuracy can affect. Through experimentation, researchers can determine a threshold that is useful for reducing the computation in the search process, as it becomes the point separating the accepted from the rejected values. For instance, if the value is under the threshold value, the gesture is accepted; otherwise, the gesture is rejected if it goes beyond the threshold. The threshold value used in this research was from 15 degrees because when the hand leans more than this value, the sign will be different. For each joint point, we drew a line between each corresponding point (vector). Each vector has a corresponding direction and an angle, and each angle was measured based on the corresponding x, y , z angles. These angles are the main factor for a comparison between two gestures. To clarify the calculation process, the following example presents a certain value for a live gesture compared with a stored one “Fig. 10”. The first value (point A in the live gesture) represents the first bone in the thumb: Point (A) (x1, y1, z1) = (1.78, 0.44, 1.95). If we need to compare it with the same bone in the first stored sign, which is (x2, y2, z2) = ( 0.92, 0.66, 1.51), we have to apply the difference between each value, as in the following equation: X=ABS(180 * x1 /360-180*x2/360) = 49.59 Y =ABS(180 * y1 /360-180*y2/360) = 12.56 Z =ABS(180 * z1 /360-180*z2/360) = 25. Fig. 10:Clarify skeleton on device. Thus, the result shows the value more than the threshold, so the live gesture did not match the first stored value. Keep in the mind that the comparison was between the same index of the hand, finger, and bone. 1.4. Training phase Using the distance or depth value for each hand skeleton point stored in an XML file, each bone direction in an unknown live gesture was compared with the same bone index in the stored signs. If more than one gesture gave an accepted value, we would calculate the average of the comparison bones and then accept the minimum one. “Fig. 
"Fig. 11" shows a snapshot from the training phase for the gesture of the letter "ب", pronounced "ba". The first participant clicked the "new" button with a mouse, made the gesture of the letter, and clicked "select" to fix the gesture; next, the participant clicked "save" to store it. The participant repeated the same process for all letters, and so did all participants. The testing (detection) phase then had enough information to classify or recognize the gesture, as in "fig. 12". The information about a gesture appears under the left-hand model, where the sign text, for example "ت-user1-2", means that the gesture is "ت", pronounced "ta", and that it matched the stored gesture made by participant number 1 in his second attempt ("1-2"). In addition, a timestamp appeared to give some indication of the response time in milliseconds.

Fig. 11: A snapshot from the training phase.
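When a participant clicks "save", the gesture is appended to the XML dataset described in section 1.3. The following C# sketch, using System.Xml.Linq, shows what one stored record might look like; the tag and attribute names and the sample numbers are hypothetical and only mirror the fields listed earlier (sign name, hand, fingers, and bones with their directions).

using System;
using System.Xml.Linq;

// Illustrative sketch of one stored-gesture record; names and values are
// hypothetical placeholders that mirror the dataset fields in section 1.3.
public static class GestureStore
{
    public static void Main()
    {
        var gesture = new XElement("Gesture",
            new XElement("SignName", "ب-user1-1"),
            new XElement("Hand",
                new XAttribute("Side", "Right"),
                new XElement("PalmPosition", "12.4 180.2 35.6"),
                new XElement("PalmDirection", "0.02 -0.97 0.24"),
                new XElement("WristPoint", "10.1 150.3 60.8")),
            new XElement("Finger",
                new XAttribute("Index", 0),                 // 0 = thumb
                new XElement("TipPosition", "45.0 210.7 20.3"),
                new XElement("Bone",
                    new XAttribute("Index", 1),             // 1 = proximal
                    new XElement("Direction", "1.78 0.44 1.95"),
                    new XElement("CenterPoint", "40.2 190.5 28.9"),
                    new XElement("FromPoint", "35.1 180.3 33.2"),
                    new XElement("ToPoint", "45.3 200.7 24.6"))));

        gesture.Save("gesture-sample.xml");  // the file used instead of a relational database
        Console.WriteLine(gesture);
    }
}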
Fig. 12: A snapshot from the classification phase.

Table I presents samples of the same letters, "ت" and "ج", signed by four participants. In the training phase, we repeated some gestures more than once because some participants felt tired and the device became too slow to detect due to heating.

TABLE I. SAMPLE OF THE "ت" AND "ج" LETTERS SIGNED BY THREE PARTICIPANTS
(Images of the gestures of the signs "ت" and "ج" for User 1, User 2, and User 3.)

1.5. The algorithm of gesture classification
The bone direction was used to detect the gesture meaning using the skeletal algorithm that both devices feature. To check the similarity of the letters, the comparison step between each pair of bones that have the same index in both hands is the core of our algorithm. The four steps of the proposed model's algorithm are as follows (a sketch of these steps in code is given after Table II):

1- Initializing values:
• Threshold = threshold number (angle);
• List[Sign details] = XML file;
• Local Info = coordinate values (X, Y, Z);
• IsSimilar(Local Info Value1, Local Info Value2, Threshold);
• Get RGB from Kinect; then draw a hand with vectors between each pair of joints, a line from the previous joint to the next joint, and points based on LMC coordinates;
• Calculate the direction value for each bone (previous joint, next joint).

2- Get angles:
• X = angle of the vector of point(i) around the X-axis;
• Y = angle of the vector of point(i) around the Y-axis;
• Z = angle of the vector of point(i) around the Z-axis.

3- IsSimilar(Local Info Value1(x, y, z), Local Info Value2(x, y, z), Threshold):
If (same hand index and same finger index):
• loop over all points to check the difference between value 1 and value 2;
• if the point value < threshold, it is matched;
• else it is not matched, and exit;
• calculate the average for (X, Y, Z);
• calculate the average of all points (all hand/finger joints).

4- If the result contains more than one hand, then select the hand with the minimum average.

For example, consider measuring the similarity of bone directions between the unknown gesture's skeleton points and one of the gestures in the dataset (e.g., the sign of the number 2). The following example presents the calculation process; "Fig. 13" shows the values of each point in the hand skeleton, and Table II contains the full point values obtained by LMC for the thumb and index finger (the same values exist for each bone in the hand).

Fig. 13: Values of some hand skeleton points.

TABLE II. VALUES OBTAINED BY LMC FOR EACH BONE IN THE HAND
(X1/Y1/Z1: live-sign angle values between each point and one coordinate axis, in radians, for the unknown sign; X2/Y2/Z2: the corresponding stored values for the sign of the number 2.)

No.  Finger    Bone  X1    Y1    Z1    X2    Y2    Z2
1    0-thumb   1     1.78  0.44  1.95  0.92  0.66  1.51
2    0-thumb   2     2.19  0.72  1.90  1.46  0.36  1.91
3    0-thumb   3     2.25  0.78  1.89  2.36  1.24  2.24
4    1-index   0     1.60  0.10  1.67  1.49  0.21  1.38
5    1-index   1     1.43  0.32  1.85  1.15  1.00  2.40
6    1-index   2     1.45  0.36  1.91  1.44  2.33  2.37
7    1-index   3     1.46  0.40  1.95  1.72  2.91  1.74
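As referenced above, the following is a minimal C# sketch of steps 1-4: every bone of the live gesture is compared with the bone that has the same hand, finger, and bone index in each stored sign; a stored sign is a candidate only if all of its bones fall under the threshold, and the candidate with the minimum average difference is returned. The Bone and StoredSign records are hypothetical stand-ins for the data loaded from the XML file, and the degree-based threshold follows the interpretation used in the earlier example.

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the bone data loaded from the XML dataset.
public record Bone(int Hand, int Finger, int Index, double X, double Y, double Z);
public record StoredSign(string Name, List<Bone> Bones);

public static class Classifier
{
    const double ThresholdDegrees = 15.0;

    static double ToDegrees(double radians) => radians * 180.0 / Math.PI;

    // Per-axis angular differences (degrees) between two bones of the same index.
    static double[] Differences(Bone live, Bone stored) => new[]
    {
        Math.Abs(ToDegrees(live.X) - ToDegrees(stored.X)),
        Math.Abs(ToDegrees(live.Y) - ToDegrees(stored.Y)),
        Math.Abs(ToDegrees(live.Z) - ToDegrees(stored.Z)),
    };

    public static string Classify(List<Bone> liveBones, List<StoredSign> dataset)
    {
        string bestName = "not matched";
        double bestAverage = double.MaxValue;

        foreach (StoredSign sign in dataset)
        {
            var averages = new List<double>();
            bool allMatched = true;

            foreach (Bone live in liveBones)
            {
                // Step 3: only bones with the same hand, finger, and bone index are compared.
                Bone stored = sign.Bones.FirstOrDefault(b =>
                    b.Hand == live.Hand && b.Finger == live.Finger && b.Index == live.Index);
                if (stored is null) { allMatched = false; break; }

                double[] diffs = Differences(live, stored);
                if (diffs.Any(d => d >= ThresholdDegrees)) { allMatched = false; break; }

                averages.Add(diffs.Average());
            }

            // Step 4: among fully matched signs, keep the one with the minimum average.
            if (allMatched && averages.Count > 0 && averages.Average() < bestAverage)
            {
                bestAverage = averages.Average();
                bestName = sign.Name;
            }
        }
        return bestName;
    }
}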
Applying the same calculations to every bone, the values in Table II become those shown in Table III. The last column of Table III tests whether the unknown letter matches the sign of the number two based on the threshold; there are four values over 15, which means the unknown sign did not match the sign of the number two.

Each letter passes through a comparison process consisting of:
• the same hand (left, right) = 2;
• the same finger, with the same index and direction (0 = thumb, 1 = index, 2 = middle, 3 = ring, 4 = little) = 5;
• the same bone, with the same index and direction (0 = metacarpal, 1 = proximal, 2 = intermediate, 3 = distal), where the thumb has only three bones, giving 3 + 4(4) = 19 comparisons.

Thus, if a letter received 19 "yes" results, it was a match; otherwise, it was not. If it matched more than one stored sign, the stored sign with the minimum average was taken.

The bar chart in "fig. 14" illustrates the comparison of each stored letter with the bones of the letter "ت", pronounced "ta". It was measured by counting the number of "yes" and "no" matches ("fig. 15"):
• "Yes" means all of the bone directions matched, and "no" means none were a match.
• The y-axis is the number of "yes" and "no" matches (between 0 and 19).
• The x-axis is the participant's sign for each stored letter; "u1-1-أ" means the first participant's first attempt at the letter "أ".

Overall, it can be seen that the letter "ت" matches correctly with participant 2 in his first attempt and participant 3 in his second attempt, where the comparison count was equal to 19. The line graph compares the average for the letter "ت" with all of the stored letters, and it shows that the minimum average matched the same letter for participant 2 in his first attempt, so that is taken as the matched result.

Some of the results were incorrect, such as those for the letter "هـ", which the devices could not capture correctly. This could be because many fingers need to cross over one another, or, since it was the final letter in the alphabet, the participants could have been fatigued or the device may have overheated. Even when the participant made the same sign gesture several times, the devices were not able to capture it properly, so one extra camera (perhaps another LMC) might have been needed to capture the sign from a different angle. Letters with multiple frames need a special algorithm to take certain frames from each sign so that the comparison becomes more efficient. In spite of these few cases, the recognition of the other letters was 100% correct, especially for the letters that had clear shapes, such as "ص، د، ل، ن، س، ب، ت، ث، ش، ي، م، ض".

VI. DISCUSSION AND CONCLUSION
This research examined 224 matching cases of 28 letters. We found the class name (correct letter) with high accuracy for more than 20 letters because the matched letter's values stayed within the 15-degree threshold.
Meanwhile, the values of the other letters were not only above the threshold but also very far from those of the same letter, because the devices obtained data from one direction only. Some critical cases included difficulty in counting the joints, or the palm being covered by the fingers. Thus, letters with overlapping fingers or letters with multiple frames need more processing to be recognized accurately.

The bone-direction feature used in our algorithm gave very high accuracy and was insensitive to the user's hand position. This means that if a participant stood in one position and made gestures, there was no need to stand in the same position in the next iteration; the hand did not need to be in the same position either. This was true even in the dynamic classification of letters (once the hand changes, the letter class changes immediately). However, the limitation of using both LMC and Kinect is that each depends on a single viewing direction. For accurate recognition of hand gestures with LMC, only one person should stay in the visible range, which is only 57 degrees in front of the device and within 50 cm of it, so the users' movement is restricted.

Using Kinect could possibly overcome a crucial problem for Arab hearing-impaired people: the lack of ArSL interpreters and the excessive variation of ArSL dialects. Hearing-impaired people who use such a platform (Kinect and Leap) as an interpreter could actually be offered the chance to be more interactive if the device is fed with enough data about these various dialects.
TABLE III. VALUES AFTER APPLYING THE CALCULATIONS
(For each bone: the per-axis angle differences between the unknown sign and the sign of the number 2, in degrees; the similarity percentage of each axis (value/360); the average similarity; and whether the result is less than the threshold (15).)

No.  Finger     Bone  X        Y        Z        X/360    Y/360    Z/360    Average  < Threshold (15)
1    0-thumb    1     33.623   12.494   7.580    9.34%    3.47%    2.11%    4.97%    Yes
2    0-thumb    2     20.807   3.700    22.642   5.78%    1.03%    6.29%    4.37%    Yes
3    0-thumb    3     8.033    24.307   17.238   2.23%    6.75%    4.79%    4.59%    Yes
4    1-index    0     3.023    7.658    7.740    0.84%    2.13%    2.15%    1.71%    Yes
5    1-index    1     3.272    62.643   63.421   0.91%    17.40%   17.62%   11.98%   Yes
6    1-index    2     6.997    103.946  35.900   1.94%    28.87%   9.97%    13.60%   Yes
7    1-index    3     8.744    123.427  11.890   2.43%    34.29%   3.30%    13.34%   Yes
8    2-middle   0     2.691    5.578    10.466   0.75%    1.55%    2.91%    1.73%    Yes
9    2-middle   1     19.019   15.298   7.568    5.28%    4.25%    2.10%    3.88%    Yes
10   2-middle   2     21.267   9.027    2.349    5.91%    2.51%    0.65%    3.02%    Yes
11   2-middle   3     22.976   3.749    2.229    6.38%    1.04%    0.62%    2.68%    Yes
12   3-ring     0     2.376    3.366    13.235   0.66%    0.93%    3.68%    1.76%    Yes
13   3-ring     1     5.441    53.755   89.312   1.51%    14.93%   24.81%   13.75%   Yes
14   3-ring     2     15.621   91.278   82.127   4.34%    25.35%   22.81%   17.50%   No
15   3-ring     3     19.905   110.122  61.538   5.53%    30.59%   17.09%   17.74%   No
16   4-little   0     1.180    1.591    15.439   0.33%    0.44%    4.29%    1.69%    Yes
17   4-little   1     28.895   35.547   94.157   8.03%    9.87%    26.15%   14.69%   Yes
18   4-little   2     46.680   69.417   85.296   12.97%   19.28%   23.69%   18.65%   No
19   4-little   3     53.066   84.655   67.394   14.74%   23.52%   18.72%   18.99%   No

Fig. 14: Bar chart of the evaluation of the "ت" letter.
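The columns of Table III can be reproduced from the per-axis differences: each difference (in degrees) is divided by 360 to give a similarity percentage, the three percentages are averaged, and, as the table applies it, the average is compared with the threshold of 15. A short C# sketch for row 1, with that row's values hard-coded purely for illustration:

using System;

// Reproduces one row of Table III from its per-axis angle differences.
public static class TableThreeRow
{
    public static void Main()
    {
        // Row 1 (thumb, bone 1): per-axis differences in degrees.
        double[] diffDegrees = { 33.623, 12.494, 7.580 };

        double sum = 0;
        foreach (double d in diffDegrees)
        {
            double percent = d / 360.0 * 100.0;    // "similarity percentage (value/360)" column
            sum += percent;
            Console.WriteLine($"{d:F3} deg -> {percent:F2}%");
        }

        double average = sum / diffDegrees.Length; // 4.97% for this row
        bool underThreshold = average < 15.0;      // "< Threshold (15)" column -> Yes
        Console.WriteLine($"average = {average:F2}%, under threshold = {underThreshold}");
    }
}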
Fig. 15: Line chart of the evaluation of the "ت" letter.

REFERENCES
[1] Sarkat, A., Sanyal, G., & Majumder, S. (2013). Hand gesture recognition system: A survey. International Journal of Computer Applications, 71(15), 0975–8887.
[2] Ballan, L., Taneja, A., Gall, J., Gool, L. V., & Pollefeys, M. (2012). Motion capture of hands in action using discriminative salient points. Paper presented at the European Conference on Computer Vision (ECCV), Florence, Italy. Springer Berlin Heidelberg.
[3] Oikonomidis, I., Kyriazis, N., & Argyros, A. A. (2011). Efficient model-based 3D tracking of hand articulations using Kinect. Paper presented at the British Machine Vision Conference (BMVC), UK. University of Dundee.
[4] Wang, R. Y., & Popović, J. (2009). Real-time hand-tracking with a color glove. ACM Transactions on Graphics (TOG), 28(3), 63.
[5] Hui, L., & Junsong, Y. (2014). Hand parsing and gesture recognition with a commodity depth camera. In L. S. Ling, J. Han, P. Kohli, & Z. Zhang (Eds.), Advances in computer vision and pattern recognition (pp. 239–265). Switzerland: Springer International Publishing.
[6] Liang, H., Yuan, J., Thalmann, D., & Zhang, Z. (2013). Model-based hand pose estimation via spatial-temporal hand parsing and 3D fingertip localization. The Visual Computer: International Journal of Computer Graphics, 29(6-8), 837–848.
[7] Lingchen, C., Feng, W., Hui, D., & Kaifan, J. (2013). A survey on hand gesture recognition. Proceedings of the International Conference on Computer Sciences and Applications, Kunming, China. IEEE Xplore Digital Library.
[8] Suarez, J., & Murphy, R. (2012). Hand gesture recognition with depth images: A review. Paper presented at the 21st IEEE International Symposium on Robot and Human Interactive Communication, Paris, France. IEEE Xplore Digital Library.
[9] Han, J., Shao, L., Xu, D., & Shotton, J. (2013). Enhanced computer vision with Microsoft Kinect sensor: A review. IEEE Trans. Cybern., 43(5), 2168–2267.
[10] Chen, X., Li, H., & Tansley, S. (2013). Kinect Sign Language Translator expands communication possibilities. Retrieved from http://research.microsoft.com/en-us/collaboration/stories/kinectforsignlanguage_cs.pdf
[11] Leap Motion – Best practices guidelines. (2015). Available from https://developer.leapmotion.com/documentation/
[12] Marin, G., Dominio, F., & Zanuttigh, P. (2015). Hand gesture recognition with jointly calibrated Leap Motion and depth sensor. Multimedia Tools and Applications, 1380-7501, 1–25.
[13] Murthy, G., & Jadon, R. (2009). Review of vision-based hand gesture recognition. International Journal of Information Technology and Knowledge Management, India, 2(2), 105–410.
[14] Ali, E., Mircea, N., Richard, B., & Xander, T. (2007). Vision-based hand pose estimation: A review. Computer Vision and Image Understanding, 108(1), 52–73.
[15] Hsiang, Y., & Han, J. (2014). Real-time dynamic hand gesture recognition. Proceedings of the International Symposium on Computer, Consumer and Control, Taichung. IEEE Xplore Digital Library.
[16] Li, Y. (2012, June). Hand gesture recognition using Kinect. In Software Engineering and Service Science (ICSESS), 2012 IEEE 3rd International Conference on (pp. 196–199). IEEE.
[17] SignSpeak. (2009).
Retrieved from http://www.signspeak.eu/en/index.html
[18] ClassAct. (n.d.). Retrieved from https://www.deaftec.org/classact
[19] tawasoul4arsl. (2012). Retrieved from http://tawasoul4arsl.ksu.edu.sa/
[20] Senghas, R. J., & Monaghan, L. (2002). Signs of their times: Deaf communities and the culture of language. Annual Review of Anthropology (JSTOR), 31, 69–97.
[21] Mohandes, M., Liu, J., & Deriche, M. (2014, February). A survey of image-based Arabic sign language recognition. In Multi-Conference on Systems, Signals & Devices (SSD), Barcelona, 2014 11th International (pp. 1–4). IEEE.
[22] Hui, L., Junsong, Y. Y., & Daniel, T. (2015). Resolving ambiguous hand pose predictions by exploiting part correlations. IEEE, 25(7), 1125–1139.
[23] Youssif, A., Aboutabl, A., & Ali, H. (2011). Arabic Sign Language (ArSL) recognition system using HMM. International Journal of Advanced Computer Science and Applications, 2(11), 45–51.
[24] El-Bendary, N., Zawbaa, H. M., Daoud, M. S., Hassanien, A. E., & Nakamatsu, K. (2010). ArSLAT: Arabic Sign Language Alphabets Translator. Paper presented at the 2010 International Conference on Computer Information Systems and Industrial Management Applications (CISIM), Krakow. IEEE Xplore Digital Library.
[25] Hemayed, E. E., & Hassanien, A. S. (2010). Edge-based recognizer for Arabic sign language alphabet (ArS2V: Arabic sign to voice). Paper presented at the Computer Engineering Conference (ICENCO), 2010 International, Giza. IEEE Xplore Digital Library.
[26] Mohandes, M., Aliyu, S., & Deriche, M. (2014). Arabic sign language recognition using the Leap Motion Controller. Paper presented at King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia. IEEE Xplore Digital Library.
[27] Samir, A., & Aboul-Ela, M. (2012). Error detection and correction approach for Arabic sign language recognition. Paper presented at the 2012 Seventh International Conference on Computer Engineering Systems (ICCES), Cairo. IEEE Xplore Digital Library.
[28] Shanableh, T., & Assaleh, K. (2007). Video-based feature extraction techniques for isolated Arabic sign language recognition. Paper presented at the 9th International Symposium on Signal Processing and Its Applications, Sharjah. IEEE Xplore Digital Library.
[29] Mohandes, M., Deriche, M., Johar, U., & Ilyas, S. (2012). A signer-independent Arabic Sign Language recognition system using face detection, geometric features, and a Hidden Markov Model. Computers & Electrical Engineering, 38(2), 422–433.
[30] Assaleh, K. (2010). Continuous Arabic sign language recognition in user dependent mode. Journal of Intelligent Learning Systems and Applications, 2(1), 19–27.
[31] Tolba, M. F., Samir, A., & Abul-Ela, M. (2012). A proposed graph matching technique for Arabic sign language continuous sentences recognition. Paper presented at the 2012 8th International Conference on Informatics and Systems (INFOS), Cairo. IEEE Xplore Digital Library.
[32] Spiegelmock, M. (2015). Leap Motion Development Essentials. Packt Publishing. Retrieved from www.packtpub.com/leap-motion-development-essentials/book
[33] Jana, A. (2012). Kinect for Windows SDK programming guide. Birmingham, UK: Packt Publishing Ltd.
[34] Chen, X., Li, H., Pan, T., Tansley, S., & Zhou, M.
Kinect Sign Language Translator expands communication possibilities. Retrieved from http://research.microsoft.com/en-us/collaboration/stories/kinect-sign-language-translator.aspx