Vision Based Dynamic Gesture Recognition on
Indian Sign Language
Geetha M∗, Manjusha C†, Unnikrishnan P‡ and Harikrishnan R‡
∗Dept. of Computer Science and Engg.
Amrita School of Engineering
Amritapuri, Kollam, Kerala, India
Abstract—Communication by or with the deaf and dumb is
based on the sign language followed in a country. Not everyone
is well versed in the sign language followed in their country. As
a result, the deaf and dumb find it difficult to communicate
with the people around them. The motive of our project is to
make communication easier for the deaf and dumb, thereby
serving society. Gesture recognition is an area of active research
in computer vision. While speech recognition has made rapid
advances, sign language recognition lags behind. Our project
aims to fill the need for technology for automated understanding
of sign language. While languages such as American Sign
Language (ASL) are widely studied in research and development,
Indian Sign Language (ISL) was standardized only recently, and
its recognition is therefore less explored.
We propose a novel method for the feature extraction stage of
dynamic gesture recognition. In simple terms, the system interprets
a video input (captured through Microsoft Kinect) of Indian Sign
Language (ISL) signs: given such a video, the software must
extract features from the signs and convert them to the intended
textual form.
After the initial preprocessing of the input and trajectory
extraction of the different fingers, these trajectories are given as
inputs to the feature extraction phase. Most feature extraction
methods for sign language recognition concentrate on either the
global or the local features. Our novel method integrates both
the local and global features present in a gesture. For global
feature extraction, we propose a new method using the concept
of the Axis of Least Inertia (ALI). For local features, we use
Principal Component Analysis (PCA).
Apart from serving as an aid to people with disabilities, the
system can also act as a sign language tutor or interpreter, and
be of use in electronic systems that take gesture input from users.
I. INTRODUCTION
The principal constituents of any sign language recognition
system are the hand gestures and shapes normally used by deaf
people to communicate among themselves. A gesture is defined
as a dynamic movement of the hands that creates signs such
as alphabets, numbers, words and sentences. Gestures are
classified into two types: static gestures and dynamic gestures.
Static gestures refer to a fixed pattern of hand and finger orien-
tation, whereas dynamic gestures involve movement and changing
orientation of the hands together with facial expressions, and are
largely used to recognize continuous streams of sentences. Our
method of gesture recognition is a vision based technique that
does not require motion sensor gloves or colored gloves for the
system to recognize hand shapes. A complete gesture recognition
system requires understanding of hand shapes, finger orientations,
hand tracking and facial expression tracking.
Accordingly, sign language recognition systems are classi-
fied into two broad categories: sensor-glove-based and vision-
based systems.
The first category requires signers to wear a sensor glove or
a colored glove. Wearing the glove simplifies the task of
segmentation during processing. Glove-based methods, however,
suffer from the drawback that the signer has to wear the sensor
hardware along with the glove whenever the system is in
operation.
In comparison, vision based systems use image processing
algorithms to detect and track hand signs as well as the facial
expressions of the signer, which is easier for the signer since no
gloves are worn. However, the accuracy of such image processing
algorithms remains an active research problem.
For recognition, methods that take into account both the
global hand motion (motion of the centroid during the gesture)
and the local motion (motion of the fingers with respect to the
centroid) are the most effective. Hence, we propose a novel
approach that uses both the local and global features of a
gesture for recognition.
In our method, we use dynamic gestures as input to the
system, with a view to eventually incorporating sentences as well
as words of ISL into the recognition system. Since we use the
depth information of the hands to identify the gestures, we use
Microsoft Kinect.
II. SYSTEM OVERVIEW
We propose a system which can identify and recognize
gestures of Indian Sign Language. These gestures are dynamic
and use both hands for expression. We use Microsoft Kinect to
input the gesture to the system. After the initial preprocessing
of the coordinate values, a B-spline trajectory is plotted for the
motion of the centroid and of the index, thumb, middle, ring and
small fingers. These trajectories are then subjected to feature
extraction, which extracts the local and global features that can
uniquely identify a particular gesture. For global feature
extraction, we propose a new method using the concept of the
Axis of Least Inertia (ALI). Twenty-five key points are extracted
from each gesture, and the distances from those points to the
ALI are computed. These distance vectors are
Fig. 1. Block diagram representing the structure of our system
Fig. 2. Skeletal mapping of the joints of the hand through Kinect
taken as the global features. For local features, the distance
from each fingertip to the centroid is computed in each frame,
and a matrix is created with all these distance values. This
matrix is then subjected to Principal Component Analysis (PCA).
An integration of both results helps us to identify the gesture
correctly.
A. Preprocessing
1) VIDEO INPUT: Microsoft Kinect's 3Gear Technologies
software provides an option to visualize the hand motion as a
skeletal hand on screen. We have used this option to obtain the
input coordinate values. The steps involved in this phase are:
1) The gesture is shown to the Kinect cameras, which sense
the motion of the hand as skeletal frames.
2) The program outputs the (x, y, z) coordinates in each
frame in the following format (a parsing sketch is given below):
(Left hand) Wrist, Centre of the palm, Thumb, Index finger,
Middle finger, Ring finger, Small finger
(Right hand) Wrist, Centre of the palm, Thumb, Index
finger, Middle finger, Ring finger, Small finger
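As a concrete illustration, the sketch below parses one frame of this output. It assumes each frame arrives as a single whitespace-separated line of 14 joints × 3 coordinates (left hand first, in the order listed above); the actual 3Gear output layout may differ, so the joint names and layout here are assumptions for illustration only.

```python
# Hypothetical parser for the per-frame joint output described above.
JOINTS = ["wrist", "palm_centre", "thumb", "index", "middle", "ring", "small"]

def parse_frame(line):
    """Parse one frame: 2 hands x 7 joints x (x, y, z), whitespace separated."""
    values = list(map(float, line.split()))
    assert len(values) == 2 * len(JOINTS) * 3, "unexpected number of coordinates"
    frame = {"left": {}, "right": {}}
    it = iter(values)
    for hand in ("left", "right"):           # left hand is assumed to come first
        for joint in JOINTS:
            frame[hand][joint] = (next(it), next(it), next(it))  # (x, y, z)
    return frame
```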
Fig. 3. (Top) Left-hand and (bottom) right-hand trajectories for the word Valley
after B-spline approximation
Fig. 4. (Left) Left-hand and (right) right-hand normalized trajectories for the word
Valley
2) CONVERSION TO POSITIVE: The coordinates ob-
tained from the Kinect SDK can be negative. These have to be
converted to positive values in order to plot them onto an image.
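A minimal sketch of this step, assuming we simply translate all points by their per-axis minimum so that every coordinate becomes non-negative before plotting:

```python
import numpy as np

def to_positive(points):
    """Shift a set of points so all coordinates are non-negative.

    The Kinect SDK can report negative coordinates, so we translate by the
    per-axis minimum before plotting the trajectory onto an image.
    """
    pts = np.asarray(points, dtype=float)
    return pts - pts.min(axis=0)
```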
B. B-Spline Approximation
1) B-Spline Plotting:
The B-spline is one of the most efficient curve
representations, and possesses very attractive properties such
as spatial uniqueness, boundedness and continuity, local shape
controllability, and invariance to affine transformations. Very
little work, however, has been done on their use for recognition
purposes. A closed cubic B-spline with n + 1 control points
C0, C1, ..., Cn consists of n + 1 connected curve segments

r_i(t) = (x_i(t), y_i(t)) (1)

where x_i and y_i are linear combinations of four cubic polyno-
mials in the parameter t, and t is normalized for each such
segment to lie between 0 and 1 (0 ≤ t ≤ 1).
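The sketch below evaluates such a closed uniform cubic B-spline. It assumes the control points are the 2-D trajectory points collected per frame, and uses the standard uniform cubic B-spline blending polynomials, with indices wrapping around to close the curve.

```python
import numpy as np

def closed_cubic_bspline(control_points, samples_per_segment=20):
    """Sample a closed uniform cubic B-spline through the given control points.

    A sketch of Eq. (1): each segment r_i(t) is a cubic blend of four
    consecutive control points, wrapping around for a closed curve.
    """
    C = np.asarray(control_points, dtype=float)    # shape (n+1, 2)
    n1 = len(C)
    t = np.linspace(0.0, 1.0, samples_per_segment, endpoint=False)
    # Uniform cubic B-spline basis functions evaluated at t.
    B = np.stack([(1 - t) ** 3,
                  3 * t ** 3 - 6 * t ** 2 + 4,
                  -3 * t ** 3 + 3 * t ** 2 + 3 * t + 1,
                  t ** 3], axis=1) / 6.0            # shape (samples, 4)
    curve = []
    for i in range(n1):                             # one segment per control point
        P = C[[i, (i + 1) % n1, (i + 2) % n1, (i + 3) % n1]]
        curve.append(B @ P)                         # (samples, 2) segment points
    return np.vstack(curve)
```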
2) Normalization: Different people may show the same
gesture at different scales. Scalability can therefore be an
issue during processing. To overcome this problem, we
normalize the trajectory so that, no matter how the gesture is
shown, it always fits into a 100 × 100 frame.
xv = xv_min + ((xw − xw_min) ∗ scale_x) (2)
yv = yv_min + ((yw − yw_min) ∗ scale_y) (3)

Here xv_min and xv_max are the minimum and maximum x-coordinates
of the normalized frame, xw_min and xw_max are the minimum and
maximum x-coordinates of the non-normalized trajectory, and
yv_min, yv_max, yw_min and yw_max are defined analogously for
the y-coordinates. (xw, yw) is a point on the non-normalized
B-spline trajectory and (xv, yv) is the corresponding normalized
point.
Data: B-spline points to be normalized
Result: Normalized B-spline curve
initialization;
scale_x ← (xv_max − xv_min)/(xw_max − xw_min);
scale_y ← (yv_max − yv_min)/(yw_max − yw_min);
foreach point (xw, yw) do
xv ← (int)(xv_min + ((xw − xw_min) ∗ scale_x));
yv ← (int)(yv_min + ((yw − yw_min) ∗ scale_y));
end
Algorithm 1: Algorithm for Normalization
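A minimal Python sketch of Algorithm 1, assuming the normalized frame spans [0, 100) on both axes (i.e. xv_min = yv_min = 0 and the frame size is 100):

```python
import numpy as np

def normalize_trajectory(points, size=100):
    """Scale a B-spline trajectory into a size x size frame (Algorithm 1)."""
    pts = np.asarray(points, dtype=float)
    w_min = pts.min(axis=0)                     # (xw_min, yw_min)
    w_max = pts.max(axis=0)                     # (xw_max, yw_max)
    span = np.maximum(w_max - w_min, 1e-9)      # guard against a degenerate axis
    scale = (size - 1) / span                   # (scale_x, scale_y), Eqs. (2)-(3)
    return ((pts - w_min) * scale).astype(int)  # integer pixel coordinates
```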
C. FEATURE EXTRACTION
In order to recognize gestures based on the input values,
feature vectors play a commandalble part. This makes the
extraction of feature vectors a.k.a Feature Extraction extremely
important in any gesture recognition related work. Our method
takes into account both the local and global features associated
with a gesture. Global Feature Vectors , are extracted on the
basis of the centroid of the hand, while local feature vectors
are extracted according to the finger movements.
1) GLOBAL FEATURE EXTRACTION:
The proposed method is based on the concept of the Axis
of Least Inertia.
a) AXIS OF LEAST INERTIA:
The axis of least inertia (ALI) of a shape is defined as the
line for which the integral of the square of the distances to
points on the shape boundary is a minimum. Once the ALI
is calculated, each point on the shape curve is projected onto
the ALI. The two farthest projected points on the ALI, say E1
and E2, are chosen as the extreme points, as shown in Fig. 6.
The Euclidean distance between these two extreme points
defines the length of the ALI.
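A computational sketch of the ALI: the line passes through the centroid of the points along the principal eigenvector of their covariance matrix, which is the direction minimizing the summed squared perpendicular distances. The function below also returns the extreme projected points E1 and E2.

```python
import numpy as np

def axis_of_least_inertia(points):
    """Compute the ALI of a 2-D point set.

    Returns the centroid, the unit direction of the ALI, and the two
    extreme projected points E1 and E2 (their distance is the ALI length).
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    centred = pts - centroid
    eigvals, eigvecs = np.linalg.eigh(np.cov(centred.T))
    direction = eigvecs[:, np.argmax(eigvals)]   # unit vector along the ALI
    proj = centred @ direction                   # scalar projections onto the ALI
    e1 = centroid + proj.min() * direction       # extreme point E1
    e2 = centroid + proj.max() * direction       # extreme point E2
    return centroid, direction, e1, e2
```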
2) LOCAL FEATURE EXTRACTION: The initial step in this
phase is to find the local features. For every frame, the distance
from each fingertip to the centroid is calculated and stored in a
matrix.
Data: B-spline trajectory and ALI of a gesture
Result: Global feature vectors
1. Find the key points in the B-spline trajectory;
2. Reduce the number of key points using clustering;
3. Take into account a fixed number of other points
excluding the key points;
4. Obtain a fixed number of selected points (key points +
other points);
foreach selected point do
find the distance from the point to the ALI;
end
5. The resultant distance vector is taken as the global
feature vector;
Algorithm 2: Algorithm for Extraction of Global Feature
Vectors
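A sketch of the distance computation in Algorithm 2, assuming the 25 selected points (key points plus the additional points) have already been chosen; the clustering-based key-point selection itself is not shown here.

```python
import numpy as np

def global_features(selected_points, centroid, direction):
    """Distances from the selected trajectory points to the ALI (Algorithm 2).

    The perpendicular distance of a point from the line through `centroid`
    with unit direction `direction` is the magnitude of the 2-D cross
    product of the direction with the centred point.
    """
    pts = np.asarray(selected_points, dtype=float) - centroid
    # |d x p| in 2-D = |d_x * p_y - d_y * p_x|
    return np.abs(direction[0] * pts[:, 1] - direction[1] * pts[:, 0])
```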
Fig. 5. 25 global features of the word Valley
a) PRINCIPAL COMPONENT ANALYSIS: Principal
component analysis (PCA) is a mathematical procedure that
uses an orthogonal transformation to convert a set of obser-
vations of possibly correlated variables into a set of values
of linearly uncorrelated variables called principal components.
The number of principal components is less than or equal
to the number of original variables. This transformation is
defined in such a way that the first principal component has
the largest possible variance (that is, accounts for as much of
the variability in the data as possible), and each succeeding
component in turn has the highest variance possible under the
constraint that it be orthogonal to (i.e., uncorrelated with) the
preceding components. Principal components are guaranteed
to be independent only if the data set is jointly normally
distributed. PCA is sensitive to the relative scaling of the
original variables.
STEPS IN PCA
1) From the input matrix, find the covariance matrix.
2) From the covariance matrix, find the eigenvalues
and eigenvectors.
3) Select the eigenvectors to be multiplied with the input
matrix based on the eigenvalues.
4) Multiply the input matrix with the selected eigen-
vectors to get the coefficients.
Data: Fingertip-to-centroid distances for all frames
Result: Local feature vectors
1. foreach fingertip do
InputMatrix ← Distance(fingertip, centroid) for every frame;
end
2. ResultantMatrix ← InputMatrix − Mean;
3. CovarianceMatrix ← ResultantMatrix^T · ResultantMatrix/(N − 1);
4. From the covariance matrix, find the eigenvalues and
eigenvectors;
5. Sort the eigenvalues and find the number of eigenvalues
to be considered:
for i = 1 to N do Val ← sum(i)/TotalSum;
if Val > 0.95 then choose i as the number of components
required;
6. Take the corresponding eigenvectors and multiply them
with the transpose of the input matrix;
7. The result gives the coefficients, which are the local
feature vectors;
Algorithm 3: Algorithm for finding Local Features using
PCA
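A minimal sketch of Algorithm 3, assuming the input matrix has one row per frame and one column per fingertip distance. The projection step is written as centred data times selected eigenvectors, which is equivalent to multiplying the chosen eigenvectors with the transpose of the centred input matrix as described above.

```python
import numpy as np

def local_features(distance_matrix, variance_threshold=0.95):
    """PCA on the fingertip-to-centroid distance matrix (Algorithm 3)."""
    X = np.asarray(distance_matrix, dtype=float)
    centred = X - X.mean(axis=0)                  # step 2: subtract the mean
    cov = centred.T @ centred / (len(X) - 1)      # step 3: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # step 4: eigen-decomposition
    order = np.argsort(eigvals)[::-1]             # step 5: sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()      # cumulative explained variance
    k = int(np.searchsorted(cum, variance_threshold) + 1)
    return centred @ eigvecs[:, :k]               # steps 6-7: PCA coefficients
```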
III. RECOGNITION
A. RECOGNITION USING GLOBAL FEATURES
From global feature extraction, we have 25 distance
vectors and 15 coefficients corresponding to each gesture.
When a new gesture arrives, its feature vectors are compared
with the stored feature vectors using the Euclidean distance.
The gesture with the minimum Euclidean distance is given as
the recognized gesture.
B. RECOGNITION USING LOCAL FEATURES
Corresponding to each finger-centroid pair we have 15 coeffi-
cient values, giving a total of 75 local feature values. To
simplify matters, we consider the feature vectors of the thumb,
middle finger and small finger alone, reducing the number of
feature values to 45.
Any new gesture is compared with these stored feature vectors:
the Euclidean distance between feature vectors is computed, and
the gesture with the minimum Euclidean distance is given as the
recognized output. The gestures recognized from the local
features and from the global features are then integrated to
produce the final result (a matching sketch follows).
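A sketch of this matching step. The stored training feature vectors, their labels, and the rule used when the global and local decisions disagree are assumptions here, since the integration rule is not specified in the text.

```python
import numpy as np

def recognize(test_global, test_local, train_global, train_local, labels):
    """Nearest-neighbour matching of global and local feature vectors."""
    d_global = np.linalg.norm(np.asarray(train_global) - test_global, axis=1)
    d_local = np.linalg.norm(np.asarray(train_local) - test_local, axis=1)
    best_global = labels[int(np.argmin(d_global))]   # global-feature decision
    best_local = labels[int(np.argmin(d_local))]     # local-feature decision
    if best_global == best_local:
        return best_global
    # The integration rule for disagreements is not stated in the paper;
    # as a placeholder we fall back to the global-feature match.
    return best_global
```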
IV. CONCLUSION
The proposed gesture recognition system can handle dif-
ferent types of words on a common vision based platform. Our
approach uses both local and global features for recognition,
which improves recognition accuracy, making the system a
promising approach in this field. The system is suitable for
complex dynamic ISL signs. However, it should be noted that
the proposed gesture recognizer cannot be considered a complete
sign language recognizer, since complete recognition of sign
language also requires information about other body parts, i.e.,
the head, arms and facial expressions. The experimental results
show that the system is sufficient to claim a "working system"
for native Indian Sign Language recognition. The system is
designed to support recognition of words in ISL; it can be
enhanced to recognize continuous sentences as well.

Fig. 6. An example ALI for a shape
REFERENCES
[1] T. Starner and A. Pentland, ”Real-time american sign language recogni-
tion from video using hidden markov models”, Technical Report, M.I.T
Media Laboratory Perceptual Computing Section, Technical Report No.
375, 1995.
[2] Xiaodong Yang and YingLi Tian , Eigen Joints Based Action Recognition
Using Nave Bayes Nearest Neighbour, THE CITY COLLEGE OF NEW
YORK, NEW YORK.
[3] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis.New
York: Wiley, 1973.
[4] Sushmita Mitra, Senior Member, IEEE, and Tinku Acharya, Senior Mem-
ber, IEEE, Gesture Recognition: A Survey, IEEE TRANSACTIONS ON
SYSTEMS, MAN, AND CYBERNETICSPART C: APPLICATIONS
AND REVIEWS, VOL. 37, NO. 3, MAY 2007.
[5] Geetha M and Manjusha U C , A vision based Recogniton of Indian
Sign language Alphabets and Numerals using Bspline approximation,
INTERNATIONAL JOURNAL ON COMPUTER SCIENCE AND EN-
GINEERING(IJCSE), VOL. 4, NO. 3, MARCH 2012.
[6] Matthew Tang, Recognising hand gestures with Microsoft Kinect, STAN-
FORD UNIVERSITY.
[7] Rafiqul Zaman Khan and Noor Adnan Ibraheem, COMPARITIVE
STUDY OF HAND GESTURE RECOGNITION SYSTEM , A. M. U ,
Aligarh, India
[8] Yang Mingqiang,Kplama Kidiyo and Ronsin Joseph , A survey of
shape feature extraction Techniques, IEEE TRANSACTIONS ON SYS-
TEMS, MAN, AND CYBERNETICSPART C: APPLICATIONS AND
REVIEWS, VOL. 30, NO. 2, MAY 2000.
[9] Mu-Chun Su , A Fuzzy Rule Based Approach to Spatio Temporal Hand
Gesture Recognition, PATTERN RECOGNITION, PENG-YENG YIN,
VERSION. 1, NO. 43-90, 2008
[10] Sylvie C.W. Ong and Surendra Ranganath , Automatic Sign Language
Analysis: A Survey and the Future beyond Lexical Meaning, IEEE
TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE
