Signal & Image Processing : An International Journal (SIPIJ) Vol.4, No.4, August 2013
DOI : 10.5121/sipij.2013.4408
FEATURE EXTRACTION USING MFCC
Shikha Gupta1, Jafreezal Jaafar2, Wan Fatimah wan Ahmad3 and Arpit Bansal4
Universiti Teknologi PETRONAS, CIS Dept, Perak, Malaysia
Shikha.cs88@gmail.com1, jafreez@petronas.com.my2, fatimhd@petronas.com.my3
4 Indian Institute of Information and Technology, Allahabad, India
Arpit06bansal@gmail.com
ABSTRACT
Mel Frequency Cepstral Coefficients (MFCC) are a very common and efficient technique for signal processing. This
paper presents a new application of MFCC: hand gesture recognition. The
objective is to explore the utility of MFCC for image
processing. Until now, MFCC has mainly been used in speech recognition and speaker identification. The present system
converts the hand gesture image into a one-dimensional (1-D) signal and then extracts the first 13
MFCCs from the converted 1-D signal. Classification is performed using a Support Vector Machine (SVM).
Experimental results show that the proposed application of MFCC to gesture recognition achieves
an overall accuracy of 93.6% on ten gesture classes and hence can be used for recognition of sign language or for other household
applications, possibly in combination with other techniques such as Gabor filters and DWT to increase the accuracy
rate and make it more efficient.
KEYWORDS
Hand gesture, 1D signal, MFCC (Mel Frequency Cepstral Coefficient), SVM (Support Vector Machine).
1. INTRODUCTION
Currently, there is a great focus on developing easy, comfortable interfaces through which humans can
communicate with computers using their natural communication and manipulation skills.
In human-computer interaction (HCI), the input domain requires capturing and then interpreting the face, facial
expressions, arms, hands and sometimes whole-body motion. Among all these inputs, gesture is a
powerful means of communication among human beings. The use of
gesture is common even while talking on the telephone. A gesture recognition system has
two phases. The first is the feature extraction phase, in which a few
values are assigned to each gesture by applying specific methods to a training dataset. It involves extracting the important
information associated with the given gesture and discarding the remaining
information. The second phase is classification, in which the intended gesture is analyzed on the basis of the training and
testing databases. Mel Frequency Cepstral
Coefficients (MFCC) are very common and among the best methods for feature extraction from
1D signals. This paper therefore presents an application of MFCC to hand gesture
recognition. Features are extracted after converting the input image into a 1D signal. For classification
purposes, a Support Vector Machine (SVM) is used. SVM is a supervised learning method. A benefit of SVM is that it
can use kernels for non-linear data transformation. The principle behind SVM is to divide
the given data into two distinct categories by finding a hyperplane that separates the
classes.
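As a rough illustration of the separating-hyperplane idea, the sketch below evaluates a linear decision rule sign(w·x + b) and an RBF kernel, one common choice for the non-linear case. The weight vector, bias, gamma value and sample points are made-up values for illustration only; they are not parameters reported in this paper.

```python
import math

def linear_decision(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane w.x + b = 0 the point x lies."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - y||^2); gamma=0.5 is an arbitrary illustrative value."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical hyperplane x1 + x2 - 1 = 0 in two dimensions.
w, b = [1.0, 1.0], -1.0
print(linear_decision(w, b, [2.0, 2.0]))   # point on the positive side of the line
print(linear_decision(w, b, [0.0, 0.0]))   # point on the negative side of the line
print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))  # identical points give the maximum kernel value
```

A trained SVM would learn w and b (or the kernel expansion) from the training data; here they are fixed by hand only to show how a point is assigned to one of the two categories.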
The rest of the paper is organized as follows: Section 2 describes the recognition
system for hand gestures, including the details of the Mel Frequency Cepstral
Coefficient computation. Section 3 presents the experimental results and discussion. The final section concludes
the paper with a brief description of possible future work.
2. RECOGNITION SYSTEM FOR HAND GESTURE
The proposed gesture recognition system is divided into three stages, as shown in
Figure 1: image conversion from 2D to a 1D signal, feature extraction, and feature matching, also
known as classification. The image converted to a 1D signal is given as input to MFCC for
coefficient extraction. Feature extraction strips away the unnecessary
data from the given training data, leaving behind the information that is important for classification. The output of
MFCC is a matrix of feature vectors extracted from all the frames. In this output
matrix, the rows correspond to frame numbers and the columns to
feature vector coefficients [1-4].
Figure 1: Proposed System
Finally, this output matrix is used for the classification process, which is divided
into two stages: a training phase and a testing phase. Feature extraction plays a very important role in
the recognition process. It is essentially a process of dimension reduction or feature reduction,
as it eliminates the irrelevant data present in the given input while retaining the
important information. Several feature extraction techniques [5-14] exist for gesture
recognition, but in this paper MFCC has been used for feature extraction, which is mainly used
for speech recognition systems. The purpose of using MFCC for image processing is to extend
the usefulness of MFCC to the field of image processing as well. According to previous studies, MFCC
has already been applied to identification of satellite images [15], face recognition [16] and
palm print recognition [17].
The steps for calculating MFCC for hand gestures are the same as for a 1D signal [18-21]. Since
MFCC works on 1D signals and the input is a 2D image, the input image is first converted
from 2D to a 1D signal. The remaining calculation for feature extraction is the same as for speech
signals, as shown in Figure 2.
Figure 2: Step by Step MFCC Processing
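The paper does not state how the 2D image is scanned into a 1D signal. One plausible reading is a simple row-by-row (row-major) scan, sketched below; the function name `image_to_signal` and the tiny sample image are hypothetical and serve only as an illustration.

```python
def image_to_signal(image):
    """Flatten a 2D grayscale image (a list of rows) into a 1D list, row by row.
    Row-major scanning is an assumption; the paper does not specify the scan order."""
    return [pixel for row in image for pixel in row]

# Hypothetical 3x4 grayscale image with intensities in 0..255.
image = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
]
signal = image_to_signal(image)
print(len(signal))   # number of samples equals rows * columns
print(signal[:5])    # the first row followed by the start of the second row
```

Other scan orders (column-major, zig-zag) would produce a different 1D signal and therefore different MFCCs, so the choice should be kept fixed between training and testing.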
A speech signal is quasi-stationary in nature: its characteristics remain the same over a
short time period and change over a longer period.
MFCC is therefore based on short-time analysis, so for feature extraction the 1D image
signal is first broken up into frames. The continuous signal is blocked into frames of N samples; to
prevent loss of information, adjacent frames are separated by M samples, where M < N.
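The framing step can be sketched as follows, using the values N = 256 and M = 100 mentioned in this section as defaults. The helper name `frame_signal` and the choice to drop trailing samples that do not fill a whole frame are assumptions made for this sketch.

```python
def frame_signal(signal, frame_len=256, hop=100):
    """Split a 1D signal into overlapping frames of frame_len samples,
    with a new frame starting every hop samples (hop < frame_len).
    Trailing samples that do not fill a whole frame are dropped (an assumption)."""
    if not 0 < hop < frame_len:
        raise ValueError("hop must satisfy 0 < hop < frame_len")
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += hop
    return frames

signal = list(range(1000))       # stand-in for a flattened image
frames = frame_signal(signal)    # N = 256, M = 100
print(len(frames))               # number of complete frames
print(frames[1][0])              # index of the first sample of the second frame
print(256 - 100)                 # overlap between adjacent frames, N - M
```

With these values adjacent frames share N - M = 156 samples, which is roughly the 60% overlap referred to in the text.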
Hence, the first frame consists of the first N samples, the second frame begins M
samples after the first frame and overlaps it by N - M samples, and so on. This overlap
avoids loss of information. According to the study, an overlap of 60% is sufficient. Typical
values are N = 256 and M = 100.
After framing of the 1D signal, the next step is windowing using the Hamming window
technique. Windowing is done to reduce the discontinuity at the beginning and at the end of
the frame. The Hamming window has the form $W(n) = 0.54 - 0.46\cos(2\pi n/(N-1))$, $0 \le n \le N-1$.
After windowing, each frame, which consists of N samples, is converted from the time domain to the
frequency domain using the FFT, a fast algorithm for computing the DFT (Discrete Fourier
Transform). The DFT yields the magnitude spectrum, which is further transformed
onto the Mel frequency scale. The mapping is done between the real frequency and the perceived
frequency scale. It is virtually linear below 1000 Hz and logarithmic above it, as
represented by
$m_f = 2595 \log_{10}(1 + f/700)$ (1)
where $m_f$ represents the mel-scale frequency corresponding to the actual frequency f.
Finally, the log of this Mel spectrum is processed to obtain the cepstral coefficients by taking the
inverse DCT (Discrete Cosine Transform). For implementation, the first 13 coefficients are
taken.
3. EXPERIMENTAL RESULTS AND DISCUSSION
In this section, an experiment is carried out for gesture recognition using MFCC. For the experiment, hand
posture images are taken from the Jochen Triesch dataset. This database has images with varied
scales and orientations. It contains gray scale images of 10 different gestures made by
24 different users with different scales, orientations and backgrounds. Here, however, only
static background images are used, as shown below in Figure 3.
Figure 3: Sample hand posture images from Jochen Triesch dataset
After following the same procedure for the 2D images as used for a 1D signal, in which the first 13
MFCCs are extracted for each given input, plotting these extracted features shows that
gestures within the same class, even when made by different users, still look similar compared with the
MFCCs of different classes. This is shown in Figure 4, which presents the raw images taken by
different users with variation in scale and angle, together with their respective MFCC representations.
In the given graphs, the X axis represents the number of frames, or equivalently the number of
MFCC coefficients extracted from the given input signal, and the Y axis represents the feature
vector values for each frame.
Figure 4: first 13 MFCCs extracted from the gray scaled images
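The per-frame MFCC steps described in Section 2 (Hamming window, DFT magnitude, mel mapping, logarithm, cosine transform, first 13 coefficients) can be sketched in plain Python as below. This is a minimal illustration and not the authors' implementation: the nominal sampling rate `fs = 8000`, the 20 triangular mel filters, the bin-mapping formula, the use of an unnormalised DCT-II for the final transform, and the synthetic input frame are all assumptions introduced here.

```python
import cmath
import math

def hamming(N):
    """Hamming window W(n) = 0.54 - 0.46 cos(2*pi*n / (N - 1)), 0 <= n <= N - 1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def hz_to_mel(f):
    """Mel-scale value for a frequency f in Hz."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def magnitude_spectrum(frame):
    """DFT magnitudes for bins 0..N/2 (naive O(N^2) DFT, adequate for a sketch)."""
    N = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
            for k in range(N // 2 + 1)]

def mel_filterbank(num_filters, N, fs):
    """Triangular filters whose edges are evenly spaced on the mel scale from 0 to fs/2."""
    top = hz_to_mel(fs / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bins = [int(math.floor((N + 1) * f / fs)) for f in edges_hz]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (N // 2 + 1)
        for k in range(left, centre):        # rising slope
            filt[k] = (k - left) / (centre - left)
        for k in range(centre, right):       # falling slope
            filt[k] = (right - k) / (right - centre)
        bank.append(filt)
    return bank

def dct2(values, num_coeffs):
    """First num_coeffs coefficients of the unnormalised DCT-II."""
    K = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / K) for i, v in enumerate(values))
            for c in range(num_coeffs)]

def mfcc_frame(frame, fs=8000, num_filters=20, num_coeffs=13):
    """MFCCs of a single frame: window, spectrum, mel energies, log, cosine transform."""
    N = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(N))]
    spectrum = magnitude_spectrum(windowed)
    energies = [sum(f * s for f, s in zip(filt, spectrum))
                for filt in mel_filterbank(num_filters, N, fs)]
    log_energies = [math.log(e + 1e-10) for e in energies]  # small floor avoids log(0)
    return dct2(log_energies, num_coeffs)

# Hypothetical frame: 256 samples of a synthetic intensity pattern.
frame = [128 + 100 * math.sin(2 * math.pi * 5 * n / 256) for n in range(256)]
coeffs = mfcc_frame(frame)
print(len(coeffs))  # one value per retained coefficient
```

Applying `mfcc_frame` to every frame of a flattened image and stacking the results row by row would give a matrix with one row per frame and 13 columns, matching the output layout described in Section 2. Because an image has no physical sampling rate, `fs` only fixes how the mel filters are spread over the spectrum.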
After extracting the MFCC coefficients, the feature vectors were classified using an SVM
classifier. With SVM there are two choices: classification can be done using either
One-against-One or One-against-All. The One-against-One approach is a pairwise method that
requires m(m-1)/2 SVM classifiers to be trained, where m is the number of classes. The
confusion matrix after classification is shown below in Table 1.
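To make the One-against-One count concrete, the short sketch below enumerates the class pairs for the ten gesture classes used in this experiment. The function name `one_vs_one_pairs` is hypothetical; only the formula m(m-1)/2 comes from the text.

```python
from itertools import combinations

def one_vs_one_pairs(num_classes):
    """List every unordered pair of class labels; each pair needs one binary SVM."""
    return list(combinations(range(1, num_classes + 1), 2))

m = 10                                # number of gesture classes in the experiment
pairs = one_vs_one_pairs(m)
print(len(pairs), m * (m - 1) // 2)   # both values equal 45
print(pairs[:3])                      # the first few class pairs
```

By comparison, a One-against-All scheme would train only m = 10 classifiers, each separating one class from all the others.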
The matrix displays the classification results obtained with SVM. In this work
the classification is done between ten different classes. The confusion matrix shows the accuracy
and the false alarm rate. How accuracy is presented in a confusion matrix
is a matter of convention. In this matrix, the diagonal gives the accuracy for the
respective classes, and the columns give the gestures (instances) in the
determined (predicted) class, while the rows give the gestures in the actual class. The
misclassification rates appear off the diagonal as non-zero values.
            1      2      3      4      5      6      7      8      9     10
 1        100      0      0      0      0      0      0      0      0      0
 2          8     92      0      0      0      0      0      0      0      0
 3          0      0  83.33  16.67      0      0      0      0      0      0
 4          0      0      0     90      0      0     10      0      0      0
 5          0      0      0      0     93      7      0      0      0      0
 6          0      0      0      0      8     92      0      0      0      0
 7          0      0      0      0      0      0    100      0      0      0
 8          0      0      0      0      0      0      0    100      0      0
 9      16.67      0      0      0      0      0      0      0  83.33      0
10          0      0      0      0      0      0      0      0      0    100
Table 1: Confusion matrix (rows: actual class; columns: predicted class; values in %)
From the above matrix it is clear that when class 1 was classified against the remaining classes,
the accuracy for that class was 100%. This means that all the gestures from
class 1 were identified correctly as positive-class gestures during classification. For
class 2, the accuracy is 92% with an error rate of 8%:
8% of the gestures from class 2 were wrongly identified as class 1 gestures, so the false negative
rate is 8% and the true positive rate is 92% for class 2.
For class 3, the accuracy is only 83.33%, which is low compared with classes 1 and
2. When class 3 was classified against the other classes, the
misclassification was zero for all classes except class 4: 16.67% of the images from class 3 were erroneously
identified as class 4 images, so all other columns have zero value while the column for class
4 has the value 16.67%. Similarly, for class 4 the overall accuracy is 90%, with a
10% misclassification rate with class 7.
For class 5 and class 6 the accuracy is 93% and 92% respectively. The
misclassification rates show that when class 5 was classified against the other classes, the
misclassification rate was 0% for all classes except class 6, with a false alarm rate of 7%. For
class 6 the misclassification rate is likewise 0% for all other classes except class 5, with an
error rate of 8%.
Finally, the accuracy is 100% for the remaining classes, namely class 7, class 8 and class 10,
except for class 9, which has an identification rate of 83.33%, again low compared with the
accuracy of the other classes.
Finally, for the overall accuracy, all the diagonal percentages are summed and the sum is
divided by the total number of classes to obtain the overall performance of the algorithm in terms
of accuracy rate. In this research ten classes were used, and
the system accuracy achieved was 93.6%. From the attained accuracy rate it is believed
that the proposed application is feasible for hand gesture recognition.
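The averaging described above can be reproduced directly from the entries of Table 1, as in the sketch below. The helper names are hypothetical. Averaging the diagonal exactly as printed in the table gives about 93.37%, slightly below the reported 93.6%; the difference may come from rounding of the table entries or from how the reported figure was computed.

```python
# Confusion matrix from Table 1 (rows: actual class, columns: predicted class, values in %).
confusion = [
    [100, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [8, 92, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 83.33, 16.67, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 90, 0, 0, 10, 0, 0, 0],
    [0, 0, 0, 0, 93, 7, 0, 0, 0, 0],
    [0, 0, 0, 0, 8, 92, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 100, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 100, 0, 0],
    [16.67, 0, 0, 0, 0, 0, 0, 0, 83.33, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 100],
]

def overall_accuracy(matrix):
    """Mean of the diagonal entries, i.e. the average per-class accuracy."""
    diagonal = [matrix[i][i] for i in range(len(matrix))]
    return sum(diagonal) / len(diagonal)

def misclassifications(matrix):
    """Return (actual, predicted, rate) for every non-zero off-diagonal entry, 1-based labels."""
    return [(i + 1, j + 1, rate)
            for i, row in enumerate(matrix)
            for j, rate in enumerate(row)
            if i != j and rate != 0]

print(round(overall_accuracy(confusion), 2))   # 93.37 for the entries as printed
for actual, predicted, rate in misclassifications(confusion):
    print(f"class {actual} mistaken for class {predicted}: {rate}%")
```

The list of off-diagonal entries matches the discussion above: confusions occur for the pairs 2→1, 3→4, 4→7, 5→6, 6→5 and 9→1.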
4. CONCLUSION AND FUTURE WORK
This paper has presented a feasible method for hand gesture recognition using MFCC. In this
work the input is converted from a 2D image to a 1D signal, which is given as input to the Mel
Frequency Cepstral Coefficient computation. After obtaining the first 13 MFCCs, the extracted feature vectors are
classified using SVM. The resulting confusion matrix shows that MFCC can be
used as a feature extraction technique for images, like other available
techniques. This is a new application for MFCC, which is usually used for voice-based
processing such as speaker identification, voice recognition, gender identification from
voice and, recently, in biomedical applications to diagnose a baby through its crying.
This experiment was done on only 10 gestures; in future the experiment could be
done on an ASL database. On the other hand, the misclassification rate is high between several
classes. If MFCC is combined with other techniques, this misclassification rate might
be reduced, and MFCC could be used in image processing like other techniques.
Previous studies have already tried MFCC for palm print recognition, face
recognition and satellite image recognition with very good accuracy. With more emphasis,
MFCC may become one of the best-known algorithms in image
processing as well, as it already is in speech recognition and speaker identification.
So, in future, MFCC can be used in combination with other techniques.
REFERENCES
[1] A. Khan, et al., "Speech Recognition: Increasing Efficiency of Support Vector Machines,"
International Journal of Computer Applications vol. 35, dec 2011.
[2] A. S. Mehendale and M. R. Dixit, "SPEAKER IDENTIFICATION," Signal & Image Processing: An
International Journal (SIPIJ), vol. 2, june 2011.
[3] L. Muda, et al., "Voice recognition algorithm Using Mel Frequency Cepstral Coefficient (MFCC)
and Dynamic Time Warping (DTW) Techniques," Journal of computing vol. 2, 2010.
[4] A. Zulfiqar, et al., "A Speaker Identification System using MFCC Features with VQ Technique,"
Third International Symposium on Intelligent Information Technology Application, 2009.
[5] D. C. Gope, "Hand Gesture Interaction with Human-Computer," Global Journal of Computer Science
and Technology, vol. 11, dec 2011.
[6] T. Messer, "Static hand gesture recognition," University of Fribourg.
[7] S. K. Kang, et al., "Color Based Hand and Finger Detection Technology for user interaction,"
presented at the International Conference on Convergence and Hybrid Information Technology,
2008.
[8] M. A. Amin and H. Yan, "Sign Language Finger Alphabet Recognition from Gabor-PCA
Representation of hand gestures," presented at the Proceedings of the Sixth International Conference on
Machine Learning and Cybernetics, Hong Kong, 2007.
[9] Chen, et al., "Hand gesture recognition using Haar-like features and a stochastic context-free
grammar," IEEE Transactions on Instrumentation and Measurement, vol. 57, p. 9, 2008.
[10] D. C. Gope, "Hand Gesture Interaction with Human-Computer," Global Journal of Computer Science
and Technology, vol. 11, dec 2011.
[11] D.-Y. Huang, et al., "Gabor filter-based hand-pose angle estimation for hand gesture recognition
under varying illumination " Expert Systems With Applications, vol. 38, p. 12, 2011.
[12] S. Padam and K.Prabin.Bora, "A Study on Static Hand Gesture Recognition using Moments,"
presented at the International Conference on Signal Processing and Communications (SPCOM),
2010.
[13] J. J. Stephan and S. a. Khudayer, "Gesture recognition for Human Computer Interaction,"
International Journal of Advancements in Computing Technology, vol. 2, 4 November 2010.
[14] K. Symeonidis, "Hand Gesture Recognition Using Neural Networks," Centre for Vision, Speech and
Signal Processing, August 23, 2000.
[15] T. M. Talal and A. E.-. Sayad, "Identification of Satellite Images Based on Mel Frequency Cepstral
Coefficients," 2009.
[16] Sangeeta Biswas, "MFCC based Face Identification," Titech Japan.
[17] M. M. M. Fahmy, "Palmprint recognition based on Mel frequency Cepstral coefficients feature
extraction," Ain Shams Engineering Journal, p. 9, 2010.
[18] Md. Rashidul Hasan, Mustafa Jamil, Md. Golam Rabbani, Md. Saifur Rahman, "Speaker identification
using Mel Frequency Cepstral coefficients."
[19] V. Tiwari, "MFCC and its applications in speaker recognition," International Journal on Emerging
Technologies, 2010.
[20] S. Khan, Mohd Rafibullslam, M. Faizul, D. Doll, "Speaker recognition using MFCC," IJCSES
(International Journal of Computer Science and Engineering System) 2(1), 2008.
[21] Mohd Rasheedur Hassan, Mustafa Zamil, Mohd Bolam Khabsani, Mohd Saifur Rehman, "Speaker
identification using MFCC coefficients," 3rd International Conference on Electrical and Computer
Engineering (ICECE), 2004.

More Related Content

PDF
Fingerprint Image Compression using Sparse Representation and Enhancement wit...
PDF
Paper id 252014146
PDF
Shot Boundary Detection In Videos Sequences Using Motion Activities
PDF
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
PDF
B070306010
PDF
IRJET- Traffic Sign Classification and Detection using Deep Learning
PDF
Efficiency and capability of fractal image compression with adaptive quardtre...
Fingerprint Image Compression using Sparse Representation and Enhancement wit...
Paper id 252014146
Shot Boundary Detection In Videos Sequences Using Motion Activities
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
B070306010
IRJET- Traffic Sign Classification and Detection using Deep Learning
Efficiency and capability of fractal image compression with adaptive quardtre...

What's hot (20)

PDF
Neuro-fuzzy inference system based face recognition using feature extraction
DOCX
Template Matching - Pattern Recognition
PDF
07 18sep 7983 10108-1-ed an edge edit ari
DOCX
Himadeep
PDF
Video indexing using shot boundary detection approach and search tracks
PDF
Enhancement of genetic image watermarking robust against cropping attack
PDF
Recognition and tracking moving objects using moving camera in complex scenes
PDF
High Speed Data Exchange Algorithm in Telemedicine with Wavelet based on 4D M...
PDF
Video Inpainting detection using inconsistencies in optical Flow
PDF
Image segmentation using advanced fuzzy c-mean algorithm [FYP @ IITR, obtaine...
PDF
Real Time Myanmar Traffic Sign Recognition System using HOG and SVM
PPTX
FAN search for image copy-move forgery-amalta 2014
PDF
Offline Character Recognition Using Monte Carlo Method and Neural Network
PDF
Video Forgery Detection: Literature review
PDF
Image denoising using new adaptive based median filter
PDF
Full Body Spatial Vibrotactile Brain Computer Interface Paradigm
PDF
Convolutional Neural Network Architecture and Input Volume Matrix Design for ...
PDF
Dynamic Threshold in Clip Analysis and Retrieval
PDF
11.digital image processing for camera application in mobile devices using ar...
PDF
IRJET- Image based Approach for Indian Fake Note Detection by Dark Channe...
Neuro-fuzzy inference system based face recognition using feature extraction
Template Matching - Pattern Recognition
07 18sep 7983 10108-1-ed an edge edit ari
Himadeep
Video indexing using shot boundary detection approach and search tracks
Enhancement of genetic image watermarking robust against cropping attack
Recognition and tracking moving objects using moving camera in complex scenes
High Speed Data Exchange Algorithm in Telemedicine with Wavelet based on 4D M...
Video Inpainting detection using inconsistencies in optical Flow
Image segmentation using advanced fuzzy c-mean algorithm [FYP @ IITR, obtaine...
Real Time Myanmar Traffic Sign Recognition System using HOG and SVM
FAN search for image copy-move forgery-amalta 2014
Offline Character Recognition Using Monte Carlo Method and Neural Network
Video Forgery Detection: Literature review
Image denoising using new adaptive based median filter
Full Body Spatial Vibrotactile Brain Computer Interface Paradigm
Convolutional Neural Network Architecture and Input Volume Matrix Design for ...
Dynamic Threshold in Clip Analysis and Retrieval
11.digital image processing for camera application in mobile devices using ar...
IRJET- Image based Approach for Indian Fake Note Detection by Dark Channe...
Ad

Viewers also liked (18)

PDF
Wound image analysis classifier for efficient tracking of wound healing status
PDF
AUTOMATIC THRESHOLDING TECHNIQUES FOR OPTICAL IMAGES
PDF
Design of Embedded Control System Using Super- Scalar ARM Cortex-A8 for Nano-...
PDF
WAVELET BASED AUTHENTICATION/SECRET TRANSMISSION THROUGH IMAGE RESIZING (WA...
PDF
PERFORMANCE ANALYIS OF LMS ADAPTIVE FIR FILTER AND RLS ADAPTIVE FIR FILTER FO...
PDF
Collaborative semantic annotation of images ontology based model
PDF
Performance analysis of image compression using fuzzy logic algorithm
PDF
Retinal image analysis using morphological process and clustering technique
PDF
Image similarity using symbolic representation and its variations
PDF
Performance analysis of high resolution images using interpolation techniques...
PDF
A novel approach to generate face biometric template using binary discriminat...
PDF
ALGORITHM AND TECHNIQUE ON VARIOUS EDGE DETECTION: A SURVEY
PDF
EFFICIENT IMAGE RETRIEVAL USING REGION BASED IMAGE RETRIEVAL
PDF
Detection of fabrication in photocopy document using texture features through...
PDF
A FRAGILE WATERMARKING BASED ON LEGENDRE TRANSFORM FOR COLOR IMAGES (FWLTCI)
PDF
Mfcc based enlargement of the training set for emotion recognition in speech
PDF
A binarization technique for extraction of devanagari text from camera based ...
PDF
An ensemble classification algorithm for hyperspectral images
Wound image analysis classifier for efficient tracking of wound healing status
AUTOMATIC THRESHOLDING TECHNIQUES FOR OPTICAL IMAGES
Design of Embedded Control System Using Super- Scalar ARM Cortex-A8 for Nano-...
WAVELET BASED AUTHENTICATION/SECRET TRANSMISSION THROUGH IMAGE RESIZING (WA...
PERFORMANCE ANALYIS OF LMS ADAPTIVE FIR FILTER AND RLS ADAPTIVE FIR FILTER FO...
Collaborative semantic annotation of images ontology based model
Performance analysis of image compression using fuzzy logic algorithm
Retinal image analysis using morphological process and clustering technique
Image similarity using symbolic representation and its variations
Performance analysis of high resolution images using interpolation techniques...
A novel approach to generate face biometric template using binary discriminat...
ALGORITHM AND TECHNIQUE ON VARIOUS EDGE DETECTION: A SURVEY
EFFICIENT IMAGE RETRIEVAL USING REGION BASED IMAGE RETRIEVAL
Detection of fabrication in photocopy document using texture features through...
A FRAGILE WATERMARKING BASED ON LEGENDRE TRANSFORM FOR COLOR IMAGES (FWLTCI)
Mfcc based enlargement of the training set for emotion recognition in speech
A binarization technique for extraction of devanagari text from camera based ...
An ensemble classification algorithm for hyperspectral images
Ad

Similar to FEATURE EXTRACTION USING MFCC (20)

PDF
Pattern Recognition final project
PDF
IRJET- Survey Paper on Vision based Hand Gesture Recognition
PDF
SLIDE PRESENTATION BY HAND GESTURE RECOGNITION USING MACHINE LEARNING
PDF
Ijarcet vol-2-issue-3-947-950
PPTX
Gesture recognition using_inertial_sensors
PDF
Feature Extraction of Gesture Recognition Based on Image Analysis for Differe...
PDF
Feature Extraction of Gesture Recognition Based on Image Analysis for Differe...
DOC
PDF
Comparative Analysis of Hand Gesture Recognition Techniques
PDF
IRJET- Hand Gesture Recognition and Voice Conversion for Deaf and Dumb
PPTX
Indian Sign Language Recognition Method For Deaf People
PPTX
A Framework For Dynamic Hand Gesture Recognition Using Key Frames Extraction
PDF
Sign Language Identification based on Hand Gestures
PPTX
Interactive Wall (Multi Touch Interactive Surface)
PDF
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
PDF
Ap31289293
PDF
Design a System for Hand Gesture Recognition with Neural Network
PDF
Hand gesture recognition using support vector machine
PDF
RECOGNITION SYSTEM USING MYO ARMBAND FOR HAND GESTURES - SURVEY
PDF
Hand Gesture Recognition using Neural Network
Pattern Recognition final project
IRJET- Survey Paper on Vision based Hand Gesture Recognition
SLIDE PRESENTATION BY HAND GESTURE RECOGNITION USING MACHINE LEARNING
Ijarcet vol-2-issue-3-947-950
Gesture recognition using_inertial_sensors
Feature Extraction of Gesture Recognition Based on Image Analysis for Differe...
Feature Extraction of Gesture Recognition Based on Image Analysis for Differe...
Comparative Analysis of Hand Gesture Recognition Techniques
IRJET- Hand Gesture Recognition and Voice Conversion for Deaf and Dumb
Indian Sign Language Recognition Method For Deaf People
A Framework For Dynamic Hand Gesture Recognition Using Key Frames Extraction
Sign Language Identification based on Hand Gestures
Interactive Wall (Multi Touch Interactive Surface)
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
Ap31289293
Design a System for Hand Gesture Recognition with Neural Network
Hand gesture recognition using support vector machine
RECOGNITION SYSTEM USING MYO ARMBAND FOR HAND GESTURES - SURVEY
Hand Gesture Recognition using Neural Network

Recently uploaded (20)

PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PPTX
Group 1 Presentation -Planning and Decision Making .pptx
PPTX
MYSQL Presentation for SQL database connectivity
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
PDF
cuic standard and advanced reporting.pdf
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PPT
Teaching material agriculture food technology
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Electronic commerce courselecture one. Pdf
PDF
gpt5_lecture_notes_comprehensive_20250812015547.pdf
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PPTX
Programs and apps: productivity, graphics, security and other tools
PPTX
SOPHOS-XG Firewall Administrator PPT.pptx
PPTX
Big Data Technologies - Introduction.pptx
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
Accuracy of neural networks in brain wave diagnosis of schizophrenia
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PDF
Assigned Numbers - 2025 - Bluetooth® Document
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Group 1 Presentation -Planning and Decision Making .pptx
MYSQL Presentation for SQL database connectivity
Per capita expenditure prediction using model stacking based on satellite ima...
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
cuic standard and advanced reporting.pdf
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Teaching material agriculture food technology
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Electronic commerce courselecture one. Pdf
gpt5_lecture_notes_comprehensive_20250812015547.pdf
The Rise and Fall of 3GPP – Time for a Sabbatical?
Programs and apps: productivity, graphics, security and other tools
SOPHOS-XG Firewall Administrator PPT.pptx
Big Data Technologies - Introduction.pptx
Encapsulation_ Review paper, used for researhc scholars
Building Integrated photovoltaic BIPV_UPV.pdf
Accuracy of neural networks in brain wave diagnosis of schizophrenia
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Assigned Numbers - 2025 - Bluetooth® Document

FEATURE EXTRACTION USING MFCC

  • 1. Signal & Image Processing : An International Journal (SIPIJ) Vol.4, No.4, August 2013 DOI : 10.5121/sipij.2013.4408 101 FEATURE EXTRACTION USING MFCC Shikha Gupta1 , Jafreezal Jaafar2 , Wan Fatimah wan Ahmad3 and Arpit Bansal4 Universiti Tecknologi PETRONAS, CIS Dept, Perak, Malaysia Shikha.cs88@gmail.com1, jafreez@petronas.com.my2 fatimhd@petronas.com.my3 4 Indian institute of Information and Technology, Allahabad, India Arpit06bansal@gmail.com ABSTRACT Mel Frequency Ceptral Coefficient is a very common and efficient technique for signal processing. This paper presents a new purpose of working with MFCC by using it for Hand gesture recognition. The objective of using MFCC for hand gesture recognition is to explore the utility of the MFCC for image processing. Till now it has been used in speech recognition, for speaker identification. The present system is based on converting the hand gesture into one dimensional (1-D) signal and then extracting first 13 MFCCs from the converted 1-D signal. Classification is performed by using Support Vector Machine. Experimental results represents that proposed application of using MFCC for gesture recognition have very good accuracy and hence can be used for recognition of sign language or for other household application with the combination for other techniques such as Gabor filter, DWT to increase the accuracy rate and to make it more efficient. KEYWORDS Hand gesture, 1D signal, MFCC (Mel Frequency Cepstral Coefficient), SVM (Support Vector Machine). 1. INTRODUCTION Currently, there is a great focus on developing easy, comfortable interfaces by which human can communicate with computer by using natural and manipulation communication skills of the human. In HCI the input domain requires capturing and then interpretation of the face, facial expression, arms, hands, sometimes whole body motion as well. Among all the inputs, gesture is a powerful means for the communication purpose among human beings. 
Even the use of gesture is very common while talking on the telephone. The Gesture recognition system has two phases: first one is the feature extraction phase where by using some specific methods few values are assigned for each gesture by using training dataset. It involves extracting important information associated with the given gesture and removing all the remaining useless information. And another phase is the classification processes were based on the training and the testing database the intended gesture get analyzed. Basically Mel frequency Capstral coefficients (MFCC) are very common and one of the best method for feature extraction when talking about the 1D signals. So this paper presents an application of MFCC for hand gesture recognition. Features are extracted by converting input image into 1D signal. For classification
  • 2. Signal & Image Processing : An International Journal (SIPIJ) Vol.4, No.4, August 2013 102 purposes SVM is used. SVM it is a supervised Learning method. The benefit of SVM is that it can also use kernels for non-linear data transformation. The Law behind using SVM is to divide the given data into two dissimilar category and then to get hyper-plane to partition the given classes. The organization of the rest of the paper is as follow: Section II is all about the recognition system for hand gestures. Section III highlights details about the Mel Frequency Cepstral Coefficients. Section IV describes the experiment results and discussion. Section VI concludes the paper along with small description about the possible future work. 2. RECOGNITION SYSTEM FOR HAND GESTURE The proposed gesture recognition system is divided into three important stages as shown in figure1: Image conversion from 2D to 1D signal, feature extraction and feature matching also known as classification process. The 2D converted image is given as input to MFCC for coefficients extraction. By doing feature extraction from the given training data the unnecessary data is stripped way leaving behind the important information for classification. The output after applying MFCC is a matrix having feature vectors extracted from all the frames. In this output matrix the rows represent the corresponding frame numbers and columns represent corresponding feature vector coefficients [1-4]. Figure 1: Proposed System Finally this output matrix is used for classification process. The classification process is divided into two stages: training phase and testing phase. Feature extraction plays a very important in the recognition process. This is basically a process of dimension reduction or feature reduction as this process eliminates the irrelevant data present in the given input while maintaining important information. 
Several feature extraction techniques [5-14] exist for gesture recognition, but in this paper feature extraction is done with MFCC, which is mainly used in speech recognition systems. The purpose of using MFCC for image processing is to extend its effectiveness to that field as well. According to the literature, MFCC has already been applied to the identification of satellite images [15], face recognition [16] and palm print recognition [17]. The steps for calculating MFCC for hand gestures are the same as for a 1D signal [18-21]. Since MFCC works on 1D signals and the input is a 2D image, the input image is first converted from 2D to a 1D signal. The remaining feature extraction calculation is the same as for speech signals, as shown in Figure 2.

Figure 2: Step by Step MFCC Processing

A speech signal is quasi-stationary: its characteristics remain the same over a short time period and change over a longer one. MFCC is therefore based on short-time analysis, so for feature extraction the 1D image signal is first broken up into frames. The continuous signal is blocked into frames of N samples, and to prevent loss of information adjacent frames are separated by M samples, where M < N.
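The short-time analysis described here, together with the later stages of the step-by-step MFCC processing (windowing, Fourier transform, mel warping, logarithm and DCT), can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation: the sampling rate of 8000, the 26 triangular filters, the naive DFT used in place of an FFT, the use of a DCT-II for the final step, and all function names are assumptions; the defaults N = 256 and M = 100 are commonly used framing values.

```python
import cmath
import math


def hz_to_mel(f):
    """Standard mel mapping: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def frame_signal(signal, frame_len=256, hop=100):
    """Frames of frame_len samples; each starts hop samples after the previous one."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return [signal[i * hop: i * hop + frame_len] for i in range(n_frames)]


def hamming(n_samples):
    """W(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1)) for n = 0..N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def magnitude_spectrum(frame):
    """Naive DFT for clarity (an FFT would be used in practice); bins 0..N/2."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):          # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def dct_ii(values, n_out):
    """First n_out coefficients of an unnormalized DCT-II."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_out)]


def mfcc(signal, sample_rate=8000, frame_len=256, hop=100,
         n_filters=26, n_coeffs=13):
    """Return one row of n_coeffs coefficients per frame."""
    window = hamming(frame_len)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for frame in frame_signal(signal, frame_len, hop):
        windowed = [s * w for s, w in zip(frame, window)]
        spectrum = magnitude_spectrum(windowed)
        energies = [sum(h * p for h, p in zip(row, spectrum)) for row in bank]
        log_energies = [math.log(e + 1e-10) for e in energies]  # avoid log(0)
        features.append(dct_ii(log_energies, n_coeffs))
    return features


if __name__ == "__main__":
    demo = [math.sin(2 * math.pi * 5 * t / 556) for t in range(556)]
    feats = mfcc(demo)
    print(len(feats), "frames x", len(feats[0]), "coefficients")
```

The result is a list with one 13-element row per frame, matching the matrix layout described earlier (rows as frames, columns as coefficients). For an image-derived signal the sampling rate has no physical meaning, so the value chosen only fixes where the mel filters sit along the spectrum.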
Hence, the first frame consists of the first N samples, the second frame begins M samples after the start of the previous frame and so overlaps it by N-M samples, and so on. This overlap avoids a loss of information. According to the literature, an overlap of 60% is sufficient. Typical values are N = 256 and M = 100.

After framing of the 1D signal, the next step is windowing using the Hamming window technique. Windowing is done to reduce the discontinuity at the beginning and at the end of the frame. The Hamming window used has the form $W(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right)$, $0 \le n \le N-1$.

After windowing, each frame of N samples is converted from the time domain to the frequency domain using the FFT, a fast algorithm for computing the DFT (Discrete Fourier Transform). The DFT yields the magnitude spectrum, which is then transformed to the mel frequency scale, a mapping between real frequency and perceived frequency. The mapping is approximately linear below 1000 Hz and logarithmic above it, and is commonly written as

$$m_f = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \qquad (1)$$

where $m_f$ represents the mel-scale frequency corresponding to the actual frequency $f$. Finally, the log of this mel spectrum is processed to obtain the cepstral coefficients by taking the inverse DCT (Discrete Cosine Transform). For implementation, the first 13 coefficients are taken.

3. EXPERIMENTAL RESULTS AND DISCUSSION

In this section, an experiment is carried out for gesture recognition using MFCC. For the experiment, hand posture images are taken from the Jochen Triesch dataset. This database contains grayscale images of 10 different gestures made by 24 different users, with varied scales, orientations and backgrounds. Here only the static background images are used, as shown in Figure 3.

Figure 3: Sample hand posture images from Jochen Triesch dataset

Following the same procedure for the 2D images as is used for a 1D signal, the first 13 MFCCs are extracted for each given input, and a plot of these extracted features shows that
gestures within the same class, even when performed by different users, look similar compared with the MFCCs of different classes, as shown in Figure 4, which presents raw images taken from different users with variation in scale and angle together with their respective MFCC representations. In these graphs the X axis represents the number of frames, or equivalently the number of MFCC coefficients extracted from the given input signal, and the Y axis represents the feature vector values for each frame.

Figure 4: First 13 MFCCs extracted from the gray scaled images

After extracting the MFCC coefficients, the feature vectors were classified using an SVM classifier. With SVM there are two choices: classification can be done using either One-against-One or One-against-All SVM. The One-against-One approach is a pairwise method that requires m(m-1)/2 SVM classifiers to be trained, where m is the number of classes; with m = 10 classes this amounts to 45 classifiers. The confusion matrix after classification is shown in Table 1, which displays the classification results obtained with SVM. In this work the classification is done between ten different classes. The confusion matrix shows the accuracy and the false alarm rate. How accuracy is presented in a confusion matrix is a matter of convention. In this matrix the diagonal gives the accuracy for the respective classes, the columns correspond to the determined (predicted) class and the rows to the actual class. The misclassification rates appear outside the diagonal as non-zero values.

Class      1      2      3      4      5      6      7      8      9     10
1        100      0      0      0      0      0      0      0      0      0
2          8     92      0      0      0      0      0      0      0      0
3          0      0  83.33  16.67      0      0      0      0      0      0
4          0      0      0     90      0      0     10      0      0      0
5          0      0      0      0     93      7      0      0      0      0
6          0      0      0      0      8     92      0      0      0      0
7          0      0      0      0      0      0    100      0      0      0
8          0      0      0      0      0      0      0    100      0      0
9      16.67      0      0      0      0      0      0      0  83.33      0
10         0      0      0      0      0      0      0      0      0    100

Table 1: Confusion matrix (values in %; rows are the actual class, columns the determined class)

From the matrix it is clear that when class 1 was classified against the remaining classes its accuracy is 100%: all gestures from class 1 were correctly identified as positive-class gestures during classification. For class 2 the accuracy is 92% with an error rate of 8%. This means that 8% of the gestures from class 2 were wrongly identified as class 1 gestures, so for class 2 the false negative rate is 8% and the true positive rate is 92%. For class 3 the accuracy is only 83.33%, which is considerably lower than for class 1 and class 2. When class 3 was classified against the other classes, the misclassification was zero for all of them except class 4: 16.67% of the images from class 3 were erroneously identified as class 4 images, so every other column in row 3 is zero while column 4 holds 16.67%. Similarly, for class 4 the accuracy is 90%, with a 10% misclassification rate with class 7. For class 5 and class 6 the accuracy is 93% and 92% respectively.
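The readings above can be reproduced directly from Table 1. The sketch below stores the matrix as printed (rows as the actual class, columns as the determined class), lists the non-zero off-diagonal entries, and averages the diagonal over the ten classes. The helper names are illustrative. With the entries as printed, the diagonal mean works out to about 93.37%, which is close to but not identical with the 93.6% overall accuracy reported in this paper; the source of the difference is not stated.

```python
# Table 1 as printed, in percent. Row = actual class, column = determined class.
CONFUSION = [
    [100, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [8, 92, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 83.33, 16.67, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 90, 0, 0, 10, 0, 0, 0],
    [0, 0, 0, 0, 93, 7, 0, 0, 0, 0],
    [0, 0, 0, 0, 8, 92, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 100, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 100, 0, 0],
    [16.67, 0, 0, 0, 0, 0, 0, 0, 83.33, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 100],
]


def per_class_accuracy(matrix):
    """Diagonal entries: percentage of each class that was recognized correctly."""
    return [row[i] for i, row in enumerate(matrix)]


def misclassifications(matrix):
    """(actual, determined, percent) for every non-zero off-diagonal entry."""
    return [(i + 1, j + 1, value)
            for i, row in enumerate(matrix)
            for j, value in enumerate(row)
            if i != j and value > 0]


def overall_accuracy(matrix):
    """Mean of the diagonal, i.e. the sum of class accuracies over the class count."""
    diagonal = per_class_accuracy(matrix)
    return sum(diagonal) / len(diagonal)


if __name__ == "__main__":
    for actual, determined, pct in misclassifications(CONFUSION):
        print(f"class {actual} -> class {determined}: {pct}%")
    print(f"overall accuracy: {overall_accuracy(CONFUSION):.2f}%")
```

Averaging the diagonal weights every class equally, which matches the procedure the paper describes only if each class contributes the same number of test images.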
The misclassification rates show that when class 5 was classified against the other classes, the misclassification rate is 0% for all classes except class 6, where the false alarm rate is 7%. For class 6 the misclassification rate is likewise 0% with all other classes except class 5, where the error rate is 8%. Finally, for the remaining classes, namely class 7, class 8 and class 10, the accuracy is 100%, except for class 9, whose identification rate of 83.33% is again low compared with the other classes. For the overall accuracy, the diagonal percentages are summed and the sum is divided by the total number of classes, giving the overall performance of the algorithm in terms of accuracy rate. In this research ten classes were used, and the system accuracy achieved was 93.6%. From the accuracy attained it is believed that the proposed application is feasible for hand gesture recognition.

4. CONCLUSION AND FUTURE WORK

This paper has presented a feasible method for hand gesture recognition using MFCC. In this work the input is converted from 2D images to a 1D signal, which is then given as input to the Mel Frequency Cepstral Coefficient computation. After the first 13 MFCCs are obtained, the extracted feature vectors are classified with an SVM. The resulting confusion matrix shows that MFCC can be used as a feature extraction technique for images just like other available techniques. The work also shows a new application for MFCC, which has mostly been used for voice-based processing such as speaker identification, voice recognition and gender identification from voice, and recently in the biomedical field to diagnose a baby from the sound of its crying. This experiment was done on only 10 gestures; in future the experiment could be done on an ASL database. On the other hand, the misclassification rate is high between several pairs of classes. If MFCC is combined with other techniques, this misclassification rate may be reduced and MFCC could be used in image processing like other techniques. Previous studies have already applied MFCC to palm print recognition, face recognition and satellite image recognition with very good accuracy.
With more emphasis on this direction, MFCC may become one of the best known algorithms in image processing as well, just as it is in speech recognition and speaker identification. In future, therefore, MFCC can be used in combination with other techniques.

REFERENCES

[1] A. Khan, et al., "Speech Recognition: Increasing Efficiency of Support Vector Machines," International Journal of Computer Applications, vol. 35, Dec. 2011.
[2] A. S. Mehendale and M. R. Dixit, "Speaker Identification," Signal & Image Processing: An International Journal (SIPIJ), vol. 2, June 2011.
[3] L. Muda, et al., "Voice Recognition Algorithm Using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques," Journal of Computing, vol. 2, 2010.
[4] A. Zulfiqar, et al., "A Speaker Identification System using MFCC Features with VQ Technique," Third International Symposium on Intelligent Information Technology Application, 2009.
[5] D. C. Gope, "Hand Gesture Interaction with Human-Computer," Global Journal of Computer Science and Technology, vol. 11, Dec. 2011.
[6] T. Messer, "Static hand gesture recognition," University of Fribourg.
[7] S. K. Kang, et al., "Color Based Hand and Finger Detection Technology for user interaction," presented at the International Conference on Convergence and Hybrid Information Technology, 2008.
[8] M. A. Amin and H. Yan, "Sign Language Finger Alphabet Recognition from Gabor-PCA Representation of hand gestures," presented at the Sixth International Conference on Machine Learning and Cybernetics, Hong Kong, 2007.
[9] Chen, et al., "Hand gesture recognition using Haar-like features and a stochastic context-free grammar," IEEE Transactions on Instrumentation and Measurement, vol. 57, p. 9, 2008.
[10] D. C. Gope, "Hand Gesture Interaction with Human-Computer," Global Journal of Computer Science and Technology, vol. 11, Dec. 2011.
[11] D.-Y. Huang, et al., "Gabor filter-based hand-pose angle estimation for hand gesture recognition under varying illumination," Expert Systems With Applications, vol. 38, p. 12, 2011.
[12] S. Padam and K. Prabin Bora, "A Study on Static Hand Gesture Recognition using Moments," presented at the International Conference on Signal Processing and Communications (SPCOM), 2010.
[13] J. J. Stephan and S. A. Khudayer, "Gesture recognition for Human Computer Interaction," International Journal of Advancements in Computing Technology, vol. 2, 4 November 2010.
[14] K. Symeonidis, "Hand Gesture Recognition Using Neural Networks," Centre for Vision, Speech and Signal Processing, August 23, 2000.
[15] T. M. Talal and A. E.-. Sayad, "Identification of Satellite Images Based on Mel Frequency Cepstral Coefficients," 2009.
[16] Sangeeta Biswas, "MFCC based Face Identification," Titech, Japan.
[17] M. M. M. Fahmy, "Palmprint recognition based on Mel frequency Cepstral coefficients feature extraction," Ain Shams Engineering Journal, p. 9, 2010.
[18] Md. Rashidul Hasan, Mustafa Jamil, Md. Golam Rabbani, Md. Saifur Rahman, "Speaker identification using Mel Frequency Cepstral coefficients."
[19] V. Tiwari, "MFCC and its applications in speaker recognition," International Journal on Emerging Technologies, 2010.
[20] S. Khan, Mohd Rafibul Islam, M. Faizul, D. Doll, "Speaker recognition using MFCC," IJCSES (International Journal of Computer Science and Engineering System), 2(1), 2008.
[21] Mohd Rasheedur Hassan, Mustafa Zamil, Mohd Bolam Khabsani, Mohd Saifur Rehman, "Speaker identification using MFCC coefficients," 3rd International Conference on Electrical and Computer Engineering (ICECE), 2004.