Sundarapandian et al. (Eds) : ITCS, SIP, CS & IT 09,
pp. 33–38, 2013. © CS & IT-CSCP 2013 DOI : 10.5121/csit.2013.3104
SEARCH TIME REDUCTION USING
HIDDEN MARKOV MODELS FOR
ISOLATED DIGIT RECOGNITION
Sheena C V¹, T M Thasleema² and N K Narayanan³
Department of Information Technology, Kannur University, Kerala, INDIA
sheenacvg@gmail.com¹, thasnitm1@hotmail.com² and nknarayanan@gmail.com³
ABSTRACT
This paper reports a word modeling algorithm for Malayalam isolated digit recognition that reduces the search time in the classification process. A recognition experiment is carried out for the 10 Malayalam digits using Mel Frequency Cepstral Coefficient (MFCC) features and the k-Nearest Neighbor (k-NN) classification algorithm. A word modeling scheme based on a Hidden Markov Model (HMM) is then developed. The experimental results show that, in a telephony application, the proposed algorithm reduces the classification search time for the first digit by 80%.
KEYWORDS
Isolated Digit Recognition, Mel Frequency Cepstral Coefficient, k-Nearest Neighbor, Hidden Markov Model.
1. INTRODUCTION
Speech recognition is one of the active research areas in Human-Computer Interaction [1]. Speech recognition is the ability of a computer to recognize general, naturally flowing utterances from a wide variety of speakers. It involves capturing and digitizing the sound waves, converting them into basic language units or phonemes, constructing words from the phonemes, and analyzing the words in context to ensure correct spelling for words that sound alike. This paper discusses two stages of Malayalam digit recognition based on Mel Frequency Cepstral Coefficient (MFCC) features. In the first stage, a recognition experiment is carried out using the k-NN algorithm; in the second, a word modeling algorithm based on a Hidden Markov Model (HMM) is proposed for faster classification in a Malayalam telephony application.
The basic theory of HMMs was introduced and studied in the late 1960s and early 1970s [2]. However, the literature shows that HMMs have been applied effectively to problems in speech processing only in the past few decades. These models have a rich mathematical structure and can therefore provide a theoretical basis for a wide range of applications. Here we develop a word modeling algorithm for Malayalam isolated digits using HMMs.
34 Computer Science & Information Technology (CS & IT)
Malayalam is one of the major languages of the Dravidian language family. It is the regional language of the south Indian state of Kerala and of the Lakshadweep islands, and is spoken by about 36 million people [3]. The phonemic structure of Malayalam contains 51 vowel/consonant-vowel sounds, of which 15 are long and short vowels and 36 are consonant-vowel sounds. Owing to its lineage from both Sanskrit and Tamil, Malayalam has the largest number of phonemic utterances among the Indian languages [4]. The Malayalam script includes letters capable of representing all the phonemes of Sanskrit and of all the Dravidian languages [5]. In the recognition stage of this work, we use the ten Malayalam digits uttered by a single speaker, each repeated 20 times, as tabulated in Table 1. In the word modeling stage, we use Malayalam isolated digits for telephone applications.
The paper is organized as follows. Section 2 discusses feature extraction using the MFCC algorithm. Section 3 describes the recognition experiments using the k-NN algorithm and discusses the results. Section 4 gives an overview of HMMs, followed by Malayalam isolated digit word modeling using the probability matrix. Finally, Section 5 concludes the work and outlines directions for future work.
2. FEATURE EXTRACTION USING MFCC COEFFICIENTS
Feature extraction reduces the amount of resources required to describe a large set of data accurately. In this paper we discuss one of the basic speech feature extraction techniques, namely Mel Frequency Cepstral Coefficients (MFCC). The MFCC method uses a bank of filters scaled according to the Mel scale to smooth the spectrum, approximating the frequency analysis performed by the human ear [6]. The Mel-scale filters are spaced linearly at low frequencies up to 1 kHz and logarithmically at higher frequencies, so that they capture the phonetic characteristics of the speech signal. MFCCs are therefore used to model human speech perception. MFCCs are computed as shown in Fig. 1.
Frame blocking is the process of segmenting the speech samples obtained from analog-to-digital (A/D) conversion into short frames of 20 to 40 milliseconds. Next, each frame is windowed to minimize the signal discontinuities at its beginning and end. The Fast Fourier Transform (FFT) is then applied to convert each frame of N samples from the time domain into the frequency domain. Each frequency $F_{\mathrm{Hz}}$, measured in Hz, is then mapped onto a perceptual scale called the Mel scale using the formula
$$F_{\mathrm{mel}} = 2595 \log_{10}\left(1 + \frac{F_{\mathrm{Hz}}}{700}\right) \qquad (1)$$
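Equation (1) can be checked numerically. The following is a minimal sketch; the function names `hz_to_mel` and `mel_to_hz` are illustrative and not from the paper, and the inverse mapping is included only because it is commonly needed to place filter edges back in Hz.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Equation (1): F_mel = 2595 * log10(1 + F_Hz / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(f_mel: float) -> float:
    """Algebraic inverse of Equation (1)."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)

# Print the Mel value for a few frequencies in the telephone/speech range
for f in (250, 500, 1000, 2000, 4000):
    print(f"{f:>5} Hz -> {hz_to_mel(f):7.1f} mel")
```

With these constants, 1000 Hz maps to approximately 1000 mel, and the step from 1000 Hz to 2000 Hz adds far fewer mels than the step from 0 Hz to 1000 Hz, which is the compression of high frequencies the scale is meant to capture.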
The Mel-frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz. The log Mel spectrum is then converted back into the time domain using the discrete cosine transform (DCT), giving the Mel Frequency Cepstral Coefficients (MFCCs). Applying this procedure to each speech frame yields one set of MFCCs per frame. A single set of MFCC coefficients per utterance is then obtained by averaging over the frames, and this set is used as the feature vector for the k-NN recognition algorithm.
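The steps above (frame blocking, windowing, Fourier transform, Mel filtering, log, DCT, averaging over frames) can be sketched end to end as follows. This is an illustrative, dependency-free sketch, not the authors' implementation: the Hamming window, the triangular filter shape, the unnormalised DCT-II, and all parameter values (8 kHz sampling, 25 ms frames with a 10 ms hop, 20 filters, 13 coefficients) are assumptions, and a direct DFT stands in for the FFT to keep the code short.

```python
import math
from typing import List

# Assumed parameters (not from the paper): 8 kHz audio, 25 ms frames, 10 ms hop,
# 20 Mel filters, 13 cepstral coefficients.

def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def frame_signal(x: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Frame blocking: overlapping frames of frame_len samples, hop samples apart."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]

def hamming(frame: List[float]) -> List[float]:
    """Taper the frame to reduce discontinuities at its start and end."""
    n = len(frame)
    return [v * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, v in enumerate(frame)]

def power_spectrum(frame: List[float], n_fft: int) -> List[float]:
    """|X[k]|^2 / n_fft for k = 0..n_fft/2, via a direct DFT (an FFT gives the same values)."""
    padded = frame + [0.0] * (n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * i / n_fft) for i, v in enumerate(padded))
        im = -sum(v * math.sin(2 * math.pi * k * i / n_fft) for i, v in enumerate(padded))
        spec.append((re * re + im * im) / n_fft)
    return spec

def mel_filterbank(n_filters: int, n_fft: int, sr: int) -> List[List[float]]:
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    top = hz_to_mel(sr / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * f / sr) for f in edges]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):   # rising edge
            weights[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge
            weights[k] = (right - k) / (right - centre)
        bank.append(weights)
    return bank

def dct_ii(v: List[float], n_out: int) -> List[float]:
    """Unnormalised DCT-II, keeping the first n_out coefficients."""
    n = len(v)
    return [sum(v[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)) for k in range(n_out)]

def utterance_mfcc(x: List[float], sr: int = 8000, frame_ms: float = 25, hop_ms: float = 10,
                   n_filters: int = 20, n_ceps: int = 13) -> List[float]:
    frame_len, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)
    if not frames:
        raise ValueError("signal is shorter than one frame")
    n_fft = 1
    while n_fft < frame_len:  # next power of two
        n_fft *= 2
    bank = mel_filterbank(n_filters, n_fft, sr)
    per_frame = []
    for frame in frames:
        spec = power_spectrum(hamming(frame), n_fft)
        # Log filter-bank energies (floored to avoid log(0)), then DCT -> cepstrum
        log_mel = [math.log(max(sum(w * p for w, p in zip(f, spec)), 1e-10)) for f in bank]
        per_frame.append(dct_ii(log_mel, n_ceps))
    # Average each coefficient over all frames -> one feature vector per utterance
    return [sum(c[k] for c in per_frame) / len(per_frame) for k in range(n_ceps)]

tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]  # 0.1 s test tone
features = utterance_mfcc(tone)
print(len(features))  # 13
```

In practice a library FFT and a precomputed filter bank would replace the inner loops; the structure of the computation stays the same.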
3. SPEECH RECOGNITION USING K-NN ALGORITHM
The k-Nearest Neighbor (k-NN) algorithm is a supervised learning method that has been used in many applications in data mining, statistical pattern recognition and other fields [7]. k-NN classifies an object based on the closest training samples in the feature space: the object is assigned by a majority vote of its k nearest neighbors, where k is a positive integer. The neighbors are taken from a set of objects whose correct classification is known [8]. Hand proposed an effective trial-and-error approach for identifying the value of k that yields the highest recognition accuracy [9]. High recognition accuracies have also been reported for various pattern recognition tasks using this classification technique [10].
k-NN assumes that the data lie in a feature space. If k = 1, the algorithm is simply called the nearest neighbor algorithm. In the example in Fig. 2, there are three classes and the goal is to find a class label for the unknown example $x_j$. Here we use the Euclidean distance and k = 5 neighbors. Of the 5 closest neighbors, 4 belong to $w_1$ and 1 belongs to $w_3$, so $x_j$ is assigned to $w_1$, the predominant class.
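A minimal sketch of the k-NN rule described above, using Euclidean distance and a majority vote. The function names and the toy coordinates are hypothetical; the layout is only chosen so that, as in the Fig. 2 description, four of the five points nearest the query are labelled w1 and one is labelled w3.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], Hashable]  # (feature vector, class label)

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train: List[Sample], query: Sequence[float], k: int = 5) -> Hashable:
    """Assign query the majority label among its k nearest training samples."""
    if not 1 <= k <= len(train):
        raise ValueError("k must lie between 1 and the number of training samples")
    nearest = sorted(train, key=lambda s: euclidean(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy layout (made-up coordinates): the 5 points nearest the origin are 4 x w1 and 1 x w3
train = [((1.0, 0.0), "w1"), ((0.0, 1.1), "w1"), ((-1.2, 0.0), "w1"), ((0.0, -1.3), "w1"),
         ((-0.75, 0.75), "w3"), ((5.0, 5.0), "w2"), ((6.0, 5.0), "w2"), ((5.0, 6.0), "w3")]
print(knn_classify(train, (0.0, 0.0), k=5))  # w1
```

For the digit experiment, each feature vector would be an utterance's averaged MFCC vector and each label a digit; setting k = 1 gives the nearest neighbor rule.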
In this experiment, a database of 20 repetitions of each of the 10 Malayalam digits (200 utterances) is used for training and testing: one hundred samples are used for training and the other one hundred for testing. An average recognition accuracy of 62% is obtained with the k-NN algorithm for Malayalam digit recognition.
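The paper states only that 100 of the 200 utterances are used for training and 100 for testing. The sketch below assumes, hypothetically, a balanced split of 10 randomly chosen repetitions per digit on each side, and shows how the accuracy figure is computed; the names `split_per_digit` and `accuracy` are illustrative.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], Hashable]  # (feature vector, digit label)

def split_per_digit(samples: List[Sample], n_train_per_digit: int = 10, seed: int = 0):
    """Assumed protocol: n_train_per_digit random repetitions of each digit go to
    the training set and the remaining repetitions go to the test set."""
    by_digit = defaultdict(list)
    for sample in samples:
        by_digit[sample[1]].append(sample)
    rng = random.Random(seed)
    train, test = [], []
    for digit in sorted(by_digit):
        reps = list(by_digit[digit])
        rng.shuffle(reps)
        train.extend(reps[:n_train_per_digit])
        test.extend(reps[n_train_per_digit:])
    return train, test

def accuracy(predicted: Sequence[Hashable], actual: Sequence[Hashable]) -> float:
    """Fraction of test utterances whose predicted digit equals the true digit."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# 10 digits x 20 repetitions with dummy one-dimensional features
samples = [((float(rep),), digit) for digit in range(10) for rep in range(20)]
train, test = split_per_digit(samples)
print(len(train), len(test))  # 100 100
```

Fixing the random seed makes the split reproducible; averaging accuracy over several seeds would give a less split-dependent estimate.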
4. WORD MODELLING USING HMM
A Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states [11]. The model is completely defined by the parameters A, B and π, where $A = \{a_{ij}\}$, $1 \le i, j \le N$, is the state transition probability matrix; $B = \{b_j(w_k)\}$, $1 \le j \le N$, $1 \le k \le M$, is the emission probability matrix, with $b_j(w_k)$ the probability of observation $w_k$ being generated from state $j$; and $\pi = \{\pi_i\}$ gives the initial state probabilities. A model with N states and M observation symbols can thus be written compactly as $\lambda = (A, B, \pi)$ [2]. The present work discusses the modeling algorithm developed for Malayalam isolated digit recognition in a telephony application. We considered 50 mobile numbers from the BSNL service provider and calculated the initial probabilities and transition probabilities, which are given in Table 2; the corresponding HMM is shown in Fig. 3.
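The paper does not spell out how the probabilities in Table 2 were computed. A natural reading, sketched below as an assumption, treats each digit as a state of a first-order Markov chain and estimates π and A by counting over the list of phone numbers (maximum-likelihood estimates); the emission matrix B is not estimated here. The digit strings in the example are hypothetical toy values, not the BSNL data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

DIGITS = "0123456789"

def estimate_markov_parameters(
        numbers: List[str]) -> Tuple[Dict[str, float], Dict[str, Dict[str, float]]]:
    """Count-based estimates of the initial probabilities pi and the
    digit-to-digit transition probabilities A from a list of digit strings."""
    if not numbers or not all(n.isdigit() for n in numbers):
        raise ValueError("expected a non-empty list of digit strings")
    first = Counter(n[0] for n in numbers)
    pi = {d: first[d] / len(numbers) for d in DIGITS}  # share of numbers starting with d
    pairs = defaultdict(Counter)
    for n in numbers:
        for a, b in zip(n, n[1:]):  # consecutive digit pairs
            pairs[a][b] += 1
    A = {}
    for i in DIGITS:
        total = sum(pairs[i].values())
        # Rows for digits never followed by another digit are left as all zeros
        A[i] = {j: (pairs[i][j] / total if total else 0.0) for j in DIGITS}
    return pi, A

# Hypothetical toy digit strings (not the paper's data)
pi, A = estimate_markov_parameters(["94471", "98461", "85470", "94951"])
print(pi["8"], pi["9"])  # 0.25 0.75
print(A["9"]["4"])       # 0.5
```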
The tabulated results show that the initial probability is 0 for every digit except 8 and 9: digit 8 has an initial probability of 0.05 and digit 9 an initial probability of 0.95, because every BSNL number in the database starts with 8 or 9. We use this result to reduce the search time of the recognition experiment: in the classification stage, the first digit needs to be compared only against the classes for 8 and 9, so 8 of the 10 candidate classes are eliminated and the search time for the first digit is reduced by 80%. The same procedure can be extended to the successive digits using the transition probabilities, giving a substantial reduction in search time for the recognition/classification experiment.
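The pruning step can be sketched as follows, assuming that any digit with zero estimated probability is excluded from the classifier's candidate set. The initial probabilities are the values reported above; the transition row for the successive-digit case and the function names are hypothetical.

```python
from typing import Dict, List

def allowed_digits(probs: Dict[str, float], threshold: float = 0.0) -> List[str]:
    """Digits whose initial (or transition) probability exceeds threshold;
    only these classes are passed on to the classifier."""
    return sorted(d for d, p in probs.items() if p > threshold)

def search_reduction(n_allowed: int, n_classes: int = 10) -> float:
    """Fraction of class comparisons avoided at one digit position."""
    return 1.0 - n_allowed / n_classes

# Initial probabilities reported in the text: 0.05 for digit 8, 0.95 for digit 9, 0 otherwise
pi = {d: 0.0 for d in "0123456789"}
pi["8"], pi["9"] = 0.05, 0.95
first = allowed_digits(pi)
print(first)                          # ['8', '9']
print(search_reduction(len(first)))   # 0.8

# Successive digits: use the transition row of the previously recognised digit
row_after_9 = {d: 0.0 for d in "0123456789"}  # hypothetical values
row_after_9.update({"4": 0.5, "5": 0.25, "8": 0.25})
print(allowed_digits(row_after_9))    # ['4', '5', '8']
```

A hard zero threshold means a digit never observed in the estimation data can never be recognised at that position; smoothing the probabilities or lowering the threshold trades part of the saving back for robustness.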
5. CONCLUSIONS
This paper presented a word modeling scheme for Malayalam isolated digit recognition using mobile numbers uttered in Malayalam. The study was carried out in two stages. In the first stage, a recognition experiment using MFCC coefficients and the k-NN algorithm was carried out for the 10 Malayalam digits, and a recognition accuracy of 62% was obtained. In the second stage, a word modeling algorithm based on an HMM was proposed for Malayalam isolated digits. The experimental results show that the proposed algorithm reduces the search time for recognizing the first digit by 80% in the classification process of a BSNL telephone number recognition system. Modeling all the isolated digits for numbers from different service providers with the HMM modeling algorithm, and recognizing them with other classification algorithms, are directions for our future research.
REFERENCES
[1] Lawrence Rabiner and Biing-Hwang Juang (1993), Fundamentals of Speech Recognition, Prentice Hall.
[2] L. R. Rabiner and B. H. Juang (1986), "An Introduction to Hidden Markov Models", IEEE ASSP Magazine, pp. 4–16.
[3] Ramachandran, H. P. (2008), Encyclopedia of Language and Linguistics, Oxford: Pergamon Press.
[4] Aiyar, S. (1987), Dravidian Theories, p. 286.
[5] Govindaraju, V. and Setlur, S. (2009), Guide to OCR for Indic Scripts: Document Recognition and Retrieval, Advances in Pattern Recognition, Berlin: Springer, p. 126.
[6] Ibrahim Patel and Y. Srinivas Rao (2010), "Speech Recognition using HMM with MFCC analysis using frequency spectral decomposition technique", Signal and Image Processing: An International Journal, Vol. 1(2), pp. 101–110.
[7] Zhang, B. and Srihari, S. N. (2004), "Fast k-Nearest Neighbor using Cluster Based Trees", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 26(4), pp. 525–528.
[8] Pernkopf, F. (2005), "Bayesian Network Classifiers versus Selective k-NN Classifier", Pattern Recognition, Vol. 38, pp. 1–10.
[9] Hand, D. J. (1981), Discrimination and Classification, New York: Wiley.
[10] Ray, A. K. and Chatterjee, B. (1984), "Design of a Nearest Neighbor Classifier System for Bengali Character Recognition", Journal of Inst. Elec. Telecom. Eng., Vol. 30, pp. 226–229.
[11] Jurafsky, D. and Martin, J. (2004), Speech and Language Processing, Pearson Education.
Authors
Sheena C V received her MSc in Computer Science from Kannur University, Kerala, India in 2008. She is currently a Ph.D. student under Prof. Dr. N. K. Narayanan at the Department of Information Technology, Kannur University, Kerala, India. Her research interests include Computer Vision, Digital Image Processing, Digital Speech Processing, Artificial Intelligence and Artificial Neural Networks.

T M Thasleema received her MSc in Computer Science from Kannur University, Kerala, India in 2004. She has to her credit one book chapter and many research publications at national and international levels in the areas of speech processing and pattern recognition. She is currently pursuing her Ph.D. in speech signal processing at the Department of Information Technology, Kannur University, under the supervision of Prof. Dr. N. K. Narayanan.

Dr. N. K. Narayanan is a Senior Professor of Information Technology, Kannur University, Kerala, India. He earned a Ph.D. in speech signal processing from the Department of Electronics, CUSAT, Kerala, India in 1990. He has published more than a hundred research papers in national and international journals in the areas of speech processing, image processing, neural networks, ANC and bioinformatics. He served as Chairman of the School of Information Science & Technology, Kannur University from 2003 to 2008, and as Principal, Coop Engineering College, Vadakara, Kerala, India during 2009-10. He is currently the Director, UGC IQAC, Kannur University.

More Related Content

PDF
F EATURE S ELECTION USING F ISHER ’ S R ATIO T ECHNIQUE FOR A UTOMATIC ...
PDF
FPGA-based implementation of speech recognition for robocar control using MFCC
PDF
PDF
Speech Recognition Using HMM with MFCC-An Analysis Using Frequency Specral De...
PDF
A Text-Independent Speaker Identification System based on The Zak Transform
PDF
Equirs: Explicitly Query Understanding Information Retrieval System Based on Hmm
PDF
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PDF
Speaker Identification From Youtube Obtained Data
F EATURE S ELECTION USING F ISHER ’ S R ATIO T ECHNIQUE FOR A UTOMATIC ...
FPGA-based implementation of speech recognition for robocar control using MFCC
Speech Recognition Using HMM with MFCC-An Analysis Using Frequency Specral De...
A Text-Independent Speaker Identification System based on The Zak Transform
Equirs: Explicitly Query Understanding Information Retrieval System Based on Hmm
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
Speaker Identification From Youtube Obtained Data

What's hot (20)

PDF
L046056365
PDF
D04812125
PDF
19 ijcse-01227
PDF
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
PDF
Comparative Study of Different Techniques in Speaker Recognition: Review
PDF
3ways to improve semantic segmentation
PDF
PDF
Speaker and Speech Recognition for Secured Smart Home Applications
PDF
Jf3515881595
PDF
Probabilistic Self-Organizing Maps for Text-Independent Speaker Identification
PDF
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
PDF
V4101134138
PDF
Recognition of handwritten digits using rbf neural network
PPTX
A general multiobjective clustering approach based on multiple distance measures
PDF
A novel secure image steganography method based on chaos theory in spatial do...
PDF
05 comparative study of voice print based acoustic features mfcc and lpcc
PDF
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
PDF
histogram-based-emotion
PDF
Architecture neural network deep optimizing based on self organizing feature ...
PPTX
Semantic Mask for Transformer Based End-to-End Speech Recognition
L046056365
D04812125
19 ijcse-01227
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
Comparative Study of Different Techniques in Speaker Recognition: Review
3ways to improve semantic segmentation
Speaker and Speech Recognition for Secured Smart Home Applications
Jf3515881595
Probabilistic Self-Organizing Maps for Text-Independent Speaker Identification
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
V4101134138
Recognition of handwritten digits using rbf neural network
A general multiobjective clustering approach based on multiple distance measures
A novel secure image steganography method based on chaos theory in spatial do...
05 comparative study of voice print based acoustic features mfcc and lpcc
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
histogram-based-emotion
Architecture neural network deep optimizing based on self organizing feature ...
Semantic Mask for Transformer Based End-to-End Speech Recognition
Ad

Similar to SEARCH TIME REDUCTION USING HIDDEN MARKOV MODELS FOR ISOLATED DIGIT RECOGNITION (20)

PDF
Hindi digits recognition system on speech data collected in different natural...
PDF
Performance of different classifiers in speech recognition
PDF
Performance of different classifiers in speech recognition
PDF
Et25897899
PDF
Intelligent Arabic letters speech recognition system based on mel frequency c...
PPTX
BSc 4th year project proposal final 16-5-22
PDF
AN EFFICIENT SPEECH RECOGNITION SYSTEM
PDF
Combined feature extraction techniques and naive bayes classifier for speech ...
PDF
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
PDF
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
PDF
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
PDF
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
PDF
44 i9 advanced-speaker-recognition
PDF
Dynamic Spectrum Derived Mfcc and Hfcc Parameters and Human Robot Speech Inte...
PDF
Effect of MFCC Based Features for Speech Signal Alignments
PDF
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
PDF
Effect of MFCC Based Features for Speech Signal Alignments
PDF
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
PDF
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
PDF
A Novel Method for Speaker Independent Recognition Based on Hidden Markov Model
Hindi digits recognition system on speech data collected in different natural...
Performance of different classifiers in speech recognition
Performance of different classifiers in speech recognition
Et25897899
Intelligent Arabic letters speech recognition system based on mel frequency c...
BSc 4th year project proposal final 16-5-22
AN EFFICIENT SPEECH RECOGNITION SYSTEM
Combined feature extraction techniques and naive bayes classifier for speech ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
44 i9 advanced-speaker-recognition
Dynamic Spectrum Derived Mfcc and Hfcc Parameters and Human Robot Speech Inte...
Effect of MFCC Based Features for Speech Signal Alignments
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
Effect of MFCC Based Features for Speech Signal Alignments
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
A Novel Method for Speaker Independent Recognition Based on Hidden Markov Model
Ad

More from cscpconf (20)

PDF
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR
PDF
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATION
PDF
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...
PDF
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIES
PDF
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGIC
PDF
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
PDF
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
PDF
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTIC
PDF
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAIN
PDF
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...
PDF
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
PDF
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...
PDF
AUTOMATED PENETRATION TESTING: AN OVERVIEW
PDF
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORK
PDF
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
PDF
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
PDF
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCH
PDF
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
PDF
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
PDF
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATION
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIES
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGIC
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTIC
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAIN
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...
AUTOMATED PENETRATION TESTING: AN OVERVIEW
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORK
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCH
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT

Recently uploaded (20)

PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PDF
Computing-Curriculum for Schools in Ghana
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PDF
A systematic review of self-coping strategies used by university students to ...
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
GENETICS IN BIOLOGY IN SECONDARY LEVEL FORM 3
PDF
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
PDF
Anesthesia in Laparoscopic Surgery in India
PDF
RMMM.pdf make it easy to upload and study
PPTX
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
PDF
VCE English Exam - Section C Student Revision Booklet
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PDF
Chinmaya Tiranga quiz Grand Finale.pdf
PPTX
Cell Structure & Organelles in detailed.
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Computing-Curriculum for Schools in Ghana
FourierSeries-QuestionsWithAnswers(Part-A).pdf
A systematic review of self-coping strategies used by university students to ...
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
Final Presentation General Medicine 03-08-2024.pptx
Module 4: Burden of Disease Tutorial Slides S2 2025
GENETICS IN BIOLOGY IN SECONDARY LEVEL FORM 3
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
Anesthesia in Laparoscopic Surgery in India
RMMM.pdf make it easy to upload and study
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
VCE English Exam - Section C Student Revision Booklet
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
Final Presentation General Medicine 03-08-2024.pptx
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
Chinmaya Tiranga quiz Grand Finale.pdf
Cell Structure & Organelles in detailed.

SEARCH TIME REDUCTION USING HIDDEN MARKOV MODELS FOR ISOLATED DIGIT RECOGNITION

  • 1. Sundarapandian et al. (Eds) : ITCS, SIP, CS & IT 09, pp. 33–38, 2013. © CS & IT-CSCP 2013 DOI : 10.5121/csit.2013.3104 SEARCH TIME REDUCTION USING HIDDEN MARKOV MODELS FOR ISOLATED DIGIT RECOGNITION Sheena C V1 , T M Thasleema2 and N K Narayanan3 Department of Information Technology, Kannur University, Kerala, INDIA sheenacvg@gmail.com1 , thasnitm1@hotmail.com2 and nknarayanan@gmail.com3 ABSTRACT This paper reports a word modeling algorithm for the Malayalam isolated digit recognition to reduce the search time in the classification process. A recognition experiment is carried out for the 10 Malayalam digits using the Mel Frequency Cepstral Coefficients (MFCC) feature parameters and k - Nearest Neighbor (k-NN) classification algorithm. A word modeling schema using Hidden Markov Model (HMM) algorithm is developed. From the experimental result it is reported that we can reduce the search time for the classification process using the proposed algorithm in telephony application by a factor of 80% for the first digit recognition. KEYWORDS Isolated Digit Recognition, Mel Frequency Cepstral Coefficient, k - Nearest Neighbor, Hidden Markov Model. 1. INTRODUCTION Speech recognition is one of the active research areas in Human Computer Interaction [1]. Speech Recognition is the ability of a computer to recognize general, naturally flowing utterances from a wide variety of speakers. It involves capturing and digitizing the sound waves, converting them to basic language units or phonemes, constructing words from phonemes, and contextually analyzing the words to ensure correct spelling for words that sound alike. This paper discusses two different stages for Malayalam digit recognition using Mel Frequency Cepstral Coefficients (MFCC) algorithm. 
In the first stage a recognition experiment is carried out using k-NN algorithm and in the later part a word modeling algorithm is proposed for the Malayalam telephony application using Hidden Markov Model (HMM) for faster classification. The basic theory of HMM was introduced and studied in the late 1960s and early 1970s [2]. But from the literature study, it is reported that only in the past some decades only HMM has been applied accurately to problems in speech processing. These models are very rich in mathematical structure and thus it can provide the theoretical basis for use in wide range of applications. Here we have developed a word modeling algorithm for the Malayalam isolated digits using HMM.
  • 2. 34 Computer Science & Information Technology (CS & IT) Malayalam is the one of the major language in the Dravidian language family. It is regional language of south Indian state of Kerala and also on the Lakshadweep islands spoken by about 36 million people [3]. The phonemic structure of Malayalam contains 51 vowels/consonant-vowel sounds in which 15 long and short vowels and 36 consonant-vowel sounds. Due to lineage of Malayalam to both Sanskrit and Tamil, Malayalam language structure has the largest number of phonemic utterances among the Indian languages [4]. Malayalam script includes letters capable of representing all the phoneme of Sanskrit and all Dravidian languages [5]. In this work, in the recognition stage we have used ten Malayalam digits uttered by a single speaker repeated 20 times and is tabulated in the table 1. In the word modeling part we have used isolated digits in Malayalam for the use of telephone applications. The organization of this paper is as follows. In section II feature extraction using MFCC algorithm is discussed. Section III explains recognition experiments using k-NN algorithm and the results are discussed. Section IV gives an overview on HMM followed by Malayalam isolated digit word modeling using the probability matrix. Finally in section V concludes the present work followed by directions for future work. 2. FEATURE EXTRACTION USING MFCC COEFFICIENTS Feature extraction involves simplifying the amount of resources required to describe a large set of data accurately. In this paper we discusses one of the basic speech feature extraction technique namely Mel Frequency Cepstral Coefficient (MFCC). The MFCC method uses the bank of filters scaled according to the Mel scale to smooth the spectrum and to perform similar to that executed by the human ear [6]. 
The filters with Mel scale spaced linearly at low frequencies up to 1 kHz and logarithmically at higher frequencies are used to capture the phonetical characteristics of the speech signals. Thus MFCCs are used to represent human speech perception models. MFCCs are computed as in fig 1.
  • 3. Computer Science & Information Technology (CS & IT) 35 Frame blocking is the process of segmenting the speech samples obtained from the analog to digital (A/D) conversion into small frames with time length in the range of (20 to 40) milliseconds. In the next step windowing is carried out to each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame. After that Fast Fourier Transform (FFT) is applied for converting each frame of N samples from the time domain in to the frequency domain. Then each frame with actual frequency, f measured in Hz is converted on a scale called the ‘Mel’ scale. The Mel frequency is calculated using the formula 700/)1log(2595 Hzmel FF += (1) The Mel-frequency scale is linear frequency spacing below 1000Hz and a logarithmic spacing above 1000Hz. The log Mel spectrum is again converted into time domain using discrete cosine transform (DCT) to get Mel Frequency Cepstral Coefficients (MFCC). Thus the MFCC is derived by applying above described procedure for each speech frame. A set of MFCC coefficients are extracted by taking the average of each frame and are used as a feature set in the k-NN recognition algorithm. 3. SPEECH RECOGNITION USING K-NN ALGORITHM k-Nearest Neighbor algorithm (k-NN) is part of supervised learning that has been used in many applications in the field of data mining, statistical pattern recognition and many others [7]. k- NN is a method for classifying objects based on closest training samples in the feature space. An object is classified by a majority vote of its neighbors. k is always a positive integer. The neighbors are taken from a set of objects for which the correct classification is known [8]. Hand proposed an effective trial and error approach for identifying the value of k that incurs highest recognition accuracy [9]. Various pattern recognition studies with highest performance accuracy are also reported based on these classification techniques [10]. 
k-NN assumes that the data is in a feature space. If k=1, then the algorithm is simply called the nearest neighbor algorithm. In the example in Fig. 2, we have three classes and the goal is to find a class label for the unknown example xj. In this case we use the Euclidean distance and a value of k=5 neighbors. Of the 5 closest neighbors, 4 belong to w1 and 1 belongs to w3, so xj is assigned to w1, the predominant class.
  • 4. 36 Computer Science & Information Technology (CS & IT) In the specified experiment a database of 20 repetitions of 10 Malayalam digits are used for testing and training purpose.One hundred samples are taken for training and one hundred samples for testing. An average recognition accuracy of 62% is obtained using k-NN algorithm for the Malayalam digit recognition. 4. WORD MODELLING USING HMM Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unobserved state [11]. The model is completely defined by the set of parameters A, B and π where A is the transition probability, NjiaA ij ≤≤≤= 1},{ , B is the emission probability, MkNjwbB kj ≤≤≤≤= 1,1)},( is the probability of the observation wk being generated from the state j, π is the initial state probabilities. Thus a model of N state and M observation can be defined by λ= (A,B,π) [2]. The present work discusses the modeling algorithm developed for the Malayalam isolated digit recognition in telephony application. We considered 50 mobile numbers of the BSNL service provider. Here we calculated and tabulated the initial probabilities and transition probabilities and are given in table 2 and the corresponding HMM model is shown in fig 3.
From the tabulated results, the initial probability is 0 for every digit except 8 and 9: digit 8 has initial probability 0.05 and digit 9 has initial probability 0.95, since every BSNL number in the database starts with 8 or 9. We use this result to reduce the search time in the recognition experiment: in the classification stage, recognition of the first digit considers only the digits 8 and 9, i.e. 2 of the 10 candidate classes, which reduces the search time for the first digit by 80%. The same procedure can be extended to the successive digits using the transition probabilities, giving a substantial reduction in search time in the recognition/classification experiment.
5. CONCLUSIONS
This paper presented a word modeling schema for Malayalam isolated digit recognition using mobile numbers uttered in Malayalam. The study was carried out in two stages. In the first stage, a recognition experiment using MFCC coefficients and the k-NN algorithm was carried out for the 10 Malayalam digits, and a recognition accuracy of 62% was obtained. In the second stage, a word modeling algorithm using HMM was proposed for Malayalam isolated digits. The experimental results show that the proposed algorithm reduces the search time for recognizing the first digit by 80% in the classification process of a BSNL telephone number recognition system. Modeling the isolated digits of numbers from different service providers with the HMM modeling algorithm, and recognizing them with other classification algorithms, are directions for our future research.
REFERENCES
[1] L. R. Rabiner and B. H. Juang (1993), Fundamentals of Speech Recognition, Prentice Hall.
[2] L. R. Rabiner and B. H. Juang (1986), "An Introduction to Hidden Markov Models", IEEE ASSP Magazine, pp. 4–16.
[3] H. P. Ramachandran (2008), Encyclopedia of Language and Linguistics, Oxford: Pergamon Press.
[4] S. Aiyar (1987), Dravidian Theories, p. 286.
[5] V. Govindaraju and S. Setlur (2009), Guide to OCR for Indic Scripts: Document Recognition and Retrieval, Advances in Pattern Recognition, Berlin: Springer, p. 126.
[6] I. Patel and Y. Srinivas Rao (2010), "Speech Recognition using HMM with MFCC analysis using frequency spectral decomposition technique", Signal and Image Processing: An International Journal, Vol. 1(2), pp. 101–110.
[7] B. Zhang and S. N. Srihari (2004), "Fast k-Nearest Neighbor using Cluster Based Trees", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 26(4), pp. 525–528.
[8] F. Pernkopf (2005), "Bayesian Network Classifiers versus Selective k-NN Classifier", Pattern Recognition, Vol. 38, pp. 1–10.
[9] D. J. Hand (1981), Discrimination and Classification, New York: Wiley.
[10] A. K. Ray and B. Chatterjee (1984), "Design of a Nearest Neighbor Classifier System for Bengali Character Recognition", Journal of the Institution of Electronics and Telecommunication Engineers, Vol. 30, pp. 226–229.
[11] D. Jurafsky and J. Martin (2004), Speech and Language Processing, Pearson Education.
Authors
Sheena C V received her MSc in Computer Science from Kannur University, Kerala, India, in 2008. She is currently a Ph.D. student under Prof. Dr. N. K. Narayanan at the Department of Information Technology, Kannur University, Kerala, India. Her research interests include Computer Vision, Digital Image Processing, Digital Speech Processing, Artificial Intelligence and Artificial Neural Networks.
T M Thasleema received her MSc in Computer Science from Kannur University, Kerala, India, in 2004. She has to her credit one book chapter and many research publications at national and international levels in the areas of speech processing and pattern recognition. She is currently pursuing her Ph.D. in speech signal processing at the Department of Information Technology, Kannur University, under the supervision of Prof. Dr. N. K. Narayanan.
Dr. N. K. Narayanan is a Senior Professor of Information Technology, Kannur University, Kerala, India. He earned a Ph.D. in speech signal processing from the Department of Electronics, CUSAT, Kerala, India, in 1990. He has published more than a hundred research papers in national and international journals in the areas of speech processing, image processing, neural networks, ANC and bioinformatics. He served as Chairman of the School of Information Science & Technology, Kannur University, from 2003 to 2008, and as Principal, Coop Engineering College, Vadakara, Kerala, India, during 2009-10. He is currently the Director, UGC IQAC, Kannur University.