Ghassaq S. Mosa and Abduladhem A. Ali
Signal Processing: An International Journal (SPIJ) Volume (3) : Issue (5) 161
Arabic Phoneme Recognition using Hierarchical Neural Fuzzy
Petri Net and LPC Feature Extraction
Ghassaq S. Mosa ghassaqsaeed@yahoo.com
College of Engineering/Department of
Computer Engineering
University of Basrah
Basrah, Iraq.
Abduladhem Abdulkareem Ali aduladhem@compengbas.net
College of Engineering/Department of
Computer Engineering
University of Basrah
Basrah, Iraq.
Abstract
The basic idea behind the proposed hierarchical phoneme recognition is that phonemes can be classified into specific phoneme types, which can be organized within a hierarchical tree structure. The recognition principle is "divide and conquer": a large problem is divided into many smaller, easier-to-solve problems whose solutions can be combined to yield a solution to the complex problem. The fuzzy Petri net (FPN) is a powerful modeling tool for knowledge systems based on fuzzy production rules. To build a hierarchical classifier using the Neural Fuzzy Petri net (NFPN), each node of the hierarchical tree is represented by an NFPN. Every NFPN in the tree is trained by repeatedly presenting a set of input patterns along with the class to which each particular pattern belongs. The feature vector used as input to each NFPN consists of LPC parameters.
Keywords: Hierarchical networks, Linear predictive coding, Neural fuzzy Petri net, phoneme recognition,
Speech recognition.
1. INTRODUCTION
The Arabic Language is one of the oldest living languages in the world. The bulk of classical
Islamic literature was written in classical Arabic (CA), and the Holy Qur’an was revealed in the
Classical Arabic language. Standard Arabic is the mother (spoken) tongue for more than 200
million people living in the vast geographical area known as the Arab world, which includes
countries such as Iraq, Syria, Jordan, Egypt, Saudi Arabia, Morocco, and Sudan. Arabic is one of
the world's oldest Semitic languages, and it is the fifth most widely used. Arabic is the language of
communication in official discourse, teaching, religious activities, and in literature.
Much work has been done on the recognition of Arabic phonemes. These studies include the use of neural networks [1-3], hidden Markov models [4], and fuzzy systems [5].
Hierarchical approaches based on neural networks have been employed for other languages with different techniques [5-7]. In this paper a hierarchical Arabic phoneme recognition system is proposed based on an LPC feature vector and the Neural Fuzzy Petri Net (NFPN). The principle is based on proposing a decision tree for the Arabic phonemes. An NFPN is used as a decision network in each
node in the tree.
2. ARABIC LANGUAGE ALPHABET
The sounds of every language are typically partitioned into two broad categories: vowels and consonants. Vowels are produced without obstructing air flow through the vocal tract, while consonants involve significant obstruction, creating a noisier sound with weaker amplitude. The Arabic language consists of 28 letters. Arabic is written from right to left, and letters take different forms depending on their position in a word; some letters differ from others only in diacritical points placed above or beneath them. Arab linguists classify Arabic letters into two categories, sun and moon; when a sun letter is preceded by the prefix Alif-Laam in a noun, the Laam consonant is not pronounced. The Arabic language has six different vowels, three short and three long. The short vowels are fatha (a), kasrah (i), and dammah (u). No special letters are assigned to the short vowels; instead, special marks and diacritical notations above and beneath the consonants are used. The three long vowels are durational allophones of the short vowels, as in mad, meet, and soon, and correspond to long fatha, long kasrah, and long dammah respectively. A consonant can also be un-vowelised (not followed by a vowel); in this case the diacritic sakoon is placed above the consonant.
3. SPEECH RECOGNITION SYSTEM
The general model of the speech recognition system used here has five major phases: recording and digitizing the speech signal, segmentation, pre-processing, feature extraction, and decision-making. Each phase is explained in more detail below, along with the approaches used to enhance the performance of the speech recognition system.
A. A/D Conversion: The input speech signal is changed into an electrical signal using a microphone. Before A/D conversion, a low-pass filter is used to eliminate aliasing during sampling. A continuous speech signal has a maximum frequency component at about 16 kHz.
B. Segmentation: Speech segmentation plays an important role in speech recognition by reducing the requirement for large memory and minimizing the computational complexity of large-vocabulary continuous speech recognition systems [8].
C. Preprocessing: Preprocessing includes filtering and scaling of the incoming signal in order to reduce noise and other external effects. Filtering the speech signal before the recognition task is an important step to remove noise, which may be either low-frequency or high-frequency. Figure (1) shows the effect of preprocessing on the signal.
D. Feature Extraction: The goal of feature extraction is to represent any speech signal by a finite
number of measures (or features) of the signal. This is because the entirety of the information in
the acoustic signal is too much to process, and not all of the information is relevant for specific
tasks. In present ASR systems, the approach of feature extraction has generally been to find a
representation that is relatively stable for different examples of the same speech sound, despite
differences in the speaker or environmental characteristics, while keeping the part that represents
the message in the speech signal relatively intact [9].
FIGURE 1: The word Basrah before and after filtering and normalizing.
4. LINEAR PREDICTIVE CODING
Linear predictive analysis has been one of the most powerful speech analysis techniques since it was introduced in the early 1970s [10]. LPC is a model based on the vocal tract of human beings [11]. Figure (2) shows the block diagram of the LPC calculations.
FIGURE 2: The LPC block diagram
A. Frame Blocking: The digitized speech signal, S(n), is blocked into frames of N samples, with adjacent frames separated by M samples. If we denote the l-th frame of speech by x_l(n), and there are L frames within the entire speech signal, then

x_l(n) = S(Ml + n),  0 ≤ n ≤ N − 1,  0 ≤ l ≤ L − 1
B. Windowing: To minimize the discontinuity at the beginning and end of each frame, and thereby prevent spectral leakage, every frame is multiplied by a window function. Here the Hamming window is used:

w(n) = 0.54 − 0.46 cos( 2πn / (N − 1) ),  0 ≤ n ≤ N − 1   (1)
Figure (3) shows the effect of windowing.
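The frame-blocking and windowing steps above can be sketched as follows. The frame length N = 320 (20 ms at 16 kHz) and hop M = 160 are assumed illustrative values, since the paper does not fix them; the coefficients follow the Hamming window of Equation (1).

```python
import numpy as np

def frame_and_window(signal, N=320, M=160):
    """Block the signal into L overlapping frames of N samples with a hop
    of M samples, then multiply each frame by a Hamming window (Eq. 1)."""
    L = 1 + (len(signal) - N) // M                      # number of full frames
    frames = np.stack([signal[l * M : l * M + N] for l in range(L)])
    n = np.arange(N)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))   # Hamming window
    return frames * w

# 1 s of a 100 Hz tone sampled at 16 kHz -> 99 frames of 20 ms each
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100.0 * t)
frames = frame_and_window(x)
print(frames.shape)   # (99, 320)
```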
FIGURE 3: Effect of window on signal.
C. Autocorrelation Analysis: In this step, each frame of the windowed signal is autocorrelated to give

r_l(m) = Σ_{n=0}^{N−1−m} x̃_l(n) x̃_l(n + m),  m = 0, 1, …, p   (2)

where x̃_l(n) is the windowed signal and the highest autocorrelation index, p, is the order of the LPC analysis. Typically, values of p from 8 to 16 are used. It is interesting to note that the zeroth autocorrelation, r_l(0), is the energy of the frame. The frame energy is an important parameter for speech detection [10,11]. Equation (2) is evaluated for every frame

l = 0, 1, 2, …, L − 1   (3)
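A direct implementation of the short-time autocorrelation of Equation (2); the order p = 12 is an assumed default within the paper's stated range of 8 to 16, and the short frame below is illustrative.

```python
import numpy as np

def autocorr(frame, p=12):
    """Short-time autocorrelation r(0)..r(p) of one windowed frame (Eq. 2)."""
    N = len(frame)
    return np.array([np.dot(frame[: N - m], frame[m:]) for m in range(p + 1)])

frame = np.array([1.0, 0.5, 0.25, 0.125])
r = autocorr(frame, p=2)
print(r[0])   # frame energy: 1 + 0.25 + 0.0625 + 0.015625 = 1.328125
```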
D. LPC Analysis: The next processing step is the LPC calculation, which converts each autocorrelated frame into an "LPC parameter set"; the set might be the LPC coefficients, the reflection (or PARCOR) coefficients, the log area ratio coefficients, or the cepstral coefficients. The formal method for converting from autocorrelation coefficients to an LPC parameter set (for the LPC autocorrelation method) is known as Durbin's method and can formally be given as the following algorithm, for convenience omitting the subscript l [10]:
E^(0) = r(0)   (4)

k_i = [ r(i) − Σ_{j=1}^{i−1} α_j^(i−1) r(i − j) ] / E^(i−1),  1 ≤ i ≤ p   (5)

α_i^(i) = k_i   (6)

α_j^(i) = α_j^(i−1) − k_i α_{i−j}^(i−1),  1 ≤ j ≤ i − 1   (7)

E^(i) = (1 − k_i²) E^(i−1)   (8)

The summation in Equation (5) is omitted for i = 1. Equations (4-8) are solved recursively for i = 1, 2, …, p, and the final solution is given as [11]:

a_m = α_m^(p) = LPC coefficients,  1 ≤ m ≤ p   (9)

k_m = PARCOR coefficients   (10)

g_m = log[ (1 − k_m) / (1 + k_m) ] = log area ratio coefficients   (11)
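Durbin's recursion of Equations (4)-(8) can be sketched directly; the toy autocorrelation sequence below is illustrative.

```python
import numpy as np

def durbin(r):
    """Levinson-Durbin recursion (Eqs. 4-8): autocorrelations r(0..p) ->
    LPC coefficients a(1..p) and PARCOR (reflection) coefficients k(1..p)."""
    p = len(r) - 1
    a = np.zeros(p + 1)                  # a[1..p]; a[0] is unused
    k = np.zeros(p + 1)
    E = r[0]                             # Eq. (4): initial prediction error
    for i in range(1, p + 1):
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])    # numerator of Eq. (5)
        k[i] = acc / E                                # Eq. (5)
        a_new = a.copy()
        a_new[i] = k[i]                               # Eq. (6)
        for j in range(1, i):
            a_new[j] = a[j] - k[i] * a[i - j]         # Eq. (7)
        a = a_new
        E *= 1.0 - k[i] ** 2                          # Eq. (8)
    return a[1:], k[1:]

# An AR(1)-like autocorrelation [1, 0.5, 0.25] yields a = [0.5, 0], k = [0.5, 0]
a, k = durbin(np.array([1.0, 0.5, 0.25]))
print(a, k)
```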
E. LPC Parameter Conversion to Cepstral Coefficients: A very important LPC parameter set, which can be derived from the LPC coefficient set, is the set of LPC cepstral coefficients c_m, calculated by the following recursion:

c_m = a_m + Σ_{k=1}^{m−1} (k/m) c_k a_{m−k},  1 ≤ m ≤ p   (12)

where a_m are the LPC coefficients.
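A minimal sketch of the cepstral recursion, with illustrative coefficient values:

```python
def lpc_to_cepstrum(a):
    """LPC -> cepstral coefficients via the recursion of Eq. (12):
    c_m = a_m + sum_{k=1}^{m-1} (k/m) c_k a_{m-k}, for 1 <= m <= p."""
    p = len(a)
    c = []
    for m in range(1, p + 1):
        cm = a[m - 1] + sum((k / m) * c[k - 1] * a[m - k - 1] for k in range(1, m))
        c.append(cm)
    return c

# c1 = a1 = 0.5;  c2 = a2 + (1/2) c1 a1 = 0.25 + 0.125 = 0.375
print(lpc_to_cepstrum([0.5, 0.25]))   # [0.5, 0.375]
```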
5. FUZZY NEURAL PETRI NET
After the desired features are extracted from the input data, they are applied to the decision-making stage in order to decide on the specific class to which the input data belongs. In this work the NFPN is used as the decision-making network.
Petri nets, developed by Carl Adam Petri in his Ph.D. thesis in 1962, are generally considered a tool for studying and modeling systems. A Petri net (PN) is foremost a mathematical description, but it is also a visual or graphical representation of a system. The application areas of Petri nets being vigorously investigated include knowledge representation and discovery, robotics, process control, diagnostics, grid computation, and traffic control, to name a few representative domains [12]. Petri nets are a mathematical tool to describe concurrent systems and model their behavior, and they have been used to study the performance and dependability of a variety of systems.
Petri nets essentially consist of three key components: places, transitions, and directed arcs [13]. The directed arcs connect places to transitions and transitions to places; no arc connects a transition to a transition or a place to a place directly. Each place contains zero or more tokens. A vector holding the number of tokens over all places defines the state of the Petri net. A simple Petri net graph is shown in Figure (4). The configuration of the Petri net combined with the location of tokens in the net at any particular time is called the Petri net state.
A Petri net structure is formally described by the five-tuple (P, T, I, O, M),
where P is the set of places {p1, …, pn},
T is the set of transitions {t1, …, tm},
I is the set of places connected via arcs as inputs to transitions,
O is the set of places connected via arcs as outputs from transitions,
and M is the set of places that contain tokens (the marking).
The formal definition and state of Figure (4) are:
(P, T, I, O, M):
P = { p1, p2, p3, p4 }
T = { t1, t2 }
I = { { p1 }, { p2, p3 } }
O = { { p2, p3 }, { p4 } }
M = { 1, 0, 0, 0 }
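The token game for this example net can be sketched in a few lines; the dictionaries mirror the (P, T, I, O, M) tuple above.

```python
# Token game for the net of Figure (4): a transition is enabled when every
# input place holds a token; firing moves tokens from inputs to outputs.
I = {"t1": ["p1"], "t2": ["p2", "p3"]}     # input places of each transition
O = {"t1": ["p2", "p3"], "t2": ["p4"]}     # output places of each transition
M = {"p1": 1, "p2": 0, "p3": 0, "p4": 0}   # initial marking

def fire(t, M):
    """Fire transition t if enabled; return True on success."""
    if not all(M[p] > 0 for p in I[t]):
        return False
    for p in I[t]:
        M[p] -= 1
    for p in O[t]:
        M[p] += 1
    return True

fire("t1", M)   # p1 -> p2, p3
fire("t2", M)   # p2, p3 -> p4
print(M)        # {'p1': 0, 'p2': 0, 'p3': 0, 'p4': 1}
```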
The structure of the proposed Neural Fuzzy Petri Net is shown in Figures (5) and (6).
FIGURE 4: Petri Net graph.
The network has the following three layers:
- an input layer composed of n input places;
- a transition layer composed of hidden transitions;
- an output layer consisting of m output places.
The input places are marked by the values of the features. The transitions act as processing units. Their firing depends on the parameters of the transitions, which are the thresholds, and the parameters of the arcs (connections), which are the weights. Each output place corresponds to a class of patterns. The marking of an output place reflects the level of membership of the pattern in the corresponding class [14].
FIGURE 5: The structure of the Neural Fuzzy Petri Net.
FIGURE 6: Section of the net outlines the notations.
The specifications of the network are as follows:
- x_j is the marking level of the j-th input place, produced by a triangular mapping function. The top of the triangle is centered on the average of the input values, the length of the triangular base equals the difference between the minimum and maximum input values, and the height of the triangle is unity. This process keeps the inputs of the network within the interval [0,1], so this generalization of the Petri net remains in full agreement with the two-valued generic version of the Petri net [14]:

x_j = f(input(j))   (13)

where f is the triangular mapping function:

f(x) = (x − min(x)) / (average(x) − min(x)),  if x < average(x)
f(x) = (max(x) − x) / (max(x) − average(x)),  if x > average(x)
f(x) = 1,  if x = average(x)
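A minimal sketch of the triangular mapping of Equation (13); the argument names are illustrative.

```python
def tri(x, xmin, xavg, xmax):
    """Triangular mapping of Eq. (13): height 1 at the average input value,
    base spanning from the minimum to the maximum input value."""
    if x < xavg:
        return (x - xmin) / (xavg - xmin)
    if x > xavg:
        return (xmax - x) / (xmax - xavg)
    return 1.0

print(tri(2.0, 0.0, 4.0, 8.0))   # 0.5: halfway up the rising edge
print(tri(6.0, 0.0, 4.0, 8.0))   # 0.5: halfway down the falling edge
```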
- w_ij is the weight between the i-th transition and the j-th input place;
- r_ij is a threshold level associated with the marking level of the j-th input place and the i-th transition;
- z_i is the activation level of the i-th transition, defined as

z_i = T_{j=1}^{n} [ w_ij S ( r_ij → x_j ) ],  j = 1, 2, …, n;  i = 1, 2, …, number of hidden transitions   (14)

where T and S denote t-norm and s-norm compositions and → is a fuzzy implication.
- y_k is the marking level of the k-th output place, produced by the transition layer as a nonlinear mapping of the weighted sum of the activation levels of the transitions, z_i, through the associated connections v_ik:

y_k = f( Σ_{i=1}^{No. of transitions} v_ik z_i ),  k = 1, 2, …, m   (15)

where f is a nonlinear, monotonically increasing function from R to [0,1].
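Equations (14) and (15) can be sketched as follows. The paper does not fix the t-norm, s-norm, or implication operators here, so product, probabilistic sum, and the Gödel implication are used as assumed illustrative choices, and all numeric values below are made up for the example.

```python
import math

def implies(r, x):
    """Goedel implication r -> x (an assumed operator choice)."""
    return 1.0 if r <= x else x

def transition_level(X, W, R):
    """Eq. (14) with product as the t-norm T and probabilistic sum as the
    s-norm S (both assumed illustrative choices)."""
    z = 1.0
    for x, w, r in zip(X, W, R):
        s = w + implies(r, x) - w * implies(r, x)   # S-composition
        z *= s                                      # T-composition
    return z

def output_level(Z, V):
    """Eq. (15) with the sigmoid of Eq. (18) as the nonlinearity f."""
    u = sum(v * z for v, z in zip(V, Z))
    return 1.0 / (1.0 + math.exp(-u))

Z = transition_level(X=[0.8, 0.3], W=[0.5, 0.9], R=[0.6, 0.7])
Y = output_level([Z], [1.0])
print(round(Z, 4))   # 0.93
```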
Learning Procedure
The learning process minimizes a performance index in order to optimize the network parameters (weights and thresholds). The performance index used is the standard sum of squared errors [4]:
E = (1/2) Σ_{k=1}^{m} ( t_k − y_k )²   (16)
where t_k is the k-th target and y_k is the k-th output. The updates of the parameters are performed according to the gradient method:

param(iter + 1) = param(iter) − α ∇_param E   (17)

where ∇_param E is the gradient of the performance index E with respect to the network parameters, α is the learning-rate coefficient, and iter is the iteration counter.
The nonlinear function associated with the output places is the standard sigmoid [14]:

y_k = 1 / ( 1 + exp( − Σ_i z_i v_ik ) )   (18)
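A minimal sketch of one training loop restricted to the output weights v of Equation (18), with the gradient obtained by the chain rule from Equations (16) and (18); the weights, transition levels, and target below are illustrative, and the transition levels are held fixed for simplicity.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def step(V, Z, t, alpha=0.5):
    """One gradient update (Eq. 17) of the output weights V, with fixed
    transition levels Z; dE/dv_i = (y - t) * y * (1 - y) * z_i follows
    from Eqs. (16) and (18) by the chain rule."""
    y = sigmoid(sum(v * z for v, z in zip(V, Z)))
    return [v - alpha * (y - t) * y * (1 - y) * z for v, z in zip(V, Z)], y

V, Z, target = [0.1, -0.2], [0.9, 0.4], 0.8
errors = []
for _ in range(200):
    V, y = step(V, Z, target)
    errors.append(0.5 * (target - y) ** 2)
# the squared error shrinks toward zero as the weights are adapted
```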
In this paper the fuzzy Petri net is used for Arabic phoneme recognition, with LPC (linear predictive coding) as the feature-extraction technique for the speech signal.
6. EXPERIMENTAL RESULTS
For each phoneme, there are 24 recorded words. In eight of the 24 words, the target phoneme is the initial letter of the word; in the second group of eight words the target phoneme appears in the middle of the word; and in the remaining eight words the target phoneme comes at the end of the word. Manual segmentation is used to extract the target phoneme from the recorded words. Half of the phoneme samples are used for training and the other half for testing the resulting system. Adobe Audition 1.5 is used to record and save the data files. The recorded data is stored as (.WAV) files with 16-bit sample precision and a sampling rate of 16 kHz.
A hierarchical tree is formed for the phonemes as shown in Figure (7). Five classes exist in this tree. The first class is the fricatives, which contain both voiced and unvoiced phonemes; the second class is the stops, which also contain both voiced and unvoiced phonemes; the third class contains the semi-vowel phonemes; the fourth class contains the nasals; and the fifth class contains the affricate, lateral, and trill phonemes. Each node in the tree is recognized using a separate NFPN with LPC parameters as inputs to these networks. The recognition principle is based on divide and conquer. Class 1 is identified with a net consisting of 18 input places, 56 transitions, and one output place. For class 2 the net consists of 18 input places, 47 transitions, and one output place. For class 3 there are 18 input places, 22 hidden transitions, and one output place. For class 4 the net has 18 input places, 22 hidden transitions, and one output place. For class 5 the net consists of 18 input places, 36 hidden transitions, and one output place. Table 1 shows the recognition accuracy for each phoneme and class. The total recognition accuracy reached 79.6378%.
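The divide-and-conquer dispatch over such a tree can be sketched as follows; the score functions stand in for trained NFPNs and the class labels are illustrative stubs, not the full tree of Figure (7).

```python
# Each tree node maps a label to (score_fn, subtree); a None subtree is a
# leaf.  A real system would call a trained NFPN's output marking here.
def classify(x, tree):
    """Descend the tree, at each level following the child whose scoring
    network gives the highest output for the feature vector x."""
    label, (fn, sub) = max(tree.items(), key=lambda kv: kv[1][0](x))
    return label if sub is None else classify(x, sub)

def leaves():
    return {"voiced": (lambda x: x[1], None),
            "unvoiced": (lambda x: 1 - x[1], None)}

toy_tree = {
    "fricative": (lambda x: x[0], leaves()),
    "stop":      (lambda x: 1 - x[0], leaves()),
}
print(classify([0.9, 0.2], toy_tree))   # unvoiced (via the fricative branch)
```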
7. CONCLUSION
An Arabic phoneme recognition system is proposed in this work. The decision stage is based on an LPC feature vector and a hierarchical NFPN decision network. The experimental results show that it is possible to use the hierarchical structure to recognize phonemes using NFPNs.
FIGURE 7: The proposed hierarchical tree.
TABLE 1: The phonemes recognition accuracy
8. REFERENCES
1. S. Ismail, and A. Ahmad, “Recurrent neural network with back propagation through time
algorithm for Arabic recognition,” In Proceedings of the 18th ESM Magdeburg, Germany, 13-16
June 2004.
2. S. Al-Sayegh and A. AbedEl-Kader, “Arabic phoneme recognizer based on neural network”, In Proceedings of International Conf. Intelligent Knowledge Systems (IKS-2004), August 16-20, 2004.
3. Y. Alotaibi, S. Selouani, and D. O’Shaughnessy, “Experiments on Automatic Recognition of
Nonnative Arabic Speech”, EURASIP Journal on Audio, Speech, and Music Processing, Vol.
2008, pp.1-6, 2008.
4. M. Awais, and Habib-ur-Rehman, “Recognition of Arabic phonemes using fuzzy rule base
system”, In Proceedings of 7th Int. Multi Topic Conf. INMIC-2003, pp.367-370, 8-9 Dec. 2003.
5. P. Schwarz, P. Matejka, and J. Cernocky, “Hierarchical Structures of Neural Networks for
Phoneme Recognition”, In Proceedings of IEEE Int. Conf. Acoustics, Speech and Signal
Processing (ICASSP-2006), 14-19 May 2006.
6. J. Pinto and H. Hermansky, “Combining Evidence from a Generative and a Discriminative
Model in Phoneme Recognition”, In Proceedings of Interspeech. Brisbane, Australia 22-26
September 2008
7. M. Scholz, and R. Vigario, “ Nonlinear PCA: a new hierarchical approach”, In Proceedings of
European Symposium on Artificial Neural Networks ESANN-2002, pp. 439-444, Bruges,
Belgium, 24-26 April 2002.
8. Y. Suh and Y. Lee, "Phoneme Segmentation of Continuous Speech using Multi-Layer
Perceptron", In Proceedings of 4th Int. Conf. Spoken Language, ICSLP-96, 3, pp. 1297-1300,
1996.
9. Y. Gong, "Speech Recognition in Noisy Environments: A Survey", Speech Communication, 16,
pp. 261-291, 1995.
10. N. Awasthy, J. P. Saini and D. S. Chauhan, "Spectral Analysis of Speech: A New Technique", Int.
J. Signal Processing, 2(1), pp. 19-28, 2005.
11. L. Rabiner and R. W. Schafer, "Fundamentals of Speech Recognition", Prentice Hall, 1993.
12. S. I. Ahson “Petri net models of fuzzy neural networks,” IEEE Trans. Syst. Man Cybern., 25(6),
pp. 926–932, Jun. 1995.
13. A. Seely, “Petri Net Implementation of Neural Network Elements”, M.Sc. Thesis, Nova
Southeastern University, 2002.
14. H. M. Abdul-Ridha, "ECG Signal Classification using Neural, Neural Fuzzy and Neural Fuzzy
Petri Networks", Ph.D. Thesis, Department of Electrical Engineering, University of Basrah, 2007.
More Related Content

PDF
Modeling of Speech Synthesis of Standard Arabic Using an Expert System
PDF
Isolated word recognition using lpc & vector quantization
PDF
High Quality Arabic Concatenative Speech Synthesis
PPT
Environmental Sound detection Using MFCC technique
PDF
Dynamic Spectrum Derived Mfcc and Hfcc Parameters and Human Robot Speech Inte...
PDF
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
DOC
Voice Morphing
PDF
Isolated words recognition using mfcc, lpc and neural network
Modeling of Speech Synthesis of Standard Arabic Using an Expert System
Isolated word recognition using lpc & vector quantization
High Quality Arabic Concatenative Speech Synthesis
Environmental Sound detection Using MFCC technique
Dynamic Spectrum Derived Mfcc and Hfcc Parameters and Human Robot Speech Inte...
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
Voice Morphing
Isolated words recognition using mfcc, lpc and neural network

What's hot (19)

PDF
Learning R via Python…or the other way around
PDF
A017410108
PDF
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
DOCX
Voice biometric recognition
PDF
A Rewriting Approach to Concurrent Programming Language Design and Semantics
PPTX
Text independent speaker recognition system
PDF
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PDF
NLP using transformers
PDF
Access Control via Belnap Logic
PDF
Sequence Learning with CTC technique
PPTX
Notes on attention mechanism
PDF
Speaker Recognition System using MFCC and Vector Quantization Approach
PDF
Introduction to Transformers for NLP - Olga Petrova
PDF
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
PDF
44 i9 advanced-speaker-recognition
PDF
Realization and design of a pilot assist decision making system based on spee...
PDF
Deep Learning for Machine Translation - A dramatic turn of paradigm
PDF
Text-Independent Speaker Verification Report
PPTX
Learning R via Python…or the other way around
A017410108
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
Voice biometric recognition
A Rewriting Approach to Concurrent Programming Language Design and Semantics
Text independent speaker recognition system
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
NLP using transformers
Access Control via Belnap Logic
Sequence Learning with CTC technique
Notes on attention mechanism
Speaker Recognition System using MFCC and Vector Quantization Approach
Introduction to Transformers for NLP - Olga Petrova
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
44 i9 advanced-speaker-recognition
Realization and design of a pilot assist decision making system based on spee...
Deep Learning for Machine Translation - A dramatic turn of paradigm
Text-Independent Speaker Verification Report
Ad

Similar to Arabic Phoneme Recognition using Hierarchical Neural Fuzzy Petri Net and LPC Feature Extraction (20)

PDF
Broad Phoneme Classification Using Signal Based Features
PDF
Broad phoneme classification using signal based features
PDF
50120140505010
PDF
Emotion Recognition Based On Audio Speech
PDF
Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...
PDF
50120140501002
PDF
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
PDF
An expert system for automatic reading of a text written in standard arabic
PDF
Phonetic distance based accent
PDF
A GAUSSIAN MIXTURE MODEL BASED SPEECH RECOGNITION SYSTEM USING MATLAB
PDF
Cancellation of Noise from Speech Signal using Voice Activity Detection Metho...
PPT
Asr
PDF
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PDF
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
PDF
Hindi digits recognition system on speech data collected in different natural...
PDF
SPEAKER VERIFICATION USING ACOUSTIC AND PROSODIC FEATURES
PDF
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
PDF
5215ijcseit01
PDF
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
PDF
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
Broad Phoneme Classification Using Signal Based Features
Broad phoneme classification using signal based features
50120140505010
Emotion Recognition Based On Audio Speech
Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...
50120140501002
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
An expert system for automatic reading of a text written in standard arabic
Phonetic distance based accent
A GAUSSIAN MIXTURE MODEL BASED SPEECH RECOGNITION SYSTEM USING MATLAB
Cancellation of Noise from Speech Signal using Voice Activity Detection Metho...
Asr
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Hindi digits recognition system on speech data collected in different natural...
SPEAKER VERIFICATION USING ACOUSTIC AND PROSODIC FEATURES
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
5215ijcseit01
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
Ad

Recently uploaded (20)

PDF
Paper A Mock Exam 9_ Attempt review.pdf.
PDF
BP 704 T. NOVEL DRUG DELIVERY SYSTEMS (UNIT 2).pdf
PDF
AI-driven educational solutions for real-life interventions in the Philippine...
PPTX
ELIAS-SEZIURE AND EPilepsy semmioan session.pptx
PDF
Trump Administration's workforce development strategy
PPTX
Introduction to pro and eukaryotes and differences.pptx
PDF
IGGE1 Understanding the Self1234567891011
PPTX
A powerpoint presentation on the Revised K-10 Science Shaping Paper
DOC
Soft-furnishing-By-Architect-A.F.M.Mohiuddin-Akhand.doc
DOCX
Cambridge-Practice-Tests-for-IELTS-12.docx
PDF
Practical Manual AGRO-233 Principles and Practices of Natural Farming
PDF
LDMMIA Reiki Yoga Finals Review Spring Summer
PDF
FORM 1 BIOLOGY MIND MAPS and their schemes
PPTX
202450812 BayCHI UCSC-SV 20250812 v17.pptx
PPTX
TNA_Presentation-1-Final(SAVE)) (1).pptx
PPTX
Virtual and Augmented Reality in Current Scenario
PDF
Uderstanding digital marketing and marketing stratergie for engaging the digi...
PPTX
Onco Emergencies - Spinal cord compression Superior vena cava syndrome Febr...
PDF
Empowerment Technology for Senior High School Guide
PPTX
History, Philosophy and sociology of education (1).pptx
Paper A Mock Exam 9_ Attempt review.pdf.
BP 704 T. NOVEL DRUG DELIVERY SYSTEMS (UNIT 2).pdf
AI-driven educational solutions for real-life interventions in the Philippine...
ELIAS-SEZIURE AND EPilepsy semmioan session.pptx
Trump Administration's workforce development strategy
Introduction to pro and eukaryotes and differences.pptx
IGGE1 Understanding the Self1234567891011
A powerpoint presentation on the Revised K-10 Science Shaping Paper
Soft-furnishing-By-Architect-A.F.M.Mohiuddin-Akhand.doc
Cambridge-Practice-Tests-for-IELTS-12.docx
Practical Manual AGRO-233 Principles and Practices of Natural Farming
LDMMIA Reiki Yoga Finals Review Spring Summer
FORM 1 BIOLOGY MIND MAPS and their schemes
202450812 BayCHI UCSC-SV 20250812 v17.pptx
TNA_Presentation-1-Final(SAVE)) (1).pptx
Virtual and Augmented Reality in Current Scenario
Uderstanding digital marketing and marketing stratergie for engaging the digi...
Onco Emergencies - Spinal cord compression Superior vena cava syndrome Febr...
Empowerment Technology for Senior High School Guide
History, Philosophy and sociology of education (1).pptx

Arabic Phoneme Recognition using Hierarchical Neural Fuzzy Petri Net and LPC Feature Extraction

  • 1. Ghassaq S. Mosa and Abduladhem A. Ali Signal Processing: An International Journal (SPIJ) Volume (3) : Issue (5) 161 Arabic Phoneme Recognition using Hierarchical Neural Fuzzy Petri Net and LPC Feature Extraction Ghassaq S. Mosa ghassaqsaeed@yahoo.com College of Engineering/Department of Computer Engineering University of Basrah Basrah, Iraq. Abduladhem Abdulkareem Ali aduladhem@compengbas.net College of Engineering/Department of Computer Engineering University of Basrah Basrah, Iraq. Abstract The basic idea behind the proposed hierarchical phoneme recognition is that phonemes can be classified into specific phoneme types which can be organized within a hierarchical tree structure. The recognition principle is based on “divide and conquer” in which a large problem is divided into many smaller, easier to solve problems whose solutions can be combined to yield a solution to the complex problem. Fuzzy Petri net (FPN) is a powerful modeling tool for fuzzy production rules based knowledge systems. For building hierarchical classifier using Neural Fuzzy Petri net (NFPN), Each node of the hierarchical tree is represented by a NFPN. Every NFPN in the hierarchical tree is trained by repeatedly presenting a set of input patterns along with the class to which each particular pattern belongs. The feature vector used as input to the NFPN is the LPC parameters. Keywords: Hierarchical networks, Linear predictive coding, Neural fuzzy Petri net, phoneme recognition, Speech recognition. . 1. INTRODUCTION The Arabic Language is one of the oldest living languages in the world. The bulk of classical Islamic literature was written in classical Arabic (CA), and the Holy Qur’an was revealed in the Classical Arabic language. Standard Arabic is the mother (spoken) tongue for more than 200 million people living in the vast geographical area known as the Arab world, which includes countries such as Iraq, Syria, Jordan, Egypt, Saudi Arabia, Morocco, and Sudan. 
Arabic is one of the world's oldest Semitic languages, and it is the fifth most widely used. Arabic is the language of communication in official discourse, teaching, religious activities, and in literature. Many works have been done on the recognition of Arabic phonemes. These studies include the use of neural networks [1-3], Hidden Markov Model [4] and Fuzzy system [5]. Hierarchical approaches based on neural networks were employed in other languages with different techniques [5-7] . In this paper hieratical Arabic phoneme recognition system is proposed based on LPC feature vector and neural Fuzzy Petri Net (NFPN). The principle is based on proposing a decision tree for the Arabic phonemes. NFPN is used as a decision network in each
  • 2. Ghassaq S. Mosa and Abduladhem A. Ali Signal Processing: An International Journal (SPIJ) Volume (3) : Issue (5) 162 node in the tree. 2. ARABIC LANGUGE ALPHABET Every language is typically partitioned into two broad categories: vowels and consonants. Vowels are produced without obstructing air flow through the vocal tract, while consonants involve significant obstruction, creating a nosier sound with weaker amplitude. The Arabic language consists of 28 letters, Arabic is written from right to left and letters take different forms depending on their position in a word; some letters are similar to others except for diacritical points placed above or beneath them. Arab linguists classify Arabic letters into two categories: sun and moon. Sun letters are indicated by an asterisk. When the sun letters are preceded by the prefix Alif- Laam in nouns, the Laam consonant is not pronounced. The Arabic language has six different vowels, three short and three long. The short vowels are fatha (a), short kasrah (i), and short dammah (u). No special letters are assigned to the short vowels; however special marks and diacritical notations above and beneath the consonants are used. The three long vowels are durational allophones of the above short vowels, as in mad, meet, and soon and correspond to long fatha, long kasrah, and long dammah respectively. Consonants can be also un-vowelised (not followed by a vowel); in this case a diacritic sakoon is placed above the Consonant. Vowels and their IPA (International Phonetic Alphabet) equivalents . . 3. SPEECH RECOGNITION SYSTEM The general model for speech recognition system here are five major phases recording & digitalizing speech signal, segmentation, pre-processing signal, feature extraction and decision- making, each phase will be explained in more details along with the approaches used to enhance the performance of the speech recognition systems. A. 
A/D Conversion: The input speech signal is changed into an electrical signal by using a microphone. Before performing A/D conversion, a low pass filter is used to eliminate the aliasing effect during sampling. A continuous speech signal has a maximum frequency component at about 16 KHz. B. Segmentation: Speech segmentation plays an important role in speech recognition in reducing the requirement for large memory and in minimizing the computation complexity in large vocabulary continuous speech recognition systems. [8]. C. Preprocessing: Preprocessing includes filtering and scaling of the incoming signal in order to reduce the noise and other external effect. Filtering speech signal before recognition task is an important process to remove noise related to speech signal which may be either low frequency or high frequency noise. Figure (1) shows the effect of preprocessing on signal. D. Feature Extraction: The goal of feature extraction is to represent any speech signal by a finite number of measures (or features) of the signal. This is because the entirety of the information in the acoustic signal is too much to process, and not all of the information is relevant for specific tasks. In present ASR systems, the approach of feature extraction has generally been to find a representation that is relatively stable for different examples of the same speech sound, despite differences in the speaker or environmental characteristics, while keeping the part that represents the message in the speech signal relatively intact [9].
FIGURE 1: The word Basrah before and after filtering and normalizing.

4. LINEAR PREDICTIVE CODING
Linear predictive analysis has been one of the most powerful speech analysis techniques since it was introduced in the early 1970s [10]. LPC is a model based on the human vocal tract [11]. Figure (2) shows the block diagram of the LPC calculations.

FIGURE 2: The LPC block diagram.

A. Frame Blocking: The digitized speech signal, S(n), is blocked into frames of N samples, with adjacent frames separated by M samples. If the l-th frame of speech is denoted by x_l(n), and there are L frames within the entire speech signal, then

x_l(n) = S(Ml + n), 0 ≤ n ≤ N−1, 0 ≤ l ≤ L−1

B. Windowing: To minimize the discontinuities at the beginning and end of each frame, and therefore prevent spectral leakage, every frame is multiplied by a window function.
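Frame blocking with frame length N and frame shift M, together with the Hamming window of Equation (1), can be sketched as follows; the default values of N and M are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def block_frames(s, N=240, M=80):
    """Split signal s into L overlapping frames: x_l(n) = s(M*l + n), 0 <= n < N."""
    L = 1 + (len(s) - N) // M          # number of complete frames
    return np.array([s[M * l : M * l + N] for l in range(L)])

def hamming(N):
    """Eq. (1): w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), 0 <= n <= N-1."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))
```

A windowed frame is then simply `block_frames(s)[l] * hamming(N)`; the window tapers each frame to about 0.08 at its edges while leaving the center untouched.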
w(n) = 0.54 − 0.46 cos(2πn / (N−1)), 0 ≤ n ≤ N−1 (1)

Figure (3) shows the effect of windowing.

FIGURE 3: Effect of the window on the signal.

C. Autocorrelation Analysis: In this step, each frame of the windowed signal is autocorrelated to give

r_l(m) = Σ_{n=0}^{N−1−m} x̃_l(n) x̃_l(n+m), m = 0, 1, …, p (2)

where x̃_l(n) is the windowed signal and p, the highest autocorrelation lag, is the order of the LPC analysis. Typically, values of p from 8 to 16 are used. It is interesting to note that the zeroth autocorrelation is the energy of the frame,

E_l = r_l(0), l = 0, 1, 2, …, L−1 (3)

The frame energy is an important parameter for speech detection [10, 11].

D. LPC Analysis: The next processing step is the LPC calculation, which converts each autocorrelated frame into an "LPC parameter set", where the set might be the LPC coefficients, the reflection (or PARCOR) coefficients, the log area ratio coefficients, or the cepstral coefficients. The formal method for converting from autocorrelation coefficients to an LPC parameter set (for the LPC autocorrelation method) is known as Durbin's method and can be stated as the following algorithm (for convenience, the subscript l on r_l(m) is omitted) [10]:

E^(0) = r(0) (4)
k_i = [ r(i) − Σ_{j=1}^{i−1} α_j^(i−1) r(i−j) ] / E^(i−1), 1 ≤ i ≤ p (5)
α_i^(i) = k_i (6)
α_j^(i) = α_j^(i−1) − k_i α_{i−j}^(i−1), 1 ≤ j ≤ i−1 (7)
E^(i) = (1 − k_i²) E^(i−1) (8)

where the summation in Equation (5) is omitted for i = 1. Equations (4)-(8) are solved recursively for i = 1, 2, …, p.
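The autocorrelation analysis and Durbin's recursion described above, together with the LPC-to-cepstrum conversion given later in this section, can be sketched as a generic textbook implementation; the helper names are our own.

```python
import numpy as np

def autocorr(x, p):
    """Eq. (2): r(m) = sum_n x(n) x(n+m), m = 0..p; r(0) is the frame energy (Eq. 3)."""
    N = len(x)
    return np.array([float(np.dot(x[:N - m], x[m:])) for m in range(p + 1)])

def durbin(r):
    """Durbin's recursion (Eqs. 4-8): autocorrelation -> LPC coefficients a_m,
    PARCOR coefficients k_m, and the final prediction error E."""
    p = len(r) - 1
    a = np.zeros(p + 1)            # a[j] holds alpha_j of the current iteration
    k = np.zeros(p + 1)
    E = r[0]                       # Eq. (4)
    for i in range(1, p + 1):
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])
        ki = acc / E               # Eq. (5); the sum is empty for i = 1
        k[i] = ki
        prev = a.copy()
        a[i] = ki                  # Eq. (6)
        for j in range(1, i):
            a[j] = prev[j] - ki * prev[i - j]   # Eq. (7)
        E *= (1.0 - ki * ki)       # Eq. (8)
    return a[1:], k[1:], E

def lpc_to_cepstrum(a):
    """Cepstral recursion given later in this section:
    c_m = a_m + sum_{k=1}^{m-1} (k/m) c_k a_{m-k}, 1 <= m <= p."""
    p = len(a)
    c = [0.0] * (p + 1)            # 1-based indexing for clarity; c[0] unused
    for m in range(1, p + 1):
        c[m] = a[m - 1] + sum((kk / m) * c[kk] * a[m - 1 - kk] for kk in range(1, m))
    return c[1:]
```

For example, for r = (1, 0.5, 0.25) the recursion yields a = (0.5, 0): a first-order predictor already explains the sequence, so the second reflection coefficient is zero.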
The final solution is given as [11]:

a_m = LPC coefficients = α_m^(p) (9)
k_m = PARCOR coefficients (10)
g_m = log area ratio coefficients = log[ (1 − k_m) / (1 + k_m) ] (11)

E. LPC Parameter Conversion to Cepstral Coefficients: A very important LPC parameter set, which can be derived from the LPC coefficients, is the set of LPC cepstral coefficients c_m, computed by the recursion

c_m = a_m + Σ_{k=1}^{m−1} (k/m) c_k a_{m−k}, 1 ≤ m ≤ p (12)

where the a_m are the LPC coefficients.

5. FUZZY NEURAL PETRI NET
After the desired features are extracted from the input data, they are applied to the decision-making stage, which decides to which specific class the input data belongs. In this work, the NFPN is used as the decision-making network. Petri nets, developed by Carl Adam Petri in his Ph.D. thesis in 1962, are generally considered a tool for studying and modeling systems. A Petri net (PN) is foremost a mathematical description, but it is also a visual or graphical representation of a system. The application areas of Petri nets being vigorously investigated include knowledge representation and discovery, robotics, process control, diagnostics, grid computation, and traffic control, to name a few representative domains [12]. Petri nets are a mathematical tool for describing concurrent systems and modeling their behavior, and they have been used to study the performance and dependability of a variety of systems. Petri nets essentially consist of three key components: places, transitions, and directed arcs [13]. The directed arcs connect places to transitions and transitions to places; no arcs connect transitions to transitions or places to places directly. Each place contains zero or more tokens. A vector of the number of tokens over all places defines the state of the Petri net. A simple Petri net graph is shown in Figure (4).
The configuration of the Petri net, combined with the location of tokens in the net at any particular time, is called the Petri net state. A Petri net structure is formally described by the five-tuple (P, T, I, O, M), where P is the set of places {p1, …, pn}, T is the set of transitions {t1, …, tm}, I is the set of places connected via arcs as inputs to transitions, O is the set of places connected via arcs as outputs from transitions, and M is the marking, which gives the places that contain tokens.

The formal definition and state of Figure (4) are:
P = { p1, p2, p3, p4 }
T = { t1, t2 }
I = { { p1 }, { p2, p3 } }
O = { { p2, p3 }, { p4 } }
M = { 1, 0, 0, 0 }

The structure of the proposed Neural Fuzzy Petri Net is shown in Figures (5) and (6). The network has the following three layers
FIGURE 4: Petri net graph.

- an input layer composed of n input places;
- a transition layer composed of hidden transitions;
- an output layer consisting of m output places.

Each input place is marked by the value of a feature. The transitions act as processing units; their firing depends on the parameters of the transitions, which are the thresholds, and on the parameters of the arcs (connections), which are the weights. Each output place corresponds to a class of patterns, and the marking of an output place reflects the level of membership of the pattern in the corresponding class [14].

FIGURE 5: The structure of the Neural Fuzzy Petri Net.
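The five-tuple of Figure (4) and the classical token-firing rule can be written out directly in code. This sketch is ours and uses plain Python data structures; it illustrates the crisp Petri net before the fuzzy generalization.

```python
# Five-tuple (P, T, I, O, M) of the Petri net in Figure (4).
P = ["p1", "p2", "p3", "p4"]
T = ["t1", "t2"]
I = {"t1": ["p1"], "t2": ["p2", "p3"]}     # input places of each transition
O = {"t1": ["p2", "p3"], "t2": ["p4"]}     # output places of each transition
M = {"p1": 1, "p2": 0, "p3": 0, "p4": 0}   # initial marking

def enabled(t, M):
    """A transition is enabled when every input place holds at least one token."""
    return all(M[p] >= 1 for p in I[t])

def fire(t, M):
    """Fire an enabled transition: consume one token from each input place,
    produce one token in each output place, and return the new marking."""
    assert enabled(t, M)
    M = dict(M)
    for p in I[t]:
        M[p] -= 1
    for p in O[t]:
        M[p] += 1
    return M
```

Firing t1 and then t2 moves the single token from p1 to p4, which is exactly the state-vector evolution described in the text.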
FIGURE 6: A section of the net outlining the notation.

The specifications of the network are as follows:

- X_j is the marking level of the j-th input place, produced by a triangular mapping function. The top of the triangle is centered on the average of the input values, the length of its base is the difference between the minimum and maximum input values, and its height is unity. This keeps the network input within the interval [0, 1], so that this generalization remains in full agreement with the two-valued generic version of the Petri net [14]:

X_j = f(input_j) (13)

where f is the triangular mapping function

f(x) = (x − min(x)) / (average(x) − min(x)), if x < average(x)
f(x) = (max(x) − x) / (max(x) − average(x)), if x > average(x)
f(x) = 1, if x = average(x)

- W_ij is the weight between the i-th transition and the j-th input place;
- r_ij is a threshold level associated with the marking level of the j-th input place and the i-th transition;
- Z_i is the activation level of the i-th transition, defined as

Z_i = T_{j=1}^{n} [ W_ij S ( r_ij → X_j ) ], j = 1, 2, …, n; i = 1, 2, …, hidden (14)

where T and S denote a t-norm and an s-norm, respectively;

- Y_k is the marking level of the k-th output place, produced by the transition layer as a nonlinear mapping of the weighted sum of the activation levels Z_i of the transitions and the associated connections V_ik:

Y_k = f( Σ_{i=1}^{No. of transitions} Z_i V_ik ), k = 1, 2, …, m (15)

where f is a nonlinear, monotonically increasing function from R to [0, 1].

Learning Procedure
The learning process minimizes a performance index in order to optimize the network parameters (weights and thresholds). The performance index used is the standard sum of squared errors [4].
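The mappings of Equations (13)-(15), together with a single gradient step on the output weights per the learning procedure, can be sketched as follows. The paper does not state which t-norm, s-norm, or implication it uses, so min, max, and the Gödel implication are assumed here purely for illustration, and the function names are our own.

```python
import math

def triangular(x, xmin, xavg, xmax):
    """Eq. (13): map a raw feature onto [0, 1] with a triangular function
    centered on the average and based on the min/max of the input values."""
    if x < xavg:
        return (x - xmin) / (xavg - xmin)
    if x > xavg:
        return (xmax - x) / (xmax - xavg)
    return 1.0

def implication(r, x):
    # Goedel implication r -> x (an assumed choice; the paper leaves it implicit).
    return 1.0 if r <= x else x

def transition_level(W_i, r_i, X):
    """Eq. (14) with T = min and S = max (assumed norms):
    Z_i = min_j [ max(W_ij, r_ij -> X_j) ]."""
    return min(max(w, implication(r, x)) for w, r, x in zip(W_i, r_i, X))

def output_level(Z, V_k):
    """Eqs. (15)/(18): sigmoid of the weighted sum of transition activations."""
    return 1.0 / (1.0 + math.exp(-sum(z * v for z, v in zip(Z, V_k))))

def update_output_weights(V, Z, y, t, alpha=0.1):
    """One gradient step (Eq. 17) on the output weights for the squared error
    of Eq. (16): dE/dV_ki = -(t_k - y_k) * y_k * (1 - y_k) * Z_i.
    V is indexed as V[k][i] (output place k, transition i)."""
    for k in range(len(V)):
        delta = (t[k] - y[k]) * y[k] * (1.0 - y[k])
        for i in range(len(Z)):
            V[k][i] += alpha * delta * Z[i]
    return V
```

With a different choice of norms or implication the forward pass changes accordingly; only the sigmoid output layer and its gradient follow directly from Equations (16)-(18).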
E = (1/2) Σ_{k=1}^{m} (t_k − y_k)² (16)

where t_k is the k-th target and y_k is the k-th output. The parameter updates are performed according to the gradient method

param(iter+1) = param(iter) − α ∇E_param (17)

where ∇E_param is the gradient of the performance index E with respect to the network parameter, α is the learning-rate coefficient, and iter is the iteration counter. The nonlinear function associated with the output places is the standard sigmoid [14]:

y_k = 1 / ( 1 + exp( − Σ_i Z_i V_ik ) ) (18)

In this paper, the fuzzy Petri net is applied to Arabic phoneme recognition, with LPC (linear predictive coding) used to extract features from the speech signal.

6. EXPERIMENTAL RESULTS
For each phoneme, there are 24 recorded words. In eight of the 24 words, the target phoneme is the initial letter of the word; in a second group of eight, it occurs in the middle of the word; and in the remaining eight, it comes at the end of the word. Manual segmentation is used to extract the target phoneme from the recorded words. Half of the phonemes are used for training and the other half for testing the resulting system. Adobe Audition 1.5 software is used to record and save the data files. The recorded data are stored as (.WAV) files with 16-bit per-sample precision and a sampling rate of 16 kHz. A hierarchical tree is formed for the phonemes as shown in Figure (7). Five classes exist in this tree: the first class is the fricatives, which include both voiced and unvoiced phonemes; the second class is the stops, which also include both voiced and unvoiced phonemes; the third class contains the semi-vowel phonemes; the fourth class contains the nasals; and the fifth class contains the affricate, lateral, and trill phonemes. Each node in the tree is recognized using a separate NFPN with LPC parameters as inputs. The recognition principle is based on divide and conquer.
Class 1 is identified with a net consisting of 18 input places, 56 transitions, and one output place. Class 2 uses a net of 18 input places, 47 transitions, and one output place. For class 3, the net has 18 input places, 22 hidden transitions, and one output place. For class 4, the net has 18 input places, 22 hidden transitions, and one output place. Class 5 uses a net of 18 input places, 36 hidden transitions, and one output place. Table 1 shows the recognition accuracy for each phoneme and for class recognition. The total recognition accuracy reached 79.6378%.

7. CONCLUSION
An Arabic phoneme recognition system is proposed in this work. The decision is based on an LPC feature vector and a hierarchical NFPN as the decision network. The experimental results show that it is possible to use the hierarchical structure to recognize phonemes using the NFPN.
FIGURE 7: The proposed hierarchical tree.
TABLE 1: Phoneme recognition accuracy.
8. REFERENCES
1. S. Ismail and A. Ahmad, "Recurrent neural network with back propagation through time algorithm for Arabic recognition", In Proceedings of the 18th ESM, Magdeburg, Germany, 13-16 June 2004.
2. S. Al-Sayegh and A. AbedEl-Kader, "Arabic phoneme recognizer based on neural network", In Proceedings of the International Conf. on Intelligent Knowledge Systems (IKS-2004), 16-20 August 2004.
3. Y. Alotaibi, S. Selouani and D. O'Shaughnessy, "Experiments on Automatic Recognition of Nonnative Arabic Speech", EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2008, pp. 1-6, 2008.
4. M. Awais and Habib-ur-Rehman, "Recognition of Arabic phonemes using fuzzy rule base system", In Proceedings of the 7th Int. Multi Topic Conf. INMIC-2003, pp. 367-370, 8-9 Dec. 2003.
5. P. Schwarz, P. Matejka and J. Cernocky, "Hierarchical Structures of Neural Networks for Phoneme Recognition", In Proceedings of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP-2006, 14-19 May 2006.
6. J. Pinto and H. Hermansky, "Combining Evidence from a Generative and a Discriminative Model in Phoneme Recognition", In Proceedings of Interspeech, Brisbane, Australia, 22-26 September 2008.
7. M. Scholz and R. Vigario, "Nonlinear PCA: a new hierarchical approach", In Proceedings of the European Symposium on Artificial Neural Networks ESANN-2002, pp. 439-444, Bruges, Belgium, 24-26 April 2002.
8. Y. Suh and Y. Lee, "Phoneme Segmentation of Continuous Speech using Multi-Layer Perceptron", In Proceedings of the 4th Int. Conf. on Spoken Language Processing, ICSLP-96, Vol. 3, pp. 1297-1300, 1996.
9. Y. Gong, "Speech Recognition in Noisy Environments: A Survey", Speech Communication, 16, pp. 261-291, 1995.
10. N. Awasthy, J. P. Saini and D. S. Chauhan, "Spectral Analysis of Speech: A New Technique", Int. J. Signal Processing, 2(1), pp. 19-28, 2005.
11.
L. Rabiner and R. W. Schafer, "Fundamentals of Speech Recognition", Prentice Hall, 1993.
12. S. I. Ahson, "Petri net models of fuzzy neural networks", IEEE Trans. Syst. Man Cybern., 25(6), pp. 926-932, Jun. 1995.
13. A. Seely, "Petri Net Implementation of Neural Network Elements", M.Sc. Thesis, Nova Southeastern University, 2002.
14. H. M. Abdul-Ridha, "ECG Signal Classification using Neural, Neural Fuzzy and Neural Fuzzy Petri Networks", Ph.D. Thesis, Department of Electrical Engineering, University of Basrah, 2007.