Speech Emotion Recognition by Using Combinations of
Support Vector Machine (SVM) and C5.0
Mohammad Masoud Javidi and Ebrahim Fazlizadeh Roshan
Department of Computer Science, Shahid Bahonar University of Kerman, Kerman, Iran
ABSTRACT:
Speech emotion recognition enables a computer system to record sounds and recognize the emotion of the
speaker. We are still far from natural interaction between humans and machines because machines cannot
distinguish the emotion of the speaker. For this reason, a new field of investigation has been established,
namely speech emotion recognition systems. The accuracy of these systems depends on various factors,
such as the type and number of emotion states and the type of classifier. In this paper, the classification
methods C5.0, Support Vector Machine (SVM), and the combination of C5.0 and SVM (SVM-C5.0) are
evaluated, and their efficiencies in speech emotion recognition are compared. The features used in this
research are energy, Zero Crossing Rate (ZCR), pitch, and Mel-scale Frequency Cepstral Coefficients
(MFCC). The results demonstrate that, depending on the number of emotion states, the proposed
SVM-C5.0 classification method recognizes the emotion of the speaker between -5.5% and 8.9% more
accurately than SVM and C5.0.
KEY WORDS:
Emotion recognition, Feature extraction, Mel-scale Frequency Cepstral Coefficients, C5.0, Support Vector
Machines
1. INTRODUCTION
Speech emotion recognition aims to design operator systems that receive speech signals and
extract emotional states from them. This technology enables a sound-recording computer (e.g., a
computer that has a microphone) to recognize the emotion of the speaker. However, any speech
emotion recognition system faces many challenges; the most important ones concern emotional
databases, feature extraction, and classification models. A machine learning framework for
emotion recognition from speech is displayed in Figure 1.
In the last three decades, several attempts have been made to recognize speech emotion; the most
important ones include [1-6]. The voice and prosodic features, the speaking style, the speaker's
characteristics, and the linguistic features can all affect the expressed emotion [5]. Several models,
such as hidden Markov models (HMM) in [9], Gaussian mixture models (GMM) in [5, 14, 15],
NN in [13, 8], and SVM in [5, 16], have been used to recognize speech emotion. It is well
documented in [7] that SVM and HMM lead to the highest and lowest recognition rates,
respectively. Another model was suggested in 2013 in which neural networks, SVM, C5.0, and
their combination were analyzed [27].
Figure 1. A machine learning framework for emotion recognition from speech.
In this paper we propose the SVM-C5.0 method in order to model speech emotion recognition
systems more precisely. Among the features available in speech processing, we use energy, ZCR,
pitch, and MFCC. The emotion states of anger, happiness, fear, sadness, disgust, boredom, and
neutral are considered. We apply the task of recognizing these emotion states to the different
classification models SVM, C5.0, and SVM-C5.0 using the IBM SPSS Modeler software and
examine the results. The results demonstrate that, depending on the number of emotion states, the
proposed SVM-C5.0 classification method recognizes the emotion of the speaker between -5.5%
and 8.9% more accurately than SVM and C5.0.
The rest of this paper is organized as follows. In Section 2, we briefly review the most recently
proposed speech emotion recognition systems. In Section 3, we describe the Berlin database.
Sections 4 and 5 explain the extracted features and the models used, respectively. In Section 6, we
present the experimental results and compare the different methods. The last section concludes
the paper.
2. RELATED WORK
In the 1990s, most recognition models were based on Linear Discriminant Classification (LDC)
[17] and Maximum Likelihood Bayes (MLB) classifiers [10, 26]. In recent years, on the other hand,
GMM [5, 12, 14], NN [13, 8], Multi-Layer Perceptron (MLP) [5], K-Nearest Neighbor (KNN)
[11, 14], HMM [9], and SVM [5, 16] have been used to recognize speech emotion.
Table 1 shows the models used in recent years along with the number of emotion states and
their recognition rates. In [26], Haq et al. used the MLB classification model with the following
seven emotion states: anger, disgust, fear, happiness, neutral, sadness, and surprise. They
attained a recognition rate of 53%. In [10], Ververidis and Kotropoulos used the MLB
classification model with the anger, happiness, neutral, and sadness emotion states and
achieved a recognition rate of 53.7%. In [8], Yu et al. used the SVM and ANN classification
models with the following four emotions: anger, happiness, neutral, and sadness, and reached
recognition rates of 71% and 42% for the SVM and ANN models, respectively, which
demonstrates that SVM performs better than ANN.
Table 1. The most recently used models and their associated results.
In [9], Ayadi et al. obtained recognition rates of 71% and 55% for the HMM and ANN
models, respectively, using seven emotion states, which reveals that the HMM model performs
better than the ANN model. In [11], Petrushin used the KNN model with the anger, happiness,
sadness, fear, and neutral emotion states and reached a recognition rate of 70%.
In [12], Gharavian et al. used the GMM model for four emotion states and reached a
recognition rate of 65.1%. In one of the most recent studies [5], Sheikhan et al. used the three
emotion states of happiness, anger, and neutral and two classification models, namely modular
neural-SVM and C5.0. They reported recognition rates of 76.3% and 56.3% for the former and
the latter, respectively. Finally, a model combining NN and C5.0 (NN-C5.0) was proposed in
2013 [27]. This model was applied to various numbers of emotion states, and its recognition
rates for 2, 3, 4, 5, 6, and 7 emotion states were 91.625%, 82.122%, 82.214%, 75.807%,
74.946%, and 72.621%, respectively.
3. BERLIN DATABASE OF EMOTIONAL SPEECH
The Berlin database of emotional speech is used to classify discrete emotion states. It is one of
the most widely used databases for speech emotion recognition [7], and several works have been
based on it (for two of the most recent, see [16] and [18] from 2011 and [27] from 2013).
The database was recorded at the Technical University of Berlin. Seven emotion states are
represented: anger, happiness, boredom, sadness, fear, disgust, and neutral. Ten actors (5 men
and 5 women) produced the database in German by uttering 10 sentences (5 short and 5 long)
with durations between 1.5 and 4 seconds. The recordings are single-channel with a 16 kHz
sampling frequency. From these recordings, seven real human emotions can be recognized. The
audio files associated with the 7 emotion states are distributed as follows: anger (127), boredom
(81), disgust (46), fear (69), happiness (71), sadness (62), and neutral (79).
4. FEATURE EXTRACTION
Speech feature extraction, also called speech coding, is a very important and basic part of many
automatic speech processing systems. Speech features are generally obtained from the digital
speech signal. Various methods are used for this, with the aim of extracting those features of the
speech that are useful for the intended automatic processing task. The features we extracted for
this research are energy, ZCR, pitch, and MFCC, which are described below [19, 21, 27].
4.1. Mel-Frequency Cepstral Coefficients (MFCCs)
Mel-frequency cepstrum coefficients (MFCCs) are another type of cepstrum coefficient [20].
The basic idea behind MFCC is inspired by the properties of the human ear in understanding
speech: the ear functions in such a way that the perceived frequency differs from the true
(physical) frequency of the sound. A mel is a unit of measure of the perceived pitch or frequency
of a tone. In 1940, Stevens and Volkmann experimentally estimated the mel scale [28]. In their
experiment, they labelled the frequency of 1000 Hz as 1000 mel. Listeners were then asked to
adjust the physical frequency until the perceived pitch was double that of 1000 mel, which was
labelled 2000 mel; the procedure was repeated for other multiples (10 times, 0.5 times, 0.1 times,
etc.), which were labelled accordingly (e.g., 500 mel and 100 mel). This mapping is almost linear
at frequencies below 1000 Hz, while at frequencies above 1000 Hz it is logarithmic [29]. In 1959,
Fant [31] recommended the following equation to express the relationship between frequency on
the mel scale and frequency in hertz.
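The equation itself did not survive in this copy; the form commonly attributed to Fant, consistent with the description above and labelled (4-1) here to match the numbering of (4-2) below, is reproduced as a reconstruction:

$$ f_{mel} = \frac{1000}{\ln 2}\,\ln\!\left(1 + \frac{f_{Hz}}{1000}\right) \tag{4-1} $$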
In 1993, Yang presented equation (4-2) [30].
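This equation is also missing from this copy. The widely used mel approximation of roughly this vintage, which we reproduce only as a plausible reconstruction (we cannot confirm it is the exact equation the authors cited), is:

$$ f_{mel} = 2595\,\log_{10}\!\left(1 + \frac{f_{Hz}}{700}\right) \tag{4-2} $$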
Research on the human auditory system has shown that the perception of a given frequency is
affected by the energy in a critical band around that frequency, whose bandwidth varies with
frequency: for frequencies below 1000 Hz the bandwidth is approximately 100 Hz, while above
1000 Hz it grows logarithmically. Thus, to find the cepstrum coefficients, the logarithm of the
total energy in the critical bands around the mel frequencies is used as the input to the inverse
Fourier transform. For this purpose, a filter bank is first applied to the signal spectrum, with the
centre frequencies distributed according to the mel scale.
Figure 2. A filter bank with mel-scale distributed centre frequencies applied to the logarithm of the spectrum.
Therefore, to extract the MFCC features, we proceed as follows:
• Select the desired frame from the speech signal.
• Compute the Fourier spectrum of the frame and take the logarithm of its amplitude.
• Apply the filter bank to the spectrum so that the filters are distributed according to the
mel scale.
• Calculate the output of each filter in the filter bank and compute the MFCCs using
equation (4-3):
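Equation (4-3) is not reproduced in this copy; the standard mel-cepstrum form consistent with the variable definitions that follow is:

$$ C_i = \sqrt{\frac{2}{P}}\;\sum_{j=1}^{P} \log X_j \,\cos\!\left(\frac{\pi\, i}{P}\,(j - 0.5)\right), \qquad i = 1,\ldots,N \tag{4-3} $$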
where P is the number of filters, N the number of MFCC coefficients, Xj the output of the jth
filter, and Ci the MFCC coefficients. The frame length has to be a power of 2; if it is not, the
frame is zero-padded until its length is a power of 2.
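To make the procedure concrete, the following is a minimal Python/NumPy sketch of the steps above. It is an illustration under stated assumptions, not the authors' MATLAB code: the filter count P, the number of coefficients, and the use of the standard 2595·log10(1 + f/700) mel mapping are our choices.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel mapping; an assumption, since the paper's own
    # equations (4-1)/(4-2) did not survive extraction.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(p, nfft, fs):
    # P triangular filters whose centre frequencies are equally spaced
    # on the mel scale (step 3 of the list above).
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), p + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mels) / fs).astype(int)
    fbank = np.zeros((p, nfft // 2 + 1))
    for j in range(1, p + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        for k in range(left, centre):
            fbank[j - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[j - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def mfcc(frame, fs, p=24, n_coeffs=12):
    # Step 1 is the caller selecting `frame`; zero-pad to the next
    # power of 2, as the text requires.
    nfft = 1 << int(np.ceil(np.log2(len(frame))))
    spectrum = np.abs(np.fft.rfft(frame, nfft))   # step 2: Fourier amplitude
    x = mel_filterbank(p, nfft, fs) @ spectrum    # step 4: filter outputs X_j
    log_x = np.log(x + 1e-12)                     # log of each filter output
    i = np.arange(1, n_coeffs + 1)[:, None]
    j = np.arange(1, p + 1)[None, :]
    # Equation (4-3): cosine transform of the log filter-bank outputs.
    return np.sqrt(2.0 / p) * (log_x * np.cos(np.pi * i * (j - 0.5) / p)).sum(axis=1)
```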
4.2. Pitch Estimation
The fundamental frequency, F0, of the speech signal, which is perceived as pitch, and its
periodicity are of great importance in automatic speech signal processing. Prosodic information
and the melody of speech are largely determined by these parameters.
Pitch estimation algorithms can be classified by the signal properties they exploit; here we use
algorithms based on the time-domain properties of the speech signal. Time-domain algorithms
act directly on the speech signal: by measuring the peaks and valleys of the waveform, the zero
crossing rate, and the autocorrelation, the period of the pitch frequency can be estimated. The
assumption is that if a quasi-periodic signal is processed properly, a simple time-domain
estimate gives a good approximation of the pitch.
4.2.1. Evaluation criteria for pitch estimation methods
There are several criteria for evaluating a pitch estimation algorithm, some of which are listed
below:
• Estimation accuracy, in terms of the consistency and stability of the measurement.
• Speed of operation.
• Algorithm complexity.
• Suitability for hardware implementation.
• Hardware implementation cost.
The pitch estimation method used here
Let s(n) be the nth sample of a speech frame with N samples. Since the aim is to estimate the
pitch frequency, in this method the autocorrelation function r(η) is plotted against the lag η
(in samples); the position of the first peak gives the pitch period [20].
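The autocorrelation function is not reproduced in this copy; its usual definition for an N-sample frame is $r(\eta) = \sum_{n=1}^{N-\eta} s(n)\,s(n+\eta)$. A minimal Python sketch of this estimator follows; the F0 search range is our assumption, since the paper does not state one.

```python
import numpy as np

def pitch_autocorr(frame, fs, f0_min=60.0, f0_max=400.0):
    """Estimate F0 of a 1-D numpy frame from its autocorrelation peak."""
    frame = frame - frame.mean()
    # r(eta) for all lags at once; the [N-1:] slice keeps eta >= 0.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f0_max)                    # shortest admissible period
    lag_max = min(int(fs / f0_min), len(r) - 1)   # longest admissible period
    # The dominant peak in the admissible range gives the pitch period.
    lag = lag_min + int(np.argmax(r[lag_min:lag_max]))
    return fs / lag                               # pitch period -> F0 in Hz
```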
4.3. Energy
The intensity of a voice, also referred to as power or energy, can be detected physically through
sound pressure or perceived subjectively as loudness. In its simplest form, the intensity is the
sum of the absolute sample values in each data frame. The energy (E) of a signal frame of
length N is obtained by:
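The formula is missing from this copy; the form matching the description above (sum of absolute sample values) is:

$$ E = \sum_{n=1}^{N} \left| s(n) \right| $$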
4.4. Zero Crossing Rate (ZCR)
ZCR is one of the duration-related features used in our experiment. It represents the number of
times the speech signal crosses the zero level and can easily be calculated by counting how
often the waveform crosses the zero reference. Compared with the speech rate discussed in
psychology, ZCR is more suitable for language-independent speech recognition. The ZCR of a
signal frame of length N is obtained by:
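The formula is again missing from this copy; the standard sign-change count consistent with the description is:

$$ ZCR = \frac{1}{2} \sum_{n=2}^{N} \left| \operatorname{sgn}\big(s(n)\big) - \operatorname{sgn}\big(s(n-1)\big) \right| $$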
To extract the features, each utterance is divided into windows of 320 samples, with an overlap
of 20 samples. The features were extracted using MATLAB. In all equations, N and s(n) stand
for the window length and the sample values in the time domain, respectively [20]. All the
extracted features are shown in Table 2.
Table 2. Extracted features of each utterance.
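A minimal Python sketch of the framing, energy, and ZCR computations described above follows. The paper used MATLAB; this is an equivalent illustration, with the 320-sample window and 20-sample overlap taken from the text.

```python
import numpy as np

def frames(signal, width=320, overlap=20):
    # Split an utterance into 320-sample windows overlapping by 20 samples.
    step = width - overlap
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, step)]

def energy(frame):
    # E = sum of the absolute sample values over the frame.
    return float(np.abs(frame).sum())

def zcr(frame):
    # Count sign changes between consecutive samples.
    s = np.signbit(frame)
    return int((s[1:] != s[:-1]).sum())
```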
5. CLASSIFIERS
Classification is another component of a speech emotion recognition system. In this research,
we used three classification methods: SVM and C5.0, as well as their combination (SVM-C5.0).
In the remainder of this section, we briefly explain the two base classifiers.
5.1. SVM Classifier
SVM is a supervised learning method that is used for classification and regression. In recent
years, this comparatively new method has been shown to outperform older classifiers such as
NNs, and it has been applied to speech emotion recognition with very good performance
[5, 7, 17]. The basis of the SVM method is linear classification of the data; among the linear
separations of the data, the line with the largest safety margin is selected. The optimal
separating line is obtained with quadratic programming (QP) methods, which are well known
for solving constrained problems. Before the linear separation, so that the machine can classify
highly complex data, the data are mapped to a much higher-dimensional space by a function
"phi". To make the resulting high-dimensional problem tractable, the Lagrangian dual theorem
is used to transform the intended minimization problem into its dual form. In the dual form,
instead of the complicated high-dimensional "phi" function, a simpler function known as the
kernel function appears. Various kernel functions can be used, such as exponential (radial
basis), polynomial, and sigmoid [23].
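For readers who want to reproduce the setup outside SPSS Modeler, here is a minimal Python sketch using scikit-learn. The kernel choice and hyper-parameters are illustrative assumptions, not the authors' settings, and the random matrix merely stands in for the 54 extracted features.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(535, 54))      # stand-in for the 54 features per utterance
y = rng.integers(0, 7, size=535)    # stand-in labels for the 7 emotion states

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# RBF ("exponential") kernel SVM with feature standardisation.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)
print("SVM accuracy:", svm.score(X_test, y_test))
```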
5.2. C5.0 Classifier
In recent years, C5.0 has become a popular method in data mining. However, it was not used
for emotion recognition until 2012, when Sheikhan et al. applied it, with results that were not
satisfying enough [5]. The algorithm is a further development of C4.5 and ID3 [24], and it
works either by constructing a decision tree or by producing a rule set. A C5.0 model works by
splitting the sample on the feature that yields the maximum information gain. Every sub-sample
defined by the first split is then split again, typically on different fields, and the process is
repeated until the sub-samples cannot be split any further. Finally, the lowest-level splits are
re-examined, and those that do not add much value to the model are removed or pruned.
C5.0 can therefore produce two types of models. A decision tree is a direct description of the
splits found by the algorithm: every leaf describes a certain subset of the training data, and each
training item belongs to exactly one leaf of the tree. In other words, for each data record,
exactly one prediction path through the decision tree applies. A rule set, in contrast, makes
predictions for individual records. A rule set is extracted from the decision tree and thus
provides a simplified summary of the information found in the tree; however, a rule set does not
usually preserve all the properties of the decision tree. The main difference is that in a rule set,
more than one rule may apply to a record, or no rule may apply at all. If several rules apply to a
record, each rule receives a weight (vote) based on its confidence, and the final prediction is
obtained by combining all weighted votes. If no rule applies, a default prediction is assigned to
the record.
C5.0 tree models are quite robust to problems such as missing data and large numbers of input
features. They usually do not require long training times. Besides, a C5.0 tree model is easier to
understand than other models, owing to the simple interpretation of the rules extracted from the
tree. Moreover, C5.0 offers powerful boosting methods to increase the accuracy of
classification [25].
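C5.0 itself is proprietary (shipped with IBM SPSS Modeler, and available through the R `C50` package), so as an open stand-in we sketch an entropy-criterion decision tree with optional boosting, which mirrors C5.0's information-gain splits and its boosting feature. It is an analogue, not the C5.0 algorithm; the hyper-parameters are illustrative, and the sketch reuses the synthetic `X_train`/`y_train` from the SVM example above (scikit-learn ≥ 1.2 API).

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# Entropy splits approximate C5.0's information-gain criterion; pruning is
# approximated here by a minimum leaf size.
tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=5)
tree.fit(X_train, y_train)
print("Tree accuracy:", tree.score(X_test, y_test))

# C5.0's boosting option, approximated with AdaBoost over the same tree.
boosted = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(criterion="entropy", min_samples_leaf=5),
    n_estimators=10)
boosted.fit(X_train, y_train)
print("Boosted accuracy:", boosted.score(X_test, y_test))
```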
6. IMPLEMENTATION AND RESULTS EVALUATION
In this paper, we used the SVM and C5.0 classifiers as well as their combination (SVM-C5.0)
in the IBM SPSS Modeler environment. We stored the data in Excel and mined them with the
IBM SPSS Modeler software. The inputs were 54 features, obtained from the utterances of the
Berlin database by programming in MATLAB. The output was a set of emotion states (anger,
happiness, boredom, sadness, fear, disgust, and neutral).
There were 535 utterances, 20% of which were randomly chosen for testing, while the other
80% were used for training. We ran each data set 10 times with the SVM and C5.0 classifiers
and their combination (SVM-C5.0) and averaged the obtained recognition rates.
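The following sketch mirrors this protocol in Python: ten random 80/20 splits with the accuracy averaged over runs. The paper does not describe how SPSS Modeler combines SVM and C5.0, so the soft-voting ensemble below is only a plausible stand-in for SVM-C5.0, not the authors' exact combination.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier

def average_accuracy(make_model, X, y, runs=10):
    # 10 repetitions of a random 80%/20% train/test split, averaged.
    scores = []
    for seed in range(runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed)
        model = make_model()
        model.fit(X_tr, y_tr)
        scores.append(model.score(X_te, y_te))
    return float(np.mean(scores))

def make_combined():
    # Hypothetical SVM + tree combination by soft voting (an assumption).
    return VotingClassifier(
        estimators=[
            ("svm", make_pipeline(StandardScaler(),
                                  SVC(kernel="rbf", probability=True))),
            ("tree", DecisionTreeClassifier(criterion="entropy")),
        ],
        voting="soft")

# Example: average_accuracy(make_combined, X, y) with X, y as in the SVM sketch.
```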
In our first experiment, we tried to recognize the two emotion states of neutral and anger. There
were 206 utterances in total, 127 of which were associated with the anger state, while the other
79 were related to the neutral state. The results of each test were obtained from 10 runs of each
of the following classification methods: SVM, C5.0, and SVM-C5.0. The average recognition
rates for these experiments are shown in Table 3. As this table shows, the recognition rate of
SVM-C5.0 is 3.6% better than that of SVM and 4.2% better than that of C5.0.
Table 3. The average recognition rates for the anger and neutral states.
In our second experiment, we tried to recognize the three emotion states of anger, happiness,
and sadness. As in the former case, we used 54 features. There were 272 utterances in total:
128, 78, and 66 utterances were associated with the anger, sadness, and happiness states,
respectively. The results of each classification test are shown in Table 4. As this table shows,
the recognition rate of SVM-C5.0 is 3.5% better than that of SVM; however, C5.0 is 5.5%
better than SVM-C5.0.
Table 4. The average recognition rates for the anger, happiness, and sadness states.
Next, we examine 4, 5, 6, and 7 emotion states, whose results are presented in Table 5. The
table demonstrates that for four emotion states, the SVM-C5.0 classification method
outperforms the C5.0 and SVM methods by approximately 0.4% and 3.9%, respectively. Also,
for five emotion states, it outperforms the C5.0 and SVM methods by about 7.4% and 5.3%,
respectively. Furthermore, for six emotion states, it outperforms the C5.0 and SVM methods
by 8.9% and 5.6%, respectively. Finally, for seven emotion states, it outperforms the C5.0 and
SVM methods by 4.5% and 3.5%, respectively.
Table 5. Average recognition rates for 4, 5, 6, and 7 emotion states.
7. CONCLUSION AND DISCUSSION
In this paper, we used pitch, energy, ZCR, and MFCC features to recognize speech emotion
states. We analyzed 535 emotional utterances from the Berlin database. A set of 54 features was
calculated for each utterance. Three classification methods were applied: SVM, C5.0, and their
combination (SVM-C5.0).
According to our findings, for the anger and neutral emotion states, the recognition rate of
SVM-C5.0 was 3.6% better than that of the SVM model and 4.2% better than that of the C5.0
model. For the happiness, anger, and sadness states, the recognition rate of SVM-C5.0 was
83.6%, approximately 3.5% better than that of the SVM model but 5.5% worse than that of the
C5.0 model.
For more emotion states, we obtained the following results. For four emotional states, the
recognition rate of SVM-C5.0 was 88.8%, approximately 0.4% better than that of the C5.0
model and 3.9% better than that of the SVM model. For five emotional states, it was 91.8%,
approximately 7.4% better than C5.0 and 5.3% better than SVM. For six emotional states, it
was 91.5%, approximately 8.9% better than C5.0 and 5.6% better than SVM. For seven
emotional states, it was 88.6%, approximately 4.5% better than C5.0 and 3.5% better than
SVM. It is evident that the proposed SVM-C5.0 classification method is more accurate than the
other methods used in this paper for two, four, five, six, and seven emotional states.
It is worth discussing why, according to the tables, the accuracy for five and six emotion states
is higher than for three and four. Happiness and anger are easily confused in the models'
predictions, which lowers the accuracy; this confusion is much smaller in the six-state case, as
the confusion matrices below confirm. For the other state counts, the accuracy varies in both
directions.
Table 6. Confusion matrix for the anger, happiness, and sadness states.
Table 7. Confusion matrix for the anger, happiness, fear, sadness, disgust, and boredom states.
REFERENCES
[1] Pao T, Chen Y, Yeh J, Chang Y. Emotion recognition and evaluation of Mandarin speech using
weighted D-KNN classification. Int. J. Innovative Computing, Information and Control 2008; 4:
1695-1709.
[2] Altun H, Polat G. Boosting selection of speech related features to improve performance of multi-
class SVMs in emotion detection. Expert Syst. Appl. 2009; 36: 8197-8203.
[3] Yang ML. Emotion recognition from speech signals using new harmony features. Signal
Processing 2010; 90: 1415-1423.
[4] He L, Lech M, Maddage NC, Allen NB. Study of empirical mode decomposition and spectral
analysis for stress and emotion classification in natural speech. Biomed. Signal Process. Control
2011; 6: 139-146.
[5] Sheikhan M, Bejani M, Gharavian D. Modular neural-SVM scheme for speech emotion recognition
using ANOVA feature selection method. Neural Comput. & Applic. 2012.
[6] Schuller B, Rigoll G, Lang M. Speech emotion recognition combining acoustic features and
linguistic information in a hybrid support vector machine-belief network architecture. In
Proceedings of ICASSP 2004; 1: 397-401.
[7] Ayadi M, Kamel MS, Karray F. Survey on speech emotion recognition: features, classification
schemes, and databases. Pattern Recognition 2011; 44: 572-587.
[8] Yu F, Chang E, Xu Y, Shum H. Emotion detection from speech to enrich multimedia content. In
Proceedings of the IEEE Pacific Rim Conference on Multimedia: Advances in Multimedia
Information Processing, pp. 550-557, 2001.
[9] Ayadi M, Kamel S, Karray F. Speech emotion recognition using Gaussian mixture vector
autoregressive models. In Proceedings of the International Conference on Acoustics, Speech, and
Signal Processing, vol. 5, pp. 957-960, 2007.