SlideShare a Scribd company logo
International Journal of Electrical and Computer Engineering (IJECE)
Vol. 12, No. 6, December 2022, pp. 6090~6102
ISSN: 2088-8708, DOI: 10.11591/ijece.v12i6.pp6090-6102  6090
Journal homepage: http://ijece.iaescore.com
Comparison of window shapes and lengths in short-time feature
extraction for classification of heart sound signals
Mahmoud Fakhry, Abeer FathAllah Brery
Electrical Engineering Department, Faculty of Engineering, Aswan University, Aswan, Egypt
Article Info ABSTRACT
Article history:
Received Sep 16, 2021
Revised Jun 27, 2022
Accepted Jul 23, 2022
Heart sound signals, phonocardiography (PCG) signals, allow for the
automatic diagnosis of potential cardiovascular pathology. Such
classification task can be tackled using the bidirectional long short-term
memory (biLSTM) network, trained on features extracted from labeled PCG
signals. Regarding the non-stationarity of PCG signals, it is recommended to
extract the features from multiple short-length segments of the signals using
a sliding window of certain shape and length. However, some window
contains unfavorable spectral side lobes, which distort the features.
Accordingly, it is preferable to adapt the window shape and length in terms
of classification performance. We propose an experimental evaluation for
three window shapes, each with three window lengths. The biLSTM network
is trained and tested on statistical features extracted, and the performance is
reported in terms of the window shapes and lengths. Results show that the
best performance is obtained when the Gaussian window is used for splitting
the signals, and the triangular window competes with the Gaussian window
for a length of 75 ms. Although the rectangular window is a commonly
offered option, it is the worst choice for splitting the signals. Moreover,
the classification performance obtained with a 75 ms Gaussian window
outperforms that of a baseline method.
Keywords:
Bidirectional long short-term
memory network
Classification
Feature extraction
Heart sound
Window length
Window shape
This is an open access article under the CC BY-SA license.
Corresponding Author:
Mahmoud Fakhry
Electrical Engineering Department, Faculty of Engineering, Aswan University
81542 Aswan, Egypt
Email: m.fakhry@aswu.edu.eg
1. INTRODUCTION
The development of reliable and dynamic noninvasive systems is essential to the successful
automatic diagnosis of diseases. Cardiologists are accustomed to using a medical stethoscope to listen to the
heart and assess its health. Although this primitive diagnosis strategy is low-cost and non-invasive, its
accuracy relays on the hearing experience of the cardiologist, which requires years to acquire [1]. As a result,
it sparked the idea of using phonocardiography (PCG) signals to conduct an automated check of heart health
[2]–[5].
There are several methods for the automation of heart monitoring and examination, including
electrocardiogram (ECG), which records heart electrical activity, and PPG, which estimates the blood flow
rate using sensors of light. An assortment of techniques have been developed in the literature for
computerizing the investigation of ECG and PPG signals [6]–[8]. Besides ECG and PPG signals, the PCG
signals, which records the acoustic sound signals created by the heart in a cardiovascular cycle, can be
utilized to proficiently screen and evaluate heart wellbeing. The ECG and PCG signals are more coherent
than the PPG signal and they also include more information. In comparison to the ECG signals, the PCG
Int J Elec & Comp Eng ISSN: 2088-8708 
Comparison of window shapes and lengths in short-time feature extraction for … (Mahmoud Fakhry)
6091
signal has a notable benefit in that it records the acoustic features of heart sound signals. These features make
it easier to identify murmurs, which are irregularities in PCG signals [9].
PCG signals are recorded in a quiet room, with the PCG transducer, such as a microphone or
stethoscope, firmly placed on the chest with a rubber strap. The PCG signal consists of a pair of heartbeats
called S1 and S2, and two silent time durations between them as shown in Figure 1. This refers to the times
the heart takes to alternate between contraction and closure. The time duration from the conclusion of the
heartbeat S1 to the starting of the heartbeat S2 is called the systolic interval, and the diastolic interval is
defined as the time duration from the conclusion of the heartbeat S2 to the starting of the heartbeat S1. The
presence of disease is associated with additional audible actions that occur during the silent time durations,
e.g., murmur is generated when blood continues to flow during the closure of the heart valves. The heartbeat
S1 is a large magnitude signal with low frequency components, and the heartbeat S2 is a small magnitude
signal with high frequency components [10]. Many methods have been recently developed for diagnosing
heart abnormalities based on the analysis of PCG signals [11], [12]. In general, the detection of heart
abnormality based on training distinct features is more effective than using training raw PCG signals.
Figure 1. A sound recording of normal heart (PCG signal)
One of the most difficult aspects for extracting features from PCG signals is that their complex and
nonstationary nature. They are considered stationary only within short- time intervals, i.e., “quasi-stationary,”
whereas signal characteristics are nonstationary over longer time intervals. To meet this requirement, features
should be extracted in short-time intervals. This means that the PCG signal should be divided into
short-length segments of overlapping/non-overlapping signal samples. This is accomplished by employing a
sliding temporal window with a specific shape and length [13]. In this sense, the selection of the window
shape and its length is a critical issue to ensure a successful diagnosis of heart abnormality.
In this paper, sequences of statistical features extracted from short-length segments of labeled PCG
signals are exploited for training and testing the bidirectional long short-term memory (biLSTM) classifier.
Our aim in this work is to experimentally adapt the window shape and length for feature extraction in terms
of classification of PCG signals. For that purpose, the classification results of conducted experiments are
reported for three different window shapes and each is investigated with three different lengths. Furthermore,
we evaluate four different network architectures of the biLSTM classifier.
The rest of this manuscript is organized. Methods for diagnosing heart diseases are briefly reviewed
in section 2. In section 3 explains the proposed method. Elaborated classification experiments and
performance are reported in section 4. The conclusion of this work is drawn in section 5.
2. LITERATURE REVIEW
Classification features can be extracted from different spectro-temporal representations of PCG
signals. These representations can be obtained using the short-time Fourier transform (STFT) [10], the
wavelet transform (WT) [14], and the Mel-frequency cepstral coefficients (MFCCs) [15]. Sanei et al. [16]
conducted a comparative study between these time-frequency representations of PCG signals and the
temporal analysis of PCG signals. They have obtained a similar frequency and time resolution for the
detection of the acoustic properties of heart sound signals. Because subspaces spanned by heartbeats and
murmurs are typically distinct, authors developed a constrained singular spectral analysis method as well as
an adapted method for subspace selection to separate murmur from PCG signals in [17]. The optimized
 ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 6, December 2022: 6090-6102
6092
singular spectral processing method is found to segregate sound signals even when they are overlapped in
time, as evidenced by low correlation values in almost all murmur types. In [18], the support vector machine
(SVM) is employed to classify envelopes of PCG signals extracted in the wavelet domain. Hamidi et al. [19]
perform feature extraction by estimating the parameters of linear predictive coding, and feature classification
by SVM with a modified optimizer based on cuckoo search. Curve fitting and fractal dimension are exploited
for producing a couple of feature extraction strategies from PCG signals in [20]. In [21], Ölmez and Dokur
investigated the classification performance of the k-nearest neighbors (kNNs) with short-time features of
wavelet and Hilbert transforms, homomorphic filtering, and power spectrum density of PCG signals. The
extracted features are classified employing kNNs with error distances computed using the second norm for
multiple cluster numbers.
Artificial neural networks (ANNs) are utilized to classify PCG signals for the first time in [22].
Following that, many scholars focused on diagnosing of heart diseases using various ANNs and features. The
principal component analysis and the wavelet decomposition are exploited for extraction of distinct features
from PCG signals in [23], [24]. In [25], short-time features of the wavelet analysis and the time-frequency
representation of PCG signals were evaluated independently using a multilayer perceptron composed of
either a single hidden layer or a couple of hidden layers. The feedforward neural network is used in [26], [27]
to classify PCG signals using features computed from the temporal, spectral, and spectro-temporal
representations of the signals. The convolutional neural networks (CNNs) have made a rapid progress in the
classification of PCG signals [28], [29]. PCG signals are classified using CNNs with features of wavelet
coefficients in [30], and of MFCCs in [31]. In [32], heat maps are created from the energy distribution of
MFCCs of PCG signals. Authors suggest to combine the heat maps with CNNs to perform such classification
tasks.
The recurrent neural networks (RNNs) are combined with MFCCs for the classification of PCG
signals. The results are reported for four different RNN models, namely long short-term memory (LSTM),
bidirectional LSTM (biLSTM), gated recurrent unit (GRU), and bidirectional GRU (biGRU) [33].
Alam et al. [34] describe a combination of two network architectures, namely the CNN and the biLSTM
networks. They offer to train visible and temporal properties of murmur in PCG signals through the use of
short-time features of spectrogram and MFCCs. The nonlinear autoregressive network with exogenous inputs
(NARX) is leveraged for binary the detection of heart abnormality in [35], [36]. This training and testing of
this network is performed using three different groups of short-length features, namely, temporal, spectral,
and statistical. Comparative analysis of three algorithms of training the biLSTM network for the purpose of
PCG signal classification is prepared in [37]. Authors concluded that the network provides the best
classification performance when it is trained with the stochastic gradient descent (SGD) with momentum
algorithm.
3. PROPOSED METHOD
Because PCG signals are complex and nonstationary in nature, their analysis is a difficult issue.
They are only deemed stationary within specific short-time intervals. In the proposed method, we offer to
split a PCG signal into overlapping short-length segments of samples. This signal dividing step is completed
using a sliding symmetric temporal window with certain shape and length. The contribution of each sample
of PCG signals in the segment is defined by the window shape, and different segments are obtained by
translating the window one sample step. The diagnosing of heart diseases is accomplished by classification of
PCG which is achieved using the biLSTM network and 10 statistical features calculated independently inside
each segment. However, some window shapes contain unfavorable spectral sidelobes. These sidelobes
produce spectral components that interfere with the true spectrum of segments, resulting in distorted features.
Using distorted classification features reduces the probability of accurate detection of heart abnormality.
In this proposal, we offer to make a comparison among three different window shapes, each with
three different lengths. In this sense, we adapt the shape and length of the window in the context of
classification of PCG signals. Sequences of features are employed for training and testing different biLSTM
architectures. Figure 2 depicts the flowchart of the proposed method. The sections that follow explain the
procedure, beginning with signal pre-processing and progressing to feature extraction and feature
classification.
3.1. Pre-processing
Most heart sound recordings obtained with recording instruments are corrupted with noise from the
surrounding environment. As a result, filtering out the noise becomes critical, but not at the expense of
omitting information necessary for diagnosing heart disease. To reject distorting noise from the recordings,
PCG signals are severely filtered. The bandwidth of heartbeat S1 is mainly from 10 to 200 Hz, which is made
Int J Elec & Comp Eng ISSN: 2088-8708 
Comparison of window shapes and lengths in short-time feature extraction for … (Mahmoud Fakhry)
6093
by the vibrations of heart chambers and valves. After the closure of semilunar valves on aortic and
pulmonary arteries, the heartbeat S2 is formed with a frequency range from 20 to 250 Hz. The low-frequency
components of diagnostic information are preserved while high-frequency noise components are rejected
using a low-pass filter with a 250 Hz cut-off frequency. Following Nyquist theorem, the filtered signals are
down-sampled before feature extraction to reduce both calculations and computation complexity.
Figure 2. Flowchart of the proposed method
3.2. PCG signal windowing
An input signal can be divided into L+1 overlapped segments of signal samples. That is, the
boundaries of subsequent segments overlap, i.e., the last samples of one segment are present in the following
segment. These overlapping segments are obtained by moving a temporal sliding window along the signal. A
typical overlap is chosen between one and L samples. In this work, a PCG signal is represented by N
overlapping segments of samples with the length of overlap to be L. This is achieved for a PCG signal x[n] of
length N by employing a sliding symmetric window w[n] of length L+1, with L+1<=N, such as (1):
𝑦𝑛[𝑙] = 𝑤[𝑙]𝑥[𝑛 + 𝑙], |𝑙| ≤
𝐿
2
, (1)
where yn[l] is the nth signal block of length L+1. The shape of the window is selected from widely used
windows, including the rectangular window, the triangular window, and the Gaussian window.
3.2.1. Sliding symmetric window
Because the time-domain product of two signals corresponds to the linear convolution of their
respective spectra, the spectrum of the signal x[n] is convolved with the spectrum of the sliding window w[n]
to obtain the spectrum of y[n]. As a result, the spectrum of the window influences the resulting spectrum. The
typical spectral shape of a window consists of a main lobe and many side lobes with decreasing intensity as
shown in Figure 3. When the signal of interest is sinusoidal, the resulting Fourier transform of the windowed
signal will be a superposition of two window functions with their main lobes located at the sinusoidal
signal’s frequency, effectively “smearing” the delta functions. This unfavorable side effect is usually
characterized by [13]: i) the main-lobe width, ii) the intensity of the first (closest) side-lobe peak, and iii) the
attenuation of the subsequent side-lobe peaks. Various windows have been offered in the literature to
optimize these properties for specific use situations, and some of them are studied here for extracting features
in the context of PCG signal classification.
3.2.2. The rectangular window
Because signal truncation corresponds to applying the rectangular window, this is the simplest
window. It is also called the Dirichlet window and defined:
𝑤[𝑙] = 1, |𝑙| ≤
𝐿
2
. (2)
The spectrum of this window is represented by a sinc function, with the peak of the first side-lobe accounting
for approximately one-fifth the peak of the main-lobe.
 ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 6, December 2022: 6090-6102
6094
Figure 3. Three different symmetric windows and their corresponding spectral shapes with L=20
3.2.3. The triangular window
The linear convolution of two rectangular windows of length L/2 yields a (L+1)-length triangular
window. Therefore, the spectrum of the triangular window is obviously the Dirichlet kernel squared. This
window exhibits a nonnegative Fourier transform due to the above discussed convolution property. Its
mathematical formula is determined by (3).
𝑤[𝑙] = 1 −
|2𝑙|
𝐿
, |𝑙| ≤
𝐿
2
, (3)
The half width of the main lobe of the triangular window equals twice that of the rectangular window.
3.2.4. The Gaussian window
It is well known that the spectrum of a Gaussian function is also a Gaussian function. Since this
function spans an infinity time, it must be trimmed at its ends to be used as a window function. This window
is represented by (4).
𝑤[𝑙] = 𝑒
−
1
2
(α
𝑙
𝐿/2
)
2
, |𝑙| ≤
𝐿
2
, (4)
Increasing the value of α results in narrowing the width of the window, which reduces the severity of the
discontinuity at the edges. However, this will widen the main-lobe and, as a result, diminish the side-lobe
levels. The most common values of α are 2.5, 3, and 3.5.
3.3. Feature extraction
Various measures for describing the statistical properties of signals are widely used. These measures
can be used as classification features extracted from short-length segments of signal samples. Theoretically,
the computation of statistical features requires infinite length observations; however, in practical applications,
short-length segments can be assumed enough if we adapt the window shape and length [38]. The following
section provides explanations of 10 feature sequences representing each PCG signal of the recording
database.
3.3.1. Arithmetic mean
The arithmetic mean is the average value of the short-length segment of signal samples. Assuming
that the samples of a segment are uniformly distributed, the arithmetic mean is computed by (5).
μ𝑦[𝑛] =
1
𝐿+1
∑ 𝑦𝑛[𝑙]
𝐿/2
𝑙=−𝐿/2 . (5)
For a symmetric probability density function (PDF), the arithmetic mean is the position of the symmetric
axis. When the PDF is not symmetric (a skewed signal), then the calculation of the mean is of limited use.
Int J Elec & Comp Eng ISSN: 2088-8708 
Comparison of window shapes and lengths in short-time feature extraction for … (Mahmoud Fakhry)
6095
3.3.2. Median
The median is the value that spatially divides a segment of samples into half. To find the median,
sort the samples of yn[l] from the smallest value to the largest, and then find the sample with an equal number
of samples above and below it. Outlier samples have a minor influence on the median. For a signal with
repeated sample values, the median is of limited use, and the mode takes its place as a measure of the
average.
3.3.3. Mode
The mode is the most frequently occurring value in a segment of samples. In other words, the mode
is the common value that is repeated in samples. A multimodal distribution exists when the samples contain
multiple values that are linked to the most frequently occurring. If no value repeats, the signal lacks a mode.
3.3.4. Variance
The variance and standard deviation both measure the spread of the input segment of signal sample
around its arithmetic mean. The difference is given by (6).
σ𝑦
2 [𝑛] =
1
𝐿+1
∑ (𝑦𝑛[𝑙] − μ𝑦[𝑛])
2
𝐿/2
𝑙=−𝐿/2 . (6)
For μ𝑦[𝑛] = 0, the variance equals the power of the observed segment of signal samples. The standard
deviation σ𝑦[𝑛] can be computed directly from the variance, σ𝑦[𝑛] = √σ𝑦
2[𝑛].
3.3.5. Skewness
The skewness is defined as the division of the third central moment of a segment of signal sample by
the cube of its standard deviation. It is given by (7).
𝑣𝑦[𝑛] =
1
𝐿+1
1
σ𝑦
3[𝑛]
∑ (𝑦𝑛[𝑙] − μ𝑦[𝑛])
3
𝐿/2
𝑙=−𝐿/2 . (7)
Skewness is a measure of PDF asymmetry. It is 0 for symmetric distributions, negative for distributions with
their mass centered on the right, and positive for distributions with their mass centered on the left.
3.3.6. Kurtosis
Kurtosis is defined as the division of the fourth central moment of a segment of signal sample by the
fourth power of the standard deviation. It is mathematically computed as (8):
ω𝑦[𝑛] =
1
𝐿+1
1
σ𝑦
4[𝑛]
∑ (𝑦𝑛[𝑙] − μ𝑦[𝑛])
4
𝐿/2
𝑙=−𝐿/2 − 3. (8)
Kurtosis quantifies the non-Gaussianity of the PDF. It is an indicator of the flatness of the PDF compared to
the Gaussian distribution. It is zero for a Gaussian distribution, is negative for a flatter distribution with a
wider peak, and positive for distributions with a sharper peak.
3.3.7. Shannon energy
Shannon energy finds the average spectral of a segment of signal samples. It discounts the high
value components into the low value components. It is obtained for a segment of samples.
𝐸𝑠𝑦[𝑛] = ∑ |𝑦𝑛[𝑙]|2
𝐿/2
𝑙=−𝐿/2 log|𝑦𝑛[𝑙]|2
. (9)
In the presence of noise and outliers, Shannon energy approaches signal ranges, resulting in fewer errors. The
advantage of using Shannon energy is its ability to emphasis the medium over traditional energy.
3.3.8. Shannon entropy
Shannon entropy quantifies the uncertainty of a random variable. It is calculated using the PDF
𝑝𝑦(𝑦𝑛[𝑙]) of a segment of samples.
ℎ𝑠𝑦[𝑛] = ∑ 𝑝𝑦(𝑦𝑛[𝑙])
𝐿/2
𝑙=−𝐿/2 log 𝑝𝑦 (𝑦𝑛[𝑙]). (10)
Entropy measures how effectively one can predict the behavior of respective parts from the others. In
general, more entropy denotes more complicated signals and, as a result, less predictability.
 ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 6, December 2022: 6090-6102
6096
3.3.9. Zero-crossing rate
The number of sign changes in subsequent samples is referred to as the zero-crossing rate. It is a
low-level feature that has been utilized in audio analysis for decades due to its straightforward calculation.
𝑧𝑦[𝑛] =
1
2
1
𝐿+1
∑ |sign(𝑦𝑛[𝑙]) − sign(𝑦𝑛[𝑙 − 1])|
𝐿/2
𝑙=−𝐿/2 . (11)
The output is in the range of 0 ≤ 𝑧𝑦[𝑛] ≤ 1. The more often the signal changes its sign, the more
high-frequency content is assumed to be in the signal.
3.3.10. Quantile ranges
Quantiles can be used to split the PDF into subsets of equal size. When the PDF is divided into two
quantiles, it is divided into two halves, each of which contains half of the total number of observations. The
median will be defined as the point where the border between those two quantiles crosses.
𝑄𝑦(0.5) = 𝑦|∫ 𝑝𝑦(𝑥)𝑑𝑥
𝑦
𝑖𝑛𝑓 =0.5
(12)
In the case of symmetric distributions, this equals the arithmetic mean. Quantile ranges are useful for
simplifying the explanation of a distribution’s shape. Here, we adopt the range spanned by 50 % of the
samples by discarding the upper and lower 25 %. The quantile range is then computed.
Δ𝑄𝑦(0.5) = 𝑄𝑦(0.75) − 𝑄𝑦(0.25). (13)
3.4. Normalization of feature sequences
Normalization rescales the features so that they have a zero mean and unit standard deviation. For a
feature vector a[n] with mean μ𝑎 and standard deviation σ𝑎, the normalization is accomplished.
𝑎[𝑛] =
𝑎[𝑛]−μ𝑎
σ𝑎
. (14)
Normalization of features to be centered around zero with a standard deviation of one is important for
comparing measurements with different units, and it is also required by many learning algorithms. This
normalization is applied to all of feature sequences obtained as shown in the previous section. Figure 4
depicts an example of 10 normalized sequences of extracted features from two PCG signals. The normalized
sequences are then used to train and test the biLSTM network.
3.5. Feature classification
Recurrent neural networks (RNNs) [33] are composed of hidden layers and feedback connections. In
RNN models there are at least recurrent layer with numerous hidden neurons/nodes between input and output
layers. The recurrent long short-term memory (LSTM) network contains hidden layers with self-recurrent
weights, allowing memory to retain previous information. This network is well-known for modeling the trend
in a sequence of features [39]. The bidirectional LSTM (biLSTM) network is a modified LSTM network with
two hidden layers. These hidden layers learn sequences by alternating between backward and forward layers
[40]. The alternating strategy improves the accuracy of LSTM networks by enabling past data to offer
context for subsequent samples in a training sequence.
3.5.1. Training the biLSTM model
Backpropagation is applied to calculate the training parameters of the biLSTM network, which is
then followed by an optimization algorithm [41]. The simplest way is to take each training example, run it
through the network to get a prediction output, subtract it from the actual output we want to get, and square
it. The network performs very well if the loss function is small, and it should be as small as possible.
Backpropagation computes the gradients used by the optimization algorithm to minimize loss functions. To
update the network parameters, we use the stochastic gradient descent (SGD) with momentum. Unlike the
SGD, the momentum considers the gradient obtained in the last steps to find the optimum search path. This
allows the network to approach the minimum value of the loss function more quickly. Although the
momentum speeds up convergence, it should be used in conjunction with simulated annealing to avoid
overshooting global minima. In the real world, the momentum is set at values that start at 0.5 and
progressively increase to 0.9 over the course of the training epochs.
Int J Elec & Comp Eng ISSN: 2088-8708 
Comparison of window shapes and lengths in short-time feature extraction for … (Mahmoud Fakhry)
6097
Figure 4. Normalized sequences of features for two PCG signals obtained with a 75 ms Gaussian window
4. EXPERIMENTS
4.1. Database
The heart sound recordings considered here are taken from the database of the PhysioNet 2016
challenge, which is publicly available on the Web [42]. An electronic stethoscope was used to record PCG
signals at four different points on the chest. The database includes recordings that have been classified as
healthy or pathological. We chose a well-balanced database that includes sound recordings of 150 healthy
hearts and other 150 pathological hearts. These recordings range in length between 5 and 120 seconds, with a
sampling frequency of 2,000 Hz. Following Nyquist theorem, all recordings are downsampled at 500 Hz.
4.2. Training and testing
For extraction of short-time features, we truncate the downsampled PCG signals, so that their
lengths become 10 seconds (5,000 samples). The truncated signals are divided into short-length segments of
samples using a window of length L+1. 70% of the extracted features are used to train the model, while 30%
are used to test it. The training and testing sets are picked from the extracted features of the recording
database at random. This procedure of training and testing is done for thirty times separately. Averaging the
outcomes of the thirty experiments yields the classification performance. The classification results are
provided in terms of the shape and length of the sliding symmetric window and as a function of four different
numbers of the hidden neurons of the biLSTM classifier. We set the value of the model variables as listed in
Table 1. These variables include the learning rate and the value of momentum in SGDM.
4.3. Classification performance
The number of successfully classified signal examples vs the number of wrongly classified signal
examples is a measure of classification performance. False negatives (FN) occur when the total number of
signals from pathological patients are reported healthy, while false positives (FP) occur when the total
number of signals from healthy patients are proclaimed pathological. True positives (TP) refer to
Magnitude
Index Index
Magnitude
 ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 6, December 2022: 6090-6102
6098
pathological heart sounds that have been accurately identified, whereas true negatives (TN) are healthy heart
sounds that have been appropriately categorized. The sensitivity (Sens.), specificity (Spec.), and accuracy
(Accu.) are commonly computed using the above-defined variables [43].
𝑆𝑒𝑛𝑠. =
𝑇𝑃
𝑇𝑃 + 𝐹𝑁
100%,
𝑆𝑝𝑒𝑐. =
𝑇𝑁
𝑇𝑁 + 𝐹𝑃
100%
𝐴𝑐𝑐𝑢. =
𝑇𝑃 + 𝑇𝑁
𝑇𝑃 + 𝐹𝑁 + 𝑇𝑁 + 𝐹𝑃
100%.
Table 1. Implementation variables of the biLSTM classifier with SGDM
Variable Value
# biLSTM layers 2
# Neurons in layer 5, 30, 50 or 100
# Classes 2
Initial learning rate 0.01
Epochs 500
The momentum 0.90
4.4. Results
Figure 5 depicts the classification records for the selected test dataset averaged over the number of
hidden neurons in the biLSTM model. The best performance is obtained by employing the Gaussian window
with a length of 30 samples (75 ms) to split PCG signals into short-length segments for feature extraction,
followed by the triangular window. Furthermore, the specificity and the accuracy accomplished by the
classifier are better when short-length segments are obtained using the triangular window of length
50 samples (125 ms) than those by the Gaussian window, and the sensitivity values are comparable. The
results obtained with the Gaussian window surpass those with the triangle and rectangular windows in terms
of sensitivity and accuracy for windows of length of 15 samples (37.5 ms), while the results with the
rectangular window are better than those with the other windows in terms of specificity, followed by those
with the Gaussian window.
Figure 5. Average performance for four numbers of neurons of the classification model
Figure 6. depicts three sub-figures of the accuracy obtained by the biLSTM network. Each
sub-figure shows the accuracy when each window of the three windows is used to divide PCG signals into
short-length segments. The accuracy follows different trends for different numbers of hidden neurons.
Int J Elec & Comp Eng ISSN: 2088-8708 
Comparison of window shapes and lengths in short-time feature extraction for … (Mahmoud Fakhry)
6099
Overall window shapes, the optimal accuracy achieved by the biLSTM classifier when the number of its
hidden neurons is set to 30. For a large or small number of hidden neurons, accuracy degradation is recorded.
When using the biLSTM network to classify features extracted from short-length segments of PCG signals
with a sliding symmetric window, the Gaussian window or the triangular window is the best choice.
However, the rectangular window outperforms the other two windows for a length of 15 samples (37.5 ms).
Table 2 reports detailed classification performance for the selected test PCG signals. According to
the results, using any window length and shape provides a good identification of heart abnormality for all
numbers of hidden neurons in the biLSTM classifier, as evidenced by the sensitivity values compared to the
specificity. The window shape and length can be adapted to get the best performance. The best achieved
accuracy is 89.10.
Figure 6. Classification performance of the test dataset in terms of accuracy for each window shape as a
function of a different number of hidden neurons and different window lengths
Table 2. Detailed classification results for the test data
Window length 15 samples 30 samples 50 samples
Window shape Neurons Sens. Spec. Accu. Sens. Spec. Accu. Sens. Spec. Accu.
Rectangular 5 71.40 74.10 72.70 78.60 74.10 76.40 78.60 70.40 74.50
30 82.10 77.80 80.00 85.70 70.40 78.20 71.40 70.40 70.90
50 75.00 74.10 74.50 75.00 74.10 74.50 71.40 70.40 70.90
100 78.60 77.80 78.20 75.00 70.40 72.70 75.00 63.00 69.10
Triangular 5 82.10 77.80 80.00 85.70 74.10 80.00 75.00 81.50 78.20
30 89.30 70.40 80.00 92.90 85.20 89.10 85.70 77.80 81.80
50 85.70 70.40 78.20 85.70 85.20 85.50 82.10 77.80 80.00
100 78.60 70.40 74.50 89.30 81.50 85.50 78.60 77.80 78.20
Gaussian 5 85.70 66.70 76.40 89.30 88.90 89.10 75.00 66.70 70.90
30 92.90 77.80 85.50 92.90 85.20 89.10 89.30 81.50 85.50
50 92.90 77.80 85.50 92.90 85.20 89.10 85.70 77.80 81.80
100 89.30 77.80 83.60 85.70 81.50 83.60 78.60 74.10 76.40
 ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 6, December 2022: 6090-6102
6100
4.5. Discussion
Adaptation of the shape and length of sliding window to split PCG signals into short-length
segments for feature extraction is an intriguing need to improve the diagnosis accuracy of heart disease. The
nonstationary nature of these signals and the presence of side-lobes in the spectrum of windows encourage us
to search for optimal window length and shape to avoid destructive effects on the performance, as is
theoretically demonstrated in the literature and experimentally confirmed in this work. Using a sliding
window with adapted length and negligible side-lobes, such as the Gaussian window, provides improved
performance.
To demonstrate the importance of this work, we compare the results obtained by the proposed
method to those obtained by a baseline method of [21]. In this baseline method, authors offer to extract four
sets of features obtained from the wavelet and Hilbert transforms, the homomorphic filtering, and the power
spectral density. This results in eleven extracted features, which were classified using kNNs and yielded
classification accuracy ranging from 74.07 to 81.40. These values of accuracy outperform many other
classification methods, as reported in [21]. The proposed method achieves a classification accuracy of 89.10
with the Gaussian or triangular windows of length 30 samples (37.5 ms) for extracting features and setting
the number of hidden neurons to 30 for classifying features. In this regard, the proposed method outperforms
the baseline method and achieves accuracy gain ranging between 15.03 and 7.70.
5. CONCLUSION
In this paper, we proposed to optimize the shapes and the lengths of sliding symmetric temporal
windows for splitting PCG signals into short-length segments, for extraction of sequences of statistical
features. We used these sequences to train and test the biLSTM classifier for the purpose of diagnosing heart
abnormalities. We tested three different window shapes, each with three different window lengths for the
extraction of features from short-length segments and four different numbers of hidden neurons for feature
classification. According to our findings, the classification performance obtained using the triangular window
is comparable to that obtained using the Gaussian window. However, the performance obtained using the
rectangular window cannot compete with that obtained using the other two windows. This results in the
high-level sidelobe present in the spectrum of the rectangular window compared to the spectra of the other
windows. These sidelobes interfere with the spectrum of signal segment. The best classification performance
is achieved when we used the Gaussian or triangular window of length 75 ms to extract features from
short-length segments of PCG signals, and we set the number of neurons of the biLSTM classifier to 30.
With this set of variables, the proposed methodology outperforms a baseline classification method.
REFERENCES
[1] Y. M. Akay, M. Akay, W. Welkowitz, J. L. Semmlow, and J. B. Kostis, “Noninvasive acoustical detection of coronary artery
disease: a comparative study of signal processing methods,” IEEE Transactions on Biomedical Engineering, vol. 40, no. 6,
pp. 571–578, Jun. 1993, doi: 10.1109/10.237677.
[2] R. M. Rangayyan and R. J. Lehner, “Phonocardiogram signal analysis: a review,” Critical reviews in biomedical engineering,
vol. 15, no. 3, pp. 211–236, 1987.
[3] M. S. Obaidat, “Phonocardiogram signal analysis: Techniques and performance comparison,” Journal of Medical Engineering
and Technology, vol. 17, no. 6, pp. 221–227, Jan. 1993, doi: 10.3109/03091909309006329.
[4] A. H. Ribeiro et al., “Automatic diagnosis of the 12-lead ECG using a deep neural network,” Nature Communications, vol. 11,
no. 1, Dec. 2020, doi: 10.1038/s41467-020-15432-4.
[5] W. Chen, Q. Sun, X. Chen, G. Xie, H. Wu, and C. Xu, “Deep learning methods for heart sounds classification: a systematic
review,” Entropy, vol. 23, no. 6, May 2021, doi: 10.3390/e23060667.
[6] F. Castells, P. Laguna, L. Sörnmo, A. Bollmann, and J. M. Roig, “Principal component analysis in ECG signal processing,”
EURASIP Journal on Advances in Signal Processing, no. 1, Dec. 2007, doi: 10.1155/2007/74580.
[7] L. M. Sepulveda-Cano, E. Gil, P. Laguna, and G. Castellanos-Dominguez, “Selection of nonstationary dynamic features for
obstructive sleep apnoea detection in children,” EURASIP Journal on Advances in Signal Processing, no. 1, Dec. 2011, doi:
10.1155/2011/538314.
[8] S. Nemati, A. Malhotra, and G. D. Clifford, “Data fusion for improved respiration rate estimation,” EURASIP Journal on
Advances in Signal Processing, no. 1, Dec. 2010, doi: 10.1155/2010/926305.
[9] S. McGee, “Auscultation of the heart: general principles,” in Evidence-Based Physical Diagnosis, Elsevier, 2018, pp. 327–332.
[10] M. El-Segaier, O. Lilja, S. Lukkarinen, L. Sörnmo, R. Sepponen, and E. Pesonen, “Computer-based detection and analysis of
heart sound and murmur,” Annals of Biomedical Engineering, vol. 33, no. 7, pp. 937–942, Jul. 2005, doi: 10.1007/s10439-005-
4053-3.
[11] B. S. Emmanuel, “A review of signal processing techniques for heart sound analysis in clinical diagnosis,” Journal of Medical
Engineering and Technology, vol. 36, no. 6, pp. 303–307, Aug. 2012, doi: 10.3109/03091902.2012.684831.
[12] S. Li, F. Li, S. Tang, and W. Xiong, “A review of computer-aided heart sound detection techniques,” BioMed Research
International, pp. 1–10, Jan. 2020, doi: 10.1155/2020/5846191.
[13] K. M. M. Prabhu, Window functions and their applications in signal processing. CRC Press, 2018., doi: 10.1201/9781315216386.
[14] S. Debbal and F. Bereksi-Reguig, “Detection of differences of the phonocardiogram signals by using the continuous wavelet
transform method,” International Journal of Biomedical Soft Computing and Human Sciences, vol. 18, no. 2, pp. 73–81, 2013.
Int J Elec & Comp Eng ISSN: 2088-8708 
Comparison of window shapes and lengths in short-time feature extraction for … (Mahmoud Fakhry)
6101
[15] J. Bulgrin, B. Rubal, C. Thompson, and J. Moody, “Comparison of short-time fourier, wavelet and time-domain analyses of
intracardiac sounds,” Biomedical sciences instrumentation, vol. 29, pp. 465–472, 1993.
[16] S. Sanei, M. Ghodsi, and H. Hassani, “An adaptive singular spectrum analysis approach to murmur detection from heart sounds,”
Medical Engineering and Physics, vol. 33, no. 3, pp. 362–367, Apr. 2011, doi: 10.1016/j.medengphy.2010.11.004.
[17] J. Wu, S. Zhou, Z. Wu, and X. Wu, “Research on the method of characteristic extraction and classification of Phonocardiogram,”
in 2012 International Conference on Systems and Informatics (ICSAI2012), May 2012, pp. 1732–1735, doi:
10.1109/ICSAI.2012.6223377.
[18] G. Redlarski, D. Gradolewski, and A. Palkowski, “A system for heart sounds classification,” Plos One, vol. 9, no. 11, Nov. 2014,
doi: 10.1371/journal.pone.0112673.
[19] M. Hamidi, H. Ghassemian, and M. Imani, “Classification of heart sound signal using curve fitting and fractal dimension,”
Biomedical Signal Processing and Control, vol. 39, pp. 351–359, Jan. 2018, doi: 10.1016/j.bspc.2017.08.002.
[20] S. A. Singh and S. Majumder, “Classification of unsegmented heart sound recording using KNN classifier,” Journal of Mechanics
in Medicine and Biology, vol. 19, no. 4, Jun. 2019, doi: 10.1142/S0219519419500258.
[21] T. Ölmez and Z. Dokur, “Classification of heart sounds using an artificial neural network,” Pattern Recognition Letters, vol. 24,
no. 1–3, pp. 617–629, Jan. 2003, doi: 10.1016/S0167-8655(02)00281-7.
[22] N. Andrisevic, K. Ejaz, F. Rios-Gutierrez, R. Alba-Flores, G. Nordehn, and S. Burns, “Detection of heart murmurs using wavelet
analysis and artificial neural networks,” Journal of Biomechanical Engineering, vol. 127, no. 6, pp. 899–904, Nov. 2005, doi:
10.1115/1.2049327.
[23] J. P. de Vos and M. M. Blanckenberg, “Automated pediatric cardiac auscultation,” IEEE Transactions on Biomedical
Engineering, vol. 54, no. 2, pp. 244–252, Feb. 2007, doi: 10.1109/TBME.2006.886660.
[24] I. Grzegorczyk et al., “PCG classification using a neural network approach,” Sep. 2016., doi: 10.22489/CinC.2016.323-252.
[25] H. Tang, H. Chen, T. Li, and M. Zhong, “Classification of normal/abnormal heart sound recordings based on multi:domain
features and back propagation neural network,” Sep. 2016., doi: 10.22489/CinC.2016.171-159.
[26] M. Abdollahpur, A. Ghaffari, S. Ghiasi, and M. J. Mollakazemi, “Detection of pathological heart sounds,” Physiological
Measurement, vol. 38, no. 8, pp. 1616–1630, Jul. 2017, doi: 10.1088/1361-6579/aa7840.
[27] K. O’Shea and R. Nash, “An introduction to convolutional neural networks,” arXiv preprint arXiv:1511.08458, Nov. 2015. doi:
10.48550/arXiv.1511.08458
[28] P. T. Krishnan, P. Balasubramanian, and S. Umapathy, “Automated heart sound classification system from unsegmented
phonocardiogram (PCG) using deep neural network,” Physical and Engineering Sciences in Medicine, vol. 43, no. 2,
pp. 505–515, Jun. 2020, doi: 10.1007/s13246-020-00851-w.
[29] M. Tschannen, T. Kramer, G. Marti, M. Heinzmann, and T. Wiatowski, “Heart sound classification using deep structured
features,” Sep. 2016., doi: 10.22489/CinC.2016.162-186.
[30] V. Maknickas and A. Maknickas, “Recognition of normal-abnormal phonocardiographic signals using deep convolutional neural
networks and mel-frequency spectral coefficients,” Physiological Measurement, vol. 38, no. 8, pp. 1671–1684, Jul. 2017, doi:
10.1088/1361-6579/aa7841.
[31] J. Rubin, R. Abreu, A. Ganguli, S. Nelaturi, I. Matei, and K. Sricharan, “Recognizing abnormal heart sounds using deep
learning,” arXiv preprint arXiv:1707.04642, 2017, doi: 10.48550/arXiv.1707.04642.
[32] S. Latif, M. Usman, J. Qadir, and R. Rana, “Abnormal heartbeat detection using recurrent neural networks,” CoRR
abs/1801.08322, 2018
[33] L. Medsker and L. C. Jain, Recurrent neural networks: design and applications. CRC press, 1999.
[34] S. Alam, R. Banerjee, and S. Bandyopadhyay, “Murmur detection using parallel recurrent and convolutional neural networks,”
arXiv preprint arXiv:1808.04411, 2018. doi: 10.48550/arXiv.1808.04411.
[35] S. Khaled, M. Fakhry, and A. S. Mubarak, “Classification of PCG signals using a nonlinear autoregressive network with
exogenous inputs (NARX),” in 2020 International Conference on Innovative Trends in Communication and Computer
Engineering (ITCE), Feb. 2020, pp. 98–102, doi: 10.1109/ITCE48509.2020.9047772.
[36] S. Khaled, M. Fakhry, H. Esmail, A. Ezzat, and E. Hamad, “Analysis of training optimization algorithms in the NARX neural
network for classification of heart sound signals,” International Journal of Scientific and Engineering Research, vol. 13, no. 2,
pp. 382–390, 2022.
[37] M. Fakhry and A. F. Brery, “A comparison study on training optimization algorithms in the biLSTM neural network for
classification of PCG signals,” in 2022 2nd International Conference on Innovative Research in Applied Science, Engineering
and Technology (IRASET), Mar. 2022, pp. 1–6, doi: 10.1109/IRASET52964.2022.9738309.
[38] A. Lerch, An introduction to audio content analysis. Hoboken, NJ, USA: John Wiley and Sons, Inc., 2012, doi:
10.1002/9781118393550.
[39] G. Van Houdt, C. Mosquera, and G. Nápoles, “A review on the long short-term memory model,” Artificial Intelligence Review,
vol. 53, no. 8, pp. 5929–5955, Dec. 2020, doi: 10.1007/s10462-020-09838-1.
[40] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45,
no. 11, pp. 2673–2681, 1997, doi: 10.1109/78.650093.
[41] S. Sun, Z. Cao, H. Zhu, and J. Zhao, “A survey of optimization methods from a machine learning perspective,” IEEE
Transactions on Cybernetics, vol. 50, no. 8, pp. 3668–3681, Aug. 2020, doi: 10.1109/TCYB.2019.2950779.
[42] C. Liu et al., “An open access database for the evaluation of heart sound algorithms,” Physiological Measurement, vol. 37, no. 12,
pp. 2181–2213, Dec. 2016, doi: 10.1088/0967-3334/37/12/2181.
[43] J. Maroco, D. Silva, A. Rodrigues, M. Guerreiro, I. Santana, and A. de Mendonça, “Data mining methods in the prediction of
Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression,
neural networks, support vector machines, classification trees and random forests,” BMC Research Notes, vol. 4, no. 1, Dec. 2011,
doi: 10.1186/1756-0500-4-299.
 ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 6, December 2022: 6090-6102
6102
BIOGRAPHIES OF AUTHORS
Mahmoud Fakhry received the Ph.D. degree of ICT from the University of
Trento, Italy, in 2016. He worked for the Center for Information and Communication
Technology, Fondazione Bruno Kessler, Trento, Italy from October 2012 to November 2016.
From December 2016 to February 2018, he was a researcher fellow at the Audio Analysis Lab,
Aalborg University, Denmark. From March 2018, he is an assistant professor at the Electrical
Engineering Department, Aswan University, Egypt. His research interests include audio signal
analysis and machine learning. He can be contacted by email: m.fakhry@aswu.edu.eg.
Abeer FathAllah Brery received the master’s degree in computer science and
engineering from Universitat Rovira i Virgili, Tarragona, Spain, in September 2016. From
October 2016, she is a research and teaching assistant at Electrical Engineering Department,
Aswan University, Aswan, Egypt. Her research interests include software engineering and
machine learning. She can be contacted at email: abeer_brery@aswu.edu.eg.

More Related Content

PDF
Classification_of_heart_sounds_using_fra.pdf
PDF
IRJET - Heart Anomaly Detection using Deep Learning Approach based on PCG...
PDF
Feature extraction of electrocardiogram signal using machine learning classif...
PDF
IRJET- Segmentation of PCG Signal: A Survey
PDF
Real Time Acquisition and Analysis of PCG and PPG Signals
PDF
Phonocardiogram based diagnostic system
PDF
50720140101001 2
Classification_of_heart_sounds_using_fra.pdf
IRJET - Heart Anomaly Detection using Deep Learning Approach based on PCG...
Feature extraction of electrocardiogram signal using machine learning classif...
IRJET- Segmentation of PCG Signal: A Survey
Real Time Acquisition and Analysis of PCG and PPG Signals
Phonocardiogram based diagnostic system
50720140101001 2

Similar to Comparison of window shapes and lengths in short-time feature extraction for classification of heart sound signals (20)

PDF
50720140101001 2
PDF
PHONOCARDIOGRAM HEART SOUND SIGNAL CLASSIFICATION USING DEEP LEARNING TECHNIQUE
PDF
Iaetsd a review on ecg arrhythmia detection
PPTX
Paper Id-266_A Review on Heartbeat Classification.pptx
PDF
PHONOCARDIOGRAM-BASED DIAGNOSIS USING MACHINE LEARNING: PARAMETRIC ESTIMATION...
PDF
Phonocardiogram Based Diagnosis Using Machine Learning : Parametric Estimatio...
PDF
PHONOCARDIOGRAM-BASED DIAGNOSIS USING MACHINE LEARNING: PARAMETRIC ESTIMATION...
PDF
Phonocardiogram Based Diagnosis Using Machine Learning : Parametric Estimatio...
PDF
Phonocardiogram Based Diagnosis Using Machine Learning : Parametric Estimatio...
PDF
Au32311316
PDF
IRJET- Detection of Abnormal ECG Signal using DWT Feature Extraction and CNN
PDF
Identification of Myocardial Infarction from Multi-Lead ECG signal
PDF
Rule Based Identification of Cardiac Arrhythmias from Enhanced ECG Signals Us...
PDF
A Review on ECG -Signal Classification of Scalogram Snap shots the use of Con...
PPTX
Research topic of signal processing PPT for help
PDF
SVM + Machine learning + Biomedical
PDF
A deep learning-based cardio-vascular disease diagnosis system
PDF
PERFORMANCE EVALUATION OF ARTIFICIAL NEURAL NETWORKS FOR CARDIAC ARRHYTHMIA C...
PDF
Classification of ecg signal using artificial neural network
50720140101001 2
PHONOCARDIOGRAM HEART SOUND SIGNAL CLASSIFICATION USING DEEP LEARNING TECHNIQUE
Iaetsd a review on ecg arrhythmia detection
Paper Id-266_A Review on Heartbeat Classification.pptx
PHONOCARDIOGRAM-BASED DIAGNOSIS USING MACHINE LEARNING: PARAMETRIC ESTIMATION...
Phonocardiogram Based Diagnosis Using Machine Learning : Parametric Estimatio...
PHONOCARDIOGRAM-BASED DIAGNOSIS USING MACHINE LEARNING: PARAMETRIC ESTIMATION...
Phonocardiogram Based Diagnosis Using Machine Learning : Parametric Estimatio...
Phonocardiogram Based Diagnosis Using Machine Learning : Parametric Estimatio...
Au32311316
IRJET- Detection of Abnormal ECG Signal using DWT Feature Extraction and CNN
Identification of Myocardial Infarction from Multi-Lead ECG signal
Rule Based Identification of Cardiac Arrhythmias from Enhanced ECG Signals Us...
A Review on ECG -Signal Classification of Scalogram Snap shots the use of Con...
Research topic of signal processing PPT for help
SVM + Machine learning + Biomedical
A deep learning-based cardio-vascular disease diagnosis system
PERFORMANCE EVALUATION OF ARTIFICIAL NEURAL NETWORKS FOR CARDIAC ARRHYTHMIA C...
Classification of ecg signal using artificial neural network

More from IJECEIAES (20)

PDF
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
PDF
Embedded machine learning-based road conditions and driving behavior monitoring
PDF
Advanced control scheme of doubly fed induction generator for wind turbine us...
PDF
Neural network optimizer of proportional-integral-differential controller par...
PDF
An improved modulation technique suitable for a three level flying capacitor ...
PDF
A review on features and methods of potential fishing zone
PDF
Electrical signal interference minimization using appropriate core material f...
PDF
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
PDF
Bibliometric analysis highlighting the role of women in addressing climate ch...
PDF
Voltage and frequency control of microgrid in presence of micro-turbine inter...
PDF
Enhancing battery system identification: nonlinear autoregressive modeling fo...
PDF
Smart grid deployment: from a bibliometric analysis to a survey
PDF
Use of analytical hierarchy process for selecting and prioritizing islanding ...
PDF
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...
PDF
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...
PDF
Adaptive synchronous sliding control for a robot manipulator based on neural ...
PDF
Remote field-programmable gate array laboratory for signal acquisition and de...
PDF
Detecting and resolving feature envy through automated machine learning and m...
PDF
Smart monitoring technique for solar cell systems using internet of things ba...
PDF
An efficient security framework for intrusion detection and prevention in int...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Embedded machine learning-based road conditions and driving behavior monitoring
Advanced control scheme of doubly fed induction generator for wind turbine us...
Neural network optimizer of proportional-integral-differential controller par...
An improved modulation technique suitable for a three level flying capacitor ...
A review on features and methods of potential fishing zone
Electrical signal interference minimization using appropriate core material f...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Bibliometric analysis highlighting the role of women in addressing climate ch...
Voltage and frequency control of microgrid in presence of micro-turbine inter...
Enhancing battery system identification: nonlinear autoregressive modeling fo...
Smart grid deployment: from a bibliometric analysis to a survey
Use of analytical hierarchy process for selecting and prioritizing islanding ...
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...
Adaptive synchronous sliding control for a robot manipulator based on neural ...
Remote field-programmable gate array laboratory for signal acquisition and de...
Detecting and resolving feature envy through automated machine learning and m...
Smart monitoring technique for solar cell systems using internet of things ba...
An efficient security framework for intrusion detection and prevention in int...

Recently uploaded (20)

PPT
Mechanical Engineering MATERIALS Selection
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
additive manufacturing of ss316l using mig welding
PDF
PPT on Performance Review to get promotions
PPTX
Internet of Things (IOT) - A guide to understanding
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPT
introduction to datamining and warehousing
PPTX
Lecture Notes Electrical Wiring System Components
PPT
Project quality management in manufacturing
PPTX
OOP with Java - Java Introduction (Basics)
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PPT
Introduction, IoT Design Methodology, Case Study on IoT System for Weather Mo...
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PPTX
Artificial Intelligence
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PDF
Well-logging-methods_new................
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
Mechanical Engineering MATERIALS Selection
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
additive manufacturing of ss316l using mig welding
PPT on Performance Review to get promotions
Internet of Things (IOT) - A guide to understanding
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
introduction to datamining and warehousing
Lecture Notes Electrical Wiring System Components
Project quality management in manufacturing
OOP with Java - Java Introduction (Basics)
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
Introduction, IoT Design Methodology, Case Study on IoT System for Weather Mo...
CYBER-CRIMES AND SECURITY A guide to understanding
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
Artificial Intelligence
Operating System & Kernel Study Guide-1 - converted.pdf
Well-logging-methods_new................
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx

Comparison of window shapes and lengths in short-time feature extraction for classification of heart sound signals

  • 1. International Journal of Electrical and Computer Engineering (IJECE) Vol. 12, No. 6, December 2022, pp. 6090~6102 ISSN: 2088-8708, DOI: 10.11591/ijece.v12i6.pp6090-6102  6090 Journal homepage: http://ijece.iaescore.com Comparison of window shapes and lengths in short-time feature extraction for classification of heart sound signals Mahmoud Fakhry, Abeer FathAllah Brery Electrical Engineering Department, Faculty of Engineering, Aswan University, Aswan, Egypt Article Info ABSTRACT Article history: Received Sep 16, 2021 Revised Jun 27, 2022 Accepted Jul 23, 2022 Heart sound signals, phonocardiography (PCG) signals, allow for the automatic diagnosis of potential cardiovascular pathology. Such classification task can be tackled using the bidirectional long short-term memory (biLSTM) network, trained on features extracted from labeled PCG signals. Regarding the non-stationarity of PCG signals, it is recommended to extract the features from multiple short-length segments of the signals using a sliding window of certain shape and length. However, some window contains unfavorable spectral side lobes, which distort the features. Accordingly, it is preferable to adapt the window shape and length in terms of classification performance. We propose an experimental evaluation for three window shapes, each with three window lengths. The biLSTM network is trained and tested on statistical features extracted, and the performance is reported in terms of the window shapes and lengths. Results show that the best performance is obtained when the Gaussian window is used for splitting the signals, and the triangular window competes with the Gaussian window for a length of 75 ms. Although the rectangular window is a commonly offered option, it is the worst choice for splitting the signals. Moreover, the classification performance obtained with a 75 ms Gaussian window outperforms that of a baseline method. Keywords: Bidirectional long short-term memory network Classification Feature extraction Heart sound Window length Window shape This is an open access article under the CC BY-SA license. Corresponding Author: Mahmoud Fakhry Electrical Engineering Department, Faculty of Engineering, Aswan University 81542 Aswan, Egypt Email: m.fakhry@aswu.edu.eg 1. INTRODUCTION The development of reliable and dynamic noninvasive systems is essential to the successful automatic diagnosis of diseases. Cardiologists are accustomed to using a medical stethoscope to listen to the heart and assess its health. Although this primitive diagnosis strategy is low-cost and non-invasive, its accuracy relays on the hearing experience of the cardiologist, which requires years to acquire [1]. As a result, it sparked the idea of using phonocardiography (PCG) signals to conduct an automated check of heart health [2]–[5]. There are several methods for the automation of heart monitoring and examination, including electrocardiogram (ECG), which records heart electrical activity, and PPG, which estimates the blood flow rate using sensors of light. An assortment of techniques have been developed in the literature for computerizing the investigation of ECG and PPG signals [6]–[8]. Besides ECG and PPG signals, the PCG signals, which records the acoustic sound signals created by the heart in a cardiovascular cycle, can be utilized to proficiently screen and evaluate heart wellbeing. The ECG and PCG signals are more coherent than the PPG signal and they also include more information. In comparison to the ECG signals, the PCG
  • 2. Int J Elec & Comp Eng ISSN: 2088-8708  Comparison of window shapes and lengths in short-time feature extraction for … (Mahmoud Fakhry) 6091 signal has a notable benefit in that it records the acoustic features of heart sound signals. These features make it easier to identify murmurs, which are irregularities in PCG signals [9]. PCG signals are recorded in a quiet room, with the PCG transducer, such as a microphone or stethoscope, firmly placed on the chest with a rubber strap. The PCG signal consists of a pair of heartbeats called S1 and S2, and two silent time durations between them as shown in Figure 1. This refers to the times the heart takes to alternate between contraction and closure. The time duration from the conclusion of the heartbeat S1 to the starting of the heartbeat S2 is called the systolic interval, and the diastolic interval is defined as the time duration from the conclusion of the heartbeat S2 to the starting of the heartbeat S1. The presence of disease is associated with additional audible actions that occur during the silent time durations, e.g., murmur is generated when blood continues to flow during the closure of the heart valves. The heartbeat S1 is a large magnitude signal with low frequency components, and the heartbeat S2 is a small magnitude signal with high frequency components [10]. Many methods have been recently developed for diagnosing heart abnormalities based on the analysis of PCG signals [11], [12]. In general, the detection of heart abnormality based on training distinct features is more effective than using training raw PCG signals. Figure 1. A sound recording of normal heart (PCG signal) One of the most difficult aspects for extracting features from PCG signals is that their complex and nonstationary nature. They are considered stationary only within short- time intervals, i.e., “quasi-stationary,” whereas signal characteristics are nonstationary over longer time intervals. To meet this requirement, features should be extracted in short-time intervals. This means that the PCG signal should be divided into short-length segments of overlapping/non-overlapping signal samples. This is accomplished by employing a sliding temporal window with a specific shape and length [13]. In this sense, the selection of the window shape and its length is a critical issue to ensure a successful diagnosis of heart abnormality. In this paper, sequences of statistical features extracted from short-length segments of labeled PCG signals are exploited for training and testing the bidirectional long short-term memory (biLSTM) classifier. Our aim in this work is to experimentally adapt the window shape and length for feature extraction in terms of classification of PCG signals. For that purpose, the classification results of conducted experiments are reported for three different window shapes and each is investigated with three different lengths. Furthermore, we evaluate four different network architectures of the biLSTM classifier. The rest of this manuscript is organized. Methods for diagnosing heart diseases are briefly reviewed in section 2. In section 3 explains the proposed method. Elaborated classification experiments and performance are reported in section 4. The conclusion of this work is drawn in section 5. 2. LITERATURE REVIEW Classification features can be extracted from different spectro-temporal representations of PCG signals. These representations can be obtained using the short-time Fourier transform (STFT) [10], the wavelet transform (WT) [14], and the Mel-frequency cepstral coefficients (MFCCs) [15]. Sanei et al. [16] conducted a comparative study between these time-frequency representations of PCG signals and the temporal analysis of PCG signals. They have obtained a similar frequency and time resolution for the detection of the acoustic properties of heart sound signals. Because subspaces spanned by heartbeats and murmurs are typically distinct, authors developed a constrained singular spectral analysis method as well as an adapted method for subspace selection to separate murmur from PCG signals in [17]. The optimized
  • 3.  ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 12, No. 6, December 2022: 6090-6102 6092 singular spectral processing method is found to segregate sound signals even when they are overlapped in time, as evidenced by low correlation values in almost all murmur types. In [18], the support vector machine (SVM) is employed to classify envelopes of PCG signals extracted in the wavelet domain. Hamidi et al. [19] perform feature extraction by estimating the parameters of linear predictive coding, and feature classification by SVM with a modified optimizer based on cuckoo search. Curve fitting and fractal dimension are exploited for producing a couple of feature extraction strategies from PCG signals in [20]. In [21], Ölmez and Dokur investigated the classification performance of the k-nearest neighbors (kNNs) with short-time features of wavelet and Hilbert transforms, homomorphic filtering, and power spectrum density of PCG signals. The extracted features are classified employing kNNs with error distances computed using the second norm for multiple cluster numbers. Artificial neural networks (ANNs) are utilized to classify PCG signals for the first time in [22]. Following that, many scholars focused on diagnosing of heart diseases using various ANNs and features. The principal component analysis and the wavelet decomposition are exploited for extraction of distinct features from PCG signals in [23], [24]. In [25], short-time features of the wavelet analysis and the time-frequency representation of PCG signals were evaluated independently using a multilayer perceptron composed of either a single hidden layer or a couple of hidden layers. The feedforward neural network is used in [26], [27] to classify PCG signals using features computed from the temporal, spectral, and spectro-temporal representations of the signals. The convolutional neural networks (CNNs) have made a rapid progress in the classification of PCG signals [28], [29]. PCG signals are classified using CNNs with features of wavelet coefficients in [30], and of MFCCs in [31]. In [32], heat maps are created from the energy distribution of MFCCs of PCG signals. Authors suggest to combine the heat maps with CNNs to perform such classification tasks. The recurrent neural networks (RNNs) are combined with MFCCs for the classification of PCG signals. The results are reported for four different RNN models, namely long short-term memory (LSTM), bidirectional LSTM (biLSTM), gated recurrent unit (GRU), and bidirectional GRU (biGRU) [33]. Alam et al. [34] describe a combination of two network architectures, namely the CNN and the biLSTM networks. They offer to train visible and temporal properties of murmur in PCG signals through the use of short-time features of spectrogram and MFCCs. The nonlinear autoregressive network with exogenous inputs (NARX) is leveraged for binary the detection of heart abnormality in [35], [36]. This training and testing of this network is performed using three different groups of short-length features, namely, temporal, spectral, and statistical. Comparative analysis of three algorithms of training the biLSTM network for the purpose of PCG signal classification is prepared in [37]. Authors concluded that the network provides the best classification performance when it is trained with the stochastic gradient descent (SGD) with momentum algorithm. 3. PROPOSED METHOD Because PCG signals are complex and nonstationary in nature, their analysis is a difficult issue. They are only deemed stationary within specific short-time intervals. In the proposed method, we offer to split a PCG signal into overlapping short-length segments of samples. This signal dividing step is completed using a sliding symmetric temporal window with certain shape and length. The contribution of each sample of PCG signals in the segment is defined by the window shape, and different segments are obtained by translating the window one sample step. The diagnosing of heart diseases is accomplished by classification of PCG which is achieved using the biLSTM network and 10 statistical features calculated independently inside each segment. However, some window shapes contain unfavorable spectral sidelobes. These sidelobes produce spectral components that interfere with the true spectrum of segments, resulting in distorted features. Using distorted classification features reduces the probability of accurate detection of heart abnormality. In this proposal, we offer to make a comparison among three different window shapes, each with three different lengths. In this sense, we adapt the shape and length of the window in the context of classification of PCG signals. Sequences of features are employed for training and testing different biLSTM architectures. Figure 2 depicts the flowchart of the proposed method. The sections that follow explain the procedure, beginning with signal pre-processing and progressing to feature extraction and feature classification. 3.1. Pre-processing Most heart sound recordings obtained with recording instruments are corrupted with noise from the surrounding environment. As a result, filtering out the noise becomes critical, but not at the expense of omitting information necessary for diagnosing heart disease. To reject distorting noise from the recordings, PCG signals are severely filtered. The bandwidth of heartbeat S1 is mainly from 10 to 200 Hz, which is made
  • 4. Int J Elec & Comp Eng ISSN: 2088-8708  Comparison of window shapes and lengths in short-time feature extraction for … (Mahmoud Fakhry) 6093 by the vibrations of heart chambers and valves. After the closure of semilunar valves on aortic and pulmonary arteries, the heartbeat S2 is formed with a frequency range from 20 to 250 Hz. The low-frequency components of diagnostic information are preserved while high-frequency noise components are rejected using a low-pass filter with a 250 Hz cut-off frequency. Following Nyquist theorem, the filtered signals are down-sampled before feature extraction to reduce both calculations and computation complexity. Figure 2. Flowchart of the proposed method 3.2. PCG signal windowing An input signal can be divided into L+1 overlapped segments of signal samples. That is, the boundaries of subsequent segments overlap, i.e., the last samples of one segment are present in the following segment. These overlapping segments are obtained by moving a temporal sliding window along the signal. A typical overlap is chosen between one and L samples. In this work, a PCG signal is represented by N overlapping segments of samples with the length of overlap to be L. This is achieved for a PCG signal x[n] of length N by employing a sliding symmetric window w[n] of length L+1, with L+1<=N, such as (1): 𝑦𝑛[𝑙] = 𝑤[𝑙]𝑥[𝑛 + 𝑙], |𝑙| ≤ 𝐿 2 , (1) where yn[l] is the nth signal block of length L+1. The shape of the window is selected from widely used windows, including the rectangular window, the triangular window, and the Gaussian window. 3.2.1. Sliding symmetric window Because the time-domain product of two signals corresponds to the linear convolution of their respective spectra, the spectrum of the signal x[n] is convolved with the spectrum of the sliding window w[n] to obtain the spectrum of y[n]. As a result, the spectrum of the window influences the resulting spectrum. The typical spectral shape of a window consists of a main lobe and many side lobes with decreasing intensity as shown in Figure 3. When the signal of interest is sinusoidal, the resulting Fourier transform of the windowed signal will be a superposition of two window functions with their main lobes located at the sinusoidal signal’s frequency, effectively “smearing” the delta functions. This unfavorable side effect is usually characterized by [13]: i) the main-lobe width, ii) the intensity of the first (closest) side-lobe peak, and iii) the attenuation of the subsequent side-lobe peaks. Various windows have been offered in the literature to optimize these properties for specific use situations, and some of them are studied here for extracting features in the context of PCG signal classification. 3.2.2. The rectangular window Because signal truncation corresponds to applying the rectangular window, this is the simplest window. It is also called the Dirichlet window and defined: 𝑤[𝑙] = 1, |𝑙| ≤ 𝐿 2 . (2) The spectrum of this window is represented by a sinc function, with the peak of the first side-lobe accounting for approximately one-fifth the peak of the main-lobe.
  • 5.  ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 12, No. 6, December 2022: 6090-6102 6094 Figure 3. Three different symmetric windows and their corresponding spectral shapes with L=20 3.2.3. The triangular window The linear convolution of two rectangular windows of length L/2 yields a (L+1)-length triangular window. Therefore, the spectrum of the triangular window is obviously the Dirichlet kernel squared. This window exhibits a nonnegative Fourier transform due to the above discussed convolution property. Its mathematical formula is determined by (3). 𝑤[𝑙] = 1 − |2𝑙| 𝐿 , |𝑙| ≤ 𝐿 2 , (3) The half width of the main lobe of the triangular window equals twice that of the rectangular window. 3.2.4. The Gaussian window It is well known that the spectrum of a Gaussian function is also a Gaussian function. Since this function spans an infinity time, it must be trimmed at its ends to be used as a window function. This window is represented by (4). 𝑤[𝑙] = 𝑒 − 1 2 (α 𝑙 𝐿/2 ) 2 , |𝑙| ≤ 𝐿 2 , (4) Increasing the value of α results in narrowing the width of the window, which reduces the severity of the discontinuity at the edges. However, this will widen the main-lobe and, as a result, diminish the side-lobe levels. The most common values of α are 2.5, 3, and 3.5. 3.3. Feature extraction Various measures for describing the statistical properties of signals are widely used. These measures can be used as classification features extracted from short-length segments of signal samples. Theoretically, the computation of statistical features requires infinite length observations; however, in practical applications, short-length segments can be assumed enough if we adapt the window shape and length [38]. The following section provides explanations of 10 feature sequences representing each PCG signal of the recording database. 3.3.1. Arithmetic mean The arithmetic mean is the average value of the short-length segment of signal samples. Assuming that the samples of a segment are uniformly distributed, the arithmetic mean is computed by (5). μ𝑦[𝑛] = 1 𝐿+1 ∑ 𝑦𝑛[𝑙] 𝐿/2 𝑙=−𝐿/2 . (5) For a symmetric probability density function (PDF), the arithmetic mean is the position of the symmetric axis. When the PDF is not symmetric (a skewed signal), then the calculation of the mean is of limited use.
  • 6. Int J Elec & Comp Eng ISSN: 2088-8708  Comparison of window shapes and lengths in short-time feature extraction for … (Mahmoud Fakhry) 6095 3.3.2. Median The median is the value that spatially divides a segment of samples into half. To find the median, sort the samples of yn[l] from the smallest value to the largest, and then find the sample with an equal number of samples above and below it. Outlier samples have a minor influence on the median. For a signal with repeated sample values, the median is of limited use, and the mode takes its place as a measure of the average. 3.3.3. Mode The mode is the most frequently occurring value in a segment of samples. In other words, the mode is the common value that is repeated in samples. A multimodal distribution exists when the samples contain multiple values that are linked to the most frequently occurring. If no value repeats, the signal lacks a mode. 3.3.4. Variance The variance and standard deviation both measure the spread of the input segment of signal sample around its arithmetic mean. The difference is given by (6). σ𝑦 2 [𝑛] = 1 𝐿+1 ∑ (𝑦𝑛[𝑙] − μ𝑦[𝑛]) 2 𝐿/2 𝑙=−𝐿/2 . (6) For μ𝑦[𝑛] = 0, the variance equals the power of the observed segment of signal samples. The standard deviation σ𝑦[𝑛] can be computed directly from the variance, σ𝑦[𝑛] = √σ𝑦 2[𝑛]. 3.3.5. Skewness The skewness is defined as the division of the third central moment of a segment of signal sample by the cube of its standard deviation. It is given by (7). 𝑣𝑦[𝑛] = 1 𝐿+1 1 σ𝑦 3[𝑛] ∑ (𝑦𝑛[𝑙] − μ𝑦[𝑛]) 3 𝐿/2 𝑙=−𝐿/2 . (7) Skewness is a measure of PDF asymmetry. It is 0 for symmetric distributions, negative for distributions with their mass centered on the right, and positive for distributions with their mass centered on the left. 3.3.6. Kurtosis Kurtosis is defined as the division of the fourth central moment of a segment of signal sample by the fourth power of the standard deviation. It is mathematically computed as (8): ω𝑦[𝑛] = 1 𝐿+1 1 σ𝑦 4[𝑛] ∑ (𝑦𝑛[𝑙] − μ𝑦[𝑛]) 4 𝐿/2 𝑙=−𝐿/2 − 3. (8) Kurtosis quantifies the non-Gaussianity of the PDF. It is an indicator of the flatness of the PDF compared to the Gaussian distribution. It is zero for a Gaussian distribution, is negative for a flatter distribution with a wider peak, and positive for distributions with a sharper peak. 3.3.7. Shannon energy Shannon energy finds the average spectral of a segment of signal samples. It discounts the high value components into the low value components. It is obtained for a segment of samples. 𝐸𝑠𝑦[𝑛] = ∑ |𝑦𝑛[𝑙]|2 𝐿/2 𝑙=−𝐿/2 log|𝑦𝑛[𝑙]|2 . (9) In the presence of noise and outliers, Shannon energy approaches signal ranges, resulting in fewer errors. The advantage of using Shannon energy is its ability to emphasis the medium over traditional energy. 3.3.8. Shannon entropy Shannon entropy quantifies the uncertainty of a random variable. It is calculated using the PDF 𝑝𝑦(𝑦𝑛[𝑙]) of a segment of samples. ℎ𝑠𝑦[𝑛] = ∑ 𝑝𝑦(𝑦𝑛[𝑙]) 𝐿/2 𝑙=−𝐿/2 log 𝑝𝑦 (𝑦𝑛[𝑙]). (10) Entropy measures how effectively one can predict the behavior of respective parts from the others. In general, more entropy denotes more complicated signals and, as a result, less predictability.
  • 7.  ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 12, No. 6, December 2022: 6090-6102 6096 3.3.9. Zero-crossing rate The number of sign changes in subsequent samples is referred to as the zero-crossing rate. It is a low-level feature that has been utilized in audio analysis for decades due to its straightforward calculation. 𝑧𝑦[𝑛] = 1 2 1 𝐿+1 ∑ |sign(𝑦𝑛[𝑙]) − sign(𝑦𝑛[𝑙 − 1])| 𝐿/2 𝑙=−𝐿/2 . (11) The output is in the range of 0 ≤ 𝑧𝑦[𝑛] ≤ 1. The more often the signal changes its sign, the more high-frequency content is assumed to be in the signal. 3.3.10. Quantile ranges Quantiles can be used to split the PDF into subsets of equal size. When the PDF is divided into two quantiles, it is divided into two halves, each of which contains half of the total number of observations. The median will be defined as the point where the border between those two quantiles crosses. 𝑄𝑦(0.5) = 𝑦|∫ 𝑝𝑦(𝑥)𝑑𝑥 𝑦 𝑖𝑛𝑓 =0.5 (12) In the case of symmetric distributions, this equals the arithmetic mean. Quantile ranges are useful for simplifying the explanation of a distribution’s shape. Here, we adopt the range spanned by 50 % of the samples by discarding the upper and lower 25 %. The quantile range is then computed. Δ𝑄𝑦(0.5) = 𝑄𝑦(0.75) − 𝑄𝑦(0.25). (13) 3.4. Normalization of feature sequences Normalization rescales the features so that they have a zero mean and unit standard deviation. For a feature vector a[n] with mean μ𝑎 and standard deviation σ𝑎, the normalization is accomplished. 𝑎[𝑛] = 𝑎[𝑛]−μ𝑎 σ𝑎 . (14) Normalization of features to be centered around zero with a standard deviation of one is important for comparing measurements with different units, and it is also required by many learning algorithms. This normalization is applied to all of feature sequences obtained as shown in the previous section. Figure 4 depicts an example of 10 normalized sequences of extracted features from two PCG signals. The normalized sequences are then used to train and test the biLSTM network. 3.5. Feature classification Recurrent neural networks (RNNs) [33] are composed of hidden layers and feedback connections. In RNN models there are at least recurrent layer with numerous hidden neurons/nodes between input and output layers. The recurrent long short-term memory (LSTM) network contains hidden layers with self-recurrent weights, allowing memory to retain previous information. This network is well-known for modeling the trend in a sequence of features [39]. The bidirectional LSTM (biLSTM) network is a modified LSTM network with two hidden layers. These hidden layers learn sequences by alternating between backward and forward layers [40]. The alternating strategy improves the accuracy of LSTM networks by enabling past data to offer context for subsequent samples in a training sequence. 3.5.1. Training the biLSTM model Backpropagation is applied to calculate the training parameters of the biLSTM network, which is then followed by an optimization algorithm [41]. The simplest way is to take each training example, run it through the network to get a prediction output, subtract it from the actual output we want to get, and square it. The network performs very well if the loss function is small, and it should be as small as possible. Backpropagation computes the gradients used by the optimization algorithm to minimize loss functions. To update the network parameters, we use the stochastic gradient descent (SGD) with momentum. Unlike the SGD, the momentum considers the gradient obtained in the last steps to find the optimum search path. This allows the network to approach the minimum value of the loss function more quickly. Although the momentum speeds up convergence, it should be used in conjunction with simulated annealing to avoid overshooting global minima. In the real world, the momentum is set at values that start at 0.5 and progressively increase to 0.9 over the course of the training epochs.
  • 8. Int J Elec & Comp Eng ISSN: 2088-8708  Comparison of window shapes and lengths in short-time feature extraction for … (Mahmoud Fakhry) 6097 Figure 4. Normalized sequences of features for two PCG signals obtained with a 75 ms Gaussian window 4. EXPERIMENTS 4.1. Database The heart sound recordings considered here are taken from the database of the PhysioNet 2016 challenge, which is publicly available on the Web [42]. An electronic stethoscope was used to record PCG signals at four different points on the chest. The database includes recordings that have been classified as healthy or pathological. We chose a well-balanced database that includes sound recordings of 150 healthy hearts and other 150 pathological hearts. These recordings range in length between 5 and 120 seconds, with a sampling frequency of 2,000 Hz. Following Nyquist theorem, all recordings are downsampled at 500 Hz. 4.2. Training and testing For extraction of short-time features, we truncate the downsampled PCG signals, so that their lengths become 10 seconds (5,000 samples). The truncated signals are divided into short-length segments of samples using a window of length L+1. 70% of the extracted features are used to train the model, while 30% are used to test it. The training and testing sets are picked from the extracted features of the recording database at random. This procedure of training and testing is done for thirty times separately. Averaging the outcomes of the thirty experiments yields the classification performance. The classification results are provided in terms of the shape and length of the sliding symmetric window and as a function of four different numbers of the hidden neurons of the biLSTM classifier. We set the value of the model variables as listed in Table 1. These variables include the learning rate and the value of momentum in SGDM. 4.3. Classification performance The number of successfully classified signal examples vs the number of wrongly classified signal examples is a measure of classification performance. False negatives (FN) occur when the total number of signals from pathological patients are reported healthy, while false positives (FP) occur when the total number of signals from healthy patients are proclaimed pathological. True positives (TP) refer to Magnitude Index Index Magnitude
  • 9.  ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 12, No. 6, December 2022: 6090-6102 6098 pathological heart sounds that have been accurately identified, whereas true negatives (TN) are healthy heart sounds that have been appropriately categorized. The sensitivity (Sens.), specificity (Spec.), and accuracy (Accu.) are commonly computed using the above-defined variables [43]. 𝑆𝑒𝑛𝑠. = 𝑇𝑃 𝑇𝑃 + 𝐹𝑁 100%, 𝑆𝑝𝑒𝑐. = 𝑇𝑁 𝑇𝑁 + 𝐹𝑃 100% 𝐴𝑐𝑐𝑢. = 𝑇𝑃 + 𝑇𝑁 𝑇𝑃 + 𝐹𝑁 + 𝑇𝑁 + 𝐹𝑃 100%. Table 1. Implementation variables of the biLSTM classifier with SGDM Variable Value # biLSTM layers 2 # Neurons in layer 5, 30, 50 or 100 # Classes 2 Initial learning rate 0.01 Epochs 500 The momentum 0.90 4.4. Results Figure 5 depicts the classification records for the selected test dataset averaged over the number of hidden neurons in the biLSTM model. The best performance is obtained by employing the Gaussian window with a length of 30 samples (75 ms) to split PCG signals into short-length segments for feature extraction, followed by the triangular window. Furthermore, the specificity and the accuracy accomplished by the classifier are better when short-length segments are obtained using the triangular window of length 50 samples (125 ms) than those by the Gaussian window, and the sensitivity values are comparable. The results obtained with the Gaussian window surpass those with the triangle and rectangular windows in terms of sensitivity and accuracy for windows of length of 15 samples (37.5 ms), while the results with the rectangular window are better than those with the other windows in terms of specificity, followed by those with the Gaussian window. Figure 5. Average performance for four numbers of neurons of the classification model Figure 6. depicts three sub-figures of the accuracy obtained by the biLSTM network. Each sub-figure shows the accuracy when each window of the three windows is used to divide PCG signals into short-length segments. The accuracy follows different trends for different numbers of hidden neurons.
  • 10. Int J Elec & Comp Eng ISSN: 2088-8708  Comparison of window shapes and lengths in short-time feature extraction for … (Mahmoud Fakhry) 6099 Overall window shapes, the optimal accuracy achieved by the biLSTM classifier when the number of its hidden neurons is set to 30. For a large or small number of hidden neurons, accuracy degradation is recorded. When using the biLSTM network to classify features extracted from short-length segments of PCG signals with a sliding symmetric window, the Gaussian window or the triangular window is the best choice. However, the rectangular window outperforms the other two windows for a length of 15 samples (37.5 ms). Table 2 reports detailed classification performance for the selected test PCG signals. According to the results, using any window length and shape provides a good identification of heart abnormality for all numbers of hidden neurons in the biLSTM classifier, as evidenced by the sensitivity values compared to the specificity. The window shape and length can be adapted to get the best performance. The best achieved accuracy is 89.10. Figure 6. Classification performance of the test dataset in terms of accuracy for each window shape as a function of a different number of hidden neurons and different window lengths Table 2. Detailed classification results for the test data Window length 15 samples 30 samples 50 samples Window shape Neurons Sens. Spec. Accu. Sens. Spec. Accu. Sens. Spec. Accu. Rectangular 5 71.40 74.10 72.70 78.60 74.10 76.40 78.60 70.40 74.50 30 82.10 77.80 80.00 85.70 70.40 78.20 71.40 70.40 70.90 50 75.00 74.10 74.50 75.00 74.10 74.50 71.40 70.40 70.90 100 78.60 77.80 78.20 75.00 70.40 72.70 75.00 63.00 69.10 Triangular 5 82.10 77.80 80.00 85.70 74.10 80.00 75.00 81.50 78.20 30 89.30 70.40 80.00 92.90 85.20 89.10 85.70 77.80 81.80 50 85.70 70.40 78.20 85.70 85.20 85.50 82.10 77.80 80.00 100 78.60 70.40 74.50 89.30 81.50 85.50 78.60 77.80 78.20 Gaussian 5 85.70 66.70 76.40 89.30 88.90 89.10 75.00 66.70 70.90 30 92.90 77.80 85.50 92.90 85.20 89.10 89.30 81.50 85.50 50 92.90 77.80 85.50 92.90 85.20 89.10 85.70 77.80 81.80 100 89.30 77.80 83.60 85.70 81.50 83.60 78.60 74.10 76.40
  • 11.  ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 12, No. 6, December 2022: 6090-6102 6100 4.5. Discussion Adaptation of the shape and length of sliding window to split PCG signals into short-length segments for feature extraction is an intriguing need to improve the diagnosis accuracy of heart disease. The nonstationary nature of these signals and the presence of side-lobes in the spectrum of windows encourage us to search for optimal window length and shape to avoid destructive effects on the performance, as is theoretically demonstrated in the literature and experimentally confirmed in this work. Using a sliding window with adapted length and negligible side-lobes, such as the Gaussian window, provides improved performance. To demonstrate the importance of this work, we compare the results obtained by the proposed method to those obtained by a baseline method of [21]. In this baseline method, authors offer to extract four sets of features obtained from the wavelet and Hilbert transforms, the homomorphic filtering, and the power spectral density. This results in eleven extracted features, which were classified using kNNs and yielded classification accuracy ranging from 74.07 to 81.40. These values of accuracy outperform many other classification methods, as reported in [21]. The proposed method achieves a classification accuracy of 89.10 with the Gaussian or triangular windows of length 30 samples (37.5 ms) for extracting features and setting the number of hidden neurons to 30 for classifying features. In this regard, the proposed method outperforms the baseline method and achieves accuracy gain ranging between 15.03 and 7.70. 5. CONCLUSION In this paper, we proposed to optimize the shapes and the lengths of sliding symmetric temporal windows for splitting PCG signals into short-length segments, for extraction of sequences of statistical features. We used these sequences to train and test the biLSTM classifier for the purpose of diagnosing heart abnormalities. We tested three different window shapes, each with three different window lengths for the extraction of features from short-length segments and four different numbers of hidden neurons for feature classification. According to our findings, the classification performance obtained using the triangular window is comparable to that obtained using the Gaussian window. However, the performance obtained using the rectangular window cannot compete with that obtained using the other two windows. This results in the high-level sidelobe present in the spectrum of the rectangular window compared to the spectra of the other windows. These sidelobes interfere with the spectrum of signal segment. The best classification performance is achieved when we used the Gaussian or triangular window of length 75 ms to extract features from short-length segments of PCG signals, and we set the number of neurons of the biLSTM classifier to 30. With this set of variables, the proposed methodology outperforms a baseline classification method. REFERENCES [1] Y. M. Akay, M. Akay, W. Welkowitz, J. L. Semmlow, and J. B. Kostis, “Noninvasive acoustical detection of coronary artery disease: a comparative study of signal processing methods,” IEEE Transactions on Biomedical Engineering, vol. 40, no. 6, pp. 571–578, Jun. 1993, doi: 10.1109/10.237677. [2] R. M. Rangayyan and R. J. Lehner, “Phonocardiogram signal analysis: a review,” Critical reviews in biomedical engineering, vol. 15, no. 3, pp. 211–236, 1987. [3] M. S. Obaidat, “Phonocardiogram signal analysis: Techniques and performance comparison,” Journal of Medical Engineering and Technology, vol. 17, no. 6, pp. 221–227, Jan. 1993, doi: 10.3109/03091909309006329. [4] A. H. Ribeiro et al., “Automatic diagnosis of the 12-lead ECG using a deep neural network,” Nature Communications, vol. 11, no. 1, Dec. 2020, doi: 10.1038/s41467-020-15432-4. [5] W. Chen, Q. Sun, X. Chen, G. Xie, H. Wu, and C. Xu, “Deep learning methods for heart sounds classification: a systematic review,” Entropy, vol. 23, no. 6, May 2021, doi: 10.3390/e23060667. [6] F. Castells, P. Laguna, L. Sörnmo, A. Bollmann, and J. M. Roig, “Principal component analysis in ECG signal processing,” EURASIP Journal on Advances in Signal Processing, no. 1, Dec. 2007, doi: 10.1155/2007/74580. [7] L. M. Sepulveda-Cano, E. Gil, P. Laguna, and G. Castellanos-Dominguez, “Selection of nonstationary dynamic features for obstructive sleep apnoea detection in children,” EURASIP Journal on Advances in Signal Processing, no. 1, Dec. 2011, doi: 10.1155/2011/538314. [8] S. Nemati, A. Malhotra, and G. D. Clifford, “Data fusion for improved respiration rate estimation,” EURASIP Journal on Advances in Signal Processing, no. 1, Dec. 2010, doi: 10.1155/2010/926305. [9] S. McGee, “Auscultation of the heart: general principles,” in Evidence-Based Physical Diagnosis, Elsevier, 2018, pp. 327–332. [10] M. El-Segaier, O. Lilja, S. Lukkarinen, L. Sörnmo, R. Sepponen, and E. Pesonen, “Computer-based detection and analysis of heart sound and murmur,” Annals of Biomedical Engineering, vol. 33, no. 7, pp. 937–942, Jul. 2005, doi: 10.1007/s10439-005- 4053-3. [11] B. S. Emmanuel, “A review of signal processing techniques for heart sound analysis in clinical diagnosis,” Journal of Medical Engineering and Technology, vol. 36, no. 6, pp. 303–307, Aug. 2012, doi: 10.3109/03091902.2012.684831. [12] S. Li, F. Li, S. Tang, and W. Xiong, “A review of computer-aided heart sound detection techniques,” BioMed Research International, pp. 1–10, Jan. 2020, doi: 10.1155/2020/5846191. [13] K. M. M. Prabhu, Window functions and their applications in signal processing. CRC Press, 2018., doi: 10.1201/9781315216386. [14] S. Debbal and F. Bereksi-Reguig, “Detection of differences of the phonocardiogram signals by using the continuous wavelet transform method,” International Journal of Biomedical Soft Computing and Human Sciences, vol. 18, no. 2, pp. 73–81, 2013.
  • 12. Int J Elec & Comp Eng ISSN: 2088-8708  Comparison of window shapes and lengths in short-time feature extraction for … (Mahmoud Fakhry) 6101 [15] J. Bulgrin, B. Rubal, C. Thompson, and J. Moody, “Comparison of short-time fourier, wavelet and time-domain analyses of intracardiac sounds,” Biomedical sciences instrumentation, vol. 29, pp. 465–472, 1993. [16] S. Sanei, M. Ghodsi, and H. Hassani, “An adaptive singular spectrum analysis approach to murmur detection from heart sounds,” Medical Engineering and Physics, vol. 33, no. 3, pp. 362–367, Apr. 2011, doi: 10.1016/j.medengphy.2010.11.004. [17] J. Wu, S. Zhou, Z. Wu, and X. Wu, “Research on the method of characteristic extraction and classification of Phonocardiogram,” in 2012 International Conference on Systems and Informatics (ICSAI2012), May 2012, pp. 1732–1735, doi: 10.1109/ICSAI.2012.6223377. [18] G. Redlarski, D. Gradolewski, and A. Palkowski, “A system for heart sounds classification,” Plos One, vol. 9, no. 11, Nov. 2014, doi: 10.1371/journal.pone.0112673. [19] M. Hamidi, H. Ghassemian, and M. Imani, “Classification of heart sound signal using curve fitting and fractal dimension,” Biomedical Signal Processing and Control, vol. 39, pp. 351–359, Jan. 2018, doi: 10.1016/j.bspc.2017.08.002. [20] S. A. Singh and S. Majumder, “Classification of unsegmented heart sound recording using KNN classifier,” Journal of Mechanics in Medicine and Biology, vol. 19, no. 4, Jun. 2019, doi: 10.1142/S0219519419500258. [21] T. Ölmez and Z. Dokur, “Classification of heart sounds using an artificial neural network,” Pattern Recognition Letters, vol. 24, no. 1–3, pp. 617–629, Jan. 2003, doi: 10.1016/S0167-8655(02)00281-7. [22] N. Andrisevic, K. Ejaz, F. Rios-Gutierrez, R. Alba-Flores, G. Nordehn, and S. Burns, “Detection of heart murmurs using wavelet analysis and artificial neural networks,” Journal of Biomechanical Engineering, vol. 127, no. 6, pp. 899–904, Nov. 2005, doi: 10.1115/1.2049327. [23] J. P. de Vos and M. M. Blanckenberg, “Automated pediatric cardiac auscultation,” IEEE Transactions on Biomedical Engineering, vol. 54, no. 2, pp. 244–252, Feb. 2007, doi: 10.1109/TBME.2006.886660. [24] I. Grzegorczyk et al., “PCG classification using a neural network approach,” Sep. 2016., doi: 10.22489/CinC.2016.323-252. [25] H. Tang, H. Chen, T. Li, and M. Zhong, “Classification of normal/abnormal heart sound recordings based on multi:domain features and back propagation neural network,” Sep. 2016., doi: 10.22489/CinC.2016.171-159. [26] M. Abdollahpur, A. Ghaffari, S. Ghiasi, and M. J. Mollakazemi, “Detection of pathological heart sounds,” Physiological Measurement, vol. 38, no. 8, pp. 1616–1630, Jul. 2017, doi: 10.1088/1361-6579/aa7840. [27] K. O’Shea and R. Nash, “An introduction to convolutional neural networks,” arXiv preprint arXiv:1511.08458, Nov. 2015. doi: 10.48550/arXiv.1511.08458 [28] P. T. Krishnan, P. Balasubramanian, and S. Umapathy, “Automated heart sound classification system from unsegmented phonocardiogram (PCG) using deep neural network,” Physical and Engineering Sciences in Medicine, vol. 43, no. 2, pp. 505–515, Jun. 2020, doi: 10.1007/s13246-020-00851-w. [29] M. Tschannen, T. Kramer, G. Marti, M. Heinzmann, and T. Wiatowski, “Heart sound classification using deep structured features,” Sep. 2016., doi: 10.22489/CinC.2016.162-186. [30] V. Maknickas and A. Maknickas, “Recognition of normal-abnormal phonocardiographic signals using deep convolutional neural networks and mel-frequency spectral coefficients,” Physiological Measurement, vol. 38, no. 8, pp. 1671–1684, Jul. 2017, doi: 10.1088/1361-6579/aa7841. [31] J. Rubin, R. Abreu, A. Ganguli, S. Nelaturi, I. Matei, and K. Sricharan, “Recognizing abnormal heart sounds using deep learning,” arXiv preprint arXiv:1707.04642, 2017, doi: 10.48550/arXiv.1707.04642. [32] S. Latif, M. Usman, J. Qadir, and R. Rana, “Abnormal heartbeat detection using recurrent neural networks,” CoRR abs/1801.08322, 2018 [33] L. Medsker and L. C. Jain, Recurrent neural networks: design and applications. CRC press, 1999. [34] S. Alam, R. Banerjee, and S. Bandyopadhyay, “Murmur detection using parallel recurrent and convolutional neural networks,” arXiv preprint arXiv:1808.04411, 2018. doi: 10.48550/arXiv.1808.04411. [35] S. Khaled, M. Fakhry, and A. S. Mubarak, “Classification of PCG signals using a nonlinear autoregressive network with exogenous inputs (NARX),” in 2020 International Conference on Innovative Trends in Communication and Computer Engineering (ITCE), Feb. 2020, pp. 98–102, doi: 10.1109/ITCE48509.2020.9047772. [36] S. Khaled, M. Fakhry, H. Esmail, A. Ezzat, and E. Hamad, “Analysis of training optimization algorithms in the NARX neural network for classification of heart sound signals,” International Journal of Scientific and Engineering Research, vol. 13, no. 2, pp. 382–390, 2022. [37] M. Fakhry and A. F. Brery, “A comparison study on training optimization algorithms in the biLSTM neural network for classification of PCG signals,” in 2022 2nd International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET), Mar. 2022, pp. 1–6, doi: 10.1109/IRASET52964.2022.9738309. [38] A. Lerch, An introduction to audio content analysis. Hoboken, NJ, USA: John Wiley and Sons, Inc., 2012, doi: 10.1002/9781118393550. [39] G. Van Houdt, C. Mosquera, and G. Nápoles, “A review on the long short-term memory model,” Artificial Intelligence Review, vol. 53, no. 8, pp. 5929–5955, Dec. 2020, doi: 10.1007/s10462-020-09838-1. [40] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997, doi: 10.1109/78.650093. [41] S. Sun, Z. Cao, H. Zhu, and J. Zhao, “A survey of optimization methods from a machine learning perspective,” IEEE Transactions on Cybernetics, vol. 50, no. 8, pp. 3668–3681, Aug. 2020, doi: 10.1109/TCYB.2019.2950779. [42] C. Liu et al., “An open access database for the evaluation of heart sound algorithms,” Physiological Measurement, vol. 37, no. 12, pp. 2181–2213, Dec. 2016, doi: 10.1088/0967-3334/37/12/2181. [43] J. Maroco, D. Silva, A. Rodrigues, M. Guerreiro, I. Santana, and A. de Mendonça, “Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests,” BMC Research Notes, vol. 4, no. 1, Dec. 2011, doi: 10.1186/1756-0500-4-299.
  • 13.  ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 12, No. 6, December 2022: 6090-6102 6102 BIOGRAPHIES OF AUTHORS Mahmoud Fakhry received the Ph.D. degree of ICT from the University of Trento, Italy, in 2016. He worked for the Center for Information and Communication Technology, Fondazione Bruno Kessler, Trento, Italy from October 2012 to November 2016. From December 2016 to February 2018, he was a researcher fellow at the Audio Analysis Lab, Aalborg University, Denmark. From March 2018, he is an assistant professor at the Electrical Engineering Department, Aswan University, Egypt. His research interests include audio signal analysis and machine learning. He can be contacted by email: m.fakhry@aswu.edu.eg. Abeer FathAllah Brery received the master’s degree in computer science and engineering from Universitat Rovira i Virgili, Tarragona, Spain, in September 2016. From October 2016, she is a research and teaching assistant at Electrical Engineering Department, Aswan University, Aswan, Egypt. Her research interests include software engineering and machine learning. She can be contacted at email: abeer_brery@aswu.edu.eg.