A Non Parametric Estimation Based Underwater Target Classifier

Binesh T, Supriya M.H & P.R.Saseendran Pillai
Signal Processing: An International Journal (SPIJ), Volume (5) : Issue (4) : 2011 156
Binesh T bineshtbt@gmail.com
Department of Electronics,
Cochin University of Science and Technology,
Cochin 682 022, India
Supriya M. H. supriya@cusat.ac.in
P.R. Saseendran Pillai prspillai@cusat.ac.in
Abstract
Underwater noise sources constitute a prominent class of input signals in most underwater signal
processing systems. The identification of noise sources in the ocean is of great importance
because of its numerous practical applications. In this paper, a methodology is presented for the
detection and identification of underwater targets and noise sources based on non-parametric
indicators. The proposed system utilizes cepstral coefficient analysis and the Kruskal-Wallis H
statistic, along with other statistical indicators such as the F-test statistic, for the effective
detection and classification of noise sources in the ocean. Simulation results for typical
underwater noise data and the set of identified underwater targets are also presented.
Keywords: Cepstral Coefficients, Linear Prediction Coefficients, Forward Backward Algorithm,
Kruskal-Wallis H Statistic, F-test Statistic, Median, Sum of Ranks.
1. INTRODUCTION
Underwater acoustic propagation depends on a variety of factors associated with the channel in
addition to the characteristic properties of the generating source. Studies on noise data
waveforms generated by man-made underwater targets and marine species are significant, as
they unveil the general characteristics of the noise-generating mechanisms. The composite
ambient noise, which contains the noise waveforms from the targets and is received by hydrophone
array systems, is processed to extract target-specific features. Although a large number of
techniques have been developed for extracting source-specific features for identification and
classification, none of them is capable of providing the complete set of functional clues. Many of
these techniques are complex, and some lead to ambiguities in the decision-making process.
Since classification of noise sources using certain traditional techniques yields low accuracy
rates, many improved approaches based on non-parametric and parametric modeling have been
reported in the open literature [1]. Some modern approaches for the extraction of spectral
profiles place more emphasis on spectral resolution and increased signal detection capability,
while others rely on the extraction and utilization of suitable features of underwater signal
sources. The proper identification and classification of underwater man-made and biological
noise sources can utilize cepstral feature extraction together with non-parametric statistical
approaches, which do not rely on any assumption that the data are drawn from a given probability
distribution and include non-parametric statistical models, inference and statistical tests. The
Kruskal-Wallis H test is a non-parametric test, and the H statistic can be efficiently employed in
different statistical situations [2].
In the proposed approach, the underwater noise signal is sampled and processed, cepstral
features are extracted, and from these the sample set of transition probability values of the
system model is estimated. The H statistic, F-test statistic, Median value and Sum of Ranks are
estimated for the sample sets formed by the underwater signal, its transition probability values
and a reference signal. For the various underwater signals studied, these estimates were found
to occupy non-overlapping value ranges that can be utilized in the system design for the
identification of underwater signal sources.
2. PRINCIPLES
Cepstral coefficients are widely used as features for a variety of recognition and classification
applications. Under the cepstral transformation, the convolution of two signals x1[n] and x2[n] is
mapped to Xc, the sum of the cepstra x̂1[n] and x̂2[n] of the two signals:
$$ X_c = \hat{x}_1[n] + \hat{x}_2[n] \qquad (1) $$
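The additivity in Eq. (1) can be checked numerically. The sketch below is illustrative only: it uses the real cepstrum (inverse FFT of the log magnitude spectrum) rather than the complex cepstrum, and the helper name real_cepstrum, the signal lengths and the FFT size are assumptions chosen for the example. Zero-padding to at least len(x1) + len(x2) - 1 points makes the DFT of the linear convolution equal to the product of the individual DFTs, so the log magnitudes, and hence the cepstra, add.

```python
import numpy as np

def real_cepstrum(x, n_fft):
    """Real cepstrum: inverse FFT of the log magnitude of the n_fft-point DFT."""
    spectrum = np.fft.fft(x, n_fft)
    return np.fft.ifft(np.log(np.abs(spectrum))).real

rng = np.random.default_rng(0)
x1 = rng.standard_normal(32)
x2 = rng.standard_normal(16)

# n_fft >= len(x1) + len(x2) - 1, so the DFT of the linear convolution
# is exactly the product of the two individual DFTs.
n_fft = 64
y = np.convolve(x1, x2)

c_sum = real_cepstrum(x1, n_fft) + real_cepstrum(x2, n_fft)
c_conv = real_cepstrum(y, n_fft)
print(np.max(np.abs(c_conv - c_sum)))  # floating-point round-off level
```

The complex cepstrum has the same additive property but requires phase unwrapping, which is why the simpler real cepstrum is used for the check.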
Alternatively, P discrete cepstrum coefficients cp, p = 0, …, P-1 [3], define an amplitude envelope
$$ |H(\omega)| = \exp\left( c_0 + 2 \sum_{p=1}^{P-1} c_p \cos(p\omega) \right) $$
The inverse Fourier transform of the log amplitude spectrum gives the cepstral coefficients. The
discrete cepstrum coefficients are computed from a set of spectral points at frequencies ωk with
amplitudes Xk, k = 1, …, P, which can be expressed mathematically as
$$ X(\omega) = \sum_{k=1}^{P} X_k\, \delta(\omega - \omega_k) \qquad (2) $$
where δ(ω) denotes the Dirac delta distribution. The coefficients cp can be calculated by
minimizing the squared difference between │H(ω)│ and │X(ω)│ at the frequencies ωk.
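A minimal sketch of the discrete cepstrum fit follows. It assumes (this is not stated in the text) that the squared difference is measured between log amplitudes, so that log|H(ωk)| = c0 + 2 Σ cp cos(pωk) is linear in the coefficients and an ordinary least-squares solve suffices. The function names and the use of more spectral points than coefficients are illustrative choices.

```python
import numpy as np

def discrete_cepstrum(omegas, amplitudes, P):
    """Fit c_0..c_{P-1} so that log|H(w_k)| ~ log X_k in the least-squares sense.

    Assumption: the error is taken on log amplitudes, where
    log|H(w)| = c_0 + 2*sum_{p=1}^{P-1} c_p cos(p w) is linear in c.
    """
    omegas = np.asarray(omegas, dtype=float)
    p = np.arange(P)
    # Design matrix: column 0 is 1, column p >= 1 is 2*cos(p * w_k).
    A = 2.0 * np.cos(np.outer(omegas, p))
    A[:, 0] = 1.0
    c, *_ = np.linalg.lstsq(A, np.log(amplitudes), rcond=None)
    return c

def envelope(c, omega):
    """Evaluate |H(w)| = exp(c_0 + 2*sum_{p>=1} c_p cos(p w)) at each w."""
    p = np.arange(1, len(c))
    return np.exp(c[0] + 2.0 * np.cos(np.outer(omega, p)) @ c[1:])

# Usage: recover known coefficients from 12 synthetic spectral peaks.
c_true = np.array([0.1, 0.3, -0.2, 0.05])
peaks = np.linspace(0.1, 3.0, 12)
c_est = discrete_cepstrum(peaks, envelope(c_true, peaks), P=4)
```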
Non-parametric analysis provides effective methods for the detection and classification of
underwater targets. Such a strategy may also be incorporated into a hierarchical classification
framework, where a target is first assigned to a class and later, with additional information,
identified as a particular target within that class. Many methods can be used to train a statistical
model for each class, and such models may consist of several states; the system can be trained
on the target data associated with the respective classes. Non-parametric statistical tests can be
considered an alternative for comparing data whose distribution is not Gaussian [4]. The exact
distribution of the H-statistic in the Kruskal-Wallis test is conventionally approximated by a
chi-squared distribution with one degree of freedom fewer than the number of sample groups. In
a state-based model, the sequence of tokens generated by the model may give some information
about the underlying sequence of states. Even though the states possess different attributes, in
many practical applications there is often some physical significance associated with the set of
states and their transition probabilities. The proposed procedure can utilize a codebook to
estimate the required parameters. To build the codebook, a large number of observation vectors
from the training data are clustered into a certain number of observation-vector clusters using
the iterative K-means procedure. Based on these clustered observation vectors, estimates of the
model parameters are generated during system modeling.
2.1 LPC Analysis
Linear Prediction Coefficient (LPC) analysis is used to calculate the cepstral coefficients. LPC is
a powerful modeling technique for signal analysis: it encodes a signal by finding a set of weights
on earlier signal values that predict the next signal value. The linear prediction coefficients can
be transformed into cepstral coefficients, which form a more robust set of parameters.
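In matrix form, the predictor coefficients satisfy
$$ R\,a = r \qquad (3) $$
where r is the autocorrelation vector, a is the LPC vector and R is the Toeplitz autocorrelation
matrix formed from the same autocorrelation sequence. The solution is:
$$ a = R^{-1} r \qquad (4) $$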
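The autocorrelation method of Eqs. (3) and (4) can be sketched as follows. The function names are hypothetical, and the predictor convention x[n] ≈ Σk ak x[n-k] is an assumption. The sketch calls a general linear solver instead of forming R⁻¹ explicitly; practical implementations usually exploit the Toeplitz structure with the Levinson-Durbin recursion.

```python
import numpy as np

def autocorrelation(x, order):
    """Biased autocorrelation r(0..order) of a (windowed) frame."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])

def lpc(x, order):
    """Solve R a = r (Eq. 3) for the predictor x[n] ~ sum_k a_k x[n-k]."""
    r = autocorrelation(x, order)
    # R is the symmetric Toeplitz matrix with entries r(|i - j|), i, j = 0..order-1.
    idx = np.arange(order)
    R = r[np.abs(idx[:, None] - idx[None, :])]
    # Equivalent to a = R^{-1} r (Eq. 4), without an explicit inverse.
    return np.linalg.solve(R, r[1:])
```

For a long realization of a stable second-order autoregressive process, lpc(x, 2) should return values close to the generating coefficients.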
2.2 Cepstral Coefficients and Clustering
The p cepstral coefficients cm, m = 0, 1, …, p-1, are derived from the set of LPC coefficients
using the LPC-to-cepstral coefficient recursion [5].
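The recursion is not written out above. A sketch of its standard form for an all-pole model H(z) = G / (1 - Σk ak z⁻ᵏ), with predictor coefficients ak as in x[n] ≈ Σk ak x[n-k], is given below. Here the zeroth coefficient is set to ln G, the complex-cepstrum value for this model; this normalization is an assumption and may differ from the one used in [5].

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps, gain=1.0):
    """Cepstrum of H(z) = G / (1 - sum_k a_k z^{-k}) via the standard recursion.

    c_0 = ln G
    c_m = a_m + sum_{k=1}^{m-1} (k/m) c_k a_{m-k},   1 <= m <= p
    c_m = sum_{k=m-p}^{m-1} (k/m) c_k a_{m-k},        m > p
    """
    p = len(a)
    a = np.concatenate(([0.0], np.asarray(a, dtype=float)))  # 1-based: a[1..p]
    c = np.zeros(n_ceps)
    c[0] = np.log(gain)
    for m in range(1, n_ceps):
        acc = a[m] if m <= p else 0.0
        # Only terms whose a-index m-k lies in 1..p contribute.
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k]
        c[m] = acc
    return c
```

As a check, a single pole with a1 = 0.5 has the closed-form cepstrum cm = 0.5^m / m for m ≥ 1, which the recursion reproduces.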
K-means is one of the learning algorithms that solve the clustering problem. It clusters n objects,
based on their attributes, into K partitions, where K < n, and attempts to find the centers of the
natural clusters in the data. It assumes that the object attributes form a vector space. The main
idea is to define K centroids, one for each cluster, and the algorithm aims to minimize the total
intra-cluster variance, that is, the squared error function [6]
$$ V = \sum_{i=1}^{K} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2 \qquad (5) $$
where there are K clusters Si, i = 1, 2, …, K, and µi is the centroid, or mean point, of all the
points xj ∈ Si.
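A minimal sketch of K-means (Lloyd's algorithm) minimizing V of Eq. (5) is given below. The random initialization from data vectors, the convergence test and the function name are assumptions for illustration; the error V is recorded after each assignment step, in the spirit of the error log mentioned in Section 3.2.

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Lloyd's K-means minimising V = sum_i sum_{x_j in S_i} ||x_j - mu_i||^2.

    Returns centroids, labels and the error V recorded at each iteration.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise with K distinct data vectors chosen at random.
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()
    errors = []
    for _ in range(max_iter):
        # Assignment step: squared Euclidean distance to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        errors.append(d2[np.arange(len(X)), labels].sum())
        # Update step: move each centroid to the mean of its members.
        new_centroids = centroids.copy()
        for i in range(K):
            members = X[labels == i]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[i] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels, errors
```

Because each step can only lower V, the recorded errors are non-increasing; the result can still be a local minimum, which is why practical codebook design often restarts from several initializations.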
2.3 Forward-Backward Algorithm
The Forward-Backward Algorithm is an algorithm for computing the probability of a particular
observation sequence. Let the forward probability αj(t) for some model M with N states be defined
as
$$ \alpha_j(t) = P(o_1, \ldots, o_t, x(t) = j \mid M) $$
That is, αj(t) is the joint probability of observing the first t vectors and being in state j at time t.
It can be computed recursively as
$$ \alpha_j(t) = \left[ \sum_{i=2}^{N-1} \alpha_i(t-1)\, a_{ij} \right] b_j(o_t) $$
where bj(ot) is the probability of observing ot in state j. This recursion is based on the fact that
the probability of being in state j at time t and having observation ot can be found by adding the
forward probabilities for all possible previous states i, weighted by the transition probabilities aij,
and multiplying by bj(ot). Also,
$$ \alpha_N(T) = \sum_{i=2}^{N-1} \alpha_i(T)\, a_{iN} \qquad (6) $$
where states 1 and N are the non-emitting entry and exit states, and P(O|M) equals αN(T).
The backward probability βj(t) is defined as:
$$ \beta_j(t) = P(o_{t+1}, \ldots, o_T \mid x(t) = j, M) \qquad (7) $$
The forward probability is a joint probability, whereas the backward probability is a conditional
probability, so that αj(t)βj(t) = P(O, x(t)=j | M). Hence the probability of state occupation,
Sj(t) = P(x(t)=j | O, M), equals P(O, x(t)=j | M) / P(O|M). Denoting P(O|M) by Po,
$$ S_j(t) = \frac{1}{P_o}\, \alpha_j(t)\, \beta_j(t) \qquad (8) $$
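The quantities in Eqs. (7) and (8) can be illustrated with a small discrete-output HMM. This sketch swaps in the textbook formulation with an initial state distribution π instead of the non-emitting entry and exit states used in Eq. (6), so P(O|M) is the sum of the final forward probabilities. The names A, B, pi and forward_backward are illustrative, and the unscaled recursions will underflow on long observation sequences.

```python
import numpy as np

def forward_backward(A, B, pi, obs):
    """Unscaled forward-backward pass for a discrete-output HMM.

    A[i, j] = a_ij, B[j, k] = P(symbol k | state j), pi[j] = initial state prob.
    Returns alpha, beta, P(O|M) and the state occupation S[t, j] (Eq. 8).
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        # alpha_j(t) = [sum_i alpha_i(t-1) a_ij] b_j(o_t)
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        # beta_i(t) = sum_j a_ij b_j(o_{t+1}) beta_j(t+1)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    p_o = alpha[-1].sum()
    S = alpha * beta / p_o  # S_j(t) = alpha_j(t) beta_j(t) / P_o
    return alpha, beta, p_o, S
```

At every time t the occupation probabilities Sj(t) sum to one over the states, which is a convenient sanity check.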
2.4 H-Statistic
Statistical indicators measure the significance of the difference between the performance of
different systems and can be used to grade the systems when the performance difference is
significant. The Kruskal-Wallis H-test is a non-parametric test of hypothesis [7] whose test statistic
can be effectively utilized in underwater signal classification. The H-statistic is given by:
$$ H = \frac{12}{N(N+1)} \sum_{j=1}^{G} \frac{R_j^2}{N_j} - 3(N+1) \qquad (9) $$
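where G is the number of samples (groups), Nj, j = 1, …, G, is the size of sample j, and Rj,
j = 1, …, G, is the sum of the ranks of the observations in sample j, the ranks being assigned over
the pooled data. The quantity Σj Rj²/Nj over the different sample sets is termed C; it forms an
intermediate parameter in the estimation of H. The total number of observations is
$$ N = \sum_{j=1}^{G} N_j \qquad (10) $$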
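Eqs. (9) and (10) can be sketched in a few lines. The helper names are hypothetical; ranks are assigned over the pooled data with tied values given the average of their ranks, and no tie correction is applied here. The function also returns the intermediate parameter C.

```python
import numpy as np

def average_ranks(x):
    """Ranks 1..n of x, with tied values given the average of their ranks."""
    x = np.asarray(x, dtype=float)
    ordinal = np.empty(len(x))
    ordinal[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    _, inverse = np.unique(x, return_inverse=True)
    inverse = inverse.ravel()
    # Mean of the ordinal ranks occupied by each distinct value.
    mean_rank = np.bincount(inverse, weights=ordinal) / np.bincount(inverse)
    return mean_rank[inverse]

def kruskal_wallis_h(*samples):
    """Uncorrected H of Eq. (9); also returns C = sum_j R_j^2 / N_j."""
    sizes = np.array([len(s) for s in samples])
    N = sizes.sum()                                           # Eq. (10)
    ranks = average_ranks(np.concatenate(samples))            # pooled ranking
    R = np.array([r.sum() for r in np.split(ranks, np.cumsum(sizes)[:-1])])
    C = np.sum(R ** 2 / sizes)
    H = 12.0 / (N * (N + 1)) * C - 3.0 * (N + 1)
    return H, C
```

For the three groups [1, 2, 3], [4, 5, 6] and [7, 8, 9], the rank sums are 6, 15 and 24, giving C = 279 and H = 7.2.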
2.5 F-Statistic
An F-test is a statistical test usually applied when comparing statistical models, and is used to
assess whether the expected values of a quantitative variable within several predefined groups
differ from one another. The test statistic in an F-test is the ratio of two scaled sums of squares
reflecting different sources of variability, each of which follows a chi-squared distribution under
the null hypothesis. The F-test statistic is given as the ratio of ‘Between-Group variability’ (BG) to
‘Within-Group variability’ (WG). The two terms can be defined mathematically as follows:
$$ BG = \frac{\sum_i n_i \left( y_{iav} - Y_{av} \right)^2}{N_g - 1} \qquad (11) $$
where yiav denotes the sample mean of the ith sample group, ni is the number of observations in
the ith group and Yav denotes the overall mean of the data. Also
$$ WG = \frac{\sum_{ij} \left( Y_{ij} - y_{iav} \right)^2}{N_0 - N_g} \qquad (12) $$
where Yij is the jth observation in the ith of the Ng groups and N0 is the overall sample size.
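A direct transcription of Eqs. (11) and (12) follows, using squared deviations in the between-group term as in the standard one-way analysis of variance. The function name is hypothetical.

```python
import numpy as np

def f_statistic(*groups):
    """One-way ANOVA F = BG / WG following Eqs. (11)-(12)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    Ng = len(groups)
    N0 = sum(len(g) for g in groups)
    Y_av = np.concatenate(groups).mean()
    # Between-group variability: sum_i n_i (y_iav - Y_av)^2 / (Ng - 1)
    BG = sum(len(g) * (g.mean() - Y_av) ** 2 for g in groups) / (Ng - 1)
    # Within-group variability: sum_ij (Y_ij - y_iav)^2 / (N0 - Ng)
    WG = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N0 - Ng)
    return BG / WG
```

For the groups [1, 2, 3], [4, 5, 6] and [7, 8, 9], BG = 27 and WG = 1, so F = 27.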
2.6 Median (M) and Sum of Ranks (R)
The statistical estimate Median (M) is an important characteristic of signals from any underwater
source. It is a robust measure of the central location of the sampled signal distribution; compared
with the mean, it also indicates skewness, and it reflects the amplitude variations in the sample set
of the particular signal. The Median of the signal can be estimated as the amplitude value in the
sample set about which there are equal numbers of positive and negative amplitude deviations.
The M parameter, along with the H and F values, helps in the classification of a particular signal.
The other statistical estimate used alongside the H, F and M parameters in the proposed system
is the Sum of Ranks (R). It gives a measure of the relative gradation of the amplitude variations of
the signal, taking into consideration the sample location indices in the sample set of the
underwater signal. The R parameter can be estimated for a sample set by reordering the samples
in increasing order of amplitude and replacing the original samples with their respective ranks in
the distribution. The smallest sample is assigned a rank of one, and equal-valued samples are
assigned the average of the ranks they occupy. The sum of all the individual sample ranks gives
the parameter R, which forms an important property when utilized along with the other
parameters of the system. For underwater signals with closely related H and F parameters, the R
parameter, in association with the M parameter, can be helpful for identification.
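A sketch of the M and R estimates follows. Ranking a single set on its own always gives R = n(n+1)/2, so this sketch assumes, as Section 4.1 suggests, that R is the rank sum of the signal's samples when they are ranked together with the other sample sets. The function names are hypothetical, and the pairwise-comparison ranking uses O(n²) memory, so it suits only small illustrations.

```python
import numpy as np

def pooled_ranks(x):
    """Average ranks: rank = (#smaller values) + (#equal values + 1) / 2."""
    x = np.asarray(x, dtype=float)
    smaller = (x[:, None] > x[None, :]).sum(axis=1)
    equal = (x[:, None] == x[None, :]).sum(axis=1)  # includes the value itself
    return smaller + (equal + 1) / 2.0

def median_and_rank_sum(signal, *other_sets):
    """M: median of `signal`. R: sum of the ranks of `signal`'s samples when
    ranked together with `other_sets` (ties receive averaged ranks)."""
    signal = np.asarray(signal, dtype=float)
    M = np.median(signal)
    pooled = np.concatenate([signal] + [np.asarray(s, dtype=float) for s in other_sets])
    R = pooled_ranks(pooled)[: len(signal)].sum()
    return M, R
```

With ties, for example the sets [1, 2] and [2, 3], the two values of 2 share rank 2.5, so the first set has M = 1.5 and R = 3.5.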
3. METHODOLOGY
The methodology consists of several stages, and the steps involved in the extraction of feature
vectors are described below.
3.1 Cepstral Coefficient Extraction
3.1.1 Sampling and Frame Conversion
The noise data waveforms emanating from the underwater target of interest are sampled and
recorded as wave-file data, which is then divided into frames of Ns samples, with adjacent frames
separated by md samples [5]. Denoting the sampled signal by s[n] and the lth of the L frames by
xl[n],
$$ x_l[n] = s[l\, m_d + n] \qquad (13) $$
where n = 0, 1, …, Ns-1 and l = 0, 1, …, L-1.
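Eq. (13) amounts to gathering overlapping slices of s[n]. The sketch below assumes that an incomplete trailing frame is discarded, so L = 1 + ⌊(len(s) - Ns) / md⌋; the function name is hypothetical.

```python
import numpy as np

def make_frames(s, Ns, md):
    """Split s[n] into L frames x_l[n] = s[l*md + n], n = 0..Ns-1 (Eq. 13)."""
    s = np.asarray(s, dtype=float)
    if len(s) < Ns:
        raise ValueError("signal shorter than one frame")
    L = 1 + (len(s) - Ns) // md  # number of complete frames
    idx = np.arange(L)[:, None] * md + np.arange(Ns)[None, :]
    return s[idx]                # shape (L, Ns), one frame per row
```

With the values used in Section 4 (Ns = 400, md = 19), a 1000-sample signal yields 1 + ⌊600/19⌋ = 32 frames.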

3.1.2 Windowing
Each individual frame is windowed to minimize the signal discontinuities at the frame boundaries.
If the window is defined as w[n], the windowed signal xw[n] is
$$ x_w[n] = x_l[n]\, w[n] \qquad (14) $$
where 0 ≤ n ≤ Ns-1. The Hamming window is the window typically used with the autocorrelation
method of LPC.
A frame-based analysis of the noise data waveform has been performed to generate the sample
vector, which can be used to estimate the statistics needed for target classification. Each frame of
Ns samples is multiplied by an Ns-sample Hamming window, and LP analysis is performed [8].
The Linear Prediction Coefficients are then converted to the required number of cepstral
coefficients, which are weighted by a raised sine window.
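The windowing of Eq. (14) and the cepstral weighting can be sketched as below. The raised sine lifter wm = 1 + (Q/2) sin(πm/Q), m = 1, …, Q, is a commonly used form; that the paper uses exactly this weighting, and that the input columns hold c1..cQ (with c0 excluded), are assumptions. Function names are hypothetical.

```python
import numpy as np

def window_frames(frames):
    """Multiply each frame (row) by an Ns-point Hamming window, Eq. (14)."""
    frames = np.asarray(frames, dtype=float)
    return frames * np.hamming(frames.shape[1])

def raised_sine_lifter(ceps):
    """Weight cepstral coefficients c_1..c_Q (last axis) by
    w_m = 1 + (Q/2) sin(pi m / Q)."""
    ceps = np.asarray(ceps, dtype=float)
    Q = ceps.shape[-1]
    m = np.arange(1, Q + 1)
    return ceps * (1.0 + (Q / 2.0) * np.sin(np.pi * m / Q))
```

The lifter de-emphasizes the low-order and the highest-order coefficients relative to the middle ones, which reduces the sensitivity of the features to overall spectral tilt and to noise in the high-order terms.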
3.2 Vector Quantization
The next step in the system is a clustering process that generates a code book, which in turn is
utilized in the estimation of the transition probability vector. The K-means algorithm has been
used to fix the centroids of the cluster model, with the extracted cepstral coefficients of the
underwater signal source serving as the data for this vector quantization process. The data to be
clustered is arranged as a matrix in which each row corresponds to one cepstral vector, and the
algorithm partitions these rows into K clusters. The cluster centroids are returned together with
the cluster identity assigned to each vector. The sum-of-squared-error function is used in the
algorithm, the error value after each iteration can be logged, and the maximum number of
iterations can also be specified.
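Once the code book is available, each cepstral vector can be replaced by the index of its nearest codeword, giving the discrete observation sequence that the state model is trained on. The sketch below assumes squared Euclidean distance and a code book stored one codeword per row; the function name is hypothetical.

```python
import numpy as np

def quantize(vectors, codebook):
    """Map each cepstral vector (row) to the index of its nearest codeword."""
    vectors = np.asarray(vectors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every vector to every codeword.
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```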
3.3 Transition Probability Vector Generation
A vector of transition probabilities can be generated from the vector-quantized output for the
estimation of the decision statistics. The algorithm for generating the transition probability vector
is as follows:
START:
  Segment the sampled signal into frames.
  Window each frame using a Hamming window.
  Compute the Linear Prediction Coefficients of each frame.
  Convert the LPCs to cepstral coefficients.
  Perform vector quantization and generate the code book.
  Set Nit = maximum number of iterations; count = 1.
LABEL 1:
  While (count <= Nit)
  {
    Compute the forward probabilities αj(t) for all states j and times t.
    Compute the backward probabilities βj(t).
    If (P(O|M) <= value from the previous iteration)
    {
      go to LABEL 2
    }
    Compute the state occupation probabilities Sj(t) and re-estimate the transition probabilities.
    count = count + 1.
  }
LABEL 2:
  Generate a single column vector by concatenating the individual columns of the estimated
  transition probability matrix.
END
3.4 Decision Statistics Estimation
The H and F statistics are estimated, as illustrated in Fig 1, from three sample sets: the previously
generated transition probability vector, a down-sampled version of the original underwater signal
and a predefined reference sample vector. A correction for ties can be made by dividing the
H-statistic value by a Correction Factor (CF) defined as follows:
$$ CF = 1 - \frac{\sum_{i=1}^{g} \left( t_i^3 - t_i \right)}{N^3 - N} \qquad (15) $$
where g is the number of groups of tied ranks, ti is the number of values tied within group i, and
N is the total number of observations. This correction usually changes the value of the test
statistic only negligibly unless there are a large number of ties. Additional statistical parameters
such as the Median and the Sum of Ranks can also be estimated for the underwater signal being
processed.
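Eq. (15) depends only on the sizes of the groups of tied values in the pooled data, so it can be sketched independently of the H computation. The function name is hypothetical; untied values form groups of size one and contribute nothing to the sum.

```python
import numpy as np

def tie_correction(pooled):
    """CF = 1 - sum_i (t_i^3 - t_i) / (N^3 - N), Eq. (15)."""
    pooled = np.asarray(pooled, dtype=float)
    N = len(pooled)
    _, t = np.unique(pooled, return_counts=True)  # size of each tie group
    t = t.astype(float)
    return 1.0 - np.sum(t ** 3 - t) / (N ** 3 - N)

# Usage: the corrected statistic is H / tie_correction(pooled_data).
```

For the pooled data [1, 2, 2, 3], the single pair of ties gives CF = 1 - 6/60 = 0.9; with no ties, CF = 1 and H is unchanged.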
FIGURE 1: Estimation of Decision Statistics
4. IMPLEMENTATION
The sampled underwater noise signal is divided into frames of 400 samples (Ns), with consecutive
frames spaced 19 samples apart. Each frame is multiplied by an Ns-sample Hamming window;
because of its low side-lobe levels, the Hamming window is a good choice for comparatively
accurate signal processing systems. Each windowed frame is autocorrelated to give a set of
coefficients, linear prediction analysis is performed on the autocorrelation vector to estimate the
LP coefficients, and the LP coefficients are converted to cepstral coefficients using the recursion
method. These are then weighted by a raised sine window function. The K-means algorithm is
then applied with K = 16 clusters: K vectors are selected at random as the initial centroids, each
vector is associated with its nearest centroid, and the centroids are readjusted based on the new
assignment, the steps being repeated so as to minimize the squared error function of Eq. (5).
Thus vector quantization is carried out and unique clusters are defined for the particular
underwater noise waveform.
4.1 Sample Sets Under Consideration
Using the Forward-Backward re-estimation algorithm, the transition probabilities for the twenty
states of the system model are estimated, leading to the generation of the transition probability
vector, which is considered as the first sample set. A vector of down-sampled values of the
underwater noise source, with a down-sampling factor of 0.5, forms the second sample set, while
a reference sample set of 1000 samples, with sample values of 0.5 for the first 500 samples and
0.25 for the next 500 samples as depicted in Fig 2, forms the third sample set.
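The re-estimation loop of Section 3.3 can be sketched as Baum-Welch updates of the transition matrix alone. Several assumptions are made here: the output probabilities B and initial distribution pi are held fixed, a scaled forward-backward pass replaces the unscaled one to avoid underflow, the "no improvement" test uses a small tolerance on the log-likelihood, and the small model sizes in any usage are illustrative rather than the twenty-state model of the paper. All names are hypothetical.

```python
import numpy as np

def scaled_forward_backward(A, B, pi, obs):
    """Forward-backward with per-step scaling; returns scaled alpha, beta,
    the scale factors c_t and log P(O|M) = sum_t log c_t."""
    T, N = len(obs), len(pi)
    alpha, beta, scale = np.zeros((T, N)), np.zeros((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / scale[t + 1]
    return alpha, beta, scale, np.log(scale).sum()

def reestimate_transitions(A, B, pi, obs, max_iter=50, tol=1e-6):
    """Baum-Welch re-estimation of A only (B and pi held fixed).

    Stops when log P(O|M) fails to improve by more than tol, mirroring the
    'go to LABEL 2' test. Assumes every state has non-zero occupation.
    """
    A = np.array(A, dtype=float)
    prev_ll, history = -np.inf, []
    for _ in range(max_iter):
        alpha, beta, scale, ll = scaled_forward_backward(A, B, pi, obs)
        if ll <= prev_ll + tol:
            break
        history.append(ll)
        prev_ll = ll
        # Expected transition counts: alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / c_{t+1}
        xi = np.zeros_like(A)
        for t in range(len(obs) - 1):
            xi += np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) * A / scale[t + 1]
        # Expected occupation counts: sum over t < T-1 of S_i(t) = alpha_t(i) beta_t(i)
        gamma = (alpha * beta)[:-1].sum(axis=0)
        A = xi / gamma[:, None]
    return A, history

def transition_vector(A):
    """Concatenate the columns of A into one vector (LABEL 2)."""
    return A.flatten(order="F")
```

Each row of the re-estimated matrix sums to one, because the expected transition counts out of a state add up to its expected occupation count.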
The Kruskal-Wallis H-statistic is estimated with the correction factor to obtain the chi-squared
statistic approximation, and the F-statistic approximation is also estimated for the system. The
Median (M) of the underwater signal and the Sum of Ranks (R), computed over the three vectors
of the same underwater signal, are also evaluated. The estimated values of the four parameters
differ in their statistical properties across underwater noise sources, and these differences can be
utilized in the effective identification and classification of the unknown underwater signal source
under consideration.
5. RESULTS AND DISCUSSIONS
The system has been validated using simulation studies, and the estimated H-statistic and
F-statistic approximations, median values (M) and sums of ranks (R) of different underwater signal
sources are tabulated in Table 1.
TABLE 1: Underwater signal sources and their estimated values of the H-statistic approximation (H),
F-statistic approximation (F), Median (M) and Sum of Ranks (R).
Source            H      F      M           R
Shors             2090   3465   -0.0025     833927
Toadfish          1798   2322   0.001975    1002781
Beluga            2044   3242   -0.00158    908441
Bagre             2420   5706   0.03316     1445904
Outboard          1951   2791   0.00355     971748
Damsel            2115   3616   0.0012571   827679
Sculpin           1172   933    0.21805     1414338
Atlantic croaker  1987   3023   -0.0004     862176
Spiny             2450   6076   -0.005633   631137
BlueGrunt         2097   3570   0.0003167   860600
Dolphin           2146   3455   -0.00108    863228
01m               1172   940    0.0772      1313128
Barjack           2021   3050   0.00228     892434
Bow1              2168   3939   -0.0049167  782094
Boat              1494   1451   0.0024      1136117
Chord             2160   3783   0.000625    778549
3Blade            1837   2372   -0.004733   988073
Torpedo           2563   9757   -0.007817   540386
Rockhind          2075   3394   0.0013125   864103
Snap1             2117   3632   -0.000483   823856
Scad              1990   2893   0.0006667   869278
Finwhale          2134   3875   -0.000453   793392
Seal1             2051   3187   0.0241      1040226
Garib             1896   2635   -0.049514   969721
Grunt             1955   3259   0.00235     888618
Ocean Wave        2054   3558   -0.006425   844440
Minke             2130   3476   0.0001      823722
Hump              2156   3838   -0.010267   786830
Seatrout          2051   3251   0.01018     934365
Silverperch       2064   3193   0.0031      855612
Cavitate          1877   2559   -0.007275   1004192
Sklaxon           2141   3744   -0.00995    807558
Submarine         1644   1843   -0.040775   1012841
Badgear           2060   3453   -0.000217   852301
Seacat            1731   2580   -0.003825   985634
Searobin          1844   2394   -0.002425   962476

The reference sample set of the type depicted in Fig 2, having a statistical variance of 0.0156,
has been considered in the proposed technique. The Coefficient of Variation (CV), defined as the
ratio of the standard deviation to the modulus of the mean, is seen to be 0.124 for this reference
sample set.
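The statistics of the reference set can be reproduced from its description in Section 4.1. The sketch below assumes the population (ddof = 0) variance. For a set made exactly of 500 samples of 0.5 followed by 500 samples of 0.25, the variance is 0.125² = 0.015625, matching the stated 0.0156, while the CV definition above gives 0.125 / 0.375 = 1/3; the reported CV of 0.124 therefore presumably reflects a normalization or reference set not specified in the text.

```python
import numpy as np

# Reference sample set as described in Section 4.1:
# 500 samples of 0.5 followed by 500 samples of 0.25.
ref = np.concatenate([np.full(500, 0.5), np.full(500, 0.25)])

variance = ref.var()              # population variance (ddof=0)
cv = ref.std() / abs(ref.mean())  # standard deviation over |mean|
print(variance, cv)               # variance 0.015625, CV = 1/3
```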
FIGURE 2: Plot of Reference Sample Set values used in the system (magnitude versus samples).
The (H, F, M, R) components form the recognition parameter for a given underwater signal
source. The plots of the log-likelihood in transition probability estimation for the underwater
noises of Toad Fish and Submarine are depicted in Fig 3 (a) and (b). The unknown underwater
signal is processed, and the extracted H, F, M, R components are assigned to known underwater
signal categories by judiciously matching the component parameters. The signals listed in Table 1
have been tested with the system utilizing the (H, F, M, R) components, and correct recognition
has been obtained for all of them except the Searobin and 3Blade underwater signals. The system
uses a tolerance specification of ±1% for the parameters used in this technique.
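The matching step can be illustrated with a few entries copied from Table 1. This sketch assumes that the ±1% tolerance is applied relative to each stored reference value and that all four components must agree; the dictionary and function names are hypothetical, and the paper does not state how ties between several matching sources are resolved. The Sculpin and 01m entries have equal H values and F values within 1% of each other, so only their M (and R) components separate them, which illustrates why M and R are useful when H and F are closely related.

```python
# Four Table 1 entries (H, F, M, R), copied from the table.
REFERENCE = {
    "Sculpin": (1172, 933, 0.21805, 1414338),
    "01m": (1172, 940, 0.0772, 1313128),
    "Torpedo": (2563, 9757, -0.007817, 540386),
    "Toadfish": (1798, 2322, 0.001975, 1002781),
}

def match(query, reference=REFERENCE, tol=0.01):
    """Return the sources whose four components all lie within
    +/- tol (relative to the stored value) of the query components."""
    hits = []
    for name, ref in reference.items():
        if all(abs(q - r) <= tol * abs(r) for q, r in zip(query, ref)):
            hits.append(name)
    return hits

print(match((1175, 945, 0.0775, 1315000)))  # ['01m']
```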
FIGURE 3: Plots of log-likelihood versus number of estimation cycles in transition probability
estimation for (a) Toad Fish (b) Submarine.
The proposed system is optimized for the classification of underwater noise sources in the ocean.
Non-parametric estimators and the featured statistical indicators offer increased robustness,
which is essential for efficient classification. State transition probability estimation has been
utilized in the design of Hidden Markov Model based speech recognition systems [1][9]. In this
underwater target classifying system, the transition probabilities form a significant sample set in
the estimation of the recognition parameters of a particular signal. The simulated results using
the four components show a high recognition capability of the system for underwater signals.
The increased computational complexity of the system is offset by the
improved classification efficiency, while upholding the inherent advantages of non-parametric
classifiers.
6. CONCLUSIONS
The proposed system makes use of statistical indicators along with non-parametric estimations
like the cepstral coefficients for the identification and classification of underwater targets utilizing
the target emanations. Using simulation studies, the H-statistic as well as F-statistic
approximations along with the Median and Sum of Ranks parameters for different underwater
signal sources have been estimated and are utilized for the identification of the unknown noise
sources in the ocean. The system can also be augmented with other features and can be
effectively used for the identification and classification of noise sources in the ocean, with
improved success rates.
7. ACKNOWLEDGEMENTS
The authors gratefully acknowledge the Department of Electronics, Cochin University of Science
and Technology, Cochin, India, for providing the necessary facilities for carrying out this work.
8. REFERENCES
[1] L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech
Recognition", Proceedings of the IEEE, 77(2): 257-286, 1989.
[2] W. H. Kruskal and W. A. Wallis, "Use of Ranks in One-Criterion Variance Analysis", Journal of
the American Statistical Association, 47: 583-621, Dec. 1952.
[3] D. Schwarz and X. Rodet, "Spectral Envelope Estimation and Representation for Sound
Analysis-Synthesis", In Proceedings of the International Computer Music Conference (ICMC 99),
Beijing, 1999.
[4] D. K. de Vries and Y. Chandon, "On the False-Positive Rate of Statistical Equipment
Comparisons Based on the Kruskal-Wallis H Statistic", IEEE Transactions on Semiconductor
Manufacturing, 20(3), 2007.
[5] L. R. Rabiner and B. H. Juang, "Fundamentals of Speech Recognition", NJ: PTR Prentice Hall,
pp. 112-117, 1993.
[6] D. Li, M. R. Azimi-Sadjadi and M. Robinson, "Comparison of Different Classification Algorithms
for Underwater Target Discrimination", IEEE Transactions on Neural Networks, 15(1), 2004.
[7] M. Hollander and D. A. Wolfe, "Nonparametric Statistical Methods", New York: Wiley, 1973.
[8] J. R. Deller, J. G. Proakis and J. H. L. Hansen, "Discrete-Time Processing of Speech Signals",
IEEE Press, p. 71, 2000.
[9] L. R. Rabiner and B. H. Juang, "An Introduction to Hidden Markov Models", IEEE ASSP
Magazine, 3(1): 4-16, 1986.
