Dynamic mixing kernels in Gaussian Mixture Classifier for
Hyperspectral Classification
Vikram Jayaram & Bryan Usevitch
Dept. of Electrical & Computer Engineering

The University of Texas at El Paso
500 W. University Ave., El Paso, TX 79968-0523
ABSTRACT
In this paper, new Gaussian mixture classifiers are designed to deal with the case of an unknown number of mixing
kernels. Not knowing the true number of mixing components is a major learning problem for a mixture classifier
using expectation-maximization (EM). To overcome this problem, the training algorithm uses a combination of
covariance constraints, dynamic pruning, splitting and merging of mixture kernels of the Gaussian mixture to
correctly automate the learning process. This structural learning of Gaussian mixtures is employed to model
and classify Hyperspectral imagery (HSI) data. The results from the HSI experiments suggest that this new
methodology is a potential alternative to traditional mixture-based modeling and classification using general
EM.
Keywords: Hyperspectral imagery (HSI), Gaussian mixture model (GMM), EM, Classification, Kurtosis, PCA.

1. INTRODUCTION
The complexity involved in collecting, storing, analyzing and processing voluminous, multi-dimensional
remote sensing data for an array of “Earth System Science” applications is a well known problem to the remote
sensing community. The recent HSI technology has evolved from its earlier form, multispectral imaging
(MSI).1, 2 In HSI, images are acquired using hundreds of spectral channels, compared to the fewer channels used
in MSI. However, over the years there has not been much significant development of processing algorithms for
these ever growing (in the spectral direction) electro-optical (EO) data sets. The need to come up with reduced
dimensionality processing algorithms3 is even stronger due to the increased spectral dimensionality of the remote
sensing data. In most remote sensing EO imagery, each spatial pixel is treated as a column vector that contains
the spectral information from each channel. On several occasions, mixture-model-based approaches have been justified
for processing voluminous data. In this paper, we show an instance of training a Gaussian mixture classifier
using dynamic kernel carpentry to model voluminous data such as HSI.
The Gaussian mixture model (GMM) is a standard modeling technique for estimating unknown probability density
functions (PDFs). Even though the merit of the GMM lies in closely approximating most naturally occurring random
processes,4, 5 there is a learning difficulty when using EM to estimate its model parameters. In order to
ensure proper learning by the EM, dynamic allocation of Gaussian kernels is used to fit the HSI data. This
model estimates an unknown PDF based on the assumption that the unknown density can be expressed as a
weighted finite sum of Gaussian kernels. These Gaussian kernels have different mixing weights and parameters
(means and covariance matrices). Updating the mixture parameters is carried out by the EM algorithm while
also monitoring the total kurtosis, which serves as the criterion for kernel splitting (an increase in the number of
kernels). Therefore, this technique ensures not only likelihood maximization but also kurtosis minimization.
The kernel splitting comes to a halt when no further improvement in the minimization of kurtosis seems
possible. Similarly, the other steps of this training methodology - pruning (destroying the weak kernels), merging
of kernels, and determining whether the algorithm has converged - are carried out in a step-by-step fashion.
Further author information: (Send correspondence to Vikram Jayaram)
V. Jayaram: E-mail: jayaram@ece.utep.edu, Telephone: 1 915 747 5869

The results of this training are reported by means of receiver operating characteristic (ROC) curves. This structural learning6
based training technique uses relatively few kernels to estimate the model parameters of the GMM and has a fast
convergence property.

2. GAUSSIAN MIXTURE MODELS AND EM ALGORITHM
Multidimensional data such as HSI can be modeled by a multidimensional Gaussian mixture (GM). The
GM PDF for z ∈ R^P is given by
p(z) = \sum_{i=1}^{L} \alpha_i \, N(z; \mu_i, \Sigma_i),

where

N(z; \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{P/2} |\Sigma_i|^{1/2}} \exp\left\{ -\tfrac{1}{2} (z - \mu_i)^\top \Sigma_i^{-1} (z - \mu_i) \right\}.

Here L is the number of mixture components and P the number of spectral channels (bands). The GM parameters
are denoted by λ = {α_i, µ_i, Σ_i}, where α_i, µ_i, Σ_i are the mixing weight, mean and covariance of the individual
components of the mixture model. The parameters of the GM are estimated using maximum likelihood (ML)
by means of the EM algorithm.7
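To make the mixture density concrete, the following is a minimal numerical sketch (not taken from the paper; the function names gaussian_pdf and gmm_pdf are illustrative) of evaluating p(z) for a batch of P-dimensional pixel vectors:

```python
# Minimal sketch: evaluate the GM PDF p(z) = sum_i alpha_i N(z; mu_i, Sigma_i)
# for every row of Z (a T x P array of pixel vectors).
import numpy as np


def gaussian_pdf(Z, mu, Sigma):
    """N(z; mu, Sigma) evaluated for each row z of Z."""
    P = mu.size
    diff = Z - mu                                            # (T, P)
    mahal = np.einsum('tp,pq,tq->t', diff, np.linalg.inv(Sigma), diff)
    norm = (2.0 * np.pi) ** (P / 2.0) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * mahal) / norm


def gmm_pdf(Z, alphas, mus, Sigmas):
    """Weighted finite sum of Gaussian kernels, i.e. the mixture density p(z)."""
    return sum(a * gaussian_pdf(Z, m, S) for a, m, S in zip(alphas, mus, Sigmas))
```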
Given sample vectors Z = {z_1, z_2, · · ·, z_T} and parameters λ, the likelihood of the GMM is given by

p(Z|\lambda) = \prod_{t=1}^{T} p(z_t|\lambda).   (1)

ML estimation then finds a new parameter set \hat{\lambda} such that p(Z|\hat{\lambda}) ≥ p(Z|\lambda). Because of the nonlinear
behavior of the likelihood in λ given in (1), a straightforward maximization of the function is not viable. The
maximization instead takes place on an iterative basis using EM.7 In the EM algorithm, we use the auxiliary function Q
given by
Q(\lambda, \hat{\lambda}) = \sum_{t=1}^{T} \sum_{i=1}^{L} p(i|z_t, \lambda) \, \log\big[\hat{\alpha}_i \, N(z_t; \hat{\mu}_i, \hat{\Sigma}_i)\big],   (2)

where p(i|z_t, λ) is the a posteriori probability of mixture component i of the image class, with i = 1, · · ·, L,
and satisfies
p(i|z_t, \lambda) = \frac{\alpha_i \, N(z_t; \mu_i, \Sigma_i)}{\sum_{k=1}^{L} \alpha_k \, N(z_t; \mu_k, \Sigma_k)}.   (3)

The EM algorithm is such that if Q(\lambda, \hat{\lambda}) ≥ Q(\lambda, \lambda), then p(Z|\hat{\lambda}) ≥ p(Z|\lambda).8 Setting the derivatives of the
Q function with respect to \hat{\lambda} to zero gives the re-estimation formulas8
\hat{\alpha}_i = \frac{1}{T} \sum_{t=1}^{T} p(i|z_t, \lambda),   (4)

\hat{\mu}_i = \frac{\sum_{t=1}^{T} p(i|z_t, \lambda)\, z_t}{\sum_{t=1}^{T} p(i|z_t, \lambda)},   (5)

\hat{\Sigma}_i = \frac{\sum_{t=1}^{T} p(i|z_t, \lambda)\,(z_t - \hat{\mu}_i)(z_t - \hat{\mu}_i)^\top}{\sum_{t=1}^{T} p(i|z_t, \lambda)}.   (6)

The algorithm for training the GMM is summarized as follows:
• Generate the a posteriori probabilities p(i|z_t, λ) based on the proposed method (explained further), satisfying (4).
• Compute the mixing weight, mean vector and covariance matrix by means of (4), (5) & (6).
• Update the a posteriori probabilities p(i|z_t, λ) according to (3), followed by computation of the Q
function using (2).
• Stop if the increase in the value of the Q function at the current iteration relative to its value at
the previous iteration is less than a chosen threshold; otherwise go to item 2. (A compact sketch of these updates is given below.)
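As a hedged illustration of the loop above (a sketch under assumed variable and function names, not the authors' implementation), one EM iteration implementing (3)-(6) might look as follows:

```python
# Hedged sketch of one EM iteration implementing the re-estimation formulas (3)-(6).
import numpy as np
from scipy.stats import multivariate_normal


def em_step(Z, alphas, mus, Sigmas):
    """One EM update of an L-component GMM on data Z of shape (T, P)."""
    T, _ = Z.shape
    L = len(alphas)
    # E-step: a posteriori probabilities p(i|z_t, lambda), eq. (3)
    resp = np.column_stack([alphas[i] * multivariate_normal.pdf(Z, mean=mus[i], cov=Sigmas[i])
                            for i in range(L)])
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimation formulas (4), (5), (6)
    Nk = resp.sum(axis=0)                          # sum_t p(i|z_t, lambda)
    new_alphas = Nk / T                            # (4)
    new_mus = (resp.T @ Z) / Nk[:, None]           # (5)
    new_Sigmas = []
    for i in range(L):
        diff = Z - new_mus[i]
        new_Sigmas.append((resp[:, i, None] * diff).T @ diff / Nk[i])   # (6)
    return new_alphas, new_mus, new_Sigmas
```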

3. TRAINING THE MIXTURE MODEL
In spite of the robust design of the GMM, there are challenges in trying to train a GM with a local algorithm like EM.
First of all, the true number of mixture components is usually unknown. Not knowing the true number of mixing
components is a major learning problem for a mixture classifier using EM.5, 9
The solution to this problem is a dynamic algorithm for Gaussian mixture density estimation that can
dynamically add or remove kernel components to adequately model the input data. This methodology also increases
the chances of escaping the many local maxima of the likelihood function.10 In a method
proposed by N. Vlassis and A. Likas11 called the greedy EM algorithm, GM training begins with a single-
component mixture. Components or modes are then added in a sequential manner until the likelihood stops
increasing or the incrementally computed mixture is almost as good as any mixture of that form. This incremental
mixture density function uses a combination of global11 and local search11 techniques each time a new kernel
component is added to the mixture.
In case the number of mixture components becomes too high, components are pruned out depending upon the value of
the mixing weight α_i. This procedure ensures removal of weak modes from the mixture. A weak mode is identified
by checking α_i against a certain threshold. Once identified, the weak modes are obliterated. A further
re-normalization of the α_i takes place, such that \sum_i α_i = 1.
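A minimal sketch of this pruning step is given below; the threshold value and the function name are assumptions rather than values from the paper:

```python
# Sketch of the pruning step: drop modes whose mixing weight is below a threshold
# and re-normalize the remaining weights so they sum to one.
import numpy as np


def prune_weak_modes(alphas, mus, Sigmas, prune_tol=1e-3):
    alphas = np.asarray(alphas, dtype=float)
    keep = alphas > prune_tol                      # weak modes are obliterated
    alphas = alphas[keep] / alphas[keep].sum()     # re-normalize: sum_i alpha_i = 1
    mus = [m for m, k in zip(mus, keep) if k]
    Sigmas = [S for S, k in zip(Sigmas, keep) if k]
    return alphas, mus, Sigmas
```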
Merging of kernel components is another process in this training scheme, wherein a single mode is created
from two nearly identical ones. The similarity between mixture modes is measured by a metric d. For example,
consider two PDFs p_1(x) and p_2(x). Let there be a collection of points near the central peak of p_1(x), represented
by x_i ∈ X_1, and another set of points near the central peak of p_2(x), denoted by x_i ∈ X_2. The closeness
metric d is then given by

d = \log \frac{\left(\sum_{x_i \in X_1} p_2(x_i)\right)\left(\sum_{x_i \in X_2} p_1(x_i)\right)}{\left(\sum_{x_i \in X_1} p_1(x_i)\right)\left(\sum_{x_i \in X_2} p_2(x_i)\right)}.   (7)
Notice that d = 0 when p_1(x) = p_2(x) and d < 0 when p_1(x) ≠ p_2(x). A pre-determined threshold is set to
determine if two modes are too close. If two modes fall below this threshold, they are merged by
forming a weighted sum of the two modes. The mean of the newly merged kernel component is computed as10

\mu = \frac{\alpha_1 \mu_1 + \alpha_2 \mu_2}{\alpha_1 + \alpha_2}.
A similar weighted analogy cannot be applied when merging covariances, as it does not take into account
the separation of the means. Instead of computing Σ_i directly, we consider its Cholesky factors,7 which are multiplied
by the respective weights \alpha_1/(\alpha_1+\alpha_2) and \alpha_2/(\alpha_1+\alpha_2) to obtain the merged covariance.11
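The following sketch illustrates the merge step as described above: the closeness metric d of (7), the weighted mean, and a covariance rebuilt from weighted Cholesky factors. How the weighted factors are recombined is an interpretation of the text, not the authors' verified code, and the names are illustrative:

```python
# Sketch of the merge step: closeness metric d, weighted mean, and a covariance
# built from Cholesky factors weighted by alpha_1/(alpha_1+alpha_2), alpha_2/(alpha_1+alpha_2).
import numpy as np


def closeness(p1, p2, X1, X2):
    """Closeness metric d; p1, p2 are density callables, X1/X2 points near their peaks."""
    num = p2(X1).sum() * p1(X2).sum()
    den = p1(X1).sum() * p2(X2).sum()
    return np.log(num / den)                       # d = 0 when p1 == p2, d < 0 otherwise


def merge_modes(a1, mu1, S1, a2, mu2, S2):
    """Merge two kernel components into a single mode."""
    w1, w2 = a1 / (a1 + a2), a2 / (a1 + a2)
    mu = w1 * mu1 + w2 * mu2                       # weighted mean of the two modes
    C = w1 * np.linalg.cholesky(S1) + w2 * np.linalg.cholesky(S2)
    return a1 + a2, mu, C @ C.T                    # C C^T is a valid covariance
```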
On the other hand, if the number of mixture components is insufficient, components are split in order to increase the
total number of components. Vlassis et al.12 define a method that monitors the weighted kurtosis of each mode, which
directly determines the number of mixture components. This kurtosis measure is given by
K_i = \frac{\sum_{n=1}^{N} w_{n,i} \left( \frac{z_n - \mu_i}{\sqrt{\Sigma_i}} \right)^4}{\sum_{n=1}^{N} w_{n,i}} - 3,

where

w_{n,i} = \frac{N(z_n; \mu_i, \Sigma_i)}{\sum_{n=1}^{N} N(z_n; \mu_i, \Sigma_i)}.

Figure 1. The scene on the left is a 1995 AVIRIS image of the Cuprite field in Nevada. The figure on the right is the 2D scatter
plot of the first two components after PCA rotation.

If |K_i| is too high for any mode i, then the mode is split into two. This criterion can be extended to higher dimensions
by considering skew in addition to the kurtosis, where each data sample z_n is projected onto the j-th principal
axis of Σ_i in turn. Let z_{n,i}^{j} = (z_n − \mu_i)^\top v_{ij}, where v_{ij} is the j-th column of V obtained from the SVD of Σ_i
(this step is necessary in order to condition the covariances). Conditioning of the covariances is necessary in order
to prevent the covariance matrices from becoming singular. Therefore, for each j,
K_{i,j} = \frac{\sum_{n=1}^{N} w_{n,i}\,\big(z_{n,i}^{j}/s_i\big)^4}{\sum_{n=1}^{N} w_{n,i}} - 3,

\psi_{i,j} = \frac{\sum_{n=1}^{N} w_{n,i}\,\big(z_{n,i}^{j}/s_i\big)^3}{\sum_{n=1}^{N} w_{n,i}},

m_{i,j} = |K_{i,j}| + |\psi_{i,j}|,

where

s_i^2 = \frac{\sum_{n=1}^{N} w_{n,i}\,\big(z_{n,i}^{j}\big)^2}{\sum_{n=1}^{N} w_{n,i}}.

Now, if m_{i,j} > τ (a threshold) for any j, mode i is split. The mode is split by creating mixture components
at \mu = \mu_i + v_{i,j} S_{i,j} and \mu = \mu_i − v_{i,j} S_{i,j}. Here S_{i,j} is the j-th singular value of Σ_i. The same covariance Σ_i is
used for each new mode. As mentioned earlier, the decision to split or not also depends upon the mixing weight α_i;
the splitting does not take place if the value of α_i is too small. Finally, once the number of modes settles out,
the likelihood stops increasing and convergence is achieved. This combination of covariance constraints,
mode pruning, merging and splitting results in a good PDF approximation by the mixture models.
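A sketch of this split test is given below. It follows the text's use of the SVD of Σ_i, the weighted kurtosis and skew, and the split locations µ_i ± v_{i,j} S_{i,j}; the threshold τ and the function name are assumptions:

```python
# Sketch of the kurtosis/skew split test along the principal axes of Sigma_i.
import numpy as np


def split_candidates(Z, mu, Sigma, w, tau=1.0):
    """Return mean offsets along axes whose combined kurtosis/skew exceeds tau."""
    U, S, _ = np.linalg.svd(Sigma)                 # principal axes of Sigma_i
    offsets = []
    for j in range(Sigma.shape[0]):
        zj = (Z - mu) @ U[:, j]                    # projection z^j_{n,i}
        s2 = np.sum(w * zj ** 2) / np.sum(w)       # s_i^2
        t = zj / np.sqrt(s2)
        K = np.sum(w * t ** 4) / np.sum(w) - 3.0   # weighted kurtosis K_{i,j}
        psi = np.sum(w * t ** 3) / np.sum(w)       # weighted skew psi_{i,j}
        if abs(K) + abs(psi) > tau:                # m_{i,j} > tau: split along axis j
            offsets.append(U[:, j] * S[j])         # new means at mu +/- v_{i,j} S_{i,j}
    return offsets
```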

4. EXPERIMENTS
To demonstrate the robustness of this learning scheme, we run the model experiments on high-dimensional HSI
data. The remote sensing data sets used in our experiments come from imagery derived from the Airborne Visible/Infrared
Imaging Spectrometer (AVIRIS) sensor. AVIRIS is a unique optical sensor that delivers calibrated images of the upwelling spectral radiance in 224 contiguous spectral channels (also called bands) with
wavelengths from 0.4 to 2.5 µm. AVIRIS is flown all across the US, Canada and Europe. Figure 1 shows the data sets
used in our experiments, which belong to the 1995 Cuprite field scene in Nevada.

Figure 2. 2D scatter plot of the data and the Gaussian mixture model after the convergence achieved by the EM algorithm.

Since HSI imagery is highly correlated in the spectral direction, using a principal component analysis (PCA) rotation is an obvious choice for decorrelating
the bands.13, 14 The 2D “scatter” plot of the first two principal components of the data is shown in
Figure 1. The scatter plots used in this paper are essentially marginalized PDFs on a 2D plane. Marginalization
can easily be depicted for Gaussian mixtures.10 Let z = [z_1, z_2, z_3, z_4]. For example, to visualize on the (z_2, z_4)
plane, we would need to compute
p(z_2, z_4) = \int_{z_1} \int_{z_3} p(z_1, z_2, z_3, z_4)\, dz_1\, dz_3.

This utility is very useful when visualizing high-dimensional PDFs (a marginalization sketch is given at the end of this paragraph). With the given HSI data, the next step is to
train the mixture model. Training consists of five operations, as mentioned earlier: beginning the EM algorithm, pruning
and merging the components, splitting the components if necessary, and finally determining whether the algorithm has
converged based on the likelihood estimates of the parameters at the end of each iteration. Some of the aspects that
are critical for training are the covariance constraints, the minimum individual component weight used in pruning,
the thresholds that determine whether two components should be merged or split, and the criterion for determining whether
convergence has taken place. Figure 3 shows one-dimensional PDF plots of the two PCA components. Notice
the marginal PDFs being compared to the histograms.
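As a side note on the marginalization used for these scatter plots: for a Gaussian mixture, marginalizing onto a coordinate plane simply keeps the corresponding sub-vectors of the means and sub-blocks of the covariances. A brief sketch (illustrative names, not from the paper):

```python
# Sketch: marginalize a Gaussian mixture onto selected coordinates, e.g.
# dims=(1, 3) gives the (z2, z4) plane discussed above.
import numpy as np


def marginalize_gmm(alphas, mus, Sigmas, dims=(1, 3)):
    idx = np.asarray(dims)
    mus_m = [m[idx] for m in mus]                  # sub-vectors of the means
    Sigmas_m = [S[np.ix_(idx, idx)] for S in Sigmas]   # sub-blocks of the covariances
    return alphas, mus_m, Sigmas_m
```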
During the process of training the log-likelihood would monotonically increase, if not for the pruning, splitting,
and merging operations. Figure 2 shows the Gaussian mixture approximation after convergence. Approximately
ten components were derived by the EM to characterize the GMM. The GM model parameters obtained as a
result of the structural learning are now used to build a classifier. Figure 4 shows a synthetically simulated
second class of data added to the already existing input data. We now build a classifier using Gaussian
mixtures by training a second parameter set on the newly added class, using the same learning scheme.
Figure 5 shows the result after the model converges to obtain the parameters for the second class. This is
followed by computing the log-likelihood of the test data. The performance of the classifier is evaluated using
the ROC curve shown in Figure 6. The response of the ROC curve clearly supports the robustness of the proposed
classification-learning scheme.
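The two-class evaluation can be sketched as follows: score each test pixel by the difference of class log-likelihoods under the two trained mixtures and sweep a threshold to trace Pd versus Pfa. This is a hedged reconstruction of the procedure described above, with illustrative function names:

```python
# Sketch of the two-class evaluation: log-likelihood-ratio scores and an ROC sweep.
import numpy as np
from scipy.stats import multivariate_normal


def gmm_loglik(Z, alphas, mus, Sigmas):
    """log p(z_t | lambda) for each row of Z under a Gaussian mixture."""
    dens = sum(a * multivariate_normal.pdf(Z, mean=m, cov=S)
               for a, m, S in zip(alphas, mus, Sigmas))
    return np.log(dens + 1e-300)


def roc_points(Z1, Z0, gmm1, gmm0, thresholds):
    """Pd/Pfa pairs: Z1 are true class-1 pixels, Z0 are true class-0 pixels."""
    s1 = gmm_loglik(Z1, *gmm1) - gmm_loglik(Z1, *gmm0)   # scores on class-1 data
    s0 = gmm_loglik(Z0, *gmm1) - gmm_loglik(Z0, *gmm0)   # scores on class-0 data
    return [(np.mean(s0 > th), np.mean(s1 > th)) for th in thresholds]
```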

Figure 3. One dimensional PDF plots. The marginal PDF’s are compared to the histograms.

Figure 4. Addition of second (yellow) class to the original data.

Figure 5. Trained GM approximation of the second class.

Figure 6. ROC curve for the two-class problem.

5. CONCLUSIONS AND FUTURE WORKS
In this paper, we proposed the use of Gaussian mixture models that employ a structural learning scheme for
the modeling and classification of Hyperspectral imagery. The traditional learning technique employing general EM
has serious drawbacks: there is no generally accepted method for parameter initialization, it is unclear how many mixture
components should be employed to adequately model the input data, and the model may get stuck in
one of the many local maxima of the likelihood function. These drawbacks have been well addressed by the proposed
structural learning scheme. The ROC curve in our experiments is used as a general diagnostic tool to evaluate
classification. Clearly, the ROC depicted a high probability of detection at low false alarm rates. The GMM in
conjunction with structural learning is well equipped to model and classify HSI data. These models adjust
well to a variety of distributions while keeping variances low. This trait of the GMM is particularly appreciated
in image classification applications, since it reduces misclassification. As future work, we intend to explore and
equip current state-of-the-art statistical classifiers with better training schemes for practical HSI applications.

ACKNOWLEDGMENTS
This work was supported by a NASA Earth System Science Fellowship grant. The authors would also like to thank
the Department of Geological Sciences at UTEP for providing access to the ENVI software.

REFERENCES
[1] Landgrebe, D. A., [Signal Theory Methods in Multispectral Remote Sensing], Wiley Inter-Science, Hoboken,
NJ, second ed. (2003).
[2] Schowengerdt, R. A., [Remote Sensing Models & Methods for Image Processing], Academic Press, Burlington, MA, seventh ed. (1997).
[3] Keshava, N., “Distance metrics & band selection in hyperspectral processing with applications to material
identification & spectral libraries,” IEEE Transactions on Geoscience and Remote Sensing 42, 1552–1565
(2004).
[4] Redner, R. and Walker, H., “Mixture densities, maximum likelihood and the EM algorithm,” SIAM Review 26, 195–239 (1984).
[5] Duda, R. O., Hart, P. E., and Stork, D. G., [Pattern Classification ], John-Wiley and Sons, New York, NY,
seventh ed. (2001).
[6] Baggenstoss, P. M., “Structural learning for classification of high dimensional data,” Proc. of International
Conference on Intelligent Systems and Semantics NIST, 124–129 (1997).
[7] Moon, T. and Stirling, W., [Mathematical Methods and Algorithms for Signal Processing], Prentice Hall,
Upper Saddle River, NJ (2000).
[8] Rabiner, L., “A tutorial on hidden Markov models and selected applications in speech recognition,” Proc. of
IEEE 77, 257–286 (1989).
[9] McLachlan, G. and Peel, D., [Finite Mixture Models], Wiley Series in Probability and Statistics, New York,
NY, second ed. (2000).
[10] Hastie, T., Tibshirani, R., and Friedman, J., [The Elements of Statistical Learning], Springer-Verlag, New
York, NY (1994).
[11] Vlassis, N. and Likas, A., “A greedy EM for gaussian mixture learning,” Neural Processing Letters 15,
77–87 (2002).
[12] Vlassis, N. and Likas, A., “A kurtosis-based dynamic approach to gaussian mixture modeling,” IEEE
Transactions on Systems, Man and Cybernetics 29, 393–399 (1999).
[13] Jayaram, V., Usevitch, B., and Kosheleva, O., “Detection from Hyperspectral images compressed using
rate distortion and optimization techniques under JPEG2000 part 2,” IEEE 11th DSP Workshop and 3rd
SPE Workshop cdrom, 195–239 (2004).
[14] Shaw, G. and Manolakis, D., “Signal processing for Hyperspectral image exploitation,” IEEE Signal Processing Magazine 19, 12–16 (2002).

