Endmember Extraction in Hyperspectral Images Using l-1
Minimization and Linear Complementary Programming
Dzung Nguyena, Trac Trana, Chiman Kwanb, Bulent Ayhanb
a Dept. of Electrical and Computer Engineering, The Johns Hopkins University, 3400 N. Charles St, Barton Hall 322, Baltimore, MD USA 21218;
b Signal Processing Inc, 9700 Great Seneca Hwy, Rockville, MD USA 20850
ABSTRACT
Endmember extraction in Hyperspectral Images (HSI) is a critical step for target detection and abundance estimation. In
this paper, we propose a new approach to endmember extraction, which takes advantage of the sparsity property of the
linear representation of the HSI's spectral vectors. Sparsity is measured by the l0 norm of the abundance vector. It is also well
known that the l1 norm closely resembles the l0 norm in promoting sparsity while keeping the minimization problem convex and tractable.
By adding an l1 norm term to the objective function, we arrive at a constrained quadratic program which can be
solved effectively using Linear Complementary Programming (LCP). Unlike existing methods, which require
expensive computations in each iteration, LCP requires only pivoting steps, which are extremely simple and efficient for
the unmixing problem since the number of signatures in the reconstructing basis is reasonably small. Preliminary
experiments with the proposed method, for both supervised and unsupervised abundance decomposition, showed
competitive results compared to LS-based methods such as Fully Constrained Least Squares (FCLS). Furthermore,
combining our unsupervised decomposition with anomaly detection yields a solid target detection algorithm
compared to methods that require prior information about target and background signatures.
Keywords: hyperspectral, endmember extraction, unmixing, sparsity, l1 minimization, linear complementary
programming
1. INTRODUCTION
HSI have high spectral resolution but generally poor spatial resolution. When acquired from high above the surface, the
Instantaneous Field Of View (IFOV) of a single pixel may cover an area as large as 3x3 m (HYDICE) or even 30x30 m
(HYPERION)1, which can exceed the spatial dimensions of the targets/materials of interest. As a consequence,
the information in a spectral pixel is not pure, but a mixture of contributions from several materials located in the
corresponding elementary area. A mixture model, either linear or non-linear, is necessary to quantify this. Under some
reasonable conditions, the linear assumption is acceptable2. In fact, the linear mixture model r = Ms + n is
used in most unmixing, classification, and detection algorithms in HSI processing. Here r is a spectral vector
(pixel), M is a dictionary composed of signatures of distinct endmembers as columns, s contains the contribution of each
endmember, and n models zero-mean equipment noise and signature variability.
In this paper, we first propose to look at the unsupervised endmember extraction problem from a sparse representation
perspective. The sparse nature of hyperspectral images is evidenced by the fact that, no matter how large in spatial
dimensions, an HSI can be closely represented by the few endmembers present in the scene. An algorithm is then
developed based on sparse reconstruction techniques to extract the best signatures for constituting the
whole HSI. The iterative matching pursuit approach is chosen for the extraction algorithm, since it has proven
effective in the compressed sensing and sparse reconstruction literature7. In each iteration, abundances of previously
extracted endmembers are estimated by an unmixing process in which we introduce an l1 regularization term into the traditional
Least Squares objective function to promote sparsity of the solution. The minimization of this combined cost is done
by Linear Complementary Programming (LCP), which is very efficient in this situation. The proposed endmember
extraction method is also used in conjunction with an anomaly detection algorithm to form an effective unsupervised target
detection algorithm.
The structure of the paper is as follows: In order for the paper to be as self-contained as possible, the next section covers
some fundamental background on endmember extraction, sparse reconstruction, and LCP. Section 3 details the
sparse linear mixture model, analyzes the approach to exploit this model, and describes the proposed algorithm. Section
4 shows experimental results, including results of the proposed unsupervised target detection. Section 5 provides a
summary with concluding remarks, as well as a brief discussion about potential future research.
2. BACKGROUND
2.1 Linear Mixture Model and Spectral Unmixing
In spectral mixture modeling, the basic assumption is that the surface is made of a few endmembers of relatively
constant spectral signature, or at least constant spectral shape. If the multiple scattering among distinct endmembers is
negligible and the surface is partitioned according to the fractional abundances, then the spectral radiance upon the
sensor location is well approximated by a linear mixture of endmember radiances weighted by the corresponding
fractional abundances1. Mathematically, r = Ms + n, where r is a spectral pixel, M is a matrix whose columns are the
endmembers' signatures, s is the vector of corresponding abundances, and n is noise in all spectral bands.
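As a toy numerical illustration of the model (the endmember matrix below is random, purely for demonstration, not from any real sensor), a mixed pixel can be synthesized, and when M is known and the pixel is noise-free, the abundances are recovered by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((30, 4))                # 4 hypothetical endmember signatures over 30 bands
s = np.array([0.6, 0.3, 0.1, 0.0])    # fractional abundances (non-negative, sum to one)
r = M @ s                              # noise-free mixed pixel: r = Ms

# With M known and no noise, unconstrained least squares recovers s exactly
s_hat, *_ = np.linalg.lstsq(M, r, rcond=None)
```

With noise n added, or with constraints on s, the constrained formulations discussed next replace this plain least-squares step.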
Depending on how much prior information is available, various algorithms have been developed to extract/unmix HSI
data. If M is known, the task is to unmix the contributions of each endmember in a supervised fashion, usually via a
constrained optimization process. If only the number of signatures is known, then they must first be extracted from the
dataset, and the unmixing is done afterwards by supervised unmixing. If the intrinsic dimension of the HSI is unknown, the
task is more difficult: endmember extraction and abundance unmixing must be done simultaneously, in a process
called unsupervised unmixing.
Some early methods use a stochastic approach, for example Independent Component Analysis (ICA), and require estimation of
second- and higher-order statistics of the data, a difficult task due to intrinsic factors of HSI imaging systems such as
signature variability, abundance constraints, topography modulation, and system noise.
Recent methods using deterministic approaches are more successful in the unmixing problem. If the signatures of the
endmembers to unmix are known, then the problem is essentially a non-negative linear least squares problem, which can
be solved with the Non-negativity Constrained Least Squares (NCLS) algorithm or Fully Constrained Least Squares
(FCLS)4. The latter is an extension of NCLS that also enforces the abundance sum-to-one constraint. NCLS and FCLS are
extended further to unsupervised NCLS (UNCLS) and UFCLS for unsupervised unmixing. Some algorithms take a convex
analysis approach, such as the N-FINDR algorithm, but this method requires precise knowledge of the intrinsic dimension of
the dataset.
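The sum-to-one constraint in FCLS is commonly imposed by appending a heavily weighted row of ones to the non-negative least squares system4. A minimal sketch of that trick (the weight delta and the toy data below are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import nnls

def fcls(M, r, delta=1e3):
    """Fully Constrained Least Squares: non-negative abundances that
    (approximately) sum to one, obtained by appending a weighted
    sum-to-one row to the NNLS system."""
    Ma = np.vstack([M, delta * np.ones((1, M.shape[1]))])  # extra row: delta * 1^T
    ra = np.append(r, delta)                               # extra target: delta * 1
    s, _ = nnls(Ma, ra)
    return s

# Example: a pixel mixed from two of three known signatures
rng = np.random.default_rng(1)
M = rng.random((20, 3))
s_true = np.array([0.5, 0.5, 0.0])
s_est = fcls(M, M @ s_true)
```

The larger delta is, the more strictly the sum-to-one constraint is enforced relative to the reconstruction error.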
2.2 Sparse representation and l1 minimization
Compressed Sensing (CS) is a newly emerged sensing/sampling paradigm in signal processing that, in a sense, goes
against the common wisdom in data acquisition. CS theory asserts that one can recover certain signals and images from
far fewer samples or measurements than traditional methods, which are governed by the Nyquist rate. One necessary condition for the
theory to hold is that the signal of interest must have a sparse representation in some dictionary, i.e., y = Aα, where A is the
dictionary, generally overcomplete, and α is the vector of coefficients with only a few non-zero entries (sparse).
The key to perfectly reconstructing sparse α from measurements y is to solve the following minimization problem:
min |α|0 subject to y = Aα, where the l0 norm counts the non-zero entries of α. Solving this program directly requires
a combinatorial search through all k-sized subsets of columns of A, which is computationally intractable. Besides,
precise knowledge of k is not always available.
Two main approaches have shown success in sparse reconstruction problems. The first is based on the
surprising result that the l1 norm can replace the l0 norm in the previous minimization and still give the correct sparse
solution5,6. The l1 minimization is solved using linear programming; a robust version, which deals with compressible
source signals (i.e., signals that can be represented by a few significant coefficients in some domain), is also
based on constrained convex programming.
The second approach, called greedy pursuit, is an iterative method. The technique gets its name because, in each
iteration, it tries to incorporate a new atom from the dictionary into the selected subset of basis vectors. If the signal is k-sparse, the
algorithm should give the correct result after k steps. Though completely different from the l1 minimization approach, the
greedy approach, whose most successful algorithm is Orthogonal Matching Pursuit (OMP), has also been proved to give the
correct sparse solution with overwhelming probability7.
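A minimal OMP sketch may help fix ideas (the dictionary, sparsity level, and coefficient values below are illustrative assumptions, not the exact setup of reference 7):

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily select k atoms of A, refitting
    the coefficients by least squares on the selected support each iteration."""
    residual, support = y.copy(), []
    for _ in range(k):
        # atom most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    alpha = np.zeros(A.shape[1])
    alpha[support] = coef
    return alpha

# Recover a 2-sparse coefficient vector from an overcomplete random dictionary
rng = np.random.default_rng(2)
A = rng.normal(size=(64, 80))
A /= np.linalg.norm(A, axis=0)            # unit-norm atoms
alpha_true = np.zeros(80)
alpha_true[[7, 42]] = [4.0, -3.0]
alpha_hat = omp(A, A @ alpha_true, k=2)
```

After k iterations the least-squares refit makes the residual orthogonal to all selected atoms, which is what distinguishes OMP from plain matching pursuit.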
3. HSI SPARSE REPRESENTATION
3.1 Sparse Mixture Model
In unsupervised unmixing, we assume no prior information about the true signatures of the endmembers
constituting the HSI. Algorithms must then attempt to select/extract the purest possible spectral vectors (from the
HSI in consideration) to represent the corresponding pure signatures. If we build a dictionary with all spectral vectors
in the image as its atoms, we then have a sparse representation of the HSI, and more importantly, the source
vectors are simultaneously sparse. The idea is realized in the following linear mixture model:
R = MS + N    (1)
In this model, R is a matrix whose columns are all the spectral pixels we want to unmix. M is the over-complete dictionary
containing all the possible empirical signatures. The simplest construction of M is to include all spectral pixels in the
dictionary. It is possible to reduce the size of M by restricting the region of interest, for example to a local neighborhood or
a cluster (if clustering information is available). S is the matrix of non-negative abundances; i.e., each
column of S is the abundance vector corresponding to the spectral pixel at the same location in R. N is a matrix of
appropriate dimension to model noise.
Simultaneous sparsity in this model means entries of S are non-zero only in rows corresponding to active atoms in the
dictionary M. Define an indicator vector I as follows: I equals zero at indices where the whole corresponding row of S is zero, and
one elsewhere. Our algorithm then solves the following sparse problem:

min |I|0 subject to ||Rj - MSj||2^2 <= ε for all j, and other conditions on S    (P1)

In this problem, Rj and Sj are the j-th columns of R and S respectively, and the other conditions on S are non-negativity and
abundance-sum-to-one.
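To make the indicator-vector notation concrete, consider a toy abundance matrix (not from the paper's data) with one inactive atom:

```python
import numpy as np

# Rows of S index dictionary atoms; columns index pixels.
S = np.array([[0.7, 0.2, 0.0],
              [0.0, 0.0, 0.0],   # inactive atom: the whole row is zero
              [0.3, 0.8, 1.0]])

I = (np.abs(S).sum(axis=1) > 0).astype(int)  # indicator of active rows
n_active = int(I.sum())                      # the |I|0 count minimized in (P1)
```

Minimizing |I|0 thus means representing every pixel using as few shared atoms as possible, which is exactly the simultaneous-sparsity requirement.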
As discussed earlier, there are different approaches for solving a sparse reconstruction problem. In this paper, we chose
the iterative greedy pursuit approach, in which we choose the representative spectral vectors, collected in a set we call D,
iteratively among the available pixels in the whole image (or in the area that constitutes the over-complete
dictionary M). The algorithm starts with an empty set of selected spectral signatures (D = ∅). It then initializes the set with
the vector of maximum length and the vector most orthogonal to that first selection. Generally, this pair of
initial atoms corresponds to the brightest and darkest pixels in the image. From the third iteration, every pixel is unmixed with respect
to the endmembers in D by solving

min f(s) = (1/2)(r - Ds)T(r - Ds) + λ|s|1 subject to s ≥ 0    (P2)
The residue of each pixel in this representation is calculated, and its norm is used as the criterion to select the next atom. The
iteration process stops when every pixel can be represented using D within a pre-selected precision ε.
It is a known fact that the intrinsic dimension of an HSI is very small regardless of its spatial dimensions. As a result, the
iteration process usually stops after a few iterations, once D includes sufficient signatures to represent the
HSI. This is the main benefit of choosing the greedy approach. In fact, the number of necessary iterations, and
equivalently the number of extracted endmembers, is slightly higher than the intrinsic dimension due to the effect of
noise. Another benefit is that the extracted signatures in D tend to have low correlations; in other words, there is no
duplication of the same material signature in D.
3.2 The proposed algorithm
Table 1. Pseudo code of the proposed HSI unsupervised endmember extraction algorithm
The L1-LCP Endmember Extraction Algorithm
1. D = ∅, k = 0, set a threshold ε.
2. Select the 1st atom d0 = argmaxr (rTr). Select the 2nd atom, which is the
most orthogonal to the 1st: d1 = argmaxr (r - d0)T(r - d0).
D = [d0 d1], k = 1.
3. k = k + 1; for every spectral pixel r, unmix on the current dictionary D by
solving min f(s) = (r - Ds)T(r - Ds)/2 + λ|s|1 subject to s ≥ 0 using LCP
(explained later).
4. Calculate the residue e(r) = (r - Ds)T(r - Ds) for all pixels r in R.
5. Check whether all residues are smaller than the threshold ε. If yes, stop;
if no, go to 6.
6. Find the vector with the largest residue dk = argmaxr e(r), add it to the
current dictionary D = [D dk], and go back to 3.
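The pseudo code above can be sketched as runnable code. This is a simplified illustration: for the step-3 subproblem, a generic bound-constrained solver stands in for the paper's LCP pivoting, and all parameter values and helper names are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def l1_unmix(r, D, lam):
    """Step 3: min (1/2)||r - Ds||^2 + lam*sum(s) subject to s >= 0.
    (L-BFGS-B stands in here for the paper's LCP pivoting solver.)"""
    k = D.shape[1]
    fun = lambda s: 0.5 * np.sum((r - D @ s) ** 2) + lam * np.sum(s)
    jac = lambda s: D.T @ (D @ s - r) + lam
    return minimize(fun, np.zeros(k), jac=jac, method="L-BFGS-B",
                    bounds=[(0, None)] * k).x

def extract_endmembers(R, lam=1e-4, tol=1e-4, max_atoms=20):
    """Greedy L1 endmember extraction; R holds spectral pixels as columns."""
    d0 = R[:, np.argmax(np.sum(R ** 2, axis=0))]             # brightest pixel
    i1 = np.argmax(np.sum((R - d0[:, None]) ** 2, axis=0))   # farthest from d0
    D = np.column_stack([d0, R[:, i1]])
    while D.shape[1] < max_atoms:
        S = np.column_stack([l1_unmix(R[:, j], D, lam)
                             for j in range(R.shape[1])])
        err = np.sum((R - D @ S) ** 2, axis=0)               # per-pixel residue
        if err.max() < tol:                                  # all pixels fit
            break
        D = np.column_stack([D, R[:, np.argmax(err)]])       # add worst pixel
    return D

# Synthetic check: pixels mixed from 3 hypothetical endmembers, pure pixels present
rng = np.random.default_rng(3)
E = rng.random((10, 3))                         # 3 true endmembers, 10 bands
A = rng.dirichlet(0.2 * np.ones(3), size=47).T
A = np.column_stack([np.eye(3), A])             # ensure pure pixels exist
R = E @ A
D = extract_endmembers(R)
```

The extracted D should contain (near-)pure pixels spanning the scene, after which every pixel unmixes with a small residue.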
The structure of the above algorithm looks similar to other unsupervised endmember extraction methods, such as
UNCLS or UFCLS. The initialization of D is actually the same as in UNCLS and UFCLS. The main difference lies in
the perspective from which the problem is viewed. While the other methods are generalizations of supervised endmember
extraction, our method takes the viewpoint of the sparse representation and reconstruction framework, and within this framework
the iterative pursuit process described here is only one among several possible approaches to solve the problem. The
framework opens other possibilities for extending the algorithm, for example, to include more than one signature in D at
each step, to refine the selection (i.e., to exclude the "bad" choices from previous iterations), or to take advantage of side
information (in case we have some idea of the intrinsic dimension of the image).
Another main difference is in the unmixing step. Our algorithm is based on an l1-regularized least squares cost function, not
solely on least squares as in NCLS and FCLS. If the iterative selection process is viewed as the first mechanism to
impose sparsity, this cost function can be seen as the second mechanism to boost sparsity in the resulting abundance
vectors.
Although the described algorithm is for unsupervised endmember extraction, it is easy to see that supervised endmember
extraction is a special case, in which we only need the 3rd step to solve the unmixing problem on a known spectral
signature set D.
3.3 Linear Complementary Programming (LCP)
The original LCP problem can be described as follows: let H be a given p×p matrix and let q be a given p-vector; the goal is to
find vectors w and z satisfying (2)-(4), or to conclude that no such solution exists.

w - Hz = q    (2)
wj ≥ 0, zj ≥ 0 for j = 1,…,p    (3)
wj zj = 0 for j = 1,…,p    (4)

Solving the LCP can be summarized briefly as follows. If q is non-negative, then we immediately have a solution
satisfying (2)-(4) by letting w = q and z = 0. If q has negative entries, however, a new column 1 and an artificial variable z0 are
introduced, leading to the following system, where 1 is a vector of p ones:

w - Hz - 1z0 = q    (5)
z0 ≥ 0, wj ≥ 0, zj ≥ 0 for j = 1,…,p    (6)
wj zj = 0 for j = 1,…,p    (7)

Letting z0 = max{-qi : 1 ≤ i ≤ p}, z = 0, and w = q + 1z0, we obtain a starting solution to the above system. Through a
sequence of pivots, we attempt to drive the artificial variable z0 to zero while satisfying (5)-(7), thus obtaining a solution
to the LCP. More details on solving the LCP can be found in the referenced textbook3.
We chose LCP because minimizing the cost function in step 3 with the non-negativity constraint amounts to solving a
quadratic minimization. The Karush-Kuhn-Tucker (KKT) conditions of this minimization can be written as an LCP problem:

f(s) = (1/2)(r - Ds)T(r - Ds) + λ|s|1 = (1/2)sT Hs + qTs + c for s ≥ 0    (8)

where H = DTD, q = λ1 - DTr, and c = (1/2)rTr. Minimizing f subject to s ≥ 0 yields the KKT conditions (with w the
vector of Lagrange multipliers):

w - Hs = q    (9)
wj sj = 0 for j = 1,…,p    (10)
w ≥ 0, s ≥ 0    (11)

These conditions have exactly the same form as an LCP problem (with s taking the role of z) and can be solved by an
LCP solver. Since the size of the problem (here, the size of H) is relatively small, the LCP can be solved
efficiently. Another reason for choosing LCP is that it requires only pivoting steps, which are extremely simple (compared
with gradient-based methods) and attractive for hardware implementation.
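The pivoting scheme for (5)-(7) can be sketched as follows. This is a simplified Lemke-style solver written for illustration only (a production implementation needs degeneracy handling), applied to a tiny hypothetical unmixing instance:

```python
import numpy as np

def lemke(H, q, max_iter=100):
    """Solve the LCP: find w, z >= 0 with w - H z = q and w_j z_j = 0,
    by complementary (Lemke) pivoting. Simplified: no degeneracy handling."""
    p = len(q)
    if np.all(q >= 0):
        return q.astype(float).copy(), np.zeros(p)  # trivial solution: w = q, z = 0
    # Tableau rows encode  I w - H z - 1 z0 = q ;  columns: [w | z | z0 | rhs]
    T = np.hstack([np.eye(p), -H, -np.ones((p, 1)),
                   np.asarray(q, float).reshape(-1, 1)])
    basis = list(range(p))                       # the w variables start basic
    r, entering = int(np.argmin(q)), 2 * p       # z0 enters; most negative row leaves
    for _ in range(max_iter):
        T[r] /= T[r, entering]                   # pivot on (r, entering)
        for i in range(p):
            if i != r:
                T[i] -= T[i, entering] * T[r]
        leaving, basis[r] = basis[r], entering
        if leaving == 2 * p:                     # z0 driven out of the basis: solved
            break
        entering = leaving + p if leaving < p else leaving - p  # complement enters
        col, rhs = T[:, entering], T[:, -1]
        ok = col > 1e-12                         # ratio test for the leaving row
        if not ok.any():
            raise RuntimeError("ray termination: no solution found")
        r = int(np.argmin(np.where(ok, rhs / np.where(ok, col, 1.0), np.inf)))
    w, z = np.zeros(p), np.zeros(p)
    for row, b in enumerate(basis):
        if b < p:
            w[b] = T[row, -1]
        elif b < 2 * p:
            z[b - p] = T[row, -1]
    return w, z

# Tiny unmixing example per (8): H = D^T D, q = lam*1 - D^T r; z plays the role of s
D = np.array([[1., 0.], [0., 1.], [1., 1.]])   # illustrative 3-band, 2-atom dictionary
r = D @ np.array([0.5, 0.2])                   # pixel mixed with known abundances
lam = 1e-6
H, q = D.T @ D, lam * np.ones(2) - D.T @ r
w, s = lemke(H, q)
```

Each iteration is a single pivot on a small tableau, which is why the method is attractive for hardware: no matrix inversions or line searches are needed.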
4. EXPERIMENTS AND RESULTS
4.1 Endmember extraction on synthetic HSI image
We perform our experiments on a synthetic hyperspectral image with spatial dimensions 64x64 and 45
bands. The image is generated by implanting 5 target signatures at known locations on a background whose signature is
also known. Noise is added to the image to achieve SNR = 35 dB. Figure 1 shows the 5th band of the synthetic data and
the original signatures of the 5 implanted targets.
Our L1-LCP method is compared against Unsupervised FCLS (UFCLS). UFCLS is implemented based on its
description in the referenced paper4. We use a fixed parameter value of 0.01 for UFCLS, and a fixed regularization parameter λ =
0.01 for L1-LCP. Both algorithms are set to run until the maximum least squares error over all pixels is smaller than a
certain error tolerance ε. The first 13 extracted signatures are shown [Figure 2(b),(c)]; it can be seen clearly that both
methods capture all 5 target signatures, while the rest of the extracted signatures are variants of the background signature due
to the effect of noise. Maximum least squares errors calculated at each iteration step show that our method
performs better than UFCLS [Figure 2(a)].
Figure 1. One spectral band of the synthetic image (band 5), and original signatures of the 5 targets embedded in
the image.
Figure 2. (a) Absolute Maximum Least Square Error at each iteration (the image is normalized to have maximum
magnitude 1 before experiments); (b) and (c): The first 13 extracted endmembers are displayed.
4.2 Endmember extraction on real HSI image
In this section, we run the extraction algorithms on a real HSI image. The original image is large (800x1024x124), so we
select a middle portion in the spatial domain, downsample it by a factor of 2, and keep only the first 70 bands, which
retain most of the distinctive features between different materials. The resulting image has dimensions 150x150x70
and contains trees, grass, shadows of the trees on grass, and a metal panel [Figure 3(a)]. Parameters for both algorithms
are kept the same as in Section 4.1.
Locations of the extracted endmembers are shown in Figure 3(b) and (c). Their corresponding signatures are in Figure
4(b) and (c). Figure 4(a) shows that L1-LCP is better than UFCLS in terms of the maximum least squares error.
Figure 3. (a) The image used in the experiment, band 25; (b) Positions of endmembers extracted using UFCLS; (c)
Positions of endmembers extracted using L1-LCP.
Figure 4. (a) Absolute Maximum Least Square Error at each iteration; (b) Extracted signatures using UFCLS; (c)
Extracted signatures using L1-LCP.
4.3 Application in unsupervised target detection
The idea is that target pixels are often considered 'abnormal' compared to their surrounding background. However,
anomaly detection algorithms by themselves cannot give clean maps of potential targets. If we instead use the noisy maps generated
by an anomaly detection algorithm, e.g., the traditional RX, to classify the signatures extracted by our unsupervised
unmixing algorithm, we can separate the target signatures from the background signatures. The abundances of those targets
(also outputs of the unsupervised unmixing process) can then be used to discriminate pixels belonging to targets from those
belonging to the background.
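For context, the global RX score10 is the Mahalanobis distance of each pixel from the scene mean. A minimal sketch (the band count, diagonal-loading term, and synthetic scene are illustrative assumptions):

```python
import numpy as np

def rx_scores(R):
    """Global RX anomaly detector: for each pixel r, the score is
    (r - mu)^T C^{-1} (r - mu), with mu and C the scene mean and covariance.
    R holds spectral pixels as columns (bands x npixels)."""
    X = R - R.mean(axis=1, keepdims=True)
    C = X @ X.T / R.shape[1]
    Ci = np.linalg.inv(C + 1e-6 * np.eye(C.shape[0]))  # diagonal loading for stability
    return np.sum(X * (Ci @ X), axis=0)                # one Mahalanobis score per pixel

# A synthetic scene: Gaussian background with one implanted anomaly
rng = np.random.default_rng(4)
R = rng.normal(size=(5, 200))
R[:, 37] += 10.0                                       # the anomalous pixel
scores = rx_scores(R)
```

Thresholding these scores yields the anomaly map that, in our scheme, classifies extracted endmembers as target or background.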
The experiment is conducted on the same synthetic HSI image used in Section 4.1. First, L1-LCP
is used to extract endmembers. With ε = 0.0025, the algorithm extracts 13 endmembers and stops. The RX anomaly
detection algorithm10 is applied to the HSI image to produce an anomaly image [Figure 5(a)], which is thresholded to give a
map of anomalous pixels. This map is then used to check the locations of the extracted endmembers. Endmembers whose
locations match anomalous pixels are classified as target endmembers [Figure 5(b)]; the others are classified as
background endmembers [Figure 5(c)]. Once target and background signatures are separated, the sum of the abundances of
target signatures in each image pixel is used as the detection score [Figure 6(b)]. Note that the whole process
requires no prior information about the target or background.
For comparison, we use the Matched Subspace Detector (MSD)9 to detect targets; its result image is shown in Figure 6(a). MSD
requires information about target and background signatures, so we use the original target signatures (groundtruth)
and select a background area (a patch from row 25 to row 45, and column 54 to column 64) to feed into MSD.
The groundtruth target locations are shown [Figure 6(c)] to compare the detection capabilities of the two
approaches. Profile views of the 3-D mesh images from the two methods [Figure 7(a),(b)] show that MSD's result is noisy, while
our method gives a very clean score image. This means our method can capture the weak targets at rows 8 and 9 by setting
a low threshold without incurring too many false positives. The ROC curves in Figure 7(c) also show that our proposed
approach outperforms MSD.
Figure 5. (a) Locations of the endmembers extracted using L1-LCP, drawn on top of the result from the RX anomaly
detection algorithm; (b) Signatures classified as targets; (c) Signatures classified as backgrounds.
Figure 6. (a) Result of the Matched Subspace Detector (MSD); (b) Sum of the abundances that belong to extracted
target signatures; (c) Groundtruth of target locations.
Figure 7. (a) Mesh image of MSD's result; (b) mesh image of the summed target abundances; (c) Receiver
Operating Characteristic (ROC) curves.
5. CONCLUSIONS
To conclude, in this paper we introduced a sparse representation framework for HSI, which we call the Sparse
Mixture Model. This model provides new insight into the unsupervised endmember extraction problem and serves as the
basis for our extraction/unmixing algorithm. We proposed an algorithm which takes the iterative greedy pursuit
approach to solve the problem. The core of our iterative algorithm is an l1-regularized minimization, which we solve quickly
and effectively using LCP. We also suggested using the proposed unsupervised endmember extraction algorithm together with an
anomaly detection algorithm as an unsupervised target detection method; the experiments conducted in this
direction showed promising results.
In future research, we will extend the proposed iterative algorithm. Feasible directions include adding more atoms in each
iteration step and incorporating a mechanism for refining the previous selection. Since in most cases the true
dimension of the dataset is much smaller than the number of bands, we may consider a dimensionality reduction step to
reduce the problem to a smaller spectral dimension; the compressed sensing paradigm can be used for this purpose. The
unsupervised target detection idea also needs further development to become a robust algorithm. We will also test our
algorithms on a number of different HSI datasets and compare with other methods.
ACKNOWLEDGEMENTS
This work was supported in part by the Air Force Office of Scientific Research under contract FA9550-09-C-0162.
REFERENCES
[1] Chein-I Chang, "Hyperspectral Data Exploitation: Theory and Applications," John Wiley & Sons, 2007.
[2] Chein-I Chang, "Hyperspectral Imaging: Techniques for Spectral Detection and Classification," Kluwer Academic Publishers, 2003.
[3] Mokhtar S. Bazaraa, Hanif D. Sherali, C. M. Shetty, "Nonlinear Programming: Theory and Algorithms," 3rd Ed.,
John Wiley & Sons, 2006.
[4] Daniel C. Heinz and Chein-I Chang, "Fully Constrained Least Squares Linear Spectral Mixture Analysis Method for
Material Quantification in Hyperspectral Imagery," IEEE Trans. Geosci. Remote Sens., Vol. 39, No. 3, March 2001.
[5] E. J. Candes and M. B. Wakin, "An Introduction To Compressive Sampling," IEEE Signal Processing Magazine,
Vol. 25, No. 2, pp. 21-30, March 2008.
[6] D. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, Apr.
2006.
[7] J. A. Tropp and A. C. Gilbert, "Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit,"
IEEE Transactions on Information Theory, Vol. 53, No. 12, pp. 4655-4666, Dec. 2007.
[8] Z. Guo, T. Wittman, S. Osher, "L1 unmixing and its application to hyperspectral image enhancement," Proc. SPIE,
Vol. 7334, 73341M (2009).
[9] L. L. Scharf and B. Friedlander, "Matched subspace detectors," IEEE Trans. Signal Process., Vol. 42, No. 8, pp.
2146-2157, Aug. 1994.
[10] I. S. Reed and X. Yu, "Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral
distribution," IEEE Trans. Acoust., Speech, Signal Processing, Vol. 38, pp. 1760-1770, 1990.
[11] H. Kwon and N. M. Nasrabadi, "Kernel RX-algorithm: A nonlinear anomaly detector for hyperspectral imagery,"
IEEE Trans. Geosci. Remote Sens., Vol. 43, No. 2, pp. 388-397, Feb. 2005.