Random Subspace Ensembles for
Hyperspectral Image Classification with
Extended Morphological Attribute Profiles
Junshi Xia, Student Member, IEEE, Mauro Dalla Mura, Member, IEEE,
Jocelyn Chanussot, Fellow, IEEE, Peijun Du, Senior Member, IEEE, and Xiyan He
Abstract
Classification is one of the most important techniques for the analysis of hyperspectral remote sensing images. Nonetheless, many challenging problems arise in this task; two common issues are the curse of dimensionality and the modeling of spatial information. In this work, we present a new general framework to train a series of effective classifiers with spatial information for classifying hyperspectral data. The proposed framework is based on two key observations: 1) the curse of dimensionality and the high feature-to-instance ratio can be alleviated by using Random Subspace (RS) ensembles; 2) the spatial-contextual information can be modeled by extended multi-attribute profiles (EMAPs). Two fast learning algorithms, decision tree (DT) and extreme learning machine (ELM), are selected as the base classifiers. Six RS ensemble methods, including Random subspace with DT (RSDT), Random Forest (RF), Rotation Forest (RoF), Rotation Random Forest (RoRF), RS with ELM (RSELM) and Rotation subspace with ELM (RoELM), are constructed from the multiple base learners. Experimental results on both simulated and real hyperspectral data verify the effectiveness of the RS ensemble methods for the classification of both spectral and spatial information (EMAPs). On the University of Pavia ROSIS image, our proposed approaches, both RSELM and RoELM with EMAPs, achieve state-of-the-art performance, which demonstrates the advantage of the proposed methods. The key parameters of the RS ensembles and their computational complexity are also investigated in this study.
Index Terms
Classification, Hyperspectral data, Random subspace, Extended multi-attribute profiles (EMAPs)
Manuscript received ; revised . This paper is supported by the Natural Science Foundation of China under Grant No. 41471275, the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), the Fundamental Research Funds for the Central Universities and the project XIMRI ANR-BLAN-SIMI2-LS-101019-6-01.
J. Xia is with the Key Laboratory for Satellite Mapping Technology and Applications of National Administration of Surveying, Mapping and Geoinformation of China, Nanjing University, 210023 Nanjing, China and the GIPSA-lab, Grenoble Institute of Technology, 38400 Grenoble, France (e-mail: xiajunshi@gmail.com).
M. Dalla Mura and X. He are with the GIPSA-lab, Grenoble Institute of Technology, 38400 Grenoble, France (e-mail: mauro.dalla-mura@gipsa-lab.grenoble-inp.fr, greenhxy@gmail.com).
J. Chanussot is with the GIPSA-lab, Grenoble Institute of Technology, 38400 Grenoble, France and the Faculty of Electrical and Computer Engineering, University of Iceland, Iceland (e-mail: jocelyn.chanussot@gipsa-lab.grenoble-inp.fr).
P. Du is with the Key Laboratory for Satellite Mapping Technology and Applications of National Administration of Surveying, Mapping and Geoinformation of China, Nanjing University, 210023 Nanjing, China (corresponding author, e-mail: dupjrs@gmail.com).
March 1, 2015
DRAFT
I. INTRODUCTION
In the context of hyperspectral image analysis, classification is an intense field of research and development
[1]–[5]. The difficulties of the supervised classification of high spatial resolution hyperspectral data come from at
least three sources:
• the ratio between the number of features (spectral bands) and the number of available training samples is large;
• the feature set might show some redundancy;
• many approaches exist to exploit spatial information in classification, but a single reliable approach is still unavailable.
A considerable amount of literature has focused on hyperspectral image classification [6]–[11]. One of the most widely used approaches is based on kernel methods, such as support vector machines (SVMs), owing to their good generalization capability and their ability to address small-sample-size problems and the curse of dimensionality [9], [12], [13]. In addition, kernel methods can efficiently define non-linear decision boundaries, dealing with cases in which the data are not linearly separable [12]. However, the selection of kernels and their parameters is still an open question that needs further investigation.
An alternative strategy for providing enhanced classification performance is classifier ensembles, which are deemed to be better than individual classifiers [14]–[18]. A popular ensemble method is the Random Subspace (RS) ensemble [19]. The idea is intuitive and simple: instead of using all features for every individual classifier, each classifier in the ensemble is trained on a different feature subset obtained by randomly sampling the original feature set, and the ensemble integrates the outputs of all individual classifiers with a majority voting rule to obtain the final result. The rationale behind RS ensembles is to break down a complex high-dimensional problem into several lower-dimensional sub-problems. Thus, they can address problems such as the curse of dimensionality and the high feature-to-instance ratio [20].
The most popular RS ensemble method for the classification of high-dimensional data (hyperspectral and multi-date images) is Random Forest [21]–[24]. In addition, Waske et al. [25] developed a random selection-based SVM for the classification of hyperspectral images. Recently, Xia et al. [26], [27] used Rotation Forest to classify hyperspectral remote sensing images. In comparison with Random Forest, Rotation Forest [26]–[28] promotes both the inter-classifier diversity and the accuracy of the individual classifiers by using a feature extraction approach. Therefore, it produces more accurate results than Random Forest in most cases [26], [27].
Recent studies have demonstrated that spectral-spatial approaches can provide more accurate classification results by integrating the spatial and spectral information [29]. The motivation is that spatial features are discriminant features that can well complement the spectral ones. Different approaches can be used to extract spatial features [30]–[34]. Some of them are based on mathematical morphology (MM), a powerful tool for the analysis and processing of geometrical structures in the spatial domain [35]. Pesaresi and Benediktsson introduced the morphological profile (MP) for classifying very high spatial resolution images using a sequence of
geodesic opening and closing operations [30]. The derivative of the morphological profile (DMP) was also defined
in their study. Furthermore, Benediktsson et al. proposed the extended morphological profile (EMP), in which an
MP is computed on each component after reducing the dimensionality of the data [36]. The first few components
and the EMP are stacked together and then classified by a neural network. The main drawback of the method in
[36] is that it is constructed for classification of urban structures and it cannot fully use the spectral information
of hyperspectral data [37]. Fauvel et al. developed a spectral and spatial fusion method based on the EMP and the original hyperspectral data to overcome this problem [37].
In the works of Dalla Mura et al. [38], [39], attribute profiles (APs) were proposed for extracting additional
spatial features for the classification of remote sensing imagery, extending the MP and EMP concepts. APs have
proven to extract more informative spatial features than MPs in the classification of high-resolution images. Since
then, the AP and its extensions have been widely used for the classification and change detection of multi/hyperspectral and LiDAR data. Dalla Mura et al. presented a technique based on Extended APs (EAPs) and independent
component analysis (ICA) for the classification of urban hyperspectral images [40]. Prashanth et al. explored the
use of APs based on three supervised and two unsupervised feature extraction techniques for the classification
of hyperspectral data with SVM and Random Forest classifiers [41]. Pedergnana et al. proposed a classification
approach of features extracted with EAPs computed on both optical and LiDAR images, leading to the integration
of spectral, spatial and elevation data [42]. Pedergnana et al. proposed a novel iterative technique based on genetic
algorithm to select the optimal features from the EMAPs [43]. Falco et al. investigated the performance of change detection in very high resolution images based on APs [44]. Li et al. presented a generalized composite kernel framework for hyperspectral image classification by combining the spectral and the spatial information (EMAPs) [45]. Bernabe et al. proposed a new strategy combining EMAPs and kernel principal component analysis (KPCA) for the classification of multi/hyperspectral images [46]. Song et al. applied a sparse representation-based learning approach to classify EMAPs extracted from hyperspectral data [47].
From the above literature review, it can be seen that when the EMAPs are used for hyperspectral data classification,
two strategies are often adopted:
• applying feature selection/extraction [43] or advanced classifiers [47] to EMAPs;
• integrating EMAPs with spectral information to formulate composite kernels for kernel-based methods [45].
In this paper, we propose an advanced classification scheme based on Random Subspace (RS) ensembles applied to EMAP features. Decision Tree (DT) and Artificial Neural Network (ANN) classifiers are usually adopted as base learners in RS ensembles because they are unstable weak learners: small changes in the training data can lead to potentially large variations in the results, yielding high diversity within the ensemble. Considering the computational cost, we construct the RS ensembles with two fast learning algorithms: the classification and regression tree (CART) and a recently proposed NN classifier, the Extreme Learning Machine (ELM) [48], [49]. EMAPs are generated by combining APs with the first several components extracted by PCA. Six classifier ensembles, namely RSDT, RF, RoF, RoRF, RSELM and RoELM, are considered, as shown in Table I.
The novelty of this work lies in:
TABLE I
INDIVIDUAL AND ENSEMBLE CLASSIFICATION APPROACHES CONSIDERED FOR THE STUDY

Individual classifiers            (Notation)
Decision tree                     DT
Extreme learning machine          ELM

Classifier ensembles              (Notation)
Random subspace with DT           RSDT
Random Forest                     RF
Rotation Forest                   RoF
Rotation Random Forest            RoRF
Random subspace with ELM          RSELM
Rotation subspace with ELM        RoELM
• proposing an ensemble classifier using an ELM base learner and two possible strategies for building the ensembles (i.e., Random and Rotation subspace);
• introducing Rotation Random Forest (RoRF) [50] in the field of hyperspectral remote sensing;
• defining spectral-spatial classification techniques based on the proposed ensembles and on the spatial features computed by EMAPs.
In particular, the performance in a scenario with limited training samples and high input dimensionality, as well as the computational complexity, is investigated in this paper. It should be noted that the spectral information and the EMAPs are directly fed to the Random subspace ensemble methods without any preprocessing (e.g., feature extraction/selection or whitening).
The remainder of the study is organized in seven sections, including this introductory section. Section II presents an introduction to decision trees and their ensembles. The proposed ELM ensemble methods are detailed in Section III. EMAPs are described in Section IV. Section V reports classification results on simulated hyperspectral data. We report the experimental results on two real hyperspectral datasets in Section VI. Section VII concludes the presented work and outlines its perspectives.
II. DECISION TREE AND ITS ENSEMBLES
Let {X, Y} = {(x_1, y_1), ..., (x_n, y_n)} be a set of labeled samples, where x_i ∈ R^D is a pixel and y_i contains the label information¹. Let F be the set of D features. In order to construct an RS ensemble, we collect T classifiers built on subsets of the original features. Each feature subset in the ensemble defines a subspace of cardinality M, and a classifier is trained on this feature subset using all the training samples [19]. The final result is generated by a majority voting rule. Two parameters, the ensemble size T and the cardinality M of the feature subsets, are required by the RS ensemble.
A. Decision tree
Decision tree is a non-parametric supervised learning algorithm used for classification and regression [51].
¹ y_i is different between DT and ELM. In DT, y_i is a scalar taking values in the set of classes of interest Q = {1, ..., Q}, where Q is the total number of classes. In ELM, y_i is a label vector in which the j-th column is set to 1 if the sample belongs to class j while the other columns are set to 0.
It is composed of a root node, a set of internal nodes (splits) and a set of terminal nodes (leaves). In classification, the root node and each internal node have a splitting decision and splitting features associated with them. Class labels are then assigned to the leaves. The creation of a DT from training samples involves two phases. At first, a splitting measure and a splitting attribute are chosen. In the second phase, the records are split among the child nodes based on the decision made in the first phase. This process is applied recursively until a stopping criterion is met [52]. Then, the DT can be used to predict the class label of a new sample. The prediction process starts at the root, and a path to a leaf is traced by performing a splitting decision at each internal node. The class label attached to the leaf is then assigned to the new sample [52].
A critical component of the decision tree induction process is the selection of the split. Different algorithms use various metrics to split the nodes. The most widely used splitting criterion relies on the minimization of the Gini index of the splits [53].
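As a generic sketch of this criterion (not tied to any particular CART implementation), the Gini impurity of a candidate split can be computed as:

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum_q p_q^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gini(y_left, y_right):
    """Weighted Gini impurity of a binary split; the best split minimizes this."""
    n = len(y_left) + len(y_right)
    return (len(y_left) / n) * gini(y_left) + (len(y_right) / n) * gini(y_right)
```

A pure node has impurity 0, while a 50/50 binary node has impurity 0.5, so minimizing the weighted impurity drives the children toward class purity.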
B. Decision tree ensembles
1) Random subspace with DT: The RS ensemble was introduced by Ho [19] for constructing multiple decision trees. The objective of the RS ensemble is to sample the original high dimensional feature space into lower-dimensional subspaces, construct a classifier on each smaller subspace, and finally apply a majority voting rule for the final decision.
2) Random Forest: Random Forest, developed by Breiman [21], combines Bagging [54] and Random subspace [19] to produce a decision tree ensemble. Random Forest is a particular implementation of bagging in which each model is a random tree. A random tree is grown according to the CART algorithm with one exception: at each split, only a small subset of randomly selected features is considered and the best split is chosen from this subset. Since only a portion of the input features is used at each split and no pruning of the tree is done, the computational complexity of Random Forest is relatively light [21]. The computing time is approximately T·√M·n·log(n), where T, M and n represent the number of classifiers, the number of features in a subset and the number of training samples, respectively.
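The per-split randomization that distinguishes Random Forest can be sketched for a single node as follows (an illustrative exhaustive-threshold search; the helper name and structure are hypothetical, not Breiman's code):

```python
import numpy as np

def best_random_split(X, y, n_try, seed=0):
    """Score candidate thresholds only on n_try randomly drawn features
    (Random Forest typically uses n_try close to sqrt(D)) and return the
    best (feature, threshold, weighted Gini)."""
    rng = np.random.default_rng(seed)

    def gini(lab):
        _, c = np.unique(lab, return_counts=True)
        p = c / c.sum()
        return 1.0 - np.sum(p ** 2)

    best = (None, None, np.inf)
    for f in rng.choice(X.shape[1], size=n_try, replace=False):
        for t in np.unique(X[:, f])[:-1]:          # candidate thresholds
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (int(f), float(t), score)
    return best
```

Because each node only examines n_try of the D features, deep trees stay cheap to grow and differ from one another, which is the source of both the light complexity and the diversity noted above.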
3) Rotation Forest: Rotation Forest is a recently proposed ensemble method that builds independent decision trees on different sets of extracted features [26]. The main heuristic of Rotation Forest is to apply feature extraction and to subsequently reconstruct a full, different feature set for each classifier in the ensemble. To do this, the feature space is randomly split into K subsets, each containing M features; principal component analysis (PCA) is then applied to each of the K subsets, and a new set of M linearly extracted features is constructed in each subset from all the principal components. A new training set is then formed by concatenating the M linearly extracted features of each subset.

An individual DT classifier is trained with this training set. A series of individual classifiers is generated by repeating the above steps several times. The final classification result is produced by integrating the results of the individual classifiers with a majority voting rule. Different splits of the features lead to different extracted features, thereby further increasing the diversity already introduced by the bootstrap sampling.
4) Rotation Random Forest: Rotation Random Forest (RoRF) is a variant of Rotation Forest that uses Random Forests as the base classifiers instead of single decision trees [50]. This method has already been evaluated on genomic and proteomic datasets [50], but it has not yet been used for remote sensing image classification. The main training and prediction steps of RoRF are presented in Algorithm 1.

In the training phase, the feature space is first divided into K disjoint subspaces. PCA is performed on each subspace using bootstrapped samples amounting to 75% of the original training set. A transformed training set is generated by rotating the original training set with a sparse matrix R_i^a. An individual classifier is trained on this rotated training set. In the prediction phase, a new sample x* is rotated by R_i^a. Then, the transformed sample, i.e., x* R_i^a, is classified by the ensemble and the class with the maximum number of votes is chosen as the final class. It is important to notice Step 5 in Algorithm 1, in which 75% of the original number of training samples are selected to avoid obtaining the same coefficients for the transformed components when the same features are selected, thus enhancing the diversity among the member classifiers.

Rotation Random Forest can improve the performance of Random Forest by introducing further diversity through a feature extraction step within the ensemble. The base classifiers in Rotation Random Forest are more diverse and accurate than those of Rotation Forest, and this can be beneficial for the ensemble.
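The rotation step shared by RoF, RoRF and RoELM (PCA on K disjoint random feature subsets, assembled block-diagonally and rearranged back to the original feature order) can be sketched as follows; this is an illustrative reading of Algorithm 1, not the authors' code:

```python
import numpy as np

def rotation_matrix(X, K, seed=0):
    """Build a D x D rotation matrix R^a: PCA loadings of K disjoint random
    feature subsets, placed block-diagonally and indexed back to the original
    feature order. Assumes 0.75*n is at least the subset size (cf. the
    singularity remark in Section III-B)."""
    rng = np.random.default_rng(seed)
    n, D = X.shape
    subsets = np.array_split(rng.permutation(D), K)  # K disjoint feature subsets
    Ra = np.zeros((D, D))
    for feats in subsets:
        # bootstrap 75% of the samples before PCA to promote diversity (Step 5)
        idx = rng.choice(n, size=int(0.75 * n), replace=True)
        Xs = X[idx][:, feats]
        Xs = Xs - Xs.mean(axis=0)
        # PCA loadings via SVD of the centered data
        _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
        # writing the block at the original feature positions performs the
        # rearrangement R_i -> R_i^a in one step
        Ra[np.ix_(feats, feats)] = Vt.T
    return Ra  # rotated training set: X @ Ra
```

Since each PCA block is orthogonal, R^a itself is orthogonal: the rotation changes the coordinate system seen by the base learner without distorting distances between samples.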
III. ELM AND ITS ENSEMBLES
ANN is another base learner commonly used for the construction of classifier ensembles. However, the main drawbacks of conventional ANNs are their high computational complexity and low efficiency. To address these shortcomings, the Extreme Learning Machine (ELM) was proposed for learning generalized single hidden layer feed-forward neural networks (SLFNs) without tuning the hidden layers [48], [49].
A. Extreme learning machine
For generalized SLFNs, the output function of ELM is defined as:

f(x_i) = Σ_{j=1}^{δ} β_j h_j(x_i) = h(x_i)β    (1)

where β = [β_1, β_2, ..., β_δ]^T is the vector of weights between the hidden layer of δ nodes and the output node, and h(x_i) = [h_1(x_i), h_2(x_i), ..., h_δ(x_i)] is the hidden-layer output vector of x_i. Specifically, h(·) is the feature mapping from the D-dimensional input space to the δ-dimensional hidden-layer feature space.
The standard SLFN can approximate these n samples with zero error, meaning that Σ_i ‖f(x_i) − y_i‖ = 0. Thus, the n equations can be written compactly as:

Hβ = Y    (2)
Algorithm 1 Rotation RF
Training phase
Input: {X, Y} = {(x_i, y_i)}_{i=1}^{n}: training samples; T: number of classifiers; K: number of subsets (M: number of features in each subset); L: base classifier; the ensemble L = ∅; F: feature set.
Output: the ensemble L
1: for i = 1 : T do
2:   randomly split the features F into K subsets F_{ij}
3:   for j = 1 : K do
4:     extract from X the new training set X_{i,j} with the corresponding features F_{ij}
5:     generate a subset X̂_{i,j} by selecting, with the bootstrap algorithm, 75% of the initial training samples in X_{i,j}
6:     transform X̂_{i,j} to get the coefficients v_{i,j}^{(1)}, ..., v_{i,j}^{(M_j)}
7:   end for
8:   compose the sparse matrix R_i from the above coefficients:

       R_i = [ v_{i,1}^{(1)}, ..., v_{i,1}^{(M_1)}    0                                  ···   0
               0                                      v_{i,2}^{(1)}, ..., v_{i,2}^{(M_2)} ···   0
               ⋮                                      ⋮                                  ⋱    ⋮
               0                                      0                                  ···   v_{i,K}^{(1)}, ..., v_{i,K}^{(M_K)} ]

9:   rearrange R_i to R_i^a with respect to the original feature set
10:  obtain the new training samples {X R_i^a, Y}
11:  build an RF classifier L_i using {X R_i^a, Y}
12:  add the classifier to the current ensemble, L = L ∪ L_i
13: end for
Prediction phase
Input: the ensemble L = {L_i}_{i=1}^{T}; a new sample x*; the rotation matrices R_i^a.
Output: class label y*
1: get the outputs of the ensemble on x* R_i^a
2: y* = argmax_{q ∈ {1,2,...,Q}} Σ_{j : L_j(x* R_j^a) = q} 1
where Y is the target matrix and H is the hidden-layer output matrix:

H = [h(x_1); ...; h(x_n)], i.e., the n × δ matrix whose (i, j)-th entry is h_j(x_i)    (3)
The output weights in equation (2) are given by the following smallest-norm least-squares solution [49]:

β = H⁺Y    (4)

where H⁺ is the Moore-Penrose generalized inverse of the hidden-layer output matrix H.
In ELM, a feature mapping H from input space to a higher dimensional space is needed. The works of [55],
[56] demonstrated that almost all nonlinear piecewise continuous functions can be used as output functions of the
hidden nodes. In this paper, the Sigmoid function is adopted as the nonlinear piecewise continuous function:

g(ω, b, x_i) = 1 / (1 + exp(−(ω·x_i + b)))    (5)

where {ω_j, b_j}_{j=1}^{δ} are randomly generated values following a continuous probability distribution (i.e., ∫g = 1). Thus, h(x_i) is defined based on the nonlinear piecewise continuous function g:

h(x_i) = [g(ω_1, b_1, x_i), ..., g(ω_δ, b_δ, x_i)]    (6)
The training and prediction steps of ELM are listed in Algorithm 2.
Algorithm 2 Extreme learning machine
Training phase
Input: {X, Y} = {(x_i, y_i)}_{i=1}^{n}: training samples; δ: number of nodes in the hidden layer; g: the sigmoid function.
Output: the output weight β.
1: randomly select {ω_1, ..., ω_δ} and {b_1, ..., b_δ}
2: for each training sample x_i, calculate the hidden-layer output: h(x_i) = [g(ω_1, b_1, x_i), ..., g(ω_δ, b_δ, x_i)]
3: calculate the output weight: β = H⁺Y
Prediction phase
Input: a new sample x*; the output weight β; the sigmoid function g; {ω_1, ..., ω_δ} and {b_1, ..., b_δ}.
Output: class label of x*.
1: calculate the hidden-layer output: h(x*) = [g(ω_1, b_1, x*), ..., g(ω_δ, b_δ, x*)]
2: y* = h(x*)β; assign the index of the column with the greatest value to the class label of x*.
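Algorithm 2 amounts to a few lines of linear algebra. A minimal NumPy sketch (with one-hot target rows Y, as in the footnote of Section II; weight initialization details are an assumption) could be:

```python
import numpy as np

def elm_train(X, Y, delta, seed=0):
    """Draw random hidden weights/biases, then solve beta = H^+ Y (Eq. 4)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], delta))   # omega_1 ... omega_delta
    b = rng.normal(size=delta)                 # b_1 ... b_delta
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))     # sigmoid hidden layer, Eqs. (5)-(6)
    beta = np.linalg.pinv(H) @ Y               # Moore-Penrose solution
    return W, b, beta

def elm_predict(x, W, b, beta):
    """Class label = index of the largest output column."""
    h = 1.0 / (1.0 + np.exp(-(x @ W + b)))
    return int(np.argmax(h @ beta))
```

Because the hidden layer is never tuned, training reduces to one pseudoinverse, which explains the learning-speed advantage claimed below.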
Compared to conventional feed-forward ANNs, ELM offers significant advantages: 1) fast learning speed; 2) no need to tune the parameters; 3) better generalization performance; and 4) ease of implementation [48], [49].
B. Proposed ELM ensembles
ELM decreases the learning time dramatically with respect to a conventional ANN thanks to the random selection of the weights and biases of the hidden nodes [55], [56]. However, these parameters are not optimized, so ELM cannot incorporate prior knowledge of the inputs and the generalization error might increase. Consequently, we propose to construct an ensemble of several predictors on the training set using the RS method, in which the parameters of each predictor are randomly selected. In this work, two implementations of the ELM ensembles, Random subspace-based and Rotation subspace-based, are developed for hyperspectral image classification.
1) Random subspace with ELM: Given a training set, the parameters of ELM (activation function and number of hidden nodes), the number of features in a subset (M) and the number of classifiers (T), the RS with ELM algorithm can be summarized by the following three steps (see Algorithm 3): 1) generate a subset of M features from the entire feature set, T times; 2) train an ELM classifier on each subset and obtain T classification results; 3) produce the final classification map by combining the T predictions with a majority voting rule.
Algorithm 3 Random Subspace with ELM
Training phase
Input: {X, Y} = {(x_i, y_i)}_{i=1}^{n}: training samples; T: number of classifiers; L: base classifier; L = ∅: the ensemble; M: number of features in a subspace (M < D); F: feature set.
Output: the ensemble L.
1: for i = 1 to T do
2:   randomly select M features from F without replacement to form a new training set
3:   train an ELM classifier L_i using the new training set
4:   add the classifier to the current ensemble, L = L ∪ L_i
5: end for
Prediction phase
Input: the ensemble L = {L_i}_{i=1}^{T}; a new sample x*.
Output: class label y*
1: run each classifier in the ensemble on x*
2: y* = argmax_{q ∈ {1,2,...,Q}} Σ_{j : L_j(x*) = q} 1
2) Rotation subspace with ELM: The main steps of Rotation subspace with ELM (RoELM) can be summarized as follows:
• divide the feature space into K disjoint subspaces;
• perform PCA on each subspace using bootstrapped samples amounting to 75% of the original training set;
• treat the new training set, obtained by rotating the original training set, as the input to the individual classifier;
• generate the final result by combining the individual classification results with a majority voting rule.

The main difference between RoELM and RoRF is that the ELM classifier is used instead of the RF classifier as the base learner (see Step 11 in the training phase of Algorithm 1). Diversity in RoELM is promoted in three ways: 1) random selection of features; 2) feature extraction applied to the selected features using a bootstrap sampling technique; 3) random selection of the parameters of each ELM classifier. When the number of training samples is smaller than the number of features, the covariance matrix is singular and cannot be inverted. In order to avoid this singularity problem, the value of 0.75 × n should be larger than M in the RoF, RoRF and RoELM ensembles.
IV. EXTENDED MULTI-ATTRIBUTE PROFILES (EMAPS)
Mathematical morphology is a powerful framework for the analysis of spatial information in remote sensing imagery [30], [35]. In particular, attribute profiles have been successfully applied to produce classification maps of remote sensing data [38], [39]. An AP is obtained by applying a sequence of attribute filters (AFs) to a scalar image. AFs are connected operators, that is, they process a gray-level image by keeping or merging its connected components at the different gray levels.

Let φ and γ denote, respectively, an attribute thickening and an attribute thinning based on an arbitrary criterion P_λ. The AP of an image f is obtained by applying several attribute thickening and thinning operators with a given sequence of thresholds {λ_1, λ_2, ..., λ_ε} for the predicate P, as follows [39]:
AP(f) = {φ^{λ_ε}(f), φ^{λ_{ε−1}}(f), ..., φ^{λ_1}(f), f, γ^{λ_1}(f), ..., γ^{λ_{ε−1}}(f), γ^{λ_ε}(f)}    (7)
An AP deals with only one spectral band. If APs were extracted from all the spectral bands of hyperspectral data, the dimensionality of the resulting profile would become extremely high. In order to address this problem, Dalla Mura et al. proposed to consider only the first few principal components of the hyperspectral data [39]; however, any other feature extraction or selection technique could also be used [42]. Thus, the expression of an EAP computed on the first C PCs of the original hyperspectral data [39] is given by:

EAP = {AP(PC_1), AP(PC_2), ..., AP(PC_C)}    (8)
An EMAP is composed of m different EAPs based on different attributes {a_1, a_2, ..., a_m}:

EMAP = {EAP_{a_1}, EAP′_{a_2}, ..., EAP′_{a_m}}    (9)

where EAP′_a = EAP_a \ {PC_1, PC_2, ..., PC_C}, i.e., the PCs are kept only once in the stack.
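Equations (7)-(9) describe a purely structural stacking. A sketch of that bookkeeping is given below; `thicken` and `thin` stand for attribute thickening/thinning operators (e.g., from a morphological library) and are deliberately left abstract:

```python
import numpy as np

def attribute_profile(f, thicken, thin, lambdas):
    """Eq. (7): thickenings with decreasing lambda, the image itself,
    then thinnings with increasing lambda."""
    return ([thicken(f, l) for l in reversed(lambdas)] + [f]
            + [thin(f, l) for l in lambdas])

def emap(pcs, attributes):
    """Eqs. (8)-(9): one EAP per attribute, with the PCs kept only once.
    `attributes` maps an attribute name to (thicken, thin, lambdas)."""
    stack = list(pcs)                        # PC_1 ... PC_C appear a single time
    for thicken, thin, lambdas in attributes.values():
        for pc in pcs:
            ap = attribute_profile(pc, thicken, thin, lambdas)
            stack.extend(img for img in ap if img is not pc)  # EAP' drops the PCs
    return np.stack(stack)
```

With C principal components, m attributes and ε thresholds each, the stack has C + 2εmC bands, which is exactly the dimensionality the RS ensembles above are designed to absorb.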
Although a wide variety of attributes can be used to construct APs, only the area and the standard deviation attributes are considered in this study. Fig. 1 presents the general steps of the construction of EMAPs using these two attributes. First, PCA is performed on the original hyperspectral image and the first components accounting for over 99% of the cumulative eigenvalues are retained. Then, APs with the area and standard deviation attributes are computed on the retained features and the output features are concatenated into a stacked vector to construct an EMAP.
According to [43], the thresholds λ_s are initialized so as to cover a reasonable amount of deviation in each individual feature, which is mathematically given by:

λ_s(F_i) = (μ_i / 100) {τ_min, τ_min + ε_s, τ_min + 2ε_s, ..., τ_max}    (10)

where F_i is the i-th feature of the image and μ_i is the mean value of the i-th feature. The values of τ_min, τ_max and ε_s are 2.5%, 27.5% and 2.5%, respectively, which leads to 11 thinning and 11 thickening operations.
The thresholds for the area attribute are constructed as follows:

λ_a(F_i) = (100 / ν) {α_min, α_min + ε_a, α_min + 2ε_a, ..., α_max}    (11)

where ν is the spatial resolution of the remote sensing image. The values of α_min, α_max and ε_a are 1, 14 and 1, respectively. The EAP for the area attribute thus contains 14 thinning and 14 thickening operations for each feature.
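Under these settings, the two threshold sequences of Eqs. (10) and (11) can be computed directly (a plain transcription of the formulas as given above, with μ_i and ν supplied by the user):

```python
import numpy as np

def std_thresholds(mu_i, t_min=2.5, t_max=27.5, eps=2.5):
    """Eq. (10): lambda_s = mu_i/100 * {t_min, t_min+eps, ..., t_max} (11 values)."""
    return mu_i / 100.0 * np.arange(t_min, t_max + eps / 2, eps)

def area_thresholds(nu, a_min=1, a_max=14, eps=1):
    """Eq. (11): lambda_a = 100/nu * {a_min, a_min+eps, ..., a_max} (14 values)."""
    return 100.0 / nu * np.arange(a_min, a_max + eps / 2, eps)
```

Scaling by μ_i makes the standard deviation thresholds adaptive to each feature's dynamic range, while dividing by ν expresses the area thresholds independently of the pixel size.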
Fig. 1. The construction of EMAPs using the area (A) and standard deviation (S) attributes. First, PCA is performed on the original hyperspectral image and the first features accounting for over 99% of the cumulative eigenvalues are kept. Then, APs with the area and standard deviation attributes are computed on these features and the output features are concatenated into a stacked vector to construct the EMAPs.
V. EXPERIMENTS WITH SIMULATED DATA
In this section, a simple simulated hyperspectral dataset is used to evaluate the classification performance of the proposed methods, including the RoRF ensemble, the ELM ensembles and the spectral-spatial strategy.

A synthetic image is generated by a linear mixture model with Q = 4 spectra:

x_i = Σ_{q=1}^{Q} m_q s_i^q + n_i    (12)
where x_i is the simulated mixed pixel and {m_q} are the spectral signatures obtained from the USGS digital library².
The spatial information is generated using a multi-level logistic (MLL) distribution with a smoothness parameter equal to 2. The simulated image is composed of 128×128 pixels with 224 spectral bands. Assume that x_i has class label y_i = q_i; then s_i^{q_i} is the abundance of the objective class and s_i^q (q ≠ q_i) are the abundances of the remaining endmembers contributing to the mixed pixel, drawn from the uniform distribution over the simplex. In this section, we take s_i^{q_i} = s and Σ_{q∈Q, q≠q_i} s_i^q = 1 − s, and we use the same value s = 0.7 for all pixels.
Furthermore, zero-mean Gaussian noise with covariance σ²I, i.e., n_i ∼ N(0, σ²I), is added to the simulated image. In particular, σ² is set to 0.8 in order to create a very challenging classification problem. More details about how this dataset is generated can be found in [57].
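A hedged sketch of this generator (random or user-supplied vectors stand in for the USGS signatures, and a user-supplied label map stands in for the MLL field) is:

```python
import numpy as np

def simulate_pixels(labels, signatures, s=0.7, sigma2=0.8, seed=0):
    """Eq. (12): each pixel mixes its own class signature with abundance s,
    spreads the remaining 1-s uniformly over the simplex of the other classes,
    and adds zero-mean Gaussian noise of variance sigma^2."""
    rng = np.random.default_rng(seed)
    Q, D = signatures.shape
    X = np.empty((len(labels), D))
    for i, q in enumerate(labels):
        a = np.zeros(Q)
        a[q] = s
        others = [c for c in range(Q) if c != q]
        # uniform point on the simplex of the other classes, scaled to sum to 1-s
        a[others] = (1.0 - s) * rng.dirichlet(np.ones(Q - 1))
        X[i] = a @ signatures + rng.normal(0.0, np.sqrt(sigma2), D)
    return X
```

With s = 0.7 and σ² = 0.8, 30% of each pixel's energy comes from the wrong classes and a large noise floor is added on top, which is what makes the problem challenging.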
We conducted four different experiments with the simulated hyperspectral image in order to investigate several relevant aspects of our proposed framework. In all experiments, 40 samples per class (160 samples in total) are selected as the training set. In order to increase the statistical significance of the results, the reported means and standard deviations are obtained from 10 Monte Carlo runs. The experiments on the simulated data consist of:
• In the first experiment, we evaluate the classification performance of RoRF with respect to the other decision tree ensembles, including RSDT, RF and RoF. The impact of the parameters, such as the number of trees in RF and RoRF and the number of features in a subset (M), is analyzed.
• In the second experiment, we compare the ELM ensembles with a standard ELM classifier. The key parameters are also analyzed.
• In the third experiment, we compare the ELM ensembles with the DT ensembles.
• In the fourth experiment, we report the classification performance of the proposed spectral-spatial classification strategy.
In order to analyze the ensembles clearly, we adopt the following measures to evaluate the performance:
• Overall accuracy (OA) of the ensemble.
• Average of the overall accuracies (AOA) of the individual classifiers.
• Diversity among the individual classifiers within the ensemble. In this paper, we select the coincident failure diversity (CFD) as the diversity measure [58]³; please refer to [58] for the details of CFD. Higher values of
² https://engineering.purdue.edu/biehl/MultiSpec/.
³ http://pages.bangor.ac.uk/~mas00a/book_wiley/matlab_code/diversity/demo_diversity.html.
13
TABLE II
OVERALL ACCURACIES (IN PERCENT), AVERAGE OF OVERALL ACCURACIES (IN PERCENT) AND DIVERSITY OBTAINED FOR DIFFERENT DT ENSEMBLES WHEN APPLIED TO THE SIMULATED HYPERSPECTRAL DATA (T = 20 AND M = 10).

Classifiers   OA           AOA          Diversity
RSDT          53.52±1.15   35.83±0.42   0.38±0.0044
RF            57.81±0.80   37.12±0.51   0.39±0.0053
RoF           79.50±1.09   57.07±1.24   0.61±0.0120
RoRF          85.15±0.73   67.75±1.06   0.70±0.0089
TABLE III
OVERALL ACCURACIES (IN PERCENT), AVERAGE OF OVERALL ACCURACIES (IN PERCENT) AND DIVERSITY OBTAINED FOR RoRF WITH DIFFERENT NUMBER OF TREES IN RF (T = 20 AND M = 10).

                              Number of trees in RF
                    1                         5                         10
            RF           RoRF          RF           RoRF          RF           RoRF
OA          39.31±1.64   72.80±1.59    44.36±1.58   79.85±0.95    52.25±0.91   82.84±0.87
AOA         N/A          46.54±0.80    36.97±1.24   57.25±0.82    37.64±0.96   60.60±0.89
Diversity   N/A          0.49±0.0084   0.46±0.0106  0.62±0.0087   0.42±0.0040  0.68±0.0088
CFD indicate stronger diversity within the ensemble.
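As a rough illustration of this diversity measure, the following sketch computes coincident failure diversity along the lines of the definition usually attributed to [58]; the function name and the boolean input layout are assumptions made here for illustration, not the referenced Matlab implementation.

```python
import numpy as np

def coincident_failure_diversity(failures):
    """Coincident failure diversity (CFD) sketch: `failures` is a
    (samples, T) boolean array, True where a member classifier
    misclassifies the sample.  p[n] is the fraction of samples on which
    exactly n of the T members fail simultaneously.  CFD is 0 when no
    member ever fails (or when all failures coincide) and approaches 1
    when at most one member fails on any given sample."""
    failures = np.asarray(failures, dtype=bool)
    n_samples, T = failures.shape
    counts = failures.sum(axis=1)                      # failing members per sample
    p = np.bincount(counts, minlength=T + 1) / n_samples
    if p[0] == 1.0:
        return 0.0
    n = np.arange(1, T + 1)
    return float(np.sum((T - n) / (T - 1) * p[1:]) / (1.0 - p[0]))
```

Under this reading, the RoRF value of 0.70 in Table II says that its member errors rarely coincide, which is exactly what majority voting can exploit.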
A. Experiment 1
In this experiment, the number of base classifiers (T) and the number of features in a subset (M) are set to 20 and 10, respectively. A non-parametric decision tree learning technique, classification and regression tree (CART), is used to construct the decision tree ensembles [59]; the impurity measure used for variable selection in CART is the Gini index. Table II shows the overall accuracies, the average of the overall accuracies of the individual classifiers, and the diversities obtained for the different decision tree ensemble classifiers. Greater values of AOA and diversity usually lead to better performance of the ensemble. From Table II, RoRF achieves the highest values of AOA and diversity, resulting in the best classification result. Then, we analyze the performance of the decision tree ensembles for different values of M with T = 20. Fig. 2 shows the OAs, AOAs and diversities obtained by RSDT, RF, RoF and RoRF as a function of M. The best performance is achieved by the proposed RoRF ensemble approach, which yields the best OA, AOA and diversity for all values of M. From the figure, we can observe that as M becomes larger, all the DT ensembles tend to perform better. Furthermore, we also studied the classification performance of RoRF with different numbers of trees in RF. The statistics are reported in Table III. The OA, AOA and diversity of RoRF increase as the number of trees increases. Notice that good performance is achieved by RoRF with only 10 decision trees in RF, which obtains an OA of 82.84%, even higher than that of RoF with 20 trees. Therefore, in order to save time and memory, we can build the RoRF ensemble with a small number of trees in RF.
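The rotation strategy that distinguishes RoF and RoRF from plain random subspace selection can be sketched roughly as follows. This is a simplified reading of the Rotation Forest idea (the class sub-sampling step of the full procedure is omitted, and all names are illustrative): features are split into disjoint subsets, PCA is run on a bootstrap sample of each subset, and the loadings are assembled into a block-diagonal rotation applied to the data before training each base learner.

```python
import numpy as np

def rotation_matrix(X, M, rng=None):
    """Build one Rotation Forest-style rotation: partition the d features
    into disjoint subsets of size M, run PCA (via SVD) on a 75% bootstrap
    sample of each subset, and place the principal-axis loadings in a
    block-diagonal matrix aligned with the original feature order.  All
    components are kept so the rotation is invertible."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    perm = rng.permutation(d)
    R = np.zeros((d, d))
    for start in range(0, d, M):
        idx = perm[start:start + M]
        boot = rng.integers(0, n, size=max(n * 3 // 4, 1))   # bootstrap rows
        sub = X[boot][:, idx]
        sub = sub - sub.mean(axis=0)
        _, _, Vt = np.linalg.svd(sub, full_matrices=False)   # PCA loadings
        comps = np.zeros((len(idx), len(idx)))
        comps[:Vt.shape[0], :] = Vt[:len(idx)]
        R[np.ix_(idx, idx)] = comps.T
    return R   # each base learner is then trained on X @ R
```

Because each member draws its own permutation and bootstrap samples, the rotations differ across members, which is what raises both the member accuracy (full information is preserved) and the diversity reported in Tables II and III.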
B. Experiment 2
In this experiment, T, M and δ are set to 20, 10 and 256, respectively. The Sigmoid function is selected as the hidden-node activation function. The OAs, AOAs and diversities obtained by ELM and the proposed ELM ensemble classifiers
Fig. 2. (a) OAs obtained by RSDT, RF, RoF and RoRF with different values of M . (b) AOAs obtained by RSDT, RF, RoF and RoRF with
different values of M . (c) Diversities obtained by RSDT, RF, RoF and RoRF with different values of M (T = 20).
TABLE IV
OVERALL ACCURACIES (IN PERCENT), AVERAGE OF OVERALL ACCURACIES OF INDIVIDUAL CLASSIFIERS (IN PERCENT) AND DIVERSITY OBTAINED FOR DIFFERENT ELM ENSEMBLES WHEN APPLIED TO THE SIMULATED HYPERSPECTRAL DATA (M = 10).

Classifiers   OA           AOA          Diversity
ELM           53.52±1.15   N/A          N/A
RSELM         67.75±1.00   43.80±0.54   0.46±0.0057
RoELM         79.73±1.67   58.98±1.19   0.62±0.0115
are shown in Table IV. Note that the proposed RoELM ensemble classifier produced the best result. This is reasonable, since RoELM introduces the rotation strategy into the ensemble, which can both increase the accuracies of the member classifiers and the diversity within the ensemble. Fig. 3 plots the OAs, AOAs and diversities of the ELM ensembles with respect to the number of features in a subset (M). For these experiments, T and δ are fixed to 20 and 256, respectively. Fig. 4 reports the OAs, AOAs and diversities of the ELM ensembles for different values of the parameter δ; T and M are fixed to 20 and 10, respectively. From the two figures, we have the following observations: 1) RoELM achieves the best results in all cases, which demonstrates that RoELM is an effective ensemble method; 2) the effect of M is consistent with the preliminary test in Fig. 2: the OAs, AOAs and diversities of the ELM ensembles increase with the number of features in a subset (M); 3) for this simulated data, ELM and its ensembles produce the best classification performance when δ = 64. Beyond this value, larger δ results in lower AOAs and diversities of RSELM and lower diversities of RoELM; in contrast, the AOAs of RoELM increase as δ increases. The reason is that a complex network (high value of δ) may overfit the training data of the member classifiers in RSELM.
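A minimal sketch of the ELM base learner may clarify why its training is fast and why its random hidden parameters add diversity between RSELM members. This is not the referenced ELM package; function names and the one-hot target convention are illustrative assumptions.

```python
import numpy as np

def elm_train(X, Y, delta=256, rng=None):
    """Minimal extreme learning machine sketch: the input weights and
    biases of the delta hidden nodes are drawn at random (this random
    draw is what varies between ensemble members), the hidden layer uses
    the sigmoid activation, and only the output weights are solved for
    in closed form by least squares / pseudo-inverse.  Y is one-hot
    encoded, shape (n, classes)."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    W = rng.standard_normal((d, delta))        # random input weights
    b = rng.standard_normal(delta)             # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))     # sigmoid hidden outputs
    beta = np.linalg.pinv(H) @ Y               # output weights, no iteration
    return W, b, beta

def elm_predict(X, model):
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)
```

The single pseudo-inverse is why ELM training is much faster than iterative learners; the risk noted above is that a very large δ lets this interpolation overfit small training sets.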
C. Experiment 3
Table V gives the OAs, AOAs and diversities obtained for the RSDT, RSELM, RoF and RoELM ensembles. In order to make fair comparisons between the DT ensembles and the ELM ensembles, RSDT shares the same subspaces with RSELM, and RoF shares the same rotations with RoELM. In these experiments, T, M and δ are set to 20, 10 and 256,
Fig. 3. (a) OAs obtained by ELM, RSELM and RoELM with different values of M . (b) AOAs obtained by RSELM and RoELM with different
values of M . (c) Diversities obtained by RSELM and RoELM with different values of M (T = 20 and δ = 256).
Fig. 4. (a) OAs obtained by ELM, RSELM and RoELM with different values of δ. (b) AOAs obtained by RSELM and RoELM with different
values of δ. (c) Diversities obtained by RSELM and RoELM with different values of δ (T = 20 and M = 10).
TABLE V
OVERALL ACCURACIES (IN PERCENT), AVERAGE OF OVERALL ACCURACIES OF MEMBER CLASSIFIERS (IN PERCENT) AND DIVERSITY OBTAINED FOR RSDT, RSELM, RoF AND RoELM ENSEMBLES WHEN APPLIED TO THE SIMULATED HYPERSPECTRAL DATA (T = 20, M = 10 AND δ = 256).

Classifiers   OA           AOA          Diversity
RSDT          53.52±1.15   35.83±0.42   0.38±0.0044
RSELM         67.75±1.00   43.80±0.54   0.46±0.0057
RoF           79.50±1.09   57.07±1.24   0.61±0.0120
RoELM         79.73±1.67   58.98±1.19   0.62±0.0115
respectively. Compared to the DT ensembles (RSDT and RoF), the proposed ELM ensembles (RSELM and RoELM) adopt two strategies to improve the classification performance: one is to use ELM as the base learner to improve the individual classification accuracies, and the other is to exploit the random selection of parameters in each ELM classifier to promote diversity within the ensemble. In addition, RoELM and RoF are superior to RSELM and RSDT. The sensitivity of the classification performance to the parameters is examined in Sections V-B and V-C.
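The shared-subspace comparison above rests on the generic random subspace scheme, which can be sketched as follows; the nearest-centroid rule stands in for the DT or ELM base learner of the paper and is purely illustrative, as are the function names.

```python
import numpy as np

def random_subspace_ensemble(X, y, T=20, M=10, rng=None):
    """Generic random subspace (RS) ensemble sketch: each of the T
    members sees only M randomly selected features of the input; the
    stored subspaces are what RSDT and RSELM would share for a fair
    comparison.  A nearest-centroid rule is the placeholder base learner."""
    rng = np.random.default_rng(rng)
    classes = np.unique(y)
    members = []
    for _ in range(T):
        feats = rng.choice(X.shape[1], size=M, replace=False)
        cents = np.array([X[y == c][:, feats].mean(axis=0) for c in classes])
        members.append((feats, cents))
    return classes, members

def rs_predict(X, classes, members):
    """Combine the member predictions by majority voting."""
    votes = np.zeros((len(X), len(classes)), dtype=int)
    for feats, cents in members:
        d = ((X[:, feats][:, None, :] - cents[None]) ** 2).sum(axis=2)
        votes[np.arange(len(X)), d.argmin(axis=1)] += 1
    return classes[votes.argmax(axis=1)]
```

Training two ensembles on the same stored `feats` lists, with different base learners, is the experimental control used in this comparison.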
D. Experiment 4
Although random subspace ensembles can provide better classification results than single classifiers, the map (shown in Fig. 5(b)) still looks noisy due to the use of spectral information only. In order to further improve the results, the spatial information should be considered. In this work, we propose to use EMAPs, which offer the potential to model structural information in great detail through the use of different types of attributes. In this experiment,
the proposed RoELM for spectral and spatial classification is considered. The first four components resulting from
PCA (which comprise more than 99% of the data variance) were used and the EMAPs consisted of 204 features.
Fig. 5 shows the ground truth of the simulated image and classification maps of RoELM with spectral and spatial
information, respectively. The OAs, AOAs and diversities are also listed in Table VI. The classification accuracy of RoELM with EMAPs is significantly higher than that of RoELM with spectral information, and the map produced by RoELM with EMAPs is smoother than the one produced by RoELM with spectral information when compared to the ground truth. In addition, we also studied the effect of M and δ on the classification performance of RoELM with EMAPs. For this dataset, RoELM with EMAPs obtains a high overall accuracy with M = 10 (OA = 99.30%); larger values of M do not improve the accuracies significantly. Similar to the results of RoELM with spectral information, the classification accuracy of RoELM with EMAPs increases gradually as δ increases, but decreases dramatically once δ exceeds 64.
Summarizing, the experiments conducted with the simulated data indicate that random subspace ensembles, with or without EMAPs, achieve better performance than single classifiers in highly mixed and noisy environments and with a limited number of training samples. In particular, the proposed RoRF and ELM ensembles show good performance in the classification of hyperspectral data.
However, the performance of the RS ensembles has been shown to depend on the setting of the parameters M and δ. In order to obtain good results, these parameters should be optimized; in this work, they were fine-tuned manually.
Although it is encouraging to observe positive classification results using the proposed methods on simulated hyperspectral data, we perform further analysis with real hyperspectral scenes, including comparisons with other state-of-the-art methods, in the next section in order to fully substantiate the proposed methods.
TABLE VI
OVERALL ACCURACIES (IN PERCENT), AVERAGE OF OVERALL ACCURACIES OF MEMBER CLASSIFIERS (IN PERCENT) AND DIVERSITY OBTAINED FOR RoELM ENSEMBLES WHEN APPLIED TO SPECTRAL AND SPATIAL INFORMATION OF THE SIMULATED HYPERSPECTRAL DATA.

Classifiers                       OA           AOA          Diversity
RoELM with spectral information   79.73±1.67   58.98±1.19   0.62±0.0115
RoELM with EMAPs                  99.30±0.08   97.12±0.13   0.74±0.0221
Fig. 5. (a) Image of class labels for a simulated image. (b) Classification map of RoELM with spectral information (OA = 79.67%). (c)
Classification map of RoELM with EMAPs (OA = 99.42%).
VI. EXPERIMENTS WITH REAL HYPERSPECTRAL DATASETS
In this section, the proposed approaches are evaluated using two real hyperspectral datasets. Two individual
classifiers, DT and ELM, and six ensemble methods, RSDT, RF, RoF, RoRF, RSELM and RoELM are applied to
classify the spectral information and EMAPs of the hyperspectral data. CART is used to build the decision tree ensembles, and the Gini index is adopted as the impurity measure for variable selection [59]. The Sigmoid function is selected as the hidden-node activation function in ELM and its ensembles.
In this work, only the first four components resulting from PCA (which comprise more than 99% of the data
variance) were used, and the EMAPs consisted of 204 features. The reported results are the means of ten Monte Carlo runs.
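The choice of the first four principal components can be reproduced in spirit with a small sketch that keeps components up to a cumulative explained-variance threshold; the function name and interface are illustrative assumptions.

```python
import numpy as np

def pca_components_for_variance(X, threshold=0.99):
    """Select the smallest number k of principal components whose
    cumulative explained variance reaches the threshold (the paper keeps
    the first four components, covering more than 99% of the variance
    for these scenes), and return the data projected onto them."""
    Xc = X - X.mean(axis=0)                        # center the data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = (S ** 2) / np.sum(S ** 2)              # explained-variance ratios
    k = int(np.searchsorted(np.cumsum(ratio), threshold) + 1)
    return Xc @ Vt[:k].T, k
```

The EMAPs are then built on these k components rather than on the full spectral cube, which keeps the attribute filtering tractable.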
We used the following measures to evaluate the performances of the different classification methods:
• Overall accuracy (OA): the percentage of correctly classified samples.
• Average accuracy (AA): the average percentage of correctly classified samples per class.
• Kappa coefficient (κ): the percentage agreement corrected by the level of agreement that could be expected due to chance alone.
• Computation time: all methods were implemented in Matlab on a computer with an Intel(R) Xeon(R) 2-CPU, 2.8 GHz processor and 12 GB of memory. The Random Forest implementation was downloaded from http://code.google.com/p/randomforest-matlab/. The source code of ELM can be accessed at http://www.ntu.edu.sg/home/egbhuang/elm codes.html.
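The first three measures can all be computed from a single confusion matrix. The following is a minimal sketch with illustrative names, unrelated to the Matlab implementations cited above.

```python
import numpy as np

def accuracy_measures(y_true, y_pred):
    """Overall accuracy (OA), average accuracy (AA) and Cohen's kappa
    from the confusion matrix of true vs. predicted labels."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    k = len(classes)
    idx = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((k, k))
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1
    n = cm.sum()
    oa = np.trace(cm) / n                              # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))         # mean per-class accuracy
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / n ** 2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```

Note that AA weights every class equally, so rare classes (such as Oats in the Indian Pines scene) affect AA far more than OA.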
Fig. 6.
(a) Three-band color composite of AVIRIS image. (b) Ground truth.
TABLE VII
INDIAN PINES AVIRIS IMAGE: CLASS NAME AND NUMBER OF SAMPLES IN GROUND TRUTH

Number   Class Name               Ground Truth
1        Alfalfa                  54
2        Corn-no till             1434
3        Corn-min till            834
4        Bldg-Grass-Tree-Drives   234
5        Grass/pasture            497
6        Grass/trees              747
7        Grass/pasture-mowed      26
8        Corn                     489
9        Oats                     20
10       Soybeans-no till         968
11       Soybeans-min till        2468
12       Soybeans-clean till      614
13       Wheat                    212
14       Woods                    1294
15       Hay-windrowed            380
16       Stone-steel towers       95
A. Hyperspectral datasets
Two hyperspectral remote sensing images are used to assess the performance of the proposed methods.
1) Indian Pines AVIRIS image: The first hyperspectral image was recorded by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over the Indian Pines site in Northwestern Indiana, USA. This scene, which comprises 220 spectral bands in the wavelength range from 0.4 to 2.5 µm with a spectral resolution of 10 nm, is composed of 145 × 145 pixels, and the spatial resolution is 20 m/pixel. Fig. 6 shows the three-band color composite image and ground truth of the AVIRIS hyperspectral data. Table VII gives the class names and the number of ground-truth samples of the AVIRIS hyperspectral data.
Fig. 7.
(a) Three-band color composite image of ROSIS data. (b) Reference map.
2) University of Pavia ROSIS image: The second experiment was carried out on the University of Pavia image of an urban area, acquired by the Reflective Optics System Imaging Spectrometer (ROSIS-03) optical airborne sensor. Nine land cover classes were considered for classification. The original image is composed of 610 × 340 pixels, with a spatial resolution of 1.3 m/pixel and 115 spectral bands. In this work, 12 noisy channels were removed and the remaining 103 spectral bands are used for the investigation. Fig. 7 shows the three-band color composite image and reference map of the University of Pavia data. The class names and the numbers of training and test samples of the ROSIS image are presented in Table VIII.
B. Results of Indian Pines AVIRIS image
Tables IX and X present the classification results obtained for the individual classifiers and the RS ensemble methods using different numbers of training samples when the spectral information and the EMAPs are used as input, respectively. Average accuracies for each classifier are also given in parentheses. According to the studies in [27] and [60], the parameters used for each ensemble classifier are shown in Table XI⁴. As shown in Tables IX and X, the RS ensemble methods

⁴ For 5 samples per class, M is set to be 55 in RoF, RoRF and RoELM ensembles with spectral information.
TABLE VIII
UNIVERSITY OF PAVIA ROSIS IMAGE: CLASS NAME AND NUMBER OF TRAINING AND TEST SAMPLES

                                Number of samples
Number   Class Name       Train    Test
1        Bricks           524      3682
2        Shadows          514      947
3        Metal Sheets     375      1345
4        Bare Soil        540      5029
5        Trees            231      3064
6        Meadows          532      18649
7        Gravel           265      2099
8        Asphalt          548      6631
9        Bitumen          392      1330
TABLE IX
OVERALL ACCURACIES AND AVERAGE ACCURACIES (IN PARENTHESES) OBTAINED FOR DIFFERENT CLASSIFICATION ALGORITHMS USING DIFFERENT SIZES OF TRAINING SET WHEN APPLIED TO THE SPECTRAL INFORMATION OF INDIAN PINES AVIRIS HYPERSPECTRAL DATA.

Samples per class | DT | RSDT | RF | RoF | RoRF | ELM | RSELM | RoELM
5  | 29.64±3.61(39.62) | 36.41±3.49(47.17) | 42.87±3.65(53.79) | 47.79±3.23(61.05) | 51.14±3.22(62.66) | 51.15±2.58(65.8) | 55.39±3.08(69.69) | 58.72±2.06(71.87)
10 | 38.85±4.03(49.87) | 46.57±2.48(58.33) | 49.89±3.16(60.86) | 57.33±2.26(69.95) | 58.39±1.78(69.58) | 57.17±1.92(71.06) | 65.11±1.83(76.35) | 69.84±1.27(79.67)
15 | 40.43±2.42(51.79) | 49.62±1.18(61.91) | 51.13±1.52(63.73) | 63.03±1.65(73.79) | 61.49±1.71(73.01) | 58.51±1.80(71.99) | 68.69±1.59(79.7) | 72.93±1.07(83.43)
20 | 43.13±2.35(55.15) | 53.82±2.07(65.78) | 55.52±2.50(66.92) | 68.98±2.71(79.89) | 67.56±2.55(78.54) | 57.68±2.02(70.75) | 70.73±1.18(80.36) | 75.95±0.82(85.22)
25 | 46.59±1.32(57.54) | 55.71±1.18(66.78) | 57.23±1.56(68.52) | 71.81±1.80(81.19) | 70.17±1.46(79.43) | 62.03±1.58(76.34) | 72.67±0.97(83.71) | 77.13±0.94(86.51)
30 | 47.49±1.35(58.24) | 58.23±2.36(67.98) | 60.07±2.10(70.47) | 72.65±2.27(81.36) | 70.79±1.77(80.03) | 61.88±1.10(73.96) | 75.61±1.11(85.49) | 78.24±0.67(86.87)
35 | 47.90±2.56(57.75) | 59.82±1.31(69.12) | 61.48±1.59(71.10) | 74.36±0.58(82.54) | 72.78±1.17(81.23) | 66.73±1.62(76.98) | 74.86±1.53(84.31) | 78.89±1.07(87.17)
40 | 49.05±2.06(58.33) | 60.85±1.27(70.10) | 62.66±1.52(71.95) | 74.46±1.19(82.97) | 73.45±1.53(81.64) | 66.44±1.13(77.79) | 75.46±0.80(85.05) | 80.08±0.5(88.24)
45 | 50.42±2.14(60.79) | 62.22±1.24(72.08) | 63.68±0.93(73.25) | 76.58±1.26(84.90) | 74.81±1.37(83.37) | 67.46±1.05(77.31) | 77.43±0.31(86.64) | 80.34±0.25(88.08)
50 | 50.51±2.63(59.48) | 62.63±0.99(71.60) | 64.20±0.58(77.51) | 75.96±1.06(84.39) | 74.56±1.26(82.21) | 67.65±1.69(77.11) | 77.85±1.09(87.57) | 81.19±0.7(89.00)
exhibit the potential to improve the classification performance using both spectral and spatial information. The proposed RoELM outperforms ELM, RSELM and the other decision tree ensembles in terms of classification accuracy in all cases. By promoting diversity through feature extraction, the rotation subspace classifiers, including RoF, RoRF and RoELM, are superior to the ensemble classifiers RF, RSDT and RSELM.
In order to show the performance of the RS ensemble methods under different training conditions and scenarios, in the second experiment we evaluated the classification accuracies of the RS ensemble approaches using a fixed number of training samples: 10% of the labeled samples per class (a total of 1036 samples) are used for training and the remaining labeled samples for testing. Tables XII and XIII provide the
TABLE X
OVERALL ACCURACIES AND AVERAGE ACCURACIES (IN PARENTHESES) OBTAINED FOR DIFFERENT CLASSIFICATION ALGORITHMS USING DIFFERENT SIZES OF TRAINING SET WHEN APPLIED TO THE EMAPS OF INDIAN PINES AVIRIS HYPERSPECTRAL DATA.

Samples per class | DT | RSDT | RF | RoF | RoRF | ELM | RSELM | RoELM
5  | 55.07±6.68(65.95) | 57.48±5.51(70.24) | 65.69±5.58(77.51) | 66.22±4.41(76.91) | 70.81±4.59(81.64) | 73.24±5.28(81.44) | 74.24±5.38(81.55) | 75.97±4.01(83.59)
10 | 70.22±5.01(80.19) | 73.84±4.59(82.36) | 77.21±3.43(85.77) | 78.29±2.94(76.91) | 80.98±2.45(88.31) | 82.47±2.71(87.17) | 83.36±2.33(87.69) | 85.19±2.32(89.43)
15 | 76.04±1.12(83.34) | 80.34±2.28(85.96) | 83.18±1.78(89.26) | 83.26±1.9(88.25) | 86.01±1.65(90.78) | 83.92±2.94(88.02) | 85.76±2.67(89.25) | 87.45±2.05(91.4)
20 | 80.82±2.32(85.98) | 82.69±1.75(87.84) | 84.46±1.80(89.44) | 85.36±1.53(89.29) | 87.41±1.45(91.55) | 83.98±2.41(87.31) | 87.67±1.49(90.15) | 89.35±1.09(92.19)
25 | 81.62±2.24(87.64) | 84.37±2.14(89.95) | 87.54±1.34(92.1) | 87.92±1.31(92.10) | 89.30±1.83(93.48) | 87.02±2.60(90.7) | 88.37±1.69(91.61) | 90.70±1.33(94.02)
30 | 84.17±2.23(88.38) | 87.07±2.82(90.27) | 88.66±1.06(92.94) | 89.32±1.67(92.60) | 91.05±1.60(94.46) | 89.65±1.78(92.28) | 90.73±1.60(92.82) | 92.14±1.45(94.71)
35 | 85.13±3.11(89.15) | 87.51±2.25(91.72) | 88.93±1.33(92.66) | 89.84±1.06(93.07) | 91.29±0.74(94.00) | 89.48±1.70(91.73) | 90.76±1.20(92.45) | 92.77±0.98(94.99)
40 | 85.14±1.71(88.52) | 87.72±1.82(90.91) | 89.85±1.23(93.29) | 90.44±0.98(92.98) | 91.99±1.20(94.45) | 88.76±1.17(91.03) | 91.01±1.25(92.54) | 92.99±0.64(94.8)
45 | 85.68±1.48(89.31) | 89.12±1.91(92.42) | 90.81±0.97(93.69) | 91.34±0.96(93.87) | 92.71±0.93(95.16) | 91.4±1.24(93.73) | 93.71±0.65(95.41) | 93.97±0.60(95.61)
50 | 87.08±1.53(90.43) | 90.27±1.21(92.82) | 91.61±0.79(94.01) | 92.20±0.85(94.59) | 93.31±0.42(95.33) | 91.32±0.86(93.35) | 94.29±0.43(95.31) | 94.53±0.41(95.25)
TABLE XI
THE PARAMETERS USED FOR ELM AND RS ENSEMBLE CLASSIFIERS (INDIAN PINES AVIRIS IMAGE).

Features   Methods   T    M     δ
Spectral   RSDT      20   110   −
           RF        20   15    −
           RoF       20   110   −
           RoRF      20   110   −
           ELM       −    −     256
           RSELM     20   110   256
           RoELM     20   110   256
EMAPs      RSDT      20   102   −
           RF        20   15    −
           RoF       20   3     −
           RoRF      20   3     −
           ELM       −    −     256
           RSELM     20   102   256
           RoELM     20   3     256
TABLE XII
CLASSIFICATION ACCURACIES OBTAINED FOR DIFFERENT CLASSIFICATION ALGORITHMS USING 10% OF THE SAMPLES IN GROUND TRUTH AS TRAINING SAMPLES WHEN APPLIED TO THE SPECTRAL INFORMATION OF INDIAN PINES AVIRIS HYPERSPECTRAL DATA.

Class | DT | RSDT | RF | RoF | RoRF | ELM | RSELM | RoELM
1  | 38.16 | 28.77 | 15.71 | 53.06 | 29.80 | 15.11 | 25.71 | 49.39
2  | 50.76 | 65.79 | 61.82 | 79.86 | 74.38 | 73.01 | 79.95 | 83.13
3  | 45.09 | 53.86 | 50.55 | 69.31 | 62.88 | 59.01 | 61.89 | 71.85
4  | 26.87 | 34.69 | 30.76 | 63.32 | 50.76 | 36.59 | 54.17 | 62.09
5  | 67.63 | 79.04 | 79.98 | 88.37 | 83.06 | 90.56 | 93.17 | 91.48
6  | 78.23 | 92.14 | 92.37 | 94.72 | 93.99 | 95.55 | 97.49 | 94.64
7  | 20.43 | 16.09 | 14.78 | 59.13 | 50.87 | 3.04 | 8.02 | 46.52
8  | 85.93 | 92.77 | 96.16 | 97.73 | 98.20 | 99.43 | 99.57 | 98.32
9  | 1.67 | 0.56 | 6.11 | 0 | 2.78 | 4.44 | 12.22 | 37.77
10 | 47.01 | 63.85 | 61.76 | 77.19 | 76.87 | 64.02 | 68.56 | 75.67
11 | 62.02 | 80.56 | 83.88 | 87.35 | 90.53 | 80.77 | 87.72 | 87.87
12 | 29.73 | 42.28 | 47.25 | 69.58 | 68.72 | 67.09 | 75.51 | 82.64
13 | 80.94 | 90.94 | 92.93 | 97.91 | 97.07 | 99.63 | 99.58 | 98.53
14 | 89.57 | 93.67 | 94.13 | 96.22 | 96.36 | 95.55 | 96.94 | 97.36
15 | 35.96 | 39.77 | 39.33 | 55.85 | 45.09 | 59.97 | 60.26 | 54.77
16 | 54.59 | 71.88 | 81.06 | 88.35 | 89.53 | 46.12 | 70.59 | 73.29
OA | 59.77 | 72.53 | 72.84 | 83.14 | 81.31 | 77.46 | 82.38 | 84.70
AA | 50.79 | 57.98 | 59.29 | 73.62 | 69.12 | 61.89 | 68.23 | 75.33
κ  | 54.13 | 68.44 | 68.70 | 80.70 | 78.49 | 74.11 | 79.74 | 82.44
Time(s) | 1.49 | 9.25 | 0.85 | 26.95 | 18.11 | 0.22 | 6.18 | 14.63
OAs, AAs, κ and class-specific accuracies obtained from the individual and ensemble classifiers using spectral
information and EMAPs, respectively. The processing times in seconds are also included for reference.
It can be seen from the results in Tables XII and XIII that the performance of ELM is superior to CART in both testing accuracy and learning time. When the spectral information is treated as input, RoELM and RoF share the top position: the OAs (AAs) of the two methods are 84.70% (75.33%) and 83.14% (73.62%), higher than those of the other methods. Class 9 produces poor results with all classifiers; the reason may be that there is insufficient information provided by class 9, with only 2 samples used in training. Compared to the results reported in Table XII, the classification accuracies in Table XIII, which involve the spatial information, are much better than those obtained only with the spectral information, demonstrating that EMAPs can accurately model spatial-contextual information
TABLE XIII
CLASSIFICATION ACCURACIES OBTAINED FOR DIFFERENT CLASSIFICATION ALGORITHMS USING 10% OF THE SAMPLES IN GROUND TRUTH AS TRAINING SAMPLES WHEN APPLIED TO THE EMAPS OF INDIAN PINES AVIRIS HYPERSPECTRAL DATA.

Class | DT | RSDT | RF | RoF | RoRF | ELM | RSELM | RoELM
1  | 82.65 | 83.67 | 87.14 | 87.35 | 87.14 | 74.08 | 86.94 | 87.76
2  | 86.14 | 91.01 | 91.00 | 91.11 | 91.55 | 90.21 | 90.12 | 90.33
3  | 92.65 | 95.5 | 95.31 | 95.63 | 96.51 | 97.48 | 98.75 | 98.95
4  | 74.55 | 87.11 | 89.15 | 87.57 | 91.66 | 88.34 | 94.27 | 95.02
5  | 89.53 | 92.37 | 92.51 | 93.31 | 93.20 | 91.28 | 94.63 | 94.36
6  | 94.15 | 95.33 | 97.22 | 96.95 | 98.07 | 97.75 | 99.12 | 99.32
7  | 23.48 | 23.04 | 73.91 | 40.43 | 84.78 | 88.28 | 96.09 | 96.09
8  | 100 | 99.77 | 99.77 | 99.75 | 99.80 | 96.23 | 99.39 | 99.55
9  | 69.44 | 61.11 | 92.77 | 62.78 | 98.33 | 80.56 | 89.44 | 95.56
10 | 84.43 | 86.89 | 88.43 | 87.50 | 89.06 | 92.61 | 90.55 | 91.56
11 | 94.67 | 96.18 | 97.94 | 96.74 | 98.55 | 96.53 | 98.49 | 98.66
12 | 85.14 | 90.29 | 92.28 | 90.22 | 93.06 | 86.20 | 89.19 | 89.17
13 | 99.16 | 98.84 | 99.11 | 99.42 | 99.53 | 98.95 | 99.48 | 99.48
14 | 98.57 | 99.22 | 99.25 | 99.23 | 99.24 | 96.29 | 99.16 | 99.42
15 | 93.27 | 96.40 | 97.63 | 96.35 | 98.63 | 75.38 | 92.40 | 94.36
16 | 96.12 | 97.41 | 98.00 | 97.65 | 98.12 | 0.71 | 50.51 | 32.71
OA | 91.57 | 94.05 | 95.17 | 94.56 | 95.83 | 92.59 | 95.46 | 95.40
AA | 85.23 | 87.13 | 93.21 | 90.00 | 94.83 | 84.43 | 92.32 | 91.39
κ  | 90.41 | 93.22 | 94.36 | 93.80 | 95.24 | 91.64 | 94.82 | 94.73
Time(s) | 0.77 | 4.55 | 0.63 | 13.99 | 14.07 | 0.21 | 4.08 | 13.59
in all cases. With the EMAPs as input for this scene, RF and RSELM are slightly better than RoF and RoELM. Among all methods, RoRF yields the highest OA, AA and κ. The feature extraction step in the RoF, RoRF and RoELM classifiers leads to longer computation times than those of RSDT, RF and RSELM. The computational complexity of the ELM ensembles is lower than that of the DT ensembles. The computation time of the RF ensemble is extremely low (less than 1 s).
Fig. 8 presents the classification maps (one of the ten Monte Carlo runs) obtained for the individual and ensemble learning methods with 10% of the labeled samples used for training, as in Tables XII and XIII. As can be seen from the maps, the RS ensembles improve the classification performance and reduce the classification noise. The classification methods based on EMAP spatial features result in classification maps with more homogeneous regions when compared to the classification results using spectral information.
More classification results for the Indian Pines AVIRIS image based on EMAPs and other spatial-contextual information can be found in [34], [45]–[47]. The accuracies in those studies are not directly comparable with the ones given in this paper, because different experimental settings (number of features, training and testing samples) are used. However, it can be concluded that RS ensembles with EMAPs perform well compared to other previously proposed classification approaches for hyperspectral data.
Fig. 8.
Classification results of Indian Pines AVIRIS image (only one Monte Carlo run). Overall accuracies of the classifiers are also given.
C. Results of University of Pavia ROSIS image
TABLE XIV
CLASSIFICATION ACCURACIES OBTAINED FOR DIFFERENT CLASSIFICATION ALGORITHMS USING THE ENTIRE TRAINING SET WHEN APPLIED TO THE SPECTRAL INFORMATION OF UNIVERSITY OF PAVIA ROSIS IMAGE.

Class | DT | RSDT | RF | RoF | RoRF | ELM | RSELM | RoELM
1  | 83.46 | 91.68 | 90.17 | 92.55 | 93.29 | 90.7 | 95.05 | 91.76
2  | 92.93 | 97.42 | 97.44 | 98.30 | 99.60 | 99.65 | 99.69 | 99.89
3  | 96.95 | 98.99 | 98.82 | 99.58 | 99.55 | 85.64 | 98.95 | 99.85
4  | 76.29 | 81.56 | 77.80 | 95.60 | 95.55 | 94.92 | 96.14 | 97.62
5  | 97.75 | 98.67 | 98.58 | 95.62 | 98.79 | 96.68 | 97.11 | 95.33
6  | 52.35 | 53.12 | 56.10 | 74.61 | 65.38 | 58.76 | 64.32 | 69.19
7  | 54.79 | 51.32 | 53.79 | 58.49 | 57.54 | 70.18 | 68.06 | 63.43
8  | 71.93 | 79.60 | 80.07 | 84.55 | 85.34 | 77.21 | 80.50 | 76.02
9  | 76.62 | 83.68 | 84.63 | 89.93 | 90.39 | 88.01 | 91.08 | 90.90
OA | 67.30 | 70.44 | 71.37 | 82.66 | 79.04 | 74.56 | 78.45 | 79.44
AA | 78.11 | 81.78 | 81.93 | 87.69 | 87.27 | 84.64 | 87.88 | 87.11
κ  | 60.15 | 63.90 | 64.79 | 78.09 | 73.98 | 68.75 | 72.84 | 74.25
Time(s) | 1.98 | 20.50 | 2.33 | 44.74 | 53.41 | 1.56 | 34.65 | 51.45
Random subspace ensembles with both spectral information and spatial information are applied to the University of Pavia ROSIS image. For all the ensemble classifiers, the number of classifiers (iterations) is fixed to 20. Following the studies in [27] and [60], the number of features in a subset (M) in each ensemble classifier is set as follows. For the RF algorithm, the number of features in a subset is set to the default value √N of the software package (10 for this scene). For the RSDT and RSELM approaches, M is set to 52 for spectral information and 102 for spatial information. The number of features in a subset for RoF, RoRF and RoELM is set to 10 for spectral information and 3 for spatial information. The number of hidden nodes in ELM and its ensembles is fixed to 128.
Table XIV gives the overall accuracy, average accuracy and class-specific accuracies obtained for the different classification algorithms using the entire training set when applied to the spectral information of the University of Pavia ROSIS image. The computation times are also given in this table. From this table, it is clear that RoF provides the best results in terms of global and individual class accuracies, followed by RoRF and RoELM. In order to enhance the classification results, the RS ensembles with EMAPs are further applied to classify the hyperspectral data, and the global and class-specific accuracies are reported in Table XV.
TABLE XVI
CLASSIFICATION ACCURACIES OBTAINED FROM THE PROPOSED METHODS (RoELM EMAPS AND RSELM EMAPS) COMPARED WITH OTHER SPATIAL-SPECTRAL CLASSIFIERS FOR THE UNIVERSITY OF PAVIA ROSIS IMAGE

Classifier                   OA      AA      κ
SVM+Clustering [4]           94.68   95.21   92.92
MLRsubMLL [57]               94.10   93.45   92.24
GCK [45]                     98.09   97.76   97.46
Mixed lasso 3D-DWT [34]      98.15   97.56   97.48
RSELM EMAPs                  98.67   99.00   98.21
RoELM EMAPs                  98.69   98.92   98.25
TABLE XV
CLASSIFICATION ACCURACIES OBTAINED FOR DIFFERENT CLASSIFICATION ALGORITHMS USING THE ENTIRE TRAINING SET WHEN APPLIED TO THE EMAPS OF UNIVERSITY OF PAVIA ROSIS IMAGE.

Class | DT | RSDT | RF | RoF | RoRF | ELM | RSELM | RoELM
1  | 98.02 | 98.95 | 98.94 | 98.61 | 99.16 | 98.96 | 99.51 | 99.58
2  | 85.96 | 92.47 | 97.33 | 97.00 | 99.32 | 98.37 | 99.31 | 98.38
3  | 99.55 | 99.58 | 99.62 | 99.62 | 99.62 | 96.51 | 99.67 | 99.56
4  | 98.95 | 96.55 | 96.34 | 99.39 | 97.35 | 97.61 | 99.96 | 99.90
5  | 89.69 | 97.10 | 99.12 | 94.55 | 99.23 | 94.54 | 98.52 | 97.45
6  | 90.85 | 91.66 | 97.28 | 93.65 | 97.42 | 96.42 | 98.35 | 98.55
7  | 67.13 | 80.63 | 73.05 | 85.45 | 75.15 | 87.88 | 98.08 | 99.38
8  | 91.34 | 94.26 | 95.14 | 93.52 | 95.32 | 97.16 | 97.69 | 97.54
9  | 99.32 | 100.00 | 100.00 | 100.00 | 100.00 | 99.92 | 99.93 | 99.92
OA | 91.67 | 93.61 | 96.08 | 94.85 | 96.47 | 96.49 | 98.67 | 98.69
AA | 91.20 | 94.58 | 95.20 | 95.75 | 95.84 | 96.37 | 99.00 | 98.92
κ  | 89.14 | 91.71 | 94.83 | 93.26 | 95.34 | 95.37 | 98.21 | 98.25
Time(s) | 1.38 | 29.72 | 2.82 | 37.93 | 70.24 | 1.83 | 37.66 | 71.59
It can be seen from Table XV that the classification results with EMAPs significantly outperform those considering only spectral information. All the RS ensembles yield highly accurate results. The proposed RSELM and RoELM outperform ELM, RoRF and the other ensemble methods in terms of global and class-specific accuracies. The rotation subspace-based classifiers (RoRF, RoF and RoELM) generate more accurate results than RF, RSDT and RSELM because they introduce more diversity within the ensemble. Concerning the computational load, different observations can be made compared with the former experiments. The computational cost of ELM and its ensembles is higher than that of DT and the DT ensembles, because of the large size of the dataset. The spectral-spatial methods are less computationally efficient than the spectral-based methods due to the higher dimensionality of the input features, but provide, in turn, higher accuracies. For illustrative purposes, Fig. 9 provides the classification maps of the individual and ensemble classifiers (one of ten Monte Carlo runs). Compared to the results using only spectral information presented in Fig. 9(a-f), the maps involving spatial information (Fig. 9(g-p)) show more homogeneous areas (especially for the class Meadows located in the lower left area) and reduced classification noise.
In addition, Table XVI presents comparisons of RSELM EMAPs and RoELM EMAPs against other state-of-the-art spectral-spatial classification methods: SVM+Clustering [4], MLRsubMLL [57], generalized composite kernels (GCK) [45] and mixed lasso with 3D-DWT features [34]. The SVM+Clustering approach combines the results of a pixel-wise SVM classification and the segmentation map obtained by partitional clustering using
Fig. 9. Classification results of University of Pavia ROSIS image (only one Monte Carlo run). Overall accuracies for each classifier are given.
TABLE XVII
CLASSIFICATION ACCURACIES OBTAINED FOR DIFFERENT CLASSIFICATION ALGORITHMS USING 10 SAMPLES PER CLASS WHEN APPLIED TO THE SPECTRAL INFORMATION OF UNIVERSITY OF PAVIA ROSIS IMAGE.

Class | DT | RSDT | RF | RoF | RoRF | ELM | RSELM | RoELM
1  | 64.06 | 66.11 | 69.78 | 73.67 | 80.33 | 52.83 | 61.17 | 54.08
2  | 87.71 | 91.22 | 96.58 | 95.21 | 98.36 | 92.83 | 93.36 | 97.56
3  | 92.92 | 96.39 | 95.73 | 98.61 | 99.20 | 40.37 | 13.37 | 94.59
4  | 44.66 | 51.76 | 51.06 | 82.51 | 77.54 | 52.54 | 55.26 | 84.55
5  | 75.78 | 84.35 | 88.54 | 89.24 | 96.81 | 55.41 | 41.34 | 89.13
6  | 36.08 | 39.71 | 46.02 | 47.65 | 48.54 | 50.08 | 59.19 | 48.29
7  | 37.66 | 40.96 | 43.83 | 48.64 | 46.34 | 50.27 | 56.91 | 61.98
8  | 58.84 | 60.97 | 65.06 | 63.85 | 66.71 | 47.99 | 54.97 | 49.75
9  | 64.08 | 68.71 | 78.63 | 84.31 | 87.94 | 61.36 | 67.21 | 73.96
OA | 49.79 | 53.79 | 58.24 | 63.32 | 64.77 | 51.67 | 56.41 | 60.22
AA | 62.45 | 66.69 | 70.58 | 75.97 | 77.98 | 55.97 | 55.86 | 72.63
κ  | 40.02 | 44.36 | 49.24 | 55.81 | 57.51 | 40.96 | 45.74 | 52.12
Time | 0.29 | 2.51 | 0.63 | 10.13 | 19.74 | 2.6 | 41.86 | 80.67
majority voting [4]. MLRsubMLL is a Bayesian approach which contains two main steps: 1) the posterior probability
distributions are constructed by a subspace MLR classifier, and 2) segmentation, which infers an image of
class labels from a posterior distribution built on the aforementioned classifier and on a multilevel logistic (MLL)
prior [57]. GCK combines different kernels built on the spectral and the spatial information of the hyperspectral
data without any weight parameters [45]; the classifier in that work is the multinomial logistic regression, and
the spatial information is modeled by EMAPs. Mixed lasso with 3D-DWT features uses structured sparse
logistic regression (solved by the mixed lasso) to classify three-dimensional discrete wavelet transform (3D-DWT)
features [34]. The results presented in Table XVI are obtained using the same training and testing sets. From Table XVI,
we can conclude that both RSELM EMAPs and RoELM EMAPs outperform the other spatial-spectral classifiers in
terms of OA, AA and κ. In particular, RoELM EMAPs achieves the highest OA and κ, and RSELM EMAPs achieves
the highest AA.
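The segment-level majority voting used by approaches like SVM+Clustering can be sketched as follows. This is only a minimal illustration with NumPy, not the implementation of [4]; the function name and the toy label maps are our own:

```python
import numpy as np

def majority_vote_fusion(class_map, segment_map):
    """Assign to every pixel of a segment the most frequent class label
    that the pixel-wise classifier predicted inside that segment."""
    fused = np.empty_like(class_map)
    for seg_id in np.unique(segment_map):
        mask = segment_map == seg_id
        labels, counts = np.unique(class_map[mask], return_counts=True)
        fused[mask] = labels[np.argmax(counts)]
    return fused

# toy 2x3 image with two segments: segment 0 votes class 1, segment 1 votes class 3
class_map = np.array([[1, 1, 2],
                      [2, 3, 3]])
segment_map = np.array([[0, 0, 1],
                        [0, 1, 1]])
print(majority_vote_fusion(class_map, segment_map))
```

Because whole segments receive a single label, this fusion step is what produces the homogeneous regions and reduced classification noise observed in the spectral-spatial maps.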
In order to assess the effectiveness of the RS ensemble for a limited training set, we have randomly extracted
a few training samples from the training set. Only 10 samples per class are used for this experiment. We
have repeated the training sample selection and the classification process ten times, and the mean classification
results are reported. Tables XVII-XVIII show the overall accuracy, average accuracy and class-specific
accuracy obtained for the individual and ensemble classifiers using 10 samples per class when the spectral and the spatial
information (EMAPs) of the University of Pavia ROSIS image are used as the input, respectively. The classification results in
Tables XVII-XVIII are lower than those in Tables XIV-XV, due to the limited training set. For instance, the OA
and AA of RoRF EMAPs are 96.47% and 95.87% for the original training set, whereas with the limited training
samples they are 89.30% and 91.49%. Nevertheless, with a very small training set, the results obtained by
combining RS ensembles and EMAPs are still very good. Furthermore, Table XIX gives
the comparison of RSELM EMAPs and RoELM EMAPs against the state-of-the-art spectral-spatial classification
TABLE XVIII
CLASSIFICATION ACCURACIES OBTAINED FOR DIFFERENT CLASSIFICATION ALGORITHMS USING 10 SAMPLES PER CLASS WHEN APPLIED
TO THE EMAPs OF THE UNIVERSITY OF PAVIA ROSIS IMAGE.

Class |    DT |  RSDT |    RF |   RoF |  RoRF |   ELM | RSELM |  RoELM
------+-------+-------+-------+-------+-------+-------+-------+-------
  1   | 84.63 | 87.66 | 89.04 | 89.14 | 90.85 | 56.65 | 71.11 |  94.81
  2   | 82.48 | 89.59 | 98.26 | 90.61 | 98.59 | 95.77 | 99.88 |  97.42
  3   | 85.92 | 91.23 | 94.71 | 90.21 | 97.70 | 88.91 | 99.43 |  91.44
  4   | 84.85 | 85.74 | 82.26 | 87.52 | 87.11 | 73.24 | 95.85 |  95.27
  5   | 84.35 | 88.55 | 92.84 | 92.25 | 94.62 | 87.00 | 94.80 |  93.18
  6   | 79.48 | 82.18 | 87.37 | 86.81 | 87.72 | 81.80 | 90.79 |  88.38
  7   | 63.08 | 67.38 | 76.10 | 77.73 | 76.21 | 66.61 | 85.83 |  89.39
  8   | 80.62 | 85.15 | 90.71 | 87.01 | 91.18 | 88.61 | 95.26 |  95.75
  9   | 98.66 | 99.44 | 99.10 | 99.21 | 99.45 | 99.62 | 99.95 |  99.91
 OA   | 81.14 | 84.24 | 88.11 | 88.40 | 89.30 | 80.43 | 91.19 |  91.93
 AA   | 82.67 | 86.32 | 90.04 | 89.20 | 91.49 | 82.02 | 92.55 |  93.95
  κ   | 75.93 | 79.77 | 84.54 | 84.95 | 86.10 | 74.75 | 88.54 |  89.55
 Time |  0.32 |  2.47 |  0.79 | 15.02 | 27.56 |  2.87 | 58.35 | 114.35
TABLE XIX
CLASSIFICATION ACCURACIES OBTAINED BY THE PROPOSED METHODS (RoELM EMAPs AND RSELM EMAPs) IN
COMPARISON WITH OTHER SPATIAL-SPECTRAL CLASSIFIERS FOR THE UNIVERSITY OF PAVIA ROSIS IMAGE (10 SAMPLES PER CLASS).

Classifier              |   OA  |   AA  |   κ
------------------------+-------+-------+------
SVM+Clustering [4]      | 61.83 | 73.85 | 57.14
MLRsubMLL [57]          | 73.68 | 77.18 | 66.41
GCK [45]                | 89.38 | 92.22 | 86.53
Mixed lasso 3D-DWT [34] | 87.73 | 91.14 | 84.36
RSELM EMAPs             | 91.19 | 92.55 | 88.54
RoELM EMAPs             | 91.93 | 93.95 | 89.55
methods for a limited training set (10 samples per class). From this table, it can be seen that our proposed method
RoELM EMAPs achieves the best classification result. Considering the processing time, with the limited training set,
the processing time of DT and its ensembles is significantly reduced. The computational cost of ELM and its
ensembles with limited training samples is higher than that with the entire training set,
because we used more hidden nodes (δ = 512) to achieve better performance.
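The limited-training-set protocol described above (draw 10 samples per class, repeat ten times, report the mean) can be sketched as follows. This is a generic illustration with NumPy; the label vector and the helper name are hypothetical, not taken from the experiments:

```python
import numpy as np

def sample_per_class(y, n_per_class, rng):
    """Randomly draw n_per_class training indices from every class in y."""
    idx = []
    for c in np.unique(y):
        pool = np.flatnonzero(y == c)                       # indices of class c
        idx.extend(rng.choice(pool, size=n_per_class, replace=False))
    return np.array(idx)

y = np.repeat(np.arange(9), 100)       # toy label vector: 9 classes, 100 samples each
for run in range(10):                  # ten Monte Carlo repetitions
    rng = np.random.default_rng(run)   # a fresh random draw per run
    train_idx = sample_per_class(y, 10, rng)
    test_idx = np.setdiff1d(np.arange(y.size), train_idx)
    # ... train each ensemble on train_idx, evaluate on test_idx,
    #     and average the resulting accuracies over the ten runs ...
```

Sampling per class (rather than uniformly over all labeled pixels) guarantees that even rare classes contribute exactly 10 training samples in every run.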
D. Study of the Effects of Parameter Selection
Fig. 10. Indian Pines AVIRIS image (10% of the labeled samples as training samples). Sensitivity to the change of (a) M with spectral
information; (b) M with EMAPs; (c) δ of ELM and its ensembles with spectral information; (d) δ of ELM and its ensembles with EMAPs.
University of Pavia ROSIS image (entire training set). Sensitivity to the change of (e) M with spectral information; (f) M with EMAPs; (g) δ
of ELM and its ensembles with spectral information; (h) δ of ELM and its ensembles with EMAPs.
The number of features in a subset (M) is the key parameter of the RS ensemble. In ELM and its ensembles, the
number of hidden nodes (δ) plays an important role. The effects of these parameters on the RS ensembles are depicted in
Fig. 10. It can be observed from Fig. 10(a-b, e-f) that there is no clear pattern of dependency between M and the ensemble
accuracy: different RS ensemble classifiers achieve their highest OA at different values of M. For the University of
Pavia ROSIS image, RoF with spectral information achieves the highest OA when M = 10, and RoELM with EMAPs
achieves the best classification result when M = 6. Fig. 10(c-d, g-h) shows that a large number of hidden nodes
may give higher accuracies in testing, but a too complex network could also overfit the training data. For instance,
the generalization performance decreases when the number of hidden nodes is larger than 512. In general, these
parameters should be selected empirically for each particular application.
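The role of δ can be made concrete with a basic ELM of the kind the discussion above assumes: the input weights and biases are drawn at random and never trained, and only the output weights are fitted by least squares, which is why training is fast and why δ directly controls model capacity. This is a generic sketch, not the exact implementation used in our experiments:

```python
import numpy as np

def elm_train(X, T, n_hidden, rng):
    """Train a basic ELM: random hidden layer, output weights obtained
    by a Moore-Penrose least-squares solution."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (fixed)
    b = rng.normal(size=n_hidden)                # random hidden biases (fixed)
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ T                 # only these weights are learned
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# toy 1-D regression: with enough hidden nodes the random features fit the
# training data tightly, which is exactly why too large a delta can overfit
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
T = np.sin(X)
W, b, beta = elm_train(X, T, n_hidden=50, rng=rng)
train_err = np.abs(elm_predict(X, W, b, beta) - T).mean()
```

Since β is the only trained quantity, the cost of training grows with δ through the size of H, consistent with the timing behavior reported for ELM and its ensembles.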
VII. CONCLUSION AND PERSPECTIVE
In this paper, we developed a novel framework that combines Random Subspace ensembles and EMAPs for the
spatial-spectral classification of remotely sensed hyperspectral data. Considering the computational cost, we selected
two fast learning algorithms, DT and ELM, to build the RS ensembles. Several conclusions can be drawn
from our experimental results:
• Although RS ensembles require more training time than individual classifiers, their performance is superior to
that of the individual classifiers using both spectral and spatial information as input. The computational load of the RF
algorithm is very low (less than 1 s for the AVIRIS dataset). In addition, the computation time of an RS ensemble
can be further reduced by decreasing the ensemble size.
• In most cases, Rotation subspace classifiers, such as RoF, RoRF and RoELM, outperform RSDT, RF and
RSELM. This is because the feature extraction and random selection strategies introduce more diversity into
the Rotation subspace ensemble classifiers. However, this also increases the computational complexity
of the Rotation subspace approaches.
• In general, ELM and its ensembles achieve higher accuracies than DT and its ensembles. The computation
time of ELM and its ensembles depends on the number of hidden nodes when the ensemble size and the training samples are
fixed. Nevertheless, the efficiency of an ELM ensemble could be further improved by choosing a smaller
ensemble size or fewer hidden nodes.
• Spectral-spatial classification approaches that apply the proposed ensembles to EMAPs achieve state-of-the-art performance on the two hyperspectral datasets.
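The core RS ensemble summarized in these conclusions can be sketched in a few lines: each base learner is trained on a random subset of M features, and the individual predictions are fused by majority voting. The sketch below is illustrative only; it uses a nearest-centroid base learner instead of DT or ELM, and all names and the toy data are our own:

```python
import numpy as np

class RandomSubspaceEnsemble:
    """Random subspace ensemble: each base learner sees only M of the D
    input features; predictions are fused by majority vote. A nearest-
    centroid base learner stands in for DT/ELM purely for illustration."""

    def __init__(self, n_learners=10, M=5, seed=0):
        self.n_learners, self.M = n_learners, M
        self.rng = np.random.default_rng(seed)
        self.models = []  # list of (feature_indices, class_centroids)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        for _ in range(self.n_learners):
            feats = self.rng.choice(X.shape[1], self.M, replace=False)
            centroids = np.stack([X[y == c][:, feats].mean(axis=0)
                                  for c in self.classes_])
            self.models.append((feats, centroids))
        return self

    def predict(self, X):
        votes = np.zeros((X.shape[0], len(self.classes_)), dtype=int)
        for feats, centroids in self.models:
            # squared distance of every sample to every class centroid
            d = ((X[:, feats, None] - centroids.T[None]) ** 2).sum(axis=1)
            votes[np.arange(X.shape[0]), d.argmin(axis=1)] += 1
        return self.classes_[votes.argmax(axis=1)]
```

Decreasing `n_learners` (the ensemble size) directly reduces both training and prediction time, as noted in the first conclusion above, while `M` is the subset-size parameter whose data-dependent choice is listed among the limitations.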
However, Random subspace ensembles have two likely limitations: 1) the number of features in a subset
must be defined in advance, and the optimal value of this parameter depends on the dataset; 2) the computation
time is high due to the high dimensionality of the input features. Therefore, our future work is to develop an
effective scheme for automatically estimating the number of features in a subset for the RS ensemble, and to apply a
dimensionality reduction step to both the spectral and the spatial information of hyperspectral data.
ACKNOWLEDGEMENT
The authors would like to thank Prof. D. Landgrebe from Purdue University, USA, and Prof. P. Gamba for providing
the hyperspectral remote sensing images.
REFERENCES
[1] D. A. Landgrebe, “Hyperspectral Image Data Analysis as a High Dimensional Signal Processing Problem,” IEEE Signal Processing
Magazine, vol. 19, no. 1, pp. 17–28, Jan. 2002.
[2] C. I. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification. Plenum Publishing Co., 2003.
[3] ——, Hyperspectral Data Exploitation: Theory and Applications. Wiley-Interscience, Hoboken, NJ, 2007.
[4] Y. Tarabalka, J. A. Benediktsson, and J. Chanussot, “Spectral-spatial classification of hyperspectral imagery based on partitional clustering
techniques,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 8, pp. 2973–2987, 2009.
[5] Y. Tarabalka, J. Chanussot, and J. A. Benediktsson, “Segmentation and classification of hyperspectral images using watershed transformation,” Pattern Recognition, vol. 43, no. 7, pp. 2367–2379, 2010.
[6] J. C. Harsanyi and C. I. Chang, “Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection
approach,” IEEE Trans. Geosci. Remote Sens., vol. 32, no. 4, pp. 779–785, 1994.
[7] L. O. Jimenez, A. Morales-Morell, and A. Creus, “Classification of hyperdimensional data based on feature and decision fusion approaches
using projection pursuit, majority voting, and neural networks.” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 3, pp. 1360–1366, 1999.
[8] C. I. Chang, “An information-theoretic approach to spectral variability, similarity, and discrimination for hyperspectral image analysis,”
IEEE Trans. Inform. Theory., vol. 46, no. 5, pp. 1927–1932, Sep. 2000.
[9] G. Camps-Valls and L. Bruzzone, “Kernel-based methods for hyperspectral image classification.” IEEE Trans. Geosci. Remote Sens.,
vol. 43, no. 6, pp. 1351–1362, 2005.
[10] T. V. Bandos, L. Bruzzone, and G. Camps-Valls, “Classification of hyperspectral images with regularized linear discriminant analysis.”
IEEE Trans. Geosci. Remote Sens., vol. 47, no. 3, pp. 862–873, 2009.
[11] A. Villa, J. A. Benediktsson, J. Chanussot, and C. Jutten, “Hyperspectral image classification with independent component discriminant
analysis,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 12, pp. 4865–4876, 2011.
[12] V. N. Vapnik, The Nature of Statistical Learning Theory. Springer, 1995.
[13] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Trans. Geosci.
Remote Sens., vol. 42, no. 8, pp. 1778–1790, 2004.
[14] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, 2004.
[15] J. A. Benediktsson, J. Chanussot, and M. Fauvel, “Multiple classifier systems in remote sensing: from basics to recent developments,”
in Proceedings of the 7th International Workshop on Multiple Classifier Systems, Prague, Czech Republic, May 23-25, 2007, pp.
501–512.
[16] L. Rokach, Pattern classification using ensemble methods. World Scientific, 2010.
[17] P. Du, J. Xia, W. Zhang, K. Tan, Y. Liu, and S. Liu, “Multiple classifier system for remote sensing image classification: A review,” Sensors,
vol. 12, no. 4, pp. 4764–4792, 2012.
[18] M. Wozniak, M. Grana, and E. Corchado, “A survey of multiple classifier systems as hybrid systems,” Information Fusion, vol. 16, no. 1,
pp. 3–17, 2014.
[19] T. K. Ho, “The random subspace method for constructing decision forests,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 8, pp.
832–844, August 1998.
[20] L. I. Kuncheva, J. J. Rodrı́guez, C. O. Plumpton, D. E. Linden, and S. J. Johnston, “Random subspace ensembles for fMRI classification,”
IEEE Trans. Med. Imaging, vol. 29, no. 2, pp. 531–542, 2010.
[21] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.
[22] P. Gislason, J. A. Benediktsson, and J. Sveinsson, “Random forests for land cover classification,” Pattern Recogn. Lett., vol. 27, no. 4, pp.
294–300, Mar. 2006.
[23] J. C. Chan and D. Paelinckx, “Evaluation of Random Forest and AdaBoost tree-based ensemble classification and spectral band selection
for ecotope mapping using airborne hyperspectral imagery,” Remote Sens. Environ., vol. 112, no. 6, pp. 2999–3011, 2008.
[24] V. F. Rodriguez-Galiano, B. Ghimire, J. Rogan, M. Chica-Olmo, and J. P. Rigol-Sanchez, “An assessment of the effectiveness of a random
forest classifier for land-cover classification,” ISPRS J. Photogramm., vol. 67, no. 1, pp. 93–104, Jan. 2012.
[25] B. Waske, S. Van Der Linden, J. A. Benediktsson, A. Rabe, and P. Hostert, “Sensitivity of support vector machines to random feature
selection in classification of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 7, pp. 2880–2889, 2010.
[26] J. J. Rodriguez, L. I. Kuncheva, and C. J. Alonso, “Rotation forest: A new classifier ensemble method,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28,
no. 10, pp. 1619–1630, 2006.
[27] J. Xia, P. Du, X. He, and J. Chanussot, “Hyperspectral remote sensing image classification based on rotation forest,” IEEE Geosci. Remote
Sensing Lett., vol. 11, no. 1, pp. 239 – 243, 2014.
[28] J. Xia, J. Chanussot, P. Du, and X. He, “Spectral-spatial classification for hyperspectral data using rotation forests with local feature
extraction and markov random fields,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 5, pp. 2532–2546, 2015.
[29] M. Fauvel, Y. Tarabalka, J. A. Benediktsson, J. Chanussot, and J. C. Tilton, “Advances in spectral-spatial classification of hyperspectral
images,” Proceedings of the IEEE, vol. 101, no. 3, pp. 652–675, 2013.
[30] M. Pesaresi and J. A. Benediktsson, “A new approach for the morphological segmentation of high-resolution satellite imagery,” IEEE
Trans. Geosci. Remote Sens., vol. 39, no. 2, pp. 309–320, 2001.
[31] J. A. Benediktsson, M. Pesaresi, and K. Amason, “Classification and feature extraction for remote sensing images from urban areas based
on morphological transformations,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 9, pp. 1940–1949, 2003.
[32] T. C. Bau, S. Sarkar, and G. Healey, “Hyperspectral region classification using a three-dimensional gabor filterbank,” IEEE Trans. Geosci.
Remote Sens., vol. 48, no. 9, pp. 3457–3464, 2010.
[33] F. Tsai and J. Lai, “Feature extraction of hyperspectral image cubes using three-dimensional gray-level cooccurrence.” IEEE Trans. Geosci.
Remote Sens., vol. 51, no. 6-2, pp. 3504–3513, 2013.
[34] Y. Qian, M. Ye, and J. Zhou, “Hyperspectral image classification based on structured sparse logistic regression and three-dimensional
wavelet texture features.” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 4-2, pp. 2276–2291, 2013.
[35] J. Serra, Image Analysis and Mathematical Morphology. Academic Press, 1982.
[36] J. A. Benediktsson, J. A. Palmason, and J. R. Sveinsson, “Classification of hyperspectral data from urban areas based on extended
morphological profiles,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 480–491, 2005.
[37] M. Fauvel, J. A. Benediktsson, J. Chanussot, and J. R. Sveinsson, “Spectral and spatial classification of hyperspectral data using svms and
morphological profiles,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 11, pp. 3804–3814, 2008.
[38] M. Dalla Mura, J. A. Benediktsson, B. Waske, and L. Bruzzone, “Morphological attribute profiles for the analysis of very high resolution
images,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 10, pp. 3747–3762, 2010.
[39] ——, “Extended profiles with morphological attribute filters for the analysis of hyperspectral data,” Int. J. Remote Sens., vol. 31, no. 22,
pp. 5975–5991, Jul. 2010.
[40] M. Dalla Mura, A. Villa, J. Benediktsson, J. Chanussot, and L. Bruzzone, “Classification of hyperspectral images by using extended
morphological attribute profiles and independent component analysis,” IEEE Geosci. Remote Sensing Lett., vol. 8, no. 3, pp. 542–546,
2011.
[41] P. Reddy Marpu, M. Pedergnana, M. Dalla Mura, S. Peeters, J. A. Benediktsson, and L. Bruzzone, “Classification of hyperspectral data
using extended attribute profiles based on supervised and unsupervised feature extraction techniques,” International Journal of Image and
Data Fusion, vol. 3, no. 3, pp. 269–298, 2012.
[42] M. Pedergnana, P. Reddy Marpu, M. Dalla Mura, J. A. Benediktsson, and L. Bruzzone, “Classification of remote sensing optical and lidar
data using extended attribute profiles,” IEEE J. Sel. Topics Signal Processing, vol. 6, no. 7, pp. 856–865, 2012.
[43] ——, “A novel technique for optimal feature selection in attribute profiles based on genetic algorithms,” IEEE Trans. Geosci. Remote
Sens., vol. 51, no. 6-2, pp. 3514–3528, 2013.
[44] N. Falco, M. Dalla Mura, F. Bovolo, J. A. Benediktsson, and L. Bruzzone, “Change detection in vhr images based on morphological
attribute profiles,” IEEE Geosci. Remote Sensing Lett., vol. 10, no. 3, pp. 636–640, 2013.
[45] J. Li, P. Reddy Marpu, A. Plaza, J. M. Bioucas-Dias, and J. A. Benediktsson, “Generalized composite kernel framework for hyperspectral
image classification,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 9, pp. 4816–4829, 2013.
[46] S. Bernabe, P. Reddy Marpu, A. Plaza, M. Dalla Mura, and J. A. Benediktsson, “Spectral-spatial classification of multispectral images
using kernel feature space representation,” IEEE Geosci. Remote Sensing Lett., vol. 11, no. 1, pp. 288–292, 2014.
[47] B. Song, J. Li, M. Dalla Mura, P. Li, A. Plaza, J. M. Bioucas-Dias, J. A. Benediktsson, and J. Chanussot, “Remotely sensed image
classification using sparse representations of morphological attribute profiles,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 8, pp.
5122–5136, 2014.
[48] G. B. Huang, Q. Y. Zhu, and C. K. Siew, “Extreme learning machine: a new learning scheme of feedforward neural networks,” in
Proceedings of the 2004 IEEE International Joint Conference on Neural Networks, vol. 2, Budapest, Hungary, 2004, pp. 985–990.
[49] ——, “Extreme learning machine: theory and applications,” Neurocomputing, vol. 70, no. 1-3, pp. 489–501, 2006.
[50] G. Stiglic, J. J. Rodriguez, and P. Kokol, “Rotation of random forests for genomic and proteomic classification problems,” Software Tools
and Algorithms for Biological Systems, vol. 696, pp. 211–221, 2011.
[51] J. R. Quinlan, “Induction of decision trees,” Mach. Learn., vol. 1, no. 1, pp. 81–106, Mar. 1986.
[52] R. Narayanan, D. Honbo, G. Memik, A. Choudhary, and J. Zambreno, “Interactive presentation: An fpga implementation of decision tree
classification,” in Proceedings of the Conference on Design, Automation and Test in Europe, San Jose, CA, USA, 2007, pp. 189–194.
[53] L. Rokach and O. Maimon, Data Mining with Decision Trees: Theory and Applications. World Scientific Publishing Co., Inc., 2008.
[54] L. Breiman, “Bagging predictors,” Mach. Learn., vol. 24, no. 2, pp. 123–140, Aug. 1996.
[55] G. B. Huang and L. Chen, “Convex incremental extreme learning machine.” Neurocomput., vol. 70, no. 16-18, pp. 3056–3062, 2007.
[56] ——, “Enhanced random search based incremental extreme learning machine,” Neurocomput., vol. 71, no. 16-18, pp. 3460–3468, 2008.
[57] J. Li, J. M. Bioucas-Dias, and A. Plaza, “Spectral-spatial hyperspectral image segmentation using subspace multinomial logistic regression
and markov random fields,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 3, pp. 809–823, 2012.
[58] L. I. Kuncheva and C. J. Whitaker, “Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy,”
Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.
[59] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and regression trees. Boca Raton, FL: CRC Press, 1984.
[60] J. Xia, J. Chanussot, P. Du, and X. He, “Rotation-Based Ensemble Classifiers for High-Dimensional Data,” in Fusion in Computer Vision,
B. Ionescu, J. Benois-Pineau, T. Piatrik, and G. Quénot, Eds. Springer, 2014, pp. 135–160.
Junshi Xia (S’11) received the B.S. degree in geographic information systems and the Ph.D. degree in photogrammetry and remote sensing from the China University of Mining and Technology, Xuzhou, China, in 2008 and 2013,
respectively. He obtained in 2014 a Ph.D. degree in image processing with the Grenoble Images Speech Signals and
Automatics Laboratory, Grenoble Institute of Technology, Grenoble, France.
He is currently a Research Fellow at the Department of Geographic Information Sciences, Nanjing University. His
research interests include multiple classifier system in remote sensing, hyperspectral remote sensing image processing,
and urban remote sensing.
Mauro Dalla Mura (S’08-M’11) received the laurea (B.E.) and laurea specialistica (M.E.) degrees in Telecommunication Engineering from the University of Trento, Italy, in 2005 and 2007, respectively. He obtained in 2011 a joint
Ph.D. degree in Information and Communication Technologies (Telecommunications Area) from the University of
Trento, Italy and in Electrical and Computer Engineering from the University of Iceland, Iceland. In 2011 he was a
Research fellow at Fondazione Bruno Kessler, Trento, Italy, conducting research on computer vision. He is currently
an Assistant Professor at Grenoble Institute of Technology (Grenoble INP), France. He is conducting his research at
the Grenoble Images Speech Signals and Automatics Laboratory (GIPSA-Lab). His main research activities are in the
fields of remote sensing, image processing and pattern recognition. In particular, his interests include mathematical morphology, classification
and multivariate data analysis. Dr. Dalla Mura was the recipient of the IEEE GRSS Second Prize in the Student Paper Competition of the 2011
IEEE International Geoscience and Remote Sensing Symposium 2011 (Vancouver, CA, July 2011). He is a Reviewer of IEEE Transactions on
Geoscience and Remote Sensing, IEEE Geoscience and Remote Sensing Letters, IEEE Journal of Selected Topics in Earth Observations and
Remote Sensing, IEEE Journal of Selected Topics in Signal Processing, Pattern Recognition Letters, ISPRS Journal of Photogrammetry and
Remote Sensing, Photogrammetric Engineering and Remote Sensing (PE&RS). He is a member of the Geoscience and Remote Sensing Society
(GRSS) and IEEE GRSS Data Fusion Technical Committee (DFTC) and Secretary of the IEEE GRSS French Chapter (2013-2016). He was a
lecturer at the RSSS12 - Remote Sensing Summer School 2012 (organized by the IEEE GRSS), Munich, Germany.
Jocelyn Chanussot (M’04-SM’04-F’12) received the M.Sc. degree in electrical engineering from the Grenoble Institute
of Technology (Grenoble INP), Grenoble, France, in 1995, and the Ph.D. degree from Savoie University, Annecy,
France, in 1998. In 1999, he was with the Geography Imagery Perception Laboratory for the Delegation Generale de
l’Armement (DGA - French National Defense Department). Since 1999, he has been with Grenoble INP, where he
was an Assistant Professor from 1999 to 2005, an Associate Professor from 2005 to 2007, and is currently a Professor
of signal and image processing. He is conducting his research at the Grenoble Images Speech Signals and Automatics
Laboratory (GIPSA-Lab). His research interests include image analysis, multicomponent image processing, nonlinear
filtering, and data fusion in remote sensing. He is a member of the Institut Universitaire de France (2012-2017). Since 2013, he is an Adjunct
Professor of the University of Iceland. Dr. Chanussot is the founding President of the IEEE Geoscience and Remote Sensing French chapter (2007-2010), which received the 2010 IEEE GRS-S Chapter Excellence Award. He was the co-recipient of the NORSIG 2006 Best Student Paper
Award, the IEEE GRSS 2011 Symposium Best Paper Award, the IEEE GRSS 2012 Transactions Prize Paper Award and the IEEE GRSS
2013 Highest Impact Paper Award. He was a member of the IEEE Geoscience and Remote Sensing Society AdCom (2009-2010), in charge of
membership development. He was the General Chair of the first IEEE GRSS Workshop on Hyperspectral Image and Signal Processing, Evolution
in Remote sensing (WHISPERS). He was the Chair (2009-2011) and Cochair of the GRS Data Fusion Technical Committee (2005-2008). He
was a member of the Machine Learning for Signal Processing Technical Committee of the IEEE Signal Processing Society (2006-2008) and
the Program Chair of the IEEE International Workshop on Machine Learning for Signal Processing, (2009). He was an Associate Editor for
the IEEE Geoscience and Remote Sensing Letters (2005-2007) and for Pattern Recognition (2006-2008). Since 2007, he is an Associate Editor
for the IEEE Transactions on Geoscience and Remote Sensing. Since 2011, he is the Editor-in-Chief of the IEEE Journal of Selected Topics in
Applied Earth Observations and Remote Sensing. In 2013, he was a Guest Editor for the Proceedings of the IEEE and in 2014 a Guest Editor
for the IEEE Signal Processing Magazine. He is a Fellow of the IEEE and a member of the Institut Universitaire de France (2012-2017).
Peijun Du (M’07-SM’12) is a Professor of Remote Sensing at the Department of Geographic Information Sciences,
Nanjing University, and the deputy director of the Key Laboratory for Satellite Mapping Technology and Applications
of National Administration of Surveying, Mapping and Geoinformation (NASG), China. After receiving his Ph.D.
degree from China University of Mining and Technology in 2001, he had been employed by the same university until
he joined Nanjing University in 2011. He was a postdoctoral fellow at Shanghai JiaoTong University from February
2002 to March 2004, and was a senior visiting scholar at the University of Nottingham and the GIPSA-Lab, Grenoble
Institute of Technology, France. His research interests focus on remote sensing image processing and pattern recognition,
hyperspectral remote sensing, and applications of geospatial information technologies. He has published more than 40 articles in international
peer-reviewed journals, and more than 100 papers in international conferences and Chinese journals. Dr. Du has been the Associate Editor of
IEEE Geoscience and Remote Sensing Letters (GRSL) since 2009. He was the Guest Editor of three special issues of the IEEE Journal of Selected Topics
in Applied Earth Observation and Remote Sensing (JSTARS). He also served as the Co-chair of the Technical Committee of URBAN 2009,
EORSA 2014 and IAPR-PRRS 2012, the Co-chair of the Local Organizing Committee of JURSE 2009, WHISPERS 2012 and EORSA 2012,
and the member of Scientific Committee or Technical Committee of other international conferences, for example, Spatial Accuracy 2008, ACRS
2009, WHISPERS (2010-2014), URBAN(2011, 2013 and 2015), MultiTemp(2011, 2013 and 2015), ISDIF 2011, SPIE European Conference
on Image and Signal Processing for Remote Sensing (2012-2014).
Xiyan He received in 2006 the Generalist Engineer degree from Ecole Centrale Paris, France, and the M.E. degree in
Pattern Recognition and Intelligent System from Xi’an Jiaotong University, China, respectively. She received her Ph.D.
degree in Computer Science in 2009, from University of Technology of Troyes, France. Dr. He was a teaching assistant
in University of Technology of Troyes in 2009, a post-doctoral research fellow in Research Centre for Automatic Control
of Nancy in 2010, and a teaching assistant at the University Pierre-Mendès-France, Grenoble, in 2011. Since 2012, she
has been a post-doctoral research fellow in Grenoble Laboratory of Image, Speech, Signal and Automatics. Her main
research interests include machine learning, pattern recognition and data fusion, with special focus on applications to
remotely sensed images.