
Random Subspace Ensembles for Hyperspectral Image Classification with Extended Morphological Attribute Profiles

Junshi Xia, Student Member, IEEE, Mauro Dalla Mura, Member, IEEE, Jocelyn Chanussot, Fellow, IEEE, Peijun Du, Senior Member, IEEE, and Xiyan He

Abstract: Classification is one of the most important techniques for the analysis of hyperspectral remote sensing images. Nonetheless, many challenging problems arise in this task. Two common issues are the curse of dimensionality and the modeling of spatial information. In this work, we present a new general framework to train a series of effective classifiers with spatial information for classifying hyperspectral data. The proposed framework is based on two key observations: 1) the curse of dimensionality and the high feature-to-instance ratio can be alleviated by using Random Subspace (RS) ensembles; 2) the spatial-contextual information can be modeled by the extended multi-attribute profiles (EMAPs). Two fast learning algorithms, the decision tree (DT) and the extreme learning machine (ELM), are selected as the base classifiers. Six RS ensemble methods, including Random subspace with DT (RSDT), Random Forest (RF), Rotation Forest (RoF), Rotation Random Forest (RoRF), RS with ELM (RSELM) and Rotation subspace with ELM (RoELM), are constructed from the multiple base learners. Experimental results on both simulated and real hyperspectral data verify the effectiveness of the RS ensemble methods for the classification of both spectral and spatial information (EMAPs). On the University of Pavia ROSIS image, our proposed approaches, both RSELM and RoELM with EMAPs, achieve state-of-the-art performance, which demonstrates the advantage of the proposed methods. The key parameters in RS ensembles and the computational complexity are also investigated in this study.

Index Terms: Classification, Hyperspectral data, Random subspace, Extended multi-attribute profiles (EMAPs)

Manuscript received ; revised . This paper is supported by the Natural Science Foundation of China under Grant No. 41471275, the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), the Fundamental Research Funds for the Central Universities and the project XIMRI ANR-BLAN-SIMI2-LS-101019-6-01.
J. Xia is with the Key Laboratory for Satellite Mapping Technology and Applications of National Administration of Surveying, Mapping and Geoinformation of China, Nanjing University, 210023 Nanjing, China and the GIPSA-lab, Grenoble Institute of Technology, 38400 Grenoble, France (e-mail: xiajunshi@gmail.com).
M. Dalla Mura and X. He are with the GIPSA-lab, Grenoble Institute of Technology, 38400 Grenoble, France (e-mail: mauro.dalla-mura@gipsalab.grenoble-inp.fr, greenhxy@gmail.com).
J. Chanussot is with the GIPSA-lab, Grenoble Institute of Technology, 38400 Grenoble, France and the Faculty of Electrical and Computer Engineering, University of Iceland, Iceland (e-mail: jocelyn.chanussot@gipsa-lab.grenoble-inp.fr).
P. Du is with the Key Laboratory for Satellite Mapping Technology and Applications of National Administration of Surveying, Mapping and Geoinformation of China, Nanjing University, 210023 Nanjing, China (Corresponding author, e-mail: dupjrs@gmail.com).

I. INTRODUCTION

In the context of hyperspectral image analysis, classification is an active field of research and development [1]–[5].
The difficulties of the supervised classification of high spatial resolution hyperspectral data come from at least three sources:
• the ratio between the number of features (spectral bands) and the number of available training samples is large;
• the feature set might show some redundancy;
• many approaches exist to exploit the spatial information in classification, but no single reliable approach is available.
A considerable amount of literature has focused on hyperspectral image classification [6]–[11]. One of the most widely used approaches is based on kernel methods, such as support vector machines (SVMs), due to their good generalization capability and their ability to address small-sample problems and the curse of dimensionality [9], [12], [13]. In addition, kernel methods can efficiently define non-linear decision boundaries, dealing with cases in which the data are not linearly separable [12]. However, the selection of kernels and their parameters is still an open question that needs to be further investigated.
Another alternative strategy for providing enhanced classification performance is classifier ensembles, which are deemed to be better than individual classifiers [14]–[18]. A popular ensemble method is the Random Subspace (RS) ensemble [19]. The idea is intuitive and simple: subsets of the feature set are used in the ensemble instead of using all features for all the individual classifiers, and the ensemble integrates the outputs of all individual classifiers using a majority voting rule to obtain the final result. Each classifier in the ensemble is constructed on a different feature subset obtained by randomly sampling the original feature set. The rationale behind RS ensembles is to break down a complex high-dimensional problem into several lower-dimensional sub-problems. Thus, they can address problems such as the curse of dimensionality and the high feature-to-instance ratio [20].
The most popular RS ensemble method for high-dimensional data (hyperspectral and multi-date images) classification is Random Forest [21]–[24]. Besides, Waske et al. [25] developed a random selection-based SVM for the classification of hyperspectral images. Recently, Xia et al. [26], [27] used Rotation Forest to classify hyperspectral remote sensing images. In comparison with Random Forest, Rotation Forest [26]–[28] promotes both the inter-classifier diversity and the accuracy of the individual classifiers by using a feature extraction approach. Therefore, it can produce more accurate results than Random Forest in most cases [26], [27].
Recent studies demonstrated that spectral-spatial approaches can provide more accurate classification results by integrating the spatial and spectral information [29]. The motivation is that spatial features are discriminant features that can well complement the spectral ones. Different approaches can be used to extract spatial features [30]–[34]. Some of them are based on mathematical morphology (MM), which is a powerful tool for the analysis and processing of geometrical structures in the spatial domain [35]. Pesaresi and Benediktsson introduced the morphological profile (MP) for classifying very high spatial resolution images using a sequence of geodesic opening and closing operations [30]. The derivative of the morphological profile (DMP) was also defined in their study. Furthermore, Benediktsson et al. proposed the extended morphological profile (EMP), in which an MP is computed on each component after reducing the dimensionality of the data [36].
The first few components and the EMP are stacked together and then classified by a neural network. The main drawback of the method in [36] is that it is designed for the classification of urban structures and cannot fully use the spectral information of hyperspectral data [37]. Fauvel et al. developed a spectral and spatial fusion method based on the EMP and the original hyperspectral data to overcome this problem [37]. In the works of Dalla Mura et al. [38], [39], attribute profiles (APs) were proposed for extracting additional spatial features for the classification of remote sensing imagery, extending the MP and EMP concepts. APs have proven to extract more informative spatial features than MPs in the classification of high-resolution images. Since then, the AP and its extensions have been widely used for the classification and change detection of multi/hyperspectral and LiDAR data. Dalla Mura et al. presented a technique based on Extended APs (EAPs) and independent component analysis (ICA) for the classification of urban hyperspectral images [40]. Prashanth et al. explored the use of APs based on three supervised and two unsupervised feature extraction techniques for the classification of hyperspectral data with SVM and Random Forest classifiers [41]. Pedergnana et al. proposed a classification approach based on features extracted with EAPs computed on both optical and LiDAR images, leading to the integration of spectral, spatial and elevation data [42]. Pedergnana et al. also proposed a novel iterative technique based on a genetic algorithm to select the optimal features from the EMAPs [43]. Falco et al. investigated the performance of change detection in very high resolution images based on APs [44]. Li et al. presented a generalized composite kernel framework for hyperspectral image classification by combining the spectral and the spatial information (EMAPs) [45]. Bernabe et al. proposed a new strategy combining EMAPs and kernel principal component analysis (KPCA) for the classification of multi/hyperspectral images [46]. Song et al. applied a sparse representation-based learning approach to classify EMAPs extracted from hyperspectral data [47].
From the above literature review, it can be seen that when EMAPs are used for hyperspectral data classification, two strategies are often adopted:
• applying feature selection/extraction [43] or advanced classifiers to the EMAPs [47];
• integrating the EMAPs with spectral information to formulate a composite kernel for kernel-based methods [45].
In this paper, we propose an advanced classification scheme based on Random Subspace (RS) ensembles applied to EMAP features. Decision Tree (DT) and Artificial Neural Network (ANN) are usually adopted as base learners in RS ensembles because they are unstable weak learners: small changes in the training data can lead to potentially large variations in the results, yielding high diversity within the ensemble. Considering the computational cost, we construct the RS ensembles with two fast learning algorithms: the classification and regression tree (CART) and a recently proposed NN classifier, the Extreme Learning Machine (ELM) [48], [49]. EMAPs are generated by the combination of APs and the first several components extracted by PCA. Six classifier ensembles, including RSDT, RF, RoF, RoRF, RSELM and RoELM, are considered, as shown in Table I.
TABLE I
INDIVIDUAL AND ENSEMBLE CLASSIFICATION APPROACHES CONSIDERED FOR THE STUDY

Individual classifiers        (Notation)
Decision tree                 DT
Extreme learning machine      ELM

Classifier ensembles          (Notation)
Random subspace with DT       RSDT
Random Forest                 RF
Rotation Forest               RoF
Rotation Random Forest        RoRF
Random subspace with ELM      RSELM
Rotation subspace with ELM    RoELM

The novelty of this work lies in:
• proposing an ensemble classifier using an ELM base learner and two possible strategies for building the ensembles (i.e., Random and Rotation subspace);
• introducing Rotation Random Forest (RoRF) [50] in the field of hyperspectral remote sensing;
• defining spectral-spatial classification techniques based on the proposed ensembles and on the spatial features computed by EMAPs.
In particular, the performance in a scenario with limited training samples and high input dimensionality, as well as the computational complexity, is investigated in this paper. It should be noted that the spectral information and the EMAPs are directly applied to the Random subspace ensemble methods without any preprocessing technique (e.g., feature extraction/selection or whitening).
The overall structure of the study takes the form of seven sections, including this introductory section. Section II presents an introduction to the decision tree and its ensembles. The proposed ELM ensemble methods are detailed in Section III. The main description of EMAPs is presented in Section IV. Section V reports classification results based on simulated hyperspectral data. We report the experimental results on two real hyperspectral datasets in Section VI. Section VII contains the conclusion of the presented work and its perspectives.

II. DECISION TREE AND ITS ENSEMBLES

Let $\{X, Y\} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ be a set of labeled samples, where $x_i \in \mathbb{R}^D$ is a pixel and $y_i$ contains the label information¹. Let F be the set of D features. In order to construct an RS ensemble, we collect T classifiers based on subsets of the original features. Each feature subset in the ensemble defines a subspace of features of cardinality M, and a classifier is trained on this feature subset using all training samples [19]. The final result is generated by a majority voting rule. Two parameters, the ensemble size T and the cardinality of the feature subset M, are required in the RS ensemble.

¹ $y_i$ differs between DT and ELM. In DT, $y_i$ is a scalar taking values in the set of classes of interest $\mathcal{Q} = \{1, \ldots, Q\}$, where Q is the total number of classes. In ELM, $y_i$ is a label vector in which the j-th entry is set to 1 if the sample belongs to class j while the other entries are set to 0.

A. Decision tree

The decision tree is a non-parametric supervised learning algorithm used for classification and regression [51]. It is composed of a root node, a set of internal nodes (splits) and a set of terminal nodes (leaves). In classification, the root node and each internal node have a splitting decision and splitting feature associated with them. Class labels can then be assigned to the leaves. The creation of a DT from training samples involves two phases. At first, a splitting measure and a splitting attribute are chosen. In the second phase, the records are split among the child nodes based on the decision made in the first phase. This process is applied recursively until a stopping criterion is met [52]. Then, the DT can be used to predict the class label of a new sample. The prediction process starts at the root, and a path to a leaf is traced by performing a splitting decision at each internal node. The class label attached to the leaf is then assigned to the new sample [52]. A critical component of the decision tree induction process is the selection of the split. Different algorithms use various metrics to split the nodes. The most widely used splitting criterion relies on the minimization of the Gini index of the splits [53].
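As a concrete illustration, the following minimal sketch (ours, not from the paper; Python with NumPy assumed) computes the Gini impurity of a candidate threshold split; a CART-style learner would pick the feature/threshold pair minimizing this quantity.

```python
import numpy as np

def gini(labels):
    # Gini impurity of a node: 1 - sum_q p_q^2, with p_q the class proportions.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gini(feature, labels, threshold):
    # Weighted Gini impurity of the two children produced by a threshold split;
    # assumes the threshold leaves both children non-empty.
    mask = feature <= threshold
    n = len(labels)
    return (mask.sum() / n) * gini(labels[mask]) + \
           ((~mask).sum() / n) * gini(labels[~mask])
```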
B. Decision tree ensembles

1) Random subspace with DT: The RS ensemble, introduced by Ho [19], was originally proposed for constructing multiple decision trees. The objective of the RS ensemble is to sample the feature set into low-dimensional subspaces of the whole original high-dimensional feature space, then construct a classifier on each smaller subspace, and finally apply a majority voting rule for the final decision; a minimal sketch of this scheme is given below.
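The sketch (ours; scikit-learn's DecisionTreeClassifier is assumed as a CART-style base learner with Gini splits) implements the train/vote scheme just described:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # CART-style tree, Gini splits

def rs_train(X, y, T=20, M=10, rng=np.random.default_rng(0)):
    # One (feature subset, tree) pair per ensemble member.
    ensemble = []
    for _ in range(T):
        subset = rng.choice(X.shape[1], size=M, replace=False)
        tree = DecisionTreeClassifier(criterion="gini").fit(X[:, subset], y)
        ensemble.append((subset, tree))
    return ensemble

def rs_predict(ensemble, X):
    # Majority vote; assumes integer class labels 0..Q-1.
    votes = np.stack([tree.predict(X[:, s]) for s, tree in ensemble]).astype(int)
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)
```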
2) Random Forest: Random Forest, developed by Breiman [21], combines Bagging [54] and the Random subspace method [19] to produce a decision tree ensemble. Random Forest is a particular implementation of bagging in which each model is a random tree. A random tree is grown according to the CART algorithm with one exception: for each split, only a small subset of randomly selected features is considered and the best split is chosen from this subset. Since only a portion of the input features is used for splitting and no pruning of the tree is done, the computational complexity of Random Forest is relatively light [21]. The computing time is approximately $T\sqrt{M}\,n\log(n)$, where T, M and n represent the number of classifiers, the number of features in a subset and the number of training samples, respectively.

3) Rotation Forest: Rotation Forest is a recently proposed method for building classifier ensembles using independent decision trees built on different sets of extracted features [26]. The main heuristic of Rotation Forest is to apply feature extraction and to subsequently reconstruct a full, different feature set for each classifier in the ensemble. To do this, the feature space is randomly split into K subsets, each containing M features; then principal component analysis (PCA) is applied to each of the K subsets, and a new set of M linearly extracted features is constructed in each subset from all the principal components. Furthermore, a new training set is formed by concatenating the M linearly extracted features of each subset, and an individual DT classifier is trained on it. A series of individual classifiers is generated by repeating the above steps several times. The final classification result is produced by integrating the results of the individual classifiers with a majority voting rule. Different splits of the features lead to different extracted features, thereby further increasing the diversity already introduced by the bootstrap sampling.

4) Rotation Random Forest: Rotation Random Forest (RoRF) is a variant of Rotation Forest which uses Random Forests as the base classifiers instead of decision trees [50]. This method was already evaluated on genomic and proteomic datasets [50], but it has not yet been used for remote sensing image classification. The main training and prediction steps of RoRF are presented in Algorithm 1. In the training phase, the feature space is first divided into K disjoint subspaces. PCA is performed on each subspace with a bootstrapped sample of 75% of the original training set. A transformed training set is generated by rotating the original training set with a sparse matrix $R_i^a$. An individual classifier is trained on this rotated training set. In the prediction phase, a new sample $x^*$ is rotated by $R_i^a$; then the transformed sample, i.e., $x^* R_i^a$, is classified by the ensemble and the class with the maximum number of votes is chosen as the final class. It is important to notice Step 5 in Algorithm 1, in which 75% of the original number of training samples are selected; this avoids obtaining the same coefficients for the transformed components if the same features are selected, thus enhancing the diversity among the member classifiers. Rotation Random Forest can improve the performance of Random Forest by introducing further diversity through the feature extraction performed within the ensemble. The base classifiers in Rotation Random Forest are more diverse and accurate with respect to Rotation Forest, and this can be beneficial for the ensemble.

Algorithm 1 Rotation RF
Training phase
Input: $\{X, Y\} = \{(x_i, y_i)\}_{i=1}^{n}$: training samples; T: number of classifiers; K: number of subsets (M: number of features in each subset); L: base classifier; F: feature set. The ensemble $\mathcal{L} = \emptyset$.
Output: the ensemble $\mathcal{L}$
1: for i = 1 : T do
2:   randomly split the features F into K subsets $F_{ij}$
3:   for j = 1 : K do
4:     extract from X the new training set $X_{i,j}$ with the corresponding features $F_{ij}$
5:     generate a subset $\hat{X}_{i,j}$ by selecting, with the bootstrap algorithm, 75% of the initial training samples in $X_{i,j}$
6:     transform $\hat{X}_{i,j}$ with PCA to get the coefficients $v_{i,j}^{(1)}, \ldots, v_{i,j}^{(M)}$
7:   end for
8:   compose the sparse matrix $R_i$ from the above coefficients:

$R_i = \begin{bmatrix} v_{i,1}^{(1)}, \ldots, v_{i,1}^{(M_1)} & 0 & \cdots & 0 \\ 0 & v_{i,2}^{(1)}, \ldots, v_{i,2}^{(M_2)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & v_{i,K}^{(1)}, \ldots, v_{i,K}^{(M_K)} \end{bmatrix}$

9:   rearrange $R_i$ into $R_i^a$ with respect to the original feature set
10:  obtain the new training samples $\{X R_i^a, Y\}$
11:  build an RF classifier $L_i$ using $\{X R_i^a, Y\}$
12:  add the classifier to the current ensemble, $\mathcal{L} = \mathcal{L} \cup L_i$
13: end for
Prediction phase
Input: the ensemble $\mathcal{L} = \{L_i\}_{i=1}^{T}$; a new sample $x^*$; the rotation matrices $R_i^a$.
Output: class label $y^*$
1: get the ensemble outputs for $x^* R_i^a$
2: $y^* = \arg\max_{q \in \{1, \ldots, Q\}} \sum_{j: L_j(x^* R_j^a) = q} 1$
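To make steps 2-9 concrete, here is a minimal sketch (ours; scikit-learn's PCA assumed) that builds one rotation matrix $R_i^a$; a base learner is then trained on the rotated data X @ R. It assumes the 75% bootstrap sample is larger than the subset size, so that each per-subset PCA is well posed (the same condition is noted at the end of Section III).

```python
import numpy as np
from sklearn.decomposition import PCA

def rotation_matrix(X, K, rng=np.random.default_rng(0)):
    # Steps 2-9 of Algorithm 1: random feature subsets, per-subset PCA on a
    # 75% bootstrap sample, loadings placed block-wise in original feature order.
    n, D = X.shape
    R = np.zeros((D, D))
    for Fij in np.array_split(rng.permutation(D), K):               # step 2
        boot = rng.choice(n, size=int(0.75 * n), replace=True)      # step 5
        pca = PCA(n_components=len(Fij)).fit(X[np.ix_(boot, Fij)])  # step 6
        R[np.ix_(Fij, Fij)] = pca.components_.T                     # steps 8-9
    return R  # steps 10-11: train the base learner on X @ R
```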
III. ELM AND ITS ENSEMBLES

The ANN is another base learner used for the construction of classifier ensembles. However, the main drawbacks of conventional ANNs are their high computational complexity and low efficiency. To address this shortcoming, the Extreme Learning Machine (ELM) was proposed for learning generalized single hidden layer feed-forward neural networks (SLFNs) without tuning the hidden layer [48], [49].

A. Extreme learning machine

For generalized SLFNs, the output function of ELM is defined as:

$f(x_i) = \sum_{j=1}^{\delta} \beta_j h_j(x_i) = h(x_i)\beta$    (1)

where $\beta = [\beta_1, \beta_2, \ldots, \beta_\delta]^\top$ is the vector of weights between the hidden layer of δ nodes and the output node, and $h(x_i) = [h_1(x_i), h_2(x_i), \ldots, h_\delta(x_i)]$ is the hidden-layer vector of $x_i$. Specifically, $h(\cdot)$ is the feature mapping from the D-dimensional input space to the δ-dimensional hidden-layer feature space. That the standard SLFN can approximate the n samples with zero error means that $\sum_i \| f(x_i) - y_i \| = 0$. Thus, the n equations can be written compactly as:

$H\beta = Y$    (2)

where Y is the target matrix and H is the hidden-layer output matrix:

$H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_n) \end{bmatrix} = \begin{bmatrix} h_1(x_1) & \cdots & h_\delta(x_1) \\ \vdots & \ddots & \vdots \\ h_1(x_n) & \cdots & h_\delta(x_n) \end{bmatrix}$    (3)

The output weights in equation (2) are given by the following smallest-norm least-squares solution [49]:

$\beta = H^{+} Y$    (4)

where $H^{+}$ is the Moore-Penrose generalized inverse of the hidden-layer output matrix H.
In ELM, a feature mapping H from the input space to a higher-dimensional space is needed. The works of [55], [56] demonstrated that almost all nonlinear piecewise continuous functions can be used as output functions of the hidden nodes. In this paper, the Sigmoid function is adopted as the nonlinear piecewise continuous function:

$g(\omega, b, x_i) = \dfrac{1}{1 + \exp\left(-(\omega \cdot x_i + b)\right)}$    (5)

where $\{\omega_j, b_j\}_{j=1}^{\delta}$ are randomly generated values that can define a continuous probability distribution (i.e., $\int g = 1$). Thus, $h(x_i)$ is defined based on the nonlinear piecewise continuous function $g(\omega_j, b_j)$:

$h(x_i) = [g(\omega_1, b_1, x_i), \ldots, g(\omega_\delta, b_\delta, x_i)]$    (6)

The training and prediction steps of ELM are listed in Algorithm 2.

Algorithm 2 Extreme learning machine
Training phase
Input: $\{X, Y\} = \{(x_i, y_i)\}_{i=1}^{n}$: training samples; δ: number of nodes in the hidden layer; g: the sigmoid function.
Output: the output weights β.
1: Randomly select $\{\omega_1, \ldots, \omega_\delta\}$ and $\{b_1, \ldots, b_\delta\}$
2: For each training sample $x_i$, calculate the hidden-layer output: $h(x_i) = [g(\omega_1, b_1, x_i), \ldots, g(\omega_\delta, b_\delta, x_i)]$
3: Calculate the output weights: $\beta = H^{+} Y$
Prediction phase
Input: a new sample $x^*$; the output weights β; the sigmoid function g; $\{\omega_1, \ldots, \omega_\delta\}$ and $\{b_1, \ldots, b_\delta\}$.
Output: class label of $x^*$.
1: Calculate the hidden-layer output: $h(x^*) = [g(\omega_1, b_1, x^*), \ldots, g(\omega_\delta, b_\delta, x^*)]$
2: $y^* = h(x^*)\beta$. Assign the index of the column with the greatest value among the columns to the class label of $x^*$.

Compared to conventional feed-forward ANNs, ELM offers significant advantages such as: 1) fast learning speed, 2) no need to tune the parameters, 3) better generalization performance and 4) ease of implementation [48], [49].
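Algorithm 2 is compact enough to state in a few lines of code. The following is our sketch (NumPy assumed, one-hot targets Y of shape (n, Q)), not the authors' implementation; their experiments use the ELM source code referenced in Section VI.

```python
import numpy as np

def elm_train(X, Y, delta=256, rng=np.random.default_rng(0)):
    # Algorithm 2, training: random input weights/biases, sigmoid hidden layer,
    # output weights via the Moore-Penrose pseudo-inverse. Y is one-hot.
    W = rng.standard_normal((X.shape[1], delta))   # omega_1 .. omega_delta
    b = rng.standard_normal(delta)                 # b_1 .. b_delta
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))         # Eqs. (5)-(6)
    beta = np.linalg.pinv(H) @ Y                   # Eq. (4)
    return W, b, beta

def elm_predict(X, W, b, beta):
    # Algorithm 2, prediction: the class is the largest column of h(x*) beta.
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)
```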
B. Proposed ELM ensembles

ELM decreases the learning time dramatically with respect to a conventional ANN due to the random selection of the weights and biases of the hidden nodes [55], [56]. However, these parameters are not optimized; in this case, ELM is not able to incorporate prior knowledge of the inputs, and thus the generalization error might increase. Consequently, we propose to construct an ensemble of several predictors on the training set using the RS method, in which the parameters of each predictor are randomly selected. In this work, two implementations of the ELM ensembles, Random subspace- and Rotation subspace-based, are developed for hyperspectral image classification.

1) Random subspace with ELM: Given a training set, the parameters of ELM (activation function and number of hidden nodes), the number of features in a subset (M) and the number of classifiers (T), the RS with ELM algorithm can be summarized by the following three steps (see Algorithm 3):
1) generate a subset of M features from the entire feature set, for T times;
2) apply these features to an ELM classifier and obtain T classification results;
3) produce the final classification map by combining the T predictions using a majority voting rule.

Algorithm 3 Random Subspace with ELM
Training phase
Input: $\{X, Y\} = \{(x_i, y_i)\}_{i=1}^{n}$: training samples; T: number of classifiers; L: base classifier; $\mathcal{L} = \emptyset$: the ensemble; M: number of features in a subspace (M < D); F: feature set.
Output: the ensemble $\mathcal{L}$.
1: for i = 1 to T do
2:   Randomly select M features from F without replacement to form a new training set.
3:   Train an ELM classifier $L_i$ using the new training set.
4:   Add the classifier to the current ensemble, $\mathcal{L} = \mathcal{L} \cup L_i$.
5: end for
Prediction phase
Input: the ensemble $\mathcal{L} = \{L_i\}_{i=1}^{T}$; a new sample $x^*$.
Output: class label $y^*$
1: run each classifier in the ensemble on $x^*$.
2: $y^* = \arg\max_{q \in \{1, \ldots, Q\}} \sum_{j: L_j(x^*) = q} 1$

2) Rotation subspace with ELM: The main steps of Rotation subspace with ELM (RoELM) can be summarized as follows:
• divide the feature space into K disjoint subspaces;
• perform PCA on each subspace with a bootstrapped sample of 75% of the original training set;
• treat the new training set, obtained by rotating the original training set, as input to the individual classifier;
• generate the final result by combining the individual classification results using a majority voting rule.
The main difference between RoELM and RoRF is that we use the ELM classifier instead of the RF classifier as the base learner (see Step 11, the classifier-training step, in the training phase of Algorithm 1). Diversity in RoELM is promoted in three ways: 1) random selection of features; 2) feature extraction applied to the selected features using the bootstrap sampling technique; 3) random selection of the parameters of each ELM classifier.
When the number of training samples is smaller than the number of features, the covariance matrix is singular and cannot be inverted. In order to avoid the singularity of the covariance matrix, the value of 0.75 × n should be larger than M in the ensembles of RoF, RoRF and RoELM.
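A minimal sketch of Algorithm 3 (ours), reusing elm_train/elm_predict from the sketch in Section III-A; each member draws both its own feature subset and its own random hidden-layer parameters:

```python
import numpy as np
# reuses elm_train / elm_predict from the sketch in Section III-A

def rselm_train(X, Y, T=20, M=10, delta=256, rng=np.random.default_rng(0)):
    ensemble = []
    for _ in range(T):
        subset = rng.choice(X.shape[1], size=M, replace=False)              # step 2
        ensemble.append((subset, elm_train(X[:, subset], Y, delta, rng)))   # step 3
    return ensemble

def rselm_predict(ensemble, X):
    # Prediction phase: majority vote over the T member predictions.
    votes = np.stack([elm_predict(X[:, s], *p) for s, p in ensemble])
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)
```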
IV. EXTENDED MULTI-ATTRIBUTE PROFILES (EMAPs)

Mathematical morphology is a powerful framework for the analysis of spatial information in remote sensing imagery [30], [35]. In particular, attribute profiles have been successfully applied to produce classification maps of remote sensing data [38], [39]. An AP is obtained by applying a sequence of attribute filters (AFs) to a scalar image. AFs are connected operators, that is, they process a gray-level image by keeping or merging its connected components at different gray levels. Let φ and γ denote, respectively, an attribute thickening and an attribute thinning based on an arbitrary criterion $P_\lambda$. An AP of an image f is obtained by applying several attribute thickening and thinning operators for a given sequence of thresholds $\{\lambda_1, \lambda_2, \ldots, \lambda_\epsilon\}$ of the predicate P, as follows [39]:

$AP(f) = \left\{ \phi^{\lambda_\epsilon}(f), \phi^{\lambda_{\epsilon-1}}(f), \ldots, \phi^{\lambda_1}(f), f, \gamma^{\lambda_1}(f), \ldots, \gamma^{\lambda_{\epsilon-1}}(f), \gamma^{\lambda_\epsilon}(f) \right\}$    (7)

An AP deals with only one spectral band. If we applied it to the full set of spectral bands of hyperspectral data, the dimensionality of the resulting APs would become extremely high. In order to address this problem, Dalla Mura et al. proposed to consider only the first few principal components of the hyperspectral data [39]; however, any feature extraction or selection technique could also be used [42]. Thus, the expression of an EAP computed on the first C PCs of the original hyperspectral data [39] is given by:

$EAP = \{AP(PC_1), AP(PC_2), \ldots, AP(PC_C)\}$    (8)

An EMAP is composed of m different EAPs based on different attributes $\{a_1, a_2, \ldots, a_m\}$:

$EMAP = \left\{ EAP_{a_1}, EAP'_{a_2}, \ldots, EAP'_{a_m} \right\}$    (9)

where $EAP'_a = EAP_a \setminus \{PC_1, PC_2, \ldots, PC_C\}$. Although a wide variety of attributes can be used to construct APs, only the area and standard deviation attributes are considered in this study.
Fig. 1 presents the general steps of the construction of EMAPs using the area and standard deviation attributes. First, PCA is performed on the original hyperspectral image and the first components with cumulative eigenvalues over 99% are retained. Then, APs with the area and standard deviation attributes are computed on the retained features, and the output features are concatenated into a stacked vector to construct an EMAP. According to [43], $\lambda_s$ is initialized so as to cover a reasonable amount of deviation in the individual feature, which is mathematically given by:

$\lambda_s(F_i) = \dfrac{\mu_i}{100} \{\tau_{min}, \tau_{min} + \epsilon_s, \tau_{min} + 2\epsilon_s, \ldots, \tau_{max}\}$    (10)

where $F_i$ is the i-th feature of the image and $\mu_i$ is the mean value of the i-th feature. The values of $\tau_{min}$, $\tau_{max}$ and $\epsilon_s$ are 2.5%, 27.5% and 2.5%, respectively, which leads to 11 thinning and 11 thickening operations. The thresholds of the area attribute are constructed as follows:

$\lambda_a(F_i) = \dfrac{100}{\nu} \{\alpha_{min}, \alpha_{min} + \epsilon_a, \alpha_{min} + 2\epsilon_a, \ldots, \alpha_{max}\}$    (11)

where ν is the spatial resolution of the remote sensing image. The values of $\alpha_{min}$, $\alpha_{max}$ and $\epsilon_a$ are 1, 14 and 1, respectively. The EAP for the area attribute contains 14 thinning and 14 thickening operations for each feature.

Fig. 1. The construction of EMAPs using the area (A) and standard deviation (S) attributes. First, PCA is performed on the original hyperspectral image and the first features with cumulative eigenvalues over 99% are kept. Then, APs with the area and standard deviation attributes are computed on these features, and the output features are concatenated into a stacked vector to construct the EMAPs.
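As an illustration of Eqs. (7) and (11), the sketch below (ours) builds an area-attribute AP for one principal component. It assumes scikit-image, whose area_opening/area_closing implement the attribute thinning/thickening for the (increasing) area criterion; the standard-deviation attribute used in the paper is non-increasing and is not covered by these functions.

```python
import numpy as np
from skimage.morphology import area_opening, area_closing

def area_attribute_profile(pc, thresholds):
    # Eq. (7) with the area attribute: thickenings (closings) with decreasing
    # lambda, the image itself, then thinnings (openings) with increasing lambda.
    thick = [area_closing(pc, area_threshold=int(t)) for t in reversed(thresholds)]
    thin = [area_opening(pc, area_threshold=int(t)) for t in thresholds]
    return np.stack(thick + [pc] + thin, axis=-1)

# Eq. (11) for a 20 m/pixel image: lambda_a = (100 / nu) * {1, 2, ..., 14}
nu = 20.0
lambdas_area = (100.0 / nu) * np.arange(1, 15)
```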
V. EXPERIMENTS WITH SIMULATED DATA

In this section, a simple simulated hyperspectral data set is used to evaluate the classification performance of the proposed methods, including the RoRF ensemble, the ELM ensembles and the spectral-spatial strategy. A synthetic image is generated by a linear mixture model with Q = 4 spectra:

$x_i = \sum_{q=1}^{Q} m_q s_i^q + n_i$    (12)

where $x_i$ is the simulated mixed pixel and $\{m_q\}$ are spectral signatures obtained from the USGS digital library (available at https://engineering.purdue.edu/biehl/MultiSpec/). The spatial information is generated by using a multi-level logistic (MLL) distribution with a smoothness parameter equal to 2. The simulated image is composed of 128×128 pixels with 224 spectral bands. Assume that $x_i$ has class label $y_i = q_i$; we define $s_i^{q_i}$ as the abundance of the target class and $s_i^q$, $q \neq q_i$, as the abundances of the remaining classes which contribute to the mixed pixel. The latter are drawn from the uniform distribution over the simplex, so that $s_i^{q_i} = s$ and $\sum_{q \in \mathcal{Q},\, q \neq q_i} s_i^q = 1 - s$; we use the same value s = 0.7 for all pixels. Furthermore, zero-mean Gaussian noise with covariance $\sigma^2 I$, i.e., $n_i \sim N(0, \sigma^2 I)$, is added to the simulated image. In particular, $\sigma^2$ is set to 0.8 in order to obtain a very challenging classification problem. More details about how this dataset is generated can be found in [57].
We conducted four different experiments with the simulated hyperspectral image in order to investigate several relevant aspects of our proposed framework. In all experiments, 40 samples per class (160 samples in total) are selected as the training set. In order to increase the statistical significance of the results, the reported means and standard deviations are obtained from 10 Monte Carlo runs.
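For reference, this is our sketch of the mixture model of Eq. (12); the class map (an MLL realization in the paper) and the endmember matrix are taken as given, and a simple renormalization of uniform draws stands in for uniform sampling on the simplex.

```python
import numpy as np

def simulate_mixed_pixels(labels, M, s=0.7, sigma2=0.8, rng=np.random.default_rng(0)):
    # labels: (n,) class index per pixel; M: (Q, B) endmember spectra.
    Q, B = M.shape
    n = labels.size
    A = rng.random((n, Q))                             # abundances of the other classes
    A[np.arange(n), labels] = 0.0
    A = (1.0 - s) * A / A.sum(axis=1, keepdims=True)   # they sum to 1 - s
    A[np.arange(n), labels] = s                        # target-class abundance
    noise = np.sqrt(sigma2) * rng.standard_normal((n, B))   # n_i ~ N(0, sigma^2 I)
    return A @ M + noise
```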
The experiments on the simulated data consist in the following:
• In the first experiment, we evaluate the classification performance of RoRF with respect to the other decision tree ensembles, including RSDT, RF and RoF. The impact of the parameters, such as the number of trees in RF and RoRF and the number of features in a subset (M), is analyzed.
• In the second experiment, we compare the ELM ensembles with a standard ELM classifier. The key parameters are also analyzed.
• In the third experiment, we compare the ELM ensembles with the DT ensembles.
• In the fourth experiment, we give the classification performance of the proposed spectral-spatial classification strategy.
In order to analyze the ensembles clearly, we use the following measures to evaluate the performance:
• Overall accuracy (OA) of the ensemble.
• Average of the overall accuracies (AOA) of the individual classifiers.
• Diversity among the individual classifiers within the ensemble. In this paper, we select coincident failure diversity (CFD) as the diversity measure [58]; please refer to [58] for the details of CFD (a Matlab demo is available at http://pages.bangor.ac.uk/~mas00a/book_wiley/matlab_code/diversity/demo_diversity.html). Higher values of CFD indicate stronger diversity within the ensemble.

A. Experiment 1

In this experiment, the number of base classifiers (T) and the number of features in a subset (M) are set to 20 and 10, respectively. A non-parametric decision tree learning technique, the classification and regression tree, is used to construct the decision tree ensembles [59]. The impurity measure used for selecting the variables in CART is the Gini index. Table II shows the overall accuracies, the averages of the overall accuracies of the individual classifiers and the diversities obtained for the different decision tree ensemble classifiers. Greater values of AOA and diversity usually lead to better performance of the ensemble. From Table II, RoRF achieves the highest values of AOA and diversity, resulting in the best classification result.

TABLE II
OVERALL ACCURACIES (IN PERCENT), AVERAGE OF OVERALL ACCURACIES (IN PERCENT) AND DIVERSITY OBTAINED FOR DIFFERENT DT ENSEMBLES WHEN APPLIED TO THE SIMULATED HYPERSPECTRAL DATA (T = 20 AND M = 10).

Classifiers   RSDT          RF            RoF           RoRF
OA            53.52±1.15    57.81±0.80    79.50±1.09    85.15±0.73
AOA           35.83±0.42    37.12±0.51    57.07±1.24    67.75±1.06
Diversity     0.38±0.0044   0.39±0.0053   0.61±0.0120   0.70±0.0089

Then, we analyze the performance of the decision tree ensembles for different values of M with T = 20. Fig. 2 shows the OAs, AOAs and diversities obtained by RSDT, RF, RoF and RoRF as a function of M. The best performance is achieved by the proposed RoRF ensemble approach, which yields the best OA, AOA and diversity for all values of M. From the figure, we can observe that when M becomes larger, all the DT ensembles tend to perform better.

Fig. 2. (a) OAs obtained by RSDT, RF, RoF and RoRF with different values of M. (b) AOAs obtained by RSDT, RF, RoF and RoRF with different values of M. (c) Diversities obtained by RSDT, RF, RoF and RoRF with different values of M (T = 20).

Furthermore, we also studied the classification performance of RoRF with different numbers of trees in RF. The statistics are reported in Table III. The OA, AOA and diversity of RoRF increase as the number of trees increases. Notice that good performance is achieved by RoRF with only 10 decision trees in RF, which obtains an OA of 82.84%, even higher than that of RoF with 20 trees. Therefore, in order to save time and memory, we can build the RoRF ensemble with a small number of trees in RF.

TABLE III
OVERALL ACCURACIES (IN PERCENT), AVERAGE OF OVERALL ACCURACIES (IN PERCENT) AND DIVERSITY OBTAINED FOR RORF WITH DIFFERENT NUMBERS OF TREES IN RF (T = 20 AND M = 10).

Number of trees in RF       1               5               10
RF     OA                   39.31±1.64      44.36±1.58      52.25±0.91
       AOA                  N/A             36.97±1.24      37.64±0.96
       Diversity            N/A             0.46±0.0106     0.42±0.0040
RoRF   OA                   72.80±1.59      79.85±0.95      82.84±0.87
       AOA                  46.54±0.80      57.25±0.82      60.60±0.89
       Diversity            0.49±0.0084     0.62±0.0087     0.68±0.0088

B. Experiment 2

In this experiment, T, M and δ are set to 20, 10 and 256, respectively. The Sigmoid function is selected as the hidden-node function. The OAs, AOAs and diversities obtained by ELM and the proposed ELM ensemble classifiers are shown in Table IV. Note that the proposed RoELM ensemble classifier produced the best result. This is reasonable, since RoELM introduces the rotation strategy into the ensemble, which can increase both the accuracies of the member classifiers and the diversity within the ensemble.

TABLE IV
OVERALL ACCURACIES (IN PERCENT), AVERAGE OF OVERALL ACCURACIES OF INDIVIDUAL CLASSIFIERS (IN PERCENT) AND DIVERSITY OBTAINED FOR DIFFERENT ELM ENSEMBLES WHEN APPLIED TO THE SIMULATED HYPERSPECTRAL DATA (M = 10).

Classifiers   ELM           RSELM         RoELM
OA            53.52±1.15    67.75±1.00    79.73±1.67
AOA           N/A           43.80±0.54    58.98±1.19
Diversity     N/A           0.46±0.0057   0.62±0.0115

Fig. 3 plots the OAs, AOAs and diversities of the ELM ensembles with respect to the number of features in a subset (M); for these experiments, T and δ are fixed to 20 and 256, respectively. Fig. 4 reports the OAs, AOAs and diversities for different values of the parameter δ of the ELM ensembles; here, T and M are fixed to 20 and 10, respectively. From the two figures, we have the following observations: 1) RoELM achieves the best results in all cases, which demonstrates that RoELM is an effective ensemble method; 2) the effects of M are consistent with the preliminary test in Fig. 2: the higher the number of features in a subset (M), the higher the OAs, AOAs and diversities of the ELM ensembles; 3) for this simulated data, ELM and its ensembles produce the best classification performance when δ = 64. Beyond this point, larger δ results in lower AOAs and diversities of RSELM and lower diversities of RoELM; in contrast, the AOAs of RoELM increase as δ increases. The reason is that a complex network (high value of δ) may overfit the training data of the member classifiers in RSELM.

Fig. 3. (a) OAs obtained by ELM, RSELM and RoELM with different values of M. (b) AOAs obtained by RSELM and RoELM with different values of M. (c) Diversities obtained by RSELM and RoELM with different values of M (T = 20 and δ = 256).

Fig. 4. (a) OAs obtained by ELM, RSELM and RoELM with different values of δ. (b) AOAs obtained by RSELM and RoELM with different values of δ. (c) Diversities obtained by RSELM and RoELM with different values of δ (T = 20 and M = 10).

C. Experiment 3

Table V gives the OAs, AOAs and diversities obtained for the RSDT, RSELM, RoF and RoELM ensembles. In order to make a fair comparison of the DT ensembles and the ELM ensembles, RSDT shares the same subspaces with RSELM, and RoF shares the same rotations with RoELM. In these experiments, T, M and δ are set to 20, 10 and 256, respectively.
TABLE V
OVERALL ACCURACIES (IN PERCENT), AVERAGE OF OVERALL ACCURACIES OF MEMBER CLASSIFIERS (IN PERCENT) AND DIVERSITY OBTAINED FOR THE RSDT, RSELM, ROF AND ROELM ENSEMBLES WHEN APPLIED TO THE SIMULATED HYPERSPECTRAL DATA (T = 20, M = 10 AND δ = 256).

Classifiers   RSDT          RSELM         RoF           RoELM
OA            53.52±1.15    67.75±1.00    79.50±1.09    79.73±1.67
AOA           35.83±0.42    43.80±0.54    57.07±1.24    58.98±1.19
Diversity     0.38±0.0044   0.46±0.0057   0.61±0.0120   0.62±0.0115

Compared to the DT ensembles (RSDT and RoF), the proposed ELM ensembles (RSELM and RoELM) adopt two strategies to improve the classification performance: one is to use ELM as the base learner to improve the individual classification accuracies, and the other is to utilize the random selection of the parameters of each ELM classifier to promote diversity within the ensemble. In addition, RoELM and RoF are superior to RSELM and RSDT. The sensitivity of the classification performance to the parameters can be seen in Sections V-B and V-C.

D. Experiment 4

Although the Random subspace ensemble can provide better classification results than single classifiers, the map (see Fig. 5(b)) still looks noisy due to the use of spectral information only. In order to further improve the results, the spatial information should be considered. In this work, we propose to use EMAPs, which offer the potential to model structural information in great detail through the use of different types of attributes. In this experiment, the proposed RoELM for spectral and spatial classification is considered. The first four components resulting from PCA (which comprise more than 99% of the data variance) were used, and the EMAPs consisted of 204 features. Fig. 5 shows the ground truth of the simulated image and the classification maps of RoELM with spectral and spatial information, respectively. The OAs, AOAs and diversities are also listed in Table VI. The classification accuracy of RoELM with EMAPs is significantly higher than that of RoELM with spectral information, and the map produced by RoELM with EMAPs is smoother than the one produced by RoELM with spectral information when compared to the ground truth.

TABLE VI
OVERALL ACCURACIES (IN PERCENT), AVERAGE OF OVERALL ACCURACIES OF MEMBER CLASSIFIERS (IN PERCENT) AND DIVERSITY OBTAINED FOR THE ROELM ENSEMBLES WHEN APPLIED TO THE SPECTRAL AND SPATIAL INFORMATION OF THE SIMULATED HYPERSPECTRAL DATA.

Classifiers   RoELM with spectral information   RoELM with EMAPs
OA            79.73±1.67                        99.30±0.08
AOA           58.98±1.19                        97.12±0.13
Diversity     0.62±0.0115                       0.74±0.0221

Fig. 5. (a) Image of class labels for the simulated image. (b) Classification map of RoELM with spectral information (OA = 79.67%). (c) Classification map of RoELM with EMAPs (OA = 99.42%).

In addition, we also studied the effect of M and δ on the classification performance of RoELM with EMAPs. For this dataset, RoELM with EMAPs can obtain a high overall accuracy with M = 10 (OA = 99.30%); larger values of M do not improve the accuracies significantly. Similarly to the results of RoELM with spectral information, the classification accuracy of RoELM with EMAPs increases gradually as δ increases at first, but decreases dramatically once δ exceeds 64.
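The whole spectral-spatial chain of this experiment can be summarized in a short sketch (ours; it reuses the hypothetical area_attribute_profile from the sketch in Section IV, assumes scikit-learn's PCA, uses illustrative thresholds, and would feed its output to an RoELM-style ensemble):

```python
import numpy as np
from sklearn.decomposition import PCA
# reuses area_attribute_profile from the sketch in Section IV

def emap_features(cube, n_pc=4, lambdas=(5, 10, 20, 40)):
    # PCA to the first components, one area AP per component, stacked per pixel.
    rows, cols, bands = cube.shape
    pcs = PCA(n_components=n_pc).fit_transform(cube.reshape(-1, bands))
    pcs = pcs.reshape(rows, cols, n_pc)
    aps = [area_attribute_profile(pcs[..., c], lambdas) for c in range(n_pc)]
    return np.concatenate(aps, axis=-1).reshape(rows * cols, -1)
```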
Summarizing, the experiments conducted with the simulated data set indicate that Random subspace ensembles, with or without EMAPs, achieve better performance than single classifiers in highly mixed and noisy environments and with a limited number of training samples. In particular, the proposed RoRF and ELM ensembles show good performance in the classification of hyperspectral data. However, the performance of the RS ensembles has been shown to depend on the setting of the parameters M and δ. In order to obtain good characterization results, these parameters should be optimized; in particular, fine-tuning of these parameters can be done manually. Although it is encouraging to observe positive classification results with the proposed methods on simulated hyperspectral data, in the next section we perform further analyses on real hyperspectral scenes, together with comparisons with other state-of-the-art methods, in order to fully substantiate the proposed methods.

VI. EXPERIMENTS WITH REAL HYPERSPECTRAL DATASETS

In this section, the proposed approaches are evaluated using two real hyperspectral datasets. Two individual classifiers, DT and ELM, and six ensemble methods, RSDT, RF, RoF, RoRF, RSELM and RoELM, are applied to classify the spectral information and the EMAPs of the hyperspectral data. CART is used to build the decision tree ensembles and the Gini index is adopted as the impurity measure for selecting the variables [59]. The Sigmoid function is selected as the hidden-node function in ELM and its ensembles. In this work, only the first four components resulting from PCA (which comprise more than 99% of the data variance) were used, and the EMAPs consisted of 204 features. The results reported in this work are the means of ten Monte Carlo runs. We used the following measures to evaluate the performance of the different classification methods:
• Overall accuracy (OA): the percentage of correctly classified samples.
• Average accuracy (AA): the average percentage of correctly classified samples over the individual classes.
• Kappa coefficient (κ): the percentage agreement corrected by the level of agreement that could be expected due to chance alone (a minimal computation is sketched after this list).
• Computation time: all methods were implemented in Matlab on a computer with an Intel(R) Xeon(R) 2-CPU, 2.8 GHz and 12 GB of memory. The Random Forest implementation was downloaded from http://code.google.com/p/randomforest-matlab/. The source code of ELM can be accessed at http://www.ntu.edu.sg/home/egbhuang/elm_codes.html.
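For completeness, a minimal sketch (ours, NumPy) of how OA, AA and κ follow from the confusion matrix:

```python
import numpy as np

def accuracy_measures(y_true, y_pred, Q):
    # Q x Q confusion matrix: rows = true class, columns = predicted class.
    # Assumes integer class labels 0..Q-1.
    C = np.zeros((Q, Q))
    np.add.at(C, (y_true, y_pred), 1)
    n = C.sum()
    oa = np.trace(C) / n                              # overall accuracy
    aa = np.mean(np.diag(C) / C.sum(axis=1))          # mean per-class accuracy
    pe = (C.sum(axis=1) @ C.sum(axis=0)) / n ** 2     # expected chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa
```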
A. Hyperspectral datasets

Two hyperspectral remote sensing images are used to assess the performance of the proposed methods.

1) Indian Pines AVIRIS image: The first hyperspectral image was recorded by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over the Indian Pines site in Northwestern Indiana, USA. This scene, which comprises 220 spectral bands in the wavelength range from 0.4 to 2.5 µm with a spectral resolution of 10 nm, is composed of 145 × 145 pixels with a spatial resolution of 20 m/pixel. Fig. 6 shows the three-band color composite image and the ground truth of the AVIRIS hyperspectral data. Table VII gives the class names and the number of ground-truth samples of the AVIRIS hyperspectral data.

Fig. 6. (a) Three-band color composite of the AVIRIS image. (b) Ground truth.

TABLE VII
INDIAN PINES AVIRIS IMAGE: CLASS NAMES AND NUMBER OF SAMPLES IN THE GROUND TRUTH

Number   Class Name                 Number in Ground Truth
1        Alfalfa                    54
2        Corn-no till               1434
3        Corn-min till              834
4        Bldg-Grass-Tree-Drives     234
5        Grass/pasture              497
6        Grass/trees                747
7        Grass/pasture-mowed        26
8        Corn                       489
9        Oats                       20
10       Soybeans-no till           968
11       Soybeans-min till          2468
12       Soybeans-clean till        614
13       Wheat                      212
14       Woods                      1294
15       Hay-windrowed              380
16       Stone-steel towers         95

2) University of Pavia ROSIS image: The second experiment was carried out on the University of Pavia image of an urban area, acquired by the Reflective Optics Spectrographic Imaging System (ROSIS)-03 optical airborne sensor. Nine land cover classes were considered for classification. The original image is composed of 610 × 340 pixels, with a spatial resolution of 1.3 m/pixel and 115 spectral bands. In this work, 12 noisy channels were removed and the remaining 103 spectral bands are used for the investigation. Fig. 7 shows the three-band color composite image and the reference map of the University of Pavia data. The class names and the numbers of training and test samples of the ROSIS image are presented in Table VIII.

Fig. 7. (a) Three-band color composite image of the ROSIS data. (b) Reference map.

TABLE VIII
UNIVERSITY OF PAVIA ROSIS IMAGE: CLASS NAMES AND NUMBER OF TRAINING AND TEST SAMPLES

Number   Class Name     Train   Test
1        Bricks         524     3682
2        Shadows        514     947
3        Metal Sheets   375     1345
4        Bare Soil      540     5029
5        Trees          231     3064
6        Meadows        532     18649
7        Gravel         265     2099
8        Asphalt        548     6631
9        Bitumen        392     1330

B. Results of the Indian Pines AVIRIS image

Tables IX and X present the classification results obtained for the individual classifiers and the RS ensemble methods using different numbers of training samples when the spectral information and the EMAPs are used as input, respectively. Average accuracies for each classifier are also given in parentheses. According to the studies in [27] and [60], the parameters used for each ensemble classifier are shown in Table XI (for 5 samples per class, M is set to 55 in the RoF, RoRF and RoELM ensembles with spectral information).

TABLE IX
OVERALL ACCURACIES AND AVERAGE ACCURACIES (IN PARENTHESES) OBTAINED FOR DIFFERENT CLASSIFICATION ALGORITHMS USING DIFFERENT SIZES OF THE TRAINING SET WHEN APPLIED TO THE SPECTRAL INFORMATION OF THE INDIAN PINES AVIRIS HYPERSPECTRAL DATA.
Samples per class   DT                  RSDT                RF                  RoF                 RoRF                ELM                 RSELM               RoELM
5                   29.64±3.61(39.62)   36.41±3.49(47.17)   42.87±3.65(53.79)   47.79±3.23(61.05)   51.14±3.22(62.66)   51.15±2.58(65.8)    55.39±3.08(69.69)   58.72±2.06(71.87)
10                  38.85±4.03(49.87)   46.57±2.48(58.33)   49.89±3.16(60.86)   57.33±2.26(69.95)   58.39±1.78(69.58)   57.17±1.92(71.06)   65.11±1.83(76.35)   69.84±1.27(79.67)
15                  40.43±2.42(51.79)   49.62±1.18(61.91)   51.13±1.52(63.73)   63.03±1.65(73.79)   61.49±1.71(73.01)   58.51±1.80(71.99)   68.69±1.59(79.7)    72.93±1.07(83.43)
20                  43.13±2.35(55.15)   53.82±2.07(65.78)   55.52±2.50(66.92)   68.98±2.71(79.89)   67.56±2.55(78.54)   57.68±2.02(70.75)   70.73±1.18(80.36)   75.95±0.82(85.22)
25                  46.59±1.32(57.54)   55.71±1.18(66.78)   57.23±1.56(68.52)   71.81±1.80(81.19)   70.17±1.46(79.43)   62.03±1.58(76.34)   72.67±0.97(83.71)   77.13±0.94(86.51)
30                  47.49±1.35(58.24)   58.23±2.36(67.98)   60.07±2.10(70.47)   72.65±2.27(81.36)   70.79±1.77(80.03)   61.88±1.10(73.96)   75.61±1.11(85.49)   78.24±0.67(86.87)
35                  47.90±2.56(57.75)   59.82±1.31(69.12)   61.48±1.59(71.10)   74.36±0.58(82.54)   72.78±1.17(81.23)   66.73±1.62(76.98)   74.86±1.53(84.31)   78.89±1.07(87.17)
40                  49.05±2.06(58.33)   60.85±1.27(70.10)   62.66±1.52(71.95)   74.46±1.19(82.97)   73.45±1.53(81.64)   66.44±1.13(77.79)   75.46±0.80(85.05)   80.08±0.5(88.24)
45                  50.42±2.14(60.79)   62.22±1.24(72.08)   63.68±0.93(73.25)   76.58±1.26(84.90)   74.81±1.37(83.37)   67.46±1.05(77.31)   77.43±0.31(86.64)   80.34±0.25(88.08)
50                  50.51±2.63(59.48)   62.63±0.99(71.60)   64.20±0.58(77.51)   75.96±1.06(84.39)   74.56±1.26(82.21)   67.65±1.69(77.11)   77.85±1.09(87.57)   81.19±0.7(89.00)

As shown in Tables IX and X, the RS ensemble methods exhibit the potential to improve the classification performance by using both spectral and spatial information. The proposed RoELM outperforms ELM, RSELM and the decision tree ensembles, achieving higher classification accuracies in all cases. With the help of the diversity promoted by the feature extraction approaches, the Rotation subspace classifiers, including RoF, RoRF and RoELM, are superior to the RF, RSDT and RSELM ensembles.

TABLE X
OVERALL ACCURACIES AND AVERAGE ACCURACIES (IN PARENTHESES) OBTAINED FOR DIFFERENT CLASSIFICATION ALGORITHMS USING DIFFERENT SIZES OF THE TRAINING SET WHEN APPLIED TO THE EMAPS OF THE INDIAN PINES AVIRIS HYPERSPECTRAL DATA.
Samples per class   DT                  RSDT                RF                  RoF                 RoRF                ELM                 RSELM               RoELM
5                   55.07±6.68(65.95)   57.48±5.51(70.24)   65.69±5.58(77.51)   66.22±4.41(76.91)   70.81±4.59(81.64)   73.24±5.28(81.44)   74.24±5.38(81.55)   75.97±4.01(83.59)
10                  70.22±5.01(80.19)   73.84±4.59(82.36)   77.21±3.43(85.77)   78.29±2.94(76.91)   80.98±2.45(88.31)   82.47±2.71(87.17)   83.36±2.33(87.69)   85.19±2.32(89.43)
15                  76.04±1.12(83.34)   80.34±2.28(85.96)   83.18±1.78(89.26)   83.26±1.9(88.25)    86.01±1.65(90.78)   83.92±2.94(88.02)   85.76±2.67(89.25)   87.45±2.05(91.4)
20                  80.82±2.32(85.98)   82.69±1.75(87.84)   84.46±1.80(89.44)   85.36±1.53(89.29)   87.41±1.45(91.55)   83.98±2.41(87.31)   87.67±1.49(90.15)   89.35±1.09(92.19)
25                  81.62±2.24(87.64)   84.37±2.14(89.95)   87.54±1.34(92.1)    87.92±1.31(92.10)   89.30±1.83(93.48)   87.02±2.60(90.7)    88.37±1.69(91.61)   90.70±1.33(94.02)
30                  84.17±2.23(88.38)   87.07±2.82(90.27)   88.66±1.06(92.94)   89.32±1.67(92.60)   91.05±1.60(94.46)   89.65±1.78(92.28)   90.73±1.60(92.82)   92.14±1.45(94.71)
35                  85.13±3.11(89.15)   87.51±2.25(91.72)   88.93±1.33(92.66)   89.84±1.06(93.07)   91.29±0.74(94.00)   89.48±1.70(91.73)   90.76±1.20(92.45)   92.77±0.98(94.99)
40                  85.14±1.71(88.52)   87.72±1.82(90.91)   89.85±1.23(93.29)   90.44±0.98(92.98)   91.99±1.20(94.45)   88.76±1.17(91.03)   91.01±1.25(92.54)   92.99±0.64(94.8)
45                  85.68±1.48(89.31)   89.12±1.91(92.42)   90.81±0.97(93.69)   91.34±0.96(93.87)   92.71±0.93(95.16)   91.4±1.24(93.73)    93.71±0.65(95.41)   93.97±0.60(95.61)
50                  87.08±1.53(90.43)   90.27±1.21(92.82)   91.61±0.79(94.01)   92.20±0.85(94.59)   93.31±0.42(95.33)   91.32±0.86(93.35)   94.29±0.43(95.31)   94.53±0.41(95.25)

TABLE XI
THE PARAMETERS USED FOR ELM AND THE RS ENSEMBLE CLASSIFIERS (INDIAN PINES AVIRIS IMAGE).

Features    Methods   T    M     δ
Spectral    RSDT      20   110   -
            RF        20   15    -
            RoF       20   110   -
            RoRF      20   110   -
            ELM       -    -     256
            RSELM     20   110   256
            RoELM     20   110   256
EMAPs       RSDT      20   102   -
            RF        20   15    -
            RoF       20   3     -
            RoRF      20   3     -
            ELM       -    -     256
            RSELM     20   102   256
            RoELM     20   3     256

In order to show the performance of the RS ensemble methods under different training conditions and scenarios, in a second experiment we evaluated the classification accuracies of the RS ensemble approaches using a fixed number of training samples: 10% of the labeled samples per class are used for training (a total of 1036 samples) and the remaining labeled samples are used for testing. Tables XII and XIII provide the OAs, AAs, κ and class-specific accuracies obtained from the individual and ensemble classifiers using the spectral information and the EMAPs, respectively. The processing times in seconds are also included for reference.

TABLE XII
CLASSIFICATION ACCURACIES OBTAINED FOR DIFFERENT CLASSIFICATION ALGORITHMS USING 10% OF THE SAMPLES IN THE GROUND TRUTH AS TRAINING SAMPLES WHEN APPLIED TO THE SPECTRAL INFORMATION OF THE INDIAN PINES AVIRIS HYPERSPECTRAL DATA.

Class     DT      RSDT    RF      RoF     RoRF    ELM     RSELM   RoELM
1         38.16   28.77   15.71   53.06   29.80   15.11   25.71   49.39
2         50.76   65.79   61.82   79.86   74.38   73.01   79.95   83.13
3         45.09   53.86   50.55   69.31   62.88   59.01   61.89   71.85
4         26.87   34.69   30.76   63.32   50.76   36.59   54.17   62.09
5         67.63   79.04   79.98   88.37   83.06   90.56   93.17   91.48
6         78.23   92.14   92.37   94.72   93.99   95.55   97.49   94.64
7         20.43   16.09   14.78   59.13   50.87   3.04    8.02    46.52
8         85.93   92.77   96.16   97.73   98.20   99.43   99.57   98.32
9         1.67    0.56    6.11    0       2.78    4.44    12.22   37.77
10        47.01   63.85   61.76   77.19   76.87   64.02   68.56   75.67
11        62.02   80.56   83.88   87.35   90.53   80.77   87.72   87.87
12        29.73   42.28   47.25   69.58   68.72   67.09   75.51   82.64
13        80.94   90.94   92.93   97.91   97.07   99.63   99.58   98.53
14        89.57   93.67   94.13   96.22   96.36   95.55   96.94   97.36
15        35.96   39.77   39.33   55.85   45.09   59.97   60.26   54.77
16        54.59   71.88   81.06   88.35   89.53   46.12   70.59   73.29
OA        59.77   72.53   72.84   83.14   81.31   77.46   82.38   84.70
AA        50.79   57.98   59.29   73.62   69.12   61.89   68.23   75.33
κ         54.13   68.44   68.70   80.70   78.49   74.11   79.74   82.44
Time(s)   1.49    9.25    0.85    26.95   18.11   0.22    6.18    14.63

It can be seen from the results in Tables XII and XIII that the performance of ELM is superior to that of CART in terms of both testing accuracy and learning time.
When the spectral information is used as input, RoELM and RoF share the top position: the OAs (AAs) of the two methods are 84.70% (75.33%) and 83.14% (73.62%), higher than those of the other methods. Class 9 produces bad results with all classifiers; the reason may be that insufficient information is provided for class 9, with only 2 samples used in training. Compared to the results reported in Table XII, the classification accuracies in Table XIII, which involve the spatial information, are much better than those obtained with the spectral information only, demonstrating that EMAPs can accurately model the spatial-contextual information in all cases.

TABLE XIII
CLASSIFICATION ACCURACIES OBTAINED FOR DIFFERENT CLASSIFICATION ALGORITHMS USING 10% OF THE SAMPLES IN THE GROUND TRUTH AS TRAINING SAMPLES WHEN APPLIED TO THE EMAPS OF THE INDIAN PINES AVIRIS HYPERSPECTRAL DATA.

Class     DT      RSDT    RF      RoF     RoRF    ELM     RSELM   RoELM
1         82.65   83.67   87.14   87.35   87.14   74.08   86.94   87.76
2         86.14   91.01   91.00   91.11   91.55   90.21   90.12   90.33
3         92.65   95.5    95.31   95.63   96.51   97.48   98.75   98.95
4         74.55   87.11   89.15   87.57   91.66   88.34   94.27   95.02
5         89.53   92.37   92.51   93.31   93.20   91.28   94.63   94.36
6         94.15   95.33   97.22   96.95   98.07   97.75   99.12   99.32
7         23.48   23.04   73.91   40.43   84.78   88.28   96.09   96.09
8         100     99.77   99.77   99.75   99.80   96.23   99.39   99.55
9         69.44   61.11   92.77   62.78   98.33   80.56   89.44   95.56
10        84.43   86.89   88.43   87.50   89.06   92.61   90.55   91.56
11        94.67   96.18   97.94   96.74   98.55   96.53   98.49   98.66
12        85.14   90.29   92.28   90.22   93.06   86.20   89.19   89.17
13        99.16   98.84   99.11   99.42   99.53   98.95   99.48   99.48
14        98.57   99.22   99.25   99.23   99.24   96.29   99.16   99.42
15        93.27   96.40   97.63   96.35   98.63   75.38   92.40   94.36
16        96.12   97.41   98.00   97.65   98.12   0.71    50.51   32.71
OA        91.57   94.05   95.17   94.56   95.83   92.59   95.46   95.40
AA        85.23   87.13   93.21   90.00   94.83   84.43   92.32   91.39
κ         90.41   93.22   94.36   93.80   95.24   91.64   94.82   94.73
Time(s)   0.77    4.55    0.63    13.99   14.07   0.21    4.08    13.59

With the EMAPs as input for this scene, RF and RSELM are slightly better than RoF and RoELM; among all methods, RoRF yields the highest OA, AA and κ. The feature extraction steps of the RoF, RoRF and RoELM classifiers lead to longer computation times than those of RSDT, RF and RSELM. The computational complexity of the ELM ensembles is lower than that of the DT ensembles, and the computation time of the RF ensemble is extremely low (less than 1 s).
Fig. 8 presents the classification maps (one of the ten Monte Carlo runs) obtained for the individual and ensemble learning methods with 10% of the labeled samples used for training, as in Tables XII and XIII. As can be seen from these maps, the RS ensembles can improve the classification performance and reduce the classification noise. The classification methods based on EMAP spatial features result in classification maps with more homogeneous regions when compared to the classification results using spectral information. More classification results for the Indian Pines AVIRIS image based on EMAPs and other spatial-contextual information can be found in [34], [45]–[47]. The accuracies of these previous studies are not directly compared with those given in this paper because different experimental settings (number of features, training and testing samples) are used in those studies. However, it can be concluded that RS ensembles with EMAPs perform well compared to other previously proposed classification approaches for hyperspectral data.
Fig. 8. Classification results of the Indian Pines AVIRIS image (only one Monte Carlo run). Overall accuracies of the classifiers are also given.

C. Results of the University of Pavia ROSIS image

Random subspace ensembles with both spectral and spatial information are applied to the University of Pavia ROSIS image. For all the ensemble classifiers, the number of classifiers (iterations) is fixed to 20. Following the studies in [27] and [60], the number of features in a subset (M) in each ensemble classifier is set as follows. For the RF algorithm, the number of features in a subset is set to the default value √N of the software package (10 for this scene). For the RSDT and RSELM approaches, M is set to 52 for the spectral information and 102 for the spatial information, respectively. The number of features in a subset for RoF, RoRF and RoELM is set to 10 for the spectral information and 3 for the spatial information. The number of hidden nodes in ELM and its ensembles is fixed to 128.
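As an illustration of these settings, the sketch below configures the RF baseline with its default subspace size; scikit-learn is assumed, and the number of features (103) is only a placeholder standing in for roughly one hundred spectral bands. Note that, unlike the per-classifier feature subsets of the random subspace method, Random Forest draws a fresh random subset of M features at each node split.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: ~100 spectral bands, nine classes as in this scene.
rng = np.random.RandomState(0)
X, y = rng.rand(300, 103), rng.randint(0, 9, 300)

# T = 20 trees; max_features='sqrt' implements the sqrt(N) rule, i.e.
# M = floor(sqrt(103)) = 10 features considered at each split.
rf = RandomForestClassifier(n_estimators=20, max_features='sqrt', random_state=0)
rf.fit(X, y)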
Table XIV gives the overall accuracy, average accuracy and class-specific accuracies obtained for the different classification algorithms using the entire training set when applied to the spectral information of the University of Pavia ROSIS image. The computational times are also given in this table. From this table, it is clear that RoF provides the best results in terms of global and individual class accuracies, followed by RoRF and RoELM.

TABLE XIV
CLASSIFICATION ACCURACIES OBTAINED FOR DIFFERENT CLASSIFICATION ALGORITHMS USING THE ENTIRE TRAINING SET WHEN APPLIED TO THE SPECTRAL INFORMATION OF THE UNIVERSITY OF PAVIA ROSIS IMAGE.

Class     DT     RSDT   RF     RoF    RoRF   ELM    RSELM  RoELM
1         83.46  91.68  90.17  92.55  93.29  90.7   95.05  91.76
2         92.93  97.42  97.44  98.30  99.60  99.65  99.69  99.89
3         96.95  98.99  98.82  99.58  99.55  85.64  98.95  99.85
4         76.29  81.56  77.80  95.60  95.55  94.92  96.14  97.62
5         97.75  98.67  98.58  95.62  98.79  96.68  97.11  95.33
6         52.35  53.12  56.10  74.61  65.38  58.76  64.32  69.19
7         54.79  51.32  53.79  58.49  57.54  70.18  68.06  63.43
8         71.93  79.60  80.07  84.55  85.34  77.21  80.50  76.02
9         76.62  83.68  84.63  89.93  90.39  88.01  91.08  90.90
OA        67.30  70.44  71.37  82.66  79.04  74.56  78.45  79.44
AA        78.11  81.78  81.93  87.69  87.27  84.64  87.88  87.11
κ         60.15  63.90  64.79  78.09  73.98  68.75  72.84  74.25
Time(s)   1.98   20.50  2.33   44.74  53.41  1.56   34.65  51.45

In order to enhance the classification results, the RS ensembles with EMAPs are further applied to classify the hyperspectral data, and the global and class-specific accuracies are reported in Table XV.

TABLE XV
CLASSIFICATION ACCURACIES OBTAINED FOR DIFFERENT CLASSIFICATION ALGORITHMS USING THE ENTIRE TRAINING SET WHEN APPLIED TO THE EMAPS OF THE UNIVERSITY OF PAVIA ROSIS IMAGE.

Class     DT     RSDT    RF      RoF     RoRF    ELM    RSELM  RoELM
1         98.02  98.95   98.94   98.61   99.16   98.96  99.51  99.58
2         85.96  92.47   97.33   97.00   99.32   98.37  99.31  98.38
3         99.55  99.58   99.62   99.62   99.62   96.51  99.67  99.56
4         98.95  96.55   96.34   99.39   97.35   97.61  99.96  99.90
5         89.69  97.10   99.12   94.55   99.23   94.54  98.52  97.45
6         90.85  91.66   97.28   93.65   97.42   96.42  98.35  98.55
7         67.13  80.63   73.05   85.45   75.15   87.88  98.08  99.38
8         91.34  94.26   95.14   93.52   95.32   97.16  97.69  97.54
9         99.32  100.00  100.00  100.00  100.00  99.92  99.93  99.92
OA        91.67  93.61   96.08   94.85   96.47   96.49  98.67  98.69
AA        91.20  94.58   95.20   95.75   95.84   96.37  99.00  98.92
κ         89.14  91.71   94.83   93.26   95.34   95.37  98.21  98.25
Time(s)   1.38   29.72   2.82    37.93   70.24   1.83   37.66  71.59

It can be seen from Table XV that the classification results with EMAPs significantly outperform those considering only the spectral information, and all the RS ensembles yield highly accurate results. The proposed RSELM and RoELM outperform ELM, RoRF and the other ensemble methods in terms of both global and class-specific accuracies. The rotation-subspace-based classifiers (RoF, RoRF and RoELM) generate more accurate results than their random-subspace counterparts (RSDT, RF and RSELM), because they introduce more diversity within the ensemble by rotating the feature subsets (a sketch of this rotation scheme is given after Fig. 9). Concerning the computational load, the observations differ from the former experiments: the computational cost of ELM and its ensembles is higher than that of DT and the DT ensembles, because of the large size of the dataset. The spectral-spatial methods are computationally less efficient than the spectral-based methods due to the higher dimensionality of the input features, but provide, in turn, higher accuracies.
For illustrative purposes, Fig. 9 provides the classification maps of the individual and ensemble classifiers (one of ten Monte Carlo runs). Compared to the results using only the spectral information presented in Fig. 9(a-f), the maps involving the spatial information (Fig. 9(g-p)) show more homogeneous areas (especially for the class Meadows located in the lower left area) and reduced classification noise.

Fig. 9. Classification results of the University of Pavia ROSIS image (only one Monte Carlo run). Overall accuracies for each classifier are given.
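As noted above, the rotation-based ensembles owe their edge to the extra diversity injected by rotating random feature groups before training each base learner. The following minimal sketch illustrates one iteration of such a rotation-subspace scheme with decision trees as base learners. It is a simplified reading (the full Rotation Forest algorithm additionally subsamples classes and instances before each PCA [26]); scikit-learn and NumPy are assumed, and all function names are ours.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def fit_rotation_ensemble(X, y, T=20, M=3, seed=0):
    """Train T trees, each on features rotated group-wise by PCA.
    M is the number of features per group (cf. Table XI)."""
    rng = np.random.RandomState(seed)
    n_features = X.shape[1]
    ensemble = []
    for _ in range(T):
        order = rng.permutation(n_features)                 # random feature split
        groups = [order[i:i + M] for i in range(0, n_features, M)]
        rotations = [PCA().fit(X[:, g]) for g in groups]    # one rotation per group
        Xr = np.hstack([r.transform(X[:, g]) for r, g in zip(rotations, groups)])
        tree = DecisionTreeClassifier(random_state=rng.randint(1 << 30)).fit(Xr, y)
        ensemble.append((groups, rotations, tree))
    return ensemble

def predict_rotation_ensemble(ensemble, X):
    """Majority vote over the individually rotated trees (integer labels assumed)."""
    votes = []
    for groups, rotations, tree in ensemble:
        Xr = np.hstack([r.transform(X[:, g]) for r, g in zip(rotations, groups)])
        votes.append(tree.predict(Xr))
    votes = np.asarray(votes, dtype=int)    # shape (T, n_samples)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)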
In addition, Table XVI compares RSELM with EMAPs and RoELM with EMAPs against other state-of-the-art spectral-spatial classification methods: SVM+Clustering [4], MLRsubMLL [57], generalized composite kernels (GCK) [45], and Mixed lasso with 3D-DWT features [34]. The SVM+Clustering approach combines the results of a pixelwise SVM classification and the segmentation map obtained by partitional clustering using majority voting [4]. MLRsubMLL is a Bayesian approach with two main steps: 1) the posterior probability distributions are constructed by a subspace multinomial logistic regression (MLR) classifier, and 2) a segmentation step infers an image of class labels from a posterior distribution built on the aforementioned classifier and on a multilevel logistic (MLL) prior [57]. GCK combines different kernels built on the spectral and the spatial information of the hyperspectral data without any weight parameters [45]; the classifier in that work is the multinomial logistic regression, and the spatial information is modeled by EMAPs. Mixed lasso with 3D-DWT features uses structured sparse logistic regression (solved by the mixed lasso) to classify three-dimensional discrete wavelet transform (3D-DWT) features [34]. The results presented in Table XVI are obtained using the same training and testing set. From Table XVI, we can conclude that both RSELM EMAPs and RoELM EMAPs outperform the other spatial-spectral classifiers in terms of OA, AA and κ. In particular, RoELM EMAPs gains the highest OA and κ, and RSELM EMAPs achieves the highest AA.

TABLE XVI
CLASSIFICATION ACCURACIES OBTAINED FROM THE PROPOSED METHODS (ROELM EMAPS AND RSELM EMAPS) AND OTHER SPATIAL-SPECTRAL CLASSIFIERS FOR THE UNIVERSITY OF PAVIA ROSIS IMAGE.

Classifier                 OA     AA     κ
SVM+Clustering [4]         94.68  95.21  92.92
MLRsubMLL [57]             94.10  93.45  92.24
GCK [45]                   98.09  97.76  97.46
Mixed lasso 3D-DWT [34]    98.15  97.56  97.48
RSELM EMAPs                98.67  99.00  98.21
RoELM EMAPs                98.69  98.92  98.25

In order to assess the effectiveness of the RS ensembles with a limited training set, we have randomly extracted a few training samples from the training set: only 10 samples per class are used for this experiment. We have repeated the training sample selection and the classification process ten times, and the mean classification results are reported in this paper. Tables XVII and XVIII show the overall accuracy, average accuracy and class-specific accuracies obtained for the individual and ensemble classifiers using 10 samples per class when the spectral and the spatial information of the University of Pavia ROSIS image are used as input, respectively. The classification results in Tables XVII-XVIII are lower than those in Tables XIV-XV, due to the limited training set. For instance, the OA and AA of RoRF EMAPs are 96.47% and 95.84% with the original training set, whereas, with the limited training samples, they drop to 89.30% and 91.49%. Nevertheless, even with a very small training set, the results obtained by combining the RS ensembles and the EMAPs are still very good. Furthermore, Table XIX compares RSELM EMAPs and RoELM EMAPs against the state-of-the-art spectral-spatial classification methods for a limited training set (10 samples per class).
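The sampling protocol of this limited-sample experiment can be summarized by the following sketch. It assumes a scikit-learn-style classifier returned by a hypothetical factory make_classifier(), with X and y denoting the reference samples and their labels; the random draw of 10 training pixels per class is repeated and the resulting overall accuracies are averaged.

import numpy as np
from sklearn.metrics import accuracy_score

def monte_carlo_oa(X, y, make_classifier, n_per_class=10, n_runs=10, seed=0):
    """Mean/std of overall accuracy over repeated random training-set draws.
    Assumes every class has at least n_per_class labeled samples."""
    rng = np.random.RandomState(seed)
    oas = []
    for _ in range(n_runs):
        train_idx = []
        for c in np.unique(y):
            idx = np.flatnonzero(y == c)
            train_idx.extend(rng.choice(idx, size=n_per_class, replace=False))
        train_idx = np.asarray(train_idx)
        test_mask = np.ones(len(y), dtype=bool)
        test_mask[train_idx] = False          # remaining samples form the test set
        clf = make_classifier().fit(X[train_idx], y[train_idx])
        oas.append(accuracy_score(y[test_mask], clf.predict(X[test_mask])))
    return float(np.mean(oas)), float(np.std(oas))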
TABLE XVII
CLASSIFICATION ACCURACIES OBTAINED FOR DIFFERENT CLASSIFICATION ALGORITHMS USING 10 SAMPLES PER CLASS WHEN APPLIED TO THE SPECTRAL INFORMATION OF THE UNIVERSITY OF PAVIA ROSIS IMAGE.

Class     DT     RSDT   RF     RoF    RoRF   ELM    RSELM  RoELM
1         64.06  66.11  69.78  73.67  80.33  52.83  61.17  54.08
2         87.71  91.22  96.58  95.21  98.36  92.83  93.36  97.56
3         92.92  96.39  95.73  98.61  99.20  40.37  13.37  94.59
4         44.66  51.76  51.06  82.51  77.54  52.54  55.26  84.55
5         75.78  84.35  88.54  89.24  96.81  55.41  41.34  89.13
6         36.08  39.71  46.02  47.65  48.54  50.08  59.19  48.29
7         37.66  40.96  43.83  48.64  46.34  50.27  56.91  61.98
8         58.84  60.97  65.06  63.85  66.71  47.99  54.97  49.75
9         64.08  68.71  78.63  84.31  87.94  61.36  67.21  73.96
OA        49.79  53.79  58.24  63.32  64.77  51.67  56.41  60.22
AA        62.45  66.69  70.58  75.97  77.98  55.97  55.86  72.63
κ         40.02  44.36  49.24  55.81  57.51  40.96  45.74  52.12
Time(s)   0.29   2.51   0.63   10.13  19.74  2.6    41.86  80.67

TABLE XVIII
CLASSIFICATION ACCURACIES OBTAINED FOR DIFFERENT CLASSIFICATION ALGORITHMS USING 10 SAMPLES PER CLASS WHEN APPLIED TO THE EMAPS OF THE UNIVERSITY OF PAVIA ROSIS IMAGE.

Class     DT     RSDT   RF     RoF    RoRF   ELM    RSELM  RoELM
1         84.63  87.66  89.04  89.14  90.85  56.65  71.11  94.81
2         82.48  89.59  98.26  90.61  98.59  95.77  99.88  97.42
3         85.92  91.23  94.71  90.21  97.70  88.91  99.43  91.44
4         84.85  85.74  82.26  87.52  87.11  73.24  95.85  95.27
5         84.35  88.55  92.84  92.25  94.62  87.00  94.80  93.18
6         79.48  82.18  87.37  86.81  87.72  81.80  90.79  88.38
7         63.08  67.38  76.10  77.73  76.21  66.61  85.83  89.39
8         80.62  85.15  90.71  87.01  91.18  88.61  95.26  95.75
9         98.66  99.44  99.10  99.21  99.45  99.62  99.95  99.91
OA        81.14  84.24  88.11  88.40  89.30  80.43  91.19  91.93
AA        82.67  86.32  90.04  89.20  91.49  82.02  92.55  93.95
κ         75.93  79.77  84.54  84.95  86.10  74.75  88.54  89.55
Time(s)   0.32   2.47   0.79   15.02  27.56  2.87   58.35  114.35

TABLE XIX
CLASSIFICATION ACCURACIES OBTAINED FROM THE PROPOSED METHODS (ROELM EMAPS AND RSELM EMAPS) IN COMPARISON WITH OTHER SPATIAL-SPECTRAL CLASSIFIERS FOR THE UNIVERSITY OF PAVIA ROSIS IMAGE (10 SAMPLES PER CLASS).

Classifier                 OA     AA     κ
SVM+Clustering [4]         61.83  73.85  57.14
MLRsubMLL [57]             73.68  77.18  66.41
GCK [45]                   89.38  92.22  86.53
Mixed lasso 3D-DWT [34]    87.73  91.14  84.36
RSELM EMAPs                91.19  92.55  88.54
RoELM EMAPs                91.93  93.95  89.55

From Table XIX, it can be seen that our proposed RoELM EMAPs achieves the best classification result. Considering the processing time, the training times of DT and the DT ensembles are significantly reduced with the limited training set. The computational cost of ELM and its ensembles with limited training samples is, however, higher than with the entire training set, because more hidden nodes (δ = 512) are used to obtain better performance.

D. Study of the Effects of Parameter Selection

Fig. 10. Indian Pines AVIRIS image (10% of the labeled samples as training samples): sensitivity to the change of (a) M with spectral information; (b) M with EMAPs; (c) δ of ELM and its ensembles with spectral information; (d) δ of ELM and its ensembles with EMAPs. University of Pavia ROSIS image (entire training set): sensitivity to the change of (e) M with spectral information; (f) M with EMAPs; (g) δ of ELM and its ensembles with spectral information; (h) δ of ELM and its ensembles with EMAPs.

The number of features in a subset (M) is the key parameter of the RS ensembles, while in ELM and its ensembles the number of hidden nodes (δ) plays an important role. The effects of these parameters are depicted in Fig. 10. It can be observed from Fig. 10(a-b, e-f) that there is no clear pattern of dependency between M and the ensemble accuracy: different RS ensemble classifiers attain their highest OA at different values of M. For the University of Pavia ROSIS image, for example, RoF with spectral information gains the highest OAs when M = 10, and RoELM with EMAPs achieves the best classification result when M = 6.
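Before turning to δ, it is useful to recall what this parameter controls. The following is a minimal NumPy sketch of a basic ELM in the spirit of [48], [49]: the input weights and biases of the δ hidden nodes are drawn at random and never trained, and only the output weights are obtained in closed form through the Moore-Penrose pseudo-inverse of the hidden-layer output matrix. A sigmoid activation and one-hot target coding are assumed; this is an illustration, not the exact implementation used in our experiments.

import numpy as np

class SimpleELM:
    """Basic extreme learning machine: random hidden layer + least-squares output."""
    def __init__(self, n_hidden=128, seed=0):
        self.n_hidden = n_hidden          # delta: number of hidden nodes
        self.rng = np.random.RandomState(seed)

    def _hidden(self, X):
        # Sigmoid activations of the randomly weighted hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_features = X.shape[1]
        self.classes_ = np.unique(y)
        # Random input weights and biases are fixed, never trained.
        self.W = self.rng.uniform(-1, 1, (n_features, self.n_hidden))
        self.b = self.rng.uniform(-1, 1, self.n_hidden)
        # One-hot targets, then closed-form least-squares output weights.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]

With this formulation, the training cost grows with δ through the pseudo-inverse of the n × δ hidden-layer matrix, which is consistent with the timing trends reported above.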
Fig. 10(c-d, g-h) show that a large number of hidden nodes may give higher accuracies in testing, but an overly complex network can also overfit the training data; for instance, the generalization performance decreases when the number of hidden nodes is larger than 512. In general, these parameters should be selected empirically for each particular application.

VII. CONCLUSION AND PERSPECTIVE

In this paper, we developed a novel framework that combines Random Subspace ensembles and EMAPs for the spatial-spectral classification of remotely sensed hyperspectral data. Considering the computational cost, we selected two fast learning algorithms, DT and ELM, to build the RS ensembles. Several conclusions can be drawn from our experimental results:
• Although the RS ensembles require more training time than the individual classifiers, their performance is superior to that of the individual classifiers using both spectral and spatial information as input. The computational load of the RF algorithm is very low (less than 1 s for the AVIRIS dataset). In addition, the computation time of the RS ensembles can be further reduced by decreasing the ensemble size.
• In most cases, the rotation-subspace classifiers (RoF, RoRF and RoELM) outperform RSDT, RF and RSELM, because more diversity is introduced in the rotation-subspace ensemble classifiers by the feature extraction and random selection strategies. However, this leads to an increased computational complexity for the rotation-subspace approaches.
• In general, ELM and its ensembles achieve higher accuracies than DT and its ensembles. The computation time of ELM and its ensembles depends on the number of hidden nodes when the ensemble size and the training samples are fixed. The efficiency of the ELM ensembles could be further improved by choosing a smaller ensemble size or using fewer hidden nodes.
• The spectral-spatial classification approaches that feed the EMAPs to the proposed ensembles achieve state-of-the-art performance on the two hyperspectral datasets.
However, Random Subspace ensembles have two main limitations: 1) the number of features in a subset has to be defined in advance, and the optimal value of this parameter depends on the dataset; 2) the computation time is high due to the high dimensionality of the input features. Therefore, our future work is to develop an effective scheme for automatically estimating the number of features in a subset for the RS ensembles and to apply a dimensionality reduction step to both the spectral and the spatial information of the hyperspectral data.

ACKNOWLEDGEMENT

The authors would like to thank Prof. D. Landgrebe from Purdue University, USA, and Prof. P. Gamba for providing the hyperspectral remote sensing images.

REFERENCES

[1] D. A. Landgrebe, “Hyperspectral image data analysis as a high dimensional signal processing problem,” IEEE Signal Processing Magazine, vol. 19, no. 1, pp. 17–28, Jan. 2002.
[2] C. I. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification. Plenum Publishing Co., 2003.
[3] ——, Hyperspectral Data Exploitation: Theory and Applications. Wiley-Interscience, Hoboken, NJ, 2007.
[4] Y. Tarabalka, J. A. Benediktsson, and J. Chanussot, “Spectral-spatial classification of hyperspectral imagery based on partitional clustering techniques,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 8, pp. 2973–2987, 2009.
[5] Y. Tarabalka, J. Chanussot, and J. A. Benediktsson, “Segmentation and classification of hyperspectral images using watershed transformation,” Pattern Recognition, vol. 43, no. 7, pp. 2367–2379, 2010.
[6] J. C. Harsanyi and C. I. Chang, “Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach,” IEEE Trans. Geosci. Remote Sens., vol. 32, no. 4, pp. 779–785, 1994.
[7] L. O. Jimenez, A. Morales-Morell, and A. Creus, “Classification of hyperdimensional data based on feature and decision fusion approaches using projection pursuit, majority voting, and neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 3, pp. 1360–1366, 1999.
[8] C. I. Chang, “An information-theoretic approach to spectral variability, similarity, and discrimination for hyperspectral image analysis,” IEEE Trans. Inform. Theory, vol. 46, no. 5, pp. 1927–1932, Sep. 2000.
[9] G. Camps-Valls and L. Bruzzone, “Kernel-based methods for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 6, pp. 1351–1362, 2005.
[10] T. V. Bandos, L. Bruzzone, and G. Camps-Valls, “Classification of hyperspectral images with regularized linear discriminant analysis,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 3, pp. 862–873, 2009.
[11] A. Villa, J. A. Benediktsson, J. Chanussot, and C. Jutten, “Hyperspectral image classification with independent component discriminant analysis,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 12, pp. 4865–4876, 2011.
[12] V. N. Vapnik, The Nature of Statistical Learning Theory. Springer, 1995.
[13] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, 2004.
[14] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, 2004.
[15] J. A. Benediktsson, J. Chanussot, and M. Fauvel, “Multiple classifier systems in remote sensing: from basics to recent developments,” in Proceedings of the 7th International Workshop on Multiple Classifier Systems, Prague, Czech Republic, May 23-25, 2007, pp. 501–512.
[16] L. Rokach, Pattern Classification Using Ensemble Methods. World Scientific, 2010.
[17] P. Du, J. Xia, W. Zhang, K. Tan, Y. Liu, and S. Liu, “Multiple classifier system for remote sensing image classification: A review,” Sensors, vol. 12, no. 4, pp. 4764–4792, 2012.
[18] M. Wozniak, M. Grana, and E. Corchado, “A survey of multiple classifier systems as hybrid systems,” Information Fusion, vol. 16, no. 1, pp. 3–17, 2014.
[19] T. K. Ho, “The random subspace method for constructing decision forests,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 8, pp. 832–844, Aug. 1998.
[20] L. I. Kuncheva, J. J. Rodríguez, C. O. Plumpton, D. E. Linden, and S. J. Johnston, “Random subspace ensembles for fMRI classification,” IEEE Trans. Med. Imaging, vol. 29, no. 2, pp. 531–542, 2010.
[21] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.
[22] P. Gislason, J. A. Benediktsson, and J. Sveinsson, “Random forests for land cover classification,” Pattern Recogn. Lett., vol. 27, no. 4, pp. 294–300, Mar. 2006.
[23] J. C. Chan and D. Paelinckx, “Evaluation of Random Forest and AdaBoost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery,” Remote Sens. Environ., vol. 112, no. 6, pp. 2999–3011, 2008.
[24] V. F. Rodriguez-Galiano, B. Ghimire, J. Rogan, M. Chica-Olmo, and J. P. Rigol-Sanchez, “An assessment of the effectiveness of a random forest classifier for land-cover classification,” ISPRS J. Photogramm., vol. 67, no. 1, pp. 93–104, Jan. 2012.
[25] B. Waske, S. van der Linden, J. A. Benediktsson, A. Rabe, and P. Hostert, “Sensitivity of support vector machines to random feature selection in classification of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 7, pp. 2880–2889, 2010.
[26] J. J. Rodriguez and L. I. Kuncheva, “Rotation forest: A new classifier ensemble method,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 10, pp. 1619–1630, 2006.
[27] J. Xia, P. Du, X. He, and J. Chanussot, “Hyperspectral remote sensing image classification based on rotation forest,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 1, pp. 239–243, 2014.
[28] J. Xia, J. Chanussot, P. Du, and X. He, “Spectral-spatial classification for hyperspectral data using rotation forests with local feature extraction and Markov random fields,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 5, pp. 2532–2546, 2015.
[29] M. Fauvel, Y. Tarabalka, J. A. Benediktsson, J. Chanussot, and J. C. Tilton, “Advances in spectral-spatial classification of hyperspectral images,” Proceedings of the IEEE, vol. 101, no. 3, pp. 652–675, 2013.
[30] M. Pesaresi and J. A. Benediktsson, “A new approach for the morphological segmentation of high-resolution satellite imagery,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 2, pp. 309–320, 2001.
[31] J. A. Benediktsson, M. Pesaresi, and K. Arnason, “Classification and feature extraction for remote sensing images from urban areas based on morphological transformations,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 9, pp. 1940–1949, 2003.
[32] T. C. Bau, S. Sarkar, and G. Healey, “Hyperspectral region classification using a three-dimensional Gabor filterbank,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 9, pp. 3457–3464, 2010.
[33] F. Tsai and J. Lai, “Feature extraction of hyperspectral image cubes using three-dimensional gray-level cooccurrence,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 6-2, pp. 3504–3513, 2013.
[34] Y. Qian, M. Ye, and J. Zhou, “Hyperspectral image classification based on structured sparse logistic regression and three-dimensional wavelet texture features,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 4-2, pp. 2276–2291, 2013.
[35] J. Serra, Image Analysis and Mathematical Morphology. Academic Press, 1982.
[36] J. A. Benediktsson, J. A. Palmason, and J. R. Sveinsson, “Classification of hyperspectral data from urban areas based on extended morphological profiles,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 480–491, 2005.
[37] M. Fauvel, J. A. Benediktsson, J. Chanussot, and J. R. Sveinsson, “Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 11, pp. 3804–3814, 2008.
[38] M. Dalla Mura, J. A. Benediktsson, B. Waske, and L. Bruzzone, “Morphological attribute profiles for the analysis of very high resolution images,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 10, pp. 3747–3762, 2010.
[39] ——, “Extended profiles with morphological attribute filters for the analysis of hyperspectral data,” Int. J. Remote Sens., vol. 31, no. 22, pp. 5975–5991, Jul. 2010.
[40] M. Dalla Mura, A. Villa, J. A. Benediktsson, J. Chanussot, and L. Bruzzone, “Classification of hyperspectral images by using extended morphological attribute profiles and independent component analysis,” IEEE Geosci. Remote Sens. Lett., vol. 8, no. 3, pp. 542–546, 2011.
[41] P. Reddy Marpu, M. Pedergnana, M. Dalla Mura, S. Peeters, J. A. Benediktsson, and L. Bruzzone, “Classification of hyperspectral data using extended attribute profiles based on supervised and unsupervised feature extraction techniques,” International Journal of Image and Data Fusion, vol. 3, no. 3, pp. 269–298, 2012.
[42] M. Pedergnana, P. Reddy Marpu, M. Dalla Mura, J. A. Benediktsson, and L. Bruzzone, “Classification of remote sensing optical and LiDAR data using extended attribute profiles,” IEEE J. Sel. Topics Signal Process., vol. 6, no. 7, pp. 856–865, 2012.
[43] ——, “A novel technique for optimal feature selection in attribute profiles based on genetic algorithms,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 6-2, pp. 3514–3528, 2013.
[44] N. Falco, M. Dalla Mura, F. Bovolo, J. A. Benediktsson, and L. Bruzzone, “Change detection in VHR images based on morphological attribute profiles,” IEEE Geosci. Remote Sens. Lett., vol. 10, no. 3, pp. 636–640, 2013.
[45] J. Li, P. Reddy Marpu, A. Plaza, J. M. Bioucas-Dias, and J. A. Benediktsson, “Generalized composite kernel framework for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 9, pp. 4816–4829, 2013.
[46] S. Bernabe, P. Reddy Marpu, A. Plaza, M. Dalla Mura, and J. A. Benediktsson, “Spectral-spatial classification of multispectral images using kernel feature space representation,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 1, pp. 288–292, 2014.
[47] B. Song, J. Li, M. Dalla Mura, P. Li, A. Plaza, J. M. Bioucas-Dias, J. A. Benediktsson, and J. Chanussot, “Remotely sensed image classification using sparse representations of morphological attribute profiles,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 8, pp. 5122–5136, 2014.
[48] G. B. Huang, Q. Y. Zhu, and C. K. Siew, “Extreme learning machine: a new learning scheme of feedforward neural networks,” in Proceedings of the 2004 IEEE International Joint Conference on Neural Networks, vol. 2, Budapest, Hungary, 2004, pp. 985–990.
[49] ——, “Extreme learning machine: theory and applications,” Neurocomputing, vol. 70, no. 1-3, pp. 489–501, 2006.
[50] G. Stiglic, J. J. Rodriguez, and P. Kokol, “Rotation of random forests for genomic and proteomic classification problems,” Software Tools and Algorithms for Biological Systems, vol. 696, pp. 211–221, 2011.
[51] J. R. Quinlan, “Induction of decision trees,” Mach. Learn., vol. 1, no. 1, pp. 81–106, Mar. 1986.
[52] R. Narayanan, D. Honbo, G. Memik, A. Choudhary, and J. Zambreno, “Interactive presentation: An FPGA implementation of decision tree classification,” in Proceedings of the Conference on Design, Automation and Test in Europe, San Jose, CA, USA, 2007, pp. 189–194.
[53] L. Rokach and O. Maimon, Data Mining with Decision Trees: Theory and Applications. World Scientific Publishing Co., Inc., 2008.
[54] L. Breiman, “Bagging predictors,” Mach. Learn., vol. 24, no. 2, pp. 123–140, Aug. 1996.
[55] G. B. Huang and L. Chen, “Convex incremental extreme learning machine,” Neurocomputing, vol. 70, no. 16-18, pp. 3056–3062, 2007.
[56] ——, “Enhanced random search based incremental extreme learning machine,” Neurocomputing, vol. 71, no. 16-18, pp. 3460–3468, 2008.
[57] J. Li, J. M. Bioucas-Dias, and A. Plaza, “Spectral-spatial hyperspectral image segmentation using subspace multinomial logistic regression and Markov random fields,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 3, pp. 809–823, 2012.
[58] L. I. Kuncheva and C. J. Whitaker, “Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy,” Machine Learning, vol. 51, no. 2, pp. 181–207, 2003.
[59] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.
[60] J. Xia, J. Chanussot, P. Du, and X. He, “Rotation-based ensemble classifiers for high-dimensional data,” in Fusion in Computer Vision, B. Ionescu, J. Benois-Pineau, T. Piatrik, and G. Quénot, Eds. Springer, 2014, pp. 135–160.

Junshi Xia (S'11) received the B.S. degree in geographic information systems and the Ph.D. degree in photogrammetry and remote sensing from the China University of Mining and Technology, Xuzhou, China, in 2008 and 2013, respectively. He obtained in 2014 a Ph.D. degree in image processing from the Grenoble Images Speech Signals and Automatics Laboratory, Grenoble Institute of Technology, Grenoble, France. He is currently a Research Fellow at the Department of Geographic Information Sciences, Nanjing University. His research interests include multiple classifier systems in remote sensing, hyperspectral remote sensing image processing, and urban remote sensing.

Mauro Dalla Mura (S'08-M'11) received the laurea (B.E.) and laurea specialistica (M.E.) degrees in Telecommunication Engineering from the University of Trento, Italy, in 2005 and 2007, respectively. He obtained in 2011 a joint Ph.D. degree in Information and Communication Technologies (Telecommunications Area) from the University of Trento, Italy, and in Electrical and Computer Engineering from the University of Iceland, Iceland. In 2011 he was a Research Fellow at Fondazione Bruno Kessler, Trento, Italy, conducting research on computer vision. He is currently an Assistant Professor at the Grenoble Institute of Technology (Grenoble INP), France. He is conducting his research at the Grenoble Images Speech Signals and Automatics Laboratory (GIPSA-Lab). His main research activities are in the fields of remote sensing, image processing and pattern recognition. In particular, his interests include mathematical morphology, classification and multivariate data analysis. Dr. Dalla Mura was the recipient of the IEEE GRSS Second Prize in the Student Paper Competition of the 2011 IEEE International Geoscience and Remote Sensing Symposium (Vancouver, CA, July 2011). He is a Reviewer of IEEE Transactions on Geoscience and Remote Sensing, IEEE Geoscience and Remote Sensing Letters, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, IEEE Journal of Selected Topics in Signal Processing, Pattern Recognition Letters, ISPRS Journal of Photogrammetry and Remote Sensing, and Photogrammetric Engineering and Remote Sensing (PE&RS). He is a member of the Geoscience and Remote Sensing Society (GRSS) and the IEEE GRSS Data Fusion Technical Committee (DFTC), and Secretary of the IEEE GRSS French Chapter (2013-2016). He was a lecturer at the RSSS12 - Remote Sensing Summer School 2012 (organized by the IEEE GRSS), Munich, Germany.

Jocelyn Chanussot (M'04-SM'04-F'12) received the M.Sc. degree in electrical engineering from the Grenoble Institute of Technology (Grenoble INP), Grenoble, France, in 1995, and the Ph.D. degree from Savoie University, Annecy, France, in 1998.
In 1999, he was with the Geography Imagery Perception Laboratory for the Delegation Generale de l'Armement (DGA - French National Defense Department). Since 1999, he has been with Grenoble INP, where he was an Assistant Professor from 1999 to 2005, an Associate Professor from 2005 to 2007, and is currently a Professor of signal and image processing. He is conducting his research at the Grenoble Images Speech Signals and Automatics Laboratory (GIPSA-Lab). His research interests include image analysis, multicomponent image processing, nonlinear filtering, and data fusion in remote sensing. He is a member of the Institut Universitaire de France (2012-2017). Since 2013, he has been an Adjunct Professor at the University of Iceland. Dr. Chanussot is the founding President of the IEEE Geoscience and Remote Sensing French chapter (2007-2010), which received the 2010 IEEE GRS-S Chapter Excellence Award. He was the co-recipient of the NORSIG 2006 Best Student Paper Award, the IEEE GRSS 2011 Symposium Best Paper Award, the IEEE GRSS 2012 Transactions Prize Paper Award and the IEEE GRSS 2013 Highest Impact Paper Award. He was a member of the IEEE Geoscience and Remote Sensing Society AdCom (2009-2010), in charge of membership development. He was the General Chair of the first IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS). He was the Chair (2009-2011) and Co-chair (2005-2008) of the GRSS Data Fusion Technical Committee. He was a member of the Machine Learning for Signal Processing Technical Committee of the IEEE Signal Processing Society (2006-2008) and the Program Chair of the IEEE International Workshop on Machine Learning for Signal Processing (2009). He was an Associate Editor for the IEEE Geoscience and Remote Sensing Letters (2005-2007) and for Pattern Recognition (2006-2008). Since 2007, he has been an Associate Editor for the IEEE Transactions on Geoscience and Remote Sensing. Since 2011, he has been the Editor-in-Chief of the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. In 2013, he was a Guest Editor for the Proceedings of the IEEE, and in 2014 a Guest Editor for the IEEE Signal Processing Magazine. He is a Fellow of the IEEE.

Peijun Du (M'07-SM'12) is a Professor of Remote Sensing at the Department of Geographic Information Sciences, Nanjing University, and the deputy director of the Key Laboratory for Satellite Mapping Technology and Applications of the National Administration of Surveying, Mapping and Geoinformation (NASG), China. After receiving his Ph.D. degree from the China University of Mining and Technology in 2001, he was employed by the same university until he joined Nanjing University in 2011. He was a postdoctoral fellow at Shanghai JiaoTong University from February 2002 to March 2004, and was a senior visiting scholar at the University of Nottingham and the GIPSA-Lab, Grenoble Institute of Technology, France. His research interests focus on remote sensing image processing and pattern recognition, hyperspectral remote sensing, and applications of geospatial information technologies. He has published more than 40 articles in international peer-reviewed journals, and more than 100 papers in international conferences and Chinese journals. Dr. Du has been an Associate Editor of IEEE Geoscience and Remote Sensing Letters (GRSL) since 2009.
He was the Guest Editor of three special issues of the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS). He also served as the Co-chair of the Technical Committee of URBAN 2009, EORSA 2014 and IAPR-PRRS 2012, the Co-chair of the Local Organizing Committee of JURSE 2009, WHISPERS 2012 and EORSA 2012, and as a member of the Scientific or Technical Committee of other international conferences, including Spatial Accuracy 2008, ACRS 2009, WHISPERS (2010-2014), URBAN (2011, 2013 and 2015), MultiTemp (2011, 2013 and 2015), ISDIF 2011, and the SPIE European Conference on Image and Signal Processing for Remote Sensing (2012-2014).

Xiyan He received in 2006 the Generalist Engineer degree from Ecole Centrale Paris, France, and the M.E. degree in Pattern Recognition and Intelligent Systems from Xi'an Jiaotong University, China. She received her Ph.D. degree in Computer Science in 2009 from the University of Technology of Troyes, France. Dr. He was a teaching assistant at the University of Technology of Troyes in 2009, a post-doctoral research fellow at the Research Centre for Automatic Control of Nancy in 2010, and a teaching assistant at Université Pierre-Mendès-France, Grenoble, in 2011. Since 2012, she has been a post-doctoral research fellow at the Grenoble Laboratory of Image, Speech, Signal and Automatics. Her main research interests include machine learning, pattern recognition and data fusion, with a special focus on applications to remotely sensed images.