
 
 

State-of-the-Art Remote Sensing Image Scene Classification

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "Remote Sensing Image Processing".

Deadline for manuscript submissions: closed (31 July 2022) | Viewed by 58152

Special Issue Editors


Guest Editors

  • Key Laboratory of Spectral Imaging Technology CAS, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China
    Interests: remote sensing scene classification; cross-domain scene classification
  • School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
    Interests: remote sensing classification; feature extraction; deep learning; sparse representation; graph learning
  • School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, 127 West Youyi Road, Beilin District, P.O. Box 64, Xi'an 710072, China
    Interests: remote sensing; image analysis; computer vision; pattern recognition; machine learning

Special Issue Information

Dear Colleagues,

As a necessary preliminary step, remote sensing scene classification assigns a specific semantic label to each image, which is very helpful for geological surveys, urban planning, and other fields. Many machine learning techniques have been developed to identify remote sensing scenes, such as logistic regression, neural networks, feature learning, and support vector machines. Although this research area has attracted much attention and achieved remarkable performance, most methods rest on the assumption that the training set and test set are drawn from the same distribution. In real-world applications, this assumption is frequently violated, since remote sensing scenes may be captured by different sensors and over diverse locations on the ground surface. Changes in sensor type, viewing angle, and illumination conditions can cause large distribution differences across remote sensing images. This Special Issue of Remote Sensing is therefore timely for promoting innovation and improvement in remote sensing scene classification using cross-domain and multi-source data.

This Special Issue focuses on advances in remote sensing scene classification using cross-domain data, multi-source data, and multi-modal data. Topics of interest include, but are not limited to:

  • Cross-domain remote sensing scene classification/cross-scene classification;
  • Multi-source remote sensing data classification;
  • Few-shot image classification;
  • Multiple-scene or multi-task classification;
  • Knowledge distillation and collaborative learning in remote sensing;
  • Generalizable/domain-invariant/transferable models for scene classification;
  • Feature learning for cross-domain/multi-modal/multi-temporal image analysis;
  • Applications/surveys/benchmarks in remote sensing scene classification.

Dr. Xiangtao Zheng
Dr. Fulin Luo
Prof. Dr. Qi Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Remote sensing scene classification
  • Multi-modal image analysis
  • Cross-modal image interpretation
  • Self-supervised/weakly supervised/unsupervised learning
  • Domain adaptation/transfer learning
  • Open set domain adaptation
  • Zero-shot/few-shot learning
  • Multi-task learning
  • Pattern recognition
  • Deep neural networks

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (16 papers)


Research


20 pages, 11529 KiB  
Article
Multi-Field Context Fusion Network for Semantic Segmentation of High-Spatial-Resolution Remote Sensing Images
by Xinran Du, Shumeng He, Houqun Yang and Chunxiao Wang
Remote Sens. 2022, 14(22), 5830; https://doi.org/10.3390/rs14225830 - 17 Nov 2022
Cited by 5 | Viewed by 2291
Abstract
High spatial resolution (HSR) remote sensing images have a wide range of application prospects in the fields of urban planning, agricultural planning and military training. Therefore, the research on the semantic segmentation of remote sensing images becomes extremely important. However, large data volume and the complex background of HSR remote sensing images put great pressure on the algorithm efficiency. Although the pressure on the GPU can be relieved by down-sampling the image or cropping it into small patches for separate processing, the loss of local details or global contextual information can lead to limited segmentation accuracy. In this study, we propose a multi-field context fusion network (MCFNet), which can preserve both global and local information efficiently. The method consists of three modules: a backbone network, a patch selection module (PSM), and a multi-field context fusion module (FM). Specifically, we propose a confidence-based local selection criterion in the PSM, which adaptively selects local locations in the image that are poorly segmented. Subsequently, the FM dynamically aggregates the semantic information of multiple visual fields centered on that local location to enhance the segmentation of these local locations. Since MCFNet only performs segmentation enhancement on local locations in an image, it can improve segmentation accuracy without consuming excessive GPU memory. We implement our method on two high spatial resolution remote sensing image datasets, DeepGlobe and Potsdam, and compare the proposed method with state-of-the-art methods. The results show that the MCFNet method achieves the best balance in terms of segmentation accuracy, memory efficiency, and inference speed. Full article
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)
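The confidence-based selection idea behind the PSM can be sketched in a few lines. The snippet below is not the authors' code; it is a minimal NumPy illustration that flags patches whose mean top-class confidence in the global (down-sampled) prediction falls below a threshold, so that only those locations are refined locally. The patch size and threshold are hypothetical.

```python
import numpy as np

def select_low_confidence_patches(probs, patch=64, thresh=0.8):
    """Flag patches whose mean top-class confidence is below `thresh`.

    probs: (C, H, W) softmax output of the global (down-sampled) branch.
    Returns a list of (row, col) upper-left corners of flagged patches.
    """
    conf = probs.max(axis=0)                     # per-pixel top-class confidence
    flagged = []
    H, W = conf.shape
    for r in range(0, H - patch + 1, patch):
        for c in range(0, W - patch + 1, patch):
            if conf[r:r + patch, c:c + patch].mean() < thresh:
                flagged.append((r, c))           # poorly segmented -> refine locally
    return flagged

# Toy usage: random "probabilities" for a 4-class, 256x256 global prediction.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 256, 256))
probs = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
print(len(select_low_confidence_patches(probs)))
```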
Figures

  • Figure 1. Results of semantic segmentation using patch processing and down-sampling processing. (a) Input image; (b) labeled image, where the area circled in red contains two highly confusing categories, Agriculture and Rangeland; (c,d) the two processing methods, respectively. Down-sampling loses fine details, while patch processing misclassifies local patches due to the lack of global context.
  • Figure 2. Overview of the proposed MCFNet. The global and local branches take the down-sampled and cropped images, respectively. The PSM identifies poorly segmented local patches within the global segmentation, and the FM performs segmentation enhancement on those patches. The final segmentation is created by aggregating the global and local branch feature maps.
  • Figure 3. (a) Relationship between the score of a local patch and its segmentation accuracy, where accuracy is the percentage of correctly predicted samples; (b) relative score of a local patch and its accuracy, where the relative score is defined as μ_global − μ_local; (c) the PSM, which selects the local patches that require reinforcement based on the global image's soft classification output.
  • Figure 4. The multi-field context fusion module. The FM adaptively fuses multi-field semantics for the local patches selected by the PSM.
  • Figure 5. Interaction of the PSM and FM.
  • Figure 6. The DeepGlobe and Potsdam datasets. The areas framed in red are difficult to segment.
  • Figure 7. Visual comparison of semantic segmentation results on DeepGlobe.
  • Figure 8. Visualization of semantic segmentation results for the ambiguous categories Agriculture and Rangeland.
  • Figure 9. Segmentation output for the Potsdam dataset.
  • Figure 10. Local details of the Potsdam image produced by different methods.
  • Figure 11. Local details of the DeepGlobe image produced by different methods.
19 pages, 14157 KiB  
Article
An Efficient Feature Extraction Network for Unsupervised Hyperspectral Change Detection
by Hongyu Zhao, Kaiyuan Feng, Yue Wu and Maoguo Gong
Remote Sens. 2022, 14(18), 4646; https://doi.org/10.3390/rs14184646 - 17 Sep 2022
Cited by 11 | Viewed by 2367
Abstract
Change detection (CD) in hyperspectral images has become a research hotspot in the field of remote sensing due to the extremely wide spectral range of hyperspectral images compared to traditional remote sensing images. It is challenging to effectively extract features from redundant high-dimensional data for hyperspectral change detection tasks due to the fact that hyperspectral data contain abundant spectral information. In this paper, a novel feature extraction network is proposed, which uses a Recurrent Neural Network (RNN) to mine the spectral information of the input image and combines this with a Convolutional Neural Network (CNN) to fuse the spatial information of hyperspectral data. Finally, the feature extraction structure of hybrid RNN and CNN is used as a building block to complete the change detection task. In addition, we use an unsupervised sample generation strategy to produce high-quality samples for network training. The experimental results demonstrate that the proposed method yields reliable detection results. Moreover, the proposed method has fewer noise regions than the pixel-based method. Full article
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)
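The hybrid spectral-RNN/spatial-CNN building block can be illustrated with a small PyTorch sketch. This is not the paper's network; it only shows the general idea of running a GRU over the spectrum of the centre pixel while a CNN summarizes the surrounding spatial patch, then classifying the fused feature as changed or unchanged. All layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class HybridRNNCNN(nn.Module):
    """Toy spectral-RNN + spatial-CNN feature extractor for change detection.

    Input: bi-temporal difference patches of shape (B, bands, 5, 5).
    """
    def __init__(self, bands=154, hidden=64):
        super().__init__()
        # RNN branch: treat the spectrum of the centre pixel as a sequence.
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        # CNN branch: fuse the spatial context around the pixel.
        self.cnn = nn.Sequential(
            nn.Conv2d(bands, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(nn.Linear(hidden + 32, 64), nn.ReLU(),
                                  nn.Linear(64, 2))       # changed / unchanged

    def forward(self, x):
        b, bands, h, w = x.shape
        centre = x[:, :, h // 2, w // 2].unsqueeze(-1)      # (B, bands, 1)
        _, h_n = self.gru(centre)                           # spectral feature
        spectral = h_n.squeeze(0)
        spatial = self.cnn(x).flatten(1)                    # spatial feature
        return self.head(torch.cat([spectral, spatial], dim=1))

model = HybridRNNCNN()
print(model(torch.randn(8, 154, 5, 5)).shape)  # torch.Size([8, 2])
```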
Figures

  • Graphical abstract.
  • Figure 1. Architecture of the proposed method, which contains three main parts: the convolutional neural network part, the recurrent neural network part, and the final fully connected layer.
  • Figure 2. The Hermiston City area dataset. (a) Image acquired in 2004; (b) image acquired in 2007; (c) reference map.
  • Figure 3. The Bay Area dataset. (a) Image acquired in 2013; (b) image acquired in 2015; (c) reference map.
  • Figure 4. The Santa Barbara dataset. (a) Image acquired in 2013; (b) image acquired in 2014; (c) reference map.
  • Figure 5. Comparison results of different algorithms on the Hermiston City Area dataset: (a) reference map; (b) CVA; (c) DNN; (d) CNN; (e) GETNET; (f) our method.
  • Figure 6. Comparison results of different algorithms on the Bay Area dataset, with the same layout as Figure 5.
  • Figure 7. Comparison results of different algorithms on the Santa Barbara dataset, with the same layout as Figure 5.
  • Figure 8. Visualization of detection results (a) without and (b) with the RNN.
  • Figure 9. Sample distribution on (a) the Bay Area dataset and (b) the Santa Barbara dataset.
  • Figure 10. Influence of the parameter λ on (a) the Bay Area dataset and (b) the Santa Barbara dataset.
  • Figure 11. Network training accuracy and loss curves for (a) the Bay Area and (b) the Santa Barbara datasets.
  • Figure 12. Metrics comparison of different algorithms on the three datasets: (a) Hermiston City Area; (b) Bay Area; (c) Santa Barbara.
17 pages, 4037 KiB  
Article
Power Line Extraction Framework Based on Edge Structure and Scene Constraints
by Kuansheng Zou and Zhenbang Jiang
Remote Sens. 2022, 14(18), 4575; https://doi.org/10.3390/rs14184575 - 13 Sep 2022
Cited by 6 | Viewed by 2016
Abstract
Power system maintenance is an important guarantee for the stable operation of the power system. Autonomous power line inspection based on Unmanned Aerial Vehicles (UAVs) provides convenience for maintaining power systems, and Power Line Extraction (PLE) is one of the key issues that must be solved first for autonomous inspection. However, most existing PLE methods extract spurious small edge lines from scene images that contain no power lines, which prevents them from being applied well in practice. To solve this problem, a PLE method based on edge structure and scene constraints is proposed in this paper. Power Line Scene Recognition (PLSR) is used as an auxiliary task for the PLE, and scene constraints are set first. Based on the characteristics of power line images, the shallow feature map of the fourth layer of the encoding stage is transmitted to the middle three layers of the decoding stage, providing structured, detailed edge features for upsampling and helping to restore power line edges more finely. Experimental results show that the proposed method has good performance, robustness, and generalization in multiple scenes with complex backgrounds. Full article
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)
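One way to read the scene constraint is that a power-line scene-recognition (PLSR) score gates the extraction output, so non-power-line scenes return an empty mask and no spurious edge lines survive. The sketch below illustrates that gating with toy stand-in networks; it is not the paper's architecture, and the gating threshold is hypothetical.

```python
import torch
import torch.nn as nn

class SceneGatedPLE(nn.Module):
    """Toy wrapper: a scene classifier gates the power-line segmentation output."""
    def __init__(self, segmenter: nn.Module, scene_head: nn.Module, thresh=0.5):
        super().__init__()
        self.segmenter = segmenter        # any encoder-decoder returning a 1-channel mask logit
        self.scene_head = scene_head      # returns a single "contains power line" logit
        self.thresh = thresh

    def forward(self, img):
        mask_logit = self.segmenter(img)
        p_scene = torch.sigmoid(self.scene_head(img))             # (B, 1)
        gate = (p_scene > self.thresh).float().view(-1, 1, 1, 1)  # scene constraint
        return torch.sigmoid(mask_logit) * gate                   # empty mask if no power line

# Tiny stand-ins so the sketch runs end to end.
seg = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 1))
cls = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 1))
model = SceneGatedPLE(seg, cls)
print(model(torch.randn(2, 3, 128, 128)).shape)  # torch.Size([2, 1, 128, 128])
```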
Figures

  • Figure 1. Results of traditional deep-learning-based PLE in practice. (a) Aerial images; (b) ground truth; (c) extraction results.
  • Figure 2. Architecture of the proposed PLE model. Each colored box corresponds to a multi-channel feature map; the number of channels is denoted on top of the box and the x-y size at the bottom. Boxes without numbers have the same channels and size as the same-colored box. The colored arrows and lines denote the different operations.
  • Figure 3. PLE results on datasets containing power lines. The more similar the extracted images (c–j) are to the ground truth (b), the better the performance of the method.
  • Figure 4. PLE results on datasets without power lines. The other methods falsely detect edge lines to varying degrees, whereas the non-power-line scenarios are correctly recognized by the proposed method.
  • Figure 5. Comparison experiment results of power line extraction.
  • Figure 6. Test results in a foggy environment. The first four columns show power line scenes and the last shows a non-power-line scene. (a) Power line images in a foggy environment; (b) ground truth; (c) predicted results of the proposed method.
  • Figure 7. Test results in a strong-light environment, with the same layout as Figure 6.
  • Figure 8. Test results in a snowfall environment, with the same layout as Figure 6.
  • Figure 9. Test results in a motion-blur environment, with the same layout as Figure 6.
  • Figure 10. Generalization test results using the proposed PLE method. The first four columns contain power line scenes and the last two do not. (a) Images used for the generalization test; (b) ground truth; (c) results of the proposed PLE model.
21 pages, 2503 KiB  
Article
Semi-Supervised DEGAN for Optical High-Resolution Remote Sensing Image Scene Classification
by Jia Li, Yujia Liao, Junjie Zhang, Dan Zeng and Xiaoliang Qian
Remote Sens. 2022, 14(17), 4418; https://doi.org/10.3390/rs14174418 - 5 Sep 2022
Cited by 9 | Viewed by 2422
Abstract
Semi-supervised methods have made remarkable achievements via utilizing unlabeled samples for optical high-resolution remote sensing scene classification. However, the labeled data cannot be effectively combined with unlabeled data in the existing semi-supervised methods during model training. To address this issue, we present a semi-supervised optical high-resolution remote sensing scene classification method based on Diversity Enhanced Generative Adversarial Network (DEGAN), in which the supervised and unsupervised stages are deeply combined in the DEGAN training. Based on the unsupervised characteristic of the Generative Adversarial Network (GAN), a large number of unlabeled and labeled images are jointly employed to guide the generator to obtain a complete and accurate probability density space of fake images. The Diversity Enhanced Network (DEN) is designed to increase the diversity of generated images based on massive unlabeled data. Therefore, the discriminator is promoted to provide discriminative features by enhancing the generator given the game relationship between two models in DEGAN. Moreover, the conditional entropy is adopted to make full use of the information of unlabeled data during the discriminator training. Finally, the features extracted from the discriminator and VGGNet-16 are employed for scene classification. Experimental results on three large datasets demonstrate that the proposed scene classification method yields a superior classification performance compared with other semi-supervised methods. Full article
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)
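The way labelled, unlabelled, and generated images can be combined in one discriminator objective, including a conditional-entropy term on unlabelled data, is sketched below. This follows the common K+1-class semi-supervised GAN formulation rather than DEGAN's exact loss, and the entropy weight is hypothetical.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(logits_lab, labels, logits_unlab, logits_fake, ent_weight=0.1):
    """Toy semi-supervised GAN discriminator loss (K real classes + 1 "fake" class).

    logits_*: (B, K+1) class scores; the last index is the fake class.
    """
    fake_idx = logits_lab.shape[1] - 1
    # Supervised cross-entropy on labelled real images.
    loss_sup = F.cross_entropy(logits_lab, labels)
    # Unsupervised adversarial terms: unlabelled images should not be "fake",
    # generated images should be classified as "fake".
    p_fake_unlab = F.softmax(logits_unlab, dim=1)[:, fake_idx]
    p_fake_gen = F.softmax(logits_fake, dim=1)[:, fake_idx]
    loss_unsup = -(torch.log(1 - p_fake_unlab + 1e-8).mean()
                   + torch.log(p_fake_gen + 1e-8).mean())
    # Conditional entropy: encourage confident class predictions on unlabelled data.
    p_real_classes = F.softmax(logits_unlab[:, :fake_idx], dim=1)
    loss_ent = -(p_real_classes * torch.log(p_real_classes + 1e-8)).sum(dim=1).mean()
    return loss_sup + loss_unsup + ent_weight * loss_ent

# Toy usage with 10 scene classes.
B, K = 16, 10
loss = discriminator_loss(torch.randn(B, K + 1), torch.randint(0, K, (B,)),
                          torch.randn(B, K + 1), torch.randn(B, K + 1))
print(float(loss))
```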
Figures

  • Graphical abstract.
  • Figure 1. The framework of DEGAN.
  • Figure 2. Flowchart of the proposed scene classification based on DEGAN.
  • Figure 3. Architecture of the FGN.
  • Figure 4. Architecture of the DEN.
  • Figure 5. Architecture of the discriminator.
  • Figure 6. Example images from the three optical high-resolution remote sensing scene classification datasets (UC Merced, AID, and NWPU-RESISC45, displayed from top to bottom): (a) baseball court; (b) beach; (c) storage tank; (d) forest; (e) harbor; (f) river; (g) parking; (h) sparse residential; (i) medium residential; (j) dense residential.
  • Figure 7. Confusion matrix of the proposed method on the UC Merced dataset under a training ratio of 10%.
  • Figure 8. Confusion matrix of the proposed method on the AID dataset under a training ratio of 10%.
  • Figure 9. Confusion matrix of the proposed method on the NWPU-RESISC45 dataset under a training ratio of 10%.
23 pages, 6282 KiB  
Article
Hyperspectral Band Selection via Band Grouping and Adaptive Multi-Graph Constraint
by Mengbo You, Xiancheng Meng, Yishu Wang, Hongyuan Jin, Chunting Zhai and Aihong Yuan
Remote Sens. 2022, 14(17), 4379; https://doi.org/10.3390/rs14174379 - 3 Sep 2022
Cited by 4 | Viewed by 2554
Abstract
Unsupervised band selection has gained increasing attention recently since massive unlabeled high-dimensional data often need to be processed in the domains of machine learning and data mining. This paper presents a novel unsupervised HSI band selection method via band grouping and adaptive multi-graph constraint. A band grouping strategy that assigns each group different weights to construct a global similarity matrix is applied to address the problem of overlooking strong correlations among adjacent bands. Different from previous studies that are limited to fixed graph constraints, we adjust the weight of the local similarity matrix dynamically to construct a global similarity matrix. By partitioning the HSI cube into several groups, the model is built with a combination of significance ranking and band selection. After establishing the model, we addressed the optimization problem by an iterative algorithm, which updates the global similarity matrix, its corresponding reconstruction weights matrix, the projection, and the pseudo-label matrix to ameliorate each of them synergistically. Extensive experimental results indicate our method outperforms the other five state-of-the-art band selection methods in the publicly available datasets. Full article
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)
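The band-grouping idea of building a global similarity matrix as a weighted sum of per-group local similarity matrices can be sketched as follows. This is a simplified NumPy illustration, not the BAMGC optimization: the group weights are fixed here, whereas the paper adjusts them adaptively during the iterative updates.

```python
import numpy as np

def grouped_global_similarity(X, n_groups=4, sigma=1.0, weights=None):
    """Toy construction of a global similarity matrix from band groups.

    X: (n_pixels, n_bands) HSI pixels. Bands are split into contiguous groups;
    an RBF similarity over pixels is computed per group, and the weighted sum
    of these local similarities forms the global similarity matrix.
    """
    groups = np.array_split(np.arange(X.shape[1]), n_groups)
    if weights is None:
        weights = np.full(n_groups, 1.0 / n_groups)   # BAMGC learns these adaptively
    S = np.zeros((X.shape[0], X.shape[0]))
    for w, idx in zip(weights, groups):
        Xg = X[:, idx]
        d2 = ((Xg[:, None, :] - Xg[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
        S += w * np.exp(-d2 / (2 * sigma ** 2))                 # local similarity of this group
    return S

rng = np.random.default_rng(0)
pixels = rng.normal(size=(50, 200))             # 50 pixels, 200 spectral bands
print(grouped_global_similarity(pixels).shape)  # (50, 50)
```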
Figures

  • Figure 1. Workflow of the band-grouping idea: the global similarity matrix is reconstructed from the local similarity matrices.
  • Figure 2. Comparison of OA and κ produced by SVM and KNN on the Pavia University dataset.
  • Figure 3. Visualization of classification on the Pavia University dataset: (a) ground truth; (b) TRC-OC-FDPC; (c) UBS; (d) PCAS; (e) ONR; (f) NC-OC-MVPCA; (g) NC-OC-IE; (h) BAMGC.
  • Figure 4. Comparison of OA and κ produced by SVM and KNN on the Indian Pines dataset.
  • Figure 5. Visualization of classification on the Indian Pines dataset: (a) ground truth; (b) PCAS; (c) PCA; (d) ONR; (e) NC-OC-MVPA; (f) NC-OC-IE; (g) LvaHAI; (h) BAMGC.
  • Figure 6. Comparison of OA and κ produced by SVM and KNN on the Salinas dataset.
  • Figure 7. Visualization of classification on the Salinas dataset: (a) ground truth; (b) PCA; (c) NC-OC-IE; (d) NC-OC-MVPCA; (e) PCAS; (f) ONR; (g) TRC-OC-FDPC; (h) BAMGC.
  • Figure 8. Comparison of OA and κ produced by SVM and KNN on the Botswana dataset.
  • Figure 9. Visualization of classification on the Botswana dataset: (a) ground truth; (b) UBS; (c) NC-OC-MVPCA; (d) SOR-SRL; (e) ONR; (f) LvaHAI; (g) SORSRL; (h) BAMGC.
  • Figure 10. Comparison of OA and κ produced by SVM and KNN on the University of Houston dataset.
  • Figure 11. Visualization of classification on the University of Houston dataset: (a) ground truth; (b) UBS; (c) ONR; (d) TRC-OC-FDPC; (e) PCA; (f) NC-OC-IE; (g) NC-OC-MVPCA; (h) TRC-OC-FDPC.
  • Figures 12–15. Sensitivity of the hyperparameters α, β, the number of groups, and σ on the five datasets: (a) Pavia University; (b) Indian Pines; (c) Salinas; (d) Botswana; (e) Houston University.
16 pages, 5951 KiB  
Article
A Tracking Imaging Control Method for Dual-FSM 3D GISC LiDAR
by Yu Cao, Xiuqin Su, Xueming Qian, Haitao Wang, Wei Hao, Meilin Xie, Xubin Feng, Junfeng Han, Mingliang Chen and Chenglong Wang
Remote Sens. 2022, 14(13), 3167; https://doi.org/10.3390/rs14133167 - 1 Jul 2022
Cited by 5 | Viewed by 2031
Abstract
In this paper, a tracking and pointing control system with dual-FSM (fast steering mirror) composite axis is proposed. It is applied to the target-tracking accuracy control in a 3D GISC LiDAR (three-dimensional ghost imaging LiDAR via sparsity constraint) system. The tracking and pointing imaging control system of the dual-FSM 3D GISC LiDAR proposed in this paper is a staring imaging method with multiple measurements, which mainly solves the problem of high-resolution remote-sensing imaging of high-speed moving targets when the technology is transformed into practical applications. In the research of this control system, firstly, we propose a method that combines motion decoupling and sensor decoupling to solve the mechanical coupling problem caused by the noncoaxial sensor installation of the FSM. Secondly, we suppress the inherent mechanical resonance of the FSM in the control system. Thirdly, we propose the optical path design of a dual-FSM 3D GISC LiDAR tracking imaging system to solve the problem of receiving aperture constraint. Finally, after sufficient experimental verification, our method is shown to successfully reduce the coupling from 7% to 0.6%, and the precision tracking bandwidth reaches 300 Hz. Moreover, when the distance between the GISC system and the target is 2.74 km and the target flight speed is 7 m/s, the tracking accuracy of the system is improved from 15.7 μrad (σ) to 2.2 μrad (σ), and at the same time, the system recognizes the target contour clearly. Our research is valuable to put the GISC technology into practical applications. Full article
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)
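Suppressing a known mechanical resonance with a second-order digital filter, as described for the FSM loop, can be illustrated with SciPy's notch filter. The sample rate, resonance frequency, and quality factor below are made-up values for the sketch, not the parameters of the actual FSM controller.

```python
import numpy as np
from scipy.signal import iirnotch, lfilter

# Toy illustration: remove a mechanical resonance from the FSM drive signal with a
# second-order digital notch filter. All numeric values here are hypothetical.
fs = 10_000.0        # control-loop sample rate (Hz)
f_res = 1_200.0      # assumed FSM mechanical resonance frequency (Hz)
b, a = iirnotch(w0=f_res, Q=10.0, fs=fs)

t = np.arange(0, 0.1, 1 / fs)
command = np.sin(2 * np.pi * 200 * t)                 # useful 200 Hz tracking command
resonance = 0.3 * np.sin(2 * np.pi * f_res * t)       # unwanted resonance component
filtered = lfilter(b, a, command + resonance)

# The notch attenuates the 1.2 kHz component while passing the 200 Hz command.
print(np.round(np.std(command + resonance), 3), np.round(np.std(filtered), 3))
```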
Figures

  • Figure 1. Structure of a fast steering mirror.
  • Figure 2. Sensor coordinates.
  • Figure 3. FSM control principle diagram.
  • Figure 4. Spectrum characteristics of the second-order digital filter.
  • Figure 5. Comparison of position curves before and after resonance suppression: (a) before suppression; (b) after suppression, with the X-axis showing a 200 Hz sinusoidal position curve.
  • Figure 6. Schematic diagram of the dual-FSM 3D GISC LiDAR tracking control experimental platform (PMT: photomultiplier tube, BS: beam splitter, RGG: rotating ground glass). (a) Transmitting system's optical path structure; (b) receiving system's optical path structure.
  • Figure 7. Photos of the experimental platform: (a) the transmitting FSM; (b) the dual-FSM 3D GISC LiDAR tracking imaging system.
  • Figure 8. Comparison before and after decoupling: (a) before sensor decoupling; (b) after sensor decoupling.
  • Figure 9. Response bandwidth test after decoupling, given a 300 Hz sinusoid on the X-axis.
  • Figure 10. Phase-difference test of 180 degrees for the XY-axis double sinusoid.
  • Figure 11. X-axis anti-disturbance capability test.
  • Figure 12. Comparative test of the tracking accuracy of the azimuth and pitch axes for a UAV in flight: (a) FSM controlled by the existing algorithm; (b) FSM controlled with digital decoupling and resonance suppression.
  • Figure 13. Imaging results of the dual-FSM 3D GISC LiDAR system with the existing control algorithm at different UAV speeds: (a) hovering; (b) V = 2 m/s; (c) V = 7 m/s.
  • Figure 14. Imaging results of the dual-FSM 3D GISC LiDAR system under digital decoupling and resonance suppression at different UAV speeds: (a) hovering; (b) V = 2 m/s; (c) V = 7 m/s.
24 pages, 7222 KiB  
Article
PolSAR Scene Classification via Low-Rank Constrained Multimodal Tensor Representation
by Bo Ren, Mengqian Chen, Biao Hou, Danfeng Hong, Shibin Ma, Jocelyn Chanussot and Licheng Jiao
Remote Sens. 2022, 14(13), 3117; https://doi.org/10.3390/rs14133117 - 28 Jun 2022
Cited by 1 | Viewed by 2230
Abstract
Polarimetric synthetic aperture radar (PolSAR) data can be acquired at all times and are not impacted by weather conditions. They can efficiently capture geometrical and geographical structures on the ground. However, due to the complexity of the data and the difficulty of data availability, PolSAR image scene classification remains a challenging task. To this end, in this paper, a low-rank constrained multimodal tensor representation method (LR-MTR) is proposed to integrate PolSAR data in multimodal representations. To preserve the multimodal polarimetric information simultaneously, the target decompositions in a scene from multiple spaces (e.g., Freeman, H/A/α, Pauli, etc.) are exploited to provide multiple pseudo-color images. Furthermore, a representation tensor is constructed via the representation matrices and constrained by the low-rank norm to keep the cross-information from multiple spaces. A projection matrix is also calculated by minimizing the differences between the whole cascaded data set and the features in the corresponding space. It also reduces the redundancy of those multiple spaces and solves the out-of-sample problem in the large-scale data set. To support the experiments, two new PolSAR image data sets are built via ALOS-2 full polarization data, covering the areas of Shanghai, China, and Tokyo, Japan. Compared with state-of-the-art (SOTA) dimension reduction algorithms, the proposed method achieves the best quantitative performance and demonstrates superiority in fusing multimodal PolSAR features for image scene classification. Full article
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)
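The low-rank constraint on the stacked representation tensor can be illustrated with a singular-value-thresholding step, the standard proximal operator for a nuclear-norm term. The sketch below applies it slice-by-slice to a toy tensor with one representation matrix per polarimetric decomposition; it is an illustration of the constraint only, not the LR-MTR solver, and the threshold value is arbitrary.

```python
import numpy as np

def svt(M, tau):
    """Singular-value thresholding: the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def low_rank_tensor_step(R, tau=2.0):
    """Apply SVT to each slice of a representation tensor R of shape (n, n, M),
    i.e., one representation matrix per modality (Pauli, Freeman, H/A/alpha, ...)."""
    return np.stack([svt(R[:, :, m], tau) for m in range(R.shape[2])], axis=2)

rng = np.random.default_rng(0)
R = rng.normal(size=(40, 40, 3))          # 3 modalities, 40 samples
R_lr = low_rank_tensor_step(R)
# Thresholding can only shrink or zero singular values, so the rank never grows.
print(np.linalg.matrix_rank(R_lr[:, :, 0]), "<=", np.linalg.matrix_rank(R[:, :, 0]))
```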
Figures

  • Graphical abstract.
  • Figure 1. Overview of LR-MTR.
  • Figure 2. LR-MTR algorithm flow chart.
  • Figure 3. Geographical location of the Tokyo data set and its scene categories.
  • Figure 4. Visualization examples from the Tokyo data set: (a) building; (b) water; (c) woodland; (d) coast; (e) farmland.
  • Figure 5. Geographical location of the Shanghai data set and its scene categories.
  • Figure 6. Visualization examples from the Shanghai data set: (a) urban areas; (b) suburban areas; (c) farmland; (d) water; (e) coast.
  • Figure 7. Confusion matrices for (a) the Tokyo data set and (b) the Shanghai data set.
  • Figure 8. OA results of KNN as the dimension changes: (a) Tokyo data results; (b) Shanghai data results.
  • Figure 9. Parameters adjusted according to the accuracy on the Tokyo data set (left) and the Shanghai data set (right). As γ₂ = ⋯ = γ_M = γ in the experiments, γ is the only parameter that needs to be tuned.
  • Figure 10. Convergence of the proposed algorithm: (a) Tokyo data results; (b) Shanghai data results.
23 pages, 3842 KiB  
Article
Fine-Grained Ship Classification by Combining CNN and Swin Transformer
by Liang Huang, Fengxiang Wang, Yalun Zhang and Qingxia Xu
Remote Sens. 2022, 14(13), 3087; https://doi.org/10.3390/rs14133087 - 27 Jun 2022
Cited by 22 | Viewed by 5147
Abstract
The mainstream algorithms used for ship classification and detection can be improved based on convolutional neural networks (CNNs). By analyzing the characteristics of ship images, we found that the difficulty in ship image classification lies in distinguishing ships with similar hull structures but different equipment and superstructures. To extract features such as ship superstructures, this paper introduces transformer architecture with self-attention into ship classification and detection, and a CNN and Swin transformer model (CNN-Swin model) is proposed for ship image classification and detection. The main contributions of this study are as follows: (1) The proposed approach pays attention to different scale features in ship image classification and detection, introduces a transformer architecture with self-attention into ship classification and detection for the first time, and uses a parallel network of a CNN and a transformer to extract features of images. (2) To exploit the CNN’s performance and avoid overfitting as much as possible, a multi-branch CNN-Block is designed and used to construct a CNN backbone with simplicity and accessibility to extract features. (3) The performance of the CNN-Swin model is validated on the open FGSC-23 dataset and a dataset containing typical military ship categories based on open-source images. The results show that the model achieved accuracies of 90.9% and 91.9% for the FGSC-23 dataset and the military ship dataset, respectively, outperforming the existing nine state-of-the-art approaches. (4) The good extraction effect on the ship features of the CNN-Swin model is validated as the backbone of the three state-of-the-art detection methods on the open datasets HRSC2016 and FAIR1M. The results show the great potential of the CNN-Swin backbone with self-attention in ship detection. Full article
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)
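A parallel CNN-plus-transformer classifier of the kind described here can be sketched compactly in PyTorch. The code below uses a plain convolutional stem and a generic nn.TransformerEncoder as stand-ins for the paper's multi-branch CNN-Block and Swin backbone, so the class count, patch size, and widths are assumptions, not the CNN-Swin configuration.

```python
import torch
import torch.nn as nn

class TwoBranchClassifier(nn.Module):
    """Toy parallel CNN + transformer classifier for ship images (stand-in for CNN-Swin)."""
    def __init__(self, n_classes=23, dim=128):
        super().__init__()
        self.cnn = nn.Sequential(                       # local texture / hull features
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))
        self.patchify = nn.Conv2d(3, dim, kernel_size=16, stride=16)   # 16x16 patch embedding
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)  # global context (superstructure layout)
        self.head = nn.Linear(2 * dim, n_classes)

    def forward(self, x):
        f_cnn = self.cnn(x)
        tokens = self.patchify(x).flatten(2).transpose(1, 2)   # (B, n_patches, dim)
        f_tr = self.transformer(tokens).mean(dim=1)            # pooled global feature
        return self.head(torch.cat([f_cnn, f_tr], dim=1))      # fuse the two branches

model = TwoBranchClassifier()
print(model(torch.randn(2, 3, 224, 224)).shape)   # torch.Size([2, 23])
```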
Figures

  • Graphical abstract.
  • Figure 1. Network architecture of the proposed model.
  • Figure 2. Network architecture of Layer1.
  • Figure 3. Samples from the military ship dataset: (a) Type-054 frigate; (b) Sovremenny-class destroyer; (c) Type-055 destroyer; (d) surveillance boat; (e) Arleigh Burke-class destroyer; (f) landing vessel; (g) missile boat.
  • Figure 4. Confusion matrix of the ship dataset.
  • Figure 5. t-SNE clustering analysis on the military ship dataset: (a) samples of the untrained dataset; (b) samples of the dataset trained by the CNN-Swin model.
  • Figure 6. Confusion matrices of state-of-the-art approaches on the military ship dataset: (a) Densenet-121; (b) Efficientnet; (c) Resnet-18; (d) Regnet; (e) ViT-base; (f) CaiT; (g) Swin-base; (h) CNN-Swin.
  • Figure 7. Detection results on the HRSC2016 dataset with our backbone in different state-of-the-art methods.
  • Figure 8. Detection results on part of the FAIR1M dataset with our backbone in different state-of-the-art methods.
  • Figure 9. Class activation maps for the Arleigh Burke-class destroyer: (a–c) CNN branch; (d–f) transformer branch.
16 pages, 2625 KiB  
Article
Character Segmentation and Recognition of Variable-Length License Plates Using ROI Detection and Broad Learning System
by Bingshu Wang, Hongli Xiao, Jiangbin Zheng, Dengxiu Yu and C. L. Philip Chen
Remote Sens. 2022, 14(7), 1560; https://doi.org/10.3390/rs14071560 - 24 Mar 2022
Cited by 5 | Viewed by 3300
Abstract
Variable-length license plate segmentation and recognition has always been a challenging barrier in the application of intelligent transportation systems. Previous approaches mainly concern fixed-length license plates, lacking adaptability for variable-length license plates. Although object detection methods can be used to address the issue, they face a series of difficulties: the cross-class problem, missing detections, and recognition errors between letters and digits. To solve these problems, we propose a machine learning method that regards each character as a region of interest. It covers three parts. Firstly, we explore a transfer learning algorithm based on Faster-RCNN with the InceptionV2 structure to generate candidate character regions. Secondly, a strategy of cross-class removal of characters is proposed to reject overlapped results, and a mechanism of template matching and position prediction is designed to eliminate missing detections. Moreover, a twofold broad learning system is designed to identify letters and digits separately. Experiments performed on Macau license plates demonstrate that our method achieves an average segmentation accuracy of 99.68% and an average recognition rate of 99.19%, outperforming some conventional and deep learning approaches. This adaptability is expected to allow the developed algorithm to be transferred to other countries or regions. Full article
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)
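The cross-class removal strategy amounts to class-agnostic suppression: when character boxes of different classes overlap heavily, only the highest-scoring one is kept. A minimal sketch, with a hypothetical IoU threshold:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def cross_class_removal(dets, iou_thresh=0.6):
    """Keep the highest-scoring box when boxes of *different* classes overlap.

    dets: list of (box, score, label), e.g. candidate character regions from the detector.
    """
    dets = sorted(dets, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score, label in dets:
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score, label))
    return kept

dets = [((10, 10, 40, 60), 0.95, "M"),
        ((12, 11, 41, 59), 0.40, "N"),   # same character detected as a different class
        ((50, 10, 80, 60), 0.90, "8")]
print([d[2] for d in cross_class_removal(dets)])   # ['M', '8']
```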
Figures

  • Figure 1. Examples of variable-length license plates covering different numbers of characters, colors, and fonts.
  • Figure 2. Problems of character segmentation and recognition produced by the ROI detection approach: (a) cross-class problem; (b) false positives or missing detections; (c) recognition confusion between letters and digits.
  • Figure 3. Flowchart of the proposed method. It includes three stages: stage A generates ROIs, stage B removes cross-class results and predicts the positions of missing characters, and stage C handles recognition confusions.
  • Figure 4. Flowchart of handling missing detections with the proposed TMPP: (a) detection result generated by the ROI-based method; (b) rectangles of the detected regions; (c) the template selected from the license plate template library that matches (b); (d) the result predicted by template matching; (e) the final detection result.
  • Figure 5. Examples showing the function of TMPP: (a) input images; (b) false positives or missing detections; (c) results processed by TMPP.
  • Figure 6. Structure of the BLS, which encompasses mapped feature nodes and enhanced feature nodes.
  • Figure 7. Segmentation results of single-row license plates: (a) input images; (b) projection-based [13]; (c) MSER-based [17]; (d) CCA-based [19]; (e) ROI-based [33]; (f) ours.
  • Figure 8. Segmentation results of double-row license plates, with the same comparison layout as Figure 7.
  • Figure 9. Comparison of recognition results between the ROI-based approach [33] and ours on three sets; the odd rows are generated by [33] and the even rows by the proposed method: (a) indoor; (b) outdoor; (c) complex.
  • Figure 10. Segmentation failures, including missing detections and incorrect segmentation.
  • Figure 11. Recognition of license plates using image sequences. For each example, (a) MN8850 and (b) ML5752, 9 of the 10 frames are successful and only one failure occurs (2nd row, 3rd column) because of a strong reflection of illumination.
23 pages, 30433 KiB  
Article
Two-Stream Swin Transformer with Differentiable Sobel Operator for Remote Sensing Image Classification
by Siyuan Hao, Bin Wu, Kun Zhao, Yuanxin Ye and Wei Wang
Remote Sens. 2022, 14(6), 1507; https://doi.org/10.3390/rs14061507 - 20 Mar 2022
Cited by 33 | Viewed by 6036
Abstract
Remote sensing (RS) image classification has attracted much attention recently and is widely used in various fields. Different to natural images, the RS image scenes consist of complex backgrounds and various stochastically arranged objects, thus making it difficult for networks to focus on the target objects in the scene. However, conventional classification methods do not have any special treatment for remote sensing images. In this paper, we propose a two-stream swin transformer network (TSTNet) to address these issues. TSTNet consists of two streams (i.e., original stream and edge stream) which use both the deep features of the original images and the ones from the edges to make predictions. The swin transformer is used as the backbone of each stream given its good performance. In addition, a differentiable edge Sobel operator module (DESOM) is included in the edge stream which can learn the parameters of Sobel operator adaptively and provide more robust edge information that can suppress background noise. Experimental results on three publicly available remote sensing datasets show that our TSTNet achieves superior performance over the state-of-the-art (SOTA) methods. Full article
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)
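One plausible reading of a differentiable Sobel operator is to keep the fixed edge directions but make the kernel weights trainable so they adapt to the data. The sketch below learns only a single scale parameter and is therefore a simplification of the idea, not the paper's DESOM parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableSobel(nn.Module):
    """Toy differentiable Sobel-style edge extractor with a learnable centre weight `a`."""
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(2.0))   # the classic Sobel kernel uses 2 here

    def forward(self, x):                          # x: (B, 1, H, W) grayscale images
        z = torch.zeros_like(self.a)
        o = torch.ones_like(self.a)
        # Horizontal-gradient kernel [[-1, 0, 1], [-a, 0, a], [-1, 0, 1]]; the vertical
        # kernel is its transpose. Gradients flow back into `a` during training.
        kx = torch.stack([-o, z, o, -self.a, z, self.a, -o, z, o]).view(1, 1, 3, 3)
        ky = kx.transpose(2, 3)
        gx = F.conv2d(x, kx, padding=1)
        gy = F.conv2d(x, ky, padding=1)
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)  # edge-magnitude map

edge = LearnableSobel()
print(edge(torch.rand(2, 1, 64, 64)).shape)   # torch.Size([2, 1, 64, 64])
```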
Figures

  • Figure 1. Edge information of six scenes; the red curves represent the edges of the scene objects.
  • Figure 2. Framework of the two-stream swin transformer. It has two inputs, the original image I and the synthetic edge image T(I) generated by the differentiable edge Sobel operator module (DESOM). The fusion module fuses the original and edge features.
  • Figure 3. Two successive transformer blocks of the swin transformer; the regular and shifted windows correspond to W-MSA and SW-MSA, respectively.
  • Figure 4. Shifted-window mechanism of W-MSA and SW-MSA: (a) the standard window; (b) the repartitioned window; (c) the window after the shift transformation; (d) the window after the (e) reverse shift transformation.
  • Figure 5. Structure of the differentiable edge Sobel operator module (DESOM).
  • Figure 6. Design of the fusion module and loss function. An auxiliary cross-entropy loss on F₁ is added to the cross-entropy loss of the fused feature F′; the balance between the two losses is controlled by λ.
  • Figure 7. Example images of the AID dataset: (1) airport; (2) bare land; (3) baseball field; (4) beach; (5) bridge; (6) centre; (7) church; (8) commercial; (9) dense residential; (10) desert; (11) farmland; (12) forest; (13) industrial; (14) meadow; (15) medium residential; (16) mountain; (17) park; (18) parking; (19) playground; (20) pond; (21) port; (22) railway station; (23) resort; (24) river; (25) school; (26) sparse residential; (27) square; (28) stadium; (29) storage tanks; (30) viaduct.
  • Figure 8. Example images of the NWPU dataset: (1) airplane; (2) airport; (3) baseball diamond; (4) basketball court; (5) beach; (6) bridge; (7) chaparral; (8) church; (9) circular farmland; (10) cloud; (11) commercial area; (12) dense residential; (13) desert; (14) forest; (15) freeway; (16) golf course; (17) ground track field; (18) harbor; (19) industrial area; (20) intersection; (21) island; (22) lake; (23) meadow; (24) medium residential; (25) mobile home park; (26) mountain; (27) overpass; (28) palace; (29) parking lot; (30) railway; (31) railway station; (32) rectangular farmland; (33) river; (34) roundabout; (35) runway; (36) sea ice; (37) ship; (38) snow berg; (39) sparse residential; (40) stadium; (41) storage tank; (42) tennis court; (43) terrace; (44) thermal power station; (45) wetland.
  • Figure 9. Example images of the UCM dataset: (1) agriculture; (2) airplane; (3) baseball diamond; (4) beach; (5) buildings; (6) chaparral; (7) dense residential; (8) forest; (9) freeway; (10) golf course; (11) harbor; (12) intersection; (13) medium residential; (14) mobile home park; (15) overpass; (16) parking lot; (17) river; (18) runway; (19) sparse residential; (20) storage tanks; (21) tennis court.
  • Figure 10. Ablation study: accuracy comparison between different methods under training ratios of (a) 20% and 50% on the AID dataset and (b) 10% and 20% on the NWPU dataset. TSTNet consistently performs best across datasets and training ratios.
  • Figure 11. Optimal values learned by the differentiable Sobel operator under a 10% training ratio on the NWPU dataset and a 20% training ratio on the AID dataset.
  • Figure 12. (a) Original remote sensing images; (b) synthesized edge images; (c) attention map of the last layer of the original stream (Swin-B); (d) attention map of the last layer of TSTNet. "Original" receives only the original input images, while TST receives both the original and the synthetic edge inputs.
  • Figure 13. Confusion matrix for the AID dataset under a 20% training ratio, with true labels on the vertical axis and predicted labels on the horizontal axis.
  • Figure 14. Confusion matrix for the NWPU dataset under a 10% training ratio, with the same layout as Figure 13.
  • Figure 15. Confusion matrix for the UCM dataset under a 50% training ratio, with the same layout as Figure 13.
22 pages, 8223 KiB  
Article
Meta-Pixel-Driven Embeddable Discriminative Target and Background Dictionary Pair Learning for Hyperspectral Target Detection
by Tan Guo, Fulin Luo, Leyuan Fang and Bob Zhang
Remote Sens. 2022, 14(3), 481; https://doi.org/10.3390/rs14030481 - 20 Jan 2022
Cited by 12 | Viewed by 2615
Abstract
In hyperspectral target detection, the spectral high-dimensionality, variability, and heterogeneity will pose great challenges to the accurate characterizations of the target and background. To alleviate the problems, we propose a Meta-pixel-driven Embeddable Discriminative target and background Dictionary Pair (MEDDP) learning model by combining low-dimensional embeddable subspace projection and the discriminative target and background dictionary pair learning. In MEDDP, the meta-pixel set is built by taking the merits of homogeneous superpixel segmentation and the local manifold affinity structures, which can significantly reduce the influence of spectral variability and find the most typical and informative prototype spectral signature. Afterward, an embeddable discriminative dictionary pair learning model is established to learn a target and background dictionary pair based on the structural incoherent constraint with embeddable subspace projection. The proposed joint learning strategy can reduce the high-dimensional redundant information and simultaneously enhance the discrimination and compactness of the target and background dictionaries. The proposed MEDDP model is solved by an iterative and alternate optimization algorithm and applied with the meta-pixel-level target detection method. Experimental results on four benchmark HSI datasets indicate that the proposed method can consistently yield promising performance in comparison with some state-of-the-art target detectors. Full article
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)
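The abstract above builds each meta-pixel as an affinity-weighted prototype of the spectra in a superpixel rather than a plain mean. A minimal NumPy sketch of that idea follows, assuming a Gaussian affinity on pairwise spectral distances and row-sum weights; the function `meta_pixel` and its parameter `sigma` are hypothetical, and the exact MEDDP construction may differ.

```python
import numpy as np

def meta_pixel(superpixel: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Compute one meta-pixel from the spectra of a superpixel.

    superpixel: (n_pixels, n_bands) spectra of all pixels in one superpixel.
    Instead of the plain mean (the "center pixel"), pixels are weighted by
    their local manifold affinity so that the most typical spectral
    signature dominates. The Gaussian-affinity weighting here is an
    illustrative assumption, not the exact construction used in MEDDP.
    """
    # pairwise squared spectral distances within the superpixel
    d2 = ((superpixel[:, None, :] - superpixel[None, :, :]) ** 2).sum(-1)
    affinity = np.exp(-d2 / (2.0 * sigma ** 2))   # local affinity structure
    weights = affinity.sum(axis=1)                # typical pixels get more weight
    weights /= weights.sum()
    return weights @ superpixel                   # weighted prototype spectrum

# usage: 50 pixels with 200 spectral bands in one superpixel
proto = meta_pixel(np.random.rand(50, 200))
```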
Figure 1. Overview of the proposed HSI target detection method. In the training stage, the observed HSI data are segmented by the entropy rate superpixel segmentation method, and the training meta-pixel set is constructed, which is further decomposed to obtain a discriminative target and background dictionary pair under the guidance of the target spectra and the structurally incoherent regularization in an adaptive lower-dimensional embeddable subspace. In the testing stage, the HSI data are segmented at a finer scale to construct the testing meta-pixel set as in the training stage. The discriminative target and background dictionary pair obtained in the training stage is then combined with representative representation-learning-based methods, such as SRD, SRBBH, and BCRD, for meta-pixel-level target detection.
Figure 2. Illustration of the center pixel and the meta-pixel. As shown in (a), the center pixel equally merges the pixels in a superpixel. In contrast, as in (b), the contributions of the pixels in a superpixel are weighted by considering the local manifold affinity structure between the pixels, yielding the key typical spectral signature of each superpixel, i.e., the meta-pixel.
Figure 3. The HSI datasets and the corresponding ground truth used in the experiments. (a) AVIRIS I dataset; (b) AVIRIS II dataset; (c) Indian Pines dataset; (d) HYDICE dataset.
Figure 4. Visual comparison between the detection maps of the proposed method and the comparative methods on the AVIRIS I dataset.
Figure 5. Visual comparison between the detection maps of the proposed method and the comparative methods on the AVIRIS II dataset.
Figure 6. Visual comparison between the detection maps of the proposed method and the comparative methods on the Indian Pines dataset.
Figure 7. Visual comparison between the detection maps of the proposed method and the comparative methods on the HYDICE dataset.
Figure 8. ROC performance of all comparative detectors on the different datasets. (a) AVIRIS I dataset; (b) AVIRIS II dataset; (c) Indian Pines dataset; (d) HYDICE dataset.
Figure 9. ROC performance variations of the proposed detectors with different reduced dimensionality d on the different datasets. (a) MEDDP + SRD; (b) MEDDP + SRBBH; (c) MEDDP + BCRD.
Figure 10. ROC performance variations of the proposed detectors with different numbers of training meta-pixels C on the different datasets. (a) MEDDP + SRD; (b) MEDDP + SRBBH; (c) MEDDP + BCRD.
Figure 11. ROC performance variations of the proposed detectors with different numbers of testing meta-pixels V on the different datasets. (a) MEDDP + SRD; (b) MEDDP + SRBBH; (c) MEDDP + BCRD.
Figure 12. ROC performance variations of the proposed detectors with different settings of the balance parameter α on the different datasets. (a) MEDDP + SRD; (b) MEDDP + SRBBH; (c) MEDDP + BCRD.
Figure 13. ROC performance variations of the proposed detectors with different settings of the balance parameters β and γ with α fixed on the different datasets. (a) MEDDP + SRD on the AVIRIS I dataset with α = 100; (b) MEDDP + BCRD on the AVIRIS II dataset with α = 10; (c) MEDDP + SRBBH on the Indian Pines dataset with α = 0.01; (d) MEDDP + BCRD on the HYDICE dataset with α = 0.1.
Figure 14. Convergence curves of Algorithm 1 for solving the proposed MEDDP model on the different datasets. (a) AVIRIS I; (b) AVIRIS II; (c) Indian Pines; (d) HYDICE.
20 pages, 5903 KiB  
Article
A Lightweight Convolutional Neural Network Based on Group-Wise Hybrid Attention for Remote Sensing Scene Classification
by Cuiping Shi, Xinlei Zhang, Jingwei Sun and Liguo Wang
Remote Sens. 2022, 14(1), 161; https://doi.org/10.3390/rs14010161 - 30 Dec 2021
Cited by 13 | Viewed by 3400
Abstract
With the development of computer vision, attention mechanisms have been widely studied. Although introducing an attention module into a network model can help to improve classification performance on remote sensing scene images, doing so directly increases the number of model parameters and the amount of calculation, resulting in slower model operation. To solve this problem, we carried out the following work. First, a channel attention module and a spatial attention module were constructed. The input features were enhanced through channel attention and spatial attention separately, and the features recalibrated by the attention modules were fused to obtain features with hybrid attention. Then, to reduce the increase in parameters caused by the attention module, a group-wise hybrid attention module was constructed. This module divides the input features into four groups along the channel dimension, uses the hybrid attention mechanism to enhance the features of each group in the channel and spatial dimensions, and finally fuses the features of the four groups along the channel dimension. Through the use of the group-wise hybrid attention module, the number of parameters and the computational burden of the network are greatly reduced, and the running time of the network is shortened. Finally, a lightweight convolutional neural network based on group-wise hybrid attention (LCNN-GWHA) was constructed for remote sensing scene image classification. Experiments on four open and challenging remote sensing scene datasets demonstrated that the proposed method has great advantages in terms of classification accuracy, even with a very low number of parameters.
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)
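The group-wise hybrid attention described above splits the input into four channel groups, applies channel and spatial attention to each group, and re-fuses the groups along the channel dimension. The PyTorch sketch below illustrates that flow; the internal attention designs (SE-style squeeze-excitation and a 7 × 7 convolution over pooled maps) and the class name `GroupWiseHybridAttention` are assumptions, not the authors' exact LCNN-GWHA layers.

```python
import torch
import torch.nn as nn

class GroupWiseHybridAttention(nn.Module):
    """Illustrative group-wise hybrid attention block: split into channel
    groups, apply channel and spatial attention per group, fuse the two
    attention outputs, and concatenate the groups back together."""
    def __init__(self, channels: int, groups: int = 4, reduction: int = 8):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        c = channels // groups
        self.channel_att = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1),
                          nn.Conv2d(c, max(c // reduction, 1), 1), nn.ReLU(),
                          nn.Conv2d(max(c // reduction, 1), c, 1), nn.Sigmoid())
            for _ in range(groups)])
        self.spatial_att = nn.ModuleList([
            nn.Sequential(nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())
            for _ in range(groups)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs = []
        for g, xg in enumerate(torch.chunk(x, self.groups, dim=1)):
            xc = xg * self.channel_att[g](xg)                 # channel branch
            pooled = torch.cat([xg.mean(1, keepdim=True),
                                xg.amax(1, keepdim=True)], dim=1)
            xs = xg * self.spatial_att[g](pooled)             # spatial branch
            outs.append(xc + xs)                              # hybrid fusion
        return torch.cat(outs, dim=1)                         # regroup channels

y = GroupWiseHybridAttention(64)(torch.rand(2, 64, 56, 56))
```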
Graphical abstract
Figure 1. Features obtained by traditional convolution.
Figure 2. Channel attention module.
Figure 3. Spatial attention module.
Figure 4. Group-wise hybrid attention module (GWHAM). W, H, and C are the width, height, and number of channels of the feature, respectively. X⌢ and X⌣ are the output features of the channel attention and spatial attention, respectively, and X̃ is the hybrid attention feature after fusion.
Figure 5. Overall flowchart of the proposed LCNN-GWHA method (GWHAM refers to the group-wise hybrid attention modules, and GAP denotes global average pooling).
Figure 6. Confusion matrix of the proposed LCNN-GWHA method on the RSSCN7 dataset (50/50).
Figure 7. Confusion matrix of the LCNN-GWHA method on the UCM21 dataset (80/20).
Figure 8. Confusion matrix of the LCNN-GWHA method on the AID (50/50) dataset.
Figure 9. Confusion matrix of the LCNN-GWHA method on the NWPU45 (20/80) dataset.
Figure 10. Attention visualization results.
Figure 11. Class activation map (CAM) visualization results of the LCNN-GWHA method and the VGG_VD16 with SAFF method on the UCM21 dataset.
Figure 12. Random classification prediction results.
20 pages, 10292 KiB  
Article
Building Plane Segmentation Based on Point Clouds
by Zhonghua Su, Zhenji Gao, Guiyun Zhou, Shihua Li, Lihui Song, Xukun Lu and Ning Kang
Remote Sens. 2022, 14(1), 95; https://doi.org/10.3390/rs14010095 - 25 Dec 2021
Cited by 14 | Viewed by 4773
Abstract
Planes are essential features to describe the shapes of buildings, and plane segmentation is significant when reconstructing a building in three dimensions. However, accurately segmenting planes from point cloud data remains a concern. The objective of this paper was to develop an effective segmentation algorithm for building planes that combines the region growing algorithm with a distance algorithm based on boundary points. The method was tested on point cloud data of a cottage and a pantry, scanned using a Faro Focus 3D laser scanner and a Matterport camera, respectively. A coarse extraction of the building plane was obtained from the region growing algorithm, and the coplanar points where two planes intersect were obtained from the distance algorithm. The optimal segmentation of the building plane was then obtained by combining the coarse extraction plane points with the corresponding coplanar points. The results show that the proposed method successfully segmented the plane points of the cottage and the pantry. The optimal distance thresholds from the plane points missed by the coarse extraction to each plane boundary point were 0.025 m for the cottage and 0.030 m for the pantry. Under the optimal distance threshold, the highest correct rate and the highest error rate of the cottage's (pantry's) plane segmentation were 99.93% and 2.30% (98.55% and 2.44%), respectively, and the F1 scores for the cottage and the pantry reached 97.56% and 95.75%, respectively. The method can segment different objects on the same plane, whereas the random sample consensus (RANSAC) algorithm over-segments the plane. The proposed method can also extract the coplanar points at the intersection of two planes, which cannot be separated using the region growing algorithm. Although the RANSAC-RG method, which combines the RANSAC and region growing algorithms, can optimize the segmentation results of either algorithm and differs little in segmentation effect from the proposed method (especially for the cottage data), it still loses coplanar points at some intersections of two planes.
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)
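The abstract above assigns points missed by the coarse region-growing extraction to a plane when their distance to that plane's boundary points falls below a threshold (0.025 m for the cottage, 0.030 m for the pantry). A minimal NumPy sketch of that distance test follows; the function `recover_coplanar_points` is hypothetical, and boundary-point extraction and conflict handling between planes are omitted.

```python
import numpy as np

def recover_coplanar_points(unassigned: np.ndarray,
                            boundary: np.ndarray,
                            threshold: float = 0.025) -> np.ndarray:
    """Return indices of unassigned points lying within `threshold`
    of any boundary point of a coarsely extracted plane.

    unassigned: (M, 3) points the region growing step left out.
    boundary:   (K, 3) boundary points of one coarse plane.
    threshold:  distance threshold in meters.
    Illustrative sketch of the distance test described in the abstract.
    """
    # pairwise Euclidean distances (M, K), then the nearest boundary point
    d = np.linalg.norm(unassigned[:, None, :] - boundary[None, :, :], axis=2)
    return np.where(d.min(axis=1) < threshold)[0]

# usage with random stand-in coordinates
idx = recover_coplanar_points(np.random.rand(1000, 3), np.random.rand(200, 3))
```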
Figure 1. Workflow of building plane point segmentation.
Figure 2. Boundary point extraction algorithm, where the point projected onto the micro-tangent plane is (a) on the plane's boundary and (b) inside the plane.
Figure 3. Boundary point extraction: (a) raw point cloud; (b) boundary point extraction results.
Figure 4. The distance threshold: (a) 0.020 m; (b) 0.025 m.
Figure 5. Raw point cloud data of the cottage: (a) front view and (b) top view.
Figure 6. Raw point cloud data of the pantry: (a) front view and (b) side view.
Figure 7. The RANSAC algorithm fitting plane points: (a) original points; (b) the plane detected by the RANSAC algorithm.
Figure 8. (a) Raw data; (b) the region growing algorithm; (c) the proposed method.
Figure 9. Cottage's segmentation results using the RANSAC algorithm: (a) front view and (b) top view.
Figure 10. Pantry's segmentation results using the RANSAC algorithm: (a) front view and (b) side view.
Figure 11. Cottage's segmentation results using the region growing algorithm: (a) front view and (b) top view.
Figure 12. Pantry's segmentation results using the region growing algorithm: (a) front view and (b) side view.
Figure 13. Cottage's segmentation results using the RANSAC-RG method: (a) front view and (b) top view.
Figure 14. Pantry's segmentation results using the RANSAC-RG method: (a) front view and (b) side view.
Figure 15. Cottage's segmentation results using the proposed method under the optimal distance threshold: (a) front view and (b) top view.
Figure 16. Pantry's segmentation results using the proposed method under the optimal distance threshold: (a) front view and (b) side view.
22 pages, 2202 KiB  
Article
Accurate Instance Segmentation for Remote Sensing Images via Adaptive and Dynamic Feature Learning
by Feng Yang, Xiangyue Yuan, Jie Ran, Wenqiang Shu, Yue Zhao, Anyong Qin and Chenqiang Gao
Remote Sens. 2021, 13(23), 4774; https://doi.org/10.3390/rs13234774 - 25 Nov 2021
Cited by 6 | Viewed by 3664
Abstract
Instance segmentation for high-resolution remote sensing images (HRSIs) is a fundamental yet challenging task in earth observation, which aims at achieving instance-level location and pixel-level classification for instances of interest on the earth's surface. The main difficulties come from the huge scale variation, arbitrary instance shapes, and numerous densely packed small objects in HRSIs. In this paper, we design an end-to-end multi-category instance segmentation network for HRSIs, in which three new modules based on adaptive and dynamic feature learning are proposed to address these issues. The cross-scale adaptive fusion (CSAF) module introduces a novel multi-scale feature fusion mechanism to enhance the capability of the model to detect and segment objects with noticeable size variation. To predict precise masks for the complex boundaries of remote sensing instances, we embed a context attention upsampling (CAU) kernel instead of deconvolution in the segmentation branch to aggregate contextual information for refined upsampling. Furthermore, we extend the usual fixed positive and negative sample judgment threshold strategy into a dynamic sample selection (DSS) module to flexibly select more suitable positive and negative samples for densely packed instances. These three modules enable better feature learning in the instance segmentation network. Extensive experiments are conducted on the iSAID and NWPU VHR-10 instance segmentation datasets to validate the proposed method. Owing to the three proposed modules, we achieve 1.9% and 2.9% segmentation performance improvements on these two datasets compared with the baseline method, reaching state-of-the-art performance.
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)
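The dynamic sample selection described above relies on an IoU variant that penalizes misplaced candidate boxes using their minimum enclosing rectangle with the ground truth (Figures 7 and 8 of the paper). The sketch below shows a GIoU-style reading of that penalty; the function name `iou_with_enclosing_penalty` is hypothetical, and the exact IoU_p definition in the paper may differ.

```python
def iou_with_enclosing_penalty(c, g):
    """Illustrative IoU_p: plain IoU minus a penalty based on the minimum
    enclosing rectangle R of the candidate box c and the ground-truth box g
    (boxes as (x1, y1, x2, y2)). GIoU-style sketch, not the paper's exact
    definition."""
    ix1, iy1 = max(c[0], g[0]), max(c[1], g[1])
    ix2, iy2 = min(c[2], g[2]), min(c[3], g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_c = (c[2] - c[0]) * (c[3] - c[1])
    area_g = (g[2] - g[0]) * (g[3] - g[1])
    union = area_c + area_g - inter
    iou = inter / union

    # minimum enclosing rectangle R and the penalty for misplaced candidates
    rx1, ry1 = min(c[0], g[0]), min(c[1], g[1])
    rx2, ry2 = max(c[2], g[2]), max(c[3], g[3])
    area_r = (rx2 - rx1) * (ry2 - ry1)
    penalty = (area_r - union) / area_r
    return iou - penalty

# usage: a diagonally misplaced candidate receives a larger enclosing-rectangle penalty
print(iou_with_enclosing_penalty((0, 0, 10, 10), (2, 0, 12, 10)))
print(iou_with_enclosing_penalty((0, 0, 10, 10), (2, 2, 12, 12)))
```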
Figure 1. Characteristics of objects in HRSIs. (a) There are huge scale variations among different planes. (b) Harbors present complex boundaries. (c) Densely packed ships appear in the marina. Note the size gap among objects in the three scenes and the shape differences among the harbors in (b,c).
Figure 2. The network structure of the proposed method, which is based on the PANet architecture and adds the cross-scale adaptive fusion (CSAF) module for multi-scale feature map fusion, the context attention upsampling (CAU) module to refine mask prediction, and the dynamic sample selection (DSS) module to select suitable positive/negative samples.
Figure 3. The structure of the proposed cross-scale adaptive fusion module. For each pyramidal feature map, the others are rescaled to the same shape and then spatially fused according to the learned fusion weights.
Figure 4. Illustration of the cross-scale adaptive fusion mechanism, taking the fusion to the target layer P̂4 as an example.
Figure 5. Illustration of the context attention upsampling. A feature map Ψ of size W × H × C is upsampled by a factor of η to the output feature map Ψ′; here η = 2 is taken as an example.
Figure 6. The mutual interference among densely packed instances. A candidate bounding box can contain multiple objects (red box in the figure). Neighboring objects with similar appearances and structures act as interference noise during localization and classification, which affects the prediction results of the network.
Figure 7. The calculation process of the penalty item. The yellow box and the light blue box denote the candidate positive sample and its corresponding ground-truth box; the blue dashed box R represents their minimum enclosing rectangle.
Figure 8. The differences between IoU and IoU_p. The candidate box c in (a) is placed parallel to the ground truth g, while that in (b) is misplaced. Although the IoU of (a) and (b) is the same, their difficulty of coordinate regression differs.
Figure 9. Class-wise instance segmentation of the proposed approach on the iSAID validation set.
Figure 10. Class-wise instance segmentation of the proposed approach on the NWPU VHR-10 test set.
Figure 11. Visual instance segmentation results of the proposed method on the iSAID validation set. (a) Input images; (b) ground-truth masks; (c,d) predicted results of PANet and our method. The red rectangles indicate the missing predictions and under-segmentation problems of PANet.
Figure 12. Visual instance segmentation results of the proposed method on the NWPU VHR-10 test set. (a) Input images; (b) ground-truth masks; (c,d) predicted results of PANet and our method.
25 pages, 4607 KiB  
Article
Remote Sensing Scene Image Classification Based on Dense Fusion of Multi-level Features
by Cuiping Shi, Xinlei Zhang, Jingwei Sun and Liguo Wang
Remote Sens. 2021, 13(21), 4379; https://doi.org/10.3390/rs13214379 - 30 Oct 2021
Cited by 14 | Viewed by 2483
Abstract
For remote sensing scene image classification, many convolutional neural networks improve classification accuracy at the cost of the time and space complexity of the model. This leads to a slow running speed and fails to realize a trade-off between model accuracy and running speed. As the network deepens, it is also difficult to extract the key features with a simple double-branched structure, and shallow features are lost, which is unfavorable for the classification of remote sensing scene images. To solve this problem, we propose a dual-branch multi-level feature dense fusion-based lightweight convolutional neural network (BMDF-LCNN). The network fully extracts the information of the current layer through 3 × 3 depthwise separable convolution, 1 × 1 standard convolution, and identity branches, and fuses it with the features extracted from the previous layer by 1 × 1 standard convolution, thus avoiding the loss of shallow information as the network deepens. In addition, we propose a downsampling structure better suited to extracting the shallow features of the network, in which a pooling branch performs the downsampling and a convolution branch compensates for the pooled features. Experiments were carried out on four open and challenging remote sensing scene datasets. The results show that the proposed method has higher classification accuracy and lower model complexity than some state-of-the-art classification methods and realizes a trade-off between model accuracy and running speed.
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)
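The abstract above combines a 3 × 3 depthwise separable convolution, a 1 × 1 standard convolution, and an identity branch, fused with a 1 × 1-convolved feature from the previous layer. The PyTorch sketch below illustrates that branch pattern under simplifying assumptions; the class name `DualBranchBlock` and the additive fusion are illustrative, not the exact BMDF-LCNN topology.

```python
import torch
import torch.nn as nn

class DualBranchBlock(nn.Module):
    """Illustrative dense-fusion block: depthwise separable 3x3, standard
    1x1, and identity branches on the current layer, fused with a 1x1-
    convolved feature from the previous layer."""
    def __init__(self, channels: int):
        super().__init__()
        self.dw = nn.Sequential(                     # 3x3 depthwise separable conv
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.pw = nn.Sequential(                     # 1x1 standard conv branch
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.prev = nn.Sequential(                   # 1x1 conv on the previous layer
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels), nn.ReLU())

    def forward(self, x: torch.Tensor, x_prev: torch.Tensor) -> torch.Tensor:
        # identity branch keeps the shallow features of the current layer
        return self.dw(x) + self.pw(x) + x + self.prev(x_prev)

x_prev = torch.rand(2, 64, 56, 56)
x = torch.rand(2, 64, 56, 56)
y = DualBranchBlock(64)(x, x_prev)
```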
Graphical abstract
Figure 1. The proposed BMDF-LCNN network model.
Figure 2. Three downsampling structure diagrams: (a) convolutional downsampling; (b) maximum pooled downsampling; (c) our proposed downsampling method (each convolution layer is followed by a BN layer and ReLU).
Figure 3. Optimizing time and space complexity structures. (a) Basic structure diagram for optimizing time and space complexity. (b) Structure diagram with the same number of input and output channels in the first layer of the branch. (c) Structure diagram with a different number of input and output channels in the first layer of the branch (each convolution layer is followed by BN and ReLU layers).
Figure 4. Performance comparison of BMDF-LCNN and LCNN-BFF. (a) Comparison of AA values between BMDF-LCNN and LCNN-BFF. (b) Comparison of F1 values between BMDF-LCNN and LCNN-BFF.
Figure 5. Confusion matrices of the BMDF-LCNN method on the UC and RSSCN datasets. (a) Confusion matrix obtained on the 80/20 UC dataset. (b) Confusion matrix on the 50/50 RSSCN dataset.
Figure 6. Confusion matrices of the proposed BMDF-LCNN method on the 20/80 AID and 10/90 NWPU datasets. (a) Confusion matrix on the 20/80 AID dataset. (b) Confusion matrix on the 10/90 NWPU dataset.
Figure 7. Thermal diagram on the RSSCN dataset.
Figure 8. t-SNE visualization results on the 80/20 UC and 50/50 RSSCN datasets. (a) t-SNE visualization results on the 80/20 UC dataset. (b) t-SNE visualization results on the 50/50 RSSCN dataset.
Figure 9. Random classification prediction results.

Review


37 pages, 3377 KiB  
Review
Self-Supervised Learning for Scene Classification in Remote Sensing: Current State of the Art and Perspectives
by Paul Berg, Minh-Tan Pham and Nicolas Courty
Remote Sens. 2022, 14(16), 3995; https://doi.org/10.3390/rs14163995 - 17 Aug 2022
Cited by 38 | Viewed by 6559
Abstract
Deep learning methods have become an integral part of computer vision and machine learning research by providing significant improvements in many tasks such as classification, regression, and detection. These gains have also been observed in the field of remote sensing for Earth observation, where most state-of-the-art results are now achieved by deep neural networks. However, one downside of these methods is the need for large amounts of annotated data, requiring labor-intensive and expensive human effort, particularly for specific domains that require expert knowledge such as medical imaging or remote sensing. In order to limit the requirement for data annotations, several self-supervised representation learning methods have been proposed to learn unsupervised image representations that can then serve downstream tasks such as image classification, object detection, or semantic segmentation. As a result, self-supervised learning approaches have been widely adopted in the remote sensing domain within the last few years. In this article, we review the underlying principles developed by various self-supervised methods, with a focus on the scene classification task. We highlight the main contributions, analyze the experiments, and summarize the key conclusions of each study. We then conduct extensive experiments on two public scene classification datasets to benchmark and evaluate different self-supervised models. Based on the comparative results, we investigate the impact of individual augmentations when applied to remote sensing data, as well as the use of self-supervised pre-training to boost classification performance with a limited number of labeled samples. We finally underline the current trends, challenges, and perspectives of self-supervised scene classification.
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)
Figure 1. Illustration of the difference between object-centric natural images (from the ImageNet [1] dataset) and remote sensing scene images (from the Resisc-45 [3] dataset). (a) Object-centric image samples; (b) remote sensing image samples.
Figure 2. The GAN architecture [29] for use in generative self-supervised representation learning. The representation used is h, the last discriminator activation before a binary classification into fake/real labels.
Figure 3. In RotNet [33], a random rotation is applied to the input images and the model is tasked with classifying which rotation was applied. The model is composed of an encoder f whose output representations h are used by a predictor p to classify the random rotation.
Figure 4. The triplet loss [37] is used to learn discriminative representations by learning an encoder that is able to discriminate between negative and positive samples.
Figure 5. Illustration of the contrastive loss on the 2-dimensional unit sphere with two negative samples (z1−, z2−) and one positive sample (z+) from the EuroSAT [17] dataset.
Figure 6. In momentum contrast [40], a queue of Q samples is built using a momentum encoder (right) whose weights are updated as an exponential moving average (EMA) of the main encoder's weights (left). Therefore, at each step, only the main encoder's weights are updated by back-propagation. The similarity between the queue samples and the encoded batch samples is then used in the contrastive loss (cf. Equation (4)).
Figure 7. The non-contrastive BYOL [44] architecture, which uses a student pathway A and a teacher pathway B to encode the images. The teacher's weights are updated using an EMA of the student's weights. The online branch is also equipped with an additional network p^A called the predictor.
Figure 8. The split-brain autoencoder architecture [55] used in [54] to split the image x into different data channels, where each autoencoder learns to reconstruct the dedicated missing channels. f^A and f^B are two autoencoders, each reconstructing a different subset of input channels given by the channel masking functions Mask^A and Mask^B, respectively.
Figure 9. Sample images from the Resisc-45 [3] dataset with 45 scene classes.
Figure 10. Sample images from the EuroSAT [17] dataset.
Figure 11. Joint-embedding methods: pre-training and usage in a downstream scene classification task. In the pre-training phase, depending on the framework, the encoder and predictor can have similar or different architectures in the two branches. In the downstream phase, the encoder weights are frozen for linear evaluation or updated for fine-tuning evaluation.
Figure 12. The t-SNE [109] visualization of feature representations extracted from the EuroSAT validation set using the pre-trained backbones of four self-supervised models, the supervised model, and the random weight initialization strategy. (a) SimCLR [39]; (b) MoCo-v2 [67]; (c) Barlow Twins [48]; (d) BYOL [44]; (e) supervised; (f) random weights.
Figure 13. Comparison of fine-tuning performance using MoCo-v2 or BYOL under a limited number of samples with pre-training transfer on EuroSAT. (a) Validation on the EuroSAT dataset. (b) Validation on the Resisc-45 dataset.
Figure 14. Comparison of fine-tuning performance of supervised and self-supervised pre-trained models on another dataset. (a) Pre-training on Resisc-45 and fine-tuning/validation on EuroSAT. (b) Pre-training on EuroSAT and fine-tuning/validation on Resisc-45.
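Figures 5 and 6 above illustrate the contrastive objective that underlies several of the reviewed methods: normalized embeddings on the unit sphere, one positive sample, and a bank of negatives (e.g., the MoCo queue). The following is a standard InfoNCE-style sketch, not code from any of the reviewed papers; the temperature value 0.07 is a common default.

```python
import torch
import torch.nn.functional as F

def info_nce(z_query: torch.Tensor, z_pos: torch.Tensor,
             z_neg: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Generic InfoNCE-style contrastive loss: the query is pulled towards
    its positive and pushed away from a bank of negatives after L2
    normalization onto the unit sphere."""
    q = F.normalize(z_query, dim=1)          # (B, D)
    p = F.normalize(z_pos, dim=1)            # (B, D)
    n = F.normalize(z_neg, dim=1)            # (K, D) negative bank / queue
    l_pos = (q * p).sum(dim=1, keepdim=True)             # (B, 1) positive logits
    l_neg = q @ n.t()                                     # (B, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive is index 0
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128), torch.randn(4096, 128))
```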