Artificial Intelligence Algorithm for Remote Sensing Imagery Processing

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "AI Remote Sensing".

Deadline for manuscript submissions: closed (30 November 2021) | Viewed by 99978

Special Issue Editors


Guest Editor
Department of Computer Technology and Communications, Polytechnic School of Cáceres, University of Extremadura, 10003 Cáceres, Spain
Interests: hyperspectral remote sensing; deep learning; Graphics Processing Units (GPUs); High Performance Computing (HPC) techniques

Guest Editor
Department of Computer Technology and Communications, Polytechnic School of Cáceres, University of Extremadura, 10003 Cáceres, Spain
Interests: hyperspectral image analysis; machine (deep) learning; neural networks; multisensor data fusion; high performance computing; cloud computing

Special Issue Information

Dear Colleagues,

During the last decades, significant efforts have been made in the remote sensing field to obtain rich and accurate information about the Earth's surface. The impressive advances in computer technology, in terms of hardware devices and software design, have enabled the launch of multiple Earth observation missions, which are currently collecting huge amounts of data daily. These raw data capture the matter-energy interactions at the Earth's surface and are characterized by their great variety in terms of typology (e.g., lidar and radar data or optical and thermal imaging), acquisition platforms (e.g., unmanned aerial vehicles or UAVs, traditional aerial platforms and satellites) and spatial, spectral and temporal resolutions (from high to low spatial resolution, from single-band panchromatic images to hyperspectral images with hundreds of spectral channels, and revisit times for the same observation area from hours to days). The opportunity to use remote sensing data to support economic and social activities is highly attractive, as these data collect rich information over large spatial areas, enabling the detailed characterization of natural features, different materials and physical objects on the ground. Indeed, the current literature on the use and exploitation of these data has proved that they are truly useful in different decision-making tasks, such as precision agriculture, natural resource management, urban planning, risk prevention, disaster management, national defense, and homeland security, among many other application areas.

However, the raw data obtained by remote sensors must be properly processed in order to exploit the information contained in them, refining the data to the end-user level. This processing has to deal with several challenges and limitations, such as high data complexity, noisy data due to sensor limitations or uncontrolled atmospheric changes, low spatial resolutions, spectral mixtures, redundancies and correlations between spectral bands, a lack of labelled samples, cloud occlusions, high data dimensionality, and high intra-class variability coupled with inter-class similarity, among others. To face these issues, the implementation of new, more powerful processing tools is mandatory; such tools must be able to extract the relevant information contained in remote sensing data in a reliable and efficient way.

In this regard, artificial intelligence (AI) techniques have achieved significant success in multiple fields related to data processing, such as speech recognition and computer vision. These methods provide interesting procedures to automatically process large amounts of data and make data-driven decisions in an accurate way. Moreover, the increasing capabilities of computer systems have promoted a great evolution of these algorithms, from traditional pattern recognition methods to complex machine learning and task-driven deep learning models, which are achieving unprecedented results. In particular, many AI-based algorithms are achieving dramatic improvements in many remote sensing analysis tasks, such as unmixing, data classification, object/target or anomaly/change detection, data super-resolution, data fusion, cloud removal, denoising, and spectral reduction, among others. However, the implementation of these algorithms for remote sensing data processing must address the characteristics, challenges and limitations imposed by this kind of data, and therefore remains a challenging task.

This Special Issue invites manuscripts that present new AI approaches or improved AI-based algorithms for processing the information contained in remote sensing data. As this is a broad area, there are no constraints regarding the field of application. In this sense, this Special Issue aims to present the current state of AI methods for the analysis of remote sensing data in several fields of application.

Dr. Mercedes E. Paoletti
Dr. Juan M. Haut
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • remote sensing
  • data analysis
  • machine learning
  • deep learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (17 papers)


Research


22 pages, 3919 KiB  
Article
RS-DARTS: A Convolutional Neural Architecture Search for Remote Sensing Image Scene Classification
by Zhen Zhang, Shanghao Liu, Yang Zhang and Wenbo Chen
Remote Sens. 2022, 14(1), 141; https://doi.org/10.3390/rs14010141 - 29 Dec 2021
Cited by 29 | Viewed by 4645
Abstract
Due to the superiority of convolutional neural networks, many deep learning methods have been used in image classification. The enormous difference between natural images and remote sensing images makes it difficult to directly utilize or modify existing CNN models for remote sensing scene classification tasks. In this article, a new paradigm is proposed that can automatically design a suitable CNN architecture for scene classification. A more efficient search framework, RS-DARTS, is adopted to find the optimal network architecture. This framework has two phases. In the search phase, some new strategies are presented, making the calculation process smoother, and better distinguishing the optimal and other operations. In addition, we added noise to suppress skip connections in order to close the gap between trained and validation processing and ensure classification accuracy. Moreover, a small part of the neural network is sampled to reduce the redundancy in exploring the network space and speed up the search processing. In the evaluation phase, the optimal cell architecture is stacked to construct the final network. Extensive experiments demonstrated the validity of the search strategy and the impressive classification performance of RS-DARTS on four public benchmark datasets. The proposed method showed more effectiveness than the manually designed CNN model and other methods of neural architecture search. Especially, in terms of search cost, RS-DARTS consumed less time than other NAS methods. Full article
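
As a rough illustration of the skip-connection suppression described in this abstract, the Python sketch below injects Gaussian noise into the identity branch of a DARTS-style mixed operation during the search phase; the operation set, the noise scale and the class name are illustrative assumptions rather than the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyMixedOp(nn.Module):
    """DARTS-style mixed operation; Gaussian noise is injected into the
    skip-connection branch during search (illustrative sketch, not RS-DARTS)."""
    def __init__(self, channels, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.ops = nn.ModuleDict({
            "skip": nn.Identity(),
            "conv3": nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            "conv5": nn.Conv2d(channels, channels, 5, padding=2, bias=False),
            "pool": nn.AvgPool2d(3, stride=1, padding=1),
        })
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture weights

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        out = 0
        for w, (name, op) in zip(weights, self.ops.items()):
            y = op(x)
            if name == "skip" and self.training:
                y = y + self.noise_std * torch.randn_like(y)  # make skip a noisier shortcut
            out = out + w * y
        return out

print(NoisyMixedOp(16)(torch.randn(2, 16, 8, 8)).shape)  # -> torch.Size([2, 16, 8, 8])

Perturbing only the skip branch makes it a less attractive shortcut during the bi-level search, reducing the tendency of the architecture weights to collapse onto skip connections.
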
Figures: (1) illustration of the proposed RS-DARTS approach; (2) images randomly sampled from the four benchmark datasets; (3) time cost in the search phase; (4) cell architectures searched by DARTS, PC-DARTS and RS-DARTS on the Large-RS dataset; (5) cell architectures searched by GPAS; (6) cell architectures searched by Auto-RSISC.
22 pages, 7275 KiB  
Article
Semi-Autonomous Learning Algorithm for Remote Image Object Detection Based on Aggregation Area Instance Refinement
by Bei Cheng, Zhengzhou Li, Hui Li, Zhiquan Ding and Tianqi Qin
Remote Sens. 2021, 13(24), 5065; https://doi.org/10.3390/rs13245065 - 14 Dec 2021
Viewed by 2090
Abstract
Semi-autonomous learning for object detection has attracted more and more attention in recent years, which usually tends to find only one object instance with the highest score in each image. However, this strategy usually highlights the most representative part of the object instead of the whole object, which may lead to the loss of a lot of important information. To solve this problem, a novel end-to-end aggregate-guided semi-autonomous learning residual network is proposed to perform object detection. Firstly, a progressive modified residual network (MRN) is applied to the backbone network to make the detector more sensitive to the boundary features of the object. Then, an aggregate-based region-merging strategy (ARMS) is designed to select high-quality instances by selecting aggregation areas and merging these regions. The ARMS selects the aggregation areas that are highly related to the object through association coefficient, and then evaluates the aggregation areas through a similarity coefficient and fuses them to obtain high-quality object instance areas. Finally, a regression-locating branch is further developed to refine the location of the object, which can be optimized jointly with regional classification. Extensive experiments demonstrate that the proposed method is superior to state-of-the-art methods. Full article
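
The association and similarity coefficients used by ARMS are not spelled out in this abstract; as a loose, hypothetical stand-in, the sketch below greedily merges candidate boxes whenever a plain IoU similarity exceeds a threshold, which conveys only the general region-merging idea.

def iou(a, b):
    # IoU of two boxes (x1, y1, x2, y2); a stand-in for the paper's similarity coefficient
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def merge_regions(boxes, thr=0.3):
    # Greedily fuse any pair of regions whose similarity exceeds thr into their union box
    boxes = [list(b) for b in boxes]
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if iou(boxes[i], boxes[j]) > thr:
                    a, b = boxes[i], boxes.pop(j)
                    boxes[i] = [min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3])]
                    merged = True
                    break
            if merged:
                break
    return boxes

print(merge_regions([(10, 10, 50, 50), (20, 20, 55, 55), (200, 200, 230, 230)]))
# -> [[10, 10, 55, 55], [200, 200, 230, 230]]
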
Figures: (1) overall architecture of the proposed framework; (2) data flow for the plain net and residual net; (3) feature maps of the VGG backbone and the MRN; (4) aggregation region merging in ARMS; (5-6) precision-recall curves of several semi-autonomous learning methods on the augmented NWPU VHR-10 and LEVIR datasets; (7-8) detection results on the augmented NWPU VHR-10 and LEVIR; (9) comparison of different backbone networks; (10) ablation experiment for ARMS; (11) contribution of the regression branch.
21 pages, 24683 KiB  
Article
Application of Supervised Machine Learning Technique on LiDAR Data for Monitoring Coastal Land Evolution
by Maurizio Barbarella, Alessandro Di Benedetto and Margherita Fiani
Remote Sens. 2021, 13(23), 4782; https://doi.org/10.3390/rs13234782 - 25 Nov 2021
Cited by 8 | Viewed by 2958
Abstract
Machine Learning (ML) techniques are now being used very successfully in predicting and supporting decisions in multiple areas such as environmental issues and land management. These techniques have also provided promising results in the field of natural hazard assessment and risk mapping. The aim of this work is to apply the Supervised ML technique to train a model able to classify a particular gravity-driven coastal hillslope geomorphic model (slope-over-wall) involving most of the soft rocks of Cilento (southern Italy). To train the model, only geometric data have been used, namely morphometric feature maps computed on a Digital Terrain Model (DTM) derived from Light Detection and Ranging (LiDAR) data. Morphometric maps were computed using third-order polynomials, so as to obtain products that best describe landforms. Not all morphometric parameters from literature were used to train the model, the most significant ones were chosen by applying the Neighborhood Component Analysis (NCA) method. Different models were trained and the main indicators derived from the confusion matrices were compared. The best results were obtained using the Weighted k-NN model (accuracy score = 75%). Analysis of the Receiver Operating Characteristic (ROC) curves also shows that the discriminating capacity of the test reached percentages higher than 95%. The model, resulting more accurate in the training area, will be extended to similar areas along the Tyrrhenian coastal land. Full article
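
A minimal scikit-learn sketch of the classification pipeline outlined above, namely NCA followed by a distance-weighted k-NN classifier; the random feature matrix, the number of neighbours and the split are placeholders, and scikit-learn's NCA learns a linear transformation rather than the per-feature weights reported in the paper.

import numpy as np
from sklearn.neighbors import NeighborhoodComponentsAnalysis, KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

# X: morphometric features per DTM cell (placeholder values), y: classes I/II/III
rng = np.random.default_rng(0)
X, y = rng.random((500, 9)), rng.integers(0, 3, 500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = Pipeline([
    ("nca", NeighborhoodComponentsAnalysis(random_state=0)),            # supervised feature transform
    ("knn", KNeighborsClassifier(n_neighbors=10, weights="distance")),  # weighted k-NN
])
model.fit(X_tr, y_tr)
y_pred = model.predict(X_te)
print(accuracy_score(y_te, y_pred))
print(confusion_matrix(y_te, y_pred))
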
Figures: (1) study area; (2) testing sites; (3) workflow; (4) DTM-derived morphometric feature maps of the training area; (5) feature selection using NCA; (6) accuracy scores with 9 and 8 features; (7-8) model accuracy indicators; (9) confusion matrix, TPR/FNR and PPV/FDR; (10) classification overlaid on the test-area DTM; (11) ROC curves per class; (12) "Ripe Rosse" test area; (13) Marina di Ascea test area.
14 pages, 5410 KiB  
Article
MSST-Net: A Multi-Scale Adaptive Network for Building Extraction from Remote Sensing Images Based on Swin Transformer
by Wei Yuan and Wenbo Xu
Remote Sens. 2021, 13(23), 4743; https://doi.org/10.3390/rs13234743 - 23 Nov 2021
Cited by 47 | Viewed by 4414
Abstract
The segmentation of remote sensing images by deep learning technology is the main method for remote sensing image interpretation. However, the segmentation model based on a convolutional neural network cannot capture the global features very well. A transformer, whose self-attention mechanism can supply each pixel with a global feature, makes up for the deficiency of the convolutional neural network. Therefore, a multi-scale adaptive segmentation network model (MSST-Net) based on a Swin Transformer is proposed in this paper. Firstly, a Swin Transformer is used as the backbone to encode the input image. Then, the feature maps of different levels are decoded separately. Thirdly, the convolution is used for fusion, so that the network can automatically learn the weight of the decoding results of each level. Finally, we adjust the channels to obtain the final prediction map by using the convolution with a kernel of 1 × 1. By comparing this with other segmentation network models on a WHU building data set, the evaluation metrics, mIoU, F1-score and accuracy are all improved. The network model proposed in this paper is a multi-scale adaptive network model that pays more attention to the global features for remote sensing segmentation. Full article
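
A compact sketch of the decode-separately-then-fuse idea described in this abstract: each encoder level gets its own small decoder, the upsampled results are fused by a convolution that learns how to weight the levels, and a 1 x 1 convolution adjusts the channels to the number of classes. The channel sizes and the dummy inputs below are assumptions standing in for Swin Transformer stage outputs.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusionHead(nn.Module):
    # Decode each encoder level separately, then let a fusion conv learn the level weights
    def __init__(self, in_channels=(96, 192, 384, 768), mid=64, num_classes=2):
        super().__init__()
        self.decoders = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c, mid, 3, padding=1), nn.ReLU(inplace=True))
            for c in in_channels
        )
        self.fuse = nn.Conv2d(mid * len(in_channels), mid, 3, padding=1)
        self.classifier = nn.Conv2d(mid, num_classes, kernel_size=1)  # 1x1 channel adjustment

    def forward(self, feats, out_size):
        decoded = [F.interpolate(dec(f), size=out_size, mode="bilinear", align_corners=False)
                   for dec, f in zip(self.decoders, feats)]
        return self.classifier(self.fuse(torch.cat(decoded, dim=1)))

# dummy Swin-like stage outputs for a 256 x 256 image (strides 4, 8, 16, 32)
feats = [torch.randn(1, c, 256 // s, 256 // s)
         for c, s in zip((96, 192, 384, 768), (4, 8, 16, 32))]
print(MultiScaleFusionHead()(feats, (256, 256)).shape)  # -> torch.Size([1, 2, 256, 256])
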
Figures: (1) WHU building dataset map; (2) differently sized houses in the remote sensing image; (3) MSST-Net architecture; (4) regular and shifted windows of the Swin Transformer block; (5) prediction results compared with CNN-based networks; (6) prediction results compared with a transformer-based network.
19 pages, 3017 KiB  
Article
Adaptable Convolutional Network for Hyperspectral Image Classification
by Mercedes E. Paoletti and Juan M. Haut
Remote Sens. 2021, 13(18), 3637; https://doi.org/10.3390/rs13183637 - 11 Sep 2021
Cited by 6 | Viewed by 2850
Abstract
Nowadays, a large number of remote sensing instruments are providing a massive amount of data within the frame of different Earth Observation missions. These instruments are characterized by the wide variety of data they can collect, as well as the impressive volume of data and the speed at which it is acquired. In this sense, hyperspectral imaging data has certain properties that make it difficult to process, such as its large spectral dimension coupled with problematic data variability. To overcome these challenges, convolutional neural networks have been proposed as classification models because of their ability to extract relevant spectral–spatial features and learn hidden patterns, along with their great architectural flexibility. Their high performance relies on the convolution kernels to exploit the spatial relationships. Thus, filter design is crucial for the correct performance of models. Nevertheless, hyperspectral data may contain objects with different shapes and orientations, preventing filters from “seeing everything possible” during the decision making. To overcome this limitation, this paper proposes a novel adaptable convolution model based on deforming kernels combined with deforming convolution layers to fit their effective receptive field to the input data. The proposed adaptable convolutional network (named DKDCNet) has been evaluated over two well-known hyperspectral scenes, demonstrating that it is able to achieve better results than traditional strategies with similar computational cost for HSI classification. Full article
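
The deforming-kernel layers are not fully specified in this abstract; as an approximate illustration, the sketch below uses torchvision's deformable convolution, in which a small convolution predicts per-position sampling offsets so that the effective receptive field adapts to the input. It is a stand-in for the idea, not the authors' DKDCNet.

import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AdaptableConvBlock(nn.Module):
    # A conv predicts 2*k*k sampling offsets per position; the deformable conv samples there
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.offset_gen = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)

    def forward(self, x):
        offsets = self.offset_gen(x)       # (B, 2*k*k, H, W) offsets, one pair per kernel tap
        return self.deform(x, offsets)

x = torch.randn(2, 30, 11, 11)             # e.g., a spectrally reduced HSI patch (assumed size)
print(AdaptableConvBlock(30, 64)(x).shape)  # -> torch.Size([2, 64, 11, 11])
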
Figures: (1) examples of irregular object shapes and orientations in hyperspectral scenes; (2) graphical representation of a convolution kernel; (3) graphical representation of the receptive field; (4) overview of the proposed adaptable network architecture; (5-6) ground truths of the University of Pavia and University of Houston scenes; (7) training and validation loss/accuracy curves; (8) OA, AA and Kappa coefficient per model; (9) trainable parameters on University of Pavia; (10) classification maps for the UH scene.
25 pages, 3461 KiB  
Article
Efficient Transformer for Remote Sensing Image Segmentation
by Zhiyong Xu, Weicun Zhang, Tianxiang Zhang, Zhifang Yang and Jiangyun Li
Remote Sens. 2021, 13(18), 3585; https://doi.org/10.3390/rs13183585 - 9 Sep 2021
Cited by 150 | Viewed by 13709
Abstract
Semantic segmentation for remote sensing images (RSIs) is widely applied in geological surveys, urban resources management, and disaster monitoring. Recent solutions on remote sensing segmentation tasks are generally addressed by CNN-based models and transformer-based models. In particular, transformer-based architecture generally struggles with two main problems: a high computation load and inaccurate edge classification. Therefore, to overcome these problems, we propose a novel transformer model to realize lightweight edge classification. First, based on a Swin transformer backbone, a pure Efficient transformer with mlphead is proposed to accelerate the inference speed. Moreover, explicit and implicit edge enhancement methods are proposed to cope with object edge problems. The experimental results evaluated on the Potsdam and Vaihingen datasets present that the proposed approach significantly improved the final accuracy, achieving a trade-off between computational complexity (Flops) and accuracy (Efficient-L obtaining 3.23% mIoU improvement on Vaihingen and 2.46% mIoU improvement on Potsdam compared with HRCNet_W48). As a result, it is believed that the proposed Efficient transformer will have an advantage in dealing with remote sensing image segmentation problems. Full article
Figures: graphical abstract; (1) Flops vs. mIoU on the Potsdam and Vaihingen datasets; (2) overall framework of the Swin transformer (Swin-T); (3) shifted window approach; (4) architecture of uperhead; (5) overall framework of the Efficient transformer (Efficient-T); (6) architecture of mlphead; (7) examples of uncertain edge definitions; (8) explicit edge enhancement method; (9) CNN-based edge extractor; (10) implicit edge enhancement method; (11) epoch vs. loss for AdamW and SGD; (12) visualization of C1-C4 features; (13) visualization of the edge enhancement methods; (14-15) prediction maps on the Vaihingen and Potsdam datasets; (16) improvement on blurry areas.
24 pages, 3630 KiB  
Article
Densely Connected Pyramidal Dilated Convolutional Network for Hyperspectral Image Classification
by Feng Zhao, Junjie Zhang, Zhe Meng and Hanqiang Liu
Remote Sens. 2021, 13(17), 3396; https://doi.org/10.3390/rs13173396 - 26 Aug 2021
Cited by 22 | Viewed by 3319
Abstract
Recently, with the extensive application of deep learning techniques in the hyperspectral image (HSI) field, particularly convolutional neural network (CNN), the research of HSI classification has stepped into a new stage. To avoid the problem that the receptive field of naive convolution is small, the dilated convolution is introduced into the field of HSI classification. However, the dilated convolution usually generates blind spots in the receptive field, resulting in discontinuous spatial information. In order to solve the above problem, a densely connected pyramidal dilated convolutional network (PDCNet) is proposed in this paper. Firstly, a pyramidal dilated convolutional (PDC) layer that integrates different numbers of sub-dilated convolutional layers is proposed, where the dilation factor of the sub-dilated convolution increases exponentially, achieving multi-scale receptive fields. Secondly, the number of sub-dilated convolutional layers increases in a pyramidal pattern with the depth of the network, thereby capturing more comprehensive hyperspectral information in the receptive field. Furthermore, a feature fusion mechanism combining pixel-by-pixel addition and channel stacking is adopted to extract more abstract spectral–spatial features. Finally, in order to reuse the features of the previous layers more effectively, dense connections are applied in densely pyramidal dilated convolutional (DPDC) blocks. Experiments on three well-known HSI datasets indicate that PDCNet proposed in this paper has good classification performance compared with other popular models. Full article
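
A minimal sketch of a PDC-style layer consistent with the description above: parallel 3 x 3 convolutions whose dilation factors grow exponentially, fused by pixel-by-pixel addition and channel stacking; the branch count and fusion details are assumptions.

import torch
import torch.nn as nn

class PDCLayer(nn.Module):
    # Parallel sub-dilated convolutions with dilation 1, 2, 4, ... fused by addition + stacking
    def __init__(self, channels, num_branches=3):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3,
                      dilation=2 ** i, padding=2 ** i, bias=False)
            for i in range(num_branches)
        )

    def forward(self, x):
        outs = [branch(x) for branch in self.branches]
        added = torch.stack(outs, dim=0).sum(dim=0)   # pixel-by-pixel addition
        return torch.cat([added, x], dim=1)           # channel stacking for dense feature reuse

x = torch.randn(1, 32, 15, 15)
print(PDCLayer(32)(x).shape)  # -> torch.Size([1, 64, 15, 15])

Matching the padding to the dilation (padding = 2 ** i for a 3 x 3 kernel) keeps all branches at the same spatial size, so enlarging the receptive field does not introduce misalignment between branches.
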
Figures: graphical abstract; (1) naive vs. dilated convolution; (2) densely connected convolutional block of DenseNet; (3) structures of three different blocks; (4) receptive fields of the third layer in the different blocks; (5) framework of PDCNet; (6-8) Indian Pines, Pavia University and Salinas Valley datasets; (9) OA for different numbers of training samples; (10) performance of BMNet, DCNet and PDCNet; (11-13) classification maps of different models on the IP, UP and SV datasets.
24 pages, 26957 KiB  
Article
CscGAN: Conditional Scale-Consistent Generation Network for Multi-Level Remote Sensing Image to Map Translation
by Yuanyuan Liu, Wenbin Wang, Fang Fang, Lin Zhou, Chenxing Sun, Ying Zheng and Zhanlong Chen
Remote Sens. 2021, 13(10), 1936; https://doi.org/10.3390/rs13101936 - 15 May 2021
Cited by 5 | Viewed by 3440
Abstract
Automatic remote sensing (RS) image to map translation is a crucial technology for intelligent tile map generation. Although existing methods based on a generative network (GAN) generated unannotated maps at a single level, they have limited capacity in handling multi-resolution map generation at different levels. To address the problem, we proposed a novel conditional scale-consistent generation network (CscGAN) to simultaneously generate multi-level tile maps from multi-scale RS images, using only a single and unified model. Specifically, the CscGAN first uses the level labels and map annotations as prior conditions to guide hierarchical feature learning with different scales. Then, a multi-scale discriminator and two multi-scale generators are introduced to describe both high-resolution and low-resolution representations, aiming to improve the similarity of generated maps and thus produce high-quality multi-level tile maps. Meanwhile, a level classifier is designed for further exploring the characteristics of tile maps at different levels. Moreover, the CscGAN is optimized by jointly multi-scale adversarial loss, level classification loss, and scale-consistent loss in an end-to-end manner. Extensive experiments on multiple datasets and study areas demonstrate that the CscGAN outperforms the state-of-the-art methods in multi-level map translation, with great robustness and efficiency. Full article
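
One plausible way to wire the level-label conditioning mentioned above is sketched below: the level label is embedded, broadcast spatially and concatenated with the RS image and map annotation before entering the generator. The embedding size, channel layout and level indexing are assumptions, not the CscGAN specification.

import torch
import torch.nn as nn

class LevelConditionedInput(nn.Module):
    # Embed the tile-map level, tile it over the image plane, and stack it onto the inputs
    def __init__(self, num_levels=5, embed_dim=8):
        super().__init__()
        self.embed = nn.Embedding(num_levels, embed_dim)

    def forward(self, rs_image, annotation, level):
        b, _, h, w = rs_image.shape
        cond = self.embed(level).view(b, -1, 1, 1).expand(-1, -1, h, w)
        return torch.cat([rs_image, annotation, cond], dim=1)

rs = torch.randn(2, 3, 256, 256)     # RS image
ann = torch.randn(2, 1, 256, 256)    # map annotation
lvl = torch.tensor([0, 3])           # levels 14 and 17, re-indexed from 0 (assumed)
print(LevelConditionedInput()(rs, ann, lvl).shape)  # -> torch.Size([2, 12, 256, 256])
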
Figures: (1) maps generated at multiple levels by different models; (2) examples from the maps dataset; (3) examples from the self-annotated RS-image-to-map dataset (levels 14-18); (4) architecture of the proposed CscGAN; (5) multi-scale generator; (6) multi-scale discriminator; (7) training procedure of the map-level classifier; (8) generation results of different methods on the maps dataset; (9) effect of the multi-scale generator; (10) confusion matrices of the level classifier on real and fake data; (11) ROC curve of the level classifier; (12) effect of the map-level classifier; (13-17) generation results at levels 14-18 in Songjiang, Pudong, Minhang and Qingpu Districts of Shanghai; (A1-A2) additional comparisons, including the Wuhan area.
22 pages, 29182 KiB  
Article
Generative Adversarial Learning in YUV Color Space for Thin Cloud Removal on Satellite Imagery
by Xue Wen, Zongxu Pan, Yuxin Hu and Jiayin Liu
Remote Sens. 2021, 13(6), 1079; https://doi.org/10.3390/rs13061079 - 12 Mar 2021
Cited by 34 | Viewed by 4641
Abstract
Clouds are one of the most serious disturbances when using satellite imagery for ground observations. The semi-translucent nature of thin clouds provides the possibility of 2D ground scene reconstruction based on a single satellite image. In this paper, we propose an effective framework for thin cloud removal involving two aspects: a network architecture and a training strategy. For the network architecture, a Wasserstein generative adversarial network (WGAN) in YUV color space called YUV-GAN is proposed. Unlike most existing approaches in RGB color space, our method performs end-to-end thin cloud removal by learning luminance and chroma components independently, which is efficient at reducing the number of unrecoverable bright and dark pixels. To preserve more detailed features, the generator adopts a residual encoding–decoding network without down-sampling and up-sampling layers, which effectively competes with a residual discriminator, encouraging the accuracy of scene identification. For the training strategy, a transfer-learning-based method was applied. Instead of using either simulated or scarce real data to train the deep network, adequate simulated pairs were used to train the YUV-GAN at first. Then, pre-trained convolutional layers were optimized by real pairs to encourage the applicability of the model to real cloudy images. Qualitative and quantitative results on RICE1 and Sentinel-2A datasets confirmed that our YUV-GAN achieved state-of-the-art performance compared with other approaches. Additionally, our method combining the YUV-GAN with a transfer-learning-based training strategy led to better performance in the case of scarce training data. Full article
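
The luminance/chroma separation described above rests on a standard RGB-to-YUV transform; a minimal sketch with BT.601 coefficients follows (the exact constants and normalization used by the authors may differ).

import torch

def rgb_to_yuv(rgb):
    # rgb: (B, 3, H, W) in [0, 1]; BT.601 luminance and chroma projections
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return torch.stack([y, u, v], dim=1)

def yuv_to_rgb(yuv):
    y, u, v = yuv[:, 0], yuv[:, 1], yuv[:, 2]
    r = y + v / 0.877
    b = y + u / 0.492
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return torch.stack([r, g, b], dim=1)

x = torch.rand(1, 3, 64, 64)
print((yuv_to_rgb(rgb_to_yuv(x)) - x).abs().max())  # round-trip error on the order of 1e-7

Working in YUV lets the network handle the luminance component, where the bright and dark pixels mentioned above live, independently of the chroma components.
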
Figures: graphical abstract; (1) overall framework; (2) failure cases of RSC-Net on bright and dark pixels; (3) residual block; (4) generator architecture; (5) discriminator architecture; (6) synthesis of cloudy images with Perlin fractal noise; (7-8) results reconstructed in different color spaces on RICE1 and Sentinel-2A samples; (9-10) results with different fidelity losses; (11) influence of adversarial training; (12) PSNR vs. training-set size; (13-14) comparison with DCP, McGAN and RSC-Net on the RICE1 and Sentinel-2A test sets; (15) heavy clouds and cloud shadows; (16) failure case with overly heavy clouds.
21 pages, 8976 KiB  
Article
Integrating Weighted Feature Fusion and the Spatial Attention Module with Convolutional Neural Networks for Automatic Aircraft Detection from SAR Images
by Jielan Wang, Hongguang Xiao, Lifu Chen, Jin Xing, Zhouhao Pan, Ru Luo and Xingmin Cai
Remote Sens. 2021, 13(5), 910; https://doi.org/10.3390/rs13050910 - 28 Feb 2021
Cited by 37 | Viewed by 3857
Abstract
The automatic detection of aircraft from SAR images is widely applied in both military and civil fields, but there are still considerable challenges. To address the high variety of aircraft sizes and complex background information in SAR images, a new fast detection framework based on convolutional neural networks is proposed, which achieves automatic and rapid detection of aircraft with high accuracy. First, the airport runway areas are detected, and the airport runway mask and the rectangular contour of the whole airport are generated. Then, a new deep neural network proposed in this paper, named Efficient Weighted Feature Fusion and Attention Network (EWFAN), is used to detect aircraft. EWFAN integrates the weighted feature fusion module, the spatial attention mechanism, and the CIF loss function. EWFAN can effectively reduce the interference of negative samples and enhance feature extraction, thereby significantly improving the detection accuracy. Finally, the airport runway mask is applied to the detected results to reduce false alarms and produce the final aircraft detection results. To evaluate the performance of the proposed framework, large-scale Gaofen-3 SAR images with 1 m resolution are utilized in the experiment. The detection rate and false alarm rate of our EWFAN algorithm are 95.4% and 3.3%, respectively, which outperforms EfficientDet and YOLOv4. In addition, the average test time with the proposed framework is only 15.40 s, indicating satisfying efficiency of automatic aircraft detection. Full article
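
The abstract does not spell out the weighted feature fusion module; a common formulation in EfficientDet-style detectors, fast normalized fusion with learnable non-negative scalars, is sketched below as an assumed stand-in.

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    # out = sum_i(w_i * x_i) / (sum_i w_i + eps), with w_i = ReLU(learnable scalar)
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        w = F.relu(self.weights)
        w = w / (w.sum() + self.eps)
        return sum(wi * f for wi, f in zip(w, feats))

p_td, p_bu = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)  # same-shape feature maps
print(WeightedFusion(2)([p_td, p_bu]).shape)  # -> torch.Size([1, 64, 32, 32])
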
Figures: graphical abstract; (1) the efficient framework for aircraft detection; (2) framework of the EWFAN algorithm; (3) weighted bi-directional feature pyramid network; (4) residual spatial attention module; (5) adaptively spatial feature fusion; (6) classification regression network; (7) aircraft aspect-ratio distribution; (8) problems of the IoU loss; (9) non-maximum suppression (NMS); (10) effectiveness of the airport detection algorithm; (11-13) detection results on Gaofen-3 airport scenes for EfficientDet, YOLOv4 and EWFAN.
32 pages, 10384 KiB  
Article
AVILNet: A New Pliable Network with a Novel Metric for Small-Object Segmentation and Detection in Infrared Images
by Ikhwan Song and Sungho Kim
Remote Sens. 2021, 13(4), 555; https://doi.org/10.3390/rs13040555 - 4 Feb 2021
Cited by 11 | Viewed by 3810
Abstract
Infrared small-object segmentation (ISOS) has a persistent trade-off problem: which comes first, recall or precision? Striking a fine balance between them is fundamentally important for obtaining the best performance in real applications such as surveillance, tracking, and many other fields related to infrared search and track. The F1-score may be a good evaluation metric for this problem. However, since the F1-score depends on a specific threshold value, it cannot reflect the user's requirements across various application environments, so several metrics are commonly used together. We therefore introduce F-area, a novel metric for a panoptic evaluation of average precision and F1-score, which simultaneously considers performance in terms of real applications and the potential capability of a model. Furthermore, we propose a new network, called the Amorphous Variable Inter-located Network (AVILNet), which has a pliable structure based on GridNet and is an ensemble network consisting of a main network and its sub-network. Compared with the state-of-the-art ISOS methods, our model achieved an AP of 51.69%, an F1-score of 63.03%, and an F-area of 32.58% on the International Conference on Computer Vision 2019 ISOS Single dataset using one generator. With dual generators, it achieved an AP of 53.6%, an F1-score of 60.99%, and an F-area of 32.69%, beating the existing best record (AP, 51.42%; F1-score, 57.04%; F-area, 29.33%).
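The abstract does not spell out how F-area combines the two metrics, but the reported numbers are consistent with it being the product of AP and F1 (for example, 0.5169 × 0.6303 ≈ 0.3258 and 0.5360 × 0.6099 ≈ 0.3269). The helper below is a small sketch under that assumption, not the authors' definition.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall at a fixed threshold."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def f_area(average_precision: float, f1: float) -> float:
    """Illustrative F-area: product of AP (threshold-free potential) and F1
    (performance at the chosen operating point). Assumed form, consistent
    with the paper's reported values."""
    return average_precision * f1

print(round(f_area(0.5169, 0.6303), 4))  # 0.3258, single-generator result
print(round(f_area(0.5360, 0.6099), 4))  # 0.3269, dual-generator result
```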
Figures:
Figure 1. Infrared small-object samples. Background clutter and sensor noise distort the objects of interest.
Figure 2. Model overview of DataLossGAN (ICCV 2019) [22]: on the left are two generators (G1 and G2), and on the right is one discriminator. In the two generators, the blue number within each layer is the dilation factor. For the discriminator, the height, width, and channel number of the output feature maps are marked beside each layer. The two generators compose a dual-learning system that concentrates on opposing objectives while sharing information (e.g., the L1-distance between G1 and G2 feature maps) with the discriminator to alleviate radically biased training.
Figure 3. AVILNet is inspired by GridNet [24]. The 'grid' structure is extremely pliable. In terms of network perception, the horizontal stream (black arrow pointing right) enforces local properties and the vertical stream (black arrow pointing downward) enforces global properties. To transform the grid structure, only the two parameters ('width' and 'height') need to be set.
Figure 4. The whole architectural overview of the proposed AVILNet.
Figure 5. Detailed overviews of CDB-L and CDDB-L. Unlike the study in [25], we take the last-fusion strategy, where in the final processing addition is modified to concatenation. The dilation layer in CDDB-L improves the quality of the segmentation task through alternative sampling [48]. The ablation study on the effects of diverse strategies is shown in Table 2. The blue number within the layer is the dilation factor.
Figure 6. Information flow of GridNet [24] and our method: (a) GridNet; (b) the feature-highway connections (blue arrows), which allow information to flow through the down-sampling block (DSB) and the up-sampling block (USB) in a grid pattern without entering the transition layer.
Figure 7. Detailed overviews of the down-sampling block (DSB) and the up-sampling block (USB). Instead of simple binary up-sampling, two convolutional processes operate. This strategy makes the feature-highway connections trainable.
Figure 8. (a) ResNext [41] and (b) our assistant network, which has a multi-scale attention-based ensemble decision system with feature-highway connections.
Figure 9. Understanding the meaning of the width and height of our generator. In terms of feature dimension, the generator can be divided into 6 floors (denoted as H); the number of floors equals the height of the generator (for instance, H1 is the 1st floor). In terms of the number of information up- and down-streams, the generator can be divided into 4 streams (denoted as W); each stream W combines a number of either USBs or DSBs.
Figure 10. Each phase can choose whether to resize the input image. Asymmetric Contextual Modulation (ACM) [31] takes the route of case 1, since it exploits the backbone and is constructed for large-scale object images. AVILNet takes the route of case 3, because it is constructed for small-object images from beginning to end.
Figure 11. Shuffle strategies for last-fusion: (a) direct through, (b) shuffle, and (c) grain-shuffle. Strategies (b) and (c) differentiate our ensemble assistant network and therefore lead to the poor performance shown in Table 1.
Figure 12. The performance of all state-of-the-art methods on the ICCV2019 ISOS Single dataset: (a) the proposed metric (F-area), (b) average precision, and (c) area under the ROC curve.
Figure 13. The results of CNN-based methods. Green, red, and yellow indicate true positive, false negative, and false positive, respectively. To make a binary map, a threshold value of 0.5 was applied to each method's confidence map. GT(D) and GT(S) indicate ground truth for detection and segmentation, respectively.
Figure 14. The results of handcraft-based methods.
Figure 15. Training and test loss graphs for AVILNet.
Figure 16. The performance of ablation studies D and T.
Figure 17. The performance of ablation study Q and the sub-networks.
Figure 18. The performance of all ablation studies.
Figure 19. Feature addition with attention versus plain feature addition: (a) weighted sum of features; (b) simple addition.
13 pages, 2698 KiB  
Article
A Novel Deeplabv3+ Network for SAR Imagery Semantic Segmentation Based on the Potential Energy Loss Function of Gibbs Distribution
by Yingying Kong, Yanjuan Liu, Biyuan Yan, Henry Leung and Xiangyang Peng
Remote Sens. 2021, 13(3), 454; https://doi.org/10.3390/rs13030454 - 28 Jan 2021
Cited by 21 | Viewed by 4515
Abstract
Synthetic aperture radar (SAR) provides rich information about the Earth's surface under all-weather, day-and-night conditions and is applied in many relevant fields. SAR imagery semantic segmentation, which can serve both as a final product for end users and as a fundamental procedure supporting other applications, is one of the most difficult challenges. This paper proposes an encoder–decoder network based on Deeplabv3+ to semantically segment SAR imagery. A new potential energy loss function based on the Gibbs distribution is proposed to establish the semantic dependence among different categories through the relationships among cliques in the neighborhood system. The paper also introduces an improved channel and spatial attention module into the Mobilenetv2 backbone to improve the recognition accuracy of small-object categories in SAR imagery. The experimental results show that the proposed method achieves the highest mean intersection over union (mIoU) and global accuracy (GA) with the least running time, which verifies its effectiveness.
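As a rough illustration of a Gibbs-style potential energy term (not the authors' exact loss), the sketch below sums pairwise clique potentials over the 4-neighborhood of a softmax output, with the class-compatibility matrix V treated as a hypothetical input; the energy is added to the usual cross-entropy.

```python
import torch
import torch.nn.functional as F

def pairwise_potential_energy(probs: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    """Generic Gibbs-style pairwise energy over 4-neighborhood cliques.

    probs: (B, C, H, W) softmax class probabilities.
    V:     (C, C) potential matrix; V[a, b] is the penalty for adjacent pixels
           taking classes a and b (hypothetical, not the paper's exact form).
    """
    right = torch.einsum('bchw,cd,bdhw->bhw', probs[:, :, :, :-1], V, probs[:, :, :, 1:])
    down  = torch.einsum('bchw,cd,bdhw->bhw', probs[:, :, :-1, :], V, probs[:, :, 1:, :])
    return right.mean() + down.mean()

# Toy usage: 3 classes, Potts-like potential that penalizes all disagreements equally.
B, C, H, W = 2, 3, 8, 8
logits = torch.randn(B, C, H, W, requires_grad=True)
labels = torch.randint(0, C, (B, H, W))
V = 1.0 - torch.eye(C)                      # off-diagonal pairs cost 1, same class costs 0
loss = F.cross_entropy(logits, labels) + 0.1 * pairwise_potential_energy(F.softmax(logits, dim=1), V)
loss.backward()
```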
Figures:
Figure 1. The structure of the Deeplabv3+ network.
Figure 2. The channel and spatial attention module.
Figure 3. The SAR imagery and its corresponding ground truth: (a) SAR imagery; (b) ground truth.
Figure 4. (a) The convergence of the proposed loss function; (b) the change in mIoU_cls.
Figure 5. The results of different networks: (a) SAR images; (b) ground truth; (c) Deeplabv3+–drn output; (d) Deeplabv3+–ResNet output; (e) Deeplabv3+–Mobilenetv2 output; (f) PSPNet output; (g) FCN output.
Figure 6. The results of the three different networks: (a) SAR imagery; (b) ground truth; (c) Deeplabv3–drn output; (d) Deeplabv3–ResNet output; (e) Deeplabv3–Mobilenetv2 output.
26 pages, 28935 KiB  
Article
Structured Object-Level Relational Reasoning CNN-Based Target Detection Algorithm in a Remote Sensing Image
by Bei Cheng, Zhengzhou Li, Bitong Xu, Xu Yao, Zhiquan Ding and Tianqi Qin
Remote Sens. 2021, 13(2), 281; https://doi.org/10.3390/rs13020281 - 14 Jan 2021
Cited by 26 | Viewed by 3358
Abstract
Deep learning technology has been extensively explored by existing methods to improve the performance of target detection in remote sensing images, owing to its powerful feature extraction and representation abilities. However, these methods usually focus on the interior features of the target but ignore the exterior semantic information around it, especially object-level relationships. Consequently, they fail to detect and recognize targets in complex backgrounds where multiple objects crowd together. To handle this problem, a diversified context information fusion framework based on a convolutional neural network (DCIFF-CNN) is proposed in this paper, which employs structured object-level relationships to improve target detection and recognition in complex backgrounds. DCIFF-CNN is composed of two successive sub-networks: a multi-scale local context region proposal network (MLC-RPN) and an object-level relationship context target detection network (ORC-TDN). The MLC-RPN relies on the fine-grained details of objects to generate candidate regions in the remote sensing image. The ORC-TDN then utilizes the spatial context information of objects to detect and recognize targets by integrating an attentional message integrated module (AMIM) and an object relational structured graph (ORSG). The AMIM is integrated into the feed-forward CNN to highlight useful object-level context information, while the ORSG builds relations between a set of objects by processing their appearance and geometric features. Finally, the target detection method based on DCIFF-CNN effectively represents the interior and exterior information of the target by exploiting both multiscale local context information and object-level relationships. Extensive experiments demonstrate that the proposed DCIFF-CNN improves target detection and recognition accuracy in complex backgrounds, showing superiority to other state-of-the-art methods.
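To illustrate the general idea of object-level relational reasoning (a sketch, not the exact AMIM/ORSG design), the module below lets each candidate object aggregate attention-weighted messages from the other objects in the scene, based on appearance-feature similarity; feature dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ObjectContextAttention(nn.Module):
    """Toy attention over object-level features: each candidate target
    aggregates messages from the other detected objects, weighted by
    appearance similarity (a simplified stand-in for object relational reasoning)."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, obj_feats: torch.Tensor) -> torch.Tensor:
        # obj_feats: (N, dim) appearance features of N objects in one image.
        q, k, v = self.q(obj_feats), self.k(obj_feats), self.v(obj_feats)
        attn = torch.softmax(q @ k.t() / k.shape[-1] ** 0.5, dim=-1)  # (N, N) relation weights
        context = attn @ v                                            # object-level messages
        return obj_feats + context                                    # fuse context with each object's own features

feats = torch.randn(6, 256)                  # e.g., 6 region proposals with 256-d features
print(ObjectContextAttention(256)(feats).shape)   # torch.Size([6, 256])
```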
Figures:
Figure 1. The contextual information in a remote sensing image.
Figure 2. The overall framework of the diversified context information fusion framework based on a convolutional neural network (DCIFF-CNN).
Figure 3. The general framework of the multi-scale local context region proposal network (MLC-RPN).
Figure 4. The graphical problem between target and objects.
Figure 5. Illustration of the gated recurrent unit (GRU).
Figure 6. Illustration of the attentional message integrated module (AMIM).
Figure 7. Object relational structured graph (ORSG).
Figure 8. Performance comparisons of twelve different methods in terms of average precision (AP) values: (a) airplane; (b) ship; (c) storage tank; (d) baseball diamond; (e) tennis court; (f) basketball court; (g) ground track field; (h) harbor; (i) bridge; (j) vehicle; (k) mAP over the ten target classes. All methods use the same dataset and data ratio (NWPU-VHR; train: 20%, val: 20%, test: 60%).
Figure 9. Some target detection results with the proposed approach.
Figure 10. Target detection results of different methods: (a) FRCNN-VGG; (b) YOLO3; (c) SSD; (d) DCIFF-CNN.
Figure 11. The precision-recall curves (PRCs) of the proposed method and the compared methods: (a) airplane; (b) ship; (c) storage tank; (d) baseball diamond; (e) tennis court; (f) basketball court; (g) ground track field; (h) harbor; (i) bridge; (j) vehicle.
Figure 12. The heat maps of the collected dataset: (a) airplane; (b) ship; (c) car.
Figure 13. The average precision of target detection on the collected dataset.
Figure 14. The detection results of ten categories of targets by MLC-RPN and SSD (evaluation metric: AP; dataset: NWPU-VHR; train: 20%, val: 20%, test: 60%).
Figure 15. The AP value for different targets under various context fusion networks (evaluation metric: AP; dataset: NWPU-VHR; train: 20%, val: 20%, test: 60%).
Figure 16. Feature maps of conv5 in three networks: (a) input images; (b) without ORSG and AMIM; (c) with ORSG and max-pooling; (d) with ORSG and average pooling; (e) with ORSG and AMIM.
25 pages, 6240 KiB  
Article
Adaptive Weighting Feature Fusion Approach Based on Generative Adversarial Network for Hyperspectral Image Classification
by Hongbo Liang, Wenxing Bao and Xiangfei Shen
Remote Sens. 2021, 13(2), 198; https://doi.org/10.3390/rs13020198 - 8 Jan 2021
Cited by 16 | Viewed by 3982
Abstract
Recently, generative adversarial network (GAN)-based methods for hyperspectral image (HSI) classification have attracted research attention due to their ability to alleviate the challenges brought by having limited labeled samples. However, several studies have demonstrated that existing GAN-based HSI classification methods are limited by redundant spectral knowledge and cannot extract discriminative characteristics, which affects classification performance. In addition, GAN-based methods always suffer from mode collapse, which seriously hinders their development. In this study, we propose a semi-supervised adaptive weighting feature fusion generative adversarial network (AWF2-GAN) to alleviate these problems, introducing unlabeled data to address the issue of having a small number of samples. First, to build valid spectral–spatial feature engineering, the discriminator learns both the dense global spectrum and the neighboring separable spatial context via well-designed extractors. Second, a lightweight adaptive feature weighting component is proposed for feature fusion; it considers four predictive fusion options, that is, adding or concatenating feature maps with similar or adaptive weights. Finally, to counter mode collapse, the proposed AWF2-GAN combines a supervised central loss and an unsupervised mean minimization loss for optimization. Quantitative results on two HSI datasets show that AWF2-GAN achieves superior performance over state-of-the-art GAN-based methods.
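As an illustration of the adaptive-weight fusion options mentioned above (a sketch under assumed layer sizes, not the paper's implementation), the module below predicts per-sample branch weights from global statistics of the spectral and spatial feature maps and then either adds or concatenates the rescaled maps.

```python
import torch
import torch.nn as nn

class AdaptiveWeightingFusion(nn.Module):
    """Illustrative adaptive-weight fusion: the two branch features are rescaled
    by weights predicted from their own global statistics, then added or
    concatenated depending on the flag."""
    def __init__(self, channels: int, concat: bool = False):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, 2),   # one weight per branch
            nn.Softmax(dim=-1),
        )
        self.concat = concat

    def forward(self, spectral_feat, spatial_feat):
        # Global average pooling of both branches drives the weighting.
        stats = torch.cat([spectral_feat.mean(dim=(2, 3)), spatial_feat.mean(dim=(2, 3))], dim=1)
        w = self.gate(stats)                               # (B, 2), adaptive per sample
        a = w[:, 0, None, None, None] * spectral_feat
        b = w[:, 1, None, None, None] * spatial_feat
        return torch.cat([a, b], dim=1) if self.concat else a + b

fuse = AdaptiveWeightingFusion(channels=128, concat=False)
spectral = torch.randn(4, 128, 9, 9)   # hypothetical dense-spectrum branch output
spatial = torch.randn(4, 128, 9, 9)    # hypothetical separable-spatial branch output
print(fuse(spectral, spatial).shape)   # torch.Size([4, 128, 9, 9])
```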
Figures:
Figure 1. Architecture of ACGAN for HSI classification [44].
Figure 2. Framework of the AWF2-GAN for HSI classification.
Figure 3. Basic fusion models with (a) element-wise addition and (b) feature concatenation.
Figure 4. Adaptive weighting fusion models with (a) element-wise addition and (b) feature concatenation.
Figure 5. Adaptive weighting feature fusion discriminator (upper), consisting of a dense spectral and a spatially separable feature extractor. Their resulting features are fed into an adaptive weighting fusion model, which outputs a vector that indicates whether the data is fake or real and contains categorical probabilities. The generator (lower) contains consecutive spatial and spectral feature generation blocks to generate the synthetic HSI cuboid Z.
Figure 6. Indian Pines data: (a) color composite with RGB bands (29, 19, 9); (b) ground truth; (c) category names with labeled samples.
Figure 7. Pavia University imagery: (a) color composite with RGB bands (61, 25, 13); (b) ground truth; (c) class names with available samples.
Figure 8. Classification maps for the IN dataset with 525 labeled training samples: (a) training samples; (b) SVM (EMAPs); (c) HS-GAN; (d) 3D-GAN; (e) SS-GAN; (f) AD-GAN; (g) F2-Concat.; (h) F2-Add.; (i) AWF2-Concat.; (j) AWF2-Add.
Figure 9. Classification maps for the UP dataset with 350 labeled training samples: (a) training samples; (b) SVM (EMAPs); (c) HS-GAN; (d) 3D-GAN; (e) SS-GAN; (f) AD-GAN; (g) F2-Concat.; (h) F2-Add.; (i) AWF2-Concat.; (j) AWF2-Add.
Figure 10. Overall accuracies of AWF2-Add.-GAN with various kernel settings and numbers of neurons in the spectral and spatial feature extractors, sampled on the two training datasets: (a) effect of the number of kernels; (b) kernel sizes; (c) number of neurons for spectral purity analysis.
Figure 11. Overall accuracies for different depths of the two feature extractors (3 & 3, 3 & 4, 4 & 4, 4 & 5, and 5 & 5, respectively): (a) on the Indian Pines dataset; (b) on the Pavia University dataset.
23 pages, 8782 KiB  
Article
HRCNet: High-Resolution Context Extraction Network for Semantic Segmentation of Remote Sensing Images
by Zhiyong Xu, Weicun Zhang, Tianxiang Zhang and Jiangyun Li
Remote Sens. 2021, 13(1), 71; https://doi.org/10.3390/rs13010071 - 27 Dec 2020
Cited by 119 | Viewed by 10369
Abstract
Semantic segmentation is a significant method in remote sensing image (RSI) processing and has been widely used in various applications. Conventional convolutional neural network (CNN)-based semantic segmentation methods are likely to lose spatial information in the feature extraction stage and usually pay little attention to global context information. Moreover, the imbalance of category scales and uncertain boundary information in RSIs make the semantic segmentation task even more challenging. To overcome these problems, a high-resolution context extraction network (HRCNet) based on a high-resolution network (HRNet) is proposed in this paper. In this approach, the HRNet structure is adopted to preserve spatial information. Moreover, a light-weight dual attention (LDA) module is designed to obtain global context information in the feature extraction stage, and a feature enhancement feature pyramid (FEFP) structure is introduced and employed to fuse contextual information at different scales. In addition, to exploit boundary information, we design a boundary aware (BA) module combined with a boundary aware loss (BAloss) function. The experimental results on the Potsdam and Vaihingen datasets show that the proposed approach significantly improves boundary and segmentation performance, reaching overall accuracy scores of 92.0% and 92.3%, respectively. It is therefore envisaged that the proposed HRCNet model will be advantageous for remote sensing image segmentation.
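A minimal sketch of a boundary-aware loss in the spirit of BAloss follows (the exact formulation is not given in the abstract, so this is only one plausible form): a boundary mask is extracted from the ground truth with a simple morphological gradient and used to up-weight the per-pixel cross-entropy near class boundaries.

```python
import torch
import torch.nn.functional as F

def boundary_weighted_ce(logits, labels, num_classes, boundary_weight=2.0, k=3):
    """Illustrative boundary-aware loss: pixels near class boundaries in the
    ground truth receive a larger cross-entropy weight. Boundary extraction
    uses dilation minus erosion (max-pooling morphological gradient)."""
    onehot = F.one_hot(labels, num_classes).permute(0, 3, 1, 2).float()
    dilated = F.max_pool2d(onehot, kernel_size=k, stride=1, padding=k // 2)
    eroded = -F.max_pool2d(-onehot, kernel_size=k, stride=1, padding=k // 2)
    boundary = ((dilated - eroded).sum(dim=1) > 0).float()     # (B, H, W) boundary mask
    weights = 1.0 + (boundary_weight - 1.0) * boundary
    ce = F.cross_entropy(logits, labels, reduction='none')     # per-pixel cross-entropy
    return (weights * ce).mean()

logits = torch.randn(2, 6, 64, 64, requires_grad=True)         # 6 classes, e.g., Potsdam labels
labels = torch.randint(0, 6, (2, 64, 64))
loss = boundary_weighted_ce(logits, labels, num_classes=6)
loss.backward()
```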
Figures:
Figure 1. The overall framework of our model.
Figure 2. The architecture of HRNet. The rectangular blocks represent feature maps and '⟶' represents the convolution operation; Stem is the downsampling process.
Figure 3. The overall architecture, divided into three parts: from left to right, the backbone, the segmentation head, and the loss functions.
Figure 4. The light-weight dual attention (LDA) module applied to the four stages (Stage1, Stage2, Stage3, and Stage4).
Figure 5. Detailed design of the LDA module.
Figure 6. Framework of the feature enhancement feature pyramid (FEFP) module.
Figure 7. Framework of the designed multiple loss functions.
Figure 8. Loss and accuracy during the training process.
Figure 9. Comparison of the three postprocessing methods on the Potsdam dataset.
Figure 10. Prediction maps of the compared methods on the Potsdam dataset. "†" denotes data augmentation (flip testing).
Figure 11. Prediction maps of the above methods on the Vaihingen dataset. "†" denotes data augmentation (flip and multi-scale testing).
Figure 12. The proposed modules significantly improve the segmentation of large objects; for small objects, the boundary segmentation is smoother.
19 pages, 2301 KiB  
Article
Air Pollution Prediction with Multi-Modal Data and Deep Neural Networks
by Jovan Kalajdjieski, Eftim Zdravevski, Roberto Corizzo, Petre Lameski, Slobodan Kalajdziski, Ivan Miguel Pires, Nuno M. Garcia and Vladimir Trajkovik
Remote Sens. 2020, 12(24), 4142; https://doi.org/10.3390/rs12244142 - 18 Dec 2020
Cited by 75 | Viewed by 8957
Abstract
Air pollution is becoming a serious environmental problem, especially in urban areas affected by increasing migration rates. The large availability of sensor data enables the adoption of analytical tools to provide decision support capabilities. Employing sensors facilitates air pollution monitoring, but the lack of predictive capability limits such systems' potential in practical scenarios. On the other hand, forecasting methods offer the opportunity to predict future pollution in specific areas, potentially suggesting useful preventive measures. To date, many works have tackled the problem of air pollution forecasting, most of which are based on sequence models trained with raw pollution data and subsequently used to make predictions. This paper proposes a novel approach that evaluates four different architectures using camera images to estimate the air pollution in those areas; the images are further enhanced with weather data to boost classification accuracy. The proposed approach exploits generative adversarial networks combined with data augmentation techniques to mitigate the class imbalance problem. The experiments show that the proposed method achieves a robust accuracy of up to 0.88, which is comparable to sequence models and conventional models that utilize air pollution data directly. This is a remarkable result considering that historic air pollution data is directly related to the output (future air pollution data), whereas the proposed architecture uses camera images to recognize air pollution, an inherently much more difficult problem.
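A minimal sketch of the multi-modal idea follows (illustrative architecture and sizes, not one of the four evaluated networks): a small CNN encodes the camera image, its embedding is concatenated with a weather feature vector, and a classifier head predicts the pollution class.

```python
import torch
import torch.nn as nn

class ImageWeatherClassifier(nn.Module):
    """Sketch of image + weather fusion for pollution-class prediction: the
    CNN embedding of the camera image is concatenated with tabular weather
    features (e.g., temperature, humidity, wind speed) before classification."""
    def __init__(self, num_weather_features: int = 4, num_classes: int = 6):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),      # -> (B, 32) image embedding
        )
        self.head = nn.Sequential(
            nn.Linear(32 + num_weather_features, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, image, weather):
        fused = torch.cat([self.cnn(image), weather], dim=1)
        return self.head(fused)

model = ImageWeatherClassifier()                       # 6 classes, as in the 6-class experiment
logits = model(torch.randn(8, 3, 128, 128), torch.randn(8, 4))
print(logits.shape)                                    # torch.Size([8, 6])
```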
Figures:
Figure 1. Location of the camera, sensor, and weather station shown on the map of Skopje.
Figure 2. Workflow of the proposed methodology for air pollution prediction.
Figure 3. Basic convolutional block.
Figure 4. Basic convolutional neural network model architecture.
Figure 5. Residual block.
Figure 6. ResNet convolutional block.
Figure 7. ResNet architecture [21].
Figure 8. Inception block [50].
Figure 9. Inception architecture [21].
Figure 10. Proposed custom pretrained Inception.
Figure 11. Exemplary images in the dataset across the different classes during the day.
Figure 12. Exemplary images in the dataset across the different classes during the night or in other weather conditions with limited visibility.
Figure 13. Accuracy of the different architectures on 6-class classification.
Figure 14. Accuracy of the different architectures on binary classification.

Review


22 pages, 5900 KiB  
Review
Effect of Attention Mechanism in Deep Learning-Based Remote Sensing Image Processing: A Systematic Literature Review
by Saman Ghaffarian, João Valente, Mariska van der Voort and Bedir Tekinerdogan
Remote Sens. 2021, 13(15), 2965; https://doi.org/10.3390/rs13152965 - 28 Jul 2021
Cited by 120 | Viewed by 14652
Abstract
Machine learning, particularly deep learning (DL), has become a central and state-of-the-art method for several computer vision applications and remote sensing (RS) image processing. Researchers continually try to improve the performance of DL methods by developing new architectural designs and/or new techniques, such as attention mechanisms. Since the attention mechanism was proposed, it has, regardless of its type, been increasingly used in diverse RS applications to improve the performance of existing DL methods. However, these methods are scattered over different studies, impeding the selection and application of feasible approaches. This study provides an overview of the developed attention mechanisms and how to integrate them with different deep learning neural network architectures. In addition, it investigates the effect of the attention mechanism on deep learning-based RS image processing. We identified and analyzed the advances in the corresponding attention mechanism-based deep learning (At-DL) methods. A systematic literature review was performed to identify the trends in publications, publishers, improved DL methods, data types used, attention types used, and overall accuracies achieved with At-DL methods, and to extract current research directions, weaknesses, and open problems, providing insights and recommendations for future studies. For this, five main research questions were formulated to extract the required data and information from the literature. Furthermore, we categorized the papers by the addressed RS image processing task (e.g., image classification, object detection, and change detection) and discussed the results within each group. In total, 270 papers were retrieved, of which 176 were selected according to the defined exclusion criteria for further analysis and detailed review. The results reveal that most of the papers reported an increase in overall accuracy when using the attention mechanism within DL methods for image classification, image segmentation, change detection, and object detection on remote sensing images.
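For readers new to the two attention families the review distinguishes (channel and spatial attention, cf. Figure 2), the compact CBAM-style sketch below shows how each can be attached to a CNN feature map. It is a generic example with illustrative sizes, not taken from any specific reviewed paper.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Compact example of the two attention types: a channel attention that
    rescales the feature map per channel, followed by a spatial attention
    that rescales it per pixel (CBAM-style sketch)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_gate(x)[:, :, None, None]                 # channel attention
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.max(dim=1, keepdim=True).values], dim=1) # (B, 2, H, W)
        return x * self.spatial_gate(pooled)                           # spatial attention

features = torch.randn(2, 64, 32, 32)
print(ChannelSpatialAttention(64)(features).shape)   # torch.Size([2, 64, 32, 32])
```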
Figures:
Figure 1. An overview of typical attention mechanism approaches [21].
Figure 2. A simple illustration of the channel and spatial attention types/networks and their effects on the feature maps.
Figure 3. An example of adding an attention network (i.e., co-attention) to a CNN module (i.e., a Siamese network) for building-based change detection [51]. CoA: co-attention module; At: attention network; CR: change residual module.
Figure 4. An example of adding spatial and channel attention to a GAN module for building detection from aerial images [75]. A: max pooling layer; B: convolution + batch normalization + rectified linear unit (ReLU) layers; C: upsampling layer; D: concatenation operation; SA: spatial attention mechanism; CA: channel attention mechanism; RS: reshape operation.
Figure 5. An example of adding attention networks (i.e., spatial and channel attention) to an RNN + CNN module for hyperspectral image classification [79]. PCA: principal component analysis.
Figure 6. An example of adding an attention network to a GNN module for multi-label RS image classification [82].
Figure 7. Year-wise classification of the papers, categorized by the attention mechanism type used.
Figure 8. The number of publications for different study targets.
Figure 9. The DL algorithms improved with attention mechanisms in the papers.
Figure 10. The attention mechanism types used in the papers.
Figure 11. The datasets used in the papers.
Figure 12. The spatial resolution of the RS images used in the papers.
Figure 13. The accuracy of the developed At-DL methods for different tasks in the papers.
Figure 14. The effect of using attention mechanisms within DL algorithms in terms of accuracy for different tasks in the papers.