Search Results (185)

Search Parameters:
Keywords = ISPRS

18 pages, 232655 KiB  
Article
SFA-Net: Semantic Feature Adjustment Network for Remote Sensing Image Segmentation
by Gyutae Hwang, Jiwoo Jeong and Sang Jun Lee
Remote Sens. 2024, 16(17), 3278; https://doi.org/10.3390/rs16173278 - 3 Sep 2024
Viewed by 625
Abstract
Advances in deep learning and computer vision techniques have made impacts in the field of remote sensing, enabling efficient data analysis for applications such as land cover classification and change detection. Convolutional neural networks (CNNs) and transformer architectures have been utilized in visual perception algorithms due to their effectiveness in analyzing local features and global context. In this paper, we propose a hybrid transformer architecture that consists of a CNN-based encoder and transformer-based decoder. We propose a feature adjustment module that refines the multiscale feature maps extracted from an EfficientNet backbone network. The adjusted feature maps are integrated into the transformer-based decoder to perform the semantic segmentation of the remote sensing images. This paper refers to the proposed encoder–decoder architecture as a semantic feature adjustment network (SFA-Net). To demonstrate the effectiveness of the SFA-Net, experiments were thoroughly conducted with four public benchmark datasets, including the UAVid, ISPRS Potsdam, ISPRS Vaihingen, and LoveDA datasets. The proposed model achieved state-of-the-art accuracy on the UAVid, ISPRS Vaihingen, and LoveDA datasets for the segmentation of the remote sensing images. On the ISPRS Potsdam dataset, our method achieved comparable accuracy to the latest model while reducing the number of trainable parameters from 113.8 M to 10.7 M. Full article
(This article belongs to the Special Issue Deep Learning for Remote Sensing and Geodata)
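As a rough illustration of the hybrid design the abstract describes (a CNN encoder whose multiscale features are adjusted and then decoded by a transformer), the following minimal PyTorch sketch shows the general pattern. It is not the authors' SFA-Net code: the module names, channel sizes, and the tiny convolutional encoder standing in for the EfficientNet backbone are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in CNN encoder producing two feature scales (EfficientNet in the paper)."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        f1 = self.stage1(x)   # 1/2 resolution
        f2 = self.stage2(f1)  # 1/4 resolution
        return [f1, f2]

class FeatureAdjustment(nn.Module):
    """Illustrative 'adjustment': project each scale to a common width and re-weight channels."""
    def __init__(self, in_channels, out_channels=64):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, out_channels, 1)
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(out_channels, out_channels, 1), nn.Sigmoid())

    def forward(self, x):
        x = self.proj(x)
        return x * self.gate(x)

class HybridSegNet(nn.Module):
    """CNN encoder -> feature adjustment -> transformer over tokens -> per-pixel classes."""
    def __init__(self, num_classes=6, dim=64):
        super().__init__()
        self.encoder = TinyEncoder()
        self.adjust = nn.ModuleList([FeatureAdjustment(32, dim), FeatureAdjustment(64, dim)])
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Conv2d(dim, num_classes, 1)

    def forward(self, x):
        feats = [adj(f) for adj, f in zip(self.adjust, self.encoder(x))]
        h, w = feats[-1].shape[-2:]
        # Bring both scales to 1/4 resolution and sum them.
        fused = sum(F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
                    for f in feats)
        tokens = fused.flatten(2).transpose(1, 2)   # (B, H*W, C)
        tokens = self.decoder(tokens)
        fused = tokens.transpose(1, 2).reshape(-1, fused.shape[1], h, w)
        logits = self.head(fused)
        return F.interpolate(logits, scale_factor=4, mode="bilinear", align_corners=False)

if __name__ == "__main__":
    out = HybridSegNet()(torch.randn(1, 3, 128, 128))
    print(out.shape)  # torch.Size([1, 6, 128, 128])
```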
Figures

Figure 1: The overall architecture of the SFA-Net.
Figure 2: Transformer-based decoder. (a–c) present the decoder block, weighted function, and feature refinement head, respectively.
Figure 3: Visualization of the segmentation results on the UAVid dataset. (a) is ID 000300 in sequence 23, (b) is ID 000500 in sequence 28, (c) is ID 000000 in sequence 30, and (d) is ID 000500 in sequence 39.
Figure 4: Visualization of the entire set of IDs and their segmentation results on the ISPRS Potsdam dataset. (a) is ID 3_14 and (b) is ID 5_13.
Figure 5: Visualization of the segmentation results on the ISPRS Potsdam dataset. (a) is the 10th split of ID 3_14, (b) is the 12th split of ID 5_13, (c) is the 10th split of ID 6_14, and (d) is the 22nd split of ID 7_13.
Figure 6: Visualization of the entire set of IDs and their segmentation results on the ISPRS Vaihingen test dataset. (a) is area 6 and (b) is area 27.
Figure 7: Visualization of the segmentation results on the ISPRS Vaihingen dataset. (a) is the 4th split of area 31, (b) is the 5th split of area 33, (c) is the 4th split of area 38, and (d) is the 11th split of area 38.
Figure 8: Visualization of the segmentation results on the LoveDA dataset. (a) is 4430, (b) is 4378, and (c) is 5458.
Figure 9: Complexity vs. performance for each dataset. The horizontal and vertical axes denote FLOPs and the evaluation metric, respectively, and the bubble diameter denotes the number of parameters.
20 pages, 3112 KiB  
Article
Fast Semantic Segmentation of Ultra-High-Resolution Remote Sensing Images via Score Map and Fast Transformer-Based Fusion
by Yihao Sun, Mingrui Wang, Xiaoyi Huang, Chengshu Xin and Yinan Sun
Remote Sens. 2024, 16(17), 3248; https://doi.org/10.3390/rs16173248 - 2 Sep 2024
Viewed by 600
Abstract
For ultra-high-resolution (UHR) image semantic segmentation, striking a balance between computational efficiency and storage space is a crucial research direction. This paper proposes a Feature Fusion Network (EFFNet) to improve UHR image semantic segmentation performance. EFFNet designs a score map that can be embedded into the network for training purposes, enabling the selection of the most valuable features to reduce storage consumption, accelerate speed, and enhance accuracy. In the fusion stage, we improve upon previous redundant multiple feature fusion methods by utilizing a transformer structure for one-time fusion. Additionally, our combination of the transformer structure and multibranch structure allows it to be employed for feature fusion, significantly improving accuracy while ensuring calculations remain within an acceptable range. We evaluated EFFNet on the ISPRS two-dimensional semantic labeling Vaihingen and Potsdam datasets, demonstrating that its architecture offers an exceptionally effective solution with outstanding semantic segmentation precision and optimized inference speed. EFFNet substantially enhances critical performance metrics such as Intersection over Union (IoU), overall accuracy, and F1-score, highlighting its superiority as an architectural innovation in ultra-high-resolution remote sensing image semantic segmentation. Full article
(This article belongs to the Special Issue Deep Learning for Satellite Image Segmentation)
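The score-map idea in the abstract (a learned map that keeps only the most valuable local features) can be illustrated with a short, hedged PyTorch sketch. The 1 × 1 scoring convolution and the top-k keep ratio below are assumptions, not the published EFFNet implementation.

```python
import torch
import torch.nn as nn

class ScoreMapSelector(nn.Module):
    """Scores every spatial location with a sigmoid map and keeps only the top-k features."""
    def __init__(self, channels, keep_ratio=0.25):
        super().__init__()
        self.score = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())
        self.keep_ratio = keep_ratio

    def forward(self, feat):                      # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        scores = self.score(feat).flatten(1)      # (B, H*W), values in [0, 1]
        k = max(1, int(self.keep_ratio * h * w))
        topk_scores, idx = scores.topk(k, dim=1)  # indices of the most valuable locations
        flat = feat.flatten(2)                    # (B, C, H*W)
        idx = idx.unsqueeze(1).expand(-1, c, -1)  # (B, C, k)
        selected = flat.gather(2, idx)            # (B, C, k) features kept for fusion
        return selected, topk_scores

if __name__ == "__main__":
    selector = ScoreMapSelector(channels=64)
    kept, scores = selector(torch.randn(2, 64, 32, 32))
    print(kept.shape, scores.shape)  # torch.Size([2, 64, 256]) torch.Size([2, 256])
```

Discarding low-score locations before fusion is what reduces the storage and compute cost that the abstract highlights.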
Figures

Figure 1: Overview of the Efficient Feature Fusion Network (EFFNet). The network utilizes cropped and downsampled full-resolution patches and consists of both local and global branches. After passing the local feature maps through ResNet and applying one-dimensional convolution, the score map module employs a Sigmoid activation function to extract significant local features. These local features are then efficiently fused with global features using a fast fusion mechanism, resulting in a high-resolution and information-rich feature map that is utilized for final semantic segmentation. The objective is to enhance the accuracy and efficiency of the network by introducing two attention-based modules designed to reduce the processing load of local features while improving feature matching across samples.
Figure 2: Score map module. The input image is subjected to two successive layers of 3 × 3 convolutions using ResNet, resulting in a feature map of dimensions H × W × 1. Following the Sigmoid activation, the feature map is indexed, and high-value features are selectively retained.
Figure 3: The fast fusion mechanism facilitates seamless integration between the global and local branches, enabling extensive collaboration through the fusion of feature maps at each layer using multiple attention weights. The model's depth determines the number of layers, while the merging process occurs N times based on the number of cropped global patches. These attention weights are computed by leveraging local and global features such as Q, K, and V. The optimization objective in this study encompasses a primary loss derived from the merged results along with two additional losses.
Figure 4: GPU inference frames per second (FPS) and mean Intersection over Union (mIoU) accuracy evaluated on the (a) Vaihingen and (b) Potsdam datasets. EFFNet (represented by red dots) outperforms existing networks, including GLNet, in terms of both inference speed and accuracy for segmenting ultra-high-resolution images.
Figure 5: Semantic segmentation results when adopting different modules on (a) the Vaihingen and (b) the Potsdam datasets.
Figure 6: Ablation study of different transformer locations.
Figure 7: Comparison of semantic segmentation results with and without the score map on the (a) Vaihingen and (b) Potsdam datasets.
Figure 8: Ablation study of different numbers of patches.
23 pages, 2671 KiB  
Article
Multi-View Feature Fusion and Rich Information Refinement Network for Semantic Segmentation of Remote Sensing Images
by Jiang Liu, Shuli Cheng and Anyu Du
Remote Sens. 2024, 16(17), 3184; https://doi.org/10.3390/rs16173184 - 28 Aug 2024
Viewed by 339
Abstract
Semantic segmentation is currently a hot topic in remote sensing image processing. There are extensive applications in land planning and surveying. Many current studies combine Convolutional Neural Networks (CNNs), which extract local information, with Transformers, which capture global information, to obtain richer information. However, the fused feature information is not sufficiently enriched and it often lacks detailed refinement. To address this issue, we propose a novel method called the Multi-View Feature Fusion and Rich Information Refinement Network (MFRNet). Our model is equipped with the Multi-View Feature Fusion Block (MAFF) to merge various types of information, including local, non-local, channel, and positional information. Within MAFF, we introduce two innovative methods. The Sliding Heterogeneous Multi-Head Attention (SHMA) extracts local, non-local, and positional information using a sliding window, while the Multi-Scale Hierarchical Compressed Channel Attention (MSCA) leverages bar-shaped pooling kernels and stepwise compression to obtain reliable channel information. Additionally, we introduce the Efficient Feature Refinement Module (EFRM), which enhances segmentation accuracy by interacting the results of the Long-Range Information Perception Branch and the Local Semantic Information Perception Branch. We evaluate our model on the ISPRS Vaihingen and Potsdam datasets. We conducted extensive comparison experiments with state-of-the-art models and verified that MFRNet outperforms other models. Full article
(This article belongs to the Special Issue Image Enhancement and Fusion Techniques in Remote Sensing)
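As a hedged illustration of the "bar-shaped pooling" idea mentioned for MSCA, the sketch below shows a generic strip-pooling attention block in PyTorch: features are compressed along horizontal and vertical strips and then used to re-weight the input. The reduction factor, the compression layer, and the way the two strips are combined are assumptions, not the authors' module.

```python
import torch
import torch.nn as nn

class StripChannelAttention(nn.Module):
    """Pools features with horizontal and vertical bar-shaped kernels and uses the
    compressed strips to produce channel-wise, position-aware attention weights."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = max(channels // reduction, 8)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # -> (B, C, 1, W)
        self.compress = nn.Sequential(nn.Conv2d(channels, mid, 1), nn.ReLU())
        self.expand_h = nn.Conv2d(mid, channels, 1)
        self.expand_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):                               # x: (B, C, H, W)
        h_strip = self.compress(self.pool_h(x))         # (B, mid, H, 1)
        w_strip = self.compress(self.pool_w(x))         # (B, mid, 1, W)
        attn = torch.sigmoid(self.expand_h(h_strip)) * torch.sigmoid(self.expand_w(w_strip))
        return x * attn                                 # broadcasting yields (B, C, H, W)

if __name__ == "__main__":
    attn = StripChannelAttention(channels=64)
    print(attn(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```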
Figures

Graphical abstract

Figure 1: Summary of the characteristics of remote sensing images. The car types have different shapes, with RVs and containers having similar shapes.
Figure 2: The overall structure of MFRNet, containing a CNN encoder, our designed decoder MAFF, and the feature refinement module EFRM.
Figure 3: Specific design of MAFF and SSH. (a) Multi-View Feature Fusion Block (MAFF); (b) Semantic Segmentation Header (SSH).
Figure 4: Heatmap demonstration of the Multi-View Feature Fusion Block (MAFF) and Efficient Feature Refinement Module (EFRM).
Figure 5: The specific implementation of the Sliding Heterogeneous Multi-Head Attention (SHMA).
Figure 6: The specific design of the Multi-Scale Hierarchical Compressed Channel Attention (MSCA).
Figure 7: Detailed structure of the Efficient Feature Refinement Module (EFRM).
Figure 8: Visual results of the ablation experiment, with GT being ground truth.
Figure 9: Ablation results for SHMA window sizes on the Vaihingen and Potsdam datasets. Window sizes are shown on the horizontal axis and metric values (in percent) on the vertical axis.
Figure 10: Ablation results of MSCA compared to the use of Global Average Pooling (GAP).
Figure 11: Segmentation results on the Vaihingen dataset compared to other network models, with the blue box in the RGB image corresponding to the red box in the segmentation result, and GT indicating ground truth.
Figure 12: Comparison of segmentation results on the Potsdam dataset with other network models, where the blue box in the RGB image corresponds to the red box in the segmentation result, and GT represents ground truth.
Figure 13: Validation and training accuracy graphs.
2 pages, 613 KiB  
Correction
Correction: Agriesti et al. Assignment of a Synthetic Population for Activity-Based Modeling Employing Publicly Available Data. ISPRS Int. J. Geo-Inf. 2022, 11, 148
by Serio Agriesti, Claudio Roncoli and Bat-hen Nahmias-Biran
ISPRS Int. J. Geo-Inf. 2024, 13(8), 284; https://doi.org/10.3390/ijgi13080284 - 13 Aug 2024
Viewed by 331
Abstract
In the original publication [...] Full article
Figures

Figure 12: (b) Workers and their districts of residence as the origin (color-coded).
25 pages, 4045 KiB  
Article
MBT-UNet: Multi-Branch Transform Combined with UNet for Semantic Segmentation of Remote Sensing Images
by Bin Liu, Bing Li, Victor Sreeram and Shuofeng Li
Remote Sens. 2024, 16(15), 2776; https://doi.org/10.3390/rs16152776 - 29 Jul 2024
Viewed by 590
Abstract
Remote sensing (RS) images play an indispensable role in many key fields such as environmental monitoring, precision agriculture, and urban resource management. Traditional deep convolutional neural networks have the problem of limited receptive fields. To address this problem, this paper introduces a hybrid network model that combines the advantages of CNN and Transformer, called MBT-UNet. First, a multi-branch encoder design based on the pyramid vision transformer (PVT) is proposed to effectively capture multi-scale feature information; second, an efficient feature fusion module (FFM) is proposed to optimize the collaboration and integration of features at different scales; finally, in the decoder stage, a multi-scale upsampling module (MSUM) is proposed to further refine the segmentation results and enhance segmentation accuracy. We conduct experiments on the ISPRS Vaihingen dataset, the Potsdam dataset, the LoveDA dataset, and the UAVid dataset. Experimental results show that MBT-UNet surpasses state-of-the-art algorithms in key performance indicators, confirming its superior performance in high-precision remote sensing image segmentation tasks. Full article
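As a hedged sketch of the generic multi-scale fusion step that an FFM-style module performs (aligning features from different pyramid stages to one resolution and mixing them), the following PyTorch snippet shows one common way to do it. The channel counts and the concatenation-plus-1×1-convolution choice are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFeatureFusion(nn.Module):
    """Upsamples multi-scale pyramid features to the finest resolution,
    concatenates them, and mixes the result with a 1x1 convolution."""
    def __init__(self, in_channels=(64, 128, 256), out_channels=128):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.mix = nn.Sequential(
            nn.Conv2d(out_channels * len(in_channels), out_channels, 1),
            nn.BatchNorm2d(out_channels), nn.ReLU())

    def forward(self, feats):                      # list of (B, C_i, H_i, W_i), coarsest last
        target = feats[0].shape[-2:]               # finest spatial size
        aligned = [F.interpolate(p(f), size=target, mode="bilinear", align_corners=False)
                   for p, f in zip(self.proj, feats)]
        return self.mix(torch.cat(aligned, dim=1))

if __name__ == "__main__":
    feats = [torch.randn(1, 64, 64, 64), torch.randn(1, 128, 32, 32), torch.randn(1, 256, 16, 16)]
    print(SimpleFeatureFusion()(feats).shape)  # torch.Size([1, 128, 64, 64])
```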
Figures

Figure 1: Architecture of our proposed MBT-UNet. It includes a multi-branch PVT encoder, the FFM, and the MSUM.
Figure 2: Structure of the Mix-Transformer module.
Figure 3: Structure of the FFM. It fuses multi-scale features.
Figure 4: Structure of the MSUM. It performs multi-scale upsampling of features.
Figure 5: Comparison of segmentation results before and after using MBT on the Vaihingen dataset. (a) Image. (b) Ground truth. (c) P_UNet. (d) P_UNet + MBT. (e) P_UNet + MBT + FFM. (f) P_UNet + MBT + FFM + MSUM. The yellow box indicates the position in the original image, and the red boxes indicate false positives.
Figure 6: Comparison of segmentation results before and after using MBT on the LoveDA dataset. (a) Image. (b) Ground truth. (c) P_UNet. (d) P_UNet + MBT. (e) P_UNet + MBT + FFM. (f) P_UNet + MBT + FFM + MSUM. The yellow box indicates the position in the original image, and the black boxes indicate missed detections.
Figure 7: Comparison of segmentation results of different methods on the Vaihingen dataset. (a) Image. (b) Ground truth. (c) DeepLabv3+. (d) SegFormer. (e) ST-UNet. (f) SSNet. (g) STDSNet. (h) DSHNet. (i) MBT-UNet. The yellow box indicates the position in the original image, the red boxes indicate false positives, and the black boxes indicate missed detections.
Figure 8: Comparison of segmentation results of different methods on the Potsdam dataset. (a) Image. (b) Ground truth. (c) DeepLabv3+. (d) SegFormer. (e) ST-UNet. (f) SSNet. (g) STDSNet. (h) DSHNet. (i) MBT-UNet. The yellow box indicates the position in the original image, the red boxes indicate false positives, and the black boxes indicate missed detections.
Figure 9: Comparison of segmentation results of different methods on the LoveDA dataset. (a) Image. (b) Ground truth. (c) DeepLabv3+. (d) SegFormer. (e) ST-UNet. (f) SSNet. (g) STDSNet. (h) DSHNet. (i) MBT-UNet. The yellow box indicates the position in the original image, the red boxes indicate false positives, and the black boxes indicate missed detections.
Figure 10: Comparison of segmentation results of different methods on the UAVid dataset. (a) Image. (b) Ground truth. (c) DeepLabv3+. (d) SegFormer. (e) ST-UNet. (f) SSNet. (g) STDSNet. (h) DSHNet. (i) MBT-UNet. The yellow box indicates the position in the original image, and the red boxes indicate false positives.
26 pages, 12605 KiB  
Article
Active Bidirectional Self-Training Network for Cross-Domain Segmentation in Remote-Sensing Images
by Zhujun Yang, Zhiyuan Yan, Wenhui Diao, Yihang Ma, Xinming Li and Xian Sun
Remote Sens. 2024, 16(13), 2507; https://doi.org/10.3390/rs16132507 - 8 Jul 2024
Viewed by 602
Abstract
Semantic segmentation with cross-domain adaptation in remote-sensing images (RSIs) is crucial and mitigates the expense of manually labeling target data. However, the performance of existing unsupervised domain adaptation (UDA) methods is still significantly impacted by domain bias, leading to a considerable gap compared to supervised trained models. To address this, our work focuses on semi-supervised domain adaptation, selecting a small subset of target annotations through active learning (AL) that maximize information to improve domain adaptation. Overall, we propose a novel active bidirectional self-training network (ABSNet) for cross-domain semantic segmentation in RSIs. ABSNet consists of two sub-stages: a multi-prototype active region selection (MARS) stage and a source-weighted class-balanced self-training (SCBS) stage. The MARS approach captures the diversity in labeled source data by introducing multi-prototype density estimation based on Gaussian mixture models. We then measure inter-domain similarity to select complementary and representative target samples. Through fine-tuning with the selected active samples, we propose an enhanced self-training strategy SCBS, designed for weighted training on source data, aiming to avoid the negative effects of interfering samples. We conduct extensive experiments on the LoveDA and ISPRS datasets to validate the superiority of our method over existing state-of-the-art domain-adaptive semantic segmentation methods. Full article
(This article belongs to the Special Issue Geospatial Artificial Intelligence (GeoAI) in Remote Sensing)
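The multi-prototype density estimation step in the abstract can be illustrated, at a very high level, with scikit-learn's Gaussian mixture models: fit a small mixture per source class and rank target samples by how poorly the source mixtures explain them, then send the least well-explained ones for annotation. This is a hedged sketch of the general idea only; the feature dimensionality, number of components, and ranking rule are assumptions, not the ABSNet procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_class_mixtures(src_feats, src_labels, n_components=3, seed=0):
    """Fit one GMM per source class; each mixture plays the role of multiple prototypes."""
    mixtures = {}
    for cls in np.unique(src_labels):
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                              random_state=seed)
        mixtures[cls] = gmm.fit(src_feats[src_labels == cls])
    return mixtures

def select_active_samples(mixtures, tgt_feats, budget=10):
    """Score each target sample by its best log-likelihood under any class mixture and
    pick the lowest-scoring (least source-like, hence most informative) ones."""
    scores = np.max(np.stack([g.score_samples(tgt_feats) for g in mixtures.values()]), axis=0)
    return np.argsort(scores)[:budget]            # indices of samples to annotate

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(300, 16)); labels = rng.integers(0, 3, size=300)
    tgt = rng.normal(loc=1.5, size=(100, 16))     # shifted "target-domain" features
    mixtures = fit_class_mixtures(src, labels)
    print(select_active_samples(mixtures, tgt, budget=5))
```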
Figures

Figure 1: A domain shift exists between urban and rural scenarios and is manifested by differences in target characteristics and imbalances in the class distribution. The t-SNE [25] feature visualization for the UDA method using DCA [22] and for our ADA method is shown on the right.
Figure 2: Domain offsets of the instances of the Urban domain from the Rural domain based on a Gaussian distribution.
Figure 3: Architecture of the proposed ABSNet. The upper left part is the multi-prototype active region selection module, responsible for selecting and labeling the target-domain active samples that are the most informative for domain adaptation. The lower part represents the source-weighted class-balanced self-training process, in which the distance measurement from the source samples to the target distribution and the class entropy are incorporated for self-training.
Figure 4: Illustration of the multi-prototype active region selection module.
Figure 5: Illustration of the source-weighted class-balanced self-training process.
Figure 6: Segmentation visualization results of different domain-adaptive semantic segmentation methods on the Rural-to-Urban task. (a) Image. (b) Ground truth. (c) DCA. (d) MADA. (e) RIPU. (f) ABSNet.
Figure 7: Segmentation visualization results of different domain-adaptive semantic segmentation methods on the Urban-to-Rural task. (a) Image. (b) Ground truth. (c) DCA. (d) MADA. (e) RIPU. (f) ABSNet.
Figure 8: Segmentation visualization results of different domain-adaptive semantic segmentation methods on the VH-to-PD task. (a) Image. (b) Ground truth. (c) Alonso's. (d) MADA. (e) RIPU. (f) ABSNet.
Figure 9: Radar charts of mIoU (%) on the 7 categories of the LoveDA dataset with the baseline method and our ABSNet. (a) Rural-to-Urban. (b) Urban-to-Rural.
Figure 10: Performance improvement of our method for different numbers of active samples. (a) Rural-to-Urban. (b) Urban-to-Rural.
21 pages, 16543 KiB  
Article
Bidirectional Feature Fusion and Enhanced Alignment Based Multimodal Semantic Segmentation for Remote Sensing Images
by Qianqian Liu and Xili Wang
Remote Sens. 2024, 16(13), 2289; https://doi.org/10.3390/rs16132289 - 22 Jun 2024
Viewed by 890
Abstract
Image–text multimodal deep semantic segmentation leverages the fusion and alignment of image and text information and provides more prior knowledge for segmentation tasks. It is worth exploring image–text multimodal semantic segmentation for remote sensing images. In this paper, we propose a bidirectional feature fusion and enhanced alignment-based multimodal semantic segmentation model (BEMSeg) for remote sensing images. Specifically, BEMSeg first extracts image and text features by image and text encoders, respectively, and then the features are provided for fusion and alignment to obtain complementary multimodal feature representation. Secondly, a bidirectional feature fusion module is proposed, which employs self-attention and cross-attention to adaptively fuse image and text features of different modalities, thus reducing the differences between multimodal features. For multimodal feature alignment, the similarity between the image pixel features and text features is computed to obtain a pixel–text score map. Thirdly, we propose a category-based pixel-level contrastive learning on the score map to reduce the differences among the same category’s pixels and increase the differences among the different categories’ pixels, thereby enhancing the alignment effect. Additionally, a positive and negative sample selection strategy based on different images is explored during contrastive learning. Averaging pixel values across different training images for each category to set positive and negative samples compares global pixel information while also limiting sample quantity and reducing computational costs. Finally, the fused image features and aligned pixel–text score map are concatenated and fed into the decoder to predict the segmentation results. Experimental results on the ISPRS Potsdam, Vaihingen, and LoveDA datasets demonstrate that BEMSeg is superior to comparison methods on the Potsdam and Vaihingen datasets, with improvements in mIoU ranging from 0.57% to 5.59% and 0.48% to 6.15%, and compared with Transformer-based methods, BEMSeg also performs competitively on LoveDA dataset with improvements in mIoU ranging from 0.37% to 7.14%. Full article
(This article belongs to the Special Issue Image Processing from Aerial and Satellite Imagery)
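The pixel–text score map described above (similarity between every pixel embedding and each category's text embedding) is straightforward to express. The hedged PyTorch sketch below shows the generic computation; the temperature value and tensor shapes are assumptions rather than BEMSeg's settings.

```python
import torch
import torch.nn.functional as F

def pixel_text_score_map(pixel_feats, text_feats, temperature=0.07):
    """pixel_feats: (B, D, H, W) image-pixel embeddings.
    text_feats:  (C, D) one embedding per category.
    Returns a (B, C, H, W) map of scaled cosine similarities."""
    pixel = F.normalize(pixel_feats, dim=1)           # unit-length pixel embeddings
    text = F.normalize(text_feats, dim=1)             # unit-length text embeddings
    scores = torch.einsum("bdhw,cd->bchw", pixel, text)
    return scores / temperature

if __name__ == "__main__":
    score_map = pixel_text_score_map(torch.randn(2, 512, 32, 32), torch.randn(6, 512))
    print(score_map.shape)                            # torch.Size([2, 6, 32, 32])
    pred = score_map.argmax(dim=1)                    # per-pixel category with highest score
```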
Figures

Figure 1: The framework of the proposed BEMSeg, which consists of an image encoder, a text encoder, a multimodal feature fusion and alignment module, and a decoder. BFF and CPC denote the bidirectional feature fusion module and category-based pixel-level contrastive learning in the multimodal feature fusion and alignment module, and C denotes the number of categories. The colored squares at the top denote the text features of different categories.
Figure 2: The proposed attention-based bidirectional feature fusion module. This structure enables image-attention text features to be incorporated into text representations (and vice versa) through a dual-branch structure with an added self-attention mechanism.
Figure 3: Multi-scale image feature fusion network in the decoder of Semantic FPN.
Figure 4: Proportion of the number of pixels in each category in the Potsdam and Vaihingen remote sensing datasets.
Figure 5: Qualitative results of the comparison methods on some test images of the Potsdam dataset.
Figure 6: The IoU of each class for the comparison methods on the Vaihingen dataset.
Figure 7: mIoU of the BEMSeg model on the Potsdam dataset for different values of the parameters λ1 and λ2.
28 pages, 5447 KiB  
Review
A Systematic Literature Review and Bibliometric Analysis of Semantic Segmentation Models in Land Cover Mapping
by Segun Ajibola and Pedro Cabral
Remote Sens. 2024, 16(12), 2222; https://doi.org/10.3390/rs16122222 - 19 Jun 2024
Cited by 1 | Viewed by 1377
Abstract
Recent advancements in deep learning have spurred the development of numerous novel semantic segmentation models for land cover mapping, showcasing exceptional performance in delineating precise boundaries and producing highly accurate land cover maps. However, to date, no systematic literature review has comprehensively examined semantic segmentation models in the context of land cover mapping. This paper addresses this gap by synthesizing recent advancements in semantic segmentation models for land cover mapping from 2017 to 2023, drawing insights on trends, data sources, model structures, and performance metrics based on a review of 106 articles. Our analysis identifies the top journals in the field, including MDPI Remote Sensing, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, IEEE Transactions on Geoscience and Remote Sensing, IEEE Geoscience and Remote Sensing Letters, and ISPRS Journal of Photogrammetry and Remote Sensing. We find that research predominantly focuses on land cover, urban areas, precision agriculture, environment, coastal areas, and forests. Geographically, 35.29% of the study areas are located in China, followed by the USA (11.76%), France (5.88%), Spain (4%), and others. Sentinel-2, Sentinel-1, and Landsat satellites emerge as the most used data sources. Benchmark datasets such as ISPRS Vaihingen and Potsdam, LandCover.ai, DeepGlobe, and GID are frequently employed. Model architectures predominantly utilize encoder–decoder and hybrid convolutional neural network-based structures because of their impressive performance, with limited adoption of transformer-based architectures due to their computational complexity and slow convergence speed. Lastly, this paper highlights existing key research gaps in the field to guide future research directions. Full article
(This article belongs to the Special Issue Advances of Remote Sensing in Land Cover and Land Use Mapping)
Figures

Figure 1: Flow chart of the peer-review procedure.
Figure 2: Annual distribution of research studies.
Figure 3: Number of relevant publications in this field, distributed per relevant journal.
Figure 4: Geographic distribution of studies per country.
Figure 5: Occurrence of significant keywords driving the domain themes.
Figure 6: Land cover mapping domain studies.
Figure 7: Number of publications per domain study.
Figure 8: Number of publications per study location.
Figure 9: Benchmark datasets identified in the literature.
Figure 10: Percentage of articles employing different model structures in the retrieved data.
Figure 11: Analysis of semantic segmentation model structures on the ISPRS 2D labelling Potsdam dataset.
Figure 12: Analysis of semantic segmentation model structures on the ISPRS 2D labelling Vaihingen dataset.
24 pages, 3332 KiB  
Article
U-Net Ensemble for Enhanced Semantic Segmentation in Remote Sensing Imagery
by Ivica Dimitrovski, Vlatko Spasev, Suzana Loshkovska and Ivan Kitanovski
Remote Sens. 2024, 16(12), 2077; https://doi.org/10.3390/rs16122077 - 8 Jun 2024
Cited by 1 | Viewed by 1411
Abstract
Semantic segmentation of remote sensing imagery stands as a fundamental task within the domains of both remote sensing and computer vision. Its objective is to generate a comprehensive pixel-wise segmentation map of an image, assigning a specific label to each pixel. This facilitates in-depth analysis and comprehension of the Earth’s surface. In this paper, we propose an approach for enhancing semantic segmentation performance by employing an ensemble of U-Net models with three different backbone networks: Multi-Axis Vision Transformer, ConvFormer, and EfficientNet. The final segmentation maps are generated through a geometric mean ensemble method, leveraging the diverse representations learned by each backbone network. The effectiveness of the base U-Net models and the proposed ensemble is evaluated on multiple datasets commonly used for semantic segmentation tasks in remote sensing imagery, including LandCover.ai, LoveDA, INRIA, UAVid, and ISPRS Potsdam datasets. Our experimental results demonstrate that the proposed approach achieves state-of-the-art performance, showcasing its effectiveness and robustness in accurately capturing the semantic information embedded within remote sensing images. Full article
(This article belongs to the Special Issue GeoAI and EO Big Data Driven Advances in Earth Environmental Science)
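The geometric mean ensemble mentioned in the abstract simply averages the per-pixel class probabilities of the base models in log space and renormalizes. A hedged sketch follows; the epsilon and the assumption that each base model outputs softmax probabilities are mine, not the paper's exact recipe.

```python
import torch

def geometric_mean_ensemble(prob_maps, eps=1e-8):
    """prob_maps: list of (B, C, H, W) per-pixel class probabilities, one per base model.
    Returns the renormalized geometric mean of the probabilities."""
    log_mean = torch.stack([torch.log(p.clamp_min(eps)) for p in prob_maps]).mean(dim=0)
    fused = torch.exp(log_mean)
    return fused / fused.sum(dim=1, keepdim=True)       # renormalize over classes

if __name__ == "__main__":
    # Three stand-in "models": random softmax outputs for a 6-class problem.
    probs = [torch.softmax(torch.randn(1, 6, 64, 64), dim=1) for _ in range(3)]
    fused = geometric_mean_ensemble(probs)
    print(fused.shape, float(fused.sum(dim=1).mean()))  # torch.Size([1, 6, 64, 64]) 1.0
```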
Figures

Figure 1: Illustration of the U-Net architecture. The displayed U-Net is an encoder–decoder network with a contracting path (encoding part, left side) that reduces the height and width of the input images and an expansive path (decoding part, right side) that recovers the original dimensions of the input images.
Figure 2: MaxViT architecture with hierarchical design and basic building block that unifies MBConv, block, and grid attention layers.
Figure 3: Overall framework of ConvFormer and architecture of the ConvFormer block, which has a token mixer of separable depthwise convolutions.
Figure 4: Architecture of EfficientNet-B0 with MBConv as basic building blocks.
Figure 5: Geometric mean ensemble training and testing strategy of three base models: MaxViT-S, ConvFormer-M36, and EfficientNet-B7.
Figure 6: Example images and inference masks from the LoveDA dataset. The first row displays example images. Each subsequent row shows the segmentation masks generated by a different model for the corresponding image in the first row.
Figure 7: Confusion matrix for the U-Net ensemble model on the UAVid dataset.
Figure 8: Example images, ground-truth masks, and inference masks from the UAVid dataset. The first row shows example images, the second row shows the corresponding ground-truth masks, and the third row shows the prediction results of the U-Net ensemble model as in Table 2.
Figure 9: Cropped image, ground-truth mask, and predicted mask from the U-Net ensemble model, as outlined in Table 2. The images highlight a predominant region containing the labels human and moving car from the UAVid dataset.
Figure 10: Confusion matrix for the U-Net ensemble model applied to the LandCover.ai dataset.
Figure 11: Example images, ground-truth masks, and inference masks from the LandCover.ai dataset. The first row shows example images, and the second row shows the corresponding ground-truth masks. Each subsequent row shows the segmentation masks generated by a different model for the corresponding image in the first row.
Figure 12: Example images, ground-truth masks, and inference masks from the Potsdam dataset. The first row shows example images, and the second row shows the corresponding ground-truth masks. Each subsequent row shows the segmentation masks generated by a different model for the corresponding image in the first row.
Figure 13: Confusion matrix for the U-Net ensemble model on the Potsdam dataset, with clutter (left) and without clutter (right).
Figure 14: Example images and inference masks for the INRIA dataset. The first row shows example images, and the second row shows the prediction results of the U-Net ensemble model as in Table 2.
14 pages, 3490 KiB  
Article
Rapid and Sensitive Detection by Combining Electric Field Effects and Surface Plasmon Resonance: A Theoretical Study
by Qijie Qiu and Yan Xu
Micromachines 2024, 15(5), 653; https://doi.org/10.3390/mi15050653 - 15 May 2024
Cited by 1 | Viewed by 879
Abstract
Surface plasmon resonance (SPR) has been extensively employed in biological sensing, environmental detection, and the chemical industry. Nevertheless, the performance of conventional SPR biosensors can be limited by the transport of analyte molecules to the sensing surface, notably when small molecules or low levels of substances are being detected. In this study, a rapid and highly sensitive SPR biosensor is introduced that enhances the collection of target analytes by integrating AC electroosmosis (ACEO) and dielectrophoresis (DEP). Both phenomena arise from AC electric fields, which can be tailored by shaping the interdigitated electrodes (IDEs) that also serve as the SPR biomarker sensing area. The effects of different parameters (e.g., the frequency and voltage of the AC electric field, as well as the microelectrode structure) on the operation of the iSPR (interdigitated SPR) biosensor are considered, and the iSPR biosensor is optimized for sensitivity. The results of this study confirm that the iSPR can efficiently concentrate small molecules in the SPR sensing area, increasing the SPR response by an order of magnitude and shortening the detection time. Such a rapid and sensitive sensor is critical for the development of on-site diagnostics in a wide variety of human and animal health applications. Full article
(This article belongs to the Special Issue Micromachines for Dielectrophoresis, 3rd Edition)
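For reference, the quantities that appear in the figure captions below (∇|E|² and the Clausius–Mossotti factor) enter the standard time-averaged dielectrophoretic force expression for a spherical particle. These are textbook relations, not equations quoted from the paper:

```latex
% Time-averaged DEP force on a spherical particle of radius r in a medium of permittivity eps_m
\langle \mathbf{F}_{\mathrm{DEP}} \rangle
  = 2\pi \varepsilon_m r^{3}\, \mathrm{Re}\!\left[K(\omega)\right]\, \nabla |\mathbf{E}|^{2},
\qquad
K(\omega) = \frac{\varepsilon_p^{*} - \varepsilon_m^{*}}{\varepsilon_p^{*} + 2\varepsilon_m^{*}},
\qquad
\varepsilon^{*} = \varepsilon - \mathrm{i}\,\frac{\sigma}{\omega},
```

where K(ω) is the Clausius–Mossotti factor and ε_p*, ε_m* are the complex permittivities of the particle and the medium; a positive Re[K] (positive DEP) pulls particles toward regions of high field gradient.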
Figures

Figure 1: Schematic and principle of AC DEP–ACEO-enhanced surface plasmon resonance. Au microelectrodes are first patterned by photolithography and metal liftoff on a glass substrate. With an out-of-phase AC voltage applied to the Au electrodes, the electrical double layer moves horizontally along the electrode surfaces. This motion generates a hydrodynamic rotational flow in the microfluidic channel, and the target analyte is polarized. The hydrodynamic flow (ACEO) and DEP facilitate the transport of the target biomolecules to the sensing surface and their surface binding reaction.
Figure 2: (a) Numerical simulation of the electric field strength under different electrode gaps. (b) Calculated distribution of ∇|E|² with a gap distance of E_G = 10 μm and f = 10⁵ Hz. (c) Relationship between V_ACEO and frequency: the AC electroosmotic velocity calculated at locations x from the electrode edge under 0.0002 S m⁻¹ and 1 V. (d) The maximum of ∇|E|² calculated under different electrode gaps as a linear function of V_pp. (e) Clausius–Mossotti factor calculated for polystyrene particles (ε_p = 10, σ_p = 1 S/m) in DI water. (f) The maximum ACEO velocity as V_pp (f = 10⁵ Hz) increases from 0 to 3 V.
Figure 3: Simulation of the electric fields for the evaluation of DEP and ACEO. (a) Calculated distribution of the ACEO flow on the electrode surfaces under V_pp = 3 V and f = 10⁵ Hz. (b) Numerical simulation of the electric field strength in iSPR biochips. (c) Details of the DEP force at distances of 5 μm from the electrode edge along the red dashed line in (b). (d) Plot of log10(|F_DEP|/|F_drag|) comparing the vertical forces exerted on PS particles under an AC electric field (dielectrophoretic vs. Stokes drag force). (e) The number of captured PS particles in the SPR sensing area at 15 s.
Figure 4: Target analyte concentration profiles in the iSPR microfluidic channel of the device operated under DEP and ACEO at V_pp = 3 V and f = 10⁵ Hz, and with diffusion only, respectively. The flow field direction is shown by the black arrows. The initial concentration for both cases is set at 5 mol/L. The inlet (left edges) and the outlet (right edges) for the diffusion-only case (a) and the DEP–ACEO case (b) are defined as open boundaries with no target analyte replenishment.
Figure 5: Simulation (a) and experimental (b–e) results of the pDEP and ACEO effects on polystyrene microbeads. (a) Results of the COMSOL particle tracing temporal study with a 3 V_pp voltage and the same electrical parameters as previously; a relative buffer permittivity of 80 is used in each model. While positive DEP typically attracts objects to the edges of electrodes, electro-osmosis can drag microbeads toward the center of the electrodes, which facilitates SPR detection. (b–d) Representative images of polystyrene microbead collection using the iSPR chip (under 10× magnification): the iSPR chip with 5 μm diameter polystyrene microbeads at 10⁵ Hz and 3 V, suspended in 0.0002 S m⁻¹ media with a low concentration of particles. (e) Comparison of the concentration changes for DEP–ACEO and diffusion only from the COMSOL temporal study.
Full article ">
19 pages, 2039 KiB  
Article
EAD-Net: Efficiently Asymmetric Network for Semantic Labeling of High-Resolution Remote Sensing Images with Dynamic Routing Mechanism
by Qiongqiong Hu, Feiting Wang and Ying Li
Remote Sens. 2024, 16(9), 1478; https://doi.org/10.3390/rs16091478 - 23 Apr 2024
Viewed by 665
Abstract
Semantic labeling of high-resolution remote sensing images (HRRSIs) holds a significant position in the remote sensing domain. Although numerous deep-learning-based segmentation models have enhanced segmentation precision, their complexity leads to a significant increase in parameters and computational requirements. While ensuring segmentation accuracy, it is also crucial to improve segmentation speed. To address this issue, we propose an efficient asymmetric deep learning network for HRRSIs, referred to as EAD-Net. First, EAD-Net employs ResNet50 as the backbone without pooling, instead of the RepVGG block, to extract rich semantic features while reducing model complexity. Second, a dynamic routing module is proposed in EAD-Net to adjust routing based on the pixel occupancy of small-scale objects. Concurrently, a channel attention mechanism is used to preserve their features even with minimal occupancy. Third, a novel asymmetric decoder is introduced, which uses convolutional operations while discarding skip connections. This not only effectively reduces redundant features but also allows using low-level image features to enhance EAD-Net’s performance. Extensive experimental results on the ISPRS 2D semantic labeling challenge benchmark demonstrate that EAD-Net achieves state-of-the-art (SOTA) accuracy while reducing model complexity and inference time, with the mean Intersection over Union (mIoU) score reaching 87.38% and 93.10% on the Vaihingen and Potsdam datasets, respectively. Full article
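The channel attention mechanism mentioned in the abstract (used to preserve small-object features) typically follows the squeeze-and-excitation pattern. The sketch below is that generic pattern in PyTorch, offered as an illustration rather than EAD-Net's actual module; the reduction ratio is an assumption.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: global context -> per-channel weights."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                    # squeeze to (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                                      # x: (B, C, H, W)
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights                                     # re-weight each channel

if __name__ == "__main__":
    print(ChannelAttention(64)(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```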
Figures

Figure 1: The structure of EAD-Net.
Figure 2: The architecture of the Re-Parameterization Visual Geometry Group (RepVGG) block.
Figure 3: The architecture of the dynamic routing module.
Figure 4: The structure of the Adaptive and Selective Perceptual Pooling.
Figure 5: The structure of the channel attention module.
Figure 6: A sample patch from the ISPRS Vaihingen dataset and its corresponding ground truth in (a), a sample patch from the ISPRS Potsdam dataset and its corresponding ground truth in (b), and the different categories labeled by different colors in (c).
Figure 7: Visual comparisons with deep learning models in the local evaluation on the ISPRS Vaihingen dataset.
Figure 8: Visual comparisons with deep learning models in the local evaluation on the ISPRS Vaihingen dataset.
Figure 9: Visual comparisons with deep learning models in the local evaluation on the ISPRS Vaihingen dataset.
Figure 10: Classification maps of an original image in the Vaihingen dataset.
Figure 11: Visual comparisons with deep learning models in the local evaluation on the ISPRS Potsdam dataset.
Figure 12: Visual comparisons with deep learning models in the local evaluation on the ISPRS Potsdam dataset.
Figure 13: Visual comparisons with deep learning models in the local evaluation on the ISPRS Potsdam dataset.
Figure 14: Ablation study on the ISPRS Vaihingen dataset.
Figure 15: Ablation study on the ISPRS Potsdam dataset.
Full article ">
23 pages, 7834 KiB  
Article
A Multiscale Filtering Method for Airborne LiDAR Data Using Modified 3D Alpha Shape
by Di Cao, Cheng Wang, Meng Du and Xiaohuan Xi
Remote Sens. 2024, 16(8), 1443; https://doi.org/10.3390/rs16081443 - 18 Apr 2024
Viewed by 1011
Abstract
The complexity of terrain features poses a substantial challenge in the effective processing and application of airborne LiDAR data, particularly in regions characterized by steep slopes and diverse objects. In this paper, we propose a novel multiscale filtering method utilizing a modified 3D alpha shape algorithm to increase the ground point extraction accuracy in complex terrain. Our methodology comprises three pivotal stages: preprocessing for outlier removal and potential ground point extraction; the deployment of a modified 3D alpha shape to construct multiscale point cloud layers; and the use of a multiscale triangulated irregular network (TIN) densification process for precise ground point extraction. In each layer, the threshold is adaptively determined based on the corresponding α. Points closer to the TIN surface than the threshold are identified as ground points. The performance of the proposed method was validated using a classical benchmark dataset provided by the ISPRS and an ultra-large-scale ground filtering dataset called OpenGF. The experimental results demonstrate that this method is effective, with an average total error and a kappa coefficient on the ISPRS dataset of 3.27% and 88.97%, respectively. When tested in the large scenarios of the OpenGF dataset, the proposed method outperformed four classical filtering methods and achieved accuracy comparable to that of the best of learning-based methods. Full article
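The TIN densification step in the abstract, accepting a point as ground when it lies close enough to a surface triangulated from the current ground seeds, can be sketched with SciPy. This is a deliberately simplified, hedged version: it uses the vertical distance to an interpolated TIN surface and a fixed threshold, whereas the paper adapts the threshold per layer and measures the distance to the nearest facet.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def densify_ground(seeds, candidates, threshold=0.3):
    """seeds: (N, 3) ground seed points; candidates: (M, 3) points to test.
    Returns a boolean mask marking candidates accepted as ground."""
    # Triangulate the seeds in the XY plane and interpolate the TIN height at each candidate.
    surface = LinearNDInterpolator(seeds[:, :2], seeds[:, 2])
    tin_z = surface(candidates[:, :2])
    dist = np.abs(candidates[:, 2] - tin_z)
    # Points outside the seed triangulation get NaN heights and are rejected here.
    return np.isfinite(dist) & (dist < threshold)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xy = rng.uniform(0, 100, size=(200, 2))
    seeds = np.column_stack([xy, 0.05 * xy[:, 0]])            # gently sloping seed terrain
    cand_xy = rng.uniform(0, 100, size=(500, 2))
    cand_z = 0.05 * cand_xy[:, 0] + rng.normal(0, 0.5, 500)   # noisy candidate heights
    mask = densify_ground(seeds, np.column_stack([cand_xy, cand_z]))
    print(int(mask.sum()), "of", len(mask), "candidates accepted as ground")
```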
Figures

Graphical abstract

Figure 1: Workflow of the proposed method for filtering airborne LiDAR data.
Figure 2: Illustration of the 3D alpha shape: (a) the original point cloud of a rabbit; (b) the convex hull of the original point cloud; (c,d) alpha shapes of the original point cloud, with α = 0.05 m in (c) and α = 0.01 m in (d).
Figure 3: Comparison between the 3D alpha shape and the modified 3D alpha shape: (a) the original point cloud (red for nonground points and blue for ground points); (b) result of the 3D alpha shape; (c) result of the modified 3D alpha shape. The green translucent surfaces in (b,c) are the resulting shapes.
Figure 4: Overview of the modified 3D alpha shape algorithm.
Figure 5: The extraction of point cloud layers: (a) the top layer extracted using a sufficiently large α (radius of the ball); (b–d) the lower layers extracted using gradually decreasing α.
Figure 6: Procedure for multiscale TIN densification: (a) the top layer that provides seed points; (b–d) extraction of ground points in the lower layers of the data pyramid; (e) extraction of the final ground points.
Figure 7: Illustration of distance threshold determination: (a) d is the distance between the point P and its nearest TIN facet; (b) DT is the distance threshold and θ is the relative slope.
Figure 8: Filtering results for samp11: (a) the reference DTM, (b) the filtered DTM, and (c) the distribution of type I and type II errors. Most of the buildings were accurately filtered, and the road on the slope was accurately extracted (green rectangle). The yellow rectangles indicate areas where the roofs of terraced buildings were misclassified.
Figure 9: Filtering results for samp22: (a) the reference DTM, (b) the filtered DTM, and (c) the distribution of errors. The buildings and the bridge were accurately filtered (green rectangle). The yellow rectangles indicate areas where type II errors occurred due to the misclassification of some edges of the terraced floor, resulting in the roadside edge being cut off in the DTM.
Figure 10: Filtering results for samp31: (a) the reference DTM, (b) the filtered DTM, and (c) the distribution of errors. The large buildings in the middle were accurately filtered. The yellow rectangles indicate areas where an edge of the terraced floor was misclassified, resulting in oversmoothing of the DTM in the corresponding part.
Figure 11: Filtering results for samp42: (a) the reference DTM, (b) the filtered DTM, and (c) the distribution of errors. The large buildings of the railway station were well filtered. However, several points on the roof in the lower-left corner were misclassified due to a lack of nearby ground points.
Figure 12: Filtering results for samp53: (a) the reference DTM, (b) the filtered DTM, and (c) the distribution of type I and type II errors. The slopes and cliffs were successfully extracted. The yellow rectangles indicate areas where low vegetation was misclassified, resulting in small protuberances in the DTM.
Figure 13: Filtering results for samp61: (a) the reference DTM, (b) the filtered DTM, and (c) the distribution of type I and type II errors. The vegetation on terrain with large gaps and steep slopes was accurately filtered.
Figure 14: Filtering results for samp71: (a) the reference DTM, (b) the filtered DTM, and (c) the distribution of type I and type II errors. The bridge was correctly filtered (green rectangle), whereas some points on the roadside were misclassified due to small-scale undulations (yellow rectangles).
Figure 15: Filtering results on test samples in OpenGF: (first column) Test I; (second column) Test II (without outliers); (third column) Test III. (a–c) The DSMs of the test samples; (d–f) the DTMs constructed from the filtering results; (g–i) the distribution of type I and type II errors (red for type I errors and blue for type II errors).
Figure 16: Calculation time (seconds) of MASF, CSF, and PMF.
Figure 17: Ground seeds of representative samples (samp11 for urban areas and samp53 for rural areas): (a,e) overlay of the reference DTM and the ground seeds (red points) extracted using the cell lowest point; (b,f) DTM generated with these seeds; (c,g) overlay of the reference DTM and the ground seeds extracted with our method; (d,h) DTM generated with these seeds.
Figure 18: Analysis of sensitivity to the parameter α_step: (a) total errors for samples in urban areas; (b) total errors for samples in rural areas; (c) mean total errors of the samples; (d) standard deviation of the total error for each sample.
Full article ">
20 pages, 4443 KiB  
Article
PointMM: Point Cloud Semantic Segmentation CNN under Multi-Spatial Feature Encoding and Multi-Head Attention Pooling
by Ruixing Chen, Jun Wu, Ying Luo and Gang Xu
Remote Sens. 2024, 16(7), 1246; https://doi.org/10.3390/rs16071246 - 31 Mar 2024
Cited by 2 | Viewed by 1117
Abstract
For the actual collected point cloud data, there are widespread challenges such as semantic inconsistency, density variations, and sparse spatial distribution. A network called PointMM is developed in this study to enhance the accuracy of point cloud semantic segmentation in complex scenes. The main contribution of PointMM involves two aspects: (1) Multi-spatial feature encoding. We leverage a novel feature encoding module to learn multi-spatial features from the neighborhood point set obtained by k-nearest neighbors (KNN) in the feature space. This enhances the network’s ability to learn the spatial structures of various samples more finely and completely. (2) Multi-head attention pooling. We leverage a multi-head attention pooling module to address the limitations of symmetric function-based pooling, such as maximum and average pooling, in terms of losing detailed feature information. This is achieved by aggregating multi-spatial and attribute features of point clouds, thereby enhancing the network’s ability to transmit information more comprehensively and accurately. Experiments on publicly available point cloud datasets S3DIS and ISPRS 3D Vaihingen demonstrate that PointMM effectively learns features at different levels, while improving the semantic segmentation accuracy of various objects. Compared to 12 state-of-the-art methods reported in the literature, PointMM outperforms the runner-up by 2.3% in OA on the ISPRS 3D Vaihingen dataset, and achieves the third best performance in both OA and MioU on the S3DIS dataset. Both achieve a satisfactory balance between OA, F1, and MioU. Full article
(This article belongs to the Special Issue Remote Sensing Image Classification and Semantic Segmentation)
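The two ingredients highlighted in the abstract, neighborhoods gathered by KNN in feature space and attention-based aggregation in place of max/average pooling, can be sketched as follows. This is a minimal PyTorch illustration with assumed shapes, head counts, and module names, not the authors' PointMM implementation.

```python
# Minimal sketch, not the PointMM code: KNN in feature space + multi-head attention pooling.
import torch
import torch.nn as nn

def knn_in_feature_space(feats: torch.Tensor, k: int) -> torch.Tensor:
    """feats: (N, C) per-point features. Returns (N, k) neighbor indices found in feature space."""
    dists = torch.cdist(feats, feats)                      # (N, N) pairwise feature-space distances
    _, idx = torch.topk(dists, k, dim=-1, largest=False)   # k smallest distances per point
    return idx

class AttentionPooling(nn.Module):
    """Aggregates each point's neighborhood with multi-head attention instead of max/avg pooling."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, feats: torch.Tensor, neighbor_idx: torch.Tensor) -> torch.Tensor:
        neighbors = feats[neighbor_idx]                    # (N, k, C) gathered neighbor features
        query = feats.unsqueeze(1)                         # (N, 1, C): each point queries its neighborhood
        pooled, _ = self.attn(query, neighbors, neighbors) # weighted aggregation over the k neighbors
        return pooled.squeeze(1)                           # (N, C)

feats = torch.randn(1024, 64)                              # 1024 points with 64-dim features (assumed sizes)
idx = knn_in_feature_space(feats, k=16)
print(AttentionPooling(64)(feats, idx).shape)              # torch.Size([1024, 64])
```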
Show Figures

Figure 1. PointMM network structure. (The thin arrow represents the flowchart of the network framework, while the thick arrow indicates the various components of the downsampling layer.)
Figure 2. Multi-Spatial Feature Encoding module.
Figure 3. Multi-head attention pooling module.
Figure 4. Segmentation results of each module.
Figure 5. The mIoU of different sampling densities based on area 6.
Figure 6. The OA of different neighborhood points based on area 6.
Figure 7. Segmentation results of different methods. (The red circle represents the incorrectly segmented area.)
25 pages, 4894 KiB  
Article
A Spectral–Spatial Context-Boosted Network for Semantic Segmentation of Remote Sensing Images
by Xin Li, Xi Yong, Tao Li, Yao Tong, Hongmin Gao, Xinyuan Wang, Zhennan Xu, Yiwei Fang, Qian You and Xin Lyu
Remote Sens. 2024, 16(7), 1214; https://doi.org/10.3390/rs16071214 - 29 Mar 2024
Cited by 2 | Viewed by 909
Abstract
Semantic segmentation of remote sensing images (RSIs) is pivotal for numerous applications in urban planning, agricultural monitoring, and environmental conservation. However, traditional approaches have primarily emphasized learning within the spatial domain, which frequently leads to suboptimal feature discrimination. Considering the inherent spectral qualities of RSIs, it is essential to bolster these representations by incorporating the spectral context in conjunction with spatial information to improve discriminative capacity. In this paper, we introduce the spectral–spatial context-boosted network (SSCBNet), an innovative network designed to enhance the accuracy of semantic segmentation in RSIs. SSCBNet integrates synergetic attention (SYA) layers and cross-fusion modules (CFMs) to harness both spectral and spatial information, addressing the intrinsic complexities of urban and natural landscapes within RSIs. Extensive experiments on the ISPRS Potsdam and LoveDA datasets reveal that SSCBNet surpasses existing state-of-the-art models, achieving remarkable results in F1-scores, overall accuracy (OA), and mean intersection over union (mIoU). Ablation studies confirm the significant contribution of SYA layers and CFMs to the model’s performance, emphasizing the effectiveness of these components in capturing detailed contextual cues. Full article
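As a rough illustration of fusing a spatial feature map with a spectral (frequency-domain) feature map, the sketch below cross-gates the two branches and projects them back to a single tensor. The module name, gating design, and shapes are assumptions made for illustration; this is not the paper's CFM or SYA.

```python
# Hedged sketch of spectral-spatial cross-fusion; design choices here are illustrative assumptions.
import torch
import torch.nn as nn

class SimpleCrossFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate_spa = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_spe = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.project = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, spatial: torch.Tensor, spectral: torch.Tensor) -> torch.Tensor:
        # Each branch is re-weighted by a gate computed from the other branch,
        # then the two are concatenated and projected back to `channels`.
        spa = spatial * self.gate_spe(spectral)
        spe = spectral * self.gate_spa(spatial)
        return self.project(torch.cat([spa, spe], dim=1))

x_spatial = torch.randn(1, 64, 128, 128)   # assumed spatial-branch feature map
x_spectral = torch.randn(1, 64, 128, 128)  # assumed spectral-branch feature map
print(SimpleCrossFusion(64)(x_spatial, x_spectral).shape)  # torch.Size([1, 64, 128, 128])
```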
Show Figures

Figure 1. Visualizations of different frequency components: (a) components of the R (red) band, (b) components of the G (green) band, and (c) components of the B (blue) band. The RGB image comes from the LoveDA dataset [39]; LL, LH, HL, and HH represent the low-frequency, horizontal, vertical, and high-frequency components (projected by discrete wavelet transformation), respectively.
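The LL/LH/HL/HH split named in this caption corresponds to a single-level 2-D discrete wavelet transform. A minimal PyWavelets sketch is shown below; the wavelet choice and the band naming follow common conventions and are assumptions, since the paper's exact transform settings are not given here.

```python
# Single-level 2-D DWT of one image band; 'haar' and the LL/LH/HL/HH naming are assumptions.
import numpy as np
import pywt

band = np.random.rand(256, 256)                 # stand-in for one band (e.g., the R channel)
LL, (LH, HL, HH) = pywt.dwt2(band, "haar")      # approximation + horizontal/vertical/diagonal details
print(LL.shape, LH.shape, HL.shape, HH.shape)   # each (128, 128) for a 256 x 256 input
```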
Figure 2">
Figure 2. The topological framework of SSCBNet.
Figure 3. Pipeline of the convolution block in SSCBNet.
Figure 4. Pipeline of SYA.
Figure 5. Pipeline of FSA.
Figure 6. Pipeline of DDSA.
Figure 7. Pipeline of PSA.
Figure 8. Pipeline of CFM.
Figure 9. Example of the ISPRS Potsdam dataset.
Figure 10. Example of the LoveDA dataset.
Figure 11. Visual inspections of random samples from the ISPRS Potsdam test set.
Figure 12. Visual inspections of random samples from the LoveDA test set.
Figure 13. Visual inspections on validating SYA and CFM (random samples from the ISPRS Potsdam test set): (a) input image, (b) ground truth, (c) predicted by SSCBNet, (d) predicted by SSCBNet without SYA, (e) predicted by SSCBNet without CFM.
Figure 14. Visual inspections on validating SYA and CFM (random samples from the LoveDA test set): (a) input image, (b) ground truth, (c) predicted by SSCBNet, (d) predicted by SSCBNet without SYA, (e) predicted by SSCBNet without CFM.
27 pages, 46596 KiB  
Article
Adaptive Clustering for Point Cloud
by Zitao Lin, Chuanli Kang, Siyi Wu, Xuanhao Li, Lei Cai, Dan Zhang and Shiwei Wang
Sensors 2024, 24(3), 848; https://doi.org/10.3390/s24030848 - 28 Jan 2024
Cited by 1 | Viewed by 1364
Abstract
Point cloud segmentation plays an important role in practical applications such as remote sensing, mobile robotics, and 3D modeling. However, current point cloud segmentation methods still have limitations when applied to large-scale scenes. This paper therefore proposes an adaptive clustering segmentation method in which the threshold for clustering points is calculated from the characteristic parameters of adjacent points. After the preliminary segmentation of the point cloud is completed, the results are refined according to the standard deviation of the cluster points, clusters whose point counts do not meet the conditions are segmented further, and the segmentation of the scene point cloud is finally obtained. To evaluate the method, this study used point cloud data from a park in Guilin, Guangxi, China. The experimental results show that the method is more practical and efficient than competing methods and can effectively segment all ground objects and ground points in a scene. Compared with other segmentation methods, which are easily affected by parameter settings, the proposed method is highly robust. To verify its generality, we also tested it on a public dataset provided by ISPRS; it achieves good segmentation results on multiple samples and can distinguish noise points in a scene. Full article
(This article belongs to the Section Remote Sensors)
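The core idea in the abstract, deriving the clustering threshold from the characteristics of adjacent points rather than fixing it by hand, can be sketched as below. The mean-plus-k-sigma rule and the use of DBSCAN are illustrative assumptions; they do not reproduce the paper's exact threshold formula or its refinement steps.

```python
# Hedged sketch: derive a clustering distance threshold from local neighbor statistics,
# then cluster with that threshold. Formula and clustering backend are assumptions.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def adaptive_threshold(points: np.ndarray, k: int = 8, coeff: float = 3.0) -> float:
    """points: (N, 3). Returns a distance threshold derived from k-NN statistics."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(points)  # +1 because each point is its own neighbor
    dists, _ = nn.kneighbors(points)
    mean_knn = dists[:, 1:].mean(axis=1)                  # mean neighbor distance per point
    return float(mean_knn.mean() + coeff * mean_knn.std())

points = np.random.rand(5000, 3) * 50.0                   # stand-in for a scene point cloud
eps = adaptive_threshold(points)
labels = DBSCAN(eps=eps, min_samples=10).fit_predict(points)  # -1 marks low-density/noise points
print(f"threshold = {eps:.2f} m, clusters = {labels.max() + 1}")
```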
Show Figures

Figure 1. The workflow of large-scale scene segmentation based on the adaptive clustering method. Sample 42 provided by ISPRS is used as an example.
Figure 2. Flowchart of normal vector calculation.
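Figure 2 gives the paper's flowchart for the normal vector calculation; for orientation, a common PCA-based variant is sketched below, where each point's normal is taken as the eigenvector of its local covariance matrix with the smallest eigenvalue. The exact procedure in the paper may differ.

```python
# Common PCA-based normal estimation (assumed variant, not necessarily the paper's procedure).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def estimate_normals(points: np.ndarray, k: int = 10) -> np.ndarray:
    """points: (N, 3). Normal = smallest-eigenvalue eigenvector of each local covariance matrix."""
    nn = NearestNeighbors(n_neighbors=k).fit(points)
    _, idx = nn.kneighbors(points)
    normals = np.empty_like(points)
    for i, neighbors in enumerate(points[idx]):           # (k, 3) neighborhood of point i
        cov = np.cov(neighbors - neighbors.mean(axis=0), rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
        normals[i] = eigvecs[:, 0]                        # direction of least variance
    return normals

normals = estimate_normals(np.random.rand(1000, 3))
print(normals.shape)  # (1000, 3)
```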
Figure 3">
Figure 3. Flowchart of computation bound.
Figure 4. Flowchart of cluster merging.
Figure 5. Study data distribution. White points are the original large scene study area, blue points are the medium scene study area, and red points are the small scene study area.
Figure 6. The normal vector pointing of the whole study data. A normal vector is drawn at an interval of 6 points.
Figure 7. The normal vector pointing distribution of each block. The purple area is the border area, the blue area is the forest area, and the red area is the regular area.
Figure 8. High-density point distribution. The red points comprise more than 30 adjacent points within 4 times the average density, and the blue points comprise the other points in the scene, except for the high-density points.
Figure 9. Initial clustering results. Different colors represent different clusters.
Figure 10. Classification results of point clouds. Panel (a) is the initial classification results of a certain number of point clouds, and different colors represent different clusters; panel (b) is the three-dimensional boundary of preliminary clusters, and the red bounding box indicates the same cluster class; panel (c) is the classification results with scatter points; panel (d) is the boundary distribution of clusters.
Figure 11. The result of low-density clustering points and merged clustering points. Panel (a) shows the distribution of elimination points; the blue points are scattered points, and the red points are points with a certain density. Panel (b) shows the distribution of segmentation results; the black points are elimination points.
Figure 12. The clustering results after removing the low-density clustering point cloud blocks. Panel (a) shows the distribution of the clustering points after removing the low-density clustering point cloud blocks; panel (b) shows the boundary after removing the low-density clustering point cloud.
Figure 13. Low-density clustering results. Panel (a) shows the low-density point cloud clustering results; panel (b) shows the low-density point cloud clustering boundary.
Figure 14. The merged clustering point cloud and its boundary. Panel (a) shows the distribution of clustering points, and panel (b) shows the boundary of the clustering points.
Figure 15. The remaining scatter distribution. The black points are the completed cluster segmentation points, and the red points are the cluster scatter points.
Figure 16. The final segmentation result.
Figure 17. Euclidean clustering segmentation results for the small scene. Panels (a,c,e) show scatter distribution maps using 2 m, 2.5 m, and 3 m clustering thresholds, respectively; panels (b,d,f) show the cluster top views using 2 m, 2.5 m, and 3 m clustering thresholds. The black line shows the outer contour of the clustering result, and the red line box shows the area with a poor classification result.
Figure 18. Region growth segmentation results for the small scene. The red box shows the area with a poor classification result. Panel (a) shows the distribution of clustering points, and panel (b) shows the boundary of the clustering points.
Figure 19. Segmentation result map of the proposed method for the small scene. The red box shows the area with a poor classification result, the blue box shows the eliminated discrete points, and the black line is the outer contour of the clustering result. Panel (a) shows the distribution of clustering points, and panel (b) shows the boundary of the clustering points.
Figure 20. Segmentation side view of the proposed method for the small scene. The red box shows the disputed area.
Figure 21. Euclidean clustering segmentation results for the medium scene. Panels (a,c,e) show the scatter distribution maps using 1.5 m, 2 m, and 3 m clustering thresholds, respectively; panels (b,d,f) show the cluster top views using 1.5 m, 2 m, and 3 m clustering thresholds. The black line is the outer contour of the clustering result, and the red line box shows the area with a poor classification result.
Figure 22. Region-growing method segmentation results for the medium scene. Panel (a) shows a scatter plot, and panel (b) shows a cluster top view. The black line is the boundary of the clustering result, and the red box indicates the area with a poor classification result.
Figure 23. Euclidean clustering segmentation results for the large scene. Panels (a,c,e) show the scatter distribution maps using 1.5 m, 2 m, and 2.5 m clustering thresholds, respectively; panels (b,d,f) show the cluster top views using 1.5 m, 2 m, and 2.5 m clustering thresholds. The black line is the outer contour of the clustering result, and the red line box shows the area with a poor classification result.
Figure 24. Region-growing segmentation results for the large scene. Panel (a) shows a scatter plot, and panel (b) shows a cluster top view. The black line is the boundary of the clustering result, and the red box indicates the area with a poor classification result.
Figure 25. Segmentation result map of the proposed method for the large scene. Panel (a) shows the clustering plots, and panel (b) shows a cluster top view. The black line is the boundary of the clustering result. Panel (c) is the boundary of the clustering result.
Figure 26. Distributions of samples 1, 2, 3, and 4.
Figure 27. Distributions of samples 5 and 6.
Figure 28. Segmentation error of each sample area using the Euclidean clustering method. The black point is the correct segmented point cloud, the red point is the under-segmented point cloud, and the blue point is the over-segmented point cloud.
Figure 29. Segmentation error of each sample area using the region-growing method. The black point is the correct segmented point cloud, the red point is the under-segmented point cloud, and the blue point is the over-segmented point cloud.
Figure 30. Segmentation error of each sample area using our method. The black point is the correct segmented point cloud, the red point is the under-segmented point cloud, and the blue point is the over-segmented point cloud.
Figure 31. Comparison of error ratios. Panel (a) shows a comparison chart of Euclidean clustering and the region-growing method, panel (b) shows a comparison chart of Euclidean clustering and our method, and panel (c) shows a comparison chart of the region-growing method and our method.
Figure 32. Small-scene segmentation results with different constant coefficients. Panels (a,c,e) show the split scatter plots with constant coefficients of 2, 3, and 4, respectively; panels (b,d,f) show the segmented top view with constant coefficients of 2, 3, and 4. The black lines are the outer contours of different cluster points.
Figure 33. Medium-scene segmentation results with different constant coefficients. Panels (a,c,e) show split scatter plots with constant coefficients of 2, 3, and 4, respectively; panels (b,d,f) show the segmented top view with constant coefficients of 2, 3, and 4. The black lines show the outer contours of different cluster points.
Figure 34. Clustering segmentation results of ground points of different samples. The black points are the correct segmented ground points, the red points are the under-segmented ground points, and the blue points are the over-segmented ground points.