Search Results (185)

Search Parameters:
Keywords = ISPRS

18 pages, 232655 KiB  
Article
SFA-Net: Semantic Feature Adjustment Network for Remote Sensing Image Segmentation
by Gyutae Hwang, Jiwoo Jeong and Sang Jun Lee
Remote Sens. 2024, 16(17), 3278; https://doi.org/10.3390/rs16173278 - 3 Sep 2024
Viewed by 625
Abstract
Advances in deep learning and computer vision techniques have made impacts in the field of remote sensing, enabling efficient data analysis for applications such as land cover classification and change detection. Convolutional neural networks (CNNs) and transformer architectures have been utilized in visual perception algorithms due to their effectiveness in analyzing local features and global context. In this paper, we propose a hybrid transformer architecture that consists of a CNN-based encoder and transformer-based decoder. We propose a feature adjustment module that refines the multiscale feature maps extracted from an EfficientNet backbone network. The adjusted feature maps are integrated into the transformer-based decoder to perform the semantic segmentation of the remote sensing images. This paper refers to the proposed encoder–decoder architecture as a semantic feature adjustment network (SFA-Net). To demonstrate the effectiveness of the SFA-Net, experiments were thoroughly conducted with four public benchmark datasets, including the UAVid, ISPRS Potsdam, ISPRS Vaihingen, and LoveDA datasets. The proposed model achieved state-of-the-art accuracy on the UAVid, ISPRS Vaihingen, and LoveDA datasets for the segmentation of the remote sensing images. On the ISPRS Potsdam dataset, our method achieved comparable accuracy to the latest model while reducing the number of trainable parameters from 113.8 M to 10.7 M. Full article
(This article belongs to the Special Issue Deep Learning for Remote Sensing and Geodata)
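As a rough illustration of the hybrid design the abstract describes (a CNN encoder whose multiscale features are adjusted and then decoded by a transformer), the following minimal PyTorch sketch shows the general pattern. It is not the authors' SFA-Net code: the module names, channel sizes, and the tiny convolutional encoder standing in for the EfficientNet backbone are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in CNN encoder producing two feature scales (EfficientNet in the paper)."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        f1 = self.stage1(x)   # 1/2 resolution
        f2 = self.stage2(f1)  # 1/4 resolution
        return [f1, f2]

class FeatureAdjustment(nn.Module):
    """Illustrative 'adjustment': project each scale to a common width and re-weight channels."""
    def __init__(self, in_channels, out_channels=64):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, out_channels, 1)
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(out_channels, out_channels, 1), nn.Sigmoid())

    def forward(self, x):
        x = self.proj(x)
        return x * self.gate(x)

class HybridSegNet(nn.Module):
    """CNN encoder -> feature adjustment -> transformer over tokens -> per-pixel classes."""
    def __init__(self, num_classes=6, dim=64):
        super().__init__()
        self.encoder = TinyEncoder()
        self.adjust = nn.ModuleList([FeatureAdjustment(32, dim), FeatureAdjustment(64, dim)])
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Conv2d(dim, num_classes, 1)

    def forward(self, x):
        feats = [adj(f) for adj, f in zip(self.adjust, self.encoder(x))]
        h, w = feats[-1].shape[-2:]
        # Bring both scales to 1/4 resolution and sum them.
        fused = sum(F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
                    for f in feats)
        tokens = fused.flatten(2).transpose(1, 2)   # (B, H*W, C)
        tokens = self.decoder(tokens)
        fused = tokens.transpose(1, 2).reshape(-1, fused.shape[1], h, w)
        logits = self.head(fused)
        return F.interpolate(logits, scale_factor=4, mode="bilinear", align_corners=False)

if __name__ == "__main__":
    out = HybridSegNet()(torch.randn(1, 3, 128, 128))
    print(out.shape)  # torch.Size([1, 6, 128, 128])
```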
Figures

Figure 1: The overall architecture of the SFA-Net.
Figure 2: Transformer-based decoder. (a–c) present the decoder block, weighted function, and feature refinement head, respectively.
Figure 3: Visualization of the segmentation results on the UAVid dataset. (a) is ID 000300 in sequence 23, (b) is ID 000500 in sequence 28, (c) is ID 000000 in sequence 30, and (d) is ID 000500 in sequence 39.
Figure 4: Visualization of the entire set of IDs and their segmentation results on the ISPRS Potsdam dataset. (a) is ID 3_14 and (b) is ID 5_13.
Figure 5: Visualization of the segmentation results on the ISPRS Potsdam dataset. (a) is the 10th split of ID 3_14, (b) is the 12th split of ID 5_13, (c) is the 10th split of ID 6_14, and (d) is the 22nd split of ID 7_13.
Figure 6: Visualization of the entire set of IDs and their segmentation results on the ISPRS Vaihingen test dataset. (a) is area 6 and (b) is area 27.
Figure 7: Visualization of the segmentation results on the ISPRS Vaihingen dataset. (a) is the 4th split of area 31, (b) is the 5th split of area 33, (c) is the 4th split of area 38, and (d) is the 11th split of area 38.
Figure 8: Visualization of the segmentation results on the LoveDA dataset. (a) is 4430, (b) is 4378, and (c) is 5458.
Figure 9: Complexity vs. performance for each dataset. The horizontal and vertical axes denote FLOPs and the evaluation metric, respectively, and the bubble diameter denotes the number of parameters.
20 pages, 3112 KiB  
Article
Fast Semantic Segmentation of Ultra-High-Resolution Remote Sensing Images via Score Map and Fast Transformer-Based Fusion
by Yihao Sun, Mingrui Wang, Xiaoyi Huang, Chengshu Xin and Yinan Sun
Remote Sens. 2024, 16(17), 3248; https://doi.org/10.3390/rs16173248 - 2 Sep 2024
Viewed by 600
Abstract
For ultra-high-resolution (UHR) image semantic segmentation, striking a balance between computational efficiency and storage space is a crucial research direction. This paper proposes a Feature Fusion Network (EFFNet) to improve UHR image semantic segmentation performance. EFFNet designs a score map that can be embedded into the network for training purposes, enabling the selection of the most valuable features to reduce storage consumption, accelerate speed, and enhance accuracy. In the fusion stage, we improve upon previous redundant multiple feature fusion methods by utilizing a transformer structure for one-time fusion. Additionally, our combination of the transformer structure and multibranch structure allows it to be employed for feature fusion, significantly improving accuracy while ensuring calculations remain within an acceptable range. We evaluated EFFNet on the ISPRS two-dimensional semantic labeling Vaihingen and Potsdam datasets, demonstrating that its architecture offers an exceptionally effective solution with outstanding semantic segmentation precision and optimized inference speed. EFFNet substantially enhances critical performance metrics such as Intersection over Union (IoU), overall accuracy, and F1-score, highlighting its superiority as an architectural innovation in ultra-high-resolution remote sensing image semantic segmentation. Full article
(This article belongs to the Special Issue Deep Learning for Satellite Image Segmentation)
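The score-map idea in the abstract (a learned map that keeps only the most valuable local features) can be illustrated with a short, hedged PyTorch sketch. The 1 × 1 scoring convolution and the top-k keep ratio below are assumptions, not the published EFFNet implementation.

```python
import torch
import torch.nn as nn

class ScoreMapSelector(nn.Module):
    """Scores every spatial location with a sigmoid map and keeps only the top-k features."""
    def __init__(self, channels, keep_ratio=0.25):
        super().__init__()
        self.score = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())
        self.keep_ratio = keep_ratio

    def forward(self, feat):                      # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        scores = self.score(feat).flatten(1)      # (B, H*W), values in [0, 1]
        k = max(1, int(self.keep_ratio * h * w))
        topk_scores, idx = scores.topk(k, dim=1)  # indices of the most valuable locations
        flat = feat.flatten(2)                    # (B, C, H*W)
        idx = idx.unsqueeze(1).expand(-1, c, -1)  # (B, C, k)
        selected = flat.gather(2, idx)            # (B, C, k) features kept for fusion
        return selected, topk_scores

if __name__ == "__main__":
    selector = ScoreMapSelector(channels=64)
    kept, scores = selector(torch.randn(2, 64, 32, 32))
    print(kept.shape, scores.shape)  # torch.Size([2, 64, 256]) torch.Size([2, 256])
```

Discarding low-score locations before fusion is what reduces the storage and compute cost that the abstract highlights.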
Figures

Figure 1: Overview of the Efficient Feature Fusion Network (EFFNet). The network utilizes cropped and downsampled full-resolution patches and consists of both local and global branches. After passing the local feature maps through ResNet and applying one-dimensional convolution, the score map module employs a Sigmoid activation function to extract significant local features. These local features are then efficiently fused with global features using a fast fusion mechanism, resulting in a high-resolution and information-rich feature map that is utilized for final semantic segmentation. The objective is to enhance the accuracy and efficiency of the network by introducing two attention-based modules designed to reduce the processing load of local features while improving feature matching across samples.
Figure 2: Score map module. The input image is subjected to two successive layers of 3 × 3 convolutions using ResNet, resulting in a feature map of dimensions H × W × 1. Following the Sigmoid activation, the feature map is indexed, and high-value features are selectively retained.
Figure 3: The fast fusion mechanism facilitates seamless integration between the global and local branches, enabling extensive collaboration through the fusion of feature maps at each layer using multiple attention weights. The model's depth determines the number of layers, while the merging process occurs N times based on the number of cropped global patches. These attention weights are computed by leveraging local and global features such as Q, K, and V. The optimization objective in this study encompasses a primary loss derived from the merged results along with two additional losses.
Figure 4: GPU inference frames per second (FPS) and mean Intersection over Union (mIoU) accuracy evaluated on the (a) Vaihingen and (b) Potsdam datasets. EFFNet (represented by red dots) outperforms existing networks, including GLNet, in terms of both inference speed and accuracy for segmenting ultra-high-resolution images.
Figure 5: Semantic segmentation results when adopting different modules on (a) the Vaihingen and (b) the Potsdam datasets.
Figure 6: Ablation study of different transformer locations.
Figure 7: Comparison of semantic segmentation results with and without the score map on the (a) Vaihingen and (b) Potsdam datasets.
Figure 8: Ablation study of different numbers of patches.
23 pages, 2671 KiB  
Article
Multi-View Feature Fusion and Rich Information Refinement Network for Semantic Segmentation of Remote Sensing Images
by Jiang Liu, Shuli Cheng and Anyu Du
Remote Sens. 2024, 16(17), 3184; https://doi.org/10.3390/rs16173184 - 28 Aug 2024
Viewed by 339
Abstract
Semantic segmentation is currently a hot topic in remote sensing image processing. There are extensive applications in land planning and surveying. Many current studies combine Convolutional Neural Networks (CNNs), which extract local information, with Transformers, which capture global information, to obtain richer information. However, the fused feature information is not sufficiently enriched and it often lacks detailed refinement. To address this issue, we propose a novel method called the Multi-View Feature Fusion and Rich Information Refinement Network (MFRNet). Our model is equipped with the Multi-View Feature Fusion Block (MAFF) to merge various types of information, including local, non-local, channel, and positional information. Within MAFF, we introduce two innovative methods. The Sliding Heterogeneous Multi-Head Attention (SHMA) extracts local, non-local, and positional information using a sliding window, while the Multi-Scale Hierarchical Compressed Channel Attention (MSCA) leverages bar-shaped pooling kernels and stepwise compression to obtain reliable channel information. Additionally, we introduce the Efficient Feature Refinement Module (EFRM), which enhances segmentation accuracy by interacting the results of the Long-Range Information Perception Branch and the Local Semantic Information Perception Branch. We evaluate our model on the ISPRS Vaihingen and Potsdam datasets. We conducted extensive comparison experiments with state-of-the-art models and verified that MFRNet outperforms other models. Full article
(This article belongs to the Special Issue Image Enhancement and Fusion Techniques in Remote Sensing)
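As a hedged illustration of the "bar-shaped pooling" idea mentioned for MSCA, the sketch below shows a generic strip-pooling attention block in PyTorch: features are compressed along horizontal and vertical strips and then used to re-weight the input. The reduction factor, the compression layer, and the way the two strips are combined are assumptions, not the authors' module.

```python
import torch
import torch.nn as nn

class StripChannelAttention(nn.Module):
    """Pools features with horizontal and vertical bar-shaped kernels and uses the
    compressed strips to produce channel-wise, position-aware attention weights."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = max(channels // reduction, 8)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # -> (B, C, 1, W)
        self.compress = nn.Sequential(nn.Conv2d(channels, mid, 1), nn.ReLU())
        self.expand_h = nn.Conv2d(mid, channels, 1)
        self.expand_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):                               # x: (B, C, H, W)
        h_strip = self.compress(self.pool_h(x))         # (B, mid, H, 1)
        w_strip = self.compress(self.pool_w(x))         # (B, mid, 1, W)
        attn = torch.sigmoid(self.expand_h(h_strip)) * torch.sigmoid(self.expand_w(w_strip))
        return x * attn                                 # broadcasting yields (B, C, H, W)

if __name__ == "__main__":
    attn = StripChannelAttention(channels=64)
    print(attn(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```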
Figures

Graphical abstract

Figure 1: Summary of the characteristics of remote sensing images. The car types have different shapes, with RVs and containers having similar shapes.
Figure 2: The overall structure of MFRNet, containing a CNN encoder, our designed decoder MAFF, and the feature refinement module EFRM.
Figure 3: Specific design of MAFF and SSH. (a) Multi-View Feature Fusion Block (MAFF); (b) Semantic Segmentation Header (SSH).
Figure 4: Heatmap demonstration of the Multi-View Feature Fusion Block (MAFF) and Efficient Feature Refinement Module (EFRM).
Figure 5: The specific implementation of the Sliding Heterogeneous Multi-Head Attention (SHMA).
Figure 6: The specific design of the Multi-Scale Hierarchical Compressed Channel Attention (MSCA).
Figure 7: Detailed structure of the Efficient Feature Refinement Module (EFRM).
Figure 8: Visual results of the ablation experiment, with GT being ground truth.
Figure 9: Ablation results for SHMA window sizes on the Vaihingen and Potsdam datasets. Window sizes are shown on the horizontal axis and metric values (in percent) on the vertical axis.
Figure 10: Ablation results of MSCA compared to the use of Global Average Pooling (GAP).
Figure 11: Segmentation results on the Vaihingen dataset compared to other network models, with the blue box in the RGB image corresponding to the red box in the segmentation result, and GT indicating ground truth.
Figure 12: Comparison of segmentation results on the Potsdam dataset with other network models, where the blue box in the RGB image corresponds to the red box in the segmentation result, and GT represents ground truth.
Figure 13: Validation and training accuracy graphs.
2 pages, 613 KiB  
Correction
Correction: Agriesti et al. Assignment of a Synthetic Population for Activity-Based Modeling Employing Publicly Available Data. ISPRS Int. J. Geo-Inf. 2022, 11, 148
by Serio Agriesti, Claudio Roncoli and Bat-hen Nahmias-Biran
ISPRS Int. J. Geo-Inf. 2024, 13(8), 284; https://doi.org/10.3390/ijgi13080284 - 13 Aug 2024
Viewed by 331
Abstract
In the original publication [...] Full article
Figures

Figure 12: (b) Workers and their districts of residence as the origin (color-coded).
25 pages, 4045 KiB  
Article
MBT-UNet: Multi-Branch Transform Combined with UNet for Semantic Segmentation of Remote Sensing Images
by Bin Liu, Bing Li, Victor Sreeram and Shuofeng Li
Remote Sens. 2024, 16(15), 2776; https://doi.org/10.3390/rs16152776 - 29 Jul 2024
Viewed by 590
Abstract
Remote sensing (RS) images play an indispensable role in many key fields such as environmental monitoring, precision agriculture, and urban resource management. Traditional deep convolutional neural networks have the problem of limited receptive fields. To address this problem, this paper introduces a hybrid network model that combines the advantages of CNN and Transformer, called MBT-UNet. First, a multi-branch encoder design based on the pyramid vision transformer (PVT) is proposed to effectively capture multi-scale feature information; second, an efficient feature fusion module (FFM) is proposed to optimize the collaboration and integration of features at different scales; finally, in the decoder stage, a multi-scale upsampling module (MSUM) is proposed to further refine the segmentation results and enhance segmentation accuracy. We conduct experiments on the ISPRS Vaihingen dataset, the Potsdam dataset, the LoveDA dataset, and the UAVid dataset. Experimental results show that MBT-UNet surpasses state-of-the-art algorithms in key performance indicators, confirming its superior performance in high-precision remote sensing image segmentation tasks. Full article
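As a hedged sketch of the generic multi-scale fusion step that an FFM-style module performs (aligning features from different pyramid stages to one resolution and mixing them), the following PyTorch snippet shows one common way to do it. The channel counts and the concatenation-plus-1×1-convolution choice are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFeatureFusion(nn.Module):
    """Upsamples multi-scale pyramid features to the finest resolution,
    concatenates them, and mixes the result with a 1x1 convolution."""
    def __init__(self, in_channels=(64, 128, 256), out_channels=128):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.mix = nn.Sequential(
            nn.Conv2d(out_channels * len(in_channels), out_channels, 1),
            nn.BatchNorm2d(out_channels), nn.ReLU())

    def forward(self, feats):                      # list of (B, C_i, H_i, W_i), coarsest last
        target = feats[0].shape[-2:]               # finest spatial size
        aligned = [F.interpolate(p(f), size=target, mode="bilinear", align_corners=False)
                   for p, f in zip(self.proj, feats)]
        return self.mix(torch.cat(aligned, dim=1))

if __name__ == "__main__":
    feats = [torch.randn(1, 64, 64, 64), torch.randn(1, 128, 32, 32), torch.randn(1, 256, 16, 16)]
    print(SimpleFeatureFusion()(feats).shape)  # torch.Size([1, 128, 64, 64])
```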
Figures

Figure 1: Architecture of our proposed MBT-UNet. It includes a multi-branch PVT encoder, the FFM, and the MSUM.
Figure 2: Structure of the Mix-Transformer module.
Figure 3: Structure of the FFM. It fuses multi-scale features.
Figure 4: Structure of the MSUM. It performs multi-scale upsampling of features.
Figure 5: Comparison of segmentation results before and after using MBT on the Vaihingen dataset. (a) Image. (b) Ground truth. (c) P_UNet. (d) P_UNet + MBT. (e) P_UNet + MBT + FFM. (f) P_UNet + MBT + FFM + MSUM. The yellow box indicates the position in the original image, and the red boxes indicate false positives.
Figure 6: Comparison of segmentation results before and after using MBT on the LoveDA dataset. (a) Image. (b) Ground truth. (c) P_UNet. (d) P_UNet + MBT. (e) P_UNet + MBT + FFM. (f) P_UNet + MBT + FFM + MSUM. The yellow box indicates the position in the original image, and the black boxes indicate missed detections.
Figure 7: Comparison of segmentation results of different methods on the Vaihingen dataset. (a) Image. (b) Ground truth. (c) DeepLabv3+. (d) SegFormer. (e) ST-UNet. (f) SSNet. (g) STDSNet. (h) DSHNet. (i) MBT-UNet. The yellow box indicates the position in the original image, the red boxes indicate false positives, and the black boxes indicate missed detections.
Figure 8: Comparison of segmentation results of different methods on the Potsdam dataset. (a) Image. (b) Ground truth. (c) DeepLabv3+. (d) SegFormer. (e) ST-UNet. (f) SSNet. (g) STDSNet. (h) DSHNet. (i) MBT-UNet. The yellow box indicates the position in the original image, the red boxes indicate false positives, and the black boxes indicate missed detections.
Figure 9: Comparison of segmentation results of different methods on the LoveDA dataset. (a) Image. (b) Ground truth. (c) DeepLabv3+. (d) SegFormer. (e) ST-UNet. (f) SSNet. (g) STDSNet. (h) DSHNet. (i) MBT-UNet. The yellow box indicates the position in the original image, the red boxes indicate false positives, and the black boxes indicate missed detections.
Figure 10: Comparison of segmentation results of different methods on the UAVid dataset. (a) Image. (b) Ground truth. (c) DeepLabv3+. (d) SegFormer. (e) ST-UNet. (f) SSNet. (g) STDSNet. (h) DSHNet. (i) MBT-UNet. The yellow box indicates the position in the original image, and the red boxes indicate false positives.
26 pages, 12605 KiB  
Article
Active Bidirectional Self-Training Network for Cross-Domain Segmentation in Remote-Sensing Images
by Zhujun Yang, Zhiyuan Yan, Wenhui Diao, Yihang Ma, Xinming Li and Xian Sun
Remote Sens. 2024, 16(13), 2507; https://doi.org/10.3390/rs16132507 - 8 Jul 2024
Viewed by 602
Abstract
Semantic segmentation with cross-domain adaptation in remote-sensing images (RSIs) is crucial and mitigates the expense of manually labeling target data. However, the performance of existing unsupervised domain adaptation (UDA) methods is still significantly impacted by domain bias, leading to a considerable gap compared to supervised trained models. To address this, our work focuses on semi-supervised domain adaptation, selecting a small subset of target annotations through active learning (AL) that maximize information to improve domain adaptation. Overall, we propose a novel active bidirectional self-training network (ABSNet) for cross-domain semantic segmentation in RSIs. ABSNet consists of two sub-stages: a multi-prototype active region selection (MARS) stage and a source-weighted class-balanced self-training (SCBS) stage. The MARS approach captures the diversity in labeled source data by introducing multi-prototype density estimation based on Gaussian mixture models. We then measure inter-domain similarity to select complementary and representative target samples. Through fine-tuning with the selected active samples, we propose an enhanced self-training strategy SCBS, designed for weighted training on source data, aiming to avoid the negative effects of interfering samples. We conduct extensive experiments on the LoveDA and ISPRS datasets to validate the superiority of our method over existing state-of-the-art domain-adaptive semantic segmentation methods. Full article
(This article belongs to the Special Issue Geospatial Artificial Intelligence (GeoAI) in Remote Sensing)
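The multi-prototype density estimation step in the abstract can be illustrated, at a very high level, with scikit-learn's Gaussian mixture models: fit a small mixture per source class and rank target samples by how poorly the source mixtures explain them, then send the least well-explained ones for annotation. This is a hedged sketch of the general idea only; the feature dimensionality, number of components, and ranking rule are assumptions, not the ABSNet procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_class_mixtures(src_feats, src_labels, n_components=3, seed=0):
    """Fit one GMM per source class; each mixture plays the role of multiple prototypes."""
    mixtures = {}
    for cls in np.unique(src_labels):
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                              random_state=seed)
        mixtures[cls] = gmm.fit(src_feats[src_labels == cls])
    return mixtures

def select_active_samples(mixtures, tgt_feats, budget=10):
    """Score each target sample by its best log-likelihood under any class mixture and
    pick the lowest-scoring (least source-like, hence most informative) ones."""
    scores = np.max(np.stack([g.score_samples(tgt_feats) for g in mixtures.values()]), axis=0)
    return np.argsort(scores)[:budget]            # indices of samples to annotate

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(300, 16)); labels = rng.integers(0, 3, size=300)
    tgt = rng.normal(loc=1.5, size=(100, 16))     # shifted "target-domain" features
    mixtures = fit_class_mixtures(src, labels)
    print(select_active_samples(mixtures, tgt, budget=5))
```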
Figures

Figure 1: A domain shift exists between urban and rural scenarios and is manifested by differences in target characteristics and imbalances in the class distribution. The t-SNE [25] feature visualization for the UDA method using DCA [22] and for our ADA method is shown on the right.
Figure 2: Domain offsets of the instances of the Urban domain from the Rural domain based on a Gaussian distribution.
Figure 3: Architecture of the proposed ABSNet. The upper left part is the multi-prototype active region selection module, responsible for selecting and labeling the target-domain active samples that are the most informative for domain adaptation. The lower part represents the source-weighted class-balanced self-training process, in which the distance measurement from the source samples to the target distribution and the class entropy are incorporated for self-training.
Figure 4: Illustration of the multi-prototype active region selection module.
Figure 5: Illustration of the source-weighted class-balanced self-training process.
Figure 6: Segmentation visualization results of different domain-adaptive semantic segmentation methods on the Rural-to-Urban task. (a) Image. (b) Ground truth. (c) DCA. (d) MADA. (e) RIPU. (f) ABSNet.
Figure 7: Segmentation visualization results of different domain-adaptive semantic segmentation methods on the Urban-to-Rural task. (a) Image. (b) Ground truth. (c) DCA. (d) MADA. (e) RIPU. (f) ABSNet.
Figure 8: Segmentation visualization results of different domain-adaptive semantic segmentation methods on the VH-to-PD task. (a) Image. (b) Ground truth. (c) Alonso's. (d) MADA. (e) RIPU. (f) ABSNet.
Figure 9: Radar charts of mIoU (%) on the 7 categories of the LoveDA dataset with the baseline method and our ABSNet. (a) Rural-to-Urban. (b) Urban-to-Rural.
Figure 10: Performance improvement of our method for different numbers of active samples. (a) Rural-to-Urban. (b) Urban-to-Rural.
21 pages, 16543 KiB  
Article
Bidirectional Feature Fusion and Enhanced Alignment Based Multimodal Semantic Segmentation for Remote Sensing Images
by Qianqian Liu and Xili Wang
Remote Sens. 2024, 16(13), 2289; https://doi.org/10.3390/rs16132289 - 22 Jun 2024
Viewed by 890
Abstract
Image–text multimodal deep semantic segmentation leverages the fusion and alignment of image and text information and provides more prior knowledge for segmentation tasks. It is worth exploring image–text multimodal semantic segmentation for remote sensing images. In this paper, we propose a bidirectional feature fusion and enhanced alignment-based multimodal semantic segmentation model (BEMSeg) for remote sensing images. Specifically, BEMSeg first extracts image and text features by image and text encoders, respectively, and then the features are provided for fusion and alignment to obtain complementary multimodal feature representation. Secondly, a bidirectional feature fusion module is proposed, which employs self-attention and cross-attention to adaptively fuse image and text features of different modalities, thus reducing the differences between multimodal features. For multimodal feature alignment, the similarity between the image pixel features and text features is computed to obtain a pixel–text score map. Thirdly, we propose a category-based pixel-level contrastive learning on the score map to reduce the differences among the same category’s pixels and increase the differences among the different categories’ pixels, thereby enhancing the alignment effect. Additionally, a positive and negative sample selection strategy based on different images is explored during contrastive learning. Averaging pixel values across different training images for each category to set positive and negative samples compares global pixel information while also limiting sample quantity and reducing computational costs. Finally, the fused image features and aligned pixel–text score map are concatenated and fed into the decoder to predict the segmentation results. Experimental results on the ISPRS Potsdam, Vaihingen, and LoveDA datasets demonstrate that BEMSeg is superior to comparison methods on the Potsdam and Vaihingen datasets, with improvements in mIoU ranging from 0.57% to 5.59% and 0.48% to 6.15%, and compared with Transformer-based methods, BEMSeg also performs competitively on LoveDA dataset with improvements in mIoU ranging from 0.37% to 7.14%. Full article
(This article belongs to the Special Issue Image Processing from Aerial and Satellite Imagery)
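The pixel–text score map described above (similarity between every pixel embedding and each category's text embedding) is straightforward to express. The hedged PyTorch sketch below shows the generic computation; the temperature value and tensor shapes are assumptions rather than BEMSeg's settings.

```python
import torch
import torch.nn.functional as F

def pixel_text_score_map(pixel_feats, text_feats, temperature=0.07):
    """pixel_feats: (B, D, H, W) image-pixel embeddings.
    text_feats:  (C, D) one embedding per category.
    Returns a (B, C, H, W) map of scaled cosine similarities."""
    pixel = F.normalize(pixel_feats, dim=1)           # unit-length pixel embeddings
    text = F.normalize(text_feats, dim=1)             # unit-length text embeddings
    scores = torch.einsum("bdhw,cd->bchw", pixel, text)
    return scores / temperature

if __name__ == "__main__":
    score_map = pixel_text_score_map(torch.randn(2, 512, 32, 32), torch.randn(6, 512))
    print(score_map.shape)                            # torch.Size([2, 6, 32, 32])
    pred = score_map.argmax(dim=1)                    # per-pixel category with highest score
```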
Figures

Figure 1: The framework of the proposed BEMSeg, which consists of an image encoder, a text encoder, a multimodal feature fusion and alignment module, and a decoder. BFF and CPC denote the bidirectional feature fusion module and category-based pixel-level contrastive learning in the multimodal feature fusion and alignment module, and C denotes the number of categories. The colored squares at the top denote the text features of different categories.
Figure 2: The proposed attention-based bidirectional feature fusion module. This structure enables image-attention text features to be incorporated into text representations (and vice versa) through a dual-branch structure with an added self-attention mechanism.
Figure 3: Multi-scale image feature fusion network in the decoder of Semantic FPN.
Figure 4: Proportion of the number of pixels in each category in the Potsdam and Vaihingen remote sensing datasets.
Figure 5: Qualitative results of the comparison methods on some test images of the Potsdam dataset.
Figure 6: The IoU of each class for the comparison methods on the Vaihingen dataset.
Figure 7: mIoU of the BEMSeg model on the Potsdam dataset for different values of the parameters λ1 and λ2.
28 pages, 5447 KiB  
Review
A Systematic Literature Review and Bibliometric Analysis of Semantic Segmentation Models in Land Cover Mapping
by Segun Ajibola and Pedro Cabral
Remote Sens. 2024, 16(12), 2222; https://doi.org/10.3390/rs16122222 - 19 Jun 2024
Cited by 1 | Viewed by 1377
Abstract
Recent advancements in deep learning have spurred the development of numerous novel semantic segmentation models for land cover mapping, showcasing exceptional performance in delineating precise boundaries and producing highly accurate land cover maps. However, to date, no systematic literature review has comprehensively examined semantic segmentation models in the context of land cover mapping. This paper addresses this gap by synthesizing recent advancements in semantic segmentation models for land cover mapping from 2017 to 2023, drawing insights on trends, data sources, model structures, and performance metrics based on a review of 106 articles. Our analysis identifies the top journals in the field, including MDPI Remote Sensing, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, IEEE Transactions on Geoscience and Remote Sensing, IEEE Geoscience and Remote Sensing Letters, and ISPRS Journal of Photogrammetry and Remote Sensing. We find that research predominantly focuses on land cover, urban areas, precision agriculture, environment, coastal areas, and forests. Geographically, 35.29% of the study areas are located in China, followed by the USA (11.76%), France (5.88%), Spain (4%), and others. Sentinel-2, Sentinel-1, and Landsat satellites emerge as the most used data sources. Benchmark datasets such as ISPRS Vaihingen and Potsdam, LandCover.ai, DeepGlobe, and GID are frequently employed. Model architectures predominantly utilize encoder–decoder and hybrid convolutional neural network-based structures because of their impressive performance, with limited adoption of transformer-based architectures due to their computational complexity and slow convergence speed. Lastly, this paper highlights existing key research gaps in the field to guide future research directions. Full article
(This article belongs to the Special Issue Advances of Remote Sensing in Land Cover and Land Use Mapping)
Figures

Figure 1: Flow chart of the peer-review procedure.
Figure 2: Annual distribution of research studies.
Figure 3: Number of relevant publications in this field, distributed per relevant journal.
Figure 4: Geographic distribution of studies per country.
Figure 5: Occurrence of significant keywords driving the domain themes.
Figure 6: Land cover mapping domain studies.
Figure 7: Number of publications per domain study.
Figure 8: Number of publications per study location.
Figure 9: Benchmark datasets identified in the literature.
Figure 10: Percentage of articles employing different model structures in the retrieved data.
Figure 11: Analysis of semantic segmentation model structures on the ISPRS 2D labelling Potsdam dataset.
Figure 12: Analysis of semantic segmentation model structures on the ISPRS 2D labelling Vaihingen dataset.
24 pages, 3332 KiB  
Article
U-Net Ensemble for Enhanced Semantic Segmentation in Remote Sensing Imagery
by Ivica Dimitrovski, Vlatko Spasev, Suzana Loshkovska and Ivan Kitanovski
Remote Sens. 2024, 16(12), 2077; https://doi.org/10.3390/rs16122077 - 8 Jun 2024
Cited by 1 | Viewed by 1411
Abstract
Semantic segmentation of remote sensing imagery stands as a fundamental task within the domains of both remote sensing and computer vision. Its objective is to generate a comprehensive pixel-wise segmentation map of an image, assigning a specific label to each pixel. This facilitates in-depth analysis and comprehension of the Earth’s surface. In this paper, we propose an approach for enhancing semantic segmentation performance by employing an ensemble of U-Net models with three different backbone networks: Multi-Axis Vision Transformer, ConvFormer, and EfficientNet. The final segmentation maps are generated through a geometric mean ensemble method, leveraging the diverse representations learned by each backbone network. The effectiveness of the base U-Net models and the proposed ensemble is evaluated on multiple datasets commonly used for semantic segmentation tasks in remote sensing imagery, including LandCover.ai, LoveDA, INRIA, UAVid, and ISPRS Potsdam datasets. Our experimental results demonstrate that the proposed approach achieves state-of-the-art performance, showcasing its effectiveness and robustness in accurately capturing the semantic information embedded within remote sensing images. Full article
(This article belongs to the Special Issue GeoAI and EO Big Data Driven Advances in Earth Environmental Science)
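The geometric mean ensemble mentioned in the abstract simply averages the per-pixel class probabilities of the base models in log space and renormalizes. A hedged sketch follows; the epsilon and the assumption that each base model outputs softmax probabilities are mine, not the paper's exact recipe.

```python
import torch

def geometric_mean_ensemble(prob_maps, eps=1e-8):
    """prob_maps: list of (B, C, H, W) per-pixel class probabilities, one per base model.
    Returns the renormalized geometric mean of the probabilities."""
    log_mean = torch.stack([torch.log(p.clamp_min(eps)) for p in prob_maps]).mean(dim=0)
    fused = torch.exp(log_mean)
    return fused / fused.sum(dim=1, keepdim=True)       # renormalize over classes

if __name__ == "__main__":
    # Three stand-in "models": random softmax outputs for a 6-class problem.
    probs = [torch.softmax(torch.randn(1, 6, 64, 64), dim=1) for _ in range(3)]
    fused = geometric_mean_ensemble(probs)
    print(fused.shape, float(fused.sum(dim=1).mean()))  # torch.Size([1, 6, 64, 64]) 1.0
```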
Figures

Figure 1: Illustration of the U-Net architecture. The displayed U-Net is an encoder–decoder network with a contracting path (encoding part, left side) that reduces the height and width of the input images and an expansive path (decoding part, right side) that recovers the original dimensions of the input images.
Figure 2: MaxViT architecture with hierarchical design and basic building block that unifies MBConv, block, and grid attention layers.
Figure 3: Overall framework of ConvFormer and architecture of the ConvFormer block, which has a token mixer of separable depthwise convolutions.
Figure 4: Architecture of EfficientNet-B0 with MBConv as basic building blocks.
Figure 5: Geometric mean ensemble training and testing strategy of three base models: MaxViT-S, ConvFormer-M36, and EfficientNet-B7.
Figure 6: Example images and inference masks from the LoveDA dataset. The first row displays example images. Each subsequent row shows the segmentation masks generated by a different model for the corresponding image in the first row.
Figure 7: Confusion matrix for the U-Net ensemble model on the UAVid dataset.
Figure 8: Example images, ground-truth masks, and inference masks from the UAVid dataset. The first row shows example images, the second row shows the corresponding ground-truth masks, and the third row shows the prediction results of the U-Net ensemble model as in Table 2.
Figure 9: Cropped image, ground-truth mask, and predicted mask from the U-Net ensemble model, as outlined in Table 2. The images highlight a predominant region containing the labels human and moving car from the UAVid dataset.
Figure 10: Confusion matrix for the U-Net ensemble model applied to the LandCover.ai dataset.
Figure 11: Example images, ground-truth masks, and inference masks from the LandCover.ai dataset. The first row shows example images, and the second row shows the corresponding ground-truth masks. Each subsequent row shows the segmentation masks generated by a different model for the corresponding image in the first row.
Figure 12: Example images, ground-truth masks, and inference masks from the Potsdam dataset. The first row shows example images, and the second row shows the corresponding ground-truth masks. Each subsequent row shows the segmentation masks generated by a different model for the corresponding image in the first row.
Figure 13: Confusion matrix for the U-Net ensemble model on the Potsdam dataset, with clutter (left) and without clutter (right).
Figure 14: Example images and inference masks for the INRIA dataset. The first row shows example images, and the second row shows the prediction results of the U-Net ensemble model as in Table 2.
14 pages, 3490 KiB  
Article
Rapid and Sensitive Detection by Combining Electric Field Effects and Surface Plasmon Resonance: A Theoretical Study
by Qijie Qiu and Yan Xu
Micromachines 2024, 15(5), 653; https://doi.org/10.3390/mi15050653 - 15 May 2024
Cited by 1 | Viewed by 879
Abstract
Surface plasmon resonance (SPR) has been extensively employed in biological sensing, environmental detection, and the chemical industry. Nevertheless, the performance of conventional SPR biosensors can be limited by the transport of analyte molecules to the sensing surface, notably when small molecules or low levels of substances are being detected. In this study, a rapid and highly sensitive SPR biosensor is introduced that enhances the collection of target analytes by integrating AC electroosmosis (ACEO) and dielectrophoresis (DEP). Both phenomena arise from AC electric fields, which can be tailored by shaping the interdigitated electrodes (IDEs) that also serve as the SPR biomarker sensing area. The effects of different parameters (e.g., the frequency and voltage of the AC electric field, as well as the microelectrode structure) on the operation of the iSPR (interdigitated SPR) biosensor are considered, and the iSPR biosensor is optimized for sensitivity. The results of this study confirm that the iSPR can efficiently concentrate small molecules in the SPR sensing area, increasing the SPR response by an order of magnitude and shortening the detection time. Such a rapid and sensitive sensor is critical for the development of on-site diagnostics in a wide variety of human and animal health applications. Full article
(This article belongs to the Special Issue Micromachines for Dielectrophoresis, 3rd Edition)
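For reference, the quantities that appear in the figure captions below (∇|E|² and the Clausius–Mossotti factor) enter the standard time-averaged dielectrophoretic force expression for a spherical particle. These are textbook relations, not equations quoted from the paper:

```latex
% Time-averaged DEP force on a spherical particle of radius r in a medium of permittivity eps_m
\langle \mathbf{F}_{\mathrm{DEP}} \rangle
  = 2\pi \varepsilon_m r^{3}\, \mathrm{Re}\!\left[K(\omega)\right]\, \nabla |\mathbf{E}|^{2},
\qquad
K(\omega) = \frac{\varepsilon_p^{*} - \varepsilon_m^{*}}{\varepsilon_p^{*} + 2\varepsilon_m^{*}},
\qquad
\varepsilon^{*} = \varepsilon - \mathrm{i}\,\frac{\sigma}{\omega},
```

where K(ω) is the Clausius–Mossotti factor and ε_p*, ε_m* are the complex permittivities of the particle and the medium; a positive Re[K] (positive DEP) pulls particles toward regions of high field gradient.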
Figures

Figure 1: Schematic and principle of AC DEP–ACEO-enhanced surface plasmon resonance. Au microelectrodes are first patterned by photolithography and metal liftoff on a glass substrate. With an out-of-phase AC voltage applied to the Au electrodes, the electrical double layer moves horizontally along the electrode surfaces. This motion generates a hydrodynamic rotational flow in the microfluidic channel, and the target analyte is polarized. The hydrodynamic flow (ACEO) and DEP facilitate the transport of the target biomolecules to the sensing surface and their surface binding reaction.
Figure 2: (a) Numerical simulation of the electric field strength under different electrode gaps. (b) Calculated distribution of ∇|E|² with a gap distance of E_G = 10 μm and f = 10⁵ Hz. (c) Relationship between V_ACEO and frequency: the AC electroosmotic velocity calculated at locations x from the electrode edge under 0.0002 S m⁻¹ and 1 V. (d) The maximum of ∇|E|² calculated under different electrode gaps as a linear function of V_pp. (e) Clausius–Mossotti factor calculated for polystyrene particles (ε_p = 10, σ_p = 1 S/m) in DI water. (f) The maximum ACEO velocity as V_pp (f = 10⁵ Hz) increases from 0 to 3 V.
Figure 3: Simulation of the electric fields for the evaluation of DEP and ACEO. (a) Calculated distribution of the ACEO flow on the electrode surfaces under V_pp = 3 V and f = 10⁵ Hz. (b) Numerical simulation of the electric field strength in iSPR biochips. (c) Details of the DEP force at distances of 5 μm from the electrode edge along the red dashed line in (b). (d) Plot of log10(|F_DEP|/|F_drag|) comparing the vertical forces exerted on PS particles under an AC electric field (dielectrophoretic vs. Stokes drag force). (e) The number of captured PS particles in the SPR sensing area at 15 s.
Figure 4: Target analyte concentration profiles in the iSPR microfluidic channel of the device operated under DEP and ACEO at V_pp = 3 V and f = 10⁵ Hz, and with diffusion only, respectively. The flow field direction is shown by the black arrows. The initial concentration for both cases is set at 5 mol/L. The inlet (left edges) and the outlet (right edges) for the diffusion-only case (a) and the DEP–ACEO case (b) are defined as open boundaries with no target analyte replenishment.
Figure 5: Simulation (a) and experimental (b–e) results of the pDEP and ACEO effects on polystyrene microbeads. (a) Results of the COMSOL particle tracing temporal study with a 3 V_pp voltage and the same electrical parameters as previously; a relative buffer permittivity of 80 is used in each model. While positive DEP typically attracts objects to the edges of electrodes, electro-osmosis can drag microbeads toward the center of the electrodes, which facilitates SPR detection. (b–d) Representative images of polystyrene microbead collection using the iSPR chip (under 10× magnification): the iSPR chip with 5 μm diameter polystyrene microbeads at 10⁵ Hz and 3 V, suspended in 0.0002 S m⁻¹ media with a low concentration of particles. (e) Comparison of the concentration changes for DEP–ACEO and diffusion only from the COMSOL temporal study.
Full article ">
19 pages, 2039 KiB  
Article
EAD-Net: Efficiently Asymmetric Network for Semantic Labeling of High-Resolution Remote Sensing Images with Dynamic Routing Mechanism
by Qiongqiong Hu, Feiting Wang and Ying Li
Remote Sens. 2024, 16(9), 1478; https://doi.org/10.3390/rs16091478 - 23 Apr 2024
Viewed by 665
Abstract
Semantic labeling of high-resolution remote sensing images (HRRSIs) holds a significant position in the remote sensing domain. Although numerous deep-learning-based segmentation models have enhanced segmentation precision, their complexity leads to a significant increase in parameters and computational requirements. While ensuring segmentation accuracy, it is also crucial to improve segmentation speed. To address this issue, we propose an efficient asymmetric deep learning network for HRRSIs, referred to as EAD-Net. First, EAD-Net employs ResNet50 as the backbone without pooling, instead of the RepVGG block, to extract rich semantic features while reducing model complexity. Second, a dynamic routing module is proposed in EAD-Net to adjust routing based on the pixel occupancy of small-scale objects. Concurrently, a channel attention mechanism is used to preserve their features even with minimal occupancy. Third, a novel asymmetric decoder is introduced, which uses convolutional operations while discarding skip connections. This not only effectively reduces redundant features but also allows using low-level image features to enhance EAD-Net’s performance. Extensive experimental results on the ISPRS 2D semantic labeling challenge benchmark demonstrate that EAD-Net achieves state-of-the-art (SOTA) accuracy while reducing model complexity and inference time, with the mean Intersection over Union (mIoU) score reaching 87.38% and 93.10% on the Vaihingen and Potsdam datasets, respectively. Full article
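The channel attention mechanism mentioned in the abstract (used to preserve small-object features) typically follows the squeeze-and-excitation pattern. The sketch below is that generic pattern in PyTorch, offered as an illustration rather than EAD-Net's actual module; the reduction ratio is an assumption.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: global context -> per-channel weights."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                    # squeeze to (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                                      # x: (B, C, H, W)
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights                                     # re-weight each channel

if __name__ == "__main__":
    print(ChannelAttention(64)(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```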
Figures

Figure 1: The structure of EAD-Net.
Figure 2: The architecture of the Re-Parameterization Visual Geometry Group (RepVGG) block.
Figure 3: The architecture of the dynamic routing module.
Figure 4: The structure of the Adaptive and Selective Perceptual Pooling.
Figure 5: The structure of the channel attention module.
Figure 6: A sample patch from the ISPRS Vaihingen dataset and its corresponding ground truth in (a), a sample patch from the ISPRS Potsdam dataset and its corresponding ground truth in (b), and the different categories labeled by different colors in (c).
Figure 7: Visual comparisons with deep learning models in the local evaluation on the ISPRS Vaihingen dataset.
Figure 8: Visual comparisons with deep learning models in the local evaluation on the ISPRS Vaihingen dataset.
Figure 9: Visual comparisons with deep learning models in the local evaluation on the ISPRS Vaihingen dataset.
Figure 10: Classification maps of an original image in the Vaihingen dataset.
Figure 11: Visual comparisons with deep learning models in the local evaluation on the ISPRS Potsdam dataset.
Figure 12: Visual comparisons with deep learning models in the local evaluation on the ISPRS Potsdam dataset.
Figure 13: Visual comparisons with deep learning models in the local evaluation on the ISPRS Potsdam dataset.
Figure 14: Ablation study on the ISPRS Vaihingen dataset.
Figure 15: Ablation study on the ISPRS Potsdam dataset.
Full article ">
23 pages, 7834 KiB  
Article
A Multiscale Filtering Method for Airborne LiDAR Data Using Modified 3D Alpha Shape
by Di Cao, Cheng Wang, Meng Du and Xiaohuan Xi
Remote Sens. 2024, 16(8), 1443; https://doi.org/10.3390/rs16081443 - 18 Apr 2024
Viewed by 1011
Abstract
The complexity of terrain features poses a substantial challenge in the effective processing and application of airborne LiDAR data, particularly in regions characterized by steep slopes and diverse objects. In this paper, we propose a novel multiscale filtering method utilizing a modified 3D alpha shape algorithm to increase the ground point extraction accuracy in complex terrain. Our methodology comprises three pivotal stages: preprocessing for outlier removal and potential ground point extraction; the deployment of a modified 3D alpha shape to construct multiscale point cloud layers; and the use of a multiscale triangulated irregular network (TIN) densification process for precise ground point extraction. In each layer, the threshold is adaptively determined based on the corresponding α. Points closer to the TIN surface than the threshold are identified as ground points. The performance of the proposed method was validated using a classical benchmark dataset provided by the ISPRS and an ultra-large-scale ground filtering dataset called OpenGF. The experimental results demonstrate that this method is effective, with an average total error and a kappa coefficient on the ISPRS dataset of 3.27% and 88.97%, respectively. When tested in the large scenarios of the OpenGF dataset, the proposed method outperformed four classical filtering methods and achieved accuracy comparable to that of the best of learning-based methods. Full article
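The TIN densification step in the abstract, accepting a point as ground when it lies close enough to a surface triangulated from the current ground seeds, can be sketched with SciPy. This is a deliberately simplified, hedged version: it uses the vertical distance to an interpolated TIN surface and a fixed threshold, whereas the paper adapts the threshold per layer and measures the distance to the nearest facet.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def densify_ground(seeds, candidates, threshold=0.3):
    """seeds: (N, 3) ground seed points; candidates: (M, 3) points to test.
    Returns a boolean mask marking candidates accepted as ground."""
    # Triangulate the seeds in the XY plane and interpolate the TIN height at each candidate.
    surface = LinearNDInterpolator(seeds[:, :2], seeds[:, 2])
    tin_z = surface(candidates[:, :2])
    dist = np.abs(candidates[:, 2] - tin_z)
    # Points outside the seed triangulation get NaN heights and are rejected here.
    return np.isfinite(dist) & (dist < threshold)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xy = rng.uniform(0, 100, size=(200, 2))
    seeds = np.column_stack([xy, 0.05 * xy[:, 0]])            # gently sloping seed terrain
    cand_xy = rng.uniform(0, 100, size=(500, 2))
    cand_z = 0.05 * cand_xy[:, 0] + rng.normal(0, 0.5, 500)   # noisy candidate heights
    mask = densify_ground(seeds, np.column_stack([cand_xy, cand_z]))
    print(int(mask.sum()), "of", len(mask), "candidates accepted as ground")
```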
Figures

Graphical abstract

Figure 1: Workflow of the proposed method for filtering airborne LiDAR data.
Figure 2: Illustration of the 3D alpha shape: (a) the original point cloud of a rabbit; (b) the convex hull of the original point cloud; (c,d) alpha shapes of the original point cloud, with α = 0.05 m in (c) and α = 0.01 m in (d).
Figure 3: Comparison between the 3D alpha shape and the modified 3D alpha shape: (a) the original point cloud (red for nonground points and blue for ground points); (b) result of the 3D alpha shape; (c) result of the modified 3D alpha shape. The green translucent surfaces in (b,c) are the resulting shapes.
Figure 4: Overview of the modified 3D alpha shape algorithm.
Figure 5: The extraction of point cloud layers: (a) the top layer extracted using a sufficiently large α (radius of the ball); (b–d) the lower layers extracted using gradually decreasing α.
Figure 6: Procedure for multiscale TIN densification: (a) the top layer that provides seed points; (b–d) extraction of ground points in the lower layers of the data pyramid; (e) extraction of the final ground points.
Figure 7: Illustration of distance threshold determination: (a) d is the distance between the point P and its nearest TIN facet; (b) DT is the distance threshold and θ is the relative slope.
Figure 8: Filtering results for samp11: (a) the reference DTM, (b) the filtered DTM, and (c) the distribution of type I and type II errors. Most of the buildings were accurately filtered, and the road on the slope was accurately extracted (green rectangle). The yellow rectangles indicate areas where the roofs of terraced buildings were misclassified.
Figure 9: Filtering results for samp22: (a) the reference DTM, (b) the filtered DTM, and (c) the distribution of errors. The buildings and the bridge were accurately filtered (green rectangle). The yellow rectangles indicate areas where type II errors occurred due to the misclassification of some edges of the terraced floor, resulting in the roadside edge being cut off in the DTM.
Figure 10: Filtering results for samp31: (a) the reference DTM, (b) the filtered DTM, and (c) the distribution of errors. The large buildings in the middle were accurately filtered. The yellow rectangles indicate areas where an edge of the terraced floor was misclassified, resulting in oversmoothing of the DTM in the corresponding part.
Figure 11: Filtering results for samp42: (a) the reference DTM, (b) the filtered DTM, and (c) the distribution of errors. The large buildings of the railway station were well filtered. However, several points on the roof in the lower-left corner were misclassified due to a lack of nearby ground points.
Figure 12: Filtering results for samp53: (a) the reference DTM, (b) the filtered DTM, and (c) the distribution of type I and type II errors. The slopes and cliffs were successfully extracted. The yellow rectangles indicate areas where low vegetation was misclassified, resulting in small protuberances in the DTM.
Figure 13: Filtering results for samp61: (a) the reference DTM, (b) the filtered DTM, and (c) the distribution of type I and type II errors. The vegetation on terrain with large gaps and steep slopes was accurately filtered.
Figure 14: Filtering results for samp71: (a) the reference DTM, (b) the filtered DTM, and (c) the distribution of type I and type II errors. The bridge was correctly filtered (green rectangle), whereas some points on the roadside were misclassified due to small-scale undulations (yellow rectangles).
Figure 15: Filtering results on test samples in OpenGF: (first column) Test I; (second column) Test II (without outliers); (third column) Test III. (a–c) The DSMs of the test samples; (d–f) the DTMs constructed from the filtering results; (g–i) the distribution of type I and type II errors (red for type I errors and blue for type II errors).
Figure 16: Calculation time (seconds) of MASF, CSF, and PMF.
Figure 17: Ground seeds of representative samples (samp11 for urban areas and samp53 for rural areas): (a,e) overlay of the reference DTM and the ground seeds (red points) extracted using the cell lowest point; (b,f) DTM generated with these seeds; (c,g) overlay of the reference DTM and the ground seeds extracted with our method; (d,h) DTM generated with these seeds.
Figure 18: Analysis of sensitivity to the parameter α_step: (a) total errors for samples in urban areas; (b) total errors for samples in rural areas; (c) mean total errors of the samples; (d) standard deviation of the total error for each sample.
Full article ">
20 pages, 4443 KiB  
Article
PointMM: Point Cloud Semantic Segmentation CNN under Multi-Spatial Feature Encoding and Multi-Head Attention Pooling
by Ruixing Chen, Jun Wu, Ying Luo and Gang Xu
Remote Sens. 2024, 16(7), 1246; https://doi.org/10.3390/rs16071246 - 31 Mar 2024
Cited by 2 | Viewed by 1117
Abstract
For the actual collected point cloud data, there are widespread challenges such as semantic inconsistency, density variations, and sparse spatial distribution. A network called PointMM is developed in this study to enhance the accuracy of point cloud semantic segmentation in complex scenes. The main contribution of PointMM involves two aspects: (1) Multi-spatial feature encoding. We leverage a novel feature encoding module to learn multi-spatial features from the neighborhood point set obtained by k-nearest neighbors (KNN) in the feature space. This enhances the network’s ability to learn the spatial structures of various samples more finely and completely. (2) Multi-head attention pooling. We leverage a multi-head attention pooling module to address the limitations of symmetric function-based pooling, such as maximum and average pooling, in terms of losing detailed feature information. This is achieved by aggregating multi-spatial and attribute features of point clouds, thereby enhancing the network’s ability to transmit information more comprehensively and accurately. Experiments on publicly available point cloud datasets S3DIS and ISPRS 3D Vaihingen demonstrate that PointMM effectively learns features at different levels, while improving the semantic segmentation accuracy of various objects. Compared to 12 state-of-the-art methods reported in the literature, PointMM outperforms the runner-up by 2.3% in OA on the ISPRS 3D Vaihingen dataset, and achieves the third best performance in both OA and MioU on the S3DIS dataset. Both achieve a satisfactory balance between OA, F1, and MioU. Full article
(This article belongs to the Special Issue Remote Sensing Image Classification and Semantic Segmentation)
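The two ingredients highlighted in the abstract, neighborhoods gathered by KNN in feature space and attention-based aggregation in place of max/average pooling, can be sketched as follows. This is a minimal PyTorch illustration with assumed shapes, head counts, and module names, not the authors' PointMM implementation.

```python
# Minimal sketch, not the PointMM code: KNN in feature space + multi-head attention pooling.
import torch
import torch.nn as nn

def knn_in_feature_space(feats: torch.Tensor, k: int) -> torch.Tensor:
    """feats: (N, C) per-point features. Returns (N, k) neighbor indices found in feature space."""
    dists = torch.cdist(feats, feats)                      # (N, N) pairwise feature-space distances
    _, idx = torch.topk(dists, k, dim=-1, largest=False)   # k smallest distances per point
    return idx

class AttentionPooling(nn.Module):
    """Aggregates each point's neighborhood with multi-head attention instead of max/avg pooling."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, feats: torch.Tensor, neighbor_idx: torch.Tensor) -> torch.Tensor:
        neighbors = feats[neighbor_idx]                    # (N, k, C) gathered neighbor features
        query = feats.unsqueeze(1)                         # (N, 1, C): each point queries its neighborhood
        pooled, _ = self.attn(query, neighbors, neighbors) # weighted aggregation over the k neighbors
        return pooled.squeeze(1)                           # (N, C)

feats = torch.randn(1024, 64)                              # 1024 points with 64-dim features (assumed sizes)
idx = knn_in_feature_space(feats, k=16)
print(AttentionPooling(64)(feats, idx).shape)              # torch.Size([1024, 64])
```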
Show Figures

Figure 1. PointMM network structure. (The thin arrow represents the flowchart of the network framework, while the thick arrow indicates the various components of the downsampling layer.)
Figure 2. Multi-Spatial Feature Encoding module.
Figure 3. Multi-head attention pooling module.
Figure 4. Segmentation results of each module.
Figure 5. The mIoU of different sampling densities based on area 6.
Figure 6. The OA of different neighborhood points based on area 6.
Figure 7. Segmentation results of different methods. (The red circle represents the incorrectly segmented area.)
25 pages, 4894 KiB  
Article
A Spectral–Spatial Context-Boosted Network for Semantic Segmentation of Remote Sensing Images
by Xin Li, Xi Yong, Tao Li, Yao Tong, Hongmin Gao, Xinyuan Wang, Zhennan Xu, Yiwei Fang, Qian You and Xin Lyu
Remote Sens. 2024, 16(7), 1214; https://doi.org/10.3390/rs16071214 - 29 Mar 2024
Cited by 2 | Viewed by 909
Abstract
Semantic segmentation of remote sensing images (RSIs) is pivotal for numerous applications in urban planning, agricultural monitoring, and environmental conservation. However, traditional approaches have primarily emphasized learning within the spatial domain, which frequently leads to suboptimal feature discrimination. Considering the inherent spectral qualities of RSIs, it is essential to bolster these representations by incorporating the spectral context in conjunction with spatial information to improve discriminative capacity. In this paper, we introduce the spectral–spatial context-boosted network (SSCBNet), an innovative network designed to enhance the accuracy of semantic segmentation in RSIs. SSCBNet integrates synergetic attention (SYA) layers and cross-fusion modules (CFMs) to harness both spectral and spatial information, addressing the intrinsic complexities of urban and natural landscapes within RSIs. Extensive experiments on the ISPRS Potsdam and LoveDA datasets reveal that SSCBNet surpasses existing state-of-the-art models, achieving remarkable results in F1-scores, overall accuracy (OA), and mean intersection over union (mIoU). Ablation studies confirm the significant contribution of SYA layers and CFMs to the model’s performance, emphasizing the effectiveness of these components in capturing detailed contextual cues. Full article
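As a rough illustration of fusing a spatial feature map with a spectral (frequency-domain) feature map, the sketch below cross-gates the two branches and projects them back to a single tensor. The module name, gating design, and shapes are assumptions made for illustration; this is not the paper's CFM or SYA.

```python
# Hedged sketch of spectral-spatial cross-fusion; design choices here are illustrative assumptions.
import torch
import torch.nn as nn

class SimpleCrossFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate_spa = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_spe = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.project = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, spatial: torch.Tensor, spectral: torch.Tensor) -> torch.Tensor:
        # Each branch is re-weighted by a gate computed from the other branch,
        # then the two are concatenated and projected back to `channels`.
        spa = spatial * self.gate_spe(spectral)
        spe = spectral * self.gate_spa(spatial)
        return self.project(torch.cat([spa, spe], dim=1))

x_spatial = torch.randn(1, 64, 128, 128)   # assumed spatial-branch feature map
x_spectral = torch.randn(1, 64, 128, 128)  # assumed spectral-branch feature map
print(SimpleCrossFusion(64)(x_spatial, x_spectral).shape)  # torch.Size([1, 64, 128, 128])
```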
Show Figures

Figure 1. Visualizations of different frequency components: (a) components of the R (red) band, (b) components of the G (green) band, and (c) components of the B (blue) band. The RGB image comes from the LoveDA dataset [39]; LL, LH, HL, and HH represent the low-frequency, horizontal, vertical, and high-frequency components (projected by discrete wavelet transformation), respectively.
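The LL/LH/HL/HH split named in this caption corresponds to a single-level 2-D discrete wavelet transform. A minimal PyWavelets sketch is shown below; the wavelet choice and the band naming follow common conventions and are assumptions, since the paper's exact transform settings are not given here.

```python
# Single-level 2-D DWT of one image band; 'haar' and the LL/LH/HL/HH naming are assumptions.
import numpy as np
import pywt

band = np.random.rand(256, 256)                 # stand-in for one band (e.g., the R channel)
LL, (LH, HL, HH) = pywt.dwt2(band, "haar")      # approximation + horizontal/vertical/diagonal details
print(LL.shape, LH.shape, HL.shape, HH.shape)   # each (128, 128) for a 256 x 256 input
```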
Figure 2">
Figure 2. The topological framework of SSCBNet.
Figure 3. Pipeline of the convolution block in SSCBNet.
Figure 4. Pipeline of SYA.
Figure 5. Pipeline of FSA.
Figure 6. Pipeline of DDSA.
Figure 7. Pipeline of PSA.
Figure 8. Pipeline of CFM.
Figure 9. Example of the ISPRS Potsdam dataset.
Figure 10. Example of the LoveDA dataset.
Figure 11. Visual inspections of random samples from the ISPRS Potsdam test set.
Figure 12. Visual inspections of random samples from the LoveDA test set.
Figure 13. Visual inspections on validating SYA and CFM (random samples from the ISPRS Potsdam test set): (a) input image, (b) ground truth, (c) predicted by SSCBNet, (d) predicted by SSCBNet without SYA, (e) predicted by SSCBNet without CFM.
Figure 14. Visual inspections on validating SYA and CFM (random samples from the LoveDA test set): (a) input image, (b) ground truth, (c) predicted by SSCBNet, (d) predicted by SSCBNet without SYA, (e) predicted by SSCBNet without CFM.
27 pages, 46596 KiB  
Article
Adaptive Clustering for Point Cloud
by Zitao Lin, Chuanli Kang, Siyi Wu, Xuanhao Li, Lei Cai, Dan Zhang and Shiwei Wang
Sensors 2024, 24(3), 848; https://doi.org/10.3390/s24030848 - 28 Jan 2024
Cited by 1 | Viewed by 1364
Abstract
Point cloud segmentation plays an important role in practical applications such as remote sensing, mobile robotics, and 3D modeling. However, current point cloud segmentation methods still have limitations when applied to large-scale scenes. This paper therefore proposes an adaptive clustering segmentation method in which the threshold for clustering points is calculated from the characteristic parameters of adjacent points. After the preliminary segmentation of the point cloud is completed, the results are refined according to the standard deviation of the cluster points, clusters whose point counts do not meet the conditions are segmented further, and the segmentation of the scene point cloud is finally obtained. To evaluate the method, this study used point cloud data from a park in Guilin, Guangxi, China. The experimental results show that the method is more practical and efficient than competing methods and can effectively segment all ground objects and ground points in a scene. Compared with other segmentation methods, which are easily affected by parameter settings, the proposed method is highly robust. To verify its generality, we also tested it on a public dataset provided by ISPRS; it achieves good segmentation results on multiple samples and can distinguish noise points in a scene. Full article
(This article belongs to the Section Remote Sensors)
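The core idea in the abstract, deriving the clustering threshold from the characteristics of adjacent points rather than fixing it by hand, can be sketched as below. The mean-plus-k-sigma rule and the use of DBSCAN are illustrative assumptions; they do not reproduce the paper's exact threshold formula or its refinement steps.

```python
# Hedged sketch: derive a clustering distance threshold from local neighbor statistics,
# then cluster with that threshold. Formula and clustering backend are assumptions.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def adaptive_threshold(points: np.ndarray, k: int = 8, coeff: float = 3.0) -> float:
    """points: (N, 3). Returns a distance threshold derived from k-NN statistics."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(points)  # +1 because each point is its own neighbor
    dists, _ = nn.kneighbors(points)
    mean_knn = dists[:, 1:].mean(axis=1)                  # mean neighbor distance per point
    return float(mean_knn.mean() + coeff * mean_knn.std())

points = np.random.rand(5000, 3) * 50.0                   # stand-in for a scene point cloud
eps = adaptive_threshold(points)
labels = DBSCAN(eps=eps, min_samples=10).fit_predict(points)  # -1 marks low-density/noise points
print(f"threshold = {eps:.2f} m, clusters = {labels.max() + 1}")
```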
Show Figures

Figure 1. The workflow of large-scale scene segmentation based on the adaptive clustering method. Sample 42 provided by ISPRS is used as an example.
Figure 2. Flowchart of normal vector calculation.
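Figure 2 gives the paper's flowchart for the normal vector calculation; for orientation, a common PCA-based variant is sketched below, where each point's normal is taken as the eigenvector of its local covariance matrix with the smallest eigenvalue. The exact procedure in the paper may differ.

```python
# Common PCA-based normal estimation (assumed variant, not necessarily the paper's procedure).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def estimate_normals(points: np.ndarray, k: int = 10) -> np.ndarray:
    """points: (N, 3). Normal = smallest-eigenvalue eigenvector of each local covariance matrix."""
    nn = NearestNeighbors(n_neighbors=k).fit(points)
    _, idx = nn.kneighbors(points)
    normals = np.empty_like(points)
    for i, neighbors in enumerate(points[idx]):           # (k, 3) neighborhood of point i
        cov = np.cov(neighbors - neighbors.mean(axis=0), rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
        normals[i] = eigvecs[:, 0]                        # direction of least variance
    return normals

normals = estimate_normals(np.random.rand(1000, 3))
print(normals.shape)  # (1000, 3)
```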
Figure 3">
Figure 3. Flowchart of computation bound.
Figure 4. Flowchart of cluster merging.
Figure 5. Study data distribution. White points are the original large scene study area, blue points are the medium scene study area, and red points are the small scene study area.
Figure 6. The normal vector pointing of the whole study data. A normal vector is drawn at an interval of 6 points.
Figure 7. The normal vector pointing distribution of each block. The purple area is the border area, the blue area is the forest area, and the red area is the regular area.
Figure 8. High-density point distribution. The red points comprise more than 30 adjacent points within 4 times the average density, and the blue points comprise the other points in the scene, except for the high-density points.
Figure 9. Initial clustering results. Different colors represent different clusters.
Figure 10. Classification results of point clouds. Panel (a) is the initial classification results of a certain number of point clouds, and different colors represent different clusters; panel (b) is the three-dimensional boundary of preliminary clusters, and the red bounding box indicates the same cluster class; panel (c) is the classification results with scatter points; panel (d) is the boundary distribution of clusters.
Figure 11. The result of low-density clustering points and merged clustering points. Panel (a) shows the distribution of elimination points; the blue points are scattered points, and the red points are points with a certain density. Panel (b) shows the distribution of segmentation results; the black points are elimination points.
Figure 12. The clustering results after removing the low-density clustering point cloud blocks. Panel (a) shows the distribution of the clustering points after removing the low-density clustering point cloud blocks; panel (b) shows the boundary after removing the low-density clustering point cloud.
Figure 13. Low-density clustering results. Panel (a) shows the low-density point cloud clustering results; panel (b) shows the low-density point cloud clustering boundary.
Figure 14. The merged clustering point cloud and its boundary. Panel (a) shows the distribution of clustering points, and panel (b) shows the boundary of the clustering points.
Figure 15. The remaining scatter distribution. The black points are the completed cluster segmentation points, and the red points are the cluster scatter points.
Figure 16. The final segmentation result.
Figure 17. Euclidean clustering segmentation results for the small scene. Panels (a,c,e) show scatter distribution maps using 2 m, 2.5 m, and 3 m clustering thresholds, respectively; panels (b,d,f) show the cluster top views using 2 m, 2.5 m, and 3 m clustering thresholds. The black line shows the outer contour of the clustering result, and the red line box shows the area with a poor classification result.
Figure 18. Region growth segmentation results for the small scene. The red box shows the area with a poor classification result. Panel (a) shows the distribution of clustering points, and panel (b) shows the boundary of the clustering points.
Figure 19. Segmentation result map of the proposed method for the small scene. The red box shows the area with a poor classification result, the blue box shows the eliminated discrete points, and the black line is the outer contour of the clustering result. Panel (a) shows the distribution of clustering points, and panel (b) shows the boundary of the clustering points.
Figure 20. Segmentation side view of the proposed method for the small scene. The red box shows the disputed area.
Figure 21. Euclidean clustering segmentation results for the medium scene. Panels (a,c,e) show the scatter distribution maps using 1.5 m, 2 m, and 3 m clustering thresholds, respectively; panels (b,d,f) show the cluster top views using 1.5 m, 2 m, and 3 m clustering thresholds. The black line is the outer contour of the clustering result, and the red line box shows the area with a poor classification result.
Figure 22. Region-growing method segmentation results for the medium scene. Panel (a) shows a scatter plot, and panel (b) shows a cluster top view. The black line is the boundary of the clustering result, and the red box indicates the area with a poor classification result.
Figure 23. Euclidean clustering segmentation results for the large scene. Panels (a,c,e) show the scatter distribution maps using 1.5 m, 2 m, and 2.5 m clustering thresholds, respectively; panels (b,d,f) show the cluster top views using 1.5 m, 2 m, and 2.5 m clustering thresholds. The black line is the outer contour of the clustering result, and the red line box shows the area with a poor classification result.
Figure 24. Region-growing segmentation results for the large scene. Panel (a) shows a scatter plot, and panel (b) shows a cluster top view. The black line is the boundary of the clustering result, and the red box indicates the area with a poor classification result.
Figure 25. Segmentation result map of the proposed method for the large scene. Panel (a) shows the clustering plots, and panel (b) shows a cluster top view. The black line is the boundary of the clustering result. Panel (c) is the boundary of the clustering result.
Figure 26. Distributions of samples 1, 2, 3, and 4.
Figure 27. Distributions of samples 5 and 6.
Figure 28. Segmentation error of each sample area using the Euclidean clustering method. The black point is the correct segmented point cloud, the red point is the under-segmented point cloud, and the blue point is the over-segmented point cloud.
Figure 29. Segmentation error of each sample area using the region-growing method. The black point is the correct segmented point cloud, the red point is the under-segmented point cloud, and the blue point is the over-segmented point cloud.
Figure 30. Segmentation error of each sample area using our method. The black point is the correct segmented point cloud, the red point is the under-segmented point cloud, and the blue point is the over-segmented point cloud.
Figure 31. Comparison of error ratios. Panel (a) shows a comparison chart of Euclidean clustering and the region-growing method, panel (b) shows a comparison chart of Euclidean clustering and our method, and panel (c) shows a comparison chart of the region-growing method and our method.
Figure 32. Small-scene segmentation results with different constant coefficients. Panels (a,c,e) show the split scatter plots with constant coefficients of 2, 3, and 4, respectively; panels (b,d,f) show the segmented top view with constant coefficients of 2, 3, and 4. The black lines are the outer contours of different cluster points.
Figure 33. Medium-scene segmentation results with different constant coefficients. Panels (a,c,e) show split scatter plots with constant coefficients of 2, 3, and 4, respectively; panels (b,d,f) show the segmented top view with constant coefficients of 2, 3, and 4. The black lines show the outer contours of different cluster points.
Figure 34. Clustering segmentation results of ground points of different samples. The black points are the correct segmented ground points, the red points are the under-segmented ground points, and the blue points are the over-segmented ground points.