Fast Semantic Segmentation of Ultra-High-Resolution Remote Sensing Images via Score Map and Fast Transformer-Based Fusion
Figure 1. Overview of the Efficient Future Fusion Network (EFFNet). The network takes cropped full-resolution patches and a downsampled full image as input and consists of a local and a global branch. After the local feature maps pass through ResNet and a one-dimensional convolution, the score map module applies a Sigmoid activation to extract significant local features. These local features are then efficiently fused with the global features by the fast fusion mechanism, yielding a high-resolution, information-rich feature map used for the final semantic segmentation. The two attention-based modules are designed to reduce the processing load on local features while improving feature matching across samples.
Figure 2. Score map module. The input is passed through two successive 3 × 3 convolution layers of ResNet, producing a feature map of dimensions H × W × 1. After Sigmoid activation, the feature map is indexed and only the high-value features are retained.
Figure 3. The fast fusion mechanism integrates the global and local branches, enabling extensive collaboration by fusing the feature maps at each layer with multiple attention weights. The model's depth determines the number of layers, and the merging is performed N times, once per cropped global patch. The attention weights are computed from the local and global features serving as *Q*, *K*, and *V*. The optimization objective comprises a primary loss on the merged result plus two additional losses.
Figure 4. GPU inference speed (frames per second, FPS) versus mean Intersection over Union (mIoU) on the (a) Vaihingen and (b) Potsdam datasets. EFFNet (red dots) outperforms existing networks, including GLNet, in both inference speed and accuracy when segmenting ultra-high-resolution images.
Figure 5. Semantic segmentation results when adopting different modules on (a) the Vaihingen and (b) the Potsdam datasets.
Figure 6. Ablation study of different transformer locations.
Figure 7. Comparison of semantic segmentation results with and without the score map on the (a) Vaihingen and (b) Potsdam datasets.
Figure 8. Ablation study of different numbers of patches.
Abstract
1. Introduction
- The Efficient Future Fusion Network (EFFNet), a novel approach that improves both the accuracy and the speed of semantic segmentation for ultra-high-resolution images, is proposed.
- A score map module is proposed to reduce fusion operations and increase efficiency. The module is based on a dimension-reduction convolutional attention mechanism, which computes a global feature vector through global average pooling and learns the relationships between channels with a one-dimensional convolution (a hedged sketch of this idea follows this list).
- A fast fusion mechanism is introduced to improve the efficiency of multiscale feature fusion. It promotes seamless integration between the global and local branches, achieving extensive collaboration by fusing the feature maps at each layer with multiple attention weights (see the second sketch after this list).
- Experimental results show that EFFNet outperforms other state-of-the-art network architectures on the challenging Vaihingen and Potsdam datasets, improving both efficiency and accuracy.
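To make the score map idea concrete, here is a minimal PyTorch sketch. It assumes an ECA-style channel attention (global average pooling, one-dimensional convolution, Sigmoid) followed by retention of only the highest-scoring patches; the module name, the `keep_ratio` parameter, and the patch-selection rule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ScoreMapModule(nn.Module):
    """Hypothetical sketch: channel attention via global average pooling +
    1-D convolution + Sigmoid, then selection of the top-scoring local
    patches so that only significant local features enter the fusion stage."""

    def __init__(self, channels: int, kernel_size: int = 3, keep_ratio: float = 0.5):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # global feature vector per channel
        self.conv = nn.Conv1d(1, 1, kernel_size,       # 1-D conv learns cross-channel relations
                              padding=kernel_size // 2, bias=False)
        self.keep_ratio = keep_ratio

    def forward(self, patches: torch.Tensor):
        # patches: (N, C, H, W) feature maps of the N cropped local patches
        n, c, _, _ = patches.shape
        y = self.pool(patches).view(n, 1, c)           # (N, 1, C)
        w = torch.sigmoid(self.conv(y)).view(n, c, 1, 1)
        weighted = patches * w                         # channel-reweighted features
        scores = weighted.mean(dim=(1, 2, 3))          # one scalar score per patch
        k = max(1, int(self.keep_ratio * n))           # keep only high-value patches
        idx = scores.topk(k).indices
        return weighted[idx], idx
```

Calling `ScoreMapModule(256)(patch_feats)` would return the reweighted features of the retained patches together with their indices, so downstream fusion can skip low-scoring patches entirely.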
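Similarly, a minimal sketch of one fast-fusion layer, following the description in the Figure 3 caption: local features supply the query *Q* while global features supply the key *K* and value *V* of scaled dot-product attention. The token shapes, head count, and residual-plus-norm wiring are assumptions.

```python
import torch
import torch.nn as nn

class FastFusionBlock(nn.Module):
    """Hypothetical single fusion layer: local tokens attend to global
    tokens via multi-head attention (local -> Q, global -> K, V)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        # local_feat:  (B, L_local, dim)  tokens from the selected local patches
        # global_feat: (B, L_global, dim) tokens from the downsampled full image
        fused, _ = self.attn(query=local_feat, key=global_feat, value=global_feat)
        return self.norm(local_feat + fused)  # residual connection keeps local detail
```

In EFFNet this fusion is applied at every layer of the network and repeated over the N cropped patches; the sketch shows a single layer only.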
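Finally, the Figure 3 caption states that the objective combines a primary loss on the merged result with two additional losses. A hedged sketch of such a three-term objective, assuming the auxiliary terms supervise the global and local branches (as in GLNet) with an illustrative weight of 0.3; neither assumption is confirmed by the paper:

```python
import torch.nn.functional as F

def three_term_loss(merged_logits, global_logits, local_logits,
                    target, target_down, target_patch, aux_weight=0.3):
    """Primary cross-entropy loss on the merged prediction plus two
    auxiliary losses; branch assignments and weighting are assumptions."""
    main = F.cross_entropy(merged_logits, target)
    aux_global = F.cross_entropy(global_logits, target_down)  # downsampled labels
    aux_local = F.cross_entropy(local_logits, target_patch)   # per-patch labels
    return main + aux_weight * (aux_global + aux_local)
```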
2. Related Work
2.1. Semantic Segmentation of Remote Sensing Images
2.2. Multiscale, Context Aggregation, and Attention Mechanism
2.3. Network for High-Resolution Images
3. Method
3.1. Overview
3.2. Score Map Module
3.3. Fast Fusion Mechanism
3.4. Training Process and Loss Function
4. Experiments
4.1. Dataset
4.2. Implementation Details
4.3. Evaluation Metric
4.4. Quantitative Results: Accuracy and Inference Speed Comparison
4.5. Visualization Results
4.6. Ablation Study
4.6.1. Location of Transformer
4.6.2. Number of Transformer Blocks
4.6.3. Without Score Map
4.6.4. Number of Patches
5. Discussion and Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Pan, X.; Gao, L.; Marinoni, A.; Zhang, B.; Yang, F.; Gamba, P. Semantic labeling of high resolution aerial imagery and LiDAR data with fine segmentation network. Remote Sens. 2018, 10, 743. [Google Scholar] [CrossRef]
- Kamilaris, A.; Prenafeta-Boldú, F.X. Deep learning in agriculture: A survey. Comput. Electron. Agric. 2018, 147, 70–90. [Google Scholar] [CrossRef]
- Ji, S.; Wei, S.; Lu, M. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans. Geosci. Remote Sens. 2018, 57, 574–586. [Google Scholar] [CrossRef]
- Li, W.; Guo, Q.; Elkan, C. A positive and unlabeled learning algorithm for one-class classification of remote-sensing data. IEEE Trans. Geosci. Remote Sens. 2010, 49, 717–725. [Google Scholar] [CrossRef]
- Zhang, C.; Xie, Z. Object-based vegetation mapping in the Kissimmee River watershed using HyMap data and machine learning techniques. Wetlands 2013, 33, 233–244. [Google Scholar] [CrossRef]
- Liu, C.; Frazier, P.; Kumar, L. Comparative assessment of the measures of thematic classification accuracy. Remote Sens. Environ. 2007, 107, 606–616. [Google Scholar] [CrossRef]
- Fassnacht, F.E.; Latifi, H.; Stereńczak, K.; Modzelewska, A.; Lefsky, M.; Waser, L.T.; Straub, C.; Ghosh, A. Review of studies on tree species classification from remotely sensed data. Remote Sens. Environ. 2016, 186, 64–87. [Google Scholar] [CrossRef]
- Stow, D.A.; Hope, A.; McGuire, D.; Verbyla, D.; Gamon, J.; Huemmrich, F.; Houston, S.; Racine, C.; Sturm, M.; Tape, K.; et al. Remote sensing of vegetation and land-cover change in Arctic Tundra Ecosystems. Remote Sens. Environ. 2004, 89, 281–308. [Google Scholar] [CrossRef]
- Ascher, S.; Pincus, E. The Filmmaker’s Handbook: A Comprehensive Guide for the Digital Age; Penguin: New York, NY, USA, 1999. [Google Scholar]
- Lilly, P. Samsung Launches Insanely Wide 32:9 Aspect Ratio Monitor with HDR and FreeSync 2. 2017. Available online: https://www.pcgamer.com/samsung-launches-a-massive-49-inch-ultrawide-hdr-monitor-with-freesync-2/ (accessed on 31 August 2024).
- Akundy, V.A.; Wang, Z. 4K or not?—Automatic image resolution assessment. In Proceedings of the Image Analysis and Recognition: 17th International Conference, ICIAR 2020, Póvoa de Varzim, Portugal, 24–26 June 2020; Proceedings, Part I 17. Springer: Berlin/Heidelberg, Germany, 2020; pp. 61–65. [Google Scholar]
- Dong, S.; Li, Y.; Zhang, Z.; Gou, T.; Xie, M. A transfer-learning-based windspeed estimation on the ocean surface: Implication for the requirements on the spatial-spectral resolution of remote sensors. Appl. Intell. 2024, 54, 7603–7620. [Google Scholar] [CrossRef]
- Du, X.; He, S.; Yang, H.; Wang, C. Multi-Field Context Fusion Network for Semantic Segmentation of High-Spatial-Resolution Remote Sensing Images. Remote Sens. 2022, 14, 5830. [Google Scholar] [CrossRef]
- Su, Y.; Cheng, J.; Bai, H.; Liu, H.; He, C. Semantic segmentation of very-high-resolution remote sensing images via deep multi-feature learning. Remote Sens. 2022, 14, 533. [Google Scholar] [CrossRef]
- Smith, L.N.; Topin, N. Super-convergence: Very fast training of neural networks using large learning rates. In Proceedings of the Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, Baltimore, MD, USA, 15–17 April 2019; SPIE: Bellingham, WA, USA, 2019; Volume 11006, pp. 369–386. [Google Scholar]
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3349–3364. [Google Scholar] [CrossRef]
- Paszke, A.; Chaurasia, A.; Kim, S.; Culurciello, E. ENet: A deep neural network architecture for real-time semantic segmentation. arXiv 2016, arXiv:1606.02147. [Google Scholar]
- Chen, W.; Jiang, Z.; Wang, Z.; Cui, K.; Qian, X. Collaborative global-local networks for memory-efficient segmentation of ultra-high resolution images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8924–8933. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Ma, X.; Zhang, X.; Pun, M.O.; Liu, M. A multilevel multimodal fusion transformer for remote sensing semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5403215. [Google Scholar] [CrossRef]
- Ma, X.; Zhang, X.; Pun, M.O. RS3Mamba: Visual State Space Model for Remote Sensing Image Semantic Segmentation. IEEE Geosci. Remote Sens. Lett. 2024, 21, 6011405. [Google Scholar] [CrossRef]
- Ma, X.; Zhang, X.; Wang, Z.; Pun, M.O. Unsupervised domain adaptation augmented by mutually boosted attention for semantic segmentation of VHR remote sensing images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5400515. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Kampffmeyer, M.; Salberg, A.B.; Jenssen, R. Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1–9. [Google Scholar]
- Uijlings, J.R.; Van De Sande, K.E.; Gevers, T.; Smeulders, A.W. Selective search for object recognition. Int. J. Comput. Vis. 2013, 104, 154–171. [Google Scholar] [CrossRef]
- Chantharaj, S.; Pornratthanapong, K.; Chitsinpchayakun, P.; Panboonyuen, T.; Vateekul, P.; Lawavirojwong, S.; Srestasathiern, P.; Jitkajornwanich, K. Semantic segmentation on medium-resolution satellite images using deep convolutional networks with remote sensing derived indices. In Proceedings of the 2018 IEEE 15th International Joint Conference on Computer Science and Software Engineering (JCSSE), Nakhonpathom, Thailand, 11–13 July 2018; pp. 1–6. [Google Scholar]
- Bao, Y.; Liu, W.; Gao, O.; Lin, Z.; Hu, Q. E-Unet++: A Semantic Segmentation Method for Remote Sensing Images. In Proceedings of the 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Chongqing, China, 18–20 June 2021; Volume 4, pp. 1858–1862. [Google Scholar] [CrossRef]
- Chen, S.; Zhang, B. RSUnet: A New Full-scale Unet for Semantic Segmentation of Remote Sensing Images. 2022. Available online: https://www.researchsquare.com/article/rs-1211375/v1 (accessed on 31 August 2024).
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv 2014, arXiv:1412.7062. [Google Scholar]
- Chen, L.C.; Yang, Y.; Wang, J.; Xu, W.; Yuille, A.L. Attention to scale: Scale-aware semantic image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3640–3649. [Google Scholar]
- Xia, F.; Wang, P.; Chen, L.C.; Yuille, A.L. Zoom better to see clearer: Human and object parsing with hierarchical auto-zoom net. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part V 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 648–663. [Google Scholar]
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703. [Google Scholar]
- Zhang, J.; Lin, S.; Ding, L.; Bruzzone, L. Multi-scale context aggregation for semantic segmentation of remote sensing images. Remote Sens. 2020, 12, 701. [Google Scholar] [CrossRef]
- Wang, Z.; Zhou, Y.; Wang, F.; Wang, S.; Qin, G.; Zou, W.; Zhu, J. A Multi-Scale Edge Constraint Network for the Fine Extraction of Buildings from Remote Sensing Images. Remote Sens. 2023, 15, 927. [Google Scholar] [CrossRef]
- Liu, Y.; Gross, L.; Li, Z.; Li, X.; Fan, X.; Qi, W. Automatic building extraction on high-resolution remote sensing imagery using deep convolutional encoder-decoder with spatial pyramid pooling. IEEE Access 2019, 7, 128774–128786. [Google Scholar] [CrossRef]
- Nong, Z.; Su, X.; Liu, Y.; Zhan, Z.; Yuan, Q. Boundary-Aware Dual-Stream Network for VHR Remote Sensing Images Semantic Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 5260–5268. [Google Scholar] [CrossRef]
- He, G.; Dong, Z.; Feng, P.; Muhtar, D.; Zhang, X. Dual-Range Context Aggregation for Efficient Semantic Segmentation in Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 2500605. [Google Scholar] [CrossRef]
- Ma, H.; Yang, H.; Huang, D. Boundary guided context aggregation for semantic segmentation. arXiv 2021, arXiv:2110.14587. [Google Scholar]
- Bai, H.; Cheng, J.; Huang, X.; Liu, S.; Deng, C. HCANet: A hierarchical context aggregation network for semantic segmentation of high-resolution remote sensing images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 6002105. [Google Scholar] [CrossRef]
- Liu, Z.; Li, J.; Song, R.; Wu, C.; Liu, W.; Li, Z.; Li, Y. Edge Guided Context Aggregation Network for Semantic Segmentation of Remote Sensing Imagery. Remote Sens. 2022, 14, 1353. [Google Scholar] [CrossRef]
- Chen, Z.; Zhao, J.; Deng, H. Global Multi-Attention UResNeXt for Semantic Segmentation of High-Resolution Remote Sensing Images. Remote Sens. 2023, 15, 1836. [Google Scholar] [CrossRef]
- Liu, K.H.; Lin, B.Y. MSCSA-Net: Multi-scale channel spatial attention network for semantic segmentation of remote sensing images. Appl. Sci. 2023, 13, 9491. [Google Scholar] [CrossRef]
- Guo, R.; Liu, J.; Li, N.; Liu, S.; Chen, F.; Cheng, B.; Duan, J.; Li, X.; Ma, C. Pixel-wise classification method for high resolution remote sensing imagery using deep neural networks. ISPRS Int. J. Geo-Inf. 2018, 7, 110. [Google Scholar] [CrossRef]
- Liu, P.; Liu, X.; Liu, M.; Shi, Q.; Yang, J.; Xu, X.; Zhang, Y. Building footprint extraction from high-resolution images via spatial residual inception convolutional neural network. Remote Sens. 2019, 11, 830. [Google Scholar] [CrossRef]
- Alam, M.; Wang, J.F.; Guangpei, C.; Yunrong, L.; Chen, Y. Convolutional neural network for the semantic segmentation of remote sensing images. Mob. Netw. Appl. 2021, 26, 200–215. [Google Scholar] [CrossRef]
- Qiao, W.; Shen, L.; Wang, J.; Yang, X.; Li, Z. A weakly supervised semantic segmentation approach for damaged building extraction from postearthquake high-resolution remote-sensing images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 6002705. [Google Scholar] [CrossRef]
- Wang, Y.; Li, Y.; Chen, W.; Li, Y.; Dang, B. DNAS: Decoupling Neural Architecture Search for High-Resolution Remote Sensing Image Semantic Segmentation. Remote Sens. 2022, 14, 3864. [Google Scholar] [CrossRef]
- Li, X.; Lei, L.; Kuang, G. Multilevel adaptive-scale context aggregating network for semantic segmentation in high-resolution remote sensing images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 6003805. [Google Scholar] [CrossRef]
- Chong, Y.; Chen, X.; Pan, S. Context union edge network for semantic segmentation of small-scale objects in very high resolution remote sensing images. IEEE Geosci. Remote Sens. Lett. 2020, 19, 6003805. [Google Scholar] [CrossRef]
- Zhou, L.; Zhao, H.; Liu, Z.; Cai, K.; Liu, Y.; Zuo, X. MHLDet: A Multi-Scale and High-Precision Lightweight Object Detector Based on Large Receptive Field and Attention Mechanism for Remote Sensing Images. Remote Sens. 2023, 15, 4625. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef]
- Zhang, H.; Dana, K.; Shi, J.; Zhang, Z.; Wang, X.; Tyagi, A.; Agrawal, A. Context encoding for semantic segmentation. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7151–7160. [Google Scholar]
- Lin, G.; Milan, A.; Shen, C.; Reid, I. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1925–1934. [Google Scholar]
- Marmanis, D.; Schindler, K.; Wegner, J.D.; Galliani, S.; Datcu, M.; Stilla, U. Classification with an edge: Improving semantic image segmentation with boundary detection. ISPRS J. Photogramm. Remote Sens. 2018, 135, 158–172. [Google Scholar] [CrossRef]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 801–818, ISBN 978-3-030-01233-5. [Google Scholar]
- Mou, L.; Hua, Y.; Zhu, X.X. A relation-augmented fully convolutional network for semantic segmentation in aerial scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12416–12425. [Google Scholar]
- Shan, L.; Wang, W. MBNet: A Multi-Resolution Branch Network for Semantic Segmentation of Ultra-High Resolution Images. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Virtual, 7–13 May 2022; pp. 2589–2593. [Google Scholar]
- Shan, L.; Li, M.; Li, X.; Bai, Y.; Lv, K.; Luo, B.; Chen, S.B.; Wang, W. Uhrsnet: A semantic segmentation network specifically for ultra-high-resolution images. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 1460–1466. [Google Scholar]
- Li, Q.; Yang, W.; Liu, W.; Yu, Y.; He, S. From contexts to locality: Ultra-high resolution image segmentation via locality-aware contextual correlation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 7252–7261. [Google Scholar]
- Huynh, C.; Tran, A.T.; Luu, K.; Hoai, M. Progressive semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16755–16764. [Google Scholar]
- Chen, W.; Li, Y.; Dang, B.; Zhang, Y. EHSNet: End-to-End Holistic Learning Network for Large-Size Remote Sensing Image Semantic Segmentation. arXiv 2022, arXiv:2211.11316. [Google Scholar]
- Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1290–1299. [Google Scholar]
- Wang, X.; Zhang, X.; Cao, Y.; Wang, W.; Shen, C.; Huang, T. Seggpt: Segmenting everything in context. arXiv 2023, arXiv:2304.03284. [Google Scholar]
- Prades, J.; Safont, G.; Salazar, A.; Vergara, L. Estimation of the number of endmembers in hyperspectral images using agglomerative clustering. Remote Sens. 2020, 12, 3585. [Google Scholar] [CrossRef]
Results on the Vaihingen dataset (per-class F1 scores, overall accuracy (OA), mean F1, and mIoU; all values in %):

Model Name | Impervious Surface | Building | Low Vegetation | Tree | Car | OA | Mean F1 | mIoU |
---|---|---|---|---|---|---|---|---|
FCN-8s [24] | 90.0 | 93.0 | 77.7 | 86.5 | 80.4 | 88.3 | 85.5 | 75.5 |
UNet [52] | 90.5 | 93.3 | 79.6 | 87.5 | 76.4 | 89.2 | 85.5 | 75.5 |
SegNet [53] | 90.2 | 93.7 | 78.5 | 85.8 | 83.9 | 88.5 | 86.4 | 76.8 |
EncNet [54] | 91.2 | 94.1 | 79.2 | 86.9 | 83.7 | 89.4 | 87.0 | 77.8 |
RefineNet [55] | 91.1 | 94.1 | 79.8 | 87.2 | 82.3 | 88.9 | 86.9 | 77.1 |
CCEM [56] | 91.5 | 93.8 | 79.4 | 87.3 | 83.5 | 88.6 | 87.1 | 78.0 |
DeepLabv3+ [57] | 91.4 | 94.7 | 79.6 | 87.6 | 85.8 | 89.9 | 87.8 | 79.0 |
S-RA-FCN [58] | 90.5 | 93.8 | 79.6 | 87.5 | 82.6 | 89.2 | 86.8 | 77.3 |
GLNet [18] | 89.3 | 92.4 | 79.0 | 85.7 | 79.6 | 88.5 | 85.2 | 78.4 |
MBNet [59] | 90.4 | 93.3 | 79.2 | 86.0 | 80.1 | 85.0 | 85.8 | 77.9 |
UHRSNet [60] | 90.1 | 92.9 | 79.4 | 86.2 | 79.7 | 85.7 | 85.7 | 78.7 |
FCtL [61] | 90.4 | 90.7 | 80.2 | 87.9 | 84.0 | 86.6 | 86.1 | 80.0 |
MagNet [62] | 91.1 | 92.7 | 79.0 | 83.6 | 84.1 | 85.9 | 86.2 | 79.8 |
EHSNet [63] | 87.9 | 89.7 | 78.2 | 81.9 | 80.7 | 83.7 | 87.4 | 78.3 |
Mask2Former [64] | 89.9 | 91.8 | 80.4 | 82.1 | 81.4 | 84.9 | 88.9 | 81.9 |
SegGPT [65] | 88.3 | 89.6 | 77.0 | 86.4 | 83.4 | 84.9 | 89.0 | 80.3 |
EFFNet | 92.2 | 95.3 | 82.1 | 88.8 | 87.0 | 91.0 | 89.1 | 81.1 |
Results on the Potsdam dataset (per-class F1 scores, OA, mean F1, and mIoU; all values in %):

Model Name | Impervious Surface | Building | Low Vegetation | Tree | Car | OA | Mean F1 | mIoU |
---|---|---|---|---|---|---|---|---|
FCN-8s [24] | 89.9 | 93.7 | 83.0 | 85.2 | 93.5 | 87.8 | 89.1 | 71.7 |
UNet [52] | 88.2 | 91.1 | 82.8 | 84.9 | 91.6 | 86.2 | 87.7 | 68.5 |
SegNet [53] | 87.8 | 90.7 | 81.0 | 84.7 | 89.7 | 85.7 | 86.8 | 66.2 |
EncNet [54] | 91.0 | 94.9 | 84.4 | 85.9 | 93.6 | 89.0 | 90.0 | 73.2 |
RefineNet [55] | 88.1 | 93.1 | 85.6 | 86.3 | 90.3 | 88.3 | 88.7 | 72.6 |
CCEM [56] | 88.3 | 93.2 | 84.7 | 86.0 | 92.8 | 89.1 | 89.0 | 72.8 |
DeepLabv3+ [57] | 91.3 | 94.8 | 84.2 | 86.6 | 93.8 | 89.2 | 90.1 | 73.8 |
S-RA-FCN [58] | 90.7 | 94.2 | 83.8 | 85.8 | 93.6 | 88.5 | 89.6 | 72.5 |
GLNet [18] | 89.7 | 90.4 | 80.9 | 83.3 | 91.7 | 84.7 | 87.2 | 72.1 |
MBNet [59] | 88.2 | 90.6 | 81.2 | 85.1 | 91.8 | 86.2 | 87.4 | 72.8 |
UHRSNet [60] | 90.3 | 91.1 | 82.0 | 86.0 | 92.0 | 87.0 | 88.3 | 73.0 |
FCtL [61] | 88.3 | 92.1 | 82.9 | 85.4 | 93.1 | 88.4 | 87.0 | 69.9 |
MagNet [62] | 90.3 | 93.7 | 83.4 | 84.6 | 93.0 | 89.0 | 86.8 | 70.3 |
EHSNet [63] | 88.7 | 92.2 | 82.6 | 83.6 | 92.7 | 88.0 | 85.1 | 67.3 |
Mask2Former [64] | 89.0 | 91.7 | 83.9 | 86.4 | 92.9 | 88.8 | 88.1 | 73.2 |
SegGPT [65] | 89.2 | 91.3 | 79.5 | 81.2 | 87.9 | 85.8 | 85.6 | 72.0 |
EFFNet | 92.1 | 94.6 | 86.3 | 87.9 | 93.9 | 92.0 | 91.0 | 74.6 |
EFFNet results on both datasets, reported as mean ± standard deviation (all values in %):

Dataset | Impervious Surface | Building | Low Vegetation | Tree | Car | OA | Mean F1 | mIoU |
---|---|---|---|---|---|---|---|---|
Vaihingen | 92.2 ± 0.01 | 95.3 ± 0.01 | 82.1 ± 0.01 | 88.8 ± 0.02 | 87.0 ± 0.01 | 91.0 ± 0.01 | 89.1 ± 0.05 | 81.1 ± 0.20 |
Potsdam | 92.1 ± 0.02 | 94.6 ± 0.01 | 86.3 ± 0.01 | 87.9 ± 0.01 | 93.9 ± 0.01 | 92.0 ± 0.02 | 91.0 ± 0.04 | 74.6 ± 0.12 |
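For reference, the mIoU reported throughout these tables is the per-class intersection over union averaged across classes. A minimal NumPy sketch; skipping classes absent from both prediction and ground truth is an assumption, as conventions vary:

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean Intersection over Union: per-class IoU = TP / (TP + FP + FN),
    averaged over the classes that actually occur."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```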
(a) Accuracy, inference speed (FPS), and computational cost (FLOPs) on the Vaihingen dataset:

Model Name | Accuracy (mIoU%) | FPS | FLOPs |
---|---|---|---|
UNet | 75.5 | 0.09 | 1.36 |
FCN-8S | 75.5 | 0.04 | 4.52 |
DeepLab V3+ | 79.0 | 0.02 | 4.44 |
GLNet | 78.4 | 0.17 | 0.20 |
MBNet | 77.9 | 0.10 | 0.32 |
UHRSNet | 78.7 | 0.04 | 0.15 |
FCtL | 80.0 | 0.01 | 0.13 |
MagNet | 79.8 | 0.11 | 0.80 |
EFFNet | 81.1 | 0.22 | 4.75 |
(b) Accuracy, inference speed (FPS), and computational cost (FLOPs) on the Potsdam dataset:

Model Name | Accuracy (mIoU%) | FPS | FLOPs |
---|---|---|---|
UNet | 68.5 | 0.09 | 1.36 |
FCN-8S | 71.7 | 0.04 | 4.52 |
DeepLab V3+ | 73.8 | 0.02 | 4.44 |
GLNet | 72.1 | 0.17 | 0.20 |
MBNet | 72.8 | 0.10 | 0.32 |
UHRSNet | 73.0 | 0.04 | 0.15 |
FCtL | 69.9 | 0.01 | 0.13 |
MagNet | 70.3 | 0.11 | 0.80 |
EFFNet | 74.6 | 0.22 | 4.75 |
Ablation on the transformer location (mIoU, %):

Metric | Transformer in Backbone | Transformer in Decoder |
---|---|---|
mIoU | 79.2 | 81.1 |
Ablation on the score map module (mIoU, %):

Metric | With Score Map | Without Score Map |
---|---|---|
mIoU | 81.1 | 77.4 |