Semantic Segmentation of Urban Buildings Using a High-Resolution Network (HRNet) with Channel and Spatial Attention Gates
Figure 1. Architecture of the HRNet-v2 model.
Figure 2. Channel attention gate.
Figure 3. Spatial attention gate.
Figure 4. Structure of the csAG module.
Figure 5. Architecture of the csAG-HRNet model.
Figure 6. Aerial images from Ansan and Siheung, South Korea.
Figure 7. WHU dataset (blue box: training area; green box: validation area; red boxes: test areas).
Figure 8. Building extraction results for site 1 (residential area) for each deep learning model. Red and cyan regions indicate false positives and false negatives, respectively. (a) Original image. (b) Ground-truth labels. (c)–(g) Results of SegNet, U-Net, FC-DenseNet, HRNet-v2, and csAG-HRNet, respectively.
Figure 9. Detailed results for site 1 (residential area) for each deep learning model; colors and panels as in Figure 8.
Figure 10. Building extraction results for test dataset 2 (industrial area) for each deep learning model; colors and panels as in Figure 8.
Figure 11. Detailed results for site 1 (industrial area) for each deep learning model; colors and panels as in Figure 8.
Figure 12. Building extraction results for site 2 for each deep learning model; colors and panels as in Figure 8.
Figure 13. Building extraction results for the WHU dataset for each deep learning model; colors and panels as in Figure 8.
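Figures 2–5 outline the channel attention gate, the spatial attention gate, the csAG module that combines them, and the csAG-HRNet built on HRNet-v2. As a rough illustration only, and not the authors' implementation, the sketch below shows one common way such spatial–channel attention gates are written in PyTorch (in the spirit of Khanh et al., Appl. Sci. 2020, cited below): a squeeze-and-excitation-style channel gate, a 1×1-convolution spatial gate, and an assumed additive composition of the two gated feature maps. The reduction ratio, layer sizes, and composition are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttentionGate(nn.Module):
    """Channel gate: global average pooling -> small MLP -> per-channel weights in [0, 1]."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.mlp(x.mean(dim=(2, 3)))   # (B, C) channel weights
        return x * w.view(b, c, 1, 1)      # rescale each channel

class SpatialAttentionGate(nn.Module):
    """Spatial gate: 1x1 convolutions -> per-pixel weight map in [0, 1]."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.conv(x)            # broadcast the (B, 1, H, W) map over channels

class CSAGModule(nn.Module):
    """Hypothetical csAG-style module: apply both gates and sum the results (scSE-like)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel_gate = ChannelAttentionGate(channels, reduction)
        self.spatial_gate = SpatialAttentionGate(channels, reduction)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.channel_gate(x) + self.spatial_gate(x)
```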
Abstract
1. Introduction
2. Methodology
2.1. HRNet-v2
2.2. csAG Module
2.3. csAG-HRNet
3. Experiments and Results
3.1. Datasets and Settings
3.1.1. South Korea Building Dataset Including Orthophoto and Vector Data
3.1.2. WHU Building Dataset
3.1.3. Experimental Settings
3.2. Accuracy Assessment
3.2.1. Evaluation Metrics
3.2.2. Evaluation Results for the South Korea Building Dataset
Residential Area of Site 1
Industrial Area of Site 1
Site 2
3.2.3. WHU Dataset Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Blaschke, T. Object based image analysis for remote sensing. ISPRS J. Photogramm. Remote Sens. 2010, 65, 2–16.
- Chen, C.; Zhong, J.; Tan, Y. Multiple-Oriented and Small Object Detection with Convolutional Neural Networks for Aerial Image. Remote Sens. 2019, 11, 2176.
- Francis, A.; Sidiropoulos, P.; Muller, J.-P. CloudFCN: Accurate and Robust Cloud Detection for Satellite Imagery with Deep Learning. Remote Sens. 2019, 11, 2312.
- Seo, S.; Choi, J.; Lee, J.; Kim, H.; Seo, D.; Jeong, J.; Kim, M. UPSNet: Unsupervised Pan-Sharpening Network With Registration Learning Between Panchromatic and Multi-Spectral Images. IEEE Access 2020, 8, 201199–201217.
- Gu, J.; Sun, X.; Zhang, Y.; Fu, K.; Wang, L. Deep Residual Squeeze and Excitation Network for Remote Sensing Image Super-Resolution. Remote Sens. 2019, 11, 1817.
- Hou, B.; Liu, Q.; Wang, H.; Wang, Y. From W-Net to CDGAN: Bitemporal Change Detection via Deep Learning Techniques. IEEE Trans. Geosci. Remote Sens. 2020, 58, 1790–1802.
- Kang, W.; Xiang, Y.; Wang, F.; You, H. EU-Net: An Efficient Fully Convolutional Network for Building Extraction from Optical Remote Sensing Images. Remote Sens. 2019, 11, 2813.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 231–241.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Huang, G.; Liu, Z.; Van der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Jégou, S.; Drozdzal, M.; Vazquez, D.; Romero, A.; Bengio, Y. The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 1175–1183.
- Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual attention network for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3156–3164.
- Lim, J.S.; Astrid, M.; Yoon, H.J.; Lee, S.I. Small Object Detection using Context and Attention. In Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–21 June 2019; pp. 181–186.
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154.
- Sirmacek, B.; Unsalan, C. Building detection from aerial images using invariant color features and shadow information. In Proceedings of the 23rd International Symposium on Computer and Information Sciences, Istanbul, Turkey, 27–29 October 2008; pp. 1–5.
- Zhang, Y. Optimisation of building detection in satellite images by combining multispectral classification and texture filtering. ISPRS J. Photogramm. Remote Sens. 1999, 54, 50–60.
- Ngo, T.T.; Mazet, V.; Collet, C.; De Fraipont, P. Shape-based building detection in visible band images using shadow information. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 10, 920–932.
- Ghandour, A.J.; Jezzini, A.A. Autonomous Building Detection Using Edge Properties and Image Color Invariants. Buildings 2018, 8, 65.
- Song, Y.; Shan, J. Building extraction from high resolution color imagery based on edge flow driven active contour and JSEG. In Proceedings of the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Beijing, China, 3–11 July 2008; Volume 37, pp. 185–190.
- Chen, M.; Wu, J.; Liu, L.; Zhao, W.; Tian, F.; Shen, Q.; Zhao, B.; Du, R. DR-Net: An Improved Network for Building Extraction from High Resolution Remote Sensing Image. Remote Sens. 2021, 13, 294.
- Wagner, F.H.; Dalagnol, R.; Tarabalka, Y.; Segantine, T.Y.; Thomé, R.; Hirye, M. U-Net-Id, an Instance Segmentation Model for Building Extraction from Satellite Images—Case Study in the Joanópolis City, Brazil. Remote Sens. 2020, 12, 1544.
- Ma, J.; Wu, L.; Tang, X.; Liu, F.; Zhang, X.; Jiao, L. Building extraction of aerial images by a global and multi-scale encoder-decoder network. Remote Sens. 2020, 12, 2350.
- Guo, M.; Liu, H.; Xu, Y.; Huang, Y. Building extraction based on U-Net with an attention block and multiple losses. Remote Sens. 2020, 12, 1400.
- Shao, Z.; Tang, P.; Wang, Z.; Saleem, N.; Yam, S.; Sommai, C. BRRNet: A fully convolutional neural network for automatic building extraction from high-resolution remote sensing images. Remote Sens. 2020, 12, 1050.
- Zhang, Y.; Li, W.; Gong, W.; Wang, Z.; Sun, J. An improved boundary-aware perceptual loss for building extraction from VHR images. Remote Sens. 2020, 12, 1195.
- Yi, Y.; Zhang, Z.; Zhang, W.; Zhang, C.; Li, W.; Zhao, T. Semantic Segmentation of Urban Buildings from VHR Remote Sensing Imagery Using a Deep Convolutional Neural Network. Remote Sens. 2019, 11, 1774.
- Liu, H.; Luo, J.; Huang, B.; Hu, X.; Sun, Y.; Yang, Y.; Xu, N.; Zhou, N. DE-Net: Deep Encoding Network for Building Extraction from High-Resolution Remote Sensing Imagery. Remote Sens. 2019, 11, 2380.
- Wang, S.; Hou, X.; Zhao, X. Automatic building extraction from high-resolution aerial imagery via fully convolutional encoder-decoder network with non-local block. IEEE Access 2020, 8, 7313–7322.
- Li, W.; He, C.; Fang, J.; Zheng, J.; Fu, H.; Yu, L. Semantic Segmentation-Based Building Footprint Extraction Using Very High-Resolution Satellite Images and Multi-Source GIS Data. Remote Sens. 2019, 11, 403.
- Zhang, L.; Wu, J.; Fan, Y.; Gao, H.; Shao, Y. An Efficient Building Extraction Method from High Spatial Resolution Remote Sensing Images Based on Improved Mask R-CNN. Sensors 2020, 20, 1465.
- Liu, Y.; Zhou, J.; Qi, W.; Li, X.; Gross, L.; Shao, Q.; Zhao, Z.; Ni, L.; Fan, X.; Li, Z. ARC-Net: An efficient network for building extraction from high-resolution aerial images. IEEE Access 2020, 8, 154997–155010.
- Jin, Y.; Xu, W.; Zhang, C.; Luo, X.; Jia, H. Boundary-Aware Refined Network for Automatic Building Extraction in Very High-Resolution Urban Aerial Images. Remote Sens. 2021, 13, 692.
- Wu, T.; Hu, Y.; Peng, L.; Chen, R. Improved Anchor-Free Instance Segmentation for Building Extraction from High-Resolution Remote Sensing Images. Remote Sens. 2020, 12, 2910.
- Yang, H.; Wu, P.; Yao, X.; Wu, Y.; Wang, B.; Xu, Y. Building Extraction in Very High Resolution Imagery by Dense-Attention Networks. Remote Sens. 2018, 10, 1768.
- Ye, Z.; Fu, Y.; Gan, M.; Deng, J.; Comber, A.; Wang, K. Building extraction from very high resolution aerial imagery using joint attention deep neural network. Remote Sens. 2019, 11, 2970.
- Deng, W.; Shi, Q.; Li, J. Attention-Gate-Based Encoder-Decoder Network for Automatical Building Extraction. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 2611–2620.
- He, S.; Jiang, W. Boundary-Assisted Learning for Building Extraction from Optical Remote Sensing Imagery. Remote Sens. 2021, 13, 760.
- Sun, S.; Mu, L.; Wang, L.; Liu, P.; Liu, X.; Zhang, Y. Semantic Segmentation for Buildings of Large Intra-Class Variation in Remote Sensing Images with O-GAN. Remote Sens. 2021, 13, 475.
- Abdollahi, A.; Pradhan, B.; Gite, S.; Alamri, A. Building footprint extraction from high resolution aerial images using Generative Adversarial Network (GAN) architecture. IEEE Access 2020, 8, 209517–209527.
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep High-Resolution Representation Learning for Visual Recognition. arXiv 2020, arXiv:1908.07919.
- Sun, K.; Zhao, Y.; Jiang, B.; Cheng, T.; Xiao, B.; Liu, D.; Mu, Y.; Wang, X.; Liu, W.; Wang, J. High-Resolution Representations for Labeling Pixels and Regions. arXiv 2019, arXiv:1904.04514.
- Khanh, T.L.B.; Dao, D.P.; Ho, N.H.; Yang, H.J.; Baek, E.T.; Lee, G.; Kim, S.H.; Yoo, S.B. Enhancing U-Net with Spatial-Channel Attention Gate for Abnormal Tissue Segmentation in Medical Imaging. Appl. Sci. 2020, 10, 5729.
- Schlemper, J.; Oktay, O.; Schaap, M.; Heinrich, M.; Kainz, B.; Glocker, B.; Rueckert, D. Attention gated networks: Learning to leverage salient regions in medical images. Med. Image Anal. 2019, 53, 197–207.
- Ji, S.; Wei, S.; Lu, M. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans. Geosci. Remote Sens. 2018, 57, 574–586.
| | Predicted label: True | Predicted label: False |
|---|---|---|
| Ground truth: True | TP (True Positive) | FN (False Negative) |
| Ground truth: False | FP (False Positive) | TN (True Negative) |
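The tables below report Accuracy, Precision, Recall, and F1-score. Assuming the standard definitions based on this confusion matrix, these metrics are computed as:

```latex
\mathrm{Accuracy}  = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall}    = \frac{TP}{TP + FN}, \qquad
\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```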
Evaluation results for site 1 (residential area).

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| SegNet | 0.9640 | 0.8737 | 0.9008 | 0.8871 |
| U-Net | 0.9665 | 0.9078 | 0.8754 | 0.8913 |
| FC-DenseNet | 0.9690 | 0.9344 | 0.8632 | 0.8974 |
| HRNet-v2 | 0.9700 | 0.9030 | 0.9062 | 0.9046 |
| csAG-HRNet | 0.9728 | 0.9256 | 0.8994 | 0.9123 |
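As a quick, purely illustrative sanity check of how the reported F1 scores relate to precision and recall (using the SegNet row above):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# SegNet on site 1 (residential area): precision 0.8737, recall 0.9008
print(round(f1_score(0.8737, 0.9008), 4))  # prints 0.887, consistent with the reported 0.8871 (inputs are rounded)
```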
Evaluation results for site 1 (industrial area).

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| SegNet | 0.9294 | 0.9358 | 0.9189 | 0.9273 |
| U-Net | 0.9450 | 0.9583 | 0.9281 | 0.9429 |
| FC-DenseNet | 0.9473 | 0.9472 | 0.9451 | 0.9461 |
| HRNet-v2 | 0.9510 | 0.9584 | 0.9409 | 0.9496 |
| csAG-HRNet | 0.9568 | 0.9666 | 0.9445 | 0.9554 |
Evaluation results for site 2.

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| SegNet | 0.9187 | 0.9392 | 0.7179 | 0.8138 |
| U-Net | 0.9224 | 0.9128 | 0.7588 | 0.8287 |
| FC-DenseNet | 0.9208 | 0.8584 | 0.8142 | 0.8357 |
| HRNet-v2 | 0.9387 | 0.9370 | 0.8064 | 0.8668 |
| csAG-HRNet | 0.9417 | 0.9448 | 0.8117 | 0.8732 |
Evaluation results for the WHU dataset.

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| SegNet | 0.9762 | 0.9819 | 0.9869 | 0.9842 |
| U-Net | 0.9771 | 0.9816 | 0.9886 | 0.9849 |
| FC-DenseNet | 0.9765 | 0.9784 | 0.9911 | 0.9844 |
| HRNet-v2 | 0.9771 | 0.9829 | 0.9873 | 0.9849 |
| csAG-HRNet | 0.9780 | 0.9842 | 0.9870 | 0.9855 |
| Dataset | Locations of csAG Modules | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| Site 1 (apartments and houses) | Block | 0.9721 | 0.9145 | 0.9070 | 0.9107 |
| | Fusion | 0.9694 | 0.9044 | 0.9001 | 0.9023 |
| | LastLayer | 0.9704 | 0.9085 | 0.9025 | 0.9055 |
| | Block + Fusion | 0.9718 | 0.9184 | 0.9003 | 0.9093 |
| | Block + LastLayer | 0.9702 | 0.9095 | 0.8999 | 0.9047 |
| | Fusion + LastLayer | 0.9708 | 0.9206 | 0.8908 | 0.9055 |
| | Block + Fusion + LastLayer | 0.9728 | 0.9256 | 0.8994 | 0.9123 |
| Site 1 (factories) | Block | 0.9520 | 0.9649 | 0.9360 | 0.9502 |
| | Fusion | 0.9487 | 0.9619 | 0.9321 | 0.9468 |
| | LastLayer | 0.9505 | 0.9643 | 0.9335 | 0.9486 |
| | Block + Fusion | 0.9468 | 0.9580 | 0.9323 | 0.9450 |
| | Block + LastLayer | 0.9476 | 0.9582 | 0.9337 | 0.9458 |
| | Fusion + LastLayer | 0.9485 | 0.9618 | 0.9319 | 0.9466 |
| | Block + Fusion + LastLayer | 0.9568 | 0.9666 | 0.9445 | 0.9554 |
| Site 2 | Block | 0.9370 | 0.9455 | 0.7912 | 0.8615 |
| | Fusion | 0.9380 | 0.9331 | 0.8072 | 0.8656 |
| | LastLayer | 0.9418 | 0.9474 | 0.8098 | 0.8732 |
| | Block + Fusion | 0.9416 | 0.9462 | 0.8100 | 0.8729 |
| | Block + LastLayer | 0.9375 | 0.9428 | 0.7955 | 0.8629 |
| | Fusion + LastLayer | 0.9307 | 0.9555 | 0.7552 | 0.8436 |
| | Block + Fusion + LastLayer | 0.9417 | 0.9448 | 0.8117 | 0.8732 |
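The ablation above varies where the csAG modules are attached (within the HRNet blocks, at the multi-resolution fusion steps, and before the last layer). As a purely generic illustration of the idea, and not the authors' wiring, a gate of the kind sketched earlier can be wrapped around the output of an arbitrary sub-module:

```python
class GatedSubmodule(nn.Module):
    """Hypothetical wrapper: apply a csAG-style gate to the output of any sub-module."""
    def __init__(self, submodule: nn.Module, channels: int):
        super().__init__()
        self.submodule = submodule
        self.gate = CSAGModule(channels)  # CSAGModule from the earlier sketch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gate(self.submodule(x))
```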
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
MDPI and ACS Style: Seong, S.; Choi, J. Semantic Segmentation of Urban Buildings Using a High-Resolution Network (HRNet) with Channel and Spatial Attention Gates. Remote Sens. 2021, 13, 3087. https://doi.org/10.3390/rs13163087

AMA Style: Seong S, Choi J. Semantic Segmentation of Urban Buildings Using a High-Resolution Network (HRNet) with Channel and Spatial Attention Gates. Remote Sensing. 2021; 13(16):3087. https://doi.org/10.3390/rs13163087

Chicago/Turabian Style: Seong, Seonkyeong, and Jaewan Choi. 2021. "Semantic Segmentation of Urban Buildings Using a High-Resolution Network (HRNet) with Channel and Spatial Attention Gates" Remote Sensing 13, no. 16: 3087. https://doi.org/10.3390/rs13163087