Boundary-Assisted Learning for Building Extraction from Optical Remote Sensing Imagery
Figure 1. (a) The original image, (b) the ground truth, and (c) the prediction by U-Net [21]. Clutter appears around the boundaries, some buildings are incomplete, and the straight lines and right angles of some buildings are not well preserved.
Figure 2. The overall architecture of our framework. The first stage produces multi-level semantics at different spatial resolutions (1/2, 1/4, and 1/16 of the input size, respectively). The second stage parses them into the segmentation mask, and the last stage generates the boundary mask. At test time, only the first two stages are retained.
Figure 3. Visualization examples of the segmentation mask and the semantic boundary inferred through spatial variation derivation.
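The boundary in Figure 3 is derived directly from the segmentation mask rather than predicted by an independent branch. As a rough illustration of what a "spatial variation derivation" can look like, here is a minimal PyTorch sketch that takes the gap between a local maximum and a local minimum of the predicted mask as a soft boundary map; the function name, the 3×3 window, and the max/min-pooling formulation are illustrative assumptions, not the paper's exact operator.

```python
import torch
import torch.nn.functional as F

def spatial_variation_boundary(mask: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Derive a soft boundary map from a segmentation probability mask.

    mask: (N, 1, H, W) probabilities in [0, 1]. Returns a map of the same
    shape that is high where the mask varies rapidly (building boundaries)
    and near zero inside homogeneous regions.
    NOTE: an illustrative stand-in for the paper's SVF derivation.
    """
    pad = k // 2
    local_max = F.max_pool2d(mask, kernel_size=k, stride=1, padding=pad)
    local_min = -F.max_pool2d(-mask, kernel_size=k, stride=1, padding=pad)  # min-pool
    return local_max - local_min

# A crisp 0/1 mask yields a thin band of ones around each building region.
m = torch.zeros(1, 1, 8, 8)
m[..., 2:6, 2:6] = 1.0
print(spatial_variation_boundary(m)[0, 0])
```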
Figure 4. Convolution and residual blocks. (a) The standard convolution and residual block used in YOLOv3 [37]; (b) the separable convolution and our advanced "Separable Residual Block".
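Figure 4(b) pairs a depthwise convolution with a pointwise 1×1 convolution, following the MobileNets factorization [33], inside a residual connection. The sketch below is one plausible PyTorch rendering; the normalization, activation, and kernel size are assumptions rather than the paper's exact block.

```python
import torch
import torch.nn as nn

class SeparableResidualBlock(nn.Module):
    """A residual block built from a depthwise-separable convolution.

    Sketch after Figure 4(b): a depthwise k x k convolution gives a large
    receptive field cheaply, a 1 x 1 pointwise convolution mixes channels,
    and a skip connection adds the input back.
    """
    def __init__(self, channels: int, kernel_size: int = 5):
        super().__init__()
        pad = kernel_size // 2
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=pad, groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.act(self.bn(self.pointwise(self.depthwise(x))))
```

For C channels and a k×k filter, the separable form costs roughly C·k² + C² weights instead of C²·k², which is why large filters remain affordable.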
Figure 5. The structure of the convolutional block attention module (CBAM), which is composed of two sub-modules.
Figure 6. The channel attention mechanism and spatial attention mechanism. (a) The channel attention module; (b) the spatial attention module.
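CBAM [35] is a published module with a well-defined structure: channel attention from average- and max-pooled descriptors passed through a shared MLP, followed by spatial attention from channel-wise average and max maps passed through a large convolution. The PyTorch sketch below follows the defaults of the CBAM paper (reduction ratio 16, 7×7 spatial kernel); this paper's exact settings may differ.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional block attention module (Figures 5 and 6):
    channel attention, then spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        # Shared MLP (as 1x1 convs) for the channel attention sub-module.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Conv over stacked channel-wise avg/max maps for spatial attention.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (a) Channel attention: pool over H and W; combine avg and max paths.
        avg = x.mean(dim=(2, 3), keepdim=True)
        mx = x.amax(dim=(2, 3), keepdim=True)
        x = x * torch.sigmoid(self.mlp(avg) + self.mlp(mx))
        # (b) Spatial attention: pool over channels; convolve; gate.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```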
Figure 7. Examples of building extraction results produced by our method and U-Net on the WHU Aerial Building Dataset. The first two rows show the aerial images and the ground truth; rows 3 and 4 show the predictions of U-Net and our method, respectively; the last two rows show the corresponding errors (wrongly predicted pixels marked in red).
Figure 8. Examples of building extraction results produced by our method and U-Net on the Inria Aerial Image Labeling Dataset. The first two rows show the aerial images and the ground truth; rows 3 and 4 show the predictions of U-Net and our method, respectively; the last two rows show the corresponding errors (wrongly predicted pixels marked in red).
Figure 9. Examples of building extraction produced by "w/B" and "B". From left to right: aerial images, labels, predictions of "w/B", predictions of "B", errors of "w/B", and errors of "B". The weight ω_i is set to 2.
Figure 10. Visualization examples of the attention mechanism (best viewed in color). Channel indices 0–71 denote feature-map channels from the lower level, and 72–143 those from the higher level. The CBAM guides the network to pay different amounts of attention to different feature-map channels and regions when addressing different images.
Figure 11. Examples of building extraction results on the WHU Satellite Building Dataset II. From left to right: satellite images, labels, predictions, and errors.
Abstract
1. Introduction
- A boundary-assisted learning pattern is proposed that markedly improves how well the boundary morphology of buildings is preserved. Moreover, the spatial variation fusion (SVF) module couples the segmentation task and the boundary learning task so that they interact with each other, making the network easier to train (a sketch of such a joint objective follows this list).
- A new FCN-based architecture is proposed. Separable convolutions reduce the number of parameters in the model while large filters expand the receptive fields, and the introduction of a CBAM further boosts the model.
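Section 2.3 defines the full training objective, which is not reproduced on this page. Since the ablation in Section 4.1 varies a weight ω_i between the two tasks, the following is a hedged sketch of one such joint objective; binary cross-entropy for both heads and ω scaling the boundary term globally are assumptions about the precise form (ω_i may equally act as a per-pixel weight inside the boundary loss).

```python
import torch
import torch.nn.functional as F

def boundary_assisted_loss(seg_logits: torch.Tensor,
                           bnd_logits: torch.Tensor,
                           seg_gt: torch.Tensor,
                           bnd_gt: torch.Tensor,
                           omega: float = 2.0) -> torch.Tensor:
    """Joint objective: segmentation loss plus an omega-weighted boundary loss.

    Sketch only; the paper's exact formulation is given in Section 2.3.
    All tensors are (N, 1, H, W); the ground truths are 0/1 masks.
    """
    seg_loss = F.binary_cross_entropy_with_logits(seg_logits, seg_gt)
    bnd_loss = F.binary_cross_entropy_with_logits(bnd_logits, bnd_gt)
    return seg_loss + omega * bnd_loss
```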
2. Methodology
2.1. Overall Framework
2.2. Significant Modules
2.2.1. Spatial Variation Fusion
2.2.2. Separable Convolution
2.2.3. Convolutional Block Attention Module
2.3. Loss Functions
3. Experiments and Comparisons
3.1. Datasets
3.2. Implementation Details
3.3. Results and Comparisons
3.3.1. Comparison on the WHU Aerial Building Dataset
3.3.2. Comparison on the Inria Aerial Image Labeling Dataset
4. Discussion
4.1. Effectiveness of Boundary-Assisted Learning
4.2. Analysis of the Attention Module
4.3. Evaluation on Satellite Images
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Awrangjeb, M.; Hu, X.Y.; Yang, B.S.; Tian, J.J. Editorial for Special Issue: “Remote Sensing based Building Extraction”. Remote Sens. 2020, 12, 549.
- Rashidian, V.; Baise, L.G.; Koch, M. Detecting Collapsed Buildings after a Natural Hazard on VHR Optical Satellite Imagery Using U-Net Convolutional Neural Networks. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 9394–9397.
- Liu, X.; Liang, X.; Li, X.; Xu, X.; Ou, J. A Future Land Use Simulation Model (FLUS) for Simulating Multiple Land Use Scenarios by Coupling Human and Natural Effects. Landsc. Urban Plan. 2017, 168, 94–116.
- Liu, P.H.; Liu, X.P.; Liu, M.X.; Shi, Q.; Yang, J.X.; Xu, X.C.; Zhang, Y.Y. Building Footprint Extraction from High-Resolution Images via Spatial Residual Inception Convolutional Neural Network. Remote Sens. 2019, 11, 830.
- Ji, S.P.; Wei, S.Q.; Lu, M. Fully Convolutional Networks for Multisource Building Extraction from an Open Aerial and Satellite Imagery Data Set. IEEE Trans. Geosci. Remote Sens. 2019, 57, 574–596.
- Huang, X.; Zhang, L.; Zhu, T. Building Change Detection from Multitemporal High-Resolution Remotely Sensed Images Based on a Morphological Building Index. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 105–115.
- Wang, D.; Song, W.D. A Method of Building Edge Extraction from Very High Resolution Remote Sensing Images. Environ. Prot. Circ. Econ. 2011, 29, 26–28.
- Hu, R.; Huang, X.; Huang, Y. An Enhanced Morphological Building Index for Building Extraction from High-Resolution Images. J. Geod. Geoinf. Sci. 2014, 43, 514–520.
- Ok, A.O.; Senaras, C.; Yuksel, B. Automated Detection of Arbitrarily Shaped Buildings in Complex Environments from Monocular VHR Optical Satellite Imagery. IEEE Trans. Geosci. Remote Sens. 2013, 51, 1701–1717.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
- Lu, H.; Zhang, Q. Applications of Deep Convolutional Neural Network in Computer Vision. J. Data Acquis. Process. 2016, 31, 1–17.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- Szegedy, C.; Liu, W.; Jia, Y.Q.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Ghosh, S.; Das, N.; Das, I.; Maulik, U. Understanding Deep Learning Techniques for Image Segmentation. ACM Comput. Surv. 2019, 52, 73.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Zhao, H.S.; Shi, J.P.; Qi, X.J.; Wang, X.G.; Jia, J.Y. Pyramid Scene Parsing Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- Pinheiro, P.O.; Lin, T.Y.; Collobert, R.; Dollar, P. Learning to Refine Object Segments. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 75–91.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollar, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
- Araujo, T.; Aresta, G.; Castro, E.; Rouco, J.; Aguiar, P.; Eloy, C.; Polonia, A.; Campilho, A. Classification of Breast Cancer Histology Images Using Convolutional Neural Networks. PLoS ONE 2017, 12, e0177544.
- Volpi, M.; Tuia, D. Dense Semantic Labeling of Subdecimeter Resolution Images with Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 881–893.
- Yi, Y.N.; Zhang, Z.J.; Zhang, W.C.; Zhang, C.R.; Li, W.D.; Zhao, T. Semantic Segmentation of Urban Buildings from VHR Remote Sensing Imagery Using a Deep Convolutional Neural Network. Remote Sens. 2019, 11, 1774.
- Maggiori, E.; Tarabalka, Y.; Charpiat, G.; Alliez, P. Convolutional Neural Networks for Large-Scale Remote-Sensing Image Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 645–657.
- Sun, Y.; Zhang, X.C.; Zhao, X.Y.; Xin, Q.C. Extracting Building Boundaries from High Resolution Optical Images and LiDAR Data by Integrating the Convolutional Neural Network and the Active Contour Model. Remote Sens. 2018, 10, 1459.
- Yuan, J.Y. Learning Building Extraction in Aerial Scenes with Convolutional Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 2793–2798.
- Shrestha, S.; Vanneschi, L. Improved Fully Convolutional Network with Conditional Random Fields for Building Extraction. Remote Sens. 2018, 10, 1135.
- Xu, Y.; Wu, L.; Xie, Z.; Chen, Z. Building Extraction in Very High Resolution Remote Sensing Imagery Using Deep Learning and Guided Filters. Remote Sens. 2018, 10, 144.
- Bischke, B.; Helber, P.; Folz, J.; Borth, D.; Dengel, A. Multi-Task Learning for Segmentation of Building Footprints with Deep Neural Networks. In Proceedings of the IEEE International Conference on Image Processing, Taipei, Taiwan, 22–25 September 2019; pp. 1480–1484.
- Howard, A.G.; Zhu, M.L.; Chen, B.; Kalenichenko, D.; Wang, W.J.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
- Yin, W.; Ebert, S.; Schütze, H. Attention-Based Convolutional Neural Network for Machine Comprehension. arXiv 2016, arXiv:1602.04341.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521.
- Maggiori, E.; Tarabalka, Y.; Charpiat, G.; Alliez, P. Can Semantic Labeling Methods Generalize to Any City? The Inria Aerial Image Labeling Benchmark. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Fort Worth, TX, USA, 23–28 July 2017; pp. 3226–3229.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Zhen, M.; Wang, J.; Zhou, L.; Li, S.; Shen, T.; Shang, J.; Fang, T.; Quan, L. Joint Semantic Segmentation and Boundary Detection Using Iterative Pyramid Contexts. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–18 June 2020; pp. 13663–13672.
- Luo, H.F.; Chen, C.C.; Fang, L.N.; Zhu, X.; Lu, L.J. High-Resolution Aerial Images Semantic Segmentation Using Deep Fully Convolutional Network with Channel Attention Mechanism. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 3492–3507.
- Li, X.; Hu, X.L.; Yang, J. Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks. arXiv 2019, arXiv:1905.09646.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. arXiv 2017, arXiv:1709.01507.
- Zagoruyko, S.; Komodakis, N. Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer. arXiv 2016, arXiv:1612.03928.
- Kingma, D.P.; Ba, J.L. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A System for Large-Scale Machine Learning. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation, Savannah, GA, USA, 2–4 November 2016; pp. 265–283.
- He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 1026–1034.
- Wu, G.M.; Shao, X.W.; Guo, Z.L.; Chen, Q.; Yuan, W.; Shi, X.D.; Xu, Y.W.; Shibasaki, R. Automatic Building Segmentation of Aerial Imagery Using Multi-Constraint Fully Convolutional Networks. Remote Sens. 2018, 10, 407.
- Lin, G.S.; Milan, A.; Shen, C.H.; Reid, I. RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5168–5177.
Table 1. Comparison on the WHU Aerial Building Dataset (%).

| Method    | Precision | Recall | F1   | IoU  |
|-----------|-----------|--------|------|------|
| SegNet    | 92.1      | 89.9   | 91.0 | 85.8 |
| DeepLab   | 94.3      | 92.2   | 93.2 | 87.3 |
| RefineNet | 93.7      | 92.3   | 93.0 | 86.9 |
| SRI-Net   | 95.2      | 93.3   | 94.2 | 89.1 |
| CU-Net    | 94.6      | 91.7   | 93.1 | 87.1 |
| SiU-Net   | 93.8      | 93.9   | 93.8 | 88.4 |
| U-Net     | 91.4      | 94.5   | 92.9 | 86.8 |
| Ours      | 95.1      | 94.9   | 95.0 | 90.5 |
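All tables report Precision, Recall, F1, and IoU in percent over building pixels. For reference, here is a small sketch of how the four metrics derive from pixel-level counts (the function and its argument names are illustrative):

```python
def pixel_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, F1, and IoU (in %) from pixel counts:
    tp = building pixels correctly predicted, fp = background pixels
    predicted as building, fn = building pixels missed."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return {name: round(100 * value, 1)
            for name, value in [("precision", precision), ("recall", recall),
                                ("f1", f1), ("iou", iou)]}
```

The "Ours" row above is self-consistent under these definitions: with precision 95.1 and recall 94.9, F1 = 2·0.951·0.949/(0.951 + 0.949) ≈ 95.0 and IoU ≈ 90.5.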
Table 2. Comparison on the Inria Aerial Image Labeling Dataset (%).

| Method    | Precision | Recall | F1   | IoU  |
|-----------|-----------|--------|------|------|
| SegNet    | 79.6      | 75.4   | 77.4 | 63.2 |
| DeepLab   | 84.9      | 81.3   | 83.1 | 71.1 |
| RefineNet | 86.4      | 80.3   | 82.7 | 70.1 |
| SRI-Net   | 85.8      | 81.5   | 83.4 | 71.8 |
| U-Net     | 83.1      | 81.1   | 82.1 | 69.7 |
| Ours      | 83.5      | 91.1   | 87.1 | 77.2 |
Table 3. Results of "w/B" and "B" under different values of the weight ω_i (%); see Figure 9.

| ω_i | Precision w/B | Precision B | Recall w/B | Recall B | F1 w/B | F1 B | IoU w/B | IoU B |
|-----|---------------|-------------|------------|----------|--------|------|---------|-------|
| 1   | 94.8          | 95.5        | 93.9       | 94.5     | 94.3   | 95.0 | 89.3    | 90.4  |
| 2   | 93.9          | 95.1        | 95.0       | 94.9     | 94.4   | 95.0 | 89.4    | 90.5  |
| 3   | 92.8          | 92.0        | 94.3       | 96.3     | 93.5   | 94.1 | 87.8    | 88.9  |
| 5   | 89.6          | 90.2        | 96.8       | 96.9     | 93.1   | 93.4 | 87.0    | 87.7  |
Table 4. Comparison on the WHU Satellite Building Dataset II (%).

| Method  | Precision | Recall | F1   | IoU  |
|---------|-----------|--------|------|------|
| U-Net   | 76.7      | 74.5   | 75.6 | 60.7 |
| SiU-Net | 72.5      | 79.6   | 75.9 | 61.1 |
| Ours    | 79.5      | 82.3   | 80.9 | 67.9 |