Road Extraction by Using Atrous Spatial Pyramid Pooling Integrated Encoder-Decoder Network and Structural Similarity Loss
Figure 1. A general scheme of training the deep segmentation network.
Figure 2. A brief illustration of the improved U-net structure. The left-side blocks form the encoder, in which the channels of the feature maps are doubled from 64 to 1024. The right-side blocks constitute the decoder, in which the resolution of the feature maps is gradually recovered by upsampling and gathering information from the encoder feature maps. Finally, a binary segmentation map with the same resolution as the input image is produced as the road extraction result.
Figure 3. The improved U-net with atrous spatial pyramid pooling (ASPP) integration. The ASPP module is inserted after the bottleneck of the Encoder-Decoder network: the feature map generated by the encoder is processed by ASPP, and the result is fed into the decoder.
Figure 4. An example of augmenting the training data by rotating and flipping.
Figure 5. The training metrics of the three experiments: (a) BCE loss during training; (b) BCE loss of the test predictions; (c) SSIM metric during training; (d) SSIM metric of the test predictions; (e) mean F1-score of the test predictions.
Figure 6. Visual comparisons between the baseline U-net and the ASPP-integrated U-net on test samples from the Massachusetts road dataset. The results are binarized using a threshold of 0.5. Each row in (a)–(d) contains an input image, a ground truth image, and two result maps of a sample region.
Figure 7. Visualization results of the three models on the test set of the Massachusetts road dataset. The brightness of the results represents the confidence of the predictions. Each row in (a)–(e) contains an input image, a ground truth image, and three result maps of a sample region.
Abstract
1. Introduction
- A U-net-based Encoder-Decoder network is proposed as the baseline network for road extraction from remote sensing imagery. First, the ELU activation function is applied to improve the performance of the network; it has also been used for road and building segmentation in References [17,27], where its effectiveness was demonstrated. Second, batch normalization [28] is used to accelerate the convergence of training and to prevent the exploding gradient problem.
- ASPP is integrated with the proposed Encoder-Decoder network to capture multiscale information. DeepLabv3+ [29], which combines an Encoder-Decoder structure with ASPP, improves the extraction of small details. However, because DeepLabv3+ is designed mainly for everyday scenes, it neglects very small targets. DeepLabv3+ adopts the Encoder-Decoder idea of gradually restoring resolution, but it restores only 1/4 of the input resolution, and its results still need a CRF to capture more detail. It is therefore insufficient for the road extraction task. We instead integrate ASPP with the strong Encoder-Decoder network of the U-net, which overcomes this deficiency: compared with DeepLabv3+, our approach restores the full resolution and is better suited to remote sensing imagery.
- SSIM is employed as a loss function to improve the quality of road segmentation. To the best of our knowledge, this is the first time that SSIM has been used as a loss function for semantic segmentation. The SSIM loss supplements the general cross-entropy loss with structural information that a pixel-wise loss cannot capture, and it improves the quality of the extraction results.
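
As a rough illustration of the third contribution, the following NumPy sketch computes an SSIM-based loss as one minus the mean local SSIM and adds it to the BCE loss. The uniform 7×7 window, the constants `c1` and `c2` (common defaults for images scaled to [0, 1]), the `weight` parameter, and all function names are assumptions made for this sketch; the text above does not specify the window or how the two terms are combined.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ssim_map(x, y, win=7, c1=0.01 ** 2, c2=0.03 ** 2):
    """Local SSIM of two 2-D arrays in [0, 1], using a uniform win x win window (assumption)."""
    # All overlapping patches, shape (H - win + 1, W - win + 1, win, win).
    px = sliding_window_view(x, (win, win))
    py = sliding_window_view(y, (win, win))
    mu_x = px.mean(axis=(-2, -1))
    mu_y = py.mean(axis=(-2, -1))
    var_x = px.var(axis=(-2, -1))
    var_y = py.var(axis=(-2, -1))
    cov = (px * py).mean(axis=(-2, -1)) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


def ssim_loss(pred, target, win=7):
    """One minus the mean SSIM (MSSIM) over all windows."""
    return 1.0 - ssim_map(pred, target, win).mean()


def bce_loss(pred, target, eps=1e-7):
    """Pixel-wise binary cross-entropy, averaged over the image."""
    p = np.clip(pred, eps, 1.0 - eps)
    return -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)).mean()


def combined_loss(pred, target, weight=1.0):
    """BCE plus a weighted SSIM term; the additive form and weight are hypothetical."""
    return bce_loss(pred, target) + weight * ssim_loss(pred, target)
```

A prediction identical to the label map gives a loss close to zero, while a prediction with the wrong structure is penalized by the SSIM term even where individual pixel errors are small.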
2. Materials and Methods
2.1. ASPP-Integrated Encoder-Decoder Network
2.1.1. Architecture of the Improved U-net
2.1.2. Integration of ASPP
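
The introduction and the caption of Figure 3 describe ASPP as a module applied to the encoder's bottleneck feature map before decoding. A minimal NumPy sketch of that idea is given here: parallel 3×3 atrous convolutions at several dilation rates, concatenated along the channel axis and fused by a 1×1 projection. The rates, channel counts, and function names are hypothetical, and the sketch omits the activation, normalization, and image-level pooling that a full ASPP module typically includes.

```python
import numpy as np


def atrous_conv3x3(x, w, rate):
    """3x3 atrous (dilated) convolution with zero padding that preserves H and W.

    x: feature map of shape (C_in, H, W)
    w: kernel of shape (C_out, C_in, 3, 3)
    rate: dilation rate (spacing between kernel taps)
    """
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (rate, rate), (rate, rate)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # Tap (i, j) reads the input at offset ((i - 1) * rate, (j - 1) * rate).
            patch = xp[:, i * rate:i * rate + h, j * rate:j * rate + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def aspp(x, branch_weights, rates, proj):
    """Run parallel atrous branches, concatenate on channels, fuse with a 1x1 conv.

    proj: 1x1 projection of shape (C_out, total channels of all branches).
    """
    branches = [atrous_conv3x3(x, w, r) for w, r in zip(branch_weights, rates)]
    stacked = np.concatenate(branches, axis=0)
    return np.einsum("oc,chw->ohw", proj, stacked)


# Usage with random weights; rates (1, 2, 4) are illustrative only.
rng = np.random.default_rng(0)
bottleneck = rng.normal(size=(4, 12, 12))
rates = (1, 2, 4)
weights = [rng.normal(size=(3, 4, 3, 3)) for _ in rates]
projection = rng.normal(size=(5, 3 * len(rates)))
fused = aspp(bottleneck, weights, rates, projection)
```

Because every branch keeps the spatial size of its input, the fused map can be passed to the decoder in place of the original bottleneck output.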
2.2. SSIM Loss
2.3. Dataset and Preprocessing
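
The caption of Figure 4 mentions augmenting the training data by rotating and flipping. One plausible reading, sketched below with NumPy, applies the four 90-degree rotations with and without a horizontal flip, which yields eight variants per sample. The exact set of transformations used by the authors is not stated here, so this choice and the function name `augment_pair` are assumptions.

```python
import numpy as np


def augment_pair(image, mask):
    """Yield 8 rotated/flipped variants of an (image, mask) pair.

    image: array of shape (H, W, C); mask: array of shape (H, W).
    The same transform is applied to both so labels stay aligned.
    """
    for k in range(4):
        img_r = np.rot90(image, k, axes=(0, 1))
        msk_r = np.rot90(mask, k, axes=(0, 1))
        yield img_r, msk_r
        # Horizontal flip of the rotated pair.
        yield np.flip(img_r, axis=1), np.flip(msk_r, axis=1)
```

Applying the identical transform to the image and its label map is the important detail; transforming only one of them would misalign the road labels.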
2.4. Evaluation Metrics
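
The figure captions refer to an F1-score computed on predictions binarized at a threshold of 0.5, and the abbreviation list defines TP, FP, and FN. A minimal sketch of the standard pixel-wise precision, recall, and F1 computation is shown below. The function name, the use of `>=` at the threshold, and the strict pixel-wise matching are assumptions; the authors may use a relaxed variant.

```python
import numpy as np


def road_metrics(prob, truth, threshold=0.5):
    """Pixel-wise precision, recall, and F1 for a predicted road probability map."""
    pred = prob >= threshold
    truth = truth.astype(bool)
    tp = np.sum(pred & truth)    # road pixels correctly predicted
    fp = np.sum(pred & ~truth)   # background predicted as road
    fn = np.sum(~pred & truth)   # road pixels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```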
3. Experiment Results and Discussion
3.1. Implementation Details
3.2. Discussion on Experimental Results
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
ASPP | Atrous spatial pyramid pooling |
AdaGrad | Adaptive gradient algorithm |
ADAM | Adaptive Moment Estimation |
BCE | Binary cross-entropy |
CNN | Convolutional neural network |
CRF | Conditional random field |
ELU | Exponential linear unit |
FCN | Fully convolutional network |
FN | False Negative |
FP | False Positive |
FSM | Finite state machine |
GIS | Geographic information system |
MSSIM | Mean SSIM |
ReLU | Rectified Linear Unit |
RMSprop | Root mean square propagation |
RSRCNN | Road structure refined CNN |
SGD | Stochastic gradient descent |
SSIM | Structural similarity |
TN | True Negative |
TP | True Positive |
References
1. Steger, C. An unbiased detector of curvilinear structures. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 113–125.
2. Zhou, Y.T.; Venkateswar, V.; Chellappa, R. Edge detection and linear feature extraction using a 2-D random field model. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 84–95.
3. Koutaki, G.; Uchimura, K. Automatic road extraction based on cross detection in suburb. Electron. Imaging 2004, 36, 2–9.
4. Hu, J.; Razdan, A.; Femiani, J.C.; Cui, M.; Wonka, P. Road network extraction and intersection detection from aerial images by tracking road footprints. IEEE Trans. Geosci. Remote Sens. 2007, 45, 4144–4157.
5. Ravanbakhsh, M.; Heipke, C.; Pakzad, K. Road junction extraction from high-resolution aerial imagery. Photogramm. Rec. 2010, 23, 405–423.
6. Marikhu, R.; Dailey, M.N.; Makhanov, S.; Honda, K. A family of quadratic snakes for road extraction. In Proceedings of the 8th Asian Conference on Computer Vision, Tokyo, Japan, 18–22 November 2007; pp. 85–94.
7. Wegner, J.D.; Montoya-Zegarra, J.A.; Schindler, K. A higher-order CRF model for road network extraction. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 1698–1705.
8. Mnih, V.; Hinton, G.E. Learning to detect roads in high-resolution aerial images. In Proceedings of the European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; pp. 210–223.
9. Mnih, V. Machine Learning for Aerial Image Labeling. Ph.D. Thesis, University of Toronto, Toronto, ON, Canada, 2013.
10. Wang, J.; Song, J.; Chen, M.; Yang, Z. Road network extraction: A neural-dynamic framework based on deep learning and a finite state machine. Int. J. Remote Sens. 2015, 36, 3144–3169.
11. Alshehhi, R.; Marpu, P.R.; Woon, W.L.; Mura, M.D. Simultaneous extraction of roads and buildings in remote sensing imagery with convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 2017, 130, 139–149.
12. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
13. Zhong, Z.; Li, J.; Cui, W.; Jiang, H. Fully convolutional networks for building and road extraction: Preliminary results. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 1591–1594.
14. Wei, Y.; Wang, Z.; Xu, M. Road structure refined CNN for road extraction in aerial image. IEEE Geosci. Remote Sens. Lett. 2017, 14, 709–713.
15. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
16. Cheng, G.; Wang, Y.; Xu, S.; Wang, H.; Xiang, S.; Pan, C. Automatic road detection and centerline extraction via cascaded end-to-end convolutional neural network. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3322–3337.
17. Panboonyuen, T.; Jitkajornwanich, K.; Lawawirojwong, S.; Srestasathiern, P.; Vateekul, P. Road segmentation of remotely-sensed images using deep convolutional neural networks with landscape metrics and conditional random fields. Remote Sens. 2017, 9, 680.
18. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv 2015, arXiv:1511.00561.
19. Clevert, D.A.; Unterthiner, T.; Hochreiter, S. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv 2015, arXiv:1511.07289.
20. Zhang, Z.; Liu, Q.; Wang, Y. Road extraction by deep residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753.
21. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
22. Mosinska, A.; Márquez-Neila, P.; Kozinski, M.; Fua, P. Beyond the pixel-wise loss for topology-aware delineation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3136–3145.
23. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
24. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
25. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
26. Søgaard, J.; Krasula, L.L.; Shahid, M.; Temel, D.; Brunnström, K.; Razaak, M. Applicability of existing objective metrics of perceptual quality for adaptive video streaming. Electron. Imaging 2016, 13, 1–7.
27. Shrestha, S.; Vanneschi, L. Improved fully convolutional network with conditional random fields for building extraction. Remote Sens. 2018, 10, 1135.
28. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
29. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with atrous separable convolution for semantic image segmentation. arXiv 2018, arXiv:1802.02611.
30. Iglovikov, V.; Mushinskiy, S.; Osin, V. Satellite imagery feature detection using deep convolutional neural network: A Kaggle competition. arXiv 2017, arXiv:1706.06169.
31. He, H.; Wang, S.; Yang, D.; Wang, S.; Liu, X. A road extraction method for remote sensing image based on Encoder-Decoder network. Acta Geod. Cartogr. Sin. 2019, 48, 330–338.
32. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
33. Wiedemann, C.; Heipke, C.; Mayer, H.; Jamet, O. Empirical evaluation of automatically extracted road axes. In Empirical Evaluation Techniques in Computer Vision; IEEE Computer Society Press: Los Alamitos, CA, USA, 1998; pp. 172–187. ISBN 978-0-818-68401-2.
34. Kingma, D.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
35. Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747.
Model Name | ASPP Integration | Loss Function |
---|---|---|
Baseline U-net | No | |
ASPP-U-net | Yes | |
ASPP-U-net+SSIM | Yes | |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
He, H.; Yang, D.; Wang, S.; Wang, S.; Li, Y. Road Extraction by Using Atrous Spatial Pyramid Pooling Integrated Encoder-Decoder Network and Structural Similarity Loss. Remote Sens. 2019, 11, 1015. https://doi.org/10.3390/rs11091015