Abstract
With the advent of the Internet of Things (IoT) era, many devices have emerged that capture and generate a variety of visual data. Recognizing and extracting meaningful patterns from these data requires powerful methods suited to different IoT applications. Deep convolutional neural networks (CNNs) have significantly improved performance on almost all computer vision tasks, including semantic image segmentation. However, feature extraction in CNNs may lose contextual and spatial information. Moreover, the standard convolutional and pooling layers adopted by most CNN architectures produce a fixed receptive field, which makes it difficult to handle multi-scale objects in an image. To remedy these issues for semantic image segmentation, this paper proposes a multi-level graph convolutional recurrent neural network (MGCRNN) that combines CNNs and graph neural networks (GNNs) to fuse multi-level features. By applying a graph convolutional recurrent neural network (GCRNN), the proposed model acquires a global view of the image and aggregates multi-level contextual and structural information. Experiments verify the ability of the GCRNN to obtain a flexible receptive field and learn structural features without losing spatial information. Results on the Pascal VOC 2012 and Cityscapes datasets show that the proposed model outperforms baseline approaches and is competitive with state-of-the-art methods.
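The fixed receptive field of stacked convolution and pooling layers can be made concrete with a short calculation. The sketch below is not taken from the paper: it applies the standard receptive-field recurrence for a sequential stack of layers, and the function name `receptive_field` and both layer stacks are hypothetical, chosen only for illustration.

```python
def receptive_field(layers):
    """Return the receptive field, in input pixels, of one output unit.

    `layers` is a list of (kernel_size, stride, dilation) tuples applied
    in order. This layer description format is an assumption made for
    this sketch, not something defined in the paper.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # spacing, in input pixels, between adjacent units
    for kernel, stride, dilation in layers:
        # A dilated kernel spans dilation * (kernel - 1) + 1 positions.
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical stack: two 3x3 convs, a 2x2 pooling layer with stride 2,
# then two more 3x3 convs.
plain = [(3, 1, 1), (3, 1, 1), (2, 2, 1), (3, 1, 1), (3, 1, 1)]

# The same stack with the last two convs dilated by a factor of 2.
dilated = [(3, 1, 1), (3, 1, 1), (2, 2, 1), (3, 1, 2), (3, 1, 2)]

print(receptive_field(plain))    # 14
print(receptive_field(dilated))  # 22
```

In both cases the receptive field is determined entirely by the layer hyperparameters, so it is the same for every image and every object scale. This is the limitation the abstract attributes to standard CNN layers and that the proposed GCRNN is meant to relax.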
Acknowledgements
This research is supported by the National Key Research and Development Project under Grant 2018YFB1801600.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
About this article
Cite this article
Jiang, D., Qu, H., Zhao, J. et al. Multi-level graph convolutional recurrent neural network for semantic image segmentation. Telecommun Syst 77, 563–576 (2021). https://doi.org/10.1007/s11235-021-00769-y
DOI: https://doi.org/10.1007/s11235-021-00769-y