
OptiDepthNet: A Real-Time Unsupervised Monocular Depth Estimation Network

Published in: Wireless Personal Communications

Abstract

With the development of deep learning, both the network architectures and the accuracy of monocular depth estimation have improved greatly. However, these complex network structures make real-time processing on embedded platforms very difficult to achieve. This study therefore proposes a lightweight encoder-decoder structure based on the U-Net model. Depthwise separable convolutions are introduced into both the encoder and the decoder to optimize the network structure, reduce computational complexity, and increase running speed, making the algorithm better suited to embedded platforms. While achieving comparable depth-image accuracy, the number of network parameters is reduced by up to eight times and the running speed is more than doubled. The results show the proposed method to be effective and of reference value for monocular depth estimation algorithms running on embedded platforms.
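The parameter savings claimed above follow directly from how a depthwise separable convolution factorizes a standard convolution into a per-channel spatial filter plus a 1×1 pointwise mixing step. The following is a minimal sketch of that parameter-count arithmetic; the channel and kernel sizes are illustrative assumptions, not the paper's actual layer configuration:

```python
def standard_conv_params(c_in, c_out, k):
    # Standard convolution: one k x k spatial filter
    # for every (input channel, output channel) pair.
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    # Depthwise stage: one k x k filter per input channel.
    # Pointwise stage: a 1 x 1 convolution that mixes channels.
    return c_in * k * k + c_in * c_out

# Illustrative layer: 3x3 kernels, 128 input and 128 output channels.
std = standard_conv_params(128, 128, 3)   # 147456 parameters
sep = separable_conv_params(128, 128, 3)  # 17536 parameters
print(f"reduction: {std / sep:.1f}x")     # prints "reduction: 8.4x"
```

For 3×3 kernels the ratio approaches k² + a channel-mixing term, which is roughly 8–9× at typical channel widths, consistent with the "up to eight times" figure reported in the abstract.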


Data Availability

Enquiries about data availability should be directed to the authors.


Acknowledgements

The authors thank Godard and his team for sharing their results.

Funding

This work was supported by the National Natural Science Foundation of China (NSFC Grant No. 61903124).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to HuiBin Wang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


Cite this article

Wei, F., Yin, X., Shen, J. et al. OptiDepthNet: A Real-Time Unsupervised Monocular Depth Estimation Network. Wireless Pers Commun 128, 2831–2846 (2023). https://doi.org/10.1007/s11277-022-10074-9
