
Self-supervised monocular depth estimation based on image texture detail enhancement

  • Original article
  • Published in The Visual Computer

Abstract

We present a new self-supervised monocular depth estimation method with multi-scale texture detail enhancement. Based on the observation that image texture detail and semantic information are essential to depth estimation, we propose to provide both to the network so that it learns sharper and more structurally complete depth. First, we generate filtered images and detail images by multi-scale decomposition, and use a deep neural network to learn their combination weights automatically to construct the texture-detail-enhanced image. Then, we incorporate semantic features by feeding deep features from the VGG-19 network into a self-attention network, which guides the depth decoder to focus on the integrity of objects in the scene. Finally, we propose a scale-invariant smoothness loss to improve the structural integrity of the predicted depth. We evaluate our method on the KITTI 2015 and Make3D datasets and apply the predicted depth to novel view synthesis. The experimental results show that our method achieves satisfactory results compared with existing methods.
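To make the two technical ingredients in the abstract concrete, below is a minimal PyTorch sketch written under stated assumptions, not the authors' implementation: the function names (`gaussian_blur`, `multiscale_decompose`, `enhance`, `smooth_loss`) are hypothetical, the low-pass filter is assumed Gaussian, and the paper's weight-prediction network is replaced by externally supplied per-scale weights.

```python
import torch
import torch.nn.functional as F

def gaussian_blur(img, kernel_size=5, sigma=1.0):
    # Separable Gaussian low-pass filter (assumed; the paper's exact filter may differ).
    coords = torch.arange(kernel_size, dtype=img.dtype, device=img.device) - kernel_size // 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = (g / g.sum()).view(1, 1, 1, -1)
    c = img.shape[1]
    img = F.conv2d(img, g.repeat(c, 1, 1, 1), padding=(0, kernel_size // 2), groups=c)
    img = F.conv2d(img, g.transpose(2, 3).repeat(c, 1, 1, 1), padding=(kernel_size // 2, 0), groups=c)
    return img

def multiscale_decompose(img, levels=3):
    # Split the image into per-scale low-pass ("filtered") images and
    # band-pass residuals ("detail" images), all kept at full resolution.
    filtered, details = [], []
    current = img
    for _ in range(levels):
        low = gaussian_blur(current)
        details.append(current - low)  # texture detail at this scale
        filtered.append(low)
        current = low
    return filtered, details

def enhance(img, detail_weights):
    # Recombine the input with weighted detail layers to obtain the
    # texture-detail-enhanced image. In the paper the weights are learned
    # by a network; here they are plain scalars for illustration.
    _, details = multiscale_decompose(img, levels=len(detail_weights))
    out = img
    for w, d in zip(detail_weights, details):
        out = out + w * d
    return out.clamp(0.0, 1.0)
```

The scale-invariant smoothness idea can be sketched the same way: dividing disparity by its mean removes the global scale before gradients are penalized, and weighting by image gradients keeps depth edges sharp. This is the edge-aware form common in self-supervised depth work, not necessarily the paper's exact loss.

```python
def smooth_loss(disp, img):
    # Edge-aware smoothness on mean-normalized disparity; the normalization
    # makes the penalty invariant to the global scale of the prediction.
    disp = disp / (disp.mean(dim=(2, 3), keepdim=True) + 1e-7)
    dx_d = (disp[:, :, :, :-1] - disp[:, :, :, 1:]).abs()
    dy_d = (disp[:, :, :-1, :] - disp[:, :, 1:, :]).abs()
    dx_i = (img[:, :, :, :-1] - img[:, :, :, 1:]).abs().mean(1, keepdim=True)
    dy_i = (img[:, :, :-1, :] - img[:, :, 1:, :]).abs().mean(1, keepdim=True)
    return (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()
```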

Acknowledgements

This work is partially supported by the Key Technological Innovation Projects of Hubei Province (2018AAA062), NSFC (No. 61972298), the Science and Technology Cooperation Project of the Xinjiang Production and Construction Corps (No. 2019BC008), and the Wuhan University-Huawei GeoInformatics Innovation Lab.

Author information

Correspondence to Fei Luo or Chunxia Xiao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, Y., Luo, F., Li, W. et al. Self-supervised monocular depth estimation based on image texture detail enhancement. Vis Comput 37, 2567–2580 (2021). https://doi.org/10.1007/s00371-021-02206-2
