Abstract
Learning-based multi-view stereo (MVS) methods have demonstrated promising results. However, few existing networks explicitly account for pixel-wise visibility, which leads to erroneous cost aggregation from occluded pixels. In this paper, we explicitly infer and integrate pixel-wise occlusion information in the MVS network via matching uncertainty estimation. A pair-wise uncertainty map is jointly inferred with each pair-wise depth map and is then used as weighting guidance during multi-view cost volume fusion, so that the adverse influence of occluded pixels is suppressed. The proposed framework, Vis-MVSNet, significantly improves depth accuracy in scenes with severe occlusion. Extensive experiments on the DTU, BlendedMVS, Tanks and Temples, and ETH3D datasets demonstrate the effectiveness of the proposed framework.
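The core idea of uncertainty-guided fusion can be illustrated with a minimal sketch: each source view contributes a pair-wise cost volume and an uncertainty map, and views with high uncertainty (a proxy for occlusion) are down-weighted before the volumes are merged. This is a schematic illustration under assumed shapes and an assumed exponential weighting, not the authors' actual network code.

```python
import numpy as np

def fuse_cost_volumes(pair_costs, pair_uncertainties, eps=1e-8):
    """Fuse per-view cost volumes with visibility-aware weights.

    pair_costs:         list of (D, H, W) cost volumes, one per source view
    pair_uncertainties: list of (H, W) non-negative uncertainty maps

    Pixels whose matching is uncertain (likely occluded) receive small
    weights, so their costs barely influence the fused volume.
    """
    # Map uncertainty to a per-pixel confidence weight in (0, 1].
    weights = [np.exp(-u)[None, :, :] for u in pair_uncertainties]  # (1, H, W)
    numerator = sum(w * c for w, c in zip(weights, pair_costs))
    denominator = sum(weights) + eps  # guard against all-occluded pixels
    return numerator / denominator

# Toy example: view 1 is confident, view 2 is heavily occluded everywhere,
# so the fused volume should follow view 1 almost exactly.
c1 = np.full((4, 2, 2), 1.0)
c2 = np.full((4, 2, 2), 5.0)
u1 = np.zeros((2, 2))          # fully confident
u2 = np.full((2, 2), 50.0)     # effectively occluded
fused = fuse_cost_volumes([c1, c2], [u1, u2])
```

In the paper's pipeline the weighting is learned jointly with the pair-wise depth maps; the fixed `exp(-u)` mapping here is only a stand-in for that learned guidance.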









Acknowledgements
This work is supported by Hong Kong RGC GRF 16206819, 16203518 and T22-603/15N.
Additional information
Communicated by William Smith.
About this article
Cite this article
Zhang, J., Li, S., Luo, Z. et al. Vis-MVSNet: Visibility-Aware Multi-view Stereo Network. Int J Comput Vis 131, 199–214 (2023). https://doi.org/10.1007/s11263-022-01697-3