Abstract
Learning-based multi-view stereo (MVS) methods have demonstrated promising results. However, few existing networks explicitly account for pixel-wise visibility, which leads to erroneous cost aggregation from occluded pixels. In this paper, we explicitly infer and integrate pixel-wise occlusion information in the MVS network via matching uncertainty estimation. A pair-wise uncertainty map is jointly inferred with each pair-wise depth map and is then used as weighting guidance during multi-view cost volume fusion, so that the adverse influence of occluded pixels is suppressed. The proposed framework, Vis-MVSNet, significantly improves depth accuracy in scenes with severe occlusion. Extensive experiments on the DTU, BlendedMVS, Tanks and Temples, and ETH3D datasets demonstrate the effectiveness of the proposed framework.
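The core idea of uncertainty-guided fusion can be illustrated with a minimal sketch: each source view contributes a pair-wise cost volume and an uncertainty map, and views with high uncertainty (a proxy for occlusion) are down-weighted before the volumes are merged. This is a schematic illustration under assumed shapes and an assumed exponential weighting, not the authors' actual network code.

```python
import numpy as np

def fuse_cost_volumes(pair_costs, pair_uncertainties, eps=1e-8):
    """Fuse per-view cost volumes with visibility-aware weights.

    pair_costs:         list of (D, H, W) cost volumes, one per source view
    pair_uncertainties: list of (H, W) non-negative uncertainty maps

    Pixels whose matching is uncertain (likely occluded) receive small
    weights, so their costs barely influence the fused volume.
    """
    # Map uncertainty to a per-pixel confidence weight in (0, 1].
    weights = [np.exp(-u)[None, :, :] for u in pair_uncertainties]  # (1, H, W)
    numerator = sum(w * c for w, c in zip(weights, pair_costs))
    denominator = sum(weights) + eps  # guard against all-occluded pixels
    return numerator / denominator

# Toy example: view 1 is confident, view 2 is heavily occluded everywhere,
# so the fused volume should follow view 1 almost exactly.
c1 = np.full((4, 2, 2), 1.0)
c2 = np.full((4, 2, 2), 5.0)
u1 = np.zeros((2, 2))          # fully confident
u2 = np.full((2, 2), 50.0)     # effectively occluded
fused = fuse_cost_volumes([c1, c2], [u1, u2])
```

In the paper's pipeline the weighting is learned jointly with the pair-wise depth maps; the fixed `exp(-u)` mapping here is only a stand-in for that learned guidance.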









Acknowledgements
This work is supported by Hong Kong RGC GRF 16206819, 16203518 and T22-603/15N.
Additional information
Communicated by William Smith.
About this article
Cite this article
Zhang, J., Li, S., Luo, Z. et al. Vis-MVSNet: Visibility-Aware Multi-view Stereo Network. Int J Comput Vis 131, 199–214 (2023). https://doi.org/10.1007/s11263-022-01697-3