Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume

Applied Intelligence

Abstract

Multi-level features are commonly employed in cascade networks, currently the dominant framework for multi-view stereo (MVS). However, recently popular multi-level feature extractors may overlook the importance of fine-grained structural features for the coarse depth inference stage of MVS. Discriminative structural features play an important role in matching and help improve depth inference. In this work, we propose an effective cascade-structured MVS model named FANet, which builds an enhanced feature pyramid in order to predict reliable initial depth values. Specifically, a bottom-up feature enhancement path enriches the features from deep layers with the abundant spatial structure information of shallow layers. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To keep the entire model lightweight while preserving its performance, an efficient module constructs a lightweight and effective cost volume that reliably represents viewpoint correspondence: it computes feature correlations between the reference view and the source views with an average similarity metric and then adaptively aggregates them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks and Temples benchmarks show that the proposed model achieves better reconstruction quality than state-of-the-art MVS methods.
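The abstract describes the correlation cost volume only at a high level, and the authors' code is not yet released. The sketch below is one plausible NumPy reading of it, not FANet's implementation. Assumptions, all hypothetical: source features are already warped to D depth hypotheses (homography warping is omitted); the "average similarity metric" is taken to be a group-wise, channel-averaged inner product, which reduces to the plain channel-averaged dot product when `groups=1`; and the "adaptive aggregation" uses a parameter-free per-pixel softmax over views as a stand-in for the learned weighting network a real model would use. The names `group_correlation`, `view_weights` and `correlation_cost_volume` are illustrative.

```python
import numpy as np


def group_correlation(ref, src, groups):
    """Channel-averaged similarity between the reference view and one source view.

    ref: (C, H, W) reference features.
    src: (C, D, H, W) source features, assumed already warped to D depth hypotheses.
    Returns (G, D, H, W): the mean inner product within each of G channel groups.
    """
    c = ref.shape[0]
    if c % groups:
        raise ValueError("channel count must be divisible by groups")
    ref_g = ref.reshape(groups, c // groups, 1, *ref.shape[1:])  # (G, C/G, 1, H, W)
    src_g = src.reshape(groups, c // groups, *src.shape[1:])     # (G, C/G, D, H, W)
    return (ref_g * src_g).mean(axis=1)                           # broadcast over D


def view_weights(corrs):
    """Hypothetical parameter-free stand-in for a learned per-pixel view-weight network.

    corrs: list of V volumes, each (G, D, H, W).
    Each view is weighted per pixel by a softmax (over views) of its peak
    correlation along depth, so views that match well at some depth get more say.
    Returns (V, H, W) weights that sum to 1 over V.
    """
    peaks = np.stack([c.mean(axis=0).max(axis=0) for c in corrs])  # (V, H, W)
    peaks = peaks - peaks.max(axis=0, keepdims=True)              # numerical stability
    w = np.exp(peaks)
    return w / w.sum(axis=0, keepdims=True)


def correlation_cost_volume(ref, srcs, groups=8):
    """Aggregate per-view correlation volumes into one (G, D, H, W) cost volume."""
    corrs = [group_correlation(ref, s, groups) for s in srcs]
    w = view_weights(corrs)                          # (V, H, W)
    stacked = np.stack(corrs)                        # (V, G, D, H, W)
    return (w[:, None, None] * stacked).sum(axis=0)  # weighted sum over views


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, D, H, W, V = 16, 4, 5, 6, 3
    ref = rng.standard_normal((C, H, W))
    srcs = [rng.standard_normal((C, D, H, W)) for _ in range(V)]
    vol = correlation_cost_volume(ref, srcs, groups=4)
    print(vol.shape)  # (4, 4, 5, 6)
```

The output has G channels per depth hypothesis regardless of how many source views are used, whereas a variance-based volume such as MVSNet's keeps all C feature channels; this is one way a correlation volume can stay lightweight.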
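The feature-side steps of the abstract (a bottom-up enhancement path followed by attention on the topmost level) can likewise be sketched in NumPy. This is a hypothetical illustration, not FANet's actual architecture: all levels share C channels for simplicity, 1x1 channel-mixing matrices stand in for learned convolutions, 2x2 average pooling stands in for a strided downsampling convolution, and squeeze-and-excitation (SE) channel attention is assumed for the attention mechanism, which the abstract does not name. `bottom_up_enhance` passes each enhanced finer level down to the next coarser one, so coarse features inherit fine spatial structure before matching.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution, i.e. per-pixel channel mixing. x: (C_in, H, W); weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def downsample2x(x):
    """2x2 average pooling, standing in for a strided convolution (assumption)."""
    c, h, w = x.shape
    x = x[:, : h // 2 * 2, : w // 2 * 2]
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def bottom_up_enhance(pyramid, weights):
    """Bottom-up path: each coarser level is fused with the enhanced finer level below it.

    pyramid: list ordered fine -> coarse; level l has shape (C, H / 2**l, W / 2**l),
             with H and W divisible by 2**(len(pyramid) - 1).
    weights: one (C, C) mixing matrix per coarser level (hypothetical learned weights).
    """
    enhanced = [pyramid[0]]
    for level, w in zip(pyramid[1:], weights):
        fused = level + downsample2x(enhanced[-1])           # inject shallow spatial detail
        enhanced.append(np.maximum(conv1x1(fused, w), 0.0))  # mix channels + ReLU
    return enhanced


def squeeze_excite(x, w1, w2):
    """SE-style channel attention. x: (C, H, W); w1: (C // r, C); w2: (C, C // r)."""
    s = x.mean(axis=(1, 2))                 # squeeze: global average pool -> (C,)
    z = np.maximum(w1 @ s, 0.0)             # excitation: FC + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z)))  # FC + sigmoid: per-channel gate in (0, 1)
    return x * gate[:, None, None]          # reweight channels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W = 8, 32, 40
    pyramid = [rng.standard_normal((C, H >> lvl, W >> lvl)) for lvl in range(3)]
    weights = [0.1 * rng.standard_normal((C, C)) for _ in range(2)]
    feats = bottom_up_enhance(pyramid, weights)
    top = squeeze_excite(feats[-1], rng.standard_normal((C // 4, C)),
                         rng.standard_normal((C, C // 4)))
    print([f.shape for f in feats], top.shape)
    # [(8, 32, 40), (8, 16, 20), (8, 8, 10)] (8, 8, 10)
```

Applying attention only to the topmost, coarsest map keeps its cost small, since the gate is computed from a single pooled vector per channel.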

Figs. 1-12 (images not included)

Data Availability

The datasets used in this study can be downloaded from https://github.com/YoYo000/MVSNet, and the results of the proposed model on the Tanks and Temples dataset have been submitted to https://www.tanksandtemples.org/leaderboard/.

Code Availability

All relevant code will be released once this paper is accepted.

Funding

This work is supported by the Fundamental Research Funds for the Central Universities (Science and technology leading talent team project) (2022JBQY009), National Natural Science Foundation of China (51827813), National Key R&D Program “Transportation Infrastructure” “Reveal the list and take command” project (2022YFB2603302) and R&D Program of Beijing Municipal Education Commission (KJZD20191000402).

Author information

Contributions

Ming Han: First author. Ming Han made substantial contributions to the conception and design of the research, including formulating the research questions and hypotheses. Ming Han also conducted the experiments, collected and analyzed the data, and interpreted the results. Hui Yin: Corresponding author. Hui Yin provided overall supervision and guidance throughout the research. Aixin Chong: Third author. Aixin Chong provided valuable feedback for revising and improving the manuscript. Qianqian Du: Fourth author. Qianqian Du provided suggestions for improving the manuscript.

Corresponding author

Correspondence to Hui Yin.

Ethics declarations

Competing interests

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Ethics approval

Ethics approval was not required for this research.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, M., Yin, H., Chong, A. et al. Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume. Appl Intell 54, 7924–7940 (2024). https://doi.org/10.1007/s10489-024-05574-z

  • DOI: https://doi.org/10.1007/s10489-024-05574-z
