Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume

Ming Han¹,
Hui Yin¹,³ (ORCID: orcid.org/0000-0002-4226-4368),
Aixin Chong² &
Qianqian Du¹

Abstract

Multi-level features are commonly employed in cascade networks, currently the dominant framework in multi-view stereo (MVS). However, recent popular multi-level feature extractors overlook the significance of fine-grained structure features for coarse depth inference in the MVS task. Discriminative structure features play an important part in matching and help boost the performance of depth inference. In this work, we propose an effective cascade-structured MVS model named FANet, in which an enhanced feature pyramid is built to predict reliable initial depth values. Specifically, features from deep layers are enhanced with the rich spatial structure information of shallow layers through a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To keep the entire model lightweight while preserving performance, an efficient module constructs a lightweight and effective cost volume that reliably represents viewpoint correspondence: it uses an average similarity metric to calculate feature correlations between the reference view and the source views, and then adaptively aggregates them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks and Temples benchmarks illustrate that the proposed model exhibits better reconstruction quality than state-of-the-art MVS methods.
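
The cost-volume step described in the abstract can be illustrated with a minimal per-pixel sketch. It is not the authors' implementation: the abstract gives no equations, so the function names (`mean_correlation`, `view_weights`, `unified_cost`) are hypothetical, the source features are assumed to be already warped onto the depth hypotheses of the reference view, and the learned adaptive weighting is replaced here by a hand-written softmax over each view's peak correlation.

```python
import math

def mean_correlation(ref, warped):
    """Average of channel-wise products for one pixel.

    ref: C floats (reference feature); warped: D lists of C floats, one
    warped source feature per depth hypothesis. Returns D correlations.
    """
    return [sum(r * s for r, s in zip(ref, hyp)) / len(ref) for hyp in warped]

def view_weights(corr_per_view):
    """Assumed stand-in for learned weights: softmax over views of the
    peak correlation each view reaches along the depth axis."""
    peaks = [max(c) for c in corr_per_view]
    top = max(peaks)  # subtract the maximum for numerical stability
    exps = [math.exp(p - top) for p in peaks]
    total = sum(exps)
    return [e / total for e in exps]

def unified_cost(ref, warped_views):
    """Weighted sum of per-view correlations, one value per depth."""
    corr = [mean_correlation(ref, w) for w in warped_views]
    weights = view_weights(corr)
    depth_count = len(corr[0])
    return [sum(w * c[d] for w, c in zip(weights, corr))
            for d in range(depth_count)]

# Toy pixel with C = 4 channels and D = 2 depth hypotheses.
ref = [1.0, 0.0, 1.0, 0.0]
good_view = [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]  # matches at depth 1
poor_view = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]  # uninformative view
cost = unified_cost(ref, [good_view, poor_view])
best_depth = max(range(len(cost)), key=cost.__getitem__)
print(best_depth)  # 1
```

Averaging over channels collapses the feature dimension, so the volume stores one value per depth hypothesis instead of one per channel, which is the usual reason a correlation volume is lighter than a variance-based one. The per-view weighting lets the more informative view dominate the aggregate.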

Data Availability

The datasets used in this study can be downloaded from https://github.com/YoYo000/MVSNet, and the results of the proposed model on the Tanks and Temples dataset have been submitted to https://www.tanksandtemples.org/leaderboard/.

Code Availability

All relevant code will be released once this paper is accepted.

Funding

This work is supported by the Fundamental Research Funds for the Central Universities (Science and technology leading talent team project) (2022JBQY009), the National Natural Science Foundation of China (51827813), the National Key R&D Program “Transportation Infrastructure” “Reveal the list and take command” project (2022YFB2603302) and the R&D Program of Beijing Municipal Education Commission (KJZD20191000402).

Author information

Authors and Affiliations

Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, 100044, China
Ming Han, Hui Yin & Qianqian Du
Key Laboratory of Beijing for Railway Engineering, Beijing Jiaotong University, Beijing, 100044, China
Aixin Chong
Frontiers Science Center for Smart High-speed Railway System, Beijing Jiaotong University, Beijing, 100044, China
Hui Yin

Contributions

Ming Han (first author) made substantial contributions to the conception and design of the research, including formulating the research questions and hypotheses, and also conducted the experiments, collected and analyzed the data, and interpreted the results. Hui Yin (corresponding author) provided overall supervision and guidance throughout the research. Aixin Chong (third author) provided valuable opinions in revising and improving the manuscript. Qianqian Du (fourth author) provided suggestions for improving the manuscript.

Corresponding author

Correspondence to Hui Yin.

Ethics declarations

Competing interests

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Ethics approval

Ethics approval was not required for this research.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, M., Yin, H., Chong, A. et al. Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume. Appl Intell 54, 7924–7940 (2024). https://doi.org/10.1007/s10489-024-05574-z

Accepted: 28 May 2024
Published: 15 June 2024
Issue Date: September 2024
DOI: https://doi.org/10.1007/s10489-024-05574-z
