pcl3dv/OV-NeRF

OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding

Guibiao Liao1,2, Kaichen Zhou3, Zhenyu Bao1,2, Kanglin Liu2, Qing Li2, *

1Peking University 2Pengcheng Laboratory 3University of Oxford

*Corresponding author: lqing900205@gmail.com


Abstract: The development of Neural Radiance Fields (NeRFs) has provided a potent representation for encapsulating the geometric and appearance characteristics of 3D scenes. Enhancing the capabilities of NeRFs in open-vocabulary 3D semantic perception tasks has been a recent focus. However, current methods that extract semantics directly from Contrastive Language-Image Pretraining (CLIP) for semantic field learning encounter difficulties due to the noisy and view-inconsistent semantics provided by CLIP. To tackle these limitations, we propose OV-NeRF, which exploits the potential of pre-trained vision and language foundation models to enhance semantic field learning through proposed single-view and cross-view strategies. First, from the single-view perspective, we introduce Region Semantic Ranking (RSR) regularization, which leverages 2D mask proposals derived from Segment Anything (SAM) to rectify the noisy semantics of each training view, facilitating accurate semantic field learning. Second, from the cross-view perspective, we propose a Cross-view Self-enhancement (CSE) strategy to address the challenge raised by view-inconsistent semantics. Rather than invariably utilizing the 2D inconsistent semantics from CLIP, CSE leverages the 3D consistent semantics generated from the well-trained semantic field itself for semantic field training, aiming to reduce ambiguity and enhance overall semantic consistency across different views. Extensive experiments validate that our OV-NeRF outperforms current state-of-the-art methods, achieving significant improvements of 20.31% and 18.42% in the mIoU metric on Replica and ScanNet, respectively. Furthermore, our approach exhibits consistently superior results across various CLIP configurations, further verifying its robustness.
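To illustrate the single-view idea, here is a minimal, hypothetical sketch of RSR-style rectification: each SAM mask proposal is treated as one region, CLIP's per-pixel class relevance is ranked within the region, and the top-ranked class is assigned uniformly across it, replacing noisy per-pixel semantics with region-level ones. Function names, shapes, and the simple mean-then-argmax ranking are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def region_semantic_ranking(clip_logits, sam_masks):
    """Hypothetical RSR-style rectification sketch.

    clip_logits: (H, W, C) per-pixel class relevance derived from CLIP
    sam_masks:   list of (H, W) boolean region masks from SAM
    returns:     (H, W, C) rectified semantic map, one-hot inside each mask
    """
    H, W, C = clip_logits.shape
    rectified = clip_logits.copy()
    for mask in sam_masks:
        # Rank classes by their mean relevance over the region's pixels
        region_scores = clip_logits[mask].mean(axis=0)  # shape (C,)
        top_class = int(region_scores.argmax())
        # Assign the top-ranked class uniformly to the whole region,
        # suppressing pixel-level noise inside the mask
        one_hot = np.zeros(C, dtype=clip_logits.dtype)
        one_hot[top_class] = 1.0
        rectified[mask] = one_hot
    return rectified
```

Pixels outside every mask keep their original CLIP relevance; in practice the rectified maps would then serve as the per-view supervision targets for the semantic field.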

Qualitative Results

Replica

ScanNet

3DOVS

Quantitative Results

Replica

ScanNet

3DOVS

Citation

Please cite the following if you find this repository helpful to your project:

@article{liao2024ov,
  title={OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding},
  author={Liao, Guibiao and Zhou, Kaichen and Bao, Zhenyu and Liu, Kanglin and Li, Qing},
  journal={arXiv preprint arXiv:2402.04648},
  year={2024}
}
