Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future


HKUST-LongGroup/Awesome-Open-Vocabulary-Detection-and-Segmentation


Chaoyang Zhu, Long Chen*


News

Please stay tuned; this repo will be updated on a weekly basis.

  • 27/06/2024: NeRF and 3DGS based 3D scene understanding is added.
  • 05/06/2024: Our 2nd version manuscript is accepted by TPAMI.

Bibtex

If you find our survey helpful, please consider citing our paper:

@article{survey-ovd-ovs,
  title={A survey on open-vocabulary detection and segmentation: Past, present, and future},
  author={Zhu, Chaoyang and Chen, Long},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024}
}

✨ PRs are welcome!

Although we aim to cover every paper, some works may still be missing. Corrections are welcome and highly appreciated: if you are an author and find that our records are incorrect, don't hesitate to contact me or open a PR.

General Overview

This survey covers two settings (zero-shot and open-vocabulary) and six tasks (object detection, semantic/instance/panoptic segmentation, 3D scene understanding, and video understanding). To build a taxonomy that holds across these diverse settings and tasks, we pivot on two questions: whether weak supervision signals are permitted, and how they are used. The weak supervision signals can be image-text pairs or large vision-language models. Below is a general overview of each methodology.

In the current literature, zero-shot and open-vocabulary are often used interchangeably. We highlight their subtle differences by tracing the evolution from the traditional zero-shot setting to the newly formulated open-vocabulary setting.
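The taxonomy described above can be sketched as a small lookup keyed on whether weak supervision is permitted. This is only an illustration: the methodology names are taken from the section headings of this list, while the mapping of "weak supervision not permitted" to zero-shot and "permitted" to open-vocabulary is our reading of the two settings, and the names `TAXONOMY`, `setting_for`, and `methodologies` are hypothetical.

```python
from typing import Dict, List

# Illustrative only: methodology names mirror the section headings of this list.
# Assumption: zero-shot forbids weak supervision signals, open-vocabulary permits them.
TAXONOMY: Dict[str, List[str]] = {
    "zero-shot": [
        "Visual-Semantic Space Mapping",
        "Novel Visual Feature Synthesis",
    ],
    "open-vocabulary": [
        "Region-Aware Training",
        "Pseudo-Labeling",
        "Knowledge Distillation",
        "Transfer Learning",
    ],
}


def setting_for(weak_supervision_allowed: bool) -> str:
    """Return the setting name implied by the permission to use weak supervision."""
    return "open-vocabulary" if weak_supervision_allowed else "zero-shot"


def methodologies(weak_supervision_allowed: bool) -> List[str]:
    """Return the methodology families listed under the implied setting."""
    return TAXONOMY[setting_for(weak_supervision_allowed)]


if __name__ == "__main__":
    for allowed in (False, True):
        print(setting_for(allowed), "->", ", ".join(methodologies(allowed)))
```

The second pivot, how the weak supervision signals are used, is what separates the four open-vocabulary families from one another.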


Zero-Shot Object Detection

Visual-Semantic Space Mapping

Venue Paper Abbr Paper Title Project
ECCV'18 ZSDv1 Zero-Shot Object Detection N/A
ACCV'18 & IJCV'20 ZSDv2 Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts N/A
AAAI'20 CA-ZSR Context-Aware Zero-Shot Recognition Code
AAAI'19 ZSD-TD Zero-Shot Object Detection with Textual Descriptions N/A
ACCV'20 BLC Background Learnable Cascade for Zero-Shot Object Detection Code
ICCV'19 TL-ZSD Transductive Learning for Zero-Shot Object Detection N/A
arXiv'23 SSB Frustratingly Simple but Effective Zero-shot Detection and Segmentation: Analysis and a Strong Baseline N/A
WACV'20 MS-Zero A Multi-Space Approach to Zero-Shot Object Detection N/A
TCSVT'19 ZS-YOLO Zero Shot Detection N/A
AAAI'21 DPIF Inference Fusion with Associative Semantics for Unseen Object Detection Code
TPAMI'21 ContrastZSD Semantics-Guided Contrastive Network for Zero-Shot Object Detection N/A
IJCAI'20 ZSD-CNN Zero-Shot Object Detection via Learning an Embedding from Semantic Space to Visual Space N/A

Novel Visual Feature Synthesis

Venue Paper Abbr Paper Title Project
CVPR'20 DELO Don't Even Look Once: Synthesizing Features for Zero-Shot Detection N/A
ACCV'20 SU Synthesizing the Unseen for Zero-shot Object Detection Code
AAAI'20 GTNet GTNet: Generative Transfer Network for Zero-Shot Object Detection Code
CVPR'22 RRFS Robust Region Feature Synthesizer for Zero-Shot Object Detection Code

Zero-Shot Segmentation

Zero-Shot Semantic Segmentation

Visual-Semantic Space Mapping

Venue Paper Abbr Paper Title Project
CVPR'20 SPNet Semantic Projection Network for Zero- and Few-Label Semantic Segmentation Code
NeurIPS'20 ULZSS Uncertainty-Aware Learning for Zero-Shot Semantic Segmentation Code
ICCV'21 JoEm Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation Code
ICCVW'19 VM Zero-Shot Semantic Segmentation via Variational Mapping N/A
ICCV'21 PMOSR Prototypical Matching and Open Set Rejection for Zero-Shot Semantic Segmentation N/A

Novel Visual Feature Synthesis

Venue Paper Abbr Paper Title Project
NeurIPS'19 ZS3Net Zero-Shot Semantic Segmentation Code
NeurIPS'20 CSRL Consistent Structural Relation Learning for Zero-Shot Segmentation N/A
MM'20 CaGNet Context-aware Feature Generation for Zero-shot Semantic Segmentation Code
ICCV'21 SIGN SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic Segmentation Code

Zero-Shot Instance Segmentation

Venue Paper Abbr Paper Title Project
CVPR'21 ZSIS Zero-Shot Instance Segmentation Code

Open-Vocabulary Object Detection

Region-Aware Training

Venue Paper Abbr Paper Title Project
CVPR'21 OVR-CNN Open-Vocabulary Object Detection Using Captions Code
GCPR'22 LocOv Localized Vision-Language Matching for Open-vocabulary Object Detection Code
arXiv'23 MMC-Det Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection N/A
NeurIPS'22 DetCLIP DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection N/A
CVPR'23 DetCLIPv2 DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment N/A
CVPR'24 DetCLIPv3 DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection N/A
AAAI'24 WSOVOD Weakly Supervised Open-Vocabulary Object Detection Code
CVPR'23 RO-ViT Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers N/A
ICCV'23 CFM-ViT Contrastive Feature Masking Open-Vocabulary Vision Transformer N/A
ICCV'23 DITO Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection Code
ICLR'23 VLDet Learning Object-Language Alignments for Open-Vocabulary Object Detection Code
ICCV'23 GOAT Open-Vocabulary Object Detection With an Open Corpus N/A
ECCV'22 OV-DETR Open-Vocabulary DETR with Conditional Matching Code
arXiv'23 Prompt-OVD Prompt-Guided Transformers for End-to-End Open-Vocabulary Object Detection N/A
CVPR'23 CORA CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching N/A
ICCV'23 EdaDet EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment Code
ICCV'21 MDETR MDETR: Modulated Detection for End-to-End Multi-Modal Understanding Code
ECCV'22 MAVL Class-agnostic Object Detection with Multi-modal Transformer Code
NeurIPS'23 MQ-Det Multi-modal Queried Object Detection in the Wild Code
CVPR'24 YOLO-World Real-Time Open-Vocabulary Object Detection Code
MM'23 SGDN Open-Vocabulary Object Detection via Scene Graph Discovery N/A
CVPR'24 USE USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation N/A

Pseudo-Labeling

Venue Paper Abbr Paper Title Project
CVPR'22 RegionCLIP RegionCLIP: Region-based Language-Image Pretraining Code
ECCV'22 VL-PLM Exploiting Unlabeled Data with Vision and Language Models for Object Detection Code
CVPR'22 GLIP Grounded Language-Image Pre-training Code
NeurIPS'22 GLIPv2 GLIPv2: Unifying Localization and VL Understanding Code
arXiv'23 Grounding-DINO Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection Code
ECCV'22 PromptDet PromptDet: Towards Open-vocabulary Detection using Uncurated Images Code
arXiv'23 SAS-Det Taming Self-Training for Open-Vocabulary Object Detection Code
ECCV'22 PB-OVD Open Vocabulary Object Detection with Pseudo Bounding-Box Labels Code
AAAI'24 CLIM CLIM: Contrastive Language-Image Mosaic for Region Representation Code
arXiv'22 VTP-OVD Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection N/A
AAAI'24 ProxyDet ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection Code
NeurIPS'23 CoDet CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection Code
ECCV'22 Detic Detecting Twenty-thousand Classes using Image-level Supervision Code
ICML'23 MMC Multi-Modal Classifiers for Open-Vocabulary Object Detection Code
arXiv'23 3Ways Three ways to improve feature alignment for open vocabulary detection N/A
arXiv'23 PLAC Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection N/A
arXiv'23 PCL Open-Vocabulary Object Detection using Pseudo Caption Labels N/A
NeurIPS'23 OWLv2 Scaling Open-Vocabulary Object Detection Code

Knowledge Distillation

Venue Paper Abbr Paper Title Project
ICLR'22 ViLD Open-vocabulary Object Detection via Vision and Language Knowledge Distillation Code
ICDMW'22 ZSD-YOLO Zero-shot Object Detection Through Vision-Language Embedding Alignment Code
WACV'24 LP-OVOD LP-OVOD: Open-Vocabulary Object Detection by Linear Probing Code
arXiv'23 EZSD Efficient Feature Distillation for Zero-shot Annotation Object Detection Code
AAAI'24 SIC-CADS Simple Image-level Classification Improves Open-vocabulary Object Detection Code
CVPR'23 BARON Aligning Bag of Regions for Open-Vocabulary Object Detection Code
CVPR'23 OADP Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection Code
arXiv'23 GridCLIP GridCLIP: One-Stage Object Detection by Grid-Level CLIP Representation Learning N/A
NeurIPS'22 RKDWTF Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection Code
ICCV'23 DK-DETR Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection Code
CVPR'22 HierKD Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation Code
CVPR'22 DetPro Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model Code
arXiv'23 CLIPSelf CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction Code
CVPR'24 SAMP Scene-adaptive and Region-aware Multi-modal Prompt for Open Vocabulary Object Detection N/A
IJCV'24 OV-DAR OV-DAR: Open-Vocabulary Object Detection and Attributes Recognition N/A
CVPR'24 LBP Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection N/A

Transfer Learning

Venue Paper Abbr Paper Title Project
ECCV'22 OWL-ViT Simple Open-Vocabulary Object Detection with Vision Transformers Code
CVPR'23 UniDetector Detecting Everything in the Open World: Towards Universal Object Detection Code
ICLR'23 F-VLM F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models Code
CVPR'23 ScaleDet ScaleDet: A Scalable Multi-Dataset Object Detector N/A
ICCV'23 OpenSeed A Simple Framework for Open-Vocabulary Segmentation and Detection Code
arXiv'23 DRR What Makes Good Open-Vocabulary Detector: A Disassembling Perspective N/A
arXiv'23 Sambor Boosting Segment Anything Model Towards Open-Vocabulary Learning Code

Open-Vocabulary Segmentation

Open-Vocabulary Semantic Segmentation

Region-Aware Training

Venue Paper Abbr Paper Title Project
ECCV'22 OpenSeg Scaling Open-Vocabulary Image Segmentation with Image-Level Labels N/A
arXiv'23 SILC SILC: Improving Vision Language Pretraining with Self-Distillation N/A
CVPR'22 GroupViT GroupViT: Semantic Segmentation Emerges from Text Supervision Code
ECCV'22 ViL-Seg Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding N/A
ICML'23 SegCLIP SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation Code
CVPR'23 OVSegmentor Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision Code
CVPR'23 PACL Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning N/A
CVPR'23 TCL Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs Code
ECCV'22 SimSeg A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model Code

Pseudo-Labeling

Venue Paper Abbr Paper Title Project
ECCV'22 TTD Open-Vocabulary Semantic Segmentation Using Test-Time Distillation N/A

Knowledge Distillation

Venue Paper Abbr Paper Title Project
arXiv'23 GKC Global Knowledge Calibration for Fast Open-Vocabulary Segmentation N/A
arXiv'23 SAM-CLIP SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding N/A
ICCV'23 ZeroSeg Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only Code

Transfer Learning

Venue Paper Abbr Paper Title Project
ICLR'22 LSeg Language-driven Semantic Segmentation Code
CVPR'23 SAZS Delving Into Shape-Aware Zero-Shot Semantic Segmentation Code
MM'23 CEL Class Enhancement Losses with Pseudo Labels for Zero-shot Semantic Segmentation N/A
CVPR'22 ZegFormer Decoupling Zero-Shot Semantic Segmentation Code
NeurIPS'22 ReCo ReCo: Retrieve and Co-segment for Zero-shot Transfer Project
arXiv'23 SCAN Open-Vocabulary Segmentation with Semantic-Assisted Calibration N/A
ECCV'22 ZSSeg A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model Code
ECCV'22 MaskCLIP Extract Free Dense Labels from CLIP Code
arXiv'23 CLIP-DINOiser CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation Code
PRCV'23 MVP-SEG MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation N/A
arXiv'23 OVDiff Diffusion Models for Zero-Shot Open-Vocabulary Segmentation Project
WACV'24 FOSSIL FOSSIL: Free Open-Vocabulary Semantic Segmentation Through Synthetic References Retrieval N/A
NeurIPS'23 POMP Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition Code
NeurIPS'23 AttrSeg AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation N/A
arXiv'23 PnP-OVSS Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models Code
arXiv'23 TagAlign TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification Project
arXiv'23 SelfSeg Auto-Vocabulary Semantic Segmentation N/A
CVPR'22 DenseCLIP DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting Code
CVPR'23 OVSeg Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP Code
arXiv'23 CAT-Seg CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation Code
arXiv'23 SED SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation Code
NeurIPS'23 MAFT Learning Mask-aware CLIP Representations for Zero-Shot Segmentation Code
arXiv'23 TagCLIP TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation N/A
CVPR'23 ZegCLIP ZegCLIP: Towards Adapting CLIP for Zero-Shot Semantic Segmentation Code
CVPR'22 CLIPSeg Image Segmentation Using Text and Image Prompts Code
CVPR'23 SAN Side Adapter Network for Open-Vocabulary Semantic Segmentation Code
arXiv'23 CLIP Surgery CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks Code
arXiv'23 CaR CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor Project
arXiv'24 Cascade-CLIP Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation Code
arXiv'24 OpenDAS OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation Project
arXiv'24 H-CLIP Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation N/A

Open-Vocabulary Instance Segmentation

Region-Aware Training

Venue Paper Abbr Paper Title Project
ICCV'23 CGG Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation Code
CVPR'23 D2Zero Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation Code

Pseudo-Labeling

Venue Paper Abbr Paper Title Project
CVPR'23 XPM Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling Code
CVPR'23 Mask-free OVIS Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations Code
arXiv'23 MosaicFusion MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation Code

Knowledge Distillation

Venue Paper Abbr Paper Title Project
arXiv'24 OV-SAM Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively Code

Open-Vocabulary Panoptic Segmentation

Region-Aware Training

Venue Paper Abbr Paper Title Project
arXiv'24 Uni-OVSeg Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision Code
CVPR'23 X-Decoder Generalized Decoding for Pixel, Image, and Language Code
CVPR'24 APE Aligning and Prompting Everything All at Once for Universal Visual Perception Code

Knowledge Distillation

Venue Paper Abbr Paper Title Project
CVPR'23 PADing Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation Code

Transfer Learning

Venue Paper Abbr Paper Title Project
NeurIPS'23 FC-CLIP Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP Code
CVPR'23 FreeSeg FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation Project
arXiv'24 PosSAM PosSAM: Panoptic Open-vocabulary Segment Anything Project
ICCV'23 MasQCLIP MasQCLIP for Open-Vocabulary Universal Image Segmentation Project
CVPR'24 OMG-Seg OMG-Seg: Is One Model Good Enough For All Segmentation? Code
arXiv'23 Semantic-SAM Semantic-SAM: Segment and Recognize Anything at Any Granularity Code
CVPR'23 ODISE Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models Code
NeurIPS'23 HIPIE Hierarchical Open-vocabulary Universal Image Segmentation Code
ICML'23 MaskCLIP Open-Vocabulary Universal Image Segmentation with MaskCLIP Project
ICCV'23 OPSNet Open-vocabulary Panoptic Segmentation with Embedding Modulation N/A

Open-Vocabulary 3D Scene Understanding

Open-Vocabulary 3D Detection

Venue Paper Abbr Paper Title Project
CVPR'23 OV-3DET Open-Vocabulary Point-Cloud Object Detection without 3D Annotation Code
AAAI'24 FM-OV3D FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection Code
arXiv'23 OpenSight OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection N/A
NeurIPS'23 CoDA CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection Code
arXiv'23 L3Det Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection N/A

Open-Vocabulary 3D Segmentation

Open-Vocabulary 3D Semantic Segmentation

Venue Paper Abbr Paper Title Project
arXiv'21 SeCondPoint Language-Level Semantics Conditioned 3D Point Cloud Segmentation N/A
3DV'21 3DGenZ Generative Zero-Shot Learning for Semantic Segmentation of 3D Point Clouds Code
CVPR'23 OpenScene OpenScene: 3D Scene Understanding with Open Vocabularies Project
CVPR'23 PLA PLA: Language-Driven Open-Vocabulary 3D Scene Understanding Code
arXiv'23 RegionPLC RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding Project

Open-Vocabulary 3D Instance Segmentation

Venue Paper Abbr Paper Title Project
NeurIPS'23 OpenMask3D OpenMask3D: Open-Vocabulary 3D Instance Segmentation Project
CVPR'24 MaskClustering MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation Project
arXiv'23 OpenIns3D OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation Project
arXiv'23 Open3DIS Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance Project
arXiv'24 OpenSU3D OpenSU3D: Open World 3D Scene Understanding using Foundation Models Project

NeRF and 3DGS based

NeRF (Neural Radiance Field) and 3DGS (3D Gaussian Splatting) are popular approaches to novel view synthesis of a holistic scene. The methods below leverage the multi-view consistency inherently imposed by the 3D model either to help 2D image segmentation or to directly perform 3D semantic segmentation over points (voxels or Gaussians) in the scene.

Venue Paper Abbr Paper Title Project
ICCV'21 Semantic-NeRF In-Place Scene Labelling and Understanding With Implicit Scene Representation Code
NeurIPS'22 FFD Decomposing NeRF for Editing via Feature Field Distillation Code
arXiv'23 Gaussian Grouping Gaussian Grouping: Segment and Edit Anything in 3D Scenes Code
ICCV'23 LERF LERF: Language Embedded Radiance Fields Project
NeurIPS'23 3DOVS Weakly Supervised 3D Open-vocabulary Segmentation Code
arXiv'24 OpenGaussian OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding Project
arXiv'24 OV-NeRF OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding Code
arXiv'24 Semantic Gaussians Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting Project
arXiv'24 FMGS FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding Project
CVPR'24 LEGaussians Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding Code
CVPR'24 LangSplat LangSplat: 3D Language Gaussian Splatting Project
CVPR'24 Feature 3DGS Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields Code

Open-Vocabulary Video Understanding

Open-Vocabulary Video Instance Segmentation

Venue Paper Abbr Paper Title Project
ICCV'23 OV2Seg Towards Open-Vocabulary Video Instance Segmentation Code
arXiv'23 OpenVIS OpenVIS: Open-vocabulary Video Instance Segmentation Code
arXiv'24 BriVIS Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation Code