Seungjun Lee1* ·
Yuyang Zhao2* ·
Gim Hee Lee2
1Korea University ·
2National University of Singapore
*equal contribution
Code | Paper | Project Page
SOLE is highly generalizable and can segment corresponding instances with various language instructions, including but not limited to visual questions, attribute descriptions, and functional descriptions.
- [2024/04/20] Code is released 💡.
- [2024/05/02] Pre-processed data and weights are released. You can now train and evaluate SOLE 👏🏻.
- Release the code
- Release the preprocessed data and weights
- Release the evaluation code for Replica dataset
- Release the pre-processed data and precomputed features for Replica dataset
The main dependencies of the project are the following:
python: 3.10.9
cuda: 11.3
You can set up a conda environment as follows:
export TORCH_CUDA_ARCH_LIST="6.0 6.1 6.2 7.0 7.2 7.5 8.0 8.6"
conda env create -f environment.yml
conda activate sole
pip3 install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip3 install torch-scatter -f https://data.pyg.org/whl/torch-1.12.1+cu113.html
pip3 install 'git+https://github.com/facebookresearch/detectron2.git@710e7795d0eeadf9def0e7ef957eea13532e34cf' --no-deps
mkdir third_party
cd third_party
git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
git checkout 02fc608bea4c0549b0a7b00ca1bf15dee4a0b228
python setup.py install --blas_include_dirs=${CONDA_PREFIX}/include --blas=openblas
cd ../../pointnet2
python setup.py install
cd ../../
pip3 install pytorch-lightning==1.7.2
pip3 install open-clip-torch
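After the installation finishes, a quick check that the main packages are visible to the interpreter can save a failed training run later. The snippet below is a minimal sketch, not part of the repository: the import names (for example `torch_scatter` for `torch-scatter` and `open_clip` for `open-clip-torch`) are assumptions based on the install commands above, and `find_missing` is a hypothetical helper.

```python
import importlib.util

# Import names assumed from the install commands above; adjust if they differ.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "torch_scatter",
    "detectron2",
    "MinkowskiEngine",
    "pytorch_lightning",
    "open_clip",
]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Treat any lookup failure as "not installed".
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that each module can be located; it does not import them, so it will not catch CUDA or compilation problems inside a package.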
We provide the pre-processed 3D data and precomputed features for training and evaluation, listed below:
- Pre-processed 3D data
  - ScanNet
  - ScanNet200
  - Replica (coming soon)
- Precomputed per-point CLIP features
  - ScanNet
  - Replica (coming soon)
- Precomputed features of MCA and MEA
You can download the data above by following the Download data and weight section. We also provide the detailed data configuration here to help you understand our pre-processed data.
For stable training, we employ a two-stage training process:
- Pretrain the backbone using only mask annotations.
- Train the mask decoder while the backbone is kept fixed. Mask annotations and three types of associations are used for training. (See the original paper for details.)
For training, we provide pretrained backbone weights for the ScanNet and ScanNet200 datasets, listed below:
For evaluation, we provide the official weights of SOLE for the ScanNet and ScanNet200 datasets, listed below:
- Official weights of SOLE for ScanNet
- Official weights of SOLE for ScanNet200
- Official weights of SOLE for Replica (coming soon)
You can download all of the weights for the pretrained backbone and SOLE by following the Download data and weight section.
We provide a Python script that downloads all of the pre-processed data and weights mentioned above. Run the command below:
python download_data.py
Once you run the above command, the downloaded files are automatically placed in the corresponding paths. Refer to the file structure below.
├── backbone_checkpoint
│   ├── backbone_scannet.ckpt        <- Backbone weights for ScanNet
│   └── backbone_scannet200.ckpt     <- Backbone weights for ScanNet200
│
├── checkpoint
│   ├── scannet.ckpt                 <- Official weights for ScanNet
│   └── scannet200.ckpt              <- Official weights for ScanNet200
│
├── data
│   └── preprocessed
│       ├── scannet                  <- Preprocessed ScanNet data
│       └── scannet200               <- Preprocessed ScanNet200 data
│
├── openvocab_supervision
│   ├── openseg
│   │   └── scannet                  <- Precomputed per-point CLIP features for ScanNet
│   │       ├── scene0000_00_0.pt
│   │       ├── scene0000_01_0.pt
│   │       └── ...
│   ├── scannet_mca                  <- Precomputed features of MCA for ScanNet
│   │   ├── scene0000_00.pickle
│   │   ├── scene0000_01.pickle
│   │   └── ...
│   ├── scannet_mea                  <- Precomputed features of MEA for ScanNet
│   │   ├── scene0000_00.pickle
│   │   ├── scene0000_01.pickle
│   │   └── ...
│   ├── scannet200_mca               <- Precomputed features of MCA for ScanNet200
│   │   ├── scene0000_00.pickle
│   │   ├── scene0000_01.pickle
│   │   └── ...
│   └── scannet200_mea               <- Precomputed features of MEA for ScanNet200
│       ├── scene0000_00.pickle
│       ├── scene0000_01.pickle
│       └── ...
If you have successfully downloaded all of the given files, you are ready to train and evaluate the model. See the commands in the Training and Testing section to run SOLE.
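To confirm that the download placed everything where the file structure above expects it, a short script can list whatever is missing. This is a sketch under assumptions, not a tool shipped with the repository: the paths are read off the tree above (with the nesting it implies), they are taken relative to the repository root, and `missing_paths` is a hypothetical helper.

```python
from pathlib import Path

# Paths read from the file structure above, relative to the repository root.
EXPECTED_PATHS = [
    "backbone_checkpoint/backbone_scannet.ckpt",
    "backbone_checkpoint/backbone_scannet200.ckpt",
    "checkpoint/scannet.ckpt",
    "checkpoint/scannet200.ckpt",
    "data/preprocessed/scannet",
    "data/preprocessed/scannet200",
    "openvocab_supervision/openseg/scannet",
    "openvocab_supervision/scannet_mca",
    "openvocab_supervision/scannet_mea",
    "openvocab_supervision/scannet200_mca",
    "openvocab_supervision/scannet200_mea",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing files or directories:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected files and directories are present.")
```

The check only tests for existence; it does not verify that each directory contains the full set of per-scene files.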
Train SOLE on the ScanNet dataset.
bash scripts/scannet/scannet_train.sh
Train SOLE on the ScanNet200 dataset.
bash scripts/scannet200/scannet200_train.sh
Evaluate SOLE on the ScanNet dataset.
bash scripts/scannet/scannet_val.sh
Evaluate SOLE on the ScanNet200 dataset.
bash scripts/scannet200/scannet200_val.sh
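The four commands above follow one naming pattern, `scripts/<dataset>/<dataset>_<mode>.sh`. If you prefer to launch them from Python (for example, to queue several runs), a thin wrapper is enough. The sketch below is illustrative only: `script_path` and `run` are hypothetical helpers, and it assumes you run it from the repository root with `bash` available.

```python
import subprocess

# Datasets and modes for which scripts are listed above.
DATASETS = ("scannet", "scannet200")
MODES = ("train", "val")


def script_path(dataset, mode):
    """Build the script path following the naming pattern of the commands above."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"scripts/{dataset}/{dataset}_{mode}.sh"


def run(dataset, mode):
    """Run the matching script; raises CalledProcessError if it exits non-zero."""
    subprocess.run(["bash", script_path(dataset, mode)], check=True)
```

For example, `run("scannet", "train")` is equivalent to the first command above.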
If you want to use wandb during training, set the workspace in the conf/config_base_instance_segmentation.yaml file to your wandb workspace name. Then run the command below before running the training/testing command:
wandb enabled
If you want to turn off wandb, run the command below before running the training/testing command:
wandb disabled
We build our code on top of Mask3D. We sincerely thank the Mask3D team for their amazing work and well-structured code. Furthermore, our work is heavily inspired by the following works:
We express our gratitude for their exceptional contributions.
If you find our code or paper useful, please cite:
@article{lee2024segment,
title = {Segment Any 3D Object with Language},
author = {Lee, Seungjun and Zhao, Yuyang and Lee, Gim Hee},
year = {2024},
journal = {arXiv preprint arXiv:2404.02157},
}