
MonoVAN

This is the reference PyTorch implementation for training and testing depth estimation models using the method described in

MonoVAN: Visual Attention for Self-Supervised Monocular Depth Estimation (ISMAR 2023)

Ilya Indyk and Ilya Makarov

[Figure: qualitative depth prediction examples on KITTI]

⚙️ Setup

Assuming a fresh Anaconda distribution, you can install the dependencies with:

pip install -r requirements.txt

We ran our experiments with PyTorch 1.13.0, CUDA 11.7, Python 3.10 and CentOS 7.
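
If you prefer to start from a clean conda environment matching the Python version above, something like the following works (the environment name is just an illustration):

conda create -n monovan python=3.10
conda activate monovan
pip install -r requirements.txt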

Note that our code is built on top of Monodepth2.

📊 Results on KITTI

[Table: quantitative results on KITTI]

💾 KITTI training data

You can download the entire raw KITTI dataset by running:

wget -i splits/kitti_archives_to_download.txt -P kitti_data/

Then unzip with

cd kitti_data
unzip "*.zip"
cd ..

Warning: it weighs about 175GB, so make sure you have enough space to unzip too!

Our default settings expect that you have converted the png images to jpeg with this command, which also deletes the raw KITTI .png files:

find kitti_data/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'

or you can skip this conversion step and train from raw png files by adding the flag --png when training, at the expense of slower load times.

The above conversion command creates images which match our experiments, where KITTI .png images were converted to .jpg on Ubuntu 16.04 with default chroma subsampling 2x2,1x1,1x1. We found that Ubuntu 18.04 defaults to 2x2,2x2,2x2, which gives different results, hence the explicit parameter in the conversion command.

You can also place the KITTI dataset wherever you like and point towards it with the --data_path flag during training and evaluation.
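
For example, if the dataset lives somewhere else (the path below is illustrative):

python train.py --model_name mono_model --data_path /path/to/kitti_data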

Splits

The train/test/validation splits are defined in the splits/ folder. By default, the code will train a depth model using Zhou's subset of the standard Eigen split of KITTI, which is designed for monocular training. You can also train a model using the new benchmark split or the odometry split by setting the --split flag.
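
For example, to train on the new benchmark split instead of the default Zhou subset (the model name is illustrative):

python train.py --model_name benchmark_model --split benchmark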

Custom dataset

You can train on a custom monocular or stereo dataset by writing a new dataloader class which inherits from MonoDataset – see the KITTIDataset class in datasets/kitti_dataset.py for an example.
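
For reference, below is a minimal sketch of such a class. It assumes the Monodepth2-style MonoDataset interface that this code inherits (loader, data_path, img_ext, get_color, check_depth); the class name, folder layout and intrinsics are illustrative placeholders, not part of this repository.

# A minimal sketch of a custom monocular dataset, following the
# Monodepth2-style MonoDataset interface (compare datasets/kitti_dataset.py).
# The class name, folder layout and intrinsics are placeholders.
import os

import numpy as np
import PIL.Image as pil

from .mono_dataset import MonoDataset  # import path assumed to match Monodepth2's layout


class MyCustomDataset(MonoDataset):
    """Loads frames from <data_path>/<folder>/<frame_index>.jpg (or .png with --png)."""

    def __init__(self, *args, **kwargs):
        super(MyCustomDataset, self).__init__(*args, **kwargs)

        # 4x4 camera intrinsics with fx, cx normalized by image width and
        # fy, cy by image height, as in KITTIDataset -- replace with your calibration.
        self.K = np.array([[0.58, 0,    0.5, 0],
                           [0,    1.92, 0.5, 0],
                           [0,    0,    1,   0],
                           [0,    0,    0,   1]], dtype=np.float32)
        self.full_res_shape = (1242, 375)  # (width, height) of the raw frames

    def check_depth(self):
        # No ground-truth depth is required for self-supervised monocular training
        return False

    def get_image_path(self, folder, frame_index, side):
        # 'side' only matters for stereo training and is ignored here
        return os.path.join(self.data_path, folder,
                            "{:010d}{}".format(frame_index, self.img_ext))

    def get_color(self, folder, frame_index, side, do_flip):
        color = self.loader(self.get_image_path(folder, frame_index, side))
        if do_flip:
            color = color.transpose(pil.FLIP_LEFT_RIGHT)
        return color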

⏳ Training

By default models are saved to ~/tmp/<model_name>. This can be changed with the --log_dir flag.

Please download the ImageNet-1K pretrained VAN B1 or B2 model and place it in ./ckpt/.

Monocular training:

wandb disabled
python train.py --model_name=model \
                --batch_size=16 \
                --num_epochs=21 \
                --learning_rate=0.00013 \
                --weight_decay=0 \
                --scheduler='step' \
                --lr_final_div_factor=0.1 \
                --log_frequency=50 \
                --num_workers=8 \
                --scheduler_step_size=15 \
                --log_dir='logs'

GPUs

You can specify which GPU to use with the CUDA_VISIBLE_DEVICES environment variable:

CUDA_VISIBLE_DEVICES=2 python train.py --model_name mono_model

Code to support training on multiple GPUs will be released later.

🔧 Other training options

Run python train.py -h (or look at options.py) to see the range of other training options, such as learning rates and ablation settings.

📊 KITTI evaluation

To prepare the ground truth depth maps run:

python export_gt_depth.py --data_path kitti_data --split eigen
python export_gt_depth.py --data_path kitti_data --split eigen_benchmark

...assuming that you have placed the KITTI dataset in the default location of ./kitti_data/.

The following example command evaluates the epoch 19 weights of a model named mono_model:

python evaluate_depth.py --load_weights_folder ~/tmp/mono_model/models/weights_19/ --eval_mono

For stereo models, you must use the --eval_stereo flag (see note below):

python evaluate_depth.py --load_weights_folder ~/tmp/stereo_model/models/weights_19/ --eval_stereo

If you train your own model with our code, you are likely to see slight differences from the published results due to randomness in weight initialization and data loading.

An additional parameter --eval_split can be set. Its three possible values are explained below:

| --eval_split | Test set size | For models trained with... | Description |
|---|---|---|---|
| eigen | 697 | --split eigen_zhou (default) or --split eigen_full | The standard Eigen test files |
| eigen_benchmark | 652 | --split eigen_zhou (default) or --split eigen_full | Evaluate with the improved ground truth from the new KITTI depth benchmark |
| benchmark | 500 | --split benchmark | The new KITTI depth benchmark test files |
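
For example, to evaluate the monocular model above against the improved ground truth (assuming the same weights folder as before):

python evaluate_depth.py --load_weights_folder ~/tmp/mono_model/models/weights_19/ --eval_mono --eval_split eigen_benchmark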

Weights

Weights for the models with B1 and B2 configurations trained on the KITTI dataset will be released later.

Acknowledgement

Thanks to the authors for their works:

Monodepth2

VAN

HR-Depth
