pytorch-layoutnet

News: Check out our new project HoHoNet on this task and more!
News: Check out our new project HorizonNet on this task.

This is an unofficial implementation of the CVPR'18 paper "LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image". The official layout dataset is converted to .png and the pretrained models are converted to PyTorch state dicts.
Differences from the official implementation:

  • Architecture: Only the joint boundary branch and corner branch are implemented, as the paper states that "Training with 3D regressor has a small impact".
  • Pre-processing: The line segment detector and pano image alignment are ported from MATLAB to Python in pano.py and pano_lsd_align.py.
  • Post-processing: No 3D layout optimization. Instead, this repo implements gradient ascent on a similar objective (see below for more detail).

Overview of the pipeline:

With this repo, you can:

  • extract/visualize layout of your own 360 images with my trained network
  • reproduce official experiments
  • train on your own dataset
  • quantitative evaluation (3D IoU, Corner Error, Pixel Error)

Requirements

  • Python 3
  • pytorch>=0.4.1
  • numpy
  • scipy
  • Pillow
  • torchfile
  • opencv-python>=3.1 (for pre-processing)
  • open3d (for layout 3D viewer)
  • shapely (for layout 3D viewer)

Visualization

1. Preparation

  • Get the 360 room images you want to process. assert/demo.png is used as the example here.
  • Prepare the environment to run the Python scripts.
  • Download the trained model from here (350M). Put the 3 files extracted from the downloaded zip under the ckpt/ folder.
    • So you will get ckpt/epoch_30_*.pth

2. Pre-processing (Align camera pose with floor)

  • Pre-process the above assert/demo.png by running the command below. See python visual_preprocess.py -h for a more detailed script description.
    python visual_preprocess.py --img_glob assert/demo.png --output_dir assert/output_preprocess/
    
  • Arguments explanation:
    • --img_glob path to your 360 room image(s).
    • --output_dir path to the directory for dumping the results.
    • Hint: you can use shell-style wildcards in quotes (e.g. "my_img_dir/*png") to process multiple images in one shot.
  • Under the given --output_dir, you will get results like below, prefixed with the source image basename.
    • The aligned rgb images [SOURCE BASENAME]_aligned_rgb.png and line segments images [SOURCE BASENAME]_aligned_line.png
      • demo_aligned_rgb.png demo_aligned_line.png
    • The detected vanishing points [SOURCE BASENAME]_VP.txt (Here demo_VP.txt)
      -0.006676 -0.499807 0.866111
      0.000622 0.866128 0.499821
      0.999992 -0.002519 0.003119
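The VP file is just three whitespace-separated floats per row (one vanishing direction per row), so it can be loaded directly with numpy. A minimal sketch, assuming the demo paths above:

    import numpy as np

    # Load the three detected vanishing points (one 3-vector per row),
    # e.g. the demo_VP.txt produced by visual_preprocess.py above.
    vp = np.loadtxt('assert/output_preprocess/demo_VP.txt')  # shape (3, 3)

    # The rows are (close to) unit length and roughly mutually orthogonal
    # for a Manhattan-aligned panorama.
    print(np.linalg.norm(vp, axis=1))   # ~[1, 1, 1]
    print(vp @ vp.T)                    # ~identity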
      

3. Layout Prediction with LayoutNet

  • Predict the layout from the aligned image and line segments above by running the command below.
    python visual.py --path_prefix ckpt/epoch_30 --img_glob assert/output_preprocess/demo_aligned_rgb.png --line_glob assert/output_preprocess/demo_aligned_line.png --output_dir assert/output
    
  • Arguments explanation:
    • --path_prefix prefix path to the trained model.
    • --img_glob path to the VP aligned image.
    • --line_glob path to the corresponding line segment image of the VP aligned image.
    • --output_dir path to the directory to dump the results.
    • Hint: for the two globs, you can use wildcards in quotes.
    • Hint: for better results, you can add --flip, --rotate 0.25 0.5 0.75 and --post_optimization.
  • You will get results like below, prefixed with the source image basename.
    • The model's output corner/edge probability map [SOURCE BASENAME]_[cor|edg].png
      • demo_aligned_rgb_cor.png demo_aligned_rgb_edg.png
    • The extracted layout and all in one image [SOURCE BASENAME]_[bon|all].png
      • demo_aligned_rgb_bon.png demo_aligned_rgb_all.png
    • The extracted corners of the layout [SOURCE BASENAME]_cor_id.txt
      104.928192 186.603119
      104.928192 337.168579
      378.994934 177.796646
      378.994934 346.994629
      649.976440 183.446518
      649.976440 340.711731
      898.234619 190.629089
      898.234619 332.616364
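As shown in the demo output above, the cor_id values come in pairs sharing the same x: the first row of each pair is the ceiling corner and the second the floor corner of a wall junction, in image coordinates on the aligned panorama. A minimal parsing sketch under that reading:

    import numpy as np

    # Read the corners dumped by visual.py (one "x y" pair per row, in pixels
    # on the aligned equirectangular image).
    cor_id = np.loadtxt('assert/output/demo_aligned_rgb_cor_id.txt')  # shape (2*N, 2)

    # Split into ceiling/floor corners; rows alternate ceiling, floor, ceiling, ...
    ceiling = cor_id[0::2]
    floor = cor_id[1::2]
    print('wall junctions:', len(ceiling))
    for (xc, yc), (_, yf) in zip(ceiling, floor):
        print(f'x={xc:8.2f}  ceiling y={yc:8.2f}  floor y={yf:8.2f}')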
      

4. Layout 3D Viewer

  • A pure Python script to visualize the predicted layout in 3D as a point cloud. The command below visualizes the result stored in assert/.
    python visual_3d_layout.py --ignore_ceiling --img assert/output_preprocess/demo_aligned_rgb.png --layout  assert/output/demo_aligned_rgb_cor_id.txt
    
  • Arguments explanation:
    • --img path to the aligned 360 image
    • --layout path to the txt storing the cor_id (predicted or ground truth)
    • --ignore_ceiling skip rendering the ceiling
    • for more arguments, see python visual_3d_layout.py -h
    • In the window, you can use mouse and scroll to change the viewport
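For intuition on what the viewer does: every pixel of the aligned equirectangular panorama corresponds to a viewing direction on the unit sphere, and the planes implied by cor_id turn those directions into 3D points. A minimal sketch of the pixel-to-direction mapping only; the axis convention and the 1024x512 panorama size are assumptions for illustration, not necessarily what visual_3d_layout.py uses:

    import numpy as np

    def equirect_to_directions(xs, ys, width=1024, height=512):
        """Map equirectangular pixel coords to unit direction vectors.

        Assumed convention: x spans longitude [-pi, pi), y spans latitude
        from +pi/2 (top) to -pi/2 (bottom); z is up.
        """
        lon = (xs / width - 0.5) * 2 * np.pi
        lat = (0.5 - ys / height) * np.pi
        return np.stack([np.cos(lat) * np.sin(lon),
                         np.cos(lat) * np.cos(lon),
                         np.sin(lat)], axis=-1)

    # Example: viewing directions of the predicted corners.
    cor_id = np.loadtxt('assert/output/demo_aligned_rgb_cor_id.txt')
    print(equirect_to_directions(cor_id[:, 0], cor_id[:, 1]))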

Preparation for Training

/pytorch-layoutnet 
  /data
  | /origin
  |   /data  (download and extract from official)
  |   /gt    (download and extract from official)
  /ckpt
    /panofull_*_pretrained.t7  (download and extract from official)
  • Execute python torch2pytorch_data.py to convert data/origin/**/* to data/train, data/valid and data/test for the PyTorch data loader. Under these folders, img/ contains all the raw RGB .png while line/, edge/, cor/ contain the preprocessed Manhattan line segments, ground truth boundaries and ground truth corners respectively (see the loading sketch after this list).
  • [optional] Use torch2pytorch_pretrained_weight.py to convert the official pretrained pano model to encoder, edg_decoder, cor_decoder PyTorch state_dicts (see python torch2pytorch_pretrained_weight.py -h for more details). Examples:
    • to convert layout pretrained only
      python torch2pytorch_pretrained_weight.py --torch_pretrained ckpt/panofull_joint_box_pretrained.t7 --encoder ckpt/pre_full_encoder.pth --edg_decoder ckpt/pre_full_edg_decoder.pth --cor_decoder ckpt/pre_full_cor_decoder.pth
      
    • to convert full pretrained (layout regressor branch will be ignored)
      python torch2pytorch_pretrained_weight.py --torch_pretrained ckpt/panofull_joint_box_pretrained.t7 --encoder ckpt/pre_full_encoder.pth --edg_decoder ckpt/pre_full_edg_decoder.pth --cor_decoder ckpt/pre_full_cor_decoder.pth
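The loading sketch referenced above: after torch2pytorch_data.py has run, each split folder (data/train, data/valid, data/test) holds parallel img/, line/, edge/, cor/ subfolders. The sketch below assumes the files in those subfolders share the same .png names; it only illustrates the folder layout and is not the repo's actual dataset class:

    import os
    import numpy as np
    from PIL import Image

    def load_sample(root, name):
        """Load one converted sample: RGB image, line map, boundary GT, corner GT.

        root: e.g. 'data/train'; name: a .png filename shared by all subfolders
        (assumed naming convention). Arrays are returned as float32 in [0, 1].
        """
        def read(sub):
            path = os.path.join(root, sub, name)
            return np.asarray(Image.open(path), dtype=np.float32) / 255.0
        return read('img'), read('line'), read('edge'), read('cor')

    # Example: walk the training split.
    root = 'data/train'
    for name in sorted(os.listdir(os.path.join(root, 'img'))):
        rgb, line, edge, cor = load_sample(root, name)
        # rgb (+ line) is the network input; edge/cor are the boundary/corner targets.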
      

Training

See python train.py -h for detailed arguments explanation.
The default training strategy is the same as the official one. To launch an experiment with the official "corner+boundary" setting (--id identifies the experiment and can be any name you choose):

python train.py --id exp_default

To train only using RGB channels as input (no Manhattan line segment):

python train.py --id exp_rgb --input_cat img --input_channels 3

Gradient Ascent Post Optimization

Instead of the official 3D layout optimization with its sampling strategy, this repo implements a gradient ascent algorithm that optimizes an objective similar to the official one.
The process is outlined below:

  1. Greedily extract the cuboid parameters from the corner/edge probability maps
    • The cuboid consists of 6 parameters (cx, cy, dx, dy, theta, h)
    • (figures: corner probability map, edge probability map)
  2. Sample points along the cuboid boundary and project them onto the equirectangular corner/edge probability maps
    • The projected sample points are visualized as green dots
  3. For each projected sample point, get a value by bilinear interpolation from the 4 nearest neighboring pixels of the corner/edge probability map
  4. Reduce all the sampled values to a single scalar called the score
  5. Compute the gradient of the score with respect to the 6 cuboid parameters to maximize the score
  6. Iteratively apply gradient ascent (steps 2 through 6)

It takes less than 2 seconds on CPU and yields slightly better results than officially reported.
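A minimal sketch of the core loop (steps 2 through 6), using PyTorch autograd for the gradient. The cuboid-to-panorama projection below is a differentiable placeholder only, since the real projection lives in this repo's geometry code; the probability map and initial parameters are likewise toy values:

    import math
    import torch
    import torch.nn.functional as F

    def sample_score(prob_map, pts_xy):
        """Bilinearly sample an (H, W) probability map at float pixel coords
        and reduce to a single scalar score (steps 3 and 4)."""
        h, w = prob_map.shape
        grid = torch.stack([pts_xy[:, 0] / (w - 1) * 2 - 1,    # x -> [-1, 1]
                            pts_xy[:, 1] / (h - 1) * 2 - 1],   # y -> [-1, 1]
                           dim=-1).view(1, 1, -1, 2)
        vals = F.grid_sample(prob_map.view(1, 1, h, w), grid,
                             mode='bilinear', align_corners=True)
        return vals.sum()

    def project_cuboid_boundary(params, n_pts=256):
        """Placeholder for step 2: map the cuboid parameters to boundary sample
        points in equirectangular pixel coordinates. Only (cx, cy, dx, dy, theta)
        are used here; the real projection also uses the wall height h."""
        cx, cy, dx, dy, theta, h = params
        t = torch.linspace(0, 1, n_pts)
        x = cx + dx * torch.cos(2 * math.pi * t + theta)
        y = cy + dy * torch.sin(2 * math.pi * t + theta)
        return torch.stack([x, y], dim=-1)

    # Toy probability map and initial cuboid parameters (illustrative only).
    edge_prob = torch.rand(512, 1024)
    params = torch.tensor([512.0, 256.0, 100.0, 60.0, 0.0, 1.6], requires_grad=True)
    opt = torch.optim.SGD([params], lr=0.1)

    for step in range(100):
        pts = project_cuboid_boundary(params)      # step 2
        score = sample_score(edge_prob, pts)       # steps 3-4
        opt.zero_grad()
        (-score).backward()                        # step 5: ascent on the score
        opt.step()                                 # step 6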

Quantitative Evaluation

See python eval.py -h for a more detailed arguments explanation. To get the results from my trained network (linked above):

python eval.py --path_prefix ckpt/epoch_30 --flip --rotate 0.333 0.666

To evaluate with gradient ascent post optimization:

python eval.py --path_prefix ckpt/epoch_30 --flip --rotate 0.333 0.666 --post_optimization
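For reference on the corner error column below: it is commonly computed as the L2 distance between each predicted corner and its ground-truth counterpart, normalized by the image diagonal and averaged over corners. A minimal sketch under that assumption (corner matching/ordering is taken as given):

    import numpy as np

    def corner_error(pred, gt, width=1024, height=512):
        """Mean L2 distance between matched predicted/GT corners, normalized by
        the image diagonal and reported as a percentage. pred and gt are (N, 2)
        arrays of (x, y) pixel coords in the same corner order (assumed)."""
        diag = np.hypot(width, height)
        return np.linalg.norm(pred - gt, axis=1).mean() / diag * 100

    # Toy example with two corners (values are illustrative only).
    pred = np.array([[104.9, 186.6], [104.9, 337.2]])
    gt = np.array([[106.0, 185.0], [106.0, 339.0]])
    print(f'corner error: {corner_error(pred, gt):.2f}%')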

Dataset - PanoContext

exp                          3D IoU(%)   Corner error(%)   Pixel error(%)
Official best                75.12       1.02              3.18
ours (rgb only)              71.42       1.30              3.83
ours (rgb only, w/ gd opt)   72.52       1.50              3.66
ours                         75.11       1.04              3.16
ours (w/ gd opt)             76.90       0.93              2.81

Dataset - Stanford 2D-3D

exp                          3D IoU(%)   Corner error(%)   Pixel error(%)
Official best                77.51       0.92              2.42
ours (rgb only)              70.39       1.50              4.28
ours (rgb only, w/ gd opt)   71.90       1.35              4.25
ours                         75.49       0.96              3.07
ours (w/ gd opt)             78.90       0.88              2.78

References