Merge pull request #384 from Stability-AI/vikram/sv4d · CodersSampling/generative-models@31fe459 · GitHub

Commit 31fe459

Merge pull request Stability-AI#384 from Stability-AI/vikram/sv4d
Adds SV4D code
2 parents fbdc58c + abe9ed3 commit 31fe459

File tree

16 files changed: +3174 −23 lines changed

README.md

Lines changed: 24 additions & 0 deletions
@@ -4,6 +4,30 @@
## News

**July 24, 2024**
- We are releasing **[Stable Video 4D (SV4D)](https://huggingface.co/stabilityai/sv4d)**, a video-to-4D diffusion model for novel-view video synthesis. For research purposes:
  - **SV4D** was trained to generate 40 frames (5 video frames x 8 camera views) at 576x576 resolution, given 5 context frames (the input video), and 8 reference views (synthesised from the first frame of the input video, using a multi-view diffusion model like SV3D) of the same size, ideally white-background images with one object.
  - To generate longer novel-view videos (21 frames), we propose a novel sampling method using SV4D: we first sample 5 anchor frames and then densely sample the remaining frames while maintaining temporal consistency (see the sketch after this list).
  - Please check our [project page](), [tech report](), and [video summary]() for more details.
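
The anchor-then-dense schedule above can be pictured with a minimal sketch. This is illustrative only, not code from this repository: `sampler` stands in for the SV4D sampling call, and its `cond` keyword for passing the neighbouring anchor outputs is an assumption.

```python
import numpy as np

def sample_long_video(frames, sampler, n_anchors=5):
    """Two-pass sketch: sample anchor frames first, then the in-between frames."""
    n_total = len(frames)  # e.g. 21 input video frames
    # Pass 1: evenly spaced anchors (indices 0, 5, 10, 15, 20 for 21 frames);
    # each sampler call is assumed to return one set of 8 novel views per frame.
    anchor_idx = [int(i) for i in np.linspace(0, n_total - 1, n_anchors).round()]
    views = dict(zip(anchor_idx, sampler([frames[i] for i in anchor_idx])))
    # Pass 2: densely sample the frames between consecutive anchors,
    # conditioning on the anchor outputs for temporal consistency.
    for lo, hi in zip(anchor_idx[:-1], anchor_idx[1:]):
        idx = list(range(lo + 1, hi))
        dense = sampler([frames[i] for i in idx], cond=(views[lo], views[hi]))
        views.update(zip(idx, dense))
    return [views[i] for i in range(n_total)]  # 21 frames x 8 views each
```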
**QUICKSTART** : `python scripts/sampling/simple_video_sample_4d.py --input_path assets/test_video1.mp4 --output_folder outputs/sv4d` (after downloading [SV4D](https://huggingface.co/stabilityai/sv4d) and [SV3D_u](https://huggingface.co/stabilityai/sv3d) from HuggingFace)
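
If you would rather script the checkpoint downloads than fetch them by hand, here is a small sketch using `huggingface_hub`; this is our suggestion, not a step from this repository, and it assumes you have accepted the model licences on HuggingFace (run `huggingface-cli login` first if the repositories are gated).

```python
from huggingface_hub import hf_hub_download

# Fetch the checkpoints named in the steps below into checkpoints/.
for repo_id, filename in [
    ("stabilityai/sv4d", "sv4d.safetensors"),
    ("stabilityai/sv3d", "sv3d_u.safetensors"),
    ("stabilityai/sv3d", "sv3d_p.safetensors"),
]:
    hf_hub_download(repo_id=repo_id, filename=filename, local_dir="checkpoints")
```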
To run **SV4D** on a single input video of 21 frames:
- Download the SV3D models (`sv3d_u.safetensors` and `sv3d_p.safetensors`) from [here](https://huggingface.co/stabilityai/sv3d) and the SV4D model (`sv4d.safetensors`) from [here](https://huggingface.co/stabilityai/sv4d) to `checkpoints/`
- Run `python scripts/sampling/simple_video_sample_4d.py --input_path <path/to/video>`
- `input_path` : The input video `<path/to/video>` can be
  - a single video file in `gif` or `mp4` format, such as `assets/test_video1.mp4`, or
  - a folder containing images of video frames in `.jpg`, `.jpeg`, or `.png` format, or
  - a file name pattern matching images of video frames (see the loading sketch after this list).
- `num_steps` : default is 20; increase to 50 for better quality at the cost of longer sampling time.
- `sv3d_version` : To specify the SV3D model used to generate the reference multi-views, set `--sv3d_version=sv3d_u` for SV3D_u or `--sv3d_version=sv3d_p` for SV3D_p.
- `elevations_deg` : To generate novel-view videos at a specified elevation (default is 10 degrees) with SV3D_p (the default model is SV3D_u), run `python scripts/sampling/simple_video_sample_4d.py --input_path test_video1.mp4 --sv3d_version sv3d_p --elevations_deg 30.0`
- **Background removal** : For input videos with a plain background, optionally use [rembg](https://github.com/danielgatis/rembg) to remove the background and crop the video frames by setting `--remove_bg=True`. For higher-quality outputs on real-world input videos (with noisy backgrounds), try segmenting the foreground object using [Clipdrop](https://clipdrop.co/) before running SV4D.
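
The three accepted `--input_path` forms above can be pictured with a minimal loading sketch. This is an illustration, not the repository's actual loader; it assumes `imageio` (with its ffmpeg plugin) is installed.

```python
import glob
import os
import imageio

def load_input_frames(input_path):
    """Return a list of frame arrays for any of the three --input_path forms."""
    if os.path.isfile(input_path) and input_path.lower().endswith((".gif", ".mp4")):
        # Form 1: a single video file.
        return imageio.mimread(input_path, memtest=False)
    if os.path.isdir(input_path):
        # Form 2: a folder containing .jpg/.jpeg/.png video frames.
        paths = sorted(
            p for p in glob.glob(os.path.join(input_path, "*"))
            if p.lower().endswith((".jpg", ".jpeg", ".png"))
        )
    else:
        # Form 3: a file-name pattern, e.g. "frames/frame_*.png".
        paths = sorted(glob.glob(input_path))
    return [imageio.imread(p) for p in paths]
```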
![tile](assets/sv4d.gif)

**March 18, 2024**
- We are releasing **[SV3D](https://huggingface.co/stabilityai/sv3d)**, an image-to-video model for novel multi-view synthesis, for research purposes:
  - **SV3D** was trained to generate 21 frames at resolution 576x576, given 1 context frame of the same size, ideally a white-background image with one object.

assets/hiphop_parrot.mp4 (56.8 KB, binary file not shown)

assets/sv4d.gif (7.99 MB)

assets/test_video1.mp4 (92.4 KB, binary file not shown)

assets/test_video2.mp4 (25.2 KB, binary file not shown)

0 commit comments