## News

**July 24, 2024**
- We are releasing **[Stable Video 4D (SV4D)](https://huggingface.co/stabilityai/sv4d)**, a video-to-4D diffusion model for novel-view video synthesis, for research purposes:
  - **SV4D** was trained to generate 40 frames (5 video frames x 8 camera views) at 576x576 resolution, given 5 context frames (the input video) and 8 reference views (synthesised from the first frame of the input video, using a multi-view diffusion model like SV3D) of the same size, ideally white-background images with one object.
  - To generate longer novel-view videos (21 frames), we propose a novel sampling method using SV4D: we first sample 5 anchor frames and then densely sample the remaining frames while maintaining temporal consistency.
  - Please check our [project page](), [tech report](), and [video summary]() for more details.

**QUICKSTART** : `python scripts/sampling/simple_video_sample_4d.py --input_path assets/test_video1.mp4 --output_folder outputs/sv4d` (after downloading [SV4D](https://huggingface.co/stabilityai/sv4d) and [SV3D_u](https://huggingface.co/stabilityai/sv3d) from HuggingFace)
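
One way to fetch these two checkpoints into `checkpoints/` is sketched below, assuming the `huggingface_hub` CLI is installed and that you have accepted the model licenses on the Hugging Face model pages; downloading the files manually from the links above works just as well.

```bash
# Install the Hugging Face Hub CLI if needed.
pip install -U "huggingface_hub[cli]"

# Download the SV4D and SV3D_u checkpoints into checkpoints/.
mkdir -p checkpoints
huggingface-cli download stabilityai/sv4d sv4d.safetensors --local-dir checkpoints
huggingface-cli download stabilityai/sv3d sv3d_u.safetensors --local-dir checkpoints

# Then run the quickstart command.
python scripts/sampling/simple_video_sample_4d.py --input_path assets/test_video1.mp4 --output_folder outputs/sv4d
```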
To run **SV4D** on a single input video of 21 frames (see the combined example after this list):
- Download the SV3D models (`sv3d_u.safetensors` and `sv3d_p.safetensors`) from [here](https://huggingface.co/stabilityai/sv3d) and the SV4D model (`sv4d.safetensors`) from [here](https://huggingface.co/stabilityai/sv4d) to `checkpoints/`
- Run `python scripts/sampling/simple_video_sample_4d.py --input_path <path/to/video>`
  - `input_path` : The input video `<path/to/video>` can be
    - a single video file in `gif` or `mp4` format, such as `assets/test_video1.mp4`, or
    - a folder containing images of video frames in `.jpg`, `.jpeg`, or `.png` format, or
    - a file name pattern matching images of video frames.
  - `num_steps` : default is 20; increase to 50 for better quality at the cost of longer sampling time.
  - `sv3d_version` : To specify the SV3D model used to generate the reference multi-views, set `--sv3d_version=sv3d_u` for SV3D_u or `--sv3d_version=sv3d_p` for SV3D_p.
  - `elevations_deg` : To generate novel-view videos at a specified elevation (default elevation is 10) using SV3D_p (the default is SV3D_u), run `python scripts/sampling/simple_video_sample_4d.py --input_path test_video1.mp4 --sv3d_version sv3d_p --elevations_deg 30.0`
  - **Background removal** : For input videos with a plain background, (optionally) use [rembg](https://github.com/danielgatis/rembg) to remove the background and crop the video frames by setting `--remove_bg=True`. To obtain higher-quality outputs on real-world input videos (with noisy backgrounds), try segmenting the foreground object using [Clipdrop](https://clipdrop.co/) before running SV4D.
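
Putting the options above together, an invocation might look like the following sketch; the flag spellings follow the option names listed above, and the values are only examples.

```bash
# Default settings: SV3D_u reference views, 20 sampling steps.
python scripts/sampling/simple_video_sample_4d.py --input_path assets/test_video1.mp4

# More sampling steps for higher quality, SV3D_p reference views at 30 degrees
# elevation, and background removal for a plain-background input video.
python scripts/sampling/simple_video_sample_4d.py \
    --input_path assets/test_video1.mp4 \
    --num_steps 50 \
    --sv3d_version sv3d_p \
    --elevations_deg 30.0 \
    --remove_bg=True
```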
![tile]()

**March 18, 2024**

- We are releasing **[SV3D](https://huggingface.co/stabilityai/sv3d)**, an image-to-video model for novel multi-view synthesis, for research purposes:
  - **SV3D** was trained to generate 21 frames at resolution 576x576, given 1 context frame of the same size, ideally a white-background image with one object.