## News
**November 21, 2023**
- We are releasing Stable Video Diffusion, an image-to-video model, for research purposes:
  - [SVD](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid): This model was trained to generate 14 frames at resolution 576x1024, given a context frame of the same size. We use the standard image encoder from SD 2.1, but replace the decoder with a temporally-aware `deflickering decoder`.
  - [SVD-XT](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt): Same architecture as `SVD`, but finetuned for 25-frame generation.
- We provide a streamlit demo `scripts/demo/video_sampling.py` and a standalone python script `scripts/sampling/simple_video_sample.py` for inference of both models.
- Alongside the model, we release a [technical report](https://stability.ai/research/stable-video-diffusion-scaling-latent-video-diffusion-models-to-large-datasets).

**July 26, 2023**
- We are releasing two new open models with a permissive [`CreativeML Open RAIL++-M` license](model_licenses/LICENSE-SDXL1.0) (see [Inference](#inference) for file hashes):
  - [SDXL-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0): An improved version over `SDXL-base-0.9`.
  - [SDXL-refiner-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0): An improved version over `SDXL-refiner-0.9`.

**July 4, 2023**
- A technical report on SDXL is now available [here](https://arxiv.org/abs/2307.01952).

**June 22, 2023**
- We are releasing two new diffusion models for research purposes:
  - `SDXL-base-0.9`: The base model was trained on a variety of aspect ratios on images with resolution 1024^2. The base model uses [OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip) and [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main) for text encoding, whereas the refiner model only uses the OpenCLIP model.
  - `SDXL-refiner-0.9`: The refiner has been trained to denoise small noise levels of high-quality data and, as such, is not expected to work as a text-to-image model; instead, it should only be used as an image-to-image model.

If you would like to access these models for your research, please apply using one of the following links:
[SDXL-0.9-Base model](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9), and [SDXL-0.9-Refiner](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9).
This means that you can apply via either of the two links, and if you are granted access, you can access both models.
Please log in to your Hugging Face Account with your organization email to request access.
**We plan to do a full release soon (July).**
### General Philosophy
Modularity is king. This repo implements a config-driven approach where we build and combine submodules by calling `instantiate_from_config()` on objects defined in YAML configs. See `configs/` for many examples.
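As an illustration, the pattern looks roughly like this in Python; this is only a minimal sketch, the import path `sgm.util.instantiate_from_config` follows the old `ldm` convention, and the config below is a toy example rather than one of the shipped configs:

```python
from omegaconf import OmegaConf
from sgm.util import instantiate_from_config  # location assumed, mirroring the old `ldm` codebase

# Toy config: any importable class can be built from a `target`/`params` pair,
# in the same spirit as the entries found in the YAML files under configs/.
config = OmegaConf.create(
    {
        "target": "torch.nn.Linear",
        "params": {"in_features": 16, "out_features": 4},
    }
)

module = instantiate_from_config(config)  # equivalent to torch.nn.Linear(in_features=16, out_features=4)
print(module)
```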
### Changelog from the old `ldm` codebase
For training, we use [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/), but it should be easy to use other training wrappers around the base modules. The core diffusion model class (formerly `LatentDiffusion`, now `DiffusionEngine`) has been cleaned up:
- No more extensive subclassing! We now handle all types of conditioning inputs (vectors, sequences and spatial conditionings, and all combinations thereof) in a single class: `GeneralConditioner`; see `sgm/modules/encoders/modules.py`.
- We separate guiders (such as classifier-free guidance, see `sgm/modules/diffusionmodules/guiders.py`) from the samplers (`sgm/modules/diffusionmodules/sampling.py`), and the samplers are independent of the model.
- We adopt the ["denoiser framework"](https://arxiv.org/abs/2206.00364) for both training and inference (the most notable change is probably the option to train continuous-time models):
  * Discrete-time models (denoisers) are simply a special case of continuous-time models (denoisers); see `sgm/modules/diffusionmodules/denoiser.py`.
  * The following features are now independent: weighting of the diffusion loss function (`sgm/modules/diffusionmodules/denoiser_weighting.py`), preconditioning of the network (`sgm/modules/diffusionmodules/denoiser_scaling.py`), and sampling of noise levels during training (`sgm/modules/diffusionmodules/sigma_sampling.py`). A short sketch of this decomposition follows the list below.
47
79
- Autoencoding models have also been cleaned up.
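To make the denoiser decomposition above concrete, here is a minimal EDM-style preconditioning sketch in plain PyTorch. It is purely illustrative and does not mirror the actual `sgm` module interfaces; the function names are hypothetical.

```python
import torch

def edm_scalings(sigma: torch.Tensor, sigma_data: float = 0.5):
    """EDM-style preconditioning coefficients (Karras et al., 2022).

    `sigma` should be broadcastable against the noisy input, e.g. shape [B, 1, 1, 1].
    """
    c_skip = sigma_data**2 / (sigma**2 + sigma_data**2)
    c_out = sigma * sigma_data / (sigma**2 + sigma_data**2).sqrt()
    c_in = 1.0 / (sigma**2 + sigma_data**2).sqrt()
    c_noise = sigma.log() / 4.0
    return c_skip, c_out, c_in, c_noise

def denoise(network, x_noisy: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """Turn a raw network F(x, sigma) into a denoiser D(x, sigma) via preconditioning.

    Loss weighting and the distribution that `sigma` is drawn from during training
    are handled separately, which is exactly the independence described above.
    """
    c_skip, c_out, c_in, c_noise = edm_scalings(sigma)
    return c_skip * x_noisy + c_out * network(c_in * x_noisy, c_noise)
```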
## Installation:
<a name="installation"></a>

#### 1. Clone the repo
This is assuming you have navigated to the `generative-models` root after cloning it.

**NOTE:** This is tested under `python3.10`. For other Python versions, you might encounter version conflicts.

**PyTorch 2.0**
```shell
# install required packages from pypi
python3 -m venv .pt2
source .pt2/bin/activate
pip3 install -r requirements/pt2.txt
```
#### 3. Install `sgm`
## Inference
We provide a [streamlit](https://streamlit.io/) demo for text-to-image and image-to-image sampling in `scripts/demo/sampling.py`. We provide file hashes for the complete file as well as for only the saved tensors in the file (see [Model Spec](https://github.com/Stability-AI/ModelSpec) for a script to evaluate that).
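For the full-file hash, a quick check with Python's standard library looks like the sketch below; hashing only the saved tensors requires the Model Spec tooling linked above, and the checkpoint path shown is illustrative:

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a complete file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Illustrative path; point this at the checkpoint you downloaded.
print(file_sha256("checkpoints/sd_xl_base_1.0.safetensors"))
```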
The weights of SDXL-0.9 are available and subject to a [research license](model_licenses/LICENSE-SDXL0.9).
If you would like to access these models for your research, please apply using one of the following links:
[SDXL-base-0.9 model](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9), and [SDXL-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9).

**NOTE 1:** Using the non-toy-dataset configs `configs/example_training/imagenet-f8_cond.yaml`, `configs/example_training/txt2img-clipl.yaml` and `configs/example_training/txt2img-clipl-legacy-ucg-training.yaml` for training will require edits depending on the dataset used (which is expected to be stored in tar files in the [webdataset format](https://github.com/webdataset/webdataset)). To find the parts which have to be adapted, search for comments containing `USER:` in the respective config.
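For orientation, loading such a tar-based dataset with the `webdataset` library typically looks something like the sketch below; the shard pattern and sample keys are illustrative and not the exact ones these configs expect:

```python
import webdataset as wds  # pip install webdataset

# Illustrative only: shard pattern and sample keys depend on how your dataset
# was packed; the example_training configs mark the places to adapt with `USER:`.
dataset = (
    wds.WebDataset("data/train-{000000..000099}.tar")
    .decode("rgb")            # decode image bytes to float arrays in [0, 1]
    .to_tuple("jpg", "txt")   # yield (image, caption) pairs
)

for image, caption in dataset:
    print(image.shape, caption)
    break
```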

**NOTE 2:** This repository supports both `pytorch1.13` and `pytorch2` for training generative models. However, for autoencoder training, as e.g. in `configs/example_training/autoencoder/kl-f4/imagenet-attnfree-logvar.yaml`, only `pytorch1.13` is supported.

**NOTE 3:** Training latent generative models (as e.g. in `configs/example_training/imagenet-f8_cond.yaml`) requires retrieving the checkpoint from [Hugging Face](https://huggingface.co/stabilityai/sdxl-vae/tree/main) and replacing the `CKPT_PATH` placeholder in [this line](configs/example_training/imagenet-f8_cond.yaml#81). The same is to be done for the provided text-to-image configs.

The `GeneralConditioner` is configured through the `conditioner_config`. Its only attribute is `emb_models`, a list of different embedders (all inherited from `AbstractEmbModel`) that are used to condition the generative model. All embedders should define whether or not they are trainable (`is_trainable`, default `False`), the classifier-free guidance dropout rate to use (`ucg_rate`, default `0`), and an input key (`input_key`), for example, `txt` for text-conditioning or `cls` for class-conditioning. When computing conditionings, the embedder will get `batch[input_key]` as input. We currently support two- to four-dimensional conditionings, and conditionings of different embedders are concatenated appropriately.
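As an illustration, a `conditioner_config` sketched in Python via OmegaConf might look like this; the embedder class path and parameters are assumptions, so check the configs under `configs/` for the exact settings used by each model:

```python
from omegaconf import OmegaConf

# Illustrative conditioner_config: one frozen text embedder reading batch["txt"].
# Class paths and params are assumptions; see configs/ for the real settings.
conditioner_config = OmegaConf.create(
    {
        "target": "sgm.modules.GeneralConditioner",
        "params": {
            "emb_models": [
                {
                    "is_trainable": False,  # keep the embedder frozen
                    "input_key": "txt",     # embedder receives batch["txt"]
                    "ucg_rate": 0.1,        # classifier-free guidance dropout
                    "target": "sgm.modules.encoders.modules.FrozenCLIPEmbedder",
                    "params": {"layer": "hidden", "layer_idx": 11},
                }
            ]
        },
    }
)
```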
#### Loss
The loss is configured through `loss_config`. For standard diffusion model training, you will have to set `sigma_sampler_config`.
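For orientation, a sketched `loss_config` with an EDM-style sigma sampler could look like the following; the class paths and parameter values are illustrative, so consult the example configs for the exact settings:

```python
from omegaconf import OmegaConf

# Illustrative loss_config for standard diffusion training; sigma_sampler_config
# controls how noise levels are drawn during training. Paths/params are assumptions.
loss_config = OmegaConf.create(
    {
        "target": "sgm.modules.diffusionmodules.loss.StandardDiffusionLoss",
        "params": {
            "sigma_sampler_config": {
                "target": "sgm.modules.diffusionmodules.sigma_sampling.EDMSampling",
                "params": {"p_mean": -1.2, "p_std": 1.2},
            }
        },
    }
)
```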
#### Sampler config
### Dataset Handling
For large-scale training, we recommend using the data pipelines from our [data pipelines](https://github.com/Stability-AI/datapipelines) project. The project is contained in the requirements and automatically included when following the steps from the [Installation section](#installation).

Small map-style datasets should be defined here in the repository (e.g., MNIST, CIFAR-10, ...), and return a dict of data keys/values.
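A minimal map-style dataset returning such a dict might look like the sketch below; the keys shown (`jpg`, `txt`) are illustrative and should match the `input_key`s of your embedders:

```python
import torch
from torch.utils.data import Dataset

# Minimal illustrative map-style dataset: returns a dict whose keys ("jpg", "txt")
# are meant to line up with the input_key of the conditioners; adapt as needed.
class ToyImageTextDataset(Dataset):
    def __init__(self, num_samples: int = 100, resolution: int = 64):
        self.num_samples = num_samples
        self.resolution = resolution

    def __len__(self) -> int:
        return self.num_samples

    def __getitem__(self, idx: int) -> dict:
        image = torch.rand(3, self.resolution, self.resolution)  # placeholder image
        return {"jpg": image, "txt": "a placeholder caption"}
```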