AI60201 Module3
Adway Mitra
19 October 2022
Adway Mitra (Indian Institute of Technology Kharagpur) Centre of Excellence in AI 19 October 2022 1 / 61
Contents
1 Background
5 Normalizing Flows
Background
Image Generation
Deep Autoregressive Models
Autoregressive Models
PixelRNN
PixelCNN
Variational Autoencoder (VAE)
Variational Autoencoder (VAE) Training a VAE
Objective Function
$(\theta^*, \phi^*) = \arg\max_{\theta,\phi} \prod_{i=1}^{N} p(X_i \mid g_\theta(Z_i))$, where $Z_i \sim \mathcal{N}(h_\phi(X_i))$
The neural network parameters θ, ϕ may be estimated by backpropagation after
defining a suitable loss function
$\ell(\theta, \phi) = \sum_{i=1}^{N} \|X_i - g_\theta(Z_i)\| + \sum_{i=1}^{N} \mathrm{KL}\left(\mathcal{N}(h_\phi(X_i)) \,\|\, \mathcal{N}(0, 1)\right)$
Minimizing this loss is equivalent to maximizing the evidence lower bound (ELBO)
Backpropagation cannot be applied directly because the forward pass samples $Z_i$ as an
intermediate step
Solution: Decouple the sampling from the backpropagation
Reparameterization Trick: $\epsilon_i \sim \mathcal{N}(0, 1)$ sampled independently,
$Z_i = \mu_\phi(X_i) + \epsilon_i \sigma_\phi(X_i)$
Now backpropagation can be applied to estimate θ, ϕ
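The trick and the loss above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the encoder outputs `mu` and `sigma` are supplied directly as arrays, the KL term uses the closed form for a diagonal Gaussian against N(0, I), and the function names are hypothetical. NumPy does not compute gradients; in an automatic-differentiation framework the same expression lets gradients reach `mu` and `sigma` because `eps` does not depend on the parameters.

```python
import numpy as np

def reparameterize(mu, sigma, rng):
    """Draw Z = mu + eps * sigma with eps ~ N(0, 1) sampled independently."""
    eps = rng.standard_normal(mu.shape)
    return mu + eps * sigma

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), one value per datapoint."""
    return 0.5 * np.sum(mu**2 + sigma**2 - 1.0 - 2.0 * np.log(sigma), axis=-1)

def vae_loss(x, x_recon, mu, sigma):
    """Reconstruction norm plus KL term, both summed over the batch."""
    recon = np.sum(np.linalg.norm(x - x_recon, axis=-1))
    return recon + np.sum(kl_to_standard_normal(mu, sigma))

# Toy usage: two datapoints, three latent dimensions (assumed sizes).
rng = np.random.default_rng(0)
mu = np.zeros((2, 3))
sigma = np.ones((2, 3))
z = reparameterize(mu, sigma, rng)

x = np.ones((2, 4))
loss = vae_loss(x, x, mu, sigma)  # perfect reconstruction, posterior equals prior
```

With a perfect reconstruction and an encoder output equal to the prior, both terms vanish, which is a quick sanity check on the formulas.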
Supervised VAE
The datapoints used for training may be accompanied by class labels $(X_i, Y_i)$
The generator/decoder network should be sensitive to the class label, i.e.
$g_\theta(Y_i, Z_i)$
Similarly, the inference/encoder network should be able to predict the probability
distribution $q(Y_i, Z_i \mid X_i)$
The training process will remain unchanged
The loss function will additionally include a cross-entropy term for $Y_i$ (the label predicted by
the encoder versus the actual label)
To generate a new image, the user specifies $Y_i$, and the image is
generated accordingly
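A minimal sketch of the extra loss term, assuming the encoder emits a vector of class probabilities for each datapoint. The function names and the `weight` factor are hypothetical and not part of the slides; the reconstruction and KL terms are passed in as already-computed numbers.

```python
import numpy as np

def cross_entropy(class_probs, labels):
    """Sum over the batch of -log(probability assigned to the true label)."""
    n = labels.shape[0]
    picked = class_probs[np.arange(n), labels]
    return -np.sum(np.log(picked))

def supervised_vae_loss(recon_term, kl_term, class_probs, labels, weight=1.0):
    """Unsupervised VAE loss plus a weighted cross-entropy term for the labels."""
    return recon_term + kl_term + weight * cross_entropy(class_probs, labels)

# Toy usage: two datapoints, two classes.
class_probs = np.array([[0.5, 0.5], [0.25, 0.75]])  # encoder's predicted q(Y | X)
labels = np.array([0, 1])                           # actual labels Y
ce = cross_entropy(class_probs, labels)
total = supervised_vae_loss(2.0, 1.0, class_probs, labels)
```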
Interpretation
Deep Generative Models
Inference in DBM
We may be interested in finding the meaning of every latent variable (i.e. which
attribute it represents)
Approach: carry out inference of all latent variables to compute the posteriors
$p(Z^l_{ij} = 1 \mid X)$
These cannot be computed directly, so we can use Gibbs sampling
According to D-separation rules, each variable in layer l is conditionally independent of all
other variables given those in the neighboring layers
For simplicity, assume all variables are binary
$p(Z^l_i = 1 \mid -Z^l_i) = \sigma\left(-(Z^{l-1})^T W^l_i - W^{l+1}_i Z^{l+1}\right)$, where $\sigma$ denotes the
sigmoid function and $-Z^l_i$ denotes all variables other than $Z^l_i$
In each iteration we sample every variable in turn, keeping the rest constant
We repeat this for many iterations, collecting samples of each variable
regularly. The posteriors are estimated from these samples
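The procedure above can be sketched for a small model with one observed layer and two binary latent layers. This is an illustration under assumptions, not the lecture's implementation: the layer sizes, the random weights, the burn-in length and the function name `gibbs_posteriors` are all hypothetical, and the conditional uses the sign convention of the formula above.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gibbs_posteriors(x, W1, W2, n_iter=2000, burn_in=500, seed=0):
    """Estimate p(Z = 1 | X) for two binary latent layers by Gibbs sampling.

    W1 has shape (len(x), n1) and links x to layer 1;
    W2 has shape (n1, n2) and links layer 1 to layer 2.
    """
    rng = np.random.default_rng(seed)
    n1, n2 = W2.shape
    z1 = rng.integers(0, 2, size=n1).astype(float)
    z2 = rng.integers(0, 2, size=n2).astype(float)
    sum1, sum2, kept = np.zeros(n1), np.zeros(n2), 0
    for it in range(n_iter):
        # Layer 1: each unit depends only on the layer below (x) and above (z2).
        for i in range(n1):
            p = sigmoid(-(x @ W1[:, i]) - W2[i, :] @ z2)
            z1[i] = float(rng.random() < p)
        # Layer 2 is the top layer, so only the layer below contributes.
        for i in range(n2):
            p = sigmoid(-(z1 @ W2[:, i]))
            z2[i] = float(rng.random() < p)
        # Collect samples after an assumed burn-in period.
        if it >= burn_in:
            sum1 += z1
            sum2 += z2
            kept += 1
    return sum1 / kept, sum2 / kept

# Toy usage with assumed sizes: 4 observed units, 3 and 2 latent units.
rng = np.random.default_rng(1)
x = np.array([1.0, 0.0, 1.0, 1.0])
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 2))
post1, post2 = gibbs_posteriors(x, W1, W2)
```

Each entry of `post1` and `post2` is the fraction of collected samples in which that latent variable equals 1, which serves as the posterior estimate.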
Normalizing Flows
Inversion Model
Flow of Transformations
Denoising Diffusion Models
Introduction
We start with any valid sample $x_0$, e.g. an image from a reference dataset
We add some random noise to it, e.g. IID Gaussian noise, to get a noisy
version $x_1$ of $x_0$, i.e. $x_1 = x_0 + e_0$ where $e_0 \sim q(x_0)$
It is possible to recover $x_0$ from $x_1$ by adding some “counter-noise”, i.e.
$x_0 = x_1 + f_1$ where $f_1 \sim p(x_1)$
It may be possible to analytically calculate $p(x_1)$ from $q(x_0)$
But if we keep on adding noise to the samples, to get $\{x_0, x_1, x_2, \ldots, x_T\}$,
can we recover $x_0$ from $x_T$?
We need to predict the noise at each step!
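The repeated noising described above can be sketched as below. The sketch assumes plain additive IID Gaussian noise with a fixed standard deviation `noise_std`; that value, the image size and the function name are illustrative choices, not taken from the slides.

```python
import numpy as np

def forward_noising(x0, T, noise_std=0.1, seed=0):
    """Return [x_0, x_1, ..., x_T] with x_{t+1} = x_t + e_t, e_t ~ N(0, noise_std^2 I)."""
    rng = np.random.default_rng(seed)
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(T):
        e_t = noise_std * rng.standard_normal(xs[-1].shape)
        xs.append(xs[-1] + e_t)
    return xs

# Toy usage: an all-zero 8 x 8 "image" noised for 50 steps.
x0 = np.zeros((8, 8))
trajectory = forward_noising(x0, T=50)
```

Keeping the whole trajectory makes the question on the slide concrete: recovering `trajectory[0]` from `trajectory[-1]` requires undoing every `e_t`, which is why the noise must be predicted at each step.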
Denoising Diffusion
Is the reverse process possible, i.e. obtain $x_0$ from $x_T$, i.e. $x_0 \sim q(x_T)$?
If so, it opens up a generative model where we sample the code $Z = x_T$
from $\mathcal{N}(0, I)$, and denoise it to get $x_0$, which is a valid sample following the
data distribution $q(x_0)$!
Possible only if we can find $q(x_{t-1} \mid x_t)$, which is generally intractable!
Solution: approximate it using $p_\theta(x_{t-1} \mid x_t) = \mathcal{N}(u_\theta(x_t, t), \sigma_t^2 I)$
$u_\theta$: an unknown function represented by a neural network with parameters $\theta$
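The generative procedure above amounts to a sampling loop. The sketch below assumes a fixed schedule of standard deviations `sigmas` and uses `placeholder_mean` as a hypothetical stand-in for the trained network u_θ; a real model would replace it with a learned function.

```python
import numpy as np

def placeholder_mean(x_t, t):
    """Hypothetical stand-in for the trained network u_theta(x_t, t)."""
    return 0.9 * x_t

def reverse_sample(shape, T, sigmas, u_theta=placeholder_mean, seed=0):
    """Sample x_T ~ N(0, I), then x_{t-1} ~ N(u_theta(x_t, t), sigma_t^2 I) for t = T, ..., 1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # the code Z = x_T
    for t in range(T, 0, -1):
        noise = rng.standard_normal(shape)
        x = u_theta(x, t) + sigmas[t - 1] * noise  # sigmas[t - 1] holds sigma_t
    return x

# Toy usage with an assumed constant schedule.
T = 10
sigmas = np.full(T, 0.05)
x0_sample = reverse_sample((4, 4), T, sigmas)
```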
Denoising Diffusion Models
The Denoising Diffusion model has clear parallels with VAE and especially
Normalizing Flows (NF)
All three approaches first generate noise, then progressively distort it towards
a specific distribution over the output space
In case of VAE, the distortion steps are not explicitly built into the model,
though each layer of the decoder network may be considered as a step.
In all three cases, the learning problem involves encoder p(z|x) in addition to
decoder q(x|z)
In case of NF, the encoder can be directly calculated because the decoder
steps are invertible. In VAE and Denoising Diffusion, a variational
approximation is needed
Unlike VAE, the initial noise in denoising diffusion models is of the
same size as the target space (e.g. an m × n image); the same holds for NF, whose invertible steps preserve dimension
Empirically, Denoising Diffusion is seen to generate the best-quality images
Generative Adversarial Networks
Adversarial Learning
2-sample Test
Training a GAN
Conditional GAN
Generative Adversarial Networks Variants of GAN
Mode Collapse
$p_{DATA}$ may have many modes, i.e. regions of high probability density in the
sample space
Mode collapse: a situation where $p_{GEN}$ contains only one or two of these
modes
Result: most generated samples are near-identical, or have only a small
number of categories
Reasons:
1. The generator distribution is less expressive than $p_{DATA}$
2. The alternating mini-max optimization is unable to find an equilibrium
Typically, the generator latches on to a few samples on which the
discriminator fails at a given iteration
The generator may keep switching between modes in different iterations
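One simple way to see mode collapse on toy data is to count how many known modes are hit by generated samples. The sketch below is an illustration only: the mode centers, the `radius` threshold and the two synthetic "generators" are assumptions made for the example, and real data rarely comes with known mode locations.

```python
import numpy as np

def covered_modes(samples, mode_centers, radius):
    """Return indices of modes that have at least one sample within `radius`."""
    dists = np.linalg.norm(samples[:, None, :] - mode_centers[None, :, :], axis=-1)
    return np.flatnonzero((dists < radius).any(axis=0))

# Four assumed modes of p_DATA on a 2-D grid.
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
rng = np.random.default_rng(0)

# A collapsed generator: every sample lies near the first mode.
collapsed = centers[0] + 0.1 * rng.standard_normal((200, 2))

# A diverse generator: samples spread over all four modes.
diverse = centers[rng.integers(0, 4, size=200)] + 0.1 * rng.standard_normal((200, 2))

collapsed_cover = covered_modes(collapsed, centers, radius=1.0)
diverse_cover = covered_modes(diverse, centers, radius=1.0)
```

A collapsed generator covers a single mode here, while the diverse one covers all four; tracking this count across training iterations would also expose the mode switching mentioned above.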
f-GAN
W-GAN
Evaluating Generative Models
Test Likelihood
Acknowledgement
Thank you!