Lecture 09 - Generative Models


Deep Learning (CS-878)

Fall – 2024
Muhammad Naseer Bajwa
Assistant Professor,
Department of Computing, SEECS
Co-Principal Investigator,
Deep Learning Lab, NCAI
NUST, Islamabad
naseer.bajwa@seecs.edu.pk
Overview of this week’s lectures

Generative Models

- Autoencoders

- Variational Autoencoders

- Generative Adversarial Networks

- Diffusion Models

High dimensional data can often be represented by a low dimensional code

- Assumption: the data lies near a manifold in the high dimensional space, i.e. in the directions orthogonal to this manifold there is not much variation in the data.

- Find the manifold.

- Project the data onto the manifold.

- The dimensionality is reduced without losing much information.

- If the manifold is linear, this operation may be performed using
  - PCA: efficient (see the sketch below)
  - Linear autoencoders: inefficient but well-generalisable
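As an illustration of the linear case, here is a minimal NumPy sketch. The toy data, dimensions, and all variable names are invented for the example; it finds a linear manifold with PCA (via the SVD) and projects the data onto it.

```python
import numpy as np

# Toy high-dimensional data: 500 samples in 20-D that actually lie near a 3-D linear manifold.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))                        # true low-dimensional coordinates
mixing = rng.normal(size=(3, 20))                         # linear embedding into 20-D
X = latent @ mixing + 0.01 * rng.normal(size=(500, 20))   # small off-manifold noise

# PCA by SVD: centre the data, take the top-k right singular vectors as the manifold basis.
k = 3
X_centred = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
code = X_centred @ Vt[:k].T                               # project onto the manifold (the low-D code)
X_reconstructed = code @ Vt[:k] + X.mean(axis=0)          # map back to the input space

print("reconstruction error:", np.mean((X - X_reconstructed) ** 2))
```

A linear autoencoder with a 3-unit bottleneck trained with mean squared error would recover essentially the same subspace, just less efficiently (by gradient descent rather than a single decomposition).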
Autoencoders are unsupervised algorithms for representation learning

- With non-linear layers, deep autoencoders can find a non-linear code, so curved manifolds in the input space can also be dealt with.

- Autoencoders are mirrored neural networks with an intentional bottleneck in the middle.

- The code vector is the compressed representation of the input.

- Deep autoencoders use a supervised learning algorithm to do unsupervised learning: the input itself serves as the training target.

- The encoder converts coordinates in the input space to coordinates on the manifold; the decoder performs the inverse mapping (a minimal implementation is sketched below).

[Diagram: Input Vector → Hidden Activations → Code → Hidden Activations → Output Vector]
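A minimal PyTorch sketch of a mirrored network with a bottleneck, trained by regressing the output onto its own input. The 784-dimensional input, layer sizes, and all names are illustrative placeholders, not taken from the lecture.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Mirrored encoder/decoder with an intentional bottleneck ('code') in the middle."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, code_dim),            # coordinates on the manifold
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),           # inverse mapping back to the input space
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

model = Autoencoder()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(64, 784)                  # stand-in batch (e.g. flattened images)
x_hat, code = model(x)
loss = loss_fn(x_hat, x)                 # the input itself is the training target
optimiser.zero_grad()
loss.backward()
optimiser.step()
```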
Deep Autoencoders Architecture

[Figure: example deep (convolutional) autoencoder architectures]

https://architecturewithexample.blogspot.com/2021/10/convolutional-autoencoder-architecture.html
https://starship-knowledge.com/tag/autoencoder-vs-restricted-boltzmann-machine
Autoencoders have multiple applications

- Compression
  - Save storage space
  - Compare data in latent space

- Generation
  - Interpolate/extrapolate from the latent space

- Denoising
  - Learn to focus on only the important information
Denoising Autoencoders denoise input

- During training, the input samples are deliberately contaminated with noise.

- The noisy sample is given to the DAE, which is asked to reconstruct the original signal.

- The model learns to tell the original signal apart from the noise and encodes only the original signal in the latent space (see the training-step sketch below).

Vincent, Pascal, et al. "Extracting and composing robust features with denoising autoencoders." Proceedings of the 25th International Conference on Machine Learning. 2008.
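Relative to a plain autoencoder, only the training step changes: corrupt the input, but compute the loss against the clean target. A hedged sketch; the `model` argument is assumed to be an autoencoder like the one sketched earlier that returns `(reconstruction, code)`, and the Gaussian noise level is an arbitrary choice.

```python
import torch

def denoising_step(model, optimiser, loss_fn, x_clean, noise_std=0.3):
    """One DAE training step: the model sees a corrupted input but must reproduce the clean one."""
    x_noisy = x_clean + noise_std * torch.randn_like(x_clean)  # deliberately contaminate the input
    x_hat, _ = model(x_noisy)                                  # encode/decode the noisy sample
    loss = loss_fn(x_hat, x_clean)                             # reconstruction target is the clean signal
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()
```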
Variational Autoencoders are purpose-built for generating new data

- DAEs are good for many tasks but not for generation.
  - Their latent space is not symmetrical around the origin.
  - Some classes are represented over small areas, others over large areas in the latent space.
  - The latent space is not continuous.

- Variational Autoencoders are designed specifically for generating data.

- Instead of mapping an input sample to a single point in the latent space, a VAE maps each sample to a multivariate probability distribution in the latent space.

- The loss of a VAE is regularised (an implementation sketch follows below):

  \mathcal{L}(\phi, \theta; x) = \|x - \hat{x}\|^2 + D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)

Kingma, Diederik P., and Max Welling. "Auto-Encoding Variational Bayes." arXiv preprint arXiv:1312.6114 (2013).
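A minimal PyTorch sketch of that loss, assuming a Gaussian encoder q_φ(z|x), a standard normal prior p(z) = N(0, I), and the reparameterisation trick; the KL term then has the usual closed form for Gaussians. Architecture sizes and names are illustrative placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """The encoder outputs a distribution (mean and log-variance) rather than a single point."""
    def __init__(self, input_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterisation trick
        return self.decoder(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="sum")                   # ||x - x_hat||^2
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())    # D_KL(q_phi(z|x) || N(0, I))
    return recon + kl

x = torch.rand(8, 784)                    # stand-in batch of inputs scaled to [0, 1]
x_hat, mu, logvar = VAE()(x)
print(vae_loss(x, x_hat, mu, logvar))
```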
What properties do we want to achieve from regularisation?

- Continuity: points close to each other in the latent space should be decoded to semantically similar data.

- Completeness: each point in the latent space should be decoded to a meaningful representation.

- Regularisation with a normal prior helps enforce an information gradient across the latent space.

Kingma, Diederik P., and Max Welling. "Auto-Encoding Variational Bayes." arXiv preprint arXiv:1312.6114 (2013).
Generative Adversarial Networks

- GANs are also unsupervised models; they consist of two sub-models.
  - Generator: synthesises data from random noise and gives it to the Discriminator.
  - Discriminator: tries to find out whether its input is fake (from the Generator) or sampled from the real distribution.

- The two sub-models compete with each other (hence "Adversarial") in a zero-sum game in which only one sub-model can win.

- Every time a sub-model loses, it updates its weights and starts another round.

- Training stops when fake samples are no longer distinguishable from real samples.

Goodfellow, Ian, et al. "Generative Adversarial Networks." Communications of the ACM 63.11 (2020): 139-144.
How to train a GAN?

- Let
  G: generator model, a differentiable function with parameters θ_g
  D: discriminator model, a differentiable function with parameters θ_d
  x: data
  p_data: distribution of the data
  p_g: the generator's distribution over x
  p_z(z): a prior on the input noise variables
  G(z; θ_g): output of the generator, a mapping from noise space to data space (a synthetic sample)
  D(x; θ_d): output of the discriminator, a scalar representing the probability that x came from the data rather than from p_g

- Train D to maximise the probability of assigning the correct label both to samples from p_data(x) and to samples from G.

- Simultaneously, train G to minimise log(1 − D(G(z))).

- The objective of training is (a training-loop sketch follows below):

  \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

https://www.youtube.com/watch?v=Gib_kiXgnvA
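A minimal PyTorch sketch of one step of this minimax game; the network sizes, learning rates, and stand-in data are placeholders. Note that the generator update below uses the common "non-saturating" form, maximising log D(G(z)) instead of minimising log(1 − D(G(z))), which Goodfellow et al. suggest for stronger gradients early in training.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(x_real):
    batch = x_real.size(0)

    # --- Discriminator: maximise log D(x) + log(1 - D(G(z))) ---
    z = torch.randn(batch, latent_dim)                       # z ~ p_z(z)
    x_fake = G(z).detach()                                   # freeze G for this update
    d_loss = bce(D(x_real), torch.ones(batch, 1)) + \
             bce(D(x_fake), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- Generator: non-saturating form, maximise log D(G(z)) ---
    z = torch.randn(batch, latent_dim)
    g_loss = bce(D(G(z)), torch.ones(batch, 1))              # pretend the fakes are real
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

print(train_step(torch.rand(32, data_dim) * 2 - 1))          # stand-in batch of real data in [-1, 1]
```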
GANs are distribution transformers

- GANs try to learn the distribution of the original data.

- But the original data distribution could be complex and hard to learn directly.

- Start with something simple: Gaussian noise.

- GANs learn a transformation that converts a random noise vector into the target data distribution.

[Diagram: noise distribution Z → z → G(z) → x̂ → learned data distribution X̂]

Goodfellow, Ian, et al. "Generative Adversarial Networks." Communications of the ACM 63.11 (2020): 139-144.
Conditional GANs control the generated output

- Conditional GANs allow paired translations between different types of data: a condition c is fed to both the generator G(z) and the discriminator D(x) alongside their usual inputs (sketched below).

[Figure: example paired translations with input/output pairs — label map to street scene, black-and-white to colour, day to night]

Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
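A minimal sketch of the conditioning mechanism, using a one-hot class label as the condition c for simplicity. In the image-to-image setting of Isola et al., c is itself an image and the networks are convolutional; everything here is an illustrative placeholder.

```python
import torch
import torch.nn as nn

latent_dim, cond_dim, data_dim = 64, 10, 784     # e.g. cond = one-hot class label

# Both the generator and the discriminator receive the condition c.
G = nn.Sequential(nn.Linear(latent_dim + cond_dim, 256), nn.ReLU(), nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim + cond_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())

def generate(z, c):
    return G(torch.cat([z, c], dim=1))           # G(z | c)

def discriminate(x, c):
    return D(torch.cat([x, c], dim=1))           # D(x | c): "is x real AND consistent with c?"

z = torch.randn(16, latent_dim)
c = torch.eye(cond_dim)[torch.randint(0, cond_dim, (16,))]   # random one-hot conditions
x_fake = generate(z, c)
print(discriminate(x_fake, c).shape)             # torch.Size([16, 1])
```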
Difference between VAEs and GANs

- In general, GANs produce better quality data than VAEs.

- Reason?
  - A VAE has to optimise two losses simultaneously (reconstruction and regularisation), which may interfere with each other.
  - GANs work in a trial-and-error fashion, and their output is judged purely empirically by the discriminator.

[Figure: example outputs generated by a VAE and by GANs]
Applications of GANs

- Video frame prediction

- Morphing Audio/Videos/Images

- Image enhancement

- Text to Image Generation

Diffusion Models break the transformation into small steps

- GANs learn to convert noise into synthetic data in one big transformation.

- Diffusion Models learn small, repeated transformations to convert noise into synthetic data.
  - Consider the gradual change in the input as a Markov chain (a forward-process sketch follows below):

    q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1})

[Diagram]
VAEs:      x → q_φ(z|x) → z → p_θ(x|z) → x̂
GANs:      z → G(z) → x̂ → D(x) → 0/1
Diffusion: x₀ → x₁ → x₂ → x₃ → … → z

Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising diffusion probabilistic models." Advances in Neural Information Processing Systems 33 (2020): 6840-6851.
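A sketch of the forward (noising) chain under a standard DDPM-style linear β schedule; the schedule values and the stand-in data are illustrative choices. Because every step adds Gaussian noise, q(x_t | x_0) also has a closed form, so any noise level can be reached in one jump.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # variance schedule beta_1 .. beta_T
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)        # alpha_bar_t = prod_{s<=t} alpha_s

def q_step(x_prev, t):
    """One forward step of the Markov chain: q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I)."""
    return torch.sqrt(1 - betas[t]) * x_prev + torch.sqrt(betas[t]) * torch.randn_like(x_prev)

def q_sample(x0, t):
    """Jump directly to step t using the closed form of q(x_t | x_0)."""
    noise = torch.randn_like(x0)
    a_bar = alpha_bars[t].view(-1, 1)            # works for a single t or a batch of per-sample t's
    return torch.sqrt(a_bar) * x0 + torch.sqrt(1 - a_bar) * noise, noise

x0 = torch.rand(4, 784) * 2 - 1                  # stand-in batch of clean data
x_mid, _ = q_sample(x0, torch.tensor([500]))     # partially noised
x_T, _ = q_sample(x0, torch.tensor([T - 1]))     # essentially pure noise
```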
Diffusion Models gradually add Gaussian noise and then remove it

- We want our model to learn fine-grained details, general outlines, and everything in between.

- Add different noise levels to the training data and train the model to denoise it (a training-step sketch follows below).

- What should the model do at each noise level?
  - Predict and remove the noise to make the image clearer.
  - Keep doing it until the image is denoised.
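A self-contained sketch of the resulting training step: sample a random noise level per example, noise the clean data to that level, and train a network to predict the noise that was added (the ε-prediction objective of Ho et al.). The tiny fully connected network stands in for the U-Net used in practice; all sizes and names are placeholders.

```python
import torch
import torch.nn as nn

T = 1000
alpha_bars = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)   # same schedule as above

class NoisePredictor(nn.Module):
    """Stand-in noise-prediction network; real DDPMs use a U-Net conditioned on the timestep."""
    def __init__(self, data_dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(data_dim + 1, 256), nn.ReLU(), nn.Linear(256, data_dim))

    def forward(self, x_t, t):
        t_feat = t.float().unsqueeze(1) / T                  # crude timestep conditioning
        return self.net(torch.cat([x_t, t_feat], dim=1))

model = NoisePredictor()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(x0):
    t = torch.randint(0, T, (x0.size(0),))                   # a different noise level per sample
    eps = torch.randn_like(x0)
    a_bar = alpha_bars[t].view(-1, 1)
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1 - a_bar) * eps   # noised input at level t
    loss = nn.functional.mse_loss(model(x_t, t), eps)            # predict the noise that was added
    optimiser.zero_grad(); loss.backward(); optimiser.step()
    return loss.item()

print(train_step(torch.rand(16, 784) * 2 - 1))
```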
A trained diffusion model can generate new data

- The model learns to take differently noised images and turn them back into the original image.

- Once trained, the model can take an arbitrary noise vector and turn it into a synthetic image (a sampling sketch follows below).

- Just like a trained sculptor can chisel a face out of a block of stone.

[Image: statue of Apollo]
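Once trained, generation is just the reverse chain run from pure noise. A sketch of DDPM-style ancestral sampling, reusing the schedule tensors (betas, alphas, alpha_bars, T) and a trained noise predictor from the earlier sketches; all of these names are illustrative placeholders.

```python
import torch

@torch.no_grad()
def sample(eps_model, shape=(1, 784)):
    """Reverse the diffusion: start from Gaussian noise and denoise step by step."""
    x = torch.randn(shape)                                       # arbitrary noise vector x_T
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t)
        eps_hat = eps_model(x, t_batch)                          # predicted noise at level t
        # Posterior mean from Ho et al. (2020): remove the predicted noise and rescale.
        x = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps_hat) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)   # re-inject a little noise (sigma_t^2 = beta_t)
    return x

# Usage (with the NoisePredictor trained in the previous sketch):
# x_generated = sample(model)
```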
Diffusion models can generate interesting images from prompts

[Figure: sample generations for the prompts below]

- An elephant scuba diving underwater
- A banana riding a horse on the moon
- A bunny reading his email on a laptop
- A panda bear eating pasta
- A red boat flying upside down in rain
- An astronaut walking a crocodile in a park
- A crocodile fishing on a boat while reading a paper
- A tree with all kinds of fruits
- Bichon Maltese and a black bunny playing backgammon
- A diffusion model generating an image
Do you have any problem?

Some material (images, tables, text, etc.) in this presentation has been borrowed from different books, lecture notes, and the web. The original contents solely belong to their owners and are used in this presentation only for clarifying various educational concepts. Any copyright infringement is not at all intended.
