5 - VAE
Vijay Sankar
Data Scientist, Ericsson
Topics:
1. Dimensionality reduction
2. Autoencoders
3. Variational Autoencoders - Intuition
4. Regularisation
5. Sampling
6. Reparameterization Trick
7. Demo
Dimensionality Reduction
1. The process of reducing the number of features that
describe some data
2. This reduction is done either by selection (only some
existing features are conserved) or by extraction (a
reduced number of new features are created based
on the old features)
3. The main purpose of a dimensionality reduction
method is to find the encoder/decoder pair that
preserves the maximum amount of information when encoding and,
therefore, has the minimum reconstruction error when decoding
4. PCA and Autoencoders
Principal Components Analysis (PCA)
1. A statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a
set of uncorrelated variables
2. The idea of PCA is to build n_e new independent features that are linear combinations of the n_d old features,
such that the projections of the data onto the subspace defined by these new features are as close as possible to
the initial data (in terms of Euclidean distance)
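A minimal sketch of PCA-based dimensionality reduction using scikit-learn; the random data, the 20 original features and the 5 retained components are illustrative assumptions, not part of the slides.

```python
# PCA sketch: project data onto a lower-dimensional subspace and measure
# the reconstruction error after mapping back to the original space.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))      # 500 samples, n_d = 20 original features

pca = PCA(n_components=5)           # keep n_e = 5 new uncorrelated features
Z = pca.fit_transform(X)            # encoded (projected) data
X_hat = pca.inverse_transform(Z)    # decoded (reconstructed) data

reconstruction_error = np.mean((X - X_hat) ** 2)
print(Z.shape, reconstruction_error)
```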
Autoencoders
1. An autoencoder is a feed-forward neural net whose
job it is to take an input x and predict x.
2. To make this non-trivial, we need to add a bottleneck
layer whose dimension is much smaller than the
input
3. The bottleneck layer captures the compressed latent
coding, so the nice by-product is dimension reduction
4. Autoencoders are lossy
5. The low-dimensional representation can be used as
the representation of the data in various
applications, e.g., image retrieval, data compression
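A minimal PyTorch autoencoder sketch with a bottleneck layer; the 784-dimensional input (a flattened 28x28 image), the 32-dimensional latent code and the layer sizes are assumptions for illustration.

```python
# Autoencoder sketch: input -> bottleneck -> reconstruction of the input.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder compresses the input to a low-dimensional latent code.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder tries to reconstruct the original input from that code.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)       # compressed latent representation
        return self.decoder(z)    # lossy reconstruction of x

model = Autoencoder()
x = torch.rand(16, 784)                        # a dummy batch of inputs
loss = nn.functional.mse_loss(model(x), x)     # train it to predict its own input
```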
Inferences about the Autoencoder
1. First, dimensionality reduction with no reconstruction loss often comes at a price: the lack of interpretable
and exploitable structure in the latent space (lack of regularity)
2. Second, most of the time the final purpose of dimensionality reduction is not only to reduce the number of
dimensions of the data, but to do so while keeping the major part of the data-structure information in the
reduced representations
Problem with Autoencoder
1. The latent space into which they convert their
inputs, and where their encoded vectors lie, may not
be continuous or allow easy interpolation.
2. If the space has discontinuities (e.g. gaps between
clusters) and you sample/generate a variation from
there, the decoder will simply produce an unrealistic
output, because it has no idea how to deal
with that region of the latent space.
3. During training, it never saw encoded vectors
coming from that region of latent space.
Variational Autoencoders - Intuition
1. “What is the link between autoencoders and content
generation?”
2. If the latent space is regular enough (well
“organized” by the encoder during training),
we can take a random point from that
latent space and decode it to get new content.
3. The decoder would then act more or less like the
generator of a Generative Adversarial Network.
4. Regularity of the latent space for autoencoders is a
difficult point that depends on the distribution of the
data in the initial space, the dimension of the latent
space and the architecture of the encoder
Variational Autoencoders
1. A variational autoencoder can be defined as an autoencoder whose training is regularised to avoid
overfitting and to ensure that the latent space has good properties that enable a generative process.
2. The only slight modification to the encoding-decoding process: instead of encoding an input as a single point,
we encode it as a distribution over the latent space (see the encoder sketch below)
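A sketch of how a VAE encoder can output a distribution rather than a single point; the two linear heads for mean and log-variance, and the dimensions used, are illustrative assumptions.

```python
# VAE encoder sketch: for each input it outputs the parameters (mean and
# log-variance) of a Gaussian distribution over the latent space.
import torch
import torch.nn as nn

class VAEEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)       # mean of q(z|x)
        self.fc_logvar = nn.Linear(256, latent_dim)   # log-variance of q(z|x)

    def forward(self, x):
        h = self.hidden(x)
        return self.fc_mu(h), self.fc_logvar(h)

mu, logvar = VAEEncoder()(torch.rand(16, 784))
print(mu.shape, logvar.shape)   # one distribution (mu, sigma^2) per input
```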
Loss Function - KL Divergence
1. The loss function that is minimised when training a VAE is composed of
a. “reconstruction term” (on the final layer), that tends to make the encoding-decoding scheme as
performant as possible, and
b. “regularisation term” (on the latent layer), that tends to regularise the organisation of the latent space
by making the distributions returned by the encoder close to a standard normal distribution
2. The regularisation term is expressed as the Kullback-Leibler divergence between the encoder's distribution
and a standard normal; common reconstruction losses are binary cross-entropy (BCE) and mean squared error (MSE),
as in the loss sketch below
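A minimal sketch of the two-term VAE loss, assuming a diagonal-Gaussian encoder so the KL divergence to N(0, I) has a closed form; the shapes and the optional beta weight are assumptions for illustration.

```python
# VAE loss sketch: BCE reconstruction term on the output plus the closed-form
# KL divergence between N(mu, sigma^2) and the standard normal N(0, I).
import torch
import torch.nn.functional as F

def vae_loss(x_hat, x, mu, logvar, beta=1.0):
    # Reconstruction term on the final layer (binary cross-entropy here).
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    # Regularisation term on the latent layer:
    # KL(N(mu, sigma^2) || N(0, I)) = -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

x = torch.rand(16, 784)                      # dummy targets in [0, 1]
x_hat = torch.sigmoid(torch.randn(16, 784))  # stand-in for the decoder output
mu, logvar = torch.zeros(16, 32), torch.zeros(16, 32)
print(vae_loss(x_hat, x, mu, logvar))
```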
Regularisation Intuitions
1. The regularity that is expected from the latent space in order to make generative process possible can be
expressed through two main properties:
a. continuity (two close points in the latent space should not give two completely different contents once
decoded) and
b. completeness (for a chosen distribution, a point sampled from the latent space should give “meaningful”
content once decoded)
Regularisation Intuitions
1. We have to regularise both the covariance matrix and the mean of the distributions returned by the encoder
2. With this regularisation term, we prevent the model from encoding data far apart in the latent space and
encourage the returned distributions to “overlap” as much as possible, thereby satisfying the expected
continuity and completeness conditions
3. Naturally, as for any regularisation term, this comes at the price of a higher reconstruction error on the
training data
Sampling with VAE
1. Sampling from a Variational Autoencoder (VAE) enables the generation of new data similar to the data seen
during training; this is a key aspect that separates the VAE from the traditional AE architecture.
2. There are several ways of sampling from a VAE:
a. Posterior sampling
b. Prior sampling
c. Class centers
d. Interpolation
Posterior Sampling
1. Sampling from the posterior distribution given a provided input.
2. Posterior sampling allows generating realistic data samples, but with low variability: the output data is
similar to the input data (see the sketch below).
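A posterior-sampling sketch: encode a given input, sample z from the resulting Gaussian, and decode. The encoder heads and the decoder below are untrained stand-ins, and all dimensions are assumptions.

```python
# Posterior sampling sketch: z ~ q(z|x) for a provided input x, then decode.
import torch
import torch.nn as nn

latent_dim = 32
encoder_mu = nn.Linear(784, latent_dim)       # stand-in for the mean head
encoder_logvar = nn.Linear(784, latent_dim)   # stand-in for the log-variance head
decoder = nn.Sequential(nn.Linear(latent_dim, 784), nn.Sigmoid())

x = torch.rand(1, 784)                                      # the provided input
mu, logvar = encoder_mu(x), encoder_logvar(x)
z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)     # sample from q(z|x)
x_new = decoder(z)                                          # similar to x, low variability
```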
Prior Sampling
1. Sampling from the latent space assuming a standard normal multivariate distribution. This is possible due to
the assumption (used during VAE training) that the latent variables are normally distributed. This method
does not allow the generation of data with specific properties (for example, generating data from a specific
class).
2. Prior sampling with N(0, I) does not always generate plausible data but has high variability.
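A prior-sampling sketch: draw latent vectors from N(0, I) and decode them. The decoder is an untrained stand-in and the batch size and dimensions are assumptions.

```python
# Prior sampling sketch: z ~ N(0, I), then decode; no control over the class.
import torch
import torch.nn as nn

latent_dim = 32
decoder = nn.Sequential(nn.Linear(latent_dim, 784), nn.Sigmoid())

z = torch.randn(8, latent_dim)   # 8 points from the standard normal prior
samples = decoder(z)             # high variability, not always plausible
```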
Class Centers
1. Mean encodings of each class can be accumulated from the whole dataset and later used for controlled
(conditional) generation
2. Sampling from a normal distribution with averaged class μ guarantees the generation of new data from the
same class.
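A class-centre sketch: average the encoded means per class, then sample around a chosen class mean. The encoded means, labels, number of classes and noise scale below are dummy assumptions for illustration.

```python
# Class-centre sketch: conditional generation by sampling near a class mean.
import torch

latent_dim = 32
mus = torch.randn(1000, latent_dim)      # stand-in: encoded means of the dataset
labels = torch.randint(0, 10, (1000,))   # stand-in: class labels

class_centers = torch.stack(
    [mus[labels == c].mean(dim=0) for c in range(10)]
)
c = 3                                                    # generate from class 3
z = class_centers[c] + 0.5 * torch.randn(latent_dim)     # sample near that class mean
```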
Interpolation
1. Interpolation between two points in the latent space can reveal how changes in the latent variables
correspond to changes in the generated data
2. Interpolation can be done linearly, e.g. between two random latent vectors or between two class centers
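A linear-interpolation sketch between two latent vectors (random points or class centres), decoding each intermediate point. The decoder is an untrained stand-in and the step count is an assumption.

```python
# Interpolation sketch: decode points along the line between two latent vectors.
import torch
import torch.nn as nn

latent_dim = 32
decoder = nn.Sequential(nn.Linear(latent_dim, 784), nn.Sigmoid())

z_a, z_b = torch.randn(latent_dim), torch.randn(latent_dim)
steps = torch.linspace(0, 1, 10)
path = torch.stack([(1 - t) * z_a + t * z_b for t in steps])
frames = decoder(path)   # a smooth transition if the latent space is regular
```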
Reparameterization Trick
1. A mathematical operation used in the training process of VAEs
2. The VAE architecture includes a sampling operation where we sample latent variables from a distribution
parameterized by the outputs of the encoder.
3. The direct application of backpropagation here is problematic because of the inherent randomness of the
sampling operation
4. This method allows us to incorporate the random element required for sampling from the latent distribution
while preserving the chain of differentiable operations needed for backpropagation
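A minimal sketch of the reparameterization trick: rewrite z = mu + sigma * eps with eps ~ N(0, I), so the randomness is isolated in eps and gradients flow through mu and log-variance. Shapes are illustrative assumptions.

```python
# Reparameterization trick sketch: differentiable sampling from N(mu, sigma^2).
import torch

def reparameterize(mu, logvar):
    std = torch.exp(0.5 * logvar)   # sigma from the encoder's log-variance
    eps = torch.randn_like(std)     # randomness, independent of the network
    return mu + eps * std           # differentiable w.r.t. mu and logvar

mu = torch.zeros(4, 32, requires_grad=True)
logvar = torch.zeros(4, 32, requires_grad=True)
z = reparameterize(mu, logvar)
z.sum().backward()                  # gradients reach mu and logvar
print(mu.grad.shape, logvar.grad.shape)
```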
Application of VAE
● Image Generation
● Anomaly Detection
● Data Imputation
● Style Transfer
Demos
● VAE