
UNIT-V

AUTOENCODERS
Autoencoders are a type of neural network used primarily for unsupervised learning tasks, particularly dimensionality reduction and feature learning. The basic idea of an autoencoder is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction, denoising, or generative tasks.

Structure of Autoencoders:

Encoder: This part of the network compresses the input into a latent-space representation. It
encodes the input data as a compressed, lower-dimensional representation. The encoder is
typically composed of a series of layers that gradually decrease in size.

Latent Space (Bottleneck): This layer represents the compressed knowledge that the network
has learned about the input data. The latent space is a lower-dimensional representation, which is
a compressed version of the input.

Decoder: The decoder part of the network reconstructs the input data from the latent space
representation. Its structure is often a mirror image of the encoder, with layers increasing in size.
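To make this encoder-bottleneck-decoder structure concrete, here is a minimal PyTorch sketch; the framework choice, the 784-dimensional input (e.g., flattened 28x28 images), and the specific layer sizes are illustrative assumptions rather than anything prescribed above.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Encoder compresses the input; the decoder mirrors it to reconstruct."""
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: layers gradually decrease in size down to the bottleneck.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),          # latent space (bottleneck)
        )
        # Decoder: mirror image of the encoder, layers increase in size.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),  # outputs in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)          # compressed latent representation
        return self.decoder(z)       # reconstruction of the input

x = torch.rand(16, 784)              # dummy batch of flattened images
model = Autoencoder()
print(model(x).shape)                # torch.Size([16, 784])
```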

Types of Autoencoders:

Basic Autoencoder: Consists of fully connected layers and is used for simple tasks like dimensionality reduction.

Convolutional Autoencoder: Uses convolutional layers instead of fully connected layers. It's more suitable for tasks involving image data.

Denoising Autoencoder: Trained to remove noise from data. It's given a corrupted input and is trained to reconstruct the original, uncorrupted input.

Variational Autoencoder (VAE): A generative variant of autoencoders. VAEs not only learn a compressed representation but also the parameters of a probability distribution representing the data.

Applications of Autoencoders:

Dimensionality Reduction: Reducing the number of variables in data, similar to PCA but more powerful due to the non-linear capabilities of neural networks.

Feature Learning: Learning new features for a dataset, which can be useful for tasks like anomaly detection or retraining a network for classification tasks.

Denoising: Autoencoders can learn to remove noise from images or other data types, which is useful in image processing and other signal processing applications.

Generative Models: Variational autoencoders are often used as generative models to generate new data instances that are similar to the training data.

Training Autoencoders:

The training process involves minimizing a loss function that measures the difference between the input and its reconstruction (e.g., mean squared error).

In practice, regularization techniques may be applied to prevent overfitting, especially in the case of simple autoencoders.
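A minimal sketch of this training objective in PyTorch, assuming random placeholder data in place of a real dataset, mean squared error as the reconstruction loss, and weight decay as one simple regularization choice; all of these are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A tiny autoencoder used only to illustrate the training objective.
model = nn.Sequential(
    nn.Linear(784, 32), nn.ReLU(),    # encoder
    nn.Linear(32, 784), nn.Sigmoid()  # decoder
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             weight_decay=1e-5)  # weight decay as a simple regularizer
loss_fn = nn.MSELoss()                # reconstruction loss

data = torch.rand(256, 784)           # placeholder for real training data
for epoch in range(10):
    for i in range(0, len(data), 32):
        batch = data[i:i + 32]
        reconstruction = model(batch)
        loss = loss_fn(reconstruction, batch)  # compare output with the input itself
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: reconstruction loss {loss.item():.4f}")
```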

Autoencoders are a versatile tool in the machine learning toolbox, offering a way to learn compressed representations of data in an unsupervised manner, which can be crucial for tasks where labeled data is scarce or for improving the efficiency of data representation.

At the heart of deep learning lies the neural network, an intricate interconnected system of nodes that mimics the human brain's neural architecture. Neural networks excel at discerning intricate patterns and representations within vast datasets, allowing them to make predictions, classify information, and generate novel insights. Autoencoders emerge as a fascinating subset of neural networks, offering a unique approach to unsupervised learning. Autoencoders are an adaptable and powerful class of architectures in the dynamic field of deep learning, where neural networks constantly evolve to identify complicated patterns and representations. With their ability to learn effective representations of data, these unsupervised learning models have received considerable attention and are useful in a wide variety of areas, from image processing to anomaly detection.

What are Autoencoders?

Autoencoders are a specialized class of algorithms that can learn efficient representations of input data with no need for labels. They are a class of artificial neural networks designed for unsupervised learning. Learning to compress and effectively represent input data without explicit labels is the essential principle of an autoencoder. This is accomplished using a two-fold structure that consists of an encoder and a decoder. The encoder transforms the input data into a reduced-dimensional representation, which is often referred to as "latent space" or "encoding". From that representation, a decoder rebuilds the initial input. This process of encoding and decoding helps the network learn meaningful patterns in the data by forcing it to capture the essential features.
Architecture of an Autoencoder in Deep Learning

The general architecture of an autoencoder includes an encoder, a decoder, and a bottleneck layer.

Encoder

The input layer takes the raw input data, and the hidden layers progressively reduce the dimensionality of the input, capturing important features and patterns. These layers compose the encoder. The bottleneck layer (latent space) is the final hidden layer, where the dimensionality is significantly reduced. This layer represents the compressed encoding of the input data.

Decoder

The decoder takes the encoded representation from the bottleneck layer and expands it back to the dimensionality of the original input. Its hidden layers progressively increase the dimensionality and aim to reconstruct the original input. The output layer produces the reconstructed output, which ideally should be as close as possible to the input data. The loss function used during training is typically a reconstruction loss, measuring the difference between the input and the reconstructed output. Common choices include mean squared error (MSE) for continuous data or binary cross-entropy for binary data. During training, the autoencoder learns to minimize the reconstruction loss, forcing the network to capture the most important features of the input data in the bottleneck layer. After the training process, only the encoder part of the autoencoder is retained to encode a similar type of data to that used in the training process. The different ways to constrain the network are:

Keep Small Hidden Layers: If the size of each hidden layer is kept as small as possible, the network will be forced to pick up only the representative features of the data, thus encoding the data compactly.

Regularization: In this method, a loss term is added to the cost function which encourages the network to train in ways other than copying the input.

Denoising: Another way of constraining the network is to add noise to the input and teach the network how to remove the noise from the data.

Tuning the Activation Functions: This method involves changing the activation functions of various nodes so that a majority of the nodes are dormant, effectively reducing the size of the hidden layers.

Types of Autoencoders

There are several types of autoencoders; the following sections describe each variation along with the advantages and disadvantages associated with it:

Denoising Autoencoder

A denoising autoencoder works on a partially corrupted input and trains to recover the original, undistorted image. As mentioned above, this method is an effective way to constrain the network from simply copying the input, forcing it to learn the underlying structure and important features of the data.

Advantages

This type of autoencoder can extract important features and reduce the noise or useless features. Denoising autoencoders can also be used as a form of data augmentation: the restored images can be used as augmented data, generating additional training samples.

Disadvantages

Selecting the right type and level of noise to introduce can be challenging and may require domain knowledge.

The denoising process can result in the loss of some information that is needed from the original input. This loss can impact the accuracy of the output.

Sparse Autoencoder

This type of autoencoder typically contains more hidden units than the input, but only a few are allowed to be active at once. This property is called the sparsity of the network. The sparsity of the network can be controlled by manually zeroing the required hidden units, tuning the activation functions, or adding a loss term to the cost function.

Advantages

The sparsity constraint in sparse autoencoders helps in filtering out noise and irrelevant features during the encoding process. These autoencoders often learn important and meaningful features due to their emphasis on sparse activations.

Disadvantages

The choice of hyperparameters plays a significant role in the performance of this autoencoder. Different inputs should result in the activation of different nodes of the network.

Variational Autoencoder

A variational autoencoder makes strong assumptions about the distribution of latent variables and uses the Stochastic Gradient Variational Bayes (SGVB) estimator in the training process. It assumes that the data is generated by a directed graphical model and tries to learn an approximation q_φ(z|x) to the conditional distribution p_θ(z|x), where φ and θ are the parameters of the encoder and the decoder respectively.

Advantages

Variational autoencoders are used to generate new data points that resemble the original training data; these samples are drawn from the learned latent space. A variational autoencoder is a probabilistic framework used to learn a compressed representation of the data that captures its underlying structure and variations, which makes it useful for detecting anomalies and for data exploration.

Convolutional Autoencoder

Convolutional autoencoders are a type of autoencoder that uses convolutional neural networks (CNNs) as their building blocks. The encoder consists of multiple layers that take an image or a grid as input and pass it through different convolutional layers, forming a compressed representation of the input. The decoder is the mirror image of the encoder: it deconvolves the compressed representation and tries to reconstruct the original image.
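A compact PyTorch sketch of such a convolutional autoencoder, assuming single-channel 28x28 inputs; the specific layer counts and channel sizes are illustrative only.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Convolutional encoder compresses an image; transposed convolutions rebuild it."""
    def __init__(self):
        super().__init__()
        # Encoder: stride-2 convolutions halve the spatial size (28 -> 14 -> 7).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: transposed convolutions upsample back (7 -> 14 -> 28).
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2,
                               padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2,
                               padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

images = torch.rand(8, 1, 28, 28)        # dummy batch of grayscale images
print(ConvAutoencoder()(images).shape)   # torch.Size([8, 1, 28, 28])
```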

Implementation of Autoencoders

We've created an autoencoder comprising two dense layers: an encoder responsible for condensing the images into a 64-dimensional latent vector, and a decoder tasked with reconstructing the initial image based on this latent space.
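The code itself is not reproduced in this document; a minimal PyTorch sketch of an equivalent model, assuming 28x28 grayscale images flattened to 784 values, might look like this.

```python
import torch
import torch.nn as nn

class DenseAutoencoder(nn.Module):
    """Two dense layers: encode 784 pixels to a 64-d latent vector, then decode back."""
    def __init__(self, input_dim=784, latent_dim=64):
        super().__init__()
        self.encoder = nn.Linear(input_dim, latent_dim)   # condense to the latent vector
        self.decoder = nn.Linear(latent_dim, input_dim)   # reconstruct the image

    def forward(self, x):
        z = torch.relu(self.encoder(x))
        return torch.sigmoid(self.decoder(z))   # pixel values in [0, 1]

model = DenseAutoencoder()
images = torch.rand(4, 784)                      # dummy flattened images
print(model(images).shape)                       # torch.Size([4, 784])
```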

What is an autoencoder?

An autoencoder is a type of artificial neural network used to learn data encodings in an unsupervised manner. The aim of an autoencoder is to learn a lower-dimensional representation (encoding) for higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input image.
The architecture of autoencoders

Let's start with a quick overview of autoencoders' architecture.

Autoencoders consist of 3 parts:

1. Encoder: A module that compresses the train-validate-test set input data into an encoded representation that is typically several orders of magnitude smaller than the input data.

2. Bottleneck: A module that contains the compressed knowledge representations and is therefore the
most important part of the network.

3. Decoder: A module that helps the network “decompress” the knowledge representations and
reconstructs the data back from its encoded form. The output is then compared with a ground truth.

The relationship between the Encoder, Bottleneck, and Decoder

Encoder

The encoder is a set of convolutional blocks followed by pooling modules that compress the input to the model into a compact section called the bottleneck. The bottleneck is followed by the decoder, which consists of a series of upsampling modules that bring the compressed feature back into the form of an image. In the case of simple autoencoders, the output is expected to be the same as the input data with reduced noise.

How to train autoencoders?

You need to set 4 hyperparameters before training an autoencoder:

Code size: The code size, or the size of the bottleneck, is the most important hyperparameter used to tune the autoencoder. The bottleneck size decides how much the data has to be compressed. This can also act as a regularization term.

Number of layers: Like all neural networks, an important hyperparameter for tuning autoencoders is the depth of the encoder and the decoder. While a higher depth increases model complexity, a lower depth is faster to process.

Number of nodes per layer: The number of nodes per layer defines the weights we use per layer. Typically, the number of nodes decreases with each subsequent layer in the autoencoder as the input to each of these layers becomes smaller across the layers.

Reconstruction Loss: The loss function we use to train the autoencoder is highly dependent on the type of input and output we want the autoencoder to adapt to. If we are working with image data, the most popular loss functions for reconstruction are MSE Loss and L1 Loss. In case the inputs and outputs are within the range [0, 1], as in MNIST, we can also make use of Binary Cross Entropy as the reconstruction loss.
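As a quick illustration of how this choice looks in code, the PyTorch snippet below (with dummy tensors standing in for real batches) computes the three losses mentioned above; the binary cross-entropy variant assumes values already lie in [0, 1].

```python
import torch
import torch.nn.functional as F

target = torch.rand(8, 784)            # ground-truth inputs (e.g., MNIST pixels in [0, 1])
reconstruction = torch.rand(8, 784)    # autoencoder output for the same batch

mse = F.mse_loss(reconstruction, target)              # common default for continuous data
l1 = F.l1_loss(reconstruction, target)                # less sensitive to outlier pixels
bce = F.binary_cross_entropy(reconstruction, target)  # valid when values lie in [0, 1]

print(f"MSE={mse.item():.4f}  L1={l1.item():.4f}  BCE={bce.item():.4f}")
```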

5 types of autoencoders

The idea of autoencoders for neural networks isn't new.

The first applications date to the 1980s. Initially used for dimensionality reduction and feature learning, the autoencoder concept has evolved over the years and is now widely used for learning generative models of data. The main types are:

Undercomplete autoencoders

Sparse autoencoders

Contractive autoencoders

Denoising autoencoders

Variational autoencoders (for generative modeling)

1. Undercomplete autoencoders

An undercomplete autoencoder is one of the simplest types of autoencoders.

An undercomplete autoencoder takes in an image and tries to predict the same image as output, thus reconstructing the image from the compressed bottleneck region. Undercomplete autoencoders are truly unsupervised as they do not take any form of label, the target being the same as the input. The primary use of autoencoders like these is the generation of the latent space or bottleneck, which forms a compressed substitute of the input data and can be easily decompressed back with the help of the network when needed. This form of compression in the data can be modeled as a form of dimensionality reduction.

When we think of dimensionality reduction, we tend to think of methods like PCA (Principal Component Analysis) that form a lower-dimensional hyperplane to represent data that exists in a higher-dimensional form without losing information. However, PCA can only build linear relationships. As a result, it is at a disadvantage compared with methods like undercomplete autoencoders that can learn non-linear relationships and, therefore, perform better in dimensionality reduction. Effectively, if we remove all non-linear activations from an undercomplete autoencoder and use only linear layers, we reduce the undercomplete autoencoder to something that works on an equal footing with PCA.

The loss function used to train an undercomplete autoencoder is called reconstruction loss, as it checks how well the image has been reconstructed from the input data. Although the reconstruction loss can be anything depending on the input and output, we will use an L1 loss to depict the term (also called the norm loss), represented by:

L(x, x̂) = |x - x̂|

where x̂ represents the predicted output and x represents the ground truth.

As the loss function has no explicit regularization term, the only method to ensure that the model is not
memorizing the input data is by regulating the size of the bottleneck and the number of hidden layers
within this part of the network—the architecture.

2. Sparse autoencoders

Sparse autoencoders are similar to undercomplete autoencoders in that they use the same image as input and ground truth. However, the means by which the encoding of information is regulated is significantly different.
While undercomplete autoencoders are regulated and fine-tuned by regulating the size of the bottleneck, the sparse autoencoder is regulated by changing the number of nodes at each hidden layer. Since it is not possible to design a neural network that has a flexible number of nodes at its hidden layers, sparse autoencoders work by penalizing the activation of some neurons in hidden layers. In other words, the loss function has a term that calculates the number of neurons that have been activated and provides a penalty that is directly proportional to that. This penalty, called the sparsity function, prevents the neural network from activating more neurons and serves as a regularizer.

While typical regularizers work by creating a penalty on the size of the weights at the nodes, the sparsity regularizer works by creating a penalty on the number of nodes activated. This form of regularization allows the network to have nodes in hidden layers dedicated to finding specific features in images during training, treating the regularization problem as a problem separate from the latent space problem. We can thus set the latent space dimensionality at the bottleneck without worrying about regularization.
There are two primary ways in which the sparsity regularizer term can be incorporated into the loss function.

L1 Loss: Here, we add the magnitude of the sparsity regularizer to the reconstruction loss, as we do for general L1 regularizers:

L = |x - x̂| + λ Σ_i |a_i^(h)|

where h represents the hidden layer, i represents the image in the minibatch, and a represents the activation.

KL-Divergence: In this case, we consider the activations over a collection of samples at once rather than summing them as in the L1 Loss method. We constrain the average activation of each neuron over this collection. Considering the ideal distribution to be a Bernoulli distribution with mean ρ, we include a KL divergence term within the loss to reduce the difference between the current distribution of the activations and the ideal (Bernoulli) distribution:

L = |x - x̂| + Σ_j KL(ρ || ρ̂_j), where ρ̂_j is the average activation of hidden unit j over the collection of samples.
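A sketch of how both penalties might be added to a reconstruction loss in PyTorch; the layer sizes, the sparsity target rho, and the weighting coefficients are illustrative assumptions.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.Sigmoid())  # hidden activations in (0, 1)
decoder = nn.Linear(128, 784)

x = torch.rand(32, 784)                 # dummy minibatch
h = encoder(x)                          # hidden-layer activations
recon_loss = nn.functional.mse_loss(decoder(h), x)

# L1 sparsity penalty: sum of activation magnitudes over the minibatch.
l1_penalty = h.abs().sum()

# KL-divergence penalty: constrain the *average* activation of each neuron
# toward a small sparsity target rho (ideal Bernoulli distribution).
rho = 0.05
rho_hat = h.mean(dim=0).clamp(1e-6, 1 - 1e-6)   # average activation per neuron
kl_penalty = (rho * torch.log(rho / rho_hat)
              + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()

lam, beta = 1e-4, 1e-2                  # illustrative weighting coefficients
loss = recon_loss + lam * l1_penalty + beta * kl_penalty
print(loss.item())
```

In practice only one of the two penalty terms is typically used at a time; both are shown here to mirror the two options described above.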

‍3. Contractive auto encoders

Similar to other autoencoders, contractive autoencoders perform the task of learning a representation of the image while passing it through a bottleneck and reconstructing it in the decoder. The contractive autoencoder also has a regularization term to prevent the network from learning the identity function and simply mapping the input to the output. Contractive autoencoders work on the basis that similar inputs should have similar encodings and a similar latent space representation. It means that the latent space should not vary by a huge amount for minor variations in the input. To train a model that works along with this constraint, we have to ensure that the derivatives of the hidden layer activations are small with respect to the input data. The penalty added to the loss is the squared Frobenius norm of this Jacobian:

||∂h/∂x||²_F

where h represents the hidden layer and x represents the input.

An important thing to note about the loss function (formed from the norm of the derivatives and the reconstruction loss) is that the two terms contradict each other. While the reconstruction loss wants the model to tell the difference between two inputs and observe variations in the data, the Frobenius norm of the derivatives says that the model should be able to ignore variations in the input data. Putting these two contradictory conditions into one loss function enables us to train a network whose hidden layers capture only the most essential information. This information is necessary to separate images and ignore information that is non-discriminatory in nature and, therefore, not important.
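One possible PyTorch sketch of this penalty, computing the squared Frobenius norm of the Jacobian of the hidden activations with respect to the input column by column via autograd; this brute-force approach is practical only for small hidden layers, and real implementations often use cheaper approximations.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(20, 8), nn.Sigmoid())
decoder = nn.Linear(8, 20)

x = torch.rand(16, 20, requires_grad=True)   # small dummy batch
h = encoder(x)
recon_loss = nn.functional.mse_loss(decoder(h), x)

# Contractive penalty: squared Frobenius norm of dh/dx, one hidden unit at a time.
jac_norm_sq = 0.0
for j in range(h.shape[1]):
    grads = torch.autograd.grad(h[:, j].sum(), x,
                                create_graph=True, retain_graph=True)[0]
    jac_norm_sq = jac_norm_sq + (grads ** 2).sum()

lam = 1e-3                                    # illustrative penalty weight
loss = recon_loss + lam * jac_norm_sq
loss.backward()                               # both terms contribute to the gradients
print(loss.item())
```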

4. Denoising autoencoders

Denoising autoencoders, as the name suggests, are autoencoders that remove noise from an image. As opposed to the autoencoders we've already covered, this is the first of its kind that does not have the input image as its ground truth. In denoising autoencoders, we feed a noisy version of the image, where noise has been added via digital alterations. The noisy image is fed to the encoder-decoder architecture, and the output is compared with the ground truth (clean) image.
The denoising autoencoder gets rid of noise by learning a representation of the input from which the noise can be filtered out easily. While removing noise directly from the image seems difficult, the autoencoder performs this by mapping the input data into a lower-dimensional manifold (as in undercomplete autoencoders), where filtering of noise becomes much easier. Essentially, denoising autoencoders work with the help of non-linear dimensionality reduction. The loss function generally used in these types of networks is L2 or L1 loss.
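A minimal PyTorch sketch of this setup, assuming Gaussian corruption of random placeholder images; note that the loss compares the reconstruction against the clean image, not the noisy input.

```python
import torch
import torch.nn as nn

model = nn.Sequential(                       # small dense autoencoder
    nn.Linear(784, 64), nn.ReLU(),
    nn.Linear(64, 784), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

clean = torch.rand(128, 784)                 # placeholder for clean training images
for step in range(100):
    noisy = clean + 0.3 * torch.randn_like(clean)   # corrupt the input with Gaussian noise
    noisy = noisy.clamp(0.0, 1.0)
    reconstruction = model(noisy)
    # Ground truth is the *clean* image, not the noisy input that was fed in.
    loss = nn.functional.mse_loss(reconstruction, clean)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final denoising loss: {loss.item():.4f}")
```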

5. Variational autoencoders

Standard autoencoders learn to represent the input just in a compressed form called the latent space or the bottleneck. The latent space formed after training the model is therefore not necessarily continuous and, in effect, might not be easy to interpolate.

For example, consider the latent attributes that a standard autoencoder would learn from an input image: a single fixed value for each attribute.

While these attributes explain the image and can be used in reconstructing the image from the compressed latent space, they do not allow the latent attributes to be expressed in a probabilistic fashion. Variational autoencoders deal with this specific topic and express their latent attributes as a probability distribution, leading to the formation of a continuous latent space that can be easily sampled and interpolated. When fed the same input, a variational autoencoder would instead construct each latent attribute as a distribution (for example, a mean and a variance) rather than a single value.
The latent attributes are then sampled from the latent distribution formed and fed to the decoder,
reconstructing the input. The motivation behind expressing the latent attributes as a probability
distribution can be very easily understood via statistical expressions.

We aim to identify the characteristics of the latent vector z that reconstructs the output given a particular input. Effectively, we want to study the characteristics of the latent vector given a certain input x, that is, p(z|x). While estimating this distribution directly is mathematically intractable, a much simpler and easier option is to build a parameterized model that can estimate the distribution for us. It does so by minimizing the KL divergence between the original distribution and our parameterized one.

Expressing the parameterized distribution as q, we can infer the possible latent attributes used in the image reconstruction. Assuming the prior over z to be a multivariate Gaussian, we can build a parameterized distribution containing two parameters, the mean and the variance. The corresponding distribution is then sampled and fed to the decoder, which then proceeds to reconstruct the input from the sample points.

While this seems easy in theory, it becomes impossible to implement directly because backpropagation cannot be defined for a random sampling process performed before feeding the data to the decoder. To get around this hurdle, we use the reparameterization trick, a cleverly defined way to bypass the sampling process within the neural network. In the reparameterization trick, we randomly sample a value ε from a unit Gaussian and then scale it by the latent distribution's standard deviation σ and shift it by the mean μ, so that z = μ + σ · ε.

The variational autoencoder thus allows us to learn smooth latent state representations of the input data. To train a VAE, we use two loss terms: the reconstruction loss and the KL divergence. While the reconstruction loss enables the distribution to correctly describe the input, focusing only on minimizing it would make the network learn very narrow distributions, akin to discrete latent attributes. The KL divergence loss prevents the network from learning narrow distributions and tries to bring the distribution closer to a unit normal distribution.
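A compact PyTorch sketch of a VAE showing the reparameterization trick and the two loss terms; the dimensions and the use of binary cross-entropy (for inputs in [0, 1]) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Encoder outputs a mean and log-variance; the decoder reconstructs from a sample."""
    def __init__(self, input_dim=784, latent_dim=16):
        super().__init__()
        self.enc = nn.Linear(input_dim, 256)
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, input_dim), nn.Sigmoid())

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample eps ~ N(0, I), then scale and shift it,
        # so gradients can flow through mu and logvar despite the random sampling.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    recon_loss = nn.functional.binary_cross_entropy(recon, x, reduction="sum")
    # KL divergence pulls the latent distribution toward a unit Gaussian.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl

model = VAE()
x = torch.rand(32, 784)                       # dummy batch with values in [0, 1]
recon, mu, logvar = model(x)
print(vae_loss(x, recon, mu, logvar).item())
```

After training, sampling z from a unit Gaussian and passing it through the decoder alone is how new data instances are generated.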

Applications of autoencoders

Now that you understand various types of autoencoders, let's summarize some of their most common use cases.

1. Dimensionality reduction

Undercomplete autoencoders are those that are used for dimensionality reduction. They can be used as a pre-processing step for dimensionality reduction as they can perform fast and accurate dimensionality reduction without losing much information. Furthermore, while dimensionality reduction procedures like PCA can only perform linear dimensionality reduction, undercomplete autoencoders can perform large-scale non-linear dimensionality reduction.

2. Image denoising

Autoencoders like the denoising autoencoder can be used to perform efficient and highly accurate image denoising. Unlike traditional methods of denoising, autoencoders do not search for noise; they extract the image from the noisy data that has been fed to them by learning a representation of it. The representation is then decompressed to form a noise-free image. Denoising autoencoders can thus denoise complex images that cannot be denoised via traditional methods.

3. Generation of image and time series data

Variational autoencoders can be used to generate both image and time series data.
The parameterized distribution at the bottleneck of the autoencoder can be randomly sampled to generate values for the latent attributes, which can then be forwarded to the decoder, leading to the generation of image data. VAEs can also be used to model time series data like music.

4. Anomaly detection

Undercomplete autoencoders can also be used for anomaly detection.

For example, consider an autoencoder that has been trained on a specific dataset P. For any image sampled from the training dataset, the autoencoder is bound to give a low reconstruction loss and is supposed to reconstruct the image as is. For any image which is not present in the training dataset, however, the autoencoder cannot perform the reconstruction well, as the latent attributes are not adapted for the specific image that has never been seen by the network. As a result, the outlier image gives off a very high reconstruction loss and can easily be identified as an anomaly with the help of a proper threshold.

An autoencoder is an unsupervised learning technique for neural networks that learns efficient data representations (encodings) by training the network to ignore signal "noise." Autoencoders can be used for image denoising, image compression, and, in some cases, even generation of image data. While autoencoders might seem easy at first glance (as they have a very simple theoretical background), making them learn a representation of the input that is meaningful is quite difficult. Autoencoders like the undercomplete autoencoder and the sparse autoencoder do not have large-scale applications in computer vision compared to VAEs and DAEs, which are still used in current work since being proposed in 2013 (by Kingma et al.).
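A short sketch of this thresholding idea in PyTorch, assuming an already trained model and using the 99th percentile of reconstruction errors on normal data as the cutoff; the percentile is an arbitrary illustrative choice.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(),
                      nn.Linear(32, 784), nn.Sigmoid())  # assumed trained on "normal" data P

def reconstruction_error(batch):
    """Per-sample mean squared reconstruction error."""
    with torch.no_grad():
        recon = model(batch)
    return ((recon - batch) ** 2).mean(dim=1)

normal_data = torch.rand(512, 784)            # stand-in for the training distribution
errors = reconstruction_error(normal_data)
# Pick a threshold from the errors seen on normal data, e.g. the 99th percentile.
threshold = torch.quantile(errors, 0.99)

new_samples = torch.rand(10, 784)             # incoming data to screen
is_anomaly = reconstruction_error(new_samples) > threshold
print(is_anomaly)
```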

What Are Autoencoders?

Autoencoders are very useful in the field of unsupervised machine learning. You can use them to compress the data and reduce its dimensionality. The main difference between autoencoders and Principal Component Analysis (PCA) is that while PCA finds the directions along which you can project the data with maximum variance, autoencoders reconstruct our original input given just a compressed version of it. Anyone who needs the original data can reconstruct it from the compressed data using an autoencoder.

Architecture

An autoencoder is a type of neural network that can learn to reconstruct images, text, and other data from compressed versions of themselves. An autoencoder consists of three layers: the Encoder, the Code, and the Decoder.

The Encoder layer compresses the input image into a latent-space representation. It encodes the input image as a compressed representation in a reduced dimension; the compressed image is a distorted version of the original image. The Code layer represents the compressed input that is fed to the decoder layer.
The Decoder layer decodes the encoded image back to the original dimension. The decoded image is reconstructed from the latent-space representation and is a lossy reconstruction of the original image.

Training Autoencoders

When you're building an autoencoder, there are a few things to keep in mind.

First, the code or bottleneck size is the most critical hyperparameter used to tune the autoencoder. It decides how much data has to be compressed. It can also act as a regularization term. Secondly, it's important to remember that the number of layers is critical when tuning autoencoders. A higher depth increases model complexity, but a lower depth is faster to process. Thirdly, you should pay attention to how many nodes you use per layer. The number of nodes decreases with each subsequent layer in the autoencoder as the input to each layer becomes smaller across the layers.

Types of Autoencoders

Undercomplete Autoencoders

An undercomplete autoencoder is an unsupervised neural network that you can use to generate a compressed version of the input data. This is done by taking in an image and trying to predict the same image as output, thus reconstructing the image from its compressed bottleneck region. The primary use for autoencoders like these is generating a latent space or bottleneck, which forms a compressed substitute of the input data and can be easily decompressed back with the help of the network when needed.

Sparse Autoencoders

Sparse autoencoders are controlled by changing the number of nodes at each hidden layer. Since it is impossible to design a neural network with a flexible number of nodes at its hidden layers, sparse autoencoders work by penalizing the activation of some neurons in hidden layers. It means that a penalty directly proportional to the number of neurons activated is applied to the loss function. As a means of regularizing the neural network, this sparsity penalty prevents more neurons from being activated.

There are two types of regularizers used:

The L1 Loss method is a general regularizer in which we add the magnitude of the activations to the loss. The KL-divergence method considers the activations over a collection of samples at once rather than summing them as in the L1 Loss method, and constrains the average activation of each neuron over this collection.


Contractive Autoencoders

The input is passed through a bottleneck in a contractive autoencoder and then reconstructed in
the decoder. The bottleneck function is used to learn a representation of the image while passing
it through.

The contractive autoencoder also has a regularization term to prevent the network from learning
the identity function and mapping input into output.

To train a model that works along with this constraint, we need to ensure that the derivatives of the hidden layer activations are small with respect to the input.

Denoising Autoencoders

Have you ever wanted to remove noise from an image but didn't know where to start? If so, then
denoising autoencoders are for you!

Denoising autoencoders are similar to regular autoencoders in that they take an input and produce an output. However, they differ because they don't have the input image as their ground truth. Instead, they use a noisy version of it.

This is because removing image noise manually is difficult when working with images.

With a denoising autoencoder, we feed the noisy image into our network and let it map it into a lower-dimensional manifold where filtering out noise becomes much more manageable.

The loss function usually used with these networks is L2 or L1 loss.

Variational Autoencoders

Variational autoencoders (VAEs) are models that address a specific problem with standard autoencoders. When you train an autoencoder, it learns to represent the input just in a compressed form called the latent space or the bottleneck. However, this latent space formed after training is not necessarily continuous and, in effect, might not be easy to interpolate. Variational autoencoders deal with this specific topic and express their latent attributes as a probability distribution, forming a continuous latent space that can be easily sampled and interpolated.

Use Cases

Anomaly detection: autoencoders can identify data anomalies using the reconstruction loss: inputs that the trained network reconstructs poorly are flagged as unusual. This can be helpful for anomaly detection in financial markets, where you can use it to identify unusual activity and predict market trends.

Data denoising (image and audio): autoencoders can help clean up noisy pictures or audio files. You can use them to remove noise from images or audio recordings.

Image inpainting: autoencoders have been used to fill in gaps in images by learning how to reconstruct missing pixels based on surrounding pixels. For example, if you're trying to restore an old photograph that's missing part of its right side, the autoencoder could learn how to fill in the missing details based on what it knows about the rest of the photo.

Information retrieval: autoencoders can be used as content-based image retrieval systems that allow users to search for images based on their content.

UNDERCOMPLETE AUTOENCODER

An undercomplete autoencoder is a type of autoencoder designed to learn a compressed, lower-dimensional representation of the input data. The key characteristic of an undercomplete autoencoder is that the dimensionality of the latent space (the internal representation or encoding) is less than the dimensionality of the input data. This constraint forces the autoencoder to learn the most salient features of the data.

Structure and Functioning:

Encoder: The encoder part of the network compresses the input data into a smaller
representation. It typically consists of a series of layers that decrease in size, leading to the
bottleneck layer.

Bottleneck (Latent Space): This is the core of the auto encoder, where the data is represented in
its most compressed form. The dimensionality of this layer is less than that of the input layer,
hence the term "under complete."

Decoder: The decoder part reconstructs the input data from this compressed representation. It
generally mirrors the structure of the encoder, with layers increasing in size.

Learning Process:

Objective: The auto encoder is trained to minimize the reconstruction error – the difference
between the original input and the reconstructed output. Commonly used loss functions include
Mean Squared Error (MSE) for continuous input data or cross-entropy for binary input data.

Back propagation and Optimization: Standard techniques like back propagation are used with
optimization algorithms (such as SGD, Adam) to train the network.

Applications:

Dimensionality Reduction: Similar to PCA but more powerful due to non-linearity, under
complete auto encoders can reduce the dimensionality of data, which is useful for visualization
or as a preprocessing step for other algorithms.
Feature Extraction: The compressed representation can serve as a feature set that captures the
most important aspects of the original data, which can be useful in various machine learning
tasks.

Denoising: Although not specifically designed for this task, undercomplete autoencoders can also perform denoising by learning to ignore the "noise" features and capture the underlying structure of the data.

Advantages and Challenges:

Non-linear Dimensionality Reduction: Unlike linear methods like PCA, undercomplete auto
encoders can capture non-linear relationships in the data.

Risk of Over fitting: If the network is too complex, it might end up memorizing the input data
rather than learning useful features. Regularization techniques like dropout can be helpful.

Choice of Architecture: The architecture of the auto encoder (number of layers, number of
neurons in each layer) is crucial and needs to be designed based on the specific characteristics of
the data and the task at hand.

Regularization of Autoencoders

Autoencoders are a variant of feed-forward neural networks that have an extra bias for calculating the error of reconstructing the original input. After training, autoencoders are then used as a normal feed-forward neural network for activations. This is an unsupervised form of feature extraction because the neural network uses only the original input for learning weights rather than backpropagation, which has labels. Deep networks can use either RBMs or autoencoders as building blocks for larger networks (a single network rarely uses both).

Use of autoencoders

Autoencoders are used to learn compressed representations of datasets. Commonly, we use them to reduce the dimensions of a dataset. The output of the autoencoder is a re-creation of the input data in its most efficient form.

Similarities of autoencoders to the multilayer perceptron

Autoencoders are similar to multilayer perceptron neural networks because, like multilayer perceptrons, autoencoders have an input layer, some hidden layers, and an output layer. The key difference between a multilayer perceptron network and an autoencoder is that the output layer of an autoencoder has the same number of neurons as that of the input layer.
Regularization

Regularization helps control the effects of out-of-control parameters by using different methods to minimize parameter size over time. In mathematical notation, we see regularization represented by a coefficient lambda, controlling the trade-off between finding a good fit and keeping the value of certain feature weights low as the exponents on features increase. L1 and L2 regularization help fight overfitting by making certain weights smaller. Smaller-valued weights lead to simpler hypotheses, which are the most generalizable. Unregularized weights with several higher-order polynomials in the feature set tend to overfit the training set. As the input training set size grows, the effect of regularization decreases, and the parameters tend to increase in magnitude. This is appropriate because an excess of features relative to training set examples leads to overfitting in the first place. Bigger data is the ultimate regularizer.
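As a small illustration, the PyTorch snippet below shows two common ways to apply L2 regularization with a coefficient lambda; the value 1e-4 is purely illustrative.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 784))

# Option 1: let the optimizer apply L2 regularization via its weight_decay argument
# (this is lambda applied to the squared weights at every update step).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

# Option 2: add the penalty to the loss explicitly, which makes lambda visible.
def l2_penalty(module, lam=1e-4):
    return lam * sum(p.pow(2).sum() for p in module.parameters())

x = torch.rand(32, 784)
loss = nn.functional.mse_loss(model(x), x) + l2_penalty(model)
loss.backward()
print(loss.item())
```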

Regularized autoencoders

There are other ways to constrain the reconstruction of an autoencoder than to impose a hidden layer of smaller dimension than the input. Regularized autoencoders use a loss function that helps the model to have other properties besides copying the input to the output. We can generally find two types of regularized autoencoder: the denoising autoencoder and the sparse autoencoder.

Denoising autoencoder

One way we can modify the autoencoder to learn useful features is by changing the inputs: we add random noise to the input and recover the original form by removing the noise from the input data. This prevents the autoencoder from simply copying the data from input to output, because the input contains random noise. We ask it to subtract the noise and produce the meaningful underlying data. This is called a denoising autoencoder.

REGULARIZED AUTOENCODERS

Regularized autoencoders are a variant of autoencoders that incorporate regularization techniques to impose additional constraints on the learned representation (encoding) of the input data, beyond just minimizing the reconstruction error. These constraints can help in learning more useful and robust features. There are several types of regularized autoencoders, each with its specific form of regularization:

Sparse Autoencoders:

Regularization: They use a sparsity constraint on the hidden layers to ensure that only a small
number of neurons are active at the same time.

Purpose: This leads to a representation that captures the most significant features of the input
data.

Implementation: It’s typically achieved by adding a sparsity term to the loss function, like the
KL divergence between the average activation of hidden neurons and a sparsity parameter.

Denoising Autoencoders:

Regularization: They are trained to reconstruct the original input from a corrupted version of it.

Purpose: This process forces the auto encoder to learn more robust features and ignore the
"noise" in the input data.

Implementation: During training, the input is intentionally corrupted (e.g., by adding Gaussian
noise or masking some of the input values), and the network learns to predict the uncorrupted
input.

Contractive Autoencoders:

Regularization: They add a penalty to the loss function based on the Frobenius norm of the
Jacobian matrix of the encoder activations with respect to the input.

Purpose: This encourages the model to learn a representation that is robust to slight variations of
input values.

Implementation: The regularization term makes the learned representation insensitive to small
variations in the input data.

Variational Autoencoders (VAEs):

Regularization: They take a probabilistic approach that learns a distribution, represented by a mean and variance, for the input data in the latent space.

Purpose: VAEs are designed to generate new data that's similar to the input data.

Implementation: They use a different training principle involving the reparameterization trick and optimization of the variational lower bound.

Applications:

Feature Extraction and Dimensionality Reduction: Regularized auto encoders are excellent
for feature extraction, as the regularization helps in learning more meaningful representations.

Denoising: Particularly with denoising auto encoders, they are effective in tasks that require the
model to be robust to noise in the input data.

Data Generation: Variational autoencoders are used in generative models to produce new data instances that are similar to the training data.

Advantages:

Robustness: Regularized autoencoders are generally more robust to overfitting compared to standard autoencoders.

Quality of Representation: They often learn more useful and generalizable features.

Challenges:

Complexity in Training: The added regularization terms can make the training process more
complex and sometimes harder to converge.

Choice of Regularization: Selecting the appropriate type and degree of regularization is crucial
and can depend heavily on the specific application and nature of the input data.

Regularized auto encoders extend the basic auto encoder framework to enforce the learning of
more useful and robust features, making them valuable for a wide range of applications in
unsupervised learning and beyond.

STOCHASTIC ENCODER AND DECODERS

Stochastic encoders and decoders are components of neural network architectures, particularly
prominent in certain types of auto encoders and sequence-to-sequence (seq2seq) models, where
randomness is introduced into the encoding and decoding processes. This stochasticity enables
the models to generate diverse and often more realistic outputs, making them especially useful in
tasks like generative modeling.
1. Stochastic Encoders:

In the context of autoencoders, a stochastic encoder doesn't just produce a fixed point in the latent space for a given input. Instead, it generates a distribution (usually defined by parameters like mean and variance). This approach is a key feature of Variational Autoencoders (VAEs). In a VAE, the encoder outputs the parameters of a probability distribution (e.g., Gaussian), and the actual encoding is a random sample from this distribution. This sampling introduces randomness into the encoding process. The stochastic nature of the encoder allows the model to capture and represent the probabilistic distribution of the input data in the latent space, making it particularly effective for generative tasks.

2. Stochastic Decoders:

Stochastic decoders are often used in sequence generation tasks, such as in natural language
processing for tasks like machine translation, text generation, and summarization. In a seq2seq
model with a stochastic decoder, each step of the decoding process can involve randomness. For
instance, rather than always picking the most likely next word, the model might sample from a
probability distribution over the possible next words. This randomness allows for the generation
of varied and more human-like text, as opposed to deterministic models that always generate the
same output for a given input.
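A tiny sketch of this idea: instead of taking the argmax over the decoder's output logits, we sample from the softmax distribution (optionally sharpened or flattened by a temperature). The logits here are random stand-ins for a real decoder's output at one step.

```python
import torch

def sample_next_token(logits, temperature=1.0):
    """Sample the next token from the decoder's distribution instead of taking argmax."""
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)      # stochastic choice

vocab_size = 10
logits = torch.randn(1, vocab_size)           # stand-in for one decoding step's output

deterministic = logits.argmax(dim=-1)         # always the same continuation
stochastic = sample_next_token(logits)        # varies from call to call
print(deterministic.item(), stochastic.item())
```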

Applications:

Generative Modeling: VAEs with stochastic encoders are used to generate new data instances
(like images, music, or text) that are similar to the training data.

Text Generation: Stochastic decoders in seq2seq models are used in applications like chat bots,
where generating diverse and natural-sounding responses is desirable.

Data Augmentation: Stochastic auto encoders can be used to create variations of existing data,
useful in tasks where data is scarce.

Advantages:

Diversity in Outputs: The introduction of randomness allows these models to generate a variety
of outputs, enhancing creativity and realism in tasks like text generation or image creation.

Better Generalization: Stochastic models can potentially generalize better to new data, as they
learn to model the underlying probability distribution of the data rather than memorizing specific
instances.
Challenges:

Training Difficulty: Stochastic models can be more challenging to train due to the inherent
randomness. Techniques like the reparameterization trick in VAEs are used to make training
more tractable.

Balance between Randomness and Relevance: Finding the right amount of randomness to introduce, so that the outputs are diverse but still relevant and coherent, remains a key design challenge.

CONTRASTIVE ENCODERS

Contrastive encoders refer to a class of models used in machine learning for tasks such as
representation learning and similarity comparison. These models are designed to learn
representations of input data in such a way that similar inputs are mapped to nearby points in the
learned representation space. The key idea behind contrastive encoders is to encourage the model
to pull together representations of similar inputs while pushing apart representations of dissimilar
inputs. This is typically achieved by using a contrastive loss function, which penalizes the model
when similar inputs are far apart in the representation space and when dissimilar inputs are close
together. Contrastive encoders have been used in various domains, including computer vision,
natural language processing, and reinforcement learning. They have been particularly successful
in tasks where learning meaningful representations of data is crucial, such as in unsupervised or
self-supervised learning settings. By learning to capture the underlying structure of the input
data, contrastive encoders can be used to improve performance on downstream tasks such as
classification, clustering, and similarity search.
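As an illustration, here is a PyTorch sketch of one classic pairwise contrastive loss (a margin-based formulation; many variants such as InfoNCE exist), with random tensors standing in for real input pairs and their similarity labels.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(128, 32))   # maps inputs to the representation space

def contrastive_loss(z1, z2, same, margin=1.0):
    """Pull representations of similar pairs together, push dissimilar pairs apart."""
    dist = torch.norm(z1 - z2, dim=1)
    pos = same * dist.pow(2)                                     # similar: minimize distance
    neg = (1 - same) * torch.clamp(margin - dist, min=0).pow(2)  # dissimilar: enforce margin
    return (pos + neg).mean()

a, b = torch.rand(16, 128), torch.rand(16, 128)            # dummy input pairs
labels = torch.randint(0, 2, (16,)).float()                # 1 = similar pair, 0 = dissimilar
loss = contrastive_loss(encoder(a), encoder(b), labels)
loss.backward()
print(loss.item())
```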
