
Introduction to Deep Learning, Deep Feed Forward Network

Mr. Sivadasan E T
Associate Professor
Vidya Academy of Science and Technology, Thrissur
Deep feed forward network
Deep feedforward networks, also often called feedforward neural networks or multilayer perceptrons (MLPs), are the quintessential deep learning models.

The goal of a feedforward network is to approximate some function f*. For example, for a classifier, y = f*(x) maps an input x to a category y.
Deep feed forward network
A feedforward network defines a mapping y = f(x; θ) and learns the value of the parameters θ that result in the best function approximation.

These models are called feedforward because information flows through the function being evaluated from x, through the intermediate computations used to define f, and finally to the output y.
Deep feed forward network

Feedforward networks are of extreme importance to machine learning practitioners. They form the basis of many important commercial applications.

For example, the convolutional networks used for object recognition from photos are a specialized kind of feedforward network.
Deep feed forward network

Feedforward networks are a conceptual stepping stone on the path to recurrent networks, which power many natural language applications.

Feedforward neural networks are called networks because they are typically represented by composing together many different functions.
Deep feed forward network

For example, we might have three functions f(1), f(2), and f(3) connected in a chain, to form

f(x) = f(3)(f(2)(f(1)(x))).

These chain structures are the most commonly used structures of neural networks.
Deep feed forward network

These chain structures are the most commonly used structures of neural networks.

In this case, f(1) is called the first layer of the network, f(2) is called the second layer, and so on.
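As a minimal sketch (not from the slides), the chain f(x) = f(3)(f(2)(f(1)(x))) can be written directly as composed layer functions; the layer sizes and weights below are illustrative placeholders:

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4))   # weights of the first layer f(1)
W2 = rng.normal(size=(4, 4))   # weights of the second layer f(2)
W3 = rng.normal(size=(4, 1))   # weights of the output layer f(3)

def f1(x): return np.tanh(x @ W1)
def f2(h): return np.tanh(h @ W2)
def f3(h): return h @ W3

x = np.array([[1.0, -1.0]])
y = f3(f2(f1(x)))              # f(x) = f(3)(f(2)(f(1)(x)))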
Deep feed forward network

The overall length of the chain gives the depth of the model. It is from this terminology that the name "deep learning" arises.

The final layer of a feedforward network is called the output layer.
Deep feed forward network

The behavior of the other layers is not directly specified by the training data.

Because the training data does not show the desired output for each of these layers, these layers are called hidden layers.
Deep feed forward network

Finally, these networks are called neural because they are loosely inspired by neuroscience.

Each hidden layer of the network is typically vector-valued. The dimensionality of these hidden layers determines the width of the model.
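As a rough illustration of depth and width (the sizes and helper names below are made up for this sketch), the following Python code builds a feedforward network whose depth is the number of weight matrices and whose width is the dimensionality of each hidden layer:

import numpy as np

def make_mlp(n_in, width, depth, n_out, rng):
    # One weight matrix per layer; hidden layers all share the same width.
    sizes = [n_in] + [width] * (depth - 1) + [n_out]
    return [rng.normal(0.0, 0.1, size=(sizes[i], sizes[i + 1])) for i in range(depth)]

def forward(x, weights):
    h = x
    for W in weights[:-1]:      # hidden layers (vector-valued)
        h = np.tanh(h @ W)
    return h @ weights[-1]      # output layer

rng = np.random.default_rng(0)
weights = make_mlp(n_in=3, width=8, depth=4, n_out=2, rng=rng)   # depth 4, width 8
y = forward(np.ones((1, 3)), weights)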
Initialization

Initialization is particularly important in neural networks because of the stability issues associated with neural network training.

Neural networks often exhibit stability problems in the sense that the activations of each layer either become successively weaker or successively stronger.
Initialization
The effect is exponentially related to the depth of the network, and is therefore particularly severe in deep networks.

One way of ameliorating this effect to some extent is to choose good initialization points in such a way that the gradients are stable across the different layers.
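A small numerical sketch of this effect (the layer width, depth, and weight scales below are arbitrary choices for illustration): with a weight scale that is too small the activations shrink towards zero layer by layer, and with a scale that is too large they are pushed into saturation.

import numpy as np

rng = np.random.default_rng(0)
width, depth = 100, 20

for scale in (0.01, 0.2):                  # too small vs. too large weight scale
    h = rng.normal(size=(1, width))
    for _ in range(depth):
        W = rng.normal(0.0, scale, size=(width, width))
        h = np.tanh(h @ W)
    # Mean absolute activation after 20 layers: near zero for 0.01, near saturation for 0.2.
    print(scale, float(np.abs(h).mean()))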
Initialization

One possible approach to initialize the weights is to generate random values from a Gaussian distribution with zero mean and a small standard deviation, such as 10^-2.

Typically, this will result in small random values that are both positive and negative.
Initialization

One problem with this initialization is that it is not sensitive to the number of inputs to a specific neuron.

For example, if one neuron has only 2 inputs and another has 100 inputs, the output of the latter is far more sensitive to the average weight because of the additive effect of more inputs (which will show up as a much larger gradient).
Initialization

One problem with this initialization is that it is not sensitive to the number of inputs to a specific neuron.
Initialization
Example:

1. Neuron A with 2 inputs:
- Suppose this neuron has only two input connections.
- The output of this neuron depends heavily on the values of its weights, because there are fewer inputs contributing to the output. Any small change in the weights can significantly affect the neuron's output.
Initialization

2. Neuron B with 100 inputs:
- This neuron has 100 input connections.
- The effect of each individual weight on the output diminishes because the outputs from all 100 inputs combine. Even if a few weights change, the impact on the neuron's output is less pronounced due to the averaging effect.
Initialization

In general, it can be shown that the variance of the outputs linearly scales with the number of inputs, and therefore the standard deviation scales with the square root of the number of inputs.
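This scaling is easy to check numerically; the sample counts and input values below are arbitrary, and the snippet only verifies the stated relationship:

import numpy as np

rng = np.random.default_rng(0)
for r in (2, 100):                              # fan-in of neuron A vs. neuron B
    w = rng.normal(0.0, 0.01, size=(100000, r)) # many independently initialized neurons
    x = rng.normal(size=(r,))                   # one fixed input vector
    out = w @ x                                 # pre-activation of each neuron
    print(r, out.var())                         # variance grows roughly linearly with r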
Initialization

To balance this fact, each weight is initialized to a value drawn from a Gaussian distribution with standard deviation sqrt(1/r), where r is the number of inputs to that neuron.

Xavier initialization or Glorot initialization

Xavier (Glorot) initialization is a weight initialization technique designed to help neural networks converge more efficiently during training.

It was introduced by Xavier Glorot and Yoshua Bengio in their paper "Understanding the difficulty of training deep feedforward neural networks."
Xavier initialization or Glorot initialization

The weights are initialized in such a way that:
- The variance of the outputs of each layer is the same as the variance of its inputs.
- The gradients during backpropagation have a similar variance across layers, preventing vanishing or exploding gradients.

This is achieved by carefully scaling the initial weights based on the number of input and output neurons.
Xavier initialization or Glorot initialization.

Let r_in and r_out respectively be the fan-in and fan-out for a particular neuron.

Then the weights are drawn from a Gaussian distribution with standard deviation sqrt(2/(r_in + r_out)).
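A sketch of Glorot (Xavier) initialization following the formula above; the layer sizes are placeholders:

import numpy as np

def glorot_normal(r_in, r_out, rng):
    # Gaussian Xavier/Glorot initialization: std = sqrt(2 / (r_in + r_out)).
    std = np.sqrt(2.0 / (r_in + r_out))
    return rng.normal(0.0, std, size=(r_in, r_out))

rng = np.random.default_rng(0)
W = glorot_normal(r_in=256, r_out=128, rng=rng)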
Symmetry breaking.

An important consideration when using randomized methods is symmetry breaking.

If all weights are initialized to the same value (such as 0), all updates will move in lock-step in a layer. As a result, identical features will be created by the neurons in a layer.

It is important to have a source of asymmetry among the neurons to begin with.
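A small sketch of this lock-step behaviour, using a made-up two-layer network and a squared-error loss: when every weight in a layer starts at the same constant, every hidden neuron receives exactly the same gradient, so the neurons remain identical after each update.

import numpy as np

x = np.array([[1.0, 2.0]])
t = np.array([[1.0]])

W1 = np.full((2, 3), 0.5)        # every weight in the hidden layer has the same value
W2 = np.full((3, 1), 0.5)

h = np.tanh(x @ W1)              # all hidden activations are identical
y = h @ W2
dy = y - t                       # gradient of 0.5 * squared error w.r.t. the output
dW1 = x.T @ (dy @ W2.T * (1.0 - h ** 2))   # backpropagated gradient for the hidden layer

print(dW1)                       # every column (one per hidden neuron) is identical

Replacing the identical constants with random draws (for example the Glorot initialization sketched above) gives each neuron a different starting point, so their gradients and learned features diverge.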
Thank You!
