Introduction to deep learning, Deep
feed forward network
Mr. Sivadasan E T
Associate Professor
Vidya Academy of Science and Technology, Thrissur
Deep feed forward network
Deep feedforward networks, also often called
feedforward neural networks, or multilayer perceptrons
(MLPs), are the quintessential deep learning models.
The goal of a feedforward network is to approximate some
function f ∗. For example, for a classifier, y = f ∗(x) maps an
input x to a category y.
Deep feed forward network
A feedforward network defines a mapping
y = f (x; θ)
and learns the value of the parameters θ that result in the
best function approximation.
These models are called feedforward because information flows
through the function being evaluated from x, through the
intermediate computations used to define f , and finally to the
output y.
Deep feed forward network
Feedforward networks are of extreme importance to machine
learning practitioners.
They form the basis of many important commercial
applications.
For example, the convolutional networks used for object
recognition from photos are a specialized kind of feedforward
network.
Deep feed forward network
Feedforward networks are a conceptual stepping stone on
the path to recurrent networks, which power many natural
language applications.
Feedforward neural networks are called networks because
they are typically represented by composing together many
different functions.
Deep feed forward network
For example,
we might have three functions f^(1), f^(2), and f^(3)
connected in a chain, to form
f(x) = f^(3)(f^(2)(f^(1)(x))).
These chain structures are the most commonly used structures
of neural networks.
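As a rough sketch of this chained structure (a minimal NumPy example; the layer sizes, random weights, and tanh activations are assumptions for illustration, not part of the slides):

```python
import numpy as np

def make_layer(W, b):
    """Return a layer function h -> tanh(W h + b)."""
    return lambda h: np.tanh(W @ h + b)

rng = np.random.default_rng(0)

# Three layer functions with assumed sizes 4 -> 5 -> 5 -> 3.
f1 = make_layer(rng.normal(size=(5, 4)), np.zeros(5))
f2 = make_layer(rng.normal(size=(5, 5)), np.zeros(5))
f3 = make_layer(rng.normal(size=(3, 5)), np.zeros(3))

def f(x):
    # The chain f(x) = f^(3)(f^(2)(f^(1)(x))).
    return f3(f2(f1(x)))

x = rng.normal(size=4)
print(f(x))  # output of the three-layer chain
```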
Deep feed forward network
In this case, f^(1) is called the first layer of the network,
f^(2) is called the second layer, and so on.
Deep feed forward network
The overall length of the chain gives the depth of the model.
It is from this terminology that the name “deep learning”
arises.
The final layer of a feedforward network is called the output
layer.
Deep feed forward network
The behavior of the other layers is not directly specified by the
training data.
Because the training data does not show the desired output for
each of these layers, these layers are called hidden layers.
Deep feed forward network
Finally, these networks are called neural because they are
loosely inspired by neuroscience.
Each hidden layer of the network is typically vector-valued.
The dimensionality of these hidden layers determines the
width of the model.
Initialization
Initialization is particularly important in neural networks
because of the stability issues associated with neural network
training.
Neural networks often exhibit stability problems in the sense
that the activations of each layer either become successively
weaker or successively stronger.
Initialization
The effect is exponentially related to the depth of the network,
and is therefore particularly severe in deep networks.
One way of ameliorating this effect to some extent is to choose
good initialization points in such a way that the gradients are
stable across the different layers.
Initialization
One possible approach to initialize the weights is to generate
random values from a Gaussian distribution with zero mean
and a small standard deviation, such as 10⁻².
Typically, this will result in small random values that are both
positive and negative.
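A minimal sketch of this naive scheme, assuming NumPy and a hypothetical fully connected layer with n_in inputs and n_out outputs:

```python
import numpy as np

def naive_gaussian_init(n_in, n_out, std=1e-2, seed=0):
    """Draw every weight from N(0, std^2); biases start at zero."""
    rng = np.random.default_rng(seed)
    W = rng.normal(loc=0.0, scale=std, size=(n_out, n_in))
    b = np.zeros(n_out)
    return W, b

W, b = naive_gaussian_init(n_in=100, n_out=50)
print(W.mean(), W.std())  # close to 0 and 0.01
```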
Initialization
One problem with this initialization is that it is not sensitive to
the number of inputs to a specific neuron.
For example, if one neuron has only 2 inputs and another has 100
inputs, the output of the latter is far more sensitive to the
average weight because of the additive effect of more inputs
(which shows up as a much larger gradient).
Initialization
Example,
1. Neuron A with 2 Inputs:
Suppose this neuron has only two input
connections.
The output of this neuron depends heavily on each individual
weight, because only two inputs contribute to the output.
Any small change in a single weight can significantly affect
the neuron's output.
Initialization
2. Neuron B with 100 Inputs:
This neuron has 100 input connections.
The effect of each individual weight on the output diminishes
because the contributions of all 100 inputs are combined; even if
a few weights change, the impact on the neuron's output is less
pronounced due to this averaging effect. At the same time,
because 100 weighted inputs are summed, the overall variance of
the output is much larger than for Neuron A.
Initialization
In general, it can be shown that the variance of the outputs
scales linearly with the number of inputs, and therefore the
standard deviation scales with the square root of the number of
inputs.
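The small NumPy experiment below (with unit-variance inputs and a fixed weight scale chosen purely as assumptions) illustrates this: the variance of a neuron's pre-activation grows roughly in proportion to the number of inputs r, so its standard deviation grows like sqrt(r).

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 10_000  # number of independent neurons simulated per setting

for r in (2, 100, 1000):
    W = rng.normal(scale=1e-2, size=(n_trials, r))  # weights ~ N(0, 0.01^2)
    x = rng.normal(size=(n_trials, r))              # unit-variance inputs (assumption)
    pre_activation = (W * x).sum(axis=1)            # sum of r weighted inputs
    print(r, pre_activation.var())                  # grows roughly linearly with r
```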
Initialization
To balance this effect, each weight is initialized to a value drawn
from a Gaussian distribution with standard deviation sqrt(1/r),
where r is the number of inputs to that neuron.
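A minimal sketch of this fan-in-scaled initialization, again assuming NumPy and a hypothetical layer shape:

```python
import numpy as np

def fan_in_gaussian_init(n_in, n_out, seed=0):
    """Draw each weight from N(0, 1/n_in) so that the pre-activation
    variance stays roughly constant regardless of the number of inputs."""
    rng = np.random.default_rng(seed)
    return rng.normal(scale=np.sqrt(1.0 / n_in), size=(n_out, n_in))

for r in (2, 100):
    W = fan_in_gaussian_init(n_in=r, n_out=1000)
    print(r, W.std(), np.sqrt(1.0 / r))  # empirical std vs. target sqrt(1/r)
```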
Xavier initialization or Glorot
Xavier (or Glorot) initialization is a weight initialization
technique designed to help neural networks converge more
efficiently during training.
It was introduced by Xavier Glorot and Yoshua Bengio in their
paper "Understanding the difficulty of training deep feedforward
neural networks."
Xavier initialization or Glorot
The weights are initialized in such a way that:
The variance of the outputs of each layer is the same as the
variance of its inputs.
The gradients during backpropagation have a similar variance
across layers, preventing vanishing or exploding gradients.
This is achieved by carefully scaling the initial weights based on
the number of input and output neurons.
Xavier initialization or Glorot initialization.
Let r_in and r_out respectively be the fan-in and fan-out for a
particular neuron.
The weights are then drawn from a Gaussian distribution with
standard deviation sqrt(2/(r_in + r_out)).
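A minimal sketch of the Gaussian form of Xavier/Glorot initialization described above (NumPy; the fan-in and fan-out values are arbitrary assumptions):

```python
import numpy as np

def glorot_normal_init(fan_in, fan_out, seed=0):
    """Draw weights from N(0, 2 / (fan_in + fan_out)), the Gaussian
    form of Xavier/Glorot initialization."""
    rng = np.random.default_rng(seed)
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(scale=std, size=(fan_out, fan_in))

W = glorot_normal_init(fan_in=256, fan_out=128)
print(W.std(), np.sqrt(2.0 / (256 + 128)))  # empirical std vs. target
```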
Symmetry breaking.
An important consideration when using randomized methods is
symmetry breaking.
If all weights are initialized to the same value (such as 0), all
updates in a layer will move in lock-step.
As a result, identical features will be created by the neurons in a
layer.
It is important to have a source of asymmetry among the neurons
to begin with.
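The toy example below (a NumPy sketch with an assumed two-input network and squared-error loss) illustrates the point: when both hidden neurons start from identical weights they receive identical gradients, so they remain identical after every update, whereas a random start lets them diverge.

```python
import numpy as np

def grad_step(W, v, x, t, lr=0.1):
    """One gradient step for a tiny net: h = tanh(W x), y = v . h,
    loss = 0.5 * (y - t)^2."""
    h = np.tanh(W @ x)
    y = v @ h
    dy = y - t
    dv = dy * h
    dW = np.outer(dy * v * (1 - h**2), x)
    return W - lr * dW, v - lr * dv

x, t = np.array([1.0, -2.0]), 1.0

# Symmetric start: both hidden neurons have identical weights.
W = np.full((2, 2), 0.5)
v = np.full(2, 0.5)
for _ in range(10):
    W, v = grad_step(W, v, x, t)
print(W)  # the two rows are still identical: the neurons never differentiate

# Random start breaks the symmetry.
rng = np.random.default_rng(0)
W, v = rng.normal(size=(2, 2)), rng.normal(size=2)
for _ in range(10):
    W, v = grad_step(W, v, x, t)
print(W)  # rows differ: the neurons learn different features
```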
Thank You!