FOUNDATIONS OF DEEP LEARNING
SURYA C
DEPT. OF ADS
Module 2
Training Deep Learning Models
Setup and Initialization Issues
• Complex, high-dimensional loss landscapes tend to have few poor local minima; most critical points are saddle points
• Random weight initialization gives you enough variability to break the symmetry between units
• With random weights, some connections will be strengthened during training and others weakened
• Initial weights that are too small lead to vanishing gradients, and weights that are too large lead to exploding gradients
• It is important to set the variance of the weights in proportion to the size of each layer within the network
• Initializing weights is more important than initializing biases (biases are commonly initialized to zero)
Common Weight Initialization Techniques
• Kaiming (He) Initialization:
• Weights are drawn from a uniform distribution
• The standard deviation is inversely proportional to the number of input units in the layer
• w ∈ U(−σ, σ), with σ = √(1 / N_in) for a linear activation
• In general, σ = √(2 / ((1 + a²) · N_in))
• a = slope of the activation for negative inputs (a = 1 for a linear activation, a = 0 for ReLU)
• Weights of different layers will differ according to the size of each layer and the activation slope
• Specially designed for the ReLU activation function
• Addresses the vanishing gradient problem
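The rule above can be sketched in a few lines. This is a minimal NumPy illustration, not a framework implementation; the helper name `kaiming_uniform` is our own (libraries such as PyTorch ship an equivalent `kaiming_uniform_`):

```python
import numpy as np

def kaiming_uniform(n_in, n_out, a=0.0, seed=0):
    """Draw weights from U(-sigma, sigma), sigma = sqrt(2 / ((1 + a^2) * n_in))."""
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(2.0 / ((1.0 + a ** 2) * n_in))
    return rng.uniform(-sigma, sigma, size=(n_in, n_out))

W = kaiming_uniform(512, 256, a=0.0)        # a = 0 for a ReLU layer
print(W.shape)                              # (512, 256)
print(abs(W).max() <= np.sqrt(2.0 / 512))   # True: every weight is bounded by sigma
```

Note how the spread shrinks as the layer's fan-in `n_in` grows, which is exactly what keeps activations from blowing up in wide layers.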
Common Weight Initialization Techniques
• Xavier (Glorot) Initialization:
• Weights are random numbers drawn from a normal distribution with a mean of 0 (a symmetric distribution)
• The variance is set by the total size of the weight matrix (the number of inputs plus the number of outputs)
• w ∼ N(0, σ²), with σ² = 2 / (N_in + N_out)
• Designed to address the exploding gradient problem
• Suitable for feedforward networks with tanh and sigmoid activation functions
• Assumes a linear relationship between the input and output
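As a small NumPy sketch of the formula above (the helper name `xavier_normal` is illustrative; frameworks provide equivalents such as PyTorch's `xavier_normal_`):

```python
import numpy as np

def xavier_normal(n_in, n_out, seed=0):
    """Draw weights from N(0, sigma^2), sigma^2 = 2 / (n_in + n_out)."""
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, sigma, size=(n_in, n_out))

W = xavier_normal(400, 100)
# The empirical std should be close to the target sigma = sqrt(2/500) ~ 0.0632
print(round(W.std(), 3))
```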
Common Weight Initialization Techniques
• Freezing Weights
• "Freezing" a layer means switching off gradient updates in that layer
• The weights will not change, and thus the layer will no longer learn
• The main application of freezing is fine-tuning a pre-trained model
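A minimal sketch of the idea with a hand-rolled update loop (in a real framework you would instead set `param.requires_grad = False` in PyTorch or `layer.trainable = False` in Keras; the dictionary layout and dummy gradients here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # pre-trained layer: frozen
W2 = rng.normal(size=(3, 1))   # new head: trainable
frozen = {"W1"}                # parameters excluded from updates

params = {"W1": W1.copy(), "W2": W2.copy()}
lr = 0.1
for step in range(5):
    grads = {"W1": np.ones_like(W1), "W2": np.ones_like(W2)}  # dummy gradients
    for name, g in grads.items():
        if name in frozen:
            continue           # freezing = simply skip the gradient update
        params[name] -= lr * g

print(np.allclose(params["W1"], W1))   # True: frozen weights unchanged
print(np.allclose(params["W2"], W2))   # False: trainable weights moved
```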
Vanishing and Exploding Gradient Problems
• Vanishing gradient • Exploding gradient
• Weights don’t change – no
learning
• Weights change wildly – bad
• Problematic for deep networks solution
• The network never stops learning
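Both failure modes come from the same mechanism: backpropagation multiplies one factor per layer, so repeated small (or large) factors shrink (or blow up) the gradient exponentially with depth. A toy calculation, assuming sigmoid activations (maximum slope 0.25) and a single scalar weight per layer:

```python
def gradient_norm(depth, weight_scale):
    """Magnitude of a gradient after backpropagating through `depth` layers."""
    g = 1.0
    for _ in range(depth):
        g *= weight_scale * 0.25   # 0.25 = maximum slope of the sigmoid
    return g

print(gradient_norm(20, 1.0))   # ~9e-13 -> vanishing gradient
print(gradient_norm(20, 8.0))   # ~1e6   -> exploding gradient
```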
Gradient Descent Algorithm
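A minimal sketch of full-batch gradient descent on a toy least-squares problem (the problem and learning rate are illustrative choices, assuming NumPy):

```python
import numpy as np

# theta <- theta - lr * grad, where grad is computed on the FULL dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_theta = np.array([2.0, -1.0])
y = X @ true_theta

theta = np.zeros(2)
lr = 0.1
for _ in range(200):
    grad = X.T @ (X @ theta - y) / len(y)   # gradient of (1/2) * mean squared error
    theta -= lr * grad

print(np.round(theta, 3))   # close to the true parameters [2., -1.]
```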
Stochastic Gradient Descent
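Stochastic gradient descent differs from the full-batch version in one line: the update uses the gradient from a single randomly chosen example instead of the whole dataset. A toy sketch (noiseless problem and hyperparameters are illustrative, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0])

theta = np.zeros(2)
lr = 0.05
for epoch in range(20):
    for i in rng.permutation(len(y)):      # shuffle, then update one example at a time
        xi, yi = X[i], y[i]
        grad = (xi @ theta - yi) * xi      # gradient from ONE example
        theta -= lr * grad

print(np.round(theta, 2))   # approximately [2., -1.]
```

The per-example updates are noisy, which is why SGD typically needs a smaller learning rate than full-batch gradient descent.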
Mini-Batch Gradient Descent
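Mini-batch gradient descent sits between the two extremes: the gradient is averaged over a small batch, trading a little noise for much cheaper updates than full-batch. A toy sketch with an illustrative batch size:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0])

theta = np.zeros(2)
lr, batch_size = 0.1, 32
for epoch in range(50):
    idx = rng.permutation(len(y))                    # reshuffle each epoch
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]
        grad = X[b].T @ (X[b] @ theta - y[b]) / len(b)   # average over the batch
        theta -= lr * grad

print(np.round(theta, 3))   # close to [2., -1.]
```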
Courtesy: Deep learning in Malayalam
Concept of Regularization
• Overfitting is an important problem in neural network training
• We use regularization to avoid overfitting
• Regularization redefines the loss function by adding a penalty term
• J̃(θ; X, y) = J(θ; X, y) + αΩ(θ)
• Two types
• L1 (Lasso) Regularization
• L2 (Ridge) Regularization
L2 Regularization
• Simplest and most common form of regularization
• Commonly known as 'weight decay'
• Also known as Ridge regression or Tikhonov regularization
• This regularization strategy drives the weights closer to the origin by adding a regularization term Ω(θ) = (1/2)‖w‖²
• To simplify, we assume no bias parameters, so θ is just w
• Equivalently:
• J(θ)_L2 = J(θ) + λ Σᵢ₌₁ⁿ wᵢ²
• J(θ) – original cost function without regularization
• wᵢ – weights of the model
• n – total number of weights
• λ – regularization hyperparameter
• L2 regularization encourages all parameters to be small, but not exactly zero
• It prevents the model from becoming too sensitive to small variations in the training data
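The penalized objective above is easy to compute directly. A minimal sketch (the function name and the tiny example values are illustrative):

```python
import numpy as np

def l2_regularized_loss(residuals, w, lam):
    """J_L2 = J(theta) + lam * sum(w_i^2), with J taken as the mean squared residual."""
    return np.mean(residuals ** 2) + lam * np.sum(w ** 2)

w = np.array([0.5, -2.0, 0.1])
r = np.array([0.2, -0.1])
print(l2_regularized_loss(r, w, lam=0.0))   # 0.025: unregularized loss
print(l2_regularized_loss(r, w, lam=0.1))   # 0.025 + 0.1 * 4.26 = 0.451
```

Larger λ pushes the optimizer harder toward small weights; λ = 0 recovers the original loss.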
L1 Regularization
• Also known as Lasso regularization
• L1 regularization adds a penalty term to the loss function that is proportional to the absolute values of the model's parameters
• Ω(θ) = ‖w‖₁ = Σᵢ |wᵢ|
• The full objective is
• J(θ)_L1 = J(θ) + λ Σᵢ₌₁ⁿ |wᵢ|
• J(θ) – original cost function without regularization
• wᵢ – weights of the model
• n – total number of weights
• λ – regularization hyperparameter that controls the strength of the penalty
• L1 regularization encourages the model to have sparse weights, i.e., many parameters become exactly zero
• It selects a subset of important features and can be used for feature selection
• It helps prevent overfitting by making the model simpler
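Why does L1 produce exactly-zero weights while L2 does not? One common way to optimize an L1-penalized objective is proximal (soft-threshold) updates, which snap any weight smaller than the threshold to exactly 0. This is a sketch of that mechanism (the soft-threshold step is a standard Lasso solver technique, not something named on the slide):

```python
import numpy as np

def soft_threshold(w, t):
    """Shrink each weight toward 0 by t; weights within t of 0 become exactly 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

w = np.array([0.03, -0.5, 0.002, 1.2, -0.04])
print(soft_threshold(w, t=0.05))   # [ 0.   -0.45  0.    1.15  0.  ]
```

Three of the five weights land on exactly zero, which is the sparsity (and feature selection) effect described above.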
Methods of Regularization
• Early stopping
• Dataset Augmentation
• Parameter tying and sharing
• Ensemble methods
• Dropout
• Batch Normalization
Early Stopping
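Early stopping halts training when the validation loss stops improving for a set number of epochs (the "patience"), then keeps the best checkpoint. A minimal sketch over a pre-recorded validation curve (function name and the toy loss values are illustrative):

```python
def train_with_early_stopping(val_losses, patience=2):
    """Return (best_epoch, best_loss); stop after `patience` epochs without improvement."""
    best, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0   # new best: save checkpoint here
        else:
            waited += 1
            if waited >= patience:
                break                                    # validation loss keeps rising: stop
    return best_epoch, best

# Validation loss improves, then starts rising -> stop and keep epoch 3
print(train_with_early_stopping([1.0, 0.6, 0.4, 0.35, 0.36, 0.4, 0.5]))  # (3, 0.35)
```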
Dataset Augmentation
• The best way to make an ML model
generalize better is to give more training
data
• However, the amount of data we have is
limited
• Create fake data and add it into the
training set
• This approach is easiest for classification
• We can generate new data by
transferring the input data in the
training set
• Image dataset can be easily created by
transferring pixels in each direction
• Creating fake data and injecting noise to
the data are also considered as dataset
augmentation
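The pixel-shift augmentation mentioned above can be sketched directly (a minimal zero-padded shift; real pipelines would use a library such as torchvision or albumentations):

```python
import numpy as np

def shift_image(img, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, filling vacated pixels with zeros."""
    out = np.zeros_like(img)
    h, w = img.shape
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

img = np.arange(9).reshape(3, 3)
print(shift_image(img, 1, 0))   # rows shifted down by one; top row becomes zeros
```

Each shifted copy keeps the original label, so a handful of shifts multiplies the effective size of a classification dataset.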
Parameter Tying and Sharing
• Techniques used in deep learning to reduce the number of parameters in a neural network model while maintaining its capacity to capture complex patterns in data
• They help mitigate the risk of overfitting and can make models more efficient
• Parameter tying involves using the same set of parameters for multiple layers of a neural network
• It essentially enforces that certain weights are identical or constrained in a particular way
• Examples: embeddings, weight sharing in CNNs
• Parameter sharing is a specific form of parameter tying where the same set of parameters is used across different layers or units in a neural network
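Weight sharing in a CNN is the canonical example: a 1-D convolution applies the SAME small kernel at every input position, so the layer needs only as many parameters as the kernel has taps, regardless of the input size. A minimal sketch (kernel values are illustrative):

```python
import numpy as np

def conv1d(x, kernel):
    """Valid 1-D convolution: one shared kernel slides across all positions."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([0.25, 0.5, 0.25])   # 3 shared parameters for all output positions
print(conv1d(x, kernel))               # [2. 3. 4.]
```

A fully connected layer mapping 5 inputs to 3 outputs would need 15 weights; the shared kernel needs only 3.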
Ensemble Methods
• Bagging
• Boosting
• Stacking
• Random Forests
• Neural Network ensembles
Dropout
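Dropout randomly zeroes a fraction of activations during training so that no unit can rely on any particular co-activated neighbor. A minimal sketch of "inverted" dropout, the common formulation that rescales by 1/keep_prob so the expected activation matches test time (when dropout is switched off):

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Zero each activation with probability (1 - keep_prob); rescale survivors."""
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones(10000)
out = dropout(x, keep_prob=0.8, rng=rng)
print(round(out.mean(), 2))   # ~1.0: expected activation is preserved
print((out == 0).mean())      # ~0.2 of the units were dropped
```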
Batch Normalization
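The core of batch normalization is standardizing each feature over the mini-batch, then applying a learnable scale (γ) and shift (β). A minimal sketch of the training-time forward pass (omitting the running statistics a real layer tracks for inference):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature (column) over the batch, then scale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(np.round(y.mean(axis=0), 6))   # ~0 for every feature
print(np.round(y.std(axis=0), 3))    # ~1 for every feature
```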
Thank You!