Module 2: Training Deep Models

Introduction, setup and initialization - Kaiming and Xavier weight
initializations, vanishing and exploding gradient problems.
Optimization techniques - Gradient Descent (GD), Stochastic GD,
GD with momentum, GD with Nesterov momentum, AdaGrad,
RMSProp, Adam. Regularization techniques - L1 and L2
regularization, early stopping, dataset augmentation, parameter
tying and sharing, ensemble methods, dropout, batch normalization.

Vanishing and Exploding Gradient Problems in Deep Learning:
These problems occur during backpropagation, the process
used to train deep neural networks.
In backpropagation, we compute gradients (partial
derivatives) layer by layer, moving from the output layer back
toward the input layer. These gradients are used to update the
weights in the network using optimization algorithms like
gradient descent.
When you multiply many gradients together (one per layer),
two things can happen:
1. Vanishing Gradients
 If the derivatives (gradients) at each layer are small
numbers (e.g., 0.1), then when multiplied across many
layers, the final gradient becomes extremely small—
almost zero.
 As a result, the earlier layers (closer to the input) get
almost no signal to update their weights.
 This makes the network unable to learn from the data.
 In extreme cases, gradients can become exactly zero,
stopping learning entirely.

2. Exploding Gradients
 If the derivatives at each layer are large numbers (e.g.,
>1), the product of many such derivatives becomes
extremely large.
 This causes huge weight updates and makes the model
unstable—loss becomes NaN, or weights blow up.
 The model cannot converge or learn effectively.
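To see the effect numerically, here is a toy Python sketch (the depth of 30 layers and the per-layer gradient factors 0.1 and 2.0 are illustrative assumptions, not values from the notes):

# Toy illustration: one gradient factor per layer, multiplied across depth.
depth = 30
small_grad = 0.1   # e.g. a saturated sigmoid derivative
large_grad = 2.0   # e.g. a derivative amplified by large weights

vanishing = small_grad ** depth   # ~1e-30: earlier layers get almost no signal
exploding = large_grad ** depth   # ~1e+9: weight updates blow up

print(f"product of {depth} small gradients: {vanishing:.3e}")
print(f"product of {depth} large gradients: {exploding:.3e}")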

To avoid these problems, deep learning practitioners use:


 Activation functions like ReLU (instead of sigmoid or
tanh)
 Batch normalization to stabilize layer inputs
 Gradient clipping to limit very large gradients
 Proper weight initialization (e.g., Xavier, He
initialization)
 Skip connections (e.g., in ResNet) to ease gradient flow
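A minimal PyTorch sketch combining two of these remedies is given below; the layer sizes, learning rate, and clipping threshold are illustrative assumptions, not prescriptions from the notes:

import torch
import torch.nn as nn

# Small ReLU network; the layer sizes are assumed for illustration.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))

# He (Kaiming) initialization, which suits ReLU activations.
for layer in model:
    if isinstance(layer, nn.Linear):
        nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
        nn.init.zeros_(layer.bias)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(32, 64), torch.randn(32, 1)   # dummy batch

loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Gradient clipping: rescale gradients so their global norm is at most 1.0.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()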

What are Bias and Variance?
These are two types of errors that affect how well your model
learns and generalizes.

1. Bias
 Bias is error due to incorrect assumptions in the learning
algorithm.
 A high bias model is too simple to capture the true
pattern in the data.
Example:
 Trying to fit a straight line (linear model) to a complex
curve → the model underfits.
 High bias → Underfitting
 Low accuracy on both training and test data.
2. Variance
 Variance is error due to sensitivity to small changes in
training data.
 A high variance model is too complex and learns noise
as if it were signal.

 A high variance model fits the training data perfectly but
performs poorly on test data.
 High variance → Overfitting
 High training accuracy but low test accuracy.
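As a hedged illustration, the scikit-learn sketch below fits a degree-1 (high bias) and a degree-15 (high variance) polynomial to the same noisy data; the dataset, degrees, and split are made up for this example, and score() reports R^2 rather than classification accuracy:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Noisy samples from a curved function (synthetic data, for illustration only).
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 60))[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.2, size=60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 15):   # degree 1: underfits (high bias); degree 15: overfits (high variance)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree, "train:", round(model.score(X_tr, y_tr), 3),
          "test:", round(model.score(X_te, y_te), 3))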

Weight Initialization
Key points:
1. Weights should be small.
2. Weights should not all be the same (symmetry must be broken).
3. Weights should have good variance.
1. Xavier Initialization (also called Glorot Initialization)
Used for: activation functions like Sigmoid and Tanh.
Goal: keep the variance of outputs and gradients the same
across layers, so the model trains smoothly.
Formula:
If a layer has n_in inputs and n_out outputs, the weights are drawn
with variance
    Var(W) = 2 / (n_in + n_out)
for example W ~ N(0, 2 / (n_in + n_out)), or uniformly in
[-sqrt(6 / (n_in + n_out)), +sqrt(6 / (n_in + n_out))].
Why it helps:
 Sigmoid and Tanh can saturate if values are too high/low.
 Xavier keeps activations within a useful range to avoid this.
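A minimal NumPy sketch of Xavier (normal) initialization follows; the layer sizes are assumed only for illustration:

import numpy as np

def xavier_init(n_in, n_out, seed=0):
    """Glorot/Xavier normal initialization: Var(W) = 2 / (n_in + n_out)."""
    rng = np.random.default_rng(seed)
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, std, size=(n_in, n_out))

# Example layer sizes (assumed, not from the notes).
W = xavier_init(256, 128)
print(W.std(), np.sqrt(2.0 / (256 + 128)))   # empirical std is close to the target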
2. He Initialization (also called Kaiming Initialization)
Used for: activation functions like ReLU or Leaky ReLU.
Formula:
If a layer has n_in input neurons, the weights are drawn with variance
    Var(W) = 2 / n_in
for example W ~ N(0, 2 / n_in).
 Ensures that enough signal passes through ReLU without vanishing.
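A short NumPy sketch (the batch size and layer width are illustrative assumptions) of why this scaling suits ReLU: with Var(W) = 2 / n_in, the mean squared activation after a ReLU layer stays close to that of the input, so the signal does not shrink with depth:

import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 512, 512             # layer width assumed for illustration
x = rng.normal(size=(1000, n_in))  # inputs with unit mean square

W_he = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
h = np.maximum(x @ W_he, 0.0)      # ReLU layer

# Both printed values are close to 1: the He-scaled ReLU layer preserves
# the mean squared activation instead of shrinking it.
print((x ** 2).mean(), (h ** 2).mean())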
OPTIMIZATION TECHNIQUES
Optimizers are algorithms or methods used to change the attributes
of a neural network, such as its weights and learning rate, in order
to reduce the loss.
• There are different types of optimization techniques. Some of
them are:
1. Gradient Descent
2. Stochastic Gradient Descent (SGD)
3. Mini-Batch Stochastic Gradient Descent (MB-SGD)
4. SGD with Momentum
5. Nesterov Accelerated Gradient (NAG)
6. Adaptive Gradient (AdaGrad)
7. AdaDelta
8. RMSProp
9. Adam
10. Nadam
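As a hedged illustration of the basic update rules behind the first two families, here is a tiny Python sketch on a one-dimensional quadratic loss L(w) = (w - 3)^2; the loss, learning rate, momentum coefficient, and step count are all assumptions made only for this example:

# Gradient of the toy loss L(w) = (w - 3)^2.
def grad(w):
    return 2.0 * (w - 3.0)

# Plain gradient descent: w <- w - lr * dL/dw
w, lr = 0.0, 0.1
for _ in range(50):
    w = w - lr * grad(w)
print("GD:", w)          # moves toward the minimizer w = 3

# GD with momentum: a velocity term accumulates past gradients.
w, v, beta = 0.0, 0.0, 0.9
for _ in range(50):
    v = beta * v + grad(w)
    w = w - lr * v
print("Momentum:", w)    # also moves toward w = 3, along a smoothed path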
