
SNS COLLEGE OF ENGINEERING

Kurumbapalayam (Po), Coimbatore – 641 107


Accredited by NAAC-UGC with ‘A’ Grade
Approved by AICTE, Recognized by UGC & Affiliated to Anna University, Chennai

Department of AI & DS

Course Name – 19AD602 DEEP LEARNING

III Year / VI Semester

UNIT-4 OPTIMIZATION AND GENERALIZATION


Topic: OPTIMIZATION IN DEEP LEARNING

GULSHAN BANU.A/ AP/AI AND DS /OPTIMIZATION IN DEEP LEARNING /SNSCE


OPTIMIZATION IN DEEP LEARNING

CASE STUDY:
After applying optimization techniques such as batch normalization, learning rate scheduling, and the Adam optimizer, training time reduces to 4 hours, accuracy improves to 90%, and the model generalizes better to unseen data.


There are various optimization techniques for updating model weights and learning rates, such as Gradient Descent, Stochastic Gradient Descent, Stochastic Gradient Descent with momentum, Mini-Batch Gradient Descent, AdaGrad, RMSProp, AdaDelta, and Adam. These optimizers play a critical role in training neural networks, since they improve the model by adjusting its parameters to minimize the loss function value. Choosing the best optimizer depends on the application.
1. Epoch: the number of times the algorithm iterates over the entire training dataset.
2. Batch size: the number of samples used to compute one update of the model parameters.
3. Sample: a single record of data in a dataset.
4. Learning rate: a parameter that determines the scale of model weight updates.
5. Weights and biases: learnable parameters of a model that regulate the signal between two neurons.


Gradient Descent
A derivative or gradient indicates the direction of increase of a function; the negative derivative or gradient therefore indicates the direction of decrease. Gradient descent uses this fact to minimize the value of the function.
In gradient descent, we initialize the variables with random values. Then:
1. We calculate the derivative/gradient for each variable.
2. We take a step in the direction of the negative derivative/gradient, scaled by a learning rate. The learning rate controls the descent: too large a learning rate may cause oscillations, while too small a learning rate leads to slow convergence, so choosing a suitable learning rate is critical.
3. We repeat steps 1 and 2 until a convergence criterion is reached.


Formula (a code sketch follows the definitions below):

θ(k+1) = θk − α∇J(θk)

where:
● θ(k+1) is the updated parameter vector at the (k+1)th iteration.
● θk is the current parameter vector at the kth iteration.
● α is the learning rate, a positive scalar that determines the step size at each iteration.
● ∇J(θk) is the gradient of the cost (loss) function J with respect to the parameters θk.
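As an illustration, here is a minimal NumPy sketch of this update rule applied to a small hypothetical quadratic loss; the matrix, learning rate, and iteration count are illustrative choices only, not part of the original material.

import numpy as np

# Hypothetical quadratic loss J(theta) = ||A @ theta - b||^2, used only to
# demonstrate the update theta(k+1) = theta(k) - alpha * grad J(theta(k)).
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])

def loss(theta):
    r = A @ theta - b
    return float(r @ r)

def grad(theta):
    return 2.0 * A.T @ (A @ theta - b)

theta = np.random.randn(2)          # random initialization
alpha = 0.1                         # learning rate (step size)
for k in range(1000):
    g = grad(theta)
    if np.linalg.norm(g) < 1e-6:    # convergence criterion on the gradient norm
        break
    theta = theta - alpha * g       # step in the negative-gradient direction

print("theta =", theta, "loss =", loss(theta))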


Gradient Descent with Armijo (Backtracking) Line Search:

It is a variant of gradient descent in which we ensure that the step size taken is large enough to reduce the objective function sufficiently, thereby avoiding needlessly small steps. Here the step size is determined through a line search that must satisfy the Armijo condition. The process is as follows:
1. Initialization: we set an initial guess for the objective function f(x).
2. Gradient: we compute the gradient of the objective function, ∇f(x).
3. Line Search: we start with a large step size α and check whether the reduction in the function value (old value minus updated value) satisfies the condition below, known as the Armijo condition:

f(x(t−1)) − f(x(t−1) − α∇f(x(t−1))) ≥ c α ‖∇f(x(t−1))‖²


Here:
● We are trying to find the value x(t) at time step t, and x(t−1) is the value at step t−1.
● α is the step size.
● c is a constant between 0 and 1.
● If we do not get the required reduction, we shrink the step size by a factor β ∈ (0, 1) and try again, iterating until the Armijo condition above is satisfied.
● Why this value? A first-order Taylor expansion shows that the decrease predicted for f(x) is approximately "step size × ‖∇f(x)‖²". This full theoretical decrease is usually not achievable in practice, which is why we only require a fraction c of it.
4. Update: update the solution parameters with the chosen step size.
5. Convergence Check: this can be done by examining the magnitude of the gradient, the change in the objective function value, or other convergence criteria. A minimal code sketch of this backtracking procedure is given below.
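The following is a minimal NumPy sketch of gradient descent with Armijo backtracking, assuming a simple hypothetical objective; the function, c, and β values are illustrative only.

import numpy as np

def backtracking_gd(f, grad_f, x0, alpha0=1.0, c=1e-4, beta=0.5,
                    max_iters=200, tol=1e-6):
    # Gradient descent where each step size is chosen by Armijo backtracking.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:              # convergence check on gradient norm
            break
        alpha = alpha0                           # start from a large step size
        # Shrink alpha by beta until the Armijo (sufficient decrease) condition holds:
        # f(x) - f(x - alpha*g) >= c * alpha * ||g||^2
        while f(x) - f(x - alpha * g) < c * alpha * (g @ g):
            alpha *= beta
        x = x - alpha * g                        # update with the accepted step size
    return x

# Example: minimize a hypothetical quadratic f(x) = x1^2 + 10*x2^2
f = lambda x: x[0]**2 + 10.0 * x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
print(backtracking_gd(f, grad_f, x0=[3.0, -2.0]))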


Gradient descent with Armijo Full Relaxation condition:

It is an optimization algorithm that combines the Armijo line search condition with a full Newton step. It uses both first-derivative (gradient) and second-derivative (Hessian) information to find a step size that ensures a sufficient decrease in the objective function while incorporating information about the curvature of the function. The steps mirror the procedure above (see the sketch after this list):
1. Initialization
2. Gradient
3. Line Search
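A minimal sketch of this idea, assuming the full Newton step H⁻¹∇f is damped by Armijo backtracking; the objective and constants here are hypothetical and chosen only for illustration.

import numpy as np

def newton_armijo(f, grad_f, hess_f, x0, c=1e-4, beta=0.5,
                  max_iters=100, tol=1e-8):
    # Newton's method where the full Newton step is backtracked until the
    # Armijo sufficient-decrease condition holds.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess_f(x), g)        # Newton direction H^{-1} g
        alpha = 1.0                              # start from the full Newton step
        while f(x) - f(x - alpha * d) < c * alpha * (g @ d):
            alpha *= beta                        # damp the step until sufficient decrease
        x = x - alpha * d
    return x

# Example on a hypothetical quadratic f(x) = x1^2 + 10*x2^2
f = lambda x: x[0]**2 + 10.0 * x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
hess_f = lambda x: np.diag([2.0, 20.0])
print(newton_armijo(f, grad_f, hess_f, x0=[3.0, -2.0]))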


Stochastic Gradient Descent (SGD):

It is a variation of the Gradient Descent algorithm. In Gradient Descent, we analyze the entire dataset in each step, which may not be efficient when dealing with very large datasets. To address this issue, we use Stochastic Gradient Descent (SGD), which processes just one example at a time to perform a single update step. So, if the dataset contains 10,000 rows, SGD will update the model parameters 10,000 times in one pass through the dataset, as opposed to just once in the case of Gradient Descent.
Here is the process (a code sketch follows the list):
1. Select an example from the dataset.
2. Calculate its gradient.
3. Use the gradient calculated in step 2 to update the model weights.
4. Repeat steps 1 to 3 for all examples in the training dataset.
5. Completing a full pass through all the examples constitutes one epoch.
6. Repeat this entire process for the number of epochs specified during training.
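A minimal sketch of these steps for a linear model with squared-error loss, assuming synthetic data; the dataset, learning rate, and epoch count are illustrative only.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))               # 10,000 examples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=10_000)

w = np.zeros(3)                                # initialize weights
lr = 0.01                                      # learning rate
epochs = 5

for epoch in range(epochs):
    for i in rng.permutation(len(X)):          # one example at a time
        err = X[i] @ w - y[i]
        grad = 2.0 * err * X[i]                # gradient of (x_i . w - y_i)^2
        w -= lr * grad                         # one parameter update per example
    # one epoch = one full pass over all 10,000 examples (10,000 updates)

print("estimated weights:", w)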


Mini-Batch Stochastic Gradient Descent:

Mini-batch stochastic gradient descent works on mini-batches consisting of a predetermined number of training examples, smaller than the full dataset. This approach combines the advantages of the previously mentioned variants. In one epoch, after the fixed-size mini-batches have been created, we execute the following steps (a code sketch follows the list):
1. Select a mini-batch.
2. Compute the mean gradient of the mini-batch.
3. Apply the mean gradient obtained in step 2 to update the model's weights.
4. Repeat steps 1 to 3 for all the mini-batches that have been created.
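A minimal sketch of one epoch of mini-batch SGD for the same hypothetical linear model as above; the batch size and learning rate are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=10_000)

w = np.zeros(3)
lr = 0.05
batch_size = 64

idx = rng.permutation(len(X))                      # shuffle once per epoch
for start in range(0, len(X), batch_size):
    batch = idx[start:start + batch_size]          # 1. select a mini-batch
    err = X[batch] @ w - y[batch]
    grad = 2.0 * X[batch].T @ err / len(batch)     # 2. mean gradient of the batch
    w -= lr * grad                                 # 3. update the weights
    # 4. the loop continues over all remaining mini-batches

print("estimated weights:", w)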


THANK YOU
