Regularization
Lecture 09
22 January 2025
Regularization
• This technique discourages learning a more complex or flexible model, to avoid the risk of overfitting.
• Any modification we make to the learning algorithm that is intended to reduce the generalization error, but not its training error.
Regularization with Modified Loss Functions
• Augment Ordinary Least Squares with regularization term:
• LASSO Regression: L1 Regularization
• Ridge Regression: L2 Regularization
• Elastic Net Regularization
Least Absolute Shrinkage & Selection Operator (LASSO): L1 Regularization
Minimize the cost function = 1) Ordinary Least Squares + 2) Regularization Term:
$$\min_{\beta}\;\left\{\;\sum_{i=1}^{n}\Big(y_i-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^{2}\;+\;\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert\;\right\}$$
forcing some of the coefficients $\beta_j$ exactly to zero.
• L1 penalizes regressors by shrinking their weights
• Regressors that contribute little to error reduction are more penalized
• λ is the weighting factor of the regularization term, tuned to trade off overfitting ↔ underfitting
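As an illustration (not from the slides), here is a minimal sketch of L1 regularization using scikit-learn's Lasso, assuming scikit-learn and NumPy are available; the toy data and the alpha value are made up, and alpha plays the role of λ above.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy data: only the first two of ten features actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

# alpha corresponds to the weighting factor lambda above.
lasso = Lasso(alpha=0.1).fit(X, y)

# The L1 penalty shrinks uninformative coefficients exactly to zero.
print(np.round(lasso.coef_, 3))
```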
Ridge Regression: L2 Regularization
Minimize the cost function = 1) Ordinary Least Squares + 2) Regularization Term:
$$\min_{\beta}\;\left\{\;\sum_{i=1}^{n}\Big(y_i-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^{2}\;+\;\lambda\sum_{j=1}^{p}\beta_j^{2}\;\right\}$$
forcing the coefficients $\beta_j$ toward zero (but, unlike L1, not exactly to zero).
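For illustration (not part of the lecture), a minimal NumPy sketch of the closed-form ridge solution, which adds λI to the normal equations; the toy data and the λ value are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=200)

lam = 1.0  # the weighting factor lambda
# Closed-form ridge solution: (X^T X + lambda I)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Larger lambda shrinks the weights toward zero (but not exactly to zero).
print(np.round(w_ridge, 3))
```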
Elastic Net Regularization
• Combines the L1 and L2 penalties in a single cost function:
$$\min_{\beta}\;\left\{\;\sum_{i=1}^{n}\Big(y_i-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^{2}\;+\;\lambda_1\sum_{j=1}^{p}\lvert\beta_j\rvert\;+\;\lambda_2\sum_{j=1}^{p}\beta_j^{2}\;\right\}$$
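A minimal sketch with scikit-learn's ElasticNet, assuming scikit-learn is available; note that scikit-learn parameterizes the penalty with an overall strength alpha and a mixing ratio l1_ratio rather than separate λ₁ and λ₂.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

# alpha sets the overall penalty strength; l1_ratio balances the
# L1 term (lambda_1) against the L2 term (lambda_2).
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(np.round(enet.coef_, 3))
```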
Dropout Regularization
[Figure: a small network with inputs x₁, x₂, x₃, x₄ and output ŷ, illustrating dropout of hidden units]
Dropout Regularization: Prevents Overfitting
This technique has also become popular recently. We drop out some of the hidden units for specific training examples: different hidden units may be switched off for different examples, and in different iterations of the optimization different units may be dropped at random.
The dropout rate can also differ across layers. We can select layers that have a larger number of units and may be contributing more towards overfitting; these layers are suitable for higher dropout rates. For some layers the dropout rate can be 0, which means no dropout.
Layer-wise Dropout
[Figure: per-layer dropout rates across the network: 0, 0.2, 0.2, 0, 0, 0]
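A minimal sketch of layer-wise dropout rates, assuming PyTorch; the layer sizes are hypothetical, and the rates (0, 0.2, 0.2, 0) mirror the figure above.

```python
import torch.nn as nn

# Hypothetical layer sizes; dropout rates follow the figure: 0, 0.2, 0.2, 0.
model = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),     # input layer: no dropout
    nn.Dropout(p=0.2),               # wide hidden layer: higher dropout rate
    nn.Linear(64, 64), nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(64, 16), nn.ReLU(),    # smaller layer: no dropout
    nn.Linear(16, 1),                # output layer: no dropout
)

model.train()  # dropout is active only in training mode
model.eval()   # at test time, dropout layers pass activations through unchanged
```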
Dropout
• Dropout also helps spread out the weights across the layers, since the network becomes reluctant to put too much weight on any specific node. It therefore helps shrink the weights and has an adaptive effect on them.
• Dropout has a similar effect to L2 regularization in reducing overfitting.
• We don't use dropout for test examples.
• We also need to scale up the values at the output of each layer to compensate for the dropped units (inverted-dropout scaling).
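A minimal NumPy sketch (not the lecture's code) of the inverted-dropout idea in the last bullet: a random mask switches units off during training and the surviving activations are scaled up by 1/keep_prob, so nothing extra is needed at test time.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(a, keep_prob=0.8, training=True):
    """Apply inverted dropout to a layer's activations `a`."""
    if not training or keep_prob >= 1.0:
        return a                                 # no dropout at test time
    mask = rng.random(a.shape) < keep_prob       # keep each unit with prob keep_prob
    return a * mask / keep_prob                  # scale up so the expected activation is unchanged

a = rng.normal(size=(5, 4))                      # activations for a batch of 5 examples
print(dropout_forward(a, keep_prob=0.8, training=True))
print(dropout_forward(a, training=False))        # identical to a at test time
```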
Data Augmentation
• More training data is one more solution for overfitting, but additional data may be expensive to collect or simply unavailable.
• Flipping the images is one way to increase your data.
• Randomly zooming in and zooming out is another.
• Distorting some of the images, depending on your application, is yet another way to increase your data.
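A minimal sketch of the flip / zoom / distort ideas above, assuming the torchvision library; the transform parameters are illustrative only.

```python
from torchvision import transforms

# Each transform is applied randomly at training time, so every epoch
# effectively sees a slightly different version of each image.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                 # flipping
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),    # random zoom in / out
    transforms.ColorJitter(brightness=0.2, contrast=0.2),   # mild distortion
    transforms.ToTensor(),
])

# augmented = augment(pil_image)  # apply to a PIL image from the training set
```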
Early stopping
[Figure: training error and dev set error vs. number of iterations; the dev set error starts rising after a point]
• Sometimes the dev set error goes down and then starts going up. So you may decide to stop training at the point where the curve starts to turn.
• By stopping partway, we also reduce the number of training iterations and the computation time.
• Early stopping does not go well with orthogonalization: we are stopping the optimization in between to take care of overfitting, which is a different objective than optimization, so it contradicts our original objective of optimizing (w, b) to the minimum possible cost function.
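A minimal sketch of an early-stopping loop with a patience counter (illustrative only; train_one_epoch, eval_dev_error, get_weights and set_weights are hypothetical stand-ins for the model's training, dev-set evaluation, and weight snapshot/restore steps).

```python
def early_stopping_train(model, train_one_epoch, eval_dev_error,
                         max_epochs=100, patience=5):
    """Stop when the dev-set error has not improved for `patience` epochs."""
    best_error = float("inf")
    best_weights = None
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)                  # hypothetical training step
        dev_error = eval_dev_error(model)       # hypothetical dev-set evaluation

        if dev_error < best_error:
            best_error = dev_error
            best_weights = model.get_weights()  # hypothetical snapshot of (w, b)
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                           # dev error has started going up

    model.set_weights(best_weights)             # hypothetical restore of the best (w, b)
    return model
```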
Thank You
For more information, please use the following contacts and links:
gauravsingal789@gmail.com
gaurav.singal@nsut.ac.in
https://www.linkedin.com/in/gauravsingal789/
http://www.gauravsingal.in