Batch normalization
• Batch Norm is a normalization technique applied between the layers of a Neural Network, instead of on the raw data.
• It is done along mini-batches instead of the full data set.
• It serves to speed up training and allow the use of higher learning rates, making learning easier.

Why is it needed?
• One of the most common problems is over-fitting: the model performs very well on the training data but is unable to predict the test data accurately.
• The solution to such a problem is regularization.
• Regularization techniques help to improve the model and allow it to converge faster.
• We have several regularization tools at our disposal; some of them are early stopping, dropout, weight initialization techniques, and batch normalization.
• Regularization helps prevent over-fitting of the model, and the learning process becomes more efficient.
• Normalization is a pre-processing technique used to standardize data.
• Not normalizing the data before training can cause problems in our network, making it drastically harder to train and slowing down its learning.
• There are two main methods to normalize our data. The most straightforward is to scale it to a range from 0 to 1:

x_{normalized} = \frac{x - m}{x_{max} - x_{min}}

• Here x is the data point to normalize, m the mean of the data set, x_{max} the highest value, and x_{min} the lowest value. This technique is generally used on the inputs of the data.
• The other technique used to normalize data is to force the data points to have a mean of 0 and a standard deviation of 1, using the following formula:
x_{normalized} = \frac{x - m}{s}
• Here x is the data point to normalize, m the mean of the data set, and s the standard deviation of the data set.
• Now, each data point mimics a standard normal distribution.
• Having all the features on this scale, none of them will have a bias, and therefore our models will learn better. A short sketch of both methods follows.
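To make both formulas concrete, here is a minimal NumPy sketch (the toy values are illustrative; note that the first variant follows the slide's formula, which subtracts the mean m rather than the minimum):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # toy data set

# Scale using the slide's min-max formula: (x - m) / (x_max - x_min)
m = x.mean()
x_minmax = (x - m) / (x.max() - x.min())

# Standardize to mean 0, standard deviation 1: (x - m) / s
s = x.std()
x_standardized = (x - m) / s

print(x_minmax)        # values centered on the mean, scaled by the range
print(x_standardized)  # mean ~0, standard deviation ~1
```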
• In Batch Norm, we use this last technique to normalize batches of data inside the network itself.
• We can define the normalization formula of Batch Norm as:

z^{N} = \frac{z - m_z}{s_z}

How Is It Applied?
• Consider a regular feed-forward Neural Network: x_i are the inputs, z the output of the neurons, a the output of the activation functions, and y the output of the network.

[Figure: feed-forward network with inputs x_1, x_2, x_3; Batch Norm is drawn as a red line before the activation function]
• In the image, Batch Norm (represented with the red line) is applied to the neurons' output just before applying the activation function.
• Usually, a neuron without Batch Norm is computed as follows:

z = g(w, x) + b; \quad a = f(z)

• Here g() is the linear transformation of the neuron, w the weights of the neuron, b the bias of the neuron, and f() the activation function.
• The model learns the parameters w and b. Adding Batch Norm, it looks like this:

z = g(w, x); \quad z^{N} = \frac{z - m_z}{s_z} \cdot \gamma + \beta; \quad a = f(z^{N})

• Here z^N is the output of Batch Norm, m_z the mean of the neurons' output, s_z the standard deviation of the neurons' output, and \gamma and \beta learnable parameters of Batch Norm.
• The parameters \beta and \gamma shift the mean and
standard deviation, respectively.
• These values are learned over epochs, together with the other learning parameters (such as the weights of the neurons), aiming to decrease the loss of the model. A sketch of the computation appears below.
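As an illustration, a minimal NumPy sketch of the Batch Norm computation above for one layer (the eps term and the toy shapes are assumptions for this example; real frameworks also track running statistics for use at inference time):

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-5):
    """Normalize a mini-batch of pre-activations z (shape: batch x units),
    then scale by gamma and shift by beta: z_N = (z - m_z)/s_z * gamma + beta."""
    m_z = z.mean(axis=0)             # per-unit mean over the mini-batch
    s_z = z.std(axis=0)              # per-unit standard deviation
    z_hat = (z - m_z) / (s_z + eps)  # eps avoids division by zero
    return gamma * z_hat + beta

# Toy usage: a mini-batch of 4 samples, 3 units.
z = np.random.randn(4, 3) * 10 + 5
gamma, beta = np.ones(3), np.zeros(3)       # learned during training in practice
z_n = batch_norm_forward(z, gamma, beta)
print(z_n.mean(axis=0), z_n.std(axis=0))    # ~0 mean, ~1 std per unit
```

Hyperparameter optimization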
• Hyperparameters are parameter values used to control the learning process, and they have a significant effect on the performance of models.
• Most algorithms come with default values for their hyperparameters.
• But the default values do not always perform well.
• This is why you need to optimize them in order to find the combination that will give you the best performance.
• So hyperparameter optimization is the process of finding the combination of hyperparameter values that achieves maximum performance on the data in a reasonable amount of time.
• This process plays a vital role in the prediction accuracy of a model.

Batch Size: To enhance the speed of the learning process, the training set is divided into subsets, which are known as batches.
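A small sketch of how a training set might be divided into such batches (the batch_size of 32 and the shuffling choice are illustrative assumptions):

```python
import numpy as np

def iterate_minibatches(X, y, batch_size=32, shuffle=True):
    """Yield (X_batch, y_batch) subsets of a NumPy training set."""
    idx = np.arange(len(X))
    if shuffle:
        np.random.shuffle(idx)  # reshuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]
```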
Number of Epochs: An epoch can be defined as one complete cycle of training over the data set; it represents an iterative learning process. The number of epochs varies from model to model. To determine the right number of epochs, the validation error is taken into account: the number of epochs is increased as long as the validation error keeps decreasing, and once it stops improving, that is the signal to stop increasing the number of epochs (see the sketch below).
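A hedged sketch of this stopping rule (the train_one_epoch and validation_error callables and the patience parameter are assumptions standing in for a real training loop):

```python
def train_with_early_stopping(train_one_epoch, validation_error,
                              max_epochs=100, patience=3):
    """Keep adding epochs until the validation error stops improving.

    train_one_epoch: callable running one pass over the training set.
    validation_error: callable returning the current validation error.
    """
    best_error = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        error = validation_error()
        if error < best_error:
            best_error, epochs_without_improvement = error, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no further reduction in validation error
    return best_error
```

Activation function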
• The activation function introduces non-linearity into the model.
• Alternatives include sigmoid, tanh, and other activation functions, depending on the task.
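For instance, a few common activation functions written with NumPy (a small illustrative sketch):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes values into (0, 1)

def relu(z):
    return np.maximum(0.0, z)        # zero for negative inputs

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), np.tanh(z), relu(z))  # tanh squashes into (-1, 1)
```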
Number of hidden layers and units
• It is usually good to keep adding layers until the test error no longer improves.
• For CNNs, hyperparameters include the size of the kernels, the number of kernels, the stride length, and the pooling size, which directly affect the performance and training speed of the network.
• Others are the number of convolution layers, the number of convolution kernels, the number of pooling layers, the number of fully connected layers, and the optimizer.

Learning rate
• The learning rate controls how much the weights are updated in the optimization algorithm, as the sketch below shows.
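Concretely, here is a minimal gradient-descent sketch on a toy quadratic loss (the loss function and rate are illustrative choices):

```python
# Minimize the toy loss L(w) = (w - 3)^2, whose gradient is dL/dw = 2*(w - 3).
learning_rate = 0.1
w = 0.0
for step in range(50):
    grad = 2.0 * (w - 3.0)
    w -= learning_rate * grad  # the learning rate scales each update
print(w)  # approaches 3.0 with this well-chosen rate

# With learning_rate = 0.001 progress is very slow; with learning_rate = 1.1
# the updates overshoot and w diverges instead of converging.
```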
[Figure: gradient descent with an optimal learning rate, a small learning rate, and a large learning rate]

What if we choose the wrong learning rate?
• i) If the learning rate is too small, progress will be very slow, since we take only minimal steps to update the weights.
• ii) If it is too large, we may never even reach the desired point, since the model can bounce across the loss function without ever converging.
• So, the learning rate should be neither too high nor too low.

How to optimize hyperparameters
Grid Search
• Grid search performs hyperparameter tuning to determine the optimal values for a given model.
• It works by trying every possible combination of the parameter values you want to test in your model.
• This means the entire search takes a lot of time and can get very computationally expensive.

Random Search
• Random combinations of the hyperparameter values are used to find the best solution for the model.
• The drawback of Random Search is that it can sometimes miss important points (values) in the search space.

The main difference between these two techniques
• GridSearchCV has to try ALL the parameter combinations, whereas RandomizedSearchCV can try only a few 'random' combinations out of all the available ones (a sketch of both follows).
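Both techniques are available in scikit-learn; here is a minimal sketch comparing them (the data set, model, parameter grid, and n_iter value are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
params = {"n_estimators": [50, 100, 200], "max_depth": [2, 4, 8, None]}

# Grid search: fits ALL 12 combinations above (times cv folds).
grid = GridSearchCV(RandomForestClassifier(random_state=0), params, cv=3)
grid.fit(X, y)

# Random search: samples only a few combinations (here 5 of the 12).
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), params,
                          n_iter=5, cv=3, random_state=0)
rand.fit(X, y)

print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)
```

The trade-off is visible in the fit counts: the grid search exhausts every combination, while the random search fits only n_iter candidates, which is cheaper but may miss the best point in the grid.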