CNN 02 Batch Normalization

Deep Learning Unit 3: CNN (Part 2) Material

Batch normalization

• Batch Norm is a normalization technique applied between the layers of a Neural Network rather than to the raw data.
• It is computed over mini-batches instead of the full data set.
• It serves to speed up training and allows the use of higher learning rates, making learning easier.

Why it's needed?

• One of the most common problems is over-fitting: the model performs very well on the training data but is unable to predict the test data accurately.
• The solution to such a problem is regularization.
• Regularization techniques help to improve a model and allow it to converge faster.
• We have several regularization tools at hand, among them early stopping, dropout, weight initialization techniques, and batch normalization.
• Regularization helps to prevent over-fitting of the model, and the learning process becomes more efficient.
• Normalization is a pre-processing technique used to standardize data.
• Not normalizing the data before training can cause problems in our network, making it drastically harder to train and decreasing its learning speed.
• There are two main methods to normalize data. The most straightforward is to scale it to a range from 0 to 1:

x_{normalized} = \frac{x - m}{x_{max} - x_{min}}

• Here x is the data point to normalize, m the mean of the data set, x_{max} the highest value, and x_{min} the lowest value. This technique is generally used on the inputs of the network.
• The other technique forces the data points to have a mean of 0 and a standard deviation of 1, using the following formula:

x_{normalized} = \frac{x - m}{s}

• Here x is the data point to normalize, m the mean of the data set, and s the standard deviation of the data set. Each data point then mimics a standard normal distribution. With all features on this scale, none of them is biased, and our models learn better.
• In Batch Norm, we use this last technique to normalize batches of data inside the network itself. The normalization formula of Batch Norm is:

z^{N} = \frac{z - m_z}{s_z}

How Is It Applied?

• In a regular feed-forward Neural Network, x_i are the inputs, z the output of the neurons, a the output of the activation functions, and y the output of the network.
• [Figure: feed-forward network with inputs x1, x2, x3; Batch Norm, marked with a red line, is applied to the neurons' output just before the activation function.]
• Usually, a neuron without Batch Norm is computed as follows:

z = g(w, x) + b; \quad a = f(z)

• Here g() is the linear transformation of the neuron, w the weights of the neuron, b the bias of the neuron, and f() the activation function. The model learns the parameters w and b.
• Adding Batch Norm, the computation becomes:

z = g(w, x); \quad z^{N} = \gamma \cdot \frac{z - m_z}{s_z} + \beta; \quad a = f(z^{N})

• z^{N} is the output of Batch Norm, m_z the mean of the neurons' output, s_z the standard deviation of the neurons' output, and \gamma and \beta learnable parameters of Batch Norm.
• The parameters \beta and \gamma shift the mean and the standard deviation, respectively.
• These values are learned over epochs together with the other learning parameters, such as the weights of the neurons, with the aim of decreasing the loss of the model. A short code sketch of this computation follows below.
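To make the formulas above concrete, here is a minimal NumPy sketch of the Batch Norm forward computation: normalize the neurons' output by the mini-batch mean m_z and standard deviation s_z, then apply the learnable \gamma and \beta. The array shapes, the small epsilon added for numerical stability, and the ReLU activation are illustrative assumptions and are not part of the original material.

```python
# Minimal sketch of the Batch Norm forward pass described above (assumed shapes).
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-5):
    """Normalize a mini-batch of neuron outputs z (batch_size x features),
    then scale by gamma and shift by beta."""
    m_z = z.mean(axis=0)              # per-feature mean of the mini-batch
    s_z = z.std(axis=0)               # per-feature standard deviation
    z_hat = (z - m_z) / (s_z + eps)   # z^N = (z - m_z) / s_z, eps avoids division by zero
    return gamma * z_hat + beta       # learnable scale (gamma) and shift (beta)

# Example usage on a random mini-batch of 4 samples with 3 features
z = np.random.randn(4, 3) * 10 + 5    # neuron outputs before the activation
gamma = np.ones(3)                    # initialised to 1: no scaling at first
beta = np.zeros(3)                    # initialised to 0: no shift at first
a = np.maximum(0, batch_norm_forward(z, gamma, beta))  # a = f(z^N) with ReLU as f
```

During training, \gamma and \beta would be updated by gradient descent along with the weights; the sketch only shows the forward computation.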
Hyperparameter optimization

• Hyperparameters are parameter values used to control the learning process, and they have a significant effect on the performance of models.
• Most algorithms come with default values for their hyperparameters, but the default values do not always perform well.
• This is why you need to optimize them in order to find the combination that gives the best performance.
• Hyperparameter optimization is therefore the process of finding the right combination of hyperparameter values to achieve maximum performance on the data in a reasonable amount of time.
• This process plays a vital role in the prediction accuracy of a model.

Batch Size
• To speed up the learning process, the training set is divided into subsets known as batches.

Number of Epochs
• An epoch is one complete cycle of training the model on the training set; training is an iterative process over many epochs.
• The right number of epochs varies from model to model and is determined using the validation error: the number of epochs is increased as long as the validation error keeps decreasing, and when it no longer improves, that is the signal to stop increasing the number of epochs.

Activation function
• The activation function introduces non-linearity into the model.
• A common choice is ReLU; alternatives are sigmoid, tanh, and other activation functions, depending on the task.

Number of hidden layers and units
• It is usually good to add more layers until the test error no longer improves.
• For CNNs, hyperparameters include the size of the kernels, the number of kernels, the length of the strides, and the pooling size, which directly affect the performance and training speed of the network.
• Others are the number of convolution layers, the number of convolution kernels, the number of pooling layers, the number of fully connected layers, and the choice of optimizer.

Learning rate
• The learning rate controls how much the weights are updated in the optimization algorithm.
• [Figure: gradient-descent steps on a loss curve for an optimal, a small, and a large learning rate.]
• If we choose the wrong learning rate: (i) with a rate that is too small we make very slow progress, since we take minimal steps to update the weights; (ii) with a rate that is too large we may never reach the desired point, because the model bounces across the loss function without converging.
• So the learning rate should be neither too high nor too low.

How to optimize hyperparameters

Grid Search
• Grid search performs hyperparameter tuning to determine the optimal values for a given model.
• It works by trying every possible combination of the parameter values you want to test.
• This means the full search can take a lot of time and become very computationally expensive.

Random Search
• Random combinations of the hyperparameter values are tried to find the best configuration for the model.
• The drawback of random search is that it can sometimes miss important points (values) in the search space.

• The main difference between these two techniques: GridSearchCV has to try ALL the parameter combinations, whereas RandomizedSearchCV evaluates only a limited number of randomly chosen combinations. A short scikit-learn sketch of both follows below.
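As an illustration of that difference, here is a small scikit-learn sketch of both approaches. The model (an SVC), the digits toy dataset, and the parameter ranges are illustrative assumptions; the original material does not specify a concrete setup.

```python
# Hedged sketch: grid search vs. random search with scikit-learn (assumed model and ranges).
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Grid search: tries EVERY combination in the grid (3 x 3 = 9 fits per CV fold).
grid = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": [1e-3, 1e-4, 1e-5]},
    cv=3,
)
grid.fit(X, y)
print("Grid search best params:", grid.best_params_)

# Random search: samples only n_iter random combinations from the distributions.
rand = RandomizedSearchCV(
    SVC(),
    param_distributions={"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-5, 1e-1)},
    n_iter=10,
    cv=3,
    random_state=0,
)
rand.fit(X, y)
print("Random search best params:", rand.best_params_)
```

With the same budget of model fits, random search can cover wider value ranges because it samples from continuous distributions instead of exhaustively enumerating a fixed grid, at the cost of possibly skipping some combinations.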
