Batch normalization
• Batch Norm is a normalization technique applied between the layers of a Neural Network, instead of on the raw data.
• It is done along mini-batches instead of the full data set.
• It serves to speed up training and allow the use of higher learning rates, making learning easier.

Why is it needed?
• One of the most common problems is over-fitting: the model performs very well on the training data but is unable to predict the test data accurately.
• The solution to such a problem is regularization.
• Regularization techniques help to improve the model and allow it to converge faster.
• We have several regularization tools at our disposal; some of them are early stopping, dropout, weight initialization techniques, and batch normalization.
• Regularization helps prevent over-fitting of the model, and the learning process becomes more efficient.
• Normalization is a pre-processing technique used to standardize data.
• Not normalizing the data before training can cause problems in our network, making it drastically harder to train and slowing down its learning.
• There are two main methods to normalize our data. The most straightforward is to scale it to a range from 0 to 1:

x_{normalized} = \frac{x - m}{x_{max} - x_{min}}

• Here x is the data point to normalize, m the mean of the data set, x_{max} the highest value, and x_{min} the lowest value. This technique is generally used on the inputs of the data.
• The other technique used to normalize data is to force the data points to have a mean of 0 and a standard deviation of 1, using the following formula:
x_{normalized} = \frac{x - m}{s}
• Here x is the data point to normalize, m the mean of the data set, and s the standard deviation of the data set.
• Now, each data point mimics a standard normal distribution.
• Having all the features on this scale, none of them will have a bias, and therefore our models will learn better. A short sketch of both methods follows.
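To make both formulas concrete, here is a minimal NumPy sketch (the toy values are illustrative; note that the first variant follows the slide's formula, which subtracts the mean m rather than the minimum):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # toy data set

# Scale using the slide's min-max formula: (x - m) / (x_max - x_min)
m = x.mean()
x_minmax = (x - m) / (x.max() - x.min())

# Standardize to mean 0, standard deviation 1: (x - m) / s
s = x.std()
x_standardized = (x - m) / s

print(x_minmax)        # values centered on the mean, scaled by the range
print(x_standardized)  # mean ~0, standard deviation ~1
```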
• In Batch Norm, we use this last technique to normalize batches of data inside the network itself.
• We can define the normalization formula of Batch Norm as:

z^{N} = \frac{z - m_z}{s_z}

How Is It Applied?
• Consider a regular feed-forward Neural Network: x_i are the inputs, z the output of the neurons, a the output of the activation functions, and y the output of the network.

[Figure: feed-forward network with inputs x_1, x_2, x_3; Batch Norm is drawn as a red line before the activation function]
• In the image, Batch Norm (represented with the red line) is applied to the neurons' output just before applying the activation function.
• Usually, a neuron without Batch Norm is computed as follows:

z = g(w, x) + b; \quad a = f(z)

• Here g() is the linear transformation of the neuron, w the weights of the neuron, b the bias of the neuron, and f() the activation function.
• The model learns the parameters w and b. Adding Batch Norm, it looks like this:

z = g(w, x); \quad z^{N} = \frac{z - m_z}{s_z} \cdot \gamma + \beta; \quad a = f(z^{N})

• Here z^N is the output of Batch Norm, m_z the mean of the neurons' output, s_z the standard deviation of the neurons' output, and \gamma and \beta learnable parameters of Batch Norm.
• The parameters \beta and \gamma shift the mean and
standard deviation, respectively.
• These values are learned over epochs, together with the other learning parameters (such as the weights of the neurons), aiming to decrease the loss of the model. A sketch of the computation appears below.
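As an illustration, a minimal NumPy sketch of the Batch Norm computation above for one layer (the eps term and the toy shapes are assumptions for this example; real frameworks also track running statistics for use at inference time):

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-5):
    """Normalize a mini-batch of pre-activations z (shape: batch x units),
    then scale by gamma and shift by beta: z_N = (z - m_z)/s_z * gamma + beta."""
    m_z = z.mean(axis=0)             # per-unit mean over the mini-batch
    s_z = z.std(axis=0)              # per-unit standard deviation
    z_hat = (z - m_z) / (s_z + eps)  # eps avoids division by zero
    return gamma * z_hat + beta

# Toy usage: a mini-batch of 4 samples, 3 units.
z = np.random.randn(4, 3) * 10 + 5
gamma, beta = np.ones(3), np.zeros(3)       # learned during training in practice
z_n = batch_norm_forward(z, gamma, beta)
print(z_n.mean(axis=0), z_n.std(axis=0))    # ~0 mean, ~1 std per unit
```

Hyperparameter optimization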
• Hyperparameters are parameter values used to control the learning process, and they have a significant effect on the performance of models.
• Most algorithms come with default values for their hyperparameters.
• But the default values do not always perform well.
• This is why you need to optimize them in order to find the combination that will give you the best performance.
• So hyperparameter optimization is the process of finding the combination of hyperparameter values that achieves maximum performance on the data in a reasonable amount of time.
• This process plays a vital role in the prediction accuracy of a model.

Batch Size: To enhance the speed of the learning process, the training set is divided into subsets, which are known as batches.
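A small sketch of how a training set might be divided into such batches (the batch_size of 32 and the shuffling choice are illustrative assumptions):

```python
import numpy as np

def iterate_minibatches(X, y, batch_size=32, shuffle=True):
    """Yield (X_batch, y_batch) subsets of a NumPy training set."""
    idx = np.arange(len(X))
    if shuffle:
        np.random.shuffle(idx)  # reshuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]
```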
Number of Epochs: An epoch can be defined as one complete cycle of training over the data set; it represents an iterative learning process. The number of epochs varies from model to model. To determine the right number of epochs, the validation error is taken into account: the number of epochs is increased as long as the validation error keeps decreasing, and once it stops improving, that is the signal to stop increasing the number of epochs (see the sketch below).
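A hedged sketch of this stopping rule (the train_one_epoch and validation_error callables and the patience parameter are assumptions standing in for a real training loop):

```python
def train_with_early_stopping(train_one_epoch, validation_error,
                              max_epochs=100, patience=3):
    """Keep adding epochs until the validation error stops improving.

    train_one_epoch: callable running one pass over the training set.
    validation_error: callable returning the current validation error.
    """
    best_error = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        error = validation_error()
        if error < best_error:
            best_error, epochs_without_improvement = error, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no further reduction in validation error
    return best_error
```

Activation function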
• The activation function introduces non-linearity into the model.
• Alternatives include sigmoid, tanh, and other activation functions, depending on the task.
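For instance, a few common activation functions written with NumPy (a small illustrative sketch):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes values into (0, 1)

def relu(z):
    return np.maximum(0.0, z)        # zero for negative inputs

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), np.tanh(z), relu(z))  # tanh squashes into (-1, 1)
```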
Number of hidden layers and units
• It is usually good to keep adding layers until the test error no longer improves.
• For CNNs, hyperparameters include the size of the kernels, the number of kernels, the stride length, and the pooling size, which directly affect the performance and training speed of the network.
• Others are the number of convolution layers, the number of convolution kernels, the number of pooling layers, the number of fully connected layers, and the optimizer.

Learning rate
• The learning rate controls how much the weights are updated in the optimization algorithm, as the sketch below shows.
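Concretely, here is a minimal gradient-descent sketch on a toy quadratic loss (the loss function and rate are illustrative choices):

```python
# Minimize the toy loss L(w) = (w - 3)^2, whose gradient is dL/dw = 2*(w - 3).
learning_rate = 0.1
w = 0.0
for step in range(50):
    grad = 2.0 * (w - 3.0)
    w -= learning_rate * grad  # the learning rate scales each update
print(w)  # approaches 3.0 with this well-chosen rate

# With learning_rate = 0.001 progress is very slow; with learning_rate = 1.1
# the updates overshoot and w diverges instead of converging.
```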
[Figure: gradient descent with an optimal learning rate, a small learning rate, and a large learning rate]

What if we choose the wrong learning rate?
• i) If the learning rate is too small, progress will be very slow, since we take only minimal steps to update the weights.
• ii) If it is too large, we may never even reach the desired point, since the model can bounce across the loss function without ever converging.
• So, the learning rate should be neither too high nor too low.

How to optimize hyperparameters
Grid Search
• Grid search performs hyperparameter tuning to determine the optimal values for a given model.
• It works by trying every possible combination of the parameter values you want to test in your model.
• This means the entire search takes a lot of time and can get very computationally expensive.

Random Search
• Random combinations of the hyperparameter values are used to find the best solution for the model.
• The drawback of Random Search is that it can sometimes miss important points (values) in the search space.

The main difference between these two techniques
• GridSearchCV has to try ALL the parameter combinations, whereas RandomizedSearchCV can try only a few 'random' combinations out of all the available ones (a sketch of both follows).
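Both techniques are available in scikit-learn; here is a minimal sketch comparing them (the data set, model, parameter grid, and n_iter value are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
params = {"n_estimators": [50, 100, 200], "max_depth": [2, 4, 8, None]}

# Grid search: fits ALL 12 combinations above (times cv folds).
grid = GridSearchCV(RandomForestClassifier(random_state=0), params, cv=3)
grid.fit(X, y)

# Random search: samples only a few combinations (here 5 of the 12).
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), params,
                          n_iter=5, cv=3, random_state=0)
rand.fit(X, y)

print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)
```

The trade-off is visible in the fit counts: the grid search exhausts every combination, while the random search fits only n_iter candidates, which is cheaper but may miss the best point in the grid.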