
Lecture 3:

CNN: Back-propagation

boris.ginzburg@intel.com

1
Agenda

• Introduction to gradient-based learning for Convolutional NN
• Backpropagation for basic layers
  – Softmax
  – Fully Connected layer
  – Pooling
  – ReLU
  – Convolutional layer
• Implementation of back-propagation for Convolutional layer
• CIFAR-10 training

2
Good Links

1. http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
2. http://www.iro.umontreal.ca/~pift6266/H10/notes/gradient.html#flowgraph

3
Gradient based training

A convolutional NN is a function y = f(x0, w), where

x0 is the input image [28,28],
w – the network parameters (weights, biases),
y – the softmax output: the probability that x0 belongs to one of the 10 classes 0..9.

4
Gradient based training

We want to find the parameters w that minimize the error, i.e. the negative log-probability that the network assigns to the correct label y0:

E( f(x0, w), y0 ) = −log f_{y0}(x0, w)

For this we will do iterative gradient descent:

w(t) = w(t−1) − λ · ∂E/∂w (t−1)
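
As a small, hypothetical illustration of the update rule above (assuming the gradient ∂E/∂w has already been computed into a vector grad), one gradient-descent step is just:

#include <vector>

// One plain gradient-descent step: w <- w - lambda * dE/dw.
// `weights` and `grad` are assumed to have the same length.
void sgd_step(std::vector<float>& weights,
              const std::vector<float>& grad,
              float lambda) {
    for (size_t i = 0; i < weights.size(); ++i)
        weights[i] -= lambda * grad[i];
}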

How do we compute the gradient of E w.r.t. the weights?

The loss E is a chain (composition) of functions, one per layer. Let's go layer by layer, from the last layer back, and use the chain rule for gradients of composed functions:

∂E/∂y_{l−1} = ∂E/∂y_l × ∂y_l(w, y_{l−1})/∂y_{l−1}

∂E/∂w_l = ∂E/∂y_l × ∂y_l(w, y_{l−1})/∂w_l

5
LeNet topology

Soft Max + LogLoss
Inner Product
ReLU
Inner Product
Pooling [2x2, stride 2]
Convolutional layer [5x5]
Pooling [2x2, stride 2]
Convolutional layer [5x5]
Data Layer

(The FORWARD pass flows bottom-up through this stack; the BACKWARD pass retraces it top-down.)
6
Layer:: Backward( )

class Layer {
  Setup(bottom, top);     // initialize layer
  Forward(bottom, top);   // compute: y_l = f(w_l, y_{l−1})
  Backward(top, bottom);  // compute gradients
}

Backward: we start from the gradient ∂E/∂y_l coming from the last layer, and
1) propagate the gradient back: ∂E/∂y_l → ∂E/∂y_{l−1}
2) compute the gradient of E w.r.t. the weights w_l: ∂E/∂w_l
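
As a rough sketch (hypothetical interface, not Caffe's actual classes), the whole backward pass is just a loop over the layers in reverse order, where each layer consumes ∂E/∂y_l and returns ∂E/∂y_{l−1} while accumulating its own weight gradients:

#include <vector>

// Hypothetical minimal layer interface mirroring the pseudo-class above.
struct LayerIface {
    virtual std::vector<float> Forward(const std::vector<float>& bottom) = 0;
    // Takes dE/dy_l (top_diff), returns dE/dy_{l-1} (bottom_diff),
    // and accumulates dE/dw_l internally.
    virtual std::vector<float> Backward(const std::vector<float>& top_diff) = 0;
    virtual ~LayerIface() {}
};

// Backward pass: start from the loss gradient and walk the layers in reverse.
void BackwardPass(std::vector<LayerIface*>& net, std::vector<float> diff) {
    for (int l = (int)net.size() - 1; l >= 0; --l)
        diff = net[l]->Backward(diff);
}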

7
Softmax with LogLoss Layer

Consider the last layer, softmax with log-loss (MNIST example). With scores y_k, k = 0..9, and correct class k0:

E = −log p_{k0} = −log( e^{y_{k0}} / Σ_{k=0..9} e^{y_k} ) = −y_{k0} + log( Σ_{k=0..9} e^{y_k} )

For all k ≠ k0 (wrong answers) we want to decrease p_k:

∂E/∂y_k = e^{y_k} / Σ_{k=0..9} e^{y_k} = p_k

For k = k0 (the right answer) we want to increase p_{k0}:

∂E/∂y_{k0} = p_{k0} − 1 = −(1 − p_{k0})

See http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression
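
A minimal, self-contained sketch of this forward/backward computation (a hypothetical helper, not the Caffe layer; assumes raw scores y and the true class index k0):

#include <cmath>
#include <vector>

// Returns the log-loss E and fills `grad` with dE/dy_k.
float softmax_logloss(const std::vector<float>& y, int k0,
                      std::vector<float>& grad) {
    // Softmax probabilities p_k = exp(y_k) / sum_j exp(y_j),
    // shifted by max(y) for numerical stability.
    float m = y[0];
    for (float v : y) if (v > m) m = v;
    float sum = 0.f;
    grad.resize(y.size());
    for (size_t k = 0; k < y.size(); ++k) {
        grad[k] = std::exp(y[k] - m);
        sum += grad[k];
    }
    for (size_t k = 0; k < y.size(); ++k)
        grad[k] /= sum;                 // grad[k] now holds p_k
    float E = -std::log(grad[k0]);      // E = -log p_{k0}
    grad[k0] -= 1.f;                    // dE/dy_k = p_k - [k == k0]
    return E;
}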

8
Inner product (Fully Connected) Layer

A fully connected layer is just a matrix–vector multiplication:

y_l = W_l · y_{l−1}

So ∂E/∂y_{l−1} = W_l^T · ∂E/∂y_l

and ∂E/∂W_l = ∂E/∂y_l · y_{l−1}^T

Note: we need y_{l−1} here, so we have to keep it from the forward pass.
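
The same two products written out with plain loops (a hedged sketch with an assumed row-major layout for W, out_dim × in_dim; not Caffe's GEMM-based code):

#include <vector>

// Backward pass of y = W * x (no bias); W is rows x cols, row-major.
// top_diff = dE/dy (size rows), x = y_{l-1} saved from forward (size cols).
void fc_backward(const std::vector<float>& W, const std::vector<float>& x,
                 const std::vector<float>& top_diff, int rows, int cols,
                 std::vector<float>& bottom_diff,   // dE/dx, size cols
                 std::vector<float>& weight_diff) { // dE/dW, size rows*cols
    bottom_diff.assign(cols, 0.f);
    weight_diff.assign(rows * cols, 0.f);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            bottom_diff[c]            += W[r * cols + c] * top_diff[r]; // W^T * dE/dy
            weight_diff[r * cols + c]  = top_diff[r] * x[c];            // dE/dy * x^T
        }
}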

9
ReLU Layer

Rectified Linear Unit:

y_l = max(0, y_{l−1})

so ∂E/∂y_{l−1} = 0 if y_{l−1} < 0, and ∂E/∂y_l otherwise.
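
An element-wise sketch of this rule (a hypothetical helper; assumes the layer kept its input x = y_{l−1} from the forward pass):

#include <vector>

// dE/dx_i = dE/dy_i if x_i > 0, else 0.
void relu_backward(const std::vector<float>& x,
                   const std::vector<float>& top_diff,
                   std::vector<float>& bottom_diff) {
    bottom_diff.resize(x.size());
    for (size_t i = 0; i < x.size(); ++i)
        bottom_diff[i] = (x[i] > 0.f) ? top_diff[i] : 0.f;
}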

10
Max-Pooling Layer
Forward (for each output pixel (x, y); y_n(x, y) is initialized to −∞):
for (p = 0; p < k; p++)
  for (q = 0; q < k; q++)
    y_n(x, y) = max( y_n(x, y), y_{n−1}(x + p, y + q) );

Backward:

∂E/∂y_{n−1}(x + p, y + q) = 0 if y_n(x, y) != y_{n−1}(x + p, y + q), and ∂E/∂y_n(x, y) otherwise

i.e. the gradient is routed only to the input pixel that was the maximum of its pooling window.
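
A sketch of both passes for a single feature map (one channel, non-overlapping windows, stride equal to the window size k; hypothetical helpers, not the Caffe layer):

#include <vector>

// Max-pooling over non-overlapping k x k windows (stride = k).
// in: H x W row-major; out and argmax: (H/k) x (W/k).
void maxpool_forward(const std::vector<float>& in, int H, int W, int k,
                     std::vector<float>& out, std::vector<int>& argmax) {
    int oh = H / k, ow = W / k;
    out.assign(oh * ow, 0.f);
    argmax.assign(oh * ow, 0);
    for (int y = 0; y < oh; ++y)
        for (int x = 0; x < ow; ++x) {
            int best = (y * k) * W + (x * k);
            for (int p = 0; p < k; ++p)
                for (int q = 0; q < k; ++q) {
                    int idx = (y * k + p) * W + (x * k + q);
                    if (in[idx] > in[best]) best = idx;
                }
            out[y * ow + x] = in[best];      // max over the window
            argmax[y * ow + x] = best;       // remember who won
        }
}

// Backward: route each output gradient to the input pixel that won the max.
void maxpool_backward(const std::vector<float>& top_diff,
                      const std::vector<int>& argmax, int H, int W,
                      std::vector<float>& bottom_diff) {
    bottom_diff.assign(H * W, 0.f);
    for (size_t i = 0; i < top_diff.size(); ++i)
        bottom_diff[argmax[i]] += top_diff[i];
}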

Quiz:
1. What will the gradient be for sum-pooling?
2. What will the gradient be if the pooling areas overlap (e.g. stride = 1)?

11
Convolutional Layer :: Backward
for (n = 0; n < N; n ++)
for (m = 0; m < M; m ++) M N

for(y = 0; y<Y; y++) K


W

for(x = 0; x<X; x++) X

for (p = 0; p< K; p++) Y

for (q = 0; q< K; q++)


yL (n; x, y) += yL-1(m, x+p, y+q) * w (n ,m; p, q);

Let's use the chain rule for the convolutional layer:

∂E/∂y_{l−1} = ∂E/∂y_l × ∂y_l(w, y_{l−1})/∂y_{l−1}

∂E/∂w_l = ∂E/∂y_l × ∂y_l(w, y_{l−1})/∂w_l
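
A self-contained version of the forward loop above, using flat row-major arrays (a sketch with assumed layouts, not Caffe's code): the input is M×(X+K−1)×(Y+K−1), the output N×X×Y, the weights N×M×K×K.

#include <vector>

// Naive forward convolution ("valid" mode, stride 1).
void conv_forward(const std::vector<float>& in, const std::vector<float>& w,
                  std::vector<float>& out, int N, int M, int X, int Y, int K) {
    int iX = X + K - 1, iY = Y + K - 1;   // input spatial size
    out.assign(N * X * Y, 0.f);
    for (int n = 0; n < N; ++n)
      for (int m = 0; m < M; ++m)
        for (int x = 0; x < X; ++x)
          for (int y = 0; y < Y; ++y)
            for (int p = 0; p < K; ++p)
              for (int q = 0; q < K; ++q)
                out[(n * X + x) * Y + y] +=
                    in[(m * iX + (x + p)) * iY + (y + q)] *
                    w[((n * M + m) * K + p) * K + q];
}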
12
Convolutional Layer :: Backward
Example: M = 1, N = 2, K = 2.
Take one pixel in layer (l−1). Which pixels in the next layer are influenced by it?
It contributes to a 2x2 window of outputs in each of the N = 2 feature maps of layer l, so during the backward pass its gradient is a sum over those two 2x2 windows.

13
Convolutional Layer :: Backward

Let's use the chain rule for the convolutional layer.

The gradient ∂E/∂y_{l−1} is a sum of (back-)correlations of the weights with the gradients ∂E/∂y_l over all feature maps of the "upper" layer:

∂E/∂y_{l−1} = ∂E/∂y_l × ∂y_l(w, y_{l−1})/∂y_{l−1} = Σ_{n=1..N} back_corr( W_n, ∂E/∂y_l^{(n)} )

The gradient of E w.r.t. w is a sum over all "pixels" (x, y) in the input map:

∂E/∂w_l = ∂E/∂y_l × ∂y_l(w, y_{l−1})/∂w_l = Σ_{0≤x≤X, 0≤y≤Y} ∂E/∂y_l(x, y) ∘ y_{l−1}(x, y)

14
Convolutional Layer :: Backward
How this is implemented:
backward ( ) { …
  // im2col data to col_data
  im2col_cpu(bottom_data, CHANNELS_, HEIGHT_, WIDTH_, KSIZE_, PAD_, STRIDE_,
             col_data);
  // gradient w.r.t. weight:
  caffe_cpu_gemm(CblasNoTrans, CblasTrans, M_, K_, N_, 1., top_diff, col_data, 1.,
                 weight_diff);
  // gradient w.r.t. bottom data:
  caffe_cpu_gemm(CblasTrans, CblasNoTrans, K_, N_, M_, 1., weight, top_diff, 0.,
                 col_diff);
  // col2im back to the data
  col2im_cpu(col_diff, CHANNELS_, HEIGHT_, WIDTH_, KSIZE_, PAD_, STRIDE_,
             bottom_diff);
}

15
Convolutional Layer : im2col

The implementation is based on reducing the convolution layer to a matrix–matrix multiply (see Chellapilla et al., "High Performance Convolutional Neural Networks for Document Processing").
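
A sketch of the idea for the forward pass (one input channel, valid convolution, stride 1; hypothetical helper names): im2col lays every K×K patch out as a column, so applying all N filters becomes a single (N × K·K) by (K·K × X·Y) matrix product.

#include <vector>

// im2col for one channel: each K x K patch becomes one column of `col`
// (K*K rows, X*Y columns), where X = iX-K+1, Y = iY-K+1.
void im2col(const std::vector<float>& in, int iX, int iY, int K,
            std::vector<float>& col) {
    int X = iX - K + 1, Y = iY - K + 1;
    col.assign(K * K * X * Y, 0.f);
    for (int p = 0; p < K; ++p)
      for (int q = 0; q < K; ++q)
        for (int x = 0; x < X; ++x)
          for (int y = 0; y < Y; ++y)
            col[(p * K + q) * (X * Y) + (x * Y + y)] = in[(x + p) * iY + (y + q)];
}

// Forward conv as a plain matrix product: out(N x X*Y) = w(N x K*K) * col(K*K x X*Y).
void conv_forward_gemm(const std::vector<float>& w, const std::vector<float>& col,
                       std::vector<float>& out, int N, int KK, int XY) {
    out.assign(N * XY, 0.f);
    for (int n = 0; n < N; ++n)
      for (int k = 0; k < KK; ++k)
        for (int j = 0; j < XY; ++j)
          out[n * XY + j] += w[n * KK + k] * col[k * XY + j];
}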

16
CIFAR-10 Training

http://www.cs.toronto.edu/~kriz/cifar.html
https://www.kaggle.com/c/cifar-10
60000 32x32 colour images in 10 classes, with 6000 images per class.
There are:
• 50000 training images
• 10000 test images

17
Exercises

1. Look at the definition of the Backward() pass for the following layers:
   – sigmoid, tanh
2. Implement a new layer:
   – softplus: y_l = log(1 + e^{y_{l−1}}) (a starting-point sketch follows below)
3. Train CIFAR-10 with different topologies

Project:
1. Port CIFAR-100 to caffe
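
For exercise 2, a minimal starting-point sketch of the element-wise math only (not a full Caffe layer), using dy/dx = 1 / (1 + e^{−x}):

#include <cmath>
#include <vector>

// Softplus forward: y = log(1 + exp(x)).
void softplus_forward(const std::vector<float>& x, std::vector<float>& y) {
    y.resize(x.size());
    for (size_t i = 0; i < x.size(); ++i)
        y[i] = std::log(1.f + std::exp(x[i]));
}

// Softplus backward: dE/dx = dE/dy * sigmoid(x).
void softplus_backward(const std::vector<float>& x,
                       const std::vector<float>& top_diff,
                       std::vector<float>& bottom_diff) {
    bottom_diff.resize(x.size());
    for (size_t i = 0; i < x.size(); ++i)
        bottom_diff[i] = top_diff[i] / (1.f + std::exp(-x[i]));
}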

18
