ResNet
Some slides were adapted/taken from various sources, including Andrew Ng's Coursera lectures, Stanford's CS231n: Convolutional Neural Networks for Visual Recognition lectures, CS lectures from Waterloo, Canada, Aykut Erdem et al.'s tutorial on Deep Learning in Computer Vision, Ismini Lourentzou's lecture slides on "Introduction to Deep Learning", Ramprasaath's lecture slides, and many more. We thankfully acknowledge them. Students are requested to use this material for their study only and NOT to distribute it.
In this Lecture
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
Deep vs Shallow Networks
What happens when we continue stacking deeper layers on a “plain” convolutional
neural network?
[Figure: training error (left) and test error (right) vs. iterations for plain networks — the 56-layer network shows higher error than the 20-layer network on both plots.]
Challenges
ResNet
Plain Network
Residual Blocks
Residual Blocks
[Diagram: a "Big NN" maps X to a[l]; two additional weight layers map a[l] to a[l+2], with a skip connection carrying a[l] past them and adding it in before the final nonlinearity.]

a[l+2] = g(z[l+2] + a[l])
       = g(W[l+2] a[l+1] + b[l+2] + a[l])
       = g(a[l])    if W[l+2] = 0 and b[l+2] = 0

The identity function is therefore easy for a residual block to learn: if the added layers' weights shrink to zero, the block simply passes a[l] through unchanged.
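To make the skip connection concrete, here is a minimal residual-block sketch in PyTorch-style Python (an illustration assuming that framework, not code from the slides; the channel width is a placeholder):

import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResidualBlock(nn.Module):
    """Two 3x3 conv layers with an identity skip connection: out = relu(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))   # first weight layer + nonlinearity
        out = self.bn2(self.conv2(out))         # second weight layer (no relu yet)
        return F.relu(out + x)                  # add the skip connection, then relu

# If the second conv's weights collapse to zero (and BN leaves the branch at zero),
# the block outputs relu(x): the identity mapping is trivially learned.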
Skip Connections “shortcuts”
ResNet
He et al., 2015
[Figure: the full ResNet stack — Input → 7x7 conv, 64, /2 → Pool → repeated residual blocks of two 3x3 conv, 64 layers → Pool → FC 1000 → Softmax, with one residual block highlighted.]
ResNet Architecture
Full ResNet architecture:
- Stack residual blocks; every residual block has two 3x3 conv layers and computes F(x) + x followed by a relu.
- An additional conv layer at the beginning (7x7 conv, 64, /2).
- No FC layers besides the FC 1000 that maps to the output classes.
- A global average pooling layer after the last conv layer.
[Figure: the full stack from Input and the beginning 7x7 conv, 64, /2 through residual blocks whose filter counts grow from 64 through 128 up to 512, with periodic spatial downsampling via stride-2 convs (e.g., 3x3 conv, 512, /2), ending in global average pooling, FC 1000, and Softmax.]
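To make the stacking concrete, here is a hedged sketch of how such blocks could be assembled (assuming PyTorch and reusing the BasicResidualBlock sketched earlier in these notes; the block counts and the plain strided convs between stages are simplifications — real ResNets downsample inside strided residual blocks with projection shortcuts):

import torch
import torch.nn as nn

def make_stage(block, channels, num_blocks):
    """Stack several same-width residual blocks (a simplified 'stage')."""
    return nn.Sequential(*[block(channels) for _ in range(num_blocks)])

class TinyResNet(nn.Module):
    """Illustrative ResNet-style stack: beginning conv, residual stages,
    global average pooling, and a single FC layer to the output classes."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.stem = nn.Sequential(                      # beginning conv layer
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        self.stage1 = make_stage(BasicResidualBlock, 64, 2)
        self.down1  = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)
        self.stage2 = make_stage(BasicResidualBlock, 128, 2)
        self.down2  = nn.Conv2d(128, 512, kernel_size=3, stride=2, padding=1)
        self.stage3 = make_stage(BasicResidualBlock, 512, 2)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                    # global average pooling
            nn.Flatten(),
            nn.Linear(512, num_classes),                # the only FC layer: FC 1000
        )

    def forward(self, x):
        x = self.stem(x)
        x = self.stage1(x)
        x = self.down1(x)
        x = self.stage2(x)
        x = self.down2(x)
        x = self.stage3(x)
        return self.head(x)

logits = TinyResNet()(torch.randn(1, 3, 224, 224))      # -> shape [1, 1000]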
ResNet Architecture
For deeper networks (ResNet-50+), use a "bottleneck" layer to improve efficiency (similar to GoogLeNet).
[Figure: a bottleneck residual block with 28x28x256 input and 28x28x256 output — a 1x1 conv first reduces the input to 64 feature maps, a 3x3 conv, 64 then operates over only those 64 feature maps, and a 1x1 conv, 256 projects back to 256 feature maps.]
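A minimal sketch of such a bottleneck block (assuming PyTorch; the 256/64 widths follow the figure, the rest is illustrative rather than the authors' exact code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class BottleneckBlock(nn.Module):
    """1x1 reduce -> 3x3 -> 1x1 expand, with an identity skip connection."""
    def __init__(self, channels=256, bottleneck=64):
        super().__init__()
        self.reduce = nn.Conv2d(channels, bottleneck, kernel_size=1, bias=False)   # 256 -> 64
        self.bn1 = nn.BatchNorm2d(bottleneck)
        self.conv = nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1, bias=False)  # 3x3 over only 64 maps
        self.bn2 = nn.BatchNorm2d(bottleneck)
        self.expand = nn.Conv2d(bottleneck, channels, kernel_size=1, bias=False)    # project back to 256
        self.bn3 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.reduce(x)))
        out = F.relu(self.bn2(self.conv(out)))
        out = self.bn3(self.expand(out))
        return F.relu(out + x)               # skip connection: input and output are both 28x28x256

x = torch.randn(1, 256, 28, 28)
print(BottleneckBlock()(x).shape)            # torch.Size([1, 256, 28, 28])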
Residual Blocks (skip connections)
Deeper Bottleneck Architecture
Deeper Bottleneck Architecture (Cont.)
• Addresses the high training time of very deep networks.
• Keeps the time complexity about the same as the two-layer 3x3 convolution block (a rough parameter count follows below).
• Allows us to increase the number of layers.
• Allows the model to converge much faster.
• The 152-layer ResNet has 11.3 billion FLOPs, while the VGG-16/19 nets have 15.3/19.6 billion FLOPs.
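As a rough back-of-the-envelope check of the complexity claim (using weight counts as a proxy; these numbers are an added illustration, not from the slides): a bottleneck block on 256-d features costs about 1·1·256·64 + 3·3·64·64 + 1·1·64·256 ≈ 69K weights, close to the ≈ 74K weights (2·3·3·64·64) of a plain block of two 3x3 conv, 64 layers, whereas two 3x3 conv layers applied directly to 256-d features would cost about 2·3·3·256·256 ≈ 1.18M weights.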
Why Do ResNets Work Well?
Why Do ResNets Work Well? (Cont.)
• In theory a ResNet can represent the same functions as a plain network, but in practice, for the reasons above, convergence is much faster (the gradient identity sketched below makes this concrete).
• No additional training parameters are introduced.
• No additional complexity is introduced.
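One standard way to make the faster-convergence point concrete (an added note, not on the original slide): writing a residual block as y = x + F(x), backpropagation gives

∂L/∂x = ∂L/∂y · (I + ∂F/∂x)

so the gradient reaching earlier layers always contains the direct term ∂L/∂y carried by the skip connection, even when ∂F/∂x is very small; a plain network has no such guaranteed path.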
Training ResNet in practice
• Batch Normalization after every CONV layer.
• Xavier/2 initialization from He et al.
• SGD + Momentum (0.9)
• Learning rate: 0.1, divided by 10 when validation error
plateaus.
• Mini-batch size 256.
• Weight decay of 1e-5.
• No dropout used. (A sketch of this training configuration follows below.)
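The recipe above could look roughly like the following (a minimal sketch assuming PyTorch; the dummy data and tiny model are placeholders standing in for ImageNet and the full ResNet):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Tiny stand-in model (conv + BN, as in the recipe: BN after every conv, no dropout).
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1000))
criterion = nn.CrossEntropyLoss()                                    # softmax + cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,              # learning rate 0.1
                            momentum=0.9,                            # SGD + momentum (0.9)
                            weight_decay=1e-5)                       # weight decay from the slide
# Divide the learning rate by 10 when the monitored error plateaus:
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1)

# Dummy loader; the slide's mini-batch size on ImageNet is 256.
loader = DataLoader(TensorDataset(torch.randn(32, 3, 64, 64),
                                  torch.randint(0, 1000, (32,))), batch_size=8)

for epoch in range(2):                                               # epoch count is a placeholder
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step(loss.item())                                      # placeholder for validation error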
Loss Function
• For measuring the loss of the model, a combination of softmax and cross-entropy is used.
• The network's output scores are first normalized into a probability distribution with the softmax function; the cross-entropy between this distribution and the true label gives the loss.
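Concretely (standard definitions, added here as a worked example): given logits z and true class y, softmax gives p_i = e^{z_i} / Σ_j e^{z_j}, and the loss is L = -log p_y. For example, with logits z = (2, 0, 0) and the first class correct, p_1 = e^2 / (e^2 + 1 + 1) ≈ 0.79, so L = -log 0.79 ≈ 0.24.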
Results
Experimental Results
- Able to train very deep networks without degradation (152 layers on ImageNet, 1202 on CIFAR-10).
- Deeper networks now achieve lower training error, as expected.
- Swept 1st place in all ILSVRC and COCO 2015 competitions.
- ILSVRC 2015 classification winner (3.6% top-5 error) -- better than "human performance" (Russakovsky 2014).
Comparing Plain to ResNet (18/34 Layers)
Comparing Plain to Deeper ResNet
[Figure: test error and training error curves comparing plain networks to deeper ResNets.]
ResNet on More than 1000 Layers
• To further improve learning in extremely deep ResNets, "Identity Mappings in Deep Residual Networks" (Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, 2016) suggests passing the input directly to the final residual layer, allowing the network to easily learn to pass the input through as an identity mapping in both the forward and backward passes.
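That paper's "pre-activation" block is a natural way to illustrate the idea; the sketch below assumes PyTorch and a fixed width, and is not the authors' code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class PreActResidualBlock(nn.Module):
    """Pre-activation residual block in the spirit of He et al., 2016:
    BN and ReLU come before each conv, and nothing is applied after the
    addition, so the skip path stays a pure identity from block to block."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))    # BN -> ReLU -> conv
        out = self.conv2(F.relu(self.bn2(out)))  # BN -> ReLU -> conv
        return out + x                           # identity shortcut, no post-activation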
Identity Mappings in Deep Residual Networks
Identity Mappings in Deep Residual Networks
Improvement on CIFAR-10
Reduce Learning Time with Random Layer Drops
• Drop layers randomly during training and use the full network at test time.
• Residual blocks are used as the network's building blocks.
• During training, the input flows through both the shortcut and the weight layers.
• Training: each residual block has a "survival probability" and is randomly dropped.
• Testing: all blocks are kept active, and each block's output is re-calibrated according to its survival probability from training (see the sketch below).
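A hedged sketch of such a stochastic-depth block (assuming PyTorch; the survival probability value and layer widths are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticDepthBlock(nn.Module):
    """Residual block with a random layer drop, sketching the idea above."""
    def __init__(self, channels, survival_prob=0.8):
        super().__init__()
        self.survival_prob = survival_prob
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def residual(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        return self.bn2(self.conv2(out))

    def forward(self, x):
        if self.training:
            # Training: drop the whole residual branch with probability 1 - p,
            # so the block reduces to the identity shortcut for this pass.
            if torch.rand(1).item() < self.survival_prob:
                return F.relu(x + self.residual(x))
            return x
        # Testing: keep every block active, but re-calibrate (scale) the
        # residual branch by its survival probability.
        return F.relu(x + self.survival_prob * self.residual(x))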