Deep Residual Learning for Image Recognition
Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Presenters
Akash Gadicherla, Computer Engineering, ag1816@scarletmail.rutgers.edu
Sashidhar Reddy Avuthu, Computer Engineering, sa2220@scarletmail.rutgers.edu
Outline
Motivation/Background • Approach • Results • Related Work • Conclusion • Our Thoughts
Motivation/Background
Background
• The common convention for image recognition was to use CNNs (VGG, AlexNet).
• The depth of the network is crucially important.
• Networks were unable to scale beyond 20 layers.
Motivation
• Network scaling
• Gradient degradation
• Optimization issues
Approach
•Residual Mapping:
• Instead of directly learning the transformation H(x), residual learning reformulates it as H(x) = F(x) + x (see the code sketch after this slide).
• Residual function F(x) = H(x) − x: represents the difference between the target mapping and the input, which is easier to learn.
•Advantages:
• Simplifies learning by focusing on small adjustments to the input.
• Addresses degradation by allowing layers to "do nothing" when needed, since F(x) can simply be driven to zero.
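To make the reformulation concrete, here is a minimal sketch of a residual block in PyTorch. It is our own illustration (the class name BasicResidualBlock and the two-convolution layout are assumptions for clarity, not the authors' released code): the stacked layers learn F(x), and the unchanged input x is added back before the final activation, so the block outputs H(x) = F(x) + x.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Two 3x3 convolutions learn the residual F(x); the input x is added
    back so the block outputs H(x) = F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        f = self.relu(self.bn1(self.conv1(x)))  # first half of F(x)
        f = self.bn2(self.conv2(f))             # second half of F(x)
        return self.relu(f + x)                 # H(x) = F(x) + x

# If the extra layers have nothing useful to add, training can push F(x) toward
# zero and the block simply passes x through:
# out = BasicResidualBlock(64)(torch.randn(1, 64, 56, 56))  # same shape as the input
```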
Approach
•Shortcut Connections
•Types of Shortcuts:
• Identity Shortcuts: Add the input directly, requiring no extra parameters.
• Projection Shortcuts: Use a 1x1 convolution to match input and output dimensions when necessary (see the sketch below).
•Function of Shortcuts:
• Enable effective information flow across layers.
• Minimize optimization difficulty by bypassing transformations when they are unnecessary.
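A small sketch of the two shortcut types, again as hypothetical PyTorch code (the helper name make_shortcut is ours): when input and output shapes already match, the shortcut is a parameter-free identity; when the channel count or stride changes, a 1x1 convolution projects x to the new shape so it can still be added to F(x).

```python
import torch.nn as nn

def make_shortcut(in_channels: int, out_channels: int, stride: int = 1) -> nn.Module:
    """Identity shortcut when shapes match; otherwise a 1x1 projection."""
    if stride == 1 and in_channels == out_channels:
        return nn.Identity()  # identity shortcut: adds x directly, no extra parameters
    return nn.Sequential(     # projection shortcut: 1x1 conv resizes x to match F(x)
        nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
        nn.BatchNorm2d(out_channels),
    )
```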
Approach
•Architecture Overview
•Plain Networks vs. ResNets:
• Plain Networks: Sequential layers, vulnerable to degradation.
• ResNet: Introduces shortcut connections every few layers to mitigate the issue.
•Bottleneck Architecture:
• For deep networks (50+ layers), ResNet employs a 3-layer bottleneck (1x1-3x3-1x1) to reduce computational cost (sketched below).
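A rough sketch of the 1x1-3x3-1x1 bottleneck block, following the pattern described above (our own simplified PyTorch code, with the expansion factor of 4 used by ResNet-50/101/152): the first 1x1 convolution reduces the channel count, the 3x3 convolution works on the narrower representation, and the final 1x1 restores the width before the shortcut addition.

```python
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """1x1 reduce -> 3x3 -> 1x1 expand, plus a shortcut connection."""
    expansion = 4  # output width is 4x the bottleneck width, as in ResNet-50/101/152

    def __init__(self, in_channels: int, mid_channels: int):
        super().__init__()
        out_channels = mid_channels * self.expansion
        self.f = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=1, bias=False),              # reduce channels
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1, bias=False),  # 3x3 on the narrow representation
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, kernel_size=1, bias=False),             # expand channels back
            nn.BatchNorm2d(out_channels),
        )
        # Projection shortcut only when the input width differs from the output width.
        self.shortcut = nn.Identity() if in_channels == out_channels else nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + self.shortcut(x))  # H(x) = F(x) + (possibly projected) x
```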
Experimental Results
•Results on ImageNet
•Setup:
• ResNets with 18, 34, 50, 101, and 152 layers were tested.
• Dataset: ImageNet, over a million images across 1,000 classes.
•Results:
• 152-layer ResNet: The deepest model evaluated, setting a new single-model state of the art.
• Ensemble: An ensemble of ResNets achieved 3.57% top-5 error, winning the 2015 ImageNet (ILSVRC) classification competition.
Experimental Results
•Results on CIFAR-10
• Objective: Evaluate extreme network depth.
• Setup: ResNets with up to 1,202 layers were tested on CIFAR-10.
• Findings: ResNets trained successfully without degradation, even at 1,202 layers.
• Conclusion: Residual learning enables training of very deep networks, unlike plain networks, which degrade as depth increases.
Related Work
GoogLeNet
• Uses the Inception module
• 22 layers
• Parallel computation paths within each module
ResNeXt
• Uses groups (grouped convolutions) to learn different features
Conclusion and Future Work
Summary
• ResNets achieve high accuracy with greater depth.
• Able to solve the gradient degradation problem associated with many-layer neural networks.
• Easy optimization.
• Set the standard for future image classification architectures.
Our Thoughts
Overall, the paper is highly detailed and thoroughly explains residual networks.
We were impressed with how relevant it still is today.
The model is versatile and can be extended to other vision-based applications.
Thank You!
Any Questions?