Deep Residual Learning for
Image Recognition
Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun
Presented by – Sanjay Saha, School of Computing, NUS
CS6240 – Multimedia Analysis – Sem 2 AY2019/20
Objective | Problem Statement
Motivation
Performance of plain networks degrades as the architecture gets deeper
Image source: paper
Main Idea
• Skip connections / shortcuts
• Aims to avoid:
  • 'Vanishing gradients'
  • 'Long training times'
Image source: Wikipedia
Contributions | Problem Statement
• These extremely deep residual nets are easy to optimize, but the
counterpart “plain” nets (that simply stack layers) exhibit higher
training error when the depth increases.
• These deep residual nets can easily enjoy accuracy gains from greatly
increased depth, producing results substantially better than previous
networks.
A residual learning framework to ease the training of networks that
are substantially deeper than those used previously.
(Figure: performance vs. depth)
Literature
Literature Review
• Partial solutions to vanishing gradients:
  • Batch normalization – rescales layer activations over each mini-batch.
  • Careful weight initialization, e.g., Xavier initialization.
  • Training portions of the network individually.
• Highway Networks:
  • Feature residual connections of the form y = f(x) · sigmoid(Wx + b) + x · (1 − sigmoid(Wx + b))  (a small sketch follows below).
  • Data-dependent gated shortcuts with extra parameters.
  • When the gates are 'closed', the layers become 'non-residual'.
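A minimal NumPy sketch of the highway-style gate above; the function name, the tanh transform, and the toy shapes are illustrative assumptions, not details from the slides or the Highway Networks paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """y = f(x) * sigmoid(W_t x + b_t) + x * (1 - sigmoid(W_t x + b_t))."""
    f = np.tanh(W_h @ x + b_h)      # transformed features f(x); tanh is a placeholder
    t = sigmoid(W_t @ x + b_t)      # data-dependent gate in (0, 1)
    return f * t + x * (1.0 - t)    # t -> 1 uses only f(x); t -> 0 passes x through

# toy usage with a 4-dimensional input
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W_h, b_h = rng.standard_normal((4, 4)), np.zeros(4)
W_t, b_t = rng.standard_normal((4, 4)), np.zeros(4)
print(highway_layer(x, W_h, b_h, W_t, b_t))
```

Unlike ResNet's parameter-free identity shortcut, the gate here is learned and data-dependent, which is exactly the contrast the slide draws.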
ResNet | Design | Architecture
Plain Block
a^[l] → a^[l+1] → a^[l+2]

z^[l+1] = W^[l+1] a^[l] + b^[l+1]   ("linear")
a^[l+1] = g(z^[l+1])   ("relu")
z^[l+2] = W^[l+2] a^[l+1] + b^[l+2]   ("output")
a^[l+2] = g(z^[l+2])   ("relu on output")
Image source: deeplearning.ai
Residual Block
a^[l] → a^[l+1] → a^[l+2]  (shortcut carries a^[l] to the output)

z^[l+1] = W^[l+1] a^[l] + b^[l+1]   ("linear")
a^[l+1] = g(z^[l+1])   ("relu")
z^[l+2] = W^[l+2] a^[l+1] + b^[l+2]   ("output")
a^[l+2] = g(z^[l+2] + a^[l])   ("relu on output plus input")
Image source: deeplearning.ai
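A minimal PyTorch sketch of these two blocks side by side; it uses fully connected layers to mirror the W a + b notation on the slides (the actual ResNet blocks use convolutions), so the class names and dimensions are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlainBlock(nn.Module):
    """a[l+2] = g(W[l+2] g(W[l+1] a[l] + b[l+1]) + b[l+2])"""
    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)   # W[l+1], b[l+1]
        self.fc2 = nn.Linear(dim, dim)   # W[l+2], b[l+2]

    def forward(self, a):
        z1 = self.fc1(a)                 # "linear"
        a1 = F.relu(z1)                  # "relu"
        z2 = self.fc2(a1)                # "output"
        return F.relu(z2)                # "relu on output"

class ResidualBlock(PlainBlock):
    """Same layers, but the input is added back before the last ReLU."""
    def forward(self, a):
        z1 = self.fc1(a)
        a1 = F.relu(z1)
        z2 = self.fc2(a1)
        return F.relu(z2 + a)            # "relu on output plus input"

x = torch.randn(8, 64)
print(ResidualBlock(64)(x).shape)        # torch.Size([8, 64])
```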
Skip Connections
• The shortcut skips over one or more layers; the skipped layers are referred to as the residual part of the network.
• The output of the residual part is added to the input carried by the shortcut; the dimensions usually match, so an identity shortcut can be used.
• Another option is a projection into the output space (a 1×1 convolution) when the dimensions differ (see the sketch below).
• The identity shortcut adds no training parameters; a projection adds only a small number.
Image source: towardsdatascience.com
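A hedged sketch of the projection option: when a block halves the resolution and doubles the channels, a stride-2 1×1 convolution on the shortcut path makes the shapes match before the addition (as in the paper's option B). The specific channel counts and BatchNorm placement here are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DownsampleResidualBlock(nn.Module):
    """Residual block that halves resolution and doubles channels.

    The shortcut uses a stride-2 1x1 convolution so its output shape
    matches the residual branch before the two are summed.
    """
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.proj = nn.Conv2d(in_ch, out_ch, 1, stride=2, bias=False)  # projection shortcut

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.proj(x))

x = torch.randn(1, 64, 56, 56)
print(DownsampleResidualBlock(64, 128)(x).shape)  # torch.Size([1, 128, 28, 28])
```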
ResNet Architecture
Image source: paper
Stacked Residual Blocks
• 3×3 conv layers
• 2× the number of filters at each stage, with stride-2 convolutions to down-sample
• Average pooling after the last conv layer
• FC layer to the output classes (a stage-layout sketch follows below)
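A small sketch of how these rules compose into the 34-layer configuration from the paper's Table 1 (block counts [3, 4, 6, 3]); the loop below only prints the plan and counts weighted layers, it does not build the network:

```python
# Stage plan for the 34-layer ResNet (Table 1 of the paper):
# each stage stacks 3x3-conv residual blocks, doubles the filter count,
# and the first block of stages 2-4 down-samples with stride 2.
stages = [  # (number of residual blocks, filters per 3x3 conv)
    (3, 64),
    (4, 128),
    (6, 256),
    (3, 512),
]

layers = 1                                        # initial 7x7 conv
for i, (num_blocks, filters) in enumerate(stages):
    for b in range(num_blocks):
        stride = 2 if (i > 0 and b == 0) else 1   # stride-2 down-sampling at stage entry
        layers += 2                               # two 3x3 convs per block
        print(f"stage {i+1} block {b+1}: 2 x [3x3, {filters}] conv, stride {stride}")
layers += 1                                       # final FC layer after global average pooling
print("total weighted layers:", layers)           # 34
```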
ResNet Architecture
• Input: 28×28×256
• 1×1 conv with 64 filters → 28×28×64
• 3×3 conv on the 64 feature maps only
• 1×1 conv with 256 filters → 28×28×256
• BOTTLENECK design (sketched below)
Image source: paper
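A hedged PyTorch sketch of this bottleneck block (1×1 reduce, 3×3, 1×1 restore, identity shortcut); the post-activation BatchNorm placement and default channel counts are assumptions matching the common ResNet-50-style block, not code from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Bottleneck(nn.Module):
    """256-d in -> 1x1,64 -> 3x3,64 -> 1x1,256 -> add input -> ReLU."""
    def __init__(self, channels=256, reduced=64):
        super().__init__()
        self.reduce = nn.Conv2d(channels, reduced, 1, bias=False)          # 1x1, 64
        self.bn1 = nn.BatchNorm2d(reduced)
        self.conv = nn.Conv2d(reduced, reduced, 3, padding=1, bias=False)  # 3x3 on 64 maps
        self.bn2 = nn.BatchNorm2d(reduced)
        self.restore = nn.Conv2d(reduced, channels, 1, bias=False)         # 1x1, 256
        self.bn3 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.reduce(x)))
        out = F.relu(self.bn2(self.conv(out)))
        out = self.bn3(self.restore(out))
        return F.relu(out + x)            # identity shortcut, then ReLU

x = torch.randn(1, 256, 28, 28)
print(Bottleneck()(x).shape)              # torch.Size([1, 256, 28, 28])
```

Stacking blocks like this (with occasional projection shortcuts at stage boundaries) gives the 50/101/152-layer variants.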
Summary | Advantages
Benefits of Bottleneck
• Less training time for deeper networks.
• The bottleneck keeps a block's time complexity about the same as a two-layer 3×3 block (a worked count follows below).
• Hence it becomes affordable to increase the number of layers.
• The model also converges faster, and the network as a whole stays cheap: the 152-layer ResNet has 11.3 billion FLOPs, while VGG-16/19 have 15.3/19.6 billion FLOPs.
Image source: paper
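A back-of-the-envelope check of the "same time complexity" point, counting convolution weights (and hence multiply-adds per spatial position) for a two-layer 3×3 block on 64 channels versus the 256-channel bottleneck, following the building blocks compared in the paper:

```python
def conv_weights(k, c_in, c_out):
    """Number of weights in a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

# Two-layer "basic" block operating on 64 channels
basic = conv_weights(3, 64, 64) + conv_weights(3, 64, 64)

# Bottleneck block: 256 -> 64 (1x1), 64 -> 64 (3x3), 64 -> 256 (1x1)
bottleneck = (conv_weights(1, 256, 64)
              + conv_weights(3, 64, 64)
              + conv_weights(1, 64, 256))

print(f"basic block weights:      {basic:,}")       # 73,728
print(f"bottleneck block weights: {bottleneck:,}")  # 69,632
```

So the bottleneck touches a 256-d representation at roughly the cost of a two-layer 64-d block, which is what makes the very deep variants affordable.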
Summary – Advantages of ResNet over Plain Networks
• A deeper plain network tends to perform badly because of vanishing and exploding gradients.
• In such cases, a ResNet stops improving rather than degrading: a^[l+2] = g(z^[l+2] + a^[l]) = g(W^[l+2] a^[l+1] + b^[l+2] + a^[l]).
• If a layer is not 'useful', L2 regularization drives its weights and bias close to zero, so a^[l+2] = g(a^[l]) = a^[l] (when g is ReLU and a^[l] is already non-negative), i.e., the block falls back to an identity mapping (a toy check follows below).
• In theory, a residual network can represent the same functions as its plain counterpart, but in practice it converges much faster.
• The identity shortcuts introduce no additional training parameters or complexity.
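A toy numerical check of the identity-fallback argument above (not an experiment from the paper): with the second layer's weights and bias driven to zero, the residual block's output equals its input, because the input is already non-negative after a ReLU.

```python
import torch
import torch.nn.functional as F

a_l = F.relu(torch.randn(5))                 # a[l], non-negative after ReLU
W1, b1 = torch.randn(5, 5), torch.randn(5)   # the intermediate layer can be anything
W2, b2 = torch.zeros(5, 5), torch.zeros(5)   # "useless" layer driven to ~0 by weight decay

a_l1 = F.relu(W1 @ a_l + b1)                 # a[l+1]
a_l2 = F.relu(W2 @ a_l1 + b2 + a_l)          # a[l+2] = g(z[l+2] + a[l])
print(torch.allclose(a_l2, a_l))             # True: block degrades gracefully to identity
```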
Results
Results
• ILSVRC 2015 classification winner (3.6% top-5 error), better than reported "human-level" performance!
Error rates (%) of ensembles. The top-5 error is on the
test set of ImageNet and reported by the test server
Results
Error rates (%, 10-crop testing) on ImageNet
validation set
Error rates (%) of single-model results on
the ImageNet validation set
Plain vs. ResNet
Image source: paper
Plain vs. Deeper ResNet
Image source: paper
Conclusion | Future Trends
Conclusion
• Deep neural networks that are easy to optimize.
• Accuracy gains from greatly increased depth.
• Addresses vanishing gradients and long training times.
Future Trends
• Identity Mappings in Deep Residual Networks (He et al., 2016) propagates the input directly through the identity path of every residual block, so the network can easily pass the signal unchanged in both the forward and backward passes (a sketch follows below).
• Using batch normalization as pre-activation improves regularization.
• Reduce training time by randomly dropping layers during training (stochastic depth).
• ResNeXt: Aggregated Residual Transformations for Deep Neural Networks (Xie et al., 2016).
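A hedged sketch of the pre-activation block from Identity Mappings in Deep Residual Networks: BatchNorm and ReLU come before each convolution and nothing is applied after the addition, so the skip path stays a clean identity. Channel counts and the two-conv layout are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreActBlock(nn.Module):
    """Pre-activation residual block: x + F(BN/ReLU(x)); the skip path stays untouched."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))   # BN and ReLU come *before* the conv
        out = self.conv2(F.relu(self.bn2(out)))
        return x + out                          # no ReLU after the addition

x = torch.randn(1, 64, 32, 32)
print(PreActBlock(64)(x).shape)                 # torch.Size([1, 64, 32, 32])
```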
Questions?