Day 1 Lecture 4
Backward Propagation
Elisa Sayrol
[course site]
Learning
• Purely supervised
  - Typically backpropagation + stochastic gradient descent (SGD)
  - Good when there is a lot of labeled data
• Layer-wise unsupervised + supervised classifier
  - Train each layer in sequence, using regularized auto-encoders or Restricted Boltzmann Machines (RBMs)
  - Hold the feature extractor fixed and train a linear classifier on top of its features
  - Good when labeled data is scarce but there is a lot of unlabeled data
• Layer-wise unsupervised + supervised backprop
  - Train each layer in sequence
  - Backprop through the whole system
  - Good when the learning problem is very difficult
Slide Credit: LeCun
From Lecture 3
L Hidden Layers
Hidden pre-activation (k>0)
Hidden activation (k=1,…L)
Output activation (k=L+1)
Figure Credit: Hugo Larochelle, NN course
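The equations for these three quantities did not survive on this slide. As a hedged reconstruction, assuming the usual notation of the credited neural network course (the symbols g for the hidden activation function, o for the output activation, and f(x) for the network output are assumptions here), they are commonly written as:

```latex
\begin{aligned}
a^{(k)}(x) &= b^{(k)} + W^{(k)} h^{(k-1)}(x), \qquad h^{(0)}(x) = x
  && \text{(pre-activation, } k > 0\text{)} \\
h^{(k)}(x) &= g\big(a^{(k)}(x)\big)
  && \text{(hidden activation, } k = 1, \dots, L\text{)} \\
h^{(L+1)}(x) &= o\big(a^{(L+1)}(x)\big) = f(x)
  && \text{(output activation, } k = L+1\text{)}
\end{aligned}
```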
Backpropagation algorithm
The output of the network gives class scores that depend on the input and the parameters.
• Define a loss function that quantifies our unhappiness with the scores across the training data.
• Come up with a way of efficiently finding the parameters that minimize the loss function (optimization).
Probability of a class given an input (softmax)
Minimize the loss (plus some regularization term) w.r.t. the parameters over the whole training set.
Loss function, e.g., negative log-likelihood (good for classification)
[Figure: network diagram. Input x → W1 → hidden (a2, h2) → W2 → hidden (a3, h3) → W3 → output (a4, h4) → Loss]
Regularization term (L2 norm), also known as weight decay
Figure Credit: Kevin McGuinness
Forward Pass
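The quantities named on this slide (softmax probabilities, negative log-likelihood, and an L2 penalty) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the lecture's own code: the function names, the `weight_decay` parameter, and the factor 0.5 in front of the penalty are assumptions made here for convenience.

```python
import math

def softmax(scores):
    """Turn a list of class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def nll_loss(probs, target):
    """Negative log-likelihood of the correct class index `target`."""
    return -math.log(probs[target])

def l2_penalty(weight_matrices, weight_decay):
    """0.5 * weight_decay * (sum of squared weights); the 0.5 is an assumed convention."""
    squared = sum(w * w for W in weight_matrices for row in W for w in row)
    return 0.5 * weight_decay * squared

def regularized_loss(batch_scores, targets, weight_matrices, weight_decay):
    """Mean negative log-likelihood over the examples plus the L2 term."""
    data_loss = sum(
        nll_loss(softmax(s), t) for s, t in zip(batch_scores, targets)
    ) / len(targets)
    return data_loss + l2_penalty(weight_matrices, weight_decay)
```

With equal scores for every class the softmax is uniform, so the loss for any target is the logarithm of the number of classes, which makes a convenient sanity check.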
Backpropagation algorithm
• We need a way to fit the model to data: find parameters (W^(k), b^(k)) of the network that (locally) minimize the loss function.
• We can use stochastic gradient descent or, better yet, mini-batch stochastic gradient descent.
• To do this, we need to find the gradient of the loss function with respect to all the parameters of the model (W^(k), b^(k)).
• These can be found using the chain rule of differentiation.
• The calculations reveal that the gradient with respect to the parameters in layer k depends only on the error from the layer above and the output from the layer below.
• This means that the gradients for each layer can be computed iteratively, starting at the last layer and propagating the error back through the network. This is known as the backpropagation algorithm.
Slide Credit: Kevin McGuinness
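The chain-rule result described in the bullets above can be written out explicitly. The following is a sketch under stated assumptions: a^(k) denotes the pre-activation of layer k, h^(k) its activation, g the hidden activation function, f(x) a softmax output trained with the negative log-likelihood ℓ of the correct class y, e(y) the one-hot vector for y, and δ^(k) = ∂ℓ/∂a^(k) the error at layer k. None of these symbols beyond W^(k) and b^(k) is confirmed by the slide.

```latex
\begin{aligned}
\delta^{(L+1)} &= f(x) - e(y)
  && \text{(error in the top layer)} \\
\frac{\partial \ell}{\partial W^{(k)}} &= \delta^{(k)} \big(h^{(k-1)}\big)^{\top},
  \qquad
  \frac{\partial \ell}{\partial b^{(k)}} = \delta^{(k)}
  && \text{(gradients for layer } k\text{)} \\
\delta^{(k-1)} &= \Big(\big(W^{(k)}\big)^{\top} \delta^{(k)}\Big) \odot g'\big(a^{(k-1)}\big)
  && \text{(error passed to the layer below)}
\end{aligned}
```

Each gradient uses only the error at its own layer, which arrives from the layers above, and the activation h^(k-1) from the layer below, which is why a single backward sweep is enough.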
1. Find the error in the top layer
2. Compute weight updates
3. Backpropagate error to layer below
[Figure: network diagram for the backward pass. Loss L, output (a4, h4), hidden (a3, h3), hidden (a2, h2), input x; weights W3, W2, W1]
Figure Credit: Kevin McGuinness
Backward Pass
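The three steps of the backward pass can be sketched for a small fully connected network in plain Python. This is an illustrative sketch, not the lecture's implementation: it assumes tanh hidden units, a softmax output with negative log-likelihood, no regularization, and a single training example. All function names and the example 2-3-2 network are hypothetical.

```python
import math

def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(params, x):
    """Return the activations of every layer; hs[0] is the input itself."""
    hs = [x]
    for k, (W, b) in enumerate(params):
        a = [ai + bi for ai, bi in zip(matvec(W, hs[-1]), b)]
        is_output = k == len(params) - 1
        hs.append(softmax(a) if is_output else [math.tanh(ai) for ai in a])
    return hs

def loss(params, x, y):
    """Negative log-likelihood of class y for input x."""
    return -math.log(forward(params, x)[-1][y])

def backward(params, x, y):
    """Gradients (dW, db) per layer of the loss for one example (x, y)."""
    hs = forward(params, x)
    # 1. Error in the top layer: softmax output minus the one-hot target.
    delta = [p - (1.0 if j == y else 0.0) for j, p in enumerate(hs[-1])]
    grads = [None] * len(params)
    for k in reversed(range(len(params))):
        W, _ = params[k]
        h_below = hs[k]
        # 2. Weight gradient: outer product of this layer's error
        #    with the output of the layer below; bias gradient is the error.
        grads[k] = ([[d * h for h in h_below] for d in delta], list(delta))
        if k > 0:
            # 3. Backpropagate: W^T delta, scaled by the tanh derivative 1 - h^2.
            back = [sum(W[i][j] * delta[i] for i in range(len(delta)))
                    for j in range(len(h_below))]
            delta = [bj * (1.0 - h * h) for bj, h in zip(back, h_below)]
    return grads

# Hypothetical 2-3-2 network, used only to exercise the functions.
params = [
    ([[0.1, -0.2], [0.3, 0.4], [-0.5, 0.2]], [0.0, 0.1, -0.1]),
    ([[0.2, -0.1, 0.3], [-0.3, 0.1, 0.2]], [0.05, -0.05]),
]
grads = backward(params, [1.0, -2.0], 1)
```

A common way to check a sketch like this is to compare each entry of `grads` against a finite-difference estimate obtained by nudging one parameter and re-evaluating `loss`.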
Optimization
Stochastic Gradient Descent
Stochastic Gradient Descent with momentum
Stochastic Gradient Descent with L2 regularization
Hyperparameters: learning rate, weight decay
Recommended lectures:
http://cs231n.github.io/optimization-1/
http://cs231n.github.io/optimization-2/
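The update equations for the three variants are missing from this slide. The sketch below shows one widely used formulation of each, not necessarily the exact form the lecture presented. The names `lr`, `momentum`, and `weight_decay` are assumptions, and the L2 variant assumes a penalty of 0.5 * weight_decay * ||w||^2, whose gradient is weight_decay * w.

```python
def sgd_step(w, grad, lr):
    """Plain SGD: move each weight against its gradient."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]

def sgd_momentum_step(w, v, grad, lr, momentum):
    """SGD with momentum: the velocity v accumulates past gradients."""
    v_new = [momentum * vi - lr * gi for vi, gi in zip(v, grad)]
    w_new = [wi + vi for wi, vi in zip(w, v_new)]
    return w_new, v_new

def sgd_l2_step(w, grad, lr, weight_decay):
    """SGD with L2 regularization: the penalty adds weight_decay * w to the gradient."""
    return [wi - lr * (gi + weight_decay * wi) for wi, gi in zip(w, grad)]

# Usage on a toy objective f(w) = sum(w_i^2), whose gradient is 2 * w.
w, v = [1.0, -2.0], [0.0, 0.0]
for _ in range(100):
    grad = [2.0 * wi for wi in w]
    w, v = sgd_momentum_step(w, v, grad, lr=0.1, momentum=0.9)
```

With a zero data gradient, `sgd_l2_step` shrinks every weight by the factor (1 - lr * weight_decay) per step, which is where the name weight decay comes from.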

Deep Learning for Computer Vision: Backward Propagation (UPC 2016)
