The Back Propagation Learning Algorithm

- For networks with hidden units.
- An error-correcting algorithm.
- Solves the credit (blame) assignment problem.
What is supervised learning?

Can we teach a network to learn to associate a pattern of inputs with
corresponding outputs? That is, given an initial set of weights, how can
they be adapted to produce the desired outputs? Use a training set:

[Figure: scatter plot of payment against workload. Training examples a-d
are plotted with known outcomes; e and f are new points whose outputs are
unknown, marked "?".]

    person   workload   pay    P(happy)
    a        0.1        0.9    0.95
    b        0.3        0.7    0.8
    c        0.07       0.2    0.2
    d        0.9        0.9    0.3
    e        0.7        0.5    ??
    f        0.4        0.8    ??

After training, how does the network generalise to patterns unseen during
learning?
Learning by Error Correction

In the perceptron there is a binary-valued output y and a target t.

[Figure: a perceptron with inputs x_1, x_2, \dots, x_N, weights
w_1, w_2, \dots, w_N, output y and target t; the step activation jumps
from 0 to 1 at \sum_i w_i x_i = 0.]

    y = \mathrm{step}\left( \sum_{i=0}^{N} w_i x_i \right)

Define this error measure:

    E = \frac{1}{2} (t - y)^2

Since (t - y)^2 is 1 for a wrong binary output and 0 for a correct one,
summing E over the training patterns counts the incorrect outputs (up to
the factor 1/2).

We want to design a weight-changing procedure that minimises E.
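To make the error measure concrete, here is a minimal Python sketch using
NumPy. The OR data set and the two weight vectors are made up for
illustration; the point is only that E = \frac{1}{2} \sum_p (t_p - y_p)^2
counts wrong outputs for a step-unit perceptron.

```python
import numpy as np

def step(a):
    # Threshold unit: 1 if the weighted sum is positive, else 0.
    return 1.0 if a > 0 else 0.0

def perceptron_error(w, X, T):
    # E = 1/2 sum_p (t_p - y_p)^2; with binary y and t each wrong
    # output contributes exactly 1/2.
    return 0.5 * sum((t - step(w @ x)) ** 2 for x, t in zip(X, T))

# Made-up data: column 0 is a constant bias input, so w[0] acts as w_0.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
T = np.array([0, 1, 1, 1], dtype=float)   # the OR function
print(perceptron_error(np.array([-0.5, 1.0, 1.0]), X, T))   # 0.0: all correct
print(perceptron_error(np.array([-0.5, -1.0, 1.0]), X, T))  # 1.0: two wrong
```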
Learning by Error Correction

How do we change the weights w_0, w_1, \dots, w_N so that the error E
decreases?

[Figure: the error E plotted against a weight w_i; the slope is negative
to the left of the minimum and positive to the right.]

If we could measure the slope

    \frac{\partial E}{\partial w_i}

then changing each weight by the negative of the slope would minimise E:

    slope +ve  =>  \Delta w_i  -ve
    slope -ve  =>  \Delta w_i  +ve

Either way we move towards the minimum of E.
More Perceptron Problems

For the perceptron, E cannot be differentiated with respect to the weights
w_0, w_1, \dots, w_N, because E involves the output y, which is not a
differentiable function of the weights:

    E = \frac{1}{2} (t - y)^2,
    y = \mathrm{step}\left( \sum_{i=0}^{N} w_i x_i \right)

Threshold unit:

    y = 1 if \sum_{i=0}^{N} w_i x_i \geq 0, and y = 0 otherwise

[Figure: y jumps abruptly from 0 to 1 at \sum_i w_i x_i = 0.]

Sigmoid unit:

    y = \frac{1}{1 + \exp\left( -\sum_{i=0}^{N} w_i x_i \right)}

[Figure: y rises smoothly from 0 to 1 as \sum_i w_i x_i increases.]
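A short sketch of the two activation functions, using NumPy: it evaluates
both on a few points to show the step's abrupt jump against the sigmoid's
smooth, differentiable rise.

```python
import numpy as np

def step(a):
    return (a > 0).astype(float)        # jumps from 0 to 1: no usable slope

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))    # smooth: differentiable everywhere

a = np.linspace(-6.0, 6.0, 7)          # [-6, -4, -2, 0, 2, 4, 6]
print(step(a))                          # [0. 0. 0. 0. 1. 1. 1.]
print(np.round(sigmoid(a), 3))          # [0.002 0.018 0.119 0.5 0.881 0.982 0.998]
```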
Gradient Descent

[Figure: with a sigmoid unit, the error E is a smooth curve in w_i, with a
negative slope to the left of the minimum and a positive slope to the
right.]

The error E is now a differentiable function of the weights. Change the
weights using the negative slope:

    \Delta w_i = -\varepsilon \frac{\partial E}{\partial w_i}

    \frac{\partial E}{\partial w_i} +ve  =>  \Delta w_i  -ve
    \frac{\partial E}{\partial w_i} -ve  =>  \Delta w_i  +ve

Either way we move towards the minimum of E. This approach is called
gradient descent.
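As an illustration only, here is gradient descent on a made-up
one-dimensional error surface E(w) = (w - 3)^2; the update rule
\Delta w = -\varepsilon \, dE/dw is exactly the one applied to network
weights below.

```python
# Toy error surface E(w) = (w - 3)^2 with slope dE/dw = 2 (w - 3).
eps = 0.1                     # learning constant
w = 0.0
for _ in range(50):
    slope = 2.0 * (w - 3.0)
    w -= eps * slope          # slope +ve -> w decreases; slope -ve -> w increases
print(w)                      # ~3.0, the minimum of E
```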
Derivation of Back Propagation

[Figure: a two-layer network. Inputs x_1, \dots, x_k, \dots, x_N feed
hidden units v_1, \dots, v_j, \dots, v_N through weights u_{jk}; the
hidden units feed outputs y_1, \dots, y_i, \dots, y_N through weights
w_{ij}.]

    output:  y_i = \mathrm{sig}\left( \sum_j w_{ij} v_j \right)
    hidden:  v_j = \mathrm{sig}\left( \sum_k u_{jk} x_k \right)
    error:   E = \frac{1}{2} \sum_p \sum_i (t_i - y_i)^2

We need to find the derivatives of E with respect to the weights w_{ij}
and u_{jk}.
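A minimal NumPy sketch of this forward pass. It assumes no bias terms (or,
equivalently, biases folded in as a constant input, as the worked example
later does); the sizes and random weights are made up, and u[j, k] and
w[i, j] follow the slide's indexing.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, u, w):
    # v_j = sig(sum_k u_jk x_k);  y_i = sig(sum_j w_ij v_j)
    v = sigmoid(u @ x)
    y = sigmoid(w @ v)
    return v, y

def error(t, y):
    # single-pattern error E = 1/2 sum_i (t_i - y_i)^2
    return 0.5 * np.sum((t - y) ** 2)

# Made-up sizes and weights: 2 inputs, 3 hidden units, 1 output.
rng = np.random.default_rng(0)
u = rng.normal(size=(3, 2))   # u[j, k]: input k -> hidden j
w = rng.normal(size=(1, 3))   # w[i, j]: hidden j -> output i
v, y = forward(np.array([0.5, -0.2]), u, w)
print(error(np.array([1.0]), y))
```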
Preliminaries

On a single pattern (drop the pattern index p):

    E = \frac{1}{2} \sum_i (t_i - y_i)^2

and

    y_i = \frac{1}{1 + \exp\left( -\sum_j w_{ij} v_j \right)}

Note that:

    \frac{\partial y_i}{\partial v_j} = y_i (1 - y_i) \, w_{ij}

    \frac{\partial y_i}{\partial w_{ij}} = y_i (1 - y_i) \, v_j

since if

    y = \frac{1}{1 + \exp(-x)}

then

    \frac{dy}{dx} = y (1 - y)
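The identity dy/dx = y(1 - y) is easy to confirm numerically; this small
sketch compares the analytic form with a central finite difference at a
few arbitrary points.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

for x in (-2.0, 0.0, 1.5):
    y = sigmoid(x)
    analytic = y * (1.0 - y)                                  # y (1 - y)
    h = 1e-6
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)   # central difference
    print(f"{x:+.1f}  {analytic:.8f}  {numeric:.8f}")         # the two columns agree
```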
Between Hidden and Output: \partial E / \partial w_{ij}

For weights between hidden units and output units. Only the i-th output
depends on w_{ij}, so only one term of the sum over outputs survives:

    E = \frac{1}{2} (t_i - y_i)^2

    \frac{\partial E}{\partial w_{ij}} =
        \frac{\partial E}{\partial y_i} \frac{\partial y_i}{\partial w_{ij}}

    \frac{\partial E}{\partial y_i} = (y_i - t_i)

    \frac{\partial y_i}{\partial w_{ij}} = y_i (1 - y_i) \, v_j

    \frac{\partial E}{\partial w_{ij}} = (y_i - t_i) \, y_i (1 - y_i) \, v_j

Call the factor (y_i - t_i) \, y_i (1 - y_i) the output error term \delta_i.
Between Input and Hidden: \partial E / \partial u_{jk}

For weights between input units and hidden units. Every output depends on
u_{jk} through v_j, so here we sum over the outputs:

    \frac{\partial E}{\partial u_{jk}} = \sum_i
        \frac{\partial E}{\partial y_i}
        \frac{\partial y_i}{\partial v_j}
        \frac{\partial v_j}{\partial u_{jk}}

    \frac{\partial E}{\partial y_i} = (y_i - t_i)

    \frac{\partial y_i}{\partial v_j} = y_i (1 - y_i) \, w_{ij}

    \frac{\partial v_j}{\partial u_{jk}} = v_j (1 - v_j) \, x_k

    \frac{\partial E}{\partial u_{jk}} =
        \sum_i (y_i - t_i) \, y_i (1 - y_i) \, w_{ij} \; v_j (1 - v_j) \, x_k

    \frac{\partial E}{\partial u_{jk}} =
        \left( \sum_i \delta_i w_{ij} \right) v_j (1 - v_j) \, x_k
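Both derivative formulas can be checked against finite differences. A
sketch under made-up conditions (random weights, one output unit, biases
omitted):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, u, w):
    v = sigmoid(u @ x)
    return v, sigmoid(w @ v)

def grads(x, t, u, w):
    # The two slide formulas, vectorised over units.
    v, y = forward(x, u, w)
    delta = (y - t) * y * (1.0 - y)                     # delta_i = (y_i - t_i) y_i (1 - y_i)
    dE_dw = np.outer(delta, v)                          # delta_i v_j
    dE_du = np.outer((w.T @ delta) * v * (1.0 - v), x)  # (sum_i delta_i w_ij) v_j (1 - v_j) x_k
    return dE_dw, dE_du

rng = np.random.default_rng(1)
x, t = np.array([0.3, 0.9]), np.array([1.0])
u, w = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
_, dE_du = grads(x, t, u, w)

# Finite-difference check of dE/du_jk.
h, num = 1e-6, np.zeros_like(u)
for j in range(u.shape[0]):
    for k in range(u.shape[1]):
        up, dn = u.copy(), u.copy()
        up[j, k] += h
        dn[j, k] -= h
        num[j, k] = (0.5 * np.sum((t - forward(x, up, w)[1]) ** 2)
                     - 0.5 * np.sum((t - forward(x, dn, w)[1]) ** 2)) / (2.0 * h)
print(np.max(np.abs(num - dE_du)))   # tiny (~1e-9): the formula matches
```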
Between Hidden and Output: \Delta w_{ij}

Modifying weights between hidden units and output units using gradient
descent:

    \Delta w_{ij} = -\varepsilon \frac{\partial E}{\partial w_{ij}}
                  = -\varepsilon \, (y_i - t_i) \, y_i (1 - y_i) \, v_j

Reading the factors left to right: \varepsilon is the learning constant,
(y_i - t_i) is the error, y_i (1 - y_i) is small for y_i close to 0 or 1,
and v_j is the "input" carried by the weight. The product
(y_i - t_i) \, y_i (1 - y_i) is the error term \delta_i defined above.
Between Input and Hidden: \Delta u_{jk}

Modifying weights between input units and hidden units using gradient
descent:

    \Delta u_{jk} = -\varepsilon \frac{\partial E}{\partial u_{jk}}
                  = -\varepsilon \left( \sum_i \delta_i w_{ij} \right)
                    v_j (1 - v_j) \, x_k

The output error terms \delta_i are passed backwards through the weights
w_{ij}: this is the back propagation of error. The same procedure is
applicable to a net with many hidden layers.
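Putting the two update rules together gives one complete training step. A
sketch assuming a single pattern, sigmoid units, and no bias terms; the
data and seed are made up.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_step(x, t, u, w, eps=1.0):
    # Forward pass.
    v = sigmoid(u @ x)
    y = sigmoid(w @ v)
    # Output error terms, then the two gradient-descent updates.
    delta = (y - t) * y * (1.0 - y)
    dw = -eps * np.outer(delta, v)                          # hidden -> output
    du = -eps * np.outer((w.T @ delta) * v * (1.0 - v), x)  # input -> hidden (back-propagated)
    return u + du, w + dw

# A few repeated presentations of one made-up pattern.
rng = np.random.default_rng(2)
u, w = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
x, t = np.array([1.0, 0.0]), np.array([1.0])
for _ in range(5):
    y = sigmoid(w @ sigmoid(u @ x))
    print(0.5 * np.sum((t - y) ** 2))   # the single-pattern error falls
    u, w = train_step(x, t, u, w)
```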
An Example

[Figure: a 2-2-1 network. Inputs x_1, x_2 feed hidden units v_1, v_2 with
weights u_{11} = 2.0, u_{12} = 2.0, u_{21} = 0.8, u_{22} = 0.8 and biases
u_{10} = -1.0, u_{20} = -1.0; the hidden units feed the output y with
weights w_1 = 2.0, w_2 = -1.0 and bias w_0 = -1.0. Each bias weight
connects to a constant input of 1.]

The training set is the XOR function:

    x_1   x_2   target t
    0     0     0
    0     1     1
    1     0     1
    1     1     0

Present the pattern x_1 = 1, x_2 = 1, target t = 0:

    hidden:  v_1 = \mathrm{sig}(u_{11} x_1 + u_{12} x_2 + u_{10}) = 0.9526
             v_2 = \mathrm{sig}(u_{21} x_1 + u_{22} x_2 + u_{20}) = 0.6457
    output:  y = \mathrm{sig}(w_1 v_1 + w_2 v_2 + w_0) = 0.5645

    error:   E = \frac{1}{2} (t - y)^2 = 0.1593
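This sketch reproduces the forward pass, with the weights and pattern
exactly as in the figure:

```python
import numpy as np
sig = lambda a: 1.0 / (1.0 + np.exp(-a))

# Weights from the figure; pattern x1 = x2 = 1, target t = 0.
x1 = x2 = 1.0
t = 0.0
v1 = sig(2.0 * x1 + 2.0 * x2 - 1.0)    # sig(3.0)    ~0.9526
v2 = sig(0.8 * x1 + 0.8 * x2 - 1.0)    # sig(0.6)    ~0.6457
y  = sig(2.0 * v1 - 1.0 * v2 - 1.0)    # sig(0.2595) ~0.5645
E  = 0.5 * (t - y) ** 2                # ~0.1593
print(v1, v2, y, E)
```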
An Example: updating the weights

Learning constant \varepsilon = 1.0.

Output weights:

    \delta = (y - t) \, y (1 - y) = 0.1388

    \Delta w_0 = -\varepsilon \delta \cdot 1.0 = -0.1388
    \Delta w_1 = -\varepsilon \delta \, v_1 = -0.1322
    \Delta w_2 = -\varepsilon \delta \, v_2 = -0.0896

Hidden weights (to v_1):

    \Delta u_{10} = -\varepsilon \delta w_1 \, v_1 (1 - v_1) \cdot 1.0 = -0.0125
    \Delta u_{11} = -\varepsilon \delta w_1 \, v_1 (1 - v_1) \, x_1 = -0.0125
    \Delta u_{12} = -\varepsilon \delta w_1 \, v_1 (1 - v_1) \, x_2 = -0.0125

Hidden weights (to v_2):

    \Delta u_{20} = -\varepsilon \delta w_2 \, v_2 (1 - v_2) \cdot 1.0 = 0.0318
    \Delta u_{21} = -\varepsilon \delta w_2 \, v_2 (1 - v_2) \, x_1 = 0.0318
    \Delta u_{22} = -\varepsilon \delta w_2 \, v_2 (1 - v_2) \, x_2 = 0.0318

(The three updates to each hidden unit are equal here because
x_1 = x_2 = 1, the same value as the constant bias input.)
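And this sketch reproduces \delta and the weight changes; it recomputes the
forward pass so it runs on its own:

```python
import numpy as np
sig = lambda a: 1.0 / (1.0 + np.exp(-a))

# Forward pass for x1 = x2 = 1, t = 0, as on the previous slide.
x1 = x2 = 1.0
t, eps = 0.0, 1.0
w1, w2 = 2.0, -1.0
v1, v2 = sig(2.0 + 2.0 - 1.0), sig(0.8 + 0.8 - 1.0)
y = sig(w1 * v1 + w2 * v2 - 1.0)

delta = (y - t) * y * (1.0 - y)
print(delta)                                       # ~0.1388
print(-eps * delta * 1.0)                          # dw0 ~ -0.1388
print(-eps * delta * v1, -eps * delta * v2)        # dw1 ~ -0.1322, dw2 ~ -0.0896
print(-eps * delta * w1 * v1 * (1.0 - v1))         # du1k ~ -0.0125 (same for bias, x1, x2)
print(-eps * delta * w2 * v2 * (1.0 - v2))         # du2k ~  0.0318
```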
An Example: a New Error

[Figure: the same 2-2-1 network with the updated weights
u_{11} = u_{12} = 1.9875, u_{10} = -1.0125, u_{21} = u_{22} = 0.8318,
u_{20} = -0.9682, w_1 = 1.8678, w_2 = -1.0896, w_0 = -1.1388; the XOR
training set is unchanged.]

Presenting the same pattern x_1 = 1, x_2 = 1, target t = 0:

    hidden:  v_1 = \mathrm{sig}(u_{11} x_1 + u_{12} x_2 + u_{10}) = 0.9509
             v_2 = \mathrm{sig}(u_{21} x_1 + u_{22} x_2 + u_{20}) = 0.6672
    output:  y = \mathrm{sig}(w_1 v_1 + w_2 v_2 + w_0) = 0.4776

    error:   E = \frac{1}{2} (t - y)^2 = 0.1140

The error has reduced for this pattern.
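Re-running the forward pass with the updated weights confirms the error
reduction (small differences in the last decimal come from rounding the
weight changes to four places):

```python
import numpy as np
sig = lambda a: 1.0 / (1.0 + np.exp(-a))

# Old weights plus the deltas computed above.
u11 = u12 = 2.0 - 0.0125;  u10 = -1.0 - 0.0125
u21 = u22 = 0.8 + 0.0318;  u20 = -1.0 + 0.0318
w1, w2, w0 = 2.0 - 0.1322, -1.0 - 0.0896, -1.0 - 0.1388

v1 = sig(u11 + u12 + u10)              # ~0.951  (x1 = x2 = 1 again)
v2 = sig(u21 + u22 + u20)              # ~0.667
y  = sig(w1 * v1 + w2 * v2 + w0)       # ~0.478
print(0.5 * (0.0 - y) ** 2)            # ~0.114 < 0.1593: the error went down
```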
Summary

The credit-assignment problem is solved for hidden units: a hidden unit's
error term is assembled from the error terms of the units it feeds.

[Figure: errors flow backwards from output to input. A hidden unit's
\delta is computed from the output error terms \delta_1, \delta_2,
\delta_3, passed back through the weights w_1, w_2, w_3.]

    \delta_j = f'(e_j) \sum_i w_{ij} \, \delta_i

where e_j is the total input to unit j, and f' is the first derivative of
the activation function (the sigmoid).

Outstanding issues:

1. Number of layers; number and type of units in a layer
2. Learning rates
3. Local or distributed representations
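As a closing sketch, the \delta-recursion above generalises to any stack of
sigmoid layers. The helper below is a sketch under stated assumptions
(biases omitted, the network given as a made-up list of NumPy weight
matrices): it applies \delta_j = f'(e_j) \sum_i w_{ij} \delta_i once per
layer, from the output back towards the input.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, t, weights, eps=0.5):
    # Forward pass, keeping every layer's activations.
    acts = [x]
    for W in weights:
        acts.append(sigmoid(W @ acts[-1]))
    # Output layer: delta = (y - t) f'(e), with f'(e) = y (1 - y).
    delta = (acts[-1] - t) * acts[-1] * (1.0 - acts[-1])
    updated = []
    for W, a in zip(reversed(weights), reversed(acts[:-1])):
        updated.append(W - eps * np.outer(delta, a))      # gradient-descent update
        delta = (W.T @ delta) * a * (1.0 - a)             # delta_j = f'(e_j) sum_i w_ij delta_i
    return updated[::-1]

# Example: a made-up net with two hidden layers, trained on one pattern.
rng = np.random.default_rng(3)
weights = [rng.normal(size=(4, 2)), rng.normal(size=(3, 4)), rng.normal(size=(1, 3))]
x, t = np.array([0.2, 0.7]), np.array([1.0])
for _ in range(3):
    y = x
    for W in weights:
        y = sigmoid(W @ y)
    print(0.5 * np.sum((t - y) ** 2))   # watch the single-pattern error decrease
    weights = backprop_step(x, t, weights)
```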
