The Back Propagation Learning Algorithm

- For networks with hidden units.
- An error-correcting algorithm.
- Solves the credit (blame) assignment problem.
What is supervised learning?

Can we teach a network to learn to associate a pattern of inputs with
corresponding outputs? That is, given an initial set of weights, how can
they be adapted to produce the desired outputs? Use a training set:

[Figure: scatter plot of payment against workload. Training examples a-d
are plotted with known outcomes; e and f are new points whose outputs are
unknown, marked "?".]

    person   workload   pay    P(happy)
    a        0.1        0.9    0.95
    b        0.3        0.7    0.8
    c        0.07       0.2    0.2
    d        0.9        0.9    0.3
    e        0.7        0.5    ??
    f        0.4        0.8    ??

After training, how does the network generalise to patterns unseen during
learning?
Learning by Error Correction

In the perceptron there is a binary-valued output y and a target t.

[Figure: a perceptron with inputs x_1, x_2, \dots, x_N, weights
w_1, w_2, \dots, w_N, output y and target t; the step activation jumps
from 0 to 1 at \sum_i w_i x_i = 0.]

    y = \mathrm{step}\left( \sum_{i=0}^{N} w_i x_i \right)

Define this error measure:

    E = \frac{1}{2} (t - y)^2

Since (t - y)^2 is 1 for a wrong binary output and 0 for a correct one,
summing E over the training patterns counts the incorrect outputs (up to
the factor 1/2).

We want to design a weight-changing procedure that minimises E.
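To make the error measure concrete, here is a minimal Python sketch using
NumPy. The OR data set and the two weight vectors are made up for
illustration; the point is only that E = \frac{1}{2} \sum_p (t_p - y_p)^2
counts wrong outputs for a step-unit perceptron.

```python
import numpy as np

def step(a):
    # Threshold unit: 1 if the weighted sum is positive, else 0.
    return 1.0 if a > 0 else 0.0

def perceptron_error(w, X, T):
    # E = 1/2 sum_p (t_p - y_p)^2; with binary y and t each wrong
    # output contributes exactly 1/2.
    return 0.5 * sum((t - step(w @ x)) ** 2 for x, t in zip(X, T))

# Made-up data: column 0 is a constant bias input, so w[0] acts as w_0.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
T = np.array([0, 1, 1, 1], dtype=float)   # the OR function
print(perceptron_error(np.array([-0.5, 1.0, 1.0]), X, T))   # 0.0: all correct
print(perceptron_error(np.array([-0.5, -1.0, 1.0]), X, T))  # 1.0: two wrong
```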
Learning by Error Correction

How do we change the weights w_0, w_1, \dots, w_N so that the error E
decreases?

[Figure: the error E plotted against a weight w_i; the slope is negative
to the left of the minimum and positive to the right.]

If we could measure the slope

    \frac{\partial E}{\partial w_i}

then changing each weight by the negative of the slope would minimise E:

    slope +ve  =>  \Delta w_i  -ve
    slope -ve  =>  \Delta w_i  +ve

Either way we move towards the minimum of E.
More Perceptron Problems

For the perceptron, E cannot be differentiated with respect to the weights
w_0, w_1, \dots, w_N, because E involves the output y, which is not a
differentiable function of the weights:

    E = \frac{1}{2} (t - y)^2,
    y = \mathrm{step}\left( \sum_{i=0}^{N} w_i x_i \right)

Threshold unit:

    y = 1 if \sum_{i=0}^{N} w_i x_i \geq 0, and y = 0 otherwise

[Figure: y jumps abruptly from 0 to 1 at \sum_i w_i x_i = 0.]

Sigmoid unit:

    y = \frac{1}{1 + \exp\left( -\sum_{i=0}^{N} w_i x_i \right)}

[Figure: y rises smoothly from 0 to 1 as \sum_i w_i x_i increases.]
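A short sketch of the two activation functions, using NumPy: it evaluates
both on a few points to show the step's abrupt jump against the sigmoid's
smooth, differentiable rise.

```python
import numpy as np

def step(a):
    return (a > 0).astype(float)        # jumps from 0 to 1: no usable slope

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))    # smooth: differentiable everywhere

a = np.linspace(-6.0, 6.0, 7)          # [-6, -4, -2, 0, 2, 4, 6]
print(step(a))                          # [0. 0. 0. 0. 1. 1. 1.]
print(np.round(sigmoid(a), 3))          # [0.002 0.018 0.119 0.5 0.881 0.982 0.998]
```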
Gradient Descent

[Figure: with a sigmoid unit, the error E is a smooth curve in w_i, with a
negative slope to the left of the minimum and a positive slope to the
right.]

The error E is now a differentiable function of the weights. Change the
weights using the negative slope:

    \Delta w_i = -\varepsilon \frac{\partial E}{\partial w_i}

    \frac{\partial E}{\partial w_i} +ve  =>  \Delta w_i  -ve
    \frac{\partial E}{\partial w_i} -ve  =>  \Delta w_i  +ve

Either way we move towards the minimum of E. This approach is called
gradient descent.
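As an illustration only, here is gradient descent on a made-up
one-dimensional error surface E(w) = (w - 3)^2; the update rule
\Delta w = -\varepsilon \, dE/dw is exactly the one applied to network
weights below.

```python
# Toy error surface E(w) = (w - 3)^2 with slope dE/dw = 2 (w - 3).
eps = 0.1                     # learning constant
w = 0.0
for _ in range(50):
    slope = 2.0 * (w - 3.0)
    w -= eps * slope          # slope +ve -> w decreases; slope -ve -> w increases
print(w)                      # ~3.0, the minimum of E
```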
Derivation of Back Propagation

[Figure: a two-layer network. Inputs x_1, \dots, x_k, \dots, x_N feed
hidden units v_1, \dots, v_j, \dots, v_N through weights u_{jk}; the
hidden units feed outputs y_1, \dots, y_i, \dots, y_N through weights
w_{ij}.]

    output:  y_i = \mathrm{sig}\left( \sum_j w_{ij} v_j \right)
    hidden:  v_j = \mathrm{sig}\left( \sum_k u_{jk} x_k \right)
    error:   E = \frac{1}{2} \sum_p \sum_i (t_i - y_i)^2

We need to find the derivatives of E with respect to the weights w_{ij}
and u_{jk}.
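A minimal NumPy sketch of this forward pass. It assumes no bias terms (or,
equivalently, biases folded in as a constant input, as the worked example
later does); the sizes and random weights are made up, and u[j, k] and
w[i, j] follow the slide's indexing.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, u, w):
    # v_j = sig(sum_k u_jk x_k);  y_i = sig(sum_j w_ij v_j)
    v = sigmoid(u @ x)
    y = sigmoid(w @ v)
    return v, y

def error(t, y):
    # single-pattern error E = 1/2 sum_i (t_i - y_i)^2
    return 0.5 * np.sum((t - y) ** 2)

# Made-up sizes and weights: 2 inputs, 3 hidden units, 1 output.
rng = np.random.default_rng(0)
u = rng.normal(size=(3, 2))   # u[j, k]: input k -> hidden j
w = rng.normal(size=(1, 3))   # w[i, j]: hidden j -> output i
v, y = forward(np.array([0.5, -0.2]), u, w)
print(error(np.array([1.0]), y))
```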
Preliminaries

On a single pattern (drop the pattern index p):

    E = \frac{1}{2} \sum_i (t_i - y_i)^2

and

    y_i = \frac{1}{1 + \exp\left( -\sum_j w_{ij} v_j \right)}

Note that:

    \frac{\partial y_i}{\partial v_j} = y_i (1 - y_i) \, w_{ij}

    \frac{\partial y_i}{\partial w_{ij}} = y_i (1 - y_i) \, v_j

since if

    y = \frac{1}{1 + \exp(-x)}

then

    \frac{dy}{dx} = y (1 - y)
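The identity dy/dx = y(1 - y) is easy to confirm numerically; this small
sketch compares the analytic form with a central finite difference at a
few arbitrary points.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

for x in (-2.0, 0.0, 1.5):
    y = sigmoid(x)
    analytic = y * (1.0 - y)                                  # y (1 - y)
    h = 1e-6
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)   # central difference
    print(f"{x:+.1f}  {analytic:.8f}  {numeric:.8f}")         # the two columns agree
```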
Between Hidden and Output: \partial E / \partial w_{ij}

For weights between hidden units and output units. Only the i-th output
depends on w_{ij}, so only one term of the sum over outputs survives:

    E = \frac{1}{2} (t_i - y_i)^2

    \frac{\partial E}{\partial w_{ij}} =
        \frac{\partial E}{\partial y_i} \frac{\partial y_i}{\partial w_{ij}}

    \frac{\partial E}{\partial y_i} = (y_i - t_i)

    \frac{\partial y_i}{\partial w_{ij}} = y_i (1 - y_i) \, v_j

    \frac{\partial E}{\partial w_{ij}} = (y_i - t_i) \, y_i (1 - y_i) \, v_j

Call the factor (y_i - t_i) \, y_i (1 - y_i) the output error term \delta_i.
Between Input and Hidden: \partial E / \partial u_{jk}

For weights between input units and hidden units. Every output depends on
u_{jk} through v_j, so here we sum over the outputs:

    \frac{\partial E}{\partial u_{jk}} = \sum_i
        \frac{\partial E}{\partial y_i}
        \frac{\partial y_i}{\partial v_j}
        \frac{\partial v_j}{\partial u_{jk}}

    \frac{\partial E}{\partial y_i} = (y_i - t_i)

    \frac{\partial y_i}{\partial v_j} = y_i (1 - y_i) \, w_{ij}

    \frac{\partial v_j}{\partial u_{jk}} = v_j (1 - v_j) \, x_k

    \frac{\partial E}{\partial u_{jk}} =
        \sum_i (y_i - t_i) \, y_i (1 - y_i) \, w_{ij} \; v_j (1 - v_j) \, x_k

    \frac{\partial E}{\partial u_{jk}} =
        \left( \sum_i \delta_i w_{ij} \right) v_j (1 - v_j) \, x_k
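Both derivative formulas can be checked against finite differences. A
sketch under made-up conditions (random weights, one output unit, biases
omitted):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, u, w):
    v = sigmoid(u @ x)
    return v, sigmoid(w @ v)

def grads(x, t, u, w):
    # The two slide formulas, vectorised over units.
    v, y = forward(x, u, w)
    delta = (y - t) * y * (1.0 - y)                     # delta_i = (y_i - t_i) y_i (1 - y_i)
    dE_dw = np.outer(delta, v)                          # delta_i v_j
    dE_du = np.outer((w.T @ delta) * v * (1.0 - v), x)  # (sum_i delta_i w_ij) v_j (1 - v_j) x_k
    return dE_dw, dE_du

rng = np.random.default_rng(1)
x, t = np.array([0.3, 0.9]), np.array([1.0])
u, w = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
_, dE_du = grads(x, t, u, w)

# Finite-difference check of dE/du_jk.
h, num = 1e-6, np.zeros_like(u)
for j in range(u.shape[0]):
    for k in range(u.shape[1]):
        up, dn = u.copy(), u.copy()
        up[j, k] += h
        dn[j, k] -= h
        num[j, k] = (0.5 * np.sum((t - forward(x, up, w)[1]) ** 2)
                     - 0.5 * np.sum((t - forward(x, dn, w)[1]) ** 2)) / (2.0 * h)
print(np.max(np.abs(num - dE_du)))   # tiny (~1e-9): the formula matches
```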
Between Hidden and Output: \Delta w_{ij}

Modifying weights between hidden units and output units using gradient
descent:

    \Delta w_{ij} = -\varepsilon \frac{\partial E}{\partial w_{ij}}
                  = -\varepsilon \, (y_i - t_i) \, y_i (1 - y_i) \, v_j

Reading the factors left to right: \varepsilon is the learning constant,
(y_i - t_i) is the error, y_i (1 - y_i) is small for y_i close to 0 or 1,
and v_j is the "input" carried by the weight. The product
(y_i - t_i) \, y_i (1 - y_i) is the error term \delta_i defined above.
Between Input and Hidden: \Delta u_{jk}

Modifying weights between input units and hidden units using gradient
descent:

    \Delta u_{jk} = -\varepsilon \frac{\partial E}{\partial u_{jk}}
                  = -\varepsilon \left( \sum_i \delta_i w_{ij} \right)
                    v_j (1 - v_j) \, x_k

The output error terms \delta_i are passed backwards through the weights
w_{ij}: this is the back propagation of error. The same procedure is
applicable to a net with many hidden layers.
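Putting the two update rules together gives one complete training step. A
sketch assuming a single pattern, sigmoid units, and no bias terms; the
data and seed are made up.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_step(x, t, u, w, eps=1.0):
    # Forward pass.
    v = sigmoid(u @ x)
    y = sigmoid(w @ v)
    # Output error terms, then the two gradient-descent updates.
    delta = (y - t) * y * (1.0 - y)
    dw = -eps * np.outer(delta, v)                          # hidden -> output
    du = -eps * np.outer((w.T @ delta) * v * (1.0 - v), x)  # input -> hidden (back-propagated)
    return u + du, w + dw

# A few repeated presentations of one made-up pattern.
rng = np.random.default_rng(2)
u, w = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
x, t = np.array([1.0, 0.0]), np.array([1.0])
for _ in range(5):
    y = sigmoid(w @ sigmoid(u @ x))
    print(0.5 * np.sum((t - y) ** 2))   # the single-pattern error falls
    u, w = train_step(x, t, u, w)
```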
An Example

[Figure: a 2-2-1 network. Inputs x_1, x_2 feed hidden units v_1, v_2 with
weights u_{11} = 2.0, u_{12} = 2.0, u_{21} = 0.8, u_{22} = 0.8 and biases
u_{10} = -1.0, u_{20} = -1.0; the hidden units feed the output y with
weights w_1 = 2.0, w_2 = -1.0 and bias w_0 = -1.0. Each bias weight
connects to a constant input of 1.]

The training set is the XOR function:

    x_1   x_2   target t
    0     0     0
    0     1     1
    1     0     1
    1     1     0

Present the pattern x_1 = 1, x_2 = 1, target t = 0:

    hidden:  v_1 = \mathrm{sig}(u_{11} x_1 + u_{12} x_2 + u_{10}) = 0.9526
             v_2 = \mathrm{sig}(u_{21} x_1 + u_{22} x_2 + u_{20}) = 0.6457
    output:  y = \mathrm{sig}(w_1 v_1 + w_2 v_2 + w_0) = 0.5645

    error:   E = \frac{1}{2} (t - y)^2 = 0.1593
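This sketch reproduces the forward pass, with the weights and pattern
exactly as in the figure:

```python
import numpy as np
sig = lambda a: 1.0 / (1.0 + np.exp(-a))

# Weights from the figure; pattern x1 = x2 = 1, target t = 0.
x1 = x2 = 1.0
t = 0.0
v1 = sig(2.0 * x1 + 2.0 * x2 - 1.0)    # sig(3.0)    ~0.9526
v2 = sig(0.8 * x1 + 0.8 * x2 - 1.0)    # sig(0.6)    ~0.6457
y  = sig(2.0 * v1 - 1.0 * v2 - 1.0)    # sig(0.2595) ~0.5645
E  = 0.5 * (t - y) ** 2                # ~0.1593
print(v1, v2, y, E)
```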
An Example: updating the weights

Learning constant \varepsilon = 1.0.

Output weights:

    \delta = (y - t) \, y (1 - y) = 0.1388

    \Delta w_0 = -\varepsilon \delta \cdot 1.0 = -0.1388
    \Delta w_1 = -\varepsilon \delta \, v_1 = -0.1322
    \Delta w_2 = -\varepsilon \delta \, v_2 = -0.0896

Hidden weights (to v_1):

    \Delta u_{10} = -\varepsilon \delta w_1 \, v_1 (1 - v_1) \cdot 1.0 = -0.0125
    \Delta u_{11} = -\varepsilon \delta w_1 \, v_1 (1 - v_1) \, x_1 = -0.0125
    \Delta u_{12} = -\varepsilon \delta w_1 \, v_1 (1 - v_1) \, x_2 = -0.0125

Hidden weights (to v_2):

    \Delta u_{20} = -\varepsilon \delta w_2 \, v_2 (1 - v_2) \cdot 1.0 = 0.0318
    \Delta u_{21} = -\varepsilon \delta w_2 \, v_2 (1 - v_2) \, x_1 = 0.0318
    \Delta u_{22} = -\varepsilon \delta w_2 \, v_2 (1 - v_2) \, x_2 = 0.0318

(The three updates to each hidden unit are equal here because
x_1 = x_2 = 1, the same value as the constant bias input.)
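And this sketch reproduces \delta and the weight changes; it recomputes the
forward pass so it runs on its own:

```python
import numpy as np
sig = lambda a: 1.0 / (1.0 + np.exp(-a))

# Forward pass for x1 = x2 = 1, t = 0, as on the previous slide.
x1 = x2 = 1.0
t, eps = 0.0, 1.0
w1, w2 = 2.0, -1.0
v1, v2 = sig(2.0 + 2.0 - 1.0), sig(0.8 + 0.8 - 1.0)
y = sig(w1 * v1 + w2 * v2 - 1.0)

delta = (y - t) * y * (1.0 - y)
print(delta)                                       # ~0.1388
print(-eps * delta * 1.0)                          # dw0 ~ -0.1388
print(-eps * delta * v1, -eps * delta * v2)        # dw1 ~ -0.1322, dw2 ~ -0.0896
print(-eps * delta * w1 * v1 * (1.0 - v1))         # du1k ~ -0.0125 (same for bias, x1, x2)
print(-eps * delta * w2 * v2 * (1.0 - v2))         # du2k ~  0.0318
```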
An Example: a New Error

[Figure: the same 2-2-1 network with the updated weights
u_{11} = u_{12} = 1.9875, u_{10} = -1.0125, u_{21} = u_{22} = 0.8318,
u_{20} = -0.9682, w_1 = 1.8678, w_2 = -1.0896, w_0 = -1.1388; the XOR
training set is unchanged.]

Presenting the same pattern x_1 = 1, x_2 = 1, target t = 0:

    hidden:  v_1 = \mathrm{sig}(u_{11} x_1 + u_{12} x_2 + u_{10}) = 0.9509
             v_2 = \mathrm{sig}(u_{21} x_1 + u_{22} x_2 + u_{20}) = 0.6672
    output:  y = \mathrm{sig}(w_1 v_1 + w_2 v_2 + w_0) = 0.4776

    error:   E = \frac{1}{2} (t - y)^2 = 0.1140

The error has reduced for this pattern.
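Re-running the forward pass with the updated weights confirms the error
reduction (small differences in the last decimal come from rounding the
weight changes to four places):

```python
import numpy as np
sig = lambda a: 1.0 / (1.0 + np.exp(-a))

# Old weights plus the deltas computed above.
u11 = u12 = 2.0 - 0.0125;  u10 = -1.0 - 0.0125
u21 = u22 = 0.8 + 0.0318;  u20 = -1.0 + 0.0318
w1, w2, w0 = 2.0 - 0.1322, -1.0 - 0.0896, -1.0 - 0.1388

v1 = sig(u11 + u12 + u10)              # ~0.951  (x1 = x2 = 1 again)
v2 = sig(u21 + u22 + u20)              # ~0.667
y  = sig(w1 * v1 + w2 * v2 + w0)       # ~0.478
print(0.5 * (0.0 - y) ** 2)            # ~0.114 < 0.1593: the error went down
```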
Summary

The credit-assignment problem is solved for hidden units: a hidden unit's
error term is assembled from the error terms of the units it feeds.

[Figure: errors flow backwards from output to input. A hidden unit's
\delta is computed from the output error terms \delta_1, \delta_2,
\delta_3, passed back through the weights w_1, w_2, w_3.]

    \delta_j = f'(e_j) \sum_i w_{ij} \, \delta_i

where e_j is the total input to unit j, and f' is the first derivative of
the activation function (the sigmoid).

Outstanding issues:

1. Number of layers; number and type of units in a layer
2. Learning rates
3. Local or distributed representations
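As a closing sketch, the \delta-recursion above generalises to any stack of
sigmoid layers. The helper below is a sketch under stated assumptions
(biases omitted, the network given as a made-up list of NumPy weight
matrices): it applies \delta_j = f'(e_j) \sum_i w_{ij} \delta_i once per
layer, from the output back towards the input.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, t, weights, eps=0.5):
    # Forward pass, keeping every layer's activations.
    acts = [x]
    for W in weights:
        acts.append(sigmoid(W @ acts[-1]))
    # Output layer: delta = (y - t) f'(e), with f'(e) = y (1 - y).
    delta = (acts[-1] - t) * acts[-1] * (1.0 - acts[-1])
    updated = []
    for W, a in zip(reversed(weights), reversed(acts[:-1])):
        updated.append(W - eps * np.outer(delta, a))      # gradient-descent update
        delta = (W.T @ delta) * a * (1.0 - a)             # delta_j = f'(e_j) sum_i w_ij delta_i
    return updated[::-1]

# Example: a made-up net with two hidden layers, trained on one pattern.
rng = np.random.default_rng(3)
weights = [rng.normal(size=(4, 2)), rng.normal(size=(3, 4)), rng.normal(size=(1, 3))]
x, t = np.array([0.2, 0.7]), np.array([1.0])
for _ in range(3):
    y = x
    for W in weights:
        y = sigmoid(W @ y)
    print(0.5 * np.sum((t - y) ** 2))   # watch the single-pattern error decrease
    weights = backprop_step(x, t, weights)
```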
