Back Propagation of Error in Neural Network
Subject: Machine Learning
Dr. Varun Kumar
Outlines
1 Introduction to Back Propagation Algorithm
2 Steps for Solving Back Propagation Problem
3 References
Introduction to Back Propagation Algorithm
Key Features of Back Propagation Algorithm:
1 Back-propagation is the essence of neural net training.
2 It is the practice of fine-tuning the weights of a neural net based on
the error rate (i.e. loss) obtained in the previous epoch (i.e. iteration).
3 Proper tuning of the weights ensures lower error rates, making the
model reliable by increasing its generalization.
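To make this concrete, the following minimal sketch (not from the slides; the function name and numbers are illustrative) shows the single update that back-propagation applies to every weight: move the weight against the gradient of the error, scaled by the learning rate η.

```python
def update_weight(w_old, dE_dw, eta=0.6):
    """Gradient-descent update used in back-propagation: w* = w_old - eta * dE/dw."""
    return w_old - eta * dE_dw

# Example: a weight of 0.40 with error gradient 0.085 and learning rate 0.6
print(update_weight(0.40, 0.085))  # 0.349
```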
Example
A simple neural network that has two inputs (x1, x2) and two outputs (O1, O2). There is a single hidden layer between the input and the output.
Q: What will be the updated weights w1*, w2*, ..., w8* so that the total error falls below 0.001?
Solution
⇒ As per the given question.
Input x1 = 0.05 and x2 = 0.10
Bias b1 = 0.35 and b2 = 0.60
Desired/target outputs O1(d) = 0.01 and O2(d) = 0.99
Learning rate η = 0.6
Initial weight w1 = 0.15, w2 = 0.25, w3 = 0.20, w4 = 0.30, w5 = 0.40,
w6 = 0.50, w7 = 0.45, w8 = 0.55
Condition: ETotal ≤ 0.001
Unknowns: w1*, w2*, ..., w8*
⇒ From the above figure, Input → Hidden layer
h1(in) = w1x1 + w2x2 + b1 (1)
h2(in) = w3x1 + w4x2 + b1 (2)
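As an illustrative sketch (not part of the slides; variable names are my own), Eqs. (1)-(2) can be evaluated directly from the given values:

```python
# Given values from the problem statement
x1, x2 = 0.05, 0.10
b1 = 0.35
w1, w2, w3, w4 = 0.15, 0.25, 0.20, 0.30

# Eq. (1) and Eq. (2): pre-activations of the two hidden nodes
h1_in = w1 * x1 + w2 * x2 + b1
h2_in = w3 * x1 + w4 * x2 + b1
print(h1_in, h2_in)
```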
Continued–
Hidden Layer → O(in)
O1(in) = w5h1(out) + w6h2(out) + b2 (3)
O2(in) = w7h1(out) + w8h2(out) + b2 (4)
Relation between h1(in) and h1(out), or h2(in) and h2(out)
Here, we use the sigmoid function as the activation function:
h1(out) = 1/(1 + e^(−h1(in))) or h2(out) = 1/(1 + e^(−h2(in))) (5)
Similarly, we can use the same activation function to find the relation between O1(in) and O1(out)
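A small sketch of the sigmoid activation of Eq. (5), applied to the hidden-layer pre-activation values used in the worked example (illustrative only):

```python
import math

def sigmoid(z):
    """Sigmoid activation of Eq. (5): 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

h1_out = sigmoid(0.385)  # h1(in) value used in the worked example
h2_out = sigmoid(0.390)  # h2(in) value used in the worked example
print(h1_out, h2_out)    # approximately 0.595 and 0.596
```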
Note: As per the given question
O1(d) = 0.01 → Desired output across 1st node
O2(d) = 0.99 → Desired output across 2nd node
Continued–
From (1)
⇒ h1(in) = w1x1 + w2x2 + b1 = 0.15 × 0.05 + 0.25 × 0.10 + 0.35 = 0.385
⇒ h2(in) = w3x1 + w4x2 + b1 = 0.20 × 0.05 + 0.30 × 0.10 + 0.35 = 0.390
⇒ h1(out) = 1/(1 + e^(−h1(in))) = 0.59508
⇒ h2(out) = 1/(1 + e^(−h2(in))) = 0.59628
⇒ O1(in) = w5h1(in) + w6h2(in) + b2 = 0.40 × 0.385 + 0.50 × 0.390 + 0.60 = 0.949
⇒ O2(in) = w7h1(in) + w8h2(in) + b2 = 0.45 × 0.385 + 0.55 × 0.390 + 0.60 = 0.98775
⇒ O1(out) = 1/(1 + e^(−O1(in))) = 0.72091 = Ô1 → Observed 1st output
⇒ O2(out) = 1/(1 + e^(−O2(in))) = 0.72864 = Ô2 → Observed 2nd output
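The output-layer step of Eqs. (3)-(5) can be wrapped in a small helper; this is only a sketch, and the function name is illustrative. Feeding in the hidden-layer outputs together with the given w5-w8 and b2 yields the observed outputs Ô1 and Ô2.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def output_layer(h1_out, h2_out, w5, w6, w7, w8, b2):
    """Eqs. (3)-(4) followed by the sigmoid activation of Eq. (5)."""
    O1_in = w5 * h1_out + w6 * h2_out + b2
    O2_in = w7 * h1_out + w8 * h2_out + b2
    return sigmoid(O1_in), sigmoid(O2_in)
```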
Continued–
⇒ Error observed across the 1st output node:
EO1 = (1/2)(Ô1 − O1(d))² = (1/2)(0.72091 − 0.01)² = 0.2527
⇒ Error observed across the 2nd output node:
EO2 = (1/2)(Ô2 − O2(d))² = (1/2)(0.72864 − 0.99)² = 0.0342
⇒ Total error Etot = EO1 + EO2 = 0.2869
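The per-node squared error and the total error above can be expressed as a short helper (an illustrative sketch; names are not from the slides):

```python
def node_error(o_hat, o_desired):
    """Per-node squared error: E = (1/2) * (o_hat - o_desired)^2."""
    return 0.5 * (o_hat - o_desired) ** 2

E_O1 = node_error(0.72091, 0.01)   # error at the 1st output node
E_O2 = node_error(0.72864, 0.99)   # error at the 2nd output node
E_tot = E_O1 + E_O2                # total error
print(E_O1, E_O2, E_tot)
```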
Weight update
Weight update for the hidden-layer → O(in) links
⇒ Weight update for w5
∂Etot/∂w5 = ∂Etot/∂O1(out) × ∂O1(out)/∂O1(in) × ∂O1(in)/∂w5 (6)
Continued–
⇒ ∂Etot/∂O1(out) = (Ô1 − O1(d)) = 0.72091 − 0.01 = 0.71091
⇒ ∂O1(out)/∂O1(in) = Ô1(1 − Ô1) = 0.72091(1 − 0.72091) = 0.2012
⇒ ∂O1(in)/∂w5 = h1(out) = 0.59508
⇒ New updated weight for w5:
w5* = w5(old) − η ∂Etot/∂w5 = 0.40 − 0.6 × 0.085117 = 0.34892
⇒ Note: In a similar fashion, w6*, w7*, and w8* can also be calculated for a single iteration, i.e.,
⇒ w6* = w6(old) − η ∂Etot/∂w6
⇒ w7* = w7(old) − η ∂Etot/∂w7
⇒ w8* = w8(old) − η ∂Etot/∂w8
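A short sketch of the w5 update of Eq. (6) with the values above (variable names are illustrative); w6*, w7*, and w8* follow the same pattern.

```python
eta = 0.6
w5 = 0.40
h1_out = 0.59508
O1_hat, O1_d = 0.72091, 0.01

# Three factors of the chain rule in Eq. (6)
dEtot_dO1out = O1_hat - O1_d              # = 0.71091
dO1out_dO1in = O1_hat * (1.0 - O1_hat)    # sigmoid derivative, = 0.2012
dO1in_dw5 = h1_out                        # = 0.59508

dEtot_dw5 = dEtot_dO1out * dO1out_dO1in * dO1in_dw5
w5_new = w5 - eta * dEtot_dw5             # updated weight w5*
print(dEtot_dw5, w5_new)                  # approximately 0.0851 and 0.3489
```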
Continued–
Weight update for the input → hidden-layer links, i.e. w1*, w2*, w3*, w4*
Weight update rule for w1
∂Etot/∂w1 = ∂Etot/∂h1(out) × ∂h1(out)/∂h1(in) × ∂h1(in)/∂w1 (7)
⇒ ∂Etot/∂h1(out) = ∂EO1/∂h1(out) + ∂EO2/∂h1(out)
⇒ ∂EO1/∂h1(out) = ∂EO1/∂O1(out) × ∂O1(out)/∂O1(in) × ∂O1(in)/∂h1(out) = (Ô1 − O1(d)) × Ô1(1 − Ô1) × w5
⇒ ∂EO2/∂h1(out) = ∂EO2/∂O2(out) × ∂O2(out)/∂O2(in) × ∂O2(in)/∂h1(out) = (Ô2 − O2(d)) × Ô2(1 − Ô2) × w7
⇒ ∂Etot/∂h1(out) = (Ô1 − O1(d)) × Ô1(1 − Ô1) × w5 + (Ô2 − O2(d)) × Ô2(1 − Ô2) × w7
⇒ ∂h1(out)/∂h1(in) = h1(out)(1 − h1(out)) = 0.59508(1 − 0.59508)
⇒ ∂h1(in)/∂w1 = x1 = 0.05
w1* = w1(old) − η ∂Etot/∂w1 (8)
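Putting Eqs. (7)-(8) together, here is an illustrative sketch of the w1 update (names are my own; the same pattern gives w2*, w3*, and w4*):

```python
eta = 0.6
x1 = 0.05
w1, w5, w7 = 0.15, 0.40, 0.45
h1_out = 0.59508
O1_hat, O1_d = 0.72091, 0.01
O2_hat, O2_d = 0.72864, 0.99

# Error flowing back to h1(out) from both output nodes
dE1_dh1out = (O1_hat - O1_d) * O1_hat * (1.0 - O1_hat) * w5
dE2_dh1out = (O2_hat - O2_d) * O2_hat * (1.0 - O2_hat) * w7
dEtot_dh1out = dE1_dh1out + dE2_dh1out

# Remaining factors of Eq. (7)
dh1out_dh1in = h1_out * (1.0 - h1_out)   # sigmoid derivative
dh1in_dw1 = x1

dEtot_dw1 = dEtot_dh1out * dh1out_dh1in * dh1in_dw1
w1_new = w1 - eta * dEtot_dw1            # Eq. (8)
print(dEtot_dw1, w1_new)
```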
Continued–
⇒ Note: In a similar fashion, w2*, w3*, and w4* can also be calculated for a single iteration, i.e.,
⇒ w2* = w2(old) − η ∂Etot/∂w2
⇒ w3* = w3(old) − η ∂Etot/∂w3
⇒ w4* = w4(old) − η ∂Etot/∂w4
Note: This iterative process continues until the total error falls below the specified threshold (here, ETotal ≤ 0.001).
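To make the stopping rule concrete, the following compact sketch (not from the slides) repeats the forward and backward passes of Eqs. (1)-(8), updating all eight weights each iteration, until the total error drops below the threshold.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Given data from the problem statement
x1, x2 = 0.05, 0.10
b1, b2 = 0.35, 0.60
t1, t2 = 0.01, 0.99                                    # desired outputs O1(d), O2(d)
eta = 0.6
w = [0.15, 0.25, 0.20, 0.30, 0.40, 0.50, 0.45, 0.55]   # initial w1..w8

E_tot, iterations = 1.0, 0
while E_tot > 0.001:
    w1, w2, w3, w4, w5, w6, w7, w8 = w

    # Forward pass, Eqs. (1)-(5)
    h1 = sigmoid(w1 * x1 + w2 * x2 + b1)
    h2 = sigmoid(w3 * x1 + w4 * x2 + b1)
    o1 = sigmoid(w5 * h1 + w6 * h2 + b2)
    o2 = sigmoid(w7 * h1 + w8 * h2 + b2)
    E_tot = 0.5 * (o1 - t1) ** 2 + 0.5 * (o2 - t2) ** 2

    # Backward pass: output-layer and hidden-layer error terms
    d1 = (o1 - t1) * o1 * (1 - o1)
    d2 = (o2 - t2) * o2 * (1 - o2)
    dh1 = (d1 * w5 + d2 * w7) * h1 * (1 - h1)
    dh2 = (d1 * w6 + d2 * w8) * h2 * (1 - h2)

    # Gradient-descent updates for all eight weights, Eqs. (6) and (8)
    w = [w1 - eta * dh1 * x1, w2 - eta * dh1 * x2,
         w3 - eta * dh2 * x1, w4 - eta * dh2 * x2,
         w5 - eta * d1 * h1,  w6 - eta * d1 * h2,
         w7 - eta * d2 * h1,  w8 - eta * d2 * h2]
    iterations += 1

print(iterations, E_tot, w)
```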
References
E. Alpaydin, Introduction to Machine Learning. MIT Press, 2020.
J. Grus, Data Science from Scratch: First Principles with Python. O'Reilly Media, 2019.
T. M. Mitchell, The Discipline of Machine Learning. Carnegie Mellon University, School of Computer Science, 2006, vol. 9.