Neural Network - Forward and Back Propagation, Gradient Descent
Visualize Forward and Back Propagation in a simple Neural Network, using the IBM Rational
Rhapsody MDA tool to design the Neural Network and execute the model. Minimize the
Cost (Loss) using Gradient Descent.
The Problem: wX + b = Y
•Question: Given a set of X inputs, a set of
Y targets (expected, desired), and initial
values for the Weights, can the Neural
Network model compute outputs,
Yhat (actual, computed, predicted), that
match the Y targets within a predefined
error (i.e. Y – Yhat <= 10E-6)?
The Problem: wX + b = Y (cont.)
•Answer: Yes! Use the gradient descent
algorithm to find the minimum of a
function (a minimal sketch follows below).
•Use the Forward Propagation (FP) and Back
Propagation (BP) capability of the NN
model to minimize the Cost (Loss) of the
NN and thus find the right weights.
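To make the idea concrete, here is a minimal, self-contained sketch of gradient descent on a one-variable quadratic cost. The function, learning rate, and tolerance are illustrative and are not part of the Rhapsody model.

// Illustrative only: gradient descent on Cost(w) = (w - 3)^2, whose minimum is at w = 3.
#include <cstdio>
#include <cmath>

int main() {
    double w = 5.0;              // initial guess
    const double lr = 0.1;       // learning rate (step size)
    // dCost/dw = 2 * (w - 3); step against the gradient until it is nearly zero
    for (int i = 0; i < 100 && std::fabs(2.0 * (w - 3.0)) > 1e-6; ++i) {
        w -= lr * 2.0 * (w - 3.0);
    }
    std::printf("w converged to %f\n", w);
    return 0;
}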
The Problem: wX + b = Y (cont.)
•In FP, compute the Outputs for each
neuron in each Hidden Layer
•the output of the Input Layer is the
transferred input, X, and does not
need to be computed
•compute the Cost at the Output Layer
The Problem: wX + b = Y (cont.)
•In BP, compute the Gradients and update the
Weights (coefficients) to minimize the Cost
•Iterate again (FP → BP) until the Cost
approaches the predefined error.
• Xs and Ys are randomly generated within (-1, 1)
• Weights are randomly initialized within (-0.1, 0.1) (a sketch of this random generation follows below)
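A small sketch of the random generation described above; the vector sizes are illustrative assumptions (the real values are produced in the model's Configuration state).

// Configuration sketch: Xs and Ys drawn from (-1, 1), weights from (-0.1, 0.1).
#include <random>
#include <vector>

int main() {
    std::mt19937 gen(std::random_device{}());
    std::uniform_real_distribution<float> data(-1.0f, 1.0f);   // Xs and Ys in (-1, 1)
    std::uniform_real_distribution<float> init(-0.1f, 0.1f);   // weights in (-0.1, 0.1)

    std::vector<float> X(3), Y(3), W(24);  // sizes are illustrative
    for (float& x : X) x = data(gen);      // random inputs
    for (float& y : Y) y = data(gen);      // random targets
    for (float& w : W) w = init(gen);      // small random initial weights
    return 0;
}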
An Artificial Neural Network is well suited to an
Object-Oriented design:
•ANNModel (“layer manager”) has Layers
• Maintains a list of Layers
• Delegates operations to the Layers
•Layer (“neuron manager”) has Neurons
• Maintains a list of Neurons
• Delegates operations to the Neurons
• ANNModel does not access the Neurons DIRECTLY;
it reaches them through its Layers (see the delegation sketch below)
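A minimal sketch of this delegation, assuming plain C++ containers; the class and method names are illustrative, since the real classes are generated by Rhapsody.

#include <vector>

class Neuron { public: void Activate() { /* compute Z and the output */ } };

class Layer {
public:
    void Activate() {                       // "neuron manager": delegates to its Neurons
        for (Neuron& n : itsNeuronList) n.Activate();
    }
private:
    std::vector<Neuron> itsNeuronList;
};

class ANNModel {
public:
    void ForwardPropagate() {               // "layer manager": delegates to its Layers,
        for (Layer& l : itsLayerList)       // never touches the Neurons directly
            l.Activate();
    }
private:
    std::vector<Layer> itsLayerList;
};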
An Artificial Neural Network is well suited to an
Object-Oriented design (cont.):
•Neuron maintains a list of Weights that is
indexed by the Ids of the Previous Layer
Neurons
• This indexing constitutes the connection between
the current layer and the previous layer in a NN (see the sketch below)
• A Neuron encapsulates data and operations to:
• Activate a neuron (i.e. compute its output)
• Compute its Gradient
• Update its Weights (compute the momentum and
delta weight)
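A minimal sketch (not the generated code) of a weight list keyed by the previous layer's neuron Ids, which is what ties one layer to the one before it.

#include <map>

struct Weight { float W; float DeltaW; };

struct NeuronSketch {
    int itsId;
    float itsOutput;
    // One entry per neuron of the PREVIOUS layer, keyed by that neuron's Id:
    // itsWeightList[prevId] is the weight on the connection prevId -> this neuron.
    std::map<int, Weight> itsWeightList;
};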
An Artificial Neural Network is well suited to an
Object-Oriented design (cont.):
•Activator (“singleton”)
•Encapsulates the activation functions and
their corresponding derivatives
•The Neuron interacts with the Activator
when it needs an activation function
or its derivative
•It is instantiated by the ANNModel at
startup (a sketch of the pattern follows below)
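A minimal sketch of the singleton access pattern, using the S_GetInstance() name that appears in the code later in this deck; the real Activator is part of the Rhapsody model.

class Activator {
public:
    static Activator* S_GetInstance() {
        if (s_instance == 0) s_instance = new Activator();  // created once, at startup
        return s_instance;
    }
private:
    Activator() {}                   // no public construction: one shared instance
    static Activator* s_instance;
};
Activator* Activator::s_instance = 0;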
Use the Rational Rhapsody MDA tool to design a NN
and visualize the two major states of execution in
a NN:
•Forward Propagation state
• Compute the Outputs
• Compute the Error (Cost, Loss)
•Back Propagation state
• Minimize the Error (Cost) by using gradient descent
• Compute gradients - for the Output layer, for the Hidden layer
• Update weights - for the Output layer, for the Hidden layer
The NN Model has two major states of execution:
1. Forward_Propagation state:
•Activation sub-state - compute the Outputs
•Compute_Cost sub-state - compute the Error
2. Back_Propagation state:
•Compute_Gradients sub-state
•Update_Weights sub-state
In addition, a Configuration state generates the inputs, target outputs, and the
weights before training (a skeleton of the execution cycle follows below).
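A rough, compilable skeleton of that execution cycle. The Rhapsody model drives it with a statechart and events rather than an explicit loop, and the cost update here is only a stand-in.

#include <cstdio>

enum TrainState { eConfigure, eForwardPropagation, eBackPropagation, eSuccess };

int main() {
    TrainState state = eConfigure;
    float cost = 1.0f;
    int iteration = 0;
    while (state != eSuccess && iteration < 1000) {
        switch (state) {
        case eConfigure:            // generate X, Y and the initial weights
            state = eForwardPropagation;
            break;
        case eForwardPropagation:   // Activation + Compute_Cost sub-states
            cost *= 0.9f;           // stand-in for the real cost computation
            state = (cost <= 1e-6f) ? eSuccess : eBackPropagation;
            break;
        case eBackPropagation:      // Compute_Gradients + Update_Weights sub-states
            ++iteration;
            state = eForwardPropagation;
            break;
        default:
            break;
        }
    }
    std::printf("iterations: %d, final cost: %g\n", iteration, cost);
    return 0;
}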
Here is the Logical View – Class diagram
Dynamic View – IDLE state
Dynamic View – CONFIGURE state
Dynamic View – FP->ACTIVATE state
Dynamic View – FP->COMPUTE_COST state
Dynamic View – BP --> COMPUTE_GRADIENTS state
Dynamic View – BP -->Update_Weights state
Dynamic View – SUCCESS state
Panel GUI interface
Panel GUI interface - Success
FORWARD PROPAGATION
[Network diagram: inputs X1, X2, X3 plus a bias input (1.0); hidden layer Hidden1 with neurons h1, h2, h3 and a bias neuron h0 (Ah0 = 1); output layer with neurons O1, O2, O3. Each hidden neuron hj carries a weight list hj_Wlist = {WB_hj, WX1_hj, WX2_hj, WX3_hj}; each output neuron Oi carries Oi_Wlist = {WB_Oi, Wh1_Oi, Wh2_Oi, Wh3_Oi}.]
Zi = Sum of (w * previous neuron output)
Zh1 = (h1_Wlist[0]*1.0 + h1_Wlist[1]*X1 + h1_Wlist[2]*X2 + h1_Wlist[3]*X3) =
(WB_h1*1.0 + WX1_h1*X1 + WX2_h1*X2 + WX3_h1*X3)
Ah1 = tanh(Zh1)
ZO1 = (O1_Wlist[0]*1.0 + O1_Wlist[1]*Ah1 + O1_Wlist[2]*Ah2 + O1_Wlist[3]*Ah3) =
(WB_O1*1.0 + Wh1_O1*Ah1 + Wh2_O1*Ah2 + Wh3_O1*Ah3)
Y1hat = tanh(ZO1); Err1 = Y1 – Y1hat
Y2hat = tanh(ZO2); Err2 = Y2 – Y2hat
Y3hat = tanh(ZO3); Err3 = Y3 – Y3hat
Cost = (Err1^2 + Err2^2 + Err3^2) / (2*3)
FORWARD PROPAGATION – the Cost function
Cost(w, b) = (1 / (2N)) * Σ (i = 1 .. N) (yi – yihat)^2
where: yi – target, expected, desired, ideal
yihat – predicted, computed, actual
N – number of outputs in the Output layer
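A small sketch of this cost computation for N = 3 outputs; the target and predicted values are illustrative.

#include <cstdio>

int main() {
    const int N = 3;
    float Y[N]    = { 0.5f, -0.2f, 0.1f };   // targets (illustrative)
    float Yhat[N] = { 0.4f, -0.1f, 0.3f };   // predictions (illustrative)
    float cost = 0.0f;
    for (int i = 0; i < N; ++i) {
        float err = Y[i] - Yhat[i];
        cost += err * err;
    }
    cost /= (2.0f * N);                      // Cost = sum(err^2) / (2N)
    std::printf("Cost = %f\n", cost);
    return 0;
}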
COMPUTE GRADIENTS
Output Layer Gradient – Go(i)
For each output neuron Oi (AF is the activation function, AF’ its derivative):
Y1hat = AF(Zo1); Err1 = Y1 – Y1hat; Go1 = Err1 * AF’(Zo1)
Y2hat = AF(Zo2); Err2 = Y2 – Y2hat; Go2 = Err2 * AF’(Zo2)
Y3hat = AF(Zo3); Err3 = Y3 – Y3hat; Go3 = Err3 * AF’(Zo3)
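With tanh as the activation function, AF’(z) = tanhD(z) = 1 - tanh(z)^2. A one-neuron sketch with illustrative values:

#include <cmath>
#include <cstdio>

int main() {
    float ZO1 = 0.3f, Y1 = 0.8f;              // illustrative values
    float Y1hat = std::tanh(ZO1);             // forward-pass output
    float Err1  = Y1 - Y1hat;
    float tanhD = 1.0f - std::tanh(ZO1) * std::tanh(ZO1);
    float Go1   = Err1 * tanhD;               // output-layer gradient
    std::printf("Go1 = %f\n", Go1);
    return 0;
}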
COMPUTE GRADIENTS
Hidden Layer Gradient – Gh(i)
[Same network diagram: each hidden neuron collects the gradients of the next layer through the weights that connect it to that layer.]
For hidden neuron h1:
SumG = (O1_Wlist[1]*Go1 + O2_Wlist[1]*Go2 + O3_Wlist[1]*Go3) =
Wh1_o1*Go1 + Wh1_o2*Go2 + Wh1_o3*Go3
Gh1 = SumG * tanhD(Zh1)
UPDATE WEIGHTS
General rule: W += (LearnRate * PrevOut * CurrentG); LearnRate = 0.5
Hidden layer weights for h1 (PrevOut is the corresponding input, or 1.0 for the bias):
WB_h1 += (0.5*1.0*Gh1); WX1_h1 += (0.5*X1*Gh1); WX2_h1 += (0.5*X2*Gh1); WX3_h1 += (0.5*X3*Gh1)
Output layer weights (PrevOut is the hidden activation, or 1.0 for the bias):
WB_o1 += (0.5*1.0*Go1); WB_o2 += (0.5*1.0*Go2); WB_o3 += (0.5*1.0*Go3)
Wh1_o1 += (0.5*Ah1*Go1); Wh1_o2 += (0.5*Ah1*Go2); Wh1_o3 += (0.5*Ah1*Go3)
Cost = (Err1^2 + Err2^2 + Err3^2) / (2*3)
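As a quick worked example with illustrative numbers (not taken from an actual run): with LearnRate = 0.5, Ah1 = 0.6 and Go1 = 0.1, the update is Wh1_o1 += 0.5 * 0.6 * 0.1 = 0.03, so that weight moves by 0.03 in the direction that reduces the error.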
UPDATE WEIGHTS (with momentum)
General rule: W += ΔW(t) + momt, where momt = α * ΔW(t-1) and ΔW(t) = LR * PrevOut * CurrentG
Output layer weights from h1 (PrevOut = Ah1):
Wh1_o1 += ΔW(t) + momt; momt = α * ΔW(t-1); ΔW = LR * Ah1 * Go1
Wh1_o2 += ΔW(t) + momt; momt = α * ΔW(t-1); ΔW = LR * Ah1 * Go2
Wh1_o3 += ΔW(t) + momt; momt = α * ΔW(t-1); ΔW = LR * Ah1 * Go3
Hidden layer weights for h1 (PrevOut = the input Xi, or 1.0 for the bias):
WB_h1 += ΔW(t) + momt; momt = α * ΔW(t-1); ΔW = LR * 1.0 * Gh1
WX1_h1 += ΔW(t) + momt; momt = α * ΔW(t-1); ΔW = LR * X1 * Gh1
WX2_h1 += ΔW(t) + momt; momt = α * ΔW(t-1); ΔW = LR * X2 * Gh1
WX3_h1 += ΔW(t) + momt; momt = α * ΔW(t-1); ΔW = LR * X3 * Gh1
float Neuron::Activate(Layer* prevLayer, const Activation_Type& activation) {
    assert(prevLayer != NULL);
    // Iterate through the previous layer neurons to compute the product sum (Zi).
    // This is Fan-In: from many (previous) to one (current).
    itsInputSum = ZERO_FLOAT;
    OMIterator<Neuron*> iPrevNeuron = prevLayer->getItsNeuronList();
    for (iPrevNeuron.reset(); *iPrevNeuron != NULL; ++iPrevNeuron) {
        float w = itsWeightList[(*iPrevNeuron)->itsId]->W;
        float output = (*iPrevNeuron)->itsOutput;
        itsInputSum += w * output;
    }
    itsOutput = Activator::S_GetInstance()->Run(itsInputSum, activation);
    return itsOutput;
}
// Output layer
void Neuron::ComputeOutputGradient(float expectedOutput,
                                   const Activation_Type& activation, bool useOutput) {
    itsError = expectedOutput - itsOutput;
    float x = itsInputSum;                    // default
    if (useOutput == true) { x = itsOutput; } // experiment
    float deriv = Activator::S_GetInstance()->RunDeriv(x, activation);
    itsGradient = itsError * deriv;
}
void Neuron::ComputeHiddenGradient(Layer* nextLayer,
                                   const Activation_Type& activation, bool useOutput) {
    // Compute the contribution of each current neuron to the network Error
    itsGradientSum = ZERO_FLOAT;
    OMIterator<Neuron*> iNextNeuron = nextLayer->getItsNeuronList();
    for (iNextNeuron.reset(); *iNextNeuron != NULL; ++iNextNeuron) {
        if ((*iNextNeuron)->itsBiasFlag == false) { // skip the bias neuron
            float w = (*iNextNeuron)->itsWeightList[itsId]->W;
            float gradient = (*iNextNeuron)->itsGradient;
            itsGradientSum += w * gradient;
        }
    } // for()
    float x = itsInputSum;                    // default
    if (useOutput == true) { x = itsOutput; } // experiment
    float deriv = Activator::S_GetInstance()->RunDeriv(x, activation);
    itsGradient = itsGradientSum * deriv;
}
void Neuron::UpdateWeights(
        Layer* prevLayer, float learningRate, float alpha) {
    OMIterator<Neuron*> iPrevNeuron = prevLayer->getItsNeuronList();
    for (iPrevNeuron.reset(); *iPrevNeuron != NULL; ++iPrevNeuron) {
        int id = (*iPrevNeuron)->itsId;
        float output = (*iPrevNeuron)->itsOutput;
        float momentum = alpha * itsWeightList[id]->DeltaW;          // momt = alpha * DeltaW(t-1)
        float deltaW = learningRate * output * itsGradient + momentum; // LR * PrevOut * CurrentG + momt
        itsWeightList[id]->DeltaW = deltaW;
        itsWeightList[id]->W += deltaW;
    }
}
float Activator::Run(float x, const Activation_Type& activation) {
    float output = ZERO_FLOAT;
    switch (activation) {
    case eSigmoid:
        output = Sigmoid(x); break;
    case eReLu:
        output = ReLu(x); break;
    case eLeakyReLu:
        output = LeakyReLu(x); break;
    case eTanh:
    default:
        output = Tanh(x); break;
    }
    return output;
}
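The activation helpers called by Activator::Run() are not shown in the deck. A minimal sketch of what they could look like (written as free functions for brevity; the leaky slope 0.01 is an assumption):

#include <cmath>

float Tanh(float x)       { return std::tanh(x); }
float Sigmoid(float x)    { return 1.0f / (1.0f + std::exp(-x)); }
float ReLu(float x)       { return x > 0.0f ? x : 0.0f; }
float LeakyReLu(float x)  { return x > 0.0f ? x : 0.01f * x; }

// Matching derivatives, as would be used by RunDeriv() when computing gradients:
float TanhD(float x)      { float t = std::tanh(x); return 1.0f - t * t; }
float SigmoidD(float x)   { float s = Sigmoid(x);   return s * (1.0f - s); }
float ReLuD(float x)      { return x > 0.0f ? 1.0f : 0.0f; }
float LeakyReLuD(float x) { return x > 0.0f ? 1.0f : 0.01f; }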
[Charts: weight values per connection (LId, NId, PrevNId, Weight) for a Fully Connected Network – 11 Input neurons, 2 Hidden layers of 3 neurons each, 10 Outputs; and Cost vs. training iteration (1 to 721) for different LearnRate (LRate) and Alpha settings.]