Neural Networks
Dr. Randa Elanwar
Lecture 4
Lecture Content
• Linearly separable functions: logical gate
implementation
– Learning laws: Perceptron learning rule
– Pattern mode solution method
– Batch mode solution method
Learning Linearly Separable Functions
• The initial network has randomly assigned weights.
• Learning is done by making small adjustments to the weights to
reduce the difference between the observed and predicted values.
• The main difference from the logical (gate-design) algorithms is the need to
repeat the update phase several times in order to achieve convergence.
• The updating process is divided into epochs: each epoch presents every
training pattern and updates the weights accordingly.
• Note that the initial weights and the learning rate value determine
the number of iterations needed for convergence.
Perceptron learning rule
• Desiredi is the desired output for a given input
• The network calculates what it thinks the output should be
• The network changes its weights in proportion to the error between the
desired and calculated results:
• Δwi,j = η * (Desiredi − outputi) * inputj
– where:
– η is the learning rate (a given constant);
– (Desiredi − outputi) is the error term;
– and inputj is the input activation
• wi,j = wi,j + Δwi,j (delta rule)
• Note: there are other learning rules/laws that will be discussed later
• Learning rate η: (1) used to control the amount of weight adjustment
at each step of training, (2) ranges from 0 to 1, (3) determines the
rate of learning at each time step
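The update rule above can be sketched in a few lines of code (an illustrative NumPy sketch; the function name and example values are not from the slides):

```python
import numpy as np

def delta_rule_update(w, x, desired, output, eta=1.0):
    """One perceptron update: w <- w + eta * (desired - output) * x."""
    return w + eta * (desired - output) * x

# Example: the unit output 0 where 1 was desired, so every weight whose
# input was active is pushed up by eta.
w = np.array([0.5, 0.3])
x = np.array([1.0, 1.0])
print(delta_rule_update(w, x, desired=1, output=0))   # [1.5 1.3]
```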
Adjusting perceptron weights
• wi,j = wi,j + Δwi,j
• Δwi,j = η * (Desiredi − outputi) * inputj
• missi is (Desiredi − outputi)
• Adjust each wi,j based on inputj and missi
• If a set of <input, output> pairs is learnable (representable),
the delta rule will find the necessary weights (when miss = 0)
– in a finite number of steps
– independent of the initial weights
Desired < 0, output > 0 → Δw < 0
Desired = 0, output = 0 → Δw = 0
Desired > 0, output < 0 → Δw > 0
Hypothetical example
• Suppose we have 2 glasses: the first is narrow and tall and has
water in it, the second is wide and short with no water in it
• The target is to make both glasses contain the same volume of
water
• Initially, we pour some water from the tall glass into the short one, then
we measure the volumes
• If the volume in the short glass is less than in the tall one, we add more
water
• If the volume in the short glass is more than in the tall one, we pour
some water back
• And so on until both volumes are equal, and then we are done
• The target = desired output, the water = weights, the difference
measure = error
Node biases
• A node’s output is a weighted function of its inputs
• What is a bias?
• How can we learn the bias value?
• Answer: treat it like just another weight
Training biases (θ)
• A node’s output:
– 1 if w1x1 + w2x2 + … + wnxn >= θ
– 0 otherwise
• Rewrite:
– w1x1 + w2x2 + … + wnxn − θ >= 0
– w1x1 + w2x2 + … + wnxn + θ·(−1) >= 0
• Hence, the bias is just another weight whose activation is
always −1
• Just add one more input unit (the bias) to the network topology
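To make this concrete, here is a small sketch (the helper names and the example threshold 0.7 are assumptions, not from the slides) that appends the constant −1 bias activation and folds θ into the weight vector:

```python
import numpy as np

def augment(x):
    """Append the constant bias activation -1 to an input vector."""
    return np.append(x, -1.0)

def fires(w_aug, x):
    """Output 1 if w1*x1 + ... + wn*xn + theta*(-1) >= 0, else 0."""
    return 1 if np.dot(w_aug, augment(x)) >= 0.0 else 0

# Weights [w1, w2] with the threshold theta = 0.7 stored as the last weight.
w_aug = np.array([0.5, 0.3, 0.7])
print(fires(w_aug, np.array([1.0, 1.0])))   # 1: 0.5 + 0.3 - 0.7 >= 0
print(fires(w_aug, np.array([1.0, 0.0])))   # 0: 0.5 - 0.7 < 0
```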
Linearly Separable Functions
• When solving the logical AND problem we are searching for the
straight-line equation separating the +ve (1) and −ve (0) output regions
on the graph
• Different values for w1, w2, θ lead to different line slopes. We have
more than one solution depending on: the initial weights W, the learning
rate η, the activation function f and the learning mode (Pattern vs. Batch)
[Figure: the AND input space with the separating line w1·I1 + w2·I2 = θ; inputs giving output 1 (+ve) lie on one side of the line and inputs giving 0 (−ve) on the other.]
Linearly Separable Functions
• Similarly for the logical OR problem
• Different values for w1, w2, θ lead to different line slopes.
• We have more than one solution depending on: the initial weights W,
the learning rate η, the activation function f and the learning mode
(Pattern vs. Batch)
[Figure: the OR input space with the separating line w1·I1 + w2·I2 = θ; inputs giving output 1 (+ve) lie on one side of the line and inputs giving 0 (−ve) on the other.]
Linearly Separable Functions
• Example: logical AND, with initial weights w1 = 0.5, w2 = 0.3,
bias θ = 0.5, a binary step activation function with threshold t = 0.5,
and learning rate η = 1
[Network diagram: inputs x1 and x2 feed a single output unit y through weights w1 = 0.5 and w2 = 0.3.]
yin = x1·w1 + x2·w2
Activation function: binary step with t = 0.5,
f(yin) = 1 if yin >= t,
otherwise f(yin) = 0
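A quick forward-pass sketch of this starting point (illustrative only; it follows the worked example below, where the bias enters through a −1 input and the step threshold t = 0.5 is then applied):

```python
import numpy as np

def binary_step(y_in, t=0.5):
    """Binary step activation: 1 if y_in >= t, else 0."""
    return 1 if y_in >= t else 0

w1, w2, theta = 0.5, 0.3, 0.5            # initial weights and bias
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    y_in = x1 * w1 + x2 * w2 - theta     # bias contributes theta * (-1)
    print((x1, x2), binary_step(y_in))   # all outputs are 0, so only (1,1)
                                         # is misclassified for AND
```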
Solving Linearly Separable Functions
(Pattern mode)
• Given: the initial weight vector W(0), the output rule Y = f(W·X), and
the AND truth table below.
• Since we treat the bias as an additional weight, the weight vector is
1x3, so we have to extend each input vector X1, X2, X3, X4 from 2x1
to 3x1 (by appending the bias activation −1) before the multiplication
can be performed.
W(0) = [0.5  0.3  0.5],   Y = f(W·X)

AND truth table:
x1  x2 | y
 0   0 | 0
 0   1 | 0
 1   0 | 0
 1   1 | 1

Input patterns with the bias activation (−1) appended:
X1 = [0  0  −1]T, X2 = [0  1  −1]T, X3 = [1  0  −1]T, X4 = [1  1  −1]T
Solving Linearly Separable Functions
(Pattern mode)
• Update weight vector for iteration 1
W(0)·X1 = [0.5  0.3  0.5]·[0  0  −1]T = −0.5 → y = 0, desired 0 → OK
W(0)·X2 = [0.5  0.3  0.5]·[0  1  −1]T = −0.2 → y = 0, desired 0 → OK
W(0)·X3 = [0.5  0.3  0.5]·[1  0  −1]T = 0 → y = 0, desired 0 → OK
W(0)·X4 = [0.5  0.3  0.5]·[1  1  −1]T = 0.3 → y = 0, desired 1 → Wrong
W(1)T = W(0)T + η·(ydes − y)·X4 = [1.5  1.3  −0.5]T
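As a sanity check, this first update can be reproduced numerically (a small sketch using the same conventions; not part of the original slides):

```python
import numpy as np

W0 = np.array([0.5, 0.3, 0.5])
X4 = np.array([1.0, 1.0, -1.0])            # pattern (1,1) with bias input -1
eta, t, desired = 1.0, 0.5, 1.0

y = 1.0 if np.dot(W0, X4) >= t else 0.0    # 0.3 < 0.5 -> y = 0, desired 1
W1 = W0 + eta * (desired - y) * X4
print(W1)                                  # [ 1.5  1.3 -0.5]
```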
Solving Linearly Separable Functions
(Pattern mode)
• Update weight vector for iteration 2
W(1)·X1 = [1.5  1.3  −0.5]·[0  0  −1]T = 0.5 → y = 1, desired 0 → Wrong
W(2)T = W(1)T + η·(ydes − y)·X1 = [1.5  1.3  0.5]T
• Update weight vector for iteration 3
W(2)·X2 = [1.5  1.3  0.5]·[0  1  −1]T = 0.8 → y = 1, desired 0 → Wrong
W(3)T = W(2)T + η·(ydes − y)·X2 = [1.5  0.3  1.5]T
Solving Linearly Separable Functions
(Pattern mode)
• Update weight vector for iteration 4
W(3)·X3 = [1.5  0.3  1.5]·[1  0  −1]T = 0 → y = 0, desired 0 → OK
W(3)·X4 = [1.5  0.3  1.5]·[1  1  −1]T = 0.3 → y = 0, desired 1 → Wrong
W(4)T = W(3)T + η·(ydes − y)·X4 = [2.5  1.3  0.5]T
• Update weight vector for iteration 5
W(4)·X1 = [2.5  1.3  0.5]·[0  0  −1]T = −0.5 → y = 0, desired 0 → OK
W(4)·X2 = [2.5  1.3  0.5]·[0  1  −1]T = 0.8 → y = 1, desired 0 → Wrong
W(5)T = W(4)T + η·(ydes − y)·X2 = [2.5  0.3  1.5]T
Solving Linearly Separable Functions
(Pattern mode)
• Update weight vector for iteration 6
W(5)·X3 = [2.5  0.3  1.5]·[1  0  −1]T = 1.0 → y = 1, desired 0 → Wrong
W(6)T = W(5)T + η·(ydes − y)·X3 = [1.5  0.3  2.5]T
• Update weight vector for iteration 7
W(6)·X4 = [1.5  0.3  2.5]·[1  1  −1]T = −0.7 → y = 0, desired 1 → Wrong
W(7)T = W(6)T + η·(ydes − y)·X4 = [2.5  1.3  1.5]T
W(7)·X1 = [2.5  1.3  1.5]·[0  0  −1]T = −1.5 → y = 0, desired 0 → OK
W(7)·X2 = [2.5  1.3  1.5]·[0  1  −1]T = −0.2 → y = 0, desired 0 → OK
W(7)·X3 = [2.5  1.3  1.5]·[1  0  −1]T = 1.0 → y = 1, desired 0 → Wrong
Solving Linearly Separable Functions
(Pattern mode)
• Update weight vector for iteration 8
W(8)T = W(7)T + η·(ydes − y)·X3 = [1.5  1.3  2.5]T
W(8)·X4 = [1.5  1.3  2.5]·[1  1  −1]T = 0.3 → y = 0, desired 1 → Wrong
• Update weight vector for iteration 9
W(9)T = W(8)T + η·(ydes − y)·X4 = [2.5  2.3  1.5]T
W(9)·X1 = [2.5  2.3  1.5]·[0  0  −1]T = −1.5 → y = 0, desired 0 → OK
W(9)·X2 = [2.5  2.3  1.5]·[0  1  −1]T = 0.8 → y = 1, desired 0 → Wrong
• Update weight vector for iteration 10
W(10)T = W(9)T + η·(ydes − y)·X2 = [2.5  1.3  2.5]T
Solving Linearly Separable Functions
(Pattern mode)
• The weight learning has converged after 10 iterations: all four patterns
are now classified correctly
W(10)·X3 = [2.5  1.3  2.5]·[1  0  −1]T = 0 → y = 0, desired 0 → OK
W(10)·X4 = [2.5  1.3  2.5]·[1  1  −1]T = 1.3 → y = 1, desired 1 → OK
W(10)·X1 = [2.5  1.3  2.5]·[0  0  −1]T = −2.5 → y = 0, desired 0 → OK
W(10)·X2 = [2.5  1.3  2.5]·[0  1  −1]T = −1.2 → y = 0, desired 0 → OK
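The whole pattern-mode (per-example) run can be reproduced with a short script (a sketch under the conventions used above: bias input −1, step threshold t = 0.5, η = 1; not part of the original slides):

```python
import numpy as np

# Logical AND patterns with the bias activation -1 appended.
X = np.array([[0, 0, -1], [0, 1, -1], [1, 0, -1], [1, 1, -1]], dtype=float)
d = np.array([0, 0, 0, 1], dtype=float)    # desired outputs
W = np.array([0.5, 0.3, 0.5])              # W(0)
eta, t = 1.0, 0.5

updates, converged = 0, False
while not converged:
    converged = True
    for x, desired in zip(X, d):
        y = 1.0 if np.dot(W, x) >= t else 0.0
        if y != desired:                   # "Wrong": update immediately
            W = W + eta * (desired - y) * x
            updates += 1
            converged = False
print(updates, W)                          # 10 updates, W = [2.5  1.3  2.5]
```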
Solving Linearly Separable Functions (Batch
mode)
• Update weight vector for iteration 1
• Add the Δw of all misclassified inputs together in one step
W(0)·X1 = [0.5  0.3  0.5]·[0  0  −1]T = −0.5 → y = 0, desired 0 → OK
W(0)·X2 = [0.5  0.3  0.5]·[0  1  −1]T = −0.2 → y = 0, desired 0 → OK
W(0)·X3 = [0.5  0.3  0.5]·[1  0  −1]T = 0 → y = 0, desired 0 → OK
W(0)·X4 = [0.5  0.3  0.5]·[1  1  −1]T = 0.3 → y = 0, desired 1 → Wrong
Only X4 is misclassified, so:
W(1)T = W(0)T + η·(ydes − y)·X4 = [1.5  1.3  −0.5]T
Solving Linearly Separable Functions (Batch
mode)
• Update weight vector for iteration 2
• Add the Δw of all misclassified inputs together in one step
W(1)·X1 = [1.5  1.3  −0.5]·[0  0  −1]T = 0.5 → y = 1, desired 0 → Wrong
W(1)·X2 = [1.5  1.3  −0.5]·[0  1  −1]T = 1.8 → y = 1, desired 0 → Wrong
W(1)·X3 = [1.5  1.3  −0.5]·[1  0  −1]T = 2.0 → y = 1, desired 0 → Wrong
W(1)·X4 = [1.5  1.3  −0.5]·[1  1  −1]T = 3.3 → y = 1, desired 1 → OK
X1, X2 and X3 are misclassified, so their corrections are applied in one step:
W(2)T = W(1)T + η·(ydes − y)·X1 + η·(ydes − y)·X2 + η·(ydes − y)·X3
      = [1.5  1.3  −0.5]T − [0  0  −1]T − [0  1  −1]T − [1  0  −1]T = [0.5  0.3  2.5]T
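The combined correction can be checked numerically (an illustrative sketch using the same conventions as above):

```python
import numpy as np

W1 = np.array([1.5, 1.3, -0.5])
X = np.array([[0, 0, -1], [0, 1, -1], [1, 0, -1], [1, 1, -1]], dtype=float)
d = np.array([0, 0, 0, 1], dtype=float)
eta, t = 1.0, 0.5

y = (X @ W1 >= t).astype(float)      # outputs for all four patterns
dW = eta * (d - y) @ X               # corrections summed over the patterns
print(W1 + dW)                       # [0.5 0.3 2.5] = W(2)
```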
Solving Linearly Separable Functions (Batch
mode)
• Update weight vector for iteration 3
• Add the Δw of all misclassified inputs together in one step
W(2)·X1 = [0.5  0.3  2.5]·[0  0  −1]T = −2.5 → y = 0, desired 0 → OK
W(2)·X2 = [0.5  0.3  2.5]·[0  1  −1]T = −2.2 → y = 0, desired 0 → OK
W(2)·X3 = [0.5  0.3  2.5]·[1  0  −1]T = −2.0 → y = 0, desired 0 → OK
W(2)·X4 = [0.5  0.3  2.5]·[1  1  −1]T = −1.7 → y = 0, desired 1 → Wrong
Only X4 is misclassified, so:
W(3)T = W(2)T + η·(ydes − y)·X4 = [1.5  1.3  1.5]T
Solving Linearly Separable Functions (Batch
mode)
• Note that:
– The number of iterations in the batch-mode solution is sometimes
smaller than in the pattern-mode solution.
– The final weights obtained by the batch-mode solution are
different from those obtained by the pattern-mode solution.
• Final check with W(3): all four patterns are classified correctly, so the
batch-mode solution has converged after 3 weight updates
W(3)·X1 = [1.5  1.3  1.5]·[0  0  −1]T = −1.5 → y = 0, desired 0 → OK
W(3)·X2 = [1.5  1.3  1.5]·[0  1  −1]T = −0.2 → y = 0, desired 0 → OK
W(3)·X3 = [1.5  1.3  1.5]·[1  0  −1]T = 0 → y = 0, desired 0 → OK
W(3)·X4 = [1.5  1.3  1.5]·[1  1  −1]T = 1.3 → y = 1, desired 1 → OK
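The full batch-mode run can also be reproduced with a short script (a sketch under the same assumptions as the pattern-mode sketch; not from the original slides):

```python
import numpy as np

# Batch-mode perceptron training for logical AND (bias input -1, t = 0.5, eta = 1).
X = np.array([[0, 0, -1], [0, 1, -1], [1, 0, -1], [1, 1, -1]], dtype=float)
d = np.array([0, 0, 0, 1], dtype=float)
W = np.array([0.5, 0.3, 0.5])            # W(0)
eta, t = 1.0, 0.5

updates = 0
while True:
    y = (X @ W >= t).astype(float)       # outputs for all patterns at once
    if np.array_equal(y, d):             # nothing misclassified: converged
        break
    W = W + eta * (d - y) @ X            # one combined update per pass
    updates += 1
print(updates, W)                        # 3 updates, W = [1.5  1.3  1.5]
```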

  • 22. Solving Linearly Separable Functions (Batch mode) • Note that • The number of iterations in Batch mode solution is sometimes less than those of pattern mode • The final weights obtained by Batch mode solution are different from those obtained by pattern mode solution. 22Neural Networks Dr. Randa Elanwar 0,5.1 1 0 0 ].5.13.15.1[1.)3(           yXW 0,2.0 1 1 0 ].5.13.15.1[2.)3(           yXW 0,0 1 0 1 ].5.13.15.1[3.)3(           yXW 1,3.1 1 1 1 ].5.13.15.1[4.)3(           yXW OK OK OK OK