An Introduction to
Artificial Neural Network
Dr Iman Ardekani
Biological Neurons
Modeling Neurons
McCulloch-Pitts Neuron
Network Architecture
Learning Process
Perceptron
Linear Neuron
Multilayer Perceptron
Content
Ramón-y-Cajal (Spanish Scientist, 1852~1934):
1. The brain is composed of individual cells called
neurons.
2. Neurons are connected to each other by
synapses.
Biological Neurons
Neuron Structure (Biology)
Biological Neurons
[Diagram: dendrites, cell body, nucleus, axon, axon terminals]
Synaptic Junction (Biology)
Biological Neurons
Synapse
Neuron Function (Biology)
1. Dendrites receive signals from other neurons.
2. Neurons can process (or transfer) the received signals.
3. Axon terminals deliver the processed signals to other
tissues.
Biological Neurons
What kind of signals? Electrical Impulse Signals
Modeling Neurons (Computer Science)
Biological Neurons
[Diagram: Input → Process → Outputs]
Modeling Neurons
The net input signal is a linear combination of the input signals xi.
Each output is a function of the net input signal.
Biological Neurons
[Diagram: a neuron with ten inputs x1 … x10 and one output y]
- McCulloch and Pitts (1943) for introducing the
idea of neural networks as computing machines
- Hebb (1949) for inventing the first rule for self-
organized learning
- Rosenblatt (1958) for proposing the perceptron
as the first model for learning with a teacher
Modelling Neurons
Net input signal received through synaptic junctions is
net = b + Σwixi = b + WTX
Weight vector: W =[w1 w2 … wm]T
Input vector: X = [x1 x2 … xm]T
Each output is a function of the net stimulus signal (f is called the
activation function)
y = f (net) = f(b + WTX)
Modelling Neurons
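The general neuron model above can be written in a few lines of code. A minimal sketch, assuming NumPy; the helper name `neuron_output` is ours, not from the slides:

```python
import numpy as np

def neuron_output(x, w, b, f):
    """General neuron model: y = f(b + W^T X)."""
    net = b + np.dot(w, x)          # net input: weighted sum of inputs plus bias
    return f(net)

# Example with three inputs and an identity activation
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.0, 1.0])
print(neuron_output(x, w, b=0.1, f=lambda net: net))   # 2.6
```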
General Model for Neurons
Modelling Neurons
y = f(b + Σwixi)
[Diagram: inputs x1 … xm weighted by w1 … wm, summed with bias b, and passed through activation f to produce y]
Activation functions
Modelling Neurons
[Plots: three activation functions]
Threshold function / hard limiter: good for classification
Linear function: simple computation
Sigmoid function: continuous and differentiable
Sigmoid Function
Modelling Neurons
f(x) = 1 / (1 + e^(-ax))
As a goes to infinity, the sigmoid approaches the threshold function.
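A quick numerical check of this limit, sketched in Python; the function names and the value a = 50 are illustrative choices:

```python
import numpy as np

def sigmoid(x, a=1.0):
    """Sigmoid activation f(x) = 1 / (1 + exp(-a x))."""
    return 1.0 / (1.0 + np.exp(-a * x))

def threshold(x):
    """Hard limiter: 1 if x >= 0, else 0."""
    return (x >= 0).astype(float)

x = np.array([-2.0, -0.1, 0.1, 2.0])
print(sigmoid(x, a=1))    # smooth values strictly between 0 and 1
print(sigmoid(x, a=50))   # already very close to the hard limiter
print(threshold(x))
```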
McCulloch-Pitts Neuron
Modelling Neurons
y ∈ {0, 1}
[Diagram: inputs x1 … xm, weights w1 … wm, bias b, a summing junction and a hard limiter producing y]
Modelling Neurons
y ∈ [0, 1]
[Diagram: inputs x1 … xm, weights w1 … wm, bias b, a summing junction and a sigmoid producing y]
Single-input McCulloch-Pitts neurode with b=0, w1=-1 for binary
inputs:
Conclusion?
McCulloch-Pitts Neuron
x1  net  y
0    0   1
1   -1   0
(It implements logical NOT.)
[Diagram: single-input neuron with weight w1, bias b, and hard limiter]
Two-input McCulloch-Pitts neurode with b=-1,
w1=w2=1 for binary inputs:
McCulloch-Pitts Neuron
x1 x2 net y
0 0 ? ?
0 1 ? ?
1 0 ? ?
1 1 ? ?
[Diagram: two-input neuron with weights w1, w2, bias b, and hard limiter]
Two-input McCulloch-Pitts neurode with b=-1,
w1=w2=1 for binary inputs:
McCulloch-Pitts Neuron
x1 x2 net y
0 0 -1 0
0 1 0 1
1 0 0 1
1 1 1 1
(It implements logical OR.)
[Diagram: two-input neuron with weights w1, w2, bias b, and hard limiter]
Two-input McCulloch-Pitts neurode with b=-2,
w1=w2=1 for binary inputs :
McCulloch-Pitts Neuron
x1 x2 net y
0 0 ? ?
0 1 ? ?
1 0 ? ?
1 1 ? ?
[Diagram: two-input neuron with weights w1, w2, bias b, and hard limiter]
Two-input McCulloch-Pitts neurode with b=-2,
w1=w2=1 for binary inputs :
McCulloch-Pitts Neuron
x1 x2 net y
0 0 -2 0
0 1 -1 0
1 0 -1 0
1 1 0 1
(It implements logical AND.)
[Diagram: two-input neuron with weights w1, w2, bias b, and hard limiter]
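The three neurodes above (NOT, OR, AND) can be reproduced with a short script. A minimal sketch; the helper name `mp_neuron` is our own choice:

```python
import numpy as np

def mp_neuron(x, w, b):
    """McCulloch-Pitts neuron: hard limiter applied to b + w^T x."""
    return 1 if b + np.dot(w, x) >= 0 else 0

# NOT: b = 0, w1 = -1
print([mp_neuron([x1], [-1], 0) for x1 in (0, 1)])                       # [1, 0]
# OR:  b = -1, w1 = w2 = 1
print([mp_neuron([a, c], [1, 1], -1) for a in (0, 1) for c in (0, 1)])   # [0, 1, 1, 1]
# AND: b = -2, w1 = w2 = 1
print([mp_neuron([a, c], [1, 1], -2) for a in (0, 1) for c in (0, 1)])   # [0, 0, 0, 1]
```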
Every basic Boolean function can be
implemented using combinations of McCulloch-
Pitts Neurons.
McCulloch-Pitts Neuron
The McCulloch-Pitts neuron can be used as a
classifier that separates the input signals into two
classes (the perceptron):
McCulloch-Pitts Neuron
[Plot: points in the (x1, x2) plane, Class A (y=1) and Class B (y=0), separated by a straight line]
Class A ⇒ y = 1 ⇒ net ≥ 0 ⇒ b + w1x1 + w2x2 ≥ 0
Class B ⇒ y = 0 ⇒ net < 0 ⇒ b + w1x1 + w2x2 < 0
Class A ⇒ b + w1x1 + w2x2 ≥ 0
Class B ⇒ b + w1x1 + w2x2 < 0
Therefore, the decision boundary is a line (in general, a hyperplane) given by
b + w1x1 + w2x2 = 0
Where do w1 and w2 come from?
McCulloch-Pitts Neuron
Solution: More Neurons Required
McCulloch-Pitts Neuron
[Plots: in the (x1, x2) plane, x1 AND x2 and x1 OR x2 are each separable by a single line; x1 XOR x2 is not — ?]
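One possible two-layer combination that solves XOR is sketched below; the particular weights and the three-neuron layout are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def mp_neuron(x, w, b):
    """McCulloch-Pitts neuron: hard limiter applied to b + w^T x."""
    return 1 if b + np.dot(w, x) >= 0 else 0

def xor(x1, x2):
    """XOR from three McCulloch-Pitts neurons: (x1 OR x2) AND NOT (x1 AND x2)."""
    h_or  = mp_neuron([x1, x2], [1, 1], -1)    # x1 OR x2
    h_and = mp_neuron([x1, x2], [1, 1], -2)    # x1 AND x2
    return mp_neuron([h_or, h_and], [1, -1], -1)

print([xor(a, b) for a in (0, 1) for b in (0, 1)])   # [0, 1, 1, 0]
```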
Nonlinear Classification
McCulloch-Pitts Neuron
[Plot: Class A (y=1) and Class B (y=0) points in the (x1, x2) plane that cannot be separated by a single straight line]
Single Layer Feed-forward Network
Network Architecture
Single Layer:
There is only one
computational
layer.
Feed-forward:
Input layer projects
to the output layer
not vice versa.
[Diagram: input layer (sources) fully connected to an output layer of neurons]
Multi Layer Feed-forward Network
Network Architecture
[Diagram: input layer (sources) → hidden layer (neurons) → output layer (neurons)]
Single Layer Recurrent Network
Network Architecture
[Diagram: a single layer of neurons with feedback connections among them]
Multi Layer Recurrent Network
Network Architecture
[Diagram: multiple layers of neurons with feedback connections between layers]
The mechanism by which a neural network
can adjust its weights (synaptic junction weights):
- Supervised learning: with a teacher
- Unsupervised learning: without a teacher
Learning Processes
Supervised Learning
Learning Processes
[Diagram: the environment provides input data to both the teacher and the learner; the error signal is the teacher's desired response minus the learner's actual response]
Unsupervised Learning
Learning Processes
[Diagram: the environment provides input data directly to the learner; there is no teacher]
Neurons learn based on a competitive task.
A competition rule is required (competitive-learning rule).
- The goal of the perceptron is to classify input data into two classes, A and B
- This works only when the two classes can be separated by a linear boundary
- The perceptron is built around the McCulloch-Pitts neuron model
- A linear combiner followed by a hard limiter
- Accordingly, the neuron can produce outputs of +1 or 0
Perceptron
[Diagram: inputs x1 … xm with their weights, bias b, a summing junction and a hard limiter producing y]
Equivalent Representation
Perceptron
[Diagram: the same neuron with the bias represented as weight w0 applied to a constant input of 1]
net = WTX
Weight vector: W =[w0 w1 … wm]T
Input vector: X = [1 x1 x2 … xm]T
There exists a weight vector w such that we may state
WTx > 0 for every input vector x belonging to A
WTx ≤ 0 for every input vector x belonging to B
Perceptron
Elementary Perceptron Learning Algorithm
1) Initialization: w(0) = a random weight vector
2) At time index n, form the input vector x(n)
3) IF (wTx(n) > 0 and x(n) belongs to A) or (wTx(n) ≤ 0 and x(n)
belongs to B) THEN w(n) = w(n-1);
OTHERWISE w(n) = w(n-1) + ηx(n) if x(n) belongs to A,
and w(n) = w(n-1) - ηx(n) if x(n) belongs to B
4) Repeat from 2 until w(n) converges
Perceptron
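A minimal sketch of this learning rule in Python; the function name, the epoch loop, and the toy AND-gate data are assumptions for illustration:

```python
import numpy as np

def train_perceptron(X_A, X_B, eta=0.1, epochs=100, seed=0):
    """Elementary perceptron learning for two linearly separable classes A and B."""
    rng = np.random.default_rng(seed)
    # Absorb the bias as w0 by prepending a constant input of 1
    A = np.hstack([np.ones((len(X_A), 1)), X_A])
    B = np.hstack([np.ones((len(X_B), 1)), X_B])
    w = rng.standard_normal(A.shape[1])        # 1) random initial weights
    for _ in range(epochs):
        changed = False
        for x in A:                            # class A should satisfy w^T x > 0
            if w @ x <= 0:
                w += eta * x
                changed = True
        for x in B:                            # class B should satisfy w^T x <= 0
            if w @ x > 0:
                w -= eta * x
                changed = True
        if not changed:                        # no misclassifications: converged
            break
    return w

# Toy example: AND, with class A = {(1,1)} and class B = the other three points
w = train_perceptron(np.array([[1, 1]]), np.array([[0, 0], [0, 1], [1, 0]]))
print(w)   # a separating boundary of the form w0 + w1*x1 + w2*x2 = 0
```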
- When the activation function is simply f(x) = x, the
neuron acts like an adaptive filter.
- In this case: y=net=wTx
- w=[w1 w2 … wm]T
- x=[x1 x2 … xm]T
Linear Neuron
[Diagram: inputs x1 … xm, weights w1 … wm, bias b, and a summing junction producing y]
Linear Neuron Learning (Adaptive Filtering)
Linear Neuron
[Diagram: the environment provides input data to the teacher and to the learner (linear neuron); the error signal is the desired response d(n) minus the actual response y(n)]
LMS Algorithm
To minimize the value of the cost function defined as
E(w) = 0.5 e²(n)
where e(n) is the error signal
e(n) = d(n) - y(n) = d(n) - wT(n)x(n)
In this case, the weight vector can be updated as follows:
wi(n+1) = wi(n) - μ (dE/dwi)
Linear Neuron
LMS Algorithm (continued)
dE/dwi = e(n) de(n)/dwi = e(n) d/dwi {d(n) - wT(n)x(n)} = -e(n) xi(n)
Therefore:
wi(n+1) = wi(n) + μ e(n) xi(n)
or, in vector form,
w(n+1) = w(n) + μ e(n) x(n)
Linear Neuron
Summary of LMS Algorithm
1) Initialization: w(0) = a random weight vector
2) At time index n, form the input vector x(n)
3) y(n)=wT(n)x(n)
4) e(n)=d(n)-y(n)
5) w(n+1)=w(n)+μe(n)x(n)
6) Repeat 2 until w(n) converges
Linear Neuron
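A minimal sketch of the LMS steps above; the function name, the step size, and the toy target d = 2·x1 − x2 are illustrative assumptions:

```python
import numpy as np

def lms(X, d, mu=0.05, epochs=50, seed=0):
    """LMS learning for a linear neuron y(n) = w^T(n) x(n)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1])    # 1) random initial weights
    for _ in range(epochs):
        for x, dn in zip(X, d):
            y = w @ x                      # 3) actual response
            e = dn - y                     # 4) error signal
            w = w + mu * e * x             # 5) weight update
    return w

# Toy example: learn d = 2*x1 - x2 from noisy samples
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
d = 2 * X[:, 0] - X[:, 1] + 0.01 * rng.standard_normal(200)
print(lms(X, d))   # approximately [ 2. -1.]
```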
- To solve the XOR problem
- To solve nonlinear classification problems
- To deal with more complex problems
Multilayer Perceptron
- The activation function in a multilayer perceptron is
usually a sigmoid function.
- This is because the sigmoid is differentiable, unlike
the hard limiter used in the elementary
perceptron.
Multilayer Perceptron
Architecture
Multilayer Perceptron
[Diagram: input layer (sources) → hidden layer (neurons) → output layer (neurons)]
Architecture
Multilayer Perceptron
[Diagram: layer k of neurons, receiving inputs from layer k-1 and sending outputs to layer k+1]
Consider a single neuron in a multilayer perceptron (neuron k)
Multilayer Perceptron
[Diagram: neuron k with inputs y0 = 1, y1, …, ym and weights wk,0, wk,1, …, wk,m]
netk = Σ wk,i yi
yk = f(netk)
Multilayer perceptron Learning (Back Propagation Algorithm)
Multilayer Perceptron
[Diagram: layer k-1 feeds neuron j of layer k; the error signal ek(n) is the desired response dk(n) minus the actual response yk(n)]
Back Propagation Algorithm:
Cost function of neuron k:
Ek = 0.5 ek²
Cost function of network:
E = Σ Ej = 0.5 Σ ej²
ek = dk – yk
ek = dk – f(netk)
ek = dk – f(Σ wk,i yi)
Multilayer Perceptron
Back Propagation Algorithm:
Cost function of neuron k:
Ek = 0.5 ek²
ek = dk – yk
ek = dk – f(netk)
ek = dk – f(Σwk,iyi)
Multilayer Perceptron
Cost function of network:
E = Σ Ej = 0.5 Σ ej²
Multilayer Perceptron
Back Propagation Algorithm:
To minimize the value of the cost function E(wk,i), the
weight vector can be updated using a gradient based
algorithm as follows
wk,i(n+1) = wk,i(n) - μ (dE/dwk,i)
dE/dwk,i = ?
Multilayer Perceptron
Back Propagation Algorithm:
dE/dwk,i = (dE/dek)(dek/dyk)(dyk/dnetk)(dnetk/dwk,i) = (dE/dnetk)(dnetk/dwk,i) = δk yi
δk = (dE/dek)(dek/dyk)(dyk/dnetk) = -ek f’(netk)   (the local gradient)
where
dE/dek = ek
dek/dyk = -1
dyk/dnetk = f’(netk)
dnetk/dwk,i = yi
Multilayer Perceptron
Back Propagation Algorithm:
Substituting dE/dwk,i = δk yi into the gradient-based algorithm:
wk,i(n+1) = wk,i(n) – μ δk yi
δk = -ek f’(netk) = -{dk – yk} f’(netk)
If k is an output neuron, we have all the terms of δk.
What happens when k is a hidden neuron?
Multilayer Perceptron
Back Propagation Algorithm:
When k is hidden:
δk = (dE/dyk)(dyk/dnetk) = (dE/dyk) f’(netk)
dE/dyk = ?
Multilayer Perceptron
E = 0.5Σe2
j
Multilayer Perceptron
dE
dyk
dE
dyk
dej
dyk
dej
dnetj
dnetj
dyk
wjk
-f’(netj)
= -Σ ej f’(netj) wjk = -Σ δj wjk
= Σ ej
= Σ ej
We had δk = (dE/dyk) f’(netk)
Substituting dE/dyk = Σ δj wjk into δk results in
δk = f’(netk) Σ δj wjk
which gives the local gradient for the hidden neuron k
Multilayer Perceptron
Summary of Back Propagation Algorithm
1) Initialization: w(0) = a random weight vector
At time index n, get the input data and:
2) Calculate netk and yk for all the neurons
3) For output neurons, calculate ek = dk - yk
4) For output neurons, δk = -ek f’(netk)
5) For hidden neurons, δk = f’(netk) Σ δj wjk
6) Update every neuron's weights by
wk,i(NEW) = wk,i(OLD) – μ δk yi
7) Repeat steps 2~6 for the next data set (next time index)
Multilayer Perceptron
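To make these steps concrete, here is a sketch of a one-hidden-layer network trained on XOR. Only the numbered steps come from the slide; the network size, step size, number of epochs, and the use of the sigmoid derivative f'(net) = y(1 − y) are our own choices:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_mlp_xor(mu=0.5, epochs=10000, seed=0):
    """One-hidden-layer perceptron trained with the back-propagation steps above."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    D = np.array([0, 1, 1, 0], dtype=float)            # XOR targets
    W1 = rng.standard_normal((4, 3))                    # hidden layer: 4 neurons, 2 inputs + bias
    W2 = rng.standard_normal((1, 5))                    # output layer: 1 neuron, 4 hidden + bias
    for _ in range(epochs):
        for x, d in zip(X, D):
            # 2) forward pass: net_k and y_k for all neurons (y0 = 1 is the bias input)
            y0 = np.concatenate(([1.0], x))
            y1 = sigmoid(W1 @ y0)
            z1 = np.concatenate(([1.0], y1))
            y2 = sigmoid(W2 @ z1)
            # 3)-4) output neuron: error e_k and local gradient delta_k = -e_k f'(net_k)
            e = d - y2
            delta2 = -e * y2 * (1 - y2)
            # 5) hidden neurons: delta_k = f'(net_k) * sum_j delta_j w_jk (bias column excluded)
            delta1 = y1 * (1 - y1) * (W2[:, 1:].T @ delta2)
            # 6) weight updates: w_k,i <- w_k,i - mu * delta_k * y_i
            W2 -= mu * np.outer(delta2, z1)
            W1 -= mu * np.outer(delta1, y0)
    return W1, W2

W1, W2 = train_mlp_xor()
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    y1 = sigmoid(W1 @ np.concatenate(([1.0], x)))
    y2 = sigmoid(W2 @ np.concatenate(([1.0], y1)))
    print(x, round(float(y2[0]), 3))   # usually close to 0, 1, 1, 0 for this setup
```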
THE END