Artificial Intelligence
Chapter 5- Artificial Neural Networks
Content
2
 Biological Analogy
 History
 ANN Structures
 Learning in ANN
 Designing an ANN based System
 Back Propagation
 Chapter Summary
Artificial Neural Network (ANN)
3
 Biological Analogy
 ANN is a computational model that simulates some
properties of the human brain
Biological Neural Network Artificial Neural Networks
Artificial Neural Network (ANN)…
4
 Biological Analogy…
 The brain is composed of a mass of interconnected
neurons. Each neuron is connected to many other neurons
 Neurons transmit signals to each other
 Whether a signal is transmitted is an all-or-nothing event
(the electrical potential in the cell body of the neuron is
thresholded)
 Whether a signal is sent depends on the strength of the
bond (synapse) between two neurons
Artificial Neural Network (ANN)…
5
 Brain Vs Digital Computers
 Computers require hundreds of cycles to simulate the
firing of a single neuron
 The brain can fire all its neurons in a single step
o Parallelism
 Computers require billions of cycles to perform some
tasks but the brain takes less than a second
e.g. Face Recognition
Artificial Neural Network (ANN)…
6
 Brain Vs Digital Computers…
A crude comparison of the raw computational resources available to
computers (circa 1994) and brains:

                      Computer                         Human
Computational units   1 CPU, 10^5 gates                10^11 neurons
Storage units         10^9 bits RAM, 10^10 bits disk   10^11 neurons, 10^14 synapses
Cycle time            10^-8 sec                        10^-3 sec
Bandwidth             10^9 bits/sec                    10^14 bits/sec
Neuron updates/sec    10^5                             10^14
Artificial Neural Network (ANN)…
7
 History
 1943:
1943: McCulloch & Pitts show that neurons can be combined
to construct a Turing machine (using ANDs, Ors, & NOTs)
 1958:
1958: Rosenblatt shows that perceptrons will converge if
what they are trying to learn can be represented
 1969:
1969: Minsky & Papert showed the limitations of
perceptrons, killing research for a decade
 1985:
1985: backpropagation
backpropagation algorithm revitalizes the field
Artificial Neural Network (ANN)…
8
 ANN
 Is a system composed of many simple processing
elements operating in parallel which can acquire,
store, and utilize experiential knowledge
 Receives information from other neurons or from
external sources, transforms the information, and
passes it on to other neurons or as external outputs
 Useful for pattern recognition, learning, and the
interpretation of incomplete inputs
Artificial Neural Network (ANN)…
9
 ANN Structures
 Feed-forward neural nets:
 Links can only go in one direction
 Arranged in layers
 Each unit is linked only to units in the next layer
 No links between units in the same layer, back to the
previous layer, or skipping a layer
 Computations can proceed uniformly from input to output
units
 No internal state exists
[Figure: feed-forward vs recurrent network topologies]
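As a sketch of how a feed-forward net computes uniformly from inputs to outputs, the Python below propagates activations layer by layer; the two-layer shape and all weight values are made-up illustration values, not taken from the slides:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    """Propagate activations from the input layer to the output layer.
    Each layer is a list of (incoming_weights, bias) pairs, one per unit."""
    activations = inputs
    for layer in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(weights, activations)) + bias)
            for weights, bias in layer
        ]
    return activations

# A tiny 2-input, 2-hidden-unit, 1-output net with made-up weights
net = [
    [([0.5, -0.4], 0.1), ([0.3, 0.8], -0.2)],  # hidden layer
    [([1.0, -1.0], 0.0)],                      # output layer
]
output = forward([1.0, 0.0], net)
```

Because links only go forward, a single left-to-right pass over the layers suffices; no internal state is carried between calls.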
Artificial Neural Network (ANN)…
10
 ANN Structures
 Feed-forward neural nets…
[Figure: feed-forward nets subdivide into perceptrons and multi-layer networks]
Artificial Neural Network (ANN)…
11
 ANN Structures
 Feed-forward neural nets…
Multi Layer Networks:
- Have one or more layers of hidden units.
- With two possibly very large hidden layers, it is
possible to implement any function.
Perceptrons:
- Networks without a hidden layer are called perceptrons.
- Perceptrons are very limited in what they can
represent, but this makes their learning problem much
simpler.
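The representational gap between perceptrons and multi-layer networks can be made concrete with XOR: no single threshold unit computes it, but one hidden layer of two threshold units suffices. The weights below are hand-picked for illustration:

```python
def step(x):
    # Threshold (perceptron-style) activation: all-or-nothing
    return 1 if x >= 0 else 0

def xor_net(x1, x2):
    a = step(x1 + x2 - 0.5)    # hidden unit A: fires on "x1 OR x2"
    b = step(-x1 - x2 + 1.5)   # hidden unit B: fires on "NOT (x1 AND x2)"
    return step(a + b - 1.5)   # output unit: fires on "A AND B"

table = [(x1, x2, xor_net(x1, x2)) for x1 in (0, 1) for x2 in (0, 1)]
```

A lone perceptron cannot represent this truth table because XOR is not linearly separable, which is exactly the limitation Minsky & Papert pointed out.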
Artificial Neural Network (ANN)…
12
 ANN Structures
 Recurrent neural nets:
 Links can go anywhere and form arbitrary topologies
 Allows activation to be fed back to previous units
 Internal state is stored in the activation levels
 Can become unstable
 Can oscillate
 May take a long time to compute a stable output
 Learning process is much more difficult
 Can implement more complex designs
 Can model certain systems with internal states
[Figure: feed-forward vs recurrent network topologies]
Artificial Neural Network (ANN)…
13
 Learning in ANN
 Supervised Learning
 Classification problem
 Unsupervised Learning
 Clustering
 Reinforcement Learning
 Reward-based approach
Artificial Neural Network (ANN)…
14
 Designing an ANN based system
 Get the data set
 Determine the network topology
 Training
 Optimization
 Testing
Artificial Neural Network (ANN)…
15
 Designing an ANN based system…
 Get the data set
 Example
 Determine the characteristics of the customers of a given business
institution (bank) in terms of credit risk (low credit risk or high credit
risk)
o The data for this problem is all about the attributes of the customers
Age
Income
Debt
Payment record
Artificial Neural Network (ANN)…
16
 Designing an ANN based system…
 Determine the network topology (default NN topology)
 Number of nodes in the Input layer
 Equal to the number of input parameters (independent variables)
 Number of nodes and layers in the Hidden layer
 There is no clear-cut rule (optimization governs)
o Minimum error rate, and
o Fewer hidden nodes, which helps simplify the hardware
representation of the network
 Number of nodes in the Output layer
 Equal to the number of output parameters (dependent variables)
o e.g. number of class labels
 Possible to optimize the number of output nodes as per the nature of the
output
Artificial Neural Network (ANN)…
17
 Designing an ANN based system…
 Training
 Dataset
 Determine the training, validation, and testing datasets
 Activation function
 Back propagation algorithm
 Stopping rule
 Error rate
 Number of epochs
 Momentum coefficient
Artificial Neural Network (ANN)…
18
 Designing an ANN based system…
 Optimization: to produce an optimal NN topology
 Minimum error rate
 Fewer nodes and layers in the hidden layer
 Testing
 Without input noise
 With input noise
 Using the Network
 If the error rate is within the acceptable error range, the network is ready for
use
Artificial Neural Network (ANN)…
19
 Designing an ANN based system…
 General Rules
 Initial network has randomly assigned weights
 Learning is done by making small adjustments in the weights to
reduce the difference between the observed and predicted values
 Main difference from the logical algorithms is the need to repeat
the update phase several times in order to achieve convergence
 Updating process is divided into epochs
 Each epoch updates all the weights of the network.
Artificial Neural Network (ANN)…
20
 Designing an ANN based system…
 Example 1
 Determine the characteristics of the customers of a given business
institution (bank) in terms of credit risk (low credit risk or high credit
risk)
 The data for this problem is all about the attributes of the customers
o Age
o Income
o Debt
o Payment record
Artificial Neural Network (ANN)…
21
 Designing an ANN based system…
 Example 1…
Artificial Neural Network (ANN)…
22
 Backpropagation
 In 1969 a method for learning in multi-layer networks,
Backpropagation, was invented by Bryson and Ho
 The Backpropagation algorithm is a sensible approach
for apportioning the contribution of each weight to the error
 The weight update works basically the same way as for perceptrons
Artificial Neural Network (ANN)…
23
 Backpropagation…
 Backpropagation Learning Principles: Hidden Layers
and Gradients
 There are two differences in the updating rule:
 The activation of the hidden unit is used instead of
the input value
 The rule contains a term for the gradient of the
activation function
Artificial Neural Network (ANN)…
24
 Backpropagation…
 Network training
 1. Initialize network with random weights (small random numbers, e.g. ranging from
-1.0 to 1.0, or -0.5 to 0.5)
 2. For all training cases (called examples):
 a. Present training inputs to network and calculate output
o Given a unit j in the hidden or output layer, the net input Ij to unit j is
Ij = Σi WijOi + θj
o Where
Wij is the weight of the connection from unit i in the previous layer to unit j;
Oi is the output of unit i from the previous layer; and
θj is the bias of the unit
 b. Activation function: Given the net input Ij to unit j, the output Oj of unit j is
computed as
Oj = 1 / (1 + e^(-Ij)) ---------- Sigmoid function
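Steps 2a and 2b translate directly into code; this is a minimal sketch of the net-input and sigmoid formulas, with illustrative weight and bias values:

```python
import math

def net_input(weights, outputs, bias):
    # I_j = sum_i(W_ij * O_i) + theta_j
    return sum(w * o for w, o in zip(weights, outputs)) + bias

def sigmoid(i_j):
    # O_j = 1 / (1 + e^(-I_j))
    return 1.0 / (1.0 + math.exp(-i_j))

# A unit with two incoming connections (illustrative values)
i_j = net_input([0.2, -0.5], [1.0, 1.0], -0.2)  # = -0.5
o_j = sigmoid(i_j)                              # ≈ 0.3775
```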
Artificial Neural Network (ANN)…
25
 Backpropagation…
 Network training…
 3. Back propagate the error
 For a unit j in the output layer, the error Errj is computed by
Errj = Oj(1 - Oj)(Tj - Oj)
o Where
Oj is the actual output of unit j
Tj is the true (target) output for unit j
Oj(1 - Oj) is the derivative of the logistic function
 To compute the error of a hidden layer unit j, the weighted sum of the errors of the
units connected to unit j in the next layer is considered. The error of the hidden layer
unit j is
Errj = Oj(1 - Oj) Σk ErrkWjk
o Where
Wjk is the weight of the connection from unit j to a unit k in the next higher layer
Errk is the error of unit k
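Both error formulas can be sketched as follows; the numeric values fed in at the end are illustrative, not taken from the slides:

```python
def output_error(o_j, t_j):
    # Err_j = O_j * (1 - O_j) * (T_j - O_j) for an output unit
    return o_j * (1 - o_j) * (t_j - o_j)

def hidden_error(o_j, downstream):
    # Err_j = O_j * (1 - O_j) * sum_k(Err_k * W_jk) for a hidden unit;
    # `downstream` holds (Err_k, W_jk) pairs for the next higher layer
    return o_j * (1 - o_j) * sum(err_k * w_jk for err_k, w_jk in downstream)

err_out = output_error(0.47, 1.0)               # output unit, target T_j = 1
err_hid = hidden_error(0.38, [(err_out, -0.3)]) # hidden unit feeding that output
```

Note the order: output-layer errors must be computed first, since each hidden unit's error is propagated back from the errors of the units it feeds.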
Artificial Neural Network (ANN)…
26
 Backpropagation…
 Network training…
 4. Update weights and biases to reflect the propagated error
 Update weights
ΔWij = (l)ErrjOi
Wij = Wij + ΔWij
o Where
ΔWij is the change in weight Wij
l is the learning rate (a constant typically having a value between 0.0 and 1.0)
(a rule of thumb is to set the learning rate to 1/t, where t is the number
of iterations through the training set so far)
 Update biases
Δθj = (l)Errj
θj = θj + Δθj
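Step 4 as a sketch, with illustrative numbers; `l` is the learning rate:

```python
def update_weight(w_ij, l, err_j, o_i):
    # Delta W_ij = l * Err_j * O_i ; W_ij := W_ij + Delta W_ij
    return w_ij + l * err_j * o_i

def update_bias(theta_j, l, err_j):
    # Delta theta_j = l * Err_j ; theta_j := theta_j + Delta theta_j
    return theta_j + l * err_j

# Illustrative numbers: learning rate 0.9, Err_j = 0.1311, O_i = 0.378
new_w = update_weight(-0.3, 0.9, 0.1311, 0.378)  # ≈ -0.2554
new_theta = update_bias(0.1, 0.9, 0.1311)        # ≈ 0.218
```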
Artificial Neural Network (ANN)…
27
 Backpropagation…
 Network training…
 5. Terminating conditions
 All ΔWij in the previous epoch were so small as to be below
some specified threshold, or
 The percentage of samples misclassified in the previous epoch
is below some threshold, or
 A prespecified number of epochs has expired
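The three terminating conditions fit naturally around an epoch loop. `train_one_epoch` below is a hypothetical stand-in for the forward/backward/update steps (steps 2-4), and the dummy epoch function exists only to exercise the stopping logic:

```python
def train(network, data, train_one_epoch, max_epochs=1000,
          weight_eps=1e-6, err_threshold=0.05):
    """Run epochs until one of the three stopping rules fires.
    `train_one_epoch` stands in for the forward/backward/update steps and
    must return (largest weight change, misclassification rate)."""
    for epoch in range(max_epochs):
        max_dw, misclassified = train_one_epoch(network, data)
        if max_dw < weight_eps:            # 1. all weight changes below threshold
            return epoch + 1
        if misclassified < err_threshold:  # 2. error rate below threshold
            return epoch + 1
    return max_epochs                      # 3. epoch budget expired

# Dummy epoch function: the error rate halves on every call
state = {"err": 0.5}
def fake_epoch(network, data):
    state["err"] *= 0.5
    return 0.01, state["err"]

epochs_run = train(None, None, fake_epoch)
```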
Artificial Neural Network (ANN)…
28
 Backpropagation…
 Example 1
[Figure: a multilayer feed-forward network; inputs x1, x2, x3 feed hidden
units 4 and 5 through weights w14, w15, w24, w25, w34, w35, and hidden
units 4 and 5 feed output unit 6 through weights w46 and w56]
The above figure shows a multilayer feed-forward neural network. Let the
learning rate be 0.9. The initial weight and bias values of the network
are given in the table along with the first training sample, X = (1, 0, 1),
whose class label is 1. Show the calculation for the backpropagation.
x1  x2  x3  w14  w15   w24  w25  w34   w35  w46   w56   θ4    θ5   θ6
1   0   1   0.2  -0.3  0.4  0.1  -0.5  0.2  -0.3  -0.2  -0.2  0.2  0.1
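Following the slide's formulas with the table's initial values (learning rate 0.9, X = (1, 0, 1), target class label 1), one full forward and backward pass can be checked numerically. The network shape is hard-coded from the figure; this is a sketch of the calculation, not a general implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial values from the table
x = {1: 1.0, 2: 0.0, 3: 1.0}
w = {(1, 4): 0.2, (1, 5): -0.3, (2, 4): 0.4, (2, 5): 0.1,
     (3, 4): -0.5, (3, 5): 0.2, (4, 6): -0.3, (5, 6): -0.2}
theta = {4: -0.2, 5: 0.2, 6: 0.1}
l, target = 0.9, 1.0

# Forward pass: hidden units 4 and 5, then output unit 6
o = {}
for j in (4, 5):
    o[j] = sigmoid(sum(w[(i, j)] * x[i] for i in (1, 2, 3)) + theta[j])
o[6] = sigmoid(w[(4, 6)] * o[4] + w[(5, 6)] * o[5] + theta[6])

# Backward pass: output-layer error, then hidden-layer errors
err = {6: o[6] * (1 - o[6]) * (target - o[6])}
for j in (4, 5):
    err[j] = o[j] * (1 - o[j]) * err[6] * w[(j, 6)]

# Weight and bias updates (only after all errors are computed)
for j in (4, 5):
    w[(j, 6)] += l * err[6] * o[j]
    for i in (1, 2, 3):
        w[(i, j)] += l * err[j] * x[i]
for j in (4, 5, 6):
    theta[j] += l * err[j]
```

With θ4 = -0.2 as given in the table, the forward pass yields O4 ≈ 0.378, O5 ≈ 0.525, O6 ≈ 0.470, and the output-layer error is Err6 ≈ 0.132.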
Artificial Neural Network (ANN)…
29
 Summary
 NNs are used for classification and function approximation
or mapping problems which:
 Are tolerant of some imprecision
 Have lots of training data available
 Cannot easily be captured by hard-and-fast rules
Artificial Neural Network (ANN)…
30
 Summary…
 ANNs applications can be grouped into the following categories:
 Clustering:
 A clustering algorithm explores the similarity between patterns and
places similar patterns in a cluster. Best known applications include data
compression and data mining.
 Classification/Pattern recognition:
 The task of pattern recognition is to assign an input pattern (like
handwritten symbol) to one of many classes. This category includes
algorithmic implementations such as associative memory.
 Function approximation:
 The task of function approximation is to find an estimate of the unknown
function f() from data subject to noise. Various engineering and scientific
disciplines require function approximation
Artificial Neural Network (ANN)…
31
 Summary…
 ANNs applications can be grouped into the following
categories:…
 Prediction/Dynamical Systems:
 The task is to forecast future values of time-sequenced
data. Prediction has a significant impact on decision support
systems. Prediction differs from function approximation by
taking the time factor into account. Here the system is dynamic and
may produce different results for the same input data depending on
the system state (time).
Assignment-2
32
 Explain the following terms (in the context of ANNs)
 Epoch
 Learning rate
 Momentum coefficient
 Explain sampling techniques used to determine the training, validation, and
testing datasets in the training of ANNs.
 Compare ANN with
 Support Vector Machine (SVM)
 Decision tree
 Bayesian Belief Network
Due date 04.01.2018
Project
33
 Consider a classification problem in a supervised
environment and develop an ANN based expert
system in the domains given below
 Group-1
 Agriculture
 Group-2
 Energy
 Group-3
 Health
 Report submission and Presentation due date
 26.01.18