Unit III NEURAL NETWORKS
Adaptive Networks – Feed Forward Networks
Topics
• Machine Learning using Neural Network
• Adaptive Networks – Feed Forward Networks
• Supervised Learning Neural Networks, Radial
Basis Function Networks
• Reinforcement Learning
• Unsupervised Learning Neural Networks
• Adaptive Resonance Architectures
• Advances in Neural Networks
ARTIFICIAL NEURAL NET
The figure shows a simple artificial neural net with two input
neurons (X1, X2) and one output neuron (Y). The interconnection weights are given by W1 and W2.
Source: “Principles of Soft Computing, 2nd Edition” by S.N. Sivanandam & S.N. Deepa, Copyright © 2011 Wiley India Pvt. Ltd. All rights reserved.
ASSOCIATION OF BIOLOGICAL NET WITH
ARTIFICIAL NET
PROCESSING OF AN ARTIFICIAL NET
The neuron is the basic information processing unit of a NN. It consists of:
1. A set of links, describing the neuron inputs, with weights W1, W2, …, Wm.
2. An adder function (linear combiner) for computing the weighted sum of the inputs (real numbers):
   u = Σ_{j=1..m} Wj Xj
3. An activation function for limiting the amplitude of the neuron output:
   y = φ(u + b)
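As a concrete illustration of the adder and activation steps above (and of the bias b introduced on the next slide), here is a minimal Python sketch of a single neuron; the input values, weights, bias, and the choice of a binary sigmoid activation are illustrative assumptions rather than values taken from the slides.

```python
import math

def neuron_output(x, w, b):
    """Single artificial neuron: weighted sum (adder) plus bias, then activation."""
    u = sum(wj * xj for wj, xj in zip(w, x))   # u = sum_j Wj * Xj
    return 1.0 / (1.0 + math.exp(-(u + b)))    # y = phi(u + b), binary sigmoid

# Two inputs (X1, X2) with weights (W1, W2), as in the figure above (values assumed)
print(neuron_output(x=[0.5, -1.0], w=[0.8, 0.2], b=0.1))
```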
BIAS OF AN ARTIFICIAL NEURON
The bias value is added to the weighted sum ∑ wi xi so that the net input can be shifted relative to the origin:
Yin = ∑ wi xi + b, where b is the bias.
(Figure: the lines x1 − x2 = −1, x1 − x2 = 0, and x1 − x2 = 1 in the (x1, x2) plane, showing how the bias shifts the decision boundary away from the origin.)
MULTI LAYER ARTIFICIAL NEURAL NET
INPUT: records without a class attribute, with normalized attribute values.
INPUT VECTOR: X = { x1, x2, …, xn} where n is the
number of (non-class) attributes.
INPUT LAYER: there are as many nodes as non-
class attributes, i.e. as the length of the input
vector.
HIDDEN LAYER: the number of nodes in the hidden layer and the number of hidden layers depend on the implementation.
OPERATION OF A NEURAL NET
(Figure: a neuron forms the weighted sum of the input vector x = (x0, x1, …, xn) with the weight vector w = (w0j, w1j, …, wnj), where x0 supplies the bias, and passes the sum through the activation function f to produce the output y.)
WEIGHT AND BIAS UPDATION
Per Sample Updating
• Updating weights and biases after the presentation
of each sample.
Per Training Set Updating (Epoch or Iteration)
• Weight and bias increments could be accumulated
in variables and the weights and biases updated
after all the samples of the training set have been
presented.
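The two schedules can be sketched as follows; the parameter list, gradient function, and learning rate are hypothetical placeholders, since the slides do not fix a particular model.

```python
def train_per_sample(params, samples, grad, lr=0.1, epochs=10):
    """Per-sample updating: parameters change after every (input, target) pair."""
    for _ in range(epochs):
        for x, d in samples:
            g = grad(params, x, d)                              # gradient for this sample
            params = [p - lr * gi for p, gi in zip(params, g)]  # immediate update
    return params

def train_per_epoch(params, samples, grad, lr=0.1, epochs=10):
    """Per-training-set updating: increments accumulate; one update per epoch."""
    for _ in range(epochs):
        acc = [0.0] * len(params)
        for x, d in samples:
            acc = [a + gi for a, gi in zip(acc, grad(params, x, d))]  # accumulate increments
        params = [p - lr * a for p, a in zip(params, acc)]            # apply once per epoch
    return params
```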
STOPPING CONDITION
• All changes in weights (wij) in the previous epoch are below some threshold, or
• The percentage of samples misclassified in the previous epoch is below some threshold, or
• A pre-specified number of epochs has expired.
• In practice, several hundred thousand epochs may be required before the weights converge.
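A sketch of how the three tests might be combined in code; the threshold values are arbitrary placeholders, not values from the slides.

```python
def should_stop(weight_changes, misclassified_fraction, epoch,
                w_tol=1e-4, err_tol=0.02, max_epochs=100_000):
    """Return True when any of the three stopping conditions holds."""
    return (max(abs(dw) for dw in weight_changes) < w_tol   # all weight changes below threshold
            or misclassified_fraction < err_tol             # misclassification rate below threshold
            or epoch >= max_epochs)                         # epoch budget exhausted
```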
BUILDING BLOCKS OF ARTIFICIAL NEURAL NET
• Network Architecture (Connection between Neurons)
• Setting the Weights (Training)
• Activation Function
LAYER PROPERTIES
• Input Layer: Each input unit may be designated by an attribute value possessed by the instance.
• Hidden Layer: Not directly observable; provides the nonlinearities for the network.
• Output Layer: Encodes the possible output values (e.g., class labels).
TRAINING PROCESS
• Supervised Training - Providing the network with a series of sample inputs and comparing the output with the expected responses.
• Unsupervised Training - Similar input vectors are assigned to the same output unit.
• Reinforcement Training - The right answer is not provided, but an indication of whether the output is ‘right’ or ‘wrong’ is provided.
ACTIVATION FUNCTION
• ACTIVATION LEVEL – DISCRETE OR CONTINUOUS
• HARD LIMIT FUNCTION (DISCRETE)
  ▫ Binary activation function
  ▫ Bipolar activation function
  ▫ Identity function
• SIGMOIDAL ACTIVATION FUNCTION (CONTINUOUS)
  ▫ Binary sigmoidal activation function
  ▫ Bipolar sigmoidal activation function
ACTIVATION FUNCTION
Activation functions:
(A) Identity
(B) Binary step
(C) Bipolar step
(D) Binary sigmoidal
(E) Bipolar sigmoidal
(F) Ramp
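These functions can be written out as follows; the steepness parameter lam (λ) and the ramp breakpoints at 0 and 1 follow the usual conventions and are assumptions here rather than definitions given on the slide.

```python
import math

def identity(x):
    return x

def binary_step(x):
    return 1 if x >= 0 else 0          # hard limit, outputs in {0, 1}

def bipolar_step(x):
    return 1 if x >= 0 else -1         # hard limit, outputs in {-1, 1}

def binary_sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + math.exp(-lam * x))         # continuous, range (0, 1)

def bipolar_sigmoid(x, lam=1.0):
    return 2.0 / (1.0 + math.exp(-lam * x)) - 1.0   # continuous, range (-1, 1)

def ramp(x):
    return 0.0 if x < 0 else (x if x <= 1 else 1.0)  # identity clipped to [0, 1]
```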
CONSTRUCTING ANN
• Determine the network properties:
  ▫ Network topology
  ▫ Types of connectivity
  ▫ Order of connections
  ▫ Weight range
• Determine the node properties:
  ▫ Activation range
• Determine the system dynamics:
  ▫ Weight initialization scheme
  ▫ Activation-calculating formula
  ▫ Learning rule
PROBLEM SOLVING
• Select a suitable NN model based on the nature of the problem.
• Construct a NN according to the characteristics of the application domain.
• Train the neural network with the learning procedure of the selected model.
• Use the trained network for making inferences or solving problems.
NEURAL NETWORKS
• A neural network learns by adjusting the weights so as to correctly classify the training data and hence, after the testing phase, to classify unknown data.
• A neural network needs a long time for training.
• A neural network has a high tolerance to noisy and incomplete data.
SALIENT FEATURES OF ANN
• Adaptive learning
• Self-organization
• Real-time operation
• Fault tolerance via redundant information coding
• Massive parallelism
• Learning and generalizing ability
• Distributed representation
Adaptive Networks
• An adaptive network is a network structure
consisting of a number of nodes connected through
directional links.
• Each node represents a process unit, and the links
between nodes specify the causal relationship
between the connected nodes.
• All or some of the nodes are adaptive, which means their outputs depend on modifiable parameters.
• The learning rule specifies how these parameters
should be updated to minimize a prescribed error
measure
▫ It is a mathematical expression that measures the
discrepancy between the network's actual output and a
desired output.
Adaptive Networks
• Basic learning rule of the adaptive network is the
well-known steepest descent method, in which
the gradient vector is derived by successive
invocations of the chain rule.
• When applied to a multilayer feedforward neural network, this gradient-based procedure is known as the backpropagation learning rule.
(Figure: a feedforward adaptive network.)
Adaptive Networks
• An adaptive network is a network structure
whose overall input-output behavior is
determined by a collection of modifiable
parameters.
• The configuration of an adaptive network
▫ set of nodes connected by directed links where
each node performs a static node function on its
incoming signals to generate a single node output
and each link specifies the direction of signal flow
from one node to another.
Adaptive Networks
• An adaptive network is heterogeneous and each
node may have a specific node function different
from the others.
• Links in an adaptive network are merely used to
specify the propagation direction of node outputs;
generally there are no weights or parameters
associated with links.
Adaptive Networks
• The parameters of an adaptive network are distributed
into its nodes, so each node has a local parameter set.
• The union of these local parameter sets is the
network's overall parameter set.
• If a node's parameter set is not empty, then its node
function depends on the parameter values; we use a
square to represent this kind of adaptive node.
• If a node has an empty parameter set, then its
function is fixed; we use a circle to denote this type of
fixed node.
• Each adaptive node can be decomposed into a fixed
node plus one or several parameter nodes.
(Figure: the parameter-sharing problem.)
Adaptive networks - Classification
• Feedforward
• Recurrent
Layered representation of the feed-forward
adaptive network
• No links between nodes in the same layer, and
outputs of nodes in a specific layer are always
connected to nodes in succeeding layers.
• This representation is usually preferred because
of its modularity, in that nodes in the same layer
have the same functionality or generate the same
level of abstraction about input vectors.
Topological ordering representation
• Labels the nodes in an ordered sequence 1,2,3,..., such
that there are no links from node i to node j whenever i
>= j.
• This representation is less modular than the layer
representation, but it facilitates the formulation of
learning rules, as will be detailed in the next section.
• Special case of layered representation (one node per
layer)
Feed-forward adaptive network
• Conceptually, a feedforward adaptive network is actually a
static mapping between its input and output spaces; this
mapping may be either a simple linear relationship or a
highly nonlinear one, depending on the network structure
(node arrangement and connections, and so on) and the
functionality for each node.
• Here our aim is to construct a network for achieving a
desired nonlinear mapping that is regulated by a data set
consisting of desired input-output pairs of a target system
to be modeled.
• This data set is usually called the training data set, and the
procedures we follow in adjusting the parameters to
improve the network's performance are often referred to
as the learning rules or adaptation algorithms.
Feed-forward adaptive network
• Usually a network's performance is measured as
the discrepancy between the desired output and
the network's output under the same input
conditions.
• This discrepancy is called the error measure and it
can assume different forms for different
applications.
• Generally speaking, a learning rule is derived by
applying a specific optimization technique to a
given error measure.
Examples of adaptive networks
• An adaptive network with a single linear node
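The accompanying figure is not shown here; in the usual version of this example the single adaptive node computes an affine function of its two inputs, with a1, a2, and a3 as its modifiable parameters:

$$x_3 = f_3(x_1, x_2; a_1, a_2, a_3) = a_1 x_1 + a_2 x_2 + a_3$$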
Examples of adaptive networks
• Perceptron network
• We can form an equivalent network with a single node whose function is the composition of f3 and f4; the resulting node is the building block of the classical perceptron.
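Assuming the same linear node f3 as above and a step function for f4 (the classical perceptron construction), the two node functions are:

$$x_3 = a_1 x_1 + a_2 x_2 + a_3, \qquad x_4 = f_4(x_3) = \begin{cases} 1 & \text{if } x_3 \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$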
Examples of adaptive networks
• Since the step function is discontinuous at one point
and flat at all the other points, it is not suitable for
derivative-based learning procedures.
• One way to get around this difficulty is to use the
sigmoidal function as a squashing function that has
values between 0 and 1:
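The squashing function meant here is the binary sigmoid:

$$f_4(x_3) = \frac{1}{1 + e^{-x_3}}$$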
• This is a continuous and differentiable approximation to the step function. The composition of f3 and this differentiable f4 is the building block for the multilayer perceptron in the following example.
(Figure: a multilayer perceptron.)
BACKPROPAGATION FOR FEEDFORWARD
NETWORKS
• The central part of this learning rule concerns how
to recursively obtain a gradient vector in which
each element is defined as the derivative of an
error measure with respect to a parameter.
• This is done by means of the chain rule, a basic
formula for differentiating composite functions.
• The procedure of finding a gradient vector in a
network structure is generally referred to as
backpropagation because the gradient vector is
calculated in the direction opposite to the flow of
the output of each node.
BACKPROPAGATION FOR FEEDFORWARD
NETWORKS
• Once the gradient is obtained, a number of
derivative-based optimization and regression
techniques are available for updating the
parameters.
• In particular, if we use the gradient vector in a
simple steepest descent method, the resulting
learning paradigm is often referred to as the
backpropagation learning rule.
BACKPROPAGATION FOR FEEDFORWARD
NETWORKS
• Suppose that a given feedforward adaptive network in the layered representation has L layers and layer l (l = 0, 1, ..., L; l = 0 represents the input layer) has N(l) nodes.
• Then the output and function of node i [i = 1, ..., N(l)] in layer l can be represented as xl,i and fl,i, respectively.
• We assume that there are no jumping links (that is, links
connecting nonconsecutive layers).
• Since the output of a node depends on the incoming
signals and the parameter set of the node, we have the
following general expression for the node function fl,i
α, β, γ etc. are the parameters of this node.
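With the parameters written explicitly, this general expression takes the form:

$$x_{l,i} = f_{l,i}\big(x_{l-1,1}, \ldots, x_{l-1,N(l-1)};\ \alpha, \beta, \gamma, \ldots\big)$$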
BACKPROPAGATION FOR FEEDFORWARD
NETWORKS
• An error measure for the pth (1 ≤ p ≤ P) entry of the training data, where dk is the kth component of the desired output and xL,k is the kth component of the actual output.
• Minimize the overall error measure, which is defined
as
• We assume that Ep depends on the output nodes only; this is not a universal definition of the error measure.
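With the sum-of-squared-errors measure (the usual choice, assumed here), the per-pattern and overall error measures read:

$$E_p = \sum_{k=1}^{N(L)} \big(d_k - x_{L,k}\big)^2, \qquad E = \sum_{p=1}^{P} E_p$$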
BACKPROPAGATION FOR FEEDFORWARD
NETWORKS
• To use steepest descent to minimize the error
measure, first we have to obtain the gradient
vector.
• Before calculating the gradient vector, we should
observe the following causal relationships:
BACKPROPAGATION FOR FEEDFORWARD
NETWORKS
• A small change in a parameter α will affect the output
of the node containing α; this in turn will affect the
output of the final layer and thus the error measure.
• Therefore, the basic concept in calculating the gradient
vector is to pass a form of derivative information
starting from the output layer and going backward
layer by layer until the input layer is reached.
• We define the error signal ϵl,i as the derivative of the
error measure Ep with respect to the output of node i in
layer l, taking both direct and indirect paths into
consideration.
BACKPROPAGATION FOR FEEDFORWARD
NETWORKS
• The ordered derivative takes into consideration
both the direct and indirect paths that lead to the
causal relationship.
• The error signal for the ith output node (at layer L) can be calculated directly:
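In symbols, using the ordered derivative ∂⁺ to account for both direct and indirect paths, the definition and its output-layer special case are:

$$\epsilon_{l,i} = \frac{\partial^{+} E_p}{\partial x_{l,i}}, \qquad \epsilon_{L,i} = \frac{\partial^{+} E_p}{\partial x_{L,i}} = \frac{\partial E_p}{\partial x_{L,i}}$$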
BACKPROPAGATION FOR FEEDFORWARD
NETWORKS
• For the internal node at the ith position of layer l,
the error signal can be derived by the chain rule:
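In the usual notation, this chain-rule expression is:

$$\epsilon_{l,i} = \frac{\partial^{+} E_p}{\partial x_{l,i}} = \sum_{m=1}^{N(l+1)} \frac{\partial f_{l+1,m}}{\partial x_{l,i}}\,\epsilon_{l+1,m}$$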
• The error signal of an internal node at layer l can
be expressed as a linear combination of the error
signal of the nodes at layer l + 1.
BACKPROPAGATION FOR FEEDFORWARD
NETWORKS
• Therefore, for any l and i, we can find ϵl,i
▫ Find error signals at the output layer,
▫ Repeat iteratively to reach the desired layer.
• This procedure is called backpropagation since
the error signals are obtained sequentially from
the output layer back to the input layer.
• The gradient vector is defined as the derivative of
the error measure with respect to each parameter,
so we have to apply the chain rule again to find
the gradient vector.
BACKPROPAGATION FOR FEEDFORWARD
NETWORKS
• If α is a parameter of the ith node at layer l, we
have
• More general form:
where S is the set of nodes containing α as a parameter, and x* and f* are the output and function, respectively, of a generic node in S.
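Written out in the usual notation, the two relations above are:

$$\frac{\partial^{+} E_p}{\partial \alpha} = \epsilon_{l,i}\,\frac{\partial f_{l,i}}{\partial \alpha}, \qquad \frac{\partial^{+} E_p}{\partial \alpha} = \sum_{x^{*} \in S} \frac{\partial^{+} E_p}{\partial x^{*}}\,\frac{\partial f^{*}}{\partial \alpha}$$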
BACKPROPAGATION FOR FEEDFORWARD
NETWORKS
• The derivative of the overall error measure E with
respect to α is
• Accordingly, for simple steepest descent without
line minimization, the update formula for the
generic parameter α is
• η is the learning rate
k is the step size, the length of each transition along the gradient direction in the
parameter space.
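Written out, the overall gradient and the update formula referred to above take the usual steepest-descent form; normalizing η by the length of the gradient vector (so that k is the step size) is the convention assumed here:

$$\frac{\partial^{+} E}{\partial \alpha} = \sum_{p=1}^{P} \frac{\partial^{+} E_p}{\partial \alpha}, \qquad \Delta\alpha = -\eta\,\frac{\partial^{+} E}{\partial \alpha}, \qquad \eta = \frac{k}{\sqrt{\sum_{\alpha} \left(\frac{\partial E}{\partial \alpha}\right)^{2}}}$$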
BACKPROPAGATION FOR FEEDFORWARD
NETWORKS
• When an n-node feedforward network is
represented in its topological order, we can
envision the error measure Ep as the output of an
additional node with index n + 1, whose node
function fn+1 can be defined on the outputs of any
nodes with smaller index;
• Therefore, Ep may depend directly on the output of any node.
• Applying the chain rule again, we have the
following concise formula for calculating the
error signal
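In this topological-order notation, the concise formula takes the form:

$$\epsilon_i = \frac{\partial^{+} E_p}{\partial x_i} = \frac{\partial f_{n+1}}{\partial x_i} + \sum_{j=i+1}^{n} \frac{\partial f_j}{\partial x_i}\,\epsilon_j$$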
BACKPROPAGATION FOR FEEDFORWARD
NETWORKS
• Here the first term shows the direct effect of xi on Ep via the direct path from node i to node n + 1, and each product term in the summation indicates the indirect effect of xi on Ep.
• Once we find the error signal for each node, then
the gradient vector for the parameters is derived
as before.
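To tie the error-signal recursion and the gradient formulas together, here is a minimal NumPy sketch for a fully connected network of sigmoid nodes with the squared-error measure; the architecture, initialization, and learning rate are illustrative assumptions, and the delta arrays below are the error signals folded together with the sigmoid derivative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(weights, x, d, lr=0.1):
    """One per-pattern steepest-descent update for a layered sigmoid network.

    weights: list of (fan_in + 1, fan_out) arrays; the last row of each holds the bias.
    """
    # Forward pass: store the output vector of every layer (the x_{l,i})
    outputs = [x]
    for W in weights:
        outputs.append(sigmoid(np.append(outputs[-1], 1.0) @ W))

    # Error signal at the output layer for Ep = sum_k (d_k - x_{L,k})^2,
    # combined with the derivative of the sigmoid output nodes
    delta = -2.0 * (d - outputs[-1]) * outputs[-1] * (1.0 - outputs[-1])

    # Backward pass: propagate error signals layer by layer and collect gradients
    new_weights = []
    for W, out_prev in zip(reversed(weights), reversed(outputs[:-1])):
        grad = np.outer(np.append(out_prev, 1.0), delta)        # dEp/dW for this layer
        delta = (W[:-1] @ delta) * out_prev * (1.0 - out_prev)   # error signal one layer back
        new_weights.append(W - lr * grad)                        # steepest-descent update
    return list(reversed(new_weights)), outputs[-1]

# Example: 2 inputs -> 4 hidden nodes -> 1 output (weight shapes include the bias row)
rng = np.random.default_rng(0)
weights = [0.1 * rng.standard_normal((3, 4)), 0.1 * rng.standard_normal((5, 1))]
weights, y = backprop_step(weights, np.array([0.5, -1.0]), np.array([1.0]))
```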
BACKPROPAGATION FOR FEEDFORWARD
NETWORKS
• Another systematic way to calculate the error
signals is through the representation of the error-
propagation network (or sensitivity model),
which is obtained from the original adaptive
network by reversing the links and supplying the
error signals at the output layer as inputs to the
new network.
BACKPROPAGATION FOR FEEDFORWARD
NETWORKS
• There are two types of learning paradigms available to suit the needs of various applications.
• Off-line learning (or batch learning)
▫ The update formula for parameter α is based on Equation
(8.8) and the update action takes place only after the whole
training data set has been presented—that is, only after each
epoch or sweep.
• On-line learning (or pattern-by-pattern learning)
▫ The parameters are updated immediately after each input-
output pair has been presented, and the update formula is
based on Equation (8.6).
• In practice, it is possible to combine these two learning modes and update the parameters after k training data entries have been presented, where k is between 1 and P; k is sometimes referred to as the epoch size.
EXTENDED BACKPROPAGATION FOR RECURRENT
NETWORKS
Because it has the directional loops 3-4-5, 3-4-6-5, and 6 (a self-loop), the network shown in the accompanying figure is a typical recurrent network.
EXTENDED BACKPROPAGATION FOR RECURRENT
NETWORKS
• Two operating modes through which the
network may satisfy the node functions
▫ Synchronous operation
 Back propagation Through Time (BPTT)
 Real-Time Recurrent Learning (RTRL)
▫ Continuous operation
SYNCHRONOUS OPERATION
• If a network is operated synchronously, all nodes
change their outputs simultaneously according
to a global clock signal and there is a time delay
associated with each link.
• This synchronization is reflected by adding the
time t as an argument to the output of each node
(assuming there is a unit time delay associated
with each link):
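The equation has the general form below, where x_{j1}, …, x_{jm} denote the node outputs feeding node i (the exact notation is an assumption):

$$x_i(t) = f_i\big(x_{j_1}(t-1), \ldots, x_{j_m}(t-1);\ \text{parameters of node } i\big)$$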
CONTINUOUS OPERATION
• In a network that is operated continuously, all
nodes continuously change their outputs until
Equation (8.13) is satisfied.
• This operating mode is of particular interest for
analog circuit implementations, where a certain
kind of dynamical evolution rule is imposed on
the network.
Backpropagation Through Time (BPTT)
• Identifying a set of parameters that will make
the output of a node (or several nodes) follow a
given trajectory (or trajectories) in the discrete
time domain.
• This problem of tracking or trajectory following
is usually solved by using a method called
unfolding of time to transform a recurrent
network into a feedforward one, as long as the
time t does not exceed a reasonable maximum T.
Backpropagation Through Time (BPTT)
• Note that the error signals of a parameter node come
from nodes located at layers across different time
instants;
• Thus, the backpropagation procedure (and the
corresponding steepest descent) for this kind of
unfolded network is often called backpropagation
through time (BPTT).
• Disadvantages of BPTT
▫ Requires extensive computing resources when the
sequence length T is large
▫ The duplication of nodes makes both memory
requirements and simulation time proportional to T.
Real-Time Recurrent Learning (RTRL)
• For long sequences or sequences of unknown
length, real-time recurrent learning (RTRL) is
employed to perform on-line learning — that is,
to update parameters while the network is
running rather than at the end of the presented
sequences.
Real-Time Recurrent Learning (RTRL)
• To save computation and memory requirements, a sensible choice is to minimize Ei at each time step instead of trying to minimize E at the end of the sequence.
• To achieve this, we need to calculate ∂⁺E/∂α recursively at each time step i.