Neural Networks and
Fuzzy Systems
Fundamental Concepts of
Artificial Neural Networks (ANN)
Dr. Tamer Ahmed Farrag
Course No.: 803522-3
Course Outline
Part I : Neural Networks (11 weeks)
• Introduction to Machine Learning
• Fundamental Concepts of Artificial Neural Networks
(ANN)
• Single-layer Perceptron Classifier
• Multi-layer Feedforward Networks
• Single-layer Feedback Networks
• Unsupervised learning
Part II : Fuzzy Systems (4 weeks)
• Fuzzy set theory
• Fuzzy Systems
Artificial Neural Networks
The Human Brain
• The brain contains about $10^{10}$ basic units called neurons. Each neuron, in turn, is connected to about $10^4$ other neurons.
• A neuron is a small cell that receives electro-chemical signals from its various sources and, in turn, responds by transmitting electrical impulses to other neurons.
The Stability-Plasticity Dilemma
• Plasticity: to continue to adapt to new information
• Stability: to preserve the information previously
learned
• The NN needs to remain ‘plastic’ to significant or useful information but remain ‘stable’ when presented with irrelevant information. This is known as the stability-plasticity dilemma.
Training vs. Inference
• Training: acquiring knowledge
• Inference: solving a problem using the acquired
knowledge
Brain vs. Computer
Biological Neuron
• A biological neuron has three main components: dendrites, the soma (or cell body), and the axon.
• Dendrites receive signals from other neurons.
• The soma sums the incoming signals. When sufficient input is received, the cell fires; that is, it transmits a signal over its axon to other cells.
Neural network: Definition
• Neural network: information processing paradigm inspired by
biological nervous systems, such as our brain
• Structure: large number of highly interconnected processing
elements (neurons) working together
• Like people, they learn from experience (by example)
Artificial Neural Network: Definition
• The idea of ANN: NNs learn the relationship between cause and effect, or organize large volumes of data into orderly and informative patterns.
• Definition of ANN: “Data processing system consisting of a
large number of simple, highly interconnected processing
elements (artificial neurons) in an architecture inspired by the
structure of the cerebral cortex of the brain”
(Tsoukalas & Uhrig, 1997).
Artificial Neurons
• ANNs have been developed as generalizations of
mathematical models of neural biology, based on the
assumptions that:
1. Information processing occurs at many simple
elements called neurons.
2. Signals are passed between neurons over connection
links.
3. Each connection link has an associated weight, which, in a typical neural net, multiplies the signal transmitted.
4. Each neuron applies an activation function to its net
input to determine its output signal.
Analogy of ANN With Biological NN
The Artificial Neuron
$\text{Output} = f\left(\sum_{i=0}^{n} w_i x_i + b\right)$
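A minimal sketch of this computation in Python; the inputs, weights, bias, and step activation below are made-up values, just to exercise the formula:

```python
import numpy as np

def neuron_output(x, w, b, f):
    """One artificial neuron: apply activation f to the net input w.x + b."""
    return f(np.dot(w, x) + b)

# Hypothetical values for illustration.
step = lambda s: 1 if s >= 0 else 0      # a simple threshold activation
x = np.array([0.5, -1.0, 2.0])           # inputs
w = np.array([0.8, 0.2, -0.5])           # weights
b = 0.1                                  # bias
print(neuron_output(x, w, b, step))      # net input = -0.7 -> output 0
```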
Typical Architecture of ANNs
• A typical neural network contains a large number
of artificial neurons called units arranged in a
series of layers.
Typical Architecture of ANNs (cont.)
• Input layer: contains the units (artificial neurons) that receive input from the outside world, which the network will learn from, recognize, or otherwise process.
• Output layer: contains the units that give the network's response based on what it has learned.
• Hidden layer: these units sit between the input and output layers. The job of a hidden layer is to transform the input into something the output units can use.
 Most neural networks are fully connected, meaning each hidden neuron is connected to every neuron in the previous (input) layer and in the next (output) layer.
Popular ANNs Architectures (Sample)
http://www.asimovinstitute.org/neural-network-zoo/
Popular ANNs Architectures (cont.)
Single-layer perceptron: a neural network having two input units and one output unit, with no hidden layers.
Multilayer Perceptron: these networks use one or more hidden layers of neurons, unlike the single-layer perceptron. They are also known as deep feedforward neural networks.
Hopfield Network: a fully interconnected network in which each neuron is connected to every other neuron. The network is trained by setting the neurons to a desired input pattern and then computing the weights; the weights are not changed afterwards. Once trained on one or more patterns, the network converges to the learned patterns.
Popular ANNs Architectures (cont.)
Deep Learning Neural Network: a feedforward neural network with a large structure (many hidden layers and a large number of neurons in each layer), used for deep learning.
Recurrent Neural Network: a type of neural network in which hidden-layer neurons have self-connections, so the network possesses memory. At any instant, a hidden-layer neuron receives activation from the layer below as well as its own previous activation value.
Long Short-Term Memory Network (LSTM): a type of neural network in which a memory cell is incorporated inside the hidden-layer neurons.
Convolutional Neural Network: a class of deep, feedforward artificial neural networks that has been successfully applied to analyzing visual imagery.
How are ANNs used to solve problems?
• The problem variables are mainly: inputs, weights, and outputs.
• Examples (training data) represent a solved problem, i.e., both the inputs and the outputs are known.
• There are many different algorithms that can be used to train artificial neural networks, each with its own advantages and disadvantages.
• The learning process within an ANN consists of altering the network's weights and biases (thresholds) with some kind of learning algorithm.
• The objective is to find a set of weight matrices which, when applied to the network, maps any input to a correct output.
• For a new problem, we then have the inputs and the weights, so we can easily compute the outputs.
Learning Techniques in ANNs
• Supervised Learning: the training data is input to the network and the desired output is known; the weights and biases are adjusted until the output yields the desired values.
• Unsupervised Learning: the input data is used to train a network whose desired output is not known. The network groups (classifies) the input data, adjusting its weights by extracting features from the input.
• Reinforcement Learning: the correct output is unknown, but the network receives feedback on whether its output is right or wrong; it is a form of semi-supervised learning.
Learning Algorithms
• Hebbian: depends on input-output correlation,
  $w = \sum_{j=1}^{m} X_j Y_j^{T}$
• Gradient Descent: minimization of an error $E$,
  $\Delta w_{ij} = -\eta \, \dfrac{\partial E}{\partial w_{ij}}$
• Competitive Learning: only the output neuron with the highest input is updated (winner-take-all strategy).
• Stochastic Learning: weights are adjusted in a probabilistic fashion, as in simulated annealing.
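A small sketch of the Hebbian rule above in Python; the two input/output pattern pairs are hypothetical:

```python
import numpy as np

# Hebbian learning: the weights are the accumulated outer products of
# the paired patterns, w = sum_j X_j Y_j^T (pure input-output correlation).
X = np.array([[1, -1, 1],
              [-1, 1, 1]])            # two input patterns (made up)
Y = np.array([[1],
              [-1]])                  # the corresponding outputs
W = sum(np.outer(y, x) for x, y in zip(X, Y))
print(W)                              # -> [[2 -2 0]]
```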
Learning Algorithms
• Gradient Descent: the simplest training algorithm used with supervised training. When the actual output differs from the target output, the difference (the error) is computed, and the gradient descent algorithm changes the weights of the network so as to minimize this error.
• Backpropagation: an extension of the gradient-based delta learning rule. After the error (the difference between the desired and actual output) is found, it is propagated backward from the output layer to the input layer via the hidden layers. It is used for multilayer neural networks.
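A minimal numerical sketch of gradient descent on a single linear neuron with a squared error; the data point, target, and learning rate are made up for illustration:

```python
import numpy as np

# Minimize E = 1/2 (t - y)^2 with y = w.x + b by following the
# negative gradient: delta_w = -eta * dE/dw = eta * (t - y) * x.
x = np.array([1.0, 2.0])    # one training example (hypothetical)
t = 1.0                     # its target output
w = np.zeros(2)
b = 0.0
eta = 0.1                   # learning rate
for _ in range(50):
    y = np.dot(w, x) + b
    err = t - y
    w += eta * err * x      # weight update
    b += eta * err          # bias update
print(w, b, np.dot(w, x) + b)   # the output converges toward the target 1.0
```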
Learning Data Sets in ANN
• Training set: a dataset of examples used for learning, that is, to fit the parameters (e.g., the weights) of, for example, a classifier. One epoch comprises one full training cycle over the training set.
• Validation set (development set): a set of examples used to tune the hyperparameters (e.g., the number of hidden units) of a classifier. The validation set should follow the same probability distribution as the training set.
• Test set: a set of examples used only to assess the performance (i.e., generalization) of a fully specified classifier. A much better fit on the training set than on the test set usually points to overfitting. The test set should follow the same probability distribution as the training set.
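A minimal sketch of carving one dataset into the three sets; the 70/15/15 split and the random data are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4))        # 100 examples, 4 features (made up)
idx = rng.permutation(len(data))        # shuffle so all sets share one distribution
n_train, n_val = 70, 15
train = data[idx[:n_train]]
val   = data[idx[n_train:n_train + n_val]]
test  = data[idx[n_train + n_val:]]
print(len(train), len(val), len(test))  # -> 70 15 15
```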
Why Study ANNs?
• Artificial Neural Networks are powerful computational
systems consisting of many simple processing elements
connected together to perform tasks analogously to
biological brains.
• They are massively parallel, which makes them efficient,
robust, fault tolerant and noise tolerant.
• They can learn from training data and generalize to new
situations.
• They are useful for brain modeling and real world
applications involving pattern recognition, function
approximation, prediction, …
Applications of ANNs
• Signal processing
• Pattern recognition, e.g. handwritten characters or face
identification.
• Diagnosis or mapping symptoms to a medical case.
• Speech recognition
• Human Emotion Detection
• Educational Loan Forecasting
• Computer Vision
• Deep Learning
History
• 1943 McCulloch-Pitts neurons
• 1949 Hebb’s law
• 1958 Perceptron (Rosenblatt)
• 1960 Adaline, better learning rule (Widrow, Hoff)
• 1969 Limitations (Minsky, Papert)
• 1972 Kohonen nets, associative memory
• 1977 Brain State in a Box (Anderson)
• 1982 Hopfield net, constraint satisfaction
• 1985 ART (Carpenter, Grossberg)
• 1986 Backpropagation (Rumelhart, Hinton, Williams)
• 1988 Neocognitron, character recognition (Fukushima)
The McCulloch-Pitts Neuron
The Artificial Neuron
$\text{Output} = f\left(\sum_{i=0}^{n} w_i x_i + b\right)$
The McCulloch-Pitts Neuron
• In the context of neural networks, a McCulloch-Pitts neuron is an artificial neuron that uses a step function as its activation function.
• It is also called a Threshold Logic Unit.
• Threshold step function:
$F(x) = \begin{cases} 0 & \text{for } x < T \\ 1 & \text{for } x \geq T \end{cases}$
The McCulloch-Pitts Neuron (cont.)
• In simple words, the output of the McCulloch-Pitts neuron equals 1 if $\sum_{i=0}^{n} w_i x_i + b \geq T$, otherwise the output equals zero:
$\text{output} = f\left(\sum_{i=0}^{n} w_i x_i + b\right)$
• where $T$ is a threshold value.
Example
• A McCulloch-Pitts neuron has 3 inputs (x1 = 1, x2 = 1, x3 = 1), the weights are (w1 = 1, w2 = −1, w3 = −1), and there is no bias. Find the output.
(The slide shows the three inputs X1, X2, X3 connected through weights w1, w2, w3 to a single threshold unit T.)
Sum = (1×1) + (1×(−1)) + (1×(−1)) + 0 = −1
The sum is below the threshold T, so output = 0.
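A sketch of the same computation in Python; the slide does not state the threshold explicitly, so T = 0 is assumed here, which reproduces the output 0:

```python
def mcculloch_pitts(x, w, b=0, T=0):
    """Step-threshold neuron: output 1 iff sum_i w_i*x_i + b >= T."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s >= T else 0

# The worked example: net input = 1 - 1 - 1 = -1 < T, so the output is 0.
print(mcculloch_pitts([1, 1, 1], [1, -1, -1]))   # -> 0
```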
Features of McCulloch-Pitts model
• Allows binary (0, 1) states only.
• Operates under a discrete-time assumption.
• Weights and the neurons’ thresholds are fixed in the model, and there is no interaction among network neurons (no learning).
• It is just a primitive model.
• We can use multiple layers of McCulloch-Pitts neurons to implement the basic logic gates. All we need to do is find the appropriate connection weights and neuron thresholds to produce the right outputs for each set of inputs.
Single-Input McCulloch-Pitts Neuron
For a single-input neuron (input x, weight w, threshold T, output z), checking the four combinations of T and w gives:

T | w  | x | z | Comment
0 | 1  | 0 | 1 | Always 1
0 | 1  | 1 | 1 |
0 | −1 | 0 | 1 | Works as an inverter
0 | −1 | 1 | 0 |
1 | 1  | 0 | 0 | Works as a buffer
1 | 1  | 1 | 1 |
1 | −1 | 0 | 0 | Always 0
1 | −1 | 1 | 0 |
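The table can be re-derived by enumerating the four (T, w) settings against both inputs, as in this small sketch:

```python
def mp(x, w, T):
    """Single-input McCulloch-Pitts unit: z = 1 iff w*x >= T."""
    return 1 if w * x >= T else 0

for T, w in [(0, 1), (0, -1), (1, 1), (1, -1)]:
    print(f"T={T} w={w:2d} -> z(0)={mp(0, w, T)} z(1)={mp(1, w, T)}")
# T=0,w=1: always 1; T=0,w=-1: inverter; T=1,w=1: buffer; T=1,w=-1: always 0
```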
Example: McCulloch-Pitts NOR
• Two layers of McCulloch-Pitts neurons implement a NOR gate. Inputs A and B feed a first neuron with weights 1 and 1 and threshold T = 1, which computes A OR B; its output feeds a second neuron with weight −1 and threshold T = 0, which inverts it, so z = A NOR B.
• Truth table (check it using the step function below):

A | B | Z
0 | 0 | 1
0 | 1 | 0
1 | 0 | 0
1 | 1 | 0

$F(x) = \begin{cases} 0 & \text{for } x < T \\ 1 & \text{for } x \geq T \end{cases}$
Example: McCulloch-Pitts NAND
• Two layers of McCulloch-Pitts neurons implement a NAND gate. Each input passes through its own inverter (weight −1, threshold T = 0); the two inverted signals then feed an output neuron with weights 1 and 1 and threshold T = 1, which computes (NOT A) OR (NOT B) = A NAND B.
• Truth table (check it):

A | B | Z
0 | 0 | 1
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
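Checking both truth tables programmatically, using the layer structures described above (a sketch; the helper names are made up):

```python
def mp(inputs, weights, T):
    """McCulloch-Pitts unit: fire (1) iff the weighted sum reaches threshold T."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= T else 0

def nor(a, b):
    # Layer 1: OR (weights 1,1, T=1); layer 2: inverter (weight -1, T=0).
    return mp([mp([a, b], [1, 1], T=1)], [-1], T=0)

def nand(a, b):
    # Layer 1: invert each input (weight -1, T=0); layer 2: OR them (T=1).
    return mp([mp([a], [-1], T=0), mp([b], [-1], T=0)], [1, 1], T=1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "NOR:", nor(a, b), "NAND:", nand(a, b))
# NOR is 1 only for (0,0); NAND is 0 only for (1,1) -- matching the tables.
```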
Activation Functions
• Assume the neuron computes the net input
  $S = \sum_{i=0}^{n} w_i x_i + b$
• S can be anything from −∞ to +∞; the neuron itself doesn’t know the bounds of this value. So how do we decide whether the neuron should fire or not (output = 1 or 0)?
• For this purpose we add “activation functions”: they check the S value produced by a neuron and decide whether outside connections should consider the neuron “fired” (or rather, “activated”) or not.
• The activation function serves as a threshold and is also called a “transfer function”.
Activation Functions (cont.)
• Activation functions can be divided into two basic types:
1. Linear Activation Function
2. Non-linear Activation Functions
(Unit Step, Sigmoid, Tanh, ReLU, Leaky ReLU, Softmax, …)
• In most cases, activation functions are non-linear; that is, the role of the activation function is to make the neural network non-linear.
Activation Functions (cont.)
• Many kinds of activation functions have been proposed over the years (over 640 different proposals).
• However, best practice confines use to only a limited set of activation functions.
• Next we will explore the most important and widely used activation functions.
• But the most important question is: “how do we know which one to use?”
• Answer: it depends on best practice and the nature of the problem.
Popular Activation Functions
• Linear or Identity
• Step Activation Function (previously explained)
• Sigmoid or Logistic Activation Function
• Tanh or hyperbolic tangent Activation Function
• ReLU (Rectified Linear Unit) Activation Function
• Leaky ReLU
• Softmax function
https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
Linear or Identity Activation Function
• The function is a line: f(x) = x.
• Therefore, the output of the function is not confined to any range.
Step Activation Function
• Used in the McCulloch-Pitts neuron:
$F(x) = \begin{cases} 0 & \text{for } x < T \\ 1 & \text{for } x \geq T \end{cases}$
• The hard limiter activation function is a special case of the step function (threshold at 0, outputs 0 or 1):
$F(x) = \text{hardlim}(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 & \text{for } x \geq 0 \end{cases}$
• The sign activation function is a special case of the step function (threshold at 0, outputs −1 or +1):
$F(x) = \text{sign}(x) = \begin{cases} -1 & \text{for } x < 0 \\ +1 & \text{for } x \geq 0 \end{cases}$
Sigmoid or Logistic Activation Function
• The sigmoid function curve looks like an S-shape.
• Its output lies between 0 and 1, so it is used in binary classifiers to predict a probability as the output:
$f(x) = \dfrac{1}{1 + e^{-x}}$
Tanh or hyperbolic tangent Activation Function
• The tanh function curve also looks like an S-shape.
• The range of the tanh function is from −1 to 1:
$f(x) = \tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
ReLU (Rectified Linear Unit) Activation Function
• ReLU is the most used activation function in the world right now.
• It maps any negative input to zero and passes non-negative inputs through unchanged:
$f(x) = \max(0, x)$
Leaky ReLU
• Leaky ReLUs allow a small, non-zero gradient when the unit is not active (negative values), using a small slope $\alpha$ (e.g., 0.01):
$f(x) = \begin{cases} x & \text{for } x \geq 0 \\ \alpha x & \text{for } x < 0 \end{cases}$
Softmax Activation Function
• The softmax function is a generalization of the sigmoid (logistic) function to several outputs:
$\text{softmax}(x)_i = \dfrac{e^{x_i}}{\sum_j e^{x_j}}$
• The softmax function is used in multiclass classification methods.
• It will be discussed later.
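To make the preceding definitions concrete, here is a minimal NumPy sketch of these activation functions (the leaky-ReLU slope 0.01 is a common default, not something fixed by the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))          # output in (0, 1)

def relu(x):
    return np.maximum(0.0, x)                # negatives become 0

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)    # small slope for negatives

def softmax(x):
    e = np.exp(x - np.max(x))                # shift for numerical stability
    return e / e.sum()                       # entries sum to 1

x = np.array([-2.0, -0.5, 0.0, 3.0])
print("sigmoid   ", sigmoid(x))
print("tanh      ", np.tanh(x))              # output in (-1, 1)
print("relu      ", relu(x))
print("leaky_relu", leaky_relu(x))
print("softmax   ", softmax(x), "sum =", softmax(x).sum())
```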