Neural Networks and
Fuzzy Systems
Fundamental Concepts of
Artificial Neural Networks (ANN)
Dr. Tamer Ahmed Farrag
Course No.: 803522-3
Course Outline
Part I : Neural Networks (11 weeks)
• Introduction to Machine Learning
• Fundamental Concepts of Artificial Neural Networks
(ANN)
• Single-layer Perceptron Classifier
• Multi-layer Feedforward Networks
• Single-layer Feedback Networks
• Unsupervised learning
Part II : Fuzzy Systems (4 weeks)
• Fuzzy set theory
• Fuzzy Systems
Artificial Neural Networks
The Human Brain
• The brain contains about $10^{10}$ basic units called neurons. Each neuron, in turn, is connected to about $10^4$ other neurons.
• A neuron is a small cell that receives electro-chemical signals from its various sources and, in turn, responds by transmitting electrical impulses to other neurons.
The Stability-Plasticity Dilemma
• Plasticity: to continue to adapt to new information
• Stability: to preserve the information previously
learned
• The NN needs to remain ‘plastic’ to significant or useful information but remain ‘stable’ when presented with irrelevant information. This is known as the stability-plasticity dilemma.
Training vs. Inference
• Training: acquiring knowledge
• Inference: solving a problem using the acquired
knowledge
Brain vs. Computer
Biological Neuron
• A biological neuron has three main components: dendrites, the soma (or cell body), and the axon.
• Dendrites receive signals from other neurons.
• The soma sums the incoming signals. When sufficient input is received, the cell fires; that is, it transmits a signal over its axon to other cells.
Neural network: Definition
• Neural network: information processing paradigm inspired by
biological nervous systems, such as our brain
• Structure: large number of highly interconnected processing
elements (neurons) working together
• Like people, they learn from experience (by example)
Artificial Neural Network: Definition
• The idea of ANN: NNs learn the relationship between cause and effect, or organize large volumes of data into orderly and informative patterns.
• Definition of ANN: “Data processing system consisting of a
large number of simple, highly interconnected processing
elements (artificial neurons) in an architecture inspired by the
structure of the cerebral cortex of the brain”
(Tsoukalas & Uhrig, 1997).
Artificial Neurons
• ANNs have been developed as generalizations of
mathematical models of neural biology, based on the
assumptions that:
1. Information processing occurs at many simple
elements called neurons.
2. Signals are passed between neurons over connection
links.
3. Each connection link has an associated weight, which, in a typical neural net, multiplies the signal transmitted.
4. Each neuron applies an activation function to its net
input to determine its output signal.
Analogy of ANN With Biological NN
The Artificial Neuron
$\text{Output} = f\left(\sum_{i=0}^{n} w_i x_i + b\right)$
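A minimal sketch of this computation in Python; the inputs, weights, bias, and step activation below are made-up values, just to exercise the formula:

```python
import numpy as np

def neuron_output(x, w, b, f):
    """One artificial neuron: apply activation f to the net input w.x + b."""
    return f(np.dot(w, x) + b)

# Hypothetical values for illustration.
step = lambda s: 1 if s >= 0 else 0      # a simple threshold activation
x = np.array([0.5, -1.0, 2.0])           # inputs
w = np.array([0.8, 0.2, -0.5])           # weights
b = 0.1                                  # bias
print(neuron_output(x, w, b, step))      # net input = -0.7 -> output 0
```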
Typical Architecture of ANNs
• A typical neural network contains a large number
of artificial neurons called units arranged in a
series of layers.
Typical Architecture of ANNs (cont.)
• Input layer: contains the units (artificial neurons) that receive input from the outside world, which the network will learn from, recognize, or otherwise process.
• Output layer: contains the units that give the network's response based on what it has learned.
• Hidden layer: these units sit between the input and output layers. The job of a hidden layer is to transform the input into something the output units can use.
 Most neural networks are fully connected, meaning each hidden neuron is connected to every neuron in the previous (input) layer and in the next (output) layer.
Popular ANNs Architectures (Sample)
http://www.asimovinstitute.org/neural-network-zoo/
Popular ANNs Architectures (cont.)
Single-layer perceptron: a neural network having two input units and one output unit, with no hidden layers.
Multilayer Perceptron: these networks use one or more hidden layers of neurons, unlike the single-layer perceptron. They are also known as deep feedforward neural networks.
Hopfield Network: a fully interconnected network in which each neuron is connected to every other neuron. The network is trained by setting the neurons to a desired input pattern and then computing the weights; the weights are not changed afterwards. Once trained on one or more patterns, the network converges to the learned patterns.
Popular ANNs Architectures (cont.)
Deep Learning Neural Network: a feedforward neural network with a large structure (many hidden layers and a large number of neurons in each layer), used for deep learning.
Recurrent Neural Network: a type of neural network in which hidden-layer neurons have self-connections, so the network possesses memory. At any instant, a hidden-layer neuron receives activation from the layer below as well as its own previous activation value.
Long Short-Term Memory Network (LSTM): a type of neural network in which a memory cell is incorporated inside the hidden-layer neurons.
Convolutional Neural Network: a class of deep, feedforward artificial neural networks that has been successfully applied to analyzing visual imagery.
How are ANNs used to solve problems?
• The problem variables are mainly: inputs, weights, and outputs.
• Examples (training data) represent a solved problem, i.e., both the inputs and the outputs are known.
• There are many different algorithms that can be used to train artificial neural networks, each with its own advantages and disadvantages.
• The learning process within an ANN consists of altering the network's weights and biases (thresholds) with some kind of learning algorithm.
• The objective is to find a set of weight matrices which, when applied to the network, maps any input to a correct output.
• For a new problem, we then have the inputs and the weights, so we can easily compute the outputs.
Learning Techniques in ANNs
• Supervised Learning: the training data is input to the network and the desired output is known; the weights and biases are adjusted until the output yields the desired values.
• Unsupervised Learning: the input data is used to train a network whose desired output is not known. The network groups (classifies) the input data, adjusting its weights by extracting features from the input.
• Reinforcement Learning: the correct output is unknown, but the network receives feedback on whether its output is right or wrong; it is a form of semi-supervised learning.
Learning Algorithms
• Hebbian: depends on input-output correlation,
  $w = \sum_{j=1}^{m} X_j Y_j^{T}$
• Gradient Descent: minimization of an error $E$,
  $\Delta w_{ij} = -\eta \, \dfrac{\partial E}{\partial w_{ij}}$
• Competitive Learning: only the output neuron with the highest input is updated (winner-take-all strategy).
• Stochastic Learning: weights are adjusted in a probabilistic fashion, as in simulated annealing.
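A small sketch of the Hebbian rule above in Python; the two input/output pattern pairs are hypothetical:

```python
import numpy as np

# Hebbian learning: the weights are the accumulated outer products of
# the paired patterns, w = sum_j X_j Y_j^T (pure input-output correlation).
X = np.array([[1, -1, 1],
              [-1, 1, 1]])            # two input patterns (made up)
Y = np.array([[1],
              [-1]])                  # the corresponding outputs
W = sum(np.outer(y, x) for x, y in zip(X, Y))
print(W)                              # -> [[2 -2 0]]
```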
Learning Algorithms
• Gradient Descent: the simplest training algorithm used with supervised training. When the actual output differs from the target output, the difference (the error) is computed, and the gradient descent algorithm changes the weights of the network so as to minimize this error.
• Backpropagation: an extension of the gradient-based delta learning rule. After the error (the difference between the desired and actual output) is found, it is propagated backward from the output layer to the input layer via the hidden layers. It is used for multilayer neural networks.
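A minimal numerical sketch of gradient descent on a single linear neuron with a squared error; the data point, target, and learning rate are made up for illustration:

```python
import numpy as np

# Minimize E = 1/2 (t - y)^2 with y = w.x + b by following the
# negative gradient: delta_w = -eta * dE/dw = eta * (t - y) * x.
x = np.array([1.0, 2.0])    # one training example (hypothetical)
t = 1.0                     # its target output
w = np.zeros(2)
b = 0.0
eta = 0.1                   # learning rate
for _ in range(50):
    y = np.dot(w, x) + b
    err = t - y
    w += eta * err * x      # weight update
    b += eta * err          # bias update
print(w, b, np.dot(w, x) + b)   # the output converges toward the target 1.0
```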
Learning Data Sets in ANN
• Training set: a dataset of examples used for learning, that is, to fit the parameters (e.g., the weights) of, for example, a classifier. One epoch comprises one full training cycle over the training set.
• Validation set (development set): a set of examples used to tune the hyperparameters (e.g., the number of hidden units) of a classifier. The validation set should follow the same probability distribution as the training set.
• Test set: a set of examples used only to assess the performance (i.e., generalization) of a fully specified classifier. A much better fit on the training set than on the test set usually points to overfitting. The test set should follow the same probability distribution as the training set.
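A minimal sketch of carving one dataset into the three sets; the 70/15/15 split and the random data are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4))        # 100 examples, 4 features (made up)
idx = rng.permutation(len(data))        # shuffle so all sets share one distribution
n_train, n_val = 70, 15
train = data[idx[:n_train]]
val   = data[idx[n_train:n_train + n_val]]
test  = data[idx[n_train + n_val:]]
print(len(train), len(val), len(test))  # -> 70 15 15
```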
Why Study ANNs?
• Artificial Neural Networks are powerful computational
systems consisting of many simple processing elements
connected together to perform tasks analogously to
biological brains.
• They are massively parallel, which makes them efficient,
robust, fault tolerant and noise tolerant.
• They can learn from training data and generalize to new
situations.
• They are useful for brain modeling and real world
applications involving pattern recognition, function
approximation, prediction, …
Applications of ANNs
• Signal processing
• Pattern recognition, e.g. handwritten characters or face
identification.
• Diagnosis or mapping symptoms to a medical case.
• Speech recognition
• Human Emotion Detection
• Educational Loan Forecasting
• Computer Vision
• Deep Learning
History
• 1943 McCulloch-Pitts neurons
• 1949 Hebb’s law
• 1958 Perceptron (Rosenblatt)
• 1960 Adaline, better learning rule (Widrow, Hoff)
• 1969 Limitations (Minsky, Papert)
• 1972 Kohonen nets, associative memory
• 1977 Brain State in a Box (Anderson)
• 1982 Hopfield net, constraint satisfaction
• 1985 ART (Carpenter, Grossberg)
• 1986 Backpropagation (Rumelhart, Hinton, Williams)
• 1988 Neocognitron, character recognition (Fukushima)
The McCulloch-Pitts Neuron
The Artificial Neuron
$\text{Output} = f\left(\sum_{i=0}^{n} w_i x_i + b\right)$
The McCulloch-Pitts Neuron
• In the context of neural networks, a McCulloch-Pitts neuron is an artificial neuron that uses a step function as its activation function.
• It is also called a Threshold Logic Unit.
• Threshold step function:
$F(x) = \begin{cases} 0 & \text{for } x < T \\ 1 & \text{for } x \geq T \end{cases}$
The McCulloch-Pitts Neuron (cont.)
• In simple words, the output of the McCulloch-Pitts neuron equals 1 if $\sum_{i=0}^{n} w_i x_i + b \geq T$, otherwise the output equals zero:
$\text{output} = f\left(\sum_{i=0}^{n} w_i x_i + b\right)$
• where $T$ is a threshold value.
Example
• A McCulloch-Pitts neuron has 3 inputs (x1 = 1, x2 = 1, x3 = 1), the weights are (w1 = 1, w2 = −1, w3 = −1), and there is no bias. Find the output.
(The slide shows the three inputs X1, X2, X3 connected through weights w1, w2, w3 to a single threshold unit T.)
Sum = (1×1) + (1×(−1)) + (1×(−1)) + 0 = −1
The sum is below the threshold T, so output = 0.
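A sketch of the same computation in Python; the slide does not state the threshold explicitly, so T = 0 is assumed here, which reproduces the output 0:

```python
def mcculloch_pitts(x, w, b=0, T=0):
    """Step-threshold neuron: output 1 iff sum_i w_i*x_i + b >= T."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s >= T else 0

# The worked example: net input = 1 - 1 - 1 = -1 < T, so the output is 0.
print(mcculloch_pitts([1, 1, 1], [1, -1, -1]))   # -> 0
```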
Features of McCulloch-Pitts model
• Allows binary (0, 1) states only.
• Operates under a discrete-time assumption.
• Weights and the neurons’ thresholds are fixed in the model, and there is no interaction among network neurons (no learning).
• It is just a primitive model.
• We can use multiple layers of McCulloch-Pitts neurons to implement the basic logic gates. All we need to do is find the appropriate connection weights and neuron thresholds to produce the right outputs for each set of inputs.
Single-Input McCulloch-Pitts Neuron
For a single-input neuron (input x, weight w, threshold T, output z), checking the four combinations of T and w gives:

T | w  | x | z | Comment
0 | 1  | 0 | 1 | Always 1
0 | 1  | 1 | 1 |
0 | −1 | 0 | 1 | Works as an inverter
0 | −1 | 1 | 0 |
1 | 1  | 0 | 0 | Works as a buffer
1 | 1  | 1 | 1 |
1 | −1 | 0 | 0 | Always 0
1 | −1 | 1 | 0 |
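The table can be re-derived by enumerating the four (T, w) settings against both inputs, as in this small sketch:

```python
def mp(x, w, T):
    """Single-input McCulloch-Pitts unit: z = 1 iff w*x >= T."""
    return 1 if w * x >= T else 0

for T, w in [(0, 1), (0, -1), (1, 1), (1, -1)]:
    print(f"T={T} w={w:2d} -> z(0)={mp(0, w, T)} z(1)={mp(1, w, T)}")
# T=0,w=1: always 1; T=0,w=-1: inverter; T=1,w=1: buffer; T=1,w=-1: always 0
```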
Example: McCulloch-Pitts NOR
• Two layers of McCulloch-Pitts neurons implement a NOR gate. Inputs A and B feed a first neuron with weights 1 and 1 and threshold T = 1, which computes A OR B; its output feeds a second neuron with weight −1 and threshold T = 0, which inverts it, so z = A NOR B.
• Truth table (check it using the step function below):

A | B | Z
0 | 0 | 1
0 | 1 | 0
1 | 0 | 0
1 | 1 | 0

$F(x) = \begin{cases} 0 & \text{for } x < T \\ 1 & \text{for } x \geq T \end{cases}$
Example: McCulloch-Pitts NAND
• Two layers of McCulloch-Pitts neurons implement a NAND gate. Each input passes through its own inverter (weight −1, threshold T = 0); the two inverted signals then feed an output neuron with weights 1 and 1 and threshold T = 1, which computes (NOT A) OR (NOT B) = A NAND B.
• Truth table (check it):

A | B | Z
0 | 0 | 1
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
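Checking both truth tables programmatically, using the layer structures described above (a sketch; the helper names are made up):

```python
def mp(inputs, weights, T):
    """McCulloch-Pitts unit: fire (1) iff the weighted sum reaches threshold T."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= T else 0

def nor(a, b):
    # Layer 1: OR (weights 1,1, T=1); layer 2: inverter (weight -1, T=0).
    return mp([mp([a, b], [1, 1], T=1)], [-1], T=0)

def nand(a, b):
    # Layer 1: invert each input (weight -1, T=0); layer 2: OR them (T=1).
    return mp([mp([a], [-1], T=0), mp([b], [-1], T=0)], [1, 1], T=1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "NOR:", nor(a, b), "NAND:", nand(a, b))
# NOR is 1 only for (0,0); NAND is 0 only for (1,1) -- matching the tables.
```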
Activation Functions
• Assume the neuron computes the net input
  $S = \sum_{i=0}^{n} w_i x_i + b$
• S can be anything from −∞ to +∞; the neuron itself doesn’t know the bounds of this value. So how do we decide whether the neuron should fire or not (output = 1 or 0)?
• For this purpose we add “activation functions”: they check the S value produced by a neuron and decide whether outside connections should consider the neuron “fired” (or rather, “activated”) or not.
• The activation function serves as a threshold and is also called a “transfer function”.
Activation Functions (cont.)
• Activation functions can be divided into two basic types:
1. Linear Activation Function
2. Non-linear Activation Functions
(Unit Step, Sigmoid, Tanh, ReLU, Leaky ReLU, Softmax, …)
• In most cases, activation functions are non-linear; that is, the role of the activation function is to make the neural network non-linear.
Activation Functions (cont.)
• Many kinds of activation functions have been proposed over the years (over 640 different proposals).
• However, best practice confines use to only a limited set of activation functions.
• Next we will explore the most important and widely used activation functions.
• But the most important question is: “how do we know which one to use?”
• Answer: it depends on best practice and the nature of the problem.
Popular Activation Functions
• Linear or Identity
• Step Activation Function (previously explained)
• Sigmoid or Logistic Activation Function
• Tanh or hyperbolic tangent Activation Function
• ReLU (Rectified Linear Unit) Activation Function
• Leaky ReLU
• Softmax function
https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
Linear or Identity Activation Function
• The function is a line: f(x) = x.
• Therefore, the output of the function is not confined to any range.
Step Activation Function
• Used in the McCulloch-Pitts neuron:
$F(x) = \begin{cases} 0 & \text{for } x < T \\ 1 & \text{for } x \geq T \end{cases}$
• The hard limiter activation function is a special case of the step function (threshold at 0, outputs 0 or 1):
$F(x) = \text{hardlim}(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 & \text{for } x \geq 0 \end{cases}$
• The sign activation function is a special case of the step function (threshold at 0, outputs −1 or +1):
$F(x) = \text{sign}(x) = \begin{cases} -1 & \text{for } x < 0 \\ +1 & \text{for } x \geq 0 \end{cases}$
Sigmoid or Logistic Activation Function
• The sigmoid function curve looks like an S-shape.
• Its output lies between 0 and 1, so it is used in binary classifiers to predict a probability as the output:
$f(x) = \dfrac{1}{1 + e^{-x}}$
Tanh or hyperbolic tangent Activation Function
• The tanh function curve also looks like an S-shape.
• The range of the tanh function is from −1 to 1:
$f(x) = \tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
ReLU (Rectified Linear Unit) Activation Function
• ReLU is the most used activation function in the world right now.
• It maps any negative input to zero and passes non-negative inputs through unchanged:
$f(x) = \max(0, x)$
Leaky ReLU
• Leaky ReLUs allow a small, non-zero gradient when the unit is not active (negative values), using a small slope $\alpha$ (e.g., 0.01):
$f(x) = \begin{cases} x & \text{for } x \geq 0 \\ \alpha x & \text{for } x < 0 \end{cases}$
Softmax Activation Function
• The softmax function is a generalization of the sigmoid (logistic) function to several outputs:
$\text{softmax}(x)_i = \dfrac{e^{x_i}}{\sum_j e^{x_j}}$
• The softmax function is used in multiclass classification methods.
• It will be discussed later.
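To make the preceding definitions concrete, here is a minimal NumPy sketch of these activation functions (the leaky-ReLU slope 0.01 is a common default, not something fixed by the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))          # output in (0, 1)

def relu(x):
    return np.maximum(0.0, x)                # negatives become 0

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)    # small slope for negatives

def softmax(x):
    e = np.exp(x - np.max(x))                # shift for numerical stability
    return e / e.sum()                       # entries sum to 1

x = np.array([-2.0, -0.5, 0.0, 3.0])
print("sigmoid   ", sigmoid(x))
print("tanh      ", np.tanh(x))              # output in (-1, 1)
print("relu      ", relu(x))
print("leaky_relu", leaky_relu(x))
print("softmax   ", softmax(x), "sum =", softmax(x).sum())
```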