3. Perceptron
• Rosenblatt (1958) proposed the perceptron, a binary classification method
• Key idea:
• One weight per input
• Multiply each input by its respective weight, sum the products, and add a bias
• If the result is larger than a threshold, return 1; otherwise return 0
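This rule is small enough to state directly in code. Here is a minimal sketch in Python (the weights, bias, and inputs below are illustrative, not from the slides):

```python
import numpy as np

# A minimal sketch of the perceptron rule described above: multiply each
# input by its weight, sum the products, add a bias, and threshold.
def perceptron(x, w, b, threshold=0.0):
    z = np.dot(w, x) + b             # weighted sum plus bias
    return 1 if z > threshold else 0

# Illustrative values: 0.5*1.0 + (-0.2)*2.0 + 0.1 = 0.2 > 0 -> 1
print(perceptron(np.array([1.0, 2.0]), np.array([0.5, -0.2]), b=0.1))
```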
7. Example Problem
• Will I pass this course? Let's start with a simple two-feature model.
[Diagram: inputs 𝑥1 and 𝑥2 with weights 𝑤1 = 1 and 𝑤2 = 1 feed a summation Σ with threshold 𝜏 = 10, producing output 𝑦]
• Which variables of the model are known, and which are not?
• Learning objective: to find values of the unknown variables (the weights and the bias/threshold)
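To make the example concrete, here is a sketch of the diagram's model with its given values; the input numbers passed in below are made up for illustration:

```python
# The diagram's model: w1 = w2 = 1 and threshold tau = 10.
# The inputs used below are hypothetical.
def will_pass(x1, x2, w1=1, w2=1, tau=10):
    total = w1 * x1 + w2 * x2       # weighted sum of the two features
    return 1 if total > tau else 0  # 1 = pass, 0 = fail

print(will_pass(6, 5))  # 6 + 5 = 11 > 10  -> 1
print(will_pass(4, 3))  # 4 + 3 = 7 <= 10  -> 0
```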
14. Training a Perceptron
• Learning algorithm (sketched in code below):
• Initialize weights randomly
• Take one sample and predict
• For wrong predictions, update weights:
• If the correct output was 1 (we predicted 0), increase the weights
• If the correct output was 0 (we predicted 1), decrease the weights
• Repeat until no errors are made
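Here is one way that loop could look in Python, using the classic perceptron learning rule; the dataset (the AND function) and learning rate are assumptions for illustration:

```python
import numpy as np

# Toy dataset: the AND function (linearly separable, so training halts).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

rng = np.random.default_rng(0)
w = rng.normal(size=2)          # initialize weights randomly
b = 0.0
lr = 0.1                        # illustrative learning rate

converged = False
while not converged:            # repeat until no errors are made
    converged = True
    for xi, target in zip(X, y):
        pred = 1 if np.dot(w, xi) + b > 0 else 0
        if pred != target:      # wrong prediction: update weights
            # target 1 -> increase the weights; target 0 -> decrease them
            w += lr * (target - pred) * xi
            b += lr * (target - pred)
            converged = False

print(w, b)
```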
15. Training a Perceptron
But how does the algorithm know that a prediction is wrong?
The cost function (also called the loss function) tells it.
17. Training a Perceptron
What is a cost function? A common choice is the sum of squared errors over the m training samples:

L(W, b) = ∑ᵢ₌₁ᵐ (yᵢ − ŷᵢ)²

where yᵢ is the actual class and ŷᵢ is the predicted class of sample i.
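As a small sketch, this cost can be computed directly over a batch of predictions (the arrays below are made-up examples):

```python
import numpy as np

# Sum of squared errors over m samples: L = Σᵢ (yᵢ − ŷᵢ)²
def cost(y_actual, y_pred):
    return np.sum((y_actual - y_pred) ** 2)

y_actual = np.array([1, 0, 1, 1])  # actual classes y
y_pred   = np.array([1, 1, 0, 1])  # predicted classes y-hat
print(cost(y_actual, y_pred))      # two wrong predictions -> 2
```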
19. Learning Problem
• Problem: binary classification with a perceptron
• Loss function: L(W, b)
• Objective:
• Minimize L(W, b) with respect to W and b
• Algorithm:
• Start with some initial W, b
• Keep changing W, b to reduce L(W, b)
• Until we (hopefully) end up at a minimum
20. Training a Perceptron
By how much should the weights be increased or decreased?
The Gradient Descent algorithm (sketched below) tells us.
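As a minimal sketch (the learning-rate name alpha and its value are assumptions here), a single gradient descent update is:

```python
# One gradient descent step: move each weight a small step opposite
# the gradient of the loss, i.e. w_j := w_j - alpha * dL/dw_j.
def gd_step(w, grad, alpha=0.1):   # alpha is the learning rate
    return w - alpha * grad
```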
35. Gradient Descent Algorithm: Semantics of the Learning Rate
[Plots: loss L(𝑤ⱼ) versus weight 𝑤ⱼ, comparing a small and a large learning rate]
If the learning rate is small, gradient descent converges slowly. If the learning rate is large, gradient descent converges quickly but can overshoot the minimum or even diverge (see the sketch below).
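A toy sketch makes the trade-off visible: minimizing L(w) = (w − 3)² from w = 0 with different learning rates (all values are illustrative):

```python
# Minimize L(w) = (w - 3)^2, whose gradient is dL/dw = 2 * (w - 3).
def gradient_descent(lr, steps=20, w=0.0):
    for _ in range(steps):
        w -= lr * 2 * (w - 3)     # update: w := w - lr * dL/dw
    return w

print(gradient_descent(lr=0.01))  # small lr: slow, still far from 3
print(gradient_descent(lr=0.5))   # good lr: reaches the minimum at 3
print(gradient_descent(lr=1.1))   # too large: overshoots and diverges
```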
48. Nonlinearly Separable Problems and the Multi-Perceptron
[Diagram: two perceptrons, Perceptron #1 and Perceptron #2, whose outputs are combined]
Decision rule (sketched in code below):
if Σ of Perceptron #1 < 0 -> black
elif Σ of Perceptron #2 > 0 -> black
else -> white
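Here is a minimal sketch of that decision rule in Python; the weights and biases of the two perceptrons are hypothetical placeholders:

```python
# Combine two perceptrons' weighted sums as in the decision rule above.
def weighted_sum(x, w, b):
    return w[0] * x[0] + w[1] * x[1] + b

def classify(x, p1, p2):
    if weighted_sum(x, *p1) < 0:     # Σ of Perceptron #1 below 0
        return "black"
    elif weighted_sum(x, *p2) > 0:   # Σ of Perceptron #2 above 0
        return "black"
    else:
        return "white"

p1 = ([1.0, 1.0], -0.5)   # hypothetical (weights, bias) of Perceptron #1
p2 = ([1.0, 1.0], -1.5)   # hypothetical (weights, bias) of Perceptron #2
print(classify([0.0, 0.0], p1, p2))  # -0.5 < 0            -> black
print(classify([1.0, 0.0], p1, p2))  # 0.5 >= 0, -0.5 <= 0 -> white
print(classify([1.0, 1.0], p1, p2))  # 1.5 >= 0, 0.5 > 0   -> black
```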