3. Perceptron
• Rosenblatt (1958) proposed the perceptron, a binary classification method
• Key idea:
• One weight per input
• Multiply each input by its respective weight, sum the products, and add a bias
• If the result is larger than a threshold, return 1; otherwise return 0
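This rule is small enough to state directly in code. Here is a minimal sketch in Python (the weights, bias, and inputs below are illustrative, not from the slides):

```python
import numpy as np

# A minimal sketch of the perceptron rule described above: multiply each
# input by its weight, sum the products, add a bias, and threshold.
def perceptron(x, w, b, threshold=0.0):
    z = np.dot(w, x) + b             # weighted sum plus bias
    return 1 if z > threshold else 0

# Illustrative values: 0.5*1.0 + (-0.2)*2.0 + 0.1 = 0.2 > 0 -> 1
print(perceptron(np.array([1.0, 2.0]), np.array([0.5, -0.2]), b=0.1))
```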
7. Example Problem
• Will I pass this course? Let's start with a simple two-feature model.
[Diagram: inputs 𝑥1 and 𝑥2 with weights 𝑤1 = 1 and 𝑤2 = 1 feed a summation Σ with threshold 𝜏 = 10, producing output 𝑦]
• Which variables of the model are known, and which are not?
• Learning objective: to find values of the unknown variables (the weights and the bias/threshold)
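To make the example concrete, here is a sketch of the diagram's model with its given values; the input numbers passed in below are made up for illustration:

```python
# The diagram's model: w1 = w2 = 1 and threshold tau = 10.
# The inputs used below are hypothetical.
def will_pass(x1, x2, w1=1, w2=1, tau=10):
    total = w1 * x1 + w2 * x2       # weighted sum of the two features
    return 1 if total > tau else 0  # 1 = pass, 0 = fail

print(will_pass(6, 5))  # 6 + 5 = 11 > 10  -> 1
print(will_pass(4, 3))  # 4 + 3 = 7 <= 10  -> 0
```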
14. Training a Perceptron
• Learning algorithm (sketched in code below):
• Initialize weights randomly
• Take one sample and predict
• For wrong predictions, update weights:
• If the correct output was 1 (we predicted 0), increase the weights
• If the correct output was 0 (we predicted 1), decrease the weights
• Repeat until no errors are made
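Here is one way that loop could look in Python, using the classic perceptron learning rule; the dataset (the AND function) and learning rate are assumptions for illustration:

```python
import numpy as np

# Toy dataset: the AND function (linearly separable, so training halts).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

rng = np.random.default_rng(0)
w = rng.normal(size=2)          # initialize weights randomly
b = 0.0
lr = 0.1                        # illustrative learning rate

converged = False
while not converged:            # repeat until no errors are made
    converged = True
    for xi, target in zip(X, y):
        pred = 1 if np.dot(w, xi) + b > 0 else 0
        if pred != target:      # wrong prediction: update weights
            # target 1 -> increase the weights; target 0 -> decrease them
            w += lr * (target - pred) * xi
            b += lr * (target - pred)
            converged = False

print(w, b)
```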
15. Training a Perceptron
But how does the algorithm know that a prediction is wrong?
The cost function (also called the loss function) tells it.
17. Training a Perceptron
What is a cost function? A common choice is the sum of squared errors over the m training samples:

L(W, b) = ∑ᵢ₌₁ᵐ (yᵢ − ŷᵢ)²

where yᵢ is the actual class and ŷᵢ is the predicted class of sample i.
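As a small sketch, this cost can be computed directly over a batch of predictions (the arrays below are made-up examples):

```python
import numpy as np

# Sum of squared errors over m samples: L = Σᵢ (yᵢ − ŷᵢ)²
def cost(y_actual, y_pred):
    return np.sum((y_actual - y_pred) ** 2)

y_actual = np.array([1, 0, 1, 1])  # actual classes y
y_pred   = np.array([1, 1, 0, 1])  # predicted classes y-hat
print(cost(y_actual, y_pred))      # two wrong predictions -> 2
```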
19. Learning Problem
• Problem: binary classification with a perceptron
• Loss function: L(W, b)
• Objective:
• Minimize L(W, b) with respect to W and b
• Algorithm:
• Start with some initial W, b
• Keep changing W, b to reduce L(W, b)
• Until we (hopefully) end up at a minimum
20. Training a Perceptron
By how much should the weights be increased or decreased?
The Gradient Descent algorithm (sketched below) tells us.
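As a minimal sketch (the learning-rate name alpha and its value are assumptions here), a single gradient descent update is:

```python
# One gradient descent step: move each weight a small step opposite
# the gradient of the loss, i.e. w_j := w_j - alpha * dL/dw_j.
def gd_step(w, grad, alpha=0.1):   # alpha is the learning rate
    return w - alpha * grad
```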
35. Gradient Descent Algorithm: Semantics of the Learning Rate
[Plots: loss L(𝑤ⱼ) versus weight 𝑤ⱼ, comparing a small and a large learning rate]
If the learning rate is small, gradient descent converges slowly. If the learning rate is large, gradient descent converges quickly but can overshoot the minimum or even diverge (see the sketch below).
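A toy sketch makes the trade-off visible: minimizing L(w) = (w − 3)² from w = 0 with different learning rates (all values are illustrative):

```python
# Minimize L(w) = (w - 3)^2, whose gradient is dL/dw = 2 * (w - 3).
def gradient_descent(lr, steps=20, w=0.0):
    for _ in range(steps):
        w -= lr * 2 * (w - 3)     # update: w := w - lr * dL/dw
    return w

print(gradient_descent(lr=0.01))  # small lr: slow, still far from 3
print(gradient_descent(lr=0.5))   # good lr: reaches the minimum at 3
print(gradient_descent(lr=1.1))   # too large: overshoots and diverges
```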
48. Nonlinearly Separable Problems and the Multi-Perceptron
[Diagram: two perceptrons, Perceptron #1 and Perceptron #2, whose outputs are combined]
Decision rule (sketched in code below):
if Σ of Perceptron #1 < 0 -> black
elif Σ of Perceptron #2 > 0 -> black
else -> white
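Here is a minimal sketch of that decision rule in Python; the weights and biases of the two perceptrons are hypothetical placeholders:

```python
# Combine two perceptrons' weighted sums as in the decision rule above.
def weighted_sum(x, w, b):
    return w[0] * x[0] + w[1] * x[1] + b

def classify(x, p1, p2):
    if weighted_sum(x, *p1) < 0:     # Σ of Perceptron #1 below 0
        return "black"
    elif weighted_sum(x, *p2) > 0:   # Σ of Perceptron #2 above 0
        return "black"
    else:
        return "white"

p1 = ([1.0, 1.0], -0.5)   # hypothetical (weights, bias) of Perceptron #1
p2 = ([1.0, 1.0], -1.5)   # hypothetical (weights, bias) of Perceptron #2
print(classify([0.0, 0.0], p1, p2))  # -0.5 < 0            -> black
print(classify([1.0, 0.0], p1, p2))  # 0.5 >= 0, -0.5 <= 0 -> white
print(classify([1.0, 1.0], p1, p2))  # 1.5 >= 0, 0.5 > 0   -> black
```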