CSCS3003 (Deep Learning): Lecture 2
McCulloch Pitts Neuron, Thresholding Logic, Perceptrons
Syed Sajid Hussain
Department of Computer Science and Engineering
UPES, Dehradun
Module 2.1: Biological Neurons
[Diagram: an artificial neuron with inputs x1, x2, x3, weights w1, w2, w3, an aggregation/activation σ, and output y]

Artificial Neuron

• The most fundamental unit of a deep neural network is called an artificial neuron
• Why is it called a neuron? Where does the inspiration come from?
• The inspiration comes from biology (more specifically, from the brain)
• biological neurons = neural cells = neural processing units
• We will first see what a biological neuron looks like ...
Biological Neurons∗

• dendrite: receives signals from other neurons
• synapse: point of connection to other neurons
• soma: processes the information
• axon: transmits the output of this neuron

∗ Image adapted from https://cdn.vectorstock.com/i/composite/12,25/neuron-cell-vector-81225.jpg
• Of course, in reality, it is not just a single neuron which does all this
• There is a massively parallel interconnected network of neurons
• The sense organs relay information to the lowest layer of neurons
• Some of these neurons may fire (in red) in response to this information and in turn relay information to other neurons they are connected to
• These neurons may also fire (again, in red) and the process continues, eventually resulting in a response (laughter in this case)
• An average human brain has around 10^11 (100 billion) neurons!
A simplified illustration

• This massively parallel network also ensures that there is division of work
• Each neuron may perform a certain role or respond to a certain stimulus
• The neurons in the brain are arranged in a hierarchy
Sample illustration of hierarchical processing∗

∗ Idea borrowed from Hugo Larochelle’s lecture slides
Disclaimer

• I understand very little about how the brain works!
• What you saw so far is an overly simplified explanation of how the brain works!
• But this explanation suffices for the purpose of this course!
Module 2.2: McCulloch Pitts Neuron
[Diagram: an MP neuron with boolean inputs x1, x2, ..., xn ∈ {0, 1}, an aggregation g, a decision function f, and output y ∈ {0, 1}]

• McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943)
• g aggregates the inputs and the function f takes a decision based on this aggregation
• The inputs can be excitatory or inhibitory
• y = 0 if any xi is inhibitory, else

  g(x1, x2, ..., xn) = g(x) = Σ_{i=1}^{n} x_i

  y = f(g(x)) = 1 if g(x) ≥ θ
              = 0 if g(x) < θ

• θ is called the thresholding parameter
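A minimal Python sketch of this MP neuron; passing excitatory and inhibitory inputs as two separate lists is an illustrative choice, not part of the original formulation:

```python
def mp_neuron(excitatory, inhibitory, theta):
    """McCulloch-Pitts unit: boolean inputs, fixed threshold theta.

    excitatory, inhibitory: lists of 0/1 inputs.
    Any active inhibitory input forces the output to 0.
    """
    if any(inhibitory):              # y = 0 if any inhibitory input is on
        return 0
    g = sum(excitatory)              # g aggregates the inputs
    return 1 if g >= theta else 0    # f thresholds the aggregation

# Example: three excitatory inputs, threshold 2
print(mp_neuron([1, 1, 0], [], theta=2))   # 1
print(mp_neuron([1, 0, 0], [], theta=2))   # 0
print(mp_neuron([1, 1, 1], [1], theta=2))  # 0 (inhibitory input active)
```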
Let us implement some boolean functions using this McCulloch Pitts (MP) neuron ...
A McCulloch Pitts unit: inputs x1, x2, x3, threshold θ, output y ∈ {0, 1}

• AND function: θ = 3 (y = 1 only if x1 + x2 + x3 ≥ 3)
• OR function: θ = 1 (y = 1 if x1 + x2 + x3 ≥ 1)
• x1 AND !x2∗: θ = 1, with x2 as an inhibitory input
• NOR function∗: θ = 0, with x1 and x2 as inhibitory inputs
• NOT function∗: θ = 0, with x1 as an inhibitory input

∗ A circle at the end of an input indicates an inhibitory input: if any inhibitory input is 1 the output will be 0
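Using the sketch above, these gates differ only in their thresholds and in which inputs are wired as inhibitory (a sketch reusing the illustrative mp_neuron helper defined earlier):

```python
def AND(x1, x2, x3):
    return mp_neuron([x1, x2, x3], [], theta=3)   # fires only when all inputs are 1

def OR(x1, x2, x3):
    return mp_neuron([x1, x2, x3], [], theta=1)   # fires when at least one input is 1

def AND_NOT(x1, x2):                               # x1 AND !x2
    return mp_neuron([x1], [x2], theta=1)          # x2 is inhibitory

def NOR(x1, x2):
    return mp_neuron([], [x1, x2], theta=0)        # both inputs inhibitory

def NOT(x1):
    return mp_neuron([], [x1], theta=0)            # x1 inhibitory; empty sum >= 0 fires

print(AND(1, 1, 1), OR(0, 0, 1), AND_NOT(1, 0), NOR(0, 0), NOT(0))  # 1 1 1 1 1
```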
• Can any boolean function be represented using a McCulloch Pitts unit?
• Before answering this question let us first see the geometric interpretation of an MP unit ...
OR function (θ = 1):  x1 + x2 = Σ_{i=1}^{2} x_i ≥ 1

[Plot: the four input points (0, 0), (0, 1), (1, 0), (1, 1) in the x1–x2 plane, with the decision line x1 + x2 = θ = 1]

• A single MP neuron splits the input points (4 points for 2 binary inputs) into two halves: points lying on or above the line Σ_{i=1}^{n} x_i − θ = 0 and points lying below this line
• In other words, all inputs which produce an output 0 will be on one side (Σ_{i=1}^{n} x_i < θ) of the line and all inputs which produce an output 1 will lie on the other side (Σ_{i=1}^{n} x_i ≥ θ) of this line
• Let us convince ourselves about this with a few more examples (if it is not already clear from the math)
AND function (θ = 2):  x1 + x2 = Σ_{i=1}^{2} x_i ≥ 2

[Plot: the four input points with the decision line x1 + x2 = θ = 2; only (1, 1) lies on or above the line]

Tautology (always ON, θ = 0):  x1 + x2 ≥ 0

[Plot: the decision line x1 + x2 = θ = 0 passes through (0, 0); all four input points lie on or above it]
OR function with three inputs (θ = 1)

• What if we have more than 2 inputs?
• Well, instead of a line we will have a plane
• For the OR function, we want a plane such that the point (0, 0, 0) lies on one side and the remaining 7 points lie on the other side of the plane

[Plot: the 8 corners of the unit cube in (x1, x2, x3) space, separated by the plane x1 + x2 + x3 = θ = 1]
The story so far ...

• A single McCulloch Pitts Neuron can be used to represent boolean functions which are linearly separable
• Linear separability (for boolean functions): There exists a line (plane) such that all inputs which produce a 1 lie on one side of the line (plane) and all inputs which produce a 0 lie on the other side of the line (plane)
Module 2.3: Perceptron
The story ahead ...

• What about non-boolean (say, real) inputs?
• Do we always need to hand code the threshold?
• Are all inputs equal? What if we want to assign more weight (importance) to some inputs?
• What about functions which are not linearly separable?
[Diagram: a perceptron with inputs x1, x2, ..., xn, weights w1, w2, ..., wn, and output y]

• Frank Rosenblatt, an American psychologist, proposed the classical perceptron model (1958)
• A more general computational model than McCulloch–Pitts neurons
• Main differences: Introduction of numerical weights for inputs and a mechanism for learning these weights
• Inputs are no longer limited to boolean values
• Refined and carefully analyzed by Minsky and Papert (1969) - their model is referred to as the perceptron model here
[Diagram: a perceptron with inputs x1, x2, ..., xn, weights w1, w2, ..., wn, bias input x0 = 1 with weight w0 = −θ, and output y]

y = 1 if Σ_{i=1}^{n} w_i ∗ x_i ≥ θ
  = 0 if Σ_{i=1}^{n} w_i ∗ x_i < θ

Rewriting the above,

y = 1 if Σ_{i=1}^{n} w_i ∗ x_i − θ ≥ 0
  = 0 if Σ_{i=1}^{n} w_i ∗ x_i − θ < 0

A more accepted convention,

y = 1 if Σ_{i=0}^{n} w_i ∗ x_i ≥ 0
  = 0 if Σ_{i=0}^{n} w_i ∗ x_i < 0

where x0 = 1 and w0 = −θ
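A minimal sketch of this perceptron decision rule in Python, using the x0 = 1, w0 = −θ convention (the function name and the example values are illustrative):

```python
def perceptron(x, w):
    """Perceptron decision: x and w include the bias term.

    x: inputs with x[0] = 1 prepended.
    w: weights with w[0] = -theta.
    Returns 1 if the weighted sum is >= 0, else 0.
    """
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= 0 else 0

# Example: two real-valued inputs, threshold theta = 1 (so w0 = -1)
x = [1, 0.7, 0.2]          # x0 = 1, x1 = 0.7, x2 = 0.2
w = [-1, 1.1, 1.1]         # w0 = -theta, w1, w2
print(perceptron(x, w))    # 0, since -1 + 0.77 + 0.22 = -0.01 < 0
```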
We will now try to answer the following questions:
• Why are we trying to implement boolean functions?
• Why do we need weights?
• Why is w0 = −θ called the bias?
[Diagram: a perceptron with inputs x1, x2, x3, weights w1, w2, w3, bias input x0 = 1 with weight w0 = −θ, and output y]

x1 = isActorDamon
x2 = isGenreThriller
x3 = isDirectorNolan

• Consider the task of predicting whether we would like a movie or not
• Suppose, we base our decision on 3 inputs (binary, for simplicity)
• Based on our past viewing experience (data), we may give a high weight to isDirectorNolan as compared to the other inputs
• Specifically, even if the actor is not Matt Damon and the genre is not thriller we would still want to cross the threshold θ by assigning a high weight to isDirectorNolan
• w0 is called the bias as it represents the prior (prejudice)
• A movie buff may have a very low threshold and may watch any movie irrespective of the genre, actor, director [θ = 0]
• On the other hand, a selective viewer may only watch thrillers starring Matt Damon and directed by Nolan [θ = 3]
• The weights (w1, w2, ..., wn) and the bias (w0) will depend on the data (viewer history in this case)
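A small sketch of how the threshold θ (equivalently, the bias w0 = −θ) changes the viewer's behaviour; the weights below are hypothetical, not learned from any data:

```python
def watch_movie(features, weights, theta):
    """features, weights: [isActorDamon, isGenreThriller, isDirectorNolan]."""
    s = sum(w * x for w, x in zip(weights, features))
    return 1 if s >= theta else 0

movie = [0, 0, 1]             # not Damon, not a thriller, but directed by Nolan
weights = [1, 1, 3]           # hypothetical: isDirectorNolan gets a high weight

print(watch_movie(movie, weights, theta=0))    # 1: the movie buff watches anything
print(watch_movie(movie, weights, theta=3))    # 1: the high Nolan weight alone crosses theta
print(watch_movie(movie, [1, 1, 1], theta=3))  # 0: with equal weights, one feature is not enough
```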
What kind of functions can be implemented using the perceptron? Any difference from McCulloch Pitts neurons?
McCulloch Pitts Neuron (assuming no inhibitory inputs)

y = 1 if Σ_{i=0}^{n} x_i ≥ 0
  = 0 if Σ_{i=0}^{n} x_i < 0

Perceptron

y = 1 if Σ_{i=0}^{n} w_i ∗ x_i ≥ 0
  = 0 if Σ_{i=0}^{n} w_i ∗ x_i < 0

• From the equations it should be clear that even a perceptron separates the input space into two halves
• All inputs which produce a 1 lie on one side and all inputs which produce a 0 lie on the other side
• In other words, a single perceptron can only be used to implement linearly separable functions
• Then what is the difference? The weights (including threshold) can be learned and the inputs can be real valued
• We will first revisit some boolean functions and then see the perceptron learning algorithm (for learning weights)
x1  x2  OR
0   0   0    w0 + Σ_{i=1}^{2} w_i x_i < 0
1   0   1    w0 + Σ_{i=1}^{2} w_i x_i ≥ 0
0   1   1    w0 + Σ_{i=1}^{2} w_i x_i ≥ 0
1   1   1    w0 + Σ_{i=1}^{2} w_i x_i ≥ 0

w0 + w1 · 0 + w2 · 0 < 0  =⇒  w0 < 0
w0 + w1 · 0 + w2 · 1 ≥ 0  =⇒  w2 ≥ −w0
w0 + w1 · 1 + w2 · 0 ≥ 0  =⇒  w1 ≥ −w0
w0 + w1 · 1 + w2 · 1 ≥ 0  =⇒  w1 + w2 ≥ −w0

• One possible solution to this set of inequalities is w0 = −1, w1 = 1.1, w2 = 1.1 (and various other solutions are possible)

[Plot: the four input points with the separating line −1 + 1.1x1 + 1.1x2 = 0]

• Note that we can come up with a similar set of inequalities and find the value of θ for a McCulloch Pitts neuron also (Try it!)
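A quick sketch checking that the stated solution w0 = −1, w1 = 1.1, w2 = 1.1 satisfies all four inequalities and hence implements OR:

```python
w0, w1, w2 = -1, 1.1, 1.1

for x1 in (0, 1):
    for x2 in (0, 1):
        s = w0 + w1 * x1 + w2 * x2
        y = 1 if s >= 0 else 0
        print((x1, x2), "->", y)   # matches x1 OR x2 for all four inputs
```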
• Let us fix the threshold (−w0 = 1) and try different values of w1, w2
• Say, w1 = −1, w2 = −1
• What is wrong with this line? We make errors on 3 out of the 4 inputs
• Let's try some more values of w1, w2 and note how many errors we make

w1     w2     errors
-1     -1     3
1.5    0      1
0.45   0.45   3

• We are interested in those values of w0, w1, w2 which result in 0 error
• Let us plot the error surface corresponding to different values of w0, w1, w2

[Plot: the four input points with the candidate lines −1 + 1.1x1 + 1.1x2 = 0, −1 + (−1)x1 + (−1)x2 = 0, −1 + (1.5)x1 + (0)x2 = 0, and −1 + (0.45)x1 + (0.45)x2 = 0]
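A sketch that reproduces the error counts in the table above, keeping −w0 = 1 fixed:

```python
OR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}

def count_errors(w0, w1, w2):
    errors = 0
    for (x1, x2), target in OR_TABLE.items():
        y = 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0
        errors += (y != target)
    return errors

for w1, w2 in [(-1, -1), (1.5, 0), (0.45, 0.45)]:
    print(w1, w2, count_errors(-1, w1, w2))   # 3, 1, 3 respectively
```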
• For ease of analysis, we will keep w0 fixed (−1) and plot the error for different values of w1, w2
• For a given w0, w1, w2 we will compute w0 + w1 ∗ x1 + w2 ∗ x2 for all combinations of (x1, x2) and note down how many errors we make
• For the OR function, an error occurs if (x1, x2) = (0, 0) but w0 + w1 ∗ x1 + w2 ∗ x2 ≥ 0, or if (x1, x2) ≠ (0, 0) but w0 + w1 ∗ x1 + w2 ∗ x2 < 0
• We are interested in finding an algorithm which finds the values of w0, w1, w2 which result in 0 error
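A sketch of that error computation over a grid of (w1, w2) values with w0 fixed at −1; plotting the resulting grid (e.g. with matplotlib, not shown here) gives the error surface the slide refers to:

```python
import numpy as np

OR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
w0 = -1.0

w1s = np.linspace(-2, 2, 41)
w2s = np.linspace(-2, 2, 41)
errors = np.zeros((len(w1s), len(w2s)), dtype=int)

for i, w1 in enumerate(w1s):
    for j, w2 in enumerate(w2s):
        for (x1, x2), target in OR_TABLE.items():
            y = 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0
            errors[i, j] += (y != target)

print(errors.min())   # 0: some (w1, w2) pairs implement OR with zero error
```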
• Let us reconsider our problem of deciding whether to watch a movie or not
• Suppose we are given a list of m movies and a label (class) associated with each movie indicating whether the user liked this movie or not: binary decision
• Further, suppose we represent each movie with n features (some boolean, some real valued)

x1 = isActorDamon
x2 = isGenreThriller
x3 = isDirectorNolan
x4 = imdbRating (scaled to 0 to 1)
... ...

• We will assume that the data is linearly separable and we want a perceptron to learn how to make this decision
• In other words, we want the perceptron to find the equation of this separating plane (or find the values of w0, w1, w2, ..., wn)
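A sketch of what such a dataset and decision look like in code; the feature values, labels, and weights below are made up purely for illustration (the learning of the weights comes later):

```python
# Each movie: [x0=1, isActorDamon, isGenreThriller, isDirectorNolan, imdbRating]
movies = [
    [1, 1, 1, 1, 0.88],   # liked (label 1)
    [1, 0, 0, 1, 0.81],   # liked (label 1)
    [1, 1, 0, 0, 0.45],   # not liked (label 0)
]
labels = [1, 1, 0]

# Hypothetical weights the perceptron is supposed to find (w[0] = -theta)
w = [-2.0, 0.5, 0.5, 1.5, 1.0]

def predict(x, w):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

print([predict(x, w) for x in movies], labels)   # predictions vs true labels
```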
More Related Content

PPTX
Lecture2-Introduction to Artificial Intelligence
PPTX
UNIt 1 DEEP learning introduction .pptx
PDF
MLIP - Chapter 2 - Preliminaries to deep learning
PDF
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
PDF
Lect1_Threshold_Logic_Unit lecture 1 - ANN
PPT
cs4811-ch11-neural-networks.ppt
PPT
Annintro
PPT
Artificial Neural Network seminar presentation using ppt.
Lecture2-Introduction to Artificial Intelligence
UNIt 1 DEEP learning introduction .pptx
MLIP - Chapter 2 - Preliminaries to deep learning
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
Lect1_Threshold_Logic_Unit lecture 1 - ANN
cs4811-ch11-neural-networks.ppt
Annintro
Artificial Neural Network seminar presentation using ppt.

Similar to Deep Learning Presentation that has been (20)

PPT
ML-NaiveBayes-NeuralNets-Clustering.ppt-
PPT
Annintro
PDF
Deep Learning for Computer Vision: Deep Networks (UPC 2016)
PDF
PDF
10-Perceptron.pdf
PPTX
Mc Culloch Pitts Neuron
PDF
Artificial Neural Network
PDF
Lecture 4 neural networks
PPT
Ann by rutul mehta
PDF
Lect аі 2 n net p2
PPT
Neural networks 1
PPTX
Counter propagation Network
PPT
annintro.ppt
PPTX
Artificial intelligence
PPTX
3 НЕЙРОННЫЕ СЕТИ КАК СОСТАВНЫЕ ЭЛЕМЕНТЫ ГЛУБОКОГО ОБУЧЕНИЯ.pptx
PPTX
SujanKhamrui_28100119050.pptx
PPT
Machine learning by using python lesson 2 Neural Networks By Professor Lili S...
PDF
Soft computing BY:- Dr. Rakesh Kumar Maurya
PPTX
neural-networks (1)
PPTX
THE HUMAN BRAIN AS THE NEURAL NETWORK.pptx
ML-NaiveBayes-NeuralNets-Clustering.ppt-
Annintro
Deep Learning for Computer Vision: Deep Networks (UPC 2016)
10-Perceptron.pdf
Mc Culloch Pitts Neuron
Artificial Neural Network
Lecture 4 neural networks
Ann by rutul mehta
Lect аі 2 n net p2
Neural networks 1
Counter propagation Network
annintro.ppt
Artificial intelligence
3 НЕЙРОННЫЕ СЕТИ КАК СОСТАВНЫЕ ЭЛЕМЕНТЫ ГЛУБОКОГО ОБУЧЕНИЯ.pptx
SujanKhamrui_28100119050.pptx
Machine learning by using python lesson 2 Neural Networks By Professor Lili S...
Soft computing BY:- Dr. Rakesh Kumar Maurya
neural-networks (1)
THE HUMAN BRAIN AS THE NEURAL NETWORK.pptx
Ad

Recently uploaded (20)

PDF
Supply Chain Operations Speaking Notes -ICLT Program
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PDF
GENETICS IN BIOLOGY IN SECONDARY LEVEL FORM 3
PPTX
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
PPTX
Pharma ospi slides which help in ospi learning
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
PDF
RMMM.pdf make it easy to upload and study
PDF
Complications of Minimal Access Surgery at WLH
PDF
01-Introduction-to-Information-Management.pdf
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PPTX
Lesson notes of climatology university.
PPTX
202450812 BayCHI UCSC-SV 20250812 v17.pptx
PDF
Yogi Goddess Pres Conference Studio Updates
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PPTX
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
PPTX
Microbial diseases, their pathogenesis and prophylaxis
Supply Chain Operations Speaking Notes -ICLT Program
Final Presentation General Medicine 03-08-2024.pptx
Pharmacology of Heart Failure /Pharmacotherapy of CHF
GENETICS IN BIOLOGY IN SECONDARY LEVEL FORM 3
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
Pharma ospi slides which help in ospi learning
human mycosis Human fungal infections are called human mycosis..pptx
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
RMMM.pdf make it easy to upload and study
Complications of Minimal Access Surgery at WLH
01-Introduction-to-Information-Management.pdf
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Lesson notes of climatology university.
202450812 BayCHI UCSC-SV 20250812 v17.pptx
Yogi Goddess Pres Conference Studio Updates
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
Abdominal Access Techniques with Prof. Dr. R K Mishra
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
Microbial diseases, their pathogenesis and prophylaxis
Ad

Deep Learning Presentation that has been

  • 1. CSCS3003 (Deep Learning) : Lecture 2 McCulloch Pitts Neuron, Thresholding Logic, Perceptrons Syed Sajid Hussain Department of Computer Science and Engineering UPES,Dehradun
  • 4. y σ w1 w2 w3 x1 x2 x3 Artificial Neuron 2 • The most fundamental unit of a deep neural network is called an artificial neuron
  • 5. σ y w1 w2 w3 x1 x2 x3 2 Artificial Neuron • The most fundamental unit of a deep neural network is called an artificial neuron • Why is it called a neuron ? Where does the inspiration come from ? • The inspiration comes from biology (more specifically, from the brain) • biological neurons = neural cells = neural processing units • W e will first see what a biological neuron looks like ...
  • 6. Biological Neurons∗ • dendrite: receives signals from other neurons • synapse: point of connection to other neurons • soma: processes the information • axon: transmits the output of this neuron ∗ Image adapted from https://cdn.vectorstock.com/i/composite/12,25/neuron-cell-vector-81225.jpg 3
  • 7. • Of course, in reality, it is not just a single neuron which does all this 5
  • 8. • Of course, in reality, it is not just a single neuron which does all this • There is a massively parallel interconnected net- work of neurons 5
  • 9. • Of course, in reality, it is not just a single neuron which does all this • There is a massively parallel interconnected net- work of neurons • The sense organs relay information to the lowest layer of neurons 5
  • 10. • Of course, in reality, it is not just a single neuron which does all this • There is a massively parallel interconnected net- work of neurons • The sense organs relay information to the lowest layer of neurons • Some of these neurons may fire (in red) in re- sponse to this information and in turn relay inform- ation to other neurons they are connected to 5
  • 11. • Of course, in reality, it is not just a single neuron which does all this • There is a massively parallel interconnected net- work of neurons • The sense organs relay information to the lowest layer of neurons • Some of these neurons may fire (in red) in re- sponse to this information and in turn relay inform- ation to other neurons they are connected to • These neurons may also fire (again, in red) and the process continues 5
  • 12. • Of course, in reality, it is not just a single neuron which does all this • There is a massively parallel interconnected net- work of neurons • The sense organs relay information to the lowest layer of neurons • Some of these neurons may fire (in red) in re- sponse to this information and in turn relay inform- ation to other neurons they are connected to • These neurons may also fire (again, in red) and the process continues eventually resulting in a re- sponse (laughter in this case) 5
  • 13. • Of course, in reality, it is not just a single neuron which does all this • There is a massively parallel interconnected net- work of neurons • The sense organs relay information to the lowest layer of neurons • Some of these neurons may fire (in red) in re- sponse to this information and in turn relay inform- ation to other neurons they are connected to • These neurons may also fire (again, in red) and the process continues eventually resulting in a re- sponse (laughter in this case) • An average human brain has around 1011 (100 bil- lion) neurons! 5
  • 14. • This massively parallel network also ensures that there is division of work 6
  • 15. • This massively parallel network also ensures that there is division of work • Each neuron may perform a certain role or respond to a certain stimulus 6
  • 16. • This massively parallel network also ensures that there is division of work • Each neuron may perform a certain role or respond to a certain stimulus 6 A simplified illustration
  • 17. • This massively parallel network also ensures that there is division of work • Each neuron may perform a certain role or respond to a certain stimulus 6 A simplified illustration
  • 18. • This massively parallel network also ensures that there is division of work • Each neuron may perform a certain role or respond to a certain stimulus 6 A simplified illustration
  • 19. • This massively parallel network also ensures that there is division of work • Each neuron may perform a certain role or respond to a certain stimulus 6 A simplified illustration
  • 20. A simplified illustration 6 • This massively parallel network also ensures that there is division of work • Each neuron may perform a certain role or respond to a certain stimulus
  • 21. • The neurons in the brain are arranged in a hierarchy 7
  • 22. Sample illustration of hierarchical processing∗ ∗ Idea borrowed from Hugo Larochelle’s lecture slides 8
  • 23. Disclaimer 9 • I understand very little about how the brain works! • What you saw so far is an overly simplified explanation of how the brain works! • But this explanation suffices for the purpose of this course!
  • 24. Module 2.2: McCulloch Pitts Neuron 10
  • 25. x1 x2 .. .. xn • McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943) 11
  • 26. x1 x2 .. .. xn ∈ {0, 1} • McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943) 11
  • 27. x1 x2 .. .. xn ∈ {0, 1} g • McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943) • g aggregates the inputs 11
  • 28. x1 x2 .. .. xn ∈ {0, 1} g f • McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943) • g aggregates the inputs and the function f takes a decision based on this aggregation 11
  • 29. x1 x2 .. .. xn ∈ {0, 1} y ∈ {0, 1} g f • McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943) • g aggregates the inputs and the function f takes a decision based on this aggregation 11
  • 30. x1 x2 .. .. xn ∈ {0, 1} y ∈ {0, 1} g f • McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943) • g aggregates the inputs and the function f takes a decision based on this aggregation • The inputs can be excitatory or inhibitory 11
  • 31. x1 x2 .. .. xn ∈ {0, 1} y ∈ {0, 1} g f • McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943) • g aggregates the inputs and the function f takes a decision based on this aggregation • The inputs can be excitatory or inhibitory • y = 0 if any xi is inhibitory,else 11
  • 32. x1 x2 .. .. xn ∈ {0, 1} y ∈ {0, 1} g f • McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943) • g aggregates the inputs and the function f takes a decision based on this aggregation • The inputs can be excitatory or inhibitory • y = 0 if any xi is inhibitory,else 11 1 2 n n L i=1 g(x , x , ..., x ) = g(x) = xi
  • 33. x1 x2 .. .. xn ∈ {0, 1} y ∈ {0, 1} g f • McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943) • g aggregates the inputs and the function f takes a decision based on this aggregation • The inputs can be excitatory or inhibitory • y = 0 if any xi is inhibitory,else 11 1 2 n n L g(x , x , ..., x ) = g(x) = xi i=1 y = f (g(x)) = 1 if g(x) ≥ θ
  • 34. x1 x2 .. .. xn ∈ {0, 1} y ∈ {0, 1} g f • McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943) • g aggregates the inputs and the function f takes a decision based on this aggregation • The inputs can be excitatory or inhibitory • y = 0 if any xi is inhibitory,else 11 1 2 n n ∑ g(x , x , ..., x ) = g(x) = xi y = f (g(x)) = 1 if = 0 if i=1 g(x) ≥ θ g(x) < θ
  • 35. x1 x2 .. .. xn ∈ {0, 1} y ∈ {0, 1} g f • McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943) • g aggregates the inputs and the function f takes a decision based on this aggregation • The inputs can be excitatory or inhibitory • y = 0 if any xi is inhibitory,else 11 1 2 n n L g(x , x , ..., x ) = g(x) = xi i=1 g(x) ≥ θ y = f (g(x)) = 1 if = 0 if g(x) < θ • θ is called the thresholding parameter
  • 36. x1 x2 .. .. xn ∈ {0, 1} y ∈ {0, 1} g f • McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943) • g aggregates the inputs and the function f takes a decision based on this aggregation • The inputs can be excitatory or inhibitory • y = 0 if any xi is inhibitory,else 1 2 n n L g(x , x , ..., x ) = g(x) = xi i=1 g(x) ≥ θ y = f (g(x)) = 1 if = 0 if g(x) < θ • θ is called the thresholding parameter 11
  • 37. Let us implement some boolean functions using this McCulloch Pitts (MP) neuron ... 12
  • 38. y ∈ {0, 1} θ x1 x2 x3 A McCulloch Pitts unit 13
  • 39. y ∈ {0, 1} θ x1 x2 x3 A McCulloch Pitts unit y ∈ {0, 1} 13 x1 x2 x3 AND function
  • 40. y ∈ {0, 1} θ x1 x2 x3 A McCulloch Pitts unit y ∈ {0, 1} 3 x1 x2 x3 AND function 13
  • 41. y ∈ {0, 1} θ x1 x2 x3 A McCulloch Pitts unit y ∈ {0, 1} 3 x1 x2 x3 AND function y ∈ {0, 1} 13 x1 x2 x3 OR function
  • 42. y ∈ {0, 1} θ x1 x2 x3 A McCulloch Pitts unit y ∈ {0, 1} 3 x1 x2 x3 AND function y ∈ {0, 1} 1 x1 x2 x3 OR function 13
  • 43. y ∈ {0, 1} θ x1 x2 x3 A McCulloch Pitts unit y ∈ {0, 1} y ∈ {0, 1} 3 x1 x2 x3 AND function y ∈ {0, 1} 1 x1 x2 x3 OR function x1 x2 x1 AND !x2 ∗ ∗ circle at the end indicates inhibitory input: if any inhibitory input is 1 the output will be 0 13
  • 44. y ∈ {0, 1} θ x1 x2 x3 A McCulloch Pitts unit y ∈ {0, 1} 1 y ∈ {0, 1} 3 x1 x2 x3 AND function y ∈ {0, 1} 1 x1 x2 x3 OR function x1 x2 x1 AND !x2 ∗ ∗ circle at the end indicates inhibitory input: if any inhibitory input is 1 the output will be 0 13
  • 45. y ∈ {0, 1} θ x1 x2 x3 A McCulloch Pitts unit y ∈ {0, 1} y ∈ {0, 1} 3 x1 x2 x3 AND function y ∈ {0, 1} y ∈ {0, 1} 1 x1 x2 x3 OR function 1 x1 x2 x1 x2 x1 AND !x2 ∗ NOR function ∗ circle at the end indicates inhibitory input: if any inhibitory input is 1 the output will be 0 13
  • 46. y ∈ {0, 1} θ x1 x2 x3 A McCulloch Pitts unit y ∈ {0, 1} y ∈ {0, 1} 3 x1 x2 x3 AND function y ∈ {0, 1} y ∈ {0, 1} 1 x1 x2 x3 OR function 1 0 x1 x2 x1 x2 x1 AND !x2 ∗ NOR function ∗ circle at the end indicates inhibitory input: if any inhibitory input is 1 the output will be 0 13
  • 47. y ∈ {0, 1} θ x1 x2 x3 A McCulloch Pitts unit y ∈ {0, 1} y ∈ {0, 1} 3 x1 x2 x3 AND function y ∈ {0, 1} y ∈ {0, 1} 1 x1 x2 x3 OR function y ∈ {0, 1} 1 0 x1 x2 x1 x2 x1 NO T function x1 AND !x2 ∗ NOR function ∗ circle at the end indicates inhibitory input: if any inhibitory input is 1 the output will be 0 13
  • 48. y ∈ {0, 1} θ x1 x2 x3 A McCulloch Pitts unit y ∈ {0, 1} y ∈ {0, 1} 3 x1 x2 x3 AND function y ∈ {0, 1} y ∈ {0, 1} 1 x1 x2 x3 OR function y ∈ {0, 1} 1 0 0 x1 x2 x1 x2 x1 NO T function x1 AND !x2 ∗ NOR function ∗ circle at the end indicates inhibitory input: if any inhibitory input is 1 the output will be 0 13
  • 49. • Can any boolean function be represented using a McCulloch Pitts unit ? 14
  • 50. • Can any boolean function be represented using a McCulloch Pitts unit ? • Before answering this question let us first see the geometric interpretation of a MP unit ... 14
  • 51. y ∈ {0, 1} 1 x1 x2 OR function 15 1 2 x + x = L 2 i=1 xi ≥ 1
  • 52. y ∈ {0, 1} 1 x1 x2 OR function 1 2 x + x = L 2 i=1 xi ≥ 1 x2 (0, 1) (1, 1) x1 (0, 0) (1, 0) 15
  • 53. y ∈ {0, 1} 1 x1 x2 OR function 1 2 x + x = L 2 i=1 xi ≥ 1 x2 (0, 1) (1, 1) x1 + x2 = θ = 1 x1 (0, 0) (1, 0) 15
  • 54. y ∈ {0, 1} 1 x1 x2 OR function 1 2 x + x = L 2 i=1 xi ≥ 1 x2 (0, 1) (1, 1) x1 + x2 = θ = 1 • A single MP neuron splits the input points (4 points for 2 binary inputs) into two halves x1 (0, 0) (1, 0) 15
  • 55. x1 x2 y ∈ {0, 1} 1 OR function 1 2 x + x = L 2 i=1 xi ≥ 1 x2 (0, 1) (1, 1) x1 + x2 = θ = 1 • A single MP neuron splits the input points (4 points for 2 binary inputs) into two halves x1 (0, 0) (1, 0) 15 L n i=1 xi − θ = 0 • Points lying on or above the line and points lying below this line
  • 56. x1 x2 y ∈ {0, 1} 1 OR function 1 2 x + x = L 2 i=1 xi ≥ 1 x2 (0, 1) (1, 1) x1 + x2 = θ = 1 • A single MP neuron splits the input points (4 points for 2 binary inputs) into two halves x1 (0, 0) (1, 0) 15 L n i=1 xi − θ = 0 • Points lying on or above the line and points lying below this line • In other words, all inputs which produce an output 0 will be on one side ( L n i=1 xi < θ) of the line and all inputs which produce an output 1 will lie on the other side ( L n i=1 xi ≥ θ) of this line
  • 57. x1 x2 y ∈ {0, 1} 1 OR function 1 2 x + x = L 2 i=1 xi ≥ 1 x2 (0, 1) (1, 1) x1 + x2 = θ = 1 • A single MP neuron splits the input points (4 points for 2 binary inputs) into two halves x1 (0, 0) (1, 0) 15 L n i=1 xi − θ = 0 • Points lying on or above the line and points lying below this line • In other words, all inputs which produce an output 0 will be on one side ( L n i=1 xi < θ) of the line and all inputs which produce an output 1 will lie on the other side ( L n i=1 xi ≥ θ) of this line • Let us convince ourselves about this with a few more examples (if it is not already clear from the math)
  • 58. y ∈ {0, 1} 2 x1 x2 AND function 16 1 2 x + x = L 2 i=1 xi ≥ 2
  • 59. y ∈ {0, 1} 2 x1 x2 AND function 1 2 x + x = L 2 i=1 xi ≥ 2 x2 (0, 1) (1, 1) x1 (0, 0) (1, 0) 16
  • 60. y ∈ {0, 1} 2 x1 x2 AND function 1 2 x + x = L 2 i=1 xi ≥ 2 x2 (0, 1) (1, 1) x1 + x2 = θ = 2 x1 (0, 0) (1, 0) 16
  • 61. x1 x2 y ∈ {0, 1} 2 AND function 1 2 x + x = L 2 i=1 xi ≥ 2 x2 (0, 1) (1, 1) x1 + x2 = θ = 2 y ∈ {0, 1} x1 (0, 0) (1, 0) 16 x1 x2 Tautology (always ON)
  • 62. x1 x2 y ∈ {0, 1} 2 AND function 1 2 x + x = L 2 i=1 xi ≥ 2 x2 (0, 1) (1, 1) x1 + x2 = θ = 2 y ∈ {0, 1} 0 x1 x2 Tautology (always ON) x1 (0, 0) (1, 0) 16
  • 63. x1 x2 y ∈ {0, 1} 2 AND function 1 2 x + x = L 2 i=1 xi ≥ 2 x2 (0, 1) (1, 1) y ∈ {0, 1} 0 x1 x2 Tautology (always ON) x2 (0, 1) (1, 1) x1 + x2 = θ = 2 x1 (0, 0) (1, 0) x1 (0, 0) (1, 0) 16
  • 64. x1 x2 y ∈ {0, 1} 2 AND function 1 2 x + x = L 2 i=1 xi ≥ 2 x2 (0, 1) (1, 1) y ∈ {0, 1} 0 x1 x2 Tautology (always ON) x2 (0, 1) (1, 1) x1 + x2 = θ = 0 x1 + x2 = θ = 2 x1 (0, 0) (1, 0) x1 (0, 0) (1, 0) 16
  • 65. x1 x2 x3 y ∈ {0, 1} O R 1 • What if we have more than 2 inputs? 17
  • 66. y ∈ {0, 1} O R 1 x1 x2 x3 x2 (0, 0, 0) (0, 1, 0) (1, 0, 0) x1 (1, 1, 0) (0, 1, 1) (1, 1, 1) • What if we have more than 2 inputs? (0, 0, 1) (1, 0, 1) x3 17
  • 67. y ∈ {0, 1} O R 1 x1 x2 x3 x2 (0, 0, 0) (0, 1, 0) (1, 0, 0) x1 (1, 1, 0) (0, 1, 1) (1, 1, 1) • What if we have more than 2 inputs? • Well, instead of a line we will have a plane (0, 0, 1) (1, 0, 1) x3 17
  • 68. x1 x2 x3 y ∈ {0, 1} O R 1 x2 (0, 0, 0) (0, 1, 0) (1, 0, 0) x1 (1, 1, 0) (0, 1, 1) (1, 1, 1) • What if we have more than 2 inputs? • Well, instead of a line we will have a plane • For the OR function, we want a plane such that the point (0,0,0) lies on one side and the remaining 7 points lie on the other side of the plane (0, 0, 1) (1, 0, 1) x3 17
  • 69. y ∈ {0, 1} O R 1 (0, 0, 0) x1 x2 x3 x2 (0, 1, 0) (1, 0, 0) x1 (1, 1, 0) (0, 1, 1) (1, 1, 1)x1 + x2 + x3 = θ = 1 • What if we have more than 2 inputs? • Well, instead of a line we will have a plane • For the OR function, we want a plane such that the point (0,0,0) lies on one side and the remaining 7 points lie on the other side of the plane (0, 0, 1) (1, 0, 1) x3 17
  • 70. The story so far ... 18 • A single McCulloch Pitts Neuron can be used to represent boolean functions which are linearly separable
  • 71. The story so far ... 18 • A single McCulloch Pitts Neuron can be used to represent boolean functions which are linearly separable • Linear separability (for boolean functions) : There exists a line (plane) such that all in- puts which produce a 1 lie on one side of the line (plane) and all inputs which produce a 0 lie on other side of the line (plane)
  • 73–76. The story ahead ... What about non-boolean (say, real) inputs? • Do we always need to hand-code the threshold? • Are all inputs equal? What if we want to assign more weight (importance) to some inputs? • What about functions which are not linearly separable?
  • 77–82. Frank Rosenblatt, an American psychologist, proposed the classical perceptron model (1958) [figure: a neuron y with inputs x1, x2, ..., xn and weights w1, w2, ..., wn] • A more general computational model than McCulloch–Pitts neurons • Main differences: introduction of numerical weights for inputs and a mechanism for learning these weights • Inputs are no longer limited to boolean values • Refined and carefully analyzed by Minsky and Papert (1969); their model is referred to as the perceptron model here
  • 83–93. [figure: perceptron with inputs x0 = 1, x1, x2, ..., xn, weights w0 = −θ, w1, w2, ..., wn, and output y]
  y = 1 if ∑_{i=1}^{n} wi ∗ xi ≥ θ
    = 0 if ∑_{i=1}^{n} wi ∗ xi < θ
  Rewriting the above,
  y = 1 if ∑_{i=1}^{n} wi ∗ xi − θ ≥ 0
    = 0 if ∑_{i=1}^{n} wi ∗ xi − θ < 0
  A more accepted convention,
  y = 1 if ∑_{i=0}^{n} wi ∗ xi ≥ 0
    = 0 if ∑_{i=0}^{n} wi ∗ xi < 0
  where x0 = 1 and w0 = −θ
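  This convention is easy to mirror in code. Below is a minimal sketch (function and variable names are illustrative, not from the slides) that folds the threshold into a bias weight w0 = −θ attached to a constant input x0 = 1:

      def perceptron_output(w, x, theta):
          # Prepend x0 = 1 and w0 = -theta, then fire iff the weighted sum is >= 0.
          w = [-theta] + list(w)
          x = [1] + list(x)
          total = sum(wi * xi for wi, xi in zip(w, x))
          return 1 if total >= 0 else 0

      # Equivalent to checking w1*x1 + w2*x2 >= theta directly:
      print(perceptron_output([1.1, 1.1], [0, 1], theta=1))   # prints 1, since 1.1 >= 1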
  • 94. We will now try to answer the following questions: • Why are we trying to implement boolean functions? • Why do we need weights? • Why is w0 = −θ called the bias?
  • 95–98. Consider the task of predicting whether we would like a movie or not [figure: perceptron with inputs x0 = 1, x1, x2, x3, weights w0 = −θ, w1, w2, w3, and output y, where x1 = isActorDamon, x2 = isGenreThriller, x3 = isDirectorNolan] • Suppose we base our decision on 3 inputs (binary, for simplicity) • Based on our past viewing experience (data), we may give a high weight to isDirectorNolan as compared to the other inputs • Specifically, even if the actor is not Matt Damon and the genre is not thriller, we would still want to cross the threshold θ by assigning a high weight to isDirectorNolan
  • 99–102. w0 is called the bias as it represents the prior (prejudice) • A movie buff may have a very low threshold and may watch any movie irrespective of the genre, actor, or director [θ = 0] • On the other hand, a selective viewer may only watch thrillers starring Matt Damon and directed by Nolan [θ = 3] • The weights (w1, w2, ..., wn) and the bias (w0) will depend on the data (viewer history in this case)
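  To make the role of the bias concrete, here is a hedged sketch (the helper name and the all-ones weights are illustrative assumptions, not values from the slides): keeping the weights fixed and changing only θ turns the same neuron from a movie buff into a selective viewer.

      def likes_movie(x, w, theta):
          # x = (isActorDamon, isGenreThriller, isDirectorNolan), binary inputs.
          return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

      w = (1, 1, 1)          # illustrative: all inputs weighted equally
      movie = (0, 1, 1)      # not Damon, is a thriller, directed by Nolan

      print(likes_movie(movie, w, theta=0))   # movie buff: watches anything -> 1
      print(likes_movie(movie, w, theta=3))   # selective viewer: needs all three -> 0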
  • 103. What kind of functions can be implemented using the perceptron? Any difference from McCulloch–Pitts neurons?
  • 104–110. McCulloch–Pitts neuron (assuming no inhibitory inputs):
  y = 1 if ∑_{i=0}^{n} xi ≥ 0
    = 0 if ∑_{i=0}^{n} xi < 0
  Perceptron:
  y = 1 if ∑_{i=0}^{n} wi ∗ xi ≥ 0
    = 0 if ∑_{i=0}^{n} wi ∗ xi < 0
  • From the equations it should be clear that even a perceptron separates the input space into two halves • All inputs which produce a 1 lie on one side and all inputs which produce a 0 lie on the other side • In other words, a single perceptron can only be used to implement linearly separable functions • Then what is the difference? The weights (including the threshold) can be learned and the inputs can be real valued • We will first revisit some boolean functions and then see the perceptron learning algorithm (for learning the weights)
  • 112–127. Setting up the perceptron conditions for the OR function:
  x1  x2  OR
  0   0   0    w0 + ∑_{i=1}^{2} wi xi < 0
  1   0   1    w0 + ∑_{i=1}^{2} wi xi ≥ 0
  0   1   1    w0 + ∑_{i=1}^{2} wi xi ≥ 0
  1   1   1    w0 + ∑_{i=1}^{2} wi xi ≥ 0
  w0 + w1 · 0 + w2 · 0 < 0 =⇒ w0 < 0
  w0 + w1 · 0 + w2 · 1 ≥ 0 =⇒ w2 ≥ −w0
  w0 + w1 · 1 + w2 · 0 ≥ 0 =⇒ w1 ≥ −w0
  w0 + w1 · 1 + w2 · 1 ≥ 0 =⇒ w1 + w2 ≥ −w0
  • One possible solution to this set of inequalities is w0 = −1, w1 = 1.1, w2 = 1.1 (and various other solutions are possible) [figure: the four inputs (0, 0), (0, 1), (1, 0), (1, 1) with the separating line −1 + 1.1x1 + 1.1x2 = 0] • Note that we can come up with a similar set of inequalities and find the value of θ for a McCulloch–Pitts neuron also (Try it!)
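  We can sanity-check this particular solution in code; the weights below come from the slide, while the verification loop itself is just an illustrative sketch:

      from itertools import product

      w0, w1, w2 = -1, 1.1, 1.1

      for x1, x2 in product([0, 1], repeat=2):
          y = 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0
          assert y == (x1 or x2)   # matches the OR truth table on all four inputs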
  • 128–137. Let us fix the threshold (−w0 = 1) and try different values of w1, w2 • Say, w1 = −1, w2 = −1 • What is wrong with this line? We make errors on 3 of the 4 inputs (only (0, 0) is classified correctly) • Let's try some more values of w1, w2 and note how many errors we make:
  w1     w2     errors
  −1     −1     3
  1.5    0      1
  0.45   0.45   3
  [figure: the four inputs with the candidate lines −1 + 1.1x1 + 1.1x2 = 0, −1 + (−1)x1 + (−1)x2 = 0, −1 + (1.5)x1 + (0)x2 = 0, and −1 + (0.45)x1 + (0.45)x2 = 0]
  • We are interested in those values of w0, w1, w2 which result in 0 error • Let us plot the error surface corresponding to different values of w0, w1, w2
  • 138–142. For ease of analysis, we will keep w0 fixed (−1) and plot the error for different values of w1, w2 • For a given w0, w1, w2 we will compute w0 + w1 ∗ x1 + w2 ∗ x2 for all combinations of (x1, x2) and note down how many errors we make • For the OR function, an error occurs if (x1, x2) = (0, 0) but w0 + w1 ∗ x1 + w2 ∗ x2 ≥ 0, or if (x1, x2) ≠ (0, 0) but w0 + w1 ∗ x1 + w2 ∗ x2 < 0 • We are interested in finding an algorithm which finds the values of w0, w1, w2 which result in 0 error
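  A small sketch of this error count (my own code, not from the slides), with w0 fixed at −1 and the (w1, w2) values from the table above; a full error surface would run the same count over a dense grid of (w1, w2) values:

      from itertools import product

      def or_errors(w0, w1, w2):
          # Count how many of the 4 binary inputs the rule
          # "fire iff w0 + w1*x1 + w2*x2 >= 0" misclassifies w.r.t. OR.
          errors = 0
          for x1, x2 in product([0, 1], repeat=2):
              y_pred = 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0
              errors += int(y_pred != (x1 or x2))
          return errors

      for w1, w2 in [(-1, -1), (1.5, 0), (0.45, 0.45), (1.1, 1.1)]:
          print(w1, w2, or_errors(-1, w1, w2))   # 3, 1, 3 and 0 errors respectively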
  • 143–146. Let us reconsider our problem of deciding whether to watch a movie or not • Suppose we are given a list of m movies and a label (class) associated with each movie indicating whether the user liked this movie or not: a binary decision • Further, suppose we represent each movie with n features (some boolean, some real valued), e.g. x1 = isActorDamon, x2 = isGenreThriller, x3 = isDirectorNolan, x4 = imdbRating (scaled to 0–1), ... [figure: perceptron with inputs x0 = 1, x1, ..., xn, weights w0 = −θ, w1, ..., wn, and output y] • We will assume that the data is linearly separable and we want a perceptron to learn how to make this decision • In other words, we want the perceptron to find the equation of this separating plane (or, equivalently, find the values of w0, w1, w2, ..., wn)
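  A minimal sketch of this setup (the feature values, weights, and helper name below are hypothetical placeholders, not data from the slides): each movie becomes a feature vector with x0 = 1 prepended, and the perceptron's job will be to learn weights that separate the liked movies from the rest.

      # Each row: [x0 = 1, isActorDamon, isGenreThriller, isDirectorNolan, imdbRating in 0-1]
      movies = [
          [1, 1, 1, 1, 0.88],   # hypothetical movie the user liked
          [1, 0, 0, 0, 0.41],   # hypothetical movie the user did not like
      ]
      labels = [1, 0]           # 1 = liked, 0 = not liked

      def predict(w, x):
          # Perceptron decision: fire iff the dot product (bias included) is >= 0.
          return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

      w = [-1.0, 0.2, 0.3, 1.0, 0.5]   # hypothetical weights; learning them is the next step
      print([predict(w, x) for x in movies])   # [1, 0] for these weights and movies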