5. Artificial Neuron
(Figure: an artificial neuron with inputs x1, x2, x3, weights w1, w2, w3, an aggregation σ, and output y)
• The most fundamental unit of a deep neural network is called an artificial neuron
• Why is it called a neuron? Where does the inspiration come from?
• The inspiration comes from biology (more specifically, from the brain)
• biological neurons = neural cells = neural processing units
• We will first see what a biological neuron looks like ...
6. Biological Neurons∗
• dendrite: receives signals from other neurons
• synapse: point of connection to other neurons
• soma: processes the information
• axon: transmits the output of this neuron
∗ Image adapted from https://cdn.vectorstock.com/i/composite/12,25/neuron-cell-vector-81225.jpg
13. • Of course, in reality, it is not just a single neuron which does all this
• There is a massively parallel interconnected network of neurons
• The sense organs relay information to the lowest layer of neurons
• Some of these neurons may fire (in red) in response to this information and in turn relay information to other neurons they are connected to
• These neurons may also fire (again, in red) and the process continues, eventually resulting in a response (laughter in this case)
• An average human brain has around $10^{11}$ (100 billion) neurons!
20. A simplified illustration
• This massively parallel network also ensures that there is division of work
• Each neuron may perform a certain role or respond to a certain stimulus
21. • The neurons in the brain are arranged in a hierarchy
22. Sample illustration of hierarchical processing∗
∗ Idea borrowed from Hugo Larochelle’s lecture slides
23. Disclaimer
• I understand very little about how the brain works!
• What you saw so far is an overly simplified explanation of how the brain works!
• But this explanation suffices for the purpose of this course!
36. (Figure: an MP neuron with binary inputs x1, x2, ..., xn ∈ {0, 1}, an aggregation g, a decision function f, and output y ∈ {0, 1})
• McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943)
• g aggregates the inputs and the function f takes a decision based on this aggregation
• The inputs can be excitatory or inhibitory
• y = 0 if any xi is inhibitory, else
$$g(x_1, x_2, ..., x_n) = g(\mathbf{x}) = \sum_{i=1}^{n} x_i$$
$$y = f(g(\mathbf{x})) = \begin{cases} 1 & \text{if } g(\mathbf{x}) \geq \theta \\ 0 & \text{if } g(\mathbf{x}) < \theta \end{cases}$$
• θ is called the thresholding parameter
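A minimal sketch of this MP neuron in Python (the function name and structure are ours, not from the slides), assuming binary inputs, a hand-chosen threshold θ, and an optional list of inhibitory input indices:

```python
def mp_neuron(inputs, theta, inhibitory=None):
    """McCulloch Pitts unit: g sums the binary inputs, f thresholds the sum at theta."""
    inhibitory = inhibitory or []
    # y = 0 if any inhibitory input is active
    if any(inputs[i] == 1 for i in inhibitory):
        return 0
    g = sum(inputs)                    # g(x) = sum_i x_i
    return 1 if g >= theta else 0      # f(g(x)) = 1 if g(x) >= theta, else 0

# Example: three binary inputs with threshold theta = 2
print(mp_neuron([1, 1, 0], theta=2))   # 1
print(mp_neuron([1, 0, 0], theta=2))   # 0
```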
37. Let us implement some boolean functions using this McCulloch Pitts (MP) neuron ...
48. A McCulloch Pitts unit has inputs x1, x2, x3 ∈ {0, 1}, a threshold θ, and an output y ∈ {0, 1}. By choosing θ (and marking some inputs as inhibitory) it can implement several boolean functions:
• AND function: θ = 3 (all three inputs must be 1)
• OR function: θ = 1 (at least one input must be 1)
• x1 AND !x2∗: θ = 1, with x2 as an inhibitory input
• NOR function: θ = 0, with both x1 and x2 as inhibitory inputs
• NOT function: θ = 0, with x1 as an inhibitory input
∗ A circle at the end of an input edge indicates an inhibitory input: if any inhibitory input is 1, the output will be 0
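The thresholds above can be checked with a small self-contained sketch (ours, not from the slides); inhibitory inputs are passed by index:

```python
def mp_unit(xs, theta, inhibitory=()):
    # Output 0 if any inhibitory input fires, else threshold the sum of inputs at theta.
    if any(xs[i] for i in inhibitory):
        return 0
    return 1 if sum(xs) >= theta else 0

# Thresholds as on the slide: AND needs all three inputs, OR needs at least one.
for x in [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]:
    print(x, "AND:", mp_unit(x, theta=3), "OR:", mp_unit(x, theta=1))

# x1 AND !x2 (x2 inhibitory, theta = 1) and NOR (both inputs inhibitory, theta = 0)
for x in [(a, b) for a in (0, 1) for b in (0, 1)]:
    print(x, "x1 AND !x2:", mp_unit(x, theta=1, inhibitory=(1,)),
          "NOR:", mp_unit(x, theta=0, inhibitory=(0, 1)))

# NOT (single inhibitory input, theta = 0)
print("NOT 0:", mp_unit((0,), theta=0, inhibitory=(0,)))   # 1
print("NOT 1:", mp_unit((1,), theta=0, inhibitory=(0,)))   # 0
```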
50. • Can any boolean function be represented using a McCulloch Pitts unit?
• Before answering this question let us first see the geometric interpretation of an MP unit ...
57. OR function (θ = 1): $x_1 + x_2 = \sum_{i=1}^{2} x_i \geq 1$
(Figure: the four input points (0, 0), (0, 1), (1, 0), (1, 1) in the $x_1$–$x_2$ plane with the line $x_1 + x_2 = \theta = 1$)
• A single MP neuron splits the input points (4 points for 2 binary inputs) into two halves: points lying on or above the line $\sum_{i=1}^{n} x_i - \theta = 0$ and points lying below this line
• In other words, all inputs which produce an output 0 will be on one side ($\sum_{i=1}^{n} x_i < \theta$) of the line and all inputs which produce an output 1 will lie on the other side ($\sum_{i=1}^{n} x_i \geq \theta$) of this line
• Let us convince ourselves about this with a few more examples (if it is not already clear from the math)
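To see this numerically for the OR unit (θ = 1), a short check of which side of the line $x_1 + x_2 - \theta = 0$ each of the four input points falls on:

```python
theta = 1   # threshold of the OR unit
for x1 in (0, 1):
    for x2 in (0, 1):
        side = x1 + x2 - theta           # >= 0: on or above the line, < 0: below it
        y = 1 if side >= 0 else 0        # output of the MP unit
        print(f"({x1}, {x2}): x1 + x2 - theta = {side:+d} -> y = {y}")
# Only (0, 0) lies below the line (output 0); the other three points lie on or above it (output 1).
```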
64. AND function (θ = 2): $x_1 + x_2 = \sum_{i=1}^{2} x_i \geq 2$
(Figure: the line $x_1 + x_2 = \theta = 2$ separates (1, 1) from the other three input points)
Tautology, always ON (θ = 0):
(Figure: the line $x_1 + x_2 = \theta = 0$ leaves all four input points on or above it)
69. OR function with three inputs (θ = 1)
(Figure: the eight input points from (0, 0, 0) to (1, 1, 1) in $x_1$–$x_2$–$x_3$ space, with the plane $x_1 + x_2 + x_3 = \theta = 1$)
• What if we have more than 2 inputs?
• Well, instead of a line we will have a plane
• For the OR function, we want a plane such that the point (0, 0, 0) lies on one side and the remaining 7 points lie on the other side of the plane
71. The story so far ...
• A single McCulloch Pitts neuron can be used to represent boolean functions which are linearly separable
• Linear separability (for boolean functions): there exists a line (plane) such that all inputs which produce a 1 lie on one side of the line (plane) and all inputs which produce a 0 lie on the other side of the line (plane)
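One way to probe the question asked two slides back empirically: a brute-force sketch (ours) that tries every threshold θ ∈ {0, ..., n} for a plain MP unit (no inhibitory inputs) and reports whether some θ reproduces a given truth table. XOR appears below only as a standard example of a boolean function that is not linearly separable.

```python
from itertools import product

def mp_can_represent(target, n):
    """target maps each n-bit input tuple to 0/1; try every threshold theta in 0..n."""
    for theta in range(n + 1):
        if all((1 if sum(x) >= theta else 0) == target[x]
               for x in product((0, 1), repeat=n)):
            return theta
    return None   # no single threshold works

OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
print("OR :", mp_can_represent(OR, 2))    # 1  (linearly separable)
print("XOR:", mp_can_represent(XOR, 2))   # None (not linearly separable)
```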
76. The story ahead ...
• What about non-boolean (say, real) inputs?
• Do we always need to hand code the threshold?
• Are all inputs equal? What if we want to assign more weight (importance) to some inputs?
• What about functions which are not linearly separable?
82. (Figure: a perceptron with inputs x1, x2, ..., xn, weights w1, w2, ..., wn, and output y)
• Frank Rosenblatt, an American psychologist, proposed the classical perceptron model (1958)
• A more general computational model than McCulloch–Pitts neurons
• Main differences: introduction of numerical weights for inputs and a mechanism for learning these weights
• Inputs are no longer limited to boolean values
• Refined and carefully analyzed by Minsky and Papert (1969) - their model is referred to as the perceptron model here
93. (Figure: a perceptron with inputs x1, x2, ..., xn, weights w1, w2, ..., wn, and an additional input x0 = 1 with weight w0 = −θ)
$$y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i * x_i \geq \theta \\ 0 & \text{if } \sum_{i=1}^{n} w_i * x_i < \theta \end{cases}$$
Rewriting the above,
$$y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i * x_i - \theta \geq 0 \\ 0 & \text{if } \sum_{i=1}^{n} w_i * x_i - \theta < 0 \end{cases}$$
A more accepted convention,
$$y = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i * x_i \geq 0 \\ 0 & \text{if } \sum_{i=0}^{n} w_i * x_i < 0 \end{cases}$$
where $x_0 = 1$ and $w_0 = -\theta$
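A minimal sketch of this convention in Python (the weight values are placeholders of our choosing): x0 = 1 is prepended to the input and w0 = −θ to the weights, so the decision reduces to checking the sign of a single dot product.

```python
def perceptron_output(x, w):
    """y = 1 if sum_{i=0}^{n} w_i * x_i >= 0 else 0, with x[0] = 1 and w[0] = -theta."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= 0 else 0

theta = 1.0
w = [-theta, 1.1, 1.1]           # w0 = -theta, then w1, w2 (placeholder values)
x = [1, 0, 1]                    # x0 = 1, then the actual inputs x1, x2
print(perceptron_output(x, w))   # 1, since -1 + 1.1*0 + 1.1*1 >= 0
```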
94. We will now try to answer the following questions:
• Why are we trying to implement boolean functions?
• Why do we need weights?
• Why is w0 = −θ called the bias?
98. x1 = isActorDamon, x2 = isGenreThriller, x3 = isDirectorNolan
• Consider the task of predicting whether we would like a movie or not
• Suppose we base our decision on 3 inputs (binary, for simplicity)
• Based on our past viewing experience (data), we may give a high weight to isDirectorNolan as compared to the other inputs
• Specifically, even if the actor is not Matt Damon and the genre is not thriller, we would still want to cross the threshold θ by assigning a high weight to isDirectorNolan
102. • w0 is called the bias as it represents the prior (prejudice)
• A movie buff may have a very low threshold and may watch any movie irrespective of the genre, actor, or director [θ = 0]
• On the other hand, a selective viewer may only watch thrillers starring Matt Damon and directed by Nolan [θ = 3]
• The weights (w1, w2, ..., wn) and the bias (w0) will depend on the data (viewer history in this case)
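A sketch of the movie example with made-up numbers (the weights below are ours, chosen only to illustrate a high weight on isDirectorNolan and the two viewer thresholds):

```python
def will_watch(x, w, theta):
    """x = (isActorDamon, isGenreThriller, isDirectorNolan); watch if the weighted sum reaches theta."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

nolan_only = (0, 0, 1)   # not Damon, not a thriller, but directed by Nolan

# A high (made-up) weight on isDirectorNolan lets this input alone cross a threshold of 1.
print(will_watch(nolan_only, w=(0.2, 0.3, 1.5), theta=1))   # 1

# Movie buff (theta = 0) vs. selective viewer (unit weights, theta = 3).
print(will_watch(nolan_only, w=(1, 1, 1), theta=0))   # 1: watches anything
print(will_watch(nolan_only, w=(1, 1, 1), theta=3))   # 0: not a Damon thriller by Nolan
print(will_watch((1, 1, 1), w=(1, 1, 1), theta=3))    # 1: all three conditions hold
```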
103. What kind of functions can be implemented using the perceptron? Any difference from McCulloch Pitts neurons?
110. McCulloch Pitts Neuron (assuming no inhibitory inputs):
$$y = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} x_i \geq 0 \\ 0 & \text{if } \sum_{i=0}^{n} x_i < 0 \end{cases}$$
Perceptron:
$$y = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i * x_i \geq 0 \\ 0 & \text{if } \sum_{i=0}^{n} w_i * x_i < 0 \end{cases}$$
• From the equations it should be clear that even a perceptron separates the input space into two halves
• All inputs which produce a 1 lie on one side and all inputs which produce a 0 lie on the other side
• In other words, a single perceptron can only be used to implement linearly separable functions
• Then what is the difference? The weights (including threshold) can be learned and the inputs can be real valued
• We will first revisit some boolean functions and then see the perceptron learning algorithm (for learning weights)
127. For the OR function, the weights must satisfy:
x1 = 0, x2 = 0, OR = 0: $w_0 + \sum_{i=1}^{2} w_i x_i < 0$
x1 = 1, x2 = 0, OR = 1: $w_0 + \sum_{i=1}^{2} w_i x_i \geq 0$
x1 = 0, x2 = 1, OR = 1: $w_0 + \sum_{i=1}^{2} w_i x_i \geq 0$
x1 = 1, x2 = 1, OR = 1: $w_0 + \sum_{i=1}^{2} w_i x_i \geq 0$
That is,
$w_0 + w_1 \cdot 0 + w_2 \cdot 0 < 0 \implies w_0 < 0$
$w_0 + w_1 \cdot 0 + w_2 \cdot 1 \geq 0 \implies w_2 \geq -w_0$
$w_0 + w_1 \cdot 1 + w_2 \cdot 0 \geq 0 \implies w_1 \geq -w_0$
$w_0 + w_1 \cdot 1 + w_2 \cdot 1 \geq 0 \implies w_1 + w_2 \geq -w_0$
• One possible solution to this set of inequalities is w0 = −1, w1 = 1.1, w2 = 1.1 (and various other solutions are possible)
(Figure: the four input points with the line $-1 + 1.1 x_1 + 1.1 x_2 = 0$)
• Note that we can come up with a similar set of inequalities and find the value of θ for a McCulloch Pitts neuron also (Try it!)
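As a sanity check, plugging the suggested solution back into the decision rule confirms that w0 = −1, w1 = 1.1, w2 = 1.1 reproduces the OR truth table (and hence satisfies all four inequalities):

```python
w0, w1, w2 = -1.0, 1.1, 1.1                       # the solution suggested above
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}

for (x1, x2), target in OR.items():
    s = w0 + w1 * x1 + w2 * x2
    y = 1 if s >= 0 else 0
    print(f"({x1}, {x2}): w0 + w1*x1 + w2*x2 = {s:+.2f} -> y = {y} (target {target})")
# All four rows match the OR column.
```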
137. • Let us fix the threshold (−w0 = 1) and try different values of w1, w2
• Say, w1 = −1, w2 = −1
• What is wrong with this line? We make errors on 3 out of the 4 inputs
• Let's try some more values of w1, w2 and note how many errors we make
w1     w2     errors
−1     −1     3
1.5    0      1
0.45   0.45   3
• We are interested in those values of w0, w1, w2 which result in 0 error
• Let us plot the error surface corresponding to different values of w0, w1, w2
(Figure: the four input points with the candidate lines $-1 + 1.1x_1 + 1.1x_2 = 0$, $-1 + (-1)x_1 + (-1)x_2 = 0$, $-1 + (1.5)x_1 + (0)x_2 = 0$, and $-1 + (0.45)x_1 + (0.45)x_2 = 0$)
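The error counts in the table can be reproduced with a few lines (a sketch with w0 fixed at −1, as on the slide):

```python
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
w0 = -1.0   # threshold fixed at -w0 = 1

def count_errors(w1, w2):
    """Number of OR inputs misclassified by the line w0 + w1*x1 + w2*x2 = 0."""
    errors = 0
    for (x1, x2), target in OR.items():
        y = 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0
        errors += (y != target)
    return errors

for w1, w2 in [(-1, -1), (1.5, 0), (0.45, 0.45), (1.1, 1.1)]:
    print(f"w1={w1}, w2={w2}: {count_errors(w1, w2)} errors")
# Matches the table: 3, 1, 3 errors, and 0 errors for the earlier solution (1.1, 1.1).
```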
142. • For ease of analysis, we will keep w0 fixed (−1) and plot the error for different values of w1, w2
• For a given w0, w1, w2 we will compute $w_0 + w_1 * x_1 + w_2 * x_2$ for all combinations of (x1, x2) and note down how many errors we make
• For the OR function, an error occurs if (x1, x2) = (0, 0) but $w_0 + w_1 * x_1 + w_2 * x_2 \geq 0$, or if (x1, x2) ≠ (0, 0) but $w_0 + w_1 * x_1 + w_2 * x_2 < 0$
• We are interested in finding an algorithm which finds the values of w0, w1, w2 which result in 0 error
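A sketch of the error-surface computation just described: w0 stays fixed at −1 and the error is evaluated on a grid of (w1, w2) values (the grid range is our choice):

```python
import numpy as np

OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
w0 = -1.0                                 # w0 kept fixed, as on the slide

w1_grid = np.linspace(-2, 2, 41)          # grid range chosen arbitrarily for illustration
w2_grid = np.linspace(-2, 2, 41)
errors = np.zeros((len(w1_grid), len(w2_grid)), dtype=int)

for i, w1 in enumerate(w1_grid):
    for j, w2 in enumerate(w2_grid):
        for (x1, x2), target in OR.items():
            y = 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0
            errors[i, j] += (y != target)

# Zero error exactly where w1 >= -w0 = 1 and w2 >= -w0 = 1 (cf. the inequalities earlier).
print("minimum error on the grid:", errors.min())
print("number of zero-error grid points:", int((errors == 0).sum()))
```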
146. x1 = isActorDamon, x2 = isGenreThriller, x3 = isDirectorNolan, x4 = imdbRating (scaled to 0 to 1), ...
• Let us reconsider our problem of deciding whether to watch a movie or not
• Suppose we are given a list of m movies and a label (class) associated with each movie indicating whether the user liked this movie or not: a binary decision
• Further, suppose we represent each movie with n features (some boolean, some real valued)
• We will assume that the data is linearly separable and we want a perceptron to learn how to make this decision
• In other words, we want the perceptron to find the equation of this separating plane (or find the values of w0, w1, w2, ..., wn)