Deep Learning for Computer Vision: Deep Networks (UPC 2016)

Day 1 Lecture 3
Deep Networks
Elisa Sayrol
[course site]

From Neurons to Convolutional Neural Networks
Figures Credit: Hugo Larochelle NN course

Figure Credit: Hugo Larochelle NN course
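Hidden pre-activation: $a(x) = b^{(1)} + W^{(1)} x$
Hidden activation: $h(x) = g(a(x))$
Output activation: $f(x) = o(b^{(2)} + W^{(2)} h(x))$
g(a) activation function:
sigmoid: $g(a) = 1 / (1 + e^{-a})$
tanh: $g(a) = (e^{a} - e^{-a}) / (e^{a} + e^{-a})$
ReLU: $g(a) = \max(0, a)$
o(a) output activation function:
Softmax: $o(a)_c = e^{a_c} / \sum_{c'} e^{a_{c'}}$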
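The activation functions named on this slide can be sketched in NumPy as follows. This is a minimal illustration using the standard textbook definitions; the function names and the test vector are assumptions chosen for the example, not taken from the slides.

```python
import numpy as np

def sigmoid(a):
    # Squashes each value into (0, 1).
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):
    # Squashes each value into (-1, 1).
    return np.tanh(a)

def relu(a):
    # Keeps positive values, sets negative values to zero.
    return np.maximum(0.0, a)

def softmax(a):
    # Subtracting the maximum does not change the result
    # but avoids overflow in the exponential.
    shifted = a - np.max(a, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

a = np.array([-2.0, 0.0, 3.0])  # hypothetical pre-activations
print(sigmoid(a))
print(tanh(a))
print(relu(a))
print(softmax(a))  # non-negative values that sum to 1
```

Softmax is normally used only at the output layer for classification, since it turns the pre-activations into a probability distribution over classes.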

L Hidden Layers
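Hidden pre-activation (k>0): $a^{(k)}(x) = b^{(k)} + W^{(k)} h^{(k-1)}(x)$, with $h^{(0)}(x) = x$
Hidden activation (k=1,…,L): $h^{(k)}(x) = g(a^{(k)}(x))$
Output activation (k=L+1): $h^{(L+1)}(x) = o(a^{(L+1)}(x)) = f(x)$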
Slide Credit: Hugo Larochelle NN course
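A forward pass through a network with L hidden layers might look like the sketch below. It assumes ReLU as the hidden activation g and softmax as the output activation o; the layer sizes, the random weights and the name `forward` are hypothetical choices for illustration only.

```python
import numpy as np

def forward(x, weights, biases):
    """weights[k-1] and biases[k-1] hold the parameters of layer k, k = 1..L+1."""
    h = x  # layer 0 is the input itself
    # Hidden layers k = 1..L
    for W, b in zip(weights[:-1], biases[:-1]):
        a = b + W @ h            # hidden pre-activation
        h = np.maximum(0.0, a)   # hidden activation (ReLU assumed)
    # Output layer k = L+1
    a_out = biases[-1] + weights[-1] @ h
    e = np.exp(a_out - a_out.max())  # softmax, shifted for stability
    return e / e.sum()

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]  # 4 inputs, two hidden layers (L = 2), 2 output classes
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

y = forward(rng.normal(size=4), weights, biases)
print(y)  # class probabilities
```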

What if the input is all the
pixels within an image?

For a 200x200 image, we have 4x10^4 neurons,
each one with 4x10^4 inputs, that is 16x10^8
parameters, only for one layer!
Figure Credit: Ranzato

For a 200x200 image, we have 4x10^4 neurons, each one
with 10x10 “local connections” (also called the
receptive field) as inputs, that is 4x10^6 parameters.
What else can we do to reduce the number of
parameters?

Translation invariance: we can use the same
parameters to capture a specific “feature” in any
area of the image. We can try different sets of
parameters to capture different features.
These operations are equivalent to performing
convolutions with different filters.
Ex: With 100 different filters (or feature extractors)
of size 10x10, the number of parameters is 10^4.
That is why they are called Convolutional
Neural Networks (ConvNets or CNNs).
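The parameter counts quoted for the three designs (fully connected, locally connected, convolutional) can be checked with a few lines of arithmetic. The variable names are illustrative, and bias terms are ignored, as in the slides.

```python
height, width = 200, 200
n_pixels = height * width            # 40,000 inputs, one neuron per pixel

# Fully connected: every neuron sees every pixel.
fully_connected = n_pixels * n_pixels

# Locally connected: every neuron sees only a 10x10 receptive field,
# but each neuron still has its own weights.
receptive_field = 10 * 10
locally_connected = n_pixels * receptive_field

# Convolutional: the same 10x10 weights are shared across all positions,
# so only the number of filters matters.
n_filters = 100
convolutional = n_filters * receptive_field

print(fully_connected)    # 1600000000
print(locally_connected)  # 4000000
print(convolutional)      # 10000
```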

…and don’t forget the activation function!
Figure Credit: Ranzato
ReLU PReLU
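The two activations shown in the figure differ only in how they treat negative inputs. The sketch below assumes the usual definitions; in PReLU the slope `alpha` is a learned parameter, and the fixed value 0.25 used here is only a placeholder for illustration.

```python
import numpy as np

def relu(a):
    # Negative inputs are set to zero.
    return np.maximum(0.0, a)

def prelu(a, alpha=0.25):
    # Negative inputs are scaled by alpha instead of being zeroed.
    # alpha would normally be learned during training.
    return np.where(a > 0, a, alpha * a)

a = np.array([-2.0, 0.0, 3.0])  # hypothetical feature-map values
print(relu(a))
print(prelu(a))
```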

Most ConvNets use Pooling
(or subsampling) to reduce
dimensionality and provide
invariance to small local
changes.
Pooling options:
• Max
• Average
• Stochastic pooling
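Max and average pooling over non-overlapping windows can be sketched as below. The helper `pool2d` and the 4x4 input are hypothetical; stochastic pooling, which samples one value per window at random, is left out to keep the example short.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Pool a 2D array over non-overlapping size x size windows."""
    h_out, w_out = x.shape[0] // size, x.shape[1] // size
    # Crop so both dimensions divide evenly, then expose each window
    # as its own pair of axes.
    blocks = x[:h_out * size, :w_out * size].reshape(h_out, size, w_out, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'average'")

x = np.arange(16, dtype=float).reshape(4, 4)
max_pooled = pool2d(x, size=2, mode="max")
avg_pooled = pool2d(x, size=2, mode="average")
print(max_pooled)  # [[ 5.  7.] [13. 15.]]
print(avg_pooled)  # [[ 2.5  4.5] [10.5 12.5]]
```

With 2x2 windows, each spatial dimension of the output is half that of the input.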

Padding (P): To compute the
convolution at the borders, you may
add values around the input.
When the added values are zero, which is
quite common, the technique is called
zero-padding.
When padding is not used, the output
size is reduced.
FxF=3x3
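The effect of padding on the output size can be sketched with the commonly used formula (N - F + 2P) / S + 1, where N is the input size, F the filter size, P the padding and S the stride. The helper name `conv_output_size` and the 5x5 input are assumptions for illustration.

```python
import numpy as np

def conv_output_size(n, f, p=0, s=1):
    # Output size along one dimension for input n, filter f,
    # padding p on each side, and stride s.
    return (n - f + 2 * p) // s + 1

x = np.ones((5, 5))
# Zero-padding: add a border of zeros, one pixel wide, around the input.
x_padded = np.pad(x, pad_width=1, mode="constant", constant_values=0)

print(x_padded.shape)                 # (7, 7)
print(conv_output_size(5, 3, p=0))    # 3: without padding the output shrinks
print(conv_output_size(5, 3, p=1))    # 5: P=1 keeps the size for a 3x3 filter
print(conv_output_size(5, 5, p=2))    # 5: a 5x5 filter needs P=2
```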

Padding (P), with a larger filter:
FxF=5x5

Stride (S): When doing the
convolution or another operation, like
pooling, we may decide to slide the window not
pixel by pixel but every 2 or more
pixels. The number of pixels the window
moves at each step is the value of the stride.
It can be used to reduce the
dimensionality of the output.
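A naive strided convolution can be sketched as below to show how the stride shrinks the output. As in most ConvNet libraries, the filter is not flipped (strictly speaking this is a cross-correlation); the function `conv2d`, the 7x7 input and the averaging filter are hypothetical.

```python
import numpy as np

def conv2d(x, kernel, stride=1, padding=0):
    """Slide a square kernel over a 2D input, without flipping the kernel."""
    x = np.pad(x, padding, mode="constant")  # zero-padding
    f = kernel.shape[0]
    h_out = (x.shape[0] - f) // stride + 1
    w_out = (x.shape[1] - f) // stride + 1
    out = np.zeros((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            r, c = i * stride, j * stride  # top-left corner of the window
            out[i, j] = np.sum(x[r:r + f, c:c + f] * kernel)
    return out

x = np.arange(49, dtype=float).reshape(7, 7)
k = np.ones((3, 3)) / 9.0          # 3x3 averaging filter

out1 = conv2d(x, k, stride=1)
out2 = conv2d(x, k, stride=2)
print(out1.shape)  # (5, 5)
print(out2.shape)  # (3, 3)
```

With stride 2 the output keeps every second position of the stride-1 result, so the number of output values drops from 25 to 9.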

Example: Most ConvNets contain several convolutional layers, interspersed with
pooling layers, and followed by a small number of fully connected layers.
A layer is characterized by its width, height and depth (that is, the number of
filters used to generate the feature maps).
An architecture is characterized by the number of layers.
LeNet-5, from LeCun '98
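As a rough illustration of how width, height and depth evolve through such an architecture, the sketch below traces the feature-map sizes of LeNet-5. The layer settings (32x32 input, 5x5 convolutions, 2x2 subsampling, 6 and 16 feature maps) are the commonly cited ones from the original paper and do not appear in the slide text, so treat them as assumptions.

```python
def out_size(n, f, p=0, s=1):
    # Output size along one dimension: (N - F + 2P) / S + 1
    return (n - f + 2 * p) // s + 1

# (name, filter size, stride, depth), assumed LeNet-5 settings
layers = [
    ("C1 conv", 5, 1, 6),
    ("S2 pool", 2, 2, 6),
    ("C3 conv", 5, 1, 16),
    ("S4 pool", 2, 2, 16),
]

size = 32  # assumed 32x32 input image
shapes = []
for name, f, s, depth in layers:
    size = out_size(size, f, s=s)
    shapes.append((name, size, size, depth))
    print(f"{name}: {size}x{size}x{depth}")
```

The convolutional and pooling stages are then followed by the small number of fully connected layers mentioned above.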
