Convolution as matrix multiplication
• Edwin Efraín Jiménez Lepe
FeedForward (applying kernel rotation)

Input (3x3):
16 24 32
47 18 26
68 12  9

Kernels (2x2):
W1:  0  1      W2:  2  3
    -1  0           4  5

Input ∗ [W1, W2] becomes a single matrix product. Each row of Im2col(input)
is one 2x2 patch; each column of the weight matrix is a rotated kernel:

Im2col (input):      Rotated kernels (W1, W2):
16 47 24 18           0  5
47 68 18 12     x     1  3
24 18 32 26          -1  4
18 12 26  9           0  2

=   23 353
    50 535
   -14 354
   -14 248

Rearrange into the two 2x2 output channels:
 23 -14       353 354
 50 -14       535 248
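A minimal NumPy sketch of this step (not the author's original code). It assumes column-major (order='F') flattening of patches and output positions, which reproduces the intermediate matrices shown above.

```python
import numpy as np

# Input and the two 2x2 kernels from the slide
X  = np.array([[16., 24., 32.],
               [47., 18., 26.],
               [68., 12.,  9.]])
W1 = np.array([[ 0., 1.],
               [-1., 0.]])
W2 = np.array([[ 2., 3.],
               [ 4., 5.]])

def im2col(x, k):
    """One row per k x k patch; patches and pixels enumerated column-major."""
    oh, ow = x.shape[0] - k + 1, x.shape[1] - k + 1
    cols = np.empty((oh * ow, k * k))
    for j in range(ow):
        for i in range(oh):
            cols[j * oh + i] = x[i:i + k, j:j + k].flatten(order='F')
    return cols

cols = im2col(X, 2)                                    # the 4x4 "Im2col (input)"
# "Applying kernel rotation": convolution flips each kernel by 180 degrees
Wmat = np.column_stack([np.rot90(W1, 2).flatten(order='F'),
                        np.rot90(W2, 2).flatten(order='F')])    # 4x2
out = cols @ Wmat        # [[23, 353], [50, 535], [-14, 354], [-14, 248]]
# Rearrange each column back into a 2x2 output map
y = np.stack([out[:, c].reshape(2, 2, order='F') for c in range(2)])
```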
FeedForward, now with bias

Same product as before, plus a per-filter bias b = [1, 0]: a column of ones
times the bias row adds 1 to every entry of the first output column and 0 to
the second.

Im2col (input) x [W1, W2] + bias:
 23 353      1 0      24 353
 50 535  +   1 0  =   51 535
-14 354      1 0     -13 354
-14 248      1 0     -13 248

Rearrange:
 24 -13       353 354
 51 -13       535 248
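Continuing the sketch above (it reuses cols and Wmat), the bias is just one scalar per filter broadcast over all output positions before the rearrange step:

```python
b = np.array([1., 0.])       # bias for W1 and W2, taken from the slide
out_b = cols @ Wmat + b      # first column becomes [24, 51, -13, -13]
y_b = np.stack([out_b[:, c].reshape(2, 2, order='F') for c in range(2)])
```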
BackPropagation: d_w = input ∗ d_y

The gradient arriving from the layer above has one 2x2 channel per filter:

d_y (channel 1):            d_y (channel 2):
 0               0           0              0
-2.94504954e-05  0           6.39539432e-06 0

d_w is the product of Im2col(input) with the d_y channels stacked as columns
(same position ordering as the Im2col rows):

Im2col (input):      Im2col (d_y):
16 47 24 18           0                0
47 68 18 12     x    -2.94504954e-05   6.39539432e-06
24 18 32 26           0                0
18 12 26  9           0                0

=  -1.38417328e-03   3.00583533e-04
   -2.00263369e-03   4.34886814e-04
   -5.30108917e-04   1.15117098e-04
   -3.53405945e-04   7.67447318e-05

Rearrange (d_w for W1, and similarly for W2):
-1.38417328e-03  -5.30108917e-04
-2.00263369e-03  -3.53405945e-04

The update corresponds to the rotated kernel.
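A sketch of the same gradient in NumPy, reusing im2col and cols from the forward-pass sketch. The general formula is im2col(input).T times the d_y columns; for this 3x3 input and 2x2 kernel, im2col(input) happens to be symmetric, so it matches the product shown above.

```python
dY1 = np.array([[ 0.,              0.],
                [-2.94504954e-05,  0.]])   # gradient w.r.t. output channel 1
dY2 = np.array([[ 0.,              0.],
                [ 6.39539432e-06,  0.]])   # gradient w.r.t. output channel 2
dY_cols = np.column_stack([dY1.flatten(order='F'),
                           dY2.flatten(order='F')])        # 4x2

dW_cols = cols.T @ dY_cols   # gradients of the ROTATED kernels, one per column
# Undo the rotation before updating W1 and W2
dW = np.stack([np.rot90(dW_cols[:, c].reshape(2, 2, order='F'), 2)
               for c in range(2)])
```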
BackPropagation: d_x = d_y ∗ w (without rotation)

d_y (channel 1):            d_y (channel 2):
 0               0           0              0
-2.94504954e-05  0           6.39539432e-06 0

To propagate the error back to the input we need a full convolution, keeping
the kernel unrotated. Pad each d_y channel with one border of zeros:

0  0               0  0      0  0              0  0
0  0               0  0      0  0              0  0
0 -2.94504954e-05  0  0      0  6.39539432e-06 0  0
0  0               0  0      0  0              0  0

Im2col of each padded channel, transposed and multiplied by the unrotated
kernel flattened column-major (W1 -> [0, -1, 1, 0], W2 -> [2, 4, 3, 5]):

Channel 1:
0 0  0        0  0        -2.94e-05 0 0 0
0 0  0        0 -2.94e-05  0        0 0 0
0 0 -2.94e-05 0  0         0        0 0 0
0 -2.94e-05 0 0  0         0        0 0 0

Channel 2:
0 0 0         0 0         6.395e-06 0 0 0
0 0 0         0 6.395e-06 0         0 0 0
0 0 6.395e-06 0 0         0         0 0 0
0 6.395e-06 0 0 0         0         0 0 0

Per-channel results (9 output positions each):
Channel 1: 0, 0, -0.2945e-04, 0, 0.2945e-04, 0, 0, 0, 0
Channel 2: 0, 0.3198e-04, 0.1919e-04, 0, 0.2558e-04, 0.1279e-04, 0, 0, 0

Sum:       0, 0.3198e-04, -0.1026e-04, 0, 0.5503e-04, 0.1279e-04, 0, 0, 0

Reshape back to the 3x3 input shape:
 0           0           0
 0.3198e-04  0.5503e-04  0
-0.1026e-04  0.1279e-04  0

In fact, we can do it in just one operation by stacking both Im2col matrices
and both kernel columns. Notice that every channel of delta is multiplied by
the corresponding filter that generated it.
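A sketch of the input gradient, reusing im2col, W1, W2, dY1 and dY2 from the sketches above: pad each d_y channel, im2col it, multiply by the unrotated kernel columns, and sum the two contributions in one product.

```python
# 9x8: im2col of both zero-padded d_y channels, side by side
cols_d = np.hstack([im2col(np.pad(dY1, 1), 2),
                    im2col(np.pad(dY2, 1), 2)])
# length-8 column: both UNROTATED kernels, flattened column-major
w_unrot = np.concatenate([W1.flatten(order='F'), W2.flatten(order='F')])
dX = (cols_d @ w_unrot).reshape(3, 3, order='F')
# dX == [[0, 0, 0],
#        [0.3198e-04, 0.5503e-04, 0],
#        [-0.1026e-04, 0.1279e-04, 0]]   (up to the rounding used on the slide)
```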
A multi-channel example

Input (3,3,3):
Channel 1:     Channel 2:     Channel 3:
16 24 32       26 57 43       18 47 21
47 18 26       24 21 12        4  6 12
68 12  9        2 11 19       81 22 13

Filters (2,3,2,2), one 2x2 kernel per input channel:
Filter 1:   0  1  |  2  3  |  -2 68
           -1  0  |  4  5  |  24 16
Filter 2:  18 32  | 23  7  |  42 20
           22 60  | 46 35  |  81 78

Output (2,2,2), applying Theano convolution (which rotates the filters
automatically):
2171  2170      13042 13575
5954  2064      11023  6425
A multi-channel example (vectorized)

Stack the Im2col of each input channel (shown transposed, 12x4) and multiply
by the rotated filters flattened as columns (12x2):

16 47 24 18          0  60
47 68 18 12          1  32
24 18 32 26         -1  22
18 12 26  9          0  18
26 24 57 21          5  35
24  2 21 11    T     3   7
57 21 43 12    x     4  46
21 11 12 19          2  23
18  4 47  6         16  78
 4 81  6 22         68  20
47  6 21 12         24  81
 6 22 12 13         -2  42

=  2171 13042
   5954 11023
   2170 13575
   2064  6425

Rearrange (channel 1, channel 2):
2171  2170      13042 13575
5954  2064      11023  6425
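The same vectorization in NumPy for the multi-channel case, reusing im2col from the first sketch. The per-filter channel grouping below is reconstructed from the 12x2 weight matrix shown on the slide.

```python
Xc = np.array([[[16., 24., 32.], [47., 18., 26.], [68., 12.,  9.]],
               [[26., 57., 43.], [24., 21., 12.], [ 2., 11., 19.]],
               [[18., 47., 21.], [ 4.,  6., 12.], [81., 22., 13.]]])   # (3,3,3)

filters = np.array([[[[ 0.,  1.], [-1.,  0.]],    # filter 1, channels 1..3
                     [[ 2.,  3.], [ 4.,  5.]],
                     [[-2., 68.], [24., 16.]]],
                    [[[18., 32.], [22., 60.]],    # filter 2, channels 1..3
                     [[23.,  7.], [46., 35.]],
                     [[42., 20.], [81., 78.]]]])                       # (2,3,2,2)

cols_mc = np.hstack([im2col(Xc[c], 2) for c in range(3)])              # 4x12
Wmat_mc = np.column_stack(
    [np.concatenate([np.rot90(filters[f, c], 2).flatten(order='F')
                     for c in range(3)])
     for f in range(2)])                                               # 12x2
out_mc = cols_mc @ Wmat_mc  # [[2171, 13042], [5954, 11023], [2170, 13575], [2064, 6425]]
y_mc = np.stack([out_mc[:, f].reshape(2, 2, order='F') for f in range(2)])  # (2,2,2)
```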
Backpropagation: d_y-1 = d_y ∗ w (without rotation)

Imagine we receive the following error from the layer above and want to
propagate it back to the input of the convolution:

d_y (2,2,2):
.1678 .098      0.5  0.67
.002  .246      0.21 0.487

We need to compute d_y ∗ w without rotation, but as a 'full' convolution, so
we pad d_y with one border of zeros:

0 0     0    0      0 0    0    0
0 .1678 .098 0      0 0.5  .67  0
0 .002  .246 0      0 .21  .487 0
0 0     0    0      0 0    0    0

Im2col of both padded channels (shown transposed, 8x9):

0     0     0    0     .1678 .002 0    .098 .246
0     0     0    .1678 .002  0    .098 .246 0
0     .1678 .002 0     .098  .246 0    0    0
.1678 .002  0    .098  .246  0    0    0    0
0     0     0    0     .5    .21  0    .67  .487
0     0     0    .5    .21   0    .67  .487 0
0     .5    .21  0     .67   .487 0    0    0
.5    .21   0    .67   .487  0    0    0    0

Notice that every channel of delta is multiplied by the corresponding filter
that generated it: the weight matrix has one column per input channel, and
each column stacks that channel of Filter 1 and Filter 2 (unrotated):

 0   2  -2
-1   4  24
 1   3  68
 0   5  16
18  23  42
22  46  81
32   7  20
60  35  78

Transposing the Im2col matrix and multiplying:

30      18.339  41.6848
28.7678 11.3634 37.8224
 6.722   1.476   4.336
51.0322 47.6112 98.3552
64.376  44.7626 99.7084
19.61    8.981  35.284
14.642  31.212  56.622
22.528  38.992  73.295
 8.766  11.693  19.962

Rearrange into the three 3x3 input channels:

30      51.0322 14.642     18.339  47.6112 31.212     41.6848 98.3552 56.622
28.7678 64.376  22.528     11.3634 44.7626 38.992     37.8224 99.7084 73.295
 6.722  19.61    8.766      1.476   8.981  11.693      4.336  35.284  19.962
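A sketch of this step, reusing im2col and filters from the sketches above: pad both d_y channels, im2col them side by side, and multiply by one column per input channel built from the unrotated filter channels.

```python
dY = np.array([[[.1678, .098], [.002, .246]],
               [[.5,    .67 ], [.21,  .487]]])          # (2,2,2), from the slide

cols_full = np.hstack([im2col(np.pad(dY[f], 1), 2) for f in range(2)])   # 9x8
Wback = np.column_stack(
    [np.concatenate([filters[f, c].flatten(order='F') for f in range(2)])
     for c in range(3)])                                                 # 8x3
dX_mc = np.stack([(cols_full @ Wback)[:, c].reshape(3, 3, order='F')
                  for c in range(3)])                                    # (3,3,3)
# dX_mc[0] == [[30, 51.0322, 14.642], [28.7678, 64.376, 22.528], [6.722, 19.61, 8.766]]
```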
Backpropagation (non-vectorized, full convolution): d_y-1 = d_y ∗ w (without rotation)

The same result without Im2col: take the (2,3,2,2) filter bank and transpose
dimensions 0 and 1 to get a (3,2,2,2) bank, i.e. three "filters" (one per
input channel), each containing the corresponding channel of Filter 1 and
Filter 2:

d_y (2,2,2):
.1678 .098      0.5  0.67
.002  .246      0.21 0.487

Transposed filters (3,2,2,2):
Filter 1:   0  1  | 18 32
           -1  0  | 22 60
Filter 2:   2  3  | 23  7
            4  5  | 46 35
Filter 3:  -2 68  | 42 20
           24 16  | 81 78

A full convolution of d_y with these filters (kept unrotated) gives:

30      51.0322 14.642     18.339  47.6112 31.212     41.6848 98.3552 56.622
28.7678 64.376  22.528     11.3634 44.7626 38.992     37.8224 99.7084 73.295
 6.722  19.61    8.766      1.476   8.981  11.693      4.336  35.284  19.962
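The same quantity computed directly, as on these slides: swap dimensions 0 and 1 of the filter bank and run a plain full correlation (no kernel rotation) over the padded d_y. This reuses filters and dY from the sketches above.

```python
filters_T = filters.transpose(1, 0, 2, 3)                  # (2,3,2,2) -> (3,2,2,2)
dY_pad = np.stack([np.pad(dY[f], 1) for f in range(2)])    # (2,4,4)

dX_full = np.zeros((3, 3, 3))
for c in range(3):            # one input-gradient channel per transposed "filter"
    for i in range(3):
        for j in range(3):
            dX_full[c, i, j] = np.sum(dY_pad[:, i:i + 2, j:j + 2] * filters_T[c])
# dX_full matches dX_mc from the im2col version above
```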
Backpropagation: d_w = input ∗ d_y

Input (3,3,3) ∗ d_y (2,2,2): the dimensions do not match, which tells us we
need to apply both channels of d_y to every channel of the input (the input
channels are the ones from the multi-channel example):

d_y:
.1678 .098      0.5  0.67
.002  .246      0.21 0.487

Input channel 1 ∗ d_y:   Input channel 2 ∗ d_y:   Input channel 3 ∗ d_y:
 9.5588 13.5952          15.1628 16.7726           9.1104 12.9086
12.7386  7.8064           8.7952  9.3958           6.8332  5.4248

42.716  49.882           66.457  67.564           44.252  44.674
55.684  33.323           31.847  30.103           33.744  21.991

Adding the three contributions:
33.832  43.2764      153.425 162.12
28.367  22.627       121.275  85.417

This error is associated with the rotated kernel, which means we need to
rotate this result to update the unrotated kernel.
Backpropagation vectorized: d_w = input ∗ d_y (without rotating d_y)

The same computation as a single matrix product: the stacked Im2col of the
input (transposed) times the d_y channels stacked as columns and repeated
once per input channel:

16 47 24 18          .1678 0.5
47 68 18 12          .002  0.21
24 18 32 26          .098  0.67
18 12 26  9          .246  0.487
26 24 57 21          .1678 0.5
24  2 21 11    T     .002  0.21
57 21 43 12    x     .098  0.67
21 11 12 19          .246  0.487
18  4 47  6          .1678 0.5
 4 81  6 22          .002  0.21
47  6 21 12          .098  0.67
 6 22 12 13          .246  0.487

=  33.832  153.425
   28.367  121.275
   43.2764 162.12
   22.627   85.417

Rearrange:
33.832  43.2764      153.425 162.12
28.367  22.627       121.275  85.417
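A sketch of the vectorized filter gradient, reusing cols_mc (stacked im2col of the input) and dY from the sketches above; as on the slide, the d_y columns are tiled once per input channel and the result is the gradient of the rotated kernels.

```python
dY_cols_mc = np.column_stack([dY[f].flatten(order='F') for f in range(2)])  # 4x2
dW_cols_mc = cols_mc @ np.tile(dY_cols_mc, (3, 1))                          # 4x2
dW_mc = np.stack([dW_cols_mc[:, f].reshape(2, 2, order='F') for f in range(2)])
# dW_mc[0] == [[33.832, 43.2764], [28.367, 22.627]]
# (gradient of the rotated kernel; rotate it 180 degrees to update the unrotated one)
```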
