Convolution as matrix multiplication
• Edwin Efraín Jiménez Lepe
FeedForward (applying kernel rotation)

Input (3x3):
16 24 32
47 18 26
68 12  9

Kernels (2x2):
W1:  0  1      W2:  2  3
    -1  0           4  5

Input ∗ [W1, W2] becomes a single matrix product. Each row of Im2col(input)
is one 2x2 patch; each column of the weight matrix is a rotated kernel:

Im2col (input):      Rotated kernels (W1, W2):
16 47 24 18           0  5
47 68 18 12     x     1  3
24 18 32 26          -1  4
18 12 26  9           0  2

=   23 353
    50 535
   -14 354
   -14 248

Rearrange into the two 2x2 output channels:
 23 -14       353 354
 50 -14       535 248
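A minimal NumPy sketch of this step (not the author's original code). It assumes column-major (order='F') flattening of patches and output positions, which reproduces the intermediate matrices shown above.

```python
import numpy as np

# Input and the two 2x2 kernels from the slide
X  = np.array([[16., 24., 32.],
               [47., 18., 26.],
               [68., 12.,  9.]])
W1 = np.array([[ 0., 1.],
               [-1., 0.]])
W2 = np.array([[ 2., 3.],
               [ 4., 5.]])

def im2col(x, k):
    """One row per k x k patch; patches and pixels enumerated column-major."""
    oh, ow = x.shape[0] - k + 1, x.shape[1] - k + 1
    cols = np.empty((oh * ow, k * k))
    for j in range(ow):
        for i in range(oh):
            cols[j * oh + i] = x[i:i + k, j:j + k].flatten(order='F')
    return cols

cols = im2col(X, 2)                                    # the 4x4 "Im2col (input)"
# "Applying kernel rotation": convolution flips each kernel by 180 degrees
Wmat = np.column_stack([np.rot90(W1, 2).flatten(order='F'),
                        np.rot90(W2, 2).flatten(order='F')])    # 4x2
out = cols @ Wmat        # [[23, 353], [50, 535], [-14, 354], [-14, 248]]
# Rearrange each column back into a 2x2 output map
y = np.stack([out[:, c].reshape(2, 2, order='F') for c in range(2)])
```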
FeedForward, now with bias

Same product as before, plus a per-filter bias b = [1, 0]: a column of ones
times the bias row adds 1 to every entry of the first output column and 0 to
the second.

Im2col (input) x [W1, W2] + bias:
 23 353      1 0      24 353
 50 535  +   1 0  =   51 535
-14 354      1 0     -13 354
-14 248      1 0     -13 248

Rearrange:
 24 -13       353 354
 51 -13       535 248
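Continuing the sketch above (it reuses cols and Wmat), the bias is just one scalar per filter broadcast over all output positions before the rearrange step:

```python
b = np.array([1., 0.])       # bias for W1 and W2, taken from the slide
out_b = cols @ Wmat + b      # first column becomes [24, 51, -13, -13]
y_b = np.stack([out_b[:, c].reshape(2, 2, order='F') for c in range(2)])
```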
BackPropagation: d_w = input ∗ d_y

The gradient arriving from the layer above has one 2x2 channel per filter:

d_y (channel 1):            d_y (channel 2):
 0               0           0              0
-2.94504954e-05  0           6.39539432e-06 0

d_w is the product of Im2col(input) with the d_y channels stacked as columns
(same position ordering as the Im2col rows):

Im2col (input):      Im2col (d_y):
16 47 24 18           0                0
47 68 18 12     x    -2.94504954e-05   6.39539432e-06
24 18 32 26           0                0
18 12 26  9           0                0

=  -1.38417328e-03   3.00583533e-04
   -2.00263369e-03   4.34886814e-04
   -5.30108917e-04   1.15117098e-04
   -3.53405945e-04   7.67447318e-05

Rearrange (d_w for W1, and similarly for W2):
-1.38417328e-03  -5.30108917e-04
-2.00263369e-03  -3.53405945e-04

The update corresponds to the rotated kernel.
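A sketch of the same gradient in NumPy, reusing im2col and cols from the forward-pass sketch. The general formula is im2col(input).T times the d_y columns; for this 3x3 input and 2x2 kernel, im2col(input) happens to be symmetric, so it matches the product shown above.

```python
dY1 = np.array([[ 0.,              0.],
                [-2.94504954e-05,  0.]])   # gradient w.r.t. output channel 1
dY2 = np.array([[ 0.,              0.],
                [ 6.39539432e-06,  0.]])   # gradient w.r.t. output channel 2
dY_cols = np.column_stack([dY1.flatten(order='F'),
                           dY2.flatten(order='F')])        # 4x2

dW_cols = cols.T @ dY_cols   # gradients of the ROTATED kernels, one per column
# Undo the rotation before updating W1 and W2
dW = np.stack([np.rot90(dW_cols[:, c].reshape(2, 2, order='F'), 2)
               for c in range(2)])
```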
BackPropagation: d_x = d_y ∗ w (without rotation)

d_y (channel 1):            d_y (channel 2):
 0               0           0              0
-2.94504954e-05  0           6.39539432e-06 0

To propagate the error back to the input we need a full convolution, keeping
the kernel unrotated. Pad each d_y channel with one border of zeros:

0  0               0  0      0  0              0  0
0  0               0  0      0  0              0  0
0 -2.94504954e-05  0  0      0  6.39539432e-06 0  0
0  0               0  0      0  0              0  0

Im2col of each padded channel, transposed and multiplied by the unrotated
kernel flattened column-major (W1 -> [0, -1, 1, 0], W2 -> [2, 4, 3, 5]):

Channel 1:
0 0  0        0  0        -2.94e-05 0 0 0
0 0  0        0 -2.94e-05  0        0 0 0
0 0 -2.94e-05 0  0         0        0 0 0
0 -2.94e-05 0 0  0         0        0 0 0

Channel 2:
0 0 0         0 0         6.395e-06 0 0 0
0 0 0         0 6.395e-06 0         0 0 0
0 0 6.395e-06 0 0         0         0 0 0
0 6.395e-06 0 0 0         0         0 0 0

Per-channel results (9 output positions each):
Channel 1: 0, 0, -0.2945e-04, 0, 0.2945e-04, 0, 0, 0, 0
Channel 2: 0, 0.3198e-04, 0.1919e-04, 0, 0.2558e-04, 0.1279e-04, 0, 0, 0

Sum:       0, 0.3198e-04, -0.1026e-04, 0, 0.5503e-04, 0.1279e-04, 0, 0, 0

Reshape back to the 3x3 input shape:
 0           0           0
 0.3198e-04  0.5503e-04  0
-0.1026e-04  0.1279e-04  0

In fact, we can do it in just one operation by stacking both Im2col matrices
and both kernel columns. Notice that every channel of delta is multiplied by
the corresponding filter that generated it.
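A sketch of the input gradient, reusing im2col, W1, W2, dY1 and dY2 from the sketches above: pad each d_y channel, im2col it, multiply by the unrotated kernel columns, and sum the two contributions in one product.

```python
# 9x8: im2col of both zero-padded d_y channels, side by side
cols_d = np.hstack([im2col(np.pad(dY1, 1), 2),
                    im2col(np.pad(dY2, 1), 2)])
# length-8 column: both UNROTATED kernels, flattened column-major
w_unrot = np.concatenate([W1.flatten(order='F'), W2.flatten(order='F')])
dX = (cols_d @ w_unrot).reshape(3, 3, order='F')
# dX == [[0, 0, 0],
#        [0.3198e-04, 0.5503e-04, 0],
#        [-0.1026e-04, 0.1279e-04, 0]]   (up to the rounding used on the slide)
```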
A multi-channel example

Input (3,3,3):
Channel 1:     Channel 2:     Channel 3:
16 24 32       26 57 43       18 47 21
47 18 26       24 21 12        4  6 12
68 12  9        2 11 19       81 22 13

Filters (2,3,2,2), one 2x2 kernel per input channel:
Filter 1:   0  1  |  2  3  |  -2 68
           -1  0  |  4  5  |  24 16
Filter 2:  18 32  | 23  7  |  42 20
           22 60  | 46 35  |  81 78

Output (2,2,2), applying Theano convolution (which rotates the filters
automatically):
2171  2170      13042 13575
5954  2064      11023  6425
A multi-channel example (vectorized)

Stack the Im2col of each input channel (shown transposed, 12x4) and multiply
by the rotated filters flattened as columns (12x2):

16 47 24 18          0  60
47 68 18 12          1  32
24 18 32 26         -1  22
18 12 26  9          0  18
26 24 57 21          5  35
24  2 21 11    T     3   7
57 21 43 12    x     4  46
21 11 12 19          2  23
18  4 47  6         16  78
 4 81  6 22         68  20
47  6 21 12         24  81
 6 22 12 13         -2  42

=  2171 13042
   5954 11023
   2170 13575
   2064  6425

Rearrange (channel 1, channel 2):
2171  2170      13042 13575
5954  2064      11023  6425
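The same vectorization in NumPy for the multi-channel case, reusing im2col from the first sketch. The per-filter channel grouping below is reconstructed from the 12x2 weight matrix shown on the slide.

```python
Xc = np.array([[[16., 24., 32.], [47., 18., 26.], [68., 12.,  9.]],
               [[26., 57., 43.], [24., 21., 12.], [ 2., 11., 19.]],
               [[18., 47., 21.], [ 4.,  6., 12.], [81., 22., 13.]]])   # (3,3,3)

filters = np.array([[[[ 0.,  1.], [-1.,  0.]],    # filter 1, channels 1..3
                     [[ 2.,  3.], [ 4.,  5.]],
                     [[-2., 68.], [24., 16.]]],
                    [[[18., 32.], [22., 60.]],    # filter 2, channels 1..3
                     [[23.,  7.], [46., 35.]],
                     [[42., 20.], [81., 78.]]]])                       # (2,3,2,2)

cols_mc = np.hstack([im2col(Xc[c], 2) for c in range(3)])              # 4x12
Wmat_mc = np.column_stack(
    [np.concatenate([np.rot90(filters[f, c], 2).flatten(order='F')
                     for c in range(3)])
     for f in range(2)])                                               # 12x2
out_mc = cols_mc @ Wmat_mc  # [[2171, 13042], [5954, 11023], [2170, 13575], [2064, 6425]]
y_mc = np.stack([out_mc[:, f].reshape(2, 2, order='F') for f in range(2)])  # (2,2,2)
```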
Backpropagation: d_y-1 = d_y ∗ w (without rotation)

Imagine we receive the following error from the layer above and want to
propagate it back to the input of the convolution:

d_y (2,2,2):
.1678 .098      0.5  0.67
.002  .246      0.21 0.487

We need to compute d_y ∗ w without rotation, but as a 'full' convolution, so
we pad d_y with one border of zeros:

0 0     0    0      0 0    0    0
0 .1678 .098 0      0 0.5  .67  0
0 .002  .246 0      0 .21  .487 0
0 0     0    0      0 0    0    0

Im2col of both padded channels (shown transposed, 8x9):

0     0     0    0     .1678 .002 0    .098 .246
0     0     0    .1678 .002  0    .098 .246 0
0     .1678 .002 0     .098  .246 0    0    0
.1678 .002  0    .098  .246  0    0    0    0
0     0     0    0     .5    .21  0    .67  .487
0     0     0    .5    .21   0    .67  .487 0
0     .5    .21  0     .67   .487 0    0    0
.5    .21   0    .67   .487  0    0    0    0

Notice that every channel of delta is multiplied by the corresponding filter
that generated it: the weight matrix has one column per input channel, and
each column stacks that channel of Filter 1 and Filter 2 (unrotated):

 0   2  -2
-1   4  24
 1   3  68
 0   5  16
18  23  42
22  46  81
32   7  20
60  35  78

Transposing the Im2col matrix and multiplying:

30      18.339  41.6848
28.7678 11.3634 37.8224
 6.722   1.476   4.336
51.0322 47.6112 98.3552
64.376  44.7626 99.7084
19.61    8.981  35.284
14.642  31.212  56.622
22.528  38.992  73.295
 8.766  11.693  19.962

Rearrange into the three 3x3 input channels:

30      51.0322 14.642     18.339  47.6112 31.212     41.6848 98.3552 56.622
28.7678 64.376  22.528     11.3634 44.7626 38.992     37.8224 99.7084 73.295
 6.722  19.61    8.766      1.476   8.981  11.693      4.336  35.284  19.962
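A sketch of this step, reusing im2col and filters from the sketches above: pad both d_y channels, im2col them side by side, and multiply by one column per input channel built from the unrotated filter channels.

```python
dY = np.array([[[.1678, .098], [.002, .246]],
               [[.5,    .67 ], [.21,  .487]]])          # (2,2,2), from the slide

cols_full = np.hstack([im2col(np.pad(dY[f], 1), 2) for f in range(2)])   # 9x8
Wback = np.column_stack(
    [np.concatenate([filters[f, c].flatten(order='F') for f in range(2)])
     for c in range(3)])                                                 # 8x3
dX_mc = np.stack([(cols_full @ Wback)[:, c].reshape(3, 3, order='F')
                  for c in range(3)])                                    # (3,3,3)
# dX_mc[0] == [[30, 51.0322, 14.642], [28.7678, 64.376, 22.528], [6.722, 19.61, 8.766]]
```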
Backpropagation (non-vectorized, full convolution): d_y-1 = d_y ∗ w (without rotation)

The same result without Im2col: take the (2,3,2,2) filter bank and transpose
dimensions 0 and 1 to get a (3,2,2,2) bank, i.e. three "filters" (one per
input channel), each containing the corresponding channel of Filter 1 and
Filter 2:

d_y (2,2,2):
.1678 .098      0.5  0.67
.002  .246      0.21 0.487

Transposed filters (3,2,2,2):
Filter 1:   0  1  | 18 32
           -1  0  | 22 60
Filter 2:   2  3  | 23  7
            4  5  | 46 35
Filter 3:  -2 68  | 42 20
           24 16  | 81 78

A full convolution of d_y with these filters (kept unrotated) gives:

30      51.0322 14.642     18.339  47.6112 31.212     41.6848 98.3552 56.622
28.7678 64.376  22.528     11.3634 44.7626 38.992     37.8224 99.7084 73.295
 6.722  19.61    8.766      1.476   8.981  11.693      4.336  35.284  19.962
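The same quantity computed directly, as on these slides: swap dimensions 0 and 1 of the filter bank and run a plain full correlation (no kernel rotation) over the padded d_y. This reuses filters and dY from the sketches above.

```python
filters_T = filters.transpose(1, 0, 2, 3)                  # (2,3,2,2) -> (3,2,2,2)
dY_pad = np.stack([np.pad(dY[f], 1) for f in range(2)])    # (2,4,4)

dX_full = np.zeros((3, 3, 3))
for c in range(3):            # one input-gradient channel per transposed "filter"
    for i in range(3):
        for j in range(3):
            dX_full[c, i, j] = np.sum(dY_pad[:, i:i + 2, j:j + 2] * filters_T[c])
# dX_full matches dX_mc from the im2col version above
```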
Backpropagation: d_w = input ∗ d_y

Input (3,3,3) ∗ d_y (2,2,2): the dimensions do not match, which tells us we
need to apply both channels of d_y to every channel of the input (the input
channels are the ones from the multi-channel example):

d_y:
.1678 .098      0.5  0.67
.002  .246      0.21 0.487

Input channel 1 ∗ d_y:   Input channel 2 ∗ d_y:   Input channel 3 ∗ d_y:
 9.5588 13.5952          15.1628 16.7726           9.1104 12.9086
12.7386  7.8064           8.7952  9.3958           6.8332  5.4248

42.716  49.882           66.457  67.564           44.252  44.674
55.684  33.323           31.847  30.103           33.744  21.991

Adding the three contributions:
33.832  43.2764      153.425 162.12
28.367  22.627       121.275  85.417

This error is associated with the rotated kernel, which means we need to
rotate this result to update the unrotated kernel.
Backpropagation vectorized: d_w = input ∗ d_y (without rotating d_y)

The same computation as a single matrix product: the stacked Im2col of the
input (transposed) times the d_y channels stacked as columns and repeated
once per input channel:

16 47 24 18          .1678 0.5
47 68 18 12          .002  0.21
24 18 32 26          .098  0.67
18 12 26  9          .246  0.487
26 24 57 21          .1678 0.5
24  2 21 11    T     .002  0.21
57 21 43 12    x     .098  0.67
21 11 12 19          .246  0.487
18  4 47  6          .1678 0.5
 4 81  6 22          .002  0.21
47  6 21 12          .098  0.67
 6 22 12 13          .246  0.487

=  33.832  153.425
   28.367  121.275
   43.2764 162.12
   22.627   85.417

Rearrange:
33.832  43.2764      153.425 162.12
28.367  22.627       121.275  85.417
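A sketch of the vectorized filter gradient, reusing cols_mc (stacked im2col of the input) and dY from the sketches above; as on the slide, the d_y columns are tiled once per input channel and the result is the gradient of the rotated kernels.

```python
dY_cols_mc = np.column_stack([dY[f].flatten(order='F') for f in range(2)])  # 4x2
dW_cols_mc = cols_mc @ np.tile(dY_cols_mc, (3, 1))                          # 4x2
dW_mc = np.stack([dW_cols_mc[:, f].reshape(2, 2, order='F') for f in range(2)])
# dW_mc[0] == [[33.832, 43.2764], [28.367, 22.627]]
# (gradient of the rotated kernel; rotate it 180 degrees to update the unrotated one)
```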
