2021 01-04-learning filter-basis
Introduction
• In this paper, we try to reduce the number of parameters of CNNs by learning
a basis for the filters in convolutional layers.
• We focus on filter decomposition.
• Filter decomposition approximates the original filter with a lightweight
convolution followed by a linear projection.
Filter Decomposition
• 2D: w × h → 3 × 3 kernel size
• 3D: c × w × h → narrow networks
(Figure: illustration of the 2D w × h kernel and the 3D in_c × w × h filter shapes.)
Introduction
• The aforementioned methods are "hard" decompositions.
• We propose a novel filter basis learning method that circumvents the limitations
of the "hard" filter decomposition methods.
• We split the 3D filters along the input channel dimension, and each split is
considered a basic element.
• We assume that the ensemble of those basic elements within one
convolutional layer can be represented by linear combinations of a basis.
Contributions of this paper
1. We propose a novel basis learning method that can reduce the number of input
channels, making it eligible for narrow networks. Our method can be applied to
convolutional layers with different kernel sizes, and even to 1 × 1 convolutions.
2. Our method achieves state-of-the-art compression performance.
3. Our method generalizes easily to prior work just by changing the number of
splits, thus leading to a unified formulation of different filter decomposition
methods.
4. We validate our method on both high-level (classification) and low-level
(super-resolution) vision tasks.
Filter Decomposition for Network Compression
• input image: 𝑥 ∈ 𝑋
• label: 𝑦 ∈ 𝑌, with 𝑦 = 𝑓_𝜃(𝑥)
• 𝑊 ≈ 𝐵 ⋅ 𝐴, where 𝐵 ∈ ℜ^(𝑐𝑤ℎ×𝑚), 𝐴 ∈ ℜ^(𝑚×𝑛)
• 𝑊 ∈ ℜ^(𝑐𝑤ℎ×𝑛) = [𝑊_1, ⋯, 𝑊_𝑛]: filter-wise decomposition
• 𝑊 ∈ ℜ^(𝑤ℎ×𝑐𝑛): channel-wise decomposition
• 𝑚 < 𝑛
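To make the saving from 𝑊 ≈ 𝐵 ⋅ 𝐴 concrete, here is a small arithmetic sketch of the filter-wise parameter counts; the layer sizes below are illustrative choices, not values from the paper:

```python
# Parameter count of one conv layer before/after the filter-wise
# factorization W ≈ B·A (sizes are illustrative, not from the paper).
n, c, w, h = 64, 64, 3, 3   # output filters, input channels, kernel size
m = 16                      # number of basis filters, with m < n

original   = n * c * w * h            # full filter bank W (cwh x n)
factorized = m * c * w * h + m * n    # basis B (cwh x m) plus coefficients A (m x n)

print(original, factorized)             # 36864 10240
print(round(factorized / original, 4))  # 0.2778, i.e. m/n + m/(c*w*h)
```

The second printout is exactly the filter-wise compression rate 𝑚/𝑛 + 𝑚/(𝑐⋅𝑤⋅ℎ).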
Decomposing convolution layer with filter basis
Each 3D filter 𝑊𝑖 ∈ ℜ𝑐𝑤ℎ×1 (or 𝑊𝑖 ∈ ℜ𝑤ℎ×1 for the channel-wise decomposition case) is
represented by the linear combination of a set of 𝑚 filter basis {𝐵𝑗|𝑗 = 1, ⋯ , 𝑚} with the
coding coefficient vector 𝐴𝑖 ∈ ℜ𝑚×1:
𝑊𝑖 ≈
𝑗=1
𝑚
𝑎𝑗,𝑖𝐵𝑗 , 𝑖 = 1, ⋯ , 𝑛
where 𝐴𝑖 is the 𝑖-th column of 𝐴, 𝐵𝑗 is the 𝑗-th filter basis with dimension 𝑐𝑤ℎ × 1 or 𝑤ℎ ×
1 for the 3D filter-wise decomposition and 2D channel-wise decomposition cases,
respectively.
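As a toy illustration of this linear combination (made-up numbers, not learned values):

```python
# One filter W_i reconstructed from m = 2 basis filters:
# W_i ≈ sum_j a_i[j] * B[j], with a_i the i-th column of A (toy values).
B   = [[1.0, 0.0, 2.0],   # flattened basis filter B_1
       [0.0, 1.0, 1.0]]   # flattened basis filter B_2
a_i = [3.0, 2.0]          # coding coefficients a_{1,i}, a_{2,i}

W_i = [sum(a_i[j] * B[j][k] for j in range(len(B)))
       for k in range(len(B[0]))]
print(W_i)  # [3.0, 2.0, 8.0]
```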
Compression rate with different filter basis
• filter-wise:
Γ_filter = (𝑚⋅𝑐⋅𝑤⋅ℎ + 𝑚⋅𝑛) / (𝑛⋅𝑐⋅𝑤⋅ℎ) = 𝑚/𝑛 + 𝑚/(𝑐⋅𝑤⋅ℎ)
• channel-wise:
Γ_channel = (𝑚⋅𝑤⋅ℎ + 𝑐⋅𝑚⋅𝑛) / (𝑛⋅𝑐⋅𝑤⋅ℎ) = 𝑚/(𝑛⋅𝑐) + 𝑚/(𝑤⋅ℎ)
• split-wise (the 𝑐 input channels are divided into 𝑠 splits of 𝑝 channels each, so 𝑐 = 𝑠⋅𝑝):
Γ_split = (𝑚⋅𝑝⋅𝑤⋅ℎ + 𝑚⋅𝑛⋅𝑠) / (𝑛⋅𝑐⋅𝑤⋅ℎ) = 𝑚/(𝑛⋅𝑠) + 𝑚/(𝑝⋅𝑤⋅ℎ)
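A quick numeric check that the simplified two-term forms agree with the raw parameter ratios (the sizes are illustrative; 𝑚 denotes the number of basis elements in each scheme):

```python
# Compression rates of the three decompositions on an illustrative
# layer: n = c = 64, 3x3 kernels, m = 4 bases, s = 8 splits of p = 8 channels.
n, c, w, h, m = 64, 64, 3, 3, 4
s, p = 8, 8
assert s * p == c

dense = n * c * w * h   # parameters of the uncompressed layer

g_filter  = (m*c*w*h + m*n)   / dense
g_channel = (m*w*h + c*m*n)   / dense
g_split   = (m*p*w*h + m*n*s) / dense

# Each ratio equals its simplified two-term form.
assert abs(g_filter  - (m/n     + m/(c*w*h))) < 1e-12
assert abs(g_channel - (m/(n*c) + m/(w*h)))   < 1e-12
assert abs(g_split   - (m/(n*s) + m/(p*w*h))) < 1e-12
print(round(g_filter, 4), round(g_channel, 4), round(g_split, 4))  # 0.0694 0.4454 0.0634
```

On this layer, split-wise gives the smallest Γ, which is the motivation for choosing the split sizes carefully on the next slide.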
Compression rate with different filter basis

(𝑠*, 𝑝*) = argmin_{𝑠,𝑝} [𝑚/(𝑛⋅𝑠) + 𝑚/(𝑝⋅𝑤⋅ℎ)] subject to 𝑠⋅𝑝 = 𝑐
⇒ 𝑠* = √(𝑐⋅𝑤⋅ℎ/𝑛), 𝑝* = √(𝑛⋅𝑐/(𝑤⋅ℎ))

optimal group: 𝑠* ≈ 𝑤 × ℎ
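The minimizer can be recovered by eliminating the constraint 𝑠⋅𝑝 = 𝑐: substitute 𝑝 = 𝑐/𝑠 into Γ_split and set the derivative to zero (a reconstruction of the step the slide omits):

```latex
\begin{aligned}
g(s) &= \frac{m}{n s} + \frac{m s}{c w h} \qquad (p = c/s)\\
g'(s) &= -\frac{m}{n s^{2}} + \frac{m}{c w h} = 0
\;\Rightarrow\; s^{*} = \sqrt{\frac{c w h}{n}},\qquad
p^{*} = \frac{c}{s^{*}} = \sqrt{\frac{n c}{w h}}
\end{aligned}
```

For example, when 𝑛 ≈ 𝑐 this gives 𝑠* ≈ √(𝑤⋅ℎ).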
Implementing with convolution
𝑊_𝑖, 𝐵_𝑗 ∈ ℜ^(𝑐×𝑤×ℎ)

𝑥 ∗ 𝑊_𝑖 = 𝑥 ∗ Σ_{𝑗=1}^{𝑚} 𝑎_{𝑗,𝑖} 𝐵_𝑗 = Σ_{𝑗=1}^{𝑚} 𝑎_{𝑗,𝑖} (𝑥 ∗ 𝐵_𝑗)
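The identity on this slide is just linearity of convolution; a minimal pure-Python check, with 1D toy signals standing in for the 3D tensors:

```python
# Linearity of convolution: x * sum_j(a_j * B_j) == sum_j(a_j * (x * B_j)).
# Toy 1D signals/filters stand in for the 3D case (illustrative only).

def conv1d(x, k):
    """Valid-mode 1D convolution (no padding)."""
    return [sum(x[i + t] * k[t] for t in range(len(k)))
            for i in range(len(x) - len(k) + 1)]

x = [1.0, 2.0, 3.0, 4.0]
B = [[1.0, -1.0], [0.5, 0.5]]   # two basis filters
a = [2.0, 4.0]                  # coefficients a_{j,i} for one filter W_i

# Combine first, then convolve ...
W_i = [a[0] * B[0][t] + a[1] * B[1][t] for t in range(2)]
lhs = conv1d(x, W_i)
# ... versus convolve each basis, then combine.
rhs = [a[0] * u + a[1] * v for u, v in zip(conv1d(x, B[0]), conv1d(x, B[1]))]
print(lhs == rhs, lhs)  # True [4.0, 8.0, 12.0]
```

In practice this means the 𝑚 basis convolutions can be computed once and shared by all 𝑛 filters, with the combination implemented as a 1 × 1 convolution.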
Filter basis decomposition for special filter sizes
• 1 × 1 convolution case
When the input/output channels are quite large, considerable parameters and
computation are consumed by 1 × 1 convolutions.
• 𝑐 ≫ 𝑛 > 𝑚 case
The number of output channels is smaller than the number of input channels.
Learning Filter Basis
General filter basis learning approach
• 𝑙-th layer
• 𝑓_{𝐵,𝐴|𝜃}(⋅): the CNN parameterized by the basis and coding matrices
After the basis and the coding matrices {𝐵, 𝐴} have been learned, there is no need
to store the original filters.
During inference, 𝐵 and 𝐴 are used as the weight parameters of the lightweight
convolution and the 1 × 1 convolution, respectively.
CIFAR10
Task: image classification
• M: number of bases
• T: number of transition layers
Interestingly, although our 'M38T12' model uses two more bases than 'M36T6',
its error rate rises slightly.
This is because 'M38T12' uses an aggressive compression, i.e., s = 12, in the
transition block.
Therefore, the compression degrees of the DenseBlock and the transition block
should be balanced to obtain the best trade-off between compression ratio and
accuracy.