2021 01-04-learning filter-basis
Introduction
• In this paper, we try to reduce the number of parameters of CNNs by learning
a basis for the filters in convolutional layers.
• We focus on filter decomposition.
• Filter decomposition approximates the original filter with a lightweight
convolution followed by a linear projection.
Filter Decomposition
• 2D: w × h → 3 × 3 kernel size
• 3D: c × w × h → narrow networks
(Figure: illustration of the 2D w × h kernel and the 3D in_c × w × h filter shapes.)
Introduction
• The aforementioned methods are "hard" decompositions.
• We propose a novel filter basis learning method that circumvents the limitations
of the "hard" filter decomposition methods.
• We split the 3D filters along the input channel dimension, and each split is
considered a basic element.
• We assume that the ensemble of those basic elements within one
convolutional layer can be represented by linear combinations of a basis.
Contributions of this paper
1. We propose a novel basis learning method that can reduce the number of input
channels, making it eligible for narrow networks. Our method can be applied to
convolutional layers with different kernel sizes, and even to 1 × 1 convolutions.
2. Our method achieves state-of-the-art compression performance.
3. Our method generalizes easily to prior work just by changing the number of
splits, thus leading to a unified formulation of different filter decomposition
methods.
4. We validate our method on both high-level (classification) and low-level
(super-resolution) vision tasks.
Filter Decomposition for Network Compression
• input image: 𝑥 ∈ 𝑋
• label: 𝑦 ∈ 𝑌, with 𝑦 = 𝑓_𝜃(𝑥)
• 𝑊 ≈ 𝐵 ⋅ 𝐴, where 𝐵 ∈ ℜ^(𝑐𝑤ℎ×𝑚), 𝐴 ∈ ℜ^(𝑚×𝑛)
• 𝑊 ∈ ℜ^(𝑐𝑤ℎ×𝑛) = [𝑊_1, ⋯, 𝑊_𝑛]: filter-wise decomposition
• 𝑊 ∈ ℜ^(𝑤ℎ×𝑐𝑛): channel-wise decomposition
• 𝑚 < 𝑛
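To make the saving from 𝑊 ≈ 𝐵 ⋅ 𝐴 concrete, here is a small arithmetic sketch of the filter-wise parameter counts; the layer sizes below are illustrative choices, not values from the paper:

```python
# Parameter count of one conv layer before/after the filter-wise
# factorization W ≈ B·A (sizes are illustrative, not from the paper).
n, c, w, h = 64, 64, 3, 3   # output filters, input channels, kernel size
m = 16                      # number of basis filters, with m < n

original   = n * c * w * h            # full filter bank W (cwh x n)
factorized = m * c * w * h + m * n    # basis B (cwh x m) plus coefficients A (m x n)

print(original, factorized)             # 36864 10240
print(round(factorized / original, 4))  # 0.2778, i.e. m/n + m/(c*w*h)
```

The second printout is exactly the filter-wise compression rate 𝑚/𝑛 + 𝑚/(𝑐⋅𝑤⋅ℎ).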
Decomposing convolution layer with filter basis
Each 3D filter 𝑊𝑖 ∈ ℜ𝑐𝑤ℎ×1 (or 𝑊𝑖 ∈ ℜ𝑤ℎ×1 for the channel-wise decomposition case) is
represented by the linear combination of a set of 𝑚 filter basis {𝐵𝑗|𝑗 = 1, ⋯ , 𝑚} with the
coding coefficient vector 𝐴𝑖 ∈ ℜ𝑚×1:
𝑊𝑖 ≈
𝑗=1
𝑚
𝑎𝑗,𝑖𝐵𝑗 , 𝑖 = 1, ⋯ , 𝑛
where 𝐴𝑖 is the 𝑖-th column of 𝐴, 𝐵𝑗 is the 𝑗-th filter basis with dimension 𝑐𝑤ℎ × 1 or 𝑤ℎ ×
1 for the 3D filter-wise decomposition and 2D channel-wise decomposition cases,
respectively.
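As a toy illustration of this linear combination (made-up numbers, not learned values):

```python
# One filter W_i reconstructed from m = 2 basis filters:
# W_i ≈ sum_j a_i[j] * B[j], with a_i the i-th column of A (toy values).
B   = [[1.0, 0.0, 2.0],   # flattened basis filter B_1
       [0.0, 1.0, 1.0]]   # flattened basis filter B_2
a_i = [3.0, 2.0]          # coding coefficients a_{1,i}, a_{2,i}

W_i = [sum(a_i[j] * B[j][k] for j in range(len(B)))
       for k in range(len(B[0]))]
print(W_i)  # [3.0, 2.0, 8.0]
```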
Compression rate with different filter basis
• filter-wise:
Γ_filter = (𝑚⋅𝑐⋅𝑤⋅ℎ + 𝑚⋅𝑛) / (𝑛⋅𝑐⋅𝑤⋅ℎ) = 𝑚/𝑛 + 𝑚/(𝑐⋅𝑤⋅ℎ)
• channel-wise:
Γ_channel = (𝑚⋅𝑤⋅ℎ + 𝑐⋅𝑚⋅𝑛) / (𝑛⋅𝑐⋅𝑤⋅ℎ) = 𝑚/(𝑛⋅𝑐) + 𝑚/(𝑤⋅ℎ)
• split-wise (the 𝑐 input channels are divided into 𝑠 splits of 𝑝 channels each, so 𝑐 = 𝑠⋅𝑝):
Γ_split = (𝑚⋅𝑝⋅𝑤⋅ℎ + 𝑚⋅𝑛⋅𝑠) / (𝑛⋅𝑐⋅𝑤⋅ℎ) = 𝑚/(𝑛⋅𝑠) + 𝑚/(𝑝⋅𝑤⋅ℎ)
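A quick numeric check that the simplified two-term forms agree with the raw parameter ratios (the sizes are illustrative; 𝑚 denotes the number of basis elements in each scheme):

```python
# Compression rates of the three decompositions on an illustrative
# layer: n = c = 64, 3x3 kernels, m = 4 bases, s = 8 splits of p = 8 channels.
n, c, w, h, m = 64, 64, 3, 3, 4
s, p = 8, 8
assert s * p == c

dense = n * c * w * h   # parameters of the uncompressed layer

g_filter  = (m*c*w*h + m*n)   / dense
g_channel = (m*w*h + c*m*n)   / dense
g_split   = (m*p*w*h + m*n*s) / dense

# Each ratio equals its simplified two-term form.
assert abs(g_filter  - (m/n     + m/(c*w*h))) < 1e-12
assert abs(g_channel - (m/(n*c) + m/(w*h)))   < 1e-12
assert abs(g_split   - (m/(n*s) + m/(p*w*h))) < 1e-12
print(round(g_filter, 4), round(g_channel, 4), round(g_split, 4))  # 0.0694 0.4454 0.0634
```

On this layer, split-wise gives the smallest Γ, which is the motivation for choosing the split sizes carefully on the next slide.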
Compression rate with different filter basis

(𝑠*, 𝑝*) = argmin_{𝑠,𝑝} [𝑚/(𝑛⋅𝑠) + 𝑚/(𝑝⋅𝑤⋅ℎ)] subject to 𝑠⋅𝑝 = 𝑐
⇒ 𝑠* = √(𝑐⋅𝑤⋅ℎ/𝑛), 𝑝* = √(𝑛⋅𝑐/(𝑤⋅ℎ))

optimal group: 𝑠* ≈ 𝑤 × ℎ
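The minimizer can be recovered by eliminating the constraint 𝑠⋅𝑝 = 𝑐: substitute 𝑝 = 𝑐/𝑠 into Γ_split and set the derivative to zero (a reconstruction of the step the slide omits):

```latex
\begin{aligned}
g(s) &= \frac{m}{n s} + \frac{m s}{c w h} \qquad (p = c/s)\\
g'(s) &= -\frac{m}{n s^{2}} + \frac{m}{c w h} = 0
\;\Rightarrow\; s^{*} = \sqrt{\frac{c w h}{n}},\qquad
p^{*} = \frac{c}{s^{*}} = \sqrt{\frac{n c}{w h}}
\end{aligned}
```

For example, when 𝑛 ≈ 𝑐 this gives 𝑠* ≈ √(𝑤⋅ℎ).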
Implementing with convolution
𝑊_𝑖, 𝐵_𝑗 ∈ ℜ^(𝑐×𝑤×ℎ)

𝑥 ∗ 𝑊_𝑖 = 𝑥 ∗ Σ_{𝑗=1}^{𝑚} 𝑎_{𝑗,𝑖} 𝐵_𝑗 = Σ_{𝑗=1}^{𝑚} 𝑎_{𝑗,𝑖} (𝑥 ∗ 𝐵_𝑗)
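The identity on this slide is just linearity of convolution; a minimal pure-Python check, with 1D toy signals standing in for the 3D tensors:

```python
# Linearity of convolution: x * sum_j(a_j * B_j) == sum_j(a_j * (x * B_j)).
# Toy 1D signals/filters stand in for the 3D case (illustrative only).

def conv1d(x, k):
    """Valid-mode 1D convolution (no padding)."""
    return [sum(x[i + t] * k[t] for t in range(len(k)))
            for i in range(len(x) - len(k) + 1)]

x = [1.0, 2.0, 3.0, 4.0]
B = [[1.0, -1.0], [0.5, 0.5]]   # two basis filters
a = [2.0, 4.0]                  # coefficients a_{j,i} for one filter W_i

# Combine first, then convolve ...
W_i = [a[0] * B[0][t] + a[1] * B[1][t] for t in range(2)]
lhs = conv1d(x, W_i)
# ... versus convolve each basis, then combine.
rhs = [a[0] * u + a[1] * v for u, v in zip(conv1d(x, B[0]), conv1d(x, B[1]))]
print(lhs == rhs, lhs)  # True [4.0, 8.0, 12.0]
```

In practice this means the 𝑚 basis convolutions can be computed once and shared by all 𝑛 filters, with the combination implemented as a 1 × 1 convolution.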
Filter basis decomposition for special filter sizes
• 1 × 1 convolution case
When the input/output channels are quite large, considerable parameters and
computation are consumed by 1 × 1 convolutions.
• 𝑐 ≫ 𝑛 > 𝑚 case
The number of output channels is smaller than the number of input channels.
Learning Filter Basis
General filter basis learning approach
• 𝑙-th layer
• 𝑓_{𝐵,𝐴|𝜃}(⋅): the CNN parameterized by the basis and coding matrices
After the basis and the coding matrices {𝐵, 𝐴} have been learned, there is no need
to store the original filters.
During inference, 𝐵 and 𝐴 are used as the weight parameters of the lightweight
convolution and the 1 × 1 convolution, respectively.
CIFAR10
Task: image classification
• M: number of bases
• T: number of transition layers
Interestingly, although our 'M38T12' model uses two more bases than 'M36T6',
its error rate rises slightly.
This is because 'M38T12' uses an aggressive compression, i.e., s = 12, in the
transition block.
Therefore, the compression degrees of the DenseBlock and the transition block
should be balanced to obtain the best trade-off between compression ratio and
accuracy.