Data-driven AI Security HCI (DASH) Lab
Pruning Filter in Filter
Minha Kim
Department of Software, Sungkyunkwan University
NeurIPS 2020
May 6, 2020
Pruning?
[NIPS 2015] Learning both Weights and Connections for Efficient Neural Networks, https://arxiv.org/abs/1506.02626
Pruning: removing weights from a neural network model
Weight Pruning
Weight Pruning (WP) prunes individual weights within each filter.
It removes redundant neurons iteratively.
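As a rough illustration (a minimal sketch, not the exact procedure of the cited paper), one-shot magnitude-based weight pruning in PyTorch looks like this; the function name and threshold choice are mine:

import torch
import torch.nn as nn

def magnitude_prune(layer: nn.Conv2d, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude weights; return the binary mask."""
    w = layer.weight.data
    k = max(1, int(sparsity * w.numel()))             # number of weights to drop
    threshold = w.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
    mask = (w.abs() > threshold).float()              # 0 where pruned
    layer.weight.data.mul_(mask)
    return mask  # iterative WP would reapply this mask after every update

In the iterative scheme of the slide, prune-retrain cycles repeat this with a growing sparsity target.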
Pruning?
C : number of channels
N : number of output channels
W, H : width / height
Filter/Channel Pruning
Filter/Channel Pruning (FP) prunes at the level of whole filters or channels.
It removes a much larger region at once than weight pruning does.
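As a hedged sketch (a common FP criterion based on filter L1 norms, not necessarily the exact criterion of the methods summarized here), whole-filter pruning can be written as:

import torch
import torch.nn as nn

def prune_filters(conv: nn.Conv2d, keep_ratio: float = 0.75) -> nn.Conv2d:
    """Keep the filters with the largest L1 norms; rebuild a smaller conv."""
    w = conv.weight.data                         # shape (N, C, K, K)
    scores = w.abs().sum(dim=(1, 2, 3))          # L1 norm of each filter
    n_keep = max(1, int(keep_ratio * w.size(0)))
    keep = scores.topk(n_keep).indices.sort().values
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = w[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned  # the next layer's input channels must shrink to match

Because entire output channels disappear, the pruned network stays dense and hardware friendly, which is the appeal of FP over WP.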
Pruning?
C : number of channels
N : number of output channels
W, H : width / height
Group Pruning
• Group Pruning (GP) prunes the weights at the same position across all filters, breaking the independence assumption on the filters.
• Weights at the same position in different filters can have very different importance.
• The network may therefore lose representation ability under a large pruning ratio.
Abstract
Background
• Model deployment is sometimes costly due to the large number of parameters in DNNs.
• To address this, 'pruning' is one family of model compression algorithms:
  Filter Pruning (FP), Channel Pruning (CP), Weight Pruning (WP), Group Pruning (GP), ...
Problem
• However, these methods can lose important information by pruning weights at the same position in every filter.
• The authors ask whether the optimal kernel size of each filter can be learned through pruning.
Abstract
Solution
• To combine the strengths of filter pruning and weight pruning, they propose Stripe-Wise Pruning (SWP) with a Filter Skeleton (FS).
• To learn the 'filter shape' alongside the filter weights, they propose the 'Filter Skeleton (FS)'.
• SWP treats each filter as K × K stripes and prunes stripes instead of whole filters.
• It achieves finer granularity than traditional FP while remaining hardware friendly.
Proposed Method – Filter Skeleton (FS)
C : number of channels
N : number of output channels
W, H : width / height
Stripe Pruning (their proposed method)
It keeps each filter independent of the others,
and thus can lead to a more efficient network structure.
Proposed Method – Filter Skeleton (FS)
N : number of filters
C : channels
K : kernel size
I : Filter Skeleton
H : feature map height
W : feature map width
Filter Skeleton (FS)
• A learnable matrix that reflects the shape of each filter
• All values of the FS are initialized to 1
• For the l-th convolutional layer's weight W ∈ ℝ^(N×C×K×K), the FS is I ∈ ℝ^(N×K×K)
• Each filter has one FS
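A minimal PyTorch sketch of this idea (FSConv2d is my own naming, not the authors' code), assuming the straightforward implementation in which the FS is broadcast over the channel dimension and multiplied into the weights:

import torch
import torch.nn as nn
import torch.nn.functional as F

class FSConv2d(nn.Module):
    """Convolution whose weight is masked by a learnable Filter Skeleton."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, padding: int = 1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.skeleton = nn.Parameter(torch.ones(out_ch, k, k))  # FS, init to 1
        self.padding = padding

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight: I[n, i, j] * W[n, c, i, j]  (FS broadcast over c)
        w = self.weight * self.skeleton.unsqueeze(1)
        return F.conv2d(x, w, padding=self.padding)

Training such a layer lets the gradient shape each filter through I while W learns the usual weights.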
Proposed Method – Filter Skeleton (FS)
● Filter Skeleton (FS)
○ Loss function with the FS multiplied into the weights (eqs. (1)–(2))
○ Gradients of W (weights) and I (Filter Skeleton) (eqs. (3)–(4))
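The equations themselves appear only as images in the slides. Based on the paper's formulation, a plausible LaTeX reconstruction (a sketch, not a verbatim copy) is:

% Forward pass with the FS multiplied into the weights (eqs. (1)-(2)):
X^{l+1}_{n,h,w} = \sum_{c=1}^{C}\sum_{i=1}^{K}\sum_{j=1}^{K}
    \bigl(I^{l}_{n,i,j}\, W^{l}_{n,c,i,j}\bigr)\,
    X^{l}_{c,\; h+i-\lceil K/2\rceil,\; w+j-\lceil K/2\rceil}

% Gradients w.r.t. W and I (eqs. (3)-(4)): the gradient of the effective
% weight is routed to each of its two factors:
\frac{\partial L}{\partial W^{l}_{n,c,i,j}}
  = \sum_{h,w}\frac{\partial L}{\partial X^{l+1}_{n,h,w}}\,
    I^{l}_{n,i,j}\, X^{l}_{c,\; h+i-\lceil K/2\rceil,\; w+j-\lceil K/2\rceil}

\frac{\partial L}{\partial I^{l}_{n,i,j}}
  = \sum_{c}\sum_{h,w}\frac{\partial L}{\partial X^{l+1}_{n,h,w}}\,
    W^{l}_{n,c,i,j}\, X^{l}_{c,\; h+i-\lceil K/2\rceil,\; w+j-\lceil K/2\rceil}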
Proposed Method – Filter Skeleton (FS)
• Each mini-figure shows one of the 9 stripes of a 3×3 filter.
• X-axis: all the filters (N) in the layer.
• Y-axis: the magnitude (summed over the channel dimension) of the stripe at that position, for each filter.
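The plotted quantity can be computed directly from a layer's weights; a short sketch (stripe_l1 is my helper name, assuming weights of shape (N, C, K, K)):

import torch

def stripe_l1(weight: torch.Tensor) -> torch.Tensor:
    """(N, C, K, K) weights -> (N, K, K): L1 norm of each stripe.
    For the mini-figure at position (i, j), plot stripe_l1(w)[:, i, j]
    across all N filters."""
    return weight.abs().sum(dim=1)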
Proposed Method – Stripe-wise pruning (SWP)
• Stripe-wise pruning (SWP)
• During training, an L1-norm penalty g(I) is added on the FS values; α sets the magnitude of regularization (eqs. (5)–(6)).
• Set a threshold δ: FS values below δ are no longer updated during training, and the corresponding stripes are pruned.
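Equations (5)–(6) are images in the slides; from the annotations (α as the regularization magnitude, g(I) as an L1 penalty on I), the training objective is presumably:

L = \sum_{(x,y)} \mathrm{loss}\bigl(f(x;\, W \odot I),\, y\bigr) + \alpha\, g(I),
\qquad
g(I) = \sum_{l}\sum_{n,i,j} \bigl|\, I^{l}_{n,i,j} \bigr|

A hedged Python sketch of the thresholding step (δ = 0.05 as in the ablation; freezing via the returned mask is one simple way to realize "not updated during training"):

import torch

@torch.no_grad()
def prune_stripes(skeleton: torch.Tensor, delta: float = 0.05) -> torch.Tensor:
    """Zero FS entries below delta; return the keep-mask."""
    mask = (skeleton.abs() >= delta).float()
    skeleton.mul_(mask)   # pruned stripes are fixed at zero
    return mask           # multiply into skeleton.grad each step to freeze them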
Proposed Method – Stripe-wise pruning (SWP)
(Equation (7) appears as an image in the original slides.)
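In the paper, equation (7) rewrites the convolution as a sum over stripes so that each stripe's partial feature map can be computed, and pruned, independently. A plausible reconstruction (a sketch based on the paper, not a verbatim copy):

X^{l+1}_{n,h,w} = \sum_{i=1}^{K}\sum_{j=1}^{K}
    \Bigl(\sum_{c=1}^{C} W^{l}_{n,c,i,j}\,
          X^{l}_{c,\; h+i-\lceil K/2\rceil,\; w+j-\lceil K/2\rceil}\Bigr)

The inner sum over c acts like a 1×1 convolution per stripe position; at inference only the surviving stripes are evaluated and their shifted outputs summed, which is why SWP remains hardware friendly.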
Experiments – Group Pruning vs. Stripe Pruning
Group-wise pruning (GP)
They find that with GP, the filters of layer2.7 are identified as invalid
because all of their weights are removed during training.
→ Training cannot continue.
Stripe-wise pruning (SWP)
Stripe-wise pruning keeps each filter independent of the others.
→ Training can continue, and SWP achieves higher accuracy than GP.
Experiments
● Baseline accuracy (CIFAR-10)
○ VGG16: 93.25%
○ ResNet56: 93.1%
● Baseline accuracy (ImageNet)
○ ResNet18: Top-1 69.76%, Top-5 89.08%
Experiments – Visualization of the filters pruned by SWP (VGG19)
• White denotes that the corresponding stripe in the filter was removed by SWP.
• In layers close to the input, most preserved filters keep multiple stripes.
• In the middle layers, SWP often keeps only one stripe → redundancy is reduced.
(Figure: for each layer, the filters are displayed by their frequency in that layer, from highest to lowest.)
Experiments – Ablation Study
• How hyper-parameters affect the pruning results.
• They vary α (magnitude of regularization) and δ (pruning threshold).
• α = 1e-5, δ = 0.05 gives an acceptable trade-off between pruning ratio and test accuracy.
Conclusion
• Stripe-Wise Pruning (SWP)
- They propose a new pruning paradigm called Stripe-Wise Pruning (SWP).
- It achieves a higher pruning ratio than filter-wise and group-wise pruning methods.
- It achieves finer granularity than traditional FP while remaining hardware friendly.
• Filter Skeleton (FS)
- They propose the 'Filter Skeleton' to efficiently learn the optimal shape of the filters for pruning.
- Through extensive experiments and analyses, they demonstrate its effectiveness.
• State-of-the-art pruning ratio
- They show that SWP achieves state-of-the-art pruning ratios on CIFAR-10 and ImageNet compared to filter-wise, channel-wise, and group-wise pruning.
Thank you !