Data-driven AI Security HCI (DASH) Lab
Pruning Filter in Filter
Minha Kim
Department of Software, Sungkyunkwan University
NeurIPS 2020
May 6, 2020
Pruning?
[NIPS 2015] Learning both Weights and Connections for Efficient Neural Networks, https://arxiv.org/abs/1506.02626
Pruning: removing weights from a neural network model
Weight Pruning
Weight Pruning (WP) prunes individual weights within each filter.
It removes redundant neurons iteratively.
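As a rough illustration (a minimal sketch, not the exact procedure of the cited paper), one-shot magnitude-based weight pruning in PyTorch looks like this; the function name and threshold choice are mine:

import torch
import torch.nn as nn

def magnitude_prune(layer: nn.Conv2d, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude weights; return the binary mask."""
    w = layer.weight.data
    k = max(1, int(sparsity * w.numel()))             # number of weights to drop
    threshold = w.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
    mask = (w.abs() > threshold).float()              # 0 where pruned
    layer.weight.data.mul_(mask)
    return mask  # iterative WP would reapply this mask after every update

In the iterative scheme of the slide, prune-retrain cycles repeat this with a growing sparsity target.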
Pruning?
C : number of channels
N : number of output channels
W, H : width / height
Filter/Channel Pruning
Filter/Channel Pruning (FP) prunes at the level of whole filters or channels.
It removes a much larger region at once than weight pruning does.
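As a hedged sketch (a common FP criterion based on filter L1 norms, not necessarily the exact criterion of the methods summarized here), whole-filter pruning can be written as:

import torch
import torch.nn as nn

def prune_filters(conv: nn.Conv2d, keep_ratio: float = 0.75) -> nn.Conv2d:
    """Keep the filters with the largest L1 norms; rebuild a smaller conv."""
    w = conv.weight.data                         # shape (N, C, K, K)
    scores = w.abs().sum(dim=(1, 2, 3))          # L1 norm of each filter
    n_keep = max(1, int(keep_ratio * w.size(0)))
    keep = scores.topk(n_keep).indices.sort().values
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = w[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned  # the next layer's input channels must shrink to match

Because entire output channels disappear, the pruned network stays dense and hardware friendly, which is the appeal of FP over WP.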
Pruning?
C : number of channels
N : number of output channels
W, H : width / height
Group Pruning
• Group Pruning (GP) prunes the weights at the same position across all filters, breaking the independence assumption on the filters.
• Weights at the same position in different filters can have very different importance.
• The network may therefore lose representation ability under a large pruning ratio.
Abstract
Background
• Model deployment is sometimes costly due to the large number of parameters in DNNs.
• To address this, 'pruning' is one family of model compression algorithms:
  Filter Pruning (FP), Channel Pruning (CP), Weight Pruning (WP), Group Pruning (GP), ...
Problem
• However, these methods can lose important information by pruning weights at the same position in every filter.
• The authors ask whether the optimal kernel size of each filter can be learned through pruning.
Abstract
Solution
• To combine the strengths of filter pruning and weight pruning, they propose Stripe-Wise Pruning (SWP) with a Filter Skeleton (FS).
• To learn the 'filter shape' alongside the filter weights, they propose the 'Filter Skeleton (FS)'.
• SWP treats each filter as K × K stripes and prunes stripes instead of whole filters.
• It achieves finer granularity than traditional FP while remaining hardware friendly.
Proposed Method – Filter Skeleton (FS)
C : number of channels
N : number of output channels
W, H : width / height
Stripe Pruning (their proposed method)
It keeps each filter independent of the others,
and thus can lead to a more efficient network structure.
Proposed Method – Filter Skeleton (FS)
N : number of filters
C : channels
K : kernel size
I : Filter Skeleton
H : feature map height
W : feature map width
Filter Skeleton (FS)
• A learnable matrix that reflects the shape of each filter
• All values of the FS are initialized to 1
• For the l-th convolutional layer's weight W ∈ ℝ^(N×C×K×K), the FS is I ∈ ℝ^(N×K×K)
• Each filter has one FS
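A minimal PyTorch sketch of this idea (FSConv2d is my own naming, not the authors' code), assuming the straightforward implementation in which the FS is broadcast over the channel dimension and multiplied into the weights:

import torch
import torch.nn as nn
import torch.nn.functional as F

class FSConv2d(nn.Module):
    """Convolution whose weight is masked by a learnable Filter Skeleton."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, padding: int = 1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.skeleton = nn.Parameter(torch.ones(out_ch, k, k))  # FS, init to 1
        self.padding = padding

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight: I[n, i, j] * W[n, c, i, j]  (FS broadcast over c)
        w = self.weight * self.skeleton.unsqueeze(1)
        return F.conv2d(x, w, padding=self.padding)

Training such a layer lets the gradient shape each filter through I while W learns the usual weights.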
Proposed Method – Filter Skeleton (FS)
● Filter Skeleton (FS)
○ Loss function with the FS multiplied into the weights (eqs. (1)–(2))
○ Gradients of W (weights) and I (Filter Skeleton) (eqs. (3)–(4))
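The equations themselves appear only as images in the slides. Based on the paper's formulation, a plausible LaTeX reconstruction (a sketch, not a verbatim copy) is:

% Forward pass with the FS multiplied into the weights (eqs. (1)-(2)):
X^{l+1}_{n,h,w} = \sum_{c=1}^{C}\sum_{i=1}^{K}\sum_{j=1}^{K}
    \bigl(I^{l}_{n,i,j}\, W^{l}_{n,c,i,j}\bigr)\,
    X^{l}_{c,\; h+i-\lceil K/2\rceil,\; w+j-\lceil K/2\rceil}

% Gradients w.r.t. W and I (eqs. (3)-(4)): the gradient of the effective
% weight is routed to each of its two factors:
\frac{\partial L}{\partial W^{l}_{n,c,i,j}}
  = \sum_{h,w}\frac{\partial L}{\partial X^{l+1}_{n,h,w}}\,
    I^{l}_{n,i,j}\, X^{l}_{c,\; h+i-\lceil K/2\rceil,\; w+j-\lceil K/2\rceil}

\frac{\partial L}{\partial I^{l}_{n,i,j}}
  = \sum_{c}\sum_{h,w}\frac{\partial L}{\partial X^{l+1}_{n,h,w}}\,
    W^{l}_{n,c,i,j}\, X^{l}_{c,\; h+i-\lceil K/2\rceil,\; w+j-\lceil K/2\rceil}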
Proposed Method – Filter Skeleton (FS)
• Each mini-figure shows one of the 9 stripes of a 3×3 filter.
• X-axis: all the filters (N) in the layer.
• Y-axis: the magnitude (summed over the channel dimension) of the stripe at that position, for each filter.
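The plotted quantity can be computed directly from a layer's weights; a short sketch (stripe_l1 is my helper name, assuming weights of shape (N, C, K, K)):

import torch

def stripe_l1(weight: torch.Tensor) -> torch.Tensor:
    """(N, C, K, K) weights -> (N, K, K): L1 norm of each stripe.
    For the mini-figure at position (i, j), plot stripe_l1(w)[:, i, j]
    across all N filters."""
    return weight.abs().sum(dim=1)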
Proposed Method – Stripe-wise pruning (SWP)
• Stripe-wise pruning (SWP)
• During training, an L1-norm penalty g(I) is added on the FS values; α sets the magnitude of regularization (eqs. (5)–(6)).
• Set a threshold δ: FS values below δ are no longer updated during training, and the corresponding stripes are pruned.
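Equations (5)–(6) are images in the slides; from the annotations (α as the regularization magnitude, g(I) as an L1 penalty on I), the training objective is presumably:

L = \sum_{(x,y)} \mathrm{loss}\bigl(f(x;\, W \odot I),\, y\bigr) + \alpha\, g(I),
\qquad
g(I) = \sum_{l}\sum_{n,i,j} \bigl|\, I^{l}_{n,i,j} \bigr|

A hedged Python sketch of the thresholding step (δ = 0.05 as in the ablation; freezing via the returned mask is one simple way to realize "not updated during training"):

import torch

@torch.no_grad()
def prune_stripes(skeleton: torch.Tensor, delta: float = 0.05) -> torch.Tensor:
    """Zero FS entries below delta; return the keep-mask."""
    mask = (skeleton.abs() >= delta).float()
    skeleton.mul_(mask)   # pruned stripes are fixed at zero
    return mask           # multiply into skeleton.grad each step to freeze them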
Proposed Method – Stripe-wise pruning (SWP)
(Equation (7) appears as an image in the original slides.)
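In the paper, equation (7) rewrites the convolution as a sum over stripes so that each stripe's partial feature map can be computed, and pruned, independently. A plausible reconstruction (a sketch based on the paper, not a verbatim copy):

X^{l+1}_{n,h,w} = \sum_{i=1}^{K}\sum_{j=1}^{K}
    \Bigl(\sum_{c=1}^{C} W^{l}_{n,c,i,j}\,
          X^{l}_{c,\; h+i-\lceil K/2\rceil,\; w+j-\lceil K/2\rceil}\Bigr)

The inner sum over c acts like a 1×1 convolution per stripe position; at inference only the surviving stripes are evaluated and their shifted outputs summed, which is why SWP remains hardware friendly.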
Experiments – Group Pruning vs. Stripe Pruning
Group-wise pruning (GP)
They find that with GP, the filters of layer2.7 are identified as invalid
because all of their weights are removed during training.
→ Training cannot continue.
Stripe-wise pruning (SWP)
Stripe-wise pruning keeps each filter independent of the others.
→ Training can continue, and SWP achieves higher accuracy than GP.
Experiments
● Baseline accuracy (CIFAR-10)
○ VGG16: 93.25%
○ ResNet56: 93.1%
● Baseline accuracy (ImageNet)
○ ResNet18: Top-1 69.76%, Top-5 89.08%
Experiments – Visualization of the filters pruned by SWP (VGG19)
• White denotes that the corresponding stripe in the filter was removed by SWP.
• In layers close to the input, most preserved filters keep multiple stripes.
• In the middle layers, SWP often keeps only one stripe → redundancy is reduced.
(Figure: for each layer, the filters are displayed by their frequency in that layer, from highest to lowest.)
Experiments – Ablation Study
• How hyper-parameters affect the pruning results.
• They vary α (magnitude of regularization) and δ (pruning threshold).
• α = 1e-5, δ = 0.05 gives an acceptable trade-off between pruning ratio and test accuracy.
Conclusion
• Stripe-Wise Pruning (SWP)
- They propose a new pruning paradigm called Stripe-Wise Pruning (SWP).
- It achieves a higher pruning ratio than filter-wise and group-wise pruning methods.
- It achieves finer granularity than traditional FP while remaining hardware friendly.
• Filter Skeleton (FS)
- They propose the 'Filter Skeleton' to efficiently learn the optimal shape of the filters for pruning.
- Through extensive experiments and analyses, they demonstrate its effectiveness.
• State-of-the-art pruning ratio
- They show that SWP achieves state-of-the-art pruning ratios on CIFAR-10 and ImageNet compared to filter-wise, channel-wise, and group-wise pruning.
Thank you !