Learning Compact DNN Models for Embedded Vision
Shuvra S. Bhattacharyya
University of Maryland, College Park, USA and INSA/IETR Rennes, France
With contributions from Xiaomin Wu and Rong Chen
• Pruning:
• Remove neurons or parameters that provide little or no contribution to inference accuracy
• Distillation:
• Transfer knowledge from a large model to a small model (a minimal distillation-loss sketch follows this slide)
• Neural Architecture Search:
• Optimize the number, types, and connectivity of network layers
Popular Methods to Compress DNN Models
2
© 2023 University of Maryland
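Distillation is not used in NeuroGRS, but as a rough illustration of the idea above, a minimal Keras/TensorFlow sketch of a standard distillation loss is shown below. The teacher/student logits, the temperature, and the blending weight alpha are hypothetical placeholders, not values from this presentation:

import tensorflow as tf

# Hedged sketch: blend the hard-label loss with a soft-target (teacher) loss.
# `labels`, `student_logits`, and `teacher_logits` are assumed tensors.
def distillation_loss(labels, student_logits, teacher_logits, temperature=4.0, alpha=0.1):
    hard = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(labels, student_logits, from_logits=True))
    soft = tf.keras.losses.KLDivergence()(
        tf.nn.softmax(teacher_logits / temperature),
        tf.nn.softmax(student_logits / temperature))
    return alpha * hard + (1.0 - alpha) * (temperature ** 2) * soft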
Pruning: Structured and Unstructured
3
© 2023 University of Maryland
Multilayer Perceptron: hidden layer example
• Structured pruning: implementation-friendly; supports common ML libraries
• Unstructured pruning: more general; needs specially designed hardware/software for sparse computation
(A minimal NumPy sketch contrasting the two follows this slide.)
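To make the contrast concrete, here is a minimal NumPy sketch (illustrative only, not the NeuroGRS implementation) of pruning one hidden layer's weight matrix both ways; the layer sizes and the magnitude-based scores are assumptions:

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))              # hidden layer: 16 units, 8 inputs
b = rng.standard_normal(16)

# Structured: drop whole hidden units (rows of W, entries of b); the result is a
# smaller dense layer that standard ML libraries execute with no special support.
unit_scores = np.abs(W).sum(axis=1)           # simple per-unit magnitude score
keep = np.sort(np.argsort(unit_scores)[4:])   # remove the 4 weakest units
W_structured, b_structured = W[keep], b[keep]

# Unstructured: zero out individual low-magnitude weights; the layer shape is
# unchanged, so speedups require sparse-computation hardware/software.
threshold = np.quantile(np.abs(W), 0.5)       # prune the smallest 50% of weights
W_unstructured = np.where(np.abs(W) >= threshold, W, 0.0)

In a full model, the next layer's incoming weights for the removed units would also be deleted.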
Pruning: Structured and Unstructured
4
© 2023 University of Maryland
CNN-layer filter example
• Structured pruning: implementation-friendly; supports common ML libraries
• Unstructured pruning: more general; needs specially designed hardware/software for sparse computation
(A filter-level pruning sketch follows this slide.)
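Analogously, a hedged sketch of structured pruning at the filter level for a convolutional layer; the weight layout and the L1-norm criterion are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 16, 3, 3))       # conv weights: (filters, in_channels, kH, kW)
b = rng.standard_normal(32)

# Structured (filter-level) pruning: remove whole filters, scored here by L1 norm.
l1 = np.abs(W).reshape(32, -1).sum(axis=1)
keep = np.sort(np.argsort(l1)[8:])            # drop the 8 weakest filters
W_pruned, b_pruned = W[keep], b[keep]         # shape becomes (24, 16, 3, 3)

# Because the layer now emits 24 channels instead of 32, the following layer's
# kernels of shape (next_filters, 32, kH, kW) would be sliced to match.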
• Deep Compression [Han 2015]
• Uses weight threshold to prune. Leads to unstructured network architecture.
• Inference-time channel reduction without retraining [He 2017]
• Applies a criterion based on Lasso Regression.
• ThiNet: filter-level structured pruning [Luo 2017]
• Layer-wise relevance propagation (LRP) [Yeom 2021]
• Uses a novel criterion, layer-wise relevance propagation, to select weights for
structured pruning.
5
Previously-developed Pruning Methods
© 2023 University of Maryland
NeuroGRS was designed to derive compact DNN models for neural decoding systems.
It can also be applied to generate compact DNN models for other embedded vision applications.
NeuroGRS: calcium-imaging-based neural decoding system, overview.
Processing pipeline: calcium imaging of brain → motion correction → neuron detection → neural signal extraction → neural decoding (runs in real time at 10 Hz).
Image source: https://www.nature.com/articles/npp2014206, https://www.youtube.com/watch?v=d5zK1RUJCiU&ab_channel=MocomiKids.
Design of NeuroGRS: prediction of the mouse's behavior from analysis of neural signals, e.g., whether or how fast the mouse is going to move.
© 2023 University of Maryland 6
• GRS stands for Greedy inter-layer order with Random Selection of intra-layer units.
• Combines pruning and architecture search with an emphasis on structured pruning.
• Takes into consideration both the model architecture and trained weights.
• Suitable for further compressing small DNN models for optimized embedded implementation.
• Accompanied by a dataflow-based inference system for efficient inference.
Overview of NeuroGRS
7
© 2023 University of Maryland
• Structures determine performance for shallow DNNs; the weights can be relearned by retraining from scratch [Liu 2018] [Frankle 2018].
• This finding is especially relevant for embedded vision, where shallow DNNs may be preferable due to resource constraints.
• Using a large compression rate (number of removed neurons or connections) without retraining can significantly degrade inference accuracy [Li 2016] [Hu 2016].
© 2023 University of Maryland 8
Foundational Findings Applied in NeuroGRS
• Specify the initial CNN structure in Keras/TensorFlow format (a minimal sketch follows this slide).
• If pretrained, load the pretrained weights. GRS can either train first and then prune, or prune directly from the pretrained model.
• Provide training, validation, and testing data sets.
• Configure hyperparameters.
• Run NeuroGRS to execute the pruning process.
• Output: a compact model implemented in Python/C/C++, suitable for inference on embedded platforms.
How to Use the NeuroGRS Software Package
9
© 2023 University of Maryland
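A minimal sketch of the first step above, assuming a small illustrative CNN in Keras/TensorFlow; the layer sizes, input shape, and class count are placeholders, not values prescribed by NeuroGRS:

import tensorflow as tf

# Hypothetical initial structure to hand to the pruning flow; sizes are illustrative.
def build_initial_model(input_shape=(64, 64, 1), num_classes=2):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_initial_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])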
Using NeuroGRS (Continued)
10
© 2023 University of Maryland
Dataflow graph for NeuroGRS:
© 2023 University of Maryland 11
GRS Method (1)
Mi: an overparameterized DNN model candidate
P(Mi): a pruned DNN model candidate
vi: a selection criterion
k: the final selected compact model(s)
EUP: Enable Unstructured Pruning
TQ: Thresholding weight connections, Quantization
DT: Training dataset
DV: Validation dataset
GRS: Greedy inter-layer order with Random Selection of intra-layer units
• 𝑉𝑎𝑙𝐴𝑐𝑐: validation accuracy
• 𝑂𝑟𝑖𝑉𝑎𝑙𝐴𝑐𝑐: validation accuracy of the initial structure
• 𝒯: tolerance of accuracy drop
Failed validation: 𝑉𝑎𝑙𝐴𝑐𝑐 < 𝒯 × 𝑂𝑟𝑖𝑉𝑎𝑙𝐴𝑐𝑐 (a simplified sketch of this acceptance test follows this slide)
© 2023 University of Maryland 12
GRS Method (2)
[Wu 2022] X. Wu, D.-T. Lin, R. Chen, and S. Bhattacharyya. Learning compact DNN models for behavior
prediction from calcium imaging of neural activity. Journal of Signal Processing Systems, 94:455-472, 2022.
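A highly simplified sketch of the acceptance test and of a prune-retrain-validate loop in the spirit of GRS; `propose_smaller` and `retrain_and_validate` are hypothetical helpers, and the actual method in [Wu 2022] additionally orders layers greedily and selects intra-layer units at random:

def passes_validation(val_acc, ori_val_acc, tol=0.985):
    # Accept a pruned structure only if accuracy stays within the tolerance 𝒯.
    return val_acc >= tol * ori_val_acc

def prune_greedily(model, propose_smaller, retrain_and_validate, tol=0.985):
    # Hypothetical outer loop: propose a smaller structure, retrain, and keep it
    # only while it still passes validation against the initial accuracy.
    ori_val_acc = retrain_and_validate(model)
    while True:
        candidate = propose_smaller(model)        # e.g., remove one intra-layer unit
        if candidate is None:                     # nothing left to prune
            return model
        val_acc = retrain_and_validate(candidate)
        if not passes_validation(val_acc, ori_val_acc, tol):
            return model                          # stop at the last valid structure
        model = candidate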
Two state-of-the-art unstructured pruning methods [Han 2015]:
T: cut weight connections having relatively low magnitudes.
• 𝑇ℎ𝑟𝑒𝑠ℎ𝐼𝑛𝑖𝑡 = 0.3
• 𝑇ℎ𝑟𝑒𝑠ℎ𝑆𝑡𝑒𝑝 = 0.1
• Iteratively increase the threshold until accuracy falls below 𝒯 × 𝑂𝑟𝑖𝑉𝑎𝑙𝐴𝑐𝑐
Q: weight quantization.
• Iteratively decrease the number of digits until accuracy falls below 𝒯 × 𝑂𝑟𝑖𝑉𝑎𝑙𝐴𝑐𝑐
(A simplified sketch of both loops follows this slide.)
© 2023 University of Maryland 13
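A simplified sketch of the two loops, assuming the model's weights are available as a list of NumPy arrays and that retraining/validation is abstracted behind a `validate` callback; the parameter names follow the slide, while the helper structure is an assumption:

import numpy as np

def prune_T(weights, validate, tol_acc, thresh_init=0.3, thresh_step=0.1):
    # T: raise a magnitude threshold step by step; keep the last setting whose
    # validation accuracy is still at least tol_acc (= 𝒯 × 𝑂𝑟𝑖𝑉𝑎𝑙𝐴𝑐𝑐).
    best, thresh = weights, thresh_init
    while True:
        trial = [np.where(np.abs(w) >= thresh, w, 0.0) for w in weights]
        if validate(trial) < tol_acc:
            return best
        best, thresh = trial, thresh + thresh_step

def prune_Q(weights, validate, tol_acc, start_digits=6):
    # Q: retain fewer and fewer decimal digits until accuracy drops below tol_acc.
    best = weights
    for digits in range(start_digits, -1, -1):
        trial = [np.round(w, digits) for w in weights]
        if validate(trial) < tol_acc:
            return best
        best = trial
    return best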
GRS Method (3)
(Python)
In: input loader
B: Bias loader
W: Weights loader
Read in float type data
© 2023 University of Maryland 14
GRS Method (4)
Initial models in different types and structures:
© 2023 University of Maryland 15
Neural Network Models for Evaluation
Experiment design:
• Investigate whether intermediate sub-structures impact the overall pruning result.
• Compare GRS with RRS.
• RRS = Random inter-layer order and Random Selection of intra-layer units.
• 𝒯 = 0.985
• Report average of 4 different models on 9 MSN (Medium Spiny Neuron) datasets with 10 repeated trials each.
Results:
• AL: Test Accuracy Loss, FCI: FLOP Count Improvement, PCI: Parameter Count Improvement (an illustrative computation sketch follows this slide).
• Compare GRS with RRS. Metrics are reported as the percentage by which GRS improves over RRS.
© 2023 University of Maryland 16
NeuroGRS Experiments (1)
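For orientation, one plausible way to compute the three metrics is sketched below; these definitions are assumptions for illustration and may differ in detail from those used in [Wu 2022]:

def accuracy_loss(acc_original, acc_pruned):
    # AL: how much test accuracy is lost by pruning.
    return acc_original - acc_pruned

def percent_reduction(count_original, count_pruned):
    # FCI / PCI: percentage reduction in FLOP count or parameter count.
    return 100.0 * (count_original - count_pruned) / count_original

# Example with made-up numbers: a model shrinking from 1.2M to 0.25M FLOPs.
print(accuracy_loss(0.93, 0.92))              # ~0.01
print(percent_reduction(1_200_000, 250_000))  # ~79.2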
Experiment design:
• Investigate whether state-of-the-art structured pruning methods for large neural networks are effective in our context.
• Compare GRS with NWM.
• NWM = Natural inter-layer order and Weight Magnitude based selection of intra-layer unit to prune.
• NWM is representative of other pruning methods that do not consider model structure [Han 2015, Luo 2017].
• 𝒯 = 0.985
• Report average of 4 different models on 9 MSN (Medium Spiny Neuron) datasets with 10 repeated trials each.
Results:
• AL: Test Accuracy Loss, FCI: FLOP Count Improvement, PCI: Parameter Count Improvement.
• Compare GRS with NWM. Metrics are reported as the percentage by which GRS improves over NWM.
© 2023 University of Maryland 17
NeuroGRS Experiments (2)
Experiment design:
• On 9 MSN (Medium Spiny Neuron)
datasets of 3000 frames each.
• 4 different shallow DNN models.
• 10 repeated trials.
• 𝒯 of GRS, Pruning Stage T, and Stage Q are set to 0.985, 0.995, and 0.990, respectively.
Results:
• Structured pruning using GRS:
• Further unstructured pruning with TQ:
© 2023 University of Maryland 18
NeuroGRS Experiments (3)
Experiment design:
• With 4 different types of DNN models: use NGSynth to implement their optimized and original forms using LIDE-C, and deploy on a Raspberry Pi Zero W V1.1 platform.
• LIDE-C = Lightweight Dataflow Environment integrated with the C programming language [Lin 2017].
• How much runtime improvement is observed from the compact models compared to their corresponding overparameterized models?
Results:
© 2023 University of Maryland 19
NeuroGRS Experiments (4)
Overview:
• Identify the far phase and the near phase of the GRS pruning process.
• Structures impact model performance less in the far phase than in the near phase.
• This type of phase-based reasoning can be adapted to other pruning methods.
• Develop a "jump mechanism" to help GRS step into the near phase much faster → much less time (and a smaller carbon footprint) required for pruning.
© 2023 University of Maryland 20
Jump-GRS Extension
• We have given an overview of pruning and other classes of methods for compressing DNN models.
• We have introduced a new pruning method called Greedy inter-layer order with Random Selection of intra-layer units (GRS).
• We have combined GRS with methods for unstructured pruning to provide a more comprehensive pruning solution.
• We have introduced a software tool, called NeuroGRS, that allows system designers to apply the GRS method with a high degree of automation.
• We have introduced the concepts of near- and far-phase operation, which are applied in NeuroGRS to greatly improve pruning speed.
Conclusion
21
© 2023 University of Maryland
22
Backup Slides
© 2023 University of Maryland
• Use RRS (Random inter-layer order, random intra-layer selection of units).
• Retrain and validate all possible intermediate structures 3 times.
• Set 𝒯 = 0.5 to allow pruning to continue.
• Use an MLP model called “mlpmulti” with a hidden structure of 16×16×16.
• Plot the average validation accuracy of all possible structures with repeats at each pruning step.
Demonstrating the pruning phases
© 2023 University of Maryland 24
Jump-GRS Introduction
JGRS algorithm:
• Multiple attempts are used to exploit randomization in the algorithm. The best result across all attempts is taken.
• The last valid structure, after all attempts, is used as the initial structure for the next phase.
• Each attempt can be regarded as an examination of different cut-off artificial node sets.
• The three phases of GRS have different compression rates.
• JGRS reduces the compression rate as it goes from one subphase/phase to the next.
• A structure fails if its validation accuracy becomes lower than 𝒯 × 𝑂𝑟𝑖𝑉𝑎𝑙𝐴𝑐𝑐.
(A simplified sketch of the multi-attempt loop follows this slide.)
© 2023 University of Maryland 25
Jump-GRS Method
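A simplified sketch of the multi-attempt idea; `propose_jump` and `retrain_and_validate` are hypothetical helpers, and the real JGRS additionally moves through far subphases with decreasing compression rates before handing the structure to ordinary GRS:

def jump_phase(structure, propose_jump, retrain_and_validate, ori_val_acc, tol=0.985, attempts=3):
    # Try several randomized large "jumps" (cut-off node sets); keep the best
    # candidate that still passes validation, else fall back to the input structure.
    best, best_acc = structure, None
    for _ in range(attempts):
        candidate = propose_jump(structure)
        acc = retrain_and_validate(candidate)
        if acc >= tol * ori_val_acc and (best_acc is None or acc > best_acc):
            best, best_acc = candidate, acc
    return best   # becomes the initial structure for the next (sub)phase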
Initial DNN models used in NeuroGRS: Scaled DNN models:
© 2023 University of Maryland 26
Evaluation of Jump-GRS Method
JGRS and GRS Comparison
• 18 datasets (MSN) of 3000 frames.
• Report the average of 10 repeated trials on all MSN datasets.
JGRS vs GRS results:
• 𝒯 for both GRS and JGRS is 0.985
• Attempts:
• Far subphase 1 = 3
• Far subphase 2 = 3
• GRS = 3
© 2023 University of Maryland 27
Jump-GRS Experiments (1)
JGRS on larger DNNs
• 18 MSN datasets of 3000 frames.
• WGEVIA-REAL: 1600 balanced labeled embeddings for two classes of microcircuits.
• 𝒯 = 0.985
• Pruning time is reported in seconds using a Core i7-2600K CPU with a GeForce GTX 1080 GPU.
JGRS on MSN dataset
JGRS on WGEVIA-REAL dataset
GRS on MSN dataset
© 2023 University of Maryland 28
Jump-GRS Experiments (2)
GRS, JGRS runtime trend:
• WGEVIA-REAL dataset
• MLP models with different numbers of nodes in each layer.
• 𝒯 = 0.985
• Plot average runtime (seconds) of 10 repeated trials.
(Plot y-axis: execution time in seconds.)
© 2023 University of Maryland 29
Jump-GRS Experiments (3)
• [Bhattacharyya 2019] Bhattacharyya, Shuvra S., et al. "Handbook of Signal Processing Systems." (2019).
• [Han 2015] Han, Song, et al. "Learning both weights and connections for efficient neural network." Advances in neural information
processing systems 28 (2015).
• [He 2017] He, Yihui, Xiangyu Zhang, and Jian Sun. "Channel pruning for accelerating very deep neural networks." Proceedings of the IEEE
international conference on computer vision. 2017.
• [Hu 2016] Hu, Hengyuan, et al. "Network trimming: A data-driven neuron pruning approach towards efficient deep architectures." arXiv
preprint arXiv:1607.03250 (2016).
• [Li 2019] Li, Chunyue, et al. "Prediction of forelimb reach results from motor cortex activities based on calcium imaging and deep learning."
Frontiers in cellular neuroscience 13 (2019): 88.
• [Li 2016] Li, Hao, et al. "Pruning filters for efficient convnets." arXiv preprint arXiv:1608.08710 (2016).
• [Lee 2017] Lee, Yaesop, et al. "Online learning in neural decoding using incremental linear discriminant analysis." 2017 IEEE International
conference on cyborg and bionic systems (CBS). IEEE, 2017.
• [Lin 2017] Lin, Shuoxin, et al. "The DSPCAD framework for modeling and synthesis of signal processing systems." Handbook of
hardware/software codesign. Springer, Dordrecht, 2017. 1185-1219.
• [Liu 2018] Liu, Zhuang, et al. "Rethinking the value of network pruning." arXiv preprint arXiv:1810.05270 (2018).
• [Frankle 2018] Frankle, Jonathan, and Michael Carbin. "The lottery ticket hypothesis: Finding sparse, trainable neural networks." arXiv
preprint arXiv:1803.03635 (2018).
• [Luo 2017] Luo, Jian-Hao, Jianxin Wu, and Weiyao Lin. "Thinet: A filter level pruning method for deep neural network compression."
Proceedings of the IEEE international conference on computer vision. 2017.
• [Molchanov 2016] Molchanov, Pavlo, et al. "Pruning convolutional neural networks for resource efficient inference." arXiv preprint
arXiv:1611.06440 (2016).
References (1)
30
© 2023 University of Maryland
• [Suau 2018] Suau, Xavier, et al. "Principal filter analysis for guided network compression." arXiv preprint arXiv:1807.10585 2 (2018).
• [Wu 2022] X. Wu, D.-T. Lin, R. Chen, and S. Bhattacharyya. Learning compact DNN models for behavior prediction from calcium imaging of neural activity. Journal of Signal Processing Systems, 94:455-472, 2022.
• [Yeom 2021] Yeom, Seul-Ki, et al. "Pruning by explaining: A novel criterion for deep neural network pruning." Pattern Recognition 115
(2021): 107899.
References (2)
31
© 2023 University of Maryland