Learning Compact DNN Models for Embedded Vision
Shuvra S. Bhattacharyya
University of Maryland, College Park, USA and INSA/IETR Rennes, France
With contributions from Xiaomin Wu and Rong Chen
• Pruning:
• Remove neurons or parameters that provide little or no contribution to inference accuracy
• Distillation:
• Transfer knowledge from a large model to a small model (a minimal distillation-loss sketch follows this slide)
• Neural Architecture Search:
• Optimize the number, types, and connectivity of network layers
Popular Methods to Compress DNN Models
2
© 2023 University of Maryland
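Distillation is not used in NeuroGRS, but as a rough illustration of the idea above, a minimal Keras/TensorFlow sketch of a standard distillation loss is shown below. The teacher/student logits, the temperature, and the blending weight alpha are hypothetical placeholders, not values from this presentation:

import tensorflow as tf

# Hedged sketch: blend the hard-label loss with a soft-target (teacher) loss.
# `labels`, `student_logits`, and `teacher_logits` are assumed tensors.
def distillation_loss(labels, student_logits, teacher_logits, temperature=4.0, alpha=0.1):
    hard = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(labels, student_logits, from_logits=True))
    soft = tf.keras.losses.KLDivergence()(
        tf.nn.softmax(teacher_logits / temperature),
        tf.nn.softmax(student_logits / temperature))
    return alpha * hard + (1.0 - alpha) * (temperature ** 2) * soft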
Pruning: Structured and Unstructured
3
© 2023 University of Maryland
Multilayer Perceptron: hidden layer example
• Structured pruning: implementation-friendly; supports common ML libraries
• Unstructured pruning: more general; needs specially designed hardware/software for sparse computation
(A minimal NumPy sketch contrasting the two follows this slide.)
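To make the contrast concrete, here is a minimal NumPy sketch (illustrative only, not the NeuroGRS implementation) of pruning one hidden layer's weight matrix both ways; the layer sizes and the magnitude-based scores are assumptions:

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))              # hidden layer: 16 units, 8 inputs
b = rng.standard_normal(16)

# Structured: drop whole hidden units (rows of W, entries of b); the result is a
# smaller dense layer that standard ML libraries execute with no special support.
unit_scores = np.abs(W).sum(axis=1)           # simple per-unit magnitude score
keep = np.sort(np.argsort(unit_scores)[4:])   # remove the 4 weakest units
W_structured, b_structured = W[keep], b[keep]

# Unstructured: zero out individual low-magnitude weights; the layer shape is
# unchanged, so speedups require sparse-computation hardware/software.
threshold = np.quantile(np.abs(W), 0.5)       # prune the smallest 50% of weights
W_unstructured = np.where(np.abs(W) >= threshold, W, 0.0)

In a full model, the next layer's incoming weights for the removed units would also be deleted.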
Pruning: Structured and Unstructured
4
© 2023 University of Maryland
CNN-layer filter example
• Structured pruning: implementation-friendly; supports common ML libraries
• Unstructured pruning: more general; needs specially designed hardware/software for sparse computation
(A filter-level pruning sketch follows this slide.)
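Analogously, a hedged sketch of structured pruning at the filter level for a convolutional layer; the weight layout and the L1-norm criterion are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 16, 3, 3))       # conv weights: (filters, in_channels, kH, kW)
b = rng.standard_normal(32)

# Structured (filter-level) pruning: remove whole filters, scored here by L1 norm.
l1 = np.abs(W).reshape(32, -1).sum(axis=1)
keep = np.sort(np.argsort(l1)[8:])            # drop the 8 weakest filters
W_pruned, b_pruned = W[keep], b[keep]         # shape becomes (24, 16, 3, 3)

# Because the layer now emits 24 channels instead of 32, the following layer's
# kernels of shape (next_filters, 32, kH, kW) would be sliced to match.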
• Deep Compression [Han 2015]
• Uses weight threshold to prune. Leads to unstructured network architecture.
• Inference-time channel reduction without retraining [He 2017]
• Applies a criterion based on Lasso Regression.
• ThiNet: filter-level structured pruning [Luo 2017]
• Layer-wise relevance propagation (LRP) [Yeom 2021]
• Uses a novel criterion, layer-wise relevance propagation, to select weights for
structured pruning.
5
Previously-developed Pruning Methods
© 2023 University of Maryland
NeuroGRS was designed to derive compact DNN models for neural decoding systems.
It can also be applied to generate compact DNN models for other embedded vision applications.
NeuroGRS: calcium-imaging-based neural decoding system, overview.
Processing pipeline: calcium imaging of brain → motion correction → neuron detection → neural signal extraction → neural decoding (runs in real time at 10 Hz).
Image source: https://www.nature.com/articles/npp2014206, https://www.youtube.com/watch?v=d5zK1RUJCiU&ab_channel=MocomiKids.
Design of NeuroGRS: prediction of the mouse's behavior from analysis of neural signals, e.g., whether or how fast the mouse is going to move.
© 2023 University of Maryland 6
• GRS stands for Greedy inter-layer order with Random Selection of intra-layer units.
• Combines pruning and architecture search with an emphasis on structured pruning.
• Takes into consideration both the model architecture and trained weights.
• Suitable for further compressing small DNN models for optimized embedded implementation.
• Accompanied by a dataflow-based inference system for efficient inference.
Overview of NeuroGRS
7
© 2023 University of Maryland
• Structures determine performance for shallow DNNs; the weights can be relearned by retraining from scratch [Liu 2018] [Frankle 2018].
• This finding is especially relevant for embedded vision, where shallow DNNs may be preferable due to resource constraints.
• Using a large compression rate (number of removed neurons or connections) without retraining can significantly degrade inference accuracy [Li 2016] [Hu 2016].
© 2023 University of Maryland 8
Foundational Findings Applied in NeuroGRS
• Specify the initial CNN structure in Keras/TensorFlow format (a minimal sketch follows this slide).
• If pretrained, load the pretrained weights. GRS can either train first and then prune, or prune directly from the pretrained model.
• Provide training, validation, and testing data sets.
• Configure hyperparameters.
• Run NeuroGRS to execute the pruning process.
• Output: a compact model implemented in Python/C/C++, suitable for inference on embedded platforms.
How to Use the NeuroGRS Software Package
9
© 2023 University of Maryland
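A minimal sketch of the first step above, assuming a small illustrative CNN in Keras/TensorFlow; the layer sizes, input shape, and class count are placeholders, not values prescribed by NeuroGRS:

import tensorflow as tf

# Hypothetical initial structure to hand to the pruning flow; sizes are illustrative.
def build_initial_model(input_shape=(64, 64, 1), num_classes=2):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_initial_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])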
Using NeuroGRS (Continued)
10
© 2023 University of Maryland
Dataflow graph for NeuroGRS:
© 2023 University of Maryland 11
GRS Method (1)
Mi: an overparameterized DNN model candidate
P(Mi): a pruned DNN model candidate
vi: a selection criterion
k: the final selected compact model(s)
EUP: Enable Unstructured Pruning
TQ: Thresholding weight connections, Quantization
DT: Training dataset
DV: Validation dataset
GRS: Greedy inter-layer order with Random Selection of intra-layer units
• 𝑉𝑎𝑙𝐴𝑐𝑐: validation accuracy
• 𝑂𝑟𝑖𝑉𝑎𝑙𝐴𝑐𝑐: validation accuracy of the initial structure
• 𝒯: tolerance of accuracy drop
Failed validation: 𝑉𝑎𝑙𝐴𝑐𝑐 < 𝒯 × 𝑂𝑟𝑖𝑉𝑎𝑙𝐴𝑐𝑐 (a simplified sketch of this acceptance test follows this slide)
© 2023 University of Maryland 12
GRS Method (2)
[Wu 2022] X. Wu, D.-T. Lin, R. Chen, and S. Bhattacharyya. Learning compact DNN models for behavior
prediction from calcium imaging of neural activity. Journal of Signal Processing Systems, 94:455-472, 2022.
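A highly simplified sketch of the acceptance test and of a prune-retrain-validate loop in the spirit of GRS; `propose_smaller` and `retrain_and_validate` are hypothetical helpers, and the actual method in [Wu 2022] additionally orders layers greedily and selects intra-layer units at random:

def passes_validation(val_acc, ori_val_acc, tol=0.985):
    # Accept a pruned structure only if accuracy stays within the tolerance 𝒯.
    return val_acc >= tol * ori_val_acc

def prune_greedily(model, propose_smaller, retrain_and_validate, tol=0.985):
    # Hypothetical outer loop: propose a smaller structure, retrain, and keep it
    # only while it still passes validation against the initial accuracy.
    ori_val_acc = retrain_and_validate(model)
    while True:
        candidate = propose_smaller(model)        # e.g., remove one intra-layer unit
        if candidate is None:                     # nothing left to prune
            return model
        val_acc = retrain_and_validate(candidate)
        if not passes_validation(val_acc, ori_val_acc, tol):
            return model                          # stop at the last valid structure
        model = candidate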
Two state-of-the-art unstructured pruning methods [Han 2015]:
T: cut weight connections having relatively low magnitudes.
• 𝑇ℎ𝑟𝑒𝑠ℎ𝐼𝑛𝑖𝑡 = 0.3
• 𝑇ℎ𝑟𝑒𝑠ℎ𝑆𝑡𝑒𝑝 = 0.1
• Iteratively increase the threshold until accuracy falls below 𝒯 × 𝑂𝑟𝑖𝑉𝑎𝑙𝐴𝑐𝑐
Q: weight quantization.
• Iteratively decrease the number of digits until accuracy falls below 𝒯 × 𝑂𝑟𝑖𝑉𝑎𝑙𝐴𝑐𝑐
(A simplified sketch of both loops follows this slide.)
© 2023 University of Maryland 13
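A simplified sketch of the two loops, assuming the model's weights are available as a list of NumPy arrays and that retraining/validation is abstracted behind a `validate` callback; the parameter names follow the slide, while the helper structure is an assumption:

import numpy as np

def prune_T(weights, validate, tol_acc, thresh_init=0.3, thresh_step=0.1):
    # T: raise a magnitude threshold step by step; keep the last setting whose
    # validation accuracy is still at least tol_acc (= 𝒯 × 𝑂𝑟𝑖𝑉𝑎𝑙𝐴𝑐𝑐).
    best, thresh = weights, thresh_init
    while True:
        trial = [np.where(np.abs(w) >= thresh, w, 0.0) for w in weights]
        if validate(trial) < tol_acc:
            return best
        best, thresh = trial, thresh + thresh_step

def prune_Q(weights, validate, tol_acc, start_digits=6):
    # Q: retain fewer and fewer decimal digits until accuracy drops below tol_acc.
    best = weights
    for digits in range(start_digits, -1, -1):
        trial = [np.round(w, digits) for w in weights]
        if validate(trial) < tol_acc:
            return best
        best = trial
    return best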
GRS Method (3)
(Python)
In: input loader
B: Bias loader
W: Weights loader
Read in float type data
© 2023 University of Maryland 14
GRS Method (4)
Initial models in different types and structures:
© 2023 University of Maryland 15
Neural Network Models for Evaluation
Experiment design:
• Investigate whether intermediate sub-structures impact the overall pruning result.
• Compare GRS with RRS.
• RRS = Random inter-layer order and Random Selection of intra-layer units.
• 𝒯 = 0.985
• Report average of 4 different models on 9 MSN (Medium Spiny Neuron) datasets with 10 repeated trials each.
Results:
• AL: Test Accuracy Loss, FCI: FLOP Count Improvement, PCI: Parameter Count Improvement (an illustrative computation sketch follows this slide).
• Compare GRS with RRS. Metrics are reported as the percentage by which GRS improves over RRS.
© 2023 University of Maryland 16
NeuroGRS Experiments (1)
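For orientation, one plausible way to compute the three metrics is sketched below; these definitions are assumptions for illustration and may differ in detail from those used in [Wu 2022]:

def accuracy_loss(acc_original, acc_pruned):
    # AL: how much test accuracy is lost by pruning.
    return acc_original - acc_pruned

def percent_reduction(count_original, count_pruned):
    # FCI / PCI: percentage reduction in FLOP count or parameter count.
    return 100.0 * (count_original - count_pruned) / count_original

# Example with made-up numbers: a model shrinking from 1.2M to 0.25M FLOPs.
print(accuracy_loss(0.93, 0.92))              # ~0.01
print(percent_reduction(1_200_000, 250_000))  # ~79.2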
Experiment design:
• Investigate whether state-of-the-art structured pruning methods for large neural networks are effective in our context.
• Compare GRS with NWM.
• NWM = Natural inter-layer order and Weight Magnitude based selection of intra-layer unit to prune.
• NWM is representative of other pruning methods that do not consider model structure [Han 2015, Luo 2017].
• 𝒯 = 0.985
• Report average of 4 different models on 9 MSN (Medium Spiny Neuron) datasets with 10 repeated trials each.
Results:
• AL: Test Accuracy Loss, FCI: FLOP Count Improvement, PCI: Parameter Count Improvement.
• Compare GRS with NWM. Metrics are reported as the percentage by which GRS improves over NWM.
© 2023 University of Maryland 17
NeuroGRS Experiments (2)
Experiment design:
• On 9 MSN (Medium Spiny Neuron)
datasets of 3000 frames each.
• 4 different shallow DNN models.
• 10 repeated trials.
• 𝒯 of GRS, Pruning Stage T, and Stage Q are set to 0.985, 0.995, and 0.990, respectively.
Results:
• Structured pruning using GRS:
• Further unstructured pruning with TQ:
© 2023 University of Maryland 18
NeuroGRS Experiments (3)
Experiment design:
• With 4 different types of DNN models: use NGSynth to implement their optimized and original forms using LIDE-C, and deploy on a Raspberry Pi Zero W V1.1 platform.
• LIDE-C = Lightweight Dataflow Environment integrated with the C programming language [Lin 2017].
• How much runtime improvement is observed from the compact models compared to their corresponding overparameterized models?
Results:
© 2023 University of Maryland 19
NeuroGRS Experiments (4)
Overview:
• Identify the far phase and the near phase of the GRS pruning process.
• Structures impact model performance less in the far phase than in the near phase.
• This type of phase-based reasoning can be adapted to other pruning methods.
• Develop a "jump mechanism" to help GRS step into the near phase much faster → much less time (and a smaller carbon footprint) required for pruning.
© 2023 University of Maryland 20
Jump-GRS Extension
• We have given an overview of pruning and other classes of methods for compressing DNN models.
• We have introduced a new pruning method called Greedy inter-layer order with Random Selection of intra-layer units (GRS).
• We have combined GRS with methods for unstructured pruning to provide a more comprehensive pruning solution.
• We have introduced a software tool, called NeuroGRS, that allows system designers to apply the GRS method with a high degree of automation.
• We have introduced the concepts of near- and far-phase operation, which are applied in NeuroGRS to greatly improve pruning speed.
Conclusion
21
© 2023 University of Maryland
22
Backup Slides
© 2023 University of Maryland
• Use RRS (Random inter-layer order, random intra-layer selection of units).
• Retrain and validate all possible intermediate structures 3 times.
• Set 𝒯 = 0.5 to allow pruning to continue.
• Use an MLP model called “mlpmulti” with a hidden structure of 16×16×16.
• Plot the average validation accuracy of all possible structures with repeats at each pruning step.
Demonstrating the pruning phases
© 2023 University of Maryland 24
Jump-GRS Introduction
JGRS algorithm:
• Multiple attempts are used to exploit randomization in the algorithm. The best result across all attempts is taken.
• The last valid structure, after all attempts, is used as the initial structure for the next phase.
• Each attempt can be regarded as an examination of different cut-off artificial node sets.
• The three phases of GRS have different compression rates.
• JGRS reduces the compression rate as it goes from one subphase/phase to the next.
• A structure fails if its validation accuracy becomes lower than 𝒯 × 𝑂𝑟𝑖𝑉𝑎𝑙𝐴𝑐𝑐.
(A simplified sketch of the multi-attempt loop follows this slide.)
© 2023 University of Maryland 25
Jump-GRS Method
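A simplified sketch of the multi-attempt idea; `propose_jump` and `retrain_and_validate` are hypothetical helpers, and the real JGRS additionally moves through far subphases with decreasing compression rates before handing the structure to ordinary GRS:

def jump_phase(structure, propose_jump, retrain_and_validate, ori_val_acc, tol=0.985, attempts=3):
    # Try several randomized large "jumps" (cut-off node sets); keep the best
    # candidate that still passes validation, else fall back to the input structure.
    best, best_acc = structure, None
    for _ in range(attempts):
        candidate = propose_jump(structure)
        acc = retrain_and_validate(candidate)
        if acc >= tol * ori_val_acc and (best_acc is None or acc > best_acc):
            best, best_acc = candidate, acc
    return best   # becomes the initial structure for the next (sub)phase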
Initial DNN models used in NeuroGRS: Scaled DNN models:
© 2023 University of Maryland 26
Evaluation of Jump-GRS Method
JGRS and GRS Comparison
• 18 datasets (MSN) of 3000 frames.
• Report the average of 10 repeated trials on all MSN datasets.
JGRS vs GRS results:
• 𝒯 for both GRS and JGRS is 0.985
• Attempts:
• Far subphase 1 = 3
• Far subphase 2 = 3
• GRS = 3
© 2023 University of Maryland 27
Jump-GRS Experiments (1)
JGRS on larger DNNs
• 18 MSN datasets of 3000 frames.
• WGEVIA-REAL: 1600 balanced labeled embeddings for two classes of microcircuits.
• 𝒯 = 0.985
• Pruning time is reported in seconds using a Core i7-2600K CPU with a GeForce GTX 1080 GPU.
JGRS on MSN dataset
JGRS on WGEVIA-REAL dataset
GRS on MSN dataset
© 2023 University of Maryland 28
Jump-GRS Experiments (2)
GRS, JGRS runtime trend:
• WGEVIA-REAL dataset
• MLP models with different numbers of nodes in each layer.
• 𝒯 = 0.985
• Plot average runtime (seconds) of 10 repeated trials.
(Plot y-axis: execution time in seconds.)
© 2023 University of Maryland 29
Jump-GRS Experiments (3)
• [Bhattacharyya 2019] Bhattacharyya, Shuvra S., et al. "Handbook of Signal Processing Systems." (2019).
• [Han 2015] Han, Song, et al. "Learning both weights and connections for efficient neural network." Advances in neural information
processing systems 28 (2015).
• [He 2017] He, Yihui, Xiangyu Zhang, and Jian Sun. "Channel pruning for accelerating very deep neural networks." Proceedings of the IEEE
international conference on computer vision. 2017.
• [Hu 2016] Hu, Hengyuan, et al. "Network trimming: A data-driven neuron pruning approach towards efficient deep architectures." arXiv
preprint arXiv:1607.03250 (2016).
• [Li 2019] Li, Chunyue, et al. "Prediction of forelimb reach results from motor cortex activities based on calcium imaging and deep learning."
Frontiers in cellular neuroscience 13 (2019): 88.
• [Li 2016] Li, Hao, et al. "Pruning filters for efficient convnets." arXiv preprint arXiv:1608.08710 (2016).
• [Lee 2017] Lee, Yaesop, et al. "Online learning in neural decoding using incremental linear discriminant analysis." 2017 IEEE International
conference on cyborg and bionic systems (CBS). IEEE, 2017.
• [Lin 2017] Lin, Shuoxin, et al. "The DSPCAD framework for modeling and synthesis of signal processing systems." Handbook of
hardware/software codesign. Springer, Dordrecht, 2017. 1185-1219.
• [Liu 2018] Liu, Zhuang, et al. "Rethinking the value of network pruning." arXiv preprint arXiv:1810.05270 (2018).
• [Frankle 2018] Frankle, Jonathan, and Michael Carbin. "The lottery ticket hypothesis: Finding sparse, trainable neural networks." arXiv
preprint arXiv:1803.03635 (2018).
• [Luo 2017] Luo, Jian-Hao, Jianxin Wu, and Weiyao Lin. "Thinet: A filter level pruning method for deep neural network compression."
Proceedings of the IEEE international conference on computer vision. 2017.
• [Molchanov 2016] Molchanov, Pavlo, et al. "Pruning convolutional neural networks for resource efficient inference." arXiv preprint
arXiv:1611.06440 (2016).
References (1)
30
© 2023 University of Maryland
• [Suau 2018] Suau, Xavier, et al. "Principal filter analysis for guided network compression." arXiv preprint arXiv:1807.10585 2 (2018).
• [Wu 2022] X. Wu, D.-T. Lin, R. Chen, and S. Bhattacharyya. Learning compact DNN models for behavior prediction from calcium imaging of neural activity. Journal of Signal Processing Systems, 94:455-472, 2022.
• [Yeom 2021] Yeom, Seul-Ki, et al. "Pruning by explaining: A novel criterion for deep neural network pruning." Pattern Recognition 115
(2021): 107899.
References (2)
31
© 2023 University of Maryland