Recent Trends in DNN
Compression
October 12th, 2018
Kaushalya Madhawa
Murata Laboratory
Tokyo Tech
Back then…
• Size of commonly used DNNs
• AlexNet 240MB
• VGG 16 552MB
• Inception V3 109MB
• Running models on the cloud has its own disadvantages
• Network latency
• Privacy
DNN Compression
• Can we achieve the same accuracy with smaller models?
• There are several approaches to obtain smaller models
– Compressing pre-trained networks
• DeepCompression (Han+, 2016)
– Designing compact models
• SqueezeNet (Iandola+, 2016)
• MobileNets (Howard+, 2017)
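The "compressing pre-trained networks" direction usually starts with magnitude pruning: zero out the weights with the smallest absolute values. A minimal sketch in NumPy (simplified relative to Deep Compression, which also retrains after pruning; the function name is ours):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights.

    sparsity: target fraction of weights to set to zero (0.0 - 1.0).
    """
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)
```

Note that the resulting matrix is the same shape as before; the savings come from storing it in a sparse format, which is exactly why sparsity alone does not guarantee faster inference on standard hardware.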
Deep Compression (Han+, ICLR 2016)
• One of the first papers to popularize model compression
• Requires custom hardware to realize speedups from the compressed format at inference time
• Sparsity doesn’t always translate to reduced inference time
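Besides pruning, Deep Compression shares weights by clustering them with k-means, so each weight is stored as a small cluster index plus a shared centroid table. A rough 1-D sketch (simplified: the paper also fine-tunes the centroids during retraining, which is omitted here):

```python
import numpy as np

def quantize_weights(w: np.ndarray, n_clusters: int = 16,
                     n_iter: int = 20, seed: int = 0):
    """Weight sharing via 1-D k-means: each weight is replaced by its
    cluster centroid, so only the centroid table and per-weight indices
    (log2(n_clusters) bits each) need to be stored."""
    rng = np.random.default_rng(seed)
    flat = w.ravel()
    centroids = rng.choice(flat, size=n_clusters, replace=False)
    for _ in range(n_iter):
        # assign every weight to its nearest centroid
        assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            members = flat[assign == c]
            if members.size:  # leave empty clusters where they are
                centroids[c] = members.mean()
    assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids[assign].reshape(w.shape), centroids, assign
```

With 16 clusters, every 32-bit weight collapses to a 4-bit index, before Huffman coding shrinks the indices further.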
Compact Models
• Designing networks with fewer parameters
• SqueezeNet - AlexNet-level accuracy with 50x fewer parameters
• MobileNets - depthwise separable convolutions
Fire module: SqueezeNet
Requires a lot of expertise and consumes a lot of time!
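To see why depthwise separable convolutions shrink models, compare parameter counts: a standard convolution mixes channels and spatial positions in one step, while the separable version splits this into a per-channel spatial filter plus a 1x1 pointwise mix. A quick back-of-the-envelope check (bias terms omitted; the layer shape below is illustrative):

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Parameters in a standard k x k convolution."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# Example: a 3x3 layer with 256 input and 256 output channels
standard = conv_params(256, 256, 3)                   # 589,824
separable = depthwise_separable_params(256, 256, 3)   # 67,840
print(f"{standard / separable:.1f}x fewer parameters")
```

For 3x3 kernels this works out to roughly an 8-9x reduction per layer, which is where most of MobileNets' savings come from.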
State-of-the-art (SOTA) in 2018
• Mobile devices
  • More memory
  • Dedicated hardware to run ML models
• Deep Learning frameworks
• Models
  • Directly optimize models for the resource constraint (e.g. size)
  • More focus on latency
  • Optimize for multiple objectives
SOTA in 2018: Devices
• Storage: <128MB
• Storage: <512MB
• Neural Engine: dedicated hardware for ML algorithms
• CoreML / TF-Lite
SOTA in 2018: Models
• Model compression
  • Structured pruning is used to reduce latency
• Designing compact models
  • Neural architecture search for finding models that fulfill the resource restrictions
  • In addition to accuracy, latency or model size is also incorporated into the objective
Neural Architecture Search
• Automates the design of neural network models
• NASNet (Zoph and Le, 2017): accuracy is used as the reward in a reinforcement learning model
• DPP-Net (Dong+, 2018): a multi-objective architecture search that optimizes for both accuracy and inference time
MnasNet (Tan+, 2018)
• Neural architecture search for mobile devices
• Optimized for both accuracy and latency
• Multiple Pareto-optimal solutions are found in a single architecture search
• Latency is directly measured on a mobile phone
• Able to find models that run 1.5x faster than MobileNet v2
[Figure: MnasNet search loop - the controller samples models from the search space, a trainer measures accuracy, mobile phones measure latency, and a multi-objective reward is fed back to the controller.]

maximize_m  ACC(m) × [LAT(m) / T]^w
where w = α if LAT(m) ≤ T, and β otherwise
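The reward above is easy to write directly in code. In the soft-constraint setting the MnasNet paper reports using α = β = -0.07; treat the exact constants below as an assumption of this sketch:

```python
def mnasnet_reward(acc: float, latency_ms: float, target_ms: float,
                   alpha: float = -0.07, beta: float = -0.07) -> float:
    """Multi-objective reward ACC(m) * (LAT(m)/T)^w.

    With negative exponents, models faster than the target T get a
    small bonus and slower models get a smooth penalty, instead of a
    hard latency cutoff.
    """
    w = alpha if latency_ms <= target_ms else beta
    return acc * (latency_ms / target_ms) ** w
```

For example, with a 80 ms target, a 75%-accurate model at 160 ms is discounted to roughly 0.71, while the same accuracy at 40 ms is rewarded slightly above 0.75, so the search can trade a little accuracy for a large latency win.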
MnasNet

| Model Name       | Model Size | Top-1 Accuracy | Top-5 Accuracy | TF Lite Performance |
|------------------|------------|----------------|----------------|---------------------|
| MnasNet_0.50_224 | 8.5 MB     | 68.03%         | 87.79%         | 37 ms               |
| MnasNet_0.75_224 | 12 MB      | 71.72%         | 90.17%         | 61 ms               |
| MnasNet_1.3_224  | 24 MB      | 75.24%         | 92.55%         | 152 ms              |
| SqueezeNet       | 5.0 MB     | 49.0%          | 72.9%          | 224 ms              |
| ResNet_V2_101    | 178.3 MB   | 76.8%          | 93.6%          | 1880 ms             |
| Inception_V3     | 95.3 MB    | 77.9%          | 93.8%          | 1433 ms             |

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/g3doc/models.md
Summary
• Mobile devices are more capable of running DNN models
• Unstructured pruning is out of fashion
• Accuracy and platform-dependent restrictions are incorporated into multi-objective model search
References
• Dong, Jin-Dong, et al. "DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures." arXiv preprint arXiv:1806.08198 (2018).
• Han, Song, Huizi Mao, and William J. Dally. "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding." arXiv preprint arXiv:1510.00149 (2015).
• Tan, Mingxing, et al. "MnasNet: Platform-Aware Neural Architecture Search for Mobile." arXiv preprint arXiv:1807.11626 (2018).
• Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).
