Recent Trends in DNN
Compression
October 12th, 2018
Kaushalya Madhawa
Murata Laboratory
Tokyo Tech
Back then…
• Size of commonly used DNNs
• AlexNet 240MB
• VGG 16 552MB
• Inception V3 109MB
• Running models on the cloud has its own disadvantages
• Network latency
• Privacy
DNN Compression
• Can we achieve the same accuracy with smaller models?
• There are several approaches to obtain smaller models
– Compressing pre-trained networks
• DeepCompression (Han+, 2016)
– Designing compact models
• SqueezeNet (Iandola+, 2016)
• MobileNets (Howard+, 2017)
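The "compressing pre-trained networks" direction usually starts with magnitude pruning: zero out the weights with the smallest absolute values. A minimal sketch in NumPy (simplified relative to Deep Compression, which also retrains after pruning; the function name is ours):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights.

    sparsity: target fraction of weights to set to zero (0.0 - 1.0).
    """
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)
```

Note that the resulting matrix is the same shape as before; the savings come from storing it in a sparse format, which is exactly why sparsity alone does not guarantee faster inference on standard hardware.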
Deep Compression (Han+, ICLR 2016)
• One of the first papers to popularize model compression
• Requires custom hardware to realize speedups from the compressed format at inference time
• Sparsity doesn’t always translate to reduced inference time
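Besides pruning, Deep Compression shares weights by clustering them with k-means, so each weight is stored as a small cluster index plus a shared centroid table. A rough 1-D sketch (simplified: the paper also fine-tunes the centroids during retraining, which is omitted here):

```python
import numpy as np

def quantize_weights(w: np.ndarray, n_clusters: int = 16,
                     n_iter: int = 20, seed: int = 0):
    """Weight sharing via 1-D k-means: each weight is replaced by its
    cluster centroid, so only the centroid table and per-weight indices
    (log2(n_clusters) bits each) need to be stored."""
    rng = np.random.default_rng(seed)
    flat = w.ravel()
    centroids = rng.choice(flat, size=n_clusters, replace=False)
    for _ in range(n_iter):
        # assign every weight to its nearest centroid
        assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            members = flat[assign == c]
            if members.size:  # leave empty clusters where they are
                centroids[c] = members.mean()
    assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids[assign].reshape(w.shape), centroids, assign
```

With 16 clusters, every 32-bit weight collapses to a 4-bit index, before Huffman coding shrinks the indices further.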
Compact Models
• Designing networks with fewer parameters
• SqueezeNet - AlexNet-level accuracy with 50x fewer parameters
• MobileNets - depthwise separable convolutions
Fire module: SqueezeNet
Requires a lot of expertise and consumes a lot of time!
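To see why depthwise separable convolutions shrink models, compare parameter counts: a standard convolution mixes channels and spatial positions in one step, while the separable version splits this into a per-channel spatial filter plus a 1x1 pointwise mix. A quick back-of-the-envelope check (bias terms omitted; the layer shape below is illustrative):

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Parameters in a standard k x k convolution."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# Example: a 3x3 layer with 256 input and 256 output channels
standard = conv_params(256, 256, 3)                   # 589,824
separable = depthwise_separable_params(256, 256, 3)   # 67,840
print(f"{standard / separable:.1f}x fewer parameters")
```

For 3x3 kernels this works out to roughly an 8-9x reduction per layer, which is where most of MobileNets' savings come from.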
State-of-the-art (SOTA) in 2018
• Mobile devices
  • More memory
  • Dedicated hardware to run ML models
• Deep Learning frameworks
• Models
  • Directly optimize models for the resource constraint (e.g. size)
  • More focus on latency
  • Optimize for multiple objectives
SOTA in 2018: Devices
• Storage: <128MB
• Storage: <512MB
• Neural Engine: dedicated hardware for ML algorithms
• CoreML / TF-Lite
SOTA in 2018: Models
• Model compression
  • Structured pruning is used to reduce latency
• Designing compact models
  • Neural architecture search for finding models that fulfill the resource restrictions
  • In addition to accuracy, latency or model size is also incorporated into the objective
Neural Architecture Search
• Automates the design of neural network models
• NASNet (Zoph and Le, 2017): accuracy is used as the reward in a reinforcement learning model
• DPP-Net (Dong+, 2018): a multi-objective architecture search that optimizes for both accuracy and inference time
MnasNet (Tan+, 2018)
• Neural architecture search for mobile devices
• Optimized for both accuracy and latency
• Multiple Pareto-optimal solutions are found in a single architecture search
• Latency is directly measured on a mobile phone
• Able to find models that run 1.5x faster than MobileNet v2
[Figure: MnasNet search loop - the controller samples models from the search space, a trainer measures accuracy, mobile phones measure latency, and a multi-objective reward is fed back to the controller.]

maximize_m  ACC(m) × [LAT(m) / T]^w
where w = α if LAT(m) ≤ T, and β otherwise
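The reward above is easy to write directly in code. In the soft-constraint setting the MnasNet paper reports using α = β = -0.07; treat the exact constants below as an assumption of this sketch:

```python
def mnasnet_reward(acc: float, latency_ms: float, target_ms: float,
                   alpha: float = -0.07, beta: float = -0.07) -> float:
    """Multi-objective reward ACC(m) * (LAT(m)/T)^w.

    With negative exponents, models faster than the target T get a
    small bonus and slower models get a smooth penalty, instead of a
    hard latency cutoff.
    """
    w = alpha if latency_ms <= target_ms else beta
    return acc * (latency_ms / target_ms) ** w
```

For example, with a 80 ms target, a 75%-accurate model at 160 ms is discounted to roughly 0.71, while the same accuracy at 40 ms is rewarded slightly above 0.75, so the search can trade a little accuracy for a large latency win.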
MnasNet

| Model Name       | Model Size | Top-1 Accuracy | Top-5 Accuracy | TF Lite Performance |
|------------------|------------|----------------|----------------|---------------------|
| MnasNet_0.50_224 | 8.5 MB     | 68.03%         | 87.79%         | 37 ms               |
| MnasNet_0.75_224 | 12 MB      | 71.72%         | 90.17%         | 61 ms               |
| MnasNet_1.3_224  | 24 MB      | 75.24%         | 92.55%         | 152 ms              |
| SqueezeNet       | 5.0 MB     | 49.0%          | 72.9%          | 224 ms              |
| ResNet_V2_101    | 178.3 MB   | 76.8%          | 93.6%          | 1880 ms             |
| Inception_V3     | 95.3 MB    | 77.9%          | 93.8%          | 1433 ms             |

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/g3doc/models.md
Summary
• Mobile devices are more capable of running DNN models
• Unstructured pruning is out of fashion
• Accuracy and platform-dependent restrictions are incorporated into multi-objective model search
References
• Dong, Jin-Dong, et al. "DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures." arXiv preprint arXiv:1806.08198 (2018).
• Han, Song, Huizi Mao, and William J. Dally. "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding." arXiv preprint arXiv:1510.00149 (2015).
• Tan, Mingxing, et al. "MnasNet: Platform-Aware Neural Architecture Search for Mobile." arXiv preprint arXiv:1807.11626 (2018).
• Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).
