Searching for MobileNetV3
ICCV2019
Wonboem Jang
Introduction
This paper describes the approach we took to develop
MobileNetV3 Large and Small models:
1. complementary search techniques
2. new efficient versions of nonlinearities practical for
the mobile setting
3. new efficient network design
4. a new efficient segmentation decoder
MobileNet V3
Recap: MobileNet V1
Depthwise Separable Convolution
Depthwise convolution followed by a pointwise (1×1) convolution
• Standard convolution cost: $h_i \cdot w_i \cdot d_i \cdot d_j \cdot k \cdot k$
• Depthwise separable convolution cost: $h_i \cdot w_i \cdot d_i \cdot (k^2 + d_j)$
• Reduction in computation: $\dfrac{1}{d_j} + \dfrac{1}{k^2}$
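As an illustration (not part of the slides), a depthwise separable convolution in PyTorch is a depthwise convolution with groups equal to the input channels, followed by a 1×1 pointwise convolution; the layer and variable names below are hypothetical.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """k x k depthwise conv (one filter per input channel) followed by a 1x1 pointwise conv."""
    def __init__(self, d_in, d_out, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(d_in, d_in, k, padding=k // 2, groups=d_in, bias=False)  # cost ~ h*w*d_in*k^2
        self.pointwise = nn.Conv2d(d_in, d_out, 1, bias=False)                              # cost ~ h*w*d_in*d_out

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 32, 56, 56)                  # hypothetical feature map
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 56, 56])
```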
Recap: MobileNet V2
Linear Bottlenecks
• If the manifold of interest remains non-zero volume after ReLU transformation, it
corresponds to a linear transformation.
• ReLU is capable of preserving complete information about the input manifold, but only if the
input manifold lies in a low-dimensional subspace of the input space.
• We can capture this by inserting linear bottleneck layers into the convolutional blocks.
Recap: MobileNet V2
Inverted Residual
• Residual: wide → narrow → wide
• Inverted residual: narrow → wide → narrow
$h \cdot w \cdot d' \cdot (t \cdot d') + h \cdot w \cdot (t \cdot d') \cdot k^2 + h \cdot w \cdot (t \cdot d') \cdot d'' = h \cdot w \cdot d' \cdot t \cdot (d' + k^2 + d'')$
for a block of spatial size $h \times w$, expansion factor $t$, kernel size $k$, input channels $d'$, and output channels $d''$.
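A quick arithmetic check of this cost formula (the example block dimensions are hypothetical, not taken from the slides):

```python
# MACs of one inverted-residual block: h*w*d'*t*(d' + k^2 + d'')
def inverted_residual_macs(h, w, d_in, d_out, t, k):
    expand    = h * w * d_in * (t * d_in)     # 1x1 expansion
    depthwise = h * w * (t * d_in) * k * k    # k x k depthwise convolution
    project   = h * w * (t * d_in) * d_out    # 1x1 linear projection
    return expand + depthwise + project

# e.g. a 14x14 block with 80 -> 80 channels, expansion factor 6, 3x3 kernel
print(inverted_residual_macs(14, 14, 80, 80, 6, 3))  # 15,899,520 (~15.9M multiply-adds)
```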
Recap: MnasNet
Inverted Residual
Recap: NetAdapt
Efficient Mobile Building Block
We use a combination of these layers as building blocks in order to build the most effective models; some layers are upgraded with a modified swish nonlinearity, h-swish (a block sketch follows the list below).
Combination of layers
1. MobileNetV1: depthwise separable convolutions
2. MobileNetV2: linear bottleneck, inverted residual
structure
3. MnasNet: lightweight attention modules based on squeeze-and-excitation, added inside the bottleneck structure
MobileNetV3
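A minimal sketch of the combined building block, assuming PyTorch; class and argument names are illustrative, and the squeeze-and-excite module is left out here (it is sketched under "Large squeeze-and-excite" later).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HSwish(nn.Module):
    """h-swish(x) = x * ReLU6(x + 3) / 6 (PyTorch also ships this as nn.Hardswish)."""
    def forward(self, x):
        return x * F.relu6(x + 3.0) / 6.0

class Bneck(nn.Module):
    """1x1 expansion -> k x k depthwise -> 1x1 linear projection, with an inverted
    residual connection; squeeze-and-excite is omitted in this sketch."""
    def __init__(self, d_in, d_exp, d_out, k=3, stride=1):
        super().__init__()
        self.use_residual = stride == 1 and d_in == d_out
        self.block = nn.Sequential(
            nn.Conv2d(d_in, d_exp, 1, bias=False), nn.BatchNorm2d(d_exp), HSwish(),
            nn.Conv2d(d_exp, d_exp, k, stride, k // 2, groups=d_exp, bias=False),
            nn.BatchNorm2d(d_exp), HSwish(),
            # a squeeze-and-excite module would be inserted here in SE blocks
            nn.Conv2d(d_exp, d_out, 1, bias=False), nn.BatchNorm2d(d_out),  # linear bottleneck
        )
    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out  # inverted residual connection

x = torch.randn(1, 80, 14, 14)        # hypothetical feature map
print(Bneck(80, 480, 80)(x).shape)    # torch.Size([1, 80, 14, 14])
```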
Network Search
Platform-Aware NAS for Block-wise Search
$\underset{m}{\text{maximize}} \; \mathrm{ACC}(m) \times \left[\dfrac{\mathrm{LAT}(m)}{T}\right]^{\omega}$
We observe that accuracy changes much more dramatically with latency for small models, so a stronger latency penalty is used: $\omega = -0.15$ vs. the original $\omega = -0.07$.
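The objective above can be written as a small helper; the function name and the example numbers below are hypothetical, not from the paper's code.

```python
def nas_reward(acc, lat, target_lat, omega=-0.07):
    """Multi-objective reward ACC(m) * (LAT(m) / T) ** omega; omega < 0 trades
    accuracy against latency relative to the target latency T."""
    return acc * (lat / target_lat) ** omega

# Hypothetical numbers: a stronger penalty (omega = -0.15) for small models
print(nas_reward(0.67, lat=6.5, target_lat=5.0, omega=-0.15))
```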
Network Search
NetAdapt for Layer-wise Search
1. Start with a seed network architecture found by platform-aware NAS.
2. For each step:
(a) Generate a set of new proposals. Each proposal represents a modification of the architecture that yields at least a δ reduction in latency compared to the previous step.
(b) For each proposal, use the pre-trained model from the previous step and populate the new proposed architecture, truncating and randomly initializing missing weights as appropriate. Finetune each proposal for T steps to get a coarse estimate of its accuracy.
(c) Select the best proposal according to some metric.
3. Repeat step 2 until the target latency is reached (a loop sketch is given below).
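A hedged sketch of this loop, assuming all of the problem-specific pieces (proposal generation, weight inheritance, finetuning, scoring, latency profiling) are supplied as callables; none of these names come from the paper's implementation.

```python
def netadapt(seed_model, target_latency, generate_proposals, inherit_weights,
             finetune, metric, latency, delta=0.01, short_steps=10000):
    """Control flow of the layer-wise search; every callable is caller-supplied."""
    model = seed_model
    while latency(model) > target_latency:
        # (a) proposals that each cut latency by at least `delta` vs. the current model
        proposals = generate_proposals(model, delta)
        scored = []
        for p in proposals:
            p = inherit_weights(model, p)   # reuse weights; truncate / randomly init missing ones
            finetune(p, steps=short_steps)  # short finetune -> coarse accuracy estimate
            scored.append((metric(model, p), p))
        # (c) keep the best-scoring proposal and iterate
        model = max(scored, key=lambda s: s[0])[1]
    return model
```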
Network Search
Proposal
1. Reduce the size of any expansion layer.
2. Reduce the bottleneck size in all blocks that share the same bottleneck size, in order to maintain the residual connections.
Evaluation
$\text{maximize } \dfrac{\Delta \mathrm{Acc}}{|\Delta \mathrm{latency}|}$, i.e., maximize the slope (tangent) of the accuracy–latency trade-off
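The same selection rule as a one-liner (argument names are mine, not from the paper):

```python
def selection_metric(acc_old, acc_new, lat_old, lat_new):
    # accuracy gained per unit of latency saved (slope of the trade-off curve)
    return (acc_new - acc_old) / abs(lat_new - lat_old)
```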
Network Improvements
Redesigning Expensive Layers
We redesign the computationally expensive layers at the beginning and the end of the network.
These modifications are outside the scope of the current search space.
1. Current models based on MobileNetV2's inverted bottleneck structure and variants use a 1×1 convolution as the final layer in order to expand to a higher-dimensional feature space. This layer is critically important for having rich features for prediction, but it comes at the cost of extra latency.
→ Move this final 1×1 expansion layer past the final average pooling (sketched below).
2. Once that layer is computed at 1×1 resolution, the previous bottleneck projection layer is no longer needed to reduce computation.
→ Remove the projection and filtering layers in the previous bottleneck block.
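A minimal sketch of the resulting efficient last stage, assuming PyTorch and the MobileNetV3-Large channel sizes (160 → 960 → pool → 1280 → 1000); the exact layer grouping is illustrative, not the official implementation.

```python
import torch
import torch.nn as nn

# The original design applies the 960 -> 1280 1x1 conv on the 7x7 feature map; moving
# it past the global average pooling lets it run on a 1x1 map instead (~49x fewer MACs
# for that layer), and the linear 1x1 conv commutes with average pooling.
efficient_last_stage = nn.Sequential(
    nn.Conv2d(160, 960, 1, bias=False), nn.BatchNorm2d(960), nn.Hardswish(),  # feature mixing at 7x7
    nn.AdaptiveAvgPool2d(1),                                                  # global average pooling
    nn.Conv2d(960, 1280, 1), nn.Hardswish(),                                  # now computed on a 1x1 map
    nn.Conv2d(1280, 1000, 1),                                                 # classifier logits
)

logits = efficient_last_stage(torch.randn(1, 160, 7, 7)).flatten(1)  # -> (1, 1000)
```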
Network Improvements
Redesigning Expensive Layers
Another expensive layer is the initial set of filters: mobile models typically start with a full 3×3 convolution with 32 filters. We experimented with reducing the number of filters and using different nonlinearities to reduce redundancy.
Using hard swish for this layer, we were able to reduce the number of filters to 16 while maintaining the same accuracy as 32 filters with either ReLU or swish.
Network Improvements
Nonlinearities
While swish improves accuracy, it comes with a non-zero cost in embedded environments, since the sigmoid function is much more expensive to compute on mobile devices.
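For reference, the nonlinearity in question is swish, which multiplies the input by a sigmoid gate; MobileNetV3 replaces that sigmoid with the piece-wise linear hard sigmoid ReLU6(x + 3)/6:

```latex
\mathrm{swish}(x) = x \cdot \sigma(x),
\qquad
\text{h-swish}(x) = x \cdot \frac{\mathrm{ReLU6}(x + 3)}{6}
```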
Network Improvements
Nonlinearities
Building h-swish on ReLU6 has three advantages.
First, optimized implementations of ReLU6 are available on virtually all software and hardware frameworks.
Second, in quantized mode, it eliminates the potential numerical precision loss caused by different implementations of the approximate sigmoid.
Finally, in practice, h-swish can be implemented as a piece-wise function, which reduces the number of memory accesses and drives the latency cost down substantially.
Network Improvements
Nonlinearities
The cost of applying a nonlinearity decreases as we go deeper into the network, since each layer's activation memory typically halves every time the resolution drops.
Thus, in our architectures we use h-swish only in the second half of the model.
Network Improvements
Large squeeze-and-excite
In MnasNet the size of the squeeze-and-excite bottleneck was relative to the size of the convolutional bottleneck; instead, we fix it in all blocks to 1/4 of the number of channels in the expansion layer.
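A minimal sketch of this squeeze-and-excite block, assuming PyTorch; the module and argument names are illustrative.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """SE attention whose squeeze width is fixed to 1/4 of the expansion channels,
    gated with a hard sigmoid instead of a sigmoid."""
    def __init__(self, exp_channels, reduction=4):
        super().__init__()
        squeezed = exp_channels // reduction
        self.pool = nn.AdaptiveAvgPool2d(1)               # squeeze: global context per channel
        self.fc1  = nn.Conv2d(exp_channels, squeezed, 1)
        self.act  = nn.ReLU(inplace=True)
        self.fc2  = nn.Conv2d(squeezed, exp_channels, 1)
        self.gate = nn.Hardsigmoid()                      # ReLU6(x + 3) / 6

    def forward(self, x):
        s = self.gate(self.fc2(self.act(self.fc1(self.pool(x)))))
        return x * s                                      # excite: per-channel reweighting

x = torch.randn(1, 480, 14, 14)       # hypothetical expansion feature map
print(SqueezeExcite(480)(x).shape)    # torch.Size([1, 480, 14, 14])
```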
Experiments
Architecture
Experiments
Results - Classification
Experiments
Results - Detection
SSDLite
Experiments
Results - Semantic Segmentation
R-ASPP
