Min-Seo Kim
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: kms39273@naver.com
1
EfficientNet
• Convolutional Neural Networks (ConvNets) are commonly developed at a fixed hardware resource budget,
and then scaled up for better accuracy if more resources are available.
• AlexNet: trained on GPUs and popularized the ReLU activation function
• 8 layers
• VGGNet: stacks of 3x3 convolution filters
• 19 layers
• GoogLeNet: uses Inception modules for efficiency
• 22 layers
• ResNet: uses skip connections to prevent vanishing gradients
• 152 layers
Background
2
EfficientNet
• Little is still known about how to scale up a ConvNet efficiently.
• The most widely used scaling method is simply to make the ConvNet deeper or wider.
• Although it is possible to scale two or three dimensions arbitrarily, arbitrary scaling requires tedious manual tuning and still often yields sub-optimal accuracy and efficiency.
→ Is there a principled, theory-based scaling method that can make a ConvNet perform better?
Problem Statement
3
EfficientNet
• EfficientNet proposes a 'compound scaling method' that uniformly scales the width, depth, and resolution of
the model.
Contribution
4
EfficientNet
• Intuitively, as the input image grows, the network needs a larger receptive field to cover the wider area and more channels to capture the finer-grained patterns, so compound scaling can be expected to work well.
• In this paper, the authors are the first to quantitatively analyze the relationship between network width, depth, and resolution.
Contribution
5
EfficientNet
• EfficientNet is built mainly from mobile inverted bottleneck convolution (MBConv) blocks. To understand the MBConv block, two prior concepts are needed:
• Depthwise separable convolution (first presented in MobileNetV1)
• Mobile inverted bottleneck convolution (introduced in MobileNetV2)
Previous work
6
EfficientNet
Previous work
• A standard convolution layer has many parameters and a high computational cost.
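As a back-of-the-envelope comparison (notation is mine, not from the slides: K is the kernel size, C_in and C_out the input and output channel counts), the parameter counts of a standard convolution and of a depthwise separable convolution are roughly:

\[
\text{Params}_{\text{standard}} = K^{2} C_{in} C_{out},
\qquad
\text{Params}_{\text{dw-sep}} = K^{2} C_{in} + C_{in} C_{out},
\qquad
\frac{\text{Params}_{\text{dw-sep}}}{\text{Params}_{\text{standard}}} = \frac{1}{C_{out}} + \frac{1}{K^{2}}
\]

With 3x3 kernels and many output channels, the separable version therefore needs roughly 1/9 to 1/8 of the parameters, and the FLOPs per output position shrink by the same factor.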
7
EfficientNet
Previous work
• Depthwise Separable Convolution
MobileNetV1 (2017)
8
EfficientNet
Previous work
• Depthwise Separable Convolution
MobileNetV1 (2017)
• Although accuracy drops slightly, the amount of computation is reduced dramatically.
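A minimal PyTorch sketch of a MobileNetV1-style depthwise separable convolution; the module name and hyperparameters are illustrative, not the reference implementation:

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # depthwise: groups=in_ch makes the 3x3 conv filter each channel independently
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        # pointwise: a 1x1 conv recombines information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        x = self.act(self.bn2(self.pointwise(x)))
        return x

# Usage: y = DepthwiseSeparableConv(32, 64)(torch.randn(1, 32, 56, 56))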
9
EfficientNet
Previous work
• Mobile inverted bottleneck convolution (MBConv)
MobileNetV2 (2018)
• The paper introduces the manifold-of-interest idea ("the information carried by high-dimensional channels can in fact be embedded in low-dimensional channels"). Under this hypothesis, a 1x1 pointwise conv first expands the number of channels, a depthwise conv then extracts spatial information, and a final 1x1 conv projects back to a low-dimensional representation.
• In summary, the block proceeds as dimension expansion -> information extraction -> dimensionality reduction.
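A minimal sketch of the MobileNetV2-style inverted residual (MBConv) block described above; this is an illustrative assumption, not the official implementation (EfficientNet's own MBConv additionally uses swish and an SE block):

import torch
import torch.nn as nn

class MBConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_residual = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            # 1x1 expansion: lift the low-dimensional input to a wider space
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # depthwise 3x3: extract spatial information per channel
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear projection: back down to a low-dimensional representation
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out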
10
EfficientNet
Previous work
• ReLU6 activation
MobileNetV2 (2018)
• Used in every layer except the projection layer, which keeps a linear activation.
• The standard ReLU has no upper bound in the positive region.
• ReLU6 is an activation function that caps the output at 6 in the positive region.
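Concretely, ReLU6 simply clips the ordinary ReLU at 6:

\[
\mathrm{ReLU6}(x) = \min(\max(0, x), 6)
\]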
11
EfficientNet
Previous work
• h-swish activation
MobileNetV3 (2019)
• Not every layer uses h-swish; the first half of the network usually keeps ReLU.
• The MobileNet authors found that h-swish is only effective in the deeper layers.
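For reference, swish and its hardware-friendly approximation h-swish are defined as:

\[
\mathrm{swish}(x) = x \cdot \sigma(x),
\qquad
\text{h-swish}(x) = x \cdot \frac{\mathrm{ReLU6}(x + 3)}{6}
\]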
12
EfficientNet
Previous work
SENet
• Squeeze: global average pooling (GAP) averages each two-dimensional feature map down to a single value.
• GAP thus compresses an HxWxC feature map into a 1x1xC vector.
• SE Block
• Squeeze the features, then re-weight (excite) them.
• Attaching the module to Inception and ResNet improved the accuracy of the baseline models by about 0.6% on the ImageNet dataset.
13
EfficientNet
Previous work
• SE Block
• Squeeze the features, then re-weight (excite) them.
• Attaching the module to Inception and ResNet improved the accuracy of the baseline models by about 0.6% on the ImageNet dataset.
• Two fully-connected (FC) layers determine the relative importance of each channel (i.e., they model channel-wise dependencies).
• The channel descriptor vector is linearly transformed, passed through ReLU, linearly transformed again, and then passed through a pointwise sigmoid.
• Because the sigmoid maps the excitation output to values between 0 and 1, each channel is scaled according to its importance.
SENet
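A minimal PyTorch sketch of the SE block described above (GAP squeeze, two FC layers with ReLU and sigmoid, channel-wise rescaling); the reduction ratio of 16 follows the SENet paper, the rest is illustrative:

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: HxWxC -> 1x1xC
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # excitation weights in [0, 1]
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # scale each channel by its learned importance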
14
EfficientNet
• Since the goal is to maximize the accuracy of the model, the problem can be formulated as follows:
Methodology
• This is the general formulation of a ConvNet.
• With H, W, C the shape of the input tensor and F the conv layer of each stage, the ConvNet is written as shown in the formulation reproduced below.
• The key difficulty is that the optimal d, w, and r coefficients depend on one another, and the dependency changes under different resource constraints.
• For this reason, widely used ConvNets have been scaled along only one of these dimensions.
• Increasing a single coefficient does improve performance, but the gains saturate quickly.
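Written out in the paper's notation, the ConvNet is a composition of repeated stage operators, and scaling is posed as a constrained optimization over the coefficients d, w, and r:

\[
\mathcal{N} = \bigodot_{i = 1 \ldots s} \mathcal{F}_i^{L_i}\!\left(X_{\langle H_i, W_i, C_i \rangle}\right)
\]

\[
\max_{d, w, r} \ \mathrm{Accuracy}\!\left(\mathcal{N}(d, w, r)\right)
\quad \text{s.t.} \quad
\mathcal{N}(d, w, r) = \bigodot_{i = 1 \ldots s} \hat{\mathcal{F}}_i^{\, d \cdot \hat{L}_i}\!\left(X_{\langle r \cdot \hat{H}_i,\, r \cdot \hat{W}_i,\, w \cdot \hat{C}_i \rangle}\right),
\]

\[
\mathrm{Memory}(\mathcal{N}) \le \text{target memory},
\qquad
\mathrm{FLOPS}(\mathcal{N}) \le \text{target FLOPS}
\]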
15
EfficientNet
Methodology
• Depth: model capacity increases and more complex features can be captured.
• Training becomes harder because of the vanishing gradient problem.
• Width: widening each layer increases accuracy.
• The amount of computation grows quadratically with the width.
• Resolution: a higher input resolution lets the network learn finer-grained features, increasing accuracy.
• The amount of computation grows quadratically with the resolution.
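In other words, for a fixed architecture the cost scales roughly as

\[
\mathrm{FLOPS} \propto d \cdot w^{2} \cdot r^{2},
\]

linear in depth but quadratic in width and resolution, because widening changes both the input and output channel counts of each layer, and a higher resolution grows both spatial dimensions.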
16
EfficientNet
Baseline
• It follows that balancing the d, w, and r coefficients is crucial for ConvNet scaling.
• The authors propose a compound scaling method that uses a single compound coefficient to jointly adjust the width, depth, and resolution of the network (see the scaling rule below).
• Since the FLOPS of a ConvNet are dominated by the convolution operations, the total FLOPS scale in proportion to (α·β²·γ²)^φ.
• Because α·β²·γ² is constrained to be approximately 2, the total FLOPS grow roughly in proportion to 2^φ.
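The compound scaling rule from the paper:

\[
d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi},
\qquad \text{s.t.} \quad \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,
\quad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1,
\]

so the total FLOPS grow approximately as \((\alpha \cdot \beta^{2} \cdot \gamma^{2})^{\phi} \approx 2^{\phi}\).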
17
EfficientNet
Baseline
• This compares the Class Activation Maps (CAM) obtained when only one of the network's depth, width, or resolution is scaled against the maps obtained when all three are balanced with compound scaling.
• With compound scaling, the feature maps are clearly activated on the semantically meaningful regions of the image.
18
EfficientNet
• STEP 1: Fix φ = 1 and run a small grid search over α, β, and γ. The values found are α = 1.2, β = 1.1, and γ = 1.15, satisfying α·β²·γ² ≈ 2.
• STEP 2: Fix α, β, and γ and scale up the whole network by increasing φ.
• Searching for α, β, and γ directly on a large model could give better results, but for a large model the resources required for the search are prohibitive.
• Therefore, good α, β, and γ are first found on a small baseline network (STEP 1), and the network is then scaled up as a whole (STEP 2).
Model
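A small Python sketch (the helper name and printed values are illustrative, not from the paper's code) of how the grid-searched α, β, and γ are turned into depth, width, and resolution multipliers for a chosen φ:

ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # found with phi = 1 in STEP 1

def compound_scale(phi: float) -> tuple[float, float, float]:
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    d = ALPHA ** phi   # layer-count multiplier
    w = BETA ** phi    # channel-count multiplier
    r = GAMMA ** phi   # input-resolution multiplier
    return d, w, r

# STEP 2: keep alpha/beta/gamma fixed and grow the whole network by increasing phi.
for phi in (1, 2, 3):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")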
20
EfficientNet
• Existing ConvNets with similar Top-1 and Top-5 accuracy were grouped and compared against EfficientNet.
• EfficientNet consistently uses far fewer parameters and FLOPS in every group (up to 8.4x fewer parameters and up to 16x fewer FLOPS).
Experiments
21
EfficientNet
• The table shows FLOPS and Top-1 accuracy for networks trained under different d, w, and r settings.
Experiments
24
EfficientNet
• We systematically study ConvNet scaling and identify that carefully balancing network width, depth, and
resolution is an important but missing piece, preventing us from better accuracy and efficiency.
• To address this issue, we propose a simple and highly effective compound scaling method, which enables us
to easily scale up a baseline ConvNet to any target resource constraints in a more principled way, while
maintaining model efficiency.
• Powered by this compound scaling method, we demonstrate that a mobile-size EfficientNet model can be
scaled up very effectively, surpassing state-of-the-art accuracy with an order of magnitude fewer parameters
and FLOPS, on both ImageNet and five commonly used transfer learning datasets.
Conclusions
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx


Editor's Notes

  • #9: M is the number of input channels, N is the number of output channels.
  • #15: GAP -> FC -> ReLU -> FC -> Sigmoid
  • #16: F, L, H, W, and C are fixed once the baseline network is chosen; d, w, and r are the coefficients used to scale the network. 1. Depth (d): making the network deeper (e.g., ResNet-100 -> ResNet-1000). 2. Width (w): increasing the number of channels. 3. Resolution (r): feeding a larger input image, from MxM up to rM x rM.
  • #18: φ is a coefficient the user sets according to how many resources are available; α, β, and γ are the variables found by the small grid search.