Min-Seo Kim
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: kms39273@naver.com
1
EfficientNet
• Convolutional Neural Networks (ConvNets) are commonly developed at a fixed hardware resource budget,
and then scaled up for better accuracy if more resources are available.
• AlexNet: trained on GPUs and popularized the ReLU activation function
• 8 layers
• VGGNet: stacks of 3x3 convolution filters
• 19 layers
• GoogLeNet: uses Inception modules for efficiency
• 22 layers
• ResNet: uses skip connections to prevent vanishing gradients
• 152 layers
Background
2
EfficientNet
• Little is still known about how to scale up a ConvNet efficiently.
• The most widely used scaling method is simply to make the ConvNet deeper or wider.
• Although it is possible to scale two or three dimensions arbitrarily, arbitrary scaling requires tedious manual tuning and still often yields sub-optimal accuracy and efficiency.
→ Is there a principled, theory-based scaling method that can make a ConvNet perform better?
Problem Statement
3
EfficientNet
• EfficientNet proposes a 'compound scaling method' that uniformly scales the width, depth, and resolution of
the model.
Contribution
4
EfficientNet
• Intuitively, as the input image grows, the network needs a larger receptive field to cover the wider area and more channels to capture the finer-grained patterns, so compound scaling can be expected to work well.
• In this paper, the authors are the first to quantitatively analyze the relationship between network width, depth, and resolution.
Contribution
5
EfficientNet
• EfficientNet is built mainly from mobile inverted bottleneck convolution (MBConv) blocks. To understand the MBConv block, two prior concepts are needed:
• Depthwise separable convolution (first presented in MobileNetV1)
• Mobile inverted bottleneck convolution (introduced in MobileNetV2)
Previous work
6
EfficientNet
Previous work
• A standard convolution layer has many parameters and a high computational cost.
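As a back-of-the-envelope comparison (notation is mine, not from the slides: K is the kernel size, C_in and C_out the input and output channel counts), the parameter counts of a standard convolution and of a depthwise separable convolution are roughly:

\[
\text{Params}_{\text{standard}} = K^{2} C_{in} C_{out},
\qquad
\text{Params}_{\text{dw-sep}} = K^{2} C_{in} + C_{in} C_{out},
\qquad
\frac{\text{Params}_{\text{dw-sep}}}{\text{Params}_{\text{standard}}} = \frac{1}{C_{out}} + \frac{1}{K^{2}}
\]

With 3x3 kernels and many output channels, the separable version therefore needs roughly 1/9 to 1/8 of the parameters, and the FLOPs per output position shrink by the same factor.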
7
EfficientNet
Previous work
• Depthwise Separable Convolution
MobileNetV1 (2017)
8
EfficientNet
Previous work
• Depthwise Separable Convolution
MobileNetV1 (2017)
• Although accuracy drops slightly, the amount of computation is reduced dramatically.
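A minimal PyTorch sketch of a MobileNetV1-style depthwise separable convolution; the module name and hyperparameters are illustrative, not the reference implementation:

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # depthwise: groups=in_ch makes the 3x3 conv filter each channel independently
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        # pointwise: a 1x1 conv recombines information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        x = self.act(self.bn2(self.pointwise(x)))
        return x

# Usage: y = DepthwiseSeparableConv(32, 64)(torch.randn(1, 32, 56, 56))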
9
EfficientNet
Previous work
• Mobile inverted bottleneck convolution (MBConv)
MobileNetV2 (2018)
• The paper introduces the manifold-of-interest idea ("the information carried by high-dimensional channels can in fact be embedded in low-dimensional channels"). Under this hypothesis, a 1x1 pointwise conv first expands the number of channels, a depthwise conv then extracts spatial information, and a final 1x1 conv projects back to a low-dimensional representation.
• In summary, the block proceeds as dimension expansion -> information extraction -> dimensionality reduction.
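A minimal sketch of the MobileNetV2-style inverted residual (MBConv) block described above; this is an illustrative assumption, not the official implementation (EfficientNet's own MBConv additionally uses swish and an SE block):

import torch
import torch.nn as nn

class MBConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_residual = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            # 1x1 expansion: lift the low-dimensional input to a wider space
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # depthwise 3x3: extract spatial information per channel
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear projection: back down to a low-dimensional representation
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out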
10
EfficientNet
Previous work
• ReLU6 activation
MobileNetV2 (2018)
• Used in every layer except the projection layer, which keeps a linear activation.
• The standard ReLU has no upper bound in the positive region.
• ReLU6 is an activation function that caps the output at 6 in the positive region.
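Concretely, ReLU6 simply clips the ordinary ReLU at 6:

\[
\mathrm{ReLU6}(x) = \min(\max(0, x), 6)
\]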
11
EfficientNet
Previous work
• h-swish activation
MobileNetV3 (2019)
• Not every layer uses h-swish; the first half of the network usually keeps ReLU.
• The MobileNet authors found that h-swish is only effective in the deeper layers.
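For reference, swish and its hardware-friendly approximation h-swish are defined as:

\[
\mathrm{swish}(x) = x \cdot \sigma(x),
\qquad
\text{h-swish}(x) = x \cdot \frac{\mathrm{ReLU6}(x + 3)}{6}
\]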
12
EfficientNet
Previous work
SENet
• Squeeze: global average pooling (GAP) averages each two-dimensional feature map down to a single value.
• GAP thus compresses an HxWxC feature map into a 1x1xC vector.
• SE Block
• Squeeze the features, then re-weight (excite) them.
• Attaching the module to Inception and ResNet improved the accuracy of the baseline models by about 0.6% on the ImageNet dataset.
13
EfficientNet
Previous work
• SE Block
• Squeeze the features, then re-weight (excite) them.
• Attaching the module to Inception and ResNet improved the accuracy of the baseline models by about 0.6% on the ImageNet dataset.
• Two fully-connected (FC) layers determine the relative importance of each channel (i.e., they model channel-wise dependencies).
• The channel descriptor vector is linearly transformed, passed through ReLU, linearly transformed again, and then passed through a pointwise sigmoid.
• Because the sigmoid maps the excitation output to values between 0 and 1, each channel is scaled according to its importance.
SENet
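A minimal PyTorch sketch of the SE block described above (GAP squeeze, two FC layers with ReLU and sigmoid, channel-wise rescaling); the reduction ratio of 16 follows the SENet paper, the rest is illustrative:

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: HxWxC -> 1x1xC
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # excitation weights in [0, 1]
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # scale each channel by its learned importance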
14
EfficientNet
• Since the goal is to maximize the accuracy of the model, the problem can be formulated as follows:
Methodology
• This is the general formulation of a ConvNet.
• With H, W, C the shape of the input tensor and F the conv layer of each stage, the ConvNet is written as shown in the formulation reproduced below.
• The key difficulty is that the optimal d, w, and r coefficients depend on one another, and the dependency changes under different resource constraints.
• For this reason, widely used ConvNets have been scaled along only one of these dimensions.
• Increasing a single coefficient does improve performance, but the gains saturate quickly.
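Written out in the paper's notation, the ConvNet is a composition of repeated stage operators, and scaling is posed as a constrained optimization over the coefficients d, w, and r:

\[
\mathcal{N} = \bigodot_{i = 1 \ldots s} \mathcal{F}_i^{L_i}\!\left(X_{\langle H_i, W_i, C_i \rangle}\right)
\]

\[
\max_{d, w, r} \ \mathrm{Accuracy}\!\left(\mathcal{N}(d, w, r)\right)
\quad \text{s.t.} \quad
\mathcal{N}(d, w, r) = \bigodot_{i = 1 \ldots s} \hat{\mathcal{F}}_i^{\, d \cdot \hat{L}_i}\!\left(X_{\langle r \cdot \hat{H}_i,\, r \cdot \hat{W}_i,\, w \cdot \hat{C}_i \rangle}\right),
\]

\[
\mathrm{Memory}(\mathcal{N}) \le \text{target memory},
\qquad
\mathrm{FLOPS}(\mathcal{N}) \le \text{target FLOPS}
\]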
15
EfficientNet
Methodology
• Depth: model capacity increases and more complex features can be captured.
• Training becomes harder because of the vanishing gradient problem.
• Width: widening each layer increases accuracy.
• The amount of computation grows quadratically with the width.
• Resolution: a higher input resolution lets the network learn finer-grained features, increasing accuracy.
• The amount of computation grows quadratically with the resolution.
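In other words, for a fixed architecture the cost scales roughly as

\[
\mathrm{FLOPS} \propto d \cdot w^{2} \cdot r^{2},
\]

linear in depth but quadratic in width and resolution, because widening changes both the input and output channel counts of each layer, and a higher resolution grows both spatial dimensions.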
16
EfficientNet
Baseline
• It follows that balancing the d, w, and r coefficients is crucial for ConvNet scaling.
• The authors propose a compound scaling method that uses a single compound coefficient to jointly adjust the width, depth, and resolution of the network (see the scaling rule below).
• Since the FLOPS of a ConvNet are dominated by the convolution operations, the total FLOPS scale in proportion to (α·β²·γ²)^φ.
• Because α·β²·γ² is constrained to be approximately 2, the total FLOPS grow roughly in proportion to 2^φ.
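The compound scaling rule from the paper:

\[
d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi},
\qquad \text{s.t.} \quad \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,
\quad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1,
\]

so the total FLOPS grow approximately as \((\alpha \cdot \beta^{2} \cdot \gamma^{2})^{\phi} \approx 2^{\phi}\).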
17
EfficientNet
Baseline
• This compares the Class Activation Maps (CAM) obtained when only one of the network's depth, width, or resolution is scaled against the maps obtained when all three are balanced with compound scaling.
• With compound scaling, the feature maps are clearly activated on the semantically meaningful regions of the image.
18
EfficientNet
• STEP 1: Fix φ = 1 and run a small grid search over α, β, and γ. The values found are α = 1.2, β = 1.1, and γ = 1.15, satisfying α·β²·γ² ≈ 2.
• STEP 2: Fix α, β, and γ and scale up the whole network by increasing φ.
• Searching for α, β, and γ directly on a large model could give better results, but for a large model the resources required for the search are prohibitive.
• Therefore, good α, β, and γ are first found on a small baseline network (STEP 1), and the network is then scaled up as a whole (STEP 2).
Model
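A small Python sketch (the helper name and printed values are illustrative, not from the paper's code) of how the grid-searched α, β, and γ are turned into depth, width, and resolution multipliers for a chosen φ:

ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # found with phi = 1 in STEP 1

def compound_scale(phi: float) -> tuple[float, float, float]:
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    d = ALPHA ** phi   # layer-count multiplier
    w = BETA ** phi    # channel-count multiplier
    r = GAMMA ** phi   # input-resolution multiplier
    return d, w, r

# STEP 2: keep alpha/beta/gamma fixed and grow the whole network by increasing phi.
for phi in (1, 2, 3):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")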
20
EfficientNet
• Existing ConvNets with similar Top-1 and Top-5 accuracy were grouped and compared against EfficientNet.
• EfficientNet consistently uses far fewer parameters and FLOPS in every group (up to 8.4x fewer parameters and up to 16x fewer FLOPS).
Experiments
21
EfficientNet
• The table shows FLOPS and Top-1 accuracy for networks trained under different d, w, and r settings.
Experiments
24
EfficientNet
• We systematically study ConvNet scaling and identify that carefully balancing network width, depth, and
resolution is an important but missing piece, preventing us from better accuracy and efficiency.
• To address this issue, we propose a simple and highly effective compound scaling method, which enables us
to easily scale up a baseline ConvNet to any target resource constraints in a more principled way, while
maintaining model efficiency.
• Powered by this compound scaling method, we demonstrate that a mobile-size EfficientNet model can be
scaled up very effectively, surpassing state-of-the-art accuracy with an order of magnitude fewer parameters
and FLOPS, on both ImageNet and five commonly used transfer learning datasets.
Conclusions
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx


Editor's Notes

  • #9: M is the number of input channels, N is the number of output channels.
  • #15: GAP -> FC -> ReLU -> FC -> Sigmoid
  • #16: F, L, H, W, and C are fixed once the baseline network is chosen; d, w, and r are the coefficients used to scale the network. 1. Depth (d): making the network deeper (e.g., ResNet-100 -> ResNet-1000). 2. Width (w): increasing the number of channels. 3. Resolution (r): feeding a larger input image, from MxM up to rM x rM.
  • #18: φ is a coefficient the user sets according to how many resources are available; α, β, and γ are the variables found by the small grid search.