Presented by ChanHyuk Lee
2021/06/13
Computer Graphics @ Korea University
EfficientDet
Mingxing Tan et al.
CVPR 2020
517 citations
1/
CONTENTS
Introduction
01
Related work
02
Proposed method
03
Experiments
04
Ablation study
05
Conclusion
06
2
3
Background
Detection architecture
00
Diagram: Backbone network → Feature Pyramid Network (FPN) → Prediction network, which outputs box prediction (regression) and class prediction (classification)
Introduction
• Recent detectors face a trade-off between accuracy and efficiency
• Most previous works focus only on a specific or a small range of resource requirements
• These points make it hard to apply recent detection models in industrial settings
• "Is it possible to build a scalable detection architecture with both higher
accuracy and better efficiency across a wide spectrum of resource constraints?"
Motivation
01
4
Introduction
Challenge 1. Efficient multi-scale feature fusion
01
5
• Feature fusion: combining feature maps from different levels
→ Conventional fusion methods do not account for the differing resolutions of the input features
Challenge 2. Model scaling
• Model scaling: scaling up the model architecture
→ Scaling up only a single factor (input image or network size) has limited benefit
Input-image up-scaling
Network up-scaling
02
Introduction
6
Related work
Multi-scale feature representation
01
• For detecting objects at multiple scales (a top-down fusion sketch follows this slide)
Diagram: backbone features p1 to p4 are projected by 1×1 convolutions, fused top-down via up-scaling into p1_out to p4_out, and each fused level feeds its own prediction layer (object area vs. prediction layer)
7
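To make the top-down fusion in the diagram concrete, here is a minimal PyTorch sketch (not the presenter's or the paper's code); the channel sizes and nearest-neighbor up-scaling are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Minimal top-down feature pyramid: 1x1 convs project backbone features,
    and each higher level is up-scaled and added into the level below it."""
    def __init__(self, in_channels=(64, 128, 256, 512), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels])

    def forward(self, feats):                # feats = [p1, p2, p3, p4], fine to coarse
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        outs = [laterals[-1]]                # start from the coarsest level (p4)
        for lat in reversed(laterals[:-1]):
            top = F.interpolate(outs[0], size=lat.shape[-2:], mode="nearest")
            outs.insert(0, lat + top)        # fuse the up-scaled higher level into this level
        return outs                          # [p1_out, ..., p4_out], one per prediction layer
```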
Related work
Model scaling
02
• EfficientNet (EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, Mingxing Tan et al., ICML 2019)
• Jointly scales up depth, width, and resolution (compound scaling); see the sketch after this slide
Diagram: compound scaling jointly increases network depth (stacked layers f), width, and input resolution
8
Q&A
9
Proposed method
01 RetinaNet architecture
10
02 EfficientDet architecture
BiFPN : Efficient bidirectional cross-scale connections and weighted feature fusion
Problem formulation
01
11
• Removes the two nodes that have only a single input edge (compared to PANet)
• Adds a skip connection from the original input to the output node at the same level
• Fuses inputs with learnable weights (weighted feature fusion)
• Repeats the BiFPN layer multiple times
Diagram: BiFPN cross-scale connections, with a learnable fusion weight w attached to every input edge
BiFPN
Weighted Feature Fusion
02
• Inputs at different resolutions usually contribute unequally to the output feature
• Each input feature is therefore given a learnable weight that captures its contribution
Output feature: O = Σ_i w_i · I_i (weight w_i applied to input feature I_i)
Softmax-based fusion: O = Σ_i ( e^{w_i} / Σ_j e^{w_j} ) · I_i
Fast normalized fusion: O = Σ_i ( w_i / (ε + Σ_j w_j) ) · I_i (≈30% speed gain on GPU; sketched below)
12
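A minimal PyTorch sketch of the fast normalized fusion formula above (an illustration, not the official implementation); the module name is an assumption, while the ReLU that keeps the weights non-negative follows the paper's description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FastNormalizedFusion(nn.Module):
    """O = sum_i (w_i / (eps + sum_j w_j)) * I_i, with learnable w_i >= 0."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.fusion_weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):                 # list of same-shape feature maps
        w = F.relu(self.fusion_weights)        # keep each weight non-negative
        w = w / (self.eps + w.sum())           # cheap normalization instead of softmax
        return sum(wi * x for wi, x in zip(w, inputs))
```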
EfficientDet
EfficientDet Architecture
01
• Uses an EfficientNet pretrained on ImageNet as the backbone
• The prediction network's weights are shared across all feature levels (see the sketch below)
13
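A minimal sketch of the weight sharing mentioned above: a single head module is applied to every BiFPN output, so its parameters are reused across levels. The layer count, channel count, and the 9-anchor assumption are placeholders, not the released configuration.

```python
import torch
import torch.nn as nn

class SharedBoxHead(nn.Module):
    """One conv stack whose weights are shared across all pyramid levels."""
    def __init__(self, channels: int = 64, num_layers: int = 3, num_anchors: int = 9):
        super().__init__()
        layers = []
        for _ in range(num_layers):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU()]
        self.body = nn.Sequential(*layers)
        self.out = nn.Conv2d(channels, num_anchors * 4, 3, padding=1)  # 4 box coords per anchor

    def forward(self, pyramid_feats):          # list of BiFPN outputs, e.g. P3..P7
        return [self.out(self.body(f)) for f in pyramid_feats]  # same weights at every level
```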
EfficientDet
Compound scaling
02
• Previous works mostly scale up the baseline network, use larger input images, or stack
more FPN layers
• The new compound scaling method jointly scales up all dimensions: the backbone network, the BiFPN
network, the prediction network, and the input resolution
Backbone network
02-1
• Reuses the same width/depth scaling coefficients as EfficientNet-B0 to B6
BiFPN network
02-2
• A grid search over {1.2, 1.25, 1.3, 1.35, 1.4, 1.45} finds the best width-scaling factor (1.35 is selected)
W_bifpn = 64 · (1.35^φ) (the number of channels),  D_bifpn = 3 + φ (the number of layers)
14
EfficientDet
Prediction network
02-3
• The prediction network's width is the same as the BiFPN width; its depth is D_box = D_class = 3 + ⌊φ/3⌋ (the number of layers)
Input image resolution
02-4
• The input resolution scales as R_input = 512 + φ · 128 (kept divisible by 128 since features up to level 7 are used)
Overall scaling output
02-5
• Combining these rules gives the EfficientDet-D0 to D7 configurations (see the sketch after this slide)
15
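The sketch below turns the scaling rules above into concrete per-model configurations. The formulas follow the EfficientDet paper; note that the officially released configs round the channel counts to slightly different values, and D7 additionally enlarges only the resolution.

```python
import math

def efficientdet_config(phi: int):
    """Hedged sketch of the compound-scaling rules for EfficientDet-D0..D6."""
    return {
        "bifpn_width": int(64 * (1.35 ** phi)),      # number of channels
        "bifpn_depth": 3 + phi,                      # number of BiFPN layers
        "head_depth":  3 + math.floor(phi / 3),      # prediction-net layers (width = bifpn_width)
        "resolution":  512 + phi * 128,              # input image resolution
        "backbone":    f"EfficientNet-B{phi}",       # reuse EfficientNet scaling coefficients
    }

for phi in range(7):
    print(f"EfficientDet-D{phi}:", efficientdet_config(phi))
```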
Q&A
16
Experiments
Experiment configuration
01
• Dataset: COCO 2017 with 118K training images
• Optimizer: SGD with momentum 0.9 and weight decay 4e-5
• Learning rate: linearly warmed up from 0 to 0.16 during the first epoch, then annealed with a cosine decay rule (sketched below)
• Batch normalization is applied after every convolution layer
• Depthwise separable convolutions are used for the BiFPN and prediction networks
• Activation function: Swish (x · Sigmoid(βx))
• Augmentation: multi-resolution cropping / scaling / flipping
17
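A small sketch of the learning-rate schedule and activation described above; the warm-up length, total step count, and the decay-to-zero behavior are assumptions, and this is not the authors' training code.

```python
import math

def learning_rate(step: int, steps_per_epoch: int, total_steps: int, peak_lr: float = 0.16):
    """Linear warm-up from 0 to peak_lr over the first epoch, then cosine decay to 0."""
    if step < steps_per_epoch:
        return peak_lr * step / steps_per_epoch
    progress = (step - steps_per_epoch) / max(1, total_steps - steps_per_epoch)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

def swish(x: float, beta: float = 1.0):
    """Swish activation: x * sigmoid(beta * x)."""
    return x / (1.0 + math.exp(-beta * x))
```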
Experiments
Loss function
02
• Focal loss is used for the detection (classification) loss: FL(p_t) = −(1 − p_t)^γ · log(p_t) (sketched below)
• The class-imbalance problem is dominated by easy negative samples
• Focal loss down-weights easy examples so that training focuses on hard samples
• If p_t ≈ 1 (easy example): −(1 − 0.999)^γ · log(p_t) ≈ 0
• If p_t is small (hard example): −(1 − 0.001)^γ · log(p_t) remains large
(p_t: predicted probability of the ground-truth class)
18
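A hedged sketch of binary focal loss as used for the classification branch; the α = 0.25 and γ = 1.5 defaults are the values reported in the EfficientDet paper, and the code is illustrative rather than the official implementation.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha: float = 0.25, gamma: float = 1.5):
    """FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t); targets are 0/1 floats."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)            # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()      # easy examples (p_t near 1) are down-weighted
```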
Experiments
Performance on COCO
03
• Latency is the inference latency with batch size 1
• AA denotes auto-augmentation
19
Experiments
Model size and inference latency comparison
04
• Comparison of model size and inference latency on a GPU (Titan V) and a CPU (Xeon)
20
Experiments
EfficientDet for Semantic Segmentation
05
• The P2 level of the BiFPN is used for per-pixel prediction with an EfficientDet-D4 model
• Compared against DeepLabv3
21
Ablation study
Disentangling Backbone and BiFPN
01
• Both the backbone network and the BiFPN feature network of EfficientDet achieve higher AP and
better efficiency than the prior networks
22
Ablation study
BiFPN Cross Scale Connection
02
• For a fair comparison, FPN and PANet are also repeated multiple times and use the same depthwise separable convolutions
• BiFPN achieves the best accuracy with fewer parameters and FLOPs
23
Ablation study
Softmax vs Fast Normalized fusion
03
• The fast normalized fusion approach achieves accuracy similar to the softmax-based method, while being faster
• Figure 5 illustrates the learned weights at three feature fusion nodes
24
Ablation study
Compound Scaling
04
• EfficientDet jointly scales up the backbone, BiFPN, prediction network, and input resolution
• The proposed compound scaling achieves better accuracy than the other scaling methods
25
Conclusion
Proposes a weighted bidirectional feature network (BiFPN) and a customized compound scaling
method in order to improve both accuracy and efficiency
01
EfficientDet achieves better accuracy and efficiency than prior art across a wide
spectrum of resource constraints
02
EfficientDet achieves state-of-the-art accuracy with far fewer parameters and FLOPs in object
detection and semantic segmentation
03
26
THANK
YOU
27