SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing
Xue Yang, Junchi Yan (Member, IEEE), Xiaokang Yang (Fellow, IEEE), Jin Tang, Wenlong Liao, Tao He
Neha
SCI Research Lab
neha@kent.edu
Contents:
Object detection
Instance level denoising (InLD) in the Feature Map
The pipeline
How Instance-level Feature Map Denoising works
Mathematical foundation to remove instance level noise
Rotated object detection
Horizontal vs Rotated object detection
Datasets
Experiment
Effect of Instance-Level Denoising
Results
Object detection
When humans look at images or video, they can recognize and
locate objects of interest within moments.
Object detection is a computer vision technique for locating
instances of objects in images or videos. Its goal is to
replicate this human ability using a computer.
Limitations of current detectors
- Small size, cluttered arrangement, and arbitrary orientations
1) Small objects
– Overwhelmed by complex surroundings
2) Cluttered arrangement
– Densely arranged objects
– Inter-class feature coupling and intra-class feature boundary blur
3) Arbitrary orientations
– Rotation detection > axis-aligned detection
The horizontal bounding box of a rotated object is looser than a rotated box
aligned with the object, so it contains a large portion of background or nearby
cluttered objects as disturbance.
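How loose a horizontal box becomes can be estimated with a little geometry. The sketch below is only an illustration: the function names and the 100 × 20 example object are assumptions, not values from the slides. It computes the axis-aligned box that encloses a rotated rectangle and the fraction of that box actually covered by the object.

```python
import math

def horizontal_box_size(w, h, angle_deg):
    """Width and height of the axis-aligned box enclosing a w x h
    rectangle rotated by angle_deg about its center."""
    t = math.radians(angle_deg)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    return w * c + h * s, w * s + h * c

def object_fill_ratio(w, h, angle_deg):
    """Fraction of the horizontal box area that belongs to the object.
    The remainder is background or neighbouring objects."""
    box_w, box_h = horizontal_box_size(w, h, angle_deg)
    return (w * h) / (box_w * box_h)

# Hypothetical elongated object (for example a ship or a truck).
for angle in (0, 15, 45):
    print(angle, round(object_fill_ratio(100, 20, angle), 3))
```

For an unrotated object the ratio is 1; for an elongated object at 45 degrees most of the horizontal box is background, which is the disturbance the slide refers to.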
Goal: dismiss the noisy interference from both the background
and other foreground objects
Types of noise
1. Image-level noise
2. Instance-level noise
– Mutual interference between objects
– Interference between object and
background
Image-level denoising is performed on the raw image
for the purpose of image enhancement,
and it also improves the detection
performance on small objects.
Instance-level denoising (InLD) in the Feature Map
InLD is realized by supervised segmentation.
InLD is applied to decouple the features of
different object categories into their respective channels.
At the same time, in the spatial domain, the features of objects are
enhanced and those of the background are weakened.
• Rotated objects: smooth L1 loss + IoU constant factor
• > five-parameter regression
• Discontinuous boundaries
• Periodicity of angular (PoA)
• Exchangeability of edges (EoE)
The pipeline
SCRDet++ mainly consists of four modules:
– Feature extraction
– Image-level denoising module
– Instance-level denoising module
– 'Class + box' branch for classification and bounding box prediction
Fig 1.
Instance-level Feature Map Denoising
Instance-level noise has adverse effects on the feature map, such as:
– A non-object with an object-like shape has a higher
response in the feature map, especially for small
objects (see the top row of Fig. 2).
– Cluttered objects that are densely arranged tend to
suffer from inter-class feature coupling and
intra-class feature boundary blurring.
– The response of an object is not prominent enough
when it is surrounded by background.
Fig. 2. Images (left) and their feature maps before (middle) and after (right) the
instance-level denoising operation. First row: non-object with object-like
shape. Second row: inter-class feature coupling and intra-class feature
boundary blurring.
Mathematical foundation to remove instance-level noise
– Reweight the convolutional response maps [10].
– Important parts are weighted more heavily than uninformative ones.
Fig 3.
- X, Y ∈ R^(C×H×W) are two feature maps of the input image
- A(X) is an attention function
- ⊙ is the element-wise product
- W_s ∈ R^(H×W) and W_c ∈ R^C denote the spatial weight and
channel weight
- w_c^i indicates the weight of the i-th channel
- ∪ is the concatenation operation that connects tensors along the
channels of the feature map
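The reweighting described by these symbols can be sketched with NumPy broadcasting. The equation itself sits in the figure and is not reproduced in the text, so the form used here, Y = A(X) ⊙ X = W_s ⊙ X ⊙ W_c, is an assumed reading of the definitions; the function name `reweight` is hypothetical.

```python
import numpy as np

def reweight(X, Ws, Wc):
    """Reweight a feature map with a spatial and a channel weight.

    X  : feature map, shape (C, H, W)
    Ws : spatial weight, shape (H, W), shared by every channel
    Wc : channel weight, shape (C,), one scalar per channel
    Assumed form: Y = Ws * X * Wc (all products element-wise).
    """
    return Ws[None, :, :] * X * Wc[:, None, None]

# Toy example: 2 channels on a 2 x 3 grid.
X = np.ones((2, 2, 3))
Ws = np.full((2, 3), 0.5)        # damp every location by half
Wc = np.array([1.0, 2.0])        # emphasise the second channel
Y = reweight(X, Ws, Wc)
print(Y.shape)
```

Computing each channel as Ws * X[i] * Wc[i] and stacking the results along the channel axis gives the same array, which corresponds to the concatenation over channels in the definitions.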
The new formulation, which considers a total of I object categories plus one additional
background category, is as follows:
During the implementation of InLD, the learned weights are regarded as the result of a semantic segmentation task.
The feature responses of each category in the layers before the output layer are separated in the channel dimension,
and the feature responses of the foreground and background are polarized in the spatial dimension.
• Channel dimension = inter-class features
• Spatial dimension = intra-class features
• Original feature map + Denoised feature map = Decoupled feature map
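A toy version of this channel-wise decoupling is sketched below. It is an illustration under stated assumptions rather than the SCRDet++ implementation: the channels are assumed to split evenly into I + 1 groups, the weights are assumed to come from a softmax over per-category segmentation scores, and the names `inld_reweight` and `seg_logits` are hypothetical. How the weighted map is combined with the original feature map is not specified on the slide and is left out.

```python
import numpy as np

def inld_reweight(X, seg_logits):
    """Weight each category's channel group by that category's
    segmentation probability at every pixel.

    X          : feature map, shape (C, H, W)
    seg_logits : per-category scores, shape (I + 1, H, W),
                 where the extra category is background
    """
    n_cat = seg_logits.shape[0]
    C = X.shape[0]
    assert C % n_cat == 0, "assumes channels split evenly across categories"
    # Softmax over the category axis (shifted for numerical stability).
    z = seg_logits - seg_logits.max(axis=0, keepdims=True)
    P = np.exp(z)
    P /= P.sum(axis=0, keepdims=True)
    # Repeat each category map so it covers its own channel group.
    W = np.repeat(P, C // n_cat, axis=0)      # shape (C, H, W)
    return W * X

# 3 categories (2 object classes + background), 2 channels per category.
X = np.ones((6, 2, 2))
seg_logits = np.zeros((3, 2, 2))
seg_logits[1] = 20.0                          # every pixel belongs to category 1
Y = inld_reweight(X, seg_logits)
print(Y[:, 0, 0].round(3))
```

In this toy case only the channels assigned to category 1 keep their response; the other groups are suppressed, which is the separation in the channel dimension that the bullets describe.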
Rotated object detection
– Ideal case: the blue box rotates
counterclockwise to the red box.
– Limitation: this regression incurs a high loss due to the
periodicity of angular (PoA) and the
exchangeability of edges (EoE), so the model
instead has to rotate the bounding box
clockwise while scaling w and h, which is more
complex.
– Thus, an IoU constant factor is added to the
traditional smooth L1 loss.
– In the new regression loss:
   – the smooth L1 term determines the direction of gradient
   propagation
   – the IoU factor determines the magnitude of the gradient
Fig 4.
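The boundary problem can be reproduced numerically. The sketch below is a simplified illustration, not the authors' code: it applies smooth L1 to raw (x, y, w, h, θ) differences instead of encoded offsets, takes |−log(IoU)| as the IoU factor, and computes the rotated IoU with a small convex-polygon clipping routine. All function names are hypothetical. It only evaluates forward values; in an autodiff framework the smooth L1 term would be divided by its own magnitude treated as a constant, so that it contributes only the gradient direction.

```python
import math

def rect_corners(cx, cy, w, h, angle_deg):
    """Corners (counter-clockwise) of a w x h rectangle rotated about its center."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in local]

def polygon_area(pts):
    """Shoelace formula for a simple polygon."""
    return 0.5 * abs(sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1])))

def clip_convex(subject, clipper):
    """Sutherland-Hodgman clipping of a polygon by a convex CCW polygon."""
    output = list(subject)
    for (ax, ay), (bx, by) in zip(clipper, clipper[1:] + clipper[:1]):
        if not output:
            break
        points, output = output, []

        def side(p):
            # Positive when p lies to the left of edge a -> b (inside).
            return (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)

        for p, q in zip(points, points[1:] + points[:1]):
            sp, sq = side(p), side(q)
            if sp >= 0:
                output.append(p)
            if sp * sq < 0:  # the segment p -> q crosses the clip edge
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return output

def rotated_iou(box_a, box_b):
    pa, pb = rect_corners(*box_a), rect_corners(*box_b)
    inter_pts = clip_convex(pa, pb)
    inter = polygon_area(inter_pts) if len(inter_pts) >= 3 else 0.0
    return inter / (polygon_area(pa) + polygon_area(pb) - inter)

def smooth_l1(pred, target, beta=1.0):
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total

def iou_smooth_l1_value(pred, target):
    """Forward value of the IoU factor: |-log(IoU)|."""
    return abs(-math.log(max(rotated_iou(pred, target), 1e-9)))

# Two encodings of almost the same rectangle on either side of the angle boundary.
pred = (0.0, 0.0, 100.0, 20.0, -1.0)
gt = (0.0, 0.0, 20.0, 100.0, -89.0)
print(smooth_l1(pred, gt), rotated_iou(pred, gt), iou_smooth_l1_value(pred, gt))
```

The two boxes overlap almost completely, yet plain smooth L1 on the parameters is very large because w and h are exchanged and the angle jumps by 88 degrees. The IoU factor stays small, so the loss no longer spikes at the boundary.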
Horizontal vs Rotated Object detection
Horizontal Object detection
Uses: Multi-task Loss
Rotated object detection
Uses: smooth L1 loss + IoU constant factor
Datasets
• DOTA: 2,806 aerial images, 15 object classes, 188,282 instances
• DIOR: 23,463 aerial images, 20 object classes, 190,288 instances
• UCAS-AOD: 1,510 aerial images, 2 object classes, 14,596 instances
• BSTLD: 13,427 camera images, few instances of many categories
• S2TLD: 5,786 images, 5 object categories, 14,130 instances
In addition to the above datasets, the authors also use the natural image dataset COCO [8] and the scene text dataset
ICDAR2015 [28] for further evaluation.
Experiment
Server with a GeForce RTX 2080 Ti with 11 GB of memory.
– Initialization with ResNet50 [14] by default.
– The weight decay and momentum for all experiments are set to 0.0001 and 0.9, respectively.
– A momentum optimizer was employed over 8 GPUs with a total of 8 images per minibatch.
– The standard evaluation protocol of COCO is followed; for the other datasets, the anchors of the RetinaNet-based
methods use seven aspect ratios {1, 1/2, 2, 1/3, 3, 5, 1/5} and three scales {2^0, 2^(1/3), 2^(2/3)}.
– For the rotating anchor-based method (RetinaNet-R), the angle is set by an arithmetic progression
from −90° to −15° with an interval of 15 degrees.
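The anchor settings above can be enumerated directly. This sketch reads the three scales as 2^0, 2^(1/3) and 2^(2/3) (the usual RetinaNet setting) and treats the aspect ratio as h / w; both readings, the `base_size` parameter and the function name are assumptions for illustration.

```python
import itertools
import math

ASPECT_RATIOS = [1, 1 / 2, 2, 1 / 3, 3, 5, 1 / 5]
SCALES = [2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)]
ANGLES = list(range(-90, 0, 15))  # -90, -75, ..., -15 degrees

def make_anchors(base_size, rotated=False):
    """Return (w, h, angle) anchors for one feature-map location.

    Horizontal anchors carry angle None; rotated anchors take
    every angle in ANGLES.
    """
    angles = ANGLES if rotated else [None]
    anchors = []
    for scale, ratio, angle in itertools.product(SCALES, ASPECT_RATIOS, angles):
        area = (base_size * scale) ** 2
        w = math.sqrt(area / ratio)  # keep the area fixed, ratio = h / w
        h = w * ratio
        anchors.append((w, h, angle))
    return anchors

print(len(make_anchors(32)), len(make_anchors(32, rotated=True)))
```

With these settings a horizontal detector places 7 × 3 = 21 anchors per location, and the rotated variant 21 × 6 = 126.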
Effect of Instance-Level Denoising
– Improved accuracy
– Effect of IoU-Smooth L1 Loss
   – Eliminates the boundary effects of the angle
   – The model regresses the object coordinates more easily
   – The new loss improves the accuracy of three detectors (RetinaNet-R [4], SCRDet [3], FPN [15]) to 69.83%, 68.65% and
   76.20%, respectively
– Effect of Data Augmentation and Backbone
   – Used ResNet101
   – Improvement from 69.81% to 72.98%
   – Final performance of the model improved from 72.98% to 74.41% by using ResNet152 as backbone
– Combined with state-of-the-art algorithms on
two datasets, DOTA [16] and DIOR [17], InLD
outperforms all other models and achieves
the best performance, 76.56% and 76.81%
respectively.
– The method also achieves 77.80%
and 75.11% mAP with the FPN-based and RetinaNet-based
methods, respectively.
– Table 1 compares
performance on the UCAS-AOD dataset.
– The method achieves 96.95% for the OBB task,
the best among all existing published
methods.
Method                      mAP     Plane   Car
YOLOv2 [18]                 87.90   96.60   79.20
R-DFPN [12]                 89.20   95.90   82.50
DRBox [19]                  89.95   94.90   85.00
S2ARN [20]                  94.90   97.60   92.20
RetinaNet-H [4]             95.47   97.34   93.60
ICN [21]                    95.67   -       -
FADet [22]                  95.71   98.69   92.72
R3Det [4]                   96.17   98.20   94.14
SCRDet++ (R3Det-based)      96.95   98.93   94.97
TABLE 1: Performance by accuracy (%) on the UCAS-AOD dataset.
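Since mAP is the mean of the per-class AP values, the mAP column can be cross-checked against the Plane and Car columns. The sketch below does this for every row that reports both classes; the dictionary simply copies the table values, and the variable names are illustrative.

```python
# (mAP, Plane AP, Car AP) copied from the UCAS-AOD table; ICN is omitted
# because its per-class values are not reported.
ROWS = {
    "YOLOv2": (87.90, 96.60, 79.20),
    "R-DFPN": (89.20, 95.90, 82.50),
    "DRBox": (89.95, 94.90, 85.00),
    "S2ARN": (94.90, 97.60, 92.20),
    "RetinaNet-H": (95.47, 97.34, 93.60),
    "FADet": (95.71, 98.69, 92.72),
    "R3Det": (96.17, 98.20, 94.14),
    "SCRDet++ (R3Det-based)": (96.95, 98.93, 94.97),
}

def mean_ap(class_aps):
    """mAP as the arithmetic mean of per-class AP."""
    return sum(class_aps) / len(class_aps)

for name, (reported, plane, car) in ROWS.items():
    print(f"{name}: reported {reported:.2f}, recomputed {mean_ap([plane, car]):.3f}")

# Gain of SCRDet++ over its R3Det baseline, in mAP points.
gain = ROWS["SCRDet++ (R3Det-based)"][0] - ROWS["R3Det"][0]
```

Every reported mAP agrees with the two-class mean to within rounding, and the gain over the R3Det baseline is about 0.78 points.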
Results:
References:
[1] S. M. Azimi, E. Vig, R. Bahmanyar, M. Körner, and P. Reinartz, “Towards multi-class object detection in unconstrained remote sensing imagery,” in Asian Conference on Computer Vision. Springer, 2018, pp. 150–165.
[2] J. Ding, N. Xue, Y. Long, G.-S. Xia, and Q. Lu, “Learning roi transformer for oriented object detection in aerial images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[3] X. Yang, J. Yang, J. Yan, Y. Zhang, T. Zhang, Z. Guo, X. Sun, and K. Fu, “Scrdet: Towards more robust detection for small, cluttered and rotated objects,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), October 2019.
[4] X. Yang, Q. Liu, J. Yan, and A. Li, “R3det: Refined single-stage detector with feature refinement for rotating object,” arXiv preprint arXiv:1908.05612, 2019.
[5] W. Qian, X. Yang, S. Peng, Y. Guo, and C. Yan, “Learning modulated loss for rotated object detection,” arXiv preprint arXiv:1911.08299, 2019.
[6] Y. Xu, M. Fu, Q. Wang, Y. Wang, K. Chen, G.-S. Xia, and X. Bai, “Gliding vertex on the horizontal bounding box for multi-oriented object detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
[7] H. Wei, L. Zhou, Y. Zhang, H. Li, R. Guo, and H. Wang, “Oriented objects as pairs of middle lines,” arXiv preprint arXiv:1912.10694, 2019.
[8] Z. Xiao, L. Qian, W. Shao, X. Tan, and K. Wang, “Axis learning for orientated objects detection in aerial images,” Remote Sensing, vol. 12, no. 6, p. 908, 2020.
[9] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7794–7803.
[10] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7132–7141.
[11] X. Yang, Q. Liu, J. Yan, and A. Li, “R3det: Refined single-stage detector with feature refi
[12] X. Yang, H. Sun, K. Fu, J. Yang, X. Sun, M. Yan, and Z. Guo, “Automatic ship detection in remote sensing images from google earth of complex scenes based on multiscale rotation dense feature pyramid networks,” Remote Sensing, vol. 10, no. 1, p. 132, 2018.
[13] X. Yang, H. Sun, X. Sun, M. Yan, Z. Guo, and K. Fu, “Position detection and direction prediction for arbitrary-oriented ships via multitask rotation region convolutional neural network,” IEEE Access, vol. 6, pp. 50839–50849, 2018.
[14] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[15] T.-Y. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie, “Feature pyramid networks for object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, no. 2, 2017, p. 4.
[16] G.-S. Xia, X. Bai, J. Ding, Z. Zhu, S. Belongie, J. Luo, M. Datcu, M. Pelillo, and L. Zhang, “Dota: A large-scale dataset for object detection in aerial images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[17] K. Li, G. Wan, G. Cheng, L. Meng, and J. Han, “Object detection in optical remote sensing images: A survey and a new benchmark,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 159, pp. 296–307, 2020.
[18] J. Redmon and A. Farhadi, “Yolo9000: better, faster, stronger,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 7263–7271.
[19] L. Liu, Z. Pan, and B. Lei, “Learning a rotation invariant detector with rotatable bounding box,” arXiv preprint arXiv:1711.09405, 2017.
[20] S. Bao, X. Zhong, R. Zhu, X. Zhang, Z. Li, and M. Li, “Single shot anchor refinement network for oriented object detection in optical remote sensing imagery,” IEEE Access, vol. 7, pp. 87150–87161, 2019.
[21] S. M. Azimi, E. Vig, R. Bahmanyar, M. Körner, and P. Reinartz, “Towards multi-class object detection in unconstrained remote sensing imagery,” in Asian Conference on Computer Vision. Springer, 2018, pp. 150–165.
[22] C. Li, C. Xu, Z. Cui, D. Wang, T. Zhang, and J. Yang, “Feature-attentioned object detection in remote sensing imagery,” in 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019, pp. 3886–3890.
