Aerial Object Detection
HyeongJun Kwon
2019-2
Contents
2
1. Detecting Oriented Text in Natural Images by Linking Segments
2. R²CNN
SegLink
3
Network Overview
4
Main idea: decompose text into two locally detectable elements, namely segments and links.
SegLink
The key advantage of this approach is that long and oriented text can now be detected locally, since both basic elements (segments and links) are locally detectable.
5
SegLink
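Segment: an oriented box that covers part of a word
Default box on the $l$-th layer:
Predicted segment box:
Default box scale: $a_l = \lambda \frac{w_I}{w_l}$, where $\lambda = 1.5$, $w_I$ is the input image width, and $w_l$ is the width of the $l$-th feature map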
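A minimal sketch of the scale formula, assuming a 512-pixel-wide input and a set of hypothetical feature-map widths (neither value is stated on the slide):

```python
def default_box_scale(image_width, feature_map_width, lam=1.5):
    """Return a_l = lam * w_I / w_l, the side length of the default box on layer l."""
    return lam * image_width / feature_map_width


# Hypothetical values for illustration only (not taken from the slides).
image_width = 512
feature_map_widths = [64, 32, 16, 8, 4, 2]

scales = [default_box_scale(image_width, w) for w in feature_map_widths]
```

Under these assumptions each coarser layer doubles the default box size, so deeper layers cover larger text.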
6
SegLink
Link: connects a pair of adjacent segments
- Within-layer link
- Cross-layer link
7
SegLink
Link: connects a pair of adjacent segments
- Within-layer link: connects a segment to its neighbors on the same layer. Because segments are detected locally, a pair of neighboring segments is also adjacent on the input image. Every segment has 8 within-layer neighbors.
8
SegLink
Link: connects a pair of adjacent segments
- Cross-layer link: connects a segment to its neighbors on the preceding layer. Segments of the same word can be detected on multiple layers at the same time, producing redundancies that these links help merge. Every segment has 4 cross-layer neighbors.
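The two neighborhoods can be sketched as index sets on the feature-map grid. This is an illustration only: it assumes the preceding layer has exactly twice the resolution of the current one, and the function names are hypothetical.

```python
def within_layer_neighbors(x, y, width, height):
    """8-connected neighbors of cell (x, y) on the same feature map, clipped to the map."""
    return [
        (nx, ny)
        for ny in range(y - 1, y + 2)
        for nx in range(x - 1, x + 2)
        if (nx, ny) != (x, y) and 0 <= nx < width and 0 <= ny < height
    ]


def cross_layer_neighbors(x, y):
    """The 4 cells on the preceding (assumed 2x larger) feature map covered by cell (x, y)."""
    return [(nx, ny) for ny in (2 * y, 2 * y + 1) for nx in (2 * x, 2 * x + 1)]
```

Cells on the border of a feature map have fewer than 8 within-layer neighbors in this sketch, because out-of-range positions are dropped.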
9
SegLink
10
SegLink
11
SegLink
12
SegLink
13
SegLink
Ground Truths of Segments and Links
A default box is labeled as positive for a word when:
1) the center of the box is inside the word bounding box;
2) the ratio between the box size $a_l$ and the word height $h$ satisfies:
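The inequality for condition 2 is not shown in the extracted slide. The sketch below assumes the form max(a_l / h, h / a_l) ≤ 1.5; the threshold `max_ratio` and the word format `(cx, cy, w, h, theta)` are assumptions made for illustration.

```python
import math


def point_in_oriented_box(px, py, cx, cy, w, h, theta):
    """True if (px, py) lies inside the w x h box centered at (cx, cy), rotated by theta radians."""
    dx, dy = px - cx, py - cy
    # Rotate the offset by -theta to express it in the box's own frame.
    local_x = dx * math.cos(theta) + dy * math.sin(theta)
    local_y = -dx * math.sin(theta) + dy * math.cos(theta)
    return abs(local_x) <= w / 2 and abs(local_y) <= h / 2


def is_positive_default_box(box_cx, box_cy, a_l, word, max_ratio=1.5):
    """Check both matching conditions; max_ratio is an assumed threshold."""
    cx, cy, w, h, theta = word
    center_inside = point_in_oriented_box(box_cx, box_cy, cx, cy, w, h, theta)
    scale_ok = max(a_l / h, h / a_l) <= max_ratio
    return center_inside and scale_ok
```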
14
SegLink
Objective
OHEM (online hard example mining): negatives and positives are kept at a 3:1 ratio in the segment set
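A minimal sketch of the 3:1 selection, assuming per-sample losses and binary labels are already available; the function name and input format are hypothetical.

```python
def select_hard_negatives(losses, labels, neg_pos_ratio=3):
    """Return sorted indices of all positives plus the highest-loss negatives.

    At most neg_pos_ratio negatives are kept per positive.
    """
    positives = [i for i, lab in enumerate(labels) if lab == 1]
    negatives = [i for i, lab in enumerate(labels) if lab == 0]
    num_neg = min(len(negatives), neg_pos_ratio * len(positives))
    hardest = sorted(negatives, key=lambda i: losses[i], reverse=True)[:num_neg]
    return sorted(positives + hardest)
```

Only the selected indices would contribute to the classification loss, which keeps easy background samples from dominating training.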
15
SegLink
Implementation Details
Key | Value
Dataset | SynthText (before fine-tuning), then a real dataset
Optimizer | Standard SGD
Batch size | 32
Learning rate | $10^{-3}$ for the first 60k iterations, $10^{-4}$ for the remaining 30k
Framework | TensorFlow
Environment | Xeon 8-core CPU, 4 Titan X GPUs, 64GB RAM
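The learning-rate schedule in the table can be written as a piecewise-constant function. This is a sketch with a hypothetical function name, not the authors' training code.

```python
def seglink_learning_rate(iteration):
    """1e-3 for the first 60k iterations, 1e-4 for the remaining 30k."""
    return 1e-3 if iteration < 60_000 else 1e-4
```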
16
SegLink
Figure: a good example and a bad example of detection results
17
SegLink
Limitations
1. The thresholds 𝜶 and 𝜷 are set manually by a grid search
2. Curved text and distant segments are hard to handle
18
R²CNN
Network Overview
19
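R²CNN
The angle regression target is unstable at some special points (boundary discontinuity): two nearly identical boxes, such as one rotated by almost 90° and one rotated by almost −90°, have very different angle targets.
Instead of an angle, the inclined box is represented by the coordinates (x1, y1, x2, y2, h): its first two corner points in clockwise order and its height.
Figure: unstable condition of the angle (boundary discontinuity), illustrated with angles θ1 and θ2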
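The instability can be illustrated numerically. The sketch below uses made-up box sizes and hypothetical helper names to compare two boxes whose angles are 178° apart but whose corners nearly coincide.

```python
import math


def box_corners(cx, cy, w, h, theta_deg):
    """Corners of a w x h rectangle centered at (cx, cy), rotated by theta_deg degrees."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half]


def max_corner_gap(corners_a, corners_b):
    """Largest distance from a corner of A to its nearest corner of B."""
    return max(min(math.dist(p, q) for q in corners_b) for p in corners_a)


a = box_corners(0, 0, 100, 20, 89)
b = box_corners(0, 0, 100, 20, -89)

angle_gap = abs(89 - (-89))        # the angle targets differ by 178 degrees
corner_gap = max_corner_gap(a, b)  # yet the shapes differ by less than 2 pixels
```

A loss on the angle would treat these two boxes as very different, while a loss on corner coordinates and height would treat them as almost identical.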
20
R²CNN
RPN for proposing axis-aligned boxes
- Anchor scales: (8, 16, 32) in Faster R-CNN → (4, 8, 16, 32), adding a smaller scale
- Other RPN settings are kept the same as in Faster R-CNN
ROIPoolings of different pooled sizes
- Three ROIPoolings with different pooled sizes are used to capture more text characteristics
- Pooled sizes (11 x 3) and (3 x 11) are added to the default (7 x 7)
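A minimal pure-Python sketch of pooling one region at the three sizes and concatenating the results. It assumes a single-channel feature crop given as a list of rows, and reads each size as (height, width); both are assumptions made for illustration.

```python
def roi_max_pool(feature, pooled_h, pooled_w):
    """Max-pool a 2D feature crop (list of rows) into a pooled_h x pooled_w grid, flattened."""
    height, width = len(feature), len(feature[0])
    pooled = []
    for i in range(pooled_h):
        # Row range of bin i; every bin covers at least one row.
        r0 = (i * height) // pooled_h
        r1 = max(r0 + 1, ((i + 1) * height + pooled_h - 1) // pooled_h)
        for j in range(pooled_w):
            c0 = (j * width) // pooled_w
            c1 = max(c0 + 1, ((j + 1) * width + pooled_w - 1) // pooled_w)
            pooled.append(
                max(feature[r][c] for r in range(r0, r1) for c in range(c0, c1))
            )
    return pooled


def multi_size_roi_features(feature, sizes=((7, 7), (11, 3), (3, 11))):
    """Concatenate pooled features from several pooled sizes."""
    features = []
    for pooled_h, pooled_w in sizes:
        features.extend(roi_max_pool(feature, pooled_h, pooled_w))
    return features
```

With these sizes each channel yields 49 + 33 + 33 = 115 values. The elongated grids keep finer resolution along one axis, which suits text regions that are much wider than tall or the reverse.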
21
R²CNN
Inclined NMS
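Inclined NMS differs from standard NMS in that the IoU is computed between rotated boxes. A self-contained sketch follows, using Sutherland-Hodgman polygon clipping. The box format `(cx, cy, w, h, theta)` and the default `iou_threshold` are assumptions, not values from the slides.

```python
import math


def corners(cx, cy, w, h, theta):
    """Corners of a rotated rectangle, counter-clockwise in (x, y) axes (theta in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in offsets]


def polygon_area(poly):
    """Shoelace formula; positive for counter-clockwise polygons."""
    return 0.5 * sum(
        x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1])
    )


def clip(subject, clipper):
    """Sutherland-Hodgman clipping of one convex polygon by another (both counter-clockwise)."""
    output = subject
    for (ax, ay), (bx, by) in zip(clipper, clipper[1:] + clipper[:1]):
        if not output:
            break
        inputs, output = output, []

        def side(p):
            # Positive when p lies to the left of edge a->b, i.e. inside the clipper.
            return (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)

        for p, q in zip(inputs, inputs[1:] + inputs[:1]):
            sp, sq = side(p), side(q)
            if sp >= 0:
                output.append(p)
            if (sp > 0 and sq < 0) or (sp < 0 and sq > 0):
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return output


def inclined_iou(box_a, box_b):
    """IoU of two rotated boxes given as (cx, cy, w, h, theta)."""
    pa, pb = corners(*box_a), corners(*box_b)
    inter_poly = clip(pa, pb)
    inter = polygon_area(inter_poly) if len(inter_poly) >= 3 else 0.0
    union = polygon_area(pa) + polygon_area(pb) - inter
    return inter / union


def inclined_nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS on rotated boxes; returns kept indices in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(inclined_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Two parallel text lines tilted at 45° can have heavily overlapping axis-aligned boxes while their inclined IoU is zero, so inclined NMS keeps both lines.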
22
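R²CNN
Objective
Let $(w, w^*)$ denote $(v_i, v_i^*)$ or $(u_i, u_i^*)$. $L_{reg}(w, w^*)$ is defined as:
$L_{reg}(w, w^*) = \mathrm{smooth}_{L_1}(w - w^*)$, where $\mathrm{smooth}_{L_1}(x) = 0.5x^2$ if $|x| < 1$ and $|x| - 0.5$ otherwise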
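The editor's notes state that all regression terms use the smooth L1 loss. A minimal sketch, assuming the standard Fast R-CNN form of that loss and hypothetical function names:

```python
def smooth_l1(x):
    """Quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def regression_loss(pred, target):
    """Sum of smooth L1 over the coordinates of one box."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))
```

The same function would apply to both the axis-aligned targets $(v_i, v_i^*)$ and the inclined targets $(u_i, u_i^*)$, each weighted by its own coefficient.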
23
R²CNN
Implementation Details
Key | Value
Dataset | ICDAR2015 plus rotation-augmented data
Pretrained model | VGG-16
Optimizer | Standard SGD
Batch size | 32
Learning rate | Starts at $10^{-3}$ and is multiplied by 1/10 after $5 \times 10^4$, $10 \times 10^4$, and $15 \times 10^4$ iterations
Environment | Tesla K80 GPU
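The step schedule in the table can be sketched as follows; the function and parameter names are hypothetical.

```python
def r2cnn_learning_rate(
    iteration, base_lr=1e-3, milestones=(50_000, 100_000, 150_000), gamma=0.1
):
    """Multiply base_lr by gamma once for every milestone already reached."""
    passed = sum(iteration >= m for m in milestones)
    return base_lr * gamma ** passed
```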
24
R²CNN
Implementation Details
25
R²CNN
Experiments

Editor's Notes

  • #3-#4: This is the overall structure of the cluster detection network. It consists mainly of a cluster proposal network, a scale network, and a detection network.
  • #5: The goal of ClusDet is to solve the problems caused by objects being distributed non-uniformly in an image and tending to cluster. The authors point out that previous work never discussed this problem. To solve it, they propose the cluster proposal network and the scale network.
  • #6-#8: CPNet has almost the same form as an RPN. However, because it needs a large receptive field, it uses the first layer of the feature extractor.
  • #9-#14: Unlike conventional NMS, the cross-layer link offers a way to learn how to merge redundant detections.
  • #15-#21, #24-#26: The first term is the segment score and the last term is the link score.
  • #22: The difference from conventional NMS is that the IoU is computed with the oriented ground truth rather than the horizontal ground truth.
  • #23: λ1 is the coefficient that weights the horizontal bounding box, and λ2 is the coefficient that weights the oriented bounding box. All regression tasks use the smooth L1 loss.