Aerial detection part2

Aerial Object Detection
HyeongJun Kwon
2019-2

Contents
1. Detecting Oriented Text in Natural Images by Linking Segments
2. R2CNN

SegLink
Main idea: decompose text into two locally detectable elements, namely segments and links.
The key advantage of this approach is that long and oriented text is detected locally, since both basic elements are locally detectable.

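Segment: an oriented box that covers part of a word.
Default box on the l-th layer: its size is
$$a_l = \lambda \frac{w_I}{w_l}, \quad \lambda = 1.5$$
where w_I is the input image width and w_l is the width of the l-th feature map.
Predicted segment box: an oriented box (x_s, y_s, w_s, h_s, θ_s) regressed as offsets relative to the default box.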
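A minimal sketch of how a default box and a predicted segment relate. Only a_l = λ·w_I/w_l with λ = 1.5 comes from the slide; the default-box center x_a = (w_I/w_l)(i + 0.5) and the offset decoding (x_s = a_l·Δx + x_a, w_s = a_l·exp(Δw), θ_s = Δθ) follow the parameterization described in the SegLink paper and are assumptions here. The function names are hypothetical.

```python
import math

LAMBDA = 1.5  # scale factor lambda from the slide


def default_box(i, j, w_img, h_img, w_map, h_map):
    """Center (x_a, y_a) and size a_l of the default box at column i, row j of layer l.

    Assumption: centers sit at (index + 0.5) feature-map strides, as in the SegLink paper.
    """
    x_a = w_img / w_map * (i + 0.5)
    y_a = h_img / h_map * (j + 0.5)
    a_l = LAMBDA * w_img / w_map  # a_l = lambda * w_I / w_l
    return x_a, y_a, a_l


def decode_segment(default, offsets):
    """Turn predicted offsets (dx, dy, dw, dh, dtheta) into a segment (x, y, w, h, theta).

    Assumption: the offset parameterization of the SegLink paper.
    """
    x_a, y_a, a_l = default
    dx, dy, dw, dh, dtheta = offsets
    return (
        a_l * dx + x_a,       # center x
        a_l * dy + y_a,       # center y
        a_l * math.exp(dw),   # width
        a_l * math.exp(dh),   # height
        dtheta,               # orientation is regressed directly
    )
```

For example, on a 512 × 512 image with a 64 × 64 feature map, the top-left default box has center (4, 4) and size 1.5 × 8 = 12 pixels; zero offsets decode back to that same box.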

Link: connects a pair of adjacent segments
- Within-layer link
- Cross-layer link

- Within-layer link: connects segments on the same layer. Because segments are detected locally, a pair of neighboring segments on a feature map are also adjacent on the input image. Every segment has 8 within-layer neighbors.
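The 8 within-layer neighbors are simply the surrounding cells on the same feature map. A minimal sketch, assuming segments are indexed by (row, col) on a rows × cols grid; the function name is hypothetical. Cells on the border have fewer than 8 neighbors.

```python
def within_layer_neighbors(i, j, rows, cols):
    """Return the (up to) 8 same-layer neighbors of cell (i, j) as (row, col) pairs."""
    neighbors = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue  # a segment is not its own neighbor
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols:  # drop cells outside the feature map
                neighbors.append((ni, nj))
    return neighbors
```

An interior cell such as (1, 1) on a 4 × 4 map gets all 8 neighbors, while the corner cell (0, 0) gets only 3.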

- Cross-layer link: connects a segment to segments on the preceding layer. Segments of the same word can be detected on multiple layers at the same time, producing redundancies; cross-layer links connect these redundant detections. Every segment has 4 cross-layer neighbors.
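Why exactly 4: if the preceding layer l-1 has twice the resolution of layer l (an assumption here, which the SegLink paper guarantees through its network design), each cell on layer l covers a 2 × 2 block of cells on layer l-1. A minimal sketch with a hypothetical function name:

```python
def cross_layer_neighbors(i, j, prev_rows, prev_cols):
    """Cells on layer l-1 linked to cell (i, j) on layer l.

    Assumption: layer l-1 has exactly twice the height and width of layer l,
    so cell (i, j) covers the 2 x 2 block starting at (2i, 2j).
    """
    candidates = [(2 * i + di, 2 * j + dj) for di in (0, 1) for dj in (0, 1)]
    # bounds check only matters if the 2x assumption is violated
    return [(r, c) for r, c in candidates if r < prev_rows and c < prev_cols]
```

For instance, cell (1, 2) on a 4 × 4 layer links to (2, 4), (2, 5), (3, 4) and (3, 5) on the 8 × 8 layer below it.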

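Ground Truths of Segments and Links
A default box is labeled as a positive segment if:
1) the center of the box is inside the word bounding box;
2) the ratio between the box size a_l and the word height h satisfies
$$\max\left(\frac{a_l}{h}, \frac{h}{a_l}\right) \le 1.5$$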
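The two labeling conditions can be checked directly. This sketch assumes the ratio threshold max(a_l/h, h/a_l) ≤ 1.5 used in the SegLink paper and represents the word box as (center x, center y, width, height, theta in radians); the box format and function names are hypothetical.

```python
import math


def center_in_oriented_box(px, py, box):
    """True if point (px, py) lies inside the oriented box (xc, yc, w, h, theta)."""
    xc, yc, w, h, theta = box
    dx, dy = px - xc, py - yc
    # rotate the offset by -theta into the box's local (axis-aligned) frame
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    local_x = dx * cos_t + dy * sin_t
    local_y = -dx * sin_t + dy * cos_t
    return abs(local_x) <= w / 2 and abs(local_y) <= h / 2


def is_positive_default_box(x_a, y_a, a_l, word_box, max_ratio=1.5):
    """Condition 1: default-box center inside the word box.
    Condition 2: box size a_l within max_ratio of the word height (assumed 1.5)."""
    word_h = word_box[3]
    inside = center_in_oriented_box(x_a, y_a, word_box)
    ratio_ok = max(a_l / word_h, word_h / a_l) <= max_ratio
    return inside and ratio_ok
```

A 16-pixel default box centered inside a 20-pixel-high word passes (ratio 1.25), while a 40-pixel box at the same place fails (ratio 2.0).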

Objective
OHEM (online hard negative mining): negatives are mined so that the negative-to-positive ratio in the segment set is 3:1.
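A minimal sketch of the 3:1 hard negative selection: keep every positive sample and only the highest-loss negatives, at most 3 per positive. The per-sample losses and labels are plain lists here for illustration; real implementations do this on tensors inside the training step, and the function name is hypothetical.

```python
def hard_negative_mining(losses, labels, neg_pos_ratio=3):
    """Return sorted indices of samples kept for the classification loss.

    losses: per-sample classification loss; labels: 1 = positive, 0 = negative.
    All positives are kept, plus the hardest (highest-loss) negatives,
    up to neg_pos_ratio negatives per positive.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    num_neg = min(len(neg), neg_pos_ratio * len(pos))
    hardest = sorted(neg, key=lambda i: losses[i], reverse=True)[:num_neg]
    return sorted(pos + hardest)
```

With one positive and five negatives, only the three negatives with the largest losses survive. Note that this sketch keeps no negatives when a batch has no positives; practical implementations often enforce a minimum.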

Implementation Details
Dataset: SynthText for pretraining, then real datasets for fine-tuning
Optimizer: standard SGD
Batch size: 32
Learning rate: 10^-3 for the first 60k iterations, 10^-4 for the remaining 30k
Framework: TensorFlow
Environment: 8-core Xeon CPU, 4 Titan X GPUs, 64 GB RAM
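The learning-rate row is a two-step piecewise-constant schedule. A minimal sketch (hypothetical function name; iterations counted from 0):

```python
def seglink_learning_rate(step):
    """1e-3 for the first 60k iterations, then 1e-4 for the remaining 30k (90k total)."""
    return 1e-3 if step < 60_000 else 1e-4
```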

Figure: good and bad detection examples of SegLink.

Limitations
1. α and β are set manually by a grid search.
2. Curved text, and segments that are too far apart to be linked (links only connect neighboring segments).

R2CNN
Network Overview

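Regressing the angle directly is unstable at some special points: two almost identical boxes can have very different angle targets (θ1 vs. θ2). This is the boundary discontinuity problem.
R2CNN therefore represents an inclined box as (x1, y1, x2, y2, h): the coordinates of the first two vertices in clockwise order, plus the box height.
Figure: unstable condition of the angle, θ1 vs. θ2 (boundary discontinuity).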
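With the (x1, y1, x2, y2, h) representation, no angle is regressed: the other two vertices follow geometrically by moving the first edge by h along its normal. This sketch assumes image coordinates with y pointing down, so that "clockwise" places the box on the (-dy, dx) side of the first edge; the function name is hypothetical.

```python
import math


def r2cnn_box_to_corners(x1, y1, x2, y2, h):
    """Four corners (clockwise, image coordinates with y down) of an inclined box
    given its first two clockwise vertices and its height h."""
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    # unit normal to the first edge, scaled by the box height
    nx, ny = -dy / length * h, dx / length * h
    return [(x1, y1), (x2, y2), (x2 + nx, y2 + ny), (x1 + nx, y1 + ny)]
```

For example, (0, 0, 10, 0, 4) gives the axis-aligned rectangle (0, 0), (10, 0), (10, 4), (0, 4). A slightly tilted version of the same box changes these coordinates only slightly, whereas an angle target can jump across the range boundary.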

RPN for proposing axis-aligned boxes
- Anchor sizes: (8, 16, 32) in Faster R-CNN are extended to (4, 8, 16, 32), adding a smaller scale for small text.
- All other RPN settings are kept the same as in Faster R-CNN.
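A minimal sketch of the anchor shapes at one location with the extended scales. The base size of 16 pixels and the aspect ratios (0.5, 1, 2) are Faster R-CNN defaults assumed here, not stated on the slide, and this ignores the integer rounding in the reference Faster R-CNN code. The function name is hypothetical.

```python
import math


def generate_anchors(base_size=16, scales=(4, 8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """(w, h) of every anchor at one location; ratio = h / w.

    Assumptions: base_size and ratios follow Faster R-CNN defaults.
    Each anchor keeps the area (base_size * scale) ** 2.
    """
    anchors = []
    for scale in scales:
        side = base_size * scale  # side of the square anchor for this scale
        for ratio in ratios:
            anchors.append((side / math.sqrt(ratio), side * math.sqrt(ratio)))
    return anchors
```

Adding scale 4 gives 4 × 3 = 12 anchors per location instead of 9, with the smallest square anchor 64 pixels on a side instead of 128.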
ROIPoolings of different pooled sizes
- Three ROIPoolings with different pooled sizes are used to capture more text characteristics: (11 × 3) and (3 × 11) are added to the standard (7 × 7).
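A NumPy sketch of the idea: max-pool the same ROI to three grid sizes and concatenate the flattened results. This is plain ROI max pooling with adaptive bin edges (floor start, ceil end), not the exact framework kernel, and reading the sizes as (height, width) is an assumption. Function and constant names are hypothetical.

```python
import numpy as np

# (height, width) of each pooled grid; the (h, w) reading of "11 x 3" is assumed
POOLED_SIZES = [(7, 7), (11, 3), (3, 11)]


def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool feature[:, y1:y2+1, x1:x2+1] to (C, out_h, out_w).

    feature: (C, H, W) array; roi: (x1, y1, x2, y2) inclusive feature-map coordinates.
    """
    x1, y1, x2, y2 = roi
    region = feature[:, y1:y2 + 1, x1:x2 + 1]
    channels, rh, rw = region.shape
    out = np.empty((channels, out_h, out_w), dtype=feature.dtype)
    for i in range(out_h):
        ys, ye = (i * rh) // out_h, ((i + 1) * rh + out_h - 1) // out_h  # floor / ceil
        for j in range(out_w):
            xs, xe = (j * rw) // out_w, ((j + 1) * rw + out_w - 1) // out_w
            out[:, i, j] = region[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out


def multi_size_roi_features(feature, roi, sizes=POOLED_SIZES):
    """Concatenate the flattened outputs of all pooled sizes into one vector."""
    return np.concatenate([roi_max_pool(feature, roi, h, w).ravel() for h, w in sizes])
```

Each ROI yields C × (49 + 33 + 33) = 115·C values. The elongated grids keep more resolution along one axis, which is the intuition for wide or tall text regions.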

Objective
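Let (w, w*) denote either (v_i, v_i*) or (u_i, u_i*), the predicted and target regression values for the axis-aligned box and the inclined box, respectively. L_reg(w, w*) is defined as:
$$L_{reg}(w, w^*) = \mathrm{smooth}_{L_1}(w - w^*), \qquad \mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$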
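A minimal sketch of the regression term, assuming L_reg is the smooth L1 loss as in Faster R-CNN and is summed over the box components (e.g. x, y, w, h for v, or x1, y1, x2, y2, h for u). Function names are hypothetical.

```python
def smooth_l1(x):
    """Quadratic for |x| < 1, linear beyond, so large errors have bounded gradients."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def reg_loss(pred, target):
    """Sum of smooth L1 over the components of one box (assumed aggregation)."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))
```

For example, errors of 0.5 and 2.0 contribute 0.125 and 1.5, for a total of 1.625.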

Implementation Details
Dataset: ICDAR 2015, augmented by rotating images to different angles
Pretrained model: VGG16
Optimizer: standard SGD
Batch size: 32
Learning rate: starts at 10^-3 and is multiplied by 1/10 after 5×10^4, 10×10^4, and 15×10^4 iterations
Environment: Tesla K80 GPU
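The learning-rate row is a step schedule with three drops. A minimal sketch (hypothetical function name; whether the drop happens exactly at or just after each milestone is an assumed convention):

```python
def r2cnn_learning_rate(step, base_lr=1e-3, milestones=(50_000, 100_000, 150_000), gamma=0.1):
    """Start at base_lr and multiply by gamma once per milestone already reached."""
    drops = sum(step >= m for m in milestones)
    return base_lr * gamma ** drops
```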

