Towards Robust Adaptive Object Detection Under Noisy Annotations
Xinyu Liu, Wuyang Li, Qiushi Yang, Baopu Li, Yixuan Yuan
CVPR 2022
Presenter: 橋口凌大 (Nagoya Institute of Technology)
2022/11/25
Overview
■ Object detection with domain adaptation under noisy annotations
■ Domain-adaptive object detectors degrade severely when the source domain contains noisy labels
■ Proposes NLTE, a framework for domain-adaptive object detectors
Types of Noise
■ Label noise
• Miss-annotation
• Class corruption
■ Examples of noisy annotations in Cityscapes
■ Simple noisy-label handling methods still suffer severe performance drops
Figure 1. Examples of noisy annotations in the Cityscapes dataset. Miss-annotated samples: the bicycle in (a); the rider and car in (c). Class-corrupted samples: the rider and bicycle are labeled …
Method
■ Model overview
Figure 3. Overview of our NLTE framework, which includes PIM, MGRM and EAGR. c is the concatenation operation.
Potential Instance Mining
■ Object regions are computed with an RPN [Ren+, NeurIPS 2015]
• Proposals are selected by thresholding their objectness scores
As RPN is class-agnostic, the predicted objectness score of each proposal represents the uncertainty of the existence of an object within the proposal. Therefore, if the proposals have larger objectness scores than thresholds and no intersection with the ground-truth boxes, we select them as eligible candidate proposals $\mathring{P}^s$:

$\mathring{P}^s = \{\, p_i \mid \mathrm{obj}(p_i) > \tau,\ p_i \notin P^s,\ \forall j\ \mathrm{IoU}(p_i, p_j) = 0 \,\}$,   (1)

where $\tau$ is the threshold. PIM is also utilized in the target domain to mine confident positive samples $\mathring{P}^t$ for more effective domain alignment. Through the PIM mechanism, only highly-confident proposals are preserved such that missing objects would get recaptured, which simultaneously increases the number of correctly labeled instances for enhancing the discrimination ability and enriches the diversity of source semantic features.
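As a concrete illustration of Eq. (1), the sketch below selects proposals whose objectness exceeds a threshold and which have zero IoU with every ground-truth box. It is a minimal reading of the rule, not the authors' code: the tensor layout, the `box_iou` helper from torchvision, and the threshold value are assumptions.

```python
import torch
from torchvision.ops import box_iou

def potential_instance_mining(proposals, objectness, gt_boxes, tau=0.9):
    """Keep proposals that look like objects but overlap no ground-truth box.

    proposals:  (N, 4) RPN candidate boxes in (x1, y1, x2, y2) format
    objectness: (N,)   class-agnostic objectness scores from the RPN
    gt_boxes:   (M, 4) (possibly noisy) ground-truth boxes of the image
    tau:        objectness threshold (assumed value, not from the paper)
    """
    confident = objectness > tau                  # obj(p_i) > tau
    if gt_boxes.numel() == 0:
        untouched = torch.ones_like(confident)    # no GT boxes: nothing to overlap
    else:
        iou = box_iou(proposals, gt_boxes)        # (N, M) pairwise IoU
        untouched = (iou == 0).all(dim=1)         # forall j: IoU(p_i, p_j) = 0
    keep = confident & untouched
    return proposals[keep], keep
```

The same routine can be run on target-domain proposals to mine the confident positive samples used for domain alignment.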
Morphable Graph Relation Module
■ Explores the domain and semantic information of class-corrupted samples
■ Feature aggregation within each intra-domain graph
• Neighbour(i) is the set of proposals in the same region as proposal i
Intra-domain graph feature aggregation. Given proposals $P \in \mathbb{R}^{N \times D}$ ($\{\mathring{P}, P\}$ after PIM), we first construct them as intra-domain undirected graphs $G = \{V, E\}$. Specifically, the vertices correspond to the proposals within each domain, and the edges are defined as the feature cosine similarity between them ($e_{ii'} = \frac{p_i \cdot p_{i'}}{\|p_i\|_2 \, \|p_{i'}\|_2}$). Afterwards, we apply intra-domain aggregation to enhance the feature representation within each domain, shown as follows:

$p_i \leftarrow \Big( \sum_{i' \in \mathrm{Neighbour}(i)} \big( w_i\, p_i\, e_{ii'} + p_i \big) \Big), \quad p_i \in P$,   (2)
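The aggregation in Eq. (2) can be sketched as below. It is implemented here as similarity-weighted aggregation of neighbour features plus a residual, which is one reasonable reading of the equation; the exact weighting in the paper may differ, and the neighbour mask and weight vector are assumed inputs.

```python
import torch
import torch.nn.functional as F

def intra_domain_aggregation(feats, neighbour_mask, w=None):
    """Aggregate proposal features over an intra-domain similarity graph.

    feats:          (N, D) proposal features p_i
    neighbour_mask: (N, N) bool, True where i' belongs to Neighbour(i),
                    i.e. proposals from the same region as i
    w:              optional (N,) aggregation weights; defaults to all ones
    """
    if w is None:
        w = torch.ones(feats.size(0), device=feats.device)
    normed = F.normalize(feats, dim=1)
    e = normed @ normed.t()                  # e_{ii'}: pairwise cosine similarity
    e = e * neighbour_mask.float()           # keep only same-region neighbours
    # similarity-weighted sum of neighbour features, plus a residual connection
    messages = e @ (w.unsqueeze(1) * feats)
    return messages + feats
```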
Morphable Graph Relation Module
■ Construction of the global relation matrix
To model the class-wise transition probability between the source and target domains, we introduce a global relation matrix that represents the category-wise affinity between domains. Specifically, considering that the source dataset contains noisy annotations and the target dataset is unlabeled, we first utilize confident proposals after aggregation $P$ to assemble batch-wise prototypes, which correspond to the class-wise feature centroids:

$\{\phi_{(u)}\}_{u=1}^{C} = \frac{1}{\mathrm{Card}(P_{(y')})} \sum_{y'=u,\; p_{(y',i)} \in P_{(y')}} p_{(y',i)}$,   (3)

where Card is the cardinality and $y'$ is the most confident category. Then, the correspondence between local and global prototypes is characterized according to their semantic correlation, and an adaptive update operation for generating global prototypes $\{B^s_{(u)}\}_{u=1}^{C}$ and $\{B^t_{(v)}\}_{v=1}^{C}$ is conducted:

$\{B_{(u)}\}_{u=1}^{C} = \sum_{m=1}^{C} (1 - \tau_{(m,u)})\, \phi_{(m)} + \tau_{(m,u)}\, B_{(u)}$,   (4)

where $\tau_{(m,u)}$ is the cosine similarity between the m-th batch-wise prototype and the u-th global prototype. With this adaptive update process, the representation of global prototypes …
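A sketch of Eqs. (3)–(4): class-wise batch prototypes as feature centroids, followed by a cosine-similarity-gated update of the global prototypes. The blending rule follows the equation as written; the variable names (and the handling of classes absent from the batch) are mine, not the paper's.

```python
import torch
import torch.nn.functional as F

def batch_prototypes(feats, labels, num_classes):
    """Eq. (3): per-class mean of confident proposal features in the batch."""
    protos = torch.zeros(num_classes, feats.size(1), device=feats.device)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = feats[mask].mean(dim=0)   # class centroid phi_(c)
    return protos

def update_global_prototypes(global_protos, batch_protos):
    """Eq. (4): blend batch and global prototypes, gated by their cosine
    similarity tau_(m,u), then sum over the batch-prototype index m."""
    tau = F.cosine_similarity(
        batch_protos.unsqueeze(1), global_protos.unsqueeze(0), dim=2)   # (C, C)
    blended = (1 - tau).unsqueeze(2) * batch_protos.unsqueeze(1) \
            + tau.unsqueeze(2) * global_protos.unsqueeze(0)             # (C, C, D)
    return blended.sum(dim=0)                                           # new B_(u)
```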
Morphable Graph Relation Module
■ Transition probability regularization
The transition probabilities of class-corrupted samples are expected to be regularized by the intrinsic class-wise correspondence between the source and target domains. Therefore, we directly extract noisy source proposal features from $\tilde{P}^s_{(\tilde{y})}$ according to their corresponding noisy labels $\tilde{y}$, and generate noisy source local prototypes similar to the batch-wise prototypes in Eq. (3):

$\{\tilde{\phi}^s_{(u)}\}_{u=1}^{C} = \frac{1}{\mathrm{Card}(\tilde{P}^s_{(\tilde{y})})} \sum_{\tilde{y}=u,\; \tilde{p}^s_{(\tilde{y},i)} \in \tilde{P}^s_{(\tilde{y})}} \tilde{p}^s_{(\tilde{y},i)}$.   (5)

Then, we build a local relation matrix $Z \in \mathbb{R}^{C \times C}$ between $\tilde{\phi}^s_{(u)}$ and $B^t_{(v)}$ to model the transferability of noisy source samples. Each entry is the class-wise transition probability $z_{u,v} = \frac{\tilde{\phi}^s_{(u)} \cdot B^t_{(v)}}{\|\tilde{\phi}^s_{(u)}\|_2 \, \|B^t_{(v)}\|_2}$. We use an $\ell_1$ loss to regularize such transition probabilities between the local relation matrix and the global relation matrix:

$\mathcal{L}_{mgrm} = \frac{1}{r} \sum_{r \in \mathbb{1}(Z)} |z_r - \pi_r|$,   (6)

where $\mathbb{1}(Z)$ refers to the non-zero columns within $Z$, which indicates the existence of the r-th category within the …
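The relation matrices and the regularizer of Eqs. (5)–(6) can be sketched as follows, assuming the noisy-source local prototypes and the target global prototypes are given as (C, D) tensors and that `Pi` is the corresponding global relation matrix; the column-masking detail is my reading of 1(Z).

```python
import torch
import torch.nn.functional as F

def local_relation_matrix(noisy_src_protos, tgt_global_protos):
    """Build Z in R^{CxC}: cosine similarity between noisy source local
    prototypes (Eq. (5)) and target global prototypes, i.e. z_{u,v}."""
    src = F.normalize(noisy_src_protos, dim=1)
    tgt = F.normalize(tgt_global_protos, dim=1)
    return src @ tgt.t()

def mgrm_loss(Z, Pi):
    """Eq. (6): L1 distance between the local and global relation matrices,
    restricted to non-zero columns of Z (categories present in the batch)."""
    present = Z.abs().sum(dim=0) > 0
    if not present.any():
        return Z.new_zeros(())
    return (Z[:, present] - Pi[:, present]).abs().mean()
```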
3.5. Framework Optimization

The framework is trained with the following objective function:

$\mathcal{L} = \mathcal{L}_{det} + \lambda_{mgrm} \mathcal{L}_{mgrm} + \mathcal{L}^{DAF}_{dis} + \mathcal{L}^{EAGR}_{dis}$,   (12)

where $\mathcal{L}_{det}$ denotes the loss of Faster R-CNN [37], which consists of the RPN loss and the RCNN loss, and $\mathcal{L}^{DAF}_{dis}$ contains the discrimination components in DAF [6]. Hereafter, we adopt the meta update in Eq. (11) to achieve gradient reconcilement. During inference, the input images are consecutively fed into the feature extractor and the detection head to obtain the detection results.
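As a small sketch, the objective in Eq. (12) is a weighted sum of the four terms; `lambda_mgrm = 1.0` is a placeholder default, not the value used in the paper.

```python
def total_loss(loss_det, loss_mgrm, loss_dis_daf, loss_dis_eagr, lambda_mgrm=1.0):
    """Eq. (12): detection loss + weighted MGRM regularizer + the two
    domain-discrimination losses (DAF and EAGR branches)."""
    return loss_det + lambda_mgrm * loss_mgrm + loss_dis_daf + loss_dis_eagr
```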
Experimental Setup
■ Datasets
• Pascal VOC
• Clipart1k
• Watercolor2k
• Cityscapes
• Foggy Cityscapes
■ Model
• DAF [Chen+, CVPR 2018]
• Backbone: ResNet50
■ Optimizer
• SGD
■ Learning rate
• 0.0001
• Decayed by a factor of 0.1 at epochs 5 and 6
■ Noise injection
• Randomly change bounding-box class labels (see the sketch after this list)
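The noise-injection protocol above (randomly replacing a fraction of bounding-box class labels) can be sketched as below; the uniform choice of the replacement class and the fixed seed are assumptions consistent with the slide, not a reproduction of the authors' script.

```python
import random

def corrupt_labels(labels, num_classes, noise_rate=0.2, seed=0):
    """Flip a `noise_rate` fraction of box labels to a different random class."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = int(round(noise_rate * len(noisy)))
    for idx in rng.sample(range(len(noisy)), n_flip):
        wrong_classes = [c for c in range(num_classes) if c != noisy[idx]]
        noisy[idx] = rng.choice(wrong_classes)
    return noisy
```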
Experimental Results
■ Pascal VOC & Noisy Pascal VOC → Clipart1k
• CP falls below DAF at 20% and 40% noise rates
• NLTE improves performance across the board
Table 1. Results (%) of Pascal VOC and Noisy Pascal VOC with different noisy rates (NR) → Clipart1k.
Pascal VOC & Noisy Pascal VOC → Clipart1k
NR Methods aero bcycle bird boat bottle bus car cat chair cow table dog hrs bike prsn plnt sheep sofa train tv mAP Imprv.
0%
DAF 29.0 45.1 33.3 25.8 28.6 48.0 39.8 12.3 35.3 50.3 22.9 17.4 33.4 33.8 59.2 44.8 20.7 26.0 45.3 49.6 35.0 0.0
+SCE 26.5 46.4 35.9 24.3 30.9 38.3 34.9 3.1 31.7 49.8 18.2 17.8 25.2 45.4 53.9 43.0 15.7 26.4 43.3 39.3 32.5 -2.5
+CP 30.3 49.2 29.8 33.2 34.1 45.8 41.1 9.7 35.8 50.7 23.6 14.4 31.7 36.9 54.6 45.8 18.6 29.9 44.8 43.5 35.2 +0.2
+GCE 31.9 53.2 27.9 25.8 31.0 41.9 39.3 4.3 34.5 46.7 18.1 18.4 30.2 39.1 55.0 44.1 18.1 21.1 43.2 40.7 33.2 -1.8
+NLTE 39.1 50.3 33.6 34.7 35.0 40.5 44.2 5.9 36.8 45.8 23.1 17.3 31.8 39.5 60.7 45.4 17.9 28.4 49.0 51.3 36.5 +1.5
20%
DAF 34.0 39.1 32.0 27.3 32.2 39.3 38.9 2.9 34.9 44.9 20.6 14.2 30.8 36.6 53.8 43.8 17.6 23.6 42.8 46.1 32.8 0.0
+SCE 23.3 42.3 33.1 27.3 28.8 42.4 35.1 4.0 33.0 44.2 14.6 19.4 27.0 40.9 51.1 45.2 14.9 25.8 41.6 34.8 31.5 -1.3
+CP 29.3 39.6 29.1 28.0 29.4 34.2 42.4 3.9 35.0 39.4 21.2 12.5 32.2 38.9 57.2 43.0 18.6 27.9 40.2 45.0 32.3 -0.5
+GCE 24.0 42.2 32.4 29.4 31.5 45.5 39.9 6.7 36.5 38.0 16.7 15.3 30.4 37.9 53.6 44.1 13.5 24.6 46.9 43.7 32.6 -0.2
+NLTE 33.1 47.5 35.5 28.2 33.7 53.8 43.8 4.2 34.2 48.4 19.3 14.6 29.7 47.2 57.1 42.5 17.7 27.7 40.0 44.5 35.1 +2.3
40%
DAF 24.5 39.4 29.1 26.9 32.8 46.5 40.0 4.7 36.1 42.0 21.3 10.6 27.8 37.3 52.8 39.7 17.5 26.9 36.0 46.2 31.9 0.0
+SCE 17.9 42.9 29.7 21.8 26.9 41.5 34.2 8.2 29.1 38.8 19.3 19.2 28.9 48.6 50.7 42.5 10.6 20.4 41.6 40.3 30.6 -1.3
+CP 24.0 40.9 31.1 22.0 31.2 33.0 40.8 4.4 34.6 36.8 18.5 13.8 29.6 41.3 51.7 38.6 14.2 27.8 26.0 37.8 29.9 -2.0
+GCE 27.0 36.2 31.1 26.0 33.8 42.9 41.2 2.6 37.0 45.3 19.0 17.2 33.0 42.9 54.6 45.7 17.6 21.4 41.9 49.1 33.3 +1.4
+NLTE 32.8 45.5 30.8 29.8 35.7 43.2 43.0 6.4 32.7 45.9 19.8 10.8 31.1 43.4 56.4 43.3 19.6 24.8 42.5 43.9 34.1 +2.2
60%
DAF 29.4 33.5 29.7 29.0 27.7 39.5 38.0 2.7 31.9 41.5 19.8 12.9 30.2 37.0 49.7 37.2 12.8 25.5 40.8 44.2 30.6 0.0
+SCE 22.2 44.0 31.3 28.0 29.8 48.7 31.6 11.0 29.1 30.7 19.7 9.2 25.8 55.9 51.9 41.3 5.7 21.8 49.0 34.9 31.1 +0.5
+CP 32.2 42.1 31.5 26.3 31.9 42.4 40.5 2.7 31.8 45.2 20.0 12.2 26.5 38.1 51.1 42.3 11.0 25.6 38.4 41.4 31.7 +1.1
+GCE 26.7 43.3 33.0 28.6 33.8 51.8 38.0 6.3 33.8 41.7 22.2 13.9 33.4 44.9 53.1 43.9 14.5 22.8 38.6 43.7 33.3 +2.7
+NLTE 33.0 51.9 32.2 31.7 29.9 39.7 43.6 11.0 36.4 40.7 27.0 11.8 30.3 35.3 55.9 42.2 20.8 30.1 34.5 41.2 34.0 +3.4
80%
DAF 28.2 34.0 29.6 20.8 27.7 45.0 34.4 1.4 31.5 34.1 19.9 9.3 26.2 33.3 46.0 37.4 17.5 20.4 30.6 41.9 28.5 0.0
+SCE 19.5 32.9 28.9 23.1 34.3 50.6 31.5 4.3 29.5 35.9 19.5 12.6 23.9 56.2 52.6 38.0 8.2 21.7 41.8 35.5 30.0 +1.5
+CP 25.2 36.1 27.5 29.8 32.5 29.1 34.3 3.2 31.4 37.7 22.3 7.6 30.4 36.5 46.8 35.4 19.9 27.0 29.6 39.1 29.1 +0.6
+GCE 25.8 32.8 29.2 21.1 28.8 50.0 33.5 1.3 28.9 34.1 21.7 7.7 27.2 46.8 50.1 37.9 7.3 20.2 42.5 36.0 29.1 +0.6
+NLTE 36.0 45.4 33.5 30.3 27.3 40.5 40.6 2.6 28.3 51.7 20.4 9.5 30.8 43.1 56.6 42.1 17.7 23.3 31.2 38.4 32.5 +4.0
Experimental Results
■ Pascal VOC & Noisy Pascal VOC → Watercolor2k
• Existing methods can degrade performance
• The proposed method improves performance consistently
Table 2. Results (%) of Pascal VOC and Noisy Pascal VOC with different noisy rates (NR) → Watercolor2k.
Pascal VOC & Noisy Pascal VOC → Watercolor2k
NR Methods bcycle bird car cat dog prsn mAP Imprv.
0%
DAF 65.8 40.4 35.3 30.0 21.5 44.1 39.6 0.0
+SCE 65.3 36.9 38.3 25.8 18.9 43.2 37.9 -1.7
+CP 67.1 39.1 34.5 27.2 22.9 45.3 39.4 -0.2
+GCE 67.3 37.0 39.7 21.9 21.3 46.4 38.9 -0.7
+NLTE 73.7 36.9 39.9 26.8 22.6 45.3 40.9 +1.3
20%
DAF 69.1 36.5 25.8 31.0 16.1 44.9 37.2 0.0
+SCE 62.4 42.6 33.2 32.2 18.5 46.5 39.2 +2.0
+CP 72 36.5 21.3 18.3 21.1 41.5 35.1 -2.1
+GCE 62.7 42.5 40.1 26.2 18.8 44.9 39.2 +2.0
+NLTE 73.7 37.1 35.3 28.1 21.2 44.5 40.0 +2.8
40%
DAF 68.0 32.9 20.5 19.8 13.6 39.4 32.4 0.0
+SCE 64.5 36.6 37.8 14.1 14.0 42.8 35.0 +2.6
+CP 66.0 36.6 17.8 24.0 18.2 39.8 33.7 +1.3
+GCE 64.3 40.0 34.7 21.3 19.0 43.8 37.2 +4.8
+NLTE 75.7 37.2 32.5 22.6 24.3 43.1 39.2 +6.8
60%
DAF 58.6 35.6 16.7 18.8 11.5 40.1 30.2 0.0
+SCE 68.1 36.3 31.8 21.9 19.7 41.3 36.5 +6.3
+CP 68.4 30.3 24.0 22.8 9.6 38.7 32.3 +2.1
+GCE 73.7 33.0 28.7 24.3 20.4 41.2 36.9 +6.7
+NLTE 69.5 35.4 27.4 28.4 19.8 51.5 38.6 +8.4
80%
DAF 56.8 36.7 15.6 19.0 14.8 37.8 30.1 0.0
+SCE 69.4 37.4 22.6 24.3 16.6 34.6 34.2 +4.1
+CP 49.1 36.1 16.6 13.7 10.1 36.9 27.1 -3.0
+GCE 62.8 34.3 14.5 13.4 10.7 40.6 29.4 -0.7
+NLTE 72.7 41.4 6.6 30.5 14.1 47.9 35.6 +5.5
Experimental Results
■ Qualitative evaluation at a 20% noise rate
• Correctly classifies ambiguous objects even under a large domain shift
• Produces accurate bounding boxes for occluded objects such as people
(a) DAF [6] (b) DAF+SCE [45] (c) DAF+CP [35] (d) DAF+GCE [56] (e) DAF+NLTE (Ours)
Figure 4. Qualitative results with noisy rate 20% on Clipart1k (top row) and Watercolor2k (bottom row).
Top row: Clipart1k; bottom row: Watercolor2k
Experimental Results
■ Visualization of the relation matrices
• Matrices for Pascal VOC & Noisy Pascal VOC → Clipart1k
• Class-wise transition probabilities are still captured as the noise rate increases
Figure 5. Global relation matrices of Pascal VOC & Noisy Pascal VOC → Clipart1k at noise rates 0%, 20%, 40%, 60%, and 80%.
Summary
■ Improves object detection accuracy under noisy annotations
■ The proposed NLTE maintains accuracy even at high noise rates
• Even with 0% noise, the proposed method shows an accuracy gain