Towards Robust Adaptive Object Detection Under Noisy Annotations
Xinyu Liu, Wuyang Li, Qiushi Yang, Baopu Li, Yixuan Yuan
CVPR 2022
Presenter: 橋口凌大 (Nagoya Institute of Technology)
2022/11/25
Overview
■ Object detection with domain adaptation under noisy annotations
■ Domain-adaptive object detectors degrade severely when the source domain contains noisy labels
■ Proposes NLTE, a framework for domain-adaptive object detectors
Types of Noise
■ Label noise
• Miss-annotation
• Class corruption
■ Examples of noisy annotations in Cityscapes
■ Simple noisy-label handling methods still suffer severe performance drops
Figure 1. Examples of noisy annotations in the Cityscapes dataset. Miss-annotated samples: the bicycle in (a); the rider and car in (c). Class-corrupted samples: the rider and bicycle are labeled …
Method
■ Model overview
Figure 3. Overview of our NLTE framework, which includes PIM, MGRM and EAGR. c is the concatenation operation.
Potential Instance Mining
■ Object regions are computed with an RPN [Ren+, NeurIPS 2015]
• Proposals are selected by thresholding their objectness scores
As RPN is class-agnostic, the predicted objectness score of each proposal represents the uncertainty of the existence of an object within the proposal. Therefore, if the proposals have larger objectness scores than thresholds and no intersection with the ground-truth boxes, we select them as eligible candidate proposals $\mathring{P}^s$:

$\mathring{P}^s = \{\, p_i \mid \mathrm{obj}(p_i) > \tau,\ p_i \notin P^s,\ \forall j\ \mathrm{IoU}(p_i, p_j) = 0 \,\}$,   (1)

where $\tau$ is the threshold. PIM is also utilized in the target domain to mine confident positive samples $\mathring{P}^t$ for more effective domain alignment. Through the PIM mechanism, only highly-confident proposals are preserved such that missing objects would get recaptured, which simultaneously increases the number of correctly labeled instances for enhancing the discrimination ability and enriches the diversity of source semantic features.
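As a concrete illustration of Eq. (1), the sketch below selects proposals whose objectness exceeds a threshold and which have zero IoU with every ground-truth box. It is a minimal reading of the rule, not the authors' code: the tensor layout, the `box_iou` helper from torchvision, and the threshold value are assumptions.

```python
import torch
from torchvision.ops import box_iou

def potential_instance_mining(proposals, objectness, gt_boxes, tau=0.9):
    """Keep proposals that look like objects but overlap no ground-truth box.

    proposals:  (N, 4) RPN candidate boxes in (x1, y1, x2, y2) format
    objectness: (N,)   class-agnostic objectness scores from the RPN
    gt_boxes:   (M, 4) (possibly noisy) ground-truth boxes of the image
    tau:        objectness threshold (assumed value, not from the paper)
    """
    confident = objectness > tau                  # obj(p_i) > tau
    if gt_boxes.numel() == 0:
        untouched = torch.ones_like(confident)    # no GT boxes: nothing to overlap
    else:
        iou = box_iou(proposals, gt_boxes)        # (N, M) pairwise IoU
        untouched = (iou == 0).all(dim=1)         # forall j: IoU(p_i, p_j) = 0
    keep = confident & untouched
    return proposals[keep], keep
```

The same routine can be run on target-domain proposals to mine the confident positive samples used for domain alignment.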
Morphable Graph Relation Module
■ Explores the domain and semantic information of class-corrupted samples
■ Feature aggregation within each intra-domain graph
• Neighbour(i) is the set of proposals in the same region as proposal i
Intra-domain graph feature aggregation. Given proposals $P \in \mathbb{R}^{N \times D}$ ($\{\mathring{P}, P\}$ after PIM), we first construct them as intra-domain undirected graphs $G = \{V, E\}$. Specifically, the vertices correspond to the proposals within each domain, and the edges are defined as the feature cosine similarity between them ($e_{ii'} = \frac{p_i \cdot p_{i'}}{\|p_i\|_2 \, \|p_{i'}\|_2}$). Afterwards, we apply intra-domain aggregation to enhance the feature representation within each domain, shown as follows:

$p_i \leftarrow \Big( \sum_{i' \in \mathrm{Neighbour}(i)} \big( w_i\, p_i\, e_{ii'} + p_i \big) \Big), \quad p_i \in P$,   (2)
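The aggregation in Eq. (2) can be sketched as below. It is implemented here as similarity-weighted aggregation of neighbour features plus a residual, which is one reasonable reading of the equation; the exact weighting in the paper may differ, and the neighbour mask and weight vector are assumed inputs.

```python
import torch
import torch.nn.functional as F

def intra_domain_aggregation(feats, neighbour_mask, w=None):
    """Aggregate proposal features over an intra-domain similarity graph.

    feats:          (N, D) proposal features p_i
    neighbour_mask: (N, N) bool, True where i' belongs to Neighbour(i),
                    i.e. proposals from the same region as i
    w:              optional (N,) aggregation weights; defaults to all ones
    """
    if w is None:
        w = torch.ones(feats.size(0), device=feats.device)
    normed = F.normalize(feats, dim=1)
    e = normed @ normed.t()                  # e_{ii'}: pairwise cosine similarity
    e = e * neighbour_mask.float()           # keep only same-region neighbours
    # similarity-weighted sum of neighbour features, plus a residual connection
    messages = e @ (w.unsqueeze(1) * feats)
    return messages + feats
```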
Morphable Graph Relation Module
■ Construction of the global relation matrix
To model the class-wise transition probability between the source and target domains, we introduce a global relation matrix that represents the category-wise affinity between domains. Specifically, considering that the source dataset contains noisy annotations and the target dataset is unlabeled, we first utilize confident proposals after aggregation $P$ to assemble batch-wise prototypes, which correspond to the class-wise feature centroids:

$\{\phi_{(u)}\}_{u=1}^{C} = \frac{1}{\mathrm{Card}(P_{(y')})} \sum_{y'=u,\; p_{(y',i)} \in P_{(y')}} p_{(y',i)}$,   (3)

where Card is the cardinality and $y'$ is the most confident category. Then, the correspondence between local and global prototypes is characterized according to their semantic correlation, and an adaptive update operation for generating global prototypes $\{B^s_{(u)}\}_{u=1}^{C}$ and $\{B^t_{(v)}\}_{v=1}^{C}$ is conducted:

$\{B_{(u)}\}_{u=1}^{C} = \sum_{m=1}^{C} (1 - \tau_{(m,u)})\, \phi_{(m)} + \tau_{(m,u)}\, B_{(u)}$,   (4)

where $\tau_{(m,u)}$ is the cosine similarity between the m-th batch-wise prototype and the u-th global prototype. With this adaptive update process, the representation of global prototypes …
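A sketch of Eqs. (3)–(4): class-wise batch prototypes as feature centroids, followed by a cosine-similarity-gated update of the global prototypes. The blending rule follows the equation as written; the variable names (and the handling of classes absent from the batch) are mine, not the paper's.

```python
import torch
import torch.nn.functional as F

def batch_prototypes(feats, labels, num_classes):
    """Eq. (3): per-class mean of confident proposal features in the batch."""
    protos = torch.zeros(num_classes, feats.size(1), device=feats.device)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = feats[mask].mean(dim=0)   # class centroid phi_(c)
    return protos

def update_global_prototypes(global_protos, batch_protos):
    """Eq. (4): blend batch and global prototypes, gated by their cosine
    similarity tau_(m,u), then sum over the batch-prototype index m."""
    tau = F.cosine_similarity(
        batch_protos.unsqueeze(1), global_protos.unsqueeze(0), dim=2)   # (C, C)
    blended = (1 - tau).unsqueeze(2) * batch_protos.unsqueeze(1) \
            + tau.unsqueeze(2) * global_protos.unsqueeze(0)             # (C, C, D)
    return blended.sum(dim=0)                                           # new B_(u)
```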
Morphable Graph Relation Module
■ Transition probability regularization
The transition probabilities of class-corrupted samples are expected to be regularized by the intrinsic class-wise correspondence between the source and target domains. Therefore, we directly extract noisy source proposal features from $\tilde{P}^s_{(\tilde{y})}$ according to their corresponding noisy labels $\tilde{y}$, and generate noisy source local prototypes similar to the batch-wise prototypes in Eq. (3):

$\{\tilde{\phi}^s_{(u)}\}_{u=1}^{C} = \frac{1}{\mathrm{Card}(\tilde{P}^s_{(\tilde{y})})} \sum_{\tilde{y}=u,\; \tilde{p}^s_{(\tilde{y},i)} \in \tilde{P}^s_{(\tilde{y})}} \tilde{p}^s_{(\tilde{y},i)}$.   (5)

Then, we build a local relation matrix $Z \in \mathbb{R}^{C \times C}$ between $\tilde{\phi}^s_{(u)}$ and $B^t_{(v)}$ to model the transferability of noisy source samples. Each entry is the class-wise transition probability $z_{u,v} = \frac{\tilde{\phi}^s_{(u)} \cdot B^t_{(v)}}{\|\tilde{\phi}^s_{(u)}\|_2 \, \|B^t_{(v)}\|_2}$. We use an $\ell_1$ loss to regularize such transition probabilities between the local relation matrix and the global relation matrix:

$\mathcal{L}_{mgrm} = \frac{1}{r} \sum_{r \in \mathbb{1}(Z)} |z_r - \pi_r|$,   (6)

where $\mathbb{1}(Z)$ refers to the non-zero columns within $Z$, which indicates the existence of the r-th category within the …
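The relation matrices and the regularizer of Eqs. (5)–(6) can be sketched as follows, assuming the noisy-source local prototypes and the target global prototypes are given as (C, D) tensors and that `Pi` is the corresponding global relation matrix; the column-masking detail is my reading of 1(Z).

```python
import torch
import torch.nn.functional as F

def local_relation_matrix(noisy_src_protos, tgt_global_protos):
    """Build Z in R^{CxC}: cosine similarity between noisy source local
    prototypes (Eq. (5)) and target global prototypes, i.e. z_{u,v}."""
    src = F.normalize(noisy_src_protos, dim=1)
    tgt = F.normalize(tgt_global_protos, dim=1)
    return src @ tgt.t()

def mgrm_loss(Z, Pi):
    """Eq. (6): L1 distance between the local and global relation matrices,
    restricted to non-zero columns of Z (categories present in the batch)."""
    present = Z.abs().sum(dim=0) > 0
    if not present.any():
        return Z.new_zeros(())
    return (Z[:, present] - Pi[:, present]).abs().mean()
```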
3.5. Framework Optimization

The framework is trained with the following objective function:

$\mathcal{L} = \mathcal{L}_{det} + \lambda_{mgrm} \mathcal{L}_{mgrm} + \mathcal{L}^{DAF}_{dis} + \mathcal{L}^{EAGR}_{dis}$,   (12)

where $\mathcal{L}_{det}$ denotes the loss of Faster R-CNN [37], which consists of the RPN loss and the RCNN loss, and $\mathcal{L}^{DAF}_{dis}$ contains the discrimination components in DAF [6]. Hereafter, we adopt the meta update in Eq. (11) to achieve gradient reconcilement. During inference, the input images are consecutively fed into the feature extractor and the detection head to obtain the detection results.
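As a small sketch, the objective in Eq. (12) is a weighted sum of the four terms; `lambda_mgrm = 1.0` is a placeholder default, not the value used in the paper.

```python
def total_loss(loss_det, loss_mgrm, loss_dis_daf, loss_dis_eagr, lambda_mgrm=1.0):
    """Eq. (12): detection loss + weighted MGRM regularizer + the two
    domain-discrimination losses (DAF and EAGR branches)."""
    return loss_det + lambda_mgrm * loss_mgrm + loss_dis_daf + loss_dis_eagr
```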
Experimental Setup
■ Datasets
• Pascal VOC
• Clipart1k
• Watercolor2k
• Cityscapes
• Foggy Cityscapes
■ Model
• DAF [Chen+, CVPR 2018]
• Backbone: ResNet50
■ Optimizer
• SGD
■ Learning rate
• 0.0001
• Decayed by a factor of 0.1 at epochs 5 and 6
■ Noise injection
• Randomly change bounding-box class labels (see the sketch after this list)
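The noise-injection protocol above (randomly replacing a fraction of bounding-box class labels) can be sketched as below; the uniform choice of the replacement class and the fixed seed are assumptions consistent with the slide, not a reproduction of the authors' script.

```python
import random

def corrupt_labels(labels, num_classes, noise_rate=0.2, seed=0):
    """Flip a `noise_rate` fraction of box labels to a different random class."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = int(round(noise_rate * len(noisy)))
    for idx in rng.sample(range(len(noisy)), n_flip):
        wrong_classes = [c for c in range(num_classes) if c != noisy[idx]]
        noisy[idx] = rng.choice(wrong_classes)
    return noisy
```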
Experimental Results
■ Pascal VOC & Noisy Pascal VOC → Clipart1k
• CP falls below DAF at 20% and 40% noise rates
• NLTE improves performance across the board
Table 1. Results (%) of Pascal VOC and Noisy Pascal VOC with different noisy rates (NR) → Clipart1k.
Pascal VOC & Noisy Pascal VOC → Clipart1k
NR Methods aero bcycle bird boat bottle bus car cat chair cow table dog hrs bike prsn plnt sheep sofa train tv mAP Imprv.
0%
DAF 29.0 45.1 33.3 25.8 28.6 48.0 39.8 12.3 35.3 50.3 22.9 17.4 33.4 33.8 59.2 44.8 20.7 26.0 45.3 49.6 35.0 0.0
+SCE 26.5 46.4 35.9 24.3 30.9 38.3 34.9 3.1 31.7 49.8 18.2 17.8 25.2 45.4 53.9 43.0 15.7 26.4 43.3 39.3 32.5 -2.5
+CP 30.3 49.2 29.8 33.2 34.1 45.8 41.1 9.7 35.8 50.7 23.6 14.4 31.7 36.9 54.6 45.8 18.6 29.9 44.8 43.5 35.2 +0.2
+GCE 31.9 53.2 27.9 25.8 31.0 41.9 39.3 4.3 34.5 46.7 18.1 18.4 30.2 39.1 55.0 44.1 18.1 21.1 43.2 40.7 33.2 -1.8
+NLTE 39.1 50.3 33.6 34.7 35.0 40.5 44.2 5.9 36.8 45.8 23.1 17.3 31.8 39.5 60.7 45.4 17.9 28.4 49.0 51.3 36.5 +1.5
20%
DAF 34.0 39.1 32.0 27.3 32.2 39.3 38.9 2.9 34.9 44.9 20.6 14.2 30.8 36.6 53.8 43.8 17.6 23.6 42.8 46.1 32.8 0.0
+SCE 23.3 42.3 33.1 27.3 28.8 42.4 35.1 4.0 33.0 44.2 14.6 19.4 27.0 40.9 51.1 45.2 14.9 25.8 41.6 34.8 31.5 -1.3
+CP 29.3 39.6 29.1 28.0 29.4 34.2 42.4 3.9 35.0 39.4 21.2 12.5 32.2 38.9 57.2 43.0 18.6 27.9 40.2 45.0 32.3 -0.5
+GCE 24.0 42.2 32.4 29.4 31.5 45.5 39.9 6.7 36.5 38.0 16.7 15.3 30.4 37.9 53.6 44.1 13.5 24.6 46.9 43.7 32.6 -0.2
+NLTE 33.1 47.5 35.5 28.2 33.7 53.8 43.8 4.2 34.2 48.4 19.3 14.6 29.7 47.2 57.1 42.5 17.7 27.7 40.0 44.5 35.1 +2.3
40%
DAF 24.5 39.4 29.1 26.9 32.8 46.5 40.0 4.7 36.1 42.0 21.3 10.6 27.8 37.3 52.8 39.7 17.5 26.9 36.0 46.2 31.9 0.0
+SCE 17.9 42.9 29.7 21.8 26.9 41.5 34.2 8.2 29.1 38.8 19.3 19.2 28.9 48.6 50.7 42.5 10.6 20.4 41.6 40.3 30.6 -1.3
+CP 24.0 40.9 31.1 22.0 31.2 33.0 40.8 4.4 34.6 36.8 18.5 13.8 29.6 41.3 51.7 38.6 14.2 27.8 26.0 37.8 29.9 -2.0
+GCE 27.0 36.2 31.1 26.0 33.8 42.9 41.2 2.6 37.0 45.3 19.0 17.2 33.0 42.9 54.6 45.7 17.6 21.4 41.9 49.1 33.3 +1.4
+NLTE 32.8 45.5 30.8 29.8 35.7 43.2 43.0 6.4 32.7 45.9 19.8 10.8 31.1 43.4 56.4 43.3 19.6 24.8 42.5 43.9 34.1 +2.2
60%
DAF 29.4 33.5 29.7 29.0 27.7 39.5 38.0 2.7 31.9 41.5 19.8 12.9 30.2 37.0 49.7 37.2 12.8 25.5 40.8 44.2 30.6 0.0
+SCE 22.2 44.0 31.3 28.0 29.8 48.7 31.6 11.0 29.1 30.7 19.7 9.2 25.8 55.9 51.9 41.3 5.7 21.8 49.0 34.9 31.1 +0.5
+CP 32.2 42.1 31.5 26.3 31.9 42.4 40.5 2.7 31.8 45.2 20.0 12.2 26.5 38.1 51.1 42.3 11.0 25.6 38.4 41.4 31.7 +1.1
+GCE 26.7 43.3 33.0 28.6 33.8 51.8 38.0 6.3 33.8 41.7 22.2 13.9 33.4 44.9 53.1 43.9 14.5 22.8 38.6 43.7 33.3 +2.7
+NLTE 33.0 51.9 32.2 31.7 29.9 39.7 43.6 11.0 36.4 40.7 27.0 11.8 30.3 35.3 55.9 42.2 20.8 30.1 34.5 41.2 34.0 +3.4
80%
DAF 28.2 34.0 29.6 20.8 27.7 45.0 34.4 1.4 31.5 34.1 19.9 9.3 26.2 33.3 46.0 37.4 17.5 20.4 30.6 41.9 28.5 0.0
+SCE 19.5 32.9 28.9 23.1 34.3 50.6 31.5 4.3 29.5 35.9 19.5 12.6 23.9 56.2 52.6 38.0 8.2 21.7 41.8 35.5 30.0 +1.5
+CP 25.2 36.1 27.5 29.8 32.5 29.1 34.3 3.2 31.4 37.7 22.3 7.6 30.4 36.5 46.8 35.4 19.9 27.0 29.6 39.1 29.1 +0.6
+GCE 25.8 32.8 29.2 21.1 28.8 50.0 33.5 1.3 28.9 34.1 21.7 7.7 27.2 46.8 50.1 37.9 7.3 20.2 42.5 36.0 29.1 +0.6
+NLTE 36.0 45.4 33.5 30.3 27.3 40.5 40.6 2.6 28.3 51.7 20.4 9.5 30.8 43.1 56.6 42.1 17.7 23.3 31.2 38.4 32.5 +4.0
Experimental Results
■ Pascal VOC & Noisy Pascal VOC → Watercolor2k
• Existing methods can degrade performance
• The proposed method improves performance consistently
Table 2. Results (%) of Pascal VOC and Noisy Pascal VOC with different noisy rates (NR) → Watercolor2k.
Pascal VOC & Noisy Pascal VOC → Watercolor2k
NR Methods bcycle bird car cat dog prsn mAP Imprv.
0%
DAF 65.8 40.4 35.3 30.0 21.5 44.1 39.6 0.0
+SCE 65.3 36.9 38.3 25.8 18.9 43.2 37.9 -1.7
+CP 67.1 39.1 34.5 27.2 22.9 45.3 39.4 -0.2
+GCE 67.3 37.0 39.7 21.9 21.3 46.4 38.9 -0.7
+NLTE 73.7 36.9 39.9 26.8 22.6 45.3 40.9 +1.3
20%
DAF 69.1 36.5 25.8 31.0 16.1 44.9 37.2 0.0
+SCE 62.4 42.6 33.2 32.2 18.5 46.5 39.2 +2.0
+CP 72 36.5 21.3 18.3 21.1 41.5 35.1 -2.1
+GCE 62.7 42.5 40.1 26.2 18.8 44.9 39.2 +2.0
+NLTE 73.7 37.1 35.3 28.1 21.2 44.5 40.0 +2.8
40%
DAF 68.0 32.9 20.5 19.8 13.6 39.4 32.4 0.0
+SCE 64.5 36.6 37.8 14.1 14.0 42.8 35.0 +2.6
+CP 66.0 36.6 17.8 24.0 18.2 39.8 33.7 +1.3
+GCE 64.3 40.0 34.7 21.3 19.0 43.8 37.2 +4.8
+NLTE 75.7 37.2 32.5 22.6 24.3 43.1 39.2 +6.8
60%
DAF 58.6 35.6 16.7 18.8 11.5 40.1 30.2 0.0
+SCE 68.1 36.3 31.8 21.9 19.7 41.3 36.5 +6.3
+CP 68.4 30.3 24.0 22.8 9.6 38.7 32.3 +2.1
+GCE 73.7 33.0 28.7 24.3 20.4 41.2 36.9 +6.7
+NLTE 69.5 35.4 27.4 28.4 19.8 51.5 38.6 +8.4
80%
DAF 56.8 36.7 15.6 19.0 14.8 37.8 30.1 0.0
+SCE 69.4 37.4 22.6 24.3 16.6 34.6 34.2 +4.1
+CP 49.1 36.1 16.6 13.7 10.1 36.9 27.1 -3.0
+GCE 62.8 34.3 14.5 13.4 10.7 40.6 29.4 -0.7
+NLTE 72.7 41.4 6.6 30.5 14.1 47.9 35.6 +5.5
Experimental Results
■ Qualitative evaluation at a 20% noise rate
• Correctly classifies ambiguous objects even under a large domain shift
• Produces accurate bounding boxes for occluded objects such as people
(a) DAF [6] (b) DAF+SCE [45] (c) DAF+CP [35] (d) DAF+GCE [56] (e) DAF+NLTE (Ours)
Figure 4. Qualitative results with noisy rate 20% on Clipart1k (top row) and Watercolor2k (bottom row).
Top row: Clipart1k; bottom row: Watercolor2k
Experimental Results
■ Visualization of the relation matrices
• Matrices for Pascal VOC & Noisy Pascal VOC → Clipart1k
• Class-wise transition probabilities are still captured as the noise rate increases
Figure 5. Global relation matrices of Pascal VOC & Noisy Pascal VOC → Clipart1k at noise rates 0%, 20%, 40%, 60%, and 80%.
Summary
■ Improves object detection accuracy under noisy annotations
■ The proposed NLTE maintains accuracy even at high noise rates
• Even with 0% noise, the proposed method shows an accuracy gain