Learning From Noisy Labels With Deep Neural Networks: A Survey
Hwanjun Song, Minseok Kim, Dongmin Park, Yooju Shin, Jae-Gil Lee, IEEE TNNLS 2022
橋口凌大 (Tamaki Lab, Nagoya Institute of Technology)
2022/10/28
Overview
■ A survey of methods for training on data with noisy labels
• Models that are robust to label noise
■ Robust training with a focus on the classification task
• Robust architectures
• Robust regularization
• Robust loss function design
• Sample selection
Supervised Learning under Label Noise
■ Problem
• DNNs memorize noisy labels, and generalization performance degrades
■ Approach
• Establish training methods that mitigate the influence of noise
■ Types of noise covered by the survey
• Instance-dependent noise
• Instance-independent noise
(Figure: predictions on clean labels vs. wrong labels when training with cross entropy vs. early-learning regularization [Liu+, NeurIPS2020].)
Deep Learning and Noisy Labels
■ Deep learning is easily affected by label noise
■ Many training methods have been developed to achieve robustness
(Figures 7 and 8 of [Zhang+, ICLR2017]: accuracy (solid: train, dotted: validation) and critical sample ratios for MNIST and CIFAR10, with noise added on classification inputs or on classification labels; Algorithm 1: Langevin Adversarial Sample Search (LASS).)
Noise-Robust Training with Deep Learning
■ Categorization of robust training methods
Robust Training
• Robust Architecture (§III-A): Noise Adaptation Layer, Dedicated Architecture
• Robust Regularization (§III-B): Explicit Regularization, Implicit Regularization
• Robust Loss Function (§III-C): Robust Loss Design
• Loss Adjustment (§III-D): Loss Correction, Loss Reweighting, Label Refurbishment, Meta Learning
• Sample Selection (§III-E): Multi-network Learning, Multi-round Learning, Hybrid Approach
(Fig. 3 of the survey: a high-level research overview of robust deep learning for noisy labels; the research directions actively contributed to by the machine learning community are categorized into these five groups.)
Robust Architecture
■ Noise adaptation layer
• Learns the label transition pattern
The probability of observing a noisy label decomposes over the latent true label,
$$p(\tilde{y}=j \mid x) = \sum_{i=1}^{c} p(\tilde{y}=j, y=i \mid x) = \sum_{i=1}^{c} T_{ij}\, p(y=i \mid x), \quad \text{where } T_{ij} = p(\tilde{y}=j \mid y=i, x). \quad (3)$$
In light of this, the noise adaptation layer is intended to mimic the label transition behavior in learning a DNN. Let $p(y \mid x; \Theta)$ be the output of the base DNN with a softmax output layer. Then, following Eq. (3), the probability of an example $x$ being predicted as its noisy label $\tilde{y}$ is parameterized by
$$p(\tilde{y}=j \mid x; \Theta, \mathcal{W}) = \sum_{i=1}^{c} p(\tilde{y}=j, y=i \mid x; \Theta, \mathcal{W}) = \sum_{i=1}^{c} \underbrace{p(\tilde{y}=j \mid y=i; \mathcal{W})}_{\text{Noise Adaptation Layer}}\; \underbrace{p(y=i \mid x; \Theta)}_{\text{Base Model}}. \quad (4)$$
Here, the noisy label $\tilde{y}$ is generally assumed to be conditionally independent of the input $x$.
(Fig. 4. Noise modeling process using the noise adaptation layer: the input $x \in \mathcal{X}$ is fed to the base model $\Theta$ with a softmax layer, producing $p(y \mid x; \Theta)$; the noise adaptation layer $p(\tilde{y} \mid y; \mathcal{W})$ turns this into $p(\tilde{y} \mid x; \Theta, \mathcal{W})$, which is trained with the loss $\mathcal{L}(f(x; \Theta, \mathcal{W}), \tilde{y})$ against the noisy label $\tilde{y} \in \tilde{\mathcal{Y}}$ obtained by corrupting the true label $y \in \mathcal{Y}$.)
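A minimal NumPy sketch of the forward pass in Eq. (4), assuming the noise adaptation layer is parameterized as a row-stochastic softmax over a learnable score matrix W (names, shapes, and this parameterization are illustrative, not the survey's reference code):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def noisy_posterior(base_logits, W):
    """Eq. (4): p(y_tilde=j | x) = sum_i p(y_tilde=j | y=i; W) * p(y=i | x; Theta).

    base_logits: (batch, c) logits of the base model p(y | x; Theta)
    W:           (c, c) learnable scores; row-wise softmax gives T[i, j] = p(y_tilde=j | y=i)
    """
    p_clean = softmax(base_logits)   # (batch, c) base-model posterior over clean labels
    T = softmax(W, axis=1)           # (c, c) row-stochastic noise adaptation layer
    return p_clean @ T               # (batch, c) posterior over noisy labels

# toy usage: 2 examples, 3 classes; each output row sums to 1
rng = np.random.default_rng(0)
p_noisy = noisy_posterior(rng.normal(size=(2, 3)), rng.normal(size=(3, 3)))
print(p_noisy.sum(axis=1))
```

The loss in Fig. 4 would then be the cross entropy between this noisy-label posterior and the observed noisy label $\tilde{y}$.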
Robust Architecture
■ Dedicated architecture
• Improves the reliability of estimating the label transition probability
• Trains two networks
• Prediction of the label transition probability
• Prediction of the noise type
(Figure 5 [Xiao+, CVPR2015]: system diagram in which two CNNs predict the class label $p(y \mid x)$ and the noise type $p(z \mid x)$ (noise free / random / confusing); combined with a label noise model layer and data with clean labels, these yield the posteriors $p(y \mid \tilde{y}, x)$ and $p(z \mid \tilde{y}, x)$, e.g. correcting a noisy "Windbreaker" label toward "Down Coat".)
[Xiao+, CVPR2015]
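A framework-free sketch of the two-prediction design in Figure 5; the backbone and head sizes are placeholders (the original uses two CNNs trained jointly with a label noise model layer), so this only illustrates producing $p(y \mid x)$ and $p(z \mid x)$ from shared features:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class TwoHeadModel:
    """Toy stand-in for the dedicated architecture of [Xiao+, CVPR2015]:
    one head predicts the class label p(y|x), the other the noise type p(z|x)
    (noise free / random / confusing). Backbone and heads are plain linear maps."""

    def __init__(self, dim, n_classes, n_noise_types=3, seed=0):
        rng = np.random.default_rng(seed)
        self.W_feat = rng.normal(size=(dim, 64)) * 0.1    # shared backbone
        self.W_cls = rng.normal(size=(64, n_classes)) * 0.1
        self.W_noise = rng.normal(size=(64, n_noise_types)) * 0.1

    def forward(self, x):
        h = np.tanh(x @ self.W_feat)
        return softmax(h @ self.W_cls), softmax(h @ self.W_noise)  # p(y|x), p(z|x)

model = TwoHeadModel(dim=32, n_classes=14)
p_y, p_z = model.forward(np.random.default_rng(1).normal(size=(4, 32)))
print(p_y.shape, p_z.shape)   # (4, 14) (4, 3)
```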
Robust Regularization
■ Explicit regularization
• Modifies the training loss
(Fig. 1 [Jenni&Favaro, ECCV2018]: the bilevel training procedure samples training and validation mini-batches from the data set and runs stochastic gradient descent with mini-batch adaptive weights $\omega_i$; the weight $\omega_i \propto \sum_{j \in V_t} \nabla \ell_j(\theta_t)^{\top} \nabla \ell_i(\theta_t) \,/\, (\|\nabla \ell_i(\theta_t)\|^2 + \hat{\mu})$ is positive and large when the gradient of a training example agrees with the validation gradients.)
[Jenni&Favaro, ECCV2018]
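A sketch of the weighting rule suggested by Fig. 1, assuming per-example gradients are available as flat vectors; the clipping at zero is an assumption added here so that examples whose gradients disagree with the validation mini-batch (likely mislabeled) are simply ignored:

```python
import numpy as np

def minibatch_weights(train_grads, val_grads, mu=1e-3):
    """Gradient-agreement weights in the spirit of [Jenni&Favaro, ECCV2018]:
    w_i = sum_j <g_val_j, g_train_i> / (||g_train_i||^2 + mu),
    clipped at zero so that disagreeing examples get zero weight.

    train_grads: (n_train, d) per-example gradients on the training mini-batch
    val_grads:   (n_val, d)   per-example gradients on the validation mini-batch
    """
    val_sum = val_grads.sum(axis=0)           # (d,)
    agreement = train_grads @ val_sum         # (n_train,) dot products with validation direction
    norms = (train_grads ** 2).sum(axis=1) + mu
    return np.clip(agreement / norms, 0.0, None)

rng = np.random.default_rng(0)
w = minibatch_weights(rng.normal(size=(8, 10)), rng.normal(size=(4, 10)))
print(w.round(3))
```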
Robust Regularization
■ Implicit regularization
• Regularization that arises as a stochastic effect
• Data augmentation
• Mini-batch stochastic gradient descent
Label smoothing blends the noisy one-hot label with a uniform mixture over all possible labels,
$$\bar{y} = \langle \bar{y}(1), \bar{y}(2), \ldots, \bar{y}(c) \rangle, \quad \text{where } \bar{y}(i) = (1-\alpha)\cdot[\tilde{y}=i] + \alpha/c \ \text{ and } \ \alpha \in [0,1]. \quad (5)$$
Here, $[\cdot]$ is the Iverson bracket and $\alpha$ is the smoothing degree. In contrast, mixup [95] regularizes the DNN to favor simple linear behavior in between training examples. The mini-batch is constructed from virtual training examples, each formed by the linear interpolation of two noisy training examples $(x_i, \tilde{y}_i)$ and $(x_j, \tilde{y}_j)$ drawn at random from the noisy training data $\tilde{\mathcal{D}}$,
$$x_{\mathrm{mix}} = \lambda x_i + (1-\lambda)x_j \quad \text{and} \quad y_{\mathrm{mix}} = \lambda \tilde{y}_i + (1-\lambda)\tilde{y}_j, \quad (6)$$
where $\lambda \in [0,1]$ is the balance parameter between the two examples. Thus, mixup extends the training distribution with these interpolated examples.
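Small, self-contained implementations of Eqs. (5) and (6); in the original mixup paper the mixing coefficient λ is drawn from a Beta distribution, so the uniform draw used here is only a simplification:

```python
import numpy as np

def label_smoothing(noisy_label, n_classes, alpha=0.1):
    """Eq. (5): (1 - alpha) * one-hot(noisy_label) + alpha / c."""
    y = np.full(n_classes, alpha / n_classes)
    y[noisy_label] += 1.0 - alpha
    return y

def mixup(x_i, y_i, x_j, y_j, lam=None, rng=None):
    """Eq. (6): convex combination of two noisy training examples.
    lam would normally be sampled from a Beta distribution; uniform here for brevity."""
    rng = rng or np.random.default_rng()
    lam = rng.uniform() if lam is None else lam
    return lam * x_i + (1 - lam) * x_j, lam * y_i + (1 - lam) * y_j

y1 = label_smoothing(2, n_classes=5, alpha=0.1)
y2 = label_smoothing(4, n_classes=5, alpha=0.1)
x_mix, y_mix = mixup(np.ones(3), y1, np.zeros(3), y2, lam=0.7)
print(y1, y_mix)
```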
Robust Loss Function
■ Design losses that remain robust even on unseen clean data
• Prove loss functions for which Bayes-risk minimization becomes noise tolerant
[Manwani&Sastry, IEEE Transactions on Cybernetics 2013]
■ A loss is defined to be noise tolerant when the following condition holds
Technical Detail: Manwani and Sastry [48] initially proved, theoretically, a sufficient condition on the loss function such that risk minimization with that function becomes noise-tolerant for binary classification. Subsequently, the sufficient condition was extended to multi-class classification using deep learning [68]. Specifically, a loss function is defined to be noise-tolerant for $c$-class classification under symmetric noise if the noise rate satisfies $\tau < \frac{c-1}{c}$ and
$$\sum_{j=1}^{c} \ell\big(f(x; \Theta),\, y=j\big) = C, \quad \forall x \in \mathcal{X},\ \forall f, \quad (8)$$
where $C$ is a constant.
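A small numerical check of the condition in Eq. (8), not taken from the survey itself: the MAE loss between a one-hot target and a softmax output sums to the constant 2(c-1) over all classes regardless of the prediction, whereas categorical cross entropy does not, which is the sense in which [68] shows MAE to be noise-tolerant:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mae_loss(p, j):
    """MAE between the one-hot target e_j and the softmax output p."""
    e_j = np.zeros_like(p); e_j[j] = 1.0
    return np.abs(e_j - p).sum()

def cce_loss(p, j):
    """Categorical cross entropy for target class j."""
    return -np.log(p[j])

rng = np.random.default_rng(0)
for _ in range(3):
    p = softmax(rng.normal(size=4))        # c = 4 classes
    mae_sum = sum(mae_loss(p, j) for j in range(4))
    cce_sum = sum(cce_loss(p, j) for j in range(4))
    # MAE satisfies Eq. (8): the sum over classes is the constant 2(c-1) = 6
    print(f"sum MAE = {mae_sum:.4f}, sum CCE = {cce_sum:.4f}")
```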
Loss Adjustment
■ Adjust the loss of every training example before the parameter update
1. Loss correction, which estimates the noise transition matrix (a sketch follows below)
2. Loss reweighting, which assigns a different importance to each example
3. Adjusting the loss with a label generated from the noisy label and the predicted label
4. Meta learning, which automatically infers the optimal rule for loss adjustment
■ Advantage
• Loss adjustment can be applied to all of the training data
■ Drawback
• When the number of classes or of noisy labels is large, errors from false correction accumulate [Han+, NeurIPS2018]
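A minimal sketch of forward loss correction (item 1 above), assuming the noise transition matrix T has already been estimated; here a hand-built symmetric-noise T is used for illustration, and the loss becomes the cross entropy of the noisy label against the T-mixed prediction:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward_corrected_ce(logits, noisy_label, T):
    """Forward correction: -log( sum_i T[i, noisy_label] * p(y=i|x) ),
    where T[i, j] approximates p(y_tilde=j | y=i)."""
    p_clean = softmax(logits)
    p_noisy = p_clean @ T
    return -np.log(p_noisy[noisy_label] + 1e-12)

# toy example: 3 classes with 20% symmetric label noise in T
c = 3
T = np.full((c, c), 0.2 / (c - 1)) + np.eye(c) * (0.8 - 0.2 / (c - 1))
print(T.sum(axis=1))                          # rows sum to 1
loss = forward_corrected_ce(np.array([2.0, 0.5, -1.0]), noisy_label=1, T=T)
print(round(loss, 4))
```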
Loss Adjustment
■ Loss reweighting
■ Label refurbishment
• α is the label confidence
Loss reweighting assigns smaller weights to examples with possibly false labels and greater weights to those with true labels. Accordingly, the reweighted loss on the mini-batch $\mathcal{B}_t$ is used to update the DNN,
$$\Theta_{t+1} = \Theta_t - \eta \nabla \Big( \frac{1}{|\mathcal{B}_t|} \sum_{(x,\tilde{y}) \in \mathcal{B}_t} \underbrace{w(x,\tilde{y})\, \ell\big(f(x; \Theta_t), \tilde{y}\big)}_{\text{Reweighted Loss}} \Big), \quad (11)$$
where $w(x,\tilde{y})$ is the weight of an example $x$ with its noisy label $\tilde{y}$. Hence, the examples with smaller weights do not significantly affect the DNN learning.
Technical Detail: In importance reweighting [108], the ratio of two joint data distributions $w(x,\tilde{y}) = P_{\mathcal{D}}(x,\tilde{y}) / P_{\tilde{\mathcal{D}}}(x,\tilde{y})$ determines the contribution of the loss of each noisy example; an approximate solution was developed to estimate the ratio because the two distributions are difficult to determine from noisy data. Meanwhile, active bias [109] emphasizes uncertain examples with inconsistent label predictions by assigning their prediction variances as the weights for training. DualGraph [118] employs graph neural networks and reweights the examples.
A practical drawback of loss reweighting is the need to specify the weighting function and its additional hyperparameters, which is fairly hard to do in practice because the appropriate weighting scheme varies significantly with the noise type and the training data.
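A toy rendering of the reweighted mini-batch loss inside Eq. (11); the weighting function used here is only a placeholder standing in for the schemes above (importance ratio, prediction variance, graph-based weights):

```python
import numpy as np

def reweighted_batch_loss(per_example_losses, weights):
    """Eq. (11): mean of w(x, y_tilde) * loss(f(x), y_tilde) over the mini-batch.
    The gradient of this scalar is what updates Theta."""
    return float(np.mean(weights * per_example_losses))

losses = np.array([0.3, 2.5, 0.4, 3.1])     # large losses often indicate noisy labels
weights = 1.0 / (1.0 + losses)              # placeholder weighting: down-weight large-loss examples
print(reweighted_batch_loss(losses, weights))
```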
3) Label Refurbishment: Refurbishing a noisy label $\tilde{y}$ effectively prevents overfitting to false labels. Let $\hat{y}$ be the current prediction of the DNN $f(x; \Theta)$. The refurbished label $y^{\mathrm{refurb}}$ can then be obtained by a convex combination of the noisy label $\tilde{y}$ and the DNN prediction $\hat{y}$,
$$y^{\mathrm{refurb}} = \alpha \tilde{y} + (1-\alpha)\hat{y}, \quad (12)$$
where $\alpha \in [0,1]$ is the label confidence of $\tilde{y}$. To mitigate the damage of incorrect labeling, this approach backpropagates the loss for the refurbished label instead of the noisy one, thereby yielding substantial robustness to noisy labels.
Technical Detail: Bootstrapping [69] is the first method that proposes the concept of label refurbishment to update the target label of training examples. It develops a more coherent network that improves its ability to evaluate the consistency of noisy labels, with the label confidence $\alpha$ obtained via cross-validation. Dynamic bootstrapping [110] dynamically adjusts the confidence $\alpha$ of individual training examples; the confidence is obtained by fitting a two-component, one-dimensional beta mixture model to the loss distribution of all training examples. Self-adaptive training [119] applies an exponential moving average to alleviate the instability of using the instantaneous prediction of the current DNN,
$$y^{\mathrm{refurb}}_{t+1} = \alpha y^{\mathrm{refurb}}_t + (1-\alpha)\hat{y}, \quad \text{where } y^{\mathrm{refurb}}_0 = \tilde{y}. \quad (13)$$
D2L [111] trains a DNN with a dimensionality-driven learning strategy to avoid overfitting to false labels: a simple measure called local intrinsic dimensionality [120] is adopted to evaluate the confidence $\alpha$.
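A small sketch of Eqs. (12) and (13); α is the label confidence, and the fixed prediction vector is only for illustration (in practice $\hat{y}$ changes at every epoch):

```python
import numpy as np

def refurbish(noisy_onehot, prediction, alpha):
    """Eq. (12): convex combination of the noisy label and the DNN prediction."""
    return alpha * noisy_onehot + (1 - alpha) * prediction

def ema_refurbish(prev_refurb, prediction, alpha):
    """Eq. (13): exponential moving average used by self-adaptive training [119]."""
    return alpha * prev_refurb + (1 - alpha) * prediction

noisy = np.array([0.0, 1.0, 0.0])            # noisy one-hot label
pred = np.array([0.7, 0.2, 0.1])             # DNN prediction, held fixed for illustration
print(refurbish(noisy, pred, alpha=0.8))     # Eq. (12)

y_refurb = noisy.copy()                      # Eq. (13): y_refurb_0 = y_tilde
for step in range(3):
    y_refurb = ema_refurbish(y_refurb, pred, alpha=0.9)
    print(step, y_refurb.round(3))
```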
Sample Selection
■ Select only the examples whose labels are true
• Versatility improves because a pseudo-clean dataset can be constructed [Song+, ICML2019]
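Sample selection is often instantiated with the small-loss trick (keeping the examples whose current loss is small, as in co-teaching-style methods); the sketch below illustrates that idea and is not a specific method from this deck:

```python
import numpy as np

def select_small_loss(per_example_losses, keep_ratio):
    """Keep the indices of the `keep_ratio` fraction of examples with the smallest loss,
    treating them as a pseudo-clean subset for the next update."""
    n_keep = max(1, int(len(per_example_losses) * keep_ratio))
    return np.argsort(per_example_losses)[:n_keep]

losses = np.array([0.2, 3.1, 0.4, 2.8, 0.1, 0.3])
clean_idx = select_small_loss(losses, keep_ratio=0.5)
print(clean_idx)        # -> [4 0 5], treated as clean examples
```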
Comparison
TABLE III: Comparison of robust deep learning categories for overcoming noisy labels, across six properties: P1 Flexibility, P2 No Pre-train, P3 Full Exploration, P4 No Supervision, P5 Heavy Noise, P6 Complex Noise.
Rows: Robust Architecture (§III-A): Noise Adaptation Layer, Dedicated Architecture; Robust Regularization (§III-B): Implicit Regularization, Explicit Regularization; Robust Loss Function (§III-C); Loss Adjustment (§III-D): Loss Correction, Loss Reweighting, Label Refurbishment, Meta Learning; Sample Selection (§III-E): Multi-Network Learning, Multi-Round Learning, Hybrid Approach.
In Table III, methods are marked according to whether they deal only with instance-independent noise or with both instance-independent and instance-dependent noise. The noise transition matrix can be estimated from anchor points: letting $\mathcal{A}_i$ be the set of anchor points with label $i$, each element $T_{ij}$ is estimated by
$$\hat{T}_{ij} = \frac{1}{|\mathcal{A}_i|} \sum_{x \in \mathcal{A}_i} \sum_{k=1}^{c} p(\tilde{y}=j \mid y=k)\, p(y=k \mid x).$$
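A sketch of this anchor-point estimate (my own NumPy illustration): in practice the noisy posterior $p(\tilde{y} \mid x)$ comes from a model trained on the noisy data, and for an anchor point of class $i$ we have $p(y=i \mid x) \approx 1$, so averaging the predicted noisy posterior over $\mathcal{A}_i$ approximates row $i$ of $T$:

```python
import numpy as np

def estimate_transition_matrix(noisy_posteriors, anchor_indices, n_classes):
    """Anchor-point estimate of T: T_hat[i, j] is the mean, over anchor points of class i,
    of the model's predicted noisy-label posterior p(y_tilde = j | x).

    noisy_posteriors: (n, c) predicted p(y_tilde | x) for all examples
    anchor_indices:   dict {class i: array of example indices that are anchors for i}
    """
    T_hat = np.zeros((n_classes, n_classes))
    for i in range(n_classes):
        T_hat[i] = noisy_posteriors[anchor_indices[i]].mean(axis=0)
    return T_hat

# toy usage with 3 classes and 2 anchor points per class
rng = np.random.default_rng(0)
posteriors = rng.dirichlet(np.ones(3), size=6)
anchors = {0: np.array([0, 1]), 1: np.array([2, 3]), 2: np.array([4, 5])}
print(estimate_transition_matrix(posteriors, anchors, 3).round(3))
```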
Summary
■ A survey of methods for learning from noisy labels
• Robust architectures
• Robust regularization
• Robust loss function design
• Sample selection
■ There is no general-purpose framework
• Methods must be designed appropriately for the problem at hand