A Baseline For Detecting Misclassified and Out-of-Distribution
Examples In Neural Networks
PR-190
Kang, MinGuk
mingukkang1994@gmail.com
Sep. 01, 2019
https://arxiv.org/abs/1610.02136
Unprecedented Successes (Motivation)
Image Classification Real-World Applications
https://arxiv.org/abs/1905.11946
Unprecedented Successes (Motivation)
https://www.researchgate.net/figure/Collage-of-some-medical-imaging-applications-in-which-deep-learning-has-achieved_fig1_313857891
Why do deep neural networks tend to be overconfident?
① Softmax probabilities are computed
with the fast-growing exponential function.
But… there is no experimental analysis to support this explanation.
https://arxiv.org/pdf/1706.04599.pdf
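The exponential in the softmax is easy to demonstrate in a few lines of NumPy — a minimal sketch (the logit values are illustrative):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the probabilities are unchanged.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# A modest logit gap already yields a near-certain prediction, because
# exp() turns additive logit differences into multiplicative ratios.
p = softmax(np.array([4.0, 1.0, 0.0]))
print(p.round(3))  # top class receives about 0.94 of the probability mass
```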
Expected Calibration Error (ECE)
① Depth ↑
② Filters ↑
③ Batch Normalization used
④ Weight Decay ↓
It remains future work to understand why these
trends affect calibration while improving accuracy.
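For reference, ECE bins predictions by confidence and averages the gap between accuracy and mean confidence per bin (Guo et al., 2017). A minimal sketch, with made-up toy data:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE (Guo et al., 2017): bin predictions by confidence, then take the
    weighted average of |accuracy - mean confidence| over the bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Perfectly calibrated toy case: 80% confidence and 80% accuracy -> ECE = 0.
ece = expected_calibration_error([0.8] * 10, [1] * 8 + [0] * 2)
print(ece)
```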
Contributions of this Paper
1. They show that the prediction probability of incorrect and out-of-distribution examples tends to be lower
than the prediction probability of correct examples.
2. These prediction probabilities form the detection baseline, whose efficacy is demonstrated across
various computer vision, natural language processing, and automatic speech recognition tasks.
3. They contribute one method which outperforms the baseline on some (but not all) tasks.
4. They designate standard tasks and evaluation metrics for assessing the automatic detection of errors
and out-of-distribution examples.
Evaluation Metrics
In-distribution Fish: 99
Out-of-distribution Fish: 1
A "cheating" neural network that always predicts in-distribution: 99% accuracy!
So accuracy is not an appropriate metric for out-of-distribution detection.
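This cheating detector fits in a few lines:

```python
# 99 in-distribution examples, 1 out-of-distribution example (label 1).
labels = [0] * 99 + [1]
# A "cheating" detector that always answers "in-distribution".
predictions = [0] * 100
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)  # 0.99 -- a useless detector with excellent-looking accuracy
```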
Evaluation Metrics
① AUROC (Area Under the Receiver Operating Characteristic Curve)
FPR (False Positive Rate) = FP / (FP + TN)
TPR (True Positive Rate) = TP / (TP + FN)
AUROC can be interpreted as the probability that a positive example has a greater
detector score/value than a negative example (Fawcett, 2005).
AUROC is not ideal when the positive class and the negative class have
greatly differing base rates.
② AUPR (Area Under the Precision–Recall Curve)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
The AUPR adjusts for these differing base rates, so it complements the AUROC
when one class is much rarer than the other.
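The rank interpretation of AUROC (Fawcett, 2005) translates directly into code — a sketch with illustrative scores:

```python
import numpy as np

def auroc(scores_pos, scores_neg):
    """AUROC via its rank interpretation: the probability that a randomly
    drawn positive example outscores a randomly drawn negative one
    (ties count as 1/2)."""
    sp = np.asarray(scores_pos, dtype=float)[:, None]
    sn = np.asarray(scores_neg, dtype=float)[None, :]
    return (sp > sn).mean() + 0.5 * (sp == sn).mean()

pos = [0.9, 0.8, 0.7]   # illustrative detector scores on positive examples
neg = [0.6, 0.5, 0.8]   # illustrative detector scores on negative examples
score = auroc(pos, neg)
print(score)  # 7.5 / 9 ≈ 0.833
```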
Experiments (Misclassified?)
(Figure: MNIST digits shown with their softmax confidences.)
Confidences: 0.81 0.91 0.84 0.91 0.85 0.75 0.90 0.88 — Average: 0.86
Confidences: 0.90 0.95 0.85 0.95 0.92 0.88 0.95 0.86 — Average: 0.91
(Predictions match the labels: 7/7, 8/8, 8/8, 8/8, 5/5, 7/7, 9/9, 6/6.)
Experiments (Out of Distribution)
(Figure: a Wide ResNet (40-4) is trained on the CIFAR10 dataset; at test time,
the maximum softmax probability of each prediction is selected and
used as the out-of-distribution score.)
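The maximum-softmax-probability baseline is a one-liner on top of the logits; the logit values below are illustrative:

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def msp_score(logits):
    """Maximum softmax probability: on average high for in-distribution
    inputs and lower for OOD inputs; threshold it to flag OOD examples."""
    return softmax(logits).max(axis=-1)

in_logits = np.array([[6.0, 1.0, 0.0]])    # peaked: confident prediction
ood_logits = np.array([[1.2, 1.0, 0.9]])   # flat: uncertain prediction
s_in = msp_score(in_logits)[0]
s_ood = msp_score(ood_logits)[0]
print(s_in, s_ood)
```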
Experiments (NLP)
The same phenomenon is observed in NLP!
Sentiment Classification
Text Categorization
Automatic Speech Recognition
Experimental Results of Sentiment Classification
Improved Method
Abnormality Module
1. Train a normal classifier and append an auxiliary decoder
that reconstructs the input, using the in-distribution dataset.
2. Freeze the blue layers.
3. Train the red layers on clean and noised training examples.
Finally, the sigmoid output of the red layers scores how normal the input is.
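A toy sketch of the idea (not the paper's architecture): the frozen encoder/decoder is replaced by an identity reconstruction, and the trainable "red layers" collapse to a logistic score over the reconstruction error; `w` and `b` are hypothetical hand-picked parameters standing in for learned ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruction_error(x, x_hat):
    return np.mean((x - x_hat) ** 2)

# "Red layers" stand-in: a logistic score over the reconstruction error.
# w and b are hypothetical learned parameters, chosen here by hand.
w, b = -8.0, 4.0

def normality_score(x, x_hat):
    # High score = the input looks normal (it reconstructs well).
    return sigmoid(w * reconstruction_error(x, x_hat) + b)

x = rng.normal(size=32)
clean_recon = x                                    # perfect reconstruction
noisy_recon = x + rng.normal(scale=1.0, size=32)   # poor reconstruction
s_clean = normality_score(x, clean_recon)
s_noisy = normality_score(x, noisy_recon)
print(s_clean, s_noisy)
```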
Improved Method
Abnormality Module
The Abnormality Module is useful for detecting out-of-distribution samples!
Expected Calibration Error (ECE)
① Depth ↑
② Filters ↑
③ Batch Normalization used
④ Weight Decay ↓
It remains future work to understand why these
trends affect calibration while improving accuracy.
On Calibration of Modern Neural Networks
(2017.06.14)
(2016.10.07)
A Simple Unified Framework for Detecting Out-of-
Distribution Samples and Adversarial Attacks
(2018.07.10)
Training Confidence-calibrated Classifiers for Detecting
Out-of-Distribution Samples
(2017.11.26)
Train Generative Adversarial Networks to generate
Boundary Samples.
(Figure: the target output assigns probability 1/k to each of the k classes, i.e. the uniform distribution.)
Deep Anomaly Detection with Outlier Exposure
(2018.12.11)
Utilize Realistic Outliers instead of boundary samples
(Figure: softmax outputs over k classes for the in-distribution dataset and the out-of-distribution dataset;
outputs on outliers are trained toward the uniform probability 1/k.)
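Both approaches push the softmax output on outlier inputs toward the uniform distribution over the k classes; that auxiliary objective is a cross-entropy against a uniform target — a minimal sketch:

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

def uniform_target_loss(logits):
    """Cross-entropy between the uniform distribution over k classes and the
    model's softmax output: -(1/k) * sum_c log p_c."""
    return -np.mean(log_softmax(logits))

peaked = np.array([5.0, 0.0, 0.0])   # confident output on an outlier: high loss
flat = np.array([1.0, 1.0, 1.0])     # uniform output: minimal loss, log(k)
loss_peaked = uniform_target_loss(peaked)
loss_flat = uniform_target_loss(flat)
print(loss_peaked, loss_flat)
```

Minimizing this term on outliers drives their softmax outputs flat, so the maximum softmax probability separates them more cleanly from in-distribution inputs.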
Thank You!