A Baseline For Detecting Misclassified and Out-of-Distribution
Examples In Neural Networks
PR-190
Kang, MinGuk
mingukkang1994@gmail.com
Sep. 01, 2019
https://arxiv.org/abs/1610.02136
Unprecedented Successes (Motivation)
Image Classification Real-World Applications
https://arxiv.org/abs/1905.11946
Unprecedented Successes (Motivation)
https://www.researchgate.net/figure/Collage-of-some-medical-imaging-applications-in-which-deep-learning-has-achieved_fig1_313857891
Why do deep neural networks tend to be overconfident?
① Softmax probabilities are computed
with the fast-growing exponential function.
But… there is no experimental analysis to support this explanation.
https://arxiv.org/pdf/1706.04599.pdf
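The exponential in the softmax is easy to demonstrate in a few lines of NumPy — a minimal sketch (the logit values are illustrative):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the probabilities are unchanged.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# A modest logit gap already yields a near-certain prediction, because
# exp() turns additive logit differences into multiplicative ratios.
p = softmax(np.array([4.0, 1.0, 0.0]))
print(p.round(3))  # top class receives about 0.94 of the probability mass
```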
Expected Calibration Error (ECE)
① Depth ↑
② Filters ↑
③ Batch Normalization used
④ Weight Decay ↓
It remains future work to understand why these
trends affect calibration while improving accuracy.
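For reference, ECE bins predictions by confidence and averages the gap between accuracy and mean confidence per bin (Guo et al., 2017). A minimal sketch, with made-up toy data:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE (Guo et al., 2017): bin predictions by confidence, then take the
    weighted average of |accuracy - mean confidence| over the bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Perfectly calibrated toy case: 80% confidence and 80% accuracy -> ECE = 0.
ece = expected_calibration_error([0.8] * 10, [1] * 8 + [0] * 2)
print(ece)
```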
Contributions of this Paper
1. They show that the prediction probability of incorrect and out-of-distribution examples tends to be lower
than the prediction probability of correct examples.
2. These prediction probabilities form the detection baseline, whose efficacy is demonstrated across
various computer vision, natural language processing, and automatic speech recognition tasks.
3. They contribute one method which outperforms the baseline on some (but not all) tasks.
4. They designate standard tasks and evaluation metrics for assessing the automatic detection of errors
and out-of-distribution examples.
Evaluation Metrics
In-distribution Fish: 99
Out-of-distribution Fish: 1
A "cheating" neural network that always predicts in-distribution: 99% accuracy!
So accuracy is not an appropriate metric for out-of-distribution detection.
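This cheating detector fits in a few lines:

```python
# 99 in-distribution examples, 1 out-of-distribution example (label 1).
labels = [0] * 99 + [1]
# A "cheating" detector that always answers "in-distribution".
predictions = [0] * 100
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)  # 0.99 -- a useless detector with excellent-looking accuracy
```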
Evaluation Metrics
① AUROC (Area Under the Receiver Operating Characteristic Curve)
FPR (False Positive Rate) = FP / (FP + TN)
TPR (True Positive Rate) = TP / (TP + FN)
AUROC can be interpreted as the probability that a positive example has a greater
detector score/value than a negative example (Fawcett, 2005).
AUROC is not ideal when the positive class and the negative class have
greatly differing base rates.
② AUPR (Area Under the Precision–Recall Curve)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
The AUPR adjusts for these differing base rates, so it complements the AUROC
when one class is much rarer than the other.
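The rank interpretation of AUROC (Fawcett, 2005) translates directly into code — a sketch with illustrative scores:

```python
import numpy as np

def auroc(scores_pos, scores_neg):
    """AUROC via its rank interpretation: the probability that a randomly
    drawn positive example outscores a randomly drawn negative one
    (ties count as 1/2)."""
    sp = np.asarray(scores_pos, dtype=float)[:, None]
    sn = np.asarray(scores_neg, dtype=float)[None, :]
    return (sp > sn).mean() + 0.5 * (sp == sn).mean()

pos = [0.9, 0.8, 0.7]   # illustrative detector scores on positive examples
neg = [0.6, 0.5, 0.8]   # illustrative detector scores on negative examples
score = auroc(pos, neg)
print(score)  # 7.5 / 9 ≈ 0.833
```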
Experiments (Misclassified?)
(Figure: MNIST digits shown with their softmax confidences.)
Confidences: 0.81 0.91 0.84 0.91 0.85 0.75 0.90 0.88 — Average: 0.86
Confidences: 0.90 0.95 0.85 0.95 0.92 0.88 0.95 0.86 — Average: 0.91
(Predictions match the labels: 7/7, 8/8, 8/8, 8/8, 5/5, 7/7, 9/9, 6/6.)
Experiments (Out of Distribution)
(Figure: a Wide ResNet (40-4) is trained on the CIFAR10 dataset; at test time,
the maximum softmax probability of each prediction is selected and
used as the out-of-distribution score.)
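The maximum-softmax-probability baseline is a one-liner on top of the logits; the logit values below are illustrative:

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def msp_score(logits):
    """Maximum softmax probability: on average high for in-distribution
    inputs and lower for OOD inputs; threshold it to flag OOD examples."""
    return softmax(logits).max(axis=-1)

in_logits = np.array([[6.0, 1.0, 0.0]])    # peaked: confident prediction
ood_logits = np.array([[1.2, 1.0, 0.9]])   # flat: uncertain prediction
s_in = msp_score(in_logits)[0]
s_ood = msp_score(ood_logits)[0]
print(s_in, s_ood)
```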
Experiments (NLP)
The same phenomenon is observed in NLP!
Sentiment Classification
Text Categorization
Automatic Speech Recognition
Experimental Results of Sentiment Classification
Improved Method
Abnormality Module
1. Train a normal classifier and append an auxiliary decoder
that reconstructs the input, using the in-distribution dataset.
2. Freeze the blue layers.
3. Train the red layers on clean and noised training examples.
Finally, the sigmoid output of the red layers scores how normal the input is.
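A toy sketch of the idea (not the paper's architecture): the frozen encoder/decoder is replaced by an identity reconstruction, and the trainable "red layers" collapse to a logistic score over the reconstruction error; `w` and `b` are hypothetical hand-picked parameters standing in for learned ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruction_error(x, x_hat):
    return np.mean((x - x_hat) ** 2)

# "Red layers" stand-in: a logistic score over the reconstruction error.
# w and b are hypothetical learned parameters, chosen here by hand.
w, b = -8.0, 4.0

def normality_score(x, x_hat):
    # High score = the input looks normal (it reconstructs well).
    return sigmoid(w * reconstruction_error(x, x_hat) + b)

x = rng.normal(size=32)
clean_recon = x                                    # perfect reconstruction
noisy_recon = x + rng.normal(scale=1.0, size=32)   # poor reconstruction
s_clean = normality_score(x, clean_recon)
s_noisy = normality_score(x, noisy_recon)
print(s_clean, s_noisy)
```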
Improved Method
Abnormality Module
The Abnormality Module is useful for detecting out-of-distribution samples!
Expected Calibration Error (ECE)
① Depth ↑
② Filters ↑
③ Batch Normalization used
④ Weight Decay ↓
It remains future work to understand why these
trends affect calibration while improving accuracy.
On Calibration of Modern Neural Networks
(2017.06.14)
(2016.10.07)
A Simple Unified Framework for Detecting Out-of-
Distribution Samples and Adversarial Attacks
(2018.07.10)
Training Confidence-calibrated Classifiers for Detecting
Out-of-Distribution Samples
(2017.11.26)
Train Generative Adversarial Networks to generate
Boundary Samples.
(Figure: the target output assigns probability 1/k to each of the k classes, i.e. the uniform distribution.)
Deep Anomaly Detection with Outlier Exposure
(2018.12.11)
Utilize Realistic Outliers instead of boundary samples
(Figure: softmax outputs over k classes for the in-distribution dataset and the out-of-distribution dataset;
outputs on outliers are trained toward the uniform probability 1/k.)
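Both approaches push the softmax output on outlier inputs toward the uniform distribution over the k classes; that auxiliary objective is a cross-entropy against a uniform target — a minimal sketch:

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

def uniform_target_loss(logits):
    """Cross-entropy between the uniform distribution over k classes and the
    model's softmax output: -(1/k) * sum_c log p_c."""
    return -np.mean(log_softmax(logits))

peaked = np.array([5.0, 0.0, 0.0])   # confident output on an outlier: high loss
flat = np.array([1.0, 1.0, 1.0])     # uniform output: minimal loss, log(k)
loss_peaked = uniform_target_loss(peaked)
loss_flat = uniform_target_loss(flat)
print(loss_peaked, loss_flat)
```

Minimizing this term on outliers drives their softmax outputs flat, so the maximum softmax probability separates them more cleanly from in-distribution inputs.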
Thank You!