Calibrating Probability with Undersampling
for Unbalanced Classification
Andrea Dal Pozzolo, Olivier Caelen,
Reid A. Johnson, and Gianluca Bontempi
8 December 2015
IEEE CIDM 2015
Cape Town, South Africa
INTRODUCTION
In several binary classification problems, the two classes
are not equally represented in the dataset.
In fraud detection, for example, fraudulent transactions are
rare compared to genuine ones (less than 1% [2]).
Many classification algorithms perform poorly with
unbalanced class distributions [5].
A standard solution to unbalanced classification is
rebalancing the classes before training a classifier.
UNDERSAMPLING
Undersampling is a well-known technique used to balance
a dataset.
It consists of downsizing the majority class by removing
observations at random until the dataset is balanced.
Several studies [11, 4] have reported that it improves
classification performance.
Most often, the consequences of undersampling on the
posterior probability of a classifier are ignored.
OBJECTIVE OF THIS STUDY
In this work we:
formalize how undersampling works.
show that undersampling is responsible for a shift in the
posterior probability of a classifier.
study how this shift is linked to class separability.
investigate how this shift produces biased probability
estimates.
show how to obtain and use unbiased (calibrated)
probability for classification.
THE PROBLEM
Let us consider a binary classification task f : R^n → {+, −},
where X ∈ R^n is the input domain and Y ∈ {+, −} the output domain.
+ is the minority and − the majority class.
Given a classifier K and a training set T_N, we are interested
in estimating, for a new sample (x, y), the posterior
probability p(y = +|x).
EFFECT OF UNDERSAMPLING
Suppose that a classifier K is trained on an unbalanced set T_N.
Let s be a binary random variable associated with each sample
(x, y) ∈ T_N: s = 1 if the point is kept by the sampling and
s = 0 otherwise.
Assume that s is independent of the input x given the class
y (class-dependent selection):

$$p(s \mid y, x) = p(s \mid y) \iff p(x \mid y, s) = p(x \mid y)$$
Figure: Undersampling removes majority-class examples at random,
turning the unbalanced dataset into a balanced one. Samples removed
from the unbalanced dataset (s = 0) are shown in red.
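A minimal sketch of this class-dependent selection in Python/NumPy (our own illustration; the function and variable names are not from the paper). All minority samples are kept, and each majority sample is kept with probability β, so the expected class sizes match:

```python
import numpy as np

def undersample(X, y, rng):
    """Bernoulli variant of random undersampling: keep every minority
    (y == 1) sample; keep each majority (y == 0) sample with
    probability beta = N+/N-, i.e. beta = p(s = 1 | -)."""
    beta = np.sum(y == 1) / np.sum(y == 0)
    s = (y == 1) | (rng.random(len(y)) < beta)   # selection variable s
    return X[s], y[s], beta

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 2))
y = (rng.random(10000) < 0.05).astype(int)       # ~5% minority class
Xb, yb, beta = undersample(X, y, rng)            # roughly balanced now
```

Many implementations instead draw exactly N+ majority samples without replacement; both variants satisfy the class-dependent assumption p(s|y, x) = p(s|y).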
POSTERIOR PROBABILITIES
Let p_s = p(+|x, s = 1) and p = p(+|x). We can write p_s as a
function of p [1]:

$$p_s = \frac{p}{p + \beta(1 - p)} \tag{1}$$

where β = p(s = 1|−). Using (1) we can obtain an expression of
p as a function of p_s:

$$p = \frac{\beta p_s}{\beta p_s - p_s + 1} \tag{2}$$
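As a quick numerical sanity check (a sketch, not from the paper), the two maps are exact inverses of each other, and p_s ≥ p whenever β < 1:

```python
import numpy as np

def ps_from_p(p, beta):
    """Eq. (1): posterior after undersampling, with beta = p(s=1|-)."""
    return p / (p + beta * (1.0 - p))

def p_from_ps(ps, beta):
    """Eq. (2): recover the original posterior from the biased one."""
    return beta * ps / (beta * ps - ps + 1.0)

p = np.linspace(0.0, 1.0, 11)
beta = 0.1                       # strong undersampling of the majority
ps = ps_from_p(p, beta)          # shifted towards the positive class
assert np.all(ps >= p)
assert np.allclose(p_from_ps(ps, beta), p)
```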
WARPING AND CLASS SEPARABILITY
Let ω+ and ω− denote p(x|+) and p(x|−), and let π+ (π+_s) be the
class priors before (after) undersampling. Using Bayes' theorem:

$$p = \frac{\omega^+ \pi^+}{\omega^+ - \delta \pi^-} \tag{3}$$

where δ = ω+ − ω−. Similarly, since ω+ does not change with
undersampling:

$$p_s = \frac{\omega^+ \pi^+_s}{\omega^+ - \delta \pi^-_s} \tag{4}$$

Now we can write p_s − p as:

$$p_s - p = \frac{\omega^+ \pi^+_s}{\omega^+ - \delta \pi^-_s} - \frac{\omega^+ \pi^+}{\omega^+ - \delta \pi^-} \tag{5}$$
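Equation (5) can be evaluated directly to see how the warp depends on class separability (a small sketch in the spirit of the next figure; the parameter values are taken from its caption):

```python
import numpy as np

def warp(omega_pos, delta, pi_pos, pi_pos_s):
    """Eq. (5): ps - p at a point x, given the class-conditional
    densities omega+ and omega- = omega+ - delta, and the positive
    priors before (pi+) and after (pi+_s) undersampling."""
    pi_neg, pi_neg_s = 1.0 - pi_pos, 1.0 - pi_pos_s
    p = omega_pos * pi_pos / (omega_pos - delta * pi_neg)
    ps = omega_pos * pi_pos_s / (omega_pos - delta * pi_neg_s)
    return ps - p

for omega_pos in (0.01, 0.1):
    delta = np.linspace(-omega_pos, omega_pos, 5)
    print(warp(omega_pos, delta, pi_pos=0.1, pi_pos_s=0.5))
```

At δ = ω+ (i.e. ω− = 0, the classes are perfectly separable at x) both posteriors equal 1 and the warp vanishes; it is largest where the class-conditional densities overlap.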
WARPING AND CLASS SEPARABILITY
Figure: p_s − p as a function of δ = ω+ − ω−, for values of
ω+ ∈ {0.01, 0.1}, when π+_s = 0.5 and π+ = 0.1.
WARPING AND CLASS SEPARABILITY II
[Panels: (a) p_s as a function of β; (b) class distribution, i.e. the
counts of the two classes over x, for µ = 3 and µ = 15.]
Figure: Class distribution and posterior probability as a function of β
for two univariate binary classification tasks with normal
class-conditional densities X− ∼ N(0, σ) and X+ ∼ N(µ, σ) (on the left
µ = 3 and on the right µ = 15; in both examples σ = 3). Note that p
corresponds to β = 1 and p_s to β < 1.
ADJUSTING POSTERIOR PROBABILITIES
We propose to correct p_s with p', which is obtained
using (2):

$$p' = \frac{\beta p_s}{\beta p_s - p_s + 1} \tag{6}$$

Eq. (6) is a special case of the framework proposed by Saerens
et al. [8] and Elkan [3] (see the Appendix in the paper).
[Plot: posterior probability over x ∈ [−10, 15] for the three
estimates p_s, p' and p.]
Figure: Posterior probabilities p_s, p' and p for β = N+/N− in the
dataset with overlapping classes (µ = 3).
CLASSIFICATION THRESHOLD
Let r+ and r− be the risks of predicting an instance as positive
and negative, respectively:

$$r^+ = (1 - p) \cdot l_{1,0} + p \cdot l_{1,1}$$
$$r^- = (1 - p) \cdot l_{0,0} + p \cdot l_{0,1}$$

where l_{i,j} is the cost of predicting i when the true class is j and
p = p(y = +|x). A sample is predicted as positive if
r+ ≤ r− [10]:

$$\hat{y} = \begin{cases} + & \text{if } r^+ \le r^- \\ - & \text{if } r^+ > r^- \end{cases} \tag{7}$$

Alternatively, predict as positive when p > τ, with:

$$\tau = \frac{l_{1,0} - l_{0,0}}{l_{1,0} - l_{0,0} + l_{0,1} - l_{1,1}} \tag{8}$$
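A small sketch of this cost-based decision rule (the cost values below are placeholders for illustration, not from the paper):

```python
def bayes_threshold(l10, l00, l01, l11):
    """Eq. (8): probability threshold that minimizes expected cost,
    where l_ij is the cost of predicting i when the true class is j."""
    return (l10 - l00) / (l10 - l00 + l01 - l11)

def predict(p, l10=1.0, l00=0.0, l01=5.0, l11=0.0):
    """Predict '+' when p = p(y=+|x) exceeds the cost threshold."""
    return "+" if p > bayes_threshold(l10, l00, l01, l11) else "-"

print(bayes_threshold(1.0, 0.0, 5.0, 0.0))   # 0.1666...: costly false
                                             # negatives lower the threshold
```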
CORRECTING THE CLASSIFICATION THRESHOLD
When the costs of a FN (l_{0,1}) and a FP (l_{1,0}) are unknown, we can
use the priors. Let l_{1,0} = π+ and l_{0,1} = π− (with zero cost for
correct predictions); from (8) we get:

$$\tau = \frac{l_{1,0}}{l_{1,0} + l_{0,1}} = \frac{\pi^+}{\pi^+ + \pi^-} = \pi^+ \tag{9}$$

since π+ + π− = 1. Then we should use π+ as the threshold with p:

p → τ = π+

Similarly,

p_s → τ_s = π+_s

From Elkan [3]:

$$\frac{\tau'}{1 - \tau'} \cdot \frac{1 - \tau_s}{\tau_s} = \beta \tag{10}$$

Therefore, we obtain:

p' → τ' = π+
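A sketch checking the consistency of these thresholds, assuming fully balanced undersampling (so that τ_s = π+_s = 0.5 and β = N+/N−):

```python
def tau_prime(tau_s, beta):
    """Threshold to use with p', solved from Elkan's relation (10):
    tau'/(1 - tau') * (1 - tau_s)/tau_s = beta."""
    return beta * tau_s / (beta * tau_s - tau_s + 1.0)

n_pos, n_neg = 500, 9500
beta = n_pos / n_neg                    # undersampling rate giving balance
pi_pos = n_pos / (n_pos + n_neg)
assert abs(tau_prime(0.5, beta) - pi_pos) < 1e-12   # tau' = pi+
```

Note that τ' is obtained from τ_s by the same monotone transformation (2)/(6) that maps p_s to p', so thresholding p' at τ' and p_s at τ_s yield identical decisions.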
EXPERIMENTAL SETTINGS
We denote by ˆp_s, ˆp' and ˆp the estimates of p_s, p' and p.
Goal: understand which probability estimate yields the best
ranking (AUC), calibration (Brier Score, BS) and classification
accuracy (G-mean).
We use 10-fold cross-validation (CV) to test our models,
and we repeat the CV 10 times.
We test several classification algorithms: Random
Forest [7], SVM [6], and Logit Boost [9].
We consider real-world unbalanced datasets from the UCI
repository used in [1]; a sketch of the evaluation loop follows.
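A compressed sketch of one CV repetition (our own illustration, not the authors' code: the paper's experiments use the R packages cited in [6, 7, 9], whereas this sketch uses scikit-learn's RandomForestClassifier as a stand-in; y is assumed to be 0/1 with 1 the minority class):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import StratifiedKFold

def g_mean(y, y_hat):
    """Geometric mean of sensitivity and specificity."""
    return np.sqrt(np.mean(y_hat[y == 1]) * np.mean(1 - y_hat[y == 0]))

def one_cv_repetition(X, y, seed=0):
    rng = np.random.default_rng(seed)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    for train, test in cv.split(X, y):
        Xtr, ytr, Xte, yte = X[train], y[train], X[test], y[test]
        beta = ytr.sum() / (len(ytr) - ytr.sum())           # N+/N-
        keep = (ytr == 1) | (rng.random(len(ytr)) < beta)   # undersample
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(Xtr[keep], ytr[keep])
        ps = clf.predict_proba(Xte)[:, 1]                   # biased hat(p_s)
        p_prime = beta * ps / (beta * ps - ps + 1.0)        # Eq. (6): hat(p')
        tau_s, tau_p = 0.5, ytr.mean()                      # tau_s; tau' = pi+
        yield {"AUC": roc_auc_score(yte, ps),               # identical for p'
               "BS_ps": brier_score_loss(yte, ps),
               "BS_p'": brier_score_loss(yte, p_prime),
               "G_ps": g_mean(yte, (ps > tau_s).astype(int)),
               "G_p'": g_mean(yte, (p_prime > tau_p).astype(int))}
```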
LEARNING FRAMEWORK
[Diagram: the unbalanced dataset is split into 10 CV folds. Each
training split is used twice: as-is to train an unbalanced model
(giving ˆp with threshold τ), and after undersampling to train a
balanced model (giving ˆp_s with threshold τ_s, corrected into ˆp'
with threshold τ').]
Figure: Learning framework for comparing models with and
without undersampling using cross-validation (CV). We use one fold
of the CV as the testing set and the others for training, and iterate
so that every fold is used once for testing.
DATASETS
Table: Datasets from the UCI repository used in [1].
Datasets N N+ N− N+/N
ecoli 336 35 301 0.10
glass 214 17 197 0.08
letter-a 20000 789 19211 0.04
letter-vowel 20000 3878 16122 0.19
ism 11180 260 10920 0.02
letter 20000 789 19211 0.04
oil 937 41 896 0.04
page 5473 560 4913 0.10
pendigits 10992 1142 9850 0.10
PhosS 11411 613 10798 0.05
satimage 6430 625 5805 0.10
segment 2310 330 1980 0.14
boundary 3505 123 3382 0.04
estate 5322 636 4686 0.12
cam 18916 942 17974 0.05
compustat 13657 520 13137 0.04
covtype 38500 2747 35753 0.07
RESULTS
Table: Sum of ranks, and p-values of the paired t-test between the
ranks of ˆp and ˆp_s and between the ranks of ˆp and ˆp', for the
different metrics. An asterisk marks the probability with the best
rank sum (higher for AUC and G-mean, lower for BS).

Metric  Algo  R_ˆp       R_ˆp_s     R_ˆp'      ρ(R_ˆp, R_ˆp_s)  ρ(R_ˆp, R_ˆp')
AUC     LB    22,516     23,572*    23,572*    0.322            0.322
AUC     RF    24,422*    22,619     22,619     0.168            0.168
AUC     SVM   19,595     19,902.5*  19,902.5*  0.873            0.873
G-mean  LB    23,281*    23,189.5   23,189.5   0.944            0.944
G-mean  RF    22,986     23,337*    23,337*    0.770            0.770
G-mean  SVM   19,550     19,925*    19,925*    0.794            0.794
BS      LB    19,809.5*  29,448.5   20,402     0.000            0.510
BS      RF    18,336*    28,747     22,577     0.000            0.062
BS      SVM   17,139*    23,161     19,100     0.001            0.156
RESULTS II
The rank sum is the same for ˆp_s and ˆp' since (6) is
monotone.
Undersampling does not always improve the ranking
(AUC) or classification accuracy (G-mean) of an algorithm.
ˆp is the probability estimate with the best calibration
(lowest rank sum for BS).
ˆp' always has better calibration than ˆp_s, so we should use
ˆp' instead of ˆp_s.
CREDIT CARDS DATASET
Real-world credit card dataset with transactions from September
2013; frauds account for 0.172% of all transactions.
[Plots for the credit-card dataset, one panel per algorithm (LB, RF,
SVM): AUC (roughly 0.900 to 1.000) and Brier Score (roughly 3e−04 to
9e−04) as functions of β ∈ {0.1, 0.2, ..., 0.9, 1}, for the
probabilities p, p' and p_s.]
CONCLUSION
As a result of undersampling, ˆp_s is shifted away from ˆp.
This shift is stronger for overlapping distributions and gets
larger for small values of β.
Using (6), we can remove the shift in ˆp_s and obtain ˆp',
which has better calibration.
ˆp' provides the same ranking quality as ˆp_s.
Results from the UCI and credit card datasets show that, using
ˆp' with τ', we are able to improve calibration without losing
predictive accuracy.
Credit card dataset: http://www.ulb.ac.be/di/map/adalpozz/data/creditcard.Rdata
Website: www.ulb.ac.be/di/map/adalpozz
Email: adalpozz@ulb.ac.be
Thank you for your attention.
Research is supported by the Doctiris scholarship
funded by Innoviris, Brussels, Belgium.
BIBLIOGRAPHY
[1] A. Dal Pozzolo, O. Caelen, and G. Bontempi.
When is undersampling effective in unbalanced classification tasks?
In Machine Learning and Knowledge Discovery in Databases. Springer, 2015.
[2] A. Dal Pozzolo, O. Caelen, Y.-A. Le Borgne, S. Waterschoot, and G. Bontempi.
Learned lessons in credit card fraud detection from a practitioner perspective.
Expert Systems with Applications, 41(10):4915–4928, 2014.
[3] C. Elkan.
The foundations of cost-sensitive learning.
In International Joint Conference on Artificial Intelligence, volume 17, pages 973–978, 2001.
[4] A. Estabrooks, T. Jo, and N. Japkowicz.
A multiple resampling method for learning from imbalanced data sets.
Computational Intelligence, 20(1):18–36, 2004.
[5] N. Japkowicz and S. Stephen.
The class imbalance problem: A systematic study.
Intelligent data analysis, 6(5):429–449, 2002.
[6] A. Karatzoglou, A. Smola, K. Hornik, and A. Zeileis.
kernlab - an S4 package for kernel methods in R.
Journal of Statistical Software, 11(9):1-20, 2004.
[7] A. Liaw and M. Wiener.
Classification and regression by randomForest.
R News, 2(3):18–22, 2002.
[8] M. Saerens, P. Latinne, and C. Decaestecker.
Adjusting the outputs of a classifier to new a priori probabilities: a simple procedure.
Neural computation, 14(1):21–41, 2002.
[9] J. Tuszynski.
caTools: Tools: moving window statistics, GIF, Base64, ROC AUC, etc.
R package version 1.16, 2013.
[10] V. N. Vapnik.
Statistical Learning Theory.
Wiley, New York, 1998.
[11] G. M. Weiss and F. Provost.
The effect of class distribution on classifier learning: an empirical study.
Technical report, Rutgers University, 2001.