Binary classification performance measures cheat sheet
Damien François – v1.1 - 2009 (damien.francois@uclouvain.be)
Confusion matrix for two possible outcomes p (positive) and n (negative)

               Actual p          Actual n          Total
Predicted p'   true positive     false positive    P
Predicted n'   false negative    true negative     N
Total          P'                N'
Classification accuracy
(TP + TN) / (TP + TN + FP + FN)
Error rate
(FP + FN) / (TP + TN + FP + FN)
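The two formulas above can be sketched in Python; TP/TN/FP/FN are counts read off the confusion matrix, and the function names are illustrative, not part of the cheat sheet:

```python
def accuracy(tp, tn, fp, fn):
    """(TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)

def error_rate(tp, tn, fp, fn):
    """(FP + FN) / (TP + TN + FP + FN) = 1 - accuracy."""
    return (fp + fn) / (tp + tn + fp + fn)
```

For example, with 40 true positives, 45 true negatives, 5 false positives and 10 false negatives, accuracy(40, 45, 5, 10) gives 0.85 and error_rate(40, 45, 5, 10) gives 0.15.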
Paired criteria
Precision (or positive predictive value): proportion of predicted positives that
are actual positives
TP / (TP + FP)
Recall: proportion of actual positives that are predicted positive
TP / (TP + FN)
Sensitivity: proportion of actual positives that are predicted positive
TP / (TP + FN)
Specificity: proportion of actual negatives that are predicted negative
TN / (TN + FP)
True positive rate: proportion of actual positives that are predicted positive
TP / (TP + FN)
True negative rate: proportion of actual negatives that are predicted negative
TN / (TN + FP)
Positive likelihood ratio: ratio of the true positive rate to the false positive
rate; larger values indicate a more informative positive prediction
sensitivity / (1 - specificity)
Negative likelihood ratio: ratio of the false negative rate to the true negative
rate; smaller values indicate a more informative negative prediction
(1 - sensitivity) / specificity
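A minimal Python sketch of the paired criteria (function names are illustrative):

```python
def precision(tp, fp):
    """Proportion of predicted positives that are actual positives."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Proportion of actual positives predicted positive (= sensitivity = TPR)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of actual negatives predicted negative (= TNR)."""
    return tn / (tn + fp)

def positive_likelihood(sensitivity, specificity):
    """LR+: sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)

def negative_likelihood(sensitivity, specificity):
    """LR-: (1 - sensitivity) / specificity."""
    return (1.0 - sensitivity) / specificity
```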
Combined criteria
BCR: Balanced Classification Rate
½ (TP / (TP + FN) + TN / (TN + FP))
BER: Balanced Error Rate, or HTER:
Half Total Error Rate: 1 - BCR
F-measure: harmonic mean of precision and recall
2 (precision . recall) /
(precision + recall)
Fβ-measure: weighted harmonic mean of precision and recall
(1 + β²) TP / ((1 + β²) TP + β² FN + FP)
The harmonic mean between specificity
and sensitivity is also often used and
sometimes referred to as F-measure.
Youden's index: sensitivity plus specificity, minus one
sensitivity - (1 - specificity)
Matthews correlation: correlation between the actual and predicted labels
(TP . TN – FP . FN) /
sqrt((TP+FP) (TP+FN) (TN+FP) (TN+FN))
lies between -1 and 1
Discriminant power: normalised
likelihood index
sqrt(3) / π .
(log (sensitivity / (1 – specificity)) +
log (specificity / (1 - sensitivity)))
<1 = poor, >3 = good, fair otherwise
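The combined criteria can be sketched in Python (function names are illustrative; the F-beta formula uses the counts directly):

```python
import math

def bcr(tp, fn, tn, fp):
    """Balanced Classification Rate: mean of sensitivity and specificity."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

def f_beta(tp, fp, fn, beta=1.0):
    """(1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP); beta = 1 gives the F1-measure."""
    b2 = beta ** 2
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, lies in [-1, 1]."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0
```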
Graphical tools
ROC curve (receiver operating
characteristic curve): 2-D curve in the
"true positive rate / false positive
rate" space, parametrized by one
parameter of the classification
algorithm, e.g. some threshold
AUC: the area under the ROC curve,
comprised between 0 and 1
(Cumulative) lift chart: plot of the
true positive rate as a function of the
proportion of the population being
predicted positive, controlled by some
classifier parameter (e.g. a threshold)
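The AUC can be computed without drawing the ROC curve, via the rank (Wilcoxon) statistic: the probability that a randomly chosen positive scores above a randomly chosen negative, ties counting one half. A pure-Python sketch (function name is illustrative):

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfect ranking, e.g. roc_auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]), gives 1.0; a classifier no better than chance gives about 0.5.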
Relationships
sensitivity = recall = true positive rate
specificity = true negative rate
BCR = ½ . (sensitivity + specificity)
Youden's index = 2 . BCR - 1
F-measure = F1-measure (Fβ-measure with β = 1)
Accuracy = 1 – error rate
References
Sokolova, M. and Lapalme, G. 2009. A
systematic analysis of performance
measures for classification tasks. Inf.
Process. Manage. 45, 4 (Jul. 2009),
427-437.
Demsar, J.: Statistical comparisons of
classifiers over multiple data sets.
Journal of Machine Learning Research 7
(2006) 1–30
Regression performance measures cheat sheet
Damien François – v0.9 - 2009 (damien.francois@uclouvain.be)
Let {(x_i, y_i)}, i = 1, ..., n, be a set of
input/output pairs and f a function
such that, for each i, ŷ_i = f(x_i).
Write e_i = y_i - ŷ_i for the residuals.
Squared error
SSE Sum of Squared Errors, or
RSS Residual Sum of Squares: Σ_i e_i²
MSE Mean Squared Error: SSE / n
RMSE Root Mean Squared Error: sqrt(MSE)
NMSE Normalised Mean Squared Error: MSE / var(y),
where var(y) is the empirical variance of the y_i in
the sample.
R-squared
R² = 1 - MSE / var(y) = 1 - NMSE,
where var(y) is the empirical variance of the y_i in
the sample.
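The squared-error family can be sketched in Python (the empirical variance is taken with a 1/n factor so that R² = 1 − NMSE; function names are illustrative):

```python
import math

def regression_errors(y, y_hat):
    """SSE, MSE, RMSE, NMSE and R-squared for paired samples."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    mse = sse / n
    y_bar = sum(y) / n
    var_y = sum((yi - y_bar) ** 2 for yi in y) / n  # empirical (1/n) variance
    return {"SSE": sse, "MSE": mse, "RMSE": math.sqrt(mse),
            "NMSE": mse / var_y, "R2": 1.0 - mse / var_y}
```

Predicting the mean everywhere gives NMSE = 1 and R² = 0; a perfect fit gives RMSE = 0 and R² = 1.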
Absolute error
MAD Mean Absolute Deviation: (1/n) Σ_i |e_i|
MAPE Mean Absolute Percentage Error: (100/n) Σ_i |e_i / y_i|
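A Python sketch of the absolute-error measures (function names are illustrative):

```python
def mad(y, y_hat):
    """Mean Absolute Deviation: (1/n) sum of |y_i - yhat_i|."""
    return sum(abs(yi - fi) for yi, fi in zip(y, y_hat)) / len(y)

def mape(y, y_hat):
    """Mean Absolute Percentage Error (undefined when some y_i = 0)."""
    return 100.0 / len(y) * sum(abs((yi - fi) / yi) for yi, fi in zip(y, y_hat))
```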
Predicted error
PRESS Predicted REsidual Sum of
Squares: for a linear model, Σ_i (e_i / (1 - h_ii))²,
where h_ii are the diagonal entries of the hat matrix
H = X (XᵀX)⁻¹ Xᵀ, X is the matrix built by stacking
the x_i in rows, and ŷ = H y with y the vector of outputs.
GCV Generalised Cross Validation: n SSE / (n - tr(H))²,
with X, H and y as above.
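PRESS is the sum of squared leave-one-out prediction errors, so it can be checked without forming the hat matrix by simply refitting with each point held out. A pure-Python sketch for a simple least-squares line (the 1-D model and function name are assumptions for illustration):

```python
def press_1d(x, y):
    """PRESS for a least-squares line y = a*x + b, computed by
    explicitly refitting with each point left out."""
    n = len(x)
    press = 0.0
    for i in range(n):
        xs = [x[j] for j in range(n) if j != i]
        ys = [y[j] for j in range(n) if j != i]
        m = len(xs)
        xb, yb = sum(xs) / m, sum(ys) / m
        a = (sum((xj - xb) * (yj - yb) for xj, yj in zip(xs, ys))
             / sum((xj - xb) ** 2 for xj in xs))
        b = yb - a * xb
        press += (y[i] - (a * x[i] + b)) ** 2  # squared leave-one-out residual
    return press
```

On perfectly linear data the leave-one-out residuals vanish and PRESS is (numerically) zero.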
Information criteria
AIC Akaike Information Criterion: n log(MSE) + 2p,
where p is the number of parameters in the model.
BIC Bayesian Information Criterion: n log(MSE) + p log(n),
where p is the number of parameters in the model.
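A sketch of AIC and BIC under Gaussian errors, with additive constants dropped (an assumption of this sketch; n is the sample size, mse the training MSE, p the number of parameters):

```python
import math

def aic(n, mse, p):
    """Akaike Information Criterion: n*log(MSE) + 2p."""
    return n * math.log(mse) + 2 * p

def bic(n, mse, p):
    """Bayesian Information Criterion: n*log(MSE) + p*log(n)."""
    return n * math.log(mse) + p * math.log(n)
```

For a fixed fit, BIC penalizes extra parameters more heavily than AIC as soon as log(n) > 2, i.e. for n ≥ 8.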
Robust error measures
Median squared error: median_i(e_i²)
α-trimmed MSE: mean of the set of squared residuals
in which the α percent largest values are discarded.
M-estimators: Σ_i ρ(e_i),
where ρ is a non-negative function
with a minimum at 0, like the
parabola, the Huber function, or the
bisquare function.
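Two of the robust measures sketched in Python (function names and the trimming fraction are illustrative; here alpha is a fraction rather than a percentage):

```python
def trimmed_mse(y, y_hat, alpha=0.1):
    """alpha-trimmed MSE: discard the alpha fraction of largest
    squared residuals, then average the rest."""
    sq = sorted((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    keep = len(sq) - int(alpha * len(sq))
    return sum(sq[:keep]) / keep

def huber(e, delta=1.0):
    """Huber rho: quadratic for |e| <= delta, linear beyond."""
    a = abs(e)
    return 0.5 * e * e if a <= delta else delta * (a - 0.5 * delta)
```

A single wild residual (an outlier) dominates the plain MSE but is simply dropped by the trimmed version.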
Graphical tool
Plot of predicted value against actual
value. A perfect model places all dots
on the diagonal.
Resampling methods
LOO – Leave-one-out: build the model
on n - 1 data elements and test on
the remaining one. Iterate n times to
collect all test errors and compute their mean.
X-Val – Cross validation. Randomly
split the data in two parts, use the
first one to build the model and the
second one to test it. Iterate to get a
distribution of the test error of the
model.
K-Fold – Cut the data into K parts.
Build the model on K - 1 parts
and test on the remaining one. Iterate,
holding out each of the K parts in turn,
to get a distribution of the test
error of the model.
Bootstrap – Draw a random subsample
of the data with replacement and build
the model on it. Compute the difference
between its error on the whole dataset
and its training error; iterate to get a
distribution of such values. The mean of
that distribution is the optimism. The
bootstrap error estimate is the training
error on the whole dataset plus the
optimism.
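The splitting step shared by these resampling schemes can be sketched in Python; the model fitting and testing are left abstract, and the function name is illustrative:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k shuffled, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]
```

Building the model on all folds but one and testing on the held-out fold, for each of the k folds in turn, yields the k test-error values; k = n recovers leave-one-out.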
