Performance Metrics
Learning Objectives
✦Discuss the need for performance metrics
✦List and analyse the key methods of performance metrics
Need for Performance Metrics
Non-Technical Domain
Need for Performance Metrics
Technical Domain
Need for Performance Metrics
✦How do you rank machine learning algorithms?
✦How can you pick one algorithm over another?
✦How do you measure and compare these algorithms?
Need for Performance Metrics
✦Performance metrics answer these questions.
✦They help measure and compare algorithms.
“Numbers have an important story to tell. They rely on you to give them a voice.”
- Stephen Few
Performance Metrics
Assess Machine Learning Algorithms
Machine learning models are evaluated against the performance metrics you select
These metrics help evaluate the efficiency and accuracy of machine learning models
Key Methods of Performance Metrics
Confusion Matrix
Accuracy
Precision
Recall
Specificity
F1 Score
Meaning of Confusion Matrix
                          Actual
                  Positives (1)   Negatives (0)
Predicted
  Positives (1)        TP              FP
  Negatives (0)        FN              TN
One of the most intuitive and simplest ways to check the correctness of a model's predictions
Not a performance measure in itself
Almost all performance metrics are built on the confusion matrix
Confusion Matrix : Example
Cancer Prediction System
There are different approaches that can help the center predict cancer.
One of the simplest tools for evaluating whether a model correctly predicts that a person has cancer is the confusion matrix.
Confusion Matrix : Classification Problem
How do you predict if a person has cancer?
Assign a label / class to the target variable:
1: when a person is diagnosed with cancer
0: when a person does not have cancer
Confusion Matrix : Classification Problem

                          Actual
                  Positives (1)   Negatives (0)
Predicted
  Positives (1)        TP              FP
  Negatives (0)        FN              TN
Sets of classes are given in both dimensions
Terms of Confusion Matrix
True Positive (TP)
True Negative (TN)
False Negative (FN)
False Positive (FP)

                          Actual
                  Positives (1)   Negatives (0)
Predicted
  Positives (1)        TP              FP
  Negatives (0)        FN              TN
True Positive
True Positives are the cases where the actual
class of the data point is 1 (true) and the
predicted value is also 1 (true).
The case where a person has cancer and the
model classifies the case as cancer positive
comes under true positive.
True Negative
True Negatives are the cases when the actual
class of the data point is 0 (false) and the
predicted value is also 0 (false). It is negative
because the class predicted was negative.
The case where a person does not have
cancer and the model classifies the case as
cancer negative comes under true negative.
False Positive
False positives are the cases when the actual
class of the data point is 0 (false) and the
predicted value is 1 (true). It is false because the
model has predicted incorrectly.
The case where a person does not have
cancer and the model classifies the case as
cancer positive comes under false positive.
False Negative
• False negatives are the cases when the
actual class of the data point is 1 (true) and
the predicted value is 0 (false).
• It is false because the model has predicted
incorrectly.
• It is negative because the class predicted
was negative.
The case where a person has cancer and the
model classifies the case as cancer negative
comes under false negatives.
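
The four terms can be counted directly from paired lists of actual and predicted labels. The sketch below is illustrative only (it is not part of the slides); the `actual` and `predicted` lists are made-up data where 1 means cancer and 0 means no cancer.

```python
# Minimal sketch: count TP, TN, FP, FN from 0/1 labels.
# `actual` and `predicted` are hypothetical example data, not from the slides.
actual    = [1, 0, 1, 1, 0, 0, 0, 1]   # 1 = has cancer, 0 = no cancer
predicted = [1, 0, 0, 1, 0, 1, 0, 1]   # model's output for each person

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))

print(f"TP={tp}  FP={fp}")
print(f"FN={fn}  TN={tn}")
```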
Minimize False Cases
What should be minimised?
✦A model is often judged by its accuracy
✦There are no fixed rules for deciding which false cases to minimise
✦It depends on the business requirements and the context of the problem
Minimize False Negative : Example
Out of 100 people, the actual cancer patients = 5
Bad model: predicts everyone as non-cancerous
Accuracy = 95%
Classifying a person who does not have cancer as cancerous is far less serious than missing a cancer patient, which would be a huge mistake.
Minimize False Positive : Example
The model needs to classify an email as spam or ham (the term used for a genuine email).
Assign a label / class to the target variable:
1: email is spam
0: email is not spam
Minimize False Positive : Example
Incoming mail → Model → Classifies (spam / not spam)
In the case of a false positive, an important email is marked as spam.
! The business stands a chance of missing an important communication.
An important email marked as spam is more business critical than a spam email diverted to the inbox.
Accuracy
In classification problems, accuracy is the proportion of correct predictions out of all the predictions made.
Accuracy : Calculation
                          Actual
                  Positives (1)   Negatives (0)
Predicted
  Positives (1)        TP              FP
  Negatives (0)        FN              TN

Accuracy = (TP + TN) / (TP + FP + FN + TN)
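
As a minimal sketch (assuming the four counts have already been read off the confusion matrix), the formula translates directly into code; the counts below are hypothetical, not from the slides.

```python
# Minimal sketch: accuracy from confusion-matrix counts.
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)

# Hypothetical counts: 40 TP, 45 TN, 10 FP, 5 FN.
print(accuracy(tp=40, tn=45, fp=10, fn=5))   # 0.85
```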
Accuracy : Example
When do we use accuracy?
When the target variable classes in the data are nearly balanced
Accuracy : Example
The machine learning model will
have approximately 97%
accuracy in any new predictions.
Accuracy : Example
When do you NOT use accuracy?
When the target variable classes in the data are overwhelmingly of one class
5 out of 100 people have cancer
A bad model predicts every case as non-cancerous
It classifies the 95 non-cancerous patients correctly and the 5 cancerous patients as non-cancerous
Accuracy of the model is 95%
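
Plugging the slide's numbers into the same formula shows why accuracy misleads here: a model that predicts everyone as non-cancerous has TP = 0, FP = 0, TN = 95, FN = 5.

```python
# Cancer screening example from the slides: 100 people, 5 with cancer.
# A "bad" model predicts everyone as non-cancerous.
tp, fp, tn, fn = 0, 0, 95, 5

accuracy = (tp + tn) / (tp + fp + fn + tn)
print(f"Accuracy = {accuracy:.0%}")   # 95%, even though every cancer patient is missed
```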
Precision
• In general, refers to the closeness of two or more measurements to each other
• In classification, it is the proportion of positive identifications that are actually correct
Precision : Calculation
                          Actual
                  Positives (1)   Negatives (0)
Predicted
  Positives (1)        TP              FP
  Negatives (0)        FN              TN

Precision = TP / (TP + FP)
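
A corresponding sketch, continuing the earlier made-up counts (40 TP, 10 FP), which are illustrative only:

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp)

# Hypothetical counts: 40 TP, 10 FP.
print(precision(tp=40, fp=10))   # 0.8
```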
Precision : Example
When do we use precision?
5 out of 100 people have cancer
It's a bad model and predicts every case as cancer
Everyone has been predicted as having cancer, so the precision of the model is only 5%
Recall or Sensitivity
Recall, or sensitivity, measures the proportion of actual positives that are correctly identified.
Recall or Sensitivity : Calculation
                          Actual
                  Positives (1)   Negatives (0)
Predicted
  Positives (1)        TP              FP
  Negatives (0)        FN              TN

Recall = TP / (TP + FN)
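
The same kind of sketch, again with the earlier hypothetical counts (40 TP, 5 FN):

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the model correctly identifies."""
    return tp / (tp + fn)

# Hypothetical counts: 40 TP, 5 FN.
print(recall(tp=40, fn=5))   # ~0.89
```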
Recall or Sensitivity : Example
When do we use recall?
5 out of 100 people have cancer
The model predicts every case as cancer
Recall is 100%, but the precision of the model is only 5%
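
Putting the two together for this scenario (5 real cancer cases out of 100, everyone predicted positive, so TP = 5, FP = 95, FN = 0), a quick check reproduces the numbers on the slide:

```python
# Cancer screening example: 5 real cases out of 100, everyone predicted positive.
tp, fp, fn = 5, 95, 0

recall = tp / (tp + fn)        # 5 / 5   = 1.00
precision = tp / (tp + fp)     # 5 / 100 = 0.05
print(f"Recall = {recall:.0%}, Precision = {precision:.0%}")   # 100%, 5%
```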
Recall as a Measure
When do we use precision and when do we use recall?
Precision is about being precise, whereas recall is about capturing all the cases.
If the model captures only one cancer-positive case and that case is correct, it is 100% precise.
If the model flags every case as cancer positive, you have 100% recall.
To focus on minimising false negatives, you would want 100% recall with a good precision score.
To focus on minimising false positives, you should aim for 100% precision.
Specificity
• Measures the proportion of actual negatives that are correctly identified
• Estimates the probability of a negative prediction when the model is given a negative example
Specificity : Calculation
                          Actual
                  Positives (1)   Negatives (0)
Predicted
  Positives (1)        TP              FP
  Negatives (0)        FN              TN

Specificity = TN / (TN + FP)
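
Again as a sketch, with the same hypothetical counts as before (45 TN, 10 FP):

```python
def specificity(tn: int, fp: int) -> float:
    """Fraction of actual negatives that are correctly identified."""
    return tn / (tn + fp)

# Hypothetical counts: 45 TN, 10 FP.
print(specificity(tn=45, fp=10))   # ~0.82
```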
Specificity : Example
5 out of 100 people have cancer
The model predicts every case as cancer
Specificity is 0%
In this sense, specificity is the exact opposite of recall: it looks at the negatives instead of the positives
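
For the same predict-everyone-as-cancer model, the negative-class counts are TN = 0 and FP = 95, and a quick check using the slide's numbers confirms the 0% figure:

```python
# Same predict-everyone-as-cancer model: no true negatives remain.
tn, fp = 0, 95

specificity = tn / (tn + fp)   # 0 / 95 = 0.0
print(f"Specificity = {specificity:.0%}")   # 0%
```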
F1 Score
Do you have to carry both precision and recall in your pockets every time you build a model to solve a classification problem?
No. To avoid juggling both precision and recall, it is best to have a single score, the F1 score, that represents both precision (P) and recall (R).
F1 Score : Calculation
                      Actual
                  Fraud    Not Fraud
Predicted
  Fraud             3          97
  Not Fraud         0           0

F1 Score = (2 * Precision * Recall) / (Precision + Recall)
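
A small helper, sketched here for illustration, computes the same harmonic-mean formula from any precision and recall values; the 0.80 / 0.89 figures below come from the earlier made-up counts, not from the slides.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# With the earlier hypothetical precision (0.80) and recall (~0.89):
print(f1_score(precision=0.80, recall=0.89))   # ~0.84
```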
F1 Score : Example
When do you use the F1 score? For example, in fraud detection.
97 out of 100 credit card transactions are legitimate and 3 are fraudulent
The model predicts everything as fraud
F1 Score : Example
Precision = 3 / 100 = 3%
Recall = 3 / 3 = 100%
Arithmetic Mean = (3% + 100%) / 2 = 51.5%
Harmonic Mean
• The harmonic mean equals the arithmetic mean when x and y are equal
• When x and y differ, the harmonic mean is smaller and is pulled toward the lower value
With reference to the fraud detection example, the F1 score can be calculated as:
F1 Score = (2 * Precision * Recall) / (Precision + Recall) = (2 * 3% * 100%) / (3% + 100%) ≈ 5.8%
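
As a quick check of the fraud example (precision 3%, recall 100%), the snippet below contrasts the arithmetic mean with the harmonic mean; note that the exact F1 value is about 5.8%, which the slide rounds to roughly 5%.

```python
# Fraud example from the slides: precision 3%, recall 100%.
precision, recall = 0.03, 1.00

arithmetic_mean = (precision + recall) / 2
f1 = 2 * precision * recall / (precision + recall)

print(f"Arithmetic mean = {arithmetic_mean:.1%}")   # 51.5%
print(f"F1 (harmonic)   = {f1:.1%}")                # 5.8%
```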
Key Takeaways
✦The confusion matrix is used to find the correctness and accuracy of machine learning models. It is used for classification problems where the output can be one of two or more types of classes.
✦Accuracy is the proportion of correct predictions made by the model out of all predictions.
✦Precision refers to the closeness of two or more measurements to each other; in classification, it is the proportion of positive identifications that are actually correct.
✦Recall measures the proportion of actual positives that are identified correctly.
✦Specificity measures the proportion of actual negatives that are identified correctly.
✦The F1 score gives a single score that represents both precision (P) and recall (R).
✦The harmonic mean is used when the sample data contains extreme values because it is more balanced than the arithmetic mean.
Editor's Notes

  • #6: So many algorithms around. How do you decide which is best?
  • #12: Cancer research
  • #20: No model is 100% accurate, so to get closer to accuracy we have to minimise the errors in the false cases.