Confusion Matrix
CONTENTS
BINARY CLASSIFIER
CONFUSION MATRIX
ACCURACY
PRECISION
RECALL
Binary Classifier
 A binary classifier produces output with two class values or labels, such as Yes/No, 1/0, or Positive/Negative, for the given input data
 The dataset used for performance evaluation is called the test dataset
 After classification, the observed (true) labels are compared with the predicted labels to evaluate performance
 If the classifier were perfect, the predicted labels would match the observed labels exactly
 In practice, however, it is uncommon to be able to develop a perfect classifier
Confusion Matrix
 A confusion matrix is formed from the four outcomes
produced as a result of binary classification
 True positive (TP): correct positive prediction
 False positive (FP): incorrect positive prediction
 True negative (TN): correct negative prediction
 False negative (FN): incorrect negative prediction
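As a rough illustration (not part of the original slides), the four counts can be tallied directly from a list of observed and predicted labels; the label values below are made up:

# Tally TP, FP, TN, FN for a binary classifier (illustrative labels, not from the slides).
actual    = ["Yes", "Yes", "No", "No", "Yes", "No"]
predicted = ["Yes", "No",  "No", "Yes", "Yes", "No"]

TP = sum(a == "Yes" and p == "Yes" for a, p in zip(actual, predicted))  # correct positive
FP = sum(a == "No"  and p == "Yes" for a, p in zip(actual, predicted))  # incorrect positive
TN = sum(a == "No"  and p == "No"  for a, p in zip(actual, predicted))  # correct negative
FN = sum(a == "Yes" and p == "No"  for a, p in zip(actual, predicted))  # incorrect negative

print(TP, FP, TN, FN)  # 2 1 2 1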
Confusion Matrix
 Example: a classifier whose two levels (class labels) are Green and Grey
 True Positives: Green examples correctly identified as green
 True Negatives: Grey examples correctly identified as grey
 False Positives: Grey examples falsely identified as green
 False Negatives: Green examples falsely identified as grey
Accuracy
 Accuracy is calculated as the number of correct predictions divided by the total number of examples in the dataset
 The best accuracy (ACC) is 1.0, whereas the worst is 0.0
 Accuracy = (TP + TN) / (TP + FP + FN + TN) = (9+8) / (9+2+1+8) = 0.85
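The slide's arithmetic can be checked in a few lines of Python; reading the counts as TP = 9, FP = 2, FN = 1, TN = 8 is inferred from the formulas on this and the next two slides:

# Accuracy = (TP + TN) / (TP + FP + FN + TN), using the counts implied by the slides.
TP, FP, FN, TN = 9, 2, 1, 8
accuracy = (TP + TN) / (TP + FP + FN + TN)
print(accuracy)  # 0.85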
Precision
 Precision is calculated as the number of correct positive predictions
divided by the total number of positive predictions
 The best precision is 1.0, whereas the worst is 0.0
 Precision = TP / (TP + FP) = 9 / (9+2) = 0.818
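The same check in Python, with the counts assumed above:

# Precision = TP / (TP + FP)
TP, FP = 9, 2
precision = TP / (TP + FP)
print(round(precision, 3))  # 0.818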
Recall
 Sensitivity = Recall = True Positive Rate
 Recall is calculated as the number of correct positive predictions divided by the total number of actual positives
 The best recall is 1.0, whereas the worst is 0.0
 Recall = TP / (TP + FN) = 9 / (9+1) = 0.9
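And the corresponding check for recall:

# Recall (sensitivity, true positive rate) = TP / (TP + FN)
TP, FN = 9, 1
recall = TP / (TP + FN)
print(recall)  # 0.9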
Example 1
 Task: classify whether an image contains a dog or a cat
 The available training data contains 25000 images of dogs and cats, which is split as follows:
 Training split: 75% of the 25000 images (25000*0.75 = 18750)
 Validation split: 25% of the 25000 images (25000*0.25 = 6250)
 Test data: 5 cats and 5 dogs
 Taking one class (say, cat) as the positive class, the test results give TP = 2, FP = 0, FN = 3, TN = 5 (see the sketch below):
 Precision = 2 / (2 + 0) * 100% = 100%
 Recall = 2 / (2 + 3) * 100% = 40%
 Accuracy = (2 + 5) / (2 + 0 + 3 + 5) * 100% = 70%
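A small sketch reproducing Example 1's numbers; treating cat as the positive class is an assumption, since the slide does not say which class is positive:

# Example 1 metrics from the implied counts (positive class assumed to be "cat").
TP, FP, FN, TN = 2, 0, 3, 5
precision = TP / (TP + FP)                    # 1.0  -> 100%
recall    = TP / (TP + FN)                    # 0.4  -> 40%
accuracy  = (TP + TN) / (TP + FP + FN + TN)   # 0.7  -> 70%
print(precision, recall, accuracy)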
Matrix 3x3
 For a multi-class problem the confusion matrix becomes 3x3 (or larger), and TP, FP, TN, and FN are counted per class: the diagonal cell of a class is its TP, the rest of its column is FP, the rest of its row is FN, and all remaining cells are TN (see the sketch below)
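A sketch of how the four counts are read off a 3x3 matrix; the matrix values below are invented purely for illustration (rows are actual classes, columns are predicted classes):

# Per-class TP, FP, FN, TN from a 3x3 confusion matrix (illustrative values).
matrix = [
    [30,  5,  5],   # actual class 0
    [10, 40, 10],   # actual class 1
    [ 5, 10, 35],   # actual class 2
]

def per_class_counts(m, k):
    """Return (TP, FP, FN, TN) for class index k."""
    total = sum(sum(row) for row in m)
    tp = m[k][k]                                   # diagonal cell for class k
    fp = sum(m[i][k] for i in range(len(m))) - tp  # rest of column k
    fn = sum(m[k]) - tp                            # rest of row k
    tn = total - tp - fp - fn                      # everything else
    return tp, fp, fn, tn

print(per_class_counts(matrix, 0))  # (30, 15, 10, 95)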
Example 2
 Overall accuracy = 170/300 ≈ 0.567
 Per-class precision = 0.5, 0.5, 0.667; macro-averaged precision ≈ 0.556
 Per-class recall = 0.3, 0.6, 0.8; macro-averaged recall ≈ 0.567 (see the check below)
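Only the averaging arithmetic is re-checked here, since the full 3x3 matrix behind Example 2 is not reproduced in the text; the per-class values are taken as listed above:

# Macro-averaged precision and recall for Example 2.
precisions = [0.5, 0.5, 0.667]
recalls    = [0.3, 0.6, 0.8]

macro_precision = sum(precisions) / len(precisions)
macro_recall    = sum(recalls) / len(recalls)

print(round(macro_precision, 3))  # 0.556
print(round(macro_recall, 3))     # 0.567
print(round(170 / 300, 3))        # overall accuracy: 0.567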
Thank you.
