Supervised Learning:
Classification-III
Bayes Approach
Bayesian Classification
Reverend Thomas Bayes (1702-1761), English mathematician
Bayesian Classification: Why?
• Probabilistic learning:
  o Calculate explicit probabilities for a hypothesis
  o Among the most practical approaches to certain types of learning problems
• Incremental:
  o Each training example can incrementally increase/decrease the probability that a hypothesis is correct
  o Prior knowledge can be combined with observed data
• Probabilistic prediction:
  o Predict multiple hypotheses, weighted by their probabilities
• Standard:
  o Even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured
Bayesian classification
• The classification problem may be formalized using a posteriori probabilities:
  P(C|X) = probability that the sample tuple X = <x1, …, xk> belongs to class C
• For example: P(class=N | outlook=sunny, windy=true, …)
• Idea: assign to sample X the class label C such that P(C|X) is maximal
Estimating a-posteriori probabilities
• Bayes theorem: P(C|X) = P(X|C) · P(C) / P(X)
• P(X) is constant for all classes
• P(C) = relative frequency of class C samples
• The C that maximizes P(C|X) is the C that maximizes P(X|C) · P(C)
• Problem: computing P(X|C) directly is infeasible!
P(X|C): Naïve Bayesian classification
• Naïve assumption: attribute independence
  P(x1, …, xk | C) = P(x1|C) · … · P(xk|C)
• If the i-th attribute is categorical: P(xi|C) is estimated as the relative frequency of samples having value xi for the i-th attribute within class C
• If the i-th attribute is continuous: P(xi|C) is estimated through a Gaussian density function
• Computationally easy in both cases
Gaussian density for a continuous attribute:
  p(x) = 1 / (√(2π) · σ) · e^( −(x − μ)² / (2σ²) ),  where μ = E[X] and σ is the standard deviation of the attribute within the class
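To make the two estimation rules concrete, here is a minimal Python sketch (my own illustration, not from the slides; the sample data and function names are hypothetical):

```python
import math

def categorical_estimate(class_values, target):
    # Relative frequency of `target` among the attribute values observed in one class
    return class_values.count(target) / len(class_values)

def gaussian_estimate(class_values, x):
    # Gaussian density with mean and variance estimated from the class samples
    mu = sum(class_values) / len(class_values)
    var = sum((v - mu) ** 2 for v in class_values) / (len(class_values) - 1)  # sample variance
    return math.exp(-((x - mu) ** 2) / (2 * var)) / (math.sqrt(2 * math.pi) * math.sqrt(var))

# Hypothetical class-conditional samples for a single class C
outlook_in_C = ["sunny", "rain", "sunny", "overcast"]   # categorical attribute
temperature_in_C = [64, 69, 72, 75]                      # continuous attribute

print(categorical_estimate(outlook_in_C, "sunny"))   # P(outlook=sunny | C) = 0.5
print(gaussian_estimate(temperature_in_C, 66))       # density of temperature=66 under C's Gaussian
```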
Example

Owns House | Married | Gender | Employed | Credit History | Risk Class
Yes        | Yes     | M      | Yes      | A              | B
No         | No      | F      | Yes      | A              | A
Yes        | Yes     | F      | Yes      | B              | C
Yes        | No      | M      | No       | B              | B
No         | Yes     | F      | Yes      | B              | C
No         | No      | F      | Yes      | B              | A
No         | No      | M      | No       | B              | B
Yes        | No      | F      | Yes      | A              | A
No         | Yes     | F      | Yes      | A              | C
Yes        | Yes     | F      | Yes      | A              | C

• Find the class for X = (Owns House=Yes, Married=No, Gender=F, Employed=Yes, Credit History=A)
• Total samples: 10; Class A: 3, Class B: 3, Class C: 4
• Prior probabilities: P(A) = 0.3, P(B) = 0.3, P(C) = 0.4
• For each class Ci we need: P(Owns House=yes | Ci), P(Married=no | Ci), P(Gender=F | Ci), P(Employed=yes | Ci), P(Credit History=A | Ci)
Example (cont.)
Class-conditional probabilities for class A:
P(Owns House=yes | A) = 1/3
P(Married=no | A) = 3/3 = 1
P(Gender=F | A) = 3/3 = 1
P(Employed=yes | A) = 3/3 = 1
P(Credit History=A | A) = 2/3

Score for class A (proportional to the posterior, since P(X) is the same for every class):
P(A | X) ∝ P(X | A) · P(A)
        = P(Owns House=yes | A) · P(Married=no | A) · P(Gender=F | A) · P(Employed=yes | A) · P(Credit History=A | A) · P(A)
        = 1/3 · 1 · 1 · 1 · 2/3 · 0.3 ≈ 0.0667

Repeating the same computation for classes B and C:
P(Class A | X) ∝ 0.0667
P(Class B | X) ∝ 0 (no class-B sample has Gender=F)
P(Class C | X) ∝ 0 (no class-C sample has Married=No)
Assign X to Class A
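The whole calculation can be reproduced with a short Python sketch; the training table is hard-coded from the slide and the variable names are my own:

```python
# Credit-risk training data from the slide:
# (owns_house, married, gender, employed, credit_history, risk_class)
data = [
    ("Yes", "Yes", "M", "Yes", "A", "B"),
    ("No",  "No",  "F", "Yes", "A", "A"),
    ("Yes", "Yes", "F", "Yes", "B", "C"),
    ("Yes", "No",  "M", "No",  "B", "B"),
    ("No",  "Yes", "F", "Yes", "B", "C"),
    ("No",  "No",  "F", "Yes", "B", "A"),
    ("No",  "No",  "M", "No",  "B", "B"),
    ("Yes", "No",  "F", "Yes", "A", "A"),
    ("No",  "Yes", "F", "Yes", "A", "C"),
    ("Yes", "Yes", "F", "Yes", "A", "C"),
]

query = ("Yes", "No", "F", "Yes", "A")  # (owns_house, married, gender, employed, credit_history)

classes = sorted({row[-1] for row in data})
scores = {}
for c in classes:
    rows = [row for row in data if row[-1] == c]
    prior = len(rows) / len(data)
    score = prior
    for i, value in enumerate(query):
        # naive assumption: P(x_i | C) estimated as a relative frequency within the class
        score *= sum(1 for row in rows if row[i] == value) / len(rows)
    scores[c] = score

print(scores)                       # expected: A ≈ 0.0667, B = 0, C = 0
print(max(scores, key=scores.get))  # assign to class A
```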
Naïve Bayesian classification – Example

Training data:

Outlook  | Temperature | Humidity | Windy | Class
sunny    | hot         | high     | false | N
sunny    | hot         | high     | true  | N
overcast | hot         | high     | false | P
rain     | mild        | high     | false | P
rain     | cool        | normal   | false | P
rain     | cool        | normal   | true  | N
overcast | cool        | normal   | true  | P
sunny    | mild        | high     | false | N
sunny    | cool        | normal   | false | P
rain     | mild        | normal   | false | P
sunny    | mild        | normal   | true  | P
overcast | mild        | high     | true  | P
overcast | hot         | normal   | false | P
rain     | mild        | high     | true  | N

• Estimating P(xi|C):
P(p) = 9/14 = 0.643,  P(n) = 5/14 = 0.357

Outlook:     P(sunny|p) = 2/9,  P(overcast|p) = 4/9, P(rain|p) = 3/9;  P(sunny|n) = 3/5, P(overcast|n) = 0, P(rain|n) = 2/5
Temperature: P(hot|p) = 2/9,    P(mild|p) = 4/9,     P(cool|p) = 3/9;  P(hot|n) = 2/5,   P(mild|n) = 2/5,   P(cool|n) = 1/5
Humidity:    P(high|p) = 3/9,   P(normal|p) = 6/9;                     P(high|n) = 4/5,  P(normal|n) = 1/5
Windy:       P(true|p) = 3/9,   P(false|p) = 6/9;                      P(true|n) = 3/5,  P(false|n) = 2/5
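These frequency tables can be generated mechanically from the data; a small sketch (my own addition, with the table above hard-coded):

```python
from collections import Counter

# Weather training data from the slide: (outlook, temperature, humidity, windy, class)
data = [
    ("sunny", "hot", "high", "false", "N"), ("sunny", "hot", "high", "true", "N"),
    ("overcast", "hot", "high", "false", "P"), ("rain", "mild", "high", "false", "P"),
    ("rain", "cool", "normal", "false", "P"), ("rain", "cool", "normal", "true", "N"),
    ("overcast", "cool", "normal", "true", "P"), ("sunny", "mild", "high", "false", "N"),
    ("sunny", "cool", "normal", "false", "P"), ("rain", "mild", "normal", "false", "P"),
    ("sunny", "mild", "normal", "true", "P"), ("overcast", "mild", "high", "true", "P"),
    ("overcast", "hot", "normal", "false", "P"), ("rain", "mild", "high", "true", "N"),
]
attributes = ["outlook", "temperature", "humidity", "windy"]

for c in ("P", "N"):
    rows = [r for r in data if r[-1] == c]
    print(f"P({c}) = {len(rows)}/{len(data)}")
    for i, name in enumerate(attributes):
        counts = Counter(r[i] for r in rows)
        table = {value: f"{n}/{len(rows)}" for value, n in counts.items()}
        print(f"  {name}: {table}")   # e.g. outlook for P -> sunny 2/9, overcast 4/9, rain 3/9
```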
Naïve Bayesian classification – Example
• Classifying X: X = <rain, hot, high, false>
P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p)
= 3/9·2/9·3/9·6/9·9/14 = 0.010582
P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n)
= 2/5·2/5·4/5·2/5·5/14 = 0.018286
Sample X is classified in class n (don’t play)
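The two products can be checked numerically with a minimal sketch (the values are hard-coded from the tables above):

```python
# Class-conditional probabilities for X = <rain, hot, high, false>, read off the tables above
p_x_given_p = (3/9) * (2/9) * (3/9) * (6/9)   # P(rain|p)·P(hot|p)·P(high|p)·P(false|p)
p_x_given_n = (2/5) * (2/5) * (4/5) * (2/5)   # P(rain|n)·P(hot|n)·P(high|n)·P(false|n)

score_p = p_x_given_p * (9/14)                # multiply by the prior P(p)
score_n = p_x_given_n * (5/14)                # multiply by the prior P(n)

print(round(score_p, 6), round(score_n, 6))   # 0.010582 and 0.018286
print("n" if score_n > score_p else "p")      # X is classified as n (don't play)
```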
Bayes Approach: Continuous Variables
• Find the mean value of each input variable within each class
• Find the standard deviation of each input variable within each class
• Apply the probability density function to each input variable
• Find the likelihood of each output class
• Find the probability of each class
• Assign the input to the class with the highest probability
Example 2 with continuous variables
Find the standard deviation and mean values of the numeric attributes (Temperature and Humidity) for each class.
Example 2 (Cont.)
Predict the following instance:

Outlook | Temperature | Humidity | Windy | Play
Sunny   | 66          | 90       | True  | ?
Example 2 (Cont.)
Find the probability density of each numeric attribute value under each class, using the per-class means and standard deviations (Temperature: yes μ = 73, σ = 6.2; no μ = 74.6, σ = 7.9. Humidity: yes μ = 79.1, σ = 10.2; no μ = 86.2, σ = 9.7):

p(temperature = 66 | yes) = 1/(√(2π)·6.2)  · e^(−(66 − 73)²   / (2·6.2²))  = 0.0340
p(temperature = 66 | no)  = 1/(√(2π)·7.9)  · e^(−(66 − 74.6)² / (2·7.9²))  = 0.0291
p(humidity = 90 | yes)    = 1/(√(2π)·10.2) · e^(−(90 − 79.1)² / (2·10.2²)) = 0.0221
p(humidity = 90 | no)     = 1/(√(2π)·9.7)  · e^(−(90 − 86.2)² / (2·9.7²))  = 0.0380
Example 2 (Cont.)
Outlook and Windy are categorical attributes, so no probability density is needed; their class-conditional probabilities are estimated as relative frequencies, as before.
Example
Instance to predict:

Outlook | Temperature | Humidity | Windy | Play
Sunny   | 66          | 90       | True  | ?

Likelihood of yes = 2/9 · 0.0340 · 0.0221 · 3/9 · 9/14 = 0.000036
Likelihood of no  = 3/5 · 0.0291 · 0.0380 · 3/5 · 5/14 = 0.000136
Probability of yes = 0.000036 / (0.000036 + 0.000136) = 20.9%
Probability of no  = 0.000136 / (0.000036 + 0.000136) = 79.1%

Prediction: Play = NO
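The same mixed categorical/continuous computation can be sketched in Python; the priors, frequencies, means and standard deviations are hard-coded from the expressions above, and the structure is my own:

```python
import math

def gaussian(x, mu, sigma):
    # Gaussian density used for the continuous attributes
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Parameters read off the density expressions and frequency tables above
params = {
    "yes": {"prior": 9/14, "outlook_sunny": 2/9, "windy_true": 3/9,
            "temp": (73.0, 6.2), "humidity": (79.1, 10.2)},
    "no":  {"prior": 5/14, "outlook_sunny": 3/5, "windy_true": 3/5,
            "temp": (74.6, 7.9), "humidity": (86.2, 9.7)},
}

likelihood = {}
for c, p in params.items():
    likelihood[c] = (p["outlook_sunny"]
                     * gaussian(66, *p["temp"])
                     * gaussian(90, *p["humidity"])
                     * p["windy_true"]
                     * p["prior"])

total = sum(likelihood.values())
for c, value in likelihood.items():
    # close to the slide's 0.000036 (≈21%) and 0.000136 (≈79%); small differences are rounding
    print(c, round(value, 6), f"{100 * value / total:.1f}%")
print("Play =", max(likelihood, key=likelihood.get))   # Play = no
```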
Naïve Bayesian classification – the independence hypothesis
• … makes computation possible
• … yields optimal classifiers when satisfied
• … but is seldom satisfied in practice, as attributes (variables) are often correlated.
• Attempts to overcome this limitation:
  o Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
  o Decision trees, which reason on one attribute at a time, considering the most important attributes first
Model Evaluation
• Metrics for performance evaluation: how to evaluate the performance of a model?
• Methods for performance evaluation: how to obtain reliable estimates?
• Methods for model comparison: how to compare the relative performance among competing models?
Metrics for Performance Evaluation
• Focus on the predictive capability of a model, rather than on how fast it classifies or builds models, scalability, etc.
• Confusion matrix:

                    PREDICTED CLASS
ACTUAL CLASS        Class=Yes   Class=No
Class=Yes           a           b
Class=No            c           d

a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)
Metrics for Performance Evaluation…
• Most widely-used metric:

                    PREDICTED CLASS
ACTUAL CLASS        Class=Yes   Class=No
Class=Yes           a (TP)      b (FN)
Class=No            c (FP)      d (TN)

Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
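A minimal sketch of the accuracy computation, using the class-imbalance counts from the "Limitation of Accuracy" example below as a quick check (the function name is my own):

```python
def accuracy(tp, fn, fp, tn):
    """Fraction of correctly classified examples."""
    return (tp + tn) / (tp + fn + fp + tn)

# Predicting everything as the majority class in the 9990-vs-10 example below
print(accuracy(tp=0, fn=10, fp=0, tn=9990))   # 0.999
```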
Limitation of Accuracy
• Consider a 2-class problem: number of Class 0 examples = 9990, number of Class 1 examples = 10
• If the model predicts everything to be Class 0, accuracy is 9990/10000 = 99.9%
• Accuracy is misleading because the model does not detect any Class 1 example
Cost Matrix

                    PREDICTED CLASS
ACTUAL CLASS        Class=Yes    Class=No
Class=Yes           C(Yes|Yes)   C(No|Yes)
Class=No            C(Yes|No)    C(No|No)

C(i|j): cost of misclassifying a class j example as class i
Computing Cost of Classification

Cost matrix C(i|j):
ACTUAL \ PREDICTED     +      -
+                     -1    100
-                      1      0

Model M1:
ACTUAL \ PREDICTED     +      -
+                    150     40
-                     60    250
Accuracy = 80%, Cost = 3910 (= -150 + 4000 + 60 + 0)

Model M2:
ACTUAL \ PREDICTED     +      -
+                    250     45
-                      5    200
Accuracy = 90%, Cost = 4255
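The two costs can be reproduced by summing count × cost over the cells; a short sketch (dictionary layout and names are my own):

```python
def total_cost(confusion, cost):
    # confusion[actual][predicted] = count, cost[actual][predicted] = C(predicted|actual)
    return sum(confusion[a][p] * cost[a][p] for a in confusion for p in confusion[a])

cost = {"+": {"+": -1, "-": 100}, "-": {"+": 1, "-": 0}}
m1 = {"+": {"+": 150, "-": 40}, "-": {"+": 60, "-": 250}}
m2 = {"+": {"+": 250, "-": 45}, "-": {"+": 5, "-": 200}}

print(total_cost(m1, cost), total_cost(m2, cost))   # 3910 and 4255
```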
Cost-Sensitive Measures

Precision (p) = a / (a + c)
Recall (r)    = a / (a + b)
F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)

• Precision is biased towards C(Yes|Yes) & C(Yes|No)
• Recall is biased towards C(Yes|Yes) & C(No|Yes)
• F-measure is biased towards all except C(No|No)

Weighted Accuracy = (w1·a + w4·d) / (w1·a + w2·b + w3·c + w4·d)
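A minimal sketch of these formulas, checked against the first of the example confusion matrices below (a = TP, b = FN, c = FP, d = TN; the function names are my own):

```python
def precision(tp, fp):
    return tp / (tp + fp)           # a / (a + c)

def recall(tp, fn):
    return tp / (tp + fn)           # a / (a + b)

def f_measure(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * r * p / (r + p)      # equivalently 2a / (2a + b + c)

# First example matrix below: TP=63, FN=28, FP=37, TN=72
print(precision(tp=63, fp=37), recall(tp=63, fn=28), f_measure(tp=63, fp=37, fn=28))
# 0.63, ≈0.692, ≈0.660
```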
Examples (confusion matrices, with row totals in parentheses; column totals are 100 + 100 = 200 in each case):

(a) TP = 63, FN = 28 (91);  FP = 37, TN = 72 (109)
(b) TP = 77, FN = 77 (154); FP = 23, TN = 23 (46)
(c) TP = 24, FN = 88 (112); FP = 76, TN = 12 (88)
(d) TP = 88, FN = 24 (112); FP = 12, TN = 76 (88)
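To see how the different metrics behave on these four matrices, a short self-contained loop (my own addition):

```python
# The four confusion matrices above as (TP, FN, FP, TN)
examples = [(63, 28, 37, 72), (77, 77, 23, 23), (24, 88, 76, 12), (88, 24, 12, 76)]

for tp, fn, fp, tn in examples:
    acc = (tp + tn) / (tp + fn + fp + tn)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f = 2 * prec * rec / (prec + rec)
    print(f"TP={tp:3d} FN={fn:3d} FP={fp:3d} TN={tn:3d} | "
          f"acc={acc:.3f} precision={prec:.3f} recall={rec:.3f} F={f:.3f}")
```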
ROC (Receiver Operating Characteristic) Curve
• An ROC curve is a graphical plot of sensitivity vs. (1 − specificity) for a binary classifier system as its discrimination threshold is varied.
• The ROC can also be represented equivalently by plotting the fraction of true positives (TPR = true positive rate) vs. the fraction of false positives (FPR = false positive rate).
• It is a comparison of two operating characteristics (TPR and FPR) as the decision criterion changes.
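One way to obtain the curve is to sweep the threshold over the classifier's scores and record (FPR, TPR) at each step; a minimal sketch with hypothetical scores and labels (not from the slides):

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs as the decision threshold sweeps over the predicted scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(p and y for p, y in zip(predicted, labels))
        fp = sum(p and not y for p, y in zip(predicted, labels))
        points.append((fp / neg, tp / pos))
    return points

# Hypothetical classifier scores and true labels (1 = positive)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
print(roc_points(scores, labels))
```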
Applications
• ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones independently from (and prior to specifying) the cost context or the class distribution.
• ROC analysis is related in a direct and natural way to cost/benefit analysis of diagnostic decision making.
• It has been widely used in medicine, radiology, psychology and other areas for many decades.
• It has been introduced relatively recently in other areas such as machine learning and data mining.
ROC Space
Sensitivity and Specificity

                   Predicted Positive                     Predicted Negative
Actual Positive    True Positive                          False Negative (Type II error)     → Sensitivity
Actual Negative    False Positive (Type I error, p-value) True Negative                      → Specificity
                   ↓ Positive predictive value            ↓ Negative predictive value
Sensitivity and Specificity
• Specificity (also 1 − false positive rate): SPC = TN / (FP + TN) = 1 − FPR
• Sensitivity (aka recall, true positive rate): TPR = TP / P = TP / (TP + FN)
Sensitivity and Specificity

                   Predicted True                          Predicted False
Actual Positive    True Positive = 2                       False Negative = 1 (Type II error)   → Sensitivity = 66.67%
Actual Negative    False Positive = 18 (Type I error)      True Negative = 182                  → Specificity = 91%
                   ↓ Positive predictive value             ↓ Negative predictive value
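The numbers in this table can be checked with a few lines of Python (a sketch; PPV and NPV are computed the same way even though the slide does not report them):

```python
tp, fn, fp, tn = 2, 1, 18, 182   # counts from the table above

sensitivity = tp / (tp + fn)     # 2/3 ≈ 66.67%
specificity = tn / (fp + tn)     # 182/200 = 91%
ppv = tp / (tp + fp)             # positive predictive value
npv = tn / (tn + fn)             # negative predictive value

print(f"sensitivity={sensitivity:.2%} specificity={specificity:.2%} PPV={ppv:.2%} NPV={npv:.2%}")
```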