Supervised Learning:
Classification-III
Bayes Approach
Bayesian Classification
Reverend Thomas Bayes (1702-1761), English mathematician
Bayesian Classification: Why?
• Probabilistic learning:
  o Calculate explicit probabilities for a hypothesis
  o Among the most practical approaches to certain types of learning problems
• Incremental:
  o Each training example can incrementally increase/decrease the probability that a hypothesis is correct
  o Prior knowledge can be combined with observed data
• Probabilistic prediction:
  o Predict multiple hypotheses, weighted by their probabilities
• Standard:
  o Even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured
Bayesian classification
• The classification problem may be formalized using a posteriori probabilities:
  P(C|X) = probability that the sample tuple X = <x1, …, xk> belongs to class C
• For example: P(class=N | outlook=sunny, windy=true, …)
• Idea: assign to sample X the class label C such that P(C|X) is maximal
Estimating a-posteriori probabilities
• Bayes theorem: P(C|X) = P(X|C) · P(C) / P(X)
• P(X) is constant for all classes
• P(C) = relative frequency of class C samples
• The C that maximizes P(C|X) is the C that maximizes P(X|C) · P(C)
• Problem: computing P(X|C) directly is infeasible!
P(X|C): Naïve Bayesian classification
• Naïve assumption: attribute independence
  P(x1, …, xk | C) = P(x1|C) · … · P(xk|C)
• If the i-th attribute is categorical: P(xi|C) is estimated as the relative frequency of samples having value xi for the i-th attribute within class C
• If the i-th attribute is continuous: P(xi|C) is estimated through a Gaussian density function
• Computationally easy in both cases
Gaussian density for a continuous attribute:
  p(x) = 1 / (√(2π) · σ) · e^( −(x − μ)² / (2σ²) ),  where μ = E[X] and σ is the standard deviation of the attribute within the class
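To make the two estimation rules concrete, here is a minimal Python sketch (my own illustration, not from the slides; the sample data and function names are hypothetical):

```python
import math

def categorical_estimate(class_values, target):
    # Relative frequency of `target` among the attribute values observed in one class
    return class_values.count(target) / len(class_values)

def gaussian_estimate(class_values, x):
    # Gaussian density with mean and variance estimated from the class samples
    mu = sum(class_values) / len(class_values)
    var = sum((v - mu) ** 2 for v in class_values) / (len(class_values) - 1)  # sample variance
    return math.exp(-((x - mu) ** 2) / (2 * var)) / (math.sqrt(2 * math.pi) * math.sqrt(var))

# Hypothetical class-conditional samples for a single class C
outlook_in_C = ["sunny", "rain", "sunny", "overcast"]   # categorical attribute
temperature_in_C = [64, 69, 72, 75]                      # continuous attribute

print(categorical_estimate(outlook_in_C, "sunny"))   # P(outlook=sunny | C) = 0.5
print(gaussian_estimate(temperature_in_C, 66))       # density of temperature=66 under C's Gaussian
```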
Example

Owns House | Married | Gender | Employed | Credit History | Risk Class
Yes        | Yes     | M      | Yes      | A              | B
No         | No      | F      | Yes      | A              | A
Yes        | Yes     | F      | Yes      | B              | C
Yes        | No      | M      | No       | B              | B
No         | Yes     | F      | Yes      | B              | C
No         | No      | F      | Yes      | B              | A
No         | No      | M      | No       | B              | B
Yes        | No      | F      | Yes      | A              | A
No         | Yes     | F      | Yes      | A              | C
Yes        | Yes     | F      | Yes      | A              | C

• Find the class for X = (Owns House=Yes, Married=No, Gender=F, Employed=Yes, Credit History=A)
• Total samples: 10; Class A: 3, Class B: 3, Class C: 4
• Prior probabilities: P(A) = 0.3, P(B) = 0.3, P(C) = 0.4
• For each class Ci we need: P(Owns House=yes | Ci), P(Married=no | Ci), P(Gender=F | Ci), P(Employed=yes | Ci), P(Credit History=A | Ci)
Example (cont.)
Class-conditional probabilities for class A:
P(Owns House=yes | A) = 1/3
P(Married=no | A) = 3/3 = 1
P(Gender=F | A) = 3/3 = 1
P(Employed=yes | A) = 3/3 = 1
P(Credit History=A | A) = 2/3

Score for class A (proportional to the posterior, since P(X) is the same for every class):
P(A | X) ∝ P(X | A) · P(A)
        = P(Owns House=yes | A) · P(Married=no | A) · P(Gender=F | A) · P(Employed=yes | A) · P(Credit History=A | A) · P(A)
        = 1/3 · 1 · 1 · 1 · 2/3 · 0.3 ≈ 0.0667

Repeating the same computation for classes B and C:
P(Class A | X) ∝ 0.0667
P(Class B | X) ∝ 0 (no class-B sample has Gender=F)
P(Class C | X) ∝ 0 (no class-C sample has Married=No)
Assign X to Class A
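The whole calculation can be reproduced with a short Python sketch; the training table is hard-coded from the slide and the variable names are my own:

```python
# Credit-risk training data from the slide:
# (owns_house, married, gender, employed, credit_history, risk_class)
data = [
    ("Yes", "Yes", "M", "Yes", "A", "B"),
    ("No",  "No",  "F", "Yes", "A", "A"),
    ("Yes", "Yes", "F", "Yes", "B", "C"),
    ("Yes", "No",  "M", "No",  "B", "B"),
    ("No",  "Yes", "F", "Yes", "B", "C"),
    ("No",  "No",  "F", "Yes", "B", "A"),
    ("No",  "No",  "M", "No",  "B", "B"),
    ("Yes", "No",  "F", "Yes", "A", "A"),
    ("No",  "Yes", "F", "Yes", "A", "C"),
    ("Yes", "Yes", "F", "Yes", "A", "C"),
]

query = ("Yes", "No", "F", "Yes", "A")  # (owns_house, married, gender, employed, credit_history)

classes = sorted({row[-1] for row in data})
scores = {}
for c in classes:
    rows = [row for row in data if row[-1] == c]
    prior = len(rows) / len(data)
    score = prior
    for i, value in enumerate(query):
        # naive assumption: P(x_i | C) estimated as a relative frequency within the class
        score *= sum(1 for row in rows if row[i] == value) / len(rows)
    scores[c] = score

print(scores)                       # expected: A ≈ 0.0667, B = 0, C = 0
print(max(scores, key=scores.get))  # assign to class A
```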
Naïve Bayesian classification – Example

Training data:

Outlook  | Temperature | Humidity | Windy | Class
sunny    | hot         | high     | false | N
sunny    | hot         | high     | true  | N
overcast | hot         | high     | false | P
rain     | mild        | high     | false | P
rain     | cool        | normal   | false | P
rain     | cool        | normal   | true  | N
overcast | cool        | normal   | true  | P
sunny    | mild        | high     | false | N
sunny    | cool        | normal   | false | P
rain     | mild        | normal   | false | P
sunny    | mild        | normal   | true  | P
overcast | mild        | high     | true  | P
overcast | hot         | normal   | false | P
rain     | mild        | high     | true  | N

• Estimating P(xi|C):
P(p) = 9/14 = 0.643,  P(n) = 5/14 = 0.357

Outlook:     P(sunny|p) = 2/9,  P(overcast|p) = 4/9, P(rain|p) = 3/9;  P(sunny|n) = 3/5, P(overcast|n) = 0, P(rain|n) = 2/5
Temperature: P(hot|p) = 2/9,    P(mild|p) = 4/9,     P(cool|p) = 3/9;  P(hot|n) = 2/5,   P(mild|n) = 2/5,   P(cool|n) = 1/5
Humidity:    P(high|p) = 3/9,   P(normal|p) = 6/9;                     P(high|n) = 4/5,  P(normal|n) = 1/5
Windy:       P(true|p) = 3/9,   P(false|p) = 6/9;                      P(true|n) = 3/5,  P(false|n) = 2/5
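These frequency tables can be generated mechanically from the data; a small sketch (my own addition, with the table above hard-coded):

```python
from collections import Counter

# Weather training data from the slide: (outlook, temperature, humidity, windy, class)
data = [
    ("sunny", "hot", "high", "false", "N"), ("sunny", "hot", "high", "true", "N"),
    ("overcast", "hot", "high", "false", "P"), ("rain", "mild", "high", "false", "P"),
    ("rain", "cool", "normal", "false", "P"), ("rain", "cool", "normal", "true", "N"),
    ("overcast", "cool", "normal", "true", "P"), ("sunny", "mild", "high", "false", "N"),
    ("sunny", "cool", "normal", "false", "P"), ("rain", "mild", "normal", "false", "P"),
    ("sunny", "mild", "normal", "true", "P"), ("overcast", "mild", "high", "true", "P"),
    ("overcast", "hot", "normal", "false", "P"), ("rain", "mild", "high", "true", "N"),
]
attributes = ["outlook", "temperature", "humidity", "windy"]

for c in ("P", "N"):
    rows = [r for r in data if r[-1] == c]
    print(f"P({c}) = {len(rows)}/{len(data)}")
    for i, name in enumerate(attributes):
        counts = Counter(r[i] for r in rows)
        table = {value: f"{n}/{len(rows)}" for value, n in counts.items()}
        print(f"  {name}: {table}")   # e.g. outlook for P -> sunny 2/9, overcast 4/9, rain 3/9
```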
Naïve Bayesian classification – Example
• Classifying X: X = <rain, hot, high, false>
P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p)
= 3/9·2/9·3/9·6/9·9/14 = 0.010582
P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n)
= 2/5·2/5·4/5·2/5·5/14 = 0.018286
Sample X is classified in class n (don’t play)
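The two products can be checked numerically with a minimal sketch (the values are hard-coded from the tables above):

```python
# Class-conditional probabilities for X = <rain, hot, high, false>, read off the tables above
p_x_given_p = (3/9) * (2/9) * (3/9) * (6/9)   # P(rain|p)·P(hot|p)·P(high|p)·P(false|p)
p_x_given_n = (2/5) * (2/5) * (4/5) * (2/5)   # P(rain|n)·P(hot|n)·P(high|n)·P(false|n)

score_p = p_x_given_p * (9/14)                # multiply by the prior P(p)
score_n = p_x_given_n * (5/14)                # multiply by the prior P(n)

print(round(score_p, 6), round(score_n, 6))   # 0.010582 and 0.018286
print("n" if score_n > score_p else "p")      # X is classified as n (don't play)
```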
Bayes Approach: Continuous Variables
• Find the mean value of each input variable within each class
• Find the standard deviation of each input variable within each class
• Apply the probability density function to each input variable
• Find the likelihood of each output class
• Find the probability of each class
• Assign the input to the class with the highest probability
Example 2 with continuous variables
Find the standard deviation and mean values of the numeric attributes (Temperature and Humidity) for each class.
Example 2 (Cont.)
Predict the following instance:

Outlook | Temperature | Humidity | Windy | Play
Sunny   | 66          | 90       | True  | ?
Example 2 (Cont.)
Find the probability density of each numeric attribute value under each class, using the per-class means and standard deviations (Temperature: yes μ = 73, σ = 6.2; no μ = 74.6, σ = 7.9. Humidity: yes μ = 79.1, σ = 10.2; no μ = 86.2, σ = 9.7):

p(temperature = 66 | yes) = 1/(√(2π)·6.2)  · e^(−(66 − 73)²   / (2·6.2²))  = 0.0340
p(temperature = 66 | no)  = 1/(√(2π)·7.9)  · e^(−(66 − 74.6)² / (2·7.9²))  = 0.0291
p(humidity = 90 | yes)    = 1/(√(2π)·10.2) · e^(−(90 − 79.1)² / (2·10.2²)) = 0.0221
p(humidity = 90 | no)     = 1/(√(2π)·9.7)  · e^(−(90 − 86.2)² / (2·9.7²))  = 0.0380
Example 2 (Cont.)
Outlook and Windy are categorical attributes, so no probability density is needed; their class-conditional probabilities are estimated as relative frequencies, as before.
Example
Instance to predict:

Outlook | Temperature | Humidity | Windy | Play
Sunny   | 66          | 90       | True  | ?

Likelihood of yes = 2/9 · 0.0340 · 0.0221 · 3/9 · 9/14 = 0.000036
Likelihood of no  = 3/5 · 0.0291 · 0.0380 · 3/5 · 5/14 = 0.000136
Probability of yes = 0.000036 / (0.000036 + 0.000136) = 20.9%
Probability of no  = 0.000136 / (0.000036 + 0.000136) = 79.1%

Prediction: Play = NO
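The same mixed categorical/continuous computation can be sketched in Python; the priors, frequencies, means and standard deviations are hard-coded from the expressions above, and the structure is my own:

```python
import math

def gaussian(x, mu, sigma):
    # Gaussian density used for the continuous attributes
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Parameters read off the density expressions and frequency tables above
params = {
    "yes": {"prior": 9/14, "outlook_sunny": 2/9, "windy_true": 3/9,
            "temp": (73.0, 6.2), "humidity": (79.1, 10.2)},
    "no":  {"prior": 5/14, "outlook_sunny": 3/5, "windy_true": 3/5,
            "temp": (74.6, 7.9), "humidity": (86.2, 9.7)},
}

likelihood = {}
for c, p in params.items():
    likelihood[c] = (p["outlook_sunny"]
                     * gaussian(66, *p["temp"])
                     * gaussian(90, *p["humidity"])
                     * p["windy_true"]
                     * p["prior"])

total = sum(likelihood.values())
for c, value in likelihood.items():
    # close to the slide's 0.000036 (≈21%) and 0.000136 (≈79%); small differences are rounding
    print(c, round(value, 6), f"{100 * value / total:.1f}%")
print("Play =", max(likelihood, key=likelihood.get))   # Play = no
```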
Naïve Bayesian classification – the independence hypothesis
• … makes computation possible
• … yields optimal classifiers when satisfied
• … but is seldom satisfied in practice, as attributes (variables) are often correlated.
• Attempts to overcome this limitation:
  o Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
  o Decision trees, which reason on one attribute at a time, considering the most important attributes first
Model Evaluation
• Metrics for performance evaluation: how to evaluate the performance of a model?
• Methods for performance evaluation: how to obtain reliable estimates?
• Methods for model comparison: how to compare the relative performance among competing models?
Metrics for Performance Evaluation
• Focus on the predictive capability of a model, rather than on how fast it classifies or builds models, scalability, etc.
• Confusion matrix:

                    PREDICTED CLASS
ACTUAL CLASS        Class=Yes   Class=No
Class=Yes           a           b
Class=No            c           d

a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)
Metrics for Performance Evaluation…
• Most widely-used metric:

                    PREDICTED CLASS
ACTUAL CLASS        Class=Yes   Class=No
Class=Yes           a (TP)      b (FN)
Class=No            c (FP)      d (TN)

Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
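A minimal sketch of the accuracy computation, using the class-imbalance counts from the "Limitation of Accuracy" example below as a quick check (the function name is my own):

```python
def accuracy(tp, fn, fp, tn):
    """Fraction of correctly classified examples."""
    return (tp + tn) / (tp + fn + fp + tn)

# Predicting everything as the majority class in the 9990-vs-10 example below
print(accuracy(tp=0, fn=10, fp=0, tn=9990))   # 0.999
```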
Limitation of Accuracy
• Consider a 2-class problem: number of Class 0 examples = 9990, number of Class 1 examples = 10
• If the model predicts everything to be Class 0, accuracy is 9990/10000 = 99.9%
• Accuracy is misleading because the model does not detect any Class 1 example
Cost Matrix

                    PREDICTED CLASS
ACTUAL CLASS        Class=Yes    Class=No
Class=Yes           C(Yes|Yes)   C(No|Yes)
Class=No            C(Yes|No)    C(No|No)

C(i|j): cost of misclassifying a class j example as class i
Computing Cost of Classification

Cost matrix C(i|j):
ACTUAL \ PREDICTED     +      -
+                     -1    100
-                      1      0

Model M1:
ACTUAL \ PREDICTED     +      -
+                    150     40
-                     60    250
Accuracy = 80%, Cost = 3910 (= -150 + 4000 + 60 + 0)

Model M2:
ACTUAL \ PREDICTED     +      -
+                    250     45
-                      5    200
Accuracy = 90%, Cost = 4255
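The two costs can be reproduced by summing count × cost over the cells; a short sketch (dictionary layout and names are my own):

```python
def total_cost(confusion, cost):
    # confusion[actual][predicted] = count, cost[actual][predicted] = C(predicted|actual)
    return sum(confusion[a][p] * cost[a][p] for a in confusion for p in confusion[a])

cost = {"+": {"+": -1, "-": 100}, "-": {"+": 1, "-": 0}}
m1 = {"+": {"+": 150, "-": 40}, "-": {"+": 60, "-": 250}}
m2 = {"+": {"+": 250, "-": 45}, "-": {"+": 5, "-": 200}}

print(total_cost(m1, cost), total_cost(m2, cost))   # 3910 and 4255
```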
Cost-Sensitive Measures

Precision (p) = a / (a + c)
Recall (r)    = a / (a + b)
F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)

• Precision is biased towards C(Yes|Yes) & C(Yes|No)
• Recall is biased towards C(Yes|Yes) & C(No|Yes)
• F-measure is biased towards all except C(No|No)

Weighted Accuracy = (w1·a + w4·d) / (w1·a + w2·b + w3·c + w4·d)
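A minimal sketch of these formulas, checked against the first of the example confusion matrices below (a = TP, b = FN, c = FP, d = TN; the function names are my own):

```python
def precision(tp, fp):
    return tp / (tp + fp)           # a / (a + c)

def recall(tp, fn):
    return tp / (tp + fn)           # a / (a + b)

def f_measure(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * r * p / (r + p)      # equivalently 2a / (2a + b + c)

# First example matrix below: TP=63, FN=28, FP=37, TN=72
print(precision(tp=63, fp=37), recall(tp=63, fn=28), f_measure(tp=63, fp=37, fn=28))
# 0.63, ≈0.692, ≈0.660
```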
Examples (confusion matrices, with row totals in parentheses; column totals are 100 + 100 = 200 in each case):

(a) TP = 63, FN = 28 (91);  FP = 37, TN = 72 (109)
(b) TP = 77, FN = 77 (154); FP = 23, TN = 23 (46)
(c) TP = 24, FN = 88 (112); FP = 76, TN = 12 (88)
(d) TP = 88, FN = 24 (112); FP = 12, TN = 76 (88)
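To see how the different metrics behave on these four matrices, a short self-contained loop (my own addition):

```python
# The four confusion matrices above as (TP, FN, FP, TN)
examples = [(63, 28, 37, 72), (77, 77, 23, 23), (24, 88, 76, 12), (88, 24, 12, 76)]

for tp, fn, fp, tn in examples:
    acc = (tp + tn) / (tp + fn + fp + tn)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f = 2 * prec * rec / (prec + rec)
    print(f"TP={tp:3d} FN={fn:3d} FP={fp:3d} TN={tn:3d} | "
          f"acc={acc:.3f} precision={prec:.3f} recall={rec:.3f} F={f:.3f}")
```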
ROC (Receiver Operating Characteristic) Curve
• An ROC curve is a graphical plot of sensitivity vs. (1 − specificity) for a binary classifier system as its discrimination threshold is varied.
• The ROC can also be represented equivalently by plotting the fraction of true positives (TPR = true positive rate) vs. the fraction of false positives (FPR = false positive rate).
• It is a comparison of two operating characteristics (TPR and FPR) as the decision criterion changes.
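One way to obtain the curve is to sweep the threshold over the classifier's scores and record (FPR, TPR) at each step; a minimal sketch with hypothetical scores and labels (not from the slides):

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs as the decision threshold sweeps over the predicted scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(p and y for p, y in zip(predicted, labels))
        fp = sum(p and not y for p, y in zip(predicted, labels))
        points.append((fp / neg, tp / pos))
    return points

# Hypothetical classifier scores and true labels (1 = positive)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
print(roc_points(scores, labels))
```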
Applications
• ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones independently from (and prior to specifying) the cost context or the class distribution.
• ROC analysis is related in a direct and natural way to cost/benefit analysis of diagnostic decision making.
• It has been widely used in medicine, radiology, psychology and other areas for many decades.
• It has been introduced relatively recently in other areas such as machine learning and data mining.
ROC Space
Sensitivity and Specificity

                   Predicted Positive                     Predicted Negative
Actual Positive    True Positive                          False Negative (Type II error)     → Sensitivity
Actual Negative    False Positive (Type I error, p-value) True Negative                      → Specificity
                   ↓ Positive predictive value            ↓ Negative predictive value
Sensitivity and Specificity
• Specificity (also 1 − false positive rate): SPC = TN / (FP + TN) = 1 − FPR
• Sensitivity (aka recall, true positive rate): TPR = TP / P = TP / (TP + FN)
Sensitivity and Specificity

                   Predicted True                          Predicted False
Actual Positive    True Positive = 2                       False Negative = 1 (Type II error)   → Sensitivity = 66.67%
Actual Negative    False Positive = 18 (Type I error)      True Negative = 182                  → Specificity = 91%
                   ↓ Positive predictive value             ↓ Negative predictive value
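The numbers in this table can be checked with a few lines of Python (a sketch; PPV and NPV are computed the same way even though the slide does not report them):

```python
tp, fn, fp, tn = 2, 1, 18, 182   # counts from the table above

sensitivity = tp / (tp + fn)     # 2/3 ≈ 66.67%
specificity = tn / (fp + tn)     # 182/200 = 91%
ppv = tp / (tp + fp)             # positive predictive value
npv = tn / (tn + fn)             # negative predictive value

print(f"sensitivity={sensitivity:.2%} specificity={specificity:.2%} PPV={ppv:.2%} NPV={npv:.2%}")
```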