Naïve Bayes Classifier
Dr. Binoy B Nair
Algorithm
• A Naive Bayesian model is easy to build: there is no complicated iterative parameter estimation, which makes it particularly useful for very large datasets.
• Despite its simplicity, the Naive Bayesian classifier often performs surprisingly well and is widely used.
Assume there are n features in the dataset; then X = {x1, x2, …, xn}.
Naïve Bayes -Details
• Bayes classification:

  P(C | X) ∝ P(X | C) P(C) = P(X1, …, Xn | C) P(C)

  Difficulty: learning the joint probability P(X1, …, Xn | C)
• Naïve Bayes classification: assume that all input features are conditionally independent given the class!

  P(X1, X2, …, Xn | C) = P(X1 | C) · P(X2 | C) ⋯ P(Xn | C)
Naïve Bayes
• NB classification rule:
• For a given X = (x1, x2, x3, …, xn) and L classes c1, c2, …, cL, the vector X is assigned to class c* when:

  [P(x1 | c*) ⋯ P(xn | c*)] P(c*) > [P(x1 | c) ⋯ P(xn | c)] P(c),  for all c ≠ c*, c = c1, …, cL
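For illustration only (not part of the original slides), here is a minimal Python sketch of this decision rule; `priors` and `likelihood` are hypothetical placeholders for the estimated P(c) and P(xj | c):

```python
# Minimal sketch of the naive Bayes MAP decision rule (illustrative only).
# `priors[c]` is the estimated P(c); `likelihood(j, x_j, c)` returns the
# estimated P(x_j | c). Both are assumed to have been learned beforehand.
def classify(x, classes, priors, likelihood):
    best_class, best_score = None, -1.0
    for c in classes:
        score = priors[c]
        for j, x_j in enumerate(x):
            score *= likelihood(j, x_j, c)  # conditional independence assumption
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```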
Naïve Bayes
• Algorithm: Continuous-valued Features
– The conditional probability is often modeled with the normal (Gaussian) distribution:

  P̂(Xj | C = ci) = 1 / (√(2π) σji) · exp( −(Xj − μji)² / (2 σji²) )

  μji : mean (average) of the values of feature Xj over the examples for which C = ci
  σji : standard deviation of the values of feature Xj over the examples for which C = ci

– Learning Phase: for X = (X1, …, Xn) and C = c1, …, cL
  Output: n·L normal distributions and P(C = ci), i = 1, …, L
– Test Phase: Given an unknown instance X′ = (a1, …, an)
  • Instead of looking up tables, calculate the conditional probabilities with the normal distributions obtained in the learning phase
  • Apply the MAP rule to make a decision
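As a small illustration (not from the slides), the class-conditional density above translates directly into Python; mu and sigma stand for μji and σji learned per feature and per class:

```python
import math

def gaussian_likelihood(x, mu, sigma):
    """Estimated P(X_j = x | C = c_i): normal density with mean mu and
    standard deviation sigma (both computed from the training examples)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
```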
Example 3-Naïve Bayes Classifier with Continuous
Attributes
• Problem: classify
whether a given
person is a male or a
female based on the
measured features.
The features include:
height, weight, and
foot size.
Training
Example training set below.
Sex (o/p class)   Height (ft)   Weight (lbs)   Foot size (inches)
male              6             180            12
male              5.92          190            11
male              5.58          170            12
male              5.92          165            10
female            5             100            6
female            5.5           150            8
female            5.42          130            7
female            5.75          150            9
Example 3
• Solution
• Phase 1: Training
• The classifier created from the training set using a Gaussian distribution assumption would be:
sex      mean (height)   variance (height)   mean (weight)   variance (weight)   mean (foot size)   variance (foot size)
male     5.855           3.50E-02            176.25          1.23E+02            11.25              9.17E-01
female   5.4175          9.72E-02            132.5           5.58E+02            7.5                1.67E+00
We have equiprobable classes from the dataset, so P(male)= P(female) = 0.5.
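A short Python sketch of this training phase (the variances in the table are unbiased sample variances, i.e. divided by n−1); the data is the training set from the previous slide:

```python
from statistics import mean, variance  # statistics.variance uses the n-1 (sample) formula

# (height, weight, foot size) per class, copied from the training table
data = {
    "male":   [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)],
    "female": [(5.00, 100, 6), (5.50, 150, 8), (5.42, 130, 7), (5.75, 150, 9)],
}

for sex, rows in data.items():
    for name, column in zip(("height", "weight", "foot size"), zip(*rows)):
        print(sex, name, round(mean(column), 4), round(variance(column), 4))
# e.g. male height -> mean 5.855, variance 0.035, matching the table above
```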
Example 3
• Phase 2: Testing
• Below is a sample X to be classified as a male or female.
sex            height (ft)   weight (lbs)   foot size (inches)
To identify    6             130            8
Solution:
X={6,130,8}
Given this information, we wish to determine which is greater: p(male|X) or p(female|X).
p(male|X) = P(male)*P(height|male)*P(weight|male)*P(foot size|male) / evidence
p(female|X) = P(female)*P(height|female)*P(weight|female)*P(foot size|female) / evidence
Example 3
• The evidence (also termed normalizing constant) may be calculated
since the sum of the posteriors equals one.
• evidence = P(male)*P(height|male)*P(weight|male)*P(foot size|male) +
P(female)*P(height|female)*P(weight|female)*P(foot size|female)
• The evidence may be ignored for classification, since it is a positive constant that is the same for both classes. (Normal densities are always positive.)
Example 3
• We now determine the sex of the sample.
• P(male) = 0.5
• P(height|male) = 1.5789 (A probability density greater than 1 is fine; it is the area under the density curve, not the density value itself, that must equal 1.)
• P(weight|male) = 5.9881e-06
• P(foot size|male) = 1.3112e-3
• numerator of p(male|X) = their product = 6.1984e-09
Example 3
• P(female) = 0.5
• P(height|female) = 2.2346e-1
• P(weight|female) = 1.6789e-2
• P(foot size|female) = 2.8669e-1
• numerator of p(female|X) = their product = 5.3778e-04
Result:
Since the posterior numerator for p(female|X) is greater than that for p(male|X), the sample is classified as female.
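The test-phase arithmetic of this example can be reproduced with a short script (a sketch only); it reuses the means and variances from the training table (the table stores variances, so their square roots are the standard deviations), and the optional division by the evidence at the end turns the numerators into proper posteriors without changing the decision:

```python
import math

def normal_pdf(x, mu, var):
    """Gaussian density with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# per-class (mean, variance) for height, weight and foot size, from the training table
params = {
    "male":   [(5.855, 0.0350), (176.25, 122.92), (11.25, 0.9167)],
    "female": [(5.4175, 0.097225), (132.5, 558.33), (7.5, 1.6667)],
}
priors = {"male": 0.5, "female": 0.5}
x = (6, 130, 8)  # sample to classify

numerator = {}
for sex, stats in params.items():
    p = priors[sex]
    for value, (mu, var) in zip(x, stats):
        p *= normal_pdf(value, mu, var)
    numerator[sex] = p

print(numerator)                        # ~6.2e-09 (male) vs ~5.4e-04 (female)
evidence = sum(numerator.values())
print(numerator["female"] / evidence)   # posterior P(female|X), close to 1
```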
Naïve Bayes
• Algorithm: Discrete-Valued Features
– Learning Phase: Given a training set S,
  For each target value ci (i = 1, …, L):
    P̂(C = ci) ← estimate P(C = ci) with the examples in S
  For every feature value xjk of each feature Xj (j = 1, …, n; k = 1, …, Nj):
    P̂(Xj = xjk | C = ci) ← estimate P(Xj = xjk | C = ci) with the examples in S
  Output: conditional probability tables; for each Xj, Nj × L elements
– Test Phase: Given an unknown instance X′ = (a1, …, an),
  Look up the tables to assign the label c* to X′ if

  [P̂(a1|c*) ⋯ P̂(an|c*)] P̂(c*) > [P̂(a1|c) ⋯ P̂(an|c)] P̂(c),  for all c ≠ c*, c = c1, …, cL
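A minimal sketch of this counting-based learning phase, assuming the training set is supplied as a list of (feature dict, class label) pairs; the function and variable names are illustrative, not from the slides:

```python
from collections import Counter, defaultdict

def learn_tables(examples):
    """Estimate P(C=c) and P(X_j=x_jk | C=c) by relative frequencies."""
    class_counts = Counter(label for _, label in examples)
    priors = {c: n / len(examples) for c, n in class_counts.items()}

    value_counts = defaultdict(Counter)            # (feature, class) -> Counter of values
    for features, label in examples:
        for feature, value in features.items():
            value_counts[(feature, label)][value] += 1

    tables = {
        (feature, label): {v: n / class_counts[label] for v, n in counts.items()}
        for (feature, label), counts in value_counts.items()
    }
    return priors, tables

# Tiny illustrative call (toy data, not the Play-Tennis set):
priors, tables = learn_tables([({"Outlook": "Sunny"}, "Yes"), ({"Outlook": "Rain"}, "No")])
```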
Example
• Example: Play Tennis
Given a new instance, predict its label
x’=(Outlook=Sunny, Temperature=Cool,
Humidity=High, Wind=Strong)
Example
• Learning Phase
Outlook Play=Yes Play=No
Sunny 2/9 3/5
Overcast 4/9 0/5
Rain 3/9 2/5
Temperature Play=Yes Play=No
Hot 2/9 2/5
Mild 4/9 2/5
Cool 3/9 1/5
Humidity Play=Yes Play=No
High 3/9 4/5
Normal 6/9 1/5
Wind Play=Yes Play=No
Strong 3/9 3/5
Weak 6/9 2/5
P(Play=Yes) = 9/14 P(Play=No) = 5/14
There are four feature variables; for each of them we compute the conditional probability table shown above.
Example
• Test Phase
– Given a new instance, predict its label
x’=(Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up the tables obtained in the learning phase
– Decision making with the MAP rule
P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=No) = 3/5
P(Play=No) = 5/14
P(Outlook=Sunny|Play=Yes) = 2/9
P(Temperature=Cool|Play=Yes) = 3/9
P(Humidity=High|Play=Yes) = 3/9
P(Wind=Strong|Play=Yes) = 3/9
P(Play=Yes) = 9/14
P(Yes|x’): [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = 0.0053
P(No|x’): [P(Sunny|No) P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = 0.0206
Since P(Yes|x’) < P(No|x’), we label x’ as “No”.
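The two scores above are simply products of the table entries; a quick numerical check:

```python
# Posterior numerators for x' = (Sunny, Cool, High, Strong), using the learned tables
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # P(x'|Yes) * P(Play=Yes)
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # P(x'|No)  * P(Play=No)
print(round(p_yes, 4), round(p_no, 4))           # 0.0053 0.0206 -> predict "No"
```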
Example 2: Training dataset
age income student credit_rating buys_computer
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Class:
C1:buys_computer=‘yes’
C2:buys_computer=‘no’
Data sample:
X = (age<=30, income=medium, student=yes, credit_rating=fair)
Naïve Bayesian Classifier: Example 2
• Compute P(X|Ci) for each class
P(age=“<30” | buys_computer=“yes”) = 2/9=0.222
P(age=“<30” | buys_computer=“no”) = 3/5 =0.6
P(income=“medium” | buys_computer=“yes”)= 4/9 =0.444
P(income=“medium” | buys_computer=“no”) = 2/5 = 0.4
P(student=“yes” | buys_computer=“yes”) = 6/9 = 0.667
P(student=“yes” | buys_computer=“no”)= 1/5=0.2
P(credit_rating=“fair” | buys_computer=“yes”)=6/9=0.667
P(credit_rating=“fair” | buys_computer=“no”)=2/5=0.4
• X=(age<=30 ,income =medium, student=yes,credit_rating=fair)
P(X|Ci) : P(X|buys_computer=“yes”) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X|buys_computer=“no”) = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
P(buys_computer=“yes”) = 9/14, P(buys_computer=“no”) = 5/14
P(X|Ci)*P(Ci) : P(X|buys_computer=“yes”) * P(buys_computer=“yes”) = 0.044 × 9/14 = 0.028
P(X|buys_computer=“no”) * P(buys_computer=“no”) = 0.019 × 5/14 = 0.007
⇒ X belongs to class “buys_computer=yes”
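The same kind of check for this example, using the conditional probabilities computed above:

```python
p_yes = 0.222 * 0.444 * 0.667 * 0.667 * (9/14)   # P(X|yes) * P(yes)
p_no  = 0.600 * 0.400 * 0.200 * 0.400 * (5/14)   # P(X|no)  * P(no)
print(round(p_yes, 3), round(p_no, 3))           # 0.028 0.007 -> buys_computer = yes
```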
Summary
• Naïve Bayes: the conditional independence assumption
• Training is very easy and fast: it only requires estimating the statistics of each attribute in each class separately
• Testing is straightforward: just look up the tables or calculate the conditional probabilities with the estimated distributions
• A popular generative model
• Performance is often competitive with state-of-the-art classifiers, even when the independence assumption is violated
• Many successful applications, e.g., spam mail filtering
• A good candidate as a base learner in ensemble learning
• Apart from classification, naïve Bayes can do more…
Thank You