Naïve Bayes Classifier
Dr. Binoy B Nair
Algorithm
• A Naive Bayesian model is easy to build: there is no complicated iterative parameter estimation, which makes it particularly useful for very large datasets.
• Despite its simplicity, the Naive Bayesian classifier often performs surprisingly well and is widely used.
Assume there are n features in the dataset; then X = {x1, x2, …, xn}.
Naïve Bayes -Details
• Bayes classification:

  P(C | X) ∝ P(X | C) P(C) = P(X1, …, Xn | C) P(C)

  Difficulty: learning the joint probability P(X1, …, Xn | C)
• Naïve Bayes classification: assume that all input features are conditionally independent given the class!

  P(X1, X2, …, Xn | C) = P(X1 | C) · P(X2 | C) ⋯ P(Xn | C)
Naïve Bayes
• NB classification rule:
• For a given X = (x1, x2, x3, …, xn) and L classes c1, c2, …, cL, the vector X is assigned to class c* when:

  [P(x1 | c*) ⋯ P(xn | c*)] P(c*) > [P(x1 | c) ⋯ P(xn | c)] P(c),  for all c ≠ c*, c = c1, …, cL
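For illustration only (not part of the original slides), here is a minimal Python sketch of this decision rule; `priors` and `likelihood` are hypothetical placeholders for the estimated P(c) and P(xj | c):

```python
# Minimal sketch of the naive Bayes MAP decision rule (illustrative only).
# `priors[c]` is the estimated P(c); `likelihood(j, x_j, c)` returns the
# estimated P(x_j | c). Both are assumed to have been learned beforehand.
def classify(x, classes, priors, likelihood):
    best_class, best_score = None, -1.0
    for c in classes:
        score = priors[c]
        for j, x_j in enumerate(x):
            score *= likelihood(j, x_j, c)  # conditional independence assumption
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```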
Naïve Bayes
• Algorithm: Continuous-valued Features
– The conditional probability is often modeled with the normal (Gaussian) distribution:

  P̂(Xj | C = ci) = 1 / (√(2π) σji) · exp( −(Xj − μji)² / (2 σji²) )

  μji : mean (average) of the values of feature Xj over the examples for which C = ci
  σji : standard deviation of the values of feature Xj over the examples for which C = ci

– Learning Phase: for X = (X1, …, Xn) and C = c1, …, cL
  Output: n·L normal distributions and P(C = ci), i = 1, …, L
– Test Phase: Given an unknown instance X′ = (a1, …, an)
  • Instead of looking up tables, calculate the conditional probabilities with the normal distributions obtained in the learning phase
  • Apply the MAP rule to make a decision
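As a small illustration (not from the slides), the class-conditional density above translates directly into Python; mu and sigma stand for μji and σji learned per feature and per class:

```python
import math

def gaussian_likelihood(x, mu, sigma):
    """Estimated P(X_j = x | C = c_i): normal density with mean mu and
    standard deviation sigma (both computed from the training examples)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
```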
Example 3-Naïve Bayes Classifier with Continuous
Attributes
• Problem: classify
whether a given
person is a male or a
female based on the
measured features.
The features include:
height, weight, and
foot size.
Training
Example training set below.
Sex (o/p class)   Height (ft)   Weight (lbs)   Foot size (inches)
male              6             180            12
male              5.92          190            11
male              5.58          170            12
male              5.92          165            10
female            5             100            6
female            5.5           150            8
female            5.42          130            7
female            5.75          150            9
Example 3
• Solution
• Phase 1: Training
• The classifier created from the training set using a Gaussian distribution assumption would be:
sex      mean (height)   variance (height)   mean (weight)   variance (weight)   mean (foot size)   variance (foot size)
male     5.855           3.50E-02            176.25          1.23E+02            11.25              9.17E-01
female   5.4175          9.72E-02            132.5           5.58E+02            7.5                1.67E+00
We have equiprobable classes from the dataset, so P(male)= P(female) = 0.5.
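A short Python sketch of this training phase (the variances in the table are unbiased sample variances, i.e. divided by n−1); the data is the training set from the previous slide:

```python
from statistics import mean, variance  # statistics.variance uses the n-1 (sample) formula

# (height, weight, foot size) per class, copied from the training table
data = {
    "male":   [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)],
    "female": [(5.00, 100, 6), (5.50, 150, 8), (5.42, 130, 7), (5.75, 150, 9)],
}

for sex, rows in data.items():
    for name, column in zip(("height", "weight", "foot size"), zip(*rows)):
        print(sex, name, round(mean(column), 4), round(variance(column), 4))
# e.g. male height -> mean 5.855, variance 0.035, matching the table above
```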
Example 3
• Phase 2: Testing
• Below is a sample X to be classified as a male or female.
sex            height (ft)   weight (lbs)   foot size (inches)
To identify    6             130            8
Solution:
X={6,130,8}
Given this information, we wish to determine which is greater: p(male|X) or p(female|X).
p(male|X) = P(male)*P(height|male)*P(weight|male)*P(foot size|male) / evidence
p(female|X) = P(female)*P(height|female)*P(weight|female)*P(foot size|female) / evidence
Example 3
• The evidence (also termed normalizing constant) may be calculated
since the sum of the posteriors equals one.
• evidence = P(male)*P(height|male)*P(weight|male)*P(foot size|male) +
P(female)*P(height|female)*P(weight|female)*P(foot size|female)
• The evidence may be ignored for classification, since it is a positive constant that is the same for both classes. (Normal densities are always positive.)
Example 3
• We now determine the sex of the sample.
• P(male) = 0.5
• P(height|male) = 1.5789 (A probability density greater than 1 is fine; it is the area under the density curve, not the density value itself, that must equal 1.)
• P(weight|male) = 5.9881e-06
• P(foot size|male) = 1.3112e-3
• numerator of p(male|X) = their product = 6.1984e-09
Example 3
• P(female) = 0.5
• P(height|female) = 2.2346e-1
• P(weight|female) = 1.6789e-2
• P(foot size|female) = 2.8669e-1
• numerator of p(female|X) = their product = 5.3778e-04
Result:
Since the posterior numerator for p(female|X) is greater than that for p(male|X), the sample is classified as female.
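The test-phase arithmetic of this example can be reproduced with a short script (a sketch only); it reuses the means and variances from the training table (the table stores variances, so their square roots are the standard deviations), and the optional division by the evidence at the end turns the numerators into proper posteriors without changing the decision:

```python
import math

def normal_pdf(x, mu, var):
    """Gaussian density with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# per-class (mean, variance) for height, weight and foot size, from the training table
params = {
    "male":   [(5.855, 0.0350), (176.25, 122.92), (11.25, 0.9167)],
    "female": [(5.4175, 0.097225), (132.5, 558.33), (7.5, 1.6667)],
}
priors = {"male": 0.5, "female": 0.5}
x = (6, 130, 8)  # sample to classify

numerator = {}
for sex, stats in params.items():
    p = priors[sex]
    for value, (mu, var) in zip(x, stats):
        p *= normal_pdf(value, mu, var)
    numerator[sex] = p

print(numerator)                        # ~6.2e-09 (male) vs ~5.4e-04 (female)
evidence = sum(numerator.values())
print(numerator["female"] / evidence)   # posterior P(female|X), close to 1
```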
Naïve Bayes
• Algorithm: Discrete-Valued Features
– Learning Phase: Given a training set S,
  For each target value ci (i = 1, …, L):
    P̂(C = ci) ← estimate P(C = ci) with the examples in S
  For every feature value xjk of each feature Xj (j = 1, …, n; k = 1, …, Nj):
    P̂(Xj = xjk | C = ci) ← estimate P(Xj = xjk | C = ci) with the examples in S
  Output: conditional probability tables; for each Xj, Nj × L elements
– Test Phase: Given an unknown instance X′ = (a1, …, an),
  Look up the tables to assign the label c* to X′ if

  [P̂(a1|c*) ⋯ P̂(an|c*)] P̂(c*) > [P̂(a1|c) ⋯ P̂(an|c)] P̂(c),  for all c ≠ c*, c = c1, …, cL
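A minimal sketch of this counting-based learning phase, assuming the training set is supplied as a list of (feature dict, class label) pairs; the function and variable names are illustrative, not from the slides:

```python
from collections import Counter, defaultdict

def learn_tables(examples):
    """Estimate P(C=c) and P(X_j=x_jk | C=c) by relative frequencies."""
    class_counts = Counter(label for _, label in examples)
    priors = {c: n / len(examples) for c, n in class_counts.items()}

    value_counts = defaultdict(Counter)            # (feature, class) -> Counter of values
    for features, label in examples:
        for feature, value in features.items():
            value_counts[(feature, label)][value] += 1

    tables = {
        (feature, label): {v: n / class_counts[label] for v, n in counts.items()}
        for (feature, label), counts in value_counts.items()
    }
    return priors, tables

# Tiny illustrative call (toy data, not the Play-Tennis set):
priors, tables = learn_tables([({"Outlook": "Sunny"}, "Yes"), ({"Outlook": "Rain"}, "No")])
```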
Example
• Example: Play Tennis
Given a new instance, predict its label
x’=(Outlook=Sunny, Temperature=Cool,
Humidity=High, Wind=Strong)
Example
• Learning Phase
Outlook Play=Yes Play=No
Sunny 2/9 3/5
Overcast 4/9 0/5
Rain 3/9 2/5
Temperature Play=Yes Play=No
Hot 2/9 2/5
Mild 4/9 2/5
Cool 3/9 1/5
Humidity Play=Yes Play=No
High 3/9 4/5
Normal 6/9 1/5
Wind Play=Yes Play=No
Strong 3/9 3/5
Weak 6/9 2/5
P(Play=Yes) = 9/14 P(Play=No) = 5/14
There are four feature variables; for each of them we compute the conditional probability table shown above.
Example
• Test Phase
– Given a new instance, predict its label
x’=(Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up the tables obtained in the learning phase
– Decision making with the MAP rule
P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=No) = 3/5
P(Play=No) = 5/14
P(Outlook=Sunny|Play=Yes) = 2/9
P(Temperature=Cool|Play=Yes) = 3/9
P(Humidity=High|Play=Yes) = 3/9
P(Wind=Strong|Play=Yes) = 3/9
P(Play=Yes) = 9/14
P(Yes|x’): [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = 0.0053
P(No|x’): [P(Sunny|No) P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = 0.0206
Since P(Yes|x’) < P(No|x’), we label x’ as “No”.
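The two scores above are simply products of the table entries; a quick numerical check:

```python
# Posterior numerators for x' = (Sunny, Cool, High, Strong), using the learned tables
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # P(x'|Yes) * P(Play=Yes)
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # P(x'|No)  * P(Play=No)
print(round(p_yes, 4), round(p_no, 4))           # 0.0053 0.0206 -> predict "No"
```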
Example 2: Training dataset
age income student credit_rating buys_computer
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Class:
C1:buys_computer=‘yes’
C2:buys_computer=‘no’
Data sample:
X = (age<=30, income=medium, student=yes, credit_rating=fair)
Naïve Bayesian Classifier: Example 2
• Compute P(X|Ci) for each class
P(age=“<30” | buys_computer=“yes”) = 2/9=0.222
P(age=“<30” | buys_computer=“no”) = 3/5 =0.6
P(income=“medium” | buys_computer=“yes”)= 4/9 =0.444
P(income=“medium” | buys_computer=“no”) = 2/5 = 0.4
P(student=“yes” | buys_computer=“yes”) = 6/9 = 0.667
P(student=“yes” | buys_computer=“no”)= 1/5=0.2
P(credit_rating=“fair” | buys_computer=“yes”)=6/9=0.667
P(credit_rating=“fair” | buys_computer=“no”)=2/5=0.4
• X=(age<=30 ,income =medium, student=yes,credit_rating=fair)
P(X|Ci) : P(X|buys_computer=“yes”) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X|buys_computer=“no”) = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
P(buys_computer=“yes”) = 9/14, P(buys_computer=“no”) = 5/14
P(X|Ci)*P(Ci) : P(X|buys_computer=“yes”) * P(buys_computer=“yes”) = 0.044 × 9/14 = 0.028
P(X|buys_computer=“no”) * P(buys_computer=“no”) = 0.019 × 5/14 = 0.007
⇒ X belongs to class “buys_computer=yes”
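The same kind of check for this example, using the conditional probabilities computed above:

```python
p_yes = 0.222 * 0.444 * 0.667 * 0.667 * (9/14)   # P(X|yes) * P(yes)
p_no  = 0.600 * 0.400 * 0.200 * 0.400 * (5/14)   # P(X|no)  * P(no)
print(round(p_yes, 3), round(p_no, 3))           # 0.028 0.007 -> buys_computer = yes
```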
Summary
• Naïve Bayes: the conditional independence assumption
• Training is very easy and fast: it only requires estimating the statistics of each attribute in each class separately
• Testing is straightforward: just look up the tables or calculate the conditional probabilities with the estimated distributions
• A popular generative model
• Performance is often competitive with state-of-the-art classifiers, even when the independence assumption is violated
• Many successful applications, e.g., spam mail filtering
• A good candidate as a base learner in ensemble learning
• Apart from classification, naïve Bayes can do more…
Thank You