1
 Classification:
 Predicts categorical class labels
 Classifies data (using a model) based on instance attributes to predict
class labels
 The model is induced from a training set
 The model is used to classify (predict) the class of new data
 Typical Applications of Classification
 Credit approval
 Target marketing
 Medical diagnosis/prognosis
 Treatment effectiveness analysis
Classification
2
Classification—A Two-Step Process
 (1) Model construction: describing a set of predetermined classes
 Each instance/example is assumed to belong to a predefined class, as
determined by the class label
 The set of instances used for model construction: training set
 The model is represented as classification rules, decision trees, or
mathematical formulae
 (2) Model usage: for classifying future or unknown objects
 Model Evaluation: Estimate accuracy of the model
 The known label of each test sample is compared with the class
predicted by the model
 The accuracy rate is the percentage of test set samples that are
correctly classified by the model
 The test set must be independent of the training set; otherwise the
accuracy estimate will be overly optimistic (over-fitting)
3
Classification
 Using a Classifier for Prediction
Data to be classified → Classifier → Decision on class assignment
Using Hypothesis for Prediction: classifying any
example described in the same manner as the data
used in training the system (i.e. same set of features)
4
Classification
Training Set (data with known classes) → Classification Technique → Classifier
Data with unknown classes → Classifier → Class Assignment
5
Naive Bayes
 Naive Bayes is a simple probabilistic classifier based on
applying Bayes' theorem (or Bayes's rule) with strong
independence (naive) assumptions
 Allows us to combine observed data and prior knowledge
 Provides practical learning algorithms
6
p(h | d) = P(d | h) P(h) / P(d)

Who is who in Bayes’ rule:
 P(h): prior belief (probability of hypothesis h before seeing any data)
 P(d | h): likelihood (probability of the data if the hypothesis h is true)
 P(d): evidence (marginal probability of the data)
 P(h | d): posterior (probability of hypothesis h after having seen the data d)

Understanding Bayes’ rule:
 d = data, h = hypothesis
 Proof. Just rearrange the same joint probability on both sides:
p(h | d) P(d) = P(d, h) = P(d | h) P(h)
Bayes’ Rule
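The rule is a one-line computation. Here is a minimal sketch in Python; the probability values are made-up illustrative numbers, not taken from the slides:

```python
# Bayes' rule: posterior = likelihood * prior / evidence
def posterior(likelihood: float, prior: float, evidence: float) -> float:
    """P(h | d) = P(d | h) * P(h) / P(d)."""
    return likelihood * prior / evidence

# Hypothetical numbers, chosen only for illustration:
p_d_given_h = 0.8   # likelihood P(d | h)
p_h = 0.1           # prior P(h)
p_d = 0.2           # evidence P(d)
print(posterior(p_d_given_h, p_h, p_d))  # 0.4
```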
7
Example 1
 Let h be the event of raining and d be the evidence of dark
cloud. Then we have:
 P(dark cloud | raining): “dark cloud” can also occur in many other
events, such as an overcast day or a forest fire. This probability can
be obtained from historical data.
 P(raining) is the prior probability of raining. This probability
can be obtained from statistical records, for example, the
number of rainy days throughout a year.
 P(dark cloud) is the probability of the evidence “dark cloud”
occurring.
p(raining | darkCloud) = P(darkCloud | raining) P(raining) / P(darkCloud)
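With assumed figures (hypothetical, not from the slides) the example can be computed directly; the prior follows the slide’s suggestion of counting rainy days over a year:

```python
# All numbers below are made-up, for illustration only.
rainy_days, total_days = 73, 365
p_raining = rainy_days / total_days     # prior P(raining) = 0.2, from records
p_cloud_given_rain = 0.9                # likelihood P(darkCloud | raining)
p_cloud = 0.3                           # evidence P(darkCloud)

# Bayes' rule for the example:
p_rain_given_cloud = p_cloud_given_rain * p_raining / p_cloud
print(round(p_rain_given_cloud, 2))  # 0.6
```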
8
More on Naive Bayes
 Generally, it is “better” to have more than one piece of evidence
to support the prediction of an event.
 Typically, the more evidence we can gather, the better the
classification accuracy we can obtain. However, the evidence
must relate to the event (it must make sense)
 When we have more than one piece of evidence for building our NB
model, we can run into a problem of dependencies, i.e.,
some evidence may depend on one or more of the other
pieces. For example, the evidence “dark cloud” directly
depends on the evidence “high humidity”.
 Dependencies can make the model very complicated
 Assume there are no dependencies  Naïve
9
The Naïve Bayes Classifier
 What can we do if our data d has several attributes?
 Naïve Bayes assumption: Attributes that describe data instances are
conditionally independent given the classification hypothesis
 it is a simplifying assumption, obviously it may be violated in reality
 in spite of that, it works well in practice
 The Bayesian classifier that uses the Naïve Bayes assumption and computes
the MAP hypothesis is called Naïve Bayes classifier
 One of the most practical learning methods
 Successful applications:
 Medical Diagnosis
 Text classification
P(d | h) = P(a_1, ..., a_T | h) = ∏_t P(a_t | h)
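Under the Naïve Bayes assumption the likelihood of a multi-attribute instance is just the product of the per-attribute conditionals. A minimal sketch (the function name is my own, not from the slides):

```python
from math import prod

def joint_likelihood(attr_probs):
    """P(a_1, ..., a_T | h) = product over t of P(a_t | h),
    given the conditional independence assumption."""
    return prod(attr_probs)

# Hypothetical per-attribute conditional probabilities P(a_t | h):
print(joint_likelihood([0.5, 0.4, 0.2]))  # 0.5 * 0.4 * 0.2 = 0.04
```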
10
Example. ‘Play Tennis’ data
Day Outlook Temperature Humidity Wind PlayTennis
Day1 Sunny Hot High Weak No
Day2 Sunny Hot High Strong No
Day3 Overcast Hot High Weak Yes
Day4 Rain Mild High Weak Yes
Day5 Rain Cool Normal Weak Yes
Day6 Rain Cool Normal Strong No
Day7 Overcast Cool Normal Strong Yes
Day8 Sunny Mild High Weak No
Day9 Sunny Cool Normal Weak Yes
Day10 Rain Mild Normal Weak Yes
Day11 Sunny Mild Normal Strong Yes
Day12 Overcast Mild High Strong Yes
Day13 Overcast Hot Normal Weak Yes
Day14 Rain Mild High Strong No
11
The Probabilistic Model
12
Based on the examples in the table, classify the following datum x:
x=(Outl=Sunny, Temp=Cool, Hum=High, Wind=Strong)
 That means: Play tennis or not?
 Working:
h_NB = argmax_{h ∈ [yes, no]} P(h) P(x | h)
     = argmax_{h ∈ [yes, no]} P(h) ∏_t P(a_t | h)
     = argmax_{h ∈ [yes, no]} P(h) P(Outlook=sunny | h) P(Temp=cool | h)
       P(Humidity=high | h) P(Wind=strong | h)

Estimating the probabilities from the table:
P(PlayTennis = yes) = 9/14 = 0.64
P(PlayTennis = no) = 5/14 = 0.36
P(Wind = strong | PlayTennis = yes) = 3/9 = 0.33
P(Wind = strong | PlayTennis = no) = 3/5 = 0.60
etc.

P(yes) P(sunny | yes) P(cool | yes) P(high | yes) P(strong | yes) = 0.0053
P(no) P(sunny | no) P(cool | no) P(high | no) P(strong | no) = 0.0206

answer: PlayTennis(x) = no
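The whole calculation can be reproduced from the ‘Play Tennis’ table. This is a minimal sketch (not the slides’ code) that estimates the prior and the conditional probabilities by counting rows:

```python
# 'Play Tennis' data: (Outlook, Temperature, Humidity, Wind, PlayTennis)
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def nb_score(x, h):
    """Unnormalized NB score: P(h) * product over t of P(a_t | h)."""
    rows = [r for r in data if r[-1] == h]
    score = len(rows) / len(data)  # prior P(h), estimated by counting
    for t, value in enumerate(x):
        # conditional P(a_t = value | h), also estimated by counting
        score *= sum(r[t] == value for r in rows) / len(rows)
    return score

x = ("Sunny", "Cool", "High", "Strong")
print(round(nb_score(x, "Yes"), 4))                      # 0.0053
print(round(nb_score(x, "No"), 4))                       # 0.0206
print(max(("Yes", "No"), key=lambda h: nb_score(x, h)))  # No
```

The counts match the slide: 9/14 · 2/9 · 3/9 · 3/9 · 3/9 ≈ 0.0053 for yes, and 5/14 · 3/5 · 1/5 · 4/5 · 3/5 ≈ 0.0206 for no.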
13
A Closer Look
We can ignore Pr(E) (the evidence term) because it is the same for every
class; we only need to compare the relative values across the classes.
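Dividing each class score by the sum of the scores has the same effect as dividing by Pr(E): the argmax is unchanged, but the scores become proper posterior probabilities. A quick check using the two scores from the worked example:

```python
# Unnormalized NB scores from the 'Play Tennis' worked example:
score_yes, score_no = 0.0053, 0.0206

# Normalizing by the sum is equivalent to dividing by Pr(E);
# the ranking of the classes is unchanged.
p_no = score_no / (score_yes + score_no)
print(round(p_no, 2))  # 0.8
```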
14
Resources
 Textbook reading (contains details about using Naïve Bayes for
text classification):
Tom Mitchell, Machine Learning, McGraw-Hill, 1997, Chapter 6.