Here is a new animal with the following attributes:
Name: unknown
Give Birth: no
Can Fly: yes
Live in Water: no
Have Legs: yes
Based on the given attributes, what would be the predicted class of this new animal?
2. Probability Theory >> Prior vs Posterior Probabilities
▸Unconditional/Prior probabilities
- refer to the degrees of belief in propositions in the absence of any other information.
- the prior probability of an event e is represented as P(e)
- for example,
P(cavity) = 0.2 [meaning cavity is true with probability 0.2 when you have no other information]
▸Conditional/Posterior probabilities
- refer to the degrees of belief in propositions with some evidence and in the absence of any further information.
- the posterior probability of an event e2 when it is known that event e1 has occurred is represented as P(e2 | e1)
- for example,
P(cavity | toothache) = 0.6
meaning that whenever toothache is true, and we have no further information, we conclude that cavity is true with probability 0.6
- conditional probability can be defined in terms of prior probabilities as follows:
P(e2 | e1) = P(e2 ∩ e1) / P(e1)
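As a quick numeric check of this definition, the cavity example can be reproduced (a sketch; the marginal P(toothache) and the joint below are assumed illustrative values, chosen so the posterior matches the slides' P(cavity | toothache) = 0.6):

```python
# P(e2 | e1) = P(e2 ∩ e1) / P(e1), using the cavity/toothache example.
p_toothache = 0.2               # P(e1): assumed marginal probability
p_cavity_and_toothache = 0.12   # P(e2 ∩ e1): assumed joint probability

p_cavity_given_toothache = p_cavity_and_toothache / p_toothache
print(round(p_cavity_given_toothache, 3))  # 0.6
```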
Mohammad Imam Hossain | Lecturer, Dept. of CSE | UIU
6. Probability Theory >> Bayes Theorem Example
Problem 1 >>
In Orange County, 51% of the adults are males [so the other
49% are females]. One adult is randomly selected for a survey. It is later
learned that the selected survey subject (that adult) was smoking a cigar.
Also, 9.5% of males smoke cigars, whereas 1.7% of females smoke cigars
(based on data from the Substance Abuse and Mental Health Services
Administration). Use this additional information to find the probability that
the selected subject is a male.
P(male | smoker) = ?
Ans. 0.853
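This answer can be checked with a short script (a minimal sketch; the variable names are my own):

```python
# Bayes' theorem for Problem 1:
# P(male | smoker) = P(male) P(smoker | male) /
#                    (P(male) P(smoker | male) + P(female) P(smoker | female))
p_male, p_female = 0.51, 0.49
p_smoker_given_male = 0.095
p_smoker_given_female = 0.017

numerator = p_male * p_smoker_given_male
denominator = numerator + p_female * p_smoker_given_female
p_male_given_smoker = numerator / denominator
print(round(p_male_given_smoker, 3))  # 0.853
```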
7. Probability Theory >> Bayes Theorem Example
Problem 2 >>
An aircraft emergency locator transmitter (ELT) is a device designed to transmit a signal in the case of a
crash. The Altigauge Manufacturing Company makes 80% of the ELTs, the Bryant Company makes 15% of them, and the
Chartair Company makes the other 5%. The ELTs made by Altigauge have a 4% rate of defects, the Bryant ELTs have a 6%
rate of defects, and the Chartair ELTs have a 9% rate of defects. If a randomly selected ELT is then tested and is found to
be defective, find the probability that it was made by the Altigauge Manufacturing Company.
Ans. 0.703
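Again as a check (a sketch; the dictionary layout is my own), the total probability of a defect is summed over the three makers before applying Bayes' theorem:

```python
# Bayes' theorem over three manufacturers for Problem 2.
priors = {"Altigauge": 0.80, "Bryant": 0.15, "Chartair": 0.05}
defect_rates = {"Altigauge": 0.04, "Bryant": 0.06, "Chartair": 0.09}

# Joint probabilities P(maker ∩ defective) = P(maker) P(defective | maker)
joint = {m: priors[m] * defect_rates[m] for m in priors}
p_defective = sum(joint.values())  # total probability of a defect

p_altigauge_given_defective = joint["Altigauge"] / p_defective
print(round(p_altigauge_given_defective, 3))  # 0.703
```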
8. Probability Theory >> Normalization
Normalization>>
Here, e is the provided evidence.
Now detect the cause (c1, c2, or c3) from which the evidence e comes.
P(c1 | e) = P(c1 ∩ e) / P(e) = α P(c1 ∩ e)
P(c2 | e) = P(c2 ∩ e) / P(e) = α P(c2 ∩ e)
P(c3 | e) = P(c3 ∩ e) / P(e) = α P(c3 ∩ e)
Here α is a normalization constant, common to all three posteriors, where
α = 1/P(e) = 1/(P(c1 ∩ e) + P(c2 ∩ e) + P(c3 ∩ e)) = 1/(P(c1) P(e | c1) + P(c2) P(e | c2) + P(c3) P(e | c3))
Since α is the same for every cause, it need not be computed when we only want to compare the posteriors.
Calculate P(c1 | e), P(c2 | e), and P(c3 | e), and find the maximum (say it is for c2);
then we can decide that evidence e most probably originated from cause c2.
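A minimal sketch of this normalization, using made-up joint probabilities P(ci ∩ e) (not from the slides):

```python
# Assumed illustrative joints P(ci ∩ e) = P(ci) P(e | ci)
joint = {"c1": 0.02, "c2": 0.06, "c3": 0.02}

alpha = 1 / sum(joint.values())  # α = 1 / P(e)
posteriors = {c: alpha * p for c, p in joint.items()}

print(round(posteriors["c2"], 3))           # 0.6; the posteriors now sum to 1
print(max(posteriors, key=posteriors.get))  # c2, the most likely cause
```

Note that for the comparison alone, the unnormalized joints would already suffice, since α does not change which cause is largest.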
[Diagram: candidate causes c1, c2, c3, each with an arrow to the evidence e]
9. Naïve Bayes Classifier
Classifier >>
A classifier is a machine learning model used to discriminate between different objects based on certain input features.
Naïve Bayes Classifier >>
It belongs to the family of probabilistic classifiers based on Bayes' theorem. In this model, the class variable C
(which is to be predicted) is the root and the attribute variables Xi are the leaves.
The model is "naïve" because it assumes that the attributes are conditionally independent of each other given the class.
[Diagram: class node C at the root, with edges to attribute nodes x1, x2, x3, x4, …, xn]
10. Naïve Bayes Classifier >> Example
P(ck | x1, x2, x3, …, xn)
= α P(ck, x1, x2, x3, …, xn)   [where α = 1 / P(x1, x2, x3, …, xn)]
= α P(ck) P(x1 | ck) P(x2 | x1, ck) P(x3 | x2, x1, ck) … P(xn | xn-1, xn-2, …, x2, x1, ck)   [chain rule]
= α P(ck) P(x1 | ck) P(x2 | ck) P(x3 | ck) … P(xn | ck)   [conditional independence given ck]
= α P(ck) ∏_{i=1}^{n} P(xi | ck)
Calculate this for all possible values of the class ck and choose the ck having the highest probability.
Here,
P(ck) = (no. of instances having class ck) / (total no. of instances)
P(xi | ck) = (no. of instances having attribute value xi and class ck) / (no. of instances having class ck)
Job Type | Income Level | Likes to Hangout | Tour offer taken
Engineer | High | Yes | Yes
Doctor | High | Yes | No
Engineer | Medium | No | Yes
Teacher | Medium | No | Yes
Doctor | Medium | Yes | Yes
Engineer | Medium | No | No
Teacher | High | Yes | No
Doctor | High | Yes | No
Teacher | Medium | Yes | Yes
Doctor | Medium | No | No
Engineer | Medium | Yes | ???
11. Naïve Bayes Classifier >> Example
For the query instance (Engineer, Medium income, Likes to Hangout = Yes), abbreviated (E, M, Y), and class Yes:
P(Y | E, M, Y) = α P(Y) P(E | Y) P(M | Y) P(Y | Y)
P(Y) = 5/10, P(E | Y) = 2/5, P(M | Y) = 4/5, P(Y | Y) = 3/5
So, P(Y | E, M, Y) = α (5/10) (2/5) (4/5) (3/5) = α 0.096
12. Naïve Bayes Classifier >> Example
Similarly, for class No:
P(N | E, M, Y) = α P(N) P(E | N) P(M | N) P(Y | N)
P(N) = 5/10, P(E | N) = 1/5, P(M | N) = 2/5, P(Y | N) = 3/5
So, P(N | E, M, Y) = α (5/10) (1/5) (2/5) (3/5) = α 0.024
13. Naïve Bayes Classifier >> Example
P(Y | E, M, Y) = α P(Y) P(E | Y) P(M | Y) P(Y | Y) = α 0.096
P(N | E, M, Y) = α P(N) P(E | N) P(M | N) P(Y | N) = α 0.024
Here, P(Y | E, M, Y) > P(N | E, M, Y).
So, this instance will be classified as Yes.
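The whole calculation can be reproduced from the table (a minimal sketch; `naive_bayes_score` and the tuple encoding of rows are my own):

```python
# Naive Bayes on the tour-offer table from the slides.
# Row layout: (Job Type, Income Level, Likes to Hangout, Tour offer taken)
data = [
    ("Engineer", "High",   "Yes", "Yes"),
    ("Doctor",   "High",   "Yes", "No"),
    ("Engineer", "Medium", "No",  "Yes"),
    ("Teacher",  "Medium", "No",  "Yes"),
    ("Doctor",   "Medium", "Yes", "Yes"),
    ("Engineer", "Medium", "No",  "No"),
    ("Teacher",  "High",   "Yes", "No"),
    ("Doctor",   "High",   "Yes", "No"),
    ("Teacher",  "Medium", "Yes", "Yes"),
    ("Doctor",   "Medium", "No",  "No"),
]

def naive_bayes_score(query, cls):
    """Unnormalized posterior: P(cls) * prod_i P(x_i | cls)."""
    rows = [r for r in data if r[-1] == cls]
    score = len(rows) / len(data)            # P(cls)
    for i, value in enumerate(query):
        matches = sum(1 for r in rows if r[i] == value)
        score *= matches / len(rows)         # P(x_i | cls) from raw counts
    return score

query = ("Engineer", "Medium", "Yes")
scores = {c: naive_bayes_score(query, c) for c in ("Yes", "No")}
print(round(scores["Yes"], 3), round(scores["No"], 3))  # 0.096 0.024
print(max(scores, key=scores.get))                       # Yes
```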
14. Naïve Bayes Classifier >> Practice 1
Outlook | Temperature | Humidity | Windy | Play Golf
Rainy | Hot | High | False | No
Rainy | Hot | High | True | No
Overcast | Hot | High | False | Yes
Sunny | Mild | High | False | Yes
Sunny | Cool | Normal | False | Yes
Sunny | Cool | Normal | True | No
Overcast | Cool | Normal | True | Yes
Rainy | Mild | High | False | No
Rainy | Cool | Normal | False | Yes
Sunny | Mild | Normal | False | Yes
Rainy | Mild | Normal | True | Yes
Overcast | Mild | High | True | Yes
Overcast | Hot | Normal | False | Yes
Sunny | Mild | High | True | No
Given a day that is Rainy and Hot, with High humidity and Windy = False,
decide whether golf should be played.
15. Naïve Bayes Classifier >> Practice 2
Name | Give Birth | Can Fly | Live in Water | Have Legs | Class
human | yes | no | no | yes | mammals
python | no | no | no | no | non-mammals
salmon | no | no | yes | no | non-mammals
whale | yes | no | yes | no | mammals
frog | no | no | sometimes | yes | non-mammals
komodo | no | no | no | yes | non-mammals
bat | yes | yes | no | yes | mammals
pigeon | no | yes | no | yes | non-mammals
cat | yes | no | no | yes | mammals
leopard shark | yes | no | yes | no | non-mammals
turtle | no | no | sometimes | yes | non-mammals
penguin | no | no | sometimes | yes | non-mammals
porcupine | yes | no | no | yes | mammals
eel | no | no | yes | no | non-mammals
salamander | no | no | sometimes | yes | non-mammals
gila monster | no | no | no | yes | non-mammals
platypus | no | no | no | yes | mammals
owl | no | yes | no | yes | non-mammals
dolphin | yes | no | yes | no | mammals
eagle | no | yes | no | yes | non-mammals
(new animal) | yes | no | yes | no | ???
16. Naïve Bayes Classifier >> Laplacian Smoothing
P(ck | x1, x2, x3, …, xn) = α P(ck) ∏_{i=1}^{n} P(xi | ck)
If any single estimate P(ck) or P(xi | ck) is 0, the whole product becomes 0. Laplacian smoothing avoids this by adjusting the raw counts:
P(ck) = (no. of instances having class ck + λ) / (total no. of instances + K λ)
P(xi | ck) = (no. of instances having attribute value xi and class ck + λ) / (no. of instances having class ck + A λ)
Here,
K = no. of different values of the class,
A = no. of different values of xi,
λ = the Laplacian smoothing constant.
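The smoothed estimate can be sketched as a small helper (`smoothed_prob` is a name of my own choosing):

```python
def smoothed_prob(count, total, num_values, lam=1):
    """Laplace-smoothed estimate: (count + lam) / (total + num_values * lam)."""
    return (count + lam) / (total + num_values * lam)

# A value never seen with this class (count 0 out of 5 instances), where the
# attribute takes A = 3 distinct values: the estimate is no longer zero.
print(smoothed_prob(0, 5, 3))  # 0.125
print(smoothed_prob(2, 5, 3))  # 0.375
```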
17. Naïve Bayes Classifier >> Practice 3
Age | Vision | Astigmatism | Tear production | Lens type
Young | Near | No | Reduced | None
Young | Near | No | Normal | Soft
Young | Near | Yes | Reduced | None
Young | Far | No | Normal | Soft
Young | Far | Yes | Normal | Hard
Middle-aged | Near | No | Reduced | None
Middle-aged | Near | No | Normal | Soft
Middle-aged | Near | Yes | Normal | Hard
Middle-aged | Far | Yes | Reduced | None
Middle-aged | Far | No | Normal | Soft
Old | Near | Yes | Normal | Hard
Old | Near | No | Reduced | None
Old | Far | No | Reduced | None
Old | Far | No | Normal | Soft
Old | Far | Yes | Reduced | None
Given a patient who is old, has far-sightedness, has astigmatism, and whose tear production is normal, what lens should be suggested?
a) Use the Naïve Bayes Classifier.
b) Use the Naïve Bayes Classifier with Laplacian smoothing, where the smoothing constant λ = 1.
18. Naïve Bayes Classifier >> Practice 4
Performance Rating | Skillset | Relationship with Manager | Bonus
High | High | Good | Yes
High | Low | Good | Yes
Low | Low | Bad | No
Low | Low | Good | No
High | Low | Good | Yes
Low | Low | Bad | No
Low | High | Bad | No
Low | Low | Good | No
High | Low | Bad | No
High | High | Bad | Yes
Determine whether an employee with the features
{Performance Rating = Low, Skillset = High, Relationship with Manager = Good}
will receive a bonus, using the Naïve Bayes Classifier with Laplacian smoothing constant λ = 1.