DR.M.THIAGARAJAN
ASSOCIATE PROFESSOR OF
MATHEMATICS
ST JOSEPH’S COLLEGE
TRICHIRAPPALLI
Uncertainty in AI
Outline:
- Introduction
- Basic Probability Theory
- Probabilistic Reasoning
- Why should we use probability theory?
- Dutch Book Theorem
Sources of Uncertainty
- Information is partial.
- Information is not fully reliable.
- The representation language is inherently imprecise.
- Information comes from multiple sources, and it may be conflicting.
- Information is approximate.
- Cause-effect relationships are not absolute.
Basic Probability
Probability theory enables us to make rational decisions.
Which mode of transportation is safer:
- Car or plane?
- What is the probability of an accident?
Basic Probability Theory
An experiment has a set of potential outcomes, e.g., throwing a die.
The sample space of an experiment is the set of all possible outcomes, e.g., {1, 2, 3, 4, 5, 6}.
An event is a subset of the sample space, e.g.:
- {2}
- {3, 6}
- even = {2, 4, 6}
- odd = {1, 3, 5}
Probability as Relative Frequency
An event has a probability.
Consider a long sequence of experiments. If we count the number of times a particular event occurs in that sequence and divide by the total number of experiments, we obtain a ratio.
This ratio is one way of estimating the probability of the event:
P(E) ≈ (# of times E occurred)/(total # of trials)
Example
- 100 attempts are made to swim a length in 30 secs. The swimmer succeeds on 20 occasions, so the estimated probability that the swimmer completes the length in 30 secs is:
  - Success: 20/100 = 0.2
  - Failure: 1 − 0.2 = 0.8
The experiments, the sample space and the events must be clearly defined for a probability to be meaningful.
- What is the probability of an accident?
Theoretical Probability
Principle of Indifference: alternatives are always to be judged equiprobable if we have no reason to expect or prefer one over the other.
Each outcome in the sample space is assigned equal probability.
Example: throw a die
- P({1}) = P({2}) = ... = P({6}) = 1/6
Law of Large Numbers
As the number of experiments increases, the relative frequency of an event more closely approximates the theoretical probability of the event,
- provided the theoretical assumptions hold.
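A minimal Python sketch of the law of large numbers, assuming a simulated fair die (Python's seeded pseudo-random generator stands in for "the theoretical assumptions hold"). The function name `relative_frequency` is hypothetical. The error against 1/6 typically shrinks as n grows, although any single run need not shrink monotonically.

```python
import random


def relative_frequency(n_throws: int, face: int = 6, seed: int = 0) -> float:
    """Throw a simulated fair die n_throws times; return the fraction showing `face`."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    hits = sum(1 for _ in range(n_throws) if rng.randint(1, 6) == face)
    return hits / n_throws


for n in (100, 10_000, 200_000):
    freq = relative_frequency(n)
    print(f"n={n:>7}: relative frequency={freq:.4f}, |error|={abs(freq - 1/6):.4f}")
```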
Buffon’s Needle for Computing π
- Draw parallel lines 1 inch apart on a plane.
- Throw a 1-inch needle on the plane.
- P(needle crosses a line) = 2/π, so

$$\pi \approx \frac{2 \times \text{number of throws}}{\text{number of crossings}}$$
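A Monte Carlo sketch of Buffon's needle, assuming needle length = line spacing = 1 as on the slide. The position of the needle's centre and its angle are drawn uniformly; note that sampling the angle itself uses `math.pi`, so this illustrates the 2/π crossing probability rather than giving a truly independent way to compute π. The name `estimate_pi` is hypothetical.

```python
import math
import random


def estimate_pi(n_throws: int, seed: int = 0) -> float:
    """Estimate pi as 2 * throws / crossings (needle length = line spacing = 1)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_throws):
        # distance from the needle's centre to the nearest line: uniform on [0, 1/2]
        d = rng.uniform(0.0, 0.5)
        # acute angle between the needle and the lines: uniform on [0, pi/2]
        theta = rng.uniform(0.0, math.pi / 2)
        # the needle crosses that line when half its projection reaches it
        if d <= 0.5 * math.sin(theta):
            crossings += 1
    return 2 * n_throws / crossings


for n in (1_000, 100_000):
    print(n, estimate_pi(n))
```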
Large Numbers Reveal Untruths in Assumptions
Results of 1,000,000 throws of a die:

Number     1     2     3     4     5     6
Fraction   .155  .159  .164  .169  .174  .179
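One way to see why these fractions contradict the fair-die assumption is a rough z-score per face, sketched below under a normal approximation that treats each face separately. With n = 1,000,000 and p = 1/6, the standard error of a fraction is only about 0.00037, so even the face closest to 1/6 is more than 6 standard errors away. The table's fractions are rounded to three decimals, so the z-values are approximate.

```python
import math

n = 1_000_000
# fractions from the table above (rounded to three decimals)
observed = {1: 0.155, 2: 0.159, 3: 0.164, 4: 0.169, 5: 0.174, 6: 0.179}
p = 1 / 6                          # fair-die assumption
se = math.sqrt(p * (1 - p) / n)    # standard error of one face's fraction

z_scores = {face: (frac - p) / se for face, frac in observed.items()}
for face, z in z_scores.items():
    print(f"face {face}: fraction {observed[face]:.3f}, z = {z:+.1f}")
```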
Axioms of Probability Theory
Suppose P(·) is a probability function. Then
1. for any event E, 0 ≤ P(E) ≤ 1;
2. P(S) = 1, where S is the sample space;
3. for any two mutually exclusive events E1 and E2, P(E1 ∪ E2) = P(E1) + P(E2).
Any function that satisfies the above three axioms is a probability function.
Joint Probability
Let A and B be two events. The joint probability of both A and B being true is denoted by P(A, B).
Example:
P(spade) is the probability of the top card being a spade.
P(king) is the probability of the top card being a king.
P(spade, king) is the probability of the top card being both a spade and a king, i.e., the king of spades.
Is P(king, spade) = P(spade, king)?
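A small counting sketch, assuming a standard 52-card deck in which every card is equally likely to be on top. It computes P(spade), P(king) and the joint probability by enumeration, and shows that P(king, spade) = P(spade, king): both describe the same event, so the order of the arguments does not matter. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) cards


def prob(event) -> Fraction:
    """P(event) for the top card of a well-shuffled deck, by counting."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def is_spade(card) -> bool:
    return card[1] == "spades"


def is_king(card) -> bool:
    return card[0] == "K"


p_spade = prob(is_spade)                                     # 13/52 = 1/4
p_king = prob(is_king)                                       # 4/52 = 1/13
p_spade_king = prob(lambda c: is_spade(c) and is_king(c))    # 1/52
p_king_spade = prob(lambda c: is_king(c) and is_spade(c))    # same event, also 1/52
```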
Properties of Probability
1.P(E) = 1– P(E)
2.If E1 and E2 are logically equivalent, then
P(E1)=P(E2).
 E1: Not all philosophers are more than six feet tall.
 E2: Some philosopher is not more that six feet tall.
Then P(E1)=P(E2).
3.P(E1, E2)≤P(E1).
Conditional Probability
The probability of an event may change after we learn that another event has occurred.
The probability of A given B is denoted by P(A|B).
Example
- P(W = space): the probability that a randomly selected word from an English text is ‘space’.
- P(W = space | W’ = outer): the probability that the word is ‘space’ given that the previous word is ‘outer’.
Example
A: the top card of a deck of playing cards is the king of spades.
P(A) = 1/52
However, if we know
B: the top card is a king,
then the probability of A given that B is true is
P(A|B) = 1/4.
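How to Compute P(A|B)?
[Figure: Venn diagram of events A and B]
With N equally likely outcomes in total, and N(X) the number of outcomes in which X holds:

$$P(A \mid B) = \frac{N(A \text{ and } B)}{N(B)} = \frac{N(A \text{ and } B)/N}{N(B)/N} = \frac{P(A, B)}{P(B)}$$

$$P(\text{brown} \mid \text{cow}) = \frac{N(\text{brown cows})}{N(\text{cows})} = \frac{P(\text{brown cow})}{P(\text{cow})}$$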
Business Students
Of 100 students completing a course, 20 were business majors. Ten students received As in the course, and three of these were business majors. Suppose A is the event that a randomly selected student got an A in the course, and B is the event that a randomly selected student is a business major. What is the probability of A? What is the probability of A after knowing that B is true?

          B     not B   total
A         3       7       10
not A    17      73       90
total    20      80      100
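The two questions can be answered directly from the counts with the ratio P(A|B) = P(A, B)/P(B). A minimal sketch using exact fractions; the variable names are illustrative only.

```python
from fractions import Fraction

# Counts from the Business Students example (A = got an A, B = business major)
n_total = 100
n_a = 10         # students who got an A
n_b = 20         # business majors
n_a_and_b = 3    # business majors who got an A

p_a = Fraction(n_a, n_total)               # P(A) = 1/10
p_b = Fraction(n_b, n_total)               # P(B) = 1/5
p_a_and_b = Fraction(n_a_and_b, n_total)   # P(A, B) = 3/100
p_a_given_b = p_a_and_b / p_b              # P(A|B) = P(A, B)/P(B) = 3/20
```

Learning that the student is a business major raises the probability of an A from 0.10 to 0.15.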
Probabilistic Reasoning
Evidence
- What we know about a situation.
Hypothesis
- What we want to conclude.
Compute
- P(Hypothesis | Evidence)
Credit Card Authorization
E is the data about the applicant's age, job, education, income, credit history, etc.
H is the hypothesis that the credit card will provide a positive return.
The decision whether to issue the credit card to the applicant is based on the probability P(H|E).
Medical Diagnosis
E is a set of symptoms, such as coughing, sneezing, headache, ...
H is a disorder, e.g., common cold, SARS, flu.
The diagnosis problem is to find an H (disorder) such that P(H|E) is maximum.
Linda is 31 years old, single, outspoken, and very bright.
She majored in philosophy. As a student, she was deeply
concerned with issues of discrimination and social justice,
and also participated in antinuclear demonstrations.
Please rank the following statements by their probability,
using 1 for the most probable and 8 for the least
probable.
a. Linda is a teacher in elementary school.
b. Linda works in a bookstore and takes yoga classes.
c. Linda is active in the feminist movement.
d. Linda is a psychiatric social worker.
e. Linda is a member of the League of Women Voters.
f. Linda is a bank teller.
g. Linda is an insurance salesperson.
h. Linda is a bank teller and is active in the feminist
movement.
Example
A patient takes a lab test for a certain cancer, and the result comes back positive. The test has a false negative rate of 2% and a false positive rate of 3%. Furthermore, 0.8% of the entire population has this cancer.
What is the probability of cancer if we know the test result is positive?
Bayes Theorem
If P(E2) > 0, then
P(E1|E2) = P(E2|E1) P(E1)/P(E2).
This follows from the definition of conditional probability: both P(E1|E2) P(E2) and P(E2|E1) P(E1) equal P(E1, E2).
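Applied to the lab-test example, Bayes' theorem needs P(positive), which the sketch below obtains with the law of total probability (a step not shown on the slide): P(positive) = P(positive | cancer) P(cancer) + P(positive | no cancer) P(no cancer). A 2% false negative rate means P(positive | cancer) = 0.98. The function name `posterior` is hypothetical.

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(H | positive test) via Bayes' theorem."""
    # P(positive) by the law of total probability over H and not-H
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


# prior 0.8%, false negative rate 2% (sensitivity 98%), false positive rate 3%
p = posterior(prior=0.008, sensitivity=0.98, false_positive_rate=0.03)
print(f"P(cancer | positive) = {p:.3f}")  # 0.209
```

Despite the accurate test, a positive result gives only about a 21% chance of cancer, because the disease is rare and most positives come from the much larger healthy group.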
The Three-Card Problem
Three cards are in a hat. One is red on both sides (the red-red card). One is white on both sides (the white-white card). One is red on one side and white on the other (the red-white card). A single card is drawn at random and tossed into the air.
a. What is the probability that the red-red card was drawn? (RR)
b. What is the probability that the drawn card lands with a white side up? (W-up)
c. What is the probability that the red-red card was not drawn, given that the drawn card lands with a red side up? (not-RR|R-up)
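The three questions can be checked by enumeration, assuming that drawing a card uniformly and tossing it makes each of the six (card, up-face) pairs equally likely. Of the three red faces, two belong to the red-red card, so the answer to (c) is 1/3, not the intuitive 1/2.

```python
from fractions import Fraction

# Each card has two faces; every (card, up-face) pair is equally likely.
CARDS = {"RR": ("R", "R"), "WW": ("W", "W"), "RW": ("R", "W")}
OUTCOMES = [(name, face) for name, faces in CARDS.items() for face in faces]


def prob(event) -> Fraction:
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))


p_rr = prob(lambda o: o[0] == "RR")                          # (a)
p_w_up = prob(lambda o: o[1] == "W")                         # (b)
p_r_up = prob(lambda o: o[1] == "R")
p_not_rr_and_r_up = prob(lambda o: o[0] != "RR" and o[1] == "R")
p_not_rr_given_r_up = p_not_rr_and_r_up / p_r_up             # (c)
```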
Fair Bets
A bet is fair to an individual I if, according to I's probability assessment, the bet will break even in the long run.
The following three bets are fair to you:
Bet (a): Win $4.20 if RR; lose $2.10 otherwise. [since you believe P(RR) = 1/3]
Bet (b): Win $2.00 if W-up; lose $2.00 otherwise. [since you believe P(W-up) = 1/2]
Bet (c): Win $4.00 if R-up and not-RR; lose $4.00 if R-up and RR; neither win nor lose if not-R-up. [since you believe P(not-RR|R-up) = 1/2]
Dutch Book
The bets that you accepted have an
interesting property:
No matter what card is drawn in the
three-card problem, and no matter how
it lands, you are guaranteed to lose
money.
This is called a Dutch book.
Verification
There are three possible outcomes:
1. Some card other than red-red is drawn, and it lands with a white side up. That is, W-up and not-RR.
2. Some card other than red-red is drawn, and it lands with a red side up. That is, R-up and not-RR.
3. The red-red card is drawn, and it lands (of course) with a red side up. That is, R-up and RR.

Bet      Outcome 1   Outcome 2   Outcome 3
a.       −$2.10      −$2.10      +$4.20
b.       +$2.00      −$2.00      −$2.00
c.       ±$0.00      +$4.00      −$4.00
total    −$0.10      −$0.10      −$1.80
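A short check of the payoff table, with amounts in cents to avoid floating-point rounding. The second part shows why the book exists: P(RR) = 1/3 and P(W-up) = 1/2 already force P(not-RR|R-up) = 1 − P(RR and R-up)/P(R-up) = 1 − (1/3)/(1/2) = 1/3, so the believed value 1/2 is inconsistent with the other two beliefs. This uses P(RR and R-up) = P(RR), since the red-red card always lands red side up.

```python
from fractions import Fraction

# Payoffs to you (in cents) per bet, for outcomes 1, 2, 3 of the table above:
# 1 = W-up and not-RR, 2 = R-up and not-RR, 3 = R-up and RR.
payoffs = {
    "a": (-210, -210, +420),
    "b": (+200, -200, -200),
    "c": (0, +400, -400),
}
totals = [sum(bet[i] for bet in payoffs.values()) for i in range(3)]
print(totals)  # [-10, -10, -180]: a loss in every outcome

# Coherence check of the beliefs behind bet (c)
p_rr, p_w_up = Fraction(1, 3), Fraction(1, 2)
p_r_up = 1 - p_w_up
implied = 1 - p_rr / p_r_up   # P(not-RR | R-up) implied by the other two beliefs
```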
The Dutch Book Theorem
Suppose that an individual I is willing to accept any bet that is fair for I. Then a Dutch book can be made against I if and only if I's probability assessments violate the axioms of probability.
Independence: Intuition
Events are independent if one has nothing whatever to do with the other. Therefore, for two independent events, knowing that one has happened does not change the probability of the other happening.
- One toss of a coin is independent of another toss (assuming it is a regular coin).
- The price of tea in England is independent of the result of a general election in Canada.
Independent or Dependent?
Getting a cold and getting a cat allergy.
Miles per gallon and acceleration.
Size of a person’s vocabulary and the person’s shoe size.
Independence: Definition
Events A and B are independent iff
P(A, B) = P(A) × P(B),
which is equivalent to
P(A|B) = P(A) and P(B|A) = P(B)
when P(A, B) > 0.
Example: two tosses of a coin.
T1: the first toss is a head.
T2: the second toss is a tail.
P(T2|T1) = P(T2)
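The coin example can be verified by enumerating the four equally likely outcomes of two tosses of a fair coin (fairness is an assumption here). Both forms of the definition hold: the joint probability factorizes, and conditioning on T1 leaves P(T2) unchanged.

```python
from fractions import Fraction
from itertools import product

# Two tosses of a fair coin: HH, HT, TH, TT are equally likely.
OUTCOMES = list(product("HT", repeat=2))


def prob(event) -> Fraction:
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))


def t1(o) -> bool:  # T1: the first toss is a head
    return o[0] == "H"


def t2(o) -> bool:  # T2: the second toss is a tail
    return o[1] == "T"


p_t1, p_t2 = prob(t1), prob(t2)
p_joint = prob(lambda o: t1(o) and t2(o))
p_t2_given_t1 = p_joint / p_t1

print(p_joint == p_t1 * p_t2, p_t2_given_t1 == p_t2)  # True True
```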
Conditional Independence
Dependent events can become independent given certain other events.
Example: a person's shoe size and size of vocabulary are dependent, but they become independent given the person's age.
Two events A, B are conditionally independent given a third event C iff
P(A|B, C) = P(A|C)
Conditional Independence: Definition
Let E1 and E2 be two events. They are conditionally independent given E iff
P(E1|E, E2) = P(E1|E),
that is, the probability of E1 does not change after learning E2, given that E is true.
Equivalent formulations:
P(E1, E2|E) = P(E1|E) P(E2|E)
P(E2|E, E1) = P(E2|E)
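A sketch of the shoe-size/vocabulary example with made-up numbers (every probability below is hypothetical, chosen only for illustration). The joint distribution is built as P(age) P(shoe | age) P(vocab | age), so shoe size and vocabulary are conditionally independent given age by construction; the code then shows they are nevertheless dependent when age is unknown.

```python
from fractions import Fraction
from itertools import product

# Hypothetical numbers for illustration only.
P_AGE = {"child": Fraction(1, 2), "adult": Fraction(1, 2)}
P_BIG_SHOE = {"child": Fraction(1, 10), "adult": Fraction(8, 10)}   # P(big shoe | age)
P_BIG_VOCAB = {"child": Fraction(2, 10), "adult": Fraction(9, 10)}  # P(big vocabulary | age)

# P(age, shoe, vocab) = P(age) P(shoe | age) P(vocab | age)
joint = {}
for age, shoe, vocab in product(P_AGE, (True, False), (True, False)):
    ps = P_BIG_SHOE[age] if shoe else 1 - P_BIG_SHOE[age]
    pv = P_BIG_VOCAB[age] if vocab else 1 - P_BIG_VOCAB[age]
    joint[(age, shoe, vocab)] = P_AGE[age] * ps * pv


def prob(event) -> Fraction:
    return sum((p for key, p in joint.items() if event(*key)), Fraction(0))


# Marginally dependent: knowing the shoe is big changes P(big vocabulary)
p_vocab = prob(lambda a, s, v: v)
p_vocab_given_shoe = prob(lambda a, s, v: v and s) / prob(lambda a, s, v: s)

# Conditionally independent: once age = adult is known, shoe size adds nothing
p_vocab_given_adult = prob(lambda a, s, v: v and a == "adult") / prob(lambda a, s, v: a == "adult")
p_vocab_given_adult_shoe = (prob(lambda a, s, v: v and s and a == "adult")
                            / prob(lambda a, s, v: s and a == "adult"))
```

With these numbers P(big vocabulary) = 11/20 but P(big vocabulary | big shoe) = 37/45, while given age = adult both conditional probabilities equal 9/10.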
Example: Play Tennis?
Outlook    Temperature  Humidity  Windy  Class
sunny      hot          high      false  −
sunny      hot          high      true   −
overcast   hot          high      false  +
rain       mild         high      false  +
rain       cool         normal    false  +
rain       cool         normal    true   −
overcast   cool         normal    true   +
sunny      mild         high      false  −
sunny      cool         normal    false  +
rain       mild         normal    false  +
sunny      mild         normal    true   +
overcast   mild         high      true   +
overcast   hot          normal    false  +
rain       mild         high      true   −
Predict playing tennis when <sunny, cool, high, strong> (strong wind, i.e., Windy = true).
What probability should be used to make the prediction?
How should the probability be computed?
Probabilities of Individual Attributes
Given the training set, we can compute the probabilities P(attribute value | class):

Outlook     +     −      Humidity   +     −
sunny       2/9   3/5    high       3/9   4/5
overcast    4/9   0      normal     6/9   1/5
rain        3/9   2/5

Temperature +     −      Windy      +     −
hot         2/9   2/5    true       3/9   3/5
mild        4/9   2/5    false      6/9   2/5
cool        3/9   1/5

P(+) = 9/14
P(−) = 5/14
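The tables above are plain relative frequencies within each class. The sketch below recomputes them from the 14 Play Tennis rows by counting, without any smoothing; the class labels are written as ASCII "+" and "-", and the helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# The 14 training examples: (outlook, temperature, humidity, windy, class)
DATA = [
    ("sunny", "hot", "high", False, "-"),
    ("sunny", "hot", "high", True, "-"),
    ("overcast", "hot", "high", False, "+"),
    ("rain", "mild", "high", False, "+"),
    ("rain", "cool", "normal", False, "+"),
    ("rain", "cool", "normal", True, "-"),
    ("overcast", "cool", "normal", True, "+"),
    ("sunny", "mild", "high", False, "-"),
    ("sunny", "cool", "normal", False, "+"),
    ("rain", "mild", "normal", False, "+"),
    ("sunny", "mild", "normal", True, "+"),
    ("overcast", "mild", "high", True, "+"),
    ("overcast", "hot", "normal", False, "+"),
    ("rain", "mild", "high", True, "-"),
]
ATTRS = ("outlook", "temperature", "humidity", "windy")
class_counts = Counter(row[-1] for row in DATA)


def prior(cls: str) -> Fraction:
    """P(class = cls) as a relative frequency."""
    return Fraction(class_counts[cls], len(DATA))


def cond_prob(attr: str, value, cls: str) -> Fraction:
    """P(attr = value | class = cls), estimated by counting."""
    i = ATTRS.index(attr)
    matches = sum(1 for row in DATA if row[-1] == cls and row[i] == value)
    return Fraction(matches, class_counts[cls])
```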
Naïve Bayes Method
Knowledge base contains
- A set of hypotheses
- A set of pieces of evidence
- The probability of each piece of evidence given each hypothesis
Given
- A subset of the evidence known to be present in a situation
Find
- The hypothesis with the highest posterior probability: P(H|E1, E2, …, Ek).
- The probability itself does not matter so much.
Naïve Bayes Method
Assumptions
- Hypotheses are exhaustive and mutually exclusive:
  - H1 ∨ H2 ∨ … ∨ Hn
  - ¬(Hi ∧ Hj) for any i ≠ j
- Pieces of evidence are conditionally independent given a hypothesis:
  - P(E1, E2, …, Ek|H) = P(E1|H) … P(Ek|H)
P(H|E1, E2, …, Ek)
= P(E1, E2, …, Ek, H)/P(E1, E2, …, Ek)
= P(E1, E2, …, Ek|H) P(H)/P(E1, E2, …, Ek)
Naïve Bayes Method
The goal is to find the H that maximizes P(H|E1, E2, …, Ek).
Since
P(H|E1, E2, …, Ek) = P(E1, E2, …, Ek|H) P(H)/P(E1, E2, …, Ek)
and P(E1, E2, …, Ek) is the same for all hypotheses, maximizing P(H|E1, E2, …, Ek) is equivalent to maximizing
P(E1, E2, …, Ek|H) P(H) = P(E1|H) … P(Ek|H) P(H).
Naïve Bayes Method
- Find a hypothesis that maximizes P(E1|H) … P(Ek|H) P(H).
Example: Play Tennis
P(+|sunny, cool, high, strong) vs. P(−|sunny, cool, high, strong)
P(sunny|+) P(cool|+) P(high|+) P(strong|+) P(+) vs.
P(sunny|−) P(cool|−) P(high|−) P(strong|−) P(−)

Outlook     +     −      Humidity   +     −
sunny       2/9   3/5    high       3/9   4/5
overcast    4/9   0      normal     6/9   1/5
rain        3/9   2/5

Temperature +     −      Windy      +     −
hot         2/9   2/5    true       3/9   3/5
mild        4/9   2/5    false      6/9   2/5
cool        3/9   1/5

P(+) = 9/14
P(−) = 5/14
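Plugging the table entries into the two products gives the prediction. This sketch reads "strong" as Windy = true and uses exact fractions; because + and − are exhaustive and mutually exclusive, dividing each score by their sum also recovers the posteriors.

```python
from fractions import Fraction as F

# P(sunny|class), P(cool|class), P(high|class), P(strong|class) from the tables
likelihoods = {
    "+": [F(2, 9), F(3, 9), F(3, 9), F(3, 9)],
    "-": [F(3, 5), F(1, 5), F(4, 5), F(3, 5)],
}
priors = {"+": F(9, 14), "-": F(5, 14)}

scores = {}
for cls, probs in likelihoods.items():
    score = priors[cls]
    for p in probs:
        score *= p
    scores[cls] = score  # proportional to P(cls | sunny, cool, high, strong)

prediction = max(scores, key=scores.get)
total = sum(scores.values())
posteriors = {cls: s / total for cls, s in scores.items()}
print(scores, prediction)
```

The + score is 1/189 ≈ 0.0053 and the − score is 18/875 ≈ 0.0206, so the method predicts − (no tennis), with a normalized posterior P(−|…) = 486/611 ≈ 0.80.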
Application: Spam Detection
Spam
- Dear sir, We want to transfer to overseas ($ 126,000.000.00 USD) One hundred and Twenty six million United States Dollars) from a Bank in Africa, I want to ask you to quietly look for a reliable and honest person who will be capable and fit to provide either an existing ……
Legitimate email
- Ham: for lack of a better name.
Hypotheses: {Spam, Ham}
Evidence: a document
- The document is treated as a set (or bag) of words.
Knowledge
- P(Spam)
  - The prior probability of an e-mail message being spam.
  - How to estimate this probability?
- P(w|Spam)
  - The probability that a word is w, given that the word is chosen from a spam message.
  - How to estimate this probability?
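One common answer to both questions, sketched here under stated assumptions: estimate P(Spam) as the fraction of training messages labelled spam, and P(w|Spam) as the relative frequency of w among all words in spam messages, with add-one (Laplace) smoothing so that unseen words do not zero out the product. The smoothing, the lowercase whitespace tokenization, the bag-of-words counting of repeated words, and the toy training data are all assumptions not taken from the slides. Scores are summed in log space to avoid underflow on long messages.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (text, label) pairs, label "spam" or "ham"."""
    label_counts = Counter(label for _, label in messages)
    word_counts = {"spam": Counter(), "ham": Counter()}
    for text, label in messages:
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return label_counts, word_counts, vocab


def log_score(text, label, label_counts, word_counts, vocab):
    """log P(label) + sum of log P(w | label), with add-one smoothing."""
    n_messages = sum(label_counts.values())
    score = math.log(label_counts[label] / n_messages)  # prior estimate
    total_words = sum(word_counts[label].values())
    for w in text.lower().split():
        score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
    return score


def classify(text, model):
    return max(("spam", "ham"), key=lambda label: log_score(text, label, *model))


# Hypothetical toy training data
training = [
    ("transfer million dollars bank", "spam"),
    ("urgent reliable honest person transfer", "spam"),
    ("meeting notes for the course", "ham"),
    ("lecture slides on probability", "ham"),
]
model = train(training)
```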
Limitations of Naïve Bayesian
Cannot handle composite hypotheses well
- Suppose H1, H2, …, Hn are independent of each other.
- Consider a composite hypothesis H1 ∧ H2.
- How to compute the posterior probability P(H1 ∧ H2 | E1, …, El)?

Using Bayes’ Theorem

$$P(H_1 \wedge H_2 \mid E_1, \ldots, E_l) = \frac{P(E_1, \ldots, E_l \mid H_1 \wedge H_2)\, P(H_1 \wedge H_2)}{P(E_1, \ldots, E_l)}$$

$$P(H_1 \wedge H_2) = P(H_1)\, P(H_2) \quad \text{because they are independent}$$

$$P(E_1, \ldots, E_l \mid H_1 \wedge H_2) = \prod_{j=1}^{l} P(E_j \mid H_1 \wedge H_2) \quad \text{assuming } E_1, \ldots, E_l \text{ are independent given } H_1 \wedge H_2$$

How to compute P(Ej | H1 ∧ H2)?

Alternatively:

$$P(H_1 \wedge H_2 \mid E_1, \ldots, E_l) = P(H_1 \mid E_1, \ldots, E_l)\, P(H_2 \mid E_1, \ldots, E_l) \;?$$

assuming H1, …, Hn are independent given E1, …, El.
But this is a very unreasonable assumption.
We need a better representation and a better assumption.
Example: E: earthquake, B: burglar, A: alarm set off.
E and B are independent.
But when A is given, they are (adversely) dependent, because they become competitors to explain A:
P(B|A, E) << P(B|A)
E explains away A.
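Explaining away can be reproduced numerically. The probabilities below are hypothetical, chosen only for illustration (they are not from the slides): burglary and earthquake are independent a priori, and either can set off the alarm. Enumerating the joint distribution P(B) P(E) P(A | B, E) shows P(B|E) = P(B), while learning E after A makes burglary far less likely.

```python
from itertools import product

# Hypothetical probabilities, for illustration only.
P_B, P_E = 0.001, 0.002                             # burglary, earthquake (independent)
P_A = {(True, True): 0.95, (True, False): 0.94,     # P(alarm | burglary, earthquake)
       (False, True): 0.29, (False, False): 0.001}


def joint(b: bool, e: bool, a: bool) -> float:
    """P(B=b, E=e, A=a) = P(b) P(e) P(a | b, e)."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return pb * pe * pa


def prob(event) -> float:
    return sum(joint(b, e, a) for b, e, a in product((True, False), repeat=3)
               if event(b, e, a))


p_b_given_e = prob(lambda b, e, a: b and e) / prob(lambda b, e, a: e)
p_b_given_a = prob(lambda b, e, a: b and a) / prob(lambda b, e, a: a)
p_b_given_ae = prob(lambda b, e, a: b and a and e) / prob(lambda b, e, a: a and e)

print(f"P(B|A) = {p_b_given_a:.4f}, P(B|A,E) = {p_b_given_ae:.4f}")
```

With these numbers P(B|A) is about 0.37 but P(B|A, E) drops to about 0.003: the earthquake explains the alarm away.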
Cannot handle causal chaining
- Example: A: weather of the year
  B: cotton production of the year
  C: cotton price of next year
- Observed: A influences C.
- The influence is not direct (A → B → C).
P(C|B, A) = P(C|B): instantiating B blocks the influence of A on C.
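A sketch of the chain A → B → C with hypothetical numbers (all binary; the probabilities are made up for illustration). The joint is built as P(A) P(B|A) P(C|B), so C depends on A only through B. The result: A changes the probability of C, but once B is known, A adds nothing.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers: A = good weather, B = high cotton production, C = high price next year.
P_A = F(1, 2)
P_B_GIVEN_A = {True: F(4, 5), False: F(1, 5)}     # P(B | A)
P_C_GIVEN_B = {True: F(1, 10), False: F(7, 10)}   # P(C | B): big harvest, low price


def joint(a, b, c):
    """P(a, b, c) = P(a) P(b | a) P(c | b)."""
    pa = P_A if a else 1 - P_A
    pb = P_B_GIVEN_A[a] if b else 1 - P_B_GIVEN_A[a]
    pc = P_C_GIVEN_B[b] if c else 1 - P_C_GIVEN_B[b]
    return pa * pb * pc


def prob(event):
    return sum((joint(*x) for x in product((True, False), repeat=3) if event(*x)), F(0))


c_given_a = prob(lambda a, b, c: c and a) / prob(lambda a, b, c: a)
c_given_not_a = prob(lambda a, b, c: c and not a) / prob(lambda a, b, c: not a)
c_given_b_and_a = prob(lambda a, b, c: c and b and a) / prob(lambda a, b, c: b and a)
c_given_b = prob(lambda a, b, c: c and b) / prob(lambda a, b, c: b)
```

Here P(C|A) = 11/50 differs from P(C|¬A) = 29/50, yet P(C|B, A) = P(C|B) = 1/10.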
Summary
Basics of Probability Theory
- Experiment, sample space, events
- Axioms and properties
- Joint Probability
- Conditional Probability
Probabilistic Reasoning
- Bayes Theorem
Dutch Book Theorem
Independence and Conditional Independence
Naïve Bayes Method