Acting under uncertainty – Bayesian inference – naïve Bayes models.
Probabilistic reasoning – Bayesian networks – exact inference in BN –
approximate inference in BN – causal networks.
Dr. N.G.P. INSTITUTE OF TECHNOLOGY – COIMBATORE - 48
(An Autonomous Institution)
CS3491
ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
Dr. B. Dhiyanesh
Associate Professor / CSE
UNIT II
PROBABILISTIC REASONING
Recap of Previous lecture
Lecture Topic
UNCERTAINTY
 Uncertainty
 Review of probability
 Probabilistic reasoning – Bayes rule
 Bayesian networks
 Inferences in Bayesian network
Contd..
ACTING UNDER UNCERTAINTY
• Agents may need to handle uncertainty, whether due to partial observability, nondeterminism,
or a combination of the two. An agent may never know for certain what state it’s in or where
it will end up after a sequence of actions.
• With logical knowledge representation we might write A→B, which means that if A is true then B is
true; but in a situation where we are not sure whether A is true or not, we cannot express this
statement. This situation is called uncertainty.
• So to represent uncertain knowledge, where we are not sure about the predicates, we need
uncertain reasoning or probabilistic reasoning.
CAUSES OF UNCERTAINTY
• Following are some leading causes of uncertainty in the real world:
• Information obtained from unreliable sources.
• Experimental Errors
• Equipment fault
• Temperature variation
• Climate change.
• When interpreting partial sensor information, a logical agent must consider every logically
possible explanation for the observations, no matter how unlikely. This leads to impossibly
large and complex belief-state representations.
• Sometimes there is no plan that is guaranteed to achieve the goal—yet the agent must act. It
must have some way to compare the merits of plans that are not guaranteed.
• Suppose, for example, an automated taxi:
• The taxi has the goal of delivering a passenger to the airport on time.
• The agent forms a plan, A90,
• that involves leaving home 90 minutes before the flight departs and driving at a
reasonable speed.
• Even though the airport is only about 5 miles away,
• a logical taxi agent will not be able to conclude with certainty that “Plan A90 will get
us to the airport in time.”
• Instead, it reaches the weaker conclusion
• “Plan A90 will get us to the airport in time,
• as long as the car doesn’t break down or run out of gas,
• I don’t get into an accident, and there are no accidents on the bridge,
• the plane doesn’t leave early, and no meteorite hits the car.”
• Nonetheless, in some sense A90 is in fact the right thing to do.
• A90 is expected to maximize the agent’s performance measure (where the expectation is
relative to the agent’s knowledge about the environment).
• The performance measure includes
• getting to the airport in time for the flight
• avoiding a long, unproductive wait at the airport
• avoiding speeding tickets along the way
• The agent’s knowledge cannot guarantee any of these outcomes for A90, but it can provide
some degree of belief that they will be achieved.
• Other plans, such as A180, might increase the agent’s belief that it will get to the airport on
time, but also increase the likelihood of a long wait.
• Trying to use logic to cope with a domain like medical diagnosis thus fails for three main
reasons:
• Laziness: It is too much work to list the complete set of antecedents or consequents needed
to ensure an exceptionless rule, and too hard to use such rules.
• Theoretical ignorance: Medical science has no complete theory for the domain.
• Practical ignorance: Even if we know all the rules, we might be uncertain about a particular
patient because not all the necessary tests have been or can be run.
UNCERTAINTY
 When an agent knows enough facts about its environment, logical plans and
actions can be guaranteed to work.
 Unfortunately, agents never have access to the whole truth about their
environment. Agents act under uncertainty.
Contd..
UNCERTAINTY
The Diagnosis:
 Medicine, automobile repair, or whatever the task at hand is, almost always involves uncertainty.
 Let us try to write rules for dental diagnosis using first order logic, so that we can see how
the logical approach breaks down. Consider the following rule.
∀p Symptom(p, toothache) ⇒ Disease(p, cavity).
 The problem is that this rule is wrong.
 Not all patients with toothaches have cavities; some of them have gum disease, swelling, or
one of several other problems.
∀p Symptom(p, toothache) ⇒ Disease(p, cavity) ∨ Disease(p, gumdisease) ∨ Disease(p,
swelling) ∨ …
Contd..
UNCERTAINTY
 To make the rule true, we would have to add an almost unlimited list of possible causes.
 We could try turning it into a causal rule:
∀p Disease(p, cavity) ⇒ Symptom(p, toothache).
 But this rule is not right either; not all cavities cause pain.
 The connection between a toothache and a cavity is not a strict logical consequence in either
direction, so a purely logical judgement may go wrong.
Contd..
NATURE OF UNCERTAIN KNOWLEDGE
 This kind of situation is typical of the medical domain, as well as most other judgmental domains:
law, business, design, automobile repair, gardening, and so on.
 The agent can, at best, hold only a degree of belief in the relevant sentences.
 Our main tool for dealing with degrees of belief is probability theory.
 Probability assigns to each sentence a numerical degree of belief between 0 and 1.
Contd..
PROBABILITY
 Probabilities describe how likely an event is to occur; they are written as numbers between
0 and 1.
 0 indicates impossibility and 1 indicates certainty.
1. Tossing a coin
2. Rolling a dice
 Probability-based reasoning
 Understanding derived from knowledge
 How much uncertainty is present in that event
Contd..
PROBABILITY
 Probability provides a way of summarizing the uncertainty that comes from our laziness
and ignorance.
 Laziness – too many antecedents to list.
 Ignorance – no complete knowledge, lack of relevant facts or initial conditions, and
not all tests can be run.
 In the toothache problem, statistical data may give an 80% chance, i.e. a probability of 0.8, that the patient has a cavity if he
or she has a toothache.
 The 80% summarizes the cases in which the toothache really is due to a cavity.
 The missing 20% summarizes all the other possible causes of toothache that we are too lazy
or ignorant to confirm or deny.
Contd..
 Probabilities between 0 and 1 correspond to intermediate degrees of belief in the truth of
the sentence.
 The sentence itself is in fact either true or false.
 It is important to note that a degree of belief is different from a degree of truth.
 A probability of 0.8 does not mean “80% true” but rather an 80% degree of belief that is, a
fairly strong expectation.
 Thus, probability theory makes the same ontological commitment as logic, namely that
facts either do or do not hold in the world.
 Degree of truth, as opposed to degree of belief, is the subject of fuzzy logic.
Contd..
 In probability theory, a sentence such as
“The probability that the patient has a cavity is 0.8”.
 is about the agent’s beliefs, not directly about the world.
 Percepts provide the evidence on which probability statements are based.
 Every probability statement must indicate the evidence with respect to which the probability is being
assessed.
 If an agent receives new percepts, its probability assessments are updated to reflect the new
evidence.
Contd..
RANDOM VARIABLE
 A random variable refers to a “part” of the world whose “status” is initially unknown.
 We will use lowercase letters for the names of values.
P(a) = 1 – P(¬a)
P(a) + P(¬a) = 1
Tossing a coin: P(h) = 1 – P(¬h) : (0.5 = 1 – 0.5)
Rolling a die: P(n) = 1 – P(¬n) : (0.17 ≈ 1 – 0.83, for any particular face n)
Contd..
TYPES OF RANDOM VARIABLES
 Boolean Random Variable
 Cavity domain (true, false), if Cavity = true then cavity, or
 If Cavity = false then ¬cavity
 Discrete Random Variables – countable domain
 Weather might be (sunny, rainy, cloudy, snow)
 If Weather = cloudy, then ¬rainy, ¬sunny, ¬snow
 Continuous Random Variable
 Takes values from the real numbers, e.g. any value in an interval such as [0, 1]
Contd..
PRIOR PROBABILITY
 The unconditional or prior probability associated with a proposition a is the degree of
belief accorded to it in the absence of any other information; it is the probability of an event before
new evidence is collected.
 It is written as P(a).
 For example, if the prior probability that one has a cavity is 0.1, then we would write
P(Cavity = true) = 0.1 or P(cavity) = 0.1.
Correspondingly, P(Cavity = false) = P(¬cavity) = 0.9.
It is important to remember that P(a) can be used only when there is no other information.
P(Total =11) = P((5, 6)) + P((6, 5)) = 1/36 + 1/36 = 1/18.
Contd..
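As a quick check of the prior P(Total = 11) above, here is a minimal Python sketch (an illustration added to these notes, not part of the original slides) that enumerates the 36 equally likely outcomes of two fair dice:

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Prior probability that the total is 11: only (5, 6) and (6, 5) qualify.
p_total_11 = Fraction(sum(1 for d1, d2 in outcomes if d1 + d2 == 11), len(outcomes))

print(p_total_11)         # 1/18
print(float(p_total_11))  # ~0.0556
```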
PRIOR PROBABILITY…
 We will use an expression P(weather), which denotes a vector of values, for the
probabilities of each individual state of the weather.
P(weather = sunny) = 0.7
P(weather = rain) = 0.2
P(weather = cloudy) = 0.08
P(weather = snow) = 0.02
 We may simply write
P(Weather) = (0.7, 0.2, 0.08, 0.02)
 This statement defines a prior probability distribution for the random variable weather.
Contd..
CONDITIONAL PROBABILITY
 The conditional or posterior probabilities notation is P(a|b),
 Where a and b are any proposition.
 This is read as “the probability of a, given that all we know is b.”
 For example,
 P(cavity | toothache) = 0.8
 If a patient is observed to have a toothache and no other information is yet available, then
the probability of the patient's having a cavity will be 0.8.
Contd..
CONDITIONAL PROBABILITY…
 Conditional probabilities can be defined in terms of unconditional probabilities.
 The defining equation is
P(a|b) = P(a ∩ b) / P(b)
 i.e. the probability of event a, given that event b has already happened,
 whenever P(b) > 0.
 This equation can also be written as
P(a ∩ b) = P(a|b) × P(b)
 which is called the product rule.
Contd..
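To make the definition concrete, here is a minimal Python sketch using an illustrative joint distribution over two Boolean variables, Cavity and Toothache; the numbers are made-up assumptions chosen only so that the table sums to 1:

```python
# Illustrative joint distribution P(Cavity, Toothache); the numbers are
# made-up assumptions, chosen only so that the entries sum to 1.
joint = {
    (True, True): 0.12,   # cavity and toothache
    (True, False): 0.08,  # cavity, no toothache
    (False, True): 0.08,  # no cavity, toothache
    (False, False): 0.72, # neither
}

def p(event):
    """Probability of an event given as a predicate over (cavity, toothache)."""
    return sum(pr for outcome, pr in joint.items() if event(outcome))

p_toothache = p(lambda o: o[1])                      # P(toothache) = 0.20
p_cavity_and_toothache = p(lambda o: o[0] and o[1])  # P(cavity ∩ toothache) = 0.12

# Conditional probability: P(cavity | toothache) = P(cavity ∩ toothache) / P(toothache)
p_cavity_given_toothache = p_cavity_and_toothache / p_toothache
print(p_cavity_given_toothache)  # 0.12 / 0.20 = 0.6

# Product rule: P(cavity ∩ toothache) = P(cavity | toothache) × P(toothache)
assert abs(p_cavity_given_toothache * p_toothache - p_cavity_and_toothache) < 1e-12
```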
Need of probabilistic reasoning in AI
• When there are unpredictable outcomes.
• When the specifications or possibilities of predicates become too large to handle.
• When an unknown error occurs during an experiment.
• In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
• Bayes' rule
• Bayesian Statistics
BASIC AXIOMS OF PROBABILITY
 All probabilities are between 0 and 1. For any proposition a,
0 ≤ P(a) ≤ 1
 Necessarily true(i.e., valid) propositions have probability 1,
 Necessarily false (i.e., unsatisfiable) propositions have probability 0.
P(true) = 1 P(false) = 0.
 The probability of a disjunction is given by
P(a U b) = P(a) + P(b) - P(a ∩ b)
Contd..
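A quick Python check of the disjunction axiom on one roll of a fair die (a small illustration added here, not part of the original slides):

```python
from fractions import Fraction

# Sample space: one roll of a fair die.
omega = range(1, 7)

def p(event):
    """Probability of an event (a set of outcomes) under the uniform distribution."""
    return Fraction(len([w for w in omega if w in event]), len(omega))

a = {2, 4, 6}   # "roll is even"
b = {4, 5, 6}   # "roll is at least 4"

# Axiom for disjunction: P(a U b) = P(a) + P(b) - P(a ∩ b)
assert p(a | b) == p(a) + p(b) - p(a & b)
print(p(a | b))  # 4/6 = 2/3
```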
RECAP OF PREVIOUS LECTURE
 1. Axioms of probability
 0 ≤ P(a) ≤ 1
 2. P(true) = 1, P(false) = 0
 3. P(a U b) = P(a) + P(b) - P(a ∩ b)
 4. P(A = 1 | B = 1):
 The fraction of cases where A is true if B is true
[Figure: Venn diagrams illustrating P(A), P(B), A U B, and the conditional probability P(A|B); example values P(A) = 0.2, P(A|B) = 0.5]
 5. P(A,B) = P(A|B) × P(B)
 This is one of the most powerful rules in probabilistic
reasoning
 How can we use the axioms to prove that
P(a) = 1 – P(¬a)? (A short derivation is given at the end of this slide.)
 Prior probability – Degree of belief in an event, in the absence of any other information.
 P(rain tomorrow) = 0.8
 P(no-rain tomorrow) = 0.2
Conditional Probability:
 What is the probability of an event, given knowledge of another event
Example:
 P(raining | sunny)
 P(raining | cloudy)
 P(raining | cloudy, cold)
[Figure: pie chart showing P(rain tomorrow) = 0.8 and P(no-rain tomorrow) = 0.2]
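A short derivation of the complement rule asked for above, using only the axioms just listed (a and ¬a are mutually exclusive, and a U ¬a is necessarily true):
P(a U ¬a) = P(a) + P(¬a) − P(a ∩ ¬a)
1 = P(a) + P(¬a) − 0
P(a) = 1 − P(¬a)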
 In some cases, given knowledge of one or more random variables, we can improve upon
our prior belief about another random variable.
 For example:
 P(slept in movie) = 0.5
 P(slept in movie | liked movie) = 1/3 ≈ .33
 P(didn’t sleep in movie | liked movie) = 2/3 ≈ .66
PROBABILISTIC REASONING – BAYES RULE
 Bayes' theorem is also known as Bayes' rule, Bayes' law, or
Bayesian reasoning, which determines the probability of an
event with uncertain knowledge.
 In probability theory, it relates the conditional probability and
marginal probabilities of two random events.
 Bayes' theorem was named after the British mathematician
Thomas Bayes. The Bayesian inference is an application of
Bayes' theorem, which is fundamental to Bayesian statistics.
 It is a way to calculate the value of P(B|A) with the knowledge of
P(A|B).
BAYES RULE – Cont…
 Bayes' theorem allows updating the probability prediction of an event by observing new
information of the real world.
 Example: If the risk of cancer is related to one's age, then by using Bayes' theorem we can
determine the probability of cancer more accurately with the help of age.
 Bayes' theorem can be derived using the product rule and the conditional probability of event A
with known event B:
 From the product rule we can write:
P(A ∩ B) = P(A|B) P(B), or
 Similarly, the probability of event B with known event A:
P(A ∩ B)= P(B|A) P(A)
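Equating the two expressions for P(A ∩ B) and dividing both sides by P(B) (assuming P(B) > 0) gives Bayes' rule, labelled equation (a) on the next slide:
P(A|B) × P(B) = P(B|A) × P(A)
P(A|B) = P(B|A) × P(A) / P(B)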
BAYES RULE – Cont…
 Equation (a) below is called Bayes' rule or
Bayes' theorem. This equation is the basis of most
modern AI systems for probabilistic inference.
 It shows the simple relationship between joint and
conditional probabilities. Here,
P(A|B) = [ P(B|A) × P(A) ] / P(B)    -- (a)
Posterior = (Likelihood × Prior) / Marginal probability
 P(A|B) is known as the posterior, which we need to calculate;
it is read as the probability of hypothesis A given that
evidence B has occurred.
 P(B|A) is called the likelihood: assuming that the
hypothesis is true, we calculate the probability of the
evidence.
 P(A) is called the prior probability: the probability of the
hypothesis before considering the evidence.
 P(B) is called the marginal probability: the probability of the
evidence on its own.
BAYES RULE – Cont…
 In equation (a), in general, we can write P(B) = Σi P(Ai) × P(B|Ai), hence Bayes' rule can be
written as:
P(Ai|B) = P(Ai) × P(B|Ai) / Σk P(Ak) × P(B|Ak)
 where A1, A2, A3,........, An is a set of mutually exclusive and exhaustive events.
 Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A).
 This is very useful in cases where we have good estimates of these three terms and want to
determine the fourth one.
 Suppose we perceive as evidence the effect of some unknown cause and want to determine that cause;
then Bayes' rule becomes:
P(Cause|Effect) = P(Effect|Cause) × P(Cause) / P(Effect)
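A minimal Python sketch of Bayes' rule as written above; the prior and likelihood values below are hypothetical, chosen only for illustration:

```python
def bayes_posterior(prior, likelihood, likelihood_given_not):
    """Return P(hypothesis | evidence) via Bayes' rule, where the marginal
    P(evidence) is expanded over the hypothesis and its complement."""
    marginal = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / marginal

# Hypothetical numbers: P(disease) = 0.01, P(positive test | disease) = 0.9,
# P(positive test | no disease) = 0.05.
posterior = bayes_posterior(prior=0.01, likelihood=0.9, likelihood_given_not=0.05)
print(round(posterior, 4))  # ≈ 0.1538
```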
BAYES RULE – Example
Day Outlook Temperature Humidity Windy PlayTennis
D1 Sunny Hot High FALSE No
D2 Sunny Hot High TRUE No
D3 Overcast Hot High FALSE Yes
D4 Rainy Mild High FALSE Yes
D5 Rainy Cool Normal FALSE Yes
D6 Rainy Cool Normal TRUE No
D7 Overcast Cool Normal TRUE Yes
D8 Sunny Mild High FALSE No
D9 Sunny Cool Normal FALSE Yes
D10 Rainy Mild Normal FALSE Yes
D11 Sunny Mild Normal TRUE Yes
D12 Overcast Mild High TRUE Yes
D13 Overcast Hot Normal FALSE Yes
D14 Rainy Mild High TRUE No
Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True
BAYES RULE – Example
Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True
Today = {Sunny, Cool, High, True}
BAYES RULE – Cont…
Prior Probability:
 P(play tennis = Yes) = 9/14 ≈ 0.64
 P(play tennis = No) = 5/14 ≈ 0.36
Conditional Probability / Current Probability
Temperature Yes No
Mild 4/9 2/5
Hot 2/9 2/5
Cool 3/9 1/5
Humidity Yes No
High 3/9 4/5
Normal 6/9 1/5
Windy Yes No
True 3/9 3/5
False 6/9 2/5
Outlook Yes No
Overcast 4/9 0/5
Rainy 3/9 2/5
Sunny 2/9 3/5
BAYES RULE – Cont…
P(Yes | Today) =
P(Sunny Outlook | Yes) × P(Cool Temperature | Yes) ×
P(High Humidity | Yes) × P(True Wind| Yes) × P(Yes)
----------------------------------------------------------------------
P(Today)
Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True
P(No | Today) =
P(Sunny Outlook | No) × P(Cool Temperature | No) ×
P(High Humidity | No) × P(True Wind| No) × P(No)
----------------------------------------------------------------------
P(Today)
BAYES RULE – Cont…
P(Yes | Today) = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.00529
Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True
P(No | Today) = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.02057
P(Yes | Today) = 0.00529 / (0.00529 + 0.02057) = 0.00529 / 0.02586 = 0.20456
P(No | Today) = 0.02057 / (0.00529 + 0.02057) = 0.02057 / 0.02586 = 0.79543
These numbers can be converted into a probability by making the sum equal to 1
(normalization):
P(Yes | Today) + P(No | Today) = 1
BAYES RULE – Cont…
Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True
Since P(No | Today) > P(Yes | Today)
(0.79543 > 0.20456),
the prediction is that tennis would not be played: ‘No’.
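The whole calculation above can be reproduced with a short naïve Bayes sketch in Python (an illustration added to these notes, not part of the original slides); the counts come directly from the PlayTennis table:

```python
from collections import Counter, defaultdict

# PlayTennis data from the table: (Outlook, Temperature, Humidity, Windy) -> label
data = [
    (("Sunny", "Hot", "High", False), "No"),    (("Sunny", "Hot", "High", True), "No"),
    (("Overcast", "Hot", "High", False), "Yes"), (("Rainy", "Mild", "High", False), "Yes"),
    (("Rainy", "Cool", "Normal", False), "Yes"), (("Rainy", "Cool", "Normal", True), "No"),
    (("Overcast", "Cool", "Normal", True), "Yes"), (("Sunny", "Mild", "High", False), "No"),
    (("Sunny", "Cool", "Normal", False), "Yes"), (("Rainy", "Mild", "Normal", False), "Yes"),
    (("Sunny", "Mild", "Normal", True), "Yes"),  (("Overcast", "Mild", "High", True), "Yes"),
    (("Overcast", "Hot", "Normal", False), "Yes"), (("Rainy", "Mild", "High", True), "No"),
]

labels = Counter(label for _, label in data)      # prior counts: Yes = 9, No = 5
feature_counts = defaultdict(Counter)             # counts of each feature value per label
for features, label in data:
    for i, value in enumerate(features):
        feature_counts[(i, label)][value] += 1

def score(features, label):
    """Unnormalized naive Bayes score: P(label) * prod_i P(feature_i | label)."""
    s = labels[label] / sum(labels.values())
    for i, value in enumerate(features):
        s *= feature_counts[(i, label)][value] / labels[label]
    return s

today = ("Sunny", "Cool", "High", True)
scores = {label: score(today, label) for label in labels}
total = sum(scores.values())
for label, s in scores.items():
    print(label, round(s, 5), "->", round(s / total, 5))
# Yes 0.00529 -> ~0.20, No 0.02057 -> ~0.80  => predict 'No'
```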
BAYES RULE – Example
Outlook = Rainy, Temperature = Mild, Humidity = High, Wind = True
Today = {Rainy, Mild, High, True}
BAYES RULE – EXAMPLE 2
Color = Green, Legs = 2, Height = Tall, and Smelly = No
BAYES RULE – EXAMPLE 2
Prior Probability:
 P(M) = 4/8 = 0.50
 P(H) = 4/8 = 0.50
Conditional Probability / Current Probability
Color M H
White 2/4 3/4
Green 2/4 1/4
Leg M H
3 3/4 0/4
2 1/4 4/4
Height M H
Short 3/4 2/4
Tall 1/4 2/4
Smelly M H
Yes 3/4 1/4
No 1/4 3/4
BAYES RULE – EXAMPLE 2
P(M | New instances) =
P(M) × P(Color = Green | M) × P(Legs = 2 | M) × P(Height = Tall | M) ×
P(Smelly = No | M)
Color = Green, Legs = 2, Height = Tall, and Smelly = No
P(H | New instances) =
P(H) × P(Color = Green | H) × P(Legs = 2 | H) × P(Height = Tall | H) ×
P(Smelly = No | H)
P(M | New instances) = 4/8 × 2/4 × 1/4 × 1/4 × 1/4 = 0.00390625
P(H | New instances) = 4/8 × 1/4 × 4/4 × 2/4 × 3/4 = 0.046875
BAYES RULE – Cont…
Color = Green, Legs = 2, Height = Tall, and Smelly = No
P(M | New instances) = 0.00390625 / (0.00390625 + 0.046875) = 0.00390625 / 0.05078125 = 0.076923
P(H | New instances) = 0.046875 / (0.00390625 + 0.046875) = 0.046875 / 0.05078125 = 0.923076
These numbers can be converted into a probability by making the sum equal to 1
(normalization):
P(M | New instances) + P(H | New instances) = 1
BAYES RULE – Cont…
Color = Green, Legs = 2, Height = Tall, and Smelly = No
Since P(H | New instances) > P(M | New instances)
(0.923076 > 0.076923),
the new instance belongs to species H.
BAYES RULE – EXAMPLE 2
New Instance = {Color = Red, Type = SUV, Origin = Domestic}, Stolen = ?
BAYES RULE – EXAMPLE 2
Prior Probability:
 P(Yes) = 5/10 = 0.50
 P(No) = 5/10 = 0.50
Conditional Probability / Current Probability
Color Yes No
Red 3/5 2/5
Yellow 2/5 3/5
Type Yes No
Sports 4/5 2/5
SUV 1/5 3/5
Origin Yes No
Domestic 2/5 3/5
Imported 3/5 2/5
BAYES RULE – EXAMPLE 2
P(Yes | New instances) =
P(Yes) × P(Color = Red | Yes) ×
P(Type = SUV | Yes) × P(Origin = Domestic | Yes)
P(No | New instances) =
P(No) × P(Color = Red | No) ×
P(Type = SUV | No) × P(Origin = Domestic | No)
Color = Red, Type = SUV, Origin = Domestic
P(Yes | New instances) = 5/10 × 3/5 × 1/5 × 2/5 = 0.024
P(No | New instances) = 5/10 × 2/5 × 3/5 × 3/5 = 0.072
BAYES RULE – Cont…
Color = Red, Type = SUV, Origin = Domestic
P(Yes | New instances) = 0.024 / (0.024 + 0.072) = 0.024 / 0.096 = 0.25
P(No | New instances) = 0.072 / (0.024 + 0.072) = 0.072 / 0.096 = 0.75
These numbers can be converted into a probability by making the sum equal to 1
(normalization):
P(Yes | New instances) + P(No | New instances) = 1
BAYES RULE – Cont…
Color = Red, Type = SUV, Origin = Domestic
Since P(No | New instances) > P(Yes | New instances)
(0.75 > 0.25),
the new instance is classified as not stolen.
Bayesian Network
 Joint probability distribution
 Bayesian networks with examples
 Semantics of Bayesian networks
 Representing the full joint distribution
 A method for constructing Bayesian network
 Compactness and node ordering
 Conditional independence relation in Bayesian networks.
JPD – cont….
 The full joint probability distribution specifies a probability for every possible assignment of values to the random
variables.
 It is usually too large to create or use in its explicit form.
 The joint probability distribution of two Boolean variables X and Y is shown below.
 A joint probability distribution over n Boolean variables requires 2^n entries, one for each possible
combination of values.
Joint Probabilities X X’
Y 0.20 0.12
Y’ 0.65 0.03
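A short Python sketch over exactly this 2×2 joint table, showing how marginals and conditionals are read off the full joint distribution:

```python
# Full joint distribution over two Boolean variables X and Y (table above).
joint = {
    (True, True): 0.20,   # X, Y
    (False, True): 0.12,  # X', Y
    (True, False): 0.65,  # X, Y'
    (False, False): 0.03, # X', Y'
}

# Marginals are obtained by summing out the other variable.
p_x = sum(p for (x, _), p in joint.items() if x)   # P(X) = 0.85
p_y = sum(p for (_, y), p in joint.items() if y)   # P(Y) = 0.32

# A conditional is a ratio of joint to marginal: P(X | Y) = P(X, Y) / P(Y).
p_x_given_y = joint[(True, True)] / p_y            # 0.20 / 0.32 = 0.625
print(p_x, p_y, p_x_given_y)
```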
Drawbacks of Joint Probability Distributions
 The number of entries grows rapidly with the number of variables.
 Time and space complexity are huge.
 Statistical estimation of so many probabilities is difficult.
 Humans tend to single out only a few propositions.
 The alternative is Bayesian networks.
Bayesian Networks
 A Bayesian network is also called a belief network or a
probabilistic network.
 The extension of a Bayesian network is called a decision network or influence diagram.
 A Bayesian network represents the dependencies among variables and gives a concise
specification of any full joint probability distribution.
 A Bayesian network is a directed graph in which the nodes are variables and the edges are
dependency relations.
BAYESIAN NETWORKS
 A Bayesian network is a directed graph in which each node is annotated with quantitative
probability information.
 The full specification is as follows:
 A set of random variables makes up the nodes of the network. Variables may be discrete or
continuous.
 A set of directed links or arrows connects pairs of nodes. If there is an arrow from node
X to node Y, X is a parent of Y.
 Each node X has a conditional probability distribution P(X | Parents(X)) that
quantifies the effect of the parents on the node.
 The graph is a Directed Acyclic Graph (DAG) – it has no directed cycles.
X → Y
BAYESIAN NETWORK - EXAMPLE
 A & B are unconditional, independent, evidence and parent nodes.
 C & D are conditional, dependent, hypothesis and child nodes.
Contd..
[Figure: network with edges A → C, A → D, B → D]
BAYESIAN NETWORK – EXAMPLE cont…
P(A, B, C, D) = P(D|A,B) × P(C|A) × P(B) × P(A)
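A minimal sketch of this factorization in Python; the CPT values below are hypothetical, chosen only to illustrate how the product is evaluated:

```python
# Hypothetical CPTs for the network A -> C, A -> D, B -> D (numbers are made up).
P_A = {True: 0.3, False: 0.7}
P_B = {True: 0.6, False: 0.4}
P_C_given_A = {True: 0.8, False: 0.1}                        # P(C = true | A)
P_D_given_AB = {(True, True): 0.9, (True, False): 0.5,
                (False, True): 0.4, (False, False): 0.05}    # P(D = true | A, B)

def joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) = P(D|A,B) * P(C|A) * P(B) * P(A)."""
    p_c = P_C_given_A[a] if c else 1 - P_C_given_A[a]
    p_d = P_D_given_AB[(a, b)] if d else 1 - P_D_given_AB[(a, b)]
    return p_d * p_c * P_B[b] * P_A[a]

print(joint(True, True, True, False))  # P(a, b, c, ¬d) = 0.1 * 0.8 * 0.6 * 0.3 = 0.0144
```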
Contd..
Bayesian Network – Burglar Alarm
 You have installed a new burglar alarm at home.
 It is fairly reliable at detecting a burglary, but also responds on occasion to minor
earthquakes.
 You also have two neighbors, John and Mary, who have promised to call you at work
when they hear the alarm.
 John nearly always calls when he hears the alarm, but sometimes confuses the telephone ringing
with the alarm and calls then, too.
 Mary, on the other hand, likes loud music and sometimes misses the alarm altogether.
 Given the evidence of who has or has not called, we would like to estimate the probability
of a burglary.
 Burglary and earthquakes directly affect the probability of the alarm’s going off.
 Whether John and Mary call depends only on the alarm.
 The network does not have nodes for Mary’s currently listening to loud music or for the
telephone ringing and confusing John.
EXAMPLE
 We can calculate the probability that the alarm has sounded, but neither a burglary
nor an earthquake has occurred, and both John and Mary call:
P(j ∩ m ∩ a ∩ ¬b ∩ ¬e)
P(j, m, a, ¬ b, ¬ e) = P(j | a) × P(m| a) × P(a | ¬ b ∧ ¬ e) × P( ¬ b) × P( ¬ e)
= 0.90 × 0.70 × 0.001 × 0.999 × 0.998
= 0.000628
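This product can be checked directly in Python using the CPT values of the burglary network quoted in these slides (P(b) = 0.001, P(e) = 0.002, P(a | ¬b, ¬e) = 0.001, P(j | a) = 0.90, P(m | a) = 0.70):

```python
# CPT values for the burglary network, as used in the calculations on these slides.
P_b, P_e = 0.001, 0.002
P_j_given_a, P_m_given_a = 0.90, 0.70
P_a_given_not_b_not_e = 0.001

# P(j, m, a, ¬b, ¬e) = P(j|a) * P(m|a) * P(a|¬b,¬e) * P(¬b) * P(¬e)
p = P_j_given_a * P_m_given_a * P_a_given_not_b_not_e * (1 - P_b) * (1 - P_e)
print(round(p, 6))  # ≈ 0.000628
```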
What is the probability that John calls?
P(j) = P(j|a) × P(a) + P(j|¬a) × P(¬a)
where P(a) = P(a|b,e) P(b) P(e) + P(a|¬b,e) P(¬b) P(e) + P(a|b,¬e) P(b) P(¬e) + P(a|¬b,¬e) P(¬b) P(¬e)
P(a) = 0.95 × 0.001 × 0.002 + 0.29 × 0.999 × 0.002 + 0.94 × 0.001 × 0.998 + 0.001 × 0.999 × 0.998
     = 0.002516442
P(¬a) = 1 − P(a) = 0.997483558
P(j) = 0.90 × 0.002516442 + 0.05 × 0.997483558
     = 0.0022647978 + 0.0498741779
     = 0.0521389757
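The same enumeration in Python, using the burglary-network CPT values quoted above:

```python
# Burglary network CPTs (values quoted in the slides).
P_b, P_e = 0.001, 0.002
P_a_given = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}   # P(a = true | b, e)
P_j_given_a = {True: 0.90, False: 0.05}

# P(a): sum over the four (b, e) combinations.
p_a = sum(P_a_given[(b, e)]
          * (P_b if b else 1 - P_b)
          * (P_e if e else 1 - P_e)
          for b in (True, False) for e in (True, False))

# P(j) = P(j|a) P(a) + P(j|¬a) P(¬a)
p_j = P_j_given_a[True] * p_a + P_j_given_a[False] * (1 - p_a)
print(round(p_a, 9), round(p_j, 6))  # 0.002516442, ≈ 0.052139
```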
What is the probability of burglary given that John and Mary call?
P(b | j, m) = α P(b) Σe P(e) Σa P(a | b, e) P(j | a) P(m | a)
= α × 0.001 × [0.90 × 0.70 × {0.95 × 0.002 + 0.94 × 0.998} + 0.05 × 0.01 × {0.05 × 0.002 + 0.06 × 0.998}]
= α × 0.001 × [0.63 × {0.0019 + 0.93812} + 0.0005 × {0.0001 + 0.05988}]
= α × 0.001 × [0.63 × 0.94002 + 0.0005 × 0.05998]
= α × 0.001 × [0.5922126 + 0.00002999]
= α × 0.001 × 0.59224259
= α × 0.00059224259
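The full enumeration, including the normalization constant α, can be written as a short Python sketch; it reproduces the posterior P(Burglary | j, m) ≈ (0.284, 0.716) quoted later in these slides:

```python
from itertools import product

# Burglary network CPTs (standard values used throughout these slides).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(a = true | b, e)
P_J = {True: 0.90, False: 0.05}                       # P(j = true | a)
P_M = {True: 0.70, False: 0.01}                       # P(m = true | a)

def unnormalized(b):
    """Sum over the hidden variables E and A of P(b, j, m, e, a) with j = m = true."""
    total = 0.0
    for e, a in product((True, False), repeat=2):
        p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
        total += P_B[b] * P_E[e] * p_a * P_J[a] * P_M[a]
    return total

scores = {b: unnormalized(b) for b in (True, False)}
alpha = 1.0 / sum(scores.values())
print({b: round(alpha * s, 3) for b, s in scores.items()})  # {True: 0.284, False: 0.716}
```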
 In the burglary network, the topology shows that
 Burglary and Earthquake directly affect the probability of the alarm,
 while JohnCalls and MaryCalls depend only on the alarm.
 Our assumptions encoded in the network are that
 the neighbors do not perceive any burglaries directly,
 they do not notice minor earthquakes, and
 they do not confer before calling.
 Notice that the network does not have nodes corresponding to
 Mary currently listening to loud music, or
 the telephone ringing and confusing John.
 These factors are summarized in the uncertainty associated with the links from Alarm to
JohnCalls and MaryCalls.
 This shows both laziness and ignorance in operation.
Semantics of Bayesian Network
 An entry in the joint distribution is the probability of a conjunction of particular assignments to
each variable, such as
P(X1 = x1 ∧ X2 = x2 ∧ … ∧ Xn = xn)
 The semantics of a Bayesian network defines this entry as the product
P(x1, x2, . . . , xn) = Πi P(xi | Parents(Xi))
 Xi is a random variable and xi is a value of Xi.
METHOD FOR CONSTRUCTING BAYESIAN NETWORK
 Rewrite the joint distribution in terms of a conditional probability, using the product
rule.
 Then we repeat the process, reducing each conjunctive probability to a conditional
probability and a smaller conjunction. We end up with one big product.
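A sketch of the derivation just described, in the notation used on these slides (this is the chain rule):
P(x1, …, xn) = P(xn | xn−1, …, x1) × P(xn−1, …, x1)
= P(xn | xn−1, …, x1) × P(xn−1 | xn−2, …, x1) × … × P(x2 | x1) × P(x1)
= Πi P(xi | xi−1, …, x1)
If the nodes are ordered so that each variable appears after its parents, then P(xi | xi−1, …, x1) = P(xi | Parents(Xi)), which gives the Bayesian network factorization Πi P(xi | Parents(Xi)).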
COMPACTNESS AND NODE ORDERING
 The compactness of a Bayesian network is an example of a general property of locally
structured systems (also called sparse systems, in which each component interacts directly with only a
few of the others).
 In a locally structured system, each subcomponent interacts directly with only a bounded
number of other components, regardless of the total number of components.
 Therefore the correct order in which to add nodes is to add the ‘root causes’ first, then the
variables they influence, and so on, until we reach the leaves.
Suppose we decide to add the nodes in the order MaryCalls, JohnCalls,
Alarm, Burglary, Earthquake:
 Adding MaryCalls: No parents.
 Adding JohnCalls: If Mary calls, that probably means the alarm has gone off, which of course would make it more likely
that John calls. Therefore, JohnCalls needs MaryCalls as a parent.
 Adding Alarm: Clearly, if both call, it is more likely that the alarm has gone off than if just one or neither calls, so we
need both MaryCalls and JohnCalls as parents.
 Adding Burglary: If we know the alarm state, then the call from John or Mary might give us information about our
phone ringing or Mary’s music, but not about burglary:
P(Burglary | Alarm, JohnCalls ,MaryCalls) = P(Burglary | Alarm) .
 Hence we need just Alarm as parent.
 Adding Earthquake: If the alarm is on, it is more likely that there has been an earthquake. (The alarm is an earthquake
detector of sorts.) But if we know that there has been a burglary, then that explains the alarm, and the probability of an
earthquake would be only slightly above normal. Hence, we need both Alarm and Burglary as parents.
CONDITIONAL INDEPENDENCE RELATIONS IN BAYESIAN
NETWORKS
 A node is conditionally independent of its non-descendants, given its parents.
 Example:
 JohnCalls is conditionally independent of Burglary and Earthquake, given its parent Alarm.
10/23/2024 76
 A node is conditionally independent of all other nodes in the network, given its parents,
children, and children’s parents that is, given its Markov blanket.
 Example:
 Burglary is conditionally independent of JohnCalls and MaryCalls, given Alarm and Earthquake (its Markov blanket).
10/23/2024 77
BAYESIAN INFERENCE
 The task of a probabilistic inference system is to compute the posterior probability distribution for a set
of query variables, given some observed event,
 that is, some assignment of values to a set of evidence variables.
10/23/2024 78
BAYESIAN INFERENCE – Notations
 X – denotes the query variables.
 E – set of evidence variables {E1, …, Em}
 e – particular observed event.
 Y – non-evidence, non-query variables, Y1,…, Yn. (called the hidden variables)
 The complete set of variables is {X} ∪ E ∪ Y
 A typical query asks for the posterior probability distribution P(X | e)
10/23/2024 79
Inferences in Bayesian network - Burglary Alarm
 In the burglary network, we might observe the event in which
JohnCalls = True and MaryCalls = True
 We could then ask for, say, the probability that a burglary has occurred:
P(Burglary | JohnCalls = true, MaryCalls = true) = (0.284, 0.716)
 Burglary – query variable
 JohnCalls and MaryCalls – evidence variables
 Alarm – hidden variable
10/23/2024 80
TYPES OF INFERENCES
 Inference by enumeration – inference by listing all the relevant joint entries and summing them.
 Inference by variable elimination – inference that removes (sums out) variables one at a time, reusing intermediate results.
10/23/2024 81
INFERENCE BY ENUMERATIONS
 Any conditional probability can be computed by summing terms from the full joint
distribution.
 More specifically, a query P(X|e) can be answered using the equation:
P(X|e) = α P(X, e) = α Σy P(X, e, y)
 where α is a normalizing constant
 X – the query variable
 e – the observed event (values of the evidence variables E)
 y – the values of the hidden variables Y
10/23/2024 82
INFERENCE BY ENUMERATIONS - Example
 Consider the query P(Burglary | JohnCalls = true, MaryCalls = true).
 Burglary – query variable (X)
 JohnCalls – evidence variable 1 (E1)
 MaryCalls – evidence variable 2 (E2)
 The hidden variables of this query are Earthquake and Alarm.
10/23/2024 83
 From the equation above, using initial letters for the variables to shorten the expressions, we have
P(B | j, m) = α P(B, j, m) = α Σe Σa P(B, j, m, e, a)
 The semantics of Bayesian networks then gives us an expression in terms of
CPT entries. For simplicity, we do this just for Burglary = true:
P(b | j, m) = α Σe Σa P(b) P(e) P(a | b, e) P(j | a) P(m | a)
 P(b) – parent (independent) variable
 P(e) – parent (independent) variable
 P(a | b, e) – the hidden variable Alarm, which depends on Burglary and Earthquake
 P(j | a), P(m | a) – evidence variables, which depend on Alarm
10/23/2024 84
INFERENCE BY VARIABLES ELIMINATION
 The enumeration algorithm can be improved substantially by eliminating repeated
calculations.
 The idea is simple: do each calculation once and save the result for later use. This is a
form of dynamic programming.
10/23/2024 85
INFERENCE BY VARIABLES ELIMINATION
 Variable elimination works by evaluating expressions such as the one derived earlier for inference by enumeration:
P(B | j, m) = α Σe Σa P(B) P(e) P(a | B, e) P(j | a) P(m | a)
 Moving the summations inward so that each sub-expression is computed only once, the repeated variables are separated out:
P(B | j, m) = α P(B) Σe P(e) Σa P(a | B, e) P(j | a) P(m | a)
10/23/2024 86
INFERENCE BY VARIABLES ELIMINATION
 Intermediate results are stored, and the summation over each variable is done only for those
portions of the expression that depend on that variable.
 Let us illustrate this process for the burglary network.
 We evaluate the expression
P(B | j, m) = α f1(B) × Σe f2(E) × Σa f3(A, B, E) × f4(A) × f5(A)
 We have annotated each part of the expression with the name of the associated variable;
these parts are called factors.
 P(B) → f1(B)   P(E) → f2(E)   P(a|B,E) → f3(A, B, E)
 P(j|a) → f4(A)   P(m|a) → f5(A)
10/23/2024 87
 For example, the factors f4(A) and f5(A), corresponding to P(j | a) and P(m | a), depend
just on A because J and M are fixed by the query.
 They are therefore two-element vectors.
10/23/2024 88
INFERENCE BY VARIABLES ELIMINATION - EXAMPLE
 Given two factors f1(A, B) and f2(B, C) with the probability distributions shown below, the
pointwise product f1 × f2 = f3(A, B, C) has 2^(1+1+1) = 8 entries.
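A minimal sketch of the pointwise product of two factors over Boolean variables; the probability entries below are hypothetical, since the slide's factor tables are not reproduced in this text:

```python
from itertools import product

# Hypothetical factors over Boolean variables (entries are made up for illustration).
f1 = {(True, True): 0.3, (True, False): 0.7,
      (False, True): 0.9, (False, False): 0.1}   # f1(A, B)
f2 = {(True, True): 0.2, (True, False): 0.8,
      (False, True): 0.6, (False, False): 0.4}   # f2(B, C)

# Pointwise product f3(A, B, C) = f1(A, B) * f2(B, C): 2^(1+1+1) = 8 entries.
f3 = {(a, b, c): f1[(a, b)] * f2[(b, c)]
      for a, b, c in product((True, False), repeat=3)}
print(len(f3))  # 8

# Summing out B afterwards gives a factor over (A, C), as variable elimination does.
f_ac = {(a, c): sum(f3[(a, b, c)] for b in (True, False))
        for a, c in product((True, False), repeat=2)}
```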
10/23/2024 89
10/23/2024 90

More Related Content

PDF
artificial intelligence 13-quantifying uncertainity.pdf
PPTX
Probability in artificial intelligence.pptx
PPTX
QUANTIFYING UNCERTAINTY .....................
PDF
AI CHAPTER 7.pdf
PPT
Artificial Intelligence Bayesian Reasoning
PPT
Earthquake dhnjggbnkkkknvcxsefghjk gyjhvcdyj
PPTX
Uncertainty in computer agent - Copy.pptx
PDF
Probabilistic Reasoning bayes rule conditional .pdf
artificial intelligence 13-quantifying uncertainity.pdf
Probability in artificial intelligence.pptx
QUANTIFYING UNCERTAINTY .....................
AI CHAPTER 7.pdf
Artificial Intelligence Bayesian Reasoning
Earthquake dhnjggbnkkkknvcxsefghjk gyjhvcdyj
Uncertainty in computer agent - Copy.pptx
Probabilistic Reasoning bayes rule conditional .pdf

Similar to Artificial Intelligence and Machine Learning (20)

PDF
PPTX
Module5_chapter1PPT (1).pptxhdhrjgrrjjrjrdbjejej
PPTX
AI_Probability.pptx
PPT
AI Lecture 7 (uncertainty)
PPT
Your score increaseAdd more information to your s as yo.ppt
PPTX
Russel Norvig Uncertainity - chap 13.pptx
PPTX
artificial intelligence and uncertain reasoning
PDF
Uncertain knowledge and reasoning
PPTX
Lesson04-Uncertainty - Pt. 1 Probabilistic Methods.pptx
PPTX
CS3491-Unit-2 Uncertainty.pptx
PPT
Uncertainity
PPT
Uncertainty
PPTX
Chapter 13
PDF
AI_7 Statistical Reasoning
PDF
Artificial Intelligence Chap.5 : Uncertainty
PPTX
22PCOAM11 Session 22 Acting under uncertainty.pptx
PPTX
Uncertainty in AI
PPTX
moduledeeplearning Reasoning_Methods.pptx
PPTX
Uncertain Knowledge and Reasoning in Artificial Intelligence
PDF
13-uncertainty.pdf
Module5_chapter1PPT (1).pptxhdhrjgrrjjrjrdbjejej
AI_Probability.pptx
AI Lecture 7 (uncertainty)
Your score increaseAdd more information to your s as yo.ppt
Russel Norvig Uncertainity - chap 13.pptx
artificial intelligence and uncertain reasoning
Uncertain knowledge and reasoning
Lesson04-Uncertainty - Pt. 1 Probabilistic Methods.pptx
CS3491-Unit-2 Uncertainty.pptx
Uncertainity
Uncertainty
Chapter 13
AI_7 Statistical Reasoning
Artificial Intelligence Chap.5 : Uncertainty
22PCOAM11 Session 22 Acting under uncertainty.pptx
Uncertainty in AI
moduledeeplearning Reasoning_Methods.pptx
Uncertain Knowledge and Reasoning in Artificial Intelligence
13-uncertainty.pdf
Ad

More from ssuser1ecccc (20)

PPTX
Biometrics and its applications in Medical
PDF
DIP-Introduction Lecture 13-10-14 image analysis
PDF
Chap_9_Representation_and_Description.pdf
PPTX
Biometrics and its applications for system Analysis
PPTX
Biometrics and its applications for system
PDF
Project formulation in image analysis in various methods
PPTX
2 R ladder architecture analysis for image
DOCX
Existing method used for analysis of images
DOCX
Histogram analysis of image using wavelet
DOCX
Structure_of_How_to_Write_a_Journal.docx
PPTX
pixelrelationships in image processing enhancement
PPTX
Knowledge representation and reasoning in AI
PPTX
Intro class. U1 M1.pptx
PPT
NGP BME Final.ppt
PPTX
ES.pptx
PPTX
TASK Sixth Sensor Technology.pptx
PPTX
4.-personal-productive-equipment.pptx
PPTX
5.-accident-causation-theories-accident-reporting.pptx
PPTX
unit-3-osha-hcs-dot.pptx
PPTX
unit-iv-facility-safety.pptx
Biometrics and its applications in Medical
DIP-Introduction Lecture 13-10-14 image analysis
Chap_9_Representation_and_Description.pdf
Biometrics and its applications for system Analysis
Biometrics and its applications for system
Project formulation in image analysis in various methods
2 R ladder architecture analysis for image
Existing method used for analysis of images
Histogram analysis of image using wavelet
Structure_of_How_to_Write_a_Journal.docx
pixelrelationships in image processing enhancement
Knowledge representation and reasoning in AI
Intro class. U1 M1.pptx
NGP BME Final.ppt
ES.pptx
TASK Sixth Sensor Technology.pptx
4.-personal-productive-equipment.pptx
5.-accident-causation-theories-accident-reporting.pptx
unit-3-osha-hcs-dot.pptx
unit-iv-facility-safety.pptx
Ad

Recently uploaded (20)

PDF
RMMM.pdf make it easy to upload and study
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PPTX
PPH.pptx obstetrics and gynecology in nursing
PPTX
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PPTX
master seminar digital applications in india
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PPTX
Cell Types and Its function , kingdom of life
PDF
VCE English Exam - Section C Student Revision Booklet
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PPTX
Institutional Correction lecture only . . .
PDF
Basic Mud Logging Guide for educational purpose
PPTX
Pharma ospi slides which help in ospi learning
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PDF
Microbial disease of the cardiovascular and lymphatic systems
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
RMMM.pdf make it easy to upload and study
STATICS OF THE RIGID BODIES Hibbelers.pdf
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PPH.pptx obstetrics and gynecology in nursing
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
human mycosis Human fungal infections are called human mycosis..pptx
Supply Chain Operations Speaking Notes -ICLT Program
master seminar digital applications in india
Abdominal Access Techniques with Prof. Dr. R K Mishra
Cell Types and Its function , kingdom of life
VCE English Exam - Section C Student Revision Booklet
O5-L3 Freight Transport Ops (International) V1.pdf
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Institutional Correction lecture only . . .
Basic Mud Logging Guide for educational purpose
Pharma ospi slides which help in ospi learning
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
Microbial disease of the cardiovascular and lymphatic systems
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
3rd Neelam Sanjeevareddy Memorial Lecture.pdf

Artificial Intelligence and Machine Learning

  • 1. Acting under uncertainty – Bayesian inference – naïve bayes models. Probabilistic reasoning – Bayesian networks – exact inference in BN – approximate inference in BN – causal networks. Dr. N.G.P. INSTITUTE OF TECHNOLOGY – COIMBATORE - 48 (An Autonomous Institution) CS3491 ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING Dr. B. Dhiyanesh Associate Professor / CSE UNIT II PROBABILISTIC REASONING
  • 2. 10/23/2024 2 Recap of Previous lecture
  • 4. 10/23/2024 4 UNCERTAINTY  Uncertainty  Review of probability  Probabilistic reasoning – Bayes rule  Bayesian networks  Inferences in Bayesian network Contd..
  • 5. ACTING UNDER UNCERTAINTY • Agents may need to handle uncertainty, whether due to partial observability, nondeterminism, or a combination of the two. An agent may never know for certain what state it’s in or where it will end up after a sequence of actions. • With this knowledge representation, we might write A→B, which means if A is true then B is true, but consider a situation where we are not sure about whether A is true or not then we cannot express this statement, this situation is called uncertainty. • So to represent uncertain knowledge, where we are not sure about the predicates, we need uncertain reasoning or probabilistic reasoning.
  • 6. CAUSES OF UNCERTAINTY • Following are some leading causes of uncertainty to occur in the real world. • Information occurred from unreliable sources. • Experimental Errors • Equipment fault • Temperature variation • Climate change. • When interpreting partial sensor information, a logical agent must consider every logically possible explanation for the observations, no matter how unlikely. This leads to impossible large and complex belief-state representations. • Sometimes there is no plan that is guaranteed to achieve the goal—yet the agent must act. It must have some way to compare the merits of plans that are not guaranteed.
  • 7. • Suppose, for example, that an automated taxi! • Automated has the goal of delivering a passenger to the airport on time. • The agent forms a plan, A90, • that involves leaving home 90 minutes before the flight departs and driving at a reasonable speed. • Even though the airport is only about 5 miles away, • a logical taxi agent will not be able to conclude with certainty that “Plan A90 will get us to the airport in time.” • Instead, it reaches the weaker conclusion • “Plan A90 will get us to the airport in time, • as long as the car doesn’t break down or run out of gas • I don’t get into an accident, and there are no accidents on the bridge • The plane doesn’t leave early, and no meteorite hits the car.
  • 8. • Nonetheless, in some sense A90 is in fact the right thing to do. • A90 is expected to maximize the agent’s performance measure (where the expectation is relative to the agent’s knowledge about the environment). • The performance measure includes • getting to the airport in time for the flight • avoiding a long • unproductive wait at the airport • avoiding speeding tickets along the way • The agent’s knowledge cannot guarantee any of these outcomes for A90, but it can provide some degree of belief that they will be achieved. • Other plans, such as A180, might increase the agent’s belief that it will get to the airport on time, but also increase the likelihood of a long wait.
  • 9. • Trying to use logic to cope with a domain like medical diagnosis thus fails for three main reasons: • Laziness: It is too much work to list the complete set of backgrounds or consequents needed to ensure an exceptionless rule and too hard to use such rules. • Theoretical ignorance: Medical science has no complete theory for the domain. • Practical ignorance: Even if we know all the rules, we might be uncertain about a particular patient because not all the necessary tests have been or can be run.
  • 10. 10/23/2024 10 UNCERTAINTY  When an agent knows enough facts about its environment, the logical plans and actions produces a guaranteed work.  Unfortunately, agents never have access to the whole truth about their environment. Agents act under uncertainty. Contd..
  • 11. 10/23/2024 11 UNCERTAINTY The Diagnosis:  Medicine, Automobile repair, or what ever is at ask that almost always involves uncertainty.  Let us try to write rules for dental diagnosis using first order logic, so that we can see how the logical approach breaks down. Consider the following rule. ∀p symptom(p, toothache)  Disease(p, cavity).  The problem is that this rule is wrong.  Not all patients with tooth aches have cavities; some of them have gum disease, swelling, or one of several other problems. ∀p symptom(p, toothache)  Disease(p, cavity) V Disease(p, gumdisease) V Disease(p, swelling) Contd..
  • 12. 10/23/2024 12 UNCERTAINTY  To make the rule true, we have to add almost unlimited list of possible causes.  We could try a causal rule: ∀p Disease (p, cavity)  symptom(p, toothache).  But this rule is also not right either, not all cavities cause pain.  Toothache and a cavity are unconnected, so the judgement may go wrong. Contd..
  • 13. 10/23/2024 13 NATURE OF UNCERTAIN KNOWLEDGE  This is a type of the medical domain, as well as most other judgmental domains: Law, business, design, automobile repair, gardening, and so on.  The agent take action, only a degree of belief in the relevant sentences.  Our main tool for dealing with degrees of belief will be probability theory.  The probability assigns to each sentence a numerical degree of belief between 0 and 1. Contd..
  • 14. 10/23/2024 14 PROBABILITY  Probabilities are used to compute the truth of given statement, written as numbers between 0 and 1, that describes how likely an event is to occur.  0 indicated impossibility and 1 indicates certainly. 1. Tossing a coin 2. Rolling a dice  Probability based reasoning  Understanding from knowledge  How much of uncertainty present in that event. Contd..
  • 15. 10/23/2024 15 PROBABILITY  Probability provides a way of summarizing the uncertainty, that comes from our laziness and ignorance.  Laziness means – too many antecedent  Ignorance means – No complete knowledge, Lack of relevant fact, initial conditions and not all test can run.  Toothache problem, an 80 % chance, a probability of 0.8 that the patient has a cavity if he or she has a toothache by statistical data.  The 80 % summarizes those cases, but both toothache and cavity are unconnected.  The missing 20% summarizes, all other possible causes of toothache, that we are too lazy or ignorant to confirm or deny. Contd..
  • 16. 10/23/2024 16  Probabilities between 0 and 1 correspond to intermediate degrees of belief in the truth of the sentence.  The sentence itself is in fact either true or false.  It is important to note that a degree of belief is different from a degree of truth.  A probability of 0.8 does not mean “80% true” but rather an 80% degree of belief that is, a fairly strong expectation.  Thus, probability theory makes the same ontological commitment as logic namely, that facts either do or do not hold in the world.  Degree of truth, as opposed to degree of belief, is the subject of fuzzy logic. Contd..
  • 17. 10/23/2024 17  In probability theory, a sentence such as “The probability that the patient has a cavity is 0.8”.  Is about the agent’s beliefs not directly about the world.  These percepts create the evidence, which are based on probability statements.  All probability statement must indicate the evidence with respect to that probability is begin assessed.  If an agent receives new percepts, its probability assessments are updated to reflect the new evidence. Contd..
  • 18. 10/23/2024 18 RANDOM VARIABLE  Referring to a “part” of the world, whose “status” is initially unknown.  We will use lowercase for the names of values P(a) = 1 – P(¬a) P(a) + P(¬a) = 1 Tossing coin: P(h) = 1 – P(¬h) : (0.5 = 1 – 0.5) Rolling dice: P(n) = 1 – P(¬n) : (0.16 = 1 – 0.84) Contd..
  • 19. 10/23/2024 19 TYPES OF RANDOM VARIABLES  Boolean Random Variable  Cavity domain (true, false), if Cavity = true then cavity, or  If Cavity = false then ¬cavity  Discrete Random Variables – countable domain  Weather might be(sunny, rainy, cloudy, snow)  Weather = cloudy then ¬rainy, ¬sunny, ¬snow  Continuous Random variable  Finite set real numbers with equal intervals e.g. Internal (0.1) Contd..
  • 20. 10/23/2024 22 PRIOR PROBABILITY  The unconditional or prior probability associated with a proposition a, is the degree of belief according the absence of any other information. Is the probability of an event before new data is collected.  It is written as P(a).  For example, if the prior probability that one have a cavity is 0.1, then we would write P(Cavity = true) = 0.1 or P(cavity) = 0.1. P(¬Cavity = false) It is important to remember that P(a) can be used only when there is no other information. P(Total =11) = P((5, 6)) + P((6, 5)) = 1/36 + 1/36 = 1/18. Contd..
  • 21. 10/23/2024 23 PRIOR PROBABILITY…  We will use an expression P(weather), which denotes a vector of values, for the probabilities of each individual state of the weather. P(weather = sunny) = 0.7 P(weather = rain) = 0.2 P(weather = cloudy) = 0.08 P(weather = snow) = 0.02  We may simply write P(Weather) = (0.7, 0.2, 0.08, 0.02)  This statement defines a prior probability distribution for the random variable weather. Contd..
  • 22. 10/23/2024 24 CONDITIONAL PROBABILITY  The conditional or posterior probabilities notation is P(a|b),  Where a and b are any proposition.  This is read as “the probability of a, given that all we know is b.”  For example,  P(cavity | toothache) = 0.8  If a patient is observed to have a toothache and no other information is yet available, then the probability of the patient's having a cavity will be 0.8. Contd..
  • 23. 10/23/2024 25 CONDITIONAL PROBABILITY…  Conditional probabilities cab be defined in terms of unconditional probabilities.  The equation is P(a|b) =  Probability of an event B, assuming that the event A already happened.  When ever P(b) > 0.  This equation can also be written as P(a b) = P(a|b) × P(b)  which is called the product rule. Contd..
  • 24. Need of probabilistic reasoning in AI • When there are unpredictable outcomes. • When specifications or possibilities of predicates becomes too large to handle. • When an unknown error occurs during an experiment. • In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge: • Bayes' rule • Bayesian Statistics
  • 25. 10/23/2024 27 BASIC AXIOMS OF PROBABILITY  All probabilities are between 0 and 1. for any proposition a, 0 ≤ P(a) ≤ 1  Necessarily true(i.e., valid) propositions have probability 1,  Necessarily false (i.e., unsatisfiable) propositions have probability 0. P(true) = 1 P(false) = 0.  The probability of a disjunction is given by P(a U b) = P(a) + P(b) - P(a ∩ b) Contd..
  • 26. 10/23/2024 28 RECAP OF PREVIOUS LECTURE
  • 27. 29  1. Axioms of probability  0 ≤ P(a) ≤ 1  2. P(true) = 1, P(false) = 0  3. P(a U b) = P(a) + P(b) - P(a ∩ b)  4. P(A = 1 | B = 1):  The fraction of cases where A is true if B is true 10/23/2024 1 A A B A U B P(A) P(A) P(A|B) P(B) P(A = 0.2) P(A|B = 0.5)
  • 28. 30  5. P(A,B) = P(A|B) × P(B)  This is one of the most powerful rules in probabilistic reasoning 10/23/2024
  • 29. 31  How can we use the axioms to prove that: P(a) = 1 – P(¬a)  Prior probability – Degree of belief in an event, in the absence of any other information.  P(rain tomorrow) = 0.8  P(no-rain tomorrow) = 0.2 Conditional Probability:  What is the probability of an event, given knowledge of another event Example:  P(raining | sunny)  P(raining | cloudy)  P(raining | cloudy, cold) 10/23/2024 Rain = 0.8 No Rain = 0.2
  • 30. 32  In some cases, given knowledge of one ore more random variable, we can improve upon our prior belief of another random variable.  For example:  P(slept in movie) = 0.5  P(slept in movie | liked movie) = 1/3 = .33  P(didn’t slept in movie | liked movie) = .66 10/23/2024
  • 31. 10/23/2024 33 PROBABILISTIC REASONING – BAYES RULE  Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which determines the probability of an event with uncertain knowledge.  In probability theory, it relates the conditional probability and marginal probabilities of two random events.  Bayes' theorem was named after the British mathematician Thomas Bayes. The Bayesian inference is an application of Bayes' theorem, which is fundamental to Bayesian statistics.  It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
  • 32. 10/23/2024 34 BAYES RULE – Cont…  Bayes' theorem allows updating the probability prediction of an event by observing new information of the real world.  Example: If cancer corresponds to one's age then by using Bayes' theorem, we can determine the probability of cancer more accurately with the help of age.  Bayes' theorem can be derived using product rule and conditional probability of event A with known event B:  As from product rule we can write: P(A ∩ B)= P(A|B) P(B) or  Similarly, the probability of event B with known event A: P(A ∩ B)= P(B|A) P(A)
  • 33. 10/23/2024 35 BAYES RULE – Cont…  The above equation (a) is called as Bayes' rule or Bayes' theorem. This equation is basic of most modern AI systems for probabilistic inference.  It shows the simple relationship between joint and conditional probabilities. Here, P(A|B) P(A) P(B|A) P(B) = × --------- Posterior Prior Likelihood Marginal probability -- (a)  P(A|B) is known as posterior, which we need to calculate, and it will be read as Probability of hypothesis A when we have occurred an evidence B.  P(B|A) is called the likelihood, in which we consider that hypothesis is true, then we calculate the probability of evidence.  P(A) is called the prior probability, probability of hypothesis before considering the evidence  P(B) is called marginal probability, pure probability of an evidence.
  • 34. 10/23/2024 36 BAYES RULE – Cont…  In the equation (a), in general, we can write P (B) = P(A)*P(B|Ai), hence the Bayes' rule can be written as:  Where A1, A2, A3,........, An is a set of mutually exclusive and exhaustive events.  Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A).  This is very useful in cases where we have a good probability of these three terms and want to determine the fourth one.  Suppose we want to perceive the effect of some unknown cause, and want to compute that cause, then the Bayes' rule becomes:
  • 35. 10/23/2024 37 BAYES RULE – Example Day Outlook Temperature Humidity Windy PlayTennis D1 Sunny Hot High FALSE No D2 Sunny Hot High TRUE No D3 Overcast Hot High FALSE Yes D4 Rainy Mild High FALSE Yes D5 Rainy Cool Normal FALSE Yes D6 Rainy Cool Normal TRUE No D7 Overcast Cool Normal TRUE Yes D8 Sunny Mild High FALSE No D9 Sunny Cool Normal FALSE Yes D10 Rainy Mild Normal FALSE Yes D11 Sunny Mild Normal TRUE Yes D12 Overcast Mild High TRUE Yes D13 Overcast Hot Normal FALSE Yes D14 Rainy Mild High TRUE No Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True
  • 36. 10/23/2024 38 BAYES RULE – Example Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True Today = {Sunny, Cool, High, True}
  • 37. 10/23/2024 39 BAYES RULE – Cont… Prior Probability:  P(play tennis = Yes) = 9/14  .64  P(play tennis = No) = 5/14  .36 Conditional Probability / Current Probability Temperature Yes No Mild 4/9 2/5 Hot 2/9 2/5 Cool 3/9 1/5 Humidity Yes No High 3/9 4/5 Normal 6/9 1/5 Windy Yes No True 3/9 3/5 False 6/9 2/5 Outlook Yes No Overcast 4/9 0/5 Rainy 3/9 2/5 Sunny 2/9 3/5
  • 38. 10/23/2024 40 BAYES RULE – Cont… P(Yes | Today) = P(Sunny Outlook | Yes) × P(Cool Temperature | Yes) × P(High Humidity | Yes) × P(True Wind| Yes) × P(Yes) ---------------------------------------------------------------------- P(Today) Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True P(No | Today) = P(Sunny Outlook | No) × P(Cool Temperature | No) × P(High Humidity | No) × P(True Wind| No) × P(No) ---------------------------------------------------------------------- P(Today)
  • 39. 10/23/2024 41 BAYES RULE – Cont… P(Yes | Today) = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.00529 Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True P(No | Today) = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.02057 P(Yes | Today) = 0.00529 --------------------- 0.00529 + 0.02057 0.00529 --------------------- 0.02586 = 0.20456 = P(No | Today) = 0.02057 --------------------- 0.00529 + 0.02057 0.02057 --------------------- 0.02586 = 0.79543 = These numbers can be converted into a probability by making the sum equal to 1 (normalization): P(Yes | Today) + P(No | Today) = 1
  • 40. 10/23/2024 42 BAYES RULE – Cont… Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True So, prediction that tennis would be played is ‘No’. P(Yes | Today) > P(No | Today) 0.20456 > 0.79543
  • 41. 10/23/2024 43 BAYES RULE – Example Outlook = Rainy, Temperature = Mild, Humidity = High, Wind = True Today = {Rainy, Mild, High, True}
  • 42. 10/23/2024 44 BAYES RULE – EXAMPLE 2 Color = Green, Legs = 2, Hight = Tall, and Smelly = No
  • 43. 10/23/2024 45 BAYES RULE – EXAMPLE 2 Prior Probability:  P(M) = 4/8  .50  P(H) = 4/8  .50 Conditional Probability / Current Probability Color M H White 2/4 3/4 Green 2/4 1/4 Leg M H 3 3/4 0/4 2 1/4 4/4 Height M H Short 3/4 2/4 Tall 1/4 2/4 Smelly M H Yes 3/4 1/4 No 1/4 3/4
  • 44. 10/23/2024 46 BAYES RULE – EXAMPLE 2 P(M | New instances) = P(Color = Green | M) × P(Legs = 2 | M) × P(Hight = Tall | M) × P(Smelly = No | M) Color = Green, Legs = 2, Hight = Tall, and Smelly = No P(H | New instances) = P(Color = Green | H) × P(Legs = 2 | H) × P(Hight = Tall | H) × P(Smelly = No | H) P(M | New instances) = 4/8 × 2/4 × 1/4 × 1/4 × 1/4 = 0.00390625 P(M | New instances) = 4/8 × 1/4 × 4/4 × 2/4 × 3/4 = 0.046875
  • 45. 10/23/2024 47 BAYES RULE – Cont… Color = Green, Legs = 2, Hight = Tall, and Smelly = No P(M | New instances) = 0.00390625 --------------------- 0.00390625 + 0.046875 0.00390625 --------------------- 0.05078125 = 0.076923 = P(H | New instances) = 0.046875 --------------------- 0.00390625 + 0.046875 0.046875 --------------------- 0.05078125 = 0.923076 = These numbers can be converted into a probability by making the sum equal to 1 (normalization): P(M | New instances) + P(H | New instances) = 1
  • 46. 10/23/2024 48 BAYES RULE – Cont… Color = Green, Legs = 2, Hight = Tall, and Smelly = No Hence new instance belongs to species H P(M | New instances) > P(H | New instances) 0.076923 > 0.923076
  • 47. 10/23/2024 49 BAYES RULE – EXAMPLE 2 New Instance = { Color = Red, Type = SUV, Origin = Domestic} Stolen = Yes
  • 48. 10/23/2024 50 BAYES RULE – EXAMPLE 2 Prior Probability:  P(Yes) = 5/10  .50  P(No) = 5/10  .50 Conditional Probability / Current Probability Color Yes No Red 3/5 2/5 Yellow 2/5 3/5 Type Yes No Sports 4/5 2/5 SUV 1/5 3/5 Origin Yes No Domestic 2/5 3/5 Imported 3/5 2/5
  • 49. 10/23/2024 51 BAYES RULE – EXAMPLE 3
    New instance: Color = Red, Type = SUV, Origin = Domestic
    P(Yes | New instance) ∝ P(Yes) × P(Color = Red | Yes) × P(Type = SUV | Yes) × P(Origin = Domestic | Yes)
                          = 5/10 × 3/5 × 1/5 × 2/5 = 0.024
    P(No | New instance)  ∝ P(No) × P(Color = Red | No) × P(Type = SUV | No) × P(Origin = Domestic | No)
                          = 5/10 × 2/5 × 3/5 × 3/5 = 0.072
  • 50. 10/23/2024 52 BAYES RULE – Cont…
    New instance: Color = Red, Type = SUV, Origin = Domestic
    Normalizing so that P(Yes | New instance) + P(No | New instance) = 1:
    P(Yes | New instance) = 0.024 / (0.024 + 0.072) = 0.024 / 0.096 = 0.25
    P(No | New instance)  = 0.072 / (0.024 + 0.072) = 0.072 / 0.096 = 0.75
  • 51. 10/23/2024 53 BAYES RULE – Cont…
    New instance: Color = Red, Type = SUV, Origin = Domestic
    Since P(No | New instance) > P(Yes | New instance), i.e. 0.75 > 0.25, the prediction is that the vehicle is not stolen.
  • 52. 10/23/2024 54 Bayesian Network  Joint probability distribution  Bayesian networks with examples  Semantics of Bayesian networks  Representing the full joint distribution  A method for constructing Bayesian network  Compactness and node ordering  Conditional independence relation in Bayesian networks.
  • 53. 10/23/2024 56 JPD – cont….
     The full joint probability distribution specifies a probability for every possible assignment of values to the random variables.
     It is usually too large to create or use in its explicit form.
     Joint probability distribution of two Boolean variables X and Y:
        Joint Probabilities |  X   |  X'
        Y                   | 0.20 | 0.12
        Y'                  | 0.65 | 0.03
     A joint probability distribution over n Boolean variables requires 2^n entries, one for every possible combination of values.
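    As a short worked illustration (not on the slide itself), marginal and conditional probabilities can be read directly off the 2×2 joint table above by summing rows and columns:

```latex
\begin{aligned}
P(Y)        &= P(X, Y) + P(X', Y)  = 0.20 + 0.12 = 0.32 \\
P(X)        &= P(X, Y) + P(X, Y')  = 0.20 + 0.65 = 0.85 \\
P(X \mid Y) &= \frac{P(X, Y)}{P(Y)} = \frac{0.20}{0.32} = 0.625
\end{aligned}
```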
  • 54. 10/23/2024 57 Drawbacks of Joint Probability Distributions
     The number of variables is large, and the table grows exponentially with it.
     Time and space complexity are huge.
     Statistical estimation of so many probabilities is difficult.
     Humans tend to single out only a few propositions.
     The alternative to this is Bayesian networks.
  • 55. 10/23/2024 58 Bayesian Networks
     A Bayesian network is a data structure also called a belief network or a probabilistic network.
     The extension of a Bayesian network is called a decision network or influence diagram.
     A Bayesian network represents the dependencies among variables and gives a concise specification of any full joint probability distribution.
     A Bayesian network is a directed graph in which each node is a variable and each edge is a direct dependence (relation) between variables.
  • 56. 10/23/2024 59 BAYESIAN NETWORKS
     A Bayesian network is a directed graph in which each node is annotated with quantitative probability information.
     The full specification is as follows:
     A set of random variables makes up the nodes of the network. Variables may be discrete or continuous.
     A set of directed links (arrows) connects pairs of nodes. If there is an arrow from node X to node Y, X is a parent of Y.
     Each node X has a conditional probability distribution P(X | Parents(X)) that quantifies the effect of the parents on the node.
     The graph is a Directed Acyclic Graph (DAG) – it has no directed cycles. X → Y
  • 57. 10/23/2024 60 BAYESIAN NETWORK - EXAMPLE
     A and B are unconditional, independent, evidence (parent) nodes.
     C and D are conditional, dependent, hypothesis (child) nodes.
     (Network structure, as used on the next slide: A → C, A → D, B → D) Contd..
  • 58. 10/23/2024 61 BAYESIAN NETWORK – EXAMPLE cont… P(A, B, C, D) = P(D|A,B) × P(C|A) × P(B) × P(A)
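    To see the saving this factorization buys, assume (an assumption, since the slide does not say) that all four variables are Boolean. The network's conditional probability tables then need far fewer numbers than the full joint table:

```latex
\begin{aligned}
\text{Full joint distribution: } & 2^{4} - 1 = 15 \text{ independent entries} \\
\text{Network CPTs: } & \underbrace{1}_{P(A)} + \underbrace{1}_{P(B)} + \underbrace{2}_{P(C \mid A)} + \underbrace{4}_{P(D \mid A,B)} = 8 \text{ entries}
\end{aligned}
```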
  • 60. 10/23/2024 63 Bayesian Network – Burglar Alarm
     You have installed a new burglar alarm at home.
     It is fairly reliable at detecting a burglary, but it also occasionally responds to minor earthquakes.
     You also have two neighbors, John and Mary, who have promised to call you at work when they hear the alarm.
     John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then, too.
     Mary, on the other hand, likes loud music and sometimes misses the alarm altogether.
     Given the evidence of who has or has not called, we would like to estimate the probability of a burglary.
  • 61. 10/23/2024 64
     Burglary and earthquakes directly affect the probability of the alarm going off.
     Whether John and Mary call depends only on the alarm.
     The network does not have nodes for Mary currently listening to loud music or for the telephone ringing and confusing John.
  • 62. 10/23/2024 65 EXAMPLE
     We can calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both John and Mary call: P(j ∧ m ∧ a ∧ ¬b ∧ ¬e).
     P(j, m, a, ¬b, ¬e) = P(j | a) × P(m | a) × P(a | ¬b ∧ ¬e) × P(¬b) × P(¬e)
                        = 0.90 × 0.70 × 0.001 × 0.999 × 0.998
                        = 0.000628
  • 63. 10/23/2024 66 What is the probability that John calls?
    P(j) = P(j | a) × P(a) + P(j | ¬a) × P(¬a)
         = P(j | a) × [P(a | b, e) P(b) P(e) + P(a | ¬b, e) P(¬b) P(e) + P(a | b, ¬e) P(b) P(¬e) + P(a | ¬b, ¬e) P(¬b) P(¬e)]
         + P(j | ¬a) × [P(¬a | b, e) P(b) P(e) + P(¬a | ¬b, e) P(¬b) P(e) + P(¬a | b, ¬e) P(b) P(¬e) + P(¬a | ¬b, ¬e) P(¬b) P(¬e)]
         = 0.90 × (0.95×0.001×0.002 + 0.29×(1−0.001)×0.002 + 0.94×0.001×(1−0.002) + 0.001×(1−0.001)×(1−0.002))
         + 0.05 × (0.05×0.001×0.002 + 0.71×(1−0.001)×0.002 + 0.06×0.001×(1−0.002) + 0.999×(1−0.001)×(1−0.002))
         = 0.90 × 0.002516442 + 0.05 × 0.997483558
         = 0.0022647978 + 0.0498741779
         = 0.0521389754
  • 64. 10/23/2024 67 What is the probability of burglary, given that John and Mary call?
    P(b | j, m) = α P(b) Σe P(e) Σa P(a | b, e) P(j | a) P(m | a)
    Reordering the sums so that Alarm is summed over on the outside:
    P(b | j, m) = α P(b) Σa P(j | a) P(m | a) Σe P(e) P(a | b, e)
                = α × 0.001 × [0.90 × 0.70 × (0.95 × 0.002 + 0.94 × 0.998) + 0.05 × 0.01 × (0.05 × 0.002 + 0.06 × 0.998)]
                = α × 0.001 × [0.63 × (0.0019 + 0.93812) + 0.0005 × (0.0001 + 0.05988)]
                = α × 0.001 × [0.63 × 0.94002 + 0.0005 × 0.05998]
                = α × 0.001 × [0.5922126 + 0.00002999]
                = α × 0.001 × 0.59224259
                = α × 0.00059224259
    Computing the corresponding value for ¬b and normalizing gives P(Burglary | j, m) ≈ (0.284, 0.716).
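    The same answer can be reproduced mechanically by summing out the hidden variables. Below is a minimal sketch (not from the slides): the dictionary-based CPT representation and the helper function joint are illustrative choices, but the probability values are the ones quoted in the calculations above.

```python
# Minimal sketch: inference by enumeration on the burglary network,
# using the CPT numbers quoted above.
from itertools import product

P_B = {True: 0.001, False: 0.999}                     # P(Burglary)
P_E = {True: 0.002, False: 0.998}                     # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,       # P(Alarm = true | Burglary, Earthquake)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                       # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}                       # P(MaryCalls = true | Alarm)

def joint(b, e, a, j, m):
    """Chain-rule product P(b, e, a, j, m) for one complete assignment."""
    p = P_B[b] * P_E[e]
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(Burglary | JohnCalls = true, MaryCalls = true): sum out the hidden
# variables (Earthquake, Alarm), then normalize over Burglary.
unnorm = {b: sum(joint(b, e, a, True, True)
                 for e, a in product([True, False], repeat=2))
          for b in (True, False)}
alpha = sum(unnorm.values())
print({b: p / alpha for b, p in unnorm.items()})   # ≈ {True: 0.284, False: 0.716}
```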
  • 65. 10/23/2024 68
     In the burglary network, the topology shows that
     Burglary and Earthquake directly affect the probability of the Alarm,
     while whether John and Mary call depends only on the Alarm.
     Our assumptions encoded in the network:
     they do not perceive any burglaries directly,
     they do not notice minor earthquakes, and
     they do not confer before calling.
  • 66. 10/23/2024 69
     Notice that the network does not have nodes corresponding to
     Mary currently listening to loud music, or
     the telephone ringing and confusing John.
     These factors are summarized in the uncertainty associated with the links from Alarm to JohnCalls and MaryCalls.
     This shows both laziness and ignorance in operation.
  • 67. 10/23/2024 70 Semantics of Bayesian Network
     An entry in the joint distribution is the probability of a conjunction of particular assignments to each variable, such as
     P(X1 = x1 ∧ X2 = x2 ∧ … ∧ Xn = xn), abbreviated P(x1, x2, …, xn).
     P(x1, x2, …, xn) = Π (i = 1 to n) P(xi | parents(Xi))
     Xi is a random variable and xi is the value of Xi.
  • 68. 10/23/2024 71 METHOD FOR CONSTRUCTING BAYESIAN NETWORK  Rewrite the joint distribution in terms of a conditional probability, using the product rule.  Then we repeat the process, reducing each conjunctive probability to a conditional probability and a smaller conjunction. We end up with one big product.
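    Written out, the derivation the slide describes is the repeated application of the product rule (chain rule), followed by the conditional-independence assumption of the network:

```latex
\begin{aligned}
P(x_1, \ldots, x_n)
  &= P(x_n \mid x_{n-1}, \ldots, x_1)\, P(x_{n-1}, \ldots, x_1) \\
  &= P(x_n \mid x_{n-1}, \ldots, x_1)\, P(x_{n-1} \mid x_{n-2}, \ldots, x_1) \cdots P(x_2 \mid x_1)\, P(x_1) \\
  &= \prod_{i=1}^{n} P(x_i \mid x_{i-1}, \ldots, x_1) \\
  &= \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i)),
     \quad \text{provided } \mathrm{Parents}(X_i) \subseteq \{X_{i-1}, \ldots, X_1\}.
\end{aligned}
```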
  • 70. 10/23/2024 73 COMPACTNESS AND NODE ORDERING
     The compactness of a Bayesian network is an example of a general property of locally structured (also called sparse) systems.
     In a locally structured system, each subcomponent interacts directly with only a bounded number of other components, regardless of the total number of components.
     Therefore the correct order in which to add nodes is to add the ‘root causes’ first, then the variables they influence, and so on until we reach the leaves.
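    To make the compactness claim concrete (a standard illustration, assuming n Boolean variables each with at most k parents; the figures below are not from the slide):

```latex
\begin{aligned}
\text{Network CPT entries: } & n \cdot 2^{k} \\
\text{Full joint entries: }  & 2^{n} \\
\text{e.g. } n = 30,\ k = 5: \quad & 30 \cdot 2^{5} = 960
  \quad \text{vs.} \quad 2^{30} \approx 1.07 \times 10^{9}.
\end{aligned}
```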
  • 71. 10/23/2024 74 Suppose we decide to add the nodes in the order MaryCalls, JohnCalls, Alarm, Burglary, Earthquake
     Adding MaryCalls: No parents.
     Adding JohnCalls: If Mary calls, that probably means the alarm has gone off, which of course would make it more likely that John calls. Therefore, JohnCalls needs MaryCalls as a parent.
     Adding Alarm: Clearly, if both call, it is more likely that the alarm has gone off than if just one or neither calls, so we need both MaryCalls and JohnCalls as parents.
     Adding Burglary: If we know the alarm state, then the call from John or Mary might give us information about our phone ringing or Mary’s music, but not about burglary: P(Burglary | Alarm, JohnCalls, MaryCalls) = P(Burglary | Alarm). Hence we need just Alarm as parent.
     Adding Earthquake: If the alarm is on, it is more likely that there has been an earthquake. (The alarm is an earthquake detector of sorts.) But if we know that there has been a burglary, then that explains the alarm, and the probability of an earthquake would be only slightly above normal. Hence, we need both Alarm and Burglary as parents.
  • 72. 10/23/2024 75 CONDITIONAL INDEPENDENCE RELATIONS IN BAYESIAN NETWORKS
     A node is conditionally independent of its non-descendants, given its parents.
     Example:
     JohnCalls is conditionally independent of Burglary and Earthquake given Alarm; it depends only on Alarm.
  • 73. 10/23/2024 76
     A node is conditionally independent of all other nodes in the network, given its parents, children, and children’s parents – that is, given its Markov blanket.
     Example:
     Burglary is independent of JohnCalls and MaryCalls given Alarm and Earthquake.
  • 74. 10/23/2024 77 BAYESIAN INFERENCE
     The task of a probabilistic inference system is to compute the posterior probability distribution for a set of query variables, given some observed event –
     that is, some assignment of values to a set of evidence variables.
  • 75. 10/23/2024 78 BAYESIAN INFERENCE – Notations
     X – the query variable.
     E – the set of evidence variables {E1, …, Em}.
     e – a particular observed event.
     Y – the non-evidence, non-query variables Y1, …, Yn (called the hidden variables).
     The complete set of variables is {X} ∪ E ∪ Y.
     A typical query asks for the posterior probability distribution P(X | e).
  • 76. 10/23/2024 79 Inferences in Bayesian network - Burglary Alarm
     In the burglary network, we might observe the event in which JohnCalls = true and MaryCalls = true.
     We could then ask for, say, the probability that a burglary has occurred: P(Burglary | JohnCalls = true, MaryCalls = true) = (0.284, 0.716)
     Burglary – query variable
     JohnCalls and MaryCalls – evidence variables
     Alarm and Earthquake – hidden variables
  • 77. 10/23/2024 80 TYPES OF INFERENCE
     Inference by enumeration – inference by listing (enumerating) all values of the hidden variables and summing them out.
     Inference by variable elimination – inference by removing (summing out) variables one at a time and reusing intermediate results.
  • 78. 10/23/2024 81 INFERENCE BY ENUMERATION
     Any conditional probability can be computed by summing terms from the full joint distribution.
     More specifically, a query P(X | e) can be answered using the equation:
     P(X | e) = α P(X, e) = α Σy P(X, e, y)
     where α is a normalization constant,
     X – query variable,
     e – observed event (values of the evidence variables E),
     y – values of the hidden variables Y, summed over all combinations.
  • 79. 10/23/2024 82 INFERENCE BY ENUMERATION - Example
     Consider the query P(Burglary | JohnCalls = true, MaryCalls = true).
     Burglary – query variable (X)
     JohnCalls – evidence variable 1 (E1)
     MaryCalls – evidence variable 2 (E2)
     The hidden variables of this query are Earthquake and Alarm.
  • 80. 10/23/2024 83
     From the equation, using initial letters for the variables to shorten the expressions, we have
     P(B | j, m) = α P(B, j, m) = α Σe Σa P(B, j, m, e, a)
     The semantics of Bayesian networks then gives us an expression in terms of CPT entries. For simplicity, we do this just for Burglary = true:
     P(b | j, m) = α Σe Σa P(b) P(e) P(a | b, e) P(j | a) P(m | a)
     P(b) – parent (independent) variable
     P(e) – parent (independent) variable
     P(a | b, e) – Alarm is the hidden variable, dependent on Burglary and Earthquake
     P(j | a), P(m | a) – evidence variables, dependent on Alarm
  • 81. 10/23/2024 84 INFERENCE BY VARIABLE ELIMINATION
     The enumeration algorithm can be improved substantially by eliminating repeated calculations.
     The idea is simple: do each calculation once and save the result for later use. This is a form of dynamic programming.
  • 82. 10/23/2024 85 INFERENCE BY VARIABLE ELIMINATION
     Variable elimination works by evaluating expressions such as the one derived in inference by enumeration:
     P(B | j, m) = α Σe Σa P(B) P(e) P(a | B, e) P(j | a) P(m | a)
     Terms that do not depend on a summation variable are moved outside that sum, so repeated sub-expressions are separated out:
     P(B | j, m) = α P(B) Σe P(e) Σa P(a | B, e) P(j | a) P(m | a)
  • 83. 10/23/2024 86 INFERENCE BY VARIABLE ELIMINATION
     Intermediate results are stored, and the summation over each variable is done only for those portions of the expression that depend on that variable.
     Let us illustrate this process for the burglary network. We evaluate the expression
     P(B | j, m) = α f1(B) × Σe f2(E) × Σa f3(A, B, E) × f4(A) × f5(A)
     Each part of the expression has been annotated with the name of the associated variable; these parts are called factors:
     P(B) → f1(B), P(E) → f2(E), P(A | B, E) → f3(A, B, E), P(j | A) → f4(A), P(m | A) → f5(A)
  • 84. 10/23/2024 87
     For example, the factors f4(A) and f5(A), corresponding to P(j | a) and P(m | a), depend just on A because J and M are fixed by the query.
     They are therefore two-element vectors: f4(A) = ⟨P(j | a), P(j | ¬a)⟩ = ⟨0.90, 0.05⟩ and f5(A) = ⟨P(m | a), P(m | ¬a)⟩ = ⟨0.70, 0.01⟩.
  • 85. 10/23/2024 88 INFERENCE BY VARIABLE ELIMINATION - EXAMPLE
     Given two factors f1(A, B) and f2(B, C) with probability distributions shown below, the pointwise product f1 × f2 = f3(A, B, C) has 2^(1+1+1) = 8 entries.
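    The following is a minimal sketch (not from the slides) of the pointwise product for Boolean variables. The dict-based factor representation and the numbers in the example tables are illustrative assumptions; the slide's actual f1 and f2 tables are not reproduced here. It simply shows that the product factor f3 ranges over the union of the variables, giving 2^(1+1+1) = 8 entries.

```python
# Minimal sketch: pointwise product of two factors over Boolean variables,
# as used in variable elimination.  Factors are dicts from value tuples to numbers.
from itertools import product

def pointwise_product(f1, vars1, f2, vars2):
    """Multiply factors f1(vars1) and f2(vars2); the result ranges over their union."""
    out_vars = list(dict.fromkeys(vars1 + vars2))          # union of variables, order preserved
    f3 = {}
    for values in product([True, False], repeat=len(out_vars)):
        assign = dict(zip(out_vars, values))
        key1 = tuple(assign[v] for v in vars1)              # restrict assignment to f1's variables
        key2 = tuple(assign[v] for v in vars2)              # restrict assignment to f2's variables
        f3[values] = f1[key1] * f2[key2]
    return out_vars, f3

# Illustrative (made-up) tables for f1(A, B) and f2(B, C):
f1 = {(True, True): 0.3, (True, False): 0.7, (False, True): 0.9, (False, False): 0.1}
f2 = {(True, True): 0.2, (True, False): 0.8, (False, True): 0.6, (False, False): 0.4}

vars3, f3 = pointwise_product(f1, ["A", "B"], f2, ["B", "C"])
print(vars3, len(f3))   # ['A', 'B', 'C'] 8  -> the product has 2^(1+1+1) = 8 entries
```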