Acting under uncertainty – Bayesian inference – naïve Bayes models.
Probabilistic reasoning – Bayesian networks – exact inference in BN –
approximate inference in BN – causal networks.
Dr. N.G.P. INSTITUTE OF TECHNOLOGY – COIMBATORE - 48
(An Autonomous Institution)
CS3491
ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
Dr. B. Dhiyanesh
Associate Professor / CSE
UNIT II
PROBABILISTIC REASONING
Recap of Previous lecture
Lecture Topic
UNCERTAINTY
 Uncertainty
 Review of probability
 Probabilistic reasoning – Bayes rule
 Bayesian networks
 Inferences in Bayesian network
Contd..
ACTING UNDER UNCERTAINTY
• Agents may need to handle uncertainty, whether due to partial observability, nondeterminism,
or a combination of the two. An agent may never know for certain what state it’s in or where
it will end up after a sequence of actions.
• With logical knowledge representation we might write A→B, which means that if A is true then B is
true; but in a situation where we are not sure whether A is true or not, we cannot express this
statement. This situation is called uncertainty.
• So to represent uncertain knowledge, where we are not sure about the predicates, we need
uncertain reasoning or probabilistic reasoning.
CAUSES OF UNCERTAINTY
• Following are some leading causes of uncertainty in the real world:
• Information obtained from unreliable sources.
• Experimental Errors
• Equipment fault
• Temperature variation
• Climate change.
• When interpreting partial sensor information, a logical agent must consider every logically
possible explanation for the observations, no matter how unlikely. This leads to impossibly
large and complex belief-state representations.
• Sometimes there is no plan that is guaranteed to achieve the goal—yet the agent must act. It
must have some way to compare the merits of plans that are not guaranteed.
• Suppose, for example, an automated taxi:
• The taxi has the goal of delivering a passenger to the airport on time.
• The agent forms a plan, A90,
• that involves leaving home 90 minutes before the flight departs and driving at a
reasonable speed.
• Even though the airport is only about 5 miles away,
• a logical taxi agent will not be able to conclude with certainty that “Plan A90 will get
us to the airport in time.”
• Instead, it reaches the weaker conclusion
• “Plan A90 will get us to the airport in time,
• as long as the car doesn’t break down or run out of gas,
• I don’t get into an accident, and there are no accidents on the bridge,
• the plane doesn’t leave early, and no meteorite hits the car.”
• Nonetheless, in some sense A90 is in fact the right thing to do.
• A90 is expected to maximize the agent’s performance measure (where the expectation is
relative to the agent’s knowledge about the environment).
• The performance measure includes
• getting to the airport in time for the flight
• avoiding a long, unproductive wait at the airport
• avoiding speeding tickets along the way
• The agent’s knowledge cannot guarantee any of these outcomes for A90, but it can provide
some degree of belief that they will be achieved.
• Other plans, such as A180, might increase the agent’s belief that it will get to the airport on
time, but also increase the likelihood of a long wait.
• Trying to use logic to cope with a domain like medical diagnosis thus fails for three main
reasons:
• Laziness: It is too much work to list the complete set of antecedents or consequents needed
to ensure an exceptionless rule, and too hard to use such rules.
• Theoretical ignorance: Medical science has no complete theory for the domain.
• Practical ignorance: Even if we know all the rules, we might be uncertain about a particular
patient because not all the necessary tests have been or can be run.
UNCERTAINTY
 When an agent knows enough facts about its environment, logical plans and
actions can be guaranteed to work.
 Unfortunately, agents never have access to the whole truth about their
environment. Agents act under uncertainty.
Contd..
UNCERTAINTY
The Diagnosis:
 Medicine, automobile repair, or whatever the task at hand is, almost always involves uncertainty.
 Let us try to write rules for dental diagnosis using first order logic, so that we can see how
the logical approach breaks down. Consider the following rule.
∀p Symptom(p, toothache) ⇒ Disease(p, cavity).
 The problem is that this rule is wrong.
 Not all patients with toothaches have cavities; some of them have gum disease, swelling, or
one of several other problems.
∀p Symptom(p, toothache) ⇒ Disease(p, cavity) ∨ Disease(p, gumdisease) ∨ Disease(p,
swelling) ∨ …
Contd..
UNCERTAINTY
 To make the rule true, we would have to add an almost unlimited list of possible causes.
 We could try turning it into a causal rule:
∀p Disease(p, cavity) ⇒ Symptom(p, toothache).
 But this rule is not right either; not all cavities cause pain.
 The connection between a toothache and a cavity is not a strict logical consequence in either
direction, so a purely logical judgement may go wrong.
Contd..
NATURE OF UNCERTAIN KNOWLEDGE
 This kind of situation is typical of the medical domain, as well as most other judgmental domains:
law, business, design, automobile repair, gardening, and so on.
 The agent can, at best, hold only a degree of belief in the relevant sentences.
 Our main tool for dealing with degrees of belief is probability theory.
 Probability assigns to each sentence a numerical degree of belief between 0 and 1.
Contd..
PROBABILITY
 Probabilities describe how likely an event is to occur; they are written as numbers between
0 and 1.
 0 indicates impossibility and 1 indicates certainty.
1. Tossing a coin
2. Rolling a dice
 Probability-based reasoning
 Understanding derived from knowledge
 How much uncertainty is present in that event
Contd..
PROBABILITY
 Probability provides a way of summarizing the uncertainty that comes from our laziness
and ignorance.
 Laziness – too many antecedents to list.
 Ignorance – no complete knowledge, lack of relevant facts or initial conditions, and
not all tests can be run.
 In the toothache problem, statistical data may give an 80% chance, i.e. a probability of 0.8, that the patient has a cavity if he
or she has a toothache.
 The 80% summarizes the cases in which the toothache really is due to a cavity.
 The missing 20% summarizes all the other possible causes of toothache that we are too lazy
or ignorant to confirm or deny.
Contd..
 Probabilities between 0 and 1 correspond to intermediate degrees of belief in the truth of
the sentence.
 The sentence itself is in fact either true or false.
 It is important to note that a degree of belief is different from a degree of truth.
 A probability of 0.8 does not mean “80% true” but rather an 80% degree of belief that is, a
fairly strong expectation.
 Thus, probability theory makes the same ontological commitment as logic, namely that
facts either do or do not hold in the world.
 Degree of truth, as opposed to degree of belief, is the subject of fuzzy logic.
Contd..
 In probability theory, a sentence such as
“The probability that the patient has a cavity is 0.8”.
 is about the agent’s beliefs, not directly about the world.
 Percepts provide the evidence on which probability statements are based.
 Every probability statement must indicate the evidence with respect to which the probability is being
assessed.
 If an agent receives new percepts, its probability assessments are updated to reflect the new
evidence.
Contd..
RANDOM VARIABLE
 A random variable refers to a “part” of the world whose “status” is initially unknown.
 We will use lowercase letters for the names of values.
P(a) = 1 – P(¬a)
P(a) + P(¬a) = 1
Tossing a coin: P(h) = 1 – P(¬h) : (0.5 = 1 – 0.5)
Rolling a die: P(n) = 1 – P(¬n) : (0.17 ≈ 1 – 0.83, for any particular face n)
Contd..
TYPES OF RANDOM VARIABLES
 Boolean Random Variable
 Cavity domain (true, false), if Cavity = true then cavity, or
 If Cavity = false then ¬cavity
 Discrete Random Variables – countable domain
 Weather might be (sunny, rainy, cloudy, snow)
 If Weather = cloudy, then ¬rainy, ¬sunny, ¬snow
 Continuous Random Variable
 Takes values from the real numbers, e.g. any value in an interval such as [0, 1]
Contd..
PRIOR PROBABILITY
 The unconditional or prior probability associated with a proposition a is the degree of
belief accorded to it in the absence of any other information; it is the probability of an event before
new evidence is collected.
 It is written as P(a).
 For example, if the prior probability that one has a cavity is 0.1, then we would write
P(Cavity = true) = 0.1 or P(cavity) = 0.1.
Correspondingly, P(Cavity = false) = P(¬cavity) = 0.9.
It is important to remember that P(a) can be used only when there is no other information.
P(Total =11) = P((5, 6)) + P((6, 5)) = 1/36 + 1/36 = 1/18.
Contd..
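As a quick check of the prior P(Total = 11) above, here is a minimal Python sketch (an illustration added to these notes, not part of the original slides) that enumerates the 36 equally likely outcomes of two fair dice:

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Prior probability that the total is 11: only (5, 6) and (6, 5) qualify.
p_total_11 = Fraction(sum(1 for d1, d2 in outcomes if d1 + d2 == 11), len(outcomes))

print(p_total_11)         # 1/18
print(float(p_total_11))  # ~0.0556
```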
PRIOR PROBABILITY…
 We will use an expression P(weather), which denotes a vector of values, for the
probabilities of each individual state of the weather.
P(weather = sunny) = 0.7
P(weather = rain) = 0.2
P(weather = cloudy) = 0.08
P(weather = snow) = 0.02
 We may simply write
P(Weather) = (0.7, 0.2, 0.08, 0.02)
 This statement defines a prior probability distribution for the random variable weather.
Contd..
CONDITIONAL PROBABILITY
 The conditional or posterior probabilities notation is P(a|b),
 Where a and b are any proposition.
 This is read as “the probability of a, given that all we know is b.”
 For example,
 P(cavity | toothache) = 0.8
 If a patient is observed to have a toothache and no other information is yet available, then
the probability of the patient's having a cavity will be 0.8.
Contd..
CONDITIONAL PROBABILITY…
 Conditional probabilities can be defined in terms of unconditional probabilities.
 The defining equation is
P(a|b) = P(a ∩ b) / P(b)
 i.e. the probability of event a, given that event b has already happened,
 whenever P(b) > 0.
 This equation can also be written as
P(a ∩ b) = P(a|b) × P(b)
 which is called the product rule.
Contd..
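To make the definition concrete, here is a minimal Python sketch using an illustrative joint distribution over two Boolean variables, Cavity and Toothache; the numbers are made-up assumptions chosen only so that the table sums to 1:

```python
# Illustrative joint distribution P(Cavity, Toothache); the numbers are
# made-up assumptions, chosen only so that the entries sum to 1.
joint = {
    (True, True): 0.12,   # cavity and toothache
    (True, False): 0.08,  # cavity, no toothache
    (False, True): 0.08,  # no cavity, toothache
    (False, False): 0.72, # neither
}

def p(event):
    """Probability of an event given as a predicate over (cavity, toothache)."""
    return sum(pr for outcome, pr in joint.items() if event(outcome))

p_toothache = p(lambda o: o[1])                      # P(toothache) = 0.20
p_cavity_and_toothache = p(lambda o: o[0] and o[1])  # P(cavity ∩ toothache) = 0.12

# Conditional probability: P(cavity | toothache) = P(cavity ∩ toothache) / P(toothache)
p_cavity_given_toothache = p_cavity_and_toothache / p_toothache
print(p_cavity_given_toothache)  # 0.12 / 0.20 = 0.6

# Product rule: P(cavity ∩ toothache) = P(cavity | toothache) × P(toothache)
assert abs(p_cavity_given_toothache * p_toothache - p_cavity_and_toothache) < 1e-12
```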
Need of probabilistic reasoning in AI
• When there are unpredictable outcomes.
• When the specifications or possibilities of predicates become too large to handle.
• When an unknown error occurs during an experiment.
• In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
• Bayes' rule
• Bayesian Statistics
BASIC AXIOMS OF PROBABILITY
 All probabilities are between 0 and 1. For any proposition a,
0 ≤ P(a) ≤ 1
 Necessarily true(i.e., valid) propositions have probability 1,
 Necessarily false (i.e., unsatisfiable) propositions have probability 0.
P(true) = 1 P(false) = 0.
 The probability of a disjunction is given by
P(a U b) = P(a) + P(b) - P(a ∩ b)
Contd..
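A quick Python check of the disjunction axiom on one roll of a fair die (a small illustration added here, not part of the original slides):

```python
from fractions import Fraction

# Sample space: one roll of a fair die.
omega = range(1, 7)

def p(event):
    """Probability of an event (a set of outcomes) under the uniform distribution."""
    return Fraction(len([w for w in omega if w in event]), len(omega))

a = {2, 4, 6}   # "roll is even"
b = {4, 5, 6}   # "roll is at least 4"

# Axiom for disjunction: P(a U b) = P(a) + P(b) - P(a ∩ b)
assert p(a | b) == p(a) + p(b) - p(a & b)
print(p(a | b))  # 4/6 = 2/3
```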
RECAP OF PREVIOUS LECTURE
 1. Axioms of probability
 0 ≤ P(a) ≤ 1
 2. P(true) = 1, P(false) = 0
 3. P(a U b) = P(a) + P(b) - P(a ∩ b)
 4. P(A = 1 | B = 1):
 The fraction of cases where A is true if B is true
[Figure: Venn diagrams illustrating P(A), P(B), A U B, and the conditional probability P(A|B); example values P(A) = 0.2, P(A|B) = 0.5]
 5. P(A,B) = P(A|B) × P(B)
 This is one of the most powerful rules in probabilistic
reasoning
 How can we use the axioms to prove that
P(a) = 1 – P(¬a)? (A short derivation is given at the end of this slide.)
 Prior probability – Degree of belief in an event, in the absence of any other information.
 P(rain tomorrow) = 0.8
 P(no-rain tomorrow) = 0.2
Conditional Probability:
 What is the probability of an event, given knowledge of another event
Example:
 P(raining | sunny)
 P(raining | cloudy)
 P(raining | cloudy, cold)
[Figure: pie chart showing P(rain tomorrow) = 0.8 and P(no-rain tomorrow) = 0.2]
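A short derivation of the complement rule asked for above, using only the axioms just listed (a and ¬a are mutually exclusive, and a U ¬a is necessarily true):
P(a U ¬a) = P(a) + P(¬a) − P(a ∩ ¬a)
1 = P(a) + P(¬a) − 0
P(a) = 1 − P(¬a)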
 In some cases, given knowledge of one or more random variables, we can improve upon
our prior belief about another random variable.
 For example:
 P(slept in movie) = 0.5
 P(slept in movie | liked movie) = 1/3 ≈ .33
 P(didn’t sleep in movie | liked movie) = 2/3 ≈ .66
PROBABILISTIC REASONING – BAYES RULE
 Bayes' theorem is also known as Bayes' rule, Bayes' law, or
Bayesian reasoning, which determines the probability of an
event with uncertain knowledge.
 In probability theory, it relates the conditional probability and
marginal probabilities of two random events.
 Bayes' theorem was named after the British mathematician
Thomas Bayes. The Bayesian inference is an application of
Bayes' theorem, which is fundamental to Bayesian statistics.
 It is a way to calculate the value of P(B|A) with the knowledge of
P(A|B).
BAYES RULE – Cont…
 Bayes' theorem allows updating the probability prediction of an event by observing new
information of the real world.
 Example: If the risk of cancer is related to one's age, then by using Bayes' theorem we can
determine the probability of cancer more accurately with the help of age.
 Bayes' theorem can be derived using the product rule and the conditional probability of event A
with known event B:
 From the product rule we can write:
P(A ∩ B) = P(A|B) P(B), or
 Similarly, the probability of event B with known event A:
P(A ∩ B)= P(B|A) P(A)
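Equating the two expressions for P(A ∩ B) and dividing both sides by P(B) (assuming P(B) > 0) gives Bayes' rule, labelled equation (a) on the next slide:
P(A|B) × P(B) = P(B|A) × P(A)
P(A|B) = P(B|A) × P(A) / P(B)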
BAYES RULE – Cont…
 Equation (a) below is called Bayes' rule or
Bayes' theorem. This equation is the basis of most
modern AI systems for probabilistic inference.
 It shows the simple relationship between joint and
conditional probabilities. Here,
P(A|B) = [ P(B|A) × P(A) ] / P(B)    -- (a)
Posterior = (Likelihood × Prior) / Marginal probability
 P(A|B) is known as the posterior, which we need to calculate;
it is read as the probability of hypothesis A given that
evidence B has occurred.
 P(B|A) is called the likelihood: assuming that the
hypothesis is true, we calculate the probability of the
evidence.
 P(A) is called the prior probability: the probability of the
hypothesis before considering the evidence.
 P(B) is called the marginal probability: the probability of the
evidence on its own.
BAYES RULE – Cont…
 In equation (a), in general, we can write P(B) = Σi P(Ai) × P(B|Ai), hence Bayes' rule can be
written as:
P(Ai|B) = P(Ai) × P(B|Ai) / Σk P(Ak) × P(B|Ak)
 where A1, A2, A3,........, An is a set of mutually exclusive and exhaustive events.
 Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A).
 This is very useful in cases where we have good estimates of these three terms and want to
determine the fourth one.
 Suppose we perceive as evidence the effect of some unknown cause and want to determine that cause;
then Bayes' rule becomes:
P(Cause|Effect) = P(Effect|Cause) × P(Cause) / P(Effect)
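A minimal Python sketch of Bayes' rule as written above; the prior and likelihood values below are hypothetical, chosen only for illustration:

```python
def bayes_posterior(prior, likelihood, likelihood_given_not):
    """Return P(hypothesis | evidence) via Bayes' rule, where the marginal
    P(evidence) is expanded over the hypothesis and its complement."""
    marginal = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / marginal

# Hypothetical numbers: P(disease) = 0.01, P(positive test | disease) = 0.9,
# P(positive test | no disease) = 0.05.
posterior = bayes_posterior(prior=0.01, likelihood=0.9, likelihood_given_not=0.05)
print(round(posterior, 4))  # ≈ 0.1538
```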
BAYES RULE – Example
Day Outlook Temperature Humidity Windy PlayTennis
D1 Sunny Hot High FALSE No
D2 Sunny Hot High TRUE No
D3 Overcast Hot High FALSE Yes
D4 Rainy Mild High FALSE Yes
D5 Rainy Cool Normal FALSE Yes
D6 Rainy Cool Normal TRUE No
D7 Overcast Cool Normal TRUE Yes
D8 Sunny Mild High FALSE No
D9 Sunny Cool Normal FALSE Yes
D10 Rainy Mild Normal FALSE Yes
D11 Sunny Mild Normal TRUE Yes
D12 Overcast Mild High TRUE Yes
D13 Overcast Hot Normal FALSE Yes
D14 Rainy Mild High TRUE No
Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True
BAYES RULE – Example
Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True
Today = {Sunny, Cool, High, True}
BAYES RULE – Cont…
Prior Probability:
 P(play tennis = Yes) = 9/14 ≈ 0.64
 P(play tennis = No) = 5/14 ≈ 0.36
Conditional Probability / Current Probability
Temperature Yes No
Mild 4/9 2/5
Hot 2/9 2/5
Cool 3/9 1/5
Humidity Yes No
High 3/9 4/5
Normal 6/9 1/5
Windy Yes No
True 3/9 3/5
False 6/9 2/5
Outlook Yes No
Overcast 4/9 0/5
Rainy 3/9 2/5
Sunny 2/9 3/5
BAYES RULE – Cont…
P(Yes | Today) =
P(Sunny Outlook | Yes) × P(Cool Temperature | Yes) ×
P(High Humidity | Yes) × P(True Wind| Yes) × P(Yes)
----------------------------------------------------------------------
P(Today)
Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True
P(No | Today) =
P(Sunny Outlook | No) × P(Cool Temperature | No) ×
P(High Humidity | No) × P(True Wind| No) × P(No)
----------------------------------------------------------------------
P(Today)
BAYES RULE – Cont…
P(Yes | Today) = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.00529
Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True
P(No | Today) = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.02057
P(Yes | Today) = 0.00529 / (0.00529 + 0.02057) = 0.00529 / 0.02586 = 0.20456
P(No | Today) = 0.02057 / (0.00529 + 0.02057) = 0.02057 / 0.02586 = 0.79543
These numbers can be converted into a probability by making the sum equal to 1
(normalization):
P(Yes | Today) + P(No | Today) = 1
BAYES RULE – Cont…
Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True
Since P(No | Today) > P(Yes | Today)
(0.79543 > 0.20456),
the prediction is that tennis would not be played: ‘No’.
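The whole calculation above can be reproduced with a short naïve Bayes sketch in Python (an illustration added to these notes, not part of the original slides); the counts come directly from the PlayTennis table:

```python
from collections import Counter, defaultdict

# PlayTennis data from the table: (Outlook, Temperature, Humidity, Windy) -> label
data = [
    (("Sunny", "Hot", "High", False), "No"),    (("Sunny", "Hot", "High", True), "No"),
    (("Overcast", "Hot", "High", False), "Yes"), (("Rainy", "Mild", "High", False), "Yes"),
    (("Rainy", "Cool", "Normal", False), "Yes"), (("Rainy", "Cool", "Normal", True), "No"),
    (("Overcast", "Cool", "Normal", True), "Yes"), (("Sunny", "Mild", "High", False), "No"),
    (("Sunny", "Cool", "Normal", False), "Yes"), (("Rainy", "Mild", "Normal", False), "Yes"),
    (("Sunny", "Mild", "Normal", True), "Yes"),  (("Overcast", "Mild", "High", True), "Yes"),
    (("Overcast", "Hot", "Normal", False), "Yes"), (("Rainy", "Mild", "High", True), "No"),
]

labels = Counter(label for _, label in data)      # prior counts: Yes = 9, No = 5
feature_counts = defaultdict(Counter)             # counts of each feature value per label
for features, label in data:
    for i, value in enumerate(features):
        feature_counts[(i, label)][value] += 1

def score(features, label):
    """Unnormalized naive Bayes score: P(label) * prod_i P(feature_i | label)."""
    s = labels[label] / sum(labels.values())
    for i, value in enumerate(features):
        s *= feature_counts[(i, label)][value] / labels[label]
    return s

today = ("Sunny", "Cool", "High", True)
scores = {label: score(today, label) for label in labels}
total = sum(scores.values())
for label, s in scores.items():
    print(label, round(s, 5), "->", round(s / total, 5))
# Yes 0.00529 -> ~0.20, No 0.02057 -> ~0.80  => predict 'No'
```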
BAYES RULE – Example
Outlook = Rainy, Temperature = Mild, Humidity = High, Wind = True
Today = {Rainy, Mild, High, True}
BAYES RULE – EXAMPLE 2
Color = Green, Legs = 2, Height = Tall, and Smelly = No
BAYES RULE – EXAMPLE 2
Prior Probability:
 P(M) = 4/8 = 0.50
 P(H) = 4/8 = 0.50
Conditional Probability / Current Probability
Color M H
White 2/4 3/4
Green 2/4 1/4
Leg M H
3 3/4 0/4
2 1/4 4/4
Height M H
Short 3/4 2/4
Tall 1/4 2/4
Smelly M H
Yes 3/4 1/4
No 1/4 3/4
BAYES RULE – EXAMPLE 2
P(M | New instances) =
P(M) × P(Color = Green | M) × P(Legs = 2 | M) × P(Height = Tall | M) ×
P(Smelly = No | M)
Color = Green, Legs = 2, Height = Tall, and Smelly = No
P(H | New instances) =
P(H) × P(Color = Green | H) × P(Legs = 2 | H) × P(Height = Tall | H) ×
P(Smelly = No | H)
P(M | New instances) = 4/8 × 2/4 × 1/4 × 1/4 × 1/4 = 0.00390625
P(H | New instances) = 4/8 × 1/4 × 4/4 × 2/4 × 3/4 = 0.046875
BAYES RULE – Cont…
Color = Green, Legs = 2, Height = Tall, and Smelly = No
P(M | New instances) = 0.00390625 / (0.00390625 + 0.046875) = 0.00390625 / 0.05078125 = 0.076923
P(H | New instances) = 0.046875 / (0.00390625 + 0.046875) = 0.046875 / 0.05078125 = 0.923076
These numbers can be converted into a probability by making the sum equal to 1
(normalization):
P(M | New instances) + P(H | New instances) = 1
BAYES RULE – Cont…
Color = Green, Legs = 2, Height = Tall, and Smelly = No
Since P(H | New instances) > P(M | New instances)
(0.923076 > 0.076923),
the new instance belongs to species H.
BAYES RULE – EXAMPLE 2
New Instance = {Color = Red, Type = SUV, Origin = Domestic}, Stolen = ?
BAYES RULE – EXAMPLE 2
Prior Probability:
 P(Yes) = 5/10 = 0.50
 P(No) = 5/10 = 0.50
Conditional Probability / Current Probability
Color Yes No
Red 3/5 2/5
Yellow 2/5 3/5
Type Yes No
Sports 4/5 2/5
SUV 1/5 3/5
Origin Yes No
Domestic 2/5 3/5
Imported 3/5 2/5
BAYES RULE – EXAMPLE 2
P(Yes | New instances) =
P(Yes) × P(Color = Red | Yes) ×
P(Type = SUV | Yes) × P(Origin = Domestic | Yes)
P(No | New instances) =
P(No) × P(Color = Red | No) ×
P(Type = SUV | No) × P(Origin = Domestic | No)
Color = Red, Type = SUV, Origin = Domestic
P(Yes | New instances) = 5/10 × 3/5 × 1/5 × 2/5 = 0.024
P(No | New instances) = 5/10 × 2/5 × 3/5 × 3/5 = 0.072
BAYES RULE – Cont…
Color = Red, Type = SUV, Origin = Domestic
P(Yes | New instances) = 0.024 / (0.024 + 0.072) = 0.024 / 0.096 = 0.25
P(No | New instances) = 0.072 / (0.024 + 0.072) = 0.072 / 0.096 = 0.75
These numbers can be converted into a probability by making the sum equal to 1
(normalization):
P(Yes | New instances) + P(No | New instances) = 1
BAYES RULE – Cont…
Color = Red, Type = SUV, Origin = Domestic
Since P(No | New instances) > P(Yes | New instances)
(0.75 > 0.25),
the new instance is classified as not stolen.
Bayesian Network
 Joint probability distribution
 Bayesian networks with examples
 Semantics of Bayesian networks
 Representing the full joint distribution
 A method for constructing Bayesian network
 Compactness and node ordering
 Conditional independence relation in Bayesian networks.
JPD – cont….
 The full joint probability distribution specifies a probability for every possible assignment of values to the random
variables.
 It is usually too large to create or use in its explicit form.
 The joint probability distribution of two Boolean variables X and Y is shown below.
 A joint probability distribution over n Boolean variables requires 2^n entries, one for each possible
combination of values.
Joint Probabilities X X’
Y 0.20 0.12
Y’ 0.65 0.03
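A short Python sketch over exactly this 2×2 joint table, showing how marginals and conditionals are read off the full joint distribution:

```python
# Full joint distribution over two Boolean variables X and Y (table above).
joint = {
    (True, True): 0.20,   # X, Y
    (False, True): 0.12,  # X', Y
    (True, False): 0.65,  # X, Y'
    (False, False): 0.03, # X', Y'
}

# Marginals are obtained by summing out the other variable.
p_x = sum(p for (x, _), p in joint.items() if x)   # P(X) = 0.85
p_y = sum(p for (_, y), p in joint.items() if y)   # P(Y) = 0.32

# A conditional is a ratio of joint to marginal: P(X | Y) = P(X, Y) / P(Y).
p_x_given_y = joint[(True, True)] / p_y            # 0.20 / 0.32 = 0.625
print(p_x, p_y, p_x_given_y)
```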
Drawbacks of Joint Probability Distributions
 The number of entries grows rapidly with the number of variables.
 Time and space complexity are huge.
 Statistical estimation of so many probabilities is difficult.
 Humans tend to single out only a few propositions.
 The alternative is Bayesian networks.
Bayesian Networks
 A Bayesian network is also called a belief network or a
probabilistic network.
 The extension of a Bayesian network is called a decision network or influence diagram.
 A Bayesian network represents the dependencies among variables and gives a concise
specification of any full joint probability distribution.
 A Bayesian network is a directed graph in which the nodes are variables and the edges are
dependency relations.
BAYESIAN NETWORKS
 A Bayesian network is a directed graph in which each node is annotated with quantitative
probability information.
 The full specification is as follows:
 A set of random variables makes up the nodes of the network. Variables may be discrete or
continuous.
 A set of directed links or arrows connects pairs of nodes. If there is an arrow from node
X to node Y, X is a parent of Y.
 Each node X has a conditional probability distribution P(X | Parents(X)) that
quantifies the effect of the parents on the node.
 The graph is a Directed Acyclic Graph (DAG) – it has no directed cycles.
X → Y
BAYESIAN NETWORK - EXAMPLE
 A & B are unconditional, independent, evidence and parent nodes.
 C & D are conditional, dependent, hypothesis and child nodes.
Contd..
[Figure: network with edges A → C, A → D, B → D]
BAYESIAN NETWORK – EXAMPLE cont…
P(A, B, C, D) = P(D|A,B) × P(C|A) × P(B) × P(A)
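A minimal sketch of this factorization in Python; the CPT values below are hypothetical, chosen only to illustrate how the product is evaluated:

```python
# Hypothetical CPTs for the network A -> C, A -> D, B -> D (numbers are made up).
P_A = {True: 0.3, False: 0.7}
P_B = {True: 0.6, False: 0.4}
P_C_given_A = {True: 0.8, False: 0.1}                        # P(C = true | A)
P_D_given_AB = {(True, True): 0.9, (True, False): 0.5,
                (False, True): 0.4, (False, False): 0.05}    # P(D = true | A, B)

def joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) = P(D|A,B) * P(C|A) * P(B) * P(A)."""
    p_c = P_C_given_A[a] if c else 1 - P_C_given_A[a]
    p_d = P_D_given_AB[(a, b)] if d else 1 - P_D_given_AB[(a, b)]
    return p_d * p_c * P_B[b] * P_A[a]

print(joint(True, True, True, False))  # P(a, b, c, ¬d) = 0.1 * 0.8 * 0.6 * 0.3 = 0.0144
```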
Contd..
Bayesian Network – Burglar Alarm
 You have installed a new burglar alarm at home.
 It is fairly reliable at detecting a burglary, but also responds on occasion to minor
earthquakes.
 You also have two neighbors, John and Mary, who have promised to call you at work
when they hear the alarm.
 John nearly always calls when he hears the alarm, but sometimes confuses the telephone ringing
with the alarm and calls then, too.
 Mary, on the other hand, likes loud music and sometimes misses the alarm altogether.
 Given the evidence of who has or has not called, we would like to estimate the probability
of a burglary.
 Burglary and earthquakes directly affect the probability of the alarm’s going off.
 Whether John and Mary call depends only on the alarm.
 The network does not have nodes for Mary’s currently listening to loud music or for the
telephone ringing and confusing John.
EXAMPLE
 We can calculate the probability that the alarm has sounded, but neither a burglary
nor an earthquake has occurred, and both John and Mary call:
P(j ∩ m ∩ a ∩ ¬b ∩ ¬e)
P(j, m, a, ¬ b, ¬ e) = P(j | a) × P(m| a) × P(a | ¬ b ∧ ¬ e) × P( ¬ b) × P( ¬ e)
= 0.90 × 0.70 × 0.001 × 0.999 × 0.998
= 0.000628
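This product can be checked directly in Python using the CPT values of the burglary network quoted in these slides (P(b) = 0.001, P(e) = 0.002, P(a | ¬b, ¬e) = 0.001, P(j | a) = 0.90, P(m | a) = 0.70):

```python
# CPT values for the burglary network, as used in the calculations on these slides.
P_b, P_e = 0.001, 0.002
P_j_given_a, P_m_given_a = 0.90, 0.70
P_a_given_not_b_not_e = 0.001

# P(j, m, a, ¬b, ¬e) = P(j|a) * P(m|a) * P(a|¬b,¬e) * P(¬b) * P(¬e)
p = P_j_given_a * P_m_given_a * P_a_given_not_b_not_e * (1 - P_b) * (1 - P_e)
print(round(p, 6))  # ≈ 0.000628
```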
What is the probability that John calls?
P(j) = P(j|a) × P(a) + P(j|¬a) × P(¬a)
where P(a) = P(a|b,e) P(b) P(e) + P(a|¬b,e) P(¬b) P(e) + P(a|b,¬e) P(b) P(¬e) + P(a|¬b,¬e) P(¬b) P(¬e)
P(a) = 0.95 × 0.001 × 0.002 + 0.29 × 0.999 × 0.002 + 0.94 × 0.001 × 0.998 + 0.001 × 0.999 × 0.998
     = 0.002516442
P(¬a) = 1 − P(a) = 0.997483558
P(j) = 0.90 × 0.002516442 + 0.05 × 0.997483558
     = 0.0022647978 + 0.0498741779
     = 0.0521389757
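The same enumeration in Python, using the burglary-network CPT values quoted above:

```python
# Burglary network CPTs (values quoted in the slides).
P_b, P_e = 0.001, 0.002
P_a_given = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}   # P(a = true | b, e)
P_j_given_a = {True: 0.90, False: 0.05}

# P(a): sum over the four (b, e) combinations.
p_a = sum(P_a_given[(b, e)]
          * (P_b if b else 1 - P_b)
          * (P_e if e else 1 - P_e)
          for b in (True, False) for e in (True, False))

# P(j) = P(j|a) P(a) + P(j|¬a) P(¬a)
p_j = P_j_given_a[True] * p_a + P_j_given_a[False] * (1 - p_a)
print(round(p_a, 9), round(p_j, 6))  # 0.002516442, ≈ 0.052139
```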
What is the probability of burglary given that John and Mary call?
P(b | j, m) = α P(b) Σe P(e) Σa P(a | b, e) P(j | a) P(m | a)
= α × 0.001 × [0.90 × 0.70 × {0.95 × 0.002 + 0.94 × 0.998} + 0.05 × 0.01 × {0.05 × 0.002 + 0.06 × 0.998}]
= α × 0.001 × [0.63 × {0.0019 + 0.93812} + 0.0005 × {0.0001 + 0.05988}]
= α × 0.001 × [0.63 × 0.94002 + 0.0005 × 0.05998]
= α × 0.001 × [0.5922126 + 0.00002999]
= α × 0.001 × 0.59224259
= α × 0.00059224259
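The full enumeration, including the normalization constant α, can be written as a short Python sketch; it reproduces the posterior P(Burglary | j, m) ≈ (0.284, 0.716) quoted later in these slides:

```python
from itertools import product

# Burglary network CPTs (standard values used throughout these slides).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(a = true | b, e)
P_J = {True: 0.90, False: 0.05}                       # P(j = true | a)
P_M = {True: 0.70, False: 0.01}                       # P(m = true | a)

def unnormalized(b):
    """Sum over the hidden variables E and A of P(b, j, m, e, a) with j = m = true."""
    total = 0.0
    for e, a in product((True, False), repeat=2):
        p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
        total += P_B[b] * P_E[e] * p_a * P_J[a] * P_M[a]
    return total

scores = {b: unnormalized(b) for b in (True, False)}
alpha = 1.0 / sum(scores.values())
print({b: round(alpha * s, 3) for b, s in scores.items()})  # {True: 0.284, False: 0.716}
```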
 In the burglary network, the topology shows that
 Burglary and Earthquake directly affect the probability of the alarm,
 while JohnCalls and MaryCalls depend only on the alarm.
 Our assumptions encoded in the network are that
 the neighbors do not perceive any burglaries directly,
 they do not notice minor earthquakes, and
 they do not confer before calling.
 Notice that the network does not have nodes corresponding to
 Mary currently listening to loud music, or
 the telephone ringing and confusing John.
 These factors are summarized in the uncertainty associated with the links from Alarm to
JohnCalls and MaryCalls.
 This shows both laziness and ignorance in operation.
Semantics of Bayesian Network
 An entry in the joint distribution is the probability of a conjunction of particular assignments to
each variable, such as
P(X1 = x1 ∧ X2 = x2 ∧ … ∧ Xn = xn)
 The semantics of a Bayesian network defines this entry as the product
P(x1, x2, . . . , xn) = Πi P(xi | Parents(Xi))
 Xi is a random variable and xi is a value of Xi.
METHOD FOR CONSTRUCTING BAYESIAN NETWORK
 Rewrite the joint distribution in terms of a conditional probability, using the product
rule.
 Then we repeat the process, reducing each conjunctive probability to a conditional
probability and a smaller conjunction. We end up with one big product.
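A sketch of the derivation just described, in the notation used on these slides (this is the chain rule):
P(x1, …, xn) = P(xn | xn−1, …, x1) × P(xn−1, …, x1)
= P(xn | xn−1, …, x1) × P(xn−1 | xn−2, …, x1) × … × P(x2 | x1) × P(x1)
= Πi P(xi | xi−1, …, x1)
If the nodes are ordered so that each variable appears after its parents, then P(xi | xi−1, …, x1) = P(xi | Parents(Xi)), which gives the Bayesian network factorization Πi P(xi | Parents(Xi)).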
COMPACTNESS AND NODE ORDERING
 The compactness of a Bayesian network is an example of a general property of locally
structured systems (also called sparse systems, in which each component interacts directly with only a
few of the others).
 In a locally structured system, each subcomponent interacts directly with only a bounded
number of other components, regardless of the total number of components.
 Therefore the correct order in which to add nodes is to add the ‘root causes’ first, then the
variables they influence, and so on, until we reach the leaves.
Suppose we decide to add the nodes in the order MaryCalls, JohnCalls,
Alarm, Burglary, Earthquake:
 Adding MaryCalls: No parents.
 Adding JohnCalls: If Mary calls, that probably means the alarm has gone off, which of course would make it more likely
that John calls. Therefore, JohnCalls needs MaryCalls as a parent.
 Adding Alarm: Clearly, if both call, it is more likely that the alarm has gone off than if just one or neither calls, so we
need both MaryCalls and JohnCalls as parents.
 Adding Burglary: If we know the alarm state, then the call from John or Mary might give us information about our
phone ringing or Mary’s music, but not about burglary:
P(Burglary | Alarm, JohnCalls ,MaryCalls) = P(Burglary | Alarm) .
 Hence we need just Alarm as parent.
 Adding Earthquake: If the alarm is on, it is more likely that there has been an earthquake. (The alarm is an earthquake
detector of sorts.) But if we know that there has been a burglary, then that explains the alarm, and the probability of an
earthquake would be only slightly above normal. Hence, we need both Alarm and Burglary as parents.
CONDITIONAL INDEPENDENCE RELATIONS IN BAYESIAN
NETWORKS
 A node is conditionally independent of its non-descendants, given its parents.
 Example:
 JohnCalls is conditionally independent of Burglary and Earthquake, given its parent Alarm.
10/23/2024 76
 A node is conditionally independent of all other nodes in the network, given its parents,
children, and children’s parents that is, given its Markov blanket.
 Example:
 Burglary is conditionally independent of JohnCalls and MaryCalls, given Alarm and Earthquake (its Markov blanket).
10/23/2024 77
BAYESIAN INFERENCE
 The task of a probabilistic inference system is to compute the posterior probability distribution for a set
of query variables, given some observed event,
 that is, some assignment of values to a set of evidence variables.
10/23/2024 78
BAYESIAN INFERENCE – Notations
 X – denotes the query variables.
 E – set of evidence variables {E1, …, Em}
 e – particular observed event.
 Y – non-evidence, non-query variables, Y1,…, Yn. (called the hidden variables)
 The complete set of variables is {X} ∪ E ∪ Y
 A typical query asks for the posterior probability distribution P(X | e)
10/23/2024 79
Inferences in Bayesian network - Burglary Alarm
 In the burglary network, we might observe the event in which
JohnCalls = True and MaryCalls = True
 We could then ask for, say, the probability that a burglary has occurred:
P(Burglary | JohnCalls = true, MaryCalls = true) = (0.284, 0.716)
 Burglary – query variable
 JohnCalls and MaryCalls – evidence variables
 Alarm – hidden variable
10/23/2024 80
TYPES OF INFERENCES
 Inference by enumeration – inference by listing all the relevant joint entries and summing them.
 Inference by variable elimination – inference that removes (sums out) variables one at a time, reusing intermediate results.
10/23/2024 81
INFERENCE BY ENUMERATIONS
 Any conditional probability can be computed by summing terms from the full joint
distribution.
 More specifically, a query P(X|e) can be answered using the equation:
P(X|e) = α P(X, e) = α Σy P(X, e, y)
 where α is a normalizing constant
 X – the query variable
 e – the observed event (values of the evidence variables E)
 y – the values of the hidden variables Y
10/23/2024 82
INFERENCE BY ENUMERATIONS - Example
 Consider the query P(Burglary | JohnCalls = true, MaryCalls = true).
 Burglary – query variable (X)
 JohnCalls – evidence variable 1 (E1)
 MaryCalls – evidence variable 2 (E2)
 The hidden variables of this query are Earthquake and Alarm.
10/23/2024 83
 From the equation above, using initial letters for the variables to shorten the expressions, we have
P(B | j, m) = α P(B, j, m) = α Σe Σa P(B, j, m, e, a)
 The semantics of Bayesian networks then gives us an expression in terms of
CPT entries. For simplicity, we do this just for Burglary = true:
P(b | j, m) = α Σe Σa P(b) P(e) P(a | b, e) P(j | a) P(m | a)
 P(b) – parent (independent) variable
 P(e) – parent (independent) variable
 P(a | b, e) – the hidden variable Alarm, which depends on Burglary and Earthquake
 P(j | a), P(m | a) – evidence variables, which depend on Alarm
10/23/2024 84
INFERENCE BY VARIABLES ELIMINATION
 The enumeration algorithm can be improved substantially by eliminating repeated
calculations.
 The idea is simple: do each calculation once and save the result for later use. This is a
form of dynamic programming.
10/23/2024 85
INFERENCE BY VARIABLES ELIMINATION
 Variable elimination works by evaluating expressions such as the one derived earlier for inference by enumeration:
P(B | j, m) = α Σe Σa P(B) P(e) P(a | B, e) P(j | a) P(m | a)
 Moving the summations inward so that each sub-expression is computed only once, the repeated variables are separated out:
P(B | j, m) = α P(B) Σe P(e) Σa P(a | B, e) P(j | a) P(m | a)
10/23/2024 86
INFERENCE BY VARIABLES ELIMINATION
 Intermediate results are stored, and the summation over each variable is done only for those
portions of the expression that depend on that variable.
 Let us illustrate this process for the burglary network.
 We evaluate the expression
P(B | j, m) = α f1(B) × Σe f2(E) × Σa f3(A, B, E) × f4(A) × f5(A)
 We have annotated each part of the expression with the name of the associated variable;
these parts are called factors.
 P(B) → f1(B)   P(E) → f2(E)   P(a|B,E) → f3(A, B, E)
 P(j|a) → f4(A)   P(m|a) → f5(A)
10/23/2024 87
 For example, the factors f4(A) and f5(A), corresponding to P(j | a) and P(m | a), depend
just on A because J and M are fixed by the query.
 They are therefore two-element vectors.
10/23/2024 88
INFERENCE BY VARIABLES ELIMINATION - EXAMPLE
 Given two factors f1(A, B) and f2(B, C) with the probability distributions shown below, the
pointwise product f1 × f2 = f3(A, B, C) has 2^(1+1+1) = 8 entries.
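A minimal sketch of the pointwise product of two factors over Boolean variables; the probability entries below are hypothetical, since the slide's factor tables are not reproduced in this text:

```python
from itertools import product

# Hypothetical factors over Boolean variables (entries are made up for illustration).
f1 = {(True, True): 0.3, (True, False): 0.7,
      (False, True): 0.9, (False, False): 0.1}   # f1(A, B)
f2 = {(True, True): 0.2, (True, False): 0.8,
      (False, True): 0.6, (False, False): 0.4}   # f2(B, C)

# Pointwise product f3(A, B, C) = f1(A, B) * f2(B, C): 2^(1+1+1) = 8 entries.
f3 = {(a, b, c): f1[(a, b)] * f2[(b, c)]
      for a, b, c in product((True, False), repeat=3)}
print(len(f3))  # 8

# Summing out B afterwards gives a factor over (A, C), as variable elimination does.
f_ac = {(a, c): sum(f3[(a, b, c)] for b in (True, False))
        for a, c in product((True, False), repeat=2)}
```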
10/23/2024 89
10/23/2024 90

More Related Content

PDF
artificial intelligence 13-quantifying uncertainity.pdf
PPTX
Probability in artificial intelligence.pptx
PPTX
QUANTIFYING UNCERTAINTY .....................
PDF
AI CHAPTER 7.pdf
PPT
Artificial Intelligence Bayesian Reasoning
PPT
Earthquake dhnjggbnkkkknvcxsefghjk gyjhvcdyj
PPTX
Uncertainty in computer agent - Copy.pptx
PDF
Probabilistic Reasoning bayes rule conditional .pdf
artificial intelligence 13-quantifying uncertainity.pdf
Probability in artificial intelligence.pptx
QUANTIFYING UNCERTAINTY .....................
AI CHAPTER 7.pdf
Artificial Intelligence Bayesian Reasoning
Earthquake dhnjggbnkkkknvcxsefghjk gyjhvcdyj
Uncertainty in computer agent - Copy.pptx
Probabilistic Reasoning bayes rule conditional .pdf

Similar to Artificial Intelligence and Machine Learning (20)

PDF
PPTX
Module5_chapter1PPT (1).pptxhdhrjgrrjjrjrdbjejej
PPTX
AI_Probability.pptx
PPT
AI Lecture 7 (uncertainty)
PPT
Your score increaseAdd more information to your s as yo.ppt
PPTX
Russel Norvig Uncertainity - chap 13.pptx
PPTX
artificial intelligence and uncertain reasoning
PDF
Uncertain knowledge and reasoning
PPTX
Lesson04-Uncertainty - Pt. 1 Probabilistic Methods.pptx
PPTX
CS3491-Unit-2 Uncertainty.pptx
PPT
Uncertainity
PPT
Uncertainty
PPTX
Chapter 13
PDF
AI_7 Statistical Reasoning
PDF
Artificial Intelligence Chap.5 : Uncertainty
PPTX
22PCOAM11 Session 22 Acting under uncertainty.pptx
PPTX
Uncertainty in AI
PPTX
moduledeeplearning Reasoning_Methods.pptx
PPTX
Uncertain Knowledge and Reasoning in Artificial Intelligence
PDF
13-uncertainty.pdf
Module5_chapter1PPT (1).pptxhdhrjgrrjjrjrdbjejej
AI_Probability.pptx
AI Lecture 7 (uncertainty)
Your score increaseAdd more information to your s as yo.ppt
Russel Norvig Uncertainity - chap 13.pptx
artificial intelligence and uncertain reasoning
Uncertain knowledge and reasoning
Lesson04-Uncertainty - Pt. 1 Probabilistic Methods.pptx
CS3491-Unit-2 Uncertainty.pptx
Uncertainity
Uncertainty
Chapter 13
AI_7 Statistical Reasoning
Artificial Intelligence Chap.5 : Uncertainty
22PCOAM11 Session 22 Acting under uncertainty.pptx
Uncertainty in AI
moduledeeplearning Reasoning_Methods.pptx
Uncertain Knowledge and Reasoning in Artificial Intelligence
13-uncertainty.pdf
Ad

More from ssuser1ecccc (20)

PPTX
Biometrics and its applications in Medical
PDF
DIP-Introduction Lecture 13-10-14 image analysis
PDF
Chap_9_Representation_and_Description.pdf
PPTX
Biometrics and its applications for system Analysis
PPTX
Biometrics and its applications for system
PDF
Project formulation in image analysis in various methods
PPTX
2 R ladder architecture analysis for image
DOCX
Existing method used for analysis of images
DOCX
Histogram analysis of image using wavelet
DOCX
Structure_of_How_to_Write_a_Journal.docx
PPTX
pixelrelationships in image processing enhancement
PPTX
Knowledge representation and reasoning in AI
PPTX
Intro class. U1 M1.pptx
PPT
NGP BME Final.ppt
PPTX
ES.pptx
PPTX
TASK Sixth Sensor Technology.pptx
PPTX
4.-personal-productive-equipment.pptx
PPTX
5.-accident-causation-theories-accident-reporting.pptx
PPTX
unit-3-osha-hcs-dot.pptx
PPTX
unit-iv-facility-safety.pptx
Biometrics and its applications in Medical
DIP-Introduction Lecture 13-10-14 image analysis
Chap_9_Representation_and_Description.pdf
Biometrics and its applications for system Analysis
Biometrics and its applications for system
Project formulation in image analysis in various methods
2 R ladder architecture analysis for image
Existing method used for analysis of images
Histogram analysis of image using wavelet
Structure_of_How_to_Write_a_Journal.docx
pixelrelationships in image processing enhancement
Knowledge representation and reasoning in AI
Intro class. U1 M1.pptx
NGP BME Final.ppt
ES.pptx
TASK Sixth Sensor Technology.pptx
4.-personal-productive-equipment.pptx
5.-accident-causation-theories-accident-reporting.pptx
unit-3-osha-hcs-dot.pptx
unit-iv-facility-safety.pptx
Ad

Recently uploaded (20)

PDF
RMMM.pdf make it easy to upload and study
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PPTX
PPH.pptx obstetrics and gynecology in nursing
PPTX
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PPTX
master seminar digital applications in india
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PPTX
Cell Types and Its function , kingdom of life
PDF
VCE English Exam - Section C Student Revision Booklet
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PPTX
Institutional Correction lecture only . . .
PDF
Basic Mud Logging Guide for educational purpose
PPTX
Pharma ospi slides which help in ospi learning
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PDF
Microbial disease of the cardiovascular and lymphatic systems
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
RMMM.pdf make it easy to upload and study
STATICS OF THE RIGID BODIES Hibbelers.pdf
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PPH.pptx obstetrics and gynecology in nursing
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
human mycosis Human fungal infections are called human mycosis..pptx
Supply Chain Operations Speaking Notes -ICLT Program
master seminar digital applications in india
Abdominal Access Techniques with Prof. Dr. R K Mishra
Cell Types and Its function , kingdom of life
VCE English Exam - Section C Student Revision Booklet
O5-L3 Freight Transport Ops (International) V1.pdf
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Institutional Correction lecture only . . .
Basic Mud Logging Guide for educational purpose
Pharma ospi slides which help in ospi learning
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
Microbial disease of the cardiovascular and lymphatic systems
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
3rd Neelam Sanjeevareddy Memorial Lecture.pdf

Artificial Intelligence and Machine Learning

  • 1. Acting under uncertainty – Bayesian inference – naïve bayes models. Probabilistic reasoning – Bayesian networks – exact inference in BN – approximate inference in BN – causal networks. Dr. N.G.P. INSTITUTE OF TECHNOLOGY – COIMBATORE - 48 (An Autonomous Institution) CS3491 ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING Dr. B. Dhiyanesh Associate Professor / CSE UNIT II PROBABILISTIC REASONING
  • 2. 10/23/2024 2 Recap of Previous lecture
  • 4. 10/23/2024 4 UNCERTAINTY  Uncertainty  Review of probability  Probabilistic reasoning – Bayes rule  Bayesian networks  Inferences in Bayesian network Contd..
  • 5. ACTING UNDER UNCERTAINTY • Agents may need to handle uncertainty, whether due to partial observability, nondeterminism, or a combination of the two. An agent may never know for certain what state it’s in or where it will end up after a sequence of actions. • With this knowledge representation, we might write A→B, which means if A is true then B is true, but consider a situation where we are not sure about whether A is true or not then we cannot express this statement, this situation is called uncertainty. • So to represent uncertain knowledge, where we are not sure about the predicates, we need uncertain reasoning or probabilistic reasoning.
  • 6. CAUSES OF UNCERTAINTY • Following are some leading causes of uncertainty to occur in the real world. • Information occurred from unreliable sources. • Experimental Errors • Equipment fault • Temperature variation • Climate change. • When interpreting partial sensor information, a logical agent must consider every logically possible explanation for the observations, no matter how unlikely. This leads to impossible large and complex belief-state representations. • Sometimes there is no plan that is guaranteed to achieve the goal—yet the agent must act. It must have some way to compare the merits of plans that are not guaranteed.
  • 7. • Suppose, for example, that an automated taxi! • Automated has the goal of delivering a passenger to the airport on time. • The agent forms a plan, A90, • that involves leaving home 90 minutes before the flight departs and driving at a reasonable speed. • Even though the airport is only about 5 miles away, • a logical taxi agent will not be able to conclude with certainty that “Plan A90 will get us to the airport in time.” • Instead, it reaches the weaker conclusion • “Plan A90 will get us to the airport in time, • as long as the car doesn’t break down or run out of gas • I don’t get into an accident, and there are no accidents on the bridge • The plane doesn’t leave early, and no meteorite hits the car.
  • 8. • Nonetheless, in some sense A90 is in fact the right thing to do. • A90 is expected to maximize the agent’s performance measure (where the expectation is relative to the agent’s knowledge about the environment). • The performance measure includes • getting to the airport in time for the flight • avoiding a long • unproductive wait at the airport • avoiding speeding tickets along the way • The agent’s knowledge cannot guarantee any of these outcomes for A90, but it can provide some degree of belief that they will be achieved. • Other plans, such as A180, might increase the agent’s belief that it will get to the airport on time, but also increase the likelihood of a long wait.
  • 9. • Trying to use logic to cope with a domain like medical diagnosis thus fails for three main reasons: • Laziness: It is too much work to list the complete set of backgrounds or consequents needed to ensure an exceptionless rule and too hard to use such rules. • Theoretical ignorance: Medical science has no complete theory for the domain. • Practical ignorance: Even if we know all the rules, we might be uncertain about a particular patient because not all the necessary tests have been or can be run.
  • 10. 10/23/2024 10 UNCERTAINTY  When an agent knows enough facts about its environment, the logical plans and actions produces a guaranteed work.  Unfortunately, agents never have access to the whole truth about their environment. Agents act under uncertainty. Contd..
  • 11. 10/23/2024 11 UNCERTAINTY The Diagnosis:  Medicine, Automobile repair, or what ever is at ask that almost always involves uncertainty.  Let us try to write rules for dental diagnosis using first order logic, so that we can see how the logical approach breaks down. Consider the following rule. ∀p symptom(p, toothache)  Disease(p, cavity).  The problem is that this rule is wrong.  Not all patients with tooth aches have cavities; some of them have gum disease, swelling, or one of several other problems. ∀p symptom(p, toothache)  Disease(p, cavity) V Disease(p, gumdisease) V Disease(p, swelling) Contd..
  • 12. 10/23/2024 12 UNCERTAINTY  To make the rule true, we have to add almost unlimited list of possible causes.  We could try a causal rule: ∀p Disease (p, cavity)  symptom(p, toothache).  But this rule is also not right either, not all cavities cause pain.  Toothache and a cavity are unconnected, so the judgement may go wrong. Contd..
  • 13. 10/23/2024 13 NATURE OF UNCERTAIN KNOWLEDGE  This is a type of the medical domain, as well as most other judgmental domains: Law, business, design, automobile repair, gardening, and so on.  The agent take action, only a degree of belief in the relevant sentences.  Our main tool for dealing with degrees of belief will be probability theory.  The probability assigns to each sentence a numerical degree of belief between 0 and 1. Contd..
  • 14. 10/23/2024 14 PROBABILITY  Probabilities are used to compute the truth of given statement, written as numbers between 0 and 1, that describes how likely an event is to occur.  0 indicated impossibility and 1 indicates certainly. 1. Tossing a coin 2. Rolling a dice  Probability based reasoning  Understanding from knowledge  How much of uncertainty present in that event. Contd..
  • 15. 10/23/2024 15 PROBABILITY  Probability provides a way of summarizing the uncertainty, that comes from our laziness and ignorance.  Laziness means – too many antecedent  Ignorance means – No complete knowledge, Lack of relevant fact, initial conditions and not all test can run.  Toothache problem, an 80 % chance, a probability of 0.8 that the patient has a cavity if he or she has a toothache by statistical data.  The 80 % summarizes those cases, but both toothache and cavity are unconnected.  The missing 20% summarizes, all other possible causes of toothache, that we are too lazy or ignorant to confirm or deny. Contd..
  • 16. 10/23/2024 16  Probabilities between 0 and 1 correspond to intermediate degrees of belief in the truth of the sentence.  The sentence itself is in fact either true or false.  It is important to note that a degree of belief is different from a degree of truth.  A probability of 0.8 does not mean “80% true” but rather an 80% degree of belief that is, a fairly strong expectation.  Thus, probability theory makes the same ontological commitment as logic namely, that facts either do or do not hold in the world.  Degree of truth, as opposed to degree of belief, is the subject of fuzzy logic. Contd..
  • 17. 10/23/2024 17  In probability theory, a sentence such as “The probability that the patient has a cavity is 0.8”.  Is about the agent’s beliefs not directly about the world.  These percepts create the evidence, which are based on probability statements.  All probability statement must indicate the evidence with respect to that probability is begin assessed.  If an agent receives new percepts, its probability assessments are updated to reflect the new evidence. Contd..
  • 18. 10/23/2024 18 RANDOM VARIABLE  Referring to a “part” of the world, whose “status” is initially unknown.  We will use lowercase for the names of values P(a) = 1 – P(¬a) P(a) + P(¬a) = 1 Tossing coin: P(h) = 1 – P(¬h) : (0.5 = 1 – 0.5) Rolling dice: P(n) = 1 – P(¬n) : (0.16 = 1 – 0.84) Contd..
  • 19. 10/23/2024 19 TYPES OF RANDOM VARIABLES  Boolean Random Variable  Cavity domain (true, false), if Cavity = true then cavity, or  If Cavity = false then ¬cavity  Discrete Random Variables – countable domain  Weather might be(sunny, rainy, cloudy, snow)  Weather = cloudy then ¬rainy, ¬sunny, ¬snow  Continuous Random variable  Finite set real numbers with equal intervals e.g. Internal (0.1) Contd..
  • 20. 10/23/2024 22 PRIOR PROBABILITY  The unconditional or prior probability associated with a proposition a, is the degree of belief according the absence of any other information. Is the probability of an event before new data is collected.  It is written as P(a).  For example, if the prior probability that one have a cavity is 0.1, then we would write P(Cavity = true) = 0.1 or P(cavity) = 0.1. P(¬Cavity = false) It is important to remember that P(a) can be used only when there is no other information. P(Total =11) = P((5, 6)) + P((6, 5)) = 1/36 + 1/36 = 1/18. Contd..
  • 21. 10/23/2024 23 PRIOR PROBABILITY…  We will use an expression P(weather), which denotes a vector of values, for the probabilities of each individual state of the weather. P(weather = sunny) = 0.7 P(weather = rain) = 0.2 P(weather = cloudy) = 0.08 P(weather = snow) = 0.02  We may simply write P(Weather) = (0.7, 0.2, 0.08, 0.02)  This statement defines a prior probability distribution for the random variable weather. Contd..
  • 22. 10/23/2024 24 CONDITIONAL PROBABILITY  The conditional or posterior probabilities notation is P(a|b),  Where a and b are any proposition.  This is read as “the probability of a, given that all we know is b.”  For example,  P(cavity | toothache) = 0.8  If a patient is observed to have a toothache and no other information is yet available, then the probability of the patient's having a cavity will be 0.8. Contd..
  • 23. 10/23/2024 25 CONDITIONAL PROBABILITY…  Conditional probabilities cab be defined in terms of unconditional probabilities.  The equation is P(a|b) =  Probability of an event B, assuming that the event A already happened.  When ever P(b) > 0.  This equation can also be written as P(a b) = P(a|b) × P(b)  which is called the product rule. Contd..
  • 24. Need of probabilistic reasoning in AI • When there are unpredictable outcomes. • When specifications or possibilities of predicates becomes too large to handle. • When an unknown error occurs during an experiment. • In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge: • Bayes' rule • Bayesian Statistics
  • 25. 10/23/2024 27 BASIC AXIOMS OF PROBABILITY  All probabilities are between 0 and 1. for any proposition a, 0 ≤ P(a) ≤ 1  Necessarily true(i.e., valid) propositions have probability 1,  Necessarily false (i.e., unsatisfiable) propositions have probability 0. P(true) = 1 P(false) = 0.  The probability of a disjunction is given by P(a U b) = P(a) + P(b) - P(a ∩ b) Contd..
  • 26. 10/23/2024 28 RECAP OF PREVIOUS LECTURE
  • 27. 29  1. Axioms of probability  0 ≤ P(a) ≤ 1  2. P(true) = 1, P(false) = 0  3. P(a U b) = P(a) + P(b) - P(a ∩ b)  4. P(A = 1 | B = 1):  The fraction of cases where A is true if B is true 10/23/2024 1 A A B A U B P(A) P(A) P(A|B) P(B) P(A = 0.2) P(A|B = 0.5)
  • 28. 30  5. P(A,B) = P(A|B) × P(B)  This is one of the most powerful rules in probabilistic reasoning 10/23/2024
  • 29. 31  How can we use the axioms to prove that: P(a) = 1 – P(¬a)  Prior probability – Degree of belief in an event, in the absence of any other information.  P(rain tomorrow) = 0.8  P(no-rain tomorrow) = 0.2 Conditional Probability:  What is the probability of an event, given knowledge of another event Example:  P(raining | sunny)  P(raining | cloudy)  P(raining | cloudy, cold) 10/23/2024 Rain = 0.8 No Rain = 0.2
  • 30. 32  In some cases, given knowledge of one ore more random variable, we can improve upon our prior belief of another random variable.  For example:  P(slept in movie) = 0.5  P(slept in movie | liked movie) = 1/3 = .33  P(didn’t slept in movie | liked movie) = .66 10/23/2024
  • 31. 10/23/2024 33 PROBABILISTIC REASONING – BAYES RULE  Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which determines the probability of an event with uncertain knowledge.  In probability theory, it relates the conditional probability and marginal probabilities of two random events.  Bayes' theorem was named after the British mathematician Thomas Bayes. The Bayesian inference is an application of Bayes' theorem, which is fundamental to Bayesian statistics.  It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
  • 32. 10/23/2024 34 BAYES RULE – Cont…  Bayes' theorem allows updating the probability prediction of an event by observing new information of the real world.  Example: If cancer corresponds to one's age then by using Bayes' theorem, we can determine the probability of cancer more accurately with the help of age.  Bayes' theorem can be derived using product rule and conditional probability of event A with known event B:  As from product rule we can write: P(A ∩ B)= P(A|B) P(B) or  Similarly, the probability of event B with known event A: P(A ∩ B)= P(B|A) P(A)
  • 33. 10/23/2024 35 BAYES RULE – Cont…  The above equation (a) is called as Bayes' rule or Bayes' theorem. This equation is basic of most modern AI systems for probabilistic inference.  It shows the simple relationship between joint and conditional probabilities. Here, P(A|B) P(A) P(B|A) P(B) = × --------- Posterior Prior Likelihood Marginal probability -- (a)  P(A|B) is known as posterior, which we need to calculate, and it will be read as Probability of hypothesis A when we have occurred an evidence B.  P(B|A) is called the likelihood, in which we consider that hypothesis is true, then we calculate the probability of evidence.  P(A) is called the prior probability, probability of hypothesis before considering the evidence  P(B) is called marginal probability, pure probability of an evidence.
  • 34. 10/23/2024 36 BAYES RULE – Cont…  In the equation (a), in general, we can write P (B) = P(A)*P(B|Ai), hence the Bayes' rule can be written as:  Where A1, A2, A3,........, An is a set of mutually exclusive and exhaustive events.  Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A).  This is very useful in cases where we have a good probability of these three terms and want to determine the fourth one.  Suppose we want to perceive the effect of some unknown cause, and want to compute that cause, then the Bayes' rule becomes:
  • 35. 10/23/2024 37 BAYES RULE – Example Day Outlook Temperature Humidity Windy PlayTennis D1 Sunny Hot High FALSE No D2 Sunny Hot High TRUE No D3 Overcast Hot High FALSE Yes D4 Rainy Mild High FALSE Yes D5 Rainy Cool Normal FALSE Yes D6 Rainy Cool Normal TRUE No D7 Overcast Cool Normal TRUE Yes D8 Sunny Mild High FALSE No D9 Sunny Cool Normal FALSE Yes D10 Rainy Mild Normal FALSE Yes D11 Sunny Mild Normal TRUE Yes D12 Overcast Mild High TRUE Yes D13 Overcast Hot Normal FALSE Yes D14 Rainy Mild High TRUE No Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True
  • 36. 10/23/2024 38 BAYES RULE – Example Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True Today = {Sunny, Cool, High, True}
  • 37. 10/23/2024 39 BAYES RULE – Cont… Prior Probability:  P(play tennis = Yes) = 9/14  .64  P(play tennis = No) = 5/14  .36 Conditional Probability / Current Probability Temperature Yes No Mild 4/9 2/5 Hot 2/9 2/5 Cool 3/9 1/5 Humidity Yes No High 3/9 4/5 Normal 6/9 1/5 Windy Yes No True 3/9 3/5 False 6/9 2/5 Outlook Yes No Overcast 4/9 0/5 Rainy 3/9 2/5 Sunny 2/9 3/5
  • 38. 10/23/2024 40 BAYES RULE – Cont… P(Yes | Today) = P(Sunny Outlook | Yes) × P(Cool Temperature | Yes) × P(High Humidity | Yes) × P(True Wind| Yes) × P(Yes) ---------------------------------------------------------------------- P(Today) Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True P(No | Today) = P(Sunny Outlook | No) × P(Cool Temperature | No) × P(High Humidity | No) × P(True Wind| No) × P(No) ---------------------------------------------------------------------- P(Today)
  • 39. 10/23/2024 41 BAYES RULE – Cont… P(Yes | Today) = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.00529 Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True P(No | Today) = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.02057 P(Yes | Today) = 0.00529 --------------------- 0.00529 + 0.02057 0.00529 --------------------- 0.02586 = 0.20456 = P(No | Today) = 0.02057 --------------------- 0.00529 + 0.02057 0.02057 --------------------- 0.02586 = 0.79543 = These numbers can be converted into a probability by making the sum equal to 1 (normalization): P(Yes | Today) + P(No | Today) = 1
  • 40. 10/23/2024 42 BAYES RULE – Cont… Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = True So, prediction that tennis would be played is ‘No’. P(Yes | Today) > P(No | Today) 0.20456 > 0.79543
  • 41. 10/23/2024 43 BAYES RULE – Example Outlook = Rainy, Temperature = Mild, Humidity = High, Wind = True Today = {Rainy, Mild, High, True}
  • 42. 10/23/2024 44 BAYES RULE – EXAMPLE 2 Color = Green, Legs = 2, Hight = Tall, and Smelly = No
  • 43. 10/23/2024 45 BAYES RULE – EXAMPLE 2 Prior Probability:  P(M) = 4/8  .50  P(H) = 4/8  .50 Conditional Probability / Current Probability Color M H White 2/4 3/4 Green 2/4 1/4 Leg M H 3 3/4 0/4 2 1/4 4/4 Height M H Short 3/4 2/4 Tall 1/4 2/4 Smelly M H Yes 3/4 1/4 No 1/4 3/4
  • 44. 10/23/2024 46 BAYES RULE – EXAMPLE 2 P(M | New instances) = P(Color = Green | M) × P(Legs = 2 | M) × P(Hight = Tall | M) × P(Smelly = No | M) Color = Green, Legs = 2, Hight = Tall, and Smelly = No P(H | New instances) = P(Color = Green | H) × P(Legs = 2 | H) × P(Hight = Tall | H) × P(Smelly = No | H) P(M | New instances) = 4/8 × 2/4 × 1/4 × 1/4 × 1/4 = 0.00390625 P(M | New instances) = 4/8 × 1/4 × 4/4 × 2/4 × 3/4 = 0.046875
  • 45. 10/23/2024 47 BAYES RULE – Cont… Color = Green, Legs = 2, Hight = Tall, and Smelly = No P(M | New instances) = 0.00390625 --------------------- 0.00390625 + 0.046875 0.00390625 --------------------- 0.05078125 = 0.076923 = P(H | New instances) = 0.046875 --------------------- 0.00390625 + 0.046875 0.046875 --------------------- 0.05078125 = 0.923076 = These numbers can be converted into a probability by making the sum equal to 1 (normalization): P(M | New instances) + P(H | New instances) = 1
  • 46. 10/23/2024 48 BAYES RULE – Cont… Color = Green, Legs = 2, Hight = Tall, and Smelly = No Hence new instance belongs to species H P(M | New instances) > P(H | New instances) 0.076923 > 0.923076
  • 47. 10/23/2024 49 BAYES RULE – EXAMPLE 2 New Instance = { Color = Red, Type = SUV, Origin = Domestic} Stolen = Yes
  • 48. 10/23/2024 50 BAYES RULE – EXAMPLE 2 Prior Probability:  P(Yes) = 5/10  .50  P(No) = 5/10  .50 Conditional Probability / Current Probability Color Yes No Red 3/5 2/5 Yellow 2/5 3/5 Type Yes No Sports 4/5 2/5 SUV 1/5 3/5 Origin Yes No Domestic 2/5 3/5 Imported 3/5 2/5
  • 49. 10/23/2024 51 BAYES RULE – EXAMPLE 3
    New instance: Color = Red, Type = SUV, Origin = Domestic
    P(Yes | New instance) ∝ P(Yes) × P(Color = Red | Yes) × P(Type = SUV | Yes) × P(Origin = Domestic | Yes)
                          = 5/10 × 3/5 × 1/5 × 2/5 = 0.024
    P(No | New instance)  ∝ P(No) × P(Color = Red | No) × P(Type = SUV | No) × P(Origin = Domestic | No)
                          = 5/10 × 2/5 × 3/5 × 3/5 = 0.072
  • 50. 10/23/2024 52 BAYES RULE – Cont…
    New instance: Color = Red, Type = SUV, Origin = Domestic
    Normalizing so that P(Yes | New instance) + P(No | New instance) = 1:
    P(Yes | New instance) = 0.024 / (0.024 + 0.072) = 0.024 / 0.096 = 0.25
    P(No | New instance)  = 0.072 / (0.024 + 0.072) = 0.072 / 0.096 = 0.75
  • 51. 10/23/2024 53 BAYES RULE – Cont…
    New instance: Color = Red, Type = SUV, Origin = Domestic
    Since P(No | New instance) > P(Yes | New instance), i.e. 0.75 > 0.25, the prediction is that the vehicle is not stolen.
  • 52. 10/23/2024 54 Bayesian Network  Joint probability distribution  Bayesian networks with examples  Semantics of Bayesian networks  Representing the full joint distribution  A method for constructing Bayesian network  Compactness and node ordering  Conditional independence relation in Bayesian networks.
  • 53. 10/23/2024 56 JPD – cont….
     The full joint probability distribution specifies a probability for every possible assignment of values to the random variables.
     It is usually too large to create or use in its explicit form.
     Joint probability distribution of two Boolean variables X and Y:
        Joint Probabilities |  X   |  X'
        Y                   | 0.20 | 0.12
        Y'                  | 0.65 | 0.03
     A joint probability distribution over n Boolean variables requires 2^n entries, one for every possible combination of values.
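    As a short worked illustration (not on the slide itself), marginal and conditional probabilities can be read directly off the 2×2 joint table above by summing rows and columns:

```latex
\begin{aligned}
P(Y)        &= P(X, Y) + P(X', Y)  = 0.20 + 0.12 = 0.32 \\
P(X)        &= P(X, Y) + P(X, Y')  = 0.20 + 0.65 = 0.85 \\
P(X \mid Y) &= \frac{P(X, Y)}{P(Y)} = \frac{0.20}{0.32} = 0.625
\end{aligned}
```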
  • 54. 10/23/2024 57 Drawbacks of Joint Probability Distributions
     The number of variables is large, and the table grows exponentially with it.
     Time and space complexity are huge.
     Statistical estimation of so many probabilities is difficult.
     Humans tend to single out only a few propositions.
     The alternative to this is Bayesian networks.
  • 55. 10/23/2024 58 Bayesian Networks
     A Bayesian network is a data structure also called a belief network or a probabilistic network.
     The extension of a Bayesian network is called a decision network or influence diagram.
     A Bayesian network represents the dependencies among variables and gives a concise specification of any full joint probability distribution.
     A Bayesian network is a directed graph in which each node is a variable and each edge is a direct dependence (relation) between variables.
  • 56. 10/23/2024 59 BAYESIAN NETWORKS
     A Bayesian network is a directed graph in which each node is annotated with quantitative probability information.
     The full specification is as follows:
     A set of random variables makes up the nodes of the network. Variables may be discrete or continuous.
     A set of directed links (arrows) connects pairs of nodes. If there is an arrow from node X to node Y, X is a parent of Y.
     Each node X has a conditional probability distribution P(X | Parents(X)) that quantifies the effect of the parents on the node.
     The graph is a Directed Acyclic Graph (DAG) – it has no directed cycles. X → Y
  • 57. 10/23/2024 60 BAYESIAN NETWORK - EXAMPLE
     A and B are unconditional, independent, evidence (parent) nodes.
     C and D are conditional, dependent, hypothesis (child) nodes.
     (Network structure, as used on the next slide: A → C, A → D, B → D) Contd..
  • 58. 10/23/2024 61 BAYESIAN NETWORK – EXAMPLE cont… P(A, B, C, D) = P(D|A,B) × P(C|A) × P(B) × P(A)
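    To see the saving this factorization buys, assume (an assumption, since the slide does not say) that all four variables are Boolean. The network's conditional probability tables then need far fewer numbers than the full joint table:

```latex
\begin{aligned}
\text{Full joint distribution: } & 2^{4} - 1 = 15 \text{ independent entries} \\
\text{Network CPTs: } & \underbrace{1}_{P(A)} + \underbrace{1}_{P(B)} + \underbrace{2}_{P(C \mid A)} + \underbrace{4}_{P(D \mid A,B)} = 8 \text{ entries}
\end{aligned}
```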
  • 60. 10/23/2024 63 Bayesian Network – Burglar Alarm
     You have installed a new burglar alarm at home.
     It is fairly reliable at detecting a burglary, but it also occasionally responds to minor earthquakes.
     You also have two neighbors, John and Mary, who have promised to call you at work when they hear the alarm.
     John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then, too.
     Mary, on the other hand, likes loud music and sometimes misses the alarm altogether.
     Given the evidence of who has or has not called, we would like to estimate the probability of a burglary.
  • 61. 10/23/2024 64
     Burglary and earthquakes directly affect the probability of the alarm going off.
     Whether John and Mary call depends only on the alarm.
     The network does not have nodes for Mary currently listening to loud music or for the telephone ringing and confusing John.
  • 62. 10/23/2024 65 EXAMPLE
     We can calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both John and Mary call: P(j ∧ m ∧ a ∧ ¬b ∧ ¬e).
     P(j, m, a, ¬b, ¬e) = P(j | a) × P(m | a) × P(a | ¬b ∧ ¬e) × P(¬b) × P(¬e)
                        = 0.90 × 0.70 × 0.001 × 0.999 × 0.998
                        = 0.000628
  • 63. 10/23/2024 66 What is the probability that John calls?
    P(j) = P(j | a) × P(a) + P(j | ¬a) × P(¬a)
         = P(j | a) × [P(a | b, e) P(b) P(e) + P(a | ¬b, e) P(¬b) P(e) + P(a | b, ¬e) P(b) P(¬e) + P(a | ¬b, ¬e) P(¬b) P(¬e)]
         + P(j | ¬a) × [P(¬a | b, e) P(b) P(e) + P(¬a | ¬b, e) P(¬b) P(e) + P(¬a | b, ¬e) P(b) P(¬e) + P(¬a | ¬b, ¬e) P(¬b) P(¬e)]
         = 0.90 × (0.95×0.001×0.002 + 0.29×(1−0.001)×0.002 + 0.94×0.001×(1−0.002) + 0.001×(1−0.001)×(1−0.002))
         + 0.05 × (0.05×0.001×0.002 + 0.71×(1−0.001)×0.002 + 0.06×0.001×(1−0.002) + 0.999×(1−0.001)×(1−0.002))
         = 0.90 × 0.002516442 + 0.05 × 0.997483558
         = 0.0022647978 + 0.0498741779
         = 0.0521389754
  • 64. 10/23/2024 67 What is the probability of burglary, given that John and Mary call?
    P(b | j, m) = α P(b) Σe P(e) Σa P(a | b, e) P(j | a) P(m | a)
    Reordering the sums so that Alarm is summed over on the outside:
    P(b | j, m) = α P(b) Σa P(j | a) P(m | a) Σe P(e) P(a | b, e)
                = α × 0.001 × [0.90 × 0.70 × (0.95 × 0.002 + 0.94 × 0.998) + 0.05 × 0.01 × (0.05 × 0.002 + 0.06 × 0.998)]
                = α × 0.001 × [0.63 × (0.0019 + 0.93812) + 0.0005 × (0.0001 + 0.05988)]
                = α × 0.001 × [0.63 × 0.94002 + 0.0005 × 0.05998]
                = α × 0.001 × [0.5922126 + 0.00002999]
                = α × 0.001 × 0.59224259
                = α × 0.00059224259
    Computing the corresponding value for ¬b and normalizing gives P(Burglary | j, m) ≈ (0.284, 0.716).
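    The same answer can be reproduced mechanically by summing out the hidden variables. Below is a minimal sketch (not from the slides): the dictionary-based CPT representation and the helper function joint are illustrative choices, but the probability values are the ones quoted in the calculations above.

```python
# Minimal sketch: inference by enumeration on the burglary network,
# using the CPT numbers quoted above.
from itertools import product

P_B = {True: 0.001, False: 0.999}                     # P(Burglary)
P_E = {True: 0.002, False: 0.998}                     # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,       # P(Alarm = true | Burglary, Earthquake)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                       # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}                       # P(MaryCalls = true | Alarm)

def joint(b, e, a, j, m):
    """Chain-rule product P(b, e, a, j, m) for one complete assignment."""
    p = P_B[b] * P_E[e]
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(Burglary | JohnCalls = true, MaryCalls = true): sum out the hidden
# variables (Earthquake, Alarm), then normalize over Burglary.
unnorm = {b: sum(joint(b, e, a, True, True)
                 for e, a in product([True, False], repeat=2))
          for b in (True, False)}
alpha = sum(unnorm.values())
print({b: p / alpha for b, p in unnorm.items()})   # ≈ {True: 0.284, False: 0.716}
```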
  • 65. 10/23/2024 68
     In the burglary network, the topology shows that
     Burglary and Earthquake directly affect the probability of the Alarm,
     while whether John and Mary call depends only on the Alarm.
     Our assumptions encoded in the network:
     they do not perceive any burglaries directly,
     they do not notice minor earthquakes, and
     they do not confer before calling.
  • 66. 10/23/2024 69
     Notice that the network does not have nodes corresponding to
     Mary currently listening to loud music, or
     the telephone ringing and confusing John.
     These factors are summarized in the uncertainty associated with the links from Alarm to JohnCalls and MaryCalls.
     This shows both laziness and ignorance in operation.
  • 67. 10/23/2024 70 Semantics of Bayesian Network
     An entry in the joint distribution is the probability of a conjunction of particular assignments to each variable, such as
     P(X1 = x1 ∧ X2 = x2 ∧ … ∧ Xn = xn), abbreviated P(x1, x2, …, xn).
     P(x1, x2, …, xn) = Π (i = 1 to n) P(xi | parents(Xi))
     Xi is a random variable and xi is the value of Xi.
  • 68. 10/23/2024 71 METHOD FOR CONSTRUCTING BAYESIAN NETWORK  Rewrite the joint distribution in terms of a conditional probability, using the product rule.  Then we repeat the process, reducing each conjunctive probability to a conditional probability and a smaller conjunction. We end up with one big product.
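    Written out, the derivation the slide describes is the repeated application of the product rule (chain rule), followed by the conditional-independence assumption of the network:

```latex
\begin{aligned}
P(x_1, \ldots, x_n)
  &= P(x_n \mid x_{n-1}, \ldots, x_1)\, P(x_{n-1}, \ldots, x_1) \\
  &= P(x_n \mid x_{n-1}, \ldots, x_1)\, P(x_{n-1} \mid x_{n-2}, \ldots, x_1) \cdots P(x_2 \mid x_1)\, P(x_1) \\
  &= \prod_{i=1}^{n} P(x_i \mid x_{i-1}, \ldots, x_1) \\
  &= \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i)),
     \quad \text{provided } \mathrm{Parents}(X_i) \subseteq \{X_{i-1}, \ldots, X_1\}.
\end{aligned}
```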
  • 70. 10/23/2024 73 COMPACTNESS AND NODE ORDERING
     The compactness of a Bayesian network is an example of a general property of locally structured (also called sparse) systems.
     In a locally structured system, each subcomponent interacts directly with only a bounded number of other components, regardless of the total number of components.
     Therefore the correct order in which to add nodes is to add the ‘root causes’ first, then the variables they influence, and so on until we reach the leaves.
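    To make the compactness claim concrete (a standard illustration, assuming n Boolean variables each with at most k parents; the figures below are not from the slide):

```latex
\begin{aligned}
\text{Network CPT entries: } & n \cdot 2^{k} \\
\text{Full joint entries: }  & 2^{n} \\
\text{e.g. } n = 30,\ k = 5: \quad & 30 \cdot 2^{5} = 960
  \quad \text{vs.} \quad 2^{30} \approx 1.07 \times 10^{9}.
\end{aligned}
```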
  • 71. 10/23/2024 74 Suppose we decide to add the nodes in the order MaryCalls, JohnCalls, Alarm, Burglary, Earthquake
     Adding MaryCalls: No parents.
     Adding JohnCalls: If Mary calls, that probably means the alarm has gone off, which of course would make it more likely that John calls. Therefore, JohnCalls needs MaryCalls as a parent.
     Adding Alarm: Clearly, if both call, it is more likely that the alarm has gone off than if just one or neither calls, so we need both MaryCalls and JohnCalls as parents.
     Adding Burglary: If we know the alarm state, then the call from John or Mary might give us information about our phone ringing or Mary’s music, but not about burglary: P(Burglary | Alarm, JohnCalls, MaryCalls) = P(Burglary | Alarm). Hence we need just Alarm as parent.
     Adding Earthquake: If the alarm is on, it is more likely that there has been an earthquake. (The alarm is an earthquake detector of sorts.) But if we know that there has been a burglary, then that explains the alarm, and the probability of an earthquake would be only slightly above normal. Hence, we need both Alarm and Burglary as parents.
  • 72. 10/23/2024 75 CONDITIONAL INDEPENDENCE RELATIONS IN BAYESIAN NETWORKS
     A node is conditionally independent of its non-descendants, given its parents.
     Example:
     JohnCalls is conditionally independent of Burglary and Earthquake given Alarm; it depends only on Alarm.
  • 73. 10/23/2024 76
     A node is conditionally independent of all other nodes in the network, given its parents, children, and children’s parents – that is, given its Markov blanket.
     Example:
     Burglary is independent of JohnCalls and MaryCalls given Alarm and Earthquake.
  • 74. 10/23/2024 77 BAYESIAN INFERENCE
     The task of a probabilistic inference system is to compute the posterior probability distribution for a set of query variables, given some observed event –
     that is, some assignment of values to a set of evidence variables.
  • 75. 10/23/2024 78 BAYESIAN INFERENCE – Notations
     X – the query variable.
     E – the set of evidence variables {E1, …, Em}.
     e – a particular observed event.
     Y – the non-evidence, non-query variables Y1, …, Yn (called the hidden variables).
     The complete set of variables is {X} ∪ E ∪ Y.
     A typical query asks for the posterior probability distribution P(X | e).
  • 76. 10/23/2024 79 Inferences in Bayesian network - Burglary Alarm
     In the burglary network, we might observe the event in which JohnCalls = true and MaryCalls = true.
     We could then ask for, say, the probability that a burglary has occurred: P(Burglary | JohnCalls = true, MaryCalls = true) = (0.284, 0.716)
     Burglary – query variable
     JohnCalls and MaryCalls – evidence variables
     Alarm and Earthquake – hidden variables
  • 77. 10/23/2024 80 TYPES OF INFERENCE
     Inference by enumeration – inference by listing (enumerating) all values of the hidden variables and summing them out.
     Inference by variable elimination – inference by removing (summing out) variables one at a time and reusing intermediate results.
  • 78. 10/23/2024 81 INFERENCE BY ENUMERATION
     Any conditional probability can be computed by summing terms from the full joint distribution.
     More specifically, a query P(X | e) can be answered using the equation:
     P(X | e) = α P(X, e) = α Σy P(X, e, y)
     where α is a normalization constant,
     X – query variable,
     e – observed event (values of the evidence variables E),
     y – values of the hidden variables Y, summed over all combinations.
  • 79. 10/23/2024 82 INFERENCE BY ENUMERATION - Example
     Consider the query P(Burglary | JohnCalls = true, MaryCalls = true).
     Burglary – query variable (X)
     JohnCalls – evidence variable 1 (E1)
     MaryCalls – evidence variable 2 (E2)
     The hidden variables of this query are Earthquake and Alarm.
  • 80. 10/23/2024 83
     From the equation, using initial letters for the variables to shorten the expressions, we have
     P(B | j, m) = α P(B, j, m) = α Σe Σa P(B, j, m, e, a)
     The semantics of Bayesian networks then gives us an expression in terms of CPT entries. For simplicity, we do this just for Burglary = true:
     P(b | j, m) = α Σe Σa P(b) P(e) P(a | b, e) P(j | a) P(m | a)
     P(b) – parent (independent) variable
     P(e) – parent (independent) variable
     P(a | b, e) – Alarm is the hidden variable, dependent on Burglary and Earthquake
     P(j | a), P(m | a) – evidence variables, dependent on Alarm
  • 81. 10/23/2024 84 INFERENCE BY VARIABLE ELIMINATION
     The enumeration algorithm can be improved substantially by eliminating repeated calculations.
     The idea is simple: do each calculation once and save the result for later use. This is a form of dynamic programming.
  • 82. 10/23/2024 85 INFERENCE BY VARIABLE ELIMINATION
     Variable elimination works by evaluating expressions such as the one derived in inference by enumeration:
     P(B | j, m) = α Σe Σa P(B) P(e) P(a | B, e) P(j | a) P(m | a)
     Terms that do not depend on a summation variable are moved outside that sum, so repeated sub-expressions are separated out:
     P(B | j, m) = α P(B) Σe P(e) Σa P(a | B, e) P(j | a) P(m | a)
  • 83. 10/23/2024 86 INFERENCE BY VARIABLE ELIMINATION
     Intermediate results are stored, and the summation over each variable is done only for those portions of the expression that depend on that variable.
     Let us illustrate this process for the burglary network. We evaluate the expression
     P(B | j, m) = α f1(B) × Σe f2(E) × Σa f3(A, B, E) × f4(A) × f5(A)
     Each part of the expression has been annotated with the name of the associated variable; these parts are called factors:
     P(B) → f1(B), P(E) → f2(E), P(A | B, E) → f3(A, B, E), P(j | A) → f4(A), P(m | A) → f5(A)
  • 84. 10/23/2024 87
     For example, the factors f4(A) and f5(A), corresponding to P(j | a) and P(m | a), depend just on A because J and M are fixed by the query.
     They are therefore two-element vectors: f4(A) = ⟨P(j | a), P(j | ¬a)⟩ = ⟨0.90, 0.05⟩ and f5(A) = ⟨P(m | a), P(m | ¬a)⟩ = ⟨0.70, 0.01⟩.
  • 85. 10/23/2024 88 INFERENCE BY VARIABLE ELIMINATION - EXAMPLE
     Given two factors f1(A, B) and f2(B, C) with probability distributions shown below, the pointwise product f1 × f2 = f3(A, B, C) has 2^(1+1+1) = 8 entries.
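    The following is a minimal sketch (not from the slides) of the pointwise product for Boolean variables. The dict-based factor representation and the numbers in the example tables are illustrative assumptions; the slide's actual f1 and f2 tables are not reproduced here. It simply shows that the product factor f3 ranges over the union of the variables, giving 2^(1+1+1) = 8 entries.

```python
# Minimal sketch: pointwise product of two factors over Boolean variables,
# as used in variable elimination.  Factors are dicts from value tuples to numbers.
from itertools import product

def pointwise_product(f1, vars1, f2, vars2):
    """Multiply factors f1(vars1) and f2(vars2); the result ranges over their union."""
    out_vars = list(dict.fromkeys(vars1 + vars2))          # union of variables, order preserved
    f3 = {}
    for values in product([True, False], repeat=len(out_vars)):
        assign = dict(zip(out_vars, values))
        key1 = tuple(assign[v] for v in vars1)              # restrict assignment to f1's variables
        key2 = tuple(assign[v] for v in vars2)              # restrict assignment to f2's variables
        f3[values] = f1[key1] * f2[key2]
    return out_vars, f3

# Illustrative (made-up) tables for f1(A, B) and f2(B, C):
f1 = {(True, True): 0.3, (True, False): 0.7, (False, True): 0.9, (False, False): 0.1}
f2 = {(True, True): 0.2, (True, False): 0.8, (False, True): 0.6, (False, False): 0.4}

vars3, f3 = pointwise_product(f1, ["A", "B"], f2, ["B", "C"])
print(vars3, len(f3))   # ['A', 'B', 'C'] 8  -> the product has 2^(1+1+1) = 8 entries
```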