Bayesian Networks:
Independencies and Inference
Scott Davies and Andrew Moore
Note to other teachers and users of these slides. Andrew and Scott
would be delighted if you found this source material useful in giving
your own lectures. Feel free to use these slides verbatim, or to modify
them to fit your own needs. PowerPoint originals are available. If you
make use of a significant portion of these slides in your own lecture,
please include this message, or the following link to the source
repository of Andrew’s tutorials:
http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections
gratefully received.
What Independencies does a Bayes Net Model?
• In order for a Bayesian network to model a
probability distribution, the following must be true by
definition:
Each variable is conditionally independent of all its non-descendants in the graph given the value of all its parents.
• This implies the joint distribution factors as
P(X1, …, Xn) = ∏_{i=1..n} P(Xi | parents(Xi))
• But what else does it imply?
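A minimal Python sketch of this factorization, using a hypothetical three-node chain with made-up CPT numbers:

```python
# Hypothetical network: Cloudy -> Rain -> WetGrass.
# Each CPT maps an assignment of the node's parents to P(node = True | parents).
cpts = {
    "Cloudy":   {"parents": [],         "p_true": {(): 0.5}},
    "Rain":     {"parents": ["Cloudy"], "p_true": {(True,): 0.8, (False,): 0.1}},
    "WetGrass": {"parents": ["Rain"],   "p_true": {(True,): 0.9, (False,): 0.2}},
}

def joint_probability(assignment):
    """P(X1,...,Xn) = product over i of P(Xi | parents(Xi))."""
    prob = 1.0
    for var, cpt in cpts.items():
        parent_values = tuple(assignment[p] for p in cpt["parents"])
        p_true = cpt["p_true"][parent_values]
        prob *= p_true if assignment[var] else (1.0 - p_true)
    return prob

print(joint_probability({"Cloudy": True, "Rain": True, "WetGrass": False}))
# 0.5 * 0.8 * (1 - 0.9) = 0.04
```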
What Independencies does a Bayes Net Model?
• Example:
[Diagram: chain Z → Y → X]
Given Y, does learning the value of Z tell us
nothing new about X?
I.e., is P(X|Y, Z) equal to P(X | Y)?
Yes. Since we know the value of all of X’s
parents (namely, Y), and Z is not a
descendant of X, X is conditionally
independent of Z.
Also, since independence is symmetric,
P(Z|Y, X) = P(Z|Y).
Quick proof that independence is symmetric
• Assume: P(X|Y, Z) = P(X|Y)
• Then:
P(Z | X, Y) = P(X, Y | Z) P(Z) / P(X, Y)                (Bayes’s Rule)
= P(X | Y, Z) P(Y | Z) P(Z) / [P(X | Y) P(Y)]           (Chain Rule)
= P(X | Y) P(Y | Z) P(Z) / [P(X | Y) P(Y)]              (By Assumption)
= P(Y | Z) P(Z) / P(Y) = P(Z | Y)                       (Bayes’s Rule)
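The algebra can be spot-checked numerically. A small sketch: build a joint distribution over three binary variables in which X and Z are independent given Y by construction (the numbers below are arbitrary), then compare P(Z | Y, X) with P(Z | Y).

```python
from itertools import product

# Construct P(X, Y, Z) = P(Y) * P(X | Y) * P(Z | Y), which forces X and Z
# to be conditionally independent given Y.
p_y = {True: 0.3, False: 0.7}
p_x_given_y = {True: 0.9, False: 0.2}   # P(X=True | Y)
p_z_given_y = {True: 0.6, False: 0.1}   # P(Z=True | Y)

joint = {}
for x, y, z in product([True, False], repeat=3):
    px = p_x_given_y[y] if x else 1 - p_x_given_y[y]
    pz = p_z_given_y[y] if z else 1 - p_z_given_y[y]
    joint[(x, y, z)] = p_y[y] * px * pz

def cond(target, given):
    """P(target | given); both are dicts over the names 'X', 'Y', 'Z'."""
    def matches(key, constraints):
        vals = dict(zip(("X", "Y", "Z"), key))
        return all(vals[k] == v for k, v in constraints.items())
    num = sum(p for k, p in joint.items() if matches(k, {**given, **target}))
    den = sum(p for k, p in joint.items() if matches(k, given))
    return num / den

print(cond({"Z": True}, {"Y": True, "X": True}))   # P(Z | Y, X)
print(cond({"Z": True}, {"Y": True}))              # P(Z | Y) -- identical (0.6)
```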
What Independencies does a Bayes Net Model?
• Let I<X,Y,Z> represent X and Z being conditionally
independent given Y.
• I<X,Y,Z>? Yes, just as in the previous example: all of X’s parents are given, and Z is not a descendant.
[Diagram: Y is the parent of both X and Z]
What Independencies does a Bayes Net Model?
• I<X,{U},Z>? No.
• I<X,{U,V},Z>? Yes.
• Maybe I<X, S, Z> iff S acts as a cutset between X and Z in an undirected version of the graph…?
[Diagram: network over X, U, V, Z with two paths between X and Z, one through U and one through V]
Things get a little more confusing
• X has no parents, so we know all its parents’ values trivially
• Z is not a descendant of X
• So, I<X,{},Z>, even though there’s an undirected path from X to Z through an unknown variable Y.
• What if we do know the value of Y, though? Or one
of its descendants?
[Diagram: X → Y ← Z]
The “Burglar Alarm” example
• Your house has a twitchy burglar alarm that is also
sometimes triggered by earthquakes.
• The earth arguably doesn’t care whether your house is currently being burgled.
• While you are on vacation, one of your neighbors calls
and tells you your home’s burglar alarm is ringing. Uh
oh!
[Network: Burglar → Alarm ← Earthquake; Alarm → Phone Call]
Things get a lot more confusing
• But now suppose you learn that there was a medium-sized
earthquake in your neighborhood. Oh, whew! Probably not a
burglar after all.
• Earthquake “explains away” the hypothetical burglar.
• But then it must not be the case that
I<Burglar,{Phone Call}, Earthquake>, even though
I<Burglar,{}, Earthquake>!
[Network: Burglar → Alarm ← Earthquake; Alarm → Phone Call]
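To see the effect numerically, the following sketch assigns assumed CPT numbers to this network (the probabilities are illustrative, not from the slides) and compares P(Burglar | Call) with P(Burglar | Call, Earthquake) by brute-force enumeration; the posterior on Burglar drops sharply once the earthquake is known.

```python
from itertools import product

# Assumed CPTs for Burglar -> Alarm <- Earthquake, Alarm -> Call.
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | Burglar, Earthquake)
P_C = {True: 0.9, False: 0.05}                       # P(Call | Alarm)

def joint(b, e, a, c):
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pc = P_C[a] if c else 1 - P_C[a]
    return pb * pe * pa * pc

def prob_burglar(earthquake=None):
    """P(Burglar=True | Call=True [, Earthquake=earthquake]) by enumeration."""
    num = den = 0.0
    for b, e, a in product([True, False], repeat=3):
        if earthquake is not None and e != earthquake:
            continue
        p = joint(b, e, a, True)          # condition on Call = True
        den += p
        if b:
            num += p
    return num / den

print(prob_burglar())                  # P(Burglar | Call), roughly 0.016
print(prob_burglar(earthquake=True))   # P(Burglar | Call, Earthquake), roughly 0.003
```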
d-separation to the rescue
• Fortunately, there is a relatively simple algorithm for
determining whether two variables in a Bayesian
network are conditionally independent: d-separation.
• Definition: X and Z are d-separated by a set of
evidence variables E iff every undirected path from X
to Z is “blocked”, where a path is “blocked” iff one
or more of the following conditions is true: ...
A path is “blocked” when...
• There exists a variable V on the path such that
• it is in the evidence set E
• the arcs putting V in the path are “tail-to-tail”
• Or, there exists a variable V on the path such that
• it is in the evidence set E
• the arcs putting V in the path are “tail-to-head”
• Or, ...
[Diagrams: a common-cause node ← V → for the tail-to-tail case, and a chain node → V → for the tail-to-head case]
A path is “blocked” when… (the funky case)
• … Or, there exists a variable V on the path such that
• it is NOT in the evidence set E
• neither are any of its descendants
• the arcs putting V on the path are “head-to-head”
[Diagram: a collider → V ← for the head-to-head case]
d-separation to the rescue, cont’d
• Theorem [Verma & Pearl, 1988]:
• If a set of evidence variables E d-separates X and Z in
a Bayesian network’s graph, then I<X, E, Z>.
• d-separation can be computed in linear time using a
depth-first-search-like algorithm.
• Great! We now have a fast algorithm for automatically
inferring whether learning the value of one variable
might give us any additional hints about some other
variable, given what we already know.
• “Might”: Variables may actually be independent when they’re not d-
separated, depending on the actual probabilities involved
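A minimal sketch of a d-separation test, assuming the network is given as a dict of parent lists: it enumerates every undirected path between the two nodes (exponential in the worst case, so this is not the linear-time algorithm mentioned above) and applies the three blocking rules directly.

```python
from itertools import chain

def d_separated(parents, x, z, evidence):
    """True iff every undirected path from x to z is blocked by the evidence set."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)

    def descendants(v):
        out, stack = set(), [v]
        while stack:
            for c in children[stack.pop()]:
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out

    def paths(cur, goal, visited):
        if cur == goal:
            yield list(visited)
            return
        for nxt in chain(parents[cur], children[cur]):
            if nxt not in visited:
                yield from paths(nxt, goal, visited + [nxt])

    def blocked(path):
        for i in range(1, len(path) - 1):
            prev, v, nxt = path[i - 1], path[i], path[i + 1]
            collider = prev in parents[v] and nxt in parents[v]  # head-to-head at v
            if collider:
                if v not in evidence and not (descendants(v) & evidence):
                    return True
            elif v in evidence:            # tail-to-tail or tail-to-head with v observed
                return True
        return False

    return all(blocked(p) for p in paths(x, z, [x]))

# The collider example from the slides: X -> Y <- Z.
parents = {"X": [], "Z": [], "Y": ["X", "Z"]}
print(d_separated(parents, "X", "Z", set()))    # True: I<X, {}, Z>
print(d_separated(parents, "X", "Z", {"Y"}))    # False: observing Y unblocks the path
```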
d-separation example
[Diagram: example network over the nodes A, B, C, D, E, F, G, H, I, J]
•I<C, {}, D>?
•I<C, {A}, D>?
•I<C, {A, B}, D>?
•I<C, {A, B, J}, D>?
•I<C, {A, B, E, J}, D>?
Bayesian Network Inference
• Inference: calculating P(X|Y) for some variables or
sets of variables X and Y.
• Inference in Bayesian networks is #P-hard!
Counting satisfying assignments of a Boolean formula (“How many satisfying assignments?”) reduces to inference:
[Diagram: input nodes I1 … I5 feeding a single output node O]
Give the inputs prior probabilities of .5; P(O) must then be (# satisfying assignments) × (.5 ^ # inputs).
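A small sketch of why this reduction works, using a hypothetical three-input Boolean formula: with each input given prior probability 0.5 and the output O a deterministic function of the inputs, P(O = true) equals (# satisfying assignments) × 0.5^(# inputs), so computing the marginal exactly amounts to counting satisfying assignments.

```python
from itertools import product

def formula(i1, i2, i3):
    # Hypothetical Boolean formula that the output node O computes deterministically.
    return (i1 or i2) and (not i2 or i3)

n_inputs = 3
p_o_true = 0.0
sat_count = 0
for bits in product([True, False], repeat=n_inputs):
    if formula(*bits):
        sat_count += 1
        p_o_true += 0.5 ** n_inputs   # each full input assignment has probability 0.5^n

print(p_o_true, sat_count * 0.5 ** n_inputs)   # identical by construction
```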
Bayesian Network Inference
• But…inference is still tractable in some cases.
• Let’s look at a special class of networks: trees / forests in which each node has at most one parent.
Decomposing the probabilities
• Suppose we want P(Xi | E) where E is some set of evidence
variables.
• Let’s split E into two parts:
• Ei⁻ is the part consisting of assignments to variables in the subtree rooted at Xi
• Ei⁺ is the rest of it
Decomposing the probabilities, cont’d
P(Xi | E) = P(Xi | Ei⁻, Ei⁺)
Decomposing the probabilities, cont’d
P(Xi | E) = P(Xi | Ei⁻, Ei⁺)
= P(Ei⁻ | Xi, Ei⁺) P(Xi | Ei⁺) / P(Ei⁻ | Ei⁺)
Decomposing the probabilities, cont’d
P(Xi | E) = P(Xi | Ei⁻, Ei⁺)
= P(Ei⁻ | Xi, Ei⁺) P(Xi | Ei⁺) / P(Ei⁻ | Ei⁺)
= P(Ei⁻ | Xi) P(Xi | Ei⁺) / P(Ei⁻ | Ei⁺)
Decomposing the probabilities, cont’d
P(Xi | E) = P(Xi | Ei⁻, Ei⁺)
= P(Ei⁻ | Xi, Ei⁺) P(Xi | Ei⁺) / P(Ei⁻ | Ei⁺)
= P(Ei⁻ | Xi) P(Xi | Ei⁺) / P(Ei⁻ | Ei⁺)
= α π(Xi) λ(Xi)
Where:
• α is a constant independent of Xi
• π(Xi) = P(Xi | Ei⁺)
• λ(Xi) = P(Ei⁻ | Xi)
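In code, α is simply whatever constant renormalizes the pointwise product of π and λ over the values of Xi; a minimal sketch:

```python
def posterior(pi, lam):
    """P(Xi | E) = alpha * pi(Xi) * lambda(Xi); alpha normalizes over the values of Xi."""
    unnormalized = {v: pi[v] * lam[v] for v in pi}
    alpha = 1.0 / sum(unnormalized.values())
    return {v: alpha * p for v, p in unnormalized.items()}

print(posterior({True: 0.6, False: 0.4}, {True: 0.9, False: 0.2}))
# {True: 0.87, False: 0.13} (approximately)
```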
Using the decomposition for inference
• We can use this decomposition to do inference as follows. First, compute λ(Xi) = P(Ei⁻ | Xi) for all Xi recursively, using the leaves of the tree as the base case.
• If Xi is a leaf:
• If Xi is in E: λ(Xi) = 1 if Xi matches E, 0 otherwise
• If Xi is not in E: Ei⁻ is the null set, so P(Ei⁻ | Xi) = 1 (constant)
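A minimal sketch of this base case, assuming binary variables and evidence supplied as a dict from variable name to observed value:

```python
def lambda_leaf(name, values, evidence):
    """lambda(Xi) for a leaf: an indicator of the evidence if Xi is observed, else all ones."""
    if name in evidence:
        return {v: (1.0 if v == evidence[name] else 0.0) for v in values}
    return {v: 1.0 for v in values}   # Ei- is empty, so P(Ei- | Xi) = 1

print(lambda_leaf("X3", [True, False], {"X3": True}))   # {True: 1.0, False: 0.0}
print(lambda_leaf("X4", [True, False], {}))             # {True: 1.0, False: 1.0}
```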
Quick aside: “Virtual evidence”
• For theoretical simplicity, but without loss of
generality, let’s assume that all variables in E (the
evidence set) are leaves in the tree.
• Why we can do this WLOG:
[Diagram: an observed internal node Xi is given a new leaf child Xi′]
Observing Xi is equivalent to observing Xi′, where P(Xi′ | Xi) = 1 if Xi′ = Xi, and 0 otherwise.
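A sketch of the same trick in code, with hypothetical names (the `_obs` suffix and dict layout are assumptions): each observed internal node gets a fresh leaf child with an identity CPT, and the observation is moved onto that leaf.

```python
def add_virtual_evidence(parents, cpts, evidence):
    """Move observations on internal nodes onto new identity-CPT leaf children."""
    new_evidence = {}
    for var, val in evidence.items():
        leaf = var + "_obs"                  # hypothetical name for the virtual leaf
        parents[leaf] = [var]
        # Identity CPT: P(leaf = v' | parent = v) = 1 iff v' == v.
        cpts[leaf] = lambda leaf_val, parent_val: 1.0 if leaf_val == parent_val else 0.0
        new_evidence[leaf] = val
    return new_evidence

parents = {"A": [], "B": ["A"]}
cpts = {}   # CPTs for A and B omitted in this sketch
ev = add_virtual_evidence(parents, cpts, {"A": True})
print(parents["A_obs"], ev)   # ['A'] {'A_obs': True}
```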
Calculating (Xi) for non-leaves
• Suppose Xi has one child, Xc.
• Then:
Xi
Xc

 
)
|
(
)
(
λ i
i
i X
E
P
X
Calculating (Xi) for non-leaves
• Suppose Xi has one child, Xc.
• Then:
Xi
Xc
 

 

j
i
C
i
i
i
i X
j
X
E
P
X
E
P
X )
|
,
(
)
|
(
)
(
λ
Calculating (Xi) for non-leaves
• Suppose Xi has one child, Xc.
• Then:
Xi
Xc











j
C
i
i
i
C
j
i
C
i
i
i
i
j
X
X
E
P
X
j
X
P
X
j
X
E
P
X
E
P
X
)
,
|
(
)
|
(
)
|
,
(
)
|
(
)
(
λ
Calculating (Xi) for non-leaves
• Suppose Xi has one child, Xc.
• Then:
Xi
Xc




















j
C
i
C
j
C
i
i
C
j
C
i
i
i
C
j
i
C
i
i
i
i
j
X
X
j
X
P
j
X
E
P
X
j
X
P
j
X
X
E
P
X
j
X
P
X
j
X
E
P
X
E
P
X
)
(
λ
)
|
(
)
|
(
)
|
(
)
,
|
(
)
|
(
)
|
,
(
)
|
(
)
(
λ
Calculating (Xi) for non-leaves
• Now, suppose Xi has a set of children, C.
• Since Xi d-separates each of its subtrees, the contribution
of each subtree to (Xi) is independent:
 















C
X X
j
i
j
C
X
i
j
i
i
i
j j
j
X
X
X
P
X
X
E
P
X
)
λ(
)
|
(
)
(
λ
)
|
(
)
(
λ
where j(Xi) is the contribution to P(Ei
-
| Xi) of the part of
the evidence lying in the subtree rooted at one of Xi’s
children Xj.
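Putting the recursion together, here is a sketch assuming binary variables, a `children` map describing the tree, and CPTs indexed as cpt[child][(child_value, parent_value)]; it computes λ bottom-up exactly as in the product-of-sums expression above.

```python
def compute_lambda(node, children, cpt, values, evidence, cache):
    """lambda(node)[v] = P(evidence below node | node = v), computed bottom-up."""
    if node in cache:
        return cache[node]
    if not children[node]:                                # leaf: base case
        if node in evidence:
            lam = {v: 1.0 if v == evidence[node] else 0.0 for v in values}
        else:
            lam = {v: 1.0 for v in values}
    else:
        lam = {v: 1.0 for v in values}
        for child in children[node]:
            child_lam = compute_lambda(child, children, cpt, values, evidence, cache)
            for v in values:                              # lambda_j contribution of this child
                lam[v] *= sum(cpt[child][(j, v)] * child_lam[j] for j in values)
    cache[node] = lam
    return lam

# Tiny chain A -> B with B observed True; P(B=True | A=True) = 0.9, P(B=True | A=False) = 0.2.
children = {"A": ["B"], "B": []}
cpt = {"B": {(True, True): 0.9, (False, True): 0.1,
             (True, False): 0.2, (False, False): 0.8}}
print(compute_lambda("A", children, cpt, [True, False], {"B": True}, {}))
# {True: 0.9, False: 0.2}, i.e. P(B=True | A)
```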
We are now λ-happy
• So now we have a way to recursively compute all the λ(Xi)’s, starting from the root and using the leaves as the base case.
• If we want, we can think of each node in the network as an autonomous processor that passes a little “λ message” to its parent.
[Diagram: λ messages flowing from each node up to its parent]
The other half of the problem
• Remember, P(Xi|E) = α π(Xi) λ(Xi). Now that we have all the λ(Xi)’s, what about the π(Xi)’s? π(Xi) = P(Xi | Ei⁺).
• What about the root of the tree, Xr? In that case, Er⁺ is the null set, so π(Xr) = P(Xr). No sweat. Since we also know λ(Xr), we can compute the final P(Xr).
• So for an arbitrary Xi with parent Xp, let’s inductively assume we know π(Xp) and/or P(Xp|E). How do we get π(Xi)?
Computing (Xi)
Xp
Xi

 
)
|
(
)
(
π i
i
i E
X
P
X
Computing (Xi)
Xp
Xi
 




j
i
p
i
i
i
i E
j
X
X
P
E
X
P
X )
|
,
(
)
|
(
)
(
π
Computing (Xi)
Xp
Xi












j
i
p
i
p
i
j
i
p
i
i
i
i
E
j
X
P
E
j
X
X
P
E
j
X
X
P
E
X
P
X
)
|
(
)
,
|
(
)
|
,
(
)
|
(
)
(
π
Computing (Xi)
Xp
Xi

















j
i
p
p
i
j
i
p
i
p
i
j
i
p
i
i
i
i
E
j
X
P
j
X
X
P
E
j
X
P
E
j
X
X
P
E
j
X
X
P
E
X
P
X
)
|
(
)
|
(
)
|
(
)
,
|
(
)
|
,
(
)
|
(
)
(
π
Computing (Xi)
Xp
Xi






















j p
i
p
p
i
j
i
p
p
i
j
i
p
i
p
i
j
i
p
i
i
i
i
j
X
E
j
X
P
j
X
X
P
E
j
X
P
j
X
X
P
E
j
X
P
E
j
X
X
P
E
j
X
X
P
E
X
P
X
)
(
λ
)
|
(
)
|
(
)
|
(
)
|
(
)
|
(
)
,
|
(
)
|
,
(
)
|
(
)
(
π
Computing (Xi)
Xp
Xi
Where i(Xp) is defined as


























j
p
i
p
i
j p
i
p
p
i
j
i
p
p
i
j
i
p
i
p
i
j
i
p
i
i
i
i
j
X
j
X
X
P
j
X
E
j
X
P
j
X
X
P
E
j
X
P
j
X
X
P
E
j
X
P
E
j
X
X
P
E
j
X
X
P
E
X
P
X
)
(
π
)
|
(
)
(
λ
)
|
(
)
|
(
)
|
(
)
|
(
)
|
(
)
,
|
(
)
|
,
(
)
|
(
)
(
π
)
(
λ
)
|
(
p
i
p
X
E
X
P
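A sketch of the corresponding downward pass, under the same conventions as the λ sketch earlier (CPTs indexed as cpt[node][(node_value, parent_value)]); it divides the parent’s posterior by the λ message this node sent upward, as in the definition of πi(Xp), and leaves any leftover constant to be absorbed by the final normalization α.

```python
def compute_pi(node, parent, cpt, values, prior, posterior_parent, lambda_msg_to_parent):
    """pi(node)[v] proportional to P(node = v | evidence above node)."""
    if parent is None:                       # root: E_r+ is empty, so pi is just the prior
        return dict(prior[node])
    pi = {}
    for v in values:
        total = 0.0
        for j in values:
            # pi_i(Xp = j) = P(Xp = j | E) / (lambda message this node sent to Xp).
            # Assumes the lambda message is nonzero; real code guards that case.
            pi_parent_i = posterior_parent[j] / lambda_msg_to_parent[j]
            total += cpt[node][(v, j)] * pi_parent_i
        pi[v] = total
    return pi

# Root example: pi(root) is simply its prior.
prior = {"R": {True: 0.6, False: 0.4}}
print(compute_pi("R", None, {}, [True, False], prior, None, None))
```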
We’re done. Yay!
• Thus we can compute all the π(Xi)’s, and, in turn, all the P(Xi|E)’s.
• Can think of nodes as autonomous processors passing λ and π messages to their neighbors.
[Diagram: λ messages passed up the tree and π messages passed down]
Conjunctive queries
• What if we want, e.g., P(A, B | C) instead of just
marginal distributions P(A | C) and P(B | C)?
• Just use chain rule:
• P(A, B | C) = P(A | C) P(B | A, C)
• Each of the latter probabilities can be computed
using the technique just discussed.
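A sketch of this chain-rule trick, assuming some routine `query(variable, evidence)` (for example, the message-passing procedure above wrapped up) that returns a marginal distribution as a dict:

```python
def conjunctive_query(query, a, b, evidence):
    """P(A, B | evidence) via P(A | evidence) * P(B | A, evidence)."""
    result = {}
    p_a = query(a, evidence)                          # P(A | C)
    for a_val, pa in p_a.items():
        p_b = query(b, {**evidence, a: a_val})        # P(B | A = a_val, C)
        for b_val, pb in p_b.items():
            result[(a_val, b_val)] = pa * pb
    return result

# Toy stand-in for an inference routine, just to exercise the helper.
toy = lambda var, ev: {True: 0.5, False: 0.5}
print(conjunctive_query(toy, "A", "B", {"C": True}))
```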
Polytrees
• The technique can be generalized to polytrees: the undirected version of the graph is still a tree, but nodes can have more than one parent.
Dealing with cycles
• Can deal with undirected cycles in the graph by
• clustering variables together
• conditioning
[Diagrams: the loop through A, B, C, D is clustered into the chain A → BC → D; conditioning instantiates a cycle-cutting variable, running inference once with it set to 0 and once with it set to 1]
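A sketch of the conditioning idea with hypothetical helper names: pick a variable that cuts the undirected cycle, run the tree/polytree routine once per value of that variable, and mix the answers weighted by the posterior of the cut variable (in a full implementation those weights are themselves recovered from the conditioned runs).

```python
def infer_by_conditioning(tree_query, cut_var, cut_values, target, evidence):
    """P(target | evidence) = sum_v P(target | evidence, cut_var=v) * P(cut_var=v | evidence)."""
    p_cut = tree_query(cut_var, evidence)     # assumed available here; in practice these
                                              # weights come out of the conditioned runs
    mixed = {}
    for v in cut_values:
        p_t = tree_query(target, {**evidence, cut_var: v})   # cycle is broken once cut_var is fixed
        for t_val, p in p_t.items():
            mixed[t_val] = mixed.get(t_val, 0.0) + p * p_cut[v]
    return mixed

# Toy stand-in inference routine, just to exercise the mixer.
toy = lambda var, ev: {0: 0.5, 1: 0.5}
print(infer_by_conditioning(toy, "B", [0, 1], "D", {}))   # {0: 0.5, 1: 0.5}
```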
Join trees
• Arbitrary Bayesian network can be transformed via
some evil graph-theoretic magic into a join tree in
which a similar method can be employed.
[Diagram: a network over A, B, C, D, E, F, G transformed into a join tree whose nodes are clusters of variables such as ABC, BCD, and DF]
In the worst case the join tree nodes must take on exponentially many combinations of values, but the approach often works well in practice.