Artificial Intelligence
Belief Propagation and Junction Trees
Andres Mendez-Vazquez
March 28, 2016
Outline
1 Introduction
What do we want?
2 Belief Propagation
The Intuition
Inference on Trees
The Messages
The Implementation
3 Junction Trees
How do you build a Junction Tree?
Chordal Graph
Tree Graphs
Junction Tree Formal Definition
Algorithm For Building Junction Trees
Example
Moralize the DAG
Triangulate
Listing of Cliques
Potential Function
Propagating Information in a Junction Tree
Example
Now, the Full Propagation
Example of Propagation
Introduction
We will be looking at the following algorithms
Pearl's Belief Propagation Algorithm
Junction Tree Algorithm
Belief Propagation Algorithm
The algorithm was first proposed by Judea Pearl in 1982, who formulated it for trees; it was later extended to polytrees.
(Figure: an example network on nodes A through I.)
Introduction
Something Notable
It has since been shown to be a useful approximate algorithm on general graphs.
Junction Tree Algorithm
The junction tree algorithm (also known as the "clique tree" algorithm) is a method used in machine learning to perform marginalization in general graphs.
It entails performing belief propagation on a modified graph, called a junction tree, obtained by eliminating cycles.
Example
The Message Passing Stuff
(Figure: messages being passed up and down a tree toward a node.)
Thus
We can do the following
Pass information from below and from above to a given node V.
Thus
We call those messages
π from above.
λ from below.
Inference on Trees
Recall
A rooted tree is a DAG.
Now
Let (G, P) be a Bayesian network whose DAG is a tree.
Let a be a set of values of a subset A ⊂ V.
For simplicity
Imagine that each node has two children.
The general case can be inferred from it.
Then
Let D_X be the subset of A
Containing all members that are in the subtree rooted at X
Including X if X ∈ A
Let N_X be the subset
Containing all members of A that are nondescendants of X.
This set includes X if X ∈ A
Example
We have that A = N_X ∪ D_X.
(Figure: node X in the tree; D_X lies in the subtree rooted at X and N_X covers the nondescendants.)
Thus
We have, for each value of x

$$
\begin{aligned}
P(x|a) = P(x|d_X, n_X) &= \frac{P(d_X, n_X|x)\,P(x)}{P(d_X, n_X)}\\
&= \frac{P(d_X|x, n_X)\,P(n_X|x)\,P(x)}{P(d_X, n_X)}\\
&= \frac{P(d_X|x, n_X)\,P(n_X, x)\,P(x)}{P(x)\,P(d_X, n_X)}\\
&= \frac{P(d_X|x)\,P(x|n_X)\,P(n_X)}{P(d_X, n_X)} \quad \text{(by d-separation, if } X \notin A\text{)}\\
&= \frac{P(d_X|x)\,P(x|n_X)\,P(n_X)}{P(d_X|n_X)\,P(n_X)}
\end{aligned}
$$

Note: the case X ∈ A must be proved separately.
Thus
We have, for each value of x

$$
P(x|a) = \frac{P(d_X|x)\,P(x|n_X)}{P(d_X|n_X)} = \beta\,P(d_X|x)\,P(x|n_X)
$$

where β, the normalizing factor, is a constant not depending on x.
Now, we develop the messages
We want

λ(x) ∝ P(d_X|x)
π(x) ∝ P(x|n_X)

where ∝ means "proportional to".
Meaning
π(x) may not be equal to P(x|n_X), but π(x) = k × P(x|n_X) for some constant k.
Once we have that
P(x|a) = α λ(x) π(x)
where α, the normalizing factor, is a constant not depending on x.
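The following is a minimal sketch (not from the slides) of how this normalization works in practice: the constant α never needs to be computed explicitly, since dividing by the sum of the unnormalized products fixes it. The numbers are placeholders.

```python
import numpy as np

# Minimal sketch: recover P(x|a) = alpha * lambda(x) * pi(x) by normalizing.
lam = np.array([1.0, 1.0])   # lambda(x), proportional to P(d_X | x)
pi = np.array([0.2, 0.8])    # pi(x), proportional to P(x | n_X)

unnormalized = lam * pi                        # lambda(x) * pi(x)
posterior = unnormalized / unnormalized.sum()  # normalization fixes alpha
print(posterior)                               # -> [0.2 0.8]
```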
Developing λ(x)
We need
λ(x) ∝ P(d_X|x)
Case 1: X ∈ A and X ∈ D_X
Given X = x̂, we have P(d_X|x) = 0 for x ≠ x̂.
Thus, to achieve proportionality, we can set
λ(x̂) ≡ 1
λ(x) ≡ 0 for x ≠ x̂
Now
Case 2: X ∉ A and X is a leaf
Then d_X = ∅ and
P(d_X|x) = P(∅|x) = 1 for all values of x
Thus, to achieve proportionality, we can set
λ(x) ≡ 1 for all values of x
Finally
Case 3: X ∉ A and X is a non-leaf
Let Y be X's left child and W be X's right child.
Since X ∉ A,
D_X = D_Y ∪ D_W
(Figure: node X with children Y and W.)
Thus
We have then

$$
\begin{aligned}
P(d_X|x) &= P(d_Y, d_W|x)\\
&= P(d_Y|x)\,P(d_W|x) \quad \text{(by d-separation at } X\text{)}\\
&= \sum_y P(d_Y, y|x)\,\sum_w P(d_W, w|x)\\
&= \sum_y P(y|x)\,P(d_Y|y)\,\sum_w P(w|x)\,P(d_W|w)\\
&\propto \sum_y P(y|x)\,\lambda(y)\,\sum_w P(w|x)\,\lambda(w)
\end{aligned}
$$

Thus, we can get proportionality by defining, for all values of x,

λ_Y(x) = Σ_y P(y|x) λ(y)
λ_W(x) = Σ_w P(w|x) λ(w)
Thus
We have then
λ(x) = λ_Y(x) λ_W(x) for all values of x
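As a quick illustration, here is a sketch of these λ computations for binary variables (the CPTs and λ values below are hypothetical, not from the slides): each child's λ message is a sum over that child's values, and X's λ values are the product of the incoming messages.

```python
import numpy as np

# Sketch of the lambda computations, with hypothetical binary CPTs.
# P_y_given_x[y, x] = P(y|x) for child Y; similarly for child W.
P_y_given_x = np.array([[0.7, 0.1],
                        [0.3, 0.9]])
P_w_given_x = np.array([[0.4, 0.5],
                        [0.6, 0.5]])
lam_y = np.array([1.0, 0.0])   # lambda values at child Y (Y instantiated to y1)
lam_w = np.array([1.0, 1.0])   # lambda values at child W (no evidence below W)

lam_Y = P_y_given_x.T @ lam_y  # lambda_Y(x) = sum_y P(y|x) * lambda(y)
lam_W = P_w_given_x.T @ lam_w  # lambda_W(x) = sum_w P(w|x) * lambda(w)
lam_x = lam_Y * lam_W          # lambda(x) = lambda_Y(x) * lambda_W(x)
print(lam_x)                   # -> [0.7 0.1]
```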
Developing π(x)
We need
π(x) ∝ P(x|n_X)
Case 1: X ∈ A and X ∈ N_X
Given X = x̂, we have:
P(x̂|n_X) = P(x̂|x̂) = 1
P(x|n_X) = P(x|x̂) = 0 for x ≠ x̂
Thus, to achieve proportionality, we can set
π(x̂) ≡ 1
π(x) ≡ 0 for x ≠ x̂
Now
Case 2: X ∉ A and X is the root
In this case, n_X = ∅, the empty set of random variables.
Then
P(x|n_X) = P(x|∅) = P(x) for all values of x
Enforcing the proportionality, we get
π(x) ≡ P(x) for all values of x
Then
Case 3: X ∉ A and X is not the root
Without loss of generality, assume X is Z's right child and T is Z's left child.
Then N_X = N_Z ∪ D_T
(Figure: node Z with children T and X.)
Then
We have

$$
\begin{aligned}
P(x|n_X) &= \sum_z P(x|z)\,P(z|n_X)\\
&= \sum_z P(x|z)\,P(z|n_Z, d_T)\\
&= \sum_z P(x|z)\,\frac{P(z, n_Z, d_T)}{P(n_Z, d_T)}\\
&= \sum_z P(x|z)\,\frac{P(d_T, z|n_Z)\,P(n_Z)}{P(n_Z, d_T)}\\
&= \sum_z P(x|z)\,\frac{P(d_T|z, n_Z)\,P(z|n_Z)\,P(n_Z)}{P(n_Z, d_T)}\\
&= \sum_z P(x|z)\,\frac{P(d_T|z)\,P(z|n_Z)\,P(n_Z)}{P(n_Z, d_T)} \quad \text{(again, by d-separation for } z\text{)}
\end{aligned}
$$
Last Step
We have

$$
P(x|n_X) = \sum_z P(x|z)\,\frac{P(z|n_Z)\,P(n_Z)\,P(d_T|z)}{P(n_Z, d_T)} = \gamma \sum_z P(x|z)\,\pi(z)\,\lambda_T(z)
$$

where γ = P(n_Z)/P(n_Z, d_T).
Thus, we can achieve proportionality by
π_X(z) ≡ π(z) λ_T(z)
Then, setting
π(x) ≡ Σ_z P(x|z) π_X(z) for all values of x
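A matching sketch for the π side (hypothetical numbers, binary variables): Z is X's parent and T its sibling; the π message weights the parent's π values by the sibling's λ message, and X's π values follow by summing over z.

```python
import numpy as np

# Sketch of the pi computations, with a hypothetical binary CPT.
P_x_given_z = np.array([[0.8, 0.3],
                        [0.2, 0.7]])  # P_x_given_z[x, z] = P(x|z)
pi_z = np.array([0.2, 0.8])           # pi values at the parent Z
lam_T = np.array([0.5, 1.0])          # lambda message Z received from sibling T

pi_X = pi_z * lam_T                   # pi_X(z) = pi(z) * lambda_T(z)
pi_x = P_x_given_z @ pi_X             # pi(x) = sum_z P(x|z) * pi_X(z)
print(pi_x)                           # -> [0.32 0.58]
```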
How do we implement this?
We require the following functions
initial_tree
update_tree
initial_tree has the following input and outputs
Input: ((G, P), A, a, P(x|a))
Output: After this call, A and a are both empty, making P(x|a) the prior probability of x.
Then, each time a variable V is instantiated to v̂, the routine update_tree is called
Input: ((G, P), A, a, V, v̂, P(x|a))
Output: After this call, V has been added to A, v̂ has been added to a, and, for every value of x, P(x|a) has been updated to be the conditional probability of x given the new a.
Algorithm: Inference-in-trees
Problem
Given a Bayesian network whose DAG is a tree, determine the probabilities of the values of each node conditional on specified values of the nodes in some subset.
Input
A Bayesian network (G, P) whose DAG is a tree, where G = (V, E), and a set of values a of a subset A ⊆ V.
Output
The Bayesian network (G, P) updated according to the values in a. The λ and π values and messages and P(x|a) for each X ∈ V are considered part of the network.
Initializing the tree
void initial_tree
input: (Bayesian-network& (G, P) where G = (V, E), set-of-variables& A, set-of-variable-values& a)
1 A = ∅
2 a = ∅
3 for (each X ∈ V)
4   for (each value x of X)
5     λ(x) = 1 // Compute λ values.
6   for (the parent Z of X) // Does nothing if X is the root.
7     for (each value z of Z)
8       λ_X(z) = 1 // Compute λ messages.
9 for (each value r of the root R)
10   P(r|a) = P(r) // Compute P(r|a).
11   π(r) = P(r) // Compute R's π values.
12 for (each child X of R)
13   send_π_msg(R, X)
Updating the tree
void update_tree
Input: (Bayesian-network& (G, P) where G = (V, E), set-of-variables& A, set-of-variable-values& a, variable V, variable-value v̂)
1 A = A ∪ {V}; a = a ∪ {v̂}; λ(v̂) = 1; π(v̂) = 1; P(v̂|a) = 1 // Add V to A and instantiate V to v̂.
2 for (each value v ≠ v̂)
3   λ(v) = 0; π(v) = 0; P(v|a) = 0
4 if (V is not the root && V's parent Z ∉ A)
5   send_λ_msg(V, Z)
6 for (each child X of V such that X ∉ A)
7   send_π_msg(V, X)
Sending the λ message
void send_λ_msg(node Y, node X)
Note: For simplicity, (G, P) is not shown as input.
1 for (each value of x)
2   λ_Y(x) = Σ_y P(y|x) λ(y) // Y sends X a λ message.
3   λ(x) = ∏_{U ∈ CH_X} λ_U(x) // Compute X's λ values; CH_X is the set of X's children.
4   P(x|a) = α λ(x) π(x) // Compute P(x|a).
5 normalize P(x|a)
6 if (X is not the root and X's parent Z ∉ A)
7   send_λ_msg(X, Z)
8 for (each child W of X such that W ≠ Y and W ∉ A)
9   send_π_msg(X, W)
Sending the π message
void send_π_msg(node Z, node X)
Note: For simplicity, (G, P) is not shown as input.
1 for (each value of z)
2   π_X(z) = π(z) ∏_{Y ∈ CH_Z − {X}} λ_Y(z) // Z sends X a π message.
3 for (each value of x)
4   π(x) = Σ_z P(x|z) π_X(z) // Compute X's π values.
5   P(x|a) = α λ(x) π(x) // Compute P(x|a).
6 normalize P(x|a)
7 for (each child Y of X such that Y ∉ A)
8   send_π_msg(X, Y)
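Putting the four routines together, here is a compact Python sketch of the whole algorithm for discrete tree networks. The interface (dicts for the tree structure, prior for the root, cpt[X][x][z] = P(X = x | parent = z)) is an assumption for illustration, not the book's exact representation.

```python
class TreeBP:
    """Sketch of Pearl's belief propagation on a tree (discrete variables)."""

    def __init__(self, parent, children, prior, cpt):
        self.parent, self.children, self.prior, self.cpt = parent, children, prior, cpt
        self.root = next(X for X in parent if parent[X] is None)
        self.card = {X: len(prior) if X == self.root else len(cpt[X])
                     for X in parent}

    def initial_tree(self):
        self.A = set()  # instantiated variables
        self.lam = {X: [1.0] * self.card[X] for X in self.parent}
        self.lam_msg = {X: [1.0] * self.card[self.parent[X]]
                        for X in self.parent if X != self.root}
        self.pi = {X: [1.0] * self.card[X] for X in self.parent}
        self.pi[self.root] = list(self.prior)
        self.P = {self.root: list(self.prior)}
        for X in self.children[self.root]:
            self.send_pi_msg(self.root, X)

    def update_tree(self, V, v_hat):
        self.A.add(V)
        for v in range(self.card[V]):
            ind = 1.0 if v == v_hat else 0.0
            self.lam[V][v] = self.pi[V][v] = ind
        self.P[V] = [1.0 if v == v_hat else 0.0 for v in range(self.card[V])]
        if V != self.root and self.parent[V] not in self.A:
            self.send_lam_msg(V, self.parent[V])
        for X in self.children[V]:
            if X not in self.A:
                self.send_pi_msg(V, X)

    def send_lam_msg(self, Y, X):  # Y sends its parent X a lambda message
        for x in range(self.card[X]):  # lambda_Y(x) = sum_y P(y|x) lambda(y)
            self.lam_msg[Y][x] = sum(self.cpt[Y][y][x] * self.lam[Y][y]
                                     for y in range(self.card[Y]))
        for x in range(self.card[X]):  # lambda(x) = product over X's children
            self.lam[X][x] = 1.0
            for U in self.children[X]:
                self.lam[X][x] *= self.lam_msg[U][x]
        self._posterior(X)
        if X != self.root and self.parent[X] not in self.A:
            self.send_lam_msg(X, self.parent[X])
        for W in self.children[X]:
            if W != Y and W not in self.A:
                self.send_pi_msg(X, W)

    def send_pi_msg(self, Z, X):  # Z sends its child X a pi message
        pi_msg = [self.pi[Z][z] for z in range(self.card[Z])]
        for Y in self.children[Z]:  # pi_X(z) = pi(z) prod_{Y != X} lambda_Y(z)
            if Y != X:
                for z in range(self.card[Z]):
                    pi_msg[z] *= self.lam_msg[Y][z]
        for x in range(self.card[X]):  # pi(x) = sum_z P(x|z) pi_X(z)
            self.pi[X][x] = sum(self.cpt[X][x][z] * pi_msg[z]
                                for z in range(self.card[Z]))
        self._posterior(X)
        for Y in self.children[X]:
            if Y not in self.A:
                self.send_pi_msg(X, Y)

    def _posterior(self, X):  # P(x|a) = alpha * lambda(x) * pi(x), normalized
        un = [self.lam[X][x] * self.pi[X][x] for x in range(self.card[X])]
        s = sum(un)
        self.P[X] = [u / s for u in un]
```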
Example of Tree Initialization
We have then
(Figure: a tree network with root H, children B and L of H, and child C of L. The CPTs used below are P(h1) = 0.2, P(b1|h1) = 0.25, P(b1|h2) = 0.05, P(l1|h1) = 0.003, P(l1|h2) = 0.00005, P(c1|l1) = 0.6, and P(c1|l2) = 0.02.)
Calling initial_tree((G, P), A, a)
We have then
A = ∅, a = ∅
Compute λ values
λ(h1) = 1; λ(h2) = 1;
λ(b1) = 1; λ(b2) = 1;
λ(l1) = 1; λ(l2) = 1;
λ(c1) = 1; λ(c2) = 1;
Compute λ messages
λ_B(h1) = 1; λ_B(h2) = 1;
λ_L(h1) = 1; λ_L(h2) = 1;
λ_C(l1) = 1; λ_C(l2) = 1;
Calling initial_tree((G, P), A, a)
Compute P(h|∅)
P(h1|∅) = P(h1) = 0.2
P(h2|∅) = P(h2) = 0.8
Compute H's π values
π(h1) = P(h1) = 0.2
π(h2) = P(h2) = 0.8
Send messages
send_π_msg(H, B)
send_π_msg(H, L)
The call send_π_msg(H, B)
H sends B a π message
π_B(h1) = π(h1) λ_L(h1) = 0.2 × 1 = 0.2
π_B(h2) = π(h2) λ_L(h2) = 0.8 × 1 = 0.8
Compute B's π values
π(b1) = P(b1|h1) π_B(h1) + P(b1|h2) π_B(h2) = (0.25)(0.2) + (0.05)(0.8) = 0.09
π(b2) = P(b2|h1) π_B(h1) + P(b2|h2) π_B(h2) = (0.75)(0.2) + (0.95)(0.8) = 0.91
The call send_π_msg(H, B)
Compute P(b|∅)
P(b1|∅) = α λ(b1) π(b1) = α(1)(0.09) = 0.09α
P(b2|∅) = α λ(b2) π(b2) = α(1)(0.91) = 0.91α
Then, normalize
P(b1|∅) = 0.09α / (0.09α + 0.91α) = 0.09
P(b2|∅) = 0.91α / (0.09α + 0.91α) = 0.91
The call send_π_msg(H, L)
H sends L a π message
π_L(h1) = π(h1) λ_B(h1) = (0.2)(1) = 0.2
π_L(h2) = π(h2) λ_B(h2) = (0.8)(1) = 0.8
Compute L's π values
π(l1) = P(l1|h1) π_L(h1) + P(l1|h2) π_L(h2) = (0.003)(0.2) + (0.00005)(0.8) = 0.00064
π(l2) = P(l2|h1) π_L(h1) + P(l2|h2) π_L(h2) = (0.997)(0.2) + (0.99995)(0.8) = 0.99936
Compute P(l|∅)
P(l1|∅) = α λ(l1) π(l1) = α(1)(0.00064) = 0.00064α
P(l2|∅) = α λ(l2) π(l2) = α(1)(0.99936) = 0.99936α
The call send_π_msg(H, L)
Then, normalize
P(l1|∅) = 0.00064α / (0.00064α + 0.99936α) = 0.00064
P(l2|∅) = 0.99936α / (0.00064α + 0.99936α) = 0.99936
The call send_π_msg(L, C)
L sends C a π message
π_C(l1) = π(l1) = 0.00064
π_C(l2) = π(l2) = 0.99936
Compute C's π values
π(c1) = P(c1|l1) π_C(l1) + P(c1|l2) π_C(l2) = (0.6)(0.00064) + (0.02)(0.99936) = 0.02037
π(c2) = P(c2|l1) π_C(l1) + P(c2|l2) π_C(l2) = (0.4)(0.00064) + (0.98)(0.99936) = 0.97963
The call send_π_msg(L, C)
Compute P(c|∅)
P(c1|∅) = α λ(c1) π(c1) = α(1)(0.02037) = 0.02037α
P(c2|∅) = α λ(c2) π(c2) = α(1)(0.97963) = 0.97963α
Normalize
P(c1|∅) = 0.02037α / (0.02037α + 0.97963α) = 0.02037
P(c2|∅) = 0.97963α / (0.02037α + 0.97963α) = 0.97963
Final Graph
We have then
(Figure: the tree with root H, children B and L, and C below L, annotated with the computed probabilities.)
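For reference, running the TreeBP sketch given earlier on this example network reproduces the numbers computed in the previous slides (the CPT values below are the ones used in those calculations).

```python
# The slides' example: H is the root, B and L are H's children, C is L's child.
parent = {'H': None, 'B': 'H', 'L': 'H', 'C': 'L'}
children = {'H': ['B', 'L'], 'B': [], 'L': ['C'], 'C': []}
prior = [0.2, 0.8]                                  # P(h1), P(h2)
cpt = {'B': [[0.25, 0.05], [0.75, 0.95]],           # P(b|h)
       'L': [[0.003, 0.00005], [0.997, 0.99995]],   # P(l|h)
       'C': [[0.6, 0.02], [0.4, 0.98]]}             # P(c|l)

bp = TreeBP(parent, children, prior, cpt)
bp.initial_tree()
print(bp.P['B'])   # -> [0.09, 0.91]
print(bp.P['L'])   # -> [0.00064, 0.99936]
print(bp.P['C'])   # -> approximately [0.02037, 0.97963]
```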
For the Generalization, Please Look at...
Pages 123–156 of
Richard E. Neapolitan. 2003. Learning Bayesian Networks. Prentice-Hall, Inc.
History
Invented in 1988
By Lauritzen and Spiegelhalter (1988).
Something Notable
The general idea is that the propagation of evidence through the network can be carried out more efficiently by representing the joint probability distribution on an undirected graph called the junction tree (or join tree).
More on the Intuition
High-level Intuition
Computing marginals is straightforward in a tree structure.
Junction Tree Characteristics
The junction tree has the following characteristics
It is an undirected tree.
Its nodes are clusters of variables (i.e., from the original BN).
Given two clusters, C_1 and C_2, every node on the path between them contains their intersection C_1 ∩ C_2.
In addition
A separator, S, is associated with each edge and contains the variables in the intersection between neighboring nodes.
(Figure: a junction tree with clusters ABC, BCD, CDE; the separators S on its edges are BC and CD.)
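A small sketch (plain Python, not from the slides) that checks the third characteristic, the running intersection property, on the clusters from the figure above:

```python
from itertools import combinations

# Clusters and tree edges from the figure: ABC - BCD - CDE.
clusters = {0: {'A', 'B', 'C'}, 1: {'B', 'C', 'D'}, 2: {'C', 'D', 'E'}}
adj = {0: [1], 1: [0, 2], 2: [1]}

def path(u, v, seen=frozenset()):
    """The unique u-v path in the cluster tree (DFS)."""
    if u == v:
        return [u]
    for w in adj[u]:
        if w not in seen:
            p = path(w, v, seen | {u})
            if p is not None:
                return [u] + p
    return None

def has_running_intersection():
    for u, v in combinations(clusters, 2):
        inter = clusters[u] & clusters[v]
        if any(not inter <= clusters[m] for m in path(u, v)):
            return False
    return True

print(has_running_intersection())                            # -> True
print(clusters[0] & clusters[1], clusters[1] & clusters[2])   # separators BC, CD
```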
Simplicial Node
Definition
In a graph G, a vertex v is called simplicial if and only if the subgraph of G induced by the vertex set {v} ∪ N(v) is a clique, where N(v) is the set of neighbors of v in the graph.
Example
Vertex 3 is simplicial, while 4 is not.
(Figure: a graph on vertices 1, 2, 3, 4.)
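The simplicial test is a direct check that all pairs of neighbors are adjacent. Below is a sketch on an assumed reading of the figure (edges 1-2, 1-3, 1-4, 2-4, 3-4), under which 3 is simplicial and 4 is not:

```python
from itertools import combinations

# Assumed edges for the figure: 1-2, 1-3, 1-4, 2-4, 3-4.
adj = {1: {2, 3, 4}, 2: {1, 4}, 3: {1, 4}, 4: {1, 2, 3}}

def is_simplicial(adj, v):
    """v is simplicial iff {v} ∪ N(v) induces a clique, i.e., all of v's
    neighbors are pairwise adjacent."""
    return all(u in adj[w] for u, w in combinations(adj[v], 2))

print(is_simplicial(adj, 3), is_simplicial(adj, 4))  # -> True False
```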
Perfect Elimination Ordering
Definition
A graph G on n vertices is said to have a perfect elimination ordering if and only if there is an ordering {v_1, ..., v_n} of G's vertices such that each v_i is simplicial in the subgraph induced by the vertices {v_1, ..., v_i}.
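Reusing is_simplicial from the sketch above, a perfect elimination ordering can be verified directly from the definition; note that the condition is on the subgraph induced by the prefix {v_1, ..., v_i}:

```python
def is_peo(adj, order):
    """Check that each order[i] is simplicial in the subgraph induced by
    order[0..i], as the definition requires."""
    for i in range(len(order)):
        kept = set(order[:i + 1])
        sub = {v: adj[v] & kept for v in kept}  # induced subgraph
        if not is_simplicial(sub, order[i]):
            return False
    return True

print(is_peo(adj, [1, 2, 4, 3]))  # a valid ordering on the example -> True
```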
Chordal Graph
Definition
A chordal graph is one in which all cycles of four or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle.
Definition
For any two vertices x, y ∈ G such that (x, y) ∉ E, an x−y separator is a set S ⊂ V such that the graph G − S has at least two disjoint connected components, one of which contains x and another of which contains y.
Chordal Graph
Theorem
For a graph G on n vertices, the following conditions are equivalent:
1 G has a perfect elimination ordering.
2 G is chordal.
3 If H is any induced subgraph of G and S is a vertex separator of H of minimal size, S's vertices induce a clique.
Maximal Clique
Definition
A maximal clique is a clique that cannot be extended by including one more adjacent vertex, meaning it is not a subset of a larger clique.
We have the following claims
1 A chordal graph with N vertices can have no more than N maximal cliques.
2 Given a chordal graph G = (V, E), where |V| = N, there exists an algorithm to find all the maximal cliques of G which takes no more than O(N^4) time.
Elimination Clique
Definition (Elimination Clique)
Given a chordal graph G and an elimination ordering for G that does not add any edges.
Suppose node i (assuming a labeling) is eliminated in some step of the elimination algorithm; then the clique consisting of the node i along with its neighbors during the elimination step (which must be fully connected, since elimination does not add edges) is called an elimination clique.
Formally
Suppose node i is eliminated in the k-th step of the algorithm, and let G^(k) be the graph just before the k-th elimination step. Then the elimination clique is C_i = {i} ∪ N^(k)(i), where N^(k)(i) is the set of neighbors of i in the graph G^(k).
From This
Theorem
Given a chordal graph and an elimination ordering which does not add any edges. Let C be the set of maximal cliques in the chordal graph, and let C_e = ∪_{i∈V} C_i be the set of elimination cliques obtained from this elimination ordering. Then C ⊆ C_e. In other words, every maximal clique is also an elimination clique for this particular ordering.
Something Notable
The theorem proves the second claim given earlier. Firstly, it shows that a chordal graph cannot have more than N maximal cliques, since there are only N elimination cliques.
It is more
It gives us an efficient algorithm for finding these N maximal cliques: simply go over each elimination clique and check whether it is maximal.
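A sketch of the procedure the theorem suggests, run on the small chordal example graph from the simplicial sketch above: collect the elimination cliques C_i = {i} ∪ N(i) along a fill-in-free ordering, then keep those not strictly contained in another (the ordering [3, 2, 4, 1] is one such ordering for that graph).

```python
def elimination_cliques(adj, order):
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    cliques = []
    for v in order:
        cliques.append({v} | adj[v])                 # C_i = {i} ∪ N(i)
        for u in adj[v]:                             # eliminate v from the graph
            adj[u].discard(v)
        del adj[v]
    return cliques

cliques = elimination_cliques(adj, [3, 2, 4, 1])     # a fill-in-free ordering
maximal = [C for C in cliques if not any(C < D for D in cliques)]
print(maximal)   # -> [{1, 3, 4}, {1, 2, 4}], the two maximal cliques
```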
Therefore
Even with a brute-force approach
It will not take more than |C_e|^2 × D = O(N^3) time, with D = max_{C∈C} |C|.
Because
Both the clique size and the number of elimination cliques are bounded by N.
Observation
The maximum clique problem, which is NP-hard on general graphs, is easy on chordal graphs.
We have the following definitions
Definition
The following are equivalent to the statement "G is a tree":
1 G is a connected, acyclic graph over N nodes.
2 G is a connected graph over N nodes with N − 1 edges.
3 G is a minimal connected graph over N nodes.
4 (Important) G is a graph over N nodes such that, for any two nodes i and j in G with i ≠ j, there is a unique path from i to j in G.
Theorem
For any graph G = (V, E), the following statements are equivalent:
1 G has a junction tree.
2 G is chordal.
Definition
Junction Tree
Given a graph G = (V, E), a graph G′ = (V′, E′) is said to be a Junction
Tree for G, iff:
1 The nodes of G′ are the maximal cliques of G (i.e. G′ is a clique
graph of G.)
2 G′ is a tree.
3 Running Intersection Property / Junction Tree Property:
1 For each v ∈ V , define G′v to be the induced subgraph of G′
consisting of exactly those nodes which correspond to maximal cliques
of G that contain v. Then G′v must be a connected graph.
62 / 102
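The Running Intersection Property can be checked mechanically. Below is a minimal sketch (not from the slides; the clique-list and edge-list encodings are assumptions): for every variable v, it verifies that the clique nodes containing v induce a connected subtree, via a BFS restricted to those nodes.

from collections import defaultdict, deque

def has_running_intersection(cliques, tree_edges):
    # cliques: list of frozensets; tree_edges: pairs of indices into cliques.
    adj = defaultdict(set)
    for a, b in tree_edges:
        adj[a].add(b); adj[b].add(a)
    for v in set().union(*cliques):
        holders = {i for i, c in enumerate(cliques) if v in c}
        start = next(iter(holders))
        seen, queue = {start}, deque([start])
        while queue:  # BFS only through clique nodes that contain v
            i = queue.popleft()
            for j in adj[i]:
                if j in holders and j not in seen:
                    seen.add(j); queue.append(j)
        if seen != holders:
            return False
    return True

cliques = [frozenset("BSL"), frozenset("BLE"), frozenset("LET"),
           frozenset("BEF"), frozenset("EX"), frozenset("AT")]
tree_edges = [(0, 1), (1, 2), (1, 3), (1, 4), (2, 5)]
print(has_running_intersection(cliques, tree_edges))  # True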
Step 1
Given a DAG G = (V , E) and |V | = N
Chordalize the graph using the elimination algorithm with an arbitrary
elimination ordering, if required.
For this, you can use the following greedy algorithm
Given a list of nodes:
1 Is the vertex simplicial? If it is not, make it simplicial by adding
fill-in edges between its neighbors.
2 Then remove it from the list and continue with the next vertex.
64 / 102
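A minimal sketch of this greedy chordalization (not from the slides; the adjacency encoding is an assumption): each vertex in the ordering is made simplicial by connecting all pairs of its not-yet-eliminated neighbors, then eliminated.

from itertools import combinations

def chordalize(adj, order):
    # adj: dict node -> set of neighbors; order: any elimination ordering.
    g = {v: set(ns) for v, ns in adj.items()}
    remaining = set(g)
    for v in order:
        nbrs = g[v] & remaining
        # Make v simplicial: add fill-in edges among its remaining neighbors.
        for a, b in combinations(nbrs, 2):
            g[a].add(b); g[b].add(a)
        remaining.discard(v)
    return g

# A 4-cycle A-B-C-D needs one chord to become chordal.
adj = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
print(chordalize(adj, order=["A", "B", "C", "D"])["B"])  # now contains 'D'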
Step 1
Another way
1 By the Moralization Procedure.
2 Triangulate the moral graph.
Moralization Procedure
1 Add edges between all pairs of nodes that have a common child.
2 Make all edges in the graph undirected.
Triangulate the moral graph
An undirected graph is triangulated if every cycle of length greater than 3
possesses a chord.
65 / 102
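A minimal sketch of moralization (not from the slides; the parents-dict encoding is an assumption): marry all pairs of parents of each node and drop edge directions.

from itertools import combinations

def moralize(parents):
    # parents: dict node -> set of parent nodes. Returns undirected adjacency.
    nodes = set(parents) | set().union(*parents.values())
    und = {v: set() for v in nodes}
    for child, ps in parents.items():
        for p in ps:                      # keep original edges, undirected
            und[p].add(child); und[child].add(p)
        for a, b in combinations(ps, 2):  # marry common parents
            und[a].add(b); und[b].add(a)
    return und

# Tiny example in the spirit of the slides: E has parents T and L.
parents = {"T": {"A"}, "L": {"S"}, "E": {"T", "L"}}
print("L" in moralize(parents)["T"])  # True: T and L are married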
Step 2
Find the maximal cliques in the chordal graph
List the N Cliques
({vN } ∪ N (vN )) ∩ {v1, ..., vN }
({vN−1} ∪ N (vN−1)) ∩ {v1, ..., vN−1}
· · ·
{v1}
Note: since the graph is chordal, every maximal clique appears among
these elimination cliques; we only need to keep the maximal ones.
66 / 102
Step 3
Compute the separator sets for each pair of maximal cliques and
construct a weighted clique graph
For each pair of maximal cliques (Ci, Cj) in the graph
We check whether they possess any common variables.
If yes, we designate a separator set
Between these 2 cliques as Sij = Ci ∩ Cj.
Then, using these separators
We build a clique graph:
Nodes are the Cliques.
Edges (Ci, Cj) are added with weight |Ci ∩ Cj| if |Ci ∩ Cj| > 0.
67 / 102
Step 3
This step can be implemented quickly in practice using a hash table
Running Time: O(|C|² D) = O(N² D)
68 / 102
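A minimal sketch of this step (not from the slides; the frozenset encoding is an assumption): every pair of maximal cliques sharing at least one variable gets an edge weighted by the separator size.

def clique_graph(cliques):
    edges = []
    for i in range(len(cliques)):
        for j in range(i + 1, len(cliques)):
            w = len(cliques[i] & cliques[j])  # separator size |Ci ∩ Cj|
            if w > 0:
                edges.append((w, i, j))
    return edges

cliques = [frozenset("BSL"), frozenset("BLE"), frozenset("LET"),
           frozenset("BEF"), frozenset("EX"), frozenset("AT")]
for w, i, j in sorted(clique_graph(cliques), reverse=True):
    print(sorted(cliques[i]), sorted(cliques[j]), "weight", w)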
Step 4
Compute a maximum-weight spanning tree on the weighted clique
graph to obtain a junction tree
You can use Kruskal’s or Prim’s algorithm, adapted to maximum weight, for this
We will give Kruskal’s algorithm
For finding the maximum-weight spanning tree
69 / 102
Step 4
Maximal Kruskal’s algorithm
Initialize an edgeless graph T with nodes that are all the maximal cliques
in our chordal graph.
Then
We will add edges to T until it becomes a junction tree.
Sort the m edges ei in our clique graph from step 3 by weight wi
We have for e1, e2, ..., em with w1 ≥ w2 ≥ · · · ≥ wm
70 / 102
Step 4
For i = 1, 2, ..., m
1 Add edge ei to T if it does not introduce a cycle.
2 If |C| − 1 edges have been added, quit.
Running Time, given that |E| = O(|C|²):
O(|C|² log |C|²) = O(|C|² log |C|) = O(N² log N)
71 / 102
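A minimal sketch of this maximum-weight Kruskal step (not from the slides; the union-find helper and edge encoding are assumptions of this sketch): edges are taken in decreasing weight, skipping any that would close a cycle.

def max_spanning_tree(n, edges):
    # n: number of cliques; edges: (weight, i, j) triples. Returns tree edges.
    parent = list(range(n))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, i, j in sorted(edges, reverse=True):  # w1 >= w2 >= ... >= wm
        ri, rj = find(i), find(j)
        if ri != rj:                   # edge closes no cycle: keep it
            parent[ri] = rj
            tree.append((i, j, w))
            if len(tree) == n - 1:     # |C| - 1 edges added: done
                break
    return tree

# The clique graph of the running example; indices:
# 0=BSL, 1=BLE, 2=LET, 3=BEF, 4=EX, 5=AT.
edges = [(2, 0, 1), (1, 0, 2), (1, 0, 3), (2, 1, 2), (2, 1, 3),
         (1, 1, 4), (1, 2, 4), (1, 2, 5), (1, 3, 4)]
print(max_spanning_tree(6, edges))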
How do you build a Junction Tree?
Given a General DAG
[Figure: a DAG over the nodes S, B, L, F, E, T, X, A]
Build a Chordal Graph
Moral Graph – marry common parents and remove arrows.
[Figure: the corresponding moral graph over the same nodes]
74 / 102
How do you build a Junction Tree?
Triangulate the moral graph
An undirected graph is triangulated if every cycle of length greater
than 3 possesses a chord.
[Figure: the triangulated (chordal) moral graph]
76 / 102
Listing of Cliques
Identify the Cliques
A clique is a subset of nodes which is complete (i.e. there is an edge
between every pair of nodes) and maximal.
[Figure: the triangulated graph with its maximal cliques highlighted]
The maximal cliques are: {B,S,L}, {B,L,E}, {B,E,F}, {L,E,T}, {A,T}, {E,X}
78 / 102
Build the Clique Graph
Clique Graph
Add an edge between Ci and Cj with weight |Ci ∩ Cj| whenever |Ci ∩ Cj| > 0
[Figure: weighted clique graph on BSL, BLE, LET, BEF, EX, AT; the edges
BSL–BLE, BLE–LET, and BLE–BEF have weight 2, the remaining intersecting
pairs weight 1]
79 / 102
Getting The Junction Tree
Run the Maximum Kruskal’s Algorithm
[Figure: the weighted clique graph with the maximum-weight spanning tree
edges selected]
80 / 102
Getting The Junction Tree
Finally
[Figure: the resulting junction tree over the cliques BSL, BLE, LET, BEF,
EX, AT]
81 / 102
Potential Representation for the Junction Tree
Something Notable
The joint probability distribution can now be represented in terms of
potential functions, φ.
This is defined in each clique and each separator
Thus
P (x) = ∏_{i=1}^{n} φ_C (x_{ci}) / ∏_{j=1}^{m} φ_S (x_{sj})
where x = (x_{c1}, ..., x_{cn}), each x_{ci} corresponds to a clique, and
each x_{sj} corresponds to a separator.
The basic idea is to represent the probability distribution corresponding
to any graph as a product of clique potentials
P (x) = (1/Z) ∏_{i=1}^{n} φ_C (x_{ci})
83 / 102
Then
Main idea
The idea is to transform one representation of the joint distribution to
another in which for each clique, c, the potential function gives the
marginal distribution for the variables in c, i.e.
φC (xc) = P (xc)
This will also apply for each separator, s.
84 / 102
Now, Initialization
To initialize the potential functions
1 Set all potentials to unity
2 For each variable, xi, select one node in the junction tree (i.e. one
clique) containing both that variable and its parents, pa(xi), in the
original DAG.
3 Multiply the potential by P (xi|pa (xi))
85 / 102
Then
For example, at the beginning we have φBSL = φBLF = φLX = 1, then
[Figure: a small DAG over S, B, L, F, X and, after initialization, its
junction tree chain BSL – BLF – LX]
86 / 102
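A minimal sketch of this initialization (not from the slides; the dense-table encoding and the helper init_potentials are assumptions). The CPD numbers are recovered from the φBSL table shown on the next slides, e.g. P(s1) = 0.2, P(b1|s1) = 0.25, P(l1|s1) = 0.003, with states encoded as 1 and 2.

from itertools import product

def init_potentials(cliques, cpds, values=(1, 2)):
    # cliques: list of frozensets of variable names.
    # cpds: dict var -> (parents_tuple, table), table[(x, *pa)] = P(x | pa).
    pots = []
    for c in cliques:  # step 1: set all potentials to unity
        pots.append({asg: 1.0 for asg in product(values, repeat=len(c))})
    for var, (pa, table) in cpds.items():
        fam = {var, *pa}
        k = next(i for i, c in enumerate(cliques) if fam <= c)  # host clique
        vs = sorted(cliques[k])
        for asg, phi in pots[k].items():  # step 3: multiply in P(xi | pa(xi))
            x = dict(zip(vs, asg))
            pots[k][asg] = phi * table[(x[var],) + tuple(x[p] for p in pa)]
    return pots

cliques = [frozenset("BSL"), frozenset("BLF"), frozenset("LX")]
cpds = {
    "S": ((), {(1,): 0.2, (2,): 0.8}),
    "B": (("S",), {(1, 1): 0.25, (2, 1): 0.75, (1, 2): 0.05, (2, 2): 0.95}),
    "L": (("S",), {(1, 1): 0.003, (2, 1): 0.997,
                   (1, 2): 0.00005, (2, 2): 0.99995}),
}
pots = init_potentials(cliques, cpds)
print(pots[0][(1, 1, 1)])  # phi_BSL(b1, l1, s1) = 0.2 * 0.25 * 0.003 ~ 0.00015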
Propagating Information in a Junction Tree
Passing Information using the separators
Passing information from one clique C1 to another C2 via the separator in
between them, S0, requires two steps
First Step
Obtain a new potential for S0 by marginalizing out the variables in C1 that
are not in S0:
φ∗_{S0} = ∑_{C1−S0} φ_{C1}
88 / 102
Propagating Information in a Junction Tree
Second Step
Obtain a new potential for C2:
φ∗_{C2} = φ_{C2} · λ_{S0}
Where
λ_{S0} = φ∗_{S0} / φ_{S0}
89 / 102
An Example
Consider a flow from the clique {B,S,L} to {B,L,F}
[Figure: junction tree chain BSL – BL – BLF – L – LX, where BL and L are
the separators]
90 / 102
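A minimal sketch of one such flow (not from the slides; the table encoding and function names are assumptions, and φS0 is assumed to have no zeros). Run on the initial potentials of the next slide, it reproduces the "after flow" numbers.

from itertools import product

def marginalize(vars_, table, keep):
    # Sum a potential over the variables not in `keep`.
    keep = sorted(keep)
    idx = [vars_.index(v) for v in keep]
    out = {}
    for asg, val in table.items():
        key = tuple(asg[i] for i in idx)
        out[key] = out.get(key, 0.0) + val
    return out

def pass_flow(c1_vars, phi_c1, s_vars, phi_s, c2_vars, phi_c2):
    # Returns updated separator and C2 potentials after a C1 -> C2 flow.
    phi_s_new = marginalize(c1_vars, phi_c1, s_vars)       # first step
    lam = {k: phi_s_new[k] / phi_s[k] for k in phi_s}      # lambda = phi*/phi
    idx = [c2_vars.index(v) for v in sorted(s_vars)]
    phi_c2_new = {a: v * lam[tuple(a[i] for i in idx)]     # second step
                  for a, v in phi_c2.items()}
    return phi_s_new, phi_c2_new

# Tables from the next slide, keyed over sorted variable tuples.
phi_BSL = {(1,1,1): 0.00015, (1,2,1): 0.04985, (2,1,1): 0.00045,
           (2,2,1): 0.14955, (1,1,2): 0.000002, (1,2,2): 0.039998,
           (2,1,2): 0.000038, (2,2,2): 0.759962}          # keys (B, L, S)
phi_BL  = {bl: 1.0 for bl in product((1, 2), repeat=2)}
phi_BLF = {(1,1,1): 0.75, (1,1,2): 0.1, (2,1,1): 0.5, (2,1,2): 0.05,
           (1,2,1): 0.25, (1,2,2): 0.9, (2,2,1): 0.5, (2,2,2): 0.95}  # (B, F, L)

phi_BL_new, phi_BLF_new = pass_flow(['B','L','S'], phi_BSL, ['B','L'], phi_BL,
                                    ['B','F','L'], phi_BLF)
print(phi_BLF_new[(1, 1, 1)])  # 0.000114, matching the "after flow" table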
An Example
Initial representation
φBSL = P (B|S) P (L|S) P (S)
            l1         l2
s1, b1   0.00015    0.04985
s1, b2   0.00045    0.14955
s2, b1   0.000002   0.039998
s2, b2   0.000038   0.759962

φBL = 1
      l1   l2
b1    1    1
b2    1    1

φBLF = P (F|B, L)
          l1     l2
f1, b1   0.75   0.1
f1, b2   0.5    0.05
f2, b1   0.25   0.9
f2, b2   0.5    0.95
92 / 102
An Example
After Flow
φBSL = P (B|S) P (L|S) P (S) (unchanged)
            l1         l2
s1, b1   0.00015    0.04985
s1, b2   0.00045    0.14955
s2, b1   0.000002   0.039998
s2, b2   0.000038   0.759962

φ∗BL (S marginalized out of φBSL)
      l1         l2
b1    0.000152   0.089848
b2    0.000488   0.909512

φ∗BLF = φBLF · λBL
          l1         l2
f1, b1   0.000114   0.0089848
f1, b2   0.000244   0.0454756
f2, b1   0.000038   0.0808632
f2, b2   0.000244   0.8640364
93 / 102
Now Introduce Evidence
We have
A flow from the clique {B,S,L} to {B,L,F}, but this time we have the
information that Joe is a smoker, S = s1.
Incorporation of Evidence
φBSL = P (B|S) P (L|S) P (S), with the s2 rows zeroed by the evidence
            l1        l2
s1, b1   0.00015   0.04985
s1, b2   0.00045   0.14955
s2, b1   0         0
s2, b2   0         0

φBL = 1
      l1   l2
b1    1    1
b2    1    1

φBLF = P (F|B, L)
          l1     l2
f1, b1   0.75   0.1
f1, b2   0.5    0.05
f2, b1   0.25   0.9
f2, b2   0.5    0.95
94 / 102
An Example
After Flow
φBSL (with evidence S = s1, unchanged)
            l1        l2
s1, b1   0.00015   0.04985
s1, b2   0.00045   0.14955
s2, b1   0         0
s2, b2   0         0

φ∗BL (S marginalized out of φBSL)
      l1        l2
b1    0.00015   0.04985
b2    0.00045   0.14955

φ∗BLF = φBLF · λBL
          l1          l2
f1, b1   0.0001125   0.004985
f1, b2   0.000225    0.0074775
f2, b1   0.0000375   0.044865
f2, b2   0.000225    0.1420725
95 / 102
The Full Propagation
Two phase propagation (Jensen et al, 1990)
1 Select an arbitrary clique, C0
2 Collection Phase – flows passed from periphery to C0
3 Distribution Phase – flows passed from C0 to periphery
97 / 102
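A minimal sketch of the two-phase schedule (not from the slides; it assumes a pass_flow_between(i, j) routine like the flow sketch earlier, applied between adjacent cliques i and j).

def propagate(tree_adj, root, pass_flow_between):
    # tree_adj: dict clique-id -> set of neighbor ids in the junction tree.
    def collect(i, parent):
        for j in tree_adj[i]:
            if j != parent:
                collect(j, i)
                pass_flow_between(j, i)   # flows from periphery toward root

    def distribute(i, parent):
        for j in tree_adj[i]:
            if j != parent:
                pass_flow_between(i, j)   # flows from root back to periphery
                distribute(j, i)

    collect(root, None)
    distribute(root, None)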
Example
Distribution
[Figure: distribution-phase flows on the junction tree over BSL, BLE, LET,
BEF, EX, AT]
99 / 102
Example
Collection
[Figure: collection-phase flows on the same junction tree]
100 / 102
The Full Propagation
After the two propagation phases have been carried out
The Junction tree will be in equilibrium with each clique containing
the joint probability distribution for the variables it contains.
Marginal probabilities for individual variables can then be obtained
from the cliques.
Now, some evidence E can be included before propagation
By selecting a clique for each variable for which evidence is available.
The potential for the clique is then set to 0 for any configuration
which differs from the evidence.
101 / 102
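A minimal sketch of entering evidence this way (not from the slides; it reuses the phi_BSL table from the flow sketch above, whose keys are ordered as (B, L, S)).

def enter_evidence(clique_vars, phi, var, value):
    # Zero every configuration of phi that disagrees with var = value.
    i = clique_vars.index(var)
    return {asg: (v if asg[i] == value else 0.0) for asg, v in phi.items()}

# Joe is a smoker: S = s1 (state 1), entered into the BSL clique.
phi_BSL_ev = enter_evidence(['B', 'L', 'S'], phi_BSL, 'S', 1)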
The Full Propagation
After propagation the result will be
P (x, E) = ∏_{c∈C} φ_c (x_c, E) / ∏_{s∈S} φ_s (x_s, E)
After normalization
P (x|E) = ∏_{c∈C} φ_c (x_c|E) / ∏_{s∈S} φ_s (x_s|E)
102 / 102

More Related Content

PDF
Artificial Intelligence 06.2 More on Causality Bayesian Networks
PDF
16 Machine Learning Universal Approximation Multilayer Perceptron
PDF
06 Machine Learning - Naive Bayes
PDF
18 Machine Learning Radial Basis Function Networks Forward Heuristics
PDF
Introduction to logistic regression
PDF
24 Machine Learning Combining Models - Ada Boost
PDF
The Kernel Trick
PDF
Kernels and Support Vector Machines
Artificial Intelligence 06.2 More on Causality Bayesian Networks
16 Machine Learning Universal Approximation Multilayer Perceptron
06 Machine Learning - Naive Bayes
18 Machine Learning Radial Basis Function Networks Forward Heuristics
Introduction to logistic regression
24 Machine Learning Combining Models - Ada Boost
The Kernel Trick
Kernels and Support Vector Machines

What's hot (20)

PDF
23 Machine Learning Feature Generation
PDF
11 Machine Learning Important Issues in Machine Learning
PDF
20 k-means, k-center, k-meoids and variations
PDF
Lecture10 - Naïve Bayes
PPTX
Fuzzy logic andits Applications
PDF
Lecture 3 qualtifed rules of inference
PDF
07 Machine Learning - Expectation Maximization
PDF
An overview of Bayesian testing
PDF
27 Machine Learning Unsupervised Measure Properties
PDF
Tutorial on testing at O'Bayes 2015, Valencià, June 1, 2015
PDF
Machine Learning and Data Mining - Decision Trees
PDF
Can we estimate a constant?
PPTX
Module 4 part_1
PDF
Probability and Statistics
PDF
An Extension to the Zero-Inflated Generalized Power Series Distributions
PDF
Matroid Basics
PDF
MLHEP Lectures - day 1, basic track
PDF
EVEN GRACEFUL LABELLING OF A CLASS OF TREES
PDF
Astaño 4
PDF
Rademacher Averages: Theory and Practice
23 Machine Learning Feature Generation
11 Machine Learning Important Issues in Machine Learning
20 k-means, k-center, k-meoids and variations
Lecture10 - Naïve Bayes
Fuzzy logic andits Applications
Lecture 3 qualtifed rules of inference
07 Machine Learning - Expectation Maximization
An overview of Bayesian testing
27 Machine Learning Unsupervised Measure Properties
Tutorial on testing at O'Bayes 2015, Valencià, June 1, 2015
Machine Learning and Data Mining - Decision Trees
Can we estimate a constant?
Module 4 part_1
Probability and Statistics
An Extension to the Zero-Inflated Generalized Power Series Distributions
Matroid Basics
MLHEP Lectures - day 1, basic track
EVEN GRACEFUL LABELLING OF A CLASS OF TREES
Astaño 4
Rademacher Averages: Theory and Practice
Ad

Viewers also liked (18)

PDF
02 probabilistic inference in graphical models
PPTX
Bayesian Networks with R and Hadoop
PDF
Bn presentation
PDF
World Cup Qualification Prediction - How it works
PDF
C04922125
PDF
Physics of Algorithms Talk
PDF
Efficient Belief Propagation in Depth Finding
PDF
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
PDF
Bayes Belief Network
PPTX
A Movement Recognition Method using LBP
PDF
SocNL: Bayesian Label Propagation with Confidence
DOCX
Peter LaBrash resume
PPTX
Presentación Iriana Colina Seguridad industrial 2 semestre
PPSX
áGuia azul gracilene pinto
PPT
Introducción a la computacón
PDF
How to prepare for class 12 board exams
PPTX
Digimon leonardo zec & ivan đopar 7.d
PPT
Cake Phpで簡単問い合わせフォームの作り方
02 probabilistic inference in graphical models
Bayesian Networks with R and Hadoop
Bn presentation
World Cup Qualification Prediction - How it works
C04922125
Physics of Algorithms Talk
Efficient Belief Propagation in Depth Finding
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
Bayes Belief Network
A Movement Recognition Method using LBP
SocNL: Bayesian Label Propagation with Confidence
Peter LaBrash resume
Presentación Iriana Colina Seguridad industrial 2 semestre
áGuia azul gracilene pinto
Introducción a la computacón
How to prepare for class 12 board exams
Digimon leonardo zec & ivan đopar 7.d
Cake Phpで簡単問い合わせフォームの作り方
Ad

Similar to Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junction Trees (20)

PDF
Lecture15 xing
PPT
Basen Network
PPT
Bayesnetwork
PDF
kactl.pdf
PPT
Bayesian Networks Model in Step By Steps
PPT
ch15BayesNet.ppt
PDF
Mit18 05 s14_class3slides
PPT
Bayesian Belief Network (BBN) Bayesian Belief Network (BBN) Bayesian Belief N...
PDF
Reading Seminar (140515) Spectral Learning of L-PCFGs
PDF
A Probabilistic Attack On NP-Complete Problems
PPT
Discrete probability
PDF
Lesson 29
PDF
AI Lesson 29
PDF
Bayes Belief Networks
PDF
ExamsGamesAndKnapsacks_RobMooreOxfordThesis
PPTX
Randomized algorithms all pairs shortest path
PDF
Ab cancun
PDF
Statistical inference of generative network models - Tiago P. Peixoto
PDF
Learning Bayesian Networks
PDF
Probability Formula sheet
Lecture15 xing
Basen Network
Bayesnetwork
kactl.pdf
Bayesian Networks Model in Step By Steps
ch15BayesNet.ppt
Mit18 05 s14_class3slides
Bayesian Belief Network (BBN) Bayesian Belief Network (BBN) Bayesian Belief N...
Reading Seminar (140515) Spectral Learning of L-PCFGs
A Probabilistic Attack On NP-Complete Problems
Discrete probability
Lesson 29
AI Lesson 29
Bayes Belief Networks
ExamsGamesAndKnapsacks_RobMooreOxfordThesis
Randomized algorithms all pairs shortest path
Ab cancun
Statistical inference of generative network models - Tiago P. Peixoto
Learning Bayesian Networks
Probability Formula sheet

More from Andres Mendez-Vazquez (20)

PDF
2.03 bayesian estimation
PDF
05 linear transformations
PDF
01.04 orthonormal basis_eigen_vectors
PDF
01.03 squared matrices_and_other_issues
PDF
01.02 linear equations
PDF
01.01 vector spaces
PDF
06 recurrent neural_networks
PDF
05 backpropagation automatic_differentiation
PDF
Zetta global
PDF
01 Introduction to Neural Networks and Deep Learning
PDF
25 introduction reinforcement_learning
PDF
Neural Networks and Deep Learning Syllabus
PDF
Introduction to artificial_intelligence_syllabus
PDF
Ideas 09 22_2018
PDF
Ideas about a Bachelor in Machine Learning/Data Sciences
PDF
Analysis of Algorithms Syllabus
PDF
18.1 combining models
PDF
17 vapnik chervonenkis dimension
PDF
A basic introduction to learning
PDF
Introduction Mathematics Intelligent Systems Syllabus
2.03 bayesian estimation
05 linear transformations
01.04 orthonormal basis_eigen_vectors
01.03 squared matrices_and_other_issues
01.02 linear equations
01.01 vector spaces
06 recurrent neural_networks
05 backpropagation automatic_differentiation
Zetta global
01 Introduction to Neural Networks and Deep Learning
25 introduction reinforcement_learning
Neural Networks and Deep Learning Syllabus
Introduction to artificial_intelligence_syllabus
Ideas 09 22_2018
Ideas about a Bachelor in Machine Learning/Data Sciences
Analysis of Algorithms Syllabus
18.1 combining models
17 vapnik chervonenkis dimension
A basic introduction to learning
Introduction Mathematics Intelligent Systems Syllabus

Recently uploaded (20)

PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
Construction Project Organization Group 2.pptx
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PPT
Mechanical Engineering MATERIALS Selection
PPTX
Sustainable Sites - Green Building Construction
PPTX
Welding lecture in detail for understanding
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PDF
PPT on Performance Review to get promotions
PDF
Well-logging-methods_new................
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PPTX
CH1 Production IntroductoryConcepts.pptx
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PDF
composite construction of structures.pdf
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
Construction Project Organization Group 2.pptx
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
Mechanical Engineering MATERIALS Selection
Sustainable Sites - Green Building Construction
Welding lecture in detail for understanding
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
Operating System & Kernel Study Guide-1 - converted.pdf
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PPT on Performance Review to get promotions
Well-logging-methods_new................
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
CH1 Production IntroductoryConcepts.pptx
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
composite construction of structures.pdf

Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junction Trees

  • 1. Artificial Intelligence Belief Propagation and Junction Trees Andres Mendez-Vazquez March 28, 2016 1 / 102
  • 2. Outline 1 Introduction What do we want? 2 Belief Propagation The Intuition Inference on Trees The Messages The Implementation 3 Junction Trees How do you build a Junction Tree? Chordal Graph Tree Graphs Junction Tree Formal Definition Algorithm For Building Junction Trees Example Moralize the DAG Triangulate Listing of Cliques Potential Function Propagating Information in a Junction Tree Example Now, the Full Propagation Example of Propagation 2 / 102
  • 3. Outline 1 Introduction What do we want? 2 Belief Propagation The Intuition Inference on Trees The Messages The Implementation 3 Junction Trees How do you build a Junction Tree? Chordal Graph Tree Graphs Junction Tree Formal Definition Algorithm For Building Junction Trees Example Moralize the DAG Triangulate Listing of Cliques Potential Function Propagating Information in a Junction Tree Example Now, the Full Propagation Example of Propagation 3 / 102
  • 4. Introduction We will be looking at the following algorithms Pearl’s Belief Propagation Algorithm Junction Tree Algorithm Belief Propagation Algorithm The algorithm was first proposed by Judea Pearl in 1982, who formulated this algorithm on trees, and was later extended to polytrees. 4 / 102
  • 5. Introduction We will be looking at the following algorithms Pearl’s Belief Propagation Algorithm Junction Tree Algorithm Belief Propagation Algorithm The algorithm was first proposed by Judea Pearl in 1982, who formulated this algorithm on trees, and was later extended to polytrees. 4 / 102
  • 6. Introduction We will be looking at the following algorithms Pearl’s Belief Propagation Algorithm Junction Tree Algorithm Belief Propagation Algorithm The algorithm was first proposed by Judea Pearl in 1982, who formulated this algorithm on trees, and was later extended to polytrees. A C D B E F G H I 4 / 102
  • 7. Introduction Something Notable It has since been shown to be a useful approximate algorithm on general graphs. Junction Tree Algorithm The junction tree algorithm (also known as ’Clique Tree’) is a method used in machine learning to extract marginalization in general graphs. it entails performing belief propagation on a modified graph called a junction tree by cycle elimination 5 / 102
  • 8. Introduction Something Notable It has since been shown to be a useful approximate algorithm on general graphs. Junction Tree Algorithm The junction tree algorithm (also known as ’Clique Tree’) is a method used in machine learning to extract marginalization in general graphs. it entails performing belief propagation on a modified graph called a junction tree by cycle elimination 5 / 102
  • 9. Outline 1 Introduction What do we want? 2 Belief Propagation The Intuition Inference on Trees The Messages The Implementation 3 Junction Trees How do you build a Junction Tree? Chordal Graph Tree Graphs Junction Tree Formal Definition Algorithm For Building Junction Trees Example Moralize the DAG Triangulate Listing of Cliques Potential Function Propagating Information in a Junction Tree Example Now, the Full Propagation Example of Propagation 6 / 102
  • 10. Example The Message Passing Stuff 7 / 102
  • 11. Thus We can do the following To pass information from below and from above to a certain node V . Thus We call those messages π from above. λ from below. 8 / 102
  • 12. Thus We can do the following To pass information from below and from above to a certain node V . Thus We call those messages π from above. λ from below. 8 / 102
  • 13. Thus We can do the following To pass information from below and from above to a certain node V . Thus We call those messages π from above. λ from below. 8 / 102
  • 14. Thus We can do the following To pass information from below and from above to a certain node V . Thus We call those messages π from above. λ from below. 8 / 102
  • 15. Outline 1 Introduction What do we want? 2 Belief Propagation The Intuition Inference on Trees The Messages The Implementation 3 Junction Trees How do you build a Junction Tree? Chordal Graph Tree Graphs Junction Tree Formal Definition Algorithm For Building Junction Trees Example Moralize the DAG Triangulate Listing of Cliques Potential Function Propagating Information in a Junction Tree Example Now, the Full Propagation Example of Propagation 9 / 102
  • 16. Inference on Trees Recall A rooted tree is a DAG Now Let (G, P) be a Bayesian network whose DAG is a tree. Let a be a set of values of a subset A ⊂ V . For simplicity Imagine that each node has two children. The general case can be inferred from it. 10 / 102
  • 17. Inference on Trees Recall A rooted tree is a DAG Now Let (G, P) be a Bayesian network whose DAG is a tree. Let a be a set of values of a subset A ⊂ V . For simplicity Imagine that each node has two children. The general case can be inferred from it. 10 / 102
  • 18. Inference on Trees Recall A rooted tree is a DAG Now Let (G, P) be a Bayesian network whose DAG is a tree. Let a be a set of values of a subset A ⊂ V . For simplicity Imagine that each node has two children. The general case can be inferred from it. 10 / 102
  • 19. Inference on Trees Recall A rooted tree is a DAG Now Let (G, P) be a Bayesian network whose DAG is a tree. Let a be a set of values of a subset A ⊂ V . For simplicity Imagine that each node has two children. The general case can be inferred from it. 10 / 102
  • 20. Inference on Trees Recall A rooted tree is a DAG Now Let (G, P) be a Bayesian network whose DAG is a tree. Let a be a set of values of a subset A ⊂ V . For simplicity Imagine that each node has two children. The general case can be inferred from it. 10 / 102
  • 21. Then Let DX be the subset of A Containing all members that are in the subtree rooted at X Including X if X ∈ A Let NX be the subset Containing all members of A that are non-descendant’s of X. This set includes X if X ∈ A 11 / 102
  • 22. Then Let DX be the subset of A Containing all members that are in the subtree rooted at X Including X if X ∈ A Let NX be the subset Containing all members of A that are non-descendant’s of X. This set includes X if X ∈ A 11 / 102
  • 23. Then Let DX be the subset of A Containing all members that are in the subtree rooted at X Including X if X ∈ A Let NX be the subset Containing all members of A that are non-descendant’s of X. This set includes X if X ∈ A 11 / 102
  • 24. Then Let DX be the subset of A Containing all members that are in the subtree rooted at X Including X if X ∈ A Let NX be the subset Containing all members of A that are non-descendant’s of X. This set includes X if X ∈ A 11 / 102
  • 25. Example We have that A = NX ∪ DX A X 12 / 102
  • 26. Thus We have for each value of x P (x|A) = P (x|dX , nX ) = P (dX , nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX , x) P (x) P (x) P (dX , nX ) = P (dX |x) P (x|nX ) P (nX ) P (dX , nX ) Here because d-speration if X /∈ A = P (dX |x) P (x|nX ) P (nX ) P (dX |nX ) P (nX ) Note: You need to prove when X ∈ A 13 / 102
  • 27. Thus We have for each value of x P (x|A) = P (x|dX , nX ) = P (dX , nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX , x) P (x) P (x) P (dX , nX ) = P (dX |x) P (x|nX ) P (nX ) P (dX , nX ) Here because d-speration if X /∈ A = P (dX |x) P (x|nX ) P (nX ) P (dX |nX ) P (nX ) Note: You need to prove when X ∈ A 13 / 102
  • 28. Thus We have for each value of x P (x|A) = P (x|dX , nX ) = P (dX , nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX , x) P (x) P (x) P (dX , nX ) = P (dX |x) P (x|nX ) P (nX ) P (dX , nX ) Here because d-speration if X /∈ A = P (dX |x) P (x|nX ) P (nX ) P (dX |nX ) P (nX ) Note: You need to prove when X ∈ A 13 / 102
  • 29. Thus We have for each value of x P (x|A) = P (x|dX , nX ) = P (dX , nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX , x) P (x) P (x) P (dX , nX ) = P (dX |x) P (x|nX ) P (nX ) P (dX , nX ) Here because d-speration if X /∈ A = P (dX |x) P (x|nX ) P (nX ) P (dX |nX ) P (nX ) Note: You need to prove when X ∈ A 13 / 102
  • 30. Thus We have for each value of x P (x|A) = P (x|dX , nX ) = P (dX , nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX , x) P (x) P (x) P (dX , nX ) = P (dX |x) P (x|nX ) P (nX ) P (dX , nX ) Here because d-speration if X /∈ A = P (dX |x) P (x|nX ) P (nX ) P (dX |nX ) P (nX ) Note: You need to prove when X ∈ A 13 / 102
  • 31. Thus We have for each value of x P (x|A) = P (x|dX , nX ) = P (dX , nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX , x) P (x) P (x) P (dX , nX ) = P (dX |x) P (x|nX ) P (nX ) P (dX , nX ) Here because d-speration if X /∈ A = P (dX |x) P (x|nX ) P (nX ) P (dX |nX ) P (nX ) Note: You need to prove when X ∈ A 13 / 102
  • 32. Thus We have for each value of x P (x|A) = P (x|dX , nX ) = P (dX , nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX , x) P (x) P (x) P (dX , nX ) = P (dX |x) P (x|nX ) P (nX ) P (dX , nX ) Here because d-speration if X /∈ A = P (dX |x) P (x|nX ) P (nX ) P (dX |nX ) P (nX ) Note: You need to prove when X ∈ A 13 / 102
  • 33. Thus We have for each value of x P (x|A) = P (dX |x) P (x|nX ) P (dX |nX ) = βP (dX |x) P (x|nX ) where β, the normalizing factor, is a constant not depending on x. 14 / 102
  • 34. Outline 1 Introduction What do we want? 2 Belief Propagation The Intuition Inference on Trees The Messages The Implementation 3 Junction Trees How do you build a Junction Tree? Chordal Graph Tree Graphs Junction Tree Formal Definition Algorithm For Building Junction Trees Example Moralize the DAG Triangulate Listing of Cliques Potential Function Propagating Information in a Junction Tree Example Now, the Full Propagation Example of Propagation 15 / 102
  • 35. Now, we develop the messages We want λ (x) P (dX |x) π (x) P (x|nX ) Where means “proportional to” Meaning π(x) may not be equal to P (x|nX ), but π(x) = k × P (x|nX ). Once, we have that P (x|a) = αλ (x) π (x) where α, the normalizing factor, is a constant not depending on x. 16 / 102
  • 36. Now, we develop the messages We want λ (x) P (dX |x) π (x) P (x|nX ) Where means “proportional to” Meaning π(x) may not be equal to P (x|nX ), but π(x) = k × P (x|nX ). Once, we have that P (x|a) = αλ (x) π (x) where α, the normalizing factor, is a constant not depending on x. 16 / 102
  • 37. Now, we develop the messages We want λ (x) P (dX |x) π (x) P (x|nX ) Where means “proportional to” Meaning π(x) may not be equal to P (x|nX ), but π(x) = k × P (x|nX ). Once, we have that P (x|a) = αλ (x) π (x) where α, the normalizing factor, is a constant not depending on x. 16 / 102
  • 38. Now, we develop the messages We want λ (x) P (dX |x) π (x) P (x|nX ) Where means “proportional to” Meaning π(x) may not be equal to P (x|nX ), but π(x) = k × P (x|nX ). Once, we have that P (x|a) = αλ (x) π (x) where α, the normalizing factor, is a constant not depending on x. 16 / 102
  • 39. Now, we develop the messages We want λ (x) P (dX |x) π (x) P (x|nX ) Where means “proportional to” Meaning π(x) may not be equal to P (x|nX ), but π(x) = k × P (x|nX ). Once, we have that P (x|a) = αλ (x) π (x) where α, the normalizing factor, is a constant not depending on x. 16 / 102
  • 40. Now, we develop the messages We want λ (x) P (dX |x) π (x) P (x|nX ) Where means “proportional to” Meaning π(x) may not be equal to P (x|nX ), but π(x) = k × P (x|nX ). Once, we have that P (x|a) = αλ (x) π (x) where α, the normalizing factor, is a constant not depending on x. 16 / 102
  • 41. Developing λ (x) We need λ (x) P (dX |x) Case 1: X ∈ A and X ∈ DX Given any X = ˆx, we have that for P (dX |x) = 0 for x = ˆx Thus, to achieve proportionality, we can set λ (ˆx) ≡ 1 λ (x) ≡ 0 for x = ˆx 17 / 102
  • 42. Developing λ (x) We need λ (x) P (dX |x) Case 1: X ∈ A and X ∈ DX Given any X = ˆx, we have that for P (dX |x) = 0 for x = ˆx Thus, to achieve proportionality, we can set λ (ˆx) ≡ 1 λ (x) ≡ 0 for x = ˆx 17 / 102
  • 43. Developing λ (x) We need λ (x) P (dX |x) Case 1: X ∈ A and X ∈ DX Given any X = ˆx, we have that for P (dX |x) = 0 for x = ˆx Thus, to achieve proportionality, we can set λ (ˆx) ≡ 1 λ (x) ≡ 0 for x = ˆx 17 / 102
  • 44. Developing λ (x) We need λ (x) P (dX |x) Case 1: X ∈ A and X ∈ DX Given any X = ˆx, we have that for P (dX |x) = 0 for x = ˆx Thus, to achieve proportionality, we can set λ (ˆx) ≡ 1 λ (x) ≡ 0 for x = ˆx 17 / 102
  • 45. Now Case 2: X /∈ A and X is a leaf Then, dX = ∅ and P (dX |x) = P (∅|x) = 1 for all values of x Thus, to achieve proportionality, we can set λ (x) ≡ 1 for all values of x 18 / 102
  • 46. Now Case 2: X /∈ A and X is a leaf Then, dX = ∅ and P (dX |x) = P (∅|x) = 1 for all values of x Thus, to achieve proportionality, we can set λ (x) ≡ 1 for all values of x 18 / 102
  • 47. Finally Case 3: X /∈ A and X is a non-leaf Let Y be X’s left child, W be X’s right child. Since X /∈ A DX = DY ∪ DW 19 / 102
  • 48. Finally Case 3: X /∈ A and X is a non-leaf Let Y be X’s left child, W be X’s right child. Since X /∈ A DX = DY ∪ DW X Y W 19 / 102
  • 49. Thus We have then P (dX |x) = P (dY , dW |x) = P (dY |x) P (dW |x) Because the d-separation at X = y P (dY , y|x) w P (dW , w|x) = y P (y|x) P (dY |y) w P (w|x) P (dW |w) y P (y|x) λ (y) w P (w|x) λ (w) Thus, we can get proportionality by defining for all values of x λY (x) = y P (y|x) λ (y) λW (x) = w P (w|x) λ (w) 20 / 102
  • 50. Thus We have then P (dX |x) = P (dY , dW |x) = P (dY |x) P (dW |x) Because the d-separation at X = y P (dY , y|x) w P (dW , w|x) = y P (y|x) P (dY |y) w P (w|x) P (dW |w) y P (y|x) λ (y) w P (w|x) λ (w) Thus, we can get proportionality by defining for all values of x λY (x) = y P (y|x) λ (y) λW (x) = w P (w|x) λ (w) 20 / 102
  • 51. Thus We have then P (dX |x) = P (dY , dW |x) = P (dY |x) P (dW |x) Because the d-separation at X = y P (dY , y|x) w P (dW , w|x) = y P (y|x) P (dY |y) w P (w|x) P (dW |w) y P (y|x) λ (y) w P (w|x) λ (w) Thus, we can get proportionality by defining for all values of x λY (x) = y P (y|x) λ (y) λW (x) = w P (w|x) λ (w) 20 / 102
  • 52. Thus We have then P (dX |x) = P (dY , dW |x) = P (dY |x) P (dW |x) Because the d-separation at X = y P (dY , y|x) w P (dW , w|x) = y P (y|x) P (dY |y) w P (w|x) P (dW |w) y P (y|x) λ (y) w P (w|x) λ (w) Thus, we can get proportionality by defining for all values of x λY (x) = y P (y|x) λ (y) λW (x) = w P (w|x) λ (w) 20 / 102
  • 53. Thus We have then P (dX |x) = P (dY , dW |x) = P (dY |x) P (dW |x) Because the d-separation at X = y P (dY , y|x) w P (dW , w|x) = y P (y|x) P (dY |y) w P (w|x) P (dW |w) y P (y|x) λ (y) w P (w|x) λ (w) Thus, we can get proportionality by defining for all values of x λY (x) = y P (y|x) λ (y) λW (x) = w P (w|x) λ (w) 20 / 102
  • 54. Thus We have then P (dX |x) = P (dY , dW |x) = P (dY |x) P (dW |x) Because the d-separation at X = y P (dY , y|x) w P (dW , w|x) = y P (y|x) P (dY |y) w P (w|x) P (dW |w) y P (y|x) λ (y) w P (w|x) λ (w) Thus, we can get proportionality by defining for all values of x λY (x) = y P (y|x) λ (y) λW (x) = w P (w|x) λ (w) 20 / 102
  • 55. Thus We have then P (dX |x) = P (dY , dW |x) = P (dY |x) P (dW |x) Because the d-separation at X = y P (dY , y|x) w P (dW , w|x) = y P (y|x) P (dY |y) w P (w|x) P (dW |w) y P (y|x) λ (y) w P (w|x) λ (w) Thus, we can get proportionality by defining for all values of x λY (x) = y P (y|x) λ (y) λW (x) = w P (w|x) λ (w) 20 / 102
  • 56. Thus We have then λ (x) = λY (x) λW (x) for all values x 21 / 102
  • 57. Developing π (x) We need π (x) P (x|nX ) Case 1: X ∈ A and X ∈ NX Given any X = ˆx, we have: P (ˆx|nX ) = P (ˆx|ˆx) = 1 P (x|nX ) = P (x|ˆx) = 0 for x = ˆx Thus, to achieve proportionality, we can set π (ˆx) ≡ 1 π (x) ≡ 0 for x = ˆx 22 / 102
  • 58. Developing π (x) We need π (x) P (x|nX ) Case 1: X ∈ A and X ∈ NX Given any X = ˆx, we have: P (ˆx|nX ) = P (ˆx|ˆx) = 1 P (x|nX ) = P (x|ˆx) = 0 for x = ˆx Thus, to achieve proportionality, we can set π (ˆx) ≡ 1 π (x) ≡ 0 for x = ˆx 22 / 102
  • 59. Developing π (x) We need π (x) P (x|nX ) Case 1: X ∈ A and X ∈ NX Given any X = ˆx, we have: P (ˆx|nX ) = P (ˆx|ˆx) = 1 P (x|nX ) = P (x|ˆx) = 0 for x = ˆx Thus, to achieve proportionality, we can set π (ˆx) ≡ 1 π (x) ≡ 0 for x = ˆx 22 / 102
  • 60. Developing π (x) We need π (x) P (x|nX ) Case 1: X ∈ A and X ∈ NX Given any X = ˆx, we have: P (ˆx|nX ) = P (ˆx|ˆx) = 1 P (x|nX ) = P (x|ˆx) = 0 for x = ˆx Thus, to achieve proportionality, we can set π (ˆx) ≡ 1 π (x) ≡ 0 for x = ˆx 22 / 102
  • 61. Developing π (x) We need π (x) P (x|nX ) Case 1: X ∈ A and X ∈ NX Given any X = ˆx, we have: P (ˆx|nX ) = P (ˆx|ˆx) = 1 P (x|nX ) = P (x|ˆx) = 0 for x = ˆx Thus, to achieve proportionality, we can set π (ˆx) ≡ 1 π (x) ≡ 0 for x = ˆx 22 / 102
  • 62. Developing π (x) We need π (x) P (x|nX ) Case 1: X ∈ A and X ∈ NX Given any X = ˆx, we have: P (ˆx|nX ) = P (ˆx|ˆx) = 1 P (x|nX ) = P (x|ˆx) = 0 for x = ˆx Thus, to achieve proportionality, we can set π (ˆx) ≡ 1 π (x) ≡ 0 for x = ˆx 22 / 102
  • 63. Now Case 2: X /∈ A and X is the root In this specific case nX = ∅ or the empty set of random variables. Then P (x|nX ) = P (x|∅) = P (x) for all values of x Enforcing the proportionality, we get π (x) ≡ P (x) for all values of x 23 / 102
  • 64. Now Case 2: X /∈ A and X is the root In this specific case nX = ∅ or the empty set of random variables. Then P (x|nX ) = P (x|∅) = P (x) for all values of x Enforcing the proportionality, we get π (x) ≡ P (x) for all values of x 23 / 102
  • 65. Now Case 2: X /∈ A and X is the root In this specific case nX = ∅ or the empty set of random variables. Then P (x|nX ) = P (x|∅) = P (x) for all values of x Enforcing the proportionality, we get π (x) ≡ P (x) for all values of x 23 / 102
  • 66. Then Case 3: X /∈ A and X is not the root Without loss of generality assume X is Z’s right child and T is the Z’s left child Then, NX = NZ ∪ DT 24 / 102
  • 67. Then Case 3: X /∈ A and X is not the root Without loss of generality assume X is Z’s right child and T is the Z’s left child Then, NX = NZ ∪ DT Z T X 24 / 102
  • 68. Then We have P (x|nX ) = z P (x|z) P (z|nX ) = z P (x|z) P (z|nZ , dT ) = z P (x|z) P (z, nZ , dT ) P (nZ , dT ) = z P (x|z) P (dT , z|nZ ) P (nZ ) P (nZ , dT ) = z P (x|z) P (dT |z, nZ ) P (z|nZ ) P(nZ ) P (nZ , dT ) = z P (x|z) P (dT |z) P (z|nZ ) P(nZ ) P (nZ , dT ) Again the d-separation for z 25 / 102
  • 69. Then We have P (x|nX ) = z P (x|z) P (z|nX ) = z P (x|z) P (z|nZ , dT ) = z P (x|z) P (z, nZ , dT ) P (nZ , dT ) = z P (x|z) P (dT , z|nZ ) P (nZ ) P (nZ , dT ) = z P (x|z) P (dT |z, nZ ) P (z|nZ ) P(nZ ) P (nZ , dT ) = z P (x|z) P (dT |z) P (z|nZ ) P(nZ ) P (nZ , dT ) Again the d-separation for z 25 / 102
  • 70. Then We have P (x|nX ) = z P (x|z) P (z|nX ) = z P (x|z) P (z|nZ , dT ) = z P (x|z) P (z, nZ , dT ) P (nZ , dT ) = z P (x|z) P (dT , z|nZ ) P (nZ ) P (nZ , dT ) = z P (x|z) P (dT |z, nZ ) P (z|nZ ) P(nZ ) P (nZ , dT ) = z P (x|z) P (dT |z) P (z|nZ ) P(nZ ) P (nZ , dT ) Again the d-separation for z 25 / 102
  • 71. Then We have P (x|nX ) = z P (x|z) P (z|nX ) = z P (x|z) P (z|nZ , dT ) = z P (x|z) P (z, nZ , dT ) P (nZ , dT ) = z P (x|z) P (dT , z|nZ ) P (nZ ) P (nZ , dT ) = z P (x|z) P (dT |z, nZ ) P (z|nZ ) P(nZ ) P (nZ , dT ) = z P (x|z) P (dT |z) P (z|nZ ) P(nZ ) P (nZ , dT ) Again the d-separation for z 25 / 102
  • 72. Then We have P (x|nX ) = z P (x|z) P (z|nX ) = z P (x|z) P (z|nZ , dT ) = z P (x|z) P (z, nZ , dT ) P (nZ , dT ) = z P (x|z) P (dT , z|nZ ) P (nZ ) P (nZ , dT ) = z P (x|z) P (dT |z, nZ ) P (z|nZ ) P(nZ ) P (nZ , dT ) = z P (x|z) P (dT |z) P (z|nZ ) P(nZ ) P (nZ , dT ) Again the d-separation for z 25 / 102
  • 73. Then We have P (x|nX ) = z P (x|z) P (z|nX ) = z P (x|z) P (z|nZ , dT ) = z P (x|z) P (z, nZ , dT ) P (nZ , dT ) = z P (x|z) P (dT , z|nZ ) P (nZ ) P (nZ , dT ) = z P (x|z) P (dT |z, nZ ) P (z|nZ ) P(nZ ) P (nZ , dT ) = z P (x|z) P (dT |z) P (z|nZ ) P(nZ ) P (nZ , dT ) Again the d-separation for z 25 / 102
  • 74. Last Step We have P (x|nX ) = z P (x|z) P (z|nZ ) P (nZ ) P (dT |z) P (nZ , dT ) = γ z P (x|z) π (z) λT (z) where γ = P(nZ ) P(nZ ,dT ) Thus, we can achieve proportionality by πX (z) ≡ π (z) λT (z) Then, setting π (x) ≡ z P (x|z) πX (z) for all values of x 26 / 102
  • 75. Last Step We have P (x|nX ) = z P (x|z) P (z|nZ ) P (nZ ) P (dT |z) P (nZ , dT ) = γ z P (x|z) π (z) λT (z) where γ = P(nZ ) P(nZ ,dT ) Thus, we can achieve proportionality by πX (z) ≡ π (z) λT (z) Then, setting π (x) ≡ z P (x|z) πX (z) for all values of x 26 / 102
  • 76. Last Step We have P (x|nX ) = z P (x|z) P (z|nZ ) P (nZ ) P (dT |z) P (nZ , dT ) = γ z P (x|z) π (z) λT (z) where γ = P(nZ ) P(nZ ,dT ) Thus, we can achieve proportionality by πX (z) ≡ π (z) λT (z) Then, setting π (x) ≡ z P (x|z) πX (z) for all values of x 26 / 102
  • 77. Outline 1 Introduction What do we want? 2 Belief Propagation The Intuition Inference on Trees The Messages The Implementation 3 Junction Trees How do you build a Junction Tree? Chordal Graph Tree Graphs Junction Tree Formal Definition Algorithm For Building Junction Trees Example Moralize the DAG Triangulate Listing of Cliques Potential Function Propagating Information in a Junction Tree Example Now, the Full Propagation Example of Propagation 27 / 102
  • 78. How do we implement this?
We require the following functions: initial_tree and update_tree.
initial_tree has the following input and output
Input: ((G, P), A, a, P(x|a))
Output: After this call, A and a are both empty, making P(x|a) the prior probability of x.
Then, each time a variable V is instantiated for v̂, the routine update_tree is called
Input: ((G, P), A, a, V, v̂, P(x|a))
Output: After this call, V has been added to A, v̂ has been added to a, and for every value of x, P(x|a) has been updated to be the conditional probability of x given the new a. 28 / 102
  • 81. Algorithm: Inference-in-trees
Problem: Given a Bayesian network whose DAG is a tree, determine the probabilities of the values of each node conditional on specified values of the nodes in some subset.
Input: Bayesian network (G, P) whose DAG is a tree, where G = (V, E), and a set of values a of a subset A ⊆ V.
Output: The Bayesian network (G, P) updated according to the values in a. The λ and π values and messages and P(x|a) for each X ∈ V are considered part of the network. 29 / 102
  • 84. Initializing the tree
void initial_tree
input: (Bayesian-network& (G, P) where G = (V, E), set-of-variables& A, set-of-variable-values& a)
1  A = ∅
2  a = ∅
3  for (each X ∈ V)
4    for (each value x of X)
5      λ(x) = 1                     // Compute λ values.
6    for (the parent Z of X)        // Does nothing if X is the root.
7      for (each value z of Z)
8        λ_X(z) = 1                 // Compute λ messages.
9  for (each value r of the root R)
10   P(r|a) = P(r)                  // Compute P(r|a).
11   π(r) = P(r)                    // Compute R's π values.
12 for (each child X of R)
13   send_π_msg(R, X) 30 / 102
  • 90. Updating the tree
void update_tree
Input: (Bayesian-network& (G, P) where G = (V, E), set-of-variables& A, set-of-variable-values& a, variable V, variable-value v̂)
1  A = A ∪ {V}, a = a ∪ {v̂}        // Add V to A and instantiate V to v̂.
2  λ(v̂) = 1, π(v̂) = 1, P(v̂|a) = 1
3  for (each value v ≠ v̂)
4    λ(v) = 0, π(v) = 0, P(v|a) = 0
5  if (V is not the root && V's parent Z ∉ A)
6    send_λ_msg(V, Z)
7  for (each child X of V such that X ∉ A)
8    send_π_msg(V, X) 31 / 102
  • 96. Sending the λ message
void send_λ_msg(node Y, node X)
Note: For simplicity (G, P) is not shown as input.
1  for (each value of x)
2    λ_Y(x) = Σ_y P(y|x) λ(y)           // Y sends X a λ message.
3    λ(x) = Π_{U ∈ CH_X} λ_U(x)         // Compute X's λ values.
4    P(x|a) = α λ(x) π(x)               // Compute P(x|a).
5  normalize P(x|a)
6  if (X is not the root and X's parent Z ∉ A)
7    send_λ_msg(X, Z)
8  for (each child W of X such that W ≠ Y and W ∉ A)
9    send_π_msg(X, W) 32 / 102
  • 100. Sending the π message
void send_π_msg(node Z, node X)
Note: For simplicity (G, P) is not shown as input.
1  for (each value of z)
2    π_X(z) = π(z) Π_{Y ∈ CH_Z − {X}} λ_Y(z)   // Z sends X a π message.
3  for (each value of x)
4    π(x) = Σ_z P(x|z) π_X(z)           // Compute X's π values.
5    P(x|a) = α λ(x) π(x)               // Compute P(x|a).
6  normalize P(x|a)
7  for (each child Y of X such that Y ∉ A)
8    send_π_msg(X, Y) 33 / 102
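To make these routines concrete, here is a minimal, self-contained Python sketch of the same message passing. The Node class, the dict-based tables, and names such as send_pi_msg are assumptions of this sketch, not part of the original pseudocode (λ is spelled lam); the set A of instantiated variables is passed around explicitly.

    from math import prod

    class Node:
        # 'cpt' maps (x, z) -> P(x|z) for non-roots, or x -> P(x) for the root.
        def __init__(self, values, cpt, parent=None):
            self.values, self.cpt, self.parent = values, cpt, parent
            self.children = []
            if parent:
                parent.children.append(self)
            self.lam, self.pi, self.post = {}, {}, {}   # λ, π, P(x|a)
            self.lam_msg = {}                           # λ message to the parent

    def normalize(t):
        s = sum(t.values())
        return {k: v / s for k, v in t.items()}

    def initial_tree(root, nodes):
        for X in nodes:                                 # λ values and messages to 1
            X.lam = {x: 1.0 for x in X.values}
            if X.parent:
                X.lam_msg = {z: 1.0 for z in X.parent.values}
        root.pi = {r: root.cpt[r] for r in root.values} # π(r) = P(r)
        root.post = dict(root.pi)
        for X in root.children:
            send_pi_msg(root, X, set())

    def send_pi_msg(Z, X, A):
        pi_X = {z: Z.pi[z] * prod(Y.lam_msg[z] for Y in Z.children if Y is not X)
                for z in Z.values}                      # Z sends X a π message
        X.pi = {x: sum(X.cpt[(x, z)] * pi_X[z] for z in Z.values) for x in X.values}
        X.post = normalize({x: X.lam[x] * X.pi[x] for x in X.values})
        for Y in X.children:
            if Y not in A:
                send_pi_msg(X, Y, A)

    def send_lambda_msg(Y, X, A):
        for x in X.values:                              # Y sends X a λ message
            Y.lam_msg[x] = sum(Y.cpt[(y, x)] * Y.lam[y] for y in Y.values)
        X.lam = {x: prod(U.lam_msg[x] for U in X.children) for x in X.values}
        X.post = normalize({x: X.lam[x] * X.pi[x] for x in X.values})
        if X.parent and X.parent not in A:
            send_lambda_msg(X, X.parent, A)
        for W in X.children:
            if W is not Y and W not in A:
                send_pi_msg(X, W, A)

    def update_tree(V, v_hat, A):
        A.add(V)                                        # instantiate V to v_hat
        V.lam = {v: float(v == v_hat) for v in V.values}
        V.pi, V.post = dict(V.lam), dict(V.lam)
        if V.parent and V.parent not in A:
            send_lambda_msg(V, V.parent, A)
        for X in V.children:
            if X not in A:
                send_pi_msg(V, X, A)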
  • 103. Example of Tree Initialization
We have the following tree: root H with children B and L, and C a child of L (the network figure is omitted in this extraction). 34 / 102
  • 104. Calling initial_tree((G, P), A, a)
We have then A = ∅, a = ∅
Compute λ values
λ(h1) = 1; λ(h2) = 1; λ(b1) = 1; λ(b2) = 1; λ(l1) = 1; λ(l2) = 1; λ(c1) = 1; λ(c2) = 1
Compute λ messages
λ_B(h1) = 1; λ_B(h2) = 1; λ_L(h1) = 1; λ_L(h2) = 1; λ_C(l1) = 1; λ_C(l2) = 1 35 / 102
  • 112. Calling initial_tree((G, P), A, a)
Compute P(h|∅)
P(h1|∅) = P(h1) = 0.2
P(h2|∅) = P(h2) = 0.8
Compute H's π values
π(h1) = P(h1) = 0.2
π(h2) = P(h2) = 0.8
Send messages
send_π_msg(H, B)
send_π_msg(H, L) 36 / 102
  • 118. The call send_π_msg(H, B)
H sends B a π message
π_B(h1) = π(h1) λ_L(h1) = 0.2 × 1 = 0.2
π_B(h2) = π(h2) λ_L(h2) = 0.8 × 1 = 0.8
Compute B's π values
π(b1) = P(b1|h1) π_B(h1) + P(b1|h2) π_B(h2) = (0.25)(0.2) + (0.05)(0.8) = 0.09
π(b2) = P(b2|h1) π_B(h1) + P(b2|h2) π_B(h2) = (0.75)(0.2) + (0.95)(0.8) = 0.91 37 / 102
  • 124. The call send_π_msg(H, B)
Compute P(b|∅)
P(b1|∅) = α λ(b1) π(b1) = α(1)(0.09) = 0.09α
P(b2|∅) = α λ(b2) π(b2) = α(1)(0.91) = 0.91α
Then, normalize
P(b1|∅) = 0.09α / (0.09α + 0.91α) = 0.09
P(b2|∅) = 0.91α / (0.09α + 0.91α) = 0.91 38 / 102
  • 128. Send the call send_π_msg(H, L)
H sends L a π message
π_L(h1) = π(h1) λ_B(h1) = (0.2)(1) = 0.2
π_L(h2) = π(h2) λ_B(h2) = (0.8)(1) = 0.8
Compute L's π values
π(l1) = P(l1|h1) π_L(h1) + P(l1|h2) π_L(h2) = (0.003)(0.2) + (0.00005)(0.8) = 0.00064
π(l2) = P(l2|h1) π_L(h1) + P(l2|h2) π_L(h2) = (0.997)(0.2) + (0.99995)(0.8) = 0.99936
Compute P(l|∅)
P(l1|∅) = α λ(l1) π(l1) = α(1)(0.00064) = 0.00064α
P(l2|∅) = α λ(l2) π(l2) = α(1)(0.99936) = 0.99936α 39 / 102
  • 134. Send the call send_π_msg(H, L)
Then, normalize
P(l1|∅) = 0.00064α / (0.00064α + 0.99936α) = 0.00064
P(l2|∅) = 0.99936α / (0.00064α + 0.99936α) = 0.99936 40 / 102
  • 136. Send the call send_π_msg(L, C)
L sends C a π message
π_C(l1) = π(l1) = 0.00064
π_C(l2) = π(l2) = 0.99936
Compute C's π values
π(c1) = P(c1|l1) π_C(l1) + P(c1|l2) π_C(l2) = (0.6)(0.00064) + (0.02)(0.99936) = 0.02037
π(c2) = P(c2|l1) π_C(l1) + P(c2|l2) π_C(l2) = (0.4)(0.00064) + (0.98)(0.99936) = 0.97963 41 / 102
  • 139. Send the call send_π_msg(L, C)
Compute P(c|∅)
P(c1|∅) = α λ(c1) π(c1) = α(1)(0.02037) = 0.02037α
P(c2|∅) = α λ(c2) π(c2) = α(1)(0.97963) = 0.97963α
Normalize
P(c1|∅) = 0.02037α / (0.02037α + 0.97963α) = 0.02037
P(c2|∅) = 0.97963α / (0.02037α + 0.97963α) = 0.97963 42 / 102
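As a sanity check on these numbers, the π (forward) pass can be reproduced with a few lines of standalone Python; the CPT values are read off the slides and the dict layout is an assumption of this snippet.

    pi_H = {"h1": 0.2, "h2": 0.8}
    P_B = {("b1", "h1"): 0.25, ("b1", "h2"): 0.05,
           ("b2", "h1"): 0.75, ("b2", "h2"): 0.95}
    P_L = {("l1", "h1"): 0.003, ("l1", "h2"): 0.00005,
           ("l2", "h1"): 0.997, ("l2", "h2"): 0.99995}
    P_C = {("c1", "l1"): 0.6, ("c1", "l2"): 0.02,
           ("c2", "l1"): 0.4, ("c2", "l2"): 0.98}

    pi_B = {b: sum(P_B[(b, h)] * pi_H[h] for h in pi_H) for b in ("b1", "b2")}
    pi_L = {l: sum(P_L[(l, h)] * pi_H[h] for h in pi_H) for l in ("l1", "l2")}
    pi_C = {c: sum(P_C[(c, l)] * pi_L[l] for l in pi_L) for c in ("c1", "c2")}
    print(pi_B)  # ≈ {'b1': 0.09, 'b2': 0.91}
    print(pi_L)  # ≈ {'l1': 0.00064, 'l2': 0.99936}
    print(pi_C)  # ≈ {'c1': 0.02037, 'c2': 0.97963}

Since every λ value is 1 before any evidence arrives, the π values coincide with the prior marginals, which is exactly what the normalization steps above return.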
  • 142. Final Graph
We have then the tree H → B, H → L → C with all λ and π values and the prior marginals computed (figure omitted in this extraction). 43 / 102
  • 143. For the Generalization
For the generalization, see pages 123–156 of Richard E. Neapolitan (2003), Learning Bayesian Networks, Prentice-Hall, Inc. 44 / 102
  • 144. History
Invented by Lauritzen and Spiegelhalter, 1988.
Something Notable: The general idea is that the propagation of evidence through the network can be carried out more efficiently by representing the joint probability distribution on an undirected graph called the Junction tree (or Join tree). 45 / 102
  • 146. More on the Intuition
High-level intuition: computing marginals is straightforward in a tree structure. 46 / 102
  • 147. Junction Tree Characteristics
The junction tree has the following characteristics
It is an undirected tree.
Its nodes are clusters of variables (i.e. from the original BN).
Given two clusters, C1 and C2, every node on the path between them contains their intersection C1 ∩ C2.
In addition, a Separator, S, is associated with each edge and contains the variables in the intersection between neighboring nodes.
Example (from the figure): the chain of clusters ABC - BCD - CDE, with separators BC and CD. 47 / 102
  • 153. Simplicial Node
Simplicial Node: In a graph G, a vertex v is called simplicial if and only if the subgraph of G induced by the vertex set {v} ∪ N(v) is a clique, where N(v) is the set of neighbors of v in the graph. 50 / 102
  • 154. Example
Vertex 3 is simplicial, while 4 is not (figure: a graph on vertices 1, 2, 3, 4, omitted in this extraction). 51 / 102
  • 155. Perfect Elimination Ordering Definition A graph G on n vertices is said to have a perfect elimination ordering if and only if there is an ordering {v1, ..., vn} of G’s vertices, such that each vi is simplicial in the subgraph induced by the vertices {v1, ..., vi}. 52 / 102
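Both definitions translate directly into code. A small Python sketch, assuming the graph is given as a dict mapping each vertex to its set of neighbors (function names are my own):

    from itertools import combinations

    def is_simplicial(adj, v):
        # v is simplicial iff its neighborhood N(v) induces a clique.
        return all(u in adj[w] for u, w in combinations(adj[v], 2))

    def is_perfect_elimination_ordering(adj, order):
        # Each v_i must be simplicial in the subgraph induced by {v_1,...,v_i},
        # so peel vertices off from the back of the ordering.
        adj = {v: set(ns) for v, ns in adj.items()}   # local copy
        for v in reversed(order):
            if not is_simplicial(adj, v):
                return False
            for u in adj[v]:
                adj[u].discard(v)
            del adj[v]
        return True

For example, the 4-cycle on {1, 2, 3, 4} admits no such ordering, while adding the chord {1, 3} makes (1, 3, 4, 2) a perfect elimination ordering:

    chorded = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
    print(is_perfect_elimination_ordering(chorded, [1, 3, 4, 2]))  # True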
  • 156. Chordal Graph
Definition: A Chordal Graph is one in which all cycles of four or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle.
Definition: For any two vertices x, y ∈ G such that (x, y) ∉ E, an x − y separator is a set S ⊂ V such that the graph G − S has at least two disjoint connected components, one of which contains x and another of which contains y. 53 / 102
  • 158. Chordal Graph
Theorem: For a graph G on n vertices, the following conditions are equivalent:
1 G has a perfect elimination ordering.
2 G is chordal.
3 If H is any induced subgraph of G and S is a vertex separator of H of minimal size, S's vertices induce a clique. 54 / 102
  • 162. Maximal Clique
Definition: A maximal clique is a clique that cannot be extended by including one more adjacent vertex, meaning it is not a subset of a larger clique.
We have the following claims:
1 A chordal graph with N vertices can have no more than N maximal cliques.
2 Given a chordal graph G = (V, E), where |V| = N, there exists an algorithm to find all the maximal cliques of G which takes no more than O(N⁴) time. 55 / 102
  • 165. Elimination Clique
Definition (Elimination Clique): Given a chordal graph G and an elimination ordering for G which does not add any edges. Suppose node i (assuming a labeling) is eliminated in some step of the elimination algorithm; then the clique consisting of the node i along with its neighbors during the elimination step (which must be fully connected, since elimination does not add edges) is called an elimination clique.
Formally: suppose node i is eliminated in the kth step of the algorithm, and let G^(k) be the graph just before the kth elimination step. Then the clique is
C_i = {i} ∪ N^(k)(i)
where N^(k)(i) is the set of neighbors of i in the graph G^(k). 56 / 102
  • 167. From This Theorem
Theorem: Given a chordal graph and an elimination ordering which does not add any edges, let C be the set of maximal cliques in the chordal graph, and let C_e = ∪_{i∈V} C_i be the set of elimination cliques obtained from this elimination ordering. Then C ⊆ C_e. In other words, every maximal clique is also an elimination clique for this particular ordering.
Something Notable: The theorem proves the second claim given earlier. Firstly, it shows that a chordal graph cannot have more than N maximal cliques, since we have only N elimination cliques.
It is more: it gives us an efficient algorithm for finding these N maximal cliques. Simply go over each elimination clique and check whether it is maximal, as sketched below. 57 / 102
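A sketch of this procedure in Python, assuming order is a perfect elimination ordering (so elimination adds no fill-in edges) and the graph is an adjacency dict as before:

    def maximal_cliques_of_chordal(adj, order):
        # Record the elimination clique {v} ∪ N(v) at each elimination step,
        # then keep only the cliques not strictly contained in another.
        adj = {v: set(ns) for v, ns in adj.items()}   # local copy
        cliques = []
        for v in order:
            cliques.append(frozenset({v} | adj[v]))   # elimination clique C_v
            for u in adj[v]:
                adj[u].discard(v)
            del adj[v]
        return [C for C in set(cliques) if not any(C < D for D in cliques)]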
  • 170. Therefore
Even with a brute-force approach, it will not take more than |C_e|² × D = O(N³) time, with D = max_{C∈C} |C|, since both the clique size and the number of elimination cliques are bounded by N.
Observation: The maximum clique problem, which is NP-hard on general graphs, is easy on chordal graphs. 58 / 102
  • 174. We have the following definitions
Definition: The following are equivalent to the statement “G is a tree”
1 G is a connected, acyclic graph over N nodes.
2 G is a connected graph over N nodes with N − 1 edges.
3 G is a minimal connected graph over N nodes.
4 (Important) G is a graph over N nodes, such that for any 2 nodes i and j in G, with i ≠ j, there is a unique path from i to j in G.
Theorem: For any graph G = (V, E), the following statements are equivalent:
1 G has a junction tree.
2 G is chordal. 60 / 102
  • 182. Definition
Junction Tree: Given a graph G = (V, E), a graph G′ = (V′, E′) is said to be a Junction Tree for G iff:
1 The nodes of G′ are the maximal cliques of G (i.e. G′ is a clique graph of G).
2 G′ is a tree.
3 Running Intersection Property / Junction Tree Property: for each v ∈ V, define G′_v to be the induced subgraph of G′ consisting of exactly those nodes which correspond to maximal cliques of G that contain v. Then G′_v must be a connected graph. 62 / 102
  • 184. Step 1
Given a DAG G = (V, E) and |V| = N
Chordalize the graph using the elimination algorithm with an arbitrary elimination ordering, if required. For this, you can use the following greedy algorithm. Given a list of nodes:
1 If the current vertex is not simplicial, make it simplicial by adding fill-in edges between its neighbors.
2 Remove it from the list and continue with the next vertex. 64 / 102
  • 188. Step 1
Another way (see the sketch after this slide):
1 Apply the Moralization Procedure.
2 Triangulate the moral graph.
Moralization Procedure:
1 Add edges between all pairs of nodes that have a common child.
2 Make all edges in the graph undirected.
Triangulate the moral graph: an undirected graph is triangulated if every cycle of length greater than 3 possesses a chord. 65 / 102
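A minimal Python sketch of the moralization step, assuming the DAG is given as a dict mapping each node to the set of its parents (every node must appear as a key); triangulation can then be carried out with the elimination procedure sketched earlier:

    from itertools import combinations

    def moralize(parents):
        # parents: {node: set of its parents in the DAG}.
        adj = {v: set() for v in parents}
        for v, ps in parents.items():
            for p in ps:                        # keep original edges, undirected
                adj[v].add(p); adj[p].add(v)
            for p, q in combinations(ps, 2):    # marry parents of a common child
                adj[p].add(q); adj[q].add(p)
        return adj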
  • 191. Step 2
Find the maximal cliques in the chordal graph. List the N elimination cliques
({v_N} ∪ N(v_N)) ∩ {v_1, ..., v_N}
({v_{N−1}} ∪ N(v_{N−1})) ∩ {v_1, ..., v_{N−1}}
· · ·
{v_1}
Note: every maximal clique appears in this list; an elimination clique contained in another is not maximal and is discarded. 66 / 102
  • 192. Step 3
Compute the separator sets for each pair of maximal cliques and construct a weighted clique graph.
For each pair of maximal cliques (C_i, C_j) in the graph, we check whether they possess any common variables. If yes, we designate a separator set between these two cliques as S_ij = C_i ∩ C_j.
Then, we build a clique graph: the nodes are the cliques, and an edge (C_i, C_j) is added with weight |C_i ∩ C_j| if |C_i ∩ C_j| > 0 (see the sketch below). 67 / 102
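A short Python sketch of this construction, representing cliques as frozensets and returning one weighted edge (with its separator) per intersecting pair; the tuple layout is an assumption carried into the Kruskal sketch later:

    from itertools import combinations

    def clique_graph(cliques):
        # cliques: list of frozensets of variable names.
        edges = []
        for i, j in combinations(range(len(cliques)), 2):
            S = cliques[i] & cliques[j]        # separator S_ij = C_i ∩ C_j
            if S:
                edges.append((len(S), i, j, S))
        return edges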
  • 198. Step 3
This step can be implemented quickly in practice using a hash table.
Running Time: O(|C|² D) = O(N² D) 68 / 102
  • 199. Step 4
Compute a maximum-weight spanning tree on the weighted clique graph to obtain a junction tree.
You can use Kruskal's or Prim's algorithm, adapted to maximum weight. We will give Kruskal's algorithm for finding the maximum-weight spanning tree. 69 / 102
  • 201. Step 4
Maximum-weight Kruskal's algorithm: initialize an edgeless graph T whose nodes are all the maximal cliques in our chordal graph.
Then we will add edges to T until it becomes a junction tree.
Sort the m edges e_i in our clique graph from step 3 by weight w_i, so that e_1, e_2, ..., e_m with w_1 ≥ w_2 ≥ · · · ≥ w_m. 70 / 102
  • 204. Step 4
For i = 1, 2, ..., m (sketched below):
1 Add edge e_i to T if it does not introduce a cycle.
2 If |C| − 1 edges have been added, quit.
Running Time, given that |E| = O(|C|²):
O(|C|² log |C|²) = O(|C|² log |C|) = O(N² log N) 71 / 102
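A Python sketch of maximum-weight Kruskal with a union-find structure, consuming the (weight, i, j, separator) edges produced by the clique_graph sketch above:

    def max_weight_spanning_tree(n_cliques, edges):
        parent = list(range(n_cliques))          # union-find over clique indices
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]    # path halving
                x = parent[x]
            return x
        tree = []
        for w, i, j, S in sorted(edges, reverse=True):   # heaviest edges first
            ri, rj = find(i), find(j)
            if ri != rj:                         # no cycle introduced
                parent[ri] = rj
                tree.append((i, j, S))
                if len(tree) == n_cliques - 1:
                    break
        return tree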
  • 208. How do you build a Junction Tree?
Given a general DAG over the variables S, B, L, F, E, T, X, A (figure omitted).
Build a chordal graph: Moral Graph – marry common parents and remove arrows (moralized graph over the same nodes, figure omitted). 74 / 102
  • 211. How do you build a Junction Tree?
Triangulate the moral graph: an undirected graph is triangulated if every cycle of length greater than 3 possesses a chord (triangulated graph over S, B, L, F, E, T, X, A, figure omitted). 76 / 102
  • 213. Listing of Cliques
Identify the cliques. A clique is a subset of nodes which is complete (i.e. there is an edge between every pair of nodes) and maximal.
Here the maximal cliques are {B,S,L}, {B,L,E}, {B,E,F}, {L,E,T}, {A,T}, {E,X}. 78 / 102
  • 214. Build the Clique Graph
Clique Graph: add an edge between C_i and C_j with weight |C_i ∩ C_j| > 0.
The nodes are BSL, BLE, BEF, LET, AT, EX; the figure shows weight-2 edges (e.g. BSL–BLE, BLE–BEF, BLE–LET) and weight-1 edges for the remaining intersecting pairs. 79 / 102
  • 215. Getting The Junction Tree
Run the maximum-weight Kruskal's algorithm on this weighted clique graph (figure omitted). 80 / 102
  • 216. Getting The Junction Tree
Finally, we obtain the junction tree over the cliques BSL, BLE, BEF, LET, AT, EX (figure omitted). 81 / 102
  • 218. Potential Representation for the Junction Tree
Something Notable: The joint probability distribution can now be represented in terms of potential functions, φ, defined on each clique and each separator. Thus
P(x) = \frac{\prod_{i=1}^{n} \phi_{C_i}(x_{c_i})}{\prod_{j=1}^{m} \phi_{S_j}(x_{s_j})}
where x = (x_{c_1}, ..., x_{c_n}) and each variable x_{c_i} corresponds to a clique and each x_{s_j} to a separator.
The basic idea is to represent the probability distribution corresponding to any graph as a product of clique potentials:
P(x) = \frac{1}{Z} \prod_{i=1}^{n} \phi_{C_i}(x_{c_i}) 83 / 102
  • 221. Then
Main idea: The idea is to transform one representation of the joint distribution into another in which, for each clique c, the potential function gives the marginal distribution of the variables in c, i.e.
φ_C(x_c) = P(x_c)
This will also apply to each separator, s. 84 / 102
  • 222. Now, Initialization
To initialize the potential functions (see the sketch below):
1 Set all potentials to unity.
2 For each variable, x_i, select one node in the junction tree (i.e. one clique) containing both that variable and its parents, pa(x_i), in the original DAG.
3 Multiply the potential by P(x_i|pa(x_i)). 85 / 102
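A Python sketch of this initialization over dict-based potentials; the clique/domain/CPT layout is an assumption of this sketch, and moralization guarantees that each family {x_i} ∪ pa(x_i) fits inside some clique:

    from itertools import product

    def init_potentials(cliques, domains, cpts):
        # cliques: list of frozensets of variable names.
        # domains: {variable: list of its values}.
        # cpts: list of (scope, table), scope = (x_i,) + pa(x_i), with
        # table[(value of x_i, values of parents...)] = P(x_i | pa(x_i)).
        pot = {}
        for C in cliques:                          # step 1: unity potentials
            vs = tuple(sorted(C))
            pot[C] = {vals: 1.0 for vals in product(*(domains[v] for v in vs))}
        for scope, table in cpts:                  # steps 2-3: one home clique per CPT
            C = next(C for C in cliques if set(scope) <= C)
            vs = tuple(sorted(C))
            for vals in pot[C]:
                row = dict(zip(vs, vals))
                pot[C][vals] *= table[tuple(row[v] for v in scope)]
        return pot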
  • 225. Then
For example, at the beginning we have φ_BSL = φ_BLF = φ_LX = 1 for the junction tree BSL - BLF - LX over the graph on S, B, L, F, X; after initialization each conditional probability table has been multiplied into one of these potentials (figure omitted). 86 / 102
  • 227. Propagating Information in a Junction Tree
Passing information from one clique C_1 to another C_2 via the separator in between them, S_0, requires two steps.
First Step: obtain a new potential for S_0 by marginalizing out the variables in C_1 that are not in S_0:
φ*_{S_0} = Σ_{C_1 − S_0} φ_{C_1} 88 / 102
  • 229. Propagating Information in a Junction Tree
Second Step: obtain a new potential for C_2:
φ*_{C_2} = φ_{C_2} λ_{S_0}, where λ_{S_0} = φ*_{S_0} / φ_{S_0} 89 / 102
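Both steps fit in a few lines of Python over dict-based potentials, assuming each clique's variables are kept in sorted order so separator keys line up across cliques, and with 0/0 treated as 0 (the usual convention when evidence has zeroed a separator entry):

    def marginalize(pot_C, vars_C, sep):
        # φ*_S: sum the clique potential over the variables of C not in S.
        idx = [i for i, v in enumerate(vars_C) if v in sep]
        out = {}
        for vals, p in pot_C.items():
            key = tuple(vals[i] for i in idx)
            out[key] = out.get(key, 0.0) + p
        return out

    def pass_flow(pot_C1, vars_C1, pot_S, sep, pot_C2, vars_C2):
        # One flow C1 -> S -> C2: new φ*_S by marginalization, then rescale
        # C2 by λ_S = φ*_S / φ_S.
        new_S = marginalize(pot_C1, vars_C1, sep)
        idx = [i for i, v in enumerate(vars_C2) if v in sep]
        new_C2 = {}
        for vals, p in pot_C2.items():
            key = tuple(vals[i] for i in idx)
            new_C2[vals] = p * (new_S[key] / pot_S[key] if pot_S[key] else 0.0)
        return new_S, new_C2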
  • 231. An Example
Consider a flow from the clique {B,S,L} to {B,L,F} in the junction tree BSL - BLF - LX, with separators BL and L. 90 / 102
  • 233. An Example
Initial representation:
φ_BSL = P(B|S) P(L|S) P(S)
            l1        l2
s1, b1   0.00015   0.04985
s1, b2   0.00045   0.14955
s2, b1   0.000002  0.039998
s2, b2   0.000038  0.759962

φ_BL = 1
      l1   l2
b1    1    1
b2    1    1

φ_BLF = P(F|B, L)
            l1    l2
f1, b1    0.75   0.1
f1, b2    0.5    0.05
f2, b1    0.25   0.9
f2, b2    0.5    0.95 92 / 102
• 234. An Example — After the flow:

φ_BSL = P(B|S) P(L|S) P(S) (unchanged):
             l1         l2
  s1,b1   0.00015    0.04985
  s1,b2   0.00045    0.14955
  s2,b1   0.000002   0.039998
  s2,b2   0.000038   0.759962

φ*_BL (φ_BSL with S summed out):
            l1         l2
  b1     0.000152   0.089848
  b2     0.000488   0.909512

φ*_BLF = φ_BLF · φ*_BL / φ_BL:
             l1          l2
  f1,b1   0.000114    0.0089848
  f1,b2   0.000244    0.0454756
  f2,b1   0.000038    0.0808632
  f2,b2   0.000244    0.8640364
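As a sanity check, the separator update can be reproduced with the `marginalize` sketch above (the clique table is transcribed from the slide):

```python
phi_BSL = {
    ("s1", "b1", "l1"): 0.00015,  ("s1", "b1", "l2"): 0.04985,
    ("s1", "b2", "l1"): 0.00045,  ("s1", "b2", "l2"): 0.14955,
    ("s2", "b1", "l1"): 0.000002, ("s2", "b1", "l2"): 0.039998,
    ("s2", "b2", "l1"): 0.000038, ("s2", "b2", "l2"): 0.759962,
}
phi_BL_star = marginalize(phi_BSL, ("S", "B", "L"), ("B", "L"))
# phi_BL_star[("b1", "l1")] ~ 0.000152, phi_BL_star[("b2", "l2")] ~ 0.909512
```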
• 235. Now Introduce Evidence — We have a flow from the clique {B, S, L} to {B, L, F}, but this time we have the information that Joe is a smoker, S = s1. Incorporating the evidence zeroes every entry of φ_BSL that disagrees with S = s1:

φ_BSL = P(B|S) P(L|S) P(S), with S = s1 entered:
             l1        l2
  s1,b1   0.00015   0.04985
  s1,b2   0.00045   0.14955
  s2,b1   0         0
  s2,b2   0         0

φ_BL = 1:
          l1   l2
  b1       1    1
  b2       1    1

φ_BLF = P(F|B,L):
             l1     l2
  f1,b1   0.75   0.1
  f1,b2   0.5    0.05
  f2,b1   0.25   0.9
  f2,b2   0.5    0.95
• 237. An Example — After the flow with the evidence entered:

φ_BSL (unchanged, S = s1 entered):
             l1        l2
  s1,b1   0.00015   0.04985
  s1,b2   0.00045   0.14955
  s2,b1   0         0
  s2,b2   0         0

φ*_BL (φ_BSL with S summed out):
            l1        l2
  b1     0.00015   0.04985
  b2     0.00045   0.14955

φ*_BLF = φ_BLF · φ*_BL / φ_BL:
              l1          l2
  f1,b1   0.0001125   0.004985
  f1,b2   0.000225    0.0074775
  f2,b1   0.0000375   0.044865
  f2,b2   0.000225    0.1420725
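The evidence step itself is just a masking of the clique table; a sketch in the same representation (function name is mine):

```python
def enter_evidence(potential, variables, var, observed):
    """Zero every entry that disagrees with the observation var = observed."""
    i = variables.index(var)
    for assign in potential:
        if assign[i] != observed:
            potential[assign] = 0.0

enter_evidence(phi_BSL, ("S", "B", "L"), "S", "s1")
phi_BL_star = marginalize(phi_BSL, ("S", "B", "L"), ("B", "L"))
# phi_BL_star[("b1", "l1")] == 0.00015, matching the table above
```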
• 238. Outline: Now, the Full Propagation
• 239. The Full Propagation — Two-phase propagation (Jensen et al., 1990; see the sketch below):
1. Select an arbitrary root clique, C0.
2. Collection phase: flows are passed from the periphery towards C0.
3. Distribution phase: flows are passed from C0 back out to the periphery.
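The schedule is two traversals of the clique tree; a recursive sketch, assuming a `pass_flow(src, dst)` helper that performs the two-step separator update described earlier:

```python
def collect(tree, clique, parent=None):
    """Collection phase: flows move from the periphery toward the root C0."""
    for neighbour in tree[clique]:
        if neighbour != parent:
            collect(tree, neighbour, clique)
            pass_flow(neighbour, clique)   # flow toward the root

def distribute(tree, clique, parent=None):
    """Distribution phase: flows move from the root C0 to the periphery."""
    for neighbour in tree[clique]:
        if neighbour != parent:
            pass_flow(clique, neighbour)   # flow away from the root
            distribute(tree, neighbour, clique)

# tree: dict mapping each clique to the list of its neighbours, e.g.
# tree = {"BSL": ["BLF"], "BLF": ["BSL", "LX"], "LX": ["BLF"]}
# collect(tree, "BSL"); distribute(tree, "BSL")  -> tree in equilibrium
```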
• 242. Outline: Example of Propagation
• 245. The Full Propagation — After the two propagation phases have been carried out: The junction tree will be in equilibrium, with each clique containing the joint probability distribution for the variables it contains. Marginal probabilities for individual variables can then be obtained from the cliques. Evidence E can be included before propagation by selecting a clique for each variable for which evidence is available; the potential for that clique is then set to 0 for any configuration which differs from the evidence.
• 249. The Full Propagation — After propagation the result will be:
P(x, E) = Π_{c∈C} φ_c(x_c, E) / Π_{s∈S} φ_s(x_s, E)
After normalization:
P(x | E) = Π_{c∈C} φ_c(x_c | E) / Π_{s∈S} φ_s(x_s | E)
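A practical note on the normalization (standard for junction trees, though not spelled out on the slide): after a full propagation with evidence entered, every clique table sums to the same constant P(E), so conditioning is a single division of any clique of interest:

```python
p_E = sum(phi_BSL.values())                            # normalizing constant P(E)
posterior = {k: v / p_E for k, v in phi_BSL.items()}   # P(s, b, l | E)
```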