Artificial Intelligence
Belief Propagation and Junction Trees
Andres Mendez-Vazquez
March 28, 2016
Outline
1 Introduction
What do we want?
2 Belief Propagation
The Intuition
Inference on Trees
The Messages
The Implementation
3 Junction Trees
How do you build a Junction Tree?
Chordal Graph
Tree Graphs
Junction Tree Formal Definition
Algorithm For Building Junction Trees
Example
Moralize the DAG
Triangulate
Listing of Cliques
Potential Function
Propagating Information in a Junction Tree
Example
Now, the Full Propagation
Example of Propagation
Introduction
We will be looking at the following algorithms
Pearl's Belief Propagation Algorithm
Junction Tree Algorithm
Belief Propagation Algorithm
The algorithm was first proposed by Judea Pearl in 1982, who formulated it for trees; it was later extended to polytrees.
(Figure: an example network on nodes A through I.)
Introduction
Something Notable
It has since been shown to be a useful approximate algorithm on general graphs.
Junction Tree Algorithm
The junction tree algorithm (also known as the "clique tree" algorithm) is a method used in machine learning to perform marginalization in general graphs.
It entails performing belief propagation on a modified graph, called a junction tree, obtained by eliminating cycles.
Example
The Message Passing Stuff
(Figure: messages being passed up and down a tree toward a node.)
Thus
We can do the following
Pass information from below and from above to a given node V.
Thus
We call those messages
π from above.
λ from below.
Inference on Trees
Recall
A rooted tree is a DAG.
Now
Let (G, P) be a Bayesian network whose DAG is a tree.
Let a be a set of values of a subset A ⊂ V.
For simplicity
Imagine that each node has two children.
The general case can be inferred from it.
Then
Let D_X be the subset of A
Containing all members that are in the subtree rooted at X
Including X if X ∈ A
Let N_X be the subset
Containing all members of A that are nondescendants of X.
This set includes X if X ∈ A
Example
We have that A = N_X ∪ D_X.
(Figure: node X in the tree; D_X lies in the subtree rooted at X and N_X covers the nondescendants.)
Thus
We have, for each value of x

$$
\begin{aligned}
P(x|a) = P(x|d_X, n_X) &= \frac{P(d_X, n_X|x)\,P(x)}{P(d_X, n_X)}\\
&= \frac{P(d_X|x, n_X)\,P(n_X|x)\,P(x)}{P(d_X, n_X)}\\
&= \frac{P(d_X|x, n_X)\,P(n_X, x)\,P(x)}{P(x)\,P(d_X, n_X)}\\
&= \frac{P(d_X|x)\,P(x|n_X)\,P(n_X)}{P(d_X, n_X)} \quad \text{(by d-separation, if } X \notin A\text{)}\\
&= \frac{P(d_X|x)\,P(x|n_X)\,P(n_X)}{P(d_X|n_X)\,P(n_X)}
\end{aligned}
$$

Note: the case X ∈ A must be proved separately.
Thus
We have, for each value of x

$$
P(x|a) = \frac{P(d_X|x)\,P(x|n_X)}{P(d_X|n_X)} = \beta\,P(d_X|x)\,P(x|n_X)
$$

where β, the normalizing factor, is a constant not depending on x.
Now, we develop the messages
We want

λ(x) ∝ P(d_X|x)
π(x) ∝ P(x|n_X)

where ∝ means "proportional to".
Meaning
π(x) may not be equal to P(x|n_X), but π(x) = k × P(x|n_X) for some constant k.
Once we have that
P(x|a) = α λ(x) π(x)
where α, the normalizing factor, is a constant not depending on x.
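The following is a minimal sketch (not from the slides) of how this normalization works in practice: the constant α never needs to be computed explicitly, since dividing by the sum of the unnormalized products fixes it. The numbers are placeholders.

```python
import numpy as np

# Minimal sketch: recover P(x|a) = alpha * lambda(x) * pi(x) by normalizing.
lam = np.array([1.0, 1.0])   # lambda(x), proportional to P(d_X | x)
pi = np.array([0.2, 0.8])    # pi(x), proportional to P(x | n_X)

unnormalized = lam * pi                        # lambda(x) * pi(x)
posterior = unnormalized / unnormalized.sum()  # normalization fixes alpha
print(posterior)                               # -> [0.2 0.8]
```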
Developing λ(x)
We need
λ(x) ∝ P(d_X|x)
Case 1: X ∈ A and X ∈ D_X
Given X = x̂, we have P(d_X|x) = 0 for x ≠ x̂.
Thus, to achieve proportionality, we can set
λ(x̂) ≡ 1
λ(x) ≡ 0 for x ≠ x̂
Now
Case 2: X ∉ A and X is a leaf
Then d_X = ∅ and
P(d_X|x) = P(∅|x) = 1 for all values of x
Thus, to achieve proportionality, we can set
λ(x) ≡ 1 for all values of x
Finally
Case 3: X ∉ A and X is a non-leaf
Let Y be X's left child and W be X's right child.
Since X ∉ A,
D_X = D_Y ∪ D_W
(Figure: node X with children Y and W.)
Thus
We have then

$$
\begin{aligned}
P(d_X|x) &= P(d_Y, d_W|x)\\
&= P(d_Y|x)\,P(d_W|x) \quad \text{(by d-separation at } X\text{)}\\
&= \sum_y P(d_Y, y|x)\,\sum_w P(d_W, w|x)\\
&= \sum_y P(y|x)\,P(d_Y|y)\,\sum_w P(w|x)\,P(d_W|w)\\
&\propto \sum_y P(y|x)\,\lambda(y)\,\sum_w P(w|x)\,\lambda(w)
\end{aligned}
$$

Thus, we can get proportionality by defining, for all values of x,

λ_Y(x) = Σ_y P(y|x) λ(y)
λ_W(x) = Σ_w P(w|x) λ(w)
Thus
We have then
λ(x) = λ_Y(x) λ_W(x) for all values of x
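As a quick illustration, here is a sketch of these λ computations for binary variables (the CPTs and λ values below are hypothetical, not from the slides): each child's λ message is a sum over that child's values, and X's λ values are the product of the incoming messages.

```python
import numpy as np

# Sketch of the lambda computations, with hypothetical binary CPTs.
# P_y_given_x[y, x] = P(y|x) for child Y; similarly for child W.
P_y_given_x = np.array([[0.7, 0.1],
                        [0.3, 0.9]])
P_w_given_x = np.array([[0.4, 0.5],
                        [0.6, 0.5]])
lam_y = np.array([1.0, 0.0])   # lambda values at child Y (Y instantiated to y1)
lam_w = np.array([1.0, 1.0])   # lambda values at child W (no evidence below W)

lam_Y = P_y_given_x.T @ lam_y  # lambda_Y(x) = sum_y P(y|x) * lambda(y)
lam_W = P_w_given_x.T @ lam_w  # lambda_W(x) = sum_w P(w|x) * lambda(w)
lam_x = lam_Y * lam_W          # lambda(x) = lambda_Y(x) * lambda_W(x)
print(lam_x)                   # -> [0.7 0.1]
```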
Developing π(x)
We need
π(x) ∝ P(x|n_X)
Case 1: X ∈ A and X ∈ N_X
Given X = x̂, we have:
P(x̂|n_X) = P(x̂|x̂) = 1
P(x|n_X) = P(x|x̂) = 0 for x ≠ x̂
Thus, to achieve proportionality, we can set
π(x̂) ≡ 1
π(x) ≡ 0 for x ≠ x̂
Now
Case 2: X ∉ A and X is the root
In this case, n_X = ∅, the empty set of random variables.
Then
P(x|n_X) = P(x|∅) = P(x) for all values of x
Enforcing the proportionality, we get
π(x) ≡ P(x) for all values of x
Then
Case 3: X ∉ A and X is not the root
Without loss of generality, assume X is Z's right child and T is Z's left child.
Then N_X = N_Z ∪ D_T
(Figure: node Z with children T and X.)
Then
We have

$$
\begin{aligned}
P(x|n_X) &= \sum_z P(x|z)\,P(z|n_X)\\
&= \sum_z P(x|z)\,P(z|n_Z, d_T)\\
&= \sum_z P(x|z)\,\frac{P(z, n_Z, d_T)}{P(n_Z, d_T)}\\
&= \sum_z P(x|z)\,\frac{P(d_T, z|n_Z)\,P(n_Z)}{P(n_Z, d_T)}\\
&= \sum_z P(x|z)\,\frac{P(d_T|z, n_Z)\,P(z|n_Z)\,P(n_Z)}{P(n_Z, d_T)}\\
&= \sum_z P(x|z)\,\frac{P(d_T|z)\,P(z|n_Z)\,P(n_Z)}{P(n_Z, d_T)} \quad \text{(again, by d-separation for } z\text{)}
\end{aligned}
$$
Last Step
We have

$$
P(x|n_X) = \sum_z P(x|z)\,\frac{P(z|n_Z)\,P(n_Z)\,P(d_T|z)}{P(n_Z, d_T)} = \gamma \sum_z P(x|z)\,\pi(z)\,\lambda_T(z)
$$

where γ = P(n_Z)/P(n_Z, d_T).
Thus, we can achieve proportionality by
π_X(z) ≡ π(z) λ_T(z)
Then, setting
π(x) ≡ Σ_z P(x|z) π_X(z) for all values of x
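A matching sketch for the π side (hypothetical numbers, binary variables): Z is X's parent and T its sibling; the π message weights the parent's π values by the sibling's λ message, and X's π values follow by summing over z.

```python
import numpy as np

# Sketch of the pi computations, with a hypothetical binary CPT.
P_x_given_z = np.array([[0.8, 0.3],
                        [0.2, 0.7]])  # P_x_given_z[x, z] = P(x|z)
pi_z = np.array([0.2, 0.8])           # pi values at the parent Z
lam_T = np.array([0.5, 1.0])          # lambda message Z received from sibling T

pi_X = pi_z * lam_T                   # pi_X(z) = pi(z) * lambda_T(z)
pi_x = P_x_given_z @ pi_X             # pi(x) = sum_z P(x|z) * pi_X(z)
print(pi_x)                           # -> [0.32 0.58]
```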
How do we implement this?
We require the following functions
initial_tree
update_tree
initial_tree has the following input and outputs
Input: ((G, P), A, a, P(x|a))
Output: After this call, A and a are both empty, making P(x|a) the prior probability of x.
Then, each time a variable V is instantiated to v̂, the routine update_tree is called
Input: ((G, P), A, a, V, v̂, P(x|a))
Output: After this call, V has been added to A, v̂ has been added to a, and, for every value of x, P(x|a) has been updated to be the conditional probability of x given the new a.
Algorithm: Inference-in-trees
Problem
Given a Bayesian network whose DAG is a tree, determine the probabilities of the values of each node conditional on specified values of the nodes in some subset.
Input
A Bayesian network (G, P) whose DAG is a tree, where G = (V, E), and a set of values a of a subset A ⊆ V.
Output
The Bayesian network (G, P) updated according to the values in a. The λ and π values and messages and P(x|a) for each X ∈ V are considered part of the network.
Initializing the tree
void initial_tree
input: (Bayesian-network& (G, P) where G = (V, E), set-of-variables& A, set-of-variable-values& a)
1 A = ∅
2 a = ∅
3 for (each X ∈ V)
4   for (each value x of X)
5     λ(x) = 1 // Compute λ values.
6   for (the parent Z of X) // Does nothing if X is the root.
7     for (each value z of Z)
8       λ_X(z) = 1 // Compute λ messages.
9 for (each value r of the root R)
10   P(r|a) = P(r) // Compute P(r|a).
11   π(r) = P(r) // Compute R's π values.
12 for (each child X of R)
13   send_π_msg(R, X)
Updating the tree
void update_tree
Input: (Bayesian-network& (G, P) where G = (V, E), set-of-variables& A, set-of-variable-values& a, variable V, variable-value v̂)
1 A = A ∪ {V}; a = a ∪ {v̂}; λ(v̂) = 1; π(v̂) = 1; P(v̂|a) = 1 // Add V to A and instantiate V to v̂.
2 for (each value v ≠ v̂)
3   λ(v) = 0; π(v) = 0; P(v|a) = 0
4 if (V is not the root && V's parent Z ∉ A)
5   send_λ_msg(V, Z)
6 for (each child X of V such that X ∉ A)
7   send_π_msg(V, X)
Sending the λ message
void send_λ_msg(node Y, node X)
Note: For simplicity, (G, P) is not shown as input.
1 for (each value of x)
2   λ_Y(x) = Σ_y P(y|x) λ(y) // Y sends X a λ message.
3   λ(x) = ∏_{U ∈ CH_X} λ_U(x) // Compute X's λ values; CH_X is the set of X's children.
4   P(x|a) = α λ(x) π(x) // Compute P(x|a).
5 normalize P(x|a)
6 if (X is not the root and X's parent Z ∉ A)
7   send_λ_msg(X, Z)
8 for (each child W of X such that W ≠ Y and W ∉ A)
9   send_π_msg(X, W)
Sending the π message
void send_π_msg(node Z, node X)
Note: For simplicity, (G, P) is not shown as input.
1 for (each value of z)
2   π_X(z) = π(z) ∏_{Y ∈ CH_Z − {X}} λ_Y(z) // Z sends X a π message.
3 for (each value of x)
4   π(x) = Σ_z P(x|z) π_X(z) // Compute X's π values.
5   P(x|a) = α λ(x) π(x) // Compute P(x|a).
6 normalize P(x|a)
7 for (each child Y of X such that Y ∉ A)
8   send_π_msg(X, Y)
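Putting the four routines together, here is a compact Python sketch of the whole algorithm for discrete tree networks. The interface (dicts for the tree structure, prior for the root, cpt[X][x][z] = P(X = x | parent = z)) is an assumption for illustration, not the book's exact representation.

```python
class TreeBP:
    """Sketch of Pearl's belief propagation on a tree (discrete variables)."""

    def __init__(self, parent, children, prior, cpt):
        self.parent, self.children, self.prior, self.cpt = parent, children, prior, cpt
        self.root = next(X for X in parent if parent[X] is None)
        self.card = {X: len(prior) if X == self.root else len(cpt[X])
                     for X in parent}

    def initial_tree(self):
        self.A = set()  # instantiated variables
        self.lam = {X: [1.0] * self.card[X] for X in self.parent}
        self.lam_msg = {X: [1.0] * self.card[self.parent[X]]
                        for X in self.parent if X != self.root}
        self.pi = {X: [1.0] * self.card[X] for X in self.parent}
        self.pi[self.root] = list(self.prior)
        self.P = {self.root: list(self.prior)}
        for X in self.children[self.root]:
            self.send_pi_msg(self.root, X)

    def update_tree(self, V, v_hat):
        self.A.add(V)
        for v in range(self.card[V]):
            ind = 1.0 if v == v_hat else 0.0
            self.lam[V][v] = self.pi[V][v] = ind
        self.P[V] = [1.0 if v == v_hat else 0.0 for v in range(self.card[V])]
        if V != self.root and self.parent[V] not in self.A:
            self.send_lam_msg(V, self.parent[V])
        for X in self.children[V]:
            if X not in self.A:
                self.send_pi_msg(V, X)

    def send_lam_msg(self, Y, X):  # Y sends its parent X a lambda message
        for x in range(self.card[X]):  # lambda_Y(x) = sum_y P(y|x) lambda(y)
            self.lam_msg[Y][x] = sum(self.cpt[Y][y][x] * self.lam[Y][y]
                                     for y in range(self.card[Y]))
        for x in range(self.card[X]):  # lambda(x) = product over X's children
            self.lam[X][x] = 1.0
            for U in self.children[X]:
                self.lam[X][x] *= self.lam_msg[U][x]
        self._posterior(X)
        if X != self.root and self.parent[X] not in self.A:
            self.send_lam_msg(X, self.parent[X])
        for W in self.children[X]:
            if W != Y and W not in self.A:
                self.send_pi_msg(X, W)

    def send_pi_msg(self, Z, X):  # Z sends its child X a pi message
        pi_msg = [self.pi[Z][z] for z in range(self.card[Z])]
        for Y in self.children[Z]:  # pi_X(z) = pi(z) prod_{Y != X} lambda_Y(z)
            if Y != X:
                for z in range(self.card[Z]):
                    pi_msg[z] *= self.lam_msg[Y][z]
        for x in range(self.card[X]):  # pi(x) = sum_z P(x|z) pi_X(z)
            self.pi[X][x] = sum(self.cpt[X][x][z] * pi_msg[z]
                                for z in range(self.card[Z]))
        self._posterior(X)
        for Y in self.children[X]:
            if Y not in self.A:
                self.send_pi_msg(X, Y)

    def _posterior(self, X):  # P(x|a) = alpha * lambda(x) * pi(x), normalized
        un = [self.lam[X][x] * self.pi[X][x] for x in range(self.card[X])]
        s = sum(un)
        self.P[X] = [u / s for u in un]
```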
Example of Tree Initialization
We have then
(Figure: a tree network with root H, children B and L of H, and child C of L. The CPTs used below are P(h1) = 0.2, P(b1|h1) = 0.25, P(b1|h2) = 0.05, P(l1|h1) = 0.003, P(l1|h2) = 0.00005, P(c1|l1) = 0.6, and P(c1|l2) = 0.02.)
Calling initial_tree((G, P), A, a)
We have then
A = ∅, a = ∅
Compute λ values
λ(h1) = 1; λ(h2) = 1;
λ(b1) = 1; λ(b2) = 1;
λ(l1) = 1; λ(l2) = 1;
λ(c1) = 1; λ(c2) = 1;
Compute λ messages
λ_B(h1) = 1; λ_B(h2) = 1;
λ_L(h1) = 1; λ_L(h2) = 1;
λ_C(l1) = 1; λ_C(l2) = 1;
Calling initial_tree((G, P), A, a)
Compute P(h|∅)
P(h1|∅) = P(h1) = 0.2
P(h2|∅) = P(h2) = 0.8
Compute H's π values
π(h1) = P(h1) = 0.2
π(h2) = P(h2) = 0.8
Send messages
send_π_msg(H, B)
send_π_msg(H, L)
The call send_π_msg(H, B)
H sends B a π message
π_B(h1) = π(h1) λ_L(h1) = 0.2 × 1 = 0.2
π_B(h2) = π(h2) λ_L(h2) = 0.8 × 1 = 0.8
Compute B's π values
π(b1) = P(b1|h1) π_B(h1) + P(b1|h2) π_B(h2) = (0.25)(0.2) + (0.05)(0.8) = 0.09
π(b2) = P(b2|h1) π_B(h1) + P(b2|h2) π_B(h2) = (0.75)(0.2) + (0.95)(0.8) = 0.91
The call send_π_msg(H, B)
Compute P(b|∅)
P(b1|∅) = α λ(b1) π(b1) = α(1)(0.09) = 0.09α
P(b2|∅) = α λ(b2) π(b2) = α(1)(0.91) = 0.91α
Then, normalize
P(b1|∅) = 0.09α / (0.09α + 0.91α) = 0.09
P(b2|∅) = 0.91α / (0.09α + 0.91α) = 0.91
The call send_π_msg(H, L)
H sends L a π message
π_L(h1) = π(h1) λ_B(h1) = (0.2)(1) = 0.2
π_L(h2) = π(h2) λ_B(h2) = (0.8)(1) = 0.8
Compute L's π values
π(l1) = P(l1|h1) π_L(h1) + P(l1|h2) π_L(h2) = (0.003)(0.2) + (0.00005)(0.8) = 0.00064
π(l2) = P(l2|h1) π_L(h1) + P(l2|h2) π_L(h2) = (0.997)(0.2) + (0.99995)(0.8) = 0.99936
Compute P(l|∅)
P(l1|∅) = α λ(l1) π(l1) = α(1)(0.00064) = 0.00064α
P(l2|∅) = α λ(l2) π(l2) = α(1)(0.99936) = 0.99936α
The call send_π_msg(H, L)
Then, normalize
P(l1|∅) = 0.00064α / (0.00064α + 0.99936α) = 0.00064
P(l2|∅) = 0.99936α / (0.00064α + 0.99936α) = 0.99936
The call send_π_msg(L, C)
L sends C a π message
π_C(l1) = π(l1) = 0.00064
π_C(l2) = π(l2) = 0.99936
Compute C's π values
π(c1) = P(c1|l1) π_C(l1) + P(c1|l2) π_C(l2) = (0.6)(0.00064) + (0.02)(0.99936) = 0.02037
π(c2) = P(c2|l1) π_C(l1) + P(c2|l2) π_C(l2) = (0.4)(0.00064) + (0.98)(0.99936) = 0.97963
The call send_π_msg(L, C)
Compute P(c|∅)
P(c1|∅) = α λ(c1) π(c1) = α(1)(0.02037) = 0.02037α
P(c2|∅) = α λ(c2) π(c2) = α(1)(0.97963) = 0.97963α
Normalize
P(c1|∅) = 0.02037α / (0.02037α + 0.97963α) = 0.02037
P(c2|∅) = 0.97963α / (0.02037α + 0.97963α) = 0.97963
Final Graph
We have then
(Figure: the tree with root H, children B and L, and C below L, annotated with the computed probabilities.)
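For reference, running the TreeBP sketch given earlier on this example network reproduces the numbers computed in the previous slides (the CPT values below are the ones used in those calculations).

```python
# The slides' example: H is the root, B and L are H's children, C is L's child.
parent = {'H': None, 'B': 'H', 'L': 'H', 'C': 'L'}
children = {'H': ['B', 'L'], 'B': [], 'L': ['C'], 'C': []}
prior = [0.2, 0.8]                                  # P(h1), P(h2)
cpt = {'B': [[0.25, 0.05], [0.75, 0.95]],           # P(b|h)
       'L': [[0.003, 0.00005], [0.997, 0.99995]],   # P(l|h)
       'C': [[0.6, 0.02], [0.4, 0.98]]}             # P(c|l)

bp = TreeBP(parent, children, prior, cpt)
bp.initial_tree()
print(bp.P['B'])   # -> [0.09, 0.91]
print(bp.P['L'])   # -> [0.00064, 0.99936]
print(bp.P['C'])   # -> approximately [0.02037, 0.97963]
```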
For the Generalization, Please Look at...
Pages 123–156 of
Richard E. Neapolitan. 2003. Learning Bayesian Networks. Prentice-Hall, Inc.
History
Invented in 1988
By Lauritzen and Spiegelhalter (1988).
Something Notable
The general idea is that the propagation of evidence through the network can be carried out more efficiently by representing the joint probability distribution on an undirected graph called the junction tree (or join tree).
More on the Intuition
High-level Intuition
Computing marginals is straightforward in a tree structure.
Junction Tree Characteristics
The junction tree has the following characteristics
It is an undirected tree.
Its nodes are clusters of variables (i.e., from the original BN).
Given two clusters, C_1 and C_2, every node on the path between them contains their intersection C_1 ∩ C_2.
In addition
A separator, S, is associated with each edge and contains the variables in the intersection between neighboring nodes.
(Figure: a junction tree with clusters ABC, BCD, CDE; the separators S on its edges are BC and CD.)
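A small sketch (plain Python, not from the slides) that checks the third characteristic, the running intersection property, on the clusters from the figure above:

```python
from itertools import combinations

# Clusters and tree edges from the figure: ABC - BCD - CDE.
clusters = {0: {'A', 'B', 'C'}, 1: {'B', 'C', 'D'}, 2: {'C', 'D', 'E'}}
adj = {0: [1], 1: [0, 2], 2: [1]}

def path(u, v, seen=frozenset()):
    """The unique u-v path in the cluster tree (DFS)."""
    if u == v:
        return [u]
    for w in adj[u]:
        if w not in seen:
            p = path(w, v, seen | {u})
            if p is not None:
                return [u] + p
    return None

def has_running_intersection():
    for u, v in combinations(clusters, 2):
        inter = clusters[u] & clusters[v]
        if any(not inter <= clusters[m] for m in path(u, v)):
            return False
    return True

print(has_running_intersection())                            # -> True
print(clusters[0] & clusters[1], clusters[1] & clusters[2])   # separators BC, CD
```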
Simplicial Node
Definition
In a graph G, a vertex v is called simplicial if and only if the subgraph of G induced by the vertex set {v} ∪ N(v) is a clique, where N(v) is the set of neighbors of v in the graph.
Example
Vertex 3 is simplicial, while 4 is not.
(Figure: a graph on vertices 1, 2, 3, 4.)
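The simplicial test is a direct check that all pairs of neighbors are adjacent. Below is a sketch on an assumed reading of the figure (edges 1-2, 1-3, 1-4, 2-4, 3-4), under which 3 is simplicial and 4 is not:

```python
from itertools import combinations

# Assumed edges for the figure: 1-2, 1-3, 1-4, 2-4, 3-4.
adj = {1: {2, 3, 4}, 2: {1, 4}, 3: {1, 4}, 4: {1, 2, 3}}

def is_simplicial(adj, v):
    """v is simplicial iff {v} ∪ N(v) induces a clique, i.e., all of v's
    neighbors are pairwise adjacent."""
    return all(u in adj[w] for u, w in combinations(adj[v], 2))

print(is_simplicial(adj, 3), is_simplicial(adj, 4))  # -> True False
```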
Perfect Elimination Ordering
Definition
A graph G on n vertices is said to have a perfect elimination ordering if and only if there is an ordering {v_1, ..., v_n} of G's vertices such that each v_i is simplicial in the subgraph induced by the vertices {v_1, ..., v_i}.
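Reusing is_simplicial from the sketch above, a perfect elimination ordering can be verified directly from the definition; note that the condition is on the subgraph induced by the prefix {v_1, ..., v_i}:

```python
def is_peo(adj, order):
    """Check that each order[i] is simplicial in the subgraph induced by
    order[0..i], as the definition requires."""
    for i in range(len(order)):
        kept = set(order[:i + 1])
        sub = {v: adj[v] & kept for v in kept}  # induced subgraph
        if not is_simplicial(sub, order[i]):
            return False
    return True

print(is_peo(adj, [1, 2, 4, 3]))  # a valid ordering on the example -> True
```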
Chordal Graph
Definition
A chordal graph is one in which all cycles of four or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle.
Definition
For any two vertices x, y ∈ G such that (x, y) ∉ E, an x−y separator is a set S ⊂ V such that the graph G − S has at least two disjoint connected components, one of which contains x and another of which contains y.
Chordal Graph
Theorem
For a graph G on n vertices, the following conditions are equivalent:
1 G has a perfect elimination ordering.
2 G is chordal.
3 If H is any induced subgraph of G and S is a vertex separator of H of minimal size, S's vertices induce a clique.
Maximal Clique
Definition
A maximal clique is a clique that cannot be extended by including one more adjacent vertex, meaning it is not a subset of a larger clique.
We have the following claims
1 A chordal graph with N vertices can have no more than N maximal cliques.
2 Given a chordal graph G = (V, E), where |V| = N, there exists an algorithm to find all the maximal cliques of G which takes no more than O(N^4) time.
Elimination Clique
Definition (Elimination Clique)
Given a chordal graph G and an elimination ordering for G that does not add any edges.
Suppose node i (assuming a labeling) is eliminated in some step of the elimination algorithm; then the clique consisting of the node i along with its neighbors during the elimination step (which must be fully connected, since elimination does not add edges) is called an elimination clique.
Formally
Suppose node i is eliminated in the k-th step of the algorithm, and let G^(k) be the graph just before the k-th elimination step. Then the elimination clique is C_i = {i} ∪ N^(k)(i), where N^(k)(i) is the set of neighbors of i in the graph G^(k).
From This
Theorem
Given a chordal graph and an elimination ordering which does not add any edges. Let C be the set of maximal cliques in the chordal graph, and let C_e = ∪_{i∈V} C_i be the set of elimination cliques obtained from this elimination ordering. Then C ⊆ C_e. In other words, every maximal clique is also an elimination clique for this particular ordering.
Something Notable
The theorem proves the second claim given earlier. Firstly, it shows that a chordal graph cannot have more than N maximal cliques, since there are only N elimination cliques.
It is more
It gives us an efficient algorithm for finding these N maximal cliques: simply go over each elimination clique and check whether it is maximal.
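A sketch of the procedure the theorem suggests, run on the small chordal example graph from the simplicial sketch above: collect the elimination cliques C_i = {i} ∪ N(i) along a fill-in-free ordering, then keep those not strictly contained in another (the ordering [3, 2, 4, 1] is one such ordering for that graph).

```python
def elimination_cliques(adj, order):
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    cliques = []
    for v in order:
        cliques.append({v} | adj[v])                 # C_i = {i} ∪ N(i)
        for u in adj[v]:                             # eliminate v from the graph
            adj[u].discard(v)
        del adj[v]
    return cliques

cliques = elimination_cliques(adj, [3, 2, 4, 1])     # a fill-in-free ordering
maximal = [C for C in cliques if not any(C < D for D in cliques)]
print(maximal)   # -> [{1, 3, 4}, {1, 2, 4}], the two maximal cliques
```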
Therefore
Even with a brute-force approach
It will not take more than |C_e|^2 × D = O(N^3) time, with D = max_{C∈C} |C|.
Because
Both the clique size and the number of elimination cliques are bounded by N.
Observation
The maximum clique problem, which is NP-hard on general graphs, is easy on chordal graphs.
We have the following definitions
Definition
The following are equivalent to the statement "G is a tree":
1 G is a connected, acyclic graph over N nodes.
2 G is a connected graph over N nodes with N − 1 edges.
3 G is a minimal connected graph over N nodes.
4 (Important) G is a graph over N nodes such that, for any two nodes i and j in G with i ≠ j, there is a unique path from i to j in G.
Theorem
For any graph G = (V, E), the following statements are equivalent:
1 G has a junction tree.
2 G is chordal.
Definition
Junction Tree
Given a graph G = (V, E), a graph G′ = (V′, E′) is said to be a Junction
Tree for G, iff:
1 The nodes of G′ are the maximal cliques of G (i.e. G′ is a clique
graph of G.)
2 G′ is a tree.
3 Running Intersection Property / Junction Tree Property:
1 For each v ∈ V , define G′v to be the induced subgraph of G′
consisting of exactly those nodes which correspond to maximal cliques
of G that contain v. Then G′v must be a connected graph.
62 / 102
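The Running Intersection Property can be checked mechanically. Below is a minimal sketch (not from the slides; the clique-list and edge-list encodings are assumptions): for every variable v, it verifies that the clique nodes containing v induce a connected subtree, via a BFS restricted to those nodes.

from collections import defaultdict, deque

def has_running_intersection(cliques, tree_edges):
    # cliques: list of frozensets; tree_edges: pairs of indices into cliques.
    adj = defaultdict(set)
    for a, b in tree_edges:
        adj[a].add(b); adj[b].add(a)
    for v in set().union(*cliques):
        holders = {i for i, c in enumerate(cliques) if v in c}
        start = next(iter(holders))
        seen, queue = {start}, deque([start])
        while queue:  # BFS only through clique nodes that contain v
            i = queue.popleft()
            for j in adj[i]:
                if j in holders and j not in seen:
                    seen.add(j); queue.append(j)
        if seen != holders:
            return False
    return True

cliques = [frozenset("BSL"), frozenset("BLE"), frozenset("LET"),
           frozenset("BEF"), frozenset("EX"), frozenset("AT")]
tree_edges = [(0, 1), (1, 2), (1, 3), (1, 4), (2, 5)]
print(has_running_intersection(cliques, tree_edges))  # True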
Step 1
Given a DAG G = (V , E) and |V | = N
Chordalize the graph using the elimination algorithm with an arbitrary
elimination ordering, if required.
For this, you can use the following greedy algorithm
Given a list of nodes:
1 Is the vertex simplicial? If it is not, make it simplicial by adding
fill-in edges between its neighbors.
2 Then remove it from the list and continue with the next vertex.
64 / 102
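A minimal sketch of this greedy chordalization (not from the slides; the adjacency encoding is an assumption): each vertex in the ordering is made simplicial by connecting all pairs of its not-yet-eliminated neighbors, then eliminated.

from itertools import combinations

def chordalize(adj, order):
    # adj: dict node -> set of neighbors; order: any elimination ordering.
    g = {v: set(ns) for v, ns in adj.items()}
    remaining = set(g)
    for v in order:
        nbrs = g[v] & remaining
        # Make v simplicial: add fill-in edges among its remaining neighbors.
        for a, b in combinations(nbrs, 2):
            g[a].add(b); g[b].add(a)
        remaining.discard(v)
    return g

# A 4-cycle A-B-C-D needs one chord to become chordal.
adj = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
print(chordalize(adj, order=["A", "B", "C", "D"])["B"])  # now contains 'D'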
Step 1
Another way
1 By the Moralization Procedure.
2 Triangulate the moral graph.
Moralization Procedure
1 Add edges between all pairs of nodes that have a common child.
2 Make all edges in the graph undirected.
Triangulate the moral graph
An undirected graph is triangulated if every cycle of length greater than 3
possesses a chord.
65 / 102
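A minimal sketch of moralization (not from the slides; the parents-dict encoding is an assumption): marry all pairs of parents of each node and drop edge directions.

from itertools import combinations

def moralize(parents):
    # parents: dict node -> set of parent nodes. Returns undirected adjacency.
    nodes = set(parents) | set().union(*parents.values())
    und = {v: set() for v in nodes}
    for child, ps in parents.items():
        for p in ps:                      # keep original edges, undirected
            und[p].add(child); und[child].add(p)
        for a, b in combinations(ps, 2):  # marry common parents
            und[a].add(b); und[b].add(a)
    return und

# Tiny example in the spirit of the slides: E has parents T and L.
parents = {"T": {"A"}, "L": {"S"}, "E": {"T", "L"}}
print("L" in moralize(parents)["T"])  # True: T and L are married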
Step 2
Find the maximal cliques in the chordal graph
List the N Cliques
({vN } ∪ N (vN )) ∩ {v1, ..., vN }
({vN−1} ∪ N (vN−1)) ∩ {v1, ..., vN−1}
· · ·
{v1}
Note: since the graph is chordal, every maximal clique appears among
these elimination cliques; we only need to keep the maximal ones.
66 / 102
Step 3
Compute the separator sets for each pair of maximal cliques and
construct a weighted clique graph
For each pair of maximal cliques (Ci, Cj) in the graph
We check whether they possess any common variables.
If yes, we designate a separator set
Between these 2 cliques as Sij = Ci ∩ Cj.
Then, using these separators
We build a clique graph:
Nodes are the Cliques.
Edges (Ci, Cj) are added with weight |Ci ∩ Cj| if |Ci ∩ Cj| > 0.
67 / 102
Step 3
This step can be implemented quickly in practice using a hash table
Running Time: O(|C|² D) = O(N² D)
68 / 102
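A minimal sketch of this step (not from the slides; the frozenset encoding is an assumption): every pair of maximal cliques sharing at least one variable gets an edge weighted by the separator size.

def clique_graph(cliques):
    edges = []
    for i in range(len(cliques)):
        for j in range(i + 1, len(cliques)):
            w = len(cliques[i] & cliques[j])  # separator size |Ci ∩ Cj|
            if w > 0:
                edges.append((w, i, j))
    return edges

cliques = [frozenset("BSL"), frozenset("BLE"), frozenset("LET"),
           frozenset("BEF"), frozenset("EX"), frozenset("AT")]
for w, i, j in sorted(clique_graph(cliques), reverse=True):
    print(sorted(cliques[i]), sorted(cliques[j]), "weight", w)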
Step 4
Compute a maximum-weight spanning tree on the weighted clique
graph to obtain a junction tree
You can use Kruskal’s or Prim’s algorithm, adapted to maximum weight, for this
We will give Kruskal’s algorithm
For finding the maximum-weight spanning tree
69 / 102
Step 4
Maximal Kruskal’s algorithm
Initialize an edgeless graph T with nodes that are all the maximal cliques
in our chordal graph.
Then
We will add edges to T until it becomes a junction tree.
Sort the m edges ei in our clique graph from step 3 by weight wi
We have for e1, e2, ..., em with w1 ≥ w2 ≥ · · · ≥ wm
70 / 102
Step 4
For i = 1, 2, ..., m
1 Add edge ei to T if it does not introduce a cycle.
2 If |C| − 1 edges have been added, quit.
Running Time, given that |E| = O(|C|²):
O(|C|² log |C|²) = O(|C|² log |C|) = O(N² log N)
71 / 102
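A minimal sketch of this maximum-weight Kruskal step (not from the slides; the union-find helper and edge encoding are assumptions of this sketch): edges are taken in decreasing weight, skipping any that would close a cycle.

def max_spanning_tree(n, edges):
    # n: number of cliques; edges: (weight, i, j) triples. Returns tree edges.
    parent = list(range(n))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, i, j in sorted(edges, reverse=True):  # w1 >= w2 >= ... >= wm
        ri, rj = find(i), find(j)
        if ri != rj:                   # edge closes no cycle: keep it
            parent[ri] = rj
            tree.append((i, j, w))
            if len(tree) == n - 1:     # |C| - 1 edges added: done
                break
    return tree

# The clique graph of the running example; indices:
# 0=BSL, 1=BLE, 2=LET, 3=BEF, 4=EX, 5=AT.
edges = [(2, 0, 1), (1, 0, 2), (1, 0, 3), (2, 1, 2), (2, 1, 3),
         (1, 1, 4), (1, 2, 4), (1, 2, 5), (1, 3, 4)]
print(max_spanning_tree(6, edges))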
How do you build a Junction Tree?
Given a General DAG
[Figure: a DAG over the nodes S, B, L, F, E, T, X, A]
Build a Chordal Graph
Moral Graph – marry common parents and remove arrows.
[Figure: the corresponding moral graph over the same nodes]
74 / 102
How do you build a Junction Tree?
Triangulate the moral graph
An undirected graph is triangulated if every cycle of length greater
than 3 possesses a chord.
[Figure: the triangulated (chordal) moral graph]
76 / 102
Listing of Cliques
Identify the Cliques
A clique is a subset of nodes which is complete (i.e. there is an edge
between every pair of nodes) and maximal.
[Figure: the triangulated graph with its maximal cliques highlighted]
The maximal cliques are: {B,S,L}, {B,L,E}, {B,E,F}, {L,E,T}, {A,T}, {E,X}
78 / 102
Build the Clique Graph
Clique Graph
Add an edge between Ci and Cj with weight |Ci ∩ Cj| whenever |Ci ∩ Cj| > 0
[Figure: weighted clique graph on BSL, BLE, LET, BEF, EX, AT; the edges
BSL–BLE, BLE–LET, and BLE–BEF have weight 2, the remaining intersecting
pairs weight 1]
79 / 102
Getting The Junction Tree
Run the Maximum Kruskal’s Algorithm
[Figure: the weighted clique graph with the maximum-weight spanning tree
edges selected]
80 / 102
Getting The Junction Tree
Finally
[Figure: the resulting junction tree over the cliques BSL, BLE, LET, BEF,
EX, AT]
81 / 102
Potential Representation for the Junction Tree
Something Notable
The joint probability distribution can now be represented in terms of
potential functions, φ.
This is defined in each clique and each separator
Thus
P (x) = ∏_{i=1}^{n} φ_C (x_{ci}) / ∏_{j=1}^{m} φ_S (x_{sj})
where x = (x_{c1}, ..., x_{cn}), each x_{ci} corresponds to a clique, and
each x_{sj} corresponds to a separator.
The basic idea is to represent the probability distribution corresponding
to any graph as a product of clique potentials
P (x) = (1/Z) ∏_{i=1}^{n} φ_C (x_{ci})
83 / 102
Then
Main idea
The idea is to transform one representation of the joint distribution to
another in which for each clique, c, the potential function gives the
marginal distribution for the variables in c, i.e.
φC (xc) = P (xc)
This will also apply for each separator, s.
84 / 102
Now, Initialization
To initialize the potential functions
1 Set all potentials to unity
2 For each variable, xi, select one node in the junction tree (i.e. one
clique) containing both that variable and its parents, pa(xi), in the
original DAG.
3 Multiply the potential by P (xi|pa (xi))
85 / 102
Then
For example, at the beginning we have φBSL = φBLF = φLX = 1, then
[Figure: a small DAG over S, B, L, F, X and, after initialization, its
junction tree chain BSL – BLF – LX]
86 / 102
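A minimal sketch of this initialization (not from the slides; the dense-table encoding and the helper init_potentials are assumptions). The CPD numbers are recovered from the φBSL table shown on the next slides, e.g. P(s1) = 0.2, P(b1|s1) = 0.25, P(l1|s1) = 0.003, with states encoded as 1 and 2.

from itertools import product

def init_potentials(cliques, cpds, values=(1, 2)):
    # cliques: list of frozensets of variable names.
    # cpds: dict var -> (parents_tuple, table), table[(x, *pa)] = P(x | pa).
    pots = []
    for c in cliques:  # step 1: set all potentials to unity
        pots.append({asg: 1.0 for asg in product(values, repeat=len(c))})
    for var, (pa, table) in cpds.items():
        fam = {var, *pa}
        k = next(i for i, c in enumerate(cliques) if fam <= c)  # host clique
        vs = sorted(cliques[k])
        for asg, phi in pots[k].items():  # step 3: multiply in P(xi | pa(xi))
            x = dict(zip(vs, asg))
            pots[k][asg] = phi * table[(x[var],) + tuple(x[p] for p in pa)]
    return pots

cliques = [frozenset("BSL"), frozenset("BLF"), frozenset("LX")]
cpds = {
    "S": ((), {(1,): 0.2, (2,): 0.8}),
    "B": (("S",), {(1, 1): 0.25, (2, 1): 0.75, (1, 2): 0.05, (2, 2): 0.95}),
    "L": (("S",), {(1, 1): 0.003, (2, 1): 0.997,
                   (1, 2): 0.00005, (2, 2): 0.99995}),
}
pots = init_potentials(cliques, cpds)
print(pots[0][(1, 1, 1)])  # phi_BSL(b1, l1, s1) = 0.2 * 0.25 * 0.003 ~ 0.00015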
Propagating Information in a Junction Tree
Passing Information using the separators
Passing information from one clique C1 to another C2 via the separator in
between them, S0, requires two steps
First Step
Obtain a new potential for S0 by marginalizing out the variables in C1 that
are not in S0:
φ∗_{S0} = ∑_{C1−S0} φ_{C1}
88 / 102
Propagating Information in a Junction Tree
Second Step
Obtain a new potential for C2:
φ∗_{C2} = φ_{C2} · λ_{S0}
Where
λ_{S0} = φ∗_{S0} / φ_{S0}
89 / 102
An Example
Consider a flow from the clique {B,S,L} to {B,L,F}
[Figure: junction tree chain BSL – BL – BLF – L – LX, where BL and L are
the separators]
90 / 102
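A minimal sketch of one such flow (not from the slides; the table encoding and function names are assumptions, and φS0 is assumed to have no zeros). Run on the initial potentials of the next slide, it reproduces the "after flow" numbers.

from itertools import product

def marginalize(vars_, table, keep):
    # Sum a potential over the variables not in `keep`.
    keep = sorted(keep)
    idx = [vars_.index(v) for v in keep]
    out = {}
    for asg, val in table.items():
        key = tuple(asg[i] for i in idx)
        out[key] = out.get(key, 0.0) + val
    return out

def pass_flow(c1_vars, phi_c1, s_vars, phi_s, c2_vars, phi_c2):
    # Returns updated separator and C2 potentials after a C1 -> C2 flow.
    phi_s_new = marginalize(c1_vars, phi_c1, s_vars)       # first step
    lam = {k: phi_s_new[k] / phi_s[k] for k in phi_s}      # lambda = phi*/phi
    idx = [c2_vars.index(v) for v in sorted(s_vars)]
    phi_c2_new = {a: v * lam[tuple(a[i] for i in idx)]     # second step
                  for a, v in phi_c2.items()}
    return phi_s_new, phi_c2_new

# Tables from the next slide, keyed over sorted variable tuples.
phi_BSL = {(1,1,1): 0.00015, (1,2,1): 0.04985, (2,1,1): 0.00045,
           (2,2,1): 0.14955, (1,1,2): 0.000002, (1,2,2): 0.039998,
           (2,1,2): 0.000038, (2,2,2): 0.759962}          # keys (B, L, S)
phi_BL  = {bl: 1.0 for bl in product((1, 2), repeat=2)}
phi_BLF = {(1,1,1): 0.75, (1,1,2): 0.1, (2,1,1): 0.5, (2,1,2): 0.05,
           (1,2,1): 0.25, (1,2,2): 0.9, (2,2,1): 0.5, (2,2,2): 0.95}  # (B, F, L)

phi_BL_new, phi_BLF_new = pass_flow(['B','L','S'], phi_BSL, ['B','L'], phi_BL,
                                    ['B','F','L'], phi_BLF)
print(phi_BLF_new[(1, 1, 1)])  # 0.000114, matching the "after flow" table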
An Example
Initial representation
φBSL = P (B|S) P (L|S) P (S)
            l1         l2
s1, b1   0.00015    0.04985
s1, b2   0.00045    0.14955
s2, b1   0.000002   0.039998
s2, b2   0.000038   0.759962

φBL = 1
      l1   l2
b1    1    1
b2    1    1

φBLF = P (F|B, L)
          l1     l2
f1, b1   0.75   0.1
f1, b2   0.5    0.05
f2, b1   0.25   0.9
f2, b2   0.5    0.95
92 / 102
An Example
After Flow
φBSL = P (B|S) P (L|S) P (S) (unchanged)
            l1         l2
s1, b1   0.00015    0.04985
s1, b2   0.00045    0.14955
s2, b1   0.000002   0.039998
s2, b2   0.000038   0.759962

φ∗BL (S marginalized out of φBSL)
      l1         l2
b1    0.000152   0.089848
b2    0.000488   0.909512

φ∗BLF = φBLF · λBL
          l1         l2
f1, b1   0.000114   0.0089848
f1, b2   0.000244   0.0454756
f2, b1   0.000038   0.0808632
f2, b2   0.000244   0.8640364
93 / 102
Now Introduce Evidence
We have
A flow from the clique {B,S,L} to {B,L,F}, but this time we have the
information that Joe is a smoker, S = s1.
Incorporation of Evidence
φBSL = P (B|S) P (L|S) P (S), with the s2 rows zeroed by the evidence
            l1        l2
s1, b1   0.00015   0.04985
s1, b2   0.00045   0.14955
s2, b1   0         0
s2, b2   0         0

φBL = 1
      l1   l2
b1    1    1
b2    1    1

φBLF = P (F|B, L)
          l1     l2
f1, b1   0.75   0.1
f1, b2   0.5    0.05
f2, b1   0.25   0.9
f2, b2   0.5    0.95
94 / 102
An Example
After Flow
φBSL (with evidence S = s1, unchanged)
            l1        l2
s1, b1   0.00015   0.04985
s1, b2   0.00045   0.14955
s2, b1   0         0
s2, b2   0         0

φ∗BL (S marginalized out of φBSL)
      l1        l2
b1    0.00015   0.04985
b2    0.00045   0.14955

φ∗BLF = φBLF · λBL
          l1          l2
f1, b1   0.0001125   0.004985
f1, b2   0.000225    0.0074775
f2, b1   0.0000375   0.044865
f2, b2   0.000225    0.1420725
95 / 102
The Full Propagation
Two phase propagation (Jensen et al, 1990)
1 Select an arbitrary clique, C0
2 Collection Phase – flows passed from periphery to C0
3 Distribution Phase – flows passed from C0 to periphery
97 / 102
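A minimal sketch of the two-phase schedule (not from the slides; it assumes a pass_flow_between(i, j) routine like the flow sketch earlier, applied between adjacent cliques i and j).

def propagate(tree_adj, root, pass_flow_between):
    # tree_adj: dict clique-id -> set of neighbor ids in the junction tree.
    def collect(i, parent):
        for j in tree_adj[i]:
            if j != parent:
                collect(j, i)
                pass_flow_between(j, i)   # flows from periphery toward root

    def distribute(i, parent):
        for j in tree_adj[i]:
            if j != parent:
                pass_flow_between(i, j)   # flows from root back to periphery
                distribute(j, i)

    collect(root, None)
    distribute(root, None)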
Example
Distribution
[Figure: distribution-phase flows on the junction tree over BSL, BLE, LET,
BEF, EX, AT]
99 / 102
Example
Collection
[Figure: collection-phase flows on the same junction tree]
100 / 102
The Full Propagation
After the two propagation phases have been carried out
The Junction tree will be in equilibrium with each clique containing
the joint probability distribution for the variables it contains.
Marginal probabilities for individual variables can then be obtained
from the cliques.
Now, some evidence E can be included before propagation
By selecting a clique for each variable for which evidence is available.
The potential for the clique is then set to 0 for any configuration
which differs from the evidence.
101 / 102
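A minimal sketch of entering evidence this way (not from the slides; it reuses the phi_BSL table from the flow sketch above, whose keys are ordered as (B, L, S)).

def enter_evidence(clique_vars, phi, var, value):
    # Zero every configuration of phi that disagrees with var = value.
    i = clique_vars.index(var)
    return {asg: (v if asg[i] == value else 0.0) for asg, v in phi.items()}

# Joe is a smoker: S = s1 (state 1), entered into the BSL clique.
phi_BSL_ev = enter_evidence(['B', 'L', 'S'], phi_BSL, 'S', 1)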
The Full Propagation
After propagation the result will be
P (x, E) = ∏_{c∈C} φ_c (x_c, E) / ∏_{s∈S} φ_s (x_s, E)
After normalization
P (x|E) = ∏_{c∈C} φ_c (x_c|E) / ∏_{s∈S} φ_s (x_s|E)
102 / 102

More Related Content

PDF
Artificial Intelligence 06.2 More on Causality Bayesian Networks
PDF
16 Machine Learning Universal Approximation Multilayer Perceptron
PDF
06 Machine Learning - Naive Bayes
PDF
18 Machine Learning Radial Basis Function Networks Forward Heuristics
PDF
Introduction to logistic regression
PDF
24 Machine Learning Combining Models - Ada Boost
PDF
The Kernel Trick
PDF
Kernels and Support Vector Machines
Artificial Intelligence 06.2 More on Causality Bayesian Networks
16 Machine Learning Universal Approximation Multilayer Perceptron
06 Machine Learning - Naive Bayes
18 Machine Learning Radial Basis Function Networks Forward Heuristics
Introduction to logistic regression
24 Machine Learning Combining Models - Ada Boost
The Kernel Trick
Kernels and Support Vector Machines

What's hot (20)

PDF
23 Machine Learning Feature Generation
PDF
11 Machine Learning Important Issues in Machine Learning
PDF
20 k-means, k-center, k-meoids and variations
PDF
Lecture10 - Naïve Bayes
PPTX
Fuzzy logic andits Applications
PDF
Lecture 3 qualtifed rules of inference
PDF
07 Machine Learning - Expectation Maximization
PDF
An overview of Bayesian testing
PDF
27 Machine Learning Unsupervised Measure Properties
PDF
Tutorial on testing at O'Bayes 2015, Valencià, June 1, 2015
PDF
Machine Learning and Data Mining - Decision Trees
PDF
Can we estimate a constant?
PPTX
Module 4 part_1
PDF
Probability and Statistics
PDF
An Extension to the Zero-Inflated Generalized Power Series Distributions
PDF
Matroid Basics
PDF
MLHEP Lectures - day 1, basic track
PDF
EVEN GRACEFUL LABELLING OF A CLASS OF TREES
PDF
Astaño 4
PDF
Rademacher Averages: Theory and Practice
23 Machine Learning Feature Generation
11 Machine Learning Important Issues in Machine Learning
20 k-means, k-center, k-meoids and variations
Lecture10 - Naïve Bayes
Fuzzy logic andits Applications
Lecture 3 qualtifed rules of inference
07 Machine Learning - Expectation Maximization
An overview of Bayesian testing
27 Machine Learning Unsupervised Measure Properties
Tutorial on testing at O'Bayes 2015, Valencià, June 1, 2015
Machine Learning and Data Mining - Decision Trees
Can we estimate a constant?
Module 4 part_1
Probability and Statistics
An Extension to the Zero-Inflated Generalized Power Series Distributions
Matroid Basics
MLHEP Lectures - day 1, basic track
EVEN GRACEFUL LABELLING OF A CLASS OF TREES
Astaño 4
Rademacher Averages: Theory and Practice
Ad

Viewers also liked (18)

PDF
02 probabilistic inference in graphical models
PPTX
Bayesian Networks with R and Hadoop
PDF
Bn presentation
PDF
World Cup Qualification Prediction - How it works
PDF
C04922125
PDF
Physics of Algorithms Talk
PDF
Efficient Belief Propagation in Depth Finding
PDF
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
PDF
Bayes Belief Network
PPTX
A Movement Recognition Method using LBP
PDF
SocNL: Bayesian Label Propagation with Confidence
DOCX
Peter LaBrash resume
PPTX
Presentación Iriana Colina Seguridad industrial 2 semestre
PPSX
áGuia azul gracilene pinto
PPT
Introducción a la computacón
PDF
How to prepare for class 12 board exams
PPTX
Digimon leonardo zec & ivan đopar 7.d
PPT
Cake Phpで簡単問い合わせフォームの作り方
02 probabilistic inference in graphical models
Bayesian Networks with R and Hadoop
Bn presentation
World Cup Qualification Prediction - How it works
C04922125
Physics of Algorithms Talk
Efficient Belief Propagation in Depth Finding
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
Bayes Belief Network
A Movement Recognition Method using LBP
SocNL: Bayesian Label Propagation with Confidence
Peter LaBrash resume
Presentación Iriana Colina Seguridad industrial 2 semestre
áGuia azul gracilene pinto
Introducción a la computacón
How to prepare for class 12 board exams
Digimon leonardo zec & ivan đopar 7.d
Cake Phpで簡単問い合わせフォームの作り方
Ad

Similar to Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junction Trees (20)

PDF
Lecture15 xing
PPT
Basen Network
PPT
Bayesnetwork
PDF
kactl.pdf
PPT
Bayesian Networks Model in Step By Steps
PPT
ch15BayesNet.ppt
PDF
Mit18 05 s14_class3slides
PPT
Bayesian Belief Network (BBN) Bayesian Belief Network (BBN) Bayesian Belief N...
PDF
Reading Seminar (140515) Spectral Learning of L-PCFGs
PDF
A Probabilistic Attack On NP-Complete Problems
PPT
Discrete probability
PDF
Lesson 29
PDF
AI Lesson 29
PDF
Bayes Belief Networks
PDF
ExamsGamesAndKnapsacks_RobMooreOxfordThesis
PPTX
Randomized algorithms all pairs shortest path
PDF
Ab cancun
PDF
Statistical inference of generative network models - Tiago P. Peixoto
PDF
Learning Bayesian Networks
PDF
Probability Formula sheet
Lecture15 xing
Basen Network
Bayesnetwork
kactl.pdf
Bayesian Networks Model in Step By Steps
ch15BayesNet.ppt
Mit18 05 s14_class3slides
Bayesian Belief Network (BBN) Bayesian Belief Network (BBN) Bayesian Belief N...
Reading Seminar (140515) Spectral Learning of L-PCFGs
A Probabilistic Attack On NP-Complete Problems
Discrete probability
Lesson 29
AI Lesson 29
Bayes Belief Networks
ExamsGamesAndKnapsacks_RobMooreOxfordThesis
Randomized algorithms all pairs shortest path
Ab cancun
Statistical inference of generative network models - Tiago P. Peixoto
Learning Bayesian Networks
Probability Formula sheet

More from Andres Mendez-Vazquez (20)

PDF
2.03 bayesian estimation
PDF
05 linear transformations
PDF
01.04 orthonormal basis_eigen_vectors
PDF
01.03 squared matrices_and_other_issues
PDF
01.02 linear equations
PDF
01.01 vector spaces
PDF
06 recurrent neural_networks
PDF
05 backpropagation automatic_differentiation
PDF
Zetta global
PDF
01 Introduction to Neural Networks and Deep Learning
PDF
25 introduction reinforcement_learning
PDF
Neural Networks and Deep Learning Syllabus
PDF
Introduction to artificial_intelligence_syllabus
PDF
Ideas 09 22_2018
PDF
Ideas about a Bachelor in Machine Learning/Data Sciences
PDF
Analysis of Algorithms Syllabus
PDF
18.1 combining models
PDF
17 vapnik chervonenkis dimension
PDF
A basic introduction to learning
PDF
Introduction Mathematics Intelligent Systems Syllabus
2.03 bayesian estimation
05 linear transformations
01.04 orthonormal basis_eigen_vectors
01.03 squared matrices_and_other_issues
01.02 linear equations
01.01 vector spaces
06 recurrent neural_networks
05 backpropagation automatic_differentiation
Zetta global
01 Introduction to Neural Networks and Deep Learning
25 introduction reinforcement_learning
Neural Networks and Deep Learning Syllabus
Introduction to artificial_intelligence_syllabus
Ideas 09 22_2018
Ideas about a Bachelor in Machine Learning/Data Sciences
Analysis of Algorithms Syllabus
18.1 combining models
17 vapnik chervonenkis dimension
A basic introduction to learning
Introduction Mathematics Intelligent Systems Syllabus

Recently uploaded (20)

PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
Construction Project Organization Group 2.pptx
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PPT
Mechanical Engineering MATERIALS Selection
PPTX
Sustainable Sites - Green Building Construction
PPTX
Welding lecture in detail for understanding
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PDF
PPT on Performance Review to get promotions
PDF
Well-logging-methods_new................
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PPTX
CH1 Production IntroductoryConcepts.pptx
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PDF
composite construction of structures.pdf
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
Construction Project Organization Group 2.pptx
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
Mechanical Engineering MATERIALS Selection
Sustainable Sites - Green Building Construction
Welding lecture in detail for understanding
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
Operating System & Kernel Study Guide-1 - converted.pdf
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PPT on Performance Review to get promotions
Well-logging-methods_new................
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
CH1 Production IntroductoryConcepts.pptx
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
composite construction of structures.pdf

Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junction Trees

  • 1. Artificial Intelligence Belief Propagation and Junction Trees Andres Mendez-Vazquez March 28, 2016 1 / 102
  • 2. Outline 1 Introduction What do we want? 2 Belief Propagation The Intuition Inference on Trees The Messages The Implementation 3 Junction Trees How do you build a Junction Tree? Chordal Graph Tree Graphs Junction Tree Formal Definition Algorithm For Building Junction Trees Example Moralize the DAG Triangulate Listing of Cliques Potential Function Propagating Information in a Junction Tree Example Now, the Full Propagation Example of Propagation 2 / 102
  • 3. Outline 1 Introduction What do we want? 2 Belief Propagation The Intuition Inference on Trees The Messages The Implementation 3 Junction Trees How do you build a Junction Tree? Chordal Graph Tree Graphs Junction Tree Formal Definition Algorithm For Building Junction Trees Example Moralize the DAG Triangulate Listing of Cliques Potential Function Propagating Information in a Junction Tree Example Now, the Full Propagation Example of Propagation 3 / 102
  • 4. Introduction We will be looking at the following algorithms Pearl’s Belief Propagation Algorithm Junction Tree Algorithm Belief Propagation Algorithm The algorithm was first proposed by Judea Pearl in 1982, who formulated this algorithm on trees, and was later extended to polytrees. 4 / 102
  • 5. Introduction We will be looking at the following algorithms Pearl’s Belief Propagation Algorithm Junction Tree Algorithm Belief Propagation Algorithm The algorithm was first proposed by Judea Pearl in 1982, who formulated this algorithm on trees, and was later extended to polytrees. 4 / 102
  • 6. Introduction We will be looking at the following algorithms Pearl’s Belief Propagation Algorithm Junction Tree Algorithm Belief Propagation Algorithm The algorithm was first proposed by Judea Pearl in 1982, who formulated this algorithm on trees, and was later extended to polytrees. A C D B E F G H I 4 / 102
  • 7. Introduction Something Notable It has since been shown to be a useful approximate algorithm on general graphs. Junction Tree Algorithm The junction tree algorithm (also known as ’Clique Tree’) is a method used in machine learning to extract marginalization in general graphs. it entails performing belief propagation on a modified graph called a junction tree by cycle elimination 5 / 102
  • 8. Introduction Something Notable It has since been shown to be a useful approximate algorithm on general graphs. Junction Tree Algorithm The junction tree algorithm (also known as ’Clique Tree’) is a method used in machine learning to extract marginalization in general graphs. it entails performing belief propagation on a modified graph called a junction tree by cycle elimination 5 / 102
  • 9. Outline 1 Introduction What do we want? 2 Belief Propagation The Intuition Inference on Trees The Messages The Implementation 3 Junction Trees How do you build a Junction Tree? Chordal Graph Tree Graphs Junction Tree Formal Definition Algorithm For Building Junction Trees Example Moralize the DAG Triangulate Listing of Cliques Potential Function Propagating Information in a Junction Tree Example Now, the Full Propagation Example of Propagation 6 / 102
  • 10. Example The Message Passing Stuff 7 / 102
  • 11. Thus We can do the following To pass information from below and from above to a certain node V . Thus We call those messages π from above. λ from below. 8 / 102
  • 12. Thus We can do the following To pass information from below and from above to a certain node V . Thus We call those messages π from above. λ from below. 8 / 102
  • 13. Thus We can do the following To pass information from below and from above to a certain node V . Thus We call those messages π from above. λ from below. 8 / 102
  • 14. Thus We can do the following To pass information from below and from above to a certain node V . Thus We call those messages π from above. λ from below. 8 / 102
  • 15. Outline 1 Introduction What do we want? 2 Belief Propagation The Intuition Inference on Trees The Messages The Implementation 3 Junction Trees How do you build a Junction Tree? Chordal Graph Tree Graphs Junction Tree Formal Definition Algorithm For Building Junction Trees Example Moralize the DAG Triangulate Listing of Cliques Potential Function Propagating Information in a Junction Tree Example Now, the Full Propagation Example of Propagation 9 / 102
  • 16. Inference on Trees Recall A rooted tree is a DAG Now Let (G, P) be a Bayesian network whose DAG is a tree. Let a be a set of values of a subset A ⊂ V . For simplicity Imagine that each node has two children. The general case can be inferred from it. 10 / 102
  • 17. Inference on Trees Recall A rooted tree is a DAG Now Let (G, P) be a Bayesian network whose DAG is a tree. Let a be a set of values of a subset A ⊂ V . For simplicity Imagine that each node has two children. The general case can be inferred from it. 10 / 102
  • 18. Inference on Trees Recall A rooted tree is a DAG Now Let (G, P) be a Bayesian network whose DAG is a tree. Let a be a set of values of a subset A ⊂ V . For simplicity Imagine that each node has two children. The general case can be inferred from it. 10 / 102
  • 19. Inference on Trees Recall A rooted tree is a DAG Now Let (G, P) be a Bayesian network whose DAG is a tree. Let a be a set of values of a subset A ⊂ V . For simplicity Imagine that each node has two children. The general case can be inferred from it. 10 / 102
  • 20. Inference on Trees Recall A rooted tree is a DAG Now Let (G, P) be a Bayesian network whose DAG is a tree. Let a be a set of values of a subset A ⊂ V . For simplicity Imagine that each node has two children. The general case can be inferred from it. 10 / 102
  • 21. Then Let DX be the subset of A Containing all members that are in the subtree rooted at X Including X if X ∈ A Let NX be the subset Containing all members of A that are non-descendant’s of X. This set includes X if X ∈ A 11 / 102
  • 22. Then Let DX be the subset of A Containing all members that are in the subtree rooted at X Including X if X ∈ A Let NX be the subset Containing all members of A that are non-descendant’s of X. This set includes X if X ∈ A 11 / 102
  • 23. Then Let DX be the subset of A Containing all members that are in the subtree rooted at X Including X if X ∈ A Let NX be the subset Containing all members of A that are non-descendant’s of X. This set includes X if X ∈ A 11 / 102
  • 24. Then Let DX be the subset of A Containing all members that are in the subtree rooted at X Including X if X ∈ A Let NX be the subset Containing all members of A that are non-descendant’s of X. This set includes X if X ∈ A 11 / 102
  • 25. Example We have that A = NX ∪ DX A X 12 / 102
  • 26. Thus We have for each value of x P (x|A) = P (x|dX , nX ) = P (dX , nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX , x) P (x) P (x) P (dX , nX ) = P (dX |x) P (x|nX ) P (nX ) P (dX , nX ) Here because d-speration if X /∈ A = P (dX |x) P (x|nX ) P (nX ) P (dX |nX ) P (nX ) Note: You need to prove when X ∈ A 13 / 102
  • 27. Thus We have for each value of x P (x|A) = P (x|dX , nX ) = P (dX , nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX , x) P (x) P (x) P (dX , nX ) = P (dX |x) P (x|nX ) P (nX ) P (dX , nX ) Here because d-speration if X /∈ A = P (dX |x) P (x|nX ) P (nX ) P (dX |nX ) P (nX ) Note: You need to prove when X ∈ A 13 / 102
  • 28. Thus We have for each value of x P (x|A) = P (x|dX , nX ) = P (dX , nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX , x) P (x) P (x) P (dX , nX ) = P (dX |x) P (x|nX ) P (nX ) P (dX , nX ) Here because d-speration if X /∈ A = P (dX |x) P (x|nX ) P (nX ) P (dX |nX ) P (nX ) Note: You need to prove when X ∈ A 13 / 102
  • 29. Thus We have for each value of x P (x|A) = P (x|dX , nX ) = P (dX , nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX , x) P (x) P (x) P (dX , nX ) = P (dX |x) P (x|nX ) P (nX ) P (dX , nX ) Here because d-speration if X /∈ A = P (dX |x) P (x|nX ) P (nX ) P (dX |nX ) P (nX ) Note: You need to prove when X ∈ A 13 / 102
  • 30. Thus We have for each value of x P (x|A) = P (x|dX , nX ) = P (dX , nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX , x) P (x) P (x) P (dX , nX ) = P (dX |x) P (x|nX ) P (nX ) P (dX , nX ) Here because d-speration if X /∈ A = P (dX |x) P (x|nX ) P (nX ) P (dX |nX ) P (nX ) Note: You need to prove when X ∈ A 13 / 102
  • 31. Thus We have for each value of x P (x|A) = P (x|dX , nX ) = P (dX , nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX , x) P (x) P (x) P (dX , nX ) = P (dX |x) P (x|nX ) P (nX ) P (dX , nX ) Here because d-speration if X /∈ A = P (dX |x) P (x|nX ) P (nX ) P (dX |nX ) P (nX ) Note: You need to prove when X ∈ A 13 / 102
  • 32. Thus We have for each value of x P (x|A) = P (x|dX , nX ) = P (dX , nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX |x) P (x) P (dX , nX ) = P (dX |x, nX ) P (nX , x) P (x) P (x) P (dX , nX ) = P (dX |x) P (x|nX ) P (nX ) P (dX , nX ) Here because d-speration if X /∈ A = P (dX |x) P (x|nX ) P (nX ) P (dX |nX ) P (nX ) Note: You need to prove when X ∈ A 13 / 102
  • 33. Thus We have for each value of x P (x|A) = P (dX |x) P (x|nX ) P (dX |nX ) = βP (dX |x) P (x|nX ) where β, the normalizing factor, is a constant not depending on x. 14 / 102
  • 34. Outline 1 Introduction What do we want? 2 Belief Propagation The Intuition Inference on Trees The Messages The Implementation 3 Junction Trees How do you build a Junction Tree? Chordal Graph Tree Graphs Junction Tree Formal Definition Algorithm For Building Junction Trees Example Moralize the DAG Triangulate Listing of Cliques Potential Function Propagating Information in a Junction Tree Example Now, the Full Propagation Example of Propagation 15 / 102
  • 35. Now, we develop the messages We want λ (x) P (dX |x) π (x) P (x|nX ) Where means “proportional to” Meaning π(x) may not be equal to P (x|nX ), but π(x) = k × P (x|nX ). Once, we have that P (x|a) = αλ (x) π (x) where α, the normalizing factor, is a constant not depending on x. 16 / 102
  • 36. Now, we develop the messages We want λ (x) P (dX |x) π (x) P (x|nX ) Where means “proportional to” Meaning π(x) may not be equal to P (x|nX ), but π(x) = k × P (x|nX ). Once, we have that P (x|a) = αλ (x) π (x) where α, the normalizing factor, is a constant not depending on x. 16 / 102
  • 37. Now, we develop the messages We want λ (x) P (dX |x) π (x) P (x|nX ) Where means “proportional to” Meaning π(x) may not be equal to P (x|nX ), but π(x) = k × P (x|nX ). Once, we have that P (x|a) = αλ (x) π (x) where α, the normalizing factor, is a constant not depending on x. 16 / 102
  • 38. Now, we develop the messages We want λ (x) P (dX |x) π (x) P (x|nX ) Where means “proportional to” Meaning π(x) may not be equal to P (x|nX ), but π(x) = k × P (x|nX ). Once, we have that P (x|a) = αλ (x) π (x) where α, the normalizing factor, is a constant not depending on x. 16 / 102
  • 39. Now, we develop the messages We want λ (x) P (dX |x) π (x) P (x|nX ) Where means “proportional to” Meaning π(x) may not be equal to P (x|nX ), but π(x) = k × P (x|nX ). Once, we have that P (x|a) = αλ (x) π (x) where α, the normalizing factor, is a constant not depending on x. 16 / 102
  • 40. Now, we develop the messages We want λ (x) P (dX |x) π (x) P (x|nX ) Where means “proportional to” Meaning π(x) may not be equal to P (x|nX ), but π(x) = k × P (x|nX ). Once, we have that P (x|a) = αλ (x) π (x) where α, the normalizing factor, is a constant not depending on x. 16 / 102
  • 41. Developing λ (x) We need λ (x) P (dX |x) Case 1: X ∈ A and X ∈ DX Given any X = ˆx, we have that for P (dX |x) = 0 for x = ˆx Thus, to achieve proportionality, we can set λ (ˆx) ≡ 1 λ (x) ≡ 0 for x = ˆx 17 / 102
  • 42. Developing λ (x) We need λ (x) P (dX |x) Case 1: X ∈ A and X ∈ DX Given any X = ˆx, we have that for P (dX |x) = 0 for x = ˆx Thus, to achieve proportionality, we can set λ (ˆx) ≡ 1 λ (x) ≡ 0 for x = ˆx 17 / 102
  • 43. Developing λ (x) We need λ (x) P (dX |x) Case 1: X ∈ A and X ∈ DX Given any X = ˆx, we have that for P (dX |x) = 0 for x = ˆx Thus, to achieve proportionality, we can set λ (ˆx) ≡ 1 λ (x) ≡ 0 for x = ˆx 17 / 102
  • 44. Developing λ (x) We need λ (x) P (dX |x) Case 1: X ∈ A and X ∈ DX Given any X = ˆx, we have that for P (dX |x) = 0 for x = ˆx Thus, to achieve proportionality, we can set λ (ˆx) ≡ 1 λ (x) ≡ 0 for x = ˆx 17 / 102
  • 45. Now Case 2: X /∈ A and X is a leaf Then, dX = ∅ and P (dX |x) = P (∅|x) = 1 for all values of x Thus, to achieve proportionality, we can set λ (x) ≡ 1 for all values of x 18 / 102
  • 46. Now Case 2: X /∈ A and X is a leaf Then, dX = ∅ and P (dX |x) = P (∅|x) = 1 for all values of x Thus, to achieve proportionality, we can set λ (x) ≡ 1 for all values of x 18 / 102
  • 47. Finally Case 3: X /∈ A and X is a non-leaf Let Y be X’s left child, W be X’s right child. Since X /∈ A DX = DY ∪ DW 19 / 102
  • 48. Finally Case 3: X /∈ A and X is a non-leaf Let Y be X’s left child, W be X’s right child. Since X /∈ A DX = DY ∪ DW X Y W 19 / 102
  • 49. Thus We have then P (dX |x) = P (dY , dW |x) = P (dY |x) P (dW |x) Because the d-separation at X = y P (dY , y|x) w P (dW , w|x) = y P (y|x) P (dY |y) w P (w|x) P (dW |w) y P (y|x) λ (y) w P (w|x) λ (w) Thus, we can get proportionality by defining for all values of x λY (x) = y P (y|x) λ (y) λW (x) = w P (w|x) λ (w) 20 / 102
  • 50. Thus We have then P (dX |x) = P (dY , dW |x) = P (dY |x) P (dW |x) Because the d-separation at X = y P (dY , y|x) w P (dW , w|x) = y P (y|x) P (dY |y) w P (w|x) P (dW |w) y P (y|x) λ (y) w P (w|x) λ (w) Thus, we can get proportionality by defining for all values of x λY (x) = y P (y|x) λ (y) λW (x) = w P (w|x) λ (w) 20 / 102
  • 51. Thus We have then P (dX |x) = P (dY , dW |x) = P (dY |x) P (dW |x) Because the d-separation at X = y P (dY , y|x) w P (dW , w|x) = y P (y|x) P (dY |y) w P (w|x) P (dW |w) y P (y|x) λ (y) w P (w|x) λ (w) Thus, we can get proportionality by defining for all values of x λY (x) = y P (y|x) λ (y) λW (x) = w P (w|x) λ (w) 20 / 102
  • 52. Thus We have then P (dX |x) = P (dY , dW |x) = P (dY |x) P (dW |x) Because the d-separation at X = y P (dY , y|x) w P (dW , w|x) = y P (y|x) P (dY |y) w P (w|x) P (dW |w) y P (y|x) λ (y) w P (w|x) λ (w) Thus, we can get proportionality by defining for all values of x λY (x) = y P (y|x) λ (y) λW (x) = w P (w|x) λ (w) 20 / 102
  • 53. Thus We have then P (dX |x) = P (dY , dW |x) = P (dY |x) P (dW |x) Because the d-separation at X = y P (dY , y|x) w P (dW , w|x) = y P (y|x) P (dY |y) w P (w|x) P (dW |w) y P (y|x) λ (y) w P (w|x) λ (w) Thus, we can get proportionality by defining for all values of x λY (x) = y P (y|x) λ (y) λW (x) = w P (w|x) λ (w) 20 / 102
  • 54. Thus We have then P (dX |x) = P (dY , dW |x) = P (dY |x) P (dW |x) Because the d-separation at X = y P (dY , y|x) w P (dW , w|x) = y P (y|x) P (dY |y) w P (w|x) P (dW |w) y P (y|x) λ (y) w P (w|x) λ (w) Thus, we can get proportionality by defining for all values of x λY (x) = y P (y|x) λ (y) λW (x) = w P (w|x) λ (w) 20 / 102
  • 55. Thus We have then P (dX |x) = P (dY , dW |x) = P (dY |x) P (dW |x) Because the d-separation at X = y P (dY , y|x) w P (dW , w|x) = y P (y|x) P (dY |y) w P (w|x) P (dW |w) y P (y|x) λ (y) w P (w|x) λ (w) Thus, we can get proportionality by defining for all values of x λY (x) = y P (y|x) λ (y) λW (x) = w P (w|x) λ (w) 20 / 102
  • 56. Thus We have then λ (x) = λY (x) λW (x) for all values x 21 / 102
  • 57. Developing π (x) We need π (x) P (x|nX ) Case 1: X ∈ A and X ∈ NX Given any X = ˆx, we have: P (ˆx|nX ) = P (ˆx|ˆx) = 1 P (x|nX ) = P (x|ˆx) = 0 for x = ˆx Thus, to achieve proportionality, we can set π (ˆx) ≡ 1 π (x) ≡ 0 for x = ˆx 22 / 102
  • 58. Developing π (x) We need π (x) P (x|nX ) Case 1: X ∈ A and X ∈ NX Given any X = ˆx, we have: P (ˆx|nX ) = P (ˆx|ˆx) = 1 P (x|nX ) = P (x|ˆx) = 0 for x = ˆx Thus, to achieve proportionality, we can set π (ˆx) ≡ 1 π (x) ≡ 0 for x = ˆx 22 / 102
  • 59. Developing π (x) We need π (x) P (x|nX ) Case 1: X ∈ A and X ∈ NX Given any X = ˆx, we have: P (ˆx|nX ) = P (ˆx|ˆx) = 1 P (x|nX ) = P (x|ˆx) = 0 for x = ˆx Thus, to achieve proportionality, we can set π (ˆx) ≡ 1 π (x) ≡ 0 for x = ˆx 22 / 102
  • 60. Developing π (x) We need π (x) P (x|nX ) Case 1: X ∈ A and X ∈ NX Given any X = ˆx, we have: P (ˆx|nX ) = P (ˆx|ˆx) = 1 P (x|nX ) = P (x|ˆx) = 0 for x = ˆx Thus, to achieve proportionality, we can set π (ˆx) ≡ 1 π (x) ≡ 0 for x = ˆx 22 / 102
  • 61. Developing π (x) We need π (x) P (x|nX ) Case 1: X ∈ A and X ∈ NX Given any X = ˆx, we have: P (ˆx|nX ) = P (ˆx|ˆx) = 1 P (x|nX ) = P (x|ˆx) = 0 for x = ˆx Thus, to achieve proportionality, we can set π (ˆx) ≡ 1 π (x) ≡ 0 for x = ˆx 22 / 102
  • 62. Developing π (x) We need π (x) P (x|nX ) Case 1: X ∈ A and X ∈ NX Given any X = ˆx, we have: P (ˆx|nX ) = P (ˆx|ˆx) = 1 P (x|nX ) = P (x|ˆx) = 0 for x = ˆx Thus, to achieve proportionality, we can set π (ˆx) ≡ 1 π (x) ≡ 0 for x = ˆx 22 / 102
  • 63. Now Case 2: X /∈ A and X is the root In this specific case nX = ∅ or the empty set of random variables. Then P (x|nX ) = P (x|∅) = P (x) for all values of x Enforcing the proportionality, we get π (x) ≡ P (x) for all values of x 23 / 102
  • 64. Now Case 2: X /∈ A and X is the root In this specific case nX = ∅ or the empty set of random variables. Then P (x|nX ) = P (x|∅) = P (x) for all values of x Enforcing the proportionality, we get π (x) ≡ P (x) for all values of x 23 / 102
  • 65. Now Case 2: X /∈ A and X is the root In this specific case nX = ∅ or the empty set of random variables. Then P (x|nX ) = P (x|∅) = P (x) for all values of x Enforcing the proportionality, we get π (x) ≡ P (x) for all values of x 23 / 102
  • 66. Then Case 3: X /∈ A and X is not the root Without loss of generality assume X is Z’s right child and T is the Z’s left child Then, NX = NZ ∪ DT 24 / 102
  • 67. Then Case 3: X /∈ A and X is not the root Without loss of generality assume X is Z’s right child and T is the Z’s left child Then, NX = NZ ∪ DT Z T X 24 / 102
  • 68. Then We have P (x|nX ) = z P (x|z) P (z|nX ) = z P (x|z) P (z|nZ , dT ) = z P (x|z) P (z, nZ , dT ) P (nZ , dT ) = z P (x|z) P (dT , z|nZ ) P (nZ ) P (nZ , dT ) = z P (x|z) P (dT |z, nZ ) P (z|nZ ) P(nZ ) P (nZ , dT ) = z P (x|z) P (dT |z) P (z|nZ ) P(nZ ) P (nZ , dT ) Again the d-separation for z 25 / 102
  • 69. Then We have P (x|nX ) = z P (x|z) P (z|nX ) = z P (x|z) P (z|nZ , dT ) = z P (x|z) P (z, nZ , dT ) P (nZ , dT ) = z P (x|z) P (dT , z|nZ ) P (nZ ) P (nZ , dT ) = z P (x|z) P (dT |z, nZ ) P (z|nZ ) P(nZ ) P (nZ , dT ) = z P (x|z) P (dT |z) P (z|nZ ) P(nZ ) P (nZ , dT ) Again the d-separation for z 25 / 102
  • 70. Then We have P (x|nX ) = z P (x|z) P (z|nX ) = z P (x|z) P (z|nZ , dT ) = z P (x|z) P (z, nZ , dT ) P (nZ , dT ) = z P (x|z) P (dT , z|nZ ) P (nZ ) P (nZ , dT ) = z P (x|z) P (dT |z, nZ ) P (z|nZ ) P(nZ ) P (nZ , dT ) = z P (x|z) P (dT |z) P (z|nZ ) P(nZ ) P (nZ , dT ) Again the d-separation for z 25 / 102
  • 71. Then We have P (x|nX ) = z P (x|z) P (z|nX ) = z P (x|z) P (z|nZ , dT ) = z P (x|z) P (z, nZ , dT ) P (nZ , dT ) = z P (x|z) P (dT , z|nZ ) P (nZ ) P (nZ , dT ) = z P (x|z) P (dT |z, nZ ) P (z|nZ ) P(nZ ) P (nZ , dT ) = z P (x|z) P (dT |z) P (z|nZ ) P(nZ ) P (nZ , dT ) Again the d-separation for z 25 / 102
  • 72. Then We have P (x|nX ) = z P (x|z) P (z|nX ) = z P (x|z) P (z|nZ , dT ) = z P (x|z) P (z, nZ , dT ) P (nZ , dT ) = z P (x|z) P (dT , z|nZ ) P (nZ ) P (nZ , dT ) = z P (x|z) P (dT |z, nZ ) P (z|nZ ) P(nZ ) P (nZ , dT ) = z P (x|z) P (dT |z) P (z|nZ ) P(nZ ) P (nZ , dT ) Again the d-separation for z 25 / 102
  • 73. Then We have P (x|nX ) = z P (x|z) P (z|nX ) = z P (x|z) P (z|nZ , dT ) = z P (x|z) P (z, nZ , dT ) P (nZ , dT ) = z P (x|z) P (dT , z|nZ ) P (nZ ) P (nZ , dT ) = z P (x|z) P (dT |z, nZ ) P (z|nZ ) P(nZ ) P (nZ , dT ) = z P (x|z) P (dT |z) P (z|nZ ) P(nZ ) P (nZ , dT ) Again the d-separation for z 25 / 102
  • 74. Last Step We have P (x|nX ) = z P (x|z) P (z|nZ ) P (nZ ) P (dT |z) P (nZ , dT ) = γ z P (x|z) π (z) λT (z) where γ = P(nZ ) P(nZ ,dT ) Thus, we can achieve proportionality by πX (z) ≡ π (z) λT (z) Then, setting π (x) ≡ z P (x|z) πX (z) for all values of x 26 / 102
  • 75. Last Step We have P (x|nX ) = z P (x|z) P (z|nZ ) P (nZ ) P (dT |z) P (nZ , dT ) = γ z P (x|z) π (z) λT (z) where γ = P(nZ ) P(nZ ,dT ) Thus, we can achieve proportionality by πX (z) ≡ π (z) λT (z) Then, setting π (x) ≡ z P (x|z) πX (z) for all values of x 26 / 102
  • 76. Last Step We have P (x|nX ) = z P (x|z) P (z|nZ ) P (nZ ) P (dT |z) P (nZ , dT ) = γ z P (x|z) π (z) λT (z) where γ = P(nZ ) P(nZ ,dT ) Thus, we can achieve proportionality by πX (z) ≡ π (z) λT (z) Then, setting π (x) ≡ z P (x|z) πX (z) for all values of x 26 / 102
  • 77. Outline 1 Introduction What do we want? 2 Belief Propagation The Intuition Inference on Trees The Messages The Implementation 3 Junction Trees How do you build a Junction Tree? Chordal Graph Tree Graphs Junction Tree Formal Definition Algorithm For Building Junction Trees Example Moralize the DAG Triangulate Listing of Cliques Potential Function Propagating Information in a Junction Tree Example Now, the Full Propagation Example of Propagation 27 / 102
  • 78. How do we implement this?
We require the following functions: initial_tree and update_tree.
initial_tree has the following input and output
Input: ((G, P), A, a, P(x|a))
Output: After this call, A and a are both empty, making P(x|a) the prior probability of x.
Then, each time a variable V is instantiated for v̂, the routine update_tree is called
Input: ((G, P), A, a, V, v̂, P(x|a))
Output: After this call, V has been added to A, v̂ has been added to a, and for every value of x, P(x|a) has been updated to be the conditional probability of x given the new a. 28 / 102
  • 81. Algorithm: Inference-in-trees
Problem: Given a Bayesian network whose DAG is a tree, determine the probabilities of the values of each node conditional on specified values of the nodes in some subset.
Input: Bayesian network (G, P) whose DAG is a tree, where G = (V, E), and a set of values a of a subset A ⊆ V.
Output: The Bayesian network (G, P) updated according to the values in a. The λ and π values and messages and P(x|a) for each X ∈ V are considered part of the network. 29 / 102
  • 84. Initializing the tree
void initial_tree
input: (Bayesian-network& (G, P) where G = (V, E), set-of-variables& A, set-of-variable-values& a)
1  A = ∅
2  a = ∅
3  for (each X ∈ V)
4    for (each value x of X)
5      λ(x) = 1                     // Compute λ values.
6    for (the parent Z of X)        // Does nothing if X is the root.
7      for (each value z of Z)
8        λ_X(z) = 1                 // Compute λ messages.
9  for (each value r of the root R)
10   P(r|a) = P(r)                  // Compute P(r|a).
11   π(r) = P(r)                    // Compute R's π values.
12 for (each child X of R)
13   send_π_msg(R, X) 30 / 102
  • 90. Updating the tree
void update_tree
Input: (Bayesian-network& (G, P) where G = (V, E), set-of-variables& A, set-of-variable-values& a, variable V, variable-value v̂)
1  A = A ∪ {V}, a = a ∪ {v̂}        // Add V to A and instantiate V to v̂.
2  λ(v̂) = 1, π(v̂) = 1, P(v̂|a) = 1
3  for (each value v ≠ v̂)
4    λ(v) = 0, π(v) = 0, P(v|a) = 0
5  if (V is not the root && V's parent Z ∉ A)
6    send_λ_msg(V, Z)
7  for (each child X of V such that X ∉ A)
8    send_π_msg(V, X) 31 / 102
  • 96. Sending the λ message
void send_λ_msg(node Y, node X)
Note: For simplicity (G, P) is not shown as input.
1  for (each value of x)
2    λ_Y(x) = Σ_y P(y|x) λ(y)           // Y sends X a λ message.
3    λ(x) = Π_{U ∈ CH_X} λ_U(x)         // Compute X's λ values.
4    P(x|a) = α λ(x) π(x)               // Compute P(x|a).
5  normalize P(x|a)
6  if (X is not the root and X's parent Z ∉ A)
7    send_λ_msg(X, Z)
8  for (each child W of X such that W ≠ Y and W ∉ A)
9    send_π_msg(X, W) 32 / 102
  • 100. Sending the π message
void send_π_msg(node Z, node X)
Note: For simplicity (G, P) is not shown as input.
1  for (each value of z)
2    π_X(z) = π(z) Π_{Y ∈ CH_Z − {X}} λ_Y(z)   // Z sends X a π message.
3  for (each value of x)
4    π(x) = Σ_z P(x|z) π_X(z)           // Compute X's π values.
5    P(x|a) = α λ(x) π(x)               // Compute P(x|a).
6  normalize P(x|a)
7  for (each child Y of X such that Y ∉ A)
8    send_π_msg(X, Y) 33 / 102
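To make these routines concrete, here is a minimal, self-contained Python sketch of the same message passing. The Node class, the dict-based tables, and names such as send_pi_msg are assumptions of this sketch, not part of the original pseudocode (λ is spelled lam); the set A of instantiated variables is passed around explicitly.

    from math import prod

    class Node:
        # 'cpt' maps (x, z) -> P(x|z) for non-roots, or x -> P(x) for the root.
        def __init__(self, values, cpt, parent=None):
            self.values, self.cpt, self.parent = values, cpt, parent
            self.children = []
            if parent:
                parent.children.append(self)
            self.lam, self.pi, self.post = {}, {}, {}   # λ, π, P(x|a)
            self.lam_msg = {}                           # λ message to the parent

    def normalize(t):
        s = sum(t.values())
        return {k: v / s for k, v in t.items()}

    def initial_tree(root, nodes):
        for X in nodes:                                 # λ values and messages to 1
            X.lam = {x: 1.0 for x in X.values}
            if X.parent:
                X.lam_msg = {z: 1.0 for z in X.parent.values}
        root.pi = {r: root.cpt[r] for r in root.values} # π(r) = P(r)
        root.post = dict(root.pi)
        for X in root.children:
            send_pi_msg(root, X, set())

    def send_pi_msg(Z, X, A):
        pi_X = {z: Z.pi[z] * prod(Y.lam_msg[z] for Y in Z.children if Y is not X)
                for z in Z.values}                      # Z sends X a π message
        X.pi = {x: sum(X.cpt[(x, z)] * pi_X[z] for z in Z.values) for x in X.values}
        X.post = normalize({x: X.lam[x] * X.pi[x] for x in X.values})
        for Y in X.children:
            if Y not in A:
                send_pi_msg(X, Y, A)

    def send_lambda_msg(Y, X, A):
        for x in X.values:                              # Y sends X a λ message
            Y.lam_msg[x] = sum(Y.cpt[(y, x)] * Y.lam[y] for y in Y.values)
        X.lam = {x: prod(U.lam_msg[x] for U in X.children) for x in X.values}
        X.post = normalize({x: X.lam[x] * X.pi[x] for x in X.values})
        if X.parent and X.parent not in A:
            send_lambda_msg(X, X.parent, A)
        for W in X.children:
            if W is not Y and W not in A:
                send_pi_msg(X, W, A)

    def update_tree(V, v_hat, A):
        A.add(V)                                        # instantiate V to v_hat
        V.lam = {v: float(v == v_hat) for v in V.values}
        V.pi, V.post = dict(V.lam), dict(V.lam)
        if V.parent and V.parent not in A:
            send_lambda_msg(V, V.parent, A)
        for X in V.children:
            if X not in A:
                send_pi_msg(V, X, A)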
  • 103. Example of Tree Initialization
We have the following tree: root H with children B and L, and C a child of L (the network figure is omitted in this extraction). 34 / 102
  • 104. Calling initial_tree((G, P), A, a)
We have then A = ∅, a = ∅
Compute λ values
λ(h1) = 1; λ(h2) = 1; λ(b1) = 1; λ(b2) = 1; λ(l1) = 1; λ(l2) = 1; λ(c1) = 1; λ(c2) = 1
Compute λ messages
λ_B(h1) = 1; λ_B(h2) = 1; λ_L(h1) = 1; λ_L(h2) = 1; λ_C(l1) = 1; λ_C(l2) = 1 35 / 102
  • 112. Calling initial_tree((G, P), A, a)
Compute P(h|∅)
P(h1|∅) = P(h1) = 0.2
P(h2|∅) = P(h2) = 0.8
Compute H's π values
π(h1) = P(h1) = 0.2
π(h2) = P(h2) = 0.8
Send messages
send_π_msg(H, B)
send_π_msg(H, L) 36 / 102
  • 118. The call send_π_msg(H, B)
H sends B a π message
π_B(h1) = π(h1) λ_L(h1) = 0.2 × 1 = 0.2
π_B(h2) = π(h2) λ_L(h2) = 0.8 × 1 = 0.8
Compute B's π values
π(b1) = P(b1|h1) π_B(h1) + P(b1|h2) π_B(h2) = (0.25)(0.2) + (0.05)(0.8) = 0.09
π(b2) = P(b2|h1) π_B(h1) + P(b2|h2) π_B(h2) = (0.75)(0.2) + (0.95)(0.8) = 0.91 37 / 102
  • 124. The call send_π_msg(H, B)
Compute P(b|∅)
P(b1|∅) = α λ(b1) π(b1) = α(1)(0.09) = 0.09α
P(b2|∅) = α λ(b2) π(b2) = α(1)(0.91) = 0.91α
Then, normalize
P(b1|∅) = 0.09α / (0.09α + 0.91α) = 0.09
P(b2|∅) = 0.91α / (0.09α + 0.91α) = 0.91 38 / 102
  • 128. Send the call send_π_msg(H, L)
H sends L a π message
π_L(h1) = π(h1) λ_B(h1) = (0.2)(1) = 0.2
π_L(h2) = π(h2) λ_B(h2) = (0.8)(1) = 0.8
Compute L's π values
π(l1) = P(l1|h1) π_L(h1) + P(l1|h2) π_L(h2) = (0.003)(0.2) + (0.00005)(0.8) = 0.00064
π(l2) = P(l2|h1) π_L(h1) + P(l2|h2) π_L(h2) = (0.997)(0.2) + (0.99995)(0.8) = 0.99936
Compute P(l|∅)
P(l1|∅) = α λ(l1) π(l1) = α(1)(0.00064) = 0.00064α
P(l2|∅) = α λ(l2) π(l2) = α(1)(0.99936) = 0.99936α 39 / 102
  • 134. Send the call send_π_msg(H, L)
Then, normalize
P(l1|∅) = 0.00064α / (0.00064α + 0.99936α) = 0.00064
P(l2|∅) = 0.99936α / (0.00064α + 0.99936α) = 0.99936 40 / 102
  • 136. Send the call send_π_msg(L, C)
L sends C a π message
π_C(l1) = π(l1) = 0.00064
π_C(l2) = π(l2) = 0.99936
Compute C's π values
π(c1) = P(c1|l1) π_C(l1) + P(c1|l2) π_C(l2) = (0.6)(0.00064) + (0.02)(0.99936) = 0.02037
π(c2) = P(c2|l1) π_C(l1) + P(c2|l2) π_C(l2) = (0.4)(0.00064) + (0.98)(0.99936) = 0.97963 41 / 102
  • 139. Send the call send_π_msg(L, C)
Compute P(c|∅)
P(c1|∅) = α λ(c1) π(c1) = α(1)(0.02037) = 0.02037α
P(c2|∅) = α λ(c2) π(c2) = α(1)(0.97963) = 0.97963α
Normalize
P(c1|∅) = 0.02037α / (0.02037α + 0.97963α) = 0.02037
P(c2|∅) = 0.97963α / (0.02037α + 0.97963α) = 0.97963 42 / 102
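As a sanity check on these numbers, the π (forward) pass can be reproduced with a few lines of standalone Python; the CPT values are read off the slides and the dict layout is an assumption of this snippet.

    pi_H = {"h1": 0.2, "h2": 0.8}
    P_B = {("b1", "h1"): 0.25, ("b1", "h2"): 0.05,
           ("b2", "h1"): 0.75, ("b2", "h2"): 0.95}
    P_L = {("l1", "h1"): 0.003, ("l1", "h2"): 0.00005,
           ("l2", "h1"): 0.997, ("l2", "h2"): 0.99995}
    P_C = {("c1", "l1"): 0.6, ("c1", "l2"): 0.02,
           ("c2", "l1"): 0.4, ("c2", "l2"): 0.98}

    pi_B = {b: sum(P_B[(b, h)] * pi_H[h] for h in pi_H) for b in ("b1", "b2")}
    pi_L = {l: sum(P_L[(l, h)] * pi_H[h] for h in pi_H) for l in ("l1", "l2")}
    pi_C = {c: sum(P_C[(c, l)] * pi_L[l] for l in pi_L) for c in ("c1", "c2")}
    print(pi_B)  # ≈ {'b1': 0.09, 'b2': 0.91}
    print(pi_L)  # ≈ {'l1': 0.00064, 'l2': 0.99936}
    print(pi_C)  # ≈ {'c1': 0.02037, 'c2': 0.97963}

Since every λ value is 1 before any evidence arrives, the π values coincide with the prior marginals, which is exactly what the normalization steps above return.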
  • 142. Final Graph
We have then the tree H → B, H → L → C with all λ and π values and the prior marginals computed (figure omitted in this extraction). 43 / 102
  • 143. For the Generalization
For the generalization, see pages 123–156 of Richard E. Neapolitan (2003), Learning Bayesian Networks, Prentice-Hall, Inc. 44 / 102
  • 144. History
Invented by Lauritzen and Spiegelhalter, 1988.
Something Notable: The general idea is that the propagation of evidence through the network can be carried out more efficiently by representing the joint probability distribution on an undirected graph called the Junction tree (or Join tree). 45 / 102
  • 146. More on the Intuition
High-level intuition: computing marginals is straightforward in a tree structure. 46 / 102
  • 147. Junction Tree Characteristics
The junction tree has the following characteristics
It is an undirected tree.
Its nodes are clusters of variables (i.e. from the original BN).
Given two clusters, C1 and C2, every node on the path between them contains their intersection C1 ∩ C2.
In addition, a Separator, S, is associated with each edge and contains the variables in the intersection between neighboring nodes.
Example (from the figure): the chain of clusters ABC - BCD - CDE, with separators BC and CD. 47 / 102
  • 153. Simplicial Node
Simplicial Node: In a graph G, a vertex v is called simplicial if and only if the subgraph of G induced by the vertex set {v} ∪ N(v) is a clique, where N(v) is the set of neighbors of v in the graph. 50 / 102
  • 154. Example
Vertex 3 is simplicial, while 4 is not (figure: a graph on vertices 1, 2, 3, 4, omitted in this extraction). 51 / 102
  • 155. Perfect Elimination Ordering Definition A graph G on n vertices is said to have a perfect elimination ordering if and only if there is an ordering {v1, ..., vn} of G’s vertices, such that each vi is simplicial in the subgraph induced by the vertices {v1, ..., vi}. 52 / 102
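Both definitions translate directly into code. A small Python sketch, assuming the graph is given as a dict mapping each vertex to its set of neighbors (function names are my own):

    from itertools import combinations

    def is_simplicial(adj, v):
        # v is simplicial iff its neighborhood N(v) induces a clique.
        return all(u in adj[w] for u, w in combinations(adj[v], 2))

    def is_perfect_elimination_ordering(adj, order):
        # Each v_i must be simplicial in the subgraph induced by {v_1,...,v_i},
        # so peel vertices off from the back of the ordering.
        adj = {v: set(ns) for v, ns in adj.items()}   # local copy
        for v in reversed(order):
            if not is_simplicial(adj, v):
                return False
            for u in adj[v]:
                adj[u].discard(v)
            del adj[v]
        return True

For example, the 4-cycle on {1, 2, 3, 4} admits no such ordering, while adding the chord {1, 3} makes (1, 3, 4, 2) a perfect elimination ordering:

    chorded = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
    print(is_perfect_elimination_ordering(chorded, [1, 3, 4, 2]))  # True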
  • 156. Chordal Graph
Definition: A Chordal Graph is one in which all cycles of four or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle.
Definition: For any two vertices x, y ∈ G such that (x, y) ∉ E, an x − y separator is a set S ⊂ V such that the graph G − S has at least two disjoint connected components, one of which contains x and another of which contains y. 53 / 102
  • 158. Chordal Graph
Theorem: For a graph G on n vertices, the following conditions are equivalent:
1 G has a perfect elimination ordering.
2 G is chordal.
3 If H is any induced subgraph of G and S is a vertex separator of H of minimal size, S's vertices induce a clique. 54 / 102
  • 162. Maximal Clique
Definition: A maximal clique is a clique that cannot be extended by including one more adjacent vertex, meaning it is not a subset of a larger clique.
We have the following claims:
1 A chordal graph with N vertices can have no more than N maximal cliques.
2 Given a chordal graph G = (V, E), where |V| = N, there exists an algorithm to find all the maximal cliques of G which takes no more than O(N⁴) time. 55 / 102
  • 165. Elimination Clique
Definition (Elimination Clique): Given a chordal graph G and an elimination ordering for G which does not add any edges. Suppose node i (assuming a labeling) is eliminated in some step of the elimination algorithm; then the clique consisting of the node i along with its neighbors during the elimination step (which must be fully connected, since elimination does not add edges) is called an elimination clique.
Formally: suppose node i is eliminated in the kth step of the algorithm, and let G^(k) be the graph just before the kth elimination step. Then the clique is
C_i = {i} ∪ N^(k)(i)
where N^(k)(i) is the set of neighbors of i in the graph G^(k). 56 / 102
  • 167. From This Theorem
Theorem: Given a chordal graph and an elimination ordering which does not add any edges, let C be the set of maximal cliques in the chordal graph, and let C_e = ∪_{i∈V} C_i be the set of elimination cliques obtained from this elimination ordering. Then C ⊆ C_e. In other words, every maximal clique is also an elimination clique for this particular ordering.
Something Notable: The theorem proves the second claim given earlier. Firstly, it shows that a chordal graph cannot have more than N maximal cliques, since we have only N elimination cliques.
It is more: it gives us an efficient algorithm for finding these N maximal cliques. Simply go over each elimination clique and check whether it is maximal, as sketched below. 57 / 102
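A sketch of this procedure in Python, assuming order is a perfect elimination ordering (so elimination adds no fill-in edges) and the graph is an adjacency dict as before:

    def maximal_cliques_of_chordal(adj, order):
        # Record the elimination clique {v} ∪ N(v) at each elimination step,
        # then keep only the cliques not strictly contained in another.
        adj = {v: set(ns) for v, ns in adj.items()}   # local copy
        cliques = []
        for v in order:
            cliques.append(frozenset({v} | adj[v]))   # elimination clique C_v
            for u in adj[v]:
                adj[u].discard(v)
            del adj[v]
        return [C for C in set(cliques) if not any(C < D for D in cliques)]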
  • 170. Therefore
Even with a brute-force approach, it will not take more than |C_e|² × D = O(N³) time, with D = max_{C∈C} |C|, since both the clique size and the number of elimination cliques are bounded by N.
Observation: The maximum clique problem, which is NP-hard on general graphs, is easy on chordal graphs. 58 / 102
  • 174. We have the following definitions
Definition: The following are equivalent to the statement “G is a tree”
1 G is a connected, acyclic graph over N nodes.
2 G is a connected graph over N nodes with N − 1 edges.
3 G is a minimal connected graph over N nodes.
4 (Important) G is a graph over N nodes, such that for any 2 nodes i and j in G, with i ≠ j, there is a unique path from i to j in G.
Theorem: For any graph G = (V, E), the following statements are equivalent:
1 G has a junction tree.
2 G is chordal. 60 / 102
  • 182. Definition
Junction Tree: Given a graph G = (V, E), a graph G′ = (V′, E′) is said to be a Junction Tree for G iff:
1 The nodes of G′ are the maximal cliques of G (i.e. G′ is a clique graph of G).
2 G′ is a tree.
3 Running Intersection Property / Junction Tree Property: for each v ∈ V, define G′_v to be the induced subgraph of G′ consisting of exactly those nodes which correspond to maximal cliques of G that contain v. Then G′_v must be a connected graph. 62 / 102
  • 184. Step 1
Given a DAG G = (V, E) and |V| = N
Chordalize the graph using the elimination algorithm with an arbitrary elimination ordering, if required. For this, you can use the following greedy algorithm. Given a list of nodes:
1 If the current vertex is not simplicial, make it simplicial by adding fill-in edges between its neighbors.
2 Remove it from the list and continue with the next vertex. 64 / 102
  • 188. Step 1
Another way (see the sketch after this slide):
1 Apply the Moralization Procedure.
2 Triangulate the moral graph.
Moralization Procedure:
1 Add edges between all pairs of nodes that have a common child.
2 Make all edges in the graph undirected.
Triangulate the moral graph: an undirected graph is triangulated if every cycle of length greater than 3 possesses a chord. 65 / 102
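A minimal Python sketch of the moralization step, assuming the DAG is given as a dict mapping each node to the set of its parents (every node must appear as a key); triangulation can then be carried out with the elimination procedure sketched earlier:

    from itertools import combinations

    def moralize(parents):
        # parents: {node: set of its parents in the DAG}.
        adj = {v: set() for v in parents}
        for v, ps in parents.items():
            for p in ps:                        # keep original edges, undirected
                adj[v].add(p); adj[p].add(v)
            for p, q in combinations(ps, 2):    # marry parents of a common child
                adj[p].add(q); adj[q].add(p)
        return adj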
  • 191. Step 2
Find the maximal cliques in the chordal graph. List the N elimination cliques
({v_N} ∪ N(v_N)) ∩ {v_1, ..., v_N}
({v_{N−1}} ∪ N(v_{N−1})) ∩ {v_1, ..., v_{N−1}}
· · ·
{v_1}
Note: every maximal clique appears in this list; an elimination clique contained in another is not maximal and is discarded. 66 / 102
  • 192. Step 3
Compute the separator sets for each pair of maximal cliques and construct a weighted clique graph.
For each pair of maximal cliques (C_i, C_j) in the graph, we check whether they possess any common variables. If yes, we designate a separator set between these two cliques as S_ij = C_i ∩ C_j.
Then, we build a clique graph: the nodes are the cliques, and an edge (C_i, C_j) is added with weight |C_i ∩ C_j| if |C_i ∩ C_j| > 0 (see the sketch below). 67 / 102
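A short Python sketch of this construction, representing cliques as frozensets and returning one weighted edge (with its separator) per intersecting pair; the tuple layout is an assumption carried into the Kruskal sketch later:

    from itertools import combinations

    def clique_graph(cliques):
        # cliques: list of frozensets of variable names.
        edges = []
        for i, j in combinations(range(len(cliques)), 2):
            S = cliques[i] & cliques[j]        # separator S_ij = C_i ∩ C_j
            if S:
                edges.append((len(S), i, j, S))
        return edges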
  • 198. Step 3
This step can be implemented quickly in practice using a hash table.
Running Time: O(|C|² D) = O(N² D) 68 / 102
  • 199. Step 4
Compute a maximum-weight spanning tree on the weighted clique graph to obtain a junction tree.
You can use Kruskal's or Prim's algorithm, adapted to maximum weight. We will give Kruskal's algorithm for finding the maximum-weight spanning tree. 69 / 102
  • 201. Step 4
Maximum-weight Kruskal's algorithm: initialize an edgeless graph T whose nodes are all the maximal cliques in our chordal graph.
Then we will add edges to T until it becomes a junction tree.
Sort the m edges e_i in our clique graph from step 3 by weight w_i, so that e_1, e_2, ..., e_m with w_1 ≥ w_2 ≥ · · · ≥ w_m. 70 / 102
  • 204. Step 4
For i = 1, 2, ..., m (sketched below):
1 Add edge e_i to T if it does not introduce a cycle.
2 If |C| − 1 edges have been added, quit.
Running Time, given that |E| = O(|C|²):
O(|C|² log |C|²) = O(|C|² log |C|) = O(N² log N) 71 / 102
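A Python sketch of maximum-weight Kruskal with a union-find structure, consuming the (weight, i, j, separator) edges produced by the clique_graph sketch above:

    def max_weight_spanning_tree(n_cliques, edges):
        parent = list(range(n_cliques))          # union-find over clique indices
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]    # path halving
                x = parent[x]
            return x
        tree = []
        for w, i, j, S in sorted(edges, reverse=True):   # heaviest edges first
            ri, rj = find(i), find(j)
            if ri != rj:                         # no cycle introduced
                parent[ri] = rj
                tree.append((i, j, S))
                if len(tree) == n_cliques - 1:
                    break
        return tree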
  • 208. How do you build a Junction Tree?
Given a general DAG over the variables S, B, L, F, E, T, X, A (figure omitted).
Build a chordal graph: Moral Graph – marry common parents and remove arrows (moralized graph over the same nodes, figure omitted). 74 / 102
  • 211. How do you build a Junction Tree?
Triangulate the moral graph: an undirected graph is triangulated if every cycle of length greater than 3 possesses a chord (triangulated graph over S, B, L, F, E, T, X, A, figure omitted). 76 / 102
  • 213. Listing of Cliques
Identify the cliques. A clique is a subset of nodes which is complete (i.e. there is an edge between every pair of nodes) and maximal.
Here the maximal cliques are {B,S,L}, {B,L,E}, {B,E,F}, {L,E,T}, {A,T}, {E,X}. 78 / 102
  • 214. Build the Clique Graph
Clique Graph: add an edge between C_i and C_j with weight |C_i ∩ C_j| > 0.
The nodes are BSL, BLE, BEF, LET, AT, EX; the figure shows weight-2 edges (e.g. BSL–BLE, BLE–BEF, BLE–LET) and weight-1 edges for the remaining intersecting pairs. 79 / 102
  • 215. Getting The Junction Tree
Run the maximum-weight Kruskal's algorithm on this weighted clique graph (figure omitted). 80 / 102
  • 216. Getting The Junction Tree
Finally, we obtain the junction tree over the cliques BSL, BLE, BEF, LET, AT, EX (figure omitted). 81 / 102
  • 218. Potential Representation for the Junction Tree
Something Notable: The joint probability distribution can now be represented in terms of potential functions, φ, defined on each clique and each separator. Thus
P(x) = \frac{\prod_{i=1}^{n} \phi_{C_i}(x_{c_i})}{\prod_{j=1}^{m} \phi_{S_j}(x_{s_j})}
where x = (x_{c_1}, ..., x_{c_n}) and each variable x_{c_i} corresponds to a clique and each x_{s_j} to a separator.
The basic idea is to represent the probability distribution corresponding to any graph as a product of clique potentials:
P(x) = \frac{1}{Z} \prod_{i=1}^{n} \phi_{C_i}(x_{c_i}) 83 / 102
  • 221. Then
Main idea: The idea is to transform one representation of the joint distribution into another in which, for each clique c, the potential function gives the marginal distribution of the variables in c, i.e.
φ_C(x_c) = P(x_c)
This will also apply to each separator, s. 84 / 102
  • 222. Now, Initialization
To initialize the potential functions (see the sketch below):
1 Set all potentials to unity.
2 For each variable, x_i, select one node in the junction tree (i.e. one clique) containing both that variable and its parents, pa(x_i), in the original DAG.
3 Multiply the potential by P(x_i|pa(x_i)). 85 / 102
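A Python sketch of this initialization over dict-based potentials; the clique/domain/CPT layout is an assumption of this sketch, and moralization guarantees that each family {x_i} ∪ pa(x_i) fits inside some clique:

    from itertools import product

    def init_potentials(cliques, domains, cpts):
        # cliques: list of frozensets of variable names.
        # domains: {variable: list of its values}.
        # cpts: list of (scope, table), scope = (x_i,) + pa(x_i), with
        # table[(value of x_i, values of parents...)] = P(x_i | pa(x_i)).
        pot = {}
        for C in cliques:                          # step 1: unity potentials
            vs = tuple(sorted(C))
            pot[C] = {vals: 1.0 for vals in product(*(domains[v] for v in vs))}
        for scope, table in cpts:                  # steps 2-3: one home clique per CPT
            C = next(C for C in cliques if set(scope) <= C)
            vs = tuple(sorted(C))
            for vals in pot[C]:
                row = dict(zip(vs, vals))
                pot[C][vals] *= table[tuple(row[v] for v in scope)]
        return pot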
  • 225. Then
For example, at the beginning we have φ_BSL = φ_BLF = φ_LX = 1 for the junction tree BSL - BLF - LX over the graph on S, B, L, F, X; after initialization each conditional probability table has been multiplied into one of these potentials (figure omitted). 86 / 102
  • 227. Propagating Information in a Junction Tree
Passing information from one clique C_1 to another C_2 via the separator in between them, S_0, requires two steps.
First Step: obtain a new potential for S_0 by marginalizing out the variables in C_1 that are not in S_0:
φ*_{S_0} = Σ_{C_1 − S_0} φ_{C_1} 88 / 102
  • 229. Propagating Information in a Junction Tree
Second Step: obtain a new potential for C_2:
φ*_{C_2} = φ_{C_2} λ_{S_0}, where λ_{S_0} = φ*_{S_0} / φ_{S_0} 89 / 102
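Both steps fit in a few lines of Python over dict-based potentials, assuming each clique's variables are kept in sorted order so separator keys line up across cliques, and with 0/0 treated as 0 (the usual convention when evidence has zeroed a separator entry):

    def marginalize(pot_C, vars_C, sep):
        # φ*_S: sum the clique potential over the variables of C not in S.
        idx = [i for i, v in enumerate(vars_C) if v in sep]
        out = {}
        for vals, p in pot_C.items():
            key = tuple(vals[i] for i in idx)
            out[key] = out.get(key, 0.0) + p
        return out

    def pass_flow(pot_C1, vars_C1, pot_S, sep, pot_C2, vars_C2):
        # One flow C1 -> S -> C2: new φ*_S by marginalization, then rescale
        # C2 by λ_S = φ*_S / φ_S.
        new_S = marginalize(pot_C1, vars_C1, sep)
        idx = [i for i, v in enumerate(vars_C2) if v in sep]
        new_C2 = {}
        for vals, p in pot_C2.items():
            key = tuple(vals[i] for i in idx)
            new_C2[vals] = p * (new_S[key] / pot_S[key] if pot_S[key] else 0.0)
        return new_S, new_C2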
  • 231. An Example
Consider a flow from the clique {B,S,L} to {B,L,F} in the junction tree BSL - BLF - LX, with separators BL and L. 90 / 102
  • 233. An Example
Initial representation:
φ_BSL = P(B|S) P(L|S) P(S)
            l1        l2
s1, b1   0.00015   0.04985
s1, b2   0.00045   0.14955
s2, b1   0.000002  0.039998
s2, b2   0.000038  0.759962

φ_BL = 1
      l1   l2
b1    1    1
b2    1    1

φ_BLF = P(F|B, L)
            l1    l2
f1, b1    0.75   0.1
f1, b2    0.5    0.05
f2, b1    0.25   0.9
f2, b2    0.5    0.95 92 / 102
• 234. An Example — After the flow:

φ_BSL = P(B|S) P(L|S) P(S) (unchanged):
             l1         l2
  s1,b1   0.00015    0.04985
  s1,b2   0.00045    0.14955
  s2,b1   0.000002   0.039998
  s2,b2   0.000038   0.759962

φ*_BL (φ_BSL with S summed out):
            l1         l2
  b1     0.000152   0.089848
  b2     0.000488   0.909512

φ*_BLF = φ_BLF · φ*_BL / φ_BL:
             l1          l2
  f1,b1   0.000114    0.0089848
  f1,b2   0.000244    0.0454756
  f2,b1   0.000038    0.0808632
  f2,b2   0.000244    0.8640364
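As a sanity check, the separator update can be reproduced with the `marginalize` sketch above (the clique table is transcribed from the slide):

```python
phi_BSL = {
    ("s1", "b1", "l1"): 0.00015,  ("s1", "b1", "l2"): 0.04985,
    ("s1", "b2", "l1"): 0.00045,  ("s1", "b2", "l2"): 0.14955,
    ("s2", "b1", "l1"): 0.000002, ("s2", "b1", "l2"): 0.039998,
    ("s2", "b2", "l1"): 0.000038, ("s2", "b2", "l2"): 0.759962,
}
phi_BL_star = marginalize(phi_BSL, ("S", "B", "L"), ("B", "L"))
# phi_BL_star[("b1", "l1")] ~ 0.000152, phi_BL_star[("b2", "l2")] ~ 0.909512
```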
• 235. Now Introduce Evidence — We have a flow from the clique {B, S, L} to {B, L, F}, but this time we have the information that Joe is a smoker, S = s1. Incorporating the evidence zeroes every entry of φ_BSL that disagrees with S = s1:

φ_BSL = P(B|S) P(L|S) P(S), with S = s1 entered:
             l1        l2
  s1,b1   0.00015   0.04985
  s1,b2   0.00045   0.14955
  s2,b1   0         0
  s2,b2   0         0

φ_BL = 1:
          l1   l2
  b1       1    1
  b2       1    1

φ_BLF = P(F|B,L):
             l1     l2
  f1,b1   0.75   0.1
  f1,b2   0.5    0.05
  f2,b1   0.25   0.9
  f2,b2   0.5    0.95
• 237. An Example — After the flow with the evidence entered:

φ_BSL (unchanged, S = s1 entered):
             l1        l2
  s1,b1   0.00015   0.04985
  s1,b2   0.00045   0.14955
  s2,b1   0         0
  s2,b2   0         0

φ*_BL (φ_BSL with S summed out):
            l1        l2
  b1     0.00015   0.04985
  b2     0.00045   0.14955

φ*_BLF = φ_BLF · φ*_BL / φ_BL:
              l1          l2
  f1,b1   0.0001125   0.004985
  f1,b2   0.000225    0.0074775
  f2,b1   0.0000375   0.044865
  f2,b2   0.000225    0.1420725
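The evidence step itself is just a masking of the clique table; a sketch in the same representation (function name is mine):

```python
def enter_evidence(potential, variables, var, observed):
    """Zero every entry that disagrees with the observation var = observed."""
    i = variables.index(var)
    for assign in potential:
        if assign[i] != observed:
            potential[assign] = 0.0

enter_evidence(phi_BSL, ("S", "B", "L"), "S", "s1")
phi_BL_star = marginalize(phi_BSL, ("S", "B", "L"), ("B", "L"))
# phi_BL_star[("b1", "l1")] == 0.00015, matching the table above
```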
• 238. Outline: Now, the Full Propagation
• 239. The Full Propagation — Two-phase propagation (Jensen et al., 1990; see the sketch below):
1. Select an arbitrary root clique, C0.
2. Collection phase: flows are passed from the periphery towards C0.
3. Distribution phase: flows are passed from C0 back out to the periphery.
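The schedule is two traversals of the clique tree; a recursive sketch, assuming a `pass_flow(src, dst)` helper that performs the two-step separator update described earlier:

```python
def collect(tree, clique, parent=None):
    """Collection phase: flows move from the periphery toward the root C0."""
    for neighbour in tree[clique]:
        if neighbour != parent:
            collect(tree, neighbour, clique)
            pass_flow(neighbour, clique)   # flow toward the root

def distribute(tree, clique, parent=None):
    """Distribution phase: flows move from the root C0 to the periphery."""
    for neighbour in tree[clique]:
        if neighbour != parent:
            pass_flow(clique, neighbour)   # flow away from the root
            distribute(tree, neighbour, clique)

# tree: dict mapping each clique to the list of its neighbours, e.g.
# tree = {"BSL": ["BLF"], "BLF": ["BSL", "LX"], "LX": ["BLF"]}
# collect(tree, "BSL"); distribute(tree, "BSL")  -> tree in equilibrium
```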
• 242. Outline: Example of Propagation
• 245. The Full Propagation — After the two propagation phases have been carried out: The junction tree will be in equilibrium, with each clique containing the joint probability distribution for the variables it contains. Marginal probabilities for individual variables can then be obtained from the cliques. Evidence E can be included before propagation by selecting a clique for each variable for which evidence is available; the potential for that clique is then set to 0 for any configuration which differs from the evidence.
• 249. The Full Propagation — After propagation the result will be:
P(x, E) = Π_{c∈C} φ_c(x_c, E) / Π_{s∈S} φ_s(x_s, E)
After normalization:
P(x | E) = Π_{c∈C} φ_c(x_c | E) / Π_{s∈S} φ_s(x_s | E)
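A practical note on the normalization (standard for junction trees, though not spelled out on the slide): after a full propagation with evidence entered, every clique table sums to the same constant P(E), so conditioning is a single division of any clique of interest:

```python
p_E = sum(phi_BSL.values())                            # normalizing constant P(E)
posterior = {k: v / p_E for k, v in phi_BSL.items()}   # P(s, b, l | E)
```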