Orbit-Product Analysis of (Generalized) Gaussian Belief Propagation

Jason Johnson, Post-Doctoral Fellow, LANL
Joint work with Michael Chertkov and Vladimir Chernyak

Physics of Algorithms Workshop
Santa Fe, New Mexico
September 3, 2009
Overview
   Introduction
         graphical models + belief propagation
         specialization to Gaussian model

  Analysis of Gaussian BP
         walk-sum analysis for means, variances, covariances1
         orbit-product analysis/corrections for determinant2

  Current Work on Generalized Belief Propagation (GBP) [Yedidia et al]
         uses larger “regions” to capture more walks/orbits of the
         graph (better approximation)
         however, it can also lead to over-counting of walks/orbits
         (bad approximation/unstable algorithm)!

    1. Earlier joint work with Malioutov & Willsky (NIPS, JMLR ’06).
    2. Johnson, Chernyak & Chertkov (ICML ’09).
Graphical Models
   A graphical model is a multivariate probability distribution that is
   expressed in terms of interactions among subsets of variables (e.g.
   pairwise interactions on the edges of a graph G ).

                   P(x) = (1/Z) ∏_{i∈V} ψ_i(x_i) ∏_{{i,j}∈G} ψ_{ij}(x_i, x_j)
   Markov property:

   [Figure: vertex sets A and B separated by the set S in the graph]

                   P(x_A, x_B | x_S) = P(x_A | x_S) P(x_B | x_S)
   Given the potential functions ψ, the goal of inference is to compute
   marginals P(x_i) = ∑_{x_{V∖i}} P(x) or the normalization constant Z,
   which is generally difficult in large, complex graphical models.
Gaussian Graphical Model

   Information form of Gaussian density:

                   P(x) ∝ exp{−(1/2) xᵀJx + hᵀx}



   Gaussian graphical model: sparse J matrix

                   J_ij ≠ 0 if and only if {i, j} ∈ G

   Potentials:
                   ψ_i(x_i) = exp{−(1/2) J_ii x_i² + h_i x_i}
                   ψ_ij(x_i, x_j) = exp{−J_ij x_i x_j}

   Inference corresponds to calculation of the mean vector µ = J⁻¹h, the
   covariance matrix K = J⁻¹ or the determinant Z = det J⁻¹. Marginals
   P(x_i) are specified by the means µ_i and variances K_ii.
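These quantities can be checked directly with dense linear algebra. A minimal sketch (not from the talk; the specific J and h are made-up examples) of exact inference in the information form:

```python
import numpy as np

# A small 3-node chain model: J must be symmetric positive definite.
J = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.0, 0.3],
              [0.0, 0.3, 1.0]])
h = np.array([1.0, 0.0, -1.0])

K = np.linalg.inv(J)     # covariance matrix K = J^{-1}
mu = K @ h               # mean vector mu = J^{-1} h
Z = np.linalg.det(K)     # determinant Z = det J^{-1}

# The marginal of x_i is Gaussian with mean mu[i] and variance K[i, i].
```

On large sparse models this direct inversion is exactly what becomes infeasible, which is what motivates belief propagation below.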
Belief Propagation

   Belief Propagation iteratively updates a set of messages µ_{i→j}(x_j)
   defined on the directed edges of the graph G using the rule:

            µ_{i→j}(x_j) ∝ ∑_{x_i} ψ_i(x_i) ∏_{k∈N(i)∖j} µ_{k→i}(x_i) ψ_{ij}(x_i, x_j)

   Message updates are iterated until they converge to a fixed point.

   Marginal Estimates: combine messages at a node

            P(x_i) = (1/Z_i) ψ_i(x_i) ∏_{k∈N(i)} µ_{k→i}(x_i) = (1/Z_i) ψ̃_i(x_i)
Belief Propagation II

   Pairwise Estimates (on edges of graph):

            P(x_i, x_j) = (1/Z_ij) ψ̃_i(x_i) ψ̃_j(x_j) ψ̃_ij(x_i, x_j),

   where ψ̃_ij(x_i, x_j) = ψ_ij(x_i, x_j) / (µ_{i→j}(x_j) µ_{j→i}(x_i)).

   Estimate of Normalization Constant:

            Z_bp = ∏_{i∈V} Z_i ∏_{{i,j}∈G} Z_ij / (Z_i Z_j)

   A BP fixed point is a saddle point of the RHS with respect to
   messages/reparameterizations.
   In trees, BP converges in a finite number of steps and is exact
   (equivalent to variable elimination).
Gaussian Belief Propagation (GaBP)
    Messages µ_{i→j}(x_j) ∝ exp{(1/2) α_{i→j} x_j² + β_{i→j} x_j}.

    BP fixed-point equations reduce to:

            α_{i→j} = J_ij² (J_ii − α_{i∖j})⁻¹
            β_{i→j} = −J_ij (J_ii − α_{i∖j})⁻¹ (h_i + β_{i∖j})

    where α_{i∖j} = ∑_{k∈N(i)∖j} α_{k→i} and β_{i∖j} = ∑_{k∈N(i)∖j} β_{k→i}.
   Marginals specified by:

            K_i^bp = (J_ii − ∑_{k∈N(i)} α_{k→i})⁻¹

            µ_i^bp = K_i^bp (h_i + ∑_{k∈N(i)} β_{k→i})
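The α/β fixed-point equations above can be sketched in a few lines. This is a minimal illustration (not the authors' code; serial in-place sweeps are an assumption), run on a 3-node chain, where GaBP is exact because the graph is a tree:

```python
import numpy as np

# A 3-node chain (a tree), so the GaBP fixed point is exact here.
J = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.0, 0.3],
              [0.0, 0.3, 1.0]])
h = np.array([1.0, 0.0, -1.0])
n = len(h)
edges = [(i, j) for i in range(n) for j in range(n)
         if i != j and J[i, j] != 0.0]

alpha = {e: 0.0 for e in edges}   # quadratic message parameters alpha_{i->j}
beta = {e: 0.0 for e in edges}    # linear message parameters beta_{i->j}

def in_msgs(d, i, exclude=None):
    """Sum of incoming parameters d[(k, i)] over neighbors k != exclude."""
    return sum(v for (k, ii), v in d.items() if ii == i and k != exclude)

for _ in range(100):              # serial sweeps, well past convergence
    for (i, j) in edges:
        denom = J[i, i] - in_msgs(alpha, i, exclude=j)
        alpha[(i, j)] = J[i, j] ** 2 / denom
        beta[(i, j)] = -J[i, j] * (h[i] + in_msgs(beta, i, exclude=j)) / denom

# Marginal estimates from the converged messages.
K_bp = np.array([1.0 / (J[i, i] - in_msgs(alpha, i)) for i in range(n)])
mu_bp = K_bp * np.array([h[i] + in_msgs(beta, i) for i in range(n)])
```

On this tree, K_bp matches the diagonal of J⁻¹ and mu_bp matches J⁻¹h; on loopy graphs the same updates give the approximate fixed point analyzed in the rest of the talk.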
Gaussian BP Determinant Estimate

   Estimates of pairwise covariance on edges:

            K_(ij)^bp = [ J_ii − α_{i∖j}    J_ij           ]⁻¹
                        [ J_ij              J_jj − α_{j∖i} ]

   Estimate of Z ≜ det K = det J⁻¹:

            Z_bp = ∏_{i∈V} Z_i ∏_{{i,j}∈G} Z_ij / (Z_i Z_j)

   where Z_i = K_i^bp and Z_ij = det K_(ij)^bp.



   Exact in tree models (equivalent to Gaussian elimination),
   approximate in loopy models.
The BP Computation Tree

  BP marginal estimates are equivalent to the exact marginal in a
  tree-structured model [Weiss & Freeman].

   [Figure: a 3×3 grid graph (nodes 1-9) and the BP computation tree
   rooted at node 1; messages µ_{5→6}^(1), µ_{6→3}^(2), µ_{3→2}^(3),
   µ_{2→1}^(4) label successive levels of the tree]


  The BP messages correspond to upwards variable elimination steps
  in this computation tree.
Walk-Summable Gaussian Models

  Let J = I − R. If ρ(R) < 1 then (I − R)⁻¹ = ∑_{L=0}^∞ R^L.

  Walk-Sum interpretation of inference:

            K_ij = ∑_{L=0}^∞ ∑_{w: i→j, |w|=L} R^w  =?  ∑_{w: i→j} R^w

            µ_i = ∑_j h_j ∑_{L=0}^∞ ∑_{w: j→i, |w|=L} R^w  =?  ∑_{w: ∗→i} h_∗ R^w

  Walk-Summable if ∑_{w: i→j} |R^w| converges for all i, j. Absolute
  convergence implies convergence of walk-sums (to the same value) for
  arbitrary orderings and partitions of the set of walks. Equivalent to
  ρ(|R|) < 1.
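The Neumann-series identity behind the walk-sum picture is easy to check numerically. A sketch (not from the talk; the matrix is a made-up walk-summable example): truncate ∑_L R^L and compare with (I − R)⁻¹.

```python
import numpy as np

J = np.array([[1.0, -0.3, -0.2],
              [-0.3, 1.0, -0.3],
              [-0.2, -0.3, 1.0]])
R = np.eye(3) - J                                  # J = I - R

spec = np.max(np.abs(np.linalg.eigvals(np.abs(R))))  # rho(|R|)
# spec < 1, so the model is walk-summable.

S = np.zeros((3, 3))
P = np.eye(3)
for L in range(200):        # partial sum  sum_{L=0}^{199} R^L
    S += P
    P = P @ R

# S now agrees with (I - R)^{-1} = J^{-1}; its (i,j) entry sums the
# weights R^w over all walks from i to j.
err = np.max(np.abs(S - np.linalg.inv(J)))
```

The entry-wise agreement is exactly the statement that K_ij is the sum of walk weights from i to j.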
Walk-Sum Interpretation of GaBP




   Combine interpretation of BP as exact inference on computation
   tree with walk-sum interpretation of Gaussian inference in trees:
       messages represent walk-sums in subtrees of computation tree
       Gauss BP converges in walk-summable models
       complete walk-sum for the means
       incomplete walk-sum for the variances
Complete Walk-Sum for Means


  Every walk in G ending at a node i maps to a walk of the
  computation tree Ti (ending at root node of Ti )...

   [Figure: a 3×3 grid graph and the computation tree T_1 rooted at node 1;
   a walk in G ending at node 1 lifts to a walk ending at the root]


  Gaussian BP converges to the correct means in WS models.
Incomplete Walk-Sum for Variances


    Only the totally backtracking walks of G can be embedded as
    closed walks in the computation tree...

    [Figure: a 3×3 grid graph and the computation tree rooted at node 1;
    closed walks around a loop of G have no closed image in the tree]


   Gaussian BP converges to incorrect variance estimates
   (underestimate in non-negative model).
Zeta Function and Orbit-Product

   What about the determinant?
   Definition of Orbits:
       A walk is closed if it begins and ends at the same vertex.
       It is primitive if it does not repeat a shorter walk.
       Two primitive walks are equivalent if one is a cyclic shift of
       the other.
       Define orbits ℓ ∈ L of G to be equivalence classes of closed,
       primitive walks.
   Theorem. Let Z ≜ det(I − R)⁻¹. If ρ(|R|) < 1 then

            Z = ∏_{ℓ∈L} (1 − R^ℓ)⁻¹ ≜ ∏_{ℓ∈L} Z_ℓ.

   A kind of zeta function in graph theory.
Zbp as Totally-Backtracking Orbit-Product

   Definition of Totally-Backtracking Orbits:
       An orbit ℓ is reducible if it contains backtracking steps ...(ij)(ji)...,
       else it is irreducible (or backtrackless).
       Every orbit ℓ has a unique irreducible core γ = Γ(ℓ) obtained
       by iteratively deleting pairs of backtracking steps until no more
       remain. Let L_γ denote the set of all orbits that reduce to γ.
       An orbit ℓ is totally backtracking (or trivial) if it reduces to the
       empty orbit, Γ(ℓ) = ∅; else it is non-trivial.

   Theorem. If ρ(|R|) < 1 then Z_bp (defined earlier) is equal to the
   totally-backtracking orbit-product:

            Z_bp = ∏_{ℓ∈L_∅} Z_ℓ
Orbit-Product Correction and Error Bound

   Orbit-product correction to Z_bp:

            Z = Z_bp ∏_{ℓ∉L_∅} Z_ℓ

   Error Bound: the missing orbits must all involve cycles of the graph...

            (1/n) log(Z / Z_bp) ≤ ρ^g / (g (1 − ρ))

   where ρ ≜ ρ(|R|) < 1 and g is the girth of the graph (length of the
   shortest cycle).
Reduction to Backtrackless Orbit-Product Correction

   We may reduce the orbit-product correction to one over just the
   backtrackless orbits γ:

            Z = Z_bp ∏_γ ( ∏_{ℓ∈L(γ)} Z_ℓ ) = Z_bp ∏_γ Z′_γ

   with modified orbit-factors Z′_γ based on GaBP:

            Z′_γ = (1 − ∏_{(ij)∈γ} r′_ij)⁻¹   where   r′_ij ≜ (1 − α_{i∖j})⁻¹ r_ij

   The factor (1 − α_{i∖j})⁻¹ serves to reconstruct the totally-backtracking
   walks at each point i along the backtrackless orbit γ.
Backtrackless Determinant Correction
   Define the backtrackless graph G′ of G as follows: nodes of G′
   correspond to directed edges of G, with edges (ij) → (jk) for k ≠ i.

   [Figure: a 3×3 grid graph and its backtrackless graph G′, whose nodes
   are the directed edges 12, 21, 14, 41, ... of the grid]




   Let R′ be the adjacency matrix of G′ with the modified edge-weights r′
   based on GaBP. Then,

            Z = Z_bp det(I − R′)⁻¹
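The backtrackless-graph construction itself is mechanical. A sketch (not from the talk; the edge list is a made-up example, and unit weights stand in for the GaBP-modified r′): build the 0/1 adjacency of G′ from an undirected edge list.

```python
import numpy as np

undirected = [(0, 1), (1, 2), (2, 0), (2, 3)]       # a small loopy graph
# Nodes of G' are the directed edges ("darts") of G.
darts = [(i, j) for (i, j) in undirected] + [(j, i) for (i, j) in undirected]
idx = {e: k for k, e in enumerate(darts)}

A = np.zeros((len(darts), len(darts)))
for (i, j) in darts:
    for (jj, k) in darts:
        if jj == j and k != i:                      # (ij) -> (jk), no U-turn
            A[idx[(i, j)], idx[(jj, k)]] = 1.0

# Sanity check: dart (i,j) has out-degree deg(j) - 1 in G'.
deg = {v: sum(1 for (a, b) in darts if a == v) for v in range(4)}
out = A.sum(axis=1)
```

Replacing the 1.0 entries by the weights r′_ij from the previous slide gives the matrix R′ used in the determinant correction.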
Region-Based Estimates/Corrections
   Select a set of regions R ⊂ 2^V that is closed under intersections
   and covers all vertices and edges of G.
   Define region counts (n_A ∈ Z, A ∈ R) by the inclusion-exclusion rule:

            n_A = 1 − ∑_{B∈R : A⊊B} n_B

   To capture all orbits covered by any region (without over-counting)
   we calculate the estimate:

            Z_R ≜ ∏_B Z_B^{n_B} = ∏_B (det(I − R_B)⁻¹)^{n_B}

   Error Bounds. Select regions to cover all orbits up to length L.
   Then,

            (1/n) log(Z / Z_R) ≤ ρ^L / (L (1 − ρ))
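The inclusion-exclusion rule for the counts n_A can be evaluated by processing regions from largest to smallest. A sketch (not from the talk) on the BP-style region set for K4 used in the toy example later, i.e. all six edges plus all four vertices:

```python
from itertools import combinations

verts = range(4)
# Regions of K4: the 6 edges (size-2 sets) and the 4 vertices.
regions = [frozenset(e) for e in combinations(verts, 2)] \
        + [frozenset({v}) for v in verts]

# n_A = 1 - sum of n_B over strict supersets B of A in the region set.
# Process largest regions first so every superset is counted before A.
n = {}
for A in sorted(regions, key=len, reverse=True):
    n[A] = 1 - sum(n[B] for B in regions if A < B)

# Edges have no strict supersets, so n = +1; each vertex of K4 lies in
# 3 edges, so n = 1 - 3 = -2, matching the BP-regions picture later on.
```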
Example: 2-D Grids

   Choice of regions for grids: overlapping L × L, (L/2) × L, L × (L/2),
   and (L/2) × (L/2) blocks (shifted by L/2).

   For example, in a 6 × 6 grid with block size L = 4:

   [Figure: the three overlapping block types, with counts
   n = +1, n = −1, n = +1]
   256 × 256 periodic grid, uniform edge weights r ∈ [0, 0.25];
   tested with L = 2, 4, 8, 16, 32.

   [Figure: top left, ρ(|R|) and ρ(|R′|) versus r; top right,
   n⁻¹ log Z_true, n⁻¹ log Z_bp and n⁻¹ log Z_B (L = 2, 4, 8, ...)
   versus r; bottom left, n⁻¹ log Z_true, n⁻¹ log Z_bp and
   n⁻¹ log Z_bp Z′_B versus r; bottom right, errors
   n⁻¹ |log Z_true⁻¹ Z_B| and n⁻¹ |log Z_true⁻¹ Z_bp Z′_B| versus L.]
Generalized Belief Propagation
   Select a set of regions R ⊂ 2^V that is closed under intersections
   and covers all vertices and edges of G.
   Define region counts (n_A ∈ Z, A ∈ R) by the inclusion-exclusion rule:

            n_A = 1 − ∑_{B∈R : A⊊B} n_B

   Then, GBP solves for a saddle point of

            Z_R(ψ) ≜ ∏_{A∈R} Z(ψ_A)^{n_A}

   over reparameterizations {ψ_A, A ∈ R} of the form

            P(x) = (1/Z) ∏_{A∈R} ψ_A(x_A)^{n_A}

   Denote the saddle point by Z_gbp = Z_R(ψ^gbp).
Example: 2-D Grid Revisited

   [Figure: free-energy error (log scale, 10⁻² down to 10⁻¹⁶) versus
   block size (4 to 16) for the block estimate and the GBP estimate]
GBP Toy Example
  Look at graph G = K4 and consider different choices of regions...

   [Figure: two drawings of the complete graph K4 on vertices 1, 2, 3, 4]

  BP Regions:

  [Figure: the six edges of K4 (n = +1) and its four vertices (n = −2)]
GBP “3∆” Regions:

  [Figure: three triangles 124, 123, 134 (n = +1), their pairwise
  intersection edges 12, 14, 13 (n = −1), and vertex 1 (n = +1)]
GBP “4∆” Regions:

  [Figure: all four triangles 124, 123, 134, 234 (n = +1), all six
  edges (n = −1), and all four vertices (n = +1)]
Computational experiment with equal edge weights r = 0.32 (the model
becomes singular/indefinite for r ≥ 1/3):

                            Z = 10.9
                         Z_bp = 2.5
                    Z_gbp(3∆) = 9.9
                    Z_gbp(4∆) = 54.4 (!)

GBP with 3∆ regions is a big improvement over BP (GBP captures
more orbits).
What went wrong with the 4∆ method?
Orbit-Product Interpretation of GBP
   Answer: sometimes GBP can over-count orbits of the graph.
       Let T(R) be the set of hypertrees T one may construct from
       the regions R.
       Orbit ℓ spans T if we can embed ℓ in T but cannot embed it
       in any sub-hypertree of T.
       Let g_ℓ ≜ #{T ∈ T(R) | ℓ spans T}.
   Orbit-Product Interpretation of GBP:

            Z_gbp = ∏_ℓ Z_ℓ^{g_ℓ}

   Remark. GBP may also include multiples of an orbit as
   independent orbits (these do not appear in Z).
   We say GBP is consistent if g_ℓ ≤ 1 for all (primitive) orbits and
   g_ℓ = 0 for multiples of orbits (no over-counting).
Examples of Over-Counting
   Orbit ℓ = [(12)(23)(34)(41)]:

   [Figure: the 4-cycle orbit drawn in K4 and its embeddings in
   hypertrees built from the triangle regions]

   Orbit ℓ = [(12)(23)(34)(42)(21)]:

   [Figure: the orbit drawn in K4 and its embeddings in hypertrees
   built from the triangle regions]
Conclusion and Future Work



   A graphical view of inference in walk-summable Gaussian graphical
   models that is very intuitive for understanding iterative inference
   algorithms and approximation methods.
   Future Work:
       many open questions on GBP
       multiscale methods to approximate longer orbits from a
       coarse-grained model
       beyond walk-summability?

More Related Content

PDF
A new class of a stable implicit schemes for treatment of stiff
PDF
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
PDF
Paraproducts with general dilations
PDF
Mesh Processing Course : Mesh Parameterization
PDF
04 structured prediction and energy minimization part 1
PDF
B. Sazdovic - Noncommutativity and T-duality
PDF
Fuzzy directed divergence and image segmentation
PDF
Tales on two commuting transformations or flows
A new class of a stable implicit schemes for treatment of stiff
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
Paraproducts with general dilations
Mesh Processing Course : Mesh Parameterization
04 structured prediction and energy minimization part 1
B. Sazdovic - Noncommutativity and T-duality
Fuzzy directed divergence and image segmentation
Tales on two commuting transformations or flows

What's hot (20)

PDF
A Coq Library for the Theory of Relational Calculus
PDF
Multilinear singular integrals with entangled structure
PDF
Holographic Cotton Tensor
PDF
Dual Gravitons in AdS4/CFT3 and the Holographic Cotton Tensor
PDF
Reflect tsukuba524
PDF
Session 6
PDF
Montpellier Math Colloquium
PDF
Classification with mixtures of curved Mahalanobis metrics
PDF
Theory of Relational Calculus and its Formalization
PDF
Algebras for programming languages
PDF
A Szemerédi-type theorem for subsets of the unit cube
PDF
Engr 371 final exam april 2010
PDF
A T(1)-type theorem for entangled multilinear Calderon-Zygmund operators
PDF
Measures of risk on variability with application in stochastic activity networks
PDF
Geodesic Method in Computer Vision and Graphics
PDF
H function and a problem related to a string
PDF
Divergence center-based clustering and their applications
PDF
Engr 371 final exam april 2006
PDF
Mesh Processing Course : Multiresolution
PDF
Gp2613231328
A Coq Library for the Theory of Relational Calculus
Multilinear singular integrals with entangled structure
Holographic Cotton Tensor
Dual Gravitons in AdS4/CFT3 and the Holographic Cotton Tensor
Reflect tsukuba524
Session 6
Montpellier Math Colloquium
Classification with mixtures of curved Mahalanobis metrics
Theory of Relational Calculus and its Formalization
Algebras for programming languages
A Szemerédi-type theorem for subsets of the unit cube
Engr 371 final exam april 2010
A T(1)-type theorem for entangled multilinear Calderon-Zygmund operators
Measures of risk on variability with application in stochastic activity networks
Geodesic Method in Computer Vision and Graphics
H function and a problem related to a string
Divergence center-based clustering and their applications
Engr 371 final exam april 2006
Mesh Processing Course : Multiresolution
Gp2613231328
Ad

Viewers also liked (10)

PDF
C04922125
PDF
Efficient Belief Propagation in Depth Finding
PPTX
A Movement Recognition Method using LBP
PDF
02 probabilistic inference in graphical models
PDF
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs
PDF
Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model
PDF
Performance comparison of eg ldpc codes
PPT
Ece221 Ch7 Part1
PDF
Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junctio...
PDF
Counters
C04922125
Efficient Belief Propagation in Depth Finding
A Movement Recognition Method using LBP
02 probabilistic inference in graphical models
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs
Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model
Performance comparison of eg ldpc codes
Ece221 Ch7 Part1
Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junctio...
Counters
Ad

Similar to Physics of Algorithms Talk (20)

PDF
ma112011id535
PDF
Jackknife algorithm for the estimation of logistic regression parameters
PDF
YSC 2013
PDF
On Foundations of Parameter Estimation for Generalized Partial Linear Models ...
PDF
Slides2 130201091056-phpapp01
PDF
Bayesian case studies, practical 2
PDF
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trend
PDF
Computation of the marginal likelihood
PDF
Parameter Estimation for Semiparametric Models with CMARS and Its Applications
PDF
Bayesian Methods for Machine Learning
PDF
icml2004 tutorial on bayesian methods for machine learning
PDF
Application of matrix algebra to multivariate data using standardize scores
PDF
11.application of matrix algebra to multivariate data using standardize scores
PDF
Image denoising
PDF
Multilayer Neural Networks
PDF
Chapter 3 projection
PDF
Jsm09 talk
PDF
Lagrange
PDF
Approximating Bayes Factors
PPTX
Linear prediction
ma112011id535
Jackknife algorithm for the estimation of logistic regression parameters
YSC 2013
On Foundations of Parameter Estimation for Generalized Partial Linear Models ...
Slides2 130201091056-phpapp01
Bayesian case studies, practical 2
CVPR2010: Advanced ITinCVPR in a Nutshell: part 7: Future Trend
Computation of the marginal likelihood
Parameter Estimation for Semiparametric Models with CMARS and Its Applications
Bayesian Methods for Machine Learning
icml2004 tutorial on bayesian methods for machine learning
Application of matrix algebra to multivariate data using standardize scores
11.application of matrix algebra to multivariate data using standardize scores
Image denoising
Multilayer Neural Networks
Chapter 3 projection
Jsm09 talk
Lagrange
Approximating Bayes Factors
Linear prediction

Physics of Algorithms Talk

  • 1. Orbit-Product Analysis of (Generalized) Gaussian Belief Propagation Jason Johnson, Post-Doctoral Fellow, LANL Joint work with Michael Chertkov and Vladimir Chernyak Physics of Algorithms Workshop Santa Fe, New Mexico September 3, 2009
  • 2. Overview Introduction graphical models + belief propagation specialization to Gaussian model Analysis of Gaussian BP walk-sum analysis for means, variances, covariances1 orbit-product analysis/corrections for determinant2 Current Work on Generalized Belief Propagation (GBP) [Yedidia et al] uses larger “regions” to capture more walks/orbits of the graph (better approximation) However, it can also lead to over-counting of walks/orbits (bad approximation/unstable algorithm)! 1 Earlier joint work with Malioutov & Willsky (NIPS, JMLR ’06). 2 Johnson, Chernyak & Chertkov (ICML ’09).
  • 3. Graphical Models A graphical model is a multivariate probability distribution that is expressed in terms of interactions among subsets of variables (e.g. pairwise interactions on the edges of a graph G ). 1 P(x) = ψi (xi ) ψij (xi , xj ) Z i∈V {i,j}∈G Markov property: A S B P(xA , xB |xS ) = P(xA |xS )P(xB |xS ) Given the potential functions ψ, the goal of inference is to compute marginals P(xi ) = xV i P(x) or the normalization constant Z , which is generally difficult in large, complex graphical models.
  • 4. Gaussian Graphical Model Information form of Gaussian density. P(x) ∝ exp − 2 x T Jx + hT x 1 Gaussian graphical model: sparse J matrix Jij = 0 if and only if {i, j} ∈ G Potentials: 1 2 ψi (xi ) = e − 2 Jii xi +hi xi ψij (xi , xj ) = e −Jij xi xj Inference corresponds to calculation of mean vector µ = J −1 h, covariance matrix K = J −1 or determinant Z = det J −1 . Marginals P(xi ) specified by means µi and variances Kii .
  • 5. Belief Propagation Belief Propagation iteratively updates a set of messages µi→j (xj ) defined on directed edges of the graph G using the rule: µi→j (xj ) ∝ ψi (xi ) µk→i (xi )ψ(xi , xj ) xi k∈N(i)j Iterate message updates until converges to a fixed point. Marginal Estimates: combine messages at a node 1 P(xi ) = ψi (xi ) µk→i (xi ) Zi k∈N(i) ˜ ψi (xi )
  • 6. Belief Propagation II Pairwise Estimates (on edges of graph): 1 ˜ ˜ ψ(xi , xj ) P(xi , xj ) = ψi (xi )ψj (xj ) Zij µi→j (xj )µj→i (xi ) ˜ ψij (xi ,xj ) Estimate of Normalization Constant: Zij Z bp = Zi Zi Zj i∈V {i,j}∈G BP fixed point is saddle point of RHS with respect to messages/reparameterizations. In trees, BP converges in finite number of steps and is exact (equivalent to variable elimination).
  • 7. Gaussian Belief Propagation (GaBP) 1 Messages µi→j (xj ) ∝ exp{ 2 αi→j xj2 + βi→j xj }. BP fixed-point equations reduce to: αi→j = Jij (Jii − αij )−1 2 βi→j = −Jij (Jii − αij )−1 (hi + βij ) where αij = k∈N(i)j αk→i and βij = k∈N(i)j αk→i . Marginals specified by: Kibp = (Jii − αk→i )−1 k∈N(i) µbp i = Kibp (hi + βk→i ) k∈N(i)
  • 8. Gaussian BP Determinant Estimate Estimates of pairwise covariance on edges: −1 bp Jii − αij Jij K(ij) = Jij Jjj − αji Estimate of Z det K = det J −1 : Zij Z bp = Zi Zi Zj i∈V {i,j}∈G where Zi = Kibp and Zij = det K(ij) . bp Exact in tree models (equivalent to Gaussian elimination), approximate in loopy models.
  • 9. The BP Computation Tree BP marginal estimates are equivalent to the exact marginal in a tree-structured model [Weiss & Freeman]. (4) 1 µ2→1 1 2 3 (3) 2 4 µ3→2 3 5 5 7 4 5 6 (2) µ6→3 6 4 6 8 2 6 8 8 (1) µ5→6 7 8 9 5 9 1 7 3 9 7 9 1 3 3 9 7 9 5 9 The BP messages correspond to upwards variable elimination steps in this computation tree.
  • 10. Walk-Summable Gaussian Models ∞ Let J = I − R. If ρ(R) < 1 then (I − R)−1 = L L=0 R . Walk-Sum interpretation of inference: ∞ ? Kij = Rw = Rw L=0 L w :i→j w :i →j ∞ ? µi = hj Rw = h∗ R w j L=0 L w :∗→i w :j →i Walk-Summable if w :i→j |R w | converges for all i, j. Absolute convergence implies convergence of walk-sums (to same value) for arbitrary orderings and partitions of the set of walks. Equivalent to ρ(|R|) < 1.
  • 11. Walk-Sum Interpretation of GaBP Combine interpretation of BP as exact inference on computation tree with walk-sum interpretation of Gaussian inference in trees: messages represent walk-sums in subtrees of computation tree Gauss BP converges in walk-summable models complete walk-sum for the means incomplete walk-sum for the variances
  • 12. Complete Walk-Sum for Means Every walk in G ending at a node i maps to a walk of the computation tree Ti (ending at root node of Ti )... 1 1 2 3 2 4 4 5 6 3 5 5 7 6 4 6 8 2 6 8 8 7 8 9 5 9 1 7 3 9 7 9 1 3 3 9 7 9 5 9 Gaussian BP converges to the correct means in WS models.
  • 13. Incomplete Walk-Sum for Variances Only those totally backtracking walks of G can be embedded as closed walks in the computation tree... 1 1 2 3 2 4 4 5 6 3 5 5 7 6 4 6 8 2 6 8 8 7 8 9 5 9 1 7 3 9 7 9 1 3 3 9 7 9 5 9 Gaussian BP converges to incorrect variance estimates (underestimate in non-negative model).
  • 14. Zeta Function and Orbit-Product What about the determinant? Definition of Orbits: A walk is closed if it begins and ends at same vertex. It is primitive if does not repeat a shorter walk. Two primitive walks are equivalent if one is a cyclic shift of the other. Define orbits ∈ L of G to be equivalence classes of closed, primitive walks. Theorem. Let Z det(I − R)−1 . If ρ(|R|) < 1 then Z= (1 − R )−1 Z. A kind of zeta function in graph theory.
• 15. Zbp as Totally-Backtracking Orbit-Product Definition of totally-backtracking orbits: an orbit is reducible if it contains backtracking steps ...(ij)(ji)..., else it is irreducible (or backtrackless). Every orbit ℓ has a unique irreducible core γ = Γ(ℓ), obtained by iteratively deleting pairs of backtracking steps until no more remain. Let L_γ denote the set of all orbits that reduce to γ. An orbit is totally backtracking (or trivial) if it reduces to the empty orbit, Γ(ℓ) = ∅; else it is non-trivial. Theorem. If ρ(|R|) < 1 then Z_bp (defined earlier) is equal to the totally-backtracking orbit-product: Z_bp = ∏_{ℓ∈L_∅} Z_ℓ.
• 16. Orbit-Product Correction and Error Bound Orbit-product correction to Z_bp: Z = Z_bp ∏_{ℓ∉L_∅} Z_ℓ. Error bound: the missing orbits must all involve cycles of the graph, so (1/n) log(Z/Z_bp) ≤ ρ^g / (g(1 − ρ)), where ρ ≜ ρ(|R|) < 1 and g is the girth of the graph (length of its shortest cycle).
• 17. Reduction to Backtrackless Orbit-Product Correction We may reduce the orbit-product correction to one over just the backtrackless orbits γ: Z = Z_bp ∏_γ (∏_{ℓ∈L(γ)} Z_ℓ) = Z_bp ∏_γ Z'_γ, with modified orbit-factors based on GaBP: Z'_γ = (1 − ∏_{(ij)∈γ} r'_ij)^{-1}, where r'_ij ≜ (1 − α_ij)^{-1} r_ij. The factor (1 − α_ij)^{-1} serves to reconstruct the totally-backtracking walks at each point i along the backtrackless orbit γ.
• 18. Backtrackless Determinant Correction Define the backtrackless graph G' of G as follows: the nodes of G' correspond to the directed edges of G, with edges (ij) → (jk) for k ≠ i. [Figure: the 3×3 grid and its backtrackless graph on the directed edges 12, 21, 23, 32, …, 98, 89.] Let R' be the adjacency matrix of G' with modified edge weights r' based on GaBP. Then Z = Z_bp det(I − R')^{-1}.
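A small sketch of the backtrackless-graph construction (assumed example: a triangle with uniform weight r, using the raw weights rather than the GaBP-modified r'_ij, which we do not reconstruct here). The nodes of G' are the directed edges of G, and steps (i,j) → (j,k) are allowed only for k ≠ i:

```python
import numpy as np

# Backtrackless graph G' of a triangle G: 6 directed edges, with an
# edge (i,j) -> (j,k) whenever k != i (no immediate reversal).
edges = [(0, 1), (1, 2), (2, 0)]
r = 0.2
directed = edges + [(j, i) for (i, j) in edges]
idx = {e: t for t, e in enumerate(directed)}

m = len(directed)
Rp = np.zeros((m, m))
for (i, j) in directed:
    for (a, k) in directed:
        if a == j and k != i:            # non-backtracking step
            Rp[idx[(i, j)], idx[(a, k)]] = r

print(np.linalg.det(np.eye(m) - Rp))
```

For the triangle the only backtrackless orbits are the two orientations of the cycle, each of weight r^3, so det(I − R') factors as (1 − r^3)^2, matching the orbit-product over backtrackless orbits.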
• 19. Region-Based Estimates/Corrections Select a set of regions R ⊂ 2^V that is closed under intersections and covers all vertices and edges of G. Define the region counts (n_A ∈ Z, A ∈ R) by the inclusion-exclusion rule: n_A = 1 − Σ_{B∈R: B⊋A} n_B. To capture all orbits covered by any region (without over-counting) we calculate the estimate Z_R ≜ ∏_{B∈R} Z_B^{n_B} = ∏_{B∈R} (det(I − R_B)^{-1})^{n_B}. Error bound: select regions that cover all orbits up to length L; then (1/n) log(Z/Z_R) ≤ ρ^L / (L(1 − ρ)).
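The inclusion-exclusion counts are easy to compute by processing regions from largest to smallest. A sketch on an assumed toy region set (the K4 "BP regions" of the later slides: six edges plus four nodes):

```python
# Region counts n_A = 1 - sum_{B in R, B strictly contains A} n_B.
edges = [frozenset(e) for e in
         [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]]
nodes = [frozenset([i]) for i in range(1, 5)]
regions = edges + nodes

counts = {}
for A in sorted(regions, key=len, reverse=True):   # largest first
    counts[A] = 1 - sum(counts[B] for B in counts if A < B)

print([counts[A] for A in edges], [counts[A] for A in nodes])
```

Each edge gets n = +1 and each node n = −2 (it lies in three edges), so the counts over regions containing any fixed vertex sum to 3 − 2 = 1, as required.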
• 20. Example: 2-D Grids Choice of regions for grids: overlapping L × L blocks shifted by L/2, together with their intersections — the L × L/2, L/2 × L and L/2 × L/2 blocks. For example, in a 6 × 6 grid with block size L = 4: the blocks get count n = +1, their pairwise overlaps n = −1, and the overlaps of overlaps n = +1.
• 21. 256 × 256 Periodic Grid, uniform edge weights r ∈ [0, .25]. Tested with L = 2, 4, 8, 16, 32. [Figure: four panels — ρ(|R|) and ρ(|R'|) versus r; n^{-1} log Z_true, n^{-1} log Z_bp and n^{-1} log Z_B (L = 2, 4, 8, …) versus r; and the errors n^{-1}|log Z_true^{-1} Z_B| and n^{-1}|log Z_true^{-1} Z_bp Z'_B| versus r and versus L.]
• 22. Generalized Belief Propagation Select a set of regions R ⊂ 2^V that is closed under intersections and covers all vertices and edges of G. Define the region counts (n_A ∈ Z, A ∈ R) by the inclusion-exclusion rule: n_A = 1 − Σ_{B∈R: B⊋A} n_B. Then GBP solves for a saddle point of Z_R(ψ) ≜ ∏_{A∈R} Z(ψ_A)^{n_A} over reparameterizations {ψ_A, A ∈ R} of the form P(x) = (1/Z) ∏_{A∈R} ψ_A(x_A)^{n_A}. Denote the saddle point by Z_gbp = Z_R(ψ^gbp).
• 23. Example: 2-D Grid Revisited [Figure: free-energy error versus block size (4 to 16) for the block estimate and the GBP estimate.]
• 24. GBP Toy Example Look at the graph G = K4 and consider different choices of regions... [Figure: K4 on nodes 1–4.] BP regions: the six edges {12, 13, 14, 23, 24, 34} with n = +1 and the four nodes with n = −2.
• 25. GBP "3∆" Regions: the three triangles {124, 123, 134} with n = +1, their pairwise intersections — the edges {12, 13, 14} — with n = −1, and their common node 1 with n = +1. GBP "4∆" Regions: all four triangles {123, 124, 134, 234} with n = +1, all six edges with n = −1, and all four nodes with n = +1.
• 26. Computational Experiment with equal edge weights r = .32 (the model becomes singular/indefinite for r ≥ 1/3): Z = 10.9, Zbp = 2.5, Zgbp(3∆) = 9.9, Zgbp(4∆) = 54.4!!! GBP with 3∆ regions is a big improvement over BP (GBP captures more orbits). What went wrong with the 4∆ method?
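The exact value on this slide is easy to verify. The adjacency eigenvalues of K4 are {3, −1, −1, −1}, so Z = det(I − rA)^{-1} = 1/((1 − 3r)(1 + r)^3), which diverges as r → 1/3:

```python
import numpy as np

# Exact partition function for K4 with equal edge weights r = 0.32.
r = 0.32
A = np.ones((4, 4)) - np.eye(4)          # K4 adjacency matrix
Z = 1.0 / np.linalg.det(np.eye(4) - r * A)
print(round(Z, 1))                       # matches the slide's Z = 10.9
```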
• 27. Orbit-Product Interpretation of GBP Answer: sometimes GBP can over-count orbits of the graph. Let T(R) be the set of hypertrees T one may construct from the regions R. An orbit ℓ spans T if we can embed ℓ in T but cannot embed it in any sub-hypertree of T. Let g_ℓ ≜ #{T ∈ T(R) | ℓ spans T}. Orbit-product interpretation of GBP: Z_gbp = ∏_ℓ Z_ℓ^{g_ℓ}. Remark: GBP may also include multiples of an orbit as independent orbits (these are not counted by Z). We say GBP is consistent if g_ℓ ≤ 1 for all (primitive) orbits and g_ℓ = 0 for multiples of orbits (no over-counting).
• 28. Examples of Over-Counting [Figure: the orbit ℓ = [(12)(23)(34)(41)] embedded in more than one hypertree built from the 4∆ regions, and likewise the orbit ℓ = [(12)(23)(34)(42)(21)].]
• 29. Conclusion and Future Work A graphical view of inference in walk-summable Gaussian graphical models that is very intuitive for understanding iterative inference algorithms and approximation methods. Future work: many open questions on GBP; a multiscale method to approximate longer orbits from a coarse-grained model; beyond walk-summability?