Machine Learning
Exact Inference
Eric Xing
Lecture 11, August 14, 2010
[Title-slide figure: graph-elimination snapshots on a model with nodes A–H, annotated with the intermediate messages m_b, m_c, m_d, m_e, m_f, m_g, m_h]
Reading:
Inference and Learning
 We now have compact representations of probability distributions: graphical models (GMs)
 A BN M describes a unique probability distribution P
 Typical tasks:
 Task 1: How do we answer queries about P?
 We use inference as a name for the process of computing answers to such queries
 Task 2: How do we estimate a plausible model M from data D?
i. We use learning as a name for the process of obtaining a point estimate of M.
ii. But Bayesians seek p(M | D), which is actually an inference problem.
iii. When not all variables are observable, even computing a point estimate of M requires inference to impute the missing data.
Inferential Query 1: Likelihood
 Most of the queries one may ask involve evidence
 Evidence x_V is an assignment of values to a set X_V of nodes in the GM over the variable set X = {X_1, X_2, …, X_n}
 Without loss of generality X_V = {X_{k+1}, …, X_n}
 Write X_H = X \ X_V for the set of hidden variables; X_H can be ∅ or X
 Simplest query: compute the probability of the evidence

    P(x_V) = \sum_{x_H} P(X_H = x_H, x_V) = \sum_{x_1} \cdots \sum_{x_k} P(x_1, …, x_k, x_V)

 this is often referred to as computing the likelihood of x_V
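As a concrete illustration of this summation (not part of the original slides), the sketch below computes the likelihood of evidence in a toy chain BN A → B → C by brute-force marginalization over the hidden variables; all CPT values are made up.

```python
import numpy as np

# Hypothetical CPTs for a binary chain A -> B -> C (values are illustrative only)
p_a = np.array([0.6, 0.4])                      # P(A)
p_b_given_a = np.array([[0.7, 0.3],             # P(B | A=0)
                        [0.2, 0.8]])            # P(B | A=1)
p_c_given_b = np.array([[0.9, 0.1],
                        [0.5, 0.5]])            # P(C | B)

def likelihood_of_evidence(c_obs):
    """P(C = c_obs) by summing the joint over the hidden variables A and B."""
    total = 0.0
    for a in range(2):
        for b in range(2):
            total += p_a[a] * p_b_given_a[a, b] * p_c_given_b[b, c_obs]
    return total

print(likelihood_of_evidence(1))  # P(C = 1)
```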
Inferential Query 2: Conditional Probability
 Often we are interested in the conditional probability distribution of a variable given the evidence
    P(X_H | x_V) = P(X_H, x_V) / P(x_V) = P(X_H, x_V) / \sum_{x_H} P(X_H = x_H, x_V)

 this is the a posteriori belief in X_H, given evidence x_V
 We usually query a subset Y of all hidden variables X_H = {Y, Z} and "don't care" about the remaining Z:

    P(Y | x_V) = \sum_{z} P(Y, Z = z | x_V)
 the process of summing out the "don't care" variables z is called marginalization, and the resulting P(Y | x_V) is called a marginal probability.
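To make the conditioning and marginalization steps concrete (my own example, not from the slides), the snippet below slices a joint table at the evidence, sums out the don't-care variable, and renormalizes.

```python
import numpy as np

# Hypothetical joint table P(Y, Z, X) over three binary variables (illustrative values)
joint = np.random.dirichlet(np.ones(8)).reshape(2, 2, 2)  # axes: Y, Z, X

def posterior_marginal(joint, x_obs):
    """P(Y | X = x_obs): slice in the evidence, sum out Z, renormalize."""
    sliced = joint[:, :, x_obs]          # P(Y, Z, X = x_obs)
    marginal = sliced.sum(axis=1)        # sum out the "don't care" variable Z
    return marginal / marginal.sum()     # divide by P(X = x_obs)

print(posterior_marginal(joint, x_obs=1))
```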
Applications of a posteriori Belief
 Prediction: what is the probability of an outcome given the starting condition?
 the query node is a descendant of the evidence
 Diagnosis: what is the probability of a disease/fault given symptoms?
 the query node is an ancestor of the evidence
 Learning under partial observation
 fill in the unobserved values under an "EM" setting (more later)
 The directionality of information flow between variables is not restricted by the directionality of the edges in a GM
 probabilistic inference can combine evidence from all parts of the network
Inferential Query 3: Most Probable Assignment
 In this query we want to find the most probable joint assignment (MPA) for some variables of interest
 Such reasoning is usually performed under some given evidence x_V, and ignoring (the values of) other variables Z:

    Y^* | x_V = \arg\max_y P(Y = y | x_V) = \arg\max_y \sum_{z} P(Y = y, Z = z | x_V)

 this is the maximum a posteriori configuration of Y.
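A brute-force version of this query (illustrative only, not from the slides): sum out the don't-care variable Z, then take the argmax over joint assignments of Y.

```python
import numpy as np
from itertools import product

# Hypothetical conditional joint P(Y1, Y2, Z | x_V) as a table (illustrative values)
cond = np.random.dirichlet(np.ones(8)).reshape(2, 2, 2)  # axes: Y1, Y2, Z

# MPA query for Y = (Y1, Y2): sum out Z, then argmax over joint assignments of Y
p_y = cond.sum(axis=2)                              # P(Y1, Y2 | x_V)
best = max(product(range(2), repeat=2), key=lambda y: p_y[y])
print("most probable assignment of (Y1, Y2):", best)
```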
Complexity of Inference
Thm: Computing P(X_H = x_H | x_V) in an arbitrary GM is NP-hard
 Hardness does not mean we cannot solve inference
 It implies that we cannot find a general procedure that works efficiently for arbitrary GMs
 For particular families of GMs, we can have provably efficient procedures
Approaches to inference
 Exact inference algorithms
 The elimination algorithm
 Belief propagation
 The junction tree algorithms (not covered in detail here)
 Approximate inference techniques
 Variational algorithms
 Stochastic simulation / sampling methods
 Markov chain Monte Carlo methods
Inference on General BN via Variable Elimination
General idea:
 Write the query in the form

    P(X_1, e) = \sum_{x_n} \cdots \sum_{x_3} \sum_{x_2} \prod_i P(x_i | pa_i)

 this suggests an "elimination order" of latent variables to be marginalized
 Iteratively
 Move all irrelevant terms outside of the innermost sum
 Perform the innermost sum, getting a new term
 Insert the new term into the product
 Wrap-up:

    P(X_1 | e) = P(X_1, e) / P(e),    P(e) = \sum_{x_1} P(x_1, e)
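The following is a minimal sketch of this procedure over table factors (my own illustration, not the lecture's code): factors touching the variable being eliminated are multiplied together, the variable is summed out, and the new term is put back on the list.

```python
import numpy as np
from functools import reduce

# A factor is (variables, table); table axes follow the variable order and
# every variable is binary for simplicity.

def multiply(f1, f2):
    """Pointwise product of two factors, aligning their variable sets."""
    vars1, t1 = f1
    vars2, t2 = f2
    all_vars = vars1 + [v for v in vars2 if v not in vars1]
    def expand(vars_, t):
        # permute the factor's axes into all_vars order, then broadcast missing vars
        shape = [2 if v in vars_ else 1 for v in all_vars]
        perm = [vars_.index(v) for v in all_vars if v in vars_]
        return t.transpose(perm).reshape(shape)
    return all_vars, expand(vars1, t1) * expand(vars2, t2)

def sum_out(factor, var):
    """Marginalize one variable out of a factor."""
    vars_, t = factor
    return [v for v in vars_ if v != var], t.sum(axis=vars_.index(var))

def eliminate(factors, order):
    """Eliminate hidden variables in the given order; return the final factor."""
    for var in order:
        involved = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        factors = rest + [sum_out(reduce(multiply, involved), var)]
    return reduce(multiply, factors)

# Tiny chain A -> B -> C with made-up CPTs; query P(C) by eliminating A then B.
pA = (['A'], np.array([0.6, 0.4]))
pB = (['A', 'B'], np.array([[0.7, 0.3], [0.2, 0.8]]))
pC = (['B', 'C'], np.array([[0.9, 0.1], [0.5, 0.5]]))
vars_, table = eliminate([pA, pB, pC], order=['A', 'B'])
print(vars_, table)   # P(C)
```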
Hidden Markov Model
[Figure: HMM with hidden chain y_1, y_2, …, y_T and emissions x_1, x_2, …, x_T]

    p(x, y) = p(x_1, …, x_T, y_1, …, y_T)
            = p(y_1) p(x_1 | y_1) p(y_2 | y_1) p(x_2 | y_2) ⋯ p(y_T | y_{T-1}) p(x_T | y_T)

Conditional probability:
Hidden Markov Model
[Figure: the same HMM chain y_1, …, y_T with emissions x_1, …, x_T]
Conditional probability:
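A small numeric illustration of the joint probability above (the parameters pi, A, B are made up for this sketch and are not from the slides):

```python
import numpy as np

# Hypothetical HMM parameters, 2 hidden states, 2 observation symbols
pi = np.array([0.5, 0.5])                # p(y_1)
A  = np.array([[0.9, 0.1], [0.2, 0.8]])  # A[i, j] = p(y_t = j | y_{t-1} = i)
B  = np.array([[0.7, 0.3], [0.1, 0.9]])  # B[i, k] = p(x_t = k | y_t = i)

def joint_prob(xs, ys):
    """p(x, y) = p(y1) p(x1|y1) prod_t p(y_t|y_{t-1}) p(x_t|y_t)."""
    p = pi[ys[0]] * B[ys[0], xs[0]]
    for t in range(1, len(xs)):
        p *= A[ys[t-1], ys[t]] * B[ys[t], xs[t]]
    return p

print(joint_prob(xs=[0, 1, 1], ys=[0, 0, 1]))
```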
A Bayesian network: a food web
[Figure: BN over nodes A–H; per the factorization used below, B→C, A→D, (C,D)→E, A→F, E→G, (E,F)→H]
What is the probability that hawks are leaving given that the grass condition is poor?
Example: Variable Elimination
 Query: P(A | h̃)
 Need to eliminate: B, C, D, E, F, G, H
 Initial factors:

    P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) P(g|e) P(h|e,f)

 Choose an elimination order: H, G, F, E, D, C, B
 Step 1:
 Conditioning (fix the evidence node h on its observed value h̃):

    m_h(e, f) = p(h̃ | e, f)

 This step is isomorphic to a marginalization step:

    m_h(e, f) = \sum_{h} p(h | e, f) \delta(h, h̃)
Example: Variable Elimination
 Query: P(A | h̃)
 Need to eliminate: B, C, D, E, F, G
 Current factors:

    P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) P(g|e) m_h(e, f)

 Step 2: Eliminate G
 Compute

    m_g(e) = \sum_{g} p(g | e) = 1

    ⇒ P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) m_g(e) m_h(e, f)
      = P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) m_h(e, f)
Example: Variable Elimination
 Query: P(A | h̃)
 Need to eliminate: B, C, D, E, F
 Current factors:

    P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) m_h(e, f)

 Step 3: Eliminate F
 Compute

    m_f(e, a) = \sum_{f} p(f | a) m_h(e, f)

    ⇒ P(a) P(b) P(c|b) P(d|a) P(e|c,d) m_f(e, a)
Example: Variable Elimination
 Query: P(A | h̃)
 Need to eliminate: B, C, D, E
 Current factors:

    P(a) P(b) P(c|b) P(d|a) P(e|c,d) m_f(e, a)

 Step 4: Eliminate E
 Compute

    m_e(a, c, d) = \sum_{e} p(e | c, d) m_f(e, a)

    ⇒ P(a) P(b) P(c|b) P(d|a) m_e(a, c, d)
Example: Variable Elimination
 Query: P(A | h̃)
 Need to eliminate: B, C, D
 Current factors:

    P(a) P(b) P(c|b) P(d|a) m_e(a, c, d)

 Step 5: Eliminate D
 Compute

    m_d(a, c) = \sum_{d} p(d | a) m_e(a, c, d)

    ⇒ P(a) P(b) P(c|b) m_d(a, c)
Example: Variable Elimination
 Query: P(A | h̃)
 Need to eliminate: B, C
 Current factors:

    P(a) P(b) P(c|b) m_d(a, c)

 Step 6: Eliminate C
 Compute

    m_c(a, b) = \sum_{c} p(c | b) m_d(a, c)

    ⇒ P(a) P(b) m_c(a, b)
Example: Variable Elimination
 Query: P(A | h̃)
 Need to eliminate: B
 Current factors:

    P(a) P(b) m_c(a, b)

 Step 7: Eliminate B
 Compute

    m_b(a) = \sum_{b} p(b) m_c(a, b)

    ⇒ P(a) m_b(a)
Example: Variable Elimination
 Query: P(A | h̃)
 Remaining factors:

    P(a) m_b(a)

 Step 8: Wrap-up

    p(a, h̃) = p(a) m_b(a),    p(h̃) = \sum_{a} p(a) m_b(a)

    P(a | h̃) = p(a) m_b(a) / \sum_{a} p(a) m_b(a)
Complexity of variable elimination
 Suppose in one elimination step we compute

    m_x(y_1, …, y_k) = \sum_{x} m'_x(x, y_1, …, y_k),  where  m'_x(x, y_1, …, y_k) = \prod_{i=1}^{k} m_i(x, y_{C_i})

 This requires
 k · |Val(X)| · \prod_i |Val(Y_{C_i})| multiplications
 ─ for each value of x, y_1, …, y_k, we do k multiplications
 |Val(X)| · \prod_i |Val(Y_{C_i})| additions
 ─ for each value of y_1, …, y_k, we do |Val(X)| additions
Complexity is exponential in the number of variables in the intermediate factor
Elimination Cliques
[Figure: the sequence of graphs produced while eliminating the food-web example, with the corresponding messages m_h(e, f), m_g(e), m_f(e, a), m_e(a, c, d), m_d(a, c), m_c(a, b), m_b(a)]
Understanding Variable Elimination
 A graph elimination algorithm
[Figure: moralization of the directed food-web graph, followed by the sequence of graphs produced by graph elimination]
 Intermediate terms correspond to the cliques resulting from elimination
 "good" elimination orderings lead to small cliques and hence reduce complexity (what will happen if we eliminate "e" first in the above graph?)
 finding the optimal ordering is NP-hard, but for many graphs an optimal or near-optimal ordering can often be found heuristically
 Applies to undirected GMs
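A small sketch of the graph-elimination view (my own illustration, not the lecture's code): given the moralized food-web graph and the ordering H, G, F, E, D, C, B, A, it records each elimination clique and adds the fill-in edges among the eliminated node's remaining neighbors.

```python
from itertools import combinations

def graph_eliminate(adj, order):
    adj = {v: set(ns) for v, ns in adj.items()}
    cliques = []
    for v in order:
        nbrs = adj[v]
        cliques.append({v} | nbrs)              # the elimination clique of v
        for a, b in combinations(nbrs, 2):      # connect remaining neighbors (fill-in)
            adj[a].add(b); adj[b].add(a)
        for n in nbrs:                          # remove v from the graph
            adj[n].discard(v)
        del adj[v]
    return cliques

# Moralized food-web graph from the example (undirected edges, incl. moral edges C-D, E-F)
edges = [('B','C'), ('A','D'), ('C','E'), ('D','E'), ('C','D'),
         ('A','F'), ('E','G'), ('E','H'), ('F','H'), ('E','F')]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v); adj.setdefault(v, set()).add(u)
print(graph_eliminate(adj, order=['H','G','F','E','D','C','B','A']))
```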
From Elimination to Belief Propagation
 Recall that induced dependency during marginalization is captured in elimination cliques
 Summation <-> elimination
 Intermediate term <-> elimination clique
[Figure: the elimination cliques from the food-web example]
 Can this lead to a generic inference algorithm?
Tree GMs
 Undirected tree: a unique path between any pair of nodes
 Directed tree: all nodes except the root have exactly one parent
 Polytree: can have multiple parents
Equivalence of directed and undirected trees
 Any undirected tree can be converted to a directed tree by choosing a root node and directing all edges away from it
 A directed tree and the corresponding undirected tree make the same conditional independence assertions
 Parameterizations are essentially the same.
 Undirected tree:
 Directed tree:
 Equivalence:
 Evidence: ?
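The formulas referenced by the last three bullets appeared only as images on the original slide; the following is a reconstruction of the standard tree parameterizations and should be read as my assumption of what was shown.

    % Undirected tree (root r, edge set E):
    p(x) = \frac{1}{Z} \prod_{i \in V} \psi(x_i) \prod_{(i,j) \in E} \psi(x_i, x_j)

    % Directed tree:
    p(x) = p(x_r) \prod_{(i,j) \in E} p(x_j \mid x_i)

    % Equivalence: take
    \psi(x_r) = p(x_r), \quad \psi(x_i) = 1 \ (i \neq r), \quad \psi(x_i, x_j) = p(x_j \mid x_i), \quad Z = 1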
From elimination to message passing
 Recall the ELIMINATION algorithm:
 Choose an ordering Z in which the query node f is the final node
 Place all potentials on an active list
 Eliminate node i by removing all potentials containing i, taking the sum/product over x_i.
Place the resultant factor back on the list
 For a TREE graph:
 Choose the query node f as the root of the tree
 View the tree as a directed tree rooted at f
 Elimination ordering based on depth-first traversal
 Elimination of each node can be considered as message-passing (or Belief Propagation) directly along tree branches, rather than on some transformed graph
 thus, we can use the tree itself as a data structure to do general inference!
Message passing for trees
Let m_ij(x_i) denote the factor resulting from eliminating variables from below up to i, which is a function of x_i:
[Figure: a tree fragment with root f, node i, child j, and grandchildren k, l]
This is reminiscent of a message sent from j to i.
m_ij(x_i) represents a "belief" of x_i from x_j!
 Elimination on trees is equivalent to message passing along tree branches!
[Figure: the same tree fragment, with messages flowing from the leaves toward the root f]
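The message update itself was shown only as an image on the slides; in standard form it is m_ji(x_i) = \sum_{x_j} \psi(x_j) \psi(x_i, x_j) \prod_{k \in children(j)} m_kj(x_j), and the recursive sketch below (my reconstruction, with made-up potentials) implements exactly that on a toy chain.

```python
import numpy as np

# psi_node[v] is a length-2 vector; psi_edge[(i, j)] is a 2x2 table indexed [x_i, x_j].
def message(j, i, children, psi_node, psi_edge):
    """Compute m_{j -> i}(x_i) recursively from the leaves up."""
    belief = psi_node[j].copy()
    for k in children[j]:
        belief *= message(k, j, children, psi_node, psi_edge)
    return psi_edge[(i, j)] @ belief        # sum over x_j

# Tiny 3-node chain rooted at node 0: 0 - 1 - 2, with made-up potentials
children = {0: [1], 1: [2], 2: []}
psi_node = {v: np.array([1.0, 1.0]) for v in children}
psi_edge = {(0, 1): np.array([[1.2, 0.4], [0.4, 1.2]]),
            (1, 2): np.array([[1.0, 0.5], [0.5, 1.0]])}

m10 = message(1, 0, children, psi_node, psi_edge)
marginal_root = psi_node[0] * m10
print(marginal_root / marginal_root.sum())  # p(x_0)
```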
The message passing protocol:
 A two-pass algorithm:
[Figure: a four-node tree X_1 – X_2 – {X_3, X_4}; an upward pass collects m_32(X_2), m_42(X_2), m_21(X_1), and a downward pass sends m_12(X_2), m_23(X_3), m_24(X_4)]
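A compact sketch of the protocol on this four-node tree (my own illustration; potentials are made up): memoized recursion computes the same pair of messages per edge that the two passes would, and each node's marginal is then the product of its incoming messages.

```python
import numpy as np

edges = {(1, 2), (2, 1), (2, 3), (3, 2), (2, 4), (4, 2)}
nbrs = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}
psi_node = {v: np.ones(2) for v in nbrs}
psi_edge = {e: np.array([[1.5, 0.5], [0.5, 1.5]]) for e in edges}  # psi[(u,v)][x_u, x_v]

msgs = {}
def msg(u, v):
    """Message from u to v: uses all messages into u except the one from v."""
    if (u, v) not in msgs:
        incoming = psi_node[u].copy()
        for w in nbrs[u]:
            if w != v:
                incoming *= msg(w, u)
        msgs[(u, v)] = incoming @ psi_edge[(u, v)]   # sum over x_u -> function of x_v
    return msgs[(u, v)]

for v in nbrs:                                        # marginals from all incoming messages
    b = psi_node[v].copy()
    for w in nbrs[v]:
        b *= msg(w, v)
    print(v, b / b.sum())
```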
Belief Propagation (SP-algorithm): Sequential implementation
[The sequential pseudocode was shown as a figure on the original slide]
Belief Propagation (SP-algorithm): Parallel synchronous implementation
 For a node of degree d, whenever messages have arrived on any d-1 of its edges, compute the message for the remaining edge and send!
 A pair of messages is computed for each edge, one for each direction
 All incoming messages are eventually computed for each node
Correctness of BP on trees
 Corollary: the synchronous implementation is "non-blocking"
 Thm: message passing guarantees obtaining all marginals in the tree
 What about non-trees?
Inference on general GMs
 Now, what if the GM is not a tree-like graph?
 Can we still directly run the message-passing protocol along its edges?
 For non-trees, we do not have the guarantee that message-passing will be consistent!
 Then what?
 Construct a graph data structure from P that has a tree structure, and run message-passing on it!
 Junction tree algorithm
Elimination Cliques
 Recall that induced dependency during marginalization is captured in elimination cliques
 Summation <-> elimination
 Intermediate term <-> elimination clique
[Figure: the elimination cliques from the food-web example]
 Can this lead to a generic inference algorithm?
A Clique Tree
[Figure: a clique tree for the food-web example, with cliques such as {A,B,C}, {A,C,D}, {A,C,D,E}, {A,E,F}, {E,G}, {E,F,H} and the messages m_b, m_c, m_d, m_e, m_f, m_g, m_h passed between them]

    m_e(a, c, d) = \sum_{e} p(e | c, d) m_g(e) m_f(e, a)
From Elimination to Message Passing
 Elimination ⇔ message passing on a clique tree

    m_e(a, c, d) = \sum_{e} p(e | c, d) m_g(e) m_f(e, a)

[Figure: graph elimination on the food web and the corresponding message flow on its clique tree]
 Messages can be reused
From Elimination to Message Passing
 Elimination ⇔ message passing on a clique tree
 Another query ...
[Figure: the same clique tree, rooted for a different query]
 Messages m_f and m_h are reused; the others need to be recomputed
The Shafer-Shenoy Algorithm
 Shafer-Shenoy algorithm
 Message from clique i to clique j:

    \mu_{i \to j}(S_{ij}) = \sum_{C_i \setminus S_{ij}} \psi_{C_i} \prod_{k \neq j} \mu_{k \to i}(S_{ki})

 Clique marginal:

    p(C_i) \propto \psi_{C_i} \prod_{k} \mu_{k \to i}(S_{ki})
A Sketch of the Junction Tree Algorithm
 The algorithm
 Construction of junction trees --- a special clique tree
 Propagation of probabilities --- a message-passing protocol
 Results in marginal probabilities of all cliques --- solves all queries in a single run
 A generic exact inference algorithm for any GM
 Complexity: exponential in the size of the maximal clique --- a good elimination order often leads to a small maximal clique, and hence a good (i.e., thin) JT
 Many well-known algorithms are special cases of JT
 Forward-backward, Kalman filter, Peeling, Sum-Product ...
The Junction tree algorithm for HMM
 A junction tree for the HMM has cliques (y_1, x_1), (y_1, y_2), (y_2, x_2), (y_2, y_3), …, (y_{T-1}, y_T), (y_T, x_T), with separator potentials ψ(y_t)
 Rightward pass:

    \mu_{t \to t+1}(y_{t+1}) = \sum_{y_t} \psi(y_t, y_{t+1}) \mu_{t-1 \to t}(y_t) \mu^{x}_{t+1}(y_{t+1})
                             = \sum_{y_t} p(y_{t+1} | y_t) \mu_{t-1 \to t}(y_t) p(x_{t+1} | y_{t+1})
                             = p(x_{t+1} | y_{t+1}) \sum_{y_t} a_{y_t, y_{t+1}} \mu_{t-1 \to t}(y_t)

   (here \mu^{x}_{t+1} is the message from the emission clique (y_{t+1}, x_{t+1}))
 This is exactly the forward algorithm!
 Leftward pass:

    \mu_{t+1 \to t}(y_t) = \sum_{y_{t+1}} \psi(y_t, y_{t+1}) \mu_{t+2 \to t+1}(y_{t+1}) \mu^{x}_{t+1}(y_{t+1})
                         = \sum_{y_{t+1}} p(y_{t+1} | y_t) \mu_{t+2 \to t+1}(y_{t+1}) p(x_{t+1} | y_{t+1})

 This is exactly the backward algorithm!
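The two passes above correspond to the following forward-backward sketch (parameters are illustrative, reusing the notation of the earlier HMM snippet; not the lecture's code):

```python
import numpy as np

pi = np.array([0.5, 0.5])
A  = np.array([[0.9, 0.1], [0.2, 0.8]])   # A[i, j] = p(y_{t+1} = j | y_t = i)
B  = np.array([[0.7, 0.3], [0.1, 0.9]])   # B[i, k] = p(x_t = k | y_t = i)
xs = [0, 1, 1, 0]
T = len(xs)

alpha = np.zeros((T, 2)); beta = np.ones((T, 2))
alpha[0] = pi * B[:, xs[0]]
for t in range(1, T):                      # rightward pass = forward algorithm
    alpha[t] = B[:, xs[t]] * (alpha[t-1] @ A)
for t in range(T - 2, -1, -1):             # leftward pass = backward algorithm
    beta[t] = A @ (B[:, xs[t+1]] * beta[t+1])

posterior = alpha * beta
posterior /= posterior.sum(axis=1, keepdims=True)   # p(y_t | x_1..x_T)
print(posterior)
```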
Summary
 The simple Eliminate algorithm captures the key algorithmic operation underlying probabilistic inference:
--- that of taking a sum over a product of potential functions
 The computational complexity of the Eliminate algorithm can be reduced to purely graph-theoretic considerations.
 This graph interpretation will also provide hints about how to design improved inference algorithms
 What can we say about the overall computational complexity of the algorithm? In particular, how can we control the "size" of the summands that appear in the sequence of summation operations?