Christopher M. Bishop's tutorial on graphical models

Part 1: Graphical Models
Machine Learning Techniques for Computer Vision
Microsoft Research Cambridge
ECCV 2004, Prague
Christopher M. Bishop

About this Tutorial
Learning is the new frontier in computer vision
Focus on concepts
  not lists of algorithms
  not technical details
Graduate level
Please ask questions!

Overview
Part 1: Graphical models
  directed and undirected graphs
  inference and learning
Part 2: Unsupervised learning
  mixture models, EM
  variational inference, model complexity
  continuous latent variables
Part 3: Supervised learning
  decision theory
  linear models, neural networks, boosting, sparse kernel machines

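Probability Theory
Sum rule: $p(x) = \sum_y p(x, y)$
Product rule: $p(x, y) = p(y \mid x)\, p(x)$
From these we have Bayes’ theorem: $p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$
  with normalization $p(x) = \sum_y p(x \mid y)\, p(y)$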
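
The three rules can be checked numerically on a small table. The sketch below assumes a made-up joint distribution over two binary variables (the numbers are illustrative, not from the tutorial) and confirms that Bayes' theorem, with the normalizer computed by the sum rule, agrees with conditioning the joint directly.

```python
# Hypothetical joint distribution p(x, y) over two binary variables.
joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}


def marginal_x(x):
    # Sum rule: p(x) = sum_y p(x, y)
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(y):
    # Sum rule applied to the other variable: p(y) = sum_x p(x, y)
    return sum(p for (_, yi), p in joint.items() if yi == y)


def cond_y_given_x(y, x):
    # Product rule rearranged: p(y | x) = p(x, y) / p(x)
    return joint[(x, y)] / marginal_x(x)


def cond_x_given_y(x, y):
    # Product rule rearranged: p(x | y) = p(x, y) / p(y)
    return joint[(x, y)] / marginal_y(y)


def bayes_y_given_x(y, x):
    # Bayes' theorem; the denominator p(x) is the normalizer
    # sum_y' p(x | y') p(y').
    numerator = cond_x_given_y(x, y) * marginal_y(y)
    normalizer = sum(cond_x_given_y(x, yp) * marginal_y(yp) for yp in (0, 1))
    return numerator / normalizer
```

Here `bayes_y_given_x(1, 0)` and `cond_y_given_x(1, 0)` are two routes to the same quantity, p(y = 1 | x = 0).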

Role of the Graphs
New insights into existing models
Motivation for new models
Graph-based algorithms for calculation and computation
  cf. Feynman diagrams in physics

Decomposition
Consider an arbitrary joint distribution $p(x, y, z)$
By successive application of the product rule: $p(x, y, z) = p(x)\, p(y \mid x)\, p(z \mid x, y)$

Directed Acyclic Graphs
Joint distribution: $p(x_1, \dots, x_D) = \prod_{i=1}^{D} p(x_i \mid \mathrm{pa}_i)$
  where $\mathrm{pa}_i$ denotes the parents of $i$
No directed cycles
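
A minimal sketch of the directed factorization, in which the joint is the product over nodes of p(node | parents). The three-node graph and all table entries are hypothetical, chosen only so that the result can be checked by hand.

```python
from itertools import product

# Hypothetical DAG over binary variables: a -> b, a -> c, b -> c.
parents = {"a": (), "b": ("a",), "c": ("a", "b")}

# Conditional tables: cpt[node][parent_values] = p(node = 1 | parents).
cpt = {
    "a": {(): 0.6},
    "b": {(0,): 0.2, (1,): 0.7},
    "c": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}


def joint(assignment):
    """Product over nodes of p(x_i | pa_i), for a dict mapping node -> 0/1."""
    prob = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


# Because every factor is a normalized conditional, the product is
# already normalized: summing over all configurations gives 1.
total = sum(joint(dict(zip(parents, values)))
            for values in product((0, 1), repeat=len(parents)))
```

This fully connected example is just the product-rule decomposition; removing an edge (for instance dropping "a" from the parents of "c") is what encodes a conditional independence.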

Undirected Graphs
Provided $p(x) > 0$ then joint distribution is product of non-negative functions over the cliques of the graph: $p(x) = \frac{1}{Z} \prod_C \psi_C(x_C)$
  where $\psi_C(x_C)$ are the clique potentials, and $Z$ is a normalization constant
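
A small sketch of the undirected case: unlike the conditionals of a directed graph, clique potentials need not be normalized, so the constant Z has to be computed explicitly. The chain a - b - c and its potential values are assumptions made for illustration.

```python
from itertools import product

# Hypothetical undirected chain a - b - c of binary variables,
# with maximal cliques {a, b} and {b, c}.


def psi_ab(a, b):
    return 2.0 if a == b else 1.0  # non-negative, favours agreement


def psi_bc(b, c):
    return 3.0 if b == c else 1.0


states = list(product((0, 1), repeat=3))

# Z sums the product of clique potentials over every configuration.
Z = sum(psi_ab(a, b) * psi_bc(b, c) for a, b, c in states)


def p(a, b, c):
    """Normalized joint: product of clique potentials divided by Z."""
    return psi_ab(a, b) * psi_bc(b, c) / Z
```

Brute-force summation for Z costs one term per joint configuration, which is feasible here (8 terms) but grows exponentially with the number of variables.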

Conditioning on Evidence
Variables may be hidden (latent) or visible (observed)
Latent variables may have a specific interpretation, or may be introduced to permit a richer class of distribution

Conditional Independences
$x$ independent of $y$ given $z$ if, for all values of $z$: $p(x, y \mid z) = p(x \mid z)\, p(y \mid z)$
For undirected graphs this is given by graph separation!

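“Explaining Away”
C.I. for directed graphs similar, but with one subtlety
Illustration: pixel colour in an image
(Figure: directed graph with nodes image colour, surface colour and lighting colour; surface colour and lighting colour are the parents of image colour.)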
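
A numerical sketch of explaining away, assuming a graph in which surface colour and lighting colour are independent causes of the observed image colour. All probabilities are invented for illustration. Observing a red image raises the belief that the surface is red; additionally learning that the lighting is red lowers it again, so the two causes become dependent once their common effect is observed.

```python
from itertools import product

# Hypothetical numbers: 1 = "red", 0 = "white".
p_surface = {1: 0.2, 0: 0.8}
p_light = {1: 0.1, 0: 0.9}
# p(image = red | surface, light), keyed by (surface, light).
p_image_red = {(0, 0): 0.01, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99}


def joint(s, lt, im):
    """p(surface) p(light) p(image | surface, light)."""
    p_im = p_image_red[(s, lt)]
    return p_surface[s] * p_light[lt] * (p_im if im == 1 else 1.0 - p_im)


def prob_surface_red(**evidence):
    """p(surface = 1 | evidence), with evidence passed as light=..., image=..."""
    num = den = 0.0
    for s, lt, im in product((0, 1), repeat=3):
        values = {"light": lt, "image": im}
        if any(values[k] != v for k, v in evidence.items()):
            continue  # configuration inconsistent with the evidence
        den += joint(s, lt, im)
        if s == 1:
            num += joint(s, lt, im)
    return num / den


prior = prob_surface_red()
given_image = prob_surface_red(image=1)
given_image_and_light = prob_surface_red(image=1, light=1)
```

With these numbers `prob_surface_red(light=1)` equals the prior, reflecting that the two causes are independent until the image colour is observed.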

Example: State Space Models
Hidden Markov model
Kalman filter

Example: Factorial SSM
Multiple hidden sequences
Avoid exponentially large hidden space

Example: Markov Random Field
Typical application: image region labelling

Example: Conditional Random Field

Inference
Simple example: Bayes’ theorem

Message Passing Example
Find marginal for a particular node
  for M-state nodes, cost is exponential in length of chain
  but we can exploit the graphical structure (conditional independences)

Message Passing
Joint distribution: $p(x_1, \dots, x_L) = \frac{1}{Z} \prod_{i=1}^{L-1} \psi(x_i, x_{i+1})$
Exchange sums and products: $ab + ac = a(b + c)$

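Message Passing
Express as product of messages: $p(x_i) = \frac{1}{Z}\, m_\alpha(x_i)\, m_\beta(x_i)$
Recursive evaluation of messages:
  $m_\alpha(x_i) = \sum_{x_{i-1}} \psi(x_{i-1}, x_i)\, m_\alpha(x_{i-1})$
  $m_\beta(x_i) = \sum_{x_{i+1}} \psi(x_i, x_{i+1})\, m_\beta(x_{i+1})$
Find $Z$ by normalizing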
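
A sketch of message passing on a chain, assuming every link shares one hypothetical pairwise potential `psi`. The marginal of a node is the normalized product of a forward message and a backward message, each built recursively from the ends of the chain. A brute-force sum over all M**L configurations is included only to check the result.

```python
from itertools import product

M, L = 3, 5  # hypothetical sizes: M states per node, chain of L nodes


def psi(a, b):
    # Hypothetical pairwise potential shared by every link.
    return 1.0 + 0.5 * a + 2.0 * (a == b)


def chain_marginal(i):
    """Marginal of node i (0-based) from forward and backward messages."""
    m_alpha = [1.0] * M  # forward message at node 0
    for _ in range(i):
        m_alpha = [sum(psi(prev, cur) * m_alpha[prev] for prev in range(M))
                   for cur in range(M)]
    m_beta = [1.0] * M  # backward message at node L - 1
    for _ in range(L - 1 - i):
        m_beta = [sum(psi(cur, nxt) * m_beta[nxt] for nxt in range(M))
                  for cur in range(M)]
    unnorm = [a * b for a, b in zip(m_alpha, m_beta)]
    Z = sum(unnorm)  # Z falls out of normalizing the product of messages
    return [u / Z for u in unnorm]


def brute_force_marginal(i):
    """Same marginal by summing the joint over all M**L configurations."""
    totals = [0.0] * M
    for x in product(range(M), repeat=L):
        weight = 1.0
        for a, b in zip(x, x[1:]):
            weight *= psi(a, b)
        totals[x[i]] += weight
    Z = sum(totals)
    return [t / Z for t in totals]
```

Each message update costs on the order of M squared operations, so one marginal costs roughly L times M squared, against M**L for the brute-force sum.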

Belief Propagation
Extension to general tree-structured graphs
At each node:
  form product of incoming messages and local evidence
  marginalize to give outgoing message
  one message in each direction across every link
Fails if there are loops

Junction Tree Algorithm
An efficient exact algorithm for a general graph
  applies to both directed and undirected graphs
  compile original graph into a tree of cliques
  then perform message passing on this tree
Problem:
  cost is exponential in size of largest clique
  many vision models have intractably large cliques

Loopy Belief Propagation
Apply belief propagation directly to general graph
  need to keep iterating
  might not converge
State-of-the-art performance in error-correcting codes

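Max-product Algorithm
Goal: find $x^{\star} = \arg\max_{x} p(x)$
  define the max-marginal $\mu(x_i) = \max_{x \setminus x_i} p(x)$
  then $x_i^{\star} = \arg\max_{x_i} \mu(x_i)$
Message passing algorithm with “sum” replaced by “max”
Example: Viterbi algorithm for HMMs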

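Inference and Learning
Data set: $D = \{x_1, \dots, x_N\}$
Likelihood function (independent observations): $p(D \mid \theta) = \prod_{n=1}^{N} p(x_n \mid \theta)$
Maximize (log) likelihood: $\theta_{\mathrm{ML}} = \arg\max_\theta \ln p(D \mid \theta)$
Predictive distribution: $p(x \mid D) \simeq p(x \mid \theta_{\mathrm{ML}})$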

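Regularized Maximum Likelihood
Prior $p(\theta)$, posterior $p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)$
MAP (maximum posterior): $\theta_{\mathrm{MAP}} = \arg\max_\theta p(\theta \mid D)$
Predictive distribution: $p(x \mid D) \simeq p(x \mid \theta_{\mathrm{MAP}})$
Not really Bayesian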

Bayesian Learning
Key idea is to marginalize over unknown parameters, rather than make point estimates
  avoids severe over-fitting of ML and MAP
  allows direct model comparison
Parameters are now latent variables
Bayesian learning is an inference problem!
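
To make the contrast between point estimates and marginalization concrete, here is a sketch for a coin (Bernoulli) model with a Beta(a, b) prior. The model choice and the numbers are assumptions made for illustration, not taken from the tutorial. With three heads in three tosses, maximum likelihood predicts heads with probability 1, which is the over-fitting the slide refers to; MAP and the fully Bayesian predictive are progressively more cautious.

```python
def ml_estimate(heads, n):
    """Maximum likelihood estimate of p(heads)."""
    return heads / n


def map_estimate(heads, n, a, b):
    """Mode of the Beta(a + heads, b + n - heads) posterior.

    Assumes a + heads > 1 and b + n - heads > 1 so that the mode is interior.
    """
    return (heads + a - 1) / (n + a + b - 2)


def bayes_predictive(heads, n, a, b):
    """p(next toss = heads | data), marginalizing over the parameter.

    For the Beta posterior this integral equals the posterior mean.
    """
    return (heads + a) / (n + a + b)
```

For example, with `heads=3, n=3, a=2, b=2` the three functions give 1.0, 0.8 and 5/7 respectively.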

And Finally … the Exponential Family
Many distributions can be written in the form: $p(x \mid \eta) = h(x)\, g(\eta) \exp\{\eta^{\mathrm{T}} u(x)\}$
Includes: Gaussian, Dirichlet, Gamma, Multinomial, Wishart, Bernoulli, …
Building blocks in graphs to give rich probabilistic models

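Illustration: the Gaussian
Use precision (inverse variance) $\tau = 1/\sigma^2$: $p(x \mid \mu, \tau) = \left(\frac{\tau}{2\pi}\right)^{1/2} \exp\left\{-\frac{\tau}{2}(x - \mu)^2\right\}$
In standard form: $\eta = (\mu\tau,\ -\tau/2)^{\mathrm{T}}$, $u(x) = (x,\ x^2)^{\mathrm{T}}$, $h(x) = (2\pi)^{-1/2}$, $g(\eta) = (-2\eta_2)^{1/2} \exp\left(\frac{\eta_1^2}{4\eta_2}\right)$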

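Maximum Likelihood
Likelihood function (independent observations): $p(D \mid \eta) = \left(\prod_{n=1}^{N} h(x_n)\right) g(\eta)^N \exp\left\{\eta^{\mathrm{T}} \sum_{n=1}^{N} u(x_n)\right\}$
Depends on data via sufficient statistics $\sum_n u(x_n)$ of fixed dimension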
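
A sketch of what "sufficient statistics of fixed dimension" means in practice for a one-dimensional Gaussian: the maximum likelihood mean and precision depend on the data only through the count, the sum of x and the sum of x squared, so batches can be summarized separately and merged. The function names are illustrative.

```python
def sufficient_stats(data):
    """Return (N, sum of x, sum of x^2) for a sequence of numbers."""
    n = s1 = s2 = 0.0
    for x in data:
        n += 1
        s1 += x
        s2 += x * x
    return n, s1, s2


def merge(stats_a, stats_b):
    """Statistics of the combined data set: component-wise sum."""
    return tuple(a + b for a, b in zip(stats_a, stats_b))


def gaussian_ml(n, s1, s2):
    """Maximum likelihood mean and precision from the statistics alone."""
    mean = s1 / n
    variance = s2 / n - mean ** 2  # ML (biased) variance estimate
    return mean, 1.0 / variance
```

The formula `s2 / n - mean ** 2` can lose accuracy through cancellation when the variance is small relative to the mean; it is used here only because it makes the dependence on the three statistics explicit.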

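Conjugate Priors
Prior has same functional form as likelihood: $p(\eta \mid \chi, \nu) = f(\chi, \nu)\, g(\eta)^{\nu} \exp\{\nu\, \eta^{\mathrm{T}} \chi\}$
Hence posterior is of the form: $p(\eta \mid D, \chi, \nu) \propto g(\eta)^{\nu + N} \exp\left\{\eta^{\mathrm{T}} \left(\sum_{n=1}^{N} u(x_n) + \nu\chi\right)\right\}$
Can interpret prior as $\nu$ effective observations of value $\chi$
Examples:
  Gaussian for the mean of a Gaussian
  Gaussian-Wishart for mean and precision of Gaussian
  Dirichlet for the parameters of a discrete distribution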
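
A sketch of the last example, a Dirichlet prior over the parameters of a discrete distribution. Because the prior is conjugate, the posterior is again a Dirichlet whose parameters are the prior parameters plus the observed counts, which is the "effective observations" reading of the prior. The prior values and data below are hypothetical.

```python
def dirichlet_posterior(alpha, observations):
    """Posterior Dirichlet parameters: prior pseudo-counts plus data counts.

    `alpha` is a list of prior parameters, one per category;
    `observations` is a sequence of category indices.
    """
    posterior = list(alpha)
    for k in observations:
        posterior[k] += 1
    return posterior


def predictive(alpha):
    """Probability of each category for the next draw (the Dirichlet mean)."""
    total = sum(alpha)
    return [a / total for a in alpha]
```

For example, starting from `alpha = [1, 1, 1]` and observing categories `[0, 0, 2, 0]` gives posterior parameters `[4, 1, 2]`, so the predictive probabilities are 4/7, 1/7 and 2/7.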

Summary of Part 1
Directed graphs
Undirected graphs
Inference by message passing: belief propagation
