Markov Chain Monte Carlo:
theory and worked examples




                             Dario Digiuni,
                             Academic Year 2007/2008
Markov Chain Monte Carlo
• Class of sampling algorithms

• High sampling efficiency

• Sample from a distribution with unknown normalization constant

• Often the only way to solve problems in time polynomial in the
  number of dimensions
       e.g. evaluation of a convex body volume
MCMC: applications
   • Statistical Mechanics
     ▫ Metropolis-Hastings

   • Optimization
     ▫ Simulated annealing

   • Bayesian Inference
     ▫ Metropolis-Hastings
     ▫ Gibbs sampling
The Monte Carlo principle
• Sample a set of N independent and identically-distributed variables




• Approximation of the target p.d.f. with the empirical measure

       p_N(x) = (1/N) Σ_{i=1..N} δ(x − x^(i))




       … then approximation of the integrals!
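The empirical-average idea above can be sketched in a few lines of C++; the target and test function here are my own illustrative choices (a standard normal and f(x) = x², so the true expectation is 1), not taken from the slides:

```cpp
#include <cmath>
#include <random>

// Monte Carlo principle: approximate E_p[f(x)] by the empirical average
// over N i.i.d. samples x^(i) drawn from the target p.d.f. p.
double mc_expectation(int n_samples, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> p(0.0, 1.0); // target: standard normal
    double sum = 0.0;
    for (int i = 0; i < n_samples; ++i) {
        double x = p(gen);
        sum += x * x; // f(x) = x^2, whose expectation under p is Var = 1
    }
    return sum / n_samples;
}
```

The estimate converges at the usual 1/√N Monte Carlo rate, independent of dimension.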
Rejection Sampling
  1. It requires finding a bound M such that p(x) ≤ M q(x)!
  2. Low acceptance rate
Idea
• I can use the previously sampled value to find the following one

• Exploration of the configuration space by means of Markov Chains:




       Def.: Markov process




       Def.: Markov chain
Invariant distribution
 • Stability conditions:

   1. Irreducibility: from every state there is a finite probability of visiting
      any other state
   2. Aperiodicity: the chain is not trapped in deterministic cycles.

 • Sufficient condition:
   1. Detailed balance principle

        p(x) T(x → x') = p(x') T(x' → x)

 MCMC algorithms are aperiodic, irreducible Markov chains having
  the target pdf as the invariant distribution
Example
• What is the probability of finding the lift at the ground floor of a
  three-floor building?

  ▫ 3-state Markov chain

  ▫ Lift = random walker

  ▫ Transition matrix T

  ▫ Looking for the invariant distribution
      … burn-in …
Example - 2
• I can apply the matrix T on the right to any of the states repeatedly: the
  chain is homogeneous (T does not change with time), and the iterates
  converge to the invariant distribution — ~50% is the probability of
  finding the lift at the ground floor.

• Google’s PageRank:

  ▫ Websites are the states, T is defined by the hyperlinks among them, and
    the user is the random walker:

      The webpages are ranked following the invariant distribution!
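The burn-in of the lift example amounts to power iteration: apply T until the distribution stops changing, then check stationarity. The slides' actual 3×3 matrix is not reproduced in this transcript, so the matrix used in the test below is an illustrative stand-in; the routine itself is generic:

```cpp
#include <array>
#include <cmath>

using Vec = std::array<double, 3>;
using Mat = std::array<Vec, 3>;

// One step of the chain: row vector times transition matrix, pi' = pi * T.
Vec step(const Vec& pi, const Mat& T) {
    Vec out{0.0, 0.0, 0.0};
    for (int j = 0; j < 3; ++j)
        for (int i = 0; i < 3; ++i)
            out[j] += pi[i] * T[i][j];
    return out;
}

// Iterate until the distribution stops changing ("burn-in").
Vec invariant(Vec pi, const Mat& T, int iters) {
    for (int n = 0; n < iters; ++n) pi = step(pi, T);
    return pi;
}
```

For an irreducible, aperiodic T the result is independent of the starting state, which is exactly why PageRank can start the random surfer anywhere.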
Metropolis-Hastings
• Given the target distribution p(x):

  1.   Choose an initial value x^(0)

  2.   Sample a candidate x* from a proposal distribution q(x* | x^(i))
       (the proposal plays the role of the transition matrix T)

  3.   Accept the new value with probability

           α = min{ 1 , [ p(x*) q(x^(i) | x*) ] / [ p(x^(i)) q(x* | x^(i)) ] }

  4.   Return to 2

       The ratio is independent of the normalization! In the Metropolis
       algorithm the proposal is symmetric, so the two q factors are equal
       and cancel.
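The four steps above can be sketched for the symmetric (Metropolis) case, where the q terms cancel. The target here is my own toy choice — a standard normal known only up to normalization, p(x) ∝ exp(−x²/2):

```cpp
#include <cmath>
#include <random>
#include <vector>

// Metropolis sketch (symmetric Gaussian proposal, so q cancels in alpha).
// Only the unnormalized log-density is needed.
std::vector<double> metropolis(int n, double step_sd, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> prop(0.0, step_sd);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    auto logp = [](double x) { return -0.5 * x * x; }; // p(x) ∝ exp(-x^2/2)
    std::vector<double> chain;
    double x = 0.0;                                     // 1. initial value
    for (int i = 0; i < n; ++i) {
        double cand = x + prop(gen);                    // 2. propose
        double alpha = std::exp(logp(cand) - logp(x));  // 3. acceptance ratio
        if (u(gen) < alpha) x = cand;                   //    accept or keep x
        chain.push_back(x);                             // 4. repeat
    }
    return chain;
}
```

Note that rejected proposals still contribute the current value to the chain — dropping them would bias the sample.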
M.-H. – Pros and Cons
• Very general sampling method:

  ▫ I can sample from an unnormalized distribution

  ▫ It does not require an upper bound for the function (unlike rejection
    sampling)



• Good performance depends on the choice of the proposal distribution

  ▫ well-mixing condition
M.-H. - Example
• In Statistical Mechanics it is important to evaluate the partition
  function,

  e.g. Ising model: it requires summing over every possible spin state.
  In a 10 x 10 x 10 spin cube, I would have to sum over 2^1000
  possible states = UNFEASIBLE

 MCMC APPROACH:

 1. Evaluate the system’s energy

 2. Pick a spin at random and flip it:

     1. If the energy decreases, this is the new spin configuration

     2. If the energy increases, this is the new spin configuration with
        probability exp(−ΔE / kT)
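The spin-flip scheme above can be sketched on a small 2D lattice; the lattice size, coupling J = 1, and periodic boundaries are my own choices for a minimal example, not parameters from the slides:

```cpp
#include <cmath>
#include <random>
#include <vector>

// Metropolis spin-flip sketch for a 2D Ising model (J = 1, periodic
// boundaries). Instead of summing over all 2^(L*L) states, we random-walk
// through configurations one spin flip at a time.
struct Ising {
    int L;
    std::vector<int> s; // spins, +1 or -1, row-major
    explicit Ising(int L_) : L(L_), s(L_ * L_, 1) {}
    int at(int i, int j) const { return s[((i + L) % L) * L + (j + L) % L]; }
    // Energy change if spin (i, j) were flipped: dE = 2 * s_ij * sum(neighbors).
    double dE(int i, int j) const {
        int nb = at(i - 1, j) + at(i + 1, j) + at(i, j - 1) + at(i, j + 1);
        return 2.0 * at(i, j) * nb;
    }
    // One Monte Carlo sweep: L*L attempted single-spin flips at inverse
    // temperature beta = 1/kT, accepted per the Metropolis rule.
    void sweep(double beta, std::mt19937& gen) {
        std::uniform_int_distribution<int> pick(0, L - 1);
        std::uniform_real_distribution<double> u(0.0, 1.0);
        for (int n = 0; n < L * L; ++n) {
            int i = pick(gen), j = pick(gen);
            double d = dE(i, j);
            // Accept if energy decreases, else with probability exp(-beta*dE).
            if (d <= 0.0 || u(gen) < std::exp(-beta * d))
                s[i * L + j] *= -1;
        }
    }
    double magnetization() const {
        double m = 0; for (int v : s) m += v; return m / s.size();
    }
};
```

Averages of observables (energy, magnetization) over the chain then replace the unfeasible exact sum over states.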
Simulated Annealing
• It allows one to find the global maximum of a generic pdf

  ▫ No comparison among the values of the local maxima required
  ▫ Application to the maximum-likelihood method

• It is a non-homogeneous Markov chain whose invariant distribution
  keeps changing as follows:

       p_i(x) ∝ p(x)^(1/T_i) ,   with the “temperature” T_i decreasing to 0
Simulated Annealing: example
  • Let us apply the algorithm to a simple, 1-dimensional case

  • The optimal cooling scheme is logarithmic, T_i ∝ 1 / log(i)
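A 1-dimensional run can be sketched as Metropolis moves on p(x)^(1/T) with decreasing T. The bimodal target (global maximum near x = 2, weaker local maximum near x = −2) and the geometric cooling schedule are my own illustrative choices — a geometric schedule cools faster than the theoretically optimal logarithmic one and usually works well in practice:

```cpp
#include <cmath>
#include <random>

// Simulated annealing sketch: Metropolis steps on p(x)^(1/T) while the
// temperature T is lowered, so the chain concentrates on the global maximum.
double anneal(double x0, unsigned seed) {
    // Toy target (not from the slides): global max near x = 2,
    // weaker local max near x = -2.
    auto logp = [](double x) {
        return std::log(std::exp(-(x - 2) * (x - 2)) +
                        0.1 * std::exp(-(x + 2) * (x + 2)));
    };
    std::mt19937 gen(seed);
    std::normal_distribution<double> prop(0.0, 0.5);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double x = x0, T = 5.0;
    while (T > 0.01) {
        for (int i = 0; i < 200; ++i) {
            double cand = x + prop(gen);
            // Accept with probability min(1, (p(cand)/p(x))^(1/T)).
            if (u(gen) < std::exp((logp(cand) - logp(x)) / T)) x = cand;
        }
        T *= 0.95; // geometric cooling (simpler than the log schedule)
    }
    return x;
}
```

Starting the walker at the local maximum, the high-temperature phase lets it cross between modes, and the freeze at low T leaves it at the global one.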
Simulated Annealing: Pros and Cons
• The global maximum is uniquely determined
  ▫ Even if the walker starts next to a local (non-global!) maximum, it
    converges to the true global maximum




• It requires a good tuning of the parameters
Gibbs Sampler
• Optimal method to marginalize multidimensional distributions

• Let us assume we have an n-dimensional vector and that we know all the
  full conditional distributions of the pdf, p(x_j | x_1, …, x_{j−1}, x_{j+1}, …, x_n)




• We take the full conditionals as the proposal distribution:
Gibbs Sampler - 2
• Then the M.-H. acceptance probability is always α = 1: every proposed
  move is accepted →

                    very efficient
                      method!
Gibbs Sampler – practically
1.       Initialize x^(0)
                                  (at each step, fix n-1 coordinates and sample
                                  from the resulting conditional pdf)
2.       for (i = 0; i < N; i++)

     •     Sample x_1^(i+1) ~ p(x_1 | x_2^(i), …, x_n^(i))

     •     Sample x_2^(i+1) ~ p(x_2 | x_1^(i+1), x_3^(i), …, x_n^(i))

     •     …

     •     Sample x_n^(i+1) ~ p(x_n | x_1^(i+1), …, x_{n-1}^(i+1))
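The sweep above can be sketched for a 2-dimensional case. The slides' example pdf is not reproduced in this transcript, so this uses a standard stand-in: a bivariate normal with correlation rho, whose full conditionals are themselves normal, p(x1 | x2) = N(rho·x2, 1 − rho²) and symmetrically:

```cpp
#include <cmath>
#include <random>
#include <utility>
#include <vector>

// Gibbs sampler sketch for a bivariate normal with correlation rho.
// Each coordinate is drawn from its full conditional given the other,
// and every draw is accepted (alpha = 1).
std::vector<std::pair<double, double>> gibbs(int n, double rho, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> z(0.0, 1.0);
    double sd = std::sqrt(1.0 - rho * rho); // conditional standard deviation
    double x1 = 0.0, x2 = 0.0;              // initialize x^(0)
    std::vector<std::pair<double, double>> chain;
    for (int i = 0; i < n; ++i) {
        x1 = rho * x2 + sd * z(gen); // sample x1 ~ p(x1 | x2)
        x2 = rho * x1 + sd * z(gen); // sample x2 ~ p(x2 | x1)
        chain.push_back({x1, x2});
    }
    return chain;
}
```

Keeping only one coordinate of the chain gives samples from the corresponding marginal — the marginalization use highlighted above.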
Gibbs Sampler – example




• Let us pretend we cannot determine the normalization
  constant…




  … but we can make a comparison with the true marginalized
    pdf…
Gibbs Sampler – results
                • Comparison between Gibbs sampling
                  and true M.-H. sampling from the
                  marginalized pdf


                             • Good χ² agreement
A complex MCMC application
 A radioactive source decays with frequency λ1 and a detector records
   only every k1-th event; then at the moment tc the decay rate
 changes to λ2 and only one event out of k2 is recorded.



 At first sight λ1, k1, tc, λ2 and k2 are undetermined.


          We wish to find them.
Preparation
• The waiting time for the k-th event in a Poissonian process with
  frequency λ is distributed according to the Erlang (Gamma) pdf:

       p(t) = λ^k t^(k−1) e^(−λt) / (k−1)!

• I can sample a large number of events from this pdf, changing the
  parameters λ1 and k1 to λ2 and k2 at time tc

• I evaluate the likelihood:
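The synthetic-data step can be sketched by drawing recorded event times directly from the Erlang waiting-time pdf (`std::gamma_distribution` with shape k and scale 1/λ). This is a simplified version of the setup: the interval that happens to straddle tc is drawn with the pre-switch parameters, and the function signature is my own:

```cpp
#include <random>
#include <vector>

// Generate recorded event times: the waiting time between recorded events
// is the sum of k exponential inter-decay times, i.e. Gamma(k, 1/lambda).
// Rate and thinning factor switch from (l1, k1) to (l2, k2) at time tc.
std::vector<double> simulate(double l1, int k1, double l2, int k2,
                             double tc, double t_end, unsigned seed) {
    std::mt19937 gen(seed);
    std::vector<double> times;
    double t = 0.0;
    while (t < t_end) {
        double lam = (t < tc) ? l1 : l2;
        int k = (t < tc) ? k1 : k2;
        std::gamma_distribution<double> wait(k, 1.0 / lam); // shape k, scale 1/lam
        t += wait(gen);
        if (t < t_end) times.push_back(t);
    }
    return times;
}
```

The log-likelihood of a parameter set is then the sum of the log Erlang densities of the observed waiting times, which is what the chain below explores.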
Idea
• I assume the log-likelihood to be the invariant distribution!
  ▫ what are the Markov chain states?

        struct State {
            // Parameter space
            double lambda1, lambda2;
            double tc;
            int k1, k2;
            // Corresponding log-likelihood value
            double plog;

            State(double la1, double la2, double t, int kk1, int kk2) :
                lambda1(la1), lambda2(la2), tc(t), k1(kk1), k2(kk2) {}

            State() {}
        };
Practically
• I have to find an appropriate proposal distribution to move among
  the states
  ▫ Attention: varying λi and ki I have to prevent the acceptance rate from
    being too low… but also too high!

• The α ratio is evaluated as the ratio between the final-state and
  initial-state likelihood values.

• Guess initial values for λi, ki and tc

• Let the chain evolve for a burn-in time and then record the results.
Results   • Even if the initial guess is quite far from the real
            values, the random walker converges.
            guess:          λ1 = 5    λ2 = 5   k1 = 3    k2 = 2


            real:           λ1 = 1    λ2 = 2   k1 = 1    k2 = 1
Results - 2
  • Estimate of the uncertainty

    (scatter plot of the sampled values in the (λ1, λ2) plane)
Results - 3
    • All the parameters can be determined quickly
      guess:        tc = 150          real:    tc = 300
References
• C. Andrieu, N. De Freitas, A. Doucet and M.I. Jordan, Machine Learning 50
  (2003), 5-43.

• G. Casella and E.I. George, The American Statistician 46, 3 (1992), 167-174.

• W.H. Press, S.A. Teukolsky, W.T. Vetterling and B.P. Flannery, Numerical
  Recipes, Third Edition, Cambridge University Press, 2007.

• M. Loreti, Teoria degli errori e fondamenti di statistica, Decibel, Zanichelli
  (1998).

• B. Walsh, Markov Chain Monte Carlo and Gibbs Sampling, Lecture Notes
  for EEB 581.
