Markov Chain Monte Carlo Methods
Applications in Machine Learning
Andres Mendez-Vazquez
June 1, 2017
Outline
1 Introduction
The Main Reason
Examples of Application
Basically
2 The Monte Carlo Method
FERMIAC and ENIAC Computers
Immediate Applications
3 Markov Chains
Introduction
Enters the Perron–Frobenius Theorem
Enter Google’s Page Rank
4 Markov Chain Monte Carlo Methods
Combining the Power of the two Methods
5 Metropolis Hastings
Introduction
A General Idea
Applications in Machine Learning
6 The Gibbs Sampler
Introduction
The Simplest Algorithm
Applications in Machine Learning
Chance
There are many phenomena that introduce chance into their models
Therefore
Why not use SAMPLING to understand those phenomena?
Thus
Markov Chain Monte Carlo (MCMC) Methods
Algorithms that use Markov Chains to obtain samples from a target phenomenon!!!
Thus
They are computer-based simulations able to obtain samples from π (x).
The Reason
There are several high-dimensional problems
For example, computing the volume of a convex body in d dimensions.
MCMC is the only known general approach that provides a solution within a
reasonable time, O(d^k).
Therefore
MCMC plays a significant role in statistics, econometrics, physics and
computer science.
Examples of Application
Bayesian Inference and Learning
Given some unknown variables X1, X2, ..., XK and data Y , we want to
infer properties of the unknowns given the data (i.e., the posterior p (X1:K |Y ))
Graphically
Deep Learning
Restricted Boltzmann Machines use MCMC to optimize their weights
E (v, h) = − Σ_i a_i v_i − Σ_j b_j h_j − Σ_{i,j} v_i w_{i,j} h_j
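As a concrete illustration (a minimal sketch, not from the original slides): the MCMC step used when training a binary RBM is block Gibbs sampling of h given v and v given h. The parameter names (W, a, b) and shapes below are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, a, b, rng):
    """One block Gibbs sweep of a binary RBM: sample h | v, then v | h."""
    p_h = sigmoid(b + v @ W)             # P(h_j = 1 | v)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(a + h @ W.T)           # P(v_i = 1 | h)
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return v_new, h

# Usage sketch: a tiny RBM with 6 visible and 3 hidden units
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((6, 3))
a, b = np.zeros(6), np.zeros(3)
v = rng.integers(0, 2, size=6).astype(float)
for _ in range(10):                      # a short Gibbs chain
    v, h = gibbs_step(v, W, a, b, rng)
```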
What do we want?
Given a probability distribution of interest
π (x) , x ∈ R^N
Which has the following structure
π (x) = (1/Z) h (x)
where h (x) is an unnormalized density we can evaluate and Z is an unknown normalization constant.
Thus,
We want to understand such distribution!!!
The beginning of Monte Carlo Methods
In 1945, two events changed the world forever
The successful nuclear test at Alamogordo.
The building of the first electronic computer, ENIAC.
They pushed for the creation of the Monte Carlo Methods
The original idea came from Stan Ulam... He loved relaxing by playing
poker and solitaire!!!
Stan had an uncle who borrowed money from relatives because he
“just had to go to Monte Carlo.”
Together with Von Neumann
The Guy Behind the Minimax Theorem
They started to develop an idea to trace the path of neutrons in a
spherical reactor.
Thus
We have then
At each stage a sequence of decisions has to be made based on statistical
probabilities appropriate to the physical and geometric factors.
For this, we only need a source of uniform random numbers!!!
Because it is possible to use the inverse of the cumulative distribution function
of the target distribution to obtain the necessary samples!!!
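For instance (an illustrative sketch, not part of the original slides): to sample from an exponential distribution with rate λ, invert its CDF F(x) = 1 − e^{−λx} and feed it uniform random numbers.

```python
import numpy as np

def sample_exponential(lam, n, rng):
    """Inverse-transform sampling: X = F^{-1}(U) with U ~ Uniform(0, 1)."""
    u = rng.random(n)
    return -np.log(1.0 - u) / lam    # inverse of F(x) = 1 - exp(-lam * x)

rng = np.random.default_rng(0)
samples = sample_exponential(lam=2.0, n=10_000, rng=rng)
print(samples.mean())                # should be close to 1 / lam = 0.5
```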
Thus, the FERMIAC was born!!!
When the ENIAC was moved and it was necessary to keep generating
target statistics
However
Once the ENIAC went on-line again
It took two months to have the basic controls for the Monte-Carlo
One fortnight for the last phases of the implementation.
Then, the tests were run
And Monte Carlo was born!!!
Look At This, Von Neumann’s Programs
Design First and Programming After
Monte Carlo Integration
We can compute integrals of complex functions, for example

I = ∫∫_Ω sin( ln (x + y + 1) ) dx dy

Where Ω is the disk

{ (x, y) | (x − 1/2)^2 + (y − 1/2)^2 ≤ 1/4 }

We only need a source of uniformly random points on that area

I ≈ Area of Ω × Average value of the integrand over Ω
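A minimal sketch of this estimate (illustrative, not from the slides): draw uniform points in the bounding square of the disk, keep those that fall inside, and average the integrand over them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Uniform points in the bounding square [0, 1] x [0, 1] of the disk
x = rng.random(n)
y = rng.random(n)
inside = (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25   # disk of radius 1/2

f = np.sin(np.log(x[inside] + y[inside] + 1.0))    # integrand on the disk
area = np.pi * 0.5 ** 2                            # area of the disk
I_hat = area * f.mean()
print(I_hat)
```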
Getting the First Moment!!!
The goal is to compute the following expectation
E [f] = ∫ f (z) p (z) dz
Solution
Obtain a set of samples z^(i), i = 1, ..., N, drawn independently from p (z)
Approximate the expectation as
E [f] ≈ Ê [f] = (1/N) Σ_{i=1}^N f( z^(i) )
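As a quick illustration (an assumed example, not from the slides): estimate E[z^2] for z ~ N(0, 1), whose true value is 1.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
z = rng.standard_normal(N)     # samples z^(i) drawn from p(z) = N(0, 1)
f = z ** 2                     # f(z^(i))
print(f.mean())                # Monte Carlo estimate of E[f], close to 1.0
```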
Clustering Using a Stochastic Process
We use the following process
Imagine a Chinese restaurant with an infinite number of circular tables, each with
infinite capacity!!!
Customer ONE sits at the first table
The next customer either sits at the same table as customer ONE
Or at a new table
Something like this
Thus, it is possible to build an entire Random Process
Simply Asking

p (customer i assigned to table j|D, α) = { f (d_ij)   if j ≠ i
                                          { α          if i = j

Where
D is the set of pairwise distances between customers, d_ij = d (c_i, c_j)
Let's see the code
There we have a series of nice ideas.
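The code from the slide is not reproduced in this text; here is a minimal sketch of the assignment rule above (illustrative; names such as `ddcrp_assign` and the decay f(d) = exp(−d) are assumptions).

```python
import numpy as np

def ddcrp_assign(D, alpha, rng, decay=lambda d: np.exp(-d)):
    """Distance-dependent CRP: customer i links to customer j with probability
    proportional to f(d_ij), or to itself (a new table) with weight alpha."""
    n = D.shape[0]
    links = np.empty(n, dtype=int)
    for i in range(n):
        w = decay(D[i])          # f(d_ij) for every j
        w[i] = alpha             # self-link weight
        links[i] = rng.choice(n, p=w / w.sum())
    return links                 # connected components of the link graph form the clusters

rng = np.random.default_rng(0)
X = rng.random((8, 2))                                   # 8 points in 2-D
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)     # pairwise distances
print(ddcrp_assign(D, alpha=1.0, rng=rng))
```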
Markov Chains
The random process Xt ∈ S for t = 1, 2, ..., T has the Markov property
If and only if
p (XT |XT−1, XT−2, ..., X1) = p (XT |XT−1)
Finite-State Discrete-Time Markov Chains
The chain can be completely specified by its transition matrix
P = [pij] with pij = P [Xt = j|Xt−1 = i]
Example
We have the following transition matrix

P =
    0.2  0.1  0.1  0.1  0.5
    0.4  0.1  0.1  0.2  0.1
    0.1  0.3  0.2  0.1  0.1
    0.2  0.4  0.2  0.3  0.2
    0.1  0.1  0.4  0.3  0.1
Graphically: a state-transition diagram over the five states 1–5
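A small sketch (not from the slides) of simulating this chain, assuming the matrix is read column-by-column, i.e., column i holds the probabilities of moving out of state i (the columns above sum to one).

```python
import numpy as np

# Transition matrix from the example; column i gives the distribution
# of the next state given that the current state is i.
P = np.array([
    [0.2, 0.1, 0.1, 0.1, 0.5],
    [0.4, 0.1, 0.1, 0.2, 0.1],
    [0.1, 0.3, 0.2, 0.1, 0.1],
    [0.2, 0.4, 0.2, 0.3, 0.2],
    [0.1, 0.1, 0.4, 0.3, 0.1],
])

def simulate(P, x0, steps, rng):
    """Simulate a finite-state Markov chain for a given number of steps."""
    x = x0
    path = [x]
    for _ in range(steps):
        x = rng.choice(P.shape[0], p=P[:, x])   # next state from the current column
        path.append(x)
    return path

rng = np.random.default_rng(0)
print(simulate(P, x0=0, steps=20, rng=rng))     # states are 0-indexed here
```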
What kind of Markov Chains would we like to study?
Ergodic
A Markov chain is called an ergodic chain if it is possible to go from every
state to every state (not necessarily in one move).
Aperiodic
A state i has period k if any return to state i must occur in multiples of k
time steps.
k = gcd {n > 0|P (Xn = i|X0 = i) > 0}
Therefore
Thus
If k = 1, then the state is said to be aperiodic.
Otherwise (k > 1), the state is said to be periodic with period k.
A Markov chain is aperiodic if every state is aperiodic
The Theorem
Perron–Frobenius Theorem
Let A be a positive square matrix. Then:
a. ρ(A) is an eigenvalue, and it has a positive eigenvector.
b. ρ(A) is the only eigenvalue on the circle |λ| = ρ(A).
This and other theorems allow us to calculate something
quite interesting
Using the Power Method
The method is described by the recurrence relation

w^(i+1) = T w^(i) / ‖T w^(i)‖ , where ‖T w^(i)‖ = sqrt( (w^(i))^T T^T T w^(i) )

Then
The sub-sequence {w^(k_i)}_{i=1}^∞ converges to an eigenvector associated with
the dominant eigenvalue.
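A minimal sketch of the power method (illustrative, not from the slides), applied here to the example transition matrix P from the earlier slide; since P is column-stochastic its dominant eigenvalue is 1, so the iteration approximates the stationary distribution.

```python
import numpy as np

def power_method(T, w0, iters=1000):
    """Power iteration: w <- T w / ||T w||, converging to the dominant eigenvector."""
    w = w0 / np.linalg.norm(w0)
    for _ in range(iters):
        w = T @ w
        w = w / np.linalg.norm(w)
    return w

P = np.array([
    [0.2, 0.1, 0.1, 0.1, 0.5],
    [0.4, 0.1, 0.1, 0.2, 0.1],
    [0.1, 0.3, 0.2, 0.1, 0.1],
    [0.2, 0.4, 0.2, 0.3, 0.2],
    [0.1, 0.1, 0.4, 0.3, 0.1],
])
w = power_method(P, np.ones(5))
print(w / w.sum())    # normalized to a probability vector: the stationary distribution
```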
Long Ago in a Long Forgotten Land
Dozens of Companies fought for the Search Landscape
America Online
Netscape
Yahoo
Infoseek
Lycos
Altavista
This is Old
For Example
Enter Larry Page and Sergey Brin (circa 1996)
They invented the Google Matrix (its name a misspelling of Googol = 10^100)

G = αS + (1 − α) 1v

Where
S is a modified version of the adjacency matrix, obtained by converting the
number of links into probabilities
In addition
Also
1 is the column vector of ones.
v is a row vector of probabilities, v = (1/n, 1/n, ..., 1/n) (at the initial experiments)
The n × n matrix (1 − α) 1v then has every entry equal to (1 − α)/n:

(1 − α) 1v =
    (1 − α)/n  (1 − α)/n  · · ·  (1 − α)/n
    (1 − α)/n  (1 − α)/n  · · ·  (1 − α)/n
        ⋮          ⋮        ⋱        ⋮
    (1 − α)/n  (1 − α)/n  · · ·  (1 − α)/n
The Damping Factor
Finally, α
The damping factor in the Google matrix indicates that, with probability 1 − α,
a random web surfer moves to a different web page by some means other than
selecting a link
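A small illustrative sketch (an assumed example, not from the slides): build G = αS + (1 − α) 1v for a tiny 4-page web and compute PageRank by power iteration, using a column-stochastic convention for S.

```python
import numpy as np

# Link structure of a tiny 4-page web: A[i, j] = 1 if page j links to page i
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
], dtype=float)

n = A.shape[0]
S = A / A.sum(axis=0)                  # columns normalized: link counts -> probabilities
alpha = 0.85
v = np.full(n, 1.0 / n)                # uniform teleportation vector
G = alpha * S + (1 - alpha) * np.outer(np.ones(n), v)   # Google matrix

r = np.full(n, 1.0 / n)                # start from the uniform distribution
for _ in range(100):                   # power iteration: r <- G r
    r = G @ r
print(r / r.sum())                     # PageRank scores
```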
The Algorithm was the Edge!!!
First Google Server
After the introduction of the algorithm,
THE END for everybody else!!!
Now Imagine the following
You have a target distribution π that you want to sample from
You can use a proposal distribution q, which you know how to sample from,
to generate the necessary samples.
Then, you have a process like this
1 Sample x ∼ q (x).
2 Use the functional form of the target distribution π to
Accept or Reject the sample x as being generated by π (x)
Basically
Markov Condition
The generation of the next sample depends only on the current state, not on earlier states!!!
Further!!!
Monte Carlo Method
The algebraic and geometric properties of the target distribution help us
to accept or reject the sample!!!
History - The not so great remarks...
Metropolis
Generalized −→ Metropolis-Hastings
Special Case −→ Gibbs Sampling
All these developments were done in Computational Physics.
The landmark 1953 paper by N. Metropolis, A. Rosenbluth, M.
Rosenbluth, A. Teller, and E. Teller:
“Equation of State Calculations by Fast Computing Machines”, Journal of
Chemical Physics.
There is a quote by A. Rosenbluth
“Metropolis played no role in its development other than providing
computer time!”
After all, Metropolis was the supervisor at Los Alamos National Lab.
Steps of the Metropolis-Hastings
An M-H step uses
1 The Target/Invariant Distribution l (x)
2 The Proposal/Sampling Distribution q (x′|x)
Then
It involves sampling a candidate value x′ given the current value x
according to q (x′|x)
The Markov chain then moves towards x′
With acceptance probability

A (x, x′) = min{ 1, [ l (x′) q (x|x′) ] / [ l (x) q (x′|x) ] }
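A compact sketch of a generic M-H sampler (illustrative, not from the slides; it uses a symmetric Gaussian random-walk proposal, so the q terms cancel and only the target ratio remains).

```python
import numpy as np

def metropolis_hastings(log_l, x0, n_samples, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings targeting a density known up to a constant."""
    rng = rng or np.random.default_rng()
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    samples = np.empty((n_samples, x.size))
    for i in range(n_samples):
        x_new = x + step * rng.standard_normal(x.size)   # propose x' ~ q(x'|x)
        log_A = log_l(x_new) - log_l(x)                  # symmetric q: ratio of targets
        if np.log(rng.random()) < log_A:                 # accept with prob min(1, A)
            x = x_new
        samples[i] = x
    return samples

# Usage: sample from an unnormalized standard Gaussian target
samples = metropolis_hastings(lambda x: -0.5 * np.sum(x ** 2), x0=[0.0], n_samples=5000)
print(samples.mean(), samples.std())
```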
Logistic Regression
We know the likelihood

l (w|x, y) = Π_{i=1}^n [ exp{w^T x_i} / (1 + exp{w^T x_i}) ]^{y_i} [ 1 / (1 + exp{w^T x_i}) ]^{1−y_i}    (1)

In our case, we use as the proposal q a multivariate Gaussian over w, N (µ, Σ)
Logistic Regression
The Markov chain then moves towards x′
With acceptance probability (for a symmetric proposal the q terms cancel)

A (x, x′) = min{ 1, l (x′) / l (x) }
Thus, we trigger the process of Sampling
Then, we look at the modes in the distribution
Let’s Take a Look at the Program
It is not so complex
A little bit of work!!!
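The program from the slide is not reproduced in this text; here is a minimal sketch of what it could look like (an assumed implementation, not the author's original code): random-walk Metropolis over w using the log of Equation (1).

```python
import numpy as np

def log_likelihood(w, X, y):
    """Log of Equation (1): Bernoulli likelihood with sigmoid(w^T x_i)."""
    z = X @ w
    return np.sum(y * z - np.logaddexp(0.0, z))

def mh_logistic(X, y, n_samples=5000, step=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    chain = np.empty((n_samples, w.size))
    for i in range(n_samples):
        w_new = w + step * rng.standard_normal(w.size)   # Gaussian random-walk proposal
        if np.log(rng.random()) < log_likelihood(w_new, X, y) - log_likelihood(w, X, y):
            w = w_new
        chain[i] = w
    return chain

# Usage on synthetic data
rng = np.random.default_rng(1)
X = np.c_[np.ones(200), rng.standard_normal((200, 2))]
w_true = np.array([-0.5, 2.0, -1.0])
y = (rng.random(200) < 1 / (1 + np.exp(-X @ w_true))).astype(float)
chain = mh_logistic(X, y)
print(chain[1000:].mean(axis=0))   # mean of the sampled w after burn-in, roughly near w_true
```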
The Assumptions
Suppose
We have an n-dimensional vector x.
We have expressions for the full conditionals
p (xj|x1, ..., xj−1, xj+1, ..., xn)
Here, we have the following proposal distribution

q (x′|x^(i)) = { p (x′_j | x^(i)_{−j})   if x′_{−j} = x^(i)_{−j}
              { 0                        otherwise
The Gibbs Sampler
The Algorithm
1 Initialize x^(0)_{1:n}
2 For i = 0 to N − 1
  Sample x^(i+1)_1 ∼ p( x1 | x^(i)_2, ..., x^(i)_n )
  Sample x^(i+1)_2 ∼ p( x2 | x^(i+1)_1, x^(i)_3, ..., x^(i)_n )
  · · ·
  Sample x^(i+1)_j ∼ p( xj | x^(i+1)_1, ..., x^(i+1)_{j−1}, x^(i)_{j+1}, ..., x^(i)_n )
  · · ·
  Sample x^(i+1)_n ∼ p( xn | x^(i+1)_1, ..., x^(i+1)_{n−1} )
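A small sketch of this sweep (an assumed example, not from the slides): Gibbs sampling of a bivariate Gaussian with correlation ρ, whose full conditionals are themselves Gaussian.

```python
import numpy as np

def gibbs_bivariate_gaussian(rho, n_samples, rng=None):
    """Gibbs sampler for (x1, x2) ~ N(0, [[1, rho], [rho, 1]]).
    Full conditionals: x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically."""
    rng = rng or np.random.default_rng(0)
    x1, x2 = 0.0, 0.0
    samples = np.empty((n_samples, 2))
    s = np.sqrt(1.0 - rho ** 2)
    for i in range(n_samples):
        x1 = rho * x2 + s * rng.standard_normal()   # sample x1 | x2
        x2 = rho * x1 + s * rng.standard_normal()   # sample x2 | x1 (using the new x1)
        samples[i] = (x1, x2)
    return samples

samples = gibbs_bivariate_gaussian(rho=0.8, n_samples=20_000)
print(np.corrcoef(samples[5_000:].T))   # empirical correlation, close to 0.8
```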
Latent Dirichlet Allocation
It is an algorithm for finding topics, each composed of a set of words
You need a collection of documents!!!
The data consists
Of documents di, each consisting of a set of words wi,
In a universe of W = {w1, ..., wn} words.
Then
We want to find the mixture of topics in documents and of words in topics
Thus, we can do this by counting:
Counts of topic k in document d
→ the distribution of topics in a document
Counts of word v assigned to topic k
→ the distribution of words in a topic
Thus, we can compute the probability of the topic assignment Zi (the Gibbs Term)

p (Zi|Z−i, W)
For this
We need to introduce some extra terms
Ω_{d,k} - count of topic k in document d.
Ψ_{k,v} - count of word v assigned to topic k.
Thus the Gibbs Term

p (Zi|Z−i, W) = [ (Ψ^{−i}_{k,v} + β) / (Σ_v Ψ^{−i}_{k,v} + Nv · β) ] × [ (Ω^{−i}_{d,k} + α) / (Σ_k Ω^{−i}_{d,k} + K · α) ]

With
Nv = number of different words (the vocabulary size).
β = Dirichlet smoothing parameter for words
K = number of topics
α = Dirichlet smoothing parameter for topics
So, we have the following code
We have
The Following!!!
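The code from the original slide is not reproduced here; below is a minimal sketch of collapsed Gibbs sampling for LDA using the counts and Gibbs term above (an assumed implementation, with hypothetical names such as `docs` and `lda_gibbs`).

```python
import numpy as np

def lda_gibbs(docs, K, V, alpha=0.1, beta=0.01, iters=200, rng=None):
    """Collapsed Gibbs sampling for LDA.
    docs: list of lists of word ids in [0, V). Returns topic assignments per token."""
    rng = rng or np.random.default_rng(0)
    Omega = np.zeros((len(docs), K))          # Omega[d, k]: count of topic k in document d
    Psi = np.zeros((K, V))                    # Psi[k, v]: count of word v assigned to topic k
    Nk = np.zeros(K)                          # total tokens per topic (= sum_v Psi[k, v])
    Z = [[rng.integers(K) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):            # initialize counts from random assignments
        for i, v in enumerate(doc):
            k = Z[d][i]
            Omega[d, k] += 1; Psi[k, v] += 1; Nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, v in enumerate(doc):
                k = Z[d][i]                   # remove current assignment (the "-i" counts)
                Omega[d, k] -= 1; Psi[k, v] -= 1; Nk[k] -= 1
                # Gibbs term: (Psi + beta)/(sum_v Psi + V*beta) * (Omega + alpha)
                p = (Psi[:, v] + beta) / (Nk + V * beta) * (Omega[d] + alpha)
                k = rng.choice(K, p=p / p.sum())
                Z[d][i] = k                   # add the new assignment back
                Omega[d, k] += 1; Psi[k, v] += 1; Nk[k] += 1
    return Z

# Usage on a toy corpus with vocabulary size V = 5 and K = 2 topics
docs = [[0, 1, 0, 2], [3, 4, 3], [0, 2, 1, 0], [4, 3, 4]]
print(lda_gibbs(docs, K=2, V=5))
```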