Statistics Lab
Rodolfo Metulini
IMT Institute for Advanced Studies, Lucca, Italy
Lesson 5 - Introduction to Bootstrap (and hints on Markov
Chains) - 27.01.2015
Introduction
Let’s assume, for a moment, the Central Limit Theorem
(CLT):
If a random sample of n observations $y_1, y_2, \ldots, y_n$ is drawn from a
population with mean $\mu$ and variance $\sigma^2$, then, for n large enough, the
sampling distribution of the sample mean can be approximated by a normal
density with mean $\mu$ and variance $\sigma^2/n$.
Averages taken from any distribution will be approximately normally
distributed.
The standard deviation of the sample mean decreases as the number of
observations increases.
But... nobody tells us exactly how big the sample has to be.
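A quick way to see the CLT at work is to simulate it. The R sketch below is illustrative (it is not from the original slides; the exponential population and the sizes n and M are arbitrary choices): it draws M samples from a skewed distribution and compares the distribution of their means with the normal approximation.

```r
set.seed(123)
n <- 50    # size of each sample
M <- 2000  # number of samples
# sample means from an exponential population (mean 1, variance 1)
means <- replicate(M, mean(rexp(n, rate = 1)))
hist(means, breaks = 40, freq = FALSE,
     main = "Sampling distribution of the sample mean")
# CLT approximation: normal with mean 1 and sd 1/sqrt(n)
curve(dnorm(x, mean = 1, sd = 1/sqrt(n)), add = TRUE, col = "red")
```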
Why Bootstrap?
1. Sometimes we cannot take advantage of the CLT, because:
Nobody tells us exactly how big the sample has to be.
Empirically, in some cases the sample is really small.
So we should not conjecture any distributional assumption. We just
have the data, and we let the raw data speak.
The bootstrap method attempts to determine the probability
distribution from the data itself, without recourse to the CLT.
2. To better estimate the variance of a parameter, and
consequently obtain more accurate confidence intervals and
hypothesis tests.
Basic Idea of Bootstrap
Use the original sample as if it were the population, draw M
samples from the original sample (the bootstrap samples), and
define the estimator using the bootstrap samples.
Figure: Real World versus Bootstrap World
Structure of Bootstrap
1. Originally, from a list of data (the sample), one computes a
statistic (an estimate).
2. Then, one creates an artificial list of data (a new sample)
by randomly drawing elements from the original list.
3. One computes a new statistic (estimate) from the new
sample.
4. One repeats steps 2 and 3, say, M = 1000 times,
and looks at the distribution of these 1000 statistics.
Types of resampling methods
1. The Monte Carlo algorithm: resample with replacement; the size of each
bootstrap sample must equal the size of the original data set.
2. The jackknife algorithm: resample from the original sample
deleting one value at a time; each sample has size n − 1 (see the sketch below).
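A minimal R sketch of the two schemes (the data vector is made up for illustration):

```r
x <- c(5.1, 4.8, 6.2, 5.5, 4.9)  # hypothetical data
# Monte Carlo (bootstrap) resample: same size as x, drawn with replacement
boot_sample <- sample(x, size = length(x), replace = TRUE)
# jackknife samples: leave one observation out at a time (each has size n - 1)
jack_samples <- lapply(seq_along(x), function(i) x[-i])
```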
Estimation of the sample mean
Suppose we extracted a sample $x = (x_1, x_2, \ldots, x_n)$ from the
population X. Let's say the sample size is small: n = 10.
We can compute the sample mean $\bar{x}_n$ using the values of the
sample x. But, since n is small, the CLT does not hold, so we
cannot say anything about the distribution of the sample mean.
APPROACH: We extract M samples (or sub-samples) of dimension
n from the sample x (with replacement, Monte Carlo).
We can define the bootstrap sample means $\bar{x}_{i,b}$, $\forall i = 1, \ldots, M$.
These become the new sample, of dimension M.
Bootstrap sample mean:
$M_b(X) = \frac{1}{M}\sum_{i=1}^{M} \bar{x}_{i,b}$
Bootstrap sample variance:
$V_b(X) = \frac{1}{M-1}\sum_{i=1}^{M} \left(\bar{x}_{i,b} - M_b(X)\right)^2$
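A minimal R sketch of this procedure, assuming simulated data (the original slides' code chunk is not preserved, so the numbers here are illustrative):

```r
set.seed(1)
x <- rnorm(10, mean = 5, sd = 2)  # small original sample, n = 10
M <- 1000                         # number of bootstrap samples
# bootstrap sample means: resample x with replacement M times
boot_means <- replicate(M, mean(sample(x, size = length(x), replace = TRUE)))
Mb <- mean(boot_means)            # bootstrap sample mean
Vb <- var(boot_means)             # bootstrap sample variance (divides by M - 1)
```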
Bootstrap confidence interval with variance
estimation
Let's take a random sample of size n = 25 from a normal
distribution with mean 10 and standard deviation 3.
We can consider the sampling distribution of the sample mean.
From that, we estimate the intervals.
The bootstrap estimates the standard error by resampling the data in
our original sample.
Instead of repeatedly drawing samples of size n = 25 from the
population, we repeatedly draw new samples of size n = 25 from
our original sample, resampling with replacement.
We can estimate the standard error of the sample mean using the
standard deviation of the bootstrapped sample means.
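A sketch of this experiment in R (the seed is an arbitrary choice):

```r
set.seed(42)
x <- rnorm(25, mean = 10, sd = 3)  # original sample, n = 25
M <- 1000
boot_means <- replicate(M, mean(sample(x, replace = TRUE)))
se_boot <- sd(boot_means)          # bootstrap estimate of the standard error
# normal-based 95% bootstrap confidence interval for the mean
ci <- mean(x) + c(-1, 1) * qnorm(0.975) * se_boot
```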
Bootstrap confidence intervals: formula
The normal-based bootstrap interval uses the bootstrap standard error
$\hat{se}_b = \sqrt{V_b(X)}$:
$[\,\bar{x}_n - z_{1-\alpha/2}\,\hat{se}_b \,;\ \bar{x}_n + z_{1-\alpha/2}\,\hat{se}_b\,]$
Confidence interval with quantiles
Suppose we have a sample of data from an exponential distribution
with parameter λ:
$f(x \mid \lambda) = \lambda e^{-\lambda x}$ (remember: the estimate of λ is
$\hat{\lambda} = 1/\bar{x}_n$).
An alternative to using bootstrap-estimated standard
errors (since estimating the standard error of an
exponential is not straightforward) is the use of bootstrap
quantiles.
We can obtain M bootstrap estimates $\hat{\lambda}_b$ and define $q^*(\alpha)$ as the α
quantile of the bootstrap distribution of the M estimates of λ.
The bootstrap confidence interval for λ will then be:
$[\,2\hat{\lambda} - q^*(1 - \alpha/2)\,;\ 2\hat{\lambda} - q^*(\alpha/2)\,]$
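A sketch in R, assuming simulated exponential data (the true λ and the sample size below are arbitrary):

```r
set.seed(7)
x <- rexp(30, rate = 2)    # sample from an exponential, true lambda = 2
lambda_hat <- 1 / mean(x)  # estimate of lambda
M <- 1000
boot_lambda <- replicate(M, 1 / mean(sample(x, replace = TRUE)))
q <- quantile(boot_lambda, c(0.025, 0.975))  # bootstrap quantiles
# basic bootstrap 95% interval: [2*lambda_hat - q(0.975); 2*lambda_hat - q(0.025)]
ci <- c(2 * lambda_hat - q[2], 2 * lambda_hat - q[1])
```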
Regression model coefficient estimate with Bootstrap
Now we consider the situation where we have data on two variables.
This is the type of data that arises in linear regression models. It does
not make sense to bootstrap the two variables separately, so they remain
linked when bootstrapped.
If our original n = 4 sample contains the observations $(y_1 = 1, x_1 = 3)$,
$(y_2 = 2, x_2 = 6)$, $(y_3 = 4, x_3 = 3)$, and $(y_4 = 6, x_4 = 2)$, we resample these
original couples as pairs.
Recall that the linear regression model is $y_i = \beta_1 + \beta_2 x_i + \epsilon_i$. We are
going to construct a bootstrap interval for the slope coefficient $\beta_2$ (see the sketch after this list):
1. We draw M bootstrap bivariate samples.
2. We compute the OLS $\hat{\beta}_2$ coefficient for each bootstrap sample.
3. We take the bootstrap quantiles, and use the 0.025 (α/2) and
the 0.975 (1 − α/2) quantiles to define the confidence interval for $\hat{\beta}_2$.
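A sketch of the paired bootstrap in R, using the tiny n = 4 sample above (with so few observations, some resamples have constant x and yield an NA slope, hence the na.rm):

```r
y <- c(1, 2, 4, 6)
x <- c(3, 6, 3, 2)
M <- 1000
boot_slopes <- replicate(M, {
  idx <- sample(seq_along(y), replace = TRUE)  # resample (y, x) pairs together
  coef(lm(y[idx] ~ x[idx]))[2]                 # OLS slope on the bootstrap sample
})
ci <- quantile(boot_slopes, c(0.025, 0.975), na.rm = TRUE)  # percentile interval
```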
Regression model coefficient estimate with Bootstrap
(alternative): sampling the residuals
An alternative way to bootstrap-estimate the regression
coefficient is a two-stage method in which:
1. You run the regression on the original sample and resample its
residuals (with replacement) to define M bootstrap residual vectors
(M vectors of dimension n).
2. You add each of those residual vectors to the fitted values to
obtain M new dependent-variable vectors.
3. You estimate M new regression models using the new
dependent variables, obtaining M bootstrapped $\hat{\beta}_2$ coefficients.
The method then uses the (α/2) and (1 − α/2)
quantiles of the bootstrapped $\hat{\beta}_2$ to define the confidence interval
(a sketch follows).
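A minimal sketch of the residual bootstrap, reusing the y and x vectors from the previous sketch:

```r
fit  <- lm(y ~ x)  # stage 1: fit on the original sample
res  <- resid(fit)
yhat <- fitted(fit)
M <- 1000
boot_slopes_res <- replicate(M, {
  y_star <- yhat + sample(res, replace = TRUE)  # new dependent variable
  coef(lm(y_star ~ x))[2]                       # slope from the new regression
})
ci_res <- quantile(boot_slopes_res, c(0.025, 0.975))
```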
References
Efron, B., and Tibshirani, R. (1993). An Introduction to the
Bootstrap (Vol. 57). CRC Press.
Figure: Efron and Tibshirani's foundational book
Routines in R
1. boot, by Brian Ripley.
Functions and datasets for bootstrapping from the book
Bootstrap Methods and Their Applications by A. C. Davison
and D. V. Hinkley (1997, CUP).
2. bootstrap, by Rob Tibshirani.
Software (bootstrap, cross-validation, jackknife) and data for
the book An Introduction to the Bootstrap by B. Efron and
R. Tibshirani, 1993, Chapman and Hall
Markov Chain
Markov Chains are an important method in probability and many
other areas of research.
They are used to model the probability of being in a certain state
in a certain period, given that the state in the past period is
known.
Weather example: what is the Markov probability that the state
tomorrow will be sunny, given that today it is rainy?
The main properties of Markov Chain processes are:
Memory of the process (usually the memory is fixed to 1).
Stationarity of the distribution.
Chart 1
A picture of an easy example of a Markov chain with two possible
states and the transition probabilities reported.
Figure: An example of a two-state Markov chain
Notation
We define a stochastic process {Xt, t = 0, 1, 2, ...} that takes on a
finite or countable number of possible values.
Let the possible values be non-negative integers (i.e. $X_t \in \mathbb{Z}^+$). If
$X_t = i$, then the process is said to be in state i at time t.
The Markov process (in discrete time) is defined as follows:
$P_{ij} = P[X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \ldots, X_0 = i_0] = P[X_{t+1} = j \mid X_t = i], \quad \forall i, j \in \mathbb{Z}^+$
We call $P_{ij}$ a 1-step transition probability because we move from
time t to time t + 1.
It is a first-order Markov chain (memory = 1) because the
probability of being in state j at time t + 1 only depends on the
state at time t.
Notation - 2
The t-step transition probability is
$P^t_{ij} = P[X_{t+k} = j \mid X_k = i], \quad \forall t \geq 0, \; i, j \geq 0$
The Chapman-Kolmogorov equations allow us to compute these
t-step transition probabilities. They state that:
$P^{t+m}_{ij} = \sum_k P^t_{ik} P^m_{kj}, \quad \forall t, m \geq 0, \; \forall i, j \geq 0$
N.B. Basic probability properties:
1. $P_{ij} \geq 0, \; \forall i, j \geq 0$
2. $\sum_{j \geq 0} P_{ij} = 1, \; i = 0, 1, 2, \ldots$
Example: conditional probability
Consider two states: 0 = rain and 1 = no rain.
Define two probabilities:
$\alpha = P_{00} = P[X_{t+1} = 0 \mid X_t = 0]$, the probability it will rain
tomorrow given that it rains today;
$\beta = P_{10} = P[X_{t+1} = 0 \mid X_t = 1]$, the probability it will rain
tomorrow given that it does not rain today. What is the probability it
will rain the day after tomorrow given that it rains today, with α = 0.7
and β = 0.4?
The transition probability matrix will be:
$P = \begin{pmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{pmatrix} = \begin{pmatrix} \alpha & 1-\alpha \\ \beta & 1-\beta \end{pmatrix} = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}$
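A sketch in R: the two-step transition probabilities follow from squaring the transition matrix:

```r
alpha <- 0.7
beta  <- 0.4
P <- matrix(c(alpha, 1 - alpha,
              beta,  1 - beta), nrow = 2, byrow = TRUE)
P2 <- P %*% P  # two-step transition matrix
P2[1, 1]       # P(rain the day after tomorrow | rain today) = 0.61
```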
Example: unconditional probability
What is the unconditional probability it will rain the day after
tomorrow?
We need to define the unconditional, or marginal, distribution of the
state at time t:
$P[X_t = j] = \sum_i P[X_t = j \mid X_0 = i] \, P[X_0 = i] = \sum_i P^t_{ij} \, \alpha_i$,
where $\alpha_i = P[X_0 = i], \; \forall i \geq 0$,
and $P[X_t = j \mid X_0 = i]$ is the conditional probability just computed
before.
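A sketch, reusing P2 from the previous block and assuming, purely for illustration, an initial probability of 0.4 that it rains today:

```r
a0 <- c(0.4, 0.6)            # assumed marginal distribution of today's state
unconditional <- a0 %*% P2   # distribution of the state the day after tomorrow
unconditional[1]             # unconditional probability of rain
```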
Stationary distributions
A stationary distribution π is the probability distribution such that
when the Markov chain reaches the stationary distribution, it
remains in that distribution forever.
It means we are asking this question: what is the probability of being
in a particular state in the long run?
Let's define $\pi_j$ as the limiting probability that the process will be in
state j at time t, or
$\pi_j = \lim_{t \to \infty} P^t_{ij}$
Using Fubini's theorem
(https://www.youtube.com/watch?v=6-sGhUeOOk8), we can
define the stationary distribution as:
$\pi_j = \sum_i \pi_i P_{ij}$, which, for the two-state chain, gives:
$\pi_0 = \frac{\beta}{1 - \alpha + \beta}; \quad \pi_1 = \frac{1 - \alpha}{1 - \alpha + \beta}$
Example: stationary distribution
Back to our example.
We can compute the 2-step, 3-step, ..., n-step transition
distributions, and look at when they reach
convergence.
An alternative method to compute the stationary
distribution consists in using this easy closed form (see the sketch below):
$\pi_0 = \frac{\beta}{1 - \alpha + \beta}; \quad \pi_1 = \frac{1 - \alpha}{1 - \alpha + \beta}$
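A sketch in R, reusing P, alpha and beta from the earlier block; iterating the transition matrix shows both rows converging to the stationary distribution, which matches the closed form:

```r
Pt <- P
for (k in 1:50) Pt <- Pt %*% P    # n-step transition matrix for large n
Pt                                # both rows approach (pi0, pi1)
# closed-form stationary distribution of the two-state chain
pi0 <- beta / (1 - alpha + beta)  # = 0.4 / 0.7, about 0.571
pi1 <- (1 - alpha) / (1 - alpha + beta)
```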
References
Ross, S. M. (2006). Introduction to Probability Models. Elsevier.
Figure: Cover of the 10th edition
Routines in R
markovchain, by Giorgio Alfredo Spedicato.
A package for easily handling discrete Markov chains.
MCMCpack, by Andrew D. Martin, Kevin M. Quinn, and
Jong Hee Park.
Performs Monte Carlo simulations based on the Markov chain
approach.