18.650 – Fundamentals of Statistics
5. Bayesian Statistics
Goals
So far, we have followed the frequentist approach (cf. the meaning of a confidence interval). An alternative is the Bayesian approach. New concepts will come into play:
- prior and posterior distributions
- Bayes' formula
- priors: improper, non-informative
- Bayesian estimation: posterior mean, maximum a posteriori (MAP)
- Bayesian confidence regions
In a sense, Bayesian inference amounts to having a likelihood function $L_n(\theta)$ that is weighted by prior knowledge of what $\theta$ might be. This is useful in many applications.
The frequentist approach
- Assume a statistical model $(E, \{\mathbb{P}_\theta\}_{\theta \in \Theta})$.
- We assumed that the data $X_1, \dots, X_n$ were drawn i.i.d. from $\mathbb{P}_{\theta^*}$ for some unknown $\theta^*$.
- When we used the MLE, for example, we considered all possible $\theta \in \Theta$.
- Before seeing the data, we did not prefer one choice of $\theta \in \Theta$ over another.
The Bayesian approach
- In many practical contexts, we have a belief about $\theta^*$.
- Using the data, we want to update that belief and transform it into a posterior belief.
The kiss example
- Let $p$ be the proportion of couples that turn their heads to the right.
- Let $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} \text{Ber}(p)$.
- In the frequentist approach, we estimated $p$ (using the MLE), constructed a confidence interval for $p$, and did hypothesis testing (e.g., $H_0: p = .5$ vs. $H_1: p \neq .5$).
- Before analyzing the data, we may believe that $p$ is likely to be close to $1/2$.
- The Bayesian approach is a tool to update our prior belief using the data.
The kiss example
- Our prior belief about $p$ can be quantified: e.g., we are 90% sure that $p$ is between .4 and .6, 95% sure that it is between .3 and .8, etc.
- Hence, we can model our prior belief using a distribution for $p$, as if $p$ were random.
- In reality, the true parameter is not random! However, the Bayesian approach is a way of modeling our belief about the parameter by doing as if it were random.
- E.g., $p \sim \text{Beta}(a, b)$, the Beta distribution, with pdf
$$f(x) = \frac{1}{K}\, x^{a-1}(1-x)^{b-1}\, \mathbb{1}(x \in [0, 1]), \qquad K = \int_0^1 t^{a-1}(1-t)^{b-1}\, dt.$$
- This distribution is called the prior distribution.
The kiss example
- In our statistical experiment, $X_1, \dots, X_n$ are assumed to be i.i.d. Bernoulli r.v. with parameter $p$, conditionally on $p$.
- After observing the available sample $X_1, \dots, X_n$, we can update our belief about $p$ by taking its distribution conditionally on the data.
- The distribution of $p$ conditionally on the data is called the posterior distribution.
- Here, the posterior distribution is
$$\text{Beta}\Big(a + \sum_{i=1}^n X_i,\; a + n - \sum_{i=1}^n X_i\Big).$$
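As an illustrative aside (not part of the original slides), this conjugate update is a one-liner in code. A minimal sketch in Python, assuming hypothetical head-turning data and a made-up prior parameter $a$; `scipy.stats.beta` provides the Beta distribution:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

# Hypothetical data: n couples, X_i = 1 if couple i turned right.
n, true_p = 124, 0.645                 # made-up values for illustration
x = rng.binomial(1, true_p, size=n)

a = 5.0                                # Beta(a, a) prior, centered at 1/2
s = x.sum()                            # number of right-turning couples

# Conjugate update: posterior is Beta(a + sum X_i, a + n - sum X_i).
posterior = beta(a + s, a + n - s)
print("posterior mean:", posterior.mean())
```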
Clinical trials
Let us revisit our clinical trial example.
- Pharmaceutical companies use hypothesis testing to test whether a new drug is effective.
- To do so, they administer the drug to a group of patients (test group) and a placebo to another group (control group).
- We consider testing a drug that is supposed to lower LDL (low-density lipoprotein), a.k.a. "bad cholesterol", among patients with a high level of LDL (above 200 mg/dL).
Clinical trials
- Let $d > 0$ denote the expected decrease in LDL level (in mg/dL) for a patient that has used the drug.
- Let $c > 0$ denote the expected decrease in LDL level (in mg/dL) for a patient that has used the placebo.
Quantity of interest: $\theta := d - c$.
In practice we have a prior belief about $\theta$. For example,
- $\theta \sim \text{Unif}([100, 200])$
- $\theta \sim \text{Exp}(100)$
- $\theta \sim \mathcal{N}(100, 300)$
- ...
Prior and posterior
- Consider a probability distribution on a parameter space $\Theta$ with some pdf $\pi(\cdot)$: the prior distribution.
- Let $X_1, \dots, X_n$ be a sample of $n$ random variables.
- Denote by $L_n(\cdot \mid \theta)$ the joint pdf of $X_1, \dots, X_n$ conditionally on $\theta$, where $\theta \sim \pi$.
- Remark: $L_n(X_1, \dots, X_n \mid \theta)$ is the likelihood used in the frequentist approach.
- The conditional distribution of $\theta$ given $X_1, \dots, X_n$ is called the posterior distribution. Denote by $\pi(\cdot \mid X_1, \dots, X_n)$ its pdf.
Bayes’ formula
- Bayes' formula states that:
$$\pi(\theta \mid X_1, \dots, X_n) \propto \pi(\theta)\, L_n(X_1, \dots, X_n \mid \theta), \qquad \forall \theta \in \Theta.$$
- The normalizing constant does not depend on $\theta$:
$$\pi(\theta \mid X_1, \dots, X_n) = \frac{\pi(\theta)\, L_n(X_1, \dots, X_n \mid \theta)}{\int_\Theta \pi(t)\, L_n(X_1, \dots, X_n \mid t)\, dt}, \qquad \forall \theta \in \Theta.$$
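When no closed form is available, Bayes' formula can be applied numerically: evaluate prior times likelihood on a grid over $\Theta$ and normalize. A minimal sketch, assuming hypothetical Bernoulli data and a Beta(5, 5)-shaped prior (all values made up):

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])   # hypothetical observations
grid = np.linspace(1e-4, 1 - 1e-4, 1000)       # grid over Theta = (0, 1)
dx = grid[1] - grid[0]

prior = grid**4 * (1 - grid)**4                # unnormalized Beta(5, 5) pdf
loglik = x.sum() * np.log(grid) + (len(x) - x.sum()) * np.log(1 - grid)
unnorm = prior * np.exp(loglik)                # pi(theta) * L_n(X | theta)

posterior = unnorm / (unnorm.sum() * dx)       # constant = integral over Theta
print("posterior mean ≈", (grid * posterior).sum() * dx)
```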
Bernoulli experiment with a Beta prior
In the kiss example:
- $p \sim \text{Beta}(a, a)$:
$$\pi(p) \propto p^{a-1}(1-p)^{a-1}, \qquad p \in (0, 1).$$
- Given $p$, $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} \text{Ber}(p)$, so
$$L_n(X_1, \dots, X_n \mid p) = p^{\sum_{i=1}^n X_i}(1-p)^{n - \sum_{i=1}^n X_i}.$$
- Hence,
$$\pi(p \mid X_1, \dots, X_n) \propto p^{a-1+\sum_{i=1}^n X_i}(1-p)^{a-1+n-\sum_{i=1}^n X_i}.$$
- The posterior distribution is $\text{Beta}\Big(a + \sum_{i=1}^n X_i,\; a + n - \sum_{i=1}^n X_i\Big)$.
Non-informative priors
- We can still use a Bayesian approach if we have no prior information about the parameter. How do we pick the prior $\pi$?
- Good candidate: $\pi(\theta) \propto 1$, i.e., a constant pdf on $\Theta$.
- If $\Theta$ is bounded, this is the uniform prior on $\Theta$.
- If $\Theta$ is unbounded, this does not define a proper pdf on $\Theta$!
- An improper prior on $\Theta$ is a measurable, nonnegative function $\pi(\cdot)$ defined on $\Theta$ that is not integrable.
- In general, one can still define a posterior distribution from an improper prior, using Bayes' formula.
Examples
- If $p \sim \text{Unif}(0, 1)$ and, given $p$, $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} \text{Ber}(p)$:
$$\pi(p \mid X_1, \dots, X_n) \propto p^{\sum_{i=1}^n X_i}(1-p)^{n - \sum_{i=1}^n X_i},$$
i.e., the posterior distribution is $\text{Beta}\Big(1 + \sum_{i=1}^n X_i,\; 1 + n - \sum_{i=1}^n X_i\Big)$.
- If $\pi(\theta) = 1$, $\forall \theta \in \mathbb{R}$ (improper) and, given $\theta$, $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\theta, 1)$:
$$\pi(\theta \mid X_1, \dots, X_n) \propto \exp\Big(-\frac{1}{2}\sum_{i=1}^n (X_i - \theta)^2\Big),$$
i.e., the posterior distribution is $\mathcal{N}(\bar{X}_n, 1/n)$.
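The Gaussian claim follows by completing the square in $\theta$ (a standard step, spelled out here for clarity):
$$\exp\Big(-\frac{1}{2}\sum_{i=1}^n (X_i - \theta)^2\Big) \propto \exp\Big(-\frac{n}{2}\theta^2 + n\bar{X}_n\,\theta\Big) \propto \exp\Big(-\frac{n}{2}\big(\theta - \bar{X}_n\big)^2\Big),$$
which, as a function of $\theta$, is the $\mathcal{N}(\bar{X}_n, 1/n)$ density up to normalization.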
Jeffreys' prior
- Jeffreys prior:
$$\pi_J(\theta) \propto \sqrt{\det I(\theta)},$$
where $I(\theta)$ is the Fisher information matrix of the statistical model associated with $X_1, \dots, X_n$ in the frequentist approach (provided it exists).
- In the previous examples:
  - Bernoulli experiment: $\pi_J(p) \propto \dfrac{1}{\sqrt{p(1-p)}}$, $p \in (0, 1)$: the prior is $\text{Beta}(1/2, 1/2)$.
  - Gaussian experiment: $\pi_J(\theta) \propto 1$, $\theta \in \mathbb{R}$, is an improper prior.
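For completeness, here is the standard one-observation computation behind the Bernoulli case. The log-likelihood is $\ell(p) = X \log p + (1-X)\log(1-p)$, so $\ell''(p) = -X/p^2 - (1-X)/(1-p)^2$ and
$$I(p) = -\mathbb{E}[\ell''(p)] = \frac{\mathbb{E}[X]}{p^2} + \frac{1 - \mathbb{E}[X]}{(1-p)^2} = \frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)},$$
hence $\pi_J(p) \propto \sqrt{I(p)} = p^{-1/2}(1-p)^{-1/2}$, the unnormalized $\text{Beta}(1/2, 1/2)$ density.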
Jeffreys' prior
- Jeffreys prior satisfies a reparametrization invariance principle: if $\eta$ is a reparametrization of $\theta$ (i.e., $\eta = \phi(\theta)$ for some one-to-one map $\phi$), then the pdf $\tilde{\pi}(\cdot)$ of $\eta$ satisfies
$$\tilde{\pi}(\eta) \propto \sqrt{\det \tilde{I}(\eta)},$$
where $\tilde{I}(\eta)$ is the Fisher information of the statistical model parametrized by $\eta$ instead of $\theta$.
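In one dimension, this invariance is a direct consequence of the change-of-variables formula and the chain rule for Fisher information (a standard derivation, added here): writing $\theta = \phi^{-1}(\eta)$,
$$\tilde{I}(\eta) = I(\theta)\left(\frac{d\theta}{d\eta}\right)^2 \quad \Longrightarrow \quad \tilde{\pi}(\eta) = \pi_J(\theta)\left|\frac{d\theta}{d\eta}\right| \propto \sqrt{I(\theta)}\left|\frac{d\theta}{d\eta}\right| = \sqrt{\tilde{I}(\eta)}.$$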
Bayesian confidence regions
- For $\alpha \in (0, 1)$, a Bayesian confidence region with level $\alpha$ is a random subset $R$ of the parameter space $\Theta$, depending on the sample $X_1, \dots, X_n$, such that:
$$\mathbb{P}[\theta \in R \mid X_1, \dots, X_n] = 1 - \alpha.$$
- Note that $R$ depends on the prior $\pi(\cdot)$.
- "Bayesian confidence region" and "confidence interval" are two distinct notions.
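As a concrete sketch, an equal-tailed Bayesian confidence region for the kiss example can be read off the quantiles of the Beta posterior. The counts below are hypothetical, and `scipy.stats.beta.ppf` gives the posterior quantiles:

```python
from scipy.stats import beta

n, s, a = 124, 80, 0.5                 # hypothetical counts; Beta(1/2, 1/2) prior
posterior = beta(a + s, a + n - s)

alpha = 0.05
lo, hi = posterior.ppf([alpha / 2, 1 - alpha / 2])   # equal-tailed region
print(f"Bayesian confidence region at level {alpha}: [{lo:.3f}, {hi:.3f}]")
```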
Bayesian estimation
- The Bayesian framework can also be used to estimate the true underlying parameter (hence, in a frequentist approach).
- In this case, the prior distribution does not reflect a prior belief: it is just an artificial tool used in order to define a new class of estimators.
- Back to the frequentist approach: the sample $X_1, \dots, X_n$ is associated with a statistical model $(E, (\mathbb{P}_\theta)_{\theta \in \Theta})$.
- Define a prior (possibly improper) with pdf $\pi$ on the parameter space $\Theta$.
- Compute the posterior pdf $\pi(\cdot \mid X_1, \dots, X_n)$ associated with $\pi$.
Bayesian estimation
- Bayes estimator:
$$\hat{\theta}^{(\pi)} = \int_\Theta \theta\, \pi(\theta \mid X_1, \dots, X_n)\, d\theta.$$
This is the posterior mean.
- The Bayes estimator depends on the choice of the prior distribution $\pi$ (hence the superscript $\pi$).
- Another popular choice is the point that maximizes the posterior distribution, provided it is unique. It is called the MAP (maximum a posteriori):
$$\hat{\theta}^{\mathrm{MAP}} = \underset{\theta \in \Theta}{\operatorname{argmax}}\ \pi(\theta \mid X_1, \dots, X_n).$$
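A small sketch computing both estimators for the Beta posterior of the kiss example, using the closed forms for a $\text{Beta}(\alpha, \beta)$ distribution: mean $\alpha/(\alpha+\beta)$ and mode $(\alpha-1)/(\alpha+\beta-2)$ when $\alpha, \beta > 1$ (data and prior parameter are hypothetical):

```python
n, s, a = 124, 80, 5.0                     # hypothetical data, Beta(a, a) prior
alpha_post, beta_post = a + s, a + n - s   # posterior Beta parameters

bayes_est = alpha_post / (alpha_post + beta_post)            # posterior mean
map_est = (alpha_post - 1) / (alpha_post + beta_post - 2)    # posterior mode (both params > 1)
print("posterior mean:", bayes_est, "| MAP:", map_est)
```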
Bayesian estimation
- In the previous examples:
  - Kiss example with prior $\text{Beta}(a, a)$ ($a > 0$):
$$\hat{p}^{(\pi)} = \frac{a + \sum_{i=1}^n X_i}{2a + n} = \frac{a/n + \bar{X}_n}{2a/n + 1}.$$
In particular, for $a = 1/2$ (Jeffreys prior),
$$\hat{p}^{(\pi_J)} = \frac{1/(2n) + \bar{X}_n}{1/n + 1}.$$
  - Gaussian example with Jeffreys prior: $\hat{\theta}^{(\pi_J)} = \bar{X}_n$.
- In each of these examples, the Bayes estimator is consistent and asymptotically normal.
- In general, the asymptotic properties of the Bayes estimator do not depend on the choice of the prior.
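A quick simulation sketch (not from the slides) illustrating the consistency claim: under made-up values of the true $p$ and the prior parameter $a$, the posterior mean approaches $p$ as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(1)
true_p, a = 0.645, 5.0                     # hypothetical truth and prior parameter

for n in (10, 100, 10_000):
    x = rng.binomial(1, true_p, size=n)
    p_hat = (a + x.sum()) / (2 * a + n)    # Bayes estimator under Beta(a, a)
    print(f"n={n:>6}: p_hat = {p_hat:.4f}")
```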