Variational Inference for the univariate Gaussian
Tomonari MASADA @ Nagasaki University
April 10, 2018
1 Approximate posterior
We consider a univariate Gaussian model for a set of real-valued observations. (Our discussion
is based on Section 10.1.3 of PRML by C. M. Bishop.)
When we have a set of N observations X = {x1, . . . , xN }, the likelihood function is given by
\[
p(X \mid \mu, \tau) = \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x_n - \mu)^2}{2\sigma^2} \right\}
= \left( \frac{\tau}{2\pi} \right)^{N/2} \exp\left\{ -\frac{\tau}{2} \sum_{n=1}^{N} (x_n - \mu)^2 \right\} \tag{1}
\]
where $\tau \equiv 1/\sigma^2$ is called the precision.
We introduce conjugate prior distributions for µ and τ as follows:
\begin{align}
p(\mu \mid \tau; \mu_0, \lambda_0) &= \mathcal{N}(\mu \mid \mu_0, (\lambda_0\tau)^{-1}) \tag{2} \\
p(\tau; a_0, b_0) &= \mathrm{Gam}(\tau \mid a_0, b_0) \tag{3}
\end{align}
where Gam(τ|a0, b0) is the gamma distribution.
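As a running illustration, the following minimal Python sketch (ours, not part of the original note) draws synthetic data from this model and fixes the prior hyperparameters. All names (rng, X, mu0, lam0, a0, b0) are illustrative choices, not notation from the note.
\begin{verbatim}
# A minimal sketch (illustrative, not from the note): synthetic data
# from the univariate Gaussian and hyperparameters for priors (2)-(3).
import numpy as np

rng = np.random.default_rng(0)
N = 50
true_mu, true_tau = 1.5, 4.0        # tau is the precision, 1/sigma^2
X = rng.normal(true_mu, 1.0 / np.sqrt(true_tau), size=N)

mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0   # prior hyperparameters
\end{verbatim}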
Although the posterior can be found exactly for this problem, we approximate it by a factorized
distribution
\[
q(\mu, \tau) = q(\mu)\, q(\tau) \tag{4}
\]
The full joint distribution can be obtained as follows:
\begin{align}
p(X, \mu, \tau; \mu_0, \lambda_0, a_0, b_0) &= p(X \mid \mu, \tau)\, p(\mu \mid \tau; \mu_0, \lambda_0)\, p(\tau; a_0, b_0) \notag \\
&= \left( \frac{\tau}{2\pi} \right)^{N/2} \exp\left\{ -\frac{\tau}{2} \sum_{n=1}^{N} (x_n - \mu)^2 \right\}
\times \left( \frac{\lambda_0\tau}{2\pi} \right)^{1/2} \exp\left\{ -\frac{\lambda_0\tau}{2} (\mu - \mu_0)^2 \right\} \notag \\
&\quad \times \frac{1}{\Gamma(a_0)} b_0^{a_0} \tau^{a_0-1} \exp(-b_0\tau) \tag{5}
\end{align}
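The log of (5) is straightforward to evaluate numerically. Here is a hedged sketch of a helper (our own function, not from the note) that we reuse below to check the lower bound:
\begin{verbatim}
from scipy.special import gammaln

def log_joint(X, mu, tau, mu0, lam0, a0, b0):
    """Log of the full joint (5): Gaussian likelihood times the
    conditional Gaussian prior on mu times the gamma prior on tau."""
    N = len(X)
    ll = 0.5 * N * np.log(tau / (2 * np.pi)) \
         - 0.5 * tau * np.sum((X - mu) ** 2)
    lp_mu = 0.5 * np.log(lam0 * tau / (2 * np.pi)) \
            - 0.5 * lam0 * tau * (mu - mu0) ** 2
    lp_tau = a0 * np.log(b0) - gammaln(a0) \
             + (a0 - 1) * np.log(tau) - b0 * tau
    return ll + lp_mu + lp_tau
\end{verbatim}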
A lower bound of the log evidence can be obtained as follows:
\begin{align}
\ln p(X; \mu_0, \lambda_0, a_0, b_0) &= \ln \iint p(X, \mu, \tau; \mu_0, \lambda_0, a_0, b_0)\, d\mu\, d\tau \tag{6} \\
&= \ln \iint p(X \mid \mu, \tau)\, p(\mu \mid \tau; \mu_0, \lambda_0)\, p(\tau; a_0, b_0)\, d\mu\, d\tau \tag{7} \\
&= \ln \iint q(\mu) q(\tau) \frac{p(X \mid \mu, \tau)\, p(\mu \mid \tau; \mu_0, \lambda_0)\, p(\tau; a_0, b_0)}{q(\mu) q(\tau)}\, d\mu\, d\tau \tag{8} \\
&\geq \iint q(\mu) q(\tau) \ln \frac{p(X \mid \mu, \tau)\, p(\mu \mid \tau; \mu_0, \lambda_0)\, p(\tau; a_0, b_0)}{q(\mu) q(\tau)}\, d\mu\, d\tau \tag{9}
\end{align}
The inequality in (9) is Jensen's inequality. We denote this lower bound by $\mathcal{L}$.
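Since $q(\mu)$ will turn out to be Gaussian and $q(\tau)$ a gamma distribution (derived below), $\mathcal{L}$ can be estimated by simple Monte Carlo. A sketch, assuming the log_joint helper above; the function name and defaults are our own:
\begin{verbatim}
def elbo_mc(X, muN, lamN, aN, bN, mu0, lam0, a0, b0, S=10000, seed=1):
    """Monte Carlo estimate of the lower bound L in (9):
    E_q[ln p(X, mu, tau) - ln q(mu) - ln q(tau)]."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(muN, 1.0 / np.sqrt(lamN), size=S)
    tau = rng.gamma(aN, 1.0 / bN, size=S)   # numpy's gamma takes a scale
    log_q_mu = 0.5 * np.log(lamN / (2 * np.pi)) \
               - 0.5 * lamN * (mu - muN) ** 2
    log_q_tau = aN * np.log(bN) - gammaln(aN) \
                + (aN - 1) * np.log(tau) - bN * tau
    log_p = np.array([log_joint(X, m, t, mu0, lam0, a0, b0)
                      for m, t in zip(mu, tau)])
    return np.mean(log_p - log_q_mu - log_q_tau)
\end{verbatim}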
First, we hold $q(\tau)$ fixed and maximize $\mathcal{L}$ with respect to $q(\mu)$.
\begin{align}
\mathcal{L} &= \iint q(\mu) q(\tau) \ln \frac{p(X \mid \mu, \tau)\, p(\mu \mid \tau; \mu_0, \lambda_0)\, p(\tau; a_0, b_0)}{q(\mu) q(\tau)}\, d\mu\, d\tau \tag{10} \\
&= \int q(\mu) \left\{ \int q(\tau) \ln p(X \mid \mu, \tau)\, p(\mu \mid \tau; \mu_0, \lambda_0)\, d\tau + \text{const.} \right\} d\mu - \int q(\mu) \ln q(\mu)\, d\mu + \text{const.} \tag{11}
\end{align}
Up to a constant, this is a negative KL divergence, which is maximized when
\begin{align}
\ln q(\mu) &= \int q(\tau) \ln p(X \mid \mu, \tau)\, p(\mu \mid \tau; \mu_0, \lambda_0)\, d\tau + \text{const.} \tag{12} \\
&= \int q(\tau) \ln\left[ \left( \frac{\tau}{2\pi} \right)^{N/2} \exp\left\{ -\frac{\tau}{2} \sum_{n=1}^{N} (x_n - \mu)^2 \right\} \right] d\tau \tag{13} \\
&\quad + \int q(\tau) \ln\left[ \left( \frac{\lambda_0 \tau}{2\pi} \right)^{1/2} \exp\left\{ -\frac{\lambda_0 \tau}{2} (\mu - \mu_0)^2 \right\} \right] d\tau + \text{const.} \tag{14} \\
&= -\frac{\mathbb{E}_{q(\tau)}[\tau]}{2} \left\{ \sum_{n=1}^{N} (x_n - \mu)^2 + \lambda_0 (\mu - \mu_0)^2 \right\} + \text{const.} \tag{15}
\end{align}
where $\mathbb{E}_{q(\tau)}[\cdot]$ denotes the expectation with respect to the approximate posterior $q(\tau)$. This result means
that $q(\mu)$ is a Gaussian distribution, whose mean and precision can be read off by completing the square:
\begin{align}
\sum_{n=1}^{N} (x_n - \mu)^2 + \lambda_0 (\mu - \mu_0)^2
&= N\mu^2 - 2\mu \sum_{n=1}^{N} x_n + \sum_{n=1}^{N} x_n^2 + \lambda_0\mu^2 - 2\lambda_0\mu_0\mu + \lambda_0\mu_0^2 \tag{16} \\
&= (\lambda_0 + N) \left\{ \mu^2 - 2 \frac{\lambda_0\mu_0 + N\bar{x}}{\lambda_0 + N}\, \mu \right\} + \text{const.} \tag{17} \\
&= (\lambda_0 + N) \left( \mu - \frac{\lambda_0\mu_0 + N\bar{x}}{\lambda_0 + N} \right)^2 + \text{const.} \tag{18}
\end{align}
where $\bar{x} \equiv \frac{1}{N}\sum_{n=1}^{N} x_n$. Therefore $q(\mu) = \mathcal{N}(\mu \mid \mu_N, \lambda_N^{-1})$ with
\begin{align}
\mu_N &= \frac{\lambda_0\mu_0 + N\bar{x}}{\lambda_0 + N} \tag{19} \\
\lambda_N &= (\lambda_0 + N)\, \mathbb{E}_{q(\tau)}[\tau] \tag{20}
\end{align}
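In code, the update (19)-(20) is one line each. A sketch, with our own function name:
\begin{verbatim}
def update_q_mu(X, E_tau, mu0, lam0):
    """The q(mu) update (19)-(20), given the current E_q[tau]."""
    N, xbar = len(X), X.mean()
    muN = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lamN = (lam0 + N) * E_tau
    return muN, lamN
\end{verbatim}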
Next, we hold $q(\mu)$ fixed and maximize $\mathcal{L}$ with respect to $q(\tau)$.
\begin{align}
\mathcal{L} &= \iint q(\mu) q(\tau) \ln \frac{p(X \mid \mu, \tau)\, p(\mu \mid \tau; \mu_0, \lambda_0)\, p(\tau; a_0, b_0)}{q(\mu) q(\tau)}\, d\mu\, d\tau \tag{21} \\
&= \int q(\tau) \left\{ \int q(\mu) \ln p(X \mid \mu, \tau)\, p(\mu \mid \tau; \mu_0, \lambda_0)\, d\mu + \text{const.} \right\} d\tau \tag{22} \\
&\quad + \int q(\tau) \ln p(\tau; a_0, b_0)\, d\tau - \int q(\tau) \ln q(\tau)\, d\tau + \text{const.} \tag{23}
\end{align}
This is maximized when
\begin{align}
\ln q(\tau) &= \int q(\mu) \ln p(X \mid \mu, \tau)\, p(\mu \mid \tau; \mu_0, \lambda_0)\, d\mu + \ln p(\tau; a_0, b_0) + \text{const.} \tag{24} \\
&= \int q(\mu) \ln\left[ \left( \frac{\tau}{2\pi} \right)^{N/2} \exp\left\{ -\frac{\tau}{2} \sum_{n=1}^{N} (x_n - \mu)^2 \right\}
\times \left( \frac{\lambda_0\tau}{2\pi} \right)^{1/2} \exp\left\{ -\frac{\lambda_0\tau}{2} (\mu - \mu_0)^2 \right\} \right] d\mu \tag{25} \\
&\quad + \ln\left\{ \frac{1}{\Gamma(a_0)} b_0^{a_0} \tau^{a_0-1} \exp(-b_0\tau) \right\} + \text{const.} \tag{26} \\
&= \frac{N+1}{2} \ln\tau - \frac{\tau}{2} \int q(\mu) \left\{ \sum_{n=1}^{N} (x_n - \mu)^2 + \lambda_0 (\mu - \mu_0)^2 \right\} d\mu + (a_0 - 1)\ln\tau - b_0\tau + \text{const.} \tag{27} \\
&= \left\{ a_0 + \frac{N-1}{2} \right\} \ln\tau - \left( b_0 + \frac{1}{2}\, \mathbb{E}_{q(\mu)}\left[ \sum_{n=1}^{N} (x_n - \mu)^2 + \lambda_0 (\mu - \mu_0)^2 \right] \right) \tau + \text{const.} \tag{28}
\end{align}
This result means that $q(\tau)$ is a gamma distribution $\mathrm{Gam}(\tau \mid a_N, b_N)$. Since
$\ln \mathrm{Gam}(\tau \mid a_N, b_N) = (a_N - 1)\ln\tau - b_N\tau + \text{const.}$, matching coefficients with (28)
gives $a_N - 1 = a_0 + (N-1)/2$, i.e.,
\begin{align}
a_N &= a_0 + \frac{N+1}{2} \tag{30} \\
b_N &= b_0 + \frac{1}{2}\, \mathbb{E}_{q(\mu)}\left[ \sum_{n=1}^{N} (x_n - \mu)^2 + \lambda_0 (\mu - \mu_0)^2 \right] \tag{31}
\end{align}
Consequently, since $\mathbb{E}_{q(\tau)}[\tau] = a_N / b_N$ for a gamma distribution, $\mu_N$ and $\lambda_N$ can be computed as
\begin{align}
\mu_N &= \frac{\lambda_0\mu_0 + N\bar{x}}{\lambda_0 + N} \tag{32} \\
\lambda_N &= (\lambda_0 + N)\, \frac{a_N}{b_N} \tag{33}
\end{align}
when $a_N$ and $b_N$ are given. Further, $a_N$ and $b_N$ can be computed as
\begin{align}
a_N &= a_0 + \frac{N+1}{2} \tag{34} \\
b_N &= b_0 + \frac{1}{2}\, \mathbb{E}_{q(\mu)}\left[ \sum_{n=1}^{N} (x_n - \mu)^2 + \lambda_0 (\mu - \mu_0)^2 \right] \tag{35} \\
&= b_0 + \frac{1}{2} \left\{ \sum_{n=1}^{N} x_n^2 + \lambda_0\mu_0^2 - 2(\lambda_0\mu_0 + N\bar{x})\, \mathbb{E}_{q(\mu)}[\mu] + (\lambda_0 + N)\, \mathbb{E}_{q(\mu)}[\mu^2] \right\} \tag{36} \\
&= b_0 + \frac{1}{2} \left\{ \sum_{n=1}^{N} x_n^2 + \lambda_0\mu_0^2 - 2(\lambda_0\mu_0 + N\bar{x})\mu_N + (\lambda_0 + N) \left( \frac{1}{\lambda_N} + \mu_N^2 \right) \right\} \tag{37} \\
&= b_0 + \frac{1}{2} \left\{ \sum_{n=1}^{N} x_n^2 + \lambda_0\mu_0^2 - (\lambda_0\mu_0 + N\bar{x})\mu_N + \frac{\lambda_0 + N}{\lambda_N} \right\} \tag{38}
\end{align}
where the last step uses $(\lambda_0 + N)\mu_N = \lambda_0\mu_0 + N\bar{x}$, so that $(\lambda_0 + N)\mu_N^2 = (\lambda_0\mu_0 + N\bar{x})\mu_N$.
We iterate the updates (32)-(33) and (34)-(38) until convergence, as in the sketch below.
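Putting the two updates together gives the usual coordinate ascent. A sketch, assuming the helpers defined above; note that only $a_N$ stays fixed across iterations:
\begin{verbatim}
def cavi(X, mu0, lam0, a0, b0, iters=50):
    """Alternate the updates (32)-(33) and (34)-(38)."""
    N, xbar, sumsq = len(X), X.mean(), np.sum(X ** 2)
    aN = a0 + (N + 1) / 2        # (34): constant across iterations
    E_tau = a0 / b0              # initialize E_q[tau] at the prior mean
    for _ in range(iters):
        muN, lamN = update_q_mu(X, E_tau, mu0, lam0)   # (32)-(33)
        bN = b0 + 0.5 * (sumsq + lam0 * mu0 ** 2       # (38)
                         - (lam0 * mu0 + N * xbar) * muN
                         + (lam0 + N) / lamN)
        E_tau = aN / bN          # mean of Gam(aN, bN)
    return muN, lamN, aN, bN
\end{verbatim}
At convergence, elbo_mc(X, *cavi(X, mu0, lam0, a0, b0), mu0, lam0, a0, b0) gives a Monte Carlo estimate of $\mathcal{L}$ that should sit just below the exact log evidence derived in the next section.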
2 True posterior
The full joint distribution is obtained as follows:
\begin{align}
p(X, \mu, \tau; \mu_0, \lambda_0, a_0, b_0) &= p(X \mid \mu, \tau)\, p(\mu \mid \tau; \mu_0, \lambda_0)\, p(\tau; a_0, b_0) \notag \\
&= \left( \frac{\tau}{2\pi} \right)^{N/2} \exp\left\{ -\frac{\tau}{2} \sum_{n=1}^{N} (x_n - \mu)^2 \right\}
\times \left( \frac{\lambda_0\tau}{2\pi} \right)^{1/2} \exp\left\{ -\frac{\lambda_0\tau}{2} (\mu - \mu_0)^2 \right\} \notag \\
&\quad \times \frac{1}{\Gamma(a_0)} b_0^{a_0} \tau^{a_0-1} \exp(-b_0\tau) \tag{39}
\end{align}
The $\mu$-dependent factors can be rewritten by completing the square in $\mu$:
\begin{align}
&\exp\left\{ -\frac{\tau}{2} \sum_{n=1}^{N} (x_n - \mu)^2 \right\} \times \exp\left\{ -\frac{\lambda_0\tau}{2} (\mu - \mu_0)^2 \right\} \tag{40} \\
&= \exp\left\{ -\frac{\tau}{2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{\lambda_0\tau}{2} (\mu - \mu_0)^2 \right\} \tag{41} \\
&= \exp\left[ -\frac{\tau}{2} \left\{ N\mu^2 - 2N\bar{x}\mu + N\overline{x^2} + \lambda_0\mu^2 - 2\lambda_0\mu_0\mu + \lambda_0\mu_0^2 \right\} \right] \tag{42} \\
&= \exp\left[ -\frac{\tau}{2} \left\{ (\lambda_0 + N)\mu^2 - 2(\lambda_0\mu_0 + N\bar{x})\mu + (\lambda_0\mu_0^2 + N\overline{x^2}) \right\} \right] \tag{43} \\
&= \exp\left[ -\frac{\tau(\lambda_0 + N)}{2} \left\{ \mu^2 - 2\frac{\lambda_0\mu_0 + N\bar{x}}{\lambda_0 + N}\,\mu \right\} - \frac{\tau}{2} (\lambda_0\mu_0^2 + N\overline{x^2}) \right] \tag{44} \\
&= \exp\left[ -\frac{\tau(\lambda_0 + N)}{2} \left\{ \left( \mu - \frac{\lambda_0\mu_0 + N\bar{x}}{\lambda_0 + N} \right)^2 - \left( \frac{\lambda_0\mu_0 + N\bar{x}}{\lambda_0 + N} \right)^2 \right\} - \frac{\tau}{2} (\lambda_0\mu_0^2 + N\overline{x^2}) \right] \tag{45} \\
&= \exp\left[ -\frac{\tau(\lambda_0 + N)}{2} \left( \mu - \frac{\lambda_0\mu_0 + N\bar{x}}{\lambda_0 + N} \right)^2 + \frac{\tau}{2} \left\{ \frac{(\lambda_0\mu_0 + N\bar{x})^2}{\lambda_0 + N} - (\lambda_0\mu_0^2 + N\overline{x^2}) \right\} \right] \tag{46} \\
&= \exp\left\{ -\frac{\tau(\lambda_0 + N)}{2} \left( \mu - \frac{\lambda_0\mu_0 + N\bar{x}}{\lambda_0 + N} \right)^2 \right\}
\times \exp\left[ \frac{\tau}{2} \left\{ \frac{(\lambda_0\mu_0 + N\bar{x})^2}{\lambda_0 + N} - (\lambda_0\mu_0^2 + N\overline{x^2}) \right\} \right] \tag{47}
\end{align}
where $\overline{x^2} \equiv \frac{1}{N}\sum_{n=1}^{N} x_n^2$.
Therefore, carrying out the Gaussian integral over $\mu$,
\begin{align}
&\int \exp\left\{ -\frac{\tau}{2} \sum_{n=1}^{N} (x_n - \mu)^2 \right\} \times \exp\left\{ -\frac{\lambda_0\tau}{2} (\mu - \mu_0)^2 \right\} d\mu \tag{49} \\
&= \left( \frac{2\pi}{\tau(\lambda_0 + N)} \right)^{1/2} \exp\left[ \frac{\tau}{2} \left\{ \frac{(\lambda_0\mu_0 + N\bar{x})^2}{\lambda_0 + N} - (\lambda_0\mu_0^2 + N\overline{x^2}) \right\} \right] \tag{50}
\end{align}
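The identity (49)-(50) is easy to spot-check numerically. A sketch using scipy quadrature (function names are ours; use a small $\tau$ and a small dataset so the integrand does not underflow):
\begin{verbatim}
from scipy.integrate import quad

def lhs_49(tau, X, mu0, lam0):
    """Numerical integral on the left-hand side of (49)."""
    f = lambda mu: np.exp(-0.5 * tau * np.sum((X - mu) ** 2)
                          - 0.5 * lam0 * tau * (mu - mu0) ** 2)
    return quad(f, -np.inf, np.inf)[0]

def rhs_50(tau, X, mu0, lam0):
    """Closed form on the right-hand side of (50)."""
    N, xbar, x2bar = len(X), X.mean(), np.mean(X ** 2)
    A = lam0 * mu0 + N * xbar
    return np.sqrt(2 * np.pi / (tau * (lam0 + N))) * np.exp(
        0.5 * tau * (A ** 2 / (lam0 + N)
                     - (lam0 * mu0 ** 2 + N * x2bar)))

# e.g. np.isclose(lhs_49(0.5, X[:5], mu0, lam0),
#                 rhs_50(0.5, X[:5], mu0, lam0))  -> True
\end{verbatim}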
\begin{align}
p(X, \tau; \mu_0, \lambda_0, a_0, b_0) &= \int p(X, \mu, \tau; \mu_0, \lambda_0, a_0, b_0)\, d\mu \tag{51} \\
&= \left( \frac{\tau}{2\pi} \right)^{N/2} \left( \frac{\lambda_0\tau}{2\pi} \right)^{1/2} \frac{1}{\Gamma(a_0)} b_0^{a_0} \tau^{a_0-1} \exp(-b_0\tau) \notag \\
&\quad \times \left( \frac{2\pi}{\tau(\lambda_0 + N)} \right)^{1/2} \exp\left[ \frac{\tau}{2} \left\{ \frac{(\lambda_0\mu_0 + N\bar{x})^2}{\lambda_0 + N} - (\lambda_0\mu_0^2 + N\overline{x^2}) \right\} \right] \tag{52} \\
&= \left( \frac{1}{2\pi} \right)^{N/2} \left( \frac{\lambda_0}{\lambda_0 + N} \right)^{1/2} \frac{b_0^{a_0}}{\Gamma(a_0)}\, \tau^{a_0 + N/2 - 1} \exp(-\tilde{b}\tau) \tag{53}
\end{align}
where we collect the coefficient of $\tau$ in the exponent into the shorthand
\[
\tilde{b} \equiv b_0 + \frac{1}{2} \left\{ \lambda_0\mu_0^2 + N\overline{x^2} - \frac{(\lambda_0\mu_0 + N\bar{x})^2}{\lambda_0 + N} \right\} \tag{54}
\]
Therefore, the evidence is given by
\begin{align}
p(X; \mu_0, \lambda_0, a_0, b_0) &= \int p(X, \tau; \mu_0, \lambda_0, a_0, b_0)\, d\tau \tag{56} \\
&= \left( \frac{1}{2\pi} \right)^{N/2} \left( \frac{\lambda_0}{\lambda_0 + N} \right)^{1/2} \frac{b_0^{a_0}}{\Gamma(a_0)} \int \tau^{a_0 + N/2 - 1} \exp(-\tilde{b}\tau)\, d\tau \tag{57} \\
&= \left( \frac{1}{2\pi} \right)^{N/2} \left( \frac{\lambda_0}{\lambda_0 + N} \right)^{1/2} \frac{\Gamma(a_0 + N/2)}{\Gamma(a_0)} \frac{b_0^{a_0}}{\tilde{b}^{\,a_0 + N/2}} \tag{58}
\end{align}
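For numerical work it is safer to evaluate (58) in log space. A sketch (our own helper, reusing gammaln from above):
\begin{verbatim}
def log_evidence(X, mu0, lam0, a0, b0):
    """Log of the closed-form evidence (58)."""
    N, xbar, x2bar = len(X), X.mean(), np.mean(X ** 2)
    A = lam0 * mu0 + N * xbar
    b_tilde = b0 + 0.5 * (lam0 * mu0 ** 2 + N * x2bar
                          - A ** 2 / (lam0 + N))   # shorthand (54)
    return (-0.5 * N * np.log(2 * np.pi)
            + 0.5 * np.log(lam0 / (lam0 + N))
            + gammaln(a0 + N / 2) - gammaln(a0)
            + a0 * np.log(b0) - (a0 + N / 2) * np.log(b_tilde))
\end{verbatim}
By construction $\mathcal{L} \leq \ln p(X; \mu_0, \lambda_0, a_0, b_0)$, so the Monte Carlo estimate from elbo_mc should land slightly below this value.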
Therefore, the true posterior is given by
\begin{align}
p(\mu, \tau \mid X; \mu_0, \lambda_0, a_0, b_0)
&= \frac{p(X, \mu, \tau; \mu_0, \lambda_0, a_0, b_0)}{p(X; \mu_0, \lambda_0, a_0, b_0)} \notag \\
&= \left( \frac{\lambda_0 + N}{2\pi} \right)^{1/2} \frac{\tilde{b}^{\,a_0 + N/2}}{\Gamma(a_0 + N/2)}\, \tau^{a_0 + (N-1)/2}
\exp\left[ -\frac{\tau}{2} \left\{ \sum_{n=1}^{N} (x_n - \mu)^2 + \lambda_0 (\mu - \mu_0)^2 \right\} - b_0\tau \right] \tag{59}
\end{align}
where we substituted (39) and (58) and cancelled the common factors $(1/2\pi)^{N/2}$, $\lambda_0^{1/2}$, and $b_0^{a_0}/\Gamma(a_0)$.
Note that integrating (59) over $\mu$ (equivalently, normalizing (53) as a function of $\tau$) shows that the exact marginal
posterior of $\tau$ is $\mathrm{Gam}(\tau \mid a_0 + N/2, \tilde{b})$, whereas the variational factor is
$q(\tau) = \mathrm{Gam}(\tau \mid a_0 + (N+1)/2, b_N)$.
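Finally, one can compare the factorized approximation with the exact posterior by moments. A sketch, assuming the helpers defined above; the posterior means of $\mu$ agree exactly, while the $\tau$ factors differ slightly in shape ($a_0 + (N+1)/2$ versus $a_0 + N/2$):
\begin{verbatim}
muN, lamN, aN, bN = cavi(X, mu0, lam0, a0, b0)
N_, xbar, x2bar = len(X), X.mean(), np.mean(X ** 2)
A = lam0 * mu0 + N_ * xbar
b_tilde = b0 + 0.5 * (lam0 * mu0 ** 2 + N_ * x2bar
                      - A ** 2 / (lam0 + N_))

# Exact tau-marginal is Gam(a0 + N/2, b_tilde); q(tau) is Gam(aN, bN).
print("E[tau] exact:", (a0 + N_ / 2) / b_tilde, "  VI:", aN / bN)
# Both posterior means of mu equal (lam0*mu0 + N*xbar) / (lam0 + N).
print("E[mu]  exact:", A / (lam0 + N_), "  VI:", muN)
\end{verbatim}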