The Multivariate Gaussian Probability
Distribution
Peter Ahrendt
IMM, Technical University of Denmark
mail : pa@imm.dtu.dk, web : www.imm.dtu.dk/∼pa
January 7, 2005
Contents
1 Definition 2
2 Functions of Gaussian Variables 4
3 Characteristic function and Moments 6
4 Marginalization and Conditional Distribution 9
4.1 Marginalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4.2 Conditional distribution . . . . . . . . . . . . . . . . . . . . . . . 10
5 Tips and Tricks 11
5.1 Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5.2 Gaussian Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.3 Useful integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Chapter 1
Definition
The definition of a multivariate gaussian probability distribution can be stated in several equivalent ways. A random vector X = [X_1, X_2, . . . , X_N] is said to have a multivariate gaussian distribution if one of the following equivalent statements is true.
• Every linear combination Y = a_1 X_1 + a_2 X_2 + . . . + a_N X_N, a_i ∈ R, is (univariate) gaussian distributed.
• There exists a random vector Z = [Z_1, . . . , Z_M] with independent, standard normally distributed components, a vector µ = [µ_1, . . . , µ_N] and an N-by-M matrix A such that X = AZ + µ.
• There exists a vector µ and a symmetric, positive semi-definite matrix Γ such that the characteristic function of X can be written φ_x(t) ≡ ⟨e^{i t^T X}⟩ = e^{i µ^T t − (1/2) t^T Γ t}.
Under the assumption that the covariance matrix Σ is non-singular, the probability density function (pdf) can be written as:

N_x(µ, Σ) = 1/√((2π)^d |Σ|) · exp( −(1/2) (x − µ)^T Σ^{−1} (x − µ) )
          = |2πΣ|^{−1/2} exp( −(1/2) (x − µ)^T Σ^{−1} (x − µ) )    (1.1)
Here µ is the mean value, Σ is the covariance matrix and | · | denotes the determinant. Note that it is possible to have multivariate gaussian distributions with a singular covariance matrix, in which case the above expression cannot be used for the pdf. In the following, however, non-singular covariance matrices will be assumed.
In one dimension, this reduces to the familiar expression for the univariate gaussian pdf:
N_x(µ, σ²) = 1/√(2πσ²) · exp( −(x − µ)²/(2σ²) )
           = 1/√(2πσ²) · exp( −(1/2) (x − µ) σ^{−2} (x − µ) )    (1.2)
Neither the univariate nor the multivariate gaussian has a closed-form expression for its cumulative distribution function.
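As an illustration of equation (1.1), the following sketch evaluates the log-density through a Cholesky factorisation rather than an explicit inverse of Σ. It assumes Python with NumPy; the helper name gaussian_logpdf is chosen purely for illustration and is not taken from any particular library.

import numpy as np

def gaussian_logpdf(x, mu, Sigma):
    """Log of N_x(mu, Sigma) from eq. (1.1); assumes Sigma is non-singular."""
    d = len(mu)
    L = np.linalg.cholesky(Sigma)                 # Sigma = L L^T
    z = np.linalg.solve(L, x - mu)                # z = L^{-1} (x - mu)
    quad = z @ z                                  # (x - mu)^T Sigma^{-1} (x - mu)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))     # log |Sigma|
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)

# Example: a 2-D gaussian evaluated at its own mean
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
print(gaussian_logpdf(mu, mu, Sigma))             # equals -0.5 * log|2 pi Sigma|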
Symmetries
It is noted that in the one-dimensional case there is a symmetry in the pdf N_x(µ, σ²), which is centered on µ. This can be seen by looking at "contour lines", i.e. setting the exponent −(x − µ)²/(2σ²) = c. It is seen that σ determines the width of the distribution.
In the multivariate case, it is similarly useful to look at −(1/2)(x − µ)^T Σ^{−1}(x − µ) = c. This is a quadratic form, and geometrically the contour curves (for fixed c) are hyperellipsoids. In 2D, these are ordinary ellipses of the form ((x − x_0)/a)² + ((y − y_0)/b)² = r², which gives symmetries along the principal axes. Similarly, the hyperellipsoids show symmetries along their principal axes.
Notation: If a random variable X has a gaussian distribution, it is written as
X∼ N(µ, Σ). The probability density function of this variable is then given by
Nx(µ, Σ).
Chapter 2
Functions of Gaussian Variables
Linear transformation and addition of variables
Let A, B ∈ M_{c×d} be c-by-d matrices and let c ∈ R^c (so that the dimensions match). Let X ∼ N(µ_x, Σ_x) and Y ∼ N(µ_y, Σ_y) be independent d-dimensional variables. Then

Z = AX + BY + c ∼ N( Aµ_x + Bµ_y + c, AΣ_x A^T + BΣ_y B^T )    (2.1)
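A quick numerical sanity check of equation (2.1) can be sketched as follows; it assumes NumPy, and the dimensions and parameter values are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(0)
d, c = 3, 2
A = rng.normal(size=(c, d))
B = rng.normal(size=(c, d))
cvec = rng.normal(size=c)
mu_x, mu_y = rng.normal(size=d), rng.normal(size=d)
Sx, Sy = 0.5 * np.eye(d), 2.0 * np.eye(d)              # simple diagonal covariances

X = rng.multivariate_normal(mu_x, Sx, size=200_000)
Y = rng.multivariate_normal(mu_y, Sy, size=200_000)
Z = X @ A.T + Y @ B.T + cvec                           # Z = AX + BY + c, sample-wise

print(Z.mean(axis=0), A @ mu_x + B @ mu_y + cvec)      # empirical vs. theoretical mean
print(np.cov(Z.T), A @ Sx @ A.T + B @ Sy @ B.T)        # empirical vs. theoretical covariance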
Transform to standard normal variables
Let X ∼ N(µ, Σ). Then

Z = Σ^{−1/2} (X − µ) ∼ N(0, I)    (2.2)

Note that Σ^{−1/2} here denotes a specific, unique matrix, although fractional powers of matrices are in general not unique. The matrix that is meant can be found from the diagonalisation Σ = UΛU^T = (UΛ^{1/2})(UΛ^{1/2})^T, where Λ is the diagonal matrix of eigenvalues of Σ and U is the matrix of eigenvectors. Then Σ^{−1/2} = (UΛ^{1/2})^{−1} = Λ^{−1/2} U^{−1}.
In the one-dimensional case, this corresponds to the transformation of X ∼ N(µ, σ²) into Y = σ^{−1}(X − µ) ∼ N(0, 1).
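The whitening transform of equation (2.2) can be sketched as follows, assuming NumPy; eigh is used because Σ is symmetric, and the helper name whiten is illustrative only.

import numpy as np

def whiten(X, mu, Sigma):
    """Map samples X (one per row) to Z = Sigma^{-1/2} (X - mu), cf. eq. (2.2)."""
    eigval, U = np.linalg.eigh(Sigma)                  # Sigma = U diag(eigval) U^T
    W = np.diag(1.0 / np.sqrt(eigval)) @ U.T           # Lambda^{-1/2} U^T (= U^{-1}, U orthogonal)
    return (X - mu) @ W.T

rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0])
Sigma = np.array([[3.0, 1.0], [1.0, 2.0]])
X = rng.multivariate_normal(mu, Sigma, size=100_000)
Z = whiten(X, mu, Sigma)
print(Z.mean(axis=0))       # approximately [0, 0]
print(np.cov(Z.T))          # approximately the identity matrix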
Addition
Let X_i ∼ N(µ_i, Σ_i), i = 1, . . . , N, be independent variables. Then

∑_{i=1}^{N} X_i ∼ N( ∑_{i=1}^{N} µ_i , ∑_{i=1}^{N} Σ_i )    (2.3)
Note: This is a direct implication of equation (2.1).
Quadratic
Let X_i ∼ N(0, 1), i = 1, . . . , N, be independent variables. Then

∑_{i=1}^{N} X_i² ∼ χ²_N    (2.4)

Alternatively, let X ∼ N(µ_x, Σ_x). Then

Z = (X − µ_x)^T Σ_x^{−1} (X − µ_x) ∼ χ²_N    (2.5)

This is, however, the same thing, since Z = X̃^T X̃ = ∑_{i=1}^{N} X̃_i², where X̃_i are the decorrelated components (see eqn. (2.2)).
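A small simulation can illustrate equation (2.5); the sketch below assumes NumPy and SciPy, with an arbitrary 3-dimensional example.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu = np.array([0.5, -1.0, 2.0])
Sigma = np.array([[2.0, 0.4, 0.0],
                  [0.4, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])
X = rng.multivariate_normal(mu, Sigma, size=100_000)

diff = X - mu
Z = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)    # (x-mu)^T Sigma^{-1} (x-mu) per sample
print(Z.mean(), 3.0)                                              # a chi^2_3 variable has mean 3
print(stats.kstest(Z, 'chi2', args=(3,)).pvalue)                  # should not reject the chi^2_3 hypothesis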
Chapter 3
Characteristic function and Moments
The characteristic function of the univariate gaussian distribution is given by φ_x(t) ≡ ⟨e^{itX}⟩ = e^{itµ − σ²t²/2}. The generalization to multivariate gaussian distributions is

φ_x(t) ≡ ⟨e^{i t^T X}⟩ = e^{i µ^T t − (1/2) t^T Σ t}    (3.1)
The pdf p(x) is related to the characteristic function by

p(x) = (1/(2π)^d) ∫_{R^d} φ_x(t) e^{−i t^T x} dt    (3.2)

It is seen that the characteristic function is the inverse Fourier transform of the pdf.
Moments of a pdf are generally defined as:

⟨X_1^{k_1} X_2^{k_2} · · · X_N^{k_N}⟩ ≡ ∫_{R^d} x_1^{k_1} x_2^{k_2} · · · x_N^{k_N} p(x) dx    (3.3)

where ⟨X_1^{k_1} X_2^{k_2} · · · X_N^{k_N}⟩ is the k'th order moment, k = [k_1, k_2, . . . , k_N] (k_i ∈ N) and k = k_1 + k_2 + . . . + k_N. A well-known example is the first order moment, called the mean value µ_i (of variable X_i), or the mean µ ≡ [µ_1 µ_2 . . . µ_N] of the whole random vector X.
The k'th order central moment is defined as above, but with X_i replaced by X_i − µ_i in equation (3.3). An example is the second order central moment, called the variance, which is given by ⟨(X_i − µ_i)²⟩.
Any moment (that exists) can be found from the characteristic function [8]:
⟨X_1^{k_1} X_2^{k_2} · · · X_N^{k_N}⟩ = (−j)^k ∂^k φ_x(t) / ( ∂t_1^{k_1} · · · ∂t_N^{k_N} ) |_{t=0}    (3.4)

where k = k_1 + k_2 + . . . + k_N.
1. Order Moments
Mean µ ≡ ⟨X⟩    (3.5)
2. Order Moments
Variance c_ii ≡ ⟨(X_i − µ_i)²⟩ = ⟨X_i²⟩ − µ_i²    (3.6)
Covariance c_ij ≡ ⟨(X_i − µ_i)(X_j − µ_j)⟩    (3.7)
Covariance matrix Σ ≡ ⟨(X − µ)(X − µ)^T⟩ ≡ [c_ij]    (3.8)
3. Order Moments
Often the skewness is used.
Skew(X) ≡ ⟨(X_i − ⟨X_i⟩)³⟩ / ⟨(X_i − ⟨X_i⟩)²⟩^{3/2} = ⟨(X_i − µ_i)³⟩ / ⟨(X_i − µ_i)²⟩^{3/2}    (3.9)
All 3. order central moments are zero for gaussian distributions and thus also
the skewness.
4. Order Moments
The kurtosis is (in newer literature) given as
Kurt(X) ≡ ⟨(X_i − µ_i)⁴⟩ / ⟨(X_i − µ_i)²⟩² − 3    (3.10)
Let X ∼ N(µ, Σ). Then
⟨(X_i − µ_i)(X_j − µ_j)(X_k − µ_k)(X_l − µ_l)⟩ = c_ij c_kl + c_il c_jk + c_ik c_lj    (3.11)
and
Kurt(X) = 0 (3.12)
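Both remarks are easy to check by simulation for a single gaussian component; the sketch below assumes NumPy and SciPy (scipy.stats.kurtosis returns the excess kurtosis of equation (3.10)).

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(loc=2.0, scale=3.0, size=1_000_000)    # any univariate gaussian marginal
print(stats.skew(x))          # close to 0, cf. the remark on 3rd order moments
print(stats.kurtosis(x))      # excess kurtosis, eq. (3.10): close to 0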
N. Order Moments
Any central moment of a gaussian distribution can (fairly easily) be calculated with the following method [3] (sometimes known as Wick's theorem).
Let X ∼ N(µ, Σ). Then
• Assume k is odd. Then the central k'th order moments are all zero.
• Assume k is even. Then the central k'th order moments are equal to ∑ (c_ij c_kl · · · c_xz). The sum is taken over all different permutations of the k indices, where it is noted that c_ij = c_ji. This gives (k − 1)!/(2^{k/2−1}(k/2 − 1)!) terms, each of which is the product of k/2 covariances.
An example is illustrative. The different 4. order central moments of X are found
with the above method to give
⟨(X_i − µ_i)⁴⟩ = 3c_ii²
⟨(X_i − µ_i)³(X_j − µ_j)⟩ = 3c_ii c_ij
⟨(X_i − µ_i)²(X_j − µ_j)²⟩ = c_ii c_jj + 2c_ij²
⟨(X_i − µ_i)²(X_j − µ_j)(X_k − µ_k)⟩ = c_ii c_jk + 2c_ij c_ik
⟨(X_i − µ_i)(X_j − µ_j)(X_k − µ_k)(X_l − µ_l)⟩ = c_ij c_kl + c_il c_jk + c_ik c_lj    (3.13)
The above results were found by seeing that the different permutations of the k = 4 indices are (12)(34), (13)(24) and (14)(23). Other permutations are equivalent, such as for instance (32)(14), which is equivalent to (14)(23). When calculating e.g. ⟨(X_i − µ_i)²(X_j − µ_j)(X_k − µ_k)⟩, the assignment (1 → i, 2 → i, 3 → j, 4 → k) gives the terms c_ii c_jk, c_ij c_ik and c_ij c_ik in the sum.
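A brute-force Monte Carlo check of the last identity in (3.13) can be sketched as follows; it assumes NumPy, and the indices and covariance matrix are arbitrary.

import numpy as np

rng = np.random.default_rng(3)
Sigma = np.array([[2.0, 0.5, 0.3, 0.1],
                  [0.5, 1.0, 0.2, 0.0],
                  [0.3, 0.2, 1.5, 0.4],
                  [0.1, 0.0, 0.4, 1.0]])
X = rng.multivariate_normal(np.zeros(4), Sigma, size=2_000_000)   # central moments: mu = 0

i, j, k, l = 0, 1, 2, 3
empirical = np.mean(X[:, i] * X[:, j] * X[:, k] * X[:, l])
wick = Sigma[i, j] * Sigma[k, l] + Sigma[i, l] * Sigma[j, k] + Sigma[i, k] * Sigma[l, j]
print(empirical, wick)        # the two numbers agree to within Monte Carlo error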
Calculations with moments
Let b ∈ R^c and A, B ∈ M_{c×d}. Let X and Y be random vectors and f and g vector functions. Then

⟨A f(X) + B g(X) + b⟩ = A⟨f(X)⟩ + B⟨g(X)⟩ + b    (3.14)
⟨AX + b⟩ = A⟨X⟩ + b    (3.15)
⟨⟨Y|X⟩⟩ ≡ E(E(Y|X)) = ⟨Y⟩    (3.16)

If X_i and X_j are independent, then

⟨X_i X_j⟩ = ⟨X_i⟩⟨X_j⟩    (3.17)
Chapter 4
Marginalization and Conditional Distribution
4.1 Marginalization
Marginalization is the operation of integrating out variables of the pdf of a random vector X. Assume that X is split into two parts (since the ordering of the X_i is arbitrary, this corresponds to any division of the variables), X = [X_{1:c}^T, X_{c+1:N}^T]^T = [X_1 X_2 . . . X_c X_{c+1} . . . X_N]^T. Let the pdf of X be p(x) = p(x_1, . . . , x_N). Then:

p(x_1, . . . , x_c) = ∫ · · · ∫_{R^{N−c}} p(x_1, . . . , x_N) dx_{c+1} · · · dx_N    (4.1)
The nice part about gaussian distributions is that every marginal distribution
of a gaussian distribution is itself a gaussian. More specifically, let X be split
into two parts as above and X ∼ N(µ, Σ), then :
p(x_1, . . . , x_c) = p(x_{1:c}) = N_{1:c}(µ_{1:c}, Σ_{1:c})    (4.2)

where µ_{1:c} = [µ_1, µ_2, . . . , µ_c] and

Σ_{1:c} = [ c_11  c_12  · · ·  c_1c ]
          [ c_21  c_22  · · ·  c_2c ]
          [  ⋮     ⋮     ⋱     ⋮   ]
          [ c_c1  c_c2  · · ·  c_cc ]
In words, the mean and covariance matrix of the marginal distribution are the same as the corresponding elements of the joint distribution.
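Since the marginal parameters are just sub-blocks of µ and Σ, extracting them is a one-liner; the sketch below assumes NumPy and uses an arbitrary 4-dimensional example.

import numpy as np

mu = np.array([1.0, -2.0, 0.5, 3.0])
Sigma = np.array([[2.0, 0.3, 0.1, 0.0],
                  [0.3, 1.0, 0.2, 0.1],
                  [0.1, 0.2, 1.5, 0.4],
                  [0.0, 0.1, 0.4, 1.0]])

c = 2                              # keep the first c variables
mu_marg = mu[:c]                   # mean of the marginal, cf. eq. (4.2)
Sigma_marg = Sigma[:c, :c]         # upper-left c-by-c block of Sigma
print(mu_marg, Sigma_marg)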
4.2 Conditional distribution
As in the previous section, let X = [X_{1:c}^T, X_{c+1:N}^T]^T = [X_1 X_2 . . . X_c X_{c+1} . . . X_N]^T be a division of the variables into two parts. Let X ∼ N(µ, Σ) and use the notation X = [X_{1:c}^T, X_{c+1:N}^T]^T = [X_(1)^T, X_(2)^T]^T and

µ = [ µ_(1) ]
    [ µ_(2) ]

and

Σ = [ Σ_11  Σ_12 ]
    [ Σ_21  Σ_22 ]
It is found that the conditional distribution p(x_(1)|x_(2)) is in fact again a gaussian distribution, and

X_(1)|X_(2) ∼ N( µ_(1) + Σ_12 Σ_22^{−1} (x_(2) − µ_(2)), Σ_11 − Σ_12 Σ_22^{−1} Σ_12^T )    (4.3)
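Equation (4.3) translates directly into code; the sketch below assumes NumPy, and the function name conditional_gaussian as well as the partition sizes are illustrative.

import numpy as np

def conditional_gaussian(mu, Sigma, c, x2):
    """Parameters of X_(1) | X_(2) = x2 as in eq. (4.3), with X_(1) = X_{1:c}."""
    mu1, mu2 = mu[:c], mu[c:]
    S11, S12, S22 = Sigma[:c, :c], Sigma[:c, c:], Sigma[c:, c:]
    K = S12 @ np.linalg.inv(S22)              # Sigma_12 Sigma_22^{-1}
    mu_cond = mu1 + K @ (x2 - mu2)
    Sigma_cond = S11 - K @ S12.T              # Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_12^T
    return mu_cond, Sigma_cond

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])
print(conditional_gaussian(mu, Sigma, c=1, x2=np.array([1.5, 1.0])))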
Chapter 5
Tips and Tricks
5.1 Products
Consider the product N_x(µ_a, Σ_a) · N_x(µ_b, Σ_b) and note that they both have x as their "random variable". Then

N_x(µ_a, Σ_a) · N_x(µ_b, Σ_b) = z_c N_x(µ_c, Σ_c)    (5.1)

where Σ_c = (Σ_a^{−1} + Σ_b^{−1})^{−1} and µ_c = Σ_c(Σ_a^{−1} µ_a + Σ_b^{−1} µ_b) and

z_c = |2π Σ_a Σ_b Σ_c^{−1}|^{−1/2} exp( −(1/2) (µ_a − µ_b)^T Σ_a^{−1} Σ_c Σ_b^{−1} (µ_a − µ_b) )
    = |2π(Σ_a + Σ_b)|^{−1/2} exp( −(1/2) (µ_a − µ_b)^T (Σ_a + Σ_b)^{−1} (µ_a − µ_b) )    (5.2)
In words, the product of two gaussians is another gaussian (unnormalized). This
can be generalised to a product of K gaussians with distributions Xk ∼ N(µk, Σk).
∏_{k=1}^{K} N_x(µ_k, Σ_k) = z̃ · N_x(µ̃, Σ̃)    (5.3)

where Σ̃ = ( ∑_{k=1}^{K} Σ_k^{−1} )^{−1} and µ̃ = Σ̃ ( ∑_{k=1}^{K} Σ_k^{−1} µ_k ) = ( ∑_{k=1}^{K} Σ_k^{−1} )^{−1} ( ∑_{k=1}^{K} Σ_k^{−1} µ_k ) and

z̃ = ( |2πΣ̃|^{1/2} / ∏_{k=1}^{K} |2πΣ_k|^{1/2} ) · ∏_{i<j} exp( −(1/2) (µ_i − µ_j)^T B_ij (µ_i − µ_j) )    (5.4)
where
B_ij = Σ_i^{−1} ( ∑_{k=1}^{K} Σ_k^{−1} )^{−1} Σ_j^{−1}    (5.5)
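The two-gaussian case, equations (5.1)–(5.2), is easy to check numerically; the sketch below assumes NumPy and SciPy, and the function name product_of_gaussians is illustrative only.

import numpy as np
from scipy.stats import multivariate_normal

def product_of_gaussians(mu_a, Sa, mu_b, Sb):
    """Return (z_c, mu_c, Sigma_c) with N_x(mu_a,Sa) * N_x(mu_b,Sb) = z_c * N_x(mu_c,Sigma_c)."""
    Sc = np.linalg.inv(np.linalg.inv(Sa) + np.linalg.inv(Sb))
    mu_c = Sc @ (np.linalg.solve(Sa, mu_a) + np.linalg.solve(Sb, mu_b))
    z_c = multivariate_normal.pdf(mu_a, mean=mu_b, cov=Sa + Sb)    # eq. (5.2), second form
    return z_c, mu_c, Sc

mu_a, Sa = np.array([0.0, 0.0]), np.array([[1.0, 0.2], [0.2, 1.0]])
mu_b, Sb = np.array([1.0, -1.0]), np.array([[0.5, 0.0], [0.0, 2.0]])
z_c, mu_c, Sc = product_of_gaussians(mu_a, Sa, mu_b, Sb)

x = np.array([0.3, -0.4])                                          # arbitrary test point
lhs = multivariate_normal.pdf(x, mean=mu_a, cov=Sa) * multivariate_normal.pdf(x, mean=mu_b, cov=Sb)
rhs = z_c * multivariate_normal.pdf(x, mean=mu_c, cov=Sc)
print(lhs, rhs)                                                    # both sides of eq. (5.1) agree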
5.2 Gaussian Integrals
A nice consequence of the fact that products of gaussian functions are again gaussian functions is that gaussian integrals become easier to calculate, since ∫ N_x(µ̃, Σ̃) dx = 1. Using this with equations (5.1) and (5.3) of the previous section gives the following.
∫_{R^d} N_x(µ_a, Σ_a) · N_x(µ_b, Σ_b) dx = ∫_{R^d} z_c N_x(µ_c, Σ_c) dx = z_c    (5.6)
Similarly,
∫_{R^d} ∏_{k=1}^{K} N_x(µ_k, Σ_k) dx = z̃    (5.7)
Equation (5.3) can also be used to calculate integrals such as ∫ |x|^q ( ∏_k N_x(µ_k, Σ_k) ) dx or similar by using the same technique as above.
5.3 Useful integrals
Let X ∼ N(µ, Σ) and let a ∈ R^d be an arbitrary vector. Then

⟨e^{a^T x}⟩ ≡ ∫ N_x(µ, Σ) e^{a^T x} dx = e^{a^T µ + (1/2) a^T Σ a}    (5.8)
From this expression, it is possible to find integrals such as ∫ exp(x^T A x + a^T x) dx. Another useful integral is
⟨e^{x^T A x}⟩ ≡ ∫ N_x(µ, Σ) e^{x^T A x} dx = |I − 2ΣA|^{−1/2} e^{−(1/2) µ^T (Σ − (2A)^{−1})^{−1} µ}    (5.9)

where A ∈ M_{d×d} is a non-singular matrix.
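Equation (5.8) is likewise easy to verify by simulation; the sketch below assumes NumPy with arbitrary example values.

import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.2, -0.5])
Sigma = np.array([[1.0, 0.3], [0.3, 0.8]])
a = np.array([0.7, -0.1])

X = rng.multivariate_normal(mu, Sigma, size=1_000_000)
empirical = np.mean(np.exp(X @ a))                      # <exp(a^T x)>
theory = np.exp(a @ mu + 0.5 * a @ Sigma @ a)           # right-hand side of eq. (5.8)
print(empirical, theory)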
Bibliography
[1] T. W. Anderson, An Introduction to Multivariate Statistical Analysis. Wiley, 1984.
[2] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[3] K. Triantafyllopoulos, "On the central moments of the multidimensional Gaussian distribution," The Mathematical Scientist, vol. 28, pp. 125–128, 2003.
[4] S. Roweis, "Gaussian Identities," http://www.cs.toronto.edu/~roweis/notes.html.
[5] J. Larsen, "Gaussian Integrals," Tech. Rep., http://isp.imm.dtu.dk/staff/jlarsen/pubs/frame.htm.
[6] www.Wikipedia.org.
[7] www.MathWorld.wolfram.com.
[8] P. Kidmose, "Blind Separation of Heavy Tail Signals," Ph.D. thesis, http://isp.imm.dtu.dk/staff/kidmose/pk_publications.html.