Sufficient statistics

Alessandro Ortis
Università degli Studi di Catania
Dipartimento di Matematica e Informatica
Image Processing Lab - iplab.dmi.unict.it

Parameter estimation: given a sample X = (x1, x2, …, xn)
from a population with pdf P(X|θ), we try to infer θ
from the information contained in X.
A. Ortis – Sufficient statistics

Could it be useful to find a reduced representation of X
by means of a function T(X)?
Ex:
X            T(X) = mean(X)
[4, 5, 6]    5
[5, 5, 5]    5
[3, 5, 7]    5

T(X) = 5 for [4, 5, 6], [5, 5, 5], [3, 5, 7], ...
Is there any loss of information? Have we lost useful
data, or is the representation given by T(X) enough to
infer the same information about θ contained in X?

Is it sufficient to consider only the reduced data T(X)?

Def.
A statistic T(X) is sufficient for θ if the conditional
distribution P(X|T(X)) does not depend on θ.

Example: Let (x1, x2, … xn) be a random sample of n Bernoulli(p)
trials:
$$x_i = \begin{cases} 1 & \text{with prob. } p \\ 0 & \text{with prob. } 1 - p \end{cases}$$
Can we find a sufficient statistic for p?
Considering the definition of sufficiency, can we find a function
T(X) such that P(X|T(X)) does not depend on p?

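If $T(X) = \sum_{i=1}^{n} X_i = t$, then for any sample x with $\sum_i x_i = t$ we have
$$P(X = x \mid T(X) = t) = \frac{P(X = x)}{P(T(X) = t)} = \frac{p^t (1-p)^{n-t}}{\binom{n}{t} p^t (1-p)^{n-t}} = \frac{1}{\binom{n}{t}}$$
This conditional distribution does not depend on p!
Once the value of T(X) is known, no other function of X
provides any additional information about p.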
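The claim that P(X | T(X) = t) = 1 / C(n, t) for every p can be checked numerically. The following is a minimal sketch, assuming i.i.d. Bernoulli(p) trials with 0 < p < 1; the helper names `bernoulli_pmf` and `conditional_given_sum` are hypothetical, introduced only for illustration. P(T(X) = t) is obtained by brute-force enumeration of {0,1}^n rather than from the binomial formula, so the check does not presuppose the result.

```python
from itertools import product
from math import comb, isclose


def bernoulli_pmf(x, p):
    """Joint probability of the 0/1 sample x under i.i.d. Bernoulli(p)."""
    prob = 1.0
    for xi in x:
        prob *= p if xi == 1 else 1 - p
    return prob


def conditional_given_sum(x, p):
    """P(X = x | T(X) = sum(x)), with T(X) = sum of the trials."""
    n, t = len(x), sum(x)
    # P(T(X) = t): add up the joint pmf of every sample with the same sum
    p_t = sum(bernoulli_pmf(y, p) for y in product((0, 1), repeat=n) if sum(y) == t)
    return bernoulli_pmf(x, p) / p_t


n = 4
for p in (0.1, 0.5, 0.9):
    for x in product((0, 1), repeat=n):
        # the conditional probability is the same for every p: 1 / C(n, t)
        assert isclose(conditional_given_sum(x, p), 1 / comb(n, sum(x)))
print("P(X | T(X)) = 1 / C(n, t) for every p tested")
```

A finite grid of p values is only a sanity check, not a proof; the algebraic cancellation of p^t (1 − p)^(n − t) is what guarantees the result for every p.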

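A sufficient statistic T(X) reduces X in two senses:
1) it reduces the dimensionality of the data (in the Bernoulli example, n observations are summarized by a single number);
2) T(X) takes fewer possible values than X (for n Bernoulli trials, n + 1 values of Σ Xi instead of 2^n samples).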

A statistic T(X) induces a partition on the sample space.
Given a value t, we can define the subset
$$A_t = \{X : T(X) = t\}$$

Example: Bernoulli population with n = 3 and T(X) = x1 + x2 + x3. The sample space of X is
{(0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1)}

t    Induced subset A_t
0    {(0,0,0)}
1    {(0,0,1), (0,1,0), (1,0,0)}
2    {(0,1,1), (1,0,1), (1,1,0)}
3    {(1,1,1)}
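The partition in the table can be generated mechanically by grouping every point of the sample space according to the value of T. A minimal sketch in Python; `induced_partition` is a hypothetical helper name, and `sum` plays the role of T(X) = x1 + x2 + x3.

```python
from collections import defaultdict
from itertools import product


def induced_partition(n, statistic):
    """Group every point of {0,1}^n into the subsets A_t = {X : T(X) = t}."""
    cells = defaultdict(list)
    # product yields the sample points in lexicographic order
    for x in product((0, 1), repeat=n):
        cells[statistic(x)].append(x)
    return dict(cells)


partition = induced_partition(3, sum)  # T(X) = sum of the trials
for t in sorted(partition):
    print(t, partition[t])
# 0 [(0, 0, 0)]
# 1 [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
# 2 [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
# 3 [(1, 1, 1)]
```

The 8 sample points collapse into 4 cells; in general the 2^n points of {0,1}^n collapse into the n + 1 values of T.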

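Theorem (Fisher–Neyman factorization):
T(X) is a sufficient statistic for θ iff the likelihood
factorizes into the following form
$$L(x_1, \dots, x_n \mid \theta) = g(\theta, T(x_1, \dots, x_n)) \cdot h(x_1, \dots, x_n)$$
where g depends on the data only through T and h does not depend on θ.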

θ and X interact only via T(X)
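For the Bernoulli example the factorization is explicit: L(x1, …, xn | p) = ∏ p^xi (1 − p)^(1 − xi) = p^t (1 − p)^(n − t) with t = Σ xi, so one can take g(p, t) = p^t (1 − p)^(n − t) and h ≡ 1. The sketch below checks this numerically over every sample of size n = 5; the function names are hypothetical, and g takes the known, fixed sample size n as an extra argument.

```python
from itertools import product
from math import isclose


def likelihood(x, p):
    """L(x1..xn | p) = prod p^xi (1-p)^(1-xi) for i.i.d. Bernoulli(p)."""
    out = 1.0
    for xi in x:
        out *= p**xi * (1 - p) ** (1 - xi)
    return out


def g(p, t, n):
    """Depends on the data only through t = T(x) = sum(x)."""
    return p**t * (1 - p) ** (n - t)


def h(x):
    """Does not depend on p (for Bernoulli it is identically 1)."""
    return 1.0


n = 5
for p in (0.2, 0.5, 0.7):
    for x in product((0, 1), repeat=n):
        # L(x | p) = g(p, T(x)) * h(x) for every sample and every p tested
        assert isclose(likelihood(x, p), g(p, sum(x), n) * h(x))
```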

Def.
T is a minimal sufficient statistic if both of the following
statements are true:
1. T is sufficient
2. If S is any other sufficient statistic, then T = g(S) for some
function g

In other words, T induces the coarsest partition of the sample space
among all sufficient statistics.
A minimal sufficient statistic is the smallest sufficient
statistic and therefore represents the ultimate data
reduction with respect to estimating θ. In general, it may or
may not exist.

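Theorem:
T(X) is a minimal sufficient statistic if, for every pair of sample points x and y,
$$\frac{P(x_1, x_2, \dots, x_n \mid \theta)}{P(y_1, y_2, \dots, y_n \mid \theta)} \text{ is not a function of } \theta \iff T(x_1, x_2, \dots, x_n) = T(y_1, y_2, \dots, y_n)$$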
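In the Bernoulli case the ratio is p^(tx − ty) (1 − p)^(ty − tx) = (p / (1 − p))^(tx − ty), with tx = Σ xi and ty = Σ yi, which is free of p exactly when tx = ty; hence Σ Xi is minimal sufficient. The sketch below illustrates the criterion over all pairs of samples with n = 3. Testing "not a function of p" on a finite grid of p values is a numerical proxy assumed here for illustration, not a proof; `ratio_is_constant_in_p` is a hypothetical helper name.

```python
from itertools import product
from math import isclose


def likelihood(x, p):
    """Joint probability of the 0/1 sample x under i.i.d. Bernoulli(p)."""
    out = 1.0
    for xi in x:
        out *= p if xi == 1 else 1 - p
    return out


def ratio_is_constant_in_p(x, y, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Numerical proxy: does L(x|p) / L(y|p) take the same value at every grid point?"""
    ratios = [likelihood(x, p) / likelihood(y, p) for p in grid]
    return all(isclose(r, ratios[0]) for r in ratios)


n = 3
samples = list(product((0, 1), repeat=n))
for x in samples:
    for y in samples:
        # the ratio is free of p exactly when the two samples have the same sum
        assert ratio_is_constant_in_p(x, y) == (sum(x) == sum(y))
```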

• A minimal sufficient statistic T(X) may not exist
• If it exists, it is not unique
• Any 1-1 function of a sufficient statistic that does
not depend on θ is also a sufficient statistic
• Everything considered so far about sufficiency extends
easily to two (or more) parameters.

Example: let (x1, x2, … xn) be N(μ, σ²) observations,
with θ1 = μ and θ2 = σ². Both
$$T(X) = \left(\sum_{i=1}^{n} X_i,\ \sum_{i=1}^{n} X_i^2\right) \quad \text{and} \quad T'(X) = (\bar{X}, S^2)$$
are minimal sufficient statistics for N(μ, σ²).
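The two statistics carry the same information because, for a fixed n, each is a 1-1 function of the other: X̄ = ΣXi / n and S² = (ΣXi² − (ΣXi)² / n) / (n − 1). The sketch below assumes S² is the usual sample variance with denominator n − 1; the helper names are hypothetical, and the data are arbitrary illustrative draws from N(2, 1.5²), not results.

```python
import random
import statistics
from math import isclose


def sum_stats(xs):
    """(sum of X_i, sum of X_i^2)."""
    return sum(xs), sum(x * x for x in xs)


def mean_var_from_sums(n, s1, s2):
    """Map (sum, sum of squares) to (sample mean, sample variance S^2 with n - 1)."""
    xbar = s1 / n
    s_sq = (s2 - s1 * s1 / n) / (n - 1)
    return xbar, s_sq


def sums_from_mean_var(n, xbar, s_sq):
    """Inverse map, showing the correspondence is one-to-one for fixed n."""
    s1 = n * xbar
    s2 = (n - 1) * s_sq + n * xbar * xbar
    return s1, s2


rng = random.Random(0)
xs = [rng.gauss(2.0, 1.5) for _ in range(20)]  # illustrative draws only
n = len(xs)
s1, s2 = sum_stats(xs)
xbar, s_sq = mean_var_from_sums(n, s1, s2)
assert isclose(xbar, statistics.mean(xs))
assert isclose(s_sq, statistics.variance(xs))  # statistics.variance uses n - 1
back = sums_from_mean_var(n, xbar, s_sq)
assert isclose(back[0], s1) and isclose(back[1], s2)
```

In practice the sums are convenient to accumulate in a single pass, while (X̄, S²) are easier to interpret; sufficiency does not favour either form.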
