Stat 3203 -cluster and multi-stage sampling

Stat-3203: Sampling Technique-II
(Chapter-2: Cluster and Multi-stage Sampling)
Md. Menhazul Abedin
Lecturer
Statistics Discipline
Khulna University, Khulna-9208
Email: menhaz70@gmail.com

Objectives and Outline
Single stage cluster sampling
Cluster sampling with equal and unequal
cluster sizes
Properties
Advantages and disadvantages
Multi-stage cluster sampling (two stage)

Acknowledgement
• Daroga Singh & F. S. Chaudhary
• M. Nurul Islam
• Ravindra Singh & Naurang Singh Mangat

Background…
• SRS
• Stratified
• Systematic

Cluster
• A cluster is an aggregate or group consisting
of several (non-homogeneous) population
elements

Intuition…
• Study variable: income, awareness, health status,
etc.
• Ghatbhogh, Rupsa,
Naihati
• PSU: primary sampling
unit
• Single-stage sampling: select a sample of PSUs and
collect information from all individuals in each
selected PSU

Intuition…
• Upazila → Union
• Two-stage sampling:
PSU → SSU

Intuition…
• Study variable: income, awareness, health status, etc.
• Multistage sampling:
Division → District → Upazila → Union → Village → Household

Why cluster sampling?
• Feasibility: No sampling frame needed
• Economy: Reduction of cost
• Flexibility of cluster formation: Manipulation
of cluster size possible (like political division,
administrative division, commercial capital)

Disadvantages...
• Loss of precision:
• Problems in analysis:
• Do you think any other disadvantages…?
Please insert here...

Cluster sampling and Others
• Cluster sampling and SRS
• Cluster sampling and Stratified
• Cluster sampling and Systematic

Cluster sampling
Cluster-1 Cluster-2 Cluster-3 Cluster-4 Cluster-5
Construct a sample

Definition…
• Cluster sampling is a method of sampling,
which consists of first selecting, at random
groups, called clusters of elements from the
population, and then choosing all of the
elements within each cluster to make up the
sample. (M. Nurul Islam)
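
The definition above can be sketched in a few lines of Python. This is only an illustration: the cluster names, the income values and the function name `single_stage_cluster_sample` are hypothetical, and the clusters are assumed to be selected by simple random sampling without replacement.

```python
import random

def single_stage_cluster_sample(clusters, n, seed=None):
    """Select n clusters at random and keep every element of each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n)   # the only stage of random selection
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: five clusters of household incomes.
clusters = {
    "Cluster-1": [12, 15, 11],
    "Cluster-2": [20, 22, 19],
    "Cluster-3": [9, 14, 10],
    "Cluster-4": [30, 28, 35],
    "Cluster-5": [17, 16, 18],
}

sample = single_stage_cluster_sample(clusters, n=2, seed=1)
```

Every element of a selected cluster enters the sample, so a list of clusters is the only frame this sketch needs.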

Stratified sampling
Stratum-1: size N1, sample n1
Stratum-2: size N2, sample n2
Stratum-3: size N3, sample n3
Stratum-4: size N4, sample n4
N1 + N2 + N3 + N4 = N
n1 + n2 + n3 + n4 = n

Single-stage cluster sampling (equal)
Clusters
Elements        1          2          3          ...  i          ...  N
1               y_{11}     y_{21}     y_{31}     ...  y_{i1}     ...  y_{N1}
2               y_{12}     y_{22}     y_{32}     ...  y_{i2}     ...  y_{N2}
...             ...        ...        ...        ...  ...        ...  ...
j               y_{1j}     y_{2j}     y_{3j}     ...  y_{ij}     ...  y_{Nj}
...             ...        ...        ...        ...  ...        ...  ...
M               y_{1M}     y_{2M}     y_{3M}     ...  y_{iM}     ...  y_{NM}
Cluster total   y_1        y_2        y_3        ...  y_i        ...  y_N
Cluster mean    \bar{y}_1  \bar{y}_2  \bar{y}_3  ...  \bar{y}_i  ...  \bar{y}_N
Layout of the NM population elements in clusters

Clusters
Elements        1          2          3          ...  i          ...  n
1               y_{11}     y_{21}     y_{31}     ...  y_{i1}     ...  y_{n1}
2               y_{12}     y_{22}     y_{32}     ...  y_{i2}     ...  y_{n2}
...             ...        ...        ...        ...  ...        ...  ...
j               y_{1j}     y_{2j}     y_{3j}     ...  y_{ij}     ...  y_{nj}
...             ...        ...        ...        ...  ...        ...  ...
M               y_{1M}     y_{2M}     y_{3M}     ...  y_{iM}     ...  y_{nM}
Cluster total   y_1        y_2        y_3        ...  y_i        ...  y_n
Cluster mean    \bar{y}_1  \bar{y}_2  \bar{y}_3  ...  \bar{y}_i  ...  \bar{y}_n
Layout of the nM sample elements in clusters

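• Individual cluster mean
$$\bar{y}_i = \frac{1}{M}\left(y_{i1} + y_{i2} + \cdots + y_{iM}\right) = \frac{y_i}{M} = \frac{1}{M}\sum_{j=1}^{M} y_{ij}$$
• Mean of the n cluster means
$$\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n} \bar{y}_i$$
• Sample mean (per element)
$$\bar{y} = \frac{y}{nM} = \frac{1}{nM}\sum_{i=1}^{n}\sum_{j=1}^{M} y_{ij} = \frac{1}{nM}\sum_{i=1}^{n} y_i = \frac{1}{nM}\sum_{i=1}^{n} M\bar{y}_i = \frac{1}{n}\sum_{i=1}^{n} \bar{y}_i = \bar{y}_n$$
Sample mean = mean of the n cluster means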

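• Mean of the N cluster means
$$\bar{Y}_N = \frac{1}{N}\sum_{i=1}^{N} \bar{y}_i$$
• Population mean (per element)
$$\bar{Y} = \frac{Y}{NM} = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij} = \frac{1}{NM}\sum_{i=1}^{N} y_i = \frac{1}{NM}\sum_{i=1}^{N} M\bar{y}_i = \frac{1}{N}\sum_{i=1}^{N} \bar{y}_i = \bar{Y}_N$$
Population mean = mean of the N cluster means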

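• Variance calculation (in terms of the cluster totals $y_i$):
$$V(\bar{y}_n) = \frac{N-n}{N}\,\frac{1}{n}\,\frac{1}{M^2}\,\frac{\sum_{i=1}^{N}\left(y_i - \frac{\sum_{i=1}^{N} y_i}{N}\right)^2}{N-1}$$
In terms of the cluster means $\bar{y}_i$, with $f = n/N$:
$$V(\bar{y}_n) = \frac{N-n}{N}\,\frac{1}{n}\,\frac{\sum_{i=1}^{N}\left(\bar{y}_i - \bar{Y}\right)^2}{N-1} = \frac{1-f}{n}\,S_b^2$$
• Replace $S_b^2$ by
$$s_b^2 = \frac{\sum_{i=1}^{n}\left(\bar{y}_i - \bar{y}_n\right)^2}{n-1}$$
• Estimator of $V(\bar{y}_n)$ is
$$v(\bar{y}_n) = \frac{1-f}{n}\,s_b^2$$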
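
A minimal Python sketch of the estimator $\bar{y}_n$ and its variance estimator $v(\bar{y}_n)$ for equal cluster sizes follows. The function name and the data are hypothetical; the sample is assumed to be n clusters drawn by SRS without replacement from N clusters.

```python
from statistics import mean, variance

def estimate_mean_equal_clusters(sample, N):
    """Return (ybar_n, v(ybar_n)) for a sample of n clusters of equal size M.

    sample: list of n clusters, each a list of M observed values
    N:      number of clusters in the population
    """
    n = len(sample)
    cluster_means = [mean(cluster) for cluster in sample]   # ybar_i
    y_bar_n = mean(cluster_means)                            # mean of the n cluster means
    s_b2 = variance(cluster_means)                           # s_b^2, divisor n - 1
    f = n / N                                                # sampling fraction
    return y_bar_n, (1 - f) / n * s_b2

# Hypothetical sample: n = 3 clusters of M = 4 elements, taken from N = 10 clusters.
sample = [
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 3.0, 5.0, 7.0],
    [10.0, 10.0, 12.0, 12.0],
]
estimate, var_hat = estimate_mean_equal_clusters(sample, N=10)

# With equal cluster sizes the mean per element equals the mean of cluster means.
mean_per_element = sum(sum(c) for c in sample) / (len(sample) * len(sample[0]))
```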

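• Theorem 8.1: the sample mean $\bar{y}_n$ defined above is an unbiased
estimator of the population mean $\bar{Y}$, and its variance is given below.
(The intra-cluster correlation $\rho$ is needed; it is discussed on the next
slide.)
$$V(\bar{y}_n) = \frac{(1-f)(NM-1)}{nM^2(N-1)}\,S^2\,[1 + (M-1)\rho]$$
or, approximately,
$$V(\bar{y}_n) \approx \frac{1-f}{nM}\,S^2\,[1 + (M-1)\rho]$$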

Intra-cluster correlation
• The similarity of observations within a cluster
can be quantified by the intra-cluster
correlation coefficient (ICC), sometimes also
called the intraclass correlation coefficient.
• It is very similar to the well-known
Pearson correlation coefficient, except that
instead of looking at observations of two
variables on the same object, we look at two
values of the same variable taken on two
different objects in the same cluster.
• It is calculated like an autocorrelation (discussed)

Intra-cluster correlation
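• Mean square between elements in the population
$$S^2 = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M}\left(y_{ij} - \bar{Y}\right)^2}{NM-1}$$
• Intra-cluster correlation
$$\rho = \frac{E\left[(y_{ij} - \bar{Y})(y_{ik} - \bar{Y})\right]}{E\left[(y_{ij} - \bar{Y})^2\right]} = \frac{2\sum_{i=1}^{N}\sum_{j<k}^{M}(y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{(M-1)(NM-1)\,S^2}$$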
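
The two formulas above can be computed directly when the whole population is known. The sketch below is only an illustration: the function name and the small population are hypothetical, and all clusters are assumed to have the same size M.

```python
from itertools import combinations

def intra_cluster_correlation(pop):
    """Return (S^2, rho) for a population given as N clusters of M values each."""
    N, M = len(pop), len(pop[0])
    Y_bar = sum(sum(cluster) for cluster in pop) / (N * M)
    # Mean square between elements in the population.
    S2 = sum((y - Y_bar) ** 2 for cluster in pop for y in cluster) / (N * M - 1)
    # Sum of cross-products over all pairs j < k within the same cluster.
    cross = sum(
        (a - Y_bar) * (b - Y_bar)
        for cluster in pop
        for a, b in combinations(cluster, 2)
    )
    rho = 2 * cross / ((M - 1) * (N * M - 1) * S2)
    return S2, rho

# Hypothetical population of N = 4 clusters with M = 3 elements each.
population = [[1, 2, 3], [2, 4, 9], [5, 5, 8], [0, 1, 7]]
S2, rho = intra_cluster_correlation(population)
```

Two quick checks of the behaviour: when every value inside a cluster is identical, as in `[[1, 1], [3, 3]]`, the function gives $\rho = 1$; for `[[1, 3], [1, 3]]` it gives $\rho = -1$, which is $-1/(M-1)$ with M = 2.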

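Variance in terms of $\rho$
• Start from
$$V(\bar{y}_n) = \frac{N-n}{N}\,\frac{1}{n}\,\frac{\sum_{i=1}^{N}\left(\bar{y}_i - \bar{Y}\right)^2}{N-1}$$
• Expand the squared term and relate it to $\rho$
$$V(\bar{y}_n) = \frac{(1-f)(NM-1)}{nM^2(N-1)}\,S^2\,[1 + (M-1)\rho]$$
• If N is large, $NM - 1 \approx NM$ and $N - 1 \approx N$
$$V(\bar{y}_n) \approx \frac{1-f}{nM}\,S^2\,[1 + (M-1)\rho]$$
• For simplicity we write
$$V(\bar{y}_n) = \frac{1-f}{nM}\,S^2\,[1 + (M-1)\rho]$$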
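
The expansion step can be written out as follows. This is a sketch of the standard argument, using only the definitions of $S^2$ and $\rho$ given on the intra-cluster correlation slide.

```latex
\begin{aligned}
M^2 \sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2
  &= \sum_{i=1}^{N} \Big[ \sum_{j=1}^{M} (y_{ij} - \bar{Y}) \Big]^2 \\
  &= \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2
     + 2 \sum_{i=1}^{N} \sum_{j<k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y}) \\
  &= (NM-1) S^2 + (M-1)(NM-1) S^2 \rho \\
  &= (NM-1) S^2 \, [1 + (M-1)\rho] .
\end{aligned}
```

Dividing by $M^2(N-1)$ and multiplying by $(1-f)/n$ gives the exact expression for $V(\bar{y}_n)$ in terms of $\rho$.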

Design effect
• Variance under cluster sampling
$$V(\bar{y}_n) = \frac{1-f}{nM}\,S^2\,[1 + (M-1)\rho]$$
• Variance under simple random sampling of nM elements
$$V(\bar{y}_{nM}) = \frac{NM-nM}{NM}\,\frac{S^2}{nM} = \frac{1-f}{nM}\,S^2$$
• Dividing,
$$\frac{V(\bar{y}_n)}{V(\bar{y}_{nM})} = 1 + (M-1)\rho = \text{Deff}$$
• What is the interpretation of the design effect?
– It's simple. Can you find it? Try your best.

Relationship between $\rho$, Deff and M
• $\text{Deff} = 1 + (M-1)\rho$
– See its properties when
– $\rho = 1$ [Deff = M: all the M values in a cluster are
equal]
– $M = 1$ [cluster sampling = SRS]
– $\rho = 0$ [cluster void: Deff = 1]
– $\text{Deff} = 0$ or $\rho = +1$: find the range of the intra-cluster
correlation
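
The special cases listed above can be checked numerically. The sketch below uses a hypothetical helper `design_effect`; the value M = 5 is chosen only for illustration. Since a variance ratio cannot be negative, setting Deff = 0 gives the lower limit $\rho = -1/(M-1)$, while $\rho = +1$ is the upper limit.

```python
def design_effect(M, rho):
    """Deff = 1 + (M - 1) * rho for clusters of size M."""
    return 1 + (M - 1) * rho

M = 5  # hypothetical cluster size

deff_identical = design_effect(M, 1.0)         # all values in a cluster equal: Deff = M
deff_single = design_effect(1, 0.3)            # M = 1: cluster sampling is SRS, Deff = 1
deff_void = design_effect(M, 0.0)              # rho = 0: clustering has no effect, Deff = 1
deff_lower = design_effect(M, -1 / (M - 1))    # lower limit of rho: Deff = 0
```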

Efficiency of cluster sampling
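• Relative efficiency with respect to SRS:
$$\frac{V(\bar{y}_{nM})}{V(\bar{y}_n)} = \frac{1}{1 + (M-1)\rho} = \frac{1}{\text{Deff}}$$
• Observe its characteristics when
– $\rho > 0$: cluster sampling is less efficient than SRS
– $\rho < 0$: cluster sampling is more efficient than SRS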

Single-stage cluster sampling (Equal)
• Find the optimum n and M subject to a cost
constraint.
– Ignore it provisionally

Example
• Example: 8.2
• Example: 8.3

Single stage cluster sampling with
Unequal cluster size

Single-stage cluster sampling (Unequal)
Clusters
Elements   1          2          3          ...  i          ...  N
1          y_{11}     y_{21}     y_{31}     ...  y_{i1}     ...  y_{N1}
2          y_{12}     y_{22}     y_{32}     ...  y_{i2}     ...  y_{N2}
...        ...        ...        ...        ...  ...        ...  ...
M_i        y_{1M_1}   y_{2M_2}   y_{3M_3}   ...  y_{iM_i}   ...  y_{NM_N}

• Total number of elements
$$M_0 = \sum_{i=1}^{N} M_i$$
• Total of the study variable in the i-th cluster
$$y_i = \sum_{j=1}^{M_i} y_{ij}$$
• Average number of elements per cluster
$$\bar{M} = \frac{\sum_{i=1}^{N} M_i}{N} = \frac{M_0}{N}$$

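• Population mean (1): mean per element
$$\bar{Y} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij}}{\sum_{i=1}^{N} M_i} = \frac{\sum_{i=1}^{N} M_i \bar{y}_i}{\sum_{i=1}^{N} M_i} = \frac{\sum_{i=1}^{N} M_i \bar{y}_i}{M_0}$$
• Population mean (2): mean of the cluster means
$$\bar{Y}_N = \frac{\sum_{i=1}^{N} \bar{y}_i}{N}$$
• Are they the same?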

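• Sample mean (1)
$$\bar{y}_n = \frac{\sum_{i=1}^{n} \bar{y}_i}{n}$$
Biased for $\bar{Y}$ but unbiased for $\bar{Y}_N$
• Sample mean (2), written $\bar{y}_n^{*}$ here to distinguish it from sample mean (1)
$$\bar{y}_n^{*} = \frac{N}{nM_0}\sum_{i=1}^{n} M_i \bar{y}_i = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{M_i \bar{y}_i}{\bar{M}}\right)$$
This is unbiased for $\bar{Y}$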
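
The bias statements above can be verified by listing every possible sample from a tiny population. The population below is hypothetical, and the sketch assumes n clusters are drawn by SRS without replacement, so that every set of n clusters is equally likely. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of N = 3 clusters of unequal size.
population = [[1, 2], [3, 4, 5], [10]]
N = len(population)
M0 = sum(len(c) for c in population)          # total number of elements
n = 2

Y_bar = Fraction(sum(sum(c) for c in population), M0)                 # mean per element
Y_bar_N = sum(Fraction(sum(c), len(c)) for c in population) / N       # mean of cluster means

def sample_mean_1(clusters):
    """Simple average of the sampled cluster means."""
    return sum(Fraction(sum(c), len(c)) for c in clusters) / n

def sample_mean_2(clusters):
    """N / (n * M0) times the sum of M_i * ybar_i; M_i * ybar_i is the cluster total."""
    return Fraction(N, n * M0) * sum(sum(c) for c in clusters)

samples = list(combinations(population, n))   # all equally likely samples
expected_1 = sum(sample_mean_1(s) for s in samples) / len(samples)
expected_2 = sum(sample_mean_2(s) for s in samples) / len(samples)
```

Here `expected_1` equals `Y_bar_N` but not `Y_bar`, while `expected_2` equals `Y_bar`, so the two population means answer the "Are they the same?" question in the negative whenever cluster sizes differ like this.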

• Do an example

• Further study
– Cluster sampling with PPS sampling (not needed right
now)

Multi-stage cluster sampling (Two-stage)
Background...
• A unit may contain too many elements to
obtain a measurement on each
• A unit may contain elements that are nearly
alike.

Background...
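• Relative efficiency of cluster sampling:
$$\frac{V(\bar{y}_{nM})}{V(\bar{y}_n)} = \frac{1}{1 + (M-1)\rho} \quad\text{or}\quad \frac{V(\text{SRS})}{V(\text{Cluster})} = \frac{1}{1 + (M-1)\rho}$$
– What happens when M increases?
• Cluster sampling becomes less efficient (for $\rho > 0$)
• For large clusters, draw a small sample from each cluster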
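
A short numerical illustration of the question above, assuming a fixed positive intra-cluster correlation. The value $\rho = 0.05$ and the cluster sizes are hypothetical.

```python
def relative_efficiency(M, rho):
    """V(SRS) / V(Cluster) = 1 / (1 + (M - 1) * rho)."""
    return 1 / (1 + (M - 1) * rho)

rho = 0.05                      # hypothetical positive intra-cluster correlation
cluster_sizes = [1, 5, 10, 50, 100]
efficiencies = [relative_efficiency(M, rho) for M in cluster_sizes]

for M, eff in zip(cluster_sizes, efficiencies):
    print(f"M = {M:3d}  relative efficiency = {eff:.3f}")
```

The efficiency falls steadily as M grows, which is the motivation for subsampling large clusters instead of enumerating them completely.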

• Sub-sampling (two-stage sampling)
• A two-stage cluster sample is one obtained
by first selecting a sample of clusters and then
selecting a sample of elements from
each sampled cluster.
• Village → Household (subsample)

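Cluster  M_i   Population elements                          Total                            Cluster mean
1        M_1   y_{11}, y_{12}, ..., y_{1j}, ..., y_{1M_1}   Y_1 = \sum_{j=1}^{M_1} y_{1j}    \bar{Y}_1 = Y_1 / M_1
2        M_2   y_{21}, y_{22}, ..., y_{2j}, ..., y_{2M_2}   Y_2 = \sum_{j=1}^{M_2} y_{2j}    \bar{Y}_2 = Y_2 / M_2
...      ...   ...                                          ...                              ...
i        M_i   y_{i1}, y_{i2}, ..., y_{ij}, ..., y_{iM_i}   Y_i = \sum_{j=1}^{M_i} y_{ij}    \bar{Y}_i = Y_i / M_i
...      ...   ...                                          ...                              ...
N        M_N   y_{N1}, y_{N2}, ..., y_{Nj}, ..., y_{NM_N}   Y_N = \sum_{j=1}^{M_N} y_{Nj}    \bar{Y}_N = Y_N / M_N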

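• Population total
$$Y = \sum_{i=1}^{N} Y_i = \sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij}$$
• Total number of elements
$$M_0 = \sum_{i=1}^{N} M_i$$
• Individual cluster mean
$$\bar{Y}_i = \frac{\sum_{j=1}^{M_i} y_{ij}}{M_i} = \frac{Y_i}{M_i}$$
• Population mean
$$\bar{Y} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij}}{\sum_{i=1}^{N} M_i} = \frac{Y}{M_0} = \frac{\sum_{i=1}^{N} Y_i}{M_0} = \frac{\sum_{i=1}^{N} M_i \bar{Y}_i}{M_0}$$
• Population pooled mean
$$\bar{Y}_{\text{pooled}} = \frac{\sum_{i=1}^{N} Y_i}{N} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij}}{N} = \frac{\sum_{i=1}^{N} M_i \bar{Y}_i}{N}$$
The individual cluster mean $\bar{Y}_i$ (red on the original slide) and the
pooled mean (blue on the original slide) are different.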

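Unit  M_i   m_i   Sample observations                          Total                            Cluster mean
1     M_1   m_1   y_{11}, y_{12}, ..., y_{1j}, ..., y_{1m_1}   y_1 = \sum_{j=1}^{m_1} y_{1j}    \bar{y}_1 = y_1 / m_1
2     M_2   m_2   y_{21}, y_{22}, ..., y_{2j}, ..., y_{2m_2}   y_2 = \sum_{j=1}^{m_2} y_{2j}    \bar{y}_2 = y_2 / m_2
...   ...   ...   ...                                          ...                              ...
i     M_i   m_i   y_{i1}, y_{i2}, ..., y_{ij}, ..., y_{im_i}   y_i = \sum_{j=1}^{m_i} y_{ij}    \bar{y}_i = y_i / m_i
...   ...   ...   ...                                          ...                              ...
n     M_n   m_n   y_{n1}, y_{n2}, ..., y_{nj}, ..., y_{nm_n}   y_n = \sum_{j=1}^{m_n} y_{nj}    \bar{y}_n = y_n / m_n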

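• Sample total
$$y = \sum_{i=1}^{n} y_i = \sum_{i=1}^{n}\sum_{j=1}^{m_i} y_{ij}$$
• Number of sampled elements and average subsample size
$$m_0 = \sum_{i=1}^{n} m_i, \qquad \bar{m} = \frac{m_0}{n}$$
• Average value per second-stage unit
$$\bar{y}_i = \frac{\sum_{j=1}^{m_i} y_{ij}}{m_i} = \frac{y_i}{m_i}, \qquad \bar{y} = \frac{y}{m_0}$$
• Average value per first-stage unit
$$\bar{y}_n = \frac{y}{n} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m_i} y_{ij}}{n}$$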

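• A number of estimators can be defined (as a researcher, you can define
more with good properties)
• Ordinary mean based on the first-stage unit means:
$$\bar{y}_{ts(1)} = \frac{\sum_{i=1}^{n} \bar{y}_i}{n}$$
• Estimator based on $M_0$:
$$\bar{y}_{ts(2)} = \frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}} = \frac{N}{M_0}\,\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n}$$
• Ratio estimator:
$$\bar{y}_{ts} = \frac{\sum_{i=1}^{n} M_i \bar{y}_i}{\sum_{i=1}^{n} M_i}$$
• Estimator of the total:
$$\hat{Y}_R = M_0\,\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{\sum_{i=1}^{n} M_i}$$
replace $M_0$ by $\hat{M}_0 = N\,\frac{\sum_{i=1}^{n} M_i}{n}$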

Why such Scribble functions?
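• Estimated total of the i-th cluster = $M_i \bar{y}_i$
• Estimator of the total of Y over the n selected clusters
$$\sum_{i=1}^{n} M_i \bar{y}_i$$
• Average value of Y per cluster is
$$\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n}$$
• Estimator of the total of Y over the N clusters
$$\frac{N}{n}\sum_{i=1}^{n} M_i \bar{y}_i$$
• Total = total frequency × mean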

Why such Scribble functions?
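• Writing the estimated total as total frequency × mean:
$$\frac{N}{n}\sum_{i=1}^{n} M_i \bar{y}_i = \hat{Y} = M_0 \times \text{mean} = M_0\left[\frac{N}{M_0}\,\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n}\right]$$
• $\text{mean} = \frac{\hat{Y}}{M_0}$ [estimator for $\bar{Y}$]
• Thus
$$\bar{y}_{ts(2)} = \frac{N}{M_0}\,\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n}$$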
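
The estimators defined above can be computed from two-stage sample data as in the following sketch. The function name, the returned dictionary keys and the data are hypothetical; the sketch assumes $N$, $M_0$ and the sizes $M_i$ of the sampled clusters are known.

```python
def two_stage_estimates(sample, N, M0):
    """Compute the two-stage estimators from a list of (M_i, observations) pairs.

    sample: one pair per sampled first-stage unit, where M_i is the cluster
            size and observations are the m_i subsampled values
    N:      number of clusters in the population
    M0:     total number of elements in the population
    """
    n = len(sample)
    sizes = [M_i for M_i, _ in sample]
    means = [sum(obs) / len(obs) for _, obs in sample]            # ybar_i
    weighted = sum(M_i * yb for M_i, yb in zip(sizes, means))      # sum of M_i * ybar_i
    ratio = weighted / sum(sizes)
    return {
        "ts1": sum(means) / n,              # ordinary mean of first-stage unit means
        "ts2": (N / M0) * weighted / n,     # estimator based on M0
        "ratio": ratio,                     # ratio estimator of the mean
        "total_ratio": M0 * ratio,          # ratio estimator of the total
        "total": (N / n) * weighted,        # estimator of the total over N clusters
    }

# Hypothetical data: n = 2 clusters sampled from N = 5, with M0 = 80 elements overall.
sample = [(10, [2.0, 4.0]), (20, [5.0, 7.0, 9.0])]
est = two_stage_estimates(sample, N=5, M0=80)
```

Dividing `est["total"]` by `M0` reproduces `est["ts2"]`, which is the "total = total frequency × mean" argument in code form.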

Unbiasedness...
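• Theorem 9.1: The estimator $\bar{y}_{ts(2)}$ is unbiased
and its variance is given by
$$V(\bar{y}_{ts(2)}) = (1 - f_1)\,\frac{1}{\bar{M}^2}\,\frac{S_b^2}{n} + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{N} M_i^2\,(1 - f_{2i})\,\frac{S_i^2}{m_i}$$
where $f_1 = \frac{n}{N}$ and $f_{2i} = \frac{m_i}{M_i}$
Prerequisite given on the next slide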
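
Theorem 9.1 can be checked exactly on a tiny population by enumerating every possible two-stage sample. The sketch below makes several assumptions that the slide does not state: SRS without replacement at both stages, the same subsample size m in every sampled cluster, $S_b^2$ taken as the mean square between the cluster totals $Y_i$ (divisor N − 1), and $S_i^2$ as the mean square within cluster i (divisor $M_i − 1$). The population values are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

population = [[1, 2, 3], [4, 6], [10, 20, 30]]    # hypothetical clusters
N, n, m = len(population), 2, 2                    # m_i = m for every sampled cluster
M0 = sum(len(c) for c in population)
M_bar = Fraction(M0, N)
Y_bar = Fraction(sum(map(sum, population)), M0)

def ts2(clusters, subsamples):
    """ybar_ts(2) = (N / M0) * sum(M_i * ybar_i) / n for one two-stage sample."""
    weighted = sum(len(c) * Fraction(sum(s), len(s)) for c, s in zip(clusters, subsamples))
    return Fraction(N, M0) * weighted / n

def exact_moments():
    """Exact expectation and variance of ts2 over all equally likely samples."""
    first_stage = list(combinations(population, n))
    e1 = e2 = Fraction(0)
    for clusters in first_stage:
        second_stage = list(product(*(combinations(c, m) for c in clusters)))
        values = [ts2(clusters, s) for s in second_stage]
        e1 += sum(values) / len(values)                   # E2 given the clusters
        e2 += sum(v * v for v in values) / len(values)
    e1 /= len(first_stage)                                # E1 over first-stage samples
    e2 /= len(first_stage)
    return e1, e2 - e1 * e1

def theorem_variance():
    """Variance from the Theorem 9.1 formula under the assumptions stated above."""
    totals = [sum(c) for c in population]
    mean_total = Fraction(sum(totals), N)
    S_b2 = sum((t - mean_total) ** 2 for t in totals) / (N - 1)
    within = Fraction(0)
    for c in population:
        M_i = len(c)
        c_mean = Fraction(sum(c), M_i)
        S_i2 = sum((y - c_mean) ** 2 for y in c) / (M_i - 1)
        within += M_i ** 2 * (1 - Fraction(m, M_i)) * S_i2 / m
    f1 = Fraction(n, N)
    return (1 - f1) * S_b2 / (n * M_bar ** 2) + within / (n * N * M_bar ** 2)

expected, variance = exact_moments()
```

Under these assumptions `expected` equals `Y_bar` and `variance` equals `theorem_variance()`, both as exact fractions.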

Conditional Expectation
• $E(X) = E[E(X \mid Y)]$
• $E(u) = E_1[E_2(u \mid b^*)] = \sum_j p(b^* = B^*)\,E(u \mid B^*)$
• $E_1$ is the unconditional expectation; in our context, the expectation
over the first-stage selection.
• $E_2$ is the conditional expectation; in our context, the expectation over
the second-stage selections from a given set of first-stage
units.
• $V(X) = V[E(X \mid Y)] + E[V(X \mid Y)]$
• $V(\bar{y}_{ts(2)}) = V_1[E_2(\bar{y}_{ts(2)} \mid n)] + E_1[V_2(\bar{y}_{ts(2)} \mid n)]$
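
As a worked use of the conditional expectation rule, the unbiasedness of $\bar{y}_{ts(2)}$ can be sketched as follows, assuming simple random sampling without replacement at both stages, so that $E_2(\bar{y}_i) = \bar{Y}_i$ and each cluster enters the first-stage sample with probability $n/N$.

```latex
\begin{aligned}
E(\bar{y}_{ts(2)})
  &= E_1\!\left[ E_2\!\left( \frac{N}{nM_0} \sum_{i=1}^{n} M_i \bar{y}_i \,\Big|\, n \right) \right]
   = E_1\!\left[ \frac{N}{nM_0} \sum_{i=1}^{n} M_i \bar{Y}_i \right] \\
  &= \frac{N}{nM_0} \cdot \frac{n}{N} \sum_{i=1}^{N} M_i \bar{Y}_i
   = \frac{\sum_{i=1}^{N} M_i \bar{Y}_i}{M_0}
   = \bar{Y} .
\end{aligned}
```

The variance in Theorem 9.1 follows the same pattern: the first term comes from $V_1[E_2(\cdot)]$ and the second from $E_1[V_2(\cdot)]$.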

Advantages
• More flexible than one-stage sampling
• Useful for quality control purposes
• Suitable for large surveys
• Lower cost and more convenient than stratified
sampling of the same size
