Predictive mean matching imputation in survey
sampling
Shu Yang Jae Kwang Kim
Iowa State University
June 14, 2017
Outline
1 Basic Setup
2 Main result
3 Nearest neighbor imputation
4 Variance estimation
5 Simulation study
Yang & Kim (ISU) Predictive Mean Matching Imputation June 14, 2017 2 / 31
Predictive mean matching: Basic Setup
$F_N = \{(x_i, y_i, \delta_i),\ i = 1, 2, \cdots, N\}$: finite population, where
$$\delta_i = \begin{cases} 1 & \text{if } y_i \text{ is observed} \\ 0 & \text{otherwise.} \end{cases}$$
Note that the $\delta_i$ are defined throughout the population; this is referred to as the reverse approach (Fay, 1992; Shao and Steel, 1999; Kim, Navarro, and Fuller, 2006; Berg, Kim, and Skinner, 2016).
Parameter of interest: $\mu = N^{-1}\sum_{i=1}^{N} y_i$.
Basic Setup (Cont’d)
Let A be the index set of the probability sample selected from FN.
The first-order inclusion probabilities $\pi_i$ are known in the sample.
We observe (xi , δi , δi yi ) for i ∈ A.
The imputation estimator of $\mu$ is
$$\hat\mu = \frac{1}{N}\sum_{i\in A}\frac{1}{\pi_i}\{\delta_i y_i + (1-\delta_i)y_i^*\},$$
where $y_i^*$ is an imputed value of $y_i$ for a unit $i$ with $\delta_i = 0$.
Assumptions
We assume
$$E(y_i \mid x_i) = m(x_i; \beta^*),$$
where $m(\cdot)$ is a function of $x$ known up to $\beta^*$.
We assume MAR (missing-at-random) in the sense that
P(δ = 1 | x, y) = P(δ = 1 | x)
Regression Imputation
Two-step imputation
1 Obtain a consistent estimator of $\beta^*$ by solving
$$\sum_{i\in A}\frac{1}{\pi_i}\delta_i\{y_i - m(x_i;\beta)\}g(x_i;\beta) = 0$$
for some $g(x_i;\beta)$. That is, find $\hat\beta$ that satisfies $\hat\beta = \beta^* + o_p(1)$.
2 Compute $\hat y_i = m(x_i;\hat\beta)$ and use $y_i^* = \hat y_i$ (deterministic imputation) or $y_i^* = \hat y_i + \hat e_i^*$ (stochastic imputation).
More rigorous theory can be found in Shao and Steel (1999) and Kim and
Rao (2009).
Predictive Mean Matching (PMM) Imputation
1 Obtain $\hat\beta$ satisfying $\hat\beta = \beta^* + o_p(1)$.
2 For each unit $i$ with $\delta_i = 0$, obtain a predicted value of $y_i$ as $\hat y_i = m(x_i;\hat\beta)$.
3 Find the nearest neighbor of unit $i$ from the respondents, i.e., the respondent with the minimum distance between $y_j$ and $\hat y_i$. Let $i(1)$ be the index of the nearest neighbor of unit $i$, which satisfies
$$d(y_{i(1)}, \hat y_i) \le d(y_j, \hat y_i)$$
for any $j \in A_R$, where $d(y_i, y_j) = |y_i - y_j|$ and $A_R$ is the set of respondents.
4 Use $y_i^* = y_{i(1)}$ for $\delta_i = 0$.
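The four steps above can be sketched in a few lines of Python. This is a minimal illustration with equal sampling weights and an OLS linear working model (both illustrative assumptions; the method only requires some consistent $\hat\beta$):

```python
import numpy as np

def pmm_impute(x, y, delta):
    """Predictive mean matching: fit a working model on respondents,
    predict y for nonrespondents, and donate each nonrespondent the
    observed y of the respondent closest to its predicted mean."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    resp = np.asarray(delta).astype(bool)
    # Step 1: consistent estimator of beta (OLS on respondents).
    X = np.column_stack([np.ones(len(x)), x])
    beta = np.linalg.lstsq(X[resp], y[resp], rcond=None)[0]
    # Step 2: predicted values y_hat_i = m(x_i; beta_hat).
    yhat = X @ beta
    # Steps 3-4: the nearest respondent in terms of |y_j - y_hat_i|
    # supplies the imputed value y*_i = y_{i(1)}.
    donors = y[resp]
    y_imp = y.copy()
    for i in np.flatnonzero(~resp):
        j = np.argmin(np.abs(donors - yhat[i]))
        y_imp[i] = donors[j]
    return y_imp
```

Note that the imputed values are always observed respondent values, which is what makes PMM a hot-deck method.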
PMM Imputation
PMM estimator
$$\hat\mu_{\rm pmm} = \frac{1}{N}\sum_{i\in A}\frac{1}{\pi_i}\left\{\delta_i y_i + (1-\delta_i)y_{i(1)}\right\}. \quad (1)$$
It is a hot-deck imputation (real observations are used as the imputed values).
Because it uses model information for $E(y \mid x)$, it can be efficient if the model is good.
It is a popular imputation method, but its asymptotic properties have not been investigated rigorously; asymptotic properties and variance estimation are open research problems.
Remark
Express the PMM estimator (1) as a function of ˆβ:
$$\begin{aligned}
\hat\mu_{\rm pmm}(\beta) &= N^{-1}\sum_{i\in A}\pi_i^{-1}\{\delta_i y_i + (1-\delta_i)y_{i(1)}\} \\
&= N^{-1}\left\{\sum_{i\in A}\pi_i^{-1}\delta_i y_i + \sum_{j\in A}\pi_j^{-1}(1-\delta_j)\sum_{i\in A}\delta_i d_{ij} y_i\right\} \\
&= N^{-1}\sum_{i\in A}\pi_i^{-1}\delta_i(1+\kappa_{\beta,i})y_i,
\end{aligned}$$
where
$$\kappa_{\beta,i} = \sum_{j\in A}\pi_i\pi_j^{-1}(1-\delta_j)d_{ij} \quad (2)$$
and $d_{ij} = 1$ if $y_{j(1)} = y_i$ and $d_{ij} = 0$ otherwise.
Remark (Cont’d)
If regression imputation were used, the imputation estimator would be a smooth function of $\beta$, and the standard linearization method could be used to investigate the asymptotic properties of $\hat\mu_{\rm Reg}(\hat\beta)$.
In PMM imputation, $\hat\mu_{\rm PMM}(\beta)$ is not a smooth function of $\beta$, so the standard linearization method cannot be applied.
2. Main result
Overview
Theorem 1: We will first establish the asymptotic normality of
n1/2{ˆµpmm(β∗) − µ} when β∗ is known.
Theorem 2: Next, we will establish the asymptotic normality of
n1/2{ˆµpmm(ˆβ) − µ}.
Theorem 3: Finally, we discuss the asymptotic properties of the nearest neighbor imputation estimator.
Basic Idea
Express
$$\hat\mu_{\rm pmm}(\beta) - \mu = D_n(\beta) + B_n(\beta), \quad (3)$$
where
$$D_n(\beta) = \frac{1}{N}\sum_{i\in A}\frac{1}{\pi_i}\left[m(x_i;\beta) + \delta_i(1+\kappa_{\beta,i})\{y_i - m(x_i;\beta)\}\right] - \frac{1}{N}\sum_{i=1}^{N} y_i$$
and
$$B_n(\beta) = \frac{1}{N}\sum_{i\in A}\frac{1}{\pi_i}(1-\delta_i)\{m(x_{i(1)};\beta) - m(x_i;\beta)\}.$$
The difference $m(x_{i(1)};\beta) - m(x_i;\beta)$ accounts for the matching discrepancy, and $B_n(\beta)$ contributes to the asymptotic bias of the matching estimator.
Asymptotic bias (Abadie and Imbens, 2006)
If the matching for the nearest neighbor is based on $x$, then
$$d(x_{i(1)}, x_i) = O_p(n^{-1/p}),$$
where $p$ is the dimension of $x$. Thus, for the classical nearest neighbor imputation using $x$, the asymptotic bias
$$B_n = \frac{1}{N}\sum_{i\in A}\frac{1}{\pi_i}(1-\delta_i)\{m(x_{i(1)}) - m(x_i)\}$$
satisfies $B_n = O_p(n^{-1/p})$, which is not negligible for $p \ge 2$.
For PMM, we use the scalar function $m(x)$ to find the nearest neighbor. Thus $B_n = O_p(n^{-1})$, and the PMM estimator is asymptotically unbiased.
Theorem 1
Suppose that $m(x) = E(y \mid x) = m(x;\beta^*)$ and $\sigma^2(x) = \mathrm{var}(y \mid x)$. Under the regularity conditions (skipped), we have
$$n^{1/2}\{\hat\mu_{\rm pmm}(\beta^*) - \mu\} \to N(0, V_1)$$
in distribution, as $n \to \infty$, where
$$V_1 = V^m + V^e \quad (4)$$
with
$$V^m = \frac{n}{N^2}\,E\left[V\left\{\sum_{i\in A}\pi_i^{-1}m(x_i)\;\middle|\;F_N\right\}\right],$$
$$V^e = E\left[\frac{n}{N^2}\sum_{i\in A}\left\{\pi_i^{-1}\delta_i(1+\kappa_{\beta^*,i}) - 1\right\}^2\sigma^2(x_i)\right],$$
and $\kappa_{\beta,i}$ is defined in (2).
Theorem 2
Under some additional regularity conditions, we have
$$n^{1/2}\{\hat\mu_{\rm pmm}(\hat\beta) - \mu\} \to N(0, V_2)$$
in distribution, as $n \to \infty$, with
$$V_2 = V_1 - \gamma_2^{\top} V_s^{-1}\gamma_2 + \gamma_1^{\top}\tau^{-1}V_s\tau^{-1}\gamma_1, \quad (5)$$
where $V_1$ is defined in (4), $\gamma_1 = E\{\dot m(x;\beta)\}$,
$$\gamma_2 = \operatorname{plim}\frac{1}{N}\sum_{i=1}^{N}\left\{\frac{n}{N}\frac{1}{\pi_i}(1+\kappa_{\beta^*,i}) - 1\right\}\delta_i\,\dot m(x_i;\beta^*),$$
and $\dot m(x;\beta) = \partial m(x;\beta)/\partial\beta$.
Remark
1 The second and third terms in (5),
$$V_2 - V_1 = -\gamma_2^{\top}V_s^{-1}\gamma_2 + \gamma_1^{\top}\tau^{-1}V_s\tau^{-1}\gamma_1,$$
reflect the effect of using $\hat\beta$ instead of $\beta^*$ in the PMM imputation.
2 If $n/N = o(1)$ and $m(x_i;\beta) = \beta_0 + \beta_1 x_i$ with scalar $x$, then $\gamma_1 = \gamma_2$. Furthermore, under SRS with $g(x;\beta) = \dot m(x;\beta)/\sigma^2(x)$ in the estimating function for $\hat\beta$, $V_s^{-1} = \tau^{-1}V_s\tau^{-1}$. In this case, $V_2 = V_1$.
3 In general, $V_2$ differs from $V_1$, and the effect of the sampling error of $\hat\beta$ should be reflected in the variance estimation of $\hat\mu_{\rm PMM}(\hat\beta)$.
3. Nearest neighbor imputation
Instead of using a distance function on $y$, one may use a distance function on $x$. Then the nearest neighbor of unit $i$, denoted by $i(1)$, is determined to satisfy
$$d(x_{i(1)}, x_i) \le d(x_j, x_i)$$
for $j \in A_R$, where $d(x_i, x_j)$ is the distance function between $x_i$ and $x_j$.
The NNI estimator can be written as
$$\hat\mu_{\rm NNI} = N^{-1}\sum_{i\in A}\pi_i^{-1}\left\{\delta_i y_i + (1-\delta_i)y_{i(1)}\right\}.$$
The only difference is the matching variable used to identify $i(1)$.
Asymptotic properties
We can obtain a decomposition similar to (3):
$$\hat\mu_{\rm NNI} - \mu = D_n + B_n,$$
where
$$D_n = \frac{1}{N}\sum_{i\in A}\frac{1}{\pi_i}\left[m(x_i) + \delta_i(1+\kappa_i)\{y_i - m(x_i)\}\right] - \frac{1}{N}\sum_{i=1}^{N} y_i$$
and
$$B_n = \frac{1}{N}\sum_{i\in A}\frac{1}{\pi_i}(1-\delta_i)\{m(x_{i(1)}) - m(x_i)\}.$$
The asymptotic bias $B_n = O_p(n^{-1/p})$ is not negligible for $p \ge 2$.
Bias-corrected NNI estimator
Let $\hat m(x)$ be a (nonparametric) estimator of $m(x) = E(y \mid x)$.
We can estimate $B_n$ by
$$\hat B_n = \frac{1}{N}\sum_{i\in A}\frac{1}{\pi_i}(1-\delta_i)\{\hat m(x_{i(1)}) - \hat m(x_i)\}.$$
A bias-corrected NNI estimator of $\mu$ is
$$\hat\mu_{\rm NNI,bc} = N^{-1}\sum_{i\in A}\pi_i^{-1}\{\delta_i y_i + (1-\delta_i)y_i^*\},$$
where $y_i^* = \hat m(x_i) + y_{i(1)} - \hat m(x_{i(1)})$.
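The bias correction can be sketched as follows; a k-nearest-neighbor average over respondents stands in for the nonparametric $\hat m$, and the choice of smoother and of $k = 3$ are illustrative assumptions, not part of the method:

```python
import numpy as np

def nni_bias_corrected(x, y, delta, k=3):
    """Nearest neighbor imputation with the bias correction
    y*_i = m_hat(x_i) + y_{i(1)} - m_hat(x_{i(1)})."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    resp = np.asarray(delta).astype(bool)
    xr, yr = x[resp], y[resp]

    def m_hat(x0):
        # Nonparametric estimate of E(y | x): average y over the k
        # respondents closest to x0 (illustrative smoother).
        idx = np.argsort(np.abs(xr - x0))[:k]
        return yr[idx].mean()

    y_imp = y.copy()
    for i in np.flatnonzero(~resp):
        j = np.argmin(np.abs(xr - x[i]))   # nearest neighbor i(1)
        y_imp[i] = m_hat(x[i]) + yr[j] - m_hat(xr[j])
    return y_imp
```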
Theorem 3
Under some regularity conditions, the bias-corrected NNI estimator is asymptotically equivalent to the PMM estimator with known $\beta^*$. That is,
$$n^{1/2}\{\hat\mu_{\rm NNI,bc} - \hat\mu_{\rm PMM}(\beta^*)\} = o_p(1).$$
Thus, we have
$$n^{1/2}(\hat\mu_{\rm NNI,bc} - \mu) \to N(0, V_1)$$
in distribution, as $n \to \infty$, where $V_1$ is defined in (4).
4. Replication variance estimation
If there is no nonresponse, we can use
$$\hat V_{\rm rep}(\hat\mu) = \sum_{k=1}^{L} c_k\left(\hat\mu^{(k)} - \hat\mu\right)^2$$
as a variance estimator of $\hat\mu = \sum_{i\in A} w_i y_i$, where $\hat\mu^{(k)} = \sum_{i\in A} w_i^{(k)} y_i$.
For example, in the delete-1 jackknife method under SRS, we have $L = n$, $c_k = (n-1)/n$, and
$$w_i^{(k)} = \begin{cases} (n-1)^{-1} & \text{if } i \ne k \\ 0 & \text{if } i = k. \end{cases}$$
We are interested in estimating the variance of the PMM imputation estimator.
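For the full-response case above, the delete-1 jackknife can be sketched as follows (for the sample mean under SRS it reproduces the classical identity $\hat V = s^2/n$):

```python
import numpy as np

def jackknife_variance(y):
    """Delete-1 jackknife variance estimator of the sample mean under
    SRS: L = n replicates, c_k = (n-1)/n, and replicate k gives weight
    1/(n-1) to every unit except the deleted unit k."""
    y = np.asarray(y, float)
    n = len(y)
    mu_hat = y.mean()
    # Replicate estimates: the mean with unit k deleted, for each k.
    mu_k = (y.sum() - y) / (n - 1)
    return (n - 1) / n * np.sum((mu_k - mu_hat) ** 2)
```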
4. Replication variance estimation
Approach 1
Idea: apply the bootstrap (or jackknife) method and repeat the same imputation method on each replicate.
This approach provides consistent variance estimation for regression imputation:
$$\hat\mu^{(k)}_{\rm reg,I} = \sum_{i\in A} w_i^{(k)}\left\{\delta_i y_i + (1-\delta_i)m(x_i;\hat\beta^{(k)})\right\}$$
However, for stochastic regression imputation this approach does not work, because it does not correctly capture the random imputation part.
4. Replication variance estimation
Note that
$$\hat\mu_{\rm reg,I2} = \sum_{i\in A} w_i\{\delta_i y_i + (1-\delta_i)(\hat y_i + \hat e_i^*)\} = \sum_{i\in A} w_i\{\hat y_i + \delta_i(1+\kappa_i)\hat e_i\},$$
where $\kappa_i$ is defined in (2).
Thus, we can write
$$\hat\mu_{\rm reg,I2}(\beta) = \sum_{i\in A} w_i\{m(x_i;\beta) + \delta_i(1+\kappa_i)(y_i - m(x_i;\beta))\} = \sum_{i\in A} w_i f(x_i, y_i, \delta_i, \kappa_i;\beta).$$
4. Replication variance estimation
Approach 2:
Idea: if the imputation estimator can be expressed as
$$\hat\mu_I = \sum_{i\in A} w_i f(x_i, y_i, \delta_i, \kappa_i;\hat\beta)$$
for some function $f$ of known form, we can use
$$\hat\mu_I^{(k)} = \sum_{i\in A} w_i^{(k)} f(x_i, y_i, \delta_i, \kappa_i;\hat\beta^{(k)})$$
to construct a replication variance estimator
$$\hat V_{\rm rep} = \sum_{k=1}^{L} c_k\left(\hat\mu_I^{(k)} - \hat\mu_I\right)^2.$$
Variance estimation for PMM estimator
1 Obtain the $k$-th replicate of $\hat\beta$, denoted by $\hat\beta^{(k)}$, by solving
$$\sum_{i\in A} w_i^{(k)}\delta_i\{y_i - m(x_i;\beta)\}g(x_i;\beta) = 0$$
for $\beta$.
2 Calculate the $k$-th replicate of $\hat\mu_{\rm pmm}$ by
$$\hat\mu_{\rm pmm}^{(k)} = \sum_{i\in A} w_i^{(k)}\left[m(x_i;\hat\beta^{(k)}) + \delta_i(1+\kappa_i)\{y_i - m(x_i;\hat\beta^{(k)})\}\right].$$
3 Compute
$$\hat V_{\rm rep} = \sum_{k=1}^{L} c_k\left(\hat\mu_{\rm pmm}^{(k)} - \hat\mu_{\rm pmm}\right)^2.$$
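The three steps can be sketched as follows, assuming delete-1 jackknife weights under SRS, a linear working model refit by weighted least squares on each replicate, and matching counts $\kappa_i$ already computed from the original imputation (all illustrative assumptions):

```python
import numpy as np

def pmm_replication_variance(x, y, delta, kappa):
    """Jackknife variance for the PMM estimator: refit beta on each
    replicate, then recompute the f-function form
    sum_i w_i^(k) [m(x_i; b^(k)) + delta_i (1+kappa_i){y_i - m(x_i; b^(k))}].
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    resp = np.asarray(delta).astype(bool)
    n = len(y)
    X = np.column_stack([np.ones(n), x])

    def mu_f(w):
        # Step 1: replicate beta from the weighted estimating equation
        # (weighted least squares over respondents).
        sw = np.sqrt(w * resp)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        # Step 2: replicate of mu_pmm via the f-function form.
        m = X @ beta
        return np.sum(w * (m + resp * (1 + kappa) * (y - m)))

    mu_hat = mu_f(np.full(n, 1.0 / n))
    # Step 3: combine replicates with c_k = (n-1)/n.
    reps = np.empty(n)
    for k in range(n):
        wk = np.full(n, 1.0 / (n - 1))
        wk[k] = 0.0
        reps[k] = mu_f(wk)
    return (n - 1) / n * np.sum((reps - mu_hat) ** 2)
```

With full response and $\kappa_i \equiv 0$, the f-function collapses to $\sum_i w_i y_i$, so the estimator reduces to the plain jackknife variance of the sample mean.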
5. Simulation Study
We wish to answer the following questions with a simulation study:
1 Is the bias of the PMM imputation estimator asymptotically negligible?
2 Does the bias of the NNI estimator remain significant for p ≥ 2?
3 Is the PMM imputation estimator more robust than the regression imputation estimator?
Other issues:
Efficiency comparison
Validity of variance estimation
Coverage of normal-based interval estimators.
Simulation Setup 1
Population model ($N = 50{,}000$):
1 P1: $y = -1 + x_1 + x_2 + e$
2 P2: $y = -1.167 + x_1 + x_2 + (x_1 - 0.5)^2 + (x_2 - 0.5)^2 + e$
3 P3: $y = -1.5 + x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + e$
where $x_1, x_2, x_3 \sim \mathrm{Uniform}(0, 1)$ and $x_4, x_5, x_6, e \sim N(0, 1)$.
Sampling design:
1 SRS of size $n = 400$
2 PPS of size $n = 400$ with size measure $s_i = \log(|y_i + \nu_i| + 4)$, where $\nu_i \sim N(0, 1)$.
Response probability: $\delta_i \sim \mathrm{Bernoulli}\{p(x_i)\}$, where $\mathrm{logit}\{p(x_i)\} = 0.2 + x_{1i} + x_{2i}$. The overall response rate is 75%.
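The P1 population, response mechanism, and SRS draw can be generated as follows (the seed and the use of NumPy's default generator are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 50_000, 400

# Population model P1: y = -1 + x1 + x2 + e.
x1 = rng.uniform(0, 1, N)
x2 = rng.uniform(0, 1, N)
e = rng.normal(0, 1, N)
y = -1 + x1 + x2 + e

# Response indicators: logit{p(x)} = 0.2 + x1 + x2,
# which gives roughly a 75% overall response rate.
p = 1 / (1 + np.exp(-(0.2 + x1 + x2)))
delta = rng.binomial(1, p)

# SRS of size n = 400 without replacement.
sample = rng.choice(N, size=n, replace=False)
```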
Simulation Setup 2
Imputation methods (for P1 and P2):
1 NNI using $(x_1, x_2)$
2 PMM using $\hat y_i = \hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i}$
3 Stochastic regression imputation (SRI) using $y_i^* = \hat y_i + \hat e_i^*$, where $\hat y_i = \hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i}$.
Imputation methods (for P3):
1 NNI using $(x_1, x_2, \cdots, x_6)$
2 PMM using $\hat y_i = \hat\beta_0 + \hat\beta_1 x_{1i} + \cdots + \hat\beta_6 x_{6i}$
3 SRI using $y_i^* = \hat y_i + \hat e_i^*$, where $\hat y_i = \hat\beta_0 + \hat\beta_1 x_{1i} + \cdots + \hat\beta_6 x_{6i}$.
Parameter of interest: $\theta = E(Y)$
Simulation Result 1
Table: Simulation results: bias (×10²) and S.E. (×10²) of the point estimator.
PMM NNI SRI
Bias S.E. Bias S.E. Bias S.E.
Simple Random Sampling
(P1) -0.15 6.46 -0.21 6.54 -0.23 6.44
(P2) -0.22 6.54 -0.25 6.55 -0.37 6.46
(P3) 1.90 11.85 18.59 11.06 0.11 11.17
Probability Proportional to Size Sampling
(P1) 0.05 6.46 0.13 6.37 0.18 6.53
(P2) 0.30 6.52 0.12 6.47 0.16 6.60
(P3) 1.33 10.99 17.53 10.70 0.40 11.10
PMM: predictive mean matching; NNI: nearest neighbor imputation; SRI:
stochastic regression imputation.
Simulation Result 2
Table: Simulation results: relative bias (×10²) of jackknife variance estimates and coverage rate (%) of 95% confidence intervals.
PMM NNI SRI
RB CR RB CR RB CR
Simple Random Sampling
(P1) 5 95.1 3 95.1 5 95.8
(P2) 5 95.4 3 95.3 5 95.6
(P3) 4 95.2 4 63.8 4 95.5
Probability Proportional to Size Sampling
(P1) 2 95.5 3 94.8 2 94.9
(P2) 1 95.4 0 95.3 3 94.9
(P3) 7 95.8 3 65.5 -3 95.6
PMM: predictive mean matching; NNI: nearest neighbor imputation; SRI:
stochastic regression imputation.
6. Discussion
The non-smooth nature of the PMM estimator makes its asymptotic properties difficult to investigate.
Some recent econometric papers (Andreou and Werker, 2012; Abadie and Imbens, 2016) provide key ideas for solving this problem in survey sampling.
The proof of Theorem 2 involves the martingale central limit theorem and Le Cam's third lemma.
Replication variance estimation can be constructed to properly capture the variability of the PMM estimator.
Fractional imputation can be developed to address this problem; this is a topic for future research.
  • 31. 6. Discussion The non-smoothness nature of the PMM estimator makes the asymptotic result difficult to investigate. Some recent econometric papers (Andreou and Werker, 2012; Abadie and Imbens, 2016) provides some key ideas to solve this problem in survey sampling. The proof for Theorem 2 involves Martingale central limit theorem and Le Cam’s third lemma. Replication variance estimation can be constructed to properly capture the variability of the PMM estimator. Fractional imputation can be developed to address this problem. This is a topic for future research. Yang & Kim (ISU) Predictive Mean Matching Imputation June 14, 2017 31 / 31