CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistics and Potential Applications to Distributed Data - Richard Smith, Feb 12, 2018

BLOCKING METHODS FOR SPATIAL
STATISTICS AND POTENTIAL
APPLICATIONS TO DISTRIBUTED DATA
Petrutza Caragea (1) and Richard L. Smith (2)
(1) Department of Statistics, Iowa State University
(2) Departments of STOR and Biostatistics,
UNC-Chapel Hill, and SAMSI
SAMSI Workshop on Remote Sensing
Caltech, February 12, 2018

This talk is based primarily on the second of two papers by
Petrutza Caragea and Richard Smith:
1. Asymptotic properties of computationally efficient alternative
estimators for a class of multivariate normal models. Journal
of Multivariate Analysis, 98, 1417–1440.
2. Approximate likelihoods of spatial processes.
Preprint, available at
http://www.stat.unc.edu/postscript/rs/Techno-R-v1.pdf

Model Structure
Y ∼ N[Xη, Σ(θ)],
where Y is an N × 1 vector of observations, X is an N × P matrix of
covariates, η is a P × 1 vector of unknown regression coefficients,
and Σ(θ) is the N × N covariance matrix, expressed in terms of
a finite-dimensional parameter vector θ.
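Examples:
$$\sigma_{ij} = \sigma^2 \exp\left(-\frac{d_{ij}}{\rho}\right), \qquad \theta = (\sigma, \rho),$$
$$\sigma_{ij} = \frac{\sigma}{2^{\nu-1}\Gamma(\nu)} \left(\frac{2\nu^{1/2} d_{ij}}{\rho}\right)^{\nu} K_{\nu}\left(\frac{2\nu^{1/2} d_{ij}}{\rho}\right), \qquad \theta = (\sigma, \rho, \nu).$$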
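
The two covariance families above can be sketched in code as follows. This is only an illustration: the function names `exp_cov` and `matern_cov` are hypothetical, the Matérn leading factor is taken exactly as it appears on the slide (the argument `scale`), and the value at zero distance is filled in using the limit of x^ν K_ν(x) as x tends to 0. `scipy.special.kv` is the modified Bessel function of the second kind.

```python
import numpy as np
from scipy.special import gamma, kv

def exp_cov(d, sigma, rho):
    """Exponential covariance sigma^2 * exp(-d / rho) at distances d."""
    return sigma ** 2 * np.exp(-np.asarray(d, dtype=float) / rho)

def matern_cov(d, scale, rho, nu):
    """Matern covariance in the slide's parameterization.

    `scale` is the leading factor as written on the slide (an assumption
    of this sketch); rho is the range and nu the shape parameter.
    """
    d = np.asarray(d, dtype=float)
    x = 2.0 * np.sqrt(nu) * d / rho
    # kv(nu, 0) is infinite, so evaluate at a dummy value and patch d = 0 below.
    x_safe = np.where(x > 0, x, 1.0)
    val = scale / (2.0 ** (nu - 1.0) * gamma(nu)) * x_safe ** nu * kv(nu, x_safe)
    # Limit as d -> 0: x^nu K_nu(x) -> 2^(nu-1) Gamma(nu), so the value is `scale`.
    return np.where(x > 0, val, scale)
```

With ν = 1/2 the Matérn form reduces to an exponential decay in distance, which gives a quick sanity check on the implementation.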

Previous Approaches based on Blocking Ideas
Exact likelihood:
p(Y ) = p(Y1)
N
i=2
p(Yi | Y1, ..., Yi−1). (1)
Vecchia (1988): in (1), replace p(Yi | Y1, ..., Yi−1) by p(Yi | Si),
where Si is some subset of Y1, ..., Yi−1.
Extension by Stein et al. (2004): update in blocks of the form
p(Yi, Yi+1, ..., Yi+k | Si).
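
A minimal sketch of a Vecchia-type approximation for a zero-mean Gaussian process is given below. The slide only says that Si is "some subset"; taking Si to be the m nearest previously ordered sites is an assumption of this sketch, and the names `vecchia_loglik` and `exp_cov` are hypothetical.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, rho=1.0):
    """Exponential cross-covariance between two sets of sites."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / rho)

def vecchia_loglik(y, coords, cov_fn, m):
    """Sum of log p(Y_i | S_i), with S_i the m nearest earlier sites (assumed rule)."""
    total = 0.0
    for i in range(len(y)):
        prev = np.arange(i)
        if len(prev) > m:
            dist = np.linalg.norm(coords[prev] - coords[i], axis=1)
            prev = prev[np.argsort(dist)[:m]]
        mean = 0.0
        var = cov_fn(coords[[i]], coords[[i]])[0, 0]
        if len(prev) > 0:
            c = cov_fn(coords[[i]], coords[prev])[0]   # cov(Y_i, S_i)
            C = cov_fn(coords[prev], coords[prev])     # cov(S_i, S_i)
            w = np.linalg.solve(C, c)
            mean = w @ y[prev]                         # conditional mean
            var = var - w @ c                          # conditional variance
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total
```

When m is at least N − 1, every Si contains all earlier observations and the sum reproduces the exact log likelihood in (1); smaller m trades accuracy for speed.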

Our Idea (three variants)
1. Inter-block (“big blocks”) method. For each block, compute
the block mean. The inter-block likelihood is just the joint
density of the block means.
2. Intra-block (“small blocks”) method. For each block, compute
the joint density of all observations in that block. The
intra-block likelihood is the product of joint densities for all
the blocks, in effect treating the blocks as independent.
3. Hybrid method. Start with the inter-block likelihood. For
each block, compute the joint density of observations in that
block, conditional on the block mean. The hybrid likelihood
is the inter-block likelihood multiplied by the product of the
conditional densities for the blocks. In effect, the hybrid
likelihood assumes that the deviations from each block mean,
conditional on the block mean itself, are independent across
blocks.
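
The three variants can be sketched for a zero-mean process with exponential covariance as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the conditional density of a block given its own mean is computed as log p(Y_b) − log p(block mean), which is correct up to an additive constant that does not depend on the parameters. The sketch also builds the full covariance matrix for clarity, which a real implementation would avoid.

```python
import numpy as np

def exp_cov(coords, sigma2, rho):
    """Exponential covariance sigma2 * exp(-d_ij / rho) between all site pairs."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / rho)

def gauss_logpdf(y, cov):
    """Log density of a zero-mean Gaussian vector via a Cholesky factor."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, y)
    return -0.5 * (len(y) * np.log(2 * np.pi) + z @ z) - np.sum(np.log(np.diag(L)))

def block_logliks(y, coords, labels, sigma2, rho):
    """Return (inter-block, intra-block, hybrid) approximate log likelihoods."""
    Sigma = exp_cov(coords, sigma2, rho)
    blocks = np.unique(labels)
    # A is the B x N averaging matrix: A @ y is the vector of block means.
    A = np.stack([(labels == b) / np.sum(labels == b) for b in blocks])
    inter = gauss_logpdf(A @ y, A @ Sigma @ A.T)
    intra = 0.0
    own_mean = 0.0
    for b in blocks:
        idx = np.flatnonzero(labels == b)
        S_b = Sigma[np.ix_(idx, idx)]
        intra += gauss_logpdf(y[idx], S_b)
        # Marginal density of this block's mean; its variance is mean(S_b).
        own_mean += gauss_logpdf(np.array([y[idx].mean()]),
                                 np.array([[S_b.mean()]]))
    hybrid = inter + (intra - own_mean)
    return inter, intra, hybrid
```

Two limiting cases help check the logic: with a single block the intra-block and hybrid values equal the exact log likelihood, and with one observation per block the inter-block and hybrid values do.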

Computational Savings
Suppose N observations are divided into B blocks of roughly K
observations per block, so that N ≈ KB.
1. Inter-block method: $O(B^2K^2 + B^3)$
2. Intra-block method: $O(BK^3)$
3. Hybrid method: $O(B^2K^2 + B^3 + BK^3)$
If B grows at a rate between $O(N^{1/2})$ and $O(N^{2/3})$, then all three
of these approximate likelihoods may be computed in $O(N^2)$
steps, compared with $O(N^3)$ for the exact likelihood.
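
The range of growth rates follows from a short calculation. Writing B = N^a and K = N^(1−a) for some exponent a between 0 and 1 (a notation introduced here only for this check):

```latex
\begin{aligned}
B^2K^2 &= N^{2a}\,N^{2(1-a)} = N^{2}, \\
B^{3}  &= N^{3a} \le N^{2} \iff a \le \tfrac{2}{3}, \\
BK^{3} &= N^{a}\,N^{3(1-a)} = N^{3-2a} \le N^{2} \iff a \ge \tfrac{1}{2}.
\end{aligned}
```

So every term is at most of order N^2 exactly when 1/2 ≤ a ≤ 2/3. For example, with N = 10^6 and a = 1/2, B = K = 1000, giving B^3 = 10^9 and BK^3 = 10^12 = N^2.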

Some Methodological Details
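Negative log likelihood:
$$\ell(\eta, \theta) = \frac{1}{2}\log|\Sigma(\theta)| + \frac{1}{2}(Y - X\eta)^T \Sigma(\theta)^{-1} (Y - X\eta),$$
$$\hat{\eta} = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} Y,$$
$$\ell_P(\theta) = \min_{\eta} \ell(\eta, \theta) = \frac{1}{2}\log|\Sigma(\theta)| + \frac{1}{2} G^2(\theta), \quad \text{where}$$
$$G^2(\theta) = Y^T \{\Sigma^{-1} - \Sigma^{-1}X(X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1}\} Y.$$
Alternatively, use REML:
$$\ell_R(\theta) = \frac{1}{2}\log|\Sigma(\theta)| + \frac{1}{2}\log|X^T \Sigma(\theta)^{-1} X| + \frac{1}{2} G^2(\theta).$$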
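
The profile and REML criteria above can be evaluated for a given covariance matrix as in the following sketch. The function names `nll` and `profile_and_reml` are hypothetical, and Σ is passed in as an already-computed matrix rather than as a function of θ.

```python
import numpy as np

def nll(eta, Sigma, X, y):
    """Negative log likelihood (constants dropped) at a given eta."""
    r = y - X @ eta
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * logdet + 0.5 * r @ np.linalg.solve(Sigma, r)

def profile_and_reml(Sigma, X, y):
    """Return (eta_hat, profile NLL, REML criterion) for a fixed Sigma."""
    Si_X = np.linalg.solve(Sigma, X)      # Sigma^{-1} X
    Si_y = np.linalg.solve(Sigma, y)      # Sigma^{-1} Y
    XtSiX = X.T @ Si_X                    # X^T Sigma^{-1} X
    eta_hat = np.linalg.solve(XtSiX, X.T @ Si_y)
    # G^2 = Y^T Sigma^{-1} Y - Y^T Sigma^{-1} X (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} Y
    G2 = y @ Si_y - (X.T @ Si_y) @ eta_hat
    _, logdet = np.linalg.slogdet(Sigma)
    _, logdet_x = np.linalg.slogdet(XtSiX)
    profile = 0.5 * logdet + 0.5 * G2
    reml = profile + 0.5 * logdet_x
    return eta_hat, profile, reml
```

In practice one would wrap `profile_and_reml` in a function of θ and hand it to a numerical optimizer; solving linear systems instead of forming Σ⁻¹ explicitly is a numerical-stability choice of this sketch.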

Methodological Details (ctd.)
Inter-block, intra-block and hybrid estimators all lead to approximate
negative log likelihood (NLLH) expressions of the structure
$$\ell(\theta) = -\frac{1}{2}\log|R| + \frac{1}{2} Y^T R Y$$
for some matrix R.
Idea: Use R in place of Σ−1 for other operations, in particular,
• Estimation of η
• Computation of REML estimator
• Kriging
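
As an illustration of the idea above, for the intra-block method R is the block-diagonal matrix whose blocks are the inverses of the within-block covariance matrices, since the product of independent block densities has exactly that quadratic form. The sketch below builds this R and substitutes it for Σ⁻¹ in the generalized least squares formula for η; the function names are hypothetical and the R matrices for the inter-block and hybrid methods are not shown.

```python
import numpy as np

def intra_block_R(Sigma, labels):
    """Block-diagonal R whose blocks are inverses of within-block covariances."""
    R = np.zeros_like(Sigma, dtype=float)
    for b in np.unique(labels):
        idx = np.flatnonzero(labels == b)
        R[np.ix_(idx, idx)] = np.linalg.inv(Sigma[np.ix_(idx, idx)])
    return R

def approx_nll(R, y):
    """Approximate NLLH  -1/2 log|R| + 1/2 y^T R y."""
    _, logdet = np.linalg.slogdet(R)
    return -0.5 * logdet + 0.5 * y @ R @ y

def gls_with_R(R, X, y):
    """Estimate eta with R in place of Sigma^{-1}: (X^T R X)^{-1} X^T R y."""
    return np.linalg.solve(X.T @ R @ X, X.T @ R @ y)
```

With a single block, R equals Σ⁻¹ and `gls_with_R` returns the usual generalized least squares estimate.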

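Use of “Information Sandwich” Formula
to Assess Efficiency
1. If Y ∼ N[µ(θ), Σ(θ)] and the parameters θ are estimated by
maximum likelihood, then the Fisher information matrix has
(r, s) entry
$$\frac{\partial \mu^T}{\partial \theta_r} \Sigma^{-1} \frac{\partial \mu}{\partial \theta_s} + \frac{1}{2} \operatorname{tr}\left(\Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_r} \Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_s}\right).$$
2. If an estimator $\tilde{\theta}$ is obtained by minimizing the unbiased
estimating function $S(Y_1, \ldots, Y_N; \theta)$,
(a) Write H(θ) for the matrix with (r, s) entry $E\left[\frac{\partial S}{\partial \theta_r} \frac{\partial S}{\partial \theta_s}\right]$.
(b) Write W(θ) for the matrix with (r, s) entry $E\left[\frac{\partial^2 S}{\partial \theta_r \partial \theta_s}\right]$.
(c) Then $\tilde{\theta}$ has approximate covariance matrix $W(\theta)^{-1} H(\theta) W(\theta)^{-1}$.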
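
For a zero-mean Gaussian model with a scalar parameter and the intra-block criterion S(θ) = −½ log|R| + ½ YᵀRY, the quantities above reduce to traces: H = ½ tr((R′Σ)²) and W = ½ tr((R⁻¹R′)²), where R′ is the derivative of R with respect to θ. The sketch below evaluates them for a stationary AR(1) covariance θ^|i−j| / (1 − θ²). That covariance form, the finite-difference derivative, and the function names are all assumptions of this sketch; the talk does not state which AR(1) parameterization it uses.

```python
import numpy as np

def ar1_cov(theta, n):
    """Stationary AR(1) covariance with unit innovation variance (assumed form)."""
    k = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return theta ** k / (1.0 - theta ** 2)

def intra_block_sandwich(theta, B, K, h=1e-6):
    """Return (efficiency, IS/Direct ratio) for the intra-block estimator of theta."""
    n = B * K
    Sigma = ar1_cov(theta, n)
    # Central finite difference for dSigma/dtheta.
    dSigma = (ar1_cov(theta + h, n) - ar1_cov(theta - h, n)) / (2 * h)
    M = np.linalg.solve(Sigma, dSigma)
    fisher = 0.5 * np.trace(M @ M)           # Fisher information of the MLE
    R = np.zeros((n, n))
    dR = np.zeros((n, n))
    for b in range(B):
        s = slice(b * K, (b + 1) * K)
        Rb = np.linalg.inv(Sigma[s, s])
        R[s, s] = Rb
        dR[s, s] = -Rb @ dSigma[s, s] @ Rb   # derivative of an inverse
    A = dR @ Sigma
    H = 0.5 * np.trace(A @ A)                # variance of the score dS/dtheta
    C = np.linalg.solve(R, dR)
    W = 0.5 * np.trace(C @ C)                # expected second derivative of S
    sandwich_var = H / W ** 2                # W^{-1} H W^{-1} for scalar theta
    efficiency = (1.0 / fisher) / sandwich_var
    return efficiency, H / W                 # H / W is the IS/Direct ratio
```

Calling `intra_block_sandwich(0.01, B=50, K=10)` should give an efficiency close to 450/499 ≈ 0.902, because near θ = 0 the Fisher information is about N − 1 while W and H are both about B(K − 1). With a single block the estimator is the MLE, so both returned values equal 1.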

Applications of Formula
1. Calculate efficiency in the form
$$\text{Efficiency} = \frac{\text{asymptotic variance of MLE}}{\text{asymptotic variance of given estimator}}.$$
2. Also estimate the ratio of the approximate variance of the
estimator given by the IS formula to the “direct” variance
estimator derived from $W(\theta)^{-1}$ (“IS/Direct Ratio”).

Table 1: Efficiency of estimators for AR(1) model with B=50, K=10

θ        Inter-block   Intra-block   Hybrid       Intra-block   Hybrid
         Efficiency    Efficiency    Efficiency   IS/Direct     IS/Direct
−0.750   .00538        .92595        .92267       1.25          1.26
−0.250   .08999        .91329        .91373       1.002         1.003
−0.010   .15983        .90182        .90280       1.000         1.000
 0.010   .16684        .90182        .90281       1.000         1.000
 0.250   .27301        .91329        .91409       1.002         1.001
 0.750   .73896        .92595        .91800       1.246         1.177

Table 2: Efficiency of estimators for exponential model on a 20 x 20 lattice
(Table shows theoretical calculations; simulations in parentheses)

True range   B, K      Method   Efficiency     Efficiency     IS/Direct     IS/Direct
and scale                       for range      for scale      for range     for scale
0.5, 1       100, 4    Inter    .172 (.124)    .118 (.134)    N/A           N/A
0.5, 1       100, 4    Intra    .572 (.503)    1.000 (.999)   1.02 (0.89)   1.04 (1.06)
0.5, 1       100, 4    Hybrid   .665 (.611)    1.000 (.998)   .997 (0.89)   1.03 (1.04)
1.5, 1       100, 4    Inter    .467 (.426)    .778 (.754)    N/A           N/A
1.5, 1       100, 4    Intra    .779 (.776)    .949 (.966)    1.63 (1.50)   2.10 (1.92)
1.5, 1       100, 4    Hybrid   .813 (.784)    .964 (.955)    .98 (.91)     1.15 (1.07)
0.5, 1       25, 16    Inter    .011 (0.015)   .003 (.011)    N/A           N/A
0.5, 1       25, 16    Intra    .818 (.797)    1.000 (.999)   1.01 (.95)    1.02 (1.03)
0.5, 1       25, 16    Hybrid   .823 (.800)    1.000 (.998)   1.01 (.95)    1.02 (1.03)
1.5, 1       25, 16    Inter    .090 (.061)    .085 (.056)    N/A           N/A
1.5, 1       25, 16    Intra    .886 (.887)    .937 (.954)    1.39 (1.33)   1.52 (1.41)
1.5, 1       25, 16    Hybrid   .880 (.842)    .935 (.925)    1.23 (1.22)   1.23 (1.29)

Table 3: Efficiency of estimators for exponential model on a 27 x 27 lattice
divided into 3 x 3 blocks

True range   Method   Efficiency   Efficiency   IS/Direct   IS/Direct
and scale             for range    for scale    for range   for scale
3, 1         Inter    .44704       .87855       N/A         N/A
3, 1         Intra    .83511       .87175       2.91        3.44
3, 1         Hybrid   .81079       .85079       1.45        1.64
9, 1         Inter    .75813       .93191       N/A         N/A
9, 1         Intra    .72408       .73858       10.99       12.70
9, 1         Hybrid   .76690       .77747       1.88        1.98
27, 1        Inter    .90026       .95448       N/A         N/A
27, 1        Intra    .71722       .71900       32.06       36.38
27, 1        Hybrid   .77195       .77435       1.99        2.03

Table 4: Efficiency of estimators for Matérn model with unknown range, scale
and shape parameters on a 27 x 27 lattice divided into 3 x 3 blocks

True        Method   Efficiency   Efficiency   Efficiency   IS/Direct   IS/Direct   IS/Direct
σ, ρ, ν              for range    for scale    for shape    for range   for scale   for shape
3, 1, 1     Inter    .38638       .22566       .00552       N/A         N/A         N/A
3, 1, 1     Intra    .67215       .87884       .47059       1.67        2.64        1.30
3, 1, 1     Hybrid   .61722       .81863       .47753       1.47        1.80        1.32
9, 1, 1     Inter    .38325       .84483       .02399       N/A         N/A         N/A
9, 1, 1     Intra    .58047       .71886       .42485       4.91        9.79        1.78
9, 1, 1     Hybrid   .62819       .74415       .48413       2.22        2.69        1.75
27, 1, 1    Inter    .53529       .88697       .05428       N/A         N/A         N/A
27, 1, 1    Intra    .45077       .59332       .29222       16.28       34.96       3.72
27, 1, 1    Hybrid   .70594       .77619       .44586       2.43        3.07        2.37
3, 1, .1    Inter    .80723       .06917       .07704       N/A         N/A         N/A
3, 1, .1    Intra    .53118       .96192       .34166       1.37        1.96        1.02
3, 1, .1    Hybrid   .90968       .97333       .77054       1.06        1.09        1.08
9, 1, .1    Inter    .85495       .41777       .10820       N/A         N/A         N/A
9, 1, .1    Intra    .62836       .86186       .32750       2.31        6.39        1.03
9, 1, .1    Hybrid   .90449       .92904       .80421       1.14        1.14        1.11
27, 1, .1   Inter    .89971       .83043       .12496       N/A         N/A         N/A
27, 1, .1   Intra    .67707       .82933       .31381       3.63        16.83       1.04
27, 1, .1   Hybrid   .89021       .89893       .81865       1.24        1.22        1.11

Comparison with methods of
Stein and Vecchia
Use simulations on a non-lattice array (the same as in Stein et al.,
2004).
We compare our methods with two versions of Vecchia’s approach
and two versions of Stein’s.

Table 5: Mean squared errors for parameter estimates for a spatial process
with Gaussian covariance matrix

Parameter   MLE     Inter-Block   Intra-Block   Hybrid
Scale       0.002   0.141         0.002         0.002
Range       0.005   2.057         0.005         0.005

Parameter   Vecchia      Vecchia        Stein    Stein
            1 neighbor   10 neighbors   single   blocks
Scale       0.002        0.002          0.002    0.002
Range       0.06         0.006          0.005    0.005

Conclusion from this comparison:
Our “intra-block” and “hybrid” approaches appear to be as good,
in mean squared error, as the blocking methods of Vecchia and Stein,
and superior to those based on single neighbors.

Thoughts on the Relevance of
These Comparisons to “Data Systems”
1. There is an obvious analogy whereby each “block” corresponds
to one data center in a system that consists of several.
2. The “intra-block” method should be directly comparable; the
“hybrid” method requires only the communication of block
means to a common server.
3. Are there extensions of the hybrid method involving conditioning
on a low-dimensional summary statistic from each
block (rather than 1-dimensional, i.e. the block mean)?
4. There is also the related problem of “designing the blocks”: they
do not have to be spatially contiguous, but there may be other
constraints in a multi-server system.

The Way Forwards?
1. Theoretical comparisons based on the IS approach avoid the
need for very large simulations
2. The theoretical calculations also require extensive computation,
but in some cases there are short cuts (e.g. lattice data
plus a stationary model leads to a substantial simplification)
3. These concepts can also be extended to the design of the
blocks (subsystems) themselves
4. Our estimators are obviously not the only ones possible but
our approach may provide a starting point for more general
theoretical optimizations

Thank You for
Your Attention!