Geostatistics (GiSc 3052)
By Moges GT & Samuel D.R
Course objective and competences to be
acquired
• Students will be trained to apply geo-statistics
and surface analysis to generate information
about geographic features.
[Figure: a dataset of known values (e.g. temperature points) and a raster interpolated from these points]
Upon the completion of this course
students will be able to:
• Understand the concept of regionalized variable
theory.
• Explain spatial relationships of the features.
• Describe and model spatial data.
• Understand the basic approach in conducting
variogram modeling.
• Understand how to apply geo-statistics methods
in spatial interpolation.
• Generate surface related information.
• Use surface derivatives for different application.
1. Overview of classical statistics:
• Probability theory review, univariate, bivariate
and multivariate data analysis.
Practical 1:
• Orientation about the course nature and
laboratory regulation.
• Univariate data analysis
Probability Theory Review
• We review some basic concepts and results of
probability theory that are of relevance. It is
meant to be a refresher of concepts covered
in a separate statistics course. However, in
case you have not taken such courses, the
concepts and results will be explained using
simple examples.
Univariate Analysis: Introduction
• The probability of an event is a number between 0 and 1, representing the chance, or relative frequency of occurrence, of the event. The probabilities of all possible (mutually exclusive) events of an experiment must sum to 1.
• In practice, the outcomes of experiments are assigned numerical values; e.g., when tossing a coin, 1 and 2 can be assigned to the outcomes “head” and “tail”, respectively.
• Such numerical values can be represented by a random variable (RV).
Univariate Analysis: Introduction
• Two types of random variables exist:
– discrete and
– continuous
• Discrete examples include
– the outcome of tossing a coin (head or tail)
– the grade of a course with Pass or Fail
– land use (forest, agricultural, water)
Univariate Analysis: Introduction
• Continuous examples include
– the height of all men in the country (ranging from,
say, 1.5 to 1.90m),
– the grades of a class (e.g., 0.0 to 100.0 points)
– Raster data covering the WG area, such as
• elevation of the terrain (raster data with 90 m resolution), e.g. the SRTM DEM used in GIS classes
• NDVI derived from a Landsat 8 image (2013, day 335 of the year)
• Temperature in raster format
Univariate Analysis: Introduction
• The probability of a random variable taking any possible value (discrete random variable) or falling within a range of values (continuous random variable) is described by its probability distribution.
Univariate Analysis: Introduction
In the example (figure),
P1 = Pr[X = 1] = 0.5,
P2 = Pr[X = 2] = 0.5,
and P1+P2 = 1.
For a discrete random variable, its distribution is just the frequency (or proportion) of occurrence.
Outcomes of experiments
of tossing a coin: discrete
random variable and its
probability distribution.
Exercise 1:
• Calculate the probability distribution of the trees in the
manmade (plantation) forest according to size
(diameter of the trees). We use sample data from
WGCFNR plantation forest
• Steps:
– (1) assign each tree a numerical value (a discrete random variable) using its diameter class (e.g. class 1 = 10–15 cm, class 2 = 15–20 cm, etc.);
– (2) calculate the proportion;
– (3) plot the probability.
– You can either do it by hand with a calculator, or using Excel. Does the total probability sum to 1? We often call such a diagram a “histogram”.
Exercise 1:
Table 1: distribution of the trees in the man-made (plantation) forest according to size (diameter of the trees); data from WGCFNR forest. An example of a discrete random variable.

DBH class (cm)   Class range (cm)   Number of trees   X = xi
13               10 – 15            5                 1
18               15 – 20            15                2
23               20 – 25            21                3
28               25 – 30            7                 4
33               30 – 35            6                 5
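The three steps of Exercise 1 can be sketched in plain Python using the class counts from Table 1 (a minimal sketch; the plotting step is left out):

```python
# Class counts from Table 1 (WGCFNR plantation sample): X = xi -> number of trees
counts = {1: 5, 2: 15, 3: 21, 4: 7, 5: 6}

n = sum(counts.values())                        # step 1 data: 54 trees in total
probs = {x: c / n for x, c in counts.items()}   # step 2: proportion per class

# Step 3 would plot probs as a histogram; here we just check it sums to 1
total = sum(probs.values())
```

Each proportion (e.g. 21/54 ≈ 0.39 for class 3) is the probability of drawing a tree of that diameter class at random, and the proportions sum to 1 as required.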
Constructing a Histogram for Continuous
Data: Equal Class Widths
• Determine the frequency and relative
frequency for each class. Mark the class
boundaries on a horizontal measurement axis.
• Above each class interval, draw a rectangle
whose height is the corresponding relative
frequency (or frequency).
Exercise 2:
• A sample data set of a continuous random
variable : the thickness (X; m) of an aquifer is
measured along the horizontal distance (di; m)
(Table 2). For the thickness, calculate the
mean, variance, standard deviation and CV,
calculate and plot the histogram and
cumulative distribution.
Exercise 2:
Table 2: Aquifer thickness xi (m) along a distance di (m).
di 1 2 3 4 5 6 7 8 9 10 11
xi 56 57 55 54 49 43 37 36 39 37 41
di 12 13 14 15 16 17 18 19 20 21 22
xi 41 36 33 40 44 53 53 54 51 48 54
di 23 24 25 26 27 28 29 30 31 32 33
xi 63 65 63 63 53 50 50 54 49 43 43
di 34 35 36 37 38 39
xi 47 47 50 53 61 61
pdf and cdf functions (curves) of a
continuous random variable
• Probability density function (pdf): fX(x)
• Cumulative distribution function (cdf): FX(x) = Pr[X ≤ x]
Exercise 2:
• For the sample set, a few other key statistics
are also of interest:
– mean (μ)
– variance (σ²)
– standard deviation (σ = √σ²)
– coefficient of variation (CV = σ/μ)
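These statistics for the Table 2 thickness values can be sketched with the Python standard library (population formulas, i.e. dividing by n; a statistics course may use n − 1 instead):

```python
import math

# Aquifer thickness values xi from Table 2 (di = 1 .. 39)
x = [56, 57, 55, 54, 49, 43, 37, 36, 39, 37, 41,
     41, 36, 33, 40, 44, 53, 53, 54, 51, 48, 54,
     63, 65, 63, 63, 53, 50, 50, 54, 49, 43, 43,
     47, 47, 50, 53, 61, 61]

n = len(x)
mean = sum(x) / n                              # mean (μ)
var = sum((v - mean) ** 2 for v in x) / n      # variance (σ², population)
sd = math.sqrt(var)                            # standard deviation (σ)
cv = sd / mean                                 # coefficient of variation (σ/μ)
```

The histogram and cumulative distribution asked for in the exercise can then be built by binning these values, exactly as in Exercise 1.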
Measures of Variability
• Results vary from individual to individual, from
group to group, from city to city, from
moment to moment. Variation always exists in
a data set, regardless of which characteristic
you’re measuring, because not every
individual will have the same exact value for
every characteristic you measure.
Measures of Variability
• Without a measure of variability you can’t
compare two data sets effectively.
– What if two sets of data have about the same average and the same median?
– Does that mean that the data are all the same?
• Not at all.
Measures of Variability
• For example, the data sets 199, 200, 201, and 0,
200, 400 both have
– the same average,
• which is 200,
– and the same median,
• which is also 200.
• Yet they have very different amounts of
variability.
• The first data set has a very small amount of
variability compared to the second.
Measures of Variability
• By far the most commonly used measure of
variability is the standard deviation.
• The standard deviation of a data set
represents the typical distance from any point
in the data set to the center.
• It’s roughly the average distance from the
center, and in this case, the center is the
average.
Bivariate Analysis: Introduction
• In the previous section, we looked at the statistical measures of a single random variable. However, correlation can often exist between two random variables.
• For example,
– the height and diameter of a tree are often correlated.
– the elevation and temperature in most areas are often correlated.
Bivariate Analysis: Introduction
• In this chapter you analyze two numerical
variables, X and Y, to look for patterns, find the
correlation, and make predictions about Y from
X, if appropriate, using simple linear
regression.
Bivariate Analysis: Introduction
• In this case, the weight increases with increasing
height for which we say a positive correlation
exists between the two variables. To investigate
correlation, a scatter plot is often used; e.g., for each person, the height and weight are cross-plotted. Often, some sort of fit is attempted.
Here, we see a linear function fitted to the scatter
plot.
Bivariate Analysis: Introduction
• However, to quantitatively evaluate
correlation, a correlation coefficient (rXY ) is
often used:
Bivariate Analysis: Introduction
• As defined previously, μX (or μY ) is the mean of X
(or Y) in its univariate distribution.
• ρXY (or rXY) varies between
– −1 (perfect negative correlation: Y = −X) and
– +1 (perfect positive correlation: Y = X).
• When rXY = 0, we say the two variables are not
correlated.
• In our example, rXY = 0.76, thus there is a certain
amount of positive correlation between weight
and height.
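The correlation coefficient translates directly into code; a minimal sketch using population-style means and standard deviations (the data below are made up to illustrate the ±1 limits, not the height/weight figure's data):

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient rXY between two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in ys) / n)
    return cov / (sx * sy)

# Perfectly linearly related variables hit the two limits
r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # perfect positive: +1
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # perfect negative: -1
```

Real data such as the height/weight example fall in between (rXY = 0.76 there).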
Correlation between two random variables
• The correlation between two random variables is the cornerstone of geostatistics:
– one random variable is a geological/hydrological/petrophysical property at one spatial location,
– the second random variable can be the
• (1) same property at a different location (auto-
correlation studies; kriging); or,
• (2) a different property at a different location (cross-
correlation studies, co-kriging).
Bivariate Random Variables
• The covariance between X and Y (σXY) measures how well the two variables track each other: when one goes up, how does the other behave on average?
• The unit of covariance is the product of the unit of random variable X and the unit of random variable Y. The covariance of a random variable X with itself is equal to its variance: σXX = σX².
correlation
• The correlation (or correlation coefficient) between X and Y is a dimensionless, normalized version of the covariance σXY:
ρXY = σXY / (σX σY)
covariance
• An estimator of the covariance can be defined as:
σ̂XY = (1/n) Σi (xi − μ̂X)(yi − μ̂Y)
• If X and Y are independent, then they are uncorrelated and their covariance σXY (and thus ρXY) is zero.
• The covariance is best thought of as a measure of linear dependence.
Multivariate Analysis
• Linear combination of many random variables
• Extending the bivariate arithmetics into
multivariate analysis, we can get another host
of relationships.
The idea of regression
• The idea of regression is to build a model that
estimates or predicts one quantitative variable
(y) by using at least one other quantitative
variable (x). Simple linear regression uses
exactly one x variable to estimate the y
variable.
• Multiple linear regression, on the other hand,
uses more than one x variable to estimate the
value of y.
Discovering the uses of multiple regression
• One situation in which multiple regression is
useful is when the y variable is hard to track
down; that is, its value can’t be measured
straight up, and you need more than one
other piece of information to help get a
handle on what its value will be.
General form of the multiple regression
model
• The general idea of simple linear regression is
to fit the best straight line through that data
that you possibly can and use that line to
make estimates for y based on certain x-
values. The equation of the best-fitting line in
simple linear regression is
– y = b0 + b1x1
– where b0 is the y-intercept and b1 is the slope.
• (The equation also has the form y = a + bx.)
General form of the multiple regression
model
• In the multiple regression setting, you have more than one
x variable that is related to y.
– Call these x variables x1, x2, . . . xk.
• In the most basic multiple regression model, you use some
or all of these x variables to estimate y where each x
variable is taken to the first power. This process is called
finding the best-fitting linear function for the data.
• This linear function looks like the following:
– y = b0 + b1x1 + b2x2 + . . . + bkxk
– and you can call it the multiple (linear) regression model.
• You use this model to make estimates about y based on
given values of the x variables.
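Fitting such a multiple linear regression by least squares can be sketched with NumPy (the data below are made up so that y = 0 + 1·x1 + 2·x2 exactly; `numpy.linalg.lstsq` finds the best-fitting b0, b1, b2):

```python
import numpy as np

# Hypothetical data: y depends linearly on two x variables
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y = np.array([5.0, 4.0, 11.0, 10.0, 15.0])   # y = 0 + 1*x1 + 2*x2

# Prepend a column of ones so the intercept b0 is estimated too
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = coef

y_hat = A @ coef   # fitted values from the regression model
```

With real (noisy) data the coefficients will not reproduce y exactly; least squares then minimizes the sum of squared residuals.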
General form of the multiple regression
model
• A linear function is an equation whose x terms
are taken to the first power only.
– For example y = 2x1 + 3x2 + 4x3 is a linear equation
using three x variables.
• If any of the x terms are squared, the function
would be a quadratic one;
• If an x term is taken to the third power, the
function would be a cubic function, and so on.
In this chapter, I consider only linear functions.
Unit 2. Introduction to Geostatistics:
Definition and history of
geostatistics, advantages of
geostatistics, geostatistics analysis
requirements.
Practical 2:
Bivariate data analysis
What is geostatistics?
• What is statistics?
• What then is geo-statistics?
Comment
• The term statistics has two common
meanings, which we want to clearly
separate: descriptive and inferential
statistics.
• But to understand the difference between
descriptive and inferential statistics, we
must first be clear on the difference between
populations and samples.
Populations and samples
• A population is a set of well-defined objects.
– We must be able to say, for every object, if it is in the
population or not.
– We must be able, in principle, to find every individual of the
population.
• A geographic example of a population is all pixels in a
multi-spectral satellite image.
• A sample is some subset of a population.
– We must be able to say, for every object in the population, if
it is in the sample or not.
– Sampling is the process of selecting a sample from a
population.
– Continuing the example, a sample from this population could
be a set of pixels from known ground truth points.
What do we mean by statistics?
• Two common uses of the word:
– Descriptive statistics: numerical summaries of
samples;
• (what was observed)
– Inferential statistics: from samples to
populations.
• (what could have been or will be observed in a larger
population)
A concise definition of inferential
statistics
• Statistics: The determination of the
probable from the possible
– . . . which implies the rigorous definition and
then quantification of “probable".
– Probable causes of past events or observations
– Probable occurrence of future events or
observations
• This is a definition of inferential statistics:
– Observations → Inferences
Why use statistical analysis?
• Descriptive: we want to summarize some data in a
shorter form
• Inferential: We are trying to understand some process
and maybe predict based on this understanding.
• So we need to model it, i.e. make a conceptual or
mathematical representation, from which we infer the
process.
• But how do we know if the model is “correct"?
• Are we imagining relations where there are none?
• Are there true relations we haven't found?
– Statistical analysis gives us a way to quantify the confidence
we can have in our inferences.
Comment
• The most common example of geo-statistical
inference is the prediction of some attribute at an
unsampled point, based on some set of sampled
points.
• In the next slide we show an example from the Meuse river floodplain in the southern Netherlands. The copper (Cu) content of soil samples has been measured at 155 points (left figure); from this we can predict at all points in the area of interest (right figure).
What is geo-statistics?
• Geostatistics is statistics on a population with
known location, i.e. coordinates:
– In one dimension (along a line or curve)
– In two dimensions (in a map or image)
– In three dimensions (in a volume)
• The most common application of geostatistics
is in 2D (maps).
• Key point: Every observation (sample point)
has both:
– coordinates (where it is located); and
– attributes (what it is).
Comment
• Let's first look at
a data set that is
not geo-statistical.
• It is a list of soil
samples (without
their locations)
with the lead (Pb)
concentration.
The column Pb is
the attribute of
interest.
To check your understanding . . .
• Q5 : Can we determine the mean,
maximum, minimum and standard
deviation of this set of samples?
• Q6 : Can we make a map of the sample
points with their Pb values?
Comment
• Now we look at a data set that is geo-statistical (next
slide).
• These are soil samples taken in the Jura mountains of
Switzerland, and their lead content; but this time
with their coordinates.
• The columns E and N are the coordinates, i.e. the
spatial reference; the column Pb is the attribute.
• First let's look at the tabular form:
Sample data
To check your understanding . . .
• Q7 : Comparing this to the non-
geostatistical list of soil samples and their
lead contents (above), what new information
is added here?
Comment
• On the figure (next slide) you will see:
– A coordinate system (shown by the over-printed
grid lines)
– The locations of 256 sample points - where a soil
sample was taken
– The attribute value at each sample point -
symbolized by the relative size of the symbol at
each point - in this case the amount of lead (Pb)
in the soil sample
To check your understanding . . .
• Q8 : In the figure, how can you determine the
coordinates of each sample point?
• Q9 : What are the coordinates of the sample point
displayed as a red symbol?
• Q10 : What is the mathematical origin (in the sense
of Cartesian or analytic geometry) of this coordinate
system?
• Q11 : How could these coordinates be related to
some common system such as UTM?
To check your understanding . . .
• Q12 : Suppose we have a satellite image that
has not been geo-referenced. Can we speak
of geostatistics on the pixel values?
• Q13 : In this case, what are the coordinates
and what are the attributes?
• Q14 : Suppose now the image has been geo-referenced. What are now the coordinates?
Geostatistics requirements
• The location of a sample is an intrinsic part of its definition.
• All data sets from a given area are implicitly related by their coordinates
– So they can be displayed and related in a GIS
• Values at sample points can not be assumed to be independent: there is
often evidence that nearby points tend to have similar values of attributes.
• That is, there may be a spatial structure to the data
– Classical statistics assumes independence of samples
– But, if there is spatial structure, this is not true!
– This has major implications for sampling design and statistical inference
• Data values may be related to their coordinates → spatial trend
Feature and geographic spaces
• The word space is used in mathematics to
refer to any set of variables that form
metric axes and which therefore allow us to
compute a distance between points in that
space.
– If these variables represent geographic
coordinates, we have a geographic space.
– If these variables represent attributes, we have
a feature space.
Comment
• You are probably quite familiar with feature
space from your study of non-spatial statistics.
• Even with one variable, we have a unit of
measure; this forms a 1D or univariate feature
space.
• Most common are two variables which we want
to relate with correlation or regression analysis;
this is a bivariate feature space.
• In multivariate analysis the feature space has
more than two dimensions.
Comment
• Multivariate feature spaces can have many
dimensions; we can only see three at a time.
Comment
• So, feature space is perhaps a new term but not a new
concept if you've followed a statistics course with
– univariate, bivariate and multivariate analysis.
• What then is geographic space? Simply put, it is a
mathematical space where the axes are map
coordinates that relate points to some reference
location on or in the Earth (or another physical body).
• These coordinates are often in some geographic
coordinate system that was designed to give each
location on (part of) the Earth a unique identification;
a common example is the Universal Transverse Mercator (UTM) grid.
• However, a local coordinate system can be used, as
long as there is a clear relation between locations and
coordinates.
Geographic space
• Axes are 1D lines; they almost always have the same units of measure (e.g. metres, kilometres . . . )
– One-dimensional: coordinates are on a line with respect to some origin.
– Two-dimensional: coordinates are on a grid with respect to some origin.
– Three-dimensional: coordinates are grid and elevation from a reference elevation.
• Note: latitude-longitude coordinates do not have equal
distances in the two dimensions; they should be
transformed to metric (grid) coordinates for geo-
statistical analysis.
Interpolation
• Interpolation is based on the assumption that
spatially distributed objects are spatially
correlated; in other words, things that are close
together tend to have similar characteristics.
• For instance, if it is raining on one side of the
street, you can predict with a high level of
confidence that it is also raining on the other side
of the street.
• You would be less sure if it was raining across
town and less confident still about the state of the
weather in the neighbouring province.
What is a spatial interpolation?
• Interpolation predicts values for cells in a
raster from a limited number of sample data
points. It can be used to predict unknown
values for any geographic point data:
elevation, rainfall, chemical concentrations,
noise levels, and so on.
On the left is a point dataset of known values. On
the right is a raster interpolated from these points.
Unknown values are predicted with a
mathematical formula that uses the values of
nearby known points.
Unit 3: non-geostatistical spatial analysis
[Figure: sample points for copper around a query point p. What is the value at point p?]
Interpolation methods (to be discussed in the
class)
• Natural neighbour
• Local simple mean (average method)
• Polygon
• Triangulation
• Inverse Distance Method
• Polynomial equation
• Spline
Natural neighbour
• Natural Neighbor interpolation finds the
closest subset of input samples to a query
point and applies weights to them based on
proportionate areas to interpolate a value
(Sibson, 1981). It is also known as Sibson or
"area-stealing" interpolation.
Interpolation methods
IDW
• The IDW (Inverse Distance Weighted) tool
uses a method of interpolation that estimates
cell values by averaging the values of sample
data points in the neighborhood of each
processing cell. The closer a point is to the
center of the cell being estimated, the more
influence, or weight, it has in the averaging
process.
Spline
• The Spline tool uses an interpolation method
that estimates values using a mathematical
function that minimizes overall surface
curvature, resulting in a smooth surface that
passes exactly through the input points.
INTERPOLATION METHODS USING
SAMPLES
More explanation on: point estimation
• For each of the point estimation methods we describe in the following sections, we will show the details of the estimation of the V value at 65E,137N.
• No sweeping conclusions should be drawn from this single example; it is presented only to provide a familiar common thread through our presentation of the various methods. Once we have looked at each method separately, we will compare them.
[Table: distances to sample values in the vicinity of 65E,137N]
point estimation methods
• The values at sample locations near 65E,137N are shown in the figure on the next slide and listed in the previous table.
– The variability of these nearby sample values presents
a challenge for estimation.
– Values range from 227 to 791 ppm;
• The estimated value therefore, can cover quite a
broad range depending on how we choose to
weight the individual values.
• In the following sections we will look at four quite
different point estimation methods.
[Figure: the goal is to estimate the value of V at the point 65E,137N, located by the arrow, from the surrounding seven V data values.]
The method of triangulation (1/9)
• Triangulation estimates by fitting a plane through three samples that surround the point being estimated. The equation of a plane can be expressed generally as
 z = ax + by + c
• In our example, where we are trying to estimate V values using coordinate information, z is the V value, x is the easting, and y is the northing.
• Given the coordinates and the V values of three nearby samples, we can calculate the coefficients a, b and c by solving the following system of equations:
 aX1 + bY1 + c = Z1
 aX2 + bY2 + c = Z2
 aX3 + bY3 + c = Z3
The method of triangulation (2/9)
• From the figure we can find three samples
that nicely surround the point being
estimated: the 696 ppm, the 227 ppm, and
the 606 ppm samples.
• Using the data for these three samples, the
set of equations we need to solve is
63a + 140b + c = 696
64a + 129b + c = 227
71a + 140b + c = 606
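This 3×3 system can be solved directly; a sketch in Python with NumPy, which also reproduces the 548.7 ppm estimate derived on the following slides:

```python
import numpy as np

# Plane z = a*x + b*y + c through the three samples around 65E,137N
A = np.array([[63.0, 140.0, 1.0],    # 696 ppm sample at 63E,140N
              [64.0, 129.0, 1.0],    # 227 ppm sample at 64E,129N
              [71.0, 140.0, 1.0]])   # 606 ppm sample at 71E,140N
z = np.array([696.0, 227.0, 606.0])

a, b, c = np.linalg.solve(A, z)      # coefficients of the plane
v_hat = a * 65 + b * 137 + c         # triangulation estimate at 65E,137N
```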
The method of triangulation (3/9)
• The solution to these three simultaneous equations is
a = −11.250  b = 41.614  c = −4421.159
• which gives us the following equation as our triangulation estimator:
V = −11.250x + 41.614y − 4421.159
• This is the equation of the plane that passes through the three nearby samples we have chosen.
• Using this equation we can now estimate the value at any location simply by substituting the appropriate easting and northing. Substituting the coordinates x = 65 and y = 137 into our equation gives us an estimate of 548.7 ppm at the location 65E,137N.
The method of triangulation (4/9)
[Figure: contours of the estimated V values that this equation produces]
[Figures: the method of triangulation, parts (5/9) to (9/9)]
Local Sample Mean
• The mean of the seven nearby samples shown in the figure is 603.7 ppm. This estimate is much higher than the triangulation estimate.
• The two samples with V values greater than 750 ppm in the eastern half of the figure receive more than 25% of the total weight and therefore have a considerable influence on our estimated value.
Inverse Distance Methods (1/6)
• One obvious way to weight nearby samples is to make the weight for each sample inversely proportional to its distance from the point being estimated:
 v̂ = ( Σi vi/di ) / ( Σi 1/di )
• d1, . . . , dn are the distances from each of the n sample locations to the point being estimated and v1, . . . , vn are the sample values.
Inverse Distance Methods (2/6)
[Table: inverse distance weighting calculations for sample values in the vicinity of 65E,137N]
Inverse Distance Methods (3/6)
• The nearest sample, the 696 ppm sample at 63E, 140N,
receives about 26% of the total weight, while the farthest
sample, the 783 ppm sample at 75E, 128N, receives less
than 7%. A good example of the effect of the inverse
distance weighting can be found in a comparison of the
weights given to the 477 ppm sample and the 791 ppm sample.
• The 791 ppm sample at 73E, 141N is about twice as far
away from the point we are trying to estimate as the 477
ppm sample at 61E, 139N; the 791 ppm sample therefore
receives about half the weight of the 477 ppm sample.
• Using the weights given in previous Table our inverse
distance estimate of the V value at 65E, 137N is 594 ppm.
Inverse Distance Methods (4/6)
• The inverse distance estimator given above can easily be adapted to include a broad range of estimates. Rather than using weights that are inversely proportional to the distance, we can make the weights inversely proportional to any power p of the distance:
 v̂ = ( Σi vi/di^p ) / ( Σi 1/di^p )
Inverse Distance Methods (5/6)
[Table: the effect of the inverse distance exponent on the sample weights and on the V estimate]
Inverse Distance Methods (6/6)
• Different choices of the exponent p will result
in different estimates.
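A sketch of inverse distance weighting in Python. The sample list below uses only the six samples whose coordinates are quoted in the text (the seventh sample's coordinates are not given on these slides, so the result will not exactly match the 594 ppm estimate quoted for the full set); varying the exponent p shows how different powers change the estimate:

```python
import math

def idw_estimate(samples, x0, y0, p=1.0):
    """Inverse-distance-weighted estimate at (x0, y0); weights ~ 1/d^p."""
    num = den = 0.0
    for x, y, v in samples:
        d = math.hypot(x - x0, y - y0)
        if d == 0.0:
            return v                 # exact hit: honour the sample value
        w = 1.0 / d ** p
        num += w * v
        den += w
    return num / den

# Samples (easting, northing, V in ppm) quoted in the text
samples = [(63, 140, 696), (64, 129, 227), (71, 140, 606),
           (75, 128, 783), (73, 141, 791), (61, 139, 477)]

v1 = idw_estimate(samples, 65, 137, p=1)   # inverse distance
v2 = idw_estimate(samples, 65, 137, p=2)   # inverse squared distance
```

Larger p concentrates the weight on the nearest samples; as p grows very large the estimate approaches the nearest-neighbour (polygon) value.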
Search Neighbourhoods
• For the case studies we perform in this
chapter, we use a circular search
neighbourhood with a radius of 25 m.
• All samples that fall within 25 m of the point
we are estimating will be included in the
estimation procedure.
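A search neighbourhood like this is just a distance filter applied before estimation; a minimal sketch (assuming the coordinates and radius share the same metric units):

```python
import math

def within_radius(samples, x0, y0, radius=25.0):
    """Keep only (x, y, value) samples within `radius` of (x0, y0)."""
    return [(x, y, v) for x, y, v in samples
            if math.hypot(x - x0, y - y0) <= radius]

# Hypothetical samples: only the first lies within 25 m of the origin
nearby = within_radius([(3, 4, 1.0), (30, 0, 2.0)], 0, 0)
```

The filtered list can then be passed to any of the point estimation methods discussed in this chapter.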
The First Law of Geography
Tobler’s Law:
• The central tenet of Geography is that
location matters for understanding a wide
variety of phenomena.
• Everything is related to everything else, but things that are closer together are more related to each other than those that are further apart.
Unit 3: The First Law of Geography
Geographers’ Perspectives on the World
• Location matters
– Real-world relationships
– Horizontal connections between places
– Importance of scale (both in time and space)
Geographic Information
• Includes knowledge about where something is
• Includes knowledge about what is at a given
location
• Can be very detailed:
– e.g. the locations of all buildings in a city or the
– locations of all trees in a forest stand
• Or it can be very coarse:
– e.g. the population density of an entire country or the
global sea surface temperature distribution
• There is always a spatial component associated
with geographic information
Practical 4 to 5: Exploratory data analysis
• Non-geostatistical interpolation (ArcGIS and/or QGIS)
– Inverse distance
– Closest point
– Moving average
– Least square polynomial
– Spline
– Triangulation
• Individual report (short report)
– What are the required inputs (data and parameters)?
– What are the outputs?
– Compare the different methods and/or parameters.
4. Characterizing spatial process
• Covariance,
• Correlation and variogram.
• Understanding and measure of similarity
between different data.
variogram
• The most common way to visualize local
spatial dependence is the variogram, also
called (for historical reasons) the
semivariogram.
• To understand this, we have to first define the
semivariance as a mathematical measure of
the difference between the two points in a
point-pair.
The semi-variogram is
based on modelling the
(squared) differences in
the z-values as a function
of the distances between
all of the known points.
Semivariance
• This is a mathematical measure of the difference between the two points in a point-pair. It is expressed as a squared difference so that the order of the points doesn't matter (i.e. subtraction in either direction gives the same result). Each pair of observation points has a semivariance, usually represented by the Greek letter γ (`gamma'), and defined as:

γ = (1/2) · [z(x1) − z(x2)]²

• where x is a geographic point and z(x) is its attribute value.
• (Note: The `semi' refers to the factor 1/2, because there are two ways to compute it for the same point pair.)
• So, the semivariance between two points is half the squared difference between their values. If the values are similar, the semivariance will be small.
127
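The definition above is small enough to express directly in code. A minimal sketch (Python; the function name is my own choice):

```python
# Semivariance of a single point-pair: half the squared difference
# between the attribute values z(x1) and z(x2).

def semivariance(z1: float, z2: float) -> float:
    """Return 0.5 * (z1 - z2) ** 2; symmetric in its arguments."""
    return 0.5 * (z1 - z2) ** 2

# The order of the points does not matter:
print(semivariance(3.0, 7.0), semivariance(7.0, 3.0))  # 8.0 8.0
```

Squaring is what makes the measure order-independent, exactly as the slide notes.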
Unit
4:
Characterizing
spatial
process
Point pair
• Now we know two things about a point-pair:
1. The distance between them in geographic space;
2. The semivariance between them in attribute
space.
• So . . . it seems natural to see if points that are
`close by' in geographical space are also `close
by' in attribute space.
• This would be evidence of local spatial
dependence.
129
The variogram cloud
• This is a graph showing semivariances between
all point-pairs:
– X-axis: The separation distance within the point-pair
– Y-axis: The semivariance
• Advantage: Shows the comparison between all
point-pairs as a function of their separation;
• Advantage: Shows which point-pairs do not fit the
general pattern
• Disadvantage: too many graph points, hard to
interpret
130
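Assembling the cloud can be sketched as follows (Python; each of the n(n−1)/2 point-pairs contributes one graph point — hence "too many graph points" for large datasets):

```python
from itertools import combinations
from math import hypot

def variogram_cloud(points):
    """points: iterable of (x, y, z) tuples.
    Returns one (separation distance, semivariance) pair per point-pair."""
    cloud = []
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        h = hypot(x2 - x1, y2 - y1)      # x-axis: separation distance
        gamma = 0.5 * (z1 - z2) ** 2     # y-axis: semivariance
        cloud.append((h, gamma))
    return cloud

pts = [(0, 0, 10.0), (1, 0, 12.0), (0, 1, 11.0)]
print(variogram_cloud(pts))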
Unit
4:
Characterizing
spatial
process
131
Unit
4:
Characterizing
spatial
process
Q
132
Unit
4:
Characterizing
spatial
process
variogram cloud
• Clearly, the variogram cloud gives too much
information. If there is a relation between
separation and semi-variance, it is hard to see.
• The usual way to visualize this is by grouping the
point-pairs into lags or bins according to some
separation range, and computing some
representative semi-variance for the entire lag.
• Often this is the arithmetic average, but not
always.
133
Unit
4:
Characterizing
spatial
process
134
Origins
• Involve a set of statistical techniques called
Kriging (there are a bunch of different Kriging
methods)
• Kriging is named after Danie Gerhardus Krige, a
South African mining engineer who presented
the ideas in his masters thesis in 1951. These
ideas were later formalized by a prominent
French mathematician Georges Matheron
• For more information, see:
– Krige, Danie G. (1951). "A statistical
approach to some basic mine valuation
problems on the Witwatersrand". J. of the
Chem., Metal. and Mining Soc. of South
Africa 52 (6): 119–139.
– Matheron, Georges (1962). Traité de
géostatistique appliquée, Editions Technip,
France
• Kriging has two parts: the quantification of the
spatial structure in the data (called variography)
and prediction of values at unknown points
Souce of this information: http://en.wikipedia.org/wiki/Daniel_Gerhardus_Krige
Georges Matheron
Danie Gerhardus Krige
135
Motivating Example: Ordinary Kriging
• Imagine we have data on the concentration of gold (denote it by Y) in
western Pennsylvania at a set of 200 sample locations (call them points
p1…p200).
• Since Y has a meaningful value at every point, our goal is to create a
prediction surface for the entire region using these sample points
• Notation: In this western PA region, Y(p) will denote the concentration
level of gold at any point p.
136
Global and Local Structure
• Without any a priori knowledge about the distribution of gold in Western PA,
we have no theoretical reason to expect to find different concentrations of
gold at different locations in that region.
– I.e., theoretically, the expected value of gold concentration should not vary with
latitude and longitude
– In other words, we would expect that there is some general, average, value of
gold concentration (called global structure) that is constant throughout the region
(even though we assume it’s constant, we do not know what its value is)
• Of course, when we look at the data, we see that there is some variability in
the gold concentrations at different points. We can consider this to be a local
deviation from the overall global structure, known as the local structure or
residual or error term.
• In other words, geostatisticians would decompose the value of gold Y(p) into
the global structure μ(p) and local structure ε(p).
• Y(p) = μ(p) + ε(p)
137
ε(p)
• As per the First Law of Geography, the local
structures ε(p) of nearby observations will often be
correlated. That is, there is still some meaningful
information (i.e., spatial dependencies) that can be
extracted from the spatially dependent component
of the residuals.
• So, our ordinary kriging model will:
– Estimate this constant but unknown global structure μ(p),
and
– Incorporate the dependencies among the residuals ε(p).
Doing so will enable us to create a continuous surface of
gold concentration in western PA.
138
Assumptions of Ordinary Kriging
• For the sake of the methods that we will be employing, we need to
make some assumptions:
– Y(p) should be normally distributed
– The global structure μ(p) is constant and unknown (as in the
gold example)
– Covariance between values of ε depends only on distance
between the points,
• To put it more formally, for each distance h and each pair of
locations p and t within the region of interest that are h units are
apart, there exists a common covariance value, C(h), such that
covariance [ε(p), ε(t)] = C(h).
• This is called isotropy
139
Covariance and Distance
• From the First Law of Geography it would then follow that as distance
between points increases, the similarity (i.e., covariance or correlation)
between the values at these points decreases
• If we plot this out, with inter-point distance h on the x-axis, and
covariance C(h) on the y-axis, we get a graph that looks something like
the one below. This representation of covariance as a function of distance
is called as the covariogram
• Alternatively, we can plot correlation against distance (the correlogram)
140
Covariograms and Weights
• Geostatistical methods incorporate this covariance-
distance relationship into the interpolation models
– More specifically, this information is used to calculate the
weights
– As IDW, kriging is a weighted average of points in the
vicinity
• Recall that in IDW, in order to predict the value at an unknown
point, we assume that nearer points will have higher weights (i.e.,
weights are determined based on distance)
• In geostatistical techniques, we calculate the distances between
the unknown point at which we want to make a prediction and
the measured points nearby, and use the value of the
covariogram for those distances to calculate the weight of each
of these surrounding measured points.
– I.e., the weight of a point h units away will depend on the value of
C(h)
141
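For contrast, the IDW weighting recalled above can be sketched like this (Python; the inverse-square power is an illustrative assumption — IDW implementations let you choose the exponent):

```python
def idw_weights(distances, power=2.0):
    """Inverse-distance weights for the measured points surrounding the
    unknown location, normalized to sum to 1. Nearer points weigh more."""
    raw = [1.0 / d ** power for d in distances]
    total = sum(raw)
    return [w / total for w in raw]

# Three measured points at distances 1, 2 and 4 from the unknown point:
w = idw_weights([1.0, 2.0, 4.0])
print([round(x, 4) for x in w])
```

In kriging the `1/d**power` term would instead be replaced by the covariogram value C(d), so the weights reflect the data's own spatial structure rather than a fixed distance-decay rule.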
But…
• Unfortunately, it so happens that one generally cannot estimate
covariograms and correlograms directly
• For that purpose, a related function of distance (h) called the semi-
variogram (or simply the variogram) is calculated
– The variogram is denoted by γ(h)
– One can easily obtain the covariogram from the variogram (but not the
other way around)
• Covariograms and variograms tell us the spatial structure of the data
Covariogram C(h) Variogram γ(h)
142
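The "easy" direction rests on the identity C(h) = sill − γ(h), which holds under the stationarity assumptions already made. A one-line sketch:

```python
def covariogram(gamma_h: float, sill: float) -> float:
    """C(h) = sill - gamma(h) under second-order stationarity.
    Going the other way requires the sill, which the variogram alone
    may never reach -- hence 'not the other way around'."""
    return sill - gamma_h

print(covariogram(0.0, 10.0))   # at h = 0: covariance equals the sill
print(covariogram(10.0, 10.0))  # once gamma reaches the sill: covariance is 0
```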
Interpretation of Variograms
• As mentioned earlier, a covariogram might be thought of as covariance (i.e., similarity)
between point values as a function of distance, such that C(h) is greater at smaller
distances
• A variogram, on the other hand, might be thought of as “dissimilarity between point
values as a function of distance”, such that the dissimilarity is greater for points that are
farther apart
• Variograms are usually interpreted in terms of the corresponding covariograms or
correlograms
• A common mistake when interpreting variograms is to say that variance increases with
distance.
Covariogram C(h) Variogram γ(h)
143
• When there are n points, the number of inter-point distances is equal to n(n − 1)/2
• Example:
– With 15 points, we have 15(15-1)/2 = 105 inter-point distances (marked in yellow on the grid in the
lower left)
– Since we’re using Euclidean distance, the distance between points 1 and 2 is the same as the
distance between points 2 and 1, so we count it only once. Also, the distance between a point and
itself will always be zero, and is of no interest here.
• The maximum distance h on a covariogram or variogram is called the bandwidth,
and should equal half the maximum inter-point distance.
– In the figure on the lower right, the blue line connects the points that are the farthest away from
each other. The bandwidth in this example would then equal to half the length of the blue line
Bandwidth (The Maximum Value of h)
144
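Both rules on this slide are easy to check in code (Python sketch; function names are my own):

```python
from itertools import combinations
from math import hypot

def n_pairs(n: int) -> int:
    """Number of distinct inter-point distances among n points."""
    return n * (n - 1) // 2

def bandwidth(points) -> float:
    """Half the maximum inter-point distance ('the blue line' rule)."""
    dmax = max(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in combinations(points, 2))
    return dmax / 2.0

print(n_pairs(15))                          # 105, as in the example
print(bandwidth([(0, 0), (3, 4), (1, 1)]))  # farthest pair is 5 apart -> 2.5
```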
Mathematical definition of a variogram
• In other words, for each distance h between 0 and the bandwidth
– Find all pairs of points i and j that are separated by that distance h
– For each such point pair, subtract the value of Y at point j from the
value of Y at point i, and square the difference
– Average these squared differences across all point pairs and divide the average by 2. That's your variogram value!
• Division by 2 -> hence the occasionally used name semi-variogram
• However, in practice, there will generally be only one pair of points that
are exactly h units apart, unless we’re dealing with regularly spaced
samples. Therefore, we create “bins”, or distance ranges, into which we
place point pairs with similar distances, and estimate γ only for midpoints
of these bins rather than at all individual distances.
– These bins are generally of the same size
– It’s a rule of thumb to have at least 30 point pairs per bin
• We call these estimates of γ(h) at the bin midpoints the empirical
variogram
γ(h) = (1/2) · average[ (Y(i) − Y(j))² ]  over all point pairs (i, j) separated by distance h
145
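Those steps translate almost line-for-line into code. A sketch (Python; equal-width bins as recommended above, names my own):

```python
from itertools import combinations
from math import hypot

def empirical_variogram(points, bin_width, max_dist):
    """points: list of (x, y, z). For each distance bin, average the
    squared value differences over the point-pairs falling in the bin
    and halve it. Returns (bin midpoint, gamma, pair count) tuples."""
    n_bins = int(max_dist / bin_width)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        h = hypot(x2 - x1, y2 - y1)
        if h >= max_dist:
            continue
        b = int(h / bin_width)
        sums[b] += (z1 - z2) ** 2
        counts[b] += 1
    return [((b + 0.5) * bin_width, 0.5 * sums[b] / counts[b], counts[b])
            for b in range(n_bins) if counts[b] > 0]

pts = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 4.0)]
print(empirical_variogram(pts, bin_width=1.0, max_dist=3.0))
```

With real data one would use far more points per bin (the rule of thumb above: at least 30).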
Fitting a Variogram Model
• Now, we’re going to fit a variogram model (i.e., curve) to the
empirical variogram
• That is, based on the shape of the empirical variogram,
different variogram curves might be fit
• The curve fitting generally employs the method of least
squares – the same method that’s used in regression analysis
A very comprehensive guide on variography by Dr. Tony Smith (University of Pennsylvania)
http://www.seas.upenn.edu/~ese502/NOTEBOOK/Part_II/4_Variograms.pdf
146
The Variogram Parameters
• The variogram models are a function of three parameters, known as the range, the sill, and
the nugget.
– The range, denoted r, is typically the value of h at which the correlation between point values is zero (i.e., there is no longer any spatial autocorrelation)
– The value of γ at r is called the sill, and is generally denoted by s
• The variance of the sample is used as an estimate of the sill
– Different models have slightly different definitions of these parameters
– The nugget deserves a slide of its own
Graph taken from: http://www.geog.ubc.ca/courses/geog570/talks_2001/Variogr1neu.gif
147
Spatial Independence at Small Distances
• Even though we assume that values at points that are very
near each other are correlated, points that are separated by
very, very small values might be considerably less correlated
– E.g.: you might find a gold nugget and no more gold in the vicinity
• In other words, even though γ(0) is always 0, γ at very, very small distances may equal a value a that is considerably greater than 0.
• This value denoted by a is called the nugget
• The ratio of the nugget to the sill is known as the nugget
effect, and may be interpreted as the percentage of variation
in the data that is not spatial
• The difference between the sill and the nugget is known as
the partial sill
– The partial sill, and not the sill itself, is reported in GeoStatistical
Analyst
148
Pure Nugget Effect Variograms
• Pure nugget effect is when the covariance between point values is zero at
all distances h
• That is, there is absolutely no spatial autocorrelation in the data (even at
small distances)
• Pure nugget effect covariogram and variogram are presented below
• Interpolation won’t give a reasonable predictions
• Most cases are not as extreme and have both a spatially dependent and a
spatially independent component, regardless of variogram model chosen
(discussed on following slides)
149
The Spherical Model
• The spherical model is the most widely used variogram model
• Monotonically non-decreasing
– I.e., as h increases, the value of γ(h) does not decrease: it goes up (while h ≤ r) or stays the same (h > r)
• γ(h≥r)=s and C(h≥r)=0
– That is, covariance is assumed to be exactly zero at distances h≥r
150
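A sketch of the spherical model in its usual parameterization (Python; the 1.5(h/r) − 0.5(h/r)³ polynomial is the standard form, with nugget, sill and range as defined on the previous slides):

```python
def spherical(h, nugget, sill, range_):
    """Spherical variogram model: gamma rises as a cubic polynomial in
    h/range_ until h reaches the range, then stays flat at the sill."""
    if h == 0:
        return 0.0                    # gamma(0) is always 0
    if h >= range_:
        return sill                   # gamma(h >= r) = s, so C(h >= r) = 0
    hr = h / range_
    return nugget + (sill - nugget) * (1.5 * hr - 0.5 * hr ** 3)

print(spherical(50.0, 2.0, 10.0, 100.0))   # part-way up the curve: 7.5
print(spherical(150.0, 2.0, 10.0, 100.0))  # beyond the range: 10.0
```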
The Exponential Model
• The exponential variogram looks very similar to the spherical model, but assumes
that the correlation never reaches exactly zero, regardless of how great the
distances between points are
• In other words, the variogram approaches the value of the sill asymptotically
• Because the sill is never actually reached, the range is generally considered to be
the smallest distance after which the covariance is 5% or less of the maximum
covariance
• The model is monotonically increasing
– I.e., as h goes up, so does γ(h)
151
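A sketch of the exponential model (Python). The factor 3 in the exponent is a common convention that makes γ reach about 95% of the sill at h = range, matching the "5% or less of the maximum covariance" definition above; some texts fold that factor into the range parameter instead.

```python
from math import exp

def exponential(h, nugget, sill, range_):
    """Exponential variogram: approaches the sill asymptotically,
    never quite reaching it."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - exp(-3.0 * h / range_))

# At the effective range, gamma is ~95% of the sill:
print(round(exponential(100.0, 0.0, 10.0, 100.0), 3))
```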
The Wave (AKA Hole-Effect) Model
On the picture to the left, the waves exhibit a
periodic pattern. A non-standard form of spatial
autocorrelation applies. Peaks are similar in values
to other peaks, and troughs are similar in values to
other troughs. However, note the dampening in the
covariogram and variogram below: That is, peaks
that are closer together have values that are more
correlated than peaks that are farther apart (and the
same holds for troughs).
More is said about the applicability of these models in
http://www.gaa.org.au/pdf/gaa_pyrcz_deutsch.pdf
Variogram graph edited slightly from:
http://www.seas.upenn.edu/~ese502/NOTEBOOK/Part_
II/4_Variograms.pdf
152
Variograms and Kriging Weights
153
5. VARIOGRAM MODELING/ANALYSIS
154
The empirical variogram
• To summarize the variogram cloud, group the separations
into lags (separation bins, like a histogram)
• Then, compute the average semivariance of all the point-
pairs in the bin.
• This is the empirical variogram, as the so-called Matheron estimator:

γ̂(h) = (1 / 2m(h)) · Σi [z(xi) − z(xi + h)]²

– m(h) is the number of point pairs separated by vector h, in practice some range (bin)
– These are indexed by i; the notation z(xi + h) means the “tail” of point-pair i, i.e. separated from the “head” xi by the separation vector h.
155
Unit 5: Variogram Modeling/Analysis
156
Defining the bins
• There are some practical considerations, just like defining bins for a histogram:
– Each bin should have enough points to give a robust estimate of the
representative semi-variance; otherwise the variogram is erratic;
– If a bin is too wide, the theoretical variogram model will be hard to
estimate and fit; note we haven't seen this yet, it is in the next lecture;
– The largest separation should not exceed half the longest separation in
the dataset;
– In general the largest separation should be somewhat shorter, since it
is the local spatial dependence which is most interesting.
• All computer programs that compute variograms use some defaults
for the largest separation and number of bins; gstat uses 1/3 of the
longest separation, and divides this into 15 equal-width bins.
157
Numerical example of an empirical
variogram
• Here is an empirical
variogram of log10Pb
from the Jura soil
samples; for simplicity
the maximum
separation was set to
1.5 km:
– np are the number of
point-pairs in the bin;
dist is the average
separation of these
pairs; gamma is the
average semivariance
in the bin.
158
Plotting the empirical variogram
• This can be plotted as semivariance gamma
against average separation dist, along with the
number of points that contributed to each
estimate np.
160
Features of the empirical variogram
• Later we will look at fitting a theoretical model to the
empirical variogram; but even without a model we can
notice some features which characterize the spatial
dependence, which we define here only qualitatively:
– Sill: maximum semi-variance
• represents variability in the absence of spatial dependence
– Range: separation between point-pairs at which the sill is
reached
• distance at which there is no evidence of spatial dependence
– Nugget: semi-variance as the separation approaches zero
• represents variability at a point that can't be explained by spatial
structure
163
[Figure: semivariogram — semivariance (0–60) vs. lag (0–200 m), with the sill, range, and nugget annotated]
[Figure: semivariogram — semivariance vs. lag (m), partitioned into its spatially dependent and spatially independent components]
Semivariogram uses
• Use the range to determine maximum sampling distances
• The sill indicates intra-field variability
• The model can be used for interpolation
of values in unsampled areas
Effect of bin width
• The same set of points can be displayed with many bin
widths
• This has the same effect as different bin widths in a
univariate histogram: same data, different visualization
• In addition, visual and especially automatic variogram
fitting is affected
• Wider (fewer) bins → less detail, also less noise
• Narrower (more) bins → more detail, but also more noise
• General rule:
– as narrow as possible (detail) without “too much” noise;
– and with sufficient point-pairs per bin (> 100, preferably > 200)
169
170
Evidence of spatial dependence
• The empirical variogram provides evidence that there is
local spatial dependence.
• The variability between point-pairs is lower if they are
closer to each other; i.e. the separation is small.
• There is some distance, the range, where this effect is noted; beyond the range there is no dependence.
• The relative magnitude of the total sill and nugget give the
strength of the local spatial dependence; the nugget
represents completely unexplained variability.
• There are of course variables for which there is no spatial
dependence, in which case the empirical variogram has the
sill equal to the nugget; this is called a pure nugget effect
• The next graph shows an example.
171
[Figure: empirical variogram showing a pure nugget effect]
172
Visualizing anisotropy
• Anisotropy
• Variogram surfaces
• Directional variograms
173
What?
• We have been considering spatial dependence as if it is the same in
all directions from a point (isotropic or omnidirectional).
• For example, if I want to know the weather at a point where there
is no station, I can equally consider stations at some distance from
my location, no matter whether they are N, S, E or W.
• But this is self-evidently not always true! In this example, suppose the winds almost always blow from the North. Then the temperatures recorded at stations 100 km to the N or S of me will likely be closer to the temperature at my station than temperatures recorded at stations 100 km to the E or W.
• We now see how to detect anisotropy.
174
Anisotropy
• Greek “iso-” + “-tropic” = English “same” + “trend”; Greek “an-” = English “not-”
• Variation may depend on direction, not just
distance
• This is why we refer to the separation vector; up
till now this has just meant distance, but now it
includes direction
– Case 1: same sill, different ranges in different directions
(geometric, also called affine, anisotropy)
– Case 2: same range, sill varies with direction (zonal
anisotropy)
175
Spatial trends
• Isotropic - trend is a function of distance from
a known (sampled) point only
• Anisotropic - trend is a function of both
distance and direction from a known point
How can anisotropy arise?
• Directional process
– Example: sand content in a narrow flood plain:
much greater spatial dependence along the axis
parallel to the river
– Example: population density in a hilly terrain with
long, linear valleys
• Note that the nugget must logically be
isotropic: it is variation at a point (which has
no direction)
177
How do we detect anisotropy?
1. Looking for directional patterns in the post-plot;
2. With a variogram surface, sometimes called a
variogram map;
3. Computing directional variograms, where we consider only point pairs separated not just by a given distance but also lying in a given horizontal direction from each other.
• We can compute different directional variograms
and see if they have different structure.
178
Detecting anisotropy with a variogram
surface
• One way to see anisotropy is with a variogram surface, sometimes called a variogram map.
• This is not a map! but rather a plot of semivariances vs.
distance and direction (the separation vector)
• Each grid cell shows the semivariance at a given
distance and direction separation (lag)
• Symmetric by definition, can be read in either direction
• A transect from the origin to the margin gives a
directional variogram (next visualization technique)
179
Reviewing Ordinary Kriging
• Again, ordinary kriging will:
– Give us an estimate of the constant but unknown global
structure μ(p), and
– Use variography to examine the dependencies among the
residuals ε(p) and to create kriging weights.
• We calculate the distances between the unknown point at which
we want to make a prediction and the measured points that are
nearby and use the value of the covariogram for those distances to
calculate the weight of each of these surrounding measured
points.
• The end result is, of course, a continuous prediction
surface
• Prediction standard errors can also be obtained – this
is a surface indicating the accuracy of prediction
185
• Now, take another example: imagine we have data on the
temperature at 100 different weather stations (call them
w1..w100) throughout Florida, and we want to predict the
values of temperature (T) at every point w in the entire state
using these data.
• Notation: temperature at point w is denoted by T(w)
• We know that temperatures at lower latitudes are expected to be higher. So, T(w) will be expected to vary with latitude
– Ordinary kriging is not appropriate here, because it assumes that the
global structure is the same everywhere. This is clearly not the case
here.
– A method called universal kriging allows for a non-constant global
structure
• We might model the global structure μ as in regression:

μ(w) = β0 + β1 · latitude(w)

• Everything else in universal kriging is pretty much the same as in ordinary kriging (e.g., variography)
Universal Kriging
186
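Estimating such a trend is ordinary least squares. A minimal sketch (Python; the station values below are invented for illustration):

```python
def fit_trend(lats, temps):
    """Least-squares estimates of b0, b1 in mu(w) = b0 + b1 * latitude(w)."""
    n = len(lats)
    mean_x = sum(lats) / n
    mean_y = sum(temps) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(lats, temps))
          / sum((x - mean_x) ** 2 for x in lats))
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Temperatures falling one degree per degree of latitude (made-up data):
b0, b1 = fit_trend([25.0, 27.0, 29.0, 31.0], [28.0, 26.0, 24.0, 22.0])
print(b0, b1)
```

Universal kriging would then carry out variography and kriging on the residuals around this fitted trend.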
Some More Advanced Techniques
• Indicator Kriging is a geostatistical interpolation method that does not require the data to be normally distributed.
• Co-kriging is an interpolation technique that is used
when there is a second variable that is strongly
correlated with the variable from which we’re trying to
create a surface, and which is sampled at the same set
of locations as our variable of interest and at a number
of additional locations.
• For more details on indicator kriging and co-kriging,
see one of the texts suggested at the end of this
presentation
187
Isotropy vs. Anisotropy
• When we use isotropic (or omnidirectional)
covariograms, we assume that the covariance
between the point values depends only on distance
– Recall the covariance stationarity assumption
• Anisotropic (or directional) covariograms are used
when we have reason to believe that direction plays
a role as well (i.e., covariance is a function of both
distance and direction)
– E.g., in some problems, accounting for direction is
appropriate (e.g., when wind or water currents might be a
factor)
For more on anisotropic variograms, see http://web.as.uky.edu/statistics/users/yzhen8/STA695/lec05.pdf
188
IDW vs. Kriging
• We get a more “natural” look to the data with Kriging
• You see the “bulls eye” effect in IDW but not (as much) in Kriging
• Kriging helps to compensate for the effects of data clustering, assigning individual points within a cluster less weight than isolated data points (or, treating clusters more like single points)
• Kriging also gives us a standard error
• If the data locations are quite dense and uniformly distributed throughout the area of interest, we will get
decent estimates regardless of which interpolation method we choose.
• On the other hand, if the data locations fall in a few clusters and there are gaps in between these clusters,
we will obtain pretty unreliable estimates regardless of whether we use IDW or Kriging.
These are interpolation results using the gold data in Western PA (IDW vs. Ordinary Kriging)
6. KRIGING (SPATIAL ESTIMATION)
189
Why other methods than simple interpolation?
• In next units we will look at ordinary kriging
• Ordinary kriging is “linear” because
– its estimates are weighted linear combinations of the available data; it is “unbiased” since it tries to have mR, the mean residual or error, equal to 0;
– it is “best” because it aims at minimizing σ²R (the variance of the errors).
• All of the other estimation methods we have seen so
far are also linear and, as we have already seen, are
also theoretically unbiased.
• The distinguishing feature of ordinary kriging,
therefore, is its aim of minimizing the error variance.
190
Unit 6: Kriging (spatial estimation)
How to deal with error
• The importance of this for ordinary kriging is that
– we never know mR and therefore cannot guarantee
that it is exactly 0.
– Nor do we know σ²R; therefore, we cannot minimize it.
• The best we can do is to build a model of the data
we are studying and work with the average error
and the error variance for the model.
191
variance
• In ordinary kriging, we use a probability model in which the bias and the error variance can both be calculated, and then choose weights for the nearby samples that ensure that the average error for our model, mR, is exactly 0 and that our modeled error variance, σ²R, is minimized.
192
ordinary kriging system
• This system of equations, often referred to as the ordinary kriging system, can be written in matrix notation as C · w = D, where C is the covariance matrix between the sample locations, w the vector of weights, and D the vector of covariances between the samples and the estimation point (both matrices are built below).
193
weights
• To solve for the weights, we multiply both sides of the previous equation by C⁻¹, the inverse of the left-hand-side covariance matrix: w = C⁻¹ · D
194
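These two steps — build the system, then multiply by C⁻¹ — can be sketched with a small stdlib-only solver (Python; the toy covariance numbers are invented, and the row of ones with the Lagrange multiplier is the standard unbiasedness augmentation of the ordinary kriging system):

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (stdlib only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ok_weights(C, D):
    """Ordinary kriging: augment C and D with the unbiasedness constraint
    (weights sum to 1) plus a Lagrange multiplier, then solve."""
    n = len(C)
    A = [C[i][:] + [1.0] for i in range(n)] + [[1.0] * n + [0.0]]
    sol = solve(A, D[:] + [1.0])
    return sol[:n], sol[n]   # kriging weights, Lagrange multiplier

# Two samples equally similar to the target point share the weight:
w, mu = ok_weights([[10.0, 4.0], [4.0, 10.0]], [6.0, 6.0])
print(w)
```

The estimate at the target is then the weighted sum of the sample values, exactly as in the seven-sample example that follows.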
An Example of Ordinary Kriging
• Let us return to the seven sample data
configuration we used earlier to see a specific
example of how ordinary kriging is done. The
data configuration is shown again in next
slides; we have labelled the point we are
estimating as location 0, and the sample
locations as 1 through 7. The coordinates of
these eight points are given in Table following
the figure, along with the available sample
values.
195
An example of a data configuration
196
• An example of
a data
configuration
to illustrate the
kriging
estimator.
• The sample
value is given
immediately to
the right of the
plus sign.
197
Coordinates and sample values for the data shown
in previous Figure
Pattern of spatial continuity
• To calculate the ordinary kriging weights, we
must first decide what pattern of spatial
continuity we want our random function
model to have.
198
covariances
• To keep this example relatively simple, we will
calculate all of our covariances from the
following function:
199
An example of an exponential covariance function.
variogram
• The covariance function corresponds to the
following variogram:
200
An example of an exponential
variogram model
Remark on the covariance & variogram model
– Co
• commonly called the
nugget effect
• provides a
discontinuity at the
origin.
201
– a
• commonly called the range
• provides a distance beyond which the variogram or covariance value remains essentially constant.
– Co + C1
• commonly called the sill
• is the variogram value for very large distances, γ(∞); it is also the covariance value for |h| = 0, and the variance of our random variables, σ².
Both of these functions, shown in the previous two
slides, can be described by the following parameters:
• Geostatisticians normally define the spatial continuity
in their random function model through the variogram
and solve the ordinary kriging system using the
covariance. In this example, we will use the covariance
function throughout.
• By using the covariance function, we have chosen to
ignore the possibility of anisotropy for the moment;
the covariance between the data values at any two
locations will depend only on the distance between
them and not on the direction. Later, when we
examine the effect of the various parameters, we will
also study the important effect of anisotropy.
202
203
A table of distances, from the previous Figure,
between all possible pairs of the seven data
locations.
• To demonstrate how ordinary kriging works,
we will use the following parameters for the
function given in the following Equation:
204
• These are not necessarily good choices, but
they will make the details of the ordinary
kriging procedure easier to follow since our
covariance model now has a quite simple
expression:
205
• Having chosen a covariance function from
which we can calculate all the covariances
required for our random function model, we
can now build the C and D matrices.
206
207
Using Table, which provides the distances between
every pair of locations, and Equation above, the C
matrix is
208
209
210
211
The set of weights
that will provide
unbiased estimates
with a minimum
estimation variance is
calculated by multiplying C⁻¹ by D:
212
The ordinary
kriging weights for
the seven samples
using the isotropic
exponential
covariance model
given in Equation
below. The sample
value is given
immediately to the
right of the plus
sign while the
kriging weights are
shown in parentheses.
• Below is shown the sample values along with
their corresponding weights. The resulting
estimate is
213
214
The minimized error variance is expressed as
215
The minimized estimation variance is
Detailed exercise
• Refer to the practical exercise on
– interpolation using IDW
– kriging
216
Spatial Interpolation: A Brief
Introduction
Eugene Brusilovskiy
218
• Introduction to interpolation
• Deterministic interpolation methods
• Some basic statistical concepts
• Autocorrelation and First Law of Geography
• Geostatistical Interpolation
– Introduction to variography
– Kriging models
General Outline
219
What is Interpolation?
• Assume we are dealing with a variable which has meaningful values at every point
within a region (e.g., temperature, elevation, concentration of some mineral).
Then, given the values of that variable at a set of sample points, we can use an
interpolation method to predict values of this variable at every point
– For any unknown point, we take some form of weighted average of the values at
surrounding points to predict the value at the point where the value is unknown
– In other words, we create a continuous surface from a set of points
– As an example used throughout this presentation, imagine we have data on the
concentration of gold in western Pennsylvania at a set of 200 sample locations:
(Figure: input sample points → interpolation process → output surface.)
Appropriateness of Interpolation
• Interpolation should not be used when there isn’t a
meaningful value of the variable at every point in space
(within the region of interest)
• That is, when points represent merely the presence of events
(e.g., crime), people, or some physical phenomenon (e.g.,
volcanoes, buildings), interpolation does not make sense.
• Whereas interpolation tries to predict the value of your
variable of interest at each point, density analysis (available,
for instance, in ArcGIS’s Spatial Analyst) “takes known
quantities of some phenomena and spreads it across the
landscape based on the quantity that is measured at each
location and the spatial relationship of the locations of the
measured quantities”.
– Source: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=Understanding_density_analysis
Interpolation vs. Extrapolation
• Interpolation is prediction within the range of our data
– E.g., having temperature values for a bunch of
locations all throughout PA, predict the temperature
values at all other locations within PA
• Note that the methods we are talking about are strictly
those of interpolation, and not extrapolation
• Extrapolation is prediction outside the range of our data
– E.g., having temperature values for a bunch of
locations throughout PA, predict the temperature
values in Kazakhstan
First Law of Geography
• “Everything is related to everything else, but
near things are more related than distant
things.”
– Waldo Tobler (1970)
• This is the basic premise
behind interpolation, and
near points generally
receive higher weights
than far away points
Reference: TOBLER, W. R. (1970). "A computer movie simulating urban growth in the
Detroit region". Economic Geography, 46(2): 234-240.
Methods of Interpolation
• Deterministic methods
– Use mathematical functions to calculate the values at unknown locations
based either on the degree of similarity (e.g. IDW) or the degree of smoothing
(e.g. RBF) in relation with neighboring data points.
– Examples include:
• Inverse Distance Weighted (IDW)
• Radial Basis Functions (RBF)
• Geostatistical methods
– Use both mathematical and statistical methods to predict values at all
locations within region of interest and to provide probabilistic estimates of the
quality of the interpolation based on the spatial autocorrelation among data
points.
• Include a deterministic component and errors (uncertainty of prediction)
– Examples include:
• Kriging
• Co-Kriging
Reference: http://www.crwr.utexas.edu/gis/gishydro04/Introduction/TermProjects/Peralvo.pdf
Exact vs. Inexact Interpolation
• Interpolators can be either exact or inexact
– At sampled locations, exact interpolators yield values identical to the
measurements.
• I.e., if the observed temperature in city A is 90 degrees, the point
representing city A on the resulting grid will still have the temperature of
90 degrees
– At sampled locations, inexact interpolators predict values that are
different from the measured values.
• I.e., if the observed temperature in city A is 90 degrees, the inexact
interpolator will still create a prediction for city A, and this prediction will
not be exactly 90 degrees
– The resulting surface will not pass through the original point
– Can be used to avoid sharp peaks or troughs in the output surface
• Model quality can be assessed by the statistics of the differences between
predicted and measured values
– Jumping ahead, the two deterministic interpolators that will be briefly
presented here are exact. Kriging can be exact or inexact.
Reference: Burrough, P. A., and R. A. McDonnell. 1998. Principles of geographical information systems. Oxford University Press,
Oxford. 333pp.
Part 1. Deterministic Interpolation
Inverse Distance Weighted (IDW)
• IDW interpolation explicitly relies on the First Law of
Geography. To predict a value for any unmeasured location,
IDW will use the measured values surrounding the prediction
location. Measured values that are nearest to the prediction
location will have greater influence (i.e., weight) on the
predicted value at that unknown point than those that are
farther away.
– Thus, IDW assumes that each measured point has a local influence
that diminishes with distance (or distance to the power of q > 1), and
weighs the points closer to the prediction location greater than those
farther away, hence the name inverse distance weighted.
• Inverse Squared Distance (i.e., q=2) is a widely used interpolator
• For example, ArcGIS allows you to select the value of q.
• Weights of each measured point are proportional to the
inverse distance raised to the power value q. As a result, as
the distance increases, the weights decrease rapidly. How fast
the weights decrease is dependent on the value for q.
Source: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=How_Inverse_Distance_Weighted_(IDW)_interpolation_works
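The weighting scheme described above can be sketched in a short function. The points below are hypothetical, not the gold data; the sketch also shows why IDW is an exact interpolator:

```python
import numpy as np

def idw(xy_known, z_known, xy_target, q=2.0):
    # weights proportional to 1/d^q; an exact hit returns the measured
    # value, which is what makes IDW an exact interpolator
    d = np.linalg.norm(xy_known - xy_target, axis=1)
    if np.any(d == 0):
        return z_known[np.argmin(d)]
    w = 1.0 / d ** q
    return np.sum(w * z_known) / np.sum(w)

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # hypothetical samples
z = np.array([10.0, 20.0, 30.0])
print(idw(pts, z, np.array([0.0, 0.0])))               # exact at a sample: 10.0
print(idw(pts, z, np.array([0.3, 0.3]), q=0.0))        # q = 0: plain mean, 20.0
```

Note that with q = 0 every weight equals 1, so the prediction collapses to the plain average of all samples regardless of distance.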
Inverse Distance Weighted - Continued
• Because things that are close to one another are more alike
than those farther away, as the locations get farther away, the
measured values will have little relationship with the value of
the prediction location.
– To speed up the computation we might only use several points that
are the closest
– As a result, it is common practice to limit the number of measured
values that are used when predicting the unknown value for a location
by specifying a search neighborhood. The specified shape of the
neighborhood restricts how far and where to look for the measured
values to be used in the prediction. Other neighborhood parameters
restrict the locations that will be used within that shape.
• The output surface is sensitive to clustering and the presence
of outliers.
Search Neighborhood Specification
Points with known elevation values that fall outside the circle are too far from the target point (where the elevation value is unknown), so their weights are effectively 0. The 5 nearest neighbors with known values (shown in red) of the unknown point (shown in black) will be used to determine its value.
The Accuracy of the Results
• One way to assess the accuracy of the interpolation is known
as cross-validation
– Remember the initial goal: use all the measured points to create a
surface
– However, assume we remove one of the measured points from our
input, and re-create the surface using all the remaining points.
– Now, we can look at the predicted value at that removed point and
compare it to the point’s actual value!
– We do the same thing for all the points
– If the average (squared) difference between the actual value and the
prediction is small, then our model is doing a good job at predicting
values at unknown points. If this average squared difference is large,
then the model isn’t that great. This average squared difference is
called mean square error of prediction. For instance, the Geostatistical
Analyst of ESRI reports the square root of this average squared
difference
– Cross-validation is used in other interpolation methods as well
A Cross-Validation Example
• Assume you have measurements at 15 data points,
from which you want to create a prediction surface
• The Measured column tells you the measured value
at that point. The Predicted column tells you the
prediction at that point when we remove it from the
input (i.e., use the other 14 points to create a
surface). The Error column is simply the difference
between the measured and predicted values.
• Because we can have an over-prediction or under-
prediction at any point, the error can be positive or
negative. So averaging the errors won’t do us much
good if we want to see the overall error – we’ll end
up with a value that is essentially zero due to these
positives and negatives
• Thus, in order to assess the extent of error in our
prediction, we square each term, and then take the
average of these squared errors. This average is called
the mean squared error (MSE)
• For example, ArcGIS reports the square root of this
mean squared error (referred to as simply Root-
Mean-Square in Geostatistical Analyst). This root
mean square error is often denoted as RMSE.
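The leave-one-out procedure just described can be sketched directly. The predictor here is only a stand-in (a plain average of the remaining points) for whatever interpolator is being validated, and the data are made up:

```python
import numpy as np

def loo_rmse(pts, z, predict):
    # leave-one-out cross-validation: drop each point, predict it from
    # the rest, and summarize the errors as a root-mean-square error
    errs = []
    for i in range(len(pts)):
        mask = np.arange(len(pts)) != i
        errs.append(predict(pts[mask], z[mask], pts[i]) - z[i])
    return np.sqrt(np.mean(np.square(errs)))

def predict_mean(pts_known, z_known, target):
    # stand-in predictor: plain average of the remaining points
    return z_known.mean()

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([1.0, 2.0, 3.0, 4.0])
rmse = loo_rmse(pts, z, predict_mean)
print(rmse)                    # sqrt(20/9), about 1.49, for this toy data
```

Squaring before averaging prevents positive and negative errors from cancelling, exactly as described above.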
Examples of IDW with Different q’s
• Larger q’s (i.e., power to which distance is raised) yield smoother surfaces
• Food for thought: What happens when q is set to 0?
(Figure: gold concentrations at sample locations in western PA, interpolated with q = 1, q = 2, q = 3, and q = 10.)
The Geostatistical Analyst of ArcGIS is
able to tell you the optimal value of q
by seeing which one yields the
minimum RMSE. (Here, it is q=1).
Part 2. A Review of Stats 101
Before we do any Geostatistics…
• … Let’s review some basic statistical topics:
– Normality
– Variance and Standard Deviations
– Covariance and Correlation
• … and then briefly re-examine the underlying
premise of most spatial statistical analyses:
– Autocorrelation
Normality
• A lot of statistical tests – including many in
geostatistics – rely on the assumption that the
data are normally distributed
• When this assumption does not hold, the
results are often inaccurate
(Figure: example distribution, N = 140.)
Data Transformations
• Sometimes, it is possible to transform a variable’s distribution by
subjecting it to some simple algebraic operation.
– The logarithmic transformation is the most widely used to achieve
normality when the variable is positively skewed (as in the image on
the left below)
– Analysis is then performed on the transformed variable.
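A small sketch of the logarithmic transformation on synthetic data: the log of a positively skewed (here, lognormal) sample is symmetric, so its mean and median nearly coincide after the transform:

```python
import numpy as np

rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # positively skewed

logged = np.log(skewed)       # analysis is then performed on `logged`
# before the transform the mean is pulled above the median by the long
# right tail; after the transform the two nearly coincide
print(np.mean(skewed) - np.median(skewed),
      np.mean(logged) - np.median(logged))
```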
The Mean and the Variance
• The mean (average) of a variable is also known as the
expected value
– Usually denoted by the Greek letter μ
– As an aside, for a normally distributed variable, the mean
is equal to the median
• The variance is a measure of dispersion of a variable
– Calculated as the average squared distance of the possible
values of the variable from mean.
– Standard deviation is the square root of the variance
– Standard deviation is generally denoted by the Greek letter σ, and variance is therefore denoted by σ²
Example: Calculation of Mean and Variance
Person   Test Score   Distance from the Mean   (Distance from the Mean) Squared
1         90           15                        225
2         55          -20                        400
3        100           25                        625
4         55          -20                        400
5         85           10                        100
6         70           -5                         25
7         80            5                         25
8         30          -45                       2025
9         95           20                        400
10        90           15                        225

Mean: 75
Variance: 445 (average of the entries in the last column)
Standard deviation (square root of the variance): 21.1
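The table's numbers can be checked directly. Note the slide uses the population variance (division by n, not n − 1):

```python
import numpy as np

scores = np.array([90, 55, 100, 55, 85, 70, 80, 30, 95, 90], dtype=float)

mean = scores.mean()                      # 75.0
variance = np.mean((scores - mean) ** 2)  # population variance: 445.0
std = np.sqrt(variance)                   # 21.09..., i.e. 21.1 rounded
print(mean, variance, round(std, 1))
```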
Covariance and Correlation
• Defined as a measure of how much two variables X
and Y change together
– The units of Cov (X, Y) are those of X multiplied by those of Y
– The covariance of a variable X with itself is simply the
variance of X
• Since these units are fairly obscure, a dimensionless
measure of the strength of the relationship between
variables is often used instead. This measure is known
as the correlation.
– Correlations range from -1 to 1, with positive values close to
one indicating a strong direct relationship and negative
values close to -1 indicating a strong inverse relationship
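A short sketch contrasting the unit-laden covariance with the dimensionless correlation, on a made-up perfectly linear pair of variables:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0                        # perfect direct relationship

cov_xy = np.cov(x, y, bias=True)[0, 1]   # in units of x multiplied by y
r = np.corrcoef(x, y)[0, 1]              # dimensionless: 1 for a perfect
                                         # direct relationship
r_neg = np.corrcoef(x, -y)[0, 1]         # -1 for a perfect inverse one
print(cov_xy, r, r_neg)
```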
Spatial Autocorrelation
• Sometimes, rather than examining the association
between two variables, we might look at the relationship
of values within a single variable at different time points
or locations
• There is said to be (positive) autocorrelation in a variable
if observations that are closer to each other in space
have related values (recall Tobler’s Law)
• As an aside, there could also be temporal autocorrelation
– i.e., values of a variable at points close in time will be
related
Examples of Spatial Autocorrelation
(Source: http://image.weather.com/images/maps/current/acttemp_720x486.jpg)
Examples of Spatial Autocorrelation (Cont’d)
(Source: http://capita.wustl.edu/CAPITA/CapitaReports/localPM10/gifs/elevatn.gif)
Regression
• A statistical method used to examine the relationship
between a variable of interest and one or more
explanatory variables
– Strength of the relationship
– Direction of the relationship
• Often referred to as Ordinary Least Squares (OLS)
regression
• Available in all statistical packages
• Note that the presence of a relationship does not
imply causality
For the purposes of demonstration, let’s focus
on a simple version of this problem
• Variable of interest (dependent variable)
– E.g., education (years of schooling)
• Explanatory variable (AKA independent variable or predictor):
– E.g., Neighborhood Income
But what does a regression do? An example with a single predictor
The example on the previous page can be
easily extended to cases when we have more
than one predictor
• When we have n > 1 predictors, rather than getting a line in 2 dimensions, we get a fitted (hyper)plane in n + 1 dimensions (the ‘+1’ accounts for the dependent variable)
• Each independent variable will have its own slope coefficient which will
indicate the relationship of that particular predictor with the dependent
variable, controlling for all other independent variables in the regression.
• The equation of the best-fit line becomes
Dep. Variable = m1·predictor1 + m2·predictor2 + … + mn·predictorn + b + residuals
where the m’s are the coefficients of the corresponding predictors and b is the y-intercept term
• The coefficient of each predictor may be interpreted as the amount by
which the dependent variable changes as the independent variable
increases by one unit (holding all other variables constant)
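A sketch of fitting such a multi-predictor best-fit equation by ordinary least squares, on synthetic data with known coefficients (m1 = 3, m2 = −2, b = 5) so the recovered slopes can be checked:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# true relationship: y = 3*x1 - 2*x2 + 5, plus a little noise (residuals)
y = 3.0 * x1 - 2.0 * x2 + 5.0 + rng.normal(scale=0.1, size=n)

# design matrix with a column of ones for the intercept b
X = np.column_stack([x1, x2, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
m1, m2, b = coef
print(m1, m2, b)               # close to 3, -2, and 5
```

Each fitted coefficient estimates the change in y for a one-unit increase in its predictor, holding the other predictor constant.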
Some (Very) Basic Regression Diagnostics
• R-squared: the percent of variance in the dependent variable that
is explained by the independent variables
• The so-called p-value of the coefficient
– The probability of getting a coefficient (slope) value as far from zero as we
observe in the case when the slope is actually zero
– When p is less than 0.05, the independent variable is considered to be a
statistically significant predictor of the dependent variable
– One p-value per independent variable
• The sign of the coefficient of the independent variable (i.e., the
slope of the regression line)
– One coefficient per independent variable
– Indicates whether the relationship between the dependent and
independent variables is positive or negative
– We should look at the sign when the coefficient is statistically significant
Some (but not all) regression assumptions
1. The dependent variable should be normally
distributed (i.e., the histogram of the variable
should look like a bell curve)
2. Very importantly, the observations should be
independent of each other. (The same holds
for regression residuals). If this assumption is
violated, our coefficient estimates could be
wrong!
Part 3. Geostatistical Interpolation
Some Widely Used Texts on Geostatistics
– Bailey, T.C. and Gatrell, A.C. (1995) Interactive
Spatial Data Analysis. Addison Wesley Longman,
Harlow, Essex.
– Cressie, N.A.C. (1993) Statistics for Spatial Data.
(Revised Edition). Wiley, John & Sons, Inc.,
– Isaaks, E.H. and Srivastava, R.M. (1989) An
Introduction to Applied Geostatistics. Oxford
University Press, New York, 561 p.
Geostatistics: Spatial Data Analysis

Theoretical framework:
“An Introduction to Applied Geostatistics”, E. Isaaks and R. Srivastava (1989)
“Factorial Analysis”, C. J. Adcock (1954)
“Spatial Analysis: A Guide for Ecologists”, M. Fortin and M. Dale (2005)

Basic paradigm: Ecosystems are hierarchically structured, metastable, and far from equilibrium, and their spatial relationships matter. Ecosystem processes (change) are constrained and controlled by the pattern of hierarchical scales.

“Things” closer together (in both space and time) are more alike than things far apart: Tobler’s Law (1970, Economic Geography): “Everything is related to everything, but near things are more related than distant things.”

Ecological “scale” is the space and time “distance” apart (lag) at which significant variation is no longer correlated with “distance”.

(Figure: variation plotted against space and time distance, with the ecological scale marked where the curve levels off.)
Applied Geostatistics: key concepts
• Spatial structure
• Regionalized variables
• Spatial autocorrelation: Moran I (1950), Geary C (1954)
• Semivariance
• Stationarity and anisotropy
Applied Geostatistics
Notes on Introduction to Spatial Autocorrelation

Geostatistical methods were developed for interpreting data that vary continuously over a predefined, fixed spatial region. The study of geostatistics assumes that at least some of the spatial variation observed for natural phenomena can be modeled by random processes with spatial autocorrelation.

Geostatistics is based on the theory of regionalized variables, i.e., variables distributed in space (or time), written {z(i) : i ∈ D}. Geostatistical theory holds that any measurement of a regionalized variable can be viewed as a realization of a random function (or random process, random field, or stochastic process).
Spatial Structure
Geostatistical techniques are designed to evaluate the spatial structure of a variable,
or the relationship between a value measured at a point in one place, versus a
value from another point measured a certain distance away.
Describing spatial structure is useful for:
 Indicating intensity of pattern and the scale at which that pattern is exposed
 Interpolating to predict values at unmeasured points across the domain (e.g. kriging)
 Assessing independence of variables before applying parametric tests of significance
Regionalized Variables take on values according to spatial location.
Given a variable z, measured at a location i , the variability in z can be broken down into three
components:
z(i) = f(i) + s(i) + ε

Where:
f(i) is a “structural” coarse-scale forcing or trend (usually removed by detrending);
s(i) is a random, local spatial dependency (the component we are interested in); and
ε is the error term (considered normally distributed).
Coarse scale forcing or trends can be removed by fitting a surface to the trend using
regression and then working with regression residuals
Regionalized Variable Zi

(Figure: sample points Z1, Z2, …, Zn scattered across a domain; the function Z in domain D is a set of space-dependent values, {z(i) : i ∈ D}, summarized by a histogram of the samples zi.)

Variables are spatially correlated; therefore Z(x+h) can be estimated from Z(x) by using a regression model, and the covariance Cov(Z(x), Z(x+h)) describes how values a lag h apart vary together. This assumption holds, with a recognized increase in error relative to other least-squares models.
(Figure: two small data columns X and Y with means µ, and the table of x, y, x·y, x², y² terms used in a hand calculation of correlation.)

Correlation is a statement of the extent to which two data sets agree. Geometrically, it is determined by the extent to which the two regression lines (Y on X and X on Y) depart from the horizontal and vertical: the angle θ between them shrinks as the agreement between the two distributions grows. Calculating correlation by hand produces the deviations from the means, the products of deviations, and the sums of squares.
Correlation coefficient:

r = [ Σ_{i=1..n} (y_i − ȳ)(x_i − x̄) / n ] / [ √( Σ_{i=1..n} (x_i − x̄)² / n ) · √( Σ_{i=1..n} (y_i − ȳ)² / n ) ]

Spatial autocorrelation applies the same idea to a single variable, replacing the cross-product of x with y by a weighted cross-product of x with itself at pairs of locations i and j:

I = [ Σ_i Σ_j w_ij (x_i − x̄)(x_j − x̄) / Σ_i Σ_j w_ij ] / [ Σ_i (x_i − x̄)² / n ]
Briggs UT-Dallas GISC 6382 Spring 2007
Spatial Structure

Autocorrelation := degree of correlation to self
Spatial autocorrelation := the relationship is a function of distance

Spatial structure can be:
Exogenous (induced): externally induced spatial dependence
Endogenous (inherent): inherent spatial autocorrelation

Spatial dependence: compare values at a given distance apart (lags)

Point-to-point autocorrelation for points A, B, C, D along a transect:
A - B: positive
A - C: none
A - D: negative

Direction of autocorrelation:
Anisotropic := varies in intensity and range with orientation
Isotropic := varies similarly in all directions
Spatial Structure

Given: spatial pattern is an outcome of the synthesis of dynamic processes operating at various spatial and temporal scales.
Therefore: structure at any given time is but one realization of several potential outcomes.
Assuming: all processes are stationary (homogeneous), i.e., properties are independent of absolute location and direction in space.
Therefore: observations are independent, which means they are homoscedastic and form a known distribution.
That is: μ(X_i) = μ(X_j), σ²(Z_i) = σ²(Z_j), and ρ_ij depends only on the separation of i and j, for all i, j.

Stationarity is a property of the process, NOT the data, allowing spatial inferences; and stationarity is scale dependent. Furthermore, inference (spatial statistics) applies over regions of assumed stationarity.
For points A through J in space, the first-order neighbor topology gives a binary connectivity matrix (1 = connected, 0 = not connected):

    A B C D E F G H I
B   1
C   1 1
D   1 0 1
E   1 0 0 1
F   0 0 0 1 1
G   0 0 1 1 0 1
H   0 0 0 0 0 1 1
I   0 1 1 0 0 0 1 1
J   0 1 0 0 0 0 0 1 1

Distance-class connectivity matrix (topological rather than Euclidean distances):

    A B C D E F G H I
B   1
C   1 2
D   1 2 1
E   1 2 2 1
F   2 3 2 1 1
G   2 2 1 1 2 1
H   3 2 2 2 2 1 1
I   2 1 1 2 3 2 1 1
J   2 1 2 3 3 2 2 1 1

(Figure: the spatial arrangement of points A through J from which these matrices are derived.)
Spatial Autocorrelation

(Figure: example patterns of positive autocorrelation, negative autocorrelation, and no autocorrelation.)

A variable is thought to be autocorrelated if it is possible to predict its value at a given location by knowing its value at other nearby locations.
 Autocorrelation is evaluated using structure functions that assess the spatial
structure or dependency of the variable.
 Two of these functions are autocorrelation and semivariance which are graphed as
a correlogram and semivariogram, respectively.
 Both functions plot the spatial dependence of the variable against the spatial
separation or lag distance.
Example matrices for points A, B, C, D, …, J in space:

Euclidean distance matrix:
   A     B     C    D  ….  J
A  0.00
B  2.00  0.00
C  1.41  3.16  0.00
:
:

Euclidean distance matrix (rounded to integer classes):
   A  B  C  D  ….  J
A  0
B  2  0
C  1  3  0
:
:

Connectivity matrix (1 if the pair falls within the distance class, else 0):
   A  B  C  D  ….  J
A  0
B  0  0
C  1  0  0
:
:

Weighted matrix:
   A    B    C   D  ….  J
A  0
B  0    0
C  0.7  0    0
D  0.7  0.7  0
:
:
Moran I (1950)

I(d) = (n / W) · [ Σ_i Σ_j w_ij Z_i Z_j ] / [ Σ_i Z_i² ]

The numerator is a covariance (cross-product) term; the denominator is a variance term.

Where:
n is the number of points
Z_i is the deviation from the mean of the value at location i (i.e., Z_i = x_i − x̄ for variable x)
Z_j is the deviation from the mean of the value at location j (i.e., Z_j = x_j − x̄ for variable x)
w_ij is an indicator function or weight at distance d (e.g., w_ij = 1 if j is in distance class d from point i, otherwise 0)
W is the sum of all weights (the number of pairs in the distance class)

• A cross-product statistic that is used to describe autocorrelation
• Compares the value of a variable at one location with values at all other locations
• Values range over [−1, 1]: a value of 1 indicates perfect positive correlation, and −1 indicates perfect negative correlation
Moran I (1950), written out in full for variable x:

I(d) = [ n / W(d) ] · [ Σ_i Σ_{j≠i} w_ij(d) (x_i − x̄)(x_j − x̄) ] / [ Σ_i (x_i − x̄)² ]

Again, where:
n is the number of points
w_ij(d) is the distance-class connectivity matrix (w_ij = 1 if j is in distance class d from point i, otherwise 0)
W(d) is the sum of all weights (the number of pairs in the distance class)
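Moran's I as defined above can be computed directly. This sketch uses a toy one-dimensional transect whose values trend upward, so neighboring values are alike and I comes out positive:

```python
import numpy as np

def morans_i(x, w):
    # I = (n / W) * sum_ij w_ij * Z_i * Z_j / sum_i Z_i^2,
    # with Z the deviations from the mean and W the sum of weights
    z = x - x.mean()
    n = len(x)
    W = w.sum()
    return (n / W) * (z @ w @ z) / np.sum(z ** 2)

# toy transect: each point connected only to its immediate neighbours
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.zeros((5, 5))
for i in range(4):
    w[i, i + 1] = w[i + 1, i] = 1.0
print(morans_i(x, w))          # 0.5 here: positive autocorrelation
```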
Geary C (1954)

C(d) = [ (N − 1) · Σ_i Σ_j w_ij (y_i − y_j)² ] / [ 2 W(d) · Σ_i Z_i² ]

• A squared-difference statistic for assessing spatial autocorrelation
• Considers differences in values between pairs of observations, rather than the covariation between the pairs (as Moran I does)

The numerator in this equation is a difference term that gets squared. The Geary C statistic is more sensitive to extreme values and clustering than the Moran I, and behaves like a distance measure:

Values range over [0, 3]
Value = 0 : positive autocorrelation
Value = 1 : no autocorrelation
Value > 1 : negative autocorrelation
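Geary's C for the same kind of toy transect used for Moran's I; a value below 1 signals positive autocorrelation, consistent with the description above:

```python
import numpy as np

def geary_c(x, w):
    # C = (N - 1) * sum_ij w_ij (x_i - x_j)^2 / (2 W sum_i Z_i^2)
    z = x - x.mean()
    n = len(x)
    W = w.sum()
    diff2 = (x[:, None] - x[None, :]) ** 2
    return (n - 1) * np.sum(w * diff2) / (2.0 * W * np.sum(z ** 2))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # smoothly trending values
w = np.zeros((5, 5))
for i in range(4):
    w[i, i + 1] = w[i + 1, i] = 1.0        # neighbours only
print(geary_c(x, w))           # 0.2 here: below 1, positive autocorrelation
```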
Ripley’s K (1976), with the L(d) transformation:

L(d) = √( A · Σ_i Σ_{j≠i} k(i, j) / (π · N · (N − 1)) )

Where:
A = area
N = number of points
d = distance
k(i, j) = the weight, which is 1 when the distance between i and j is < d, and 0 when it is > d

Ripley’s K determines whether features are clustered at multiple different distances. It is sensitive to the study-area boundary, and is conceptualized as the number of points within a set of radius bands. If events follow complete spatial randomness, the number of points in a circle follows a Poisson distribution (with mean less than 1) and defines the “expected” count.
General G

G(d) = [ Σ_i Σ_{j≠i} w_ij(d) x_i x_j ] / [ Σ_i Σ_{j≠i} x_i x_j ]

Where:
d = distance class
w_ij = weight matrix, which is 1 when the distance between i and j is < d, and 0 when it is > d

General G effectively distinguishes between “hot” and “cold” spots: G is relatively large if high values cluster, low if low values cluster. The numerator counts only pairs “within” the distance bound d, expressed relative to the entire study area.
Semivariance

γ(d) = [ 1 / (2 n_d) ] · Σ_i Σ_j w_ij (y_i − y_j)²

Where:
j is a point at distance d from i
n_d is the number of points in that distance class (i.e., the sum of the weights w_ij for that distance class)
w_ij is an indicator function set to 1 if the pair of points is within the distance class

Equivalently, summing over the n_d pairs separated by lag d:

γ(d) = [ 1 / (2 n_d) ] · Σ_{i=1..n_d} (y_i − y_{i+d})²
The geostatistical measure that describes the rate of change of the regionalized variable is
known as the semivariance.
Semivariance is used for descriptive analysis where the spatial structure of the data is
investigated using the semivariogram and for predictive applications where the
semivariogram is fitted to a theoretical model, parameterized, and used to predict the
regionalized variable at other non-measured points (kriging).
The sill is the value at which the semivariogram levels off (its asymptotic value)
The range is the distance at which the semivariogram levels off (the spatial extent of
structure in the data)
The nugget is the semivariance at a distance 0.0, (the y –intercept)
A semivariogram is a plot of the structure function that, like autocorrelation, describes the
relationship between measurements taken some distance apart. Semivariograms define
the range or distance over which spatial dependence exists.
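An empirical semivariogram can be computed by binning squared pairwise differences by separation distance. This is a sketch on synthetic, trending data; the function name and bin edges are illustrative choices:

```python
import numpy as np

def empirical_semivariogram(pts, z, bins):
    # gamma(d) = 1/(2 n_d) * sum of (z_i - z_j)^2 over pairs in each bin
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    iu = np.triu_indices(len(pts), k=1)          # count each pair once
    dist = d[iu]
    sqdiff = (z[:, None] - z[None, :])[iu] ** 2
    gamma = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        m = (dist >= lo) & (dist < hi)
        gamma.append(0.5 * sqdiff[m].mean() if m.any() else np.nan)
    return np.array(gamma)

rng = np.random.default_rng(2)
pts = rng.uniform(0.0, 10.0, size=(100, 2))
z = pts[:, 0] + rng.normal(scale=0.2, size=100)  # a field that trends with x
g = empirical_semivariogram(pts, z, bins=np.array([0.0, 2.0, 4.0, 6.0]))
print(g)                                         # semivariance grows with lag
```

Because nearby values are more alike, the semivariance rises with lag distance until the range is reached.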
Stationarity

Autocorrelation assumes stationarity, meaning that the spatial structure of the variable is consistent over the entire domain of the dataset. The stationarity of interest is second-order (weak) stationarity, requiring that:
(a) the mean is constant over the region;
(b) the variance is constant and finite; and
(c) the covariance depends only on the between-sample spacing.

 In many cases this is not true because of larger trends in the data.
 In these cases, the data are often detrended before analysis.
 One way to detrend data is to fit a regression to the trend, and use only the residuals for autocorrelation analysis.
Anisotropy

Autocorrelation also assumes isotropy, meaning that the spatial structure of the variable is consistent in all directions. Often this is not the case, and the variable exhibits anisotropy, meaning that there is a direction-dependent trend in the data.

If a variable exhibits different ranges in different directions, there is a geometric anisotropy. For example, in a dune deposit, the range along the wind direction is larger than the range perpendicular to it.
For predictions, the empirical semivariogram is converted to a theoretical one by fitting a statistical model (curve) to describe its range, sill, and nugget. There are four common models used to fit semivariograms:

Linear:      γ(d) = c0 + b·d   (assumes no sill or range)
Exponential: γ(d) = c0 + c·[1 − exp(−d/a)]
Gaussian:    γ(d) = c0 + c·[1 − exp(−d²/a²)]
Spherical:   γ(d) = c0 + c·[1.5(d/a) − 0.5(d/a)³] for d ≤ a;  γ(d) = c0 + c for d > a

Where:
c0 = nugget
b = regression slope
a = range
c0 + c = sill
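The four models can be written down directly. A sketch with an assumed nugget c0 = 0.1, partial sill c = 0.9, and range a = 10 (illustrative values only):

```python
import numpy as np

def linear(d, c0, b):
    # no sill or range: grows without bound
    return c0 + b * np.asarray(d, dtype=float)

def exponential(d, c0, c, a):
    return c0 + c * (1.0 - np.exp(-np.asarray(d, dtype=float) / a))

def gaussian(d, c0, c, a):
    return c0 + c * (1.0 - np.exp(-(np.asarray(d, dtype=float) ** 2) / a ** 2))

def spherical(d, c0, c, a):
    # rises as 1.5(d/a) - 0.5(d/a)^3, then stays flat at the sill
    d = np.asarray(d, dtype=float)
    g = c0 + c * (1.5 * d / a - 0.5 * (d / a) ** 3)
    return np.where(d < a, g, c0 + c)

print(spherical(0.0, 0.1, 0.9, 10.0))   # at d = 0: the nugget, 0.1
print(spherical(10.0, 0.1, 0.9, 10.0))  # at the range: the sill, 1.0
```

The exponential and Gaussian models approach the sill asymptotically, while the spherical model reaches it exactly at the range a.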
Variogram Modeling Suggestions
• Check for a sufficient number of pairs at each lag distance (from 30 to 50).
• Remove outliers.
• Truncate at half the maximum lag distance to ensure enough pairs.
• Use a larger lag tolerance to get more pairs and a smoother variogram.
• Start with an omnidirectional variogram before trying directional variograms.
• Use other variogram measures to take into account lag means and variances (e.g., inverted covariance, correlogram, or relative variograms).
• Use transforms of the data for skewed distributions (e.g., logarithmic transforms).
• Use the mean absolute difference or median absolute difference to derive the range.
  • 7. Univariate Analysis: Introduction • The probability of an event is a number between 0 and 1, representing the chance or relative frequency of occurrence of the event. The probabilities of all possible (mutually exclusive) events of an experiment must sum to 1. • In practice, the outcomes of experiments are assigned numerical values, e.g., when tossing a coin, 1 and 2 can be assigned to the outcomes “head” and “tail”, respectively. • Such numerical values can be represented by a random variable. Unit 1 7
  • 8. Univariate Analysis: Introduction • Two types of random variable exist: – discrete and – continuous • Discrete examples include – the outcome of tossing a coin (head or tail) – the grade of a course (Pass or Fail) – land use (forest, agricultural, water) Unit 1 8
  • 9. Univariate Analysis: Introduction • Continuous examples include – the height of all men in the country (ranging from, say, 1.50 to 1.90 m), – the grades of a class (e.g., 0.0 to 100.0 points) – Raster* data covering the WG area, such as • elevation of the terrain (raster data with 90 m resolution), e.g. the SRTM DEM used in GIS classes • NDVI derived from a Landsat 8 image (2013, day of year 335) • temperature in raster format Unit 1 9
  • 10. Univariate Analysis: Introduction • The probability of a random variable occurring at any possible value (discrete random variable) or within a range of values (continuous random variable) is described by its probability distribution. Unit 1 10
  • 11. Univariate Analysis: Introduction In the example (figure), P1 = Pr[X = 1] = 0.5, P2 = Pr[X = 2] = 0.5, and P1 + P2 = 1. For a discrete random variable, the distribution is just the frequency (or proportion) of occurrence. Outcomes of experiments of tossing a coin: a discrete random variable and its probability distribution. Unit 1 11
  • 12. Exercise 1: • Calculate the probability distribution of the trees in the manmade (plantation) forest according to size (diameter of the trees). We use sample data from the WGCFNR plantation forest. • Steps: – (1) assign each tree a numerical value (a discrete random variable) using its diameter class (e.g. class 1 = 10–15 cm, class 2 = 15–20 cm, etc.); – (2) calculate the proportion; – (3) plot the probability. – You can either do it by hand with a calculator, or use Excel. Does the total probability sum up to 1? We often call such a diagram a “histogram”. Unit 1 12
  • 13. Exercise 1: Table 1 — distribution of the trees in the manmade (plantation) forest according to size (diameter of the trees); data from WGCFNR forest. An example of a discrete random variable.
    DBH class (midpoint, cm) | Class range (cm) | Number of trees | X = xi
    13 | 10–15 |  5 | 1
    18 | 15–20 | 15 | 2
    23 | 20–25 | 21 | 3
    28 | 25–30 |  7 | 4
    33 | 30–35 |  6 | 5
    Unit 1 13
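The steps of Exercise 1 can be sketched in a few lines of Python; this is a minimal illustration using the class counts from Table 1, not the authors' worked solution:

```python
# Exercise 1 sketch: empirical probability distribution of tree diameters.
# Counts per DBH class are taken from Table 1 (WGCFNR plantation data).
counts = {1: 5, 2: 15, 3: 21, 4: 7, 5: 6}    # X = xi -> number of trees

n = sum(counts.values())                      # total number of trees
probs = {x: c / n for x, c in counts.items()} # proportion per class

for x, p in probs.items():
    print(f"class {x}: P = {p:.3f}")

# The probabilities of all mutually exclusive classes must sum to 1.
print(sum(probs.values()))
```

Plotting these proportions as a bar chart gives the histogram asked for in step (3).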
  • 14. Constructing a Histogram for Continuous Data: Equal Class Widths • Determine the frequency and relative frequency for each class. Mark the class boundaries on a horizontal measurement axis. • Above each class interval, draw a rectangle whose height is the corresponding relative frequency (or frequency). Unit 1 14
  • 15. Exercise 2: • A sample data set of a continuous random variable: the thickness (X; m) of an aquifer is measured along the horizontal distance (di; m) (Table 2). For the thickness, calculate the mean, variance, standard deviation and CV, and calculate and plot the histogram and cumulative distribution. Unit 1 15
  • 16. Exercise 2: Table 2 — aquifer thickness xi (m) at horizontal distance di (m).
    di  1–11 → xi: 56 57 55 54 49 43 37 36 39 37 41
    di 12–22 → xi: 41 36 33 40 44 53 53 54 51 48 54
    di 23–33 → xi: 63 65 63 63 53 50 50 54 49 43 43
    di 34–39 → xi: 47 47 50 53 61 61
    Unit 1 16
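The summary statistics requested in Exercise 2 can be sketched as follows; this is a minimal version using the 39 thickness values of Table 2 and the population formulas (use n − 1 in the variance for the sample version):

```python
# Exercise 2 sketch: summary statistics for the aquifer thickness data.
import math

x = [56, 57, 55, 54, 49, 43, 37, 36, 39, 37, 41,
     41, 36, 33, 40, 44, 53, 53, 54, 51, 48, 54,
     63, 65, 63, 63, 53, 50, 50, 54, 49, 43, 43,
     47, 47, 50, 53, 61, 61]

n = len(x)
mean = sum(x) / n
var = sum((v - mean) ** 2 for v in x) / n   # population variance
sd = math.sqrt(var)                          # standard deviation
cv = sd / mean                               # coefficient of variation

print(f"mean = {mean:.2f} m, var = {var:.2f}, sd = {sd:.2f} m, CV = {cv:.3f}")
```

The histogram and cumulative distribution follow by binning these values as in the histogram-construction slide.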
  • 17. pdf and cdf functions (curves) of a continuous random variable • Probability density function (pdf: fX(x)) • Cumulative distribution function (cdf: FX(x)) Unit 1 17
  • 18. Exercise 2: • For the sample set, a few other key statistics are also of interest: – mean (μ) – variance (σ2) – standard deviation (σ = √σ2) – coefficient of variation (CV = σ/μ) Unit 1 18
  • 19. Measures of Variability • Results vary from individual to individual, from group to group, from city to city, from moment to moment. Variation always exists in a data set, regardless of which characteristic you’re measuring, because not every individual will have the same exact value for every characteristic you measure. Unit 1 19
  • 20. Measures of Variability • Without a measure of variability you can’t compare two data sets effectively. – What if two sets of data have about the same average and the same median? – Does that mean that the data are all the same? • Not at all. Unit 1 20
  • 21. Measures of Variability • For example, the data sets 199, 200, 201, and 0, 200, 400 both have – the same average, • which is 200, – and the same median, • which is also 200. • Yet they have very different amounts of variability. • The first data set has a very small amount of variability compared to the second. Unit 1 21
  • 22. Measures of Variability • By far the most commonly used measure of variability is the standard deviation. • The standard deviation of a data set represents the typical distance from any point in the data set to the center. • It’s roughly the average distance from the center, and in this case, the center is the average. Unit 1 22
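The example above is easy to verify directly; a minimal sketch using the standard library (sample standard deviation via `statistics.stdev`):

```python
# Both data sets share mean 200 and median 200, yet their standard
# deviations differ enormously.
from statistics import mean, median, stdev

a = [199, 200, 201]
b = [0, 200, 400]

print(mean(a), mean(b))      # both 200
print(median(a), median(b))  # both 200
print(stdev(a), stdev(b))    # 1.0 vs 200.0
```

The identical centres and wildly different spreads make the point: centre alone does not characterize a data set.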
  • 23. Bivariate Analysis: Introduction • In the previous section, we looked at the statistical measures of a single random variable. However, correlation can often exist between two random variables. • For example, – the height and diameter of a tree are often correlated. – the elevation and temperature in most areas are often correlated. Unit 1 23
  • 24. Bivariate Analysis: Introduction • In this chapter you analyze two numerical variables, X and Y, to look for patterns, find the correlation, and make predictions about Y from X, if appropriate, using simple linear regression. Unit 1 24
  • 26. Bivariate Analysis: Introduction • In this case, the weight increases with increasing height, for which we say a positive correlation exists between the two variables. To investigate correlation, a scatter plot is often used, e.g., for each person, the height and weight are cross-plotted. Often, some sort of fit is attempted. Here, we see a linear function fitted to the scatter plot. Unit 1 26
  • 27. Bivariate Analysis: Introduction • However, to quantitatively evaluate correlation, a correlation coefficient (rXY) is often used: rXY = σXY / (σX σY), the covariance divided by the product of the two standard deviations. Unit 1 27
  • 28. Bivariate Analysis: Introduction • As defined previously, μX (or μY) is the mean of X (or Y) in its univariate distribution. • ρXY (or rXY) varies between −1 (perfect negative correlation: Y = −X) and +1 (perfect positive correlation: Y = X). • When rXY = 0, we say the two variables are not correlated. • In our example, rXY = 0.76, thus there is a certain amount of positive correlation between weight and height. Unit 1 28
  • 29. Correlation between two random variables • The correlation between two random variables is the cornerstone of geostatistics: – one random variable is a geological/hydrological/petrophysical property at one spatial location, – the second random variable can be the • (1) same property at a different location (auto-correlation studies; kriging); or, • (2) a different property at a different location (cross-correlation studies, co-kriging). Unit 1 29
  • 30. Bivariate Random Variables • The covariance between X and Y (σXY) measures how well the two variables track each other: when one goes up, how does the other behave on average? • The unit of covariance is the product of the unit of random variable X and the unit of random variable Y. The covariance of a random variable X with itself is equal to its variance: σXX = σX2. Unit 1 30
  • 31. Correlation • The correlation (or correlation coefficient) ρXY between X and Y is a dimension-less, normalized version of the covariance σXY: see next slide. Unit 1 31
  • 33. Covariance • An estimator of the covariance can be defined as: σ̂XY = (1/n) Σi (xi − x̄)(yi − ȳ). • If X and Y are independent, then they are uncorrelated and their covariance σXY (thus ρXY) is zero. • The covariance is best thought of as a measure of linear dependence. Unit 1 33
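The covariance estimator and its normalization into the correlation coefficient can be sketched together; the height/weight values below are made up purely for illustration (they are not the slide's data set):

```python
# Covariance and correlation estimators on a small hypothetical
# height/weight sample.
import math

x = [1.60, 1.65, 1.70, 1.75, 1.80]   # heights (m), made-up values
y = [55.0, 62.0, 66.0, 72.0, 80.0]   # weights (kg), made-up values

n = len(x)
mx, my = sum(x) / n, sum(y) / n

cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n)
sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)
r = cov / (sx * sy)                  # dimensionless, always in [-1, 1]

print(f"cov = {cov:.4f}, r = {r:.3f}")
```

Note that cov carries the units m·kg, while r is unitless: that is exactly the normalization the slide describes.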
  • 34. Multivariate Analysis • Linear combination of many random variables • Extending the bivariate arithmetic into multivariate analysis, we can get another host of relationships. Unit 1 34
  • 35. The idea of regression • The idea of regression is to build a model that estimates or predicts one quantitative variable (y) by using at least one other quantitative variable (x). Simple linear regression uses exactly one x variable to estimate the y variable. • Multiple linear regression, on the other hand, uses more than one x variable to estimate the value of y. Unit 1 35
  • 36. Discovering the uses of multiple regression • One situation in which multiple regression is useful is when the y variable is hard to track down; that is, its value can’t be measured straight up, and you need more than one other piece of information to help get a handle on what its value will be. Unit 1 37
  • 37. General form of the multiple regression model • The general idea of simple linear regression is to fit the best straight line through that data that you possibly can and use that line to make estimates for y based on certain x- values. The equation of the best-fitting line in simple linear regression is – y = b0 + b1x1 – where b0 is the y-intercept and b1 is the slope. • (The equation also has the form y = a +bx) Unit 1 39
  • 38. General form of the multiple regression model • In the multiple regression setting, you have more than one x variable that is related to y. – Call these x variables x1, x2, . . . xk. • In the most basic multiple regression model, you use some or all of these x variables to estimate y where each x variable is taken to the first power. This process is called finding the best-fitting linear function for the data. • This linear function looks like the following: – y = b0 + b1x1 + b2x2 + . . . + bkxk – and you can call it the multiple (linear) regression model. • You use this model to make estimates about y based on given values of the x variables. Unit 1 40
  • 39. General form of the multiple regression model • A linear function is an equation whose x terms are taken to the first power only. – For example y = 2x1 + 3x2 + 4x3 is a linear equation using three x variables. • If any of the x terms are squared, the function would be a quadratic one; • If an x term is taken to the third power, the function would be a cubic function, and so on. In this chapter, I consider only linear functions. Unit 1 41
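The multiple regression model above can be sketched as an ordinary least-squares fit; the data here are synthetic (y is generated exactly as 2 + 3·x1 − 1·x2, so the fit should recover those coefficients), not data from the course:

```python
# Fitting y = b0 + b1*x1 + b2*x2 by ordinary least squares.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 2 + 3 * x1 - 1 * x2              # synthetic response, no noise

# Design matrix: a column of ones for b0, then x1 and x2.
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # b = [b0, b1, b2]

print(b)
```

With noisy real data the recovered coefficients would only approximate the true ones, but the mechanics are identical.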
  • 40. Unit 2. Introduction to Geostatistics: Definition and history of geostatistics, advantages of geostatistics, geostatistics analysis requirements. Practical 2: Bivariate data analysis Unit 2 42
  • 41. What is geostatistics? • What is statistics? • What then is geo-statistics? Unit 2 43
  • 42. Comment • The term statistics has two common meanings, which we want to clearly separate: descriptive and inferential statistics. • But to understand the difference between descriptive and inferential statistics, we must first be clear on the difference between populations and samples. Unit 2 44
  • 43. Populations and samples • A population is a set of well-defined objects. – We must be able to say, for every object, if it is in the population or not. – We must be able, in principle, to find every individual of the population. • A geographic example of a population is all pixels in a multi-spectral satellite image. • A sample is some subset of a population. – We must be able to say, for every object in the population, if it is in the sample or not. – Sampling is the process of selecting a sample from a population. – Continuing the example, a sample from this population could be a set of pixels from known ground truth points. Unit 2 45
  • 44. What do we mean by statistics? • Two common use of the word: – Descriptive statistics: numerical summaries of samples; • (what was observed) – Inferential statistics: from samples to populations. • (what could have been or will be observed in a larger population) Unit 2 46
  • 45. A concise definition of inferential statistics • Statistics: The determination of the probable from the possible – . . . which implies the rigorous definition and then quantification of “probable". – Probable causes of past events or observations – Probable occurrence of future events or observations • This is a definition of inferential statistics: – Observations → Inferences Unit 2 47
  • 46. Why use statistical analysis? • Descriptive: we want to summarize some data in a shorter form • Inferential: We are trying to understand some process and maybe predict based on this understanding. • So we need to model it, i.e. make a conceptual or mathematical representation, from which we infer the process. • But how do we know if the model is “correct"? • Are we imagining relations where there are none? • Are there true relations we haven't found? – Statistical analysis gives us a way to quantify the confidence we can have in our inferences. Unit 2 48
  • 47. Comment • The most common example of geo-statistical inference is the prediction of some attribute at an unsampled point, based on some set of sampled points. • In the next slide we show an example from the Meuse river floodplain in the southern Netherlands. The copper (Cu) content of soil samples has been measured at 155 points (left figure); from this we can predict at all points in the area of interest (right figure). Unit 2 49
  • 49. What is geo-statistics? • Geostatistics is statistics on a population with known location, i.e. coordinates: – In one dimension (along a line or curve) – In two dimensions (in a map or image) – In three dimensions (in a volume) • The most common application of geostatistics is in 2D (maps). • Key point: Every observation (sample point) has both: – coordinates (where it is located); and – attributes (what it is). Unit 2 51
  • 50. Comment • Let's first look at a data set that is not geo-statistical. • It is a list of soil samples (without their locations) with the lead (Pb) concentration. The column Pb is the attribute of interest. Unit 2 52
  • 51. To check your understanding . . . • Q5 : Can we determine the mean, maximum, minimum and standard deviation of this set of samples? • Q6 : Can we make a map of the sample points with their Pb values? Unit 2 53
  • 52. Comment • Now we look at a data set that is geo-statistical (next slide). • These are soil samples taken in the Jura mountains of Switzerland, and their lead content; but this time with their coordinates. • The columns E and N are the coordinates, i.e. the spatial reference; the column Pb is the attribute. • First let's look at the tabular form: Unit 2 54
  • 54. To check your understanding . . . • Q7 : Comparing this to the non-geostatistical list of soil samples and their lead contents (above), what new information is added here? Unit 2 56
  • 55. Comment • On the figure (next slide) you will see: – A coordinate system (shown by the over-printed grid lines) – The locations of 256 sample points - where a soil sample was taken – The attribute value at each sample point - symbolized by the relative size of the symbol at each point - in this case the amount of lead (Pb) in the soil sample Unit 2 57
  • 57. To check your understanding . . . • Q8 : In the figure, how can you determine the coordinates of each sample point? • Q9 : What are the coordinates of the sample point displayed as a red symbol? • Q10 : What is the mathematical origin (in the sense of Cartesian or analytic geometry) of this coordinate system? • Q11 : How could these coordinates be related to some common system such as UTM? Unit 2 59
  • 58. To check your understanding . . . • Q12 : Suppose we have a satellite image that has not been geo-referenced. Can we speak of geostatistics on the pixel values? • Q13 : In this case, what are the coordinates and what are the attributes? • Q14 : Suppose now the image has been geo-referenced. What are now the coordinates? Unit 2 60
  • 59. Geostatistics requirements • The location of a sample is an intrinsic part of its definition. • All data sets from a given area are implicitly related by their coordinates – So they can be displayed and related in a GIS • Values at sample points cannot be assumed to be independent: there is often evidence that nearby points tend to have similar values of attributes. • That is, there may be a spatial structure to the data – Classical statistics assumes independence of samples – But, if there is spatial structure, this is not true! – This has major implications for sampling design and statistical inference • Data values may be related to their coordinates → spatial trend Unit 2 61
  • 60. Feature and geographic spaces • The word space is used in mathematics to refer to any set of variables that form metric axes and which therefore allow us to compute a distance between points in that space. – If these variables represent geographic coordinates, we have a geographic space. – If these variables represent attributes, we have a feature space. Unit 2 62
  • 61. Comment • You are probably quite familiar with feature space from your study of non-spatial statistics. • Even with one variable, we have a unit of measure; this forms a 1D or univariate feature space. • Most common are two variables which we want to relate with correlation or regression analysis; this is a bivariate feature space. • In multivariate analysis the feature space has more than two dimensions. Unit 2 63
  • 62. Comment • Multivariate feature spaces can have many dimensions; we can only see three at a time. Unit 2 64
  • 63. Comment • So, feature space is perhaps a new term but not a new concept if you've followed a statistics course with – univariate, bivariate and multivariate analysis. • What then is geographic space? Simply put, it is a mathematical space where the axes are map coordinates that relate points to some reference location on or in the Earth (or another physical body). • These coordinates are often in some geographic coordinate system that was designed to give each location on (part of) the Earth a unique identification; a common example is the Universal Transverse Mercator (UTM) grid. • However, a local coordinate system can be used, as long as there is a clear relation between locations and coordinates. Unit 2 65
  • 64. Geographic space • Axes are 1D lines; they almost always have the same units of measure (e.g. metres, kilometres . . . ) – One-dimensional: coordinates are on a line with respect to some origin. – Two-dimensional: coordinates are on a grid with respect to some origin. – Three-dimensional: coordinates are grid and elevation from a reference elevation • Note: latitude-longitude coordinates do not have equal distances in the two dimensions; they should be transformed to metric (grid) coordinates for geo-statistical analysis. Unit 2 66
  • 65. Interpolation • Interpolation is based on the assumption that spatially distributed objects are spatially correlated; in other words, things that are close together tend to have similar characteristics. • For instance, if it is raining on one side of the street, you can predict with a high level of confidence that it is also raining on the other side of the street. • You would be less sure if it was raining across town and less confident still about the state of the weather in the neighbouring province. 68
  • 66. What is a spatial interpolation? • Interpolation predicts values for cells in a raster from a limited number of sample data points. It can be used to predict unknown values for any geographic point data: elevation, rainfall, chemical concentrations, noise levels, and so on. 69
  • 67. 70 On the left is a point dataset of known values. On the right is a raster interpolated from these points. Unknown values are predicted with a mathematical formula that uses the values of nearby known points.
  • 69. Unit 3: non-geostatistical spatial analysis 72 [Figure: sample points for copper, plotted on E/N axes — what is the value at point p?]
  • 70. Interpolation methods (to be discussed in the class) • Natural neighbour • Local simple mean (average method) • Polygon • Triangulation • Inverse Distance Method • Polynomial equation • Spline 73
  • 71. Natural neighbour • Natural Neighbor interpolation finds the closest subset of input samples to a query point and applies weights to them based on proportionate areas to interpolate a value (Sibson, 1981). It is also known as Sibson or "area-stealing" interpolation. 74 Unit 3: non-geostatistical spatial analysis Interpolation methods
  • 78. IDW • The IDW (Inverse Distance Weighted) tool uses a method of interpolation that estimates cell values by averaging the values of sample data points in the neighborhood of each processing cell. The closer a point is to the center of the cell being estimated, the more influence, or weight, it has in the averaging process. 81 Interpolation methods
  • 80. Spline • The Spline tool uses an interpolation method that estimates values using a mathematical function that minimizes overall surface curvature, resulting in a smooth surface that passes exactly through the input points. 83 Interpolation methods
  • 86. More explanation on: point estimation • For each of the point estimation methods we describe in the following sections, we will show the details of the estimation of the V value at 65E,137N. • No sweeping conclusions should be drawn from this single example; it is presented only to provide a familiar common thread through our presentation of the various methods. 89 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 87. Distances to sample values in the vicinity of 65E,137N 90 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 88. Point estimation methods • The values at sample locations near 65E,137N are shown in the figure on the next slide and listed in the previous table. – The variability of these nearby sample values presents a challenge for estimation. – Values range from 227 to 791 ppm; • The estimated value, therefore, can cover quite a broad range depending on how we choose to weight the individual values. • In the following sections we will look at four quite different point estimation methods. 91 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 89. Unit 3: non-geostatistical spatial analysis 92 [Figure: the goal is to estimate the value of V at the point 65E,137N, located by the arrow, from the surrounding seven V data values.]
  • 90. The method of triangulation (1/9) • Triangulation estimates by fitting a plane through three samples that surround the point being estimated. The equation of a plane can be expressed generally as z = ax + by + c. • In our example, where we are trying to estimate V values using coordinate information, z is the V value, x is the easting, and y is the northing. • Given the coordinates and the V value of three nearby samples, we can calculate the coefficients a, b and c by solving the following system of equations: aX1 + bY1 + c = Z1; aX2 + bY2 + c = Z2; aX3 + bY3 + c = Z3. 93 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 91. The method of triangulation (2/9) • From the figure we can find three samples that nicely surround the point being estimated: the 696 ppm, the 227 ppm, and the 606 ppm samples. • Using the data for these three samples, the set of equations we need to solve is 63a + 140b + c = 696; 64a + 129b + c = 227; 71a + 140b + c = 606. 94 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 92. The method of triangulation (3/9) • The solution to these three simultaneous equations is a = −11.250, b = 41.614, c = −4421.159 • which gives us the following equation as our triangulation estimator: V = −11.250x + 41.614y − 4421.159 • This is the equation of the plane that passes through the three nearby samples we have chosen. • Using this equation we can now estimate the value at any location simply by substituting the appropriate easting and northing. Substituting the coordinates x = 65 and y = 137 into our equation gives us an estimate of 548.7 ppm at the location 65E,137N. 95 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
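The plane-fitting step above is easy to reproduce numerically; a minimal check using the three samples and the target location from the text:

```python
# Solve for the plane z = ax + by + c through the three samples
# (63,140,696), (64,129,227), (71,140,606) and evaluate at 65E,137N.
import numpy as np

A = np.array([[63.0, 140.0, 1.0],
              [64.0, 129.0, 1.0],
              [71.0, 140.0, 1.0]])
z = np.array([696.0, 227.0, 606.0])

a, b, c = np.linalg.solve(A, z)      # coefficients of the plane
v = a * 65 + b * 137 + c             # triangulation estimate at 65E,137N

print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}, V(65,137)={v:.1f} ppm")
```

The solver returns a = −11.250, b = 41.614, c = −4421.159 and an estimate of about 548.7 ppm, matching the slide.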
  • 93. 96 The figure shows the contours of the estimated V values that this equation produces. The method of triangulation (4/9) Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 94. 97 The method of triangulation (5/9) Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 95. 98 The method of triangulation (6/9) Interpolation methods using samples
  • 96. 99 The method of triangulation (7/9) Interpolation methods using samples
  • 97. 100 The method of triangulation (8/9) Interpolation methods using samples
  • 98. 101 The method of triangulation (9a/9) Interpolation methods using samples
  • 99. 102 The method of triangulation (9b/9) Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 100. Local sample mean 103 • The mean of the seven nearby samples shown in the figure is 603.7 ppm. This estimate is much higher than the triangulation estimate. • The two samples with V values greater than 750 ppm in the eastern half of the figure receive more than 25% of the total weight and therefore have a considerable influence on our estimated value. Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
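The local sample mean is the simplest estimator of all: every sample gets the same weight, 1/7 here. A one-line check with the seven V values from the example:

```python
# Unweighted average of the seven V values surrounding 65E,137N.
from statistics import mean

v = [477, 696, 227, 606, 791, 783, 646]   # nearby samples (ppm)
estimate = mean(v)
print(round(estimate, 1))                 # 603.7 ppm
```

Note that the two samples above 750 ppm together carry 2/7 ≈ 28.6% of the weight, which is the "more than 25%" influence the slide refers to.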
  • 101. Inverse Distance Methods (1/6) • One obvious way to do this is to make the weight for each sample inversely proportional to its distance from the point being estimated: v̂ = (Σi Vi/di) / (Σi 1/di) 104 • d1, . . . , dn are the distances from each of the n sample locations to the point being estimated and V1, . . . , Vn are the sample values. Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 102. Inverse distance weighting calculations for sample values in the vicinity of 65E,137N 105 Inverse Distance Methods (2/6) Unit 3: non-geostatistical spatial analysis
  • 103. Inverse Distance Methods (3/6) • The nearest sample, the 696 ppm sample at 63E,140N, receives about 26% of the total weight, while the farthest sample, the 783 ppm sample at 75E,128N, receives less than 7%. A good example of the effect of the inverse distance weighting can be found in a comparison of the weights given to the 477 ppm sample and the 791 ppm sample. • The 791 ppm sample at 73E,141N is about twice as far away from the point we are trying to estimate as the 477 ppm sample at 61E,139N; the 791 ppm sample therefore receives about half the weight of the 477 ppm sample. • Using the weights given in the previous table, our inverse distance estimate of the V value at 65E,137N is 594 ppm. 106 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 104. Inverse Distance Methods (4/6) • The inverse distance estimator given in the previous equation can easily be adapted to include a broad range of estimates. Rather than using weights that are inversely proportional to the distance, we can make the weights inversely proportional to any power of the distance: v̂ = (Σi Vi/di^p) / (Σi 1/di^p) 107 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
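A generic sketch of the inverse-distance estimator with exponent p; the sample points here are made up for illustration (they are not the 65E,137N data set), and p = 1 gives the plain inverse-distance estimate while p = 2 gives the popular inverse-distance-squared variant:

```python
def idw(samples, x0, y0, p=1.0):
    """samples: list of (x, y, value); returns the IDW estimate at (x0, y0)."""
    num = den = 0.0
    for x, y, v in samples:
        d = ((x - x0) ** 2 + (y - y0) ** 2) ** 0.5
        if d == 0:                 # exactly at a sample: return its value
            return v
        w = 1.0 / d ** p           # weight inversely proportional to d^p
        num += w * v
        den += w
    return num / den

# Two hypothetical samples at distances 1 and 2 from the origin.
pts = [(0.0, 1.0, 10.0), (0.0, 2.0, 20.0)]
print(idw(pts, 0.0, 0.0, p=1))     # (10/1 + 20/2) / (1 + 1/2)
print(idw(pts, 0.0, 0.0, p=2))     # (10/1 + 20/4) / (1 + 1/4)
```

Raising p pulls the estimate toward the nearest sample; lowering it toward the plain average, which is the behaviour the next slides illustrate.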
  • 105. Inverse Distance Methods (5/6) 108 The effect of the inverse distance exponent on the sample weights and on the V estimate. Interpolation methods using samples
  • 106. Inverse Distance Methods (6/6) • Different choices of the exponent p will result in different estimates. 109 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 107. Search Neighbourhoods • For the case studies we perform in this chapter, we use a circular search neighbourhood with a radius of 25 m. • All samples that fall within 25 m of the point we are estimating will be included in the estimation procedure. 110 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 108. The First Law of Geography Tobler’s Law: • The central tenet of Geography is that location matters for understanding a wide variety of phenomena. • Everything is related to everything else, but things that are closer together are more related to each other than those that are further apart Unit 3: The First Law of Geography 111
  • 109. Geographers’ Perspectives on the World • Location matters – Real-world relationships – Horizontal connections between places – Importance of scale (both in time and space) Unit 3: The First Law of Geography 112
  • 110. Geographic Information • Includes knowledge about where something is • Includes knowledge about what is at a given location • Can be very detailed: – e.g. the locations of all buildings in a city or the – locations of all trees in a forest stand • Or it can be very coarse: – e.g. the population density of an entire country or the global sea surface temperature distribution • There is always a spatial component associated with geographic information Unit 3: The First Law of Geography 113
  • 118. Practical 4 to 5: Exploratory data analysis • Non-geostatistical interpolation (ArcGIS and/or QGIS) – Inverse distance – Closest point – Moving average – Least square polynomial – Spline – Triangulation • Individual report (short report) – What are the required input (data and parameters) – What are the outputs – Compare the different • methods and/or parameters 122
  • 120. 4. Characterizing spatial process • Covariance, • Correlation and variogram. • Understanding and measure of similarity between different data. 124 Unit 4: Characterizing spatial process
  • 121. variogram • The most common way to visualize local spatial dependence is the variogram, also called (for historical reasons) the semivariogram. • To understand this, we have to first define the semivariance as a mathematical measure of the difference between the two points in a point-pair. 125 Unit 4: Characterizing spatial process
  • 122. 126 The semi-variogram is based on modelling the (squared) differences in the z-values as a function of the distances between all of the known points. Unit 4: Characterizing spatial process
  • 123. Semivariance • This is a mathematical measure of the difference between the two points in a point-pair. It is expressed as a squared difference so that the order of the points doesn't matter (i.e. subtraction in either direction gives the same result). Each pair of observation points has a semivariance, usually represented by the Greek letter γ ('gamma'), and defined as: γ(xi, xj) = ½ [z(xi) − z(xj)]² • where x is a geographic point and z(x) is its attribute value. • (Note: The 'semi' refers to the factor 1/2, because there are two ways to compute the difference for the same point pair.) • So, the semivariance between two points is half the squared difference between their values. If the values are similar, the semivariance will be small. 127 Unit 4: Characterizing spatial process
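The definition above can be sketched in a few lines of Python (a minimal illustration, not tied to any geostatistics library):

```python
def semivariance(z1, z2):
    """Half the squared difference between two attribute values.

    Squaring makes the order of the pair irrelevant:
    semivariance(a, b) == semivariance(b, a).
    """
    return 0.5 * (z1 - z2) ** 2
```

Similar values give a small semivariance, identical values give zero.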
  • 125. Point pair • Now we know two things about a point-pair: 1. The distance between them in geographic space; 2. The semivariance between them in attribute space. • So . . . it seems natural to see if points that are `close by' in geographical space are also `close by' in attribute space. • This would be evidence of local spatial dependence. 129 Unit 4: Characterizing spatial process
  • 126. The variogram cloud • This is a graph showing semivariances between all point-pairs: – X-axis: The separation distance within the point-pair – Y-axis: The semivariance • Advantage: Shows the comparison between all point-pairs as a function of their separation; • Advantage: Shows which point-pairs do not fit the general pattern • Disadvantage: too many graph points, hard to interpret 130 Unit 4: Characterizing spatial process
  • 129. variogram cloud • Clearly, the variogram cloud gives too much information. If there is a relation between separation and semi-variance, it is hard to see. • The usual way to visualize this is by grouping the point-pairs into lags or bins according to some separation range, and computing some representative semi-variance for the entire lag. • Often this is the arithmetic average, but not always. 133 Unit 4: Characterizing spatial process
  • 130. 134 Origins • Involves a set of statistical techniques called Kriging (there are a number of different Kriging methods) • Kriging is named after Danie Gerhardus Krige, a South African mining engineer who presented the ideas in his master's thesis in 1951. These ideas were later formalized by a prominent French mathematician, Georges Matheron • For more information, see: – Krige, Danie G. (1951). "A statistical approach to some basic mine valuation problems on the Witwatersrand". J. of the Chem., Metal. and Mining Soc. of South Africa 52 (6): 119–139. – Matheron, Georges (1962). Traité de géostatistique appliquée, Editions Technip, France • Kriging has two parts: the quantification of the spatial structure in the data (called variography) and prediction of values at unknown points Source of this information: http://en.wikipedia.org/wiki/Daniel_Gerhardus_Krige
  • 131. 135 Motivating Example: Ordinary Kriging • Imagine we have data on the concentration of gold (denote it by Y) in western Pennsylvania at a set of 200 sample locations (call them points p1…p200). • Since Y has a meaningful value at every point, our goal is to create a prediction surface for the entire region using these sample points • Notation: In this western PA region, Y(p) will denote the concentration level of gold at any point p.
  • 132. 136 Global and Local Structure • Without any a priori knowledge about the distribution of gold in Western PA, we have no theoretical reason to expect to find different concentrations of gold at different locations in that region. – I.e., theoretically, the expected value of gold concentration should not vary with latitude and longitude – In other words, we would expect that there is some general, average, value of gold concentration (called global structure) that is constant throughout the region (even though we assume it’s constant, we do not know what its value is) • Of course, when we look at the data, we see that there is some variability in the gold concentrations at different points. We can consider this to be a local deviation from the overall global structure, known as the local structure or residual or error term. • In other words, geostatisticians would decompose the value of gold Y(p) into the global structure μ(p) and local structure ε(p). • Y(p) = μ(p) + ε(p)
  • 133. 137 ε(p) • As per the First Law of Geography, the local structures ε(p) of nearby observations will often be correlated. That is, there is still some meaningful information (i.e., spatial dependencies) that can be extracted from the spatially dependent component of the residuals. • So, our ordinary kriging model will: – Estimate this constant but unknown global structure μ(p), and – Incorporate the dependencies among the residuals ε(p). Doing so will enable us to create a continuous surface of gold concentration in western PA.
  • 134. 138 Assumptions of Ordinary Kriging • For the sake of the methods that we will be employing, we need to make some assumptions: – Y(p) should be normally distributed – The global structure μ(p) is constant and unknown (as in the gold example) – Covariance between values of ε depends only on distance between the points, • To put it more formally, for each distance h and each pair of locations p and t within the region of interest that are h units apart, there exists a common covariance value, C(h), such that covariance [ε(p), ε(t)] = C(h). • This is called isotropy
  • 135. 139 Covariance and Distance • From the First Law of Geography it would then follow that as distance between points increases, the similarity (i.e., covariance or correlation) between the values at these points decreases • If we plot this out, with inter-point distance h on the x-axis, and covariance C(h) on the y-axis, we get a graph that looks something like the one below. This representation of covariance as a function of distance is called the covariogram • Alternatively, we can plot correlation against distance (the correlogram)
  • 136. 140 Covariograms and Weights • Geostatistical methods incorporate this covariance–distance relationship into the interpolation models – More specifically, this information is used to calculate the weights – Like IDW, kriging is a weighted average of points in the vicinity • Recall that in IDW, in order to predict the value at an unknown point, we assume that nearer points will have higher weights (i.e., weights are determined based on distance) • In geostatistical techniques, we calculate the distances between the unknown point at which we want to make a prediction and the measured points nearby, and use the value of the covariogram for those distances to calculate the weight of each of these surrounding measured points. – I.e., the weight of a point h units away will depend on the value of C(h)
  • 137. 141 But… • Unfortunately, it so happens that one generally cannot estimate covariograms and correlograms directly • For that purpose, a related function of distance (h) called the semi-variogram (or simply the variogram) is calculated – The variogram is denoted by γ(h) – One can easily obtain the covariogram from the variogram (but not the other way around) • Covariograms and variograms tell us the spatial structure of the data Covariogram C(h) Variogram γ(h)
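For a process with a finite sill, the two functions are linked by γ(h) = C(0) − C(h): the semivariance at lag h is the covariance "lost" over that distance. A tiny sketch (the exponential covariance used here is purely illustrative):

```python
import math

def variogram_from_covariogram(C, h):
    """gamma(h) = C(0) - C(h): dissimilarity equals the covariance lost at lag h."""
    return C(0) - C(h)

# Illustrative covariance decaying exponentially from a sill of 10
cov = lambda h: 10.0 * math.exp(-h)
```

At h = 0 the variogram is 0; as h grows it approaches the sill of 10. Moving in the other direction (variogram to covariogram) requires knowing the sill, which is why it only works when one exists.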
  • 138. 142 Interpretation of Variograms • As mentioned earlier, a covariogram might be thought of as covariance (i.e., similarity) between point values as a function of distance, such that C(h) is greater at smaller distances • A variogram, on the other hand, might be thought of as "dissimilarity between point values as a function of distance", such that the dissimilarity is greater for points that are farther apart • Variograms are usually interpreted in terms of the corresponding covariograms or correlograms • A common mistake when interpreting variograms is to say that variance increases with distance; what increases is the dissimilarity between pairs of point values, not the variance of the variable itself. Covariogram C(h) Variogram γ(h)
  • 139. 143 • When there are n points, the number of inter-point distances is equal to n(n−1)/2 • Example: – With 15 points, we have 15(15−1)/2 = 105 inter-point distances (marked in yellow on the grid in the lower left) – Since we're using Euclidean distance, the distance between points 1 and 2 is the same as the distance between points 2 and 1, so we count it only once. Also, the distance between a point and itself will always be zero, and is of no interest here. • The maximum distance h on a covariogram or variogram is called the bandwidth, and should equal half the maximum inter-point distance. – In the figure on the lower right, the blue line connects the points that are the farthest away from each other. The bandwidth in this example would then equal half the length of the blue line Bandwidth (The Maximum Value of h)
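The pair count and the bandwidth can be checked with a short sketch (the four sample coordinates below are hypothetical):

```python
import itertools
import math

points = [(0, 0), (3, 4), (6, 0), (0, 8)]  # hypothetical sample coordinates

# All unordered point pairs: there are n(n-1)/2 of them
pairs = list(itertools.combinations(points, 2))

distances = [math.dist(p, q) for p, q in pairs]

# Bandwidth: half the maximum inter-point distance
bandwidth = max(distances) / 2
```

With n = 4 points there are 4·3/2 = 6 pairs; the longest separation here is between (6, 0) and (0, 8), giving a bandwidth of 5.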
  • 140. 144 Mathematical definition of a variogram γ(h) = ½ · average{ [Y(i) − Y(j)]² } over all point pairs i, j separated by distance h • In other words, for each distance h between 0 and the bandwidth – Find all pairs of points i and j that are separated by that distance h – For each such point pair, subtract the value of Y at point j from the value of Y at point i, and square the difference – Average these squared differences across all point pairs and divide the average by 2. That's your variogram value! • The division by 2 is the reason for the occasionally used name semi-variogram • However, in practice, there will generally be only one pair of points that are exactly h units apart, unless we're dealing with regularly spaced samples. Therefore, we create "bins", or distance ranges, into which we place point pairs with similar distances, and estimate γ only for midpoints of these bins rather than at all individual distances. – These bins are generally of the same size – It's a rule of thumb to have at least 30 point pairs per bin • We call these estimates of γ(h) at the bin midpoints the empirical variogram
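The binned estimate described above can be sketched directly from the definition (a minimal stdlib-only illustration; real work would use a package such as gstat or scikit-gstat):

```python
import math
from collections import defaultdict

def empirical_variogram(coords, values, bin_width, max_lag):
    """Average semivariance per distance bin (Matheron-style estimator)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):          # each unordered pair once
            h = math.dist(coords[i], coords[j])
            if h > max_lag:                # ignore separations beyond the bandwidth
                continue
            b = int(h // bin_width)        # which bin this pair falls into
            sums[b] += 0.5 * (values[i] - values[j]) ** 2
            counts[b] += 1
    # bin midpoint -> (average semivariance, number of point pairs)
    return {(b + 0.5) * bin_width: (sums[b] / counts[b], counts[b])
            for b in sorted(counts)}
```

For three collinear points at x = 0, 1, 2 with values 0, 1, 2, the two lag-1 pairs average to γ = 0.5 and the single lag-2 pair gives γ = 2.0.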
  • 141. 145 Fitting a Variogram Model • Now, we're going to fit a variogram model (i.e., curve) to the empirical variogram • That is, based on the shape of the empirical variogram, different variogram curves might be fit • The curve fitting generally employs the method of least squares – the same method that's used in regression analysis A very comprehensive guide on variography by Dr. Tony Smith (University of Pennsylvania) http://www.seas.upenn.edu/~ese502/NOTEBOOK/Part_II/4_Variograms.pdf
  • 142. 146 The Variogram Parameters • The variogram models are a function of three parameters, known as the range, the sill, and the nugget. – The range r is typically the value of h at which the correlation between point values reaches zero (i.e., there is no longer any spatial autocorrelation) – The value of γ at r is called the sill, and is generally denoted by s • The variance of the sample is used as an estimate of the sill – Different models have slightly different definitions of these parameters – The nugget deserves a slide of its own Graph taken from: http://www.geog.ubc.ca/courses/geog570/talks_2001/Variogr1neu.gif
  • 143. 147 Spatial Independence at Small Distances • Even though we assume that values at points that are very near each other are correlated, points that are separated by very, very small distances might be considerably less correlated – E.g.: you might find a gold nugget and no more gold in the vicinity • In other words, even though γ(0) is always 0, γ at very, very small distances will be equal to a value a that is considerably greater than 0. • This value a is called the nugget • The ratio of the nugget to the sill is known as the nugget effect, and may be interpreted as the percentage of variation in the data that is not spatial • The difference between the sill and the nugget is known as the partial sill – The partial sill, and not the sill itself, is reported in GeoStatistical Analyst
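The two derived quantities are simple arithmetic on the fitted parameters; a one-line sketch of each:

```python
def nugget_effect(nugget, sill):
    """Fraction of total variation that is not spatially structured."""
    return nugget / sill

def partial_sill(nugget, sill):
    """The spatially structured part of the variance: sill minus nugget."""
    return sill - nugget
```

For example, a nugget of 2 against a sill of 10 means 20% of the variation is non-spatial and the partial sill is 8.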
  • 144. 148 Pure Nugget Effect Variograms • Pure nugget effect is when the covariance between point values is zero at all distances h • That is, there is absolutely no spatial autocorrelation in the data (even at small distances) • Pure nugget effect covariogram and variogram are presented below • Interpolation won't give reasonable predictions • Most cases are not as extreme and have both a spatially dependent and a spatially independent component, regardless of variogram model chosen (discussed on following slides)
  • 145. 149 The Spherical Model • The spherical model is the most widely used variogram model • Monotonically non-decreasing – I.e., as h increases, γ(h) never decreases: it increases while h ≤ r and stays constant for h > r • γ(h≥r) = s and C(h≥r) = 0 – That is, covariance is assumed to be exactly zero at distances h ≥ r
  • 146. 150 The Exponential Model • The exponential variogram looks very similar to the spherical model, but assumes that the correlation never reaches exactly zero, regardless of how great the distances between points are • In other words, the variogram approaches the value of the sill asymptotically • Because the sill is never actually reached, the range is generally considered to be the smallest distance after which the covariance is 5% or less of the maximum covariance • The model is monotonically increasing – I.e., as h goes up, so does γ(h)
  • 147. 151 The Wave (AKA Hole-Effect) Model On the picture to the left, the waves exhibit a periodic pattern. A non-standard form of spatial autocorrelation applies. Peaks are similar in values to other peaks, and troughs are similar in values to other troughs. However, note the dampening in the covariogram and variogram below: peaks that are closer together have values that are more correlated than peaks that are farther apart (and the same holds for troughs). More is said about the applicability of these models in http://www.gaa.org.au/pdf/gaa_pyrcz_deutsch.pdf Variogram graph edited slightly from: http://www.seas.upenn.edu/~ese502/NOTEBOOK/Part_II/4_Variograms.pdf
  • 151. The empirical variogram • To summarize the variogram cloud, group the separations into lags (separation bins, like a histogram) • Then, compute the average semivariance of all the point-pairs in the bin. • This is the empirical variogram, as the so-called Matheron estimator: γ̂(h) = 1 / (2 m(h)) · Σ i=1..m(h) [z(xi) − z(xi + h)]² – m(h) is the number of point pairs separated by vector h, in practice some range (bin) – These are indexed by i; the notation z(xi + h) means the "tail" of point-pair i, i.e. separated from the "head" xi by the separation vector h. 155 Unit 5: Variogram Modeling/analysis:
  • 153. Defining the bins • There are some practical considerations, just like defining bins for a histogram: – Each bin should have enough points to give a robust estimate of the representative semi-variance; otherwise the variogram is erratic; – If a bin is too wide, the theoretical variogram model will be hard to estimate and fit; note we haven't seen this yet, it is in the next lecture; – The largest separation should not exceed half the longest separation in the dataset; – In general the largest separation should be somewhat shorter, since it is the local spatial dependence which is most interesting. • All computer programs that compute variograms use some defaults for the largest separation and number of bins; gstat uses 1/3 of the longest separation, and divides this into 15 equal-width bins. 157 Unit 5: Variogram Modeling/analysis:
  • 154. Numerical example of an empirical variogram • Here is an empirical variogram of log10Pb from the Jura soil samples; for simplicity the maximum separation was set to 1.5 km: – np are the number of point-pairs in the bin; dist is the average separation of these pairs; gamma is the average semivariance in the bin. 158 Unit 5: Variogram Modeling/analysis:
  • 156. Plotting the empirical variogram • This can be plotted as semivariance gamma against average separation dist, along with the number of points that contributed to each estimate np. 160 Unit 5: Variogram Modeling/analysis:
  • 159. Features of the empirical variogram • Later we will look at fitting a theoretical model to the empirical variogram; but even without a model we can notice some features which characterize the spatial dependence, which we define here only qualitatively: – Sill: maximum semi-variance • represents variability in the absence of spatial dependence – Range: separation between point-pairs at which the sill is reached • distance at which there is no evidence of spatial dependence – Nugget: semi-variance as the separation approaches zero • represents variability at a point that can't be explained by spatial structure 163 Unit 5: Variogram Modeling/analysis:
  • 160. Semivariogram (figure: semivariance vs. lag in m, with the sill, range, and nugget annotated) Unit 5: Variogram Modeling/analysis:
  • 161. Semivariogram (figure: the same plot, split into its spatially dependent and spatially independent components) Unit 5: Variogram Modeling/analysis:
  • 162. Semivariogram uses • Use the range to determine maximum sampling distances • The sill indicates intra-field variability • The model can be used for interpolation of values in unsampled areas Unit 5: Variogram Modeling/analysis:
  • 165. Effect of bin width • The same set of points can be displayed with many bin widths • This has the same effect as different bin widths in a univariate histogram: same data, different visualization • In addition, visual and especially automatic variogram fitting is affected • Wider (fewer) bins → less detail, also less noise • Narrower (more) bins → more detail, but also more noise • General rule: – as narrow as possible (detail) without "too much" noise; – and with sufficient point-pairs per bin (> 100, preferably > 200) 169 Unit 5: Variogram Modeling/analysis:
  • 167. Evidence of spatial dependence • The empirical variogram provides evidence that there is local spatial dependence. • The variability between point-pairs is lower if they are closer to each other; i.e. the separation is small. • There is some distance, the range, where this effect is noted; beyond the range there is no dependence. • The relative magnitudes of the total sill and nugget give the strength of the local spatial dependence; the nugget represents completely unexplained variability. • There are of course variables for which there is no spatial dependence, in which case the empirical variogram has the sill equal to the nugget; this is called a pure nugget effect • The next graph shows an example. 171 Unit 5: Variogram Modeling/analysis:
  • 169. Visualizing anisotropy • Anisotropy • Variogram surfaces • Directional variograms 173 Unit 5: Variogram Modeling/analysis:
  • 170. What? • We have been considering spatial dependence as if it is the same in all directions from a point (isotropic or omnidirectional). • For example, if I want to know the weather at a point where there is no station, I can equally consider stations at some distance from my location, no matter whether they are N, S, E or W. • But this is self-evidently not always true! In this example, suppose the winds almost always blow from the North. Then the temperatures recorded at stations 100 km to the N or S of me will likely be closer to the temperature at my station than temperatures recorded at stations 100 km to the E or W. • We now see how to detect anisotropy. 174 Unit 5: Variogram Modeling/analysis:
  • 171. Anisotropy • Greek "iso" + "tropic" = English "same" + "trend"; Greek "an-" = English "not-" • Variation may depend on direction, not just distance • This is why we refer to the separation vector; up till now this has just meant distance, but now it includes direction – Case 1: same sill, different ranges in different directions (geometric, also called affine, anisotropy) – Case 2: same range, sill varies with direction (zonal anisotropy) 175 Unit 5: Variogram Modeling/analysis:
  • 172. Spatial trends • Isotropic – trend is a function of distance from a known (sampled) point only • Anisotropic – trend is a function of both distance and direction from a known point Unit 5: Variogram Modeling/analysis:
  • 173. How can anisotropy arise? • Directional process – Example: sand content in a narrow flood plain: much greater spatial dependence along the axis parallel to the river – Example: population density in a hilly terrain with long, linear valleys • Note that the nugget must logically be isotropic: it is variation at a point (which has no direction) 177 Unit 5: Variogram Modeling/analysis:
  • 174. How do we detect anisotropy? 1. Looking for directional patterns in the post-plot; 2. With a variogram surface, sometimes called a variogram map; 3. Computing directional variograms, where we only consider points separated by a given distance but also in a given horizontal direction from each other. • We can compute different directional variograms and see if they have different structure. 178 Unit 5: Variogram Modeling/analysis:
  • 175. Detecting anisotropy with a variogram surface • One way to see anisotropy is with a variogram surface, sometimes called a variogram map. • This is not a map! but rather a plot of semivariances vs. distance and direction (the separation vector) • Each grid cell shows the semivariance at a given distance and direction separation (lag) • Symmetric by definition, can be read in either direction • A transect from the origin to the margin gives a directional variogram (next visualization technique) 179 Unit 5: Variogram Modeling/analysis:
  • 180. 184 Reviewing Ordinary Kriging • Again, ordinary kriging will: – Give us an estimate of the constant but unknown global structure μ(p), and – Use variography to examine the dependencies among the residuals ε(p) and to create kriging weights. • We calculate the distances between the unknown point at which we want to make a prediction and the measured points that are nearby and use the value of the covariogram for those distances to calculate the weight of each of these surrounding measured points. • The end result is, of course, a continuous prediction surface • Prediction standard errors can also be obtained – this is a surface indicating the accuracy of prediction
  • 181. 185 Universal Kriging • Now, take another example: imagine we have data on the temperature at 100 different weather stations (call them w1..w100) throughout Florida, and we want to predict the values of temperature (T) at every point w in the entire state using these data. • Notation: temperature at point w is denoted by T(w) • We know that temperatures at lower latitudes are expected to be higher. So, T(w) will be expected to vary with latitude – Ordinary kriging is not appropriate here, because it assumes that the global structure is the same everywhere. This is clearly not the case here. – A method called universal kriging allows for a non-constant global structure • We might model the global structure μ as in regression: μ(w) = β0 + β1 · latitude(w) • Everything else in universal kriging is pretty much the same as in ordinary kriging (e.g., variography)
  • 182. 186 Some More Advanced Techniques • Indicator Kriging is a geostatistical interpolation method that does not require the data to be normally distributed. • Co-kriging is an interpolation technique that is used when there is a second variable that is strongly correlated with the variable from which we're trying to create a surface, and which is sampled at the same set of locations as our variable of interest and at a number of additional locations. • For more details on indicator kriging and co-kriging, see one of the texts suggested at the end of this presentation
  • 183. 187 Isotropy vs. Anisotropy • When we use isotropic (or omnidirectional) covariograms, we assume that the covariance between the point values depends only on distance – Recall the covariance stationarity assumption • Anisotropic (or directional) covariograms are used when we have reason to believe that direction plays a role as well (i.e., covariance is a function of both distance and direction) – E.g., in some problems, accounting for direction is appropriate (e.g., when wind or water currents might be a factor) For more on anisotropic variograms, see http://web.as.uky.edu/statistics/users/yzhen8/STA695/lec05.pdf
  • 184. 188 IDW vs. Kriging • We get a more "natural" look to the data with Kriging • You see the "bulls eye" effect in IDW but not (as much) in Kriging • Kriging helps to compensate for the effects of data clustering, assigning individual points within a cluster less weight than isolated data points (or, treating clusters more like single points) • Kriging also gives us a standard error • If the data locations are quite dense and uniformly distributed throughout the area of interest, we will get decent estimates regardless of which interpolation method we choose. • On the other hand, if the data locations fall in a few clusters and there are gaps in between these clusters, we will obtain pretty unreliable estimates regardless of whether we use IDW or Kriging. These are interpolation results using the gold data in Western PA (IDW vs. Ordinary Kriging)
  • 185. 6. KRIGING (SPATIAL ESTIMATION) 189
  • 186. Why go beyond the interpolation methods seen so far • In the next units we will look at ordinary kriging • Ordinary kriging is "linear" because – its estimates are weighted linear combinations of the available data; it is "unbiased" since it tries to have mR, the mean residual or error, equal to 0; – it is "best" because it aims at minimizing σ²R (the variance of the errors). • All of the other estimation methods we have seen so far are also linear and, as we have already seen, are also theoretically unbiased. • The distinguishing feature of ordinary kriging, therefore, is its aim of minimizing the error variance. 190 Unit 6: Kriging (spatial estimation)
  • 187. How to deal with error • The importance of this for ordinary kriging is that – we never know mR and therefore cannot guarantee that it is exactly 0. – Nor do we know σ²R; therefore, we cannot minimize it. • The best we can do is to build a model of the data we are studying and work with the average error and the error variance for the model. 191 Unit 6: Kriging (spatial estimation)
  • 188. variance • In ordinary kriging, we use a probability model in which the bias and the error variance can both be calculated, and then choose weights for the nearby samples that ensure that the average error for our model, mR, is exactly 0 and that our modeled error variance, σ²R, is minimized. 192 Unit 6: Kriging (spatial estimation)
  • 189. ordinary kriging system • This system of equations, often referred to as the ordinary kriging system, can be written in matrix notation as C · w = D, where C is the matrix of covariances between all pairs of sample locations (augmented with a row and column of ones for the unbiasedness constraint), w is the vector of kriging weights (plus the Lagrange multiplier), and D is the vector of covariances between each sample location and the location being estimated. 193 Unit 6: Kriging (spatial estimation)
  • 190. weights • To solve for the weights, we multiply the previous equation on both sides by C⁻¹, the inverse of the left-hand side covariance matrix: w = C⁻¹ · D 194 Unit 6: Kriging (spatial estimation)
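This solve can be sketched numerically in a few lines (the exponential covariance and its parameters here are illustrative assumptions, not the values used in the slides' worked example):

```python
import math
import numpy as np

def exp_cov(h, c1=10.0, a=10.0):
    """Illustrative exponential covariance: C(0) = c1, decaying with distance."""
    return c1 * math.exp(-3.0 * h / a)

def ok_weights(sample_xy, target_xy, cov=exp_cov):
    """Solve the ordinary kriging system C w = D (with a Lagrange multiplier)."""
    n = len(sample_xy)
    # Augmented covariance matrix: last row/column of ones enforces sum(w) = 1
    C = np.ones((n + 1, n + 1))
    C[n, n] = 0.0
    for i in range(n):
        for j in range(n):
            C[i, j] = cov(math.dist(sample_xy[i], sample_xy[j]))
    # D: covariances between each sample and the point being estimated
    D = np.ones(n + 1)
    D[:n] = [cov(math.dist(p, target_xy)) for p in sample_xy]
    w = np.linalg.solve(C, D)
    return w[:n], w[n]  # kriging weights and the Lagrange multiplier
```

The estimate is then the weighted sum of the sample values; the constraint row forces the weights to sum to 1, which is what makes the estimator unbiased.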
  • 191. An Example of Ordinary Kriging • Let us return to the seven sample data configuration we used earlier to see a specific example of how ordinary kriging is done. The data configuration is shown again in next slides; we have labelled the point we are estimating as location 0, and the sample locations as 1 through 7. The coordinates of these eight points are given in Table following the figure, along with the available sample values. 195 Unit 6: Kriging (spatial estimation)
  • 192. An example of a data configuration 196 • An example of a data configuration to illustrate the kriging estimator. • The sample value is given immediately to the right of the plus sign. Unit 6: Kriging (spatial estimation)
  • 193. 197 Coordinates and sample values for the data shown in previous Figure Unit 6: Kriging (spatial estimation)
  • 194. Pattern of spatial continuity • To calculate the ordinary kriging weights, we must first decide what pattern of spatial continuity we want our random function model to have. 198 Unit 6: Kriging (spatial estimation)
  • 195. covariances • To keep this example relatively simple, we will calculate all of our covariances from the following function: 199 An example of an exponential covariance function . Unit 6: Kriging (spatial estimation)
  • 196. variogram • The covariance function corresponds to the following variogram: 200 An example of an exponential variogram model Unit 6: Kriging (spatial estimation)
  • 197. Remark on the covariance & variogram model Both of these functions, shown in the previous two slides, can be described by the following parameters: – C0 • commonly called the nugget effect • provides a discontinuity at the origin. – a • commonly called the range • provides a distance beyond which the variogram or covariance value remains essentially constant. – C0 + C1 • commonly called the sill • is the variogram value for very large distances, γ(∞); it is also the covariance value for |h| = 0, and the variance of our random variables, σ². 201 Unit 6: Kriging (spatial estimation)
  • 198. • Geostatisticians normally define the spatial continuity in their random function model through the variogram and solve the ordinary kriging system using the covariance. In this example, we will use the covariance function throughout. • By using the covariance function, we have chosen to ignore the possibility of anisotropy for the moment; the covariance between the data values at any two locations will depend only on the distance between them and not on the direction. Later, when we examine the effect of the various parameters, we will also study the important effect of anisotropy. 202 Unit 6: Kriging (spatial estimation)
  • 199. 203 A table of distances, from the previous Figure, between all possible pairs of the seven data locations. Unit 6: Kriging (spatial estimation)
  • 200. • To demonstrate how ordinary kriging works, we will use the following parameters for the function given in the following Equation: 204 Unit 6: Kriging (spatial estimation)
  • 201. • These are not necessarily good choices, but they will make the details of the ordinary kriging procedure easier to follow since our covariance model now has a quite simple expression: 205 Unit 6: Kriging (spatial estimation)
  • 202. • Having chosen a covariance function from which we can calculate all the covariances required for our random function model, we can now build the C and D matrices. 206 Unit 6: Kriging (spatial estimation)
  • 203. 207 Using Table, which provides the distances between every pair of locations, and Equation above, the C matrix is Unit 6: Kriging (spatial estimation)
  • 204.–206. 208–210 (Figures: the numerical C matrix, the D vector, and the inverse C⁻¹ for the seven-sample example.)
  • 207. 211 The set of weights that will provide unbiased estimates with a minimum estimation variance is calculated by multiplying C⁻¹ by D:
  • 208. 212 The ordinary kriging weights for the seven samples using the isotropic exponential covariance model given in Equation below. The sample value is given immediately to the right of the plus sign while the kriging weights are shown in parenthesis.
  • 209. • Below are shown the sample values along with their corresponding weights. The resulting estimate is 213
  • 210. 214 the minimized error variance expressed as
  • 212. Detailed exercise • Refer to the practical exercise on – interpolation using IDW – kriging 216
  • 213. Spatial Interpolation: A Brief Introduction Eugene Brusilovskiy
  • 214. 218 • Introduction to interpolation • Deterministic interpolation methods • Some basic statistical concepts • Autocorrelation and First Law of Geography • Geostatistical Interpolation – Introduction to variography – Kriging models General Outline
  • 215. 219 What is Interpolation? • Assume we are dealing with a variable which has meaningful values at every point within a region (e.g., temperature, elevation, concentration of some mineral). Then, given the values of that variable at a set of sample points, we can use an interpolation method to predict values of this variable at every point – For any unknown point, we take some form of weighted average of the values at surrounding points to predict the value at the point where the value is unknown – In other words, we create a continuous surface from a set of points – As an example used throughout this presentation, imagine we have data on the concentration of gold in western Pennsylvania at a set of 200 sample locations: Input Process Output
  • 216. 220 Appropriateness of Interpolation • Interpolation should not be used when there isn’t a meaningful value of the variable at every point in space (within the region of interest) • That is, when points represent merely the presence of events (e.g., crime), people, or some physical phenomenon (e.g., volcanoes, buildings), interpolation does not make sense. • Whereas interpolation tries to predict the value of your variable of interest at each point, density analysis (available, for instance, in ArcGIS’s Spatial Analyst) “takes known quantities of some phenomena and spreads it across the landscape based on the quantity that is measured at each location and the spatial relationship of the locations of the measured quantities”. – Source: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=Un derstanding_density_analysis
  • 217. 221 Interpolation vs. Extrapolation • Interpolation is prediction within the range of our data – E.g., having temperature values for a bunch of locations all throughout PA, predict the temperature values at all other locations within PA • Note that the methods we are talking about are strictly those of interpolation, and not extrapolation • Extrapolation is prediction outside the range of our data – E.g., having temperature values for a bunch of locations throughout PA, predict the temperature values in Kazakhstan
  • 218. 222 First Law of Geography • “Everything is related to everything else, but near things are more related than distant things.” – Waldo Tobler (1970) • This is the basic premise behind interpolation, and near points generally receive higher weights than far away points Waldo Tobler Reference: TOBLER, W. R. (1970). "A computer movie simulating urban growth in the Detroit region". Economic Geography, 46(2): 234-240.
  • 219. 223 Methods of Interpolation • Deterministic methods – Use mathematical functions to calculate the values at unknown locations based either on the degree of similarity (e.g. IDW) or the degree of smoothing (e.g. RBF) in relation with neighboring data points. – Examples include: • Inverse Distance Weighted (IDW) • Radial Basis Functions (RBF) • Geostatistical methods – Use both mathematical and statistical methods to predict values at all locations within region of interest and to provide probabilistic estimates of the quality of the interpolation based on the spatial autocorrelation among data points. • Include a deterministic component and errors (uncertainty of prediction) – Examples include: • Kriging • Co-Kriging Reference: http://www.crwr.utexas.edu/gis/gishydro04/Introduction/TermProjects/Peralvo.pdf
  • 220. 224 Exact vs. Inexact Interpolation • Interpolators can be either exact or inexact – At sampled locations, exact interpolators yield values identical to the measurements. • I.e., if the observed temperature in city A is 90 degrees, the point representing city A on the resulting grid will still have the temperature of 90 degrees – At sampled locations, inexact interpolators predict values that are different from the measured values. • I.e., if the observed temperature in city A is 90 degrees, the inexact interpolator will still create a prediction for city A, and this prediction will not be exactly 90 degrees – The resulting surface will not pass through the original point – Can be used to avoid sharp peaks or troughs in the output surface • Model quality can be assessed by the statistics of the differences between predicted and measured values – Jumping ahead, the two deterministic interpolators that will be briefly presented here are exact. Kriging can be exact or inexact. Reference: Burrough, P. A., and R. A. McDonnell. 1998. Principles of geographical information systems. Oxford University Press, Oxford. 333pp.
  • 221. 225 Part 1. Deterministic Interpolation
  • 222. 226 Inverse Distance Weighted (IDW) • IDW interpolation explicitly relies on the First Law of Geography. To predict a value for any unmeasured location, IDW will use the measured values surrounding the prediction location. Measured values that are nearest to the prediction location will have greater influence (i.e., weight) on the predicted value at that unknown point than those that are farther away. – Thus, IDW assumes that each measured point has a local influence that diminishes with distance (or distance to the power of q > 1), and weighs the points closer to the prediction location greater than those farther away, hence the name inverse distance weighted. • Inverse Squared Distance (i.e., q=2) is a widely used interpolator • For example, ArcGIS allows you to select the value of q. • Weights of each measured point are proportional to the inverse distance raised to the power value q. As a result, as the distance increases, the weights decrease rapidly. How fast the weights decrease is dependent on the value for q. Source: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=How_Inverse_Distance_Weighted_(IDW)_interpolation_works
  • 223. 227 Inverse Distance Weighted - Continued • Because things that are close to one another are more alike than those farther away, as the locations get farther away, the measured values will have little relationship with the value of the prediction location. – To speed up the computation we might only use several points that are the closest – As a result, it is common practice to limit the number of measured values that are used when predicting the unknown value for a location by specifying a search neighborhood. The specified shape of the neighborhood restricts how far and where to look for the measured values to be used in the prediction. Other neighborhood parameters restrict the locations that will be used within that shape. • The output surface is sensitive to clustering and the presence of outliers.
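The weighting scheme just described can be sketched in a few lines; the coordinates and values below are made up for illustration.

```python
import numpy as np

def idw(known_xy, known_z, target_xy, q=2.0):
    """IDW prediction at one point: weights proportional to 1/d^q."""
    d = np.linalg.norm(known_xy - target_xy, axis=1)
    if np.any(d == 0):                # target coincides with a sample:
        return known_z[d == 0][0]     # exact interpolator returns the measured value
    w = 1.0 / d**q
    return np.sum(w * known_z) / np.sum(w)

xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([10.0, 20.0, 20.0, 30.0])
print(idw(xy, z, np.array([0.5, 0.5])))   # equidistant from all four samples → 20.0
```

Raising q concentrates the weight on the nearest samples; the early return at distance zero is what makes IDW an exact interpolator.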
  • 224. 228 Search Neighborhood Specification Points with known values of elevation that are outside the circle are just too far from the target point at which the elevation value is unknown, so their weights are pretty much 0. 5 nearest neighbors with known values (shown in red) of the unknown point (shown in black) will be used to determine its value
  • 225. 229 The Accuracy of the Results • One way to assess the accuracy of the interpolation is known as cross-validation – Remember the initial goal: use all the measured points to create a surface – However, assume we remove one of the measured points from our input, and re-create the surface using all the remaining points. – Now, we can look at the predicted value at that removed point and compare it to the point’s actual value! – We do the same thing for all the points – If the average (squared) difference between the actual value and the prediction is small, then our model is doing a good job at predicting values at unknown points. If this average squared difference is large, then the model isn’t that great. This average squared difference is called mean square error of prediction. For instance, the Geostatistical Analyst of ESRI reports the square root of this average squared difference – Cross-validation is used in other interpolation methods as well
• 226. 230 A Cross-Validation Example • Assume you have measurements at 15 data points, from which you want to create a prediction surface • The Measured column tells you the measured value at that point. The Predicted column tells you the prediction at that point when we remove it from the input (i.e., use the other 14 points to create a surface). The Error column is simply the difference between the measured and predicted values. • Because we can have an over-prediction or under-prediction at any point, the error can be positive or negative. So averaging the errors won’t do us much good if we want to see the overall error – we’ll end up with a value that is essentially zero due to these positives and negatives • Thus, in order to assess the extent of error in our prediction, we square each term, and then take the average of these squared errors. This average is called the mean squared error (MSE) • For example, ArcGIS reports the square root of this mean squared error (referred to simply as Root-Mean-Square in Geostatistical Analyst). This root mean square error is often denoted as RMSE.
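The leave-one-out procedure above can be sketched with IDW as the interpolator. The 15 sample points below are synthetic, drawn from a hypothetical smooth surface, purely for illustration.

```python
import numpy as np

def idw_predict(xy, z, target, q=2.0):
    d = np.linalg.norm(xy - target, axis=1)
    w = 1.0 / np.maximum(d, 1e-12) ** q
    return np.sum(w * z) / np.sum(w)

def loo_rmse(xy, z, q=2.0):
    # Leave-one-out cross-validation: drop each point, predict it from the rest
    errors = []
    for i in range(len(z)):
        mask = np.arange(len(z)) != i
        errors.append(z[i] - idw_predict(xy[mask], z[mask], xy[i], q=q))
    return np.sqrt(np.mean(np.square(errors)))   # root-mean-square error (RMSE)

rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(15, 2))
z = xy[:, 0] + xy[:, 1] + rng.normal(0, 0.1, 15)   # a hypothetical smooth surface
rmse = loo_rmse(xy, z, q=2.0)
print(rmse)
```

Averaging the squared errors (rather than the raw errors) is what keeps the positive and negative residuals from canceling out.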
• 227. 231 Examples of IDW with Different q’s • Smaller q’s (i.e., powers to which distance is raised) yield smoother surfaces, while larger q’s make the prediction approach the value of the nearest sample point • Food for thought: What happens when q is set to 0? Gold concentrations at locations in western PA q = 1 q=2 q=3 q=10 The Geostatistical Analyst of ArcGIS is able to tell you the optimal value of q by seeing which one yields the minimum RMSE. (Here, it is q=1).
  • 228. 232 Part 2. A Review of Stats 101
  • 229. 233 Before we do any Geostatistics… • … Let’s review some basic statistical topics: – Normality – Variance and Standard Deviations – Covariance and Correlation • … and then briefly re-examine the underlying premise of most spatial statistical analyses: – Autocorrelation
  • 230. 234 Normality • A lot of statistical tests – including many in geostatistics – rely on the assumption that the data are normally distributed • When this assumption does not hold, the results are often inaccurate
  • 232. 236 Data Transformations • Sometimes, it is possible to transform a variable’s distribution by subjecting it to some simple algebraic operation. – The logarithmic transformation is the most widely used to achieve normality when the variable is positively skewed (as in the image on the left below) – Analysis is then performed on the transformed variable.
  • 233. 237 The Mean and the Variance • The mean (average) of a variable is also known as the expected value – Usually denoted by the Greek letter μ – As an aside, for a normally distributed variable, the mean is equal to the median • The variance is a measure of dispersion of a variable – Calculated as the average squared distance of the possible values of the variable from mean. – Standard deviation is the square root of the variance – Standard deviation is generally denoted by the Greek letter σ, and variance is therefore denoted by
  • 234. 238 Example: Calculation of Mean and Variance Person Test Score Distance from the Mean (Distance from the Mean) Squared 1 90 15 225 2 55 -20 400 3 100 25 625 4 55 -20 400 5 85 10 100 6 70 -5 25 7 80 5 25 8 30 -45 2025 9 95 20 400 10 90 15 225 Mean: 75 Variance: 445 (Average of the entries in this column) Standard deviation (Square root of the variance): 21.1
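The numbers in the table above can be checked directly:

```python
scores = [90, 55, 100, 55, 85, 70, 80, 30, 95, 90]

mean = sum(scores) / len(scores)
# Population variance: average squared distance from the mean
variance = sum((s - mean) ** 2 for s in scores) / len(scores)
std_dev = variance ** 0.5

print(mean, variance, round(std_dev, 1))   # 75.0 445.0 21.1
```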
  • 235. 239 Covariance and Correlation • Defined as a measure of how much two variables X and Y change together – The units of Cov (X, Y) are those of X multiplied by those of Y – The covariance of a variable X with itself is simply the variance of X • Since these units are fairly obscure, a dimensionless measure of the strength of the relationship between variables is often used instead. This measure is known as the correlation. – Correlations range from -1 to 1, with positive values close to one indicating a strong direct relationship and negative values close to -1 indicating a strong inverse relationship
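A minimal sketch of both quantities, with made-up data chosen so the correlation is exactly 1:

```python
import math

def covariance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def correlation(x, y):
    # Dimensionless: covariance scaled by the two standard deviations
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]            # y is perfectly linear in x
print(covariance(x, x))              # Cov(X, X) is just Var(X): 1.25
print(correlation(x, y))             # strong direct relationship: 1.0
```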
  • 236. 240 Spatial Autocorrelation • Sometimes, rather than examining the association between two variables, we might look at the relationship of values within a single variable at different time points or locations • There is said to be (positive) autocorrelation in a variable if observations that are closer to each other in space have related values (recall Tobler’s Law) • As an aside, there could also be temporal autocorrelation – i.e., values of a variable at points close in time will be related
  • 237. 241 Examples of Spatial Autocorrelation (Source: http://image.weather.com/images/maps/current/acttemp_720x486.jpg)
  • 238. 242 Examples of Spatial Autocorrelation (Cont’d) (Source: http://capita.wustl.edu/CAPITA/CapitaReports/localPM10/gifs/elevatn.gif)
  • 239. 243 Regression • A statistical method used to examine the relationship between a variable of interest and one or more explanatory variables – Strength of the relationship – Direction of the relationship • Often referred to as Ordinary Least Squares (OLS) regression • Available in all statistical packages • Note that the presence of a relationship does not imply causality
  • 240. 244 For the purposes of demonstration, let’s focus on a simple version of this problem • Variable of interest (dependent variable) – E.g., education (years of schooling) • Explanatory variable (AKA independent variable or predictor): – E.g., Neighborhood Income
  • 241. 245 But what does a regression do? An example with a single predictor
• 242. 246 The example on the previous page can be easily extended to cases when we have more than one predictor • When we have n > 1 predictors, rather than getting a line in 2 dimensions, we get a (hyper)plane in n + 1 dimensions (the ‘+1’ accounts for the dependent variable) • Each independent variable will have its own slope coefficient which will indicate the relationship of that particular predictor with the dependent variable, controlling for all other independent variables in the regression. • The equation of the best-fit line becomes Dep. Variable = m1*predictor1 + m2*predictor2 + … + mn*predictorn + b + residuals, where the m’s are the coefficients of the corresponding predictors and b is the y-intercept term • The coefficient of each predictor may be interpreted as the amount by which the dependent variable changes as that independent variable increases by one unit (holding all other variables constant)
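The multi-predictor equation above can be sketched with ordinary least squares. The data are simulated with known coefficients (m1 = 3, m2 = −2, b = 5) so that the fitted values can be checked against the truth.

```python
import numpy as np

# Simulated data: two predictors with known true coefficients
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(0, 0.5, 100)

# Append a column of ones so the intercept b is estimated too
A = np.column_stack([X, np.ones(len(y))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares fit
m1, m2, b = coef
residuals = y - A @ coef

# R-squared: share of the variance in y explained by the predictors
r2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean())**2)
print(m1, m2, b, r2)
```

Each fitted slope recovers its true value (to within sampling noise) while controlling for the other predictor, which is exactly the interpretation given in the slide.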
  • 243. 247 Some (Very) Basic Regression Diagnostics • R-squared: the percent of variance in the dependent variable that is explained by the independent variables • The so-called p-value of the coefficient – The probability of getting a coefficient (slope) value as far from zero as we observe in the case when the slope is actually zero – When p is less than 0.05, the independent variable is considered to be a statistically significant predictor of the dependent variable – One p-value per independent variable • The sign of the coefficient of the independent variable (i.e., the slope of the regression line) – One coefficient per independent variable – Indicates whether the relationship between the dependent and independent variables is positive or negative – We should look at the sign when the coefficient is statistically significant
  • 244. 248 Some (but not all) regression assumptions 1. The dependent variable should be normally distributed (i.e., the histogram of the variable should look like a bell curve) 2. Very importantly, the observations should be independent of each other. (The same holds for regression residuals). If this assumption is violated, our coefficient estimates could be wrong!
  • 245. 249 Part 3. Geostatistical Interpolation
  • 246. 250 Some Widely Used Texts on Geostatistics – Bailey, T.C. and Gatrell, A.C. (1995) Interactive Spatial Data Analysis. Addison Wesley Longman, Harlow, Essex. – Cressie, N.A.C. (1993) Statistics for Spatial Data. (Revised Edition). Wiley, John & Sons, Inc., – Isaaks, E.H. and Srivastava, R.M. (1989) An Introduction to Applied Geostatistics. Oxford University Press, New York, 561 p.
  • 247. Ecosystems are: Hierarchically structured, Metastable, Far from equilibrium Spatial Relationships Theoretical Framework: “An Introduction to Applied Geostatistics“, E. Isaaks and R. Srivastava, (1989). “Factorial Analysis”, C. J. Adcock, (1954) “Spatial Analysis: A guide for ecologists”, M. Fortin and M. Dale, (2005)
  • 249. Time
• 250. Basic paradigm: Ecosystem processes (change) are constrained and controlled by the pattern of hierarchical scales. “Things” closer together (in both space and time) are more alike than things far apart – “Tobler’s Law” (1970, Economic Geography): “Everything is related to everything else, but near things are more related than distant things.” Ecological “scale” is the space and time “distance” apart (lag) at which significant variation is NO LONGER correlated with “distance”.
• 251. Applied Geostatistics: Spatial Structure, Regionalized Variable, Spatial Autocorrelation, Moran I (1950), Geary C (1954), Semivariance, Stationarity, Anisotropy
• 252. Applied Geostatistics: Notes on Introduction to Spatial Autocorrelation. Geostatistical methods were developed for interpreting data that vary continuously over a predefined, fixed spatial region {z(i) : i ∈ D}. The study of geostatistics assumes that at least some of the spatial variation observed for natural phenomena can be modeled by random processes with spatial autocorrelation. Geostatistics is based on the theory of regionalized variables, i.e., variables distributed in space (or time). Geostatistical theory holds that any measurement of a regionalized variable can be viewed as a realization of a random function (also called a random process, random field, or stochastic process).
  • 253. Spatial Structure Geostatistical techniques are designed to evaluate the spatial structure of a variable, or the relationship between a value measured at a point in one place, versus a value from another point measured a certain distance away. Describing spatial structure is useful for:  Indicating intensity of pattern and the scale at which that pattern is exposed  Interpolating to predict values at unmeasured points across the domain (e.g. kriging)  Assessing independence of variables before applying parametric tests of significance
• 254. Regionalized Variable. Regionalized variables take on values according to spatial location. Given a variable z, measured at a location i, the variability in z can be broken down into three components: z(i) = f(i) + s(i) + ε. Where: f(i) is a “structural” coarse-scale forcing or trend, usually removed by detrending (coarse-scale forcing or trends can be removed by fitting a surface to the trend using regression and then working with the regression residuals); s(i) is a “random” local spatial dependency — what we are interested in; and ε is the error variance (considered normally distributed).
• 255. Regionalized Variable Zᵢ. The function Z in domain D is a set of space-dependent values {z(i) : i ∈ D}, sampled at locations Z₁, Z₂, …, Zᵢ, …, Zₙ, with a histogram of the sample values zᵢ. Variables are spatially correlated; therefore Z(x+h) can be estimated from Z(x) by using a regression model, via the covariance Cov(Z(x), Z(x+h)). ** This assumption holds true, with a recognized increase in error relative to other least-squares models.
• 256. Correlation: a statement of the extent to which two data sets agree. Calculating correlation by hand from columns x and y produces the data deviations (x − x̄, y − ȳ), the products of the deviations (x·y terms), and the sums of squares (x², y²). The correlation coefficient is determined by the extent to which the two regression lines depart from the horizontal and vertical: with tan θ = a/b, the agreement strengthens as θ decreases and a/b goes to 0 (the two distributions collapse toward one).
• 257. Correlation Coefficient: r = [ (1/n) Σᵢ (xᵢ − x̄)(yᵢ − ȳ) ] / [ √( (1/n) Σᵢ (xᵢ − x̄)² ) · √( (1/n) Σᵢ (yᵢ − ȳ)² ) ]. Spatial autocorrelation replaces the second variable with spatially weighted neighboring values of x: I = [ N Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) ] / [ (Σᵢ Σⱼ wᵢⱼ) Σᵢ (xᵢ − x̄)² ]. Briggs UT-Dallas GISC 6382 Spring 2007
  • 258. Spatial Structure Autocorrelation: := Degree of correlation to self Spatial Autocorrelation: := The relationship is a function of distance Spatial Structure which is: Exogenous (induced) … induced external spatial dependence Endogenous (inherent) … inherent spatial autocorrelation Spatial Dependence: Compare values at given distance apart -- LAGS A B C D Point – Point Autocorrelation A - B Positive A - C None A - D Negative Direction of Autocorrelation: Anisotropic := varies in intensity and range with orientation Isotropic := varies similarly in all directions
• 259. Spatial Structure. Given: spatial pattern is an outcome of the synthesis of dynamic processes operating at various spatial and temporal scales. Therefore: structure at any given time is but one realization of several potential outcomes. Assuming: all processes are stationary (homogeneous), where properties are independent of absolute location and direction in space. Therefore: observations are independent, which means they are homoscedastic and form a known distribution: E[Z(i)] = μ and Var[Z(i)] = σ² for all locations i, and Cov[Z(i), Z(j)] depends only on the separation of i and j. Stationarity is a property of the process, NOT the data, allowing spatial inferences. And: stationarity is scale dependent. Furthermore: inference (spatial statistics) applies over regions of assumed stationarity.
• 260. Space: locations A–J with their first-order neighbors and topology. The slide shows a binary connectivity matrix (1 = connected, 0 = not connected) and a distance-class connectivity matrix (entries 1, 2, 3 give the number of links separating each pair of locations), contrasting topological with Euclidean distance.
  • 261. Spatial Autocorrelation Positive autocorrelation: Negative autocorrelation: No autocorrelation: A variable is thought to be autocorrelated if it is possible to predict its value at a given location, by knowing its value at other nearby locations.  Autocorrelation is evaluated using structure functions that assess the spatial structure or dependency of the variable.  Two of these functions are autocorrelation and semivariance which are graphed as a correlogram and semivariogram, respectively.  Both functions plot the spatial dependence of the variable against the spatial separation or lag distance.
• 262. Space: for locations A–J, the slide shows a Euclidean distance matrix (e.g., AB = 2.00, AC = 1.41, BC = 3.16), the same matrix with rounded distances, a binary connectivity matrix, and a weighted matrix (e.g., entries of 0.7).
• 263. Moran I (1950) • A cross-product statistic that is used to describe autocorrelation • Compares the value of a variable at one location with values at all other locations: I(d) = (n / W) · [ Σᵢ Σⱼ wᵢⱼ Zᵢ Zⱼ ] / [ Σᵢ Zᵢ² ]. The numerator is a covariance (cross-product) term; the denominator is a variance term. Where: n is the number of observations; Zᵢ is the deviation from the mean for the value at location i (i.e., Zᵢ = xᵢ − x̄ for variable x); Zⱼ is the deviation from the mean for the value at location j (i.e., Zⱼ = xⱼ − x̄ for variable x); wᵢⱼ is an indicator function or weight at distance d (e.g., wᵢⱼ = 1 if j is in distance class d from point i, otherwise 0); W is the sum of all weights (the number of pairs in the distance class). Values range over [−1, 1]: Value = 1: perfect positive correlation; Value = −1: perfect negative correlation.
• 264. Moran I (1950) Again, where for variable x: n is the number of observations; wᵢⱼ(d) is the distance-class connectivity matrix (e.g., wᵢⱼ = 1 if j is in distance class d from point i, otherwise 0); W(d) is the sum of all weights (the number of pairs in the distance class): I(d) = [ (1 / W(d)) Σᵢ Σⱼ wᵢⱼ(d) (xᵢ − x̄)(xⱼ − x̄) ] / [ (1/n) Σᵢ (xᵢ − x̄)² ]
• 265. Geary C (1954) • A squared-difference statistic for assessing spatial autocorrelation • Considers differences in values between pairs of observations, rather than the covariation between the pairs (Moran I): C(d) = [ (N − 1) Σᵢ Σⱼ wᵢⱼ (yᵢ − yⱼ)² ] / [ 2W Σᵢ Zᵢ² ]. The numerator in this equation is a difference term that gets squared. The Geary C statistic is more sensitive to extreme values and clustering than the Moran I, and behaves like a distance measure: values range over [0, 3]; Value = 0: positive autocorrelation; Value = 1: no autocorrelation; Value > 1: negative autocorrelation.
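A sketch of Geary's C on made-up data: a smooth gradient on a 1-D transect, with binary adjacency weights, yields a value well below 1.

```python
import numpy as np

def gearys_c(x, w):
    """Geary's C: squared differences between neighboring pairs of values."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    W = w.sum()
    num = (n - 1) * np.sum(w * (x[:, None] - x[None, :]) ** 2)
    den = 2.0 * W * np.sum((x - x.mean()) ** 2)
    return num / den

# Made-up data: a smooth gradient on a 1-D transect, neighbors = adjacent cells
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
w = np.zeros((6, 6))
for i in range(5):
    w[i, i + 1] = w[i + 1, i] = 1.0
print(gearys_c(x, w))   # well below 1: positive spatial autocorrelation
```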
• 266. Ripley’s K (1976) and the L(d) transformation: L(d) = √[ A Σᵢ Σⱼ k(i, j) / (π N(N − 1)) ]. Where: A = area; N = number of points; d = distance; k(i, j) = the weight, which is 1 when the distance between i and j is < d, and 0 when it is > d. Determines whether features are clustered at multiple different distances. Sensitive to the study-area boundary. Conceptualized as the “number of points” within a set of circles of increasing radius. If events follow complete spatial randomness, the number of points in a circle follows a Poisson distribution, which defines the “expected” value.
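A naive sketch of K and its L transformation, with no edge correction (so it slightly understates K near the boundary), on a simulated random pattern:

```python
import numpy as np

def ripley_k(xy, d, area):
    """Naive (edge-uncorrected) Ripley's K for points xy in a region of given area."""
    n = len(xy)
    dist = np.linalg.norm(xy[:, None] - xy[None, :], axis=2)
    np.fill_diagonal(dist, np.inf)            # exclude i == j pairs
    count = np.sum(dist <= d)                 # ordered pairs within distance d
    return area * count / (n * (n - 1))

def l_transform(k, d):
    # L(d) = sqrt(K(d)/pi) - d is near 0 under complete spatial randomness
    return np.sqrt(k / np.pi) - d

rng = np.random.default_rng(7)
xy = rng.uniform(0, 10, size=(200, 2))        # an approximately random (CSR) pattern
k = ripley_k(xy, d=1.0, area=100.0)
l_val = l_transform(k, 1.0)
print(l_val)
```

For a clustered pattern L(d) would sit clearly above 0 at the clustering distances; for a regular pattern, below 0.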
• 267. General G: G(d) = Σᵢ Σⱼ wᵢⱼ(d) xᵢ xⱼ / Σᵢ Σⱼ xᵢ xⱼ (for i ≠ j). Where: d = distance class; wᵢⱼ = weight matrix, which is 1 when the distance between i and j is < d, and 0 when it is > d. Effectively distinguishes between “hot” and “cold” spots: G is relatively large if high values cluster, low if low values cluster. The pairs in the numerator are “within” a distance bound (d), expressed relative to the entire study area.
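A sketch of General G with binary distance weights; the points and values are made up so that the two high values sit together and the two low values elsewhere.

```python
import numpy as np

def general_g(x, xy, d):
    """General G with binary weights: wij = 1 if the i-j distance <= d, 0 otherwise."""
    x = np.asarray(x, dtype=float)
    dist = np.linalg.norm(xy[:, None] - xy[None, :], axis=2)
    w = (dist <= d).astype(float)
    np.fill_diagonal(w, 0.0)                 # exclude i == j
    pair = x[:, None] * x[None, :]
    np.fill_diagonal(pair, 0.0)
    return np.sum(w * pair) / np.sum(pair)

# Made-up points: the two high values cluster in one corner
xy = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
x = np.array([10.0, 9.0, 1.0, 1.0])
print(general_g(x, xy, d=2.0))   # relatively large: high values cluster
```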
• 268. Semivariance. The geostatistical measure that describes the rate of change of the regionalized variable is known as the semivariance: γ(d) = (1 / 2n_d) Σᵢ Σⱼ wᵢⱼ (yᵢ − yⱼ)². Where: j is a point at distance d from i; n_d is the number of points in that distance class (i.e., the sum of the weights wᵢⱼ for that distance class); wᵢⱼ is an indicator function set to 1 if the pair of points is within the distance class. Semivariance is used for descriptive analysis, where the spatial structure of the data is investigated using the semivariogram, and for predictive applications, where the semivariogram is fitted to a theoretical model, parameterized, and used to predict the regionalized variable at other, non-measured points (kriging).
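The empirical semivariance per lag class can be sketched as follows; the 50 sample points come from a hypothetical autocorrelated field, and the lag classes and tolerance are illustrative.

```python
import numpy as np

def empirical_semivariogram(xy, z, lags, tol):
    """Average semivariance gamma(d) for each lag class d +/- tol."""
    dist = np.linalg.norm(xy[:, None] - xy[None, :], axis=2)
    sq = (z[:, None] - z[None, :]) ** 2
    gamma = []
    for lag in lags:
        pairs = np.triu(np.abs(dist - lag) <= tol, k=1)   # count each pair once
        gamma.append(0.5 * sq[pairs].mean() if pairs.any() else np.nan)
    return np.array(gamma)

rng = np.random.default_rng(1)
xy = rng.uniform(0, 10, size=(50, 2))
z = np.sin(xy[:, 0]) + rng.normal(0, 0.1, 50)    # hypothetical autocorrelated field
g = empirical_semivariogram(xy, z, lags=[1.0, 2.0, 3.0, 4.0], tol=0.5)
print(g)
```

Plotting g against the lags gives the empirical semivariogram to which the sill, range, and nugget models of the next slides are fitted.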
  • 269. The sill is the value at which the semivariogram levels off (its asymptotic value) The range is the distance at which the semivariogram levels off (the spatial extent of structure in the data) The nugget is the semivariance at a distance 0.0, (the y –intercept) A semivariogram is a plot of the structure function that, like autocorrelation, describes the relationship between measurements taken some distance apart. Semivariograms define the range or distance over which spatial dependence exists.
  • 270. Autocorrelation assumes stationarity, meaning that the spatial structure of the variable is consistent over the entire domain of the dataset. The stationarity of interest is second-order (weak) stationarity, requiring that: (a) the mean is constant over the region (b) variance is constant and finite; and (c) covariance depends only on between-sample spacing  In many cases this is not true because of larger trends in the data  In these cases, the data are often detrended before analysis.  One way to detrend data is to fit a regression to the trend, and use only the residuals for autocorrelation analysis Stationarity
• 271. Anisotropy. Autocorrelation also assumes isotropy, meaning that the spatial structure of the variable is consistent in all directions. Often this is not the case, and the variable exhibits anisotropy, meaning that there is a direction-dependent trend in the data. If a variable exhibits different ranges in different directions, there is a geometric anisotropy: for example, in a dune deposit, the range along the wind direction is larger than the range perpendicular to the wind direction.
• 272. For predictions, the empirical semivariogram is converted to a theoretical one by fitting a statistical model (curve) to describe its range, sill, and nugget. There are four common models used to fit semivariograms: Linear: γ(d) = c₀ + bd (assumes no sill or range); Spherical: γ(d) = c₀ + c[(3d/2a) − (d³/2a³)] for d ≤ a, and γ(d) = c₀ + c for d > a; Exponential: γ(d) = c₀ + c[1 − exp(−d/a)]; Gaussian: γ(d) = c₀ + c[1 − exp(−d²/a²)]. Where: c₀ = nugget; b = regression slope; a = range; c₀ + c = sill.
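The four models can be written directly. The nugget, partial sill, and range values below are illustrative, not fitted to any data.

```python
import numpy as np

def linear(d, c0, b):
    return c0 + b * np.asarray(d, dtype=float)       # no sill or range

def spherical(d, c0, c, a):
    d = np.asarray(d, dtype=float)
    g = c0 + c * (1.5 * d / a - 0.5 * (d / a) ** 3)
    return np.where(d <= a, g, c0 + c)               # flat at the sill beyond the range

def exponential(d, c0, c, a):
    return c0 + c * (1.0 - np.exp(-np.asarray(d, dtype=float) / a))

def gaussian(d, c0, c, a):
    return c0 + c * (1.0 - np.exp(-(np.asarray(d, dtype=float) / a) ** 2))

# Illustrative parameters: nugget c0 = 0.5, partial sill c = 2.0, range a = 10.0
print(float(spherical(0.0, 0.5, 2.0, 10.0)))    # at d = 0: the nugget, 0.5
print(float(spherical(10.0, 0.5, 2.0, 10.0)))   # at d = a: the sill c0 + c, 2.5
```

Fitting in practice means choosing one model and adjusting c₀, c, and a (or b for the linear model) to track the empirical semivariogram.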
• 273. Variogram Modeling Suggestions • Check that there are enough pairs at each lag distance (from 30 to 50). • Remove outliers. • Truncate at half the maximum lag distance to ensure enough pairs. • Use a larger lag tolerance to get more pairs and a smoother variogram. • Start with an omnidirectional variogram before trying directional variograms. • Use other variogram measures to take into account lag means and variances (e.g., inverted covariance, correlogram, or relative variograms). • Use transforms of the data for skewed distributions (e.g., logarithmic transforms). • Use the mean absolute difference or median absolute difference to derive the range.