Geostatistics (GiSc 3052)
By Moges GT & Samuel D.R
Course objective and competences to be
acquired
• Students will be trained to apply geo-statistics
and surface analysis to generate information
about geographic features.
[Figure: a dataset of known values (e.g. temperature points) and a raster interpolated from these points]
Upon the completion of this course
students will be able to:
• Understand the concept of regionalized variable
theory.
• Explain spatial relationships of the features.
• Describe and model spatial data.
• Understand the basic approach in conducting
variogram modeling.
• Understand how to apply geo-statistics methods
in spatial interpolation.
• Generate surface related information.
• Use surface derivatives for different application.
1. Overview of classical statistics:
• Probability theory review, univariate, bivariate
and multivariate data analysis.
Practical 1:
• Orientation about the course nature and
laboratory regulation.
• Univariate data analysis
Probability Theory Review
• We review some basic concepts and results of
probability theory that are of relevance. It is
meant to be a refresher of concepts covered
in a separate statistics course. However, in
case you have not taken such courses, the
concepts and results will be explained using
simple examples.
Univariate Analysis: Introduction
• The probability of an event is a number between 0 and 1, representing the chance, or relative frequency of occurrence, of the event. The probabilities of all possible (mutually exclusive) events of an experiment must sum to 1.
• In practice, the outcomes of experiments are assigned numerical values; e.g., when tossing a coin, 1 and 2 can be assigned to the outcomes “head” and “tail”, respectively.
• Such numerical values can be represented by a random variable (RV).
Univariate Analysis: Introduction
• Two types of random variables exist:
– discrete and
– continuous
• Discrete examples include
– the outcome of tossing a coin (head or tail)
– the grade of a course with Pass or Fail
– land use (forest, agricultural, water)
Univariate Analysis: Introduction
• Continuous examples include
– the height of all men in the country (ranging from,
say, 1.5 to 1.90m),
– the grades of a class (e.g., 0.0 to 100.0 points)
– Raster data covering the WG area, such as
• elevation of the terrain (raster data with 90 m resolution), e.g. the SRTM DEM used in GIS classes
• NDVI derived from a Landsat 8 image (2013, day 335 of the year)
• Temperature in raster format
Univariate Analysis: Introduction
• The probability of a random variable taking any possible value (discrete random variable) or falling within a range of values (continuous random variable) is described by its probability distribution.
Univariate Analysis: Introduction
In the example (figure),
P1 = Pr[X = 1] = 0.5,
P2 = Pr[X = 2] = 0.5,
and P1+P2 = 1.
For a discrete random variable, its distribution is just the frequency (or proportion) of occurrence.
Outcomes of experiments
of tossing a coin: discrete
random variable and its
probability distribution.
Exercise 1:
• Calculate the probability distribution of the trees in the
manmade (plantation) forest according to size
(diameter of the trees). We use sample data from
WGCFNR plantation forest
• Steps:
– (1) assign each tree a numerical value (a discrete random variable) using its diameter class (e.g. class 1 = 10–15 cm, class 2 = 15–20 cm, etc.);
– (2) calculate the proportion;
– (3) plot the probability.
– You can either do it by hand with a calculator, or using Excel. Does the total probability sum to 1? We often call such a diagram a “histogram”.
Exercise 1:
Table 1: distribution of the trees in the man-made (plantation) forest according to size (diameter of the trees); data from WGCFNR forest. An example of a discrete random variable.

DBH class (cm)   Class range (cm)   Number of trees   X = xi
13               10 – 15            5                 1
18               15 – 20            15                2
23               20 – 25            21                3
28               25 – 30            7                 4
33               30 – 35            6                 5
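The three steps of Exercise 1 can be sketched in plain Python using the class counts from Table 1 (a minimal sketch; the plotting step is left out):

```python
# Class counts from Table 1 (WGCFNR plantation sample): X = xi -> number of trees
counts = {1: 5, 2: 15, 3: 21, 4: 7, 5: 6}

n = sum(counts.values())                        # step 1 data: 54 trees in total
probs = {x: c / n for x, c in counts.items()}   # step 2: proportion per class

# Step 3 would plot probs as a histogram; here we just check it sums to 1
total = sum(probs.values())
```

Each proportion (e.g. 21/54 ≈ 0.39 for class 3) is the probability of drawing a tree of that diameter class at random, and the proportions sum to 1 as required.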
Constructing a Histogram for Continuous
Data: Equal Class Widths
• Determine the frequency and relative
frequency for each class. Mark the class
boundaries on a horizontal measurement axis.
• Above each class interval, draw a rectangle
whose height is the corresponding relative
frequency (or frequency).
Exercise 2:
• A sample data set of a continuous random
variable : the thickness (X; m) of an aquifer is
measured along the horizontal distance (di; m)
(Table 2). For the thickness, calculate the
mean, variance, standard deviation and CV,
calculate and plot the histogram and
cumulative distribution.
Exercise 2:
Table 2: Aquifer thickness xi (m) along a distance di (m).
di 1 2 3 4 5 6 7 8 9 10 11
xi 56 57 55 54 49 43 37 36 39 37 41
di 12 13 14 15 16 17 18 19 20 21 22
xi 41 36 33 40 44 53 53 54 51 48 54
di 23 24 25 26 27 28 29 30 31 32 33
xi 63 65 63 63 53 50 50 54 49 43 43
di 34 35 36 37 38 39
xi 47 47 50 53 61 61
pdf and cdf functions (curves) of a
continuous random variable
• Probability density function (pdf): fX(x)
• Cumulative distribution function (cdf): FX(x) = Pr[X ≤ x]
Exercise 2:
• For the sample set, a few other key statistics
are also of interest:
– mean (μ)
– variance (σ²)
– standard deviation (σ = √σ²)
– coefficient of variation (CV = σ/μ)
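These statistics for the Table 2 thickness values can be sketched with the Python standard library (population formulas, i.e. dividing by n; a statistics course may use n − 1 instead):

```python
import math

# Aquifer thickness values xi from Table 2 (di = 1 .. 39)
x = [56, 57, 55, 54, 49, 43, 37, 36, 39, 37, 41,
     41, 36, 33, 40, 44, 53, 53, 54, 51, 48, 54,
     63, 65, 63, 63, 53, 50, 50, 54, 49, 43, 43,
     47, 47, 50, 53, 61, 61]

n = len(x)
mean = sum(x) / n                              # mean (μ)
var = sum((v - mean) ** 2 for v in x) / n      # variance (σ², population)
sd = math.sqrt(var)                            # standard deviation (σ)
cv = sd / mean                                 # coefficient of variation (σ/μ)
```

The histogram and cumulative distribution asked for in the exercise can then be built by binning these values, exactly as in Exercise 1.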
Measures of Variability
• Results vary from individual to individual, from
group to group, from city to city, from
moment to moment. Variation always exists in
a data set, regardless of which characteristic
you’re measuring, because not every
individual will have the same exact value for
every characteristic you measure.
Measures of Variability
• Without a measure of variability you can’t
compare two data sets effectively.
– What if two sets of data have about the same average and the same median?
– Does that mean that the data are all the same?
• Not at all.
Measures of Variability
• For example, the data sets 199, 200, 201, and 0,
200, 400 both have
– the same average,
• which is 200,
– and the same median,
• which is also 200.
• Yet they have very different amounts of
variability.
• The first data set has a very small amount of
variability compared to the second.
Measures of Variability
• By far the most commonly used measure of
variability is the standard deviation.
• The standard deviation of a data set
represents the typical distance from any point
in the data set to the center.
• It’s roughly the average distance from the
center, and in this case, the center is the
average.
Bivariate Analysis: Introduction
• In the previous section, we looked at the statistical measures of a single random variable. However, correlation can often exist between two random variables.
• For example,
– the height and diameter of a tree are often correlated.
– the elevation and temperature in most areas are often correlated.
Bivariate Analysis: Introduction
• In this chapter you analyze two numerical
variables, X and Y, to look for patterns, find the
correlation, and make predictions about Y from
X, if appropriate, using simple linear
regression.
Bivariate Analysis: Introduction
• In this case, the weight increases with increasing
height for which we say a positive correlation
exists between the two variables. To investigate
correlation, a scatter plot is often used; e.g., for each person, the height and weight are cross-plotted. Often, some sort of fit is attempted.
Here, we see a linear function fitted to the scatter
plot.
Bivariate Analysis: Introduction
• However, to quantitatively evaluate
correlation, a correlation coefficient (rXY ) is
often used:
Bivariate Analysis: Introduction
• As defined previously, μX (or μY ) is the mean of X
(or Y) in its univariate distribution.
• ρXY (or rXY) varies between
– −1 (perfect negative correlation: Y = −X) and
– +1 (perfect positive correlation: Y = X).
• When rXY = 0, we say the two variables are not
correlated.
• In our example, rXY = 0.76, thus there is a certain
amount of positive correlation between weight
and height.
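The correlation coefficient translates directly into code; a minimal sketch using population-style means and standard deviations (the data below are made up to illustrate the ±1 limits, not the height/weight figure's data):

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient rXY between two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in ys) / n)
    return cov / (sx * sy)

# Perfectly linearly related variables hit the two limits
r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # perfect positive: +1
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # perfect negative: -1
```

Real data such as the height/weight example fall in between (rXY = 0.76 there).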
Correlation between two random variables
• The correlation between two random variables is the cornerstone of geostatistics:
– one random variable is a geological/hydrological/petrophysical property at one spatial location,
– the second random variable can be the
• (1) same property at a different location (auto-
correlation studies; kriging); or,
• (2) a different property at a different location (cross-
correlation studies, co-kriging).
Bivariate Random Variables
• The covariance between X and Y (σXY) measures how well the two variables track each other: when one goes up, how does the other behave on average?
• The unit of covariance is the product of the unit of random variable X and the unit of random variable Y. The covariance of a random variable X with itself is equal to its variance: σXX = σX².
correlation
• The correlation (or correlation coefficient) between X and Y is a dimensionless, normalized version of the covariance σXY:
ρXY = σXY / (σX σY)
covariance
• An estimator of the covariance can be defined as:
σ̂XY = (1/n) Σi (xi − μ̂X)(yi − μ̂Y)
• If X and Y are independent, then they are uncorrelated and their covariance σXY (and thus ρXY) is zero.
• The covariance is best thought of as a measure of linear dependence.
Multivariate Analysis
• Linear combination of many random variables
• Extending the bivariate arithmetics into
multivariate analysis, we can get another host
of relationships.
The idea of regression
• The idea of regression is to build a model that
estimates or predicts one quantitative variable
(y) by using at least one other quantitative
variable (x). Simple linear regression uses
exactly one x variable to estimate the y
variable.
• Multiple linear regression, on the other hand,
uses more than one x variable to estimate the
value of y.
Discovering the uses of multiple regression
• One situation in which multiple regression is
useful is when the y variable is hard to track
down; that is, its value can’t be measured
straight up, and you need more than one
other piece of information to help get a
handle on what its value will be.
General form of the multiple regression
model
• The general idea of simple linear regression is
to fit the best straight line through that data
that you possibly can and use that line to
make estimates for y based on certain x-
values. The equation of the best-fitting line in
simple linear regression is
– y = b0 + b1x1
– where b0 is the y-intercept and b1 is the slope.
• (The equation also has the form y = a + bx.)
General form of the multiple regression
model
• In the multiple regression setting, you have more than one
x variable that is related to y.
– Call these x variables x1, x2, . . . xk.
• In the most basic multiple regression model, you use some
or all of these x variables to estimate y where each x
variable is taken to the first power. This process is called
finding the best-fitting linear function for the data.
• This linear function looks like the following:
– y = b0 + b1x1 + b2x2 + . . . + bkxk
– and you can call it the multiple (linear) regression model.
• You use this model to make estimates about y based on
given values of the x variables.
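Fitting such a multiple linear regression by least squares can be sketched with NumPy (the data below are made up so that y = 0 + 1·x1 + 2·x2 exactly; `numpy.linalg.lstsq` finds the best-fitting b0, b1, b2):

```python
import numpy as np

# Hypothetical data: y depends linearly on two x variables
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y = np.array([5.0, 4.0, 11.0, 10.0, 15.0])   # y = 0 + 1*x1 + 2*x2

# Prepend a column of ones so the intercept b0 is estimated too
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = coef

y_hat = A @ coef   # fitted values from the regression model
```

With real (noisy) data the coefficients will not reproduce y exactly; least squares then minimizes the sum of squared residuals.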
General form of the multiple regression
model
• A linear function is an equation whose x terms
are taken to the first power only.
– For example y = 2x1 + 3x2 + 4x3 is a linear equation
using three x variables.
• If any of the x terms are squared, the function
would be a quadratic one;
• If an x term is taken to the third power, the
function would be a cubic function, and so on.
In this chapter, I consider only linear functions.
Unit 2. Introduction to Geostatistics:
Definition and history of
geostatistics, advantages of
geostatistics, geostatistics analysis
requirements.
Practical 2:
Bivariate data analysis
What is geostatistics?
• What is statistics?
• What then is geo-statistics?
Comment
• The term statistics has two common
meanings, which we want to clearly
separate: descriptive and inferential
statistics.
• But to understand the difference between
descriptive and inferential statistics, we
must first be clear on the difference between
populations and samples.
Populations and samples
• A population is a set of well-defined objects.
– We must be able to say, for every object, if it is in the
population or not.
– We must be able, in principle, to find every individual of the
population.
• A geographic example of a population is all pixels in a
multi-spectral satellite image.
• A sample is some subset of a population.
– We must be able to say, for every object in the population, if
it is in the sample or not.
– Sampling is the process of selecting a sample from a
population.
– Continuing the example, a sample from this population could
be a set of pixels from known ground truth points.
What do we mean by statistics?
• Two common uses of the word:
– Descriptive statistics: numerical summaries of
samples;
• (what was observed)
– Inferential statistics: from samples to
populations.
• (what could have been or will be observed in a larger
population)
A concise definition of inferential
statistics
• Statistics: The determination of the
probable from the possible
– . . . which implies the rigorous definition and
then quantification of “probable".
– Probable causes of past events or observations
– Probable occurrence of future events or
observations
• This is a definition of inferential statistics:
– Observations → Inferences
Why use statistical analysis?
• Descriptive: we want to summarize some data in a
shorter form
• Inferential: We are trying to understand some process
and maybe predict based on this understanding.
• So we need to model it, i.e. make a conceptual or
mathematical representation, from which we infer the
process.
• But how do we know if the model is “correct"?
• Are we imagining relations where there are none?
• Are there true relations we haven't found?
– Statistical analysis gives us a way to quantify the confidence
we can have in our inferences.
Comment
• The most common example of geo-statistical
inference is the prediction of some attribute at an
unsampled point, based on some set of sampled
points.
• In the next slide we show an example from the Meuse river floodplain in the southern Netherlands. The copper (Cu) content of soil samples has been measured at 155 points (left figure); from this we can predict at all points in the area of interest (right figure).
What is geo-statistics?
• Geostatistics is statistics on a population with
known location, i.e. coordinates:
– In one dimension (along a line or curve)
– In two dimensions (in a map or image)
– In three dimensions (in a volume)
• The most common application of geostatistics
is in 2D (maps).
• Key point: Every observation (sample point)
has both:
– coordinates (where it is located); and
– attributes (what it is).
Comment
• Let's first look at
a data set that is
not geo-statistical.
• It is a list of soil
samples (without
their locations)
with the lead (Pb)
concentration.
The column Pb is
the attribute of
interest.
To check your understanding . . .
• Q5 : Can we determine the mean,
maximum, minimum and standard
deviation of this set of samples?
• Q6 : Can we make a map of the sample
points with their Pb values?
Comment
• Now we look at a data set that is geo-statistical (next
slide).
• These are soil samples taken in the Jura mountains of
Switzerland, and their lead content; but this time
with their coordinates.
• The columns E and N are the coordinates, i.e. the
spatial reference; the column Pb is the attribute.
• First let's look at the tabular form:
Sample data
To check your understanding . . .
• Q7 : Comparing this to the non-
geostatistical list of soil samples and their
lead contents (above), what new information
is added here?
Comment
• On the figure (next slide) you will see:
– A coordinate system (shown by the over-printed
grid lines)
– The locations of 256 sample points - where a soil
sample was taken
– The attribute value at each sample point -
symbolized by the relative size of the symbol at
each point - in this case the amount of lead (Pb)
in the soil sample
To check your understanding . . .
• Q8 : In the figure, how can you determine the
coordinates of each sample point?
• Q9 : What are the coordinates of the sample point
displayed as a red symbol?
• Q10 : What is the mathematical origin (in the sense
of Cartesian or analytic geometry) of this coordinate
system?
• Q11 : How could these coordinates be related to
some common system such as UTM?
To check your understanding . . .
• Q12 : Suppose we have a satellite image that
has not been geo-referenced. Can we speak
of geostatistics on the pixel values?
• Q13 : In this case, what are the coordinates
and what are the attributes?
• Q14 : Suppose now the image has been geo-referenced. What are now the coordinates?
Geostatistics requirements
• The location of a sample is an intrinsic part of its definition.
• All data sets from a given area are implicitly related by their coordinates
– So they can be displayed and related in a GIS
• Values at sample points can not be assumed to be independent: there is
often evidence that nearby points tend to have similar values of attributes.
• That is, there may be a spatial structure to the data
– Classical statistics assumes independence of samples
– But, if there is spatial structure, this is not true!
– This has major implications for sampling design and statistical inference
• Data values may be related to their coordinates → spatial trend
Feature and geographic spaces
• The word space is used in mathematics to
refer to any set of variables that form
metric axes and which therefore allow us to
compute a distance between points in that
space.
– If these variables represent geographic
coordinates, we have a geographic space.
– If these variables represent attributes, we have
a feature space.
Comment
• You are probably quite familiar with feature
space from your study of non-spatial statistics.
• Even with one variable, we have a unit of
measure; this forms a 1D or univariate feature
space.
• Most common are two variables which we want
to relate with correlation or regression analysis;
this is a bivariate feature space.
• In multivariate analysis the feature space has
more than two dimensions.
Comment
• Multivariate feature spaces can have many
dimensions; we can only see three at a time.
Comment
• So, feature space is perhaps a new term but not a new
concept if you've followed a statistics course with
– univariate, bivariate and multivariate analysis.
• What then is geographic space? Simply put, it is a
mathematical space where the axes are map
coordinates that relate points to some reference
location on or in the Earth (or another physical body).
• These coordinates are often in some geographic
coordinate system that was designed to give each
location on (part of) the Earth a unique identification;
a common example is the Universal Transverse Mercator (UTM) grid.
• However, a local coordinate system can be used, as
long as there is a clear relation between locations and
coordinates.
Geographic space
• Axes are 1D lines; they almost always have the same units of measure (e.g. metres, kilometres . . . )
– One-dimensional: coordinates are on a line with respect to some origin.
– Two-dimensional: coordinates are on a grid with respect to some origin.
– Three-dimensional: coordinates are grid and elevation from a reference elevation.
• Note: latitude-longitude coordinates do not have equal
distances in the two dimensions; they should be
transformed to metric (grid) coordinates for geo-
statistical analysis.
Interpolation
• Interpolation is based on the assumption that
spatially distributed objects are spatially
correlated; in other words, things that are close
together tend to have similar characteristics.
• For instance, if it is raining on one side of the
street, you can predict with a high level of
confidence that it is also raining on the other side
of the street.
• You would be less sure if it was raining across
town and less confident still about the state of the
weather in the neighbouring province.
What is a spatial interpolation?
• Interpolation predicts values for cells in a
raster from a limited number of sample data
points. It can be used to predict unknown
values for any geographic point data:
elevation, rainfall, chemical concentrations,
noise levels, and so on.
On the left is a point dataset of known values. On
the right is a raster interpolated from these points.
Unknown values are predicted with a
mathematical formula that uses the values of
nearby known points.
Unit 3: non-geostatistical spatial analysis
[Figure: sample points for copper around a query point p. What is the value at point p?]
Interpolation methods (to be discussed in the
class)
• Natural neighbour
• Local simple mean (average method)
• Polygon
• Triangulation
• Inverse Distance Method
• Polynomial equation
• Spline
Natural neighbour
• Natural Neighbor interpolation finds the
closest subset of input samples to a query
point and applies weights to them based on
proportionate areas to interpolate a value
(Sibson, 1981). It is also known as Sibson or
"area-stealing" interpolation.
Interpolation methods
IDW
• The IDW (Inverse Distance Weighted) tool
uses a method of interpolation that estimates
cell values by averaging the values of sample
data points in the neighborhood of each
processing cell. The closer a point is to the
center of the cell being estimated, the more
influence, or weight, it has in the averaging
process.
Spline
• The Spline tool uses an interpolation method
that estimates values using a mathematical
function that minimizes overall surface
curvature, resulting in a smooth surface that
passes exactly through the input points.
INTERPOLATION METHODS USING
SAMPLES
More explanation on: point estimation
• For each of the point estimation methods we describe in the following sections, we will show the details of the estimation of the V value at 65E,137N.
• No sweeping conclusions should be drawn from this single example; it is presented only to provide a familiar common thread through our presentation of the various methods. Once we have looked at each method separately, we will compare them.
[Table: distances to sample values in the vicinity of 65E,137N]
point estimation methods
• The values at sample locations near 65E,137N are shown in the figure on the next slide and listed in the previous table.
– The variability of these nearby sample values presents
a challenge for estimation.
– Values range from 227 to 791 ppm;
• The estimated value therefore, can cover quite a
broad range depending on how we choose to
weight the individual values.
• In the following sections we will look at four quite
different point estimation methods.
[Figure: the goal is to estimate the value of V at the point 65E,137N, located by the arrow, from the surrounding seven V data values.]
The method of triangulation (1/9)
• Triangulation estimates by fitting a plane through three samples that surround the point being estimated. The equation of a plane can be expressed generally as
 z = ax + by + c
• In our example, where we are trying to estimate V values using coordinate information, z is the V value, x is the easting, and y is the northing.
• Given the coordinates and the V values of three nearby samples, we can calculate the coefficients a, b and c by solving the following system of equations:
 aX1 + bY1 + c = Z1
 aX2 + bY2 + c = Z2
 aX3 + bY3 + c = Z3
The method of triangulation (2/9)
• From the figure we can find three samples
that nicely surround the point being
estimated: the 696 ppm, the 227 ppm, and
the 606 ppm samples.
• Using the data for these three samples, the
set of equations we need to solve is
63a + 140b + c = 696
64a + 129b + c = 227
71a + 140b + c = 606
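This 3×3 system can be solved directly; a sketch in Python with NumPy, which also reproduces the 548.7 ppm estimate derived on the following slides:

```python
import numpy as np

# Plane z = a*x + b*y + c through the three samples around 65E,137N
A = np.array([[63.0, 140.0, 1.0],    # 696 ppm sample at 63E,140N
              [64.0, 129.0, 1.0],    # 227 ppm sample at 64E,129N
              [71.0, 140.0, 1.0]])   # 606 ppm sample at 71E,140N
z = np.array([696.0, 227.0, 606.0])

a, b, c = np.linalg.solve(A, z)      # coefficients of the plane
v_hat = a * 65 + b * 137 + c         # triangulation estimate at 65E,137N
```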
The method of triangulation (3/9)
• The solution to these three simultaneous equations is
a = −11.250  b = 41.614  c = −4421.159
• which gives us the following equation as our triangulation estimator:
V = −11.250x + 41.614y − 4421.159
• This is the equation of the plane that passes through the three nearby samples we have chosen.
• Using this equation we can now estimate the value at any location simply by substituting the appropriate easting and northing. Substituting the coordinates x = 65 and y = 137 into our equation gives us an estimate of 548.7 ppm at the location 65E,137N.
The method of triangulation (4/9)
[Figure: contours of the estimated V values that this equation produces]
[Figures: the method of triangulation, parts (5/9) to (9/9)]
Local Sample Mean
• The mean of the seven nearby samples shown in the figure is 603.7 ppm. This estimate is much higher than the triangulation estimate.
• The two samples with V values greater than 750 ppm in the eastern half of the figure receive more than 25% of the total weight and therefore have a considerable influence on our estimated value.
Inverse Distance Methods (1/6)
• One obvious way to weight nearby samples is to make the weight for each sample inversely proportional to its distance from the point being estimated:
 v̂ = ( Σi vi/di ) / ( Σi 1/di )
• d1, . . . , dn are the distances from each of the n sample locations to the point being estimated and v1, . . . , vn are the sample values.
Inverse Distance Methods (2/6)
[Table: inverse distance weighting calculations for sample values in the vicinity of 65E,137N]
Inverse Distance Methods (3/6)
• The nearest sample, the 696 ppm sample at 63E, 140N,
receives about 26% of the total weight, while the farthest
sample, the 783 ppm sample at 75E, 128N, receives less
than 7%. A good example of the effect of the inverse
distance weighting can be found in a comparison of the
weights given to the 477 ppm sample and the 791 ppm sample.
• The 791 ppm sample at 73E, 141N is about twice as far
away from the point we are trying to estimate as the 477
ppm sample at 61E, 139N; the 791 ppm sample therefore
receives about half the weight of the 477 ppm sample.
• Using the weights given in previous Table our inverse
distance estimate of the V value at 65E, 137N is 594 ppm.
Inverse Distance Methods (4/6)
• The inverse distance estimator given above can easily be adapted to include a broad range of estimates. Rather than using weights that are inversely proportional to the distance, we can make the weights inversely proportional to any power p of the distance:
 v̂ = ( Σi vi/di^p ) / ( Σi 1/di^p )
Inverse Distance Methods (5/6)
[Table: the effect of the inverse distance exponent on the sample weights and on the V estimate]
Inverse Distance Methods (6/6)
• Different choices of the exponent p will result
in different estimates.
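A sketch of inverse distance weighting in Python. The sample list below uses only the six samples whose coordinates are quoted in the text (the seventh sample's coordinates are not given on these slides, so the result will not exactly match the 594 ppm estimate quoted for the full set); varying the exponent p shows how different powers change the estimate:

```python
import math

def idw_estimate(samples, x0, y0, p=1.0):
    """Inverse-distance-weighted estimate at (x0, y0); weights ~ 1/d^p."""
    num = den = 0.0
    for x, y, v in samples:
        d = math.hypot(x - x0, y - y0)
        if d == 0.0:
            return v                 # exact hit: honour the sample value
        w = 1.0 / d ** p
        num += w * v
        den += w
    return num / den

# Samples (easting, northing, V in ppm) quoted in the text
samples = [(63, 140, 696), (64, 129, 227), (71, 140, 606),
           (75, 128, 783), (73, 141, 791), (61, 139, 477)]

v1 = idw_estimate(samples, 65, 137, p=1)   # inverse distance
v2 = idw_estimate(samples, 65, 137, p=2)   # inverse squared distance
```

Larger p concentrates the weight on the nearest samples; as p grows very large the estimate approaches the nearest-neighbour (polygon) value.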
Search Neighbourhoods
• For the case studies we perform in this
chapter, we use a circular search
neighbourhood with a radius of 25 m.
• All samples that fall within 25 m of the point
we are estimating will be included in the
estimation procedure.
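A search neighbourhood like this is just a distance filter applied before estimation; a minimal sketch (assuming the coordinates and radius share the same metric units):

```python
import math

def within_radius(samples, x0, y0, radius=25.0):
    """Keep only (x, y, value) samples within `radius` of (x0, y0)."""
    return [(x, y, v) for x, y, v in samples
            if math.hypot(x - x0, y - y0) <= radius]

# Hypothetical samples: only the first lies within 25 m of the origin
nearby = within_radius([(3, 4, 1.0), (30, 0, 2.0)], 0, 0)
```

The filtered list can then be passed to any of the point estimation methods discussed in this chapter.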
The First Law of Geography
Tobler’s Law:
• The central tenet of Geography is that
location matters for understanding a wide
variety of phenomena.
• Everything is related to everything else, but things that are closer together are more related to each other than those that are further apart.
Unit 3: The First Law of Geography
Geographers’ Perspectives on the World
• Location matters
– Real-world relationships
– Horizontal connections between places
– Importance of scale (both in time and space)
Geographic Information
• Includes knowledge about where something is
• Includes knowledge about what is at a given
location
• Can be very detailed:
– e.g. the locations of all buildings in a city or the
– locations of all trees in a forest stand
• Or it can be very coarse:
– e.g. the population density of an entire country or the
global sea surface temperature distribution
• There is always a spatial component associated
with geographic information
Practical 4 to 5: Exploratory data analysis
• Non-geostatistical interpolation (ArcGIS and/or QGIS)
– Inverse distance
– Closest point
– Moving average
– Least square polynomial
– Spline
– Triangulation
• Individual report (short report)
– What are the required inputs (data and parameters)?
– What are the outputs?
– Compare the different methods and/or parameters.
4. Characterizing spatial process
• Covariance,
• Correlation and variogram.
• Understanding and measure of similarity
between different data.
variogram
• The most common way to visualize local
spatial dependence is the variogram, also
called (for historical reasons) the
semivariogram.
• To understand this, we have to first define the
semivariance as a mathematical measure of
the difference between the two points in a
point-pair.
The semi-variogram is
based on modelling the
(squared) differences in
the z-values as a function
of the distances between
all of the known points.
Semivariance
• This is a mathematical measure of the difference between the two points in a point-pair. It is expressed as a squared difference so that the order of the points doesn't matter (i.e. subtraction in either direction gives the same result). Each pair of observation points has a semivariance, usually represented by the Greek letter γ (`gamma'), and defined as:

γ = (1/2) · [z(x1) − z(x2)]²

• where x is a geographic point and z(x) is its attribute value.
• (Note: The `semi' refers to the factor 1/2, because there are two ways to compute it for the same point pair.)
• So, the semivariance between two points is half the squared difference between their values. If the values are similar, the semivariance will be small.
127
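The definition above is small enough to express directly in code. A minimal sketch (Python; the function name is my own choice):

```python
# Semivariance of a single point-pair: half the squared difference
# between the attribute values z(x1) and z(x2).

def semivariance(z1: float, z2: float) -> float:
    """Return 0.5 * (z1 - z2) ** 2; symmetric in its arguments."""
    return 0.5 * (z1 - z2) ** 2

# The order of the points does not matter:
print(semivariance(3.0, 7.0), semivariance(7.0, 3.0))  # 8.0 8.0
```

Squaring is what makes the measure order-independent, exactly as the slide notes.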
Unit
4:
Characterizing
spatial
process
Point pair
• Now we know two things about a point-pair:
1. The distance between them in geographic space;
2. The semivariance between them in attribute
space.
• So . . . it seems natural to see if points that are
`close by' in geographical space are also `close
by' in attribute space.
• This would be evidence of local spatial
dependence.
129
The variogram cloud
• This is a graph showing semivariances between
all point-pairs:
– X-axis: The separation distance within the point-pair
– Y-axis: The semivariance
• Advantage: Shows the comparison between all
point-pairs as a function of their separation;
• Advantage: Shows which point-pairs do not fit the
general pattern
• Disadvantage: too many graph points, hard to
interpret
130
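Assembling the cloud can be sketched as follows (Python; each of the n(n−1)/2 point-pairs contributes one graph point — hence "too many graph points" for large datasets):

```python
from itertools import combinations
from math import hypot

def variogram_cloud(points):
    """points: iterable of (x, y, z) tuples.
    Returns one (separation distance, semivariance) pair per point-pair."""
    cloud = []
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        h = hypot(x2 - x1, y2 - y1)      # x-axis: separation distance
        gamma = 0.5 * (z1 - z2) ** 2     # y-axis: semivariance
        cloud.append((h, gamma))
    return cloud

pts = [(0, 0, 10.0), (1, 0, 12.0), (0, 1, 11.0)]
print(variogram_cloud(pts))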
Unit
4:
Characterizing
spatial
process
131
Unit
4:
Characterizing
spatial
process
Q
132
Unit
4:
Characterizing
spatial
process
variogram cloud
• Clearly, the variogram cloud gives too much
information. If there is a relation between
separation and semi-variance, it is hard to see.
• The usual way to visualize this is by grouping the
point-pairs into lags or bins according to some
separation range, and computing some
representative semi-variance for the entire lag.
• Often this is the arithmetic average, but not
always.
133
Unit
4:
Characterizing
spatial
process
134
Origins
• Involve a set of statistical techniques called
Kriging (there are a bunch of different Kriging
methods)
• Kriging is named after Danie Gerhardus Krige, a
South African mining engineer who presented
the ideas in his masters thesis in 1951. These
ideas were later formalized by a prominent
French mathematician Georges Matheron
• For more information, see:
– Krige, Danie G. (1951). "A statistical
approach to some basic mine valuation
problems on the Witwatersrand". J. of the
Chem., Metal. and Mining Soc. of South
Africa 52 (6): 119–139.
– Matheron, Georges (1962). Traité de
géostatistique appliquée, Editions Technip,
France
• Kriging has two parts: the quantification of the
spatial structure in the data (called variography)
and prediction of values at unknown points
Souce of this information: http://en.wikipedia.org/wiki/Daniel_Gerhardus_Krige
Georges Matheron
Danie Gerhardus Krige
135
Motivating Example: Ordinary Kriging
• Imagine we have data on the concentration of gold (denote it by Y) in
western Pennsylvania at a set of 200 sample locations (call them points
p1…p200).
• Since Y has a meaningful value at every point, our goal is to create a
prediction surface for the entire region using these sample points
• Notation: In this western PA region, Y(p) will denote the concentration
level of gold at any point p.
136
Global and Local Structure
• Without any a priori knowledge about the distribution of gold in Western PA,
we have no theoretical reason to expect to find different concentrations of
gold at different locations in that region.
– I.e., theoretically, the expected value of gold concentration should not vary with
latitude and longitude
– In other words, we would expect that there is some general, average, value of
gold concentration (called global structure) that is constant throughout the region
(even though we assume it’s constant, we do not know what its value is)
• Of course, when we look at the data, we see that there is some variability in
the gold concentrations at different points. We can consider this to be a local
deviation from the overall global structure, known as the local structure or
residual or error term.
• In other words, geostatisticians would decompose the value of gold Y(p) into
the global structure μ(p) and local structure ε(p).
• Y(p) = μ(p) + ε(p)
137
ε(p)
• As per the First Law of Geography, the local
structures ε(p) of nearby observations will often be
correlated. That is, there is still some meaningful
information (i.e., spatial dependencies) that can be
extracted from the spatially dependent component
of the residuals.
• So, our ordinary kriging model will:
– Estimate this constant but unknown global structure μ(p),
and
– Incorporate the dependencies among the residuals ε(p).
Doing so will enable us to create a continuous surface of
gold concentration in western PA.
138
Assumptions of Ordinary Kriging
• For the sake of the methods that we will be employing, we need to
make some assumptions:
– Y(p) should be normally distributed
– The global structure μ(p) is constant and unknown (as in the
gold example)
– Covariance between values of ε depends only on distance
between the points,
• To put it more formally, for each distance h and each pair of
locations p and t within the region of interest that are h units are
apart, there exists a common covariance value, C(h), such that
covariance [ε(p), ε(t)] = C(h).
• This is called isotropy
139
Covariance and Distance
• From the First Law of Geography it would then follow that as distance
between points increases, the similarity (i.e., covariance or correlation)
between the values at these points decreases
• If we plot this out, with inter-point distance h on the x-axis, and
covariance C(h) on the y-axis, we get a graph that looks something like
the one below. This representation of covariance as a function of distance
is called as the covariogram
• Alternatively, we can plot correlation against distance (the correlogram)
140
Covariograms and Weights
• Geostatistical methods incorporate this covariance-
distance relationship into the interpolation models
– More specifically, this information is used to calculate the
weights
– As IDW, kriging is a weighted average of points in the
vicinity
• Recall that in IDW, in order to predict the value at an unknown
point, we assume that nearer points will have higher weights (i.e.,
weights are determined based on distance)
• In geostatistical techniques, we calculate the distances between
the unknown point at which we want to make a prediction and
the measured points nearby, and use the value of the
covariogram for those distances to calculate the weight of each
of these surrounding measured points.
– I.e., the weight of a point h units away will depend on the value of
C(h)
141
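For contrast, the IDW weighting recalled above can be sketched like this (Python; the inverse-square power is an illustrative assumption — IDW implementations let you choose the exponent):

```python
def idw_weights(distances, power=2.0):
    """Inverse-distance weights for the measured points surrounding the
    unknown location, normalized to sum to 1. Nearer points weigh more."""
    raw = [1.0 / d ** power for d in distances]
    total = sum(raw)
    return [w / total for w in raw]

# Three measured points at distances 1, 2 and 4 from the unknown point:
w = idw_weights([1.0, 2.0, 4.0])
print([round(x, 4) for x in w])
```

In kriging the `1/d**power` term would instead be replaced by the covariogram value C(d), so the weights reflect the data's own spatial structure rather than a fixed distance-decay rule.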
But…
• Unfortunately, it so happens that one generally cannot estimate
covariograms and correlograms directly
• For that purpose, a related function of distance (h) called the semi-
variogram (or simply the variogram) is calculated
– The variogram is denoted by γ(h)
– One can easily obtain the covariogram from the variogram (but not the
other way around)
• Covariograms and variograms tell us the spatial structure of the data
Covariogram C(h) Variogram γ(h)
142
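The "easy" direction rests on the identity C(h) = sill − γ(h), which holds under the stationarity assumptions already made. A one-line sketch:

```python
def covariogram(gamma_h: float, sill: float) -> float:
    """C(h) = sill - gamma(h) under second-order stationarity.
    Going the other way requires the sill, which the variogram alone
    may never reach -- hence 'not the other way around'."""
    return sill - gamma_h

print(covariogram(0.0, 10.0))   # at h = 0: covariance equals the sill
print(covariogram(10.0, 10.0))  # once gamma reaches the sill: covariance is 0
```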
Interpretation of Variograms
• As mentioned earlier, a covariogram might be thought of as covariance (i.e., similarity)
between point values as a function of distance, such that C(h) is greater at smaller
distances
• A variogram, on the other hand, might be thought of as “dissimilarity between point
values as a function of distance”, such that the dissimilarity is greater for points that are
farther apart
• Variograms are usually interpreted in terms of the corresponding covariograms or
correlograms
• A common mistake when interpreting variograms is to say that variance increases with
distance.
Covariogram C(h) Variogram γ(h)
143
• When there are n points, the number of inter-point distances is equal to n(n − 1)/2
• Example:
– With 15 points, we have 15(15-1)/2 = 105 inter-point distances (marked in yellow on the grid in the
lower left)
– Since we’re using Euclidean distance, the distance between points 1 and 2 is the same as the
distance between points 2 and 1, so we count it only once. Also, the distance between a point and
itself will always be zero, and is of no interest here.
• The maximum distance h on a covariogram or variogram is called the bandwidth,
and should equal half the maximum inter-point distance.
– In the figure on the lower right, the blue line connects the points that are the farthest away from
each other. The bandwidth in this example would then equal to half the length of the blue line
Bandwidth (The Maximum Value of h)
144
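Both rules on this slide are easy to check in code (Python sketch; function names are my own):

```python
from itertools import combinations
from math import hypot

def n_pairs(n: int) -> int:
    """Number of distinct inter-point distances among n points."""
    return n * (n - 1) // 2

def bandwidth(points) -> float:
    """Half the maximum inter-point distance ('the blue line' rule)."""
    dmax = max(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in combinations(points, 2))
    return dmax / 2.0

print(n_pairs(15))                          # 105, as in the example
print(bandwidth([(0, 0), (3, 4), (1, 1)]))  # farthest pair is 5 apart -> 2.5
```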
Mathematical definition of a variogram
• In other words, for each distance h between 0 and the bandwidth
– Find all pairs of points i and j that are separated by that distance h
– For each such point pair, subtract the value of Y at point j from the
value of Y at point i, and square the difference
– Average these squared differences across all point pairs and divide the average by 2. That's your variogram value!
• Division by 2 -> hence the occasionally used name semi-variogram
• However, in practice, there will generally be only one pair of points that
are exactly h units apart, unless we’re dealing with regularly spaced
samples. Therefore, we create “bins”, or distance ranges, into which we
place point pairs with similar distances, and estimate γ only for midpoints
of these bins rather than at all individual distances.
– These bins are generally of the same size
– It’s a rule of thumb to have at least 30 point pairs per bin
• We call these estimates of γ(h) at the bin midpoints the empirical
variogram
γ(h) = (1/2) · average[ (Y(i) − Y(j))² ]  over all point pairs (i, j) separated by distance h
145
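Those steps translate almost line-for-line into code. A sketch (Python; equal-width bins as recommended above, names my own):

```python
from itertools import combinations
from math import hypot

def empirical_variogram(points, bin_width, max_dist):
    """points: list of (x, y, z). For each distance bin, average the
    squared value differences over the point-pairs falling in the bin
    and halve it. Returns (bin midpoint, gamma, pair count) tuples."""
    n_bins = int(max_dist / bin_width)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        h = hypot(x2 - x1, y2 - y1)
        if h >= max_dist:
            continue
        b = int(h / bin_width)
        sums[b] += (z1 - z2) ** 2
        counts[b] += 1
    return [((b + 0.5) * bin_width, 0.5 * sums[b] / counts[b], counts[b])
            for b in range(n_bins) if counts[b] > 0]

pts = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 4.0)]
print(empirical_variogram(pts, bin_width=1.0, max_dist=3.0))
```

With real data one would use far more points per bin (the rule of thumb above: at least 30).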
Fitting a Variogram Model
• Now, we’re going to fit a variogram model (i.e., curve) to the
empirical variogram
• That is, based on the shape of the empirical variogram,
different variogram curves might be fit
• The curve fitting generally employs the method of least
squares – the same method that’s used in regression analysis
A very comprehensive guide on variography by Dr. Tony Smith (University of Pennsylvania)
http://www.seas.upenn.edu/~ese502/NOTEBOOK/Part_II/4_Variograms.pdf
146
The Variogram Parameters
• The variogram models are a function of three parameters, known as the range, the sill, and
the nugget.
– The range, denoted r, is typically the value of h at which the correlation between point values is zero (i.e., there is no longer any spatial autocorrelation)
– The value of γ at r is called the sill, and is generally denoted by s
• The variance of the sample is used as an estimate of the sill
– Different models have slightly different definitions of these parameters
– The nugget deserves a slide of its own
Graph taken from: http://www.geog.ubc.ca/courses/geog570/talks_2001/Variogr1neu.gif
147
Spatial Independence at Small Distances
• Even though we assume that values at points that are very
near each other are correlated, points that are separated by
very, very small values might be considerably less correlated
– E.g.: you might find a gold nugget and no more gold in the vicinity
• In other words, even though γ(0) is always 0, γ at very, very small distances may equal a value a that is considerably greater than 0.
• This value denoted by a is called the nugget
• The ratio of the nugget to the sill is known as the nugget
effect, and may be interpreted as the percentage of variation
in the data that is not spatial
• The difference between the sill and the nugget is known as
the partial sill
– The partial sill, and not the sill itself, is reported in GeoStatistical
Analyst
148
Pure Nugget Effect Variograms
• Pure nugget effect is when the covariance between point values is zero at
all distances h
• That is, there is absolutely no spatial autocorrelation in the data (even at
small distances)
• Pure nugget effect covariogram and variogram are presented below
• Interpolation won’t give a reasonable predictions
• Most cases are not as extreme and have both a spatially dependent and a
spatially independent component, regardless of variogram model chosen
(discussed on following slides)
149
The Spherical Model
• The spherical model is the most widely used variogram model
• Monotonically non-decreasing
– I.e., as h increases, the value of γ(h) does not decrease: it goes up (while h ≤ r) or stays the same (h > r)
• γ(h≥r)=s and C(h≥r)=0
– That is, covariance is assumed to be exactly zero at distances h≥r
150
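A sketch of the spherical model in its usual parameterization (Python; the 1.5(h/r) − 0.5(h/r)³ polynomial is the standard form, with nugget, sill and range as defined on the previous slides):

```python
def spherical(h, nugget, sill, range_):
    """Spherical variogram model: gamma rises as a cubic polynomial in
    h/range_ until h reaches the range, then stays flat at the sill."""
    if h == 0:
        return 0.0                    # gamma(0) is always 0
    if h >= range_:
        return sill                   # gamma(h >= r) = s, so C(h >= r) = 0
    hr = h / range_
    return nugget + (sill - nugget) * (1.5 * hr - 0.5 * hr ** 3)

print(spherical(50.0, 2.0, 10.0, 100.0))   # part-way up the curve: 7.5
print(spherical(150.0, 2.0, 10.0, 100.0))  # beyond the range: 10.0
```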
The Exponential Model
• The exponential variogram looks very similar to the spherical model, but assumes
that the correlation never reaches exactly zero, regardless of how great the
distances between points are
• In other words, the variogram approaches the value of the sill asymptotically
• Because the sill is never actually reached, the range is generally considered to be
the smallest distance after which the covariance is 5% or less of the maximum
covariance
• The model is monotonically increasing
– I.e., as h goes up, so does γ(h)
151
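A sketch of the exponential model (Python). The factor 3 in the exponent is a common convention that makes γ reach about 95% of the sill at h = range, matching the "5% or less of the maximum covariance" definition above; some texts fold that factor into the range parameter instead.

```python
from math import exp

def exponential(h, nugget, sill, range_):
    """Exponential variogram: approaches the sill asymptotically,
    never quite reaching it."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - exp(-3.0 * h / range_))

# At the effective range, gamma is ~95% of the sill:
print(round(exponential(100.0, 0.0, 10.0, 100.0), 3))
```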
The Wave (AKA Hole-Effect) Model
On the picture to the left, the waves exhibit a
periodic pattern. A non-standard form of spatial
autocorrelation applies. Peaks are similar in values
to other peaks, and troughs are similar in values to
other troughs. However, note the dampening in the
covariogram and variogram below: That is, peaks
that are closer together have values that are more
correlated than peaks that are farther apart (and the
same holds for troughs).
More is said about the applicability of these models in
http://www.gaa.org.au/pdf/gaa_pyrcz_deutsch.pdf
Variogram graph edited slightly from:
http://www.seas.upenn.edu/~ese502/NOTEBOOK/Part_
II/4_Variograms.pdf
152
Variograms and Kriging Weights
153
5. VARIOGRAM MODELING/ANALYSIS
154
The empirical variogram
• To summarize the variogram cloud, group the separations
into lags (separation bins, like a histogram)
• Then, compute the average semivariance of all the point-
pairs in the bin.
• This is the empirical variogram, as the so-called Matheron estimator:

γ̂(h) = (1 / 2m(h)) · Σi [z(xi) − z(xi + h)]²

– m(h) is the number of point pairs separated by vector h, in practice some range (bin)
– These are indexed by i; the notation z(xi + h) means the “tail” of point-pair i, i.e. separated from the “head” xi by the separation vector h.
155
Unit 5: Variogram Modeling/Analysis
156
Defining the bins
• There are some practical considerations, just like defining bins for a histogram:
– Each bin should have enough points to give a robust estimate of the
representative semi-variance; otherwise the variogram is erratic;
– If a bin is too wide, the theoretical variogram model will be hard to
estimate and fit; note we haven't seen this yet, it is in the next lecture;
– The largest separation should not exceed half the longest separation in
the dataset;
– In general the largest separation should be somewhat shorter, since it
is the local spatial dependence which is most interesting.
• All computer programs that compute variograms use some defaults
for the largest separation and number of bins; gstat uses 1/3 of the
longest separation, and divides this into 15 equal-width bins.
157
Numerical example of an empirical
variogram
• Here is an empirical
variogram of log10Pb
from the Jura soil
samples; for simplicity
the maximum
separation was set to
1.5 km:
– np are the number of
point-pairs in the bin;
dist is the average
separation of these
pairs; gamma is the
average semivariance
in the bin.
158
Plotting the empirical variogram
• This can be plotted as semivariance gamma
against average separation dist, along with the
number of points that contributed to each
estimate np.
160
Features of the empirical variogram
• Later we will look at fitting a theoretical model to the
empirical variogram; but even without a model we can
notice some features which characterize the spatial
dependence, which we define here only qualitatively:
– Sill: maximum semi-variance
• represents variability in the absence of spatial dependence
– Range: separation between point-pairs at which the sill is
reached
• distance at which there is no evidence of spatial dependence
– Nugget: semi-variance as the separation approaches zero
• represents variability at a point that can't be explained by spatial
structure
163
[Figure: semivariogram — semivariance (0–60) vs. lag (0–200 m), with the sill, range, and nugget annotated]
[Figure: semivariogram — semivariance vs. lag (m), partitioned into its spatially dependent and spatially independent components]
Semivariogram uses
• Use the range to determine maximum sampling distances
• The sill indicates intra-field variability
• The model can be used for interpolation
of values in unsampled areas
Effect of bin width
• The same set of points can be displayed with many bin
widths
• This has the same effect as different bin widths in a
univariate histogram: same data, different visualization
• In addition, visual and especially automatic variogram
fitting is affected
• Wider (fewer) bins → less detail, also less noise
• Narrower (more) bins → more detail, but also more noise
• General rule:
– as narrow as possible (detail) without “too much” noise;
– and with sufficient point-pairs per bin (> 100, preferably > 200)
169
170
Evidence of spatial dependence
• The empirical variogram provides evidence that there is
local spatial dependence.
• The variability between point-pairs is lower if they are
closer to each other; i.e. the separation is small.
• There is some distance, the range, where this effect is noted; beyond the range there is no dependence.
• The relative magnitude of the total sill and nugget give the
strength of the local spatial dependence; the nugget
represents completely unexplained variability.
• There are of course variables for which there is no spatial
dependence, in which case the empirical variogram has the
sill equal to the nugget; this is called a pure nugget effect
• The next graph shows an example.
171
[Figure: empirical variogram showing a pure nugget effect]
172
Visualizing anisotropy
• Anisotropy
• Variogram surfaces
• Directional variograms
173
What?
• We have been considering spatial dependence as if it is the same in
all directions from a point (isotropic or omnidirectional).
• For example, if I want to know the weather at a point where there
is no station, I can equally consider stations at some distance from
my location, no matter whether they are N, S, E or W.
• But this is self-evidently not always true! In this example, suppose the winds almost always blow from the North. Then the temperatures recorded at stations 100 km to the N or S of me will likely be closer to the temperature at my station than temperatures recorded at stations 100 km to the E or W.
• We now see how to detect anisotropy.
174
Anisotropy
• Greek “iso-” + “-tropic” = English “same” + “trend”; Greek “an-” = English “not-”
• Variation may depend on direction, not just
distance
• This is why we refer to the separation vector; up
till now this has just meant distance, but now it
includes direction
– Case 1: same sill, different ranges in different directions
(geometric, also called affine, anisotropy)
– Case 2: same range, sill varies with direction (zonal
anisotropy)
175
Spatial trends
• Isotropic - trend is a function of distance from
a known (sampled) point only
• Anisotropic - trend is a function of both
distance and direction from a known point
How can anisotropy arise?
• Directional process
– Example: sand content in a narrow flood plain:
much greater spatial dependence along the axis
parallel to the river
– Example: population density in a hilly terrain with
long, linear valleys
• Note that the nugget must logically be
isotropic: it is variation at a point (which has
no direction)
177
How do we detect anisotropy?
1. Looking for directional patterns in the post-plot;
2. With a variogram surface, sometimes called a
variogram map;
3. Computing directional variograms, where we consider only point pairs separated not just by a given distance but also lying in a given horizontal direction from each other.
• We can compute different directional variograms
and see if they have different structure.
178
Detecting anisotropy with a variogram
surface
• One way to see anisotropy is with a variogram surface, sometimes called a variogram map.
• This is not a map! but rather a plot of semivariances vs.
distance and direction (the separation vector)
• Each grid cell shows the semivariance at a given
distance and direction separation (lag)
• Symmetric by definition, can be read in either direction
• A transect from the origin to the margin gives a
directional variogram (next visualization technique)
179
Reviewing Ordinary Kriging
• Again, ordinary kriging will:
– Give us an estimate of the constant but unknown global
structure μ(p), and
– Use variography to examine the dependencies among the
residuals ε(p) and to create kriging weights.
• We calculate the distances between the unknown point at which
we want to make a prediction and the measured points that are
nearby and use the value of the covariogram for those distances to
calculate the weight of each of these surrounding measured
points.
• The end result is, of course, a continuous prediction
surface
• Prediction standard errors can also be obtained – this
is a surface indicating the accuracy of prediction
185
• Now, take another example: imagine we have data on the
temperature at 100 different weather stations (call them
w1..w100) throughout Florida, and we want to predict the
values of temperature (T) at every point w in the entire state
using these data.
• Notation: temperature at point w is denoted by T(w)
• We know that temperatures at lower latitudes are expected to be higher. So, T(w) will be expected to vary with latitude
– Ordinary kriging is not appropriate here, because it assumes that the
global structure is the same everywhere. This is clearly not the case
here.
– A method called universal kriging allows for a non-constant global
structure
• We might model the global structure μ as in regression:

μ(w) = β0 + β1 · latitude(w)

• Everything else in universal kriging is pretty much the same as in ordinary kriging (e.g., variography)
Universal Kriging
186
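Estimating such a trend is ordinary least squares. A minimal sketch (Python; the station values below are invented for illustration):

```python
def fit_trend(lats, temps):
    """Least-squares estimates of b0, b1 in mu(w) = b0 + b1 * latitude(w)."""
    n = len(lats)
    mean_x = sum(lats) / n
    mean_y = sum(temps) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(lats, temps))
          / sum((x - mean_x) ** 2 for x in lats))
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Temperatures falling one degree per degree of latitude (made-up data):
b0, b1 = fit_trend([25.0, 27.0, 29.0, 31.0], [28.0, 26.0, 24.0, 22.0])
print(b0, b1)
```

Universal kriging would then carry out variography and kriging on the residuals around this fitted trend.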
Some More Advanced Techniques
• Indicator Kriging is a geostatistical interpolation method that does not require the data to be normally distributed.
• Co-kriging is an interpolation technique that is used
when there is a second variable that is strongly
correlated with the variable from which we’re trying to
create a surface, and which is sampled at the same set
of locations as our variable of interest and at a number
of additional locations.
• For more details on indicator kriging and co-kriging,
see one of the texts suggested at the end of this
presentation
187
Isotropy vs. Anisotropy
• When we use isotropic (or omnidirectional)
covariograms, we assume that the covariance
between the point values depends only on distance
– Recall the covariance stationarity assumption
• Anisotropic (or directional) covariograms are used
when we have reason to believe that direction plays
a role as well (i.e., covariance is a function of both
distance and direction)
– E.g., in some problems, accounting for direction is
appropriate (e.g., when wind or water currents might be a
factor)
For more on anisotropic variograms, see http://web.as.uky.edu/statistics/users/yzhen8/STA695/lec05.pdf
188
IDW vs. Kriging
• We get a more “natural” look to the data with Kriging
• You see the “bulls eye” effect in IDW but not (as much) in Kriging
• Kriging helps to compensate for the effects of data clustering, assigning individual points within a cluster less weight than isolated data points (or, treating clusters more like single points)
• Kriging also gives us a standard error
• If the data locations are quite dense and uniformly distributed throughout the area of interest, we will get
decent estimates regardless of which interpolation method we choose.
• On the other hand, if the data locations fall in a few clusters and there are gaps in between these clusters,
we will obtain pretty unreliable estimates regardless of whether we use IDW or Kriging.
These are interpolation results using the gold data in Western PA (IDW vs. Ordinary Kriging)
6. KRIGING (SPATIAL ESTIMATION)
189
Why other methods than simple interpolation?
• In next units we will look at ordinary kriging
• Ordinary kriging is “linear” because
– its estimates are weighted linear combinations of the available data; it is “unbiased” since it tries to have mR, the mean residual or error, equal to 0;
– it is “best” because it aims at minimizing σ²R (the variance of the errors).
• All of the other estimation methods we have seen so
far are also linear and, as we have already seen, are
also theoretically unbiased.
• The distinguishing feature of ordinary kriging,
therefore, is its aim of minimizing the error variance.
190
Unit 6: Kriging (spatial estimation)
How to deal with error
• The importance of this for ordinary kriging is that
– we never know mR and therefore cannot guarantee
that it is exactly 0.
– Nor do we know σ²R; therefore, we cannot minimize it.
• The best we can do is to build a model of the data
we are studying and work with the average error
and the error variance for the model.
191
variance
• In ordinary kriging, we use a probability model in which the bias and the error variance can both be calculated, and then choose weights for the nearby samples that ensure that the average error for our model, mR, is exactly 0 and that our modeled error variance, σ²R, is minimized.
192
ordinary kriging system
• This system of equations, often referred to as the ordinary kriging system, can be written in matrix notation as C · w = D, where C is the covariance matrix between the sample locations, w the vector of weights, and D the vector of covariances between the samples and the estimation point (both matrices are built below).
193
weights
• To solve for the weights, we multiply both sides of the previous equation by C⁻¹, the inverse of the left-hand-side covariance matrix: w = C⁻¹ · D
194
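These two steps — build the system, then multiply by C⁻¹ — can be sketched with a small stdlib-only solver (Python; the toy covariance numbers are invented, and the row of ones with the Lagrange multiplier is the standard unbiasedness augmentation of the ordinary kriging system):

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (stdlib only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ok_weights(C, D):
    """Ordinary kriging: augment C and D with the unbiasedness constraint
    (weights sum to 1) plus a Lagrange multiplier, then solve."""
    n = len(C)
    A = [C[i][:] + [1.0] for i in range(n)] + [[1.0] * n + [0.0]]
    sol = solve(A, D[:] + [1.0])
    return sol[:n], sol[n]   # kriging weights, Lagrange multiplier

# Two samples equally similar to the target point share the weight:
w, mu = ok_weights([[10.0, 4.0], [4.0, 10.0]], [6.0, 6.0])
print(w)
```

The estimate at the target is then the weighted sum of the sample values, exactly as in the seven-sample example that follows.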
An Example of Ordinary Kriging
• Let us return to the seven sample data
configuration we used earlier to see a specific
example of how ordinary kriging is done. The
data configuration is shown again in next
slides; we have labelled the point we are
estimating as location 0, and the sample
locations as 1 through 7. The coordinates of
these eight points are given in Table following
the figure, along with the available sample
values.
195
An example of a data configuration
196
• An example of
a data
configuration
to illustrate the
kriging
estimator.
• The sample
value is given
immediately to
the right of the
plus sign.
197
Coordinates and sample values for the data shown
in previous Figure
Pattern of spatial continuity
• To calculate the ordinary kriging weights, we
must first decide what pattern of spatial
continuity we want our random function
model to have.
198
covariances
• To keep this example relatively simple, we will
calculate all of our covariances from the
following function:
199
An example of an exponential covariance function.
variogram
• The covariance function corresponds to the
following variogram:
200
An example of an exponential
variogram model
Remark on the covariance & variogram model
– Co
• commonly called the
nugget effect
• provides a
discontinuity at the
origin.
201
– a
• commonly called the range
• provides a distance beyond which the variogram or covariance value remains essentially constant.
– Co + C1
• commonly called the sill
• is the variogram value for very large distances, γ(∞); it is also the covariance value for |h| = 0, and the variance of our random variables, σ².
Both of these functions, shown in the previous two
slides, can be described by the following parameters:
• Geostatisticians normally define the spatial continuity
in their random function model through the variogram
and solve the ordinary kriging system using the
covariance. In this example, we will use the covariance
function throughout.
• By using the covariance function, we have chosen to
ignore the possibility of anisotropy for the moment;
the covariance between the data values at any two
locations will depend only on the distance between
them and not on the direction. Later, when we
examine the effect of the various parameters, we will
also study the important effect of anisotropy.
202
203
A table of distances, from the previous Figure,
between all possible pairs of the seven data
locations.
• To demonstrate how ordinary kriging works,
we will use the following parameters for the
function given in the following Equation:
204
• These are not necessarily good choices, but
they will make the details of the ordinary
kriging procedure easier to follow since our
covariance model now has a quite simple
expression:
205
• Having chosen a covariance function from
which we can calculate all the covariances
required for our random function model, we
can now build the C and D matrices.
206
207
Using Table, which provides the distances between
every pair of locations, and Equation above, the C
matrix is
208
209
210
211
The set of weights
that will provide
unbiased estimates
with a minimum
estimation variance is
calculated by multiplying C⁻¹ by D:
212
The ordinary
kriging weights for
the seven samples
using the isotropic
exponential
covariance model
given in Equation
below. The sample
value is given
immediately to the
right of the plus
sign while the
kriging weights are
shown in parentheses.
• Below is shown the sample values along with
their corresponding weights. The resulting
estimate is
213
214
The minimized error variance is expressed as
215
The minimized estimation variance is
Detailed exercise
• Refer to the practical exercise on
– interpolation using IDW
– kriging
216
Spatial Interpolation: A Brief
Introduction
Eugene Brusilovskiy
218
• Introduction to interpolation
• Deterministic interpolation methods
• Some basic statistical concepts
• Autocorrelation and First Law of Geography
• Geostatistical Interpolation
– Introduction to variography
– Kriging models
General Outline
219
What is Interpolation?
• Assume we are dealing with a variable which has meaningful values at every point
within a region (e.g., temperature, elevation, concentration of some mineral).
Then, given the values of that variable at a set of sample points, we can use an
interpolation method to predict values of this variable at every point
– For any unknown point, we take some form of weighted average of the values at
surrounding points to predict the value at the point where the value is unknown
– In other words, we create a continuous surface from a set of points
– As an example used throughout this presentation, imagine we have data on the
concentration of gold in western Pennsylvania at a set of 200 sample locations:
(Figure: input sample points → interpolation process → output surface.)
Appropriateness of Interpolation
• Interpolation should not be used when there isn’t a
meaningful value of the variable at every point in space
(within the region of interest)
• That is, when points represent merely the presence of events
(e.g., crime), people, or some physical phenomenon (e.g.,
volcanoes, buildings), interpolation does not make sense.
• Whereas interpolation tries to predict the value of your
variable of interest at each point, density analysis (available,
for instance, in ArcGIS’s Spatial Analyst) “takes known
quantities of some phenomena and spreads it across the
landscape based on the quantity that is measured at each
location and the spatial relationship of the locations of the
measured quantities”.
– Source: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=Understanding_density_analysis
Interpolation vs. Extrapolation
• Interpolation is prediction within the range of our data
– E.g., having temperature values for a bunch of
locations all throughout PA, predict the temperature
values at all other locations within PA
• Note that the methods we are talking about are strictly
those of interpolation, and not extrapolation
• Extrapolation is prediction outside the range of our data
– E.g., having temperature values for a bunch of
locations throughout PA, predict the temperature
values in Kazakhstan
First Law of Geography
• “Everything is related to everything else, but
near things are more related than distant
things.”
– Waldo Tobler (1970)
• This is the basic premise
behind interpolation, and
near points generally
receive higher weights
than far away points
Reference: TOBLER, W. R. (1970). "A computer movie simulating urban growth in the
Detroit region". Economic Geography, 46(2): 234-240.
Methods of Interpolation
• Deterministic methods
– Use mathematical functions to calculate the values at unknown locations
based either on the degree of similarity (e.g. IDW) or the degree of smoothing
(e.g. RBF) in relation with neighboring data points.
– Examples include:
• Inverse Distance Weighted (IDW)
• Radial Basis Functions (RBF)
• Geostatistical methods
– Use both mathematical and statistical methods to predict values at all
locations within region of interest and to provide probabilistic estimates of the
quality of the interpolation based on the spatial autocorrelation among data
points.
• Include a deterministic component and errors (uncertainty of prediction)
– Examples include:
• Kriging
• Co-Kriging
Reference: http://www.crwr.utexas.edu/gis/gishydro04/Introduction/TermProjects/Peralvo.pdf
Exact vs. Inexact Interpolation
• Interpolators can be either exact or inexact
– At sampled locations, exact interpolators yield values identical to the
measurements.
• I.e., if the observed temperature in city A is 90 degrees, the point
representing city A on the resulting grid will still have the temperature of
90 degrees
– At sampled locations, inexact interpolators predict values that are
different from the measured values.
• I.e., if the observed temperature in city A is 90 degrees, the inexact
interpolator will still create a prediction for city A, and this prediction will
not be exactly 90 degrees
– The resulting surface will not pass through the original point
– Can be used to avoid sharp peaks or troughs in the output surface
• Model quality can be assessed by the statistics of the differences between
predicted and measured values
– Jumping ahead, the two deterministic interpolators that will be briefly
presented here are exact. Kriging can be exact or inexact.
Reference: Burrough, P. A., and R. A. McDonnell. 1998. Principles of geographical information systems. Oxford University Press,
Oxford. 333pp.
Part 1. Deterministic Interpolation
Inverse Distance Weighted (IDW)
• IDW interpolation explicitly relies on the First Law of
Geography. To predict a value for any unmeasured location,
IDW will use the measured values surrounding the prediction
location. Measured values that are nearest to the prediction
location will have greater influence (i.e., weight) on the
predicted value at that unknown point than those that are
farther away.
– Thus, IDW assumes that each measured point has a local influence
that diminishes with distance (or distance to the power of q > 1), and
weighs the points closer to the prediction location greater than those
farther away, hence the name inverse distance weighted.
• Inverse Squared Distance (i.e., q=2) is a widely used interpolator
• For example, ArcGIS allows you to select the value of q.
• Weights of each measured point are proportional to the
inverse distance raised to the power value q. As a result, as
the distance increases, the weights decrease rapidly. How fast
the weights decrease is dependent on the value for q.
Source: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=How_Inverse_Distance_Weighted_(IDW)_interpolation_works
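The weighting scheme described above can be sketched in a short function. The points below are hypothetical, not the gold data; the sketch also shows why IDW is an exact interpolator:

```python
import numpy as np

def idw(xy_known, z_known, xy_target, q=2.0):
    # weights proportional to 1/d^q; an exact hit returns the measured
    # value, which is what makes IDW an exact interpolator
    d = np.linalg.norm(xy_known - xy_target, axis=1)
    if np.any(d == 0):
        return z_known[np.argmin(d)]
    w = 1.0 / d ** q
    return np.sum(w * z_known) / np.sum(w)

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # hypothetical samples
z = np.array([10.0, 20.0, 30.0])
print(idw(pts, z, np.array([0.0, 0.0])))               # exact at a sample: 10.0
print(idw(pts, z, np.array([0.3, 0.3]), q=0.0))        # q = 0: plain mean, 20.0
```

Note that with q = 0 every weight equals 1, so the prediction collapses to the plain average of all samples regardless of distance.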
Inverse Distance Weighted - Continued
• Because things that are close to one another are more alike
than those farther away, as the locations get farther away, the
measured values will have little relationship with the value of
the prediction location.
– To speed up the computation we might only use several points that
are the closest
– As a result, it is common practice to limit the number of measured
values that are used when predicting the unknown value for a location
by specifying a search neighborhood. The specified shape of the
neighborhood restricts how far and where to look for the measured
values to be used in the prediction. Other neighborhood parameters
restrict the locations that will be used within that shape.
• The output surface is sensitive to clustering and the presence
of outliers.
Search Neighborhood Specification
Points with known elevation values that fall outside the circle are too far from the target point (where the elevation value is unknown), so their weights are effectively 0. The 5 nearest neighbors with known values (shown in red) of the unknown point (shown in black) will be used to determine its value.
The Accuracy of the Results
• One way to assess the accuracy of the interpolation is known
as cross-validation
– Remember the initial goal: use all the measured points to create a
surface
– However, assume we remove one of the measured points from our
input, and re-create the surface using all the remaining points.
– Now, we can look at the predicted value at that removed point and
compare it to the point’s actual value!
– We do the same thing for all the points
– If the average (squared) difference between the actual value and the
prediction is small, then our model is doing a good job at predicting
values at unknown points. If this average squared difference is large,
then the model isn’t that great. This average squared difference is
called mean square error of prediction. For instance, the Geostatistical
Analyst of ESRI reports the square root of this average squared
difference
– Cross-validation is used in other interpolation methods as well
A Cross-Validation Example
• Assume you have measurements at 15 data points,
from which you want to create a prediction surface
• The Measured column tells you the measured value
at that point. The Predicted column tells you the
prediction at that point when we remove it from the
input (i.e., use the other 14 points to create a
surface). The Error column is simply the difference
between the measured and predicted values.
• Because we can have an over-prediction or under-
prediction at any point, the error can be positive or
negative. So averaging the errors won’t do us much
good if we want to see the overall error – we’ll end
up with a value that is essentially zero due to these
positives and negatives
• Thus, in order to assess the extent of error in our
prediction, we square each term, and then take the
average of these squared errors. This average is called
the mean squared error (MSE)
• For example, ArcGIS reports the square root of this
mean squared error (referred to as simply Root-
Mean-Square in Geostatistical Analyst). This root
mean square error is often denoted as RMSE.
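The leave-one-out procedure just described can be sketched directly. The predictor here is only a stand-in (a plain average of the remaining points) for whatever interpolator is being validated, and the data are made up:

```python
import numpy as np

def loo_rmse(pts, z, predict):
    # leave-one-out cross-validation: drop each point, predict it from
    # the rest, and summarize the errors as a root-mean-square error
    errs = []
    for i in range(len(pts)):
        mask = np.arange(len(pts)) != i
        errs.append(predict(pts[mask], z[mask], pts[i]) - z[i])
    return np.sqrt(np.mean(np.square(errs)))

def predict_mean(pts_known, z_known, target):
    # stand-in predictor: plain average of the remaining points
    return z_known.mean()

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([1.0, 2.0, 3.0, 4.0])
rmse = loo_rmse(pts, z, predict_mean)
print(rmse)                    # sqrt(20/9), about 1.49, for this toy data
```

Squaring before averaging prevents positive and negative errors from cancelling, exactly as described above.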
Examples of IDW with Different q’s
• Larger q’s (i.e., power to which distance is raised) yield smoother surfaces
• Food for thought: What happens when q is set to 0?
(Figure: gold concentrations at sample locations in western PA, interpolated with q = 1, q = 2, q = 3, and q = 10.)
The Geostatistical Analyst of ArcGIS is
able to tell you the optimal value of q
by seeing which one yields the
minimum RMSE. (Here, it is q=1).
Part 2. A Review of Stats 101
Before we do any Geostatistics…
• … Let’s review some basic statistical topics:
– Normality
– Variance and Standard Deviations
– Covariance and Correlation
• … and then briefly re-examine the underlying
premise of most spatial statistical analyses:
– Autocorrelation
Normality
• A lot of statistical tests – including many in
geostatistics – rely on the assumption that the
data are normally distributed
• When this assumption does not hold, the
results are often inaccurate
(Figure: example distribution, N = 140.)
Data Transformations
• Sometimes, it is possible to transform a variable’s distribution by
subjecting it to some simple algebraic operation.
– The logarithmic transformation is the most widely used to achieve
normality when the variable is positively skewed (as in the image on
the left below)
– Analysis is then performed on the transformed variable.
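A small sketch of the logarithmic transformation on synthetic data: the log of a positively skewed (here, lognormal) sample is symmetric, so its mean and median nearly coincide after the transform:

```python
import numpy as np

rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # positively skewed

logged = np.log(skewed)       # analysis is then performed on `logged`
# before the transform the mean is pulled above the median by the long
# right tail; after the transform the two nearly coincide
print(np.mean(skewed) - np.median(skewed),
      np.mean(logged) - np.median(logged))
```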
The Mean and the Variance
• The mean (average) of a variable is also known as the
expected value
– Usually denoted by the Greek letter μ
– As an aside, for a normally distributed variable, the mean
is equal to the median
• The variance is a measure of dispersion of a variable
– Calculated as the average squared distance of the possible
values of the variable from mean.
– Standard deviation is the square root of the variance
– Standard deviation is generally denoted by the Greek letter σ, and variance is therefore denoted by σ²
Example: Calculation of Mean and Variance
Person   Test Score   Distance from the Mean   (Distance from the Mean) Squared
1         90           15                        225
2         55          -20                        400
3        100           25                        625
4         55          -20                        400
5         85           10                        100
6         70           -5                         25
7         80            5                         25
8         30          -45                       2025
9         95           20                        400
10        90           15                        225

Mean: 75
Variance: 445 (average of the entries in the last column)
Standard deviation (square root of the variance): 21.1
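The table's numbers can be checked directly. Note the slide uses the population variance (division by n, not n − 1):

```python
import numpy as np

scores = np.array([90, 55, 100, 55, 85, 70, 80, 30, 95, 90], dtype=float)

mean = scores.mean()                      # 75.0
variance = np.mean((scores - mean) ** 2)  # population variance: 445.0
std = np.sqrt(variance)                   # 21.09..., i.e. 21.1 rounded
print(mean, variance, round(std, 1))
```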
Covariance and Correlation
• Defined as a measure of how much two variables X
and Y change together
– The units of Cov (X, Y) are those of X multiplied by those of Y
– The covariance of a variable X with itself is simply the
variance of X
• Since these units are fairly obscure, a dimensionless
measure of the strength of the relationship between
variables is often used instead. This measure is known
as the correlation.
– Correlations range from -1 to 1, with positive values close to
one indicating a strong direct relationship and negative
values close to -1 indicating a strong inverse relationship
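A short sketch contrasting the unit-laden covariance with the dimensionless correlation, on a made-up perfectly linear pair of variables:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0                        # perfect direct relationship

cov_xy = np.cov(x, y, bias=True)[0, 1]   # in units of x multiplied by y
r = np.corrcoef(x, y)[0, 1]              # dimensionless: 1 for a perfect
                                         # direct relationship
r_neg = np.corrcoef(x, -y)[0, 1]         # -1 for a perfect inverse one
print(cov_xy, r, r_neg)
```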
Spatial Autocorrelation
• Sometimes, rather than examining the association
between two variables, we might look at the relationship
of values within a single variable at different time points
or locations
• There is said to be (positive) autocorrelation in a variable
if observations that are closer to each other in space
have related values (recall Tobler’s Law)
• As an aside, there could also be temporal autocorrelation
– i.e., values of a variable at points close in time will be
related
Examples of Spatial Autocorrelation
(Source: http://image.weather.com/images/maps/current/acttemp_720x486.jpg)
Examples of Spatial Autocorrelation (Cont’d)
(Source: http://capita.wustl.edu/CAPITA/CapitaReports/localPM10/gifs/elevatn.gif)
Regression
• A statistical method used to examine the relationship
between a variable of interest and one or more
explanatory variables
– Strength of the relationship
– Direction of the relationship
• Often referred to as Ordinary Least Squares (OLS)
regression
• Available in all statistical packages
• Note that the presence of a relationship does not
imply causality
For the purposes of demonstration, let’s focus
on a simple version of this problem
• Variable of interest (dependent variable)
– E.g., education (years of schooling)
• Explanatory variable (AKA independent variable or predictor):
– E.g., Neighborhood Income
But what does a regression do? An example with a single predictor
The example on the previous page can be
easily extended to cases when we have more
than one predictor
• When we have n > 1 predictors, rather than getting a line in 2 dimensions, we get a fitted (hyper)plane in n + 1 dimensions (the ‘+1’ accounts for the dependent variable)
• Each independent variable will have its own slope coefficient which will
indicate the relationship of that particular predictor with the dependent
variable, controlling for all other independent variables in the regression.
• The equation of the best-fit line becomes
Dep. Variable = m1·predictor1 + m2·predictor2 + … + mn·predictorn + b + residuals
where the m’s are the coefficients of the corresponding predictors and b is the y-intercept term
• The coefficient of each predictor may be interpreted as the amount by
which the dependent variable changes as the independent variable
increases by one unit (holding all other variables constant)
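A sketch of fitting such a multi-predictor best-fit equation by ordinary least squares, on synthetic data with known coefficients (m1 = 3, m2 = −2, b = 5) so the recovered slopes can be checked:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# true relationship: y = 3*x1 - 2*x2 + 5, plus a little noise (residuals)
y = 3.0 * x1 - 2.0 * x2 + 5.0 + rng.normal(scale=0.1, size=n)

# design matrix with a column of ones for the intercept b
X = np.column_stack([x1, x2, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
m1, m2, b = coef
print(m1, m2, b)               # close to 3, -2, and 5
```

Each fitted coefficient estimates the change in y for a one-unit increase in its predictor, holding the other predictor constant.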
Some (Very) Basic Regression Diagnostics
• R-squared: the percent of variance in the dependent variable that
is explained by the independent variables
• The so-called p-value of the coefficient
– The probability of getting a coefficient (slope) value as far from zero as we
observe in the case when the slope is actually zero
– When p is less than 0.05, the independent variable is considered to be a
statistically significant predictor of the dependent variable
– One p-value per independent variable
• The sign of the coefficient of the independent variable (i.e., the
slope of the regression line)
– One coefficient per independent variable
– Indicates whether the relationship between the dependent and
independent variables is positive or negative
– We should look at the sign when the coefficient is statistically significant
Some (but not all) regression assumptions
1. The dependent variable should be normally
distributed (i.e., the histogram of the variable
should look like a bell curve)
2. Very importantly, the observations should be
independent of each other. (The same holds
for regression residuals). If this assumption is
violated, our coefficient estimates could be
wrong!
Part 3. Geostatistical Interpolation
Some Widely Used Texts on Geostatistics
– Bailey, T.C. and Gatrell, A.C. (1995) Interactive
Spatial Data Analysis. Addison Wesley Longman,
Harlow, Essex.
– Cressie, N.A.C. (1993) Statistics for Spatial Data.
(Revised Edition). Wiley, John & Sons, Inc.,
– Isaaks, E.H. and Srivastava, R.M. (1989) An
Introduction to Applied Geostatistics. Oxford
University Press, New York, 561 p.
Geostatistics: Spatial Data Analysis

Theoretical framework:
“An Introduction to Applied Geostatistics”, E. Isaaks and R. Srivastava (1989)
“Factorial Analysis”, C. J. Adcock (1954)
“Spatial Analysis: A Guide for Ecologists”, M. Fortin and M. Dale (2005)

Basic paradigm: Ecosystems are hierarchically structured, metastable, and far from equilibrium, and their spatial relationships matter. Ecosystem processes (change) are constrained and controlled by the pattern of hierarchical scales.

“Things” closer together (in both space and time) are more alike than things far apart: Tobler’s Law (1970, Economic Geography): “Everything is related to everything, but near things are more related than distant things.”

Ecological “scale” is the space and time “distance” apart (lag) at which significant variation is no longer correlated with “distance”.

(Figure: variation plotted against space and time distance, with the ecological scale marked where the curve levels off.)
Applied Geostatistics: key concepts
• Spatial structure
• Regionalized variables
• Spatial autocorrelation: Moran I (1950), Geary C (1954)
• Semivariance
• Stationarity and anisotropy
Applied Geostatistics
Notes on Introduction to Spatial Autocorrelation

Geostatistical methods were developed for interpreting data that vary continuously over a predefined, fixed spatial region. The study of geostatistics assumes that at least some of the spatial variation observed for natural phenomena can be modeled by random processes with spatial autocorrelation.

Geostatistics is based on the theory of regionalized variables, i.e., variables distributed in space (or time), written {z(i) : i ∈ D}. Geostatistical theory holds that any measurement of a regionalized variable can be viewed as a realization of a random function (or random process, random field, or stochastic process).
Spatial Structure
Geostatistical techniques are designed to evaluate the spatial structure of a variable,
or the relationship between a value measured at a point in one place, versus a
value from another point measured a certain distance away.
Describing spatial structure is useful for:
 Indicating intensity of pattern and the scale at which that pattern is exposed
 Interpolating to predict values at unmeasured points across the domain (e.g. kriging)
 Assessing independence of variables before applying parametric tests of significance
Regionalized Variables take on values according to spatial location.
Given a variable z, measured at a location i , the variability in z can be broken down into three
components:
z(i) = f(i) + s(i) + ε

Where:
f(i) is a “structural” coarse-scale forcing or trend (usually removed by detrending);
s(i) is a random, local spatial dependency (the component we are interested in); and
ε is the error term (considered normally distributed).
Coarse scale forcing or trends can be removed by fitting a surface to the trend using
regression and then working with regression residuals
Regionalized Variable Zi

(Figure: sample points Z1, Z2, …, Zn scattered across a domain; the function Z in domain D is a set of space-dependent values, {z(i) : i ∈ D}, summarized by a histogram of the samples zi.)

Variables are spatially correlated; therefore Z(x+h) can be estimated from Z(x) by using a regression model, and the covariance Cov(Z(x), Z(x+h)) describes how values a lag h apart vary together. This assumption holds, with a recognized increase in error relative to other least-squares models.
(Figure: two small data columns X and Y with means µ, and the table of x, y, x·y, x², y² terms used in a hand calculation of correlation.)

Correlation is a statement of the extent to which two data sets agree. Geometrically, it is determined by the extent to which the two regression lines (Y on X and X on Y) depart from the horizontal and vertical: the angle θ between them shrinks as the agreement between the two distributions grows. Calculating correlation by hand produces the deviations from the means, the products of deviations, and the sums of squares.
Correlation coefficient:

r = [ Σ_{i=1..n} (y_i − ȳ)(x_i − x̄) / n ] / [ √( Σ_{i=1..n} (x_i − x̄)² / n ) · √( Σ_{i=1..n} (y_i − ȳ)² / n ) ]

Spatial autocorrelation applies the same idea to a single variable, replacing the cross-product of x with y by a weighted cross-product of x with itself at pairs of locations i and j:

I = [ Σ_i Σ_j w_ij (x_i − x̄)(x_j − x̄) / Σ_i Σ_j w_ij ] / [ Σ_i (x_i − x̄)² / n ]
Briggs UT-Dallas GISC 6382 Spring 2007
Spatial Structure

Autocorrelation := degree of correlation to self
Spatial autocorrelation := the relationship is a function of distance

Spatial structure can be:
Exogenous (induced): externally induced spatial dependence
Endogenous (inherent): inherent spatial autocorrelation

Spatial dependence: compare values at a given distance apart (lags)

Point-to-point autocorrelation for points A, B, C, D along a transect:
A - B: positive
A - C: none
A - D: negative

Direction of autocorrelation:
Anisotropic := varies in intensity and range with orientation
Isotropic := varies similarly in all directions
Spatial Structure

Given: spatial pattern is an outcome of the synthesis of dynamic processes operating at various spatial and temporal scales.
Therefore: structure at any given time is but one realization of several potential outcomes.
Assuming: all processes are stationary (homogeneous), i.e., properties are independent of absolute location and direction in space.
Therefore: observations are independent, which means they are homoscedastic and form a known distribution.
That is: μ(X_i) = μ(X_j), σ²(Z_i) = σ²(Z_j), and ρ_ij depends only on the separation of i and j, for all i, j.

Stationarity is a property of the process, NOT the data, allowing spatial inferences; and stationarity is scale dependent. Furthermore, inference (spatial statistics) applies over regions of assumed stationarity.
For points A through J in space, the first-order neighbor topology gives a binary connectivity matrix (1 = connected, 0 = not connected):

    A B C D E F G H I
B   1
C   1 1
D   1 0 1
E   1 0 0 1
F   0 0 0 1 1
G   0 0 1 1 0 1
H   0 0 0 0 0 1 1
I   0 1 1 0 0 0 1 1
J   0 1 0 0 0 0 0 1 1

Distance-class connectivity matrix (topological rather than Euclidean distances):

    A B C D E F G H I
B   1
C   1 2
D   1 2 1
E   1 2 2 1
F   2 3 2 1 1
G   2 2 1 1 2 1
H   3 2 2 2 2 1 1
I   2 1 1 2 3 2 1 1
J   2 1 2 3 3 2 2 1 1

(Figure: the spatial arrangement of points A through J from which these matrices are derived.)
Spatial Autocorrelation

(Figure: example patterns of positive autocorrelation, negative autocorrelation, and no autocorrelation.)

A variable is thought to be autocorrelated if it is possible to predict its value at a given location by knowing its value at other nearby locations.
 Autocorrelation is evaluated using structure functions that assess the spatial
structure or dependency of the variable.
 Two of these functions are autocorrelation and semivariance which are graphed as
a correlogram and semivariogram, respectively.
 Both functions plot the spatial dependence of the variable against the spatial
separation or lag distance.
Example matrices for points A, B, C, D, …, J in space:

Euclidean distance matrix:
   A     B     C    D  ….  J
A  0.00
B  2.00  0.00
C  1.41  3.16  0.00
:
:

Euclidean distance matrix (rounded to integer classes):
   A  B  C  D  ….  J
A  0
B  2  0
C  1  3  0
:
:

Connectivity matrix (1 if the pair falls within the distance class, else 0):
   A  B  C  D  ….  J
A  0
B  0  0
C  1  0  0
:
:

Weighted matrix:
   A    B    C   D  ….  J
A  0
B  0    0
C  0.7  0    0
D  0.7  0.7  0
:
:
Moran I (1950)

I(d) = (n / W) · [ Σ_i Σ_j w_ij Z_i Z_j ] / [ Σ_i Z_i² ]

The numerator is a covariance (cross-product) term; the denominator is a variance term.

Where:
n is the number of points
Z_i is the deviation from the mean of the value at location i (i.e., Z_i = x_i − x̄ for variable x)
Z_j is the deviation from the mean of the value at location j (i.e., Z_j = x_j − x̄ for variable x)
w_ij is an indicator function or weight at distance d (e.g., w_ij = 1 if j is in distance class d from point i, otherwise 0)
W is the sum of all weights (the number of pairs in the distance class)

• A cross-product statistic that is used to describe autocorrelation
• Compares the value of a variable at one location with values at all other locations
• Values range over [−1, 1]: a value of 1 indicates perfect positive correlation, and −1 indicates perfect negative correlation
Moran I (1950), written out in full for variable x:

I(d) = [ n / W(d) ] · [ Σ_i Σ_{j≠i} w_ij(d) (x_i − x̄)(x_j − x̄) ] / [ Σ_i (x_i − x̄)² ]

Again, where:
n is the number of points
w_ij(d) is the distance-class connectivity matrix (w_ij = 1 if j is in distance class d from point i, otherwise 0)
W(d) is the sum of all weights (the number of pairs in the distance class)
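Moran's I as defined above can be computed directly. This sketch uses a toy one-dimensional transect whose values trend upward, so neighboring values are alike and I comes out positive:

```python
import numpy as np

def morans_i(x, w):
    # I = (n / W) * sum_ij w_ij * Z_i * Z_j / sum_i Z_i^2,
    # with Z the deviations from the mean and W the sum of weights
    z = x - x.mean()
    n = len(x)
    W = w.sum()
    return (n / W) * (z @ w @ z) / np.sum(z ** 2)

# toy transect: each point connected only to its immediate neighbours
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.zeros((5, 5))
for i in range(4):
    w[i, i + 1] = w[i + 1, i] = 1.0
print(morans_i(x, w))          # 0.5 here: positive autocorrelation
```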
Geary C (1954)

C(d) = [ (N − 1) · Σ_i Σ_j w_ij (y_i − y_j)² ] / [ 2 W(d) · Σ_i Z_i² ]

• A squared-difference statistic for assessing spatial autocorrelation
• Considers differences in values between pairs of observations, rather than the covariation between the pairs (as Moran I does)

The numerator in this equation is a difference term that gets squared. The Geary C statistic is more sensitive to extreme values and clustering than the Moran I, and behaves like a distance measure:

Values range over [0, 3]
Value = 0 : positive autocorrelation
Value = 1 : no autocorrelation
Value > 1 : negative autocorrelation
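Geary's C for the same kind of toy transect used for Moran's I; a value below 1 signals positive autocorrelation, consistent with the description above:

```python
import numpy as np

def geary_c(x, w):
    # C = (N - 1) * sum_ij w_ij (x_i - x_j)^2 / (2 W sum_i Z_i^2)
    z = x - x.mean()
    n = len(x)
    W = w.sum()
    diff2 = (x[:, None] - x[None, :]) ** 2
    return (n - 1) * np.sum(w * diff2) / (2.0 * W * np.sum(z ** 2))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # smoothly trending values
w = np.zeros((5, 5))
for i in range(4):
    w[i, i + 1] = w[i + 1, i] = 1.0        # neighbours only
print(geary_c(x, w))           # 0.2 here: below 1, positive autocorrelation
```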
Ripley’s K (1976), with the L(d) transformation:

L(d) = √( A · Σ_i Σ_{j≠i} k(i, j) / (π · N · (N − 1)) )

Where:
A = area
N = number of points
d = distance
k(i, j) = the weight, which is 1 when the distance between i and j is < d, and 0 when it is > d

Ripley’s K determines whether features are clustered at multiple different distances. It is sensitive to the study-area boundary, and is conceptualized as the number of points within a set of radius bands. If events follow complete spatial randomness, the number of points in a circle follows a Poisson distribution (with mean less than 1) and defines the “expected” count.
General G

G(d) = [ Σ_i Σ_{j≠i} w_ij(d) x_i x_j ] / [ Σ_i Σ_{j≠i} x_i x_j ]

Where:
d = distance class
w_ij = weight matrix, which is 1 when the distance between i and j is < d, and 0 when it is > d

General G effectively distinguishes between “hot” and “cold” spots: G is relatively large if high values cluster, low if low values cluster. The numerator counts only pairs “within” the distance bound d, expressed relative to the entire study area.
Semivariance

γ(d) = [ 1 / (2 n_d) ] · Σ_i Σ_j w_ij (y_i − y_j)²

Where:
j is a point at distance d from i
n_d is the number of points in that distance class (i.e., the sum of the weights w_ij for that distance class)
w_ij is an indicator function set to 1 if the pair of points is within the distance class

Equivalently, summing over the n_d pairs separated by lag d:

γ(d) = [ 1 / (2 n_d) ] · Σ_{i=1..n_d} (y_i − y_{i+d})²
The geostatistical measure that describes the rate of change of the regionalized variable is
known as the semivariance.
Semivariance is used for descriptive analysis where the spatial structure of the data is
investigated using the semivariogram and for predictive applications where the
semivariogram is fitted to a theoretical model, parameterized, and used to predict the
regionalized variable at other non-measured points (kriging).
The sill is the value at which the semivariogram levels off (its asymptotic value)
The range is the distance at which the semivariogram levels off (the spatial extent of
structure in the data)
The nugget is the semivariance at a distance 0.0, (the y –intercept)
A semivariogram is a plot of the structure function that, like autocorrelation, describes the
relationship between measurements taken some distance apart. Semivariograms define
the range or distance over which spatial dependence exists.
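An empirical semivariogram can be computed by binning squared pairwise differences by separation distance. This is a sketch on synthetic, trending data; the function name and bin edges are illustrative choices:

```python
import numpy as np

def empirical_semivariogram(pts, z, bins):
    # gamma(d) = 1/(2 n_d) * sum of (z_i - z_j)^2 over pairs in each bin
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    iu = np.triu_indices(len(pts), k=1)          # count each pair once
    dist = d[iu]
    sqdiff = (z[:, None] - z[None, :])[iu] ** 2
    gamma = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        m = (dist >= lo) & (dist < hi)
        gamma.append(0.5 * sqdiff[m].mean() if m.any() else np.nan)
    return np.array(gamma)

rng = np.random.default_rng(2)
pts = rng.uniform(0.0, 10.0, size=(100, 2))
z = pts[:, 0] + rng.normal(scale=0.2, size=100)  # a field that trends with x
g = empirical_semivariogram(pts, z, bins=np.array([0.0, 2.0, 4.0, 6.0]))
print(g)                                         # semivariance grows with lag
```

Because nearby values are more alike, the semivariance rises with lag distance until the range is reached.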
Stationarity

Autocorrelation assumes stationarity, meaning that the spatial structure of the variable is consistent over the entire domain of the dataset. The stationarity of interest is second-order (weak) stationarity, requiring that:
(a) the mean is constant over the region;
(b) the variance is constant and finite; and
(c) the covariance depends only on the between-sample spacing.

 In many cases this is not true because of larger trends in the data.
 In these cases, the data are often detrended before analysis.
 One way to detrend data is to fit a regression to the trend, and use only the residuals for autocorrelation analysis.
Anisotropy

Autocorrelation also assumes isotropy, meaning that the spatial structure of the variable is consistent in all directions. Often this is not the case, and the variable exhibits anisotropy, meaning that there is a direction-dependent trend in the data.

If a variable exhibits different ranges in different directions, there is a geometric anisotropy. For example, in a dune deposit, the range along the wind direction is larger than the range perpendicular to it.
For predictions, the empirical semivariogram is converted to a theoretical one by fitting a statistical model (curve) to describe its range, sill, and nugget. There are four common models used to fit semivariograms:

Linear:      γ(d) = c0 + b·d   (assumes no sill or range)
Exponential: γ(d) = c0 + c·[1 − exp(−d/a)]
Gaussian:    γ(d) = c0 + c·[1 − exp(−d²/a²)]
Spherical:   γ(d) = c0 + c·[1.5(d/a) − 0.5(d/a)³] for d ≤ a;  γ(d) = c0 + c for d > a

Where:
c0 = nugget
b = regression slope
a = range
c0 + c = sill
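The four models can be written down directly. A sketch with an assumed nugget c0 = 0.1, partial sill c = 0.9, and range a = 10 (illustrative values only):

```python
import numpy as np

def linear(d, c0, b):
    # no sill or range: grows without bound
    return c0 + b * np.asarray(d, dtype=float)

def exponential(d, c0, c, a):
    return c0 + c * (1.0 - np.exp(-np.asarray(d, dtype=float) / a))

def gaussian(d, c0, c, a):
    return c0 + c * (1.0 - np.exp(-(np.asarray(d, dtype=float) ** 2) / a ** 2))

def spherical(d, c0, c, a):
    # rises as 1.5(d/a) - 0.5(d/a)^3, then stays flat at the sill
    d = np.asarray(d, dtype=float)
    g = c0 + c * (1.5 * d / a - 0.5 * (d / a) ** 3)
    return np.where(d < a, g, c0 + c)

print(spherical(0.0, 0.1, 0.9, 10.0))   # at d = 0: the nugget, 0.1
print(spherical(10.0, 0.1, 0.9, 10.0))  # at the range: the sill, 1.0
```

The exponential and Gaussian models approach the sill asymptotically, while the spherical model reaches it exactly at the range a.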
Variogram Modeling Suggestions
• Check for a sufficient number of pairs at each lag distance (from 30 to 50).
• Remove outliers.
• Truncate at half the maximum lag distance to ensure enough pairs.
• Use a larger lag tolerance to get more pairs and a smoother variogram.
• Start with an omnidirectional variogram before trying directional variograms.
• Use other variogram measures to take into account lag means and variances (e.g., inverted covariance, correlogram, or relative variograms).
• Use transforms of the data for skewed distributions (e.g., logarithmic transforms).
• Use the mean absolute difference or median absolute difference to derive the range.
  • 7. Univariate Analysis: Introduction • The probability of an event is a number between 0 and 1, representing the chance or relative frequency of occurrence of the event. The probabilities of all possible (mutually exclusive) events of an experiment must sum to 1. • In practice, the outcomes of experiments are assigned numerical values, e.g., when tossing a coin, 1 and 2 can be assigned to the outcomes “head” and “tail”, respectively. • Such numerical values can be represented by a random variable. Unit 1 7
  • 8. Univariate Analysis: Introduction • Two types of random variable exist: – discrete and – continuous • Discrete examples include – the outcome of tossing a coin (head or tail) – the grade of a course (Pass or Fail) – land use (forest, agricultural, water) Unit 1 8
  • 9. Univariate Analysis: Introduction • Continuous examples include – the height of all men in the country (ranging from, say, 1.50 to 1.90 m), – the grades of a class (e.g., 0.0 to 100.0 points) – Raster* data covering the WG area, such as • elevation of the terrain (raster data with 90 m resolution), e.g. the SRTM DEM used in GIS classes • NDVI derived from a Landsat 8 image (2013, day of year 335) • temperature in raster format Unit 1 9
  • 10. Univariate Analysis: Introduction • The probability of a random variable occurring at any possible value (discrete random variable) or within a range of values (continuous random variable) is described by its probability distribution. Unit 1 10
  • 11. Univariate Analysis: Introduction In the example (figure), P1 = Pr[X = 1] = 0.5, P2 = Pr[X = 2] = 0.5, and P1 + P2 = 1. For a discrete random variable, the distribution is just the frequency (or proportion) of occurrence. Outcomes of experiments of tossing a coin: a discrete random variable and its probability distribution. Unit 1 11
  • 12. Exercise 1: • Calculate the probability distribution of the trees in the manmade (plantation) forest according to size (diameter of the trees). We use sample data from the WGCFNR plantation forest. • Steps: – (1) assign each tree a numerical value (a discrete random variable) using its diameter class (e.g. class 1 = 10–15 cm, class 2 = 15–20 cm, etc.); – (2) calculate the proportion; – (3) plot the probability. – You can either do it by hand with a calculator, or use Excel. Does the total probability sum up to 1? We often call such a diagram a “histogram”. Unit 1 12
  • 13. Exercise 1: Table 1 — distribution of the trees in the manmade (plantation) forest according to size (diameter of the trees); data from WGCFNR forest. An example of a discrete random variable.
    DBH class (midpoint, cm) | Class range (cm) | Number of trees | X = xi
    13 | 10–15 |  5 | 1
    18 | 15–20 | 15 | 2
    23 | 20–25 | 21 | 3
    28 | 25–30 |  7 | 4
    33 | 30–35 |  6 | 5
    Unit 1 13
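The steps of Exercise 1 can be sketched in a few lines of Python; this is a minimal illustration using the class counts from Table 1, not the authors' worked solution:

```python
# Exercise 1 sketch: empirical probability distribution of tree diameters.
# Counts per DBH class are taken from Table 1 (WGCFNR plantation data).
counts = {1: 5, 2: 15, 3: 21, 4: 7, 5: 6}    # X = xi -> number of trees

n = sum(counts.values())                      # total number of trees
probs = {x: c / n for x, c in counts.items()} # proportion per class

for x, p in probs.items():
    print(f"class {x}: P = {p:.3f}")

# The probabilities of all mutually exclusive classes must sum to 1.
print(sum(probs.values()))
```

Plotting these proportions as a bar chart gives the histogram asked for in step (3).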
  • 14. Constructing a Histogram for Continuous Data: Equal Class Widths • Determine the frequency and relative frequency for each class. Mark the class boundaries on a horizontal measurement axis. • Above each class interval, draw a rectangle whose height is the corresponding relative frequency (or frequency). Unit 1 14
  • 15. Exercise 2: • A sample data set of a continuous random variable: the thickness (X; m) of an aquifer is measured along the horizontal distance (di; m) (Table 2). For the thickness, calculate the mean, variance, standard deviation and CV, and calculate and plot the histogram and cumulative distribution. Unit 1 15
  • 16. Exercise 2: Table 2 — aquifer thickness xi (m) at horizontal distance di (m).
    di  1–11 → xi: 56 57 55 54 49 43 37 36 39 37 41
    di 12–22 → xi: 41 36 33 40 44 53 53 54 51 48 54
    di 23–33 → xi: 63 65 63 63 53 50 50 54 49 43 43
    di 34–39 → xi: 47 47 50 53 61 61
    Unit 1 16
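The summary statistics requested in Exercise 2 can be sketched as follows; this is a minimal version using the 39 thickness values of Table 2 and the population formulas (use n − 1 in the variance for the sample version):

```python
# Exercise 2 sketch: summary statistics for the aquifer thickness data.
import math

x = [56, 57, 55, 54, 49, 43, 37, 36, 39, 37, 41,
     41, 36, 33, 40, 44, 53, 53, 54, 51, 48, 54,
     63, 65, 63, 63, 53, 50, 50, 54, 49, 43, 43,
     47, 47, 50, 53, 61, 61]

n = len(x)
mean = sum(x) / n
var = sum((v - mean) ** 2 for v in x) / n   # population variance
sd = math.sqrt(var)                          # standard deviation
cv = sd / mean                               # coefficient of variation

print(f"mean = {mean:.2f} m, var = {var:.2f}, sd = {sd:.2f} m, CV = {cv:.3f}")
```

The histogram and cumulative distribution follow by binning these values as in the histogram-construction slide.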
  • 17. pdf and cdf functions (curves) of a continuous random variable • Probability density function (pdf: fX(x)) • Cumulative distribution function (cdf: FX(x)) Unit 1 17
  • 18. Exercise 2: • For the sample set, a few other key statistics are also of interest: – mean (μ) – variance (σ2) – standard deviation (σ = √σ2) – coefficient of variation (CV = σ/μ) Unit 1 18
  • 19. Measures of Variability • Results vary from individual to individual, from group to group, from city to city, from moment to moment. Variation always exists in a data set, regardless of which characteristic you’re measuring, because not every individual will have the same exact value for every characteristic you measure. Unit 1 19
  • 20. Measures of Variability • Without a measure of variability you can’t compare two data sets effectively. – What if two sets of data have about the same average and the same median? – Does that mean that the data are all the same? • Not at all. Unit 1 20
  • 21. Measures of Variability • For example, the data sets 199, 200, 201, and 0, 200, 400 both have – the same average, • which is 200, – and the same median, • which is also 200. • Yet they have very different amounts of variability. • The first data set has a very small amount of variability compared to the second. Unit 1 21
  • 22. Measures of Variability • By far the most commonly used measure of variability is the standard deviation. • The standard deviation of a data set represents the typical distance from any point in the data set to the center. • It’s roughly the average distance from the center, and in this case, the center is the average. Unit 1 22
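The example above is easy to verify directly; a minimal sketch using the standard library (sample standard deviation via `statistics.stdev`):

```python
# Both data sets share mean 200 and median 200, yet their standard
# deviations differ enormously.
from statistics import mean, median, stdev

a = [199, 200, 201]
b = [0, 200, 400]

print(mean(a), mean(b))      # both 200
print(median(a), median(b))  # both 200
print(stdev(a), stdev(b))    # 1.0 vs 200.0
```

The identical centres and wildly different spreads make the point: centre alone does not characterize a data set.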
  • 23. Bivariate Analysis: Introduction • In the previous section, we looked at the statistical measures of a single random variable. However, correlation can often exist between two random variables. • For example, – the height and diameter of a tree are often correlated. – the elevation and temperature in most areas are often correlated. Unit 1 23
  • 24. Bivariate Analysis: Introduction • In this chapter you analyze two numerical variables, X and Y, to look for patterns, find the correlation, and make predictions about Y from X, if appropriate, using simple linear regression. Unit 1 24
  • 26. Bivariate Analysis: Introduction • In this case, the weight increases with increasing height, for which we say a positive correlation exists between the two variables. To investigate correlation, a scatter plot is often used, e.g., for each person, the height and weight are cross-plotted. Often, some sort of fit is attempted. Here, we see a linear function fitted to the scatter plot. Unit 1 26
  • 27. Bivariate Analysis: Introduction • However, to quantitatively evaluate correlation, a correlation coefficient (rXY) is often used: rXY = σXY / (σX σY), the covariance divided by the product of the two standard deviations. Unit 1 27
  • 28. Bivariate Analysis: Introduction • As defined previously, μX (or μY) is the mean of X (or Y) in its univariate distribution. • ρXY (or rXY) varies between −1 (perfect negative correlation: Y = −X) and +1 (perfect positive correlation: Y = X). • When rXY = 0, we say the two variables are not correlated. • In our example, rXY = 0.76, thus there is a certain amount of positive correlation between weight and height. Unit 1 28
  • 29. Correlation between two random variables • The correlation between two random variables is the cornerstone of geostatistics: – one random variable is a geological/hydrological/petrophysical property at one spatial location, – the second random variable can be the • (1) same property at a different location (auto-correlation studies; kriging); or, • (2) a different property at a different location (cross-correlation studies, co-kriging). Unit 1 29
  • 30. Bivariate Random Variables • The covariance between X and Y (σXY) measures how well the two variables track each other: when one goes up, how does the other behave on average? • The unit of covariance is the product of the unit of random variable X and the unit of random variable Y. The covariance of a random variable X with itself is equal to its variance: σXX = σX2. Unit 1 30
  • 31. Correlation • The correlation (or correlation coefficient) ρXY between X and Y is a dimension-less, normalized version of the covariance σXY: see next slide. Unit 1 31
  • 33. Covariance • An estimator of the covariance can be defined as: σ̂XY = (1/n) Σi (xi − x̄)(yi − ȳ). • If X and Y are independent, then they are uncorrelated and their covariance σXY (thus ρXY) is zero. • The covariance is best thought of as a measure of linear dependence. Unit 1 33
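The covariance estimator and its normalization into the correlation coefficient can be sketched together; the height/weight values below are made up purely for illustration (they are not the slide's data set):

```python
# Covariance and correlation estimators on a small hypothetical
# height/weight sample.
import math

x = [1.60, 1.65, 1.70, 1.75, 1.80]   # heights (m), made-up values
y = [55.0, 62.0, 66.0, 72.0, 80.0]   # weights (kg), made-up values

n = len(x)
mx, my = sum(x) / n, sum(y) / n

cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n)
sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)
r = cov / (sx * sy)                  # dimensionless, always in [-1, 1]

print(f"cov = {cov:.4f}, r = {r:.3f}")
```

Note that cov carries the units m·kg, while r is unitless: that is exactly the normalization the slide describes.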
  • 34. Multivariate Analysis • Linear combination of many random variables • Extending the bivariate arithmetic into multivariate analysis, we can get another host of relationships. Unit 1 34
  • 35. The idea of regression • The idea of regression is to build a model that estimates or predicts one quantitative variable (y) by using at least one other quantitative variable (x). Simple linear regression uses exactly one x variable to estimate the y variable. • Multiple linear regression, on the other hand, uses more than one x variable to estimate the value of y. Unit 1 35
  • 36. Discovering the uses of multiple regression • One situation in which multiple regression is useful is when the y variable is hard to track down; that is, its value can’t be measured straight up, and you need more than one other piece of information to help get a handle on what its value will be. Unit 1 37
  • 37. General form of the multiple regression model • The general idea of simple linear regression is to fit the best straight line through that data that you possibly can and use that line to make estimates for y based on certain x- values. The equation of the best-fitting line in simple linear regression is – y = b0 + b1x1 – where b0 is the y-intercept and b1 is the slope. • (The equation also has the form y = a +bx) Unit 1 39
  • 38. General form of the multiple regression model • In the multiple regression setting, you have more than one x variable that is related to y. – Call these x variables x1, x2, . . . xk. • In the most basic multiple regression model, you use some or all of these x variables to estimate y where each x variable is taken to the first power. This process is called finding the best-fitting linear function for the data. • This linear function looks like the following: – y = b0 + b1x1 + b2x2 + . . . + bkxk – and you can call it the multiple (linear) regression model. • You use this model to make estimates about y based on given values of the x variables. Unit 1 40
  • 39. General form of the multiple regression model • A linear function is an equation whose x terms are taken to the first power only. – For example y = 2x1 + 3x2 + 4x3 is a linear equation using three x variables. • If any of the x terms are squared, the function would be a quadratic one; • If an x term is taken to the third power, the function would be a cubic function, and so on. In this chapter, I consider only linear functions. Unit 1 41
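The multiple regression model above can be sketched as an ordinary least-squares fit; the data here are synthetic (y is generated exactly as 2 + 3·x1 − 1·x2, so the fit should recover those coefficients), not data from the course:

```python
# Fitting y = b0 + b1*x1 + b2*x2 by ordinary least squares.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 2 + 3 * x1 - 1 * x2              # synthetic response, no noise

# Design matrix: a column of ones for b0, then x1 and x2.
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # b = [b0, b1, b2]

print(b)
```

With noisy real data the recovered coefficients would only approximate the true ones, but the mechanics are identical.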
  • 40. Unit 2. Introduction to Geostatistics: Definition and history of geostatistics, advantages of geostatistics, geostatistics analysis requirements. Practical 2: Bivariate data analysis Unit 2 42
  • 41. What is geostatistics? • What is statistics? • What then is geo-statistics? Unit 2 43
  • 42. Comment • The term statistics has two common meanings, which we want to clearly separate: descriptive and inferential statistics. • But to understand the difference between descriptive and inferential statistics, we must first be clear on the difference between populations and samples. Unit 2 44
  • 43. Populations and samples • A population is a set of well-defined objects. – We must be able to say, for every object, if it is in the population or not. – We must be able, in principle, to find every individual of the population. • A geographic example of a population is all pixels in a multi-spectral satellite image. • A sample is some subset of a population. – We must be able to say, for every object in the population, if it is in the sample or not. – Sampling is the process of selecting a sample from a population. – Continuing the example, a sample from this population could be a set of pixels from known ground truth points. Unit 2 45
  • 44. What do we mean by statistics? • Two common use of the word: – Descriptive statistics: numerical summaries of samples; • (what was observed) – Inferential statistics: from samples to populations. • (what could have been or will be observed in a larger population) Unit 2 46
  • 45. A concise definition of inferential statistics • Statistics: The determination of the probable from the possible – . . . which implies the rigorous definition and then quantification of “probable". – Probable causes of past events or observations – Probable occurrence of future events or observations • This is a definition of inferential statistics: – Observations → Inferences Unit 2 47
  • 46. Why use statistical analysis? • Descriptive: we want to summarize some data in a shorter form • Inferential: We are trying to understand some process and maybe predict based on this understanding. • So we need to model it, i.e. make a conceptual or mathematical representation, from which we infer the process. • But how do we know if the model is “correct"? • Are we imagining relations where there are none? • Are there true relations we haven't found? – Statistical analysis gives us a way to quantify the confidence we can have in our inferences. Unit 2 48
  • 47. Comment • The most common example of geo-statistical inference is the prediction of some attribute at an unsampled point, based on some set of sampled points. • In the next slide we show an example from the Meuse river floodplain in the southern Netherlands. The copper (Cu) content of soil samples has been measured at 155 points (left figure); from this we can predict at all points in the area of interest (right figure). Unit 2 49
  • 49. What is geo-statistics? • Geostatistics is statistics on a population with known location, i.e. coordinates: – In one dimension (along a line or curve) – In two dimensions (in a map or image) – In three dimensions (in a volume) • The most common application of geostatistics is in 2D (maps). • Key point: Every observation (sample point) has both: – coordinates (where it is located); and – attributes (what it is). Unit 2 51
  • 50. Comment • Let's first look at a data set that is not geo-statistical. • It is a list of soil samples (without their locations) with the lead (Pb) concentration. The column Pb is the attribute of interest. Unit 2 52
  • 51. To check your understanding . . . • Q5 : Can we determine the mean, maximum, minimum and standard deviation of this set of samples? • Q6 : Can we make a map of the sample points with their Pb values? Unit 2 53
  • 52. Comment • Now we look at a data set that is geo-statistical (next slide). • These are soil samples taken in the Jura mountains of Switzerland, and their lead content; but this time with their coordinates. • The columns E and N are the coordinates, i.e. the spatial reference; the column Pb is the attribute. • First let's look at the tabular form: Unit 2 54
  • 54. To check your understanding . . . • Q7 : Comparing this to the non-geostatistical list of soil samples and their lead contents (above), what new information is added here? Unit 2 56
  • 55. Comment • On the figure (next slide) you will see: – A coordinate system (shown by the over-printed grid lines) – The locations of 256 sample points - where a soil sample was taken – The attribute value at each sample point - symbolized by the relative size of the symbol at each point - in this case the amount of lead (Pb) in the soil sample Unit 2 57
  • 57. To check your understanding . . . • Q8 : In the figure, how can you determine the coordinates of each sample point? • Q9 : What are the coordinates of the sample point displayed as a red symbol? • Q10 : What is the mathematical origin (in the sense of Cartesian or analytic geometry) of this coordinate system? • Q11 : How could these coordinates be related to some common system such as UTM? Unit 2 59
  • 58. To check your understanding . . . • Q12 : Suppose we have a satellite image that has not been geo-referenced. Can we speak of geostatistics on the pixel values? • Q13 : In this case, what are the coordinates and what are the attributes? • Q14 : Suppose now the image has been geo-referenced. What are now the coordinates? Unit 2 60
  • 59. Geostatistics requirements • The location of a sample is an intrinsic part of its definition. • All data sets from a given area are implicitly related by their coordinates – So they can be displayed and related in a GIS • Values at sample points cannot be assumed to be independent: there is often evidence that nearby points tend to have similar values of attributes. • That is, there may be a spatial structure to the data – Classical statistics assumes independence of samples – But, if there is spatial structure, this is not true! – This has major implications for sampling design and statistical inference • Data values may be related to their coordinates → spatial trend Unit 2 61
  • 60. Feature and geographic spaces • The word space is used in mathematics to refer to any set of variables that form metric axes and which therefore allow us to compute a distance between points in that space. – If these variables represent geographic coordinates, we have a geographic space. – If these variables represent attributes, we have a feature space. Unit 2 62
  • 61. Comment • You are probably quite familiar with feature space from your study of non-spatial statistics. • Even with one variable, we have a unit of measure; this forms a 1D or univariate feature space. • Most common are two variables which we want to relate with correlation or regression analysis; this is a bivariate feature space. • In multivariate analysis the feature space has more than two dimensions. Unit 2 63
  • 62. Comment • Multivariate feature spaces can have many dimensions; we can only see three at a time. Unit 2 64
  • 63. Comment • So, feature space is perhaps a new term but not a new concept if you've followed a statistics course with – univariate, bivariate and multivariate analysis. • What then is geographic space? Simply put, it is a mathematical space where the axes are map coordinates that relate points to some reference location on or in the Earth (or another physical body). • These coordinates are often in some geographic coordinate system that was designed to give each location on (part of) the Earth a unique identification; a common example is the Universal Transverse Mercator (UTM) grid. • However, a local coordinate system can be used, as long as there is a clear relation between locations and coordinates. Unit 2 65
  • 64. Geographic space • Axes are 1D lines; they almost always have the same units of measure (e.g. metres, kilometres . . . ) – One-dimensional: coordinates are on a line with respect to some origin. – Two-dimensional: coordinates are on a grid with respect to some origin. – Three-dimensional: coordinates are grid and elevation from a reference elevation • Note: latitude-longitude coordinates do not have equal distances in the two dimensions; they should be transformed to metric (grid) coordinates for geo-statistical analysis. Unit 2 66
  • 65. Interpolation • Interpolation is based on the assumption that spatially distributed objects are spatially correlated; in other words, things that are close together tend to have similar characteristics. • For instance, if it is raining on one side of the street, you can predict with a high level of confidence that it is also raining on the other side of the street. • You would be less sure if it was raining across town and less confident still about the state of the weather in the neighbouring province. 68
  • 66. What is a spatial interpolation? • Interpolation predicts values for cells in a raster from a limited number of sample data points. It can be used to predict unknown values for any geographic point data: elevation, rainfall, chemical concentrations, noise levels, and so on. 69
  • 67. 70 On the left is a point dataset of known values. On the right is a raster interpolated from these points. Unknown values are predicted with a mathematical formula that uses the values of nearby known points.
  • 69. Unit 3: non-geostatistical spatial analysis 72 [Figure: sample points for copper, plotted on E/N axes — what is the value at point p?]
  • 70. Interpolation methods (to be discussed in the class) • Natural neighbour • Local simple mean (average method) • Polygon • Triangulation • Inverse Distance Method • Polynomial equation • Spline 73
  • 71. Natural neighbour • Natural Neighbor interpolation finds the closest subset of input samples to a query point and applies weights to them based on proportionate areas to interpolate a value (Sibson, 1981). It is also known as Sibson or "area-stealing" interpolation. 74 Unit 3: non-geostatistical spatial analysis Interpolation methods
  • 78. IDW • The IDW (Inverse Distance Weighted) tool uses a method of interpolation that estimates cell values by averaging the values of sample data points in the neighborhood of each processing cell. The closer a point is to the center of the cell being estimated, the more influence, or weight, it has in the averaging process. 81 Interpolation methods
  • 80. Spline • The Spline tool uses an interpolation method that estimates values using a mathematical function that minimizes overall surface curvature, resulting in a smooth surface that passes exactly through the input points. 83 Interpolation methods
  • 86. More explanation on: point estimation • For each of the point estimation methods we describe in the following sections, we will show the details of the estimation of the V value at 65E,137N. • No sweeping conclusions should be drawn from this single example; it is presented only to provide a familiar common thread through our presentation of the various methods. 89 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 87. Distances to sample values in the vicinity of 65E,137N 90 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 88. Point estimation methods • The values at sample locations near 65E,137N are shown in the figure on the next slide and listed in the previous table. – The variability of these nearby sample values presents a challenge for estimation. – Values range from 227 to 791 ppm; • The estimated value, therefore, can cover quite a broad range depending on how we choose to weight the individual values. • In the following sections we will look at four quite different point estimation methods. 91 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 89. Unit 3: non-geostatistical spatial analysis 92 [Figure: the goal is to estimate the value of V at the point 65E,137N, located by the arrow, from the surrounding seven V data values.]
  • 90. The method of triangulation (1/9) • Triangulation estimates by fitting a plane through three samples that surround the point being estimated. The equation of a plane can be expressed generally as z = ax + by + c. • In our example, where we are trying to estimate V values using coordinate information, z is the V value, x is the easting, and y is the northing. • Given the coordinates and the V value of three nearby samples, we can calculate the coefficients a, b and c by solving the following system of equations: aX1 + bY1 + c = Z1; aX2 + bY2 + c = Z2; aX3 + bY3 + c = Z3. 93 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 91. The method of triangulation (2/9) • From the figure we can find three samples that nicely surround the point being estimated: the 696 ppm, the 227 ppm, and the 606 ppm samples. • Using the data for these three samples, the set of equations we need to solve is 63a + 140b + c = 696; 64a + 129b + c = 227; 71a + 140b + c = 606. 94 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 92. The method of triangulation (3/9) • The solution to these three simultaneous equations is a = −11.250, b = 41.614, c = −4421.159 • which gives us the following equation as our triangulation estimator: V = −11.250x + 41.614y − 4421.159 • This is the equation of the plane that passes through the three nearby samples we have chosen. • Using this equation we can now estimate the value at any location simply by substituting the appropriate easting and northing. Substituting the coordinates x = 65 and y = 137 into our equation gives us an estimate of 548.7 ppm at the location 65E,137N. 95 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
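The plane-fitting step above is easy to reproduce numerically; a minimal check using the three samples and the target location from the text:

```python
# Solve for the plane z = ax + by + c through the three samples
# (63,140,696), (64,129,227), (71,140,606) and evaluate at 65E,137N.
import numpy as np

A = np.array([[63.0, 140.0, 1.0],
              [64.0, 129.0, 1.0],
              [71.0, 140.0, 1.0]])
z = np.array([696.0, 227.0, 606.0])

a, b, c = np.linalg.solve(A, z)      # coefficients of the plane
v = a * 65 + b * 137 + c             # triangulation estimate at 65E,137N

print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}, V(65,137)={v:.1f} ppm")
```

The solver returns a = −11.250, b = 41.614, c = −4421.159 and an estimate of about 548.7 ppm, matching the slide.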
  • 93. 96 The figure shows the contours of the estimated V values that this equation produces. The method of triangulation (4/9) Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 94. 97 The method of triangulation (5/9) Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 95. 98 The method of triangulation (6/9) Interpolation methods using samples
  • 96. 99 The method of triangulation (7/9) Interpolation methods using samples
  • 97. 100 The method of triangulation (8/9) Interpolation methods using samples
  • 98. 101 The method of triangulation (9a/9) Interpolation methods using samples
  • 99. 102 The method of triangulation (9b/9) Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 100. Local sample mean 103 • The mean of the seven nearby samples shown in the figure is 603.7 ppm. This estimate is much higher than the triangulation estimate. • The two samples with V values greater than 750 ppm in the eastern half of the figure receive more than 25% of the total weight and therefore have a considerable influence on our estimated value. Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
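The local sample mean is the simplest estimator of all: every sample gets the same weight, 1/7 here. A one-line check with the seven V values from the example:

```python
# Unweighted average of the seven V values surrounding 65E,137N.
from statistics import mean

v = [477, 696, 227, 606, 791, 783, 646]   # nearby samples (ppm)
estimate = mean(v)
print(round(estimate, 1))                 # 603.7 ppm
```

Note that the two samples above 750 ppm together carry 2/7 ≈ 28.6% of the weight, which is the "more than 25%" influence the slide refers to.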
  • 101. Inverse Distance Methods (1/6) • One obvious way to do this is to make the weight for each sample inversely proportional to its distance from the point being estimated: v̂ = (Σi Vi/di) / (Σi 1/di) 104 • d1, . . . , dn are the distances from each of the n sample locations to the point being estimated and V1, . . . , Vn are the sample values. Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 102. Inverse distance weighting calculations for sample values in the vicinity of 65E,137N 105 Inverse Distance Methods (2/6) Unit 3: non-geostatistical spatial analysis
  • 103. Inverse Distance Methods (3/6) • The nearest sample, the 696 ppm sample at 63E,140N, receives about 26% of the total weight, while the farthest sample, the 783 ppm sample at 75E,128N, receives less than 7%. A good example of the effect of the inverse distance weighting can be found in a comparison of the weights given to the 477 ppm sample and the 791 ppm sample. • The 791 ppm sample at 73E,141N is about twice as far away from the point we are trying to estimate as the 477 ppm sample at 61E,139N; the 791 ppm sample therefore receives about half the weight of the 477 ppm sample. • Using the weights given in the previous table, our inverse distance estimate of the V value at 65E,137N is 594 ppm. 106 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 104. Inverse Distance Methods (4/6) • The inverse distance estimator given in the previous equation can easily be adapted to include a broad range of estimates. Rather than using weights that are inversely proportional to the distance, we can make the weights inversely proportional to any power of the distance: v̂ = (Σi Vi/di^p) / (Σi 1/di^p) 107 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
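A generic sketch of the inverse-distance estimator with exponent p; the sample points here are made up for illustration (they are not the 65E,137N data set), and p = 1 gives the plain inverse-distance estimate while p = 2 gives the popular inverse-distance-squared variant:

```python
def idw(samples, x0, y0, p=1.0):
    """samples: list of (x, y, value); returns the IDW estimate at (x0, y0)."""
    num = den = 0.0
    for x, y, v in samples:
        d = ((x - x0) ** 2 + (y - y0) ** 2) ** 0.5
        if d == 0:                 # exactly at a sample: return its value
            return v
        w = 1.0 / d ** p           # weight inversely proportional to d^p
        num += w * v
        den += w
    return num / den

# Two hypothetical samples at distances 1 and 2 from the origin.
pts = [(0.0, 1.0, 10.0), (0.0, 2.0, 20.0)]
print(idw(pts, 0.0, 0.0, p=1))     # (10/1 + 20/2) / (1 + 1/2)
print(idw(pts, 0.0, 0.0, p=2))     # (10/1 + 20/4) / (1 + 1/4)
```

Raising p pulls the estimate toward the nearest sample; lowering it toward the plain average, which is the behaviour the next slides illustrate.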
  • 105. Inverse Distance Methods (5/6) 108 The effect of the inverse distance exponent on the sample weights and on the V estimate. Interpolation methods using samples
  • 106. Inverse Distance Methods (6/6) • Different choices of the exponent p will result in different estimates. 109 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 107. Search Neighbourhoods • For the case studies we perform in this chapter, we use a circular search neighbourhood with a radius of 25 m. • All samples that fall within 25 m of the point we are estimating will be included in the estimation procedure. 110 Interpolation methods using samples Unit 3: non-geostatistical spatial analysis
  • 108. The First Law of Geography Tobler’s Law: • The central tenet of Geography is that location matters for understanding a wide variety of phenomena. • Everything is related to everything else, but things that are closer together are more related to each other than those that are further apart Unit 3: The First Law of Geography 111
  • 109. Geographers’ Perspectives on the World • Location matters – Real-world relationships – Horizontal connections between places – Importance of scale (both in time and space) Unit 3: The First Law of Geography 112
  • 110. Geographic Information • Includes knowledge about where something is • Includes knowledge about what is at a given location • Can be very detailed: – e.g. the locations of all buildings in a city or the – locations of all trees in a forest stand • Or it can be very coarse: – e.g. the population density of an entire country or the global sea surface temperature distribution • There is always a spatial component associated with geographic information Unit 3: The First Law of Geography 113
  • 118. Practical 4 to 5: Exploratory data analysis • Non-geostatistical interpolation (ArcGIS and/or QGIS) – Inverse distance – Closest point – Moving average – Least square polynomial – Spline – Triangulation • Individual report (short report) – What are the required input (data and parameters) – What are the outputs – Compare the different • methods and/or parameters 122
  • 120. 4. Characterizing spatial process • Covariance, • Correlation and variogram. • Understanding and measure of similarity between different data. 124 Unit 4: Characterizing spatial process
  • 121. variogram • The most common way to visualize local spatial dependence is the variogram, also called (for historical reasons) the semivariogram. • To understand this, we have to first define the semivariance as a mathematical measure of the difference between the two points in a point-pair. 125 Unit 4: Characterizing spatial process
  • 122. 126 The semi-variogram is based on modelling the (squared) differences in the z-values as a function of the distances between all of the known points. Unit 4: Characterizing spatial process
  • 123. Semivariance • This is a mathematical measure of the difference between the two points in a point-pair. It is expressed as a squared difference so that the order of the points doesn't matter (i.e. subtraction in either direction gives the same result). Each pair of observation points has a semivariance, usually represented by the Greek letter γ ('gamma'), and defined as: γ(xi, xj) = ½ [z(xi) − z(xj)]² • where x is a geographic point and z(x) is its attribute value. • (Note: The 'semi' refers to the factor 1/2, because there are two ways to compute the difference for the same point pair.) • So, the semivariance between two points is half the squared difference between their values. If the values are similar, the semivariance will be small. 127 Unit 4: Characterizing spatial process
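The definition above can be sketched in a few lines of Python (a minimal illustration, not tied to any geostatistics library):

```python
def semivariance(z1, z2):
    """Half the squared difference between two attribute values.

    Squaring makes the order of the pair irrelevant:
    semivariance(a, b) == semivariance(b, a).
    """
    return 0.5 * (z1 - z2) ** 2
```

Similar values give a small semivariance, identical values give zero.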
  • 125. Point pair • Now we know two things about a point-pair: 1. The distance between them in geographic space; 2. The semivariance between them in attribute space. • So . . . it seems natural to see if points that are `close by' in geographical space are also `close by' in attribute space. • This would be evidence of local spatial dependence. 129 Unit 4: Characterizing spatial process
  • 126. The variogram cloud • This is a graph showing semivariances between all point-pairs: – X-axis: The separation distance within the point-pair – Y-axis: The semivariance • Advantage: Shows the comparison between all point-pairs as a function of their separation; • Advantage: Shows which point-pairs do not fit the general pattern • Disadvantage: too many graph points, hard to interpret 130 Unit 4: Characterizing spatial process
  • 129. variogram cloud • Clearly, the variogram cloud gives too much information. If there is a relation between separation and semi-variance, it is hard to see. • The usual way to visualize this is by grouping the point-pairs into lags or bins according to some separation range, and computing some representative semi-variance for the entire lag. • Often this is the arithmetic average, but not always. 133 Unit 4: Characterizing spatial process
  • 130. 134 Origins • Involves a set of statistical techniques called Kriging (there are a number of different Kriging methods) • Kriging is named after Danie Gerhardus Krige, a South African mining engineer who presented the ideas in his master's thesis in 1951. These ideas were later formalized by a prominent French mathematician, Georges Matheron • For more information, see: – Krige, Danie G. (1951). "A statistical approach to some basic mine valuation problems on the Witwatersrand". J. of the Chem., Metal. and Mining Soc. of South Africa 52 (6): 119–139. – Matheron, Georges (1962). Traité de géostatistique appliquée, Editions Technip, France • Kriging has two parts: the quantification of the spatial structure in the data (called variography) and prediction of values at unknown points Source of this information: http://en.wikipedia.org/wiki/Daniel_Gerhardus_Krige
  • 131. 135 Motivating Example: Ordinary Kriging • Imagine we have data on the concentration of gold (denote it by Y) in western Pennsylvania at a set of 200 sample locations (call them points p1…p200). • Since Y has a meaningful value at every point, our goal is to create a prediction surface for the entire region using these sample points • Notation: In this western PA region, Y(p) will denote the concentration level of gold at any point p.
  • 132. 136 Global and Local Structure • Without any a priori knowledge about the distribution of gold in Western PA, we have no theoretical reason to expect to find different concentrations of gold at different locations in that region. – I.e., theoretically, the expected value of gold concentration should not vary with latitude and longitude – In other words, we would expect that there is some general, average, value of gold concentration (called global structure) that is constant throughout the region (even though we assume it’s constant, we do not know what its value is) • Of course, when we look at the data, we see that there is some variability in the gold concentrations at different points. We can consider this to be a local deviation from the overall global structure, known as the local structure or residual or error term. • In other words, geostatisticians would decompose the value of gold Y(p) into the global structure μ(p) and local structure ε(p). • Y(p) = μ(p) + ε(p)
  • 133. 137 ε(p) • As per the First Law of Geography, the local structures ε(p) of nearby observations will often be correlated. That is, there is still some meaningful information (i.e., spatial dependencies) that can be extracted from the spatially dependent component of the residuals. • So, our ordinary kriging model will: – Estimate this constant but unknown global structure μ(p), and – Incorporate the dependencies among the residuals ε(p). Doing so will enable us to create a continuous surface of gold concentration in western PA.
  • 134. 138 Assumptions of Ordinary Kriging • For the sake of the methods that we will be employing, we need to make some assumptions: – Y(p) should be normally distributed – The global structure μ(p) is constant and unknown (as in the gold example) – Covariance between values of ε depends only on distance between the points, • To put it more formally, for each distance h and each pair of locations p and t within the region of interest that are h units apart, there exists a common covariance value, C(h), such that covariance [ε(p), ε(t)] = C(h). • This is called isotropy
  • 135. 139 Covariance and Distance • From the First Law of Geography it would then follow that as distance between points increases, the similarity (i.e., covariance or correlation) between the values at these points decreases • If we plot this out, with inter-point distance h on the x-axis, and covariance C(h) on the y-axis, we get a graph that looks something like the one below. This representation of covariance as a function of distance is called the covariogram • Alternatively, we can plot correlation against distance (the correlogram)
  • 136. 140 Covariograms and Weights • Geostatistical methods incorporate this covariance–distance relationship into the interpolation models – More specifically, this information is used to calculate the weights – Like IDW, kriging is a weighted average of points in the vicinity • Recall that in IDW, in order to predict the value at an unknown point, we assume that nearer points will have higher weights (i.e., weights are determined based on distance) • In geostatistical techniques, we calculate the distances between the unknown point at which we want to make a prediction and the measured points nearby, and use the value of the covariogram for those distances to calculate the weight of each of these surrounding measured points. – I.e., the weight of a point h units away will depend on the value of C(h)
  • 137. 141 But… • Unfortunately, it so happens that one generally cannot estimate covariograms and correlograms directly • For that purpose, a related function of distance (h) called the semi-variogram (or simply the variogram) is calculated – The variogram is denoted by γ(h) – One can easily obtain the covariogram from the variogram (but not the other way around) • Covariograms and variograms tell us the spatial structure of the data Covariogram C(h) Variogram γ(h)
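For a process with a finite sill, the two functions are linked by γ(h) = C(0) − C(h): the semivariance at lag h is the covariance "lost" over that distance. A tiny sketch (the exponential covariance used here is purely illustrative):

```python
import math

def variogram_from_covariogram(C, h):
    """gamma(h) = C(0) - C(h): dissimilarity equals the covariance lost at lag h."""
    return C(0) - C(h)

# Illustrative covariance decaying exponentially from a sill of 10
cov = lambda h: 10.0 * math.exp(-h)
```

At h = 0 the variogram is 0; as h grows it approaches the sill of 10. Moving in the other direction (variogram to covariogram) requires knowing the sill, which is why it only works when one exists.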
  • 138. 142 Interpretation of Variograms • As mentioned earlier, a covariogram might be thought of as covariance (i.e., similarity) between point values as a function of distance, such that C(h) is greater at smaller distances • A variogram, on the other hand, might be thought of as "dissimilarity between point values as a function of distance", such that the dissimilarity is greater for points that are farther apart • Variograms are usually interpreted in terms of the corresponding covariograms or correlograms • A common mistake when interpreting variograms is to say that variance increases with distance; what increases is the dissimilarity between pairs of point values, not the variance of the variable itself. Covariogram C(h) Variogram γ(h)
  • 139. 143 • When there are n points, the number of inter-point distances is equal to n(n−1)/2 • Example: – With 15 points, we have 15(15−1)/2 = 105 inter-point distances (marked in yellow on the grid in the lower left) – Since we're using Euclidean distance, the distance between points 1 and 2 is the same as the distance between points 2 and 1, so we count it only once. Also, the distance between a point and itself will always be zero, and is of no interest here. • The maximum distance h on a covariogram or variogram is called the bandwidth, and should equal half the maximum inter-point distance. – In the figure on the lower right, the blue line connects the points that are the farthest away from each other. The bandwidth in this example would then equal half the length of the blue line Bandwidth (The Maximum Value of h)
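The pair count and the bandwidth can be checked with a short sketch (the four sample coordinates below are hypothetical):

```python
import itertools
import math

points = [(0, 0), (3, 4), (6, 0), (0, 8)]  # hypothetical sample coordinates

# All unordered point pairs: there are n(n-1)/2 of them
pairs = list(itertools.combinations(points, 2))

distances = [math.dist(p, q) for p, q in pairs]

# Bandwidth: half the maximum inter-point distance
bandwidth = max(distances) / 2
```

With n = 4 points there are 4·3/2 = 6 pairs; the longest separation here is between (6, 0) and (0, 8), giving a bandwidth of 5.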
  • 140. 144 Mathematical definition of a variogram γ(h) = ½ · average{ [Y(i) − Y(j)]² } over all point pairs i, j separated by distance h • In other words, for each distance h between 0 and the bandwidth – Find all pairs of points i and j that are separated by that distance h – For each such point pair, subtract the value of Y at point j from the value of Y at point i, and square the difference – Average these squared differences across all point pairs and divide the average by 2. That's your variogram value! • The division by 2 is the reason for the occasionally used name semi-variogram • However, in practice, there will generally be only one pair of points that are exactly h units apart, unless we're dealing with regularly spaced samples. Therefore, we create "bins", or distance ranges, into which we place point pairs with similar distances, and estimate γ only for midpoints of these bins rather than at all individual distances. – These bins are generally of the same size – It's a rule of thumb to have at least 30 point pairs per bin • We call these estimates of γ(h) at the bin midpoints the empirical variogram
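The binned estimate described above can be sketched directly from the definition (a minimal stdlib-only illustration; real work would use a package such as gstat or scikit-gstat):

```python
import math
from collections import defaultdict

def empirical_variogram(coords, values, bin_width, max_lag):
    """Average semivariance per distance bin (Matheron-style estimator)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):          # each unordered pair once
            h = math.dist(coords[i], coords[j])
            if h > max_lag:                # ignore separations beyond the bandwidth
                continue
            b = int(h // bin_width)        # which bin this pair falls into
            sums[b] += 0.5 * (values[i] - values[j]) ** 2
            counts[b] += 1
    # bin midpoint -> (average semivariance, number of point pairs)
    return {(b + 0.5) * bin_width: (sums[b] / counts[b], counts[b])
            for b in sorted(counts)}
```

For three collinear points at x = 0, 1, 2 with values 0, 1, 2, the two lag-1 pairs average to γ = 0.5 and the single lag-2 pair gives γ = 2.0.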
  • 141. 145 Fitting a Variogram Model • Now, we're going to fit a variogram model (i.e., curve) to the empirical variogram • That is, based on the shape of the empirical variogram, different variogram curves might be fit • The curve fitting generally employs the method of least squares – the same method that's used in regression analysis A very comprehensive guide on variography by Dr. Tony Smith (University of Pennsylvania) http://www.seas.upenn.edu/~ese502/NOTEBOOK/Part_II/4_Variograms.pdf
  • 142. 146 The Variogram Parameters • The variogram models are a function of three parameters, known as the range, the sill, and the nugget. – The range r is typically the value of h at which the correlation between point values reaches zero (i.e., there is no longer any spatial autocorrelation) – The value of γ at r is called the sill, and is generally denoted by s • The variance of the sample is used as an estimate of the sill – Different models have slightly different definitions of these parameters – The nugget deserves a slide of its own Graph taken from: http://www.geog.ubc.ca/courses/geog570/talks_2001/Variogr1neu.gif
  • 143. 147 Spatial Independence at Small Distances • Even though we assume that values at points that are very near each other are correlated, points that are separated by very, very small distances might be considerably less correlated – E.g.: you might find a gold nugget and no more gold in the vicinity • In other words, even though γ(0) is always 0, γ at very, very small distances will be equal to a value a that is considerably greater than 0. • This value a is called the nugget • The ratio of the nugget to the sill is known as the nugget effect, and may be interpreted as the percentage of variation in the data that is not spatial • The difference between the sill and the nugget is known as the partial sill – The partial sill, and not the sill itself, is reported in GeoStatistical Analyst
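The two derived quantities are simple arithmetic on the fitted parameters; a one-line sketch of each:

```python
def nugget_effect(nugget, sill):
    """Fraction of total variation that is not spatially structured."""
    return nugget / sill

def partial_sill(nugget, sill):
    """The spatially structured part of the variance: sill minus nugget."""
    return sill - nugget
```

For example, a nugget of 2 against a sill of 10 means 20% of the variation is non-spatial and the partial sill is 8.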
  • 144. 148 Pure Nugget Effect Variograms • Pure nugget effect is when the covariance between point values is zero at all distances h • That is, there is absolutely no spatial autocorrelation in the data (even at small distances) • Pure nugget effect covariogram and variogram are presented below • Interpolation won't give reasonable predictions • Most cases are not as extreme and have both a spatially dependent and a spatially independent component, regardless of variogram model chosen (discussed on following slides)
  • 145. 149 The Spherical Model • The spherical model is the most widely used variogram model • Monotonically non-decreasing – I.e., as h increases, γ(h) never decreases: it increases while h ≤ r and stays constant for h > r • γ(h≥r) = s and C(h≥r) = 0 – That is, covariance is assumed to be exactly zero at distances h ≥ r
  • 146. 150 The Exponential Model • The exponential variogram looks very similar to the spherical model, but assumes that the correlation never reaches exactly zero, regardless of how great the distances between points are • In other words, the variogram approaches the value of the sill asymptotically • Because the sill is never actually reached, the range is generally considered to be the smallest distance after which the covariance is 5% or less of the maximum covariance • The model is monotonically increasing – I.e., as h goes up, so does γ(h)
  • 147. 151 The Wave (AKA Hole-Effect) Model On the picture to the left, the waves exhibit a periodic pattern. A non-standard form of spatial autocorrelation applies. Peaks are similar in values to other peaks, and troughs are similar in values to other troughs. However, note the dampening in the covariogram and variogram below: peaks that are closer together have values that are more correlated than peaks that are farther apart (and the same holds for troughs). More is said about the applicability of these models in http://www.gaa.org.au/pdf/gaa_pyrcz_deutsch.pdf Variogram graph edited slightly from: http://www.seas.upenn.edu/~ese502/NOTEBOOK/Part_II/4_Variograms.pdf
  • 151. The empirical variogram • To summarize the variogram cloud, group the separations into lags (separation bins, like a histogram) • Then, compute the average semivariance of all the point-pairs in the bin. • This is the empirical variogram, as the so-called Matheron estimator: γ̂(h) = 1 / (2 m(h)) · Σ i=1..m(h) [z(xi) − z(xi + h)]² – m(h) is the number of point pairs separated by vector h, in practice some range (bin) – These are indexed by i; the notation z(xi + h) means the "tail" of point-pair i, i.e. separated from the "head" xi by the separation vector h. 155 Unit 5: Variogram Modeling/analysis:
  • 153. Defining the bins • There are some practical considerations, just like defining bins for a histogram: – Each bin should have enough points to give a robust estimate of the representative semi-variance; otherwise the variogram is erratic; – If a bin is too wide, the theoretical variogram model will be hard to estimate and fit; note we haven't seen this yet, it is in the next lecture; – The largest separation should not exceed half the longest separation in the dataset; – In general the largest separation should be somewhat shorter, since it is the local spatial dependence which is most interesting. • All computer programs that compute variograms use some defaults for the largest separation and number of bins; gstat uses 1/3 of the longest separation, and divides this into 15 equal-width bins. 157 Unit 5: Variogram Modeling/analysis:
  • 154. Numerical example of an empirical variogram • Here is an empirical variogram of log10Pb from the Jura soil samples; for simplicity the maximum separation was set to 1.5 km: – np are the number of point-pairs in the bin; dist is the average separation of these pairs; gamma is the average semivariance in the bin. 158 Unit 5: Variogram Modeling/analysis:
  • 156. Plotting the empirical variogram • This can be plotted as semivariance gamma against average separation dist, along with the number of points that contributed to each estimate np. 160 Unit 5: Variogram Modeling/analysis:
  • 159. Features of the empirical variogram • Later we will look at fitting a theoretical model to the empirical variogram; but even without a model we can notice some features which characterize the spatial dependence, which we define here only qualitatively: – Sill: maximum semi-variance • represents variability in the absence of spatial dependence – Range: separation between point-pairs at which the sill is reached • distance at which there is no evidence of spatial dependence – Nugget: semi-variance as the separation approaches zero • represents variability at a point that can't be explained by spatial structure 163 Unit 5: Variogram Modeling/analysis:
  • 160. Semivariogram (figure: semivariance vs. lag in m, with the sill, range, and nugget annotated) Unit 5: Variogram Modeling/analysis:
  • 161. Semivariogram (figure: the same plot, split into its spatially dependent and spatially independent components) Unit 5: Variogram Modeling/analysis:
  • 162. Semivariogram uses • Use the range to determine maximum sampling distances • The sill indicates intra-field variability • The model can be used for interpolation of values in unsampled areas Unit 5: Variogram Modeling/analysis:
  • 165. Effect of bin width • The same set of points can be displayed with many bin widths • This has the same effect as different bin widths in a univariate histogram: same data, different visualization • In addition, visual and especially automatic variogram fitting is affected • Wider (fewer) bins → less detail, also less noise • Narrower (more) bins → more detail, but also more noise • General rule: – as narrow as possible (detail) without "too much" noise; – and with sufficient point-pairs per bin (> 100, preferably > 200) 169 Unit 5: Variogram Modeling/analysis:
  • 167. Evidence of spatial dependence • The empirical variogram provides evidence that there is local spatial dependence. • The variability between point-pairs is lower if they are closer to each other; i.e. the separation is small. • There is some distance, the range, where this effect is noted; beyond the range there is no dependence. • The relative magnitudes of the total sill and nugget give the strength of the local spatial dependence; the nugget represents completely unexplained variability. • There are of course variables for which there is no spatial dependence, in which case the empirical variogram has the sill equal to the nugget; this is called a pure nugget effect • The next graph shows an example. 171 Unit 5: Variogram Modeling/analysis:
  • 169. Visualizing anisotropy • Anisotropy • Variogram surfaces • Directional variograms 173 Unit 5: Variogram Modeling/analysis:
  • 170. What? • We have been considering spatial dependence as if it is the same in all directions from a point (isotropic or omnidirectional). • For example, if I want to know the weather at a point where there is no station, I can equally consider stations at some distance from my location, no matter whether they are N, S, E or W. • But this is self-evidently not always true! In this example, suppose the winds almost always blow from the North. Then the temperatures recorded at stations 100 km to the N or S of me will likely be closer to the temperature at my station than temperatures recorded at stations 100 km to the E or W. • We now see how to detect anisotropy. 174 Unit 5: Variogram Modeling/analysis:
  • 171. Anisotropy • Greek "iso" + "tropic" = English "same" + "trend"; Greek "an-" = English "not-" • Variation may depend on direction, not just distance • This is why we refer to the separation vector; up till now this has just meant distance, but now it includes direction – Case 1: same sill, different ranges in different directions (geometric, also called affine, anisotropy) – Case 2: same range, sill varies with direction (zonal anisotropy) 175 Unit 5: Variogram Modeling/analysis:
  • 172. Spatial trends • Isotropic – trend is a function of distance from a known (sampled) point only • Anisotropic – trend is a function of both distance and direction from a known point Unit 5: Variogram Modeling/analysis:
  • 173. How can anisotropy arise? • Directional process – Example: sand content in a narrow flood plain: much greater spatial dependence along the axis parallel to the river – Example: population density in a hilly terrain with long, linear valleys • Note that the nugget must logically be isotropic: it is variation at a point (which has no direction) 177 Unit 5: Variogram Modeling/analysis:
  • 174. How do we detect anisotropy? 1. Looking for directional patterns in the post-plot; 2. With a variogram surface, sometimes called a variogram map; 3. Computing directional variograms, where we only consider points separated by a given distance but also in a given horizontal direction from each other. • We can compute different directional variograms and see if they have different structure. 178 Unit 5: Variogram Modeling/analysis:
  • 175. Detecting anisotropy with a variogram surface • One way to see anisotropy is with a variogram surface, sometimes called a variogram map. • This is not a map! but rather a plot of semivariances vs. distance and direction (the separation vector) • Each grid cell shows the semivariance at a given distance and direction separation (lag) • Symmetric by definition, can be read in either direction • A transect from the origin to the margin gives a directional variogram (next visualization technique) 179 Unit 5: Variogram Modeling/analysis:
  • 180. 184 Reviewing Ordinary Kriging • Again, ordinary kriging will: – Give us an estimate of the constant but unknown global structure μ(p), and – Use variography to examine the dependencies among the residuals ε(p) and to create kriging weights. • We calculate the distances between the unknown point at which we want to make a prediction and the measured points that are nearby and use the value of the covariogram for those distances to calculate the weight of each of these surrounding measured points. • The end result is, of course, a continuous prediction surface • Prediction standard errors can also be obtained – this is a surface indicating the accuracy of prediction
  • 181. 185 Universal Kriging • Now, take another example: imagine we have data on the temperature at 100 different weather stations (call them w1..w100) throughout Florida, and we want to predict the values of temperature (T) at every point w in the entire state using these data. • Notation: temperature at point w is denoted by T(w) • We know that temperatures at lower latitudes are expected to be higher. So, T(w) will be expected to vary with latitude – Ordinary kriging is not appropriate here, because it assumes that the global structure is the same everywhere. This is clearly not the case here. – A method called universal kriging allows for a non-constant global structure • We might model the global structure μ as in regression: μ(w) = β0 + β1 · latitude(w) • Everything else in universal kriging is pretty much the same as in ordinary kriging (e.g., variography)
  • 182. 186 Some More Advanced Techniques • Indicator Kriging is a geostatistical interpolation method that does not require the data to be normally distributed. • Co-kriging is an interpolation technique that is used when there is a second variable that is strongly correlated with the variable from which we're trying to create a surface, and which is sampled at the same set of locations as our variable of interest and at a number of additional locations. • For more details on indicator kriging and co-kriging, see one of the texts suggested at the end of this presentation
  • 183. 187 Isotropy vs. Anisotropy • When we use isotropic (or omnidirectional) covariograms, we assume that the covariance between the point values depends only on distance – Recall the covariance stationarity assumption • Anisotropic (or directional) covariograms are used when we have reason to believe that direction plays a role as well (i.e., covariance is a function of both distance and direction) – E.g., in some problems, accounting for direction is appropriate (e.g., when wind or water currents might be a factor) For more on anisotropic variograms, see http://web.as.uky.edu/statistics/users/yzhen8/STA695/lec05.pdf
  • 184. 188 IDW vs. Kriging • We get a more "natural" look to the data with Kriging • You see the "bulls eye" effect in IDW but not (as much) in Kriging • Kriging helps to compensate for the effects of data clustering, assigning individual points within a cluster less weight than isolated data points (or, treating clusters more like single points) • Kriging also gives us a standard error • If the data locations are quite dense and uniformly distributed throughout the area of interest, we will get decent estimates regardless of which interpolation method we choose. • On the other hand, if the data locations fall in a few clusters and there are gaps in between these clusters, we will obtain pretty unreliable estimates regardless of whether we use IDW or Kriging. These are interpolation results using the gold data in Western PA (IDW vs. Ordinary Kriging)
  • 185. 6. KRIGING (SPATIAL ESTIMATION) 189
  • 186. Why go beyond the interpolation methods seen so far • In the next units we will look at ordinary kriging • Ordinary kriging is "linear" because – its estimates are weighted linear combinations of the available data; it is "unbiased" since it tries to have mR, the mean residual or error, equal to 0; – it is "best" because it aims at minimizing σ²R (the variance of the errors). • All of the other estimation methods we have seen so far are also linear and, as we have already seen, are also theoretically unbiased. • The distinguishing feature of ordinary kriging, therefore, is its aim of minimizing the error variance. 190 Unit 6: Kriging (spatial estimation)
  • 187. How to deal with error • The importance of this for ordinary kriging is that – we never know mR and therefore cannot guarantee that it is exactly 0. – Nor do we know σ²R; therefore, we cannot minimize it. • The best we can do is to build a model of the data we are studying and work with the average error and the error variance for the model. 191 Unit 6: Kriging (spatial estimation)
  • 188. variance • In ordinary kriging, we use a probability model in which the bias and the error variance can both be calculated, and then choose weights for the nearby samples that ensure that the average error for our model, mR, is exactly 0 and that our modeled error variance, σ²R, is minimized. 192 Unit 6: Kriging (spatial estimation)
  • 189. ordinary kriging system • This system of equations, often referred to as the ordinary kriging system, can be written in matrix notation as C · w = D, where C is the matrix of covariances between all pairs of sample locations (augmented with a row and column of ones for the unbiasedness constraint), w is the vector of kriging weights (plus the Lagrange multiplier), and D is the vector of covariances between each sample location and the location being estimated. 193 Unit 6: Kriging (spatial estimation)
  • 190. weights • To solve for the weights, we multiply the previous equation on both sides by C⁻¹, the inverse of the left-hand side covariance matrix: w = C⁻¹ · D 194 Unit 6: Kriging (spatial estimation)
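This solve can be sketched numerically in a few lines (the exponential covariance and its parameters here are illustrative assumptions, not the values used in the slides' worked example):

```python
import math
import numpy as np

def exp_cov(h, c1=10.0, a=10.0):
    """Illustrative exponential covariance: C(0) = c1, decaying with distance."""
    return c1 * math.exp(-3.0 * h / a)

def ok_weights(sample_xy, target_xy, cov=exp_cov):
    """Solve the ordinary kriging system C w = D (with a Lagrange multiplier)."""
    n = len(sample_xy)
    # Augmented covariance matrix: last row/column of ones enforces sum(w) = 1
    C = np.ones((n + 1, n + 1))
    C[n, n] = 0.0
    for i in range(n):
        for j in range(n):
            C[i, j] = cov(math.dist(sample_xy[i], sample_xy[j]))
    # D: covariances between each sample and the point being estimated
    D = np.ones(n + 1)
    D[:n] = [cov(math.dist(p, target_xy)) for p in sample_xy]
    w = np.linalg.solve(C, D)
    return w[:n], w[n]  # kriging weights and the Lagrange multiplier
```

The estimate is then the weighted sum of the sample values; the constraint row forces the weights to sum to 1, which is what makes the estimator unbiased.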
  • 191. An Example of Ordinary Kriging • Let us return to the seven sample data configuration we used earlier to see a specific example of how ordinary kriging is done. The data configuration is shown again in next slides; we have labelled the point we are estimating as location 0, and the sample locations as 1 through 7. The coordinates of these eight points are given in Table following the figure, along with the available sample values. 195 Unit 6: Kriging (spatial estimation)
  • 192. An example of a data configuration 196 • An example of a data configuration to illustrate the kriging estimator. • The sample value is given immediately to the right of the plus sign. Unit 6: Kriging (spatial estimation)
  • 193. 197 Coordinates and sample values for the data shown in previous Figure Unit 6: Kriging (spatial estimation)
  • 194. Pattern of spatial continuity • To calculate the ordinary kriging weights, we must first decide what pattern of spatial continuity we want our random function model to have. 198 Unit 6: Kriging (spatial estimation)
  • 195. covariances • To keep this example relatively simple, we will calculate all of our covariances from the following function: 199 An example of an exponential covariance function . Unit 6: Kriging (spatial estimation)
  • 196. variogram • The covariance function corresponds to the following variogram: 200 An example of an exponential variogram model Unit 6: Kriging (spatial estimation)
  • 197. Remark on the covariance & variogram model Both of these functions, shown in the previous two slides, can be described by the following parameters: – C0 • commonly called the nugget effect • provides a discontinuity at the origin. – a • commonly called the range • provides a distance beyond which the variogram or covariance value remains essentially constant. – C0 + C1 • commonly called the sill • is the variogram value for very large distances, γ(∞); it is also the covariance value for |h| = 0, and the variance of our random variables, σ². 201 Unit 6: Kriging (spatial estimation)
  • 198. • Geostatisticians normally define the spatial continuity in their random function model through the variogram and solve the ordinary kriging system using the covariance. In this example, we will use the covariance function throughout. • By using the covariance function, we have chosen to ignore the possibility of anisotropy for the moment; the covariance between the data values at any two locations will depend only on the distance between them and not on the direction. Later, when we examine the effect of the various parameters, we will also study the important effect of anisotropy. 202 Unit 6: Kriging (spatial estimation)
  • 199. 203 A table of distances, from the previous Figure, between all possible pairs of the seven data locations. Unit 6: Kriging (spatial estimation)
  • 200. • To demonstrate how ordinary kriging works, we will use the following parameters for the function given in the following Equation: 204 Unit 6: Kriging (spatial estimation)
  • 201. • These are not necessarily good choices, but they will make the details of the ordinary kriging procedure easier to follow since our covariance model now has a quite simple expression: 205 Unit 6: Kriging (spatial estimation)
  • 202. • Having chosen a covariance function from which we can calculate all the covariances required for our random function model, we can now build the C and D matrices. 206 Unit 6: Kriging (spatial estimation)
  • 203. 207 Using Table, which provides the distances between every pair of locations, and Equation above, the C matrix is Unit 6: Kriging (spatial estimation)
  • 204.–206. 208–210 (Figures: the numerical C matrix, the D vector, and the inverse C⁻¹ for the seven-sample example.)
  • 207. 211 The set of weights that will provide unbiased estimates with a minimum estimation variance is calculated by multiplying C⁻¹ by D:
  • 208. 212 The ordinary kriging weights for the seven samples using the isotropic exponential covariance model given in Equation below. The sample value is given immediately to the right of the plus sign while the kriging weights are shown in parenthesis.
  • 209. • Below are shown the sample values along with their corresponding weights. The resulting estimate is 213
  • 210. 214 the minimized error variance expressed as
  • 212. Detailed exercise • Refer to the practical exercise on – interpolation using IDW – kriging 216
  • 213. Spatial Interpolation: A Brief Introduction Eugene Brusilovskiy
  • 214. 218 • Introduction to interpolation • Deterministic interpolation methods • Some basic statistical concepts • Autocorrelation and First Law of Geography • Geostatistical Interpolation – Introduction to variography – Kriging models General Outline
  • 215. 219 What is Interpolation? • Assume we are dealing with a variable which has meaningful values at every point within a region (e.g., temperature, elevation, concentration of some mineral). Then, given the values of that variable at a set of sample points, we can use an interpolation method to predict values of this variable at every point – For any unknown point, we take some form of weighted average of the values at surrounding points to predict the value at the point where the value is unknown – In other words, we create a continuous surface from a set of points – As an example used throughout this presentation, imagine we have data on the concentration of gold in western Pennsylvania at a set of 200 sample locations: Input Process Output
  • 216. 220 Appropriateness of Interpolation • Interpolation should not be used when there isn’t a meaningful value of the variable at every point in space (within the region of interest) • That is, when points represent merely the presence of events (e.g., crime), people, or some physical phenomenon (e.g., volcanoes, buildings), interpolation does not make sense. • Whereas interpolation tries to predict the value of your variable of interest at each point, density analysis (available, for instance, in ArcGIS’s Spatial Analyst) “takes known quantities of some phenomena and spreads it across the landscape based on the quantity that is measured at each location and the spatial relationship of the locations of the measured quantities”. – Source: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=Un derstanding_density_analysis
  • 217. 221 Interpolation vs. Extrapolation • Interpolation is prediction within the range of our data – E.g., having temperature values for a bunch of locations all throughout PA, predict the temperature values at all other locations within PA • Note that the methods we are talking about are strictly those of interpolation, and not extrapolation • Extrapolation is prediction outside the range of our data – E.g., having temperature values for a bunch of locations throughout PA, predict the temperature values in Kazakhstan
  • 218. 222 First Law of Geography • “Everything is related to everything else, but near things are more related than distant things.” – Waldo Tobler (1970) • This is the basic premise behind interpolation, and near points generally receive higher weights than far away points Waldo Tobler Reference: TOBLER, W. R. (1970). "A computer movie simulating urban growth in the Detroit region". Economic Geography, 46(2): 234-240.
  • 219. 223 Methods of Interpolation • Deterministic methods – Use mathematical functions to calculate the values at unknown locations based either on the degree of similarity (e.g. IDW) or the degree of smoothing (e.g. RBF) in relation with neighboring data points. – Examples include: • Inverse Distance Weighted (IDW) • Radial Basis Functions (RBF) • Geostatistical methods – Use both mathematical and statistical methods to predict values at all locations within region of interest and to provide probabilistic estimates of the quality of the interpolation based on the spatial autocorrelation among data points. • Include a deterministic component and errors (uncertainty of prediction) – Examples include: • Kriging • Co-Kriging Reference: http://www.crwr.utexas.edu/gis/gishydro04/Introduction/TermProjects/Peralvo.pdf
  • 220. 224 Exact vs. Inexact Interpolation • Interpolators can be either exact or inexact – At sampled locations, exact interpolators yield values identical to the measurements. • I.e., if the observed temperature in city A is 90 degrees, the point representing city A on the resulting grid will still have the temperature of 90 degrees – At sampled locations, inexact interpolators predict values that are different from the measured values. • I.e., if the observed temperature in city A is 90 degrees, the inexact interpolator will still create a prediction for city A, and this prediction will not be exactly 90 degrees – The resulting surface will not pass through the original point – Can be used to avoid sharp peaks or troughs in the output surface • Model quality can be assessed by the statistics of the differences between predicted and measured values – Jumping ahead, the two deterministic interpolators that will be briefly presented here are exact. Kriging can be exact or inexact. Reference: Burrough, P. A., and R. A. McDonnell. 1998. Principles of geographical information systems. Oxford University Press, Oxford. 333pp.
  • 221. 225 Part 1. Deterministic Interpolation
  • 222. 226 Inverse Distance Weighted (IDW) • IDW interpolation explicitly relies on the First Law of Geography. To predict a value for any unmeasured location, IDW will use the measured values surrounding the prediction location. Measured values that are nearest to the prediction location will have greater influence (i.e., weight) on the predicted value at that unknown point than those that are farther away. – Thus, IDW assumes that each measured point has a local influence that diminishes with distance (or distance to the power of q > 1), and weighs the points closer to the prediction location greater than those farther away, hence the name inverse distance weighted. • Inverse Squared Distance (i.e., q=2) is a widely used interpolator • For example, ArcGIS allows you to select the value of q. • Weights of each measured point are proportional to the inverse distance raised to the power value q. As a result, as the distance increases, the weights decrease rapidly. How fast the weights decrease is dependent on the value for q. Source: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=How_Inverse_Distance_Weighted_(IDW)_interpolation_works
  • 223. 227 Inverse Distance Weighted - Continued • Because things that are close to one another are more alike than those farther away, as the locations get farther away, the measured values will have little relationship with the value of the prediction location. – To speed up the computation we might only use several points that are the closest – As a result, it is common practice to limit the number of measured values that are used when predicting the unknown value for a location by specifying a search neighborhood. The specified shape of the neighborhood restricts how far and where to look for the measured values to be used in the prediction. Other neighborhood parameters restrict the locations that will be used within that shape. • The output surface is sensitive to clustering and the presence of outliers.
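The weighting scheme just described can be sketched in a few lines; the coordinates and values below are made up for illustration.

```python
import numpy as np

def idw(known_xy, known_z, target_xy, q=2.0):
    """IDW prediction at one point: weights proportional to 1/d^q."""
    d = np.linalg.norm(known_xy - target_xy, axis=1)
    if np.any(d == 0):                # target coincides with a sample:
        return known_z[d == 0][0]     # exact interpolator returns the measured value
    w = 1.0 / d**q
    return np.sum(w * known_z) / np.sum(w)

xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([10.0, 20.0, 20.0, 30.0])
print(idw(xy, z, np.array([0.5, 0.5])))   # equidistant from all four samples → 20.0
```

Raising q concentrates the weight on the nearest samples; the early return at distance zero is what makes IDW an exact interpolator.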
  • 224. 228 Search Neighborhood Specification Points with known values of elevation that are outside the circle are just too far from the target point at which the elevation value is unknown, so their weights are pretty much 0. 5 nearest neighbors with known values (shown in red) of the unknown point (shown in black) will be used to determine its value
  • 225. 229 The Accuracy of the Results • One way to assess the accuracy of the interpolation is known as cross-validation – Remember the initial goal: use all the measured points to create a surface – However, assume we remove one of the measured points from our input, and re-create the surface using all the remaining points. – Now, we can look at the predicted value at that removed point and compare it to the point’s actual value! – We do the same thing for all the points – If the average (squared) difference between the actual value and the prediction is small, then our model is doing a good job at predicting values at unknown points. If this average squared difference is large, then the model isn’t that great. This average squared difference is called mean square error of prediction. For instance, the Geostatistical Analyst of ESRI reports the square root of this average squared difference – Cross-validation is used in other interpolation methods as well
• 226. 230 A Cross-Validation Example • Assume you have measurements at 15 data points, from which you want to create a prediction surface • The Measured column tells you the measured value at that point. The Predicted column tells you the prediction at that point when we remove it from the input (i.e., use the other 14 points to create a surface). The Error column is simply the difference between the measured and predicted values. • Because we can have an over-prediction or under-prediction at any point, the error can be positive or negative. So averaging the errors won’t do us much good if we want to see the overall error – we’ll end up with a value that is essentially zero due to these positives and negatives • Thus, in order to assess the extent of error in our prediction, we square each term, and then take the average of these squared errors. This average is called the mean squared error (MSE) • For example, ArcGIS reports the square root of this mean squared error (referred to simply as Root-Mean-Square in Geostatistical Analyst). This root mean square error is often denoted as RMSE.
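The leave-one-out procedure above can be sketched with IDW as the interpolator. The 15 sample points below are synthetic, drawn from a hypothetical smooth surface, purely for illustration.

```python
import numpy as np

def idw_predict(xy, z, target, q=2.0):
    d = np.linalg.norm(xy - target, axis=1)
    w = 1.0 / np.maximum(d, 1e-12) ** q
    return np.sum(w * z) / np.sum(w)

def loo_rmse(xy, z, q=2.0):
    # Leave-one-out cross-validation: drop each point, predict it from the rest
    errors = []
    for i in range(len(z)):
        mask = np.arange(len(z)) != i
        errors.append(z[i] - idw_predict(xy[mask], z[mask], xy[i], q=q))
    return np.sqrt(np.mean(np.square(errors)))   # root-mean-square error (RMSE)

rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(15, 2))
z = xy[:, 0] + xy[:, 1] + rng.normal(0, 0.1, 15)   # a hypothetical smooth surface
rmse = loo_rmse(xy, z, q=2.0)
print(rmse)
```

Averaging the squared errors (rather than the raw errors) is what keeps the positive and negative residuals from canceling out.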
• 227. 231 Examples of IDW with Different q’s • Smaller q’s (i.e., powers to which distance is raised) yield smoother surfaces, while larger q’s make the prediction approach the value of the nearest sample point • Food for thought: What happens when q is set to 0? Gold concentrations at locations in western PA q = 1 q=2 q=3 q=10 The Geostatistical Analyst of ArcGIS is able to tell you the optimal value of q by seeing which one yields the minimum RMSE. (Here, it is q=1).
  • 228. 232 Part 2. A Review of Stats 101
  • 229. 233 Before we do any Geostatistics… • … Let’s review some basic statistical topics: – Normality – Variance and Standard Deviations – Covariance and Correlation • … and then briefly re-examine the underlying premise of most spatial statistical analyses: – Autocorrelation
  • 230. 234 Normality • A lot of statistical tests – including many in geostatistics – rely on the assumption that the data are normally distributed • When this assumption does not hold, the results are often inaccurate
  • 232. 236 Data Transformations • Sometimes, it is possible to transform a variable’s distribution by subjecting it to some simple algebraic operation. – The logarithmic transformation is the most widely used to achieve normality when the variable is positively skewed (as in the image on the left below) – Analysis is then performed on the transformed variable.
  • 233. 237 The Mean and the Variance • The mean (average) of a variable is also known as the expected value – Usually denoted by the Greek letter μ – As an aside, for a normally distributed variable, the mean is equal to the median • The variance is a measure of dispersion of a variable – Calculated as the average squared distance of the possible values of the variable from mean. – Standard deviation is the square root of the variance – Standard deviation is generally denoted by the Greek letter σ, and variance is therefore denoted by
  • 234. 238 Example: Calculation of Mean and Variance Person Test Score Distance from the Mean (Distance from the Mean) Squared 1 90 15 225 2 55 -20 400 3 100 25 625 4 55 -20 400 5 85 10 100 6 70 -5 25 7 80 5 25 8 30 -45 2025 9 95 20 400 10 90 15 225 Mean: 75 Variance: 445 (Average of the entries in this column) Standard deviation (Square root of the variance): 21.1
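The numbers in the table above can be checked directly:

```python
scores = [90, 55, 100, 55, 85, 70, 80, 30, 95, 90]

mean = sum(scores) / len(scores)
# Population variance: average squared distance from the mean
variance = sum((s - mean) ** 2 for s in scores) / len(scores)
std_dev = variance ** 0.5

print(mean, variance, round(std_dev, 1))   # 75.0 445.0 21.1
```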
  • 235. 239 Covariance and Correlation • Defined as a measure of how much two variables X and Y change together – The units of Cov (X, Y) are those of X multiplied by those of Y – The covariance of a variable X with itself is simply the variance of X • Since these units are fairly obscure, a dimensionless measure of the strength of the relationship between variables is often used instead. This measure is known as the correlation. – Correlations range from -1 to 1, with positive values close to one indicating a strong direct relationship and negative values close to -1 indicating a strong inverse relationship
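A minimal sketch of both quantities, with made-up data chosen so the correlation is exactly 1:

```python
import math

def covariance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def correlation(x, y):
    # Dimensionless: covariance scaled by the two standard deviations
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]            # y is perfectly linear in x
print(covariance(x, x))              # Cov(X, X) is just Var(X): 1.25
print(correlation(x, y))             # strong direct relationship: 1.0
```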
  • 236. 240 Spatial Autocorrelation • Sometimes, rather than examining the association between two variables, we might look at the relationship of values within a single variable at different time points or locations • There is said to be (positive) autocorrelation in a variable if observations that are closer to each other in space have related values (recall Tobler’s Law) • As an aside, there could also be temporal autocorrelation – i.e., values of a variable at points close in time will be related
  • 237. 241 Examples of Spatial Autocorrelation (Source: http://image.weather.com/images/maps/current/acttemp_720x486.jpg)
  • 238. 242 Examples of Spatial Autocorrelation (Cont’d) (Source: http://capita.wustl.edu/CAPITA/CapitaReports/localPM10/gifs/elevatn.gif)
  • 239. 243 Regression • A statistical method used to examine the relationship between a variable of interest and one or more explanatory variables – Strength of the relationship – Direction of the relationship • Often referred to as Ordinary Least Squares (OLS) regression • Available in all statistical packages • Note that the presence of a relationship does not imply causality
  • 240. 244 For the purposes of demonstration, let’s focus on a simple version of this problem • Variable of interest (dependent variable) – E.g., education (years of schooling) • Explanatory variable (AKA independent variable or predictor): – E.g., Neighborhood Income
  • 241. 245 But what does a regression do? An example with a single predictor
• 242. 246 The example on the previous page can be easily extended to cases when we have more than one predictor • When we have n > 1 predictors, rather than getting a line in 2 dimensions, we get a (hyper)plane in n + 1 dimensions (the ‘+1’ accounts for the dependent variable) • Each independent variable will have its own slope coefficient which will indicate the relationship of that particular predictor with the dependent variable, controlling for all other independent variables in the regression. • The equation of the best-fit line becomes Dep. Variable = m1*predictor1 + m2*predictor2 + … + mn*predictorn + b + residuals, where the m’s are the coefficients of the corresponding predictors and b is the y-intercept term • The coefficient of each predictor may be interpreted as the amount by which the dependent variable changes as that independent variable increases by one unit (holding all other variables constant)
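The multi-predictor equation above can be sketched with ordinary least squares. The data are simulated with known coefficients (m1 = 3, m2 = −2, b = 5) so that the fitted values can be checked against the truth.

```python
import numpy as np

# Simulated data: two predictors with known true coefficients
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(0, 0.5, 100)

# Append a column of ones so the intercept b is estimated too
A = np.column_stack([X, np.ones(len(y))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares fit
m1, m2, b = coef
residuals = y - A @ coef

# R-squared: share of the variance in y explained by the predictors
r2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean())**2)
print(m1, m2, b, r2)
```

Each fitted slope recovers its true value (to within sampling noise) while controlling for the other predictor, which is exactly the interpretation given in the slide.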
  • 243. 247 Some (Very) Basic Regression Diagnostics • R-squared: the percent of variance in the dependent variable that is explained by the independent variables • The so-called p-value of the coefficient – The probability of getting a coefficient (slope) value as far from zero as we observe in the case when the slope is actually zero – When p is less than 0.05, the independent variable is considered to be a statistically significant predictor of the dependent variable – One p-value per independent variable • The sign of the coefficient of the independent variable (i.e., the slope of the regression line) – One coefficient per independent variable – Indicates whether the relationship between the dependent and independent variables is positive or negative – We should look at the sign when the coefficient is statistically significant
  • 244. 248 Some (but not all) regression assumptions 1. The dependent variable should be normally distributed (i.e., the histogram of the variable should look like a bell curve) 2. Very importantly, the observations should be independent of each other. (The same holds for regression residuals). If this assumption is violated, our coefficient estimates could be wrong!
  • 245. 249 Part 3. Geostatistical Interpolation
  • 246. 250 Some Widely Used Texts on Geostatistics – Bailey, T.C. and Gatrell, A.C. (1995) Interactive Spatial Data Analysis. Addison Wesley Longman, Harlow, Essex. – Cressie, N.A.C. (1993) Statistics for Spatial Data. (Revised Edition). Wiley, John & Sons, Inc., – Isaaks, E.H. and Srivastava, R.M. (1989) An Introduction to Applied Geostatistics. Oxford University Press, New York, 561 p.
  • 247. Ecosystems are: Hierarchically structured, Metastable, Far from equilibrium Spatial Relationships Theoretical Framework: “An Introduction to Applied Geostatistics“, E. Isaaks and R. Srivastava, (1989). “Factorial Analysis”, C. J. Adcock, (1954) “Spatial Analysis: A guide for ecologists”, M. Fortin and M. Dale, (2005)
  • 249. Time
• 250. Basic paradigm: Ecosystem processes (change) are constrained and controlled by the pattern of hierarchical scales. “Things” closer together (in both space and time) are more alike than things far apart – “Tobler’s Law” (1970, Economic Geography): “Everything is related to everything else, but near things are more related than distant things.” Ecological “scale” is the space and time “distance” apart (lag) at which significant variation is NO LONGER correlated with “distance”.
• 251. Applied Geostatistics: Spatial Structure, Regionalized Variable, Spatial Autocorrelation, Moran I (1950), Geary C (1954), Semivariance, Stationarity, Anisotropy
• 252. Applied Geostatistics: Notes on Introduction to Spatial Autocorrelation. Geostatistical methods were developed for interpreting data that vary continuously over a predefined, fixed spatial region {z(i) : i ∈ D}. The study of geostatistics assumes that at least some of the spatial variation observed for natural phenomena can be modeled by random processes with spatial autocorrelation. Geostatistics is based on the theory of regionalized variables, i.e., variables distributed in space (or time). Geostatistical theory holds that any measurement of a regionalized variable can be viewed as a realization of a random function (also called a random process, random field, or stochastic process).
  • 253. Spatial Structure Geostatistical techniques are designed to evaluate the spatial structure of a variable, or the relationship between a value measured at a point in one place, versus a value from another point measured a certain distance away. Describing spatial structure is useful for:  Indicating intensity of pattern and the scale at which that pattern is exposed  Interpolating to predict values at unmeasured points across the domain (e.g. kriging)  Assessing independence of variables before applying parametric tests of significance
• 254. Regionalized Variable. Regionalized variables take on values according to spatial location. Given a variable z, measured at a location i, the variability in z can be broken down into three components: z(i) = f(i) + s(i) + ε. Where: f(i) is a “structural” coarse-scale forcing or trend, usually removed by detrending (coarse-scale forcing or trends can be removed by fitting a surface to the trend using regression and then working with the regression residuals); s(i) is a “random” local spatial dependency — what we are interested in; and ε is the error variance (considered normally distributed).
• 255. Regionalized Variable Zᵢ. The function Z in domain D is a set of space-dependent values {z(i) : i ∈ D}, sampled at locations Z₁, Z₂, …, Zᵢ, …, Zₙ, with a histogram of the sample values zᵢ. Variables are spatially correlated; therefore Z(x+h) can be estimated from Z(x) by using a regression model, via the covariance Cov(Z(x), Z(x+h)). ** This assumption holds true, with a recognized increase in error relative to other least-squares models.
• 256. Correlation: a statement of the extent to which two data sets agree. Calculating correlation by hand from columns x and y produces the data deviations (x − x̄, y − ȳ), the products of the deviations (x·y terms), and the sums of squares (x², y²). The correlation coefficient is determined by the extent to which the two regression lines depart from the horizontal and vertical: with tan θ = a/b, the agreement strengthens as θ decreases and a/b goes to 0 (the two distributions collapse toward one).
• 257. Correlation Coefficient: r = [ (1/n) Σᵢ (xᵢ − x̄)(yᵢ − ȳ) ] / [ √( (1/n) Σᵢ (xᵢ − x̄)² ) · √( (1/n) Σᵢ (yᵢ − ȳ)² ) ]. Spatial autocorrelation replaces the second variable with spatially weighted neighboring values of x: I = [ N Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) ] / [ (Σᵢ Σⱼ wᵢⱼ) Σᵢ (xᵢ − x̄)² ]. Briggs UT-Dallas GISC 6382 Spring 2007
  • 258. Spatial Structure Autocorrelation: := Degree of correlation to self Spatial Autocorrelation: := The relationship is a function of distance Spatial Structure which is: Exogenous (induced) … induced external spatial dependence Endogenous (inherent) … inherent spatial autocorrelation Spatial Dependence: Compare values at given distance apart -- LAGS A B C D Point – Point Autocorrelation A - B Positive A - C None A - D Negative Direction of Autocorrelation: Anisotropic := varies in intensity and range with orientation Isotropic := varies similarly in all directions
• 259. Spatial Structure. Given: spatial pattern is an outcome of the synthesis of dynamic processes operating at various spatial and temporal scales. Therefore: structure at any given time is but one realization of several potential outcomes. Assuming: all processes are stationary (homogeneous), where properties are independent of absolute location and direction in space. Therefore: observations are independent, which means they are homoscedastic and form a known distribution: E[Z(i)] = μ and Var[Z(i)] = σ² for all locations i, and Cov[Z(i), Z(j)] depends only on the separation of i and j. Stationarity is a property of the process, NOT the data, allowing spatial inferences. And: stationarity is scale dependent. Furthermore: inference (spatial statistics) applies over regions of assumed stationarity.
• 260. Space: locations A–J with their first-order neighbors and topology. The slide shows a binary connectivity matrix (1 = connected, 0 = not connected) and a distance-class connectivity matrix (entries 1, 2, 3 give the number of links separating each pair of locations), contrasting topological with Euclidean distance.
  • 261. Spatial Autocorrelation Positive autocorrelation: Negative autocorrelation: No autocorrelation: A variable is thought to be autocorrelated if it is possible to predict its value at a given location, by knowing its value at other nearby locations.  Autocorrelation is evaluated using structure functions that assess the spatial structure or dependency of the variable.  Two of these functions are autocorrelation and semivariance which are graphed as a correlogram and semivariogram, respectively.  Both functions plot the spatial dependence of the variable against the spatial separation or lag distance.
• 262. Space: for locations A–J, the slide shows a Euclidean distance matrix (e.g., AB = 2.00, AC = 1.41, BC = 3.16), the same matrix with rounded distances, a binary connectivity matrix, and a weighted matrix (e.g., entries of 0.7).
• 263. Moran I (1950) • A cross-product statistic that is used to describe autocorrelation • Compares the value of a variable at one location with values at all other locations: I(d) = (n / W) · [ Σᵢ Σⱼ wᵢⱼ Zᵢ Zⱼ ] / [ Σᵢ Zᵢ² ]. The numerator is a covariance (cross-product) term; the denominator is a variance term. Where: n is the number of observations; Zᵢ is the deviation from the mean for the value at location i (i.e., Zᵢ = xᵢ − x̄ for variable x); Zⱼ is the deviation from the mean for the value at location j (i.e., Zⱼ = xⱼ − x̄ for variable x); wᵢⱼ is an indicator function or weight at distance d (e.g., wᵢⱼ = 1 if j is in distance class d from point i, otherwise 0); W is the sum of all weights (the number of pairs in the distance class). Values range over [−1, 1]: Value = 1: perfect positive correlation; Value = −1: perfect negative correlation.
• 264. Moran I (1950) Again, where for variable x: n is the number of observations; wᵢⱼ(d) is the distance-class connectivity matrix (e.g., wᵢⱼ = 1 if j is in distance class d from point i, otherwise 0); W(d) is the sum of all weights (the number of pairs in the distance class): I(d) = [ (1 / W(d)) Σᵢ Σⱼ wᵢⱼ(d) (xᵢ − x̄)(xⱼ − x̄) ] / [ (1/n) Σᵢ (xᵢ − x̄)² ]
• 265. Geary C (1954) • A squared-difference statistic for assessing spatial autocorrelation • Considers differences in values between pairs of observations, rather than the covariation between the pairs (Moran I): C(d) = [ (N − 1) Σᵢ Σⱼ wᵢⱼ (yᵢ − yⱼ)² ] / [ 2W Σᵢ Zᵢ² ]. The numerator in this equation is a difference term that gets squared. The Geary C statistic is more sensitive to extreme values and clustering than the Moran I, and behaves like a distance measure: values range over [0, 3]; Value = 0: positive autocorrelation; Value = 1: no autocorrelation; Value > 1: negative autocorrelation.
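A sketch of Geary's C on made-up data: a smooth gradient on a 1-D transect, with binary adjacency weights, yields a value well below 1.

```python
import numpy as np

def gearys_c(x, w):
    """Geary's C: squared differences between neighboring pairs of values."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    W = w.sum()
    num = (n - 1) * np.sum(w * (x[:, None] - x[None, :]) ** 2)
    den = 2.0 * W * np.sum((x - x.mean()) ** 2)
    return num / den

# Made-up data: a smooth gradient on a 1-D transect, neighbors = adjacent cells
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
w = np.zeros((6, 6))
for i in range(5):
    w[i, i + 1] = w[i + 1, i] = 1.0
print(gearys_c(x, w))   # well below 1: positive spatial autocorrelation
```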
• 266. Ripley’s K (1976) and the L(d) transformation: L(d) = √[ A Σᵢ Σⱼ k(i, j) / (π N(N − 1)) ]. Where: A = area; N = number of points; d = distance; k(i, j) = the weight, which is 1 when the distance between i and j is < d, and 0 when it is > d. Determines whether features are clustered at multiple different distances. Sensitive to the study-area boundary. Conceptualized as the “number of points” within a set of circles of increasing radius. If events follow complete spatial randomness, the number of points in a circle follows a Poisson distribution, which defines the “expected” value.
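A naive sketch of K and its L transformation, with no edge correction (so it slightly understates K near the boundary), on a simulated random pattern:

```python
import numpy as np

def ripley_k(xy, d, area):
    """Naive (edge-uncorrected) Ripley's K for points xy in a region of given area."""
    n = len(xy)
    dist = np.linalg.norm(xy[:, None] - xy[None, :], axis=2)
    np.fill_diagonal(dist, np.inf)            # exclude i == j pairs
    count = np.sum(dist <= d)                 # ordered pairs within distance d
    return area * count / (n * (n - 1))

def l_transform(k, d):
    # L(d) = sqrt(K(d)/pi) - d is near 0 under complete spatial randomness
    return np.sqrt(k / np.pi) - d

rng = np.random.default_rng(7)
xy = rng.uniform(0, 10, size=(200, 2))        # an approximately random (CSR) pattern
k = ripley_k(xy, d=1.0, area=100.0)
l_val = l_transform(k, 1.0)
print(l_val)
```

For a clustered pattern L(d) would sit clearly above 0 at the clustering distances; for a regular pattern, below 0.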
• 267. General G: G(d) = Σᵢ Σⱼ wᵢⱼ(d) xᵢ xⱼ / Σᵢ Σⱼ xᵢ xⱼ (for i ≠ j). Where: d = distance class; wᵢⱼ = weight matrix, which is 1 when the distance between i and j is < d, and 0 when it is > d. Effectively distinguishes between “hot” and “cold” spots: G is relatively large if high values cluster, low if low values cluster. The pairs in the numerator are “within” a distance bound (d), expressed relative to the entire study area.
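A sketch of General G with binary distance weights; the points and values are made up so that the two high values sit together and the two low values elsewhere.

```python
import numpy as np

def general_g(x, xy, d):
    """General G with binary weights: wij = 1 if the i-j distance <= d, 0 otherwise."""
    x = np.asarray(x, dtype=float)
    dist = np.linalg.norm(xy[:, None] - xy[None, :], axis=2)
    w = (dist <= d).astype(float)
    np.fill_diagonal(w, 0.0)                 # exclude i == j
    pair = x[:, None] * x[None, :]
    np.fill_diagonal(pair, 0.0)
    return np.sum(w * pair) / np.sum(pair)

# Made-up points: the two high values cluster in one corner
xy = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
x = np.array([10.0, 9.0, 1.0, 1.0])
print(general_g(x, xy, d=2.0))   # relatively large: high values cluster
```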
• 268. Semivariance. The geostatistical measure that describes the rate of change of the regionalized variable is known as the semivariance: γ(d) = (1 / 2n_d) Σᵢ Σⱼ wᵢⱼ (yᵢ − yⱼ)². Where: j is a point at distance d from i; n_d is the number of points in that distance class (i.e., the sum of the weights wᵢⱼ for that distance class); wᵢⱼ is an indicator function set to 1 if the pair of points is within the distance class. Semivariance is used for descriptive analysis, where the spatial structure of the data is investigated using the semivariogram, and for predictive applications, where the semivariogram is fitted to a theoretical model, parameterized, and used to predict the regionalized variable at other, non-measured points (kriging).
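The empirical semivariance per lag class can be sketched as follows; the 50 sample points come from a hypothetical autocorrelated field, and the lag classes and tolerance are illustrative.

```python
import numpy as np

def empirical_semivariogram(xy, z, lags, tol):
    """Average semivariance gamma(d) for each lag class d +/- tol."""
    dist = np.linalg.norm(xy[:, None] - xy[None, :], axis=2)
    sq = (z[:, None] - z[None, :]) ** 2
    gamma = []
    for lag in lags:
        pairs = np.triu(np.abs(dist - lag) <= tol, k=1)   # count each pair once
        gamma.append(0.5 * sq[pairs].mean() if pairs.any() else np.nan)
    return np.array(gamma)

rng = np.random.default_rng(1)
xy = rng.uniform(0, 10, size=(50, 2))
z = np.sin(xy[:, 0]) + rng.normal(0, 0.1, 50)    # hypothetical autocorrelated field
g = empirical_semivariogram(xy, z, lags=[1.0, 2.0, 3.0, 4.0], tol=0.5)
print(g)
```

Plotting g against the lags gives the empirical semivariogram to which the sill, range, and nugget models of the next slides are fitted.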
  • 269. The sill is the value at which the semivariogram levels off (its asymptotic value) The range is the distance at which the semivariogram levels off (the spatial extent of structure in the data) The nugget is the semivariance at a distance 0.0, (the y –intercept) A semivariogram is a plot of the structure function that, like autocorrelation, describes the relationship between measurements taken some distance apart. Semivariograms define the range or distance over which spatial dependence exists.
  • 270. Autocorrelation assumes stationarity, meaning that the spatial structure of the variable is consistent over the entire domain of the dataset. The stationarity of interest is second-order (weak) stationarity, requiring that: (a) the mean is constant over the region (b) variance is constant and finite; and (c) covariance depends only on between-sample spacing  In many cases this is not true because of larger trends in the data  In these cases, the data are often detrended before analysis.  One way to detrend data is to fit a regression to the trend, and use only the residuals for autocorrelation analysis Stationarity
• 271. Anisotropy. Autocorrelation also assumes isotropy, meaning that the spatial structure of the variable is consistent in all directions. Often this is not the case, and the variable exhibits anisotropy, meaning that there is a direction-dependent trend in the data. If a variable exhibits different ranges in different directions, there is a geometric anisotropy: for example, in a dune deposit, the range along the wind direction is larger than the range perpendicular to the wind direction.
• 272. For predictions, the empirical semivariogram is converted to a theoretical one by fitting a statistical model (curve) to describe its range, sill, and nugget. There are four common models used to fit semivariograms: Linear: γ(d) = c₀ + bd (assumes no sill or range); Spherical: γ(d) = c₀ + c[(3d/2a) − (d³/2a³)] for d ≤ a, and γ(d) = c₀ + c for d > a; Exponential: γ(d) = c₀ + c[1 − exp(−d/a)]; Gaussian: γ(d) = c₀ + c[1 − exp(−d²/a²)]. Where: c₀ = nugget; b = regression slope; a = range; c₀ + c = sill.
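The four models can be written directly. The nugget, partial sill, and range values below are illustrative, not fitted to any data.

```python
import numpy as np

def linear(d, c0, b):
    return c0 + b * np.asarray(d, dtype=float)       # no sill or range

def spherical(d, c0, c, a):
    d = np.asarray(d, dtype=float)
    g = c0 + c * (1.5 * d / a - 0.5 * (d / a) ** 3)
    return np.where(d <= a, g, c0 + c)               # flat at the sill beyond the range

def exponential(d, c0, c, a):
    return c0 + c * (1.0 - np.exp(-np.asarray(d, dtype=float) / a))

def gaussian(d, c0, c, a):
    return c0 + c * (1.0 - np.exp(-(np.asarray(d, dtype=float) / a) ** 2))

# Illustrative parameters: nugget c0 = 0.5, partial sill c = 2.0, range a = 10.0
print(float(spherical(0.0, 0.5, 2.0, 10.0)))    # at d = 0: the nugget, 0.5
print(float(spherical(10.0, 0.5, 2.0, 10.0)))   # at d = a: the sill c0 + c, 2.5
```

Fitting in practice means choosing one model and adjusting c₀, c, and a (or b for the linear model) to track the empirical semivariogram.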
• 273. Variogram Modeling Suggestions • Check that there are enough pairs at each lag distance (from 30 to 50). • Remove outliers. • Truncate at half the maximum lag distance to ensure enough pairs. • Use a larger lag tolerance to get more pairs and a smoother variogram. • Start with an omnidirectional variogram before trying directional variograms. • Use other variogram measures to take into account lag means and variances (e.g., inverted covariance, correlogram, or relative variograms). • Use transforms of the data for skewed distributions (e.g., logarithmic transforms). • Use the mean absolute difference or median absolute difference to derive the range.