Chapter 9 – Variogram models
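Given a geostatistical model $Z(\mathbf{s})$, its variogram $\gamma(\mathbf{h})$ is formally defined as

$$\gamma(\mathbf{h}) = \frac{1}{2}\operatorname{var}\left[Z(\mathbf{s}) - Z(\mathbf{u})\right] = \frac{1}{2}\iint \left[Z(\mathbf{s}) - Z(\mathbf{u})\right]^2 f(\mathbf{s}, \mathbf{u})\, d\mathbf{s}\, d\mathbf{u},$$

where $f(\mathbf{s}, \mathbf{u})$ is the joint probability density function of $Z(\mathbf{s})$ and $Z(\mathbf{u})$, and $\mathbf{h} = \mathbf{u} - \mathbf{s}$ is the lag separating the two locations.
For an intrinsic random field, the variogram can be estimated using the method-of-moments estimator:

$$\hat{\gamma}(\mathbf{h}) = \frac{1}{2N(\mathbf{h})}\sum_{i=1}^{N(\mathbf{h})}\left[z(\mathbf{s}_i + \mathbf{h}) - z(\mathbf{s}_i)\right]^2,$$

where $\mathbf{h}$ is the distance separating sample locations $\mathbf{s}_i$ and $\mathbf{s}_i + \mathbf{h}$, and $N(\mathbf{h})$ is the number of distinct data pairs at that separation. In some circumstances, it may be desirable to consider direction in addition to distance. In the isotropic case, $\mathbf{h}$ is written as a scalar $h$, representing magnitude.
Note: In the literature the terms variogram and semivariogram are often used interchangeably. Strictly, $\gamma(\mathbf{h})$ is the semivariogram and $2\gamma(\mathbf{h})$ is the variogram.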
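The moment estimator can be sketched in a few lines of base R. This is a minimal illustration, assuming a regularly spaced 1-D transect so that lags are whole grid steps; the function name `empirical_variogram` and the data values are made up for the example and are not part of geoR.

```r
# Method-of-moments semivariogram for a regularly spaced 1-D transect.
# z: observations ordered along the transect; max_lag: largest lag in grid steps.
empirical_variogram <- function(z, max_lag) {
  n <- length(z)
  lags <- seq_len(max_lag)
  npairs <- integer(max_lag)
  gamma_hat <- numeric(max_lag)
  for (h in lags) {
    d <- z[(h + 1):n] - z[1:(n - h)]        # z(s_i + h) - z(s_i) for every pair at lag h
    npairs[h] <- length(d)                  # N(h)
    gamma_hat[h] <- sum(d^2) / (2 * length(d))
  }
  data.frame(lag = lags, npairs = npairs, gamma = gamma_hat)
}

z <- c(1, 3, 2, 5, 4, 6)                    # made-up transect values
ev <- empirical_variogram(z, max_lag = 2)
print(ev)
```

For irregularly spaced data the same sum is taken over all pairs whose separation falls inside a lag bin, which is what `variog` in geoR does.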
Robust variogram estimator
The variogram is an important tool for describing how spatial data are related with distance. As we have seen, it is defined in terms of the dissimilarity in data values between two locations separated by a distance $\mathbf{h}$. The moment estimator given on the previous slide is sensitive to outliers in the data, so robust estimators are sometimes used. A widely used robust estimator is that of Cressie and Hawkins (1980):

$$\hat{\gamma}(\mathbf{h}) = \frac{1}{2}\,\frac{\left\{\dfrac{1}{N(\mathbf{h})}\displaystyle\sum_{i=1}^{N(\mathbf{h})}\left|z(\mathbf{s}_i + \mathbf{h}) - z(\mathbf{s}_i)\right|^{1/2}\right\}^{4}}{0.457 + 0.494/N(\mathbf{h})}.$$

The motivation behind this estimator is that, for a Gaussian process,

$$\frac{\left[Z(\mathbf{s} + \mathbf{h}) - Z(\mathbf{s})\right]^2}{2\gamma(\mathbf{h})} \sim \chi^2_1.$$

Based on the Box-Cox transformation, the fourth root of a $\chi^2_1$ variable is found to be more nearly normally distributed.
* Cressie, N. and Hawkins, D. M. 1980. Robust estimation of the variogram, I. Journal of the International Association for Mathematical Geology 12:115-125.
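To see the effect of an outlier, the classical and robust estimators can be compared on a small example. This is a sketch in base R, again assuming a regularly spaced 1-D transect; the helper names and the data are hypothetical.

```r
# Lag-h differences along a regularly spaced 1-D transect.
lag_diff <- function(z, h) z[(h + 1):length(z)] - z[1:(length(z) - h)]

# Classical (method-of-moments) semivariogram at lag h.
classical_gamma <- function(z, h) {
  d <- lag_diff(z, h)
  sum(d^2) / (2 * length(d))
}

# Cressie-Hawkins robust semivariogram at lag h.
robust_gamma <- function(z, h) {
  d <- lag_diff(z, h)
  nh <- length(d)                                   # N(h)
  0.5 * mean(sqrt(abs(d)))^4 / (0.457 + 0.494 / nh)
}

z <- c(0, 1, 0, 1, 0, 1, 0, 50)   # made-up transect; the last value is an outlier
c(classical = classical_gamma(z, 1), robust = robust_gamma(z, 1))
```

The single outlier dominates the squared differences in the classical estimate, while the square-root differences used by the robust estimator damp its influence.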
Variogram parameters
The main goal of a variogram analysis is to construct a variogram that best estimates the autocorrelation structure of the underlying stochastic process. A typical variogram can be described using three parameters:
Nugget effect – represents micro-scale variation or measurement error. It is estimated by extrapolating the empirical variogram to h = 0.
Range – is the distance at which the variogram reaches the plateau, i.e., the distance (if any) beyond which data are no longer correlated.
Sill – is the variance of the random field, V(Z), disregarding the spatial structure. It is the plateau the variogram reaches at the range, $\gamma(\text{range})$.
[Figure: empirical variogram $\hat{\gamma}(h)$ plotted against h (0 to 10), annotated with nugget = 0.2, sill = 1.0, and range h = 5.]
Setting variogram parameters
Constructing an empirical variogram requires choosing:
1. An appropriate lag increment for h – It defines the distances at which the variogram is calculated.
2. A tolerance for the lag increment – It establishes distance bins around the lags to accommodate unevenly spaced observations.
3. The number of lags over which the variogram will be calculated – The number of lags, together with the size of the lag increment, defines the total distance over which the variogram is calculated.
4. A tolerance for angle – It determines how wide the directional bins are.
Two practical rules:
1. Choose the lag increment so that the number of pairs in each lag bin is greater than 30.
2. An experimental variogram is reliable only for h < D/2, where D is the maximum distance over the field of data.
[Figure: lag bin geometry, labelled 5 m, 20° and 0.5 m.]
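The two practical rules can be checked before any variogram is computed. The sketch below uses base R only; the 6 x 6 grid of coordinates, the lag increment, and the tolerance are assumptions made up for illustration.

```r
# Hypothetical sampling design: a 6 x 6 grid with unit spacing (36 locations).
coords <- expand.grid(x = 0:5, y = 0:5)
d <- as.vector(dist(coords))            # all 630 pairwise distances
D <- max(d)                             # maximum distance over the field

lag_inc <- 1                            # lag increment (assumed)
lag_tol <- lag_inc / 2                  # lag tolerance (assumed): half the increment
lags <- seq(lag_inc, D / 2, by = lag_inc)   # rule 2: only lags below D/2

# Number of pairs whose distance falls inside each bin [h - tol, h + tol].
npairs <- sapply(lags, function(h) sum(abs(d - h) <= lag_tol))
bins <- data.frame(lag = lags, npairs = npairs, enough = npairs > 30)  # rule 1
print(bins)
```

If any bin fails the 30-pair rule, widening the lag increment or the tolerance is the usual remedy.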
Computing variograms
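An experimental variogram can be calculated with the R package geoR.
Create a geodata object for the Gigante soil data (surface water pH values), then compute the binned variogram and the variogram cloud:
> library(geoR)
> # coordinates in columns 2-3, pH in column 5, covariates in columns 6-14
> soil87pH.geodat = as.geodata(soil87.dat, coords.col=2:3, data.col=5, covar.col=6:14)
> # binned empirical variogram for distances up to 500
> variog.b = variog(soil87pH.geodat, max.dist=500)
> # variogram cloud: one point per pair of locations
> variog.c = variog(soil87pH.geodat, max.dist=500, option="cloud")
> plot(variog.b)
> plot(variog.c)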
Covariogram and Correlogram
The covariogram (analogous to covariance) and the correlogram (analogous to the correlation coefficient) are two other useful tools for measuring spatial correlation. They describe similarity in values between two locations.
Covariogram:

$$C(\mathbf{h}) = \operatorname{cov}\left[z(\mathbf{s}_i),\, z(\mathbf{s}_i + \mathbf{h})\right].$$

Its estimator:

$$\hat{C}(\mathbf{h}) = \frac{1}{N(\mathbf{h})}\sum_{i=1}^{N(\mathbf{h})}\left[z(\mathbf{s}_i) - \bar{z}\right]\left[z(\mathbf{s}_i + \mathbf{h}) - \bar{z}\right],$$

where $\bar{z}$ is the sample mean.
At h = 0, $\hat{C}(\mathbf{0})$ is simply the sample variance, an estimate of the (finite) variance $C(\mathbf{0})$ of the random field. It is straightforward to establish the relationship

$$\gamma(\mathbf{h}) = C(\mathbf{0}) - C(\mathbf{h}).$$

The correlogram is defined as

$$\rho(\mathbf{h}) = \frac{C(\mathbf{h})}{C(\mathbf{0})} = 1 - \frac{\gamma(\mathbf{h})}{C(\mathbf{0})}.$$
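The covariogram and correlogram estimators can be sketched in base R as follows. The example assumes a regularly spaced 1-D transect; the function name `covariogram` and the data are hypothetical.

```r
# Moment estimator of the covariogram for lags 0..max_lag on a regular 1-D transect.
covariogram <- function(z, max_lag) {
  n <- length(z)
  zbar <- mean(z)                                   # sample mean
  sapply(0:max_lag, function(h) {
    mean((z[1:(n - h)] - zbar) * (z[(h + 1):n] - zbar))
  })
}

z <- c(1, 3, 2, 5, 4, 6)          # made-up transect values
C_hat <- covariogram(z, 2)        # C_hat[1] is the lag-0 value, i.e. the sample variance
rho_hat <- C_hat / C_hat[1]       # correlogram: rho(h) = C(h) / C(0)
print(rbind(C_hat, rho_hat))
```

Note that the identity $\gamma(\mathbf{h}) = C(\mathbf{0}) - C(\mathbf{h})$ holds for the theoretical functions; the two moment estimators generally satisfy it only approximately.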
Properties of the moment estimator for variogram
1. It is unbiased: $E[\hat{\gamma}(\mathbf{h})] = \gamma(\mathbf{h})$.
2. If $Z(\mathbf{s})$ is ergodic, $\hat{\gamma}(\mathbf{h}) \to \gamma(\mathbf{h})$ as $n \to \infty$. This means that the moment estimator approaches the true value of the variogram as the size of the region increases. The estimator is consistent.
3. The moment estimator converges in distribution to a normal distribution as $n \to \infty$, i.e., it is approximately normally distributed for large samples.
4. For Gaussian processes, the approximate variance-covariance matrix of $\hat{\gamma}(\mathbf{h})$ is available (Cressie 1985).
* Cressie, N. 1985. Fitting variogram models by weighted least squares. Mathematical Geology 17:563-586.
Properties of the moment estimator for covariance
The covariance: $C(\mathbf{h}) = \operatorname{cov}\left[Z(\mathbf{s}_i),\, Z(\mathbf{s}_i + \mathbf{h})\right]$
The moment estimator:

$$\hat{C}(\mathbf{h}) = \frac{1}{N(\mathbf{h})}\sum_{i=1}^{N(\mathbf{h})}\left[z(\mathbf{s}_i) - \bar{z}\right]\left[z(\mathbf{s}_i + \mathbf{h}) - \bar{z}\right]$$

Properties:
1. The moment estimator for the covariance is biased. The bias arises because the covariance function for the residuals, $\hat{\varepsilon}(\mathbf{s}_i) = Z(\mathbf{s}_i) - \bar{z}$, is not the same as the covariance function for the errors, $\varepsilon(\mathbf{s}_i) = Z(\mathbf{s}_i) - \mu$.
2. For a second-order stationary random field, the moment estimator for the covariance is consistent: $\hat{C}(\mathbf{h}) \to C(\mathbf{h})$ almost surely as $n \to \infty$. However, the convergence is slower than that of the variogram estimator.
3. For a second-order stationary random field, the moment estimator is approximately normally distributed.
Properties 1 and 2 are the reasons why the variogram is preferred over the covariance function (and correlogram) in modeling geostatistical data.
Variogram models
There are two reasons we need to fit a model to the empirical variogram:
1. Spatial prediction (kriging) requires values of the variogram $\gamma(h)$ at lags h that are not available in the data.
2. The empirical variogram cannot guarantee that the variance of predicted values is positive. A valid variogram model can.
Various parametric variogram models have been used in the literature. The following are some of the most popular ones.
Linear model –

$$\gamma(h) = c_0 + b h,$$

where $c_0$ is the nugget effect. The linear variogram has no sill, so the variance of the process is infinite. A linear variogram suggests a trend in the data, so you should consider fitting a trend to the data, modeling the data as a function of the coordinates (trend surface analysis).
[Figure: linear variogram, $\gamma(h)$ against h.]
Power model –

$$\gamma(h) = c_0 + b h^{\lambda},$$

where $c_0$ is the nugget effect. The power variogram has no sill, so the variance of the process is infinite. The linear variogram is the special case $\lambda = 1$ of the power model. Similarly, a power variogram suggests a trend in the data, so you should consider fitting a trend to the data, modeling the data as a function of the coordinates (trend surface analysis).
[Figure: power variograms, $\gamma(h)$ against h, for $\lambda < 1$ and $\lambda > 1$.]
Exponential model –

$$\gamma(h) = c_0 + c_1\left(1 - e^{-h/\phi}\right),$$

where $c_0$ is the nugget effect. The sill is $c_0 + c_1$. The range for the exponential model is defined to be $3\phi$, at which the variogram has reached about 95% of the sill.
[Figure: exponential variogram, $\gamma(h)$ against h.]
Gaussian model –

$$\gamma(h) = c_0 + c_1\left(1 - e^{-(h/\phi)^2}\right),$$

where $c_0$ is the nugget effect. $c_0 + c_1$ is the sill. The range is $\sqrt{3}\,\phi$. This model describes a random field that is considered to be too smooth and possesses the peculiar property that $Z(\mathbf{s})$ can be predicted without error for any $\mathbf{s}$ on the plane.
[Figure: Gaussian variogram, $\gamma(h)$ against h.]
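The 95% rule behind the two ranges can be verified numerically. This is a minimal base R sketch; the function names and the parameter values (nugget 0.2, partial sill 0.8, range parameter 2) are assumptions chosen for illustration.

```r
# Exponential and Gaussian variogram models with nugget c0, partial sill c1
# and range parameter phi.
exp_variogram <- function(h, c0, c1, phi) c0 + c1 * (1 - exp(-h / phi))
gau_variogram <- function(h, c0, c1, phi) c0 + c1 * (1 - exp(-(h / phi)^2))

c0 <- 0.2; c1 <- 0.8; phi <- 2    # made-up parameter values

# Fraction of the partial sill c1 reached at the stated range of each model.
frac_exp <- (exp_variogram(3 * phi, c0, c1, phi) - c0) / c1
frac_gau <- (gau_variogram(sqrt(3) * phi, c0, c1, phi) - c0) / c1
c(exponential = frac_exp, gaussian = frac_gau)
```

Both fractions equal 1 - exp(-3), which is where the factor 3 (or its square root for the Gaussian model) comes from. With a non-zero nugget the variogram itself is slightly above 95% of the total sill at that distance.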
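Cauchy model –

$$\gamma(h) = c_0 + c_1\left[1 - \frac{1}{1 + (h/\phi)^2}\right],$$

where $c_0$ is the nugget effect. The sill is $c_0 + c_1$.
[Figure: Cauchy variogram, $\gamma(h)$ against h.]
Matérn model –

$$\gamma(h) = c_0 + c_1\left[1 - \frac{1}{2^{\kappa - 1}\,\Gamma(\kappa)}\left(\frac{h}{\phi}\right)^{\kappa} K_{\kappa}\!\left(\frac{h}{\phi}\right)\right],$$

where $c_0$ is the nugget effect, $\Gamma$ is the gamma function, and $K_{\kappa}$ is the modified Bessel function of the second kind of order $\kappa$. The sill is $c_0 + c_1$.
The Matérn model is the default covariance model in variofit of geoR (with $\kappa = 0.5$, which gives the exponential model).
[Figure: Matérn variogram for $c_0 = 0$.]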
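The Matérn formula can be evaluated with the base R function `besselK`. The sketch below is an illustration under assumed parameter values; the function name `matern_variogram` is hypothetical, and the code is only meant for h > 0 because the Bessel term is not finite at h = 0.

```r
# Matern variogram with nugget c0, partial sill c1, range parameter phi and
# smoothness kappa. Intended for h > 0 only.
matern_variogram <- function(h, c0, c1, phi, kappa) {
  x <- h / phi
  rho <- x^kappa * besselK(x, kappa) / (2^(kappa - 1) * gamma(kappa))
  c0 + c1 * (1 - rho)
}

h <- c(0.5, 1, 2, 4)
m <- matern_variogram(h, c0 = 0.2, c1 = 0.8, phi = 2, kappa = 0.5)
e <- 0.2 + 0.8 * (1 - exp(-h / 2))   # exponential model with the same parameters
cbind(h, matern = m, exponential = e)
```

With kappa = 0.5 the Matérn model reduces to the exponential model, so the two columns agree; larger kappa gives smoother behaviour near the origin.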
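Logistic model (rational quadratic model) –

$$\gamma(h) = c_0 + \frac{a h^2}{1 + b h^2},$$

where $c_0$ is the nugget effect. The sill is $c_0 + a/b$. The range for the logistic model, the distance at which the variogram reaches 95% of the sill, is

$$\sqrt{\frac{19a - b c_0}{b\,(a + b c_0)}}.$$

[Figure: logistic variogram, $\gamma(h)$ against h.]
Spherical model –

$$\gamma(h) = \begin{cases} c_0 + c_1\left[\dfrac{3}{2}\dfrac{h}{a} - \dfrac{1}{2}\left(\dfrac{h}{a}\right)^{3}\right], & 0 \le h \le a, \\[2ex] c_0 + c_1, & h > a, \end{cases}$$

where $c_0$ is the nugget effect. The sill is $c_0 + c_1$. The spherical model reaches its sill exactly at h = a; a practical (95%) range can also be computed by setting $\gamma(h) = 0.95(c_0 + c_1)$.
[Figure: spherical variogram, $\gamma(h)$ against h.]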
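Both models are easy to code, and the logistic range formula can be checked against its 95% definition. This is a base R sketch; the function names and all parameter values are made up for illustration.

```r
# Spherical variogram: rises to the sill c0 + c1 at h = a and stays there.
sph_variogram <- function(h, c0, c1, a) {
  ifelse(h <= a, c0 + c1 * (1.5 * h / a - 0.5 * (h / a)^3), c0 + c1)
}

# Logistic (rational quadratic) variogram and its 95% range.
logistic_variogram <- function(h, c0, a, b) c0 + a * h^2 / (1 + b * h^2)
logistic_range <- function(c0, a, b) sqrt((19 * a - b * c0) / (b * (a + b * c0)))

sph_variogram(c(2.5, 5, 10), c0 = 0.2, c1 = 0.8, a = 5)   # below, at, and beyond the range

r95 <- logistic_range(c0 = 0.2, a = 2, b = 2)
c(range = r95,
  gamma_at_range = logistic_variogram(r95, c0 = 0.2, a = 2, b = 2),
  target = 0.95 * (0.2 + 2 / 2))
```

The last two numbers agree, which confirms that the range expression solves $\gamma(h) = 0.95(c_0 + a/b)$.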
Parameter estimation
There are two common ways to fit a variogram model to an empirical variogram. Assume the variogram model $\gamma(h; \boldsymbol{\theta})$, where $\boldsymbol{\theta}$ is an unknown parameter vector. For example, for the exponential variogram model $\boldsymbol{\theta} = (c_0, c_1, \phi)$.
Ordinary least squares method – The OLS estimator for $\boldsymbol{\theta}$ is obtained by finding the $\hat{\boldsymbol{\theta}}$ that minimizes

$$Q(\boldsymbol{\theta}) = \sum_i \left[\hat{\gamma}(h_i) - \gamma(h_i; \boldsymbol{\theta})\right]^2.$$

The OLS estimation can be easily implemented in S-PLUS using the function nlminb or nls. Initial values for $\boldsymbol{\theta}$ are required; these can be read off the empirical variogram.
Notes:
1. OLS estimation assumes that
- $\operatorname{var}(\hat{\gamma}(h_i))$ does not depend on the lag distance $h_i$;
- $\operatorname{cov}(\hat{\gamma}(h_i), \hat{\gamma}(h_j)) = 0$ for all pairs of lag distances $h_i \ne h_j$.
2. Both assumptions are violated. The variance and the covariance depend on the number of pairs of sites used to compute the empirical variogram (see Cressie 1985).
3. These violations do not contribute significantly to the bias of the parameter estimates.
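The OLS criterion can be minimized with a general-purpose optimizer. The sketch below uses `optim` from base R rather than nlminb or nls, and, as an assumption made for the sake of a reproducible example, a noise-free stand-in for the empirical variogram generated from known parameters; the starting values are also made up.

```r
# Exponential variogram model with theta = (c0, c1, phi).
exp_variogram <- function(h, theta) theta[1] + theta[2] * (1 - exp(-h / theta[3]))

h <- 1:10
gamma_hat <- exp_variogram(h, c(0.2, 0.8, 2))   # stand-in for an empirical variogram

# OLS objective: sum of squared differences between empirical and model values.
Q_ols <- function(theta) sum((gamma_hat - exp_variogram(h, theta))^2)

start <- c(0.1, 1, 1)          # initial values, as if read off a variogram plot
fit <- optim(start, Q_ols)     # Nelder-Mead by default
fit$par                        # fitted (c0, c1, phi)
fit$value                      # minimized Q
```

With real data the empirical values are noisy, so the minimized Q is not close to zero and the fit should be checked visually against the empirical variogram.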
Weighted least squares estimator
The WLS estimator for $\boldsymbol{\theta}$ is obtained by finding the $\hat{\boldsymbol{\theta}}$ that minimizes

$$Q(\boldsymbol{\theta}) = \sum_i \frac{\left[\hat{\gamma}(h_i) - \gamma(h_i; \boldsymbol{\theta})\right]^2}{\operatorname{var}(\hat{\gamma}(h_i))},$$

where

$$\operatorname{var}(\hat{\gamma}(h_i)) \approx \frac{2\left[\gamma(h_i; \boldsymbol{\theta})\right]^2}{N(h_i)}.$$

So that

$$Q(\boldsymbol{\theta}) = \sum_i \frac{N(h_i)}{2\left[\gamma(h_i; \boldsymbol{\theta})\right]^2}\left[\hat{\gamma}(h_i) - \gamma(h_i; \boldsymbol{\theta})\right]^2 = \frac{1}{2}\sum_i N(h_i)\left[\frac{\hat{\gamma}(h_i)}{\gamma(h_i; \boldsymbol{\theta})} - 1\right]^2.$$

Note that the WLS estimator is more precise (has a smaller variance) than the OLS estimator.
Model selection criteria: Select a model with the smallest residual sum of squares, AIC, or log-likelihood ratio, but pay particular attention to the goodness of fit at short lags (important for efficient spatial prediction).
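The two forms of the WLS objective can be coded side by side to confirm that they agree. This is a base R sketch for the exponential model; the pair counts, empirical values, and trial parameter vector are made up for illustration.

```r
# Exponential variogram model with theta = (c0, c1, phi).
exp_variogram <- function(h, theta) theta[1] + theta[2] * (1 - exp(-h / theta[3]))

# WLS objective with var(gamma_hat(h_i)) approximated by 2 * gamma(h_i; theta)^2 / N(h_i).
Q_wls <- function(theta, h, gamma_hat, npairs) {
  g <- exp_variogram(h, theta)
  sum((gamma_hat - g)^2 / (2 * g^2 / npairs))
}

# Equivalent simplified form: 0.5 * sum of N(h_i) * (gamma_hat / gamma - 1)^2.
Q_wls_simplified <- function(theta, h, gamma_hat, npairs) {
  g <- exp_variogram(h, theta)
  0.5 * sum(npairs * (gamma_hat / g - 1)^2)
}

h <- 1:5
npairs <- c(120, 100, 80, 60, 40)                 # made-up pair counts N(h_i)
gamma_hat <- c(0.50, 0.72, 0.80, 0.91, 0.93)      # made-up empirical values
theta <- c(0.2, 0.8, 2)                           # trial parameter vector
c(Q_wls(theta, h, gamma_hat, npairs), Q_wls_simplified(theta, h, gamma_hat, npairs))
```

Either function can be passed to an optimizer in place of the OLS objective. Because the weights depend on the model values, lags with many pairs and small variogram values (short lags) dominate the fit.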
R implementation for fitting variograms
A variogram model is fitted to an experimental variogram using variofit of geoR. Its first argument is the empirical variogram returned by variog, not the geodata object.
Using the geodata object for the Gigante soil data (surface water pH values):
> library(geoR)
> variog.b = variog(soil87pH.geodat, max.dist=500)
> # OLS fit: every lag bin gets equal weight
> variog.ols.exp = variofit(variog.b, cov.model="exponential", weights="equal")
> # WLS fit: default weights (number of pairs in each bin)
> variog.wls.exp = variofit(variog.b, cov.model="exponential")
> plot(variog.b)
> lines(variog.ols.exp)
> lines(variog.wls.exp, col="red")
Note: (1) There are many covariance models to choose from: "matern", "exponential", "gaussian", "spherical", "circular", "cubic", "wave", "power", "powered.exponential", "cauchy", "gencauchy", "gneiting", "gneiting.matern", "pure.nugget".
(2) In the function variofit, with weights="equal" (i.e., OLS) each lag bin contributes equally to the objective function $Q(\boldsymbol{\theta})$, while by default (i.e., WLS) $Q(\boldsymbol{\theta})$ is weighted by the number of pairs used in computing each empirical variogram value. Thus, bins based on few pairs do not carry as much weight as bins based on many pairs.
Fractals – The concept of dimension
Geometric objects are traditionally viewed and measured in Euclidean space, e.g., a line, a rectangle and a cube, with dimension D = 1, 2, and 3, respectively.
However, many phenomena in nature (e.g., clouds, snowflakes, tree architecture) cannot be satisfactorily described using Euclidean dimensions. To describe the irregularity of such geometric phenomena (irregular geometric objects are called fractals), we need to generalize the concept of Euclidean dimension.
The Hausdorff Dimension – If we take an object residing in Euclidean dimension D and reduce its linear size by 1/r in each spatial direction, the number of replicas needed to cover the original object increases to $N = r^D$. $D = \log(N)/\log(r)$ is the Hausdorff dimension, named after the German mathematician Felix Hausdorff. The important point is that a fractal dimension D need not be an integer; it can be a fraction. It has proved useful for describing natural objects.

          D = 1     D = 2     D = 3
r = 1     N = 1     N = 1     N = 1
r = 2     N = 2     N = 4     N = 8
r = 3     N = 3     N = 9     N = 27
Examples of geometric objects with non-integer dimensions
1. Cantor set (dust) – Begin with a line of length 1, called the initiator. Then remove the middle third of the line; this step is called the generator, because it specifies a rule that is used to generate a new form. The generator is then applied iteratively, without end, to the remaining segments, generating a set of "dust". The dust is neither a set of isolated points nor a line, but lies somewhere between them, and thus has a dimension between 0 and 1:
D = log(N)/log(r) = log(2)/log(3) = 0.6309.
2. Koch curve – D = log(4)/log(3) = 1.2619.
3. Sierpinski triangle – D = log(3)/log(2) = 1.5850.
[Figure: initiator and generator of each construction.]
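The three dimensions above follow directly from D = log(N)/log(r). A short base R check is given below; the function name `similarity_dimension` is a hypothetical helper, not a library function.

```r
# Dimension of a self-similar object made of N replicas, each scaled down by 1/r.
similarity_dimension <- function(N, r) log(N) / log(r)

D <- c(cantor     = similarity_dimension(2, 3),   # 2 segments, each 1/3 of the original
       koch       = similarity_dimension(4, 3),   # 4 segments, each 1/3 of the original
       sierpinski = similarity_dimension(3, 2))   # 3 triangles, each 1/2 of the original
round(D, 4)

similarity_dimension(27, 3)   # ordinary cube: recovers the Euclidean dimension 3
```

The same formula returns the familiar integer dimensions for lines, squares and cubes, which is why it is a natural generalization.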
Self-similarity and smoothness
An important property of a fractal is self-similarity, which refers to an infinite nesting of structure on all scales. It means that a substructure resembles the form of its superstructure, e.g., leaf shape resembles branch shape, and branch shape resembles tree shape.
Another important way to understand fractal dimension is that D is a smoothness measure of a spatial process or object (e.g., surface smoothness/roughness). When D = 1 (a line) or D = 2 (a plane), the objects are smooth. For objects whose D lies between 1 and 2 (e.g., the Koch curve or the Sierpinski triangle), the smoothness lies between that of a line and that of a plane.
The study of surface growth and smoothness is becoming an increasingly important subject in physics and biology. It has much to do with fractal geometry and spatial statistics. An example is a technology called molecular beam epitaxy, used to manufacture thin films for computer chips and other semiconductor devices. It deposits silicon molecules to create a very smooth Si surface.
* Mandelbrot, B. B. 1982. The fractal geometry of nature. Freeman, San Francisco.
* Meakin, P. 1998. Fractals, scaling and growth far from equilibrium. Cambridge U. Press.
* Barabási, A.-L. and Stanley, H. E. 1995. Fractal concepts in surface growth. Cambridge U. Press.
Calculating fractal dimension from a variogram
Because the smoothness of a spatial process is directly related to the smoothness of the covariance function as $h \to 0$, a fractal dimension can be calculated from a variogram. If

$$C(0) - C(h) \approx b h^{\alpha},$$

then we say the process $Z(\mathbf{s})$ is continuous.
For a continuous covariance, we have

$$C(h) = C(0) - b h^{\alpha} + o(h^{\alpha}),$$

or

$$\gamma(h) \approx b h^{\alpha},$$

where $o(h^{\alpha})$ is a term of smaller order than $h^{\alpha}$ for h in a neighborhood of 0. The fractal dimension of the surface is $D = 2 - \alpha/2$. $\alpha$ can be estimated from an empirical variogram by regressing $\log \hat{\gamma}(h)$ on $\log h$ at short lags:

$$\log(\gamma(h)) = \log(b) + \alpha \log(h).$$

* Davies, S. and Hall, P. 1999. Fractal analysis of surface roughness by using spatial data (with discussion). JRSS B 61:3-37.
* Palmer, M. W. 1988. Fractal geometry: a tool for describing spatial patterns of plant communities. Vegetatio 75:91-102.
* Burrough, P. A. 1981. Fractal dimensions of landscapes and other environmental data. Nature 294:240-242.
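The log-log regression for the fractal dimension can be sketched with `lm` from base R. As an assumption for a reproducible example, the "empirical" variogram values below are generated exactly from $\gamma(h) = b h^{\alpha}$ with b = 0.3 and $\alpha$ = 1.2; real empirical variograms are noisy and only the shortest lags should be used.

```r
h <- c(0.5, 1, 2, 4, 8)                 # short lags (made up)
gamma_hat <- 0.3 * h^1.2                # noise-free stand-in for an empirical variogram

# Regress log(gamma) on log(h): the intercept estimates log(b), the slope estimates alpha.
fit <- lm(log(gamma_hat) ~ log(h))
b_hat <- exp(unname(coef(fit)[1]))
alpha_hat <- unname(coef(fit)[2])
D_hat <- 2 - alpha_hat / 2              # fractal dimension

c(b = b_hat, alpha = alpha_hat, D = D_hat)
```

A slope near 2 corresponds to a smooth process (D near 1), and a slope near 0 to a very rough one (D near 2).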
More Related Content

PDF
The mineral reserves & reserves estimation using triangular methods
PPTX
Radioactive Survey
PPT
Seismic acquisition
PPTX
Mineral exploration
PPT
Mineral exploration
PDF
Stages of exploration
The mineral reserves & reserves estimation using triangular methods
Radioactive Survey
Seismic acquisition
Mineral exploration
Mineral exploration
Stages of exploration

What's hot (20)

PPTX
SEISMIC METHOD
PPT
Vertical Exaggeration
PPT
Concept of oc mine planning & design(final)
PPTX
Introduction to Seismic Method
PPTX
Gravity method
PDF
Basics1variogram
PPTX
SAMPLING IN GEOLOGY
PPTX
ELECTRICAL METHODS OF GEOPHYSICAL EXPLORATION OF MINERAL DEPOSITS.pptx
PPT
Geological mapping
PPT
Underground metal mining methods
PPTX
The Dip Meter Log By Majid Marooq UAJK
PPT
Magnetic prospecting
PDF
Openpit fundamentals
PDF
Mine surveying 981 (1)
DOCX
Unstable/Astatic Gravimeters and Marine Gravity Survey
PPT
UNFC concept & teminology.ppt
DOC
Slug Test Procedures
PDF
Basins sedimentarys
PDF
Variograms
SEISMIC METHOD
Vertical Exaggeration
Concept of oc mine planning & design(final)
Introduction to Seismic Method
Gravity method
Basics1variogram
SAMPLING IN GEOLOGY
ELECTRICAL METHODS OF GEOPHYSICAL EXPLORATION OF MINERAL DEPOSITS.pptx
Geological mapping
Underground metal mining methods
The Dip Meter Log By Majid Marooq UAJK
Magnetic prospecting
Openpit fundamentals
Mine surveying 981 (1)
Unstable/Astatic Gravimeters and Marine Gravity Survey
UNFC concept & teminology.ppt
Slug Test Procedures
Basins sedimentarys
Variograms
Ad

Similar to Variogram C9.ppt (20)

PDF
CLIM Fall 2017 Course: Statistics for Climate Research, Nonstationary Covaria...
PDF
1646367063278_Material-5---Spatial-Structure-of-Variogram.pdf
PDF
Optimization of sample configurations for variogram estimation
PPT
geostatistics_for introduction and analysis
PDF
The Correlogram Explained
PDF
1648796607723_Material-8---Concept-on-Estimation-Variance.pdf
PDF
Linear Model of Coregionalization
PPTX
Basic geostatistics
PDF
Introduction geostatistic for_mineral_resources
PDF
Deriving and applying direct and cross indicator variograms in SIS (2006)
PDF
Autocorrelation_kriging_techniques for Hydrology
PDF
Temporal trends of spatial correlation within the PM10 time series of the Air...
PDF
Lecture 2: Stochastic Hydrology
PPTX
Chapters 14 and 15 presentation
PDF
Manual Gstat
PPT
Ch11.kriging
PDF
Lecturenotesstatistics
PPTX
Exploratory Spatial Data Analysis spatial data analysis and interpretation.pptx
PDF
Introductory Statistics Explained.pdf
PPTX
ANOVA.pptx
CLIM Fall 2017 Course: Statistics for Climate Research, Nonstationary Covaria...
1646367063278_Material-5---Spatial-Structure-of-Variogram.pdf
Optimization of sample configurations for variogram estimation
geostatistics_for introduction and analysis
The Correlogram Explained
1648796607723_Material-8---Concept-on-Estimation-Variance.pdf
Linear Model of Coregionalization
Basic geostatistics
Introduction geostatistic for_mineral_resources
Deriving and applying direct and cross indicator variograms in SIS (2006)
Autocorrelation_kriging_techniques for Hydrology
Temporal trends of spatial correlation within the PM10 time series of the Air...
Lecture 2: Stochastic Hydrology
Chapters 14 and 15 presentation
Manual Gstat
Ch11.kriging
Lecturenotesstatistics
Exploratory Spatial Data Analysis spatial data analysis and interpretation.pptx
Introductory Statistics Explained.pdf
ANOVA.pptx
Ad

Recently uploaded (20)

PPTX
UNIT 4 Total Quality Management .pptx
PPTX
Construction Project Organization Group 2.pptx
PDF
Arduino robotics embedded978-1-4302-3184-4.pdf
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PDF
Digital Logic Computer Design lecture notes
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
bas. eng. economics group 4 presentation 1.pptx
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PPTX
OOP with Java - Java Introduction (Basics)
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPTX
CH1 Production IntroductoryConcepts.pptx
PPTX
Strings in CPP - Strings in C++ are sequences of characters used to store and...
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PPTX
Lecture Notes Electrical Wiring System Components
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
Internet of Things (IOT) - A guide to understanding
PPT
Mechanical Engineering MATERIALS Selection
UNIT 4 Total Quality Management .pptx
Construction Project Organization Group 2.pptx
Arduino robotics embedded978-1-4302-3184-4.pdf
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
Digital Logic Computer Design lecture notes
Embodied AI: Ushering in the Next Era of Intelligent Systems
Foundation to blockchain - A guide to Blockchain Tech
bas. eng. economics group 4 presentation 1.pptx
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
OOP with Java - Java Introduction (Basics)
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
CH1 Production IntroductoryConcepts.pptx
Strings in CPP - Strings in C++ are sequences of characters used to store and...
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
Lecture Notes Electrical Wiring System Components
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
CYBER-CRIMES AND SECURITY A guide to understanding
Internet of Things (IOT) - A guide to understanding
Mechanical Engineering MATERIALS Selection

Variogram C9.ppt

  • 1. 1 Chapter 9 – Variogram models Given a geostatistical model, Z(s), its variogram g(h) is formally defined as where f(s, u) is the joint probability density function of Z(s) and Z(u). For an intrinsic random field, the variogram can be estimated using the method of moments estimator, as follows: where h is the distance separating sample locations si and si+h, N(h) is the number of distinct data pairs. In some circumstances, it may be desirable to consider direction in addition to distance. In an isotropic case, h should be written as a scalar h, representing magnitude. Note: In the literature the terms variogram and semivariogram are often used interchangeably. By definition g(h) is semivariogram and the variogram is 2g(h).     u s u s u s u s h d d f Z Z Z Z ) , ( ) ( ) ( 2 1 ) ( ) ( var 2 1 ) ( 2      g   2 ) ( 1 ) ( ) ( ) ( 2 1 ) ( ˆ      h h s s h h N i i i z z N g
  • 2. 2 Robust variogram estimator Variogram provides an important tool for describing how the spatial data are related with distance. As we have seen it is defined in terms of dissimilarity in data values between two locations separated by a distance h. It is noted that the moment estimator given in the previous page is sensitive to outliers in the data. Thus, sometimes robust estimators are used. The widely used robust estimator is given by Cressie and Hawkins (1980): The motivation behind this estimator is that for a Gaussian process, we have Based on the Box-Cox transformation, it is found that the fourth-root of 1 2 is more normally distributed. * Cressie, N. and Hawkins, D. M. 1980. Robust estimation of the variogram, I. Journal of the International Association for Mathematical Geology 12:115-125.   . ) ( / 494 . 0 457 . 0 ) ( ) ( ) ( 2 1 ) ( ˆ 4 2 / 1 h N h s z s z h N h i i      g   2 1 2 ~ ) ( 2 ) ( ) (  g h s Z h s Z  
  • 3. 3 Variogram parameters The main goal of a variogram analysis is to construct a variogram that best estimates the autocorrelation structure of the underlying stochastic process. A typical variogram can be described using three parameters: Nugget effect – represents micro-scale variation or measurement error. It is estimated from the empirical variogram at h = 0. Range – is the distance at which the variogram reaches the plateau, i.e., the distance (if any) at which data are no longer correlated. Sill – is the variance of the random field V(Z), disregarding the spatial structure. It is the plateau the variogram reaches at the range, g(range). h g(h) 0 2 4 6 8 10 0.0 0.4 0.8 1.2 range h = 5 nugget = 0.2 sill = 1.0 ) ( ˆ h g
  • 4. 4 5 m 20° 20° 0.5 m 0.5 m Setting variogram parameters Construction of a variogram requires consideration of a few things: 1. An appropriate lag increment for h – It defines the distance at which the variogram is calculated. 2. A tolerance for the lag increment – It establishes distance bins for the lag increments to accommodate unevenly spaced observations. 3. The number of lags over which the variogram will be calculated – The number of lags in conjunction with the size of the lag increment will define the total distance over which a variogram is calculated. 4. A tolerance for angle – It determines how wide the bins will span. Two practical rules: 1. It is recommended that h is chosen as such that the number of pairs is greater than 30. 2. The distance of reliability for an experimental variogram is h < D/2, where D is the maximum distance over the field of data.
  • 5. 5 Computing variograms An experimental variogram is calculated using the package geoR: Create a geodata data for Gigante soil data, surface water pH values: >soil87pH.geodat=as.geodata(soil87.dat,coords.col=2:3,data.col=5,covar.col=6:14) >variog.b=variog(soil87pH.geodat,max.dist=500) >variog.c=variog(soil87pH.geodat, max.dist=500, op=“cloud”) >plot(variog.b) >plot(variog.c)
  • 6. 6 Covariogram and Correlogram Covariogram (analogous to covariance) and correlogram (analogous to correlation coefficient) are another two useful methods for measuring spatial correlation. They describe similarity in values between two locations. Covariogram: Its estimator: where is the sample mean. At h = 0, Ĉ(0) is simply the finite variance of the random field. It is straightforward to establish the relationship: The correlogram is defined as   ) ( ), ( cov ) ( h s z s z h C i i           ) ( 1 ) ) ( )( ) ( ( ) ( 1 ) ( ˆ h N i i i z h s z z s z h N h C z . ) 0 ( ) ( 1 ) 0 ( ) ( ) ( C h C h C h g     ). ( ) 0 ( ) ( h C C h   g
  • 7. 7 Properties of the moment estimator for variogram 1. It is unbiased: 2. If Z(s) is ergodic, as n  . This means that the moment estimator approaches the true value for the variogram as the size of the region increases. The estimator is consistent. 3. The moment estimator converges in distribution to a normal distribution as n  , i.e., it is approximately normally distributed for large samples. 4. For Gaussian processes, the approximate variance-covariance matrix of is available (Cressie 1985). * Cressie, N. 1985. Fitting variogram models by weighted least squares. Mathematical Geology 17:563- 586. ) ( )) ( ˆ ( h h E g g  ) ( ) ( ˆ h h g g  ) ( ˆ h g ) ( ˆ h g
  • 8. 8 Properties of the moment estimator for covariance The covariance: C(h) = cov(Z(si), Z(si+h)) The moment estimator: Properties: 1. The moment estimator for the covariance is biased. The bias arises because the covariance function for the residuals, is not the same as the covariance function for the errors, 2. For a second-order stationary random field, the moment estimator for the covariance is consistent: Ĉ(h)C(h) almost surely as n  . However, the convergence is slower than the varigogram. 3. For a second-order stationary random field, the moment estimator is approximately normally distributed. Properties 1 and 2 are the reasons why the variogram is preferred over the covariance function (and correlogram) in modeling geostatistical data. , ) ( ) ( ˆ z s Z s i i    . ) ( ) (     i i s Z s         ) ( 1 ) ) ( )( ) ( ( ) ( 1 ) ( ˆ h N i i i z h s z z s z h N h C
  • 9. 9 Variogram models There are two reasons we need to fit a model to the empirical variogram: 1. Spatial prediction (kriging) requires estimates of the variogram g(h) for those h’s which are not available in the data. 2. The empirical variogram cannot guarantee the variance of predicted values to be positive. A variogram model can ensure the variance positive. Various parametric variogram models have been used in the literature. The follows are some of the most popular ones. Linear model – where c0 is the nugget effect. The linear variogram has no sill, and so the variance of the process is infinite. The existence of a linear variogram suggests a trend in the data, so you should consider fitting a trend to the data, modeling the data as a function of the coordinates (trend surface analysis). bh c h   0 ) ( g h g(h)
  • 10. 10 Power model - where c0 is the nugget effect. The power variogram has no sill, so the variance of the process is infinite. The linear variogram is a special case of the power model. Similarly, the existence of a linear variogram suggests a trend in the data, so you should consider fitting a trend to the data, modeling the data as a function of the coordinates (trend surface analysis).  g bh c h   0 ) ( h g(h)  < 1  > 1
  • 11. 11 Exponential model - where c0 is the nugget effect. The sill is c0+c1. The range for the exponential model is defined to be 3 at which the variogram is of 95% of the sill. Gaussian model - where c0 is the nugget effect. c0+c1 is the sill. The range is 3. This model describes a random field that is considered to be too smooth and possesses the peculiar property that Z(s) can be predicted without error for any s on the plane.    g / 1 0 1 ) ( h e c c h     h g(h) h g(h)           2 ) / ( 1 0 1 ) (  g h e c c h
  • 12. 12 Cauchy model - where c0 is the nugget effect is 0. The sill is c0+c1. Matern model - where c0 is the nugget effect. The sill is c0+c1. Matern model is the default in variog of geoR.                     g 2 1 0 ) ( 1 1 ) ( h c c h h g(h)             ) ( ) ( ) ( 2 1 1 ) ( 1 1 0    g    h h c c h        ) 2 ( 2 ) ( ) ( t t Bessel function: Matern function for c0=0
  • 13. 13 Logistic model (rational quadratic model) - where c0 is the nugget effect. The sill is c0+a/b. The range for the logistic model is Spherical model - where c0 is the nugget effect. The sill is c0+c1. The range for the spherical model can be computed by setting g(h) = 0.95(c0+c1). 2 2 0 1 ) ( bh ah c h    g . ) ( 19 0 0 bc a b bc a   for 0  h  a for h  a          3 1 0 ) ( 2 1 2 3 ) ( a h a h c c h g 1 0 ) ( c c h   g h g(h) h g(h)
  • 14. 14 Parameter estimation There are commonly two ways to fit the variogram models to an empirical variogram. Assume the variogram model g(h; q), where q is an unknown parameter vector. For example, for the exponential variogram model q = (c0, c1, ). Ordinary least squares method – The OLS estimator for q is obtained by finding that minimizes The OLS estimation can be easily implemented in Splus using function nlminb or nls. Initial values for q are required, these values can be obtained from the empirical variogram. Notes: 1. OLS estimation assumes that - does not depend on the lag distance hi - for all pairs of lag distances hi  hi. 2. Both assumptions are violated. The variance and the covariance depend on the number of pairs of sites used to compute the empirical variogram (see Cressie 1985). 3. These violations do not contribute significantly to the bias of the parameter estimation. qˆ   . ) ; ( ) ( ˆ ) ( 2    i i i h h Q θ θ g g )) ( ˆ var( i h g 0 )) ( ˆ ), ( ˆ cov(  j i h h g g
  • 15. 15 Weighted least squares estimator The WLS estimator for q is obtained by finding that minimizes where So that, To note that the WLS estimator is more precise (has a smaller variance) than the OLS estimator. Model selection criteria: Select a model with the smallest residual sum of squares or AIC or log- likelihood ratio, but pay a particularly attention to the goodness-of-fit at short distance lags (important for efficient spatial prediction). qˆ   . )) ( ˆ var( ) ; ( ) ( ˆ ) ( 2    i i i i h h h Q g g g θ θ . ) ( )) , ( ( 2 )) ( ˆ var( 2 i i i h N h h θ g g    . 1 ) ; ( ) ( ˆ ) ( 2 1 ) ; ( ) ( ˆ )) ; ( ( 2 ) ( ) ( 2 2 2               i i i i i i i i i h h h N h h h h N Q θ θ θ θ g g g g g
  • 16. 16 R implementation for fitting variograms An experimental variogram is fitted using variofit of geoR: Create a geodata data for Gigante soil data, surface water pH values: >variog.b=variog(soil87.geodat,max.dist=500) >variog.ols.exp=variofit(soil87.geodat, cov.model=“exponential”,wei=“equal”) >variog.wls.exp=variofit(soil87.geodat, cov.model=“exponential”) >plot(variog.b) >lines(variog.ols.exp) >lines(variog.wls.exp,col=“red”) Note: (1) There are many covariance model for choosing: "matern", "exponential", "gaussian", "spherical", "circular", "cubic", "wave", "power", "powered.exponential", "cauchy", "gencauchy", "gneiting", "gneiting.matern", "pure.nugget". (2) In the function variofit, weis=“equal” (i.e., OLS) each sample equally contributes to the objective function Q(q), while by the default (i.e., WLS) Q(q) is weighted in proportion to the number of obs used in computing the sample variance. Thus, locations based on a few obs will not carry as much weight compared to the one based on a large number of obs.
  • 17. Fractals – The concept of dimension
    Geometric objects are traditionally viewed and measured in Euclidean space, e.g., a line, a rectangle and a cube, with dimension D = 1, 2 and 3, respectively. However, many phenomena in nature (e.g., clouds, snowflakes, tree architecture) cannot be satisfactorily described using Euclidean dimensions. To describe the irregularity of such geometric phenomena (irregular geometric objects are called fractals), we need to generalize the concept of Euclidean dimension.
    The Hausdorff dimension – If we take an object residing in Euclidean dimension D and reduce its linear size by 1/r in each spatial direction, the number of replicas of the original object increases to $N = r^D$. Then $D = \log(N)/\log(r)$ is the Hausdorff dimension, named after the German mathematician Felix Hausdorff. The important point is that a fractal dimension D need not be an integer; it can be a fraction. It has proved useful for describing natural objects.
    Number of replicas N:
             D = 1    D = 2    D = 3
    r = 1    N = 1    N = 1    N = 1
    r = 2    N = 2    N = 4    N = 8
    r = 3    N = 3    N = 9    N = 27
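As a quick check of D = log(N)/log(r), the Python sketch below recomputes the dimension for the line, square and cube cases with r = 2 and r = 3. The function name is a hypothetical helper. The case r = 1 is left out because log(1) = 0 makes the ratio undefined.

```python
import math


def similarity_dimension(n_replicas, r):
    """D = log(N) / log(r) for N replicas at linear reduction factor r > 1."""
    if r <= 1 or n_replicas < 1:
        raise ValueError("need r > 1 and N >= 1")
    return math.log(n_replicas) / math.log(r)


# Replica counts N = r^D for a line (D=1), a square (D=2) and a cube (D=3).
replicas = {(d, r): r ** d for d in (1, 2, 3) for r in (2, 3)}
dims = {key: similarity_dimension(n, key[1]) for key, n in replicas.items()}

for (d, r), value in sorted(dims.items()):
    print(f"D={d}, r={r}, N={replicas[(d, r)]}: log(N)/log(r) = {value:.4f}")
```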
  • 18. Examples of geometric objects with non-integer dimensions
    1. Cantor set (dust) – Begin with a line of length 1, called the initiator. Then remove the middle third of the line; this step is called the generator, because it specifies a rule that is used to generate a new form. Applying the generator iteratively, without end, to the remaining segments generates a set of "dust". The dust is neither a set of points nor a line, but lies somewhere between them, and thus has a dimension between 0 and 1. Each step leaves N = 2 copies at reduction factor r = 3, so D = log(N)/log(r) = log(2)/log(3) = 0.6309.
    2. Koch curve – D = log(4)/log(3) = 1.2619.
    3. Sierpinski triangle – D = log(3)/log(2) = 1.5850.
    [Figure: initiator and generator.]
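The Cantor construction can be followed step by step. The Python sketch below applies the generator k times with exact fractions and then computes log(N)/log(r) from the number and length of the remaining segments. The function names and the choice k = 5 are illustrative assumptions.

```python
import math
from fractions import Fraction


def cantor_step(segments):
    """Apply the generator once: keep the two outer thirds of every segment."""
    out = []
    for left, right in segments:
        third = (right - left) / 3
        out.append((left, left + third))
        out.append((right - third, right))
    return out


def cantor_segments(k):
    """Segments remaining after k applications of the generator to [0, 1]."""
    segments = [(Fraction(0), Fraction(1))]  # the initiator
    for _ in range(k):
        segments = cantor_step(segments)
    return segments


k = 5
segments = cantor_segments(k)
n = len(segments)                               # number of pieces, 2^k
r = 1 / (segments[0][1] - segments[0][0])       # reduction factor, 3^k
d = math.log(n) / math.log(float(r))
print(n, r, round(d, 4))
```

The ratio is the same for every k, since log(2^k)/log(3^k) = log(2)/log(3).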
  • 19. Self-similarity and smoothness
    An important property of a fractal is self-similarity, which refers to an infinite nesting of structure on all scales. It means that a substructure resembles the form of its superstructure, e.g., leaf shape resembles branch shape, and branch shape resembles tree shape.
    Another important way to understand fractal dimension is that D is a smoothness measure of a spatial process or object (e.g., surface smoothness or roughness). When D = 1 (a line) or D = 2 (a plane), the objects are smooth. Objects whose D lies between 1 and 2 (e.g., the Koch curve or the Sierpinski triangle) have a smoothness between that of a line and that of a plane.
    The study of surface growth and smoothness is becoming an increasingly important subject in physics and biology. It has much to do with fractal geometry and spatial statistics. An example is a technology called molecular beam epitaxy, used to manufacture thin films for computer chips and other semiconductor devices. It is a process that deposits silicon molecules to create a very smooth silicon surface.
    * Mandelbrot, B. B. 1982. The fractal geometry of nature. Freeman, San Francisco.
    * Meakin, P. 1998. Fractals, scaling and growth far from equilibrium. Cambridge U. Press.
    * Barabási, A.-L. and Stanley, H. E. 1995. Fractal concepts in surface growth. Cambridge U. Press.
  • 20. Calculating fractal dimension from a variogram
    Because the smoothness of a spatial process is directly related to the smoothness of the covariance function as $h \to 0$, a fractal dimension can be calculated from a variogram. If
    $$C(0) - C(h) \approx b h^{\alpha},$$
    then we say the process Z(s) is continuous. For a continuous covariance, we have
    $$C(h) = C(0) - b h^{\alpha} + o(h^{\alpha})$$
    or
    $$\gamma(h) = b h^{\alpha},$$
    where $o(h^{\alpha})$ is a term of smaller order than $h^{\alpha}$ for h in a neighborhood of 0. The fractal dimension is $D = 2 - \alpha/2$; this is the value for a profile (line transect) of the process, and the corresponding value for a surface over the plane is $3 - \alpha/2$ (Davies and Hall 1999). $\alpha$ can be estimated from an empirical variogram by regressing $\log \hat{\gamma}(h)$ on $\log h$ at short lags:
    $$\log(\gamma(h)) = \log(b) + \alpha \log(h).$$
    * Davies, S. and Hall, P. 1999. Fractal analysis of surface roughness by using spatial data (with discussion). JRSS B 61:3-37.
    * Palmer, M. W. 1988. Fractal geometry: a tool for describing spatial patterns of plant communities. Vegetatio 75:91-102.
    * Burrough, P. A. 1981. Fractal dimensions of landscapes and other environmental data. Nature 294:240-242.
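The log-log regression can be sketched as follows, in Python rather than the R used earlier in the deck. The slope of log(γ̂(h)) against log(h) estimates α, and D is then computed as 2 − α/2, as on the slide. The data are synthetic and noise-free, generated from a power-law variogram with hypothetical values b = 0.5 and α = 1.2. With real data only the short lags would be used, since the power-law form holds near h = 0.

```python
import math


def loglog_fit(lags, gamma_hat):
    """Least-squares fit of log(gamma) = log(b) + alpha * log(h); returns (alpha, b)."""
    x = [math.log(h) for h in lags]
    y = [math.log(g) for g in gamma_hat]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    alpha = sxy / sxx
    b = math.exp(y_bar - alpha * x_bar)
    return alpha, b


# Hypothetical short-lag empirical variogram following gamma(h) = 0.5 * h^1.2.
lags = [1.0, 2.0, 3.0, 4.0, 5.0]
gamma_hat = [0.5 * h ** 1.2 for h in lags]

alpha_hat, b_hat = loglog_fit(lags, gamma_hat)
d_hat = 2.0 - alpha_hat / 2.0  # fractal dimension as defined on the slide
print(alpha_hat, b_hat, d_hat)
```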