Concepts in Geostatistics
Edited by
Richard B. McCammon
Springer-Verlag
Berlin Heidelberg New York
1975
Library of Congress Cataloging in Publication Data
Main entry under title:
Concepts in geostatistics.
Includes bibliographies and index.
1. Geology--Statistical methods. 2. Electronic data processing--Geology. I. McCammon, Richard B., ed.
QE33.2.M3C66 519.5102'4~5 74-23669
No part of this book may be translated or reproduced in
any form without written permission from Springer-Verlag.
© 1975 by Springer-Verlag New York Inc.
ISBN-13: 978-3-540-06892-1 e-ISBN-13: 978-3-642-85976-2
DOI: 10.1007/978-3-642-85976-2
Preface
A two-week summer short course entitled Current Statistical
Methods in Geology supported by the National Science Foundation was
held at the University of Illinois at Chicago Circle in Chicago,
Illinois from June 19 to June 30, 1972. The aim of the short course
was to bridge the gap between the traditional first courses in sta-
tistics offered at most educational institutions and geostatistics as
it is being developed by geologists and statisticians engaged in the
application of statistics in geology. The course was intended for
geology college teachers who were either then teaching or preparing
to teach a course within their department dealing with computer ap-
plications and the use of statistical methods in geology.
This book arose out of the class notes which were prepared by
the course director and the invited lecturers.
We are grateful to the 28 teachers who attended for their enthu-
siastic interest and thoughtful responses to the many statistical
concepts presented to them as geologists during the two weeks of the
course.
I am deeply grateful to my graduate assistants, Richard Kolb and
Andrea Krivz, for the long hours spent in collating the course mater-
ials, testing the various computer programs, and instructing the par-
ticipants in the use of computer BASIC.
Richard B. McCammon
Introduction
It is now little over 10 years since Miller and Kahn's Statistical
Analysis in the Geological Sciences appeared in the geologic litera-
ture. By all accounts, this is considered to be the first modern text
in statistical geology. Since then, Krumbein and Graybill's An Intro-
duction to Statistical Models in Geology, the two-volume work by Koch
and Link, Statistical Analysis of Geological Data, and Davis' Statis-
tics and Data Analysis in Geology have appeared. These books have
been witness to the increasing quantification taking place in geology
and the earth sciences generally. Coupled with the advances in com-
puters, the geologist is now in a position to portray his data and
characterize his results on a scale that heretofore was not possible.
Briefly, the numeric treatment of geologic data has come of age.
In the quantification of geology, statistics has served as mid-
wife to the concept of the process response model applied to geologic
processes. Because precise hypotheses about natural processes have
always proved difficult to formulate, it is not surprising that sta-
tistical rather than deterministic models have been put forward.
Today statistics is being applied in virtually every branch of geology.
As elsewhere in science where statistics has been applied, how-
ever, what has held back the more rapid assimilation of statistical
concepts in the minds of those engaged within a particular discipline
is the absence of an orderly presentation of statistics as it applies
to the particular discipline. Although this deficiency has been
largely overcome in physics, chemistry, and lately, biology, this is
not yet the case for geology. Moreover, there have come to be
identified a number of statistical methods commonly used in geology
that are not sufficiently understood by the average geology teacher
so as to be presented effectively to the student.
There is little doubt that the geologist of the future will be
required to make more quantitative judgments. In assessing the impact
of technology on the environment for instance, the geologist will have
to interpret data obtained from a wide variety of sources from which
he will be expected to extract a more exact meaning. In reconstruc-
ting more precisely the Earth's past based on geophysical measurements
and geochemical analyses of rock samples, he will need to perform sta-
tistical analyses in order that more exact inferences may be drawn
from the data. Because more quantitative data will be collected, more
quantitative geologic models will need to be developed. It is likely
that as geologic prediction becomes more precise, it will become more
quantitative. A geologist therefore will need to be better trained
in the application of statistics.
The educational imperative for statistics in geology, therefore,
is to introduce statistical concepts into the curriculum. This can be
done either as a course in geostatistics, or, if this is not feasible,
to incorporate basic statistical concepts into those geology courses
that utilize statistical methods. While it is true that students in
geology and related earth science fields must continue to be encour-
aged to take the basic course in statistics, the fact remains that ex-
posure to statistics within the field of interest is the anvil upon
which a more meaningful grasp of statistics will be forged.
Few geology departments today can afford the luxury of having one
of its faculty members specialize in geostatistics. While it is rec-
ognized that a more quantitative approach to geology is evolving, there
remains the more pressing problem of unifying the earth sciences and
exposing the student to a more comprehensive view of the Earth's en-
vironment, past and present. Therefore, what is most likely to happen
at this time is for a department to single out a member who has found
statistics particularly useful in his field of study and to ask him to
teach a course in geostatistics. Upon agreeing to this, the faculty
member in question realizes soon afterward his own limited exposure to
statistics or what is more likely, his inadequate knowledge of the
application of statistics in geology in fields outside of his own. It
was with this in mind that a two-week summer short course for geology
college teachers was given and from which this book has evolved.
The book is divided into chapters corresponding to the material
presented by the different lecturers in the course. There has been
no attempt made to treat any subject in its full detail nor has there
been a concerted effort to survey all the possible topics covering the
field of statistical geology. The idea rather has been to introduce
some basic concepts and to give examples of applications of statistics
in geology with the intention of provoking interest and eventually
generating discussion among geologists. For someone who is either now
teaching or is planning to teach a course in geostatistics, it is
hoped this book will serve as a guide. Much of the contained material
was prepared specifically for the two week short course. For the
student most of all, it is hoped that the book will make for enjoyable
reading and fruitful study.
Richard B. McCammon
List of Contributors
Felix Chayes
Geophysical Laboratory
Carnegie Institution of Washington
Washington, DC 20008
William T. Fox
Department of Geology
Williams College
Williamstown, MA 01267
J.E. Klovan
Department of Geology
University of Calgary
Calgary 44, Alberta
Canada
W.C. Krumbein
Department of Geological Sciences
Northwestern University
Evanston, IL 60201
R. B. McCammon
Department of Geological Sciences
University of Illinois at Chicago Circle
Chicago, Illinois 60680
Daniel F. Merriam
Department of Geology
Syracuse University
Syracuse, NY 13210
Contents

PREFACE
INTRODUCTION
LIST OF CONTRIBUTORS

CHAPTER 1. STATISTICS AND PROBABILITY by R.B. McCammon
    1.1 Sample Mean and Variance
    1.2 Elements of Probability
    1.3 Problems
    1.4 Searching for Dikes
    1.5 Rocks in Thin Section
    References

CHAPTER 2. R- AND Q-MODE FACTOR ANALYSIS by J.E. Klovan
    2.1 Meaning of Factor
    2.2 Data Matrices
    2.3 Factor Analytic Modes
    2.4 The R-Mode Model
    2.5 A Practical Example
    2.6 Factors
    2.7 The Q-Mode Model
    2.8 An Example
    2.9 Oblique Rotation
    2.10 Practical Examples
    References
    APPENDIX 1: A Primer on Matrix Algebra

CHAPTER 3. SOME PRACTICAL ASPECTS OF TIME SERIES ANALYSIS by William T. Fox
    3.1 Generalities
    3.2 Polynomial Curve Fitting
    3.3 Iterated Moving Averages
    3.4 Fourier Analysis
    3.5 An Application
    References

CHAPTER 4. MARKOV MODELS IN THE EARTH SCIENCES by W.C. Krumbein
    4.1 Fundamentals
    4.2 A Spectrum of Models
    4.3 The Markov Chain
    4.4 Geometric Distribution
    4.5 Probability Trees
    4.6 Embedded Markov Chains
    4.7 Extensions
    References
    Bibliography

CHAPTER 5. A PRIORI AND EXPERIMENTAL APPROXIMATION OF SIMPLE RATIO CORRELATIONS by Felix Chayes
    5.1 Ratio Correlations
    5.2 RTCRSM2
    5.3 UNNO and RANEX
    5.4 Comments on Usage
    References
    APPENDIX 1: RTCRSM2 Program
    APPENDIX 2: RANEX and UNNO Programs

CHAPTER 6. COMPUTER PERSPECTIVES IN GEOLOGY by Daniel F. Merriam
    6.1 Generalities
    6.2 Early Beginnings
    6.3 Usage in General
    6.4 Usage in Geology
    6.5 Patterns and Trends
    References

CHAPTER 7. PROBLEM SET IN GEOSTATISTICS by R.B. McCammon
    7.1 A Paleontologist's Dilemma
    7.2 Particle Size Distribution in Thin Section
    7.3 Particle Diameter Sampling Experiment
    7.4 Linear Regression of Porosity Data. I
    7.5 Linear Regression of Porosity Data. II
    7.6 Sunspots and Earthquakes
    7.7 A Beginning and an End
    7.8 Helmert Transformation
    7.9 A Spine to Remember
    7.10 Nearshore-Offshore Sediments
    References
Chapter 1
Statistics and Probability
R. B. McCammon
1.1 SAMPLE MEAN AND VARIANCE
Perhaps the best known statistic for describing a given set of
numbers is the arithmetic mean. The mean conveys the notion of cen-
tral tendency. With respect to data, the mean is linked with the idea
of sampling and statistical inference. As familiar as the mean may be
to most of us, its sequential properties may not be so well known.
Thus, for ordered observations, whether these be based on time, space,
or experimental design, the mean of n such observations is given by
\[
\bar{x}_n = \frac{x_1 + x_2 + \cdots + x_n}{n}
\]
where it is understood that the observation $x_i$ precedes the observation $x_j$ for $i < j$. If we now add an observation and recalculate the mean based on $n + 1$ observations, we can write
\[
\bar{x}_{n+1} = \frac{n\bar{x}_n + x_{n+1}}{n + 1}
\]
where $x_{n+1}$ represents the new observation. This we recognize as the recursive form of the mean.
Next, we define
\[
v = \frac{\bar{x}_{n+1}}{\bar{x}_n}
\]
as the ratio of the mean calculated for $n + 1$ observations divided by the mean calculated for the preceding $n$ observations, and similarly, we define
\[
\eta = \frac{x_{n+1}}{\bar{x}_n}
\]
where $x_{n+1}$ represents the next observation. From this, we can write
\[
v = \frac{n}{n + 1} + \frac{\eta}{n + 1}
\]
or, solving for $\eta$,
\[
\eta = (v - 1)n + v
\]
We can ask now how large or how small must a new observation be
in order to affect the mean significantly. Suppose, for instance, we
find that the mean is doubled after a new observation is added to ten
previous observations. For this to happen, the new observation would
have to be 12 times greater than the previous mean. In order to dou-
ble a mean calculated from 100 preceding observations, the new obser-
vation would have to be 102 times greater than the preceding mean.
We conclude, therefore, that the mean becomes increasingly more diffi-
cult to alter with increasing sample size unless there are increas-
ingly erratic fluctuations in the observations. In the context of a
time-dependent geologic process, we can conclude that cumulative ef-
fects tend toward equilibrium with advancing geologic time and that
any significant departure from equilibrium is most likely due to out-
side influences.
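The recursion and the doubling argument above are easy to verify numerically. The following is a minimal sketch in Python (not part of the original course materials, which used BASIC; the function name is our own); it computes the mean recursively and checks that a new observation of $\eta = (v - 1)n + v$ times the old mean doubles it:

```python
def recursive_mean(xs):
    """Update the mean one observation at a time:
       mean_{n+1} = (n * mean_n + x_{n+1}) / (n + 1)."""
    m = 0.0
    for n, x in enumerate(xs):
        m = (n * m + x) / (n + 1)
    return m

# The doubling argument: with v = 2 and n previous observations, the new
# observation must be eta = (v - 1)*n + v times the old mean.
for n, eta in [(10, 12), (100, 102)]:
    data = [1.0] * n                      # any data with mean 1.0
    new_mean = recursive_mean(data + [float(eta)])
    assert abs(new_mean - 2.0) < 1e-9     # the mean has doubled
```

For n = 10 the required observation is 12 times the old mean; for n = 100 it is 102 times, as stated in the text.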
Another statistic used to characterize a set of given values is
the variance. The variance describes the scatter about the mean and
for n observations, is defined as
\[
s_n^2 = \frac{\displaystyle\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}{n - 1}
\]
where $\bar{x}_n$ represents the mean of the $n$ observations. The denominator is given by $n - 1$ rather than $n$ by virtue of the fact that the
numerator can be expressed as the sum of n - 1 squared terms each of
which is independent of the mean.
If we consider $n$ ordered observations as before, we can write
\[
s_{n+1}^2 = \frac{n - 1}{n}\, s_n^2 + \frac{(x_{n+1} - \bar{x}_n)^2}{n + 1}
\]
the recursive form of the variance. If
\[
x_{n+1} = \bar{x}_n
\]
it follows that
\[
s_{n+1}^2 < s_n^2 \qquad \text{for } s_n^2 > 0
\]
This reinforces our earlier comment as regards the increase in the stability of the mean with an increase in sample size.
We now turn the problem around slightly and inquire for which new observation it will be true that
\[
s_{n+1}^2 = s_n^2
\]
Using the recursive relation, we obtain
\[
x_{n+1} = \bar{x}_n \pm \sqrt{\frac{n + 1}{n}}\; s_n
\]
as the value for which the variance remains constant. It implies how-
ever that the new mean will be different. Thus the paradox is that in
order to maintain a constant mean, the variance must be reduced,
whereas to maintain a constant variance, the mean must change. For
successive observations, the mean and variance cannot both remain
constant (unless the variance is zero). For observations further re-
moved, the mean and variance can both remain constant, however, if
one considers cyclic fluctuations. This is, in fact, the definition
of a stationary time series about which we shall hear more later.
From what we have said about the variance, it should not be dif-
ficult for you to write the recursive form for the covariance between
two ordered pairs of observations. From there, it should not be much
more difficult to write the recursive form of the correlation coeffi-
cient. This is left as an exercise.
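Before attempting the exercise, it may help to see the two recursions run side by side against the direct definitions. The sketch below is in Python rather than the course's BASIC, and the function name and test data are illustrative only:

```python
def update_mean_var(m, s2, n, x_new):
    """One step of the recursions in the text (valid for n >= 1):
       mean_{n+1} = (n*m + x_new)/(n + 1)
       s2_{n+1}   = ((n - 1)/n)*s2 + (x_new - m)**2/(n + 1)"""
    m_new = (n * m + x_new) / (n + 1)
    s2_new = ((n - 1) / n) * s2 + (x_new - m) ** 2 / (n + 1)
    return m_new, s2_new

xs = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
m, s2 = xs[0], 0.0                   # start at n = 1
for n in range(1, len(xs)):
    m, s2 = update_mean_var(m, s2, n, xs[n])

# Compare against the direct definitions with the n - 1 denominator.
mean_direct = sum(xs) / len(xs)
var_direct = sum((x - mean_direct) ** 2 for x in xs) / (len(xs) - 1)
assert abs(m - mean_direct) < 1e-12
assert abs(s2 - var_direct) < 1e-12
```

The recursion is exact, so the agreement here is to machine precision, not merely approximate.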
1.2 ELEMENTS OF PROBABILITY
Probability can be viewed as partial information either known or
presumed known prior to an event. In the previous section, the mean
and variance were looked upon as descriptors of past data collected.
In terms of probability, our concern lies with both past and future
data. For some random variable Y, we associate the probability p(y)
that Y will take on the value y. Such a statement is conditioned by
information H and thus we define the conditional probability $p(y \mid H)$ as the probability that Y takes on the value y given information H.
An example from geology that illustrates the concept of conditional
probability is the fossil collector in search of trilobites who esti-
mates that there is a much greater probability of finding a trilobite
in Cambrian strata compared with Cretaceous strata.
The two most important properties of conditional probability are that
\[
p(y \mid H) \ge 0 \qquad \text{for all } y \in Y
\]
and
\[
\int_{y \in Y} p(y \mid H)\, dy = 1
\]
or
\[
\sum_{y \in Y} p(y \mid H) = 1
\]
if Y is discrete.
We can argue further that H represents information on a second random variable X; consequently, the conditional probability $p(y \mid x)$ is expressed as
\[
p(y \mid x) = \frac{p(x, y)}{p(x)}
\]
where $p(x, y)$ is the joint probability for X and Y and $p(x)$ is the unconditional probability for X. If Y is independent of X,
\[
p(x, y) = p(x)p(y)
\]
so that
\[
p(y \mid x) = p(y)
\]
which is another way of saying that Y is independent of X.
For Y dependent on X, we can write
\[
p(y) = \int_{x \in X} p(y \mid x)\, p(x)\, dx
\]
for the unconditional probability of Y.
In many instances, it is necessary to transform one random variable to a new variable. For the continuous case, if
\[
Y = f(X)
\]
it follows that
\[
p(y)\, dy = p(x)\, dx
\]
for all $x \in X$ and $y \in Y$. Solving for $p(y)$, we have
\[
p(y) = p(x) \left| \frac{dx}{dy} \right|
\]
Let us consider an example. We define
\[
p(x) = \begin{cases} 2x & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}
\]
This is a probability density since
\[
\int_0^1 p(x)\, dx = \int_0^1 2x\, dx = x^2 \Big|_{x=0}^{x=1} = 1
\]
Suppose we wish to find the probability density of Y where
\[
Y = X^2
\]
Taking the derivative,
\[
dy = 2x\, dx
\]
we have
\[
p(y) = p(x) \left| \frac{dx}{dy} \right| = \frac{2x}{2x} = 1
\]
so that
\[
p(y) = \begin{cases} 1 & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}
\]
Thus, the random variable Y is uniformly distributed between 0 and 1.
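A reader with a computer at hand can confirm this by simulation. The sketch below (Python; the rejection-sampling helper is our own device, not part of the text) draws from $p(x) = 2x$ and checks that $Y = X^2$ lands uniformly in each quarter of [0, 1]:

```python
import random

random.seed(1)

def sample_x():
    """Rejection sampling from p(x) = 2x on [0, 1]: propose uniformly, accept w.p. x."""
    while True:
        x = random.random()
        if random.random() < x:
            return x

N = 100_000
ys = [sample_x() ** 2 for _ in range(N)]  # Y = X^2 should be uniform on [0, 1]

# If Y is uniform, about one quarter of the samples fall in each quarter of [0, 1].
for a in (0.0, 0.25, 0.5, 0.75):
    frac = sum(1 for y in ys if a < y <= a + 0.25) / N
    assert abs(frac - 0.25) < 0.01
```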
Another distribution we may need to establish is that of the sum of two independent random variables, each of which follows the same probability density. In general, we can write for continuous variables
\[
p(z) = \int_{u \in U} p(u)\, p(z - u)\, du
\]
where $Z = X + Y$.
As an example, consider
\[
p(x) = p(y) = \begin{cases} 1 & 0 \le x, y \le 1 \\ 0 & \text{otherwise} \end{cases}
\]
We wish to find the probability density for Z defined as
\[
Z = X + Y
\]
We write
\[
p(z) = \int_0^z p(u)\, p(z - u)\, du = z \qquad 0 \le z \le 1
\]
and, by symmetry,
\[
p(z) = 2 - z \qquad 1 < z \le 2
\]
as the probability density for Z. Try the following problems.
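The convolution can likewise be checked by simulation: sums of pairs of independent uniform variates should follow the triangular density $p(z) = z$ on [0, 1] and $2 - z$ on [1, 2]. A minimal Python sketch (sample size and tolerances are our own choices) checks two values of the cumulative distribution:

```python
import random

random.seed(2)

N = 100_000
zs = [random.random() + random.random() for _ in range(N)]

# Triangular density: p(z) = z on [0, 1] and 2 - z on [1, 2].
# Check P(Z <= 1) = 1/2 and P(Z <= 0.5) = integral of z dz from 0 to 0.5 = 1/8.
p_half = sum(1 for z in zs if z <= 1.0) / N
p_eighth = sum(1 for z in zs if z <= 0.5) / N
assert abs(p_half - 0.5) < 0.01
assert abs(p_eighth - 0.125) < 0.01
```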
1.3 PROBLEMS

1.3.1 Consider the probability density function for X given by
\[
f(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}
\]
Let $Y = -\ln X$. Find the probability density function for Y.

1.3.2 Consider the probability density function for X given by
\[
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}
\]
Let $Y = e^X$. Find $f(y)$.

1.3.3 Let the probability density functions for X and Y be given by
\[
f(x) = \begin{cases} 1/a & 0 \le x \le a \\ 0 & \text{otherwise} \end{cases}
\qquad
f(y) = \begin{cases} 1/a & 0 \le y \le a \\ 0 & \text{otherwise} \end{cases}
\]
Let $Z = X + Y$. Find the probability density function for Z.
1.4 SEARCHING FOR DIKES
Suppose that we have a line segment of length L located somewhere
inside a given area. We propose to locate the segment by conducting a
search along parallel traverse lines spaced a distance D apart. We
will consider the line segment found if one of the traverse lines in-
tersects the segment. For L < D, we ask then what will be the proba-
bility that the line segment is found. Within a geologic context, the
line segment might represent the horizontal trace of a mineralized
dike and the parallel line traverses represent the survey lines of a
field party. Assuming the mineralization associated with the dike has
economic value, the cost of such a search can be weighed against the
expected value of the potential ore deposit. Ignoring the economic
implications, consider the probabilistic aspect of the problem.
To say that a line segment of length L lies somewhere in a given
area and nothing more presumes that such a line segment has a random
orientation with respect to an arbitrarily chosen traverse line. For an arbitrary angle $\theta$, the situation can be seen in Figure 1.1. Thus, the length component $h$ of the line segment perpendicular to a given line of traverse is given by
\[
h = L \sin \theta
\]
and consequently, the probability of intersecting the line segment for fixed $\theta$ is given by
FIGURE 1.1
\[
\Pr\{I \mid \theta\} = \frac{h}{D} = \frac{L}{D} \sin \theta
\]
where $\Pr\{I \mid \theta\}$ represents the conditional probability of intersection. To obtain the unconditional probability $\Pr\{I\}$, we must integrate
\[
\Pr\{I\} = \int_{\theta} \Pr\{I \mid \theta\}\, \Pr\{\theta\}\, d\theta
\]
where $\Pr\{\theta\}$ defines the probability density for $\theta$. The integration is performed over all values of $\theta$. On the assumption that $\theta$ is randomly distributed, we define
\[
\Pr\{\theta\} = \frac{1}{\pi} \qquad 0 \le \theta \le \pi
\]
which is to say that $\theta$ is uniformly distributed between 0 and $\pi$. By substitution, we obtain
\[
\Pr\{I\} = \frac{2}{\pi} \frac{L}{D} \int_0^{\pi/2} \sin \theta\, d\theta = \frac{2L}{\pi D}
\]
as the probability of intersection.
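The result $2L/\pi D$ is the classical Buffon needle probability and is easily verified by Monte Carlo sampling. In the Python sketch below (the function name and trial count are our own choices), a segment is given a random orientation and a random center between two traverse lines:

```python
import math
import random

random.seed(3)

def pr_parallel(L, D, trials=200_000):
    """Monte Carlo estimate: segment of length L vs. parallel lines spaced D apart (L <= D)."""
    hits = 0
    for _ in range(trials):
        theta = random.uniform(0.0, math.pi)   # random orientation
        yc = random.uniform(0.0, D)            # center position between two adjacent lines
        h = L * math.sin(theta)                # extent perpendicular to the lines
        if yc < h / 2 or yc > D - h / 2:       # segment reaches the line below or above
            hits += 1
    return hits / trials

L, D = 1.0, 2.0
est = pr_parallel(L, D)
assert abs(est - 2 * L / (math.pi * D)) < 0.01   # exact value 2L/(pi*D)
```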
In the search for the dike, we may wish to search on a grid. Thus, we can inquire as to the probability of intersecting the line segment of length L for a search conducted on a grid having mesh size D. For a fixed angle $\theta$, the situation can be seen in Figure 1.2. Thus, we need to consider an intersection of the line segment with either horizontally or vertically spaced lines of the grid. If we take the horizontal to mean the x direction and the vertical to mean the y direction, then the probability of intersection $\Pr\{I\}$ is equal to
\[
\Pr\{I\} = \Pr\{I_x \cup I_y\} = \Pr\{I_x\} + \Pr\{I_y\} - \Pr\{I_x \cap I_y\}
\]
FIGURE 1.2

where $\{I_x \cup I_y\}$ represents the intersection along either a horizontal or vertical line of traverse and $\{I_x \cap I_y\}$ represents the intersection of both a horizontal and a vertical line. Because the grid has a square outline, it follows that
\[
\Pr\{I_x\} = \Pr\{I_y\} = \frac{2L}{\pi D}
\]
and hence
\[
\Pr\{I\} = \frac{4L}{\pi D} - \Pr\{I_x \cap I_y\}
\]
To derive the expression for the latter, we can write
\[
\Pr\{I_x \cap I_y\} = \int_{\theta} \Pr\{I_x, I_y \mid \theta\}\, \Pr\{\theta\}\, d\theta
\]
where $\Pr\{I_x, I_y \mid \theta\}$ is the conditional probability of intersecting the line segment with both a horizontal and a vertical line of the grid. Referring to Figure 1.2,
\[
\Pr\{I_x, I_y \mid \theta\} = \frac{L \sin \theta}{D} \cdot \frac{L \cos \theta}{D}
\]
so that
\[
\Pr\{I_x \cap I_y\} = \frac{2}{\pi} \left(\frac{L}{D}\right)^2 \int_0^{\pi/2} \sin \theta \cos \theta\, d\theta = \frac{1}{\pi} \left(\frac{L}{D}\right)^2
\]
Consequently, we have
\[
\Pr\{I\} = \frac{1}{\pi} \frac{L}{D} \left(4 - \frac{L}{D}\right)
\]
as the probability that a search conducted on a grid with mesh size D will locate a line segment of length L ($L \le D$) given that the line segment has no known orientation or location.
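The grid result can be checked the same way. The sketch below draws independent offsets for the horizontal and vertical line families, mirroring the independence assumed in the derivation (Python; names and trial count are illustrative):

```python
import math
import random

random.seed(4)

def pr_grid(L, D, trials=200_000):
    """Monte Carlo estimate: segment of length L vs. a square grid of mesh D (L <= D)."""
    hits = 0
    for _ in range(trials):
        theta = random.uniform(0.0, math.pi)
        xc = random.uniform(0.0, D)            # independent offsets, as in the derivation
        yc = random.uniform(0.0, D)
        hx = L * abs(math.sin(theta))          # extent perpendicular to horizontal lines
        hy = L * abs(math.cos(theta))          # extent perpendicular to vertical lines
        if yc < hx / 2 or yc > D - hx / 2 or xc < hy / 2 or xc > D - hy / 2:
            hits += 1
    return hits / trials

L, D = 1.0, 2.0
est = pr_grid(L, D)
exact = (1 / math.pi) * (L / D) * (4 - L / D)
assert abs(est - exact) < 0.01
```

For L/D = 1/2 the exact value is about 0.557, noticeably higher than the 0.318 of the parallel-line search with the same spacing.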
While the conducted search has been oversimplified in terms of
actual practice, there is much that can be deduced from this simple
example. After presenting this to students, for example, the follow-
ing questions can be posed:
1. If, in fact, the line segment in question is suspected of having a known preferred orientation, how then can the probability of intersection based either on a parallel-line or grid-type search be maximized?
2. How is the probability affected if $L > D$? At this point it is prudent to pause and remind ourselves that some questions, though simply stated, cause considerable distress. The above question falls within this category. For $L > D$ in the case of parallel-line search, for instance, the conditional probability of intersection $\Pr\{I \mid \theta\}$ is given by
\[
\Pr\{I \mid \theta\} =
\begin{cases}
\dfrac{L}{D} \sin \theta & 0 \le \theta \le \sin^{-1}(D/L) \ \text{ and } \ \pi - \sin^{-1}(D/L) \le \theta \le \pi \\[1ex]
1 & \sin^{-1}(D/L) < \theta < \pi - \sin^{-1}(D/L)
\end{cases}
\]
so that the unconditional probability of intersection $\Pr\{I\}$ is
\[
\Pr\{I\} = \frac{2}{\pi} \left[ \int_0^{\sin^{-1}(D/L)} \frac{L}{D} \sin \theta\, d\theta + \left( \frac{\pi}{2} - \sin^{-1} \frac{D}{L} \right) \right]
\]
This is by way of saying that questions posed to the students must be thought out beforehand.
3. Does the probability of intersection change if, instead of a line
segment, a circle of diameter L is considered?
4. Taking into account the economics of a search for a mineralized dike having an expected value V, for what spacing D will the expected gain be maximized? In posing this question, it is necessary to specify the cost of the search as a function of the line spacing or the grid size.
Thus far, our attention has focused on a single line segment or, put into its geologic context, a single dike. In Figure 1.3, however, you will notice there are 100 such line segments (or dikes) of equal length that have been located at random within a square area. We can enlarge our original problem by asking for the probability of intersecting the i-th line segment with length L. This is equal to
\[
\Pr\{I_i\} =
\begin{cases}
\dfrac{2}{\pi} \dfrac{L}{D} & \text{(for parallel lines spaced a distance } D \text{ apart)} \\[1ex]
\dfrac{1}{\pi} \dfrac{L}{D} \left(4 - \dfrac{L}{D}\right) & \text{(for a square grid with mesh size } D\text{)}
\end{cases}
\]
depending on the type of search. For example, consider the parallel-line type of search. If we assume that the location and the orientation of the different line segments with length L are each independent, it follows that the number of intersections observed for a given set of parallel-line traverses spaced a distance D apart is binomially distributed. The expected number of such intersections $N_I$ is given by
\[
E(N_I) = N \Pr\{I_i\}
\]
where N is the total number of line segments contained within the search area.

FIGURE 1.3
Take a sheet of tracing paper and with a ruler make a series of parallel lines spaced a distance D, D = 2L, apart. Next, place this overlay on Figure 1.3 and for an arbitrarily chosen orientation, count the number of intersections of line segments with traverse lines. Repeat this several times if you wish, varying both the location and orientation of the overlay. Since the total number of line segments is known, you can compare the observed number with the expected number given in this instance as
\[
E(N_I) = \frac{2}{\pi} \frac{L}{D}\, N = \frac{N}{\pi}
\]
since L/D equals 1/2. Under such conditions, therefore, when the total number of line segments of length L is unknown, the estimate of the total number can be based on the observed number of intersections, given as
\[
N_{\text{est}} = \frac{\pi D}{2L}\, N_{\text{obs}}
\]
All we have said above applies equally to a grid type of search. A grid of mesh size D = 2L can be used to perform a similar experiment. Remember, however, that the probability of an intersection is different than for parallel-line spacing.
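The overlay experiment itself can be simulated. The Python sketch below (the trial count is arbitrary) plays the role of repeatedly placing the overlay on Figure 1.3 and confirms that the estimator $N_{\text{est}} = (\pi D / 2L) N_{\text{obs}}$ recovers the true count on average:

```python
import math
import random

random.seed(5)

N = 100            # number of segments, as in Figure 1.3
L, D = 1.0, 2.0    # D = 2L, so each segment is found with probability 2L/(pi*D) = 1/pi

def count_hits():
    """One placement of the overlay: count how many of the N segments are intersected."""
    hits = 0
    for _ in range(N):
        theta = random.uniform(0.0, math.pi)
        yc = random.uniform(0.0, D)
        h = L * math.sin(theta)
        if yc < h / 2 or yc > D - h / 2:
            hits += 1
    return hits

# Average the estimate N_est = (pi*D/(2L)) * N_obs over many overlay placements.
trials = 2_000
avg_est = sum(math.pi * D / (2 * L) * count_hits() for _ in range(trials)) / trials
assert abs(avg_est - N) < 3   # the estimator is unbiased: E(N_est) = N
```

A single placement, of course, scatters widely about N, which is why the text suggests repeating the overlay experiment several times.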
A question that can be posed to students at this point is what effect there is on the probability of an intersection if the length of the line segment or line segments to be located is unknown. It may even be the case that this length has a specified probability density function $\Pr\{L\}$. Thus, the length L can be treated as a variable. The probability of an intersection $\Pr\{I\}$ is expressed in this instance by
\[
\Pr\{I\} = \int_L \int_{\theta} \Pr\{I \mid L, \theta\}\, \Pr\{L, \theta\}\, dL\, d\theta
\]
or, if we assume that the length L is statistically independent of the angle $\theta$,
\[
\Pr\{L, \theta\} = \Pr\{L\}\, \Pr\{\theta\}
\]
We can write
\[
\Pr\{I\} = \int_L \int_{\theta} \Pr\{I \mid L, \theta\}\, \Pr\{L\}\, \Pr\{\theta\}\, dL\, d\theta
\]
The probability of an intersection, therefore, is seen to vary depending on the distributions assigned to L and $\theta$; consequently, L and $\theta$ become parameters that affect the observations made for different search strategies. A variety of probability density functions could be inserted in the above equation with the result that the probability of an intersection would differ from one situation to the next.
1.5 ROCKS IN THIN SECTION

We turn our attention now to a problem that takes us beyond the probability of an intersection. We consider the length of a line of intersection. Imagine that a circle of diameter D is located somewhere between two parallel lines spaced a distance t apart as shown in Figure 1.4.

FIGURE 1.4

Suppose we locate another line at random that lies between the two lines and extends parallel to them. The probability that this line will intersect the circle is given by
\[
\Pr\{I\} = D/t
\]
where $\Pr\{I\}$ is the probability of intersection. Here we are interested not in the probability of intersection but rather in the length of
the chord of the circle being intersected. This chord can be consid-
ered the apparent diameter of the circle. We wish to derive the prob-
ability density of this length. To anticipate the geologic implica-
tion of this problem, it is sufficient to take note that the random
slicing of circles by lines is identical in concept to the preparation of rock thin sections in which grains imbedded in a matrix are cut through by a random plane. For the latter, the grain size distribution observed subsequently in thin section will underestimate the actual particle size distribution in the rock. It is natural to examine what effect this has on the moments of the true distribution and how this bias, given that it exists, can be reduced if not eliminated.
Taking the simplest case, we ask what is the probability, given that our circle is intersected by the line, that the observed length L will be greater than some length $L_a$ ($L_a > 0$) and less than or equal to some length $L_b$ ($L_a \le L_b \le D$). Referring to Figure 1.5, it is seen that this probability is given by
\[
\Pr\{L_a < L \le L_b \mid I\} = \frac{2h}{D}
\]
where the multiplier of 2 derives from the presence of two slabs of thickness h occurring within the diameter of the circle. From the figure, we see that
\[
H_a^2 + \left(\frac{L_a}{2}\right)^2 = \left(\frac{D}{2}\right)^2
\qquad \text{and} \qquad
H_b^2 + \left(\frac{L_b}{2}\right)^2 = \left(\frac{D}{2}\right)^2
\]

FIGURE 1.5

so that $h = H_a - H_b$ is given by
\[
h = \frac{D}{2}\left[\sqrt{1 - \left(\frac{L_a}{D}\right)^2} - \sqrt{1 - \left(\frac{L_b}{D}\right)^2}\,\right]
\]
The unconditional probability that the observed length will lie within these limits is
\[
\Pr\{L_a < L \le L_b\} = \frac{D}{t}\left[\sqrt{1 - \left(\frac{L_a}{D}\right)^2} - \sqrt{1 - \left(\frac{L_b}{D}\right)^2}\,\right]
\]
for $0 < L_a \le L_b \le D$. If we let $L_a$ approach 0 and set $L_b$ to an arbitrary value c ($0 < c \le D$), the cumulative probability distribution for L is
\[
\Pr\{L \le c\} = \frac{D}{t}\left[1 - \sqrt{1 - \left(\frac{c}{D}\right)^2}\,\right]
\]
Taking the derivative with respect to c, the probability density is
\[
p(c) = \frac{1}{t}\, \frac{c/D}{\sqrt{1 - (c/D)^2}}
\]
As long as we are concerned with only a single diameter D, we can without loss of generality let t = D = 1 so that
\[
p(\ell) = \frac{\ell}{\sqrt{1 - \ell^2}} \qquad 0 < \ell \le 1
\]
for a circle of unit diameter. A graph of this probability density function is given in Figure 1.6. The distribution falls off rapidly for values much less than one. The question is, how much does this affect the estimate of the circle diameter if observations are based on the apparent diameters measured by successive random slices of a circle. The mean of the above distribution is given by
\[
E(L) = \int_0^1 \ell\, p(\ell)\, d\ell = \int_0^1 \frac{\ell^2}{\sqrt{1 - \ell^2}}\, d\ell = \frac{\pi}{4}
\]
FIGURE 1.6
so that an estimate of the true diameter based on the average value of apparent diameters obtained by successive random slices would be in error by approximately 25 percent. In this instance, the estimate could be corrected simply by multiplying the average value of apparent diameter by $4/\pi$. In Figure 1.7, 50 circles of equal diameter D are located at random within a given square area. Using a ruled transparent overlay of parallel lines spaced a distance D apart, measure the apparent diameters of circles intersected for several random positions of the overlay. This generates the probability density of the apparent diameter of the circles. If you wish, construct a histogram for the values obtained. Next, calculate the mean and compare this with the expected value of the distribution above. The two values should agree within the precision allowed by the total number of measured intersections.
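The same check can be run without paper: slicing a unit circle at random offsets and averaging the chord lengths should give $E(L) = \pi/4$. A minimal Python sketch (sample size and tolerance are our own choices):

```python
import math
import random

random.seed(6)

# Slice a circle of unit diameter (t = D = 1) with random parallel lines and
# record the chord length, i.e. the "apparent diameter", of each intersection.
N = 200_000
chords = []
for _ in range(N):
    y = random.uniform(-0.5, 0.5)                # line offset from the circle's center
    chords.append(2 * math.sqrt(0.25 - y * y))   # chord length at that offset

mean_chord = sum(chords) / N
assert abs(mean_chord - math.pi / 4) < 0.005     # E(L) = pi/4, about 0.785
```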
In reality, particles are of different sizes and mixed together;
therefore we must consider a distribution of diameters. In the pre-
sent context, this can be represented by circles having different dia-
meters mixed together in fixed proportions.
To advance our discussion, we must take note that what we took before as $p(\ell)$ we now write as $p(\ell \mid D)$, where D is a specified diameter. The unconditional probability density of the apparent diameter for mixtures of circles having different diameters can be expressed as
\[
\Pr\{L\} = \int_D \Pr\{L \mid D\}\, \Pr\{D\}\, dD
\]
where $\Pr\{D\}$ is the probability density of the circle having a diameter D. Rewritten in lower case for a discrete mixture, it is
\[
p(\ell) = \sum_d p(\ell \mid d)\, p(d)
\]
FIGURE 1.7
The general problem is now as follows: given an observed distribution $p(\ell)$ of apparent diameters, find the distribution of the true diameters $p(d)$ given that $p(\ell \mid d)$ is specified. In Figure 1.8, for instance, 80 circles 1/4 inch in diameter and 20 circles 1/2 inch in diameter are located at random within a given square area.* Using an overlay with parallel lines spaced 1/2 inch apart, measure the apparent diameters of the circles intersected for several random orientations. Again, you may wish to construct a histogram. For this example, $p(d)$ is given by
\[
p(d_i) = \begin{cases} 0.8 & d_1 = 0.25 \\ 0.2 & d_2 = 0.50 \\ 0 & \text{otherwise} \end{cases}
\]
so that
\[
p(\ell) = \frac{1}{t} \sum_{i=1}^{2} \frac{\ell/d_i}{\sqrt{1 - (\ell/d_i)^2}}\, p(d_i) \qquad \ell > 0
\]

*Reduced scale used in Figure 1.8.
FIGURE 1.8
and therefore
\[
E(L) = \frac{\displaystyle\int_0^{d_2} \ell\, p(\ell)\, d\ell}{\displaystyle\int_0^{d_2} p(\ell)\, d\ell}
= \frac{\pi}{4} \cdot \frac{\displaystyle\sum_{i=1}^{2} p(d_i)\, d_i^2 / t}{\displaystyle\sum_{i=1}^{2} p(d_i)\, d_i / t}
\]
\[
E(L) = \frac{(\pi/4)\left[0.8 \times (1/8) + 0.2 \times (1/2)\right]}{0.8 \times (1/2) + 0.2 \times 1} = \frac{\pi}{12} \approx 0.26 \text{ in.}
\]
where t has been set equal to the largest diameter. Compare the mean
of the apparent diameters with this expected value. Once again, the
values should agree within the precision allowed by the total number
of measured intersections. While the solution to the general problem
where there is a continuous distribution of particle diameters is ob-
viously more difficult, the same principles apply as to this example.
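For the two-diameter mixture of Figure 1.8 the expected value can be evaluated directly from the expression above; the factors of 1/t cancel between numerator and denominator. A short Python check (variable names are our own):

```python
import math

# Mixture from Figure 1.8: 80 circles of diameter 1/4 in. and 20 of 1/2 in.
p = {0.25: 0.8, 0.50: 0.2}

# E(L) = (pi/4) * sum p(d_i)*d_i^2 / sum p(d_i)*d_i   (the 1/t factors cancel)
num = sum(w * d * d for d, w in p.items())
den = sum(w * d for d, w in p.items())
e_chord = (math.pi / 4) * num / den

assert abs(e_chord - math.pi / 12) < 1e-12   # about 0.26 in.
```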
These examples, as with the examples in the preceding section, have touched lightly on the broader topic of geometric probability. Readers who wish to pursue details of the subject further can refer to the short but lucid monograph by Kendall and Moran (1963) and two more recent review articles by Moran (1966, 1969). Within the field of geology, geometric probability has been applied to the study of particle size in thin section (Rose, 1968) and to the probability of
success of locating elliptical targets underground with square, rec-
tangular, and hexagonal grids (Singer, 1972). For these references,
particularly the latter two, the reader will recognize the difficulty
in translating the relatively straightforward equations discussed in
this chapter to the more complex situations met with in practice.
Despite these difficulties, however, the concept of geometric proba-
bility offers a very practical device for studying the uncertainty as-
sociated with spatial form in geology.
REFERENCES
Kendall, M. G., and Moran, P. A. P., 1963, Geometrical probability:
London, Chas. Griffin, Std., 125 p.
Moran, P. A. P., 1966, A note on recent research in geometrical proba-
bility: Jour. App. Prob., v. 3, p. 453-463.
Moran, P. A. P., 1969, A second note on recent research in geometrical
probability: Adv. App. Prob., v. 1, p. 73-90.
Rose, H. E., 1968, The determination of the grain size distribution of
a spherical granular material embedded in a matrix: Sediment.,
v. 10, p. 293-309.
Singer, D. A., 1972, ELIPGRID, a FORTRAN IV program for calculating
the probability of success in locating elliptical targets with
square, rectangular, and hexagonal grids: Geocom. Programs, v. 4,
p. 1-10.
Chapter 2
R- and Q-Mode Factor Analysis
J. E. Klovan
2.1 MEANING OF FACTOR
Factor analysis is a generic term that describes a variety of
mathematical procedures applicable to the analysis of data matrices.
Although developed and largely exploited by psychologists, it is a
method of general application to many branches of scientific enquiry
and geology is no exception.
At the outset the word "factor" requires precise definition be-
cause the way it is interpreted can give a false impression as to what
the method attempts to do. Mathematically, a factor refers to one of
a number of things that when multiplied together yield a product.
Another use of the word is in reference to some sort of theoretical or
hypothetical causal variable. As will become clear, it is the former
meaning that should be applied to the method; occasionally the second
meaning may be applicable to the results of the method.
The principles of the mathematics involved in factor analysis
were outlined by Pearson in 1901. Starting in 1904, Spearman began
applying the method to psychological theories. Thurstone, Holzinger,
and a large number of other workers expanded on the method during the
1930's and 1940's. The advent of electronic computers in the 1950's
made the laborious calculations involved amenable to quick solution
and the methods became widely available. In the late 1950's the method
was first applied to geologic problems.
Geologists are commonly faced with problems wherein a large number
of properties are measured or described on a large number of things.
The "things" may, for example, be rocks and the "properties" may be the
amounts of various minerals making up the rocks. If these data are
arranged in tabular form such that each rock represents a row of the
table and each mineral species a column, then the resulting chart of
numbers is referred to as a data matrix.
Analysis of such a data matrix may pose a considerable problem to
the investigator if it contains many numbers. The primary aim of fac-
tor analysis is to achieve a parsimonious description of such a data
matrix -- that is, to determine if this table of numbers can be sim-
plified in some way.
Returning to rocks and minerals as a concrete example, perhaps
there are a small number of mineral assemblages, which, if determined,
describe the rocks almost as well as all the amounts of the individual
minerals. In this case the objective of factor analysis would be to
simplify the original large data matrix by determining
1. The number of mineral assemblages present
2. The composition of each assemblage in terms of the original min-
eral species
3. A description of each rock sample in terms of the amount of each
assemblage present in it
The present chapter will attempt to outline the mathematical pro-
cedures used in one of the methods of factor analysis, namely, the
method of principal components. A simplified heuristic approach will
be followed that will attempt to make use of the geologists' ability
to visualize three-dimensional concepts. Several simple examples will
be used to lead the reader through a formidable mathematical jungle,
and finally, some real applications of the method are briefly explained.
For readers with no experience with matrix algebra, the Appendix
contains concepts that may be helpful in the following exposition.
2.2 DATA MATRICES
A matrix is a table of numbers with so many rows and so many col-
umns. As a matter of convention here, the rows of a data matrix will
represent geologic entities, the columns will represent attributes of
these entities. In most cases there will be more entities than
attributes so that most data matrices are rectangular in shape and are
"taller" than they are "wide." More simply, data matrices tend to
have more rows than columns.
In the terminology of matrix algebra, an entire matrix is symbol-
ized by a capital letter. "X" will be used to symbolize any data ma-
trix. The size of the matrix is specified by a double subscript no-
tation, thus X_{N,n} refers to a table of numbers with N rows and n col-
umns. If 93 rocks have been analyzed for 12 minerals, the resulting
data matrix may be symbolized as X_{93,12}.
The entities of a geologic data matrix will depend on the nature
of the problem. Rock or sediment specimens are obvious cases. Sam-
ples of water or oil collected from various formations are also com-
mon. (Note that the word "sample" raises some semantic problems in
that it carries a special statistical connotation.)
Attributes, often referred to as variables, also depend on the
nature of the problem. A rock may be analyzed as to its mineral com-
ponents in which case the amount of each mineral is considered an at-
tribute. The rock could equally well, or in addition, be analyzed in
terms of certain chemical elements. The amount of an element then be-
comes an attribute.
Clearly, attributes do not exist in and of themselves; they are
properties of things. It is important, therefore, to define at the
outset of an investigation what is an entity and what is an attribute.
A fossil, for example, may in one study be considered an entity and
various features of it will be attributes. Or, in another study, the
amount of that fossil in a stratum may be considered an attribute of
that stratum.
2.3 FACTOR ANALYTIC MODES
Confronted with a data matrix the investigator may focus his at-
tention on two distinct yet interrelated questions:
R-mode: If the primary purpose of the investigation is to understand
the inter-relationships among the attributes, then the ana-
lysis is said to be an R-mode problem.
Q-mode: If the primary purpose is to determine interrelationships
among the entities, then the analysis is referred to as Q-
mode.
In many cases both R- and Q-mode analyses are performed on the
same data matrix. As discussed later, factor analysis is applicable
to both types of questions. The essential solutions of the factor
analysis are only slightly dependent on the mode. The exact nature
of this relationship is described in more complete detail in a recent
paper (Klovan and Imbrie, 1971).
2.4 THE R-MODE MODEL
Given the data matrix X_{N,n}, the basic problem is to determine m
linear combinations of the original n variables that describe the geo-
logical entities without significant loss of information (assuming
m«n). These m linear combinations are termed factors. The method of
analysis operates not on the original data matrix but rather on the
matrix of correlation coefficients derived from the data matrix.
The well-known Pearson product-moment correlation coefficient is
the standard means of assessing the degree of linear relationship be-
tween a pair of variables.
If X_i and X_j are any two variables, that is, two columns from the
data matrix X, then the correlation coefficient between them may be
computed from:
    r_ij = Σ_{k=1}^{N} (X_ki − X̄_i)(X_kj − X̄_j) /
           √[ Σ_{k=1}^{N} (X_ki − X̄_i)² · Σ_{k=1}^{N} (X_kj − X̄_j)² ]    (2.4.1)

where the notation Σ_{k=1}^{N} refers to summation over all the entities; X̄_i
and X̄_j are the mean values of variables X_i and X_j.
This is the so-called raw score formula and the situation is por-
trayed in Figure 2.1.
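Formula (2.4.1) translates directly into code. A minimal sketch in Python (the function name and sample values are illustrative only):

```python
import math

def pearson_r(x, y):
    """Raw-score product-moment correlation coefficient, formula (2.4.1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # exactly linear: r = 1.0
```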
FIGURE 2.1  (scatter plots of the entities for three pairs of variables:
Var 1 vs. Var 2 with r_12 = +1.0; Var 5 vs. Var 6 with r_56 = 0.0;
Var 3 vs. Var 4 with r_34 = -1.0)
The origin of this graph may be shifted without changing the con-
figuration of the points. If the mean value of a variable is sub-
tracted from every value of the variable, the results are deviate
scores. The resulting numbers show how far from the mean each entity
is. This also results in shifting the origin of the variable to its
mean value.
If we define x_i and x_j as two variables in deviate form, then the
correlation formula becomes

    r_ij = Σ_{k=1}^{N} x_ki x_kj / √( Σ_{k=1}^{N} x_ki² · Σ_{k=1}^{N} x_kj² )    (2.4.2)
This situation is shown in Figure 2.2.
A variable in standard form is defined as

    z_i = (X_i − X̄_i) / σ_i = x_i / σ_i    (2.4.3)
FIGURE 2.2  (the scatter plots of Figure 2.1 replotted in deviate form:
r_12 = +1.0, r_56 = 0.0, r_34 = -1.0)
where σ_i is the standard deviation of variable X_i. A standardized
variable may be viewed as having a mean of zero and a standard devia-
tion of one. The individual values of the variable show how far from
the mean an entity is in terms of units of standard deviation.
The standard deviation of a variable is given by

    σ_i = √( Σ_{k=1}^{N} (X_ki − X̄_i)² / N ) = √( Σ_{k=1}^{N} x_ki² / N )    (2.4.4)

or

    √N σ_i = √( Σ_{k=1}^{N} x_ki² )    (2.4.5)
(Editor's note: This definition differs slightly from the one used in
Chapter 1 in that N, rather than N-l, is in the denominator. In fac-
tor analysis, the sample size is usually large enough so that this dif-
ference can be safely ignored.)
Thus

    z_i = (X_i − X̄_i) / σ_i = x_i / σ_i    (2.4.6)
Substituting this into formula (2.4.2) we obtain
    r_ij = (1/N) Σ_{k=1}^{N} z_ki z_kj    (2.4.7)
This formula illustrates the fact that the correlation coefficient is
nothing more than the average value of the cross-product between two
variables given in standard form.
Up to now only two variables have been considered. In the gen-
eral case, correlation coefficients are computed between every possi-
ble pair of variables and arranged into a square symmetrical matrix
R_{n,n}. This matrix contains all the information regarding the pairwise
linear relationships between the variables.
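Formula (2.4.7) gives the whole matrix R in one matrix product once the data are standardized. A minimal NumPy sketch (toy random data; the variable names are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))              # 20 entities, 4 attributes (toy data)
N = X.shape[0]

Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standard form: zero mean, unit sd
R = Z.T @ Z / N                           # every pairwise r at once, per (2.4.7)

assert np.allclose(np.diag(R), 1.0)                  # each variable vs. itself
assert np.allclose(R, np.corrcoef(X, rowvar=False))  # matches library routine
```

Note that `np.std` uses N in the denominator, matching the convention adopted in this chapter.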
In Figure 2.1 the correlation coefficient was perceived to mea-
sure the degree of linear association between two variables as mea-
sured by the scatter of data points. Note that the axes of the graph
are the variables and the entities are points on the graph. If three
variables are considered, then the third variable is constructed as an
axis at right angles to the other two, and the entities will form some
sort of three-dimensional swarm of points. Other variables can be
added by constructing axes at right angles to all other axes but of
course this situation cannot be portrayed in three dimensions. A row
of the data matrix may then be considered as a vector that gives the
coordinates of an entity in n-dimensional space.
The situation may be reversed. A graph may be constructed using
the entities as sets of orthogonal axes as in Figure 2.3. Here the
variables become points on the graph. A column of the data matrix may
then be considered as a vector that gives the coordinates of a variable
in N-dimensional space.
If the variables are expressed in deviate form, that is the ori-
gin is at the mean, then variable Xi and variable Xj can be portrayed
as two vectors in N-space. From the Pythagorean theorem, the length
of a vector is equal to
    ℓ_i = √( Σ_{k=1}^{N} x_ki² )    (2.4.8)
FIGURE 2.3  (variables plotted as points on orthogonal axes defined by
the entities; first axis labeled Item 1)
Further, elementary trigonometry shows that the angle θ between the
two vectors is equal to

    cos θ = Σ_{k=1}^{N} x_ki x_kj / (ℓ_i ℓ_j)    (2.4.9)
which is exactly equivalent to formula (2.4.2).
Thus the correlation coefficient between any two variables is
also the cosine of the angle between the two vectors representing the
variables situated in N-space.
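The equivalence of (2.4.9) and (2.4.2) is easy to verify numerically; a small sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))          # 50 entities measured on two variables

x = X[:, 0] - X[:, 0].mean()          # deviate-form vectors in N-space
y = X[:, 1] - X[:, 1].mean()

cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))  # (2.4.9)
r = np.corrcoef(X[:, 0], X[:, 1])[0, 1]                        # (2.4.1)

assert np.isclose(cos_theta, r)       # the correlation is the cosine
```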
Both interpretations of the correlation coefficient will be found
useful in the following discussion.
The following equation perhaps best summarizes the underlying
rationale of factor analysis:
    z_j = a_j1 F_1 + a_j2 F_2 + ... + a_jm F_m + a_j E_j    (2.4.10)*

In words, the equation states that any variable (for convenience
considered in standard form) z_j consists of a linear combination of
m common factors plus a unique factor. The resemblance of this equa-
tion to a multiple regression equation should be obvious.
In the factor model, the F's refer to hypothetical variables
called factors. It is assumed that each of these m factors will be
involved in the delineation of two or more variables, thus the factors
are said to be common to several variables; m is assumed to be less
than n, the number of variables. The a's are analogous to β weights
in regression analysis. They are weights to be applied to the factors
so that the factors can best predict the value of z_j; "best" defined
in a least-squares sense. In factor analysis parlance, the a's are
termed loadings and the F's factor scores. The factor designated E_j
is a factor unique to variable z_j and is analogous to the error term
in a regression equation.
The factor model contains n such equations; one for each variable.
For a particular entity k, equation (2.4.10) becomes:
    z_kj = a_j1 F_k1 + a_j2 F_k2 + ... + a_jm F_km + a_j E_kj    (2.4.11)
The values for the a's do not change from entity to entity (just
as the β's remain constant in regression equations), but the values of
the F's do change from entity to entity. An excellent way to view the
F's is to think of them as new variables that are linear combinations
of the old variables. As such, each entity can "contain" a different
amount of each one of these new variables. The F's are referred to as
factor scores.
The basic problem then is threefold:
1. To determine values for the a's
2. To determine values for the F's
3. To determine m, the number of common factors
*In this equation and the following derivation, the author rea-
lizes that he is confusing the principal component model with that of
a true factor analytic model. The justification for this is that, in
practice, most geologic applications follow this model, and, addition-
ally, it is easier to explain the underlying rationale and objectives
in this form.
There are several ways in which an explanation of the solution
can be approached. Two such approaches will be dealt with here.
The equation for the variance of a variable in standard form is
given by

    σ_j² = Σ_{k=1}^{N} z_kj² / N    (2.4.12)

Due to the standardization process the variance of z_j is, of course,
equal to one.

In terms of the factor model the variance may be written as:

    σ_j² = Σ_{k=1}^{N} z_kj² / N
         = a_j1² ΣF_k1²/N + a_j2² ΣF_k2²/N + ... + a_jm² ΣF_km²/N + a_j² ΣE_kj²/N
         = 1    (2.4.13)

Two simplifying restrictions may now be imposed.
1. The factors must be in standard form.
2. The factors must be uncorrelated.
The first constraint makes every term of the form ΣF_k1²/N equal to
one (since this is the variance of the factor).

The second constraint makes every term of the form ΣF_k1 F_kp / N
equal to zero [see equation (2.4.7)].

The entire equation thus becomes:

    1 = a_j1² + a_j2² + ... + a_jm² + a_j²    (2.4.14)
It is seen that the total variance of a variable is to be made up
of the sum of the squared a's.
Further, the total variance consists of two parts.
1. That due to the common factors. This is termed the communality,
symbolized h_j²:

    h_j² = a_j1² + a_j2² + ... + a_jm²    (2.4.15)
2. That due to the unique factor. This of course is equal to 1 − h_j²
and by definition is that part of the variance of variable j
that is not shared by any of the other variables. It is analogous
to the error term in regression analysis.
The method of principal components attempts to minimize this unique
variance in the solution for the a's and F's.
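The split of each variable's unit variance into communality and uniqueness is simple arithmetic on the rows of the loading matrix; a small numerical sketch (the loading values are invented purely for illustration):

```python
import numpy as np

# hypothetical loadings of three variables on two common factors
A = np.array([[0.9,  0.3],
              [0.8, -0.4],
              [0.1,  0.7]])

communality = (A ** 2).sum(axis=1)   # h_j^2, equation (2.4.15)
uniqueness = 1.0 - communality       # variance left to the unique factor

print(communality)                   # approximately [0.9, 0.8, 0.5]
print(uniqueness)                    # approximately [0.1, 0.2, 0.5]
```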
The algebraic notation of the factor model is very cumbersome and
not readily comprehended. Matrix notation allows an easier represen-
tation of the model.
As has been pointed out, the data matrix X_{N,n} can be transformed
to the standardized version Z_{N,n}.
We can consider the Z matrix as being the sum of two matrices:

    Z_{N,n} = C_{N,n} + E_{N,n}    (2.4.16)

where C contains the "true" measures and the matrix E contains "error"
measures.
It is a fact that any matrix can be expressed as the product of
two other matrices. Thus the matrix Z can be considered as the pro-
duct of the matrix F and A or
Z = FA' (2.4.17)
where A' is the transpose of A.
The F matrix contains the factor scores; the A matrix the factor
loadings. But the model contains two types of F's and A's, the common
and unique portions.
The factor loading matrix can be considered as consisting of two
parts.
          m       u
       +------+------+
    n  |  A_c |  A_u |
       +------+------+

The first m columns contain the common factor loadings; the last
u columns contain the unique factor loadings; A_u is a diagonal matrix.
Similarly, the F matrix can be partitioned into a part containing
m columns of common factor scores and a part containing u columns of
unique factor scores.
          m       u
       +------+------+
    N  |  F_c |  F_u |
       +------+------+
The model is now evident; Ac and Fc will be chosen in such a way
as to yield the matrix C; Au and Fu yield the matrix E. The sum of C
and E, of course, yields the original matrix Z.
To summarize, the total data matrix is considered to be derivable
from the product of two other matrices (Z = FA').
Z can further be considered as the sum of two matrices C and E;
C containing "true" measures, E containing error measures: Z = C + E.
Both F and A can be partitioned into two components, a common variance
part and an error part. Thus
    Z = F_c A_c' + F_u A_u'    (2.4.18)
Because we are usually only interested in the matrix C, solution
for Fc and Ac will be sufficient. E can always be obtained from
E = Z - C.
The basic matrix manipulations required for solution are presented
below in point form. Each step is then explained with reference to a
simple geometric model.
1. The correlation matrix may be obtained from

    R = (1/N) Z'Z    (2.4.19)

2. The basic factor model states that

    Z = FA'    (2.4.20)

or

    Z' = AF'    (2.4.21)
3. Substituting (2.4.20) and (2.4.21) into (2.4.19) and ignoring the
constant 1/N, we obtain

    R = Z'Z = AF'FA'    (2.4.22)
4. We impose the condition that the factors will be uncorrelated;
that is, F will be orthonormal.
F'F = I (2.4.23)
where I is the identity matrix. Thus, (2.4.22) becomes
R = AA' (2.4.24)
5. We impose the constraint that

    A'A = Λ    (2.4.25)

where Λ is the diagonal matrix of eigenvalues of the correlation
matrix R, or

    U'RU = Λ    (2.4.26)

where U contains the eigenvectors associated with Λ. U is a
square orthonormal matrix so that:
    U'U = UU' = I    (2.4.27)

6. The following matrix manipulation provides the solution.

(a) Pre-multiply (2.4.26) by U:

    UU'RU = UΛ
    RU = UΛ    (2.4.28)

(b) Post-multiply (2.4.28) by U':

    RUU' = UΛU'
    R = UΛU'    (2.4.29)
(c) Because R is a square symmetric matrix with the Gramian
property (positive semi-definiteness), Λ^(1/2) exists.

7. Substituting into (2.4.24):

    R = AA' = UΛ^(1/2) Λ^(1/2) U'    (2.4.30)

so that

    A = UΛ^(1/2)    (2.4.31)
8. The matrix F may be solved for from

    Z = FA'
    ZA = FA'A
    ZA = FΛ
    F = ZAΛ^(-1)    (2.4.32)
Explanation
Step 1. Earlier it was shown that this equation was valid. Geo-
metrically, Figure 2.4 shows the scatter diagram interpretation of a
situation involving three variables. Note that the swarm of data
points is in the form of a three-dimensional ellipsoid. Theoretically,
every correlation matrix will define such an ellipsoid -- a hyperellip-
soid when more than three dimensions are involved.
Step 2. The basic factor equation states that the data matrix
can be considered as the product of two matrices F and A. Unfortu-
nately, matrix theory shows that there is an infinite number of pairs
of matrices F and A whose product will reproduce Z.
FIGURE 2.4  (a swarm of data points forming a three-dimensional
ellipsoid in the space of Var 1, Var 2, and Var 3)
Step 3. This equation simply shows the relation between F, A, R,
and Z.
Step 4. Because there is an infinite number of pairs of matrices
whose product will yield Z, we impose a constraint that the F matrix
be orthonormal. Simply, this means that factors will be in standard
form and furthermore, they will be uncorrelated. If we consider these
factors to be new variables then this implies that the new variables
have no mutual correlation among them. Because of this constraint, it
is seen that the F matrix can for the moment be disregarded from fur-
ther consideration.
Steps 5-6. The crux of the method of principal components is em-
bodied in equation (2.4.25).
Any square symmetric matrix, such as R, can be uniquely defined
in terms of two other matrices that have special properties.
In the expression R = UAU', A is a diagonal matrix containing
the eigenvalues of R. U is a square orthonormal matrix containing the
associated eigenvectors. The calculation of eigenvalues and eigen-
vectors is a straightforward matter using computer programs. Essen-
tially, eigenvalues are the roots of a series of partial derivative
equations set up so as to maximize the variance and retain orthogon-
ality of the factors. Physically, the eigenvectors merely represent
the positions of the axes of the ellipsoid (or hyperellipsoid) shown
on Figure 2.4. The eigenvalues are proportional to the lengths of
these axes. The largest eigenvalue and its corresponding eigenvector
represent the major axis of the ellipsoid. It is important to note
that the data points show maximum spread along this axis, that is,
the variance of the data points is at a maximum.
The second largest eigenvalue and its eigenvector represent the
largest minor axis. The axis is, of course, at right angles to the
major axis and the data points are seen to have the second largest
amount of variance along this direction. The same reasoning applies
to the remaining eigenvalues and eigenvectors.
So what is accomplished at this step is to create a new frame of
reference for the data points. Rather than using the old set of vari-
ables as reference axes we can use the eigenvectors instead. These
have the property that they are located along directions of maximum
variance and are uncorrelated.
Step 7. The equation R = AA' suggests that the correlation ma-
trix derived from the original data can be duplicated exactly by the
major product moment of the factor loadings matrix. This is true if
as many factors as original variables are used. However, the matrix
Ac of equation (2.4.18) will approximate R. The difference matrix,
R - AcA'c contains the residual correlations not accounted for by the
common factors. The determination of the number of common factors
needed is left for later discussion.
The end result of the matrix manipulations is the equation A =
UΛ^(1/2). This simply means that the desired matrix of factor loadings
is the orthogonal matrix of eigenvectors of R, each column of which is
scaled by the square root of the corresponding eigenvalue. This is
merely a normalization process.
Step 8. The matrix of factor scores is obtained by straightfor-
ward matrix manipulation.
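The eight steps above can be strung together in a few lines of modern NumPy (a sketch, not the course's original programs; the data are random stand-ins, and libraries may differ in eigenvalue ordering and in the signs of eigenvectors):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(30, 5))               # 30 entities, 5 attributes (toy data)
N = X.shape[0]

Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize (N in the denominator)
R = Z.T @ Z / N                            # step 1: R = (1/N) Z'Z

lam, U = np.linalg.eigh(R)                 # steps 5-6: eigenvalues/vectors of R
lam, U = lam[::-1], U[:, ::-1]             # order largest eigenvalue first

A = U * np.sqrt(lam)                       # step 7: A = U Lambda^(1/2)
F = Z @ A / lam                            # step 8: F = Z A Lambda^(-1)

assert np.allclose(A @ A.T, R)             # (2.4.24): R = AA'
assert np.allclose(F @ A.T, Z)             # the model: Z = FA'
assert np.allclose(F.T @ F / N, np.eye(5)) # factors standardized, uncorrelated
```

Retaining only the first m columns of A and F gives the common-factor approximation discussed under Step 7.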
2.5 A PRACTICAL EXAMPLE
To recapitulate what has been discussed in rather abstract terms,
and to give physical significance to the method as explained thus far,
a simple geologic problem will be followed through.
Figure 2.5 is a typical geologic data matrix with 20 rows and 10
columns. The rows represent 20 localities and the columns, as indi-
cated, represent attributes of the rocks and structures at each local-
ity. The data are fictitious.
Figure 2.6 is the 10 x 10 correlation matrix obtained from these
data.
Figure 2.7 is the list of eigenvalues obtained from the correla-
tion matrix.
Geological Properties

[The ten variable names, printed vertically in the original, head the
columns below; they correspond to variables 1-10 of Figures 2.6-2.8.]

Locality
1 1175 999 975 625 158 262 437 324 431 433
2 936 820 813 575 267 379 478 413 411 428
3 765 711 716 599 457 548 579 558 491 513
4 624 598 600 542 471 515 531 520 490 500
5 417 422 422 432 444 441 437 439 437 437
6 401 403 375 401 405 270 317 290 515 465
7 520 504 488 469 427 370 410 386 507 482
8 661 626 618 553 462 466 506 480 529 523
9 877 787 773 594 354 401 493 434 500 498
10 1060 932 898 656 315 312 468 370 580 552
11 1090 960 935 681 334 375 518 427 567 555
12 896 811 790 629 403 411 511 448 570 555
13 748 688 672 560 401 399 472 426 525 512
14 617 573 553 477 360 315 385 342 487 462
15 436 424 389 393 361 207 277 236 514 455
16 664 587 560 419 212 182 287 221 397 369
17 750 665 651 484 259 299 387 331 399 396
18 903 797 791 573 291 396 486 427 421 437
19 998 888 887 657 366 499 583 527 480 506
20 1162 999 994 671 252 404 539 450 449 471
Data matrix
FIGURE 2.5
Correlation coefficients between the ten geological properties

Variable     1      2      3      4      5      6      7      8      9     10
   1     1.000  0.998  0.994  0.908 -0.576  0.130  0.581  0.282  0.012  0.258
   2     0.998  1.000  0.998  0.933 -0.523  0.183  0.625  0.334  0.057  0.313
   3     0.994  0.998  1.000  0.942 -0.497  0.235  0.664  0.383  0.035  0.312
   4     0.908  0.933  0.942  1.000 -0.180  0.477  0.834  0.610  0.286  0.590
   5    -0.576 -0.523 -0.497 -0.180  1.000  0.616  0.258  0.519  0.539  0.550
   6     0.130  0.183  0.235  0.477  0.616  1.000  0.880  0.987  0.181  0.524
   7     0.581  0.625  0.664  0.834  0.258  0.880  1.000  0.944  0.216  0.604
   8     0.282  0.334  0.383  0.610  0.519  0.987  0.944  1.000  0.208  0.573
   9     0.012  0.057  0.035  0.286  0.539  0.181  0.216  0.208  1.000  0.909
  10     0.258  0.313  0.312  0.590  0.550  0.524  0.604  0.573  0.909  1.000

FIGURE 2.6
Eigenvalues of correlation matrix

              Eigenvalue   Percent Variance Explained
Factor I         5.46              54.61
Factor II        3.19              86.54
Factor III       1.35             100.00

FIGURE 2.7
There are only three nonzero eigenvalues (that there are three and
only three is a reflection of the "cooked-up" nature of the problem).
Geometrically, the analysis began with the 20 data points scattered in
10-dimensional space. Because there are only three nonzero eigen-
values, the implication is that the hyperellipsoid enclosing the data
points has seven axes of zero length and exists, in fact, as an ordin-
ary three-dimensional ellipsoid. Therefore the data points can be lo-
cated with reference to three mutually perpendicular axes instead of
the original 10.
The A matrix, in Figure 2.8, will be found to reproduce R exactly
from R = AA'. The F matrix in Figure 2.9 will, in conjunction with A,
reproduce the standardized version of the data matrix according to
Z = FA'.
The geologic interpretation of these matrices will be deferred
until some additional concepts are put forward.
Principal Component Factor Matrix

                        Factors
         Comm.       1        2        3
   1    1.0000   0.8029  -0.5894   0.0886
   2    1.0000   0.8385  -0.5367   0.0940
   3    1.0000   0.8579  -0.5122   0.0407
   4    1.0000   0.9760  -0.1961   0.0943
   5    1.0000   0.0176   0.9998  -0.0098
   6    1.0000   0.6538   0.5999  -0.4611
   7    1.0000   0.9297   0.2393  -0.2799
   8    1.0000   0.7647   0.5018  -0.4042
   9    1.0000   0.3268   0.5407   0.7751
  10    1.0000   0.6641   0.5437   0.5132

Variance        54.614   31.928   13.459
Cum. var        54.614   86.542  100.000

Principal Factors of Correlation Matrix
FIGURE 2.8
Principal Factor Score Matrix

Locality            Factors
              1         2         3
   1      0.3887   -2.2383    0.1359
   2      0.1989   -0.9798   -1.1363
   3      0.9083    1.2182   -1.0941
   4      0.2926    1.3964   -0.9847
   5     -0.9595    1.0959   -1.4845
   6     -1.6185    0.6760    0.9116
   7     -0.7464    0.9150    0.1871
   8      0.2992    1.2964    0.0020
   9      0.4987    0.0377    0.1130
  10      0.9111   -0.4040    2.1310
  11      1.2932   -0.1946    1.5206
  12      0.8987    0.6101    1.1914
  13      0.1935    0.5946    0.4475
  14     -0.8159    0.1323    0.3074
  15     -1.8532    0.1726    1.3474
  16     -1.7613   -1.5741   -0.2310
  17     -0.8554   -1.0525   -0.9248
  18      0.2300   -0.6998   -1.1155
  19      1.3189    0.1578   -0.7823
  20      1.1785   -1.1592   -0.5417

FIGURE 2.9
2.6 FACTORS
The basic factor equation, Z = FA', states that the data matrix
can be considered as the product of two matrices F and A. The mathe-
matical procedure used to obtain these matrices has been outlined, but
it is now necessary to explain what they signify and how they are in-
terpreted and used.
The matrix of factor scores, F, in general, consists of N rows
and m columns where N is the number of entities and m equals the num-
ber of common factors. Each column is in standard form with zero mean
and unit variance, and there is zero correlation between columns.
Because the factors are linear combinations of the original vari-
ables, they can themselves be considered as new variables with the
abovementioned properties. Scanning down a column of F, the "amount"
of this new variable as contained in each entity is revealed. Being
in standard form, factor scores are expressed in units of standard
deviation from the mean of the hypothetical variable. Thus, the vari-
ation from entity to entity is expressed in relative terms only. None-
theless, these new variables can be plotted and manipulated in the
same way as any other variable. Using the factors as orthogonal axes,
the entities may be plotted on a scattergram to show the distribution
of entities in m-dimensional space.
The matrix of factor loadings, A, generally has n rows and m col-
umns. The rows correspond to the original variables; the columns are
the factors. Each column has been scaled so that the sum of squared
elements in the column is equivalent to the amount of original vari-
ance accounted for by that factor. The elements in a column may be
considered as the coefficients of a linear equation relating the vari-
ables to the factor -- in essence, they give the recipe for the factor.
Therefore, the columns of the A matrix can be used to give some phy-
sical meaning to the factors.
A row of the A matrix shows how the variance of a variable is
distributed among the factors. Interrelationships between variables
can be determined by a comparison of their rows in the A matrix.
As was pointed out in equation (2.4.15), the sum of the factor
loadings squared in a row of A is an expression of the amount of vari-
ance of a variable accounted for by the m factors. This was termed
the communality. The communality attached to each row of the A matrix
gives an appreciation of how well each variable is explained by the m
factors considered.
Another valid view of an element in A is that it represents the
correlation between a variable and a factor. Because correlations
are angular measures, the row elements actually represent the cosines
between a variable and the m reference factor axes. A useful way to
analyze and interpret the factors is to plot the loadings as two-
dimensional scattergrams. For m factors there will be m(m - 1)/2 such
graphs, which in effect give two-dimensional snapshots of m-space.
Groupings of variables and trends between them often yield important
clues as to the physical significance of the factors.
A practical example of interpretation is deferred until the matter
of rotation has been discussed.
It has been stressed several times that there is an infinite num-
ber of solutions for the equation Z = FA'. This is so because there
is an infinite number of pairs of matrices F and A that will reproduce
Z.
The method of principal components determines a unique solution
because certain constraints are imposed, namely, that the F matrix is
orthonormal and that the A matrix contains the eigenvectors of the cor-
relation matrix produced from Z. These constraints yield a solution
with two desirable properties; the factor axes are orthogonal and pass
through positions of maximum variance. But, they also possess an un-
desirable property. Considered as new variables, they are very gen-
eral and, in fact, correspond to a sort of average of all the original
variables. Although this may in some instances be useful, it is com-
mon to move the positions of the factor axes by rotating them so that
they will satisfy certain other criteria. An attempt is made to
achieve what is termed "simple structure," by which is meant that the
factor axes are located in positions such that:
1. For each factor only a relatively few variables will have high
loadings, and the remainder will have small loadings.
2. Each variable will have loadings on only a few of the factors.
3. For any given pair of factors, a number of variables will have
small loadings on both factors.
4. For any given pair of factors, some of the variables will have
high loadings on the second factor but not on the first.
5. For any given pair of factors, very few of the variables will have
high loadings on both.
What these conditions attempt to achieve is to place the factor
axes in more meaningful positions, that is, so that they will be
highly correlated with some of the original variables.
A large number of methods have been designed to accomplish these
objectives but only two will be considered here.
An approximation to simple structure, designed by Kaiser (1958),
uses a rigid rotation procedure. This means that the orthogonal prin-
cipal component factors will be rigidly rotated and maintained ortho-
gonal.
Kaiser's approach is to find a new set of positions for the prin-
cipal factors such that the variance of the factor loadings on each
factor is a maximum; the loadings should tend toward unity and zero
(the sum of the variance for all m factors is the actual quantity max-
imized). That is, when the value of V in the following expression is
maximized, simple structure should be obtained:

    V = (1/n²) Σ_{p=1}^{m} [ n Σ_{j=1}^{n} b_jp⁴ − ( Σ_{j=1}^{n} b_jp² )² ]     (2.6.1)

where b_jp is the loading of variable j on factor p on the new, rotated
factor axes.
The full explanation of this equation is rather too involved for
these notes, and the reader is referred to Harman (1960, p. 301) for
a full discussion. The process can be readily understood in terms of
matrix algebra. Given the n by m matrix of principal factor loadings
A, the objective is to transform it to an n by m matrix of varimax
factor loadings B such that B will satisfy equation (2.6.1).
Usually, factors are transformed (or rotated) two at a time. In
matrix terms, this can be accomplished by
B = AT (2.6.2)
where T consists of

    T = [ cos φ   −sin φ ]
        [ sin φ    cos φ ]

φ is the angle of rotation required to yield a maximum value of V in
equation (2.6.1) and is determined by an iterative process.
The matrix B contains the loadings of the original n variables on
the m rotated factors and can be interpreted in the same way as the A
matrix.
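As a concrete illustration, the pairwise rotation of equation (2.6.2) can be sketched in Python. The function names and the brute-force angle search below are mine (Kaiser's procedure finds the angle analytically); this is a sketch of the idea, not a reproduction of his algorithm:

```python
import numpy as np

def varimax_criterion(B):
    # V of equation (2.6.1): sum over factors of [n*sum(b^4) - (sum(b^2))^2] / n^2
    n = B.shape[0]
    b2 = B ** 2
    return (n * (b2 ** 2).sum(axis=0) - b2.sum(axis=0) ** 2).sum() / n ** 2

def rotate_pair(A, p, q, phi):
    # Rigid rotation of factors p and q through angle phi: B = A T (2.6.2)
    T = np.eye(A.shape[1])
    c, s = np.cos(phi), np.sin(phi)
    T[p, p], T[p, q], T[q, p], T[q, q] = c, -s, s, c
    return A @ T

def varimax(A, sweeps=20, steps=181):
    # For each pair of factors, pick the rotation angle that maximizes V.
    B = A.copy()
    m = B.shape[1]
    angles = np.linspace(0.0, np.pi / 2, steps)  # includes 0, so V never decreases
    for _ in range(sweeps):
        for p in range(m - 1):
            for q in range(p + 1, m):
                scores = [varimax_criterion(rotate_pair(B, p, q, a)) for a in angles]
                B = rotate_pair(B, p, q, angles[int(np.argmax(scores))])
    return B
```

Because the rotation is rigid, the factors stay orthogonal while the loadings are pushed toward unity and zero.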
Factor scores for the varimax factors can also be computed. Again
the values can be interpreted in the same way as the principal factor
scores. The varimax factor scores remain in standard form but the
factors are now slightly correlated.
Figure 2.10 shows the rotated, varimax factor matrix and Figure
2.11 the associated factor scores.
Figure 2.12 illustrates and compares the plots of principal com-
ponents and varimax loading derived from the data matrix of Figure 2.5.
Varimax Factor Matrix

                    Factors
Var    Comm.       1        2        3
 1    1.0000    0.9971   0.0765  -0.0060
 2    1.0000    0.9916   0.1241   0.0362
 3    1.0000    0.9835   0.1804   0.0117
 4    1.0000    0.8813   0.3985   0.2540
 5    1.0000   -0.6197   0.5880   0.5198
 6    1.0000    0.0558   0.9897   0.1317
 7    1.0000    0.5191   0.8380   0.1680
 8    1.0000    0.2102   0.9648   0.1580
 9    1.0000    0.0146   0.0488   0.9987
10    1.0000    0.2338   0.3979   0.8872
Variance       44.771   33.318   21.912
Cum. var       44.771   78.089  100.000

FIGURE 2.10
Note that in both cases, graphs of the factor loadings reveal
similar patterns but with the varimax loadings the factor axes are lo-
cated near the extreme clusters of variables. Interpretation is thus
facilitated.
As can be seen from the graph and scanning down the columns of
the varimax factor loading matrix B, loadings on variables 1, 2, and
3 are extremely high on Factor 1 and very low on the other two factors.
This would lead to the interpretation that the "new" variable, Factor
1, is in some way a paleotemperature index because variables 1, 2, and
3 are all geologic paleothermometers. The first column of the varimax
factor score matrix thus shows the distribution of paleotemperature
at each of the 20 localities in terms of units of standard deviation
away from the mean paleotemperature. Similarly, Factor 2, of the B
matrix, may be interpreted in terms of deformation, while Factor 3
represents some index of permeability.
Varimax Factor Score Matrix
Locality Factors
1 1.7191 -1.1345 -0.9480
2 0.6130 0.2141 -1.3682
3 -0.2291 1.8610 0.0226
4 -0.7962 1.5403 0.0196
5 -1.6301 0.9332 -0.8890
6 -1.5345 -1.0766 0.6182
7 -1.1157 -0.0212 0.4175
8 -0.5908 0.9143 0.7663
9 0.3711 0.2439 0.2588
10 1.2434 -0.9398 1.7595
11 1.3197 -0.2480 1.4876
12 0.4639 0.1764 1.5324
13 -0.1684 0.1884 0.7211
14 -0.6634 -0.5747 0.0782
15 -1.3364 -1.7591 0.6394
16 -0.3829 -1.7847 -1.5171
17 -0.1144 -0.5587 -1.5440
18 0.4618 0.3838 -1.1944
19 0.7982 1.3086 -0.1605
20 1.5620 0.3336 -0.6997
FIGURE 2.11
This rather simple example illustrates the main features of R-
mode factor analysis. Several complications arise when real data are
analyzed, and these will be touched on following a discussion of Q-
mode techniques.
FIGURE 2.12

2.7 THE Q-MODE MODEL

When the nature of a geologic problem is such that relationships
between entities are the focus of attention rather than relationships
between properties, then the Q-mode method of factor analysis becomes
a useful analytical tool.
Numerous such geologic situations are easily envisaged. The de-
lineation of lithofacies or biofacies is perhaps the most evident.
Here the objective is to find groups of entities that are similar to
one another in terms of their total composition.
The catch in this objective is to define "similarity" in a mathe-
matically realistic way. Several measures of similarity will be dis-
cussed next. Once all the interentity similarities are computed and
arranged in matrix form, the previously described methods of solution
are applicable to the analysis of this similarity matrix.
Similarity Indices
There is a vast and ever-expanding literature on the problems
associated with a mathematical definition of similarity. There is
little in the way of theoretical justification for the selection of
one index over another; however, Gower (1967) has at least underscored
several considerations that must be taken into account.
Aside from similarity coefficients designed for presence-absence
data, three indices have commonly been used in Q-mode analysis.
1. Correlation Coefficient. The Pearson product moment correlation
coefficient has been used to indicate the degree of relationship
between two entities. If X_k and X_ℓ are any two rows of the data
matrix, then

    r_kℓ = Σ_{i=1}^{n} (x_ki − x̄_k)(x_ℓi − x̄_ℓ) /
           [ Σ_{i=1}^{n} (x_ki − x̄_k)² · Σ_{i=1}^{n} (x_ℓi − x̄_ℓ)² ]^{1/2}     (2.7.1)

measures the degree of relationship between the two entities.
As appealing as this index may appear there are a number of
intuitive and theoretical drawbacks. Note, for example, the x̄_k
and x̄_ℓ terms. These are average values for items k and ℓ. If an
item is described in terms of a wide variety of properties
measured in different scales, what then does such an average mean
in physical terms? By subtracting this value from each of the
attribute quantities, the proportions of the attributes are al-
tered from entity to entity. To partially overcome this, some
workers advocate standardizing the data matrix by columns before
using this equation. Clearly, this only adds a further complica-
tion to the data and complicates interpretation.
For this and several other reasons, the correlation coeffi-
cient is not considered a good index of similarity.
2. Coefficient of Proportional Similarity. Imbrie and Purdy (1962)
define an index of similarity referred to as cos θ. The equation
used is

    cos θ_kℓ = Σ_{i=1}^{n} x_ki x_ℓi /
               [ Σ_{i=1}^{n} x_ki² · Σ_{i=1}^{n} x_ℓi² ]^{1/2}     (2.7.2)
For positive data this index ranges from zero, for perfect
dissimilarity, to one, for complete similarity. The difficulty with
this index is that while it preserves the proportional relation-
ships between entities it is blind to the absolute sizes involved.
Thus a "midget" and a "giant" whose attribute proportions are
identical would be considered as being completely similar. In
many problems where the investigator is interested in changes in
the proportions of constituents, such as sedimentologic and faunal
studies, the index is very appropriate. Imbrie (1963, p. 26) and
McIntyre (1969, p. DBM-A-41) suggest methods for including a
"size" variable that helps to remove the inherent "size" blind-
ness of the index.
3. Distance Coefficient. Harbaugh (1964) describes the use of a co-
efficient that measures the distance between entities in n-
dimensional space. The complement of this distance is then taken
as a measure of similarity. In order to standardize this index,
all attributes must be scaled so that the maximum value of each
is 1 and the minimum value is zero. This, of course, distorts
the proportionality. The equation for computation, assuming
scaled attributes, is:

    s_kℓ = 1 − [ Σ_{i=1}^{n} (x_ki − x_ℓi)² / n ]^{1/2}     (2.7.3)
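The three indices can be written out in a few lines of Python for comparison. The function names are mine, and the exact normalization of the distance coefficient is an assumption that should be checked against Harbaugh (1964):

```python
import numpy as np

def pearson_r(xk, xl):
    # (2.7.1): product-moment correlation between two entity rows
    return np.corrcoef(xk, xl)[0, 1]

def cos_theta(xk, xl):
    # (2.7.2): proportional similarity -- blind to absolute size
    return xk @ xl / np.sqrt((xk @ xk) * (xl @ xl))

def dist_coef(xk, xl):
    # (2.7.3), assumed form: complement of root-mean-square distance,
    # attributes pre-scaled to the interval [0, 1]
    return 1.0 - np.sqrt(((xk - xl) ** 2).sum() / len(xk))
```

Note that cos θ of a "midget" and a "giant" with identical proportions is exactly one, which is the size-blindness discussed above.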
However obtained, the similarities between all possible pairs of
entities are calculated and arranged in a square, symmetric, similarity
matrix S_{N,N}. This matrix contains all the information concerning the
interrelations between the N entities under study. Q-mode factor
analysis begins at this point.
It will be recalled that in R-mode analysis, the objective was to
create m linear combinations of the n variables. The m linear combin-
ations could be considered as new variables. Similarly, in Q-mode
analysis, the objective is to find new, hypothetical entities whose
compositions are linear combinations of those of the original entities.
As pointed out by Imbrie (1963), these "new" entities, or Q-mode
factors, can be conceived of as being composite end members, combina-
tions of which can be used to reconstruct the original entities. The
problem then is to "unmix" the original entities into the smallest
possible number of end members. In this respect, Q-mode analysis is
a "mirror image" of R-mode analysis.
Computational Procedure
Using the cos θ measure of similarity, the following equations
reveal the necessary steps in the analysis.
Let X_{N,n} be the data matrix. Form the diagonal matrix D whose
principal diagonal contains the square roots of the row vector lengths
of X. That is,

    d_kk = [ Σ_{i=1}^{n} x_ki² ]^{1/2}     (2.7.4)
The matrix operation
    W = D⁻¹X     (2.7.5)
normalizes each row of X, that is, each row vector in W is of unit
length.
Then the similarity matrix is computed from

    S = WW'                                   (2.7.6)

The basic factor equation is

    W = AF'                                   (2.7.7)

and

    W' = FA'                                  (2.7.8)

thus,

    S = AF'FA'                                (2.7.9)

The constraint

    F'F = I                                   (2.7.10)

results in

    S = AA'                                   (2.7.11)

As in the R-mode, we stipulate that

    A'A = Λ                                   (2.7.12)

and following the same reasoning as before

    A = UΛ^{1/2}                              (2.7.13)

and

    F = W'AΛ⁻¹                                (2.7.14)
It must be emphasized that the A and F matrices so derived are
not the same as those derived in the R-mode analysis. One set is de-
rived from S, the other from R.
Once obtained, A may be rotated to B as discussed in the section
on varimax rotation.
The B matrix in general consists of N rows and m columns. Each
row corresponds to an entity; each column represents a factor.
The factors are best thought of as hypothetical entities that are
completely dissimilar in terms of the proportions of their constitu-
ents.
Scanning down any column of B shows the amount of the hypotheti-
cal entity contained in each real entity. Scanning across any row
shows the composition of a real entity in terms of the hypothetical
entities.
The F matrix has n rows and m columns. The rows represent the
original attributes used to describe the entities. The numbers in a
row thus describe the relative amount of the attribute in each factor.
A column gives the "composition" of the hypothetical entity in terms
of the original attributes. Unfortunately the scale of these numbers
is obscure. Thus, they can be used in relative terms only.
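The computational procedure above can be sketched end to end in Python. This is a minimal illustration under the cos θ model, not Imbrie's program; in particular the factor-score step, F = W'AΛ⁻¹, is inferred from the constraint W = AF' and should be checked against the original sources:

```python
import numpy as np

def qmode(X, m):
    """Q-mode factor sketch: cos-theta similarity, principal-axes solution."""
    d = np.sqrt((X ** 2).sum(axis=1))     # row vector lengths (2.7.4)
    W = X / d[:, None]                    # W = D^{-1} X, unit-length rows (2.7.5)
    S = W @ W.T                           # cos-theta similarity matrix (2.7.6)
    vals, vecs = np.linalg.eigh(S)        # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:m]      # keep the m largest
    U, lam = vecs[:, top], vals[top]
    A = U * np.sqrt(lam)                  # loadings: A = U Lambda^{1/2} (2.7.13)
    F = W.T @ A / lam                     # scores: F = W' A Lambda^{-1} (assumed form)
    return A, F, lam
```

When m equals the number of nonzero eigenvalues, A and F reproduce the row-normalized data exactly (W = AF') and the similarity matrix as S = AA'.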
2.8 AN EXAMPLE
Imbrie (1963) presents an example of Q-mode analysis that gives
the basic ideas behind the method. The matrix of fictitious data is
given in Figure 2.13. The rows represent 10 sediment samples and the
columns represent 10 species of minerals. The cos θ similarity ma-
trix is given in Figure 2.14. Eigenvalues derived from this matrix
indicate that there are three, and only three, independent dimensions
to the data.
Data Matrix for Q-mode Example
Var A Var B Var C Var D Var E Var F Var G Var H Var I Var J
Loc 1 5.0 25.0 15.0 5.0 5.0 20.0 10.0 5.0 5.0 5.0
Loc 2 10.0 30.0 17.0 17.0 8.0 8.0 5.0 4.0 1.0 0.0
Loc 3 3.0 6.0 10.0 13.0 25.0 15.0 13.0 8.0 5.0 2.0
Loc 4 7.5 27.5 16.0 11.0 6.5 14.0 7.5 4.5 3.0 2.5
Loc 5 4.6 21.2 14.0 6.6 9.0 19.0 10.6 5.6 5.0 4.4
Loc 6 3.8 13.6 12.0 9.8 17.0 17.0 11.8 6.8 5.0 3.2
Loc 7 8.3 26.6 15.9 14.2 9.1 11.1 6.8 4.6 2.2 1.2
Loc 8 6.1 22.7 14.6 10.2 9.9 15.4 9.1 5.3 3.8 2.9
Loc 9 7.6 24.2 15.2 13.8 10.8 11.8 7.6 5.0 2.6 1.4
Loc 10 3.9 10.3 11.2 12.6 21.3 14.8 11.9 7.3 4.6 2.1
FIGURE 2.13
Cos θ matrix

         Loc 1   Loc 2   Loc 3   Loc 4   Loc 5   Loc 6   Loc 7   Loc 8   Loc 9   Loc 10
Loc 1   1.0000
Loc 2   0.8739  1.0000
Loc 3   0.6906  0.6480  1.0000
Loc 4   0.9654  0.9704  0.6906  1.0000
Loc 5   0.9888  0.8733  0.7908  0.9595  1.0000
Loc 6   0.8849  0.8020  0.9479  0.8697  0.9445  1.0000
Loc 7   0.9244  0.9901  0.7227  0.9902  0.9315  0.8723  1.0000
Loc 8   0.9714  0.9391  0.7941  0.9862  0.9860  0.9390  0.9779  1.0000
Loc 9   0.9282  0.9779  0.7758  0.9855  0.9457  0.9082  0.9967  0.9867  1.0000
Loc 10  0.7878  0.7531  0.9877  0.7952  0.8705  0.9828  0.8200  0.8790  0.8636  1.0000

FIGURE 2.14
The varimax factor loading matrix and its associated factor score
matrix are given in Figures 2.15 and 2.16.
Figure 2.17 is one of three possible plots of the varimax factor
loadings. It is now evident that the original 10 sediment samples are
various mixtures of the three hypothetical samples. Three maps could
be drawn showing the spatial distribution of the end members and from
this, the mechanism of transport might be deduced.
The composition of the end-members can be roughly determined
from a study of the columns of the F matrix of Figure 2.16.
Varimax Factor Matrix

                     Factors
        Commun.     1        2        3
Loc 2   1.0000   0.9133   0.3492   0.2095
Loc 7   1.0000   0.8494   0.4311   0.3043
Loc 4   1.0000   0.8179   0.3810   0.4311
Loc 9   1.0000   0.8080   0.5006   0.3107
Loc 8   1.0000   0.7266   0.5183   0.4510
Loc 1   1.0000   0.6605   0.3899   0.6416
Loc 5   1.0000   0.6228   0.5222   0.5826
Loc 3   1.0000   0.3094   0.9314   0.1918
Loc 10  1.0000   0.4363   0.8632   0.2541
Loc 6   1.0000   0.4900   0.7714   0.4060
Variance        47.56    36.05    16.40
Cum percent     47.56    83.60   100.00

FIGURE 2.15
Varimax Factor Score Matrix

              Factors
           1        2        3
Var A    0.881    0.038   -0.294
Var B    2.446   -0.468    0.948
Var C    1.135    0.422    0.483
Var D    1.333    1.003   -1.346
Var E   -0.097    2.433   -0.742
Var F   -0.206    0.971    2.167
Var G   -0.177    1.061    0.811
Var H    0.030    0.668    0.199
Var I   -0.208    0.393    0.611
Var J   -0.218    0.086    0.809

FIGURE 2.16
[Plot of the varimax factor loadings: the 10 localities plotted on
two of the three factor axes]

FIGURE 2.17
2.9 OBLIQUE ROTATION
The factors obtained in R- and Q-mode analysis are constrained to
be orthogonal. There may be no physical reason for them to be mutu-
ally orthogonal and thus many schemes have been devised to find sets
of factors that are oblique to one another.
Of these, the method due to Imbrie (1963) is the simplest.
Referring to Figure 2.17, it is apparent that the most divergent real
sediment samples 1, 2, 3 could be used as end members from which all
the remaining samples could be derived. Imbrie's method is to rotate
the factor axes so that they coincide with the most divergent samples
and then express all the other samples as proportions of these end
member samples.
This is accomplished by constructing an m by m matrix T, which
contains the varimax loadings of the most divergent samples. Then
    C = BT⁻¹     (2.9.1)

yields the oblique factor matrix C. Figure 2.18 illustrates the re-
sults of the operation applied to the problem just discussed. The
method is also applicable to R-mode factor loadings matrices. The
oblique factors are no longer uncorrelated and computation of factor
scores becomes much more involved.
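Imbrie's projection is a one-line matrix operation. A sketch in Python (names mine; it assumes the rows of B chosen as end members form a nonsingular T):

```python
import numpy as np

def oblique_rotation(B, end_members):
    # T holds the varimax loadings of the most divergent samples (2.9.1)
    T = B[list(end_members), :]
    return B @ np.linalg.inv(T)   # C = B T^{-1}; end members project onto identity rows
```

By construction the chosen end-member samples come out as rows of the identity matrix, as Figure 2.18 shows for the example above.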
In both of the examples presented thus far the correct choice of
the number of factors needed to reproduce the data matrix has been un-
equivocal. Coincidentally, in both instances there were three, and
only three, nonzero eigenvalues. But both of these examples involved
fictitious data that were "manufactured" for the purpose of exposition.
When analyzing real data the investigator must select the correct num-
ber of factors, and this is seldom unequivocal. Although some sta-
tistical methods are available to aid in this selection, experience has
shown that certain empirical criteria are more useful.
The essentials of the problem are:
1. In order to reproduce exactly the data matrix from the two de-
rived matrices (Z = FA') it is necessary to use as many factors
as original variables. This is because the common variance and
unique variance together equal the total variance.
2. Because data are expressed in standard form in R-mode, and en-
tities are row normalized in Q-mode, the total variance (informa-
tion content) is equal to nand N, respectively. Each eigenvalue
extracted accounts for a certain amount of variance; thus the
percent variance explained can be calculated by dividing the ei-
genvalue by n (or N). By cumulating these percentages, it is
Oblique Factor Matrix

                   Factors
Locality       1       2       3
    1        1.000   0.000   0.000
    2        0.000   1.000   0.000
    3        0.000   0.000   1.000
    4        0.497   0.537   0.000
    5        0.847   0.000   0.207
    6        0.441   0.000   0.644
    7        0.200   0.753   0.097
    8        0.530   0.343   0.207
    9        0.207   0.668   0.201
   10        0.108   0.116   0.839

FIGURE 2.18
possible to arbitrarily stop extracting factors once the cumula-
tive variance explained reaches some specified level, for example,
95%. The remaining factors needed to account for the remaining
variance are assumed to represent unique factors.
3. The equation R = AA' (or S = AA') implies that the correlation
matrix can be reproduced by forming the major product moment of
the factor loadings matrix. It is, of course, possible to pro-
duce an estimate of R with an n by 1 matrix A.
The difference, or residual matrix, is obtained from:

    R_r1 = R − A_{n,1} A'_{1,n}     (2.9.2)

If significant correlations remain in R_r1 then the second factor
may be added to A and the process repeated.

    R_r2 = R − A_{n,2} A'_{2,n}     (2.9.3)
This procedure may be continued until there are no signifi-
cant residual correlations. The number of columns in A is then
taken as the correct number of factors.
4. The sum of squares of the factor loadings in a row of the A
matrix is termed the communality and represents that proportion
of the variance of a variable accounted for by the number of fac-
tors used. The "correct" number of factors can be judged by the
values of communality for all the variables. If many communali-
ties are low, say less than 0.8, more factors are probably re-
quired.
5. Because variance is extracted in descending order, factor loadings
on the first few factors will be higher than those on the later
ones. When all the loadings on a given factor appear to resemble
nothing more than noise (error components), then this and succeed-
ing factors may be removed from further consideration. This is
often best judged on a rotated factor matrix.
6. The final criterion is entirely subjective. If factors are in-
terpretable and "make sense" they are probably relevant. Uninter-
pretable factors or those whose spatial distribution forms no
sensible pattern may merely represent error components.
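Criterion 3, the residual-matrix test, can be automated directly. In the sketch below (names mine), a fixed threshold on the largest absolute off-diagonal residual stands in for a formal test of significance:

```python
import numpy as np

def n_factors_by_residuals(R, tol=0.1):
    # Peel off principal factors one at a time until every off-diagonal
    # residual correlation is below tol; return the number retained.
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    resid = R.astype(float).copy()
    for k in range(len(vals)):
        off = resid - np.diag(np.diag(resid))
        if np.abs(off).max() < tol:
            return k
        a = vecs[:, [k]] * np.sqrt(max(vals[k], 0.0))  # k-th column of A
        resid = resid - a @ a.T                        # residual after k+1 factors
    return len(vals)
```

For a correlation matrix generated exactly by two common factors, the scan stops at two; an identity matrix (no common variance at all) stops at zero.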
2.10 PRACTICAL EXAMPLES
Rather than add to an already lengthy account, an annotated bib-
liography of a few pertinent papers is given below. Study of these
papers should enable the reader to develop a better understanding of
how the results of factor analysis are put to use in a variety of geo-
logic problems.
Cameron, E. M., 1968, A geochemical profile of the Swan Hills Reef.
Can. Jour. Earth Sci., v. 5, p. 287-309.
A detailed study of chemical variations in a reef complex. Factor
analysis, coupled with trend surface analysis, provides a method for
determining the diagenetic history of a slightly dolomitized, lime-
stone reef.
Degens, E. T., Spencer, D. W., and Parker, R. H., 1967, Paleobiochem-
istry of Molluscan Shell Proteins: Comp. Biochem. Physiol., v. 20,
p. 553-579.
The interrelationships between amino acids in various molluscs are stud-
ied by means of R-mode factor analysis. Environmental and genetic
effects on amino acid compositions are revealed.
Harbaugh, J. W., and Demirmen, F., 1964, Application of factor analysis
to petrographic variations of Americus limestone (Lower Permian),
Kansas and Oklahoma: Kan. Geol. Survey Dist. Pub. 15.
A paleoecologic analysis of a thin limestone unit based on petro-
graphic and chemical attributes. Uses both correlation coefficients
and distance coefficients as similarity indices in a Q-mode analysis.
Hitchon, B., Billings, G. K., and Klovan, J. E., 1971, Geochemistry
and origin of formation waters in the western Canada sedimentary
basin - III. Factors controlling chemical composition: Geochim.
et Cosmochim. Acta, v. 35, p. 567-598.
R- and Q-mode analyses are used to document flow paths and the chemi-
cal reactions responsible for variations in the chemistry of subsurface
formation waters. Oblique rotation is used to achieve simple struc-
ture. R-mode factor scores are used as input variables to second-
order factor analyses.
Imbrie, J., and van Andel, T. H., 1964, Vector analysis of heavy-
mineral data: Geol. Soc. Amer. Bull., v. 75, p. 1131-1156.
The classic paper on the use of Q-mode factor analysis in the study of
sediments. The Q-mode model is developed and applied to two recent
sedimentary basins in a way that clearly shows the utility and power
of the method.
Imbrie, J., and Kipp, N. G., 1971, A new micropaleontological method
for quantitative paleoclimatology: Application to a late Pleis-
tocene Caribbean core, in The Late Cenozoic glacial ages
(Turekian, K., ed.): New Haven, Conn., Yale Univ. Press.
An involved and elegant method of analysis applied to Recent and
Pleistocene foraminifera shows how the results of Q-mode factor ana-
lysis can be used in predictive, nonlinear regression models. Pleis-
tocene oceanic temperatures and salinities can be accurately predicted
on the basis of foram assemblages.
Klovan, J. E., 1966, The use of factor analysis in determining deposi-
tional environments from grain-size distributions: Jour. Sed.
Petrology, v. 36, no. 1, p. 115-125.
Q-mode analysis is used to classify recent sediment samples on the
basis of their grain-size distributions. The factors extracted are
claimed to reflect different types of depositional energy.
Lonka, A., 1967, Trace-elements in the Finnish Precambrian phyllites
as indicators of salinity at the time of sedimentation: Bull.
Comm. Geol. Finlande No. 209.
Trace element variation as revealed by factor analysis leads to the
surprising conclusion that depositional salinities can be determined
even in highly metamorphosed shales.
Matalas, N. C., and Reiher, B. J., 1967, Some comments on the use of
factor analysis: Water Resources Research, v. 3, no. 1, p. 213-
223.
Some critical comments on the use and abuse of factor analysis as ap-
plied to hydrologic problems. Many mathematical and substantive
arguments are presented that must be taken into account when interpre-
ting results. Although many of the comments are germane, others are
open to serious question.
McCammon, R. B., 1966, Principal component analysis and its application
in large-scale correlation studies: Jour. Geol., v. 74, no. 5,
pt. 2, p. 721-733.
Explains the use of R-mode analysis as applied to crude oil variations
and biostratigraphic problems. A "minimum entropy" criterion is used
to achieve simple structure in rotation.
McCammon, R. B., 1969, Multivariate methods in geology, in Models of
geological processes (Fenner, P., ed.): Washington, D.C., Am.
Geol. Inst.
One of the best treatments of the mathematics and concepts of factor
analysis and many other related topics. Algebraic and geometrical ex-
planations are presented at an elementary level and the use of several
examples makes understanding especially easy.
McElroy, M. N., and Kaesler, R. L., 1965, Application of factor analysis
to the Upper Cambrian Reagan Sandstone of central and northwest
Kansas: The Compass, v. 42, no. 3, p. 188-201.
An application of factor analysis to a typical stratigraphic problem.
Factors are interpreted in terms of regional influences that affect
thickness, grain-size, and mineralogy of a sandstone unit.
Spencer, D., 1966, Factors affecting element distributions in a Silur-
ian graptolite band: Chem. Geol., v. 1, p. 221-249.
R-mode analysis is used to determine the underlying causal influences
affecting the chemical variability of a thin shale unit. A very en-
lightening discussion of how the factor matrices can be interpreted is
especially useful.
REFERENCES
Gower, J. C., 1967, Multivariate analysis and multidimensional geometry:
The Statistician, v. 17, no. 1, p. 13-28.
Harbaugh, J. W., 1964, BALGOL programs for calculation of distance co-
efficients and correlation coefficients using an IBM 7090 com-
puter: Kansas Geol. Survey Sp. Dist. Pub. 9.
Harman, H. H., 1960, Modern factor analysis: Chicago, Illinois, Univ.
of Chicago Press, 471 p.
Imbrie, J. and Purdy, E. G., 1962, Classification of modern Bahamian
carbonate sediments, in Classification of carbonate rocks - a
symposium, Mem. 1, Amer. Assoc. Petroleum Geol., p. 253-272.
Imbrie, J., 1963, Factor and vector analysis programs for analyzing
geologic data: U.S. Office of Naval Research, Tech. Rept. 6,
83 p.
Kaiser, H. F., 1958, The varimax criterion for analytic rotation in
factor analysis: Psychometrika, v. 23, p. 187-200.
Klovan, J. E. and Imbrie, J., 1971, An algorithm and FORTRAN IV pro-
gram for large-scale Q-mode factor analysis and calculation of
factor scores: Jour. Inter. Assn. Math. Geol., v. 3, p. 61-67.
McIntyre, D. B., 1969, Introduction to the study of data matrices, in
Models of geological processes (Fenner, P., ed.): Washington,
D.C., Amer. Geol. Inst.
Pearson, K., 1901, On lines and planes of closest fit to systems of
points in space: Phil. Mag., v. 6, p. 559-572.
Spearman, C., 1904, General intelligence, objectively determined and
measured: Amer. Jour. Psychol., v. 15, p. 201-293.
Thurstone, L. L., 1947, Multiple factor analysis: Chicago, Illinois,
Univ. of Chicago Press, 535 p.
APPENDIX 1. A PRIMER ON MATRIX ALGEBRA
1. A matrix is a rectangular chart of numbers. A matrix is symbol-
ized by a capital letter and its size is shown by two subscripts,
the first referring to the number of rows, the second to the number
of columns. Thus A_{r,c} represents the matrix A with r rows and c
columns. Any number in the matrix is termed an element. Thus,
a_ij is the element of A in the i-th row and j-th column. Two
matrices are said to be equal if all elements correspond exactly.
That is, A = B if a_ij = b_ij for all i and j.
2. The transpose of a matrix is another matrix in which the rows and
columns are interchanged. It is symbolized by an apostrophe.
Thus A' is the transpose of matrix A.
3. Special types of matrices include:
(a) Rectangular matrix. Has more rows than columns or vice versa.
(b) Square matrix. Has the same number of rows as columns.
(c) Square symmetric matrix. A square matrix such that a_ij = a_ji
    for all values of i and j. The lower left triangular part of
    the matrix below the diagonal is a mirror image of the upper
    right triangular portion.
(d) Diagonal matrix. Only the elements in the principal diagonal
    are nonzero and all other elements are zero. That is,
    a_ij ≠ 0 when i = j but a_ij = 0 when i ≠ j.
(e) Identity matrix. A diagonal matrix whose diagonal elements
    all equal 1.
(f) Column vector. A matrix with n rows but only one column.
(g) Row vector. A matrix with n columns but only one row.
(h) Scalar. A matrix with one row and one column.
4. Matrices can be added together (or subtracted) only if they are
size compatible, that is, each matrix must have the same number
of rows and columns.
Addition or subtraction is done on an element by element
basis. Thus the elements of C in C = A + B are equal to the sums
of corresponding elements of A and B; c_ij = a_ij + b_ij for all i
and j.
1. Matrix Definition

        [7  2  4]
    A = [1  5  9]
        [2  8  6]

2. Transpose

         [7  1  2]
    A' = [2  5  8]     A' is the transpose of A above.
         [4  9  6]

3. Types of Matrices

    rectangular: size 4 by 3 (A_{4,3}); for example, a_32 = 5.

    square symmetric:
    [1  5  3]
    [5  2  4]
    [3  4  7]

    identity (4 by 4):
    [1  0  0  0]
    [0  1  0  0]
    [0  0  1  0]
    [0  0  0  1]

    row vector: [3  5  8  7  1]

4. Matrix Addition

    C = A + B is formed element by element: each c_ij is the sum of
    the corresponding a_ij and b_ij.
5. Multiplication of matrices can only be performed if the number of
columns of the pre-factor is equal to the number of rows of the
post-factor. That is, C = A_{n,m} · B_{e,f} is only possible if m = e.
An element of C is defined as follows:

    c_ij = Σ_{k=1}^{m} a_ik b_kj

where m is the number of columns of A and the number of rows of B.
A row of C is produced by multiplying the corresponding row of A
times each column of B. This is repeated for every row of A until
the C matrix is complete.
The minor product moment is defined as C = A'A. C contains
the sums of squares and cross products of the columns of A.
6. The trace of a square matrix is the sum of its diagonal elements.

7. The matrix analog of scalar division is accomplished by inver-
sion. If A is a square matrix and A·B = B·A = I then B is said
to be the inverse of A. The notation A⁻¹ is commonly used to de-
note the inverse of A.
Finding the inverse of a matrix is a rather complicated pro-
cedure and the reader is referred to any good text on matrix
algebra for details.
5. Matrix Multiplication, C = A · B

        [5  1]            [2  1]
    A = [4  3]        B = [4  2]
        [1  1]

        [14   7]      c_11 = a_11·b_11 + a_12·b_21 = 5·2 + 1·4 = 14
    C = [20  10]      c_12 = a_11·b_12 + a_12·b_22 = 5·1 + 1·2 = 7
        [ 6   3]

In general, the following "box" notation for matrix multiplication will
be found useful.

                  [post-factor]
    [pre-factor]  [product]

For example, with

        [3  1]
    A = [1  3]        A' = [3  1  2]
        [2  0]             [1  3  0]

    A'A = [14   6]        AA' = [10   6   6]
          [ 6  10]              [ 6  10   2]
                                [ 6   2   4]

    minor product moment        major product moment

6. Note that the trace of both products is equal to 24.
7. Matrix inversion. Given the square matrix A, it is necessary to
find a matrix B such that AB = I, or

         [B]
    [A]  [I]

Expanding the above yields

    a_11·b_11 + a_12·b_21 + ... + a_1n·b_n1 = 1.0
    a_11·b_12 + a_12·b_22 + ... + a_1n·b_n2 = 0
    a_21·b_11 + a_22·b_21 + ... + a_2n·b_n1 = 0
    a_21·b_12 + a_22·b_22 + ... + a_2n·b_n2 = 1.0
    etc.

When the b's are determined so as to satisfy this set of simul-
taneous equations then B = A⁻¹, the inverse of A.
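In practice these simultaneous equations are rarely solved by hand; a library routine produces the inverse directly. A minimal check in Python (the matrix is arbitrary):

```python
import numpy as np

A = np.array([[5.0, 1.0],
              [4.0, 3.0]])
B = np.linalg.inv(A)              # B = A^{-1}, so that AB = BA = I
assert np.allclose(A @ B, np.eye(2))
assert np.allclose(B @ A, np.eye(2))
```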
8. A square matrix Q is said to be orthogonal if
Q'Q = D
where D is a diagonal matrix.
A square matrix Q is said to be orthonormal if
Q'Q = QQ' = I
9. The Rank of a Matrix. The rank of a matrix may be defined, in
terms of its column or row vectors, as the number of linearly in-
dependent row, or column, vectors present in the matrix.
Another view of rank is as follows. A matrix X_{n,m} can be ex-
pressed as the product of two matrices whose common order is r.
If X cannot be expressed as the product of any pair of matrices
with a common order less than r, then the rank of X is r. It can
be appreciated from this that a very large matrix may have a low
rank and thus be expressible as the product of two smaller ma-
trices.
This is the basis of practically all multivariate methods of
data analysis.
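This low-rank property is easy to verify with the singular value decomposition. A sketch in Python, using the rank-1 matrix from the point-9 example:

```python
import numpy as np

A = np.array([[ 6.0,  3.0, 12.0],
              [ 8.0,  4.0, 16.0],
              [12.0,  6.0, 24.0]])
r = np.linalg.matrix_rank(A)            # all columns are multiples of one vector
U, s, Vt = np.linalg.svd(A)
X, Y = U[:, :r] * s[:r], Vt[:r, :]      # two matrices of common order r
assert r == 1
assert np.allclose(X @ Y, A)            # A rebuilt from two far smaller matrices
```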
8.
        [2   3]               [5   0]
    Q = [1  -6]        Q'Q =  [0  45]        Q is orthogonal

        [0.5    -0.866]
    Q = [0.866   0.5  ]        Q'Q = QQ' = I        Q is orthonormal

9. Rank of a Matrix

        [ 6  3  12]       The rank of A is 1 for there is only
    A = [ 8  4  16]       one linearly independent vector; all
        [12  6  24]       columns (or rows) are multiples of
                          each other.

    A can be reproduced by the product of an infinite number of pairs
    of vectors, for example:

        [3]               [ 6  3  12]
        [4] [2  1  4]  =  [ 8  4  16]
        [6]               [12  6  24]

    A 3 by 3 matrix in which columns 1 and 2 are multiples of each
    other but the third column is independent has rank 2, for there
    are two linearly independent vectors; a 3 by 3 matrix with three
    linearly independent columns has rank 3.
10. Eigenvalues and Eigenvectors
Given a real square symmetric matrix A, there exist scalars
A and vectors u such that
     Au = λu
or
     Au - λu = 0
or
     (A - λI)u = 0
An nth-order symmetric matrix A has eigenvalues λ1, λ2, ..., λn,
possibly not all distinct and possibly some being zero. Associa-
ted with each λ is a vector (eigenvector) u1, u2, ..., un, such that
u'i uj = 0 for all i and j when i ≠ j, and u'i ui = 1 for i = 1, n.
Placing the eigenvalues in a diagonal matrix Λ and the eigenvectors
into Q, we obtain

     AQ = QΛ
or
     Q'AQ = Λ
or
     A = QΛQ'

which is referred to as the basic structure of the square symme-
tric matrix A. The rank of A is equal to the number of nonzero
eigenvalues.

If there are m nonzero eigenvalues, the basic structure sug-
gests that two small matrices contain the same information as does
A, viz.

     A = [UΛ][U']

where U (of order n x m) holds the eigenvectors associated with the
m nonzero eigenvalues and Λ (of order m x m) holds those eigenvalues.
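The basic structure can be demonstrated with any small symmetric matrix; the 2 x 2 example below is illustrative (NumPy assumed):

```python
import numpy as np

# Illustrative real symmetric matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, Q = np.linalg.eigh(A)   # eigenvalues lam, orthonormal eigenvectors in Q
L = np.diag(lam)             # the diagonal matrix of eigenvalues

# Basic structure: A = Q L Q', with Q'Q = QQ' = I.
print(np.allclose(Q @ L @ Q.T, A))        # True
print(np.allclose(Q.T @ Q, np.eye(2)))    # True

# The rank of A equals the number of nonzero eigenvalues.
print(np.sum(~np.isclose(lam, 0.0)))      # 2
```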
Chapter 3
Some Practical Aspects of
Time Series Analysis
William T. Fox
3.1 GENERALITIES
In many geologic applications, geologic observations in a strati-
graphic succession correspond to changes taking place through time.
Where deposition was continuous and the rate of deposition was rela-
tively constant, stratigraphic thickness can be considered directly
proportional to time. Therefore, time-trend curves can be plotted with
stratigraphic thickness corresponding to time as the independent vari-
able and the parameter being studied as the dependent variable. The
dependent variables can include such things as grain size, carbonate
content, color, or fossil distribution. In modern sedimentary studies
where processes are being studied through time, absolute time can be
plotted as the independent variable and the dependent variables can in-
clude such things as barometric pressure, wind velocity, wave height,
or current velocity.
As pointed out by Kendall (1948, pp. 363-437), a sequence of ob-
servations made in a time series are influenced by three separate com-
ponents: (1) a trend or long-term component, (2) cyclical or oscilla-
ting functions about the trend, and (3) a random or irregular component.
The trend is considered a broad, smooth undulating motion of the sys-
tem over a relatively long period of time or through a relatively
large number of sedimentation units. The cyclical or oscillating
fluctuations about the trend represent a "seasonal effect" or local
variations that are superimposed on the trend component. When the
cyclical fluctuations and the trend have been subtracted from the data,
we are left with the random fluctuations that are referred to as the
random error or residuals.
Several techniques are available for separating the trend compo-
nent from the oscillating fluctuations and random variations in a time
series. The most straightforward method is to draw a purely interpre-
tive curve through the clusters of high and low values on the observed
data curve. This "eye balling" technique has been used effectively by
Walpole and Carozzi (1961) in their study of the microfacies of the
Rundle Group. For illustrative purposes, a free-hand trace of the
main trend is useful, but since it is highly interpretive, it would be
difficult for another worker to reproduce. Also, with a free-hand
method, it is difficult to separate a major trend from minor oscilla-
tions in the data.
Weiss et al. (1965) constructed smooth curves or "graphic logs"
by grouping layers into units that were each 3 feet thick. The result-
ing "moving total" curves were used for correlating between adjacent
measured sections and interpreting the depositional environment. The
moving total curves gave a good picture of the gross lithologic changes
but would be difficult to use for detailed stratigraphic studies. This
method gave a reproducible curve for the major trend, but the small-
scale oscillations or other higher-frequency components superimposed
on the random fluctuations were lost.
One of the most frequently used methods for smoothing a time
series is the simple moving average described by Krumbein and Pettijohn
(1938, p. 198) and used by Walker and Sutton (1967, p. 1014). With
this technique, the data are arranged in a stratigraphic sequence with
values recorded at equal increments of time or stratigraphic thickness.
Starting at the base of the section, a series of moving averages is
taken on successive groups involving an odd number of data points. In
practice, the successive averages are computed for a series by dropping
the lowest data point and adding the next value in the sequence. As
pointed out by Miller and Kahn (1962, p. 355), the moving average tech-
nique should be regarded as descriptive rather than analytical. Be-
cause the weight of each value is equally distributed within the group
being averaged, the moving average technique subdues highs and lows and
displaces peaks and valleys in the trend. As with the previous
techniques described, this method gives an approximation of the major
trend, but since the highs and lows in the trend are markedly reduced,
the oscillating fluctuations are exaggerated.
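The mechanics of the simple moving average are easy to sketch. The series below is hypothetical, and Python stands in for the BASIC used in the course:

```python
# Simple centered moving average with equal weights, as described by
# Krumbein and Pettijohn: each smoothed value averages an odd number
# of points (window), dropping m = window // 2 terms at each end.
def moving_average(values, window):
    m = window // 2
    return [sum(values[i - m:i + m + 1]) / window
            for i in range(m, len(values) - m)]

series = [2.0, 4.0, 3.0, 5.0, 7.0, 6.0, 8.0]   # hypothetical data
print(moving_average(series, 3))   # [3.0, 4.0, 5.0, 6.0, 7.0]
```

Note how the high of 7.0 and the low of 2.0 in the raw series never appear in the smoothed values: the equal weighting subdues highs and lows, as the text observes.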
The three techniques that provide the most useful methods for
smoothing curves include polynomial curve fitting, iterated moving
averages, and Fourier analysis. Polynomial curve fitting and iterated
moving averages make use of summation equations that are readily adapt-
able to computer programming. In Fourier analysis, a series of sine
and cosine curves representing fundamental harmonics are fit to the
observed data.
3.2 POLYNOMIAL CURVE FITTING
It is implicit in the concept of time trend analysis that the
movement be relatively smooth over long periods of time (Kendall, 1948,
p. 371). Therefore, the trend component (Ut) can be represented by a
polynomial in the time element, t, as follows:

     Ut = a0 + a1t + a2t^2 + ... + apt^p
By increasing the size of p, we can obtain as close an approximation
to a finite series as we desire. When the polynomial is fitted to the
whole time series by the method of least squares, it gives a curvi-
linear regression line of Ut on the variable t. The method of fitting
a polynomial to the data by least-squares analysis has worked quite
successfully for trend surface analysis of facies maps. In fitting a
polynomial response surface to a map, the most success has been with
the linear, quadratic, and cubic surfaces. In using a polynomial
fitted over the entire time series by least squares, it would be
necessary to use a much higher-order polynomial to fit the trend. As
pointed out by Kendall (1948, p. 371), the high-order polynomial would
be somewhat artificial, and the coefficients, being based on high-
order moments, would be very unstable from a sampling point of view.
It would also be difficult to separate the trend component from the
oscillating and random components when using a high-order polynomial.
As a possible alternative for finding a single high-order poly-
nomial which approximates the entire time series, Kendall (1948, p.
372) suggests using a sequence of low-order polynomials representing
overlapping segments of the series. In using this technique, data
points must be spaced at equal intervals along the time line. The
first step is to take an odd number of data points (2m + 1), with m
representing the number of points on each side of the value being
smoothed, and to fit a polynomial of order p, with p not greater than
2m, to them. The value of the polynomial at the middle of its range
is substituted for the corresponding observed data point in plotting
the smoothed curve. The polynomial fitting operation is repeated for
consecutive sets of 2m + 1 terms from the beginning to the end of the
time series. The degree of smoothing of the trend curve is controlled
by the number of terms included in the polynomial.
In a series of 2m + 1 terms, the terms are denoted by

     U-m, ..., U-1, U0, U1, ..., Um
According to Kendall (1948, p. 372), the coefficients of a polynomial
of the order p are obtained by the method of least squares giving an
equation of the following form:

     U'0 = C0U0 + C1(U1 + U-1) + ... + Cm(Um + U-m)            (3.2.1)

In equation (3.2.1), the constants, C's, depend on m, the number
of terms, and p, the order of the polynomial, but are independent of
the U's. U'0 is the smoothed value at t = 0, the middle of the range
of the polynomial. As can be seen from equation (3.2.1), this is
equivalent to a weighted average in which the weights are independent
of the observed values. Therefore, to compute the trend line, the
constants for equation (3.2.1) are determined for the selected values
of m and p, and then the value of U'0 given in equation (3.2.1) is
calculated for each consecutive set of 2m + 1
terms in the series. It should be noted that there will be a loss of
m terms at the beginning and at the end of the trend curve. Tables
listing the formulas for fitting a polynomial of orders 2 and 3
(quadratic and cubic) and orders 4 and 5 (quartic and quintic) for
m = 1 to 10 are given in Kendall (1948, p. 374) and Whittaker and
Robinson (1929, p. 295). The same value is obtained by fitting a
polynomial of order 2 (quadratic) or order 3 (cubic) since the case p
odd includes the next lowest (even) value of p. Therefore, it is not
necessary to give separate values for the even (quadratic and quartic)
polynomials if the odd (cubic and quintic) polynomials have been cal-
culated (Kendall, 1948, p. 373). Four of the equations fitting a
quadratic and cubic when m = 2, m = 3, m = 4 and m = 5 are given as
equations (3.2.2) to (3.2.5) (Whittaker and Robinson, 1929, p. 295).
m = 2:  U'0 = (1/35)[17U0 + 12(U1 + U-1) - 3(U2 + U-2)]        (3.2.2)

m = 3:  U'0 = (1/21)[7U0 + 6(U1 + U-1) + 3(U2 + U-2)
              - 2(U3 + U-3)]                                   (3.2.3)

m = 4:  U'0 = (1/231)[59U0 + 54(U1 + U-1) + 39(U2 + U-2)
              + 14(U3 + U-3) - 21(U4 + U-4)]                   (3.2.4)

m = 5:  U'0 = (1/429)[89U0 + 84(U1 + U-1) + 69(U2 + U-2)
              + 44(U3 + U-3) + 9(U4 + U-4) - 36(U5 + U-5)]     (3.2.5)
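One way to see why these are least-squares weights: a weighted average using the coefficients of equation (3.2.2) reproduces any quadratic (or cubic) trend exactly, while damping higher-frequency fluctuations. A sketch in Python (NumPy assumed; the quadratic series is hypothetical):

```python
import numpy as np

# Weights of equation (3.2.2): quadratic/cubic least squares, 2m + 1 = 5 points.
w = np.array([-3, 12, 17, 12, -3]) / 35.0

t = np.arange(20, dtype=float)
u = 2.0 + 0.5 * t - 0.1 * t**2           # a noise-free quadratic "trend"

# 'valid' convolution drops m = 2 terms at each end of the series,
# the same loss of terms noted in the text.
smoothed = np.convolve(u, w, mode="valid")

# A quadratic passes through the filter unchanged (up to rounding).
print(np.allclose(smoothed, u[2:-2]))    # True
```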
As the number of terms is increased, sizes of the constants be-
come quite large and in moving from quadratic and cubic to quartic and
quintic, the sizes of the constants also greatly increase. Because of
the labor involved in their use, it is advisable to use a computer for
plotting trend curves.
3.3 ITERATED MOVING AVERAGES
Several different techniques have been proposed to simplify the
computations for fitting a trend line by moving averages. The iterated
averages method most widely used was introduced by actuaries for "grad-
uating" a life expectancy curve, which is similar to fitting a trend
line in geology. Whittaker and Robinson (1929, p. 286) point out one
of the earliest examples of iterated moving averages, which was devel-
oped by Woolhouse in 1870. Woolhouse computed each point in the trend
line by passing five parabolas through the following five sets of
points with three points to a set:

     (U-7, U-2, U3), (U-6, U-1, U4), (U-5, U0, U5),
     (U-4, U1, U6), (U-3, U2, U7)
To compute the graduated value, Woolhouse took the arithmetic mean of
the values of the five parabolas as they passed through a line perpen-
dicular to the time series through U0. The values at U0 can be deter-
mined for each of the parabolas using the Newton-Gauss formula of in-
terpolation given in Whittaker and Robinson (1929, p. 36). The arith-
metic mean of the interpolated values of U0 can also be found by the
following summation formula, which is given as equation (3.3.1):
     U'0 = (1/125)[25U0 + 24(U1 + U-1) + 21(U2 + U-2)
           + 7(U3 + U-3) + 3(U4 + U-4) - 2(U6 + U-6)
           - 3(U7 + U-7)]                                      (3.3.1)
The Woolhouse 15-term formula using seven terms on each side of
the central value has about the same degree of smoothing as the nine-
term formula given as equation (3.2.4). The Woolhouse formula using
iterated moving averages gives a smoother trend curve than the fitting
of a quadratic or cubic polynomial to the data.
Another type of iterated moving average formula using three suc-
cessive averages covering 15 points was developed by Spencer and is
given by Kendall (1948, p. 376). The first moving average used for
five terms has as constants -3, 3, 4, 3, -3. The values resulting from
this moving average are averaged first in sets of five points each,
then these values are averaged twice in sets of four. The form used
for such an iteration is given as

     U'0 = (1/320)[4]^2[5][-3, 3, 4, 3, -3]U0                  (3.3.2)
When the separate iterations are combined into a complete summation
formula, the weights are those given in equation (3.3.3), which is
Spencer's 15-point formula.
     U'0 = (1/320)[74U0 + 67(U1 + U-1) + 46(U2 + U-2)
           + 21(U3 + U-3) + 3(U4 + U-4) - 5(U5 + U-5)
           - 6(U6 + U-6) - 3(U7 + U-7)]                        (3.3.3)
Expanding the same technique that was used for the Spencer 15-
point formula [equation (3.3.3)], Spencer also developed a 21-point
equation explained by Whittaker and Robinson (1929, p. 290). In using
this formula, the seven-term series (-1,0,1,2,1,0,-1) is averaged
and the values are averaged first in sets of seven, then twice in sets
of five, as is shown in
     U'0 = (1/350)[5]^2[7][-1, 0, 1, 2, 1, 0, -1]U0            (3.3.4)
Spencer's 21-term formula can be expanded into the following summation
formula:

     U'0 = (1/350)[60U0 + 57(U1 + U-1) + 47(U2 + U-2)
           + 33(U3 + U-3) + 18(U4 + U-4) + 6(U5 + U-5)
           - 2(U6 + U-6) - 5(U7 + U-7) - 5(U8 + U-8)
           - 3(U9 + U-9) - (U10 + U-10)]                       (3.3.5)
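The combined weights of Spencer's 15-point formula [equation (3.3.3)] can be generated mechanically by convolving the four averaging stages of equation (3.3.2); a sketch in Python (NumPy assumed):

```python
import numpy as np

# Spencer's 15-point weights, built from the iterated averages of
# equation (3.3.2): the weighted 5-term average [-3, 3, 4, 3, -3]
# (divisor 4), one 5-term mean, and two 4-term means.
# Overall divisor: 4 * 5 * 4 * 4 = 320.
k = np.array([-3, 3, 4, 3, -3], dtype=float)
for window in (5, 4, 4):
    k = np.convolve(k, np.ones(window))   # running sums of length 'window'
k /= 320.0

# Multiplying back by 320 recovers the integer weights of (3.3.3).
print(np.round(k * 320).astype(int).tolist())
# [-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3]
print(round(k.sum(), 6))   # 1.0 -- the weights form a proper average
```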
When the computations must be done by hand or with a desk calcu-
lator, it is useful to set up a table to carry out the successive aver-
aging. Vistelius (1961) used such a table to compute the trend terms
for Spencer's 21-term formula. By using cardboard cutouts exposing
only portions of the table, it is possible to compute about 600 grad-
uated values per day. In using the table with Spencer's 21-term equa-
tion, the first step is to form the computation (1/2)(-U-3 + U-1
+ 2U0 + U1 - U3) for the entire series. The values derived from the
first computation are then summed by sevens and divided by seven, then
summed twice by fives, dividing by five each time. The actual order in
which the iterations are carried out is immaterial, but with a long
series it is advisable to do the more complicated operations while the
numbers are still small.
The "goodness of fit" of a smoothed curve to the original curve
may be expressed as the percentage reduction in the total sum of
squares, which is given by the expression:

              Σx²trend - (Σxtrend)²/n
     100  x  -------------------------
              Σx²obs - (Σxobs)²/n

where

     xtrend = values on the trend curve at the location of the data points
     xobs   = observed data values
     n      = number of data values
Obviously, a perfect fit of the curve to the data points would give
100 percent and any less perfect fit would yield a correspondingly
smaller percentage of total sum of squares.
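In code, the percentage reduction is a short computation over the two sums of squares; the data below are hypothetical:

```python
# Percent reduction in total sum of squares for a smoothed (trend)
# curve, following the expression above; pure-Python sketch.
def percent_reduction(trend, obs):
    n = len(obs)
    ss_trend = sum(x * x for x in trend) - sum(trend) ** 2 / n
    ss_obs = sum(x * x for x in obs) - sum(obs) ** 2 / n
    return 100.0 * ss_trend / ss_obs

obs = [2.0, 4.0, 3.0, 5.0, 6.0]       # hypothetical observed values
trend = [2.5, 3.0, 4.0, 4.5, 6.0]     # hypothetical smoothed values
print(round(percent_reduction(trend, obs), 1))   # 75.0
```

A perfect fit (trend identical to the observations) returns 100.0, in line with the remark above.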
Nine smoothing equations are available with the program of Fox (1964)
for computing and plotting trend curves with varying degrees of
smoothing. Formulas derived by Sheppard (Whittaker and Robinson,
1929, p. 279) for fitting a quadratic and cubic polynomial to m points,
with m varying from 2 to 10, are used in the program. Equations
(3.2.2) to (3.2.5) in this chapter are the first four equations used
with the program. By increasing the number of terms in the smoothing
equation, the fluctuations in the data are subdued and the underlying
trends of sedimentation are accentuated. As a characteristic of the
type of smoothing equation, there are still minor fluctuations in the
data, even when the 21-term equation is used for smoothing fossil
data (Figure 3.1). When using an iterated moving average as done by
Woolhouse's 15-term equation and Spencer's 15- and 21-term equations,
the minor fluctuations are completely removed, leaving only the smooth
trend curve. Since the program was published (Fox, 1964), it has been
modified (Fox, 1968) with the addition of the Woolhouse 15-term equa-
tion [equation (3.3.1)] and the Spencer 21-term equation [equation
(3.3.5)]. In degree of smoothing, the Woolhouse equation is equivalent
to the Sheppard nine-term equation (m = 4), and the Spencer 21-term
FIGURE 3.1  Smoothed percentage curves for the fossil groups (Crinoidea,
Bryozoa, Trilobita, Brachiopoda, Coelenterata, Mollusca) plotted
against the rho scale.
equation is equivalent to the Sheppard 11-term equation (m = 5). Be-
cause of the more uniform smoothing, the iterated moving average curves
are preferred over the polynomial fitting curves. The only apparent
disadvantage to using the Spencer and Woolhouse formulas is the loss of
points at the beginning and end of the curve. Since the series being
smoothed is quite long relative to the loss at each end, the overall
effect is not too bad.
3.4 FOURIER ANALYSIS
Geologic processes that are cyclic in nature can be best described
using Fourier analysis. In Fourier analysis, a complicated curve can
3. TIME SERIES ANALYSIS 79
be broken down into an aggregate of simple wave forms described by a
series of sine and cosine curves. The observed data can be expressed
as a series of fundamental harmonics that are theoretically independ-
ent. Each harmonic has a wavelength that is a discrete fraction of
the total observation period. For each harmonic, the wavelength is
defined as the distance from crest to crest and the amplitude as one-
half the height from trough to crest. In Fourier analysis of geologic
processes, wavelength can be expressed in time or stratigraphic thick-
ness and the amplitude in the observed units for each parameter.
The complicated form of the observed data can be represented by
an aggregate of simple wave forms that are expressed by the amplitude
of the cosine and sine terms, an and bn , respectively. Although the
function of the form z = f(x) is not known, data points (xi, zi) are
available at equal intervals. Thus, the coefficients an and bn may be
determined by numerical integration methods employing equations (3.4.1)
and (3.4.2) and used in equation (3.4.3) to approximate the observed
curve according to methods described by Harbaugh and Merriam (1968)
and Fox and Davis (1971):

                              K-1
     an = (2/K)[(z0 + zK)/2 +  Σ  zi cos(nπxi/L)]
                              i=1
                                  n = 0, 1, 2, ..., K/2        (3.4.1)

              K-1
     bn = (2/K) Σ  zi sin(nπxi/L)   n = 1, 2, ..., K/2         (3.4.2)
              i=1

                     N
     F(xi) = a0/2 +  Σ  [an cos(nπxi/L) + bn sin(nπxi/L)]      (3.4.3)
                    n=1

where

     zi    = observed value at i-th sampling point
     F(xi) = value of approximating function at i-th sampling point
     a0    = coefficient of zeroth-degree cosine term (a0/2 is the
             mean of the data)
     n     = degree of term
     an    = coefficients of cosine terms, n = 1, 2, ..., K/2
     bn    = coefficients of sine terms, n = 1, 2, ..., K/2
     π     = 3.1416
     xi    = sampling point, time in this case
     i     = 0, 1, 2, ..., K
     K     = maximum number of sampling points (an even number)
     L     = half of fundamental sampling length, L = KΔx/2
     N     = maximum degree of series, N = K/2
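Equations (3.4.1) and (3.4.2) can be sketched directly as summations. The check below builds a hypothetical series from known harmonics and recovers their coefficients (pure Python, standard library only):

```python
import math

def fourier_coeffs(z, dx=1.0):
    """Coefficients a_n, b_n of equations (3.4.1) and (3.4.2).
    z holds K + 1 samples z_0 ... z_K taken at equal spacing dx."""
    K = len(z) - 1
    L = K * dx / 2.0
    a, b = [], []
    for n in range(K // 2 + 1):
        an = (2.0 / K) * ((z[0] + z[K]) / 2.0 +
                          sum(z[i] * math.cos(n * math.pi * i * dx / L)
                              for i in range(1, K)))
        bn = (2.0 / K) * sum(z[i] * math.sin(n * math.pi * i * dx / L)
                             for i in range(1, K))
        a.append(an)
        b.append(bn)
    return a, b

# Hypothetical series: mean 3, cosine harmonic n = 2, sine harmonic n = 3.
K = 16
z = [3 + 2 * math.cos(2 * math.pi * i / 8)
       + 1.5 * math.sin(3 * math.pi * i / 8)
     for i in range(K + 1)]

a, b = fourier_coeffs(z)
print(round(a[0] / 2, 6), round(a[2], 6), round(b[3], 6))   # 3.0 2.0 1.5
```

The recovered values (mean 3, a2 = 2, b3 = 1.5) match the harmonics the series was built from, illustrating that the harmonics are independent of one another.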
In analyzing geologic processes, it is convenient to plot each
harmonic as a single sine curve with a given phase and amplitude. In
Fourier analysis, each harmonic is expressed by a pair of sine and co-
sine curves with the same period. When the sine and cosine curves are
added algebraically, a new sine curve results with a phase shift and a
new amplitude. The phase shift can be determined by using an arc-
tangent subroutine (Louden, 1967). The phase for each harmonic, Pn,
can be computed according to

     Pn = arc tan (an/bn)                                      (3.4.4)
The phase that is expressed in degrees is used to determine the start-
ing point for a sine curve for each individual harmonic. Since the
period for each harmonic is expressed in hours, it is also possible to
convert the phase into hours. In this way, it is possible to compare
coastal parameters such as wave height or wave period by comparing the
phase shifts of different harmonics. The amplitude, αn, for each har-
monic can be determined directly from the power spectrum (Preston and
Henderson, 1964). The discrete power spectrum, αn², is defined as the
sum of the squared Fourier coefficients according to

     αn² = an² + bn²          n = 0, 1, 2, ..., K/2            (3.4.5)
The term "power spectrum" arose because of its relation to the power
dissipation in an alternating current circuit (Harbaugh and Merriam,
1968). The amplitude for each harmonic is derived from the square
root of the power spectrum according to equation (3.4.6)
     αn = √(an² + bn²)        n = 0, 1, 2, ..., K/2            (3.4.6)
where the phase, Pn, is expressed in radians and the amplitude, αn, is
derived from the power spectrum. The height of the curve, zi, can be
computed at each sampling point using

                    N
     zi = a0/2  +   Σ  αn sin[nπ(Pn + xi)/L]                   (3.4.7)
                   n=1
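A sketch of equations (3.4.4) and (3.4.6), checked against harmonic 5 of Table 3.1 (a5 = 0.045, b5 = 0.049, period 144 hours). Note that atan2 is used in place of a bare arctangent so the quadrant comes out right, which is the job of the arc-tangent subroutine cited above:

```python
import math

def phase_amplitude(an, bn, period_hours):
    """Phase (eq. 3.4.4) in degrees and hours; amplitude (eq. 3.4.6)."""
    p_deg = math.degrees(math.atan2(an, bn)) % 360.0   # quadrant-safe arctan(a/b)
    p_hours = p_deg / 360.0 * period_hours             # one period = 360 degrees
    amplitude = math.sqrt(an ** 2 + bn ** 2)
    return p_deg, p_hours, amplitude

# Harmonic 5 of Table 3.1: a5 = 0.045, b5 = 0.049, period 144 hours.
p_deg, p_hours, amp = phase_amplitude(0.045, 0.049, 144.0)
print(round(p_hours, 1), round(amp, 3))   # 17.0 0.067
```

The computed phase (17.0 hours) and amplitude (0.067) reproduce the tabulated values for that harmonic.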
The amplitudes of the Fourier coefficients are especially meaningful
because they are in the same units as the original data. Thus, if
wave height is measured in feet or longshore current velocity in feet
per second, the Fourier coefficients will be expressed in the same
respective units.
Since each Fourier component is a discrete harmonic of the curve
for the observed data, the Fourier components are theoretically inde-
pendent. The number of Fourier components available is equal to one-
half the total number of data points or observations. Therefore, with
360 observations taken at 2-hour intervals over 30 days, it is possi-
ble to obtain 180 Fourier harmonics with periods ranging from 4 to 720
hours. With least squares techniques, the total variance accounted
for by each harmonic can be calculated. Theoretically, a curve con-
sisting of the full 180 Fourier harmonics should account for 100 per-
cent of the total sum of squares. In practice, a small number of
basic harmonics usually accounts for a very large percentage of the
total sum of squares. Where a particular harmonic or set of harmonics
is related to a naturally occurring cycle, a large percentage of the
sum of squares can be accounted for by a small number of harmonics.
3.5 AN APPLICATION
Barometric pressure, which is a major indication of the weather
patterns passing through an area, has been selected to demonstrate
how Fourier analysis is used. The observed curve for barometric pres-
sure from 8:00 a.m., June 29 through 6:00 a.m., July 29, 1970, is
plotted across the top of Figure 3.2. From the observed data, it can
be seen that four major low-pressure systems passed through the area
on July 4, 9, 15, and 19, 1970. Minor low-pressure systems, which
cause small fluctuations in the observed curve, can be seen on July 1,
17, and 24. The major low-pressure systems are accompanied by high
winds and waves that caused beach erosion or deposition, and changes
in the configuration of the nearshore bars.
FIGURE 3.2  Observed barometric pressure (inches of mercury) for July
1970, with curves for individual Fourier harmonics and the cumulative
curves described in the text.
The period, cosine, sine, phase, amplitude, and sum of squares
for the first 15 Fourier harmonics for barometric pressure are given
in Table 3.1.
TABLE 3.1

Harmonic,  Period,   Cosine,   Sine,    Phase,    Amplitude,  Sum Sq.,
    n      (hours)     an       bn     Pn (hours)     αn          %

    1       720.0     0.059   -0.096     296.8      0.112       33.6
    2       360.0    -0.024   -0.069     199.2      0.073       15.2
    3       240.0    -0.053    0.046     207.2      0.070       15.1
    4       180.0     0.031    0.005      40.4      0.031        2.1
    5       144.0     0.045    0.049      17.0      0.067       11.1
    6       120.0    -0.051   -0.033      88.7      0.051        8.8
    7       102.6    -0.012    0.016      92.0      0.020        1.3
    8        90.0     0.028   -0.029      34.1      0.041        4.1
    9        80.0     0.012   -0.008      27.3      0.014        0.2
   10        72.0    -0.009   -0.008      45.4      0.012        0.7
   11        65.3     0.005    0.007       6.4      0.009        0.1
   12        60.0     0.007   -0.009      23.9      0.011        0.2
   13        55.2    -0.032   -0.009      38.2      0.024        2.2
   14        51.3     0.020   -0.004      14.5      0.020        0.8
   15        48.0     0.007   -0.019      21.2      0.020        0.9
The first harmonic has a period of 30 days or 720 hours and an ampli-
tude of 0.112 inches of mercury. The phase for the first harmonic is
148.4 degrees or 296.8 hours. The first harmonic, which accounts for
approximately 33.6 percent of the total sum of squares, is plotted as
the second curve in Figure 3.2. The curve for the first harmonic has
a low on July 9 and a high on July 24. This is strongly influenced by
the low-pressure systems early in the month and the high-pressure sys-
tem that passed through the area late in the month. The second har-
monic plotted as the third curve in Figure 3.2 has a period of 15 days
or 360 hours, an amplitude of 0.073 inches, and a phase of 199.2
degrees (199.2 hours, since the period is 360 hours). This harmonic
has highs on July 10 and 25 and lows on July
2 and 17. Although the curve for the second harmonic does not appear
to agree with the observed data, it accounts for 15.2 percent of the
total sum of squares. The curves for the third through eighth har-
monics are also plotted in Figure 3.2 along with the cumulative curve
for the first eight harmonics. The first eight harmonics for baro-
metric pressure account for 90.6 percent of the total sum of squares.
As with any periodic data having a wave form, the harmonics interfere
with each other resulting in the reinforcement or cancellation at cer-
tain parts of the curve. In the cumulative curve for the first eight
harmonics the major high- and low-pressure systems in the observed
data can be easily recognized.
In order to get a closer approximation of the mathematical func-
tion representing barometric pressure, the first 15 Fourier harmonics
were computed from the observed data. The cumulative curve for the
first 15 harmonics, which accounts for 95.4 percent of the total sum
of squares, includes the minor lows on July 1, 17, and 24. The 15th
harmonic has a period of 48 hours or two days; therefore, the bottom
curve in Figure 3.2 accounts for all the variation in the data which
has a period of two days or longer. The residual obtained by subtract-
ing the 15-term curve from the observed data still accounts for 4.6
percent of the total sum of squares. This can be accounted for by the
diurnal variation in barometric pressure due to heating during the day
and cooling off at night. The normal diurnal fluctuation of baromet-
ric pressure has an amplitude of about 0.03 inches of mercury. By
using Fourier analysis, therefore, it is possible to eliminate the di-
urnal variation from the barometric pressure curve. It is also possi-
ble to compare barometric pressure with other environmental parameters
by comparing the phase and amplitude for each of the Fourier components.
By keeping the number of harmonics constant, it is possible to visu-
ally compare the computed curves. The period, phase, and amplitude
for each of the environmental parameters are given in Fox and Davis
(1971) .
In the northern hemisphere, winds circulate in a counterclockwise
direction around a low-pressure system. During the summer months, the
low-pressure systems generally pass to the north of the study area
located on the eastern shore of Lake Michigan. Therefore, as the low-
pressure system approaches the area, winds blow out of the southwest
and generate waves from that direction. As the front passes over, the
wind builds up in intensity and shifts over to the northwest. Since
the winds following the passage of the front are generally stronger,
the waves from the northwest are higher and have a longer period. Dur-
ing the high wave conditions following the passage of the front, the
waves run up on the beach and water percolates into the groundwater
system.
The 15-term Fourier curves for wind velocity, wave period, breaker
height, and groundwater table level are plotted in Figure 3.3. The
period, phase, and amplitude for each of the harmonics are given in
Fox and Davis (1971). Wind, which is the driving force, controls the
wave period and breaker height, which in turn influence the level of
the groundwater table. Therefore, a phase lag would be expected in
the Fourier curves with wind velocity reaching a peak first, followed
by wave period and breaker height, with the groundwater table
FIGURE 3.3  The 15-term Fourier curves for wind velocity, wave period,
breaker height, and groundwater table level, July 1970; lake level is
shown for reference.
responding a few hours later. The 15-term Fourier curve for wind ve-
locity shows peaks that correspond to the low points in the barometric
pressure curve in Figure 3.2. The maximum wind velocity was recorded
at 8:00 a.m. on July 4, 1970. The curves for wave period and breaker
height in Figure 3.3 correspond quite closely to the wind velocity
curve. The curve for wave period has its peaks a few hours after
breaker height and drops off more slowly than breaker height. As
waves change from storm waves to swells, the breaker height decreases
and the wave period increases.
There is a surprisingly close correspondence between the curves
for breaker height and groundwater level in Figure 3.3. Groundwater
level was measured in three tubes located approximately 10, 21, and 32
feet from the plunge zone. Since the plunge zone moves with time,
average distances are given for the groundwater tubes. For each of
the groundwater tubes, the sixth Fourier harmonic with a period of
120 hours, or 5 days, accounts for the greatest percentage of the
total sum of squares. For the first groundwater tube, the sixth har-
monic has a phase of 9.8 hours. For the second tube, the same har-
monic has a phase of 12.3 hours and for the third tube, it has a phase
of 16.8 hours. This yields a phase difference of approximately 7 hours
between the first and the third tubes. Since the tubes are 22 feet
apart, this indicates that the groundwater that was fed into the fore-
shore by the run-up of the waves percolates through the beach at a
rate of approximately 3 feet per hour. Therefore, the curve for the
second groundwater tube, which is given in Figure 3.3, has a phase lag
of approximately 7 hours behind the breaker height curve.
The reversal of wind direction with the passage of low pressure
systems plays an important role in controlling coastal processes.
Plots of the alongshore component of the wind, longshore current ve-
locity, and breaker angle are given in Figure 3.4. As a low-pressure
system approaches the study area, wind and waves are generated from
the southwest. The waves that move onto the shore from the southwest
generate longshore currents that move to the north. As the low-
pressure system passes over the area, wind direction shifts over to
the northwest, generating waves out of that direction. With the shift
in wind direction, breaker angle and longshore current are reversed
FIGURE 3.4  The alongshore component of the wind, longshore current
velocity, and breaker angle, July 1970; north and south directions are
plotted on opposite sides of the axis.
with the current flowing to the south. Wind and waves approaching
from the northwest, accompanied by a southward flowing longshore cur-
rent, are recorded as positive, while wind and waves from the south-
west with a northward flowing longshore current are considered nega-
tive. As a low-pressure system approaches, a gradient wind is gener-
ated around the low-pressure system which spirals counterclockwise in-
ward toward the center. Since the fronts pass to the north of the
study area during the summer, the counterclockwise winds are blowing
out of the southwest as the front moves into the area. As the front
approaches, the winds increase in velocity, building up the heights of
the breakers and increasing the velocity of the longshore currents.
After the front passes over, the winds shift over to the northwest fol-
lowed by a corresponding shift in breaker angle and longshore current
direction. The storm cycle pattern, with a low in barometric pressure
accompanied by a peak in wind velocity and breaker height and a rever-
sal in longshore current direction, is repeated several times in Fig-
ures 3.2, 3.3, and 3.4.
REFERENCES
Fox, W. T., 1964, FORTRAN and FAP program for calculating and plotting
time-trend curves using an IBM 7090 or 7094/1401 computer system:
Kansas Geol. Survey Spec. Dist. Pub. 12, 24 p.
Fox, W. T., 1968, Quantitative paleoecologic analysis of fossil com-
munities in the Richmond Group: Jour. Geology, v. 76, pp. 613-
640.
Fox, W. T. and Davis, R. A., Jr., 1971, Fourier analysis of weather
and wave data from Holland, Michigan, July, 1970: O.N.R. Tech.
Report No. 3, Contract 388-092, 79 p.
Harbaugh, J. W., and Merriam, D. F., 1968, Computer applications in
stratigraphic analysis: New York, John Wiley & Sons, 282 p.
Kendall, M. G., 1948, The advanced theory of statistics: London, C.
Griffin & Co., 503 p.
Krumbein, W. C., and Pettijohn, F. J., 1938, Manual of sedimentary
petrography: New York, Appleton-Century Co., 549 p.
Louden, R. K., 1967, Programming the IBM 1130 and 1800: Englewood
Cliffs, N.J., Prentice-Hall, Inc., 433 p.
Miller, R. L., and Kahn, J. S., 1962, Statistical analysis in the geo-
logical sciences: New York, John Wiley & Sons, 483 p.
Preston, F. W., and Henderson, J. H., 1964, Fourier series characteri-
zation of cyclic sediments for stratigraphic correlation, in
symposium on cyclic sedimentation (Merriam, D. F., ed.): Kansas
Geol. Survey Bull. 169, v. 2, pp. 415-425.
Vistelius, A. B., 1961, Sedimentation time-trend functions and their
application for correlation of sedimentary deposits: Jour. Geo-
logy, v. 69, pp. 703-728.
Walker, R. G., and Sutton, R. G., 1967, Quantitative analysis of tur-
bidites in the Upper Devonian Sonyea Group, New York: Jour. Sed.
Petrology, v. 37, pp. 1012-1022.
3. TIME SERIES ANALYSIS 89
Walpole, R. L., and Carozzi, A. V., 1961, Microfacies study of the
Rundle Group (Mississippian) of Front Ranges, Central Alberta,
Canada: Am. Assoc. Petroleum Geologists Bull., v. 45, pp. 1810-
1846.
Weiss, M. P., Edwards, W. R., Norman, C. E., and Sharp, E. R., 1965,
The American Upper Ordovician standard. VII. Stratigraphy and
petrology of the Cynthiana and Eden Formations of the Ohio Valley:
Geol. Soc. Amer. Spec. Paper 81, 76 p.
Whittaker, E. T., and Robinson, G., 1929, The calculus of observations:
a treatise on numerical mathematics (2d ed.): London, Blackie
& Son, 395 p.
Chapter 4
Markov Models in the Earth Sciences
W. C. Krumbein
4.1 FUNDAMENTALS
The term "random process" has an unfortunate connotation for many
earth scientists. It seems to imply a haphazard, unorganized, spora-
dic, and unpredictable process that violates the basic principles of
science. These principles rest heavily on the fact that science seeks
for systematic, patterned responses from recognizable causes and that
unpredictable or chance events have no place in scientific analysis.
Hence, by extension, models that postulate any kind of random occur-
rences are naturally held suspect.
Much of this misunderstanding arises from lack of recognition that
a random variable has as valid a basis in scientific investigation as
the conventional nonstochastic variable (systematic variable) that
forms the basis of classical mathematical physics. A random variable
is a mathematical entity that arises from probabilistic mechanisms,
just as systematic variables are associated with deterministic me-
chanisms. The outcome of a deterministic experiment is exactly pre-
dictable from knowledge of the relations between dependent and inde-
pendent variables, whereas the outcome of a probabilistic experiment
depends on the likelihood of a given event occurring in some underlying
set of probabilities. This set of probabilities constitutes the sam-
ple space of the probabilistic mechanism; if this is known, then the
group behavior of the variables is completely predictable in rigorous
mathematical terms. Moreover, the probability of a particular event
occurring can also be exactly stated.
4. MARKOV MODELS IN THE EARTH SCIENCES 91
4.2 A SPECTRUM OF MODELS
In virtually all fields of science the range of process mechanisms
extends from fully path-dependent deterministic models (in which past
events completely control future events) to independent-events models,
in which the past has no influence whatever on future events. A simple
example of a deterministic model is the negative exponential process
in time,
f(t) = Y = Y_0 e^(-at)                                        (4.2.1)
where the dependent variable Y is completely controlled by the constant
Y_0, the fixed exponent a, and the independent variable t. If Y_0 and a
are known either from theory or experiment, the value of Y associated
with any time t can be exactly predicted. Although equation (4.2.1)
is a continuous function, it can be discretized by considering
successive values of f(t) at some small increment Δt. In this way the
"state of the system" can be thought of in terms of discrete points in
time t_{n-2}, t_{n-1}, t_n, t_{n+1}, and so on. For some phenomena the
distance, X, can be substituted for time.
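The discretization just described can be sketched in a few lines of Python; the constants Y_0 = 1.0, a = 0.5, and Δt = 0.1 are illustrative assumptions, not values from the text.

```python
import math

def f(t, y0=1.0, a=0.5):
    """The deterministic negative exponential process of equation (4.2.1)."""
    return y0 * math.exp(-a * t)

# Discretize the continuous function at increments dt; the successive
# values give the "state of the system" at t_0, t_1, t_2, ...
dt = 0.1
states = [f(n * dt) for n in range(5)]
```

Because the process is deterministic, the state at any time step is exactly predictable from Y_0, a, and t alone.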
At the other end of the spectrum is the independent-events model.
In discrete form this is expressed as
P{X(t_{n+1}) = j | X(t_n) = i} = p_j                          (4.2.2)

which says that the probability of the system being in state j at time
t_{n+1}, given that the state at t_n is i, is simply the probability of
state j occurring at time t_{n+1}, wholly independent of the previous
state of the system.
Somewhere between these extremes lie processes in which partial
dependencies are present, in the sense that the state of the system at
t_{n+1} does depend on the state at t_n, but is not influenced by
earlier states such as that at t_{n-1}. This particular case gives rise
to the simplest kind of Markov chain, a discrete-time, discrete-state,
one-step memory process expressed as

P{X(t_{n+1}) = j | X(t_n) = i} = p_ij                         (4.2.3)
in which p_ij is the conditional probability of the system being in
state j at t_{n+1}, given that it was in state i at t_n. Here p_ij is
the transition probability: the probability that the system changes
from state i to state j in the discrete time step from t_n to t_{n+1}.
4.3 THE MARKOV CHAIN
In its simplest classical form the first-order, discrete-state,
discrete-time Markov chain can be visualized as representing a system
with a finite number of discrete states, A, B, C, ... , behaving in such
a way that a transition occurs from state i to j (where j may be either
the same state or a different state) at each tick of a conceptual
"Markovian Clock." The model is expressed as a transition probability
matrix with rows represented by i and columns by j. A three-state
system can be shown as:
                    To State j
                  A       B       C
              A  p_AA    p_AB    p_AC
From State i  B  p_BA    p_BB    p_BC
              C  p_CA    p_CB    p_CC
Here, p_AA, p_BB, and p_CC, commonly designated as p_ii, represent
transitions from a given state to itself, whereas the offdiagonal en-
tries, designated as p_ij where j ≠ i, represent transitions to other
states. The notation in the matrix is such that, for state A, p_AA is
the probability that the system will remain in the same state, p_AB is
the probability that the system will move to state B at a given clock
tick, and p_AC is the probability that it will move to state C. These
three probabilities sum to 1.0. Note that when transitions occur from
a given state to itself, no change in the system is apparent to an on-
looker until the system, on some given clock tick, does change to a
different state. The length of time that the system remains in a given
state after having entered it at a particular tick, is called the
(discrete) waiting time for state i. Literally it refers to the num-
ber of clock ticks that the system "waits" in state i before leaving i
for another state j ≠ i.
In structuring data for this simplest Markov chain, observations
of state are made at each tick of the clock. This of course is an
imaginary clock, and in practice one selects a fixed time interval
based on theory, observation of what is going on, or even simple geo-
logic intuition. In stratigraphic applications observations of state
are made at fixed vertical intervals along a stratigraphic section,
say at every foot. Thus, if state A represents sandstone, state B
shale, and state C limestone, a sequence of such observations upward
through the section might be AAABBCBBBABBCCCA... , which says that the
system starts in state A and remains there for two more clock ticks,
after which it changes to state B and remains there for a second tick,
changes to state C for one tick, then returns to state B, and so on.
If observations are made at I-foot intervals, the section has 3 feet
of sandstone at the base, followed by 2 feet of shale, 1 foot of lime-
stone, and so on.
In this particular procedure, equal increments of distance are
used instead of equal increments of time. This is a matter of conveni-
ence, and although several severe geologic implications may be involved,
the matrix is still expressed as transition probabilities, but now the
waiting time is a discrete "thickness time." Krumbein and Dacey (1969)
refer to this kind of structuring as a Markov chain with transition
matrix P.
The actual procedure for assembling the transition matrix is
given in Krumbein (1967, p. 3). Here, we follow through with the
simplest case.
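As a minimal sketch of that tallying procedure (not a transcription of Krumbein's FORTRAN program), the transition matrix P can be estimated in Python from the sample sequence given above:

```python
from collections import defaultdict

# The equal-interval sequence of states observed in the text
# (A = sandstone, B = shale, C = limestone).
sequence = "AAABBCBBBABBCCCA"
states = sorted(set(sequence))

# Tally the one-step transitions i -> j at each tick of the clock.
counts = {i: defaultdict(int) for i in states}
for i, j in zip(sequence, sequence[1:]):
    counts[i][j] += 1

# Divide each row by its total to estimate the transition probabilities.
P = {i: {j: counts[i][j] / sum(counts[i].values()) for j in states}
     for i in states}
```

Each row of the estimated matrix sums to 1.0, as required of transition probabilities.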
Transition probability matrices can be generated from any succes-
sion of events, but this of itself gives no indication whether the
process is Markovian. Statistical tests are available for making the
decision; the most widely used is that of Anderson and Goodman (1957),
which tests the hypothesis of an independent-events model against the
alternative that a first-order Markovian property is present.
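A rough Python sketch of the likelihood-ratio statistic underlying such a test follows. This is a simplified reading of the Anderson and Goodman approach, not a transcription of their published procedure; the function name and the example tallies are ours.

```python
import math

def lr_statistic(counts):
    """Likelihood-ratio statistic contrasting an independent-events model
    (the null hypothesis) with a first-order Markov chain.  `counts` is a
    square matrix of transition tallies n_ij; under the null the statistic
    is approximately chi-square with (m - 1)**2 degrees of freedom."""
    m = len(counts)
    row = [sum(r) for r in counts]
    col = [sum(counts[i][j] for i in range(m)) for j in range(m)]
    n = sum(row)
    stat = 0.0
    for i in range(m):
        for j in range(m):
            if counts[i][j] > 0:
                p_markov = counts[i][j] / row[i]  # estimate under Ha
                p_indep = col[j] / n              # estimate under H0
                stat += 2.0 * counts[i][j] * math.log(p_markov / p_indep)
    return stat

# Strong diagonal dominance (long runs in each state) gives a large value.
tallies = [[90, 5, 5], [5, 90, 5], [5, 5, 90]]
```

When every row of the tally matrix is proportional to the column totals, the two estimates coincide and the statistic is zero, as an independent-events model requires.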
Even if the hypothesis of an independent-events model is rejected,
and that of a first-order chain accepted at least by implication, there
is a second requirement that must be fulfilled. This is that the dis-
tribution of waiting times for each state must be geometrically dis-
tributed. This follows from the fact that the output from any simula-
tion or Monte Carlo studies with Markov chains having matrix P = [p_ij]
is distributed in this manner with parameter (1 - p_ii). This require-
ment is so important that it deserves detailed examination.
4.4 GEOMETRIC DISTRIBUTION
This distribution is conveniently examined in terms of indepen-
dent-events models of the kind shown in equation (4.2.2). Consider a
single six-sided die, in which the sample space has six elements, rep-
resenting the six faces, each with its pattern of dots. Each face has
probability 1/6 of occurring faceup on any toss, and probability 5/6
that some other face will appear on top. If we consider the die in
these terms, we have a simple system of two states. If a given face,
say 4, comes up on a given toss, we can translate this into a "waiting
time" by asking how long the system will remain in this state, i.e.,
what is the likelihood that a 4 will occur once, twice, or more times
in successive throws before a non-4 shows up?
We can set up an equation for this as follows: Let P{R = k} be
the probability that the number of times, R, that a 4 comes face up is
exactly k, where k = 1, 2, 3, .... The probability that a 4-face
appears initially is p = 1/6, and the probability that it will not
appear on the next trial is 1 - p = 5/6. Thus, once a 4-face occurs,
the probability that it will occur exactly k times means that it must
be repeated exactly k - 1 times, so that on the k-th trial some face
other than a 4 will occur. This leads to the geometric density with
parameter (1 - p):

P{R = k} = (1 - p)(p)^(k-1) = (5/6)(1/6)^(k-1)                (4.4.1)
In this expression a "success" occurs when on the k-th trial a non-4
appears face up.
The successive probabilities are easily calculated. By setting
k = 1, we obtain P{R = 1} = (5/6)(1/6)^0 = 5/6 = 0.8333. For k = 2
this becomes (5/6)(1/6)^1 = 0.1389; for k = 3 we have (5/6)(1/6)^2 =
0.0231; and the probability that k exceeds 3 is only 0.0046.
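These values follow directly from equation (4.4.1); as a quick check, a few lines of Python (the function name `prob_run` is ours, not the text's):

```python
# Geometric density with parameter (1 - p), equation (4.4.1):
# P{R = k} = (1 - p) * p**(k - 1), the probability that a 4 comes
# face up exactly k times before some other face appears.
p = 1 / 6

def prob_run(k, p=p):
    return (1 - p) * p ** (k - 1)

probs = [prob_run(k) for k in (1, 2, 3)]
tail = 1 - sum(probs)          # probability that k exceeds 3
```

The probabilities over all k sum to one, so the tail probability is obtained by subtraction.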
The geometric distribution applies to all independent-events
models of the kind expressed by equation (4.2.2). The reason that it
also applies to each state of a Markov chain with transition matrix
P = [p_ij] is that each row of the transition matrix is in fact a two-
state independent-events model (a Bernoulli model) in that when the
system is in state i, the probability of remaining in that state on
the next tick of the clock is p_ii and the probability that it will
change to another state j ≠ i is (1 - p_ii). The one-step memory of the
Markov chain is thus related to the outcome of a random draw that de-
termines whether the next drawing is to be made from the same row
(state) or from some other row as specified by the offdiagonal p_ij's
for j ≠ i.
The full development of the geometric distribution as it applies
to the simplest Markov chain is given in Krumbein and Dacey (1969, p.
83), which includes an example with a histogram. It is interesting to
note, incidentally, that (1 - p_ii) is estimated by the reciprocal of
the arithmetic mean of the geometric distribution. In terms of equation
(4.4.1), p_ii is the probability of remaining in state i for any
individual drawing, and (1 - p_ii) is the probability of leaving state
i on any given tick of the clock.
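A small simulation illustrates the point; the value p_ii = 0.6 and the sample size are arbitrary choices for the sketch.

```python
import random

def waiting_time(p_stay, rng):
    """Ticks spent in a state when the probability of remaining is p_stay."""
    ticks = 1
    while rng.random() < p_stay:
        ticks += 1
    return ticks

rng = random.Random(42)
p_ii = 0.6
times = [waiting_time(p_ii, rng) for _ in range(20000)]
mean_wait = sum(times) / len(times)
# The reciprocal of the mean waiting time estimates (1 - p_ii) = 0.4,
# since the geometric mean waiting time is 1 / (1 - p_ii) = 2.5 ticks.
```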
4.5 PROBABILITY TREES
The transition matrix P can be used to show the probabilities as-
sociated with each state of the system through a succession of ticks on
the Markovian clock. Such a tree is illustrated in Krumbein (1967, p.
27) and in Harbaugh and Bonham-Carter (1970, p. 115-117). The diagrams
are very instructive, and with patience one can extend the tree until
succeeding sets of branches all have fixed probabilities. At this
point the system has reached equilibrium, and the fixed probabilities
associated with each state in the system express the overall average
relative proportion of that component in the system under study.
When stratigraphic data are structured in the equal-interval form,
with vertical spacing of h feet, the fixed probability vector × 100
gives the overall percentage of the total thickness of each lithology
in the section. Mathematically the fixed vector is obtained by rais-
ing the matrix P to successively higher powers, and noting when all the
rows of the transition matrix achieve the same values.
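A minimal Python sketch of this power method follows; the matrix P shown is an invented three-state example, not data from the text.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def fixed_vector(p, tol=1e-10, limit=1000):
    """Raise P to successively higher powers until every row agrees;
    any row of the limiting matrix is then the fixed probability vector."""
    q = p
    for _ in range(limit):
        q = matmul(q, p)
        if all(abs(q[i][j] - q[0][j]) < tol
               for i in range(len(q)) for j in range(len(q))):
            return q[0]
    raise ValueError("no convergence")

P = [[0.5, 0.4, 0.1],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
pi = fixed_vector(P)
```

The returned vector sums to one and is unchanged by a further multiplication by P, which is the defining property of the equilibrium probabilities.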
The fixed probability vector is actually an independent-events
model, and simulations arising from it have geometric distributions as
in equation (4.4.1), with parameters (1 - p_i), where p_i is now the
fixed probability of the i-th component.
This simplest Markov model serves to bring out some essential
points in applications of Markov processes in geology. Recall that it
is a discrete-state, discrete-time, one-step memory model. Variants
of this model may have two or more steps in their memories (Pattison,
1965; Schwarzacher, 1967; James and Krumbein, 1969, p. 552). Moreover,
the chains may be converted to continuous-time models (Krumbein, 1968a,
b), and (though this becomes mathematically complex), the states may
be made continuous instead of discrete.
A point not previously emphasized is that the probabilities in a
Markov matrix must be stationary in the sense that the transition
probabilities remain the same through the entire system being studied.
Harbaugh and Bonham-Carter (1970, p. 122) develop this topic in greater
detail.
A very large variety of experiments may be conducted in the frame-
work of this simplest Markov model. Time or distance may in general
be interchanged, and neither of these need to be observed at fixed
intervals.
A question that emerges when Markov models are examined in detail
concerns proper procedures when the input data are not geometrically
distributed. This is particularly appropriate in stratigraphic ana-
lysis, where much observational data suggest that rock thicknesses are
distributed lognormally rather than geometrically. We examine this
situation next.
4.6 EMBEDDED MARKOV CHAINS
When a set of real-world data has the Markov property but does
not display a geometric distribution for each state, an embedded
Markov chain may be more appropriate for analysis. This version of
the Markov model is obtained by structuring the transition probability
matrix on the basis of changes of state only, so that no transitions
from a given state to itself are permitted. In this arrangement the
sequence of observations listed earlier becomes simply ABCBABC ... ,
thus recording only the sequence of rock types in the stratigraphic
section.
The result of structuring the data this way is to reduce the dia-
gonal elements in the transition matrix to zero. Because this model
behaves very differently from the Markov chain with transition matrix
P = [p_ij], Krumbein and Dacey use the symbol r_ij for the transition
probabilities, with r_ii identically zero. Thus the earlier matrix
becomes:
                    To State
                  A       B       C
              A   0      r_AB    r_AC      (The r's here are
From State i  B  r_BA     0      r_BC      probabilities, not
              C  r_CA    r_CB     0        correlation coefficients.)
This is called an embedded Markov chain with transition matrix
R = [r_ij]. It is obtained from the P matrix by the relation r_ij =
p_ij/(1 - p_ii) for all j ≠ i. Each p_ii in the diagonal is then changed
to zero. The embedded chain does not specify a waiting time, which
means that any frequency distribution can be used as follows: The R
matrix is used to get the succession of states, and for each occur-
rence a random observation is drawn from the frequency distribution
of elements in the corresponding state.
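The conversion from P to R can be sketched directly from the relation r_ij = p_ij/(1 - p_ii); the matrix P below is an invented example, not data from the text.

```python
def embed(p):
    """Embedded-chain matrix R from P: r_ij = p_ij / (1 - p_ii) for j != i,
    with every diagonal element r_ii set identically to zero."""
    n = len(p)
    return [[0.0 if i == j else p[i][j] / (1.0 - p[i][i])
             for j in range(n)] for i in range(n)]

P = [[0.5, 0.4, 0.1],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
R = embed(P)
```

Renormalizing each row by (1 - p_ii) keeps the rows of R summing to one even though the diagonal is zero.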
Carr et al. (1966) used such a matrix in a study of a Chester
(upper Mississippian) section, using lognormal thickness distributions
for each component. Although the embedded chain is independent of the
geometric waiting time, the matrix nevertheless should be tested for
the Markov property. An interesting question to be raised is what a
simulated section would look like if the R matrix is used directly for
simulation, without assigning random thicknesses.
It should be mentioned here that probability trees and fixed
probability vectors for embedded Markov chains with matrix R do not
have the same interpretation as in Markov chains with matrix P. When
the embedded matrix (with its zero diagonal elements) is raised to its
equilibrium power, the fixed probabilities × 100 give the percentage
of the overall number of times that a given lithologic type occurs.
This is because the R matrix gives no information about thicknesses.
In their study of the Chester section, Carr et al. noted that oc-
casionally a given rock type was immediately followed by a variant of
the same kind of rock, as when a thickbedded limestone is overlain
directly by a thinbedded limestone. To cope with these situations,
entries were put into the diagonal of the R matrix, to represent the
probability of such occurrences among the lithologic units in their
section. This variant was called a multistory transition matrix, and
it was used in simulation as before, drawing random thicknesses as
needed.
The introduction of any finite element into the diagonal of an
embedded matrix immediately introduces a geometric waiting time into
that state of the system, and random thicknesses drawn from a lognor-
mal distribution are no longer strictly appropriate. This can easily
be seen by simulation experiments, which automatically yield geometric
thickness distributions with parameter (1 - r_ii). Potter and Blakely
(1967) later used this same kind of matrix to simulate a fluviatile
sandstone section with several varieties of sand bedding.
The problems raised by adoption of either the simple Markov chain
with matrix P = [p_ij] or the embedded chain with matrix R = [r_ij]
mainly concern the "true" distribution of bed thicknesses and litho-
logic-unit thicknesses in stratigraphic sections. What has been sug-
gested is a re-examination of the operational definitions by which these
distributions are obtained, inasmuch as the critical point involved is
the relative frequencies of very thin beds (Krumbein, 1972). If cur-
rent operational definitions are insensitive to very thin beds or
units, an observed distribution could appear to be lognormal rather
than geometric, or more properly exponential in the continuous case,
inasmuch as the continuous equivalent of the geometric distribution is
simply f(t) = βe^(-βt), where β is the parameter, related to the
transition probabilities in the P matrix.
4.7 EXTENSIONS
Two aspects of Markovian analysis in stratigraphy will probably
become more important as time goes on. One of these, touched upon
earlier, is the use of transition rates rather than transition proba-
bilities. This involves moving the model from discrete to continuous
time, but allowing the states to remain discrete. The subject is ex-
plored in Krumbein (1968a, b) in terms of a lateral-shift model that
can be applied to transgressive-regressive movements of a strand line.
This can actually be done with transition probabilities by redefining
the states of the system as the successive positions through time of
the strand line, as monitored in terms of strand-line deposits. In
this approach one may start to analyze processes rather than responses,
though observational data may be less readily obtained. Part of the
difficulty involves exact relations between time and thickness of
stratigraphic units, if the outcome of a continuous-time process is to
be expressed in rock thicknesses as continuous distributions.
A second promising avenue for further research is to examine
stratigraphic sections at the individual-bed level rather than that of
the rock units themselves. In this approach the transition matrix is
based on successions of individual beds, and the transition probabili-
ties express the likelihood that a given kind of bed (say of shale) per-
sists through a number of successive clock ticks, or whether the state
changes to another kind of bed, say of limestone. This model involves
three frequency distributions: the first representing the distribution
of the number of beds per lithologic unit, the second the thickness
distributions of the individual beds, and the third the thickness
distributions of the lithologic units, whose individual thicknesses
represent the sum of all the bed thicknesses in the unit.
Dacey and Krumbein (1970) have looked into this problem, and the
most interesting part of the study was the demonstration that if the
number of beds in lithologic units of the i-th lithology are dis-
tributed geometrically, and if the thickness distribution of beds of
the i-th kind of lithology in the lithologic unit is also geometric
(with the same or different parameters), then the thickness distribu-
tion of the lithologic units of the i-th kind of lithology will be
distributed geometrically with a parameter predictable from the param-
eters of the number of beds distribution and the bed thickness dis-
tribution.
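The Dacey and Krumbein result is easy to probe by Monte Carlo; the parameters p = 0.5 (number of beds) and p = 0.4 (bed thickness) are illustrative assumptions, under which the sum is geometric with the product parameter 0.2 and the mean unit thickness should approach 1/0.2 = 5.

```python
import random

def geometric(p, rng):
    """Geometric variate on {1, 2, 3, ...} with parameter p."""
    k = 1
    while rng.random() > p:
        k += 1
    return k

rng = random.Random(7)
p_beds, p_thick = 0.5, 0.4
# Thickness of a lithologic unit: a geometric number of beds,
# each bed itself of geometric thickness.
units = [sum(geometric(p_thick, rng)
             for _ in range(geometric(p_beds, rng)))
         for _ in range(20000)]
mean_unit = sum(units) / len(units)
```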
Implementation of the lateral-shift model and of the bed-
transition model is hindered by lack of observational data in the lit-
erature. The lateral-shift model requires identification of cross-
cutting relations between time lines and rock lines over short
distances, and the bedding model is presently hindered by lack of truly
discriminatory ways of distinguishing between thickness samples drawn
from lognormal and exponential distributions. Some examples of both
of these kinds of distributions tend to plot as relatively straight
lines on log-probability paper, and even chi-square tests may not be
fully discriminatory.
Despite the difficulties that beset advanced analytical applica-
tions of Markov models, in contrast to their largely descriptive pre-
sent use, there are several relatively straightforward criteria that
can be helpful in choosing Markovian models as against independent-
events models for analyzing earth-science data. Basically these de-
pend upon two considerations: presence or absence of a Markov prop-
erty; and presence or absence of geometric distributions in the data
of concern. Four combinations can be distinguished in stratigraphic
analysis:
1. The observed data have a first-order Markov dependency (i.e., the
event at t_{n+1} is controlled by the state of the system at t_n)
in the succession of lithologies, and they have a geometric
distribution of lithologic-unit thicknesses.
2. The observed data have a first-order Markov dependency in the suc-
cession of lithologies, but they do not have a geometric distribu-
tion of lithologic-unit thicknesses.
3. The observed data do not have a first-order Markov dependency in
the succession of lithologies, but they do have a geometric dis-
tribution of lithologic-unit thicknesses.
4. The observed data have neither a first-order Markov dependency in
the succession of lithologies nor do they have a geometric dis-
tribution of lithologic-unit thicknesses.
If combination 1 obtains, then all operations and interpreta-
tions that apply to discrete-state discrete-time first-order Markov
chains with transition matrix P are appropriate. If combination 2
obtains, then the appropriate model is the embedded Markov chain with
transition matrix R.
Where combination 3 obtains, the appropriate model is an indepen-
dent-events model of the kind shown in equation (4.2.2).
Combination 4, having neither the Markov property nor the geo-
metric distribution, is outside the limits of this discussion. In the
present context, however, this last case could be called a "degenerate
Markov chain," just as, in a sense, Anderson and Goodman's (1957) test
is that of H_0, a "zero-order chain," as against H_a, a first-order
chain.
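The four combinations can be condensed into a small decision helper; this is merely a paraphrase of the list above, not a published procedure.

```python
def choose_model(markov_property, geometric_thicknesses):
    """Map the four combinations to the appropriate model."""
    if markov_property and geometric_thicknesses:
        return "Markov chain with transition matrix P"        # combination 1
    if markov_property:
        return "embedded Markov chain with matrix R"          # combination 2
    if geometric_thicknesses:
        return "independent-events model, equation (4.2.2)"   # combination 3
    return "outside present scope (degenerate Markov chain)"  # combination 4
```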
REFERENCES
Anderson, T. W., and Goodman, L. A., 1957, Statistical inference about
Markov chains: Annals Math. Statistics, v. 28, p. 89-110.
Carr, D. D., and others, 1966, Stratigraphic sections, bedding se-
quences, and random processes: Science, v. 154, no. 3753, p.
1162-64.
Dacey, M. F., and Krumbein, W. C., 1970, Markovian models in strati-
graphic analysis: Math. Geol., v. 2, p. 175-191.
Harbaugh, J. W., and Bonham-Carter, G., 1970, Computer simulation in
geology: New York, John Wiley & Sons, 575 p.
James, W. R., and Krumbein, W. C., 1969, Frequency distributions of
stream link lengths: Jour. Geology, v. 77, p. 544-565.
Krumbein, W. C., 1967, FORTRAN IV computer programs for Markov chain
experiments in geology: Kansas Geol. Survey Computer Contr. 13,
38 p.
Krumbein, W. C., 1968a, Computer simulation of transgressive and re-
gressive deposits with a discrete-state, continuous-time Markov
model, in computer applications in the earth sciences: Colloquium
on simulation, D. F. Merriam, ed.: Kansas Geol. Survey Computer
Contr. 22, p. 11-18.
Krumbein, W. C., 1968b, FORTRAN IV computer program for simulation of
transgression and regression with continuous-time Markov models:
Kansas Geol. Survey Computer Contr. 26, 38 p.
Krumbein, W. C., 1972, Probabilistic models and the quantification
process in geology: Geol. Soc. Amer. Spec. Paper 146, p. 1-10.
Krumbein, W. C., and Dacey, M. F., 1969, Markov chains and embedded
Markov chains in geology: Math. Geol., v. 1, p. 79-96.
Pattison, A., 1965, Synthesis of hourly rainfall data: Water Re-
sources Research, v. 1, p. 489-498.
Schwarzacher, W., 1967, Some experiments to simulate the Pennsylvanian
rock sequence of Kansas: Kans. Geol. Survey Computer Contr. No.
18, p. 5-14.
BIBLIOGRAPHY
Adelman, I. G., 1958, A stochastic analysis of the size distribution
of firms: Jour. Amer. Stat. Assoc., v. 53, p. 893-904. (Example
of discrete states with unequal class intervals.)
Agterberg, F. P., 1966, The use of multivariate Markov schemes in geol-
ogy: Jour. Geology, v. 74, p. 764-785.
Agterberg, F. P., 1966, Markov schemes for multivariate well data:
Min. Ind. Experiment Sta., Pennsylvania State Univ. Spec. Publ.
2-65, p. Y1-Y18. (Theory and application of first-order Markov
process to study of chemical elements in a reef.)
Allegre, C., 1964, Vers une logique mathematique des series sedimen-
taires: Bull. Soc. Geol. France, v. 6, p. 214-218.
Amorocho, J., and Hart, W. E., 1964, Critique of current methods in
hydrologic systems investigation: Trans. Amer. Geophys. Union,
v. 45, p. 307-321. [First-order and higher-order Markov chains
(p. 318).]
Bartlett, M. S., 1960, An introduction to stochastic processes with
special reference to methods and applications: Cambridge, The
University Press, 312 p.
Billingsley, P., 1961, Statistical methods in Markov chains: Ann.
Math. Stat., v. 32, p. 12-40.
Clark, W. A. V., 1964, Markov chain analysis in geography: an appli-
cation to the movement of rental housing areas: Ann. Assoc.
Am. Geog., v. 55, p. 351-359. (Study of rentals in several cities
for three 10-year intervals.)
Coleman, J. S., 1964, Introduction to mathematical sociology: Glencoe,
Illinois, Free Press, 554 p.
Doob, J. L., 1953, Stochastic processes: New York, John Wiley & Sons,
Inc., 654 p.
Feller, W., 1968, An introduction to probability theory and its ap-
plications (3rd ed.): New York, John Wiley & Sons, 509 p.
Fenner, P., (Editor), 1969, Models of geologic processes: AGI/CEGS
Short Course, Philadelphia, November, 1969. Available through
American Geological Institute, Washington, D.C.
Gingerich, P. D., 1969, Markov analysis of cyclic alluvial sediments:
Jour. Sed. Pet., v. 39, no. 1, p. 330-332.
Graf, D. L., Blyth, C. R., and Stemmler, R. S., 1967, One-dimensional
disorder in carbonates: Illinois Geol. Survey Circ. 408, 61 p.
(First-order Markov model applied to crystallographic defects in
carbonate crystals.)
Griffiths, J. C., 1966, Future trends in geomathematics: Pennsylvania
State Univ., Mineral Industries, v. 35, p. 1-8.
Harbaugh, J. W., 1966, Mathematical simulation of marine sedimentation
with IBM 7090/7094 computers: Kansas Geol. Survey Computer Contr.
1, 52 p.
Harbaugh, J. W., and Wahlstedt, W. J., 1967, FORTRAN IV program for
mathematical simulation of marine sedimentation with IBM 7040 or
7094 computers: Kansas Geol. Survey Computer Contr. 9, 40 p.
Heller, R. A., and Shinozuka, M., 1966, Development of randomized load
sequences with transition probabilities based on a Markov pro-
cess: Technometrics, v. 8, p. 107-114.
Karlin, S., 1966, A first course in stochastic processes: New York,
Academic Press, 502 p.
Kemeny, J. G., and Snell, J. L., 1960, Finite Markov chains: Princeton,
New Jersey, Van Nostrand Co., Inc., 210 p.
Krumbein, W. C., and Graybill, F. A., 1965, An introduction to statis-
tical models in geology: New York, McGraw-Hill Book Co., 475 p.
Krumbein, W. C., and Scherer, W., 1970, Structuring observational data
for Markov and semi-Markov models in geology: Tech. Rept. No.
15, ONR Task 389-150. National Clearinghouse No. AD 716794.
Leopold, L. B., Wolman, M. G., and Miller, J. P., 1964, Fluvial pro-
cesses in geomorphology: San Francisco, Freeman and Co., 522 p.
Loucks, D. P., and Lynn, W. R., 1966, Probabilistic models for pre-
dicting stream quality: Water Resources Research, v. 2, p. 593-
605.
Lumsden, D. N., 1971, Facies and bed thickness distributions of lime-
stones: Jour. Sed. Pet., v. 41, p. 593-598.
Matalas, N. C., 1967, Some distribution problems in time series simu-
lation: Computer Contribution 18, Kansas Geol. Survey, p. 37-40.
Merriam, D. F., and Cocke, N. C., eds., 1968, Computer applications in
the earth sciences: Colloquium on simulation: Kansas Geol.
Survey Computer Contr. 22, 58 p.
Potter, P. E., and Blakely, R. E., 1967, Generation of a synthetic
vertical profile of a fluvial sandstone body. J. Soc. Petrol.
Eng., v. 6, p. 243-251.
Potter, P. E., and Blakely, R. F., 1968, Random processes and litho-
logic transitions: Jour. Geology, v. 76, p. 154-170.
Rogers, A., 1966, A Markovian policy model of interregional migration:
Regional Sci. Assoc. Papers, v. 17, p. 205-224. (Interregional
migration under controlled and uncontrolled political conditions.)
Scheidegger, A. E., and Langbein, W. B., 1966, Probability concepts in
geomorphology: U.S. Geol. Survey Prof. Paper 500-C, p. C1-C14.
(Markov processes continuous in time and space for slope develop-
ment.)
Schwarzacher, W., 1964, An application of statistical time-series
analysis of a limestone-shale sequence: J. Geol., v. 72, p. 195-
213.
Schwarzacher, W., 1968, Experiments with variable sedimentation rates,
in computer applications in the earth sciences: Colloquium on
simulation, D. F. Merriam, ed.: Kansas Geol. Survey Computer
Contr. 22, p. 19-21.
Schwarzacher, W., 1972, The semi-Markov process as a general sedimen-
tation model: Mathematical models in sedimentology, edited by
D. F. Merriam: New York, Plenum Press, p. 247-268.
Shreve, R. L., 1966, Statistical law of stream numbers: Jour. Geology,
v. 74, p. 17-37.
Shreve, R. L., 1967, Infinite topologically random channel networks:
Jour. Geology, v. 75, p. 178-186.
Shreve, R. L., 1969, Stream lengths and basin areas in topologically
random channel networks: J. Geol., v. 77, p. 397-414.
Smart, J. S., 1968, Statistical properties of stream lengths: Water
Resources Research, v. 4, p. 1001-1014.
Smart, J. S., 1969, Topological properties of channel networks: Geol.
    Soc. America Bull., v. 80, p. 1757-1774.
Vistelius, A. B., 1949, On the question of the mechanism of the forma-
    tion of strata: Doklady Akademii Nauk SSSR, v. 65, p. 191-194.
Vistelius, A. B., and Feigel'son, T. S., 1965, On the theory of bed
    formation: Doklady Akademii Nauk SSSR, v. 164, p. 158-160.
Vistelius, A. B., and Faas, A. V., 1965, On the character of the alter-
    nation of strata in certain sedimentary rock masses: Doklady
    Akademii Nauk SSSR, v. 164, p. 629-632.
Vistelius, A. B., 1966, Genesis of the Mt. Belaya granodiorite,
    Kamchatka (an experiment in stochastic modeling): Doklady
    Akademii Nauk SSSR, v. 167, p. 1115-1118. (Application of a
    Markov chain in the study of the sequence of mineral grains in
    a thin section.)
Watson, R. A., 1969, Explanation and prediction in geology: Jour.
Geology, v. 77, p. 488-494.
Wickman, F. E., 1966, Repose period patterns of volcanoes; V. General
discussion and a tentative stochastic model: Arkiv Mineralogi
Geologi, v. 4, p. 351-367.
Zeller, E. J., 1964, Cycles and psychology, in Symposium on Cyclic
Sedimentation: Kansas Geol. Survey Bull. 169, v. 2, p. 631-636.
Chapter 5
A Priori and Experimental Approximation
of Simple Ratio Correlations
Felix Chayes
5.1 RATIO CORRELATIONS
Most measurements are of quantities that are in some sense ratios,
but this requires no special consideration in correlation analysis or
in studies of interdependence if the denominators of the ratios being
compared are constants. That elevation is measured in units initially
defined as some fraction of the distance from the equator to the pole
and specific gravity as some multiple of the weight of an equivalent
volume of water at a particular temperature and pressure, for instance,
need not concern the petrologist seeking to characterize the relation
between the elevations of a set of samples in a sill and their specific
gravities. Indeed, without such scaling parameters it is difficult to
see how questions of this kind could be answered, or even asked.
When the scaling parameters are themselves variables, however, as
is often the case in geochemistry, the situation is very different.
Relations between ratios may then be very different from those between
the numerators and denominators - the "absolute" variables or "terms" -
of the ratios. In particular, as was noted long ago by Pearson (1896),
even though the terms are uncorrelated, there may nevertheless be cor-
relation, and sometimes very strong correlation, between pairs of
ratios formed from them.
To characterize the general relationship, we first express the
ratios $Y_i = X_1/X_2$, $Y_j = X_3/X_4$ as first-order approximations in the true
means, variances, and covariances of the X's, where each "observation"
vector, $X = [X_1, X_2, X_3, X_4]$, is drawn simply at random from a parent
population characterized by means and variances $\mu_m$, $\sigma_m^2$ for $m = 1, \ldots, 4$
and covariances $\sigma_{mn}$ for $m \neq n$.

Each observed value of $X_m = \mu_m + \delta_m$, so that

    Y_i = \frac{\mu_1 + \delta_1}{\mu_2 + \delta_2}  and  Y_j = \frac{\mu_3 + \delta_3}{\mu_4 + \delta_4}                    (5.1.1)
It is readily shown (see, for instance, Chayes, 1971) that to
first-order approximation

    Y_i \approx \frac{\mu_1}{\mu_2} + \frac{\delta_1}{\mu_2} - \frac{\mu_1 \delta_2}{\mu_2^2}                    (5.1.2)

and, similarly,

    Y_j \approx \frac{\mu_3}{\mu_4} + \frac{\delta_3}{\mu_4} - \frac{\mu_3 \delta_4}{\mu_4^2}                    (5.1.3)

Then, taking expectations on both sides of (5.1.2) and (5.1.3),

    E(Y_i) \approx \frac{\mu_1}{\mu_2}, \qquad E(Y_j) \approx \frac{\mu_3}{\mu_4}                    (5.1.4)

so that, using (5.1.2) and the left half of (5.1.4),

    \Delta_i = Y_i - E(Y_i) \approx \frac{\delta_1}{\mu_2} - \frac{\mu_1 \delta_2}{\mu_2^2}                    (5.1.5)

and, from (5.1.3) and the right half of (5.1.4),

    \Delta_j = Y_j - E(Y_j) \approx \frac{\delta_3}{\mu_4} - \frac{\mu_3 \delta_4}{\mu_4^2}                    (5.1.6)

Further, multiplying (5.1.5) by (5.1.6),

    \Delta_i \Delta_j \approx \left( \frac{\delta_1}{\mu_2} - \frac{\mu_1 \delta_2}{\mu_2^2} \right) \left( \frac{\delta_3}{\mu_4} - \frac{\mu_3 \delta_4}{\mu_4^2} \right)                    (5.1.7)

To find the parent correlation, $\rho_{ij}$, between $Y_i$ and $Y_j$, we re-
quire the expectations of $\Delta_i^2$, $\Delta_j^2$, $\Delta_i \Delta_j$, viz.,
    Var(Y_i) = E(\Delta_i^2) \approx \frac{1}{\mu_2^4} (\mu_2^2 \sigma_1^2 + \mu_1^2 \sigma_2^2 - 2 \mu_1 \mu_2 \sigma_{12})                    (5.1.8)

    Var(Y_j) = E(\Delta_j^2) \approx \frac{1}{\mu_4^4} (\mu_4^2 \sigma_3^2 + \mu_3^2 \sigma_4^2 - 2 \mu_3 \mu_4 \sigma_{34})                    (5.1.9)

and

    Cov(Y_i, Y_j) = E(\Delta_i \Delta_j) \approx \frac{1}{\mu_2^2 \mu_4^2} (\mu_2 \mu_4 \sigma_{13} - \mu_2 \mu_3 \sigma_{14} - \mu_1 \mu_4 \sigma_{23} + \mu_1 \mu_3 \sigma_{24})                    (5.1.10)

Thus, finally,

    \rho_{ij} = \frac{Cov(Y_i, Y_j)}{\sqrt{Var(Y_i) \cdot Var(Y_j)}}                    (5.1.11)
Now by definition $\sigma_{mn} = \sigma_m \sigma_n \rho_{mn}$, so that division of the numera-
tor and denominator of (5.1.11) by $(\mu_1 \mu_2 \mu_3 \mu_4)$ leads at once to the
commonly found form [see, for instance, equation (2.1) of Chayes,
1971, in which the signs of the correlation terms in the denominator
are wrong]

    \rho_{ij} \approx \frac{\rho_{13} C_1 C_3 - \rho_{14} C_1 C_4 - \rho_{23} C_2 C_3 + \rho_{24} C_2 C_4}{\sqrt{(C_1^2 + C_2^2 - 2 \rho_{12} C_1 C_2)(C_3^2 + C_4^2 - 2 \rho_{34} C_3 C_4)}}                    (5.1.12)

where $C_m = \sigma_m / \mu_m$ is Pearson's coefficient of variation and $\rho_{mn}$ is his
coefficient of correlation.
From (5.1.12) it is evident, as is intuitively obvious, that if
all terms of a pair of ratios are different and uncorrelated, the
ratios themselves will also be uncorrelated. If, however, two ratios
have a common denominator they will be correlated even though their
numerators are uncorrelated with each other and with the denominator.
This at first sight paradoxical result - called "spurious" correlation
by Pearson - can be reached by introducing into (5.1.11) or (5.1.12)
the constraints that $\mu_2 = \mu_4$, $\sigma_2 = \sigma_4$, and $\rho_{mn} = 0$ for all $m \neq n$. But
working it out ab initio is just as simple and provides useful drill
for the novice.
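The constraint exercise is easy to carry through numerically as well. The sketch below, in modern Python (the program accompanying this chapter is Fortran; the function and variable names here are my own), evaluates (5.1.12) directly; making $X_4$ identical with $X_2$, i.e., setting $\rho_{24} = 1$ and $C_4 = C_2$, reproduces the common-denominator case treated next.

```python
import math

def rho_ratio(c, rho):
    """First-order correlation (5.1.12) between Y_i = X1/X2 and
    Y_j = X3/X4.  c[m] is the coefficient of variation sigma_m/mu_m of
    term X_m, and rho[(m, n)] the correlation between X_m and X_n."""
    num = (rho[(1, 3)] * c[1] * c[3] - rho[(1, 4)] * c[1] * c[4]
           - rho[(2, 3)] * c[2] * c[3] + rho[(2, 4)] * c[2] * c[4])
    den = math.sqrt((c[1]**2 + c[2]**2 - 2 * rho[(1, 2)] * c[1] * c[2])
                    * (c[3]**2 + c[4]**2 - 2 * rho[(3, 4)] * c[3] * c[4]))
    return num / den

pairs = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
zero = {p: 0.0 for p in pairs}
c = {m: 0.1 for m in range(1, 5)}
r_distinct = rho_ratio(c, zero)      # four distinct, uncorrelated terms
common = dict(zero)
common[(2, 4)] = 1.0                 # X4 identical with X2: common denominator
r_common = rho_ratio(c, common)
```

With all terms uncorrelated and distinct, `r_distinct` is zero; with the common denominator and equal coefficients of variation, `r_common` recovers the value 0.5 discussed below.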
If $Y_i = X_1/X_2$ and $Y_k = X_3/X_2$, then of course $\Delta_i$ is exactly as in
(5.1.5) and

    \Delta_k \approx \frac{\delta_3}{\mu_2} - \frac{\mu_3 \delta_2}{\mu_2^2}                    (5.1.13)

so that

    \Delta_i \Delta_k \approx \left( \frac{\delta_1}{\mu_2} - \frac{\mu_1 \delta_2}{\mu_2^2} \right) \left( \frac{\delta_3}{\mu_2} - \frac{\mu_3 \delta_2}{\mu_2^2} \right)                    (5.1.14)

If the X's are uncorrelated, the expectations of all cross-product
terms in the $\delta$'s vanish, and

    Var(Y_i) \approx \frac{1}{\mu_2^4} (\mu_2^2 \sigma_1^2 + \mu_1^2 \sigma_2^2)                    (5.1.15)

    Var(Y_k) \approx \frac{1}{\mu_2^4} (\mu_2^2 \sigma_3^2 + \mu_3^2 \sigma_2^2)                    (5.1.16)

    Cov(Y_i, Y_k) = E(\Delta_i \Delta_k) \approx \frac{\mu_1 \mu_3 \sigma_2^2}{\mu_2^4}                    (5.1.17)

Thus,

    \rho_{ik} = \frac{Cov(Y_i, Y_k)}{\sqrt{Var(Y_i) \cdot Var(Y_k)}} \approx \frac{\mu_1 \mu_3 \sigma_2^2}{\sqrt{(\mu_2^2 \sigma_1^2 + \mu_1^2 \sigma_2^2)(\mu_2^2 \sigma_3^2 + \mu_3^2 \sigma_2^2)}}                    (5.1.18)
which approximates the correlation between two ratios with common de-
nominator as a function of the means and variances of the numerators
and denominator, a result again easily restated in terms of the coef-
ficients of correlation and variation.*
Thus, ratios with common denominator will tend to be positively
correlated if their numerators are uncorrelated with each other and
with their denominator. Indeed, the correlation generated in this
fashion may be far from trivial. If, for instance, the coefficients
of variation of the terms are equal, the correlation between the ratios
is 0.5, and if the coefficient of variation of the denominator is
larger than the (equal) coefficients of variation of the numerators,
the correlation between the ratios will be greater than 0.5; if it is
twice as large, something not at all unlikely in geochemistry, the
correlation of the ratios will be 0.8.
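These limiting values follow at once from the footnote form of (5.1.18), expressed in coefficients of variation. As a quick check, in Python (a modern sketch; names are my own):

```python
import math

def rho_common_denominator(c1, c2, c3):
    """First-order correlation between X1/X2 and X3/X2 when X1, X2, X3
    are mutually uncorrelated; c1, c2, c3 are the coefficients of
    variation of the three terms (c2 belongs to the common denominator)."""
    return c2**2 / math.sqrt((c1**2 + c2**2) * (c2**2 + c3**2))

# Equal coefficients of variation give the 'spurious' correlation 0.5;
# a denominator twice as variable as the numerators gives 0.8.
r_equal = rho_common_denominator(0.1, 0.1, 0.1)
r_double = rho_common_denominator(0.1, 0.2, 0.1)
```

The actual magnitudes of the coefficients of variation cancel; only their ratios matter.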
The other simple ratio correlations - those between a ratio and
its numerator or denominator, between ratios with common numerator,
and between ratios the numerator of one of which is the denominator
of the other - can of course be approximated in analogous fashion. All
save the last are common in geochemical work (for a review, see Chayes,
1949) and the interested reader will find it useful to carry through
the computations for the case of zero covariance between the X's,
comparing his results with those shown in Table 2.1 of Chayes (1971).
The approximations used here and in all the work so far cited are
of first order only, and can be expected to yield reliable results only
if terms of second and higher order in $(\sigma_2 / \mu_2)$ are small enough to
ignore. In much geochemical work this is not so; indeed, in this field
we often seem to use a particular variable as denominator precisely
because its relative variance is large, so that higher powers of
$(\sigma_2 / \mu_2)$ will often not be negligible. Although the work up to this point
shows pretty clearly that correlation generated by the process of ratio
formation may be far too strong to ignore, in many practical cases,
alas, it will not lead to useful approximations of that correlation.

*Division of the numerator and denominator of the right side of
(5.1.18) by $(\mu_1 \mu_2^2 \mu_3)$ leads at once to

    \rho_{ik} \approx \frac{C_2^2}{\sqrt{(C_1^2 + C_2^2)(C_2^2 + C_3^2)}}
Higher-order approximations for means, variances, and covariances
are available (Tukey), and an analytical formulation using them would
have the usual advantage of providing, in principle at least, a gen-
eral solution, something of considerable aesthetic and scientific ap-
peal. But a general solution is not indispensable if one can obtain a
satisfactory solution for any specific problem that may arise. That
is what one ought to be able to accomplish by simulation experimen-
tation, and the remainder of this chapter describes the structure and
use of a computer program, RTCRSM2 (RaTio CoRrelation SiMulation, ver-
sion 2), which is an attempt to exploit this possibility.
5.2 RTCRSM2
In using RTCRSM2, the investigator assigns:
1. Appropriate parent means and variances to the four pseudorandom
variables A,B,C,D to be used as numerators and denominators of
the ratios
2. The number of items (sample size) per simulation, and the number
of simulations per experiment
He must also initialize the random number generator, either by provid-
ing it with a starting residue or instructing it to use a stored one.
Given this information, the program generates a set of four ran-
dom numbers from a parent population uniformly distributed in the range
(0,1), transforms these to normal deviates with zero mean and unit
variance, and adjusts each with its assigned mean and variance to prod-
uce the current set of "observed" values of A, B, C, and D. From these
all possible simple ratios are formed, and the elements of this vector
of terms and ratios, together with their squares and cross products,
are then stored in cumulators. The process is repeated until the re-
quested number of items - i.e., sets of "observed values" of A, B, C,
D - has been supplied and processed.
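The cumulation scheme just described - running sums, sums of squares, and sums of cross products, converted to variances and a covariance at the end - can be sketched in modern Python for a single pair of ratios (RTCRSM2 itself is Fortran and carries all sixteen elements of Y; the names below are my own):

```python
import math
import random

def simulate_common_denominator(mu, sd, n_items, seed=12345):
    """Draw n_items observations of four uncorrelated normal terms
    A, B, C, D, form the common-denominator ratios A/D and B/D, and
    return their sample correlation, cumulating sums, sums of squares
    and cross products item by item as described above."""
    rng = random.Random(seed)
    sx = sy = sxx = syy = sxy = 0.0
    for _ in range(n_items):
        a, b, c, d = (rng.gauss(m, s) for m, s in zip(mu, sd))
        x, y = a / d, b / d
        sx += x; sy += y
        sxx += x * x; syy += y * y; sxy += x * y
    n = float(n_items)
    cov = (sxy - sx * sy / n) / (n - 1.0)
    var_x = (sxx - sx * sx / n) / (n - 1.0)
    var_y = (syy - sy * sy / n) / (n - 1.0)
    return cov / math.sqrt(var_x * var_y)

# With all four coefficients of variation equal to 0.05, the simulated
# correlation should scatter about the first-order value of 0.5.
r = simulate_common_denominator([10.0] * 4, [0.5] * 4, 20000)
```

The single-pass accumulation is exactly what makes double precision necessary in the Fortran program: the subtraction of two large, nearly equal sums magnifies rounding error.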
The covariance matrix is then computed from the cumulated sums,
sums of squares, and sums of cross products, its diagonal elements are
converted to standard deviations, its off-diagonal elements to corre-
lations, and the requested results printed out. Since the objective
is to approximate as closely as possible the value of an unknown pa-
rent correlation, the number of items per simulation should be as large
as the computing budget will permit. But the cumulating procedure used
in the program, selected because it economizes on core requirement and
places no upper limit on the number of items per sample, leads to large
rounding error. Double precision is essential for simulations con-
taining more than a few hundred items; on the Univac 1108 it seems to
satisfactorily control rounding error even for very large simulations.
In this kind of work it is easy to bury oneself in numbers.
RTCRSM2 is designed as a specific problem solver, and its printout may
be restricted to those particular ratio correlations of immediate in-
terest. In fact, unless the user specifies the type(s) of correla-
tion(s) to be printed, the output will consist only of an error mes-
sage reminding him that he should have done so. Loading instructions
are provided in lines 32 to 76 of the accompanying program listing.
5.3 UNNO AND RANEX
The random number generator referred to in RTCRSM2 as UNNO is de-
signed to generate uniformly distributed numbers in the range (0,1);
these are normalized in the main program. UNNO, coded in Fortran by
L. Finger, appears to work admirably on the Univac 1108 used for the
calculations reported below. Whether it performs satisfactorily on
any specific computer can be determined experimentally and such experi-
mentation should certainly precede routine operation of RTCRSM2, for
unless the random number generator it uses is demonstrably sound, the
results yielded by RTCRSM2 are uninterpretable. Program RANEX is de-
signed to provide information on this matter; the rather extensive
battery of tests performed by it is described in lines 2 to 9, and
loading instructions are given in lines 19 to 34 of the accompanying
listing. If UNNO performs unsatisfactorily or the user prefers
another generator, the subroutine calls in RTCRSM2 will require modi-
fication: these are contained in lines 231, 249, and 250. (With
analogous modification of cards 98, 110, and 111, incidentally, RANEX
may be used to test the output of any subroutine designed to generate
uniformly distributed pseudorandom numbers in the range 0,1.)
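Such experimentation amounts to ordinary goodness-of-fit testing of the generator's output. A minimal analogue of one such test - a chi-square comparison of decile counts against uniformity, far short of RANEX's full battery - might look like this in Python (names are my own):

```python
import random

def chi_square_uniform(values, bins=10):
    """Chi-square statistic for the hypothesis that the values are
    uniformly distributed in (0,1), using equal-width bins."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    expected = len(values) / bins
    return sum((n - expected) ** 2 / expected for n in counts)

rng = random.Random(1972)
stat = chi_square_uniform([rng.random() for _ in range(10000)])
# With 10 bins there are 9 degrees of freedom; a sound generator should
# usually give a statistic below the 95th percentile, about 16.9.
```

A generator that repeatedly fails tests of this kind should be replaced before any simulation results are trusted.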
5.4 COMMENTS ON USAGE
Unless instructions to the contrary are provided at operation
time, the variables A, B, C, D of RTCRSM2 are drawn from theoretically
uncorrelated parents; in reasonably large samples, the correlations
between them should be negligibly small. The actual sample correla-
tions can be printed out, however, so it is always possible to see how
nearly this goal has been attained in a particular simulation, and it
is probably wise to do so. But ratio correlations can be approximated
by simulation whatever the correlations between their terms, and RTCRSM2
provides limited facility for relaxing the restriction that the terms
are uncorrelated. Specifically, the user may assign correlation(s) of
arbitrary size and sign between any pair or any two mutually exclusive
pairs of variables A, B, C, and D. (When this option is exercised, the
sample correlations between variables A, B, C, and D should always be
printed out.)
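The relaxation works by the common-element device: the same normal deviate is added to both members of a pair (subtracted from one member when a negative correlation is requested), and the standard deviation of the shared element is computed from the requested correlation - the CMPT expression in the listing, based on equation (3.15) of Chayes (1971). A Python sketch of that computation and its check (names are my own):

```python
import math

def common_element_sd(rho, s1, s2):
    """Standard deviation of the shared normal element which, added to
    two uncorrelated variables with standard deviations s1 and s2,
    induces a correlation of magnitude rho between them."""
    pf = (s2 / s1) ** 2
    rsq = rho ** 2
    t = (rsq * (1.0 + pf)
         + math.sqrt(rsq**2 * (1.0 - pf)**2 + 4.0 * rsq * pf)) / (2.0 * (1.0 - rsq))
    return s1 * math.sqrt(t)

def induced_correlation(c, s1, s2):
    """Correlation between X + E and Y + E when X, Y and E are mutually
    uncorrelated and E has standard deviation c."""
    return c * c / math.sqrt((s1 * s1 + c * c) * (s2 * s2 + c * c))

c = common_element_sd(0.6, 1.0, 2.0)
# induced_correlation(c, 1.0, 2.0) recovers 0.6 to rounding error.
```

Note that the adjustment inflates the standard deviations of both members of the pair, which is why the program recomputes the parent parameters before printing.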
Preliminary experimentation with RTCRSM2 confirms an earlier sug-
gestion (Chayes, 1971) that for ratios formed from uncorrelated terms
homogeneous in C, the linear approximations of $\rho$ are very good if C <
0.1 and still fairly good if C < 0.15. For 0.15 < C < 0.35 the simu-
lated ratio correlations differ widely both from the linear approxi-
mations and from zero, so that experimental determination or higher-
order approximation of null values against which to test observed
ratio correlations will be essential in this range. Simulation exper-
iments suggest that with further increase of C, Pearson's "spurious"
correlation - that between ratios with common denominator - averages
about 0.5 with very large variance, while the other simple ratio cor-
relations rapidly approach zero with small variance.
These rather unexpected results will be described more fully when
the work is completed and are mentioned here only for the sake of the
perspective they provide. Compositional variables are necessarily
nonnegative and in the absence of pronounced skew their coefficients
of variation must be fairly small. For uniformly distributed variables
- in which there is no central tendency - it is easily shown that C =
$1/\sqrt{3}$, or 0.577; present indications are that it would be wise to avoid
correlations between ratios of such variables with common denominators
but safe enough to test observed values of other simple ratio correla-
tions formed from them against a null value of zero. In binomial or
multinomial variables, on the other hand, C = $\sqrt{(1 - p)/np}$; here the
first-order approximations of ratio correlations should be adequate if
0.05 < p < 0.95 and n > 400, and if p > 0.1 this should be true even
for n as small as 100. But in work involving sampling variation, as
opposed to mere counting variance, of major constituents, the relevant
values of C will nearly always be considerably larger than for bino-
mially distributed variables and, since there is usually a fairly
strong central tendency, considerably smaller than for rectangularly
distributed ones. It seems likely, then, that in dealing with ratios
formed of major constituents, whether expressed normatively, modally,
or as oxides, there may be frequent need for experimental determination
of the appropriate null value of $\rho$ along the lines suggested here, for
it will often happen that the coefficients of variation of such
variables are large enough to make the first-order approximation of $\rho$
unsatisfactory, but small enough so that the assumption that $\rho = 0$ is
unrealistic.
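The two reference values of C used in this section are easily verified (Python sketch; names are my own):

```python
import math

def cv_uniform():
    """Coefficient of variation of a variable uniform on (0, a):
    mean a/2 and standard deviation a/sqrt(12), so C = 1/sqrt(3)
    regardless of a."""
    return (1.0 / math.sqrt(12.0)) / 0.5

def cv_binomial_proportion(p, n):
    """Coefficient of variation of a binomial proportion: mean p and
    standard deviation sqrt(p*(1 - p)/n), so C = sqrt((1 - p)/(n*p))."""
    return math.sqrt((1.0 - p) / (n * p))

# cv_uniform() is 0.577...; for p = 0.5 and n = 400 the binomial C is 0.05.
```

The binomial C falls rapidly with n, which is why mere counting variance rarely dominates in large point counts.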
REFERENCES
Chayes, F., 1949, On correlation in petrography: J. Geol., v. 57, p.
239-254.
Chayes, F., 1971, Ratio correlation: Chicago, University of Chicago
Press.
Pearson, K., 1896-1897, On a form of spurious correlation which may
    arise when indices are used in the measurement of organs: Proc.
    Roy. Soc. (London), v. 60, p. 489-502.
Tukey, J. W., undated, The propagation of errors, fluctuations and tol-
erances: Unpublished Technical Reports No. 10, 11, 12, Princeton
University.
APPENDIX

C     PROGRAM RTCRSM2
C
C     FINDS CORRELATIONS BETWEEN ALL RATIOS FORMED FROM 4 TERMS (A,B,C,D)
C     WITH ASSIGNED MEANS, STANDARD DEVIATIONS AND COMMON ELEMENTS, BY MONTE
C     CARLO SIMULATION.  FOUR UNIFORMLY DISTRIBUTED PSEUDORANDOM NUMBERS
C     ARE GENERATED IN THE RANGE (0,1), NORMALIZED BY THE 'DIRECT'
C     PROCEDURE, ADJUSTED FOR ASSIGNED MEANS, STANDARD
C     DEVIATIONS AND COMMON ELEMENTS, AND STORED IN THE FIRST FOUR ELEMENTS
C     OF VECTOR Y (A=Y(1), B=Y(2), C=Y(3), D=Y(4)).  THE REMAINING 12 ELEMENTS
C     OF Y ARE LOADED WITH BINARY RATIOS OF THE FIRST 4, IN THE ORDER SHOWN
C     BELOW IN DATA BLOCK NAM.  AS EACH Y VECTOR IS COMPLETED, THE SUMS, SUMS
C     OF SQUARES AND SUMS OF CROSS-PRODUCTS OF ITS ELEMENTS ARE CUMULATED.
C     THE PROCEDURE IS ITERATED NR TIMES TO GENERATE THE SAMPLE ARRAY.  ON
C     COMPLETION OF THE NRTH CYCLE THE COVARIANCE MATRIX IS GENERATED
C     FROM THE CUMULATED SUMS OF SQUARES AND CROSS-PRODUCTS, AND THE DESIRED
C     CORRELATIONS ARE EXTRACTED AND PRINTED.
C
C     IF NO CORRELATIONS ARE SPECIFIED ON INPUT, VARIABLES A,B,C,D WILL BE
C     UNCORRELATED, TO LIMITS OF EXPERIMENTAL ERROR.  IF USER SPECIFIES
C     DESIRED CORRELATIONS (POSITIVE OR NEGATIVE) BETWEEN ANY PAIR OR ANY 2
C     MUTUALLY EXCLUSIVE PAIRS OF VARIABLES A,B,C,D, THE REQUIRED COMMON EL-
C     EMENT STANDARD DEVIATIONS ARE COMPUTED FROM AN ALGORITHM BASED ON
C     EQ. (3.15), P.26, OF RATIO CORRELATION (CHAYES, 1971).  EXECUTION TERM-
C     INATES IN ERROR IF THE SAME VARIABLE APPEARS IN BOTH CORRELATIONS.
C     RANDOM NUMBERS ARE GENERATED BY SUBROUTINE UNNO(I,F), WHERE I IS THE
C     INTEGER SEED AND F IS THE FLOATED NUMBER IN RANGE (0,1).  THE FORTRAN
C     GENERATOR USED WITH THIS VERSION WAS CODED BY L. FINGER.
C     PROGRAM WRITTEN BY F. CHAYES FOR NSF INSTITUTE ON GEOSTATISTICS,
C     CHICAGO CIRCLE, 1972
C
C     *******************************************************************
C     *                                                                 *
C     *             CARD INPUT TO PROGRAM RTCRSM2                       *
C     *                                                                 *
C     *  COMMAND CARD (I5,I4,11I1,I15)                                  *
C     *  COL.   VARIABLE   DEFINITION OR FUNCTION                       *
C     *  1-5    NR         NUMBER OF ITEMS PER SAMPLE                   *
C     *  6-9    NMSP       NUMBER OF SAMPLES TO BE DRAWN                *
C     *  10     KMLMT      0-NOP, 1 - READ AND USE COMMON ELEMENTS      *
C     *  (N.B. - VECTOR RQST RESTRICTS PRINT TO MATERIAL REQUESTED)     *
C     *  11     RQST(1)    0-NOP, 1 - ALL, IN ONE BIG MATRIX            *
C     *  12        .  (2)  0-NOP, 1 - BETWEEN ALL TERMS (A,B,C,D)       *
C     *  13        .  (3)  0-NOP, 1 - OF TYPE X1/X2 WITH X1             *
C     *  14        .  (4)  0-NOP, 1 - OF TYPE X1/X2 WITH X2             *
C     *  15        .  (5)  0-NOP, 1 - OF TYPE X1/X2 WITH X1/X3          *
C     *  16        .  (6)  0-NOP, 1 - OF TYPE X1/X2 WITH X3/X2          *
C     *  17        .  (7)  0-NOP, 1 - OF TYPE X1/X2 WITH X2/X3          *
C     *  18        .  (8)  0-NOP, 1 - OF TYPE X1/X2 WITH X2/X1          *
C     *  19        .  (9)  0-NOP, 1 - OF TYPE X1/X2 WITH X3/X4          *
C     *  20        .  (10) UNASSIGNED                                   *
C     *  21-35  QIN        STARTING SEED OF RANDOM NUMBER GENERATOR     *
C     *         (QIN  K - USE K                                         *
C     *          QIN  0 - USE 1ST RESIDUE FROM UNNO                     *
C     *          QIN -1 - USE EXISTING RESIDUE)                         *
C     *                                                                 *
C     *  PARAMETER CARD (12F6.3)                                        *
C     *  1-6    GVAV(1)    PARENT MEAN OF A                             *
C     *  7-12   GVSD(1)    PARENT STANDARD DEVIATION OF A               *
C     *    .       .          .                                         *
C     *  43-48  GVSD(4)    PARENT STANDARD DEVIATION OF D               *
C     *                                                                 *
C     *  COMMON ELEMENT CARD (USE ONLY IF KMLMT = 1) (12F6.3)           *
C     *  1-6    RHO(1)     DESIRED CORR. BTW. VARIABLES A AND B         *
C     *  7-12   RHO(2)     DESIRED CORR. BTW. VARIABLES A AND C         *
C     *  13-18  RHO(3)     DESIRED CORR. BTW. VARIABLES A AND D         *
C     *  19-24  RHO(4)     DESIRED CORR. BTW. VARIABLES B AND C         *
C     *  25-30  RHO(5)     DESIRED CORR. BTW. VARIABLES B AND D         *
C     *  31-36  RHO(6)     DESIRED CORR. BTW. VARIABLES C AND D         *
C     *  N.B. - ANY ONE ELEMENT OF RHO, OR ANY ONE OF THE PAIRS         *
C     *  (1,6), (2,5), (3,4) MAY BE ASSIGNED NON-ZERO VALUES.           *
C     *  PROBLEMS IN WHICH OTHER PAIRS OR MORE THAN TWO ELEMENTS        *
C     *  OF RHO ARE ASSIGNED NON-ZERO VALUES WILL BE IGNORED.           *
C     *                                                                 *
C     *******************************************************************
      DIMENSION AV(32), RHO(6), CMSD(6), GVAV(4), GVSD(4), NAM(16),
     *PRBF(36), R(4), SIG(32,32), SUM(32), Y(32), RQST(10), NUPG(2),
     *ORSD(4)
      DOUBLE PRECISION AV,SUM,SIG,RN,RNL,Y
      REAL NAM
      INTEGER OUT,Q,QIN,RQST
      DATA IN,OUT,NUPG/5,6,4H(1H1,1H)/
      DATA NAM/' A ',' B ',' C ',' D ','A/D','B/D','C/D','D/C','A/C',
     *'B/C','C/B','D/B','A/B','B/A','C/A','D/A'/
C     INPUT FORMATS
    1 FORMAT (I5,I4,11I1,I15)
    3 FORMAT (12F6.3)
    5 FORMAT (6(6X,F6.1))
C     OUTPUT FORMATS
    2 FORMAT ('1',25X,'A. - ASSIGNED PARAMETERS FOR SIMULATIONS YIELDING
     * CORRELATIONS SHOWN IN B, BELOW'//'0PARENT VALUES OF TERMS OF RATI
     *OS'/16X,4(8X,A2)/10X,'MEAN',6X,4F10.4/10X,'STD.-DEV.',F11.4,3F10.4
     *//I10,' ITERATIONS IN SIMULATION.  ENTRY SEED FOR RANDOM NUMBER GE
     *NERATOR IS ',I15,' IN SAMPLE NUMBER',I3,'.')
    4 FORMAT (' NO ASSIGNED COMMON ELEMENTS AMONG TERMS OF RATIOS.')
    6 FORMAT (/15X,'INITIAL PARAMETERS LISTED ABOVE WILL BE MODIFIED BY
     *COMMON ELEMENT ADJUSTMENTS TO INTRODUCE FOLLOWING CORRELATIONS-'/
     *51X,6(A2,A2,7X)/' REQUESTED CORRELATIONS',27X,6(F6.4,5X)/' REQUIRE
     *D STANDARD DEVIATIONS OF COMMON ELEMENTS',6(F8.4,3X))
    8 FORMAT (/35X,'COMPARISON OF PARENT AND SAMPLE VALUES OF RATIO TERM
     *S A, B, C AND D.'/55X,'MEAN',12X,'STD.-DEV.'/40X,'TERM   PARENT
     *SAMPLE',4X,'PARENT   SAMPLE'/(40X,A2,8X,F6.2,1X,F8.3,3X,F6.2,
     *1X,F8.4/))
   10 FORMAT (/'0MATRIX WITH CORRELATIONS OFF, STANDARD DEVIATIONS ON, D
     *IAGONAL -'/15X,16(A3,4X))
   12 FORMAT (9X,A3,16F7.4)
   14 FORMAT (//' CORRELATIONS BETWEEN TERMS',4X,6(2X,A3,',',A3)/33X,
     *6(F6.4,3X))
   16 FORMAT (//' CORRELATIONS BETWEEN RATIOS AND THEIR NUMERATORS -'/
     *12(2X,A3,',',A3,1X)/2X,F6.4,11(4X,F6.4))
   18 FORMAT (//' CORRELATIONS BETWEEN RATIOS AND THEIR DENOMINATORS -'/
     *12(2X,A3,',',A3,1X)/2X,F6.4,11(4X,F6.4))
   20 FORMAT (//' CORRELATIONS BETWEEN RATIOS WITH COMMON NUMERATORS -'/
     *12(2X,A3,',',A3,1X)/2X,F6.4,11(4X,F6.4))
   22 FORMAT (//' CORRELATIONS BETWEEN RATIOS WITH COMMON DENOMINATORS -
     *'/12(2X,A3,',',A3,1X)/2X,F6.4,11(4X,F6.4))
   24 FORMAT (//' CORRELATIONS BETWEEN RATIOS IN WHICH ONE TERM,',A3,' O
     *R',A2,', IS THE NUMERATOR OF ONE AND THE DENOMINATOR OF THE OTHER
     *-'/12(2X,A3,',',A3,1X)/2X,F6.4,11(4X,F6.4))
   26 FORMAT (//' CORRELATIONS BETWEEN RATIOS WHICH ARE RECIPROCALS - ',
     *6(2X,A3,',',A3,1X)/55X,6(F6.4,4X))
   28 FORMAT (//' USE ',I15,' AS SEED TO RANDOM NUMBER GENERATOR IN NEXT
     * EXPERIMENT'/1H1)
   30 FORMAT (//25X,'B. - SIMULATION RESULTS')
   32 FORMAT ('1',20X,'ASK ME NO QUESTIONS AND I WILL TELL YOU NO LIES.
     * VECTOR RQST IS EMPTY.  TRY AGAIN.'/1H1)
   34 FORMAT (//' CORRELATIONS BETWEEN RATIOS WITH NO COMMON TERMS -'/
     *12(2X,A3,',',A3,1X)/2X,F6.4,11(4X,F6.4))
   36 FORMAT (20X,'FAULTY DATA CARD.  PROGRAM LOOKS FOR NEXT PROBLEM.'/'
     *1')
   38 FORMAT ('1KMLMT CARD SPECIFIES MORE THAN TWO OR A FAULTY PAIR OF C
     *ORRELATIONS.  READ IN NEXT PROBLEM OR QUIT.')
C
READ
Cn~MA~D
CARD
C
70
READ
IIN,l,FRP=71,END=300)
NR,NMsr,KMLMT,RQSr,QIN
IF
(NM~D.tQ.O)
NMSP
=
1
C
CHECK
PRINT
REQUEST.
IF
IT
IS
EMPTY,
SKIP
TO
NEXT
PROBLEM.
r
DO
7?
J
=
1,1
0
IF
IROSTIJI.EQ.1)
72
CflNTINUE
WR
IT
E
(nUT,32)
GO
TO
70
GO
TO
7S
C
PRINT
FRROR
MESSAGE,
RETURN
FOR
NEW
COMMAND
CARD
73
WRITE
(QUT,36)
GfJ
Tn
70
C
INITIALIZE
RANOOM
NUMBFR
GEN"ORHOR
IF
THIS
IS
FIRST
PASS
OF
75
IF
lOIN)
100,80,90
80
CALL
UN!JOIQ,RA)
GO
TO
100
90
Q
=
QIN
C
READ
PARA'4fTFR
CARD,
THEN
COM/.10~l
ELE~1E'NT
CARD
I
f
RFClUIREf).
100
READ
{IN,3,ERR=731
(GVAV{I),GVSD!Il,t=l,'tl
C
REREAD
GVsn
INTO
rAsa
so
IT
CAN
RE
RECLAIMED
AS
NEEDED.
REA
0
(0,5)
OR
S
0
NS
=
0
IF
(KMlMT.fQ.O)
GO
TO
101
RTrR1370
RTCR1380
RTCR13QO
RTCR1400
RTCR1410
PTCRl420
RTCR1430
R
rCR
1440
RTCR1450
RTCR
1460
RTCR1470
RTCP
1480
R
TeR
1490
RTCR1500
EXECUTTON.RTCR1510
RTCR1520
RTCR1530
R
TCR154Q
RTCR1550
RTCR1560
RTCR
1570
RTCR1580
RTCP,1590
RTCP1600
C
READ
REQUESTEry
CORRELATIONS
flETWEEN
C
ELEMENT
STANDARD
DEVIATIONS
A,fI,C,O,
COMPUTE
REQUIRED
CO~MON-
RTCR1610
RTCR1620
RTCR1630
READ
(IN,3,EPP=73)
RHO
C
OETERMINF
WHETHER
CORRELATION
RFQUEST
IS
VALID.
KR
=
()
00
1002
K=
1,6
IF
(RHO(KII
1001,1002.1001
1001
KR
=
KR
+
1
1002
CONTINUE
I
F
(K
R
•
EQ
.1)
GO
TO
1007
R
TCR
1640
RTCR1650
RTCR1660
RTCP1670
RTCR1680
RTCPl690
RTCR1700
RTC?l71
0
V1
fJ}
1-1
~
t""'
ttl
~
1-1
o
n
o
~
S
1-1
o
Z
fJ}
.....
.....
'"
IF
(KR.EQ.2)
GO
TO
1004
RTf-Rl720
C
CORRELATION
RfQUEST
FAULTY,
SKIP
TO
NEXT
PROBLEM
RTCR1730
1003
WR
ITE
(OllT,
38)
RTCR1740
GO
TO
70
RTCR1750
C
DETERMINE
WHETHER
A
VALIn
PAtR
OF
CORRELATIONS
HAS
BEEN
REQUESTED
RTCR1760
1004
KR
=
0
RTCR1770
no
1006
K
=
1,6
RTCR1780
IF
(RHOIK))
1005.,1006,1005
RTCR1790
1005
KR
:
K~
+
K
RTCR1800
1006
CONTINUE
RTCRIBIO
IF
(KR.NE.7)
GO
TO
1003
RTCR1820
C
PROCESS
VAUD
CORRElATIOIII
REQUEST
RTCR1830
1007
KC
=
n
RTCR1840
DO
1010
J
:
1,3
RTCRI850
JK
=
J
•
1
RTCR1860
00
1010
K
=
JK,4
RTCR1870
KC
=
KC
+
1
RTCR1880
IF
(RHO(KC')
1008,1010,1008
RTCRIR90
1008
PF
:
(GVSO{K)/GV~O(J)I**2
RTCP1900
RSQ
=
RHn(KC)**2
RTCR1910
fMPT
=
(RSQ*11.+PF'+SQRTIRSQ**2*(1.-PF1**2+4.*RSQ*PF»)/(2.*ll.
RTCP1920
*-RSQI'
RTCR1930
CMSD(KCI
=
GVSOIJI*~QRTICMPTI
RTCR1940
to
10
CONTI
NUE
R
TCR
1950
101
NS
=
N~
+
1
RTCR1960
IF
(NS.GToNM<:;P)
GO
TO
70
QTCR1970
C
TRANSFER
INITIAL
STANDARD
DEVIATION~
FROM
ORSD
TO
GVSD
~TCP19BO
DO
1015
r
=
1,4
RTCR1990
1015
GVSrlll
=
O~Srl(II
RTCR2000
c
~ECORO
INPUT
DATA
RTCR2010
WRIT
E
IOU
T
,
2
I
(
NA
1.1
t
[)
,
1=
1
,
I~),
I
G
V
'I
V
I
J)
,J
=
1
,
4)
,
(G
V
S
[)(
1<)
,
K:
1
,4)
,
NR
,Q
,R
TC
P
2020
*NS
.
RTCR2030
LINF
=
9
RTCR2040
IF(KML"1Tl
10?,102,l03
RTCR2050
102
WRITE
(OUT,4)
RTCR2060
.....
tv
o
~
t"'
H
><
@
~
trl
Vl
LINE
=
L!"JE
+
t
GU
Tn
105
C
lGAO
cn~MON-TER~
NAMF-PAIR~
TN
PRAF
103
I
=
0
Oil
104
J
-=
1,3
JK
=
J
+
1
DO
11')4
K
.JK,4
I
=
I
+
1
POFlF{Il
NA~!J)
I
=
I
+-
1
104
PRBF(I)
NAM(K)
WPITF
(OUT,6)
(PP.RF(I)'I=1,121,{RHOtJI,J=l,6),tCMSOtK),K=1,6)
LI
~1
E
=
l
P~E
+
5
C
CLEAR
CU~ULATOQS
lO~
Dn
106
J
=
1,16
SlI"'1(JI
=
0.0
DO
1
0
6
K
J,
1
6
106
SIG(K,Jl
-=
0.0
c
(
GfNF:RAH
APRhY,
CIJM1JLATINr;
SUMS
OF
VARIAI3LFS,
SQUAQES
AND
CR(1SS-
C
PRnOUCTS
STEPWISE.
00
145
N
=
1,
NR
("
r,FNERATF
ONE
SET
OF
VALUES
FOR
A,B,C,O,
AN!)
STORF
IN
Y(I),I-=1,4.
no
110
I
R
=
1,4
110
(/ILL
flNNO(Q,R(IR)
c
C
NORMALIZE
PI!),
TRANSFeR"",
STORF
TRANSFORMED
VAlIJE
IN
y(I
I,
I
1,4.
c
DO
115
T
-=
1,"3,2
R(T)
=
SQo,T(-?*AIOG(R(TlI)
J
=
I
+
1
R(J)
6.2831851*RtJ)
Y
(I)
GV
AIJ
(J)
+
GVSf)
([
I
*R
([)
*(f)S
IF/(
J
II
115
Y(J)
-=
r.VAV(J)
+
GVsotJ)*R(I)*SIN(R(J')
Q~CR2070
R
TCR20AO
RTCP.2090
RTCR2100
R
TCR2110
RTCR2120
RTCR2130
RTCR2140
RTCR21'50
RTCR2160
IHCR2170
RTCR211~0
RTCR2190
RTCR2200
RTCR2210
RTCR2220
RTCR2230
IHCR2240
RTCR2250
PTCR2260
RTCP2HO
R
TCR
2280
RTCR2290
RTCP2300
RTCR2310
RTCR2320
QTt:R2330
RTCRH40
R
TC
R2350
RTCR2360
RTCP2370
P
TCR2380
R
TCP
2390
RTCR?400
VI
en
H
~
fu
~
H
o
8
§
S
H
~
en
....N
....
C
ADJUST
A,P.C,D
FOp
COMMON
ELEMENTS,
IF
REQUIRED
IF(KMLMT.EQ.O)
GO
TO
125
KC
=
0
DO
120
J
=
1,3
JK=
J
...
1
DO
120
K
=
JK,4
KC
=
KC
...
1
I
F
(
C
MS
0
(
K
C
»
1
2
0
,120,11
7
117
CALL
UNNO(Q,PA)
CAL
L
UNNP
(Q,
I{P
I
lA
=
SORT
(-2.*ALOG(!~AlI
lR
=
COS(6.2831853*RB)
CMFL
=
CMSD(KC,*ZA*ZB
Y
«
J)
=
Y
(
J)
...
O~F
L
C
SIGN
OF
INCREMENT
TO
V(K)
DEPENDS
nN
SIGN
OF
RHO(KC)
I
FIR
He'
(K
C)
J
I
18,
118,
119
C
118
V
(K)
=
YI
K)
-
C
f'>1H
G'J
T':1
120
lIe)
V(KI
=
Y(KI
...
CMEL
120
r:flNTI
NUF
C
STr:JPE
"JUMf'RATnRS
OF
RATIOS
IN
Y(NJ,N=5,16
C
125
I
=
4
1)0
130
J
=
1,3
on
130
K
=
1,4
I
=
I
...
1
130
VO)
=
V(KI
C
OIVIDE
NUMEPATnps
RY
APPROPRIATE
nENn~INATnps
K
=
17
r:
f)O
litO
J
=
],4
Dn
litCI
J
=
1,
'3
K
=
K
-
1
140
Y{K)
=
Y(I<I/Y(Tl
RTCf/2410
I{
TCP2420
P.TCR2430
RTCR?440
RTCR2
1
.50
RTCR2460
RTCP2470
RTCR2480
RTCR2490
PTCR.2500
R
Tep
2510
RTCR2520
RTCR2530
RTCP?54(l
RTCR2550
RTC
P
2560
RTCP?570
RTCR?580
RTCP2590
RTCP2600
RTCR2610
R
TCR2620
RTCR2630
RTCP2640
RTCR2650
RTCR?660
RTCR2670
RTCR2680
RTCR2690
RTCR2700
RTC
P
2710
RTCR2720
RTCR2730
RTCR?740
PTCR2150
t-'
N
N
"rj
tTl
t"""
H
><
9
2<
tTl
en
C
(11,,",UlATF
SUMS
OF
V~R!ABlfS,
SQUAPES
AND
X-PPODUCTS
FOR
ONE
ITEM
nr.
14'5
K
=
1,16
SUM(K)
=
SUMlKI
+
YIK)
no
145
J
K,
16
145
SIG(J,KI
=
SIG(J,KI
+
Y(JI*V(K}
C
C
STOPE
AVERAGES
IN
AV,
CONVERT
SIG
TO
rov
MATRIX
RN
=
NR
c
RNl
=
RN
-
1.
DO
1'50
K
=
1.16
AVlKI
=
SUMlKl/RN
DO
150
J
=
K,
1.
6
1'30
SIGlJ,K)
=(SIr-tJ,KI
-
SJM(Jl*AVIKlt/RNl
C
~AKE
~TAG.
ElFMENTS
STD.
DEVIATIONS,
OFF-OIAG.
CORR.
COEFFICIENTS
C
no
160
T
1,16
160S1(;lI,1I
SQRTCSIG(JtI))
DC
170
J
-=
1,15
K
-=
J
+
1
DO
170
I
=
SIG(I,JI
170
SIGIJ,II
K,16
StGII,J)/!
SH;!l
,Il*SIt;(J,J)'
SIG(I,JI
C
COMPUTATIONS
(0MPLETE.
PREPARE
TO
PRINT
C
IF
(K~'lMT.EQ.OI
GO
TO
ZOO
C
ADJUST
PAPA~TER
VAlUFS
F8R
COMMON
El~~ENT
EFFECTS
KC
=
0
on
t<~0
J
=
1,3
KJ
=
J
+
1
00
190
K
=
KJ,
4
KC
=
KC
+
1
IF(CMSO(KC»
190,190,180
180
GV$O(JI
SQRT(GVSDlJI**Z
+
CMSO(KCI**Z'
GVSO(KI
=
SQRT(GVSD(K)**Z+CMSO(KC)**?)
Rrcp
2160
RTCPZ770
HcP-Z780
RTCRZ790
RTCR2800
RTCR2810
RTCR2820
RTCP2830
R
TCPZ840
PTCP2850
RTCRZ860
R
TCR?f370
RTCRZAfO
RTCR2~C)O
RTCR2900
RTCR7910
RTCR2CJ20
PTCR2930
RTCR2940
RTCR2950
PTCRZ960
RTCR2970
RrCp
2980
RTCR?990
RrCP3001)
RTCR1010
PTCP1020
RTCR3030
RTCR
3040
RTCR3050
R
TCR3060
R
TCR'3070
RTCR3080
R
TCR.3090
RTCR3100
U1
C/l
H
~
~
~
H
o
8
~
t""
~
H
o
~
.....
N
VI
190
CONTI
NUF
C
C
PRINT
COMPARISON
OF
PARENT
AND
SAMPLE
RATIO
TERMS
200
WRITE
(OUT.30)
C
WRITE
IOUT,8)(NAM(J),GVAVtJ),AV(JI,GVSO(JI,SIG(J,J),J=1,4)
LINE
=
LINE
+
15
IF
(ROST(
11
•
EO.
0)
GO
TO
220
C
PRINT
COMPLETE
MATRIX
OF
£ORRELATIONS
AND
STO.-DEVS.
IF
REQUESTED
WRITE
(fJUT,lO)
NAM
C
DO
210
J
=
1,
1
6
210
WR
I
TE
(OU
T
,12)
NA
M
(
J),
(S
I
G
(I
,J
I
,
1==1
,
16
)
LINE
=
LINE
+
19
C
PACK
AND
LIST
rrRRELATIONS
~ETWEEN
TERMS
OF
RATIOS
r.
220
IF
{RQST(2).EQ.01
GO
TO
23<;
IP
=
0
N
=
0
DO
230
J
==
1,3
K
::
J
+
1
no
230
I
=
K,4
IP
=
IP
+
1
P
R
R
F
(
I
P
)
==
NA
M
(
J
I
IP
==
IP
+
1
PRBF(
IP)
==
NAMe
1)
N
=
N
+
1
230
PRRF(12+NI
=
SIGtI,J}
WRITE
(OUT,14)
(PRRF(N),N=1,18)
L
HIE
==
LI
fJ~
+
4
235
IF
(RQSTDI.EQ.O)
GO
TO
245
C
PACK
AND
LIST
CORRELATIONS
RFTWEEN
RATIOS
AND
THEIR
NUMERATORS
KlO
=
4
KP
=
0
N
=
Q
      DO 240 J = 1,4
      KLO = KLO + 1
      DO 240 K = KLO,16,4
      KP = KP + 1
      PRBF(KP) = NAM(K)
      KP = KP + 1
      PRBF(KP) = NAM(J)
      N = N + 1
  240 PRBF(24+N) = SIG(K,J)
      LINE = LINE + 5
      IF (55 - LINE) 243,244,244
  243 WRITE (OUT,NUPG)
      LINE = 0
  244 WRITE (OUT,16) (PRBF(N),N=1,36)
      LINE = LINE + 5
  245 IF (RQST(4).EQ.0) GO TO 255
C     PACK AND LIST CORRELATIONS BETWEEN RATIOS AND THEIR DENOMINATORS
      KLO = 17
      KP = 0
      N = 0
      DO 250 J = 1,4
      KHI = KLO - 1
      KLO = KLO - 3
      DO 250 K = KLO,KHI
      KP = KP + 1
      PRBF(KP) = NAM(K)
      KP = KP + 1
      PRBF(KP) = NAM(J)
      N = N + 1
  250 PRBF(24+N) = SIG(K,J)
      LINE = LINE + 5
      IF (55 - LINE) 253,254,254
  253 WRITE (OUT,NUPG)
      LINE = 0
  254 WRITE (OUT,18) (PRBF(N),N=1,36)
  255 IF (RQST(5).EQ.0) GO TO 280
C     PACK AND LIST CORRELATIONS BETWEEN RATIOS WITH COMMON NUMERATORS
      INK = 1
      KP = 0
      N = 0
      I = 5
      J = 13
      K = 9
  260 KP = KP + 1
      PRBF(KP) = NAM(I)
      KP = KP + 1
      PRBF(KP) = NAM(K)
      N = N + 1
      PRBF(24+N) = SIG(I,K)
      KP = KP + 1
      PRBF(KP) = NAM(I)
      KP = KP + 1
      PRBF(KP) = NAM(J)
      N = N + 1
      PRBF(24+N) = SIG(I,J)
      KP = KP + 1
      PRBF(KP) = NAM(K)
      KP = KP + 1
      PRBF(KP) = NAM(J)
      N = N + 1
      PRBF(24+N) = SIG(K,J)
      I = I + INK
      J = J + INK
      K = K + INK
      IF (KP.LT.24) GO TO 260
      IF (INK - 2) 275,275,270
C     WRITE CORR. WITH COMMON DENOMIN., RETURN FOR NEW COMMAND
  270 LINE = LINE + 5
      IF (55 - LINE) 2700,2701,2701
 2700 WRITE (OUT,NUPG)
      LINE = 0
 2701 WRITE (OUT,22) (PRBF(N),N=1,36)
  271 IF (RQST(7).EQ.0) GO TO 272
C     PRINT CORRELATIONS IN WHICH THE SAME TERM IS THE NUMERATOR OF ONE
C     AND THE DENOMINATOR OF THE OTHER.
      LINE = LINE + 5
      IF (55 - LINE) 2710,2711,2711
 2710 WRITE (OUT,NUPG)
      LINE = 0
 2711 WRITE (OUT,24) NAM(14), NAM(5), NAM(14), NAM(9), NAM(15), NAM(5),
     *NAM(15), NAM(13), NAM(16), NAM(9), NAM(16), NAM(13), NAM(11),
     *NAM(6), NAM(11), NAM(14), NAM(12), NAM(10), NAM(12), NAM(14),
     *NAM(13), NAM(6), NAM(13), NAM(10), SIG(14,5), SIG(14,9),
     *SIG(15,5), SIG(15,13), SIG(16,9), SIG(16,13), SIG(11,6),
     *SIG(11,14), SIG(12,10), SIG(12,14), SIG(13,6), SIG(13,10)
      LINE = LINE + 5
      IF (55 - LINE) 2712,2713,2713
 2712 WRITE (OUT,NUPG)
      LINE = 0
 2713 WRITE (OUT,24) NAM(8), NAM(11), NAM(8), NAM(15), NAM(9), NAM(7),
     *NAM(9), NAM(11), NAM(10), NAM(7), NAM(10), NAM(15), NAM(5),
     *NAM(8), NAM(5), NAM(12), NAM(6), NAM(8), NAM(6), NAM(16),
     *NAM(7), NAM(12), NAM(7), NAM(16), SIG(8,11), SIG(8,15),
     *SIG(9,7), SIG(9,11), SIG(10,7), SIG(10,15), SIG(5,8), SIG(5,12),
     *SIG(6,8), SIG(6,16), SIG(7,12), SIG(7,16)
  272 IF (RQST(8).EQ.0) GO TO 273
C     PRINT CORRELATIONS OF RATIOS WHICH ARE RECIPROCALS OF EACH OTHER.
      LINE = LINE + 4
      IF (55 - LINE) 2720,2721,2721
 2720 WRITE (OUT,NUPG)
      LINE = 0
 2721 WRITE (OUT,26) NAM(13), NAM(14), NAM(9), NAM(15), NAM(5), NAM(16),
     *NAM(10), NAM(11), NAM(6), NAM(12), NAM(7), NAM(8), SIG(13,14),
     *SIG(9,15), SIG(5,16), SIG(10,11), SIG(6,12), SIG(7,8)
  273 IF (RQST(9).EQ.0) GO TO 101
C     PRINT CORRELATIONS BETWEEN RATIOS LACKING COMMON TERMS
      LINE = LINE + 5
      IF (55 - LINE) 2730,2731,2731
 2730 WRITE (OUT,NUPG)
      LINE = 0
 2731 WRITE (OUT,34) NAM(5), NAM(10), NAM(16), NAM(10), NAM(5), NAM(11),
     *NAM(16), NAM(11), NAM(6), NAM(9), NAM(12), NAM(9), NAM(6),
     *NAM(15), NAM(12), NAM(15), NAM(7), NAM(13), NAM(8), NAM(13),
     *NAM(7), NAM(14), NAM(8), NAM(14), SIG(5,10), SIG(16,10),
     *SIG(5,11), SIG(16,11), SIG(6,9), SIG(12,9), SIG(6,15),
     *SIG(12,15), SIG(7,13), SIG(8,13), SIG(7,14), SIG(8,14)
      GO TO 101
C     WRITE CORR. WITH COMMON NUMERATORS, CONTINUE
  275 LINE = LINE + 5
      IF (55 - LINE) 2750,2751,2751
 2750 WRITE (OUT,NUPG)
      LINE = 0
 2751 WRITE (OUT,20) (PRBF(N),N=1,36)
  280 IF (RQST(6).EQ.0) GO TO 271
C     INITIALIZE POINTERS TO PACK CORR. BETWEEN RATIOS WITH COMM. DENOMIN.
      INK = 3
      KP = 0
      N = 0
      I = 5
      J = 7
      K = 6
      GO TO 260
C
C     RECORD CURRENT VALUE OF RANDOM NUMBER GENERATOR SEED.
  300 WRITE (OUT,28) Q
      STOP
      END
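The common-term correlations the program tabulates are the "spurious correlation" effect Chayes studied: two ratios that share a term (here, a common denominator) are correlated even when their parts are mutually independent. A short Monte Carlo sketch reproduces the effect (in Python rather than the FORTRAN of the listing; all names and constants below are illustrative, not taken from the program):

```python
import random

def corr(x, y):
    # Pearson correlation coefficient, the same normalization the
    # listing applies to the off-diagonal SIG(I,J) entries
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(7)
n = 5000
# three mutually independent positive "components"
a = [random.uniform(1.0, 2.0) for _ in range(n)]
b = [random.uniform(1.0, 2.0) for _ in range(n)]
c = [random.uniform(1.0, 2.0) for _ in range(n)]

r_parts = corr(a, b)                            # near zero
r_ratios = corr([u / w for u, w in zip(a, c)],
                [v / w for v, w in zip(b, c)])  # about +0.5
print(r_parts, r_ratios)
```

For equal coefficients of variation the expected correlation between A/C and B/C is close to one half, which is why the ratio correlations printed by the program must be judged against a nonzero parent value rather than against zero.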
APPENDIX 2

C     PROGRAM RANEX
C     EXAMINES RANDOMNESS AND UNIFORMITY OF A SEQUENCE OF NUMBERS BY
C     1. COUNTING FREQUENCIES OF RUNS OF LENGTH K OVER OR UNDER 1/2,
C     2. COUNTING FREQUENCIES OF RUNS OF LENGTH M UP OR DOWN,
C     3. COUNTING FREQUENCIES WITH WHICH PAIRED NUMBERS SEPARATED BY A
C        GIVEN DISTANCE (LAG) FALL IN JOINT SIZE CLASS(M,N), M = N = 1,20.
C        THIS TABULATION IS OMITTED WHEN SAMPLES OF LESS THAN 2000
C        NUMBERS ARE USED.
C     4. COMPUTING CHI-SQUARE(S) ON ASSUMPTION THAT NUMBERS AND NUMBER-
C        PAIRS ARE UNIFORMLY DISTRIBUTED,
C     5. COMPUTING AUTO-CORRELATION COEFFICIENTS FOR LAGS J, J=1,LAG
C
C     INPUT IS FROM SUBROUTINE UNNO(L,F) WHERE L IS A RANDOM INTEGER
C     AND F IS A FLOATED RANDOM NUMBER IN THE RANGE (0,1)
C
C     PROGRAM WRITTEN BY F. CHAYES FOR NSF STATISTICAL GEOLOGY INSTITUTE,
C     CHICAGO CIRCLE, 1972.
C
C     ********************************************************************
C     *                                                                  *
C     *                      CARD INPUT TO RANEX                         *
C     *                                                                  *
C     *    COL.    VARIABLE    FUNCTION OR DEFINITION                    *
C     *                                                                  *
C     *    TITLE CARD (20A4)                                             *
C     *    1-80    TITL        TITLE INFORMATION, 80 CHARACTERS          *
C     *                                                                  *
C     *    COMMAND CARD (3I5,I15)                                        *
C     *    1-5     NOKMP       NUMBER OF ITEMS PER SAMPLE                *
C     *    6-10    ITER        NUMBER OF SAMPLES                         *
C     *    11-15   LAG         INTERVAL BETWEEN MEMBERS OF A PAIR (=0)   *
C     *    16-30   KRNL        INITIAL VALUE OF RANDOM NUMBER SEED       *
C     *                        ( K;  USE K                               *
C     *                        ( 0;  USE 1ST RESIDUE FROM GENERATOR      *
C     *                        (-1;  USE LAST VALUE REACHED ON PRE-      *
C     *                              CEDING PASS OF SAME EXECUTION)      *
C     *                                                                  *
C     ********************************************************************
C
      DIMENSION CHIPR(20), KL(20,20), LN1(4), LN2(4), LN3(4), LN4(4),
     *LN5(4), LN6(4), NRDN(20), NRGT(20), NRLS(20), NRUP(20), NUMF(20),
     *RLOU(20), RLUD(20), RNVEC(100), TITL(20), UKLM(20), AUCR(200)
      DOUBLE PRECISION AUCR,OKMP
      INTEGER OUT
      EQUIVALENCE (RNVEC,CHIPR), (RNVEC(21),NUMF), (RNVEC(41),RLOU),
     *(RNVEC(61),RLUD)
      DATA IN,OUT/5,6/
      DATA UKLM/.05,.10,.15,.20,.25,.30,.35,.40,.45,.50,.55,.60,.65,.70,
     *.75,.80,.85,.90,.95,1./
      DATA LN1,LN2,LN3,LN4/4H RUN,4HS OF,4H LEN,3HGTH,4H NU,4HMBER,
     *4HS  .,3H1/2,4H NU,4HMBER,4HS = ,3H1/2,4H    ,4H TOT,4HAL  ,3H   /
      DATA LN5,LN6/4H RUN,4HS DO,4HWN  ,3H   ,4H RUN,4HS UP,4H    ,3H   /
C     INPUT FORMATS
    1 FORMAT (20A4/3I5, I15)
C     OUTPUT FORMATS
    2 FORMAT ('1',20A4,2X,'LAG =',I4,', LENGTH =',I6,', START =',I13/
     *115X,'SAMPLE NO.',I3//)
    4 FORMAT (3A4,1A3,13I7,4I5/17X,3I5)
    6 FORMAT (//30X,'ARRAY I - FREQUENCIES OF PAIRS IN SIZE CLASSES'//
     *56X,'RANDOM NO.(K +',I3,')'/15X,20F5.2)
    8 FORMAT (10X,F5.2,20I5)
   10 FORMAT (/6X,'CHI-SQ.',2X,20F5.1//)
   12 FORMAT ('0CHI-SQUARE FOR DEPARTURE FROM UNIFORM DISTRIBUTION OF N
     *UMBERS IN RANGE (0,1) IS ',F10.3,'.')
   14 FORMAT (/45X,'AUTOCORRELATION OVER INDICATED LAG DISTANCE(S) -')
   16 FORMAT (4X,'F(LAG NOS.)',20I5)
   18 FORMAT (5X,'EXPECTED',F11.1,12F7.1,4F5.1/19X,3F5.1)
   20 FORMAT ('1*****REQUEST FOR CALCULATIONS WITH LAG = 0 IGNORED*****')
   22 FORMAT (10(I3,F7.4,' . '))
   24 FORMAT (/10X,'FINAL RANDOM NUMBER IS ',I15/1H1)
   26 FORMAT ('1 ERROR.  PROGRAM SKIPS PROBLEM INITIATED BY FAULTY INPUT
     * CARD, SEARCHES FOR TITLE CARD OF NEXT PROBLEM.')
   28 FORMAT (1H0,20X,'SAMPLE TOO SMALL (+ 2000) FOR EFFECTIVE LAG PAIR
     * COMPARISONS.  PROCEED TO UNIFORMITY TEST.'//56X,'RANDOM NO.(K +',
     *I3,') +'/15X,20F5.2)
C
C     READ COMMAND CARD
   60 READ (IN,1,ERR=63,END=350) TITL,NOKMP,ITER,LAG,KRNL
      IF (ITER.EQ.0) ITER = 1
      IF (LAG.GT.0) GO TO 65
      WRITE (OUT,20)
   63 WRITE (OUT,26)
      GO TO 60
C     INITIALIZE RANDOM NUMBER GENERATOR
   65 IF (KRNL) 70,80,90
   70 KR = KRLST
      KRNL = KRLST
      GO TO 100
   80 CALL UNNO(KR,RA)
      KRNL = KR
      GO TO 100
   90 KR = KRNL
      ITR = 0
  100 ITR = ITR + 1
      IF (ITR.GT.ITER) GO TO 60
      IF (ITR.GT.1) KRNL = KRLST
C     GENERATE AND STORE (LAG+1) RANDOM NUMBERS
      L = IABS(LAG) + 1
      DO 110 J = 1,L
  110 CALL UNNO(KR,RNVEC(J))
      IF (L.LT.2) CALL UNNO(KR,RNVEC(2))
C
C     CLEAR ALL FREQUENCY AND OTHER CUMULATORS
      DO 115 I = 1,20
      NRGT(I) = 0
      NRLS(I) = 0
      NRUP(I) = 0
      NRDN(I) = 0
      DO 115 J = 1,20
  115 KL(J,I) = 0
      DO 120 J = 1,200
  120 AUCR(J) = 0.0
C     RECORD RUNS UP-DOWN, OVER-AND-UNDER 1/2, IN FIRST (LAG+1) NUMBERS
      M = 1
      K = 1
      MM = 0
      KK = 0
      DF = RNVEC(2) - RNVEC(1)
      KF = RNVEC(1) + 0.5
      DO 175 J = 2,L
      IF (J.LT.3) GO TO 158
      DL = RNVEC(J) - RNVEC(J-1)
      IF (DF) 125,125,140
C     CURRENT RUN IS DOWN
  125 IF (DL) 130,130,135
  130 M = M + 1
      IF (M.EQ.20) M = 20
      IF (M.GT.MM) MM = M
      GO TO 155
  135 NRDN(M) = NRDN(M) + 1
      GO TO 150
C     CURRENT RUN IS UP
  140 IF (DL) 145,145,130
  145 NRUP(M) = NRUP(M) + 1
  150 M = 1
  155 DF = DL
  158 KS = RNVEC(J) + 0.5
      IF (KS + KF - 1) 160,165,160
C     CURRENT PAIR THE SAME, INCREMENT COUNTER
  160 K = K + 1
      IF (K.GT.20) K = 20
      IF (K.GT.KK) KK = K
      GO TO 175
C     CURRENT PAIR DIFFER, RECORD RUN LENGTH, RESET K
  165 IF (KF.EQ.1) GO TO 170
      NRLS(K) = NRLS(K) + 1
      GO TO 173
  170 NRGT(K) = NRGT(K) + 1
  173 K = 1
  175 KF = KS
C     CLASSIFY AND STORE FREQUENCIES OF PAIRED NUMBERS M AND M+L,
C     CUMULATE PRODUCTS FOR AUTO-CORRELATION COEFFICIENT, MAINTAIN
C     COUNT OF RUNS OF LENGTH K OVER AND UNDER 1/2.
      DO 300 N = 1,NOKMP
C     CUMULATE AUTO-CORR. SUM OF PRODUCTS
      LL = L - 1
      DO 178 I = 1,LL
  178 AUCR(I) = AUCR(I) + RNVEC(L-I)*RNVEC(L)
C     CLASSIFY CURRENT NUMBER-PAIR RNVEC(1) AND RNVEC(L)
      DO 190 I = 1,20
      IF (RNVEC(1) - UKLM(I)) 180,180,190
  180 IC = I
      GO TO 200
  190 CONTINUE
      IC = 20
  200 DO 220 I = 1,20
      IF (RNVEC(L) - UKLM(I)) 210,210,220
  210 IR = I
      GO TO 230
  220 CONTINUE
      IR = 20
  230 KL(IR,IC) = KL(IR,IC) + 1
C     SHIFT ELEMENTS OF RNVEC DOWN ONE CELL, GENERATE NEW RNVEC(L)
      DO 240 JR = 2,L
  240 RNVEC(JR-1) = RNVEC(JR)
      CALL UNNO(KR,RNVEC(L))
C
C     CONTINUE COUNT OF RUNS UP-DOWN, OVER-UNDER 1/2
      DL = RNVEC(L) - RNVEC(L-1)
      IF (DF) 245,245,260
  245 IF (DL) 250,250,255
  250 M = M + 1
      IF (M.EQ.20) M = 20
      IF (M.GT.MM) MM = M
      GO TO 275
  255 NRDN(M) = NRDN(M) + 1
      GO TO 270
  260 IF (DL) 265,265,250
  265 NRUP(M) = NRUP(M) + 1
  270 M = 1
  275 DF = DL
      KS = RNVEC(L) + 0.5
      IF (KS + KF - 1) 280,285,280
  280 K = K + 1
      IF (K.GT.20) K = 20
      IF (K.GT.KK) KK = K
      GO TO 300
  285 IF (KF.EQ.1) GO TO 290
      NRLS(K) = NRLS(K) + 1
      GO TO 295
  290 NRGT(K) = NRGT(K) + 1
  295 K = 1
  300 KF = KS
C     RECORD LAST RUN OF EACH TYPE
      IF (DL) 302,302,304
  302 NRDN(M) = NRDN(M) + 1
      GO TO 306
  304 NRUP(M) = NRUP(M) + 1
  306 IF (KS.GT.0) GO TO 308
      NRLS(K) = NRLS(K) + 1
      GO TO 309
  308 NRGT(K) = NRGT(K) + 1
C     CALCULATE AUTO-CORRELATION COEFF., BEGIN WRITING
  309 KRLST = KR
      OKMP = NOKMP
      DO 3095 J = 1,LL
 3095 AUCR(J) = AUCR(J)/OKMP
      WRITE (OUT,2) TITL,LAG,NOKMP,KRNL,ITR
C
C     CALC. OBS. + EXP. NUMBERS OF RUNS OF LENGTH K OVER + UNDER 1/2
      DO 310 J = 1,KK
  310 RLOU(J) = FLOAT(NOKMP + LAG - J - 4)/2.**(J+1)
C     COMPUTE CHI-SQUARES FOR INDIVIDUALS AND PAIRED FREQUENCIES
      CHIFR = 0.0
      TMP = FLOAT(NOKMP)/400.
      TMF = 20.*TMP
      DO 330 IC = 1,20
      CHIPR(IC) = 0.
      SUMFR = 0.
      NUMF(IC) = 0
      DO 320 IR = 1,20
      CLFR = KL(IR,IC)
      SUMFR = SUMFR + CLFR
      NUMF(IC) = NUMF(IC) + KL(IR,IC)
  320 CHIPR(IC) = CHIPR(IC) + (CLFR - TMP)**2
      CHIPR(IC) = CHIPR(IC)/TMP
  330 CHIFR = CHIFR + (SUMFR - TMF)**2
      CHIFR = CHIFR/TMF
C     COMPLETE WRITING
      WRITE (OUT,4) LN1, (J,J=1,KK)
      WRITE (OUT,4) LN2, (NRLS(J),J=1,KK)
      WRITE (OUT,4) LN3, (NRGT(J),J=1,KK)
      DO 332 J = 1,KK
  332 NRGT(J) = NRGT(J) + NRLS(J)
      WRITE (OUT,4) LN4, (NRGT(J),J=1,KK)
      WRITE (OUT,18) (RLOU(J), J = 1,KK)
      WRITE (OUT,4) LN5, (NRDN(J),J=1,MM)
      WRITE (OUT,4) LN6, (NRUP(J),J=1,MM)
C     CALC. OBS. AND EXP. NUMBERS OF RUNS OF LENGTH K UP-AND-DOWN.
      BOT = 6.
      DO 335 J = 1,MM
      FM = J + 3
      BOT = BOT * FM
      RLUD(J) = 2*((J**2+3*J+1)*(NOKMP+LAG+1)-(J**3+3*J**2-J-4))
      RLUD(J) = RLUD(J)/BOT
  335 NRUP(J) = NRDN(J) + NRUP(J)
      WRITE (OUT,4) LN4, (NRUP(J),J=1,MM)
      WRITE (OUT,18) (RLUD(J),J=1,MM)
      IF (NOKMP.LT.2000) GO TO 345
      WRITE (OUT,6) LAG, (UKLM(I),I=1,20)
      DO 340 J = 1,20
  340 WRITE (OUT,8) UKLM(J),(KL(J,K),K=1,20)
      WRITE (OUT,10) (CHIPR(J),J=1,20)
  345 IF (NOKMP.LT.2000) WRITE (OUT,28) LAG,UKLM
      WRITE (OUT,16) NUMF
      WRITE (OUT,12) CHIFR
      WRITE (OUT,14)
      WRITE (OUT,22) (J,AUCR(J),J=1,LL)
      WRITE (OUT,24) KRLST
      GO TO 100
  350 STOP
      END
      SUBROUTINE UNNO(L,F)
C     PSEUDO-RANDOM NUMBER GENERATOR CODED BY L.W.FINGER
C     CALLING SEQUENCE IS CALL UNNO(L,F)
C     L IS THE LAST RANDOM NUMBER CALCULATED.  IF NOT CHANGED, THE
C     NUMBERS COME OUT IN SEQUENCE.
C     F IS THE RANDOM NUMBER IN REAL FORM AND IS UNIFORMLY DISTRIBUTED
C     IN THE RANGE (0,1)
      DOUBLE PRECISION D
      IF (L.EQ.0) L = 23192344
      D = L
      D = DMOD(513.0D0*D,2147483647.0D0)
      L = D
      L = L + 1
      F = (D + 1.0D0)/2147483647.0D0
      RETURN
      END
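UNNO is a Lehmer-style multiplicative congruential generator. As a sketch, its arithmetic can be transcribed into Python (the multiplier 513, the modulus 2**31 - 1 = 2147483647, and the default seed 23192344 are taken from the listing; the function name and tuple return are just conveniences):

```python
def unno(l):
    """One step of the UNNO generator: returns (l_next, f).

    l_next plays the role of the integer argument L after the call;
    f is the floated random number in (0, 1].
    """
    if l == 0:
        l = 23192344                      # default seed when L is zero
    d = (513 * l) % 2147483647            # multiplicative congruential step
    return d + 1, (d + 1.0) / 2147483647.0

# the sequence is reproducible from the same seed
l, outs = 0, []
for _ in range(5):
    l, f = unno(l)
    outs.append(f)
print(outs)
```

Note that, as in the FORTRAN, the integer handed back (and fed into the next step) is the residue plus one, so successive calls chain on D + 1 rather than on D itself.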
Chapter 6
Computer Perspectives in Geology
Daniel F. Merriam
6.1 GENERALITIES
The introduction of computers into society in the mid-20th century
ushered in the space-age and, with it, some special problems. Although
the idea of computers and computing has been with us for some time, the
explosive development of computers just after World War II was not
foreseen, and the public as a whole was not ready for such dramatic and
rapid changes in this field. So, although we have seen men on the moon,
world-wide weather forecasting via satellites, and breakthroughs in med-
icine and science, some of the 1984 predictions have come to pass.
There has been an invasion of privacy, many trades and practices have
been declared obsolete and redundant, and an impersonal touch has been
added to an increasingly complex society.
The problems that plague the general public also plague science,
and geology is no exception. The numerous changes stemming from the
computer have forced geologists to reevaluate their contributions and
in some instances, the results have been startling. Geologists have
been put in the position of having to think and formulate their prob-
lems and methods of solution in much greater detail. Instead of re-
placing geologists, the computer has created a demand for more and bet-
ter trained ones. One effect the computer has had on geology is to
force a metamorphosis on us, that is a change from a qualitative sci-
ence to a quantitative one. Purists even claim that geology is only
now becoming a science.
138
6. COMPUTER PERSPECTIVES IN GEOLOGY 139
6.2 EARLY BEGINNINGS
Computers date from 1812 when Charles Babbage invented his dif-
ference machine. The machine was designed to operate automatically
without human intervention. Although the underlying principle was
simple, Babbage encountered almost insurmountable problems in building
his machine. It was unfortunate that Babbage was unable to complete
his marvelous invention, but he was too far ahead of his time, and
technology simply had not developed to the necessary extent.
The next important event was the development of a punched-card
system by Herman Hollerith in about 1890. Mr. Hollerith worked for
the U.S. Bureau of the Census, and he recognized early the need for
rapid manipulation of data. Even then it was difficult to summarize
census information. Hollerith's original idea of the punched card is
probably the device most used today for input to the computer.
During World War II the development of computers was accelerated,
especially to aid in solving problems in ballistic missile development
and in the development of the atomic bomb. Rapid strides were made
possible through Claude Shannon's adaptation of Boolean algebra, which
had been formulated many years before by George Boole. Although the first
computers were awkward to use and slow by today's standards, they
served their purpose and laid the foundation for a complex and dynamic
industry.
6.3 USAGE IN GENERAL
Computers may be used for a number of reasons, including (1) sav-
ing time and effort, (2) making use of information in ways that would
be virtually impossible without the aid of computers, and (3) improv-
ing the rigor of thought processes (Harbaugh and Merriam, 1968). The
application of computer techniques to solving problems in the earth
sciences is now important and becoming more so. Many of these appli-
cations were unthought of just a few years ago and indeed just a few
months or even weeks ago. As aptly stated by P. C. Hammer (1966, per-
sonal communication)
"People who are thinking about what they are doing are using com-
puters."
Although some of the techniques were possible before the advent of the
computer, many were not. Execution is presently feasible only because
of the ease and speed with which they can be accomplished and as stated
by P. C. Hammer (1966, personal communication)
"A computer is an intelligence amplifier."
The ability to save time and effort and to some extent the ability
to manipulate data in ways that are impossible otherwise are mainly an
aspect of computers and computing systems themselves. It is essential,
therefore, that some personal involvement in the process of computing
be attained.
Two of the most important aspects of the use of computers are re-
peatability and reliability; that is, anyone can take a set of data
and reproduce the results within the same limits of accuracy by the
same method. It is not possible to do this with qualitative methods
and these two aspects cannot be overestimated (Merriam, 1965).
Obviously there are times and places when and where a computer
can be used to good advantage. These are (1) if there is a large vol-
ume of data, (2) if speed or frequency is necessary in retrieval of
data, or (3) if a particular problem is extremely complex. There is
an area between these extremes where it is easier to do the required
manipulations manually.
There are other considerations in deciding whether to use a computer.
These include (1) the availability of programs, (2) the reducibility
of the data to numeric form, and (3) ease of accessibility and econ-
omic feasibility.
If it is necessary to develop programs, it can be extremely tedi-
ous and expensive. Obviously, if there is a low volume of data, or
speed is no object, or the problem is not too complicated, it would be
desirable to obtain results manually. Fortunately, however, programs
are available from many sources and many may be adapted for a particu-
lar use.
Many geologic data are qualitative and not amenable to computer
analysis. Many data also are incomplete or of poor quality and
essentially useless by today's standards. It may be desirable in many
instances simply to recollect the data. Another requirement is the
necessity that the problem be expressed in a sequence of relatively
simple, logical, algebraic statements. This is necessary because op-
erations to manipulate the data must be explicit, precise, and unam-
biguous. This requirement can usually be met although it may take
considerable thinking and planning on the part of the investigator.
6.4 USAGE IN GEOLOGY
Just as computers date from 1812, modern geology dates from the
late 18th century and the work of James Hutton, a Scot. It is inter-
esting to note that these events both took place in Britain at about
the same time and about 150 years before the application of computers
to geologic problems.
The first earth scientists to use computers were those who were
numerically inclined. It is not surprising then that geophysicists
were the first. They had been using slide rules and desk calculators
for many years, and it was natural to adapt to a new and better method
of processing their data. Other exceptions were those conducting sta-
tistical studies of sediments and their contained fossils and those
working with engineering aspects, such as hydrologists. Where large
quantities of data were handled, techniques were needed for manipulating
them. Just as Herman Hollerith needed assistance with his census data,
geologists needed help with their data processing. It is logical then
that sorting of oil and gas data and stratigraphic information was
accomplished with punch cards in the early 1950's (Parker, 1952).
Early bibliographic systems also utilized punched cards. As data ac-
cumulated, it was necessary to sort faster, and when computers became
generally available, automatic procedures replaced manual ones.
It might be imagined that, because computers have only been com-
mercially available for about 18 years, the utilization of them by
geologists has been most recent. This is indeed true, and only in the
past 10 years has this involvement become increasingly important. The
importance can be judged by the number of geologic publications appear-
ing which have in some way utilized the computer, as shown in Figure 6.1.
The number of publications is increasing rapidly and an obvious in-
crease occurred in the number of reports on research beginning in 1962,
which is the result of the general availability of second-generation
computers. Geology entered the computer age with a publication in a
regularly issued geology journal of a geologically oriented IBM 650
program by Krumbein and Sloss (1958). The original program is repro-
duced in Figure 6.2. Other important events which have affected the
development of computer applications in geology are listed in Table
6.1.
FIGURE 6.1.  Number of publications (vertical axis, 0 to 300) on computer
applications in geology by year, 1950 to 1970.
IBM 650 Basic Program for Three
Percentages and Two Ratios
Zero drum, Start program at 0501, No Subroutines required
Location of
Instruction OP Data Instr. Abbrev. Remarks
0501 70 1501 0502 RD Read data card
0502 65 1501 0503 RAL Code in accumulator
0503 20 0727 0504 STL Store code
0504 65 1503 0505 RAL Total in accumulator
0505 20 0728 0506 STL Store total thickness
0506 65 1504 0507 RAL B to accumulator
0507 16 0660 0508 SL Subtract 10 1010 1010
0508 45 0510 0509 BRNZ Branch on B data
0509 24 0729 0515 STD Store no data code
0510 15 0660 0511 AL Add 10 1010 1010
0511 35 0004 0512 SLT Shift left
0512 64 1503 0513 DVRU Divide by total
0513 31 0001 0514 SRD Shift and round
0514 20 0729 0515 STL Store percent B
0515 65 1505 0516 RAL C to accumulator
0516 16 0660 0517 SL Subtract 10 1010 1010
0517 45 0519 0518 BRNZ Branch on C data
0518 24 0730 0524 STD Store no data code
0519 15 0660 0520 AL Add 10 1010 1010
0520 35 0004 0521 SLT Shift left
0521 64 1503 0522 DVRU Divide by total
0522 31 0001 0523 SRD Shift and round
0523 20 0730 0524 STL Store percent C
0524 65 1506 0525 RAL A to accumulator
0525 16 0660 0526 SL Subtract 10 1010 1010
0526 45 0528 0527 BRNZ Branch on A data
0527 24 0731 0552 STD Store no data code
0528 15 0660 0529 AL Add 10 1010 1010
0529 35 0004 0530 SLT Shift left
0530 64 1503 0531 DVRU Divide by total
0531 31 0001 0532 SRD Shift and round
0532 20 0731 0552 STL Store percent A
0552 71 0727 0553 PCH Punch percentages
0553 65 1501 0554 RAL Code in accumulator
0554 20 0827 0555 STL Store code
0555 65 1503 0556 RAL Total in accumulator
0556 20 0828 0557 STL Store total
0557 65 1506 0558 RAL A to accumulator
0558 45 0570 0559 BRNZ Branch on nonzero A
0559 65 1504 0560 RAL B to accumulator
0560 15 1505 0561 AL Add C
0561 45 0564 0562 BRNZ Branch on nonzero B + C
0562 65 0662 0563 RAL Indeterminate code
0563 20 0829 0586 STL Store code for 0/0
0564 16 0661 0565 SL Subtract 20 2020 2020
FIGURE 6.2
FIGURE 6.2 (continued)
Location of
Instruction OP Data Instr. Abbrev. Remarks
0565 45 0567 0566 BRNZ Branch on B + C data
0566 24 0829 0586 STD Store no data code
0567 65 0663 0568 RAL Infinity code
0568 20 0829 0586 STL Store infinity code
0569 16 0660 0570 SL Subtract 10 1010 1010
0570 45 0572 0571 BRNZ Branch on nonzero A
0571 24 0829 0586 STD Store no data code
0572 65 1504 0573 RAL B to accumulator
0573 16 0660 0574 SL Subtract 10 1010 1010
0574 45 0576 0575 BRNZ Branch on B data
0575 24 0829 0586 STD Store no data code
0576 65 1505 0577 RAL C to accumulator
0577 16 0660 0578 SL Subtract 10 1010 1010
0578 45 0580 0579 BRNZ Branch on C data
0579 24 0829 0586 STD Store no data code
0580 65 1504 0581 RAL B to accumulator
0581 15 1505 0582 AL Add C
0582 35 0003 0583 SLT Shift left
0583 64 1506 0584 DVRU Divide bv A
0584 31 0001 0585 SRD Shift and round
0585 20 0829 0586 STL Store ratio (B + C)/A
0586 65 1505 0587 RAL C to accumulator
0587 45 0597 0588 BRNZ Branch on C data
0588 65 1504 0589 RAL B to accumulator
0589 45 0592 0590 BRNZ Branch on B data
0590 65 0662 0591 RAL Indeterminate code
0591 20 0830 0609 STL Store indeterminate code
0592 16 0660 0593 SL Subtract 10 1010 1010
0593 45 0595 0594 BRNZ Branch on B data
0594 24 0830 0609 STD Store no data code
0595 65 0663 0596 RAL Infinity code
0596 20 0830 0609 STL Store infinity code
0597 16 0660 0598 SL Subtract 10 1010 1010
0598 45 0600 0599 BRNZ Branch on C data
0599 24 0830 0609 STD Store no data code
0600 65 1504 0601 RAL B to accumulator
0601 16 0660 0602 SL Subtract 10 1010 1010
0602 45 0604 0603 BRNZ Branch on B data
0603 24 0830 0609 STD Store no data code
0604 15 0660 0605 AL Add 10 1010 1010
0605 35 0003 0606 SLT Shift left
0606 64 1505 0607 DVRU Divide by C
0607 31 0001 0608 SRD Shift and round
0608 20 0830 0609 STL Store ratio B/C
0609 71 0827 0501 PCH Punch ratio card
0660 10 1010 1010 Const
0661 20 2020 2020 Const
0662 90 9090 9090 Const
0663 99 9999 9999 Const
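The branch-and-code logic of the Figure 6.2 listing can be paraphrased in a modern language. The sketch below (Python; the sentinel names stand in for the numeric constants at drum locations 0660-0663, and the rounding is only suggestive of the 650's shift-and-round) computes the three percentages and the two ratios (B + C)/A and B/C with the same special cases:

```python
# Sketch (not IBM 650 code) of the arithmetic in Figure 6.2.
NO_DATA = None            # stands in for the 10 1010 1010 "no data" constant
INDETERMINATE = "0/0"     # 90 9090 9090 constant
INFINITY = "INF"          # 99 9999 9999 constant

def percentages_and_ratios(a, b, c):
    """Percentages of B, C, A and ratios (B + C)/A and B/C."""
    total = sum(v for v in (a, b, c) if v is not None)
    pct = [NO_DATA if v is None else round(100.0 * v / total, 1)
           for v in (b, c, a)]
    # ratio (B + C)/A, with indeterminate and infinity codes
    if a is None or b is None or c is None:
        r1 = NO_DATA
    elif a == 0:
        r1 = INDETERMINATE if b + c == 0 else INFINITY
    else:
        r1 = round((b + c) / a, 3)
    # ratio B/C, with the same special cases
    if b is None or c is None:
        r2 = NO_DATA
    elif c == 0:
        r2 = INDETERMINATE if b == 0 else INFINITY
    else:
        r2 = round(b / c, 3)
    return pct, r1, r2

pct, r1, r2 = percentages_and_ratios(20.0, 30.0, 50.0)
print(pct, r1, r2)
```

For thicknesses A = 20, B = 30, C = 50 this yields percentages 30, 50, 20 and ratios 4.0 and 0.6, the quantities the 650 program punched on its ratio card.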
TABLE 6.1  Important Events in Computer Applications in Geology

1812  Charles Babbage and his difference machine.
1890  Punched-card system developed by Herman Hollerith.
1941  Z3, first electronic computer, made in Germany.
1944  Mark I, the decimal electromechanical calculator put into
      operation at Harvard.
1946  ENIAC built at the University of Pennsylvania.
1951  UNIVAC, the first commercial computer.
1952  Digital plotters introduced.
1953  First FORTRAN compiler written.
1954  Introduction of the IBM 650, the first mass-produced com-
      puter.
1958  W. C. Krumbein and L. L. Sloss published the first geologi-
      cally oriented computer program in a recognized geologic
      journal.
      Transistorized second-generation computers introduced.
      The ALGOL language was jointly introduced in several coun-
      tries.
1961 Establishment of the symposia series "Computer Applications
in the Mineral Industries" by the University of Arizona.
1963 Announcement of third-generation microcircuit computers.
First regular publication of geologic computer programs as
Special Distribution Publications of the Kansas Geological
Survey.
First year more than 100 papers published on computer appli-
cations in geology.
1964 Time-sharing system successfully used at Dartmouth College.
1966 First series of geologic publications to deal exclusively
with computer programs established by Kansas Geological
Survey.
First of eight colloquia on "Computer Applications in the
Earth Sciences" sponsored by the Kansas Geological Survey.
Establishment of an Associate Editor for Computer Applica-
tions for the AAPG Bulletin.
1967 American Association of Petroleum Geologists Committee on
Electronic Data Storage and Retrieval formed.
COGEODATA (IUGS Committee on Storage, Automatic Processing,
and retrieval of geologic data) formed.
TABLE 6.1
(Continued)
Publication in IUGS Geological Newsletter, first international
attempt to standardize description of mineral deposits in
computer processable form.
1968 IAMG founded in Prague at the IGC.
1969 First issues of the Journal of the IAMG published.
GEOCOM Bulletin, an international current awareness publica-
tion, initiated.
US Geological Survey publishes its first Computer Contribu-
tion series.
First book in a series on "Computer applications in the
earth sciences" (published by Plenum Publ. Corp.)
1970 An informal research group on Computer Technology formed by
SEPM.
6.5 PATTERNS AND TRENDS
For 150 years geologists have been collecting data. By its nature
geology has been a historical and observational science. By the mid-
20th century, however, this emphasis was beginning to change to one of
understanding geologic processes (Sylvester-Bradley, 1972). This
metamorphosis from where to how is being accelerated, and obviously the
next logical step is one of understanding why, that is putting together
the whole story. So, even in the short time geologists have utilized
computers, a progression through several stages of computing environ-
ments has taken place (other disciplines have also undergone this
transformation, for example, chemistry and physics).
Early applications were mainly analytical. Next was a stage of
collecting data in machinable form for use in predictive techniques
utilizing methods developed and well tested in other disciplines as
indicated in Table 6.2. Simulation followed and results are being
evaluated in the 1970's. This progression in development of the sub-
ject is paralleled and recorded by examples in the literature (Preston,
1969). Past and future trends are shown in Figure 6.3.
TABLE 6.2  Historical Record of Stage of Integration of New Concepts
or Techniques in a Discipline

Publications
    Discovery:    Papers general with suggestions of possibilities
    Development:  Papers demonstrate use of different techniques
    Application:  Papers acknowledge use of computers and source of
                  programs; different problems tried
    Assimilation: Completely integrated

Data
    Discovery:    None
    Development:  Artificial
    Application:  Sample data sets
    Assimilation: Real data in quantities necessary to solve problems

Computer Programs
    Discovery:    None
    Development:  "Borrowed" from other fields intact
    Application:  Modified and adapted from other fields with some
                  geologic bent
    Assimilation: Programs written with only parts of "canned" programs
                  used but specific for purpose

References
    Discovery:    Practically none
    Development:  Mostly from other disciplines
    Application:  Everything written on the subject in geology
    Assimilation: Citation of only those papers of pertinence to work
FIGURE 6.3  Past and future trends in computer applications in
geology, 1960's through 1980's.
The future is naturally unknown. Developments are too rapid and
users are heavily dependent on developments in the computer industry
and other disciplines for advancement of methods and ideas. It is
clear at this moment in time that we are moving from the application
to assimilation stage in most areas and that we are using simulation
to test real situations and learn of processes in an effort to under-
stand why.
Past developments will have a bearing on future events. Individ-
uals, especially in universities, have been developing and adapting
techniques for solving geologic problems (mainly because of lack of
funds and accessibility to data files). Simultaneously in industry,
large data files have been converted to machinable form. The develop-
ment of standards and requirements for programs and data-file formats
by government agencies and other interested organizations has made
considerable progress (e.g., Hubaux, 1970; Robinson, 1970). The wedding
of imaginative new techniques applied to real data of known quality
in readily accessible, compatible files will surely result in a com-
pletely integrated system leading to significant findings. One warning,
however: geologists must keep their objectives in mind and define
their problems sharply to keep from becoming a tool of the computer
rather than its master.
6. COMPUTER PERSPECTIVES IN GEOLOGY 149
REFERENCES
Harbaugh, J. W., and Merriam, D. F., 1968, Computer applications in
stratigraphic analysis: New York, John Wiley & Sons, 282 p.
Hubaux, A., 1970, Description of geological objects: Jour. Math. Geol-
ogy, v. 2, no. 1, p. 89-95.
Krumbein, W. C., and Sloss, L. L., 1958, High-speed digital computers
in stratigraphic and facies analysis: Am. Assoc. Petroleum Geol-
ogists Bull., v. 42, no. 11, p. 2650-2669.
Merriam, D. F., 1965, Geology and the computer: New Scientist, v. 26,
no. 444, p. 513-516.
Merriam, D. F., 1969, Computer utilization by geologists, in Symposium
on computer applications in petroleum exploration: Kansas Geol.
Survey Computer Contr. 40, p. 1-4.
Parker, M. A., 1952, Punched-card techniques speed map making (abs.):
Geol. Soc. Amer. Bull., v. 63, no. 12, pt. 2, p. 1288.
Preston, F. W., 1969, Systems analysis--the next phase for computer
development in petroleum engineering, in Computer applications in
the earth sciences: New York, Plenum Press, p. 177-189.
Robinson, S. C., 1970, A review of data processing in the earth sci-
ences in Canada: Jour. Math. Geology, v. 2, no. 4, p. 377-397.
Sylvester-Bradley, P. C., 1972, Geobiology and the future of palaeon-
tology: Jour. Geol. Soc., v. 128, pt. 2, p. 109-117.
Chapter 7
Problem Set in Geostatistics
R. B. McCammon
7.1 A PALEONTOLOGIST'S DILEMMA
Six fossils, Acanthus minerva, Gyro pipus, Rega elegans, Acanthus
exerta, Rega veliforma, and Gyro robusta were found together in a tray
at the museum. From an outside label, it is known that Rega veliforma,
Acanthus exerta, and Gyro robusta are marine, fresh water, and brack-
ish water species (not necessarily in that order) each from a differ-
ent locality. Acanthus minerva was collected in Illinois. The fresh
water species was collected in New England. Rega elegans is known not
to be marine. The fresh water species and one of the others, which is
a marine species, were collected from the same locality. Gyro robusta
is larger than the marine species. One of the species of the same
genus as the fresh water species was collected in California. Which
one is the marine species?
No doubt you will unravel the logic for this dilemma. Do you
think, however, that you could write a computer program to solve the
problem?
7.2 PARTICLE SIZE DISTRIBUTION IN THIN SECTION
In a recent paper by Rose (1968), it was shown that the probabil-
ity p that an intersection figure cut at random in a block of thickness
t from a single sphere of diameter D will have a diameter falling be-
tween the limits da and db is given by
7. PROBLEM SET IN GEOSTATISTICS 151
        p = (1/t)[√(D² - da²) - √(D² - db²)]                  (7.2.1)

where 0 < da ≤ db ≤ D < t.
1. Let D = t. Show that equation (7.2.1) is a probability distribu-
tion. What is the expected value? Prepare a frequency plot of
this distribution.
2. Rose went on to show that if the number of spheres of diameter Di
   equal to id was denoted as Ni, where da = (j - 1)d and db = jd,
   then the expected number of apparent diameters between da and db
   was given by

        Cij = Ni φij

   where

        φij = (1/t)[√(Di² - da²) - √(Di² - db²)]     i ≥ j
        φij = 0                                      i < j

   Write a computer program to generate the elements of the matrix
   φ for m = 10.
3. Suppose you are given the following observed frequencies of par-
ticle diameters determined from thin section:
   Size class          Frequency
   (smallest)  1            0
               2           16
               3           87
               4          155
               5          150
               6           65
               7           32
               8            8
               9            4
   (largest)  10            1
Using the matrix generated above, determine the true frequency
distribution of particle diameters. Interpret your results.
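Items 2 and 3 can be sketched together. This is only one way to do it, and the helper names (`phi_matrix`, `unfold`) are illustrative rather than from the text; the thickness t is absorbed into a common factor d/t, which merely rescales the recovered counts and drops out of the relative distribution.

```python
import math

def phi_matrix(m, d_over_t=1.0):
    # phi[i][j]: chance that a sphere of diameter i*d shows an apparent
    # diameter in ((j-1)*d, j*d]; zero when i < j, since an intersection
    # circle cannot exceed its parent sphere.  Indices run 1..m.
    phi = [[0.0] * (m + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, i + 1):
            phi[i][j] = d_over_t * (math.sqrt(i * i - (j - 1) ** 2)
                                    - math.sqrt(i * i - j * j))
    return phi

def unfold(observed, phi):
    # Solve observed[j] = sum_{i>=j} N[i] * phi[i][j] by back-substitution,
    # starting from the largest size class; the triangular structure of
    # phi makes the solution direct.
    m = len(observed)
    N = [0.0] * (m + 1)
    for j in range(m, 0, -1):
        excess = sum(N[i] * phi[i][j] for i in range(j + 1, m + 1))
        N[j] = (observed[j - 1] - excess) / phi[j][j]
    return N[1:]

observed = [0, 16, 87, 155, 150, 65, 32, 8, 4, 1]   # frequencies of item 3
true_counts = unfold(observed, phi_matrix(10))
```

Negative recovered counts, if any, signal sampling noise in the observed frequencies rather than a failure of the method.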
152 R.B. McCAMMON
7.3 PARTICLE DIAMETER SAMPLING EXPERIMENT
We have just examined the probability distribution for the appa-
rent diameter of a spherical particle of diameter D cut at random in
thin section. Now let us simulate the process. Imagine a spherical
grain
[Figure: a sphere of diameter D cut by a plane parallel to PP'; Y is
the height of the cutting plane above the center O, and X the apparent
radius of the intersection circle]
cut by a random plane parallel to PP'. Y is defined as a random vari-
able distributed uniformly between zero and D/2, the particle radius.
We have
        p(y) = 2/D        0 ≤ y ≤ D/2
             = 0          otherwise
as the probability density of Y. The apparent diameter 2X is related
to Y by

        X² + Y² = (D/2)²

or

        X = √((D/2)² - Y²)
is a random variable also. We wish to find the probability density of
2X by experiment.
1. Let D/2 = 1. Use a random number generator function to generate
a sample of 100 values of Y distributed uniformly between 0 and 1.
Calculate 2X for each Y and prepare a histogram for 2X in inter-
vals of 0.2. Calculate the mean and standard deviation. Retain
the mean value.
2. Repeat the sampling process above 100 times, recording only the
mean value of 2X. Prepare a histogram of the mean. How would
you describe the shape of this distribution?
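The two sampling steps above can be sketched as follows (stdlib only; the seed and the helper name `one_sample` are arbitrary choices, not from the text):

```python
import math
import random
import statistics

random.seed(7)
R = 1.0  # D/2 = 1, so apparent diameters 2X lie in (0, 2]

def one_sample(n=100):
    # n random cut heights Y ~ U(0, D/2); apparent diameter 2X = 2*sqrt(R^2 - Y^2)
    return [2.0 * math.sqrt(R * R - random.uniform(0.0, R) ** 2)
            for _ in range(n)]

# Part 1: one sample of 100 apparent diameters
sample = one_sample()
m1, s1 = statistics.mean(sample), statistics.stdev(sample)

# Part 2: repeat 100 times, keeping only the per-sample mean of 2X.
# The histogram of these means should look approximately normal
# (central limit theorem), centered near the theoretical mean pi/2.
means = [statistics.mean(one_sample()) for _ in range(100)]
```

The theoretical check: with Y uniform on (0, 1), E[2X] = 2∫₀¹√(1 − y²) dy = π/2 ≈ 1.571.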
For a fuller explanation of the methods required to solve the two
following problems, the reader is referred to McCammon (1969). The
reference is given in Section 2.10 in Chapter 2.
7.4 LINEAR REGRESSION OF POROSITY DATA. I
The data given in Table 7.1 were gathered from well logs and rock
cores taken from drill holes that penetrated subsurface formations in
the Chicagoland area. At issue is the relationship between log-
derived and core-derived estimates of porosity. The values given are
expressed as percentages.
TABLE 7.1

Sample  Log-      Core-         Sample  Log-      Core-
no.     derived   derived       no.     derived   derived
        porosity  porosity              porosity  porosity
 1        10.0       5.5         26       10.0       9.6
 2         9.0       3.6         27        5.0      10.3
 3         7.0       3.6         28        7.0       4.5
 4         6.0       4.9         29        8.0       6.0
 5         9.0       7.1         30        9.0       6.7
 6         7.0       2.0         31        5.0       4.1
 7        10.0       8.5         32        8.0       4.5
 8         5.0       5.2         33        9.0       6.5
 9         7.0       2.6         34        7.0       3.3
10         0.0       1.9         35        3.0       2.1
11         5.0       6.1         36        7.0       2.5
12         6.0       9.3         37        7.0       6.8
13         9.0       6.9         38       10.0       3.7
14         8.0       4.3         39        7.0       6.0
15         5.0       3.3         40        5.0       3.4
16         6.0       2.5         41        8.0       2.2
17         6.0       4.8         42        4.0       1.8
18         5.0       2.4         43        5.0       2.9
19         8.0       3.8         44        8.0       2.6
20        15.0      18.4         45       16.0      15.3
21        16.0      14.7         46        4.0      16.9
22         7.0      10.9         47        5.0      15.7
23        12.0      12.5         48       14.0      12.4
24        14.0      18.6         49       21.0      22.9
25        22.0      22.1         50       21.0      21.8
1. As a first step, prepare a scatter diagram of the log-derived
porosity versus the core-derived porosity values on graph paper
by plotting the log-derived porosity value along the ordinate and
the core-derived porosity value along the abscissa for each sample.
Draw in the "best" fitting line by eye. Should the line be
made to pass through the origin?
2. As measurements, core-derived porosity estimates are conceded to
yield greater accuracies than log-derived estimates. However,
rock cores must be analyzed for porosity singly using laboratory
apparatus, whereas the acoustic log (the geophysical log commonly
used for porosity determination) is recorded continuously with
depth and therefore a continuous porosity profile in the borehole
is generated. Now consider the log-derived porosity to be the
dependent variable and calculate the intercept and slope param-
eters of the "best" fitting line using ordinary linear regression.
Draw this line on the scatter diagram and compare it with the
previous one.
3. It is known that the depth of investigation for the acoustic log
extends beyond the borehole and into the formation. The trans-
mitter-receiver distance for the tool is much greater also than
the length of core used in porosity determinations. Thus, on the
average, log-derived estimates of porosity may be as accurate (or
representative) as core-derived estimates of porosity and thus
both are dependent variables. Assuming the error estimates are
approximately equal, calculate the intercept and slope parameters
of the "best" fitting line. Draw in this line and compare it
with the others.
4. Finally, past experience can guide us in establishing a fixed
relationship between two variables subject to errors. Suppose it
is known that core-derived porosity estimates are accurate to
within one percent and that log-derived porosity estimates are
accurate to within five percent. Calculate the intercept and
slope parameters of the "best" fitting line and compare this line
with the others.
5. There are obviously many approaches to analyzing these data. What
is important to note is that prior knowledge after all is a part
of the data.
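The three fitted lines of items 2 through 4 can be sketched with one routine. One common way to handle errors in both variables (the text does not name a specific method) is Deming regression, whose slope depends only on δ, the ratio of the two error variances: δ → ∞ recovers ordinary least squares of y on x (item 2), δ = 1 is the equal-error case (item 3), and the 5-percent/1-percent case of item 4 gives δ = 25. The data are the 50 pairs of Table 7.1.

```python
import math
import statistics

# Table 7.1: y = log-derived porosity, x = core-derived porosity
log_p = [10, 9, 7, 6, 9, 7, 10, 5, 7, 0, 5, 6, 9, 8, 5, 6, 6, 5, 8, 15,
         16, 7, 12, 14, 22, 10, 5, 7, 8, 9, 5, 8, 9, 7, 3, 7, 7, 10, 7,
         5, 8, 4, 5, 8, 16, 4, 5, 14, 21, 21]
core_p = [5.5, 3.6, 3.6, 4.9, 7.1, 2.0, 8.5, 5.2, 2.6, 1.9, 6.1, 9.3,
          6.9, 4.3, 3.3, 2.5, 4.8, 2.4, 3.8, 18.4, 14.7, 10.9, 12.5,
          18.6, 22.1, 9.6, 10.3, 4.5, 6.0, 6.7, 4.1, 4.5, 6.5, 3.3, 2.1,
          2.5, 6.8, 3.7, 6.0, 3.4, 2.2, 1.8, 2.9, 2.6, 15.3, 16.9, 15.7,
          12.4, 22.9, 21.8]

def deming(x, y, delta):
    # delta = (error variance of y) / (error variance of x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = ((syy - delta * sxx
              + math.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2))
             / (2 * sxy))
    return slope, my - slope * mx

b_ols = deming(core_p, log_p, 1e12)  # item 2: all error assigned to y
b_eq  = deming(core_p, log_p, 1.0)   # item 3: equal error variances
b_15  = deming(core_p, log_p, 25.0)  # item 4: 5% vs 1% -> (5/1)^2 = 25
```

Because errors in x attenuate the ordinary least-squares slope, the error-in-both-variables fits come out at least as steep as the OLS line.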
7.5 LINEAR REGRESSION OF POROSITY DATA. II
On the assumption that the core-derived estimates of porosity are
exact, we may wish to examine the nature of the regression, treating
the log derived porosity as a dependent variable having independent
and identically distributed normal random errors for successive ob-
servations. We can construct an analysis of variance table (Table 7.2)
and proceed to test various hypotheses.
Let R2 = (SS due to regression)/(total variation) be defined as
the proportion of total variation explained by the regression. Calcu-
late R2 for both models. Perform an F test on the significance of the
regression for each model at the 95 percent level.
For the linear model given by

        Y = αX + β

make a t test at the 95 percent significance level that α = 0.
For the model given by

        Y = αX

make a test at the 95 percent significance level that α = 1.
TABLE 7.2

Source of        SS,               D.F.,
variation        sum of squares    degrees of freedom    Mean square

Due to           Σ(ŷ - ȳ)²         1
regression

Residual         Σ(y - ŷ)²         n - 2                 s² = SS/(n - 2)

Total            Σ(y - ȳ)²         n - 1
variation
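The quantities above can be computed with a pair of short routines. The function names are illustrative, and the data shown are a small hypothetical set standing in for Table 7.1; note that for a single predictor the F statistic equals the square of the t statistic for the slope.

```python
import math
import statistics

def anova_ols(x, y):
    # Full model Y = a*X + b.  Returns (R2, F, t), where F tests the
    # significance of the regression and t tests H0: a = 0.
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    a = sxy / sxx
    ss_reg = a * sxy                # = sum of (yhat - ybar)^2
    s2 = (syy - ss_reg) / (n - 2)   # residual mean square
    return ss_reg / syy, ss_reg / s2, a / math.sqrt(s2 / sxx)

def t_through_origin(x, y, a0=1.0):
    # No-intercept model Y = a*X; t statistic for H0: a = a0 on n - 1 d.f.
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    a = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    s2 = sum((yi - a * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    return (a - a0) / math.sqrt(s2 / sxx)

# hypothetical core/log porosity pairs, for illustration only
x = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
y = [3.1, 4.8, 6.2, 7.9, 9.5, 13.0]
R2, F, t = anova_ols(x, y)
t1 = t_through_origin(x, y, 1.0)
```

The computed F and t are then compared against tabulated F(1, n − 2) and t(n − 2) critical values at the 95 percent level.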
7.6 SUNSPOTS AND EARTHQUAKES
It has been suggested (Simpson, 1967) that solar activity may be
a triggering mechanism for earthquakes. Table 7.3, extracted from
Fig. 2 of Simpson's article, covers the period from January 1, 1950
through June 30, 1963 and lists the average number of earthquakes oc-
curring per day (>5.5 on the Richter scale) for those days in which
the Zurich sunspot number fell within the stated intervals. A higher
sunspot number is associated with greater solar activity.
TABLE 7.3
Zurich Average no. Zurich Average no.
sunspot of sunspot of
number earthquakes number earthquakes
0-5 3.2 110-115 5.2
5-10 3.6 115-120 5.6
10-15 3.5 120-125 5.8
15-20 4.0 125-130 5.8
20-25 3.7 130-135 5.9
25-30 3.7 135-140 5.9
30-35 4.1 140-145 6.1
35-40 4.1 145-150 6.3
40-45 4.4 150-155 5.1
45-50 3.9 155-160 4.7
50-55 3.9 160-165 5.7
55-60 4.0 165-170 5.8
60-65 3.9 170-175 5.8
65-70 4.6 175-180 5.8
70-75 4.0 180-185 5.7
75-80 4.5 185-190 6.0
80-85 5.2 190-195 5.5
85-90 4.0 195-200 5.4
90-95 5.7 200-205 6.0
95-100 5.4 205-210 5.0
100-105 5.3 210-215 6.2
105-110 5.4 215-220 5.5
From these data, calculate the correlation coefficient between the
daily average earthquakes recorded and the Zurich sunspot number. Pre-
pare a scatter plot. Do you regard this correlation as significant?
Does this imply a causal relationship? Is there an inherent fallacy
in casting the data in this form?
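A minimal sketch of the correlation, keying the class midpoints and earthquake averages in from Table 7.3. Bear in mind, as the last question hints, that correlating binned averages rather than raw daily observations tends to inflate r.

```python
import math

# Class midpoints of the Zurich sunspot intervals (0-5 -> 2.5, etc.)
# and the corresponding daily earthquake averages from Table 7.3
sunspot = [2.5 + 5.0 * k for k in range(44)]
quakes = [3.2, 3.6, 3.5, 4.0, 3.7, 3.7, 4.1, 4.1, 4.4, 3.9, 3.9,
          4.0, 3.9, 4.6, 4.0, 4.5, 5.2, 4.0, 5.7, 5.4, 5.3, 5.4,
          5.2, 5.6, 5.8, 5.8, 5.9, 5.9, 6.1, 6.3, 5.1, 4.7, 5.7,
          5.8, 5.8, 5.8, 5.7, 6.0, 5.5, 5.4, 6.0, 5.0, 6.2, 5.5]

def pearson(x, y):
    # Product-moment correlation coefficient
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson(sunspot, quakes)
```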
7.7 A BEGINNING AND AN END
In a recent book, Shaw (1964) has proposed a new method for bio-
stratigraphic correlation. For two fossiliferous stratigraphic columns,
he has proposed that the time correlation be based on the first and last
occurrences of the species present. Table 7.4, taken from Shaw, lists
the elevations of first and last occurrences of fossil species for two
stratigraphic sections from the Upper Cambrian Riley Formation, Llano
Uplift, Texas.
TABLE 7.4

                             Morgan Creek      White Creek
Species                      Base     Top      Base     Top
Kormagnostus simplex          299     485       460     655
Kinsabia varigata             373     464       561     653
Opisthotreta depressa         419     494       582     725
Spicule B                     419     504       561     725
Tricrepicephalus coria        419     529       628     706
Meteoraspis metra             446     485       628     677
Kingstonia pontotocensis      453     475       628     706
Raaschella ornata             529     532       750     751
Aphelaspis walcotti           530     561       744     771
Angulotreta triangularis      538     570       756     779
Prepare a scatter plot of these data showing the elevation of oc-
currence of each species for the two stratigraphic sections. Calculate
the correlation coefficient for:
1. First occurrences
2. Last occurrences
3. Combined occurrences
Do you think there are significant differences in the correlations?
How would you account for these? How would you go about establishing
the equation of correlation for these two sections?
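The three correlations can be sketched directly, with the elevations keyed in from Table 7.4 (bases and tops in species order):

```python
import math

# Elevations of first (Base) and last (Top) occurrences, Table 7.4
morgan_base = [299, 373, 419, 419, 419, 446, 453, 529, 530, 538]
morgan_top  = [485, 464, 494, 504, 529, 485, 475, 532, 561, 570]
white_base  = [460, 561, 582, 561, 628, 628, 628, 750, 744, 756]
white_top   = [655, 653, 725, 725, 706, 677, 706, 751, 771, 779]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r_first = pearson(morgan_base, white_base)          # first occurrences
r_last = pearson(morgan_top, white_top)             # last occurrences
r_all = pearson(morgan_base + morgan_top,           # combined
                white_base + white_top)
```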
7.8 HELMERT TRANSFORMATION
The purpose of this exercise is to make you familiar with the geo-
metrical interpretation of closed data arrays and to give you a feel
for manipulating percentage data in three- or higher-dimensional space.
The Helmert transformation is defined for n-dimensional space by
the matrix

    P = [   1/√n           1/√n          1/√n     ...    1/√n
           -1/√2           1/√2          0        ...    0
           -1/√6          -1/√6          2/√6     ...    0
             ...
           -1/√(n(n-1))   -1/√(n(n-1))   ...    (n-1)/√(n(n-1)) ]

where the rows of P are the unit vectors of the transformation referred
to the initial Euclidean axes.

For an observation vector x = (x1, x2, ..., xn) defined so that

     n
     Σ xi = 1        xi > 0
    i=1

the origin first is relocated by defining

    yi = xi - 1/n        i = 1, ..., n

The transformed vector z = (z1, z2, ..., zn) is defined by

    z = Py

or, expressed algebraically,

    z1 = 0
    z2 = (1/√2)(y2 - y1)
    z3 = (1/√6)(2y3 - y1 - y2)
    ...
    zn = (1/√(n(n-1)))[(n - 1)yn - (y1 + ... + y(n-1))]
Using triangular graph paper, superimpose the coordinate axes de-
fined by the He1mert transformation for n = 3 on the triangular coor-
dinate system. Plot the following points and check to see that the
Helmert coordinates give rise to the same location as would be obtained
using triangular coordinates.
        x1      x2      x3
1      0.40    0.30    0.30
2      0.60    0.10    0.30
3      0.10    0.40    0.50
For the fourfold system, n = 4, determine the Helmert coordinates for
the following three points in 4-space and draw the intersection figure
for the tetrahedral representation of these points.
x1 x2 x3 x4
1 0.40 0.00 0.00 0.60
2 0.30 0.00 0.70 0.00
3 0.20 0.80 0.00 0.00
Position the Helmert coordinate axes on the tetrahedron. What are the
coordinates of the normal vector to the intersection figure formed by
the three points?
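The transformation above can be sketched directly; the function names are illustrative. Since the percentages sum to one, the first coordinate is always zero, which is why n closed components plot in n − 1 dimensions.

```python
import math

def helmert_matrix(n):
    # Row 1 is 1/sqrt(n) everywhere; row k (k >= 2) has -1/sqrt(k(k-1))
    # in its first k-1 positions, (k-1)/sqrt(k(k-1)) in position k,
    # and zeros after.  The rows form an orthonormal set.
    P = [[1.0 / math.sqrt(n)] * n]
    for k in range(2, n + 1):
        c = 1.0 / math.sqrt(k * (k - 1))
        P.append([-c] * (k - 1) + [(k - 1) * c] + [0.0] * (n - k))
    return P

def helmert_coords(x):
    # Relocate the origin to the centroid (y = x - 1/n), then apply z = Py
    n = len(x)
    y = [xi - 1.0 / n for xi in x]
    P = helmert_matrix(n)
    return [sum(P[r][i] * y[i] for i in range(n)) for r in range(n)]

z = helmert_coords([0.40, 0.30, 0.30])   # first point of the exercise
```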
For the following two problems, the reader should refer to
McCammon (1969) for a more complete discussion of the methods involved.
This reference can be found in Section 2.10 in Chapter 2.
7.9 A SPINE TO REMEMBER
The Florida crown conch (Melongena corona) is found commonly in
the intertidal areas of Florida and Alabama, usually in the shade of
mangrove trees. The shell is characterized by one or more rows of
open spines on the shoulder. Colonial differences exist, however,
which has led some workers to erect subspecies. Imagine
there are 46 specimens of the Florida crown conch in trays before you.
Apart from the variation in size of the shell (as measured by its
length), the most noticeable morphologic difference between specimens
is the presence or absence of the development of lower spines. To aid
you in the subsequent analysis, Table 7.5 lists for each specimen the
shell length (expressed in centimeters) and the number (if any) of lower
spines. Imagine that four of the specimens have been taped in such a
way that the presence or absence of lower spines cannot be ascertained.
These four specimens constitute unknowns which can be used to test the
efficiency of any subsequent classification.
1. Prepare separate histograms based on shell length for shells which
have and do not have lower spines.
2. There is clear evidence that the larger specimens tend to have
lower spines, whereas the smaller specimens do not. The question
is whether this difference has statistical significance. For the
purposes here, we assume that the specimens represent a random
sample from whatever parent population or populations we wish to
define. We can perform first a t test of significance between
the mean shell length for the two groups. Do this assuming first
equal variance for the two populations and second, unequal vari-
ance.
3. Since a significant difference does exist, we can devise a dis-
criminant function based on shell length that will predict the
presence or absence of lower spines for a particular specimen.
For these data, construct such a function and estimate the proba-
bility of a wrong prediction.
4. For the four taped specimens, predict for each whether lower
spines are present.
TABLE 7.5
Sample Length, Number Sample Length, Number
no. cm of spines no. cm of spines
1 4.50 2 24 3.78 3
2 3.23 0 25 2.92 0
3 4.21 6 26 4.44 7
4 3.39 0 27 3.82 4
5 3.88 0 28 4.45 8
6 4.73 5 29 4.57 5
7 2.65 0 30 3.54 0
8 3.94 0 31 3.22 0
9 3.84 0 32 2.66 0
10 4.02 0 33 4.24 0
11 4.08 4 34 2.94 0
12 4.15 10 35 3.36 0
13 3.36 0 36 4.16 8
14 3.48 0 37 4.02 0
15 3.38 0 38 4.73 3
16 3.40 0 39 4.61 2
17 3.54 7 40 4.79 5
18 2.85 0 41 2.74 0
19 2.71 0 42 4.64 0
20 3.32 0 43 3.78
21 4.59 0 44 4.09
22 4.19 4 45 4.91
23 3.44 0 46 3.96
5. Having found the truth about the four taped specimens, (results
on specimens available on request) add these to the sample data
and construct a new discriminant function and estimate the proba-
bility of a wrong prediction.
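Items 2 through 4 can be sketched as follows, with the shell lengths split out of Table 7.5. The midpoint cutoff rule assumes equal group variances and equal priors (an assumption of this sketch, not stated in the text), under which the misclassification probability is approximately Φ of minus half the standardized separation.

```python
import math
import statistics

# Shell lengths (cm) from Table 7.5, split by presence of lower spines
spined = [4.50, 4.21, 4.73, 4.08, 4.15, 3.54, 4.19, 3.78, 4.44, 3.82,
          4.45, 4.57, 4.16, 4.73, 4.61, 4.79]
plain = [3.23, 3.39, 3.88, 2.65, 3.94, 3.84, 4.02, 3.36, 3.48, 3.38,
         3.40, 2.85, 2.71, 3.32, 4.59, 3.44, 2.92, 3.54, 3.22, 2.66,
         4.24, 2.94, 3.36, 4.02, 2.74, 4.64]
taped = [3.78, 4.09, 4.91, 3.96]          # samples 43-46

def t_pooled(a, b):
    # Two-sample t statistic assuming equal population variances
    na, nb = len(a), len(b)
    sp2 = (((na - 1) * statistics.variance(a)
            + (nb - 1) * statistics.variance(b)) / (na + nb - 2))
    return ((statistics.mean(a) - statistics.mean(b))
            / math.sqrt(sp2 * (1 / na + 1 / nb)))

def t_welch(a, b):
    # Two-sample t statistic without the equal-variance assumption
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)

# One-dimensional discriminant: cut at the midpoint of the group means
cut = 0.5 * (statistics.mean(spined) + statistics.mean(plain))
predictions = ["spines" if x > cut else "no spines" for x in taped]

# Estimated misclassification probability, Phi(-separation / (2 sp))
n1, n2 = len(spined), len(plain)
sp = math.sqrt(((n1 - 1) * statistics.variance(spined)
                + (n2 - 1) * statistics.variance(plain)) / (n1 + n2 - 2))
sep = statistics.mean(spined) - statistics.mean(plain)
p_err = 0.5 * math.erfc(sep / (2 * sp * math.sqrt(2)))
```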
7.10 NEARSHORE-OFFSHORE SEDIMENTS
The data in Table 7.6 represent grain size analyses for 17 sam-
ples of recent sediments collected from two separate environments and
four additional samples that have not been classified.
1. Plot these data on a triangular diagram and draw in by eye the
line which "best" separates the two types of environments.
2. Calculate the linear discriminant and draw in the line that rep-
resents this function on the triangular diagram. What is the
probability of misclassification?
3. Classify the four sample unknowns as being either nearshore or
offshore.
TABLE 7.6
Sample Sand Silt Clay
Nearshore sediments
1 45 53 2
2 92 8 0
3 69 25 6
4 75 25 0
5 63 37 0
6 42 54 4
7 46 51 3
Offshore sediments
8 36 60 4
9 34 61 5
10 6 87 7
11 3 91 6
12 8 87 5
13 33 63 4
14 59 36 5
15 20 78 2
16 48 52 0
17 2 80 18
Sample unknowns
A 19 71 10
B 64 35 1
C 33 55 12
D 21 74 5
4. Suppose it is known that the four unknowns were collected from the
same locality. How would you now classify the group of four un-
knowns?
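Item 2 can be sketched with Fisher's linear discriminant. Because the three percentages are closed (clay = 100 − sand − silt), two of them carry all the information, so the sketch works in the (sand, silt) plane; the helper names are illustrative.

```python
# Grain-size data from Table 7.6 as (sand, silt) pairs
near = [(45, 53), (92, 8), (69, 25), (75, 25), (63, 37), (42, 54),
        (46, 51)]
off = [(36, 60), (34, 61), (6, 87), (3, 91), (8, 87), (33, 63),
       (59, 36), (20, 78), (48, 52), (2, 80)]
unknown = [(19, 71), (64, 35), (33, 55), (21, 74)]   # samples A-D

def mean2(pts):
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def fisher(a, b):
    # Fisher linear discriminant: w = Sw^-1 (m_a - m_b), with Sw the
    # pooled within-class scatter (2x2, inverted directly); classify
    # into group a when w.x exceeds the midpoint score c
    ma, mb = mean2(a), mean2(b)
    s00 = s01 = s11 = 0.0
    for pts, m in ((a, ma), (b, mb)):
        for p in pts:
            dx, dy = p[0] - m[0], p[1] - m[1]
            s00 += dx * dx
            s01 += dx * dy
            s11 += dy * dy
    det = s00 * s11 - s01 * s01
    d = (ma[0] - mb[0], ma[1] - mb[1])
    w = ((s11 * d[0] - s01 * d[1]) / det,
         (s00 * d[1] - s01 * d[0]) / det)
    c = 0.5 * (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1]))
    return w, c

w, c = fisher(near, off)
labels = ["nearshore" if w[0] * x + w[1] * y > c else "offshore"
          for x, y in unknown]
```

The discriminant line w·x = c can be drawn back on the triangular diagram for comparison with the line fitted by eye.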
REFERENCES
Rose, H. E., 1968, The determination of the grainsize distribution of
a spherical granular material embedded in a matrix: Sedimentol-
ogy, v. 10, p. 293-309.
Shaw, A., 1964, Time in stratigraphy: New York, McGraw-Hill Book Co.,
365 p.
Simpson, J., 1967, Earth and Planetary Sci. Letters, v. 3, p. 417-425.
Index

Adelman, I. G., 102
Agterberg, F. P., 102
Allegre, C., 102
Amorocho, J., 102
Anderson, T. W., 93, 101
Apparent diameter, 15
Babbage, C., 139, 145
Barometric pressure, 81
Bartlett, M. S., 102
Billings, G. K., 58
Billingsley, P., 103
Binomial distribution, 12
Biostratigraphic correlation, 157
Blakely, R. F., 98, 104
Blyth, C. R., 103
Bonham-Carter, G., 95, 96, 101
Breaker height, 85
Cameron, E. M., 58
Carozzi, A. V., 71, 89
Carr, D. D., 97, 98, 101
Chayes, F., 107, 110, 113, 114
Chester section, 97, 98
Clark, W. A. V., 103
Coastal processes, 86
Cocke, N. C., 104
Coefficient of variation, 108
COGEODATA, 145
Coleman, J. S., 103
Communality, 30, 41, 57
Computers
in geology, 141
usage, 139
Conditional probability, 4, 92
Correlation coefficient, 3, 24
Correlation matrix, 36, 42
Dacey, M. F., 95, 100, 101, 102
Data matrix, 22
Davis, R. A. Jr., 79, 88
Degens, E. T., 58
Demirmen, F., 58
Deterministic model, 91
Deviate score, 25
Diurnal variation, 84
Doob, J. L., 103
Earthquakes, 156
Edwards, W. R., 89
Eigenvalues, 35, 55, 68-69
Eigenvectors, 35, 42, 68-69
Embedded Markov chain, 97
Factor
  common, 29
  loadings, 29, 41, 57
  meaning of, 21
  rotation, 42
  scores, 29
Factor analysis, 21
Factors, 40
Feller, W., 103
Finger, L., 112
Florida crown conch, 160
Fourier
  analysis, 72, 78
  coefficients, 81
  components, 81
Fox, W. T., 77, 79, 88
GEOCOM Bulletin, 146
Geometric distribution, 94
Geometric probability, 19
Gingerich, P. D., 103
Goodman, L. A., 93, 101
Goodness of fit, 77
Gower, J. C., 60
Graf, D. L., 103
Graybill, F. A., 103
Grid search, 8
Griffiths, J. C., 103
Groundwater level, 85
Hammer, P. C. , 139, 140
Harbaugh, J. W., 48, 58, 60, 79,
80, 88, 95, 96, 101, 103,
139, 149
Harman, H. H., 43, 60
Hart, W. E., 102
Heller, R. A., 103
INDEX
Helmert transformation, 158, 159
Henderson, J. H., 80, 88
Hitchon, B., 58
Hollerith, H., 139
Hubaux, A., 148, 149
IAMG, 146
Imbrie, J., 24, 48, 49, 51, 55, 58,
59, 60, 61
Independent-events model, 91, 95
Iterated moving average, 72, 74
James, W. R., 96, 101
Kaesler, R. L. , 60
Kahn, J. S. , 71, 88
Kaiser, H. F. , 42, 61
Karlin, S. , 103
Kemeny, J. G. , 103
Kendall, M. G., 19, 70, 72, 73, 74,
75, 88
Kipp, N. G., 59
Klovan, J. E., 24, 58, 59, 60
Krumbein, W. C., 71,88,93,95, 96,
98, 99, 100, 101, 102, 103,
104, 142, 145, 149
Lake Michigan, 84
Langbein, W. B., 104
Leopold, L. B., 104
Linear regression, 153
Lonka, A. , 59
Loucks, D. P. , 104
Louden, R. K. , 80, 88
Lumsden, D. N., 104
Lynn, W. R., 104
Markov chain, 91, 92
Markovian clock, 92, 95
Matrix algebra, primer on, 62-69
Matalas, N. C., 59, 104
McCammon, R. B., 59, 60, 153, 159
McElroy, M. N., 60
McIntyre, D. B., 48, 61
Merriam, D. F., 79, 80, 88, 104, 139, 149
Miller, J. P., 104
Miller, R. L., 71, 88
Moran, P. A. P., 19
Moving average, 71
Nearshore-offshore sediments, 162
Nonstochastic variable, 90
Norman, C. E., 89
Oblique rotation, 54
Paleontologist's dilemma, 150
Parallel-line search, 7
Parent correlation, 107
Parker, M. A., 141, 149
Parker, R. H., 58
Pattison, A., 96, 102
Pearson, K., 61, 106, 114
Pettijohn, F. J., 71, 88
Phase shift, 80
Polynomial curve fitting, 72
Population
  covariance, 107
  mean, 108
  variance, 107
Porosity data, 153, 155
Potter, P. E., 98, 104
Power spectrum, 80
Preston, F. W., 80, 88, 146, 149
Principal components, 31, 42
Probability tree, 95
Pseudorandom variable, 111
Purdy, E. G., 48, 60
Q-mode factor analysis, 24, 25
Random process, 90
Random variables, 90
sum of two, 6
transformation, 5
RANEX, 112
Rank of a matrix, 66-67
Ratio correlations, 106
Recursive mean, 1
Recursive variance, 2
Reiher, B. J., 59
R-mode factor analysis, 23
Robinson, G., 74, 75, 76, 77, 89
Robinson, S. C., 148, 149
Rocks in thin section, 13, 150
Rogers, A., 104
Rose, H. E., 19, 20, 150, 163
RTCRSM2, 111
Sample mean, 1
Sample space, 90
Sample variance, 2
Scheidegger, A. E., 104
Scherer, W., 104
Schwarzacher, W., 96, 102, 104
Search for dikes, 7
Shannon, C. E., 139
Sharp, E. R., 89
Shaw, A., 157, 163
Sheppard's formulas, 77, 78
Shinozuka, M., 103
Shreve, R. L., 104, 105
Similarity indices
coefficient of similarity, 48
correlation coefficient, 47
distance coefficient, 48
Simpson, J., 156, 163
Singer, G. A., 19, 20
Sloss, L. L., 142, 145, 149
Smart, J. S., 105
Snell, J. L., 103
Spearman, C. , 61
Spencer, D. W. , 58, 60
Spencer
  15-term formula, 76, 77
  21-term formula, 76, 77
Spurious correlation, 109, 113
Standard deviation, 26
Standard form, 25
Stationarity, 3, 96
Stemmler, R. S., 103
Sunspots, 156
Sutton, R. G., 71, 88
Sylvester-Bradley, P. C., 146,
149
Thurstone, L. L., 61
Time series, 70
Transitional probability, 92
Transition matrix, 97
multistory, 98
Tukey, J. W., 111, 114
UNNO, 112
Upper Cambrian Riley fm, 157
Van Andel, T. H., 58
Varimax rotation, 43
Vistelius, A. B., 76, 88, 105
Wahlstedt, W. J., 103
Waiting time, 93
Walker, R. G., 71, 88
Walpole, R. L., 71, 89
Watson, R. A., 105
Wave period, 85
Weiss, M. P., 71, 89
Whittaker, E. T., 74, 75, 76, 77,
89
Wickman, F. E., 105
Wind velocity, 85
Wolman, M. G., 104
Woolhouse 15-term formula, 75, 77
Zeller, E. J., 105
Zurich sunspot number, 156

More Related Content

PDF
Lowrie William - Fundamentals of Geophysics-Cambridge University Press (2007)...
PDF
9780521859028 frontmatter
PDF
A Students Guide To Geophysical Equations 1st Edition William Lowrie
PDF
Geodesy 4th Edition Wolfgang Torge Jrgen Mller
PDF
Laboratory Manual in Physical Geology 11th Edition American Solutions Manual
PDF
Highresolution Approaches In Stratigraphic Paleontology 1st Edition Micha Kow...
PDF
Laboratory Manual in Physical Geology 11th Edition American Solutions Manual
PPTX
++++COMPARISON BETWEEN COURSES 2023.pptx
Lowrie William - Fundamentals of Geophysics-Cambridge University Press (2007)...
9780521859028 frontmatter
A Students Guide To Geophysical Equations 1st Edition William Lowrie
Geodesy 4th Edition Wolfgang Torge Jrgen Mller
Laboratory Manual in Physical Geology 11th Edition American Solutions Manual
Highresolution Approaches In Stratigraphic Paleontology 1st Edition Micha Kow...
Laboratory Manual in Physical Geology 11th Edition American Solutions Manual
++++COMPARISON BETWEEN COURSES 2023.pptx

Similar to Concepts in Geostatistics, 1975.pdf (20)

PDF
Essays on Geography and GIS, Vol. 4
PDF
ebooksclub.org_Quantitative_Ecology_Second_Edition_Measurement_Models_and_.pdf
PDF
Spatial Analysis In Karst Geomorphology An Example From Krk Island Croatia 1s...
PDF
Geochemical Anomaly And Mineral Prospectivity Mapping In Gis Emmanuel John Mu...
PDF
Laboratory Manual in Physical Geology 11th Edition American Solutions Manual
PDF
Laboratory Manual in Physical Geology 11th Edition American Solutions Manual
PDF
Ap physics-1-sample-syllabus-1-id-1066422v1
PDF
Field theory a path integral approach 2nd Edition Ashok Das
PPTX
QUANTITATIVE REVOLUTION merits and demerits .pptx
PDF
Theory Of Relativity 2nd Edition Pathria R K
PDF
Fission-Track Thermochronology and its Application to Geology Marco G. Malusà
DOCX
Ge 249 research methods in geography
PDF
Geochemical anomaly and mineral prospectivity mapping in GIS 1st Edition Emma...
PDF
Directions For Mathematics Research Experience For Undergraduates Mark A Pete...
PDF
Fission-Track Thermochronology and its Application to Geology Marco G. Malusà
PDF
Geochemical anomaly and mineral prospectivity mapping in GIS 1st Edition Emma...
PPTX
My Experiments with the Innovative Research Techniques in Geography
PDF
Sims mini mod
PPT
Core Content Coaching Grade 8 Topographic Maps & Satellite Views 14-15
PDF
Elementary Climate Physics 1st Edition F W Taylor
Essays on Geography and GIS, Vol. 4
ebooksclub.org_Quantitative_Ecology_Second_Edition_Measurement_Models_and_.pdf
Spatial Analysis In Karst Geomorphology An Example From Krk Island Croatia 1s...
Geochemical Anomaly And Mineral Prospectivity Mapping In Gis Emmanuel John Mu...
Laboratory Manual in Physical Geology 11th Edition American Solutions Manual
Laboratory Manual in Physical Geology 11th Edition American Solutions Manual
Ap physics-1-sample-syllabus-1-id-1066422v1
Field theory a path integral approach 2nd Edition Ashok Das
QUANTITATIVE REVOLUTION merits and demerits .pptx
Theory Of Relativity 2nd Edition Pathria R K
Fission-Track Thermochronology and its Application to Geology Marco G. Malusà
Ge 249 research methods in geography
Geochemical anomaly and mineral prospectivity mapping in GIS 1st Edition Emma...
Directions For Mathematics Research Experience For Undergraduates Mark A Pete...
Concepts in Geostatistics, 1975.pdf

  • 5. Introduction

It is now little over 10 years since Miller and Kahn's Statistical Analysis in the Geological Sciences appeared in the geologic literature. By all accounts, this is considered to be the first modern text in statistical geology. Since then, Krumbein and Graybill's An Introduction to Statistical Models in Geology, the two-volume work by Koch and Link, Statistical Analysis of Geological Data, and Davis' Statistics and Data Analysis in Geology have appeared. These books have been witness to the increasing quantification taking place in geology and the earth sciences generally. Coupled with the advances in computers, the geologist is now in a position to portray his data and characterize his results on a scale that heretofore was not possible. Briefly, the numeric treatment of geologic data has come of age.

In the quantification of geology, statistics has served as midwife to the concept of the process response model applied to geologic processes. Because precise hypotheses about natural processes have always proved difficult to formulate, it is not surprising that statistical rather than deterministic models have been put forward. Today statistics is being applied in virtually every branch of geology.

As elsewhere in science where statistics has been applied, however, what has held back the more rapid assimilation of statistical concepts in the minds of those engaged within a particular discipline is the absence of an orderly presentation of statistics as it applies to the particular discipline. Although this deficiency has been largely overcome in physics, chemistry, and lately, biology, this is not yet the case for geology. Moreover, there have come to be
  • 6. identified a number of statistical methods commonly used in geology that are not sufficiently understood by the average geology teacher to be presented effectively to the student.

There is little doubt that the geologist of the future will be required to make more quantitative judgments. In assessing the impact of technology on the environment, for instance, the geologist will have to interpret data obtained from a wide variety of sources from which he will be expected to extract a more exact meaning. In reconstructing more precisely the Earth's past based on geophysical measurements and geochemical analyses of rock samples, he will need to perform statistical analyses in order that more exact inferences may be drawn from the data. Because more quantitative data will be collected, more quantitative geologic models will need to be developed. It is likely that as geologic prediction becomes more precise, it will become more quantitative. A geologist therefore will need to be better trained in the application of statistics.

The educational imperative for statistics in geology, therefore, is to introduce statistical concepts into the curriculum. This can be done either as a course in geostatistics or, if this is not feasible, by incorporating basic statistical concepts into those geology courses that utilize statistical methods. While it is true that students in geology and related earth science fields must continue to be encouraged to take the basic course in statistics, the fact remains that exposure to statistics within the field of interest is the anvil upon which a more meaningful grasp of statistics will be forged.

Few geology departments today can afford the luxury of having one of their faculty members specialize in geostatistics.
While it is recognized that a more quantitative approach to geology is evolving, there remains the more pressing problem of unifying the earth sciences and exposing the student to a more comprehensive view of the Earth's environment, past and present. Therefore, what is most likely to happen at this time is for a department to single out a member who has found statistics particularly useful in his field of study and to ask him to teach a course in geostatistics. Upon agreeing to this, the faculty member in question soon afterward realizes his own limited exposure to statistics or, what is more likely, his inadequate knowledge of the
  • 7. application of statistics in geology in fields outside of his own. It was with this in mind that a two-week summer short course for geology college teachers was given, and from which this book has evolved.

The book is divided into chapters corresponding to the material presented by the different lecturers in the course. No attempt has been made to treat any subject in its full detail, nor has there been a concerted effort to survey all the possible topics covering the field of statistical geology. The idea rather has been to introduce some basic concepts and to give examples of applications of statistics in geology with the intention of provoking interest and eventually generating discussion among geologists. For someone who is either now teaching or is planning to teach a course in geostatistics, it is hoped this book will serve as a guide. Much of the contained material was prepared specifically for the two-week short course. For the student most of all, it is hoped that the book will make for enjoyable reading and fruitful study.

Richard B. McCammon
  • 8. List of Contributors

Felix Chayes, Geophysical Laboratory, Carnegie Institution of Washington, Washington, DC 20008
William T. Fox, Department of Geology, Williams College, Williamstown, MA 01267
J. E. Klovan, Department of Geology, University of Calgary, Calgary 44, Alberta, Canada
W. C. Krumbein, Department of Geological Sciences, Northwestern University, Evanston, IL 60201
R. B. McCammon, Department of Geological Sciences, University of Illinois at Chicago Circle, Chicago, IL 60680
Daniel F. Merriam, Department of Geology, Syracuse University, Syracuse, NY 13210
  • 9. Contents

PREFACE
INTRODUCTION
LIST OF CONTRIBUTORS

CHAPTER 1. STATISTICS AND PROBABILITY by R. B. McCammon
  1.1 Sample Mean and Variance
  1.2 Elements of Probability
  1.3 Problems
  1.4 Searching for Dikes
  1.5 Rocks in Thin Section
  References

CHAPTER 2. R- AND Q-MODE FACTOR ANALYSIS by J. E. Klovan
  2.1 Meaning of Factor
  2.2 Data Matrices
  2.3 Factor Analytic Modes
  2.4 The R-Mode Model
  2.5 A Practical Example
  2.6 Factors
  2.7 The Q-Mode Model
  2.8 An Example
  2.9 Oblique Rotation
  2.10 Practical Examples
  References
  APPENDIX 1: A Primer on Matrix Algebra

CHAPTER 3. SOME PRACTICAL ASPECTS OF TIME SERIES ANALYSIS by William T. Fox
  3.1 Generalities
  3.2 Polynomial Curve Fitting
  3.3 Iterated Moving Averages
  3.4 Fourier Analysis
  3.5 An Application
  References

CHAPTER 4. MARKOV MODELS IN THE EARTH SCIENCES by W. C. Krumbein
  4.1 Fundamentals
  4.2 A Spectrum of Models
  4.3 The Markov Chain
  4.4 Geometric Distribution
  4.5 Probability Trees
  4.6 Embedded Markov Chains
  4.7 Extensions
  References
  Bibliography

CHAPTER 5. A PRIORI AND EXPERIMENTAL APPROXIMATION OF SIMPLE RATIO CORRELATIONS by Felix Chayes
  5.1 Ratio Correlations
  5.2 RTCRSM2
  5.3 UNNO and RANEX
  5.4 Comments on Usage
  References
  APPENDIX 1: RTCRSM2 Program
  APPENDIX 2: RANEX and UNNO Programs

CHAPTER 6. COMPUTER PERSPECTIVES IN GEOLOGY by Daniel F. Merriam
  6.1 Generalities
  6.2 Early Beginnings
  6.3 Usage in General
  6.4 Usage in Geology
  6.5 Patterns and Trends
  References

CHAPTER 7. PROBLEM SET IN GEOSTATISTICS by R. B. McCammon
  7.1 A Paleontologist's Dilemma
  7.2 Particle Size Distribution in Thin Section
  7.3 Particle Diameter Sampling Experiment
  7.4 Linear Regression of Porosity Data. I
  7.5 Linear Regression of Porosity Data. II
  7.6 Sunspots and Earthquakes
  7.7 A Beginning and an End
  7.8 Helmert Transformation
  7.9 A Spine to Remember
  7.10 Nearshore-Offshore Sediments
  References
  • 12. Chapter 1. Statistics and Probability, by R. B. McCammon

1.1 SAMPLE MEAN AND VARIANCE

Perhaps the best known statistic for describing a given set of numbers is the arithmetic mean. The mean conveys the notion of central tendency. With respect to data, the mean is linked with the idea of sampling and statistical inference. As familiar as the mean may be to most of us, its sequential properties may not be so well known. Thus, for ordered observations, whether these be based on time, space, or experimental design, the mean of n such observations is given by

    x̄_n = (x_1 + x_2 + ... + x_n)/n

where it is understood that observation x_i precedes observation x_j for i < j. If we now add an observation and recalculate the mean based on (n + 1) observations, we can write

    x̄_{n+1} = (n·x̄_n + x_{n+1})/(n + 1)

where x_{n+1} represents the new observation. This we recognize as the recursive form of the mean. Next, we define

    v = x̄_{n+1}/x̄_n
  • 13. as the ratio of the mean calculated for n + 1 observations divided by the mean calculated for the preceding n observations, and similarly, we define

    u = x_{n+1}/x̄_n

where x_{n+1} represents the next observation. From this, we can write

    v = n/(n + 1) + u/(n + 1)

or, solving for u,

    u = (v − 1)·n + v

We can now ask how large or how small a new observation must be in order to affect the mean significantly. Suppose, for instance, we find that the mean is doubled after a new observation is added to ten previous observations. For this to happen, the new observation would have to be 12 times greater than the previous mean. To double a mean calculated from 100 preceding observations, the new observation would have to be 102 times greater than the preceding mean. We conclude, therefore, that the mean becomes increasingly difficult to alter with increasing sample size unless there are increasingly erratic fluctuations in the observations. In the context of a time-dependent geologic process, we can conclude that cumulative effects tend toward equilibrium with advancing geologic time and that any significant departure from equilibrium is most likely due to outside influences.

Another statistic used to characterize a set of given values is the variance. The variance describes the scatter about the mean and, for n observations, is defined as

    s_n² = Σ_{i=1}^{n} (x_i − x̄_n)² / (n − 1)

where x̄_n represents the mean of the n observations. The denominator is given by n − 1 rather than n by virtue of the fact that the
  • 14. numerator can be expressed as the sum of n − 1 squared terms, each of which is independent of the mean. If we consider n ordered observations as before, we can write

    s_{n+1}² = ((n − 1)/n)·s_n² + (x_{n+1} − x̄_n)²/(n + 1)

the recursive form of the variance. If x_{n+1} = x̄_n, it follows that

    s_{n+1}² < s_n²  for s_n² > 0

This reinforces our earlier comment regarding the increase in the stability of the mean with an increase in sample size.

We now turn the problem around slightly and inquire for which new observation it will be true that

    s_{n+1}² = s_n²

Using the recursive relation, we obtain

    x_{n+1} = x̄_n ± √((n + 1)/n)·s_n

as the value for which the variance remains constant. It implies, however, that the new mean will be different. Thus the paradox is that in order to maintain a constant mean, the variance must be reduced, whereas to maintain a constant variance, the mean must change. For successive observations, the mean and variance cannot both remain constant (unless the variance is zero). For observations further removed, the mean and variance can both remain constant, however, if one considers cyclic fluctuations. This is, in fact, the definition of a stationary time series, about which we shall hear more later.

From what we have said about the variance, it should not be difficult for you to write the recursive form for the covariance between two ordered pairs of observations. From there, it should not be much more difficult to write the recursive form of the correlation coefficient. This is left as an exercise.
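The recursive formulas above are easily checked numerically. The following sketch (in Python, rather than the BASIC used in the course) updates the mean and variance one observation at a time and compares the result with the direct formulas; it also verifies the doubling example, u = (v − 1)n + v = 12 for n = 10 and v = 2.

```python
import random

def recursive_stats(xs):
    """Update the mean and variance one observation at a time,
    using the recursive forms given above."""
    mean, var = xs[0], 0.0
    for n, x in enumerate(xs[1:], start=1):                  # n observations seen so far
        new_mean = (n * mean + x) / (n + 1)                  # recursive mean
        var = (n - 1) / n * var + (x - mean) ** 2 / (n + 1)  # recursive variance
        mean = new_mean
    return mean, var

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(500)]
mean_r, var_r = recursive_stats(data)

# Direct (batch) formulas for comparison.
mean_b = sum(data) / len(data)
var_b = sum((x - mean_b) ** 2 for x in data) / (len(data) - 1)
assert abs(mean_r - mean_b) < 1e-9 and abs(var_r - var_b) < 1e-9

# Doubling example: with n = 10 observations of mean 5, a new observation
# of 12 times the mean (u = (v - 1)n + v = 12) doubles the mean.
m10, _ = recursive_stats([5.0] * 10)
m11, _ = recursive_stats([5.0] * 10 + [12 * 5.0])
assert abs(m11 - 2 * m10) < 1e-9
```

The two update rules are algebraically exact, so the recursive and batch results agree to floating-point precision.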
  • 15. 1.2 ELEMENTS OF PROBABILITY

Probability can be viewed as partial information either known or presumed known prior to an event. In the previous section, the mean and variance were looked upon as descriptors of past data collected. In terms of probability, our concern lies with both past and future data. For some random variable Y, we associate the probability p(y) that Y will take on the value y. Such a statement is conditioned by information H, and thus we define the conditional probability p(y|H) as the probability that Y takes on the value y given information H. An example from geology that illustrates the concept of conditional probability is the fossil collector in search of trilobites who estimates that there is a much greater probability of finding a trilobite in Cambrian strata compared with Cretaceous strata.

The two most important properties of conditional probability are that

    p(y|H) ≥ 0  for all y ∈ Y

and

    ∫_{y∈Y} p(y|H) dy = 1

or

    Σ_{y∈Y} p(y|H) = 1

if Y is discrete. We can argue further that H represents information on a second random variable X; consequently, the conditional probability p(y|x) is expressed as

    p(y|x) = p(x,y)/p(x)

where p(x,y) is the joint probability for X and Y and p(x) is the unconditional probability for X. If Y is independent of X,

    p(x,y) = p(x)·p(y)

so that

    p(y|x) = p(y)

which is another way of saying that Y is independent of X.
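The trilobite example can be made concrete with a small discrete sketch; the joint probabilities below are invented purely for illustration and are not taken from the text.

```python
# Toy joint distribution for strata X and "trilobite found" Y.
# The numbers are hypothetical, chosen only to illustrate p(y|x).
p_xy = {
    ("Cambrian",   True):  0.30, ("Cambrian",   False): 0.20,
    ("Cretaceous", True):  0.05, ("Cretaceous", False): 0.45,
}

# Marginal p(x), obtained by summing the joint probabilities over y.
p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x)
       for x in ("Cambrian", "Cretaceous")}

# Conditional probability p(y|x) = p(x,y)/p(x).
p_find_given_cambrian = p_xy[("Cambrian", True)] / p_x["Cambrian"]        # 0.6
p_find_given_cretaceous = p_xy[("Cretaceous", True)] / p_x["Cretaceous"]  # 0.1

# X and Y are dependent here, since p(x,y) differs from p(x)p(y).
p_y_true = p_xy[("Cambrian", True)] + p_xy[("Cretaceous", True)]
assert p_xy[("Cambrian", True)] != p_x["Cambrian"] * p_y_true
```

With these (hypothetical) numbers the collector is six times as likely to find a trilobite in Cambrian strata as in Cretaceous strata.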
  • 16. For Y dependent on X, we can write

    p(y) = ∫_{x∈X} p(y|x)·p(x) dx

for the unconditional probability of Y.

In many instances, it is necessary to transform one random variable to a new variable. For the continuous case, if Y = f(X), it follows that

    p(y) dy = p(x) dx

for all x ∈ X and y ∈ Y. Solving for p(y), we have

    p(y) = p(x)·|dx/dy|

Let us consider an example. We define

    p(x) = 2x  for 0 ≤ x ≤ 1
         = 0   otherwise

This is a probability density since

    ∫_0^1 p(x) dx = ∫_0^1 2x dx = x² |_{x=0}^{x=1} = 1

Suppose we wish to find the probability density of Y, where

    Y = X²

Taking the derivative,

    dy = 2x dx

we have

    p(y) = 2x·(1/2x) = 1

so that

    p(y) = 1  for 0 ≤ y ≤ 1
         = 0  otherwise

Thus, the random variable Y is uniformly distributed between 0 and 1.
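A minimal simulation confirms this change-of-variables result. The sketch below (Python, not part of the original course materials) draws X with density 2x by inverse-transform sampling and checks that Y = X² falls uniformly on [0, 1].

```python
import random

random.seed(7)
N = 100_000

# Inverse-transform sampling: F(x) = x^2 on [0, 1], so X = sqrt(U)
# for U uniform on [0, 1] has density p(x) = 2x.
xs = [random.random() ** 0.5 for _ in range(N)]
ys = [x * x for x in xs]                 # Y = X^2

# Sanity check on X itself: E(X) = integral of 2x * x dx = 2/3.
assert abs(sum(xs) / N - 2.0 / 3.0) < 0.01

# If Y is uniform on [0, 1], each quarter of the interval should
# hold about 25 percent of the samples.
counts = [0, 0, 0, 0]
for y in ys:
    counts[min(int(y * 4), 3)] += 1
fractions = [c / N for c in counts]
assert all(abs(f - 0.25) < 0.01 for f in fractions)
```

With 100,000 samples the bin fractions agree with 0.25 to well within sampling error.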
  • 17. Another distribution we may need to establish is that of the sum of two independent random variables, each of which follows the same probability density. In general, we can write for continuous variables

    p(z) = ∫_{u∈U} p(u)·p(z − u) du

where Z = X + Y. As an example, consider

    p(x) = p(y) = 1  for 0 ≤ x, y ≤ 1
                = 0  otherwise

We wish to find the probability density for Z, defined as

    Z = X + Y

We write

    p(z) = ∫_0^z p(u)·p(z − u) du = z  for 0 ≤ z ≤ 1

(and, by symmetry, p(z) = 2 − z for 1 < z ≤ 2) as the probability density for Z. Try the following problems.

1.3 PROBLEMS

1.3.1 Consider the probability density function for X given by

    f(x) = 1  for 0 ≤ x ≤ 1
         = 0  otherwise

Let Y = −ln X. Find the probability density function for Y.
  • 18. 1.3.2 Consider the probability density function for X given by

    f(x) = (1/√(2π))·e^(−x²/2)

Let Y = e^X. Find f(y).

1.3.3 Let the probability density functions for X and Y be given by

    f(x) = 1/a  for 0 ≤ x ≤ a
         = 0    otherwise

    f(y) = 1/a  for 0 ≤ y ≤ a
         = 0    otherwise

Let Z = X + Y. Find the probability density function for Z.

1.4 SEARCHING FOR DIKES

Suppose that we have a line segment of length L located somewhere inside a given area. We propose to locate the segment by conducting a search along parallel traverse lines spaced a distance D apart. We will consider the line segment found if one of the traverse lines intersects the segment. For L < D, we then ask what will be the probability that the line segment is found. Within a geologic context, the line segment might represent the horizontal trace of a mineralized dike, and the parallel line traverses represent the survey lines of a field party. Assuming the mineralization associated with the dike has economic value, the cost of such a search can be weighed against the expected value of the potential ore deposit. Ignoring the economic implications, consider the probabilistic aspect of the problem.

To say that a line segment of length L lies somewhere in a given area and nothing more presumes that such a line segment has a random orientation with respect to an arbitrarily chosen traverse line. For an arbitrary angle θ, the situation can be seen in Figure 1.1. Thus, the length component h of the line segment perpendicular to a given line of traverse is given by

    h = L·sin θ

and consequently, the probability of intersecting the line segment for fixed θ is given by
  • 19. [FIGURE 1.1: a line segment of length L at angle θ to traverse lines spaced a distance D apart]

    Pr{I|θ} = h/D = (L/D)·sin θ

where Pr{I|θ} represents the conditional probability of intersection. To obtain the unconditional probability Pr{I}, we must integrate

    Pr{I} = ∫_θ Pr{I|θ}·Pr{θ} dθ

where Pr{θ} defines the probability density for θ. The integration is performed for all values of θ. On the assumption that θ is randomly distributed, we define

    Pr{θ} = 1/π  for 0 ≤ θ ≤ π

which is to say that θ is uniformly distributed between 0 and π. By substitution, we obtain

    Pr{I} = (2/π)·∫_0^{π/2} (L/D)·sin θ dθ = 2L/(πD)

as the probability of intersection.

In the search for the dike, we may wish to search on a grid. Thus, we can inquire as to the probability of intersecting the line segment of length L for a search conducted on a grid having mesh size D. For a fixed angle θ, the situation can be seen in Figure 1.2. Thus, we need to consider an intersection of the line segment with either horizontally or vertically spaced lines of the grid. If we take the horizontal to mean the x direction and the vertical to mean the y direction, then the probability of intersection Pr{I} is equal to

    Pr{I} = Pr{I_x ∪ I_y} = Pr{I_x} + Pr{I_y} − Pr{I_x ∩ I_y}
  • 20. [FIGURE 1.2: a line segment of length L at angle θ within a square grid of mesh size D]

where {I_x ∪ I_y} represents an intersection along either a horizontal or a vertical line of traverse and {I_x ∩ I_y} represents an intersection of both a horizontal and a vertical line. Because the grid has a square outline, it follows that

    Pr{I_x} = Pr{I_y}

and hence

    Pr{I} = 2·Pr{I_x} − Pr{I_x ∩ I_y}

To derive the expression for the latter term, we can write

    Pr{I_x ∩ I_y} = ∫_θ Pr{I_x, I_y|θ}·Pr{θ} dθ

where Pr{I_x, I_y|θ} is the conditional probability of intersecting the line segment with both a horizontal and a vertical line of the grid. Referring to Figure 1.2,

    Pr{I_x, I_y|θ} = (L/D)²·sin θ·cos θ

so that

    Pr{I_x ∩ I_y} = (2/π)·(L/D)²·∫_0^{π/2} sin θ·cos θ dθ = (1/π)·(L/D)²

Consequently, we have

    Pr{I} = (1/π)·(L/D)·(4 − L/D)
  • 21. as the probability that a search conducted on a grid with mesh size D will locate a line segment of length L (L ≤ D), given that the line segment has no known orientation or location.

While the conducted search has been oversimplified in terms of actual practice, there is much that can be deduced from this simple example. After presenting this to students, for example, the following questions can be posed:

1. If, in fact, the line segment in question is suspected of having a known preferred orientation, how then can the probability of intersection, based either on a parallel-line or grid-type search, be maximized?

2. How is the probability affected if L > D? At this point it is prudent to pause and remind ourselves that some questions, though simply stated, cause considerable distress. The above question falls within this category. For L > D in the case of parallel-line search, for instance, the conditional probability of intersection Pr{I|θ} is given by

    Pr{I|θ} = (L/D)·sin θ  for 0 ≤ θ ≤ sin⁻¹(D/L) and π − sin⁻¹(D/L) ≤ θ ≤ π
            = 1            for sin⁻¹(D/L) ≤ θ ≤ π − sin⁻¹(D/L)

so that the unconditional probability of intersection Pr{I} is

    Pr{I} = (2/π)·[ ∫_0^{sin⁻¹(D/L)} (L/D)·sin θ dθ + (π/2 − sin⁻¹(D/L)) ]

This is by way of saying that questions posed to the students must be thought out beforehand.

3. Does the probability of intersection change if, instead of a line segment, a circle of diameter L is considered?

4. Taking into account the economics of a search for a mineralized dike having an expected value V, for what spacing D will the expected gain be maximized? In posing this question, it is
  • 22. necessary to specify the cost of the search as a function of the line spacing or the grid size.

Thus far, our attention has focused on a single line segment or, put into its geologic context, a single dike. In Figure 1.3, however, you will notice there are 100 such line segments (or dikes) of equal length that have been located at random within a square area. We can enlarge our original problem by asking for the probability of intersecting the i-th line segment of length L. This is equal to

    Pr{I_i} = 2L/(πD)                (for parallel lines spaced a distance D apart)
    Pr{I_i} = (1/π)·(L/D)·(4 − L/D)  (for a square grid with mesh size D)

depending on the type of search. For example, consider the parallel-line type of search. If we assume that the location and the orientation of the different line segments of length L are each independent, it follows that the number of intersections observed for a given set

[FIGURE 1.3: 100 line segments of equal length located at random within a square area]
  • 23. of parallel-line traverses spaced a distance D apart is binomially distributed. The expected number of such intersections N_I is given by

    E(N_I) = N·Pr{I_i}

where N is the total number of line segments contained within the search area.

Take a sheet of tracing paper and with a ruler make a series of parallel lines spaced a distance D, D = 2L, apart. Next, place this overlay on Figure 1.3 and, for an arbitrarily chosen orientation, count the number of intersections of line segments with traverse lines. Repeat this several times if you wish, varying both the location and orientation of the overlay. Since the total number of line segments is known, you can compare the observed number with the expected number, given in this instance as

    E(N_I) = N/π

since L/D equals 1/2. Under such conditions, therefore, when the total number of line segments of length L is unknown, the estimate of the total number can be based on the observed number of intersections, given as

    N_est = (π/2)·(D/L)·N_obs

All we have said above applies equally to a grid type of search. A grid of mesh size D = 2L can be used to perform a similar experiment. Remember, however, that the probability of an intersection is different than for parallel-line spacing.

A question that can be posed to students at this point is what effect there is on the probability of an intersection if the length of the line segment or line segments to be located is unknown. It may even be the case that this length has a specified probability density function Pr{L}. Thus, the length L can be treated as a variable. The probability of an intersection Pr{I} is expressed in this instance by

    Pr{I} = ∫_L ∫_θ Pr{I|L,θ}·Pr{L,θ} dθ dL
  • 24. or, if we assume that the length L is statistically independent of the angle θ,

    Pr{L,θ} = Pr{L}·Pr{θ}

we can write

    Pr{I} = ∫_L ∫_θ Pr{I|L,θ}·Pr{L}·Pr{θ} dθ dL

The probability of an intersection, therefore, is seen to vary depending on the distributions assigned to L and θ; consequently, L and θ become parameters that affect the observations made for different search strategies. A variety of probability density functions could be inserted in the above equation, with the result that the probability of an intersection would differ from one situation to the next.

1.5 ROCKS IN THIN SECTION

We turn our attention now to a problem that takes us beyond the probability of an intersection. We consider the length of a line of intersection. Imagine that a circle of diameter D is located somewhere between two parallel lines spaced a distance t apart, as shown in Figure 1.4. Suppose we locate another line at random that lies between the two lines and extends parallel to them. The probability that this line will intersect the circle is given by

    Pr{I} = D/t

where Pr{I} is the probability of intersection. Here we are interested not in the probability of intersection but rather in the length of

[FIGURE 1.4: a circle of diameter D located between two parallel lines spaced a distance t apart]
  • 25. the chord of the circle being intersected. This chord can be considered the apparent diameter of the circle. We wish to derive the probability density of this length. To anticipate the geologic implication of this problem, it is sufficient to note that the random slicing of circles by lines is identical in concept to the preparation of rock thin sections, in which grains embedded in a matrix are cut through by a random plane. For the latter, the grain size distribution observed subsequently in thin section will underestimate the actual particle size distribution in the rock. It is natural to examine what effect this has on the moments of the true distribution and how this bias, since it exists, can be reduced if not eliminated.

Taking the simplest case, we ask what is the probability, given that our circle is intersected by the line, that the observed length L will be greater than some length L_a (L_a > 0) and less than or equal to some length L_b (L_a ≤ L_b ≤ D). Referring to Figure 1.5, it is seen that this probability is given by

    Pr{L_a < L ≤ L_b | I} = 2h/D

where the multiplier of 2 derives from the presence of two slabs of thickness h occurring within the diameter of the circle. From the above figure, we see that

    h_a² + (L_a/2)² = (D/2)²

and

    h_b² + (L_b/2)² = (D/2)²

[FIGURE 1.5: chords of length L_a and L_b cut at distances h_a and h_b from the centre of a circle of diameter D, with h = h_a − h_b]
  • 26. so that h is given by

    h = h_a − h_b = (D/2)·[√(1 − (L_a/D)²) − √(1 − (L_b/D)²)]

The unconditional probability that the observed length will lie within these limits is

    Pr{L_a < L ≤ L_b} = (D/t)·(2h/D) = 2h/t

If we let L_a approach 0 and set L_b to an arbitrary value c (0 < c ≤ D), the cumulative probability distribution for L is

    Pr{L ≤ c} = (D/t)·[1 − √(1 − (c/D)²)]

Taking the derivative with respect to c, the probability density is

    p(ℓ) = (1/t)·(ℓ/D)/√(1 − (ℓ/D)²)

As long as we are concerned with only a single diameter D, we can without loss of generality let t = D = 1, so that

    p(ℓ) = ℓ/√(1 − ℓ²)  for 0 < ℓ ≤ 1

for a circle of unit diameter. A graph of this probability density function is given in Figure 1.6. The distribution falls off rapidly for values much less than one. The question is how much this affects the estimate of the circle diameter if observations are based on the apparent diameters measured by successive random slices of a circle. The mean of the above distribution is given by

    E(L) = ∫_0^1 ℓ·p(ℓ) dℓ = ∫_0^1 ℓ²/√(1 − ℓ²) dℓ = π/4
  • 27. [FIGURE 1.6: graph of p(ℓ) = ℓ/√(1 − ℓ²), rising steeply as ℓ approaches 1]

so that an estimate of the true diameter based on the average value of apparent diameters obtained by successive random slices would be in error by approximately 25 percent. In this instance, the estimate could be corrected simply by multiplying the average value of apparent diameter by 4/π. In Figure 1.7, 50 circles of equal diameter D are located at random within a given square area. Using a ruled transparent overlay of parallel lines spaced a distance D apart, measure the apparent diameters of circles intersected for several random positions of the overlay. The probability density of the apparent diameter of the circles is then generated. If you wish, construct a histogram for the values obtained. Next, calculate the mean and compare this with the expected value of the distribution above. The two values should agree within the precision allowed by the total number of measured intersections.

In reality, particles are of different sizes and mixed together; therefore we must consider a distribution of diameters. In the present context, this can be represented by circles having different diameters mixed together in fixed proportions. To advance our discussion, we must take note that what we took before as p(ℓ), we now mean as p(ℓ|D), where D is a specified diameter. The unconditional probability density of the apparent diameter for mixtures of circles having different diameters can be expressed as

    Pr{L} = Σ_D Pr{L|D}·Pr{D}

where Pr{D} is the probability of a circle having diameter D. Rewritten in lower case, it is

    p(ℓ) = Σ_d p(ℓ|d)·p(d)
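The overlay experiment of Figure 1.7 can be mimicked numerically. The sketch below (Python, not part of the original course materials) slices a unit-diameter circle at random many times: a line that hits the circle lies at a signed distance y from the centre, uniform on [−1/2, 1/2], and cuts a chord of length √(1 − 4y²). The mean apparent diameter tends to π/4, and multiplying by 4/π recovers the true diameter.

```python
import math
import random

random.seed(3)
N = 200_000

# Apparent diameters (chords) from random slices of a unit-diameter circle.
chords = [math.sqrt(1.0 - 4.0 * y * y)
          for y in (random.uniform(-0.5, 0.5) for _ in range(N))]

mean_apparent = sum(chords) / N             # biased low: tends to pi/4 ~ 0.785
corrected = mean_apparent * 4.0 / math.pi   # bias-corrected diameter estimate

assert abs(mean_apparent - math.pi / 4) < 0.005
assert abs(corrected - 1.0) < 0.01          # recovers the true diameter of 1
```

As with the paper overlay, the simulated mean agrees with the expected value π/4 within the precision allowed by the number of slices.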
  • 28. [FIGURE 1.7: 50 circles of equal diameter D located at random within a square area]

The general problem is now as follows: given an observed distribution p(ℓ) of apparent diameters, find the distribution of the true diameters p(d), given that p(ℓ|d) is specified. In Figure 1.8, for instance, 80 circles 1/4 inch in diameter and 20 circles 1/2 inch in diameter are located at random within a given square area.* Using an overlay with parallel lines spaced 1/2 inch apart, measure the apparent diameters of the circles intersected for several random orientations. Again, you may wish to construct a histogram. For this example, p(d) is given by

    p(d_i) = 0.8  for d_1 = 0.25
           = 0.2  for d_2 = 0.50
           = 0    otherwise

so that

    p(ℓ) = (1/t)·Σ_{i=1}^{2} p(d_i)·(ℓ/d_i)/√(1 − (ℓ/d_i)²)  for ℓ > 0

*Reduced scale used in Figure 1.8
  • 29. [FIGURE 1.8: 80 circles of 1/4-inch diameter and 20 circles of 1/2-inch diameter located at random within a square area, with a scale bar from 0 to 1/2 in.]

and therefore

    E(L) = ∫_{ℓ>0} ℓ·p(ℓ) dℓ / ∫_{ℓ>0} p(ℓ) dℓ = (π/4)·Σ_{i=1}^{2} p(d_i)·(d_i²/t) / Σ_{i=1}^{2} p(d_i)·(d_i/t)
  • 30.
    E(L) = (π/4)·[0.8×(1/8) + 0.2×(1/2)] / [0.8×(1/2) + 0.2] = π/12 ≈ 0.26 inch

where t has been set equal to the largest diameter. Compare the mean of the apparent diameters with this expected value. Once again, the values should agree within the precision allowed by the total number of measured intersections. While the solution to the general problem where there is a continuous distribution of particle diameters is obviously more difficult, the same principles apply as in this example.

These examples, as with the examples in the preceding section, have touched lightly on the broader topic of geometric probability. Readers who wish to pursue details of the subject further can refer to the short but lucid monograph by Kendall and Moran (1963) and two more recent review articles by Moran (1966, 1969). Within the field of geology, geometric probability has been applied to the study of particle size in thin section (Rose, 1968) and to the probability of success in locating elliptical targets underground with square, rectangular, and hexagonal grids (Singer, 1972). From these references, particularly the latter two, the reader will recognize the difficulty in translating the relatively straightforward equations discussed in this chapter to the more complex situations met with in practice. Despite these difficulties, however, the concept of geometric probability offers a very practical device for studying the uncertainty associated with spatial form in geology.

REFERENCES

Kendall, M. G., and Moran, P. A. P., 1963, Geometrical probability: London, Chas. Griffin, Ltd., 125 p.

Moran, P. A. P., 1966, A note on recent research in geometrical probability: Jour. App. Prob., v. 3, p. 453-463.

Moran, P. A. P., 1969, A second note on recent research in geometrical probability: Adv. App. Prob., v. 1, p. 73-90.
Rose, H. E., 1968, The determination of the grain size distribution of a spherical granular material embedded in a matrix: Sediment., v. 10, p. 293-309.

Singer, G. A., 1972, Ellipgrid, a FORTRAN IV program for calculating the probability of success in locating elliptical targets with square, rectangular, and hexagonal grids: Geocom. Programs, v. 4, p. 1-10.
Chapter 2

R- and Q-Mode Factor Analysis

J. E. Klovan

2.1 MEANING OF FACTOR

Factor analysis is a generic term that describes a variety of mathematical procedures applicable to the analysis of data matrices. Although developed, and largely exploited, by psychologists, it is a method of general application to many branches of scientific enquiry, and geology is no exception.

At the outset the word "factor" requires precise definition because the way it is interpreted can give a false impression as to what the method attempts to do. Mathematically, a factor refers to one of a number of things that when multiplied together yield a product. Another use of the word is in reference to some sort of theoretical or hypothetical causal variable. As will become clear, it is the former meaning that should be applied to the method; occasionally the second meaning may be applicable to the results of the method.

The principles of the mathematics involved in factor analysis were outlined by Pearson in 1901. Starting in 1904, Spearman began applying the method to psychological theories. Thurstone, Holzinger, and a large number of other workers expanded on the method during the 1930's and 1940's. The advent of electronic computers in the 1950's made the laborious calculations involved amenable to quick solution and the methods became widely available. In the late 1950's the method was first applied to geologic problems.

Geologists are commonly faced with problems wherein a large number of properties are measured or described on a large number of things. The "things" may, for example, be rocks and the "properties" may be the
amounts of various minerals making up the rocks. If these data are arranged in tabular form such that each rock represents a row of the table and each mineral species a column, then the resulting chart of numbers is referred to as a data matrix.

Analysis of such a data matrix may pose a considerable problem to the investigator if it contains many numbers. The primary aim of factor analysis is to achieve a parsimonious description of such a data matrix -- that is, to determine if this table of numbers can be simplified in some way. Returning to rocks and minerals as a concrete example, perhaps there are a small number of mineral assemblages which, if determined, describe the rocks almost as well as all the amounts of the individual minerals. In this case the objective of factor analysis would be to simplify the original large data matrix by determining

1. The number of mineral assemblages present
2. The composition of each assemblage in terms of the original mineral species
3. A description of each rock sample in terms of the amount of each assemblage present in it

The present chapter will attempt to outline the mathematical procedures used in one of the methods of factor analysis, namely, the method of principal components. A simplified heuristic approach will be followed that will attempt to make use of the geologists' ability to visualize three-dimensional concepts. Several simple examples will be used to lead the reader through a formidable mathematical jungle, and finally, some real applications of the method are briefly explained. For readers with no experience with matrix algebra, the Appendix contains concepts that may be helpful in the following exposition.

2.2 DATA MATRICES

A matrix is a table of numbers with so many rows and so many columns. As a matter of convention here, the rows of a data matrix will represent geologic entities, the columns will represent attributes of these entities.
In most cases there will be more entities than
attributes, so that most data matrices are rectangular in shape and are "taller" than they are "wide." More simply, data matrices tend to have more rows than columns.

In the terminology of matrix algebra, an entire matrix is symbolized by a capital letter. "X" will be used to symbolize any data matrix. The size of the matrix is specified by a double subscript notation; thus X_{N,n} refers to a table of numbers with N rows and n columns. If 93 rocks have been analyzed for 12 minerals, the resulting data matrix may be symbolized as X_{93,12}.

The entities of a geologic data matrix will depend on the nature of the problem. Rock or sediment specimens are obvious cases. Samples of water or oil collected from various formations are also common. (Note that the word "sample" raises some semantic problems in that it carries a special statistical connotation.)

Attributes, often referred to as variables, also depend on the nature of the problem. A rock may be analyzed as to its mineral components, in which case the amount of each mineral is considered an attribute. The rock could equally well (or, as well) be analyzed in terms of certain chemical elements. The amount of an element then becomes an attribute.

Clearly, attributes do not exist in and of themselves; they are properties of things. It is important, therefore, to define at the outset of an investigation what is an entity and what is an attribute. A fossil, for example, may in one study be considered an entity and various features of it will be attributes. Or, in another study, the amount of that fossil in a stratum may be considered an attribute of that stratum.
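The row/column convention above can be made concrete with a toy data matrix (a sketch in modern Python with NumPy; the mineral names and percentages are invented for illustration):

```python
import numpy as np

# Hypothetical data matrix X[N, n]: N = 5 rock specimens (entities, rows),
# n = 3 mineral percentages (attributes, columns).
X = np.array([
    [45.0, 30.0, 25.0],   # rock 1: quartz, feldspar, mica (percent)
    [50.0, 28.0, 22.0],   # rock 2
    [40.0, 35.0, 25.0],   # rock 3
    [55.0, 25.0, 20.0],   # rock 4
    [48.0, 30.0, 22.0],   # rock 5
])

N, n = X.shape            # N = 5 rows, n = 3 columns: a "tall" matrix
print(N, n)               # 5 3
```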
2.3 FACTOR ANALYTIC MODES

Confronted with a data matrix, the investigator may focus his attention on two distinct yet interrelated questions:

R-mode: If the primary purpose of the investigation is to understand the interrelationships among the attributes, then the analysis is said to be an R-mode problem.
Q-mode: If the primary purpose is to determine interrelationships among the entities, then the analysis is referred to as Q-mode.

In many cases both R- and Q-mode analyses are performed on the same data matrix. As discussed later, factor analysis is applicable to both types of questions. The essential solutions of the factor analysis are only slightly dependent on the mode. The exact nature of this relationship is described in more complete detail in a recent paper (Klovan and Imbrie, 1971).

2.4 THE R-MODE MODEL

Given the data matrix X_{N,n}, the basic problem is to determine m linear combinations of the original n variables that describe the geological entities without significant loss of information (assuming m << n). These m linear combinations are termed factors.

The method of analysis operates not on the original data matrix but rather on the matrix of correlation coefficients derived from the data matrix. The well-known Pearson product-moment correlation coefficient is the standard means of assessing the degree of linear relationship between a pair of variables.

If X_i and X_j are any two variables, that is, two columns from the data matrix X, then the correlation coefficient between them may be computed from:

r_ij = Σ_{k=1}^{N} (X_ki - X̄_i)(X_kj - X̄_j) / sqrt[ Σ_{k=1}^{N} (X_ki - X̄_i)² Σ_{k=1}^{N} (X_kj - X̄_j)² ]   (2.4.1)

where the notation Σ_{k=1}^{N} refers to summation over all the entities; X̄_i and X̄_j are the mean values of variables X_i and X_j.

This is the so-called raw score formula and the situation is portrayed in Figure 2.1.
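The raw-score formula (2.4.1) can be sketched directly in modern Python with NumPy (the two short variables below are invented for illustration):

```python
import numpy as np

def pearson_r(xi, xj):
    """Raw-score Pearson product-moment correlation, formula (2.4.1)."""
    xi, xj = np.asarray(xi, float), np.asarray(xj, float)
    di, dj = xi - xi.mean(), xj - xj.mean()      # deviations from the means
    return (di * dj).sum() / np.sqrt((di**2).sum() * (dj**2).sum())

xi = [1.0, 2.0, 3.0, 4.0, 5.0]
xj = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson_r(xi, xj)

# agrees with NumPy's built-in correlation routine
assert np.isclose(r, np.corrcoef(xi, xj)[0, 1])
```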
[FIGURE 2.1: scatter diagrams for pairs of variables: Var 1 vs Var 2 with r_12 = +1.0; Var 5 vs Var 6 with r_56 = 0.0; Var 3 vs Var 4 with r_34 = -1.0]

The origin of this graph may be shifted without changing the configuration of the points. If the mean value of a variable is subtracted from every value of the variable, the results are deviate scores. The resulting numbers show how far from the mean each entity is. This also results in shifting the origin of the variable to its mean value.

If we define x_i and x_j as two variables in deviate form, then the correlation formula becomes

r_ij = Σ_{k=1}^{N} x_ki x_kj / sqrt( Σ_{k=1}^{N} x_ki² Σ_{k=1}^{N} x_kj² )   (2.4.2)

This situation is shown on Figure 2.2.

A variable in standard form is defined as

z_i = (X_i - X̄_i) / σ_i = x_i / σ_i   (2.4.3)
[FIGURE 2.2: the scatter diagrams of Figure 2.1 replotted in deviate form, with the origin of each variable shifted to its mean]

where σ_i is the standard deviation of variable X_i. A standardized variable may be viewed as having a mean of zero and a standard deviation of one. The individual values of the variable show how far from the mean an entity is in terms of units of standard deviation. The standard deviation of a variable is given by

σ_i² = Σ_{k=1}^{N} (X_ki - X̄_i)² / N = Σ_{k=1}^{N} x_ki² / N   (2.4.4)

or

σ_i = sqrt( Σ_{k=1}^{N} x_ki² / N )   (2.4.5)

(Editor's note: This definition differs slightly from the one used in Chapter 1 in that N, rather than N-1, is in the denominator. In factor analysis, the sample size is usually large enough so that this difference can be safely ignored.)

Thus

z_i = (X_i - X̄_i) / σ_i = x_i / σ_i   (2.4.6)
Substituting this into formula (2.4.2) we obtain

r_ij = (1/N) Σ_{k=1}^{N} z_ki z_kj   (2.4.7)

This formula illustrates the fact that the correlation coefficient is nothing more than the average value of the cross-product between two variables given in standard form.

Up to now only two variables have been considered. In the general case, correlation coefficients are computed between every possible pair of variables and arranged into a square symmetrical matrix R_{n,n}. This matrix contains all the information regarding the pairwise linear relationships between the variables.

In Figure 2.1 the correlation coefficient was perceived to measure the degree of linear association between two variables as measured by the scatter of data points. Note that the axes of the graph are the variables and the entities are points on the graph. If three variables are considered, then the third variable is constructed as an axis at right angles to the other two, and the entities will form some sort of three-dimensional swarm of points. Other variables can be added by constructing axes at right angles to all other axes but of course this situation cannot be portrayed in three dimensions. A row of the data matrix may then be considered as a vector that gives the coordinates of an entity in n-dimensional space.

The situation may be reversed. A graph may be constructed using the entities as sets of orthogonal axes as in Figure 2.3. Here the variables become points on the graph. A column of the data matrix may then be considered as a vector that gives the coordinates of a variable in N-dimensional space.

If the variables are expressed in deviate form, that is, the origin is at the mean, then variable X_i and variable X_j can be portrayed as two vectors in N-space. From the Pythagorean theorem, the length of a vector is equal to

ℓ_i = sqrt( Σ_{k=1}^{N} x_ki² )   (2.4.8)
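The equivalence of the standard-score form (2.4.7) and the deviate-score form (2.4.2) can be checked numerically (a sketch in Python with NumPy; the data are randomly generated):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))        # N = 100 entities, two variables

x = X - X.mean(axis=0)               # deviate scores: origin at the mean
z = x / X.std(axis=0)                # standard scores (2.4.6); np.std
                                     # divides by N, matching the text

N = len(X)
r_z = (z[:, 0] * z[:, 1]).sum() / N  # average cross-product (2.4.7)

# deviate-score form (2.4.2), written as a ratio of the cross-product to
# the two vector lengths (2.4.8); this same ratio is the cosine of the
# angle between the variable vectors in N-space
li = np.sqrt((x[:, 0] ** 2).sum())
lj = np.sqrt((x[:, 1] ** 2).sum())
r_x = (x[:, 0] * x[:, 1]).sum() / (li * lj)

assert np.isclose(r_z, r_x)          # the two forms agree
```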
[FIGURE 2.3: variables plotted as vectors on orthogonal axes defined by the entities (Item 1, ...)]

Further, elementary trigonometry shows that the angle θ between the two vectors is given by

cos θ = Σ_{k=1}^{N} x_ki x_kj / (ℓ_i ℓ_j)   (2.4.9)

which is exactly equivalent to formula (2.4.2). Thus the correlation coefficient between any two variables is also the cosine of the angle between the two vectors representing the variables situated in N-space. Both interpretations of the correlation coefficient will be found useful in the following discussion.

The following equation perhaps best summarizes the underlying rationale of factor analysis:
z_j = a_j1 F_1 + a_j2 F_2 + ... + a_jm F_m + a_j E_j   (2.4.10)*

In words, the equation states that any variable (for convenience considered in standard form), z_j, consists of a linear combination of m common factors plus a unique factor. The resemblance of this equation to a multiple regression equation should be obvious.

In the factor model, the F's refer to hypothetical variables called factors. It is assumed that each of these m factors will be involved in the delineation of two or more variables; thus the factors are said to be common to several variables; m is assumed to be less than n, the number of variables. The a's are analogous to β weights in regression analysis. They are weights to be applied to the factors so that the factors can best predict the value of z_j; "best" defined in a least-squares sense. In factor analysis parlance, the a's are termed loadings and the F's factor scores. The factor designated E_j is a factor unique to variable z_j and is analogous to the error term in a regression equation.

The factor model contains n such equations, one for each variable. For a particular entity k, equation (2.4.10) becomes:

z_kj = a_j1 F_k1 + a_j2 F_k2 + ... + a_jm F_km + a_j E_kj   (2.4.11)

The values for the a's do not change from entity to entity (just as the β's remain constant in regression equations), but the values of the F's do change from entity to entity. An excellent way to view the F's is to think of them as new variables that are linear combinations of the old variables. As such, each entity can "contain" a different amount of each one of these new variables. The F's are referred to as factor scores.

The basic problem then is threefold:

1. To determine values for the a's
2. To determine values for the F's
3. To determine m, the number of common factors

*In this equation and the following derivation, the author realizes that he is confusing the principal component model with that of a true factor analytic model.
The justification for this is that, in practice, most geologic applications follow this model, and, additionally, it is easier to explain the underlying rationale and objectives in this form.
There are several ways in which an explanation of the solution can be approached. Two such approaches will be dealt with here.

The equation for the variance of a variable in standard form is given by

σ_j² = (1/N) Σ_{k=1}^{N} z_kj²   (2.4.12)

Due to the standardization process, the variance of z_j is, of course, equal to one. In terms of the factor model the variance may be written as:

σ_j² = (1/N) Σ_{k=1}^{N} z_kj² = a_j1² ΣF_k1²/N + a_j2² ΣF_k2²/N + ... + a_jm² ΣF_km²/N + cross-product terms + a_j² ΣE_kj²/N = 1   (2.4.13)

Two simplifying restrictions may now be imposed.

1. The factors must be in standard form.
2. The factors must be uncorrelated.

The first constraint makes every term of the form ΣF_kp²/N equal to one (since this is the variance of the factor). The second constraint makes every term of the form ΣF_kp F_kq /N equal to zero [see equation (2.4.7)]. The entire equation thus becomes:

1 = a_j1² + a_j2² + ... + a_jm² + a_j²   (2.4.14)

It is seen that the total variance of a variable is made up of the sum of the squared a's. Further, the total variance consists of two parts.

1. That due to the common factors. This is termed the communality, symbolized h_j².

h_j² = a_j1² + a_j2² + ... + a_jm²   (2.4.15)
2. That due to the unique factor. This of course is equal to 1 - h_j² and by definition is that part of the variance of variable j that is not shared by any of the other variables. It is analogous to the error term in regression analysis. The method of principal components attempts to minimize this unique variance in the solution for the a's and F's.

The algebraic notation of the factor model is very cumbersome and not readily comprehended. Matrix notation allows an easier representation of the model. As has been pointed out, the X_{N,n} data matrix can be transformed to the standardized version Z_{N,n}. We can consider the Z matrix as being the sum of two matrices:

Z_{N,n} = C_{N,n} + E_{N,n}   (2.4.16)

where C contains the "true" measures and the matrix E contains "error" measures. It is a fact that any matrix can be expressed as the product of two other matrices. Thus the matrix Z can be considered as the product of the matrix F and A, or

Z = FA'   (2.4.17)

where A' is the transpose of A. The F matrix contains the factor scores; the A matrix the factor loadings. But the model contains two types of F's and A's, the common and unique portions. The factor loading matrix, with n rows, can be considered as consisting of two parts,

A = [ A_c | A_u ]

The first m columns contain the common factor loadings; the last u columns contain the unique factor loadings; A_u is a diagonal matrix.
Similarly, the F matrix can be partitioned into a part containing m columns of common factor scores and a part containing u columns of unique factor scores:

F = [ F_c | F_u ]

The model is now evident; A_c and F_c will be chosen in such a way as to yield the matrix C; A_u and F_u yield the matrix E. The sum of C and E, of course, yields the original matrix Z.

To summarize, the total data matrix is considered to be derivable from the product of two other matrices (Z = FA'). Z can further be considered as the sum of two matrices C and E; C containing "true" measures, E containing error measures: Z = C + E. Both F and A can be partitioned into two components, a common variance part and an error part. Thus

Z = F_c A_c' + F_u A_u'   (2.4.18)

Because we are usually only interested in the matrix C, solution for F_c and A_c will be sufficient. E can always be obtained from E = Z - C.

The basic matrix manipulations required for solution are presented below in point form. Each step is then explained with reference to a simple geometric model.

1. The correlation matrix may be obtained from

R = (1/N) Z'Z   (2.4.19)

2. The basic factor model states that

Z = FA'   (2.4.20)
or

Z' = AF'   (2.4.21)

3. Substituting (2.4.20) and (2.4.21) into (2.4.19) and ignoring the constant 1/N, we obtain

R = Z'Z = AF'FA'   (2.4.22)

4. We impose the condition that the factors will be uncorrelated; that is, F will be orthonormal:

F'F = I   (2.4.23)

where I is the identity matrix. Thus, (2.4.22) becomes

R = AA'   (2.4.24)

5. We impose the constraint that

A'A = Λ   (2.4.25)

where Λ is the diagonal matrix of eigenvalues of the correlation matrix R, or

U'RU = Λ   (2.4.26)

where U contains the eigenvectors associated with Λ. U is a square orthonormal matrix so that:

U'U = UU' = I   (2.4.27)

6. The following matrix manipulation provides the solution.

(a) Pre-multiply (2.4.26) by U:

UU'RU = UΛ
RU = UΛ   (2.4.28)

(b) Post-multiply (2.4.28) by U':

RUU' = UΛU'
R = UΛU'   (2.4.29)

(c) Because R is a square symmetric matrix with the Gramian property (positive semi-definiteness):

7. Substituting into (2.4.25),

R = AA' = UΛ^(1/2) Λ^(1/2) U'   (2.4.30)
so that

A = UΛ^(1/2)   (2.4.31)

8. The matrix F may be solved for from

Z = FA'
ZA = FA'A = FΛ
F = ZAΛ^(-1)   (2.4.32)

Explanation

Step 1. Earlier it was shown that this equation was valid. Geometrically, Figure 2.4 shows the scatter diagram interpretation of a situation involving three variables. Note that the swarm of data points is in the form of a three-dimensional ellipsoid. Theoretically, every correlation matrix will define such an ellipsoid -- a hyperellipsoid when more than three dimensions are involved.

Step 2. The basic factor equation states that the data matrix can be considered as the product of two matrices F and A. Unfortunately, matrix theory shows that there is an infinite number of pairs of matrices F and A whose product will reproduce Z.

[FIGURE 2.4: three-dimensional ellipsoidal swarm of data points for three variables (Var 1, Var 2, Var 3)]
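Steps 1-8 can be sketched numerically (Python with NumPy assumed; random data stand in for a real data matrix, and all n factors are retained so that the reproduction is exact):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))               # fictitious data: N = 50, n = 4
N, n = X.shape

Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized data matrix
R = Z.T @ Z / N                            # step 1: R = (1/N) Z'Z   (2.4.19)

lam, U = np.linalg.eigh(R)                 # eigenvalues/eigenvectors of R
order = np.argsort(lam)[::-1]              # largest eigenvalue first
lam, U = lam[order], U[:, order]

A = U * np.sqrt(lam)                       # A = U Lambda^(1/2)      (2.4.31)
F = Z @ A / lam                            # F = Z A Lambda^(-1)     (2.4.32)

assert np.allclose(A @ A.T, R)             # R = AA'                 (2.4.24)
assert np.allclose(F @ A.T, Z)             # Z = FA'                 (2.4.17)
assert np.allclose(F.T @ F / N, np.eye(n)) # factors standardized and
                                           # uncorrelated (cf. 2.4.23,
                                           # with the 1/N absorbed)
assert np.allclose((A**2).sum(axis=1), 1)  # communalities = 1 when m = n
```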
Step 3. This equation simply shows the relation between F, A, R, and Z.

Step 4. Because there is an infinite number of pairs of matrices whose product will yield Z, we impose a constraint that the F matrix be orthonormal. Simply, this means that factors will be in standard form and furthermore, they will be uncorrelated. If we consider these factors to be new variables then this implies that the new variables have no mutual correlation among them. Because of this constraint, it is seen that the F matrix can for the moment be disregarded from further consideration.

Steps 5-6. The crux of the method of principal components is embodied in equation (2.4.25). Any square symmetric matrix, such as R, can be uniquely defined in terms of two other matrices that have special properties.

In the expression R = UΛU', Λ is a diagonal matrix containing the eigenvalues of R. U is a square orthonormal matrix containing the associated eigenvectors. The calculation of eigenvalues and eigenvectors is a straightforward matter using computer programs. Essentially, eigenvalues are the roots of a series of partial derivative equations set up so as to maximize the variance and retain orthogonality of the factors. Physically, the eigenvectors merely represent the positions of the axes of the ellipsoid (or hyperellipsoid) shown on Figure 2.4. The eigenvalues are proportional to the lengths of these axes. The largest eigenvalue and its corresponding eigenvector represent the major axis of the ellipsoid. It is important to note that the data points show maximum spread along this axis, that is, the variance of the data points is at a maximum. The second largest eigenvalue and its eigenvector represent the largest minor axis. The axis is, of course, at right angles to the major axis and the data points are seen to have the second largest amount of variance along this direction.
The same reasoning applies to the remaining eigenvalues and eigenvectors. So what is accomplished at this step is to create a new frame of reference for the data points. Rather than using the old set of variables as reference axes we can use the eigenvectors instead. These
have the property that they are located along directions of maximum variance and are uncorrelated.

Step 7. The equation R = AA' suggests that the correlation matrix derived from the original data can be duplicated exactly by the major product moment of the factor loadings matrix. This is true if as many factors as original variables are used. However, the matrix A_c of equation (2.4.18) will approximate R. The difference matrix, R - A_c A_c', contains the residual correlations not accounted for by the common factors. The determination of the number of common factors needed is left for later discussion.

The end result of the matrix manipulations is the equation A = UΛ^(1/2). This simply means that the desired matrix of factor loadings is the orthogonal matrix of eigenvectors of R, each column of which is scaled by the square root of the corresponding eigenvalue. This is merely a normalization process.

Step 8. The matrix of factor scores is obtained by straightforward matrix manipulation.

2.5 A PRACTICAL EXAMPLE

To recapitulate what has been discussed in rather abstract terms, and to give physical significance to the method as explained thus far, a simple geologic problem will be followed through.

Figure 2.5 is a typical geologic data matrix with 20 rows and 10 columns. The rows represent 20 localities and the columns, as indicated, represent attributes of the rocks and structures at each locality. The data are fictitious. Figure 2.6 is the 10 x 10 correlation matrix obtained from these data. Figure 2.7 is the list of eigenvalues obtained from the correlation matrix.
[FIGURE 2.5: Data matrix. The rotated column headings (ten geological properties measured at each locality) are garbled in the source.]

Locality     1     2     3     4     5     6     7     8     9    10
    1     1175   999   975   625   158   262   437   324   431   433
    2      936   820   813   575   267   379   478   413   411   428
    3      765   711   716   599   457   548   579   558   491   513
    4      624   598   600   542   471   515   531   520   490   500
    5      417   422   422   432   444   441   437   439   437   437
    6      401   403   375   401   405   270   317   290   515   465
    7      520   504   488   469   427   370   410   386   507   482
    8      661   626   618   553   462   466   506   480   529   523
    9      877   787   773   594   354   401   493   434   500   498
   10     1060   932   898   656   315   312   468   370   580   552
   11     1090   960   935   681   334   375   518   427   567   555
   12      896   811   790   629   403   411   511   448   570   555
   13      748   688   672   560   401   399   472   426   525   512
   14      617   573   553   477   360   315   385   342   487   462
   15      436   424   389   393   361   207   277   236   514   455
   16      664   587   560   419   212   182   287   221   397   369
   17      750   665   651   484   259   299   387   331   399   396
   18      903   797   791   573   291   396   486   427   421   437
   19      998   888   887   657   366   499   583   527   480   506
   20     1162   999   994   671   252   404   539   450   449   471

Data matrix
FIGURE 2.5
Eigenvalues of correlation matrix

             Eigenvalue   Cumulative percent of variance explained
Factor I        5.46          54.61
Factor II       3.19          86.54
Factor III      1.35         100.00

FIGURE 2.7

There are only three nonzero eigenvalues (that there are three and only three is a reflection of the "cooked-up" nature of the problem). Geometrically, the analysis began with the 20 data points scattered in 10-dimensional space. Because there are only three nonzero eigenvalues, the implication is that the hyperellipsoid enclosing the data points has seven axes of zero length and exists, in fact, as an ordinary three-dimensional ellipsoid. Therefore the data points can be located with reference to three mutually perpendicular axes instead of the original 10.

The A matrix, in Figure 2.8, will be found to reproduce R exactly from R = AA'. The F matrix in Figure 2.9 will, in conjunction with A, reproduce the standardized version of the data matrix according to Z = FA'. The geologic interpretation of these matrices will be deferred until some additional concepts are put forward.

Principal Component Factor Matrix

Var   Comm.    Factor 1   Factor 2   Factor 3
 1    1.0000    0.8029    -0.5894     0.0886
 2    1.0000    0.8385    -0.5367     0.0940
 3    1.0000    0.8579    -0.5122     0.0407
 4    1.0000    0.9760    -0.1961     0.0943
 5    1.0000    0.0176     0.9998    -0.0098
 6    1.0000    0.6538     0.5999    -0.4611
 7    1.0000    0.9297     0.2393    -0.2799
 8    1.0000    0.7647     0.5018    -0.4042
 9    1.0000    0.3268     0.5407     0.7751
10    1.0000    0.6641     0.5437     0.5132

Variance       54.614     31.928     13.459
Cum. var       54.614     86.542    100.000

Principal Factors of Correlation Matrix
FIGURE 2.8
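The connection between a "cooked-up" data matrix and vanishing eigenvalues can be reproduced with a small simulation (a sketch in Python with NumPy; the numbers are invented and are not the chapter's data):

```python
import numpy as np

rng = np.random.default_rng(3)
N, n, m = 20, 10, 3

# ten observed variables built as exact linear combinations of only
# three underlying variables, in the spirit of Figure 2.5
hidden = rng.normal(size=(N, m))
X = hidden @ rng.normal(size=(m, n))

Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals = np.sort(np.linalg.eigvalsh(Z.T @ Z / N))[::-1]

# exactly three eigenvalues are (numerically) nonzero; the hyperellipsoid
# collapses to an ordinary three-dimensional ellipsoid
assert np.sum(eigvals > 1e-10) == m
# the eigenvalues sum to n, the trace of the correlation matrix
assert np.isclose(eigvals.sum(), n)
```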
Principal Factor Score Matrix

Locality   Factor 1   Factor 2   Factor 3
    1       0.3887    -2.2383     0.1359
    2       0.1989    -0.9798    -1.1363
    3       0.9083     1.2182    -1.0941
    4       0.2926     1.3964    -0.9847
    5      -0.9595     1.0959    -1.4845
    6      -1.6185     0.6760     0.9116
    7      -0.7464     0.9150     0.1871
    8       0.2992     1.2964     0.0020
    9       0.4987     0.0377     0.1130
   10       0.9111    -0.4040     2.1310
   11       1.2932    -0.1946     1.5206
   12       0.8987     0.6101     1.1914
   13       0.1935     0.5946     0.4475
   14      -0.8159     0.1323     0.3074
   15      -1.8532     0.1726     1.3474
   16      -1.7613    -1.5741    -0.2310
   17      -0.8554    -1.0525    -0.9248
   18       0.2300    -0.6998    -1.1155
   19       1.3189     0.1578    -0.7823
   20       1.1785    -1.1592    -0.5417

FIGURE 2.9

2.6 FACTORS

The basic factor equation, Z = FA', states that the data matrix can be considered as the product of two factors F and A. The mathematical procedure used to obtain these matrices has been outlined, but it is now necessary to explain what they signify and how they are interpreted and used.

The matrix of factor scores, F, in general, consists of N rows and m columns where N is the number of entities and m equals the number of common factors. Each column is in standard form with zero mean and unit variance, and there is zero correlation between columns.

Because the factors are linear combinations of the original variables, they can themselves be considered as new variables with the abovementioned properties. Scanning down a column of F, the "amount" of this new variable as contained in each entity is revealed. Being in standard form, factor scores are expressed in units of standard
deviation from the mean of the hypothetical variable. Thus, the variation from entity to entity is expressed in relative terms only. Nonetheless, these new variables can be plotted and manipulated in the same way as any other variable. Using the factors as orthogonal axes, the entities may be plotted on a scattergram to show the distribution of entities in m-dimensional space.

The matrix of factor loadings, A, generally has n rows and m columns. The rows correspond to the original variables; the columns are the factors. Each column has been scaled so that the sum of squared elements in the column is equivalent to the amount of original variance accounted for by that factor. The elements in a column may be considered as the coefficients of a linear equation relating the variables to the factor -- in essence, they give the recipe for the factor. Therefore, the columns of the A matrix can be used to give some physical meaning to the factors.

A row of the A matrix shows how the variance of a variable is distributed among the factors. Interrelationships between variables can be determined by a comparison of their rows in the A matrix. As was pointed out in equation (2.4.15), the sum of the factor loadings squared in a row of A is an expression of the amount of variance of a variable accounted for by the m factors. This was termed the communality. The communality attached to each row of the A matrix gives an appreciation of how well each variable is explained by the m factors considered.

Another valid view of an element in A is that it represents the correlation between a variable and a factor. Because correlations are angular measures, the row elements actually represent the cosines between a variable and the m reference factor axes. A useful way to analyze and interpret the factors is to plot the loadings as two-dimensional scattergrams. For m factors there will be m(m - 1)/2 such graphs, which in effect gives two-dimensional snapshots of m-space. Groupings of variables and trends between them often yield important clues as to the physical significance of the factors. A practical example of interpretation is deferred until the matter of rotation has been discussed.
It has been stressed several times that there is an infinite number of solutions for the equation Z = FA'. This is so because there is an infinite number of pairs of matrices F and A that will reproduce Z. The method of principal components determines a unique solution because certain constraints are imposed, namely, that the F matrix is orthonormal and that the A matrix contains the eigenvectors of the correlation matrix produced from Z. These constraints yield a solution with two desirable properties; the factor axes are orthogonal and pass through positions of maximum variance. But, they also possess an undesirable property. Considered as new variables, they are very general and, in fact, correspond to a sort of average of all the original variables. Although this may in some instances be useful, it is common to move the positions of the factor axes by rotating them so that they will satisfy certain other criteria. An attempt is made to achieve what is termed "simple structure," by which is meant that the factor axes are located in positions such that:

1. For each factor only a relatively few variables will have high loadings, and the remainder will have small loadings.
2. Each variable will have loadings on only a few of the factors.
3. For any given pair of factors, a number of variables will have small loadings on both factors.
4. For any given pair of factors, some of the variables will have high loadings on the second factor but not on the first.
5. For any given pair of factors, very few of the variables will have high loadings on both.

What these conditions attempt to achieve is to place the factor axes in more meaningful positions, that is, so that they will be highly correlated with some of the original variables. A large number of methods have been designed to accomplish these objectives but only two will be considered here.
An approximation to simple structure, designed by Kaiser (1958), uses a rigid rotation procedure. This means that the orthogonal principal component factors will be rigidly rotated and maintained orthogonal. Kaiser's approach is to find a new set of positions for the principal factors such that the variance of the factor loadings on each
2. R- AND Q-MODE FACTOR ANALYSIS

factor is a maximum; the loadings should tend toward unity and zero (the sum of the variance for all m factors is the actual quantity maximized). That is, simple structure should be obtained when the value of V in the following expression is maximized:

    V = Σ (p=1..m) [ (1/n) Σ (j=1..n) b_jp^4 − ( (1/n) Σ (j=1..n) b_jp^2 )^2 ]   (2.6.1)

where b_jp is the loading of variable j on factor p on the new, rotated factor axes. The full explanation of this equation is rather too involved for these notes, and the reader is referred to Harman (1960, p. 301) for a full discussion.

The process can be readily understood in terms of matrix algebra. Given the n by m matrix of principal factor loadings A, the objective is to transform it to an n by m matrix of varimax factor loadings B such that B will satisfy equation (2.6.1). Usually, factors are transformed (or rotated) two at a time. In matrix terms, this can be accomplished by

    B = AT   (2.6.2)

where T consists of

    T = | cos φ   −sin φ |
        | sin φ    cos φ |

φ is the angle of rotation required to yield a maximum value of V in equation (2.6.1) and is determined by an iterative process.

The matrix B contains the loadings of the original n variables on the m rotated factors and can be interpreted in the same way as the A matrix. Factor scores for the varimax factors can also be computed. Again the values can be interpreted in the same way as the principal factor scores. The varimax factor scores remain in standard form but the factors are now slightly correlated. Figure 2.10 shows the rotated, varimax factor matrix and Figure 2.11 the associated factor scores. Figure 2.12 illustrates and compares the plots of principal components and varimax loadings derived from the data matrix of Figure 2.5.
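The rigid rotation described above can be sketched numerically. The following is a minimal sketch, not the original program: it uses the standard SVD-based varimax iteration (equivalent in aim to Kaiser's pairwise angle search), and the loading matrix A is invented for illustration.

```python
import numpy as np

def varimax(A, max_iter=100, tol=1e-8):
    """Rigid (orthogonal) rotation of an n x m loadings matrix A toward
    Kaiser's varimax criterion, via the standard SVD iteration.
    Returns the rotated loadings B = A @ T and the rotation matrix T."""
    n, m = A.shape
    T = np.eye(m)
    v_old = 0.0
    for _ in range(max_iter):
        B = A @ T
        # gradient of the (raw) varimax criterion with respect to T
        G = A.T @ (B**3 - B @ np.diag((B**2).sum(axis=0)) / n)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                       # nearest orthonormal matrix to G
        if s.sum() - v_old < tol:
            break
        v_old = s.sum()
    return A @ T, T

def V(L):
    """Varimax criterion (2.6.1): summed variance of the squared loadings."""
    n = L.shape[0]
    return float(((L**4).sum(axis=0) / n - ((L**2).sum(axis=0) / n)**2).sum())

# invented principal-factor loadings: two clusters of variables
A = np.array([[0.89, 0.43], [0.93, 0.35], [0.46, 0.87], [0.39, 0.90]])
B, T = varimax(A)
print(V(A) < V(B))                       # rotation increases the criterion
print(np.allclose(T.T @ T, np.eye(2)))   # T is a rigid (orthonormal) rotation
```

Because T is orthonormal, the rotated factors stay mutually perpendicular, exactly as the text requires of Kaiser's procedure.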
Varimax Factor Matrix

    Var   Comm.    Factor 1   Factor 2   Factor 3
     1    1.0000    0.9971     0.0765    -0.0060
     2    1.0000    0.9916     0.1241     0.0362
     3    1.0000    0.9835     0.1804     0.0117
     4    1.0000    0.8813     0.3985     0.2540
     5    1.0000   -0.6197     0.5880     0.5198
     6    1.0000    0.0558     0.9897     0.1317
     7    1.0000    0.5191     0.8380     0.1680
     8    1.0000    0.2102     0.9648     0.1580
     9    1.0000    0.0146     0.0488     0.9987
    10    1.0000    0.2338     0.3979     0.8872

    Variance          44.771    33.318    21.912
    Cum. var          44.771    78.089   100.000

FIGURE 2.10

Note that in both cases, graphs of the factor loadings reveal similar patterns but with the varimax loadings the factor axes are located near the extreme clusters of variables. Interpretation is thus facilitated. As can be seen from the graph and scanning down the columns of the varimax factor loading matrix B, loadings on variables 1, 2, and 3 are extremely high on Factor 1 and very low on the other two factors. This would lead to the interpretation that the "new" variable, Factor 1, is in some way a paleotemperature index because variables 1, 2, and 3 are all geologic paleothermometers. The first column of the varimax factor score matrix thus shows the distribution of paleotemperature at each of the 20 localities in terms of units of standard deviation away from the mean paleotemperature. Similarly, Factor 2 of the B matrix may be interpreted in terms of deformation, while Factor 3 represents some index of permeability.
Varimax Factor Score Matrix

    Locality  Factor 1  Factor 2  Factor 3
        1       1.7191   -1.1345   -0.9480
        2       0.6130    0.2141   -1.3682
        3      -0.2291    1.8610    0.0226
        4      -0.7962    1.5403    0.0196
        5      -1.6301    0.9332   -0.8890
        6      -1.5345   -1.0766    0.6182
        7      -1.1157   -0.0212    0.4175
        8      -0.5908    0.9143    0.7663
        9       0.3711    0.2439    0.2588
       10       1.2434   -0.9398    1.7595
       11       1.3197   -0.2480    1.4876
       12       0.4639    0.1764    1.5324
       13      -0.1684    0.1884    0.7211
       14      -0.6634   -0.5747    0.0782
       15      -1.3364   -1.7591    0.6394
       16      -0.3829   -1.7847   -1.5171
       17      -0.1144   -0.5587   -1.5440
       18       0.4618    0.3838   -1.1944
       19       0.7982    1.3086   -0.1605
       20       1.5620    0.3336   -0.6997

FIGURE 2.11

This rather simple example illustrates the main features of R-mode factor analysis. Several complications arise when real data are analyzed, and these will be touched on following a discussion of Q-mode techniques.

2.7 THE Q-MODE MODEL

When the nature of a geologic problem is such that relationships between entities are the focus of attention rather than relationships
between properties, then the Q-mode method of factor analysis becomes a useful analytical tool.

Numerous such geologic situations are easily envisaged. The delineation of lithofacies or biofacies is perhaps the most evident. Here the objective is to find groups of entities that are similar to one another in terms of their total composition.

The catch in this objective is to define "similarity" in a mathematically realistic way. Several measures of similarity will be discussed next. Once all the interentity similarities are computed and arranged in matrix form, the previously described methods of solution are applicable to the analysis of this similarity matrix.

Similarity Indices

There is a vast, and ever expanding, literature on the problems associated with a mathematical definition of similarity. There is little in the way of theoretical justification for the selection of one index over another; however, Gower (1967) has at least underscored several considerations that must be taken into account. Aside from similarity coefficients designed for presence-absence data, three indices have commonly been used in Q-mode analysis.

1. Correlation Coefficient. The Pearson product moment correlation coefficient has been used to indicate the degree of relationship between two entities. If X_k and X_l are any two rows of the data matrix, then

    r_kl = Σ_i (x_ki − x̄_k)(x_li − x̄_l) / √[ Σ_i (x_ki − x̄_k)² Σ_i (x_li − x̄_l)² ]   (2.7.1)

measures the degree of relationship between the two entities. As appealing as this index may appear, there are a number of intuitive and theoretical drawbacks. Note, for example, the x̄_k and x̄_l terms. These are average values for items k and l. If an item is described in terms of a wide variety of properties
measured in different scales, what then does such an average mean in physical terms? By subtracting this value from each of the attribute quantities, the proportions of the attributes are altered from entity to entity. To partially overcome this, some workers advocate standardizing the data matrix by columns before using this equation. Clearly, this only adds a further complication to the data and complicates interpretation.

For this and several other reasons, the correlation coefficient is not considered a good index of similarity.

2. Coefficient of Proportional Similarity. Imbrie and Purdy (1962) define an index of similarity referred to as cos θ. The equation used is

    cos θ_kl = Σ_i x_ki x_li / √[ Σ_i x_ki² Σ_i x_li² ]   (2.7.2)

For positive data this index ranges from zero, for perfect dissimilarity, to one, for complete similarity. The difficulty with this index is that while it preserves the proportional relationships between entities it is blind to the absolute sizes involved. Thus a "midget" and a "giant" whose attribute proportions are identical would be considered as being completely similar. In many problems where the investigator is interested in changes in the proportions of constituents, such as sedimentologic and faunal studies, the index is very appropriate. Imbrie (1963, p. 26) and McIntyre (1969, p. DBM-A-41) suggest methods for including a "size" variable that helps to remove the inherent "size" blindness of the index.

3. Distance Coefficient. Harbaugh (1964) describes the use of a coefficient that measures the distance between entities in n-dimensional space. The complement of this distance is then taken as a measure of similarity. In order to standardize this index, all attributes must be scaled so that the maximum value of each is 1 and the minimum value is zero. This, of course, distorts
the proportionality. The equation for computation, assuming scaled attributes, is

    s_kl = 1 − √[ Σ_i (x_ki − x_li)² / n ]   (2.7.3)

However obtained, the similarities between all possible pairs of entities are calculated and arranged in a square, symmetric, similarity matrix S_N,N. This matrix contains all the information concerning the interrelations between the N entities under study. Q-mode factor analysis begins at this point.

It will be recalled that in R-mode analysis, the objective was to create m linear combinations of the n variables. The m linear combinations could be considered as new variables. Similarly, in Q-mode analysis, the objective is to find new, hypothetical entities whose compositions are linear combinations of those of the original entities. As pointed out by Imbrie (1963), these "new" entities, or Q-mode factors, can be conceived of as being composite end members, combinations of which can be used to reconstruct the original entities. The problem then is to "unmix" the original entities into the smallest possible number of end members. In this respect, Q-mode analysis is a "mirror image" of R-mode analysis.

Computational Procedure

Using the cos θ measure of similarity, the following equations reveal the necessary steps in the analysis. Let X_N,n be the data matrix. Form the diagonal matrix D whose principal diagonal contains the row vector lengths of X. That is,

    d_kk = √( Σ (i=1..n) x_ki² )   (2.7.4)
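The contrast between the first two similarity indices above is easy to verify numerically. A small sketch (the vectors are invented for illustration): cos θ sees only proportions, while the correlation coefficient sees only deviations from each entity's mean.

```python
import numpy as np

def cos_theta(xk, xl):
    # coefficient of proportional similarity (eq. 2.7.2)
    return xk @ xl / np.sqrt((xk @ xk) * (xl @ xl))

def pearson_r(xk, xl):
    # product-moment correlation between two entities (eq. 2.7.1)
    dk, dl = xk - xk.mean(), xl - xl.mean()
    return dk @ dl / np.sqrt((dk @ dk) * (dl @ dl))

midget = np.array([1.0, 2.0, 3.0])
giant = 10 * midget                 # identical proportions, ten times the size
print(cos_theta(midget, giant))     # effectively 1.0: cos theta is "size blind"

shifted = midget + 10.0             # same deviations from the mean, but
print(pearson_r(midget, shifted))   # different proportions: r is still 1.0
print(cos_theta(midget, shifted))   # while cos theta drops below 1
```

This is the "midget and giant" effect in miniature: cos θ calls the proportional pair identical, while adding a constant (which changes the proportions but not the shape about the mean) is invisible to r.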
The matrix operation

    W = D⁻¹X   (2.7.5)

row-normalizes X; that is, each row vector in W is of unit length. Then the similarity matrix is computed from

    S = WW'   (2.7.6)

The basic factor equation is

    W = AF'   (2.7.7)

and

    W' = FA'   (2.7.8)

thus,

    S = AF'FA'   (2.7.9)

The constraint

    F'F = I   (2.7.10)

results in

    S = AA'   (2.7.11)

As in the R-mode, we stipulate that

    A'A = Λ   (2.7.12)

and following the same reasoning as before

    A = UΛ^(1/2)   (2.7.13)

and

    F = W'AΛ⁻¹   (2.7.14)

It must be emphasized that the A and F matrices so derived are not the same as those derived in the R-mode analysis. One set is derived from S, the other from R. Once obtained, A may be rotated to B as discussed in the section on varimax rotation.

The B matrix in general consists of N rows and m columns. Each row corresponds to an entity; each column represents a factor.
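The chain of equations (2.7.4)–(2.7.14) can be checked end to end with a few lines of code. This is a sketch with invented data; numpy's symmetric eigen-routine stands in for whatever eigenvalue program the original course used.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0.1, 1.0, size=(6, 4))     # 6 entities described by 4 attributes

# D holds the row-vector lengths (2.7.4); W = D^-1 X row-normalizes X (2.7.5)
D_inv = np.diag(1.0 / np.sqrt((X**2).sum(axis=1)))
W = D_inv @ X

S = W @ W.T                                 # cos-theta similarity matrix (2.7.6)

lam, U = np.linalg.eigh(S)                  # eigenvalues and eigenvectors of S
order = np.argsort(lam)[::-1]               # largest eigenvalue first
lam, U = lam[order], U[:, order]
m = int((lam > 1e-10).sum())                # rank = number of nonzero eigenvalues

A = U[:, :m] * np.sqrt(lam[:m])             # loadings: A = U Lambda^(1/2)  (2.7.13)
F = W.T @ A / lam[:m]                       # scores:   F = W'A Lambda^-1   (2.7.14)

print(np.allclose(S, A @ A.T))              # S = AA'   (2.7.11)
print(np.allclose(W, A @ F.T))              # W = AF'   (2.7.7)
print(np.allclose(F.T @ F, np.eye(m)))      # F'F = I   (2.7.10)
```

All three identities hold to machine precision, which is the whole point of the derivation: the row-normalized data matrix is exactly recoverable from the m factors.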
The factors are best thought of as hypothetical entities that are completely dissimilar in terms of the proportions of their constituents. Scanning down any column of B shows the amount of the hypothetical entity contained in each real entity. Scanning across any row shows the composition of a real entity in terms of the hypothetical entities.

The F matrix has n rows and m columns. The rows represent the original attributes used to describe the entities. The numbers in a row thus describe the relative amount of the attribute in each factor. A column gives the "composition" of the hypothetical entity in terms of the original attributes. Unfortunately the scale of these numbers is obscure. Thus, they can be used in relative terms only.

2.8 AN EXAMPLE

Imbrie (1963) presents an example of Q-mode analysis that gives the basic ideas behind the method. The matrix of fictitious data is given in Figure 2.13. The rows represent 10 sediment samples and the columns represent 10 species of minerals. The cos θ similarity matrix is given in Figure 2.14. Eigenvalues derived from this matrix indicate that there are three, and only three, independent dimensions to the data.

Data Matrix for Q-mode Example

           Var A  Var B  Var C  Var D  Var E  Var F  Var G  Var H  Var I  Var J
    Loc 1    5.0   25.0   15.0    5.0    5.0   20.0   10.0    5.0    5.0    5.0
    Loc 2   10.0   30.0   17.0   17.0    8.0    8.0    5.0    4.0    1.0    0.0
    Loc 3    3.0    6.0   10.0   13.0   25.0   15.0   13.0    8.0    5.0    2.0
    Loc 4    7.5   27.5   16.0   11.0    6.5   14.0    7.5    4.5    3.0    2.5
    Loc 5    4.6   21.2   14.0    6.6    9.0   19.0   10.6    5.6    5.0    4.4
    Loc 6    3.8   13.6   12.0    9.8   17.0   17.0   11.8    6.8    5.0    3.2
    Loc 7    8.3   26.6   15.9   14.2    9.1   11.1    6.8    4.6    2.2    1.2
    Loc 8    6.1   22.7   14.6   10.2    9.9   15.4    9.1    5.3    3.8    2.9
    Loc 9    7.6   24.2   15.2   13.8   10.8   11.8    7.6    5.0    2.6    1.4
    Loc 10   3.9   10.3   11.2   12.6   21.3   14.8   11.9    7.3    4.6    2.1

FIGURE 2.13
The varimax factor loading matrix and its associated factor score matrix are given in Figures 2.15 and 2.16. Figure 2.17 is one of three possible plots of the varimax factor loadings. It is now evident that the original 10 sediment samples are various mixtures of the three hypothetical samples. Three maps could be drawn showing the spatial distribution of the end members and from these, the mechanism of transport might be deduced. The composition of the end members can be roughly determined from a study of the columns of the F matrix of Figure 2.16.

Varimax Factor Matrix

              Commun.  Factor 1  Factor 2  Factor 3
    Loc 2     1.0000    0.9133    0.3492    0.2095
    Loc 7     1.0000    0.8494    0.4311    0.3043
    Loc 4     1.0000    0.8179    0.3810    0.4311
    Loc 9     1.0000    0.8080    0.5006    0.3107
    Loc 8     1.0000    0.7266    0.5183    0.4510
    Loc 1     1.0000    0.6605    0.3899    0.6416
    Loc 5     1.0000    0.6228    0.5222    0.5826
    Loc 3     1.0000    0.3094    0.9314    0.1918
    Loc 10    1.0000    0.4363    0.8632    0.2541
    Loc 6     1.0000    0.4900    0.7714    0.4060

    Variance            47.56     36.05     16.40
    Cum percent         47.56     83.60    100.00

FIGURE 2.15
Varimax Factor Score Matrix

             Factor 1  Factor 2  Factor 3
    Var A      0.881     0.038    -0.294
    Var B      2.446    -0.468     0.948
    Var C      1.135     0.422     0.483
    Var D      1.333     1.003    -1.346
    Var E     -0.097     2.433    -0.742
    Var F     -0.206     0.971     2.167
    Var G     -0.177     1.061     0.811
    Var H      0.030     0.668     0.199
    Var I     -0.208     0.393     0.611
    Var J     -0.218     0.086     0.809

FIGURE 2.16

FIGURE 2.17  Plot of the varimax loadings: the 10 localities plotted on factor 2 versus factor 3.

2.9 OBLIQUE ROTATION

The factors obtained in R- and Q-mode analysis are constrained to be orthogonal. There may be no physical reason for them to be mutually orthogonal and thus many schemes have been devised to find sets of factors that are oblique to one another.
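The simplest such scheme, Imbrie's, is taken up next; it projects the varimax loadings onto the most divergent samples. A minimal numerical sketch (the loading matrix B is invented, with rows 0 and 1 playing the role of the most divergent samples):

```python
import numpy as np

# invented varimax-style loadings for 5 samples on 2 factors;
# rows 0 and 1 are the most divergent samples, chosen as end members
B = np.array([[0.95, 0.10],
              [0.15, 0.92],
              [0.55, 0.51],
              [0.75, 0.30],
              [0.35, 0.71]])

T = B[[0, 1], :]               # m x m matrix of end-member loadings
C = B @ np.linalg.inv(T)       # oblique loadings: axes now pass through
                               # the end-member samples
print(np.round(C, 3))
```

After the projection the end-member rows of C become (1, 0) and (0, 1): each chosen sample is expressed as 100% of "its" oblique factor, and every other sample as a mixture of the end members, which is exactly the pattern seen in Figure 2.18.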
Of these, the method due to Imbrie (1963) is the most simple. Referring to Figure 2.17, it is apparent that the most divergent real sediment samples 1, 2, 3 could be used as end members from which all the remaining samples could be derived. Imbrie's method is to rotate the factor axes so that they coincide with the most divergent samples and then express all the other samples as proportions of these end-member samples. This is accomplished by constructing an m by m matrix T, which contains the varimax loadings of the most divergent samples. Then

    C = BT⁻¹   (2.9.1)

yields the oblique factor matrix C. Figure 2.18 illustrates the results of the operation applied to the problem just discussed. The method is also applicable to R-mode factor loadings matrices. The oblique factors are no longer uncorrelated and computation of factor scores becomes much more involved.

In both of the examples presented thus far the correct choice of the number of factors needed to reproduce the data matrix has been unequivocal. Coincidentally, in both instances there were three, and only three, nonzero eigenvalues. But both of these examples involved fictitious data that were "manufactured" for the purpose of exposition. When analyzing real data the investigator must select the correct number of factors, and this is seldom unequivocal. Although some statistical methods are available to aid in this selection, experience has shown that certain empirical criteria are more useful. The essentials of the problem are:

1. In order to reproduce exactly the data matrix from the two derived matrices (Z = FA') it is necessary to use as many factors as original variables. This is because the common variance and unique variance together equal the total variance.

2. Because data are expressed in standard form in R-mode, and entities are row normalized in Q-mode, the total variance (information content) is equal to n and N, respectively.
Each eigenvalue extracted accounts for a certain amount of variance; thus the percent variance explained can be calculated by dividing the eigenvalue by n (or N). By cumulating these percentages, it is
Oblique Factor Matrix

    Locality  Factor 1  Factor 2  Factor 3
        1       1.000     0.000     0.000
        2       0.000     1.000     0.000
        3       0.000     0.000     1.000
        4       0.497     0.537     0.000
        5       0.847     0.000     0.207
        6       0.441     0.000     0.644
        7       0.200     0.753     0.097
        8       0.530     0.343     0.207
        9       0.207     0.668     0.201
       10       0.108     0.116     0.839

FIGURE 2.18

possible to arbitrarily stop extracting factors once the cumulative variance explained reaches some specified level, for example, 95%. The remaining factors needed to account for the remaining variance are assumed to represent unique factors.

3. The equation R = AA' (or S = AA') implies that the correlation matrix can be reproduced by forming the major product moment of
the factor loadings matrix. It is, of course, possible to produce an estimate of R with an n × 1 matrix A. The difference, or residual matrix, is obtained from:

    R_r1 = R − A_n,1 A'_1,n   (2.9.2)

If significant correlations remain in R_r1, then a second factor may be added to A and the process repeated:

    R_r2 = R − A_n,2 A'_2,n   (2.9.3)

This procedure may be continued until there are no significant residual correlations. The number of columns in A is then taken as the correct number of factors.

4. The sums of squares of the factor loadings in a row of the A matrix is termed the communality and represents that proportion of the variance of a variable accounted for by the number of factors used. The "correct" number of factors can be judged by the values of communality for all the variables. If many communalities are low, say less than 0.8, more factors are probably required.

5. Because variance is extracted in descending order, factor loadings on the first few factors will be higher than those on the later ones. When all the loadings on a given factor appear to resemble nothing more than noise (error components), then this and succeeding factors may be removed from further consideration. This is often best judged on a rotated factor matrix.

6. The final criterion is entirely subjective. If factors are interpretable and "make sense" they are probably relevant. Uninterpretable factors or those whose spatial distribution forms no sensible pattern may merely represent error components.

2.10 PRACTICAL EXAMPLES

Rather than add to an already lengthy account, an annotated bibliography of a few pertinent papers is given below. Study of these
papers should enable the reader to develop a better understanding of how the results of factor analysis are put to use in a variety of geologic problems.

Cameron, E. M., 1968, A geochemical profile of the Swan Hills Reef: Can. Jour. Earth Sci., v. 5, p. 287-309.
A detailed study of chemical variations in a reef complex. Factor analysis, coupled with trend surface analysis, provides a method for determining the diagenetic history of a slightly dolomitized limestone reef.

Degens, E. T., Spencer, D. W., and Parker, R. H., 1967, Paleobiochemistry of molluscan shell proteins: Comp. Biochem. Physiol., v. 20, p. 553-579.
The interrelationships between amino acids in various molluscs are studied by means of R-mode factor analysis. Environmental and genetic effects on amino acid compositions are revealed.

Harbaugh, J. W., and Demirmen, F., 1964, Application of factor analysis to petrographic variations of Americus limestone (Lower Permian), Kansas and Oklahoma: Kan. Geol. Survey Dist. Pub. 15.
A paleoecologic analysis of a thin limestone unit based on petrographic and chemical attributes. Uses both correlation coefficients and distance coefficients as similarity indices in a Q-mode analysis.

Hitchon, B., Billings, G. K., and Klovan, J. E., 1971, Geochemistry and origin of formation waters in the western Canada sedimentary basin - III. Factors controlling chemical composition: Geochim. et Cosmochim. Acta, v. 35, p. 567-598.
R- and Q-mode analyses are used to document flow paths and the chemical reactions responsible for variations in the chemistry of subsurface formation waters. Oblique rotation is used to achieve simple structure. R-mode factor scores are used as input variables to second-order factor analyses.

Imbrie, J., and van Andel, T. H., 1964, Vector analysis of heavy-mineral data: Geol. Soc. Amer. Bull., v. 75, p. 1131-1156.
The classic paper on the use of Q-mode factor analysis in the study of sediments.
The Q-mode model is developed and applied to two recent sedimentary basins in a way that clearly shows the utility and power of the method.
Imbrie, J., and Kipp, N. G., 1971, A new micropaleontological method for quantitative paleoclimatology: application to a late Pleistocene Caribbean core, in The Late Cenozoic glacial ages (Turekian, K., ed.): New Haven, Conn., Yale Univ. Press.
An involved and elegant method of analysis applied to Recent and Pleistocene foraminifera shows how the results of Q-mode factor analysis can be used in predictive, nonlinear regression models. Pleistocene oceanic temperatures and salinities can be accurately predicted on the basis of foram assemblages.

Klovan, J. E., 1966, The use of factor analysis in determining depositional environments from grain-size distributions: Jour. Sed. Petrology, v. 36, no. 1, p. 115-125.
Q-mode analysis is used to classify recent sediment samples on the basis of their grain-size distributions. The factors extracted are claimed to reflect different types of depositional energy.

Lonka, A., 1967, Trace-elements in the Finnish Precambrian phyllites as indicators of salinity at the time of sedimentation: Bull. Comm. Geol. Finlande No. 209.
Trace-element variation as revealed by factor analysis leads to the surprising conclusion that depositional salinities can be determined even in highly metamorphosed shales.

Matalas, N. C., and Reiher, B. J., 1967, Some comments on the use of factor analysis: Water Resources Research, v. 3, no. 1, p. 213-223.
Some critical comments on the use and abuse of factor analysis as applied to hydrologic problems. Many mathematical and substantive arguments are presented that must be taken into account when interpreting results. Although many of the comments are germane, others are open to serious question.

McCammon, R. B., 1966, Principal component analysis and its application in large-scale correlation studies: Jour. Geol., v. 74, no. 5, pt. 2, p. 721-733.
Explains the use of R-mode analysis as applied to crude oil variations and biostratigraphic problems.
A "minimum entropy" criterion is used to achieve simple structure in rotation.
McCammon, R. B., 1969, Multivariate methods in geology, in Models of geological processes (Fenner, P., ed.): Washington, D.C., Amer. Geol. Inst.
One of the best treatments of the mathematics and concepts of factor analysis and many other related topics. Algebraic and geometrical explanations are presented at an elementary level and the use of several examples makes understanding especially easy.

McElroy, M. N., and Kaesler, R. L., 1965, Application of factor analysis to the Upper Cambrian Reagan Sandstone of central and northwest Kansas: The Compass, v. 42, no. 3, p. 188-201.
An application of factor analysis to a typical stratigraphic problem. Factors are interpreted in terms of regional influences that affect thickness, grain size, and mineralogy of a sandstone unit.

Spencer, D., 1966, Factors affecting element distributions in a Silurian graptolite band: Chem. Geol., v. 1, p. 221-249.
R-mode analysis is used to determine the underlying causal influences affecting the chemical variability of a thin shale unit. A very enlightening discussion of how the factor matrices can be interpreted is especially useful.

REFERENCES

Gower, J. C., 1967, Multivariate analysis and multidimensional geometry: The Statistician, v. 17, no. 1, p. 13-28.

Harbaugh, J. W., 1964, BALGOL programs for calculation of distance coefficients and correlation coefficients using an IBM 7090 computer: Kansas Geol. Survey Sp. Dist. Pub. 9.

Harman, H. H., 1960, Modern factor analysis: Chicago, Illinois, Univ. of Chicago Press, 471 p.

Imbrie, J. and Purdy, E. G., 1962, Classification of modern Bahamian carbonate sediments, in Classification of carbonate rocks - a symposium, Mem. 1, Amer. Assoc. Petroleum Geol., p. 253-272.

Imbrie, J., 1963, Factor and vector analysis programs for analyzing geologic data: U.S. Office of Naval Research, Tech. Rept. 6, 83 p.
Kaiser, H. F., 1958, The varimax criterion for analytic rotation in factor analysis: Psychometrika, v. 23, p. 187-200.

Klovan, J. E. and Imbrie, J., 1971, An algorithm and FORTRAN IV program for large-scale Q-mode factor analysis and calculation of factor scores: Jour. Intern. Assoc. Math. Geol., v. 3, p. 61-67.

McIntyre, D. B., 1969, Introduction to the study of data matrices, in Models of geological processes (Fenner, P., ed.): Washington, D.C., Amer. Geol. Inst.

Pearson, K., 1901, On lines and planes of closest fit to systems of points in space: Phil. Mag., v. 6, p. 559-572.

Spearman, C., 1904, General intelligence, objectively determined and measured: Amer. Jour. Psychol., v. 15, p. 201-293.

Thurstone, L. L., 1947, Multiple factor analysis: Chicago, Illinois, Univ. of Chicago Press, 535 p.
APPENDIX 1. A PRIMER ON MATRIX ALGEBRA

1. A matrix is a rectangular chart of numbers. A matrix is symbolized by a capital letter and its size is shown by two subscripts, the first referring to the number of rows, the second to the number of columns. Thus A_r,c represents the matrix A with r rows and c columns. Any number in the matrix is termed an element. Thus, a_ij is the element of A in the i-th row and j-th column. Two matrices are said to be equal if all elements correspond exactly. That is, A = B if a_ij = b_ij for all i and j.

2. The transpose of a matrix is another matrix in which the rows and columns are interchanged. It is symbolized by an apostrophe. Thus A' is the transpose of matrix A.

3. Special types of matrices include:
   (a) Rectangular matrix. Has more rows than columns or vice versa.
   (b) Square matrix. Has the same number of rows as columns.
   (c) Square symmetric matrix. A square matrix such that a_ij = a_ji for all values of i and j. The lower left triangular part of the matrix below the diagonal is a mirror image of the upper right triangular portion.
   (d) Diagonal matrix. Only the elements in the principal diagonal are nonzero and all other elements are zero. That is, a_ij ≠ 0 when i = j but a_ij = 0 when i ≠ j.
   (e) Identity matrix. A diagonal matrix whose diagonal elements all equal 1.
   (f) Column vector. A matrix with n rows but only one column.
   (g) Row vector. A matrix with n columns but only one row.
   (h) Scalar. A matrix with one row and one column.

4. Matrices can be added together (or subtracted) only if they are size compatible, that is, each matrix must have the same number of rows and columns. Addition or subtraction is done on an element-by-element basis. Thus the elements of C in C = A + B are equal to the sums of corresponding elements of A and B; c_ij = a_ij + b_ij for all i and j.
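These definitions are easy to check by machine; a small sketch (the numerical values are illustrative only):

```python
import numpy as np

A = np.array([[1, 5, 9],
              [2, 8, 6]])              # a 2-by-3 rectangular matrix
print(A.T.shape)                        # transpose swaps rows and columns: (3, 2)

S = np.array([[1, 5, 3],
              [5, 2, 4],
              [3, 4, 7]])
print(np.array_equal(S, S.T))           # square symmetric: a_ij = a_ji

B = np.array([[1, 0, 1],
              [2, 3, 4]])
C = A + B                               # element-by-element addition,
print(C)                                # c_ij = a_ij + b_ij
```

Note that A + B is only defined because A and B are both 2 by 3; numpy raises an error for incompatible sizes, just as the rule above requires.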
Worked examples (numbered to match the primer):

1. Matrix definition. The size of A is 4 by 3 (A_4,3); its element a_32 = 5.

2. Transpose. A' is the transpose of the matrix A above; its size is 3 by 4.

3. Types of matrices. The examples given include a rectangular matrix, a square matrix, the square symmetric matrix

       1 5 3
       5 2 4
       3 4 7

   a diagonal matrix, an identity matrix, a column vector, and a row vector.

4. Matrix addition. C = A + B, formed element by element. (The remaining numerical entries on this page are not legible in this scan.)
5. Multiplication of matrices can only be performed if the number of columns of the pre-factor is equal to the number of rows of the post-factor. That is, C = A_n,m · B_e,f is only possible if m = e. An element of C is defined as follows:

       c_ij = Σ (k=1..m) a_ik b_kj

   where m is the number of columns of A and the number of rows of B. A row of C is produced by multiplying the corresponding row of A times each column of B. This is repeated for every row of A until the C matrix is complete. The minor product moment is defined as C = A'A. C contains the sums of squares and cross products of the columns of A.

6. The trace of a square matrix is the sum of its diagonal elements.

7. The matrix analog of scalar division is accomplished by inversion. If A is a square matrix and A·B = B·A = I, then B is said to be the inverse of A. The notation A⁻¹ is commonly used to denote the inverse of A. Finding the inverse of a matrix is a rather complicated procedure and the reader is referred to any good text on matrix algebra for details.
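The multiplication rule, the two product moments, and inversion can all be exercised in a few lines (values chosen for illustration):

```python
import numpy as np

A = np.array([[5, 1],
              [4, 3],
              [1, 1]])
B = np.array([[2, 1],
              [4, 2]])
C = A @ B                  # conformable: A has 2 columns, B has 2 rows;
print(C)                   # e.g. c_11 = 5*2 + 1*4 = 14, c_12 = 5*1 + 1*2 = 7

minor = A.T @ A            # minor product moment A'A (2 x 2)
major = A @ A.T            # major product moment AA' (3 x 3)
print(np.trace(minor) == np.trace(major))   # the two traces always agree

M = np.array([[5.0, 1.0],
              [4.0, 3.0]])                  # nonsingular, hence invertible
print(np.allclose(M @ np.linalg.inv(M), np.eye(2)))   # M M^-1 = I
```

The equal traces are no accident: both equal the sum of squares of every element of A, which is why the later chapters can speak interchangeably of the variance "contained in" A'A or AA'.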
5. Matrix multiplication. C = A · B:

       A           B            C
       5 1         2 1         14  7
       4 3         4 2         20 10
       1 1                      6  3

   For example, c_11 = a_11·b_11 + a_12·b_21 = 5·2 + 1·4 = 14, and c_12 = a_11·b_12 + a_12·b_22 = 5·1 + 1·2 = 7.

   In general, the following "box" notation for matrix multiplication will be found useful:

       [pre-factor][post-factor] = [product]

   For

       A = 3 1        A'A = 14  6     (minor product moment)
           2 0               6 10
           1 3
                      AA' = 10  6  6   (major product moment)
                             6  4  2
                             6  2 10

6. Note that the trace of both products is equal to 24.

7. Matrix inversion. Given the square matrix A, it is necessary to find a matrix B such that AB = I. Expanding this product yields

       a_11 b_11 + a_12 b_21 + ... + a_1n b_n1 = 1.0
       a_11 b_12 + a_12 b_22 + ... + a_1n b_n2 = 0
       a_21 b_11 + a_22 b_21 + ... + a_2n b_n1 = 0
       a_21 b_12 + a_22 b_22 + ... + a_2n b_n2 = 1.0
       etc.

   When the b's are determined so as to satisfy this set of simultaneous equations, then B = A⁻¹, the inverse of A.
8. A square matrix Q is said to be orthogonal if Q'Q = D, where D is a diagonal matrix. A square matrix Q is said to be orthonormal if Q'Q = QQ' = I.

9. The Rank of a Matrix. The rank of a matrix may be defined, in terms of its column or row vectors, as the number of linearly independent row, or column, vectors present in the matrix. Another view of rank is as follows. A matrix X_n,m can be expressed as the product of two matrices whose common order is r. If X cannot be expressed as the product of any pair of matrices with a common order less than r, then the rank of X is r. It can be appreciated from this that a very large matrix may have a low rank and thus be expressible as the product of two smaller matrices. This is the basis of practically all multivariate methods of data analysis.
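Both views of rank can be demonstrated directly with the rank-1 example used on the facing page:

```python
import numpy as np

A = np.array([[ 6, 3, 12],
              [ 8, 4, 16],
              [12, 6, 24]])
print(np.linalg.matrix_rank(A))      # 1: only one linearly independent vector

# a rank-1 matrix is the product of a column vector and a row vector,
# i.e. two matrices whose common order is 1
col = np.array([[3], [4], [6]])
row = np.array([[2, 1, 4]])
print(np.array_equal(A, col @ row))  # the 3x3 matrix factors as 3x1 times 1x3
```

Nine numbers reduced to six is a modest saving, but the same factorization of a large, low-rank matrix into two slim ones is, as the text says, the basis of practically all multivariate data analysis.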
8. Orthogonal and orthonormal matrices.

       Q = 2  3         Q'Q = D, so Q is orthogonal
           1 -6

       Q = 0.5   -0.866       Q'Q = QQ' = I, so Q is orthonormal
           0.866  0.5

9. Rank of a matrix.

       A =  6 3 12
            8 4 16
           12 6 24

   The rank of A is 1, for there is only one linearly independent vector; all columns (or rows) are multiples of each other. A can be reproduced by the product of an infinite number of pairs of vectors, for example:

       3
       4  [2 1 4]  =  A
       6

   A 3 by 3 matrix in which columns 1 and 2 are multiples of each other but the third column is independent has rank 2; a 3 by 3 matrix with three linearly independent columns has rank 3.
10. Eigenvalues and Eigenvectors. Given a real square symmetric matrix A, there exist scalars λ and vectors u such that

        Au = λu

    or

        Au − λu = 0

    or

        (A − λI)u = 0

    An nth-order symmetric matrix A has eigenvalues λ_1, λ_2, ..., λ_n, possibly not all distinct and possibly some being zero. Associated with each λ is a vector (eigenvector), u_1, u_2, ..., u_n, such that u'_i u_j = 0 for all i and j when i ≠ j, and u'_i u_i = 1 for i = 1, ..., n.

    Placing the eigenvalues in a diagonal matrix Λ and the eigenvectors into Q, we obtain

        AQ = QΛ   or   Q'AQ = Λ   or   A = QΛQ'

    which is referred to as the basic structure of the square symmetric matrix A. The rank of A is equal to the number of nonzero eigenvalues. If there are m nonzero eigenvalues, the basic structure suggests that two small matrices contain the same information as does A, viz
    A_n,n = U_n,m Λ_m,m U'_m,n = (UΛ^(1/2))(UΛ^(1/2))'
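The basic structure is easy to verify numerically; here is a sketch using the symmetric matrix from the primer's examples (numpy's `eigh` handles the eigenvalue extraction):

```python
import numpy as np

A = np.array([[1.0, 5.0, 3.0],
              [5.0, 2.0, 4.0],
              [3.0, 4.0, 7.0]])              # real square symmetric
lam, Q = np.linalg.eigh(A)                   # eigenvalues and eigenvectors
L = np.diag(lam)

print(np.allclose(Q.T @ Q, np.eye(3)))       # eigenvectors are orthonormal
print(np.allclose(A @ Q, Q @ L))             # AQ = Q Lambda
print(np.allclose(A, Q @ L @ Q.T))           # basic structure: A = Q Lambda Q'
print(int((np.abs(lam) > 1e-10).sum()))      # rank = number of nonzero eigenvalues
```

This is exactly the decomposition that both the R-mode (A from R) and Q-mode (A from S) factor solutions rest on.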
Chapter 3

Some Practical Aspects of Time Series Analysis

William T. Fox

3.1 GENERALITIES

In many geologic applications, geologic observations in a stratigraphic succession correspond to changes taking place through time. Where deposition was continuous and the rate of deposition was relatively constant, stratigraphic thickness can be considered directly proportional to time. Therefore, time-trend curves can be plotted with stratigraphic thickness corresponding to time as the independent variable and the parameter being studied as the dependent variable. The dependent variables can include such things as grain size, carbonate content, color, or fossil distribution. In modern sedimentary studies where processes are being studied through time, absolute time can be plotted as the independent variable and the dependent variables can include such things as barometric pressure, wind velocity, wave height, or current velocity.

As pointed out by Kendall (1948, pp. 363-437), a sequence of observations made in a time series is influenced by three separate components: (1) a trend or long-term component, (2) cyclical or oscillating fluctuations about the trend, and (3) a random or irregular component. The trend is considered a broad, smooth undulating motion of the system over a relatively long period of time or through a relatively large number of sedimentation units. The cyclical or oscillating fluctuations about the trend represent a "seasonal effect" or local variations that are superimposed on the trend component. When the cyclical fluctuations and the trend have been subtracted from the data,
we are left with the random fluctuations that are referred to as the random error or residuals.

Several techniques are available for separating the trend component from the oscillating fluctuations and random variations in a time series. The most straightforward method is to draw a purely interpretive curve through the clusters of high and low values on the observed data curve. This "eyeballing" technique has been used effectively by Walpole and Carozzi (1961) in their study of the microfacies of the Rundle Group. For illustrative purposes, a free-hand trace of the main trend is useful, but since it is highly interpretive, it would be difficult for another worker to reproduce. Also, with a free-hand method, it is difficult to separate a major trend from minor oscillations in the data.

Weiss et al. (1965) constructed smooth curves or "graphic logs" by grouping layers into units that were each 3 feet thick. The resulting "moving total" curves were used for correlating between adjacent measured sections and interpreting the depositional environment. The moving total curves gave a good picture of the gross lithologic changes but would be difficult to use for detailed stratigraphic studies. This method gave a reproducible curve for the major trend, but the small-scale oscillations or other higher-frequency components superimposed on the random fluctuations were lost.

One of the most frequently used methods for smoothing a time series is the simple moving average described by Krumbein and Pettijohn (1938, p. 198) and used by Walker and Sutton (1967, p. 1014). With this technique, the data are arranged in a stratigraphic sequence with values recorded at equal increments of time or stratigraphic thickness. Starting at the base of the section, a series of moving averages is taken on successive groups involving an odd number of data points.
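The moving-average computation just described can be sketched in a few lines of Python; the function name and the example series are hypothetical.

```python
def moving_average(values, window):
    """Simple centered moving average over an odd number of data points.

    As noted in the text, m = window // 2 points are lost at each end
    of the smoothed series.
    """
    if window % 2 == 0:
        raise ValueError("window must be an odd number of data points")
    m = window // 2
    return [sum(values[i - m:i + m + 1]) / window
            for i in range(m, len(values) - m)]

# A small series with one sharp peak: the average subdues the high and
# spreads it across the neighboring points.
series = [1, 1, 1, 5, 1, 1, 1]
print(moving_average(series, 3))
```

The single peak of 5 is flattened to about 2.3 over three consecutive points, illustrating why the technique subdues highs and lows and displaces peaks in the trend.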
In practice, the successive averages are computed for a series by dropping the lowest data point and adding the next value in the sequence. As pointed out by Miller and Kahn (1962, p. 355), the moving average technique should be regarded as descriptive rather than analytical. Because the weight of each value is equally distributed within the group being averaged, the moving average technique subdues highs and lows and displaces peaks and valleys in the trend. As with the previous
techniques described, this method gives an approximation of the major trend, but since the highs and lows in the trend are markedly reduced, the oscillating fluctuations are exaggerated.

The three techniques that provide the most useful methods for smoothing curves include polynomial curve fitting, iterated moving averages, and Fourier analysis. Polynomial curve fitting and iterated moving averages make use of summation equations that are readily adaptable to computer programming. In Fourier analysis, a series of sine and cosine curves representing fundamental harmonics are fit to the observed data.

3.2 POLYNOMIAL CURVE FITTING

It is implicit in the concept of time trend analysis that the movement be relatively smooth over long periods of time (Kendall, 1948, p. 371). Therefore, the trend component (U_t) can be represented by a polynomial in the time element, t, as follows:

U_t = a_0 + a_1 t + a_2 t^2 + ... + a_p t^p

By increasing the size of p, we can obtain as close an approximation to a finite series as we desire. When the polynomial is fitted to the whole time series by the method of least squares, it gives a curvilinear regression line of U_t on the variable t.

The method of fitting a polynomial to the data by least-squares analysis has worked quite successfully for trend surface analysis of facies maps. In fitting a polynomial response surface to a map, the most success has been with the linear, quadratic, and cubic surfaces. In using a polynomial fitted over the entire time series by least squares, it would be necessary to use a much higher-order polynomial to fit the trend. As pointed out by Kendall (1948, p. 371), the high-order polynomial would be somewhat artificial, and the coefficients, being based on high-order moments, would be very unstable from a sampling point of view. It would also be difficult to separate the trend component from the oscillating and random components when using a high-order polynomial.
As a possible alternative to finding a single high-order polynomial which approximates the entire time series, Kendall (1948, p. 372) suggests using a sequence of low-order polynomials representing overlapping segments of the series. In using this technique, data points must be spaced at equal intervals along the time line. The first step is to take an odd number of data points (2m + 1), with m representing the number of points on each side of the value being smoothed, and to fit a polynomial of order p, with p not greater than 2m, to them. The value of the polynomial at the middle of its range is substituted for the corresponding observed data point in plotting the smoothed curve. The polynomial fitting operation is repeated for consecutive sets of 2m + 1 terms from the beginning to the end of the time series. The degree of smoothing of the trend curve is controlled by the number of terms included in the polynomial.

In a series of 2m + 1 terms, the terms are denoted by U_{-m}, ..., U_{-1}, U_0, U_1, ..., U_m. According to Kendall (1948, p. 372), the coefficients of a polynomial of the order p are obtained by the method of least squares, giving an equation of the following form:

U'_0 = A_0 = C_0 U_0 + C_1(U_1 + U_{-1}) + ... + C_m(U_m + U_{-m})   (3.2.1)

In equation (3.2.1), the constants, C's, depend on m, the number of terms, and p, the order of the polynomial, but are independent of the U's. A_0 is equivalent to U_0 at t = 0, so this is the value for the middle of the range of the polynomial. As can be seen from equation (3.2.1), this is equivalent to a weighted average in which the weights are independent of the observed values. Therefore, to compute the trend line, the constants for equation (3.2.1) are determined for the selected values of m and p, and then the value for A_0 given in equation (3.2.1) is calculated for each consecutive set of 2m + 1 terms in the series. It should be noted that there will be a loss of m terms at the beginning and at the end of the trend curve.
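For the simplest case, m = 2 with p = 2 or 3, the constants of this weighted average are those of equation (3.2.2): 17, 12, and -3 over 35. The following Python sketch (function name and test series are illustrative) applies that sliding fit and shows that it reproduces a pure quadratic trend exactly, as a least-squares quadratic fit must.

```python
def smooth_quadratic_5pt(u):
    """Sliding quadratic/cubic fit over 5 points, equation (3.2.2):
    U'_0 = (1/35)[17 U_0 + 12(U_1 + U_-1) - 3(U_2 + U_-2)].
    Two points (m = 2) are lost at each end of the series."""
    return [(17 * u[i] + 12 * (u[i + 1] + u[i - 1]) - 3 * (u[i + 2] + u[i - 2])) / 35
            for i in range(2, len(u) - 2)]

# A pure quadratic trend is reproduced exactly by the 5-point fit.
trend = [t ** 2 for t in range(8)]      # 0, 1, 4, 9, 16, 25, 36, 49
print(smooth_quadratic_5pt(trend))      # [4.0, 9.0, 16.0, 25.0]
```

Unlike the simple moving average, these weights do not subdue a genuine quadratic high or low, which is the point of fitting low-order polynomials to overlapping segments.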
Tables listing the formulas for fitting a polynomial of orders 2 and 3
(quadratic and cubic) and orders 4 and 5 (quartic and quintic) for m = 1 to 10 are given in Kendall (1948, p. 374) and Whittaker and Robinson (1929, p. 295). The same value is obtained by fitting a polynomial of order 2 (quadratic) or order 3 (cubic), since the case p odd includes the next lowest (even) value of p. Therefore, it is not necessary to give separate values for the even (quadratic and quartic) polynomials if the odd (cubic and quintic) polynomials have been calculated (Kendall, 1948, p. 373). Four of the equations fitting a quadratic and cubic when m = 2, m = 3, m = 4, and m = 5 are given as equations (3.2.2) to (3.2.5) (Whittaker and Robinson, 1929, p. 295).

m = 2:  U'_0 = (1/35)[17 U_0 + 12(U_1 + U_{-1}) - 3(U_2 + U_{-2})]   (3.2.2)

m = 3:  U'_0 = (1/21)[7 U_0 + 6(U_1 + U_{-1}) + 3(U_2 + U_{-2}) - 2(U_3 + U_{-3})]   (3.2.3)

m = 4:  U'_0 = (1/231)[59 U_0 + 54(U_1 + U_{-1}) + 39(U_2 + U_{-2}) + 14(U_3 + U_{-3}) - 21(U_4 + U_{-4})]   (3.2.4)

m = 5:  U'_0 = (1/429)[89 U_0 + 84(U_1 + U_{-1}) + 69(U_2 + U_{-2}) + 44(U_3 + U_{-3}) + 9(U_4 + U_{-4}) - 36(U_5 + U_{-5})]   (3.2.5)

As the number of terms is increased, the sizes of the constants become quite large, and in moving from quadratic and cubic to quartic and quintic, the sizes of the constants also greatly increase. Because of the labor involved in their use, it is advisable to use a computer for plotting trend curves.

3.3 ITERATED MOVING AVERAGES

Several different techniques have been proposed to simplify the computations for fitting a trend line by moving averages. The most widely used iterated averages method was introduced by actuaries for "graduating" a life expectancy curve, which is similar to fitting a trend
line in geology. Whittaker and Robinson (1929, p. 286) point out one of the earliest examples of iterated moving averages, which was developed by Woolhouse in 1870. Woolhouse computed each point in the trend line by passing five parabolas through five sets of points, with three points to a set. To compute the graduated value, Woolhouse took the arithmetic mean of the values of the five parabolas as they passed through a line perpendicular to the time series through U_0. The values at U_0 can be determined for each of the parabolas using the Newton-Gauss formula of interpolation given in Whittaker and Robinson (1929, p. 36). The arithmetic mean of the interpolated values of U_0 can also be found by a summation formula, which is given as equation (3.3.1).

The Woolhouse 15-term formula, using seven terms on each side of the central value, has about the same degree of smoothing as the nine-term formula given as equation (3.2.4). The Woolhouse formula using iterated moving averages gives a smoother trend curve than the fitting of a quadratic or cubic polynomial to the data.

Another type of iterated moving average formula, using three successive averages covering 15 points, was developed by Spencer and is given by Kendall (1948, p. 376). The first moving average used for five terms has as constants -3, 3, 4, 3, -3. The values resulting from this moving average are averaged first in sets of five points each, then these values are averaged twice in sets of four. The form used for such an iteration is given as

U_0 = (1/320)[4]^2 [5] [-3, 3, 4, 3, -3]   (3.3.2)
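The iterated form of equation (3.3.2) can be expanded mechanically: convolving the five-term series with the successive averaging runs yields the weights of Spencer's 15-point summation formula. A short Python sketch (helper name illustrative):

```python
def convolve(a, b):
    """Full convolution of two weight sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# Spencer's iteration, eq. (3.3.2): (1/320) [4]^2 [5] [-3, 3, 4, 3, -3],
# i.e. the five-term series, one running sum of 5, and two running sums of 4.
w = [-3, 3, 4, 3, -3]
w = convolve(w, [1] * 5)
w = convolve(w, [1] * 4)
w = convolve(w, [1] * 4)
weights = [x / 320 for x in w]

print([round(x * 320) for x in weights])
# [-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3]
```

The central weights 74, 67, 46, 21, ... over 320 are exactly those of the expanded summation formula, equation (3.3.3), and the weights sum to one, so a constant series is left unchanged.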
When the separate iterations are combined into a complete summation formula, the weights are those given in equation (3.3.3), which is Spencer's 15-point formula.

U'_0 = (1/320)[74 U_0 + 67(U_1 + U_{-1}) + 46(U_2 + U_{-2}) + 21(U_3 + U_{-3}) + 3(U_4 + U_{-4}) - 5(U_5 + U_{-5}) - 6(U_6 + U_{-6}) - 3(U_7 + U_{-7})]   (3.3.3)

Expanding the same technique that was used for the Spencer 15-point formula [equation (3.3.3)], Spencer also developed a 21-point equation explained by Whittaker and Robinson (1929, p. 290). In using this formula, the seven-term series (-1, 0, 1, 2, 1, 0, -1) is averaged and the values are averaged first in sets of seven, then twice in sets of five, as is shown in

U_0 = (1/350)[5]^2 [7] [-1, 0, 1, 2, 1, 0, -1]   (3.3.4)

Spencer's 21-term formula can be expanded into the following summation formula:

U'_0 = (1/350)[60 U_0 + 57(U_1 + U_{-1}) + 47(U_2 + U_{-2}) + 33(U_3 + U_{-3}) + 18(U_4 + U_{-4}) + 6(U_5 + U_{-5}) - 2(U_6 + U_{-6}) - 5(U_7 + U_{-7}) - 5(U_8 + U_{-8}) - 3(U_9 + U_{-9}) - (U_10 + U_{-10})]   (3.3.5)

When the computations must be done by hand or with a desk calculator, it is useful to set up a table to carry out the successive averaging. Vistelius (1961) used such a table to compute the trend terms for Spencer's 21-term formula. By using cardboard cutouts exposing only portions of the table, it is possible to compute about 600 graduated values per day. In using the table with Spencer's 21-term equation, the first step is to form the computation (1/2)(2U_0 + U_1 + U_{-1} - U_3 - U_{-3}) for the entire series. The values derived from the first computation are then summed by sevens and divided by seven, then summed twice by fives, dividing by five each time. The actual order in which the iterations are carried out is immaterial, but with a long series
it is advisable to do the more complicated operations while the numbers are still small.

The "goodness of fit" of a smoothed curve to the original curve may be expressed as the percentage reduction in the total sum of squares, which is given by the expression:

100 × [Σx²_trend - (Σx_trend)²/n] / [Σx²_obs - (Σx_obs)²/n]

where
  x_trend = values on the trend curve at the locations of the data points
  x_obs = observed data values
  n = number of data values

Obviously, a perfect fit of the curve to the data points would give 100 percent, and any less perfect fit would yield a correspondingly smaller percentage of the total sum of squares.

Nine smoothing equations are available with the program (Fox, 1964) for computing and plotting trend curves with varying degrees of smoothing. Formulas derived by Sheppard (Whittaker and Robinson, 1929, p. 279) for fitting a quadratic and cubic polynomial to 2m + 1 points, with m varying from 2 to 10, are used in the program. Equations (3.2.2) to (3.2.5) in this chapter are the first four equations used with the program. By increasing the number of terms in the smoothing equation, the fluctuations in the data are subdued and the underlying trends of sedimentation are accentuated. As a characteristic of this type of smoothing equation, there are still minor fluctuations in the data, even when the 21-term equation is used for smoothing fossil data (Figure 3.1). When using an iterated moving average, as in Woolhouse's 15-term equation and Spencer's 15- and 21-term equations, the minor fluctuations are completely removed, leaving only the smooth trend curve. Since the program was published (Fox, 1964), it has been modified (Fox, 1968) with the addition of the Woolhouse 15-term equation (3.3.1) and the Spencer 21-term equation (3.3.5). In degree of smoothing, the Woolhouse equation is equivalent to the Sheppard nine-term equation (m = 4), and the Spencer 21-term
FIGURE 3.1 Smoothed curves for fossil distribution data (Crinoidea, Bryozoa, Trilobita, Brachiopoda, Coelenterata, Mollusca).

equation is equivalent to the Sheppard 11-term equation (m = 5). Because of the more uniform smoothing, the iterated moving average curves are preferred over the polynomial fitting curves. The only apparent disadvantage to using the Spencer and Woolhouse formulas is the loss of points at the beginning and end of the curve. Since the series being smoothed is quite long relative to the loss at each end, the overall effect is not too bad.

3.4 FOURIER ANALYSIS

Geologic processes that are cyclic in nature can be best described using Fourier analysis. In Fourier analysis, a complicated curve can
be broken down into an aggregate of simple wave forms described by a series of sine and cosine curves. The observed data can be expressed as a series of fundamental harmonics that are theoretically independent. Each harmonic has a wavelength that is a discrete fraction of the total observation period. For each harmonic, the wavelength is defined as the distance from crest to crest and the amplitude as one-half the height from trough to crest. In Fourier analysis of geologic processes, wavelength can be expressed in time or stratigraphic thickness and the amplitude in the observed units for each parameter.

The complicated form of the observed data can be represented by an aggregate of simple wave forms that are expressed by the amplitudes of the cosine and sine terms, a_n and b_n, respectively. Although the function of the form z = f(x) is not known, data points (x_i, z_i) are available at equal intervals. Thus, the coefficients a_n and b_n may be determined by numerical integration methods employing equations (3.4.1) and (3.4.2) and used in equation (3.4.3) to approximate the observed curve according to methods described by Harbaugh and Merriam (1968) and Fox and Davis (1971).

a_n = (2/K)[(z_0 + z_K)/2 + Σ_{i=1}^{K-1} z_i cos(nπx_i/L)]    n = 0, 1, 2, ..., K/2   (3.4.1)

b_n = (2/K) Σ_{i=1}^{K-1} z_i sin(nπx_i/L)    n = 1, 2, ..., K/2   (3.4.2)

F(x_i) = a_0/2 + Σ_{n=1}^{N} [a_n cos(nπx_i/L) + b_n sin(nπx_i/L)]   (3.4.3)

where
  z_i = observed value at the i-th sampling point
  F(x_i) = value of the approximating function at the i-th sampling point
  a_0 = coefficient of the zeroth-degree cosine term; a_0/2 is equal to the mean
  n = degree of term
  a_n = coefficients of the cosine terms, n = 1, 2, ...
  b_n = coefficients of the sine terms, n = 1, 2, ...
  π ≈ 3.1416
  x_i = sampling point, time in this case
  i = 0, 1, 2, ..., K
  K = maximum number of sampling points (an even number)
  L = half of the fundamental sampling length, L = KΔx/2
  N = maximum degree of the series, N = K/2

In analyzing geologic processes, it is convenient to plot each harmonic as a single sine curve with a given phase and amplitude. In Fourier analysis, each harmonic is expressed by a pair of sine and cosine curves with the same period. When the sine and cosine curves are added algebraically, a new sine curve results with a phase shift and a new amplitude. The phase shift can be determined by using an arctangent subroutine (Louden, 1967). The phase for each harmonic, P_n, can be computed according to

P_n = arc tan(a_n/b_n)   (3.4.4)

The phase, which is expressed in degrees, is used to determine the starting point of the sine curve for each individual harmonic. Since the period for each harmonic is expressed in hours, it is also possible to convert the phase into hours. In this way, it is possible to compare coastal parameters such as wave height or wave period by comparing the phase shifts of different harmonics. The amplitude, α_n, for each harmonic can be determined directly from the power spectrum (Preston and Henderson, 1964). The discrete power spectrum, α_n², is defined as the sum of the squared Fourier coefficients according to

α_n² = a_n² + b_n²    n = 0, 1, 2, ..., K/2   (3.4.5)

The term "power spectrum" arose because of its relation to the power dissipation in an alternating-current circuit (Harbaugh and Merriam, 1968). The amplitude for each harmonic is derived from the square root of the power spectrum according to equation (3.4.6)

α_n = √(a_n² + b_n²)    n = 0, 1, 2, ..., K/2   (3.4.6)

where the phase, P_n, is expressed in radians and the amplitude, α_n, is derived from the power spectrum. The height of the curve, z_i, can be computed at each sampling point using
z_i = a_0/2 + Σ_{n=1}^{N} α_n sin[nπ(P_n + x_i)/L]   (3.4.7)

The amplitudes of the Fourier coefficients are especially meaningful because they are in the same units as the original data. Thus, if wave height is measured in feet or longshore current velocity in feet per second, the Fourier coefficients will be expressed in the same respective units.

Since each Fourier component is a discrete harmonic of the curve for the observed data, the Fourier components are theoretically independent. The number of Fourier components available is equal to one-half the total number of data points or observations. Therefore, with 360 observations taken at 2-hour intervals over 30 days, it is possible to obtain 180 Fourier harmonics with periods ranging from 4 to 720 hours. With least-squares techniques, the total variance accounted for by each harmonic can be calculated. Theoretically, a curve consisting of the full 180 Fourier harmonics should account for 100 percent of the total sum of squares. In practice, a small number of basic harmonics usually accounts for a very large percentage of the total sum of squares. Where a particular harmonic or set of harmonics is related to a naturally occurring cycle, a large percentage of the sum of squares can be accounted for by a small number of harmonics.

3.5 AN APPLICATION

Barometric pressure, which is a major indication of the weather patterns passing through an area, has been selected to demonstrate how Fourier analysis is used. The observed curve for barometric pressure from 8:00 a.m., June 29 through 6:00 a.m., July 29, 1970, is plotted across the top of Figure 3.2. From the observed data, it can be seen that four major low-pressure systems passed through the area on July 4, 9, 15, and 19, 1970. Minor low-pressure systems, which cause small fluctuations in the observed curve, can be seen on July 1, 17, and 24.
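Equations (3.4.1), (3.4.2), (3.4.4), and (3.4.6) can be sketched in Python as follows, assuming unit sample spacing so that x_i = i and L = K/2; the function name and the test signal are illustrative. The arctangent is taken with atan2, a quadrant-aware form of the arc tan in equation (3.4.4).

```python
import math

def harmonic(z, n):
    """Coefficients a_n, b_n by trapezoidal numerical integration
    (eqs. 3.4.1-3.4.2), plus amplitude (3.4.6) and phase in degrees
    (3.4.4). z holds K + 1 equally spaced samples z_0 .. z_K."""
    K = len(z) - 1
    L = K / 2
    a = (2 / K) * ((z[0] + z[K]) / 2
                   + sum(z[i] * math.cos(n * math.pi * i / L)
                         for i in range(1, K)))
    b = (2 / K) * sum(z[i] * math.sin(n * math.pi * i / L)
                      for i in range(1, K))
    amplitude = math.sqrt(a * a + b * b)
    phase = math.degrees(math.atan2(a, b))
    return a, b, amplitude, phase

# A pure first-harmonic sine wave of amplitude 2 is recovered exactly.
K = 40
z = [2 * math.sin(2 * math.pi * i / K) for i in range(K + 1)]
a1, b1, amp, ph = harmonic(z, 1)
print(round(amp, 6))  # 2.0
```

For this test signal the cosine coefficient a_1 is essentially zero and the sine coefficient b_1 carries the full amplitude, matching the definitions above.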
The major low-pressure systems are accompanied by high winds and waves that caused beach erosion or deposition and changes in the configuration of the nearshore bars.
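Equation (3.4.7), which rebuilds the smoothed curve from the harmonic amplitudes and phases, can be sketched as follows; the function name and the single-harmonic example are hypothetical.

```python
import math

def synthesize(a0, harmonics, x, L):
    """Equation (3.4.7): z at point x from the mean level a0/2 plus a
    list of (n, alpha_n, P_n) triples, with P_n and x in the same units
    (hours, in the barometric-pressure example)."""
    return a0 / 2 + sum(alpha * math.sin(n * math.pi * (P + x) / L)
                        for n, alpha, P in harmonics)

# A single first harmonic with amplitude 1 and zero phase peaks at x = L/2.
print(synthesize(0.0, [(1, 1.0, 0.0)], 5.0, 10.0))  # 1.0
```

Summing such terms for the first 8 or 15 harmonics gives the cumulative curves discussed below.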
FIGURE 3.2 Observed barometric pressure (inches of mercury), July 1970, with curves for individual Fourier harmonics and their cumulative sum.

The period, cosine, sine, phase, amplitude, and sum of squares for the first 15 Fourier harmonics for barometric pressure are given in Table 3.1.
TABLE 3.1

Harmonic,   Period   Cosine,   Sine,    Phase,   Amplitude,   Sum Sq.,
    n                  a_n      b_n      P_n        α_n          %
    1       720.0     0.059   -0.096    296.8      0.112       33.6
    2       360.0    -0.024   -0.069    199.2      0.073       15.2
    3       240.0    -0.053    0.046    207.2      0.070       15.1
    4       180.0     0.031    0.005     40.4      0.031        2.1
    5       144.0     0.045    0.049     17.0      0.067       11.1
    6       120.0    -0.051   -0.033     88.7      0.051        8.8
    7       102.6    -0.012    0.016     92.0      0.020        1.3
    8        90.0     0.028   -0.029     34.1      0.041        4.1
    9        80.0     0.012   -0.008     27.3      0.014        0.2
   10        72.0    -0.009   -0.008     45.4      0.012        0.7
   11        65.3     0.005    0.007      6.4      0.009        0.1
   12        60.0     0.007   -0.009     23.9      0.011        0.2
   13        55.2    -0.032   -0.009     38.2      0.024        2.2
   14        51.3     0.020   -0.004     14.5      0.020        0.8
   15        48.0     0.007   -0.019     21.2      0.020        0.9

The first harmonic has a period of 30 days or 720 hours and an amplitude of 0.112 inches of mercury. The phase for the first harmonic is 148.4 degrees or 296.8 hours. The first harmonic, which accounts for approximately 33.6 percent of the total sum of squares, is plotted as the second curve in Figure 3.2. The curve for the first harmonic has a low on July 9 and a high on July 24. This is strongly influenced by the low-pressure systems early in the month and the high-pressure system that passed through the area late in the month. The second harmonic, plotted as the third curve in Figure 3.2, has a period of 15 days or 360 hours, an amplitude of 0.073 inches, and a phase of 199.2 (degrees and hours coincide for this harmonic, since its period is 360 hours). This harmonic has highs on July 10 and 25 and lows on July 2 and 17. Although the curve for the second harmonic does not appear to agree with the observed data, it accounts for 15.2 percent of the total sum of squares. The curves for the third through eighth harmonics are also plotted in Figure 3.2 along with the cumulative curve
for the first eight harmonics. The first eight harmonics for barometric pressure account for 90.6 percent of the total sum of squares. As with any periodic data having a wave form, the harmonics interfere with each other, resulting in reinforcement or cancellation at certain parts of the curve. In the cumulative curve for the first eight harmonics, the major high- and low-pressure systems in the observed data can be easily recognized.

In order to get a closer approximation of the mathematical function representing barometric pressure, the first 15 Fourier harmonics were computed from the observed data. The cumulative curve for the first 15 harmonics, which accounts for 95.4 percent of the total sum of squares, includes the minor lows on July 1, 17, and 24. The 15th harmonic has a period of 48 hours or two days; therefore, the bottom curve in Figure 3.2 accounts for all the variation in the data with a period of two days or longer. The residual obtained by subtracting the 15-term curve from the observed data still accounts for 4.6 percent of the total sum of squares. This can be accounted for by the diurnal variation in barometric pressure due to heating during the day and cooling off at night. The normal diurnal fluctuation of barometric pressure has an amplitude of about 0.03 inches of mercury. By using Fourier analysis, therefore, it is possible to eliminate the diurnal variation from the barometric pressure curve. It is also possible to compare barometric pressure with other environmental parameters by comparing the phase and amplitude for each of the Fourier components. By keeping the number of harmonics constant, it is possible to visually compare the computed curves. The period, phase, and amplitude for each of the environmental parameters are given in Fox and Davis (1971).

In the northern hemisphere, winds circulate in a counterclockwise direction around a low-pressure system.
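The internal consistency of Table 3.1 can be checked against equation (3.4.6): each tabulated amplitude should equal sqrt(a_n² + b_n²), to within the rounding of the printed coefficients. A short Python sketch using the first three rows of the table:

```python
import math

# Cross-check of Table 3.1 against eq. (3.4.6); agreement is limited by
# the three-decimal rounding of the tabulated a_n and b_n.
rows = [            # (harmonic, a_n, b_n, tabulated amplitude)
    (1,  0.059, -0.096, 0.112),
    (2, -0.024, -0.069, 0.073),
    (3, -0.053,  0.046, 0.070),
]
for n, a, b, amp in rows:
    computed = math.sqrt(a * a + b * b)
    assert abs(computed - amp) < 0.002   # within rounding of the inputs
    print(n, round(computed, 3))
```

The same kind of check can be applied to the phase column through equation (3.4.4).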
During the summer months, the low-pressure systems generally pass to the north of the study area located on the eastern shore of Lake Michigan. Therefore, as the low-pressure system approaches the area, winds blow out of the southwest and generate waves from that direction. As the front passes over, the wind builds up in intensity and shifts over to the northwest. Since the winds following the passage of the front are generally stronger,
the waves from the northwest are higher and have a longer period. During the high wave conditions following the passage of the front, the waves run up on the beach and water percolates into the groundwater system. The 15-term Fourier curves for wind velocity, wave period, breaker height, and groundwater table level are plotted in Figure 3.3. The period, phase, and amplitude for each of the harmonics are given in Fox and Davis (1971). Wind, which is the driving force, controls the wave period and breaker height, which in turn influence the level of the groundwater table. Therefore, a phase lag would be expected in the Fourier curves, with wind velocity reaching a peak first, followed by wave period and breaker height, with the groundwater table

FIGURE 3.3 15-term Fourier curves for wind velocity, wave period, breaker height, and groundwater table (lake) level, July 1970.
responding a few hours later. The 15-term Fourier curve for wind velocity shows peaks that correspond to the low points in the barometric pressure curve in Figure 3.2. The maximum wind velocity was recorded at 8:00 a.m. on July 4, 1970. The curves for wave period and breaker height in Figure 3.3 correspond quite closely to the wind velocity curve. The curve for wave period has its peaks a few hours after breaker height and drops off more slowly than breaker height. As waves change from storm waves to swells, the breaker height decreases and the wave period increases. There is a surprisingly close correspondence between the curves for breaker height and groundwater level in Figure 3.3. Groundwater level was measured in three tubes located approximately 10, 21, and 32 feet from the plunge zone. Since the plunge zone moves with time, average distances are given for the groundwater tubes. For each of the groundwater tubes, the sixth Fourier harmonic, with a period of 120 hours or 5 days, accounts for the greatest percentage of the total sum of squares. For the first groundwater tube, the sixth harmonic has a phase of 9.8 hours. For the second tube, the same harmonic has a phase of 12.3 hours, and for the third tube, it has a phase of 16.8 hours. This yields a phase difference of approximately 7 hours between the first and the third tubes. Since the tubes are 22 feet apart, this indicates that the groundwater that was fed into the foreshore by run-up on the beach percolates through the beach at a rate of approximately 3 feet per hour. Therefore, the curve for the second groundwater tube, which is given in Figure 3.3, has a phase lag of approximately 7 hours behind the breaker height curve.

The reversal of wind direction with the passage of low pressure systems plays an important role in controlling coastal processes.
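The percolation-rate estimate above is simple arithmetic and can be verified directly; the figures in the sketch are those quoted in the text.

```python
# Travel time of the groundwater front between tubes 1 and 3, from the
# phase of the sixth Fourier harmonic at each tube (values from the text).
phase_tube1, phase_tube3 = 9.8, 16.8   # hours
separation = 32 - 10                   # feet between tubes 1 and 3
lag = phase_tube3 - phase_tube1        # 7.0 hours
rate = separation / lag                # feet per hour
print(round(rate, 1))                  # 3.1
```

The result, roughly 3 feet per hour, matches the rate stated in the text.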
Plots of the alongshore component of the wind, longshore current velocity, and breaker angle are given in Figure 3.4. As a low-pressure system approaches the study area, wind and waves are generated from the southwest. The waves that move onto the shore from the southwest generate longshore currents that move to the north. As the low-pressure system passes over the area, wind direction shifts over to the northwest, generating waves out of that direction. With the shift in wind direction, breaker angle and longshore current are reversed
FIGURE 3.4 Alongshore wind component, longshore current velocity, and breaker angle, July 1970.

with the current flowing to the south. Wind and waves approaching from the northwest, accompanied by a southward-flowing longshore current, are recorded as positive, while wind and waves from the southwest with a northward-flowing longshore current are considered negative. As a low-pressure system approaches, a gradient wind is generated around the low-pressure system which spirals counterclockwise inward toward the center. Since the fronts pass to the north of the study area during the summer, the counterclockwise winds are blowing out of the southwest as the front moves into the area. As the front approaches, the winds increase in velocity, building up the heights of the breakers and increasing the velocity of the longshore currents. After the front passes over, the winds shift over to the northwest, followed by a corresponding shift in breaker angle and longshore current
direction. The storm cycle pattern, with a low in barometric pressure accompanied by a peak in wind velocity and breaker height and a reversal in longshore current direction, is repeated several times in Figures 3.2, 3.3, and 3.4.

REFERENCES

Fox, W. T., 1964, FORTRAN and FAP program for calculating and plotting time-trend curves using an IBM 7090 or 7094/1401 computer system: Kansas Geol. Survey Spec. Dist. Pub. 12, 24 p.

Fox, W. T., 1968, Quantitative paleoecologic analysis of fossil communities in the Richmond Group: Jour. Geology, v. 76, pp. 613-640.

Fox, W. T., and Davis, R. A., Jr., 1971, Fourier analysis of weather and wave data from Holland, Michigan, July, 1970: O.N.R. Tech. Report No. 3, Contract 388-092, 79 p.

Harbaugh, J. W., and Merriam, D. F., 1968, Computer applications in stratigraphic analysis: New York, John Wiley & Sons, 282 p.

Kendall, M. G., 1948, The advanced theory of statistics: London, C. Griffin & Co., 503 p.

Krumbein, W. C., and Pettijohn, F. J., 1938, Manual of sedimentary petrography: New York, Appleton-Century Co., 549 p.

Louden, R. K., 1967, Programming the IBM 1130 and 1800: Englewood Cliffs, N.J., Prentice-Hall, Inc., 433 p.

Miller, R. L., and Kahn, J. S., 1962, Statistical analysis in the geological sciences: New York, John Wiley & Sons, 483 p.

Preston, F. W., and Henderson, J. H., 1964, Fourier series characterization of cyclic sediments for stratigraphic correlation, in Symposium on cyclic sedimentation (Merriam, D. F., ed.): Kansas Geol. Survey Bull. 169, v. 2, pp. 415-425.

Vistelius, A. B., 1961, Sedimentation time-trend functions and their application for correlation of sedimentary deposits: Jour. Geology, v. 69, pp. 703-728.

Walker, R. G., and Sutton, R. G., 1967, Quantitative analysis of turbidites in the Upper Devonian Sonyea Group, New York: Jour. Sed. Petrology, v. 37, pp. 1012-1022.
Walpole, R. L., and Carozzi, A. V., 1961, Microfacies study of the Rundle Group (Mississippian) of Front Ranges, Central Alberta, Canada: Am. Assoc. Petroleum Geologists Bull., v. 45, pp. 1810-1846.

Weiss, M. P., Edwards, W. R., Norman, C. E., and Sharp, E. R., 1965, The American Upper Ordovician standard. VII. Stratigraphy and petrology of the Cynthiana and Eden Formations of the Ohio Valley: Geol. Soc. Amer. Spec. Paper 81, 76 p.

Whittaker, E. T., and Robinson, G., 1929, The calculus of observations: a treatise on numerical mathematics (2d ed.): London, Blackie & Son, 395 p.
Chapter 4

Markov Models in the Earth Sciences

W. C. Krumbein

4.1 FUNDAMENTALS

The term "random process" has an unfortunate connotation for many earth scientists. It seems to imply a haphazard, unorganized, sporadic, and unpredictable process that violates the basic principles of science. These principles rest heavily on the fact that science seeks systematic, patterned responses from recognizable causes and that unpredictable or chance events have no place in scientific analysis. Hence, by extension, models that postulate any kind of random occurrences are naturally held suspect.

Much of this misunderstanding arises from lack of recognition that a random variable has as valid a basis in scientific investigation as the conventional nonstochastic variable (systematic variable) that forms the basis of classical mathematical physics. A random variable is a mathematical entity that arises from probabilistic mechanisms, just as systematic variables are associated with deterministic mechanisms. The outcome of a deterministic experiment is exactly predictable from knowledge of the relations between dependent and independent variables, whereas the outcome of a probabilistic experiment depends on the likelihood of a given event occurring in some underlying set of probabilities. This set of probabilities constitutes the sample space of the probabilistic mechanism; if this is known, then the group behavior of the variables is completely predictable in rigorous mathematical terms. Moreover, the probability of a particular event occurring can also be exactly stated.
4.2 A SPECTRUM OF MODELS

In virtually all fields of science the range of process mechanisms extends from fully path-dependent deterministic models (in which past events completely control future events) to independent-events models, in which the past has no influence whatever on future events. A simple example of a deterministic model is the negative exponential process in time,

    f(t) = Y = Y_0 e^{-at}                                              (4.2.1)

where the dependent variable Y is completely controlled by the constant Y_0, the fixed exponent a, and the independent variable t. If Y_0 and a are known either from theory or experiment, the value of Y associated with any time t can be exactly predicted. Although equation (4.2.1) is a continuous function, it can be discretized by considering successive values of f(t) at some small increment Δt. In this way the "state of the system" can be thought of in terms of discrete points in time t_{n-2}, t_{n-1}, t_n, t_{n+1}, and so on. For some phenomena the distance, X, can be substituted for time.

At the other end of the spectrum is the independent-events model. In discrete form this is expressed as

    P{state j at t_{n+1} | state i at t_n} = p_j                        (4.2.2)

which says that the probability of the system being in state j at time t_{n+1}, given that the state at t_n is i, is simply the probability of state j occurring at time t_{n+1}, wholly independently of the previous state of the system.

Somewhere between these extremes lie processes in which partial dependencies are present, in the sense that the state of the system at t_{n+1} does depend on the state at t_n, but is not influenced by earlier states such as that at t_{n-1}. This particular case gives rise to the simplest kind of Markov chain, a discrete-time, discrete-state, one-step memory process expressed as

    P{state j at t_{n+1} | state i at t_n} = p_ij                       (4.2.3)
in which p_ij is the conditional probability of the system being in state j at t_{n+1}, given that it was in state i at t_n. Here p_ij is the transition probability: the probability that the system changes from state i to state j in the discrete time step from t_n to t_{n+1}.

4.3 THE MARKOV CHAIN

In its simplest classical form the first-order, discrete-state, discrete-time Markov chain can be visualized as representing a system with a finite number of discrete states, A, B, C, ..., behaving in such a way that a transition occurs from state i to j (where j may be either the same state or a different state) at each tick of a conceptual "Markovian clock." The model is expressed as a transition probability matrix with rows represented by i and columns by j. A three-state system can be shown as:

                          To State j
                       A      B      C
                  A   p_AA   p_AB   p_AC
     From State i B   p_BA   p_BB   p_BC
                  C   p_CA   p_CB   p_CC

Here p_AA, p_BB, and p_CC, commonly designated as p_ii, represent transitions from a given state to itself, whereas the offdiagonal entries, designated as p_ij where j ≠ i, represent transitions to other states. The notation in the matrix is such that, for state A, p_AA is the probability that the system will remain in the same state, p_AB is the probability that the system will move to state B at a given clock tick, and p_AC is the probability that it will move to state C. These three probabilities sum to 1.0. Note that when transitions occur from a given state to itself, no change in the system is apparent to an onlooker until the system, on some given clock tick, does change to a different state. The length of time that the system remains in a given state after having entered it at a particular tick is called the
(discrete) waiting time for state i. Literally it refers to the number of clock ticks that the system "waits" in state i before leaving i for another state j ≠ i.

In structuring data for this simplest Markov chain, observations of state are made at each tick of the clock. This of course is an imaginary clock, and in practice one selects a fixed time interval based on theory, observation of what is going on, or even simple geologic intuition. In stratigraphic applications observations of state are made at fixed vertical intervals along a stratigraphic section, say at every foot. Thus, if state A represents sandstone, state B shale, and state C limestone, a sequence of such observations upward through the section might be AAABBCBBBABBCCCA..., which says that the system starts in state A and remains there for two more clock ticks, after which it changes to state B and remains there for a second tick, changes to state C for one tick, then returns to state B, and so on. If observations are made at 1-foot intervals, the section has 3 feet of sandstone at the base, followed by 2 feet of shale, 1 foot of limestone, and so on.

In this particular procedure, equal increments of distance are used instead of equal increments of time. This is a matter of convenience, and although several severe geologic implications may be involved, the matrix is still expressed as transition probabilities, but now the waiting time is a discrete "thickness time." Krumbein and Dacey (1969) refer to this kind of structuring as a Markov chain with transition matrix P. The actual procedure for assembling the transition matrix is given in Krumbein (1967, p. 3). Here, we follow through with the simplest case.

Transition probability matrices can be generated from any succession of events, but this of itself gives no indication whether the process is Markovian.
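The tally-and-normalize procedure for assembling P can be illustrated in modern terms. The short routine below is a sketch in Python rather than the FORTRAN of the programs cited above, and the function name is our own; it counts upward transitions in the observation string and divides each count by its row total:

```python
from collections import Counter

def transition_matrix(sequence, states):
    """Estimate first-order transition probabilities p_ij from a
    succession of states observed at equal intervals."""
    pairs = Counter(zip(sequence, sequence[1:]))  # tally i -> j transitions
    P = {}
    for i in states:
        row_total = sum(pairs[(i, j)] for j in states)
        P[i] = {j: pairs[(i, j)] / row_total for j in states}
    return P

# The 1-foot observations from the text: A = sandstone, B = shale, C = limestone
section = "AAABBCBBBABBCCCA"
P = transition_matrix(section, "ABC")
for i in "ABC":
    print(i, {j: round(P[i][j], 2) for j in "ABC"})
```

Each row of the resulting matrix sums to 1.0, as required of transition probabilities.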
Statistical tests are available for making the decision; the most widely used is that of Anderson and Goodman (1957), which tests the hypothesis of an independent-events model against the alternative that a first-order Markovian property is present. Even if the hypothesis of an independent-events model is rejected, and that of a first-order chain accepted at least by implication, there
is a second requirement that must be fulfilled. This is that the waiting times for each state must be geometrically distributed. This follows from the fact that the output from any simulation or Monte Carlo studies with Markov chains having matrix P = [p_ij] is distributed in this manner with parameter (1 - p_ii). This requirement is so important that it deserves detailed examination.

4.4 GEOMETRIC DISTRIBUTION

This distribution is conveniently examined in terms of independent-events models of the kind shown in equation (4.2.2). Consider a single six-sided die, in which the sample space has six elements, representing the six faces, each with its pattern of dots. Each face has probability 1/6 of occurring faceup on any toss, and probability 5/6 that some other face will appear on top. If we consider the die in these terms, we have a simple system of two states. If a given face, say 4, comes up on a given toss, we can translate this into a "waiting time" by asking how long the system will remain in this state, i.e., what is the likelihood that a 4 will occur once, twice, or more times in successive throws before a non-4 shows up?

We can set up an equation for this as follows: Let P{R = k} be the probability that the number of times, R, that a 4 comes face up is exactly k, where k = 1, 2, 3, .... The probability that a 4-face appears initially is p = 1/6, and the probability that it will not appear on the next trial is 1 - p = 5/6. Thus, once a 4-face occurs, the probability that it will occur exactly k times means that it must be repeated exactly k - 1 times, so that on the k-th trial some face other than a 4 will occur. This leads to the geometric density with parameter (1 - p):

    P{R = k} = (1 - p)(p)^{k-1} = (5/6)(1/6)^{k-1}                      (4.4.1)

In this expression a "success" occurs when on the k-th trial a non-4 appears face up. The successive probabilities are easily calculated. By setting k = 1, we obtain P{R = 1} = (5/6)(1/6)^0 = 5/6 = 0.8333. For k = 2
this becomes (5/6)(1/6)^1 = 0.1389; for k = 3 we have (5/6)(1/6)^2 = 0.0231; and the probability that k exceeds 3 is only 0.0046.

The geometric distribution applies to all independent-events models of the kind expressed by equation (4.2.2). The reason that it also applies to each state of a Markov chain with transition matrix P = [p_ij] is that each row of the transition matrix is in fact a two-state independent-events model (a Bernoulli model), in that when the system is in state i, the probability of remaining in that state on the next tick of the clock is p_ii and the probability that it will change to another state j ≠ i is (1 - p_ii). The one-step memory of the Markov chain is thus related to the outcome of a random draw that determines whether the next drawing is to be made from the same row (state) or from some other row as specified by the offdiagonal p_ij's for j ≠ i.

The full development of the geometric distribution as it applies to the simplest Markov chain is given in Krumbein and Dacey (1969, p. 83), which includes an example with a histogram. It is interesting to note, incidentally, that (1 - p_ii) is estimated by the reciprocal of the arithmetic mean of the geometric distribution. In terms of equation (4.4.1), p_ii is the probability of remaining in state i for any individual drawing, and (1 - p_ii) is the probability of leaving state i on any given tick of the clock.

4.5 PROBABILITY TREES

The transition matrix P can be used to show the probabilities associated with each state of the system through a succession of ticks on the Markovian clock. Such a tree is illustrated in Krumbein (1967, p. 27) and in Harbaugh and Bonham-Carter (1970, p. 115-117). The diagrams are very instructive, and with patience one can extend the tree until succeeding sets of branches all have fixed probabilities.
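The waiting-time probabilities of equation (4.4.1) are easily tabulated. The snippet below (the function name is ours) reproduces the die example, with p_stay playing the role of p_ii, or 1/6 for the 4-face:

```python
def waiting_time_pmf(p_stay, k):
    """P{R = k}: remain in the state for k - 1 more draws, then leave.
    p_stay is the probability of staying (p_ii in a Markov matrix)."""
    return (1 - p_stay) * p_stay ** (k - 1)

# Die example from equation (4.4.1): p = 1/6
probs = [waiting_time_pmf(1/6, k) for k in (1, 2, 3)]
print([round(q, 4) for q in probs])   # → [0.8333, 0.1389, 0.0231]
tail = 1 - sum(probs)                 # P{k > 3}, about 0.0046
```

The reciprocal of the mean waiting time, 1/E(R) = 1 - p_stay, is the estimator for (1 - p_ii) mentioned above.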
At this point the system has reached equilibrium, and the fixed probabilities associated with each state in the system express the overall average relative proportion of that component in the system under study.
When stratigraphic data are structured in the equal-interval form, with vertical spacing of h feet, the fixed probability vector × 100 gives the overall percentage of the total thickness of each lithology in the section. Mathematically the fixed vector is obtained by raising the matrix P to successively higher powers, and noting when all the rows of the transition matrix achieve the same values. The fixed probability vector is actually an independent-events model, and simulations arising from it have geometric distributions as in equation (4.4.1), with parameters (1 - p_i), where p_i is now the fixed probability of the i-th component.

This simplest Markov model serves to bring out some essential points in applications of Markov processes in geology. Recall that it is a discrete-state, discrete-time, one-step memory model. Variants of this model may have two or more steps in their memories (Pattison, 1965; Schwarzacher, 1967; James and Krumbein, 1969, p. 552). Moreover, the chains may be converted to continuous-time models (Krumbein, 1968a, b), and (though this becomes mathematically complex) the states may be made continuous instead of discrete. A point not previously emphasized is that the probabilities in a Markov matrix must be stationary in the sense that the transition probabilities remain the same through the entire system being studied. Harbaugh and Bonham-Carter (1970, p. 122) develop this topic in greater detail.

A very large variety of experiments may be conducted in the framework of this simplest Markov model. Time or distance may in general be interchanged, and neither of these need be observed at fixed intervals. A question that emerges when Markov models are examined in detail concerns proper procedures when the input data are not geometrically distributed.
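The procedure of raising P to successively higher powers until every row agrees can be sketched as follows. The three-state matrix is hypothetical, chosen only so that each row sums to 1:

```python
def matmul(A, B):
    """Multiply two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def fixed_vector(P, tol=1e-10):
    """Square P repeatedly until all rows agree; any row is then the
    fixed probability vector."""
    Q = P
    while max(abs(Q[0][j] - Q[i][j]) for i in range(len(P))
              for j in range(len(P))) > tol:
        Q = matmul(Q, Q)
    return Q[0]

P = [[0.50, 0.50, 0.00],   # hypothetical sandstone/shale/limestone matrix
     [0.14, 0.57, 0.29],
     [0.25, 0.25, 0.50]]
print([round(100 * f, 1) for f in fixed_vector(P)])  # percent of section thickness
```

The returned vector v satisfies vP = v, the defining property of the fixed (stationary) vector.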
This is particularly appropriate in stratigraphic analysis, where much observational data suggest that rock thicknesses are distributed lognormally rather than geometrically. We examine this situation next.
4.6 EMBEDDED MARKOV CHAINS

When a set of real-world data has the Markov property but does not display a geometric distribution for each state, an embedded Markov chain may be more appropriate for analysis. This version of the Markov model is obtained by structuring the transition probability matrix on the basis of changes of state only, so that no transitions from a given state to itself are permitted. In this arrangement the sequence of observations listed earlier becomes simply ABCBABC..., thus recording only the sequence of rock types in the stratigraphic section.

The result of structuring the data this way is to reduce the diagonal elements in the transition matrix to zero. Because this model behaves very differently from the Markov chain with transition matrix P = [p_ij], Krumbein and Dacey use the symbol r_ij for the transition probabilities, with r_ii identically zero. Thus the earlier matrix becomes:

                          To State
                       A      B      C
                  A    0     r_AB   r_AC
     From State i B   r_BA    0     r_BC
                  C   r_CA   r_CB    0

The r's here are probabilities, not correlation coefficients. This is called an embedded Markov chain with transition matrix R = [r_ij]. It is obtained from the P matrix by the relation r_ij = p_ij/(1 - p_ii) for all j ≠ i; each p_ii in the diagonal is then changed to zero. The embedded chain does not specify a waiting time, which means that any frequency distribution can be used as follows: the R matrix is used to get the succession of states, and for each occurrence a random observation is drawn from the thickness frequency distribution of the corresponding state.

Carr et al. (1966) used such a matrix in a study of a Chester (upper Mississippian) section, using lognormal thickness distributions for each component. Although the embedded chain is independent of the
geometric waiting time, the matrix nevertheless should be tested for the Markov property. An interesting question to be raised is what a simulated section would look like if the R matrix were used directly for simulation, without assigning random thicknesses.

It should be mentioned here that probability trees and fixed probability vectors for embedded Markov chains with matrix R do not have the same interpretation as in Markov chains with matrix P. When the embedded matrix (with its zero diagonal elements) is raised to its equilibrium power, the fixed probabilities × 100 give the percentage of the overall number of times that a given lithologic type occurs. This is because the R matrix gives no information about thicknesses.

In their study of the Chester section, Carr et al. noted that occasionally a given rock type was immediately followed by a variant of the same kind of rock, as when a thickbedded limestone is overlain directly by a thinbedded limestone. To cope with these situations, entries were put into the diagonal of the R matrix, to represent the probability of such occurrences among the lithologic units in their section. This variant was called a multistory transition matrix, and it was used in simulation as before, drawing random thicknesses as needed. The introduction of any finite element into the diagonal of an embedded matrix immediately introduces a geometric waiting time into that state of the system, and random thicknesses drawn from a lognormal distribution are no longer strictly appropriate. This can easily be seen by simulation experiments, which automatically yield geometric thickness distributions with parameter (1 - r_ii). Potter and Blakely (1967) later used this same kind of matrix to simulate a fluviatile sandstone section with several varieties of sand bedding.

The problems raised by adoption of either the simple Markov chain with matrix P = [p_ij] or the embedded chain with matrix R = [r_ij] mainly concern the "true" distribution of bed thicknesses and lithologic-unit thicknesses in stratigraphic sections. What has been suggested is a re-examination of the operational definitions by which these distributions are obtained, inasmuch as the critical point involved is the relative frequency of very thin beds (Krumbein, 1972). If current operational definitions are insensitive to very thin beds or
units, an observed distribution could appear to be lognormal rather than geometric, or more properly exponential in the continuous case, inasmuch as the continuous equivalent of the geometric distribution is simply f(t) = βe^{-βt}, where β is the parameter, related to the transition probabilities in the P matrix.

4.7 EXTENSIONS

Two aspects of Markovian analysis in stratigraphy will probably become more important as time goes on. One of these, touched upon earlier, is the use of transition rates rather than transition probabilities. This involves moving the model from discrete to continuous time, but allowing the states to remain discrete. The subject is explored in Krumbein (1968a, b) in terms of a lateral-shift model that can be applied to transgressive-regressive movements of a strand line. This can actually be done with transition probabilities by redefining the states of the system as the successive positions through time of the strand line, as monitored in terms of strand-line deposits. In this approach one may start to analyze processes rather than responses, though observational data may be less readily obtained. Part of the difficulty involves exact relations between time and thickness of stratigraphic units, if the outcome of a continuous-time process is to be expressed in rock thicknesses as continuous distributions.

A second promising avenue for further research is to examine stratigraphic sections at the individual-bed level rather than at that of the rock units themselves. In this approach the transition matrix is based on successions of individual beds, and the transition probabilities express the likelihood that a given kind of bed (say of shale) persists through a number of successive clock ticks, or whether the state changes to another kind of bed, say of limestone.
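The embedded-chain construction of Section 4.6 (zero the diagonal, divide each off-diagonal entry by 1 - p_ii, and attach a thickness distribution of one's choosing) can be sketched as follows. The matrix values and lognormal parameters are invented for illustration, and are not those of Carr et al. (1966):

```python
import random

def embed(P, states):
    """R matrix: r_ij = p_ij / (1 - p_ii) for j != i, and r_ii = 0."""
    return {i: {j: (0.0 if j == i else P[i][j] / (1 - P[i][i]))
                for j in states} for i in states}

def simulate_section(R, start, n_units, thickness, rng):
    """Walk the embedded chain, drawing a random thickness per unit."""
    units, state = [], start
    for _ in range(n_units):
        units.append((state, thickness(state, rng)))
        nxt = list(R[state].items())
        state = rng.choices([j for j, _ in nxt],
                            weights=[r for _, r in nxt])[0]
    return units

P = {"A": {"A": 0.50, "B": 0.50, "C": 0.00},
     "B": {"A": 0.14, "B": 0.57, "C": 0.29},
     "C": {"A": 0.25, "B": 0.25, "C": 0.50}}
R = embed(P, "ABC")

# Lognormal unit thicknesses, in the spirit of Carr et al.; parameters invented
rng = random.Random(7)
section = simulate_section(R, "A", 10,
                           lambda s, r: r.lognormvariate(0.5, 0.6), rng)
```

Because the diagonal of R is zero, the simulated succession never repeats a state, exactly as in the ABCBABC... sequence above.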
This model involves three frequency distributions: one representing the distribution of the number of beds per lithologic unit, the second representing the thickness distributions of the individual beds, and the third representing the thickness distributions of the lithologic units, whose individual thicknesses represent the sum of all the bed thicknesses in the unit.
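The relation among these three distributions can be checked numerically. The sketch below, a deterministic convolution with arbitrarily chosen parameters, illustrates the result of Dacey and Krumbein (1970): a geometric number of geometrically distributed bed thicknesses sums to a geometrically distributed unit thickness, here with parameter equal to the product of the two input parameters (in the success-probability parameterization used in the code):

```python
def geom_pmf(p, k):
    """Geometric on k = 1, 2, ...: P{K = k} = p (1 - p)^(k - 1)."""
    return p * (1 - p) ** (k - 1)

def convolve(a, b):
    """Distribution of the sum of two independent discrete thicknesses."""
    kmax = len(a) - 1
    c = [0.0] * (kmax + 1)
    for i, ai in enumerate(a):
        for j in range(kmax + 1 - i):
            c[i + j] += ai * b[j]
    return c

def unit_pmf(p_beds, p_thick, kmax=60):
    """Unit thickness: a geometric number of beds (parameter p_beds),
    each with geometric thickness (parameter p_thick)."""
    bed = [0.0] + [geom_pmf(p_thick, k) for k in range(1, kmax + 1)]
    total, cur = [0.0] * (kmax + 1), bed[:]
    for n in range(1, kmax + 1):        # cur = pmf of the sum of n beds
        pn = geom_pmf(p_beds, n)
        total = [t + pn * c for t, c in zip(total, cur)]
        cur = convolve(cur, bed)
    return total

unit = unit_pmf(0.4, 0.5)
# Compound comes out geometric with parameter 0.4 * 0.5 = 0.2:
print([round(unit[k], 4) for k in range(1, 5)])  # → [0.2, 0.16, 0.128, 0.1024]
```

The predicted unit-thickness parameter thus follows directly from the number-of-beds and bed-thickness parameters, as the text describes.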
Dacey and Krumbein (1970) have looked into this problem, and the most interesting part of the study was the demonstration that if the number of beds in lithologic units of the i-th lithology is distributed geometrically, and if the thickness distribution of beds of the i-th lithology in the lithologic unit is also geometric (with the same or different parameters), then the thickness distribution of the lithologic units of the i-th lithology will be distributed geometrically with a parameter predictable from the parameters of the number-of-beds distribution and the bed-thickness distribution.

Implementation of the lateral-shift model and of the bed-transition model is hindered by lack of observational data in the literature. The lateral-shift model requires identification of cross-cutting relations between time lines and rock lines over short distances, and the bedding model is presently hindered by lack of truly discriminatory ways of distinguishing between thickness samples drawn from lognormal and exponential distributions. Some examples of both of these kinds of distributions tend to plot as relatively straight lines on log-probability paper, and even chi-square tests may not be fully discriminatory.

Despite the difficulties that beset advanced analytical applications of Markov models, in contrast to their largely descriptive present use, there are several relatively straightforward criteria that can be helpful in choosing Markovian models as against independent-events models for analyzing earth-science data. Basically these depend upon two considerations: presence or absence of a Markov property, and presence or absence of geometric distributions in the data of concern. Four combinations can be distinguished in stratigraphic analysis:

1.
The observed data have a first-order Markov dependency (i.e., the state of the system at event t_{n+1} is controlled by the state at event t_n) in the succession of lithologies, and they each have a geometric distribution of lithologic-unit thicknesses.

2. The observed data have a first-order Markov dependency in the succession of lithologies, but they do not have a geometric distribution of lithologic-unit thicknesses.
3. The observed data do not have a first-order Markov dependency in the succession of lithologies, but they do have a geometric distribution of lithologic-unit thicknesses.

4. The observed data have neither a first-order Markov dependency in the succession of lithologies nor a geometric distribution of lithologic-unit thicknesses.

If combination 1 obtains, then all operations and interpretations that apply to discrete-state, discrete-time, first-order Markov chains with transition matrix P are appropriate. If combination 2 obtains, then the appropriate model is the embedded Markov chain with transition matrix R. Where combination 3 obtains, the appropriate model is an independent-events model of the kind shown in equation (4.2.2). Combination 4, having neither the Markov property nor the geometric distribution, is outside the limits of this discussion. In the present context, however, this last case could be called a "degenerate Markov chain," just as, in a sense, Anderson and Goodman's (1957) test is that of H_0 equal to a "zero-order chain" as against H_a equal to a first-order chain.

REFERENCES

Anderson, T. W., and Goodman, L. A., 1957, Statistical inference about Markov chains: Annals Math. Statistics, v. 28, p. 89-110.
Carr, D. D., and others, 1966, Stratigraphic sections, bedding sequences, and random processes: Science, v. 154, no. 3753, p. 1162-1164.
Dacey, M. F., and Krumbein, W. C., 1970, Markovian models in stratigraphic analysis: Math. Geol., v. 2, p. 175-191.
Harbaugh, J. W., and Bonham-Carter, G., 1970, Computer simulation in geology: New York, John Wiley & Sons, 575 p.
James, W. R., and Krumbein, W. C., 1969, Frequency distributions of stream link lengths: Jour. Geology, v. 77, p. 544-565.
Krumbein, W. C., 1967, FORTRAN IV computer programs for Markov chain experiments in geology: Kansas Geol. Survey Computer Contr. 13, 38 p.
Krumbein, W. C., 1968a, Computer simulation of transgressive and regressive deposits with a discrete-state, continuous-time Markov model, in Computer applications in the earth sciences: Colloquium on simulation, D. F. Merriam, ed.: Kansas Geol. Survey Computer Contr. 22, p. 11-18.
Krumbein, W. C., 1968b, FORTRAN IV computer program for simulation of transgression and regression with continuous-time Markov models: Kansas Geol. Survey Computer Contr. 26, 38 p.
Krumbein, W. C., 1972, Probabilistic models and the quantification process in geology: Geol. Soc. Amer. Spec. Paper 146, p. 1-10.
Krumbein, W. C., and Dacey, M. F., 1969, Markov chains and embedded Markov chains in geology: Math. Geol., v. 1, p. 79-96.
Pattison, A., 1965, Synthesis of hourly rainfall data: Water Resources Research, v. 1, p. 489-498.
Schwarzacher, W., 1967, Some experiments to simulate the Pennsylvanian rock sequence of Kansas: Kansas Geol. Survey Computer Contr. 18, p. 5-14.

BIBLIOGRAPHY

Adelman, I. G., 1958, A stochastic analysis of the size distribution of firms: Jour. Amer. Stat. Assoc., v. 53, p. 893-904. (Example of discrete states with unequal class intervals.)
Agterberg, F. P., 1966, The use of multivariate Markov schemes in geology: Jour. Geology, v. 74, p. 764-785.
Agterberg, F. P., 1966, Markov schemes for multivariate well data: Min. Ind. Experiment Sta., Pennsylvania State Univ. Spec. Publ. 2-65, p. Y1-Y18. (Theory and application of first-order Markov process to study of chemical elements in a reef.)
Allegre, C., 1964, Vers une logique mathematique des series sedimentaires: Bull. Soc. Geol. France, v. 6, p. 214-218.
Amorocho, J., and Hart, W. E., 1964, Critique of current methods in hydrologic systems investigation: Trans. Amer. Geophys. Union, v. 45, p. 307-321. [First-order and higher-order Markov chains (p. 318).]
Bartlett, M. S., 1960, An introduction to stochastic processes with special reference to methods and applications: Cambridge, The University Press, 312 p.
Billingsley, P., 1961, Statistical methods in Markov chains: Ann. Math. Stat., v. 32, p. 12-40.
Clark, W. A. V., 1964, Markov chain analysis in geography: an application to the movement of rental housing areas: Ann. Assoc. Am. Geog., v. 55, p. 351-359. (Study of rentals in several cities for three 10-year intervals.)
Coleman, J. S., 1964, Introduction to mathematical sociology: Glencoe, Illinois, Free Press, 554 p.
Doob, J. L., 1953, Stochastic processes: New York, John Wiley & Sons, Inc., 654 p.
Feller, W., 1968, An introduction to probability theory and its applications (3rd ed.): New York, John Wiley & Sons, 509 p.
Fenner, P., ed., 1969, Models of geologic processes: AGI/CEGS Short Course, Philadelphia, November, 1969. Available through American Geological Institute, Washington, D.C.
Gingerich, P. D., 1969, Markov analysis of cyclic alluvial sediments: Jour. Sed. Pet., v. 39, no. 1, p. 330-332.
Graf, D. L., Blyth, C. R., and Stemmler, R. S., 1967, One-dimensional disorder in carbonates: Illinois Geol. Survey Circ. 408, 61 p. (First-order Markov model applied to crystallographic defects in carbonate crystals.)
Griffiths, J. C., 1966, Future trends in geomathematics: Pennsylvania State Univ., Mineral Industries, v. 35, p. 1-8.
Harbaugh, J. W., 1966, Mathematical simulation of marine sedimentation with IBM 7090/7094 computers: Kansas Geol. Survey Computer Contr. 1, 52 p.
Harbaugh, J. W., and Wahlstedt, W. J., 1967, FORTRAN IV program for mathematical simulation of marine sedimentation with IBM 7040 or 7094 computers: Kansas Geol. Survey Computer Contr. 9, 40 p.
Heller, R. A., and Shinozuka, M., 1966, Development of randomized load sequences with transition probabilities based on a Markov process: Technometrics, v. 8, p. 107-114.
Karlin, S., 1966, A first course in stochastic processes: New York, Academic Press, 502 p.
Kemeny, J. G., and Snell, J. L., 1960, Finite Markov chains: Princeton, New Jersey, Van Nostrand Co., Inc., 210 p.
Krumbein, W. C., and Graybill, F. A., 1965, An introduction to statistical models in geology: New York, McGraw-Hill Book Co., 475 p.
Krumbein, W. C., and Scherer, W., 1970, Structuring observational data for Markov and semi-Markov models in geology: Tech. Rept. No. 15, ONR Task 389-150. National Clearinghouse No. AD 716794.
Leopold, L. B., Wolman, M. G., and Miller, J. P., 1964, Fluvial processes in geomorphology: San Francisco, Freeman and Co., 522 p.
Loucks, D. P., and Lynn, W. R., 1966, Probabilistic models for predicting stream quality: Water Resources Research, v. 2, p. 593-605.
Lumsden, D. N., 1971, Facies and bed thickness distributions of limestones: Jour. Sed. Pet., v. 41, p. 593-598.
Matalas, N. C., 1967, Some distribution problems in time series simulation: Kansas Geol. Survey Computer Contr. 18, p. 37-40.
Merriam, D. F., and Cocke, N. C., eds., 1968, Computer applications in the earth sciences: Colloquium on simulation: Kansas Geol. Survey Computer Contr. 22, 58 p.
Potter, P. E., and Blakely, R. E., 1967, Generation of a synthetic vertical profile of a fluvial sandstone body: J. Soc. Petrol. Eng., v. 6, p. 243-251.
Potter, P. E., and Blakely, R. F., 1968, Random processes and lithologic transitions: Jour. Geology, v. 76, p. 154-170.
Rogers, A., 1966, A Markovian policy model of interregional migration: Regional Sci. Assoc. Papers, v. 17, p. 205-224. (Interregional migration under controlled and uncontrolled political conditions.)
Scheidegger, A. E., and Langbein, W. B., 1966, Probability concepts in geomorphology: U.S. Geol. Survey Prof. Paper 500-C, p. C1-C14. (Markov processes continuous in time and space for slope development.)
Schwarzacher, W., 1964, An application of statistical time-series analysis of a limestone-shale sequence: J. Geol., v. 72, p. 195-213.
Schwarzacher, W., 1968, Experiments with variable sedimentation rates, in Computer applications in the earth sciences: Colloquium on simulation, D. F. Merriam, ed.: Kansas Geol. Survey Computer Contr. 22, p. 19-21.
Schwarzacher, W., 1972, The semi-Markov process as a general sedimentation model, in Mathematical models in sedimentology, D. F. Merriam, ed.: New York, Plenum Press, p. 247-268.
Shreve, R. L., 1966, Statistical law of stream numbers: Jour. Geology, v. 74, p. 17-37.
Shreve, R. L., 1967, Infinite topologically random channel networks: Jour. Geology, v. 75, p. 178-186.
Shreve, R. L., 1969, Stream lengths and basin areas in topologically random channel networks: J. Geol., v. 77, p. 397-414.
Smart, J. S., 1968, Statistical properties of stream lengths: Water Resources Research, v. 4, p. 1001-1014.
Smart, J. S., 1969, Topological properties of channel networks: Geol. Soc. America Bull., v. 80, p. 1757-1774.
Vistelius, A. B., 1949, On the question of the mechanism of the formation of strata: Doklady Akademii Nauk SSSR, v. 65, p. 191-194.
Vistelius, A. B., and Feigel'son, T. S., 1965, On the theory of bed formation: Doklady Akademii Nauk SSSR, v. 164, p. 158-160.
Vistelius, A. B., and Faas, A. V., 1965, On the character of the alternation of strata in certain sedimentary rock masses: Doklady Akademii Nauk SSSR, v. 164, p. 629-632.
Vistelius, A. B., 1966, Genesis of the Mt. Belaya granodiorite, Kamchatka (an experiment in stochastic modeling): Doklady Akademii Nauk SSSR, v. 167, p. 1115-1118. (Application of a Markov chain in the study of the sequence of mineral grains in a thin section.)
Watson, R. A., 1969, Explanation and prediction in geology: Jour. Geology, v. 77, p. 488-494.
Wickman, F. E., 1966, Repose period patterns of volcanoes; V. General discussion and a tentative stochastic model: Arkiv Mineralogi Geologi, v. 4, p. 351-367.
Zeller, E. J., 1964, Cycles and psychology, in Symposium on cyclic sedimentation: Kansas Geol. Survey Bull. 169, v. 2, p. 631-636.
Chapter 5

A Priori and Experimental Approximation of Simple Ratio Correlations

Felix Chayes

5.1 RATIO CORRELATIONS

Most measurements are of quantities that are in some sense ratios, but this requires no special consideration in correlation analysis or in studies of interdependence if the denominators of the ratios being compared are constants. That elevation is measured in units initially defined as some fraction of the distance from the equator to the pole, and specific gravity as some multiple of the weight of an equivalent volume of water at a particular temperature and pressure, for instance, need not concern the petrologist seeking to characterize the relation between the elevations of a set of samples in a sill and their specific gravities. Indeed, without such scaling parameters it is difficult to see how questions of this kind could be answered, or even asked.

When the scaling parameters are themselves variables, however, as is often the case in geochemistry, the situation is very different. Relations between ratios may then be very different from those between the numerators and denominators - the "absolute" variables or "terms" - of the ratios. In particular, as was noted long ago by Pearson (1896), even though the terms are uncorrelated, there may nevertheless be correlation, and sometimes very strong correlation, between pairs of ratios formed from them.

To characterize the general relationship, we first express the ratios Y_i = X_1/X_2, Y_j = X_3/X_4 as first-order approximations in terms of the true means, variances, and covariances of the X's, where each "observation" vector, X = [X_1, X_2, X_3, X_4], is drawn simply at random from a parent
population characterized by means $\mu_m$ and variances $\sigma_m^2$ for $m = 1,\dots,4$, and covariances $\sigma_{mn}$ for $m \ne n$. Each observed value is $X_m = \mu_m + \delta_m$, so that

$$Y_i = \frac{\mu_1 + \delta_1}{\mu_2 + \delta_2} \quad\text{and}\quad Y_j = \frac{\mu_3 + \delta_3}{\mu_4 + \delta_4} \tag{5.1.1}$$

It is readily shown (see, for instance, Chayes, 1971) that to first-order approximation

$$Y_i \approx \frac{\mu_1}{\mu_2}\left(1 + \frac{\delta_1}{\mu_1} - \frac{\delta_2}{\mu_2}\right) \tag{5.1.2}$$

and, similarly,

$$Y_j \approx \frac{\mu_3}{\mu_4}\left(1 + \frac{\delta_3}{\mu_3} - \frac{\delta_4}{\mu_4}\right) \tag{5.1.3}$$

Then, taking expectations on both sides of (5.1.2) and (5.1.3),

$$E(Y_i) \approx \frac{\mu_1}{\mu_2}, \qquad E(Y_j) \approx \frac{\mu_3}{\mu_4} \tag{5.1.4}$$

so that, using (5.1.2) and the left half of (5.1.4), the deviation $\Delta_i = Y_i - E(Y_i)$ is

$$\Delta_i \approx \frac{\mu_1}{\mu_2}\left(\frac{\delta_1}{\mu_1} - \frac{\delta_2}{\mu_2}\right) \tag{5.1.5}$$

and, from (5.1.3) and the right half of (5.1.4),

$$\Delta_j \approx \frac{\mu_3}{\mu_4}\left(\frac{\delta_3}{\mu_3} - \frac{\delta_4}{\mu_4}\right) \tag{5.1.6}$$

Further, multiplying (5.1.5) by (5.1.6),

$$\Delta_i \Delta_j \approx \frac{\mu_1\mu_3}{\mu_2\mu_4}\left(\frac{\delta_1}{\mu_1} - \frac{\delta_2}{\mu_2}\right)\left(\frac{\delta_3}{\mu_3} - \frac{\delta_4}{\mu_4}\right) \tag{5.1.7}$$

To find the parent correlation, $\rho_{ij}$, between $Y_i$ and $Y_j$, we require the expectations of $\Delta_i^2$, $\Delta_j^2$, and $\Delta_i\Delta_j$, viz.,
$$\mathrm{Var}(Y_i) = E(\Delta_i^2) \approx \frac{1}{\mu_2^4}\left(\mu_2^2\sigma_1^2 + \mu_1^2\sigma_2^2 - 2\mu_1\mu_2\sigma_{12}\right) \tag{5.1.8}$$

$$\mathrm{Var}(Y_j) = E(\Delta_j^2) \approx \frac{1}{\mu_4^4}\left(\mu_4^2\sigma_3^2 + \mu_3^2\sigma_4^2 - 2\mu_3\mu_4\sigma_{34}\right) \tag{5.1.9}$$

and

$$\mathrm{Cov}(Y_i, Y_j) = E(\Delta_i\Delta_j) \approx \frac{1}{\mu_2^2\mu_4^2}\left(\mu_2\mu_4\sigma_{13} - \mu_2\mu_3\sigma_{14} - \mu_1\mu_4\sigma_{23} + \mu_1\mu_3\sigma_{24}\right) \tag{5.1.10}$$

Thus, finally,

$$\rho_{ij} = \frac{\mathrm{Cov}(Y_i, Y_j)}{\sqrt{\mathrm{Var}(Y_i)\,\mathrm{Var}(Y_j)}} \tag{5.1.11}$$

Now by definition $\sigma_{mn} = \sigma_m\sigma_n\rho_{mn}$, so that division of the numerator and denominator of (5.1.11) by $\mu_1\mu_3/(\mu_2\mu_4)$ leads at once to the commonly found form [see, for instance, equation (2.1) of Chayes, 1971, in which the signs of the correlation terms in the denominator are wrong]

$$\rho_{ij} \approx \frac{C_1C_3\rho_{13} - C_1C_4\rho_{14} - C_2C_3\rho_{23} + C_2C_4\rho_{24}}{\sqrt{\left(C_1^2 + C_2^2 - 2C_1C_2\rho_{12}\right)\left(C_3^2 + C_4^2 - 2C_3C_4\rho_{34}\right)}} \tag{5.1.12}$$

where $C_m = \sigma_m/\mu_m$ is Pearson's coefficient of variation and $\rho_{mn}$ is his coefficient of correlation.

From (5.1.12) it is evident, as is intuitively obvious, that if all terms of a pair of ratios are different and uncorrelated, the ratios themselves will also be uncorrelated. If, however, two ratios
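Equation (5.1.12) is easy to evaluate directly. The sketch below is a modern Python illustration, not part of the original course programs; the function name `ratio_corr` and its argument layout are our own. It confirms the remark that follows: with all four terms distinct and uncorrelated, the ratio correlation vanishes.

```python
from math import sqrt

def ratio_corr(C, rho):
    """First-order approximation (5.1.12) of the correlation between
    Y_i = X1/X2 and Y_j = X3/X4.

    C   -- coefficients of variation, C[m] = sigma_m / mu_m for m = 1..4
    rho -- term correlations, rho[(m, n)] for 1 <= m < n <= 4
    """
    num = (C[1] * C[3] * rho[(1, 3)] - C[1] * C[4] * rho[(1, 4)]
           - C[2] * C[3] * rho[(2, 3)] + C[2] * C[4] * rho[(2, 4)])
    den_i = C[1] ** 2 + C[2] ** 2 - 2 * C[1] * C[2] * rho[(1, 2)]
    den_j = C[3] ** 2 + C[4] ** 2 - 2 * C[3] * C[4] * rho[(3, 4)]
    return num / sqrt(den_i * den_j)

# All four terms distinct and uncorrelated: the ratios are uncorrelated too.
C = {1: 0.05, 2: 0.08, 3: 0.06, 4: 0.07}
rho0 = {p: 0.0 for p in [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]}
print(ratio_corr(C, rho0))  # 0.0
```

Setting $X_4 \equiv X_2$ (so $C_4 = C_2$ and $\rho_{24} = 1$) reduces (5.1.12) to the common-denominator case treated next.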
have a common denominator they will be correlated even though their numerators are uncorrelated with each other and with the denominator. This at first sight paradoxical result - called "spurious" correlation by Pearson - can be reached by introducing into (5.1.11) or (5.1.12) the constraints that $\mu_2 = \mu_4$, $\sigma_2 = \sigma_4$, and $\rho_{mn} = 0$ for all $m \ne n$. But working it out ab initio is just as simple and provides useful drill for the novice.

If $Y_i = X_1/X_2$ and $Y_k = X_3/X_2$, then of course $\Delta_i$ is exactly as in (5.1.5) and

$$\Delta_k \approx \frac{\mu_3}{\mu_2}\left(\frac{\delta_3}{\mu_3} - \frac{\delta_2}{\mu_2}\right) \tag{5.1.13}$$

so that

$$\Delta_i \Delta_k \approx \frac{\mu_1\mu_3}{\mu_2^2}\left(\frac{\delta_1}{\mu_1} - \frac{\delta_2}{\mu_2}\right)\left(\frac{\delta_3}{\mu_3} - \frac{\delta_2}{\mu_2}\right) \tag{5.1.14}$$

If the X's are uncorrelated, the expectations of all cross-product terms in the $\delta$'s vanish, and

$$\mathrm{Var}(Y_i) \approx \frac{1}{\mu_2^4}\left(\mu_2^2\sigma_1^2 + \mu_1^2\sigma_2^2\right) \tag{5.1.15}$$

$$\mathrm{Var}(Y_k) \approx \frac{1}{\mu_2^4}\left(\mu_2^2\sigma_3^2 + \mu_3^2\sigma_2^2\right) \tag{5.1.16}$$

$$\mathrm{Cov}(Y_i, Y_k) = E(\Delta_i\Delta_k) \approx \frac{\mu_1\mu_3\sigma_2^2}{\mu_2^4} \tag{5.1.17}$$

Thus,

$$\rho_{ik} = \frac{\mathrm{Cov}(Y_i, Y_k)}{\sqrt{\mathrm{Var}(Y_i)\,\mathrm{Var}(Y_k)}} \approx \frac{\mu_1\mu_3\sigma_2^2}{\sqrt{\left(\mu_2^2\sigma_1^2 + \mu_1^2\sigma_2^2\right)\left(\mu_2^2\sigma_3^2 + \mu_3^2\sigma_2^2\right)}} \tag{5.1.18}$$
which approximates the correlation between two ratios with common denominator as a function of the means and variances of the numerators and denominator, a result again easily restated in terms of the coefficients of correlation and variation.* Thus, ratios with common denominator will tend to be positively correlated if their numerators are uncorrelated with each other and with their denominator. Indeed, the correlation generated in this fashion may be far from trivial. If, for instance, the coefficients of variation of the terms are equal, the correlation between the ratios is 0.5, and if the coefficient of variation of the denominator is larger than the (equal) coefficients of variation of the numerators, the correlation between the ratios will be greater than 0.5; if it is twice as large, something not at all unlikely in geochemistry, the correlation of the ratios will be 0.8.

The other simple ratio correlations - those between a ratio and its numerator or denominator, between ratios with common numerator, and between ratios the numerator of one of which is the denominator of the other - can of course be approximated in analogous fashion. All save the last are common in geochemical work (for a review, see Chayes, 1949) and the interested reader will find it useful to carry through the computations for the case of zero covariance between the X's, comparing his results with those shown in Table 2.1 of Chayes (1971).

The approximations used here and in all the work so far cited are of first order only, and can be expected to yield reliable results only if terms of second and higher order in $(\sigma^2/\mu^2)$ are small enough to ignore. In much geochemical work this is not so; indeed, in this field we often seem to use a particular variable as denominator precisely because its relative variance is large, so that higher powers of $(\sigma^2/\mu^2)$ will often not be negligible.
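The numerical claims in the preceding paragraph can be verified from the coefficient-of-variation form of (5.1.18) given in the footnote. The sketch below is a modern Python illustration of our own, not part of the course programs.

```python
from math import sqrt, isclose

def common_denominator_corr(c1, c2, c3):
    # rho_ik ~= C2^2 / sqrt((C1^2 + C2^2)(C3^2 + C2^2)), the
    # coefficient-of-variation form of (5.1.18) for Y_i = X1/X2 and
    # Y_k = X3/X2 with mutually uncorrelated terms.
    return c2 ** 2 / sqrt((c1 ** 2 + c2 ** 2) * (c3 ** 2 + c2 ** 2))

# Equal coefficients of variation: rho = 0.5, whatever their common value.
print(common_denominator_corr(0.1, 0.1, 0.1))  # 0.5
# Denominator twice as variable as the numerators: rho = 0.8.
print(common_denominator_corr(0.1, 0.2, 0.1))  # 0.8
```

Note that the result depends only on the ratios of the coefficients of variation, not on their absolute size.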
*Division of the numerator and denominator of the right side of (5.1.18) by $\mu_1\mu_2^2\mu_3$ leads at once to

$$\rho_{ik} \approx \frac{C_2^2}{\sqrt{\left(C_1^2 + C_2^2\right)\left(C_3^2 + C_2^2\right)}}$$

Although the work up to this point
shows pretty clearly that correlation generated by the process of ratio formation may be far too strong to ignore, in many practical cases, alas, it will not lead to useful approximations of that correlation. Higher-order approximations for means, variances, and covariances are available (Tukey, undated), and an analytical formulation using them would have the usual advantage of providing, in principle at least, a general solution, something of considerable aesthetic and scientific appeal. But a general solution is not indispensable if one can obtain a satisfactory solution for any specific problem that may arise. That is what one ought to be able to accomplish by simulation experimentation, and the remainder of this chapter describes the structure and use of a computer program, RTCRSM2 (RaTio CoRrelation SiMulation, version 2), which is an attempt to exploit this possibility.

5.2 RTCRSM2

In using RTCRSM2, the investigator assigns:

1. Appropriate parent means and variances to the four pseudorandom variables A, B, C, D to be used as numerators and denominators of the ratios
2. The number of items (sample size) per simulation, and the number of simulations per experiment

He must also initialize the random number generator, either by providing it with a starting residue or instructing it to use a stored one.

Given this information, the program generates a set of four random numbers from a parent population uniformly distributed in the range (0,1), transforms these to normal deviates with zero mean and unit variance, and adjusts each with its assigned mean and variance to produce the current set of "observed" values of A, B, C, and D. From these all possible simple ratios are formed, and the elements of this vector of terms and ratios, together with their squares and cross products, are then stored in cumulators.
The process is repeated until the requested number of items - i.e., sets of "observed" values of A, B, C, D - has been supplied and processed.
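The generation loop just described can be sketched compactly. The following is a modern Python analogue of our own (the original program is in Fortran; the names below are illustrative, not those of RTCRSM2): uniform deviates are turned into normal deviates by the Box-Muller ("direct") transformation, scaled to assigned means and standard deviations, and the sample correlation between two common-denominator ratios is accumulated.

```python
import math
import random

def normal_pair(rng):
    # Box-Muller ("direct") transformation of two uniform (0,1) deviates
    # into two independent standard normal deviates.
    u1 = 1.0 - rng.random()  # keep u1 strictly positive for the log
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

def common_denominator_sim(mean, sd, n_items, seed=1):
    """Sample correlation between A/D and B/D, where A, B, D are
    independent normals sharing the given mean and standard deviation."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_items):
        z1, z2 = normal_pair(rng)
        z3, _ = normal_pair(rng)
        a, b, d = mean + sd * z1, mean + sd * z2, mean + sd * z3
        xs.append(a / d)
        ys.append(b / d)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Terms homogeneous in C = 0.05: the spurious correlation should sit near 0.5.
print(common_denominator_sim(10.0, 0.5, 20000))
```

With terms this well behaved the simulated value agrees closely with the first-order approximation; the interesting cases, as the text notes, are those with larger coefficients of variation.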
The covariance matrix is then computed from the cumulated sums, sums of squares, and sums of cross products; its diagonal elements are converted to standard deviations, its off-diagonal elements to correlations, and the requested results printed out. Since the objective is to approximate as closely as possible the value of an unknown parent correlation, the number of items per simulation should be as large as the computing budget will permit. But the cumulating procedure used in the program, selected because it economizes on core requirement and places no upper limit on the number of items per sample, leads to large rounding error. Double precision is essential for simulations containing more than a few hundred items; on the Univac 1108 it seems to control rounding error satisfactorily even for very large simulations.

In this kind of work it is easy to bury oneself in numbers. RTCRSM2 is designed as a specific problem solver, and its printout may be restricted to those particular ratio correlations of immediate interest. In fact, unless the user specifies the type(s) of correlation(s) to be printed, the output will consist only of an error message reminding him that he should have done so. Loading instructions are provided in lines 32 to 76 of the accompanying program listing.

5.3 UNNO AND RANEX

The random number generator referred to in RTCRSM2 as UNNO is designed to generate uniformly distributed numbers in the range (0,1); these are normalized in the main program. UNNO, coded in Fortran by L. Finger, appears to work admirably on the Univac 1108 used for the calculations reported below. Whether it performs satisfactorily on any specific computer can be determined experimentally, and such experimentation should certainly precede routine operation of RTCRSM2, for unless the random number generator it uses is demonstrably sound, the results yielded by RTCRSM2 are uninterpretable.
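The rounding hazard of the cumulating procedure mentioned above is easy to demonstrate. In the sketch below (modern Python, of our own devising; Python floats are already double precision, so an inflated mean is used to force the cancellation) a variance computed from cumulated sums and sums of squares is compared with a conventional two-pass computation.

```python
def var_one_pass(xs):
    # Variance from cumulated sum and sum of squares, as in the program's
    # cumulators; vulnerable to cancellation when the mean is large
    # relative to the spread.
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return (ss - s * s / n) / (n - 1)

def var_two_pass(xs):
    # Conventional two-pass variance: mean first, then squared deviations.
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

data = [1e9 + x for x in (4.0, 7.0, 13.0, 16.0)]
print(var_two_pass(data))   # 30.0
print(var_one_pass(data))   # typically corrupted by cancellation
```

On small, well-scaled data the two agree exactly; the shifted data above shows why the one-pass cumulators demand double precision (a one-pass update such as Welford's is a modern alternative that avoids the problem).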
Program RANEX is designed to provide information on this matter; the rather extensive battery of tests performed by it is described in lines 2 to 9, and loading instructions are given in lines 19 to 34 of the accompanying
listing. If UNNO performs unsatisfactorily or the user prefers another generator, the subroutine calls in RTCRSM2 will require modification; these are contained in lines 231, 249, and 250. (With analogous modification of cards 98, 110, and 111, incidentally, RANEX may be used to test the output of any subroutine designed to generate uniformly distributed pseudorandom numbers in the range (0,1).)

5.4 COMMENTS ON USAGE

Unless instructions to the contrary are provided at operation time, the variables A, B, C, D of RTCRSM2 are drawn from theoretically uncorrelated parents; in reasonably large samples, the correlations between them should be negligibly small. The actual sample correlations can be printed out, however, so it is always possible to see how nearly this goal has been attained in a particular simulation, and it is probably wise to do so. But ratio correlations can be approximated by simulation whatever the correlations between their terms, and RTCRSM2 provides limited facility for relaxing the restriction that the terms are uncorrelated. Specifically, the user may assign correlation(s) of arbitrary size and sign between any pair or any two mutually exclusive pairs of the variables A, B, C, and D. (When this option is exercised, the sample correlations between variables A, B, C, and D should always be printed out.)

Preliminary experimentation with RTCRSM2 confirms an earlier suggestion (Chayes, 1971) that for ratios formed from uncorrelated terms homogeneous in C, the linear approximations of ρ are very good if C < 0.1 and still fairly good if C < 0.15. For 0.15 < C < 0.35 the simulated ratio correlations differ widely both from the linear approximations and from zero, so that experimental determination or higher-order approximation of null values against which to test observed ratio correlations will be essential in this range.
Simulation experiments suggest that with further increase of C, Pearson's "spurious" correlation - that between ratios with common denominator - averages about 0.5 with very large variance, while the other simple ratio correlations rapidly approach zero with small variance.
These rather unexpected results will be described more fully when the work is completed and are mentioned here only for the sake of the perspective they provide. Compositional variables are necessarily nonnegative, and in the absence of pronounced skew their coefficients of variation must be fairly small. For uniformly distributed variables - in which there is no central tendency - it is easily shown that $C = 1/\sqrt{3}$, or 0.577; present indications are that it would be wise to avoid correlations between ratios of such variables with common denominators, but safe enough to test observed values of other simple ratio correlations formed from them against a null value of zero. In binomial or multinomial variables, on the other hand, $C = \sqrt{(1 - p)/np}$; here the first-order approximations of ratio correlations should be adequate if 0.05 < p < 0.95 and n > 400, and if p > 0.1 this should be true even for n as small as 100. But in work involving sampling variation, as opposed to mere counting variance, of major constituents, the relevant values of C will nearly always be considerably larger than for binomially distributed variables and, since there is usually a fairly strong central tendency, considerably smaller than for rectangularly distributed ones. It seems likely, then, that in dealing with ratios formed of major constituents, whether expressed normatively, modally, or as oxides, there may be frequent need for experimental determination of the appropriate null value of ρ along the lines suggested here, for it will often happen that the coefficients of variation of such variables are large enough to make the first-order approximation of ρ unsatisfactory, but small enough so that the assumption that ρ = 0 is unrealistic.

REFERENCES

Chayes, F., 1949, On correlation in petrography: J. Geol., v. 57, p. 239-254.

Chayes, F., 1971, Ratio correlation: Chicago, University of Chicago Press.
Pearson, K., 1896-1897, On a form of spurious correlation which may arise when indices are used in the measurement of organs: Proc. Roy. Soc. (London), v. 60, p. 489-502.

Tukey, J. W., undated, The propagation of errors, fluctuations and tolerances: Unpublished Technical Reports No. 10, 11, 12, Princeton University.
APPENDIX 1

C PROGRAM RTCRSM2                                                       RTCR  10
C FINDS CORRELATIONS BETWEEN ALL RATIOS FORMED FROM 4 TERMS (A,B,C,D)   RTCR  20
C WITH ASSIGNED MEANS, STANDARD DEVIATIONS AND COMMON ELEMENTS, BY MONTERTCR  30
C CARLO SIMULATION. FOUR UNIFORMLY DISTRIBUTED PSEUDORANDOM NUMBERS     RTCR  40
C ARE GENERATED IN THE RANGE (0,1), NORMALIZED BY THE 'DIRECT'          RTCR  50
C PROCEDURE OF ZELEN AND SEVERO, ADJUSTED FOR ASSIGNED MEANS, STANDARD  RTCR  60
C DEVIATIONS AND COMMON ELEMENTS, AND STORED IN THE FIRST FOUR ELEMENTS RTCR  70
C OF VECTOR Y (A=Y(1), B=Y(2), C=Y(3), D=Y(4)). THE REMAINING 12 ELEMENTRTCR  80
C OF Y ARE LOADED WITH BINARY RATIOS OF THE FIRST 4, IN THE ORDER SHOWN RTCR  90
C BELOW IN DATA BLOCK NAM. AS EACH Y VECTOR IS COMPLETED, THE SUMS, SUMSRTCR 100
C OF SQUARES AND SUMS OF CROSS-PRODUCTS OF ITS ELEMENTS ARE CUMULATED.  RTCR 110
C THE PROCEDURE IS ITERATED NR TIMES TO GENERATE THE SAMPLE ARRAY. ON   RTCR 120
C COMPLETION OF THE NRTH CYCLE THE COVARIANCE MATRIX IS GENERATED       RTCR 130
C FROM THE CUMULATED SUMS OF SQUARES AND CROSS-PRODUCTS, AND THE DESIREDRTCR 140
C CORRELATIONS ARE EXTRACTED AND PRINTED.                               RTCR 150
C                                                                       RTCR 160
C IF NO CORRELATIONS ARE SPECIFIED ON INPUT, VARIABLES A,B,C,D WILL BE  RTCR 170
C UNCORRELATED, TO LIMITS OF EXPERIMENTAL ERROR. IF USER SPECIFIES      RTCR 180
C DESIRED CORRELATIONS (POSITIVE OR NEGATIVE) BETWEEN ANY PAIR OR ANY 2 RTCR 190
C MUTUALLY EXCLUSIVE PAIRS OF VARIABLES A,B,C,D, THE REQUIRED COMMON EL-RTCR 200
C EMENT STANDARD DEVIATIONS ARE COMPUTED FROM AN ALGORITHM BASED ON     RTCR 210
C EQ. (3.15), P. 26, OF RATIO CORRELATION (CHAYES, 1971). EXECUTION TERMRTCR 220
C INATES IN ERROR IF THE SAME VARIABLE APPEARS IN BOTH CORRELATIONS.    RTCR 230
C                                                                       RTCR 240
C RANDOM NUMBERS ARE GENERATED BY SUBROUTINE UNNO(I,F), WHERE I IS THE  RTCR 250
C INTEGER SEED AND F IS THE FLOATED NUMBER IN RANGE (0,1). THE FORTRAN  RTCR 260
C GENERATOR USED WITH THIS VERSION WAS CODED BY L. FINGER.              RTCR 270
C                                                                       RTCR 280
C PROGRAM WRITTEN BY F. CHAYES FOR NSF INSTITUTE ON GEOSTATISTICS,      RTCR 290
C CHICAGO CIRCLE, 1972                                                  RTCR 300
C ******************************************************************* RTCR 320
C *                                                                 * RTCR 330
C *            CARD INPUT TO PROGRAM RTCRSM2                        * RTCR 340
C *                                                                 * RTCR 350
C *  COMMAND CARD (I5,I4,11I1,I15)                                  * RTCR 360
C *  COL.   VARIABLE   DEFINITION OR FUNCTION                       * RTCR 370
C *  1-5    NR         NUMBER OF ITEMS PER SAMPLE                   * RTCR 380
C *  6-9    NMSP       NUMBER OF SAMPLES TO BE DRAWN                * RTCR 390
C *  10     KMLMT      0-NOP, 1 - READ AND USE COMMON ELEMENTS      * RTCR 400
C *  (N.B. - VECTOR RQST RESTRICTS PRINT TO MATERIAL REQUESTED)     * RTCR 410
C *  11     RQST(1)    0-NOP, 1 - ALL, IN ONE BIG MATRIX            * RTCR 420
C *  12       . (2)    0-NOP, 1 - BETWEEN ALL TERMS (A,B,C,D)       * RTCR 430
C *  13       . (3)    0-NOP, 1 - OF TYPE X1/X2 WITH X1             * RTCR 440
C *  14       . (4)    0-NOP, 1 - OF TYPE X1/X2 WITH X2             * RTCR 450
C *  15       . (5)    0-NOP, 1 - OF TYPE X1/X2 WITH X1/X3          * RTCR 460
C *  16       . (6)    0-NOP, 1 - OF TYPE X1/X2 WITH X3/X2          * RTCR 470
C *  17       . (7)    0-NOP, 1 - OF TYPE X1/X2 WITH X2/X3          * RTCR 480
C *  18       . (8)    0-NOP, 1 - OF TYPE X1/X2 WITH X2/X1          * RTCR 490
C *  19       . (9)    0-NOP, 1 - OF TYPE X1/X2 WITH X3/X4          * RTCR 500
C *  20       . (10)   UNASSIGNED                                   * RTCR 510
C *  21-35  QIN        STARTING SEED OF RANDOM NUMBER GENERATOR     * RTCR 520
C *         (QIN  K:  USE K                                         * RTCR 530
C *          QIN  0:  USE 1ST RESIDUE FROM UNNO                     * RTCR 540
C *          QIN -1:  USE EXISTING RESIDUE)                         * RTCR 550
C *                                                                 * RTCR 560
C *  PARAMETER CARD (12F6.3)                                        * RTCR 570
C *  1-6    GVAV(1)    PARENT MEAN OF A                             * RTCR 580
C *  7-12   GVSD(1)    PARENT STANDARD DEVIATION OF A               * RTCR 590
C *    .                                                            * RTCR 600
C *    .                                                            * RTCR 610
C *  43-48  GVSD(4)    PARENT STANDARD DEVIATION OF D               * RTCR 620
C *                                                                 * RTCR 630
C *  COMMON ELEMENT CARD (USE ONLY IF KMLMT = 1) (12F6.3)           * RTCR 640
C *  1-6    RHO(1)     DESIRED CORR. BTW. VARIABLES A AND B         * RTCR 650
C *  7-12   RHO(2)     DESIRED CORR. BTW. VARIABLES A AND C         * RTCR 660
  • 128. C C C C C C C C C C C C C C * 13-18 RHI: (3) DES YREI) CORP. BTW. VA.R IABlF:S A. AND f} * RTe!:! * 19-24 RHO(4) DESIRED CORR. BTW. VARIABLES e AND C * RTCR * 2S-:0 RHO(S) DESIRED CfJRR. RTW. VARIAfILES BAND 0 * RTCP * 31-3~ RHO(61 DESIRED C8RR. 8TW. VARIARLES C AND D * RTCR * N.B. - ANY ELEMENT OF RHn, (]R ANY ON£' OF THE PAIRS * RTCR '" {l,61, (l,S}' (3,4) MAY qE ~SSIGNED NON-ZFRO VALUES. * PTCR * PROBL£'MS IN t,.!HICH OTHER PAIRS OR MOR£' THAN TWO ELEMENTS. RTCR. '" OF RHO ARE ASSIGNED NON-ZERO VALUES WILL BE IGNORED. * RTCR * 11: RTCR *********.***********************"'********************************* RTCR DIMENSJ(1N AV(3ZI, RHO((') , CMSD(6), r,VAV(4), GVSO(4), NAM(16), *ORBF136I, Pt41, SIGt32,321, SlJMn21, Y!321, RQSTlIOl,NUPG(2I, *ORSD(41 OOUPLE PRECISION AV,SlJM,SIG,RN,RNL,Y REAL N~M INTEGER OllT ,Q,QIN,RQ<;T DATil IN,Ot)T,NUPG,/'5,6 ,4H( IHI, lH) I DATA "lAM/' A',' B ',' C ',' [) ','A/O','fl/D','C/D','D/C','A/C', *'B/C','C/R' ,'O/B' ,'A/R','B/A','C/A','O/.A'1 INPUT FOP"IATS RTCR PTeR R. TCR RTCR Q TCR RTeR R TCP PTCR Q TCR R TCR RTCR PTCP h70 680 6<10 700 710 720 730 740 750 760 770 780 790 ROO 810 820 830 fl40 850 860 870 880 1 FOR Mit T (t '), 14, 11 I 1 , I 1 '5 ) ~ FORMAT (12F6.31 5 FORMAT (6(6X,F6.111 RTCR 8911 RTCR 900 R TCR 910 OUTPUT fORMUS RTCP. 910 ? FfJRMAT ('1',2~X,'A. - ASSIGNEr) PARAMETERS FOR SIMULATIONS YIFLDIN~RTCR 930 *' CORREUTyrlNS SHOWN IN fl, qFLOJ'II'OPIRFNT VALUES OF HRIAS OF RATIRTCR 940 *OS'/16X,4(8Y,A21/10X,'MEAN',6X,4FIO.4/10X,'STf}.-f)EV.',Fll.4,3FlO.4RTCR 9S0 *,IITIO,' ITE~ATIONS IN SIMULATION. 
ENTRY SEED FnR RANDOM NUM~ER GRTCP 960 *ENERIITCR IS ',Il5,' TN SA'"1PLF NU"IflER',I3,'.') PTCR 970 4 FGRMAT I' NO ASSIGNEr) COMMON ELEMENTS AMONG TERMS OF RATIOS.') RTCR 980 6 FORMAT !l15X,' INITIAI_ PARA"'lETERS LISTEn ABOVE WILL BE "'ODIFIED BY PTCR 990 *COMMON ELEMENT ADJUSTMENTS TO INTRODUCE fOLLOWING CORRELATIONS-'f RTCRI000 *51X,6(A2,A?,7X)/', REQUESTED CCRR!=LATIO"'S' ,27X,6(F6.4,5X)I' REOUIRFPTCRIOIO V1 (J) H ~ ~ ~ H o 8 ~ SH ~ (J) I-' I-' --J
  • 129. *1) STAIiOARO DEVIATIONS OF COMMON ftEMENTS',cdF8. 1 +,3X)) RTCPI020 p FOR~AT t/35X'COMPARISON OF PARENT AND SAMPLE VALUES OF RATIO TERRTCRI030 *""S .A, p, r: AND D.'/55X,""EAfJ',l2X,'STD.-IlEV.'/40X,'TERM PARENRTCRI040 *T SAMPU',4X,1Pt~RENT SAMPLr'/(40)(,A2,8X,F6.2,LX,F1~3,3X,F6.2, RTCRI050 *IX,F8.4/11 RTCRI060 10 FORMAT (/'O""ATRIX WITH CORRELATIONS OFF, STANOARD DEVIATIONS ON, ORTCRI070 *IAGONAL -'/15X,16IA3,4X») ATCRl0eO 12 FORMAT 19X,A3,16F7.41 PTCRI09" 14 FORMAT (II' CORRELATTOfJS RETWf=EN TERMS',4X,6(2X,A3,',',A3)/33X, RTCRllOO *6IF6.4,3XII RTCRIIIO 16 FORMAT III' CORRElATIOfJS RETWFEN RATIOS AND THEIR NUMERATnRS -'I RTCRl120 *12(?X,A3,',' ,A3,lX)I?.X,F6.4,11(4X,F6.4») RTr"ll~O 18 FURMAT (II' CORRELATIONS RETWEEN RATIOS AND THEIR DENOMINATORS -'/RTCRI140 *12(2X,A3,',',A3,lXI/2X,F6.4,llt4X.F6.4») j? T CPl150 20 FORMAT (II' CORRELATIONS BETWEEN RATIOS WITH CO~MON NUMERATORS -'/RTCRI160 * I:? I ?X, A 3, ' , ' , A 3,1 X) !2 X, F 6.4,t l ( 4X ,F 6. 4)) RT CR 11 70 72 FORMAT (II' CORRELATIONS BET~EEN RATIOS WITH COMMON DENOMINATORS -RTeR1180 * , 112 ( 2)( ,A 3, , , ' , A 3,1 Xl 12 X, r:. 6. 1 + ,11 (4X ,F6. 41 ) R T CR 1190 24 FOqMAT III' CORRFLATIONS BETWEEN ?ATI~S IN WHICH ONE TERM,',A3,'ORRTCR1200 *',A2,', IS THE NUMERATOR OF ONE AND THE DENOMINATOR OF THE OTHER -RT(RI210 * '/1?(2X,t.3,',·,A3,lXl/2X,F6.4,l1l4X,F6.4ll RTCRI220 26 FORMAT til' C(lPRE'LATIONS flETWEFN R,.,nos WHICH ARF RECIPROCALS - ',RTCRI230 *6(2X,A],',' ,A3tlXI/'55X,6(F6~4,4Xl I RTCRl240 28 FORMAT III' I.ISF ',115,' AS SFEO TO RANOOM NUfvlBER GFNERATOR IN NEXTRTCR1250 * EXDFRIMFNT'/lHl) RTCR1260 30 Ff1RMAT (1I2'5X,'~. - SIMULATION RESULTS'! RTCR1210 32 FORMATI'I',20X,'ASK "IF NO DUr:STIOIJS ANn T WILL TELL YOU NO LIFS~ VRTCR12RO *HTf'P PQST IS EMPTY. TRY AfAIN 'IlHll RTCR1290 34 FORM,.,T III' CORRELATIONS gETWEr:.N RATIOS ~ITH NO CC~MON TERMS -'I RTCR1300 * 12 I 2X ,A 3, " , , A 3,1 X II? X, F 6.4,1 11 4X ,F 6. 4) 1 R TCR 1310 36 FORMAT (znX,'FAULTY nATA CARD. 
PROGRAM LOOKS FOR NEXT PROBLEM.'I' RTCR13?O *1') RTC~1330 3A H1PMAT ('lKMUH CARn SPFCIFIES "lORE THAN TWD OR A FAIJLTY PAIR OF RTCRI340 *Cf1RRElATIONS. PEAO IN NEXT PRORLFM OR QUIT." RTCCl13'50 RTCR1160 I-' I-' 00 "rj tTl r H >< @ 2< tTl 'Jl
  • 130. C READ Cn~MA~D CARD C 70 READ IIN,l,FRP=71,END=300) NR,NMsr,KMLMT,RQSr,QIN IF (NM~D.tQ.O) NMSP = 1 C CHECK PRINT REQUEST. IF IT IS EMPTY, SKIP TO NEXT PROBLEM. r DO 7? J = 1,1 0 IF IROSTIJI.EQ.1) 72 CflNTINUE WR IT E (nUT,32) GO TO 70 GO TO 7S C PRINT FRROR MESSAGE, RETURN FOR NEW COMMAND CARD 73 WRITE (QUT,36) GfJ Tn 70 C INITIALIZE RANOOM NUMBFR GEN"ORHOR IF THIS IS FIRST PASS OF 75 IF lOIN) 100,80,90 80 CALL UN!JOIQ,RA) GO TO 100 90 Q = QIN C READ PARA'4fTFR CARD, THEN COM/.10~l ELE~1E'NT CARD I f RFClUIREf). 100 READ {IN,3,ERR=731 (GVAV{I),GVSD!Il,t=l,'tl C REREAD GVsn INTO rAsa so IT CAN RE RECLAIMED AS NEEDED. REA 0 (0,5) OR S 0 NS = 0 IF (KMlMT.fQ.O) GO TO 101 RTrR1370 RTCR1380 RTCR13QO RTCR1400 RTCR1410 PTCRl420 RTCR1430 R rCR 1440 RTCR1450 RTCR 1460 RTCR1470 RTCP 1480 R TeR 1490 RTCR1500 EXECUTTON.RTCR1510 RTCR1520 RTCR1530 R TCR154Q RTCR1550 RTCR1560 RTCR 1570 RTCR1580 RTCP,1590 RTCP1600 C READ REQUESTEry CORRELATIONS flETWEEN C ELEMENT STANDARD DEVIATIONS A,fI,C,O, COMPUTE REQUIRED CO~MON- RTCR1610 RTCR1620 RTCR1630 READ (IN,3,EPP=73) RHO C OETERMINF WHETHER CORRELATION RFQUEST IS VALID. KR = () 00 1002 K= 1,6 IF (RHO(KII 1001,1002.1001 1001 KR = KR + 1 1002 CONTINUE I F (K R • EQ .1) GO TO 1007 R TCR 1640 RTCR1650 RTCR1660 RTCP1670 RTCR1680 RTCPl690 RTCR1700 RTC?l71 0 V1 fJ} 1-1 ~ t""' ttl ~ 1-1 o n o ~ S 1-1 o Z fJ} ..... ..... '"
  • 131. IF (KR.EQ.2) GO TO 1004 RTf-Rl720 C CORRELATION RfQUEST FAULTY, SKIP TO NEXT PROBLEM RTCR1730 1003 WR ITE (OllT, 38) RTCR1740 GO TO 70 RTCR1750 C DETERMINE WHETHER A VALIn PAtR OF CORRELATIONS HAS BEEN REQUESTED RTCR1760 1004 KR = 0 RTCR1770 no 1006 K = 1,6 RTCR1780 IF (RHOIK)) 1005.,1006,1005 RTCR1790 1005 KR : K~ + K RTCR1800 1006 CONTINUE RTCRIBIO IF (KR.NE.7) GO TO 1003 RTCR1820 C PROCESS VAUD CORRElATIOIII REQUEST RTCR1830 1007 KC = n RTCR1840 DO 1010 J : 1,3 RTCRI850 JK = J • 1 RTCR1860 00 1010 K = JK,4 RTCR1870 KC = KC + 1 RTCR1880 IF (RHO(KC') 1008,1010,1008 RTCRIR90 1008 PF : (GVSO{K)/GV~O(J)I**2 RTCP1900 RSQ = RHn(KC)**2 RTCR1910 fMPT = (RSQ*11.+PF'+SQRTIRSQ**2*(1.-PF1**2+4.*RSQ*PF»)/(2.*ll. RTCP1920 *-RSQI' RTCR1930 CMSD(KCI = GVSOIJI*~QRTICMPTI RTCR1940 to 10 CONTI NUE R TCR 1950 101 NS = N~ + 1 RTCR1960 IF (NS.GToNM<:;P) GO TO 70 QTCR1970 C TRANSFER INITIAL STANDARD DEVIATION~ FROM ORSD TO GVSD ~TCP19BO DO 1015 r = 1,4 RTCR1990 1015 GVSrlll = O~Srl(II RTCR2000 c ~ECORO INPUT DATA RTCR2010 WRIT E IOU T , 2 I ( NA 1.1 t [) , 1= 1 , I~), I G V 'I V I J) ,J = 1 , 4) , (G V S [)( 1<) , K: 1 ,4) , NR ,Q ,R TC P 2020 *NS . RTCR2030 LINF = 9 RTCR2040 IF(KML"1Tl 10?,102,l03 RTCR2050 102 WRITE (OUT,4) RTCR2060 ..... tv o ~ t"' H >< @ ~ trl Vl
  • 132. LINE = L!"JE + t GU Tn 105 C lGAO cn~MON-TER~ NAMF-PAIR~ TN PRAF 103 I = 0 Oil 104 J -= 1,3 JK = J + 1 DO 11')4 K .JK,4 I = I + 1 POFlF{Il NA~!J) I = I +- 1 104 PRBF(I) NAM(K) WPITF (OUT,6) (PP.RF(I)'I=1,121,{RHOtJI,J=l,6),tCMSOtK),K=1,6) LI ~1 E = l P~E + 5 C CLEAR CU~ULATOQS lO~ Dn 106 J = 1,16 SlI"'1(JI = 0.0 DO 1 0 6 K J, 1 6 106 SIG(K,Jl -= 0.0 c ( GfNF:RAH APRhY, CIJM1JLATINr; SUMS OF VARIAI3LFS, SQUAQES AND CR(1SS- C PRnOUCTS STEPWISE. 00 145 N = 1, NR (" r,FNERATF ONE SET OF VALUES FOR A,B,C,O, AN!) STORF IN Y(I),I-=1,4. no 110 I R = 1,4 110 (/ILL flNNO(Q,R(IR) c C NORMALIZE PI!), TRANSFeR"", STORF TRANSFORMED VAlIJE IN y(I I, I 1,4. c DO 115 T -= 1,"3,2 R(T) = SQo,T(-?*AIOG(R(TlI) J = I + 1 R(J) 6.2831851*RtJ) Y (I) GV AIJ (J) + GVSf) ([ I *R ([) *(f)S IF/( J II 115 Y(J) -= r.VAV(J) + GVsotJ)*R(I)*SIN(R(J') Q~CR2070 R TCR20AO RTCP.2090 RTCR2100 R TCR2110 RTCR2120 RTCR2130 RTCR2140 RTCR21'50 RTCR2160 IHCR2170 RTCR211~0 RTCR2190 RTCR2200 RTCR2210 RTCR2220 RTCR2230 IHCR2240 RTCR2250 PTCR2260 RTCP2HO R TCR 2280 RTCR2290 RTCP2300 RTCR2310 RTCR2320 QTt:R2330 RTCRH40 R TC R2350 RTCR2360 RTCP2370 P TCR2380 R TCP 2390 RTCR?400 VI en H ~ fu ~ H o 8 § S H ~ en ....N ....
  • 133. C ADJUST A,P.C,D FOp COMMON ELEMENTS, IF REQUIRED IF(KMLMT.EQ.O) GO TO 125 KC = 0 DO 120 J = 1,3 JK= J ... 1 DO 120 K = JK,4 KC = KC ... 1 I F ( C MS 0 ( K C » 1 2 0 ,120,11 7 117 CALL UNNO(Q,PA) CAL L UNNP (Q, I{P I lA = SORT (-2.*ALOG(!~AlI lR = COS(6.2831853*RB) CMFL = CMSD(KC,*ZA*ZB Y « J) = Y ( J) ... O~F L C SIGN OF INCREMENT TO V(K) DEPENDS nN SIGN OF RHO(KC) I FIR He' (K C) J I 18, 118, 119 C 118 V (K) = YI K) - C f'>1H G'J T':1 120 lIe) V(KI = Y(KI ... CMEL 120 r:flNTI NUF C STr:JPE "JUMf'RATnRS OF RATIOS IN Y(NJ,N=5,16 C 125 I = 4 1)0 130 J = 1,3 on 130 K = 1,4 I = I ... 1 130 VO) = V(KI C OIVIDE NUMEPATnps RY APPROPRIATE nENn~INATnps K = 17 r: f)O litO J = ],4 Dn litCI J = 1, '3 K = K - 1 140 Y{K) = Y(I<I/Y(Tl RTCf/2410 I{ TCP2420 P.TCR2430 RTCR?440 RTCR2 1 .50 RTCR2460 RTCP2470 RTCR2480 RTCR2490 PTCR.2500 R Tep 2510 RTCR2520 RTCR2530 RTCP?54(l RTCR2550 RTC P 2560 RTCP?570 RTCR?580 RTCP2590 RTCP2600 RTCR2610 R TCR2620 RTCR2630 RTCP2640 RTCR2650 RTCR?660 RTCR2670 RTCR2680 RTCR2690 RTCR2700 RTC P 2710 RTCR2720 RTCR2730 RTCR?740 PTCR2150 t-' N N "rj tTl t""" H >< 9 2< tTl en
  • 134. C (11,,",UlATF SUMS OF V~R!ABlfS, SQUAPES AND X-PPODUCTS FOR ONE ITEM nr. 14'5 K = 1,16 SUM(K) = SUMlKI + YIK) no 145 J K, 16 145 SIG(J,KI = SIG(J,KI + Y(JI*V(K} C C STOPE AVERAGES IN AV, CONVERT SIG TO rov MATRIX RN = NR c RNl = RN - 1. DO 1'50 K = 1.16 AVlKI = SUMlKl/RN DO 150 J = K, 1. 6 1'30 SIGlJ,K) =(SIr-tJ,KI - SJM(Jl*AVIKlt/RNl C ~AKE ~TAG. ElFMENTS STD. DEVIATIONS, OFF-OIAG. CORR. COEFFICIENTS C no 160 T 1,16 160S1(;lI,1I SQRTCSIG(JtI)) DC 170 J -= 1,15 K -= J + 1 DO 170 I = SIG(I,JI 170 SIGIJ,II K,16 StGII,J)/! SH;!l ,Il*SIt;(J,J)' SIG(I,JI C COMPUTATIONS (0MPLETE. PREPARE TO PRINT C IF (K~'lMT.EQ.OI GO TO ZOO C ADJUST PAPA~TER VAlUFS F8R COMMON El~~ENT EFFECTS KC = 0 on t<~0 J = 1,3 KJ = J + 1 00 190 K = KJ, 4 KC = KC + 1 IF(CMSO(KC» 190,190,180 180 GV$O(JI SQRT(GVSDlJI**Z + CMSO(KCI**Z' GVSO(KI = SQRT(GVSD(K)**Z+CMSO(KC)**?) Rrcp 2160 RTCPZ770 HcP-Z780 RTCRZ790 RTCR2800 RTCR2810 RTCR2820 RTCP2830 R TCPZ840 PTCP2850 RTCRZ860 R TCR?f370 RTCRZAfO RTCR2~C)O RTCR2900 RTCR7910 RTCR2CJ20 PTCR2930 RTCR2940 RTCR2950 PTCRZ960 RTCR2970 RrCp 2980 RTCR?990 RrCP3001) RTCR1010 PTCP1020 RTCR3030 RTCR 3040 RTCR3050 R TCR3060 R TCR'3070 RTCR3080 R TCR.3090 RTCR3100 U1 C/l H ~ ~ ~ H o 8 ~ t"" ~ H o ~ ..... N VI
  • 135. 190 CONTI NUF C C PRINT COMPARISON OF PARENT AND SAMPLE RATIO TERMS 200 WRITE (OUT.30) C WRITE IOUT,8)(NAM(J),GVAVtJ),AV(JI,GVSO(JI,SIG(J,J),J=1,4) LINE = LINE + 15 IF (ROST( 11 • EO. 0) GO TO 220 C PRINT COMPLETE MATRIX OF £ORRELATIONS AND STO.-DEVS. IF REQUESTED WRITE (fJUT,lO) NAM C DO 210 J = 1, 1 6 210 WR I TE (OU T ,12) NA M ( J), (S I G (I ,J I , 1==1 , 16 ) LINE = LINE + 19 C PACK AND LIST rrRRELATIONS ~ETWEEN TERMS OF RATIOS r. 220 IF {RQST(2).EQ.01 GO TO 23<; IP = 0 N = 0 DO 230 J == 1,3 K :: J + 1 no 230 I = K,4 IP = IP + 1 P R R F ( I P ) == NA M ( J I IP == IP + 1 PRBF( IP) == NAMe 1) N = N + 1 230 PRRF(12+NI = SIGtI,J} WRITE (OUT,14) (PRRF(N),N=1,18) L HIE == LI fJ~ + 4 235 IF (RQSTDI.EQ.O) GO TO 245 C PACK AND LIST CORRELATIONS RFTWEEN RATIOS AND THEIR NUMERATORS KlO = 4 KP = 0 N = Q RTCR31l0 RTCR3120 RTCR3130 RTCR3140 RTCR3150 RTCl>3160 RTCR3170 RTCR3180 RTCR3190 R TCR 3200 RTCR3210 R TCR 3220 RTCR1230 R TCR3240 RTCR3250 RTCR3260 RTCR3?70 RTCR3280 RTCR32<:lO RTCA 3300 RTCR3310 RTCR3320 RTCR3330 RTCP3340 RTCR3350 RTCR3360 RTrR3370 R TCR3380 RTCR.3390 RTCP3400 R TeR 341 0 R TCR 3420 RTCP3430 RT(R3440 R TCR3450 ...... N ... "r:I tTl t-< H >< 9 2< tTl en
  • 136. c on 240 J = 1, 4 KLO = KLD + 1 DO 240 K = KLO.16,4 KP = KP + ] PRAFCKPI = NAMIKI KP :0 KP + 1 P Q 8F(KP) = NAM(J) N = N + 1 240 PRPFI24+NI = SJG(K,JI LINE = LINE + 5 IF (5~ - LINFI 243,244,244 243 WRITE (OUT,NUPG) L HIE' = 0 244 WRITE cnUT,l6) (PRRF(N), N = 1,361 LINE' = tINF + 5. 245 IF(RQST(4).EQ.O) GO TO 255 C PACK AND LIST rQRPELATIO~S RFTWEEN RATIOS AND THEIR DENOMINATORS KLn -= 17 KP = 0 ,,! = () Dr) 250 J = 1,4 KHI = nfl- 1 KLO = KlO - 3 DO 250 K = KLO,KHI Kfl = KP + 1 PRBF(KP' = NAM(K) KP = KP + 1 PRBFtKP' = NA~(JI N = N + 1 250 PRBFC24+N) = SIG(K,J) LINE' = LINE + 5 IF (55 - LINE) 253,254,254 253 WRITF (GUT,NUPG) LINE = 0 RTCR3460 q TCR3470 RTCR3480 RTCR3490 R TCR3500 RTCR3510 PTCR3520 RTCR3530 RTCR3540 lHCR3550 PTCR3560 RTCRV570 RTCR3580 R TCR '3 590 RTCR'3600 RTCR3610 RTCR3620 RTCR3630 R TCR3640 RTCR3650 RTCR3660 RTCR3670 RTCR3680 R TCR36QO RTCR3700 RTCR3710 RTCR372 0 RTCR3730 RTCR"740 RTCR3750 RTCR3760 RTCR3770 RTCR3780 RTCR37QO RTCR1800 VI en H ~ ~ ~ H o 8 ~t""' ~ H ~ ....N VI
[Concluding pages of the program RTCR listing. This section packs and prints correlations between ratios with common numerators, correlations between ratios with common denominators, correlations in which the same term is the numerator of one ratio and the denominator of the other, correlations of ratios which are reciprocals of each other, and correlations between ratios lacking common terms; it then records the current value of the random number generator seed and stops. The FORTRAN text of these pages is illegible in this reproduction.]
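An illustrative aside, independent of the RTCR code itself: ratios that share a term are correlated even when the underlying variables are not, which is why the listing tabulates correlations separately for ratios with common numerators, common denominators, reciprocal pairs, and ratios lacking common terms. The Python sketch below demonstrates the effect with simulated data; all names and constants in it are invented for illustration.

```python
import random

def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1972)
n = 5000
# Four mutually independent uniform variables.
a = [rng.uniform(1, 2) for _ in range(n)]
b = [rng.uniform(1, 2) for _ in range(n)]
c = [rng.uniform(1, 2) for _ in range(n)]
d = [rng.uniform(1, 2) for _ in range(n)]

# A/B and A/C share the numerator A: markedly correlated.
r_common_num = corr([x / y for x, y in zip(a, b)],
                    [x / z for x, z in zip(a, c)])

# A/B and C/D share no term: correlation near zero.
r_no_common = corr([x / y for x, y in zip(a, b)],
                   [z / w for z, w in zip(c, d)])
```

Running this shows `r_common_num` well above zero while `r_no_common` hovers near zero, the spurious-correlation effect the listing is designed to tabulate.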
APPENDIX 2

C PROGRAM RANEX
C
C EXAMINES RANDOMNESS AND UNIFORMITY OF A SEQUENCE OF NUMBERS BY
C   1. COUNTING FREQUENCIES OF RUNS OF LENGTH K OVER OR UNDER 1/2,
C   2. COUNTING FREQUENCIES OF RUNS OF LENGTH M UP OR DOWN,
C   3. COUNTING FREQUENCIES WITH WHICH PAIRED NUMBERS SEPARATED BY A
C      GIVEN DISTANCE (LAG) FALL IN JOINT SIZE CLASS (M,N), M,N = 1,20.
C      THIS TABULATION IS OMITTED WHEN SAMPLES OF LESS THAN 2000
C      NUMBERS ARE USED.
C   4. COMPUTING CHI-SQUARE(S) ON ASSUMPTION THAT NUMBERS AND
C      NUMBER-PAIRS ARE UNIFORMLY DISTRIBUTED,
C   5. COMPUTING AUTO-CORRELATION COEFFICIENTS FOR LAGS J, J=1,LAG
C
C INPUT IS FROM SUBROUTINE UNNO(L,F) WHERE L IS A RANDOM INTEGER
C AND F IS A FLOATED RANDOM NUMBER IN THE RANGE (0,1)
C
C PROGRAM WRITTEN BY F. CHAYES FOR NSF STATISTICAL GEOLOGY INSTITUTE,
C CHICAGO CIRCLE, 1972.
C
C ********************************************************************
C *
C *                       CARD INPUT TO RANEX
C *
C *   COL.     VARIABLE    FUNCTION OR DEFINITION
C *
C *   TITLE CARD (20A4)
C *   1-80     TITL        TITLE INFORMATION, 80 CHARACTERS
C *
C *   COMMAND CARD (3I5,I15)
C *   1-5      NOKMP       NUMBER OF ITEMS PER SAMPLE
C *   6-10     ITER        NUMBER OF SAMPLES
C *   11-15    LAG         INTERVAL BETWEEN MEMBERS OF A PAIR (>0)
C *   16-30    KRNL        INITIAL VALUE OF RANDOM NUMBER SEED
C *                        (  K; USE K
C *                           0; USE 1ST RESIDUE FROM GENERATOR
C *                          -1; USE LAST VALUE REACHED ON PRE-
C *                              CEDING PASS OF SAME EXECUTION)
C *
C ********************************************************************

[The DIMENSION, DOUBLE PRECISION, EQUIVALENCE, and DATA statements and the input FORMAT statements that follow are illegible in this reproduction.]
[Body of program RANEX. After its output FORMAT statements, the program reads the title and command cards; initializes the random number generator according to KRNL; generates and stores (LAG+1) random numbers; clears all frequency counters; records runs up-and-down and over-and-under 1/2 in the first (LAG+1) numbers; then, for each of the NOKMP items, classifies the current number pair, cumulates products for the auto-correlation coefficients, shifts the number vector down one cell, and generates a new number; records the last run of each type; calculates observed and expected numbers of runs of each length; and computes chi-squares for the individual and paired frequencies. The FORTRAN text of these pages is illegible in this reproduction.]
[The remaining output statements of RANEX, which write the frequency tables, chi-square, and auto-correlation coefficients before the STOP and END statements, are illegible in this reproduction.]

      SUBROUTINE UNNO(L,F)
C
C PSEUDO-RANDOM NUMBER GENERATOR CODED BY L.W.FINGER
C
C CALLING SEQUENCE IS CALL UNNO (L,F)
C
C L IS THE LAST RANDOM NUMBER CALCULATED.  IF NOT CHANGED, THE
C NUMBERS COME OUT IN SEQUENCE.  F IS THE RANDOM NUMBER IN REAL
C FORM AND IS UNIFORMLY DISTRIBUTED IN THE RANGE (0,1)
C
      DOUBLE PRECISION D
      IF(L.EQ.0) L=23192344
      D=L
      D=DMOD(513.0D0*D,2147483647.0D0)
      L=D
      L=L+1
      F=(D+1.0D0)/2147483647.0D0
      RETURN
      END
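For readers without access to a FORTRAN compiler, the appendix pair can be paraphrased in Python. This is a sketch, not a port: the congruential constants below (multiplier 513, modulus 2**31 - 1, default seed 23192344) are read from the scanned UNNO listing and may be misread, and only two of RANEX's five checks are sketched, namely the runs over or under 1/2 and the chi-square test of uniformity.

```python
def unno(seed=23192344):
    """Yield floats in (0, 1], paraphrasing subroutine UNNO: the integer
    state is advanced as l -> (513 * l mod (2**31 - 1)) + 1."""
    m = 2147483647            # 2**31 - 1, as in the listing
    l = seed
    while True:
        l = (513 * l) % m + 1
        yield l / m

def runs_over_under_half(xs):
    """Frequencies of runs of length k over or under 1/2 (RANEX check 1)."""
    runs = {}
    length = 1
    for prev, cur in zip(xs, xs[1:]):
        if (prev >= 0.5) == (cur >= 0.5):
            length += 1                          # still in the same run
        else:
            runs[length] = runs.get(length, 0) + 1
            length = 1
    runs[length] = runs.get(length, 0) + 1       # record the final run
    return runs

def chi_square_uniform(xs, ncells=20):
    """Chi-square for departure from uniformity on (0,1) (RANEX check 4)."""
    counts = [0] * ncells
    for x in xs:
        counts[min(int(x * ncells), ncells - 1)] += 1
    expected = len(xs) / ncells
    return sum((c - expected) ** 2 / expected for c in counts)

gen = unno()
sample = [next(gen) for _ in range(2000)]
runs = runs_over_under_half(sample)
chi2 = chi_square_uniform(sample)
```

With 20 cells the chi-square statistic has 19 degrees of freedom, so values far above its expectation of about 19 would cast doubt on the generator's uniformity.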
Chapter 6

Computer Perspectives in Geology

Daniel F. Merriam

6.1 GENERALITIES

The introduction of computers into society in the mid-20th century ushered in the space age and, with it, some special problems. Although the idea of computers and computing has been with us for some time, the explosive development of computers just after World War II was not foreseen, and the public as a whole was not ready for such dramatic and rapid changes in this field. So, although we have seen men on the moon, world-wide weather forecasting via satellites, and breakthroughs in medicine and science, some of the 1984 predictions have come to pass. There has been an invasion of privacy, many trades and practices have been declared obsolete and redundant, and an impersonal touch has been added to an increasingly complex society.

The problems that plague the general public also plague science, and geology is no exception. The numerous changes stemming from the computer have forced geologists to reevaluate their contributions, and in some instances the results have been startling. Geologists have been put in the position of having to think and formulate their problems and methods of solution in much greater detail. Instead of replacing geologists, the computer has created a demand for more and better trained ones. One effect the computer has had on geology is to force a metamorphosis on us, that is, a change from a qualitative science to a quantitative one. Purists even claim that geology is only now becoming a science.
6.2 EARLY BEGINNINGS

Computers date from 1812, when Charles Babbage invented his difference machine. The machine was designed to operate automatically without human intervention. Although the underlying principle was simple, Babbage encountered almost insurmountable problems in building his machine. It was unfortunate that Babbage was unable to complete his marvelous invention, but he was too far ahead of his time, and technology simply had not developed to the necessary extent.

The next important event was the development of a punched-card system by Herman Hollerith in about 1890. Mr. Hollerith worked for the U.S. Bureau of the Census, and he recognized early the need for the rapid manipulation of data. Even then it was difficult to summarize census information. Hollerith's original idea of the punched card is probably the device most used today for input to the computer.

During World War II the development of computers was accelerated, especially to aid in solving problems in ballistic missile development and in the development of the atomic bomb. Rapid strides were made possible through Claude Shannon's demonstration, some years before, that Boolean algebra could be applied to the design of switching circuits. Although the first computers were awkward to use and slow by today's standards, they served their purpose and laid the foundation for a complex and dynamic industry.

6.3 USAGE IN GENERAL

Computers may be used for a number of reasons, including (1) saving time and effort, (2) making use of information in ways that would be virtually impossible without the aid of computers, and (3) improving the rigor of thought processes (Harbaugh and Merriam, 1968). The application of computer techniques to solving problems in the earth sciences is now important and becoming more so. Many of these applications were unthought of just a few years ago and indeed just a few months or even weeks ago. As aptly stated by P. C. Hammer (1966, personal communication),
"People who are thinking about what they are doing are using computers."

Although some of the techniques were possible before the advent of the computer, many were not. Execution is presently feasible only because of the ease and speed with which they can be accomplished, and as stated by P. C. Hammer (1966, personal communication),

"A computer is an intelligence amplifier."

The ability to save time and effort, and to some extent the ability to manipulate data in ways that are impossible otherwise, are mainly an aspect of computers and computing systems themselves. It is essential, therefore, that some personal involvement in the process of computing be attained.

Two of the most important aspects of the use of computers are repeatability and reliability; that is, anyone can take a set of data and reproduce the results within the same limits of accuracy by the same method. It is not possible to do this with qualitative methods, and these two aspects cannot be overestimated (Merriam, 1965).

Obviously there are times and places when and where a computer can be used to good advantage. These are (1) if there is a large volume of data, (2) if speed or frequency is necessary in retrieval of data, or (3) if a particular problem is extremely complex. There is an area between these extremes where it is easier to do the required manipulations manually.

There are other considerations on whether to use a computer. These include (1) the availability of programs, (2) the reducibility of the data to numeric form, and (3) ease of accessibility and economic feasibility.

If it is necessary to develop programs, it can be extremely tedious and expensive. Obviously, if there is a low volume of data, or speed is no object, or the problem is not too complicated, it would be desirable to obtain results manually. Fortunately, however, programs are available from many sources and many may be adapted for a particular use.
Many geologic data are qualitative and not amenable to computer analysis. Many data also are incomplete or of poor quality and
essentially useless by today's standards. It may be desirable in many instances simply to recollect the data. Another requirement is the necessity that the problem be expressed in a sequence of relatively simple, logical, algebraic statements. This is necessary because operations to manipulate the data must be explicit, precise, and unambiguous. This requirement can usually be met, although it may take considerable thinking and planning on the part of the investigator.

6.4 USAGE IN GEOLOGY

Just as computers date from 1812, modern geology dates from the late 18th century and the work of James Hutton, a Scot. It is interesting to note that these events both took place in Britain at about the same time and about 150 years before the application of computers to geologic problems.

The first earth scientists to use computers were those who were numerically inclined. It is not surprising, then, that geophysicists were the first. They had been using slide rules and desk calculators for many years, and it was natural to adapt to a new and better method of processing their data. Other exceptions were those conducting statistical studies of sediments and their contained fossils and those working with engineering aspects, such as hydrologists. Where large quantities of data were handled, techniques were needed to manipulate them.

Just as Herman Hollerith needed assistance with his census data, geologists needed help with their data processing. It is logical, then, that sorting of oil and gas data and stratigraphic information was accomplished with punched cards in the early 1950's (Parker, 1952). Early bibliographic systems also utilized punched cards. As data accumulated, it was necessary to sort faster, and when computers became generally available, automatic procedures replaced manual ones.
It might be imagined that, because computers have been commercially available for only about 18 years, the utilization of them by geologists has been most recent. This is indeed true, and only in the past 10 years has this involvement become increasingly important. The
importance can be judged by the number of geologic publications appearing which have in some way utilized the computer, as shown in Figure 6.1. The number of publications is increasing rapidly, and an obvious increase occurred in the number of reports on research beginning in 1962, which is the result of the general availability of second-generation computers. Geology entered the computer age with the publication, in a regularly issued geology journal, of a geologically oriented IBM 650 program by Krumbein and Sloss (1958). The original program is reproduced in Figure 6.2. Other important events which have affected the development of computer applications in geology are listed in Table 6.1.

FIGURE 6.1 (graph of the number of publications per year, 1950-1970)
IBM 650 Basic Program for Three Percentages and Two Ratios
Zero drum, Start program at 0501, No Subroutines required

Location of
Instruction   OP   Data   Instr.   Abbrev.   Remarks
0501          70   1501   0502     RD        Read data card
0502          65   1501   0503     RAL       Code in accumulator
0503          20   0727   0504     STL       Store code
0504          65   1503   0505     RAL       Total in accumulator
0505          20   0728   0506     STL       Store total thickness
0506          65   1504   0507     RAL       B to accumulator
0507          16   0660   0508     SL        Subtract 10 1010 1010
0508          45   0510   0509     BRNZ      Branch on B data
0509          24   0729   0515     STD       Store no data code
0510          15   0660   0511     AL        Add 10 1010 1010
0511          35   0004   0512     SLT       Shift left
0512          64   1503   0513     DVRU      Divide by total
0513          31   0001   0514     SRD       Shift and round
0514          20   0729   0515     STL       Store percent B
0515          65   1505   0516     RAL       C to accumulator
0516          16   0660   0517     SL        Subtract 10 1010 1010
0517          45   0519   0518     BRNZ      Branch on C data
0518          24   0730   0524     STD       Store no data code
0519          15   0660   0520     AL        Add 10 1010 1010
0520          35   0004   0521     SLT       Shift left
0521          64   1503   0522     DVRU      Divide by total
0522          31   0001   0523     SRD       Shift and round
0523          20   0730   0524     STL       Store percent C
0524          65   1506   0525     RAL       A to accumulator
0525          16   0660   0526     SL        Subtract 10 1010 1010
0526          45   0528   0527     BRNZ      Branch on A data
0527          24   0731   0552     STD       Store no data code
0528          15   0660   0529     AL        Add 10 1010 1010
0529          35   0004   0530     SLT       Shift left
0530          64   1503   0531     DVRU      Divide by total
0531          31   0001   0532     SRD       Shift and round
0532          20   0731   0552     STL       Store percent A
0552          71   0727   0553     PCH       Punch percentages
0553          65   1501   0554     RAL       Code in accumulator
0554          20   0827   0555     STL       Store code
0555          65   1503   0556     RAL       Total in accumulator
0556          20   0828   0557     STL       Store total
0557          65   1506   0558     RAL       A to accumulator
0558          45   0570   0559     BRNZ      Branch on nonzero A
0559          65   1504   0560     RAL       B to accumulator
0560          15   1505   0561     AL        Add C
0561          45   0564   0562     BRNZ      Branch on nonzero B + C
0562          65   0662   0563     RAL       Indeterminate code
0563          20   0829   0586     STL       Store code for 0/0
0564          16   0661   0565     SL        Subtract 20 2020 2020

FIGURE 6.2
FIGURE 6.2 (continued)

Location of
Instruction   OP   Data   Instr.   Abbrev.   Remarks
0565          45   0567   0566     BRNZ      Branch on no B + C data
0566          24   0829   0586     STD       Store no data code
0567          65   0663   0568     RAL       Infinity code
0568          20   0829   0586     STL       Store infinity code
0569          16   0660   0570     SL        Subtract 10 1010 1010
0570          45   0572   0571     BRNZ      Branch on nonzero A
0571          24   0829   0586     STD       Store no data code
0572          65   1504   0573     RAL       B to accumulator
0573          16   0660   0574     SL        Subtract 10 1010 1010
0574          45   0576   0575     BRNZ      Branch on B data
0575          24   0829   0586     STD       Store no data code
0576          65   1505   0577     RAL       C to accumulator
0577          16   0660   0578     SL        Subtract 10 1010 1010
0578          45   0580   0579     BRNZ      Branch on C data
0579          24   0829   0586     STD       Store no data code
0580          65   1504   0581     RAL       B to accumulator
0581          15   1505   0582     AL        Add C
0582          35   0003   0583     SLT       Shift left
0583          64   1506   0584     DVRU      Divide by A
0584          31   0001   0585     SRD       Shift and round
0585          20   0829   0586     STL       Store ratio (B + C)/A
0586          65   1505   0587     RAL       C to accumulator
0587          45   0597   0588     BRNZ      Branch on C data
0588          65   1504   0589     RAL       B to accumulator
0589          45   0592   0590     BRNZ      Branch on B data
0590          65   0662   0591     RAL       Indeterminate code
0591          20   0830   0609     STL       Store indeterminate code
0592          16   0660   0593     SL        Subtract 10 1010 1010
0593          45   0595   0594     BRNZ      Branch on B data
0594          24   0830   0609     STD       Store no data code
0595          65   0663   0596     RAL       Infinity code
0596          20   0830   0609     STL       Store infinity code
0597          16   0660   0598     SL        Subtract 10 1010 1010
0598          45   0600   0599     BRNZ      Branch on C data
0599          24   0830   0609     STD       Store no data code
0600          65   1504   0601     RAL       B to accumulator
0601          16   0660   0602     SL        Subtract 10 1010 1010
0602          45   0604   0603     BRNZ      Branch on B data
0603          24   0830   0609     STD       Store no data code
0604          15   0660   0605     AL        Add 10 1010 1010
0605          35   0003   0606     SLT       Shift left
0606          64   1505   0607     DVRU      Divide by C
0607          31   0001   0608     SRD       Shift and round
0608          20   0830   0609     STL       Store ratio B/C
0609          71   0827   0501     PCH       Punch ratio card
0660          10 1010 1010                   Const
0661          20 2020 2020                   Const
0662          90 9090 9090                   Const
0663          99 9999 9999                   Const
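In modern terms, the flow of the Figure 6.2 program can be sketched as follows. This is an illustrative paraphrase, not a translation of the machine code: the function names are invented, the total is recomputed from A, B, and C (the machine reads it from the card and assumes it is nonzero), and the string sentinels stand in for the constant words 90 9090 9090 (indeterminate) and 99 9999 9999 (infinity).

```python
def ratio(num, den):
    """A ratio with the program's sentinel outcomes for zero denominators."""
    if num == 0 and den == 0:
        return "indeterminate"    # stands in for constant word 90 9090 9090
    if den == 0:
        return "infinity"         # stands in for constant word 99 9999 9999
    return num / den

def percentages_and_ratios(a, b, c):
    """Percent A, B, and C of the total, plus the ratios (B + C)/A and B/C.
    Assumes a nonzero total, as the punched-card input would provide."""
    total = a + b + c
    pct = tuple(round(100.0 * x / total, 1) for x in (a, b, c))
    return pct, ratio(b + c, a), ratio(b, c)
```

For example, `percentages_and_ratios(50, 30, 20)` gives percentages (50.0, 30.0, 20.0), a ratio (B + C)/A of 1.0, and a ratio B/C of 1.5.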
TABLE 6.1  Important Events in Computer Applications in Geology

1812  Charles Babbage and his difference machine.
1890  Punched-card system developed by Herman Hollerith.
1941  Z3, first electronic computer, made in Germany.
1944  Mark I, the decimal electromechanical calculator, put into operation at Harvard.
1946  ENIAC built at the University of Pennsylvania.
1951  UNIVAC, the first commercial computer.
1952  Digital plotters introduced.
1953  First FORTRAN compiler written.
1954  Introduction of the IBM 650, the first mass-produced computer.
1958  W. C. Krumbein and L. L. Sloss published the first geologically oriented computer program in a recognized geologic journal.
      Transistorized second-generation computers introduced.
      The ALGOL language was jointly introduced in several countries.
1961  Establishment of the symposia series "Computer Applications in the Mineral Industries" by the University of Arizona.
1963  Announcement of third-generation microcircuit computers.
      First regular publication of geologic computer programs as Special Distribution Publications of the Kansas Geological Survey.
      First year more than 100 papers published on computer applications in geology.
1964  Time-sharing system successfully used at Dartmouth College.
1966  First series of geologic publications to deal exclusively with computer programs established by the Kansas Geological Survey.
      First of eight colloquia on "Computer Applications in the Earth Sciences" sponsored by the Kansas Geological Survey.
      Establishment of an Associate Editor for Computer Applications for the AAPG Bulletin.
1967  American Association of Petroleum Geologists Committee on Electronic Data Storage and Retrieval formed.
      COGEODATA (IUGS Committee on Storage, Automatic Processing, and Retrieval of Geologic Data) formed.
TABLE 6.1  (Continued)

      Publication in IUGS Geological Newsletter, first international attempt to standardize description of mineral deposits in computer-processable form.
1968  IAMG founded in Prague at the IGC.
1969  First issues of the Journal of the IAMG published.
      GEOCOM Bulletin, an international current-awareness publication, initiated.
      US Geological Survey publishes its first Computer Contribution series.
      First book in a series on "Computer Applications in the Earth Sciences" (published by Plenum Publ. Corp.).
1970  An informal research group on Computer Technology formed by SEPM.

6.5 PATTERNS AND TRENDS

For 150 years geologists have been collecting data. By its nature, geology has been a historical and observational science. By the mid-20th century, however, this emphasis was beginning to change to one of understanding geologic processes (Sylvester-Bradley, 1972). This metamorphosis from where to how is being accelerated, and obviously the next logical step is one of understanding why, that is, putting together the whole story. So, even in the short time geologists have utilized computers, a progression through several stages of computing environments has taken place (other disciplines, for example chemistry and physics, have also undergone this transformation).

Early applications were mainly analytical. Next was a stage of collecting data in machinable form for use in predictive techniques utilizing methods developed and well tested in other disciplines, as indicated in Table 6.2. Simulation followed, and results are being evaluated in the 1970's. This progression in development of the subject is paralleled and recorded by examples in the literature (Preston, 1969). Past and future trends are shown in Figure 6.3.
TABLE 6.2  Historical Record of Stage of Integration of New Concepts or Techniques in a Discipline

Discovery
  Publications:       Papers general, with suggestions of possibilities
  Data:               None
  Computer programs:  None
  References:         Practically none

Development
  Publications:       Papers demonstrate use of different techniques
  Data:               Artificial
  Computer programs:  "Borrowed" from other fields intact
  References:         Mostly from other disciplines

Application
  Publications:       Papers acknowledge use of computers and source of programs. Different problems tried
  Data:               Sample data sets
  Computer programs:  Modified and adapted from other fields with some geologic bent
  References:         Everything written on the subject in geology

Assimilation
  Publications:       Completely integrated
  Data:               Real data in quantities necessary to solve problems
  Computer programs:  Programs written with only parts of "canned" programs used but specific for purpose
  References:         Citation of only those papers of pertinence to work
FIGURE 6.3 (trend curves for the 1960's, 1970's, and 1980's)

The future is naturally unknown. Developments are too rapid, and users are heavily dependent on developments in the computer industry and other disciplines for advancement of methods and ideas. It is clear at this moment in time that we are moving from the application to the assimilation stage in most areas and that we are using simulation to test real situations and learn of processes in an effort to understand why.

Past developments will have a bearing on future events. Individuals, especially in universities, have been developing and adapting techniques for solving geologic problems (mainly because of lack of funds and accessibility to data files). Simultaneously, in industry, large data files have been converted to machinable form. The development of standards and requirements for programs and data-file formats by government agencies and other interested organizations has made considerable progress (e.g., Hubaux, 1970; Robinson, 1970). The wedding of new imaginative techniques as applied to real data of known quality in readily accessible, compatible files will surely result in a completely integrated system leading to significant findings.

Only one warning. Geologists must keep in mind their objectives and define their problems sharply to keep from becoming a tool of the computer rather than the master.
REFERENCES

Harbaugh, J. W., and Merriam, D. F., 1968, Computer applications in stratigraphic analysis: New York, John Wiley & Sons, 282 p.

Hubaux, A., 1970, Description of geological objects: Jour. Math. Geology, v. 2, no. 1, p. 89-95.

Krumbein, W. C., and Sloss, L. L., 1958, High-speed digital computers in stratigraphic and facies analysis: Am. Assoc. Petroleum Geologists Bull., v. 42, no. 11, p. 2650-2669.

Merriam, D. F., 1965, Geology and the computer: New Scientist, v. 26, no. 444, p. 513-516.

Merriam, D. F., 1969, Computer utilization by geologists, in Symposium on computer applications in petroleum exploration: Kansas Geol. Survey Computer Contr. 40, p. 1-4.

Parker, M. A., 1952, Punched-card techniques speed map making (abs.): Geol. Soc. Amer. Bull., v. 63, no. 12, pt. 2, p. 1288.

Preston, F. W., 1969, Systems analysis--the next phase for computer development in petroleum engineering, in Computer applications in the earth sciences: New York, Plenum Press, p. 177-189.

Robinson, S. C., 1970, A review of data processing in the earth sciences in Canada: Jour. Math. Geology, v. 2, no. 4, p. 377-397.

Sylvester-Bradley, P. C., 1972, Geobiology and the future of palaeontology: Jour. Geol. Soc., v. 128, pt. 2, p. 109-117.
Chapter 7

Problem Set in Geostatistics

R. B. McCammon

7.1 A PALEONTOLOGIST'S DILEMMA

Six fossils, Acanthus minerva, Gyro pipus, Rega elegans, Acanthus exerta, Rega veliforma, and Gyro robusta, were found together in a tray at the museum. From an outside label, it is known that Rega veliforma, Acanthus exerta, and Gyro robusta are marine, fresh water, and brackish water species (not necessarily in that order), each from a different locality. Acanthus minerva was collected in Illinois. The fresh water species was collected in New England. Rega elegans is known not to be marine. The fresh water species and one of the others, a marine species, were collected from the same locality. Gyro robusta is larger than the marine species. One of the species of the same genus as the fresh water species was collected in California. Which one is the marine species?

No doubt you will unravel the logic of this dilemma. Do you think, however, that you could write a computer program to solve the problem?

7.2 PARTICLE SIZE DISTRIBUTION IN THIN SECTION

In a recent paper by Rose (1968), it was shown that the probability p that an intersection figure cut at random in a block of thickness t from a single sphere of diameter D will have a diameter falling between the limits da and db is given by
   p = [sqrt(D^2 - da^2) - sqrt(D^2 - db^2)] / t        (7.2.1)

where 0 < da <= db <= D < t.

1. Let D = t. Show that equation (7.2.1) is a probability distribution. What is the expected value? Prepare a frequency plot of this distribution.

2. Rose went on to show that if the number of spheres of diameter D equal to i*d is denoted by Ni, and the class limits are da = (j - 1)d and db = jd, then the expected number of intersection figures with diameters between da and db is given by

   Cij = Ni * phi_ij

where

   phi_ij = [sqrt(i^2 - (j - 1)^2) - sqrt(i^2 - j^2)] / i,   i >= j
   phi_ij = 0,                                               i < j

Write a computer program to generate the elements of the matrix phi for m = 10.

3. Suppose you are given the following observed frequencies of particle diameters determined from thin section:

   Size class:  1    2    3     4     5    6    7    8    9   10
   Frequency:   0   16   87   155   150   65   32    8    4    1

(class 1 smallest, class 10 largest). Using the matrix generated above, determine the true frequency distribution of particle diameters. Interpret your results.
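One way to carry out parts 2 and 3 is sketched below in plain Python (an illustrative sketch, not the program asked for in the problem; it assumes each sphere class is normalized so that every cut sphere registers in some size class). Because phi_ij vanishes for j > i, the unfolding in part 3 reduces to back-substitution on a triangular system.

```python
import math

m = 10

# phi[i][j] (0-based) = probability that a sphere in diameter class i+1
# produces an intersection figure in size class j+1 (zero for j > i)
phi = [[0.0] * m for _ in range(m)]
for i in range(1, m + 1):
    for j in range(1, i + 1):
        phi[i - 1][j - 1] = (math.sqrt(i**2 - (j - 1)**2)
                             - math.sqrt(i**2 - j**2)) / i

# Part 3: observed class counts c_j = sum_i N_i * phi_ij.  Since phi_ij = 0
# for j > i, solve for N by back-substitution from the largest class down.
c = [0, 16, 87, 155, 150, 65, 32, 8, 4, 1]
N = [0.0] * m
for i in range(m - 1, -1, -1):
    N[i] = (c[i] - sum(phi[k][i] * N[k] for k in range(i + 1, m))) / phi[i][i]

print([round(v, 1) for v in N])   # estimated true counts per diameter class
```

Note that some recovered counts may come out negative or non-integral; reconciling that with the physical meaning of the N_i is part of the interpretation asked for.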
7.3 PARTICLE DIAMETER SAMPLING EXPERIMENT

We have just examined the probability distribution for the apparent diameter of a spherical particle of diameter D cut at random in thin section. Now let us simulate the process. Imagine a spherical grain

[Figure: a sphere of diameter D cut by a random plane parallel to the diametral plane PP'; Y is the distance of the cutting plane from PP']

cut by a random plane parallel to PP'. Y is defined as a random variable distributed uniformly between zero and D/2, the particle radius. We have

   p(y) = 2/D,   0 <= y <= D/2
   p(y) = 0,     otherwise

as the probability density of Y. The apparent diameter 2X, related to Y by

   X = sqrt((D/2)^2 - Y^2)

is a random variable also. We wish to find the probability density of 2X by experiment.

1. Let D/2 = 1. Use a random number generator function to generate a sample of 100 values of Y distributed uniformly between 0 and 1. Calculate 2X for each Y and prepare a histogram for 2X in intervals of 0.2. Calculate the mean and standard deviation. Retain the mean value.

2. Repeat the sampling process above 100 times, recording only the mean value of 2X. Prepare a histogram of the means. How would you describe the shape of this distribution?
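The experiment can be sketched in Python with the standard library alone (the seed value and the 0.2-wide bins are arbitrary choices made for this illustration):

```python
import math
import random

def apparent_diameters(n, rng):
    # Y ~ Uniform(0, 1) with D/2 = 1; apparent diameter 2X = 2*sqrt(1 - Y^2)
    return [2.0 * math.sqrt(1.0 - rng.random() ** 2) for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

def stdev(xs):
    mu = mean(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / (len(xs) - 1))

rng = random.Random(12345)

# Part 1: one sample of 100 values, binned in intervals of 0.2
sample = apparent_diameters(100, rng)
bins = [0] * 10
for d in sample:
    bins[min(int(d / 0.2), 9)] += 1
print(bins, round(mean(sample), 3), round(stdev(sample), 3))

# Part 2: the distribution of the sample mean over 100 repetitions
means = [mean(apparent_diameters(100, rng)) for _ in range(100)]
print(round(mean(means), 3))   # centers near E[2X] = pi/2
```

The expected value E[2X] = pi/2 follows by integrating 2*sqrt(1 - y^2) against the uniform density, which gives a check on the simulated means.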
For a fuller explanation of the methods required to solve the two following problems, the reader is referred to McCammon (1969). The reference is given in Section 2.10 in Chapter 2.

7.4 LINEAR REGRESSION OF POROSITY DATA. I

The data given in Table 7.1 were gathered from well logs and rock cores taken from drill holes that penetrated subsurface formations in the Chicago land area. At issue is the relationship between log-derived and core-derived estimates of porosity. The values given are expressed as percentages.

TABLE 7.1

Sample   Log-derived   Core-derived     Sample   Log-derived   Core-derived
no.      porosity      porosity         no.      porosity      porosity
 1       10.0           5.5             26       10.0           9.6
 2        9.0           3.6             27        5.0          10.3
 3        7.0           3.6             28        7.0           4.5
 4        6.0           4.9             29        8.0           6.0
 5        9.0           7.1             30        9.0           6.7
 6        7.0           2.0             31        5.0           4.1
 7       10.0           8.5             32        8.0           4.5
 8        5.0           5.2             33        9.0           6.5
 9        7.0           2.6             34        7.0           3.3
10        0.0           1.9             35        3.0           2.1
11        5.0           6.1             36        7.0           2.5
12        6.0           9.3             37        7.0           6.8
13        9.0           6.9             38       10.0           3.7
14        8.0           4.3             39        7.0           6.0
15        5.0           3.3             40        5.0           3.4
16        6.0           2.5             41        8.0           2.2
17        6.0           4.8             42        4.0           1.8
18        5.0           2.4             43        5.0           2.9
19        8.0           3.8             44        8.0           2.6
20       15.0          18.4             45       16.0          15.3
21       16.0          14.7             46        4.0          16.9
22        7.0          10.9             47        5.0          15.7
23       12.0          12.5             48       14.0          12.4
24       14.0          18.6             49       21.0          22.9
25       22.0          22.1             50       21.0          21.8
1. As a first step, prepare a scatter diagram of the log-derived porosity versus the core-derived porosity values on graph paper by plotting the log-derived porosity value along the ordinate and the core-derived porosity value along the abscissa for each sample. Draw in the "best" fitting line by eye. Should the line be made to pass through the origin?

2. As measurements, core-derived porosity estimates are conceded to yield greater accuracies than log-derived estimates. However, rock cores must be analyzed for porosity singly using laboratory apparatus, whereas the acoustic log (the geophysical log commonly used for porosity determination) is recorded continuously with depth, so that a continuous porosity profile in the borehole is generated. Now consider the log-derived porosity to be the dependent variable and calculate the intercept and slope parameters of the "best" fitting line using ordinary linear regression. Draw this line on the scatter diagram and compare it with the previous one.

3. It is known that the depth of investigation for the acoustic log extends beyond the borehole and into the formation. The transmitter-receiver distance for the tool is also much greater than the length of core used in porosity determinations. Thus, on the average, log-derived estimates of porosity may be as accurate (or representative) as core-derived estimates, and thus both are dependent variables. Assuming the error estimates are approximately equal, calculate the intercept and slope parameters of the "best" fitting line. Draw in this line and compare it with the others.

4. Finally, past experience can guide us in establishing a fixed relationship between two variables subject to errors. Suppose it is known that core-derived porosity estimates are accurate to within one percent and that log-derived porosity estimates are accurate to within five percent.
Calculate the intercept and slope parameters of the "best" fitting line and compare this line with the others.

5. There are obviously many approaches to analyzing these data. What is important to note is that prior knowledge, after all, is a part of the data.
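The fits in parts 2 through 4 can be sketched in plain Python. For the errors-in-both-variables cases the sketch below uses Deming regression — one standard approach, and not necessarily the exact method of McCammon (1969); the parameter `delta`, the assumed ratio of the two error variances, is set to 1 for the equal-error line of part 3 and to (5/1)^2 for the accuracy ratio of part 4.

```python
import math

def ols(x, y):
    # ordinary least squares of y on x: minimizes vertical residuals
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    slope = sxy / sxx
    return my - slope * mx, slope          # (intercept, slope)

def deming(x, y, delta=1.0):
    # errors in both variables; delta = var(error in y) / var(error in x).
    # delta = 1 gives the equal-error (orthogonal) line of part 3.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    slope = ((syy - delta * sxx
              + math.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2))
             / (2 * sxy))
    return my - slope * mx, slope

# illustrative subset of Table 7.1 (log-derived as y, core-derived as x);
# in practice all 50 sample pairs from the table should be used
core = [5.5, 3.6, 3.6, 4.9, 7.1, 2.0, 8.5, 5.2, 2.6, 1.9]
logp = [10.0, 9.0, 7.0, 6.0, 9.0, 7.0, 10.0, 5.0, 7.0, 0.0]
print(ols(core, logp))
print(deming(core, logp, delta=1.0))
print(deming(core, logp, delta=25.0))   # part 4: 5:1 accuracy ratio
```

Plotting all three lines on the scatter diagram makes the effect of the error assumption on the fitted slope directly visible.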
7.5 LINEAR REGRESSION OF POROSITY DATA. II

On the assumption that the core-derived estimates of porosity are exact, we may wish to examine the nature of the regression, treating the log-derived porosity as a dependent variable having independent and identically distributed normal random errors for successive observations. We can construct an analysis of variance table (Table 7.2) and proceed to test various hypotheses.

Let R^2 = (SS due to regression)/(total variation) be defined as the proportion of total variation explained by the regression. Calculate R^2 for both models. Perform an F test of the significance of the regression for each model at the 95 percent level. For the linear model given by

   Y = aX + b

make a t test at the 95 percent significance level that a = 0. For the model given by

   Y = aX

make a t test at the 95 percent significance level that a = 1.

TABLE 7.2

Source of            SS, sum of          D.F., degrees      Mean
variation            squares             of freedom         square

Due to regression    SUM(yhat - ybar)^2       1
Residual             SUM(y - yhat)^2        n - 2            s^2 = SS/(n - 2)
Total variation      SUM(y - ybar)^2        n - 1
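The quantities of Table 7.2 for the first model can be computed with a short routine (a plain Python sketch; deciding significance still requires comparing F against an F(1, n-2) table and t against a t(n-2) table at the chosen level):

```python
import math

def regression_tests(x, y):
    # OLS fit y = a*x + b, then the ANOVA quantities of Table 7.2
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    yhat = [a * u + b for u in x]
    ss_reg = sum((yh - my) ** 2 for yh in yhat)             # regression, 1 d.f.
    ss_res = sum((v - yh) ** 2 for v, yh in zip(y, yhat))   # residual, n-2 d.f.
    ss_tot = ss_reg + ss_res                                # total, n-1 d.f.
    r2 = ss_reg / ss_tot
    s2 = ss_res / (n - 2)
    F = ss_reg / s2                  # compare with F(1, n-2)
    t_a = a / math.sqrt(s2 / sxx)    # compare with t(n-2) for H0: a = 0
    return r2, F, t_a

r2, F, t_a = regression_tests([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(round(r2, 4), round(F, 1), round(t_a, 2))
```

For simple regression with one slope the identity F = t^2 holds exactly, which is a useful check on any implementation.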
7.6 SUNSPOTS AND EARTHQUAKES

It has been suggested (Simpson, 1967) that solar activity may be a triggering mechanism for earthquakes. Table 7.3, extracted from Fig. 2 of Simpson's article, covers the period from January 1, 1950 through June 30, 1963 and lists the average number of earthquakes occurring per day (magnitude greater than 5.5 on the Richter scale) for those days on which the Zurich sunspot number fell within the stated intervals. A higher sunspot number is associated with greater solar activity.

TABLE 7.3

Zurich       Average no.        Zurich       Average no.
sunspot      of                 sunspot      of
number       earthquakes        number       earthquakes

0-5          3.2                110-115      5.2
5-10         3.6                115-120      5.6
10-15        3.5                120-125      5.8
15-20        4.0                125-130      5.8
20-25        3.7                130-135      5.9
25-30        3.7                135-140      5.9
30-35        4.1                140-145      6.1
35-40        4.1                145-150      6.3
40-45        4.4                150-155      5.1
45-50        3.9                155-160      4.7
50-55        3.9                160-165      5.7
55-60        4.0                165-170      5.8
60-65        3.9                170-175      5.8
65-70        4.6                175-180      5.8
70-75        4.0                180-185      5.7
75-80        4.5                185-190      6.0
80-85        5.2                190-195      5.5
85-90        4.0                195-200      5.4
90-95        5.7                200-205      6.0
95-100       5.4                205-210      5.0
100-105      5.3                210-215      6.2
105-110      5.4                215-220      5.5
From these data, calculate the correlation coefficient between the daily average number of earthquakes recorded and the Zurich sunspot number. Prepare a scatter plot. Do you regard this correlation as significant? Does this imply a causal relationship? Is there an inherent fallacy in casting the data in this form?

7.7 A BEGINNING AND AN END

In a recent book, Shaw (1964) has proposed a new method for biostratigraphic correlation. For two fossiliferous stratigraphic columns, he has proposed that the time correlation be based on the first and last occurrences of the species present. Table 7.4, taken from Shaw, lists the elevations of first and last occurrences of fossil species for two stratigraphic sections from the Upper Cambrian Riley Formation, Llano Uplift, Texas.

TABLE 7.4

                                Morgan Creek      White Creek
Species                         Base     Top      Base     Top

Kormagnostus simplex            299      485      460      655
Kinsabia varigata               373      464      561      653
Opisthotreta depressa           419      494      582      725
Spicule B                       419      504      561      725
Tricrepicephalus coria          419      529      628      706
Meteoraspis metra               446      485      628      677
Kingstonia pontotocensis        453      475      628      706
Raaschella ornata               529      532      750      751
Aphelaspis walcotti             530      561      744      771
Angulotreta triangularis        538      570      756      779
Prepare a scatter plot of these data showing the elevation of occurrence of each species for the two stratigraphic sections. Calculate the correlation coefficient for:

1. First occurrences
2. Last occurrences
3. Combined occurrences

Do you think there are significant differences in the correlations? How would you account for these? How would you go about establishing the equation of correlation for these two sections?

7.8 HELMERT TRANSFORMATION

The purpose of this exercise is to make you familiar with the geometrical interpretation of closed data arrays and to give you a feel for manipulating percentage data in three- or higher-dimensional space. The Helmert transformation is defined for n-dimensional space by the matrix

        |  1/sqrt(n)         1/sqrt(n)         1/sqrt(n)      ...   1/sqrt(n)           |
   P =  | -1/sqrt(2)         1/sqrt(2)         0              ...   0                   |
        | -1/sqrt(6)        -1/sqrt(6)         2/sqrt(6)      ...   0                   |
        |      .                 .                 .                    .               |
        | -1/sqrt(n(n-1))   -1/sqrt(n(n-1))   -1/sqrt(n(n-1)) ...  (n-1)/sqrt(n(n-1))  |

where the rows of P are the unit vectors of the transformation referred to the initial Euclidean axes. For an observation vector x = (x1, x2, ..., xn) defined so that

   x1 + x2 + ... + xn = 1,    xi > 0

the origin first is relocated by defining

   yi = xi - 1/n,    i = 1, ..., n

The transformed vector z = (z1, z2, ..., zn) is defined by z = Py or, expressed algebraically,
   z1 = 0
   z2 = (1/sqrt(2))(y2 - y1)
   z3 = (1/sqrt(6))(2y3 - y1 - y2)
   ...
   zn = (1/sqrt(n(n-1)))[(n - 1)yn - (y1 + ... + y(n-1))]

Using triangular graph paper, superimpose the coordinate axes defined by the Helmert transformation for n = 3 on the triangular coordinate system. Plot the following points and check to see that the Helmert coordinates give rise to the same location as would be obtained using triangular coordinates.

   Point     x1      x2      x3
   1         0.40    0.30    0.30
   2         0.60    0.10    0.30
   3         0.10    0.40    0.50

For the fourfold system, n = 4, determine the Helmert coordinates for the following three points in 4-space and draw the intersection figure for the tetrahedral representation of these points.

   Point     x1      x2      x3      x4
   1         0.40    0.00    0.00    0.60
   2         0.30    0.00    0.70    0.00
   3         0.20    0.80    0.00    0.00

Position the Helmert coordinate axes on the tetrahedron. What are the coordinates of the normal vector to the intersection figure formed by the three points?

For the following two problems, the reader should refer to McCammon (1969) for a more complete discussion of the methods involved. This reference can be found in Section 2.10 in Chapter 2.
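A short Python sketch of the transformation follows. For closed data the first Helmert coordinate is always zero, which is why an n-part composition effectively lives in (n - 1)-dimensional space; for n = 3 the pair (z2, z3) locates the point in the plane of the triangle.

```python
import math

def helmert(x):
    # z = P(x - 1/n) for a composition x summing to 1
    n = len(x)
    y = [v - 1.0 / n for v in x]
    z = [sum(y) / math.sqrt(n)]          # z1: always 0 for closed data
    for k in range(2, n + 1):
        # z_k = [(k-1)*y_k - (y_1 + ... + y_{k-1})] / sqrt(k(k-1))
        z.append(((k - 1) * y[k - 1] - sum(y[:k - 1])) / math.sqrt(k * (k - 1)))
    return z

for p in [(0.40, 0.30, 0.30), (0.60, 0.10, 0.30), (0.10, 0.40, 0.50)]:
    print([round(v, 4) for v in helmert(p)])
```

Because the rows of P are orthonormal, the transformation preserves length: |z| = |y|, which gives a quick numerical check on any hand calculation.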
7.9 A SPINE TO REMEMBER

The Florida crown conch (Melongena corona) is found commonly in the intertidal areas of Florida and Alabama, usually in the shade of mangrove trees. The shell is characterized by one or more rows of open spines on the shoulder. Differences exist between colonies, however, which has led some workers to prefer to erect subspecies.

Imagine there are 46 specimens of the Florida crown conch in trays before you. Apart from the variation in size of the shell (as measured by its length), the most noticeable morphologic difference between specimens is the presence or absence of lower spines. To aid you in the subsequent analysis, Table 7.5 lists for each specimen the shell length (expressed in centimeters) and the number (if any) of lower spines. Imagine that four of the specimens have been taped in such a way that the presence or absence of lower spines cannot be ascertained. These four specimens constitute unknowns which can be used to test the efficiency of any subsequent classification.

1. Prepare separate histograms based on shell length for shells which have and do not have lower spines.

2. There is clear evidence that the larger specimens tend to have lower spines, whereas the smaller specimens do not. The question is whether this difference has statistical significance. For the purposes here, we assume that the specimens represent a random sample from whatever parent population or populations we wish to define. We can first perform a t test of significance between the mean shell lengths for the two groups. Do this assuming first equal variances for the two populations and second, unequal variances.

3. Since a significant difference does exist, we can devise a discriminant function based on shell length that will predict the presence or absence of lower spines for a particular specimen. For these data, construct such a function and estimate the probability of a wrong prediction.

4. For the four taped specimens, predict for each whether lower spines are present.
TABLE 7.5

Sample   Length,   Number         Sample   Length,   Number
no.      cm        of spines      no.      cm        of spines
 1       4.50       2             24       3.78       3
 2       3.23       0             25       2.92       0
 3       4.21       6             26       4.44       7
 4       3.39       0             27       3.82       4
 5       3.88       0             28       4.45       8
 6       4.73       5             29       4.57       5
 7       2.65       0             30       3.54       0
 8       3.94       0             31       3.22       0
 9       3.84       0             32       2.66       0
10       4.02       0             33       4.24       0
11       4.08       4             34       2.94       0
12       4.15      10             35       3.36       0
13       3.36       0             36       4.16       8
14       3.48       0             37       4.02       0
15       3.38       0             38       4.73       3
16       3.40       0             39       4.61       2
17       3.54       7             40       4.79       5
18       2.85       0             41       2.74       0
19       2.71       0             42       4.64       0
20       3.32       0             43       3.78       (taped)
21       4.59       0             44       4.09       (taped)
22       4.19       4             45       4.91       (taped)
23       3.44       0             46       3.96       (taped)

5. Having found the truth about the four taped specimens (results on specimens available on request), add these to the sample data, construct a new discriminant function, and estimate the probability of a wrong prediction.
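Parts 2 through 4 can be sketched in plain Python. The single-variable discriminant shown here places the boundary at the midpoint of the two group means, which is the linear discriminant rule when the groups are assumed to share a pooled variance — a simplifying assumption for illustration, not necessarily the exact rule in McCammon (1969).

```python
import math

def pooled_t(a, b):
    # two-sample t statistic assuming equal population variances
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1.0 / na + 1.0 / nb))

def welch_t(a, b):
    # unequal-variance (Welch) version, for the second part of question 2
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

def cutoff(a, b):
    # boundary of the one-variable discriminant under equal variances
    return (sum(a) / len(a) + sum(b) / len(b)) / 2.0

# shell lengths transcribed from Table 7.5, split by presence of lower
# spines (the four taped specimens 43-46 are excluded)
spines = [4.50, 4.21, 4.73, 4.08, 4.15, 3.54, 4.19, 3.78, 4.44, 3.82,
          4.45, 4.57, 4.16, 4.73, 4.61, 4.79]
no_spines = [3.23, 3.39, 3.88, 2.65, 3.94, 3.84, 4.02, 3.36, 3.48, 3.38,
             3.40, 2.85, 2.71, 3.32, 4.59, 3.44, 2.92, 3.54, 3.22, 2.66,
             4.24, 2.94, 3.36, 4.02, 2.74, 4.64]

print(round(pooled_t(spines, no_spines), 2),
      round(welch_t(spines, no_spines), 2))
c = cutoff(spines, no_spines)
for taped in [3.78, 4.09, 4.91, 3.96]:      # specimens 43-46
    print("spines predicted" if taped > c else "no spines predicted")
```

The t statistics would be referred to tables with n1 + n2 - 2 (pooled) or Welch-adjusted degrees of freedom; the misclassification probability of question 3 can be estimated from the overlap of the two groups about the cutoff.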
7.10 NEARSHORE-OFFSHORE SEDIMENTS

The data in Table 7.6 represent grain size analyses for 17 samples of recent sediments collected from two separate environments and four additional samples that have not been classified.

1. Plot these data on a triangular diagram and draw in by eye the line which "best" separates the two types of environments.

2. Calculate the linear discriminant and draw in the line that represents this function on the triangular diagram. What is the probability of misclassification?

3. Classify the four sample unknowns as being either nearshore or offshore.

TABLE 7.6

Sample       Sand    Silt    Clay

Nearshore sediments
 1            45      53       2
 2            92       8       0
 3            69      25       6
 4            75      25       0
 5            63      37       0
 6            42      54       4
 7            46      51       3

Offshore sediments
 8            36      60       4
 9            34      61       5
10             6      87       7
11             3      91       6
12             8      87       5
13            33      63       4
14            59      36       5
15            20      78       2
16            48      52       0
17             2      80      18

Sample unknowns
 A            19      71      10
 B            64      35       1
 C            33      55      12
 D            21      74       5
4. Suppose it is known that the four unknowns were collected from the same locality. How would you now classify the group of four unknowns?

REFERENCES

Rose, H. E., 1968, The determination of the grain-size distribution of a spherical granular material embedded in a matrix: Sedimentology, v. 10, p. 293-309.

Shaw, A., 1964, Time in stratigraphy: New York, McGraw-Hill Book Co., 365 p.

Simpson, J., 1967, Solar activity as a triggering mechanism for earthquakes: Earth and Planetary Sci. Letters, v. 3, p. 417-425.
Index

Adelman, I. G., 102
Agterberg, F. P., 102
Allegre, C., 102
Amorocho, J., 102
Anderson, T. W., 93, 101
Apparent diameter, 15
Babbage, C., 139, 145
Barometric pressure, 81
Bartlett, M. S., 102
Billings, G. K., 58
Billingsley, P., 103
Binomial distribution, 12
Biostratigraphic correlation, 157
Blakely, R. F., 98, 104
Blyth, C. R., 103
Bonham-Carter, G., 95, 96, 101
Breaker height, 85
Cameron, E. M., 58
Carozzi, A. V., 71, 89
Carr, D. D., 97, 98, 101
Chayes, F., 107, 110, 113, 114
Chester section, 97, 98
Clark, W. A. V., 103
Coastal processes, 86
Cocke, N. C., 104
Coefficient of variation, 108
COGEODATA, 145
Coleman, J. S., 103
Communality, 30, 41, 57
Computers, in geology, 141; usage, 139
Conditional probability, 4, 92
Correlation coefficient, 3, 24
Correlation matrix, 36, 42
Dacey, M. F., 95, 100, 101, 102
Data matrix, 22
Davis, R. A., Jr., 79, 88
Degens, E. T., 58
Demirmen, F., 58
Deterministic model, 91
Deviate score, 25
Diurnal variation, 84
Doob, J. L., 103
Earthquakes, 156
Edwards, W. R., 89
Eigenvalues, 35, 55, 68-69
Eigenvectors, 35, 42, 68-69
Embedded Markov chain, 97
Factor: common, 29; loadings, 29, 41, 57; meaning of, 21; rotation, 42; scores, 29
Factor analysis, 21
Factors, 40
Feller, W., 103
Finger, L., 112
Florida crown conch, 160
Fourier analysis, 72, 78; coefficients, 81; components, 81
Fox, W. T., 77, 79, 88
GEOCOM Bulletin, 146
Geometric distribution, 94
Geometric probability, 19
Gingerich, P. D., 103
Goodman, L. A., 93, 101
Goodness of fit, 77
Gower, J. C., 60
Graf, D. L., 103
Graybill, F. A., 103
Grid search, 8
Griffiths, J. C., 103
Groundwater level, 85
Hammer, P. C., 139, 140
Harbaugh, J. W., 48, 58, 60, 79, 80, 88, 95, 96, 101, 103, 139, 149
Harman, H. H., 43, 60
Hart, W. E., 102
Heller, R. A., 103
Helmert transformation, 158, 159
Henderson, J. H., 80, 88
Hitchon, B., 58
Hollerith, H., 139
Hubaux, A., 148, 149
IAMG, 146
Imbrie, J., 24, 48, 49, 51, 55, 58, 59, 60, 61
Independent-events model, 91, 95
Iterated moving average, 72, 74
James, W. R., 96, 101
Kaesler, R. L., 60
Kahn, J. S., 71, 88
Kaiser, H. F., 42, 61
Karlin, S., 103
Kemeny, J. G., 103
Kendall, M. G., 19, 70, 72, 73, 74, 75, 88
Kipp, N. G., 59
Klovan, J. E., 24, 58, 59, 60
Krumbein, W. C., 71, 88, 93, 95, 96, 98, 99, 100, 101, 102, 103, 104, 142, 145, 149
Lake Michigan, 84
Langbein, W. B., 104
Leopold, L. B., 104
Linear regression, 153
Lonka, A., 59
Loucks, D. P., 104
Louden, R. K., 80, 88
Lumsden, D. N., 104
Lynn, W. R., 104
Markov chain, 91, 92
Markovian clock, 92, 95
Matalas, N. C., 59, 104
Matrix algebra, primer on, 62-69
McCammon, R. B., 59, 60, 153, 159
McElroy, M. N., 60
McIntyre, D. B., 48, 61
Merriam, D. F., 79, 80, 88, 104, 139, 149
Miller, J. P., 104
Miller, R. L., 71, 88
Moran, P. A. P., 19
Moving average, 71
Nearshore-offshore sediments, 162
Nonstochastic variable, 90
Norman, C. E., 89
Oblique rotation, 54
Paleontologist's dilemma, 150
Parallel-line search, 7
Parent correlation, 107
Parker, M. A., 141, 149
Parker, R. H., 58
Pattison, A., 96, 102
Pearson, K., 61, 106, 114
Pettijohn, F. J., 71, 88
Phase shift, 80
Polynomial curve fitting, 72
Population: covariance, 107; mean, 108; variance, 107
Porosity data, 153, 155
Potter, P. E., 98, 104
Power spectrum, 80
Preston, F. W., 80, 88, 146, 149
Principal components, 31, 42
Probability tree, 95
Pseudorandom variable, 111
Purdy, E. G., 48, 60
Q-mode factor analysis, 24, 25
Random process, 90
Random variables, 90; sum of two, 6; transformation, 5
RANEX, 112
Rank of a matrix, 66-67
Ratio correlations, 106
Recursive mean, 1
Recursive variance, 2
Reiher, B. J., 59
R-mode factor analysis, 23
Robinson, G., 74, 75, 76, 77, 89
Robinson, S. C., 148, 149
Rocks in thin section, 13, 150
Rogers, A., 104
Rose, H. E., 19, 20, 150, 163
RTCRSM2, 111
Sample mean, 1
Sample space, 90
Sample variance, 2
Scheidegger, A. E., 104
Scherer, W., 104
Schwarzacher, W., 96, 102, 104
Search for dikes, 7
Shannon, C. E., 139
Sharp, E. R., 89
Shaw, A., 157, 163
Sheppard's formulas, 77, 78
Shinozuka, M., 103
Shreve, R. L., 104, 105
Similarity indices: coefficient of similarity, 48; correlation coefficient, 47; distance coefficient, 48
Simpson, J., 156, 163
Singer, G. A., 19, 20
Sloss, L. L., 142, 145, 149
Smart, J. S., 105
Snell, J. L., 103
Spearman, C., 61
Spencer, D. W., 58, 60
Spencer 15-term formula, 76, 77; 21-term formula, 76, 77
Spurious correlation, 109, 113
Standard deviation, 26
Standard form, 25
Stationarity, 3, 96
Stemmler, R. S., 103
Sunspots, 156
Sutton, R. G., 71, 88
Sylvester-Bradley, P. C., 146, 149
Thurstone, L. L., 61
Time series, 70
Transitional probability, 92
Transition matrix, 97; multistory, 98
Tukey, J. W., 111, 114
UNNO, 112
Upper Cambrian Riley Formation, 157
Van Andel, T. H., 58
Varimax rotation, 43
Vistelius, A. B., 76, 88, 105
Wahlstedt, W. J., 103
Waiting time, 93
Walker, R. G., 71, 88
Walpole, R. L., 71, 89
Watson, R. A., 105
Wave period, 85
Weiss, M. P., 71, 89
Whittaker, E. T., 74, 75, 76, 77, 89
Wickman, F. E., 105
Wind velocity, 85
Wolman, M. G., 104
Woolhouse 15-term formula, 75, 77
Zeller, E. J., 105
Zurich sunspot number, 156