Statistical Analysis Of Stochastic Processes In Time 1st Edition J K Lindsey
Statistical Analysis of Stochastic Processes in Time
Many observed phenomena, from the changing health of a patient to values on the
stock market, are characterised by quantities that vary over time: stochastic processes
are designed to study them. Much theoretical work has been done but virtually no
modern books are available to show how the results can be applied. This book fills
that gap by introducing practical methods of applying stochastic processes to an
audience knowledgeable only in the basics of statistics. It covers almost all aspects
of the subject and presents the theory in an easily accessible form that is highlighted
by application to many examples. These examples arise from dozens of areas, from
sociology through medicine to engineering. Complementing these are exercise sets,
making the book suitable for introductory courses in stochastic processes.
Software, provided within the freely available R system, enables the reader
to apply all the models presented.
J. K. LINDSEY is Professor of Quantitative Methodology, University of Liège. He
is the author of 14 books and more than 120 scientific papers.
CAMBRIDGE SERIES IN STATISTICAL AND PROBABILISTIC
MATHEMATICS
Editorial Board
R. Gill (Department of Mathematics, Utrecht University)
B. D. Ripley (Department of Statistics, University of Oxford)
S. Ross (Department of Industrial Engineering, University of California, Berkeley)
B. W. Silverman (St Peter’s College, Oxford)
M. Stein (Department of Statistics, University of Chicago)
This series of high-quality upper-division textbooks and expository monographs covers
all aspects of stochastic applicable mathematics. The topics range from pure and
applied statistics to probability theory, operations research, optimization, and
mathematical programming. The books contain clear presentations of new developments
in the field and also of the state of the art in classical methods. While emphasizing
rigorous treatment of theoretical methods, the books also contain applications and
discussions of new techniques made possible by advances in computational practice.
Already published
1. Bootstrap Methods and Their Application, by A. C. Davison and D. V. Hinkley
2. Markov Chains, by J. Norris
3. Asymptotic Statistics, by A. W. van der Vaart
4. Wavelet Methods for Time Series Analysis, by Donald B. Percival and
Andrew T. Walden
5. Bayesian Methods, by Thomas Leonard and John S. J. Hsu
6. Empirical Processes in M-Estimation, by Sara van de Geer
7. Numerical Methods of Statistics, by John F. Monahan
8. A User’s Guide to Measure Theoretic Probability, by David Pollard
9. The Estimation and Tracking of Frequency, by B. G. Quinn and E. J. Hannan
10. Data Analysis and Graphics using R, by John Maindonald and John Braun
11. Statistical Models, by A. C. Davison
12. Semiparametric Regression, by David Ruppert, M. P. Wand and R. J. Carroll
13. Exercises in Probability, by L. Chaumont and M. Yor
Statistical Analysis of Stochastic
Processes in Time
J. K. Lindsey
University of Liège
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Cambridge University Press
The Edinburgh Building, Cambridge CB2 2RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521837415
© Cambridge University Press 2004
This publication is in copyright. Subject to statutory exception and to the provision of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.
First published in print format 2004
ISBN-13 978-0-521-83741-5 hardback
ISBN-13 978-0-511-21194-2 eBook (EBL)
ISBN-10 0-521-83741-3 hardback
ISBN-10 0-511-21371-9 eBook (EBL)
Cambridge University Press has no responsibility for the persistence or accuracy of URLs
for external or third-party internet websites referred to in this publication, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents
Preface
Notation and symbols
Part I Basic principles
1 What is a stochastic process?
1.1 Definition
1.2 Dependence among states
1.3 Selecting models
2 Basics of statistical modelling
2.1 Descriptive statistics
2.2 Linear regression
2.3 Categorical covariates
2.4 Relaxing the assumptions
Part II Categorical state space
3 Survival processes
3.1 Theory
3.2 Right censoring
3.3 Interval censoring
3.4 Finite mixtures
3.5 Models based directly on intensities
3.6 Changing factors over a lifetime
4 Recurrent events
4.1 Theory
4.2 Descriptive graphical techniques
4.3 Counts of recurrent events
4.4 Times between recurrent events
5 Discrete-time Markov chains
5.1 Theory
5.2 Binary point processes
5.3 Checking the assumptions
5.4 Structured transition matrices
6 Event histories
6.1 Theory
6.2 Models for missing observations
6.3 Progressive states
7 Dynamic models
7.1 Serial dependence
7.2 Hidden Markov models
7.3 Overdispersed durations between recurrent events
7.4 Overdispersed series of counts
8 More complex dependencies
8.1 Birth processes
8.2 Autoregression
8.3 Marked point processes
8.4 Doubly stochastic processes
8.5 Change points
Part III Continuous state space
9 Time series
9.1 Descriptive graphical techniques
9.2 Autoregression
9.3 Spectral analysis
10 Diffusion and volatility
10.1 Wiener diffusion process
10.2 Ornstein–Uhlenbeck diffusion process
10.3 Heavy-tailed distributions
10.4 ARCH models
11 Dynamic models
11.1 Kalman filtering and smoothing
11.2 Hidden Markov models
11.3 Overdispersed responses
12 Growth curves
12.1 Characteristics
12.2 Exponential forms
12.3 Sigmoidal curves
12.4 Richards growth curve
13 Compartment models
13.1 Theory
13.2 Modelling delays in elimination
13.3 Measurements in two compartments
14 Repeated measurements
14.1 Random effects
14.2 Normal random intercepts
14.3 Normal random coefficients
14.4 Gamma random effects
References
Author index
Subject index
Preface
Throughout their history, human beings have been fascinated by time. Indeed, what
is history but an interpretation of time? Each civilisation has had its own special
conception of time. Our present anti-civilisation only knows ‘time is money’! No
one can deny that the study of time is important. This text attempts to make more
widely available some of the tools useful in such studies.
Thus, my aim in writing this text is to introduce research workers and students
to ways of modelling a wide variety of phenomena that occur over time. My goal
is explicitly to show the broadness of the field and the many inter-relations within
it. The material covered should enable mathematically literate scientists to find
appropriate ways to handle the analysis of their own specific research problems. It
should also be suitable for an introductory course on the applications of stochastic
processes. It will allow the instructor to demonstrate the unity of a wide variety of
procedures in statistics, including connections to other courses. If time is limited,
it will be possible to select only certain chapters for presentation.
No previous knowledge of stochastic processes is required. However, an
introductory course on statistical modelling, at the level of Lindsey (2004), is a
necessary prerequisite. Although not indispensable, it may be helpful to have more
extensive knowledge of several areas of statistics, such as generalised linear and
categorical response models. Familiarity with classical introductory statistics courses
based on point estimation, hypothesis testing, confidence intervals, least squares
methods, personal probabilities, . . . will be a definite handicap.
Many different types of stochastic processes have been proposed in the
literature. Some involve very complex and intractable distributional assumptions. Here,
I shall restrict attention to a selection of the simpler processes, those for which
explicit probability models, and hence likelihood functions, can be specified and
which are most useful in statistical applications modelling empirical data. More
complex models, including those requiring special estimation techniques such as
Markov chain Monte Carlo, are beyond the scope of this text. Only parametric
models are covered, although descriptive ‘nonparametric’ procedures, such as the
Kaplan–Meier estimates, are used for examining model fit.
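As an illustration of the descriptive procedure just mentioned, here is a minimal Python sketch of the Kaplan–Meier (product-limit) estimate. It is not the author's R code. The function name `kaplan_meier` and the toy data are hypothetical. It assumes right-censored data given as times plus an indicator that is 1 for an observed event and 0 for censoring. It applies the product-limit formula S(t) = ∏ (1 − d_j / n_j) over event times t_j ≤ t, where d_j is the number of events at t_j and n_j the number still at risk just before t_j.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- observed times (event or censoring)
    events -- 1 if the time is an observed event, 0 if right-censored
    Returns a list of (t_j, S(t_j)) at each distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        # n_j: units still at risk just before t (those censored at t are included)
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve

times = [2, 3, 3, 5, 6, 8]      # hypothetical durations
events = [1, 1, 0, 1, 0, 1]     # 0 marks a censored observation
for t, s in kaplan_meier(times, events):
    print(t, round(s, 4))
```

Censored units leave the risk set without producing a drop in the curve. This is what makes the estimate a useful model-free benchmark against which a fitted parametric survival function can be compared.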
The availability of explicit probability models is important for at least two
reasons:
(i) Probability statements can be made about observable data, including the
observed data:
(a) A likelihood is available for making inferences.
(b) Predictions can be made.
(ii) If the likelihood can be calculated, models can be compared to see which
best fits the data, instead of making empty claims about wonderful models
with no empirical basis, as is most often done in the statistical literature.
Isolated from a probability model basis, parameter estimates, with their standard
errors, are of little scientific value.
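To make point (ii) concrete, here is a minimal Python sketch, not taken from the book, that compares two explicit probability models for one binary series by their maximised log-likelihoods. The models are independent Bernoulli responses and a two-state first-order Markov chain. The series is invented for illustration. Both likelihoods are taken conditionally on the first response so that they are comparable. The penalised comparison uses the usual AIC scale, 2 × parameters − 2 × log-likelihood, as one common choice.

```python
import math

def loglik_bernoulli(y):
    """Maximised log-likelihood for independent Bernoulli responses (1 parameter)."""
    n1 = sum(y)
    n0 = len(y) - n1
    p = n1 / len(y)
    # terms with zero counts contribute nothing (0 * log 0 is taken as 0)
    return sum(n * math.log(q) for n, q in ((n1, p), (n0, 1 - p)) if n > 0)

def loglik_markov(y):
    """Maximised log-likelihood for a two-state first-order Markov chain,
    conditional on the first response (2 parameters: Pr(1 | 0) and Pr(1 | 1))."""
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for prev, cur in zip(y[:-1], y[1:]):
        counts[(prev, cur)] += 1
    ll = 0.0
    for a in (0, 1):
        total = counts[(a, 0)] + counts[(a, 1)]
        for b in (0, 1):
            n = counts[(a, b)]
            if n > 0:
                ll += n * math.log(n / total)  # MLE of Pr(b | a) is n / total
    return ll

def aic(loglik, n_params):
    """Akaike information criterion on the usual scale: smaller is better."""
    return 2 * n_params - 2 * loglik

# Hypothetical binary series showing runs of identical responses
y = [0] * 5 + [1] * 5 + [0] * 5 + [1] * 5

ll_indep = loglik_bernoulli(y[1:])  # condition on y[0], as the Markov chain does
ll_markov = loglik_markov(y)
print("independence:", round(ll_indep, 3), "AIC", round(aic(ll_indep, 1), 3))
print("Markov chain:", round(ll_markov, 3), "AIC", round(aic(ll_markov, 2), 3))
```

The independence model is the special case of the chain with Pr(1 | 0) = Pr(1 | 1). The chain's maximised likelihood can therefore never be lower. The penalty decides whether the extra parameter is worth it.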
Many standard models, such as those for survival, point processes, Markov
chains, and time series, are presented. However, because of the book’s wide scope,
it naturally cannot cover them in as great a depth as a book dedicated to only one of
them. In addition, certain areas, such as survival analysis and time series, occupy
vast literatures to which complete justice cannot be done here. Thus, in order to
provide a reasonably equitable coverage, these two topics are explored especially
briefly; the reader can consult a good introductory text on either of these topics for
additional details.
Many basic theoretical results are presented without proof. The interested reader
can pursue these in further detail by following up the ‘Further reading’ list at the
end of each chapter. On the other hand, for readers primarily interested in
developing appropriate stochastic models to apply to their data, the sections labelled
‘Theory’ can generally be skimmed or skipped and simply used as a reference
source when required for deeper understanding of their applications.
Stochastic processes usually are classified by the type of recording made, that
is, whether they are discrete events or continuous measurements, and by the time
spacing of recording, that is, whether time is discrete or continuous. Applied
statisticians and research workers usually are particularly interested in the type
of response, so I have chosen the major division of the book in this way,
distinguishing between categorical events and continuous measurements. Certain
models, such as Markov chains using logistic or log linear models, are limited to
discrete time, but most of the models can be applied in either discrete or continuous
time.
Classically, statistics has distinguished between linear and nonlinear models,
primarily for practical reasons linked with numerical methods and with inference
procedures. With modern computing power, such a distinction is no longer
necessary and will be ignored here. The main remaining practical difference is that
nonlinear models generally require initial values of parameters to be supplied in
the estimation procedure, whereas linear models do not.
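The practical difference mentioned in the last sentence can be illustrated with a minimal Python sketch. It is not the book's R software, and the functions and data are hypothetical. A straight line has a closed-form least-squares solution. A nonlinear mean, such as the exponential curve μ(x) = exp(θx), is fitted iteratively, here by Gauss–Newton, which must be started from a user-supplied initial value `theta0`.

```python
import math

def fit_line(x, y):
    """Closed-form least squares for y = a + b x: no starting values needed."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def fit_exponential(x, y, theta0, n_iter=50, tol=1e-12):
    """Gauss-Newton least squares for y = exp(theta * x), started at theta0."""
    theta = theta0
    for _ in range(n_iter):
        fitted = [math.exp(theta * xi) for xi in x]
        grad = [xi * fi for xi, fi in zip(x, fitted)]  # derivative of fitted wrt theta
        step = (sum(g * (yi - fi) for g, yi, fi in zip(grad, y, fitted))
                / sum(g * g for g in grad))
        theta += step
        if abs(step) < tol:
            break
    return theta

x = [0.0, 1.0, 2.0, 3.0]
y_lin = [1.0, 3.0, 5.0, 7.0]                  # exactly 1 + 2x
y_exp = [math.exp(0.5 * xi) for xi in x]      # exactly exp(0.5 x)
print(fit_line(x, y_lin))                     # (1.0, 2.0)
print(fit_exponential(x, y_exp, theta0=0.1))  # converges to about 0.5
```

A starting value far from the solution may make such iterations diverge or overflow. This is why sensible initial values matter for nonlinear models but never arise for the linear one.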
It is surprisingly difficult to find material on fitting stochastic models to data.
Most of the literature concentrates either on the behaviour of stochastic models
under specific restrictive conditions, with illustrative applications rarely involving
real data, or on the estimation of some asymptotic statistics, such as means or
variances. Unavoidably, most of the references for further reading given at the
ends of the chapters are at a much more difficult level than the present text.
My final year undergraduate social science students have helped greatly in
developing this course over the past 25 years. The early versions of the course were
based on Bartholomew (1973), but, at that time, it was very difficult or
impossible actually to analyse data in class using those methods. However, this rapidly
evolved, eventually to yield Lindsey (1992). The present text reflects primarily the
more powerful software now available. Here, I have supplemented the contents of
my current course with extra theoretical explanations of the stochastic processes
and with examples drawn from a wide variety of areas besides the social sciences.
Thus, I provide the analysis of examples from many areas, including botany
(leaf growth), criminology (recidivism), demography (migration, human
mortality), economics and finance (capital formation, share returns), education
(university enrolment), engineering (degradation tests, road traffic), epidemiology (AIDS
cases, respiratory mortality, spermarche), industry (mining accidents), medicine
(blood pressure, leukæmia, bladder and breast cancer), meteorology
(precipitation), pharmacokinetics (drug efficacy, radioactive tracers), political science
(voting behaviour), psychology (animal learning), sociology (divorces, social
mobility), veterinary science (cow hormones, sheep infections), and zoology (locust
activity, nematode control). Still further areas of application are covered in the
exercises.
The data for the examples and exercises, as well as the R code for all of the
examples, can be found at popgen0146uns50.unimaas.nl/~jlindsey,
along with the required R libraries. With this material, the reader can see exactly
how I performed the analyses described in the text and adapt the code to his or her
particular problems.
This text is not addressed to probabilists and academic statisticians, who will
find the definitions unrigorous and the proofs missing. Rather, it is aimed at the
scientist, looking for realistic statistical models to help in understanding and
explaining the specific conditions of his or her empirical data. As mentioned above,
the reader primarily interested in applying stochastic processes can omit reading
the theory sections and concentrate on the examples. When necessary, reference
can then be made to the appropriate parts of theory.
I thank Bruno Genicot, Patrick Lindsey, and Pablo Verde who provided useful
comments on earlier versions of certain chapters of this text.
Notation and symbols
Notation is generally explained when it is first introduced. However, for reference,
some of the more frequently used symbols are listed below.
Vectors are bold lower case and matrices bold upper case Greek or Roman
letters. denotes the transpose of a vector or matrix.
arbitrary indices
, random response variable and its observed value
, time
lag
sum of random variables
, explanatory variables
, number of events
∆ , ∆ change in number of events
∆ interval width
previous history
location parameter
2 variance
probability (usually binary)
change point parameter
, random parameters
, , , , , , , , , arbitrary parameters
(auto)correlation or other dependence parameter
order of a Markov process
length of a series
Pr probability of response
probability density function
cumulative distribution function
survival function
probability of a random parameter
arbitrary regression function
link function
Λ integrated intensity (function)
intensity (function)
E expected value
(auto)covariance (function)
covariance matrix
I indicator function
beta function
Γ gamma function
L likelihood function
transition (probability) matrix
transition intensity matrix
marginal or conditional probability distribution
first passage distribution
diagonal matrix of probabilities
diagonal matrix of eigenvalues
matrix of eigenvectors
vector of deterministic input
vector of random input
Part I
Basic principles
1
What is a stochastic process?
Intuitively, a stochastic process describes some phenomenon that evolves over time
(a process) and that involves a random (a stochastic) component. Empirically, we
observe such a process by recording values of an appropriate response variable
at various points in time. In interesting cases, the phenomenon under study will
usually depend on covariates. Some of these may also vary over time, whereas
others will define the differing (static) conditions under which various ‘copies’ of
the process occur.
1.1 Definition
Thus, a stochastic process involves some response variable, say $Y_t$, that takes
values varying randomly in some way over time (or space, although that will not be
considered here). $Y_t$ may be a scalar or a vector, but I shall concentrate primarily
on scalar responses in this text (see, however, Section 8.3). Generally, in the study
of such processes, the term ‘random’ is replaced by ‘stochastic’; hence, the name.
An observed value or realisation $y_t$ of the response variable $Y_t$ is called the state
of the process at time $t$. We might call an observation of the state of a process an
event. However, I shall restrict the meaning of event to the occurrence of a change
of state. Thus, the number of possible different events will depend, among other
things, on the number of distinct states.
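The distinction between observing a state and recording an event can be illustrated with a minimal Python sketch. The function name and the data are hypothetical. Given the states recorded at successive observation times, an event is recorded only where the state changes.

```python
def events_from_states(times, states):
    """Return (time, old_state, new_state) for each change of state.

    times  -- observation times, in increasing order
    states -- the state observed at each of those times
    """
    events = []
    for t, prev, cur in zip(times[1:], states[:-1], states[1:]):
        if cur != prev:  # an event is a change of state, not a mere observation
            events.append((t, prev, cur))
    return events

# Hypothetical monthly employment status of one person
months = [1, 2, 3, 4, 5, 6]
status = ["employed", "employed", "unemployed", "unemployed", "unemployed", "employed"]
print(events_from_states(months, status))
# [(3, 'employed', 'unemployed'), (6, 'unemployed', 'employed')]
```

Six observations of the state yield only two events here. With two distinct states, only two kinds of event are possible.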
More generally, the probability of the process being in some given state at some
point in time may depend on some function of previous events and of covariates.
Usually, the probabilities of possible events will be conditional on the state of the
process. Such relationships will thus be determined by the type of model being
fitted.
The main properties distinguishing among observed stochastic processes are:
(i) The frequency or periodicity with which observations are made over time.
(ii) The set of all of its possible observable values, that is, of all possible
responses or states of the series, called the state space.
(iii) The sources and forms of randomness present, including the nature of the
dependence among the values in a series of realisations of the random
variable $Y_t$.
(iv) The number of ‘copies’ of the process available (only one or several), which
will determine how adequate information can be obtained for modelling.
Let us look more closely at each of these aspects.
1.1.1 Time
Observations of a stochastic process (at least of the kinds that interest us here) are
made over time. If these observations are at equally spaced intervals, time is said
to be discrete. Otherwise, time is continuous. Notice, however, that a process can
never really be observed continuously because that would imply an infinite number
of observations, even in a small interval of time. Hence, the distinction is primarily
used to determine what kind of model will be applied. Continuous-time models
can be used for any data but may be more complex, and may not be appropriate
if changes can only occur at discrete time points. Discrete-time models require
equally-spaced observations, unless some simple mechanism for missingness can
be introduced.
Attention may centre on
(i) the states at given points in time,
(ii) the events, that is, on what change of state occurs at each particular time
point, or
(iii) the times of events.
Consider simple examples:
When an economist measures monthly unemployment, time is only an
equally-spaced indicator of when observations are made. The number of unemployed
may change at each monthly recording. Either the level of employment (the
state) or the amount of change (the event) may be of central interest. Time itself
is not of direct concern except for ordering the observations.
In contrast, when a doctor records the times of a patient’s repeated infections,
the state might be defined to be the total number of such infections so far suf-
fered by that patient. Each observation (event) is the same, an infection, and the
time between these events is essential. With substantial loss of information, this
could be reduced to discrete time by recording only the numbers of infections in
equally-spaced time intervals.
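The reduction to discrete time described in the last example can be sketched in Python. The infection times, the 30-day interval width and the function name are invented for illustration. The sketch counts events in equally spaced intervals [k·width, (k+1)·width) starting at time 0.

```python
import math

def counts_in_intervals(event_times, width, end):
    """Number of events in each interval [k*width, (k+1)*width), for times in [0, end)."""
    n_bins = math.ceil(end / width)
    counts = [0] * n_bins
    for t in event_times:
        if 0 <= t < end:
            counts[int(t // width)] += 1
    return counts

# Hypothetical infection times (in days) for one patient over 360 days
infections = [12.5, 40.0, 41.5, 200.0, 310.2]
print(counts_in_intervals(infections, width=30, end=360))
# [1, 2, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
```

The loss of information is visible directly. The two infections at days 40 and 41.5 become a single count of 2 for the second interval, indistinguishable from two infections almost a month apart.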
A response may only be recorded when some specific event of interest occurs.
However, in order to determine the timing of that event, fairly continual
observation of the process usually is necessary, that is, a series of intermediate, implicit
recordings of no event. If observation begins at some natural time point, such as
birth, at which no response value occurs, the mechanism determining the time to
the first event will usually be different from that between subsequent events.
1.1.2 State space
At any given point in time, a process will be in some state. This usually is observed
by recording the value of a response variable at that point $t$. As always in
statistical modelling, the set of possible states is a scientific construct that should
be defined in a way appropriate to answer the questions at hand. Although, in
principle, the state is what is being observed, certain models also assume a second
process of unobservable hidden states (Chapters 7 and 11).
The set of all possible observable states is called the state space. This may be
finite, yielding a categorical response variable, or infinite, giving either a discrete
or a continuous response variable. Generally, a minimal state space is chosen, one
in which the probability (density) of every state is nonzero.
If an observed response variable were truly continuous, every recording would
be an event because no two would be the same. However, this is empirically
impossible, so any observable process could possibly stay in the same state (given
the limit of precision of the recording instrument) over two or more consecutive
observation points.
A categorical response usually refers to a finite set of possible different states that
may be observed. However, when only one type of response is of particular interest
and it is fairly rare, the response might be recorded as binary, indicating presence
(coded 1) or absence (coded 0) of that response. This particular response, with
a binary state space, is often called a point event or a recurrent event (Chapter 4);
thus, here the term ‘event’ refers both to one of the states and to the change of state.
Above, I gave an example of repeated infections, where the cumulative number of
infections was important. If, instead, repeated epileptic fits were recorded, the
states might more appropriately be defined as having a fit or not on each particular
day instead of the total number of fits so far.
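The two ways of defining the state for such recurrent events can be compared in a short Python sketch. The daily record is hypothetical. The same record of epileptic fits yields either a binary state for each day or a cumulative-count state.

```python
from itertools import accumulate

# Hypothetical record: 1 if a fit occurred on that day, 0 otherwise
fit_today = [0, 1, 1, 0, 0, 1, 0]

# Binary state space {0, 1}: the state is simply the daily indicator
binary_states = fit_today

# Count state space {0, 1, 2, ...}: the state is the total number of fits so far
cumulative_states = list(accumulate(fit_today))

print(binary_states)      # [0, 1, 1, 0, 0, 1, 0]
print(cumulative_states)  # [0, 1, 2, 2, 2, 3, 3]
```

Under the binary definition, the fits on the second and third days involve no change of state between them. Under the cumulative definition, every fit is a change of state.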
In certain situations, the state may be defined by more than one response value,
that is, $Y_t$ may be a vector containing quite distinct types of values. Thus, there
may be a categorical value, such as a binary indicator that it rains or not,
accompanied by a (usually quantitative) value, the mark, for example, how much rain fell
(Section 8.3).
A vector also is usually necessary when there are endogenous time-varying
covariates. These are variables, other than the response of direct interest, that are
influenced by the previous states of that response. Suppose, for example, that
the condition of a patient, measured in some appropriate way, is the response of
direct interest. If the dose of a medication that a patient receives depends upon
his or her previous condition, then dose and condition will generally have to be
handled simultaneously. They must together define the state and be allowed to
vary stochastically interdependently, if a reasonable model is to be constructed.
A process may also involve time-varying exogenous covariates, that is, variables
not influenced by the previous states of the process. The stochastic variability of
such covariates is not usually of interest, so that the probability of the state can be
taken to be conditional on their observed values, as in classical regression models.
Practically, we see from this section and the preceding one that models for
stochastic processes can be classified by the type of variable observed, that is,
the state space, and by the frequency or regularity with which it is observed over
time. The structure of this text according to these criteria is summarised in Table
1.1. Recall, however, that most models in continuous time also can be applied to
observations in discrete time.
Table 1.1. Chapters in which different types of stochastic processes are covered.
                             State space
Time          Categorical         Continuous
Discrete      5, 8                9
Continuous    3, 4, 6, 7, 8       9, 10, 11, 12, 13, 14
1.1.3 Randomness
In a deterministic process, one can predict the exact sequence of outcomes (states)
from the initial conditions, although some, especially chaotic, systems are
extremely sensitive to these conditions. In contrast, a stochastic process has an
inherent component of unpredictability or randomness. In empirical observation,
randomness is closely associated with unknownness or incomplete information.
To handle such situations, the theory of probability is used. Thus, predictions
involving stochastic processes will not indicate specific outcomes, but only the
probabilities of occurrence of the different possible outcomes.
A stochastic process may involve many forms of randomness. A number will
arise from the imperfections of the observation and modelling procedures.
However, in scientific modelling, the most important should be inherent in the process
under study.
The types of randomness that need to be allowed for in modelling a stochastic
process can have several sources including the following:
(i) Unmeasurable variability is built into many scientific systems. This is true
of quantum mechanics, but also of most biological and social processes.
(ii) Almost invariably, not all of the measurable factors determining the process
can be taken into account. For example, unrecorded environmental
conditions may change over the period in which the process is acting.
(iii) The initial conditions may be very difficult to determine precisely.
Unfortunately, an inappropriate model can also generate additional spurious ran-
domness and dependencies.
Traditionally, essentially for mathematical simplicity, the Poisson, binomial, and
normal distributions have been used in modelling the randomness of stochastic
processes. However, we shall see as we proceed that a wide variety of other
distributions may be more suitable in specific circumstances.
1.1.4 Stationarity, equilibrium, and ergodicity
The series of responses of a stochastic process usually will not be independent.
Thus, procedures must be available to introduce appropriate dependencies.
Because complex models should generally be avoided in science, relatively simple
methods for introducing such dependencies are desirable.
Multivariate distributions
Multivariate probability distributions provide the general framework in which to
specify the ways in which responses are interdependent. Unfortunately, in the
context of stochastic processes, these may be difficult to use, especially if long
series of observations are available. Thus, with a series of $n$ observations, a model
involving a multivariate normal distribution will require the manipulation of an
$n \times n$ covariance matrix. When $n$ is large, this will often not be practical or
efficient, at least for technical reasons.
Fortunately, for an ordered series of responses, such as those that interest us,
the multivariate distribution always can be decomposed into an ordered sequence
of independent conditional univariate distributions:
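$$f(y_0, y_1, \dots, y_R) = f_0(y_0)\, f_1(y_1 \mid y_0) \cdots f_R(y_R \mid y_0, \dots, y_{R-1}) \qquad (1.1)$$
where
$$f_t(y_t \mid y_0, \dots, y_{t-1}) = \frac{f(y_0, \dots, y_t)}{f(y_0, \dots, y_{t-1})} \qquad (1.2)$$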
Notice that each conditional distribution $f_t(y_t \mid y_0, \dots, y_{t-1})$ may be completely
different from all the others, even though the multivariate distributions for
different lengths of series will usually have the same form.
Generally, it will be easier to work with the conditional distributions than the
multivariate one. For the multivariate normal distribution, this will be a series of
univariate normal distributions, not involving directly the covariance matrix. I shall
elaborate on this approach in Section 1.2 below.
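Equations (1.1) and (1.2) can be checked numerically on a small discrete example. The Python sketch below uses an arbitrary, invented joint distribution for three binary responses y_0, y_1, y_2. It forms each conditional distribution as a ratio of marginals, as in (1.2), and confirms that their product, as in (1.1), recovers every joint probability. The helper names are hypothetical.

```python
from itertools import product

# An arbitrary joint distribution for (y0, y1, y2), each binary (sums to 1)
joint = dict(zip(product((0, 1), repeat=3),
                 [0.20, 0.05, 0.10, 0.15, 0.05, 0.10, 0.05, 0.30]))

def marginal(prefix):
    """f(y_0, ..., y_t): sum the joint over all responses after the prefix."""
    return sum(p for ys, p in joint.items() if ys[:len(prefix)] == prefix)

def conditional(prefix):
    """f_t(y_t | y_0, ..., y_{t-1}) = f(y_0, ..., y_t) / f(y_0, ..., y_{t-1}), Eq. (1.2)."""
    return marginal(prefix) / marginal(prefix[:-1])

for ys, p in joint.items():
    # Eq. (1.1): f(y0, y1, y2) = f_0(y0) f_1(y1 | y0) f_2(y2 | y0, y1)
    decomposed = marginal(ys[:1]) * conditional(ys[:2]) * conditional(ys)
    assert abs(decomposed - p) < 1e-12
print("decomposition (1.1) reproduces all", len(joint), "joint probabilities")
```

Nothing beyond the ordering of the responses was assumed. This matches the remark that the decomposition is always available.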
In contrast to the multivariate distribution, and its univariate conditional
decomposition, the series of univariate marginal distributions, although often of interest,
cannot by itself specify a unique stochastic process (unless all successive states are
independent). It is generally of limited direct use in constructing a realistic model
and will, most often, rather be a byproduct of the construction. On the other hand,
this sequence of univariate marginal distributions of a stochastic process does
provide valuable information indicating how the process is evolving over time: its
‘underlying’ profile or trend.
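The point that univariate marginal distributions cannot by themselves specify the process can be illustrated with two hypothetical binary processes in Python. The first has independent responses with Pr(Y_t = 1) = 0.5. The second is a two-state Markov chain that stays in its current state with probability 0.9 and starts with Pr(Y_0 = 1) = 0.5. Every marginal is the same, but the joint distributions differ sharply.

```python
from itertools import product

def joint_pmf(p_start, transition, n):
    """Joint probabilities of all binary sequences of length n for a Markov chain.

    p_start    -- Pr(Y_0 = 1)
    transition -- transition[a][b] = Pr(Y_t = b | Y_{t-1} = a)
    """
    pmf = {}
    for ys in product((0, 1), repeat=n):
        p = p_start if ys[0] == 1 else 1 - p_start
        for a, b in zip(ys[:-1], ys[1:]):
            p *= transition[a][b]
        pmf[ys] = p
    return pmf

def marginal_prob_one(pmf, t):
    """Pr(Y_t = 1), obtained by summing the joint distribution."""
    return sum(p for ys, p in pmf.items() if ys[t] == 1)

n = 4
independent = joint_pmf(0.5, [[0.5, 0.5], [0.5, 0.5]], n)  # independence as a special case
sticky = joint_pmf(0.5, [[0.9, 0.1], [0.1, 0.9]], n)       # strongly dependent chain

for t in range(n):
    print(t, marginal_prob_one(independent, t), marginal_prob_one(sticky, t))
print(independent[(1, 1, 1, 1)], sticky[(1, 1, 1, 1)])
```

Both processes have the same flat ‘profile’ of marginal probabilities. A run of four 1s, however, is nearly six times as probable under the dependent chain.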
In the decomposition in Equation (1.1), no restrictive assumptions have been
made (except ordering). However, in order for such a multivariate model to be
tractable, even fitted conditionally, rather strong assumptions often have to be
made. The problem with the general specification just given is that every response
has a different conditional distribution, and each new response will have yet an-
other one. The situation is changing faster than information can be accumulated
about it! Thus, we require some reasonable simplifying assumptions.
8 What is a stochastic process?
Stationarity
A stochastic process is said to be strictly stationary if all sequences of consecutive
responses of equal length in time have identical multivariate distributions
$$f(y_t, y_{t+1}, \ldots, y_{t+k}) = f(y_{t+h}, y_{t+h+1}, \ldots, y_{t+h+k}) \tag{1.3}$$
for all $t$, $h$, and $k$. In other words, shifting a fixed-width time observation window
along a strictly stationary series always yields the same multivariate distribution.
Such an assumption reduces enormously the amount of empirical information nec-
essary in order to model a stochastic process.
A less restrictive assumption, reducing even further the amount of information
required, is that a process is second-order stationary. This is defined only by the
mean, variance, and covariances:
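$$\begin{aligned} \mathrm{E}[Y_t] &= \mu \\ \mathrm{E}[(Y_t - \mu)(Y_{t+k} - \mu)] &= \gamma_k \end{aligned} \tag{1.4}$$
for all $t$ and $k$, where $\gamma_0 = \sigma^2$ is the variance. Because a multivariate normal distribution is completely defined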
by its first two moments, if the process is normal and second-order stationary, it
is strictly stationary. This is not generally true of other distributions. In this text,
stationarity will always be strict.
Stationarity is a characteristic of multivariate distributions. Thus, from Equation
(1.1), it cannot be determined solely by the conditional distributions, but requires
also that the initial marginal distribution $f(y_0)$ be specified. Of course, this, in
turn, implies that the univariate marginal distributions at all other time points will
also be known.
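The role of the initial distribution can be illustrated with a small simulation. The sketch below assumes a normal first-order autoregression, $y_t = \rho y_{t-1} + e_t$, chosen purely for illustration. Both sets of replications share exactly the same conditional distributions; only the distribution of $y_0$ differs. When $y_0$ is drawn from the stationary marginal $N(0, \sigma^2/(1-\rho^2))$, the marginal mean and variance stay constant over time. When every series starts at $y_0 = 5$, they change with $t$, so the process is not stationary.

```python
import numpy as np

def simulate_ar1(y0, rho, sigma, n_steps, rng):
    """Simulate replications of y_t = rho * y_{t-1} + e_t, e_t ~ N(0, sigma^2).

    y0 is an array of starting values, one per replication; the conditional
    distribution f(y_t | y_{t-1}) is identical in every replication.
    """
    y = np.empty((len(y0), n_steps + 1))
    y[:, 0] = y0
    for t in range(1, n_steps + 1):
        y[:, t] = rho * y[:, t - 1] + rng.normal(0.0, sigma, size=len(y0))
    return y

rng = np.random.default_rng(42)
rho, sigma, reps, steps = 0.8, 1.0, 20000, 10
stat_var = sigma**2 / (1 - rho**2)     # variance of the stationary marginal

# (a) initial marginal equal to the stationary one
a = simulate_ar1(rng.normal(0.0, np.sqrt(stat_var), reps), rho, sigma, steps, rng)
# (b) same conditional distributions, but every series starts at y_0 = 5
b = simulate_ar1(np.full(reps, 5.0), rho, sigma, steps, rng)

for t in (0, 1, 5, 10):
    # columns: t, mean and variance for (a), mean and variance for (b)
    print(t, a[:, t].mean().round(2), a[:, t].var().round(2),
          b[:, t].mean().round(2), b[:, t].var().round(2))
```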
Stationarity can be an appropriate assumption if the stochastic process has no
inherent time origin. However, in experimental situations, for example, where
treatments are applied at a specific time point, this will not be true. The need
for greater information due to lack of stationarity can often be compensated for by
studying replications of the process (Section 1.1.5).
Equilibrium
Although a process may not be stationary when it starts, it may reach an equilib-
rium after a sufficiently long time, independent of the initial conditions. In other
words, if an equilibrium has been reached, the probability that the process is in
each given state, or the proportion of time spent in each state, has converged to a
constant that does not depend on the initial conditions. This generally implies that
eventually the process comes close to a stationary situation in the sense that,
if it initially had the equilibrium distribution of states, it would be stationary. (See
Cox and Miller, 1965, pp. 9, 272.)
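For a finite Markov chain, equilibrium can be seen directly from powers of the transition matrix. The following sketch uses a hypothetical two-state chain; the transition probabilities are assumptions chosen for illustration. Each row of $P^n$ gives the distribution of the state after $n$ steps from a fixed starting state, and the rows converge to the same vector whatever the start. Starting the chain in that equilibrium distribution leaves it unchanged, so the process is then stationary.

```python
import numpy as np

# Hypothetical two-state Markov chain (states 0 and 1); row i holds
# Pr(next state = j | current state = i).
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Distribution of the state after n steps, starting surely in state 0 or in state 1.
for n in (1, 5, 20, 50):
    Pn = np.linalg.matrix_power(P, n)
    print(n, Pn[0].round(4), Pn[1].round(4))   # the two rows converge

# Equilibrium distribution: left eigenvector of P for eigenvalue 1, normalised.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()
print(pi)        # approximately (0.75, 0.25)

# Starting from pi, the distribution does not change from step to step.
print(pi @ P)
```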
Ergodicity
The concept of ergodicity is closely related to that of equilibrium, although the for-
mer has various meanings in the literature on stochastic processes. Ergodic theo-
rems provide identities between probability averages, such as an expected value,
and long-run averages over a single realisation of the process. Thus, if the equi-
librium probability of being in a given state equals the proportion of a long time
period spent in that state, this is called an ergodic property of the process. In a
similar way, the law of large numbers can be generalised to stochastic processes.
(See Cox and Miller, 1965, p. 292.)
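An ergodic property of this kind can be checked by simulation. The sketch below, assuming the same hypothetical two-state chain with equilibrium probabilities 0.75 and 0.25, generates one long realisation and compares the proportion of time spent in each state with those probabilities. The function `simulate_chain` is an illustrative helper, not a library routine.

```python
import numpy as np

def simulate_chain(P, start, n_steps, rng):
    """Simulate one realisation of a finite Markov chain with transition matrix P."""
    cum = np.cumsum(P, axis=1)
    cum[:, -1] = 1.0                    # guard against rounding in the row sums
    u = rng.random(n_steps)             # uniform draws on [0, 1)
    states = np.empty(n_steps, dtype=int)
    s = start
    for t in range(n_steps):
        states[t] = s
        # next state: first j whose cumulative probability in row s exceeds u
        s = int(np.searchsorted(cum[s], u[t], side="right"))
    return states

P = np.array([[0.9, 0.1],
              [0.3, 0.7]])              # hypothetical chain; equilibrium (0.75, 0.25)
rng = np.random.default_rng(0)
path = simulate_chain(P, start=1, n_steps=100_000, rng=rng)

# Long-run proportion of time spent in each state along the single realisation.
time_avg = np.bincount(path, minlength=2) / len(path)
print(time_avg)   # close to the equilibrium probabilities 0.75 and 0.25
```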
Regeneration points
Another important concept for some stochastic processes is that of a regeneration
point. This is a time instant at which the process returns to a specific state such
that future evolution of the process does not depend on how that state was reached.
In other words, whenever such a process arrives at a regeneration point, all of its
previous history is forgotten.
A well-known case is the renewal process (Section 4.1.4) describing times between
recurrent events which, as its name suggests, starts over again at each such
event. (See Daley and Vere-Jones, 1988, p. 13.)
I shall look at further general procedures for simplifying models of stochastic
processes in Section 1.2 below.
1.1.5 Replications
When studying a stochastic process, two approaches to obtaining adequate infor-
mation can be envisaged. One can either observe
(i) one series for a long enough period, if it is reasonably stable, or
(ii) several short ‘replications’ of the process, if they are reasonably similar.
In certain situations, one has no choice but to use replications. Thus, for example,
with survival data (Chapter 3), one single event, say death, terminates the process
so that the only way to proceed is by collecting information on a large number of
individuals and by assuming that the process is identical for all of them.
Both approaches can create problems. The phenomenon under study may not be
stable enough to be observed over a very long time, say due to problems of lack
of stationarity as discussed above. It may only be possible to assume that shorter
segments of a series are from the same stochastic process. On the other hand, with
replications, one must be able to assume that the different series recorded do, in
fact, represent the same stochastic process. In certain situations, such as survival
data, it may not even be possible to check this strong assumption empirically.
When replications of a stochastic process are modelled, extreme care must be
taken with the time scale. If it is not chronological time, problems may arise. For
example, in experiments with biological organisms, time may be measured either
from birth or from the start of a treatment. If births do not all occur simultaneously and
treatment is not started at the same time for all subjects, events that occur at similar
times after beginning treatment may occur neither closely together chronologically
nor at similar ages. This can create difficult problems of interpretation due to
confounding.
In the examples that I shall analyse in this text, I shall use either one long series
or a set of several short ones, depending both on the type of problem and on the
kind of data available. Generally, in the second case, when replications are present,
I shall assume, for simplicity, that they come from the same process, perhaps with
any differences explainable by time-constant (interprocess) covariates. Only in
Section 7.3 and in Chapter 14 shall I look at some standard ways of modelling the
differences among a set of series not described by observed covariates.
1.2 Dependence among states
As we have seen, dependencies among successive responses of a stochastic process
can be modelled by multivariate distributions or equivalently by the corresponding
product of conditional distributions. Certain general procedures are available that
I shall review briefly here. Some of them arise from time series analysis (Chapter
9) but are of much wider applicability.
1.2.1 Constructing multivariate distributions
In a model for a stochastic process, some specific stochastic mechanism is assumed
to generate the states. We often can expect that states of a series observed more
closely together in time will be more similar, that is, more closely related. In other
words, the state at a given time point will generally be related to those recently
produced: the probability of a given state will be conditional, in some way, on the
values of the process previously generated.
In certain situations, an adequate model without such dependence may be con-
structed. It usually will require the availability of appropriate time-varying covari-
ates. If these have been recorded, and perhaps an appropriate time trend specified,
the present state, conditional on the covariates, should be independent of previous
states. However, in many cases, this will not be possible, because of lack of infor-
mation, or will not be desirable, perhaps because of the complexity of the model
required or its lack of generality.
In order to model time dependencies among the successive states of a stochastic
process of interest, we may choose a given form either for the conditional distribu-
tions, on the right-hand side of Equation (1.1), or for the multivariate distribution,
on the left-hand side. In general, the conditional distribution will be different from
the multivariate and marginal distributions, because the ratio of two multivariate
distributions does not yield a conditional distribution of the same form except in
very special circumstances such as the normal distribution. The one or the other
will most often be intractable.
Because only a limited number of useful non-normal multivariate distributions are
available, suitable models often can only be obtained by direct construction of the
conditional distribution. Thus, usually, we shall need to set up some hierarchical
series of conditional distributions, as in Equation (1.1). In this way, by means of
the univariate conditional probabilities, we can construct an appropriate multivari-
ate distribution for the states of the series.
The hierarchical relationship among the ordered states of a series implies, by
recursion, that the conditional probabilities are independent, as in Equation (1.1).
In this way, univariate analysis may be used and the model will be composed of a
product of terms. Thus, multivariate distributions with known conditional form are
usually much easier to handle than those with known marginal form.
However, as we have seen in Section 1.1.4, the general formulation of Equa-
tion (1.1) highlights some potential difficulties. Each state depends on a different
number of previous states so that the conditional distribution $f(y_t \mid y_0, \ldots, y_{t-1})$ is
different at each time point; this may or may not be reasonable. Usually, additional
assumptions must be introduced. As well, the unconditional distribution of the first
state is required. Its choice will depend upon the initial conditions for the process.
The ways in which the conditionality in the distributions is specified will depend
on the type of state space. I shall, first, look at general procedures for any kind of
state (Sections 1.2.2 and 1.2.3) and, then, at specific ones for continuous states (or
perhaps counts, although this is unusual for the true state of a stochastic process)
and for categorical states.
1.2.2 Markov processes
An important class of simple processes makes strong assumptions about depen-
dence over time, in this way reducing the amount of empirical information required
in order to model them.
A series of responses in discrete time such that
$$f(y_t \mid y_0, \ldots, y_{t-1}) = f(y_t \mid y_{t-1}) \tag{1.5}$$
so that each state only depends on the immediately preceding one, is known as a
Markov process. In other words, the state at time $t$, given that at $t-1$, is
independent of all the preceding ones. Notice that, in contrast to Equation (1.1), in the
simplest situation here, the form of the conditional distribution $f(y_t \mid y_{t-1})$ does not
change over time.
This definition can be extended both to dependence further back in time and to
continuous time (Section 6.1). Recall, however, that such a conditional specifi-
cation is not sufficient to imply stationarity. This will also depend on the initial
conditions and/or on the length of time that the process has been functioning.
Equation (1.5) specifies a Markov process of order one. More generally, if the
dependence extends only for a short, fixed distance back in time, so that the present
state depends only on the $M$ preceding states, it is said to be a Markov process of
order $M$. The random variables $Y_t$ and $Y_{t-j}$ are conditionally independent for
$j > M$, given the intermediate states.
It is often necessary to assume that a stochastic process has some finite order
(considerably less than the number of observations available over time), for
otherwise it is nonstationary and its multivariate distribution continues to change
with each additional observation, as we saw above.
If the response variable for a Markov process can only take discrete values or
states, it is known as a Markov chain (Chapter 5 and Section 6.1.3). Usually, there
will be a finite number of possible states (categories), and observations will be
made at equally-spaced discrete intervals. Interest often centres on the conditional
transition probabilities of changes between states. When time is continuous, we
have to work with transition rates or intensities (Section 3.1.3) between states,
instead of probabilities. When the response variable for a Markov process is con-
tinuous, we have a diffusion process (Chapter 10).
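Conditional transition probabilities of a first-order Markov chain can be estimated simply by counting. The sketch below, applied to a short hypothetical categorical series, counts the observed transitions between consecutive time points and divides each row by its total. This gives the maximum likelihood estimate when conditioning on the first observed state, which contributes no transition of its own.

```python
import numpy as np

def estimate_transitions(states, n_states):
    """Conditional ML estimate of a first-order transition matrix.

    Counts transitions i -> j between consecutive observations and divides
    each row by its total; the first state is conditioned on.
    """
    counts = np.zeros((n_states, n_states))
    for prev, curr in zip(states[:-1], states[1:]):
        counts[prev, curr] += 1
    return counts / counts.sum(axis=1, keepdims=True), counts

# Hypothetical categorical series observed at equally spaced times.
series = [0, 0, 1, 2, 2, 1, 0, 0, 0, 2, 1, 1, 0]
P_hat, counts = estimate_transitions(series, n_states=3)
print(counts)            # 12 transitions from 13 observations
print(P_hat.round(2))    # each row sums to one
```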
1.2.3 State dependence
One simple way to introduce Markov dependence is to construct a regression func-
tion for some parameter of the conditional probability distribution such that it in-
corporates previously generated states directly, usually in addition to the other co-
variates. The states may be either continuous or categorical. Thus, the location
parameter (often the mean) of the conditional distribution of the series could be
dependent in the following way:
$$\mu_t = g(\mathbf{x}_t; \boldsymbol{\beta}) + \rho\, y_{t-1} \tag{1.6}$$
for a process of order one, where $\mathbf{x}_t$ is a vector of possibly time-varying covariates
in some regression function $g(\cdot)$, possibly nonlinear in the parameters $\boldsymbol{\beta}$. If there
are more than two categorical states, $\mu_t$ usually will have to be replaced by a vector
(often of conditional probabilities) and $y_{t-1}$ will need to be modelled as a factor
variable.
Here, the present location parameter, that is, the prediction of the mean present
state, depends directly on the previous state of the process, as given by the previous
observed response. Thus, this can be called a state dependence model.
An easy way to introduce state dependence is by creating lagged variables from
the response and using them as covariates. For a first-order Markov process ($M = 1$),
the values in the vector of successive observed states will be displaced by one
position to create the lagged covariate so that the current state in the response vector
corresponds to the previous state in the lagged vector. However, this means that we
cannot use the first observed state, without additional assumptions, because we
do not know its preceding state. If a higher-order model ($M > 1$) is used, the
displacement will be greater, and the first $M$ observations cannot be used in this
way.
One case of this type of model for categorical states is the Markov chain mentioned
above (see Chapter 5 and Section 6.1.3). A situation in which the state
space is continuous is the autoregression of time series, at least when there are no
time-varying covariates (Chapter 9).
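The lagged-covariate construction can be sketched in Python. As assumptions for illustration, the conditional distribution is normal and the regression function in Equation (1.6) is linear, $g(x_t; \beta) = \beta_0 + \beta_1 x_t$, so the model can be fitted by ordinary least squares conditional on the first $M$ observations. The helper `lag_design`, the simulated data, and the parameter values are all hypothetical.

```python
import numpy as np

def lag_design(y, x, order=1):
    """Response and design matrix for a state dependence model of order M.

    Each row holds an intercept, the covariate x_t, and the lagged responses
    y_{t-1}, ..., y_{t-M}; the first `order` observations are dropped because
    their preceding states are unknown.
    """
    y, x = np.asarray(y, float), np.asarray(x, float)
    n = len(y)
    lags = [y[order - m: n - m] for m in range(1, order + 1)]
    X = np.column_stack([np.ones(n - order), x[order:]] + lags)
    return y[order:], X

# Hypothetical data from mu_t = 1 + 0.5 x_t + 0.6 y_{t-1}, with normal errors.
rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 1.0 + 0.5 * x[t] + 0.6 * y[t - 1] + rng.normal(scale=0.5)

resp, X = lag_design(y, x, order=1)
beta_hat, *_ = np.linalg.lstsq(X, resp, rcond=None)
print(len(resp), beta_hat.round(2))   # 1999 rows; estimates near (1.0, 0.5, 0.6)
```

One observation is lost for each order of lag, which is exactly the cost described above.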
1.2.4 Serial dependence
For a continuous state space, a quite different possibility also exists, used widely
in classical time series analysis (Chapter 9). This is to allow some parameter to
depend, not directly on the previous observed state, but on the difference between
that previous state and its prediction at that time. For a location parameter, this
might be
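$$\mu_t = g(\mathbf{x}_t; \boldsymbol{\beta}) + \rho\,[y_{t-1} - g(\mathbf{x}_{t-1}; \boldsymbol{\beta})] \tag{1.7}$$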
Notice that this will be identical to the state dependence model of Equation (1.6)
if there are no covariates, and also that it generally will not work for a categorical
state space because the subtraction has no meaning.
Dependence among states is now restricted to a more purely stochastic compo-
nent, the difference between the previous observed state and its location regression
function, called the recursive residual or innovation. This may be seen more clearly
by rewriting Equation (1.7) in terms of these differences:
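$$\mu_t - g(\mathbf{x}_t; \boldsymbol{\beta}) = \rho\,[y_{t-1} - g(\mathbf{x}_{t-1}; \boldsymbol{\beta})] \tag{1.8}$$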
The new predicted difference is related to the previous observed difference by $\rho$.
As in Equation (1.6), the previous observed value $y_{t-1}$ is being used to predict the
new expected value $\mu_t$, but here corrected by its previous prediction.
I shall call this a serial dependence model. The present location parameter de-
pends on how far the previous state was from its prediction given by the corre-
sponding location parameter, the previous residual or innovation.
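A small numerical sketch shows how the location parameters of Equation (1.7) are built from the previous residuals. As an illustrative assumption, the regression function is linear, $g(x_t; \beta) = \beta_0 + \beta_1 x_t$, and $\mu_0$ is set to $g(x_0; \beta)$ because no earlier state exists. The data and parameter values are hypothetical. The last line checks Equation (1.8): each predicted difference equals $\rho$ times the previous observed difference.

```python
import numpy as np

def serial_dependence_means(y, x, beta, rho):
    """Location parameters mu_t under the serial dependence model (1.7).

    Illustrative assumption: g(x_t; beta) = beta[0] + beta[1] * x_t.
    mu_0 is set to g(x_0; beta) because no earlier state is available.
    Returns mu, the residuals y_t - g(x_t; beta), and g itself.
    """
    y = np.asarray(y, float)
    g = beta[0] + beta[1] * np.asarray(x, float)
    resid = y - g                          # observed state minus its regression function
    mu = g.copy()
    mu[1:] = g[1:] + rho * resid[:-1]      # (1.7): correct by the previous residual
    return mu, resid, g

y = [2.0, 3.5, 2.5, 4.0]                   # hypothetical observations
x = [0.0, 1.0, 0.0, 2.0]                   # hypothetical covariate
mu, resid, g = serial_dependence_means(y, x, beta=(2.0, 1.0), rho=0.5)
print(mu)                                  # location parameters mu_t
# Check of (1.8): mu_t - g(x_t) equals rho times the previous residual.
print(np.allclose(mu[1:] - g[1:], 0.5 * resid[:-1]))
```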
Thus, both state and serial dependence yield conditional models: the response
has some specified conditional distribution given the covariates and the previous
state. In contrast, for some reason, most models constructed by probabilists, as
generalisations of normal distribution serial dependence time series to other distri-
butions, require the marginal distribution to have some required form (see, among
others, Lawrance and Lewis, 1980). These are complex to construct mathemati-
cally and difficult to interpret scientifically.
1.2.5 Birth processes
If the state space has a small finite number of states, that is, it is categorical, dif-
ferent methods may need to be used. One possibility may be to condition on the
number of previous recurrent events or, more generally, on the number of times
that the process was previously in one or more of the states.
The classical situation arises when the two possible states are presence or absence
of a recurrent event. Then, a birth process counts the number of previous such
events:
$$\pi_t = g(\mathbf{x}_t; \boldsymbol{\beta}) + \rho\, n_{t-1} \tag{1.9}$$
where $\pi_t$ is the conditional probability of the event at time $t$ and $n_{t-1}$ is the
number of previous such events. Even without time-varying covariates, this process is
clearly nonstationary.
Generally, some function of $\pi_t$, such as a logit transformation, will instead be
used to ensure that it cannot take impossible values. More often, the process will
be defined in terms of the (log) rate or intensity of events (Section 4.1) instead of
the probability. I shall describe more general types of dependence for finite state
spaces in Section 4.1.1, once I have introduced this concept of intensity.
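A birth process with a logit link can be sketched as follows. The specification $\mathrm{logit}(\pi_t) = \beta_0 + \rho\, n_{t-1}$, without covariates, is an assumption chosen for illustration, as are the event sequence and parameter values. The count $n_{t-1}$ is the number of events strictly before time $t$. Because the conditional probabilities depend only on the past, the log likelihood is simply a sum of Bernoulli terms.

```python
import numpy as np

def birth_process_probs(events, beta0, rho):
    """Conditional event probabilities for a birth process with a logit link.

    logit(pi_t) = beta0 + rho * n_{t-1}, where n_{t-1} counts the events
    that occurred before time t (illustrative specification, no covariates).
    """
    events = np.asarray(events, int)
    n_prev = np.concatenate(([0], np.cumsum(events)[:-1]))   # n_{t-1}
    eta = beta0 + rho * n_prev
    return 1.0 / (1.0 + np.exp(-eta)), n_prev

def log_likelihood(events, beta0, rho):
    """Bernoulli log likelihood: sum of log conditional probabilities."""
    pi, _ = birth_process_probs(events, beta0, rho)
    e = np.asarray(events)
    return np.sum(e * np.log(pi) + (1 - e) * np.log1p(-pi))

events = [0, 1, 0, 0, 1, 1, 0, 1]      # hypothetical presence (1) / absence (0)
pi, n_prev = birth_process_probs(events, beta0=-1.0, rho=0.4)
print(n_prev)                          # [0 0 1 1 1 2 3 3]
print(pi.round(3))                     # probabilities rise with each event
print(log_likelihood(events, -1.0, 0.4))
```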
Only three basic types of dependency have been presented here. Other more
specific ones may often be necessary in special circumstances. Some of these will
be the subject of the chapters to follow.
1.3 Selecting models
As always in statistical work, the ideal situation occurs when the phenomenon un-
der study is well enough known so that scientific theory can tell us how to construct
an appropriate model of the stochastic process. However, theory never arises in a
vacuum, but must depend on empirical observations as well as on human power of
abstraction. Thus, we must begin somewhere! The answers to a series of questions
may help in constructing useful models.
1.3.1 Preliminary questions
When confronting the modelling of observations from some stochastic process, one
may ask various fundamental questions:
How was the point in time to begin observation chosen?
– Is there a clear time origin for the process?
– What role do the initial conditions play?
Are observations made systematically or irregularly over time?
– If observations are irregularly spaced, are these time points fixed in advance,
random, or dependent on the previous history of the process itself?
– Is the process changing continuously over time or only at specific time points?
– Are all changes in the process recorded or only those when an observation
happens to be made?
– Does a record at a given time point indicate a new value in the series then or
only that it changed some time since the previous observation?
Is the process stationary?
– Is the process increasing or decreasing systematically over time, such as a
growth curve?
– Is there periodic (daily, seasonal) variation?
– Does the process change abruptly at some time point(s)?
– Are there long-term changes?
Is more than one (type of) response recorded at each time point?
– Can several events occur simultaneously?
– Do some quantitative measurements accompany the occurrence of an event?
Does what is presently occurring depend on the previous history of the process?
– Is it sufficient to take into account what happened immediately previously (the
Markov assumption)?
– Is there a cumulative effect over the history of the process, such as a birth
effect?
Is the process influenced by external phenomena?
– Can these be described by time-varying covariates?
– Is some other unrecorded random process, such as the weather, affecting the
one of interest?
If there is more than one series, do the differences among them arise solely from
the randomness of the process?
– Are the differences simply the result of varying initial conditions?
– Do the series differ because of their dependence on their individual histories?
– Can part of the difference among the series be explained by time-constant
covariates with values specific to each series?
– Are there static random differences among the series?
– Does each series depend on a different realisation of one or more time-varying
covariates (different weather recorded in different locations)?
– Are there unrecorded random processes external to the series influencing each
of them in a different way (different weather in different locations, but never
recorded)?
Possible answers to some of these questions may be indicated by appropriate plots
of the series under study. However, most require close collaboration with the sci-
entists undertaking the study in order to develop a fruitful interaction between em-
pirical observation and theory.
1.3.2 Inference
Some objective empirical procedure must be available in order to be able to select
among models under consideration as possible descriptions of an observed stochas-
tic process. With the exception of preliminary descriptive examination of the data,
all analyses of such processes in this book will be based on the construction of
probabilistic models. This means that the probability of the actually observed pro-
cess(es) always can be calculated for any given values of the unknown parameters.
This is called the likelihood function, a function of the parameters for fixed ob-
served data. Here, all inferences will be based on this. Thus, the basic assumption
is the Fisherian one that a model is more plausible or likely if it makes the observed
data more probable.
A set of probability-based models that one is entertaining as having possibly
generated the observed data defines the likelihood function. If this function is so
complex as to be intractable, then there is a good chance that it cannot provide
useful and interpretable information about the stochastic process.
However, the probability of the data for fixed parameter values, a likelihood
value, does not, by itself, take into account the complexity of the model. More
complex models generally will make the observed data more probable, but simpler
models are more scientifically desirable. To allow for this, minus the logarithm
of the maximised likelihood can be penalised by adding to it some function of the
number of parameters estimated. Here, I shall simply add the number of parameters
to this negative log likelihood, a form of the Akaike (1973) information criterion
(AIC). Smaller values will indicate models fitting relatively better to the data, given
the constraint on the degree of complexity.
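The penalised criterion just described can be computed directly. The sketch below assumes two normal models for hypothetical simulated data: a constant mean (two parameters) and a linear regression (three parameters). For each, it computes the minimised negative log likelihood plus the number of parameters. This is half of the more common $-2\log L + 2p$ scaling, so the ranking of models is the same.

```python
import numpy as np

def normal_neg_loglik(resid):
    """Minimised negative log likelihood of a normal model, given its residuals.

    Uses the maximum likelihood variance estimate sigma^2 = mean(resid^2).
    """
    n = len(resid)
    s2 = np.mean(resid ** 2)
    return 0.5 * n * (np.log(2 * np.pi * s2) + 1.0)

def aic(neg_loglik, n_params):
    """AIC in the form used here: negative log likelihood plus number of parameters."""
    return neg_loglik + n_params

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 200)
y = 2.0 + 0.8 * x + rng.normal(scale=2.0, size=200)   # hypothetical data

# Model 1: constant mean (parameters: mean, variance)
m1 = aic(normal_neg_loglik(y - y.mean()), n_params=2)
# Model 2: linear regression (parameters: intercept, slope, variance)
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
m2 = aic(normal_neg_loglik(y - X @ beta), n_params=3)
print(round(m1, 1), round(m2, 1))   # the smaller value indicates the preferred model
```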
Further reading
Jones and Smith (2001) give an elementary introduction to stochastic processes.
Grimmett and Stirzaker (1992), Karlin and Taylor (1975; 1981), Karr (1991), and
Ross (1989) provide more advanced standard general introductions. The reader
also may like to consult some of the classical works such as Bailey (1964), Bartlett
(1955), Chiang (1968), Cox and Miller (1965), Doob (1953), and Feller (1950).
Important recent theoretical works include Grandell (1997), Guttorp (1995),
Küchler and Sørensen (1997), and MacDonald and Zucchini (1997). More ap-
plied texts include Snyder and Miller (1991), Thompson (1988), and the excellent
introduction to the uses of stochastic processes in molecular biology, Ewens and
Grant (2001).
An important book on multivariate dependencies is Joe (1997).
For inferences using the likelihood function and the AIC, see Burnham and
Anderson (1998) and Lindsey (2004).
Exercises
1.1 Describe several stochastic processes that you can encounter while reading
a daily newspaper.
(a) What is the state space of each?
(b) What are the possible events?
(c) Is time discrete or continuous?
(d) What covariates are available?
(e) Will interest centre primarily on durations between events or on the
states themselves?
(f) What types of dependencies might be occurring over time?
1.2 Consider the following series, each over the past ten years:
(a) the monthly unemployment figures in your country,
(b) the daily precipitation in the region where you live, and
(c) the times between your visits to a doctor.
For each series:
(a) Is the state space categorical or continuous?
(b) Is time discrete or continuous?
(c) What types of errors might be introduced in recording the observa-
tions?
(d) Is there a clear time origin for the process?
(e) Is it plausible to assume stationarity?
(f) Can you expect there to be dependence among the responses?
(g) Can you find appropriate covariates upon which the process might
depend?
2
Basics of statistical modelling
In this chapter, I shall review some of the elementary principles of statistical mod-
elling, not necessarily specifically related to stochastic processes. In this way, read-
ers may perhaps more readily understand how models of stochastic processes relate
to other areas of statistics with which they are more familiar. At the same time, I
shall illustrate how many of these standard procedures are not generally applicable
to stochastic processes using, as an example, a study of the duration of marriages
before divorce. As in subsequent chapters, I shall entertain a wide variety of dis-
tributional assumptions for the response variable and use both linear and nonlinear
regression functions to incorporate covariates into the models.
2.1 Descriptive statistics
Let us first examine the data that we shall explore in this chapter.
Divorces Marriage may be conceptualised as some kind of stochastic process de-
scribing the relationships within a couple, varying over time, that may eventually
lead to rupture. In this light, the process ends at divorce and the duration of the
marriage is the centre of interest.
In order to elucidate these ideas, a study was conducted in 1984 of all people
divorcing in the city of Liège, Belgium, in that year, a total of 1727 couples. (For
the data, see Lindsey, 1992, pp. 268–280). Here, I shall examine how the length
of marriage before divorce may vary with certain covariates: the ex-spouses’ ages
and the person applying for the divorce (husband, wife, or mutual agreement).
Only divorced people were recorded, so that all durations are complete. How-
ever, this greatly restricts the conclusions that can be drawn. Thus, the design of
this study makes these data rather difficult to model.
The design was retrospective, looking back in time to see how long people were
married when they divorced in 1984. Thus, all divorces occurred within the
relatively short period of one year.
On the other hand, the couples married at quite different periods in time. This
could have an influence on the occurrence of divorce not captured by age.
The study included only those couples who did divorce so that it can tell us
nothing about the probability of divorce. To be complete, such a study would
somehow have to include a ‘representative’ group of people who were still mar-
ried. These incompletely observed marriages would be censored (Section 3.1.3).
The reader should keep these problems in mind while reading the following analy-
ses.
2.1.1 Summary statistics
Before beginning modelling, it is always useful first to look at some simple de-
scriptive statistics.
Divorces In the divorce study, the mean length of marriage is 13.9 years, with
mean ages 38.5 and 36.1, respectively, for the husband and the wife. Because
length of marriage will be the response variable, we also should look at its vari-
ability; the variance is 75.9, or the standard deviation, 8.7. Thus, an interval of,
say, two standard deviations around the mean length of marriage contains negative
values; such an interval is meaningless. Symmetric intervals around the mean are
not appropriate indicators of variability when the distribution is asymmetric, the
typical case for many responses arising from stochastic processes.
A more useful measure would be intervals (contours) of equal probability density about
the mode, the value with the highest probability density. For this, graphical methods are often
appropriate.
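The arithmetic behind the symmetric interval, and its contrast with an interval of equal probability density, can be sketched as follows. The mean and variance are the values quoted above. The gamma distribution matched to them (shape $= \text{mean}^2/\text{variance}$, scale $= \text{variance}/\text{mean}$) is a hypothetical illustration, not a model fitted to the divorce data. For a unimodal density, the shortest interval containing 95% of the probability has equal density at its two ends.

```python
import numpy as np

mean, var = 13.9, 75.9                 # summary statistics quoted in the text
sd = np.sqrt(var)                      # about 8.7
print(mean - 2 * sd, mean + 2 * sd)    # lower limit is negative: about -3.5

# Hypothetical illustration: a gamma distribution with this mean and variance.
shape, scale = mean**2 / var, var / mean
rng = np.random.default_rng(0)
draws = np.sort(rng.gamma(shape, scale, size=200_000))

def shortest_interval(sorted_draws, prob=0.95):
    """Shortest interval containing `prob` of the draws; for a unimodal
    density this approximates the interval of equal probability density."""
    k = int(np.ceil(prob * len(sorted_draws)))
    widths = sorted_draws[k - 1:] - sorted_draws[: len(sorted_draws) - k + 1]
    i = int(np.argmin(widths))
    return sorted_draws[i], sorted_draws[i + k - 1]

lo, hi = shortest_interval(draws)
print(lo, hi)   # both limits positive; the interval is asymmetric about the mean
```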
2.1.2 Graphics
Visual methods often are especially appropriate for discovering simple relation-
ships among variables. Two of the most useful in the context of modelling are
histograms and scatterplots.
Divorces The histogram for duration of marriage is plotted in the upper left hand
graph of Figure 2.1; its shape is typical of duration data. We see indeed that it is
skewed, not having the form of a normal distribution. From this, intervals of equal
probability can be visualised.
However, this histogram indicates the form of the distribution for all of the cou-
ples together. Models based on covariates generally make assumptions about the
conditional distribution for each value of the covariates. Thus, a linear regression
model carries the assumption that the conditional distribution is normal with con-
stant variance for all sets of covariate values. This histogram does not provide
information about such conditional distributions. Consider, as an example then,
the explanatory variable, applicant. We can examine the histograms separately for
each of the three types of applicant. These are also given in Figure 2.1. We can see
that the form of the histogram differs quite substantially among these three groups.
[Figure 2.1: four histogram panels (All couples; Husband applicant; Wife applicant; Mutual agreement), each plotting proportion of divorces (0 to 0.06) against length of marriage (0 to 50 years).]
Fig. 2.1. Histograms showing the proportions of couples divorcing after various lengths of
marriage in Liège in 1984, grouped into intervals of five years: all couples and separately
by applicant.
The above procedure is especially suitable when an explanatory variable has
only a few categories. If a quantitative variable, like age, is involved, another
approach may be more appropriate. Let us, then, see how to examine graphically
the relationship between the response variable, duration of marriage, and husband's
age. The scatterplot of length of marriage in relation to this age is given in Figure
2.2 (ignore, for the moment, the two diagonal lines). As might be expected, there
is a rather strict upper relationship between the two variables. Length of marriage
is, with few exceptions, constrained to be no greater than about the husband's age minus 20 years. We
can also notice the interesting fact that a few very old men married late in life and
divorced rather quickly thereafter.
[Figure 2.2: scatterplot of length of marriage (0 to 50 years) against husband's age (20 to 80 years), with two fitted lines labelled Linear and Quadratic.]
Fig. 2.2. A scatterplot showing the relationship between length of marriage before divorce
in Liège in 1984 and the age of the husband, with the fitted normal distribution regression
lines.
To relate this graph to histograms, consider the density of points along a vertical
line for some fixed age of the husband. If we move this line to the left or right,
we see that the mass of points along it shifts. This indicates that, if we produced
histograms for different age groups, they would have different shapes, as did those
for the three applicant groups. Thus, both of these graphical methods indicate that
the conditional distribution of length of marriage is changing with the covariates,
applicant and husband’s age.
As we see from these graphs, the assumptions of normality and constant variance
of the conditional distribution of length of marriage, given the applicant group or
husband’s age, do not appear to be fulfilled either. Nevertheless, I first shall attempt
to fit such models to these data.
2.2 Linear regression
One of the most widely (mis)used tools in all of statistics is linear regression. This
is often misnamed ‘least squares’ regression. However, least squares estimation
refers to the study of a deterministic process, whereby the ‘best’ straight line is
fitted through a series of points. In statistical modelling, the interpretation of linear
regression is quite different, although the technical calculations remain the same.
22 Basics of statistical modelling
2.2.1 Assumptions
Suppose that we have observations of some response variable $y_i$, say the state
of a stochastic process, such as a time series or the time between recurrent events.
As well, we have some accompanying explanatory variables or covariates, $x_{ij}$,
to which, we believe, the response is related. Then, applying normal distribution
linear regression carries the assumption that this response has a normal or Gaussian
distribution with probability density
$$f(y_i;\mu_i,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\mathrm{e}^{-\frac{(y_i-\mu_i)^2}{2\sigma^2}} \qquad (2.1)$$
conditional on the values of the covariates.
In addition, the mean of the responses is assumed to change in some deterministic way with the values of these covariates, that is,
$$\mu_i = \beta_0 + \sum_j \beta_j x_{ij} \qquad (2.2)$$
In this function, $\beta_0$ is the intercept, and $\beta_j$ is the slope for the covariate $x_{ij}$. Then,
this regression equation specifies how the mean of the distribution changes for
each value of the covariates. On the other hand, the variance $\sigma^2$ of $y_i$ is assumed
to remain constant.
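As a small illustration of Equations (2.1) and (2.2), the sketch below evaluates the linear mean and the normal density in plain Python. The function names `linear_mean` and `normal_density` are hypothetical, and the coefficient and covariate values are made up. They are not taken from the divorce data.

```python
import math


def linear_mean(beta0, betas, x):
    """Mean of Equation (2.2): beta0 + sum_j beta_j * x_j for one observation."""
    return beta0 + sum(b * xj for b, xj in zip(betas, x))


def normal_density(y, mu, sigma2):
    """Normal density of Equation (2.1) with mean mu and constant variance sigma2."""
    return math.exp(-((y - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


# Hypothetical intercept, slopes, and covariate values.
mu = linear_mean(1.0, [2.0, -0.5], [3.0, 4.0])  # 1 + 6 - 2
print(mu)                            # 5.0
print(normal_density(5.0, mu, 1.0))  # density at the mean, 1/sqrt(2*pi)
```

Only the mean changes with the covariates. The same `sigma2` is used for every observation, and this is the constant-variance assumption.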
The model is not just the deterministic description of Equation (2.2). As an
integral part of it, individual responses are dispersed randomly about the mean
in the specific form of the normal distribution in Equation (2.1) with the given
variance. This is illustrated in Figure 2.3. Such variability usually will be an
integral part of the scientific phenomenon under study, not just measurement error.
This regression function is called linear by statisticians for the wrong reason: it
is linear in the parameters $\beta_j$. This is irrelevant for scientific modelling. On the
other hand, the shape of the curve may take certain restricted nonlinear forms, say
if $x_{ij}^2$ is included as a covariate, as we shall see below.
Once we understand that such a model is describing changes in a normal distri-
bution, we easily can imagine various extensions:
• Other, more suitable, distributions can replace the normal distribution; these will most often be asymmetric.
• The dispersion (here, the variance) about the regression curve need not be held constant.
• The regression equation (2.2) may more appropriately be replaced by some wider class of nonlinear relationships.
These generally will permit more realistic analysis of the data at hand. I shall
begin to examine them more closely in Section 2.4. However, first it will be useful
to review in some more detail the standard models based on the normal distribution.
2.2.2 Fitting regression lines
Linear regression models are well known and can be fitted using any standard sta-
tistical software.
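With a single covariate, the least squares estimates have a closed form. Under the normal model of Equation (2.1), they are also the maximum likelihood estimates of the intercept and slope. The plain-Python sketch below is minimal. `fit_line` and `mle_variance` are hypothetical helpers, not functions from any particular package, and the data points are invented.

```python
def fit_line(x, y):
    """Least squares intercept and slope for y = b0 + b1 * x (one covariate)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the covariate must take at least two distinct values")
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


def mle_variance(x, y, intercept, slope):
    """Maximum likelihood estimate of sigma^2: residual sum of squares divided by n."""
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / len(x)


# Invented points lying close to the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_line(x, y)
print(round(b0, 3), round(b1, 3))
```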
Fig. 2.3. A graphical representation of a simple linear normal regression showing the linear form of the regression function and the constant shape of the normal distribution about it. (Axes: x, 2 to 10; y, 0 to 40.)
Divorces Here, the response variable is a duration so that the normal distribution
will, almost certainly, be inappropriate, as we already have seen from the graphical
methods above. I shall, nevertheless, proceed first with models based upon it. Let
us, then, attempt first to explore the relationship between the length of marriage
and the age of the husband using linear regression. The estimated equation is
$$\hat\mu_i = -14.4 + 0.73\,x_{i1}$$
where $x_{i1}$ is the husband’s age. The positive sign of the slope indicates, as might
be expected, that mean length of marriage increases with the husband’s age. Thus,
according to this model, the length of marriage is estimated to increase, on average,
by about three-quarters of a year for each additional year of husband’s age. This
relationship is plotted as the solid line on the scatterplot in Figure 2.2. It is not very
convincing.
Likelihood
We can ask whether the inclusion of a given covariate actually does help to predict
the response variable. As briefly outlined in Section 1.3, one intuitive way to do
this is to look at how probable each model makes the observed data, called its
likelihood. Often, this is easier to study if minus its logarithm is used. Then,
smaller values indicate models ‘closer’ to the data. A problem with this procedure
is that more complex models, those with more estimated parameters, even ones that
are not really necessary, will generally make the data more probable. However, we
usually prefer the simplest adequate model possible.
One solution to this dilemma is to penalise more complex models by using an
information criterion to compare models. These are designed especially to help in
selecting among competing models. The most widely used of these, the Akaike information criterion (AIC), involves minus the logarithm of the maximised likelihood,
penalised by adding to it the number of parameters in the model estimated from
the data. This penalty prevents the measure of suitability of the models from decreasing too quickly as they become more complex. Note that information criteria
have no absolute meaning; they only provide a guide for comparing different models applied to the same data. Beware also that most software for linear and generalised
linear models returns twice the negative log likelihood, so that, if AICs are supplied, they
will be twice those given here.
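To make the scale of these numbers concrete, the sketch below computes the AIC for a normal model in two forms. The first is the form used here: minus the maximised log likelihood, plus the number of estimated parameters. The second is the doubled form that much software reports.

The sketch makes two assumptions. First, the variance counts as an estimated parameter. Second, the variance is replaced by its maximum likelihood estimate, the residual sum of squares divided by n. The function names and the residuals are hypothetical.

```python
import math


def normal_min_loglik(residuals):
    """Minus the maximised normal log likelihood, using sigma^2 = RSS / n."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    # Plugging the MLE of sigma^2 into Equation (2.1) and summing the logs gives
    # -log L = (n / 2) * (log(2 * pi * sigma2) + 1).
    return 0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def aic_this_book(min_loglik, n_params):
    """AIC on the scale used in the text: -log L + number of parameters."""
    return min_loglik + n_params


def aic_doubled(min_loglik, n_params):
    """The usual software scale: -2 log L + 2 * number of parameters."""
    return 2.0 * min_loglik + 2.0 * n_params


# Hypothetical residuals from a straight-line fit (intercept, slope, variance: 3 parameters).
res = [0.5, -0.3, 0.1, -0.4, 0.1]
m = normal_min_loglik(res)
print(aic_this_book(m, 3), aic_doubled(m, 3))
```

Only differences between AICs computed on the same data and on the same scale are meaningful.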
Divorces We can now turn to the specific question as to whether or not husband’s
age improves the prediction of length of marriage. This implicitly involves the
comparison of two models: those with and without the covariate, husband’s age.
That with it has an AIC of 5076.5. The model without this covariate simply fits a
common mean (that calculated above) to all of the responses. Its AIC is 6090.6,
indicating that the first model was a great improvement on this. Of course, this is
obvious from Figure 2.2; even if the regression model involving husband’s age does
not represent the responses very well, it does much better than simply assuming a
common mean length of marriage for everyone.
Multiple regression
One possible way to proceed to more nonlinear forms of regression curves, remain-
ing in the context of normal linear regression, is to add the square of a quantitative
covariate to the regression equation. This is a simple case of multiple regression.
Here, it will produce a nonlinear model even though statisticians call it linear!
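Adding $x^2$ only adds a column to the set of covariates, so the fit remains ordinary least squares. The sketch below builds the columns $1, x, x^2$ and solves the normal equations $X^\top X\beta = X^\top y$ by Gaussian elimination, in plain Python.

`fit_polynomial` and `solve` are hypothetical helpers. In practice, a numerically more stable method, such as a QR decomposition, would usually be preferred.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_polynomial(x, y, degree):
    """Least squares fit of y on 1, x, ..., x**degree: nonlinear in x, linear in beta."""
    design = [[xi ** p for p in range(degree + 1)] for xi in x]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented data lying exactly on y = 2 + 0.5 x - 0.1 x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 + 0.5 * v - 0.1 * v ** 2 for v in xs]
print([round(c, 6) for c in fit_polynomial(xs, ys, 2)])  # [2.0, 0.5, -0.1]
```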
Divorces If we add the square of husband’s age, the estimated equation becomes
$$\hat\mu_i = -20.6 + 1.04\,x_{i1} - 0.0035\,x_{i1}^2$$
This addition does not improve the fit nearly as much as did inclusion of the linear
term in husband’s age: the AIC is only reduced further to 5068.6. This relationship
is plotted as the dashed line on the scatterplot in Figure 2.2; it can be seen to be
only slightly curved.
The same analysis can be carried out for the age of the wife, say $x_{i2}$. In fact,
these models fit better, with AICs of 5017.7 for the linear curve and 5009.7 for the
quadratic. Perhaps surprisingly, if we combine the two models, with a quadratic
relationship for both husband’s and wife’s age, we obtain a further substantial im-
provement. The AIC is 4917.1. However, the quadratic term for the wife’s age is
not necessary in this equation; eliminating it reduces the AIC to 4916.4.
Fig. 2.4. Contour and three-dimensional plots of the model for mean length of marriage as it depends on the two ex-spouses’ ages. (Axes: husband’s age and wife’s age, 20 to 80 years; length of marriage, 0 to 40 years.)
Interactions
We also can consider interactions between quantitative covariates, obtained by
multiplying them together.
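An interaction column is the elementwise product of two covariates. As a hypothetical illustration, the helper below builds one row of covariates containing the terms listed in the divorce example that follows. These are the linear terms in both ages, both squares, the cube of the first age, and the product of the two ages. The name `design_row` and the order of the terms are assumptions made for illustration.

```python
def design_row(x1, x2):
    """Covariate row: intercept, x1, x2, x1^2, x2^2, x1^3, and the interaction x1*x2."""
    return [1.0, x1, x2, x1 ** 2, x2 ** 2, x1 ** 3, x1 * x2]


# Hypothetical ages of a husband (x1) and a wife (x2).
print(design_row(40.0, 37.0))
```

Each such row can then be fitted with any least squares routine, because the model is still linear in its parameters.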
Divorces After some experimentation, we discover that the linear interaction is
necessary, as well as the quadratic term for wife’s age and the quadratic and cubic
for husband’s age. The final estimated equation, with a substantial reduction in
AIC to 4710.3, is
13 42 1 64 1 0 25 2 0 018 2
1 0 034 2
2
0 00039 3
1 0 0068 1 2
It is more difficult to plot a regression function when there are two covariates, but
it is still possible. Two ways, as contours and as a three-dimensional perspective
plot, are shown in Figure 2.4 for this model of length of marriage as it depends on
the ages of the husband and the wife.
Notice that this model is a completely arbitrary construction, obtained by em-
pirical search. If, say, we were wanting to approximate some unknown nonlinear
function by a Taylor’s series expansion, we would want to use all terms up to a certain order, usually second. Here, I have included one third-order term ($x_{i1}^3$) but not
three others (including two further interactions). However, none of them improves
the model significantly.
2.3 Categorical covariates
Not all covariates are quantitative, as is age in the divorce example. Some, called
factor variables, may indicate qualitatively to what subgroup each individual be-
longs. As we saw in Figure 2.1, a categorical variable divides the population into
subgroups, each with a distinct conditional distribution. In the context of normal
models, this simply implies that the mean response will be different.
Traditionally, a model based on the normal distribution and containing such a
variable was called ‘analysis of variance’. However, it can, in fact, be handled as
a special case of multiple regression.
Divorces For the divorce data, we saw above when examining histograms that the
person applying for the divorce is such a variable. The means are, respectively,
16.7 when the husband applies, 13.9 when the wife does, and 11.6 when both do
(mutual consent).
2.3.1 Analysis of variance
With one categorical covariate, an analysis of variance model can be written
$$\mu_k = \mu + \alpha_k \qquad (2.3)$$
where $k$ indexes the categories of the covariate. However, as it stands, this model
has one too many parameters. We must add a constraint. There is no unique way to
do this. The choice generally will depend on interpretability. I shall consider two
useful ways.
Another way to look at this problem is to realise that a categorical variable can-
not, numerically, be summarised in one number. Instead, it requires a set of indi-
cator or dummy variables, indicating to which category each observation belongs.
Thus, as we shall see, Equation (2.3) can be written equivalently as the multiple
regression of Equation (2.2) using such variables.
Baseline constraint
One way to add a constraint is to set $\alpha_k = 0$ for one value of $k$. This is called the
baseline constraint because one category is chosen as a baseline of comparison for
all of the others.
Now let us define indicator variables such that each can take the values 0 or 1,
depending on whether or not the observation is in that particular category. How-
ever, this is slightly redundant: if we know that the value of the variable is not in
any of the categories but one, then it must be in that remaining category. Thus,
we only require one less indicator variable than the number of categories of the
original variable. The category without an indicator variable is that with $\alpha_k = 0$.
This yields one possible constraint on the parameters mentioned above.
These indicator variables can be used as covariates in the multiple regression of
Equation (2.2). Fortunately, most software can handle factor variables automati-
cally so that we do not need to set up the indicators ourselves. Care, however, must
be taken in interpreting the results in terms of the constraint, or equivalently, the set
of indicator variables, employed. Often, by default, the software chooses $\alpha_1 = 0$
or equivalently uses indicator variables for all but the first category.
Divorces For the divorce data, using the applicant as a factor variable with the
baseline constraint yields $\hat\mu = 16.7$, the mean number of years of marriage for the
first category, husband applying. Then, $\hat\alpha_1 = 0$, $\hat\alpha_2 = -2.8$, the difference in mean
from the first category for wife applying, and $\hat\alpha_3 = -5.1$, the difference from the
first category for mutual consent. Thus, the means given above are reproduced.
The AIC is 6044.0, showing that this categorical variable does not help nearly as
much in predicting the length of marriage as do the ages of the two ex-spouses.
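The baseline constraint can be sketched in plain Python as follows. `baseline_indicators` codes a category with one 0/1 indicator for each non-baseline category. `baseline_params` recovers $\hat\mu$ and the $\hat\alpha_k$ from the category means.

This works because, with a single categorical covariate, the fitted mean in each category is that category's observed mean. The helper names are hypothetical. The means are the rounded values quoted in the text, with the first category (husband) as the baseline.

```python
LEVELS = ["husband", "wife", "mutual"]  # the first level is the baseline


def baseline_indicators(category, levels=LEVELS):
    """0/1 indicators for every category except the first (the baseline)."""
    return [1 if category == level else 0 for level in levels[1:]]


def baseline_params(means, levels=LEVELS):
    """mu is the baseline mean; alpha_k is each category's difference from it."""
    mu = means[levels[0]]
    return mu, {level: means[level] - mu for level in levels}


means = {"husband": 16.7, "wife": 13.9, "mutual": 11.6}  # rounded means from the text
mu, alpha = baseline_params(means)
print(mu, {k: round(v, 1) for k, v in alpha.items()})
# 16.7 {'husband': 0.0, 'wife': -2.8, 'mutual': -5.1}
```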
Mean constraint
Instead of setting $\alpha_k$ to 0 for one of the categories (above it was the first, $\alpha_1$),
another useful possibility for interpretation in many contexts is a constraint such
that $\mu$ is the mean and the $\alpha_k$ are differences from it for each category. The appropriate
constraint is $\sum_k \alpha_k = 0$. This is called the mean constraint, or sometimes
the conventional constraint, because it was classically most often used in analysis
of variance.
Here, the corresponding indicator variables are more complex. Let us start by
specifying the values for all categories except the last. Then, an appropriate indicator variable will take the value 1 if the observation is in the given category, 0 if
it is in another category except the last, and $-1$ if in the last. Again, there will be
one fewer indicator variable than the number of categories. The value of $\alpha_k$ for the
last category will be minus the sum of those for the other categories, using the fact
that $\sum_k \alpha_k = 0$.
Again, these indicator variables can be used as covariates in the multiple regres-
sion of Equation (2.2), but many software packages can also do this automatically.
However, generally they will only calculate the values for all but one of the cate-
gories, in the way just outlined.
Divorces For the divorce data, the values obtained using these constraints are
$\hat\mu = 14.1$, the mean number of years of marriage, $\hat\alpha_1 = 2.6$, the difference from
this mean for husband applying, $\hat\alpha_2 = -0.1$, the difference from the mean for wife
applying, and $\hat\alpha_3 = -2.5$, the difference for mutual consent. Notice that this value,
$\hat\mu = 14.1$, is not equal to the global mean calculated above. It is rather the unweighted mean of the means for each category.
With this parametrisation of the model, we see more easily that the length of
marriage is about average when the wife applies, being considerably longer when
the husband applies and about as much shorter when there is mutual consent. Notice that the differences in mean length of marriage are estimated to be the same
between categories in the two parametrisations. Thus, in the first, the difference
between husband and wife applying was $0 - (-2.78) = 2.78$; in the second, it is
$2.64 - (-0.14) = 2.78$. Of course, the AIC is again 6044.0 because this is just a different
parametrisation of the same model.
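The mean constraint can be sketched in the same hypothetical style. The indicator for each of the first categories is 1 in that category, 0 in the other non-last categories, and $-1$ in the last. $\hat\mu$ is the unweighted average of the category means.

Because the sketch starts from means rounded to one decimal, its estimates differ slightly in the second decimal from those reported above. The differences between categories are unaffected: they equal those of the baseline parametrisation.

```python
LEVELS = ["husband", "wife", "mutual"]  # the last level gets -1 in every indicator


def sum_to_zero_indicators(category, levels=LEVELS):
    """Indicators for all but the last category, coded 1 / 0 / -1."""
    if category == levels[-1]:
        return [-1] * (len(levels) - 1)
    return [1 if category == level else 0 for level in levels[:-1]]


def sum_to_zero_params(means, levels=LEVELS):
    """mu is the unweighted mean of the category means; the alphas sum to zero."""
    mu = sum(means[level] for level in levels) / len(levels)
    return mu, {level: means[level] - mu for level in levels}


means = {"husband": 16.7, "wife": 13.9, "mutual": 11.6}  # rounded means from the text
mu, alpha = sum_to_zero_params(means)
print(round(mu, 2), {k: round(v, 2) for k, v in alpha.items()})
# 14.07 {'husband': 2.63, 'wife': -0.17, 'mutual': -2.47}
```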
2.3.2 Analysis of covariance
More complex models will contain both quantitative and qualitative variables. Traditionally, this was called ‘analysis of covariance’, but it is just another case
of multiple regression. A categorical covariate can be used to introduce a different
curve for each of its categories. Thus, for a straight line in a regression function like
Equation (2.2), it will allow a different intercept for each category of the qualitative
covariate:
$$\mu_{ik} = \beta_{0k} + \sum_j \beta_j x_{ij} \qquad (2.4)$$
where, again, $k$ indexes the categories of the qualitative covariate.
Divorces To continue the divorce example, we can model simultaneously the ages
of the two spouses and the applicant for the divorce (husband, wife, or mutual).
Thus, at this first stage, we are making the assumption that the mean length of
marriage depends on the ex-spouses’ ages in the same way for each type of ap-
plication. This improves the model only slightly; the AIC is 4911.8 as compared
to 4916.4 given above with the same intercept for all types of application (both
models without interactions between the ages).
In order to be able easily to plot the regression curves, I shall use the simpler
model with only the husband’s age. (The AIC is 5064.2, a much worse model, as
might be expected; again, it does not provide much improvement as compared to
the model with the same intercept for all types of application, given above, which
had 5068.6.) The three parallel curves, with different intercepts, are plotted in the
left graph of Figure 2.5. There is not much separation between these lines.
Interactions
A still more complex model allows not only the intercepts but also the slopes to
differ among the categories of the categorical variable. This model can be written
$$\mu_{ik} = \beta_{0k} + \sum_j \beta_{jk} x_{ij} \qquad (2.5)$$
Here, the quantitative covariates are said to interact with the categorical covariate.
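Equations (2.4) and (2.5) differ only in whether the slopes are shared across categories. The hypothetical sketch below computes the mean for one observation under each form. The coefficient values are invented for illustration and are not estimates from the divorce data.

```python
def mean_parallel(x, k, intercepts, slopes):
    """Equation (2.4): category-specific intercept beta_0k, common slopes beta_j."""
    return intercepts[k] + sum(b * xj for b, xj in zip(slopes, x))


def mean_interaction(x, k, intercepts, slopes_by_category):
    """Equation (2.5): both the intercept and the slopes beta_jk depend on category k."""
    return intercepts[k] + sum(b * xj for b, xj in zip(slopes_by_category[k], x))


# Invented coefficients for three categories and one covariate.
intercepts = {"husband": -12.0, "wife": -14.0, "mutual": -16.0}
common = [0.7]
by_category = {"husband": [0.7], "wife": [0.75], "mutual": [0.6]}

for k in intercepts:
    print(k, mean_parallel([40.0], k, intercepts, common),
          mean_interaction([40.0], k, intercepts, by_category))
```

With common slopes, the curves for the categories are parallel. With category-specific slopes, they can converge or cross as the covariate changes.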
Divorces When both ages are included in the model for the divorce data, including
necessary interactions, the AIC is reduced to 4695.5. Again, in order to be able to
plot the regression curves easily, I shall use the model without the wife’s age. (This
has an AIC of 5049.0.) The curves are plotted in the right graph of Figure 2.5.
They are quite different, with that for mutual consent levelling off more rapidly
with age. (Here, the model could be simplified by eliminating the two parameters
for interactions between type of application and the square of the husband’s age.)
Fig. 2.5. The fitted normal distribution regression lines for the divorce data, separately for the three types of application (husband, wife, mutual). Left: parallel lines; right: different slopes. (Axes: husband’s age, 20 to 80 years; length of marriage, 0 to 40 years.)
2.4 Relaxing the assumptions
2.4.1 Generalised linear models
Two of the assumptions of normal models listed above (Section 2.2.1) easily can
be relaxed by turning to generalised linear models. These provide a slightly wider
choice of distribution and allow the mean to depend in a nonlinear way on the
linear regression function. The standard distributional choices are normal, gamma,
inverse Gauss, Poisson, and binomial. The modification to the regression equation
involves some transformation of the mean, called the link function, say $g(\mu_i)$:
$$g(\mu_i) = \beta_0 + \sum_j \beta_j x_{ij} \qquad (2.6)$$
This must be monotone so that its inverse exists.
Gamma distribution
Survival data (Chapter 3), and other responses involving durations, can often use-
fully be modelled by the gamma distribution
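$$f(t_i;\mu_i,\phi) = \frac{\phi^{\phi}\,t_i^{\phi-1}\,\mathrm{e}^{-\phi t_i/\mu_i}}{\mu_i^{\phi}\,\Gamma(\phi)} \qquad (2.7)$$
Because we shall be using this distribution primarily to describe time, I have replaced $y_i$ by $t_i$, as I also shall do in the following distributions.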
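An important special case, obtained by setting $\phi = 1$, is the exponential distribution, equivalently,
$$f(t_i;\mu_i) = \frac{1}{\mu_i}\,\mathrm{e}^{-t_i/\mu_i} \quad\text{or}\quad f(t_i;\lambda_i) = \lambda_i\,\mathrm{e}^{-\lambda_i t_i} \qquad (2.8)$$
where the mean duration is given by $\mu_i = 1/\lambda_i$. I shall not use this distribution here,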
but frequently shall return to it later (see, especially, Section 4.1.2).
In the gamma distribution, $\mu_i$ is the mean and $\phi$ is the ratio of the mean squared
to the variance, the reciprocal of the square of the coefficient of variation. Thus,
an exponential distribution has unit coefficient of variation, which can serve as a
standard of comparison.
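As a numerical check on these statements, the sketch below codes the gamma density in the mean parametrisation assumed for Equation (2.7):

$$f(t;\mu,\phi) = \frac{\phi^\phi\, t^{\phi-1}\, \mathrm{e}^{-\phi t/\mu}}{\mu^\phi\,\Gamma(\phi)}$$

It checks two things: that $\phi = 1$ gives the exponential density with mean $\mu$, and that the coefficient of variation is $1/\sqrt{\phi}$. The function names are hypothetical. `math.lgamma` is used to avoid overflow in $\Gamma(\phi)$.

```python
import math


def gamma_density(t, mu, phi):
    """Gamma density with mean mu and shape phi (variance mu**2 / phi)."""
    log_f = (phi * math.log(phi) + (phi - 1.0) * math.log(t) - phi * t / mu
             - phi * math.log(mu) - math.lgamma(phi))
    return math.exp(log_f)


def exponential_density(t, mu):
    """Exponential density with mean mu: the special case phi = 1."""
    return math.exp(-t / mu) / mu


def gamma_cv(phi):
    """Coefficient of variation sqrt(variance) / mean = 1 / sqrt(phi)."""
    return 1.0 / math.sqrt(phi)


print(gamma_density(2.0, 3.0, 1.0), exponential_density(2.0, 3.0))  # equal up to rounding
print(gamma_cv(1.0), gamma_cv(4.0))  # 1.0 0.5
```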
Each distribution in the generalised linear model family has a default, canonical
link function. For the gamma distribution, it is the reciprocal or inverse,
$$g(\mu_i) = \frac{1}{\mu_i} \qquad (2.9)$$
so that a regression equation using it will have the form
$$\frac{1}{\mu_i} = \beta_0 + \sum_j \beta_j x_{ij} \qquad (2.10)$$
Most often, this is inappropriate and a log link
$$\log(\mu_i) = \beta_0 + \sum_j \beta_j x_{ij} \qquad (2.11)$$
is more useful.
Thus, it is possible to change the link function for a given distribution. That used
above, in Equation (2.2) with the normal distribution, is called the identity link:
$$g(\mu_i) = \mu_i \qquad (2.12)$$
which is the canonical link of the normal distribution.
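The link functions of Equations (2.9) to (2.12) can be sketched as pairs of a link and its inverse. The inverse turns the linear predictor $\beta_0 + \sum_j\beta_j x_{ij}$ back into a mean. The dictionary `LINKS` and the value of the linear predictor are hypothetical.

```python
import math

# Each entry: (link g(mu), inverse link g^{-1}(eta)).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
}


def mean_from_predictor(eta, link):
    """Mean implied by the linear predictor eta under the named link."""
    return LINKS[link][1](eta)


eta = 0.05  # a hypothetical value of beta_0 + sum_j beta_j * x_ij
for name in LINKS:
    print(name, mean_from_predictor(eta, name))
```

With the reciprocal link, the linear predictor must stay positive for a gamma mean to be valid. The log link guarantees a positive mean for any predictor value, which is one practical reason it is often preferred.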
Divorces Length of marriage can be thought of as the survival of the marriage.
Then, for the divorce data, the gamma distribution may be appropriate. Here, I only
shall consider the regression equation with a quadratic dependence on husband’s
age. The resulting curve is plotted as the solid line in Figure 2.6. The surprising
form of this curve results, of course, from the inverse link function.
The AIC is 4890.1, very much better than any of the preceding models, even
with more covariates and interactions among them. For comparison, the corre-
sponding, previously obtained, curve from the normal distribution is plotted in the
same graph, as the dotted line.
The first part of the curve from the gamma distribution with reciprocal link is
similar to that from the normal distribution. But, in contrast to this latter curve,
it reaches a peak at about 60 years and then goes back down for the older people.
This may be as reasonable a representation of the data.
However, we do not yet know whether the improved fit results primarily from
the more nonlinear form of the regression equation (the link function) or from the
changed distributional assumption (the gamma distribution). We can best examine
this in steps.
Fig. 2.6. The scatterplot of Figure 2.2 with the fitted gamma and normal regression lines: gamma with inverse link (solid), gamma with identity link (dashed), and normal with identity link (dotted). (Axes: husband’s age, 20 to 80 years; length of marriage, 0 to 50 years.)
When the identity link is used with the gamma distribution and the quadratic
relationship in husband’s age, the curve is shown as the dashed line in Figure 2.6;
it bends in the opposite direction to that from the normal distribution. The equation
from the normal distribution is
$$\hat\mu_i = -20.6 + 1.04\,x_{i1} - 0.0035\,x_{i1}^2$$
whereas that from the gamma distribution is
$$\hat\mu_i = -9.82 + 0.49\,x_{i1} + 0.0032\,x_{i1}^2$$
For the gamma distribution, the AIC is 4797.9, as compared to 5068.6 above for
the normal distribution with an identity link. This demonstrates that most of the
improvement comes from the changed distribution. This conclusion can be con-
firmed by using the reciprocal link with the normal distribution; this has an AIC of
5059.4.
Thus, we must conclude that the conditional distribution of this response variable
is decidedly non-normal, at least when a quadratic dependence on husband’s age is
used. Appropriate choice of the distribution is essential when studying stochastic
processes.
Log normal and inverse Gauss distributions
Two other distributions in the generalised linear model family, the log normal and
the inverse Gauss, may also be appropriate for duration data. The first can be
derived from the normal distribution of Equation (2.1) by taking the logarithm of
the responses and introducing the Jacobian:
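$$f(t_i;\mu_i,\sigma^2) = \frac{1}{t_i\sqrt{2\pi\sigma^2}}\,\mathrm{e}^{-\frac{(\log t_i-\mu_i)^2}{2\sigma^2}} \qquad (2.13)$$
The second has the form
$$f(t_i;\mu_i,\phi) = \frac{1}{\sqrt{2\pi\phi\,t_i^3}}\,\mathrm{e}^{-\frac{(t_i-\mu_i)^2}{2\phi\,\mu_i^2\,t_i}} \qquad (2.14)$$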
Divorces For these two models, the AICs are, respectively, 4942.4 and 5074.6 for
the same quadratic regression function with an identity link. Neither of these fits
as well as the gamma distribution (AIC 4797.9), although the first is better than the
normal distribution (5068.6).
2.4.2 Other distributions
Generalised linear models contain a very restricted set of distributional possibil-
ities. For continuous responses, these are essentially the normal, log normal,
gamma, and inverse Gauss distributions. For stochastic processes, many other dis-
tributions will also be important. Here, I shall look at a few of them.
Weibull distribution
For duration data, the Weibull distribution
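$$f(t_i;\mu_i,\alpha) = \frac{\alpha\,t_i^{\alpha-1}}{\mu_i^{\alpha}}\,\mathrm{e}^{-(t_i/\mu_i)^{\alpha}} \qquad (2.15)$$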
is especially important because of its simple properties (Section 3.2.1).
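Here is a sketch of the Weibull density with scale $\mu$ and shape $\alpha$, assuming this parametrisation for Equation (2.15):

$$f(t;\mu,\alpha) = \alpha t^{\alpha-1}\mu^{-\alpha}\mathrm{e}^{-(t/\mu)^\alpha}$$

Under this assumption, $\mu$ is not the mean. The mean is $\mu\,\Gamma(1 + 1/\alpha)$, which reduces to $\mu$ only in the exponential case $\alpha = 1$. The sketch also shows the simple closed-form survivor function. The function names are hypothetical.

```python
import math


def weibull_density(t, mu, alpha):
    """Weibull density with scale mu and shape alpha."""
    return alpha * t ** (alpha - 1.0) / mu ** alpha * math.exp(-((t / mu) ** alpha))


def weibull_survivor(t, mu, alpha):
    """Probability that the duration exceeds t: exp(-(t / mu) ** alpha)."""
    return math.exp(-((t / mu) ** alpha))


def weibull_mean(mu, alpha):
    """Mean duration mu * Gamma(1 + 1 / alpha)."""
    return mu * math.gamma(1.0 + 1.0 / alpha)


print(weibull_mean(15.0, 1.0))  # 15.0: the exponential case
print(weibull_mean(15.0, 2.0))  # about 13.29
```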
Divorces Here, this distribution, with the quadratic function of husband’s age, has
an AIC of 4643.0. This is a major improvement on the previous models. The
regression equation is
$$\hat\mu_i = -9.13 + 0.46\,x_{i1} + 0.0041\,x_{i1}^2$$
similar to that for the gamma distribution. These two curves are plotted in Figure
2.7. We see that the Weibull curve is higher than the gamma.
This result is important because, for a model without covariates, the gamma
distribution fits better than the Weibull. The AICs are, respectively, 5816.3 and
5850.5. Thus, the marginal distribution (gamma) is different from the conditional
one (Weibull). The distributional assumptions can change as covariates are intro-
duced into a model.
Fig. 2.7. The scatterplot of Figure 2.2 with the fitted gamma and Weibull regression lines. The dashed gamma line is the same as that in Figure 2.6. (Axes: husband’s age, 20 to 80 years; length of marriage, 0 to 50 years.)
Other distributions
Another distribution, the log logistic, has a similar shape to the log normal but with
somewhat heavier tails:
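$$f(t_i;\mu_i,\sigma) = \frac{\pi\,\mathrm{e}^{\pi(\log t_i-\mu_i)/(\sigma\sqrt{3})}}{\sigma\sqrt{3}\,t_i\left[1+\mathrm{e}^{\pi(\log t_i-\mu_i)/(\sigma\sqrt{3})}\right]^2} \qquad (2.16)$$
Other possibilities include the log Cauchy
$$f(t_i;\mu_i,\sigma) = \frac{\sigma}{\pi\,t_i\left[\sigma^2 + (\log t_i-\mu_i)^2\right]} \qquad (2.17)$$
and log Laplace
$$f(t_i;\mu_i,\sigma) = \frac{1}{2\sigma t_i}\,\mathrm{e}^{-|\log t_i-\mu_i|/\sigma} \qquad (2.18)$$
with even heavier tails.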
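Equations (2.13) and (2.16) to (2.18) follow the pattern described for the log normal. Take a location-scale density for $\log t$, with location $\mu$ and scale $\sigma$, and multiply it by the Jacobian $1/t$. The sketch below encodes that pattern.

The scalings are assumptions about the parametrisation. For example, the logistic is scaled so that $\sigma^2$ is the variance of $\log t$. All names are hypothetical.

```python
import math


def _normal(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def _logistic(y, mu, sigma):
    s = sigma * math.sqrt(3.0) / math.pi  # scale chosen so that the variance is sigma**2
    z = abs(y - mu) / s                   # the density is symmetric, so use |z| to avoid overflow
    return math.exp(-z) / (s * (1.0 + math.exp(-z)) ** 2)


def _cauchy(y, mu, sigma):
    return sigma / (math.pi * (sigma ** 2 + (y - mu) ** 2))


def _laplace(y, mu, sigma):
    return math.exp(-abs(y - mu) / sigma) / (2.0 * sigma)


LOG_SCALE = {"normal": _normal, "logistic": _logistic, "cauchy": _cauchy, "laplace": _laplace}


def log_family_density(t, mu, sigma, family):
    """Density of T when log T has the named location-scale density (Jacobian 1/t)."""
    if t <= 0:
        return 0.0
    return LOG_SCALE[family](math.log(t), mu, sigma) / t


for fam in LOG_SCALE:
    print(fam, log_family_density(10.0, math.log(12.0), 0.5, fam))
```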
Divorces The log logistic distribution has an AIC of 4821.8, considerably better
than the log normal, but not as good as the gamma and Weibull distributions. The
log Cauchy has an AIC of 4853.0 and the log Laplace 4775.6, both also better than
the log normal. However, none of these can compete with the Weibull distribution,
although the log Laplace is better than the gamma. These results are summarised in
Table 2.1.
Table 2.1. Fits of various distributions to the divorce data with a quadratic
regression in husband’s age and identity link.
Distribution AIC
Normal 5068.6
Gamma 4797.9
Log normal 4942.4
Inverse Gauss 5074.6
Weibull 4643.0
Log logistic 4821.8
Log Cauchy 4853.0
Log Laplace 4775.6
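AIC values only have meaning relative to one another for the same data. A convenient way to read Table 2.1 is therefore as differences from the smallest value. The short sketch below does this using the values in the table, which are on the scale used here (minus the log likelihood plus the number of parameters).

```python
TABLE_2_1 = {
    "Normal": 5068.6,
    "Gamma": 4797.9,
    "Log normal": 4942.4,
    "Inverse Gauss": 5074.6,
    "Weibull": 4643.0,
    "Log logistic": 4821.8,
    "Log Cauchy": 4853.0,
    "Log Laplace": 4775.6,
}

# Rank the distributions and show each AIC as a difference from the best one.
best = min(TABLE_2_1, key=TABLE_2_1.get)
for name, aic in sorted(TABLE_2_1.items(), key=lambda item: item[1]):
    print(f"{name:14s} {aic:7.1f}  +{aic - TABLE_2_1[best]:6.1f}")
```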
2.4.3 Nonlinear regression functions
Because of their link function, generalised linear models have a nonlinear compo-
nent arising from the transformation of the mean in the link function (unless the
identity link is used, as in most of the models above). However, they are linear in
that function of the mean. There is no reason that we should be restricted to such
regression functions. Indeed, in terms of husband’s age, our models above were
nonlinear. Scientifically, the statistical distinction, in terms of the parameters, be-
tween linear and nonlinear models is generally that the former are approximations
to the latter.
Logistic growth curve
Here, I shall look at one simple case of a regression function that is nonlinear both
in the covariate and in the parameters. Suppose that the response variable depends on
a covariate in an S-shaped fashion, so that the mean follows the function
$$\mu_i = \frac{\kappa}{1 + \mathrm{e}^{\beta_0 + \beta_1 x_{i1}}} \qquad (2.19)$$
instead of a quadratic one. This is called a logistic growth curve (Section 12.3.1).
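Here is a sketch of the logistic growth curve, assuming the form $\mu = \kappa/(1 + \mathrm{e}^{\beta_0 + \beta_1 x})$ for Equation (2.19). When $\beta_1 < 0$, $\kappa$ is the constant level at which the mean eventually levels off. The parameter values are invented for illustration and are not estimates from the divorce data.

```python
import math


def logistic_growth(x, kappa, beta0, beta1):
    """S-shaped mean kappa / (1 + exp(beta0 + beta1 * x)); rises toward kappa if beta1 < 0."""
    return kappa / (1.0 + math.exp(beta0 + beta1 * x))


# Invented parameters: level 30, and beta0 + beta1 * x = 0 (half the level) at x = 45.
kappa, beta0, beta1 = 30.0, 9.0, -0.2
for age in (20, 45, 70, 120):
    print(age, round(logistic_growth(age, kappa, beta0, beta1), 2))
```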
Divorces There appears to be little theoretical reason for using this function with
the divorce data; it implies that the mean length of marriage levels off to some con-
stant value as the husband’s age increases. With the Weibull distribution, the AIC
is 4647.4, somewhat poorer than the quadratic regression function. For compari-
son, the two curves are plotted in Figure 2.8. We can see that they are similar for
ages up to about 55, where most of the responses lie.
Because of the limitations in the design of this study outlined in Section 2.1, the
results above are not meant to be definitive in any sense. However, they do show
clearly how changing assumptions can alter the conclusions that may be drawn.
As in all scientific modelling, care must always be taken in formulating models for
stochastic processes.
Fig. 2.8. The scatterplot of Figure 2.2 with the fitted linear and nonlinear Weibull regression lines (quadratic and logistic). The solid line is the same as that in Figure 2.7. (Axes: husband’s age, 20 to 80 years; length of marriage, 0 to 50 years.)
Further reading
Good books on linear normal models are rare.
Several books on generalised linear models are available; these include Aitkin et
al. (1989), Dobson (2002), Lindsey (1997), and McCullagh and Nelder (1989).
For nonlinear models, see Lindsey (2001).
Exercises
2.1 Perform a more complete analysis of the study of divorce in Liège.
(a) Develop an appropriate regression model using all necessary co-
variates, including the number of children, and any necessary in-
teractions.
(b) The study also recorded the length of the court procedure. Analyse
the dependence of this response variable on the covariates.
2.2 A medical three-period cross-over trial was performed to determine gastric
half-emptying time in minutes. It involved 12 subjects, as shown in Table
2.2. This is an analysis of variance type of design for duration data.
(a) Find an appropriate model to determine whether or not there are
treatment effects. Besides the covariates explicitly present in the
table (treatment and period), you may want to consider the ‘carry-
Exploring the Variety of Random
Documents with Different Content
canimo
What made him so sick? Onsay nacasaquit cania?
I think fruit made him ill Naco ng̃a ang bong̃a maoy nacadaot
cania
Don’t eat too much fruit if you do
not want to be sick
Ayao ca pagcaon ug bong̃a sa
hinlabihan ug dili ca buut
magmasaquit
Where is your father? Hain ba ang imong amahan?
He is at home Tua sia sa balay
Why does he not walk? Mano dili sia magasodoy sodoy?
Because he has not yet recovered Cay apiogon pa sia
Waiter, how long has the physician
been waiting for me?
Bata, dugay na ba ng̃a guihulat aco
sa médico?
Not long, a few minutes Dili dugay, pila ca minuto da
You must go out very seldom, for I
never have the pleasure of meeting
you in the street
Talagsada nanaog ca daguay, cay
uala pa co maquita icao sa dalan
I am very ill, so that I cannot go out Masaquit aco ug daco, tung̃ud niana
dili aco macanaog
I am very cold Nasip-on aco tinood
I have been told your son is very
sick, what is the matter with him?
Guisuguinlan aco ng̃a masaquiton
caayo ang anac mo, onsay saquit
nia?
He went to walk the other day with
some friends, and caught a severe
cold
Sa usang adlao miadto sia sa
pagsodoy sodoy uban sa pila ca
isigcaing̃on nia ug nasip-on sia ug
daco
Will you be able to take care of that
child?
Mahimo mong pagbantay nianang
bata?
I will take care of it with the greatest
pleasure
Bantayan co sia sa maayong
cabubuton
How does your teacher feel? Comusta ang imong magtoto-on?
He is so so now Arang arang man sia caron
He is unwell Nadaot sia
Is your neighbor in good health? Maayo ba ang imong siling̃an?
He is now suffering from his
stomach
Caron guisoolan sia sa coto coto
He is a little indisposed Masaquit sia ug diotay
How is your family? Maayo ba ang imong panimalay?
They all are well except my brother Maayo man silang tanan gauas sa
acong igso-on ng̃a lalaqui
What is his illness? Onsa ba ang saquit nia?
He has sore fingers Masaquit sia sa mg̃a todlo
I have heard your uncle is not well Nadung̃ug co ng̃a masaquit usab ang
imong oyo-an
It is not true Dili matuod
He has got a sore throat Masaquit sia sa totonlan
How long has he been ill? Dugay na ng̃a masaquit sia?
It is not long since Bag-o pa
And you, Sir, how do you feel? Ug icao, Señor, comusta ca?
So so, but my daughter has a violent
fever
Arang arang, apan ang acong anac
ng̃a babaye guihilantan ug maayo
Since when? Canosa cutub?
At midnight she was seized with it Sa tung̃ang gabi-i minsugod sia sa
pagbati niana
I wish her a speedy recovery Naning̃uha aco ng̃a magmapiscay sia
sa labing madali
She is much better than she was Maayo ayo na sia caron
I hope she will get immediate ease
from her illness
Milaom aco ng̃a luason sia unta sa
madali sa iyang saquit
How are you, Madam? Comusta icao, Señora?
I have not been well lately, but I feel
better now
Bag-o pa masaquit aco, apan caron
maayo ayo na
I hope to see you better soon Nagalaom aco sa pagquita canimo
ng̃a mapiscay sa dili na madugay
Are you sick, mother? Masaquit ca ba, anan?
Yes, I am Oo
I am very sorry for it Nasubu aco caayo tung̃ud niana
I hope you are not seriously ill Basin ng̃a dili daco ang saquit mo
Statistical Analysis Of Stochastic Processes In Time 1st Edition J K Lindsey
Paquigpulong sa masaquiton ug sa mananambal.
I have sent for you, doctor, because I
feel very ill
Guipaanhi co icao, Doctor, cay
masaquiton aco caayo
What ails you? Onsay imong guibati?
My body is very weak Maluya caayo ang acong lauas
How did you rest last night? Maayo ba ang pagcatolog mo caron
gabi-i?
Very badly; I did not sleep a moment Dautan caayo; uala aco matolog
bisan usa ca pagpiloc da
Let me feel your pulse Pamisil ca
I had a fever the whole night Guihilantan aco sa tibooc ng̃a gabi-i
I have had a terrible nightmare Guialimong̃aoan aco ug daco
It would be better to die Maayo pa ang pagcamatay
Show me your tongue Ipaquita canaco ang imong dila
My head aches me Masaquit ang acong olo
You need to be bled Quinahanglan ang pagcadlit canimo
Your tongue is foul Buling̃on man ang imong dila
Have you any pain in your chest? Masaquit ba ang imong dughan?
I have a pain in my throat Masaquit ang acong totonlan
Sit up in bed Mulingcod ca sa higdaan
Have you a pain in your side? Guibati mo ba ug saquit sa imong quilid?
No, I have a pain in my waist Dili, ang acong hauac hinoo maol-ol
I feel exceedingly weak Nalay lay caayo ang acong lauas
I can scarcely stand on my legs Lugus macatindug aco
Do you sleep soundly? Nacatolog ca ba ug maayo?
I dream a great deal Nagadamgo aco caayo
How were you taken ill? Onsay guinicanan sa imong saquit?
It began with shivering Nasip-on aco sa sinugdan
Do you think it dangerous? Sa imong pagsabut malisud ba caha quini?
Do not believe that Ayao ca paghuna huna niana
I am very tired of being so long in bed
Nabalao aco na sa higdaan
I am going to die Mamatay aco
We don’t know the value of health till we have lost it
Dili quita magpacamahal sa caayo sa atung lauas cun dili quita magmasaquit
Take care not to catch cold Bantay ca ng̃a dili ca baya masip-on
Must I take that potion? Muinom aco caha nianang tambal?
Yes, but take it on an empty stomach Oo, apan sa dili pa icao magpainit
How often must I take it? Macapila muinom aco niana?
Three times a day Macatolo sa usa ca adlao
Paquigpulong sa pagduao. (Conversation on paying a visit.)
Good morning, Madam Maayong buntag, Señora
Good day, Sir Maayong adlao, Señor
Good afternoon, Miss Maayong palis, Señorita
How do you do? Comusta ca?
Very well, thank you very much Maayo man, diosmagbayad canimo sa macadaghan
Be seated Mulingcod ca
Please take a seat Mulingcod ca usa
Ah! Here is Mr. Michael; I am very glad to see you
Ah!, ania si Sr. Miguel, nalipay aco caayo sa pagquita canimo
I am delighted to make your acquaintance, Sir
Daco ang calipay co sa pagcaila canimo, Señor
I have the honor to salute you Nalipay aco sa pagpang̃amosta canimo
Somebody is knocking Dunay nagatoctoc
Can it be Mr. Nicholas? Mao ba caha si D. Nicolás?
Madam, I have the honor to wish you good day
Señora, nalipay aco sa paghatag canimo ug maayong adlao
I am delighted to see you well again Nalipay aco sa pagquita canimo ng̃a mapiscay na usab
You are very kind to have thought of me
Labihan ang caicog mo sa paghuna huna canaco
I have called at your house several times, but I have not had the pleasure of finding you at home
Dinhi na aco sa iño sa nacapila ug uala aco macadang̃at sa calipay sa paghiabut canimo sa balay
Yes, and I very much regret that I was not at home to receive you
Mao lagui, ug nasaquit aco tinuod cay uala aco sa balay sa pagdauat canimo
Allow me to retire Tuguti na aco sa pagpauli
You want to leave me already? Pauli ca na ba?
Deign to believe that I am very sorry that I cannot stay any longer with you
Sayod ca usa ng̃a masaquit aco ug daco tung̃ud cay dili aco macahimo gayud magdugay sa paguban canimo
I equally regret that your visit has been so short
Nasaquit aco usab tung̃ud sa pagcahamubo sa imong visita
What are you going to do this evening?
Onsay bubuhaton mo caron hapon?
I have to take my sister to the theatre May ihatod co sa teatro ang igso-on co ng̃a babaye
Very well, till then Maayo man, hasta sa paquigquita quita
Are you going already? Muadto na icao?
You are in a great hurry Dinalian ca lagui
I must go Pauli aco na gayud
Why are you in such a hurry? Ng̃ano dinalian ca sa ing̃on?
I have a great many things to do Daghan man ang mg̃a buluhaton co
Don’t forget us Ayao came hicalimti
I must take leave of you Quinahanglan ng̃a manamilit aco canimo
We must part Magabulag na quita
I am going to take leave of you Magaadios na aco canimo
Till I have the honor of seeing you again
Hasta sa laing paghibala ta
Till we meet again Hasta sa laing pagquita
Till our next meeting Hasta ng̃a macaquita quita
Thank you for your visit Diosmagbayad canimo tung̃ud sa imong pagduao
Your servant, Madam Ang imong magsisilve, Señora
I am at your service Tomanon co ang imong mg̃a sugo
Your humble servant Ang ubus ng̃a sologo-on mo
I am very much obliged to you Nagadiosmagbayad aco canimo sa macadaghan
My compliments to your brother Icomusta aco sa imong igso-on
Present my regards to your mother Icomusta aco sa imong anan
Present my best wishes to your aunt Icomusta aco sa imong ia-an
Present my respects to your husband Icomusta aco sa imong bana
Give my kind regards to your wife Icomusta aco sa imong asaua
Remember me to all at home Icomusta aco sa imong mg̃a loon
I will not fail Tumanon co gayud ang imong mg̃a sugo
Come very often Muanhi ca sa masubsub
Goodbye Ari na aco
Come again Balic balic
Oh! Here is Mr. Alexander Diay! ania si D. Alejandro
How do you do, Sir? Comusta ca Señor?
Very well Sa calooy sa Dios
I am not well; my pains will never end
Dili maayo aco, ang acong mg̃a saquit dili matapus sa guihapon
You must drive away your sadness Quinahanglan ng̃a licayan mo canang pagcaming̃ao
I cannot Dili aco macahimo
You must; the best remedy for dispelling one’s sorrow is to visit one’s friends
Quinahanglan man; sa paglicay sa mg̃a huna huna ng̃a masolobon ang labing maayong sumpa mao ang pagduao sa mg̃a abian
All my friends together cannot dispel my sorrow
Ang tanan ng̃a mg̃a caila co dili macagahum sa pagpauala sa acong caming̃ao
You must try to visit them very often Magasulay ca sa pagduao canila sa masubsub caayo
I should be so happy to be able to get rid of my low spirits
Paladan aco unta cun cauad-an aco sa mg̃a huna huna ng̃a maming̃aon
To do so, you must never think of your troubles
Cay aron dang̃aton mo cana, ayao ca pagpalandong sa guihapon sa imong mg̃a saquit
I give you many thanks for your advice
Nagadiosmagbayad aco canimo sa nacadaghan tung̃ud sa imong sambag
Don’t mention it Ayao ca paging̃on niana. dili aco tacus
Paquigpulong tung̃ud sa pagadto sa higda-an. (Conversation about going to bed.)
I feel very sleepy Catologon aco caayo
Do you want to go to bed? Buut ca ba muhigda?
Yes, I wish I were in bed Oo, naligad na aco unta
Did the servant make the bed? Guiandam ba ang higda-an sa sologo-on?
Yes, but he did not change the sheets Oo, apan uala sia magailis sa mg̃a habol ng̃a coquillo
Waiter, change this pillow Bata, ilisi quining unlan
These pillow-cases are not clean Buling̃on man quining mg̃a funda
This blanket is very thick; I want a thinner one
Mabaga caayo quining habol, buut aco ug lain ng̃a maga-an (manipis)
Do you want any more? Buut pa nimo ug lain?
Put out the lamp Palng̃a ang quinqué
Bring me the candlestick Iari ang palmatoria
Till to-morrow, good night Hasta sa ugma, maayong gabi-i
Pleasant dreams to you Basin ng̃a magadamgo ca sa mg̃a malilipayon
Why do you not tell the children to go to bed?
Mano dili nimo pahigda-a ang mg̃a bata?
Because they have to have supper first Cay manihapon pa sila
And when do you go to bed? Ug anosa muhigda ca?
I will go very soon Muhigda aco dayon
Tell the waiter to come here Paanhion mo ang bata
Have you closed the shutters? Guitacpan ba nimo ang persianas?
Yes, Sir Oo, Señor
You had better leave them open Ayo pa ng̃a dili sirhan sila
Why? Mano?
Because the weather is very warm Cay mainit caayo ang tiempo
At what time must I wake you? Onsa ng̃a horas ipucao canimo?
At five o’clock Sa a las cinco
All right, till to-morrow Maayo man, hasta sa ugma
Good night Maayong gabi-i
Good night Adios
Paquigpulong tung̃ud sa pagbang̃on sa higda-an. (Conversation about getting up.)
Good morning, how did you sleep last night?
Maayong buntag, naonsa ang pagcatolog mo caron gabi-i?
How have you slept? Guionsa ang pagcatolog mo?
How did you rest? Naonsa ang pagpahoay mo?
Waiter, why did you not wake me? Bata, ng̃ano uala pucaoa aco nimo?
It is time to get up Horas na man sa pagbang̃on
Let us get up Mubang̃on na quita
Come, get up Nan, mubang̃on ca
Rise quickly Mubang̃on ca sa madali
I have not slept very well Uala aco matolog ug maayo
Never mind, rise Ualay sapayan, mubang̃on ca
You slept the whole night without waking Uala ca maghimata sa tibo-oc ng̃a gabi-i
No, let me sleep a little more Uala, pasagdi pa aco ug diriot
Get dressed, lazybones; don’t you see the sunshine?
Magvisti ca, tapolan; dili maquita mo ang cahayag sa adlao?
At what time did you get up? Onsa ng̃a horas nagbang̃on ca?
I have just got up Caron pa nagbang̃on aco
What time is it? Onsa ng̃a horas?
It is late Buntag na man
So soon! It can’t be; I have not been in bed more than two hours
Ing̃on ng̃a madali! Dili gayud mahimo; uala pay duha ca horas cutub sa paghigda co
Two hours, say nine! Duha da ca horas, ingnon mo siam!
I was sleeping so well when you woke me!
Pagcamaayo sa pagcatolog co sa pagpucao canaco!
I think so, but you must get up very soon
Mao man, apan quinahanglan ng̃a mubang̃on ca dayon
Have pity on me, I am still very sleepy Caloy-i aco, catologon pa aco tuod
Make haste, and dress quickly Dalia, ug magvisti ca sa ualay lang̃an
Why should I hurry so? Mano magdali aco sa ing̃on?
The boys have been in class for more than a quarter of an hour
Didto na ang mg̃a bata sa escuelahan capin na sa usa ca cuarto sa horas
Well, cannot they begin without me? Maayo, quinahanglan ba aco caha cay arong musugod sila sa pagescuela?
I have no doubt that they can Sa ualay duha duha dili ca quinahanglanon
So, let me sleep Busa, pacatolga aco
I cannot allow you to stay in bed a moment longer
Dili aco macatogot canimo sa paghigda pa bisan sa usa ca pagpiloc da
I am ready Listo na man aco
Yes, but it has not been without trouble
Mao man, apan uala mahimo cana sa ualay cabudlay
Paquigpulong tung̃ud sa paglacao, pagviaje, &. (Conversation about walking, travelling, etc.)
Where do you come from? Di-in ca guican?
I come from Bohol Guican aco sa Bohol
Where are you going now? Asa ca paing̃on caron?
I am going to Mindanao Paing̃on aco sa Mindanao
Where do you wish to go? Asa buut mong pagdolong?
I wish to go to Manila Buut cong pagdolong sa Manila
Where will you go to? Asa buut mong pagadto?
I will go home Buut cong pagadto sa amo
Where are you going to? Asa ca paing̃on?
I am going to market Paing̃on aco sa tianggi
I am going to your place Paing̃on aco sa iño
I go to your house Muadto aco sa imong balay
I am going to the church Paing̃on aco sa singbahan
I am going to town Paing̃on aco sa longsod
Where have you to go? Asa bay imong adtoon?
I have to go to Cebu Dunay acong adtoon sa Sugbu
I have to go to Tagbilaran May acong adtoon sa Tagbilaran
I have to go to the church to pray to the Holy Child for us
Dunay acong adtoon sa singbahan sa pagampo sa Santo Niño tung̃ud canatu
Do you want to come with me? Buut ca ba muuban canaco?
Do you want to accompany me? Buut ca ba mucuyog canaco?
I cannot, for I have many things to do
Dili aco macahimo cay duna acoy daghang mg̃a buhat
Where shall we go to? Asa quita padolong?
We shall go to take a walk Magasodoy sodoy quita
Let us take your brother on our way Hapiton ta ang imong igso-on
We want to go and take a walk Buut came magasodoy sodoy
Which way shall we go? Onsang dalana atung pagaguian?
Whichever way you please Icao magabuut
Let us go to your brother’s Tala na sa balay sa imong igso-on
With all my heart Sa maayong cabubuton
I have no objection Uala acoy igalalis canimo
Let us go this way Paing̃on quita niining dalana
Where do you come from? Di-in ca guican?
I come from Mrs. Mary’s Guican aco sa balay ni Dña. María
I come from your father’s Sa balay sa imong amahan
I come from yours Guican aco sa iño
I come from the teacher’s Sa balay sa magtoto-on
I come from their house Guican aco sa ila
Where is Mr. Patrick? Hain ba si D. Patricio?
You will find him at his house Maquita mo sia sa ila
He is at home Tua sa ila
He is out Uala sa ila
He is going to his uncle’s Paing̃on sia sa balay sa iyang oyo-an
Can you tell me where he has gone to?
Mahimo ca ba magpahibalo canaco asa sia paing̃on?
He has just gone out Bag-o pa nanaog sia
He has gone to church Miadto sia sa singbahan
He went to the barber’s Miadto sia sa buhatan sa mananalot
Let us call at the hatter’s Hapiton ta ang baliguia-an sa magbolohat sa mg̃a calo
Let us call on your aunt Muduao quita sa ia-an mo
Is Mr. Edward at home? Ania ba sa balay si D. Eduardo?
He is not at home Uala sia sa balay
He is at Mrs. Elizabeth’s house Atua sa balay ni Doña Isabel
Where is Mr. William going to? Asa paing̃on si D. Guillermo?
I do not know where he is going to Ambut cun asa paing̃on sia
Where does Mrs. Clara go? Asa paing̃on si Dña. Clara?
I do not know Ambut. Inay
Why are you so glad? Ng̃ano malipay ca sa ing̃on?
Because my father came to see me Cay guiduao aco sa acong tatay
Why are you so sad? Ng̃ano masubu ca sa ing̃on?
Because I have seen my friends passing and they have not called at my house
Cay naquita co ang acong mg̃a higala ng̃a miagui lang ug uala aco nila hapita
Why do you not go out? Mano di ca manaog?
Because my steamer has just arrived Cay guidungoan aco sa vapor
I am going away; it is time Pauli na aco; horas na man
At what time do you intend to come back?
Onsa ng̃a horas ipauli mo?
I shall be at home very soon Mupauli aco dayon sa balay
I will go along with you Mucuyog aco canimo
I will accompany you Ubanan ta icao. Magbolyog quita
You go too fast Icao mainsil caayo
I must return home Pilit aco mupauli sa balay
Come back as fast as you can Dalia ang pagpauli, ta man sa mahimo mo
Come back quickly Pauli ca sa madali
Will you come back again? Mupauli na icao usab?
I shall see you on my return Sibugan co icao sa pagpauli
I shall go to Cebu to-morrow Ugma muadto aco sa Sugbu
What will you gain by it? Onsay ipatigayon mo niana?
You will not get anything by it Dili nimo pagdang̃aton ug bisan onsa
When do you intend to depart? Anosa ca naghuna huna muguican?
I intend to depart to-morrow Naghuna huna aco muguican ugma
At what time will the steamer set out?
Onsa ng̃a horas iguican sa vapor?
At seven o’clock in the morning Sa a las siete sa buntag
How far did you travel last year? Asa ba cutub nagviaje ca sa usang tuig?
As far as Spain Cutub sa España
Are you fond of riding? Mahagugma ca ba mang̃abayo?
I am very fond of it Mahagugma aco caayo
Is travelling pleasant? Maayo ba ang pagviaje?
It is Maayo man
Are you fond of driving? Mahagugma ca magcarruage?
I am Oo
Are you fond of travelling by sea? Mahagugma ca magviaje sa dagat?
Do you wish to travel by land? Mahagugma ca ba magviaje sa yuta?
Do you want to travel on foot? Buut ca ba mulacao sa pagviaje?
Do you like to travel on horse-back? Buut ca ba mang̃abayo sa pagviaje?
How far is it from here to Manila? Pila ba ang cahalayo cutub dinhi hasta sa Manila?
It is not very far Dili man halayo caayo
Is Cebu very far from Bohol? Halayo ba caayo cutub sa Sugbu hasta sa Bohol?
It is near Dool da man
Has your friend already gone to Manila?
Miadto na ba ang abian mo sa Manila?
He has not gone yet, but he will go very soon
Uala pa, apan muadto sia di na madugay
How far is he going? Asa cutub muadto sia?
As far as my brother’s Cutub sa balay sa acong igso-on ng̃a lalaqui
As far as my sisters’ Cutub sa balay sa acong mg̃a igso-on ng̃a babaye
When will you go away? Anosa ca muguican?
Very soon, because they are waiting for me at home
Dili na madugay, cay guihulat aco nila sa balay
Shall we set out early? Magmasayo ba quita sa pagguican?
We shall start at five o’clock in the morning
Muguican quita sa a las cinco sa buntag
We cannot start till eight o’clock Dili mahimo quita muguican hasta sa a las ocho
When is your brother-in-law leaving?
Anosa muguican ang imong bayao?
To-morrow evening Ugma sa hapon
Did you go very far? Halayo ba caayo ang imong guilactan?
Not very far Dili halayo
Where is your brother? Hain ba ang imong igso-on?
He has gone to take a walk round the garden
Didto sia nagasodoy sodoy sa tanaman
Where was he yesterday? Diin ba sia cahapon?
He was not at home Uala sia diha sa balay
When will that man go away? Anosa muguican canang tao?
He will go immediately Muguican sia caron caron
Why has your brother gone away so soon?
Ng̃ano minguican ang igso-on mo ing̃on ng̃a madali?
Because some friends were waiting for him
Cay guihulat sia sa iyang mg̃a amigos
Why do you walk so fast? Ng̃ano ing̃on ng̃a mapiscay ang paglacao mo?
Because I scarcely have time to be home by four o’clock
Cay lugus macaabut aco sa balay sa a las cuatro
Statistical Analysis of Stochastic Processes in Time

Many observed phenomena, from the changing health of a patient to values on the stock market, are characterised by quantities that vary over time: stochastic processes are designed to study them. Much theoretical work has been done but virtually no modern books are available to show how the results can be applied. This book fills that gap by introducing practical methods of applying stochastic processes to an audience knowledgeable only in the basics of statistics. It covers almost all aspects of the subject and presents the theory in an easily accessible form that is highlighted by application to many examples. These examples arise from dozens of areas, from sociology through medicine to engineering. Complementing these are exercise sets making the book suited for introductory courses in stochastic processes. Software is provided within the freely available R system for the reader to be able to apply all the models presented.

J. K. LINDSEY is Professor of Quantitative Methodology, University of Liège. He is the author of 14 books and more than 120 scientific papers.
CAMBRIDGE SERIES IN STATISTICAL AND PROBABILISTIC MATHEMATICS

Editorial Board
R. Gill (Department of Mathematics, Utrecht University)
B. D. Ripley (Department of Statistics, University of Oxford)
S. Ross (Department of Industrial Engineering, University of California, Berkeley)
B. W. Silverman (St Peter’s College, Oxford)
M. Stein (Department of Statistics, University of Chicago)

This series of high-quality upper-division textbooks and expository monographs covers all aspects of stochastic applicable mathematics. The topics range from pure and applied statistics to probability theory, operations research, optimization, and mathematical programming. The books contain clear presentations of new developments in the field and also of the state of the art in classical methods. While emphasizing rigorous treatment of theoretical methods, the books also contain applications and discussions of new techniques made possible by advances in computational practice.

Already published
1. Bootstrap Methods and Their Application, by A. C. Davison and D. V. Hinkley
2. Markov Chains, by J. Norris
3. Asymptotic Statistics, by A. W. van der Vaart
4. Wavelet Methods for Time Series Analysis, by Donald B. Percival and Andrew T. Walden
5. Bayesian Methods, by Thomas Leonard and John S. J. Hsu
6. Empirical Processes in M-Estimation, by Sara van de Geer
7. Numerical Methods of Statistics, by John F. Monahan
8. A User’s Guide to Measure Theoretic Probability, by David Pollard
9. The Estimation and Tracking of Frequency, by B. G. Quinn and E. J. Hannan
10. Data Analysis and Graphics using R, by John Maindonald and John Braun
11. Statistical Models, by A. C. Davison
12. Semiparametric Regression, by David Ruppert, M. P. Wand and R. J. Carroll
13. Exercises in Probability, by L. Chaumont and M. Yor
Statistical Analysis of Stochastic Processes in Time
J. K. Lindsey
University of Liège
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 2RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521837415

© Cambridge University Press 2004

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2004

ISBN-13 978-0-521-83741-5 hardback
ISBN-13 978-0-511-21194-2 eBook (EBL)
ISBN-10 0-511-21371-9 eBook (EBL)
ISBN-10 0-521-83741-3 hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents

Preface
Notation and symbols

Part I Basic principles
1 What is a stochastic process?
  1.1 Definition
  1.2 Dependence among states
  1.3 Selecting models
2 Basics of statistical modelling
  2.1 Descriptive statistics
  2.2 Linear regression
  2.3 Categorical covariates
  2.4 Relaxing the assumptions

Part II Categorical state space
3 Survival processes
  3.1 Theory
  3.2 Right censoring
  3.3 Interval censoring
  3.4 Finite mixtures
  3.5 Models based directly on intensities
  3.6 Changing factors over a lifetime
4 Recurrent events
  4.1 Theory
  4.2 Descriptive graphical techniques
  4.3 Counts of recurrent events
  4.4 Times between recurrent events
5 Discrete-time Markov chains
  5.1 Theory
  5.2 Binary point processes
  5.3 Checking the assumptions
  5.4 Structured transition matrices
6 Event histories
  6.1 Theory
  6.2 Models for missing observations
  6.3 Progressive states
7 Dynamic models
  7.1 Serial dependence
  7.2 Hidden Markov models
  7.3 Overdispersed durations between recurrent events
  7.4 Overdispersed series of counts
8 More complex dependencies
  8.1 Birth processes
  8.2 Autoregression
  8.3 Marked point processes
  8.4 Doubly stochastic processes
  8.5 Change points

Part III Continuous state space
9 Time series
  9.1 Descriptive graphical techniques
  9.2 Autoregression
  9.3 Spectral analysis
10 Diffusion and volatility
  10.1 Wiener diffusion process
  10.2 Ornstein–Uhlenbeck diffusion process
  10.3 Heavy-tailed distributions
  10.4 ARCH models
11 Dynamic models
  11.1 Kalman filtering and smoothing
  11.2 Hidden Markov models
  11.3 Overdispersed responses
12 Growth curves
  12.1 Characteristics
  12.2 Exponential forms
  12.3 Sigmoidal curves
  12.4 Richards growth curve
13 Compartment models
  13.1 Theory
  13.2 Modelling delays in elimination
  13.3 Measurements in two compartments
14 Repeated measurements
  14.1 Random effects
  14.2 Normal random intercepts
  14.3 Normal random coefficients
  14.4 Gamma random effects

References
Author index
Subject index
Preface

Throughout their history, human beings have been fascinated by time. Indeed, what is history but an interpretation of time? Each civilisation has had its own special conception of time. Our present anti-civilisation only knows 'time is money'!

No one can deny that the study of time is important. This text attempts to make more widely available some of the tools useful in such studies. Thus, my aim in writing this text is to introduce research workers and students to ways of modelling a wide variety of phenomena that occur over time. My goal is explicitly to show the broadness of the field and the many inter-relations within it. The material covered should enable mathematically literate scientists to find appropriate ways to handle the analysis of their own specific research problems. It should also be suitable for an introductory course on the applications of stochastic processes. It will allow the instructor to demonstrate the unity of a wide variety of procedures in statistics, including connections to other courses. If time is limited, it will be possible to select only certain chapters for presentation.

No previous knowledge of stochastic processes is required. However, an introductory course on statistical modelling, at the level of Lindsey (2004), is a necessary prerequisite. Although not indispensable, it may be helpful to have more extensive knowledge of several areas of statistics, such as generalised linear and categorical response models. Familiarity with classical introductory statistics courses based on point estimation, hypothesis testing, confidence intervals, least squares methods, personal probabilities, . . . will be a definite handicap.

Many different types of stochastic processes have been proposed in the literature. Some involve very complex and intractable distributional assumptions.
Here, I shall restrict attention to a selection of the simpler processes, those for which explicit probability models, and hence likelihood functions, can be specified and which are most useful in statistical applications modelling empirical data. More complex models, including those requiring special estimation techniques such as Markov chain Monte Carlo, are beyond the scope of this text. Only parametric models are covered, although descriptive 'nonparametric' procedures, such as the Kaplan–Meier estimates, are used for examining model fit.

The availability of explicit probability models is important for at least two reasons:
(i) Probability statements can be made about observable data, including the observed data:
(a) A likelihood is available for making inferences.
(b) Predictions can be made.
(ii) If the likelihood can be calculated, models can be compared to see which best fits the data, instead of making empty claims about wonderful models with no empirical basis, as is most often done in the statistical literature.

Isolated from a probability model basis, parameter estimates, with their standard errors, are of little scientific value.

Many standard models, such as those for survival, point processes, Markov chains, and time series, are presented. However, because of the book's wide scope, it naturally cannot cover them in as great a depth as a book dedicated to only one of them. In addition, certain areas, such as survival analysis and time series, occupy vast literatures to which complete justice cannot be done here. Thus, in order to provide reasonably equitable coverage, these two topics are explored especially briefly; the reader can consult a good introductory text on either of these topics for additional details.

Many basic theoretical results are presented without proof. The interested reader can pursue these in further detail by following up the 'Further reading' list at the end of each chapter. On the other hand, for readers primarily interested in developing appropriate stochastic models to apply to their data, the sections labelled 'Theory' can generally be skimmed or skipped and simply used as a reference source when required for deeper understanding of their applications.

Stochastic processes usually are classified by the type of recording made, that is, whether they are discrete events or continuous measurements, and by the time spacing of recording, that is, whether time is discrete or continuous.
Applied statisticians and research workers usually are particularly interested in the type of response, so I have chosen the major division of the book in this way, distinguishing between categorical events and continuous measurements. Certain models, such as Markov chains using logistic or log linear models, are limited to discrete time, but most of the models can be applied in either discrete or continuous time.

Classically, statistics has distinguished between linear and nonlinear models, primarily for practical reasons linked with numerical methods and with inference procedures. With modern computing power, such a distinction is no longer necessary and will be ignored here. The main remaining practical difference is that nonlinear models generally require initial values of parameters to be supplied in the estimation procedure, whereas linear models do not.

It is surprisingly difficult to find material on fitting stochastic models to data. Most of the literature concentrates either on the behaviour of stochastic models under specific restrictive conditions, with illustrative applications rarely involving real data, or on the estimation of some asymptotic statistics, such as means or variances. Unavoidably, most of the references for further reading given at the ends of the chapters are at a much more difficult level than the present text.
My final year undergraduate social science students have helped greatly in developing this course over the past 25 years. The early versions of the course were based on Bartholomew (1973), but, at that time, it was very difficult or impossible actually to analyse data in class using those methods. However, this rapidly evolved, eventually to yield Lindsey (1992). The present text reflects primarily the more powerful software now available. Here, I have supplemented the contents of my current course with extra theoretical explanations of the stochastic processes and with examples drawn from a wide variety of areas besides the social sciences.

Thus, I provide the analysis of examples from many areas, including botany (leaf growth), criminology (recidivism), demography (migration, human mortality), economics and finance (capital formation, share returns), education (university enrolment), engineering (degradation tests, road traffic), epidemiology (AIDS cases, respiratory mortality, spermarche), industry (mining accidents), medicine (blood pressure, leukæmia, bladder and breast cancer), meteorology (precipitation), pharmacokinetics (drug efficacy, radioactive tracers), political science (voting behaviour), psychology (animal learning), sociology (divorces, social mobility), veterinary science (cow hormones, sheep infections), and zoology (locust activity, nematode control). Still further areas of application are covered in the exercises.

The data for the examples and exercises, as well as the R code for all of the examples, can be found at popgen0146uns50.unimaas.nl/ jlindsey, along with the required R libraries. With this material, the reader can see exactly how I performed the analyses described in the text and adapt the code to his or her particular problems.

This text is not addressed to probabilists and academic statisticians, who will find the definitions unrigorous and the proofs missing.
Rather, it is aimed at the scientist looking for realistic statistical models to help in understanding and explaining the specific conditions of his or her empirical data. As mentioned above, the reader primarily interested in applying stochastic processes can omit reading the theory sections and concentrate on the examples. When necessary, reference can then be made to the appropriate parts of theory.

I thank Bruno Genicot, Patrick Lindsey, and Pablo Verde, who provided useful comments on earlier versions of certain chapters of this text.
Notation and symbols

Notation is generally explained when it is first introduced. However, for reference, some of the more frequently used symbols are listed below.

Vectors are bold lower case and matrices bold upper case Greek or Roman letters. denotes the transpose of a vector or matrix.

arbitrary indices
random response variable and its observed value
time lag
sum of random variables
explanatory variables
number of events
∆ change in number of events
∆ interval width
previous history
location parameter
variance
probability (usually binary)
change point parameter
random parameters
arbitrary parameters
(auto)correlation or other dependence parameter
order of a Markov process
length of a series
Pr probability of response
probability density function
cumulative distribution function
survival function
probability of a random parameter
arbitrary regression function
link function
Λ integrated intensity (function)
intensity (function)
E expected value
(auto)covariance (function)
covariance matrix
I indicator function
beta function
Γ gamma function
L likelihood function
transition (probability) matrix
transition intensity matrix
marginal or conditional probability distribution
first passage distribution
diagonal matrix of probabilities
diagonal matrix of eigenvalues
matrix of eigenvectors
vector of deterministic input
vector of random input
1 What is a stochastic process?

Intuitively, a stochastic process describes some phenomenon that evolves over time (a process) and that involves a random (a stochastic) component. Empirically, we observe such a process by recording values of an appropriate response variable at various points in time. In interesting cases, the phenomenon under study will usually depend on covariates. Some of these may also vary over time, whereas others will define the differing (static) conditions under which various 'copies' of the process occur.

1.1 Definition

Thus, a stochastic process involves some response variable, say $Y_t$, that takes values varying randomly in some way over time (or space, although that will not be considered here). $Y_t$ may be a scalar or a vector, but I shall concentrate primarily on scalar responses in this text (see, however, Section 8.3). Generally, in the study of such processes, the term 'random' is replaced by 'stochastic'; hence, the name.

An observed value or realisation $y_t$ of the response variable is called the state of the process at time $t$. We might call an observation of the state of a process an event. However, I shall restrict the meaning of event to the occurrence of a change of state. Thus, the number of possible different events will depend, among other things, on the number of distinct states. More generally, the probability of the process being in some given state at some point in time may depend on some function of previous events and of covariates. Usually, the probabilities of possible events will be conditional on the state of the process. Such relationships will thus be determined by the type of model being fitted.

The main properties distinguishing among observed stochastic processes are:
(i) The frequency or periodicity with which observations are made over time.
(ii) The set of all of its possible observable values, that is, of all possible responses or states of the series, called the state space.
(iii) The sources and forms of randomness present, including the nature of the dependence among the values in a series of realisations of the random variable $Y_t$.
(iv) The number of 'copies' of the process available (only one or several), which will determine how adequate information can be obtained for modelling.

Let us look more closely at each of these aspects.

1.1.1 Time

Observations of a stochastic process (at least of the kinds that interest us here) are made over time. If these observations are at equally spaced intervals, time is said to be discrete. Otherwise, time is continuous. Notice, however, that a process can never really be observed continuously because that would imply an infinite number of observations, even in a small interval of time. Hence, the distinction is primarily used to determine what kind of model will be applied. Continuous-time models can be used for any data but may be more complex, and may not be appropriate if changes can only occur at discrete time points. Discrete-time models require equally-spaced observations, unless some simple mechanism for missingness can be introduced.

Attention may centre on (i) the states at given points in time, (ii) the events, that is, on what change of state occurs at each particular time point, or (iii) the times of events. Consider simple examples:

When an economist measures monthly unemployment, time is only an equally-spaced indicator of when observations are made. The number of unemployed may change at each monthly recording. Either the level of employment (the state) or the amount of change (the event) may be of central interest. Time itself is not of direct concern except for ordering the observations.

In contrast, when a doctor records the times of a patient's repeated infections, the state might be defined to be the total number of such infections so far suffered by that patient. Each observation (event) is the same, an infection, and the time between these events is essential. With substantial loss of information, this could be reduced to discrete time by recording only the numbers of infections in equally-spaced time intervals.

A response may only be recorded when some specific event of interest occurs. However, in order to determine the timing of that event, fairly continual observation of the process usually is necessary, that is, a series of intermediate, implicit recordings of no event. If observation begins at some natural time point, such as birth, at which no response value occurs, the mechanism determining the time to the first event will usually be different from that between subsequent events.
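The reduction to discrete time mentioned above for repeated infections can be sketched as follows. This is a minimal illustration, assuming hypothetical infection times measured in days from the start of observation and 30-day intervals. The function name `counts_per_interval` is introduced here only for illustration.

```python
import math

def counts_per_interval(event_times, width, end):
    """Collapse continuous event times into counts per equally-spaced interval.

    Intervals are [0, width), [width, 2*width), ...; events outside [0, end)
    are ignored. The exact times within each interval are discarded.
    """
    n_intervals = math.ceil(end / width)
    counts = [0] * n_intervals
    for t in event_times:
        if 0 <= t < end:
            counts[int(t // width)] += 1
    return counts

# Hypothetical infection times (days since the start of observation).
times = [3.2, 17.5, 18.1, 44.0, 95.7]
print(counts_per_interval(times, width=30, end=120))
```

The three infections in the first interval become a single count of 3. How they were spaced in time, which would matter in a continuous-time model, can no longer be recovered.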
1.1.2 State space

At any given point in time, a process will be in some state. This usually is observed by recording the value of a response variable at that point $t$. As always in statistical modelling, the set of possible states is a scientific construct that should be defined in a way appropriate to answer the questions at hand. Although, in principle, the state is what is being observed, certain models also assume a second process of unobservable hidden states (Chapters 7 and 11).

The set of all possible observable states is called the state space. This may be finite, yielding a categorical response variable, or infinite, giving either a discrete or a continuous response variable. Generally, a minimal state space is chosen, one in which the probability (density) of every state is nonzero.

If an observed response variable were truly continuous, every recording would be an event because no two would be the same. However, this is empirically impossible, so any observable process could possibly stay in the same state (given the limit of precision of the recording instrument) over two or more consecutive observation points.

A categorical response usually refers to a finite set of possible different states that may be observed. However, when only one type of response is of particular interest and it is fairly rare, the response might be recorded as binary, indicating presence (coded 1) or absence (coded 0) of that response. This particular response, with a binary state space, is often called a point event or a recurrent event (Chapter 4); thus, here the term 'event' refers both to one of the states and to the change of state.

Above, I gave an example of repeated infections, where the cumulative number of infections was important. If, instead, repeated epileptic fits were recorded, the states might more appropriately be defined as having a fit or not on each particular day instead of the total number of fits so far.
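The two choices of state space in the epileptic-fit example can be derived from the same record. Here is a minimal sketch, assuming hypothetical fit days coded as integers from 0 and a seven-day observation period. The function names are illustrative only.

```python
from itertools import accumulate

def daily_binary_states(fit_days, n_days):
    """State on each day: 1 if at least one fit occurred that day, else 0."""
    fit_set = set(fit_days)
    return [1 if day in fit_set else 0 for day in range(n_days)]

def cumulative_count_states(fit_days, n_days):
    """State on each day: total number of fits up to and including that day."""
    per_day = [0] * n_days
    for day in fit_days:
        per_day[day] += 1
    return list(accumulate(per_day))

# Hypothetical record: two fits on day 1, one on day 4, one on day 6.
fits = [1, 1, 4, 6]
print(daily_binary_states(fits, 7))      # [0, 1, 0, 0, 1, 0, 1]
print(cumulative_count_states(fits, 7))  # [0, 2, 2, 2, 3, 3, 4]
```

The binary coding records only that a fit occurred on day 1, not that there were two. The choice of state space therefore determines what information a model can use.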
In certain situations, the state may be defined by more than one response value, that is, may be a vector containing quite distinct types of values. Thus, there may be a categorical value, such as a binary indicator that it rains or not, accompanied by a (usually quantitative) value, the mark, for example, how much rain fell (Section 8.3).

A vector also is usually necessary when there are endogenous time-varying covariates. These are variables, other than the response of direct interest, that are influenced by the previous states of that response. Suppose, for example, that the condition of a patient, measured in some appropriate way, is the response of direct interest. If the dose of a medication that a patient receives depends upon his or her previous condition, then dose and condition will generally have to be handled simultaneously. They must together define the state and be allowed to vary stochastically interdependently, if a reasonable model is to be constructed.

A process may also involve time-varying exogenous covariates, that is, variables not influenced by the previous states of the process. The stochastic variability of such covariates is not usually of interest, so that the probability of the state can be taken to be conditional on their observed values, as in classical regression models.

Practically, we see from this section and the preceding one that models for stochastic processes can be classified by the type of variable observed, that is, the state space, and by the frequency or regularity with which it is observed over time. The structure of this text according to these criteria is summarised in Table 1.1. Recall, however, that most models in continuous time also can be applied to observations in discrete time.

Table 1.1. Chapters in which different types of stochastic processes are covered.

                          State space
Time            Categorical          Continuous
Discrete        5, 8                 9
Continuous      3, 4, 6, 7, 8        9, 10, 11, 12, 13, 14

1.1.3 Randomness

In a deterministic process, one can predict the exact sequence of outcomes (states) from the initial conditions, although some systems, especially chaotic ones, are extremely sensitive to these conditions. In contrast, a stochastic process has an inherent component of unpredictability or randomness. In empirical observation, randomness is closely associated with unknownness or incomplete information. To handle such situations, the theory of probability is used. Thus, predictions involving stochastic processes will not indicate specific outcomes, but only the probabilities of occurrence of the different possible outcomes.

A stochastic process may involve many forms of randomness. A number will arise from the imperfections of the observation and modelling procedures. However, in scientific modelling, the most important should be inherent in the process under study. The types of randomness that need to be allowed for in modelling a stochastic process can have several sources, including the following:
(i) Unmeasurable variability is built into many scientific systems. This is true of quantum mechanics, but also of most biological and social processes.
(ii) Almost invariably, not all of the measurable factors determining the process can be taken into account. For example, unrecorded environmental conditions may change over the period in which the process is acting.
(iii) The initial conditions may be very difficult to determine precisely.

Unfortunately, an inappropriate model can also generate additional spurious randomness and dependencies.

Traditionally, essentially for mathematical simplicity, the Poisson, binomial, and normal distributions have been used in modelling the randomness of stochastic processes. However, we shall see as we proceed that a wide variety of other distributions may be more suitable in specific circumstances.
1.1.4 Stationarity, equilibrium, and ergodicity

The series of responses of a stochastic process usually will not be independent. Thus, procedures must be available to introduce appropriate dependencies. Because complex models should generally be avoided in science, relatively simple methods for introducing such dependencies are desirable.

Multivariate distributions

Multivariate probability distributions provide the general framework in which to specify the ways in which responses are interdependent. Unfortunately, in the context of stochastic processes, these may be difficult to use, especially if long series of observations are available. Thus, with a series of $n$ observations, a model involving a multivariate normal distribution will require the manipulation of an $n \times n$ covariance matrix. When $n$ is large, this will often not be practical or efficient, at least for technical reasons.

Fortunately, for an ordered series of responses, such as those that interest us, the multivariate distribution always can be decomposed into an ordered sequence of independent conditional univariate distributions:

$$f(y_0, y_1, \ldots, y_n) = f_0(y_0)\, f_1(y_1 \mid y_0) \cdots f_n(y_n \mid y_0, \ldots, y_{n-1}) \qquad (1.1)$$

where

$$f_t(y_t \mid y_0, \ldots, y_{t-1}) = \frac{f(y_0, \ldots, y_t)}{f(y_0, \ldots, y_{t-1})} \qquad (1.2)$$

Notice that each conditional distribution $f_t(y_t \mid y_0, \ldots, y_{t-1})$ may be completely different from all the others, even though the multivariate distributions for different lengths of series will usually have the same form. Generally, it will be easier to work with the conditional distributions than with the multivariate one. For the multivariate normal distribution, this will be a series of univariate normal distributions, not directly involving the covariance matrix. I shall elaborate on this approach in Section 1.2 below.

In contrast to the multivariate distribution, and its univariate conditional decomposition, the series of univariate marginal distributions, although often of interest, cannot by itself specify a unique stochastic process (unless all successive states are independent).
It is generally of limited direct use in constructing a realistic model and will, most often, rather be a byproduct of the construction. On the other hand, this sequence of univariate marginal distributions of a stochastic process does provide valuable information indicating how the process is evolving over time: its 'underlying' profile or trend.

In the decomposition in Equation (1.1), no restrictive assumptions have been made (except ordering). However, in order for such a multivariate model to be tractable, even fitted conditionally, rather strong assumptions often have to be made. The problem with the general specification just given is that every response has a different conditional distribution, and each new response will have yet another one. The situation is changing faster than information can be accumulated about it! Thus, we require some reasonable simplifying assumptions.
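The decomposition can be checked numerically on a small example. The sketch below assumes a hypothetical joint distribution for three binary states $y_0, y_1, y_2$, with probabilities invented for illustration. It computes each conditional distribution as the ratio in Equation (1.2) and then confirms that the product in Equation (1.1) returns the joint probability of every sequence.

```python
from itertools import product

# Hypothetical joint distribution f(y0, y1, y2) of three binary states (sums to 1).
joint = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.10, (0, 1, 0): 0.05, (0, 1, 1): 0.15,
    (1, 0, 0): 0.10, (1, 0, 1): 0.05, (1, 1, 0): 0.10, (1, 1, 1): 0.25,
}

def marginal(prefix):
    """f(y_0, ..., y_t): sum the joint over all states following the given prefix."""
    return sum(p for ys, p in joint.items() if ys[:len(prefix)] == prefix)

def conditional(prefix, y_next):
    """Equation (1.2): f_t(y_t | y_0, ..., y_{t-1}) as a ratio of two marginals."""
    return marginal(prefix + (y_next,)) / marginal(prefix)

def product_of_conditionals(ys):
    """Equation (1.1): f_0(y_0) multiplied by the successive conditionals."""
    prob = marginal(ys[:1])
    for t in range(1, len(ys)):
        prob *= conditional(ys[:t], ys[t])
    return prob

for ys in product((0, 1), repeat=3):
    assert abs(product_of_conditionals(ys) - joint[ys]) < 1e-12
print("the conditional decomposition reproduces the joint distribution")
```

The factor for $y_2$ conditions on two earlier states, whereas that for $y_1$ conditions on only one. This is the growing-history difficulty just described.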
Stationarity

A stochastic process is said to be strictly stationary if all sequences of consecutive responses of equal length in time have identical multivariate distributions

$$\Pr(Y_{t+1} \le y_1, \ldots, Y_{t+k} \le y_k) = \Pr(Y_{t+h+1} \le y_1, \ldots, Y_{t+h+k} \le y_k) \qquad (1.3)$$

for all $t$ and $h$ (and every window length $k$). In other words, shifting a fixed-width time observation window along a strictly stationary series always yields the same multivariate distribution. Such an assumption enormously reduces the amount of empirical information necessary in order to model a stochastic process.

A less restrictive assumption, reducing even further the amount of information required, is that a process is second-order stationary. This is defined only by the mean, variance, and covariances:

$$\mathrm{E}[Y_t] = \mathrm{E}[Y_{t+h}], \qquad \operatorname{cov}(Y_t, Y_{t+u}) = \operatorname{cov}(Y_{t+h}, Y_{t+h+u}) \qquad (1.4)$$

for all $t$, $h$, and $u$ (taking $u = 0$ gives the variances). Because a multivariate normal distribution is completely defined by its first two moments, if the process is normal and second-order stationary, it is strictly stationary. This is not generally true of other distributions. In this text, stationarity will always be strict.

Stationarity is a characteristic of multivariate distributions. Thus, from Equation (1.1), it cannot be determined solely by the conditional distributions, but requires also that the initial marginal distribution $f_0(y_0)$ be specified. Of course, this, in turn, implies that the univariate marginal distributions at all other time points will also be known.

Stationarity can be an appropriate assumption if the stochastic process has no inherent time origin. However, in experimental situations, for example, where treatments are applied at a specific time point, this will not be true. The need for greater information due to lack of stationarity can often be compensated for by studying replications of the process (Section 1.1.5).

Equilibrium

Although a process may not be stationary when it starts, it may reach an equilibrium after a sufficiently long time, independent of the initial conditions.
In other words, if an equilibrium has been reached, the probability that the process is in each given state, or the proportion of time spent in each state, has converged to a constant that does not depend on the initial conditions. This generally implies that the process eventually comes close to a stationary situation, in the sense that, if it initially had the equilibrium distribution of states, it would be stationary. (See Cox and Miller, 1965, pp. 9, 272.)

Ergodicity

The concept of ergodicity is closely related to that of equilibrium, although the former has various meanings in the literature on stochastic processes. Ergodic theorems provide identities between probability averages, such as an expected value,
and long-run averages over a single realisation of the process. Thus, if the equilibrium probability of being in a given state equals the proportion of a long time period spent in that state, this is called an ergodic property of the process. In a similar way, the law of large numbers can be generalised to stochastic processes. (See Cox and Miller, 1965, p. 292.)

Regeneration points

Another important concept for some stochastic processes is that of a regeneration point. This is a time instant at which the process returns to a specific state such that future evolution of the process does not depend on how that state was reached. In other words, whenever such a process arrives at a regeneration point, all of its previous history is forgotten. A well-known case is the renewal process (Section 4.1.4) describing times between recurrent events which, as its name suggests, starts over again at each such event. (See Daley and Vere-Jones, 1988, p. 13.)

I shall look at further general procedures for simplifying models of stochastic processes in Section 1.2 below.

1.1.5 Replications

When studying a stochastic process, two approaches to obtaining adequate information can be envisaged. One can either observe
(i) one series for a long enough period, if it is reasonably stable, or
(ii) several short 'replications' of the process, if they are reasonably similar.
In certain situations, one has no choice but to use replications. Thus, for example, with survival data (Chapter 3), one single event, say death, terminates the process so that the only way to proceed is by collecting information on a large number of individuals and by assuming that the process is identical for all of them.

Both approaches can create problems. The phenomenon under study may not be stable enough to be observed over a very long time, say due to problems of lack of stationarity as discussed above.
It may only be possible to assume that shorter segments of a series are from the same stochastic process. On the other hand, with replications, one must be able to assume that the different series recorded do, in fact, represent the same stochastic process. In certain situations, such as survival data, it may not even be possible to check this strong assumption empirically. When replications of a stochastic process are modelled, extreme care must be taken with the time scale. If it is not chronological time, problems may arise. For example, in experiments with biological organisms, time may be measured either from birth or from start of a treatment. If all births do not occur simultaneously and treatment is not started at the same time for all subjects, events that occur at similar times after beginning treatment may occur neither closely together chronologically nor at similar ages. This can create difficult problems of interpretation due to confounding.
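The ergodic property of Section 1.1.4 is what allows these two routes to information to agree. As a sketch, assume a hypothetical two-state Markov chain (Section 1.2.2) with $\Pr(0 \to 1) = 0.2$ and $\Pr(1 \to 0) = 0.3$, so that its equilibrium probability of state 1 is $0.2/(0.2 + 0.3) = 0.4$. Route (i) estimates this probability by the proportion of time that one long simulated series spends in state 1. Route (ii) estimates it by the proportion of many short replications, all assumed to come from the same process, that are in state 1 after 50 steps. The influence of the starting state decays like $(1 - 0.2 - 0.3)^t = 0.5^t$, so 50 steps is ample here. For a process that is slow to reach equilibrium, or for replications that differ, the two estimates need not agree.

```python
import random

# Hypothetical two-state chain: Pr(0 -> 1) = A and Pr(1 -> 0) = B.
A, B = 0.2, 0.3
PI1 = A / (A + B)  # equilibrium probability of state 1 (0.4 here)

def simulate(n_steps, rng, y0=0):
    """Return the states y_0, ..., y_n of the chain started in state y0."""
    ys = [y0]
    for _ in range(n_steps):
        if ys[-1] == 0:
            ys.append(1 if rng.random() < A else 0)
        else:
            ys.append(0 if rng.random() < B else 1)
    return ys

rng = random.Random(2024)

# (i) One long series: long-run proportion of time spent in state 1.
long_series = simulate(100_000, rng)
time_average = sum(long_series) / len(long_series)

# (ii) Many short replications of the same process:
# proportion of series that are in state 1 after 50 steps.
final_states = [simulate(50, rng)[-1] for _ in range(10_000)]
ensemble_average = sum(final_states) / len(final_states)

print(f"equilibrium {PI1:.3f}  time average {time_average:.3f}  "
      f"replication average {ensemble_average:.3f}")
```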
In the examples that I shall analyse in this text, I shall use either one long series or a set of several short ones, depending both on the type of problem and on the kind of data available. Generally, in the second case, when replications are present, I shall assume, for simplicity, that they come from the same process, perhaps with any differences explainable by time-constant (interprocess) covariates. Only in Section 7.3 and in Chapter 14 shall I look at some standard ways of modelling the differences among a set of series not described by observed covariates.

1.2 Dependence among states

As we have seen, dependencies among successive responses of a stochastic process can be modelled by multivariate distributions or equivalently by the corresponding product of conditional distributions. Certain general procedures are available that I shall review briefly here. Some of them arise from time series analysis (Chapter 9) but are of much wider applicability.

1.2.1 Constructing multivariate distributions

In a model for a stochastic process, some specific stochastic mechanism is assumed to generate the states. We often can expect that states of a series observed more closely together in time will be more similar, that is, more closely related. In other words, the state at a given time point will generally be related to those recently produced: the probability of a given state will be conditional, in some way, on the values of the process previously generated.

In certain situations, an adequate model without such dependence may be constructed. It usually will require the availability of appropriate time-varying covariates. If these have been recorded, and perhaps an appropriate time trend specified, the present state, conditional on the covariates, should be independent of previous states.
However, in many cases, this will not be possible, because of lack of information, or will not be desirable, perhaps because of the complexity of the model required or its lack of generality.

In order to model time dependencies among the successive states of a stochastic process of interest, we may choose a given form either for the conditional distributions, on the right-hand side of Equation (1.1), or for the multivariate distribution, on the left-hand side. In general, the conditional distribution will be different from the multivariate and marginal distributions, because the ratio of two multivariate distributions does not yield a conditional distribution of the same form except in very special circumstances, such as the normal distribution. One or the other will most often be intractable. Because only a limited number of useful non-normal multivariate distributions are available, suitable models often can only be obtained by direct construction of the conditional distribution. Thus, usually, we shall need to set up some hierarchical series of conditional distributions, as in Equation (1.1). In this way, by means of the univariate conditional probabilities, we can construct an appropriate multivariate distribution for the states of the series.
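A multivariate distribution can be built in this way from univariate conditional probabilities alone. As a minimal sketch, assume a hypothetical binary series in which $\Pr(Y_t = 1 \mid y_0, \ldots, y_{t-1})$ is logistic in the number of earlier 1s, with $\Pr(Y_0 = 1) = 0.3$. All of these numbers, and the names `p_one` and `joint_from_conditionals`, are illustrative only. Multiplying the conditionals as in Equation (1.1) assigns a probability to every sequence, and these probabilities necessarily sum to one.

```python
import math
from itertools import product

def p_one(history):
    """Hypothetical Pr(Y_t = 1 | history): logistic in the number of earlier 1s."""
    eta = -1.0 + 0.8 * sum(history)
    return 1.0 / (1.0 + math.exp(-eta))

def joint_from_conditionals(ys, p0=0.3):
    """f(y_0, ..., y_n) built as f_0(y_0) times successive conditionals (Equation (1.1))."""
    prob = p0 if ys[0] == 1 else 1.0 - p0
    for t in range(1, len(ys)):
        p = p_one(ys[:t])  # conditions on the whole history y_0, ..., y_{t-1}
        prob *= p if ys[t] == 1 else 1.0 - p
    return prob

total = sum(joint_from_conditionals(ys) for ys in product((0, 1), repeat=5))
print(f"total probability over all 32 sequences: {total:.12f}")
```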
The hierarchical relationship among the ordered states of a series implies, by recursion, that the conditional probabilities are independent, as in Equation (1.1). In this way, univariate analysis may be used, and the model will be composed of a product of terms. Thus, multivariate distributions with known conditional form are usually much easier to handle than those with known marginal form.

However, as we have seen in Section 1.1.4, the general formulation of Equation (1.1) highlights some potential difficulties. Each state depends on a different number of previous states, so the conditional distribution $f_t(y_t \mid y_0, \ldots, y_{t-1})$ is different at each time point; this may or may not be reasonable. Usually, additional assumptions must be introduced. As well, the unconditional distribution of the first state is required. Its choice will depend upon the initial conditions for the process.

The ways in which the conditionality in the distributions is specified will depend on the type of state space. I shall, first, look at general procedures for any kind of state (Sections 1.2.2 and 1.2.3) and, then, at specific ones for continuous states (or perhaps counts, although this is unusual for the true state of a stochastic process) and for categorical states.

1.2.2 Markov processes

An important class of simple processes makes strong assumptions about dependence over time, in this way reducing the amount of empirical information required in order to model them. A series of responses in discrete time such that

$$f_t(y_t \mid y_0, \ldots, y_{t-1}) = f(y_t \mid y_{t-1}) \qquad (1.5)$$

so that each state only depends on the immediately preceding one, is known as a Markov process. In other words, the state at time $t$, given that at $t - 1$, is independent of all the preceding ones. Notice that, in contrast to Equation (1.1), in the simplest situation here, the form of the conditional distribution $f(y_t \mid y_{t-1})$ does not change over time.
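Equation (1.5) fixes only the one-step conditional distribution, so the marginal distributions still depend on where the process starts. The sketch below assumes a hypothetical two-state transition matrix and propagates the marginal distribution forward from several initial distributions. The names `next_marginal` and `marginal_path` are illustrative only. All starting points approach the same equilibrium. Only the chain started in that equilibrium has the same marginal distribution at every step, which for a time-homogeneous Markov chain is what stationarity requires.

```python
def next_marginal(dist, P):
    """Pr(Y_t = j) = sum_i Pr(Y_{t-1} = i) * P[i][j]."""
    return [sum(dist[i] * P[i][j] for i in range(len(dist)))
            for j in range(len(P[0]))]

def marginal_path(initial, P, n_steps):
    """Univariate marginal distributions at times 0, 1, ..., n_steps."""
    path = [list(initial)]
    for _ in range(n_steps):
        path.append(next_marginal(path[-1], P))
    return path

# Hypothetical transition matrix: P[i][j] = Pr(Y_t = j | Y_{t-1} = i).
P = [[0.8, 0.2],
     [0.3, 0.7]]

for initial in ([1.0, 0.0], [0.0, 1.0], [0.6, 0.4]):
    path = marginal_path(initial, P, 30)
    print(initial, "step 1:", [round(p, 3) for p in path[1]],
          "step 30:", [round(p, 3) for p in path[-1]])
```

For this matrix, the equilibrium is (0.6, 0.4), the solution of $\pi = \pi P$. Started from a single state, the marginal distribution changes at every step before settling down.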
This definition can be extended both to dependence further back in time and to continuous time (Section 6.1). Recall, however, that such a conditional specification is not sufficient to imply stationarity. This will also depend on the initial conditions and/or on the length of time that the process has been functioning.

Equation (1.5) specifies a Markov process of order one. More generally, if the dependence extends only for a short, fixed distance back in time, so that the present state depends only on the $M$ preceding states, it is said to be a Markov process of order $M$. The random variables $Y_t$ and $Y_{t-s}$ are conditionally independent for $s > M$, given the intermediate states. It is often necessary to assume that a stochastic process has some finite order (considerably less than the number of observations available over time), for otherwise it is nonstationary and its multivariate distribution continues to change with each additional observation, as we saw above.

If the response variable for a Markov process can only take discrete values or states, it is known as a Markov chain (Chapter 5 and Section 6.1.3). Usually, there
will be a finite number of possible states (categories), and observations will be made at equally spaced discrete intervals. Interest often centres on the conditional transition probabilities of changes between states. When time is continuous, we have to work with transition rates or intensities (Section 3.1.3) between states, instead of probabilities. When the response variable for a Markov process is continuous, we have a diffusion process (Chapter 10).

1.2.3 State dependence

One simple way to introduce Markov dependence is to construct a regression function for some parameter of the conditional probability distribution such that it incorporates previously generated states directly, usually in addition to the other covariates. The states may be either continuous or categorical. Thus, the location parameter $\mu_t$ (often the mean) of the conditional distribution of the series could depend on the previous state in the following way:

$$\mu_t = g(\mathbf{x}_t; \boldsymbol{\beta}) + \rho\, y_{t-1} \qquad (1.6)$$

for a process of order one, where $\mathbf{x}_t$ is a vector of possibly time-varying covariates in some regression function $g(\cdot)$, possibly nonlinear in the parameters $\boldsymbol{\beta}$. If there are more than two categorical states, $\mu_t$ usually will have to be replaced by a vector (often of conditional probabilities), and $y_{t-1}$ will need to be modelled as a factor variable. Here, the present location parameter, that is, the prediction of the mean present state, depends directly on the previous state of the process, as given by the previous observed response. Thus, this can be called a state dependence model.

An easy way to introduce state dependence is to create lagged variables from the response and use them as covariates. For a first-order Markov process ($M = 1$), the values in the vector of successive observed states are displaced by one position to create the lagged covariate, so that the current state in the response vector corresponds to the previous state in the lagged vector.
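The lagging step can be sketched in Python with NumPy. For illustration only, assume a linear regression function $g$ with a single covariate and normal errors; fitting Equation (1.6) by maximum likelihood, conditional on the first state, then reduces to least squares on the lagged design. The simulated series and the function name are hypothetical.

```python
import numpy as np


def fit_state_dependence(y, x):
    """Least-squares fit of mu_t = b0 + b1 * x_t + rho * y_{t-1}.

    Assumes a linear g(.) with one covariate and normal errors (illustrative only).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    response = y[1:]   # current states y_1, ..., y_n (the first state is dropped)
    lagged = y[:-1]    # previous states, displaced by one position
    design = np.column_stack([np.ones(len(response)), x[1:], lagged])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef        # [b0, b1, rho]


# Hypothetical simulated series with b0 = 1, b1 = 2, rho = 0.6.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 1.0 + 2.0 * x[t] + 0.6 * y[t - 1] + rng.normal()
coef = fit_state_dependence(y, x)
print(coef)
```

The slicing `y[1:]` against `y[:-1]` is exactly the one-position displacement described above, and it is why the first observation cannot enter the response vector.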
However, this means that we cannot use the first observed state without additional assumptions, because we do not know its preceding state. If a higher-order model ($M > 1$) is used, the displacement will be greater, and more than one observation will be lost in this way.

One case of this type of model for categorical states is the Markov chain mentioned above (see Chapter 5 and Section 6.1.3). A case in which the state space is continuous is the autoregression of time series, at least when there are no time-varying covariates (Chapter 9).

1.2.4 Serial dependence

For a continuous state space, a quite different possibility also exists, used widely in classical time series analysis (Chapter 9). This is to allow some parameter to depend, not directly on the previous observed state, but on the difference between
that previous state and its prediction at that time. For a location parameter, this might be

$$\mu_t = g(\mathbf{x}_t; \boldsymbol{\beta}) + \rho\,[\,y_{t-1} - g(\mathbf{x}_{t-1}; \boldsymbol{\beta})\,] \qquad (1.7)$$

Notice that this will be identical to the state dependence model of Equation (1.6) if there are no covariates (apart from a reparametrisation of the intercept), and also that it generally will not work for a categorical state space, because the subtraction has no meaning.

Dependence among states is now restricted to a more purely stochastic component, the difference between the previous observed state and its location regression function, called the recursive residual or innovation. This may be seen more clearly by rewriting Equation (1.7) in terms of these differences:

$$\mu_t - g(\mathbf{x}_t; \boldsymbol{\beta}) = \rho\,[\,y_{t-1} - g(\mathbf{x}_{t-1}; \boldsymbol{\beta})\,] \qquad (1.8)$$

The new predicted difference is related to the previous observed difference by $\rho$. As in Equation (1.6), the previous observed value $y_{t-1}$ is being used to predict the new expected value $\mu_t$, but here it is corrected by its previous prediction.

I shall call this a serial dependence model. The present location parameter depends on how far the previous state was from its prediction given by the corresponding location parameter, that is, on the previous residual or innovation. Thus, both state and serial dependence yield conditional models: the response has some specified conditional distribution given the covariates and the previous state.

In contrast, for some reason, most models constructed by probabilists as generalisations of normal-distribution serial dependence time series to other distributions require the marginal distribution to have some specified form (see, among others, Lawrance and Lewis, 1980). These are complex to construct mathematically and difficult to interpret scientifically.

1.2.5 Birth processes

If the state space has a small finite number of states, that is, it is categorical, different methods may need to be used.
One possibility may be to condition on the number of previous recurrent events or, more generally, on the number of times that the process was previously in one or more of the states. The classical situation arises when the two possible states are the presence or absence of a recurrent event. Then, a birth process counts the number of previous such events:

$$\pi_t = g(\mathbf{x}_t; \boldsymbol{\beta}) + \rho\, n_{t-1} \qquad (1.9)$$

where $\pi_t$ is the conditional probability of the event at time $t$ and $n_{t-1}$ is the number of previous such events. Even without time-varying covariates, this process is clearly nonstationary. Generally, some function of $\pi_t$, such as a logit transformation, will be used instead to ensure that it cannot take impossible values. More often, the process will be defined in terms of the (log) rate or intensity of events (Section 4.1) instead of
the probability. I shall describe more general types of dependence for finite state spaces in Section 4.1.1, once I have introduced this concept of intensity.

Only three basic types of dependence have been presented here. Other, more specific ones may often be necessary in special circumstances. Some of these will be the subject of the chapters to follow.

1.3 Selecting models

As always in statistical work, the ideal situation occurs when the phenomenon under study is well enough known that scientific theory can tell us how to construct an appropriate model of the stochastic process. However, theory never arises in a vacuum, but must depend on empirical observations as well as on the human power of abstraction. Thus, we must begin somewhere! The answers to a series of questions may help in constructing useful models.

1.3.1 Preliminary questions

When confronting the modelling of observations from some stochastic process, one may ask various fundamental questions:

How was the point in time to begin observation chosen?
– Is there a clear time origin for the process?
– What role do the initial conditions play?
Are observations made systematically or irregularly over time?
– If observations are irregularly spaced, are these time points fixed in advance, random, or dependent on the previous history of the process itself?
– Is the process changing continuously over time or only at specific time points?
– Are all changes in the process recorded or only those when an observation happens to be made?
– Does a record at a given time point indicate a new value in the series then or only that it changed some time since the previous observation?
Is the process stationary?
– Is the process increasing or decreasing systematically over time, such as a growth curve?
– Is there periodic (daily, seasonal) variation?
– Does the process change abruptly at some time point(s)?
– Are there long-term changes?
Is more than one (type of) response recorded at each time point?
– Can several events occur simultaneously?
– Do some quantitative measurements accompany the occurrence of an event?
Does what is presently occurring depend on the previous history of the process?
– Is it sufficient to take into account what happened immediately previously (the Markov assumption)?
– Is there a cumulative effect over the history of the process, such as a birth effect?
Is the process influenced by external phenomena?
– Can these be described by time-varying covariates?
– Is some other unrecorded random process, such as the weather, affecting the one of interest?
If there is more than one series, do the differences among them arise solely from the randomness of the process?
– Are the differences simply the result of varying initial conditions?
– Do the series differ because of their dependence on their individual histories?
– Can part of the difference among the series be explained by time-constant covariates with values specific to each series?
– Are there static random differences among the series?
– Does each series depend on a different realisation of one or more time-varying covariates (different weather recorded in different locations)?
– Are there unrecorded random processes external to the series influencing each of them in a different way (different weather in different locations, but never recorded)?

Possible answers to some of these questions may be indicated by appropriate plots of the series under study. However, most require close collaboration with the scientists undertaking the study in order to develop a fruitful interaction between empirical observation and theory.

1.3.2 Inference

Some objective empirical procedure must be available in order to select among models under consideration as possible descriptions of an observed stochastic process. With the exception of preliminary descriptive examination of the data, all analyses of such processes in this book will be based on the construction of probabilistic models. This means that the probability of the actually observed process(es) always can be calculated for any given values of the unknown parameters. This is called the likelihood function, a function of the parameters for fixed observed data.
Here, all inferences will be based on this. Thus, the basic assumption is the Fisherian one that a model is more plausible or likely if it makes the observed data more probable. A set of probability-based models that one is entertaining as having possibly generated the observed data defines the likelihood function. If this function is so complex as to be intractable, then there is a good chance that it cannot provide useful and interpretable information about the stochastic process. However, the probability of the data for fixed parameter values, a likelihood value, does not, by itself, take into account the complexity of the model. More complex models generally will make the observed data more probable, but simpler models are more scientifically desirable. To allow for this, minus the logarithm
of the maximised likelihood can be penalised by adding to it some function of the number of parameters estimated. Here, I shall simply add the number of parameters to this negative log likelihood, a form of the Akaike (1973) information criterion (AIC). Smaller values will indicate models fitting relatively better to the data, given the constraint on the degree of complexity.

Further reading

Jones and Smith (2001) give an elementary introduction to stochastic processes. Grimmett and Stirzaker (1992), Karlin and Taylor (1975; 1981), Karr (1991), and Ross (1989) provide more advanced standard general introductions. The reader also may like to consult some of the classical works such as Bailey (1964), Bartlett (1955), Chiang (1968), Cox and Miller (1965), Doob (1953), and Feller (1950). Important recent theoretical works include Grandell (1997), Guttorp (1995), Küchler and Sørensen (1997), and MacDonald and Zucchini (1997). More applied texts include Snyder and Miller (1991), Thompson (1988), and the excellent introduction to the uses of stochastic processes in molecular biology, Ewens and Grant (2001). An important book on multivariate dependencies is Joe (1997). For inferences using the likelihood function and the AIC, see Burnham and Anderson (1998) and Lindsey (2004).

Exercises

1.1 Describe several stochastic processes that you can encounter while reading a daily newspaper.
(a) What is the state space of each?
(b) What are the possible events?
(c) Is time discrete or continuous?
(d) What covariates are available?
(e) Will interest centre primarily on durations between events or on the states themselves?
(f) What types of dependencies might be occurring over time?
1.2 Consider the following series, each over the past ten years:
(a) the monthly unemployment figures in your country,
(b) the daily precipitation in the region where you live, and
(c) the times between your visits to a doctor.
For each series:
(a) Is the state space categorical or continuous?
(b) Is time discrete or continuous?
(c) What types of errors might be introduced in recording the observations?
(d) Is there a clear time origin for the process?
(e) Is it plausible to assume stationarity?
(f) Can you expect there to be dependence among the responses?
(g) Can you find appropriate covariates upon which the process might depend?
2 Basics of statistical modelling

In this chapter, I shall review some of the elementary principles of statistical modelling, not necessarily specifically related to stochastic processes. In this way, readers may more readily understand how models of stochastic processes relate to other areas of statistics with which they are more familiar. At the same time, I shall illustrate how many of these standard procedures are not generally applicable to stochastic processes, using as an example a study of the duration of marriages before divorce. As in subsequent chapters, I shall entertain a wide variety of distributional assumptions for the response variable and use both linear and nonlinear regression functions to incorporate covariates into the models.

2.1 Descriptive statistics

Let us first examine the data that we shall explore in this chapter.

Divorces Marriage may be conceptualised as some kind of stochastic process describing the relationships within a couple, varying over time, that may eventually lead to rupture. In this light, the process ends at divorce, and the duration of the marriage is the centre of interest. In order to elucidate these ideas, a study was conducted in 1984 of all people divorcing in the city of Liège, Belgium, in that year, a total of 1727 couples (for the data, see Lindsey, 1992, pp. 268–280). Here, I shall examine how the length of marriage before divorce may vary with certain covariates: the ex-spouses' ages and the person applying for the divorce (husband, wife, or mutual agreement).

Only divorced people were recorded, so that all durations are complete. However, this greatly restricts the conclusions that can be drawn, and the design of this study makes these data rather difficult to model. The design was retrospective, looking back in time to see how long people had been married when they divorced in 1984. Thus, all divorces occurred within the relatively short period of one year.
On the other hand, the couples married at quite different periods in time. This could have an influence on the occurrence of divorce not captured by age. The study included only those couples who did divorce, so that it can tell us
nothing about the probability of divorce. To be complete, such a study would somehow have to include a 'representative' group of people who were still married. These incompletely observed marriages would be censored (Section 3.1.3). The reader should keep these problems in mind while reading the following analyses.

2.1.1 Summary statistics

Before beginning modelling, it is always useful first to look at some simple descriptive statistics.

Divorces In the divorce study, the mean length of marriage is 13.9 years, with mean ages of 38.5 and 36.1 years, respectively, for the husband and the wife. Because length of marriage will be the response variable, we also should look at its variability; the variance is 75.9, or the standard deviation, 8.7. Thus, an interval of, say, two standard deviations around the mean length of marriage contains negative values; such an interval is meaningless. Symmetric intervals around the mean are not appropriate indicators of variability when the distribution is asymmetric, the typical case for many responses arising from stochastic processes. A more useful measure would be intervals (contours) of equal probability about the mode, which has the highest probability. For this, graphical methods are often appropriate.

2.1.2 Graphics

Visual methods are often especially appropriate for discovering simple relationships among variables. Two of the most useful in the context of modelling are histograms and scatterplots.

Divorces The histogram for duration of marriage is plotted in the upper left-hand graph of Figure 2.1; its shape is typical of duration data. We see that it is indeed skewed, not having the form of a normal distribution. From this, intervals of equal probability can be visualised.

However, this histogram indicates the form of the distribution for all of the couples together.
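The arithmetic behind the remark on symmetric intervals, using only the mean and variance quoted in the summary statistics, takes a few lines of Python:

```python
import math

mean_length = 13.9  # mean length of marriage in years (from the text)
variance = 75.9     # its variance (from the text)

sd = math.sqrt(variance)
lower, upper = mean_length - 2 * sd, mean_length + 2 * sd
print(f"sd = {sd:.1f}; mean +/- 2 sd = ({lower:.1f}, {upper:.1f})")
# prints: sd = 8.7; mean +/- 2 sd = (-3.5, 31.3)
```

The negative lower limit is impossible for a duration, which is the point of the argument: the distribution is skewed, so a symmetric interval misrepresents it.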
Models based on covariates generally make assumptions about the conditional distribution for each value of the covariates. Thus, a linear regression model carries the assumption that the conditional distribution is normal with constant variance for all sets of covariate values. This histogram does not provide information about such conditional distributions. Consider then, as an example, the explanatory variable, applicant. We can examine the histograms separately for each of the three types of applicant. These are also given in Figure 2.1. We can see that the form of the histogram differs quite substantially among these three groups.

The above procedure is especially suitable when an explanatory variable has only a few categories. If a quantitative variable, like age, is involved, another approach may be more appropriate. Let us, then, see how to examine graphically
the relationship between the response variable, duration of marriage, and the husband's age. The scatterplot of length of marriage in relation to this age is given in Figure 2.2 (ignore, for the moment, the two diagonal lines). As might be expected, there is a rather strict upper relationship between the two variables. Length of marriage is, with few exceptions, constrained to be no greater than about the husband's age minus 20 years. We can also notice the interesting fact that a few very old men married late in life and divorced rather quickly thereafter.

[Figure 2.1: four histogram panels, "All couples", "Husband applicant", "Wife applicant", and "Mutual agreement"; horizontal axis Length of marriage (0–50 years), vertical axis Proportion of divorces (0.00–0.06).]
Fig. 2.1. Histograms showing the proportions of couples divorcing after various lengths of marriage in Liège in 1984, grouped into intervals of five years: all couples and separately by applicant.

[Figure 2.2: scatterplot of Length of marriage (0–50 years) against Husband's age (20–80 years), with fitted lines labelled Linear and Quadratic.]
Fig. 2.2. A scatterplot showing the relationship between length of marriage before divorce in Liège in 1984 and the age of the husband, with the fitted normal distribution regression lines.

To relate this graph to histograms, consider the density of points along a vertical line for some fixed age of the husband. If we move this line to the left or right, we see that the mass of points along it shifts. This indicates that, if we produced histograms for different age groups, they would have different shapes, as did those for the three applicant groups. Thus, both of these graphical methods indicate that the conditional distribution of length of marriage changes with the covariates, applicant and husband's age.

As we see from these graphs, the assumptions of normality and constant variance of the conditional distribution of length of marriage, given the applicant group or husband's age, do not appear to be fulfilled either. Nevertheless, I shall first attempt to fit such models to these data.

2.2 Linear regression

One of the most widely (mis)used tools in all of statistics is linear regression. This is often misnamed 'least squares' regression. However, least squares estimation refers to the study of a deterministic process, whereby the 'best' straight line is fitted through a series of points. In statistical modelling, the interpretation of linear regression is quite different, although the technical calculations remain the same.
2.2.1 Assumptions

Suppose that we have observations of some response variable $y$, say the state of a stochastic process, such as a time series or the time between recurrent events. As well, we have some accompanying explanatory variables or covariates, $x_j$, to which, we believe, the response is related. Then, applying normal distribution linear regression carries the assumption that this response has a normal or Gaussian distribution with probability density

$$f(y_i; \mu_i, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y_i - \mu_i)^2}{2\sigma^2}} \qquad (2.1)$$

conditional on the values of the covariates.

In addition, the mean $\mu_i$ of the responses is assumed to change in some deterministic way with the values of these covariates, that is,

$$\mu_i = \beta_0 + \sum_j \beta_j x_{ij} \qquad (2.2)$$

In this function, $\beta_0$ is the intercept, and $\beta_j$ is the slope for the covariate $x_j$. Then, this regression equation specifies how the mean of the distribution changes for each value of the covariates. On the other hand, the variance $\sigma^2$ of $Y$ is assumed to remain constant.

The model is not just the deterministic description of Equation (2.2). As an integral part of it, individual responses are dispersed randomly about the mean in the specific form of the normal distribution in Equation (2.1) with the given variance. This is illustrated in Figure 2.3. Such variability usually will be an integral part of the scientific phenomenon under study, not just measurement error.

This regression function is called linear by statisticians for the wrong reason: it is linear in the parameters $\beta_j$. This is irrelevant for scientific modelling. On the other hand, the shape of the curve may take certain restricted nonlinear forms, say if $x_j^2$ is included as a covariate, as we shall see below.

Once we understand that such a model is describing changes in a normal distribution, we easily can imagine various extensions:
– Other, more suitable, distributions can replace the normal distribution; these will most often be asymmetric.
– The dispersion (here, the variance) about the regression curve need not be held constant.
– The regression equation (2.2) may more appropriately be replaced by some wider class of nonlinear relationships.
These generally will permit more realistic analysis of the data at hand. I shall begin to examine them more closely in Section 2.4. However, first it will be useful to review in some more detail the standard models based on the normal distribution.

2.2.2 Fitting regression lines

Linear regression models are well known and can be fitted using any standard statistical software.
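As a sketch of what such software computes for the normal model of Equations (2.1) and (2.2), the following NumPy function obtains the maximum likelihood estimates (least squares for the coefficients, residual sum of squares divided by $n$ for the variance) and the AIC on the scale used in this book, minus the maximised log likelihood plus the number of estimated parameters. Counting the variance as one of those parameters is an assumption of this sketch, and the data are simulated, not the divorce data.

```python
import numpy as np


def fit_normal_lm(X, y):
    """Maximum likelihood fit of a normal linear regression.

    Returns (beta_hat, sigma2_hat, aic) with aic = -log L + k, where k counts the
    regression coefficients plus the variance (an assumed parameter count).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares = ML estimate of beta
    resid = y - X @ beta
    sigma2 = resid @ resid / n                      # ML estimate divides by n, not n - p
    neg_loglik = 0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return beta, sigma2, neg_loglik + (p + 1)


# Hypothetical data: one covariate with intercept 2 and slope 3.
rng = np.random.default_rng(42)
x = rng.uniform(20, 80, size=300)
y = 2.0 + 3.0 * x + rng.normal(0.0, 4.0, size=300)
X = np.column_stack([np.ones_like(x), x])
beta_hat, sigma2_hat, aic = fit_normal_lm(X, y)
print(beta_hat, sigma2_hat, aic)
```

The closed form for the negative log likelihood follows from substituting $\hat\sigma^2$ back into Equation (2.1): the squared residuals then sum to $n\hat\sigma^2$, leaving $\tfrac{n}{2}[\log(2\pi\hat\sigma^2) + 1]$.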
[Figure 2.3: plot of y (0–40) against x (2–10).]
Fig. 2.3. A graphical representation of a simple linear normal regression showing the linear form of the regression function and the constant shape of the normal distribution about it.

Divorces Here, the response variable is a duration, so the normal distribution will almost certainly be inappropriate, as we have already seen from the graphical methods above. I shall, nevertheless, proceed first with models based upon it. Let us, then, attempt first to explore the relationship between the length of marriage and the age of the husband using linear regression. The estimated equation is

$$\hat{\mu} = -14.4 + 0.73\, x_1$$

where $x_1$ is the husband's age. The positive sign of the slope indicates, as might be expected, that mean length of marriage increases with the husband's age. Thus, according to this model, the length of marriage is estimated to increase, on average, by about three-quarters of a year for each additional year of husband's age. This relationship is plotted as the solid line on the scatterplot in Figure 2.2. It is not very convincing.

Likelihood

We can ask whether the inclusion of a given covariate actually does help to predict the response variable. As briefly outlined in Section 1.3, one intuitive way to do this is to look at how probable each model makes the observed data, called its likelihood. Often, this is easier to study if minus its logarithm is used. Then, smaller values indicate models 'closer' to the data. A problem with this procedure is that more complex models, those with more estimated parameters, even ones that
are not really necessary, will generally make the data more probable. However, we usually prefer the simplest adequate model possible.

One solution to this dilemma is to penalise more complex models by using an information criterion to compare models. These are designed especially to help in selecting among competing models. Thus, the most widely used, the Akaike information criterion (AIC), involves minus the logarithm of the maximised likelihood, penalised by adding to it the number of parameters in the model estimated from the data. This penalty prevents the measure of suitability of the models from decreasing too quickly as they become more complex. Note that information criteria have no absolute meaning; they only provide a guide for comparing different models applied to the same data. Beware also that most linear and generalised linear modelling software returns twice the negative log likelihood, so that, if AICs are supplied, they will be twice those given here.

Divorces We can now turn to the specific question as to whether or not husband's age improves the prediction of length of marriage. This implicitly involves the comparison of two models: those with and without the covariate, husband's age. That with it has an AIC of 5076.5. The model without this covariate simply fits a common mean (that calculated above) to all of the responses. Its AIC is 6090.6, indicating that the first model is a great improvement on this. Of course, this is obvious from Figure 2.2; even if the regression model involving husband's age does not represent the responses very well, it does much better than simply assuming a common mean length of marriage for everyone.

Multiple regression

One possible way to proceed to more nonlinear forms of regression curves, remaining in the context of normal linear regression, is to add the square of a quantitative covariate to the regression equation. This is a simple case of multiple regression.
Here, it will produce a nonlinear model even though statisticians call it linear!

Divorces If we add the square of husband's age, the estimated equation becomes

$$\hat{\mu} = -20.6 + 1.04\, x_1 - 0.0035\, x_1^2$$

This addition does not improve the fit nearly as much as did inclusion of the linear term in husband's age: the AIC is only reduced further to 5068.6. This relationship is plotted as the dashed line on the scatterplot in Figure 2.2; it can be seen to be only slightly curved.

The same analysis can be carried out for the age of the wife, say $x_2$. In fact, these models fit better, with AICs of 5017.7 for the linear curve and 5009.7 for the quadratic. Perhaps surprisingly, if we combine the two models, with a quadratic relationship for both husband's and wife's age, we obtain a further substantial improvement. The AIC is 4917.1. However, the quadratic term for the wife's age is not necessary in this equation; eliminating it reduces the AIC to 4916.4.
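The model comparisons in this passage (common mean, linear, quadratic) can be mimicked on simulated data. This NumPy sketch fits each nested normal model, reports the AIC on this book's scale (minus the maximised log likelihood plus the number of estimated parameters, here assumed to include the variance), and also the doubled value that software reporting $-2\log L + 2k$ would show. The data-generating curve and all numbers are hypothetical.

```python
import numpy as np


def normal_aic(X, y):
    """-log L at the normal ML fit plus k (coefficients + variance, an assumed count)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ beta) ** 2)
    return 0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0) + (p + 1)


# Hypothetical data with a mildly curved mean.
rng = np.random.default_rng(1)
age = rng.uniform(20, 80, size=500)
length = -20.0 + 1.0 * age - 0.005 * age ** 2 + rng.normal(0.0, 3.0, size=500)

ones = np.ones_like(age)
models = {
    "common mean": np.column_stack([ones]),
    "linear": np.column_stack([ones, age]),
    "quadratic": np.column_stack([ones, age, age ** 2]),
}
aics = {name: normal_aic(X, length) for name, X in models.items()}
for name, value in aics.items():
    # Software reporting -2 log L + 2k would show exactly twice this value.
    print(f"{name:12s} AIC = {value:8.1f}   doubled scale = {2 * value:8.1f}")
```

Only differences in AIC between models fitted to the same responses are meaningful; the absolute values carry no information on their own.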
[Figure 2.4: contour plot of Wife's age against Husband's age (both 20–80 years), and three-dimensional perspective plot of Length of marriage (0–40 years) over the two ages.]
Fig. 2.4. Contour and three-dimensional plots of the model for mean length of marriage as it depends on the two ex-spouses' ages.

Interactions

We also can consider interactions between quantitative covariates, obtained by multiplying them together.

Divorces After some experimentation, we discover that the linear interaction is necessary, as well as the quadratic term for wife's age and the quadratic and cubic terms for husband's age. The final estimated equation, with a substantial reduction in AIC to 4710.3, contains an intercept and terms in $x_1$, $x_2$, $x_1^2$, $x_2^2$, $x_1^3$, and $x_1 x_2$, with coefficients (in absolute value) 13.42, 1.64, 0.25, 0.018, 0.034, 0.00039, and 0.0068, respectively.

It is more difficult to plot a regression function when there are two covariates, but it is still possible. Two ways, as contours and as a three-dimensional perspective plot, are shown in Figure 2.4 for this model of length of marriage as it depends on the ages of the husband and the wife.

Notice that this model is a completely arbitrary construction, obtained by empirical search. If, say, we wanted to approximate some unknown nonlinear function by a Taylor series expansion, we would want to use all terms up to a certain order, usually second. Here, I have included one third-order term ($x_1^3$) but not the three others (including two further interactions). However, none of them improves the model significantly.
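Each polynomial or interaction term is simply another column in the design matrix of the multiple regression of Equation (2.2). This small NumPy sketch builds such columns from two covariates. The helper and its term labels are hypothetical, and the ages are made-up values, not the Liège data.

```python
import numpy as np


def polynomial_design(x1, x2, terms):
    """Design matrix whose columns are the named polynomial terms in x1 and x2.

    The term labels are hypothetical names chosen for this sketch.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    columns = {
        "1": np.ones_like(x1),
        "x1": x1, "x2": x2,
        "x1^2": x1 ** 2, "x2^2": x2 ** 2, "x1*x2": x1 * x2,  # second-order terms
        "x1^3": x1 ** 3, "x2^3": x2 ** 3,                    # third-order terms ...
        "x1^2*x2": x1 ** 2 * x2, "x1*x2^2": x1 * x2 ** 2,    # ... and their interactions
    }
    return np.column_stack([columns[t] for t in terms])


# Terms retained in the final model described above.
final_terms = ["1", "x1", "x2", "x1^2", "x2^2", "x1^3", "x1*x2"]
# A complete second-order (Taylor-type) expansion, for comparison.
second_order_terms = ["1", "x1", "x2", "x1^2", "x2^2", "x1*x2"]

husband_age = np.array([25.0, 40.0, 63.0])  # hypothetical values
wife_age = np.array([23.0, 37.0, 58.0])
X = polynomial_design(husband_age, wife_age, final_terms)
print(X.shape)  # (3, 7)
```

Listing the terms explicitly makes the point of the text visible: the final model is a hand-picked subset of the full third-order expansion, not a complete expansion of any fixed order.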
2.3 Categorical covariates

Not all covariates are quantitative, as is age in the divorce example. Some, called factor variables, may indicate qualitatively to which subgroup each individual belongs. As we saw in Figure 2.1, a categorical variable divides the population into subgroups, each with a distinct conditional distribution. In the context of normal models, this simply implies that the mean response will be different. Traditionally, a model based on the normal distribution and containing such a variable was called 'analysis of variance'. However, it can, in fact, be handled as a special case of multiple regression.

Divorces For the divorce data, we saw above when examining histograms that the person applying for the divorce is such a variable. The means are, respectively, 16.7 when the husband applies, 13.9 when the wife does, and 11.6 when both do (mutual consent).

2.3.1 Analysis of variance

With one categorical covariate, an analysis of variance model can be written

$$\mu_k = \mu + \alpha_k \qquad (2.3)$$

where $k$ indexes the categories of the covariate. However, as it stands, this model has one too many parameters. We must add a constraint. There is no unique way to do this. The choice generally will depend on interpretability. I shall consider two useful ways.

Another way to look at this problem is to realise that a categorical variable cannot, numerically, be summarised in one number. Instead, it requires a set of indicator or dummy variables, indicating to which category each observation belongs. Thus, as we shall see, Equation (2.3) can be written equivalently as the multiple regression of Equation (2.2) using such variables.

Baseline constraint

One way to add a constraint is to set $\alpha_k = 0$ for one value of $k$. This is called the baseline constraint because one category is chosen as a baseline of comparison for all of the others.
Now let us define indicator variables, each of which takes the value 0 or 1, depending on whether or not the observation is in that particular category. However, this is slightly redundant: if we know that the observation is not in any of the categories but one, then it must be in that remaining category. Thus, we require only one fewer indicator variable than the number of categories of the original variable. The category without an indicator variable is the one with $\alpha_j = 0$. This yields one possible constraint on the parameters mentioned above.
These indicator variables can be used as covariates in the multiple regression of Equation (2.2). Fortunately, most software can handle factor variables automatically, so that we do not need to set up the indicators ourselves. Care must, however, be taken in interpreting the results in terms of the constraint, or equivalently, the set of indicator variables, employed. Often, by default, the software chooses $\alpha_1 = 0$, or equivalently uses indicator variables for all but the first category.
Divorces For the divorce data, using the applicant as a factor variable with the baseline constraint yields $\hat\mu = 16.7$, the mean number of years of marriage for the first category, husband applying. Then, $\hat\alpha_1 = 0$, $\hat\alpha_2 = -2.8$, the difference in mean from the first category for wife applying, and $\hat\alpha_3 = -5.1$, the difference from the first category for mutual consent. Thus, the means given above are reproduced. The AIC is 6044.0, showing that this categorical variable does not help nearly as much in predicting the length of marriage as do the ages of the two ex-spouses.
Mean constraint
Instead of setting $\alpha_j$ to 0 for one of the categories (above it was the first, $\alpha_1$), another useful possibility for interpretation in many contexts is a constraint such that $\mu$ is the mean and the $\alpha_j$ are differences from it for each category. The appropriate constraint is $\sum_j \alpha_j = 0$. This is called the mean constraint, or sometimes the conventional constraint, because it was classically the one most often used in analysis of variance.
Here, the corresponding indicator variables are more complex. Let us start by specifying the values for all categories except the last. Then, an appropriate indicator variable will take the value 1 if the observation is in the given category, 0 if it is in another category except the last, and $-1$ if it is in the last. Again, there will be one fewer indicator variable than the number of categories. The value of $\alpha_j$ for the last category will be minus the sum of those for the other categories, using the fact that $\sum_j \alpha_j = 0$.
Again, these indicator variables can be used as covariates in the multiple regression of Equation (2.2), but many software packages can also do this automatically. However, they generally will report the values only for all but one of the categories, in the way just outlined.
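The two sets of indicator variables just described can be sketched in Python for the three-level applicant factor. The function names are hypothetical; statistical software normally builds these columns automatically, and its default ordering of the levels may differ from the one assumed here.

```python
def baseline_indicators(category, levels):
    """Baseline constraint: one 0/1 indicator per level except the first (alpha_1 = 0)."""
    return [1 if category == level else 0 for level in levels[1:]]


def mean_constraint_indicators(category, levels):
    """Mean (sum-to-zero) constraint: 1 for the observation's own level,
    0 for the other non-last levels, and -1 on every indicator for the last level."""
    if category == levels[-1]:
        return [-1] * (len(levels) - 1)
    return [1 if category == level else 0 for level in levels[:-1]]


APPLICANT = ["husband", "wife", "mutual"]

for c in APPLICANT:
    print(c, baseline_indicators(c, APPLICANT), mean_constraint_indicators(c, APPLICANT))
```

Under the mean constraint, each indicator column sums to zero over one observation per category, which is what forces $\sum_j \alpha_j = 0$.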
Divorces For the divorce data, the values obtained using these constraints are $\hat\mu = 14.1$, the mean number of years of marriage, $\hat\alpha_1 = 2.6$, the difference from this mean for husband applying, $\hat\alpha_2 = -0.1$, the difference from the mean for wife applying, and $\hat\alpha_3 = -2.5$, the difference for mutual consent. Notice that this value, $\hat\mu = 14.1$, is not equal to the global mean calculated above. It is rather the unweighted mean of the means for each category: $(16.7 + 13.9 + 11.6)/3 \approx 14.1$.
With this parametrisation of the model, we see more easily that the length of marriage is about average when the wife applies, being considerably longer when the husband applies and about as much shorter when there is mutual consent. Notice that the differences in mean length of marriage between categories are estimated to be the same in the two parametrisations. Thus, in the first, the difference between husband and wife applying was 2.78; in the second, it is $2.64 - (-0.14) = 2.78$. Of course, the AIC is again 6044.0, because this is just a different parametrisation of the same model.
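Because the one-way model simply reproduces the category means, both parametrisations can be computed directly from those means. This sketch uses the rounded means quoted in the text (16.7, 13.9, 11.6), so its mean-constraint values differ slightly in the second decimal from the reported 2.64 and −0.14, which were presumably computed from unrounded data. The function names are illustrative.

```python
def baseline_params(means):
    """Baseline constraint (alpha_1 = 0): mu is the first category's mean."""
    mu = means[0]
    return mu, [m - mu for m in means]


def mean_constraint_params(means):
    """Mean constraint (sum of alphas = 0): mu is the unweighted mean of the means."""
    mu = sum(means) / len(means)
    return mu, [m - mu for m in means]


means = [16.7, 13.9, 11.6]  # husband, wife, mutual consent (rounded values from the text)
mu_b, alpha_b = baseline_params(means)
mu_m, alpha_m = mean_constraint_params(means)

print(round(mu_b, 2), [round(a, 2) for a in alpha_b])  # 16.7 [0.0, -2.8, -5.1]
print(round(mu_m, 2), [round(a, 2) for a in alpha_m])  # 14.07 [2.63, -0.17, -2.47]
# Differences between categories do not depend on the parametrisation.
print(round(alpha_b[0] - alpha_b[1], 2), round(alpha_m[0] - alpha_m[1], 2))  # 2.8 2.8
```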
2.3.2 Analysis of covariance
More complex models will contain both quantitative and qualitative variables. Traditionally, this was called ‘analysis of covariance’, but it is just another case of multiple regression. A categorical covariate can be used to introduce a different curve for each of its categories. Thus, for a straight line in a regression function like Equation (2.2), it will allow a different intercept for each category of the qualitative covariate:
$$\mu_{ij} = \beta_{0j} + \sum_k \beta_k x_{ik} \qquad (2.4)$$
where, again, $j$ indexes the categories of the qualitative covariate.
Divorces To continue the divorce example, we can model simultaneously the ages of the two spouses and the applicant for the divorce (husband, wife, or mutual). Thus, at this first stage, we are assuming that the mean length of marriage depends on the ex-spouses’ ages in the same way for each type of application. This improves the model only slightly: the AIC is 4911.8, as compared to 4916.4 given above with the same intercept for all types of application (both models without interactions between the ages).
In order to be able to plot the regression curves easily, I shall use the simpler model with only the husband’s age. (The AIC is 5064.2, a much worse model, as might be expected; again, it does not provide much improvement compared to the model with the same intercept for all types of application, given above, which had 5068.6.) The three parallel curves, with different intercepts, are plotted in the left graph of Figure 2.5. There is not much separation between these lines.
Interactions
A still more complex model allows not only the intercepts but also the slopes to differ among the categories of the categorical variable. This model can be written
$$\mu_{ij} = \beta_{0j} + \sum_k \beta_{kj} x_{ik} \qquad (2.5)$$
Here, the quantitative covariates are said to interact with the categorical covariate.
Divorces When both ages are included in the model for the divorce data, including necessary interactions, the AIC is reduced to 4695.5. Again, in order to be able to plot the regression curves easily, I shall use the model without the wife’s age. (This has an AIC of 5049.0.) The curves are plotted in the right graph of Figure 2.5. They are quite different, with that for mutual consent levelling off more rapidly with age. (Here, the model could be simplified by eliminating the two parameters for interactions between type of application and the square of the husband’s age.)
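The difference between the designs for Equations (2.4) and (2.5) can be sketched as follows. This assumes one indicator per category with no common $\beta_0$ column (cell-means coding), an illustrative choice that matches the category-specific intercepts $\beta_{0j}$ written above; with baseline coding the columns would differ, but the fitted curves would be the same. The function and variable names are hypothetical.

```python
def ancova_row(x, category, levels, common_slopes=True):
    """Design row for one observation with quantitative covariates x.

    common_slopes=True : Eq. (2.4), one intercept per category, shared slopes.
    common_slopes=False: Eq. (2.5), intercepts and slopes both vary by category.
    """
    ind = [1 if category == level else 0 for level in levels]
    if common_slopes:
        return ind + list(x)
    # Each covariate is multiplied by each category indicator (the interactions).
    return ind + [d * xk for d in ind for xk in x]


LEVELS = ["husband", "wife", "mutual"]
age = 40
x = [age, age ** 2]  # husband's age and its square

print(ancova_row(x, "wife", LEVELS))                       # parallel curves
print(ancova_row(x, "wife", LEVELS, common_slopes=False))  # category-specific slopes
```

With three categories and two covariates, the parallel-curves design has 3 + 2 = 5 columns and the interaction design 3 × (1 + 2) = 9.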
[Fig. 2.5. The fitted normal distribution regression lines for the divorce data, separately for the three types of application (husband, wife, mutual). Left: parallel lines; right: different slopes. Axes: husband’s age (20 to 80 years), length of marriage (0 to 40 years).]

2.4 Relaxing the assumptions
2.4.1 Generalised linear models
Two of the assumptions of normal models listed above (Section 2.2.1) can easily be relaxed by turning to generalised linear models. These provide a slightly wider choice of distribution and allow the mean to depend in a nonlinear way on the linear regression function. The standard distributional choices are normal, gamma, inverse Gauss, Poisson, and binomial. The modification to the regression equation involves some transformation of the mean, called the link function, say $g(\cdot)$:
$$g(\mu_i) = \beta_0 + \sum_j \beta_j x_{ij} \qquad (2.6)$$
This function must be monotone so that its inverse exists.
Gamma distribution
Survival data (Chapter 3), and other responses involving durations, can often usefully be modelled by the gamma distribution
$$f(t; \mu, \phi) = \frac{\phi^{\phi}\, t^{\phi - 1}\, e^{-\phi t/\mu}}{\mu^{\phi}\, \Gamma(\phi)} \qquad (2.7)$$
Because we shall be using this distribution primarily to describe time, I have replaced $y$ by $t$, as I also shall do in the following distributions.
An important special case, obtained by setting $\phi = 1$, is the exponential distribution, equivalently,
$$f(t; \mu) = \frac{1}{\mu} e^{-t/\mu} \quad \text{or} \quad f(t; \lambda) = \lambda e^{-\lambda t} \qquad (2.8)$$
where the mean duration is given by $\mu = 1/\lambda$. I shall not use this distribution here, but shall frequently return to it later (see, especially, Section 4.1.2).
In the gamma distribution, $\mu$ is the mean and $\phi$ is the ratio of the squared mean to the variance, the reciprocal of the square of the coefficient of variation. Thus, an exponential distribution has unit coefficient of variation, which can serve as a standard of comparison.
Each distribution in the generalised linear model family has a default, canonical link function. For the gamma distribution, it is the reciprocal or inverse,
$$g(\mu_i) = \frac{1}{\mu_i} \qquad (2.9)$$
so that a regression equation using it will have the form
$$\frac{1}{\mu_i} = \beta_0 + \sum_j \beta_j x_{ij} \qquad (2.10)$$
Most often, this is inappropriate, and a log link
$$\log(\mu_i) = \beta_0 + \sum_j \beta_j x_{ij} \qquad (2.11)$$
is more useful. Thus, it is possible to change the link function for a given distribution. The link used above, in Equation (2.2) with the normal distribution, is called the identity link,
$$g(\mu_i) = \mu_i \qquad (2.12)$$
which is its canonical link.
Divorces Length of marriage can be thought of as the survival of the marriage. Then, for the divorce data, the gamma distribution may be appropriate. Here, I shall consider only the regression equation with a quadratic dependence on the husband’s age. The resulting curve is plotted as the solid line in Figure 2.6. The surprising form of this curve results, of course, from the inverse link function. The AIC is 4890.1, very much better than any of the preceding models, even those with more covariates and interactions among them. For comparison, the corresponding, previously obtained curve from the normal distribution is plotted in the same graph as the dotted line. The first part of the curve from the gamma distribution with reciprocal link is similar to that from the normal distribution. But, in contrast to the latter curve, it reaches a peak at about 60 years and then goes back down for the older people. This may be an equally reasonable representation of the data.
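The link functions above, and the reason an inverse link can turn a quadratic linear predictor into a curve that rises and then falls, can be sketched as follows. With $1/\mu = \beta_0 + \beta_1 x + \beta_2 x^2$ and $\beta_2 > 0$, the mean is largest where the quadratic is smallest, at $x = -\beta_1/(2\beta_2)$, provided the predictor stays positive there. The coefficients in the usage lines are hypothetical, chosen only to place the peak at 60 years; the fitted gamma coefficients are not reported in the text.

```python
import math

# Link functions g(mu) and their inverses, as in Eqs. (2.9), (2.11) and (2.12).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
}


def mean_from_quadratic(x, b0, b1, b2, link="identity"):
    """Mean for a linear predictor quadratic in x, mapped back through the inverse link."""
    eta = b0 + b1 * x + b2 * x * x
    return LINKS[link][1](eta)


def inverse_link_peak(b1, b2):
    """Age at which mu = 1 / (b0 + b1*x + b2*x**2) is largest (needs b2 > 0)."""
    if b2 <= 0:
        raise ValueError("need b2 > 0 for an interior maximum of mu")
    return -b1 / (2.0 * b2)


# Hypothetical coefficients, for illustration only.
b0, b1, b2 = 0.23, -0.006, 0.00005
print(inverse_link_peak(b1, b2))  # about 60
for age in (30, 50, 60, 70, 80):
    print(age, round(mean_from_quadratic(age, b0, b1, b2, "inverse"), 1))
```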
However, we do not yet know whether the improved fit results primarily from the more nonlinear form of the regression equation (the link function) or from the changed distributional assumption (the gamma distribution). We can best examine this in steps.

[Fig. 2.6. The scatterplot of Figure 2.2 with the fitted gamma and normal regression lines: gamma with inverse link, gamma with identity link, and normal with identity link. Axes: husband’s age (20 to 80 years), length of marriage (0 to 50 years).]

When the identity link is used with the gamma distribution and the quadratic relationship in the husband’s age, the curve is shown as the dashed line in Figure 2.6; it bends in the opposite direction to that from the normal distribution. The equation from the normal distribution is
$$\hat\mu_i = -20.6 + 1.04\, x_{i1} - 0.0035\, x_{i1}^2$$
whereas that from the gamma distribution is
$$\hat\mu_i = -9.82 + 0.49\, x_{i1} + 0.0032\, x_{i1}^2$$
For the gamma distribution, the AIC is 4797.9, as compared to 5068.6 above for the normal distribution with an identity link. This demonstrates that most of the improvement comes from the changed distribution. This conclusion can be confirmed by using the reciprocal link with the normal distribution; this has an AIC of 5059.4. Thus, we must conclude that the conditional distribution of this response variable is decidedly non-normal, at least when a quadratic dependence on the husband’s age is used. Appropriate choice of the distribution is essential when studying stochastic processes.
Log normal and inverse Gauss distributions
Two other distributions in the generalised linear model family, the log normal and the inverse Gauss, may also be appropriate for duration data. The first can be derived from the normal distribution of Equation (2.1) by taking the logarithm of the responses and introducing the Jacobian:
$$f(t; \mu, \sigma^2) = \frac{1}{t\sqrt{2\pi\sigma^2}}\, e^{-\frac{(\log t - \mu)^2}{2\sigma^2}} \qquad (2.13)$$
The second has the form
$$f(t; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2 t^3}}\, e^{-\frac{(t - \mu)^2}{2 t \mu^2 \sigma^2}} \qquad (2.14)$$
Divorces For these two models, the AICs are, respectively, 4942.4 and 5074.6 for the same quadratic regression function with an identity link. Neither of these fits as well as the gamma distribution (AIC 4797.9), although the first is better than the normal distribution (5068.6).
2.4.2 Other distributions
Generalised linear models contain a very restricted set of distributional possibilities. For continuous responses, these are essentially the normal, log normal, gamma, and inverse Gauss distributions. For stochastic processes, many other distributions will also be important. Here, I shall look at a few of them.
Weibull distribution
For duration data, the Weibull distribution
$$f(t; \mu, \alpha) = \frac{\alpha\, t^{\alpha - 1}}{\mu^{\alpha}}\, e^{-(t/\mu)^{\alpha}} \qquad (2.15)$$
is especially important because of its simple properties (Section 3.2.1).
Divorces Here, this distribution, with the quadratic function of the husband’s age, has an AIC of 4643.0. This is a major improvement on the previous models. The regression equation is
$$\hat\mu_i = -9.13 + 0.46\, x_{i1} + 0.0041\, x_{i1}^2$$
similar to that for the gamma distribution. These two curves are plotted in Figure 2.7. We see that the Weibull curve is higher than the gamma. This result is important because, for a model without covariates, the gamma distribution fits better than the Weibull: the AICs are, respectively, 5816.3 and 5850.5. Thus, the marginal distribution (gamma) is different from the conditional one (Weibull). The distributional assumptions can change as covariates are introduced into a model.
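Comparing these duration models by AIC requires evaluating each density on the log scale at the fitted parameters. The sketch below codes the log densities of the gamma (2.7), log normal (2.13), inverse Gauss (2.14), and Weibull (2.15) distributions in the parametrisations given above, plus an AIC helper. The function names are illustrative, and the helper assumes the common convention $-2\log L + 2p$; the AIC values reported in this section may be on a different scale, so only differences computed under one convention should be compared.

```python
import math


def gamma_logpdf(t, mu, phi):
    """Eq. (2.7): gamma with mean mu and shape phi (variance mu**2 / phi)."""
    return (phi * math.log(phi) + (phi - 1) * math.log(t) - phi * t / mu
            - phi * math.log(mu) - math.lgamma(phi))


def lognormal_logpdf(t, mu, sigma2):
    """Eq. (2.13): log t is normal with mean mu and variance sigma2."""
    return (-math.log(t) - 0.5 * math.log(2 * math.pi * sigma2)
            - (math.log(t) - mu) ** 2 / (2 * sigma2))


def inverse_gauss_logpdf(t, mu, sigma2):
    """Eq. (2.14): inverse Gauss with mean mu."""
    return (-0.5 * math.log(2 * math.pi * sigma2 * t ** 3)
            - (t - mu) ** 2 / (2 * t * mu ** 2 * sigma2))


def weibull_logpdf(t, mu, alpha):
    """Eq. (2.15): Weibull with scale mu and shape alpha."""
    return (math.log(alpha) + (alpha - 1) * math.log(t)
            - alpha * math.log(mu) - (t / mu) ** alpha)


def aic(loglik, n_params):
    """AIC under the -2 log L + 2p convention (an assumption; texts differ in scaling)."""
    return -2.0 * loglik + 2 * n_params


# Usage: with phi = 1 the gamma log density reduces to the exponential of Eq. (2.8).
print(gamma_logpdf(7.0, 15.0, 1.0), -math.log(15.0) - 7.0 / 15.0)
```

For a regression, `mu` would be replaced by each observation's fitted value $\hat\mu_i$, and the log likelihood is the sum of the log densities over the observations.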
[Fig. 2.7. The scatterplot of Figure 2.2 with the fitted gamma and Weibull regression lines. The dashed gamma line is the same as that in Figure 2.6. Axes: husband’s age (20 to 80 years), length of marriage (0 to 50 years).]

Other distributions
Another distribution, the log logistic, has a shape similar to the log normal but with somewhat heavier tails:
$$f(t; \mu, \sigma) = \frac{\pi\, e^{\pi(\log t - \mu)/(\sigma\sqrt{3})}}{\sigma\sqrt{3}\, t \left[1 + e^{\pi(\log t - \mu)/(\sigma\sqrt{3})}\right]^2} \qquad (2.16)$$
Other possibilities include the log Cauchy
$$f(t; \mu, \sigma) = \frac{\sigma}{\pi t \left[\sigma^2 + (\log t - \mu)^2\right]} \qquad (2.17)$$
and the log Laplace
$$f(t; \mu, \phi) = \frac{1}{2\phi t}\, e^{-|\log t - \mu|/\phi} \qquad (2.18)$$
with even heavier tails.
Divorces The log logistic distribution has an AIC of 4821.8, considerably better than the log normal, but not as good as the gamma and Weibull distributions. The log Cauchy has an AIC of 4853.0 and the log Laplace 4775.6, both also better than the log normal. However, none of these can compete with the Weibull distribution, although the second is better than the gamma. These results are summarised in Table 2.1.
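The three heavier-tailed alternatives can be coded in the same way. This sketch follows Equations (2.16) to (2.18) as written above; the log logistic is parametrised so that $\sigma$ is the standard deviation of $\log t$, which is what the factor $\pi/(\sigma\sqrt{3})$ implies. Computing the logistic term from $e^{-|z|}$ avoids overflow for extreme durations; that is an implementation choice, not part of the model. The function names are illustrative.

```python
import math

ROOT3 = math.sqrt(3.0)


def log_logistic_pdf(t, mu, sigma):
    """Eq. (2.16): log t is logistic with location mu and standard deviation sigma."""
    z = math.pi * (math.log(t) - mu) / (sigma * ROOT3)
    ez = math.exp(-abs(z))  # e^z / (1 + e^z)^2 is symmetric in z, so use -|z|
    return math.pi * ez / (sigma * ROOT3 * t * (1.0 + ez) ** 2)


def log_cauchy_pdf(t, mu, sigma):
    """Eq. (2.17): log t is Cauchy with location mu and scale sigma."""
    return sigma / (math.pi * t * (sigma ** 2 + (math.log(t) - mu) ** 2))


def log_laplace_pdf(t, mu, phi):
    """Eq. (2.18): log t is Laplace with location mu and scale phi."""
    return math.exp(-abs(math.log(t) - mu) / phi) / (2.0 * phi * t)


# Far out in the tail (log t ten units above mu), the log Cauchy keeps far more mass.
t_far = math.exp(10.0)
print(log_logistic_pdf(t_far, 0.0, 1.0), log_cauchy_pdf(t_far, 0.0, 1.0))
```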
Table 2.1. Fits of various distributions to the divorce data with a quadratic regression in husband’s age and identity link.

Distribution     AIC
Normal           5068.6
Gamma            4797.9
Log normal       4942.4
Inverse Gauss    5074.6
Weibull          4643.0
Log logistic     4821.8
Log Cauchy       4853.0
Log Laplace      4775.6

2.4.3 Nonlinear regression functions
Because of their link function, generalised linear models have a nonlinear component arising from the transformation of the mean (unless the identity link is used, as in most of the models above). However, they are linear in that function of the mean. There is no reason that we should be restricted to such regression functions. Indeed, in terms of the husband’s age, our models above were nonlinear. Scientifically, the statistical distinction, in terms of the parameters, between linear and nonlinear models is generally that the former are approximations to the latter.
Logistic growth curve
Here, I shall look at one simple case of a regression function that is nonlinear both in the covariate and in the parameters. Suppose that the response variable depends on a covariate in an S-shaped fashion, so that the mean follows the function
$$\mu_i = \frac{\kappa}{1 + e^{-(\beta_0 + \beta_1 x_{i1})}} \qquad (2.19)$$
instead of a quadratic one. This is called a logistic growth curve (Section 12.3.1).
Divorces There appears to be little theoretical reason for using this function with the divorce data; it implies that the mean length of marriage levels off to some constant value as the husband’s age increases. With the Weibull distribution, the AIC is 4647.4, somewhat poorer than with the quadratic regression function. For comparison, the two curves are plotted in Figure 2.8. We can see that they are similar for ages up to about 55, where most of the responses lie.
Because of the limitations in the design of this study outlined in Section 2.1, the results above are not meant to be definitive in any sense.
However, they do show clearly how changing assumptions can alter the conclusions that may be drawn. As in all scientific modelling, care must always be taken in formulating models for stochastic processes.
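The S-shaped regression function can be sketched as below, assuming the parametrisation $\mu = \kappa / (1 + e^{-(\beta_0 + \beta_1 x)})$ with $\beta_1 > 0$, where $\kappa$ is the constant level at which the mean levels off and half of that level is reached at $x = -\beta_0/\beta_1$. The parameter values are hypothetical, chosen only for illustration; they are not the fitted Weibull estimates, which the text does not report.

```python
import math


def logistic_growth(x, kappa, b0, b1):
    """S-shaped mean rising towards the asymptote kappa (assuming b1 > 0)."""
    return kappa / (1.0 + math.exp(-(b0 + b1 * x)))


# Hypothetical parameters: asymptote 30 years, half of it reached at age -b0/b1 = 40.
kappa, b0, b1 = 30.0, -4.0, 0.1
for age in (20, 40, 60, 80):
    print(age, round(logistic_growth(age, kappa, b0, b1), 1))
```

Unlike a quadratic, this curve can neither turn down nor exceed $\kappa$, which is why it implies a levelling off with age.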
[Fig. 2.8. The scatterplot of Figure 2.2 with the fitted linear and nonlinear Weibull regression lines (quadratic and logistic). The solid line is the same as that in Figure 2.7. Axes: husband’s age (20 to 80 years), length of marriage (0 to 50 years).]

Further reading
Good books on linear normal models are rare. Several books on generalised linear models are available; these include Aitkin et al. (1989), Dobson (2002), Lindsey (1997), and McCullagh and Nelder (1989). For nonlinear models, see Lindsey (2001).

Exercises
2.1 Perform a more complete analysis of the study of divorce in Liège.
(a) Develop an appropriate regression model using all necessary covariates, including the number of children, and any necessary interactions.
(b) The study also recorded the length of the court procedure. Analyse the dependence of this response variable on the covariates.
2.2 A medical three-period cross-over trial was performed to determine gastric half-emptying time in minutes. It involved 12 subjects, as shown in Table 2.2. This is an analysis of variance type of design for duration data.
(a) Find an appropriate model to determine whether or not there are treatment effects. Besides the covariates explicitly present in the table (treatment and period), you may want to consider the ‘carry-
canimo
What made him so sick? | Onsay nacasaquit cania?
I think fruit made him ill | Naco ng̃a ang bong̃a maoy nacadaot cania
Don’t eat too much fruit if you do not want to be sick | Ayao ca pagcaon ug bong̃a sa hinlabihan ug dili ca buut magmasaquit
Where is your father? | Hain ba ang imong amahan?
He is at home | Tua sia sa balay
Why does he not go for a walk? | Mano dili sia magasodoy sodoy?
Because he has not yet recovered | Cay apiogon pa sia
Waiter, how long has the physician been waiting for me? | Bata, dugay na ba ng̃a guihulat aco sa médico?
Not long, a few minutes | Dili dugay, pila ca minuto da
You must go out very seldom, for I never have the pleasure of meeting you in the street | Talagsada nanaog ca daguay, cay uala pa co maquita icao sa dalan
I am very ill, so that I cannot go out | Masaquit aco ug daco, tung̃ud niana dili aco macanaog
I am very cold | Nasip-on aco tinood
I have been told your son is very sick; what is the matter with him? | Guisuguinlan aco ng̃a masaquiton caayo ang anac mo, onsay saquit nia?
He went for a walk the other day with some friends, and caught a severe cold | Sa usang adlao miadto sia sa pagsodoy sodoy uban sa pila ca isigcaing̃on nia ug nasip-on sia ug daco
Will you be able to take care of that child? | Mahimo mong pagbantay nianang bata?
I will take care of him with the greatest pleasure | Bantayan co sia sa maayong cabubuton
How does your teacher feel? | Comusta ang imong magtoto-on?
He is so-so now | Arang arang man sia caron
He is unwell | Nadaot sia
Is your neighbor in good health? | Maayo ba ang imong siling̃an?
He is now suffering from his stomach | Caron guisoolan sia sa coto coto
He is a little indisposed | Masaquit sia ug diotay
How is your family? | Maayo ba ang imong panimalay?
They are all well except my brother | Maayo man silang tanan gauas sa acong igso-on ng̃a lalaqui
What is his illness? | Onsa ba ang saquit nia?
He has sore fingers | Masaquit sia sa mg̃a todlo
I have heard your uncle is not well | Nadung̃ug co ng̃a masaquit usab ang imong oyo-an
It is not true | Dili matuod
He has a sore throat | Masaquit sia sa totonlan
How long has he been ill? | Dugay na ng̃a masaquit sia?
Not long | Bag-o pa
And you, Sir, how do you feel? | Ug icao, Señor, comusta ca?
So-so, but my daughter has a violent fever | Arang arang, apan ang acong anac ng̃a babaye guihilantan ug maayo
Since when? | Canosa cutub?
She was seized with it at midnight | Sa tung̃ang gabi-i minsugod sia sa pagbati niana
I wish her a speedy recovery | Naning̃uha aco ng̃a magmapiscay sia sa labing madali
She is much better than she was | Maayo ayo na sia caron
I hope she will soon get relief from her illness | Milaom aco ng̃a luason sia unta sa madali sa iyang saquit
How are you, Madam? | Comusta icao, Señora?
I have not been well lately, but I feel better now | Bag-o pa masaquit aco, apan caron maayo ayo na
I hope to see you better soon | Nagalaom aco sa pagquita canimo ng̃a mapiscay sa dili na madugay
Are you sick, mother? | Masaquit ca ba, anan?
Yes, I am | Oo
I am very sorry for it | Nasubu aco caayo tung̃ud niana
I hope you are not seriously ill | Basin ng̃a dili daco ang saquit mo
Paquigpulong sa masaquiton ug sa mananambal. (Conversation between a sick person and a physician.)
I have sent for you, doctor, because I feel very ill | Guipaanhi co icao, Doctor, cay masaquiton aco caayo
What ails you? | Onsay imong guibati?
My body is very weak | Maluya caayo ang acong lauas
How did you rest last night? | Maayo ba ang pagcatolog mo caron gabi-i?
Very badly; I did not sleep a moment | Dautan caayo; uala aco matolog bisan usa ca pagpiloc da
Let me feel your pulse | Pamisil ca
I had a fever the whole night | Guihilantan aco sa tibooc ng̃a gabi-i
I have had a terrible nightmare | Guialimong̃aoan aco ug daco
It would be better to die | Maayo pa ang pagcamatay
Show me your tongue | Ipaquita canaco ang imong dila
My head aches | Masaquit ang acong olo
You need to be bled | Quinahanglan ang pagcadlit canimo
Your tongue is foul | Buling̃on man ang imong dila
Have you any pain in your chest? | Masaquit ba ang imong dughan?
I have a pain in my throat | Masaquit ang acong totonlan
Sit up in bed | Mulingcod ca sa higdaan
Have you a pain in your side? | Guibati mo ba ug saquit sa imong quilid?
No, I have a pain in my waist | Dili, ang acong hauac hinoo maol-ol
I feel exceedingly weak | Nalay lay caayo ang acong lauas
I can scarcely stand on my legs | Lugus macatindug aco
Do you sleep soundly? | Nacatolog ca ba ug maayo?
I dream a great deal | Nagadamgo aco caayo
How were you taken ill? | Onsay guinicanan sa imong saquit?
It began with shivering | Nasip-on aco sa sinugdan
Do you think it dangerous? | Sa imong pagsabut malisud ba caha quini?
Do not believe that | Ayao ca paghuna huna niana
I am very tired of being so long in bed | Nabalao aco na sa higdaan
I am going to die | Mamatay aco
We don’t know the value of health till we have lost it | Dili quita magpacamahal sa caayo sa atung lauas cun dili quita magmasaquit
Take care not to catch cold | Bantay ca ng̃a dili ca baya masip-on
Must I take that potion? | Muinom aco caha nianang tambal?
Yes, but take it on an empty stomach | Oo, apan sa dili pa icao magpainit
How often must I take it? | Macapila muinom aco niana?
Three times a day | Macatolo sa usa ca adlao
Paquigpulong sa pagduao. (Conversation on visiting.)
Good morning, Madam | Maayong buntag, Señora
Good day, Sir | Maayong adlao, Señor
Good afternoon, Miss | Maayong palis, Señorita
How do you do? | Comusta ca?
Very well, thank you very much | Maayo man, diosmagbayad canimo sa macadaghan
Be seated | Mulingcod ca
Please take a seat | Mulingcod ca usa
Ah! Here is Mr. Michael; I am very glad to see you | Ah!, ania si Sr. Miguel, nalipay aco caayo sa pagquita canimo
I am delighted to make your acquaintance, Sir | Daco ang calipay co sa pagcaila canimo, Señor
I have the honor to salute you | Nalipay aco sa pagpang̃amosta canimo
Somebody is knocking | Dunay nagatoctoc
Can it be Mr. Nicholas? | Mao ba caha si D. Nicolás?
Madam, I have the honor to wish you good day | Señora, nalipay aco sa paghatag canimo ug maayong adlao
I am delighted to see you well again | Nalipay aco sa pagquita canimo ng̃a mapiscay na usab
You are very kind to have thought of me | Labihan ang caicog mo sa paghuna huna canaco
I have called at your house several times, but I have not had the pleasure of finding you at home | Dinhi na aco sa iño sa nacapila ug uala aco macadang̃at sa calipay sa paghiabut canimo sa balay
Yes, and I very much regret that I was not at home to receive you | Mao lagui, ug nasaquit aco tinuod cay uala aco sa balay sa pagdauat canimo
Allow me to retire | Tuguti na aco sa pagpauli
You want to leave me already? | Pauli ca na ba?
Believe me, I am very sorry that I cannot stay any longer with you | Sayod ca usa ng̃a masaquit aco ug daco tung̃ud cay dili aco macahimo gayud magdugay sa paguban canimo
I equally regret that your visit has been so short | Nasaquit aco usab tung̃ud sa pagcahamubo sa imong visita
What are you going to do this evening? | Onsay bubuhaton mo caron hapon?
I have to take my sister to the theatre | May ihatod co sa teatro ang igso-on co ng̃a babaye
Very well, till then | Maayo man, hasta sa paquigquita quita
Are you going already? | Muadto na icao?
You are in a great hurry | Dinalian ca lagui
I must go | Pauli aco na gayud
Why are you in such a hurry? | Ng̃ano dinalian ca sa ing̃on?
I have a great many things to do | Daghan man ang mg̃a buluhaton co
Don’t forget us | Ayao came hicalimti
I must take leave of you | Quinahanglan ng̃a manamilit aco canimo
We must part | Magabulag na quita
I am going to take leave of you | Magaadios na aco canimo
Till I have the honor of seeing you again | Hasta sa laing paghibala ta
Till we meet again | Hasta sa laing pagquita
Till our next meeting | Hasta ng̃a macaquita quita
Thank you for your visit | Diosmagbayad canimo tung̃ud sa imong pagduao
Your servant, Madam | Ang imong magsisilve, Señora
I am at your service | Tomanon co ang imong mg̃a sugo
Your humble servant | Ang ubus ng̃a sologo-on mo
I am very much obliged to you | Nagadiosmagbayad aco canimo sa macadaghan
My compliments to your brother | Icomusta aco sa imong igso-on
Present my regards to your mother | Icomusta aco sa imong anan
Present my best wishes to your aunt | Icomusta aco sa imong ia-an
Present my respects to your husband | Icomusta aco sa imong bana
Give my kind regards to your wife | Icomusta aco sa imong asaua
Remember me to all at home | Icomusta aco sa imong mg̃a loon
I will not fail | Tumanon co gayud ang imong mg̃a sugo
Come very often | Muanhi ca sa masubsub
Good-bye | Ari na aco
Come again | Balic balic
Alas! Here is Mr. Alexander | Diay! ania si D. Alejandro
How do you do, Sir? | Comusta ca Señor?
Very well | Sa calooy sa Dios
I am not well; my pains will never end | Dili maayo aco, ang acong mg̃a saquit dili matapus sa guihapon
You must drive away your sadness | Quinahanglan ng̃a licayan mo canang pagcaming̃ao
I cannot | Dili aco macahimo
You must; to dispel one’s sorrow the best remedy is to visit one’s friends | Quinahanglan man; sa paglicay sa mg̃a huna huna ng̃a masolobon ang labing maayong sumpa mao ang pagduao sa mg̃a abian
None of my friends can dispel my sorrow | Ang tanan ng̃a mg̃a caila co dili macagahum sa pagpauala sa acong caming̃ao
You must try to visit them very often | Magasulay ca sa pagduao canila sa masubsub caayo
I should be so happy to be able to get rid of my low spirits | Paladan aco unta cun cauad-an aco sa mg̃a huna huna ng̃a maming̃aon
To do so, you must never think of your troubles | Cay aron dang̃aton mo cana, ayao ca pagpalandong sa guihapon sa imong mg̃a saquit
I give you many thanks for your advice | Nagadiosmagbayad aco canimo sa nacadaghan tung̃ud sa imong sambag
Don’t mention it | Ayao ca paging̃on niana. dili aco tacus
Paquigpulong tung̃ud sa pagadto sa higda-an. (Conversation about going to bed.)
I feel very sleepy | Catologon aco caayo
Do you want to go to bed? | Buut ca ba muhigda?
Yes, I wish I were in bed | Oo, naligad na aco unta
Did the servant make the bed? | Guiandam ba ang higda-an sa sologo-on?
Yes, but he did not change the sheets | Oo, apan uala sia magailis sa mg̃a habol ng̃a coquillo
Waiter, change this pillow | Bata, ilisi quining unlan
These pillow-cases are not clean | Buling̃on man quining mg̃a funda
This blanket is very thick; I want a thinner one | Mabaga caayo quining habol, buut aco ug lain ng̃a maga-an (manipis)
Do you want any more? | Buut pa nimo ug lain?
Put out the lamp | Palng̃a ang quinqué
Bring me the candlestick | Iari ang palmatoria
Till to-morrow, good night | Hasta sa ugma, maayong gabi-i
Pleasant dreams to you | Basin ng̃a magadamgo ca sa mg̃a malilipayon
Why do you not tell the children to go to bed? | Mano dili nimo pahigda-a ang mg̃a bata?
Because they have to have supper first | Cay manihapon pa sila
And when do you go to bed? | Ug anosa muhigda ca?
I will go very soon | Muhigda aco dayon
Tell the waiter to come here | Paanhion mo ang bata
Have you closed the shutters? | Guitacpan ba nimo ang persianas?
Yes, Sir | Oo, Señor
You had better leave them open | Ayo pa ng̃a dili sirhan sila
Why? | Mano?
Because the weather is very warm | Cay mainit caayo ang tiempo
At what time must I wake you? | Onsa ng̃a horas ipucao canimo?
At five o’clock | Sa a las cinco
All right, till to-morrow | Maayo man, hasta sa ugma
Good night | Maayong gabi-i
Good night | Adios
Paquigpulong tung̃ud sa pagbang̃on sa higda-an. (Conversation about getting up.)
Good morning, how did you sleep last night? | Maayong buntag, naonsa ang pagcatolog mo caron gabi-i?
How have you slept? | Guionsa ang pagcatolog mo?
How did you rest? | Naonsa ang pagpahoay mo?
Waiter, why did you not wake me? | Bata, ng̃ano uala pucaoa aco nimo?
It is time to get up | Horas na man sa pagbang̃on
Let us get up | Mubang̃on na quita
Come, get up | Nan, mubang̃on ca
Get up quickly | Mubang̃on ca sa madali
I have not slept very well | Uala aco matolog ug maayo
Never mind, get up | Ualay sapayan, mubang̃on ca
You slept without waking | Uala ca maghimata sa tibo-oc ng̃a gabi-i
No, let me sleep a little more | Uala, pasagdi pa aco ug diriot
Get dressed, idler; do you not see the sunshine? | Magvisti ca, tapolan; dili maquita mo ang cahayag sa adlao?
At what time did you get up? | Onsa ng̃a horas nagbang̃on ca?
I have just got up | Caron pa nagbang̃on aco
What time is it? | Onsa ng̃a horas?
It is late | Buntag na man
So soon! It can’t be; I have not been in bed more than two hours | Ing̃on ng̃a madali! Dili gayud mahimo; uala pay duha ca horas cutub sa paghigda co
Two hours? Say nine! | Duha da ca horas, ingnon mo siam!
I was sleeping so well when you woke me! | Pagcamaayo sa pagcatolog co sa pagpucao canaco!
I think so, but you must get up at once | Mao man, apan quinahanglan ng̃a mubang̃on ca dayon
Have pity on me, I am still very sleepy | Caloy-i aco, catologon pa aco tuod
Make haste, and dress quickly | Dalia, ug magvisti ca sa ualay lang̃an
Why should I hurry so? | Mano magdali aco sa ing̃on?
The boys have been in class for more than a quarter of an hour | Didto na ang mg̃a bata sa escuelahan capin na sa usa ca cuarto sa horas
Well, can they not begin without me? | Maayo, quinahanglan ba aco caha cay arong musugod sila sa pagescuela?
I have no doubt that they can | Sa ualay duha duha dili ca quinahanglanon
So, let me sleep | Busa, pacatolga aco
I cannot allow you to stay in bed a moment longer | Dili aco macatogot canimo sa paghigda pa bisan sa usa ca pagpiloc da
I am ready | Listo na man aco
Yes, but it has not been without trouble | Mao man, apan uala mahimo cana sa ualay cabudlay
Paquigpulong tung̃ud sa paglacao, pagviaje, &. (Conversation about walking, travelling, etc.)
Where do you come from? | Di-in ca guican?
I come from Bohol | Guican aco sa Bohol
Where are you going now? | Asa ca paing̃on caron?
I am going to Mindanao | Paing̃on aco sa Mindanao
Where do you wish to go? | Asa buut mong pagdolong?
I wish to go to Manila | Buut cong pagdolong sa Manila
Where will you go? | Asa buut mong pagadto?
I will go home | Buut cong pagadto sa amo
Where are you going? | Asa ca paing̃on?
I am going to market | Paing̃on aco sa tianggi
I am going to your place | Paing̃on aco sa iño
I am going to your house | Muadto aco sa imong balay
I am going to the church | Paing̃on aco sa singbahan
I am going to the town | Paing̃on aco sa longsod
Where do you have to go? | Asa bay imong adtoon?
I have to go to Cebu | Dunay acong adtoon sa Sugbu
I have to go to Tagbilaran | May acong adtoon sa Tagbilaran
I have to go to the church to pray to the Holy Child for us | Dunay acong adtoon sa singbahan sa pagampo sa Santo Niño tung̃ud canatu
Do you want to come with me? | Buut ca ba muuban canaco?
Do you want to accompany me? | Buut ca ba mucuyog canaco?
I cannot, for I have many things to do | Dili aco macahimo cay duna acoy daghang mg̃a buhat
Where shall we go? | Asa quita padolong?
We shall go for a walk | Magasodoy sodoy quita
Let us call for your brother on our way | Hapiton ta ang imong igso-on
  • 75.
    We want to go and take a walk | Buut came magasodoy sodoy
    Which way shall we go? | Onsang dalana atung pagaguian?
    Which way you please | Icao magabuut
    Let us go to your brother’s | Tala na sa balay sa imong igso-on
    With all my heart | Sa maayong cabubuton
    I have no objection | Uala acoy igalalis canimo
    Let us go this way | Paing̃on quita niining dalana
    Where do you come from? | Di-in ca guican?
    I come from Mrs. Mary’s | Guican aco sa balay ni Dña. María
    I come from your father’s | Sa balay sa imong amahan
    I come from yours | Guican aco sa iño
    I come from the teacher’s | Sa balay sa magtoto-on
    I come from their house | Guican aco sa ila
    Where is Mr. Patrick? | Hain ba si D. Patricio?
    You will find him at his house | Maquita mo sia sa ila
    He is at home | Tua sa ila
    He is out | Uala sa ila
    He is going to his uncle’s | Paing̃on sia sa balay sa iyang oyo-an
    Can you tell me where he has gone to? | Mahimo ca ba magpahibalo canaco asa sia paing̃on?
    He is just gone out | Bag-o pa nanaog sia
    He has gone to church | Miadto sia sa singbahan
    He went to the barber’s | Miadto sia sa buhatan sa mananalot
    Let us call at the hatter’s | Hapiton ta ang baliguia-an sa magbolohat sa mg̃a calo
    Let us call at your aunt | Muduao quita sa ia-an mo
    Is Mr. Edward at home? | Ania ba sa balay si D. Eduardo?
    He is not at home | Uala sia sa balay
    He is at Mrs. Elizabeth’s house | Atua sa balay ni Doña Isabel
    Where is Mr. William going to? | Asa paing̃on si D. Guillermo?
    I do not know where he is going to | Ambut cun asa paing̃on sia
  • 76.
    Where does Mrs. Clara go? | Asa paing̃on si Dña. Clara?
    I do not know | Ambut. Inay
    Why are you so glad? | Ng̃ano malipay ca sa ing̃on?
    Because my father called at me | Cay guiduao aco sa acong tatay
    Why are you so sad? | Ng̃ano masubu ca sa ing̃on?
    Because I have seen my friends passing and they have not called at my house | Cay naquita co ang acong mg̃a higala ng̃a miagui lang ug uala aco nila hapita
    Why do you not go out? | Mano di ca manaog?
    Because my steamer is just arrived | Cay guidungoan aco sa vapor
    I am going away; it is time | Pauli na aco; horas na man
    At what o’clock do you intend to come back? | Onsa ng̃a horas ipauli mo?
    I shall be at home very soon | Mupauli aco dayon sa balay
    I will go along with you | Mucuyog aco canimo
    I will accompany you | Ubanan ta icao. Magbolyog quita
    You go too fast | Icao mainsil caayo
    I must return home | Pilit aco mupauli sa balay
    Come back as fast as you can | Dalia ang pagpauli, ta man sa mahimo mo
    Come back quickly | Pauli ca sa madali
    Will you come back again? | Mupauli na icao usab?
    I shall see you on my return | Sibugan co icao sa pagpauli
    I shall go to Cebu to-morrow | Ugma muadto aco sa Sugbu
    What will you gain by it? | Onsay ipatigayon mo niana?
    You will not get anything by it | Dili nimo pagdang̃aton ug bisan onsa
    When do you intend to depart? | Anosa ca naghuna huna muguican?
    I intend to depart to-morrow | Naghuna huna aco muguican ugma
    At what o’clock will the steamer set out? | Onsa ng̃a horas iguican sa vapor?
    At seven o’clock in the morning | Sa a las siete sa buntag
  • 77.
    How far did you travel last year? | Asa ba cutub nagviaje ca sa usang tuig?
    As far as Spain | Cutub sa España
    Are you fond of riding? | Mahagugma ca ba mang̃abayo?
    I am very fond of it | Mahagugma aco caayo
    Is it good travelling? | Maayo ba ang pagviaje?
    It is | Maayo man
    Are you fond of driving? | Mahagugma ca magcarruage?
    I am | Oo
    Are you fond of travelling by sea? | Mahagugma ca magviaje sa dagat?
    Do you wish to travel by land? | Mahagugma ca ba magviaje sa yuta?
    Do you want to travel on foot? | Buut ca ba mulacao sa pagviaje?
    Do you like to travel on horse-back? | Buut ca ba mang̃abayo sa pagviaje?
    How far is it from here to Manila? | Pila ba ang cahalayo cutub dinhi hasta sa Manila?
    It is not very far | Dili man halayo caayo
    Is Cebu very far from Bohol? | Halayo ba caayo cutub sa Sugbu hasta sa Bohol?
    It is near | Dool da man
    Has your friend already gone to Manila? | Miadto na ba ang abian mo sa Manila?
    He has not yet gone, but he shall go very soon | Uala pa, apan muadto sia di na madugay
    How far is he going? | Asa cutub muadto sia?
    As far as my brother’s | Cutub sa balay sa acong igso-on ng̃a lalaqui
    As far as my sisters’ | Cutub sa balay sa acong mg̃a igso-on ng̃a babaye
    When will you go away? | Anosa ca muguican?
    Very soon, because they are waiting for me at home | Dili na madugay, cay guihulat aco nila sa balay
    Shall we set out early? | Magmasayo ba quita sa pagguican?
  • 78.
    We shall start at five o’clock in the morning | Muguican quita sa a las cinco sa buntag
    We cannot start till eight o’clock | Dili mahimo quita muguican hasta sa a las ocho
    When is your brother-in-law going out? | Anosa muguican ang imong bayao?
    To-morrow evening | Ugma sa hapon
    Did you go very far? | Halayo ba caayo ang imong guilactan?
    Not very far | Dili halayo
    Where is your brother? | Hain ba ang imong igso-on?
    He has gone to take a walk round the garden | Didto sia nagasodoy sodoy sa tanaman
    Where was he yesterday? | Diin ba sia cahapon?
    He was not at home | Uala sia diha sa balay
    When will that man go away? | Anosa muguican canang tao?
    He will go immediately | Muguican sia caron caron
    Why has your brother gone away so soon? | Ng̃ano minguican ang igso-on mo ing̃on ng̃a madali?
    Because some friends were waiting for him | Cay guihulat sia sa iyang mg̃a amigos
    Why do you walk so fast? | Ng̃ano ing̃on ng̃a mapiscay ang paglacao mo?
    Because I have scarcely time to be at home at four o’clock | Cay lugus macaabut aco sa balay sa a las cuatro